01 - Summit Data Receipt and Ingest SA Process
Related Page | link |
---|---|
Composite Application | 01 - Summit Data Management |
Scope
The scope of the Summit Data Receipt and Ingest process begins with verifiably receiving data from the summit. The two defined types of data received from the summit (Science Data and Transfer Manifests) are then routed to the appropriate ingest pipeline and stored in the appropriate data store. Failures, such as those arising from schema validation or file categorization, are routed to a failure management process for triage and remediation.
Goals
Goals |
---|
Verify data received and notify summit of result |
Route Science Data |
Route Transfer Manifest data |
Receive data sent from the Summit to the Data Center |
Make valid Science Data received at the Data Center searchable by a DKIST Authorized Agent within 10 days |
Ingest Science Data |
Ingest Transfer Manifest Data |
Facilitate validation of stored data integrity |
Facilitate management of failed ingest(s) |
Key Concept: Inventory
Science Data is large in both volume and file count. Storage is organized like a warehouse: data frames are stored in various places, and their locations are recorded in an inventory for later retrieval. The inventory component enables discovery and analysis of select metadata without the overhead of full frame retrieval.
Key Concept: Routing
Routing is determined by the path of the file/object after it has been verifiably received. The presence of certain keywords in the path (listed below) determines which ingest process handles and stores the transferred file:
Transferred File Path | Routed Process |
---|---|
(date)/control/ | Transfer Manifest Ingest |
(date)/transfer/ | Science Data Ingest |
(date)/camera/ | Science Data Ingest |
*anything else* | Ingest Failure Management |
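The routing table above can be sketched as a simple keyword lookup. This is a minimal illustration rather than the Data Center's actual code: the `Route` enum, `ROUTES_BY_KEYWORD`, and `route_for_path` are hypothetical names, and the sketch assumes the keyword is the path segment immediately after the date.

```python
from enum import Enum


class Route(Enum):
    TRANSFER_MANIFEST_INGEST = "Transfer Manifest Ingest"
    SCIENCE_DATA_INGEST = "Science Data Ingest"
    INGEST_FAILURE_MANAGEMENT = "Ingest Failure Management"


# Keyword -> routed process, mirroring the routing table.
ROUTES_BY_KEYWORD = {
    "control": Route.TRANSFER_MANIFEST_INGEST,
    "transfer": Route.SCIENCE_DATA_INGEST,
    "camera": Route.SCIENCE_DATA_INGEST,
}


def route_for_path(path: str) -> Route:
    """Return the process that should handle a verifiably received path."""
    segments = [segment for segment in path.split("/") if segment]
    # Assumption: the layout is (date)/(keyword)/..., so the keyword is the
    # second segment. The date segment itself is not validated here.
    if len(segments) >= 2:
        return ROUTES_BY_KEYWORD.get(
            segments[1], Route.INGEST_FAILURE_MANAGEMENT
        )
    # Anything that does not match the expected layout is a failure.
    return Route.INGEST_FAILURE_MANAGEMENT
```

Defaulting to Ingest Failure Management keeps the "anything else" row explicit: an unrecognized path is never silently dropped.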
Data Center Receipt
The Data Center Receipt process is responsible for verifiably receiving data from the summit, categorizing the received data according to the routing rules, and sending it to the appropriate ingest process.
Science Data Ingest
The Science Data Ingest process is responsible for storing Science Data and inventorying its stored location. The Science Data headers are validated against a schema defined by SPEC-122, and an ingest error is raised if validation fails. Select process management metadata is appended to the Frame headers to aid in inventory regeneration (if necessary). Any keys that are renamed in SPEC-214 are also added under their renamed key. The Frame (along with its appended headers) is then stored in the object store under a "folder" named after its "observingProgramExecutionId", and an inventory record is created that includes all of the Frame's headers plus the location of the file and the date. Finally, a counter is incremented to track how many files have been received for that "observingProgramExecutionId" folder.
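The ingest steps described above can be sketched with in-memory stand-ins for the object store, inventory, and counter. Everything named here is an assumption for illustration: `REQUIRED_KEYS` and `RENAMED_KEYS` are placeholders for the real rules in SPEC-122 and SPEC-214, `ingestedAt` is a hypothetical process management key, and the inventory "date" is assumed to be the ingest time.

```python
from collections import Counter
from datetime import datetime, timezone


class IngestError(Exception):
    """Raised when a frame fails header validation."""


# Placeholders only: the real schema and renames come from SPEC-122/SPEC-214.
REQUIRED_KEYS = {"observingProgramExecutionId"}
RENAMED_KEYS = {"OLDKEY": "NEWKEY"}

object_store = {}            # location -> (headers, payload)
inventory = []               # one record per stored frame
received_counts = Counter()  # observingProgramExecutionId -> frames received


def ingest_frame(name: str, headers: dict, payload: bytes) -> str:
    """Validate, annotate, store, and inventory one frame."""
    missing = REQUIRED_KEYS.difference(headers)
    if missing:
        raise IngestError(f"missing required header keys: {sorted(missing)}")

    stored = dict(headers)
    # Append process management metadata (hypothetical key name).
    stored["ingestedAt"] = datetime.now(timezone.utc).isoformat()
    # Add renamed keys alongside the originals.
    for old_key, new_key in RENAMED_KEYS.items():
        if old_key in stored:
            stored[new_key] = stored[old_key]

    # Store under a "folder" named after the observingProgramExecutionId.
    folder = stored["observingProgramExecutionId"]
    location = f"{folder}/{name}"
    object_store[location] = (stored, payload)

    # The inventory record holds all headers plus location and date.
    inventory.append(
        {**stored, "location": location, "date": stored["ingestedAt"]}
    )
    received_counts[folder] += 1
    return location
```

Because each inventory record carries the full set of headers, metadata queries can run against `inventory` alone without fetching the frame from `object_store`.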
Transfer Manifest Ingest
The Transfer Manifest Ingest process is responsible for storing the expected frame counts, keyed by "observingProgramExecutionId". Ingest errors arise from schema validation issues.
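A minimal sketch of manifest ingest follows, assuming a manifest is a dictionary with an "observingProgramExecutionId" and an expected frame count. The `frameCount` field name, the validation rules, and the `frames_outstanding` helper are hypothetical; the real manifest schema is not given here.

```python
class ManifestError(Exception):
    """Raised when a transfer manifest fails validation."""


expected_counts = {}  # observingProgramExecutionId -> expected frame count


def ingest_manifest(manifest: dict) -> None:
    """Validate a manifest and record its expected frame count."""
    ope_id = manifest.get("observingProgramExecutionId")
    count = manifest.get("frameCount")  # hypothetical field name
    if not isinstance(ope_id, str) or not ope_id:
        raise ManifestError("observingProgramExecutionId must be a string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(count, bool) or not isinstance(count, int) or count < 0:
        raise ManifestError("frameCount must be a non-negative integer")
    expected_counts[ope_id] = count


def frames_outstanding(ope_id: str, received: int) -> int:
    """Return how many expected frames have not yet been received."""
    return expected_counts[ope_id] - received
```

Comparing the stored expectation with the received-frame counter is one way such counts could support validation of stored data integrity.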
Ingest Failure Management
The Ingest Failure Management process is responsible for triaging failures that occur during categorization or ingest of any of the summit data. Operations Support is notified in digest form of the existence of errors requiring attention. The errors are then evaluated and resolved based on the type of data and the nature of the failure.
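The digest notification could look something like the sketch below, which groups failures by data type and reason into a single message. The failure record fields (`data_type`, `reason`, `path`) and the message layout are assumptions made for illustration, not the actual notification format.

```python
from collections import defaultdict


def build_digest(failures: list) -> str:
    """Summarize failures as one message grouped by data type and reason."""
    grouped = defaultdict(list)
    for failure in failures:
        key = (failure["data_type"], failure["reason"])
        grouped[key].append(failure["path"])

    lines = [f"{len(failures)} ingest failure(s) require attention"]
    # Sort the groups so the digest is stable from run to run.
    for (data_type, reason), paths in sorted(grouped.items()):
        lines.append(f"- {data_type} / {reason}: {len(paths)} file(s)")
    return "\n".join(lines)
```

Grouping by data type and reason matches how the errors are later evaluated, so a single digest line corresponds to one class of remediation.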