01 - Summit Data Receipt and Ingest SA Process

Related Page             Link
Composite Application    01 - Summit Data Management


Scope

The scope of the Summit Data Receipt and Ingest process begins with verifiably receiving data from the summit. The two defined types of received data, Science Data and Transfer Manifests, are then routed to the appropriate ingest pipeline and stored in the appropriate data store. Failures, such as those arising from schema validation or file categorization, are routed to a failure management process for triage and remediation.


Goals

Verify data received and notify summit of result
Route Science Data 
Route Transfer Manifest data
Receive data sent from the Summit to the Data Center
Make valid Science Data received at the Data Center searchable by a DKIST Authorized Agent within 10 days
Ingest Science Data
Ingest Transfer Manifest Data
Facilitate validation of stored data integrity
Facilitate management of failed ingest(s)

Key Concept: Inventory

Science Data is large in both volume and file count. Storage is organized like a warehouse: the data frames are stored in various places, and their locations are recorded in an inventory for later retrieval. The inventory component enables discovery and analysis of select metadata without the overhead of retrieving full frames.


Key Concept: Routing

Routing is determined by the path of the file/object after it has been verifiably received. The presence of certain keywords in the path (identified below) determines which ingest process stores the transferred file; a path containing none of them is routed to Ingest Failure Management:

Transferred File Path    Routed Process
(date)/control/          Transfer Manifest Ingest
(date)/transfer/         Science Data Ingest
(date)/camera/           Science Data Ingest
*anything else*          Ingest Failure Management
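
The routing rules above can be sketched as a small lookup. This is a minimal illustration, not the production routing code: the function name route, the ROUTES table, and the assumption that the keyword is always the second path segment with a file name beneath it are hypothetical, and the (date) segment is not validated here.

```python
# Hypothetical routing table mirroring the rules above. The process names are
# plain labels for illustration, not identifiers from the real system.
ROUTES = {
    "control": "Transfer Manifest Ingest",
    "transfer": "Science Data Ingest",
    "camera": "Science Data Ingest",
}
FAILURE_ROUTE = "Ingest Failure Management"


def route(path: str) -> str:
    """Return the process that should handle a verifiably received file."""
    parts = path.strip("/").split("/")
    # Assumption: a routable path looks like (date)/<keyword>/<file name>.
    if len(parts) < 3:
        return FAILURE_ROUTE
    # Any keyword not in the table falls through to failure management.
    return ROUTES.get(parts[1], FAILURE_ROUTE)
```

For example, route("2020-01-01/camera/frame.fits") selects Science Data Ingest, while a path with an unrecognized second segment is sent to Ingest Failure Management.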

Data Center Receipt

The Data Center Receipt process is responsible for verifiably receiving data from the summit, categorizing the data received based upon routing rules and sending the received data to the appropriate ingest process.


Science Data Ingest

The Science Data Ingest process is responsible for storing Science Data and inventorying its stored location. The Science Data headers are validated against the schema defined by SPEC-122, and an ingest error is raised if validation fails. Select process management metadata is appended to the Frame headers to aid in inventory regeneration (if necessary). Any key that is renamed in SPEC-214 is also added under its renamed key. The Frame (along with its appended headers) is then stored in the object store under a "folder" named after its "observingProgramExecutionId," and an inventory record is created which includes all of the Frame's headers plus the location of the file and the date. Lastly, a counter is incremented to track how many files have been received for that "observingProgramExecutionId" folder.
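
The ingest steps described above can be sketched in order as follows. This is an illustrative outline under stated assumptions, not the actual ingest code: the in-memory object_store, inventory, and received_counts stand in for the real object store, inventory, and counter; REQUIRED_KEYS and RENAMED_KEYS are hypothetical placeholders for the contents of SPEC-122 and SPEC-214; and the "ingestDate" metadata key is invented for the example.

```python
from collections import Counter
from datetime import datetime, timezone


class IngestError(Exception):
    """Raised when a frame fails validation and cannot be ingested."""


# Hypothetical placeholders: the real schema lives in SPEC-122 and the real
# key renames in SPEC-214.
REQUIRED_KEYS = {"observingProgramExecutionId"}
RENAMED_KEYS = {"OLDKEY": "NEWKEY"}

object_store = {}            # location -> (headers, payload)
inventory = []               # one record per stored frame
received_counts = Counter()  # frames received per observingProgramExecutionId


def ingest_frame(file_name, headers, payload, now=None):
    # 1. Validate the headers (here: only a required-key check).
    missing = REQUIRED_KEYS - set(headers)
    if missing:
        raise IngestError(f"missing required keys: {sorted(missing)}")

    # 2. Append process management metadata (hypothetical key name).
    now = now or datetime.now(timezone.utc)
    stored = dict(headers)
    stored["ingestDate"] = now.isoformat()

    # 3. Add renamed keys alongside the originals.
    for old, new in RENAMED_KEYS.items():
        if old in stored:
            stored[new] = stored[old]

    # 4. Store the frame under a "folder" named after its execution id.
    folder = stored["observingProgramExecutionId"]
    location = f"{folder}/{file_name}"
    object_store[location] = (stored, payload)

    # 5. Inventory all headers plus the location and the date.
    record = {**stored, "location": location, "date": now.isoformat()}
    inventory.append(record)

    # 6. Count the frame against its folder.
    received_counts[folder] += 1
    return record
```

Because the inventory record carries every header, discovery queries can run against the inventory alone without fetching the frame from the object store.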



Transfer Manifest Ingest

The Transfer Manifest Ingest process is responsible for storing the expected frame counts, keyed by "observingProgramExecutionId." Ingest errors arise from schema validation failures.
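
A minimal sketch of this step follows, assuming a manifest that carries one execution id and one expected frame count. The manifest shape, the "frameCount" key, and the helper is_complete are assumptions made for illustration; comparing expected against received counts is one plausible use of the stored values, not a behavior stated by this page.

```python
class ManifestError(Exception):
    """Raised when a transfer manifest fails schema validation."""


expected_counts = {}  # observingProgramExecutionId -> expected frame count


def ingest_manifest(manifest: dict) -> None:
    # Hypothetical manifest shape:
    # {"observingProgramExecutionId": str, "frameCount": int}
    ope_id = manifest.get("observingProgramExecutionId")
    count = manifest.get("frameCount")
    if not isinstance(ope_id, str) or not ope_id:
        raise ManifestError("observingProgramExecutionId must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(count, bool) or not isinstance(count, int) or count < 0:
        raise ManifestError("frameCount must be a non-negative integer")
    expected_counts[ope_id] = count


def is_complete(ope_id: str, received: dict) -> bool:
    """True when the received frame count matches the manifest's expectation."""
    return ope_id in expected_counts and received.get(ope_id, 0) == expected_counts[ope_id]
```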



Ingest Failure Management

The Ingest Failure Management process is responsible for triaging failures that occur during categorization or ingest of any of the summit data. Operations Support is notified, in digest form, of errors requiring attention. The errors are then evaluated and resolved according to the type of data and the nature of the failure.
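
The digest notification could be assembled along these lines. This is a sketch only: the failure record fields (path, data_type, reason) and the message wording are hypothetical, and delivery to Operations Support is left out.

```python
from collections import defaultdict


def build_digest(failures):
    """Summarize failures by data type and reason instead of one alert per file."""
    grouped = defaultdict(list)
    for failure in failures:
        # Hypothetical record fields: "path", "data_type", "reason".
        grouped[(failure["data_type"], failure["reason"])].append(failure["path"])

    lines = [f"{len(failures)} ingest failure(s) awaiting attention"]
    for (data_type, reason), paths in sorted(grouped.items()):
        lines.append(f"- {data_type} / {reason}: {len(paths)} file(s)")
    return "\n".join(lines)
```

Grouping by data type and reason keeps the digest short and matches how the errors are later evaluated.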