03 - Data Processing Management

Design Diagram

The Data Processing Management Composite Application supports the 03 - Science Data Processing SA Process. The services within this Composite Application can be grouped into four functional areas:

Preparing for processing:

Preparation for processing begins with the locking of a Processing Candidate. The Processing Candidate Locker is triggered by the Service Scheduler based upon the schedule defined in the KV store of 01 - System Orchestration (DC Cloud). It reviews the currently un-locked Processing Candidates that were received by the Data Acquisition Support Ingester in 02 - External Data Management, evaluates their age and planned vs. received counts by Observing Program Execution ID, and locks the Processing Candidates that are ready for processing. All of the metadata required for these evaluations is retrieved from the 07 - Metadata Store (01 - Processing Support). The Processing Preparation Worker1 is then used by calibration support personnel to review data in the 07 - Metadata Store (01 - Processing Support + 02 - Object Inventory), create the metadata needed to begin processing (Recipe + input data) in the 07 - Metadata Store (01 - Processing Support), and finally initialize its execution, which is called a Recipe Run. The Recipe Run is sent for execution by setting its status to READY, at which point it will eventually be picked up by the recipe-run-submitter based upon priority. Manual processing requests are handled by the Ticket Requester and loaded into the 02 - Ticketing system, while automated processing requests are handled by the Automated Processing Manager.
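The locking decision described above can be sketched in a few lines. This is a minimal illustration, not the actual service: the record fields, the completeness rule, and the age cutoff are all assumptions, since the real evaluation pulls its metadata from the 07 - Metadata Store (01 - Processing Support).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical record shape; the real metadata lives in the
# 07 - Metadata Store (01 - Processing Support).
@dataclass
class ProcessingCandidate:
    obs_program_execution_id: str
    received_at: datetime
    planned_frame_count: int
    received_frame_count: int
    locked: bool = False

# Illustrative threshold: lock once all planned frames have arrived,
# or once a candidate has aged past a cutoff (the value is an assumption).
MAX_AGE = timedelta(days=2)

def lock_ready_candidates(candidates, now=None):
    """Lock un-locked candidates that are complete or past the age cutoff."""
    now = now or datetime.now(timezone.utc)
    newly_locked = []
    for c in candidates:
        if c.locked:
            continue
        complete = c.received_frame_count >= c.planned_frame_count
        expired = (now - c.received_at) > MAX_AGE
        if complete or expired:
            c.locked = True
            newly_locked.append(c)
    return newly_locked
```

The key point is that a candidate becomes lockable by either route: its planned vs. received counts balance, or it has simply waited long enough.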

Automated Processing Execution:

Automated processing execution begins with the Automated Processing Manager triggering a DAG as directed by the recipe-run-submitter based upon resource availability. The execution of a task within the DAG is handled by the Automated Processing Worker. Every DAG begins by updating the status of the Recipe Run, retrieving the record of planned inputs from the 07 - Metadata Store (01 - Processing Support), and triggering the 04 - Automated Processing scratch repository to be loaded with the inputs from the "raw" bucket in the 01 - Object Store via the 08 - Transfer Manager. The processing itself records provenance information in the output frames and in the 07 - Metadata Store (01 - Processing Support). At the end of processing, the generated outputs are loaded into the "data" bucket in the 01 - Object Store via the 08 - Transfer Manager, a browse movie is generated and loaded into the same "data" bucket via the 08 - Transfer Manager, a quality record is created in the 07 - Metadata Store (03 - Search Support), the 04 - Automated Processing space is cleaned up, the frames and browse movie loaded in the object store are sent for cataloging, and finally the Recipe Run status is updated. Failures during processing result in a message being created for the Ticket Requester, which generates a manual processing ticket in the 02 - Ticketing system, and in the 04 - Automated Processing space being cleaned up.
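The phases above can be condensed into a single control-flow sketch. Every service call below is a stand-in (an assumption for illustration), not a real API; the point is the ordering of the happy path, the failure path that raises a ticket request, and the clean-up that happens in both cases.

```python
# Illustrative sketch of the automated execution flow; all service
# objects and method names are assumptions, not the actual interfaces.

def execute_recipe_run(recipe_run, services):
    """Drive one Recipe Run through the DAG's major phases."""
    services.metadata.set_status(recipe_run, "RUNNING")
    inputs = services.metadata.planned_inputs(recipe_run)
    # Load scratch space from the "raw" bucket via the Transfer Manager.
    services.transfer.load_scratch(inputs, source_bucket="raw")
    try:
        outputs = services.processor.run(recipe_run, inputs)  # records provenance
        services.transfer.store(outputs, bucket="data")
        movie = services.processor.browse_movie(outputs)
        services.transfer.store([movie], bucket="data")
        services.metadata.write_quality_record(recipe_run)
        services.bus.send_for_cataloging(outputs + [movie])
        services.metadata.set_status(recipe_run, "COMPLETE")
    except Exception as exc:
        # Failure path: message the Ticket Requester so a manual
        # processing ticket is generated in the Ticketing system.
        services.bus.request_ticket(recipe_run, reason=str(exc))
    finally:
        # Scratch space is cleaned up on success and on failure alike.
        services.scratch.clean_up(recipe_run)
```

The try/finally shape mirrors the text: cataloging and the COMPLETE status only happen on success, while scratch clean-up is unconditional.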

Manual Processing Execution:

Manual processing execution begins with the receipt of a manual processing ticket in the 02 - Ticketing system. The Manual Processing Worker Provider is then used to carve off a resource from the automated processing pool for the guided/observed execution of a processing job. The Manual Processing Worker can then execute the same steps performed by the Automated Processing Worker: retrieving processing code, maintaining Recipe Run state, retrieving inputs, and storing outputs.

Data Cataloging:

The cataloging process is shared by the automated and manual processing execution services. Objects that were loaded in the object store are sent for cataloging via messages on the 06 - Interservice Bus. Frame cataloging is handled separately from all other objects because frame metadata is harvested from the FITS headers. Concurrently, a quality report is formatted from the quality record and uploaded alongside the dataset frames. Once all of a dataset's data has been cataloged, the Dataset Catalog Locker triggers the creation of an ASDF file for the dataset and subsequently the creation of dataset inventory. Once dataset inventory is created, the browse movie is sent for publishing on YouTube and a notification record is created for the proposal.
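The Dataset Catalog Locker's completeness check and the chain of downstream actions it triggers can be sketched as follows. The function and method names are assumptions for illustration; the actual expected-object list would come from the dataset's metadata.

```python
# Hypothetical sketch of the Dataset Catalog Locker: once every expected
# object of a dataset has been cataloged, the downstream steps fire in order.

def dataset_complete(expected_objects, cataloged_objects):
    """True when every expected object has been cataloged."""
    return set(expected_objects) <= set(cataloged_objects)

def on_object_cataloged(dataset, obj, state, actions):
    """Handle one 'object cataloged' message for a dataset."""
    state.setdefault(dataset, set()).add(obj)
    if dataset_complete(actions.expected(dataset), state[dataset]):
        actions.create_asdf(dataset)           # ASDF file for the dataset
        actions.create_inventory(dataset)      # dataset inventory
        actions.publish_browse_movie(dataset)  # e.g. the YouTube publish step
        actions.notify_proposal(dataset)       # notification record for the proposal
```

Note that ASDF creation strictly precedes inventory creation, matching the "and subsequently" ordering in the text.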

Footnote 1

The Processing Preparation Worker functionality is fulfilled via a Jupyter Notebook (https://jupyter.org/), which is a mix of code and markup. Initially, a human will apply the logic in this code. As experience is gained regarding the rules for associating processing recipes and assembling input data, this work will be automated by putting flow-control code around the already existing functions that support the Jupyter Notebook.
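A sketch of what that eventual automation could look like: flow-control code wrapped around the same functions the notebook calls today. The helper names and the rule that an unmatched candidate is deferred to a human are assumptions for illustration.

```python
# Hypothetical automation of the preparation flow; the helper functions
# stand in for the existing Jupyter Notebook support functions.

def prepare_recipe_run(candidate, helpers):
    """Associate a recipe, assemble inputs, and mark the Recipe Run READY."""
    recipe = helpers.associate_recipe(candidate)
    if recipe is None:
        return None  # no rule matched; defer to human judgment
    inputs = helpers.assemble_inputs(candidate, recipe)
    run = helpers.create_recipe_run(recipe, inputs)
    # READY hands the run off to the recipe-run-submitter.
    helpers.set_status(run, "READY")
    return run
```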



Key Concept: Recipe Run State Diagram

The states of the Recipe Run and their transitions are depicted below. By evaluating the state of Recipe Runs and their related metadata, the Data Center can determine when the work necessary to fully process data identified by the Ops Tools as Processing Candidates is complete.
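A state machine like the one depicted can be sketched as an enum plus a transition table. Only READY is named explicitly in this document; the other state names and the allowed transitions below are assumptions inferred from the lifecycle described in the sections above.

```python
from enum import Enum

# State names other than READY are assumptions for illustration.
class RecipeRunState(Enum):
    CREATED = "CREATED"
    READY = "READY"
    RUNNING = "RUNNING"
    COMPLETE = "COMPLETE"
    FAILED = "FAILED"

# Assumed transition table: FAILED can return to READY after the
# manual processing ticket has been worked.
ALLOWED = {
    RecipeRunState.CREATED: {RecipeRunState.READY},
    RecipeRunState.READY: {RecipeRunState.RUNNING},
    RecipeRunState.RUNNING: {RecipeRunState.COMPLETE, RecipeRunState.FAILED},
    RecipeRunState.COMPLETE: set(),
    RecipeRunState.FAILED: {RecipeRunState.READY},
}

def transition(current, target):
    """Return the new state, rejecting transitions not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Encoding the transitions as data makes the completeness question the text raises easy to answer: a Recipe Run with no outgoing transitions (here, COMPLETE) is terminal.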