00 - Data Center
| Related page | Link |
|---|---|
| High Level Architecture | |
Scope:
The scope of Data Center processes and software is depicted in an operational view by the contents of the blue dashed box in the image below. Lines that cross the scope boundary are interfaces either to other systems, such as Ops Tools (orange) and the telescope's Data Handling System (maroon), or to the user roles that interact with the Data Center system and processes. Each process box within the Data Center scope represents a Process Area that is further decomposed in later sections. The interactions between these Process Areas are either direct (e.g., from Search to Distribution) or indirect, exchanging data through the different data stores in the Data Center architecture.
| Goals |
|---|
| Provide an infrastructure capable of calibrating data received from the DKIST summit |
| Deliver discovered data in well-defined and structured formats |
| Enable the discovery of processed data held at the Data Center |
| Support the embargo of data based upon proposal |
| Facilitate the operational management of the Data Center |
| Make processed DKIST data freely available to the public |
| Notify PIs when their processed data becomes available |
| Optimize hardware utilization (processing/storage) to minimize cost |
| Support data delivery via network |
| Protect science data from loss |
| Support the securing of cyber systems through: |
| Serve as the official and accepted repository of DKIST data |
| Support the modification and removal of data as directed by the Data Quality Assessment Committee (DQAC) |
| Support the user community with open-source code, documentation, and help desk interaction |
| Support the version tracking of processed science data |
| Receive data collected on the summit for curation |
| Support raw data use by DKIST instrument scientists |
| Receive and ingest Operations Tools data necessary for Data Center processing and search functions |
Key Concept: Process Area to Composite Application Mapping
The "Composite Applications" (e.g., Summit Data Management, External Data Management, etc.) are the headings under which the software supporting Data Center processes is described from a system perspective.
Key Concept: Data Stores
Data is central to the processes and supporting software for the Data Center. The data stores of the Data Center are depicted below along with more detail about how they are factored, how they relate to each other, and the technologies which support them.
Data Center Process Rules
| Rule | Justification |
|---|---|
| Data Center user passwords are required to be "strong." | This is part of the security plan. |
| Raw search requests require approval. | Raw data is of questionable external use. The requestor must be deemed able to use the data in order to justify the use of DC resources to assemble the raw data for distribution. More rules about escalation TBD. |
| Email addresses must be verified before the account can be used. | We need to know that we can contact a user. |
| For a user to be an Authorized Agent, they must have the approval of the DKIST Director. The authorization shall remain in effect until notification from the Data Center Scientist. | Operations activities may require the review of data (including proprietary data) to handle calibration and instrument performance issues. These issues would be handled by non-Data Center personnel, who would therefore be required to use the end-user-facing functions to retrieve data. |
| DKIST Authorized Agents shall not use proprietary data for science. | It is the DKIST Authorized Agent's responsibility to know whether the data they are using for science is proprietary, and if it is, to not use it. |
| Proprietary data shall only be distributed externally to users when the following criteria are met: | Embargo statuses support PhD students in keeping their data private for a period of time (typically 12 months) so as not to allow interference with their theses. |
| No science data younger than 6 months can be removed. | This is a requirement. |
| Processed Frames shall be proprietary based upon the proposal associated with a Processing Candidate. Artisanal (i.e., no candidate affiliation) Input Datasets are exempt from proposal embargo rules. | Input Datasets indicate the proposal for the observation's operational run. This is either inherited from the calibration association set's primary operational run or set by the calibration support team when constructing an Artisanal Input Dataset. |
| Before the removal or modification of failed science ingest data, the instrument scientist must be consulted. | Science data will potentially be discarded, so a scientist must sign off on the action. |
| Data ingested into the Data Center from non-Data Center sources must be provided through an authorized channel. | This is part of the security plan. |
| Processed data shall be made discoverable to all users, both registered and unregistered (this excludes proprietary data, whose distribution is defined in a separate rule). | This is part of the data policy. |
| Science data distributed from the Data Center to external destinations must transit an authorized channel. | This is part of the security plan. |
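The embargo rules above (proposal-based embargo with an Artisanal exemption and a typical 12-month window) can be sketched as a small check. This is a minimal illustration only: the field names (`artisanal`, `proposal_id`, `embargo_start`) and the dict representation are hypothetical, not the Data Center's actual schema.

```python
from datetime import datetime, timedelta, timezone

# Typical 12-month embargo window from the rules table; the real period
# may vary per proposal.
EMBARGO_PERIOD = timedelta(days=365)

def is_embargoed(dataset, now=None):
    """Return True while a dataset remains proprietary under proposal embargo."""
    now = now or datetime.now(timezone.utc)
    if dataset.get("artisanal"):
        # Artisanal Input Datasets (no candidate affiliation) are exempt.
        return False
    if dataset.get("proposal_id") is None:
        # No associated proposal means no proposal-based embargo.
        return False
    return now < dataset["embargo_start"] + EMBARGO_PERIOD
```

A distribution service would consult a check like this before releasing Processed Frames to users other than the proposal's investigators or an Authorized Agent.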
Data Center Process Roles and Responsibilities
| Role | Responsibility | Strategy Area | Applicable Process |
|---|---|---|---|
| Calibration Support | Evaluate and approve or disapprove Raw Search requests | Ops Support | Service Desk Management - Approval |
| Calibration Support | Cancel Recipe Runs | Science Data Processing | Process Scheduling |
| Calibration Support | Create a dataset review ticket for superseded datasets | Science Data Processing | Reprocessing |
| Calibration Support | Create Input Datasets comprised of input frames, calibration parameter values, and primary proposal ID | Science Data Processing | Input Data Assembly |
| Calibration Support | Create Recipe Instances and Recipe Runs for reprocessing | Science Data Processing | Reprocessing |
| Calibration Support | Create Recipe Instances combining applicable Recipes and Input Datasets | Science Data Processing | Process Scheduling - Readiness Evaluation |
| Calibration Support | Create Recipe Runs based upon Recipe Instances for scheduling | Science Data Processing | Process Scheduling - Readiness Evaluation |
| Calibration Support | Create Recipe Specifications | Science Data Processing | Manage Recipes - Create |
| Calibration Support | Create work tickets for Calibration Associations that have no applicable Recipe | Science Data Processing | Process Scheduling - Readiness Evaluation |
| Calibration Support | Determine Recipe applicability for Calibration Associations that are ready | Science Data Processing | Process Scheduling - Readiness Evaluation |
| Calibration Support | Determine superseded datasets due to a planned reprocessing | Science Data Processing | Reprocessing |
| Calibration Support | Execute manual Recipe Runs | Science Data Processing | Manual Processing |
| Calibration Support | Identify Calibration Associations requiring Input Dataset assembly | Science Data Processing | Input Data Assembly |
| Calibration Support | Pause and activate resource scheduling | Science Data Processing | Process Scheduling |
| Calibration Support | Update Recipe Specifications | Science Data Processing | Manage Recipes - Update |
| Calibration Support | Manage Science Data ingest failures | Summit Data Reception and Ingest | Ingest Failure Management |
| Data Center | Send data in a check-pointed manner from a virtual folder to an authorized user endpoint | Distribution | User Download |
| Data Center | Execute removal of proprietary proposals | External Data Ingest | Ingest Proprietary Proposals |
| Data Center | Execute removal of proposal investigator links | External Data Ingest | Ingest Proposal Investigator Link |
| Data Center | Ingest new Calibration Association sets | External Data Ingest | Ingest Data Acquisition Support Data |
| Data Center | Ingest new proprietary proposals | External Data Ingest | Ingest Proprietary Proposals |
| Data Center | Ingest new parameter values for existing parameters | External Data Ingest | Ingest Parameter Values |
| Data Center | Ingest new proposal investigator links | External Data Ingest | Ingest Proposal Investigator Link |
| Data Center | Ingest updated Calibration Association sets | External Data Ingest | Ingest Data Acquisition Support Data |
| Data Center | Raise a ticket if a Calibration Association set being updated is already in use | External Data Ingest | Ingest Data Acquisition Support Data |
| Data Center | Assign Service Desk tickets to a group based upon ticket type | Ops Support | Service Desk Management - Assign |
| Data Center | Audit and record inventory-to-object accuracy | Ops Support | Data Holding Audit - Inventory to Object |
| Data Center | Audit and record object integrity | Ops Support | Data Holding Audit - Object Integrity |
| Data Center | Audit and record object-to-inventory accuracy | Ops Support | Data Holding Audit - Object to Inventory |
| Data Center | Generate aggregate events prior to removal of transactional events | Ops Support | Monitoring |
| Data Center | Generate alerts based upon event review rules | Ops Support | Monitoring |
| Data Center | Notify investigators when newly processed data is available based upon user preference (i.e., whether their digest setting is on or off) | Ops Support | Processed Data Notification |
| Data Center | Provide password reset instructions to valid email address requests | Ops Support | User Registration - Forgotten Password |
| Data Center | Record email verification | Ops Support | User Registration - Forgotten Password |
| Data Center | Regenerate lost Inventory Cards | Ops Support | Data Holding Audit - Object to Inventory |
| Data Center | Remove transactional monitoring events greater than 1 year old if space is needed | Ops Support | Monitoring |
| Data Center | Request email verification | Ops Support | User Registration - Forgotten Password |
| Data Center | Retrieve monitoring info from other DC systems | Ops Support | Monitoring |
| Data Center | Route Service Desk tickets for approval | Ops Support | Service Desk Management - Create |
| Data Center | Create work tickets for Recipe Runs identified as "manual" | Science Data Processing | Process Scheduling |
| Data Center | Execute automated Recipe Runs | Science Data Processing | Automated Processing |
| Data Center | Flag Calibration Associations as "ready" when all data has been received, or when 13 days from acquisition have passed, whichever comes first | Science Data Processing | Process Scheduling - Readiness Evaluation |
| Data Center | Generate a Dataset Inventory record containing aggregate metadata from the associated frames | Science Data Processing | Science Data Ingest |
| Data Center | Generate a Frame Inventory record containing header values | Science Data Processing | Science Data Ingest |
| Data Center | Incorporate process management header values into the ingested Frame | Science Data Processing | Science Data Ingest |
| Data Center | Ingest Processed Data Frames into the Object Store | Science Data Processing | Science Data Ingest |
| Data Center | Schedule resources for Recipe Runs identified as "manual" | Science Data Processing | Process Scheduling |
| Data Center | Create Frame Inventory records for science data that have been ingested | Summit Data Reception and Ingest | Science Data Ingest |
| Data Center | Ingest Science Data | Summit Data Reception and Ingest | Science Data Ingest |
| Data Center | Receive data from the Summit | Summit Data Reception and Ingest | Data Center Receipt |
| Data Center | Record expected Frame counts by Observing Program Run ID | Summit Data Reception and Ingest | Transfer Manifest Ingest |
| Data Center | Record receipt count of science data ingested by Observing Program Run ID | Summit Data Reception and Ingest | Science Data Ingest |
| Data Center | Retain Ancillary Data for at least 90 days | Summit Data Reception and Ingest | Ancillary Data Ingest |
| Data Center | Route data from the Summit to the appropriate ingest process | Summit Data Reception and Ingest | Data Center Receipt |
| Data Center Operations | Create and review system event reports | Ops Support | Monitoring |
| Data Center Operations | Create event alert rules | Ops Support | Monitoring |
| Data Center Operations | Create Help Tickets | Ops Support | Service Desk Management - Create |
| Data Center Operations | | | |
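The readiness rule in the table above (flag a Calibration Association "ready" when all expected frames have arrived, or 13 days after acquisition, whichever comes first) can be sketched as follows. The function signature is illustrative: expected counts come from the Transfer Manifest and receipt counts from Science Data Ingest, but the names used here are hypothetical, not the actual Data Center interfaces.

```python
from datetime import datetime, timedelta, timezone

# 13-day window from the Process Scheduling - Readiness Evaluation rule.
READINESS_WINDOW = timedelta(days=13)

def is_ready(received_count, expected_count, acquired_at, now=None):
    """Decide whether a Calibration Association should be flagged "ready".

    received_count -- frames ingested so far for the Observing Program Run
    expected_count -- frame count recorded from the Transfer Manifest
                      (None if no manifest has been ingested yet)
    acquired_at    -- acquisition time of the data at the summit
    """
    now = now or datetime.now(timezone.utc)
    complete = expected_count is not None and received_count >= expected_count
    window_elapsed = now - acquired_at >= READINESS_WINDOW
    return complete or window_elapsed
```

Once flagged ready, the association proceeds to Recipe applicability determination by Calibration Support.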