00 - Data Center

Related page: High Level Architecture
Link: 00 - High Level System Design

 

Scope:

The scope of Data Center processes and software is depicted in an operational view by the contents of the blue dashed box in the image below. Lines that cross the scope boundary are interfaces either to other systems, such as Ops Tools (orange) and the Data Handling System for the telescope (maroon), or to the different user roles that interact with the Data Center system and processes. Each process box within the Data Center scope represents a Process Area that is decomposed further in later sections. The interactions between these Process Areas are either direct (e.g., from Search to Distribution) or indirect, by way of exchanging data through the different data stores in the Data Center architecture.

 

 

Goals

Provide an infrastructure capable of calibrating data received from the DKIST summit

Deliver discovered data in well-defined, structured formats

Enable the discovery of processed data held at the Data Center 

Support the embargo of data based upon its associated proposal

Facilitate the operational management of the Data Center

Make processed DKIST data freely available to the public

Notify the principal investigator (PI) when their processed data becomes available

Optimize hardware utilization (processing/storage) to minimize cost

Support data delivery via network

Protect science data from loss

Support the securing of cyber systems through:

  • Capability to perform auditing

  • Capability to perform anomaly detection

  • Firewall & network security

Serve as the official and accepted repository of DKIST data

Support the modification and removal of data as directed by the Data Quality Assessment Committee (DQAC)

Support the user community with open-source code, documentation, and help desk interaction

Support the version tracking of processed science data 

Receive data collected on the summit for curation

Support raw data use by DKIST instrument scientists

Receive and ingest the Operations Tools data necessary for Data Center processing and search functions

Key Concept: Process Area to Composite Application Mapping

The "Composite Applications" (e.g., Summit Data Management, External Data Management, etc.) are the headings under which the software supporting Data Center processes is described from a system perspective.

 

 

Key Concept: Data Stores

Data is central to the processes and supporting software for the Data Center. The data stores of the Data Center are depicted below along with more detail about how they are factored, how they relate to each other, and the technologies which support them. 

 

Data Center Process Rules

Rule: Data Center user passwords are required to be "strong."
Justification: This is part of the security plan.

Rule: Raw search requests require approval.
Justification: Raw data is of questionable external use. The requestor must be deemed able to use the data in order to justify the use of Data Center (DC) resources to assemble the raw data for distribution. More rules about escalation are TBD.

Rule: Email addresses must be verified before an account can be used.
Justification: We need to know that we can contact a user.

Rule: For a user to be an Authorized Agent, they must have the approval of the DKIST Director. The authorization shall remain in effect until notification from the Data Center Scientist.
Justification: Operations activities may require the review of data (including proprietary data) to handle calibration and instrument performance issues. These issues would be handled by non-Data Center personnel, who would therefore be required to use the end-user facing functions to retrieve data.

Rule: DKIST Authorized Agents shall not use proprietary data for science.
Justification: It is the DKIST Authorized Agent's responsibility to know whether the data they are using for science is proprietary and, if it is, not to use it.

Rule: Proprietary data shall only be distributed externally to users when the following criteria are met:
- the embargo is currently "active"
- the user is an investigator or co-investigator on the proposal associated with the data
- the user is an Authorized Agent of the DKIST
Justification: Embargo statuses support PhD students in keeping their data private for a period of time (typically 12 months) so that others cannot interfere with their theses.

Rule: No science data younger than 6 months can be removed.
Justification: This is a requirement.

Rule: Processed Frames shall be proprietary based upon the proposal associated with a Processing Candidate. Artisanal (i.e., no candidate affiliation) Input Datasets are exempt from proposal embargo rules.
Justification: Input Datasets indicate the proposal for the observation's operational run. This is either inherited from the calibration association set's primary operational run or set by the calibration support team when constructing an Artisanal Input Dataset.

Rule: Before the removal or modification of failed science ingest data, the instrument scientist must be consulted.
Justification: Science data will potentially be discarded, so a scientist must sign off on the action.

Rule: Data ingested into the Data Center from non-Data Center sources must be provided through an authorized channel.
Justification: This is part of the security plan.

Rule: Processed data shall be made discoverable to all users, both registered and unregistered (this excludes proprietary data, whose distribution is defined in a previous rule).
Justification: This is part of the data policy.

Rule: Science Data distributed from the Data Center to external destinations must transit an authorized channel.
Justification: This is part of the security plan.
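The proprietary-data distribution rule and the six-month removal rule above can be illustrated with a minimal Python sketch. This is not the Data Center implementation: the class names, field names, and function names are hypothetical. The rule text lists three criteria without stating how they combine, so the sketch assumes the reading "the embargo is active AND the user is either an investigator on the proposal OR an Authorized Agent". It also assumes that "6 months" can be approximated as 183 days.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class User:
    """Hypothetical user record."""
    user_id: str
    is_authorized_agent: bool = False  # set only with DKIST Director approval


@dataclass(frozen=True)
class ProprietaryDataset:
    """Hypothetical view of a dataset under a proposal embargo."""
    proposal_id: str
    embargo_active: bool
    investigator_ids: frozenset = field(default_factory=frozenset)


def may_distribute(user: User, dataset: ProprietaryDataset) -> bool:
    """Return True if proprietary data may be sent to this user.

    ASSUMED reading of the rule: embargo active AND
    (investigator/co-investigator OR Authorized Agent).
    """
    if not dataset.embargo_active:
        return False
    is_investigator = user.user_id in dataset.investigator_ids
    return is_investigator or user.is_authorized_agent


# ASSUMPTION: "6 months" is approximated as 183 days.
MINIMUM_AGE_FOR_REMOVAL = timedelta(days=183)


def may_remove(created_at: datetime, now: datetime) -> bool:
    """Science data younger than the minimum age cannot be removed."""
    return now - created_at >= MINIMUM_AGE_FOR_REMOVAL
```

A caller would evaluate `may_distribute` before assembling a proprietary dataset for download, and `may_remove` before executing any removal request. The other preconditions for removal, such as DQAC direction, are outside this sketch.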

Data Center Process Roles and Responsibilities

Role

Responsibility

Strategy Area

Applicable Process

Calibration Support

Evaluate and approve or disapprove raw search requests

Ops Support

Service Desk Management - Approval

Calibration Support

Cancel Recipe Run

Science Data Processing

Process Scheduling 

Calibration Support

Create a dataset review ticket for superseded datasets

Science Data Processing

Reprocessing

Calibration Support

Create Input Datasets composed of input frames, calibration parameter values, and a primary proposal ID

Science Data Processing

Input Data Assembly

Calibration Support

Create Recipe Instances and Recipe Runs for reprocessing

Science Data Processing

Reprocessing

Calibration Support

Create Recipe Instances combining applicable Recipes and Input Datasets

Science Data Processing

Process Scheduling - Readiness Evaluation

Calibration Support

Create Recipe Runs based upon Recipe Instances for scheduling

Science Data Processing

Process Scheduling - Readiness Evaluation

Calibration Support

Create Recipe Specifications

Science Data Processing

Manage Recipes - Create

Calibration Support

Create work tickets for Calibration Associations that have no applicable Recipe

Science Data Processing

Process Scheduling - Readiness Evaluation

Calibration Support

Determine Recipe applicability for Calibration Associations that are ready

Science Data Processing

Process Scheduling - Readiness Evaluation

Calibration Support

Determine superseded datasets due to a planned reprocessing

Science Data Processing

Reprocessing

Calibration Support

Execute manual Recipe Runs

Science Data Processing

Manual Processing

Calibration Support

Identify Calibration Associations requiring Input Dataset assembly

Science Data Processing

Input Data Assembly

Calibration Support

Pause and activate resource scheduling

Science Data Processing

Process Scheduling 

Calibration Support

Update Recipe Specifications

Science Data Processing

Manage Recipes - Update

Calibration Support

Manage Science Data ingest failures

Summit Data Reception and Ingest

Ingest Failure Management

Data Center

Send data in a check-pointed manner from a virtual folder to an authorized user endpoint

Distribution

User Download

Data Center

Execute removal of proprietary proposals 

External Data Ingest

Ingest Proprietary Proposals

Data Center

Execute removal of proposal investigator links

External Data Ingest

Ingest Proposal Investigator Link

Data Center

Ingest new Calibration Association sets 

External Data Ingest

Ingest Data Acquisition Support Data

Data Center

Ingest new proprietary proposals

External Data Ingest

Ingest Proprietary Proposals

Data Center

Ingest new parameter values for existing parameters

External Data Ingest

Ingest Parameter Values

Data Center

Ingest new proposal investigator links

External Data Ingest

Ingest Proposal Investigator Link

Data Center

Ingest updated Calibration Association sets 

External Data Ingest

Ingest Data Acquisition Support Data

Data Center

Raise a ticket if a Calibration Association set being updated is already in use

External Data Ingest

Ingest Data Acquisition Support Data

Data Center

Assign Service Desk tickets to a group based upon ticket type

Ops Support

Service Desk Management - Assign

Data Center

Audit and record inventory-to-object accuracy

Ops Support

Data Holding Audit - Inventory to Object

Data Center

Audit and record object integrity

Ops Support

Data Holding Audit - Object Integrity

Data Center

Audit and record object-to-inventory accuracy

Ops Support

Data Holding Audit - Object to Inventory

Data Center

Generate aggregate events prior to removal of transactional events

Ops Support

Monitoring

Data Center

Generate alerts based upon event review rules

Ops Support

Monitoring

Data Center

Notify investigators when newly processed data is available, according to each user's preference (i.e., whether their digest setting is on or off)

Ops Support

Processed Data Notification

Data Center

Provide password reset instructions to valid email address requests

Ops Support

User Registration - Forgotten password

Data Center

Record email verification

Ops Support

User Registration - Forgotten password

Data Center

Regenerate lost Inventory Cards

Ops Support

Data Holding Audit - Object to Inventory

Data Center

Remove transactional monitoring events older than 1 year if space is needed

Ops Support

Monitoring

Data Center

Request email verification

Ops Support

User Registration - Forgotten password

Data Center

Retrieve monitoring info from other DC systems

Ops Support

Monitoring

Data Center

Route Service Desk tickets for approval

Ops Support

Service Desk Management - Create

Data Center

Create work tickets for Recipe Runs identified as "manual"

Science Data Processing

Process Scheduling 

Data Center

Execute automated Recipe Runs

Science Data Processing

Automated Processing

Data Center

Flag Calibration Associations as "ready" when all data has been received, or when 13 days from acquisition have passed, whichever comes first

Science Data Processing

Process Scheduling - Readiness Evaluation

Data Center

Generate a Dataset Inventory record containing aggregate metadata from the associated frames

Science Data Processing

Science Data Ingest

Data Center

Generate a Frame Inventory record containing header values

Science Data Processing

Science Data Ingest

Data Center

Incorporate process management header values into the ingested Frame

Science Data Processing

Science Data Ingest

Data Center

Ingest Processed Data Frames into the Object Store

Science Data Processing

Science Data Ingest

Data Center

Schedule resources for Recipe Runs identified as "manual"

Science Data Processing

Process Scheduling 

Data Center

Create Frame Inventory records for science data that have been ingested

Summit Data Reception and Ingest

Science Data Ingest

Data Center

Ingest Science Data 

Summit Data Reception and Ingest

Science Data Ingest

Data Center

Receive data from the Summit

Summit Data Reception and Ingest

Data Center Receipt

Data Center

Record expected Frame counts by Observing Program Run ID

Summit Data Reception and Ingest

Transfer Manifest Ingest

Data Center

Record receipt count of science data ingested by Observing Program Run ID

Summit Data Reception and Ingest

Science Data Ingest

Data Center

Retain Ancillary Data for at least 90 days

Summit Data Reception and Ingest

Ancillary Data Ingest

Data Center

Route data from the Summit to the appropriate ingest process

Summit Data Reception and Ingest

Data Center Receipt

Data Center Operations

Create and review system event reports

Ops Support

Monitoring

Data Center Operations

Create event alert rules

Ops Support

Monitoring

Data Center Operations

Create Help Tickets

Ops Support

Service Desk Management - Create

Data Center Operations