00 - Data Center

Related page | Link
High Level Architecture | 00 - High Level System Design


Scope:

The scope of Data Center processes and software is depicted in an operational view by the contents of the blue dashed box in the image below. Lines that cross the scope boundary are interfaces either to other systems, such as Ops Tools in orange and the Data Handling System for the telescope in maroon, or to the different user roles that interact with the Data Center system and processes. Each process box within the Data Center scope represents a Process Area that will be further decomposed in later sections. The interactions between these Process Areas are either direct (e.g., from Search to Distribution) or indirect, by way of exchanging data through the different data stores in the Data Center architecture.



Goals
• Provide an infrastructure capable of calibrating data received from the DKIST summit
• Deliver discovered data in well-defined and structured formats
• Enable the discovery of processed data held at the Data Center
• Support the embargo of data based upon proposal
• Facilitate the operational management of the Data Center
• Make processed DKIST data freely available to the public
• Notify PIs when their processed data becomes available
• Optimize hardware utilization (processing/storage) to minimize cost
• Support data delivery via network
• Protect science data from loss
• Support the securing of cyber systems through:
  • Capability to perform auditing
  • Capability to perform anomaly detection
  • Firewall & network security
• Serve as the official and accepted repository of DKIST data
• Support the modification and removal of data as directed by the Data Quality Assessment Committee (DQAC)
• Support the user community with open source code, documentation and help desk interaction
• Support the version tracking of processed science data
• Receive data collected on the summit for curation
• Support raw data use by DKIST instrument scientists
• Receive and ingest Operations Tools data necessary for Data Center processing and search functions

Key Concept: Process Area to Composite Application Mapping

The "Composite Applications" (e.g., Summit Data Management, External Data Management) are the headings under which the software supporting Data Center processes is described from a system perspective.



Key Concept: Data Stores

Data is central to the processes and supporting software for the Data Center. The data stores of the Data Center are depicted below along with more detail about how they are factored, how they relate to each other, and the technologies which support them. 


Data Center Process Rules

Rule | Justification
Data Center user passwords are required to be "strong." | This is part of the security plan.
Raw search requests require approval. | Raw data is of questionable external use. Requestor must be deemed able to use the data in order to justify the use of DC resources to assemble the raw data for distribution. More rules about escalation TBD.
Email addresses must be verified for the account to be used. | We need to know that we can contact a user.
For a user to be an Authorized Agent, they must have the approval of the DKIST Director. The authorization shall remain in effect until notification from Data Center Scientist. | Operations activities may require the review of data (including proprietary data) to handle calibration and instrument performance issues. These issues would be handled by non-Data Center personnel, who would therefore be required to use the end-user facing functions to retrieve data.
DKIST Authorized Agents shall not use proprietary data for science. | It is the DKIST Authorized Agent's responsibility to know whether the data they are using for science is proprietary, and if it is, to not use it.
Proprietary data shall only be distributed externally to users when the following criteria are met: the embargo is currently "active"; the user is an investigator or co-investigator on the proposal associated with the data; the user is an Authorized Agent of the DKIST. | Embargo statuses support PhD students in keeping their data private for a period of time (typically 12 months) so as to not allow interference with their theses.
No science data younger than 6 months can be removed. | This is a requirement.
Processed Frames shall be proprietary based upon the proposal associated with a Processing Candidate. Artisanal (i.e., no candidate affiliation) Input Datasets are exempt from proposal embargo rules. | Input Datasets indicate the proposal for the observation's operational run. This is either inherited from the calibration association set's primary operational run or set by the calibration support team when constructing an Artisanal Input Dataset.
Before the removal or modification of failed science ingest data, the instrument scientist must be consulted. | Science data will potentially be discarded, so a scientist must sign off on the action.
Data ingested into the Data Center from non-Data Center sources must be provided through an authorized channel. | This is part of the security plan.
Processed data shall be made discoverable to all users, both registered and unregistered (this excludes proprietary data, as its distribution has been defined differently in a previous rule). | This is part of the data policy.
Science Data distributed from the Data Center to external destinations must transit an authorized channel. | This is part of the security plan.
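
Two of the rules above (proprietary data distribution and the 6 month removal limit) lend themselves to a small illustration. The following is only a sketch under stated assumptions, not the Data Center implementation. The `User` and `Dataset` records and their field names are hypothetical. The sketch also assumes that data without an active embargo is open to all users, that a user qualifies for embargoed data by being either an investigator on the proposal or an Authorized Agent, that "6 months" can be approximated as 183 days, and that age is measured from a single `created` date.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "6 months" approximated as 183 days for illustration only.
MIN_REMOVAL_AGE = timedelta(days=183)


@dataclass(frozen=True)
class User:
    """Hypothetical user record; field names are illustrative."""
    user_id: str
    is_authorized_agent: bool = False


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset record; field names are illustrative."""
    proposal_id: str
    embargo_active: bool
    investigator_ids: frozenset  # investigators and co-investigators
    created: date  # assumption: the date from which "age" is measured


def may_distribute(user: User, dataset: Dataset) -> bool:
    """Return True if the dataset may be distributed externally to the user.

    Assumption: data with no active embargo is available to everyone;
    embargoed data requires an investigator OR an Authorized Agent.
    """
    if not dataset.embargo_active:
        return True
    is_investigator = user.user_id in dataset.investigator_ids
    return is_investigator or user.is_authorized_agent


def may_remove(dataset: Dataset, today: date) -> bool:
    """Return True if the dataset is old enough to be removed."""
    return today - dataset.created >= MIN_REMOVAL_AGE
```

Keeping both checks as pure functions of the record and the current date makes them straightforward to exercise against the rule table without touching any data store.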

Data Center Process Roles and Responsibilities

Role | Responsibility | Strategy Area | Applicable Process
Calibration Support | Evaluate, and Approve or Disapprove Raw Search requests | Ops Support | Service Desk Management - Approval
Calibration Support | Cancel Recipe Run | Science Data Processing | Process Scheduling
Calibration Support | Create a dataset review ticket for superseded datasets | Science Data Processing | Reprocessing
Calibration Support | Create Input Datasets comprised of input frames, calibration parameter values, and primary proposal ID | Science Data Processing | Input Data Assembly
Calibration Support | Create Recipe Instances and Recipe Runs for reprocessing | Science Data Processing | Reprocessing
Calibration Support | Create Recipe instances combining applicable Recipes and Input Datasets | Science Data Processing | Process Scheduling - Readiness Evaluation
Calibration Support | Create Recipe runs based upon Recipe Instances for scheduling | Science Data Processing | Process Scheduling - Readiness Evaluation
Calibration Support | Create Recipe Specifications | Science Data Processing | Manage Recipes - Create
Calibration Support | Create work tickets for Calibration Associations that have no applicable Recipe | Science Data Processing | Process Scheduling - Readiness Evaluation
Calibration Support | Determine Recipe applicability for Calibration Associations that are ready | Science Data Processing | Process Scheduling - Readiness Evaluation
Calibration Support | Determine superseded datasets due to a planned reprocessing | Science Data Processing | Reprocessing
Calibration Support | Execute manual Recipe Runs | Science Data Processing | Manual Processing
Calibration Support | Identify Calibration Associations requiring Input Dataset assembly | Science Data Processing | Input Data Assembly
Calibration Support | Pause and activate resource scheduling | Science Data Processing | Process Scheduling
Calibration Support | Update Recipe Specifications | Science Data Processing | Manage Recipes - Update
Calibration Support | Manage Science Data ingest failures | Summit Data Reception and Ingest | Ingest Failure Management
Data Center | Send data in a check-pointed manner from a virtual folder to an authorized user endpoint | Distribution | User Download
Data Center | Execute removal of proprietary proposals | External Data Ingest | Ingest Proprietary Proposals
Data Center | Execute removal of proposal investigator links | External Data Ingest | Ingest Proposal Investigator Link
Data Center | Ingest new Calibration Association sets | External Data Ingest | Ingest Data Acquisition Support Data
Data Center | Ingest new Proprietary proposals | External Data Ingest | Ingest Proprietary Proposals
Data Center | Ingest new parameter values for existing parameters | External Data Ingest | Ingest Parameter Values
Data Center | Ingest new proposal investigator links | External Data Ingest | Ingest Proposal Investigator Link
Data Center | Ingest updated Calibration Association sets | External Data Ingest | Ingest Data Acquisition Support Data
Data Center | Raise a ticket if a Calibration Association set being updated is already in use | External Data Ingest | Ingest Data Acquisition Support Data
Data Center | Assign Service Desk tickets to a group based upon ticket type | Ops Support | Service Desk Management - Assign
Data Center | Audit and record inventory to object accuracy | Ops Support | Data Holding Audit - Inventory to Object
Data Center | Audit and record object integrity | Ops Support | Data Holding Audit - Object Integrity
Data Center | Audit and record object to inventory accuracy | Ops Support | Data Holding Audit - Object to Inventory
Data Center | Generate aggregate events prior to removal of transactional events | Ops Support | Monitoring
Data Center | Generate alerts based upon event review rules | Ops Support | Monitoring
Data Center | Notify investigators when newly processed data is available based upon user preference (i.e., if their digest setting is on or off) | Ops Support | Processed Data Notification
Data Center | Provide password reset instructions to valid email address requests | Ops Support | User Registration - Forgotten password
Data Center | Record email verification | Ops Support | User Registration - Forgotten password
Data Center | Regenerate lost Inventory Cards | Ops Support | Data Holding Audit - Object to Inventory
Data Center | Remove transactional monitoring events greater than 1 year old if space is needed | Ops Support | Monitoring
Data Center | Request email verification | Ops Support | User Registration - Forgotten password
Data Center | Retrieve monitoring info from other DC systems | Ops Support | Monitoring
Data Center | Route Service Desk tickets for approval | Ops Support | Service Desk Management - Create
Data Center | Create work tickets for Recipe Runs identified as "manual" | Science Data Processing | Process Scheduling
Data Center | Execute automated Recipe Runs | Science Data Processing | Automated Processing
Data Center | Flag Calibration Associations as "ready" when all data has been received, or when 13 days from acquisition have passed, whichever comes first | Science Data Processing | Process Scheduling - Readiness Evaluation
Data Center | Generate a Dataset Inventory record containing aggregate metadata from the associated frames | Science Data Processing | Science Data Ingest
Data Center | Generate a Frame Inventory record containing header values | Science Data Processing | Science Data Ingest
Data Center | Incorporate process management header values into the ingested Frame | Science Data Processing | Science Data Ingest
Data Center | Ingest Processed Data Frames into the Object Store | Science Data Processing | Science Data Ingest
Data Center | Schedule resources for Recipe Runs identified as "manual" | Science Data Processing | Process Scheduling
Data Center | Create Frame Inventory records for science data that have been ingested | Summit Data Reception and Ingest | Science Data Ingest
Data Center | Ingest Science Data | Summit Data Reception and Ingest | Science Data Ingest
Data Center | Receive data from the Summit | Summit Data Reception and Ingest | Data Center Receipt
Data Center | Record expected Frame counts by Observing Program Run ID | Summit Data Reception and Ingest | Transfer Manifest Ingest
Data Center | Record receipt count of science data ingested by Observing Program Run ID | Summit Data Reception and Ingest | Science Data Ingest
Data Center | Retain Ancillary Data for at least 90 days | Summit Data Reception and Ingest | Ancillary Data Ingest
Data Center | Route data from the Summit to the appropriate ingest process | Summit Data Reception and Ingest | Data Center Receipt
Data Center Operations | Create and review system event reports | Ops Support | Monitoring
Data Center Operations | Create event alert rules | Ops Support | Monitoring
Data Center Operations | Create Help Tickets | Ops Support | Service Desk Management - Create
Data Center Operations | Ensure the embargo (proprietary) Update Rule is followed | Ops Support | User Authorization Management
Data Center Operations | Activate/deactivate user accounts | Ops Support | User Registration - Inactivate/Activate Users
Data Center Operations | Maintain service desk ticket status to reflect work status | Ops Support | Service Desk Management - Do Work
Data Center Operations | Notify requestors of ticket completion for help tickets | Ops Support | Service Desk Management - Do Work
Data Center Operations | Perform data deletion at the direction of DQAC data removal tickets | Ops Support | DQAC Reduction
Data Center Operations | Perform Raw Data searches as directed in approved Raw Data search tickets | Ops Support | Raw Search
Data Center Operations | Resolve open service desk tickets | Ops Support | Service Desk Management - Do Work
Data Center Operations | Review open service desk tickets | Ops Support | Service Desk Management
Data Center Operations | Review storage health monitoring information | Ops Support | Monitoring
Data Center Operations | Route system authorizations to System Admin | Ops Support | User Authorization Management
Data Center Operations | Stage Raw Data requested in Raw Data search tickets | Ops Support | Raw Search
Data Center Operations | Update embargo (proprietary) Agent Authorizations | Ops Support | User Authorization Management
Data Center Operations | Update user account information | Ops Support | User Registration - Update
Data Center Operations | Manage Frame Manifest Ingest Failures | Summit Data Reception and Ingest | Ingest Failure Management
Data Center Operations | Manage Summit Ingest categorization failures | Summit Data Reception and Ingest | Ingest Failure Management
DC Project Manager | Coordinate change management issues for Summit Ingest schema discrepancies | Summit Data Reception and Ingest | Ingest Failure Management
Registered User | Initiate transfer of data from an authorized data center endpoint | Distribution | User Download
Registered User | Retrieve documentation from dataset links | Ops Support | Code and Algorithm Document Distribution
Registered User | Search and retrieve code and tools | Ops Support | Code and Algorithm Document Distribution
Registered User | Search and retrieve documentation | Ops Support | Code and Algorithm Document Distribution
Registered User | Change password | Ops Support | User Registration - Update
Registered User | Create Help Ticket | Ops Support | Service Desk Management - Create
Registered User | Reset password | Ops Support | User Registration - Forgotten password
Registered User | Update user account information | Ops Support | User Registration - Update
Registered User | Create requests for Raw Data | Search | Search
Registered User | Request distribution | Search | Request Distribution
Registered User | Search for processed data | Search | Search
Registered User | Select datasets for download | Search | Review Results
Science | Create DQAC reduction tickets for datasets that need to be removed due to reprocessing | Science Data Processing | Reprocessing
Science | Review new/superseded datasets to determine data removal action(s) | Science Data Processing | Reprocessing
System Admin | Create, update and remove system level authorizations for DC users | Ops Support | User Authorization Management
Unregistered User | Search and retrieve code and tools | Ops Support | Code and Algorithm
Unregistered User | Search and retrieve documentation | Ops Support | Code and Algorithm
Unregistered User | Create Help Ticket | Ops Support | Service Desk Management - Create
Unregistered User | Enter info to create an account | Ops Support | User Registration - Create
Unregistered User | Verify email address provided | Ops Support | User Registration - Create
Unregistered User | Create requests for Raw Data | Search | Search
Unregistered User | Search for Processed data | Search | Search
Unregistered User | Select datasets for download | Search | Review Results
Authorized Agent | Determine embargo (proprietary) status of datasets in search results | Search | Review Results
DKIST Director | Approve Authorized Agent status permissions | Ops Support | User Authorization Management
Data Center Scientist | Trigger the removal of Authorized Agent permissions | Ops Support | User Authorization Management
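
The Readiness Evaluation responsibility in the table above (flag a Calibration Association as "ready" when all data has been received, or when 13 days from acquisition have passed, whichever comes first) can be sketched as a simple predicate. This is a minimal illustration, not the Data Center scheduler. The `CalibrationAssociation` record and its fields are hypothetical, and the sketch assumes that "all data has been received" can be judged by comparing the received Frame count against the expected Frame count recorded from the transfer manifest.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the table: 13 days from acquisition.
READINESS_TIMEOUT = timedelta(days=13)


@dataclass
class CalibrationAssociation:
    """Hypothetical record; field names are illustrative."""
    acquired_at: datetime
    expected_frames: int  # assumption: taken from the transfer manifest
    received_frames: int  # assumption: count of frames ingested so far


def is_ready(assoc: CalibrationAssociation, now: datetime) -> bool:
    """Return True if the association should be flagged as "ready".

    Ready when every expected frame has arrived, or when the timeout
    since acquisition has elapsed, whichever happens first.
    """
    all_received = assoc.received_frames >= assoc.expected_frames
    timed_out = now - assoc.acquired_at >= READINESS_TIMEOUT
    return all_received or timed_out
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the evaluation deterministic and easy to test at the 13 day boundary.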