OPS-DC-PLAN-020

DKIST Data Quality Metrics Plan

Davey, F. Watson, A. Derks

8 December, 2020

Table of Contents

1 Overview

2 Introduction

2.1 Purpose

2.2 Scope

2.2.1 Assumptions

2.2.2 Limitations

2.2.3 References

2.3 Philosophy

2.4 Strategy

2.5 Definitions

3 Quality Assurance Framework

3.1 Quality Management

3.2 Quality Processes

3.2.1 Quality Assurance of DC Infrastructure

3.2.2 Quality Assurance of Data Pipelines

3.2.3 Dashboard

3.3 Data Quality Starting Point

3.4 Responsibilities

3.4.1 DC Staff

3.4.2 Science Operations Staff

3.4.3 Data Center Scientists

3.4.4 NSO and Community Scientists

4 Quality Metrics and Processes

4.1 L0 Data Verifications

4.1.1 FITS Header Conformance with SPEC-0122

4.1.2 Checksums

4.2 L1 Data Processing

4.2.1 Frame Counts

4.2.2 Fried Parameter

4.2.3 Light Level

4.2.4 Average Value Across Frame

4.2.5 Root Mean Square (RMS) Across Frame

4.2.6 Average Value of a L1 Dataset

4.2.7 RMS of Series Average

4.2.8 Noise

4.2.9 Range Checking

4.2.10 Warning Count

4.2.11 Data Source Health

4.2.12 Adaptive Optics (AO) Status

4.2.13 Polarization Characteristics

4.2.14 Historical Comparisons

5 Data Quality Evolution

Overview

This document identifies the strategies, processes, management, responsibilities, and metrics related to the assurance of DKIST data quality within the Data Center scope of operations. The DKIST Data Center comprises both the hardware and software systems designed and built to meet the DC Science Requirements (AD[01]) and the DC system requirements (AD[02]). This document details the data quality framework within which the DC will operate, and includes assumptions made, limitations on the DC pertaining to data quality, and the evolution of the framework going forward.

 

Introduction

Purpose

The DKIST Data Center (DC) is responsible for the processes required to manage and maintain DKIST data throughout its lifecycle, and for the calibration of the raw science data to a Level 1 product, suitable for distribution to the science community. In addition, the data center is responsible for monitoring, assessing, and improving the calibration processes as it gains experience with the instruments and data, so as to produce the best possible quality of the calibrated Level 1 data product.

 

The purpose of this document is to describe all the processes and metrics that the DC will use to assess and assure the quality of the DKIST Level 1 calibrated data. The details of how these metrics will be calculated within the calibration pipelines will be placed on the Data Center Read the Docs (RTD) documentation page, which will also include calibration algorithm documentation; its link is TBD.

 

Scope

The scope of this document is limited to quality processes within the DC. The DC has no control over the quality of the data generated at the summit, nor over the conditions under which such data is generated.

Assumptions

The following assumptions directly impact the processes described within this document.

  1. The DC science data processing team is small (4 FTE) and is assumed to remain small for the foreseeable future.

  2. The algorithms used as the starting point for the pipelines will have been vetted through the verification process.

  3. All of the data required by the algorithms will be delivered via the DHS and Ops tools interfaces.

Limitations

The quality processes described within this document will be implemented after the DKIST commissioning process is completed. Implementing the processes at the same time as creating the pipelines from verification codes and verifying the end-to-end system would overwhelm the small DC staff.

 

References

 

Ref       Title
AD[01]    OPS-DC-SPEC-001 "Data Center Science Requirements"
AD[02]    Data Center System Requirements
AD[03]    OPS-DC-PLAN-007 "Data Center Test and Commissioning Plan"
RTD       Read the Docs: DC calibration pipeline documentation

 

 

Philosophy

It is the goal and the intention of the DKIST DC to generate pipelines that will automatically calibrate L0 DKIST data as soon as it is viable for calibration. Given that the DC is responsible for the quality of the algorithms and codes that produce the L1 data, it follows that quality control should be built into the pipelines themselves. Traditionally, solar scientists have assessed data quality through visual inspection, noise levels, looking for fringes, and so on. Given the enormous volume of data and images that the DC will be processing, and the fact that staffing at the DC will remain small, visual examination of all calibrated data in full operation is not feasible.

 

In light of this, the DC has opted for a philosophically different approach to assessing DKIST data quality. The approach we have taken is to automatically generate metrics within the calibration pipelines that can act as indicators of quality across individual frames or whole datasets. Subsets of these metrics will be used by the processing pipelines to alter the flow of the pipeline, by users as quality metadata, and by DC staff to monitor trends as well as sudden or unexpected changes in the data.

Strategy

As a starting point for the Quality Control process, the DC staff convened a group of solar scientists at the NSO for a two-day workshop whose sole purpose was to propose and define quality metrics that could and should be collected or generated. The outputs of this workshop were:

  • A set of quality metrics to be produced routinely as part of the calibration process

  • Expected usage of the quality metrics

 

That set of metrics forms the backbone of this document, which details what the metrics are, how they will be computed, how they will be used, and what they can tell users, DC staff, and science staff about the data that DKIST is producing.

 

Definitions

  • L1 dataset: A single L1 dataset is the collection of calibrated data resulting from a single calibration run where the observations were gathered from one Instrument Program using a unique set of Data Set Parameters.

  • Spatial step: A single spatial position within a map (ViSP, Cryo-NIRSP, DL-NIRSP).

  • Spectral step: A single spectral position within a spectrum (VTF).

  • Spatial scan: A collection of spatial steps put together.

  • Spectral scan: A collection of spectral steps put together.

  • Map scan: A cube-like structure made up of multiple spatial or spectral scans, spanning three dimensions: two spatial and one spectral.

Quality Assurance Framework

Quality Control (QC), in the context of the Data Center, is not a single step in the calibration pipeline but rather a continuous process that begins when the DC ingests data from the telescope and carries on throughout the calibration process to the creation of Level 1 data products.

 

The DC quality framework comprises four components:

  • The DC Staff

  • The Calibration Pipelines

  • The NSO scientists

  • The solar community at large

Each of these components has a role to play in quality assurance as described in the following sections.

 

Quality Management

The Data Center Quality processes are managed by the DC Project Manager as part of his/her routine duties. Quality processes are adhered to by all DC team members, and quality issues are discussed and resolved during daily stand-ups and, more formally, during biweekly sprint preparation meetings. There are no standing quality-related meetings, but one or more may be scheduled if a quality issue arises that requires more in-depth discussion and action. Issues that significantly affect the quality of the published data will take the highest priority in tasking during sprint planning meetings. Quality issues that require more extensive scientific or instrument knowledge for resolution will be assigned to the DC scientist(s) for investigation and resolution.

Quality Processes

Quality Assurance of DC Infrastructure

The DC infrastructure is the system that has been built to provide the means by which raw L0 data may be processed into calibrated (L1) data. The infrastructure includes all the hardware and software systems that ingest, verify, catalog, store, transfer, and stage raw L0 data, schedule its processing to L1, and make the calibrated L1 data publicly accessible.

 

Quality assurance of the DC infrastructure is described in the Data Center Test and Commissioning Plan AD[03] and consists of continuous and automated testing of all the services and components that make up the infrastructure. The question asked of the QA process in this case is: are all the services working correctly in concert so that processing of data from L0 to L1 can begin? This is a yes/no question that is conceptually easy to answer as long as the system and services have been designed and implemented properly.

 

Quality Assurance of Data Pipelines

Data processing pipelines are the components of the DC that pull in data, metadata, and ancillary data required to calibrate raw L0 data to L1 data. The pipelines vary by instrument, wavelength, mode, algorithm, and by other aspects of the data as required. Given the numerous combinations of modes, wavelengths, instruments, algorithms, and parameters that are possible, it follows that an indeterminate number of pipelines may be used at any time in the life of the DKIST project.

 

When the first set of pipelines is constructed by the DC staff, they will reflect the algorithms and codes that were used to perform verification of the instruments. In light of this, the first pipelines will be tested to verify that the pipelines’ results are similar to, if not exactly the same as, the outputs of the verification codes, given the same inputs. As the pipelines evolve, and change with algorithmic updates and new techniques, testing of the pipelines will require a set of defined inputs and outputs that may or may not simulate real solar data. In the absence of expected output values when testing with real solar data, improvements to the quality metrics generated by previous algorithms may be used as a proxy in assessing efficacy of new algorithms or improvements.

 

Dashboard

For those metrics that the Data Center will monitor over the short and long term, the DC will generate a dashboard that, at a glance, allows DC personnel to discern trends within the data. DC staff will monitor, assess, and react to these trends as they appear, correcting where possible any trend that may adversely affect data quality.

Data Quality Starting Point

As noted in paragraph 2.4, the DKIST DC and scientists held a workshop to determine the best candidates for quality metrics. These metrics will therefore be the starting set of quantifiable quality measures applied to the data. It is, however, unknown whether any or all of these measures will prove useful in the long term, and it is equally unknown what other metrics may be developed as a consequence of working with the data on a regular basis. Assessment of the usability and utility of the quality metrics will therefore occur on a continual basis in order to assure continued and steady improvement of data quality.

Responsibilities

Quality Control is not an independent process within the DC pipelines, but one that is highly dependent upon the entire DKIST system working in an integrated manner to produce high quality data from beginning to end. This includes short- and long-term monitoring, for which enough metadata must be captured to examine trends in quality. As an example, an increase in average dark current over time may point to long-term changes in the camera itself.

DC staff will also rely on NSO scientists, as users of the data, as well as the larger solar science community (through help desk tickets, workshops, and user community meetings) to identify any issues that they may come across as they download and utilize data from the DC data stores. DKIST responsibilities relating to quality are defined in the following paragraphs.

 

DC Staff

The DC staff is ultimately responsible for producing the highest quality L1 data possible within the constraints of input data quality and the state of the art of calibration algorithms. Given that L1 data quality is in many ways subjective, the DC staff is responsible for the following activities:

  • Assess the usability of L0 data that did not pass ingestion inspections and, if possible, take remedial action as necessary to make the data usable.

  • Design, implement, test, and monitor the Data Center infrastructure that makes it possible to produce L1 data from L0 data generated by the telescope systems.

  • Create, implement, and test calibration algorithms that process L0 data to L1 data.

  • Create, implement, and monitor objective measures by which to gauge success in providing quality data.

  • Monitor trends and issues in both the infrastructure and the quality measures to proactively and preemptively prevent systemic reductions in the quality of data generated by the DC.

  • Visually monitor a subset of the data generated by the DC to assess quality in a more subjective manner and to assure that the quantitative measures correlate well with the subjective level of quality.

  • Work with the NSO science staff as well as external scientists on ideas to improve the quality of the data through new or improved calibration algorithms, new and/or different measures of quality, and improvements in the usage of quality measures.

  • Alert the Science Operations team to any pressing L0 data issues that the team uncovers from L0 data ingest verification and/or trend analyses.

Science Operations Staff

The science operations staff is responsible for the upstream tasking, logistics, and scheduling that ultimately result in raw data generation by the telescope.

Data Center Scientists

With the delivery of the prototype pipelines and algorithm documents from the construction project to the data center, responsibility for implementation and further development of the prototype and additional pipelines transfers to the data center. The Data Center Scientist(s) are responsible for the scientific integrity of data center products. Calibration Engineers and Scientific Programmers have a strong science background and will support the DC Scientist(s). The DC Scientist(s) will spot-check L1 datasets, verify data quality, and address any issues discovered. The DC Scientist(s) will collect and process feedback from NSO and Community Scientists, primarily but not exclusively submitted via the helpdesk. The data center scientists will notify the instrumentation program scientist of potential instrument issues that impact L0 and L1 data quality.

NSO and Community Scientists

NSO scientists will work with the calibrated data generated by the DC as part of their personal research activities. Like general users, NSO scientists will perform research with the data they have access to, on their individual schedules. Users, including those at NSO, will identify and communicate data quality issues using the helpdesk. NSO scientists may, in addition to using the helpdesk (for tracking), communicate issues directly to the responsible DC scientist(s).

If an NSO scientist has the specific expertise to address a particular quality issue, he/she may be assigned by their supervisor to consult with the data center staff, as other commitments allow, on resolving the issue. Similarly, as assigned, NSO science staff will consult with the DC science programming and calibration team on ideas to improve the quality of the data through new or improved calibration algorithms, new and/or different metadata relating to L0 data, and improvements in the usage of quality measures. NSO staff, in particular DKIST Science staff, will flag potential instrument issues and communicate them to the responsible Program Scientist.

Quality Metrics and Processes

The quality measures developed through the efforts of quality workshop attendees are described in the following paragraphs.

 

L0 Data Verifications

FITS Header Conformance with SPEC-0122

During the L0 data ingest process, the DC verifies that all required keywords in the FITS headers are present and of the correct type.
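
As an illustration, a minimal sketch of such a check in Python using astropy; the REQUIRED_KEYWORDS mapping is a hypothetical stand-in for the SPEC-0122 keyword tables, and the production ingest code may differ.

    from astropy.io import fits

    # Hypothetical stand-in for the SPEC-0122 required-keyword tables:
    # keyword name -> expected Python type of the header value.
    REQUIRED_KEYWORDS = {"NAXIS": int, "DATE-OBS": str, "INSTRUME": str}

    def verify_l0_header(path):
        """Return a list of SPEC-0122 conformance failures for one L0 file."""
        failures = []
        with fits.open(path) as hdul:
            header = hdul[0].header
            for keyword, expected_type in REQUIRED_KEYWORDS.items():
                if keyword not in header:
                    failures.append(f"{keyword}: missing")
                elif not isinstance(header[keyword], expected_type):
                    failures.append(f"{keyword}: expected {expected_type.__name__}")
        return failures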

Checksums

During the L0 data ingest process, the DC verifies that the checksums included in the FITS header, if present, match the calculated checksums. If checksums are not present in the header, the calculated checksum is added to the header.
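
A sketch of this step using astropy, whose HDU objects provide verify_checksum() (returning 0 for a mismatch, 1 for a match, and 2 when no CHECKSUM keyword is present) and add_checksum(); the real ingest service may implement the same logic differently.

    from astropy.io import fits

    def verify_or_add_checksums(path):
        """Verify FITS checksums where present; add them where absent."""
        with fits.open(path, mode="update") as hdul:
            for index, hdu in enumerate(hdul):
                status = hdu.verify_checksum()  # 0 = mismatch, 1 = ok, 2 = absent
                if status == 0:
                    raise ValueError(f"HDU {index} of {path}: checksum mismatch")
                if status == 2:
                    hdu.add_checksum()  # CHECKSUM and DATASUM written on close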

 

L1 Data Processing

Prior to commencing a calibration run, all of the L0 data, parameters, and metadata required to complete the run are verified as being present, of the correct type, and within specified ranges. If any of the necessary data are missing, the calibration run will not be attempted.

 

Once the calibration run has been initiated, the following metrics are computed as part of the calibration run.

 

Frame Counts

Description: This metric is a count of the number of frames used to produce a calibrated L1 dataset, split by instrument program task type.

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: All

 

Units: Number of frames.

 

Source Data Needed: Number of frame inventory records.

 

Source(s): Frame inventory table, object store.

 

Metric Processing:

  1. Determine type of frame from header.

  2. Count number of each type of frame.

  3. Determine frames to be used for calibration.

  4. For each type of frame, compare the number of raw files with the number of files used for calibration.

  5. Raise a warning if the number of files used for calibration is less than 50% of the raw files. Note: a calibration run will not be started only if there are no files suitable for calibration. (A sketch of this check follows the list.)
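
A minimal sketch of steps 2-5, assuming the task-type strings have already been read from the frame headers (step 1):

    from collections import Counter

    def frame_count_warnings(raw_task_types, used_task_types):
        """Warn per task type when under half of the raw frames are used."""
        raw = Counter(raw_task_types)    # one task-type string per raw frame
        used = Counter(used_task_types)  # one per frame selected for calibration
        warnings = []
        for task, n_raw in raw.items():
            n_used = used.get(task, 0)
            if n_used < 0.5 * n_raw:
                percent_unused = 100 * (n_raw - n_used) / n_raw
                warnings.append(f"{percent_unused:.0f}% of {task} frames not used")
        return warnings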

 

Metric Utility: The fewer frames used for calibration, the more noise in the resulting output images. If no files are suitable for calibration (i.e., the calibration run is not started), the DC staff will intervene and determine whether frames from a different observing day can be used.

 

Metric Output: Quality report.

 

Metric Output Type: Table

Example:

 

Task type    # raw files    # files used    Warnings
Observe      117265         117265
Dark         1348           1345
Flat         1562           602             62% of flat frames not used
Gain         2106           2106

 

Fried Parameter

Description: This metric quantifies the stability of the atmosphere during an observation; atmospheric turbulence directly impacts data quality through a phenomenon known as atmospheric seeing.

 

Applicability:

  • Instruments: VBI, ViSP, DL-NIRSP, VTF

  • Instrument program task types: All

 

Variable: r0

 

Units: Centimeters

 

Source Data Needed: Header keyword: HOAOFriedParameter (AO___001)

 

Source(s): AO___nnn header table, frame headers

 

Metric Processing:

  1. Read and store Fried Parameter from each frame header.

  2. Calculate and store the L1 dataset average Fried Parameter.

  3. Calculate and store standard deviation.
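
A minimal sketch of these three steps, using the SPEC-0122 identifier AO___001 as the header lookup key purely for illustration:

    import numpy as np

    def fried_parameter_summary(headers):
        """Average and standard deviation of r0 (cm) over an L1 dataset."""
        r0 = np.array([float(h["AO___001"]) for h in headers])
        return r0.mean(), r0.std()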

 

Metric Utility: A high Fried Parameter indicates better seeing at the time of observation. A large standard deviation indicates a large variability of seeing during data acquisition.

 

Metric Output: L1 frame headers (individual Fried Parameter), L1 dataset inventory record, quality report.

 

Metric Output Type:

a. Plot of Fried parameter vs time.

b. Number – average of Fried parameters for L1 dataset.

c. Number – standard deviation of Fried Parameters for L1 dataset.

 

Example: Average Fried Parameter for L1 dataset: 15.0 ± 0.22 cm.

 

Light Level

Description: This metric describes the value of the telescope light level at the start of data acquisition of each frame.

 

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: Observe

 

Units: Analog digital units

 

Source Data Needed: Header keyword: LightLevel (DKIST010)

 

Source(s): DKISTnnn header table, frame headers

 

Metric Processing:

  1. Read light level from each frame header.

  2. Normalize by exposure time.

  3. Calculate and store the L1 dataset average light level.

  4. Calculate and store standard deviation.
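
A sketch of the same pattern with the exposure-time normalization of step 2; DKIST010 is the SPEC-0122 identifier for LightLevel, and TEXPOSUR as the exposure-time keyword is an assumption made for illustration.

    import numpy as np

    def light_level_summary(headers):
        """Exposure-normalized average and standard deviation of light level."""
        levels = np.array([float(h["DKIST010"]) / float(h["TEXPOSUR"]) for h in headers])
        return levels.mean(), levels.std()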

 

Metric Utility: The light level can show a user many things: whether a cloud crossed the observing path during an observation, or whether a flare or ejection occurred and in which frames. The light level can also be useful to NSO; a low light level could indicate that one of DKIST’s mirrors needs cleaning, or that there is an issue with one of the cameras.

 

Metric Output: L1 frame headers (individual frame light level), L1 dataset inventory record, quality report.

 

Metric Output Types

a. Plot of light level counts vs time.

b. Number – average of light level counts for L1 dataset.

c. Number – standard deviation of light level counts for L1 dataset.

 

Example: L1 dataset average light level: 10 ± 1.18 analog digital units.

 

Average Value Across Frame

Description: This metric is the calculated average intensity value across a single frame, for each frame of each instrument program task type. Note: this metric may be applied to only a subsection of a frame if only that subsection is useful (e.g. polarimetric beams, areas unusable due to CCD errors, etc.). See RTD for mathematical specifications of this metric.

 

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: Observe, Dark, Gain

 

Units: Analog digital units

 

Source Data Needed: Analog digital units

 

Source(s): Raw frames

 

Metric Processing:

  1. Calculate average number of digital units per pixel in the frame for all frames prior to any processing being run.

  2. Normalize by exposure time.

  3. Calculate interquartile range and standard deviation across L1 dataset.

  4. Raise a warning if a frame has an average value outside of the interquartile range of the respective instrument program task type data.
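
A minimal sketch of steps 1-4, assuming each frame is a NumPy array and reading “outside the interquartile range” literally (below the first or above the third quartile); the RTD may define a wider fence.

    import numpy as np

    def average_value_outliers(frames, exposure_times):
        """Indices of frames whose normalized mean lies outside the IQR."""
        means = np.array([f.mean() / t for f, t in zip(frames, exposure_times)])
        q1, q3 = np.percentile(means, [25, 75])
        return np.where((means < q1) | (means > q3))[0]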

 

Metric Utility: If a single calibration frame has an average value that falls outside of the interquartile range for the respective instrument program task type, then that frame is considered an outlier and could be unreliable and unusable. If a single observation frame has an average value that falls outside of the interquartile range of all observe frames, that could indicate an anomaly within that frame.

 

Metric Output: Quality report.

 

Metric Output Type: Plot only if a warning is issued. (Plot will include statistical metrics showing interquartile range, and outliers.)

 

Example: WARNING: The average values of frames 2, 98, 101, 305, and 476 fall outside of the interquartile range of all observe frames in this L1 dataset.

Root Mean Square (RMS) Across Frame

Description: This metric quantifies the uncertainty in a single frame via the RMS across that frame, for each frame of each instrument program task type. See RTD for mathematical specifications of this metric.
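
The RTD holds the authoritative definition; a standard exposure-normalized form, given here as an assumption, is

    \mathrm{RMS} = \frac{1}{t_{\mathrm{exp}}} \sqrt{ \frac{1}{N} \sum_{i=1}^{N} p_i^2 }

where the p_i are the N pixel values of the frame and t_exp is its exposure time.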

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: Observe, Dark, Gain

 

Units: Analog digital units

 

Source Data Needed: Analog digital units

 

Source(s): Raw frames

Metric Processing:

  1. Calculate RMS across each frame prior to any processing being done. (See RTD)

  2. Normalize by exposure time.

  3. Calculate interquartile range and standard deviation across L1 dataset.

  4. Raise a warning if a single frame within a respective observing program has an RMS value outside of the interquartile range of the respective instrument program task type data.

Metric Utility: If a frame has a much higher or lower RMS value, this could indicate that there is bad data in that frame.

 

Metric Output: Quality report.

 

Metric Output Type: Plot only if a warning is issued. (Plot will include statistical metrics showing interquartile range, and outliers.)

 

Example: WARNING: The RMS values of frames 2, 6, 230, and 301 fall outside of the interquartile range of all dark frames in this L1 dataset.

Average Value of a L1 Dataset

Description: This metric is the calculated mean intensity value across an entire L1 dataset; that is, the average of all per-frame average values for an instrument program task type contained in one L1 dataset. See RTD for mathematical specifications of this metric.

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: Observe, Dark, Gain

 

Units: Analog digital units

 

Source Data Needed: Analog digital units

 

Source(s): Raw frames

 

Metric Processing:

  1. For each instrument program task type, obtain average value across a frame for each frame used during data processing. (See RTD)

  2. Average these values.

  3. Compare this value with the averages of the respective task type from past L1 datasets stored in the DC. Raise a warning if the value is at least 3 sigma higher or lower than those historical values. (A sketch of this check follows the list.)
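
A minimal sketch of the 3-sigma comparison in step 3; the same pattern applies to the RMS of Series Average and Historical Comparisons checks later in this section.

    import numpy as np

    def three_sigma_warning(new_value, historical_values, label):
        """Return a warning string when new_value is >= 3 sigma from history."""
        history = np.asarray(historical_values, dtype=float)
        mean, sigma = history.mean(), history.std()
        if abs(new_value - mean) >= 3 * sigma:
            direction = "higher" if new_value > mean else "lower"
            return f"WARNING: {label}: {new_value:.1f} is at least 3 sigma {direction} than past values"
        return None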

 

Metric Utility: This metric will be used to determine the goodness of an L1 dataset. That is, if the average value of any calibration frames within an L1 dataset is at least 3 sigma higher or lower than in other L1 datasets, then that series may contain many outliers and may not be reliable. It is more difficult to determine the goodness of observe frames using this metric, due to the Sun’s variability over the course of a day, week, month, etc.

 

Metric Output: Quality report

 

Metric Output Type: Number

 

Example: WARNING: Average value of DL-NIRSP dark series: 42 ± 3.4 analog digital units. This is 3 sigma higher than past DL-NIRSP dark series averages.

 

RMS of Series Average

Description: This metric quantifies the uncertainty in an entire L1 dataset via the RMS of counts across that dataset. See RTD for mathematical specifications of this metric.

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: Observe, Dark, Gain

Units: Analog digital units

 

Source Data Needed: Analog digital units

 

Source(s): Raw frames

 

Metric Processing:

  1. For each instrument program task type, obtain the RMS across a frame for each frame used during data processing. (See RTD)

  2. Average these values.

  3. Raise a warning if the RMS of an L1 dataset is at least 3 sigma higher or lower than that of nominal L1 datasets.

 

Metric Utility: If a warning is raised, it could indicate that an L1 dataset has a high standard deviation.

 

Metric Output: Quality report.

 

Metric Output Type: Number

 

Example: RMS of DL-NIRSP observe series: 15.2 analog digital units. This is 4 sigma higher than similar L1 datasets.

 

Noise

Description: This metric measures the value of the noise over time. See RTD for mathematical specifications of this metric.

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: Calibrated frames

 

Variable:

 

Units: Analog digital units

 

Source Data Needed: Analog digital units

 

Source(s): Calibrated frames

 

Metric Processing:

  1. For all calibrated image frames, calculate the noise value. (See RTD)

  2. Normalize by exposure time.

  3. Average these values to find the average noise across a L1 dataset.

  4. Calculate standard deviation over L1 dataset.

 

Metric Utility: If the noise of a frame is high, then the images can be fuzzy, distorted, unclear, and difficult to extract a signal from.

 

Metric Output: L1 frame headers (individual frames), L1 dataset inventory record, quality report.

 

Metric Output Type: Number, plot

 

Example: Average RMS noise value of DL-NIRSP observe series: 0.1 ± 0.005 analog digital units.

 

 

Range Checking

Description: This metric checks that certain input and calculated parameters fall within a valid data range. This metric will be used extensively in the Data Center across many processes but will only be reported to users in the event of a parameter moving outside of a valid range. (These ranges will be determined by the DC staff using initial DKIST L1 datasets and will be refined as more data is acquired.)

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: All

 

Units: None

 

Source Data Needed: Raw frames.

 

Source(s): Output of various calibration steps.

 

Metric Processing: Range checking will be applied at multiple steps of the calibration process.

  1. Determine range of parameter.

  2. Compare this range against the stored historical range, statistical count range, and positive/negative value range for the same parameter.

  3. Raise a warning if a parameter or metric falls outside of a determined range.
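
A sketch of the check itself; the parameter names and ranges shown are hypothetical placeholders for the ranges that DC staff will determine from initial DKIST L1 datasets.

    import warnings

    # Hypothetical placeholder ranges, to be refined from early DKIST data.
    VALID_RANGES = {
        "fried_parameter_cm": (0.0, 40.0),
        "light_level_adu": (0.0, 65535.0),
    }

    def check_range(name, value):
        """Warn, rather than fail, when a parameter leaves its valid range."""
        low, high = VALID_RANGES[name]
        if not low <= value <= high:
            warnings.warn(f"{name} = {value} outside valid range [{low}, {high}]")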

 

Metric Utility: If a parameter falls outside of a range, this could mean a number of things; it could be bad data, but it could also be a valid solar anomaly. In some cases, if a warning is raised for a parameter that is vital, the processing code could be stopped and the DC staff will intervene.

 

Metric Output: Quality report.

 

Metric Output Type: Warning

 

 

Warning Count

Description: This metric measures how many warnings were raised during the calibration process.

 

Applicability: Calibration runs.

Units: None

 

Source Data Needed: Warnings raised.

 

Source(s): Other metrics.

 

Metric Processing:

  1. Collect all warnings raised during processing.

  2. Make a count of all warnings.

 

Metric Utility: The number of warnings raised can indicate to a user whether a L1 dataset is “good” or “bad”. If a large number of warnings were raised, then it is possible that there were many problems during processing and the data could be “bad”. However, determining whether data is good or bad is up to the user.

 

Metric Output: Quality report.

 

Metric Output Type: Statement(s) displayed at the beginning of a quality check report

 

Example: 32 warnings raised

 

 

Data Source Health

Description: This metric contains the worst health status of the data source during data acquisition. Possible values are Good, Ill, or Bad.

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: All

Units: None

 

Source Data Needed: Header keyword: DataSourceHealthStatus (DKIST006)

 

Source(s): DKISTnnn header table, frame headers

 

Metric Processing:

  1. Read data source health keyword from each frame header for entire L1 dataset.

  2. Sort and store, in a list, the number of frames in the L1 dataset assigned to each of the health status options.
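
A sketch of the tally, using the SPEC-0122 identifier DKIST006 as the header lookup key purely for illustration:

    from collections import Counter

    def health_status_counts(headers):
        """Number of frames at each health status over an L1 dataset."""
        tally = Counter(h["DKIST006"] for h in headers)
        return {status: tally.get(status, 0) for status in ("GOOD", "ILL", "BAD")}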

 

Metric Utility: This metric will be used to determine the health status of the instrument during data acquisition.

 

Metric Output: L1 headers (individual frames), quality report.

 

Metric Output Type: Table

 

Example:

Status    # frames
GOOD      126527
ILL       11
BAD       0

 

Adaptive Optics (AO) Status

Description: This metric shows whether the adaptive optics system was running at the time of observation.

 

Applicability:

  • Instruments: VBI, ViSP, DL-NIRSP, VTF

  • Instrument program task types: Observe

Units: None

 

Source Data Needed: Header keywords: Any AO___nnn keywords

 

Source(s): AO___nnn header table, frame headers

 

 

Metric Processing:

  1. Look for the existence of the AO___nnn header table in each observe frame header for the entire L1 dataset.

  2. For a single frame, if the AO___nnn table does exist, mark frame as AO running.

  3. Calculate percentage of observe frames for which AO system was active.
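
A sketch of these steps, treating the presence of any AO___nnn keyword in a frame header as “AO running”:

    def ao_running_percentage(observe_headers):
        """Percentage of observe frames acquired with the AO system running."""
        running = sum(
            any(keyword.startswith("AO___") for keyword in header)
            for header in observe_headers
        )
        return 100.0 * running / len(observe_headers)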

 

Metric Utility: This metric is used to show users which frames had atmospheric correction applied.

 

Metric Output: L1 headers (individual frames), quality report.

 

Metric Output Type: Percentage of observe frames for which the AO system was running.

 

Example: Adaptive Optics running for 82% of observe frames.

 

Polarization Characteristics

Description: This metric shows the polarization characteristics as calculated from the RMS noise and estimate of sensitivity of the polarimetric data during data processing. See RTD for mathematical specifications of this metric.

 

Applicability:

  • Instruments: ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: Observe

Units: None (RMS noise), Analog digital units (estimate of sensitivity)

 

Source Data Needed: Stokes parameters.

 

Source(s): Reduced polarimetric data.

 

Metric Processing:

  1. Extract I, Q, U, and V Stokes parameters from the data.

  2. Calculate RMS noise (See RTD):

    1. For each spatial/spectral step or spatial/spectral scan, calculate the RMS noise of the polarimetric data using the equation given in the RTD.

    2. Average all RMS noise values and return one value to be the Stokes Q RMS noise of this map scan.

    3. Repeat for Stokes U and V.

    4. Repeat for all map scans made.

  3. Estimate the sensitivity (See RTD):

    1. Estimate the sensitivity of an observation using photon statistics, by calculating the smallest signal that can be measured using the equation given in the RTD, where max_intensity represents the largest Stokes I intensity value in a single reduced observe frame over all spatial/spectral steps or spatial/spectral scans.

    2. Repeat for all scans.

A sketch of plausible forms for both equations follows this list.
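
The RTD holds the authoritative equations. Plausible forms, stated here purely as assumptions, are an RMS of the intensity-normalized Stokes component over the N pixels of a step,

    \mathrm{RMS}_Q = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{Q_i}{I_i} \right)^2 }

with analogous expressions for Stokes U and V, and a photon-statistics estimate of the smallest measurable signal,

    s \approx \frac{\sqrt{\mathrm{max\_intensity}}}{\mathrm{max\_intensity}} = \frac{1}{\sqrt{\mathrm{max\_intensity}}}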

 

Metric Utility: If an observation has a high RMS noise value, then it could contain inaccurate data. If an observation has a low estimate of sensitivity, then the data could be very clear and precise, detecting very small polarization signals.

 

Metric Output: Quality report.

 

Metric Output Type: Number

 

Example:

Cryo-NIRSP scan 13 statistics:

RMS Noise: 0.016 digital units

Estimate of sensitivity: 0.04 digital units

Historical Comparisons

Description: Over time, the data center will be comparing some of the above quality metrics and other parameters derived from file headers to see how the DKIST instruments and observations are changing. The following is a list of metrics that will be monitored historically:

 

  • L1 dataset average Fried Parameter (AO___001)

  • L1 dataset average light level (DKIST010)

  • HOAO lock status at start of data acquisition (AO___002) 1

  • Limb sensor radial set position (AO___007) 1

  • DKIST local outside sky brightness (WS___008) 1

  • Local outside wind speed (WS___002) 1

  • Local outside wind direction (WS___003) 1

  • Local outside temperature (WS___004) 1

  • Local outside relative humidity (WS___005) 1

  • Local outside barometric pressure (WS___007) 1

  • L1 dataset average upper GOS optics temperature (PAC___011)

  • Worst health status of data source during data acquisition (DKIST006)

  • L1 dataset average Dark current / exposure time / binning 2

  • L1 dataset average dark current RMS value 2

  • L1 dataset average Gain / exposure time / binning 2

  • L1 dataset average gain RMS value 2

  • Quality control report result (good, error, aborted)

 

Applicability:

  • Instruments: VBI, ViSP, Cryo-NIRSP, DL-NIRSP, VTF

  • Instrument program task types: All

Units: None

 

Source Data Needed: Historical statistical count values from each instrument and frame type.

 

Source(s): Stored frame values

 

Metric Processing: Every few months (weekly at the start of operations), comparisons of stored L1 dataset parameters will be made and recorded by the Data Center.

  1. Retrieve stored parameter values.

  2. Compare to historical parameter values.

  3. Raise a warning if a new L1 dataset value is at least 3 sigma higher or lower than other historical values.

 

Metric Utility: If a value falls at least 3 sigma outside of other stored historical values, then the data is very different from nominal data and could be unusable. By keeping track of these metrics over time, we can evaluate a number of things, such as how the cameras are performing or degrading, locate bad/hot pixels, etc. However, for the time being, we are focused on obtaining and storing these values and determining what they can tell us.

 

Metric Output: Internal database.

 

Metric Output Type: Plot, number.

 

Data Quality Evolution

It is expected that QC will be an evolving process in that both the quality metrics, as well as the parameters used in their calculation, will evolve over time, as actual science data becomes available, and as calibration routines are improved or added. This document will be updated as required to document those changes.

1 We will only record the value from the first frame of the L1 dataset.

2 For each instrument.