How do I request raw DKIST data?

Summary

The DKIST Data Center Archive (DC) provides calibrated (a.k.a. Level 1, “L1”) data to the community for download via the User Portal. It does not generally provide access to raw (a.k.a. Level 0, “L0”) data. However, a user with a justified reason to obtain the raw data files for certain instruments can submit a special request to the DC using a request form. Requests are reviewed on a case-by-case basis.

It is assumed that the user has already downloaded the L1 data before requesting the raw data. The user is required to specify the dataset IDs of the L1 data that correspond to the raw data they wish to download. Raw data is not searchable and cannot be filtered by instrument or similar parameters. If the request is approved, the user will receive ALL the raw data for at least one observing program as executed at the telescope.

Note that if you are a PI who has been contacted by the data center regarding issues calibrating data from your proposal, do not use this form. Instead, please respond to the DC by replying to the email in which you were contacted.

Warning: Raw data requests are for experts only. The dataset sizes are massive and working with these data requires a large amount of storage and processing power.

Please note that VBI raw data is NOT available for request.

Prerequisites to obtaining DKIST raw (L0) data

  1. The user must have an active Globus account (see: How to create a globus account (for DKIST data))

  2. The user must have an active endpoint for Globus data transfers (see: Globus Connect & Endpoints)

  3. The user should already have downloaded and looked at the Level 1 datasets

Instructions

  1. Visit the DKIST Data Portal Raw Data request form.

  2. Log into the Data Portal if you have not already.

  3. Fill in the following mandatory fields on the request form; please be as descriptive as possible:

    1. Datasets Requested: the dataset ID(s), i.e. the value of the FITS header keyword “DSETID” in the L1 data

    2. Justification: a statement of the reason for this raw data request

  4. If the request is approved by DKIST management, then

    • The user will receive a notification from the DC that the request was approved

    • Data Center staff will contact the user for additional information on the requested datasets

    • Data Center staff will provide the raw data endpoint details for the Globus download, as well as the allotted time during which the data will be available

    • The user must initiate the Globus data transfer within the allotted time; the requested data will then be transferred to the user endpoint

    • If the user has requested notifications from Globus upon transfer completion, the user will receive a notification when the transfer completes (status: Success/Error)

  5. If a raw data request is denied, the requester will be notified by email with the reason.
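
For step 3, the dataset IDs can be gathered from the L1 files that have already been downloaded. The following is a minimal sketch rather than an official tool. It assumes the `astropy` package is installed and that the L1 FITS files sit under a single local directory. The function name `collect_dataset_ids` is hypothetical. Because the keyword may be stored in an extension HDU rather than the primary one, the sketch checks every HDU of each file.

```python
from pathlib import Path

from astropy.io import fits


def collect_dataset_ids(l1_dir):
    """Return the sorted, unique DSETID values found in FITS files under l1_dir."""
    dataset_ids = set()
    for path in sorted(Path(l1_dir).expanduser().rglob("*.fits")):
        # Only headers are inspected here; the pixel data are never accessed.
        with fits.open(path) as hdul:
            for hdu in hdul:
                if "DSETID" in hdu.header:
                    dataset_ids.add(str(hdu.header["DSETID"]).strip())
                    break  # one value per file is enough
    return sorted(dataset_ids)
```

Calling, for example, `collect_dataset_ids("~/dkist_l1")` (a hypothetical download location) returns a list that can be pasted into the "Datasets Requested" field.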

I got my raw data, but how are the files named and organized?

When the raw data files are delivered to the data center, the only useful information in the raw data filename is the instrument name. The data center subsequently renames the files (e.g. 0056d294667a4bb5a552bfbda08a4dd3.fits) so that receiving the same filename twice from the summit causes no issues with handling multiple copies of the same file. As the data center does not generally release raw data, the overriding goal is simply to create unique filenames. When the raw data is ingested at the data center, meta-information about each file goes into the frame inventory, so the data center knows what the file is and what it contains. However, the data center does not have a mechanism to expose the frame inventory publicly. The only way for a user to extract this information is to read the file headers and parse them. The L1 files are more usefully named, e.g.

VISP_2022_06_03T19_33_53_514_00630205_I_AKNPB_L1.fits.
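
To illustrate how much the L1 name carries, the sketch below splits such a filename into fields using only the Python standard library. The field layout (instrument, timestamp, wavelength, Stokes parameter, dataset ID) is an assumption inferred from the single example above, not a documented specification. The field names and the function `parse_l1_filename` are hypothetical. The wavelength field is left as a string because its units are not stated here.

```python
import re
from datetime import datetime

# Assumed layout, inferred from one example name:
# INSTRUMENT_YYYY_MM_DDTHH_MM_SS_fff_WAVELENGTH_STOKES_DATASETID_L1.fits
L1_NAME = re.compile(
    r"^(?P<instrument>[A-Za-z0-9-]+)_"
    r"(?P<timestamp>\d{4}_\d{2}_\d{2}T\d{2}_\d{2}_\d{2}_\d{1,6})_"
    r"(?P<wavelength>\d+)_"
    r"(?P<stokes>[IQUV])_"
    r"(?P<dataset_id>[A-Z]+)_L1\.fits$"
)


def parse_l1_filename(name):
    """Split an L1 filename into its fields, or return None if it does not match."""
    match = L1_NAME.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the underscore-separated timestamp into a datetime object.
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y_%m_%dT%H_%M_%S_%f"
    )
    return fields


info = parse_l1_filename("VISP_2022_06_03T19_33_53_514_00630205_I_AKNPB_L1.fits")
print(info["instrument"], info["stokes"], info["dataset_id"])
```

A renamed raw file such as 0056d294667a4bb5a552bfbda08a4dd3.fits does not match the pattern, so the function returns None for it.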

Again, this naming applies only to processed data, as does the creation of additional files such as the ASDF file. The figure below illustrates what you would see in the Globus web client once you have been given access to the raw data files for your proposal.

Raw data layout from the Globus client

Each of the raw directories corresponds to one observing program (OP) execution ID, e.g.

eid_1_118_op4Asyox_R001.82591.14038621.

You should find N observe OPs, corresponding to N × (number of wavelengths) L1 science datasets, but the raw data also contains OPs for required calibration files such as darks, PolCals, and gains. Within the observe frames, you will find FITS files that each correspond to a single modulator state at a single slit step, but again the only way to access this information is to open the files one by one and read the appropriate keywords.
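
Because the frame inventory is not exposed, one practical approach is to build a small local index by reading each header once and saving the result. The sketch below is one way to do this under stated assumptions: the `astropy` package is installed, the raw OP directories have been transferred under one local directory, and the function name `build_inventory` is hypothetical. The caller supplies the keyword list. The standard FITS keywords INSTRUME and DATE-OBS are used in the usage note only as placeholders, and the instrument-specific keywords of interest should be substituted for them.

```python
import csv
from pathlib import Path

from astropy.io import fits


def build_inventory(raw_dir, keywords, out_csv):
    """Write one CSV row per FITS file listing the requested header keywords."""
    root = Path(raw_dir).expanduser()
    rows = []
    for path in sorted(root.rglob("*.fits")):
        # The relative path keeps the OP directory name alongside the file name.
        row = {"file": str(path.relative_to(root))}
        with fits.open(path) as hdul:
            for key in keywords:
                # Take the first HDU that carries the keyword; blank if none does.
                row[key] = next(
                    (hdu.header[key] for hdu in hdul if key in hdu.header), ""
                )
        rows.append(row)
    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", *keywords])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `build_inventory("~/dkist_raw", ["INSTRUME", "DATE-OBS"], "inventory.csv")` (with hypothetical paths) produces a table that can be sorted or filtered without reopening the files. Given the size of raw datasets, expect the first pass over the headers to take a while.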