How do I install and run the DKIST data processing pipeline?

Summary

The DKIST Data Center (DC) Archive provides calibrated (a.k.a. Level 1, “L1”) data to the community. Raw (a.k.a. Level 0, “L0”) DKIST data cannot be downloaded through the User Portal. However, users with a justified need for raw data files can submit a request form to the DC (see <raw data instructions> (link available soon)). The DKIST pipeline calibration software can be downloaded and installed by following the instructions below.

 

Warning: Raw data requests and pipeline processing are for experts only. The datasets are very large, and working with them requires substantial storage, processing power, and Python expertise.

Prerequisites for running DKIST L0 → L1 data processing

  1. The user will need to request raw DKIST data. The DKIST project vets each request, and access to raw DKIST data is provided only if the request is granted. Without raw DKIST data, there is no value in running the DKIST L0 → L1 processing pipeline.

  2. The user should already have downloaded and inspected the Level 1 datasets to determine why they need to rerun the data processing themselves. Reporting issues with the Level 1 datasets is encouraged and appreciated.

Instructions

  1. Set up your local Python environment <download/setup site>. Python 3.8 or greater is required; Python 2.x is not supported.

  2. Install and run a Redis server, which acts as the mapping store for data locations on your local disk (more details can be found at https://redis.io/download).

  3. Choose which DKIST instrument you are using for your local pipeline.

  4. Find the relevant instrument package (named dkist-processing-*instrument*) in the list at https://pypi.org/user/dkistdc/ and clone its source repository. Note that PyPI itself is a package index and cannot be cloned directly.

  5. Go to the directory into which you cloned the pipeline code and run the command pip install -e . (including the trailing dot) in a terminal to install the package into your Python environment.

  6. Use the included ManualProcessing object to run either DKIST calibration modules or your own custom-written calibration modules (see the <documentation> that comes with the instrument pipelines for more details on how to do this).
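
Before starting a processing run, it can help to confirm that steps 1, 2, and 5 succeeded. The following is a minimal sketch of such a check, not part of the DKIST software. It assumes Redis is listening on its default address (localhost, port 6379) without password authentication, and it uses "visp" purely as an example instrument name; the function names are hypothetical and chosen for illustration.

```python
import socket
import sys
from importlib import metadata

MIN_PYTHON = (3, 8)  # minimum Python version stated in step 1


def python_ok(version=None):
    """Return True if the given (or running) Python version is 3.8 or greater."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def redis_reachable(host="localhost", port=6379, timeout=2.0):
    """Send a Redis PING over a plain socket and report whether PONG comes back.

    Assumes a server without authentication; a server that requires a
    password replies with an error, so this returns False in that case.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


def installed_version(instrument):
    """Return the installed version of dkist-processing-<instrument>, or None."""
    try:
        return metadata.version(f"dkist-processing-{instrument}")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    instrument = "visp"  # example only; use the instrument chosen in step 3
    print("Python >= 3.8:", python_ok())
    print("Redis reachable:", redis_reachable())
    print("Pipeline package version:", installed_version(instrument))
```

If any of the three checks fails, revisit the corresponding step before attempting to use ManualProcessing. The Redis check uses only the standard library so that it can run before any additional packages are installed.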

 

Reprocessing raw DKIST data requires expertise. If you find an issue with the DKIST processing pipeline(s), please notify the DKIST Data Center Archive by filing a Help Desk ticket. Such reports help ensure that the DKIST public archive is updated to address any bugs or problems with the software and/or data.