Design Diagram

The Object Store is primarily backed by Ceph (https://ceph.com/), open source object storage software that runs on commodity hardware. Amazon Glacier (https://aws.amazon.com/glacier/) also supports the Object Store and is used for the long-term backup of raw data.

One idea to facilitate floor-to-book auditing was to mirror create, update, and delete operations from the Data Zone to the Metadata Zone, replicating metadata into the Metadata Store. Leveraging RGW Sync for this, however, places constraints on the Ceph multi-site configuration that prevent dynamic bucket index re-sharding. Dynamic re-sharding is needed to keep large bucket indexes (which the raw and data buckets would have) from causing performance issues. Future releases of Ceph may remove this constraint, but until it is removed or a new design is created, book-to-floor support via RGW Sync is not enabled.

Logical Data Model

In an object store, "folders" do not exist. The object key can, however, be formatted as a path containing folders, file names, and extensions; the Globus Storage Connectors (see 08 - Transfer Manager) understand this format and present it to users as folders. More importantly, Globus controls access at the "folder" level. To limit access according to embargo rules, data within the data bucket is placed in a dataset folder, which ensures that all data is both logically grouped and consistent in its embargo (by proposal). The same is true of the raw data in the raw bucket, but because datasets do not exist at that point, the grouping folder is the "Observing Program Execution Id."
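As an illustration of how flat object keys can stand in for folders, the following minimal Python sketch builds a raw-bucket key and groups keys by their apparent folder. The function names are hypothetical, and placing the file name directly after the "ProposalId/ObservingProgramExecutionId" prefix is an assumption rather than a documented layout.

```python
from collections import defaultdict


def raw_object_key(proposal_id: str, obs_program_execution_id: str, file_name: str) -> str:
    """Build a raw-bucket key of the form ProposalId/ObservingProgramExecutionId/<file>.

    Assumption: the file name follows the prefix directly, with no further nesting.
    """
    parts = [proposal_id, obs_program_execution_id, file_name]
    for part in parts:
        # A "/" inside a component would silently create an extra "folder" level.
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return "/".join(parts)


def group_by_folder(keys):
    """Group flat object keys by their 'folder' (everything before the last '/')."""
    folders = defaultdict(list)
    for key in keys:
        folder, _, name = key.rpartition("/")
        folders[folder].append(name)
    return dict(folders)
```

Because the store itself holds only flat keys, the grouping is purely a matter of key prefixes, which is what allows access to be granted per "folder".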

Bucket Naming Convention

Bucket names are all lowercase, with words separated by the "-" symbol.
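A minimal sketch of how the convention could be checked in Python. The pattern and the function name are hypothetical, and allowing digits alongside lowercase letters is an assumption that the convention above does not state.

```python
import re

# Assumed reading of the convention: lowercase words joined by single hyphens.
# Digits are permitted here as an assumption.
BUCKET_NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_conventional_bucket_name(name: str) -> bool:
    """Return True if the name is all lowercase with '-' as the only separator."""
    return BUCKET_NAME_RE.fullmatch(name) is not None
```

Under this reading, names such as "ingest-fail" and "raw-data-archive" pass, while "Raw" or "raw_data" do not.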

Type: Object Store

Bucket	Description
inbox	Frame manifests to be ingested; each line of a manifest file contains "/Transfer/$ExperimentID/$ObsProgramExecID/$InstrumentProgramID/$FrameID"
inbox	Science data to be ingested
inbox	Raw VBI science data to be ingested
ingest-fail	Frame manifests failing ingest
ingest-fail	Science data failing ingest
ingest-fail	Raw VBI data failing ingest
category-fail	Data failing categorization
data	Processed Frames, ASDF files, and browse movies
data	Parameter values that were too large to be loaded as JSON into the parameter value table
raw	Raw Frames (prefix is "ProposalId/ObservingProgramExecutionId")
raw-data-archive	Raw Frames (prefix is "ProposalId/ObservingProgramExecutionId")
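
The frame manifest line format listed for the inbox bucket can be split into its components as in the sketch below. The class and function names are hypothetical, and the sketch assumes that none of the four identifiers contains a "/" character.

```python
from typing import NamedTuple


class FrameLocation(NamedTuple):
    """Components of one manifest line (field names are illustrative)."""
    experiment_id: str
    obs_program_exec_id: str
    instrument_program_id: str
    frame_id: str


def parse_manifest_line(line: str) -> FrameLocation:
    """Parse '/Transfer/$ExperimentID/$ObsProgramExecID/$InstrumentProgramID/$FrameID'."""
    parts = line.strip().split("/")
    # A well-formed line splits into ['', 'Transfer', experiment, execution, program, frame].
    if len(parts) != 6 or parts[0] != "" or parts[1] != "Transfer" or not all(parts[2:]):
        raise ValueError(f"malformed manifest line: {line!r}")
    return FrameLocation(*parts[2:])
```

Rejecting malformed lines up front gives an ingest step a clear point at which to divert a manifest to the ingest-fail bucket, although how the actual ingest handles such lines is not described here.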

Data Protection

Ceph uniquely delivers object, block, and file storage in one unified system. Ceph is highly reliable, easy to manage, and free. Ceph delivers extraordinary scalability, with thousands of clients accessing petabytes to exabytes of data. A Ceph Node leverages commodity hardware and intelligent daemons, and a Ceph Storage Cluster accommodates large numbers of nodes, which communicate with each other to replicate and redistribute data dynamically.

For more information, visit: Ceph Architecture
