01 - Object Store
Design Diagram
The Object Store is primarily backed by Ceph. Ceph (https://ceph.com/) is open-source object storage software that can run on commodity hardware. The Object Store is also supported by Amazon Glacier (https://aws.amazon.com/glacier/), which is used for the long-term backup of raw data.
One idea to facilitate floor-to-book auditing was to mirror create, update, and delete operations from the Data Zone to the Metadata Zone, so that metadata would be replicated into the Metadata Store. Leveraging RGW Sync for this, however, placed constraints on the Ceph multi-site configuration that prevented dynamic bucket index re-sharding. Dynamic bucket index re-sharding is needed to keep large bucket indexes (which the raw and data buckets would have) from causing performance issues. Future releases of Ceph may remove this constraint, but until the constraint is removed or a new design is created, book-to-floor support via RGW Sync is not enabled.
Logical Data Model
In an object store, "folders" do not exist. The object key can be formatted as a path containing folders, file names, and extensions, which the Globus Storage Connectors (see 08 - Transfer Manager) understand and present to users as folders. Most importantly, Globus controls access at the "folder" level. To limit access based on embargo rules, data within the data bucket is placed under a dataset folder, which ensures that all data is both logically grouped and consistent with the embargo (which is applied by proposal). The same is true of the raw data in the raw bucket, but because datasets do not exist at that stage, the grouping folder is the "Observing Program Execution Id."
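The "folder" view described above can be illustrated with a minimal sketch. It assumes S3-style prefix and delimiter semantics; the function name `list_folder` and the sample keys (including the `.fits` extension) are hypothetical, and this is not the Globus Storage Connector implementation.

```python
from typing import Iterable, List, Tuple


def list_folder(
    keys: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Split flat object keys into pseudo-folders and objects under a prefix."""
    folders, objects = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        # Look only at the part of the key below the requested prefix.
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter remains, so the key lives inside a deeper "folder".
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(folders), sorted(objects)


# Hypothetical raw-bucket keys: ProposalId/ObservingProgramExecutionId/frame
keys = [
    "prop_1/ope_a/frame_001.fits",
    "prop_1/ope_a/frame_002.fits",
    "prop_1/ope_b/frame_001.fits",
    "prop_2/ope_c/frame_001.fits",
]
folders, objects = list_folder(keys, prefix="prop_1/")
```

Because the grouping folder is part of every key, restricting access to one prefix (for example a single proposal) covers all objects stored beneath it.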
Bucket Naming Convention
Bucket names are all lowercase, with words separated by the "-" symbol.
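A check for this convention might look like the following sketch. The function name `is_conventional_bucket_name` is hypothetical, and allowing digits within a word is an assumption that the convention does not state.

```python
import re

# Lowercase words joined by single hyphens. Allowing digits is an assumption.
BUCKET_NAME = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_conventional_bucket_name(name: str) -> bool:
    """Return True if the name is all lowercase and hyphen-separated."""
    return BUCKET_NAME.fullmatch(name) is not None
```

Names such as `ingest-fail` and `raw-data-archive` pass this check, while names containing uppercase letters, underscores, or doubled hyphens do not.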
Type: Object Store
Bucket | Description |
---|---|
inbox | Frame manifests to be ingested with lines in a file containing "/Transfer/$ExperimentID/$ObsProgramExecID/$InstrumentProgramID/$FrameID" |
inbox | Science data to be ingested |
inbox | Science data (raw VBI) to be ingested |
ingest-fail | Frame manifests failing ingest |
ingest-fail | Science data failing ingest |
ingest-fail | Raw VBI data failing ingest |
category-fail | Data failing categorization |
data | Processed Frames, ASDF files, and browse movies |
 | Parameter values that were too large to be loaded as JSON into the parameter value table |
raw | Raw Frames (prefix is "ProposalId/ObservingProgramExecutionId") |
raw-data-archive | Raw Frames (prefix is "ProposalId/ObservingProgramExecutionId") |
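The frame manifest line format listed for the inbox bucket, "/Transfer/$ExperimentID/$ObsProgramExecID/$InstrumentProgramID/$FrameID", could be parsed as in the sketch below. The `ManifestEntry` class and `parse_manifest_line` function are hypothetical names, and the sketch assumes that each line holds exactly one such path and that no field is empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    experiment_id: str
    obs_program_exec_id: str
    instrument_program_id: str
    frame_id: str


def parse_manifest_line(line: str) -> ManifestEntry:
    """Parse one '/Transfer/<exp>/<ope>/<ip>/<frame>' manifest line."""
    parts = line.strip().split("/")
    # A valid line splits into ["", "Transfer", exp, ope, ip, frame].
    if (
        len(parts) != 6
        or parts[0] != ""
        or parts[1] != "Transfer"
        or not all(parts[2:])
    ):
        raise ValueError(f"unexpected manifest line: {line!r}")
    return ManifestEntry(*parts[2:])


# Hypothetical identifiers, for illustration only.
entry = parse_manifest_line("/Transfer/exp1/ope1/ip1/frame1\n")
```

Rejecting malformed lines up front gives a single place to decide which manifests belong in the ingest-fail bucket.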
Data Protection
Ceph uniquely delivers object, block, and file storage in one unified system. Ceph is highly reliable, easy to manage, and free. Ceph delivers extraordinary scalability–thousands of clients accessing petabytes to exabytes of data. A Ceph Node leverages commodity hardware and intelligent daemons, and a Ceph Storage Cluster accommodates large numbers of nodes, which communicate with each other to replicate and redistribute data dynamically.
For more information, visit: Ceph Architecture