01 - Object Store

Design Diagram

The Object Store is primarily backed by Ceph (https://ceph.com/), open source object storage software that runs on commodity hardware. Amazon Glacier (https://aws.amazon.com/glacier/) also supports the Object Store and is used for long-term backup of raw data.

One idea for facilitating floor-to-book auditing was to mirror create, update, and delete operations from the Data Zone to the Metadata Zone so that metadata is replicated in the Metadata Store. Using RGW (RADOS Gateway) Sync for this mirroring, however, requires constraints on the Ceph multi-site configuration that prevent dynamic bucket index resharding. Dynamic resharding is needed to keep large bucket indexes, such as those the raw and data buckets would have, from causing performance issues. Future releases of Ceph may remove this constraint, but until the constraint is removed or a new design is created, auditing support via RGW Sync is not enabled.


Logical Data Model 

In an object store, "folders" do not exist. An object key can, however, be formatted as a path containing folders, a file name, and an extension; the Globus Storage Connectors (see 08 - Transfer Manager) interpret such keys and present them to users as folders. Most importantly, Globus controls access at the "folder" level. To limit access based on embargo rules, every object in the data bucket sits under a dataset folder, so that all data is logically grouped and everything in a folder falls under the same (per-proposal) embargo. The same is true of raw data in the raw bucket, except that datasets do not yet exist when raw data arrives, so the grouping folder is the Observing Program Execution Id.
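
To illustrate how folders emerge from flat keys, the sketch below groups keys by their "/"-separated prefixes, mimicking the CommonPrefixes that an S3-style listing with a delimiter returns. It uses the raw bucket layout "ProposalId/ObservingProgramExecutionId/<file>" from the bucket table; the proposal and execution ids are made-up placeholders, and the visible_folders embargo filter is hypothetical. It only shows why a single folder-level decision covers every file in a group, not how Globus actually enforces access.

```python
"""Sketch: "folders" derived from flat object keys (hypothetical names)."""


def common_prefixes(keys, prefix="", delimiter="/"):
    """Return the "folders" directly under `prefix`, the way an S3-style
    listing with a delimiter groups keys into CommonPrefixes."""
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut != -1:
            # Everything up to and including the next delimiter is one folder.
            folders.add(prefix + rest[: cut + 1])
    return sorted(folders)


def visible_folders(keys, embargoed_proposals, delimiter="/"):
    """Hypothetical filter: list "<ProposalId>/<ExecutionId>/" folders whose
    proposal is not embargoed. Because all objects of a group share one
    prefix, one decision per folder covers every file inside it."""
    visible = []
    for proposal_folder in common_prefixes(keys, "", delimiter):
        if proposal_folder.rstrip(delimiter) in embargoed_proposals:
            continue
        visible.extend(common_prefixes(keys, proposal_folder, delimiter))
    return visible


if __name__ == "__main__":
    keys = [
        "proposal_a/ope_1/frame_001.fits",
        "proposal_a/ope_1/frame_002.fits",
        "proposal_a/ope_2/frame_001.fits",
        "proposal_b/ope_3/frame_001.fits",
    ]
    print(common_prefixes(keys))
    print(visible_folders(keys, embargoed_proposals={"proposal_b"}))
```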

Bucket Naming Convention

Bucket names are all lowercase, with words separated by a hyphen ("-").
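
A minimal check for the naming convention might look like the following. Treating each word as letters only is an assumption, since the convention does not say whether digits are allowed; the function name is hypothetical.

```python
import re

# One or more lowercase words joined by single hyphens, e.g. "raw-data-archive".
# Assumption: words contain only the letters a-z (no digits).
BUCKET_NAME = re.compile(r"[a-z]+(?:-[a-z]+)*")


def follows_convention(name: str) -> bool:
    """Return True if `name` is all lowercase and hyphen-separated."""
    return BUCKET_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["inbox", "ingest-fail", "raw-data-archive", "Raw_Data", "raw--data"]:
        print(name, follows_convention(name))
```

All bucket names in the table below pass this check.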

Type: Object Store

Bucket              Description
inbox               Frame manifests to be ingested; each line in a manifest file has the form
                    "/Transfer/$ExperimentID/$ObsProgramExecID/$InstrumentProgramID/$FrameID"
inbox               Science data to be ingested
inbox               Raw VBI science data to be ingested
ingest-fail         Frame manifests that failed ingest
ingest-fail         Science data that failed ingest
ingest-fail         Raw VBI data that failed ingest
category-fail       Data that failed categorization
data                Processed frames, ASDF files, and browse movies; also parameter values that
                    were too large to be loaded as JSON into the parameter value table
raw                 Raw frames (prefix is "ProposalId/ObservingProgramExecutionId")
raw-data-archive    Raw frames (prefix is "ProposalId/ObservingProgramExecutionId")
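
The manifest line format in the inbox row can be parsed as sketched below. The function and type names are hypothetical, and routing manifests with malformed lines to ingest-fail is only an assumed use; the validation rules (exactly four non-empty ids under "/Transfer/") are inferred from the format string alone.

```python
from typing import List, NamedTuple, Tuple


class ManifestEntry(NamedTuple):
    experiment_id: str
    obs_program_exec_id: str
    instrument_program_id: str
    frame_id: str


def parse_manifest_line(line: str) -> ManifestEntry:
    """Split one line of the form
    /Transfer/$ExperimentID/$ObsProgramExecID/$InstrumentProgramID/$FrameID
    into its four ids. Raises ValueError for any other shape."""
    parts = line.strip().split("/")
    # The leading "/" yields an empty first element: ["", "Transfer", id1..id4].
    if len(parts) != 6 or parts[0] != "" or parts[1] != "Transfer" or not all(parts[2:]):
        raise ValueError(f"not a manifest line: {line!r}")
    return ManifestEntry(*parts[2:])


def parse_manifest(text: str) -> Tuple[List[ManifestEntry], List[str]]:
    """Parse a whole manifest into (entries, bad_lines), skipping blank lines.
    A caller could, for example, move a manifest with bad lines to ingest-fail."""
    entries, bad = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            entries.append(parse_manifest_line(line))
        except ValueError:
            bad.append(line)
    return entries, bad


if __name__ == "__main__":
    print(parse_manifest_line("/Transfer/EXP1/OPE1/IP1/F1"))
```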

Data Protection

Ceph provides object, block, and file storage in one unified system and is free, open source software. It scales to thousands of clients accessing petabytes to exabytes of data. A Ceph Storage Cluster is made up of many Ceph Nodes running on commodity hardware; the daemons on these nodes communicate with each other to replicate and redistribute data dynamically, so data remains available when individual disks or nodes fail.
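
Ceph computes where each replica lives with its CRUSH algorithm rather than looking locations up in a central table. The sketch below swaps in a much simpler technique, rendezvous (highest-random-weight) hashing, only to illustrate that idea; it is not how CRUSH works. Node names are placeholders, and the replica count of three is an assumption (it matches Ceph's default size for replicated pools, but this deployment's pool settings are not stated here).

```python
import hashlib
from typing import List


def _score(object_name: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{object_name}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_replicas(object_name: str, nodes: List[str], replicas: int = 3) -> List[str]:
    """Pick `replicas` distinct nodes for an object via rendezvous hashing.
    Any client with the same node list computes the same answer, and removing
    a node only moves objects that had a replica on that node."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda n: _score(object_name, n), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(6)]
    print(place_replicas("raw/proposal_a/ope_1/frame_001.fits", nodes))
```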

For more information, visit: Ceph Architecture