01 - System Orchestration (DC Cloud)

Design Diagram

The System Orchestration service is the backbone of the Data Center. It is responsible for provisioning services and for their ability to communicate with each other at runtime. To achieve this, it relies on several interconnected products that work in concert:

  • Consul (https://www.hashicorp.com/products/consul)
    • Service Discovery: Clients of Consul can register a service, such as api or mysql, and other clients can use Consul to discover providers of a given service. Using either DNS or HTTP, applications can easily find the services they depend upon.
    • Health Checking: Consul clients can provide any number of health checks, either associated with a given service ("is the webserver returning 200 OK"), or with the local node ("is memory utilization below 90%"). This information can be used by an operator to monitor cluster health, and it is used by the service discovery components to route traffic away from unhealthy hosts.
    • Secure Service Communication: Consul can generate and distribute TLS certificates for services to establish mutual TLS connections. Intentions can be used to define which services are allowed to communicate. Service segmentation can be easily managed with intentions that can be changed in real time instead of using complex network topologies and static firewall rules.
    • KV Store: Applications can make use of Consul's hierarchical key/value store for any number of purposes, including dynamic configuration, feature flagging, coordination, leader election and more. The simple HTTP API makes it easy to use.
  • Ansible (https://docs.ansible.com/)
    • Agentless: Ansible uses standard network connections (SSH) so it can coexist with legacy tools and is easy to install, configure, and maintain.
    • General purpose IT automation platform: Ansible handles declared state enforcement across the infrastructure, as well as broad multi-component and multi-system orchestration of complicated interconnected systems.
  • Vault (https://www.hashicorp.com/products/vault/)
    • Vault allows the user to centrally store, access, and distribute secrets such as API keys, AWS IAM/STS credentials, SQL/NoSQL database credentials, X.509 certificates, SSH credentials, etc.
  • JFrog Artifactory (https://jfrog.com/artifactory/) 
    • This component acts as a universal repository manager. It is used as a local PyPI index and as a repository for service binaries, e.g., Docker containers, jars, etc.
  • Bitbucket (https://bitbucket.org/product)
    • This component acts as a script and configuration repository.
  • MAAS (https://maas.io/docs)
    • MAAS gives the user access to templates to define the topology of infrastructure using code.
    • It also provides physical hardware provisioning.
  • Nomad (https://www.hashicorp.com/products/nomad)
    • Nomad is a flexible cluster scheduler to automate the deployment of any services on any platform.
  • DNS
    • This component offers name resolution for "*.dkistdc.nso.edu" and "*.consul" items.


Key Concept: Orchestration Inventory

The Orchestration Inventory contains multiple types of metadata with global and environment-specific scope.


Key Concept: Ansible Inventory, Playbooks and Variables

Ansible uses an inventory file (essentially a list of servers) to determine which servers it communicates with. Playbooks are the code that runs against the inventory hosts.

Playbooks: The infrastructure as code.

Variables: The values needed to reuse infrastructure code across multiple instances of infrastructure. They are merged at runtime to produce the specific set of variables for a given host. Some of the available variable types are:

  • Global: Values that are applicable to the entire Data Center inventory.
  • Site: Values that are applicable to either Boulder or Maui.
  • Environment: Values that are applicable to an environment inside a site (e.g., stage, prod, etc.).
  • Group: Values that are applicable to a group of hosts (e.g., gluster_servers, hashistack_servers, etc.). Hosts can be members of multiple groups.
  • Host: Values that are applicable to a specific host.
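
The runtime merge described above can be sketched in Python as follows. This is a minimal illustration, not how Ansible itself resolves variables: it assumes that more specific scopes override less specific ones (Global, then Site, Environment, Group, Host), it treats each scope as a single flat dictionary, and every variable name in the example is hypothetical.

```python
from typing import Any, Dict, Mapping

# Assumed precedence, least specific first; later scopes override earlier ones.
SCOPE_ORDER = ("global", "site", "environment", "group", "host")


def merge_host_vars(scoped_vars: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Merge per-scope variables into one flat dictionary for a single host."""
    merged: Dict[str, Any] = {}
    for scope in SCOPE_ORDER:
        # A missing scope simply contributes nothing.
        merged.update(scoped_vars.get(scope, {}))
    return merged


# Hypothetical variables for one host in the Boulder stage environment.
example = {
    "global": {"ntp_server": "ntp.example.org", "log_level": "info"},
    "site": {"site_name": "boulder"},
    "environment": {"env_name": "stage", "log_level": "debug"},
    "group": {"gluster_volume": "data"},
    "host": {"log_level": "warning"},
}
host_vars = merge_host_vars(example)
```

Here `log_level` is defined at three scopes and the host-level value wins. A host that belongs to several groups would need an additional rule for ordering those groups, which this sketch leaves out.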


Consul KV Data Model

Config:

The "config" tree captures configuration data that services can access at runtime. Key/value pairs that are specific to a service are stored in that service's root folder. Key/value pairs that a service uses for a related service are stored in a subfolder named after that related service.
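
As an illustration of this layout, the sketch below builds key paths and decodes a KV read. The `/v1/kv/<key>` endpoint and the base64-encoded `Value` field are part of Consul's HTTP API; the exact path layout (`config/<service>/<key>`), the helper names, and the service and key names are assumptions made for the example.

```python
import base64
import json
from typing import Dict, Optional

CONFIG_ROOT = "config"  # tree name from the text; the path layout below is assumed


def config_key(service: str, key: str, related_service: Optional[str] = None) -> str:
    """Build a KV path, nesting under the related service's folder if given."""
    parts = [CONFIG_ROOT, service]
    if related_service:
        parts.append(related_service)
    parts.append(key)
    return "/".join(parts)


def kv_url(consul_addr: str, key: str) -> str:
    """URL of Consul's KV read endpoint for the given key."""
    return f"{consul_addr.rstrip('/')}/v1/kv/{key}"


def decode_kv_response(body: str) -> Dict[str, str]:
    """Decode the JSON list returned by a KV read; values are base64-encoded."""
    return {
        entry["Key"]: base64.b64decode(entry["Value"]).decode("utf-8")
        for entry in json.loads(body)
        if entry.get("Value") is not None  # empty values come back as null
    }


# Hypothetical keys for a service called "my-service".
own_key = config_key("my-service", "log_level")
related_key = config_key("my-service", "host", related_service="interservice-bus")

# A canned response in the shape Consul returns, so no agent is needed here.
sample_body = json.dumps(
    [{"Key": own_key, "Value": base64.b64encode(b"debug").decode("ascii")}]
)
decoded = decode_kv_response(sample_body)
```

In a running system the response body would come from an HTTP GET against `kv_url(...)` on a Consul agent rather than from a canned string.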

Ops:

The "ops" tree captures data that can be used in operating the system. Some of the "key:value" pairs allow the Data Center to dynamically orchestrate groups of services. Other pairs provide the "Source of Truth" for the installed version of software that is successfully deployed to a stack. This in turn allows for analysis and rollback options to be available.


Key Concept: Service Discovery

Service Discovery lets a service contact the services it depends on through an unchanging name (e.g., "interservice-bus"). This allows a service to be deployed to any available host without impact on dependent services. Additionally, the health status that Consul tracks allows traffic to be load balanced across healthy nodes only.
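
A minimal sketch of health-aware discovery follows. It assumes the caller has already fetched the JSON list from Consul's `/v1/health/service/<name>` endpoint, and it uses only the `Node`, `Service`, and `Checks` fields of that response. The function names, the addresses, and the round-robin strategy are illustrative choices, not the Data Center's actual load balancing.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

Endpoint = Tuple[str, int]


def healthy_endpoints(entries: List[Dict]) -> List[Endpoint]:
    """Keep only instances whose health checks are all passing."""
    endpoints = []
    for entry in entries:
        if all(check.get("Status") == "passing" for check in entry.get("Checks", [])):
            service = entry["Service"]
            # Consul leaves Service.Address empty when it equals the node address.
            address = service.get("Address") or entry["Node"]["Address"]
            endpoints.append((address, service["Port"]))
    return endpoints


def round_robin(endpoints: List[Endpoint]) -> Iterator[Endpoint]:
    """Cycle through the endpoints forever (yields nothing if the list is empty)."""
    return itertools.cycle(endpoints)


# Hypothetical response for "interservice-bus": the second instance is failing.
sample = [
    {"Node": {"Address": "10.0.0.1"}, "Service": {"Address": "", "Port": 5672},
     "Checks": [{"Status": "passing"}]},
    {"Node": {"Address": "10.0.0.2"}, "Service": {"Address": "10.0.1.2", "Port": 5672},
     "Checks": [{"Status": "passing"}, {"Status": "critical"}]},
    {"Node": {"Address": "10.0.0.3"}, "Service": {"Address": "10.0.1.3", "Port": 5672},
     "Checks": [{"Status": "passing"}]},
]
endpoints = healthy_endpoints(sample)
picker = round_robin(endpoints)
```

Calling `next(picker)` repeatedly alternates between the two healthy instances; the failing one is never returned. Clients that resolve services through Consul DNS get comparable filtering without any client-side code.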



Key Concept: Turtlebot

Turtlebot is Python software developed by the Data Center that assists in intelligent, automated deployments. The software is composed of several deployers, each of which interacts with a specific platform inside the Data Center cloud. Turtlebot queries deployer metadata from the Service Library and uses it to deploy each artifact on the correct platform.


Basic Flow:


Key Concept: Turtlebot Deployers

Deployer | Artifact | Target Platform | Method
NomadDeployer | Template + Docker Container | Nomad | Template is hydrated with service-library metadata and submitted to Nomad via the Nomad API.
AirflowDeployer | Zip File | Automated Processing | Zip file is exploded into a static target location configured for the automated processing shared service.
MongoDeployer | JavaScript | Mongo Server | JavaScript is remotely executed via the Mongo API.
PostgresDeployer | SQL | Postgres DB | SQL is remotely executed on the DB server using a Postgres DB API.
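
The way Turtlebot could pick a deployer from service-library metadata can be sketched as a simple registry lookup. This is not Turtlebot's actual code: the deployer class names and target platforms come from the table above, but the `Deployer` interface, the `deployer` metadata key, the artifact file name, and the returned strings are hypothetical, and no real platform API is called.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class Deployer(ABC):
    """Base class: one deployer per target platform (hypothetical interface)."""

    target_platform = ""

    @abstractmethod
    def deploy(self, artifact: str, metadata: Dict[str, str]) -> str:
        """Return a description of the deployment action (no real I/O here)."""


class NomadDeployer(Deployer):
    target_platform = "Nomad"

    def deploy(self, artifact: str, metadata: Dict[str, str]) -> str:
        # Real code would hydrate the template and submit it via the Nomad API.
        return f"submit hydrated template {artifact} to {self.target_platform}"


class AirflowDeployer(Deployer):
    target_platform = "Automated Processing"

    def deploy(self, artifact: str, metadata: Dict[str, str]) -> str:
        return f"unpack {artifact} into the {self.target_platform} target location"


class MongoDeployer(Deployer):
    target_platform = "Mongo Server"

    def deploy(self, artifact: str, metadata: Dict[str, str]) -> str:
        return f"execute {artifact} on {self.target_platform}"


class PostgresDeployer(Deployer):
    target_platform = "Postgres DB"

    def deploy(self, artifact: str, metadata: Dict[str, str]) -> str:
        return f"execute {artifact} on {self.target_platform}"


# Registry keyed by class name, so metadata can name its deployer as a string.
DEPLOYERS: Dict[str, Type[Deployer]] = {
    cls.__name__: cls
    for cls in (NomadDeployer, AirflowDeployer, MongoDeployer, PostgresDeployer)
}


def select_deployer(metadata: Dict[str, str]) -> Deployer:
    """Instantiate the deployer named in the (hypothetical) 'deployer' key."""
    name = metadata.get("deployer", "")
    if name not in DEPLOYERS:
        raise ValueError(f"no deployer registered for {name!r}")
    return DEPLOYERS[name]()


metadata = {"deployer": "NomadDeployer", "service": "my-service"}
deployer = select_deployer(metadata)
plan = deployer.deploy("my-service.nomad.tpl", metadata)
```

Keeping the registry separate from the deployer classes means a new platform can be supported by adding one class and one registry entry, without touching the selection logic.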

Key Concept: Service Library

The service-library is, at its core, a property graph database that maps relationships and entities of the Data Center system. 
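
For readers unfamiliar with the term, a property graph stores nodes and relationships that each carry a label (or type) and a set of key/value properties. The sketch below is a generic in-memory illustration only. The `Service` and `Deployer` labels, the `DEPLOYED_BY` relationship, and all property names are hypothetical and are not taken from the service-library schema.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class PropertyGraph:
    """Tiny in-memory property graph: labeled nodes and typed, directed edges."""

    def __init__(self) -> None:
        # node id -> (label, properties)
        self.nodes: Dict[str, Tuple[str, Dict[str, Any]]] = {}
        # source id -> list of (relationship type, target id, properties)
        self.edges: Dict[str, List[Tuple[str, str, Dict[str, Any]]]] = defaultdict(list)

    def add_node(self, node_id: str, label: str, **props: Any) -> None:
        self.nodes[node_id] = (label, props)

    def add_edge(self, source: str, rel_type: str, target: str, **props: Any) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[source].append((rel_type, target, props))

    def neighbors(self, source: str, rel_type: str) -> List[str]:
        """Ids of nodes reachable from source over edges of the given type."""
        return [target for rel, target, _ in self.edges.get(source, []) if rel == rel_type]


# Hypothetical content: one service and the deployer that handles it.
graph = PropertyGraph()
graph.add_node("my-service", "Service", version="1.0.0")
graph.add_node("NomadDeployer", "Deployer", target_platform="Nomad")
graph.add_edge("my-service", "DEPLOYED_BY", "NomadDeployer")

deployer_ids = graph.neighbors("my-service", "DEPLOYED_BY")
platform = graph.nodes[deployer_ids[0]][1]["target_platform"]
```

A query such as "which deployer handles this service, and on which platform" becomes a short traversal from the service node, which is the kind of lookup Turtlebot performs when it reads deployer metadata.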

Data Model:

Conceptually, four major nodes and their relationships make up the heart of the service library and show how it represents the system design. These are shown in the diagram below:


Additional relationships and nodes complete the full service library schema and support the queries it is currently designed to answer. These are shown in the following diagram: