02 - System Monitoring

Design Diagram

The System Monitoring service is supported by the ELK Stack (Elasticsearch, Logstash, Kibana, and X-Pack alerting) and by Prometheus/Kibana. Monitoring data is divided into two classes:

  • Events: textual information, such as logs, is handled by ELK.
  • Telemetry: time-series metrics, such as CPU load or storage utilization, are handled by Prometheus/Kibana.
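
To make the split concrete, the sketch below shows how a client might address each class: telemetry through the Prometheus HTTP API instant-query endpoint (GET /api/v1/query) and events through an Elasticsearch _search request. It only builds the requests and sends nothing. The host names, the logstash-* index pattern, the "message" and "@timestamp" field names, and the node_load1 metric (which assumes node_exporter is deployed) are all hypothetical placeholders, not values taken from this design.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoints; replace with the real Data Center hosts.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"
ELASTICSEARCH_URL = "http://elasticsearch.example.internal:9200"


def telemetry_query_url(promql: str) -> str:
    """Build a Prometheus HTTP API instant-query URL (GET /api/v1/query)."""
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': promql})}"


def event_search_request(index: str, text: str, minutes: int = 15) -> tuple[str, str]:
    """Build an Elasticsearch _search URL and JSON body for recent log events.

    Assumes Logstash-style documents with "message" and "@timestamp" fields.
    """
    body = {
        "query": {
            "bool": {
                "must": [{"match": {"message": text}}],
                "filter": [{"range": {"@timestamp": {"gte": f"now-{minutes}m"}}}],
            }
        }
    }
    return f"{ELASTICSEARCH_URL}/{index}/_search", json.dumps(body)


if __name__ == "__main__":
    # Telemetry: current 1-minute load average per node (assumes node_exporter).
    print(telemetry_query_url("node_load1"))
    # Events: log lines matching "error" from the last 15 minutes.
    url, body = event_search_request("logstash-*", "error")
    print(url)
    print(body)
```

Keeping the two request builders separate mirrors the event/telemetry split above: each class of data goes to the system that stores it, so a query never has to span both backends.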

Event and telemetry data is collected from the Data Center infrastructure to report system health for operational support. The collection method is described in more detail on the page 01 - System Orchestration (DC Cloud); it allows the platform to respond dynamically to the current state of the system.
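
As an illustration only (the actual mechanism is the one described on the orchestration page), one way to "respond to the state of the system" is to evaluate a telemetry query result against a threshold and act on the instances that exceed it. The minimal sketch below parses the JSON that the Prometheus /api/v1/query endpoint returns for a vector result. The threshold value and the node-a/node-b instance names are hypothetical.

```python
import json


def instances_over_threshold(response_json: str, threshold: float) -> list[str]:
    """Return the 'instance' label of every sample whose value exceeds threshold.

    Expects the shape Prometheus returns for a vector instant query:
    {"status": "success", "data": {"resultType": "vector",
     "result": [{"metric": {...}, "value": [<unix_ts>, "<value>"]}]}}
    """
    payload = json.loads(response_json)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    hot = []
    for sample in data["result"]:
        _timestamp, value = sample["value"]
        # Prometheus encodes sample values as strings (including "NaN" and "+Inf").
        if float(value) > threshold:
            hot.append(sample["metric"].get("instance", "<unknown>"))
    return hot


if __name__ == "__main__":
    # Hypothetical response for the query node_load1.
    sample = """{"status": "success", "data": {"resultType": "vector", "result": [
      {"metric": {"instance": "node-a:9100"}, "value": [1700000000.0, "3.5"]},
      {"metric": {"instance": "node-b:9100"}, "value": [1700000000.0, "0.4"]}]}}"""
    print(instances_over_threshold(sample, 2.0))  # ['node-a:9100']
```

In practice this kind of rule is usually delegated to an alerting component (for events, the ElastAlert or X-Pack alerting referenced on this page) rather than hand-written; the sketch only shows the decision logic.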

More information is available via the following links:

https://www.elastic.co/products/

https://prometheus.io/

https://elastalert.readthedocs.io/en/latest/elastalert.html#overview