This article describes in detail how the yuuvis® RAD metrics-manager works, on what technologies it is based, and what it is used for.

...

In the past, users often asked, "How can I tell if yuuvis® RAD is running properly?" And even when it is running, some systems perform better than others. Apart from the obvious differences in hardware sizing, we had a hard time measuring the quality of a system's performance. For lack of informative metrics and criteria, we had to analyze the log files thoroughly and use the information gained to laboriously trace the path of data and messages in order to get a grip on the bottlenecks.
To make this process easier and more convenient, we envisioned a framework or platform that does most of the work of condensing and aggregating the log files for us and presents the results in a visually comprehensible way. This is what the yuuvis® RAD metrics-manager does.

Basics

To be able to do this, the yuuvis® RAD components needed to be adapted in the following way:

...

We decided to use Elasticsearch and its stack for this purpose because it offers powerful aggregation functions and answers queries very quickly, even on millions of records. In detail, the yuuvis® RAD metrics-manager comprises the following tools:

  • logstash

    Logstash is a server-side data processing pipeline that ingests data from many sources like tcp or one of the elastic beats, transforms it, and then sends it to Elasticsearch. All metrics-manager tools use logstash to send data to Elasticsearch.
  • filebeat

    Filebeat is a small and simple tool that reads log files and sends the data line by line to Elasticsearch using logstash.
  • metricbeat

    Metricbeat is another tool of the beats family. It reads system metrics like CPU load or disk I/O and sends the data to Elasticsearch using logstash.
  • elastalert

    Elastalert is a third-party tool that can be used to alert users over various channels on anomalies, spikes, or other patterns of interest from data in Elasticsearch. This can be done by creating definition files that specify the conditions that need to be met for an alert to trigger.
  • kibana

    Kibana is a frontend application that lets you visualize the data in Elasticsearch indices by running aggregations or similar queries and plotting the results in diagrams, graphs, timelines, etc. You can restrict the visualization to specific time ranges or view the entire data at once. 

So, this is how yuuvis® RAD metrics-manager works:

We use filebeat to read the metrics log files of the dms-service and service-manager and send them to logstash, which creates a document in Elasticsearch for each logged REST call. We also use metricbeat to collect system metrics such as CPU load, RAM utilization and disk I/O. Metricbeat writes the collected data to a file so that filebeat can read it and send it to Elasticsearch via logstash as well. Logstash puts all data into an index called logstash-<datestamp>, so a new index is created every day containing all the logged calls of that day.
As mentioned before, on a heavily used system the amount of data can quickly become very large. To keep the hard drive from filling up, Elasticsearch index lifecycle management (ILM) is used to roll over and delete indices after a predefined period. By default, indices are rolled over after 1 day and deleted after 45 days. We find this long enough to look back on what has happened in the system, but it can of course be configured to suit your needs. The longer you want to be able to look back, the more indices and thus disk space you need, and vice versa.
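The retention settings described above can be pictured as an ILM policy body. The following is a minimal sketch, not the policy that ships with the metrics-manager: the function name `build_ilm_policy` is hypothetical, and only the two defaults (rollover after 1 day, deletion after 45 days) are taken from the text. The resulting JSON has the shape that Elasticsearch accepts on its `_ilm/policy/<policy-name>` endpoint.

```python
import json


def build_ilm_policy(rollover_after: str = "1d", delete_after: str = "45d") -> dict:
    """Build an ILM policy body (hypothetical helper, for illustration only).

    rollover_after: maximum index age before a rollover in the hot phase.
    delete_after:   minimum age before the delete phase removes the index.
    """
    return {
        "policy": {
            "phases": {
                # Hot phase: the index is written to until it reaches max_age.
                "hot": {
                    "actions": {
                        "rollover": {"max_age": rollover_after}
                    }
                },
                # Delete phase: the index is removed once min_age has passed.
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the default policy (1 day rollover, 45 days retention) as JSON.
    print(json.dumps(build_ilm_policy(), indent=2))
```

To keep data for a longer or shorter period, only `delete_after` would need to change, for example `build_ilm_policy(delete_after="90d")`. Note that when rollover is used, Elasticsearch counts the `min_age` of later phases from the time of the rollover, not from index creation.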
Finally, we use kibana to visualize the data in graphs, diagrams and timelines, taking full advantage of the aggregation and condensing abilities that Elasticsearch and kibana offer. Here are some examples of what you can find out with the available data:

...
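To illustrate the kind of aggregation that sits behind such a visualization, the sketch below builds an Elasticsearch search body that averages a call duration per hour over the last 24 hours. It is an assumption-laden example: the field name `duration_ms` and the function name are hypothetical, because the actual field names depend on the metrics log format, which is not shown here. Only `@timestamp`, the standard Logstash time field, is assumed to exist.

```python
import json


def build_response_time_query(
    duration_field: str = "duration_ms",  # hypothetical field name
    interval: str = "1h",
    lookback: str = "now-24h",
) -> dict:
    """Build a search body that averages a duration field per time bucket."""
    return {
        # size 0: return only the aggregation results, no individual documents.
        "size": 0,
        # Restrict the search to the requested time range.
        "query": {"range": {"@timestamp": {"gte": lookback}}},
        "aggs": {
            "per_interval": {
                # One bucket per interval along the time axis.
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": interval,
                },
                # Average duration of the calls inside each bucket.
                "aggs": {
                    "avg_duration": {"avg": {"field": duration_field}}
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_response_time_query(), indent=2))
```

A body like this could be sent to the `_search` endpoint of the `logstash-*` indices. Kibana generates comparable queries itself when a timeline is configured, so writing them by hand is rarely necessary.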

In combination with elastalert, you can send notifications to admins or managers to inform them about conditions like reached maximums, (too many) errors or degrading response times.

Integrating the metrics-manager into the yuuvis® RAD environment

The yuuvis® RAD metrics-manager is an optional extension to the yuuvis® RAD system. As such, it is not installed by default. To run it, you might have to extend your hardware resources to support the extra load. 
The installation, as described in the installation guide, is basically divided into two parts. The first part activates the metrics log files and lets filebeat (plus metricbeat) send the data to logstash. The second part installs Elasticsearch, logstash and kibana on a machine that stores and displays the data received from filebeat and metricbeat. While the first part "only" adds the load of writing (lots of) lines to a file, the second part adds an entire Elasticsearch database with potentially millions of records plus the kibana backend. The machine hosting this part should have at least 8 GB of free RAM, the equivalent of about 2 free CPUs and enough free hard drive space for the new data. Depending on the load of the system, this can range from a couple of GB to 20-30 GB per day. If possible, a dedicated machine with 4 CPUs, 16 GB RAM and about 300 GB of hard drive space is the best choice.
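The relationship between daily data volume, retention period and disk space can be estimated with simple arithmetic. The sketch below is a rough planning aid under stated assumptions, not an official sizing formula: the function names and the 20 % headroom factor are hypothetical, and index overhead, compression and replicas are ignored.

```python
def estimate_storage_gb(daily_gb: float, retention_days: int = 45,
                        headroom: float = 0.2) -> float:
    """Estimate the disk space needed to keep `retention_days` of data.

    headroom is an assumed safety margin (0.2 = 20 % extra space).
    """
    if daily_gb < 0 or retention_days < 0 or headroom < 0:
        raise ValueError("all arguments must be non-negative")
    return daily_gb * retention_days * (1 + headroom)


def max_daily_gb(disk_gb: float, retention_days: int = 45,
                 headroom: float = 0.2) -> float:
    """Estimate the daily volume a disk can absorb for a given retention."""
    if disk_gb < 0 or retention_days <= 0 or headroom < 0:
        raise ValueError("disk_gb and headroom must be non-negative, "
                         "retention_days must be positive")
    return disk_gb / (retention_days * (1 + headroom))


if __name__ == "__main__":
    # Example: a lightly loaded system writing about 2 GB per day.
    print(f"needed for 2 GB/day: {estimate_storage_gb(2):.0f} GB")
    # Example: what a 300 GB disk can sustain with 45 days of retention.
    print(f"300 GB disk sustains: {max_daily_gb(300):.1f} GB/day")
```

For systems at the upper end of the daily volume range, such an estimate suggests either providing more disk space or shortening the retention period.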

...