This article describes in detail how the yuuvis® RAD metrics-manager works, on what technologies it is based, and what it is used for.
Introduction
In the past, users often asked, "How can I tell whether yuuvis® RAD is running properly?" And even when it is running, some systems perform better than others. Apart from the obvious differences in hardware sizing, we had a hard time measuring the quality of a system's performance. Lacking informative metrics and criteria, we had to analyze the log files thoroughly and, from the information gained, laboriously trace the path of the data and messages to pin down the bottlenecks.
To make this process easier and more convenient, we envisioned a framework or platform that does most of the work of condensing and aggregating the log files for us and presents the results in a visually comprehensible way. This is what the yuuvis® RAD metrics manager does.
Basics
To be able to do this, the yuuvis® RAD components needed to be adapted in the following way:
...
Of course, when a system is heavily used, these metric log files will quickly contain thousands or even millions of lines. Manual analysis is then no longer feasible, or it covers only a very small fraction of the actual data. Such a sample is not representative and might not even contain the data that leads you to the cause of your problems. Hence, the data must be processed and aggregated before you can work with it effectively.
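To illustrate what "processed and aggregated" means in practice, here is a minimal Python sketch that condenses raw metric lines into per-operation statistics. The line format (`timestamp;operation;duration`) and the names `SAMPLE_LINES` and `aggregate` are assumptions made up for this example; they are not the actual yuuvis® RAD metric log format, and the metrics manager itself delegates this work to Elasticsearch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical metric lines: "<ISO timestamp>;<operation>;<duration in ms>".
# This format is an assumption for illustration only, not the real log format.
SAMPLE_LINES = [
    "2024-01-01T10:00:01;search;120",
    "2024-01-01T10:00:02;search;80",
    "2024-01-01T10:00:02;import;400",
    "2024-01-01T10:00:05;search;100",
]


def aggregate(lines):
    """Group durations by operation and return count, mean and max per operation."""
    durations = defaultdict(list)
    for line in lines:
        _timestamp, operation, duration_ms = line.strip().split(";")
        durations[operation].append(int(duration_ms))
    return {
        op: {"count": len(vals), "mean_ms": mean(vals), "max_ms": max(vals)}
        for op, vals in durations.items()
    }


summary = aggregate(SAMPLE_LINES)
for operation, stats in sorted(summary.items()):
    print(operation, stats)
```

A few summary rows per operation are far easier to reason about than millions of raw lines, which is the same idea the aggregation functions of a database apply at scale.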
The metrics-manager and the elastic(search) stack*
We decided to use the Elasticsearch database and its stack for this purpose as it has great aggregation functions and can handle queries very fast, even on millions of records. In detail, the tools that yuuvis® RAD metrics-manager comprises are:
logstash
Logstash is a server-side data processing pipeline that ingests data from many sources like tcp or one of the elastic beats, transforms it, and then sends it to Elasticsearch. All metrics-manager tools use logstash to send data to Elasticsearch.
filebeat
Filebeat is a small and simple tool that reads log files and sends the data line by line to Elasticsearch using logstash.
metricbeat
Metricbeat is another tool of the beats family that can read system metrics like CPU load or disc I/O and sends the data to Elasticsearch using logstash.
elastalert
Elastalert is a third-party tool that can be used to alert users over various channels on anomalies, spikes, or other patterns of interest from data in Elasticsearch. This can be done by creating definition files that specify the conditions that need to be met for an alert to trigger.
kibana
Kibana is a frontend application that lets you visualize the data in Elasticsearch indices by running aggregations or similar queries and plotting the results in diagrams, graphs, timelines, etc. You can restrict the visualization to specific time ranges or view the entire data at once.
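The kind of aggregation that Kibana runs against Elasticsearch can be sketched as a plain search body. The Python snippet below builds a query that averages a numeric field per time bucket over a restricted time range. The field name `duration_ms` and the function name `build_response_time_query` are hypothetical; `@timestamp` is the field Logstash adds to events by default, and `fixed_interval` assumes a reasonably recent Elasticsearch version (older versions used `interval`).

```python
import json


def build_response_time_query(duration_field="duration_ms", interval="1m", window="now-1h"):
    """Return an Elasticsearch search body averaging a numeric field per time bucket."""
    return {
        "size": 0,  # return only aggregation results, no individual documents
        # Restrict the data to a time range, as Kibana's time picker does.
        "query": {"range": {"@timestamp": {"gte": window}}},
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                # Nested metric aggregation: one average per time bucket.
                "aggs": {"avg_duration": {"avg": {"field": duration_field}}},
            }
        },
    }


query = build_response_time_query()
print(json.dumps(query, indent=2))
```

The resulting JSON could be sent to the `_search` endpoint of an index; which index and which fields actually exist depends on the metrics-manager configuration.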
...
In combination with elastalert, you can send notifications to admins or managers to inform them about conditions such as limits being reached, (too many) errors, or deteriorating response times.
Integrating the metrics manager into the yuuvis® RAD environment
The yuuvis® RAD metrics manager is an optional extension to the yuuvis® RAD system. As such, it is not installed by default. To run it, you might have to extend your hardware resources to support the extra load.
The installation, as described in the installation guide, is basically divided into two parts. The first is activating the metrics log files and having filebeat (plus metricbeat) send the data to logstash. The second is installing elasticsearch, logstash and kibana on a machine to store and display the data received from filebeat and metricbeat. While the first part "only" adds the load of writing (lots of) lines to a file, the second part adds an entire Elasticsearch database with potentially millions of records plus the kibana backend. The machine hosting this part should have at least 8 GB of free RAM, the equivalent of about 2 free CPUs, and enough free hard drive space for the new data. Depending on the load of the system, the data volume can range from a couple of GB to 20-30 GB per day. If possible, a dedicated machine with 4 CPUs, 16 GB RAM and about 300 GB of hard drive space would be the best choice.
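As a rough back-of-the-envelope check of these figures, the following Python sketch estimates how many days of metrics data fit on the disk at a given daily volume. It is a simplification under stated assumptions: the whole disk is available for metrics data, and index overhead, replicas, and the operating system are ignored. The function name `retention_days` is made up for this example.

```python
def retention_days(disk_gb, daily_gb):
    """Days of data that fit on disk, assuming the whole disk holds metrics data."""
    return disk_gb / daily_gb


DISK_GB = 300  # recommended disk size from the text above

# Daily volumes: "a couple" of GB (assumed here as 2) up to 20-30 GB per day.
for daily_gb in (2, 20, 30):
    days = retention_days(DISK_GB, daily_gb)
    print(f"{daily_gb} GB/day -> about {days:.0f} days of retention")
```

On a heavily loaded system, 300 GB therefore covers only about 10 to 15 days, so older data has to be deleted or archived regularly if the disk is not to fill up.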
...