metrics-manager Dashboard
Once you have successfully installed yuuvis® RAD metrics-manager and opened Kibana in your browser, you are automatically forwarded to the Metrics-Dashboard in the Dashboard section of Kibana. Here, you will find a predefined set of visualizations of yuuvis® metrics as well as the status of your servers' hardware resources. The following picture shows an example view of the Metrics-Dashboard. Below the picture, each visualization or point of interest (POI) highlighted in yellow is explained in detail.
(1) This is the menu. Click it to navigate to other dashboards, to other sections (Discover, Visualizations) or to Management, or to restore dashboards you have opened before.
(2) This is the time range selector. Click the down-arrow icon to select your time range. Keep in mind that all visualizations only show data that falls within the selected time range. If visualizations are empty or data seems to be missing, the selected time range may be too short.
(3) This is the navigation bar of the Metrics-Dashboard. Click any dashboard title to switch to that dashboard. The specialized dashboards may contain more visualizations than the overview dashboard.
(4) This visualization shows the CPU utilization of each of your servers running Metricbeat. The value shown is the average over the entire selected time range (see (2)). (You can change the intervals of the green, yellow and red zones in the visualization editor, but only for the entire visualization and not for each individual gauge.)
(5) This visualization shows the RAM utilization of each of your servers running Metricbeat. The value shown is the average over the entire selected time range (see (2)). 100% means the RAM is fully utilized. (You can change the intervals of the green, yellow and red zones in the visualization editor.)
(6) This visualization shows the CPU utilization of each of your servers running Metricbeat over the selected time range. The time range is divided into buckets, each of which aggregates the data of its time interval. Kibana determines the interval covered by each bucket automatically so that the entire diagram fits into the available space. Each dot in the diagram represents one bucket, and the time between two adjacent dots is the bucket size. The value of each dot is the average CPU utilization of the respective server during the time covered by that bucket.
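The bucket logic described in (6) can be illustrated with a minimal Python sketch. It is not how Kibana or Elasticsearch compute the values internally; it assumes that each Metricbeat reading is available as a (timestamp, value) pair and that the bucket size is fixed. Assuming the gauges in (4) and (5) average all individual readings in the time range, the sketch also shows why their value can differ from the mean of the bucket averages when buckets contain different numbers of readings.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_averages(samples, start, bucket_size):
    """Group (timestamp, value) samples into fixed-size buckets and average each one.

    samples: iterable of (datetime, float) pairs, e.g. CPU utilization readings
    start: datetime at which the first bucket begins
    bucket_size: timedelta covered by one bucket (one dot in the diagram)
    Returns a list of (bucket_start, average) tuples sorted by bucket start.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in samples:
        index = (timestamp - start) // bucket_size  # timedelta // timedelta yields an int
        sums[index] += value
        counts[index] += 1
    return [(start + i * bucket_size, sums[i] / counts[i]) for i in sorted(sums)]


def overall_average(samples):
    """Average of all individual readings (assumed behavior of the gauges in (4) and (5))."""
    values = [value for _, value in samples]
    return sum(values) / len(values)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 10, 0, 0)
    readings = [
        (t0, 20.0),
        (t0 + timedelta(seconds=10), 40.0),
        (t0 + timedelta(seconds=30), 60.0),
    ]
    for bucket_start, avg in bucket_averages(readings, t0, timedelta(seconds=30)):
        print(bucket_start.time(), avg)  # 10:00:00 30.0, then 10:00:30 60.0
    print(overall_average(readings))  # 40.0, not the mean of the bucket averages (45.0)
```

With 30-second buckets, the first bucket holds two readings and the second only one, so the overall average (40.0) is lower than the mean of the two dots (45.0).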
Handling the diagrams
Zooming in on a time range:
You can zoom in to a shorter time range by clicking the desired start of the time range in the diagram, holding down the left mouse button and dragging the mouse to the desired end of the time range. The time range selector is adjusted to the new values, and the diagram (actually the entire page) is reloaded to show only this time range. Be aware that this change affects all visualizations. You can return to the original time range by clicking the back button of your browser or by adjusting the time range selector manually (see (2)).
Filtering lines and changing their color:
Click the name of a line in the legend of the visualization to open a submenu with two magnifying glasses and a color palette. Click one of the colors to change the color of the corresponding line. Click the magnifying glass with the plus symbol to set a filter so that only this line is shown and all others are removed from the diagram. Click the magnifying glass with the minus symbol to filter this line out of the diagram while keeping all others.
To restore the previous color, either select it in the palette or click the back button of your browser. To show a filtered line again, click the back button of your browser, or scroll to the top and click the "x" of the filter that now appears right above the CPU gauges visualization.
(7) This visualization shows the RAM utilization of each of your servers running Metricbeat over the selected time range. It behaves analogously to (6).
(8) This visualization shows over time how many bytes (averaged per bucket) were read from each hard drive partition of each server running Metricbeat.
(9) This visualization shows over time the utilization (averaged per bucket) of each hard drive partition or network share (if NetworkShareMonitor is used) of each server running Metricbeat.
(10) This visualization shows over time how many bytes (averaged per bucket) were written to each hard drive partition of each server running Metricbeat.
(11) This visualization shows over time the percentage of CPU time (averaged per bucket) used by each operating system process on each server running Metricbeat. Only the top 10 processes across all servers are shown.
(12) This visualization shows over time the percentage of RAM (averaged per bucket) used by each operating system process on each server running Metricbeat. Only the top 10 processes across all servers are shown.
(13) This visualization shows over time how many bytes (averaged per bucket) were received by each network interface of each server running Metricbeat.
(14) This visualization shows over time how many bytes (averaged per bucket) were sent by each network interface of each server running Metricbeat.
(15) This visualization shows the status of each operating system service that belongs to the yuuvis® RAD system on each server running Metricbeat.
(16) This visualization shows the percentage distribution of the statuses of each operating system service that belongs to the yuuvis® RAD system on each server running Metricbeat.
(17) This visualization shows over time the Java heap space utilization (averaged per bucket) of each yuuvis® RAD component on each server running Metricbeat.
(18) This heading indicates that the visualizations below all relate to Elasticsearch.
(19) This visualization shows the status of each Elasticsearch cluster that is monitored by a server running Metricbeat. (Usually, this is only the "es-red" cluster used by yuuvis® RAD.)
(20) This visualization shows the status of each Elasticsearch index that is part of a cluster shown in (19).
(21) This visualization shows the status of the shards of each enaiored* and autocomplete* index shown in (20). (Other indices are used only for management purposes and should have more than one shard.)
(22) This visualization shows over time the number of documents (averaged per bucket) contained in the enaiored_* index. (Note: The number of documents in Elasticsearch can be higher than the number of DMS objects because lists and tables are saved as sub-documents in Elasticsearch, which also count as documents.)
(23) This visualization shows over time the duration (averaged per bucket) of each garbage collection run of the Elasticsearch JVM.
(24) This heading indicates that the visualizations below all relate to yuuvis® RAD.
(25) This visualization shows how many different users and sessions (averaged per bucket) were active during the selected time range. Users are counted by unique login name, sessions by unique session ID. For example, if a user logs into the client with their browser and is also logged in with an agent, there will be one active user but two active sessions.
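The difference between users and sessions in (25) comes down to counting the distinct values of two different fields. The following minimal Python sketch assumes, purely for illustration, that each request is available as a (login_name, session_id) pair; the login name and session IDs in the example are made up.

```python
def count_users_and_sessions(requests):
    """Return (distinct login names, distinct session IDs) for a list of requests.

    requests: iterable of (login_name, session_id) tuples (hypothetical data shape)
    """
    users = set()
    sessions = set()
    for login_name, session_id in requests:
        users.add(login_name)
        sessions.add(session_id)
    return len(users), len(sessions)


if __name__ == "__main__":
    # One user working in the browser client and, at the same time, with an agent.
    requests = [
        ("jdoe", "session-browser"),
        ("jdoe", "session-agent"),
        ("jdoe", "session-browser"),
    ]
    print(count_users_and_sessions(requests))  # (1, 2)
```

Repeated requests from the same session do not increase either count, which matches the example above: one active user, two active sessions.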
(26) This visualization shows over time how many REST API requests each user (login name) has made (summed per bucket). Requests can be made implicitly, by using the client, the agent, etc., or explicitly, by sending requests directly to the API. Users are sorted in descending order by the total number of requests made.
(27) This metric shows the count of how many different users (i.e., login names) made at least one call over the entire selected time range.
(28) This metric shows the count of how many different sessions (i.e., session ids) appeared in at least one request over the entire selected time range.
(29) This visualization shows over time how many messages each queue of the messaging service contained (averaged per bucket). It is especially suitable for monitoring whether bursts of actions, e.g., importing or updating many objects at once, can be processed before the next burst starts. In this case, you would see a sudden rise in the FULLTEXTINDEX and RENDITION queues, followed by a gradual decline. If the queues reach 0 before the next burst, the performance is sufficient; if not, there might be a performance or hardware resource problem.
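The rule of thumb in (29) can be expressed as a simple check: between the start of one burst and the start of the next, the queue size should drop to 0 at least once. The following minimal Python sketch assumes, for illustration only, that you have read the queue size per bucket off the diagram as a list and know the bucket indices at which bursts started; both inputs and the optional threshold parameter are hypothetical.

```python
def drains_before_next_burst(queue_sizes, burst_starts, threshold=0):
    """Check for each burst whether the queue drains before the next burst starts.

    queue_sizes: queue size per bucket, e.g. for the FULLTEXTINDEX queue
    burst_starts: sorted bucket indices at which a burst begins
    threshold: size at or below which the queue counts as drained
    Returns one boolean per burst; the last burst is checked up to the end of the data.
    """
    results = []
    for i, start in enumerate(burst_starts):
        end = burst_starts[i + 1] if i + 1 < len(burst_starts) else len(queue_sizes)
        results.append(any(size <= threshold for size in queue_sizes[start:end]))
    return results


if __name__ == "__main__":
    sizes = [0, 500, 300, 100, 0, 600, 400, 200, 700, 650]
    bursts = [1, 5, 8]
    print(drains_before_next_burst(sizes, bursts))  # [True, False, False]
```

In the example, the first burst is fully processed before the second one starts, while the second burst is still being processed when the third one arrives, which is the pattern that may point to a performance or hardware resource problem.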
(30) This visualization shows over time how many messages (averaged per bucket) were enqueued in each queue of the ActiveMQ message broker.
(31) This visualization shows over time how many messages (averaged per bucket) were dequeued from each queue of the ActiveMQ message broker.
(32) This visualization shows over time how many messages (averaged per bucket) were enqueued in each topic of the ActiveMQ message broker.
(33) This visualization shows over time how many messages (averaged per bucket) were dequeued from each topic of the ActiveMQ message broker.
(34) This set of metrics shows the average processing duration in milliseconds of a) all requests, b) all search requests (including aggregation requests), c) all object creation requests (with and without content file) and d) all object read requests (equivalent to opening the object in the client) made during the entire selected time range. These values are only meant to give a rough overview of how the durations of creating, reading and searching objects compare to the overall average.
(35) This visualization shows over time the processing duration in milliseconds (averaged per bucket) of a) all requests, b) all search requests (including aggregation requests), c) all object creation requests (with and without content file) and d) all object read requests (equivalent to opening the object in the client) made during the selected time range.
(36) This set of metrics shows the total number of requests that were sent to a) the core-service, b) the search-service and c) the index-service. It is only meant to give an idea of how much load each component is under.
(37) This visualization shows over time how many requests resulted in each class of HTTP response code, where 2xx means OK, 3xx means redirected but OK, 4xx means a logical (user) error and 5xx means an internal server error. The values are summed per bucket. This visualization is especially suitable for detecting whether something no longer works correctly, for example, after a change or an update. In this case, you will see the 4xx and/or 5xx request counts rise and, consequently, the 2xx request count decline. Unlike in the other diagrams, the Y axis is logarithmic so that the usually very few 4xx and 5xx requests can still be seen in sufficient detail.
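The grouping into response code classes used in (37) follows directly from the first digit of the HTTP status code. The following minimal Python sketch counts the requests per class and computes the share of 4xx and 5xx responses, which is one way to quantify a rise in failing requests after a change or update; the status code lists in the example are invented.

```python
from collections import Counter


def count_by_status_class(status_codes):
    """Count HTTP status codes per class, e.g. 201 -> "2xx", 404 -> "4xx"."""
    return Counter(f"{code // 100}xx" for code in status_codes)


def error_share(status_codes):
    """Fraction of requests that ended with a 4xx or 5xx status code (0.0 if none)."""
    counts = count_by_status_class(status_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["4xx"] + counts["5xx"]) / total


if __name__ == "__main__":
    before_update = [200] * 98 + [404, 304]
    after_update = [200] * 90 + [404] * 4 + [500] * 6
    print(dict(count_by_status_class(after_update)))  # {'2xx': 90, '4xx': 4, '5xx': 6}
    print(error_share(before_update), error_share(after_update))  # 0.01 0.1
```

A jump in the error share like the one in the example is what the dashboard makes visible as rising 4xx and 5xx lines.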
The metrics (38) to (52) show the total number of the following actions that succeeded (left side) and failed (right side):
(38) object (document) creations with a content file (the endpoint can also be called without actually passing a content file; such calls cannot be distinguished here)
(39) object creations without a content file (documents and folders)
(40) subsequent additions of a content file to an existing object (document)
(41) batch creations of objects (only the number of batch requests; the actual number of objects created using this endpoint is not shown here)
(42) updates to the index data of an object
(43) batch updates to the index data of objects (only the number of batch requests; the actual number of objects updated using this endpoint is not shown here)
(44) object deletions (includes soft (to the recycle bin) and hard (actual deletion) deletions)
(45) batch deletions of objects (only the number of batch requests; the actual number of objects deleted using this endpoint is not shown here)
(46) retrievals of the PDF rendition of an object (document with content file)
(47) retrievals of the original content file of an object (document with content file)
(48) search requests excluding aggregations
(49) aggregation requests (previews the number of objects that the current search configuration would yield)
(50) creations of objects using the prepared state of the client (i.e., clicking on "File" or "File and open" in the prepared state of the client)
(51) executed ETL configurations (Be aware that ETL is deprecated and will soon no longer be supported. We recommend using Talend Open Studio instead.)
(52) starts of workflow instances
(53) This metric set shows the number of requests sent to the top 10 (i.e., most frequently called) endpoints of the core-service's REST API. The value is the sum of the requests made during the entire selected time range. This gives a rough estimate of which endpoints are most important to the system.
(54) This metric set is analogous to (53) but is restricted to the DmsService subset of endpoints. This gives an estimate of how many and what kinds of object manipulations are performed.
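The top 10 lists in (53) and (54) are essentially frequency counts. The following minimal Python sketch ranks endpoints by how often they were called; the optional prefix filter is a hypothetical stand-in for the restriction to the DmsService subset in (54). The endpoint paths and the prefix are placeholders, not the actual yuuvis® RAD endpoint names.

```python
from collections import Counter


def top_endpoints(called_endpoints, n=10, prefix=None):
    """Return the n most frequently called endpoints as (endpoint, count) pairs.

    called_endpoints: iterable with one endpoint string per request
    prefix: if given, only endpoints starting with this string are counted
            (hypothetical stand-in for restricting to a subset of endpoints)
    """
    counts = Counter(
        endpoint
        for endpoint in called_endpoints
        if prefix is None or endpoint.startswith(prefix)
    )
    return counts.most_common(n)


if __name__ == "__main__":
    calls = (["/example/a"] * 3 + ["/example/b"] * 5
             + ["/example/dms/x"] * 2 + ["/example/dms/y"] * 4)
    print(top_endpoints(calls, n=2))  # [('/example/b', 5), ('/example/dms/y', 4)]
    print(top_endpoints(calls, prefix="/example/dms/"))  # [('/example/dms/y', 4), ('/example/dms/x', 2)]
```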
(55) This visualization corresponds to the "API Trace" page of the REST API website. It shows one table for each HTTP response code class (2xx, 3xx, 4xx, 5xx). Each table lists the endpoints that responded at least once with the corresponding HTTP response code, sorted by the total number of requests sent to the endpoint (total invocations). Besides the count, the tables show the minimum, maximum, total (summed) and average response duration for the selected time range. If a table is not shown, like the 5xx table in this example, no requests resulted in this response code.
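The statistics in the API Trace tables of (55) can be derived from individual request records. The following minimal Python sketch assumes, for illustration only, that each request is available as an (endpoint, status_code, duration_ms) tuple; it groups the requests by response code class and endpoint and sorts each table by the number of invocations within that class. This ordering is an assumption of the sketch. Classes without any requests do not appear in the result, just as a missing table in the dashboard means that no request resulted in that response code.

```python
from collections import defaultdict


def api_trace(requests):
    """Build per-class, per-endpoint duration statistics similar to the API Trace tables.

    requests: iterable of (endpoint, status_code, duration_ms) tuples (hypothetical shape)
    Returns {"2xx": [row, ...], ...}; each row is a dict, and rows are sorted by
    invocations in descending order. Classes without requests are omitted.
    """
    durations = defaultdict(list)
    for endpoint, status_code, duration_ms in requests:
        durations[(f"{status_code // 100}xx", endpoint)].append(duration_ms)

    tables = defaultdict(list)
    for (status_class, endpoint), values in durations.items():
        tables[status_class].append({
            "endpoint": endpoint,
            "invocations": len(values),
            "min": min(values),
            "max": max(values),
            "total": sum(values),
            "avg": sum(values) / len(values),
        })
    for rows in tables.values():
        rows.sort(key=lambda row: row["invocations"], reverse=True)
    return dict(tables)


if __name__ == "__main__":
    sample = [
        ("/example/search", 200, 120),
        ("/example/search", 200, 80),
        ("/example/object", 200, 40),
        ("/example/object", 404, 10),
    ]
    for status_class, rows in api_trace(sample).items():
        for row in rows:
            print(status_class, row)
```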
For more information on how to handle Kibana (manually setting filters, editing existing or creating new visualizations, etc.) please refer to the Kibana documentation at https://www.elastic.co/guide/en/kibana/8.4/index.html.