
Once you have successfully installed the metrics-manager and opened Kibana in your browser, you can open the "metrics-dashboard" in the "Dashboards" section of Kibana. There you will find a predefined set of visualizations of yuuvis® metrics as well as the status of your servers' hardware resources. The following picture shows an exemplary view of the metrics-dashboard. Below, each of the visualizations highlighted in the yellow boxes is explained in detail.



(1) This is the Dashboards section. Click on it to see all available dashboards or, if one was already opened before, to restore the previously opened dashboard. In the latter case, click the button again to get back to the overview of all dashboards.

(2) This is the time range selector. Click on the arrow-down symbol to choose your time range. Keep in mind that all visualizations only represent the data that falls into the time range selected here. If visualizations are empty or you think some data is missing, your time range might be too narrow.

(3) This visualization shows the CPU utilization of each of your servers that run metricbeat. The value shown is the average over the entire chosen time range ((2)). (You can change the intervals for the green, yellow and red zones in the visualization editor, but only for the entire visualization, not for each individual tacho.)

(4) This visualization shows the RAM utilization of each of your servers that run metricbeat. The value shown is the average over the entire chosen time range ((2)). 100% means the RAM is completely utilized. (You can change the intervals for the green, yellow and red zones in the visualization editor.)

(5) This visualization shows the course of the CPU utilization of each of your servers that run metricbeat over the chosen time range. The time range is subdivided into single buckets, each aggregating its data. The amount of time each bucket represents is automatically determined by Kibana so that the entire diagram fits the available space. Each dot in the diagram represents one bucket; the actual bucket size is the difference between two dots. The value of each dot is the average CPU utilization of the depicted server over the time the bucket represents.
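The bucketing described above can be sketched in a few lines of Python. This is a simplified illustration of what Kibana's date histogram with an "average" metric does, not Kibana's actual implementation; the sample data and the bucket size are made up:

```python
from collections import defaultdict

def bucket_averages(samples, bucket_seconds):
    """Group (timestamp, cpu_percent) samples into fixed-size time buckets
    and return the average CPU utilization per bucket - roughly what one
    dot in the diagram represents."""
    buckets = defaultdict(list)
    for ts, cpu in samples:
        buckets[ts - ts % bucket_seconds].append(cpu)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Hypothetical samples: (unix timestamp, CPU %) reported by metricbeat
samples = [(0, 20.0), (10, 40.0), (30, 60.0), (60, 10.0)]
print(bucket_averages(samples, bucket_seconds=30))
# {0: 30.0, 30: 60.0, 60: 10.0}
```

With a 30-second bucket size, the first two samples fall into the same bucket and are averaged to one dot, just as Kibana collapses all samples within a bucket into a single value.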

Handling the diagrams

Zoom in the time range:
You can zoom in to a smaller time range by clicking on the desired start of the time range in the diagram, holding the mouse button down and dragging to the end of the desired time range. The time range selector is adjusted to the new values and the diagram (actually the entire page) is reloaded to show only this time range. Be aware that all visualizations are affected by this change. You can return to the original time range by clicking the browser's back button or by manually adjusting the time range selector ((2)).

Course Line Filtering / Color:
Click on the name of the desired course line in the legend of the visualization and a sub-menu with two magnifying glasses and a color palette will show up. Clicking on one of the colors changes the color of the corresponding course line. Clicking on the magnifying glass with the plus symbol sets a filter so that only this specific course line is shown and all others are taken out of the diagram. Clicking on the magnifying glass with the minus symbol filters the corresponding course line out of the diagram, leaving all others there.
To go back to the previous color, either click on it in the palette or click the back button of the browser. To get the filtered course line back, click the back button of your browser or scroll to the top and click on the "x" of the filter that is now present (right above the CPU tachos visualization).

(6) This visualization shows the course of the RAM utilization of each of your servers that run metricbeat over the chosen time range. The visualization behaves analogously to (5).

(7) This visualization shows the course of how many bytes (on average over the bucket size) were read on each of the hard drive partitions of each server running metricbeat. 

(8) This visualization shows the course of how many bytes (on average over the bucket size) were written on each of the hard drive partitions of each server running metricbeat. 

(9) This visualization shows how many different users and sessions (on average over the bucket size) were active during the chosen time range. "Users" refers to unique login names, "sessions" to unique session IDs. For example, if one user logs into the client with his/her browser and also has a logged-in agent, there will be one active user but two active sessions.
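The distinction between users and sessions boils down to counting distinct login names versus distinct session IDs. A minimal sketch, with an invented request log and made-up field names:

```python
# Each request record carries the login name and the session ID.
# One user logged in via browser and agent yields one unique user
# but two unique sessions - the example from the text above.
requests = [
    {"user": "jdoe", "session": "browser-1"},
    {"user": "jdoe", "session": "agent-7"},
    {"user": "jdoe", "session": "browser-1"},
]

active_users = {r["user"] for r in requests}        # distinct login names
active_sessions = {r["session"] for r in requests}  # distinct session IDs
print(len(active_users), len(active_sessions))  # 1 2
```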

(10) This visualization shows the course of how many REST-API requests each user (login name) has made (summed up per bucket) - either implicitly by using the client, the agent, etc., or explicitly by directly sending a request to the API. It is ordered by the total number of calls made, descending.

(11) This metric shows the count of how many different users (i.e. login-names) made at least one call over the entire time range chosen.

(12) This metric shows the count of how many different sessions (i.e. session-ids) appeared in at least one request over the entire time range chosen.

(13) This visualization shows the course of how many messages each queue of the messaging service contained - on average over the bucket size. This visualization is especially suitable for monitoring whether bursts of actions - e.g. importing or updating many objects at once - can be processed before the next burst starts. In such a case you would see, for example, a sudden rise in the FULLTEXTINDEX and RENDITION queues followed by a gradual decline. If the queues reach 0 before the next burst, the performance is sufficient; if not, there might be a performance / hardware resources problem.

(14) This set of metrics shows the average processing duration in milliseconds of a) all requests, b) all search requests (including aggregation requests), c) all object creation requests (with and without content file), and d) all object read requests (equivalent to opening the object in the client) made during the entire time range chosen. This is only meant to give a rough overview of the relative durations of creating, reading and searching objects compared to the overall average.
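How such averages are derived from request records can be sketched as follows. The record structure, field names and durations are hypothetical, chosen only to illustrate averaging over all requests versus one request kind:

```python
def average_duration(records, kind=None):
    """Average processing duration in ms over all records,
    optionally restricted to one request kind."""
    durations = [r["ms"] for r in records if kind is None or r["kind"] == kind]
    return sum(durations) / len(durations) if durations else 0.0

# Invented request log: kind of request and its processing duration in ms
records = [
    {"kind": "search", "ms": 120.0},
    {"kind": "create", "ms": 300.0},
    {"kind": "read",   "ms": 60.0},
    {"kind": "search", "ms": 80.0},
]
print(average_duration(records))            # 140.0 - all requests
print(average_duration(records, "search"))  # 100.0 - search requests only
```

Comparing the per-kind averages against the overall average is exactly the kind of rough duration relation this metric set is meant to convey.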

(15) This visualization shows the course of the processing duration in milliseconds of a) all requests, b) all search requests (including aggregation requests), c) all object creation requests (with and without content file), and d) all object read requests (equivalent to opening the object in the client) made during the time range chosen (averaged over the bucket size).

(16) This set of metrics shows the total number of requests that were sent to the a) dms-service, b) search service, and c) index service. This is only meant to give an understanding of how much load each component is under.

(17) This visualization shows the course of how many requests resulted in which type of HTTP response code, where 2xx means OK, 3xx means forwarded but OK, 4xx means logical error (user error) and 5xx means internal server error. The values are the sum over each bucket. This visualization is especially suitable for detecting whether, for example after a change or update, something no longer works correctly. In that case you will see the 4xx and/or 5xx request count rising and (logically) the 2xx request count falling. The Y-axis is logarithmic here so that the course of the usually very few 4xx and possibly 5xx requests can still be seen in sufficient detail.
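The grouping into response code categories is simply the first digit of the HTTP status code. A minimal sketch with made-up sample codes:

```python
from collections import Counter

def category(status_code):
    """Map an HTTP status code to its class, e.g. 204 -> '2xx'."""
    return f"{status_code // 100}xx"

# Invented status codes as they might appear in the request log
codes = [200, 201, 302, 404, 404, 500, 200]
counts = Counter(category(c) for c in codes)
print(counts)
# Counter({'2xx': 3, '4xx': 2, '3xx': 1, '5xx': 1})
```

Summed per bucket, these category counts are what the stacked courses in the visualization show.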

The metrics (18) - (32) show the total number of the following actions that succeeded (left side) and failed (right side):

(18) object (document) creations with a content file (the endpoint can also be called without actually passing a content-file, this can not be distinguished here)

(19) object creations without a content-file (documents and folders)

(20) subsequent additions of a content-file to an existing object (document)

(21) batch creations of objects (only the number of batch requests. The actual number of objects created using this endpoint is not shown here.)

(22) updates to the index data of an object

(23) batch updates to the index data of objects (only the number of batch requests. The actual number of objects updated using this endpoint is not shown here.)

(24) object deletions (includes soft (to the recycle bin) and hard (actual deletion) deletions)

(25) batch deletions of objects (only the number of batch requests. The actual number of objects deleted using this endpoint is not shown here.)

(26) retrievals of the PDF rendition of an object (document with content-file)

(27) retrievals of the original content-file of an object (document with content-file)

(28) search requests excluding aggregations

(29) aggregation requests (previews the number of objects that the current search configuration would yield)

(30) creations of objects using the prepared state of the client (i.e. clicking on "file" or "file and open" in the prepared state of the client)

(31) executed ETL configurations (Be aware that ETL is already deprecated and will soon not be supported anymore. We recommend using Talend Open Studio.)

(32) Starts of workflow instances

(33) This metric set shows the count of requests sent to the top 10 - i.e. most often called - endpoints of the dms service's REST API. The value is the sum of requests made during the entire time range chosen. This gives a rough estimate of which endpoints are most important to the system.

(34) This metric set is analogous to (33), except that it is restricted to the DmsService subset of endpoints. This gives an estimate of how many and what kind of object manipulations are done.

(35) This visualization corresponds to the "API Trace" page of the REST API website. It shows one table for each of the HTTP response code categories (2xx, 3xx, 4xx, 5xx). In the tables you see the endpoints that responded at least once with the corresponding HTTP response code, sorted by the total number of requests sent to that endpoint (total invocations). Besides the count, you can see the minimum, maximum, total (summed up) and average response duration for the time range chosen. If a table does not show up, like the 5xx table in this example, it means there were no requests that resulted in this response code.
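The per-endpoint statistics in these tables boil down to count/min/max/sum/average over the response durations of each endpoint. A simplified sketch; the endpoint paths and durations are invented for illustration:

```python
from collections import defaultdict

def endpoint_stats(log):
    """Aggregate (endpoint, duration_ms) pairs into per-endpoint
    invocation count, min, max, total and average duration."""
    grouped = defaultdict(list)
    for endpoint, ms in log:
        grouped[endpoint].append(ms)
    return {
        ep: {"count": len(ds), "min": min(ds), "max": max(ds),
             "total": sum(ds), "avg": sum(ds) / len(ds)}
        for ep, ds in grouped.items()
    }

# Hypothetical request log entries: (endpoint, response duration in ms)
log = [("/dms/objects", 50.0), ("/dms/objects", 150.0), ("/search", 80.0)]
stats = endpoint_stats(log)
print(stats["/dms/objects"])
# {'count': 2, 'min': 50.0, 'max': 150.0, 'total': 200.0, 'avg': 100.0}
```

Split such a log by response code category first, and you get one of these tables per category, as in the visualization.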


For more information on how to handle Kibana (manually setting filters, editing existing or creating new visualizations, etc.), please refer to the Kibana documentation at https://www.elastic.co/guide/en/kibana/7.1/index.html.
