This guide describes how to install the yuuvis® RAD metrics-manager.
For a successful installation of yuuvis® RAD metrics-manager, you need one metrics-manager installation on each machine that hosts a yuuvis® RAD core-service or service-manager instance you want to monitor. In addition, identify a machine to host the metrics-manager Elasticsearch database, Kibana, Logstash, and optionally elastalert2. This machine should have sufficient free resources to handle the extra load (at least 4 CPUs and 12 GB RAM).
Where to Find the Installers
The setup for yuuvis® RAD metrics-manager is included in the regular product release folder.
Activating the Metrics Log Files
In order for yuuvis® RAD metrics-manager to work properly, the core-service and the service-manager must be configured to write their metrics information to a metrics log file. To do this, follow these steps:
- core-service
  - Navigate to the logging configuration on the REST-WS GUI page at http://<gateway>/rest-ws/#PAGE:monitor/logging
  - Set the logger "com.os.ecm.ws.metrics" to the log level "TRACE".
  - Make sure that "use parent handler" is not checked.
  - The change takes immediate effect.
- service-manager
  - Edit the file <service-manager>\config\application-prod.yml
  - Set the parameter "monitoring.trace.enabled" to true (a minimal example is sketched below this list).
  - Save the file and restart the service-manager.
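A minimal sketch of how this could look in application-prod.yml, assuming the property is written in nested YAML form (a flat monitoring.trace.enabled: true entry works just as well if the file already uses that style):

monitoring:
  trace:
    # write metrics information to the service-manager metrics log file
    enabled: true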
Main Installation (including Elasticsearch, logstash and kibana)
- Double-click to start the setup.
- Click next to start the setup procedure.
- Choose the installation directory.
Attention: Do not use an installation path containing spaces!
- At HTTP port, you can configure the metrics-manager servicewatcher port. This port has no special significance and should only be changed if the default port (8283) is already in use.
- Choose the IP address that metrics services should bind to. This should be the IP address visible to other machines in the LAN/WAN.
- Click next to accept the installation of elasticsearch, kibana and logstash.
- If you have a core-service and/or service-manager installation on this machine, keep filebeat marked for installation and enter the paths to the core-service metrics log file and/or service-manager metrics log file (or use the buttons to the right to open a file selection dialog).
If you have a distributed system, just leave the field(s) of the component(s) located on other machines empty. If you have neither a core-service nor a service-manager installed on this machine, you still need filebeat to send the metricbeat information to logstash. If you do not want system metrics of this machine in your collected data, uncheck both the filebeat and the metricbeat checkboxes.
The path to the metricbeat log file is predefined for you. If, for some reason, it is incorrect or you want to change it, you can do so now.
Under the metricbeat checkbox, make sure that the prefilled IP address is the address of the machine running the logstash service. If not, change it to that address.
- If you chose to install metricbeat, you can now choose whether to install optional metrics modules that collect metrics from the relational database, elasticsearch, and ActiveMQ.
- If you chose to install the database metrics module, enter the JDBC connection string (see the file <core-service>\standalone\configuration\jas-app.xml for a reference; example connection strings are sketched after this list), the username, and the password for the database connection.
- If you chose to install the elasticsearch metrics module, enter the host-address(es) of the elasticsearch server(s) and the username and password to access elasticsearch. You can refer to the file <service-manager>\config\application-es.yml for these values.
- If you chose to install the ActiveMQ metrics module, enter the host address of the messaging-service (within the service-manager) and the path to the jolokia endpoints (the predefined value should be correct already).
- Optionally, you can install the Network Share Monitor by checking the "Install network share monitor" checkbox. Then enter the URI(s) for the network share(s) and the credentials, i.e., the Windows domain name, the Windows domain username, and the password. The path of the data file can be adjusted if desired. It is automatically added to the list of watched files in the configuration of the filebeat component.
- Optionally, you can install elastalert2, which provides alerting for critical and error situations. You can choose to be alerted by email, via MS Teams notification, or both.
- For the email notification, you need to enter valid email (SMTP) server access data and, if necessary, credentials.
- For the MS Teams notification you need to enter the Webhook URL that you get when creating the "incoming webhook" connector for the desired Teams channel. Refer to https://docs.microsoft.com/en-us/microsoftteams/platform/webhooks-and-connectors/how-to/add-incoming-webhook for more information.
- Setup now has all required information. Click next to start the installation.
- If you like, you can start the service right away.
- Click finish to end the setup procedure.
- If metricbeat is installed, it automatically tries to collect JVM runtime information from all microservices and the core-service. For this to work with the core-service, the wildfly-hawtio adapter needs to be deployed there. To do so, copy the file <metrics-manager>\tools\hawtio-wildfly-2.15.0.war to <core-service>\standalone\deployments. It is deployed automatically right away (or at the next start of the core-service).
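For the database metrics module, the JDBC connection string follows the usual JDBC URL syntax. The host names, ports, and database names below are placeholders and must be replaced with the values found in <core-service>\standalone\configuration\jas-app.xml:

Microsoft SQL Server (placeholder host and database name):
jdbc:sqlserver://dbhost:1433;databaseName=yuuvisdb
Oracle (service name syntax):
jdbc:oracle:thin:@dbhost:1521/yuuvisdb
PostgreSQL:
jdbc:postgresql://dbhost:5432/yuuvisdb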
Log-Shipping Installation (only sending log information to Elasticsearch)
- Double-click to start the setup.
- Click next to start the setup procedure.
- Choose the installation directory.
- At HTTP port, you can configure the metrics-manager servicewatcher port. This port has no special significance and should only be changed if the default port (8283) is already in use.
- Choose the IP address that metrics services should bind to. This should be the IP address visible to other machines in the LAN / WAN.
- Uncheck all check boxes. (Elasticsearch, Kibana and Logstash are only for the main installation.)
- If you have a core-service and/or service-manager installation on this machine, keep filebeat marked for installation and enter the paths to the core-service metrics log file and/or service-manager metrics log file (or use the buttons to the right to open a file selection dialog).
If you have a distributed system, just leave the field(s) of the component(s) located on other machines empty. If you have neither a core-service nor a service-manager installed on this machine, you still need filebeat to send the metricbeat information to logstash. If you do not want system metrics of this machine in your collected data, uncheck both the filebeat and the metricbeat checkboxes.
The path to the metricbeat log file is predefined for you. If, for some reason, it is incorrect or you want to change it, you can do so now.
Under the metricbeat checkbox, make sure that the prefilled IP address is the address of the machine running the logstash service. If not, change it to that address (a sketch of the resulting filebeat configuration follows after this list).
- If you chose to install metricbeat, you can now choose whether to install optional metrics modules that collect metrics from the relational database, elasticsearch, and ActiveMQ.
- If you chose to install the database metrics module, enter the jdbc connection-string (see the file <core-service>\standalone\configuration\jas-app.xml for a reference), the username and the password for the database connection.
- If you chose to install the elasticsearch metrics module, enter the host-address(es) of the elasticsearch server(s) and the username and password to access elasticsearch. You can refer to the file <service-manager>\config\application-es.yml for these values.
- If you chose to install the ActiveMQ metrics module, enter the host address of the messaging-service (within the service-manager) and the path to the jolokia endpoints (the predefined value should be correct already).
- Optionally, you can install the Network Share Monitor by checking the "Install network share monitor" checkbox. Then enter the URI(s) for the network share(s) and the credentials, i.e., the Windows domain name, the Windows domain username, and the password. The path of the data file can be adjusted if desired. It is automatically added to the list of watched files in the configuration of the filebeat component.
- Keep elastalert2 unchecked.
- Setup now has all required information. Click next to start the installation.
- If you like, you can start the service right away.
- Click finish to end the setup procedure.
- If metricbeat is installed, it automatically tries to collect JVM runtime information from all microservices and the core-service. For this to work with the core-service, the wildfly-hawtio adapter needs to be deployed there. To do so, copy the file <metrics-manager>\tools\hawtio-wildfly-2.15.0.war to <core-service>\standalone\deployments. It is deployed automatically right away (or at the next start of the core-service).
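For orientation, here is a minimal sketch of the filebeat configuration resulting from these choices. The installer writes the actual file for you, so this is only an illustration: the log file paths are hypothetical, and the logstash address must be the IP of the machine running the main installation (5044 is the beats port used by the bundled logstash, as also shown in the TLS example at the end of this guide):

filebeat.inputs:
  # core-service metrics log file (hypothetical path)
  - type: log
    paths:
      - 'D:\yuuvis\core-service\standalone\log\metrics.log'
  # service-manager metrics log file (hypothetical path)
  - type: log
    paths:
      - 'D:\yuuvis\service-manager\logs\metrics.log'
  # the metricbeat log file and the network share monitor data file are added here as well

output.logstash:
  # IP address of the machine running the logstash service
  hosts: ["10.10.10.10:5044"]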
Starting Kibana
- On the machine containing the main installation, open a browser and navigate to http://<local-ip-address>:5601
- The kibana frontend will show up.
- Log in with your credentials. The default is elastic / optimal.
- You are automatically forwarded to the predefined metrics manager dashboard.
- You can watch the data coming in and start to explore it. If you want to, you can also define your own visualizations and/or dashboards.
Elastalert2 (optional)
If you chose to use elastalert2 to receive email/Teams notifications about critical and error situations, these are the predefined rules that trigger an alert:
System:
- server_down.yaml: less than 5 documents with host.name.keyword = Servername for 3 minutes
- high_cpu_load.yaml: system.cpu.total.normalized.pct > 90% for 15 minutes per host.name.keyword
- high_ram_utilization.yaml: system.memory.used.pct > 95% for 10 minutes per host.name.keyword
- network_error_spike.yaml: system.network.in.errors + system.network.out.errors > 100 || 2 * previous timeframe value over 2x 1 minute
- hdd_utilization_warning.yaml: system.filesystem.used.pct > 95% for 10 minutes per host.name.keyword and system.filesystem.device_name.keyword
- hdd_full.yaml: system.filesystem.free < 1GB for 10 minutes per host.name.keyword and system.filesystem.device_name.keyword
- jvm_heap_usage_warning.yaml: jolokia.services.memory.heap_usage.used_pct > 0.90 for 3 minutes
- jvm_heap_usage_overload.yaml: jolokia.services.memory.heap_usage.used_pct > 0.98 for 3 minutes

yuuvis:
- failed_logins_warning.yaml: data.headers.response.x-os-autherror: "USERNAME_PASSWORD_INVALID" > 3x in 10 minutes per data.authdetails.user.keyword
- brute_force_warning.yaml: data.headers.response.x-os-autherror: "USERNAME_PASSWORD_INVALID" > 50x in 15 minutes
- login_try_to_locked_account.yaml: data.headers.response.x-os-autherror: "ACCOUNT_LOCKED" > 1x in 5 minutes per data.authdetails.user.keyword
- http_5xx_spike.yaml: data.headers.response.status = 500..599 > 100 || 2 * previous timeframe value over 2x 1 minute per data.headers.response.status
- http_5xx_percentage.yaml: data.headers.response.status = 500..599 for > 2% of all statuses, at least 20, in 5 minutes per data.headers.response.status
- http_4xx_spike.yaml: data.headers.response.status = 400..499 > 20 || 1.3 * previous timeframe value over 2x 1 minute per data.headers.response.status
- http_4xx_percentage.yaml: data.headers.response.status = 400..499 for > 2% of all statuses, at least 20, in 5 minutes per data.headers.response.status
- search_latency.yaml: duration_ms > 300 AND servicename:search.72 AND data.path:"/search" NOT data.path:"/search/aggregate" NOT data.path:"/search/storedqueries" for 5 minutes, at least 15 searches
- microservice_went_down.yaml: service.name appeared at least once and then not anymore within 2 minutes per servicename
- activemq_broker_down.yaml: activemq.broker.name.keyword appeared at least once and then not anymore within 3 minutes
- activemq_error_queue_size.yaml: activemq.queue.messages.size.avg > 1 for 6 hours in queues "errors" and "ActiveMQ.DLQ"
- activemq_queue_size_congestion.yaml: activemq.queue.messages.size.avg > 100 for 12 hours
- metricsdata_missing.yaml: "host.name" and "log.file.path" appeared at least once and then not anymore within 5 minutes

Elasticsearch:
- elasticsearch_cluster_down.yaml: elasticsearch.cluster.name appeared at least once and then not anymore within 3 minutes
- elasticsearch_cluster_state_change.yaml: elasticsearch.cluster.stats.status changed its value in the last 1 minute
- elasticsearch_index_state_change.yaml: elasticsearch.index.status changed its value in the last 1 minute
- elasticsearch_shard_state_change.yaml: elasticsearch.shard.state changed its value in the last 1 minute

Relational Database:
- mssql_database_down.yaml: mssql.database.name appeared at least once and then not anymore within 3 minutes
- mssql_transaction_log_full.yaml: mssql.transaction_log.space_usage.used.pct > 90% for 10 minutes
- mssql_transaction_log_utilization_warning.yaml: mssql.transaction_log.space_usage.used.pct > 98% for 10 minutes
- oracle_database_down.yaml: oracle.tablespace.name.keyword appeared at least once and then not anymore within 3 minutes
- oracle_tablespace_full.yaml: oracle.tablespace.space.used.pct > 95% for 10 minutes
- postgresql_database_down.yaml: postgresql.database.name appeared at least once and then not anymore within 3 minutes
- postgresql_database_rollback_spike.yaml: postgresql.database.transactions.rollback > 2 * previous timeframe value over 2x 1 minute
You can find these files in <metrics-manager>\config\elastalert\elastalert-rules\. If you want to add new rules or adapt the values of the existing rules, simply add new files or edit the existing ones (a minimal example rule is sketched below). Changes take effect within 1-2 minutes at runtime; no restart is necessary.
You can find the documentation here: https://elastalert2.readthedocs.io/en/latest/ruletypes.html
The list of email recipients is globally defined in the <metrics-manager>\config\elastalert\elastalert.yaml file in the 'email' field. The value can either be a single address or an array of addresses in the form ["recipient@one", "recipient@two", ...]. You can also overwrite this list within the rule files.
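As an orientation for adding your own rules, here is a minimal sketch of a frequency-type elastalert2 rule file. The rule name, index pattern, query, and recipient address are hypothetical and need to be adapted to your environment; see the elastalert2 documentation linked above for all available rule types and options.

# <metrics-manager>\config\elastalert\elastalert-rules\my_custom_rule.yaml (hypothetical example)
name: http_429_spike_custom
type: frequency
index: logstash-*
# alert if more than 20 matching documents arrive within 5 minutes
num_events: 20
timeframe:
  minutes: 5
filter:
  - query:
      query_string:
        query: 'data.headers.response.status: 429'
alert:
  - email
# overrides the global recipient list from elastalert.yaml
email:
  - "monitoring@example.com"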
Enabling HTTPS Communication
To enable HTTPS communication for Kibana (external) and/or for Elasticsearch, Logstash, Metricbeat and Filebeat (internal), follow the instructions below:
Kibana
- For Kibana, an SSL/TLS certificate in .cer/.crt and .key format is required. Place these two files in the <metrics-manager>\config folder.
- Open the <metrics-manager>\config\kibana.yml file for editing.
- Uncomment the following three lines and replace certificate.cer and certificate.key with the file names of your certificate files:
server.ssl.enabled: true
server.ssl.certificate: ../../../config/certificate.cer
server.ssl.key: ../../../config/certificate.key
- Find the below lines and replace the hostname with the exact hostname defined in the certificate:
server.host: "metrics.optimal-systems.de"
server.name: "metrics.optimal-systems.de"
- Find the below line and change the protocol from http to https:
server.publicBaseUrl: "https://metrics.optimal-systems.de:5601"
Note: Do the following only if you are also enabling HTTPS for Elasticsearch.
- Find the below line and change the protocol from http to https:
elasticsearch.hosts: ["https://metrics.optimal-systems.de:5200"]
- Find the below line and uncomment it. If the used certificate is self-signed, set the value to none; otherwise, leave it at full.
elasticsearch.ssl.verificationMode: none
- Save the file and restart Kibana. It is now accessible via https://metrics.optimal-systems.de:5601.
Elasticsearch
To enable HTTPS in elasticsearch, a certificate in .p12 format (the same as for the gateway microservice) can be used. If Elasticsearch is set to HTTPS communication, the configuration of Kibana and Logstash needs to be changed so that https is used for communication with Elasticsearch. This can be done by following the below steps:
- Elasticsearch
  - Place the certificate file in the <metrics-manager>\config\elasticsearch folder.
  - Edit the <metrics-manager>\config\elasticsearch\elasticsearch.yml file.
  - Add the following lines at the end of the file. Replace certificate.p12 with the filename of your certificate and 'password' with the password for your certificate:
    xpack.security.http.ssl.enabled: true
    xpack.security.http.ssl.verification_mode: certificate
    xpack.security.http.ssl.keystore.path: certificate.p12
    xpack.security.http.ssl.keystore.password: password
  - Save the file and restart Elasticsearch. It is now available at https://<certificate-hostname>:5200.
- Kibana
  - If not already configured in the above steps (Kibana), follow these steps to use HTTPS communication with Elasticsearch.
  - Find the below line and change the protocol from http to https:
    elasticsearch.hosts: ["https://<certificate-hostname>:5200"]
  - Find the below line and uncomment it. If the used certificate is self-signed, set the value to none; otherwise, leave it at full.
    elasticsearch.ssl.verificationMode: none
- Logstash
  - Edit the <metrics-manager>\config\logstash\logstash.conf file.
  - Find the following lines and change the URL from http://<ip>:5200 to https://<certificate-hostname>:5200:
    output {
      elasticsearch {
        hosts => ["https://metrics.optimal-systems.de:5200"]
Logstash
- For Logstash, an SSL/TLS certificate in .cer/.crt and .key format is required. The .key file needs to be in unencrypted PKCS8 format. Place these two files in the <metrics-manager>\config\logstash folder.
- Open the <metrics-manager>\config\logstash\logstash.conf file and expand the input section to look like below:
input {
  # input from filebeat
  beats {
    # the port to listen on
    port => 5044
    ssl => true
    ssl_certificate => "D:\yuuvis\metrics-manager\config\logstash\certificate.cer"
    ssl_key => "D:\yuuvis\metrics-manager\config\logstash\certificate.key"
    ssl_verify_mode => "none"
  }
}
- Replace the certificate.cer and certificate.key file names with the actual names of the certificate files. If the certificate is self-signed, use ssl_verify_mode with value none (as shown above); otherwise, use force_peer as the value. Only absolute paths are valid.