Documentation

Monitoring and Alerting using Prometheus

MinIO publishes cluster and node metrics using the Prometheus Data Model. The procedure on this page documents the following:

  • Configuring a Prometheus service to scrape and display metrics from a MinIO deployment

  • Configuring an Alert Rule on a MinIO Metric to trigger an AlertManager action

Prerequisites

This procedure requires the following:

  • An existing Prometheus deployment with a backing AlertManager service

  • An existing MinIO deployment with network access to the Prometheus deployment

  • An mc installation on your local host configured to access the MinIO deployment
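
As a quick check of the last prerequisite, the sketch below registers an alias and confirms that mc can reach the deployment. The alias name myminio, the endpoint, and the MINIO_ACCESS_KEY and MINIO_SECRET_KEY shell variables are placeholders chosen for this illustration, not values taken from this page.

```shell
# Hypothetical alias and endpoint; substitute your own values.
ALIAS="myminio"
ENDPOINT="https://minio.example.net"

# MINIO_ACCESS_KEY and MINIO_SECRET_KEY are assumed to be set in the
# calling shell; they are illustrative names, not MinIO server settings.
if command -v mc >/dev/null 2>&1 \
    && [ -n "${MINIO_ACCESS_KEY:-}" ] && [ -n "${MINIO_SECRET_KEY:-}" ]; then
    # Register the alias, then print deployment status to confirm access.
    mc alias set "$ALIAS" "$ENDPOINT" "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY" \
        && mc admin info "$ALIAS"
else
    echo "mc or credentials not available; skipping alias setup for ${ALIAS}"
fi
```

If mc admin info prints the server list without errors, the same alias can be used as ALIAS in the steps that follow.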

The MinIO Operator supports deploying a per-tenant Prometheus instance configured to support metrics and visualizations. This includes automatically configuring the Tenant to enable the Tenant Console historical metric view.

You can still use this procedure to configure an external Prometheus service to support monitoring and alerting for a MinIO Tenant. You must configure all necessary network control components, such as Ingress or a Load Balancer, to facilitate access between the Tenant and the Prometheus service. This procedure assumes your local host machine can access the Tenant via mc.

Configure Prometheus to Collect and Alert using MinIO Metrics

1) Generate the Scrape Configuration

Use the mc admin prometheus generate command to generate the scrape configuration for use by Prometheus in making scraping requests:

mc admin prometheus generate ALIAS

Replace ALIAS with the alias of the MinIO deployment.

The command returns output similar to the following:

scrape_configs:
- job_name: minio-job
  bearer_token: TOKEN
  metrics_path: /minio/v2/metrics/cluster
  scheme: https
  static_configs:
  - targets: [minio.example.net]

Review the generated fields before using the configuration:

  • Set the job_name to a value associated with the MinIO deployment.

    Use a unique value to ensure isolation of the deployment metrics from any others collected by that Prometheus service.

  • MinIO deployments started with MINIO_PROMETHEUS_AUTH_TYPE set to "public" can omit the bearer_token field.

  • Set the scheme to http for MinIO deployments not using TLS.

  • Set the targets array with a hostname that resolves to the MinIO deployment.

    This can be any single node, or a load balancer/proxy which handles connections to the MinIO nodes.
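
Before handing the configuration to Prometheus, it can help to confirm that the metrics endpoint answers with the generated token. The sketch below assumes the placeholder host and token from the sample output above. The check_minio_metrics function name is made up for this example.

```shell
# Placeholder values copied from the sample scrape configuration.
MINIO_SCHEME="https"
MINIO_HOST="minio.example.net"
MINIO_TOKEN="TOKEN"
METRICS_URL="${MINIO_SCHEME}://${MINIO_HOST}/minio/v2/metrics/cluster"

# Hypothetical helper: fetch the metrics page the way a scrape would,
# sending the bearer token in the Authorization header.
check_minio_metrics() {
    # -f: treat HTTP errors as failures; -sS: quiet, but still print errors
    curl -fsS --max-time 10 \
        -H "Authorization: Bearer ${MINIO_TOKEN}" \
        "$METRICS_URL"
}

echo "Metrics endpoint: ${METRICS_URL}"
```

Run check_minio_metrics to perform the request. A working endpoint returns plain-text metric lines. For deployments that use "public" authentication, the Authorization header can be dropped.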

2) Restart Prometheus with the Updated Configuration

Append the scrape_configs job generated in the previous step to the configuration file:

global:
   scrape_interval: 15s

scrape_configs:
   - job_name: minio-job
     bearer_token: TOKEN
     metrics_path: /minio/v2/metrics/cluster
     scheme: https
     static_configs:
     - targets: [minio.example.net]

Restart Prometheus using the updated configuration file:

prometheus --config.file=prometheus.yaml

3) Analyze Collected Metrics

Prometheus includes an expression browser. You can execute queries there to analyze the collected metrics.

The following query examples return metrics collected by Prometheus:

minio_cluster_disk_online_total{job="minio-job"}[5m]
minio_cluster_disk_offline_total{job="minio-job"}[5m]

minio_bucket_usage_object_total{job="minio-job"}[5m]

minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
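
The same queries can also be sent to the Prometheus HTTP API, which is convenient for scripting. This is a minimal sketch that assumes Prometheus listens on http://localhost:9090. That address and the query_prometheus function name are illustrative only.

```shell
# Assumed Prometheus address; change it to match your deployment.
PROM_URL="http://localhost:9090"
QUERY='minio_cluster_disk_offline_total{job="minio-job"}[5m]'

# Hypothetical helper: run an instant query and print the JSON response.
query_prometheus() {
    # -G sends the URL-encoded data as GET query parameters.
    curl -fsS --max-time 10 -G "${PROM_URL}/api/v1/query" \
        --data-urlencode "query=$1"
}

echo "Example call: query_prometheus '${QUERY}'"
```

Calling query_prometheus "$QUERY" returns a JSON document. Its status field reads "success" when the expression is valid.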

See Available Metrics for a complete list of published metrics.

4) Configure an Alert Rule using MinIO Metrics

You must configure Alert Rules on the Prometheus deployment to trigger alerts based on collected MinIO metrics.

The following example alert rule file provides a baseline of alerts for a MinIO deployment. You can modify these examples or use them as guidance when building your own alerts.

groups:
- name: minio-alerts
  rules:
  - alert: NodesOffline
    expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
    for: 10m
    labels:
      severity: warn
    annotations:
      summary: "Node down in MinIO deployment"
      description: "Node(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"

  - alert: DisksOffline
    expr: avg_over_time(minio_cluster_disk_offline_total{job="minio-job"}[5m]) > 0
    for: 10m
    labels:
      severity: warn
    annotations:
      summary: "Disks down in MinIO deployment"
      description: "Disk(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"

Add the path to the alert file to the Prometheus configuration under the rule_files key:

global:
  scrape_interval: 5s

rule_files:
- minio-alerting.yml

Once an alert triggers, Prometheus sends it to the configured AlertManager service.
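
Syntax errors in a rule file or in the main configuration only surface when Prometheus loads them, so validating both first can save a restart. The sketch below uses promtool, the checker that ships with Prometheus. It assumes the file names used on this page, and the check_with_promtool wrapper is a hypothetical helper.

```shell
RULES_FILE="minio-alerting.yml"
CONFIG_FILE="prometheus.yaml"

# Hypothetical wrapper around "promtool check".
# $1: subcommand ("rules" or "config"); $2: file to validate
check_with_promtool() {
    if [ ! -f "$2" ]; then
        echo "missing file: $2" >&2
        return 1
    fi
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found on PATH" >&2
        return 127
    fi
    promtool check "$1" "$2"
}

check_with_promtool rules "$RULES_FILE" || echo "rule file check did not pass"
check_with_promtool config "$CONFIG_FILE" || echo "config check did not pass"
```

Run the script from the directory that holds both files so that the relative rule_files entry resolves.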

5) (Optional) Configure MinIO Console to Query Prometheus

The Console also supports displaying time-series and historical data by querying a Prometheus service configured to scrape data from the MinIO deployment.

MinIO Console displaying Prometheus-backed Monitoring Data

To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment:
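
The variable names are not listed here. As a sketch only, the example below assumes the deployment reads MINIO_PROMETHEUS_URL and MINIO_PROMETHEUS_JOB_ID. Confirm both names against the MinIO settings reference for your release. The Prometheus address is a hypothetical placeholder, and the job ID mirrors the job_name used in the scrape configuration.

```shell
# Assumed variable names; verify them for your MinIO release.
# Hypothetical Prometheus address reachable from every MinIO node.
export MINIO_PROMETHEUS_URL="https://prometheus.example.net"
# Should match the job_name from the scrape configuration.
export MINIO_PROMETHEUS_JOB_ID="minio-job"
```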

Restart the MinIO deployment and visit the Monitoring pane to see the historical data views.