Monitoring and Alerting using InfluxDB

MinIO publishes cluster and node metrics using the Prometheus Data Model. InfluxDB supports scraping MinIO metrics data for monitoring and alerting.

The procedure on this page documents the following:

  • Configuring an InfluxDB service to scrape and display metrics from a MinIO deployment

  • Configuring an Alert on a MinIO metric

Prerequisites

This procedure requires the following:

  • An existing InfluxDB deployment configured with one or more notification endpoints

  • An existing MinIO deployment with network access to the InfluxDB deployment

  • An mc installation on your local host configured to access the MinIO deployment

Configure InfluxDB to Collect and Alert using MinIO Metrics

Important

This procedure specifically uses the InfluxDB UI to create a scraping endpoint.

The InfluxDB UI does not provide the same level of configuration as using Telegraf and the corresponding Prometheus plugin. Specifically:

  • You cannot enable authenticated access to the MinIO metrics endpoint via the InfluxDB UI

  • You cannot set a tag for collected metrics (e.g. url_tag) for uniquely identifying the metrics for a given MinIO deployment

Configuring Telegraf is out of scope for this procedure. You can use this procedure as general guidance for configuring Telegraf to scrape MinIO metrics.

  1. Configure Public Access to MinIO Metrics

    Set the MINIO_PROMETHEUS_AUTH_TYPE environment variable to "public" on all nodes in the MinIO deployment. Then restart the deployment so that the change takes effect and the metrics endpoint accepts unauthenticated requests.

    You can validate the change by requesting the metrics endpoint with curl:

    curl https://HOSTNAME/minio/v2/metrics/cluster
    

    Replace HOSTNAME with the hostname of the load balancer or reverse proxy through which you access the MinIO deployment. Alternatively, specify any single node as HOSTNAME:PORT, where PORT is the MinIO server API port.

    The response body should include a list of collected MinIO metrics.
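The same validation can be scripted. The following is a minimal sketch, assuming the endpoint returns the Prometheus text exposition format; the function names parse_metric_names and fetch_metric_names are hypothetical and chosen for illustration only.

```python
from urllib.request import urlopen


def parse_metric_names(body: str) -> set:
    """Return the metric names found in a Prometheus text-format response body."""
    names = set()
    for line in body.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="v"} 123` or `name 123`.
        names.add(line.split("{", 1)[0].split()[0])
    return names


def fetch_metric_names(url: str, timeout: float = 10.0) -> set:
    """Request the metrics endpoint and return the metric names it exposes."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metric_names(response.read().decode("utf-8"))
```

Calling fetch_metric_names("https://HOSTNAME/minio/v2/metrics/cluster") with your own hostname should return a non-empty set if public access is enabled. An HTTP error at this point usually means that the endpoint still requires authentication.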

  2. Log into the InfluxDB UI and Create a Bucket

    Select the Organization under which you want to store MinIO metrics.

    Create a New Bucket in which to store metrics for the MinIO deployment.

  3. Create a new Scraping Source

    Create a new InfluxDB Scraper.

    Specify the full URL to the MinIO deployment, including the metrics endpoint:

    https://HOSTNAME/minio/v2/metrics/cluster
    

    Replace HOSTNAME with the hostname of the load balancer or reverse proxy through which you access the MinIO deployment. Alternatively, specify any single node as HOSTNAME:PORT, where PORT is the MinIO server API port.
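This procedure uses the InfluxDB UI, but the scraper URL and an equivalent API request can be sketched in code. The example below assumes an InfluxDB 2.x deployment that exposes the POST /api/v2/scrapers endpoint with token authentication; the helper names, the default scraper name "minio-cluster", and all argument values are hypothetical.

```python
import json
from urllib.request import Request


def metrics_url(host: str, port=None, scheme: str = "https") -> str:
    """Build the MinIO cluster metrics URL for a proxy hostname or a single node."""
    netloc = host if port is None else f"{host}:{port}"
    return f"{scheme}://{netloc}/minio/v2/metrics/cluster"


def scraper_request(influx_url, token, org_id, bucket_id, target_url,
                    name="minio-cluster"):
    """Prepare (but do not send) a request that creates a Prometheus scraper.

    Assumption: InfluxDB 2.x accepts this JSON body at /api/v2/scrapers.
    """
    payload = {
        "name": name,
        "type": "prometheus",
        "url": target_url,
        "orgID": org_id,
        "bucketID": bucket_id,
    }
    return Request(
        f"{influx_url.rstrip('/')}/api/v2/scrapers",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen would send it. The function only builds the request, so you can inspect the URL and body before anything reaches the InfluxDB deployment.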

  4. Validate the Data

    Use the Data Explorer to visualize the collected MinIO data.

    For example, you can set a filter on minio_cluster_capacity_usable_total_bytes and minio_cluster_capacity_usable_free_bytes to compare the total usable capacity against the free usable capacity of the MinIO deployment.
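The comparison between the two capacity metrics reduces to simple arithmetic. The following sketch shows the calculation outside InfluxDB, assuming you have already read one sample of each metric; the function name usable_capacity_used_percent is hypothetical.

```python
def usable_capacity_used_percent(total_bytes: float, free_bytes: float) -> float:
    """Return the percentage of usable capacity that is currently consumed.

    total_bytes: a sample of minio_cluster_capacity_usable_total_bytes
    free_bytes:  a sample of minio_cluster_capacity_usable_free_bytes
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= free_bytes <= total_bytes:
        raise ValueError("free_bytes must be between 0 and total_bytes")
    return 100.0 * (total_bytes - free_bytes) / total_bytes
```

For example, a deployment reporting 1000 usable bytes with 250 bytes free is 75 percent full.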

  5. Configure a Check

    Create a new Check on a MinIO metric.

    The following example check rules provide a baseline of alerts for a MinIO deployment. You can modify or otherwise use these examples for guidance in building your own checks.

    • Create a Threshold Check named MINIO_NODE_DOWN.

      Set the filter for the minio_cluster_nodes_offline_total key.

      Set the Thresholds to WARN when the value is greater than 1.

    • Create a Threshold Check named MINIO_QUORUM_WARNING.

      Set the filter for the minio_cluster_drive_offline_total key.

      Set the Thresholds to CRITICAL when the value is one less than your configured Erasure Code Parity setting.

      For example, a deployment using EC:4 should set this value to 3.
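The logic of the two example checks can be sketched as follows. This is an illustration only, not how InfluxDB evaluates checks: the function names are hypothetical, and treating the quorum check as "offline drives greater than or equal to the threshold" is an assumption that you should match to the comparison you select in the UI.

```python
def quorum_warning_threshold(parity: int) -> int:
    """Return the MINIO_QUORUM_WARNING threshold: one less than the parity setting."""
    if parity < 1:
        raise ValueError("parity must be at least 1")
    return parity - 1


def evaluate_checks(nodes_offline: int, drives_offline: int, parity: int) -> dict:
    """Return the level each example check would report for the given samples."""
    levels = {}
    # MINIO_NODE_DOWN: WARN when minio_cluster_nodes_offline_total is greater than 1.
    levels["MINIO_NODE_DOWN"] = "WARN" if nodes_offline > 1 else "OK"
    # MINIO_QUORUM_WARNING: CRITICAL at parity - 1 offline drives.
    # Assumption: ">=" is used here; the UI comparison you choose may differ.
    threshold = quorum_warning_threshold(parity)
    levels["MINIO_QUORUM_WARNING"] = (
        "CRITICAL" if drives_offline >= threshold else "OK"
    )
    return levels
```

With EC:4, quorum_warning_threshold(4) returns 3, which matches the value given in the example above.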

    Configure your Notification endpoints and Notification rules such that checks of each type trigger an appropriate response.