Metrics and alerts
MinIO publishes metrics using the Prometheus Data Model. You can use any scraping tool to pull metrics data from MinIO for further analysis and alerting.
Starting with MinIO Server RELEASE.2024-07-15T19-02-30Z and MinIO Client RELEASE.2024-07-11T18-01-28Z, metrics version 3 provides additional endpoints. MinIO recommends version 3 for new deployments.
Metrics version 2
Existing deployments can continue to use version 2 metrics and Grafana dashboards.
Version 3 endpoints
For metrics version 3, all metrics are available under the base /minio/metrics/v3 endpoint.
You can scrape the base endpoint to collect all metrics in a single operation, or append an optional path to return a specific category.
For example, the following endpoint returns audit metrics:
http://HOSTNAME:PORT/minio/metrics/v3/audit
Replace HOSTNAME:PORT with the FQDN and port of the MinIO deployment.
For deployments with a load balancer managing connections between MinIO nodes, specify the address of the load balancer.
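The URL layout described above can be sketched as a small helper. This is only an illustration: the function name `metrics_url` and the example host are assumptions, while the base path and the `audit` category come from the text.

```python
BASE_PATH = "/minio/metrics/v3"  # base endpoint named in this section


def metrics_url(host_port: str, category: str = "", scheme: str = "http") -> str:
    """Build a version 3 metrics URL, optionally narrowed to one category path."""
    url = f"{scheme}://{host_port}{BASE_PATH}"
    suffix = category.strip("/")  # tolerate "audit", "/audit", or "/audit/"
    return f"{url}/{suffix}" if suffix else url


# Hypothetical host; use the load balancer address if one fronts the deployment.
print(metrics_url("minio.example.net:9000", "audit"))
```

Calling the helper without a category returns the base endpoint, which collects all metrics in a single scrape.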
By default, MinIO requires authentication to scrape the metrics endpoints.
To generate the needed bearer tokens, use mc admin prometheus generate.
You can also disable metrics endpoint authentication by setting MINIO_PROMETHEUS_AUTH_TYPE to public.
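As a rough sketch of an authenticated scrape, the following uses only the Python standard library. It assumes the bearer token is sent in a standard `Authorization: Bearer` header, as Prometheus-style scrapers do; the function names are hypothetical, and the token is a value you obtain separately.

```python
import urllib.request


def build_scrape_request(url, token=None):
    """Create a request, attaching the bearer token when one is provided."""
    headers = {}
    if token:
        # Assumption: the endpoint accepts a standard bearer Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def scrape(url, token=None, timeout=10.0):
    """Fetch the metrics text from the endpoint and return it as a string."""
    request = build_scrape_request(url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Omitting the token produces an unauthenticated request, which matches a deployment where authentication has been set to public.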
MinIO provides the following scraping endpoints, relative to the base URL:
| Category | Path |
|---|---|
| API | |
| Audit | |
| Cluster | |
| Debug | |
| ILM | |
| Logger webhook | |
| Notification | |
| Replication | |
| Scanner | |
| System | |
For a complete list of metrics for each endpoint, see Available version 3 metrics.
The MinIO Operator supports deploying a per-tenant Prometheus instance configured to support metrics and visualization.
If you deploy the Tenant with this feature disabled but still want the historical metric views, you can instead configure an external Prometheus service to scrape the Tenant metrics. Once configured, you can update the Tenant to query that Prometheus service to retrieve metric data:
- Set MINIO_PROMETHEUS_URL to the URL of the Prometheus service.
- Set MINIO_PROMETHEUS_JOB_ID to the unique job ID assigned to the collected metrics.
Available version 3 metrics
MinIO publishes a number of metrics for clusters, API requests, buckets, and other aspects of the MinIO service.
Many metrics include labels identifying the resource that generated the metric and other relevant details.
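To illustrate how labels accompany a metric, here is a minimal parser for the Prometheus text format. The metric names and label names in the sample are placeholders invented for the example, not actual MinIO metrics, and the parser covers only simple sample lines.

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments carry no sample data
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Placeholder data: these metric and label names are illustrative only.
sample_text = """\
# TYPE example_requests_total counter
example_requests_total{server="node1:9000",type="s3"} 42
example_up 1
"""
for name, labels, value in parse_samples(sample_text):
    print(name, labels, value)
```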
API metrics
Metrics about requests served by the current node.
| Path | Description |
|---|---|
| /api/requests | Metrics over all requests. |
| /bucket/api | Metrics over all requests for a given bucket. |
/api/requests
| Name | Description | Labels |
|---|---|---|
| | Total number of requests rejected for auth failure. | |
| | Total number of requests rejected for invalid header. | |
| | Total number of requests rejected for invalid timestamp. | |
| | Total number of invalid requests. | |
| | Total number of requests in the waiting queue. | |
| | Total number of incoming requests. | |
| | Total number of requests currently in flight. | |
| | Total number of requests. | |
| | Total number of requests with 4xx or 5xx errors. | |
| | Total number of requests with 5xx errors. | |
| | Total number of requests with 4xx errors. | |
| | Total number of requests canceled by the client. | |
| | Distribution of time to first byte across API calls. | |
| | Total number of bytes sent. | |
| | Total number of bytes received. | |
/bucket/api
| Name | Description | Labels |
|---|---|---|
| | Total number of bytes sent for a bucket. | |
| | Total number of bytes received for a bucket. | |
| | Total number of requests currently in flight for a bucket. | |
| | Total number of requests for a bucket. | |
| | Total number of requests canceled by the client for a bucket. | |
| | Total number of requests with 4xx errors for a bucket. | |
| | Total number of requests with 5xx errors for a bucket. | |
| | Distribution of time to first byte across API calls for a bucket. | |
Audit metrics
Metrics about the MinIO audit functionality.
| Path | Description |
|---|---|
| /audit | Metrics related to audit functionality. |
/audit
| Name | Description | Labels |
|---|---|---|
| | Total number of messages that failed to send since start. | |
| | Number of unsent messages in queue for target. | |
| | Total number of messages sent since start. | |
Cluster metrics
Metrics about an entire MinIO cluster.
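| Path | Description |
|---|---|
| /cluster/config | Cluster configuration metrics. |
| /cluster/erasure-set | Erasure set metrics. |
| /cluster/health | Cluster health metrics. |
| /cluster/iam | Cluster IAM metrics. |
| /cluster/usage/buckets | Object statistics by bucket. |
| /cluster/usage/objects | Object statistics. |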
/cluster/config
| Name | Description | Labels |
|---|---|---|
| | Reduced redundancy storage class parity. | |
| | Standard storage class parity. | |
/cluster/erasure-set
| Name | Description | Labels |
|---|---|---|
| | Overall write quorum across pools and sets. | |
| | Overall health across pools and sets (1=healthy, 0=unhealthy). | |
| | Read quorum for the erasure set in a pool. | |
| | Write quorum for the erasure set in a pool. | |
| | Count of online drives in the erasure set in a pool. | |
| | Count of healing drives in the erasure set in a pool. | |
| | Health of the erasure set in a pool (1=healthy, 0=unhealthy). | |
| | Number of drive failures that can be tolerated without disrupting read operations. | |
| | Number of drive failures that can be tolerated without disrupting write operations. | |
| | Health of the erasure set in a pool for read operations (1=healthy, 0=unhealthy). | |
| | Health of the erasure set in a pool for write operations (1=healthy, 0=unhealthy). | |
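Because the health metrics above use 1 for healthy and 0 for unhealthy, an alerting check reduces to finding samples whose value is 0. The sketch below assumes samples are already parsed into (name, labels, value) tuples; the metric name and the `pool` and `set` labels are hypothetical stand-ins, not the actual identifiers.

```python
# Hypothetical metric name used only for this illustration.
HEALTH_METRIC = "example_erasure_set_health"


def unhealthy_sets(samples, metric=HEALTH_METRIC):
    """Return the labels of every sample of `metric` that reports 0 (unhealthy)."""
    return [labels for name, labels, value in samples
            if name == metric and value == 0]


# Illustrative data: label names "pool" and "set" are assumptions.
samples = [
    ("example_erasure_set_health", {"pool": "0", "set": "0"}, 1.0),
    ("example_erasure_set_health", {"pool": "0", "set": "1"}, 0.0),
]
print(unhealthy_sets(samples))
```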
/cluster/health
| Name | Description | Labels |
|---|---|---|
| | Count of offline drives in the cluster. | |
| | Count of online drives in the cluster. | |
| | Count of all drives in the cluster. | |
| | Count of offline nodes in the cluster. | |
| | Count of online nodes in the cluster. | |
| | Total cluster raw storage capacity in bytes. | |
| | Total cluster raw storage free in bytes. | |
| | Total cluster usable storage capacity in bytes. | |
| | Total cluster usable storage free in bytes. | |
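The capacity and free-space values above can be combined into a used-space percentage for alerting. This is a minimal sketch with a hypothetical function name; it applies equally to the raw or the usable pair, provided both inputs come from the same pair.

```python
def used_percent(capacity_bytes: float, free_bytes: float) -> float:
    """Return the percentage of capacity in use, or 0.0 if capacity is unknown."""
    if capacity_bytes <= 0:
        return 0.0  # avoid dividing by zero before metrics are populated
    return 100.0 * (capacity_bytes - free_bytes) / capacity_bytes


# Example: 1000 bytes of capacity with 250 bytes free.
print(used_percent(1000, 250))
```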
/cluster/iam
| Name | Description | Labels |
|---|---|---|
| | Last successful IAM data sync duration in milliseconds. | |
| | When plugin authentication is configured, returns failed requests count in the last full minute. | |
| | When plugin authentication is configured, returns time (in seconds) since the last failed request to the service. | |
| | When plugin authentication is configured, returns time (in seconds) since the last successful request to the service. | |
| | When plugin authentication is configured, returns average round-trip time of successful requests in the last full minute. | |
| | When plugin authentication is configured, returns maximum round-trip time of successful requests in the last full minute. | |
| | When plugin authentication is configured, returns total requests count in the last full minute. | |
| | Time (in milliseconds) since last successful IAM data sync. | |
| | Number of failed IAM data syncs since server start. | |
| | Number of successful IAM data syncs since server start. | |
/cluster/usage/buckets
| Name | Description | Labels |
|---|---|---|
| | Time since last update of usage metrics in seconds. | |
| | Total bucket size in bytes. | |
| | Total object count in bucket. | |
| | Total object versions count in bucket, including delete markers. | |
| | Total delete markers count in bucket. | |
| | Total bucket quota in bytes. | |
| | Bucket object size distribution. | |
| | Bucket object version count distribution. | |
/cluster/usage/objects
| Name | Description | Labels |
|---|---|---|
| | Time since last update of usage metrics in seconds. | |
| | Total cluster usage in bytes. | |
| | Total cluster objects count. | |
| | Total cluster object versions count, including delete markers. | |
| | Total cluster delete markers count. | |
| | Total cluster buckets count. | |
| | Cluster object size distribution. | |
| | Cluster object version count distribution. | |
Debug metrics
Standard Go runtime metrics from the Prometheus Go Client base collector.
| Path | Description |
|---|---|
| | Go runtime metrics. |
ILM metrics
Metrics about the MinIO ILM functionality.
| Path | Description |
|---|---|
| /ilm | Metrics related to ILM functionality. |
/ilm
| Name | Description | Labels |
|---|---|---|
| | Number of pending ILM expiry tasks in the queue. | |
| | Number of active ILM transition tasks. | |
| | Number of pending ILM transition tasks in the queue. | |
| | Number of missed immediate ILM transition tasks. | |
| | Total number of object versions checked for ILM actions since server start. | |
Logger webhook metrics
Metrics about MinIO logger webhooks.
| Path | Description |
|---|---|
| /logger/webhook | Metrics related to logger webhooks. |
/logger/webhook
| Name | Description | Labels |
|---|---|---|
| | Number of messages that failed to send. | |
| | Webhook queue length. | |
| | Total number of messages sent to this target. | |
Notification metrics
Metrics about the MinIO notification functionality.
| Path | Description |
|---|---|
| /notification | Metrics related to notification functionality. |
/notification
| Name | Description | Labels |
|---|---|---|
| | Number of concurrent async Send calls active to all targets. | |
| | Total number of events that failed to send to the targets. | |
| | Total number of events sent to the targets. | |
| | Number of events not sent to the targets due to the in-memory queue being full. | |
Replication metrics
Metrics about MinIO site and bucket replication.
| Path | Description |
|---|---|
| /bucket/replication | Metrics related to bucket replication. |
| /replication | Metrics related to site replication. |
/replication
| Name | Description | Labels |
|---|---|---|
| | Average number of active replication workers. | |
| | Average number of bytes queued for replication since server start. | |
| | Average number of objects queued for replication since server start. | |
| | Average replication data transfer rate in bytes/sec. | |
| | Total number of active replication workers. | |
| | Current replication data transfer rate in bytes/sec. | |
| | Number of bytes queued for replication in the last full minute. | |
| | Number of objects queued for replication in the last full minute. | |
| | Maximum number of active replication workers seen since server start. | |
| | Maximum number of bytes queued for replication since server start. | |
| | Maximum number of objects queued for replication since server start. | |
| | Maximum replication data transfer rate in bytes/sec since server start. | |
| | Total number of objects seen in replication backlog in the last 5 minutes. | |
/bucket/replication
| Name | Description | Labels |
|---|---|---|
| | Total number of bytes on a bucket which failed to replicate at least once in the last hour. | |
| | Total number of objects on a bucket which failed to replicate in the last hour. | |
| | Total number of bytes on a bucket which failed to replicate at least once in the last full minute. | |
| | Total number of objects on a bucket which failed to replicate in the last full minute. | |
| | Replication latency on a bucket in milliseconds. | |
| | Number of DELETE tagging requests proxied to replication target. | |
| | Number of failures in GET requests proxied to replication target. | |
| | Number of GET requests proxied to replication target. | |
| | Number of failures in GET tagging requests proxied to replication target. | |
| | Number of GET tagging requests proxied to replication target. | |
| | Number of failures in HEAD requests proxied to replication target. | |
| | Number of HEAD requests proxied to replication target. | |
| | Number of failures in PUT tagging requests proxied to replication target. | |
| | Number of PUT tagging requests proxied to replication target. | |
| | Total number of bytes replicated to the target. | |
| | Total number of objects replicated to the target. | |
| | Total number of bytes that failed to replicate at least once since server start. | |
| | Total number of objects that failed to replicate since server start. | |
| | Number of failures in DELETE tagging requests proxied to replication target. | |
Scanner metrics
Metrics about the MinIO scanner.
| Path | Description |
|---|---|
| /scanner | Metrics related to the MinIO scanner. |
/scanner
| Name | Description | Labels |
|---|---|---|
| | Total number of bucket scans completed since server start. | |
| | Total number of bucket scans started since server start. | |
| | Total number of directories scanned since server start. | |
| | Time elapsed (in seconds) since last scan activity. | |
| | Total number of unique objects scanned since server start. | |
| | Total number of object versions scanned since server start. | |
System metrics
Metrics about the MinIO process and the node.
| Path | Description |
|---|---|
| /system/cpu | Metrics about CPUs on the system. |
| /system/drive | Metrics about drives on the system. |
| /system/network/internode | Metrics about internode requests made by the node. |
| /system/memory | Metrics about memory on the system. |
| /system/process | Standard process metrics. |
/system/drive
| Name | Description | Labels |
|---|---|---|
| | Total storage used on a drive in bytes. | |
| | Total storage free on a drive in bytes. | |
| | Total storage available on a drive in bytes. | |
| | Total used inodes on a drive. | |
| | Total free inodes on a drive. | |
| | Total inodes available on a drive. | |
| | Total timeout errors on a drive. | |
| | Total I/O errors on a drive. | |
| | Total availability errors (I/O errors, timeouts) on a drive. | |
| | Total waiting I/O operations on a drive. | |
| | Average last minute latency in µs for drive API storage operations. | |
| | Count of offline drives. | |
| | Count of online drives. | |
| | Count of all drives. | |
| | Drive health (0 = offline, 1 = healthy, 2 = healing). | |
| | Reads per second on a drive. | |
| | Kilobytes read per second on a drive. | |
| | Average time for read requests served on a drive. | |
| | Writes per second on a drive. | |
| | Kilobytes written per second on a drive. | |
| | Average time for write requests served on a drive. | |
| | Percentage of time the disk was busy. | |
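The drive health metric above encodes three states as numbers. A small lookup, sketched here with hypothetical names, can turn a scraped value into a readable state for dashboards or alerts.

```python
# Codes documented for the drive health metric: 0 = offline, 1 = healthy, 2 = healing.
DRIVE_HEALTH = {0: "offline", 1: "healthy", 2: "healing"}


def describe_drive_health(value: float) -> str:
    """Map a scraped health value to its state, or "unknown" for other values."""
    # Floats such as 1.0 compare and hash equal to the integer keys.
    return DRIVE_HEALTH.get(value, "unknown")


print(describe_drive_health(2.0))
```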
/system/memory
| Name | Description | Labels |
|---|---|---|
| | Used memory on the node. | |
| | Used memory percentage on the node. | |
| | Free memory on the node. | |
| | Total memory on the node. | |
| | Buffers memory on the node. | |
| | Cache memory on the node. | |
| | Shared memory on the node. | |
| | Available memory on the node. | |
/system/cpu
| Name | Description | Labels |
|---|---|---|
| | Average CPU idle time. | |
| | Average CPU IOWait time. | |
| | CPU load average 1min. | |
| | CPU load average 1min (percentage). | |
| | CPU nice time. | |
| | CPU steal time. | |
| | CPU system time. | |
| | CPU user time. | |
/system/network/internode
| Name | Description | Labels |
|---|---|---|
| | Total number of failed internode calls. | |
| | Total number of internode TCP dial timeouts and errors. | |
| | Average dial time of internode TCP calls in nanoseconds. | |
| | Total number of bytes sent to other peer nodes. | |
| | Total number of bytes received from other peer nodes. | |
/system/process
| Name | Description | Labels |
|---|---|---|
| | Number of current READ locks on this peer. | |
| | Number of current WRITE locks on this peer. | |
| | Total user and system CPU time spent in seconds. | |
| | Total number of goroutines running. | |
| | Total bytes read by the process from the underlying storage system including cache, /proc/[pid]/io rchar. | |
| | Total bytes read by the process from the underlying storage system, /proc/[pid]/io read_bytes. | |
| | Total bytes written by the process to the underlying storage system including page cache, /proc/[pid]/io wchar. | |
| | Total bytes written by the process to the underlying storage system, /proc/[pid]/io write_bytes. | |
| | Start time for MinIO process in seconds since Unix epoch. | |
| | Uptime for MinIO process in seconds. | |
| | Limit on total number of open file descriptors for the MinIO Server process. | |
| | Total number of open file descriptors by the MinIO Server process. | |
| | Total read SysCalls to the kernel, /proc/[pid]/io syscr. | |
| | Total write SysCalls to the kernel, /proc/[pid]/io syscw. | |
| | Resident memory size in bytes. | |
| | Virtual memory size in bytes. | |
| | Maximum virtual memory size in bytes. | |
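One way to use the file descriptor metrics above is to compare the open count against the limit. The following is a sketch only: the function names and the 0.8 threshold are assumptions chosen for illustration, not recommended values.

```python
def fd_usage_ratio(open_fds: float, fd_limit: float):
    """Return open descriptors as a fraction of the limit, or None if no limit is known."""
    if fd_limit <= 0:
        return None
    return open_fds / fd_limit


def fd_usage_is_high(open_fds: float, fd_limit: float, threshold: float = 0.8) -> bool:
    """Report whether usage meets or exceeds the (assumed) alert threshold."""
    ratio = fd_usage_ratio(open_fds, fd_limit)
    return ratio is not None and ratio >= threshold


print(fd_usage_ratio(512, 1024))
```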