This page provides an overview of MinIO deployment architectures from a production perspective. For guidance on specific hardware or software configurations, see the hardware and software checklists.
Distributed MinIO Deployments
- A production MinIO deployment consists of at least 4 MinIO hosts with homogeneous storage and compute resources.
MinIO aggregates these resources into a pool and presents itself as a single object storage service.
- MinIO provides best performance when using locally-attached storage, such as NVMe or SSD drives attached to a PCI-E controller board on the host machine.
Storage controllers should present XFS-formatted drives in “Just a Bunch of Drives” (JBOD) configurations with no RAID, pooling, or other hardware/software resiliency layers. MinIO recommends against caching, either at the drive or the controller layer. Either type of caching can cause I/O spikes as the cache fills and clears, resulting in unpredictable performance.
- MinIO automatically groups drives in the pool into erasure sets.
Erasure sets are the foundational component of MinIO availability and resiliency. MinIO stripes erasure sets symmetrically across the nodes in the pool to maintain even distribution of erasure set drives. MinIO then partitions objects into data and parity shards based on the deployment parity and distributes them across an erasure set.
For a more complete discussion of MinIO redundancy and healing, see Erasure Coding.
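The data-and-parity split described above can be sketched with simple arithmetic. This is an illustrative model only, not MinIO's implementation; the 16-drive set size and EC:4 parity below are example values.

```python
# Illustrative sketch of erasure-coded striping (example values, not a
# prescription): an object written to an erasure set of N drives is split
# into (N - parity) data shards plus `parity` parity shards.

def erasure_profile(set_size, parity):
    """Summarize an erasure set's resiliency and storage efficiency."""
    data_shards = set_size - parity
    return {
        "data_shards": data_shards,
        "parity_shards": parity,
        # The set tolerates losing up to `parity` drives and still serves reads.
        "drive_failures_tolerated": parity,
        # Fraction of raw capacity available for object data.
        "storage_efficiency": data_shards / set_size,
    }

# A 16-drive erasure set with EC:4 parity: 12 data + 4 parity shards,
# tolerating 4 drive failures at 75% storage efficiency.
profile = erasure_profile(set_size=16, parity=4)
```

Higher parity increases fault tolerance at the cost of usable capacity; the trade-off is linear in the parity value.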
- MinIO uses a deterministic hashing algorithm based on object name and path to select the erasure set for a given object.
For each unique object namespace BUCKET/PREFIX/[PREFIX/...]/OBJECT.EXTENSION, MinIO always selects the same erasure set for read/write operations. MinIO handles all routing within pools and erasure sets, making the select/read/write process entirely transparent to applications.
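The routing property can be sketched as follows. This uses SHA-256 purely for illustration; MinIO's actual hashing algorithm differs, but the key property is the same: a given object namespace always maps to the same erasure set.

```python
import hashlib

# Illustrative sketch of deterministic erasure-set selection. This is NOT
# MinIO's actual hash algorithm; it only demonstrates the routing property.

def select_erasure_set(object_key, num_sets):
    """Map an object namespace like 'bucket/prefix/object.ext' to an
    erasure-set index. The same key always maps to the same set."""
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_sets

# The same namespace always routes to the same erasure set:
a = select_erasure_set("photos/2024/cat.png", num_sets=4)
b = select_erasure_set("photos/2024/cat.png", num_sets=4)
assert a == b
```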
- Each MinIO server has a complete picture of the distributed topology, such that an application can connect and direct operations against any node in the deployment.
The MinIO responding node automatically handles routing internal requests to other nodes in the deployment and returning the final response to the client.
Applications typically should not manage those connections, as any changes to the deployment topology would require application updates. Production environments should instead deploy a load balancer or similar network control plane component to manage connections to the MinIO deployment. For example, you can deploy an NGINX load balancer to perform “least connections” or “round robin” load balancing against the available nodes in the deployment.
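A minimal sketch of the round-robin strategy such a load balancer applies, shown here in Python for illustration (the node URLs are hypothetical):

```python
from itertools import cycle

# Minimal sketch of round-robin request distribution, as a load balancer
# such as NGINX would perform it. The node URLs are hypothetical.
nodes = cycle([
    "http://minio-1.example.net:9000",
    "http://minio-2.example.net:9000",
    "http://minio-3.example.net:9000",
    "http://minio-4.example.net:9000",
])

def next_node():
    """Return the next node to receive a request, cycling evenly."""
    return next(nodes)
```

Because any node can serve any request, the balancer needs no MinIO-specific logic; evenly distributing connections is sufficient.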
- You can expand a MinIO deployment’s available storage through pool expansion.
Each pool consists of an independent group of nodes with their own erasure sets. MinIO must query each pool to determine the correct erasure set to which it directs read and write operations, so each additional pool increases internode traffic per call. The pool that contains the correct erasure set then responds to the operation, keeping the process entirely transparent to the application.
After you expand the MinIO topology with a new pool, update the load balancer configuration to include the new pool's nodes. This ensures even distribution of requests across all pools, while applications continue using the single load balancer URL for MinIO operations without any updates or modifications.
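The per-pool lookup cost can be sketched as follows. This is an illustrative model, not MinIO internals; the object names and pool inventories are hypothetical.

```python
# Illustrative sketch (not MinIO internals): to serve an operation, the
# deployment must consult each pool to find the erasure set holding the
# object, so per-call lookup traffic grows with the number of pools.

def locate_object(object_key, pools):
    """Return the index of the pool holding the key, querying pools in turn."""
    for index, pool_contents in enumerate(pools):
        # Each iteration stands in for one round of internode queries.
        if object_key in pool_contents:
            return index
    return None

# Two pools with hypothetical object inventories:
pools = [{"a.txt", "b.txt"}, {"c.txt"}]
```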
- Client applications can use any S3-compatible SDK or library to interact with the MinIO deployment.
MinIO publishes its own SDK specifically intended for use with S3-compatible deployments.
MinIO uses a strict implementation of the S3 API, including requiring clients to sign all operations using AWS Signature V4 or the legacy Signature V2. AWS signature calculation uses the client-provided headers, such that any modification to those headers by load balancers, proxies, security programs, or other components will result in signature mismatch errors and request failure. Ensure any such intermediate components support pass-through of unaltered headers from client to server.
While the S3 API uses HTTP methods like POST for all operations, applications typically use an SDK for S3 operations. In particular, the complexity of signature calculation typically makes interfacing via curl or similar REST clients impractical. MinIO recommends using S3-compatible SDKs or libraries which perform the signature calculation automatically as part of operations.
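The signing-key derivation that makes manual signature calculation impractical can be sketched per the AWS Signature V4 specification. The credentials, date, and region below are placeholders; SDKs perform these steps (plus canonical-request construction) automatically.

```python
import hashlib
import hmac

# Sketch of the AWS Signature V4 signing-key derivation that S3 clients
# perform. Because client-supplied headers feed into the final signature,
# any intermediary that rewrites those headers breaks verification.
# The secret key, date, region, and service below are placeholders.

def sigv4_signing_key(secret_key, date, region, service):
    """Derive the SigV4 signing key via the chained HMAC-SHA256 steps."""
    def _hmac(key, msg):
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    return _hmac(k_service, "aws4_request")

key = sigv4_signing_key("EXAMPLESECRET", "20240101", "us-east-1", "s3")
```

This key then signs a canonical request built from the exact headers the client sends, which is why header pass-through matters end to end.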
Replicated MinIO Deployments
- MinIO site replication provides support for synchronizing distinct independent deployments.
You can deploy peer sites in different racks, datacenters, or geographic regions to support functions like BC/DR or geo-local read/write performance in a globally distributed MinIO object store.
- Each peer site consists of an independent set of MinIO hosts, ideally having matching pool configurations.
The architecture of each peer site should closely match that of the others to ensure consistent performance and behavior between sites. All peer sites must use the same primary identity provider, and during initial configuration only one peer site can have any data.
- Replication performance primarily depends on the network latency between each peer site.
With geographically distributed peer sites, high latency between sites can result in significant replication lag. This can compound with workloads that are near or at the deployment’s overall performance capacity, as the replication process itself requires sufficient free I/O to synchronize objects.
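A back-of-envelope model of this effect, using hypothetical throughput numbers: replication lag grows whenever the ingest rate exceeds the bandwidth left over for replication.

```python
# Hypothetical numbers, for illustration only: estimate how long a
# replication backlog takes to drain while new writes continue.

def catch_up_seconds(backlog_gib, ingest_gib_s, replication_gib_s):
    """Seconds to drain a replication backlog, or None if lag only grows."""
    drain_rate = replication_gib_s - ingest_gib_s
    if drain_rate <= 0:
        return None  # replication can never catch up while writes continue
    return backlog_gib / drain_rate

# 100 GiB backlog, 1.0 GiB/s of new writes, 1.5 GiB/s replication throughput:
eta = catch_up_seconds(100, 1.0, 1.5)  # 200 seconds
```

The None branch corresponds to the saturated-workload case described above, where replication lacks sufficient free I/O to ever converge.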
- Deploying a global load balancer or similar network appliance with support for site-to-site failover protocols is critical to the functionality of multi-site deployments.
The load balancer should support a health probe/check setting to detect the failure of one site and automatically redirect applications to any remaining healthy peer.
The load balancer should meet the same requirements as single-site deployments regarding connection balancing and header preservation. MinIO replication handles transient failures by queuing objects for replication.
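A minimal sketch of such a health probe against MinIO's liveness endpoint, /minio/health/live; the host, port, and timeout are placeholders, and a production balancer would run this check on an interval.

```python
import urllib.error
import urllib.request

# Sketch of the liveness probe a load balancer performs against each site,
# using MinIO's healthcheck endpoint. Host and port are placeholders.

def site_is_live(host, port=9000, timeout=2.0):
    """Return True if the node answers its liveness probe with HTTP 200."""
    url = "http://{}:{}/minio/health/live".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

On a False result for every node in a site, the balancer fails traffic over to a remaining healthy peer; MinIO's replication queue then covers the writes the failed site missed.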
- MinIO replication can automatically heal a site that has partial or total data loss due to transient or sustained downtime.
If a peer site completely fails, you can remove that site from the configuration entirely. The load balancer configuration should also remove that site to avoid routing client requests to the offline site.
You can then restore the peer site, either after repairing the original hardware or replacing it entirely, by adding it back to the site replication configuration. MinIO automatically begins resynchronizing existing data while continuously replicating new data.
Once all data synchronizes, you can restore normal connectivity to that site. Depending on the amount of replication lag, the latency between sites, and the overall workload I/O, you may need to temporarily stop write operations to allow the sites to completely catch up.