Deployment Architecture

This page provides an overview of MinIO deployment architectures from a production perspective. For information on specific hardware or software configurations, see the related hardware and software documentation.

Distributed MinIO Deployments

A production MinIO deployment consists of at least 4 MinIO hosts with homogeneous storage and compute resources.

MinIO aggregates these resources together as a pool and presents itself as a single object storage service.

4 Node MinIO deployment with homogeneous storage and compute resources

Each MinIO host in this pool has matching compute, storage, and network configurations.

MinIO provides best performance when using locally-attached storage, such as NVMe or SSD drives attached to a PCI-E controller board on the host machine.

Storage controllers should present XFS-formatted drives in “Just a Bunch of Drives” (JBOD) configurations with no RAID, pooling, or other hardware/software resiliency layers. MinIO recommends against caching, either at the drive or the controller layer. Either type of caching can cause I/O spikes as the cache fills and clears, resulting in unpredictable performance.

MinIO Server diagram of Direct-Attached Storage via SAS to a PCI-E Storage Controller

Each SSD connects by SAS to a PCI-E-attached storage controller operating in HBA mode.

MinIO automatically groups drives in the pool into erasure sets.

Erasure sets are the foundational component of MinIO availability and resiliency. MinIO stripes erasure sets symmetrically across the nodes in the pool to maintain even distribution of erasure set drives. MinIO then partitions objects into data and parity shards based on the deployment parity and distributes them across an erasure set.

For a more complete discussion of MinIO redundancy and healing, see Erasure Coding.

Diagram of object being sharded into four data and four parity blocks, distributed across eight drives

With the default parity of EC:4, MinIO shards the object into 4 data and 4 parity blocks, distributing them across the drives in the erasure set.
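As a rough sketch of the capacity math behind this example (not MinIO's internal implementation), the following Python snippet computes the shard counts, read tolerance for drive failures, and usable-to-raw capacity ratio for an 8-drive erasure set with EC:4 parity:

```python
# Illustrative arithmetic only; this is not MinIO's erasure coding implementation.
def erasure_set_layout(drives: int, parity: int) -> dict:
    """Shard counts and capacity ratio for a single erasure set."""
    data_shards = drives - parity  # EC:4 on 8 drives -> 4 data shards
    return {
        "data_shards": data_shards,
        "parity_shards": parity,
        "drive_losses_tolerated_for_reads": parity,
        "usable_capacity_ratio": data_shards / drives,  # 0.5 for EC:4 on 8 drives
    }

print(erasure_set_layout(drives=8, parity=4))
```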

MinIO uses a deterministic hashing algorithm based on object name and path to select the erasure set for a given object.

For each unique object namespace BUCKET/PREFIX/[PREFIX/...]/OBJECT.EXTENSION, MinIO always selects the same erasure set for read/write operations. MinIO handles all routing within pools and erasure sets, making the select/read/write process entirely transparent to applications.
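The snippet below illustrates the general idea of deterministic, name-based placement: hashing the full object namespace always yields the same erasure set index. It uses a plain SHA-256 modulo as a stand-in and is not MinIO's actual hashing algorithm:

```python
import hashlib

# Simplified illustration of deterministic placement; MinIO's real algorithm differs.
def select_erasure_set(bucket: str, object_key: str, erasure_set_count: int) -> int:
    """Map a BUCKET/PREFIX/.../OBJECT.EXTENSION namespace to a stable erasure set index."""
    namespace = f"{bucket}/{object_key}"
    digest = hashlib.sha256(namespace.encode("utf-8")).hexdigest()
    return int(digest, 16) % erasure_set_count

# The same namespace always maps to the same erasure set.
print(select_erasure_set("invoices", "2024/03/report.pdf", erasure_set_count=4))
print(select_erasure_set("invoices", "2024/03/report.pdf", erasure_set_count=4))  # identical result
```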

Diagram of object retrieval from only data shards

MinIO reconstructs objects from data or parity shards transparently before returning the object to the requesting client.

Each MinIO server has a complete picture of the distributed topology, so an application can connect to and direct operations against any node in the deployment.

The responding MinIO node automatically routes internal requests to other nodes in the deployment and returns the final response to the client.

Applications typically should not manage those connections, as any changes to the deployment topology would require application updates. Production environments should instead deploy a load balancer or similar network control plane component to manage connections to the MinIO deployment. For example, you can deploy an NGINX load balancer to perform “least connections” or “round robin” load balancing against the available nodes in the deployment.

Diagram of an eight node MinIO deployment behind a load balancer

The load balancer routes the request to any node in the deployment. The receiving node handles any internode requests thereafter.
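For example, an application might point the MinIO Python SDK at the load balancer's hostname rather than at any individual node. The endpoint and credentials below are placeholders:

```python
from minio import Minio

# Connect to the load balancer address, not to an individual MinIO node.
client = Minio(
    "minio.example.net:9000",   # placeholder load balancer endpoint
    access_key="ACCESS_KEY",    # placeholder credentials
    secret_key="SECRET_KEY",
    secure=True,                # TLS between the client and the load balancer
)

# The load balancer forwards this request to a healthy node, which then
# handles any internode routing required to complete the operation.
print(client.bucket_exists("my-bucket"))
```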

You can expand a MinIO deployment’s available storage through pool expansion.

Each pool consists of an independent group of nodes with its own erasure sets. MinIO must query each pool to determine the correct erasure set for a read or write operation, so each additional pool increases internode traffic per call. The pool that contains the correct erasure set then responds to the operation, remaining entirely transparent to the application.

If you modify the MinIO topology through pool expansion, you only need to update the load balancer configuration to include the new pool's nodes. Applications can continue using the single load balancer URL for MinIO operations without any changes, and the load balancer ensures even distribution of requests across all pools.

Diagram of a multi-pool minio deployment behind a load balancer

The PUT request requires checking each pool for the correct erasure set. Once identified, MinIO partitions the object and distributes the data and parity shards across the appropriate set.

Client applications can use any S3-compatible SDK or library to interact with the MinIO deployment.

MinIO publishes its own SDK specifically intended for use with S3-compatible deployments.

Diagram of multiple S3-compatible clients using SDKs to connect to MinIO

Clients using a variety of S3-compatible SDKs can perform operations against the same MinIO deployment.
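For instance, MinIO's own Python SDK (the minio package) can create a bucket and upload an object with a few calls. The endpoint, credentials, bucket, and file paths below are placeholders:

```python
from minio import Minio

client = Minio("minio.example.net:9000",  # placeholder endpoint and credentials
               access_key="ACCESS_KEY",
               secret_key="SECRET_KEY",
               secure=True)

if not client.bucket_exists("reports"):
    client.make_bucket("reports")

# Upload a local file as an object, then read back its metadata.
client.fput_object("reports", "2024/quarterly.pdf", "/tmp/quarterly.pdf")
stat = client.stat_object("reports", "2024/quarterly.pdf")
print(stat.object_name, stat.size)
```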

MinIO uses a strict implementation of the S3 API, including requiring clients to sign all operations using AWS Signature V4 or the legacy Signature V2. AWS signature calculation uses the client-provided headers, such that any modification to those headers by load balancers, proxies, security programs, or other components will result in signature mismatch errors and request failure. Ensure any such intermediate components support pass-through of unaltered headers from client to server.

While the S3 API uses standard HTTP methods such as GET and POST, applications typically use an SDK for S3 operations. In particular, the complexity of signature calculation makes interfacing through curl or similar REST clients impractical. MinIO recommends using S3-compatible SDKs or libraries that perform the signature calculation automatically as part of each operation.
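As a sketch of an SDK handling signatures automatically, the snippet below configures the AWS boto3 client for Signature V4 against a MinIO endpoint; the endpoint, credentials, and object names are placeholders:

```python
import boto3
from botocore.client import Config

s3 = boto3.client(
    "s3",
    endpoint_url="https://minio.example.net:9000",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",                 # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
    config=Config(signature_version="s3v4"),        # AWS Signature V4
)

# The SDK computes the Signature V4 headers for each request automatically.
s3.put_object(Bucket="reports", Key="2024/summary.txt", Body=b"hello")
print(s3.get_object(Bucket="reports", Key="2024/summary.txt")["Body"].read())
```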

Replicated MinIO Deployments

MinIO site replication synchronizes distinct, independent MinIO deployments.

You can deploy peer sites in different racks, datacenters, or geographic regions to support functions like BC/DR or geo-local read/write performance in a globally distributed MinIO object store.

Diagram of a multi-site deployment with three MinIO peer sites

A MinIO multi-site deployment with three peers. Write operations on one peer replicate to all other peers in the configuration automatically.

Each peer site consists of an independent set of MinIO hosts, ideally having matching pool configurations.

The architecture of each peer site should closely match that of the other sites to ensure consistent performance and behavior. All peer sites must use the same primary identity provider, and during initial configuration only one peer site can contain any data.

Diagram of a multi-site deployment during initial setup

The initial setup of a MinIO multi-site deployment. The first peer site replicates all required information to other peers in the configuration. Adding new peers uses the same sequence for synchronizing data.

Replication performance primarily depends on the network latency between each peer site.

With geographically distributed peer sites, high latency between sites can result in significant replication lag. This can compound with workloads that are near or at the deployment’s overall performance capacity, as the replication process itself requires sufficient free I/O to synchronize objects.

Diagram of a multi-site deployment with latency between sites

In this peer configuration, the latency between Site A and its peer sites is 100ms. The earliest the object can fully synchronize to all sites is therefore at least 110ms after the write begins.
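A back-of-the-envelope sketch of that lower bound, assuming a hypothetical 10ms local write at Site A and 100ms latency from Site A to each peer:

```python
# Rough lower bound only; real deployments add transfer, queueing, and processing time.
local_write_ms = 10            # assumed time to commit the object at Site A
peer_latency_ms = [100, 100]   # assumed latency from Site A to each peer site

earliest_full_sync_ms = local_write_ms + max(peer_latency_ms)
print(f"object visible on all sites no sooner than ~{earliest_full_sync_ms}ms")  # ~110ms
```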

Deploying a global load balancer or similar network appliance with support for site-to-site failover protocols is critical to the functionality of multi-site deployments.

The load balancer should support a health probe/check setting to detect the failure of one site and automatically redirect applications to any remaining healthy peer.
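A minimal sketch of such a probe, using MinIO's cluster health endpoint (the hostname is a placeholder); most load balancers provide an equivalent check natively:

```python
import urllib.request

def site_is_healthy(site_url: str, timeout: float = 2.0) -> bool:
    """Return True if the site reports write quorum on /minio/health/cluster."""
    try:
        with urllib.request.urlopen(f"{site_url}/minio/health/cluster",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection failures, timeouts, and non-2xx responses
        return False

print(site_is_healthy("https://minio-site-a.example.net"))  # placeholder hostname
```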

Diagram of a multi-site deployment with a failed site

One of the peer sites has failed completely. The load balancer automatically routes requests to the remaining healthy peer site.

The load balancer should meet the same connection-balancing and header-preservation requirements as in single-site deployments. MinIO replication handles transient failures by queuing objects for replication.

MinIO replication can automatically heal a site that has partial or total data loss due to transient or sustained downtime.

If a peer site fails completely, you can remove that site from the replication configuration entirely. Also remove the site from the load balancer configuration to avoid routing client requests to the offline site.

You can then restore the peer site, either after repairing the original hardware or replacing it entirely, by adding it back to the site replication configuration. MinIO automatically begins resynchronizing existing data while continuously replicating new data.

Diagram of a multi-site deployment with a healing site

The peer site has recovered and reestablished connectivity with its healthy peers. MinIO automatically works through the replication queue to catch the site back up.

Once all data synchronizes, you can restore normal connectivity to that site. Depending on the amount of replication lag, the latency between sites, and the overall workload I/O, you may need to temporarily stop write operations to allow the sites to catch up completely.