Erasure Coding
MinIO implements Erasure Coding as a core component in providing data redundancy and availability. This page provides an introduction to MinIO Erasure Coding.
See Availability and Resiliency and Deployment Architecture for more information on how MinIO uses erasure coding in production deployments.
Erasure Coding Basics
Note
The diagrams and content in this section present a simplified view of MinIO erasure coding operations and are not intended to represent the complexities of MinIO’s full erasure coding implementation.
- MinIO groups drives in each server pool into one or more Erasure Sets of the same size.
- MinIO determines the optimal number and size of erasure sets when initializing a server pool. You cannot modify these settings after this initial setup.
- For each write operation, MinIO partitions the object into data and parity shards.
Erasure set stripe size dictates the maximum possible parity of the deployment. The formula for determining the number of data and parity shards to generate is:
N (ERASURE SET SIZE) = K (DATA) + M (PARITY)
- You can set the parity value between 0 and 1/2 the Erasure Set size.
- Objects written with a given parity setting do not automatically update if you change the parity value later.
- MinIO requires a minimum of K shards of any type to read an object. The value K here constitutes the read quorum for the deployment. The erasure set must therefore have at least K healthy drives to support read operations.
- MinIO cannot reconstruct an object that has lost read quorum. Such objects may be recovered through other means such as replication resynchronization.
- MinIO requires a minimum of K erasure set drives to write an object. The value K here constitutes the write quorum for the deployment. The erasure set must therefore have at least K available drives online to support write operations.
- If parity EC:M is exactly 1/2 the erasure set size, write quorum is K+1. This prevents a split-brain type scenario, such as one where a network issue isolates exactly half the erasure set drives from the other. The K+1 logic ensures that a client cannot write the same object twice, once to each “half” of the erasure set.
- For an object maintaining read quorum, MinIO can use any data or parity shard to heal damaged shards.
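The quorum rules above can be expressed as a short calculation. The following is a minimal sketch in Python, not MinIO code: the `ErasureLayout` class and its method names are hypothetical and exist only to illustrate the relationship N = K + M and the K+1 write quorum that applies when parity is exactly half the erasure set size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureLayout:
    """Illustrative model of one erasure set (hypothetical, not a MinIO API)."""

    set_size: int  # N: number of drives in the erasure set
    parity: int    # M: parity shards per object, written EC:M

    def __post_init__(self):
        if self.set_size < 1:
            raise ValueError("erasure set size must be at least 1")
        if not 0 <= self.parity <= self.set_size // 2:
            raise ValueError("parity must be between 0 and half the erasure set size")

    @property
    def data(self) -> int:
        # K = N - M
        return self.set_size - self.parity

    @property
    def read_quorum(self) -> int:
        # Any K shards (data or parity) are enough to read an object.
        return self.data

    @property
    def write_quorum(self) -> int:
        # When parity is exactly half the set, require K+1 to avoid split-brain.
        if self.parity * 2 == self.set_size:
            return self.data + 1
        return self.data

    def can_read(self, online_drives: int) -> bool:
        return online_drives >= self.read_quorum

    def can_write(self, online_drives: int) -> bool:
        return online_drives >= self.write_quorum


if __name__ == "__main__":
    # A 16-drive erasure set with EC:8, split into two isolated halves of 8 drives.
    layout = ErasureLayout(set_size=16, parity=8)
    print("read quorum:", layout.read_quorum)
    print("write quorum:", layout.write_quorum)
    print("half can read:", layout.can_read(8))
    print("half can write:", layout.can_write(8))
```

In this model, each isolated half of a 16-drive EC:8 set can still serve reads but cannot accept writes, which is the split-brain protection the K+1 rule is meant to provide.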
Use the MinIO Erasure Coding Calculator to explore the possible erasure set size and distributions for your planned topology. Where possible, use an even number of nodes and drives per node to simplify topology planning and conceptualization of drive/erasure-set distribution.
Exclusive access to drives
MinIO requires exclusive access to the drives or volumes provided for object storage. No other processes, software, scripts, or persons should perform any actions directly on the drives or volumes provided to MinIO or the objects or files MinIO places on them.
Unless directed by MinIO Engineering, do not use scripts or tools to directly modify, delete, or move any of the data shards, parity shards, or metadata files on the provided drives, including from one drive or node to another. Such operations are very likely to result in widespread corruption and data loss beyond MinIO’s ability to heal.
Erasure Parity and Storage Efficiency
Setting the parity for a deployment is a balance between availability and total usable storage. Higher parity values increase resiliency to drive or node failure at the cost of usable storage, while lower parity provides maximum storage with reduced tolerance for drive/node failures. Use the MinIO Erasure Code Calculator to explore the effect of parity on your planned cluster deployment.
The following table lists the outcome of varying erasure code parity levels on a MinIO deployment consisting of 1 node and 16 1TB drives:
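The storage figures for such a deployment follow from N = K + M: only the K data shards contribute usable capacity, and reads tolerate the loss of up to M drives. The sketch below is an illustrative calculation under simplifying assumptions (a single erasure set spanning all drives, and raw capacity equal to drive count times drive size with no filesystem overhead). The function name `storage_outcomes` is hypothetical and not part of any MinIO tool.

```python
def storage_outcomes(set_size, drive_tb, parities):
    """Return (parity label, usable TB, storage ratio, drive failures tolerated for reads)."""
    rows = []
    for m in parities:
        if not 0 <= m <= set_size // 2:
            raise ValueError("parity must be between 0 and half the erasure set size")
        k = set_size - m            # data shards per object
        usable_tb = k * drive_tb    # capacity left after parity overhead
        ratio = k / set_size        # fraction of raw capacity that is usable
        rows.append((f"EC:{m}", usable_tb, ratio, m))
    return rows


if __name__ == "__main__":
    # 1 node with 16 drives of 1 TB each, at three example parity levels.
    print(f"{'Parity':<8}{'Usable TB':>10}{'Ratio':>8}{'Read tolerance':>16}")
    for label, usable, ratio, tolerated in storage_outcomes(16, 1.0, [4, 6, 8]):
        print(f"{label:<8}{usable:>10.1f}{ratio:>8.3f}{tolerated:>16}")
```

Raising parity from EC:4 to EC:8 in this model halves the usable share of raw capacity from 0.750 to 0.500 while doubling the number of drive failures the set can absorb and still serve reads.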
Bit Rot Protection
Bit rot is silent data corruption from random changes at the storage media level. For data drives, it is typically the result of decay of the electrical charge or magnetic orientation that represents the data. Causes can range from a small current spike during a power outage to a random cosmic ray that flips bits. The resulting “bit rot” can cause subtle errors or corruption on the data medium without triggering monitoring tools or hardware.
MinIO uses an optimized implementation of the HighwayHash algorithm to detect and heal corrupted objects on the fly. Integrity is ensured from end to end by computing a hash on WRITE and verifying it on READ, from the application, across the network, and to the memory or drive. The implementation is designed for speed and can achieve hashing speeds over 10 GB/sec on a single core on Intel CPUs.
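The general idea of checksum-based bit rot detection can be shown with a minimal sketch. It does not use HighwayHash, which is not available in the Python standard library. `hashlib.blake2b` stands in purely for illustration, and `write_shard` and `verify_shard` are hypothetical helpers, not MinIO APIs. The sketch stores a digest alongside a shard, then recomputes and compares it later to detect a single flipped bit.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # Stand-in hash for illustration only; MinIO uses HighwayHash.
    return hashlib.blake2b(data, digest_size=32).digest()


def write_shard(data: bytes):
    """Return the shard together with the digest that would be stored next to it."""
    return data, checksum(data)


def verify_shard(data: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest and compare it with the stored one."""
    return checksum(data) == stored_digest


if __name__ == "__main__":
    shard, digest = write_shard(b"example shard contents")
    print("intact shard verifies:", verify_shard(shard, digest))

    # Simulate bit rot by flipping one bit in the first byte.
    rotted = bytearray(shard)
    rotted[0] ^= 0x01
    print("rotted shard verifies:", verify_shard(bytes(rotted), digest))
```

A shard that fails verification would be treated as damaged, at which point the healing behavior described earlier (rebuilding from any remaining data or parity shards while read quorum holds) applies.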