MinIO is a high-performance, distributed object storage system. It is software-defined, runs on industry-standard hardware and is 100% open source under the GNU AGPL v3 license.
MinIO is different in that it was designed from its inception to be the standard in private/hybrid cloud object storage. Because MinIO is purpose-built to serve only objects, a single-layer architecture achieves all of the necessary functionality without compromise. The result is a cloud-native object server that is simultaneously performant, scalable and lightweight.
While MinIO excels at traditional object storage use cases like secondary storage, disaster recovery and archiving, it is uniquely suited to overcoming the challenges associated with machine learning, analytics and cloud-native application workloads.
MinIO’s enterprise-class features represent the standard in the object storage space. From the AWS S3 API to S3 Select and our implementations of inline erasure coding and security, our code is widely admired and frequently copied by some of the biggest names in technology and business.
MinIO protects data with per-object inline erasure coding written in assembly code to deliver the highest possible performance. MinIO uses Reed-Solomon code to stripe objects into data and parity blocks with user-configurable redundancy levels. MinIO's Erasure Coding performs healing at the object level and can heal multiple objects independently.
At the maximum parity of N/2, MinIO's implementation can ensure uninterrupted read and write operations with only ((N/2)+1) operational drives in the deployment. For example, in a 12-drive setup, MinIO shards objects across 6 data and 6 parity drives and can reliably write new objects or reconstruct existing objects with only 7 drives remaining in the deployment.
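MinIO's erasure backend builds on the open source Reed-Solomon implementation in github.com/klauspost/reedsolomon. The sketch below replays the 12-drive example above - 6 data and 6 parity shards, 5 simulated drive failures - purely as an illustration, not MinIO's actual write path:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 6 data + 6 parity shards, mirroring the 12-drive example above.
	enc, err := reedsolomon.New(6, 6)
	if err != nil {
		log.Fatal(err)
	}

	object := bytes.Repeat([]byte("minio"), 1000)

	// Split the object into 6 data shards, then compute 6 parity shards.
	shards, err := enc.Split(object)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}

	// Simulate losing 5 drives: any 7 remaining shards suffice to rebuild.
	for _, lost := range []int{0, 2, 4, 7, 9} {
		shards[lost] = nil
	}
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}

	ok, err := enc.Verify(shards)
	fmt.Println("reconstructed and verified:", ok, err)
}
```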
Silent data corruption, or bitrot, is a serious problem for disk drives: data is corrupted without the user’s knowledge. The causes are manifold (aging drives, current spikes, bugs in disk firmware, phantom writes, misdirected reads/writes, driver errors, accidental overwrites) but the result is the same - compromised data.
MinIO’s optimized implementation of the HighwayHash algorithm ensures that it will never return corrupted data - it captures and heals corrupted objects on the fly. Integrity is ensured from end to end by computing a hash on WRITE and verifying it on READ - from the application, across the network and to the memory/drive. The implementation is designed for speed and can achieve hashing speeds of over 10 GB/sec on a single core on Intel CPUs.
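The checksum flow can be sketched with the open source github.com/minio/highwayhash package. The key below is random and purely illustrative; MinIO manages its own hashing keys internally:

```go
package main

import (
	"bytes"
	"crypto/rand"
	"fmt"
	"log"

	"github.com/minio/highwayhash"
)

func main() {
	// HighwayHash requires a 32-byte key; a random one is used here
	// purely for illustration.
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		log.Fatal(err)
	}

	h, err := highwayhash.New(key) // 256-bit variant
	if err != nil {
		log.Fatal(err)
	}

	// Checksum computed when the block is written...
	h.Write([]byte("object data block"))
	written := h.Sum(nil)

	// ...and recomputed on read; a mismatch signals bitrot and triggers healing.
	h.Reset()
	h.Write([]byte("object data block"))
	read := h.Sum(nil)

	fmt.Printf("match: %t\n", bytes.Equal(written, read))
}
```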
It is one thing to encrypt data in flight; it is another to protect data at rest. MinIO supports multiple, sophisticated server-side encryption schemes to protect data - wherever it may be. MinIO’s approach assures confidentiality, integrity and authenticity with negligible performance overhead. Server-side and client-side encryption are supported using AES-256-GCM, ChaCha20-Poly1305 and AES-CBC.
Encrypted objects are tamper-proof thanks to AEAD server-side encryption. Additionally, MinIO is compatible with and tested against all commonly used key management solutions (e.g. HashiCorp Vault). MinIO uses a key management system (KMS) to support SSE-S3.
If a client requests SSE-S3, or auto-encryption is enabled, the MinIO server encrypts each object with a unique object key which is protected by a master key managed by the KMS. Given the exceptionally low overhead, auto-encryption can be turned on for every application and instance.
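As a sketch of what requesting SSE-S3 looks like from an application, the minio-go SDK lets a client ask the server to encrypt an object on upload. Endpoint, credentials, bucket and object names below are placeholders:

```go
package main

import (
	"context"
	"log"
	"strings"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
	"github.com/minio/minio-go/v7/pkg/encrypt"
)

func main() {
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	data := strings.NewReader("sensitive payload")

	// Request SSE-S3: the server encrypts the object with a unique object
	// key that is protected by the KMS-managed master key.
	_, err = client.PutObject(context.Background(), "mybucket", "myobject",
		data, data.Size(), minio.PutObjectOptions{
			ServerSideEncryption: encrypt.NewSSE(),
		})
	if err != nil {
		log.Fatal(err)
	}
}
```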
When WORM is enabled, MinIO disables all APIs that can potentially mutate object data and metadata. This means that data, once written, becomes tamper-proof. This has practical applications for many different regulatory requirements.
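In current releases the WORM guarantee is surfaced through S3 object locking, which must be enabled at bucket creation. A hedged sketch with the minio-go SDK - the endpoint and credentials are placeholders, and the lock-configuration call reflects the SDK's retention API as I understand it:

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}
	ctx := context.Background()

	// Object locking must be enabled when the bucket is created.
	if err := client.MakeBucket(ctx, "worm-bucket", minio.MakeBucketOptions{
		ObjectLocking: true,
	}); err != nil {
		log.Fatal(err)
	}

	// Default retention: 30 days in COMPLIANCE mode, so neither object
	// data nor metadata can be mutated until the retention period expires.
	mode, validity, unit := minio.Compliance, uint(30), minio.Days
	if err := client.SetObjectLockConfig(ctx, "worm-bucket",
		&mode, &validity, &unit); err != nil {
		log.Fatal(err)
	}
}
```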
MinIO supports the most advanced standards in identity management, integrating with OpenID Connect-compatible providers as well as key external IDP vendors. That means that access is centralized and passwords are temporary and rotated, not stored in config files and databases. Furthermore, access policies are fine-grained and highly configurable, which means that supporting multi-tenant and multi-instance deployments becomes simple.
The challenge with traditional replication approaches is that they do not scale effectively beyond a few hundred TiB. Having said that, everyone needs a replication strategy to support disaster recovery and that strategy needs to span geographies, data centers and clouds.
MinIO’s continuous replication is designed for large scale, cross data center deployments. By leveraging Lambda compute notifications and object metadata it can compute the delta efficiently and quickly. Lambda notifications ensure that changes are propagated immediately as opposed to traditional batch mode.
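Applications can tap the same event stream directly. The sketch below uses minio-go's bucket notification listener to watch create/delete events - the kind of delta a replication agent consumes - with placeholder endpoint, credentials and bucket name:

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatal(err)
	}

	// Stream create/delete events for a bucket; a replication agent can
	// act on these deltas instead of rescanning the namespace in batch.
	events := []string{"s3:ObjectCreated:*", "s3:ObjectRemoved:*"}
	for info := range client.ListenBucketNotification(context.Background(),
		"mybucket", "", "", events) {
		if info.Err != nil {
			log.Fatal(info.Err)
		}
		for _, record := range info.Records {
			log.Printf("event %s on %s/%s", record.EventName,
				record.S3.Bucket.Name, record.S3.Object.Key)
		}
	}
}
```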
Continuous replication means that data loss will be kept to a bare minimum should a failure occur - even in the face of highly dynamic datasets. Finally, like all that MinIO does, continuous replication is multi-vendor, meaning that your backup location can be anything from NAS to the public cloud.
The modern enterprise has data everywhere. MinIO allows those various instances to be combined to form a unified global namespace. Specifically, any number of MinIO servers can be combined into a Distributed Mode set and multiple Distributed Mode sets can be combined into a MinIO Server Federation. Each MinIO Server Federation provides a unified admin and namespace.
A MinIO Server Federation supports an unlimited number of Distributed Mode sets. The impact of this approach is that an object store can scale massively for large, geographically distributed enterprises while retaining the ability to accommodate a variety of applications (Splunk, Teradata, Spark, Hive, Presto, TensorFlow, H2O) from a single console.
All enterprises are adopting a multi-cloud strategy, and this includes private clouds. As a result, your bare metal, virtualization, containers and public cloud services (including non-S3 providers like Google, Microsoft and Alibaba) have to look identical. While the modern application is highly portable, the data that powers those applications is not.
Making that data available, wherever it may reside, is the primary challenge that MinIO addresses. MinIO runs on bare metal, network attached storage and every public cloud. More importantly, MinIO ensures your view of that data looks exactly the same from an application and management perspective via the Amazon S3 API.
MinIO can go even further, making your existing storage infrastructure compatible with Amazon S3. The implications are profound. Organizations can now truly unify their data infrastructure - from file to block, all appearing as objects accessible via the Amazon S3 API without the requirement for migration.
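Because the S3 API is the single access point, application code stays identical regardless of where the data lives. A minimal sketch with placeholder endpoints and credentials - only the connection details change between an on-premises MinIO deployment and Amazon S3 itself:

```go
package main

import (
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

// newClient returns an S3-compatible client; only the endpoint and
// credentials differ between backends, never the application logic.
func newClient(endpoint, accessKey, secretKey string) (*minio.Client, error) {
	return minio.New(endpoint, &minio.Options{
		Creds:  credentials.NewStaticV4(accessKey, secretKey, ""),
		Secure: true,
	})
}

func main() {
	// On-premises MinIO and Amazon S3, addressed by the same code path.
	if _, err := newClient("minio.example.com:9000", "ACCESS", "SECRET"); err != nil {
		log.Fatal(err)
	}
	if _, err := newClient("s3.amazonaws.com", "ACCESS", "SECRET"); err != nil {
		log.Fatal(err)
	}
}
```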
MinIO is designed to be cloud native and can run as lightweight containers managed by external orchestration services such as Kubernetes. The entire server is a ~40 MB static binary and is highly efficient in its use of CPU and memory resources - even under high loads. The result is that you can co-host a large number of tenants on shared hardware.
MinIO operates on commodity servers with locally attached drives (JBOD/JBOF). All of the servers in a cluster are equal in capability (fully symmetrical architecture). There are no name nodes or metadata servers.
MinIO writes data and metadata together as objects, eliminating the need for a metadata database. In addition, MinIO performs all functions (erasure coding, bitrot checking, encryption) as inline, strictly consistent operations. The result is that MinIO is exceptionally resilient.
Each MinIO cluster is a collection of distributed MinIO servers with one process per node. MinIO runs in the user space as a single process and uses lightweight co-routines for high concurrency. Drives are grouped into erasure sets (16 drives per set by default) and objects are placed on these sets using a deterministic hashing algorithm.
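Deterministic placement can be sketched as hashing the object name to pick an erasure set, with no central metadata lookup involved. FNV-1a is used below purely for illustration; it is not MinIO's actual placement hash:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// setForObject maps an object name to an erasure set index. The same name
// always lands on the same set, so placement needs no metadata server.
func setForObject(object string, setCount uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(object))
	return h.Sum32() % setCount
}

func main() {
	// A 64-drive cluster with the default 16 drives per erasure set has 4 sets.
	for _, obj := range []string{"photos/a.jpg", "logs/2024.txt"} {
		fmt.Printf("%s -> erasure set %d\n", obj, setForObject(obj, 4))
	}
}
```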
MinIO is designed for large-scale, multi-data center cloud storage services. Each tenant runs their own MinIO cluster, fully isolated from every other tenant, which protects them from any disruption caused by upgrades, updates or security incidents. Each tenant scales independently by federating clusters across geographies.