MinIO Object Storage

A high performance, distributed object storage server, designed for large-scale data infrastructure. It is an ideal S3-compatible replacement for Hadoop HDFS for machine learning and other big data workloads

Architecture
Architecture

MinIO supports a range of modern workloads from machine learning to backup by
employing cloud native technologies and disaggregating the compute and storage layer
to create highly efficient and scalable object storage solutions.

Erasure Coding

MinIO protects data with per-object, inline erasure coding which is written in assembly code to deliver the highest performance possible. MinIO uses Reed-Solomon code to stripe objects into n/2 data and n/2 parity blocks - although these can be configured to any desired redundancy level. This means that in a 12 drive setup, an object is sharded across as 6 data and 6 parity blocks. Even if you lose as many as 5 ((n/2)–1) drives, be it parity or data, you can still reconstruct the data reliably from the remaining drives. MinIO’s implementation ensures that objects can be read or new objects written even if multiple devices are lost or unavailable. Finally, MinIO's erasure code is at the object level and can heal one object at a time.

Erasure Coding

Bitrot Protection

Silent data corruption or bitrot is a serious problem faced by disk drives resulting in data getting corrupted without the user’s knowledge. The reasons are manifold (aging drives, current spikes, bugs in disk firmware, phantom writes, misdirected reads/writes, driver errors, accidental overwrites) but the result is the same - compromised data.

MinIO’s optimized implementation of the HighwayHash algorithm ensures that it will never read corrupted data - it captures and heals corrupted objects on the fly. Integrity is ensured from end to end by computing hash on READ and verifying it on WRITE from the application, across the network and to the memory/drive. The implementation is designed for speed and can achieve hashing speeds over 10 GB/sec on a single core on Intel CPUs.

Bitrot Protection

Encryption

It is one thing to encrypt data in flight it is another to protect data at rest. MinIO supports multiple, sophisticated server-side encryption schemes to protect data - wherever it may be. MinIO’s approach assures confidentiality, integrity and authenticity with negligible performance overhead. Server side and client side encryption are supported using AES-256-GCM, ChaCha20-Poly1305 and AES-CBC. Encrypted objects are tamper-proofed with AEAD server side encryption. Additionally, MinIO is compatible with and tested against all commonly used Key Management solutions (e.g. HashiCorp Vault).

MinIO uses a key-management-system (KMS) to support SSE-S3. If a client requests SSE-S3, or auto-encryption is enabled, the MinIO server encrypts each object with an unique object key which is protected by a master key managed by the KMS. Given the exceptionally low overhead, auto-encryption can be turned on for every application and instance.

WORM

When WORM is enabled, MinIO disables all APIs that can potentially mutate the object data and metadata. The means that data once written becomes tamper-proof. This has practical applications for a number of different regulatory requirements.

Encryption & WORM

Identity Management

MinIO supports the most advanced standards in identity management, integrating with the OpenID connect compatible providers as well as key external IDP vendors. That means that access is centralized and passwords are temporary and rotated, not stored in config files and databases. Furthermore, access policies are fine grained and highly configurable which means that supporting multi-tenant and multi-instance deployments become simple.

Identity Management

Continuous Replication

The challenge with traditional replication approaches is that they do not scale effectively beyond a few hundred TB. Having said that, everyone needs a replication strategy to support disaster recovery and that strategy needs to span geographies, data centers and clouds. MinIO’s continuous replication is designed for large scale, cross data center deployments. By leveraging Lambda compute notifications and object metadata it can compute the delta efficiently and quickly.

Lambda notifications ensure that changes are propagated immediately as opposed to traditional batch mode. Continuous replication means that data loss will be kept to a bare minimum should a failure occur - even in the face of highly dynamic datasets. Finally, like all that MinIO does, continuous replication is multi-vendor, meaning that your backup location can be anything from NAS to the public cloud.

Continuous Replication

Global Federation

The modern enterprise has data everywhere. MinIO allows those various instances to be combined to form a unified global namespace. Specifically, up to 32 MinIO servers can be combined into a Distributed Mode set and multiple Distributed Mode sets can be combined into a MinIO Server Federation. Each MinIO Server Federation provides a unified admin and namespace.

A MinIO Federation Server supports an unlimited number of Distributed Mode sets.

The impact of this approach is that an object store can scale massively for large, geographically distributed enterprise while retaining the ability to accommodate a variety of applications (S3 Select, MinSQL, Spark, Hive, Presto, TensorFlow, H20) from a single console.

Global Federation

Multi-Cloud Gateway

All enterprises are adopting a multi-cloud strategy. This also includes private clouds. As a result, your bare-metal virtualization containers and public cloud services (including non-S3 providers like Google, Microsoft and Alibaba) have to look identical. While the modern application is highly portable, the data that powers those applications is not.

Making that data available, wherever it may reside, is the primary challenge that MinIO addresses. MinIO runs on bare metal, network attached storage and every public cloud. More importantly, MinIO ensures your view of that data looks exactly the same from an application and management perspective via the Amazon S3 API.

MinIO, can go even further, making your existing storage infrastructure compatible with Amazon S3. The implications are profound. Now organizations can truly unify their data infrastructure - from file to block, all appearing as objects accessible via the Amazon S3 API without the requirement for migration.

Multi-cloud Gateway
Reference Hardware

High Performance Software Demands High Performance Hardware

While MinIO is hardware agnostic, these industry standard, widely available boxes are tested to deliver against the exceptional capabilities of our software. We work tirelessly to optimize our software against the latest hardware and, as a result, can significantly outperform overpriced, outdated appliances.