MinIO with VMware FAQs
How does MinIO integrate with vSphere?
vSphere 7.0 Update 1 ships with the VMware vSAN™ Data Persistence platform, which enables software-defined storage offerings like MinIO to be natively integrated with vCenter Workload Clusters running on top of vSAN. This deep integration allows admins to enable, manage, and monitor MinIO from the vSphere APIs and UI.
The integration includes lifecycle management of MinIO object storage via the vSphere cluster while offering bare-metal-like performance and cost efficiencies.
Which version of VMware vSphere do I need?
MinIO is built on top of the vSAN Data Persistence platform and is available in vSphere 7.0 Update 1 or higher.
Which workloads can leverage MinIO integrated with vSphere?
Object stores are often the core infrastructure that data-intensive applications depend on. Because of MinIO's performance characteristics, it is used across a number of applications and use cases:
ML/AI: TensorFlow, KubeFlow, H2O.ai
BigData: Splunk, Spark, Presto, Druid, Teradata
Content Delivery: VMware Harbor Container Registry, Video Streaming
Backup: Velero, Veeam
Which hardware profiles can be used?
MinIO can run on almost all hardware profiles. Our recommended partners are listed here.
What is the recommended hardware configuration for a cost-effective, capacity-optimized deployment?
We recommend the following:
- CPUs/Node: 2 Xeon Gold CPUs with 8 cores per CPU
- NIC/Node: 25GbE for Capacity
- Memory/Node: 128GB
- Drives/Node: Minimum 8 HDD (16TiB/drive)
- Minimum No of Nodes per Cluster: 4
- HW OEM Options: Please refer to the HW OEMs listed here for Capacity - https://min.io/product/reference-hardware
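For orientation, the minimum figures above imply the following raw capacity. This is simple arithmetic on the listed numbers, not an official sizing guide, and it does not account for erasure coding overhead:

```python
# Raw capacity implied by the minimum capacity-oriented configuration:
# 8 HDDs per node at 16 TiB each, across the minimum of 4 nodes.
drives_per_node = 8
tib_per_drive = 16
nodes = 4

raw_tib = drives_per_node * tib_per_drive * nodes
print(raw_tib)  # 512 TiB raw, before erasure coding overhead
```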
What is the multi-tenancy model when using MinIO integrated with vSphere?
MinIO integrated with vSphere leverages the Data Persistence platform to create multiple instances of MinIO, allowing isolated tenants to be onboarded on the same vSphere installation. Each tenant lives in its own namespace, and the VI admin can carve out resources, set quotas, and assign them to each tenant.
How does MinIO’s encryption relate to vSAN’s encryption?
MinIO’s encryption functionality is implemented at the object layer and is designed for high-performance I/O. Each object is independently encrypted with its own key. Applications can supply these encryption keys from the client side or via an external Key Management Service. You do not need vSAN encryption when MinIO’s encryption is turned on.
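To illustrate the client-supplied key option, here is a minimal sketch of how a standard S3 SSE-C request derives its headers from a 256-bit key. This shows the generic S3 SSE-C convention, not MinIO internals, and the key here is randomly generated for illustration:

```python
import base64
import hashlib
import os

# SSE-C: the client supplies a 256-bit key with every request; the server
# encrypts the object with it and keeps only the key's MD5 for verification.
def ssec_headers(key: bytes) -> dict:
    if len(key) != 32:
        raise ValueError("SSE-C requires a 256-bit (32-byte) key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
        "x-amz-server-side-encryption-customer-key-md5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode(),
    }

headers = ssec_headers(os.urandom(32))
```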
Can I run MinIO on a Guest Cluster?
Running MinIO as a guest cluster is possible but not recommended. MinIO needs storage on the nodes, exposed either through the vSphere CSI driver or from VM-local storage using the DirectCSI driver.
How do I map applications to object-store instances/buckets?
Apps connect to an S3 endpoint, which is provided in the MinIO instance plugin UI, using virtual-hosted-style requests. MinIO is strictly consistent and has no limit on the number of buckets. If more separation is required, multiple MinIO clusters can also be deployed on the same bare-metal or Kubernetes platform to achieve multi-tenancy.
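For reference, virtual-hosted-style requests put the bucket name in the hostname rather than in the URL path. A small sketch, where the endpoint domain, bucket, and object names are placeholders:

```python
# Virtual-hosted style puts the bucket in the hostname;
# path style puts the bucket at the start of the path.
def object_url(bucket: str, key: str, endpoint: str = "minio.example.com",
               virtual_hosted: bool = True) -> str:
    if virtual_hosted:
        return f"https://{bucket}.{endpoint}/{key}"
    return f"https://{endpoint}/{bucket}/{key}"

print(object_url("analytics", "logs/2021.json"))
# -> https://analytics.minio.example.com/logs/2021.json
```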
Does MinIO support site to site replication?
MinIO’s server-side replication feature supports both active-passive and active-active configurations to keep two data centers continuously in sync. Objects are replicated immediately as long as there is sufficient bandwidth.
How do I backup MinIO?
MinIO provides continuous data protection with built-in object versioning, immutability, and server-side replication. You can roll back objects to any point in time without ever taking snapshots. The remote site is kept continuously in sync, including the object version history. Since MinIO itself is a popular backup target, this eliminates the need for yet another backup storage tier.
How do I backup/restore MinIO instances?
Object storage is typically backed up to another cluster in a geographically distant location. Cost depends on the needs of the backup, i.e., whether it requires active-active replication or is purely archival. In either case the cost is directly related to the amount of hardware needed for the backup, so if not all data needs to be backed up, less hardware can be used at the DR site. If MinIO is being backed up to a cloud provider such as AWS S3, the cost is specific to that provider.
Best practices for replication and backups tend to be application- and customer-specific; see https://github.com/minio/minio/blob/master/docs/bucket/replication/DESIGN.md and https://docs.min.io/minio/baremetal/replication/replication-overview.html for a detailed guide.
How do I archive MinIO objects?
MinIO supports object expiry and transition via its ILM APIs. Objects are selected for archival based on object tags, expiry timelines, and object name prefixes. The API allows you to set up richer rules to selectively expire objects or transition them to a remote, low-cost, HDD-based MinIO site.
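As a hedged illustration, an S3-style lifecycle configuration of the kind the ILM APIs accept might look like the following. The rule ID, prefix, day counts, and tier name ("COLDTIER") are placeholders, not values from this document:

```python
import json

# Sketch of an S3-style lifecycle configuration. Objects under the "logs/"
# prefix transition to a remote low-cost tier after 30 days and are
# expired (deleted) after a year.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transition": {"Days": 30, "StorageClass": "COLDTIER"},
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(lifecycle, indent=2))
```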
How does MinIO work with Disaster Recovery planning on VMware?
MinIO supports disaster recovery planning through its server-side replication functionality. You can pair buckets with remote MinIO setups to keep them in sync continuously.
Do I need to plan for spare rebuild capacity?
MinIO uses Erasure Coding to evenly distribute all objects across the available servers. Spare nodes are not necessary in this configuration.
What is the software license for deploying MinIO on vSphere?
Information on the MinIO Subscription Network (SUBNET) can be found here. VMware requires a license key for all ISVs on the vSAN Data Persistence platform. There is a 60-day trial period, during which the enterprise is automatically onboarded to SUBNET.
How do I get support for MinIO on VMware?
VMware requires that all of the ISVs on the vSAN Data Persistence platform offer licensing terms and product support independently of VMware. Information on the MinIO Subscription Network can be found here.
Does MinIO use vCenter HA?
MinIO does not depend on vCenter HA. It is a highly available distributed object storage system that can tolerate multiple node and drive failures through its erasure coding and bitrot protection mechanisms.
How can I find usage for buckets in a MinIO instance?
vSAN can report usage consumption for the storage pools that host MinIO. Capacity for vSAN SNA and vSAN Direct is thick-provisioned: the entire capacity needed is pre-provisioned for MinIO. For a further breakdown of used capacity, the MinIO Console provides a bucket-level summary.
What kind of performance would we expect from such a capacity-optimized configuration?
MinIO is the world’s fastest object store. Benchmarks can be found here.
Example results of MinIO HDD Cluster Performance:
- Read throughput: 16.3GB/s
- Write throughput: 9.4GB/s
What are the rough $/GiB numbers for this capacity-optimized configuration?
Please see the reference hardware page for various calculators. Please note that the calculators do not include VMware vSAN DPP-related license costs.
What is the recommendation for performant hardware configuration?
We recommend a cluster built with NVMe drives and 100GbE NICs.
Please refer to the following recommendation for an NVMe cluster:
- CPUs/Node: 2 Xeon Gold CPUs with 8 cores per CPU
- NIC/Node: 100GbE
- Memory/Node: 128GB
- Drives/Node: Minimum 8 NVMe (7.6TiB+ / drive)
- Minimum No of Nodes per Cluster: 4
- HW OEM Options: Please refer to the HW OEMs listed here for Performance - https://min.io/product/reference-hardware
Note: This recommendation does not account for any Kubernetes master node configuration requirements, which are not addressed in the list above.
What is the rough $/GiB for the performant configuration?
For large-scale clusters, this is primarily a function of the drives used for MinIO object storage. NVMe SSDs are more expensive than HDDs, but the cost of NVMe is coming down significantly. The $/GiB also depends on the erasure coding configuration used in the cluster.
Note: MinIO does not publish an NVMe-specific calculator, but NVMe costs at a high level can be accessed here.
What is the usable capacity based on the Erasure Coding scheme on MinIO for the 500TiB raw capacity instance?
How do I plan in terms of CPU, RAM, number of drives per pod and number of nodes for each instance based on the expected performance?
Does the number of pods or PVs affect performance?
Generally, more pods and PVs provide more parallelism, which improves performance. The number is limited by the available physical resources.
What is the relationship between users and instances?
MinIO looks at tenancy from a resource management perspective (i.e., storage classes or buckets). Most applications only require an endpoint and a bucket to function. Multi-tenancy is useful for grouping similar applications, groups, or users for resource management (e.g., reporting) as well as operational management. There are no technical limitations restricting the sharing of many buckets in a large MinIO deployment.
What are the pros and cons of one application to one instance versus many applications to one instance and using individual buckets within the instance for each application?
This is dependent on the operational needs of a given organization. Many applications to a single instance will typically be used in scenarios where multiple groups from the same organization want to utilize a single cluster with resource access defined via IAM policies. This use case can be simpler operationally as there is only a single cluster to manage. Using unique instances for a given application is typically deployed for two use cases. The first, when you wish to have complete isolation of data, for example when providing MinIO services to end customers, you may wish to ensure that an incorrect policy application does not allow one customer to see another customer's data. The second is when you want to have different configurations for each unique cluster. One example would be to allow for different erasure code parity settings for non-critical data in one cluster, and higher redundancy for another cluster with business critical data.
Is there any consideration for the tenancy model from a performance and security isolation perspective?
Multi-user and multi-tenancy options are available for customers to deploy MinIO in a variety of configurations without negatively impacting the performance of the system.
MinIO Policy-Based Access Control (PBAC) supports granular isolation of user access down to the bucket or bucket prefix. For applications that require exclusive access to object storage, you can deploy a MinIO tenant for supporting only those applications. For example:
- Applications write to the same bucket on a single tenant.
- Applications write to their own bucket on a single tenant.
- Applications write to their own MinIO Tenant.
- Storage and compute resources are exclusive to each tenant.
What is needed to match erasure coding and drive technology to a target number of 9s of durability?
Most cloud storage vendors advertise 11 nines of data durability. MinIO is designed from the ground up as a drop-in replacement for the AWS S3 service across various deployment methods; using similar commodity hardware and features (i.e., erasure coding and replication), MinIO can easily provide 11 9s of durability. Due to its object-level granularity, some calculations suggest it is possible to provide even better durability, but this would require very conservative erasure coding (EC) configurations that may not be practical or cost effective.
11 9s of durability (99.999999999%) translates into about 12 objects lost per year per PiB. This can easily be prevented by using an N/2 EC configuration and a minimum of 2x site replication.
Basic calculations around erasure coding (EC) and its impact on durability can be found here.
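One such basic calculation, as a sketch: treating drive failures as independent, the probability of data loss in a single erasure set is the probability that more drives than the parity count fail within one rebuild window. The 2% annual failure rate and one-week rebuild window below are assumptions for illustration only:

```python
from math import comb

# P(data loss in one erasure set) = P(more than `parity` of `stripe` drives
# fail within the same rebuild window), with independent failures.
def loss_probability(stripe: int, parity: int, p_fail: float) -> float:
    return sum(
        comb(stripe, k) * p_fail**k * (1 - p_fail) ** (stripe - k)
        for k in range(parity + 1, stripe + 1)
    )

# Assumed: 2% annual drive failure rate, one-week rebuild window.
p_window = 0.02 * 7 / 365
print(loss_probability(16, 4, p_window))
```

Higher parity sharply reduces the loss probability, which is the trade-off behind the "very conservative EC configurations" mentioned above.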
Does MinIO support Multi-AZ deployment?
MinIO can be deployed in multiple availability zones via replication. Multiple MinIO instances (or tenants) can be deployed against disparate Kubernetes / VMware clusters that are deployed in various zones or regions of the overall Global Data Center footprint. Any given MinIO tenant should reside in a single vSphere cluster. Multiple tenants can exist in a single cluster. Stretch clusters are not recommended. MinIO supports active-active replication between independent tenants, where objects written to one tenant are automatically synchronized to the remote. Documentation can be found here.
What is the performance cost of sync or durability cost of async for replication?
MinIO supports both sync and async replication across multiple sites. The choice will depend on each customer's latency and throughput availability across these data centers, availability zones (AZs), or sites in general.
The performance cost for synchronous replication will be directly related to available throughput and intersite network latency since the application will have to wait for the roundtrip to complete.
The durability cost of asynchronous replication is similarly affected by available throughput and network latency, but because MinIO implements object-level granularity, the impact is minimized compared to other async approaches.
When can we set up replicas and EC of an instance?
MinIO EC is configured at Tenant creation as part of the Tenant Size step. Select the Erasure Code Parity setting to use for the initial Zone. MinIO has an Erasure Code Calculator to assist in selecting an appropriate parity setting based on the desired Zone topology.
When expanding a tenant, the new zone inherits its EC setting from the tenant. MinIO requires that the new zone have a sufficient number of drives to support the tenant's EC configuration; specifically, the new zone must have at least 2x the number of drives needed to support the same EC configuration as the existing zones in that tenant.
You can set up replication at any time after deploying the MinIO Tenant. MinIO server-side replication requires a MinIO service as the replication target. MinIO has documented procedures for One-Way Active-Passive and Two-Way Active-Active replication.
MinIO also provides the mc mirror tool for performing client-side content synchronization to any S3-compatible service.
How do I upgrade my MinIO instances?
The MinIO plugin tenant tab indicates when an update is available. Customers can simply click in the UI to trigger a rolling instance upgrade. There is no service interruption.
How do I scale the instance as it grows? How does the scaling impact performance over time?
MinIO scales seamlessly from TiB to PiB. To learn more about the details we suggest this documentation.
Customers can also modify the vCPU allocation for the tenant. Please refer to this documentation for additional information.
Customers can also add more vSAN direct drives to scale capacity for each ESXi server. ESXi servers may also be expanded to accommodate capacity requirements.
What's the performance cost of instance rebuilding overhead on the production workload?
For single-drive failures there is no performance impact. In general, the rebuild workload takes lower priority than any active workload running in parallel. This process executes entirely in the background, with no active management required from the customer.
Do we ever need any kind of rebalancing for capacity?
No. Rebalancing is not required with MinIO.
How do I upgrade the certified vSphere operator that manages the MinIO instances?
Please see the compatibility matrix below. An Operator upgrade must stay within the boundaries of a single row to keep all components compatible. If a user wants to upgrade the MinIO Operator to a version in a different row, the vDPP plugin must be upgraded as well; plugin upgrades are currently not handled from the UI.
- Shipped with 7.0U1
- Shipped with 7.0U2: last v3 operator is v3.0.29
- Shipped with 7.0U3: latest v4 operator is v4.2.14