Site Failure Recovery

MinIO can make the loss of an entire site, while significant, a relatively minor incident. Site recovery depends on the replication option you use for the site.

Site Replication
    Total restoration of IAM configurations, bucket configurations, and data from the healthy peer site(s)

Bucket Replication
    Data restoration of objects and metadata from a healthy remote location for each bucket configured for replication

mc mirror
    Data restoration of objects only from a healthy remote location with no versioning

Site replication healing automatically adds IAM settings, buckets, bucket configurations, and objects from the existing site(s) to the new site with no further action required.

You cannot configure site replication if any bucket replication rules remain in place on other healthy sites. Bucket replication is mutually exclusive with site replication.

If you are switching from bucket replication to site replication, you must remove all bucket replication rules from the healthy site before setting up site replication.
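Clearing the bucket replication rules before the switch could look like the following. This is a minimal sketch, not an official procedure: the function name, the MC override variable, and the alias and bucket names are hypothetical, and the exact flags for mc replicate rm can differ between mc releases, so confirm them with mc replicate rm --help first.

```shell
#!/bin/sh
# MC defaults to the real client; set MC="echo mc" to print the commands
# instead of running them (hypothetical dry-run convention for this sketch).
MC="${MC:-mc}"

# clear_bucket_replication ALIAS BUCKET [BUCKET...]
# Lists, then removes, every replication rule on each named bucket.
clear_bucket_replication() {
    alias_name="$1"
    shift
    for bucket in "$@"; do
        # Show the rules that are about to be removed.
        $MC replicate ls "$alias_name/$bucket"
        # Remove all rules on the bucket; stop at the first failure.
        $MC replicate rm --all --force "$alias_name/$bucket" || return 1
    done
}
```

For example, MC="echo mc" followed by clear_bucket_replication siteA photos logs prints the four commands that would run against the hypothetical alias siteA.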

Restore an Unhealthy Peer to Site Replication

Important

The RELEASE.2023-01-02T09-40-09Z MinIO server release (https://github.com/minio/minio/releases/tag/RELEASE.2023-01-02T09-40-09Z) includes important fixes for removing a downed site in replication configurations containing three or more peer sites.

For deployments configured for site replication, plan to test and upgrade all peer sites to the specified release. In the event of a site failure, you can update the remaining healthy sites to the specified version and use this procedure.

Site replication keeps IAM policies, buckets, bucket configurations, objects, and object metadata in sync across two or more MinIO deployments. If a peer site fails, for example because of a major disaster or a prolonged power outage, you can use the remaining healthy site(s) to restore the replicable data.

The following procedure can restore data in scenarios where site replication was active prior to the site loss. This procedure assumes a total loss of one or more peer sites, rather than replication lag or delays caused by latency or transient deployment downtime.

  1. Remove the failed site from the MinIO site replication configuration using the mc admin replicate rm command with the --force option.

    The following command force-removes an unhealthy peer site from the replication configuration:

    mc admin replicate rm HEALTHY_PEER UNHEALTHY_PEER --force
    
    • Replace HEALTHY_PEER with the alias of any healthy peer in the replication configuration

    • Replace UNHEALTHY_PEER with the alias of the unhealthy peer site

    All healthy peers in the site replication configuration update to remove the unhealthy peer automatically. You can use the mc admin replicate info command to verify the new site replication configuration.

  2. Deploy a new MinIO site following the site replication requirements.

    • Do not upload any data or otherwise configure the deployment beyond the stated requirements.

    • Validate that the new MinIO deployment functions normally and has bidirectional connectivity to the other peer sites.

    • Ensure the new site runs the same MinIO server version as the existing peer sites.

    Warning

    The mc admin replicate rm --force command only operates on the online or healthy nodes in the site replication configuration. The removed offline MinIO deployment retains its original replication configuration, so if that deployment resumes normal operation, it continues replicating to its configured peer sites.

    If you plan to re-use the hardware for the site replication configuration, you must completely wipe the drives for the deployment before re-initializing MinIO and adding the site back to the replication configuration.

  3. Add the replacement peer site to the replication configuration.

    Use the mc admin replicate add command to update the replication configuration with the new site:

    mc admin replicate add HEALTHY_PEER NEW_PEER
    
    • Replace HEALTHY_PEER with the alias of any healthy peer in the replication configuration

    • Replace NEW_PEER with the alias of the new peer

    All healthy peers in the site replication configuration update for the new peer automatically. You can use the mc admin replicate info command to verify the new site replication configuration.

  4. Resynchronize the new peer with mc admin replicate resync.

    mc admin replicate resync start HEALTHY_PEER NEW_PEER
    
    • Replace HEALTHY_PEER with the alias of any healthy peer in the replication configuration

    • Replace NEW_PEER with the alias of the new peer

  5. Validate the replication status.

    Use the following commands to track the replication status:
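The procedure above can be sketched as a small set of shell functions. This is a minimal sketch, not an official script: the function names and the MC override variable are hypothetical, and pairing mc admin replicate status with mc admin replicate info for step 5 is an assumption about which status commands fit here.

```shell
#!/bin/sh
# MC defaults to the real client; set MC="echo mc" to print the commands
# instead of running them (hypothetical dry-run convention for this sketch).
MC="${MC:-mc}"

# Step 1: force-remove the failed peer, then show the updated configuration.
remove_failed_peer() {
    healthy="$1"
    unhealthy="$2"
    $MC admin replicate rm "$healthy" "$unhealthy" --force &&
    $MC admin replicate info "$healthy"
}

# Steps 3 and 4: add the replacement peer, then start resynchronization.
add_and_resync_peer() {
    healthy="$1"
    new="$2"
    $MC admin replicate add "$healthy" "$new" &&
    $MC admin replicate resync start "$healthy" "$new"
}

# Step 5: show the replication configuration and the overall status
# (the status command is an assumption, see the text above).
check_replication() {
    healthy="$1"
    $MC admin replicate info "$healthy" &&
    $MC admin replicate status "$healthy"
}
```

Step 2 (deploying the replacement site) happens outside this script. A run might call remove_failed_peer siteA siteB, then add_and_resync_peer siteA siteC, then check_replication siteA, where siteA, siteB, and siteC are placeholder aliases.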

Active Bucket Replication Resynchronization

For scenarios where bucket replication was in place prior to the failure, you can use mc replicate resync to restore data to a new site. Create a new site to replace the failed deployment, then synchronize the data from an existing, healthy, bucket replication-enabled deployment to the new site.

  1. Deploy a new MinIO site.

  2. Set up IAM and users as needed.

  3. On the site with data, create a new remote target using the mc admin bucket remote add command and record the ARN from the output.

  4. From the site with the data, use the mc replicate resync start command with the ARN from the previous command to rebuild the bucket on the new site.

  5. Wait for re-synchronization to complete (use mc replicate resync status to check).

  6. Set up bucket replication rule(s) from the new MinIO site to the existing target bucket(s).

  7. (Optional) Delete the bucket replication rules from the target deployment(s) to restore an active-passive replication scenario.
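Steps 3 through 5 above can be sketched as shell functions. This is a sketch under assumptions: the function names and the MC override variable are hypothetical, the --service replication and --remote-bucket flags reflect commonly documented mc usage and may differ in your mc release, and the ARN must be copied by hand from the output of the remote add command.

```shell
#!/bin/sh
# MC defaults to the real client; set MC="echo mc" to print the commands
# instead of running them (hypothetical dry-run convention for this sketch).
MC="${MC:-mc}"

# Step 3: register the new site's bucket as a remote target.
# $1 = ALIAS/BUCKET on the site with data
# $2 = target URL, including credentials, for the bucket on the new site
# The command output includes the ARN needed by the functions below.
add_remote_target() {
    $MC admin bucket remote add "$1" "$2" --service replication
}

# Step 4: start resynchronization toward the target identified by the ARN.
# $1 = ALIAS/BUCKET on the site with data, $2 = ARN from step 3
start_resync() {
    $MC replicate resync start "$1" --remote-bucket "$2"
}

# Step 5: check resynchronization progress for the same bucket and ARN.
resync_status() {
    $MC replicate resync status "$1" --remote-bucket "$2"
}
```

Repeat the three calls for each bucket that had replication configured. Check mc admin bucket remote add --help on your release first, since the remote target workflow has changed across mc versions.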

Passive Bucket Replication Resynchronization

Bucket replication can directly restore the site contents by performing a replication from the target bucket(s) to a new MinIO site.

As a passive process, bucket replication may not perform as quickly as desired for a site recovery scenario.

Passive bucket replication relies on the standard replication scanner queue, which does not take priority over other processes. For recovery procedures with a stricter SLA or SLO, use the active bucket replication process with the mc replicate resync command as described above.

Bucket replication rules copy the object, its version ID, versions, and other metadata to the target bucket. MinIO can restore the object with all of these attributes to a new MinIO site if bucket replication had already been in use prior to the site loss.

  1. Deploy a new MinIO site.

  2. Set up IAM and users as needed.

  3. On the remaining target bucket deployment(s), create bucket replication rule(s) for each bucket to the new MinIO site.

  4. Wait for replication to complete.

  5. Set up bucket replication rule(s) from the new MinIO site to the existing target bucket(s).

  6. (Optional) Delete the bucket replication rules from the target deployment(s) to restore an active-passive replication scenario.

    Do not delete the bucket replication rules from the deployments used to recover data if you prefer to keep an active-active replication between the buckets. In active-active replication, changes to the objects at either location affect the objects at the other location.
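The rule management in steps 3 through 6 can be sketched with mc replicate. This is a sketch under assumptions: the function names and the MC override variable are hypothetical, the value passed to --remote-bucket (an ARN or a target URL) depends on your mc release, and the existing-objects option is assumed to be what makes the rule pick up objects written before the rule existed. The target bucket is assumed to already exist with versioning enabled.

```shell
#!/bin/sh
# MC defaults to the real client; set MC="echo mc" to print the commands
# instead of running them (hypothetical dry-run convention for this sketch).
MC="${MC:-mc}"

# Steps 3 and 5: create a replication rule from a source bucket to a target.
# $1 = source ALIAS/BUCKET
# $2 = remote bucket identifier (ARN or URL, depending on the mc release)
add_replication_rule() {
    $MC replicate add "$1" --remote-bucket "$2" \
        --replicate "delete,delete-marker,existing-objects"
}

# Step 4: report replication progress for the source bucket.
replication_status() {
    $MC replicate status "$1"
}

# Step 6 (optional): remove all rules from a bucket to return to
# active-passive replication.
remove_replication_rules() {
    $MC replicate rm --all --force "$1"
}
```

Calling add_replication_rule once in each direction gives the active-active arrangement. Skip remove_replication_rules if you want to keep it.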

Mirroring

MinIO’s mirroring copies objects from any S3-compatible storage system.

Mirroring only copies the latest version of each object and does not include versioning metadata, regardless of the source. You cannot restore those attributes with this method.

Use mc mirror in situations where you need to restore only the latest version of an object. Use bucket replication or site replication where those methods were already in use if you are copying from another MinIO deployment and wish to restore the object’s version history and version metadata.

  1. Deploy a new MinIO site.

  2. Set up IAM and users as needed.

  3. Create buckets on the new site.

  4. Use the mc mirror CLI command to copy the contents from the mirror location to the new MinIO site.
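Steps 3 and 4 above can be sketched as one shell function built on mc mb and mc mirror, the command this section describes. The function name, the MC override variable, and the alias and bucket names are hypothetical. Only the latest version of each object is copied.

```shell
#!/bin/sh
# MC defaults to the real client; set MC="echo mc" to print the commands
# instead of running them (hypothetical dry-run convention for this sketch).
MC="${MC:-mc}"

# restore_bucket_from_mirror SOURCE NEW
# $1 = ALIAS/BUCKET at the mirror location
# $2 = ALIAS/BUCKET on the new MinIO site
restore_bucket_from_mirror() {
    source_bucket="$1"
    new_bucket="$2"
    # Step 3: create the bucket on the new site if it does not exist yet.
    $MC mb --ignore-existing "$new_bucket" &&
    # Step 4: copy the latest version of every object to the new site.
    $MC mirror "$source_bucket" "$new_bucket"
}
```

For example, restore_bucket_from_mirror backup/photos newsite/photos restores one bucket. Repeat the call for each bucket.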