Documentation

Batch Framework

New in MinIO Server RELEASE.2022-10-08T20-11-00Z.

The Batch Framework was introduced with the replicate job type in mc RELEASE.2022-10-08T20-11-00Z.

Overview

The MinIO Batch Framework allows you to create, manage, monitor, and execute jobs using a YAML-formatted job definition file (a “batch file”). Batch jobs run directly on the MinIO deployment, taking advantage of server-side processing power without the constraints of the local machine where you run the MinIO Client.

A batch file defines one job task.

Once you start a job, MinIO begins processing it. Time to completion depends on the resources available to the deployment.

If any portion of the job fails, MinIO retries the job up to the number of times defined in the job definition.

The MinIO Batch Framework supports the following job types:

replicate

Perform a one-time replication procedure from one MinIO location to another MinIO location.

keyrotate

Perform a one-time process to cycle the sse-s3 or sse-kms cryptographic keys on objects.

MinIO Batch CLI

The mc batch commands include:

mc batch generate

The mc batch generate command creates a basic YAML-formatted template file for the specified job type.
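For example, the following sketch writes a replicate template to a local file for editing. The myminio alias is a placeholder for a configured deployment alias, and this assumes the template prints to standard output:

```shell
# Generate a YAML template for a replicate job and save it locally.
# "myminio" is a placeholder alias for a configured MinIO deployment.
mc batch generate myminio/ replicate > replicate.yaml
```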

mc batch start

The mc batch start command launches a batch job from a batch job YAML file.
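A minimal invocation sketch, assuming an edited job definition file named replicate.yaml and a placeholder alias myminio:

```shell
# Start a batch job from a job definition file.
# The server assigns the job an ID, which later commands reference.
mc batch start myminio/ ./replicate.yaml
```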

mc batch list

The mc batch list command outputs a list of the batch jobs currently in progress on a deployment.
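For example, with a placeholder alias:

```shell
# List batch jobs currently in progress on the deployment.
mc batch list myminio/
```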

mc batch status

The mc batch status command outputs real-time summaries of job events on a MinIO server.
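A usage sketch; the job ID shown is a placeholder, use an ID returned by mc batch list:

```shell
# Show real-time summaries of events for a specific job.
# "KwSjvfPB6nifxPsrr16c" is a placeholder job ID.
mc batch status myminio/ KwSjvfPB6nifxPsrr16c
```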

mc batch describe

The mc batch describe command outputs the job definition for a specified job ID.
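A usage sketch with the same placeholder alias and job ID:

```shell
# Print the YAML job definition for a specific job ID.
mc batch describe myminio/ KwSjvfPB6nifxPsrr16c
```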

Access to mc batch

A user’s access keys and policies do not restrict the buckets, prefixes, or objects the batch function can access, or the types of actions the process can perform on those objects.

For some job types, the credentials passed to the batch job through the YAML file do restrict the objects that the job can access. However, any restrictions to the job are from the credentials in the YAML, not policies attached to the user who starts the job.

Use MinIO’s Policy Based Access Control and the administrative policy actions to restrict who can perform various batch job functions. MinIO provides the following admin policy actions for Batch Jobs:

admin:ListBatchJobs

Grants the user the ability to see batch jobs currently in progress.

admin:DescribeBatchJobs

Grants the user the ability to see the definition details of a batch job currently in progress.

admin:StartBatchJob

Grants the user the ability to start a batch job. The job may be further restricted by the credentials the job uses to access either the source or target deployments.

admin:CancelBatchJob

Allows the user to stop a batch job currently in progress.

You can assign any of these actions to users independently or in any combination.
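As an illustration, the following sketch creates a policy that lets a user start and monitor batch jobs but not cancel them. The policy name batch-operator and the alias myminio are assumptions, and the JSON follows the general MinIO policy structure; admin actions typically need no Resource element:

```shell
# Create a policy granting start/list/describe (but not cancel) batch access.
# "batch-operator" and "myminio" are placeholder names.
cat > batch-operator.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "admin:StartBatchJob",
        "admin:ListBatchJobs",
        "admin:DescribeBatchJobs"
      ]
    }
  ]
}
EOF
mc admin policy create myminio/ batch-operator batch-operator.json
```

You can then attach the policy to a user or group as you would any other MinIO policy.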

The built-in ConsoleAdmin policy includes sufficient access to perform all of these batch job actions.

Job Types

Note

Depending on the job type, the success or failure of any batch job may be impacted by the credentials given in the batch job’s YAML for the source or target deployments.

Local Deployment

You run a batch job against a particular deployment by passing an alias to the mc batch command. The deployment you specify in the command becomes the local deployment within the context of that batch job.

Replicate

Use the replicate job type to create a batch job that replicates objects from one MinIO deployment (the source deployment) to another MinIO deployment (the target deployment). Either the source or the target must be the local deployment. Starting with the MinIO Server RELEASE.2023-05-04T21-44-30Z, the other deployment can be either another MinIO deployment or any S3-compatible location.

The batch job definition file can limit the replication by bucket, prefix, and/or filters to only replicate certain objects. The access to objects and buckets for the replication process may be restricted by the credentials you provide in the YAML for either the source or target destinations.

Changed in MinIO Server RELEASE.2023-04-07T05-28-58Z.

You can replicate from a remote MinIO deployment to the local deployment that runs the batch job.

For example, you can use a batch job to perform a one-time replication sync to push objects from a bucket on a local deployment at minio-local/invoices/ to a bucket on a remote deployment at minio-remote/invoices. You can also pull objects from the remote deployment at minio-remote/invoices to the local deployment at minio-local/invoices.
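For the pull case, the remote deployment is the source and therefore carries the endpoint and credentials, while the local target omits them. The following fragment of a replicate job definition is a sketch; the endpoint and credential values are placeholders:

```yaml
# Fragment of a replicate job definition for pulling objects
# from a remote deployment to the local deployment.
source:
  type: minio
  bucket: invoices
  endpoint: "https://minio-remote.example.net"   # placeholder endpoint
  credentials:
    accessKey: REMOTE-ACCESS-KEY                 # placeholder credentials
    secretKey: REMOTE-SECRET-KEY

target:
  type: minio
  bucket: invoices
  # no endpoint or credentials: the target is the local deployment
```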

The advantages of Batch Replication over mc mirror include:

  • Removes the client to cluster network as a potential bottleneck

  • A user needs only the permission to start a batch job; the job runs entirely server-side on the cluster, so no other permissions are required

  • The job provides retry attempts in the event that objects do not replicate

  • Batch jobs are one-time, curated processes allowing fine-grained control over replication

  • (MinIO to MinIO only) The replication process copies object versions from source to target

Changed in MinIO Server RELEASE.2023-02-17T17-52-43Z.

Run batch replication with multiple workers in parallel by specifying the MINIO_BATCH_REPLICATION_WORKERS environment variable.
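A sketch of setting the variable in the server's environment before startup; the worker count of 4 and the storage paths are arbitrary placeholders to tune for your deployment:

```shell
# Run batch replication with 4 parallel workers.
# The value and the server startup command are placeholders.
export MINIO_BATCH_REPLICATION_WORKERS=4
minio server /mnt/data{1...4}
```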

Sample YAML Description File for a replicate Job Type

Create a basic replicate job definition file you can edit with mc batch generate.

For the local deployment, do not specify the endpoint or credentials. Either delete or comment out those lines in the source or target section, depending on which one is the local deployment.

replicate:
  apiVersion: v1
  # source of the objects to be replicated
  # if source is not the local deployment for the command, provide the endpoint and credentials
  source:
    type: TYPE # valid values are "s3" or "minio"
    bucket: BUCKET
    prefix: PREFIX
    # endpoint: ENDPOINT
    # path: "on|off|auto"
    # credentials:
    #   accessKey: ACCESS-KEY
    #   secretKey: SECRET-KEY
    #   sessionToken: SESSION-TOKEN # Available when rotating credentials are used

  # target where the objects must be replicated
  # if target is not the local deployment for the command, provide the endpoint and credentials
  target:
    type: TYPE # valid values are "s3" or "minio"
    bucket: BUCKET
    prefix: PREFIX
    # endpoint: ENDPOINT
    # path: "on|off|auto"
    # credentials:
    #   accessKey: ACCESS-KEY
    #   secretKey: SECRET-KEY
    #   sessionToken: SESSION-TOKEN # Available when rotating credentials are used

  # optional flags based filtering criteria
  # for all source objects
  flags:
    filter:
      newerThan: "7d" # match objects newer than this value (e.g. 7d10h31s)
      olderThan: "7d" # match objects older than this value (e.g. 7d10h31s)
      createdAfter: "date" # match objects created after "date"
      createdBefore: "date" # match objects created before "date"

      # tags:
      #   - key: "name"
      #     value: "pick*" # match objects with tag 'name', with all values starting with 'pick'

      ## NOTE: the metadata filter is not supported when "source" is a non-MinIO deployment.
      # metadata:
      #   - key: "content-type"
      #     value: "image/*" # match objects with 'content-type', with all values starting with 'image/'

  notify:
    endpoint: "https://notify.endpoint" # notification endpoint to receive job status events
    token: "Bearer xxxxx" # optional authentication token for the notification endpoint

  retry:
    attempts: 10 # number of retries for the job before giving up
    delay: "500ms" # least amount of delay between each retry

Key Rotate

New in MinIO Server RELEASE.2023-04-07T05-28-58Z.

Use the keyrotate job type to create a batch job that cycles the sse-s3 or sse-kms keys for encrypted objects.

The YAML configuration supports filters to restrict key rotation to a specific set of objects by creation date, tags, metadata, or kms key. You can also define retry attempts or set a notification endpoint and token.

Sample YAML Description File for a keyrotate Job Type

Create a basic keyrotate job definition file you can edit with mc batch generate.

keyrotate:
  apiVersion: v1
  bucket: bucket
  prefix: 
  encryption:
    type: sse-kms # valid values are sse-s3 and sse-kms
    
    # The following encryption values only apply for sse-kms type.
    # For sse-s3 key types, MinIO uses the key provided by the MINIO_KMS_KES_KEY_FILE environment variable.
    # The following two values are ignored if type is set to sse-s3.
    key: my-new-keys2 # valid only for sse-kms
    context: <new-kms-key-context> # valid only for sse-kms

  # optional flags based filtering criteria
  flags:
    filter:
      newerThan: "84h" # match objects newer than this value (e.g. 7d10h31s)
      olderThan: "80h" # match objects older than this value (e.g. 7d10h31s)
      createdAfter: "2023-03-02T15:04:05Z" # match objects created after this RFC 3339 date
      createdBefore: "2023-03-02T15:04:05Z" # match objects created before this RFC 3339 date
      tags:
        - key: "name"
          value: "pick*" # match objects with tag 'name', with all values starting with 'pick'
      metadata:
        - key: "content-type"
          value: "image/*" # match objects with 'content-type', with all values starting with 'image/'
      kmskey: "key-id" # match objects with KMS key-id (applicable only for sse-kms)
  
  # optional entries to add notifications for the job
  notify:
    endpoint: "https://notify.endpoint" # notification endpoint to receive job status events
    token: "Bearer xxxxx" # optional authentication token for the notification endpoint
  
  # optional entries to add retry attempts if the job is interrupted
  retry:
    attempts: 10 # number of retries for the job before giving up
    delay: "500ms" # least amount of delay between each retry
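Putting the pieces together, a typical keyrotate workflow might look like the following sketch; the alias and file name are placeholders, and this assumes the generated template prints to standard output:

```shell
# Generate a keyrotate template, edit it, then start the job.
# "myminio" and "keyrotate.yaml" are placeholder names.
mc batch generate myminio/ keyrotate > keyrotate.yaml
# ... edit keyrotate.yaml to set the bucket, encryption type, and filters ...
mc batch start myminio/ ./keyrotate.yaml
```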