Automating Batch Execution

Overview

Automating batch model inference simplifies running batch inference jobs on a recurring schedule.

You can set up scheduled executions that process data and keep results up to date without manual intervention and without the need to set up external scheduling tools such as Airflow.


Configuring BatchExecution

A batch execution automation runs a storage-based execution on a model that is deployed as batch.


The model must be deployed as batch before the automation runs; otherwise, the automation will fail.


Defining a batch execution automation includes two parts:

  1. BatchJobDataSpecifications - Specifies where the automation reads input data from and where it writes the results.
  2. BatchJobExecutionSpecifications (optional) - Defines custom resources for the batch execution. When omitted, the parameters of the batch deployment are used.

For details on all available parameters for configuring batch model executions, refer to the Storage-Based Execution page.

from qwak.automations import (
    Automation,
    ScheduledTrigger,
    BatchExecution,
    BatchJobDataSpecifications,
    BatchJobExecutionSpecifications,
)

batch_execution_automation = Automation(
    name="scheduled_batch_inference",
    model_id="my-model-id",  # the model must already be deployed as batch
    trigger=ScheduledTrigger(cron="0 0 * * *"),  # every day at 00:00 (UTC)
    action=BatchExecution(
        # Where to read the input data from and where to write the results to
        data_specifications=BatchJobDataSpecifications(
            # Names of the secrets that hold the storage access credentials
            access_token_secret_name="api-token",
            access_secret_secret_name="api-secret",
            source_bucket="input-s3-bucket",
            source_folder="data_folder",
            input_file_type="<csv/parquet/feather>",  # pick one
            destination_bucket="output-s3-bucket",
            destination_folder="output_data_folder",
            output_file_type="<csv/parquet/feather>",  # pick one
        ),
        # Optional: overrides the resources of the batch deployment
        execution_specifications=BatchJobExecutionSpecifications(
            executors=1,
            params={"key": "value"},
            instance="small",
            job_timeout=0,
            task_timeout=0,
            custom_iam_role_arn="arn:aws:iam::<account-id>:role/<role-name>",
        ),
        build_id="optional-batch-model-build-id",  # optional
    ),
)


Scheduler Timezone

The default timezone for the cron scheduler is UTC.
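Because the cron expression is evaluated in UTC, a schedule such as 0 0 * * * fires at midnight UTC, not at midnight in your local timezone. The sketch below uses only the Python standard library to show when such a daily run lands in another timezone. The helper next_daily_run_utc is hypothetical, written for illustration only; it is not part of the qwak SDK and does not reflect how the scheduler is implemented. The UTC+3 offset is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run_utc(now: datetime, hour: int = 0, minute: int = 0) -> datetime:
    """Return the next UTC time strictly after `now` matching a daily cron 'M H * * *'.

    Hypothetical helper for illustration only; not part of the qwak SDK.
    `now` should be timezone-aware.
    """
    now_utc = now.astimezone(timezone.utc)
    # Today's run time in UTC
    candidate = now_utc.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's run time has already passed, the next run is tomorrow
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


# "0 0 * * *" seen from a UTC+3 timezone (arbitrary example offset)
local_tz = timezone(timedelta(hours=3))
now = datetime(2023, 5, 22, 15, 30, tzinfo=timezone.utc)
run_utc = next_daily_run_utc(now)
print(run_utc.isoformat())                       # 2023-05-23T00:00:00+00:00
print(run_utc.astimezone(local_tz).isoformat())  # 2023-05-23T03:00:00+03:00
```

If you need the job to run at a specific local time, shift the hour field of the cron expression by your UTC offset (and keep daylight saving time in mind).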


Dynamic folder paths

When configuring the source and destination folders, you can define folder paths based on the runtime timestamp.

A dynamic path may include a timestamp template, which is replaced with the formatted timestamp each time the automation runs.


The timestamp format must follow Python strftime formatting and be wrapped in curly brackets, e.g. {%d-%m-%Y}.

Example path template

There are two input parameters which support the dynamic timestamp template:

  • destination_folder
  • source_folder

For example, defining source_folder="input_folder/{%d-%m-%Y}" formats the path using the timestamp of the run. An automation that runs on May 23, 2023 reads from:

source_folder="input_folder/23-05-2023"
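To preview which folders a scheduled run will use, you can render a template locally. The sketch below only approximates the substitution described above and is not the platform's actual implementation: render_folder_template is a hypothetical helper that replaces each {...} group with the result of strftime for a given run time.

```python
import re
from datetime import datetime, timezone

# Matches one {...} group without nested braces, e.g. {%d-%m-%Y}
_TEMPLATE_GROUP = re.compile(r"\{([^{}]*)\}")


def render_folder_template(template: str, run_time: datetime) -> str:
    """Replace each {<strftime format>} group in `template` with `run_time` formatted accordingly.

    Hypothetical helper approximating the documented behavior; not part of the qwak SDK.
    """
    return _TEMPLATE_GROUP.sub(lambda m: run_time.strftime(m.group(1)), template)


run_time = datetime(2023, 5, 23, tzinfo=timezone.utc)  # the scheduler runs in UTC
print(render_folder_template("input_folder/{%d-%m-%Y}", run_time))
# input_folder/23-05-2023
print(render_folder_template("output_data_folder/{%Y-%m-%d}", run_time))
# output_data_folder/2023-05-23
```

Paths without a template group are returned unchanged, so the same check works for static folders.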