Automating Build & Deploy

Overview

Automating model build and deployment helps keep models accurate in production.

This action streamlines the build and deployment workflows. It keeps your model accurate by automatically retraining and redeploying it based on a cron expression, a defined time interval, or metric-based triggers.

You can also define deployment conditions to verify that the new build passes acceptance criteria before it replaces the currently deployed model.


Automation Example

❗️

Before Setting Up Automation

Prior to configuring automation, it's essential to have your model's code stored in a Git repository. It's recommended to confirm that all necessary Git repository access is correctly configured via CLI model builds. Ensure the Qwak model can successfully build from Git before proceeding with automation.

For additional details on building models from Git, refer to our Build Configurations page.

The automation fetches the model's code during the training process. When using a private repository, you must generate a Git access token and store it securely in the Secret Manager.

from qwak.automations import Automation, ScheduledTrigger, QwakBuildDeploy,\
    BuildSpecifications, BuildMetric, ThresholdDirection, DeploymentSpecifications

test_automation = Automation(
    name="retrain_my_model",
    model_id="my-model-id",
    trigger=ScheduledTrigger(cron="0 0 * * 0"),
    action=QwakBuildDeploy(
        build_spec=BuildSpecifications(git_uri="https://github.com/org_id/repository_name.git#dir_1/dir_2",
                                       git_access_token_secret="token_secret_name",
                                       git_branch="main",
                                       main_dir="main",
                                       tags=["prod"],
                                       env_vars=["key1=val1", "key2=val2", "key3=val3"]),
        deployment_condition=BuildMetric(metric_name="f1_score",
                                         direction=ThresholdDirection.ABOVE,
                                         threshold="0.65"),
        deployment_spec=DeploymentSpecifications(number_of_pods=1,
                                                 cpu_fraction=2.0,
                                                 memory="2Gi",
                                                 variation_name="B")
    )
)

📘

Scheduler Timezone

The default timezone for the cron scheduler is UTC. For example, the cron expression 0 0 * * 0 in the snippet above runs the automation every Sunday at midnight UTC.


Build & deploy configuration

The QwakBuildDeploy action has three configuration parameters:

  1. build_spec defines the location of the model code that we will build in the Qwak platform.
  2. deployment_condition defines the metrics used to determine when to deploy the model after the training.
  3. deployment_spec specifies the runtime environment parameters for model deployment.

❗️

Metrics used to trigger build or deploy automations must be logged during the model build phase.


BuildSpecifications

To configure the automation build specification, we need a link to the git repository.

Note that the link consists of two parts, delimited by a hash sign (#):

  • The repository URL
  • The path within the repository

For example, when we use this link: https://github.com/org_id/repository_name.git#dir_1/dir_2

The platform will clone the https://github.com/org_id/repository_name.git repository and change the working directory to dir_1/dir_2 before starting the build.

In this example, dir_1/dir_2 should be the directory containing the main and tests folders.

Using private repositories

When using private repositories, we must also specify the access token or private key.

Because the Qwak platform doesn't allow plain-text tokens, we must store access tokens in the Qwak Secret Manager and specify only the secret name.

When not using the default folder structure, in which main is the model's folder, we must also specify the Git branch and the directory containing the ML model.

As a best practice, store access tokens only in the Secret Manager and reference them by secret name, rather than committing credentials to the repository.

Custom resources

In the build specification, you may control the number of CPUs and the amount of memory, or use GPU instances (see Instance Sizes).

Defining CPU resources:

resources=CpuResources(cpu_fraction=2, memory="2Gi")

Defining GPU resources:

resources=GpuResources(gpu_type="NVIDIA_K80", gpu_amount=1)

Alternatively, you can specify the instance type as opposed to fractions of resources. For example:

resources=ClientResources(instance='gpu.a10.8xl') #GPU

#OR

resources=ClientResources(instance='medium')  #CPU


It is also possible to specify the IAM role used in production (assumed_iam_role) or a custom Docker image (base_image), as shown in the sketch below.
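
For illustration, here is a sketch of a build specification that combines these options. The GpuResources import path, the IAM role ARN, and the image name are assumptions made for the example, not values prescribed by the platform.

# GpuResources is assumed to be importable alongside the other automation classes
from qwak.automations import BuildSpecifications, GpuResources

build_spec = BuildSpecifications(
    git_uri="https://github.com/org_id/repository_name.git#dir_1/dir_2",
    git_access_token_secret="token_secret_name",
    git_branch="main",
    main_dir="main",
    resources=GpuResources(gpu_type="NVIDIA_K80", gpu_amount=1),  # GPU build environment
    assumed_iam_role="arn:aws:iam::123456789012:role/my-build-role",  # placeholder role ARN
    base_image="my-registry/my-custom-image:latest",  # placeholder custom docker image
)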

Environment variables

Additionally, we can specify the environment variables to configure in the build environment.

The environment variables should be specified in the env_vars field (a list), with each entry formatted as key=value.

The model's code must log the metric that describes the model's performance; we will use this metric in the deployment condition. If you don't know how to do that, see our Logging and Monitoring Guide.
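
As a rough sketch of what this could look like inside the model's build() method (the QwakModel import path and the log_metric signature are assumptions here; refer to the Logging and Monitoring Guide for the exact API):

import qwak
from qwak.model.base import QwakModel  # import path assumed

class MyModel(QwakModel):
    def build(self):
        # ... train the model and compute its F1 score ...
        f1 = 0.7  # placeholder value computed during training
        # Log the metric under the same name the automation's deployment_condition checks
        qwak.log_metric({"f1_score": f1})

    def predict(self, df):
        # ... inference code ...
        return df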

Disable Push Image

It is possible to disable the push image phase in cases where you don't want the final build image saved to the Docker repository. You can do that by adding push_image=False to the BuildSpecifications.
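
For example, a minimal sketch reusing the repository URL from the example above:

build_spec = BuildSpecifications(
    git_uri="https://github.com/org_id/repository_name.git#dir_1/dir_2",
    push_image=False,  # skip pushing the final build image to the docker repository
)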


BuildMetric

During the build process, it is common to log metrics such as accuracy, F1 score, or loss. When executing the automation, these logged values may be compared against a specified threshold.

For each metric, it is possible to define whether the value should be above or below the threshold. Once this condition is met, the Qwak platform will proceed to deploy the model.

The BuildMetric object has three parameters:

  1. metric_name: The metric name we logged during the build phase
  2. direction: Whether the value should be above or below the threshold; the valid values are ThresholdDirection.ABOVE and ThresholdDirection.BELOW
  3. threshold: The threshold used for comparison

❗️

The threshold must always be a string, where threshold="0.65" is a valid threshold and threshold=0.65 is invalid!

Dynamic threshold

To use a dynamic threshold, we can use a SQL expression as the threshold value.

In this case, the Qwak platform will run the SQL query in Qwak Model Analytics and compare the model's metric with the threshold produced by the SQL query.

The query must return a single row containing only one column!
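
For instance, a sketch of a deployment condition with a SQL-based threshold; the table and column names in the query are hypothetical and depend on your analytics schema:

deployment_condition = BuildMetric(
    metric_name="f1_score",
    direction=ThresholdDirection.ABOVE,
    # Hypothetical query; it must return a single row containing a single column
    threshold="SELECT AVG(f1_score) FROM builds WHERE tag = 'prod'"
)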


DeploymentSpecifications

After the model is built and its performance has been compared with the threshold, and the model is deemed ready to deploy, the platform uses the deployment specification to configure the model's runtime environment.

We may specify:

  • number_of_http_server_workers: The number of workers running the HTTP server.
  • http_request_timeout_ms: The request timeout, in milliseconds.
  • daemon_mode: Whether the gunicorn process should be daemonized, making the workers run in the background.
  • custom_iam_role_arn: The IAM role used in production.
  • max_batch_size: The maximum batch size of records.
  • deployment_process_timeout_limit: The timeout for the deployment (in seconds).
  • number_of_pods: The number of instances to be deployed.
  • cpu_fraction: The number of CPU cores requested from Kubernetes.
  • memory: The amount of RAM.
  • variation_name: The variation name when running an A/B test.
  • auto_scale_config: The autoscaling configuration for Kubernetes.
  • min_replica_count: The minimum number of replicas to scale the resource down to.
  • max_replica_count: The maximum number of replicas of the target resource.
  • polling_interval: The interval at which each trigger is checked. By default, every 30 seconds.
  • cool_down_period: The period to wait after the last trigger reported active before scaling the resource back to 0. By default, 5 minutes (300 seconds).
  • prometheus_trigger:
      • metric_type: The type of the metric (cpu/gpu/memory/latency).
      • aggregation_type: The type of aggregation (min/max/avg/sum).
      • time_period: The time period the query runs over.
      • threshold: The value at which to start scaling.
  • environments: A list of environment names to deploy to.
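
For instance, a sketch of a deployment specification that sets a few of the HTTP server parameters above; the values shown are placeholders, not recommendations:

deployment_spec = DeploymentSpecifications(number_of_pods=2,
                                           number_of_http_server_workers=4,
                                           http_request_timeout_ms=5000,
                                           max_batch_size=100,
                                           cpu_fraction=2.0,
                                           memory="2Gi",
                                           variation_name="B")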

Defining auto scaling

When we want to define an auto-scaling policy for our deployment, we have to use the following pattern:

# Import path assumed to match the other automation classes in qwak.automations
from qwak.automations import AutoScalingConfig, AutoScalingPrometheusTrigger, AutoScaleQuerySpec

auto_scale_config = AutoScalingConfig(
    min_replica_count=1,     # never scale below one replica
    max_replica_count=10,    # never scale above ten replicas
    polling_interval=30,     # check the triggers every 30 seconds
    cool_down_period=300,    # wait 300 seconds after the last active trigger before scaling back down
    triggers=[
        AutoScalingPrometheusTrigger(
            query_spec=AutoScaleQuerySpec(aggregation_type="max",
                                          metric_type="latency",
                                          time_period=4),
            threshold=60
        )
    ]
)
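
The resulting object is then passed to the deployment specification through its auto_scale_config parameter, for example:

deployment_spec = DeploymentSpecifications(number_of_pods=1,
                                           cpu_fraction=2.0,
                                           memory="2Gi",
                                           auto_scale_config=auto_scale_config)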