Using PyTorch with CUDA

When working with PyTorch on GPU instances, make sure the PyTorch build you install is compatible with the CUDA version installed on Qwak instances. A matching build lets PyTorch detect and use the GPU resources on the instance.

Qwak GPU instances are currently provisioned with CUDA version 12.1. The instructions below show how to install the latest versions of Torch that are compatible with that CUDA version.

Installing Compatible PyTorch

To align PyTorch with the CUDA version on your instance, use the index URL https://download.pytorch.org/whl/cu121 when adding the PyTorch libraries to your dependencies configuration file, whether it's Conda, Pip (requirements.txt), or Poetry.

In Workspaces

Use this command in your workspace environment:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

In Model Builds

For requirements.txt, your file should look like this:

scipy
scikit-learn
pandas

--extra-index-url https://download.pytorch.org/whl/cu121
torch
torchvision
torchaudio

For Conda environments, here's an example configuration:

name: your-conda-environment
channels:
  - defaults
  - conda-forge
  - huggingface
dependencies:
  - python=3.9
  - pip:
    - --extra-index-url https://download.pytorch.org/whl/cu121
    - torch
    - torchvision
    - torchaudio
  - transformers
  - accelerate
  - scikit-learn
  - pandas

Please note that the conda.yaml above is only an example; not all of the listed dependencies are required.

Verifying the installation

After installation, confirm that PyTorch is utilizing the GPU. Add the following code snippet to your QwakModel. For training models, insert it at the start of the build() method. If loading a pre-trained model, place it in the initialize_model() method.

import torch

print("Torch version:", torch.__version__)

# Automatically use CUDA if available, else use CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"The PyTorch device used by the model is {device}\n")

Your Qwak model build logs should show cuda as the device, indicating that PyTorch is correctly set up to use the GPU.
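For a more detailed check, the snippet above can be extended along the lines of the sketch below. The function name cuda_report is hypothetical and not part of the Qwak SDK; it relies only on standard PyTorch attributes, and it returns early if torch cannot be found in the environment.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup (hypothetical helper)."""
    # Bail out early when torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        # Name of the first visible GPU.
        report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

On a GPU instance with a matching build, built_with_cuda would be expected to read 12.1 and cuda_available to be True. A value of None for built_with_cuda suggests that a CPU-only wheel was installed.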

Troubleshooting

If the logs show cpu instead of cuda, check the Code tab within your Build page. Ensure that the dependency file is correctly recognised by the model and that the requirements.lock file reflects the appropriate versions for the Torch libraries.
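The lock file check can also be scripted. The sketch below is illustrative only and rests on two assumptions: that requirements.lock uses pip-style name==version lines, and that wheels installed from the cu121 index carry a +cu121 suffix in their version. The helper find_non_cuda_pins and the sample lock contents are hypothetical.

```python
import re

TORCH_PACKAGES = {"torch", "torchvision", "torchaudio"}


def find_non_cuda_pins(lock_text, cuda_tag="cu121"):
    """Return Torch pins whose version lacks the expected +cuXXX suffix."""
    mismatches = []
    for raw_line in lock_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        match = re.match(r"^([A-Za-z0-9_.-]+)==(\S+)", line)
        if not match:
            continue
        name, version = match.group(1).lower(), match.group(2)
        if name in TORCH_PACKAGES and not version.endswith("+" + cuda_tag):
            mismatches.append(f"{name}=={version}")
    return mismatches


# Made-up lock contents for illustration (assumed pip-style format).
sample_lock = """\
pandas==2.1.4
torch==2.1.2+cu121
torchvision==0.16.2
torchaudio==2.1.2+cu121
"""
print(find_non_cuda_pins(sample_lock))
```

A pin reported here only means the wheel did not come from the cu121 index. That is worth investigating, but it is not by itself proof of a CPU-only build.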