GPU Support in Virtual Servers

c2-gpu Virtual Servers

Each c2-gpu instance type includes one or more slices of NVIDIA A100 GPUs. Each slice uses the “GRID A100D-20C” profile, which provides 2 compute pipelines and 20GB of video RAM from the card.

Warning

c2-gpu virtual servers are in Technical Preview only.

Minimum Requirements

For “c2-gpu”, the absolute minimum requirements are as follows:

  • A boot/OS disk of at least 30GB (when installing CUDA support)

  • NVIDIA vGPU driver from the v15.0 series. This is currently version 525.60.13.

The version of the driver loaded into your virtual server must be exactly this version, and not any other. From time to time we will update the required version, and we will inform you when the update must be applied to your virtual servers.
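Because the match must be exact, it can help to compare the loaded driver version against the required one before troubleshooting anything else. The following is a minimal sketch: the function name driver_version_ok is hypothetical, and both versions are passed in as plain strings so that the check itself does not depend on a GPU being present.

```shell
# Compare the driver version loaded in the guest with the required version.
# Both arguments are plain version strings (hypothetical helper, not part of
# any NVIDIA or Catalyst Cloud tooling).
driver_version_ok() {
    required="$1"
    loaded="$2"
    if [ "$loaded" = "$required" ]; then
        echo "OK: driver $loaded matches the required version"
    else
        echo "MISMATCH: loaded '$loaded', required '$required'"
        return 1
    fi
}
```

On a server with the driver installed, the loaded version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed as the second argument, with the required version from this page as the first.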

Note

Drivers provided by OS or distribution vendors should not be installed. Only the drivers specified here will function with the vGPUs available.

In addition, NVIDIA support only the following server operating systems for your vGPU virtual server while running in Catalyst Cloud:

  • Ubuntu 22.04, 20.04

The following server operating systems have been tested by Catalyst Cloud, but are not supported by NVIDIA:

  • Rocky Linux 8, 9

All other OS images are unsupported or untested.
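The support matrix above can be expressed as a small check against the ID and VERSION_ID fields of /etc/os-release. This is only a sketch: the function name classify_os and the labels it prints are illustrative, and it assumes Rocky Linux reports its ID as "rocky" with a VERSION_ID such as "8.7" or "9.1".

```shell
# Classify an OS according to the lists above.
# $1 is the os-release ID, $2 is the os-release VERSION_ID.
classify_os() {
    id="$1"
    version="$2"
    case "$id:$version" in
        ubuntu:22.04|ubuntu:20.04)
            echo "supported" ;;   # supported by NVIDIA
        rocky:8|rocky:8.*|rocky:9|rocky:9.*)
            echo "tested" ;;      # tested by Catalyst Cloud only
        *)
            echo "untested" ;;    # unsupported or untested
    esac
}
```

On a running server this could be called as . /etc/os-release && classify_os "$ID" "$VERSION_ID".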

Creating a c2-gpu virtual server

To create a GPU-enabled virtual server, create an instance using a flavor prefixed with c2-gpu.
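If you use the OpenStack command line client, the available GPU flavors can be found by filtering the flavor list on that prefix. The sketch below assumes the client is installed and configured for your project; the helper name gpu_flavors is hypothetical.

```shell
# Read flavor names on stdin, one per line, and print only those
# that start with the c2-gpu prefix. Prints nothing if none match.
gpu_flavors() {
    grep '^c2-gpu' || true
}
```

For example, openstack flavor list -f value -c Name | gpu_flavors lists the candidates, and one of the resulting names can then be passed to openstack server create with the --flavor option, alongside your usual --image, --network and --key-name values.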

Catalyst Cloud is not permitted to provide modified operating system images, so you will need to install the supporting drivers yourself to enable GPU support in GPU-enabled virtual servers, following the instructions below.

To help streamline GPU server builds, we’ve provided examples of using Packer to build custom images that include GPU drivers and software. This process is recommended for bulk GPU compute deployments.

Ubuntu

Once you have created an Ubuntu virtual server using a version supported by the NVIDIA drivers, you will need to perform the following steps.

First, ensure all packages are up to date on your server and it is running the latest kernel (which will require a reboot):

sudo apt update
sudo apt dist-upgrade -y
sudo reboot

Then download and install the GRID driver package.

sudo apt install -y dkms
curl -O https://object-storage.nz-por-1.catalystcloud.io/v1/AUTH_483553c6e156487eaeefd63a5669151d/gpu-guest-drivers/nvidia/grid/15.0/linux/nvidia-linux-grid-525_525.60.13_amd64.deb
sudo dpkg -i nvidia-linux-grid-525_525.60.13_amd64.deb

Note

If you get a 404 response to this download, contact Catalyst Cloud support as the driver versions may have been updated making this documentation outdated.

Next, you will need to install the client license for vGPU support. Download the license token and save it in the /etc/nvidia/ClientConfigToken directory on your virtual server. The directory is only writable by root, so the download is run with sudo:

(cd /etc/nvidia/ClientConfigToken && sudo curl -O https://object-storage.nz-por-1.catalystcloud.io/v1/AUTH_483553c6e156487eaeefd63a5669151d/gpu-guest-drivers/nvidia/grid/licenses/client_configuration_token_12-29-2022-15-20-23.tok)

Edit the GRID driver configuration file /etc/nvidia/gridd.conf and ensure that FeatureType is set to 1. Then restart the nvidia-gridd service. The following commands apply the setting and restart the service:

sudo sed -i -e '/^\(FeatureType=\).*/{s//\11/;:a;n;ba;q}' -e '$aFeatureType=1' /etc/nvidia/gridd.conf
sudo systemctl restart nvidia-gridd
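The sed expression above is dense: it rewrites an existing FeatureType line in place, and appends FeatureType=1 only if no such line exists. If you would like to see its effect before touching the live configuration, the following sketch wraps the same expression so it can be run on a scratch copy. It assumes GNU sed, and the function names are hypothetical.

```shell
# Apply the same sed expression as above to the file given in $1.
# Run it against a scratch copy first; no sudo is needed for that.
set_feature_type() {
    sed -i -e '/^\(FeatureType=\).*/{s//\11/;:a;n;ba;q}' -e '$aFeatureType=1' "$1"
}

# Print every active (uncommented) FeatureType line in the file given in $1.
show_feature_type() {
    grep '^FeatureType=' "$1"
}
```

For example, copy /etc/nvidia/gridd.conf to a temporary file, run set_feature_type on the copy, and then confirm that show_feature_type prints exactly one line, FeatureType=1.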

After the service has been restarted, check the license status of the vGPU:

nvidia-smi -q | grep 'License Status'

This should return a line stating it is “Licensed” with an expiry in the future.
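For unattended builds, the same check can be turned into a pass or fail test. The sketch below reads the output of nvidia-smi -q on standard input; it assumes the status line has the form "License Status : Licensed (Expiry: ...)", which may differ between driver releases, and the function name is hypothetical.

```shell
# Read `nvidia-smi -q` output on stdin.
# Succeeds only if a License Status line reports "Licensed";
# an "Unlicensed" status does not match because of the leading ": L".
vgpu_is_licensed() {
    grep 'License Status' | grep -q ': Licensed'
}
```

A typical use would be nvidia-smi -q | vgpu_is_licensed && echo "vGPU is licensed". This only tests the status word; the expiry date still needs to be read by eye.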

(Optional) Install the CUDA toolkit, if CUDA support is needed:

curl -O https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run
sudo sh cuda_12.0.0_525.60.13_linux.run --silent --toolkit

This will run without any visible output for a while, before returning to a command prompt.
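Since the installer is silent, you may want to confirm that the toolkit actually landed. One way, sketched below, is to extract the release number from the output of nvcc --version. The helper name cuda_release is hypothetical, and the sketch assumes the output contains a line of the form "Cuda compilation tools, release 12.0, ...".

```shell
# Read `nvcc --version` output on stdin and print the CUDA release
# number (major.minor) found after the word "release".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}
```

With the default install location this would be run as /usr/local/cuda/bin/nvcc --version | cuda_release, which should print the release matching the installer you downloaded.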

Note

We do not recommend using Debian or Ubuntu packages to install the CUDA toolkit. Those packages conflict with the required driver versions and will break your vGPU support.

To complete the CUDA toolkit installation, ensure that the CUDA libraries are available for applications to link and load:

sudo tee /etc/ld.so.conf.d/cuda.conf <<< /usr/local/cuda/lib64
sudo ldconfig
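To check that the dynamic linker cache now includes the CUDA libraries, you can look for the CUDA runtime library in the output of ldconfig -p. This is a sketch only; the function name is hypothetical, and it assumes the runtime library is named libcudart.so with a version suffix.

```shell
# Read `ldconfig -p` output on stdin.
# Succeeds if the CUDA runtime library appears in the linker cache.
cuda_runtime_visible() {
    grep -q 'libcudart\.so'
}
```

For example: ldconfig -p | cuda_runtime_visible && echo "CUDA runtime found". If nothing is found, check the contents of /etc/ld.so.conf.d/cuda.conf and run sudo ldconfig again.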

RHEL-derived Distributions

Linux distributions derived from RHEL, such as Rocky Linux, need the following steps to install the drivers.

Note

NVIDIA do not support RHEL-derived Linux distributions on Catalyst Cloud.

First, ensure all packages are up to date on your server and it is running the latest kernel:

sudo dnf update -y && sudo reboot

Then install kernel source and related development tools:

sudo dnf install -y kernel-devel make
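The driver build needs kernel headers that match the kernel you are actually running, which is why the update and reboot come first. A quick way to confirm the match is to query for the kernel-devel package carrying the same release string as uname -r. The helper below is a hypothetical convenience that only builds the package name.

```shell
# Build the name of the kernel-devel package that matches a kernel release.
# $1 is a kernel release string as printed by `uname -r`.
kernel_devel_package() {
    echo "kernel-devel-$1"
}
```

Running rpm -q "$(kernel_devel_package "$(uname -r)")" should print the installed package name. If it reports that the package is not installed, the running kernel and the installed headers differ, and another sudo dnf update -y followed by a reboot is usually needed.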

(Optional) Next, enable EPEL repositories and install DKMS support. This will automatically rebuild the drivers on kernel upgrades, rather than forcing you to re-install the GRID drivers every time the kernel is updated.

sudo dnf install -y epel-release
sudo dnf install -y dkms

Then install the GRID driver package:

curl -O https://object-storage.nz-por-1.catalystcloud.io/v1/AUTH_483553c6e156487eaeefd63a5669151d/gpu-guest-drivers/nvidia/grid/15.0/linux/NVIDIA-Linux-x86_64-525.60.13-grid.run
sudo sh NVIDIA-Linux-x86_64-525.60.13-grid.run -s -Z

Note

If you get a 404 response to this download, contact Catalyst Cloud support as the driver versions may have been updated making this documentation outdated.

This may produce errors or warnings related to missing X libraries and Vulkan ICD loader. These warnings can be safely ignored.

It may also produce an error about failing to register with DKMS, if you installed DKMS support above. This can be safely ignored, because the modules will be rebuilt automatically despite the error message.
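Given that the installer output can look alarming, a simple way to confirm the driver was built and loaded is to look for the nvidia kernel module in /proc/modules. The sketch below assumes the module is named nvidia and that the installer has already loaded it; the function name is hypothetical.

```shell
# Read /proc/modules (or text in the same format) on stdin.
# Succeeds if a module named exactly "nvidia" is loaded; the trailing
# space in the pattern avoids matching names such as nvidia_drm.
nvidia_module_loaded() {
    grep -q '^nvidia '
}
```

For example: nvidia_module_loaded < /proc/modules && echo "nvidia module is loaded". If the module is not listed, sudo modprobe nvidia may load it, and any error it prints is a better starting point for troubleshooting than the installer warnings.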

Next, you will need to install the client license for vGPU support. Download the license token and save it in the /etc/nvidia/ClientConfigToken directory on your virtual server. The directory is only writable by root, so the download is run with sudo:

(cd /etc/nvidia/ClientConfigToken && sudo curl -O https://object-storage.nz-por-1.catalystcloud.io/v1/AUTH_483553c6e156487eaeefd63a5669151d/gpu-guest-drivers/nvidia/grid/licenses/client_configuration_token_12-29-2022-15-20-23.tok)

Edit the GRID driver configuration file /etc/nvidia/gridd.conf and ensure that FeatureType is set to 1. Then restart the nvidia-gridd service. The following commands apply the setting and restart the service:

sudo sed -i -e '/^\(FeatureType=\).*/{s//\11/;:a;n;ba;q}' -e '$aFeatureType=1' /etc/nvidia/gridd.conf
sudo systemctl restart nvidia-gridd

After the service has been restarted, check the license status of the vGPU:

nvidia-smi -q | grep 'License Status'

This should return a line stating it is “Licensed” with an expiry date in the future.

(Optional) Install the CUDA toolkit, if CUDA support is needed:

curl -O https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda_12.0.0_525.60.13_linux.run
sudo sh cuda_12.0.0_525.60.13_linux.run --silent --toolkit

This will run without any visible output for a while, before returning to a command prompt.

Note

We do not recommend using distribution-provided packages to install the CUDA toolkit. Those packages conflict with the required driver versions and will break your vGPU support.

To complete the CUDA toolkit installation, ensure that the CUDA libraries are available for applications to link and load:

sudo tee /etc/ld.so.conf.d/cuda.conf <<< /usr/local/cuda/lib64
sudo ldconfig

Docker Support

NVIDIA provide documentation on supporting vGPU access from Docker containers here:

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html