Install nvidia-docker on Ubuntu 20.04
You should first check the previous blog post, Install Docker on Ubuntu 20.04, to get Docker installed on Ubuntu 20.04, and then you can follow these steps. But before that, you need to understand a few concepts about these NVIDIA software packages:
- nvidia driver vs. CUDA Toolkit
The NVIDIA driver is the software stack that allows the computer to use NVIDIA GPUs, while the CUDA Toolkit provides a set of development tools for creating applications that take advantage of NVIDIA GPUs. (This sentence is from my copilot's auto-completion :) and it looks fine.)
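If you are not sure which of these you already have on the host, a quick check (assuming both were installed in the usual way and are on your PATH) is:
nvidia-smi      # prints the installed driver version (and the highest CUDA version it supports)
nvcc --version  # prints the installed CUDA Toolkit version, if the toolkit is present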
- NVIDIA CUDA Toolkit vs. NVIDIA Container Toolkit
These two concepts are related to the use of NVIDIA GPUs in Docker containers, but they have different purposes: the CUDA Toolkit is for running CUDA applications directly on the GPU hardware, while the NVIDIA Container Toolkit provides a way to run Docker containers with access to NVIDIA GPUs. (Also provided by copilot.)
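One way to see which of the two is installed on the host is to query the package manager; the package names below are the common ones from NVIDIA's Ubuntu repositories and may differ on your system:
dpkg -l | grep cuda-toolkit               # host-side CUDA Toolkit packages, if any
dpkg -l | grep nvidia-container-toolkit   # container toolkit packages, if any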
- nvidia-docker vs. NVIDIA Container Toolkit
nvidia-docker is a relatively old project. NVIDIA now officially recommends using the NVIDIA Container Toolkit instead of nvidia-docker. nvidia-docker is still supported, but under the name of the nvidia-docker2 project. Anyway, the NVIDIA Container Toolkit is what you need if you want to use NVIDIA GPUs in Docker containers. (Copilot again.)
Here are the steps to install and configure the NVIDIA Container Toolkit:
- Step 1
Add NVIDIA's package repository and its signing key:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
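To confirm that the repository entry was written correctly, you can print the file the command just created:
cat /etc/apt/sources.list.d/nvidia-container-toolkit.list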
- Step 2
sudo apt-get update
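If you want to check that the package is now visible from the new repository before installing, apt's standard policy query should show a candidate version:
apt-cache policy nvidia-container-toolkit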
- Step 3
sudo apt-get install -y nvidia-container-toolkit
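After the installation, you can check the installed CLI version; as far as I know the nvidia-ctk binary supports a --version flag:
nvidia-ctk --version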
- Step 4
Configure the container runtime by using the nvidia-ctk command:
sudo nvidia-ctk runtime configure --runtime=docker
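This command registers the NVIDIA runtime in Docker's daemon configuration. After running it, /etc/docker/daemon.json should contain an entry roughly like the following (the exact content may differ depending on your existing configuration):
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}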
Then, restart the Docker daemon:
sudo systemctl restart docker
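To double-check that the runtime was registered, you can list the runtimes Docker knows about; the output should include nvidia alongside the default runc runtime:
docker info | grep -i runtimes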
To verify the installation, you can run nvidia-smi inside a container like this:
docker run --gpus all nvidia/cuda:12.4.1-devel-ubuntu20.04 nvidia-smi
You should see the GPU information printed out if everything is working correctly.
==========
== CUDA ==
==========
CUDA Version 12.4.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Fri Jul 5 08:33:33 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:18:00.0 Off | N/A |
| 0% 27C P8 6W / 370W | 16MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:3B:00.0 Off | N/A |
| 0% 26C P8 8W / 370W | 16MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 3090 Off | 00000000:86:00.0 Off | N/A |
| 0% 28C P8 6W / 370W | 16MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 3090 Off | 00000000:AF:00.0 Off | N/A |
| 0% 28C P8 14W / 370W | 16MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Now you can run a container with all GPUs like this:
docker run --gpus all --runtime=nvidia -itd f82e92565c4a bash
Here are some explanations of these options:
- --runtime is used to let the Docker container use the NVIDIA-provided runtime instead of the default Docker engine runtime.
- --gpus is used to specify the GPUs that Docker can use for this container. The value "all" means all available GPUs. (For selecting specific GPUs, see the example after this list.)
- -itd is the conventional set of options for running a Docker container in interactive mode with a pseudo-terminal, detached. That way you can use the key combination Ctrl+P, Ctrl+Q to detach from the container without stopping it, and docker attach <container_id> to reattach to it.
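If you only want to expose specific GPUs instead of all of them, the --gpus option also accepts a device list. For example (the quoting matters in most shells):
docker run --gpus '"device=0,1"' --runtime=nvidia -itd f82e92565c4a bash
This would expose only GPU 0 and GPU 1 to the container.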
Also, if you need to connect from the container to a proxy server provided by the host machine, you can add an extra option --add-host=host.docker.internal:host-gateway to the docker run command. Then you can use host.docker.internal as the hostname in your container to access host machine services (instead of localhost or 127.0.0.1).
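As a concrete sketch, assuming the host runs an HTTP proxy on port 7890 (the port here is just an example), you could pass the proxy settings into the container as environment variables:
docker run --gpus all --runtime=nvidia \
  --add-host=host.docker.internal:host-gateway \
  -e http_proxy=http://host.docker.internal:7890 \
  -e https_proxy=http://host.docker.internal:7890 \
  -itd f82e92565c4a bash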
For Docker on Linux there is also an alternative option for the docker run command: --network="host". After adding this option, 127.0.0.1 within the container will point to your Docker host. (Note: after testing, --network="host" works better in my case.)
So the final command would look like this:
docker run --gpus all --runtime=nvidia --network="host" -itd f82e92565c4a bash
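Once the container is up, you can confirm it still sees the GPUs without attaching to it:
docker exec <container_id> nvidia-smi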