nvidia container runtime 설정
nvidia container runtime 을 설정하기 위해서는 꼭 docker-ce
로 docker
가 설치가 되어 있어야 됩니다.
(일반 RHEL/CentOS에서 제공되는 docker
package로는 설치 불가)
docker-ce 설치
- yum-utils 설치
$ yum -y install yum-utils
- docker-ce Repository 연결
$ yum-config-manager \ > --add-repo \ > https://download.docker.com/linux/centos/docker-ce.repo Loaded plugins: fastestmirror adding repo from: https://download.docker.com/linux/centos/docker-ce.repo grabbing file https://download.docker.com/linux/centos/docker-ce.repo to /etc/yum.repos.d/docker-ce.repo repo saved to /etc/yum.repos.d/docker-ce.repo $
- docker-ce 설치
$ yum install docker-ce
- docker 서비스 시작
$ systemctl enable docker Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service. $ systemctl start docker
docker
기본 runtime 확인# docker info ... 중략 Runtimes: runc Default Runtime: runc Init Binary: docker-init containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f init version: fec3683 ... 중략
위와 같이 Default 로 지정된 runtime 은
runc
입니다.
nvidia container runtime 설치
- nvidia container runtime repository 연결
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) $ curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \ sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
- nvidia container runtime 설치
$ yum install nvidia-container-runtime
- Daemon configuration file 수정 및 systemd 수정
- Daemon configuration file 수정
$ vi /etc/docker/daemon.json { "default-runtime": "nvidia", "runtimes": { "nvidia": { "path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": [] } } }
- systemd 수정
$ mkdir -p /etc/systemd/system/docker.service.d $ tee /etc/systemd/system/docker.service.d/override.conf <<EOF [Service] ExecStart= ExecStart=/usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime EOF $ systemctl daemon-reload $ systemctl restart docker
- Daemon configuration file 수정
- nvidia container runtime 설치 확인
$ docker info ... 중략 Runtimes: nvidia runc Default Runtime: nvidia Init Binary: docker-init containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f init version: fec3683 ... 중략
위와 같이 사용가능한
docker
runtime 은nvidia
,runc
이고 Default runtime 은nvidia
입니다.
앞으로docker run
명령을 통해 생성되는 container 는nvidia-container-runtime
을 이용하여 생성될 것입니다.
마치며
상기 내용은 NVIDIA DGX Station 시스템에 Red Hat 운영체제 설치하고 CUDA 설정을 하면서 경험한 내용입니다.
[root@localhost sosreport-test-2019-xx-xx-xxxxxx]# cat proc/driver/nvidia/gpus/*/information
Model: Tesla V100-DGXS-16GB
IRQ: 144
GPU UUID: GPU-11111111-1111-1111-1111-111111111111
Video BIOS: 88.00.24.00.01
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:07:00.0
Device Minor: 0
Blacklisted: No
Model: Tesla V100-DGXS-16GB
IRQ: 145
GPU UUID: GPU-22222222-2222-2222-2222-222222222222
Video BIOS: 88.00.24.00.01
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:08:00.0
Device Minor: 1
Blacklisted: No
Model: Tesla V100-DGXS-16GB
IRQ: 146
GPU UUID: GPU-33333333-3333-3333-3333-333333333333
Video BIOS: 88.00.24.00.01
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:0e:00.0
Device Minor: 2
Blacklisted: No
Model: Tesla V100-DGXS-16GB
IRQ: 147
GPU UUID: GPU-44444444-4444-4444-4444-444444444444
Video BIOS: 88.00.24.00.01
Bus Type: PCIe
DMA Size: 47 bits
DMA Mask: 0x7fffffffffff
Bus Location: 0000:0f:00.0
Device Minor: 3
Blacklisted: No
Tesla V100
이 무려 4장이나 설치된 어마어마한 장비였습니다……. :(
참고 문서
- NVIDIA DGX Station : https://www.nvidia.com/ko-kr/data-center/dgx-station/
- NVIDIA Repository : https://nvidia.github.io/nvidia-container-runtime/
- Github nvidia-container-runtime : https://github.com/NVIDIA/nvidia-container-runtime
- Install
docker-ce
: https://docs.docker.com/install/linux/docker-ce/centos/