1. Problem description

When PyTorch calls into CUDA, the following warning is reported:

torch/cuda/__init__.py:118: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0

At this point, the following all work normally:

nvidia-smi

nvcc -V

torch.cuda.device_count()

However, running

torch.cuda.is_available()

triggers the warning:

torch/cuda/__init__.py:118: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
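
To reproduce the symptom in a script rather than in an interactive session, the warning can be captured with Python's standard `warnings` module. The following is a minimal sketch: `call_and_collect_warnings` is a hypothetical helper name introduced here for illustration, and it assumes the CUDA message reaches Python as an ordinary warning, as the `UserWarning` prefix in the output above suggests.

```python
import warnings


def call_and_collect_warnings(fn):
    """Call fn() and return (result, list of warning messages raised during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones Python would otherwise show only once.
        warnings.simplefilter("always")
        result = fn()
    return result, [str(w.message) for w in caught]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; nothing to check.")
    else:
        available, messages = call_and_collect_warnings(torch.cuda.is_available)
        print("torch.cuda.is_available():", available)
        for message in messages:
            print("warning:", message)
```

If the result is `False` and one of the collected messages contains "CUDA unknown error", the machine is probably in the state described here.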

2. Solution

Run

sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm

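From searching around, some users speculate that the NVIDIA kernel module is fragile and can randomly end up in a broken state, so the workaround is to remove the nvidia_uvm module and insert it again.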
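
Before reloading, it can help to check whether nvidia_uvm is loaded and whether anything is still using it, since `rmmod` normally refuses to remove a module that is in use. The sketch below reads the standard Linux file `/proc/modules`. The helper name `find_module` is hypothetical, and the code assumes the usual field order of that file (name, size, use count, ...).

```python
from pathlib import Path


def find_module(name, proc_modules_text):
    """Return (size_bytes, use_count) for a loaded kernel module, or None if absent.

    use_count is None when the kernel does not report it (the field is "-").
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == name:
            use_count = int(fields[2]) if fields[2].isdigit() else None
            return int(fields[1]), use_count
    return None


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if not modules_file.exists():
        print("/proc/modules not found; this check only applies to Linux.")
    else:
        info = find_module("nvidia_uvm", modules_file.read_text())
        if info is None:
            print("nvidia_uvm is not loaded; 'sudo modprobe nvidia_uvm' may be enough.")
        else:
            size, use_count = info
            print(f"nvidia_uvm is loaded (size={size}, use count={use_count}).")
```

A nonzero use count suggests that a process still holds the module, so that process (for example a running Python session using the GPU) would likely need to exit before `sudo rmmod nvidia_uvm` can succeed.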

After running these commands:

>>> import torch
>>> torch.cuda.is_available()
True
