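My 104-100 had been working fine on PVE 9, but I ran an upgrade for no particular reason and the GPU immediately stopped working. Being a beginner, it took me a long time to find out that the kernel upgrade was the cause.
Run: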
nvidia-smi
Run:
uname -r
Run:
dkms status
The running kernel is 6.14.11-3-pve, but the NVIDIA module was built for kernel 6.14.11-2-pve. The two do not match.
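The comparison above can be scripted. The following is a minimal sketch, not part of the original steps: the helper name dkms_built_for is hypothetical, and it assumes that dkms status prints lines of the form "nvidia/580.65.05, 6.14.11-2-pve, x86_64: installed" (the exact format can differ between dkms versions).

```shell
#!/bin/sh
# dkms_built_for KERNEL (hypothetical helper)
# Reads `dkms status`-style text on stdin and succeeds only if an nvidia
# module is reported as installed for KERNEL.
# Assumed line format: "nvidia/<version>, <kernel>, <arch>: installed"
dkms_built_for() {
    grep '^nvidia' | grep -F ", $1," | grep -q "installed"
}

# Only query the real system when dkms is actually present.
if command -v dkms >/dev/null 2>&1; then
    running="$(uname -r)"
    if dkms status | dkms_built_for "$running"; then
        echo "nvidia DKMS module is built for the running kernel $running"
    else
        echo "no nvidia DKMS module for $running; the driver needs a rebuild or reinstall"
    fi
fi
```

With the kernels from this post, the check would fail for 6.14.11-3-pve because the module was only built for 6.14.11-2-pve.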
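To upgrade or downgrade the vGPU driver, uninstall the existing driver first, then install the new one.
NVIDIA 580.65.05 download
Baidu Netdisk: https://pan.baidu.com/s/1rqgxvmFku6rG2Ppjq4g-cg?pwd=knrj
# Uninstall the GPU driver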
./NVIDIA-Linux-x86_64-580.65.05-vgpu-kvm-custom.run --uninstall
# Remove the remaining NVIDIA packages
apt remove --purge 'nvidia-*'
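Install the NVIDIA vGPU_HOST driver
Install the vGPU host driver on the PVE host. The driver has already been patched; upload the host driver NVIDIA-Linux-x86_64-580.65.05-vgpu-kvm-custom.run to the /tmp directory yourself.
# Install the required dependencies and kernel headers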
apt install build-essential dkms mdevctl pve-headers-$(uname -r)
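# Go to /tmp, where the driver was uploaded
cd /tmp
# Make the installer executable
chmod +x NVIDIA-Linux-x86_64-580.65.05-vgpu-kvm-custom.run
# Install the driver (just press Enter at every prompt until it finishes)
./NVIDIA-Linux-x86_64-580.65.05-vgpu-kvm-custom.run --dkms -m=kernel
# Reboot once the installation is done
reboot
After the reboot, check with nvidia-smi:
root@pve9:~# nvidia-smi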
Wed Aug 27 23:54:38 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.05 Driver Version: 580.65.05 CUDA Version: N/A |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA P106-100 On | 00000000:08:00.0 Off | N/A |
| 23% 46C P8 7W / 120W | 1041MiB / 6144MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1574 C+G vgpu 1012MiB |
+-----------------------------------------------------------------------------------------+
root@pve9:~# nvidia-smi vgpu
Wed Aug 27 23:54:44 2025
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.05 Driver Version: 580.65.05 |
|---------------------------------+------------------------------+------------+
| GPU Name | Bus-Id | GPU-Util |
| vGPU ID Name | VM ID VM Name | vGPU-Util |
|=================================+==============================+============|
| 0 NVIDIA P106-100 | 00000000:08:00.0 | 0% |
| 3251634191 GRID T4-1Q | dd02... Windows10-22H2,d... | 0% |
+---------------------------------+------------------------------+------------+
Also check with mdevctl types:
root@pve9:~# mdevctl types
0000:08:00.0
nvidia-222
Available instances: 15
Device API: vfio-pci
Name: GRID T4-1B
Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
nvidia-223
Available instances: 0
Device API: vfio-pci
Name: GRID T4-2B
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
nvidia-224
Available instances: 0
Device API: vfio-pci
Name: GRID T4-2B4
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
nvidia-225
Available instances: 15
Device API: vfio-pci
Name: GRID T4-1A
Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=16
nvidia-226
Available instances: 0
Device API: vfio-pci
Name: GRID T4-2A
Description: num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=8
nvidia-227
Available instances: 0
Device API: vfio-pci
Name: GRID T4-4A
Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=4
nvidia-228
Available instances: 0
Device API: vfio-pci
Name: GRID T4-8A
Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=2
nvidia-229
Available instances: 0
Device API: vfio-pci
Name: GRID T4-16A
Description: num_heads=1, frl_config=60, framebuffer=16384M, max_resolution=1280x1024, max_instance=1
nvidia-230
Available instances: 15
Device API: vfio-pci
Name: GRID T4-1Q
Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
Check that the related services are running normally
# Check the status of the related services
systemctl status {nvidia-vgpud.service,nvidia-vgpu-mgr.service}
# Restart the related services
systemctl restart {nvidia-vgpud.service,nvidia-vgpu-mgr.service}
# Stop the related services
systemctl stop {nvidia-vgpud.service,nvidia-vgpu-mgr.service}
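The post-reboot checks (service status and the mdevctl types listing) can be combined into one small script. This is only a sketch under stated assumptions: the helper names units_active and available_types are hypothetical, and the parser assumes mdevctl types prints each type as a "nvidia-NNN" line followed by "Available instances:" and "Name:" lines, as in the listing earlier. On some driver versions nvidia-vgpud may exit after it finishes initializing, so a "not active" result is a reason to read systemctl status, not proof of a failure.

```shell
#!/bin/sh
# units_active UNIT... (hypothetical helper)
# Prints the state of each unit; returns non-zero if any unit is not active.
units_active() {
    rc=0
    for unit in "$@"; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: not active"
            rc=1
        fi
    done
    return "$rc"
}

# available_types (hypothetical helper)
# Reads `mdevctl types` text on stdin and prints "type<TAB>name<TAB>count"
# for every vGPU type that still has free instances.
available_types() {
    awk '
        /^[ \t]*nvidia-[0-9]+[ \t]*$/ { type = $1 }
        /^[ \t]*Available instances:/ { avail = $3 + 0 }
        /^[ \t]*Name:/ {
            sub(/^[ \t]*Name:[ \t]*/, "")
            if (avail > 0) printf "%s\t%s\t%d\n", type, $0, avail
        }
    '
}

# Only query the real system when both tools are present.
if command -v systemctl >/dev/null 2>&1 && command -v mdevctl >/dev/null 2>&1; then
    units_active nvidia-vgpud.service nvidia-vgpu-mgr.service \
        || echo "run systemctl status on the units reported as not active"
    mdevctl types | available_types
fi
```

Applied to the mdevctl types listing in this post, the filter would keep only nvidia-222, nvidia-225 and nvidia-230, the three types that report 15 available instances.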