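My 104-100 had been working fine on PVE 9. Then I ran an upgrade for no particular reason, and the GPU immediately stopped working. Being a beginner, it took me quite a while to figure out that the kernel upgrade was the cause.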

Run:

nvidia-smi

[Screenshot: the message shown after running nvidia-smi]
Run:

uname -r

[Screenshot: 2025-10-15T04:08:29.png]
Run:

dkms status

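[Screenshot: 2025-10-15T04:09:20.png]
The running kernel is 6.14.11-3-pve, but the NVIDIA module was built for 6.14.11-2-pve. The two do not match, so the driver module cannot load under the new kernel.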
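The comparison above can be scripted. This is only a minimal sketch: `module_built_for` is a helper name I made up, and it assumes `dkms status` prints one line per module and kernel, either as `nvidia/580.65.05, 6.14.11-2-pve, x86_64: installed` or in the older `nvidia, 580.65.05, 6.14.11-2-pve, ...` form. If your dkms version prints something different, the patterns need adjusting.

```shell
#!/bin/sh
# Hypothetical helper: reads `dkms status` text on stdin and succeeds
# if the given module has an entry for the given kernel release.
#   $1 = module name (e.g. nvidia)
#   $2 = kernel release (e.g. the output of `uname -r`)
module_built_for() {
    # First keep only lines mentioning ", <kernel>," literally,
    # then require the line to start with "<module>," or "<module>/".
    grep -F -- ", $2," | grep -q -- "^$1[,/]"
}
```

On the host it could be used as `dkms status | module_built_for nvidia "$(uname -r)" || echo "nvidia module not built for the running kernel"`.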

To upgrade or downgrade the vGPU driver, you need to uninstall it first and then install it again.

NVIDIA 580.65.05 download
Baidu Netdisk: https://pan.baidu.com/s/1rqgxvmFku6rG2Ppjq4g-cg?pwd=knrj

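# Uninstall the GPU driver (run from the directory that contains the .run file)
./NVIDIA-Linux-x86_64-580.65.05-vgpu-kvm-custom.run --uninstall
# Remove the GPU-related packages (the pattern is quoted so the shell does not expand it against local files)
apt remove --purge 'nvidia-*'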

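Installing the NVIDIA vGPU_HOST driver
Install the vGPU host driver on the PVE host. The driver has already been patched. Upload the host driver NVIDIA-Linux-x86_64-580.65.05-vgpu-kvm-custom.run to the /tmp directory yourself.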

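# Install the required dependencies and the kernel headers for the running kernel
apt install build-essential dkms mdevctl pve-headers-$(uname -r)
# Go to /tmp, where the driver was uploaded
cd /tmp
# Make the installer executable
chmod +x NVIDIA-Linux-x86_64-580.65.05-vgpu-kvm-custom.run
# Install the driver (just press Enter at every prompt until it finishes)
./NVIDIA-Linux-x86_64-580.65.05-vgpu-kvm-custom.run --dkms -m=kernel
# Reboot once the installation is done
reboot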
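Since the whole problem comes from the module being built against the wrong kernel, it may be worth confirming that the headers for the running kernel are really in place before starting the installer. The sketch below is my own addition, not part of the original steps. It relies on the usual convention that a headers package makes /lib/modules/<release>/build resolve to a directory. The helper name `headers_present` and its optional second argument (a root prefix, there only so the check can be tried against a scratch directory) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel build tree for a release exists.
#   $1 = kernel release (e.g. the output of `uname -r`)
#   $2 = optional root prefix, empty by default (illustration/testing only)
headers_present() {
    # -d follows symlinks, so a dangling "build" link counts as missing.
    [ -d "${2:-}/lib/modules/$1/build" ]
}
```

For example: `headers_present "$(uname -r)" || echo "headers for the running kernel are missing"`.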

Then check with:

nvidia-smi
root@pve9:~# nvidia-smi
Wed Aug 27 23:54:38 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.05              Driver Version: 580.65.05      CUDA Version: N/A      |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA P106-100                On  |   00000000:08:00.0 Off |                  N/A |
| 23%   46C    P8              7W /  120W |    1041MiB /   6144MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1574    C+G   vgpu                                   1012MiB |
+-----------------------------------------------------------------------------------------+
root@pve9:~# nvidia-smi vgpu
Wed Aug 27 23:54:44 2025       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 580.65.05              Driver Version: 580.65.05                 |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|  0   NVIDIA P106-100            | 00000000:08:00.0             |   0%       |
|      3251634191  GRID T4-1Q     | dd02...  Windows10-22H2,d... |    0%      |
+---------------------------------+------------------------------+------------+

and with mdevctl types:

root@pve9:~# mdevctl types
0000:08:00.0
  nvidia-222
    Available instances: 15
    Device API: vfio-pci
    Name: GRID T4-1B
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
  nvidia-223
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
  nvidia-224
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-2B4
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
  nvidia-225
    Available instances: 15
    Device API: vfio-pci
    Name: GRID T4-1A
    Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=16
  nvidia-226
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-2A
    Description: num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=8
  nvidia-227
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-4A
    Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=4
  nvidia-228
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-8A
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=2
  nvidia-229
    Available instances: 0
    Device API: vfio-pci
    Name: GRID T4-16A
    Description: num_heads=1, frl_config=60, framebuffer=16384M, max_resolution=1280x1024, max_instance=1
  nvidia-230
    Available instances: 15
    Device API: vfio-pci
    Name: GRID T4-1Q
    Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=16
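The list above is long, and most entries show 0 available instances. Here is a small sketch that keeps only the profiles that can still be created. It assumes the output layout is exactly as shown above (a type line such as nvidia-222, followed by "Available instances:" and "Name:" lines). `list_available_types` is a name I picked for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `mdevctl types` text on stdin and prints
# "<type><TAB><name>" for every type with at least one available instance.
list_available_types() {
    awk '
        # A line holding only "nvidia-<number>" starts a new type entry.
        NF == 1 && $1 ~ /^nvidia-[0-9]+$/ { type = $1 }
        # Remember the instance count of the current entry.
        $1 == "Available" && $2 == "instances:" { avail = $3 }
        # The Name line follows the count; print only if instances remain.
        $1 == "Name:" && avail + 0 > 0 {
            line = $0
            sub(/^[ \t]*Name:[ \t]*/, "", line)
            print type "\t" line
        }
    '
}
```

Usage: `mdevctl types | list_available_types`.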

Common commands for checking whether the related services are healthy:

# Show the status of the related services
systemctl status {nvidia-vgpud.service,nvidia-vgpu-mgr.service}
# Restart the related services
systemctl restart {nvidia-vgpud.service,nvidia-vgpu-mgr.service}
# Stop the related services
systemctl stop {nvidia-vgpud.service,nvidia-vgpu-mgr.service}
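If you only want a quick yes/no instead of reading the full status output, something like the following might do. It is a rough sketch: it only reports units that systemd marks as failed (via `systemctl is-failed`), so a unit that is simply not running will not show up. The `SYSTEMCTL` variable and the function name `failed_vgpu_services` are my own, added so the command can be swapped out when trying the function off the host.

```shell
#!/bin/sh
# Command used to query systemd; overridable for illustration/testing.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper: prints the vGPU units that are in the failed state.
failed_vgpu_services() {
    for unit in nvidia-vgpud.service nvidia-vgpu-mgr.service; do
        # `is-failed --quiet` exits 0 when the unit is in the failed state.
        if "$SYSTEMCTL" is-failed --quiet "$unit"; then
            printf '%s\n' "$unit"
        fi
    done
}
```

Calling `failed_vgpu_services` with no output means neither unit is marked as failed.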
This only covers the case where the driver was already installed and a kernel version mismatch appears afterwards.
Last modified: October 20, 2025