Continued from the previous section; picking up where we left off:
First, the documentation mentions three kernel modules — uio_pci_generic, igb_uio, and vfio_pci — which I did not understand at all. It also mentions dpdk-devbind.py for checking NIC binding status; running it gave me this:
[root@dpdk tools]# ./dpdk-devbind.py --status

Network devices using DPDK-compatible driver
============================================
<none>

Network devices using kernel driver
===================================
0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=

Other network devices
=====================
<none>

[root@dpdk tools]#
So first I need to learn QEMU's NIC configuration and tweak the "hardware" before coming back~~ (Off I sadly go to man qemu...)
Until now I had only ever used one QEMU networking setup: a tap device on the outside, virtio on the inside.
Back from the man page: the guest's NIC can be emulated with "-net nic,model=xxx", but I still don't know how to do passthrough.
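For later reference, here is a minimal sketch of that emulated-NIC syntax (not passthrough yet). disk.img, the MAC address, and tap0 are placeholder values, and e1000 is just one model picked for illustration:

```shell
# Sketch: fully emulated guest NIC with a tap backend (placeholder values).
qemu-system-x86_64 -enable-kvm -m 1G \
    -drive file=disk.img,if=virtio \
    -net nic,model=e1000,macaddr=00:00:00:00:00:03 \
    -net tap,ifname=tap0

# QEMU of this era listed the supported models with:
#   qemu-system-x86_64 -net nic,model=?
# (newer releases use: qemu-system-x86_64 -nic model=help)
```

This is a command-line fragment only; it needs a KVM-capable host, a disk image, and a pre-created tap device to actually run.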
1. With virtio as the front-end driver, how to use vhost-user as the back-end
I suddenly realized this is complicated enough to deserve its own post, so I spun it off. Moved to "[qemu] 在前端驱动使用virtio的情况下,如何让后端使用vhost-user".
2. Direct device access: PCI passthrough
http://blog.csdn.net/qq123386926/article/details/47757089
http://blog.csdn.net/halcyonbaby/article/details/37776211
http://blog.csdn.net/richardysteven/article/details/9008971
There are two approaches, pci-stub and VFIO; I will use only the newer VFIO. The plan: hand my physical NIC over to the VM for direct access.
1. Make sure the CPU supports VT-d and that it is enabled in the BIOS.
Mine does support it: http://ark.intel.com/products/85214/Intel-Core-i7-5500U-Processor-4M-Cache-up-to-3_00-GHz
2. Edit grub to add intel_iommu=on to the kernel boot parameters. (There is a pitfall here; keep reading, it is covered below.)
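On most distros this means editing /etc/default/grub and regenerating the config; a sketch, with the caveat that file locations and the grub.cfg output path vary by distro:

```shell
# /etc/default/grub — append intel_iommu=on to the kernel command line, e.g.:
#   GRUB_CMDLINE_LINUX="... intel_iommu=on"
# then regenerate grub.cfg and reboot (output path varies by distro):
grub-mkconfig -o /boot/grub/grub.cfg
# some distros instead use: grub2-mkconfig -o /boot/grub2/grub.cfg
```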
[tong@T7 dpdk]$ zcat /proc/config.gz |grep -i intel_iommu
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_SVM=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_FLOPPY_WA=y
[tong@T7 dpdk]$
3. Load the vfio-pci driver into the kernel.
[tong@T7 dpdk]$ sudo modprobe vfio-pci
[tong@T7 dpdk]$ lsmod |grep vfio
vfio_pci               36864  0
vfio_iommu_type1       20480  0
vfio_virqfd            16384  1 vfio_pci
vfio                   24576  2 vfio_iommu_type1,vfio_pci
irqbypass              16384  2 kvm,vfio_pci
[tong@T7 dpdk]$
4. Inspect the NIC
[root@T7 0000:00:19.0]# lspci -vv -nn -d 8086:15a3
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
    Subsystem: Lenovo Device [17aa:2227]
    Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 20
    Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
    Region 2: I/O ports at 4080 [size=32]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [e0] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel modules: e1000e
5. bind / unbind
[root@T7 0000:00:19.0]# echo "0000:00:19.0" > /sys/bus/pci/devices/0000:00:19.0/driver/unbind
[root@T7 0000:00:19.0]# echo "8086 15a3" > /sys/bus/pci/drivers/vfio-pci/new_id
*** And here the trouble starts. Per the documentation something is already wrong: I have no iommu_group. What on earth is that?! ***
[tong@T7 dpdk]$ ls /dev/vfio/
vfio
[tong@T7 dpdk]$ dmesg |grep vfio
[20355.407062] vfio-pci: probe of 0000:00:19.0 failed with error -22
[20593.172116] vfio-pci: probe of 0000:00:19.0 failed with error -22
[20684.750370] vfio-pci: probe of 0000:00:19.0 failed with error -22
[tong@T7 dpdk]$
I started the VM like this and got an error:
[tong@T7 dpdk]$ cat start.sh
sudo qemu-system-x86_64 -enable-kvm -m 2G -cpu Nehalem \
    -smp cores=2,threads=2,sockets=2 \
    -numa node,mem=1G,cpus=0-3,nodeid=0 \
    -numa node,mem=1G,cpus=4-7,nodeid=1 \
    -drive file=disk.img,if=virtio \
    -net nic,model=virtio,macaddr='00:00:00:00:00:03' \
    -device vfio-pci,host='0000:00:19.0' \
    -net tap,ifname=tap0 &
[tong@T7 dpdk]$ ./start.sh
[tong@T7 dpdk]$ qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: error no iommu_group for device
qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: Device initialization failed
The answer:
To answer this I read the kernel documentation, and then this particularly good IBM article. I finally understood what an iommu group actually is — but still had not found my answer.
https://www.kernel.org/doc/Documentation/vfio.txt
https://www.ibm.com/developerworks/community/blogs/5144904d-5d75-45ed-9d2b-cf1754ee936a/entry/vfio?lang=en
So why was there no iommu_group? Because I was being dumb! I never actually added intel_iommu=on to the kernel parameters in grub, as step (2) says. Why not? Because zcat /proc/config.gz showed the option as y, which I took to mean it was enabled. Then, after I did add the parameter, zcat /proc/config.gz again — the two outputs were identical. So I had completely misunderstood what that file is for: I'd guess it only records the options the kernel was compiled with, and has nothing to do with the runtime state!
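The distinction can be checked directly: /proc/cmdline shows the parameters the running kernel was actually booted with, while /proc/config.gz only reflects compile-time options. A small sketch:

```shell
# /proc/cmdline holds what the *running* kernel was booted with, which is
# what matters here; /proc/config.gz only shows build-time options.
if grep -qw intel_iommu=on /proc/cmdline 2>/dev/null; then
    echo "intel_iommu=on is active on the running kernel's command line"
else
    echo "intel_iommu=on is NOT on the running kernel's command line"
fi
```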
So, after fixing the parameter, a freshly booted system looks like this, which means it took effect:
[tong@T7 ~]$ ll /sys/bus/pci/devices/0000:00:19.0/ |grep io
lrwxrwxrwx 1 root root 0 Sep 27 23:44 iommu -> ../../virtual/iommu/dmar1
lrwxrwxrwx 1 root root 0 Sep 27 23:44 iommu_group -> ../../../kernel/iommu_groups/5
[tong@T7 ~]$
With that problem popped off the stack, back to unbind / bind. The device I want to pass through to the VM is the physical NIC lan0:
Before the unbind the link LED is on; status:
[tong@T7 ~]$ lspci -vv -nn -s 00:19.0
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
    Subsystem: Lenovo Device [17aa:2227]
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 46
    Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
    Region 2: I/O ports at 4080 [size=32]
    Capabilities: <access denied>
    Kernel driver in use: e1000e
    Kernel modules: e1000e
[tong@T7 ~]$ sudo ip link show dev lan0
2: lan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 50:7b:9d:5c:1e:9b brd ff:ff:ff:ff:ff:ff
[tong@T7 ~]$ ll /sys/bus/pci/devices/0000:00:19.0/ |grep driver
lrwxrwxrwx 1 root root 0 Sep 27 23:42 driver -> ../../../bus/pci/drivers/e1000e
-rw-r--r-- 1 root root 4096 Sep 27 23:44 driver_override
[tong@T7 ~]$
unbind: (the failed first attempt below puzzled me at the time; it doesn't really matter, though)
[tong@T7 ~]$ sudo echo 0000:00:19.0 > /sys/bus/pci/devices/0000:00:19.0/driver/unbind
bash: /sys/bus/pci/devices/0000:00:19.0/driver/unbind: Permission denied
[tong@T7 ~]$ sudo su -
[root@T7 ~]# echo 0000:00:19.0 > /sys/bus/pci/devices/0000:00:19.0/driver/unbind
[root@T7 ~]#
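That Permission denied has a simple explanation: the redirection `> .../unbind` is performed by the invoking (unprivileged) shell before sudo ever runs, so only echo gets root, not the file open. The usual workaround is to let a root-owned process do the writing, e.g. with tee. Demonstrated here on a scratch file under /tmp instead of sysfs, so no privileges are needed:

```shell
# The *calling* user's shell opens the redirect target, so
#   sudo echo 0000:00:19.0 > /sys/.../driver/unbind
# fails unless the caller can already write the file. Piping into tee run
# under sudo works, because tee itself opens the file as root:
#   echo 0000:00:19.0 | sudo tee /sys/bus/pci/devices/0000:00:19.0/driver/unbind
# Same pattern, demonstrated on a scratch file:
echo 0000:00:19.0 | tee /tmp/unbind-demo
```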
After the unbind succeeds, the states compare as follows (the NIC's LED is still on):
[root@T7 ~]# lspci -vv -nn -s 00:19.0
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
    Subsystem: Lenovo Device [17aa:2227]
    Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 20
    Region 0: Memory at f2200000 (32-bit, non-prefetchable) [size=128K]
    Region 1: Memory at f223e000 (32-bit, non-prefetchable) [size=4K]
    Region 2: I/O ports at 4080 [size=32]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [e0] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel modules: e1000e
[root@T7 ~]# ip link show dev lan0
Device "lan0" does not exist.
[root@T7 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether dc:53:60:6c:b5:7e brd ff:ff:ff:ff:ff:ff
4: internal-br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 26:4a:07:a1:4f:06 brd ff:ff:ff:ff:ff:ff
[root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/ |grep driver
-rw-r--r-- 1 root root 4096 Sep 27 23:44 driver_override
[root@T7 ~]#
bind to vfio:
[root@T7 ~]# modprobe vfio_pci
[root@T7 ~]# lsmod |grep vfio
vfio_pci               36864  0
vfio_iommu_type1       20480  0
vfio_virqfd            16384  1 vfio_pci
vfio                   24576  2 vfio_iommu_type1,vfio_pci
irqbypass              16384  2 kvm,vfio_pci
[root@T7 ~]# echo 8086 15a3 > /sys/bus/pci/drivers/vfio-pci/new_id
After the bind succeeds, the various states:
[root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/iommu_group/devices/
total 0
lrwxrwxrwx 1 root root 0 Sep 28 00:09 0000:00:19.0 -> ../../../../devices/pci0000:00/0000:00:19.0
[root@T7 ~]# ll /dev/vfio/
total 0
crw------- 1 root root 242,   0 Sep 28 00:08 5
crw-rw-rw- 1 root root  10, 196 Sep 28 00:06 vfio
[root@T7 ~]# ll /sys/bus/pci/devices/0000:00:19.0/iom*
lrwxrwxrwx 1 root root 0 Sep 27 23:44 /sys/bus/pci/devices/0000:00:19.0/iommu -> ../../virtual/iommu/dmar1
lrwxrwxrwx 1 root root 0 Sep 27 23:44 /sys/bus/pci/devices/0000:00:19.0/iommu_group -> ../../../kernel/iommu_groups/5
[root@T7 ~]# dmesg |tail
... ...
[ 1027.806155] e1000e 0000:00:19.0 lan0: removed PHC
[ 1394.134555] VFIO - User Level meta-driver version: 0.3
[root@T7 ~]# lspci -vv -nn -s 00:19.0
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
    Subsystem: Lenovo Device [17aa:2227]
    Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Interrupt: pin A routed to IRQ 20
    Region 0: Memory at f2200000 (32-bit, non-prefetchable) [disabled] [size=128K]
    Region 1: Memory at f223e000 (32-bit, non-prefetchable) [disabled] [size=4K]
    Region 2: I/O ports at 4080 [disabled] [size=32]
    Capabilities: [c8] Power Management version 2
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D3 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [e0] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel driver in use: vfio-pci
    Kernel modules: e1000e
[root@T7 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DORMANT group default qlen 1000
    link/ether dc:53:60:6c:b5:7e brd ff:ff:ff:ff:ff:ff
4: internal-br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 26:4a:07:a1:4f:06 brd ff:ff:ff:ff:ff:ff
[root@T7 ~]#
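Putting the pieces together, the whole unbind → new_id flow can be scripted. A sketch: by default SYSFS points at a scratch directory so the file writes can be dry-run harmlessly; set SYSFS=/sys and run as root to do it for real. The BDF and vendor/device ID are the I218-V values from this post — substitute your own.

```shell
#!/bin/sh
# Sketch of the rebind flow above. SYSFS defaults to a scratch tree so the
# writes can be dry-run without root or real hardware; use SYSFS=/sys live.
SYSFS=${SYSFS:-/tmp/fake-sysfs}
DEV=0000:00:19.0      # PCI address of the NIC (this post's I218-V)
ID="8086 15a3"        # its vendor/device ID pair

# Create the scratch tree for the dry run; on a real system these paths
# already exist and mkdir -p is a no-op.
mkdir -p "$SYSFS/bus/pci/devices/$DEV/driver" "$SYSFS/bus/pci/drivers/vfio-pci"

modprobe vfio-pci 2>/dev/null || true                       # no-op in the dry run
echo "$DEV" > "$SYSFS/bus/pci/devices/$DEV/driver/unbind"   # detach from e1000e
echo "$ID"  > "$SYSFS/bus/pci/drivers/vfio-pci/new_id"      # let vfio-pci claim it
```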
6. Start the VM to test. Inside the guest there is one extra NIC; it receives layer-2 broadcasts from the switch and can get an address via DHCP:
[root@dpdk ~]# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02)
00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [8086:7000]
00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] [8086:7010]
00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 03)
00:02.0 VGA compatible controller [0300]: Device [1234:1111] (rev 02)
00:03.0 Ethernet controller [0200]: Red Hat, Inc Virtio network device [1af4:1000]
00:04.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (3) I218-V [8086:15a3] (rev 03)
00:05.0 SCSI storage controller [0100]: Red Hat, Inc Virtio block device [1af4:1001]
[root@dpdk ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 00:00:00:00:00:03 brd ff:ff:ff:ff:ff:ff
3: ens4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/ether 50:7b:9d:5c:1e:9b brd ff:ff:ff:ff:ff:ff
[root@dpdk ~]# tcpdump -i ens4 -nn -c 10
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens4, link-type EN10MB (Ethernet), capture size 65535 bytes
00:17:32.969547 ARP, Request who-has 192.168.197.100 tell 192.168.197.101, length 46
00:17:33.970617 ARP, Request who-has 192.168.197.100 tell 192.168.197.101, length 46
7. Can the device be shared? I'll try starting a second VM to find out.
[tong@T7 CentOS7]$ ./start.sh
[tong@T7 CentOS7]$ qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: error opening /dev/vfio/5: Device or resource busy
qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: vfio: failed to get group 5
qemu-system-x86_64: -device vfio-pci,host=0000:00:19.0: Device initialization failed
^C
[tong@T7 CentOS7]$
The answer is no! The group file /dev/vfio/5 is already held open by the first VM (hence "Device or resource busy"), so a passed-through device cannot be shared between two VMs at the same time.
And with that, PCI NIC passthrough via VFIO is fully configured! :)