The Mirantis NFV initiative aims to create an NFV ecosystem for OpenStack, with validated hardware at the bottom; a hardened, configuration-optimized OpenStack platform in the middle; and validated VNFs and other NFV software and application components at the top. As the pure play OpenStack company, we know that OpenStack is the best way to create an NFV infrastructure (NFVi), but we also know that our NFV clients, both telcos and enterprises, need more than just the OpenStack platform. They need a complete NFVi solution that answers the whole stack of architectural challenges presented by NFV (in compute, networking, storage, availability, scale and performance) and that reliably provides the network functions, orchestration and management functionality carriers need.
To provide this solution, Mirantis is integrating and optimizing OpenStack itself, and working with an ever-growing number of partners. In this article, we’ll talk about one important innovation that will help turn OpenStack into NFVi: Single Root I/O Virtualization, or SR-IOV.
SR-IOV is a PCI Special Interest Group (PCI-SIG) specification for virtualizing network interfaces. It represents each physical resource as a configurable entity (called a PF, for Physical Function) and creates multiple virtual interfaces (VFs, or Virtual Functions) with limited configurability on top of it. Doing so requires support from the system BIOS and, conventionally, also from the host OS or hypervisor. Among other benefits, SR-IOV makes it possible to run a very large number of network-traffic-handling VMs per compute node without increasing the number of physical NICs/ports, and it provides a means of pushing this processing down into the hardware layer, off-loading the hypervisor and significantly improving both throughput and the determinism of network performance. That’s why it’s an NFV must-have.
We first talked about SR-IOV at the OpenStack Summit in Vancouver, in a session with an unofficial title that might as well have been “Run, Forrest, run!” because the main idea of SR-IOV is to get data to VMs more quickly. Now, we’re going to look at actually using SR-IOV with Mirantis OpenStack.
SR-IOV can be complicated. Note: On Intel NICs, the PF cannot support promiscuous mode when SR-IOV is enabled, so it cannot do L2 bridging. Because of this, you shouldn’t enable SR-IOV on interfaces that have standard Fuel networks assigned to them. (One way to get around this problem is to use nova host aggregates and different flavors for normal and SR-IOV enabled instances, but a full treatment is out of scope for this article; if you’d like to hear more about it, let us know in the comments, and we’ll do a separate blog post.)
You should note that SR-IOV has a couple of limitations in the Kilo release of OpenStack. Most notably, instance migration with SR-IOV attached ports is not supported. Also, iptables-based filtering is not usable with SR-IOV NICs, because SR-IOV bypasses the normal network stack, so security groups cannot be used with SR-IOV enabled ports (though you still can use security groups for normal ports).
So now that we know what we’re talking about, let’s look at how to enable and use SR-IOV. While you can use Fuel to deploy a Mirantis OpenStack cloud that includes all of the pieces for SR-IOV, SR-IOV itself still needs to be configured separately.
Enabling SR-IOV
To enable SR-IOV, you need to configure it on compute and controller nodes. Let’s start with the compute nodes.
Configure SR-IOV on Compute nodes
To enable SR-IOV, perform the following steps only on Compute nodes that will be used for running instances with SR-IOV virtual NICs:
- Ensure that your compute nodes are capable of PCI passthrough and SR-IOV. Your hardware must provide VT-d and SR-IOV capabilities and these extensions may need to be enabled in the BIOS. VT-d options are usually configured in the “Chipset Configuration/North Bridge/IIO configuration” section of the BIOS, while SR-IOV support is configured in “PCIe/PCI/PnP Configuration.”
If your system supports VT-d you should see the messages related to DMAR in dmesg output:
# grep -i dmar /var/log/dmesg
[ 0.000000] ACPI: DMAR 0000000079d31860 000140 (v01 ALASKA A M I 00000001 INTL 20091013)
[ 0.061993] dmar: Host address width 46
[ 0.061996] dmar: DRHD base: 0x000000fbffc000 flags: 0x0
[ 0.062004] dmar: IOMMU 0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020de
[ 0.062007] dmar: DRHD base: 0x000000c7ffc000 flags: 0x1
[ 0.062012] dmar: IOMMU 1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020de
[ 0.062014] dmar: RMRR base: 0x0000007bc94000 end: 0x0000007bca2fff
This is just an example, of course; your output may differ.
If your system supports SR-IOV, you should see an SR-IOV capability section for each NIC PF, and the total number of VFs should be non-zero:
lspci -vvv | grep -i "initial vf"
Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 01
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 00
Initial VFs: 8, Total VFs: 8, Number of VFs: 0, Function Dependency Link: 01
- Check that VT-d is enabled in the kernel using this command:
# grep -i "iommu.*enabled" /var/log/dmesg
If you don’t see a response similar to:
[0.000000] Intel-IOMMU: enabled
then it’s not yet enabled. Enable it by editing /etc/default/grub and adding intel_iommu=on to GRUB_CMDLINE_LINUX. On our example system the resulting line looks like this (your other parameters will differ):
GRUB_CMDLINE_LINUX=" console=ttyS0,9600 console=tty0 net.ifnames=0 biosdevname=0 rootdelay=90 nomodeset root=UUID=d2b06335-bf6d-44b8-a0a4-a54224bdc7f8 intel_iommu=on"
Next, update grub and reboot for the changes to take effect:
# update-grub
# reboot
and repeat the check. For new environments you may want to add these kernel parameters before deploying so that they will be applied to all nodes of the environment. You can do that from the Fuel interface in the “Kernel Parameters” section of the “Settings” tab.
NOTE: If you have an AMD motherboard, you need to check for ‘AMD-Vi’ in the output of the dmesg command and pass the options “iommu=pt iommu=1” to the kernel (but we haven’t yet tested that).
- Enable the number of virtual functions required on the SR-IOV interface.
NOTE: Do not set the number of VFs to more than required, since this might degrade performance. Depending on kernel and NIC driver version you might get more queues on each PF with fewer VFs (usually, fewer than 32).
First, enable the interface:
ip link set eth1 up
Next, from the command-line, get the maximum number of functions that could potentially be enabled for your NIC:
cat /sys/class/net/eth1/device/sriov_totalvfs
Finally, enable the desired number of virtual functions for your NIC:
echo 31 > /sys/class/net/eth1/device/sriov_numvfs
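NOTE: These settings aren’t saved across reboots. To save them, add them to /etc/rc.local:
ip link set eth1 up
echo "echo 31 > /sys/class/net/eth1/device/sriov_numvfs" >> /etc/rc.local
If /etc/rc.local ends with exit 0, as the stock Ubuntu file does, move the appended line above it; otherwise it will never run.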
- Check to make sure that SR-IOV is enabled:
# ip link show eth1 |grep vf
vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
vf 1 MAC c2:cd:57:9b:6c:7d, spoof checking on, link-state auto
...
If you don’t see ‘link-state auto’ in the output, then your installation will require an SR-IOV agent. You can enable it like so:
# apt-get install neutron-plugin-sriov-agent
# nohup neutron-sriov-nic-agent --debug --log-file /tmp/sriov_agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini
- Edit /etc/nova/nova.conf:
pci_passthrough_whitelist={"devname": "eth1", "physical_network":"physnet2"}
- Edit /etc/neutron/plugins/ml2/ml2_conf_sriov.ini:
[sriov_nic]
physical_device_mappings = physnet2:eth1
- Restart the compute service:
# restart nova-compute
- Get the vendor and product ID of the virtual functions; you’ll need it to configure SR-IOV on the controller nodes.
NOTE: This is just an example of the output. The actual value may differ on your hardware.
# lspci -nn|grep -e "Ethernet.*Virtual"
06:10.1 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
06:10.3 Ethernet controller [0200]: Intel Corporation 82599 Ethernet Controller Virtual Function [8086:10ed] (rev 01)
...
Write down the vendor and product ID (the vendor:product pair in the last set of square brackets, 8086:10ed in this example).
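The VF-enabling step above can be wrapped in a small guard so that a typo never requests more VFs than the NIC offers. This is a minimal sketch, not part of Fuel or Mirantis OpenStack: enable_vfs and the SYSFS_NET override are hypothetical names introduced here, and the reset to 0 reflects the kernel's refusal to change an already non-zero VF count directly.

```shell
#!/bin/sh
# Sketch: enable VFs on an interface without exceeding the hardware limit.
# SYSFS_NET is an illustrative override so the function can be tried against
# a scratch directory; on a real compute node it stays /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

enable_vfs() {
    iface="$1"
    wanted="$2"
    dev="$SYSFS_NET/$iface/device"

    # No sriov_totalvfs file means the device (or driver) has no SR-IOV support.
    if [ ! -r "$dev/sriov_totalvfs" ]; then
        echo "error: $iface does not expose sriov_totalvfs" >&2
        return 1
    fi

    total=$(cat "$dev/sriov_totalvfs")
    if [ "$wanted" -gt "$total" ]; then
        echo "error: requested $wanted VFs, but $iface supports at most $total" >&2
        return 1
    fi

    # Reset to 0 first: the kernel rejects going from one non-zero count to another.
    echo 0 > "$dev/sriov_numvfs"
    echo "$wanted" > "$dev/sriov_numvfs"
    echo "$iface: enabled $wanted of $total VFs"
}
```

On a compute node you would call it as root, for example enable_vfs eth1 31, after bringing the interface up.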
Configure SR-IOV on the Controller nodes
- Edit /etc/neutron/plugins/ml2/ml2_conf.ini; use the vendor and product ID from the previous step as the value for supported_pci_vendor_devs.
Change the line for mechanism_drivers:
mechanism_drivers = openvswitch,l2population,sriovnicswitch
and add a new section at the end of the file:
[ml2_sriov]
supported_pci_vendor_devs = 8086:10ed
- Edit /etc/nova/nova.conf:
[DEFAULT]
scheduler_default_filters=DifferentHostFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,PciPassthroughFilter
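- Restart services (nova-scheduler is included because it is the service that reads scheduler_default_filters):
restart neutron-server
restart nova-api
restart nova-scheduler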
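Rather than copying the vendor and product ID by hand, you can derive the supported_pci_vendor_devs value from the lspci output on a compute node. The following is a rough sketch under the assumption that the VF lines look like the example output shown earlier; vf_ids is a hypothetical helper name, not an OpenStack or Fuel command.

```shell
#!/bin/sh
# Sketch: read `lspci -nn` output on stdin and print the unique
# vendor:product IDs of Ethernet virtual functions, comma-separated.
vf_ids() {
    # Same pattern as the manual step: keep only Ethernet VF lines.
    grep -e "Ethernet.*Virtual" \
        | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
        | sort -u \
        | paste -sd, -
}
```

For example, lspci -nn | vf_ids prints a value you can paste after supported_pci_vendor_devs = in the [ml2_sriov] section. The class code in the first pair of brackets (such as [0200]) is skipped because it contains no colon.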
Using SR-IOV
Now you’re ready to actually use SR-IOV.
- A recommended practice for using SR-IOV is to create a separate host aggregate for SR-IOV enabled compute nodes:
nova aggregate-create sriov
nova aggregate-set-metadata sriov sriov=true
nova aggregate-create normal
nova aggregate-set-metadata normal sriov=false
… and add some hosts to them:
nova aggregate-add-host sriov node-9.domain.tld
nova aggregate-add-host normal node-10.domain.tld
- Create a new flavor for VMs that require SR-IOV support:
nova flavor-create m1.small.sriov auto 2048 20 2
nova flavor-key m1.small.sriov set aggregate_instance_extra_specs:sriov=true
You should update all other flavors so they will start only on hosts without SR-IOV support:
openstack flavor list -f csv|grep -v sriov|cut -f1 -d,| tail -n +2| xargs -I% -n 1 nova flavor-key % set aggregate_instance_extra_specs:sriov=false
To use the SR-IOV port you need to create an instance with ports that use the vnic-type “direct”. For now, you’ll need to do this via the command line. Because the default Cirros image does not have the Intel NIC drivers included, we’ll use an Ubuntu cloud image to test SR-IOV.
- Prepare the Ubuntu cloud image:
# glance image-create --name trusty --disk-format raw --container-format bare --is-public True --location https://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img
You can only log in to this instance by using an ssh public key, so let’s go ahead and create a keypair. You can do this from the Horizon interface, but we’ll do it from the command-line, like so:
# nova keypair-add key1 > key1.pem
# chmod 600 key1.pem
- Create a port for the instance:
# neutron port-create net04 --binding:vnic-type direct --device_owner nova-compute --name sriov-port1
- Spawn the instance:
# port_id=`neutron port-list | grep sriov-port1 | awk '{print $2}'`
# nova boot --flavor m1.small.sriov --image trusty --key_name key1 --nic port-id=$port_id sriov-vm1
- Get the instance’s IP address:
# nova list | grep sriov-vm1 | awk '{print $12}'
net04=192.168.111.5
- Connect to the instance to check that everything is up and running:
Find the controllers whose DHCP namespace has access to the instance:
# neutron dhcp-agent-list-hosting-net -f csv -c host net04 --quote none | tail -n+2
node-7.domain.tld
node-9.domain.tld
Connect to the instance (run this command on one of the controllers found in the previous step):
# ip netns exec `ip netns show|grep qdhcp-$(neutron net-list | grep 'net04 ' | awk '{print$2}')` ssh -i key1.pem ubuntu@192.168.111.5
And that should be it!
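One caveat about the port lookup used when spawning the instance: grep matches substrings, so once a port named sriov-port10 exists, a search for sriov-port1 returns two IDs. Here is a minimal sketch of a stricter lookup, assuming the table layout of neutron port-list puts the ID in the first column and the name in the second; port_id_by_name is a hypothetical helper, not part of the neutron client.

```shell
#!/bin/sh
# Sketch: print the ID of the port whose name column exactly equals $1,
# reading `neutron port-list` table output on stdin.
port_id_by_name() {
    awk -F'|' -v name="$1" '
        { gsub(/^ +| +$/, "", $3) }                   # trim padding around the name
        $3 == name { gsub(/ /, "", $2); print $2 }    # exact match: emit the ID
    '
}
```

It would be used as port_id=$(neutron port-list | port_id_by_name sriov-port1) before the nova boot command.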
Troubleshooting
Sometimes something goes wrong. Here are some common problems and solutions.
- If you see errors in /var/log/nova/nova-compute.log on the compute host:
libvirtError: internal error: missing IFLA_VF_INFO in netlink response
… you should install a newer version of libnl3.
- If you see:
libvirtError: unsupported configuration: host doesn't support passthrough of host PCI devices
… in /var/log/nova/nova-compute.log, it means that VT-d is not supported or not enabled.
- If you see:
NovaException: Unexpected vif_type=binding_failed
You should enable the SR-IOV agent, or if you’ve already done so, check that it’s running:
# neutron agent-list | grep sriov-nic-agent
| dfa4edcf-63c1-4af7-a291-ec139a16f346 | NIC Switch agent | node-16.domain.tld | :-) | True | neutron-sriov-nic-agent |
Otherwise, examine the log file /tmp/sriov_agent for clues to what else might be wrong.
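If you check several compute nodes, it can help to scan the log for all three messages at once. The sketch below simply greps for the error strings listed above and prints the matching remedy; diagnose_sriov_log is a hypothetical helper name, and the default log path is the one used in this article.

```shell
#!/bin/sh
# Sketch: map the known SR-IOV error messages in nova-compute.log to hints.
diagnose_sriov_log() {
    log="${1:-/var/log/nova/nova-compute.log}"

    # -F treats each message as a fixed string rather than a regex.
    if grep -qF "missing IFLA_VF_INFO in netlink response" "$log"; then
        echo "libnl3 is too old: install a newer version"
    fi
    if grep -qF "host doesn't support passthrough of host PCI devices" "$log"; then
        echo "VT-d is not supported or not enabled"
    fi
    if grep -qF "Unexpected vif_type=binding_failed" "$log"; then
        echo "SR-IOV agent is not enabled or not running"
    fi
}
```

Run it without arguments on a compute host, or pass a path to a copied log file. No output means none of the three known messages were found.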
Conclusion
For now, configuring Mirantis OpenStack for SR-IOV is still relatively complex, and therefore error-prone and potentially challenging on large clusters. During the Mitaka cycle, we’ll be making improvements to current configurations, doing deeper testing, and working on automating configuration and deployment of SR-IOV via Fuel.