Automated NIC names in Linux on VMware vSphere

A colleague of mine was working with a customer recently on some changes to their automated VM provisioning process (they're not vRA customers… yet). He got stuck trying to get around a particular challenge with the automatic naming of network interfaces in certain Linux distributions.

The customer in question is using vRealize Orchestrator (vRO) to create (not clone) their Virtual Machines from a JSON structure that is supplied by an external system. That structure defines the hardware, the OS network identity (name, IP etc) and the OS installation sources: an ISO file for the installation and a floppy image carrying the ks.cfg (Kickstart) file.

Once the JSON object is provided to the vRO workflow, the VM is created, booted and automatically starts to install and configure itself.

Customer's Simple VM

Simple VMs have a number of disks defined (for root, opt, var, swap etc partitions) that are attached to a single ParaVirtual SCSI adapter. The VM is also equipped with a single VMXNET3 network adapter.

In this configuration, there is no problem. The installation of the OS runs through to completion and the VM is handed off to Puppet and eventually goes into service.

Customer's Complex VM

For provisioning Linux-based Oracle servers, however, the customer wanted to be able to specify not only extra disks and partitions, but extra SCSI controllers too. Their JSON structure easily accommodated such a change and the vRO changes were simple too: they just required the addition of a loop or two.

The issue now though was that their installation and subsequent customisations of the VM failed. After a few test runs, the reason for the failure appeared. The network interface in the simple VM was “ens192” whilst the interface in the complex VM was “ens224” and the installation and customisation processes couldn't cope with that difference.

Why does this happen?

My first port of call was to verify how the network card was getting its name. I thought I knew, but I just wanted to check. Yes, Red Hat have a page that goes into detail: Consistent Network Device Naming. Coupled with a couple of other quick reads, it was clear that the NIC name was derived from the hardware and PCI slot being used.

Next I conducted a couple of quick tests and noticed something that I hadn't observed before (because I'd never had to look).

Test 1 – Create a Simple VM

By creating a new VM from scratch using the vSphere client and then inspecting it via the managed object browser (MOB), I was able to look at the VM's config in detail.

What's interesting is that creating the VM does not populate the PCI slot information immediately.

This would explain why no amount of fiddling with the order of hardware additions was making any difference. Only by powering on the VM do these slot values get set. Doing so, and looking at the network adapter again, I could see that the slot had been assigned.

With a slot number of 192, the network interface in the OS will go on to be called “ens192”, as expected.
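The same check could be scripted rather than clicked through in the MOB. Here's a minimal sketch, assuming device objects shaped like the vSphere API's VirtualDevice, where `slotInfo` is empty until the first power-on and carries a `pciSlotNumber` afterwards. The function names and the stand-in objects are mine (hypothetical), and the `ens<slot>` rule is simply the pattern seen in these tests, not a general naming algorithm.

```python
from types import SimpleNamespace


def pci_slot(device):
    """Return the device's PCI slot number, or None if it is not assigned yet."""
    slot_info = getattr(device, "slotInfo", None)
    return getattr(slot_info, "pciSlotNumber", None)


def predicted_nic_name(device):
    """Predict the guest interface name from the slot (observed pattern only)."""
    slot = pci_slot(device)
    return None if slot is None else f"ens{slot}"


# Stand-ins for the network adapter before and after the first power-on.
nic_before = SimpleNamespace(slotInfo=None)
nic_after = SimpleNamespace(slotInfo=SimpleNamespace(pciSlotNumber=192))

print(predicted_nic_name(nic_before))  # None
print(predicted_nic_name(nic_after))   # ens192
```

Returning `None` for an unassigned slot keeps the "not powered on yet" case explicit instead of guessing a name.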

Test 2 – Create a Complex VM

Again using the vSphere client, I created a VM with two ParaVirtual SCSI adapters.

After powering it on, and checking the MOB, I could see that the network adapter was being given a slot number of 224. This would result in the network interface being named “ens224”.

This suggests that the SCSI adapters / controllers are being assigned PCI slots before the network interfaces.

Test 3 – Add a SCSI Adapter to the Simple VM

Before doing this, I cloned the Simple VM to a template (for later). Next I added a SCSI adapter to the Simple VM.

Back in the MOB, it's the new SCSI adapter that gets the PCI slot ID 224.

So, the new SCSI controller gets the higher PCI slot on this occasion because the network adapter already has 192 assigned.

Test 4 – Deploy from Template and Add SCSI Adapter

Just to cover off some more of the possibilities, I wanted to clone from the template I made and add a SCSI adapter. For giggles, I also added another network adapter.

Looking at the hardware, before powering the VM on, I found the following:

  • SCSI 1 (from the template) has PCI slot 160
  • VMXNET3 (from the template) has PCI slot 192
  • SCSI 2 (added during the clone process) is unset
  • VMXNET3 (added during the clone process) is unset

And after turning it on:

  • SCSI 1 (from the template) has PCI slot 160
  • VMXNET3 (from the template) has PCI slot 192
  • SCSI 2 (added during the clone process) has PCI slot 224
  • VMXNET3 (added during the clone process) has PCI slot 256

Summary of Findings

So far, we've discovered the following:

  1. In some Linux distributions, network interface names are automatically generated based on the PCI slot number assigned to the individual hardware item (network adapter, SCSI adapter etc) in vSphere.
  2. vSphere assigns PCI slot numbers to new hardware when a VM is powered on, not when it is added (unless the VM is already on).
  3. vSphere seems to assign PCI slot numbers to SCSI adapters before network adapters, probably because they have a lower device key value.
  4. New hardware additions get the next available PCI slot; slot numbers that have already been allocated are not adjusted.
  5. PCI slot numbers already allocated in templates are preserved.
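To sanity-check these findings against the four tests, here's a toy model of the behaviour I observed. It is not vSphere's real allocation logic: the slot pool (160, 192, 224, 256) is just the set of values seen above, and the "SCSI before NIC" ordering is the assumption from finding 3. All names are hypothetical.

```python
SLOT_POOL = [160, 192, 224, 256]  # slots seen in the tests; assumed, not exhaustive


def power_on(devices):
    """Give unset devices the next free slots: SCSI first, then NICs."""
    used = {d["slot"] for d in devices if d["slot"] is not None}
    free = [s for s in SLOT_POOL if s not in used]
    pending = [d for d in devices if d["slot"] is None]
    # sorted() is stable, so devices of the same kind keep their original order.
    for dev in sorted(pending, key=lambda d: d["kind"] != "scsi"):
        dev["slot"] = free.pop(0)
    return devices


def nic_names(devices):
    """Interface names the guest would use, following the ens<slot> pattern."""
    return [f"ens{d['slot']}" for d in devices if d["kind"] == "nic"]


# Tests 1 and 2: fresh VMs with one and two SCSI adapters.
simple = power_on([{"kind": "scsi", "slot": None}, {"kind": "nic", "slot": None}])
complex_vm = power_on([
    {"kind": "nic", "slot": None},
    {"kind": "scsi", "slot": None},
    {"kind": "scsi", "slot": None},
])
# Test 4: slots from the template are kept, new devices take the next free ones.
cloned = power_on([
    {"kind": "scsi", "slot": 160},
    {"kind": "nic", "slot": 192},
    {"kind": "scsi", "slot": None},
    {"kind": "nic", "slot": None},
])

print(nic_names(simple))      # ['ens192']
print(nic_names(complex_vm))  # ['ens224']
print(nic_names(cloned))      # ['ens192', 'ens256']
```

The model reproduces the names from each test, which is all it is meant to do; it says nothing about what happens beyond four devices.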

What did we do?

Now we know the lay of the land, it's time to work out how to solve the customer issue. Based on the findings above, two options were immediately obvious.

The first is to adjust the process so that the VM is always created with a single SCSI controller and a single NIC, powered on and off, and then has the extra hardware added. We tried this and it works. It's a slightly more complex process than we wanted, but testing has shown it to be fairly reliable.
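As a rough illustration of that ordering (not the customer's actual vRO workflow, which is JavaScript-based and which I'm not reproducing here), the sketch below splits a spec into the hardware to create up front and the hardware to add after the power cycle. The `scsi_controllers` and `nics` field names and the step labels are hypothetical; the real JSON structure will differ, and any disks attached to the extra controllers would need to move with them.

```python
def split_hardware(spec):
    """Split a VM spec into first-boot hardware and hardware to add afterwards."""
    base = {
        "scsi_controllers": spec["scsi_controllers"][:1],
        "nics": spec["nics"][:1],
    }
    extras = {
        "scsi_controllers": spec["scsi_controllers"][1:],
        "nics": spec["nics"][1:],
    }
    return base, extras


def provisioning_steps(spec):
    """Order the steps so the first NIC gets its slot before extras exist."""
    base, extras = split_hardware(spec)
    steps = [("create_vm", base), ("power_on", None), ("power_off", None)]
    if extras["scsi_controllers"] or extras["nics"]:
        steps.append(("add_hardware", extras))
    return steps


# Hypothetical "complex VM" spec with two SCSI controllers and one NIC.
oracle_spec = {"scsi_controllers": ["pvscsi-0", "pvscsi-1"], "nics": ["vmxnet3-0"]}

for name, payload in provisioning_steps(oracle_spec):
    print(name, payload)
```

Because the first NIC is present at the first power-on with only one SCSI controller, it should land in the same slot as in the Simple VM, and the later additions take the next free slots.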

The second option is to specify the PCI slot numbers during the machine creation. Looking at the vSphere API reference for information about the VirtualDevicePciBusSlotInfo object gave us pause because it states the following:

The pci slot number assignment should generally be left to the system. If assigned a value, and the value is invalid or duplicated, it will automatically be reassigned. This will not cause an error.

Generally, the PCI slot numbers should never be specified in an Reconfigure operation, and only in a CreateVM operation if i) they are specified for all devices, and ii) the numbers have been determined by looking at an existing VM configuration of similar hardware version. In other words, when the virtual hardware configuration is duplicated.

Loosely translated: “not advisable”.

We settled on option 1, and the customer is happy.