Making Sense of QEMU CPU Types

I recently started migrating my personal virtual infrastructure over to Proxmox from ESXi. In this process, I've found some interesting QEMU/KVM-isms that I am getting used to. One thing that I have not seen well documented (but have seen a lot of discussion on) is the set of QEMU x86-64 CPU types. Not only are there instruction set differences between them, but there are security implications for the VM: QEMU exposes security errata based on the CPU type chosen, potentially leaving the VM vulnerable to speculative execution attacks. This post will dive a little deeper into the differences and share my findings. This is primarily for my own discovery, but maybe someone else will find it useful as well.

A bit of personal history...

You can skip past this section if you want to go straight to the technical bits; I just wanted to share how I got to where I am today: Proxmox and KVM. I've been in the world of enterprise IT and hyperscalers for over 15 years, and when I was starting my career, VMware was the big innovator in x86 virtualization. I've seen the new players start as interesting projects and slowly mature (Xen, KVM, Hyper-V, etc.), eventually dominating certain industries, some because they are open source (cough AWS EC2 cough) and some by being the backbone of the other guys (Azure). But VMware has always been the gold standard in this space for self-management, in my opinion.

My first go-around with VMware ESXi (back when it was still called ESX, with a Red Hat based service console) was at the university I attended for undergrad. I was a student employee, doing IT side tasks for a small research organization on campus. We had decommissioned some old servers (I believe they were K7-era Athlon MPs and K8-era Opterons) and decided to run one under my desk. For what purpose? A Counter-Strike: Source server, of course! It was 2008, I had a 100Mbps Ethernet link to the campus backbone, and low latency for students in the dorms (the server was pretty popular for quite a while at UW-Milwaukee).

The only problem was that I also wanted to host a web server, a wiki, and wanted the flexibility to mess with other Linux distros and FreeBSD. I wanted some segmentation between the systems, so my web server wouldn't interfere with the Counter-Strike server. Enter VMware - one of the campus networking folks mentioned that ESXi could do this, and a huge light bulb went on.

I find out ESXi is FREE? How can this be?! I install it, and after being confused for a few hours, I install the old J#-based thick client. Bam! I've got my first VM running (Debian Linux, of course!). Wow, that was easy.....

I was a VMware advocate for years because of this experience. I never became certified or anything, but it was my hypervisor platform of choice, all because of that positive experience with the free version when I was a student. I ran it at home for many years: ran it in college on a server with my roommates to share all of those Linux ISOs; ran it on all of my bare metal servers that lived with me throughout the years.

Enter 2023: Broadcom acquires VMware, and in 2024, they kill the free ESXi variant. In early March, CVE-2024-22252 hits, a pretty nasty hypervisor escape bug in the guest USB XHCI controller. Yes, you can download the patches via v-Front, but my license is no longer valid, so it's time to move on to something supported. This feels very Oracle-like, recalling when they killed OpenSolaris in 2010.

I'll get off my soapbox; with that out of the way, on to the technical bits...

x86-64 Microarchitecture Levels

In 2020, some major CPU and software vendors got together to create the x86-64 microarchitecture feature levels. Intel, AMD, Red Hat, and SUSE all had a seat at the table, with the intention of defining baseline instruction sets available on a CPU that a compiler can target. The idea is that newer x86-64 CPUs have additional instructions that the compiler can use to optimize code. The result is faster execution time at the expense of backwards compatibility: code that is compiled for a newer level cannot run on older CPUs that are missing those instructions - essentially drawing a hard line in the sand for older hardware.

Red Hat Enterprise Linux 9 requires a specific microarchitecture level (x86-64-v2) to run, a first in the enterprise Linux space. Microsoft is even getting in on the fun, with indications that future Windows 11 versions will require the POPCNT instruction on the CPU.

What are the levels?

x86-64 has been around for 21 years, and in that time its instruction set has been extended, much like the original i386 architecture was throughout its lifetime (MMX, SSE, SSE2, and the NX bit were all major additions by the time the Pentium-era chips had run their course).

Each microarchitecture level corresponds to a base set of instructions available on the CPU. The original x86-64 instruction set, which debuted with AMD's Opteron in April 2003 (followed by the Athlon 64 that September), is known as x86-64-v1. Any code targeting level 1 will run on any x86-64 CPU.

Level 2, x86-64-v2, corresponds to Intel Nehalem (2008), AMD Bulldozer (2011), and above. It adds SSE3/SSSE3/SSE4 vector instructions, plus the POPCNT instruction that Windows 11 may start requiring in 2024. RHEL 9 requires this level.

There is an unofficial level 2.5 known as x86-64-v2-aes, which is important to call out because of one feature: the AES instructions introduced around the Intel Sandy Bridge architecture (2011) and above. Hardware AES offload makes a huge performance difference for full disk encryption and for servers handling a lot of TLS / HTTPS traffic.

Level 3, x86-64-v3, corresponds to an Intel Haswell (2013) or AMD Excavator (2015) CPU or above. This level adds AVX and AVX2, which provide 256-bit registers for performing SIMD operations (instead of loading, operating, storing, and looping). These allow for fast operations over data represented as large arrays - think matrices, vectors, etc.

Level 4, x86-64-v4, corresponds to CPUs that support AVX512, which introduces SIMD instructions over new 512-bit wide registers. The support list is odd, because Intel has actually removed AVX512 from its consumer CPUs from 2022 onwards. Intel CPUs from Skylake-X/SP (2017) through Rocket Lake (2021) have AVX512 instructions, while newer Intel consumer CPUs from Alder Lake (2022) onwards do not meet the x86-64-v4 baseline. AMD CPUs using Zen 4/4c (2022) cores support AVX512, which corresponds on the consumer side to the Ryzen 7xxx series.
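To make the cumulative nature of these levels concrete, here is a minimal sketch in Python that checks which level a set of /proc/cpuinfo feature flags satisfies. The per-level flag sets are my own condensed approximation of the psABI definitions, not the complete lists (note that Linux reports SSE3 as pni, and LZCNT shows up under abm):

```python
# Approximate required /proc/cpuinfo feature flags for each x86-64
# microarchitecture level above the v1 baseline. These are condensed;
# the psABI defines the authoritative lists.
LEVELS = {
    "x86-64-v2": {"cx16", "lahf_lm", "popcnt", "pni", "ssse3", "sse4_1", "sse4_2"},
    "x86-64-v3": {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    "x86-64-v4": {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}

def max_level(flags):
    """Return the highest level whose cumulative requirements `flags` meets."""
    level = "x86-64-v1"  # any x86-64 CPU qualifies for level 1
    for name, required in LEVELS.items():  # dicts preserve insertion order
        if required <= flags:
            level = name
        else:
            break  # levels are cumulative, so stop at the first miss
    return level

def host_flags(path="/proc/cpuinfo"):
    """Collect the feature flags of the first CPU listed in /proc/cpuinfo."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Ryzen 5900X, for example, max_level(host_flags()) would come back as x86-64-v3: the v3 requirements are all present, but avx512f is not.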

A note on AVX512 and x86-64-v4

Clearly there is fragmentation with AVX512. There are just not that many applications that can utilize the AVX512 instruction set, and because it introduces 512-bit wide registers, it takes up a lot of silicon die space on the CPU. Manufacturers are making bets on its usefulness; as a result, it is not included everywhere, and uptake from AMD was very slow.

Pulling CPU Types together with QEMU

QEMU allows for the "virtualization" of specific CPU types. You can select a specific x86-64 CPU model, or a generic CPU type that represents one of the microarchitecture levels above. Additionally, QEMU can pass the "host" CPU to KVM: this passes the hypervisor's running CPU, along with the instructions it supports, through to the virtual machine. In addition to instructions, security errata are passed along too, and all of this has implications for both the performance and security of the VM.

Proxmox allows one to live-migrate a VM between hosts. This is similar to VMware ESXi's vMotion: the hypervisor snapshots the VM state, including the current running state of the CPU - its registers, stack pointer, and program counter. The hypervisor transfers this to another hypervisor, which then instantiates the VM on the other host: basically pausing the VM for a very brief period, transferring the state of the CPU, and then hitting the play button once transferred. It sounds complicated, but this happens very fast once the RAM is checkpointed and transferred (which is usually staged before the final migration of the VM between hypervisors).

When we have two hosts that are live-migrating a VM, their host CPUs may differ. We might have differing brands (Intel vs AMD), differing architectures (Sandy Bridge vs Haswell), or even different CPU steppings of the same architecture. These will all carry differences in the instruction set between the hypervisor CPUs.

We run into a problem when the two systems we are migrating between do not share the same CPU architecture and stepping. The virtual CPU in the VM may be in the middle of executing instructions that do not exist on the physical hardware we are migrating to. What does this do? The hypervisor's CPU throws an exception when it encounters a foreign instruction, because it doesn't know what to do with it - for that CPU, it might as well be a corrupted program stack. Hopefully our hypervisor software handles this gracefully, which usually ends up with the VM being killed.

VM Security and Performance Impacts

How do we get around this live-migration instruction conundrum? QEMU has the ability to present baseline sets of instructions to the virtual CPU in the VM, masking some of the hypervisor's instruction sets from the VM. This is all done in the name of compatibility, ensuring that a VM can migrate between two hypervisors of differing CPU brand or architecture. The cost of doing this is twofold:

  1. Performance impacts - if a program you are running can take advantage of AVX2, and your virtual CPU does not include the AVX2 instruction set, software running on that VM will fall back to a non-AVX2 code path, even if the hypervisor CPU supports AVX2.
  2. Security impacts - in addition to instructions, CPUs have security errata applied to them (these usually involve speculative execution mitigations, like Spectre v1, Spectre v2, etc.). Some of these errata are architecture or brand specific. If you choose a generic QEMU CPU type, such as one based on the x86-64 feature levels, security errata may not be applied, potentially leaving your VM open to a speculative execution attack.

Instruction Set and Security Errata Comparison

Now that we have a baseline understanding of x86-64 levels and the way QEMU masks host CPU instructions, I want to compare the instructions and security errata that QEMU's generic x86-64-v3 CPU type supports against the actual host CPU. All of these are taken from the same VM on the same hypervisor, with only the QEMU CPU type changed.

Host CPU Type

The bare metal hypervisor is running an AMD Ryzen 9 5900X. Below are the instruction sets and security errata that are applied when using the 'host' CPU option in QEMU.

  • CPU Model Name: AMD Ryzen 9 5900X 12-Core Processor
  • Instruction Sets / Feature Flags:
    • fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw perfctr_core ssbd ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr wbnoinvd arat npt lbrv nrip_save tsc_scale vmcb_clean flushbyasid pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid fsrm arch_capabilities
  • Errata:
    • sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso

x86-64-v3 CPU Type

This is the most generic of the QEMU CPU types that I would run with this specific host processor.

  • CPU Model Name: QEMU Virtual CPU version 2.5+
  • Instruction Sets / Feature Flags:
    • fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm rep_good nopl cpuid extd_apicid tsc_known_freq pni ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c hypervisor lahf_lm cmp_legacy abm 3dnowprefetch vmmcall bmi1 avx2 bmi2
  • Errata:
    • fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400 spectre_v1 spectre_v2

Instruction Set and Feature Flag Differences

Regarding the instruction set/feature support, that's quite a huge difference. All the heavy hitters are there (avx, avx2, sse*, popcnt), but the generic v3 feature set is definitely a subset of the host CPU's feature flags. This is expected, because this feature set is the common denominator across all modern AMD and Intel CPUs at the v3 baseline (basically the only baseline guaranteed to work on consumer CPUs in 2024). We will take a look at how this impacts performance below.
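As a quick sanity check on that subset claim, the gap can be computed with a set difference over the two flag strings. The lists here are excerpts of the full listings above, for brevity; the full strings behave the same way:

```python
# Feature flags as reported in /proc/cpuinfo, excerpted from the host
# and x86-64-v3 listings in this post.
host = set(
    "fpu vme de pse tsc sse sse2 pclmulqdq ssse3 fma sse4_1 sse4_2 "
    "popcnt aes avx avx2 bmi1 bmi2 sha_ni vaes vpclmulqdq rdseed adx".split()
)
v3_vcpu = set(
    "fpu de pse tsc sse sse2 ssse3 fma sse4_1 sse4_2 "
    "popcnt aes avx avx2 bmi1 bmi2".split()
)

masked = host - v3_vcpu   # what the generic vCPU hides from the guest
added = v3_vcpu - host    # anything the vCPU claims beyond the host

print(sorted(masked))  # includes sha_ni, vaes, vpclmulqdq, pclmulqdq
print(sorted(added))   # [] - the v3 vCPU claims nothing the host lacks
```

Notice that the masked set includes the crypto helpers (sha_ni, vaes, vpclmulqdq): software in the guest will fall back to plain-C implementations of SHA and AES-GCM even though the host silicon can accelerate them.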

(Security) Errata Differences

The more interesting note is the errata string, because this contains fixes for speculative execution bugs in the CPU. This field is populated by the Linux kernel when it detects that the CPU is potentially affected by a given bug. You will notice that neither set of errata is a subset of the other. Reading QEMU documentation here tells me that if a security erratum is not present in the VM, your VM is not actually protected from that bug. Some of these errata do not apply to the host CPU, as it is newer and unaffected by errata present on the generic QEMU CPU.

In this example, that leaves our VM vulnerable to the spec_store_bypass and srso bugs (CVE-2018-3639 and CVE-2023-20569, respectively): the host CPU is affected by them, but the QEMU vCPU is not advertising them, so the guest applies no mitigations. That is actually pretty alarming, and something to be very aware of when choosing a QEMU CPU type for live migration purposes between different architectures.
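This exposure is easy to check for yourself: on Linux, the errata above come from the "bugs" line of /proc/cpuinfo (per-bug mitigation status also lives under /sys/devices/system/cpu/vulnerabilities/). A minimal sketch, using the two errata lists quoted in this post:

```python
def unmitigated(host_bugs, vcpu_bugs):
    """Bugs the host CPU is affected by that the guest kernel will never
    learn about (and so will not mitigate), because the vCPU model does
    not advertise them."""
    return host_bugs - vcpu_bugs

# "bugs"/errata lines from the host CPU and the x86-64-v3 vCPU above
host = set("sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso".split())
v3 = set(
    "fxsave_leak sysret_ss_attrs null_seg swapgs_fence amd_e400 "
    "spectre_v1 spectre_v2".split()
)

print(sorted(unmitigated(host, v3)))  # ['spec_store_bypass', 'srso']
```

On a real system you would feed unmitigated() the host's bugs line and the guest's, rather than these hard-coded excerpts.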

Using a Vendor Specific CPU Type

We may be able to get around the security errata differences with a more modern CPU type. Since this is a modern AMD Ryzen system, QEMU recommends here using the EPYC-IBPB CPU type. IBPB stands for Indirect Branch Predictor Barrier, a mitigation technique for the multitude of Spectre-style speculative execution attacks, so it is likely this will bring some errata into the picture. Let's take a look at the differences:

  • CPU Model Name: AMD EPYC Processor (with IBPB)
  • Instruction Sets / Feature Flags:
    • fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 arat
  • Errata:
    • fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 spec_store_bypass retbleed smt_rsb srso div0

In this case, we have some more instructions present over the generic QEMU CPU. This is great, as some of these are AMD specific instructions too. What is strange is that one feature flag, topoext, is present on the EPYC CPU type but not on the host.

The errata list is more important: these errata are a superset of those present on the host's Ryzen 5900X. This is great - it means this VM is not exposing us to speculative execution attacks like the generic QEMU CPU type does.
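The superset relationship is easy to verify with a quick set comparison over the two errata lists above:

```python
# "bugs"/errata lines from the host CPU and the EPYC-IBPB vCPU above
host = set("sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso".split())
epyc_ibpb = set(
    "fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 "
    "spec_store_bypass retbleed smt_rsb srso div0".split()
)

# Every bug the host is actually affected by is also advertised to the
# guest, so the guest kernel knows to apply its mitigations.
print(host - epyc_ibpb)   # set() - nothing left unmitigated
print(host <= epyc_ibpb)  # True - host errata are a subset of the vCPU's
```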

Performance Impacts

I would like to tie things off by running some CPU benchmarks with Geekbench 6. I attempted to run the AVX2-enabled tests; however, the generic x86-64-v3 CPU type segmentation faulted, so I also ran the standard x86_64 test. Let's see if these QEMU CPU types have any performance impact outside the margin of error against the host CPU. I am using a 2-core VM for these benchmarks. Ultimately, we are just trying to see if we pick up large deviations in performance.

Host CPU Benchmark Results

With the host CPU type, we see the following performance:

QEMU x86-64-v3 CPU Benchmark Results

  • Single Core Score: 2145
  • Multi Core Score: 3819
  • Test Reference: https://browser.geekbench.com/v6/cpu/5307093
  • AVX2 Single Core Score: Segfault
  • AVX2 Multi Core Score: Segfault
  • AVX2 Test Reference: N/A
    This benchmark failed to finish. It encountered a segmentation fault.

EPYC-IBPB CPU Benchmark Results

Performance Summary

To me, the benchmarks between the EPYC and host CPU types are essentially the same: they are within a 2% margin of error, with or without AVX2.

The AVX2 test on the x86-64-v3 CPU type would not complete; the Geekbench 6 Ray Tracer test caused the benchmark to segmentation fault. This tells me that this CPU type may not have everything needed to run the Geekbench 6 AVX2 benchmark. This might be a harbinger of errors you could experience when using this CPU type with software that expects AVX2 - or maybe Geekbench just compiled a binary that assumes instructions beyond the v3 spec.

Takeaways

To close this all off, here are the things you should think about. Hopefully this answers some questions (or maybe you learned something new) if you ended up on this page:

  • If you do not care about live migration between hosts, just use the host CPU type. It's dead simple, and you don't have to think twice about the security tradeoffs.
  • Certain CPU types may open your VM up to speculative execution attacks that your hypervisor host is not vulnerable to.
  • If you care about live migration, be mindful of your CPU type in QEMU. One should really avoid using the default x86-64 CPU types if at all possible.
    • If you are migrating between AMD/Intel systems, the generic CPU type is your only option. Just know that you may be opening your VM up to certain speculative execution attacks.
    • If you are standardized around similar microarchitectures, choose the lowest common denominator. For example, if you have Haswell and Skylake CPUs in your live migration cluster, choose the Haswell CPU. If you have EPYC Genoa and Rome CPUs, choose the Rome CPU type.
    • Be mindful of choosing a QEMU CPU type that has instructions your host system may not support. In theory, libvirt should catch this and error out when the VM starts.
  • CPU types can always be changed later on. Linux really doesn't care, and I am sure any modern distribution of Windows doesn't either - these parameters are generally detected during boot. The one thing to watch for is removing instructions: if you have software in that VM that depends on them (like AVX2), you may run into crashes or a major performance degradation.
  • There is no discernible performance impact between the host (Ryzen 5900X) and EPYC CPU types.

Conclusion

I will be running my VMs with the host CPU type. If I ever end up with a second Proxmox system that I want to live-migrate to, I'll likely choose the EPYC-IBPB CPU type. I am going to avoid mixing CPU vendors in my cluster - staying with either Intel or AMD - as I want to avoid running the QEMU generic CPU types. I'm sure people do this (and I've read that a lot do); I'm just worried about any instability it may introduce that doesn't immediately show itself. I would definitely not do this in a production environment where uptime is required.