VirtualBox VM lagging when adding more CPUs


Initial brain dump

I noticed something a little counter-intuitive…

My home computer has the following specs:

  • CPU: Ryzen 9 7900X: 12 cores / 24 threads (2 CCDs) / NUMA-like architecture

  • So there are basically 6 cores per silicon die.

The computer is running Windows 11.

I created a Kubuntu 24.04 VM in VBox, initially with 4 CPUs assigned.

As I have 24 threads/vCPUs available overall, I bumped the CPU count to 8 for the VM.

All of a sudden, when logged into the VM, the UI felt sluggish/laggy, as if there was not enough RAM…

Turns out that because the cores are spread across 2 dies/chips/sockets-ish, there is some cost when a program runs threads across the 2 dies. So the OS needs to know not to do that…

Turns out VBox is not NUMA aware, so when assigning 8 vCPUs, they might spill over to the other die, which again is very costly when a single program has to make threads talk to each other across the dies.

VMware or KVM/QEMU (on Linux) are NUMA aware, so they know the inherent architecture of the CPU and know to prefer scheduling the vCPUs on a single die to prevent the overhead of cross-die communication.

Also, there are only 12 real cores, and hyperthreading has a cost: if one of the vCPUs is actually a hyperthread, it's going to be a little slower (and if a hyperthread vCPU is on another CPU die, it's even worse).

So it's best to assign at most half of the real cores one has.

So for my processor, I only have 6 real cores per die, and I shouldn't do anything that has some chance of spilling over to the other CPU die.

I still have 12 threads per CCD, however. Again, VBox is not CCD/NUMA aware, and to VBox all vCPUs are equal, which is not true at all. So even though VBox shows 24 vCPUs:

Because 12 of those vCPUs are on another CCD, we shouldn't go above 12.

Because 6 of those 12 vCPUs per CCD are hyperthreads, we really shouldn't go above 6; otherwise we are certainly using vCPUs that are hyperthreads or on another die, which can also cause latency.

But because we only really have 6 real cores usable per CCD, using 6 vCPUs will likely result in some of those being hyperthreads, which can also cause latency / co-scheduling issues.

I thought I should just go with 3 vCPUs, but nope: Linux guests tend to work best when the number of CPUs is a power of 2. So the best is 4 vCPUs, which leaves some room to keep all those vCPUs on the same CCD and avoids any hyperthread-vCPU related issues…
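The arithmetic above can be written down as a tiny helper. This is only a sketch of my own rule of thumb (stay strictly below the physical cores of one CCD, then round down to a power of two), not a general sizing formula; the function names are made up for illustration.

```python
def largest_power_of_two_below(limit: int) -> int:
    """Return the largest power of two that is strictly less than `limit`."""
    if limit < 2:
        raise ValueError("limit must be at least 2")
    power = 1
    while power * 2 < limit:
        power *= 2
    return power


def suggested_vcpus(cores_per_ccd: int) -> int:
    """Rule of thumb from this post: stay under one CCD's physical cores,
    and prefer a power-of-two count for the Linux guest."""
    return largest_power_of_two_below(cores_per_ccd)


# Ryzen 9 7900X: 6 physical cores per CCD
print(suggested_vcpus(6))
```

For a part with 8 cores per CCD the same rule would also land on 4, since it deliberately keeps at least one physical core free.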

However, all this BS is just because VBox is not NUMA/CCD aware. If I were on Linux using QEMU/KVM or VMware, I think I wouldn't have these issues.

ChatGPT rewrite

Title: A Counter-Intuitive VirtualBox Performance Lesson on Ryzen

Why My Linux VM Got Slower When I Gave It More CPUs

I recently ran into a counter-intuitive performance issue while running a Linux VM.

My system uses a Ryzen 9 7900X:

  • 12 physical cores / 24 threads

  • 2 CCDs (chiplets)

  • NUMA-like architecture (6 cores per CCD)
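To make the layout concrete, here is a small sketch that models the 24 logical CPUs. It assumes a numbering where the two SMT threads of a core are adjacent (logical CPUs 0 and 1 on core 0, and so on) and where cores 0-5 sit on the first CCD. That numbering is an assumption for illustration; the real enumeration depends on the OS and firmware.

```python
from dataclasses import dataclass

PHYSICAL_CORES = 12
CORES_PER_CCD = 6
THREADS_PER_CORE = 2


@dataclass(frozen=True)
class LogicalCpu:
    index: int       # logical CPU number (assumed enumeration)
    ccd: int         # which chiplet the core lives on
    core: int        # physical core number
    smt_thread: int  # 0 = first thread of the core, 1 = its SMT sibling


def build_topology() -> list[LogicalCpu]:
    """Build the assumed logical-CPU layout for a 2-CCD, 12-core part."""
    cpus = []
    for index in range(PHYSICAL_CORES * THREADS_PER_CORE):
        core = index // THREADS_PER_CORE
        cpus.append(
            LogicalCpu(
                index=index,
                ccd=core // CORES_PER_CCD,
                core=core,
                smt_thread=index % THREADS_PER_CORE,
            )
        )
    return cpus


topology = build_topology()
for ccd in (0, 1):
    members = [cpu.index for cpu in topology if cpu.ccd == ccd]
    print(f"CCD{ccd}: logical CPUs {members[0]}-{members[-1]}")
```

Under this model, only half of the logical CPUs on each CCD are "first" threads of a physical core; the other half are SMT siblings that share a core with them.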

The host OS is Windows 11.


The Setup

I created a Kubuntu 24.04 VM in VirtualBox, initially assigning 4 vCPUs.
The VM felt smooth and responsive.

Since the CPU exposes 24 logical threads, I increased the VM to 8 vCPUs, expecting better performance.

Instead, the desktop became noticeably laggy: window movement stuttered and the UI felt unresponsive, despite normal RAM and CPU usage.


The Cause

The Ryzen 9 7900X is not a single monolithic chip. Each CCD has its own L3 cache, and communication between CCDs has a real latency cost.

The problem is that VirtualBox is not NUMA or CCD aware. With 8 vCPUs assigned, the vCPU threads may end up running across both CCDs, even for a single workload like a desktop environment. This forces threads to communicate across CCDs, hurting latency-sensitive tasks such as GUI rendering.

Hypervisors like KVM/QEMU or VMware offer more control over CPU topology (KVM/QEMU, for example, lets you pin vCPUs to specific host CPUs), which makes it easier to keep vCPUs within the same CCD and avoid this issue.
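On a Linux host, one way to see the CCD boundary is to group logical CPUs by the L3 cache they share. The sketch below reads the `shared_cpu_list` files under sysfs and assumes that cache `index3` is the L3 cache, which is usual on x86 but not guaranteed. The sample string in the usage line is hypothetical, not measured on my machine.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as '0-5,12-17' into a list of ints."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def l3_groups(sysfs_root: str = "/sys/devices/system/cpu") -> list[tuple[int, ...]]:
    """Return the distinct sets of logical CPUs that share an L3 cache.

    Assumes cache/index3 is the L3 cache (typical on x86 Linux).
    """
    groups = set()
    pattern = "cpu[0-9]*/cache/index3/shared_cpu_list"
    for path in Path(sysfs_root).glob(pattern):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


# Hypothetical sample value, just to show the parsing:
print(parse_cpu_list("0-5,12-17"))
```

On a two-CCD CPU, `l3_groups()` would be expected to return two groups, one per CCD. Those groups are the sets of host CPUs you would want a VM's vCPUs to stay inside.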


SMT Makes It Worse

Out of the 24 logical CPUs:

  • Only 12 are real physical cores

  • The rest are SMT (hyper-threaded) siblings

VirtualBox treats all vCPUs as equal, even though an SMT sibling shares its core's execution resources and is slower and more prone to contention when both threads are busy, especially when combined with cross-CCD scheduling.
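A rough way to reason about a given vCPU count is to ask what the best possible placement would look like. The sketch below is a simplified model, not a description of how VirtualBox or the host scheduler actually places threads; the function name and the wording of the categories are made up for illustration.

```python
def best_case_placement(
    vcpus: int,
    cores_per_ccd: int = 6,
    threads_per_core: int = 2,
    ccds: int = 2,
) -> str:
    """Describe the best placement a vCPU count could get in this model."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    if vcpus <= cores_per_ccd:
        # Every vCPU could get its own physical core on a single CCD.
        return "fits one CCD on separate physical cores"
    if vcpus <= cores_per_ccd * threads_per_core:
        # Staying on one CCD is only possible by sharing cores via SMT.
        return "needs SMT siblings or a second CCD"
    if vcpus <= cores_per_ccd * threads_per_core * ccds:
        return "must span both CCDs"
    return "more vCPUs than logical CPUs"


for count in (4, 6, 8, 16):
    print(count, "->", best_case_placement(count))
```

With the defaults for the 7900X, 4 vCPUs land in the first category, while 8 vCPUs cannot avoid either SMT sharing or cross-CCD traffic even in the best case.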


The Practical Limit

To avoid cross-CCD traffic, the vCPU count should stay within the 6 physical cores of one CCD.
However, assigning all 6 leaves no headroom and risks SMT contention and scheduling latency.

Although 3 vCPUs seemed like the safest option, Linux guests tend to behave better with power-of-two CPU counts.


The Sweet Spot

4 vCPUs turned out to be ideal:

  • Fits comfortably within one CCD

  • Leaves scheduling headroom

  • Minimizes SMT contention

  • Makes cross-CCD latency much less likely

  • Results in a smooth, responsive desktop

Switching back to 4 vCPUs immediately fixed the lag.
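The vCPU count can be changed in the VirtualBox GUI or from the command line with `VBoxManage modifyvm ... --cpus`. Below is a minimal sketch that builds and runs that command. The VM name is a placeholder, `VBoxManage` is assumed to be on the PATH, and as far as I know the VM has to be powered off for the change to apply.

```python
import subprocess


def modifyvm_cpus_command(vm_name: str, cpus: int) -> list[str]:
    """Build the VBoxManage command that sets a VM's vCPU count."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return ["VBoxManage", "modifyvm", vm_name, "--cpus", str(cpus)]


def apply_cpu_count(vm_name: str, cpus: int) -> None:
    """Run the command; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(modifyvm_cpus_command(vm_name, cpus), check=True)


# "Kubuntu 24.04" is a placeholder VM name for illustration.
print(" ".join(modifyvm_cpus_command("Kubuntu 24.04", 4)))
# apply_cpu_count("Kubuntu 24.04", 4)  # uncomment to actually apply
```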


Takeaway

This issue exists because VirtualBox is not NUMA/CCD aware.

On the same hardware, KVM/QEMU or VMware would likely handle 8 vCPUs without issue.

The lesson is simple:

More vCPUs do not always mean better performance, especially on modern multi-CCD CPUs when using VirtualBox.

Sometimes, fewer CPUs really are faster.

Resources: