Finding a Wasm Runtime Unikernel for libkrun

· Virtualization, microVMs and Three-Headed Monkeys


There's this interesting idea of adding support for running Wasm/WASI payloads in libkrun, which we could easily achieve by embedding a Wasm runtime, statically built for Linux, into the initrd.

Now, the problem with this approach is that, despite having a payload (the Wasm runtime) with well-known behavior, we would still be using a complete Linux kernel (built with a minimal config, but complete nonetheless) while needing only a small part of its functionality. In other words, the workload's TCB (Trusted Computing Base) would be larger than necessary.

But, what if the Wasm runtime was also the kernel?


Wait, do you really need Virtualization for running a Wasm workload?

Yes and no. In most cases, no. The isolation provided by the Wasm runtime, combined with container isolation (namespaces, cgroups, SELinux...) for the runtime itself, provides an excellent degree of security.

But there's a scenario where Virtualization is not optional, and that's when you want to protect the workload with Confidential Computing technologies such as SEV or TDX, as both of them are built on top of the existing Virtualization capabilities provided by the hardware.


Choosing a Unikernel + Wasm Runtime combo

The first idea that came to my mind was to use RustyHermit, which is supported as a target by the Rust toolchain, to build a Rust-based runtime, such as Wasmer or Wasmtime.

After giving it a quick try, I noticed that both runtimes have a number of dependencies with platform-dependent code that would need to be ported to RustyHermit. Since I didn't have much time to invest in this experiment, I decided to look for a simpler solution.

And the simplest one would've probably been using OSv. OSv is able to run unmodified, dynamically linked Linux binaries, by playing this cool trick in which their linker resolves the symbols of some well-known libraries to kernel-provided functions, so it still behaves like a unikernel.

While this one was tempting, the main goal of this experiment was to find out how small the TCB can get for this use case, so OSv's approach wasn't really a good fit, since you still have a potentially larger kernel than what you really need.

Finally, I came across Unikraft, which really does behave like a library OS, exactly what I was looking for in the context of this experiment. It's also well documented, and the project looks pretty alive on GitHub.

Now we just need a Wasm runtime written in C to pair it up with Unikraft.


The low-hanging fruit

There's certainly quite a number of Wasm runtimes out there, but the one that first caught my attention was Wasm3, as it's small, simple, and written in C. The build process is so simple that they even provide a one-liner to build it manually with gcc. Identifying which source files and headers need to get built wasn't going to be a problem.

No sooner said than done. In very little time, I got it running a simple Wasm hello world program:

[    0.000000] Info: [libkvmplat] <setup.c @  472> Entering from KVM (x86)...
[    0.000000] Info: [libkvmplat] <setup.c @  473>      multiboot: 0
[    0.000000] Info: [libkvmplat] <setup.c @  407> HEAP area @ 400000000 - 63b56a000 (9585467392 bytes)
[    0.000000] Info: [libkvmplat] <setup.c @  499>         initrd: 0x1b1000
[    0.000000] Info: [libkvmplat] <setup.c @  501>     heap start: 0x400000000
[    0.000000] Info: [libkvmplat] <setup.c @  505>      stack top: 0x63b56a000
[    0.000000] Info: [libkvmplat] <setup.c @  532> Switch from bootstrap stack to stack @0x63b57a000
[    0.000000] Info: [libukboot] <boot.c @  199> Unikraft constructor table at 0x167000 - 0x167028
[    0.000000] Info: [libuklibparam] <param.c @  113> libname: netdev, 72
[    0.000000] Info: [libuklibparam] <param.c @  113> libname: vfs, 96
[    0.000000] Info: [libuklibparam] <param.c @  594> No library arguments found
[    0.000000] Info: [libukboot] <boot.c @  213> Found 0 library args
[    0.000000] Info: [libukboot] <boot.c @  221> Initialize memory allocator...
[    0.000000] Info: [libukallocregion] <region.c @  202> Initialize allocregion allocator @ 0x400000000,2
[    0.000000] Info: [libukboot] <boot.c @  264> Initialize IRQ subsystem...
[    0.000000] Info: [libukboot] <boot.c @  271> Initialize platform time...
[    0.000000] Info: [libkvmplat] <tscclock.c @  253> Calibrating TSC clock against i8254 timer
[    0.100001] Info: [libkvmplat] <tscclock.c @  274> Clock source: TSC, frequency estimate is 2592089840z
[    0.104876] Info: [libukschedcoop] <schedcoop.c @  232> Initializing cooperative scheduler
[    0.111274] Warn: [libpthread-embedded] <pte_osal.c @  215> Thread 0x4000000d0 created without libpthr.
[    0.123340] Info: [libuksched] <thread.c @  180> Thread "Idle": pointer: 0x4000000d0, stack: 0x40001000
[    0.131926] Warn: [libpthread-embedded] <pte_osal.c @  215> Thread 0x4000203d8 created without libpthr.
[    0.143563] Info: [libuksched] <thread.c @  180> Thread "main": pointer: 0x4000203d8, stack: 0x40003000
[    0.150728] Info: [libukboot] <boot.c @   95> Init Table @ 0x167028 - 0x167058
[    0.157502] Info: [libukswrand] <swrand.c @   86> Initialize random number generator...
[    0.164264] Info: [libukbus] <bus.c @  134> Initialize bus handlers...
[    0.168237] Info: [libukbus] <bus.c @  136> Probe buses...
[    0.171996] Info: [liblwip] <init.c @  152> Initializing lwip
[    0.175937] Info: [libuksched] <thread.c @  180> Thread "lwip": pointer: 0x400040fc0, stack: 0x40005000
[    0.193270] Info: [libvfscore] <rootfs.c @   98> Mount ramfs to /...
[    0.201977] Info: [libvfscore] <mount.c @  122> VFS: mounting ramfs at /
[    0.208725] Info: [libvfscore] <rootfs.c @  106> Extracting initrd @ 0x1b1000 (136704 bytes) to /...
[    0.217546] Info: [libukcpio] <cpio.c @  233> Extracting /main.aot (136428 bytes)
Powered by
o.   .o       _ _               __ _
Oo   Oo  ___ (_) | __ __  __ _ ' _) :_
oO   oO ' _ `| | |/ /  _)' _` | |_|  _)
oOo oOO| | | | |   (| | | (_) |  _) :_
 OoOoO ._, ._:_:_,\_._,  .__,_:_, \___)
           Phoebe 0.10.0~9bf6e63-custom
[    0.245898] Info: [libukboot] <boot.c @  125> Pre-init table at 0x1763f0 - 0x1763f0
[    0.251770] Info: [libukboot] <boot.c @  136> Constructor table at 0x1763f0 - 0x1763f0
[    0.257999] Info: [libukboot] <boot.c @  146> Calling main(2, ['build/wasm3_kvm-x86_64', 'main.wasm'])
[    0.264377] Warn: [libukmmap] <mmap.c @  196> __uk_syscall_r_mprotect() stubbed
[    0.270256] Warn: [libukmmap] <mmap.c @  190> __uk_syscall_r_madvise() stubbed
Hello, Unikraft + Wasm3!
[    0.278786] Info: [libukboot] <boot.c @  155> main returned 0, halting system
[    0.294673] Info: [libkvmplat] <shutdown.c @   35> Unikraft halted

Gotta go fast!

Wasm3 is a neat piece of software and a good starting point, but it's just an interpreter. It doesn't have a JIT, nor the ability to run AOT code. So I started looking for another option.

Soon I came across WAMR, which is also small, written in C, and pretty portable. Its build process is a bit more complex, but by looking at a regular build log generated on Linux I was able to figure out which source files, headers and flags I needed for building it with Unikraft.

Another interesting aspect of WAMR is that it provides an AOT compiler, based on LLVM, to compile the Wasm payload into native code. And you can also tune the runtime build process to include just the code to load and run AOT code, leaving out the interpreter and the JIT, leading to a pretty small TCB.

After a bit of tweaking, I was able to build the Unikraft + WAMR bundle, and run it with the Wasm hello world program:

[    0.000000] Info: [libkvmplat] <setup.c @  472> Entering from KVM (x86)...
[    0.000000] Info: [libkvmplat] <setup.c @  473>      multiboot: 0
[    0.000000] Info: [libkvmplat] <setup.c @  407> HEAP area @ 400000000 - 43f5dc000 (1063108608 bytes)
[    0.000000] Info: [libkvmplat] <setup.c @  499>         initrd: 0x1b1000
[    0.000000] Info: [libkvmplat] <setup.c @  501>     heap start: 0x400000000
[    0.000000] Info: [libkvmplat] <setup.c @  505>      stack top: 0x43f5dc000
[    0.000000] Info: [libkvmplat] <setup.c @  532> Switch from bootstrap stack to stack @0x43f5ec000
[    0.000000] Info: [libukboot] <boot.c @  199> Unikraft constructor table at 0x167000 - 0x167028
[    0.000000] Info: [libuklibparam] <param.c @  113> libname: netdev, 72
[    0.000000] Info: [libuklibparam] <param.c @  113> libname: vfs, 96
[    0.000000] Info: [libuklibparam] <param.c @  594> No library arguments found
[    0.000000] Info: [libukboot] <boot.c @  213> Found 0 library args
[    0.000000] Info: [libukboot] <boot.c @  221> Initialize memory allocator...
[    0.000000] Info: [libukallocregion] <region.c @  202> Initialize allocregion allocator @ 0x400000000,8
[    0.000000] Info: [libukboot] <boot.c @  264> Initialize IRQ subsystem...
[    0.000000] Info: [libukboot] <boot.c @  271> Initialize platform time...
[    0.000000] Info: [libkvmplat] <tscclock.c @  253> Calibrating TSC clock against i8254 timer
[    0.100001] Info: [libkvmplat] <tscclock.c @  274> Clock source: TSC, frequency estimate is 2592141140z
[    0.106867] Info: [libukschedcoop] <schedcoop.c @  232> Initializing cooperative scheduler
[    0.114889] Warn: [libpthread-embedded] <pte_osal.c @  215> Thread 0x4000000d0 created without libpthr.
[    0.125951] Info: [libuksched] <thread.c @  180> Thread "Idle": pointer: 0x4000000d0, stack: 0x40001000
[    0.132773] Warn: [libpthread-embedded] <pte_osal.c @  215> Thread 0x4000203d8 created without libpthr.
[    0.142529] Info: [libuksched] <thread.c @  180> Thread "main": pointer: 0x4000203d8, stack: 0x40003000
[    0.148912] Info: [libukboot] <boot.c @   95> Init Table @ 0x167028 - 0x167058
[    0.154532] Info: [libukswrand] <swrand.c @   86> Initialize random number generator...
[    0.160847] Info: [libukbus] <bus.c @  134> Initialize bus handlers...
[    0.164679] Info: [libukbus] <bus.c @  136> Probe buses...
[    0.168079] Info: [liblwip] <init.c @  152> Initializing lwip
[    0.171448] Info: [libuksched] <thread.c @  180> Thread "lwip": pointer: 0x400040fc0, stack: 0x40005000
[    0.178030] Info: [libvfscore] <rootfs.c @   98> Mount ramfs to /...
[    0.181563] Info: [libvfscore] <mount.c @  122> VFS: mounting ramfs at /
[    0.192104] Info: [libvfscore] <rootfs.c @  106> Extracting initrd @ 0x1b1000 (136704 bytes) to /...
[    0.204603] Info: [libukcpio] <cpio.c @  233> Extracting /main.aot (136428 bytes)
Powered by
o.   .o       _ _               __ _
Oo   Oo  ___ (_) | __ __  __ _ ' _) :_
oO   oO ' _ `| | |/ /  _)' _` | |_|  _)
oOo oOO| | | | |   (| | | (_) |  _) :_
 OoOoO ._, ._:_:_,\_._,  .__,_:_, \___)
           Phoebe 0.10.0~9bf6e63-custom
[    0.233750] Info: [libukboot] <boot.c @  125> Pre-init table at 0x1763f0 - 0x1763f0
[    0.239519] Info: [libukboot] <boot.c @  136> Constructor table at 0x1763f0 - 0x1763f0
[    0.245658] Info: [libukboot] <boot.c @  146> Calling main(2, ['build/wamr_kvm-x86_64', 'main.aot'])
[    0.252124] Warn: [libukmmap] <mmap.c @  196> __uk_syscall_r_mprotect() stubbed
AOT module instantiate failed: mmap memory failed
[    0.261049] Info: [libukboot] <boot.c @  155> main returned 0, halting system
[    0.266754] Info: [libkvmplat] <shutdown.c @   35> Unikraft halted

... but it failed with "AOT module instantiate failed: mmap memory failed". What's happening here?

Tracking down the error message in WAMR's source code, we get to this section of aot_runtime.c:

    /* Totally 8G is mapped, the opcode load/store address range is 0 to 8G:
     *   ea = i + memarg.offset
     * both i and memarg.offset are u32 in range 0 to 4G
     * so the range of ea is 0 to 8G
     */
    if (!(p = mapped_mem =
              os_mmap(NULL, map_size, MMAP_PROT_NONE, MMAP_MAP_NONE))) {
        set_error_buf(error_buf, error_buf_size, "mmap memory failed");
        return NULL;
    }
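
The 8G figure in that comment can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function names below are made up for this example and are not part of WAMR.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a Wasm load/store: ea = i + memarg.offset.
 * Both operands are u32; the sum is done in 64 bits so it cannot wrap. */
static uint64_t effective_address(uint32_t i, uint32_t offset)
{
    return (uint64_t)i + (uint64_t)offset;
}

/* Bytes needed so that every possible ea falls inside the mapping:
 * two full 4 GiB ranges, that is, 8 GiB. */
static uint64_t required_map_size(void)
{
    return 2 * ((uint64_t)UINT32_MAX + 1);
}

/* Sanity check: the largest possible ea still lies inside the window. */
static void check_window(void)
{
    assert(effective_address(UINT32_MAX, UINT32_MAX) < required_map_size());
}
```

With a window of that size mapped, any address a 32-bit load or store can form lands inside the reserved range, which is presumably why the runtime asks for all of it up front.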

So WAMR needs to mmap() an 8GB chunk of anonymous memory, but Unikraft does not yet support on-demand paging (though it looks like, after merging PR#338, they're pretty close to having it). So it seems like we've hit a wall, haven't we?

Well, while Unikraft does not have on-demand paging, our host system does! This means we can simply create a VM with more than 8GB of RAM and comment out the memset in ukmmap/mmap.c, so the guest doesn't touch every page of that region in advance. (NOTE: this hack wouldn't work in a SEV/TDX TEE, since the VM's memory is pre-allocated and pinned in those cases. I'm just using it to be able to continue with the experiment.)
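
To see why skipping the memset matters, here is a small host-side sketch (plain Linux userspace, not Unikraft code) comparing an anonymous mapping before and after every page is written. The helper names are hypothetical, and it assumes a Linux host where mincore(2) is available.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `size` bytes of anonymous memory. The host only assigns physical
 * pages to such a mapping when they are first touched. */
static void *map_anonymous(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Count how many pages of [addr, addr + size) the kernel reports as
 * resident, using mincore(2). Returns -1 on error. */
static long resident_pages(void *addr, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (size + page - 1) / page;
    unsigned char *vec = malloc(npages);
    long count = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, size, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;
    free(vec);
    return count;
}

/* What an eager memset does: write every byte, forcing every page in. */
static void touch_all(void *addr, size_t size)
{
    assert(addr != NULL);
    memset(addr, 0, size);
}
```

Calling resident_pages() right after map_anonymous(), and again after touch_all(), shows the difference: on a typical Linux host the first count is zero or close to it, while the second covers the whole region. The same effect, scaled up to 8GB inside the guest, is what the memset would have triggered.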

And, after doing so, it works:

[    0.000000] Info: [libkvmplat] <setup.c @  472> Entering from KVM (x86)...
[    0.000000] Info: [libkvmplat] <setup.c @  473>      multiboot: 0
[    0.000000] Info: [libkvmplat] <setup.c @  407> HEAP area @ 400000000 - 63b56a000 (9585467392 bytes)
[    0.000000] Info: [libkvmplat] <setup.c @  499>         initrd: 0x1b1000
[    0.000000] Info: [libkvmplat] <setup.c @  501>     heap start: 0x400000000
[    0.000000] Info: [libkvmplat] <setup.c @  505>      stack top: 0x63b56a000
[    0.000000] Info: [libkvmplat] <setup.c @  532> Switch from bootstrap stack to stack @0x63b57a000
[    0.000000] Info: [libukboot] <boot.c @  199> Unikraft constructor table at 0x167000 - 0x167028
[    0.000000] Info: [libuklibparam] <param.c @  113> libname: netdev, 72
[    0.000000] Info: [libuklibparam] <param.c @  113> libname: vfs, 96
[    0.000000] Info: [libuklibparam] <param.c @  594> No library arguments found
[    0.000000] Info: [libukboot] <boot.c @  213> Found 0 library args
[    0.000000] Info: [libukboot] <boot.c @  221> Initialize memory allocator...
[    0.000000] Info: [libukallocregion] <region.c @  202> Initialize allocregion allocator @ 0x400000000,2
[    0.000000] Info: [libukboot] <boot.c @  264> Initialize IRQ subsystem...
[    0.000000] Info: [libukboot] <boot.c @  271> Initialize platform time...
[    0.000000] Info: [libkvmplat] <tscclock.c @  253> Calibrating TSC clock against i8254 timer
[    0.100001] Info: [libkvmplat] <tscclock.c @  274> Clock source: TSC, frequency estimate is 2592107600z
[    0.105883] Info: [libukschedcoop] <schedcoop.c @  232> Initializing cooperative scheduler
[    0.113099] Warn: [libpthread-embedded] <pte_osal.c @  215> Thread 0x4000000d0 created without libpthr.
[    0.123215] Info: [libuksched] <thread.c @  180> Thread "Idle": pointer: 0x4000000d0, stack: 0x40001000
[    0.130177] Warn: [libpthread-embedded] <pte_osal.c @  215> Thread 0x4000203d8 created without libpthr.
[    0.140264] Info: [libuksched] <thread.c @  180> Thread "main": pointer: 0x4000203d8, stack: 0x40003000
[    0.146943] Info: [libukboot] <boot.c @   95> Init Table @ 0x167028 - 0x167058
[    0.152705] Info: [libukswrand] <swrand.c @   86> Initialize random number generator...
[    0.158711] Info: [libukbus] <bus.c @  134> Initialize bus handlers...
[    0.162330] Info: [libukbus] <bus.c @  136> Probe buses...
[    0.165668] Info: [liblwip] <init.c @  152> Initializing lwip
[    0.169172] Info: [libuksched] <thread.c @  180> Thread "lwip": pointer: 0x400040fc0, stack: 0x40005000
[    0.180469] Info: [libvfscore] <rootfs.c @   98> Mount ramfs to /...
[    0.189926] Info: [libvfscore] <mount.c @  122> VFS: mounting ramfs at /
[    0.196627] Info: [libvfscore] <rootfs.c @  106> Extracting initrd @ 0x1b1000 (136704 bytes) to /...
[    0.205720] Info: [libukcpio] <cpio.c @  233> Extracting /main.aot (136428 bytes)
Powered by
o.   .o       _ _               __ _
Oo   Oo  ___ (_) | __ __  __ _ ' _) :_
oO   oO ' _ `| | |/ /  _)' _` | |_|  _)
oOo oOO| | | | |   (| | | (_) |  _) :_
 OoOoO ._, ._:_:_,\_._,  .__,_:_, \___)
           Phoebe 0.10.0~9bf6e63-custom
[    0.232435] Info: [libukboot] <boot.c @  125> Pre-init table at 0x1763f0 - 0x1763f0
[    0.238147] Info: [libukboot] <boot.c @  136> Constructor table at 0x1763f0 - 0x1763f0
[    0.243980] Info: [libukboot] <boot.c @  146> Calling main(2, ['build/wamr_kvm-x86_64', 'main.aot'])
[    0.250338] Warn: [libukmmap] <mmap.c @  196> __uk_syscall_r_mprotect() stubbed
[    0.256167] Warn: [libukmmap] <mmap.c @  190> __uk_syscall_r_madvise() stubbed
Hello, Unikraft + WAMR!
[    0.264487] Info: [libukboot] <boot.c @  155> main returned 0, halting system
[    0.270064] Info: [libkvmplat] <shutdown.c @   35> Unikraft halted

Now give me the numbers

With the "Drop unused functions and data" option enabled in Unikraft's config, the size of the stripped binary for the WAMR unikernel is 642K:

[slopezpa@toolbox wamr]$ ls -l build/wamr_kvm-x86_64
-rwxr-xr-x. 1 slopezpa slopezpa 656856 Sep 27 17:05 build/wamr_kvm-x86_64

That's kind of nice, but it can be better. Right now, I'm building WAMR with Unikraft using the POSIX compatibility layer, which means including a number of external libraries (newlib, pthread-embedded, lwip) in the build. If, instead, we ported WAMR to use Unikraft's libraries directly, we would significantly reduce the size of the unikernel (and, with it, the TCB).

Now, let's take a look at the memory consumption of our unikernel while running the example Wasm payload:

[slopezpa@mhamilton libkrunfw.wamr]$ ps -axuww |grep chroot_vm
slopezpa   71854  0.9  0.0 9517320 13828 pts/8   Sl+  17:04   0:00 ./chroot_vm

That's less than 14MB of RSS, and that's including the VMM (libkrun) internal structures, the guest's memory usage, and without discounting shared pages. Not bad, I guess... ;-)
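
For anyone who wants to double-check the figures, this short sketch redoes the unit conversions from the raw ls and ps output above. The helper names are invented for this example, and it assumes ps reports RSS in KiB and that the "K" size is rounded up.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole KiB. */
static uint64_t bytes_to_kib_ceil(uint64_t bytes)
{
    return (bytes + 1023) / 1024;
}

/* Convert KiB to tenths of a MiB, truncating (135 means 13.5 MiB). */
static uint64_t kib_to_tenths_of_mib(uint64_t kib)
{
    return kib * 10 / 1024;
}

/* Check the figures quoted in the post against the raw ls/ps output. */
static void check_reported_numbers(void)
{
    assert(bytes_to_kib_ceil(656856) == 642);   /* the "642K" binary */
    assert(kib_to_tenths_of_mib(13828) == 135); /* 13.5 MiB of RSS */
}
```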


Where to go from here

I think from this experiment we can conclude that it is, indeed, feasible to build a Wasm runtime in a unikernel form factor in a reasonable amount of time, and that it would come with significant benefits in TCB reduction and, perhaps, in performance (yet to be tested).

Some things I'd like to do next (if I manage to find the time):

Sounds like fun! ;-)
