I Wrote a Wasm Interpreter in C
Jan 26, 2025
For the last 6 months, I’ve been spending my (limited) free time on evenings and weekends writing a WebAssembly VM in C. After years of repeatedly abandoning small side projects, I found this an extremely productive experience as an engineer. I’m writing this post to collect some takeaways and organize my thoughts on what I learned.
What is WebAssembly?
WebAssembly is a standardized instruction set and bytecode format for a virtual stack machine that is designed to run untrusted client-side code in Web browsers at near-native speed.
The WebAssembly standard evolved out of the Emscripten C/C++ to JavaScript compiler. Emscripten transpiled LLVM IR to JavaScript in order to allow C and C++ code to run on the Web. Over time, browser developers worked with the Emscripten project to restrict its generated code to a simple subset of JavaScript that could be optimized for performance. This subset of JavaScript was formalized as asm.js. Eventually, a standalone bytecode format was designed to avoid JavaScript’s overhead altogether; among other things, JavaScript is difficult to parse efficiently. That’s where we got Wasm.
In recent years, WebAssembly has been gaining popularity outside the browser. Wasm’s secure sandbox offers a lightweight alternative to containers and virtual machines in the cloud: Fastly and Shopify have built their Edge Compute and Functions products, respectively, on top of WebAssembly engines. Its security and performance guarantees, plus the ability to target Wasm from a multitude of source languages, make it an attractive platform for building plugin ecosystems: Zed’s extension system, for example, is built on Wasm.
There is an industry-wide effort, led by the Bytecode Alliance, to support the use of WebAssembly outside the browser. The Bytecode Alliance develops the Wasmtime runtime and drives the standardization of the WebAssembly System Interface (WASI). WASI enables WebAssembly modules to communicate with the host operating system through a set of POSIX-like hostcalls. With WASI, WebAssembly has a standardized way to interact with standard input and output, filesystems, networks, and other common interfaces to the outside world.
Why a WebAssembly Interpreter?
Like many other software engineers, I had fallen into a cyclical pattern where I would begin an awesome new side project and then quickly get bored and move on to the next shiny thing after a few weeks. This left me feeling unproductive, as I didn’t have anything tangible to show for all my tinkering.
I needed a large project that I could invest my effort into over a sustained period of time, one that would get me out of my comfort zone and expose me to a lower level of the computing stack than I usually work with at my day job.
Many of my abandoned side-projects failed due to a lack of direction. I needed a project that had clear goals and tangible outcomes.
I had recently made it through a rough job search where I felt my generalist engineering profile was hurting my ability to secure stable work. I was looking for something I could specialize in to build up more of a T-shaped skillset. I was getting tired of application-level web development and wanted to move my career toward a lower level of the stack.
Around this time I started buying into some of the WebAssembly hype. Being able to design a bespoke computing environment with custom syscalls sounded, to me, like an incredible platform engineering tool.
Semblance
I decided I’d write a WebAssembly interpreter. My primary goal for the project was to familiarize myself with the WebAssembly Core Specification. I wanted to come out of this project ready to start contributing to an industrial-grade runtime like Wasmtime. I decided to call the project Semblance.
Because the project is just for learning purposes, I had no intention of implementing every opcode or passing the core test suite. If I could get a “Hello, World!” to run, I’d be happy.
Let’s take a high-level tour through the code and I’ll explain the interpreter’s lifecycle during the execution of a WebAssembly module.
But first, a disclaimer: I am not, nor have I ever been, a professional C developer, and I know this is not ideal C. I am trying to improve, though, and I would happily welcome a code review from a friendly greybeard.
We’ll start with a simple hello world in C:
```c
// hello.c: puts is deliberately left undefined; the host provides it at runtime
extern void puts(char *str);

void hello() {
    puts("Hello, World!\n");
}
```
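We can compile this to WebAssembly with Clang:
```shell
# --no-entry:        the module has no main/_start entry point
# --export-all:      export all symbols, including hello
# --allow-undefined: leave puts undefined so it becomes an import for the host to provide
clang --target=wasm32 \
    -nostdlib \
    -Wl,--no-entry \
    -Wl,--export-all \
    -Wl,--allow-undefined \
    -O3 \
    -o hello.wasm \
    hello.c
```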
Now let’s invoke the exported `hello` function with `semblance`:
```shell
semblance hello.wasm --invoke hello
```
The program begins by setting up some stack-allocated structures to hold our parsed command-line arguments, our decoded WebAssembly module, and the store, which will encapsulate all of our module’s runtime state. We parse the command-line arguments and exit if they’re invalid.
```c
int main(int argc, char *argv[]) {
    CliArgs args;
    WasmModule wmod;
    WasmStore store;
    wmod_init(&wmod);
    wrun_store_init(&store);

    cli_parse_or_exit(&args, argc, argv);
    if (args.help) {
        printf("%s\n", cli_usage_str());
        return 0;
    }

    // ...
}
```
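The post doesn’t show `cli_parse_or_exit`. Here is a minimal sketch of how the invocation above might be parsed, assuming a hypothetical `CliArgs` with `path`, `invoke`, and `help` fields. This hypothetical `cli_parse` returns -1 instead of exiting so it can be exercised on its own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical argument struct; semblance's real CliArgs may differ.
typedef struct {
    const char *path;    // path to the .wasm file
    const char *invoke;  // name of the exported function to call
    bool help;           // true if --help or -h was given
} CliArgs;

// Parses argv into args. Returns 0 on success, -1 on invalid input.
// Accepts one positional module path plus --invoke NAME and --help/-h.
int cli_parse(CliArgs *args, int argc, char *argv[]) {
    args->path = NULL;
    args->invoke = NULL;
    args->help = false;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--help") == 0 || strcmp(argv[i], "-h") == 0) {
            args->help = true;
        } else if (strcmp(argv[i], "--invoke") == 0) {
            if (i + 1 >= argc) return -1;  // --invoke needs a value
            args->invoke = argv[++i];
        } else if (argv[i][0] == '-') {
            return -1;                     // unknown flag
        } else if (args->path == NULL) {
            args->path = argv[i];
        } else {
            return -1;                     // more than one positional argument
        }
    }

    // --help is valid on its own; otherwise both a path and --invoke are required.
    if (args->help) return 0;
    return (args->path != NULL && args->invoke != NULL) ? 0 : -1;
}
```

A `cli_parse_or_exit` wrapper could then print `cli_usage_str()` and exit with a nonzero status whenever this returns -1.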
After that initial setup, the first thing the interpreter does is decode the binary format. It starts by looking for the `\0asm` magic bytes at the beginning of the file, verifies that the version of the binary format is `1`, and then moves on to decode each of the contained sections.
```c
WasmDecodeResult wbin_decode_module(size_t size, WasmHeader *header, WasmModule *wmod) {
    // Check the size first so we never read past the end of a short file
    if (
        size < sizeof(WasmHeader)
        || header->magic_bytes[0] != '\0'
        || header->magic_bytes[1] != 'a'
        || header->magic_bytes[2] != 's'
        || header->magic_bytes[3] != 'm'
    ) return wbin_err(WasmDecodeErrMagicBytes, 0);

    // The version is stored little-endian; reading it as a native integer assumes a little-endian host
    wmod->meta.version = header->version;
    if (header->version != 1) return wbin_err(WasmDecodeErrUnsupportedVersion, 0);

    return wbin_decode_sections(size - sizeof(WasmHeader), header->sections, wmod);
}

WasmDecodeResult wbin_read_module(char *path, WasmModule *wmod) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return wbin_err_io(errno);

    struct stat stats;
    if (fstat(fd, &stats) == -1) {
        int err = errno;
        close(fd);  // don't leak the descriptor on error
        return wbin_err_io(err);
    }

    // Map the whole file read-only; mmap signals failure with MAP_FAILED
    WasmHeader *data = mmap(NULL, stats.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        int err = errno;
        close(fd);
        return wbin_err_io(err);
    }

    // The mapping stays valid after the descriptor is closed
    if (close(fd) != 0) return wbin_err_io(errno);

    return wbin_decode_module(stats.st_size, data, wmod);
}

int main(int argc, char *argv[]) {
    // ...
    wbin_read_module_or_exit(&args, &wmod);
    // ...
}
```
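To make the decoding steps concrete, here is a standalone sketch of the same header check plus a walk over the sections that follow it. In the binary format, each section is a one-byte id followed by its size encoded as an unsigned LEB128 integer. `read_u32_leb128`, `walk_sections`, and `SectionFn` are hypothetical names for illustration, not semblance’s API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Decodes an unsigned LEB128 value of at most 32 bits from buf[*pos..len).
// Advances *pos past the encoding. Returns 0 on success, -1 if malformed.
int read_u32_leb128(const uint8_t *buf, size_t len, size_t *pos, uint32_t *out) {
    uint32_t result = 0;
    for (unsigned shift = 0; shift < 35; shift += 7) {
        if (*pos >= len) return -1;                        // ran off the end
        uint8_t byte = buf[(*pos)++];
        if (shift == 28 && (byte & 0x70) != 0) return -1;  // value exceeds 32 bits
        result |= (uint32_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {                          // high bit clear: last byte
            *out = result;
            return 0;
        }
    }
    return -1;                                             // longer than 5 bytes
}

// Hypothetical per-section callback: section id, its contents, and their size.
typedef void (*SectionFn)(uint8_t id, const uint8_t *contents, uint32_t size, void *ctx);

// Validates the 8-byte header, then visits each section in order.
// fn may be NULL. Returns the number of sections, or -1 on malformed input.
int walk_sections(const uint8_t *buf, size_t len, SectionFn fn, void *ctx) {
    static const uint8_t magic[4] = {0x00, 0x61, 0x73, 0x6d};  // "\0asm"
    if (len < 8 || memcmp(buf, magic, sizeof magic) != 0) return -1;

    // Assemble the little-endian version byte by byte (host-endian independent)
    uint32_t version = (uint32_t)buf[4] | (uint32_t)buf[5] << 8
                     | (uint32_t)buf[6] << 16 | (uint32_t)buf[7] << 24;
    if (version != 1) return -1;

    int count = 0;
    size_t pos = 8;
    while (pos < len) {
        uint8_t id = buf[pos++];
        uint32_t size;
        if (read_u32_leb128(buf, len, &pos, &size) != 0) return -1;
        if (size > len - pos) return -1;  // section claims more bytes than remain
        if (fn != NULL) fn(id, buf + pos, size, ctx);
        pos += size;
        count++;
    }
    return count;
}
```

Assembling the version byte by byte keeps the check independent of host endianness, whereas comparing a native `uint32_t` field directly relies on running on a little-endian machine.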
After successfully decoding the module, our `WasmModule` struct will be filled in with `hello.wasm`’s functions, types, imports, data segments, etc.
```c
typedef struct {
    wasm_type_idx_t type_idx;
    VEC(WasmValueType) locals;
    WasmExpr body;
} WasmFunc;

typedef VEC(WasmFunc) WasmFuncs;

// ...

typedef struct {
    WasmTypes types;
    WasmFuncs funcs;
    WasmTables tables;
    WasmMems mems;
    WasmGlobals globals;
    WasmElems elems;
    WasmDatas datas;
    WasmStart start;
    WasmImports imports;
    WasmExports exports;
    WasmCustoms customs;
    WasmMeta meta;
} WasmModule;
```
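Nearly every structure here is built on a `VEC` macro that the post never shows. The calls in the snippets (`vec_init(&v)`, `vec_push_back(&v, elem_size, &elem)`, `vec_at(&v, elem_size, i)`, `vec_free(&v)`, and a `.ptr`/`.len` pair) suggest a type-erased growable array where the element size is passed on each call. Below is one hypothetical design consistent with those calls; semblance’s real implementation may differ (for instance, this `vec_push_back` returns an error code).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Type-erased growable array. The element type passed to VEC() only
// documents intent; callers supply the element size to every operation.
typedef struct {
    void *ptr;   // element storage
    size_t len;  // number of elements in use
    size_t cap;  // number of elements allocated
} Vec;

#define VEC(T) Vec

void vec_init(Vec *v) {
    v->ptr = NULL;
    v->len = 0;
    v->cap = 0;
}

// Appends a copy of the elem_size bytes at elem.
// Returns 0 on success, -1 on allocation failure or size overflow.
int vec_push_back(Vec *v, size_t elem_size, const void *elem) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        if (new_cap > SIZE_MAX / elem_size) return -1;
        void *p = realloc(v->ptr, new_cap * elem_size);
        if (p == NULL) return -1;
        v->ptr = p;
        v->cap = new_cap;
    }
    memcpy((char *)v->ptr + v->len * elem_size, elem, elem_size);
    v->len++;
    return 0;
}

// Returns a pointer to element i, or NULL if i is out of bounds.
void *vec_at(const Vec *v, size_t elem_size, size_t i) {
    if (i >= v->len) return NULL;
    return (char *)v->ptr + i * elem_size;
}

// Releases the storage and resets the vector to empty.
void vec_free(Vec *v) {
    free(v->ptr);
    vec_init(v);
}
```

Having `VEC(T)` expand to one named type matters in C: if each expansion produced a fresh anonymous struct, every `VEC(WasmValue)` would be a distinct, incompatible type, and a function returning `VEC(WasmValue)` could not be assigned to a `VEC(WasmValue)` variable.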
Next we’ll resolve the module’s imports. At this time, `env::puts` is the only host function provided by `semblance`. Importing a host function involves allocating a host function instance (a native function pointer plus its type) in the `store` and placing the resulting `funcaddr` into the `imports` array that is passed to instantiation.
```c
VEC(WasmValue) hostcall_puts(WasmStore *store, VEC(WasmValue) *args) {
    // puts returns nothing, so the result vector stays empty
    VEC(WasmValue) out;
    vec_init(&out);

    // The only argument is an i32 pointer into linear memory.
    // Wasm addresses are unsigned, so treat it as a uint32_t offset.
    WasmValue *arg = args->ptr;
    uint32_t offset = (uint32_t)arg[0].num.i32;

    // Uses the first memory in the store. Nothing here checks that offset is
    // in bounds or that the string is NUL-terminated inside memory.
    WasmMemInst *mem = store->mems.ptr;
    char *data = (char *)mem[0].data.ptr;  // char * so the pointer arithmetic is standard C
    printf("%s", data + offset);
    return out;
}

WasmExternVal register_hostcall_puts(WasmStore *store) {
    // Build the function type (i32) -> ()
    WasmFuncType puts_type;
    vec_init(&puts_type.input_type);
    WasmValueType arg1 = {
        .kind = WasmValueTypeNum,
        .value.num = WasmNumI32
    };
    vec_push_back(&puts_type.input_type, sizeof(WasmValueType), &arg1);
    vec_init(&puts_type.output_type);

    // Allocate the host function in the store and wrap its address as an extern value
    wasm_func_addr_t putsaddr = wrun_store_alloc_hostfunc(store, puts_type, hostcall_puts);
    WasmExternVal out = {
        .kind = WasmExternValFunc,
        .val.func = putsaddr
    };
    return out;
}

int main(int argc, char *argv[]) {
    // ...
    VEC(WasmExternVal) imports;
    vec_init(&imports);
    for (size_t i = 0; i < wmod.imports.len; i++) {
        WasmImport *import = vec_at(&wmod.imports, sizeof(WasmImport), i);
        if (wmod_name_eq(&import->module_name, "env") && wmod_name_eq(&import->item_name, "puts")) {
            WasmExternVal func_puts = register_hostcall_puts(&store);
            vec_push_back(&imports, sizeof(WasmExternVal), &func_puts);
        }
    }
    // ...
}
```
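`hostcall_puts` hands `data + offset` straight to `printf`, so a malicious module could point it anywhere. A minimal bounds check, sketched here as a hypothetical `mem_cstr` helper that takes the raw memory bytes and their size, confirms that the string and its NUL terminator both lie inside linear memory before anything is printed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Checks that a NUL-terminated string starting at `offset` lies entirely
// inside a linear memory of `mem_size` bytes. On success, stores the string
// length (excluding the NUL) in *len_out and returns 0; otherwise returns -1.
int mem_cstr(const uint8_t *mem, size_t mem_size, uint32_t offset, size_t *len_out) {
    if ((size_t)offset >= mem_size) return -1;  // pointer is outside memory
    const uint8_t *start = mem + offset;
    // Only search the bytes that remain in memory after offset
    const uint8_t *nul = memchr(start, 0, mem_size - offset);
    if (nul == NULL) return -1;                  // string runs off the end of memory
    *len_out = (size_t)(nul - start);
    return 0;
}
```

A checked host `puts` could then write exactly `len` bytes with `fwrite(mem + offset, 1, len, stdout)` and trap when `mem_cstr` returns -1.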
Now we’re ready to instantiate the module. Instantiation involves a validation step that performs typechecking and other tests to ensure that the module is well-formed. Additionally, memories, globals, and tables are initialized and the `start` function is invoked. Instantiation produces a `WasmModuleInst`, which represents the runtime structure of the module, with its exports ready for consumption.
```c
typedef enum {
    WasmExternValFunc,
    WasmExternValTable,
    WasmExternValMem,
    WasmExternValGlobal
} WasmExternValKind;

typedef struct {
    WasmExternValKind kind;
    union {
        wasm_func_addr_t func;
        wasm_table_addr_t table;
        wasm_mem_addr_t mem;
        wasm_global_addr_t global;
    } val;
} WasmExternVal;

typedef struct {
    WasmName name;
    WasmExternVal val;
} WasmExportInst;

typedef struct {
    WasmFuncType *types;
    VEC(wasm_func_addr_t) funcaddrs;
    VEC(wasm_table_addr_t) tableaddrs;
    VEC(wasm_mem_addr_t) memaddrs;
    VEC(wasm_global_addr_t) globaladdrs;
    VEC(wasm_elem_addr_t) elemaddrs;
    VEC(wasm_data_addr_t) dataaddrs;
    VEC(WasmExportInst) exports;
} WasmModuleInst;

int main(int argc, char *argv[]) {
    // ...
    WasmModuleInst *winst = wrun_instantiate_module(&wmod, &store, &imports);
    // ...
}
```
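One of the instantiation steps matters directly for our example: the `"Hello, World!\n"` literal typically ends up in an active data segment, and instantiation copies that segment into linear memory, which is what lets `hostcall_puts` find the string at `offset`. The sketch below shows that copy using a hypothetical, simplified `DataSegment` whose offset expression has already been evaluated to a constant.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical, simplified active data segment. The offset expression
// (usually a single i32.const) is assumed to be evaluated already.
typedef struct {
    uint32_t offset;       // destination address in linear memory
    const uint8_t *bytes;  // segment contents
    size_t len;            // number of bytes to copy
} DataSegment;

// Copies each active segment into linear memory, in order.
// Returns 0 on success, or -1 (a trap) when a segment does not fit.
int init_data_segments(uint8_t *mem, size_t mem_size,
                       const DataSegment *segs, size_t nsegs) {
    for (size_t i = 0; i < nsegs; i++) {
        const DataSegment *s = &segs[i];
        // Compare without computing offset + len, which could overflow
        if (s->offset > mem_size || s->len > mem_size - s->offset) return -1;
        memcpy(mem + s->offset, s->bytes, s->len);
    }
    return 0;
}
```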
Next we need to resolve the exported function to invoke, based on the `--invoke` command-line argument.
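```c
WasmExternVal wrun_resolve_export(WasmModuleInst *winst, char *name) {
    // Linear search over the instance's exports by name
    for (size_t i = 0; i < winst->exports.len; i++) {
        WasmExportInst *wexp = vec_at(&winst->exports, sizeof(WasmExportInst), i);
        if (wmod_name_eq(&wexp->name, name)) {
            return wexp->val;
        }
    }
    // Unknown export: report it and exit (an assert alone is compiled out
    // under NDEBUG and would fall off the end of this non-void function)
    fprintf(stderr, "export not found: %s\n", name);
    exit(EXIT_FAILURE);
}

int main(int argc, char *argv[]) {
    // ...
    WasmExternVal export = wrun_resolve_export(winst, args.invoke);
    assert(export.kind == WasmExternValFunc);
    // ...
}
```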
Finally, we’re ready to invoke the function. Invocation requires some setup, mainly initializing the stack and the function’s activation frame, but ultimately we arrive at the core of the interpreter: the opcode switch loop. The opcode switch loop walks the sequence of Wasm instructions in the function body, acting on the `stack` and the `store` accordingly.
```c
WasmResultKind wrun_exec_expr(WasmStore *store, WasmStack *stack, WasmInstruction *expr) {
    WasmInstruction* ip = expr;  // instruction pointer into the decoded body
    while (true) {
        switch (ip->opcode) {
            // Constants push their immediate value onto the operand stack
            case WasmOpI32Const:
                wrun_stack_push_i32(stack, ip->params._const.value.i32);
                break;
            case WasmOpI64Const:
                wrun_stack_push_i64(stack, ip->params._const.value.i64);
                break;
            case WasmOpF32Const:
                wrun_stack_push_f32(stack, ip->params._const.value.f32);
                break;
            case WasmOpF64Const:
                wrun_stack_push_f64(stack, ip->params._const.value.f64);
                break;
            case WasmOpI32Ge_s: {
                // The right operand (b) is on top of the stack, so pop it first
                WasmValue a;
                WasmValue b;
                wrun_stack_pop_val(stack, &b);
                wrun_stack_pop_val(stack, &a);
                WasmValue out = {
                    .num.i32 = a.num.i32 >= b.num.i32
                };
                wrun_stack_push_val(stack, &out);
                break;
            }
            case WasmOpCall: {
                // ...
                break;
            }
            case WasmOpNop:
                break;
            case WasmOpUnreachable:
                return Trap;
            case WasmOpExprEnd: {
                // ...
                return Ok;
            }
            // ...
            default:
                printf("unhandled opcode [%s]\n", wmod_str_opcode(ip->opcode));
                return Trap;
        }
        ip++;
    }
}
```
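For experimenting with the dispatch pattern in isolation, here is a self-contained toy version of the opcode switch loop. The `Op`, `Instr`, and `Stack` types are hypothetical stand-ins for semblance’s `WasmInstruction` and `WasmStack`, and only a handful of i32 opcodes are implemented.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_I32_CONST, OP_I32_GE_S, OP_I32_ADD, OP_UNREACHABLE, OP_END } Op;

typedef struct {
    Op op;
    int32_t imm;  // immediate for OP_I32_CONST; unused otherwise
} Instr;

typedef enum { RESULT_OK, RESULT_TRAP } Result;

#define STACK_MAX 64

typedef struct {
    int32_t vals[STACK_MAX];
    size_t sp;  // number of values currently on the stack
} Stack;

// Runs instructions until `end` or a trap, one switch case per opcode.
Result exec(const Instr *code, Stack *st) {
    for (const Instr *ip = code;; ip++) {
        switch (ip->op) {
        case OP_I32_CONST:
            if (st->sp == STACK_MAX) return RESULT_TRAP;  // stack overflow
            st->vals[st->sp++] = ip->imm;
            break;
        case OP_I32_GE_S: {
            if (st->sp < 2) return RESULT_TRAP;  // a validated module never hits this
            int32_t b = st->vals[--st->sp];      // right operand is on top
            int32_t a = st->vals[--st->sp];
            st->vals[st->sp++] = a >= b;         // signed comparison, result 0 or 1
            break;
        }
        case OP_I32_ADD: {
            if (st->sp < 2) return RESULT_TRAP;
            uint32_t b = (uint32_t)st->vals[--st->sp];
            uint32_t a = (uint32_t)st->vals[--st->sp];
            st->vals[st->sp++] = (int32_t)(a + b);  // wraps modulo 2^32
            break;
        }
        case OP_UNREACHABLE:
            return RESULT_TRAP;
        case OP_END:
            return RESULT_OK;
        default:
            return RESULT_TRAP;  // unknown opcode
        }
    }
}
```

The `i32.add` case does its arithmetic on `uint32_t` because Wasm integer addition wraps around, while overflowing a signed `int32_t` in C is undefined behavior.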
Let’s zoom in on the `WasmOpCall` case to see how a module invokes our `env::puts` host function through the `call` instruction. At a high level, we take the `call` opcode’s immediate parameter `funcidx`, translate it into a function address through the current frame’s module instance, and use that address to look up the function instance in the `store`. Using the type information in the function instance `finst`, we pop the function’s arguments off the `stack` and then check `finst->kind` to determine whether we are calling a Wasm function or a native host function. If we are calling a host function, we make a native call through the `hostfunc` function pointer with our popped `args` and then push any return values onto the stack.
```c
case WasmOpCall: {
    // funcidx is module-relative; map it to a store address via the current frame's instance
    WasmActivation *frame = wrun_stack_find_current_frame(stack);
    wasm_func_idx_t funcidx = ip->params.call.funcidx;
    wasm_func_addr_t funcaddr = *(wasm_func_addr_t*)vec_at(&frame->inst->funcaddrs, sizeof(wasm_func_addr_t), funcidx);
    WasmFuncInst *finst = vec_at(&store->funcs, sizeof(WasmFuncInst), funcaddr - 1);

    // Pop one value per parameter. The last argument is on top of the stack,
    // so args ends up holding the arguments in reverse order.
    size_t numargs = finst->functype.input_type.len;
    VEC(WasmValue) args;
    vec_init_with_capacity(&args, sizeof(WasmValue), numargs);
    for (size_t i = 0; i < numargs; i++) {
        WasmValue argval;
        wrun_stack_pop_val(stack, &argval);
        vec_push_back(&args, sizeof(WasmValue), &argval);
    }

    switch (finst->kind) {
        case WasmFuncInstWasm: {
            // ...
        }
        case WasmFuncInstHost: {
            // Call the native function pointer, then push its results onto the stack
            VEC(WasmValue) ret = ((WasmHostFunc)finst->val.hostfunc)(store, &args);
            for (size_t i = 0; i < ret.len; i++) {
                WasmValue *retval = vec_at(&ret, sizeof(WasmValue), i);
                wrun_stack_push_val(stack, retval);
            }
            vec_free(&ret);
            break;
        }
    }
    vec_free(&args);
    break;
}
```
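As written, the `WasmOpCall` loop appends values to `args` in the order they come off the stack, so `args` holds the last argument first. That is harmless for the single-argument `puts`, but a multi-argument host function that reads `args` front to back would see its parameters reversed. The sketch below fills the argument array from the back instead. `HostFn`, `HostFuncInst`, `call_host`, and the example `host_sub` are hypothetical, values are simplified to `int32_t`, and function addresses are plain 0-based indices.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS 8
#define STACK_MAX 64

typedef struct { int32_t vals[STACK_MAX]; size_t sp; } Stack;

// Hypothetical host function signature: receives args in declaration order,
// writes at most one result, and returns the number of results written.
typedef size_t (*HostFn)(const int32_t *args, size_t nargs, int32_t *result);

typedef struct {
    HostFn fn;
    size_t nparams;
} HostFuncInst;

// Calls host function `funcaddr`: pops its arguments so that args[0] is the
// first parameter, invokes it, and pushes any result.
// Returns 0 on success, -1 (trap) on a bad address or stack under/overflow.
int call_host(Stack *st, const HostFuncInst *funcs, size_t nfuncs, size_t funcaddr) {
    if (funcaddr >= nfuncs) return -1;
    const HostFuncInst *f = &funcs[funcaddr];
    if (f->nparams > MAX_ARGS || st->sp < f->nparams) return -1;

    int32_t args[MAX_ARGS];
    // The last argument is on top of the stack, so fill args from the back
    for (size_t i = f->nparams; i > 0; i--) args[i - 1] = st->vals[--st->sp];

    int32_t result;
    size_t nresults = f->fn(args, f->nparams, &result);
    if (nresults == 1) {
        if (st->sp == STACK_MAX) return -1;
        st->vals[st->sp++] = result;
    }
    return 0;
}

// Example host function: returns args[0] - args[1], so argument order matters.
size_t host_sub(const int32_t *args, size_t nargs, int32_t *result) {
    (void)nargs;
    *result = (int32_t)((uint32_t)args[0] - (uint32_t)args[1]);  // wrapping subtraction
    return 1;
}
```

Usage: with `10` pushed and then `3` pushed, calling `host_sub` through `call_host` leaves `7` on the stack rather than `-7`.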
Eventually, the opcode switch loop arrives at the `hello` function’s `end` opcode, at which point we return to our `main` function and the program ends.
```c
int main(int argc, char *argv[]) {
    // ...
    VEC(WasmValue) fn_args;  // hello takes no arguments
    vec_init(&fn_args);
    DynamicWasmResult wres = wrun_invoke_func(winst, export.val.func, &fn_args, &store);
    wrun_result_dump_dynamic(&wres);
    return wres.result.kind != Ok;  // exit status 1 unless the result is Ok
}
```
Putting it all together, we should see our message on standard output:
```
Hello, World!
Ok []
```
I can’t describe how satisfying it felt to see `Hello, World!` print to my terminal for the first time.
Outcome
In my eyes this project was a huge success. The opcode coverage isn’t anywhere near complete, but it can execute some simple “Hello, World!” programs. The code is messy, it’s slow, it leaks memory, and I’m confident it’s vulnerable to malicious modules, but it works! Plus it taught me a ton about the core WebAssembly specification and got me out of my comfort zone as an engineer.
After hacking together my own interpreter, I feel that I have enough WebAssembly-specific knowledge to begin contributing to an industrial-grade runtime like Wasmtime.
Over time I’ll continue hacking on a few opcodes here and there. But I think it’s time to retire this as my primary project. I’m not sure what’s next yet, but my gauge for what I can accomplish as a side project has been recalibrated.