I Wrote a Wasm Interpreter in C

Jan 26, 2025

For the last 6 months, I’ve been spending my (limited) free time on evenings and weekends writing a WebAssembly VM in C. After repeatedly throwing away small side-projects, this was an extremely productive experience for me as an engineer. I’m writing this in order to put together some take-aways and organize my thoughts on what I learned.

What is WebAssembly?

WebAssembly is a standardized instruction set and bytecode format for a virtual stack machine that is designed to run untrusted client-side code in Web browsers at near-native speed.

The WebAssembly standard evolved out of the Emscripten C/C++ to JavaScript compiler. Emscripten transpiled LLVM IR to JavaScript in order to allow C and C++ code to run on the Web. Over time, browser developers worked with the Emscripten project to restrict its generated code to a simple subset of JavaScript that could be optimized for performance. This subset of JavaScript was standardized as asm.js. Eventually, a standalone bytecode format was designed to avoid the overhead of JavaScript altogether, since JavaScript source is, among other things, difficult to parse efficiently. That’s where we got Wasm.

In recent years, WebAssembly has been gaining popularity outside the browser. Wasm’s secure sandbox offers a lightweight alternative to containers and virtual machines in the cloud: Fastly and Shopify have built their Edge Compute and Functions products on top of WebAssembly engines. Its security and performance guarantees, plus the ability to target Wasm from a multitude of source languages, make it an attractive platform for building plugin ecosystems: Zed’s extension system is built on Wasm.

There is an industry-wide effort, led by the Bytecode Alliance, to support the use of WebAssembly outside of the browser. The Bytecode Alliance leads the development of the Wasmtime runtime and works to standardize the WebAssembly System Interface (WASI). WASI enables WebAssembly modules to communicate with the host operating system through a set of POSIX-like hostcalls. With WASI, WebAssembly has a standardized way to interact with standard input and output, filesystems, networks, and other common interfaces to the outside world.

Why a WebAssembly Interpreter?

Like many other software engineers, I had fallen into a cyclical pattern where I would begin an awesome new side-project and then quickly get bored and move on to the next shiny thing after a few weeks. This left me feeling unproductive as I didn’t have anything tangible to show as a result of all my tinkering.

I needed a large project that I could invest my effort into over a sustained period of time. A project that would get me out of my comfort zone and expose me to a lower level of the computing stack than I usually work with at my day job.

Many of my abandoned side-projects failed due to a lack of direction. I needed a project that had clear goals and tangible outcomes.

I had recently made it through a rough job search where I felt my generalist engineering profile was hurting my ability to secure stable work. I was looking for something I could specialize in to build up more of a T-shaped skillset. I was getting fatigued with application-level web dev and wanted to move my career towards a lower level of the stack.

Around this time I started buying into some of the WebAssembly hype. Being able to design a bespoke computing environment with custom syscalls sounded, to me, like an incredible platform engineering tool.

Semblance

I decided I’d write a WebAssembly interpreter. My primary goal for the project was to familiarize myself with the WebAssembly Core Specification. I wanted to come out of this project ready to start contributing to an industrial-grade runtime like Wasmtime. I decided to call the project Semblance.

Because the project is just for learning purposes, I had no intention of implementing every opcode or passing the core test suite. If I could get a “Hello, World!” to run I’d be happy.

Let’s take a high-level tour through the code and I’ll explain the interpreter’s lifecycle during the execution of a WebAssembly module.

But first I need to start with a disclaimer: I am not, nor have I ever been, a professional C developer, and I know this is not ideal C. But I am trying to improve, and I would happily welcome a code review from a friendly greybeard.

We’ll start with a simple hello world in C:

extern void puts(char *str);

void hello() {
  puts("Hello, World!\n");
}

We can compile this to WebAssembly with Clang:

clang --target=wasm32 \
    -nostdlib \
    -Wl,--no-entry \
    -Wl,--export-all \
    -Wl,--allow-undefined \
    -O3 \
    -o hello.wasm \
    hello.c

Now let’s invoke the exported hello function with semblance:

semblance hello.wasm --invoke hello

The program begins by initializing some memory on the stack to hold our parsed command line arguments, our decoded WebAssembly module, and the store which will encapsulate all of our module’s runtime state. We parse the command line arguments and exit if they’re invalid.

int main(int argc, char *argv[]) {

    CliArgs args;
    WasmModule wmod;
    WasmStore store;

    wmod_init(&wmod);
    wrun_store_init(&store);

    cli_parse_or_exit(&args, argc, argv);

    if (args.help) {
        printf("%s\n", cli_usage_str());
        return 0;
    }

    // ...

}

After that initial setup, the first thing the interpreter is going to do is decode the binary format. It starts by looking for the "\0asm" magic bytes at the beginning of the file, verifies that the version of the binary format is 1, and then moves on to decode each of the contained sections.

WasmDecodeResult wbin_decode_module(size_t size, WasmHeader *header, WasmModule *wmod) {
    if (
        size < sizeof(WasmHeader)
        || header->magic_bytes[0] != '\0'
        || header->magic_bytes[1] != 'a'
        || header->magic_bytes[2] != 's'
        || header->magic_bytes[3] != 'm'
    ) return wbin_err(WasmDecodeErrMagicBytes, 0);

    wmod->meta.version = header->version;
    if (header->version != 1) return wbin_err(WasmDecodeErrUnsupportedVersion, 0);

    return wbin_decode_sections(size - sizeof(WasmHeader), header->sections, wmod);
}

WasmDecodeResult wbin_read_module(char *path, WasmModule *wmod) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return wbin_err_io(errno);

    struct stat stats;
    int stat_err = fstat(fd, &stats);
    if (stat_err == -1) return wbin_err_io(errno);

    WasmHeader* data = mmap(NULL, stats.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    // mmap signals failure with MAP_FAILED rather than NULL or a plain -1.
    if (data == MAP_FAILED) return wbin_err_io(errno);

    int close_err = close(fd);
    if (close_err != 0) return wbin_err_io(errno);

    return wbin_decode_module(stats.st_size, data, wmod);
}

int main(int argc, char *argv[]) {

    // ...

    wbin_read_module_or_exit(&args, &wmod);

    // ...

}
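
Decoding the sections leans heavily on LEB128, the variable-length integer encoding the Wasm binary format uses for sizes and indices: each section starts with a one-byte id followed by its size as an unsigned LEB128 u32. Here is a minimal sketch of how such a value can be read. It is not Semblance's actual code, and the helper name leb128_read_u32 and its signature are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not from Semblance): decode an unsigned LEB128 u32
// from buf[0..len). On success, stores the value in *out and the number of
// bytes consumed in *used, then returns true. Returns false on truncated
// or over-long input.
bool leb128_read_u32(const uint8_t *buf, size_t len, uint32_t *out, size_t *used) {
    assert(buf != NULL && out != NULL && used != NULL);
    uint32_t result = 0;
    // A u32 needs at most 5 bytes of 7 payload bits each.
    for (size_t i = 0; i < len && i < 5; i++) {
        uint8_t byte = buf[i];
        // The fifth byte may only carry the top 4 bits and must end the value.
        if (i == 4 && (byte & 0xF0) != 0) return false;
        result |= (uint32_t)(byte & 0x7F) << (7 * i);
        if ((byte & 0x80) == 0) { // high bit clear: last byte
            *out = result;
            *used = i + 1;
            return true;
        }
    }
    return false;
}
```

A section decoder would call something like this right after reading the section id, then use the decoded size to know where the next section begins.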

After successfully decoding the module, our WasmModule struct will be filled in with hello.wasm’s functions, types, imports, datas, etc.

typedef struct {
    wasm_type_idx_t type_idx;
    VEC(WasmValueType) locals;
    WasmExpr body;
} WasmFunc;

typedef VEC(WasmFunc) WasmFuncs;

// ...

typedef struct {
    WasmTypes types;
    WasmFuncs funcs;
    WasmTables tables;
    WasmMems mems;
    WasmGlobals globals;
    WasmElems elems;
    WasmDatas datas;
    WasmStart start;
    WasmImports imports;
    WasmExports exports;
    WasmCustoms customs;
    WasmMeta meta;
} WasmModule;

Next we’ll resolve any of the module’s imports. At this time env::puts is the only host function provided by semblance. Importing a host function involves allocating a native function pointer in the store and placing the resulting funcaddr into the module’s imports array.

VEC(WasmValue) hostcall_puts(WasmStore *store, VEC(WasmValue) *args) {
    VEC(WasmValue) out;
    vec_init(&out);

    WasmValue *arg = args->ptr;
    int32_t offset = arg[0].num.i32;

    WasmMemInst *mem = store->mems.ptr;
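    // Use a char pointer: arithmetic on void* is a GNU extension, not standard C.
    char *data = (char *)mem[0].data.ptr;

    // Note: offset is not bounds-checked against the memory size here.
    printf("%s", data + offset);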

    return out;
}

WasmExternVal register_hostcall_puts(WasmStore *store) {
    WasmFuncType puts_type;
    vec_init(&puts_type.input_type);
    WasmValueType arg1 = {
        .kind = WasmValueTypeNum,
        .value.num = WasmNumI32
    };
    vec_push_back(&puts_type.input_type, sizeof(WasmValueType), &arg1);
    vec_init(&puts_type.output_type);
    wasm_func_addr_t putsaddr = wrun_store_alloc_hostfunc(store, puts_type, hostcall_puts);
    WasmExternVal out = {
        .kind = WasmExternValFunc,
        .val.func = putsaddr
    };
    return out;
}

int main(int argc, char *argv[]) {

    // ...

    VEC(WasmExternVal) imports;
    vec_init(&imports);

    for (size_t i = 0; i < wmod.imports.len; i++) {
        WasmImport *import = vec_at(&wmod.imports, sizeof(WasmImport), i);
        if (wmod_name_eq(&import->module_name, "env") && wmod_name_eq(&import->item_name, "puts")) {
            WasmExternVal func_puts = register_hostcall_puts(&store);
            vec_push_back(&imports, sizeof(WasmExternVal), &func_puts);
        }
    }

    // ...

}
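
One weakness of the hostcall above is that it trusts the offset handed over by the module. A more defensive version would check that the string actually lives inside linear memory before printing it. The following is only a sketch of that idea: mem_cstr_at is a hypothetical helper, and it takes a raw byte pointer and length rather than Semblance's WasmMemInst. It also treats the offset as unsigned, since Wasm addresses are unsigned even though they travel as i32 values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: return a pointer to the NUL-terminated string at
// `offset` inside a linear memory of `mem_len` bytes, or NULL if the offset
// is out of bounds or no terminator exists before the end of memory.
const char *mem_cstr_at(const uint8_t *mem, size_t mem_len, uint32_t offset) {
    assert(mem != NULL);
    if ((size_t)offset >= mem_len) return NULL;
    const uint8_t *start = mem + offset;
    // memchr is bounded by the remaining length, so it never reads past
    // the end of the memory.
    if (memchr(start, '\0', mem_len - offset) == NULL) return NULL;
    return (const char *)start;
}
```

With a check like this, a host function could return a trap instead of reading arbitrary host memory when a module passes a bad pointer.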

Now we’re ready to instantiate the module. The instantiation of a module involves a validation step that performs typechecking and other tests to ensure that the module is well formed. Additionally, memories, globals and tables are initialized and the start function is invoked. Instantiation results in a WasmModuleInst that represents the runtime structure of the module with its exports ready for consumption.

typedef enum {
    WasmExternValFunc,
    WasmExternValTable,
    WasmExternValMem,
    WasmExternValGlobal
} WasmExternValKind;

typedef struct {
    WasmExternValKind kind;
    union {
        wasm_func_addr_t func;
        wasm_table_addr_t table;
        wasm_mem_addr_t mem;
        wasm_global_addr_t global;
    } val;
} WasmExternVal;

typedef struct {
    WasmName name;
    WasmExternVal val;
} WasmExportInst;

typedef struct {
    WasmFuncType *types;
    VEC(wasm_func_addr_t) funcaddrs;
    VEC(wasm_table_addr_t) tableaddrs;
    VEC(wasm_mem_addr_t) memaddrs;
    VEC(wasm_global_addr_t) globaladdrs;
    VEC(wasm_elem_addr_t) elemaddrs;
    VEC(wasm_data_addr_t) dataaddrs;
    VEC(WasmExportInst) exports;
} WasmModuleInst;

int main(int argc, char *argv[]) {

    // ...

    WasmModuleInst *winst = wrun_instantiate_module(&wmod, &store, &imports);

    // ...

}

Next we need to resolve the exported function to invoke based on the --invoke command line argument.

WasmExternVal wrun_resolve_export(WasmModuleInst *winst, char *name) {
    for (size_t i = 0; i < winst->exports.len; i++) {
        WasmExportInst *wexp = vec_at(&winst->exports, sizeof(WasmExportInst), i);
        if (wmod_name_eq(&wexp->name, name)) {
            return wexp->val;
        }
    }
    assert(false); // export not found
}

int main(int argc, char *argv[]) {

    // ...

    WasmExternVal export = wrun_resolve_export(winst, args.invoke);
    assert(export.kind == WasmExternValFunc);

    // ...
}
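
A detail worth calling out in export resolution: names in the Wasm binary format are length-prefixed UTF-8 byte sequences, not NUL-terminated C strings, so comparing them against a command line argument takes a little care. Below is a sketch of what such a comparison could look like. ToyName and toy_name_eq are hypothetical stand-ins; Semblance's real WasmName and wmod_name_eq may be laid out differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical layout; Semblance's real WasmName may differ.
typedef struct {
    size_t len;
    const uint8_t *bytes; // UTF-8, not NUL-terminated
} ToyName;

// Compare a length-prefixed name against a NUL-terminated C string.
bool toy_name_eq(const ToyName *name, const char *cstr) {
    assert(name != NULL && name->bytes != NULL && cstr != NULL);
    size_t clen = strlen(cstr);
    // Check lengths first so memcmp never reads past either buffer.
    return name->len == clen && memcmp(name->bytes, cstr, clen) == 0;
}
```

Checking the lengths before the bytes also prevents a prefix such as "hell" from matching an export named "hello".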

Finally, we’re ready to invoke the function. Invocation requires some setup, mainly initializing the stack and the function’s activation frame. But ultimately we arrive at the core of the interpreter: the opcode switch loop. The opcode switch loop walks the sequence of Wasm instructions in the function body, taking action on the stack and the store accordingly.

WasmResultKind wrun_exec_expr(WasmStore *store, WasmStack *stack, WasmInstruction *expr) {
    WasmInstruction* ip = expr;
    while (true) {
        switch (ip->opcode) {
            case WasmOpI32Const:
                wrun_stack_push_i32(stack, ip->params._const.value.i32);
                break;
            case WasmOpI64Const:
                wrun_stack_push_i64(stack, ip->params._const.value.i64);
                break;
            case WasmOpF32Const:
                wrun_stack_push_f32(stack, ip->params._const.value.f32);
                break;
            case WasmOpF64Const:
                wrun_stack_push_f64(stack, ip->params._const.value.f64);
                break;
            case WasmOpI32Ge_s: {
                WasmValue a;
                WasmValue b;
                wrun_stack_pop_val(stack, &b);
                wrun_stack_pop_val(stack, &a);
                WasmValue out = {
                    .num.i32 = a.num.i32 >= b.num.i32
                };
                wrun_stack_push_val(stack, &out);
                break;
            }
            case WasmOpCall: {
                // ...
                break;
            }
            case WasmOpNop:
                break;
            case WasmOpUnreachable:
                return Trap;
            case WasmOpExprEnd: {
                // ...
                return Ok;
            }

            // ...

            default:
                printf("unhandled opcode [%s]\n", wmod_str_opcode(ip->opcode));
                return Trap;
        }
        ip++;
    }
}
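
If the shape of that loop is hard to see through all the Wasm-specific types, here is a stripped-down toy version of the same technique: an instruction pointer walking an array, a switch on the opcode, and an operand stack. The opcodes and the ToyInstr type are invented for this sketch and have nothing to do with Wasm's real encodings or Semblance's structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Toy opcodes for illustration only; these are not Wasm's real encodings.
typedef enum { OP_CONST, OP_ADD, OP_GE_S, OP_END } ToyOp;

typedef struct {
    ToyOp op;
    int32_t imm; // immediate operand, used by OP_CONST only
} ToyInstr;

#define TOY_STACK_MAX 16

// Runs instructions until OP_END and returns the value on top of the stack.
int32_t toy_exec(const ToyInstr *ip) {
    int32_t stack[TOY_STACK_MAX];
    size_t sp = 0;
    for (;; ip++) {
        switch (ip->op) {
            case OP_CONST:
                assert(sp < TOY_STACK_MAX);
                stack[sp++] = ip->imm;
                break;
            case OP_ADD: {
                assert(sp >= 2);
                int32_t b = stack[--sp]; // right operand is on top
                int32_t a = stack[--sp];
                // Add as unsigned to get wrapping without signed overflow.
                stack[sp++] = (int32_t)((uint32_t)a + (uint32_t)b);
                break;
            }
            case OP_GE_S: {
                assert(sp >= 2);
                int32_t b = stack[--sp];
                int32_t a = stack[--sp];
                stack[sp++] = a >= b; // pushes 1 or 0
                break;
            }
            case OP_END:
                assert(sp >= 1);
                return stack[sp - 1];
        }
    }
}
```

Feeding it the sequence const 2, const 3, add, const 5, ge_s, end leaves 1 on the stack, because 2 + 3 >= 5. The real loop works the same way, just with many more cases and a store to consult.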

Let’s zoom in on the WasmOpCall case to see how a module invokes our env::puts host function through the call instruction. At a high level, we use the call opcode’s immediate parameter funcidx to resolve the function instance from the store. Using the type information in the function instance finst, we pop the function’s arguments off of the stack and then check finst->kind to determine whether we are calling a Wasm function or a native host function. If we are calling a host function, we perform a native call against the hostfunc function pointer with our popped args and then push any return values onto the stack.

case WasmOpCall: {
    WasmActivation *frame = wrun_stack_find_current_frame(stack);
    wasm_func_idx_t funcidx = ip->params.call.funcidx;
    wasm_func_addr_t funcaddr = *(wasm_func_addr_t*)vec_at(&frame->inst->funcaddrs, sizeof(wasm_func_addr_t), funcidx);
    WasmFuncInst *finst = vec_at(&store->funcs, sizeof(WasmFuncInst), funcaddr - 1);
    size_t numargs = finst->functype.input_type.len;
    VEC(WasmValue) args;
    vec_init_with_capacity(&args, sizeof(WasmValue), numargs);
    for (size_t i = 0; i < numargs; i++) {
        WasmValue argval;
        wrun_stack_pop_val(stack, &argval);
        vec_push_back(&args, sizeof(WasmValue), &argval);
    }
    switch (finst->kind) {
        case WasmFuncInstWasm: {
            // ...
        }
        case WasmFuncInstHost: {
            VEC(WasmValue) ret = ((WasmHostFunc)finst->val.hostfunc)(store, &args);
            for (size_t i = 0; i < ret.len; i++) {
                WasmValue *retval = vec_at(&ret, sizeof(WasmValue), i);
                wrun_stack_push_val(stack, retval);
            }
            vec_free(&ret);
            break;
        }
    }
    vec_free(&args);
    break;
}
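
One subtlety with popping arguments: the last argument of a call sits on top of the operand stack, so popping and appending yields the arguments in reverse. That makes no difference for a single-argument function like puts, but for functions with several parameters it matters which slot each value lands in. A sketch of one way to restore the declared order is to fill the argument array from the back. ToyStack and the toy_ functions are hypothetical stand-ins for WasmStack and its helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_CAP 32

// Hypothetical minimal operand stack, standing in for WasmStack.
typedef struct {
    int32_t vals[TOY_STACK_CAP];
    size_t len;
} ToyStack;

void toy_push(ToyStack *s, int32_t v) {
    assert(s->len < TOY_STACK_CAP);
    s->vals[s->len++] = v;
}

int32_t toy_pop(ToyStack *s) {
    assert(s->len > 0);
    return s->vals[--s->len];
}

// Pops `numargs` values into `args` so that args[0] is the first parameter.
// The top of the stack is the last argument, so fill from the back.
void toy_pop_args(ToyStack *s, int32_t *args, size_t numargs) {
    for (size_t i = numargs; i > 0; i--) {
        args[i - 1] = toy_pop(s);
    }
}
```

After pushing 10, 20 and 30 and popping two arguments this way, args holds 20 then 30, and 10 stays on the stack for the caller.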

Eventually, the opcode switch loop will arrive at the hello function’s end opcode, at which point we return to our main function and the program ends.

int main(int argc, char *argv[]) {

    // ...

    VEC(WasmValue) fn_args;
    vec_init(&fn_args);
    DynamicWasmResult wres = wrun_invoke_func(winst, export.val.func, &fn_args, &store);
    wrun_result_dump_dynamic(&wres);

    return wres.result.kind != Ok;
}

Putting it all together, we should see our message on standard output:

Hello, World!
Ok []

I can’t describe how satisfying it felt to see Hello, World! print to my terminal the first time it worked.

Outcome

In my eyes this project was a huge success. The opcode coverage isn’t anywhere near complete, but it can execute some simple “Hello, World!” programs. The code is messy, it’s slow, it leaks memory, and I’m confident it’s vulnerable to malicious modules, but it works! Plus it taught me a ton about the core WebAssembly specification and got me out of my comfort zone as an engineer.

After hacking together my own interpreter, I feel that I have enough WebAssembly-specific knowledge to begin contributing to an industrial-grade runtime like Wasmtime.

Over time I’ll continue hacking on a few opcodes here and there. But I think it’s time to retire this as my primary project. I’m not sure what’s next yet, but my gauge for what I can accomplish as a side project has been recalibrated.