Fast Userspace Kernel (FUsK)

A draft/unimplemented design of the proposed concept

Background and review of past literature

There are generally two approaches to intraprocess isolation: MPK-based techniques and virtualisation-based techniques.

...

Design

Goal: This technique aims to enable fast IPC without directly involving the kernel. As a result, it can also be used to run a limited subset of unprivileged kernel code that only requires a separate address space. The Client (C) must switch to the Server (S) address space (AS), as defined by the CR3 structures, without writing to the CR3 register, because C runs in unprivileged Guest user mode, where CR3 cannot be written.

Intel VT-x: To realize this, we leverage the virtualisation features present in newer Intel processors. Here, there are 6 privileged domains divided between "Guest" and "Host"/"Root".

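With EPT enabled, memory accesses are translated in two stages: virtual addresses (VA) to guest-physical addresses (GPA), then GPA to host-physical addresses (HPA). VA-to-GPA translation uses the page tables rooted at CR3, while GPA-to-HPA translation uses the Extended Page Tables (EPT). Because CR3 and the page-table entries themselves hold GPAs, the page-table walk is also translated through the EPT.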
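
The two translation stages can be sketched with a toy model. This is only an illustration under simplifying assumptions: single-level tables, an 8-page address space, and hypothetical names (toy_table, toy_walk, toy_translate) that do not come from the design. Real page tables and EPTs are multi-level and carry permission bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  ((1ull << PAGE_SHIFT) - 1)
#define NPAGES     8            /* toy address space: 8 pages */
#define NO_MAPPING UINT64_MAX   /* marks a missing translation */

/* Toy single-level table: index = source page number, value = target page number. */
typedef struct { uint64_t page[NPAGES]; } toy_table;

/* Mark every entry of a table as not present. */
static void toy_table_init(toy_table *t)
{
    for (size_t i = 0; i < NPAGES; i++)
        t->page[i] = NO_MAPPING;
}

/* Translate one address through one table; NO_MAPPING models a fault. */
static uint64_t toy_walk(const toy_table *t, uint64_t addr)
{
    uint64_t pn = addr >> PAGE_SHIFT;
    if (pn >= NPAGES || t->page[pn] == NO_MAPPING)
        return NO_MAPPING;
    return (t->page[pn] << PAGE_SHIFT) | (addr & PAGE_MASK);
}

/* Stage 1: VA -> GPA via the guest page table. Stage 2: GPA -> HPA via the EPT. */
static uint64_t toy_translate(const toy_table *guest_pt, const toy_table *ept,
                              uint64_t va)
{
    assert(guest_pt != NULL && ept != NULL);
    uint64_t gpa = toy_walk(guest_pt, va);
    if (gpa == NO_MAPPING)
        return NO_MAPPING;       /* would be a guest page fault */
    return toy_walk(ept, gpa);   /* NO_MAPPING here models an EPT violation */
}
```

For example, if the guest table maps VA page 1 to GPA page 3 and the EPT maps GPA page 3 to HPA page 5, then toy_translate on VA 0x1234 yields HPA 0x5234.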

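EPTs can be switched from the unprivileged Guest through the VMFUNC instruction (VM function 0, EPTP switching), which allows switching to one of a predetermined list of EPTs held in the EPT-list (EPTL); see SDM vol. 3 section 25.5.6-3. Unlike VMCALL, which always exits to the hypervisor, a successful VMFUNC completes without leaving the Guest.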
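
The guest-visible behaviour of the switch can be modelled in a few lines. The sketch below is a simulation, not hardware code: toy_vcpu and toy_vmfunc are hypothetical names, only VM function 0 is assumed to be enabled, and "entry is zero" stands in for the EPTP validity checks that real hardware performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EPTL_ENTRIES 512   /* the list is one 4 KiB page of 8-byte entries */

typedef struct {
    uint64_t eptl[EPTL_ENTRIES]; /* predetermined list of EPT pointers */
    uint64_t active_ept;         /* EPT currently used for GPA -> HPA */
} toy_vcpu;

/*
 * Model of VMFUNC: EAX selects the VM function, ECX the list index.
 * Returns true if the switch completes inside the guest, false where
 * this model treats the request as a VM exit to the hypervisor.
 */
static bool toy_vmfunc(toy_vcpu *v, uint32_t eax, uint32_t ecx)
{
    assert(v != NULL);
    if (eax != 0)                /* assumption: only function 0 is enabled */
        return false;
    if (ecx >= EPTL_ENTRIES)     /* index outside the list */
        return false;
    uint64_t ept = v->eptl[ecx];
    if (ept == 0)                /* simplification: zero means "invalid entry" */
        return false;
    v->active_ept = ept;         /* later accesses use the new EPT */
    return true;
}
```

The property that matters for this design is that the guest can only ever select an entry that is already in the list, so whoever controls the list controls which address spaces are reachable.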

High-Level Design Overview

FUsK leverages VMFUNC to switch between address spaces. We insert a trampoline (T) in its own address space, which acts as a barrier between C and S. The trampoline's role is similar to a "Fast Userspace Kernel".

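We construct an EPT that maps the GPA held in C's CR3 (CR3-C) to the HPA of T's page-table root (CR3-T). Once VMFUNC switches to this EPT (ref. 25.5.6-3), subsequent virtual memory accesses are translated by T's paging structures through the new EPT, even though the value in the CR3 register is unchanged.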

instruction A  // Translated through the CR3-C structures
VMFUNC         // EAX = 0 (EPTP switching), ECX = 0 (EPTL index)
instruction B  // Translated through the CR3-T structures
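
Why the switch changes the address space without touching CR3 can be shown with a small simulation. Everything here is an assumption made for illustration: page numbers stand in for addresses, page tables have a single level, and the names (toy_ept, toy_pt, toy_host_mem, toy_vcpu, toy_translate) are hypothetical. The point is only that CR3 holds a GPA, so the EPT decides which host page is used as the page-table root.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NPAGES 8
#define NO_MAP 0xFFu   /* marks "not present" in every toy table */

typedef struct { uint8_t hpa_of[NPAGES]; } toy_ept;      /* GPA page -> HPA page */
typedef struct { uint8_t gpa_of[NPAGES]; } toy_pt;       /* VA page  -> GPA page */
typedef struct { toy_pt page[NPAGES]; } toy_host_mem;    /* host pages holding page tables */

typedef struct {
    uint8_t        cr3;  /* GPA page of the page-table root; never written here */
    const toy_ept *ept;  /* active EPT; the only state the switch changes */
} toy_vcpu;

static void toy_ept_clear(toy_ept *e)      { memset(e, NO_MAP, sizeof *e); }
static void toy_mem_clear(toy_host_mem *m) { memset(m, NO_MAP, sizeof *m); }

static uint8_t toy_ept_lookup(const toy_ept *e, uint8_t gpa_page)
{
    return gpa_page < NPAGES ? e->hpa_of[gpa_page] : NO_MAP;
}

/* Find the root through the EPT, read the entry, then translate the result. */
static uint8_t toy_translate(const toy_vcpu *v, const toy_host_mem *m,
                             uint8_t va_page)
{
    assert(va_page < NPAGES);
    uint8_t root_hpa = toy_ept_lookup(v->ept, v->cr3);
    if (root_hpa >= NPAGES)
        return NO_MAP;                       /* root not reachable: EPT violation */
    uint8_t gpa_page = m->page[root_hpa].gpa_of[va_page];
    return toy_ept_lookup(v->ept, gpa_page); /* NO_MAP if unmapped at either stage */
}
```

If one EPT maps the CR3 page to C's table and another maps the same page to T's table, pointing v.ept at the second one makes the same VA resolve through T's table while v.cr3 stays the same.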

The EPTL-C will contain one entry: the EPT describing the mapping CR3-C -> CR3-T. AS-C must contain an unmappable region of memory where the code for T will reside. Furthermore, AS-T contains mappings only for T's code and data, which reside at virtual addresses that are inaccessible to (in particular, unwritable by) C.

A single page is inserted such that the intersection of the VAs addressable in AS-T and AS-C is exactly that page; in AS-T, it contains the entry point of the T code (as shown below).

     ┌────────────────────────┐
     │Closeup of VMFUNC Region│
     └────────────────────────┘

      AS-C            AS-T
     ┌────────┐      ┌────────┐
     │        │      │        │
W^X  │ Mapped │      │        │
     │        │      │Unmapped│
     ├────────┤      │        │
     │...     │      │        │
     │INSTR A │      ├────────┤ AS-T intersects
 RX  │VMFUNC  ├─────►│ EntryT │ AS-C at entry
     │...     │      │        │
     │        │      │        │
     ├────────┤      │        │
     │        │      │ Mapped │
     │Unmapped│      │        │
     │        │      │        │
     └────────┘      └────────┘

The AS-T data will contain the EPTL, mapped by T's own CR3 structures. AS-T data should also hold the caller information (i.e., which client made the call), and T should perform permission checks before yielding control to the server.

A similar mechanism is used to switch to AS-S: T overwrites the EPTL so that it includes only EPT-S and EPT-T, then enters S at a known location. Switching from AS-T back to AS-C proceeds similarly, with T switching back into the unwritable region in AS-C.
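
The trampoline's bookkeeping before entering S might look roughly like the sketch below. It is a model under stated assumptions, not the design's implementation: the permission matrix may_call, the slot assignment (SLOT_T, SLOT_S), the limits MAX_CLIENTS and MAX_SERVERS, and the function toy_prepare_call are all hypothetical, and EPT pointers are treated as opaque 64-bit values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EPTL_ENTRIES 512
#define MAX_CLIENTS  4   /* hypothetical limits for the sketch */
#define MAX_SERVERS  4

enum { SLOT_T = 0, SLOT_S = 1 };   /* hypothetical slot assignment */

/* State kept in AS-T data, out of reach of C and S. */
typedef struct {
    uint64_t eptl[EPTL_ENTRIES];                  /* the EPT list */
    uint32_t caller;                              /* client that made the call */
    bool     may_call[MAX_CLIENTS][MAX_SERVERS];  /* hypothetical permission matrix */
} toy_trampoline;

/*
 * Check that `client` may call `server`; on success record the caller and
 * rewrite the list so that only EPT-T and EPT-S are reachable.
 * On failure the list is left untouched and false is returned.
 */
static bool toy_prepare_call(toy_trampoline *t, uint32_t client, uint32_t server,
                             uint64_t ept_t, uint64_t ept_s)
{
    assert(t != NULL);
    if (client >= MAX_CLIENTS || server >= MAX_SERVERS)
        return false;
    if (!t->may_call[client][server])
        return false;
    t->caller = client;
    memset(t->eptl, 0, sizeof t->eptl);  /* drop every other entry */
    t->eptl[SLOT_T] = ept_t;
    t->eptl[SLOT_S] = ept_s;
    return true;
}
```

After a successful toy_prepare_call, T would switch to the entry in SLOT_S. Keeping EPT-T in the list gives S a way back into the trampoline, and clearing the remaining entries keeps S from switching anywhere else.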

Threat model

We make no assumptions about the client and server, though we assume the trampoline is trusted.
