Microkernel Design

A rough draft of a (not really) microkernel

Philosophy and Design

This design does not adhere to a single strict guideline. It follows the three design principles of a microkernel (policy, isolation, and minimalism) only loosely:

  1. Policy: The system should only enforce a minimal set of policies on the userspace.

  2. Isolation: The kernel design should not intrude on the function of programs. Furthermore, clients should not easily influence the behaviour of servers. It should be difficult for any single program to crash the entire system.

  3. Minimalism: The interface methods between the kernel and userspace should not be heavily overloaded, yet only a minimal set of primitives should be exposed. If something can be removed from the kernel, it should be removed, but only if none of the previous points are violated.

We can consider this design a weak microkernel variant, or even a hybrid design.

To reduce the number of context switches while still complying (mostly) with the above guidelines, the kernel is responsible for these 3 tasks:

  1. Process/thread management and scheduling

  2. Physical and virtual memory management

  3. Interprocess communication

Some consideration may be given in the future regarding:

  1. Userspace thread scheduling

  2. Userspace memory management

Missing core kernel elements that still need to be decided upon:

Kernel Objects

Kernel objects are allocated in kernel space. Because only 64-bit targets are supported, the size of the address space is not a practical limitation.

References are per-process and an object can have multiple references. When a reference is freed, the object's refcount is decreased by 1. When the count reaches 0, the object is freed by the GC.

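// Sample reference implementation
#include <stdint.h>

struct kobj;                  // opaque kernel object, defined elsewhere

struct kref {
    struct kobj *object;      // object this reference points to
    uint32_t     ability;     // rights granted through this reference
    uint32_t     proc_owner;  // process that owns this reference
};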

Objects are exposed to the userspace through references. References can be transferred to another process, duplicated and re-instantiated with different rights.
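The lifetime rule for references can be sketched as follows. This is a minimal model under stated assumptions, not the kernel's implementation: the `refcount` and `freed` fields and the helpers `kref_acquire` and `kref_release` are hypothetical names, and the GC is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object header; the text only says objects are refcounted. */
struct kobj {
    uint32_t refcount;  /* number of live references */
    bool     freed;     /* stands in for "reclaimed by the GC" */
};

struct kref {
    struct kobj *object;
    uint32_t     ability;
    uint32_t     proc_owner;
};

/* Create a reference to obj for process `proc` and bump the refcount. */
static struct kref kref_acquire(struct kobj *obj, uint32_t ability, uint32_t proc)
{
    assert(obj != NULL);
    obj->refcount++;
    return (struct kref){ .object = obj, .ability = ability, .proc_owner = proc };
}

/* Drop a reference; the object is reclaimed when the count reaches 0. */
static void kref_release(struct kref *ref)
{
    assert(ref != NULL && ref->object != NULL && ref->object->refcount > 0);
    if (--ref->object->refcount == 0)
        ref->object->freed = true;
    ref->object = NULL;  /* the reference is no longer usable */
}
```

In this model two processes holding references to the same object each call `kref_release` once, and only the second call marks the object as freed.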

| Method | Parameters | Description |
| --- | --- | --- |
| cx_ref_free | ref | Frees the reference ref. |
| cx_ref_control | ref ability &out | Duplicates the reference ref with new rights. |
| cx_ref_replace | ref ability &out | Replaces rights of ref and invalidates ref. |

The above table lists the 3 logical operations that can be performed on references.
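The difference between duplicating and replacing a reference can be sketched with a toy per-process reference table. Everything here is an assumption made for illustration: the table layout, the `ref_*` function names, the `-1` error return, and in particular the rule that new rights must be a subset of the old ones (the text only says "new rights").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REFS 8          /* hypothetical per-process table size */

struct ref_slot {
    bool     valid;
    uint32_t object_id;     /* stands in for the kobj pointer */
    uint32_t ability;       /* rights bit mask */
};

static struct ref_slot table[MAX_REFS];

/* Find a free slot and fill it; returns the slot index or -1 if full. */
static int ref_install(uint32_t object_id, uint32_t ability)
{
    for (int i = 0; i < MAX_REFS; i++) {
        if (!table[i].valid) {
            table[i] = (struct ref_slot){ true, object_id, ability };
            return i;
        }
    }
    return -1;
}

static bool ref_ok(int ref)
{
    return ref >= 0 && ref < MAX_REFS && table[ref].valid;
}

/* Like cx_ref_control: duplicate `ref` with new rights; `ref` stays valid. */
static int ref_control(int ref, uint32_t ability, int *out)
{
    assert(out != NULL);
    if (!ref_ok(ref) || (ability & ~table[ref].ability) != 0)
        return -1;                       /* bad ref, or rights would grow */
    int slot = ref_install(table[ref].object_id, ability);
    if (slot < 0)
        return -1;                       /* table is full */
    *out = slot;
    return 0;
}

/* Like cx_ref_replace: same as control, but the old reference is invalidated. */
static int ref_replace(int ref, uint32_t ability, int *out)
{
    if (ref_control(ref, ability, out) != 0)
        return -1;
    table[ref].valid = false;
    return 0;
}
```

Building replace on top of control keeps the rights check in one place. The cost in this sketch is that replace fails when the table is full, even though invalidating the old slot first would have made room.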

Asynchronous IPC

Instead of associating IPC endpoints with threads, they are described by channel endpoint objects. Endpoints are transferable, duplicable XOR readable and writable.

Each endpoint object belongs to at most one thread and supports only asynchronous IPC primitives. Channels may transfer both references and data from one endpoint to another.

In-transit messages are buffered in a kernel queue. If the queue fills up, the kernel refuses to queue further messages and returns an error. In-transit messages are kept alive until both endpoints are destroyed. Unpaired endpoints can only be read from.
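The queueing behaviour can be sketched with a fixed-capacity ring buffer. The capacity, the message size limit, the `queue_*` names, and the error codes are all hypothetical. Leaving a message queued when the reader's buffer is too small is also an assumption of this sketch, not something the design specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 4    /* hypothetical queue depth */
#define MSG_MAX   32   /* hypothetical maximum message size */

enum { Q_OK = 0, Q_ERR_FULL = -1, Q_ERR_EMPTY = -2, Q_ERR_TOO_BIG = -3 };

struct msg { size_t len; unsigned char data[MSG_MAX]; };

struct msg_queue {
    struct msg slots[QUEUE_CAP];
    size_t head;   /* index of the oldest message */
    size_t count;  /* number of in-transit messages */
};

/* Queue a copy of the message, or fail without blocking if the queue is full. */
static int queue_send(struct msg_queue *q, const void *buf, size_t len)
{
    assert(q != NULL && buf != NULL);
    if (len > MSG_MAX)
        return Q_ERR_TOO_BIG;
    if (q->count == QUEUE_CAP)
        return Q_ERR_FULL;
    struct msg *m = &q->slots[(q->head + q->count) % QUEUE_CAP];
    m->len = len;
    memcpy(m->data, buf, len);
    q->count++;
    return Q_OK;
}

/* Copy out the oldest message; *actual always receives its real length. */
static int queue_read(struct msg_queue *q, void *buf, size_t bufsz, size_t *actual)
{
    assert(q != NULL && buf != NULL && actual != NULL);
    if (q->count == 0)
        return Q_ERR_EMPTY;
    struct msg *m = &q->slots[q->head];
    *actual = m->len;
    if (m->len > bufsz)
        return Q_ERR_TOO_BIG;      /* message stays queued; caller can retry */
    memcpy(buf, m->data, m->len);
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return Q_OK;
}
```

A sender that receives `Q_ERR_FULL` has to retry later or drop the message itself; the queue never blocks, which matches the asynchronous-only model.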

| Method | Parameters | Description |
| --- | --- | --- |
| cx_endpoint_create | &out1 &out2 | Creates an endpoint. out1 gives the r/w variant while out2 gives the read-only variant. |
| cx_endpoint_send | ref msg[] refs[] msgsz refsz | Sends the message in the buffer msg of length msgsz along with an array of references in the buffer refs of length refsz to the target endpoint denoted by ref. |
| cx_endpoint_read | ref msg[] refs[] msgsz refsz &vmsgsz &vrefsz | Reads a message from the endpoint denoted by ref into buffers msg and refs of lengths msgsz and refsz respectively. The actual message and reference sizes are returned in vmsgsz and vrefsz. |

The above table lists 3 logical operations that can be performed on the IPC endpoint pairs.

Notifications

Endpoint pairs can be signaled and waited upon through notifications. Unlike L4, a thread can wait on multiple endpoints. This is done by marking endpoints as "waiting" and placing the thread in a WAIT state. When any paired endpoint signals, we check if the endpoint partner is waiting and unblock accordingly.

Deadlocks can easily form through dependency cycles. Dependency-based schedulers must check for cycles; otherwise the entire system can freeze.
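One way such a cycle check could look is a depth-first search over a wait-for graph. This is only a sketch: the adjacency-matrix representation, the thread limit, and the names `wait_graph` and `has_wait_cycle` are hypothetical and not part of the design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_THREADS 8   /* hypothetical limit for the sketch */

/* waits_on[a][b] is true when thread a is blocked waiting for thread b. */
struct wait_graph { bool waits_on[MAX_THREADS][MAX_THREADS]; };

enum visit { UNSEEN = 0, IN_PROGRESS, DONE };

/* Depth-first walk from thread t; returns true if a cycle is reachable. */
static bool visit_thread(const struct wait_graph *g, int t, enum visit *state)
{
    state[t] = IN_PROGRESS;
    for (int next = 0; next < MAX_THREADS; next++) {
        if (!g->waits_on[t][next])
            continue;
        if (state[next] == IN_PROGRESS)
            return true;                       /* back edge: cycle found */
        if (state[next] == UNSEEN && visit_thread(g, next, state))
            return true;
    }
    state[t] = DONE;
    return false;
}

/* Returns true if any chain of waits leads back to its starting thread. */
static bool has_wait_cycle(const struct wait_graph *g)
{
    enum visit state[MAX_THREADS];
    assert(g != NULL);
    memset(state, 0, sizeof state);            /* every thread starts UNSEEN */
    for (int t = 0; t < MAX_THREADS; t++)
        if (state[t] == UNSEEN && visit_thread(g, t, state))
            return true;
    return false;
}
```

A scheduler could run this check before blocking a thread on a new dependency and refuse the wait if it would close a cycle.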

Internally, notifications are represented by a single bit that can only be asserted. The kernel will de-assert the bit upon successful delivery. If there are no listeners, the notification will be dropped.

| Method | Parameters | Description |
| --- | --- | --- |
| cx_pair_notify | ref mask | Sets the signal bits on the target endpoint pair. |
| cx_pair_wait_many | ref[] mask[] len timeout | Waits for any signal on many endpoints until the given timeout. |

The above table outlines the 2 primitives used for notifications.
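The notification semantics described in this section (assert-only bits, de-assert on delivery, drop when nobody listens) can be modelled in a few lines. This is a sketch under assumptions: signals are modelled as a 32-bit mask per endpoint because `cx_pair_notify` takes a mask, the timeout is omitted, and the struct fields and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct endpoint {
    bool     waiting;    /* a thread has marked this endpoint as "waiting" */
    uint32_t wait_mask;  /* signal bits the waiter is interested in */
    uint32_t pending;    /* asserted but not yet delivered bits */
};

/* Assert signal bits on `target`. Returns false if the signal was dropped. */
static bool pair_notify(struct endpoint *target, uint32_t mask)
{
    assert(target != NULL);
    uint32_t hit = mask & target->wait_mask;
    if (!target->waiting || hit == 0)
        return false;            /* no listener for these bits: dropped */
    target->pending |= hit;      /* bits are only ever asserted here */
    return true;
}

/* Mark each endpoint as waiting; the caller would then enter the WAIT state. */
static void wait_many_begin(struct endpoint *eps[], const uint32_t masks[], size_t len)
{
    for (size_t i = 0; i < len; i++) {
        eps[i]->waiting = true;
        eps[i]->wait_mask = masks[i];
    }
}

/* Called once the thread is unblocked: deliver the first pending signal,
 * de-assert it, and clear all wait marks. Returns the index of the endpoint
 * that fired, or -1 if nothing was pending (e.g. after a timeout). */
static int wait_many_finish(struct endpoint *eps[], size_t len, uint32_t *bits)
{
    int fired = -1;
    for (size_t i = 0; i < len; i++) {
        if (fired < 0 && eps[i]->pending != 0) {
            *bits = eps[i]->pending;
            eps[i]->pending = 0;     /* de-assert on successful delivery */
            fired = (int)i;
        }
        eps[i]->waiting = false;
    }
    return fired;
}
```

Because a signal sent before `wait_many_begin` is dropped rather than latched, a waiter in this model has to tolerate missed notifications, for example by checking the message queue before it blocks.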

Memory Management

| Name | Object | Description |
| --- | --- | --- |
| Virtual Address Space | aspace | Allocated per process; it is the root region. |
| Virtual Memory Region | region | Contains a single address range allocated from a parent region. |
| Virtual Memory Object | vmo | A list of vm_page entries that can be mapped into regions. |

Instead of working with single-page mappings, we introduce virtual memory regions into which memory from a pool of physical pages, the VMO, can be mapped.

| Method | Parameters | Description |
| --- | --- | --- |
| cx_vmo_create | size options pager &out | Allocates a VMO of the given size with an optional pager thread. |
| cx_vmo_control | ref options | Changes options intrinsic to the VMO (e.g., whether it is executable). Can also be used to destroy the VMO. |
| cx_vmr_create | ref options size offset &outvmr &outadr | Allocates an address range from the parent VMR. |
| cx_vmr_control | ref options size offset | Changes the options of a given range in the VMR. Can also be used to unmap VMR sub-regions or to destroy the VMR. |
| cx_vmr_map | ref options size vmr_offset vmo vmo_offset | Maps size bytes from offset vmo_offset in vmo to vmr_offset in the region ref. Lengths and offsets should be page aligned. |

The above table outlines 5 memory primitives that operate on VMRs and VMOs.
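The alignment and range requirements on cx_vmr_map can be illustrated by the kind of validation a mapping call might perform. This is a sketch, not the kernel's code: the 4 KiB page size, the struct layouts, the error codes, and the function `vmr_map_check` are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size */

enum { MAP_OK = 0, MAP_ERR_ALIGN = -1, MAP_ERR_RANGE = -2 };

struct vmo    { uint64_t size; };         /* bytes of backing pages */
struct region { uint64_t base, size; };   /* address range from the parent */

static int is_page_aligned(uint64_t v) { return (v % PAGE_SIZE) == 0; }

/* Validate a mapping request and compute the resulting virtual address. */
static int vmr_map_check(const struct region *vmr, const struct vmo *vmo,
                         uint64_t size, uint64_t vmr_offset,
                         uint64_t vmo_offset, uint64_t *out_addr)
{
    assert(vmr != NULL && vmo != NULL && out_addr != NULL);
    if (size == 0)
        return MAP_ERR_RANGE;
    if (!is_page_aligned(size) || !is_page_aligned(vmr_offset) ||
        !is_page_aligned(vmo_offset))
        return MAP_ERR_ALIGN;
    /* Written as subtractions so the bounds checks cannot overflow. */
    if (vmr_offset > vmr->size || size > vmr->size - vmr_offset)
        return MAP_ERR_RANGE;
    if (vmo_offset > vmo->size || size > vmo->size - vmo_offset)
        return MAP_ERR_RANGE;
    *out_addr = vmr->base + vmr_offset;
    return MAP_OK;
}
```

For a four-page region based at 0x100000 and a two-page VMO, mapping one page at region offset `PAGE_SIZE` passes and yields address 0x101000, while a three-page request fails the VMO bounds check.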

Userspace Pager

These operations only work on VMOs registered with a custom userspace pager thread. The kernel redirects operations performed on a VMO that has a pager to the endpoint of its pager thread.

| Method | Parameters | Description |
| --- | --- | --- |
| cx_vmo_transfer | vmo1 vmo2 offset1 offset2 length | Transfers pages from vmo1 to vmo2 where vmo1 must be a non-pager object (i.e., kernel-supplied pager object). |

The above table outlines 1 memory primitive that can be used to create a userspace pager.

Processes and Threads

A thread is the single unit of execution while processes can own multiple threads. Threads of the same process share one address space while different processes have different address spaces.

The syscalls for thread and process control are listed below.

| Method | Parameters | Description |
| --- | --- | --- |
| cx_thread_create | ref &out | Creates a thread in process ref. |
| cx_thread_control | control data | TBD |
| cx_process_create | &out &vmr | Creates a process and returns the VMR. |
| cx_process_control | control data | TBD |

The above table outlines 4 primitives used for process and thread control.
