Microkernel Design
A rough draft of a (not really) microkernel
Philosophy and Design
There is no single strict guideline being adhered to. The guidelines below differ somewhat from the strict design principles of a microkernel, namely policy, isolation and minimalism.
Policy: The system should only enforce a minimal set of policies on the userspace.
Isolation: The kernel design should not intrude on the function of programs. Furthermore, clients should not easily influence the behaviour of servers. It should be difficult for any single program to crash the entire system.
Minimalism: The interface methods between the kernel and userspace should not be heavily overloaded, yet only a minimal set of primitives should be exposed. If something can be removed from the kernel, it should be, but only if none of the previous points are violated.
We can consider this design a weak microkernel variant, or even a hybrid design.
To reduce the number of context switches while still complying (mostly) with the above guidelines, the kernel is responsible for these 3 tasks:
Process/thread management and scheduling
Physical and virtual memory management
Interprocess communication
Some consideration may be given in the future regarding:
Userspace thread scheduling
Userspace memory management
Missing core kernel elements that still need to be decided upon:
Kernel Objects
Kernel objects are allocated in kernel space. Because only 64-bit targets are supported, the size of the address space is not a practical limitation.
References are per-process and an object can have multiple references. When a reference is freed, the object's refcount is decreased by 1. When the count reaches 0, the object is freed by the GC.
Objects are exposed to the userspace through references. References can be transferred to another process, duplicated and re-instantiated with different rights.
| Method | Parameters | Description |
| --- | --- | --- |
| `cx_ref_free` | `ref` | Frees the reference `ref`. |
| `cx_ref_control` | `ref`, `ability`, `&out` | Duplicates the reference `ref` with new rights. |
| `cx_ref_replace` | `ref`, `ability`, `&out` | Replaces rights of `ref` and invalidates `ref`. |
The above table lists the 3 logical operations that can be performed on references.
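The reference operations above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the kernel's implementation: the `kobject` and `kref` structs, the `valid` flag, and the helper `ref_new` are hypothetical names introduced for illustration, and the GC is reduced to a `freed` flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical kernel object: only the reference count is modelled. */
typedef struct {
    int refcount;
    bool freed;      /* set when the count drops to 0 (stands in for the GC) */
} kobject;

/* Hypothetical per-process reference: an object pointer plus a rights mask. */
typedef struct {
    kobject *obj;
    uint32_t rights;
    bool valid;
} kref;

/* Create the first reference to an object (illustrative helper). */
static kref ref_new(kobject *obj, uint32_t rights) {
    obj->refcount = 1;
    obj->freed = false;
    return (kref){ .obj = obj, .rights = rights, .valid = true };
}

/* Models cx_ref_free: drop one reference; the object dies at count 0. */
static int ref_free(kref *ref) {
    if (!ref->valid) return -1;
    assert(ref->obj->refcount > 0);
    ref->valid = false;
    if (--ref->obj->refcount == 0) ref->obj->freed = true;
    return 0;
}

/* Models cx_ref_control: duplicate ref with new rights; ref stays valid. */
static int ref_control(const kref *ref, uint32_t ability, kref *out) {
    if (!ref->valid) return -1;
    ref->obj->refcount++;
    *out = (kref){ .obj = ref->obj, .rights = ability, .valid = true };
    return 0;
}

/* Models cx_ref_replace: a new reference with new rights; ref is invalidated.
   The count is unchanged because one reference takes the place of another. */
static int ref_replace(kref *ref, uint32_t ability, kref *out) {
    if (!ref->valid) return -1;
    *out = (kref){ .obj = ref->obj, .rights = ability, .valid = true };
    ref->valid = false;
    return 0;
}
```

Whether new rights must be a subset of the old ones is not stated in this draft, so the sketch does not enforce any relationship between them.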
Asynchronous IPC
Instead of associating IPC endpoints with threads, they are described by channel endpoint objects. Endpoints are transferable. An endpoint is either duplicable, or readable and writable, but never the two together (XOR).
Each endpoint object belongs to at most one thread and only supports asynchronous IPC primitives. Channels may transfer both references and data from one endpoint to another.
In-transit messages are buffered in a kernel queue. If the queue fills up, the kernel rejects any further messages and returns an error. In-transit messages are kept alive until both endpoints are destroyed. Unpaired endpoints can only be read from.
| Method | Parameters | Description |
| --- | --- | --- |
| `cx_endpoint_create` | `&out1`, `&out2` | Creates an endpoint. `out1` gives the r/w variant while `out2` gives the read-only variant. |
| `cx_endpoint_send` | `ref`, `msg[]`, `refs[]`, `msgsz`, `refsz` | Sends the message in the buffer `msg` of length `msgsz` along with an array of references in the buffer `refs` of length `refsz` to the target endpoint denoted by `ref`. |
| `cx_endpoint_read` | `ref`, `msg[]`, `refs[]`, `msgsz`, `refsz`, `&vmsgsz`, `&vrefsz` | Reads a message from the endpoint denoted by `ref` into buffers `msg` and `refs` of lengths `msgsz` and `refsz` respectively. The actual buffer sizes are passed as `vmsgsz` and `vrefsz`. |
The above table lists 3 logical operations that can be performed on the IPC endpoint pairs.
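The buffering behaviour described above (a bounded kernel queue that rejects sends when full, and non-blocking reads that report the actual message size) can be sketched as a ring buffer. Everything here is an assumption made for illustration: the queue depth, the maximum payload size, the error codes, and the `channel_*` names are hypothetical, and reference transfer is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_CAP 4      /* hypothetical queue depth */
#define MSG_MAX   64     /* hypothetical maximum payload size */

enum { CH_OK = 0, CH_ERR_FULL = -1, CH_ERR_EMPTY = -2, CH_ERR_SIZE = -3 };

typedef struct {
    unsigned char data[MSG_MAX];
    size_t len;
} message;

/* One direction of a channel: a ring buffer of in-transit messages. */
typedef struct {
    message slots[QUEUE_CAP];
    size_t head;   /* index of the oldest message */
    size_t count;  /* number of queued messages */
} channel;

/* Models cx_endpoint_send: never blocks; fails once the queue is full. */
static int channel_send(channel *ch, const void *msg, size_t msgsz) {
    assert(ch->count <= QUEUE_CAP);
    if (msgsz > MSG_MAX) return CH_ERR_SIZE;
    if (ch->count == QUEUE_CAP) return CH_ERR_FULL;
    message *slot = &ch->slots[(ch->head + ch->count) % QUEUE_CAP];
    memcpy(slot->data, msg, msgsz);
    slot->len = msgsz;
    ch->count++;
    return CH_OK;
}

/* Models cx_endpoint_read: never blocks; reports the real size in *vmsgsz. */
static int channel_read(channel *ch, void *msg, size_t msgsz, size_t *vmsgsz) {
    if (ch->count == 0) return CH_ERR_EMPTY;
    message *slot = &ch->slots[ch->head];
    *vmsgsz = slot->len;
    if (slot->len > msgsz) return CH_ERR_SIZE;  /* message stays queued */
    memcpy(msg, slot->data, slot->len);
    ch->head = (ch->head + 1) % QUEUE_CAP;
    ch->count--;
    return CH_OK;
}
```

Reporting the required size even when the caller's buffer is too small lets the reader retry with a larger buffer without losing the message. That retry behaviour is a design choice of this sketch, not something the draft specifies.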
Notifications
Endpoint pairs can be signaled and waited upon through notifications. Unlike L4, a thread can wait on multiple endpoints. This is done by marking endpoints as "waiting" and placing the thread in a WAIT state. When any paired endpoint signals, we check if the endpoint partner is waiting and unblock accordingly.
Deadlocks can easily form through dependency cycles. Dependency-based schedulers must check for cycles; otherwise, the entire system can freeze.
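The draft does not say how cycles would be detected. One common approach, shown here purely as an illustration, is a depth-first search over a wait-for graph. The `wait_graph` type, the `MAX_THREADS` bound, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_THREADS 8   /* hypothetical upper bound */

/* waits_on[a][b] is true when thread a is blocked waiting for thread b. */
typedef struct {
    bool waits_on[MAX_THREADS][MAX_THREADS];
} wait_graph;

enum { UNVISITED = 0, IN_PROGRESS, DONE };

/* Depth-first search; reaching an IN_PROGRESS node means a back edge. */
static bool visit(const wait_graph *g, int t, int state[MAX_THREADS]) {
    assert(t >= 0 && t < MAX_THREADS);
    state[t] = IN_PROGRESS;
    for (int next = 0; next < MAX_THREADS; next++) {
        if (!g->waits_on[t][next]) continue;
        if (state[next] == IN_PROGRESS) return true;
        if (state[next] == UNVISITED && visit(g, next, state)) return true;
    }
    state[t] = DONE;
    return false;
}

/* Returns true if any chain of waits leads back to its starting thread. */
static bool has_wait_cycle(const wait_graph *g) {
    int state[MAX_THREADS] = { UNVISITED };
    for (int t = 0; t < MAX_THREADS; t++) {
        if (state[t] == UNVISITED && visit(g, t, state)) return true;
    }
    return false;
}
```

Because a thread here may wait on any of several endpoints, a cycle in this graph does not always mean a deadlock: a thread in the cycle could still be woken by an endpoint outside it. The check is therefore conservative.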
Internally, notifications are represented by a single bit that can only be asserted. The kernel will de-assert the bit upon successful delivery. If there are no listeners, the notification will be dropped.
| Method | Parameters | Description |
| --- | --- | --- |
| `cx_pair_notify` | `ref`, `mask` | Sets the signal bits on the target endpoint pair. |
| `cx_pair_wait_many` | `ref[]`, `mask[]`, `len`, `timeout` | Waits for any signal on many endpoints until the given timeout. |
The above table outlines the 2 primitives used for notifications.
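The notification rules above (bits can only be asserted, they are de-asserted on delivery, and a signal with no listener is dropped) can be modelled single-threaded by splitting the wait into a "begin" and a "finish" step. This is a sketch under assumptions: the `endpoint` fields and the `wait_begin`, `pair_notify`, and `wait_finish` names are hypothetical, and real blocking, timeouts, and scheduling are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical endpoint state: only the notification fields are modelled. */
typedef struct {
    uint32_t pending;    /* asserted signal bits awaiting delivery */
    uint32_t wait_mask;  /* bits a blocked thread listens for, 0 if none */
} endpoint;

/* First half of a cx_pair_wait_many-style call: mark endpoints as waiting. */
static void wait_begin(endpoint *eps[], const uint32_t masks[], size_t len) {
    for (size_t i = 0; i < len; i++) eps[i]->wait_mask = masks[i];
}

/* Models cx_pair_notify. Returns true if a waiter should be unblocked,
   false if nobody was listening and the notification was dropped. */
static bool pair_notify(endpoint *ep, uint32_t mask) {
    uint32_t wanted = mask & ep->wait_mask;
    if (wanted == 0) return false;
    ep->pending |= wanted;
    return true;
}

/* Second half of the wait: deliver the first pending signal, de-assert its
   bits, and clear every waiting mark. Returns the endpoint index or -1. */
static int wait_finish(endpoint *eps[], size_t len, uint32_t *bits) {
    assert(bits != NULL);
    int hit = -1;
    for (size_t i = 0; i < len; i++) {
        if (hit < 0 && eps[i]->pending != 0) {
            hit = (int)i;
            *bits = eps[i]->pending;
            eps[i]->pending = 0;
        }
        eps[i]->wait_mask = 0;
    }
    return hit;
}
```

In a real kernel the thread would sit in the WAIT state between the two halves, and `pair_notify` returning true would be the point at which the scheduler unblocks it.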
Memory Management
| Name | Object | Description |
| --- | --- | --- |
| Virtual Address Space | `aspace` | Allocated per process and is the root region. |
| Virtual Memory Region | `region` | Contains a single address range allocated from a parent region. |
| Virtual Memory Object | `vmo` | List of `vm_page` and can be mapped into regions. |
Instead of working with single-page mappings, we introduce virtual memory regions (VMRs), into which memory from a pool of physical pages, the VMO, can be mapped.
| Method | Parameters | Description |
| --- | --- | --- |
| `cx_vmo_create` | `size`, `options`, `pager`, `&out` | Allocates a VMO of the given size with an optional pager thread. |
| `cx_vmo_control` | `ref`, `options` | Changes options intrinsic to the VMO (e.g., can execute). Can be used to destroy the VMO object as well. |
| `cx_vmr_create` | `ref`, `options`, `size`, `offset`, `&outvmr`, `&outadr` | Allocates an address range from the parent VMR. |
| `cx_vmr_control` | `ref`, `options`, `size`, `offset` | Changes the options of a given range in the VMR. Can also be used to unmap VMR sub-regions or destroy the VMR. |
| `cx_vmr_map` | `ref`, `options`, `size`, `vmr_offset`, `vmo`, `vmo_offset` | Maps `size` bytes from offset `vmo_offset` in `vmo` to `vmr_offset` in the region `ref`. Lengths and offsets should be page aligned. |
The above table outlines 5 memory primitives that operate on VMRs and VMOs.
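The page-alignment and range requirements on a `cx_vmr_map`-style call can be illustrated with a small argument validator. This is a sketch, not the kernel's checking logic: the 4 KiB page size, the simplified `vmr` and `vmo` structs, the error codes, and the function names are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CX_PAGE_SIZE 4096u   /* assumed page size */

typedef struct { uint64_t base; uint64_t size; } vmr;   /* hypothetical region */
typedef struct { uint64_t size; } vmo;                   /* hypothetical object */

static bool page_aligned(uint64_t v) { return (v & (CX_PAGE_SIZE - 1)) == 0; }

/* True when [offset, offset + size) fits inside limit without overflowing. */
static bool range_fits(uint64_t offset, uint64_t size, uint64_t limit) {
    return offset <= limit && size <= limit - offset;
}

/* Validates the arguments of a cx_vmr_map-style call. On success, stores
   the resulting virtual address in *out_addr and returns 0.
   Returns -1 for bad alignment or zero size, -2 for an out-of-range span. */
static int vmr_map_check(const vmr *r, uint64_t size, uint64_t vmr_offset,
                         const vmo *o, uint64_t vmo_offset, uint64_t *out_addr) {
    assert(r != 0 && o != 0 && out_addr != 0);
    if (size == 0) return -1;
    if (!page_aligned(size) || !page_aligned(vmr_offset) ||
        !page_aligned(vmo_offset)) return -1;
    if (!range_fits(vmr_offset, size, r->size)) return -2;
    if (!range_fits(vmo_offset, size, o->size)) return -2;
    *out_addr = r->base + vmr_offset;
    return 0;
}
```

Writing the bounds check as `size <= limit - offset` rather than `offset + size <= limit` avoids an integer overflow when a caller passes very large values.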
Userspace Pager
These operations only work on VMOs registered with a custom userspace pager thread. The kernel will redirect `vmo` operations performed on `vmo` objects with a pager to their respective thread endpoints.
| Method | Parameters | Description |
| --- | --- | --- |
| `cx_vmo_transfer` | `vmo1`, `vmo2`, `offset1`, `offset2`, `length` | Transfers pages from `vmo1` to `vmo2` where `vmo1` must be a non-pager object (i.e., kernel-supplied pager object). |
The above table outlines 1 memory primitive that can be used to create a userspace pager.
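A userspace pager would typically fill a kernel-backed VMO with data and then hand those pages to the VMO it serves. The sketch below models that transfer with page-granular offsets. It assumes that "transfer" means pages move rather than copy, and the `vmo` layout, `VMO_PAGES`, `NO_PAGE`, and the error codes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VMO_PAGES 8   /* hypothetical fixed VMO size, in pages */
#define NO_PAGE   0   /* marks an unpopulated slot */

typedef struct {
    unsigned pages[VMO_PAGES];  /* physical page numbers, NO_PAGE if absent */
    bool user_pager;            /* true if backed by a userspace pager thread */
} vmo;

/* Models cx_vmo_transfer with offsets and length counted in pages.
   Returns -1 if the source has a userspace pager, -2 if a range is out of
   bounds, -3 if a source slot is unpopulated, and 0 on success. */
static int vmo_transfer(vmo *src, vmo *dst, size_t off1, size_t off2,
                        size_t length) {
    assert(src != dst);
    if (src->user_pager) return -1;       /* source must be kernel-paged */
    if (off1 > VMO_PAGES || length > VMO_PAGES - off1) return -2;
    if (off2 > VMO_PAGES || length > VMO_PAGES - off2) return -2;
    for (size_t i = 0; i < length; i++) {
        if (src->pages[off1 + i] == NO_PAGE) return -3;  /* nothing to move */
    }
    for (size_t i = 0; i < length; i++) {
        dst->pages[off2 + i] = src->pages[off1 + i];
        src->pages[off1 + i] = NO_PAGE;   /* ownership moves, it is not copied */
    }
    return 0;
}
```

All source slots are validated before any page moves, so a failed call leaves both objects unchanged.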
Processes and Threads
A thread is the single unit of execution while processes can own multiple threads. Threads of the same process share one address space while different processes have different address spaces.
The control structures for `thread` and `process` syscalls are used below.
| Method | Parameters | Description |
| --- | --- | --- |
| `cx_thread_create` | `ref`, `&out` | Creates a thread in process `ref`. |
| `cx_thread_control` | `control`, `data` | TBD |
| `cx_process_create` | `&out`, `&vmr` | Creates a process and returns the VMR. |
| `cx_process_control` | `control`, `data` | TBD |
The above table outlines 4 primitives used for process and thread control.