
The C startup code (for statically linked binaries) and the run-time linker (for dynamically linked binaries) carve up the initial capabilities provided by the kernel into capabilities that cover the various global variables and function pointers needed by the program and its libraries. This is similar to how pointers are initialised for position-independent code: more complex, but the same principle of scanning through all the relocations and applying them. When you mmap(2) memory from the OS, you get back a capability with bounds covering that memory. When you malloc(3) memory from your libc, it finds space in an existing mapping, takes that mapping's capability and restricts its bounds to the allocation size. When you take a pointer to a stack-allocated variable, the compiler inserts an instruction to set the bounds of that capability to just the memory it allocated for that variable.

Every pointer, whether "language-level" (what is exposed in the language) or "sub-language-level" (the pointers in the implementation, like return addresses on the stack or the stack pointer itself), is a capability, and all you need to do is insert a bounds-setting instruction at the point of allocation to restrict its bounds. So your libc's malloc needs modifying, as does your kernel, but your C program that calls them just needs to be recompiled for the pure-capability ABI.
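To make the derivation chain concrete, here is a rough software model of it in plain C. It is only a sketch: `toy_cap`, `toy_root`, `toy_set_bounds`, `toy_store`, `toy_load`, `toy_heap` and `toy_malloc` are all made-up names, the struct is not the real capability encoding, and on real CHERI hardware the bounds check is done by the load/store instruction itself (and traps) rather than returning `false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a capability (hypothetical, not the real CHERI format). */
typedef struct {
    uint8_t *base;   /* lowest address this capability may touch */
    size_t   length; /* number of bytes covered, starting at base */
    bool     tag;    /* validity bit; false means "not a usable capability" */
} toy_cap;

/* Stand-in for what mmap(2) would hand back: bounds cover the whole mapping. */
toy_cap toy_root(uint8_t *mem, size_t len) {
    toy_cap c = { mem, len, true };
    return c;
}

/* Derive a narrower capability. Bounds can shrink but never grow:
 * any request reaching outside the parent yields an untagged result. */
toy_cap toy_set_bounds(toy_cap parent, size_t start, size_t len) {
    toy_cap out = { NULL, 0, false };
    if (!parent.tag || start > parent.length || len > parent.length - start)
        return out;
    out.base = parent.base + start;
    out.length = len;
    out.tag = true;
    return out;
}

/* Checked store: models the per-access bounds check done in hardware. */
bool toy_store(toy_cap c, size_t index, uint8_t value) {
    if (!c.tag || index >= c.length)
        return false; /* real hardware would raise an exception here */
    c.base[index] = value;
    return true;
}

/* Checked load, same rule as the store. */
bool toy_load(toy_cap c, size_t index, uint8_t *value) {
    if (!c.tag || index >= c.length)
        return false;
    *value = c.base[index];
    return true;
}

/* Minimal bump allocator standing in for a modified malloc(3). */
typedef struct {
    toy_cap region; /* capability for the whole mapping */
    size_t  next;   /* offset of the first free byte */
} toy_heap;

/* Find space in the mapping, then restrict bounds to the allocation size. */
toy_cap toy_malloc(toy_heap *heap, size_t size) {
    toy_cap c = toy_set_bounds(heap->region, heap->next, size);
    if (c.tag)
        heap->next += size;
    return c;
}
```

With a 64-byte backing array and two 16-byte allocations, a store at index 15 of the first allocation succeeds and a store at index 16 is refused, even though the second allocation sits directly after it in the same mapping. The caller's code is unchanged either way; only the allocator had to learn to narrow bounds.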

Edit: To answer the first question, yes, that is the primitive which enables CHERI to be used for in-address-space compartmentalisation rather than relying on an MMU for process-based separation and all the overheads that come from context switching address spaces.

replies(1): >>zeotro+NK

>>jrtc27+(OP)
Thanks for the explanation!

So these bounds are set by the software (and are guarded against manipulation). Then each read or write to memory is checked against these bounds by the "fine grained MMU" hardware.

replies(1): >>jrtc27+mQ

>>zeotro+NK
Yes, though "software" is rather broad. Exactly where the bounds are set matters: if you get it wrong, malicious software can simply skip setting bounds and access memory outside its allocations. Pushing bounds setting to the same place the allocation itself happens or, for references to global variables, the same place the loading and relocating happens, ensures that the only thing malicious software can achieve by not setting bounds is to make itself insecure.
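A small model of that last point, again as a hedged sketch in plain C rather than real CHERI code. All names (`demo_cap`, `demo_narrow`, `demo_store`, `trusted_alloc`, `sloppy_suballoc`) are invented for illustration, and the `false` return stands in for a hardware trap. The trusted allocator narrows bounds before untrusted code ever sees the capability; the untrusted code then runs its own sub-allocator that never narrows anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bounded, tagged reference (hypothetical; not the real encoding). */
typedef struct {
    uint8_t *base;
    size_t   length;
    bool     tag;
} demo_cap;

/* Narrowing only: requests outside the parent give an untagged result. */
demo_cap demo_narrow(demo_cap parent, size_t start, size_t len) {
    demo_cap out = { NULL, 0, false };
    if (!parent.tag || start > parent.length || len > parent.length - start)
        return out;
    out.base = parent.base + start;
    out.length = len;
    out.tag = true;
    return out;
}

/* Checked store, modelling the per-access hardware check. */
bool demo_store(demo_cap c, size_t index, uint8_t value) {
    if (!c.tag || index >= c.length)
        return false;
    c.base[index] = value;
    return true;
}

/* Trusted side: bounds are set at the point of allocation. */
demo_cap trusted_alloc(demo_cap heap, size_t *next, size_t size) {
    demo_cap c = demo_narrow(heap, *next, size);
    if (c.tag)
        *next += size;
    return c;
}

/* Untrusted side: a sub-allocator that skips bounds setting and hands out
 * its entire arena for every request, whatever size was asked for. */
demo_cap sloppy_suballoc(demo_cap arena, size_t size) {
    (void)size; /* deliberately ignored */
    return arena;
}
```

If the trusted allocator gives a 32-byte block to a victim and the next 32 bytes to the sloppy code as its arena, an "8-byte" object from `sloppy_suballoc` can be overrun all the way to index 31 of the arena, so the sloppy code can corrupt its own data. Index 32 is still refused, and no index reaches the victim's block, because the arena capability was already bounded before the sloppy code received it.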