How do you do the context switching between coroutines? getcontext/setcontext, or something more architecture specific? I'm currently working on some stackful coroutine stuff and the swapcontext calls actually take a fair amount of time, so I'm planning on writing a custom one that doesn't preserve unused bits (signal mask and FPU state). So I'm curious about your findings there
Others have done this elsewhere, of course. There are links/references to several other examples in the code. I mention two in particular in the NOTICE file, not because I copied their code, but because I read it very closely and followed the outline of their examples. It would probably taken me forever to figure out the Windows TIB on my own.
What I think is pretty cool (biased as I am) in my implementation is the «trampoline» that launches the coroutine function and waits silently in case it returns. If it does, it is intercepted and the proper coroutine exit() function gets called.
I'm wondering whether we could further decrease the overhead of the switch on GCC/clang by marking the push function with `__attribute__((preserve_none))`. Then among GPRs we only need to save the base and stack pointers, and the callers will only save what they need to
https://github.com/ambonvik/cimba/blob/main/src/port/x86-64/...
https://github.com/ambonvik/cimba/blob/main/src/port/x86-64/...