I'm wondering whether we could further decrease the overhead of the switch on GCC/Clang by marking the push function with `__attribute__((preserve_none))`. Then, among the GPRs, the switch itself only needs to save the base and stack pointers, and each caller saves only the registers it actually needs to keep live across the call.
https://github.com/ambonvik/cimba/blob/main/src/port/x86-64/...
https://github.com/ambonvik/cimba/blob/main/src/port/x86-64/...
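To make the idea concrete, here is a minimal sketch of the declaration side only. The macro `SWITCH_CC` and the functions `demo_switch` and `demo_caller` are hypothetical names for illustration, not taken from the cimba sources. The real switch routine is assembly and would have to be adapted to whatever argument registers the convention uses, so a plain C function stands in for it here. The feature check is there because not every compiler version accepts the attribute; where it is unknown, the sketch falls back to the default calling convention.

```c
#include <stdint.h>

/* Hypothetical macro: use preserve_none only where the compiler reports it. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_none)
#    define SWITCH_CC __attribute__((preserve_none))
#  endif
#endif
#ifndef SWITCH_CC
#  define SWITCH_CC /* attribute unavailable: default calling convention */
#endif

/* Stand-in for the assembly switch routine (assumption: the real one would
 * be declared with the same attribute so that callers know it may clobber
 * nearly all general-purpose registers). */
SWITCH_CC __attribute__((noinline))
uint64_t demo_switch(uint64_t a, uint64_t b)
{
    return a * 31 + b;
}

/* An ordinary caller. The compiler, not the callee, is now responsible for
 * keeping live1 and live2 intact across the call, and it spills only the
 * values that are actually live at the call site. */
uint64_t demo_caller(uint64_t x)
{
    uint64_t live1 = x + 1;
    uint64_t live2 = x * 2;
    uint64_t r = demo_switch(x, 7);
    return r + live1 + live2;
}
```

The results are the same with or without the attribute; only the split of register-saving work between caller and callee changes, which is the part that would matter for the switch overhead.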
Do sanitizers and similar tools (ASan, UBSan, Valgrind) still work in this setting? Also, I'm wondering whether you'll need some special handling when Intel CET is enabled.
I'm not familiar with the details of Intel CET, but this is basically just an implementation of what others call fibers or "green threads", so any such special handling should certainly be possible if necessary.
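On the sanitizer half of the question: ASan ships fiber annotations in `<sanitizer/common_interface_defs.h>` (`__sanitizer_start_switch_fiber` and `__sanitizer_finish_switch_fiber`) that tell it when execution moves to another stack. A rough sketch of how they bracket a switch follows. It uses POSIX `ucontext` as a stand-in for a hand-written switch, and every `demo_`/`DEMO_` name is hypothetical; none of this is taken from the cimba sources. Without ASan the annotations compile to nothing.

```c
#include <stddef.h>
#include <ucontext.h>

/* Detect ASan: GCC defines __SANITIZE_ADDRESS__, Clang has __has_feature. */
#if defined(__SANITIZE_ADDRESS__)
#  define DEMO_HAVE_ASAN 1
#elif defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define DEMO_HAVE_ASAN 1
#  endif
#endif

#ifdef DEMO_HAVE_ASAN
#  include <sanitizer/common_interface_defs.h>
#  define DEMO_START_SWITCH(save, bottom, size) \
       __sanitizer_start_switch_fiber((save), (bottom), (size))
#  define DEMO_FINISH_SWITCH(save, bottom_old, size_old) \
       __sanitizer_finish_switch_fiber((save), (bottom_old), (size_old))
#else
#  define DEMO_START_SWITCH(save, bottom, size) \
       ((void)(save), (void)(bottom), (void)(size))
#  define DEMO_FINISH_SWITCH(save, bottom_old, size_old) \
       ((void)(save), (void)(bottom_old), (void)(size_old))
#endif

static ucontext_t demo_main_ctx, demo_fiber_ctx;
static char demo_stack[64 * 1024];
static int demo_result;

static void demo_fiber_entry(void)
{
    const void *main_bottom = NULL;
    size_t main_size = 0;

    /* First thing on the new stack: complete the switch and record the
     * bounds of the stack we came from. */
    DEMO_FINISH_SWITCH(NULL, &main_bottom, &main_size);

    demo_result = 42;

    /* This fiber never runs again, so pass NULL as the save slot, which
     * lets ASan discard the fiber's fake stack. */
    DEMO_START_SWITCH(NULL, main_bottom, main_size);
    swapcontext(&demo_fiber_ctx, &demo_main_ctx);
}

int demo_run_fiber(void)
{
    void *fake_stack = NULL;

    demo_result = 0;
    getcontext(&demo_fiber_ctx);
    demo_fiber_ctx.uc_stack.ss_sp = demo_stack;
    demo_fiber_ctx.uc_stack.ss_size = sizeof demo_stack;
    demo_fiber_ctx.uc_link = &demo_main_ctx;
    makecontext(&demo_fiber_ctx, demo_fiber_entry, 0);

    /* Announce the target stack before leaving the current one. */
    DEMO_START_SWITCH(&fake_stack, demo_stack, sizeof demo_stack);
    swapcontext(&demo_main_ctx, &demo_fiber_ctx);
    /* Back on the original stack: complete the return switch. */
    DEMO_FINISH_SWITCH(fake_stack, NULL, NULL);

    return demo_result;
}
```

A custom assembly switch would presumably need the same two calls placed around it. CET shadow stacks are a separate matter that this sketch does not touch.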