Looks like quite a lot of complexity for such a gain. 30-40% is roughly what context threading would buy you [1]. It takes relatively little code to implement: only do honest assembly for jumps and conditional branches, and for every other opcode just emit a call to the interpreter's handler. Reportedly, it took Apple just 4k LOC to ship its first JIT like that in JavaScriptCore [2].
Also, if you haven't seen it, musttail + preserve_none is a cool new dispatch technique to get more mileage out of plain C/C++ before turning to hand-coded assembly/JIT [3]. A step up from computed goto.
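For anyone who hasn't tried it, here is a minimal sketch of what musttail + preserve_none dispatch can look like. The toy bytecode (OP_PUSH, OP_ADD, OP_MUL, OP_HALT), the handler names, and the run/demo helpers are all made up for illustration, and the macros quietly fall back to ordinary calls on compilers that lack either attribute, so treat it as a shape to start from rather than anyone's actual interpreter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Assumption: fall back to plain calls where the attributes are missing. */
#if __has_attribute(musttail)
#define MUSTTAIL __attribute__((musttail))
#else
#define MUSTTAIL
#endif

#if __has_attribute(preserve_none)
#define PRESERVE_NONE __attribute__((preserve_none))
#else
#define PRESERVE_NONE
#endif

/* Hypothetical toy bytecode: a tiny stack machine. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* musttail needs caller and callee to share one signature and convention. */
typedef int64_t (PRESERVE_NONE *Handler)(const uint8_t *pc, int64_t *sp);

PRESERVE_NONE static int64_t op_push(const uint8_t *pc, int64_t *sp);
PRESERVE_NONE static int64_t op_add(const uint8_t *pc, int64_t *sp);
PRESERVE_NONE static int64_t op_mul(const uint8_t *pc, int64_t *sp);
PRESERVE_NONE static int64_t op_halt(const uint8_t *pc, int64_t *sp);

static const Handler table[] = {
    [OP_PUSH] = op_push, [OP_ADD] = op_add,
    [OP_MUL] = op_mul,   [OP_HALT] = op_halt,
};

/* Each handler ends by tail-calling the next opcode's handler, so the
   interpreter state (pc, sp) stays in argument registers throughout. */
#define DISPATCH() MUSTTAIL return table[*pc](pc, sp)

PRESERVE_NONE static int64_t op_push(const uint8_t *pc, int64_t *sp) {
    *sp++ = pc[1];          /* operand is the byte after the opcode */
    pc += 2;
    DISPATCH();
}

PRESERVE_NONE static int64_t op_add(const uint8_t *pc, int64_t *sp) {
    sp[-2] += sp[-1];       /* pop two, push the sum */
    sp--;
    pc++;
    DISPATCH();
}

PRESERVE_NONE static int64_t op_mul(const uint8_t *pc, int64_t *sp) {
    sp[-2] *= sp[-1];       /* pop two, push the product */
    sp--;
    pc++;
    DISPATCH();
}

PRESERVE_NONE static int64_t op_halt(const uint8_t *pc, int64_t *sp) {
    (void)pc;
    return sp[-1];          /* result is the top of the stack */
}

/* Entry point: a normal function that kicks off the handler chain. */
static int64_t run(const uint8_t *code) {
    int64_t stack[16];      /* no overflow checks in this sketch */
    assert(code != NULL);
    return table[*code](code, stack);
}

/* Evaluates (2 + 3) * 4 on the toy machine. */
static int64_t demo(void) {
    static const uint8_t program[] = {
        OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT,
    };
    return run(program);
}
```

Calling demo() should give 20. The difference from computed goto is that each opcode is its own small function, so the register allocator works on one handler at a time instead of one giant interpreter loop, and preserve_none keeps those handlers from spilling callee-saved registers on entry.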
[1] https://webdocs.cs.ualberta.ca/~amaral/cascon/CDP05/slides/C...
[2] https://webkit.org/blog/214/introducing-squirrelfish-extreme...
I suppose the downside of the weval transform is that it is only helpful for interpreters, whereas the other extensions could have other use cases.
Academic paper about weval: https://dl.acm.org/doi/pdf/10.1145/3729259
My summary of that paper: https://danglingpointers.substack.com/p/partial-evaluation-w...