>>mighdo+(OP)
> This is a perfect scenario to take advantage of code generation. I wrote a code generator that takes a template and generates the optimized kernels for each operation. The code generator is written in TypeScript and generates WebGPU compute shader code. This means that the generated code can be heavily optimized for the given scenario and those optimizations can be shared between operations.
This is a clever way to implement an ahead-of-time (AOT) variant of the operator fusion that the XLA (JIT) compiler performs.
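
For anyone curious what this could look like in practice, here is a minimal sketch of the idea. It is not the OP's actual generator. It assumes a chain of elementwise ops gets fused into a single WGSL compute kernel at build time, and every name in it (`UnaryOp`, `OPS`, `generateFusedKernel`) is made up for illustration.

```typescript
// Hypothetical sketch: all names are illustrative, not taken from the OP's project.
type UnaryOp = "relu" | "neg" | "exp" | "square";

// Each op maps the name of an input value to a WGSL expression.
const OPS: Record<UnaryOp, (x: string) => string> = {
  relu: (x) => `max(${x}, 0.0)`,
  neg: (x) => `-${x}`,
  exp: (x) => `exp(${x})`,
  square: (x) => `${x} * ${x}`,
};

// Emit one kernel that applies `ops` in order (ops[0] first) to each element,
// so intermediate results stay in registers instead of going through buffers.
function generateFusedKernel(
  name: string,
  ops: UnaryOp[],
  workgroupSize: number = 64
): string {
  const steps: string[] = ["let v0 = input[i];"];
  ops.forEach((op, k) => {
    steps.push(`let v${k + 1} = ${OPS[op](`v${k}`)};`);
  });
  const body = steps.map((s) => "  " + s).join("\n");

  return [
    "@group(0) @binding(0) var<storage, read> input: array<f32>;",
    "@group(0) @binding(1) var<storage, read_write> output: array<f32>;",
    "",
    `@compute @workgroup_size(${workgroupSize})`,
    `fn ${name}(@builtin(global_invocation_id) gid: vec3<u32>) {`,
    "  let i = gid.x;",
    "  if (i >= arrayLength(&output)) { return; }",
    body,
    `  output[i] = v${ops.length};`,
    "}",
  ].join("\n");
}

// Usage: relu followed by square becomes one kernel instead of two dispatches.
const shader = generateFusedKernel("relu_square", ["relu", "square"]);
console.log(shader);
```

The resulting string would go to `device.createShaderModule({ code: shader })`. Since the set of fused kernels is fixed when the generator runs, the fusion decision is made ahead of time rather than at trace time as in XLA.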