Many people are still working on improving RNNs, mostly in academia. Examples off the top of my head:
* RWKV: https://arxiv.org/abs/2305.13048 / https://arxiv.org/abs/2404.05892 / https://arxiv.org/abs/2503.14456
* Linear attention: https://arxiv.org/abs/2006.16236 (quick sketch of the core recurrence after this list)
* State space models: https://arxiv.org/abs/2312.00752 / https://arxiv.org/abs/2405.21060
* Linear RNNs: https://arxiv.org/abs/2410.01201
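The common thread: if you swap softmax attention for a kernel feature map, the whole thing collapses into a recurrence with a fixed-size state, so inference is O(1) per token instead of growing with context. A minimal NumPy sketch of that idea (the elu+1 feature map is from the linear attention paper above; the function and variable names are my own illustrative choices, not any paper's actual code):

```python
import numpy as np

def phi(x):
    # elu(x) + 1: a positive feature map, as in the linear attention paper above.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_rnn(Q, K, V):
    """Causal linear attention computed as an RNN.

    Q, K: (T, d_k); V: (T, d_v). The state (S, z) has a fixed size
    independent of sequence length T, unlike a softmax KV cache.
    """
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))  # running sum of outer(phi(k_t), v_t)
    z = np.zeros(d_k)         # running sum of phi(k_t), for normalization
    out = np.zeros((T, d_v))
    for t in range(T):
        q, k = phi(Q[t]), phi(K[t])
        S += np.outer(k, V[t])
        z += k
        out[t] = (q @ S) / (q @ z + 1e-8)  # eps guards against divide-by-zero
    return out
```

Roughly speaking, RWKV and the SSMs above can be read as variants of that same state update with learned decay/gating on S, plus scan or chunked formulations so training stays parallel instead of running the loop token by token.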
Industry OTOH has gone all-in on Transformers.
It's so annoying. Transformers keep improving, and recurrent networks are harder to train, so until we hit a real wall, companies aren't eager to diverge. It's like lithium batteries improving just fast enough that it was never profitable to work on sodium ones, even though sodium is the chemistry we'd actually want to win.
Recently saw this on HN: https://arxiv.org/abs/2602.00294