There have been two very noteworthy (perhaps Nobel-prize-level?) breakthroughs in two completely different fields of mathematics (knot theory and representation theory) made using these systems.
I would certainly not call that "useless", even if they're not quite Nobel-prize-worthy.
Also, "No one uses GATs in systems people discuss right now" ... Transformers are GATs (plus positional encodings) ... So, you're incredibly wrong.
And I’m so tired of this “transformers are just GNNs” nonsense that Petar (who happens to have invented GATs and has a vested interest in overstating their importance) has been pushing. Transformers are GNNs only in the most trivial sense: you make the graph fully connected and let everything interact with everything else. I.e., it's not really a graph problem. Not to mention that positional encodings break the very permutation symmetry that GNNs were designed to preserve. In practice, no one is using GNN tooling to build transformers. You don’t see PyTorch Geometric or DGL in any of the codebases. In fact, you see the opposite: people using transformers to replace GNNs on graph problems and getting SOTA results.
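To make the "fully connected graph" and "positional encodings break the symmetry" points concrete, here is a minimal pure-Python sketch under simplifying assumptions: a single head, identity query/key/value projections, and transformer-style scaled dot-product scores (the original GAT paper uses a different, additive scoring function). The helper names and the toy positional encoding are hypothetical, chosen only for illustration; this is not how any library implements either architecture.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row of features per node/token


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_message_passing(x: Matrix, neighbours: List[List[int]]) -> Matrix:
    """Node i aggregates the features of neighbours[i], weighted by a softmax
    over scaled dot-product scores. Q/K/V projections are the identity here
    (a simplifying assumption); a real layer would learn them."""
    d = len(x[0])
    out = []
    for xi, nbrs in zip(x, neighbours):
        scores = [sum(a * b for a, b in zip(xi, x[j])) / math.sqrt(d) for j in nbrs]
        weights = softmax(scores)
        out.append([sum(w * x[j][k] for w, j in zip(weights, nbrs)) for k in range(d)])
    return out


def self_attention(x: Matrix) -> Matrix:
    # Transformer-style self-attention: the same update run on a complete graph
    everyone = list(range(len(x)))
    return attention_message_passing(x, [everyone for _ in x])


def add_positional_encoding(x: Matrix) -> Matrix:
    # Hypothetical toy encoding (not the sinusoidal one): shift each row by 0.1 * its position
    return [[v + 0.1 * pos for v in row] for pos, row in enumerate(x)]


def permute(x: Matrix, perm: List[int]) -> Matrix:
    return [x[p] for p in perm]


def close(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    return all(abs(u - v) < tol for ra, rb in zip(a, b) for u, v in zip(ra, rb))


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    perm = [2, 0, 1]
    # Without positional encodings, reordering the inputs just reorders the outputs
    print(close(self_attention(permute(x, perm)), permute(self_attention(x), perm)))  # True
    # With positional encodings, that permutation equivariance is gone
    print(close(self_attention(add_positional_encoding(permute(x, perm))),
                permute(self_attention(add_positional_encoding(x)), perm)))  # False
```

Passing a sparse `neighbours` list instead of the complete graph (for example `[[0, 1], [0, 1, 2], [1, 2]]`) restricts each node to its graph neighbours, which is the GAT-style masked attention the "transformers are GNNs" argument relies on. The complete-graph case is exactly the trivial one described above.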
It reminds me of people who are into Bayesian methods always swooping in after some method succeeds and saying, “yes, but this is just a special case of a Bayesian method we’ve been talking about all along!” Yes, sure, but GATs have had 6 years to move the needle, and they’re nowhere to be found in the modern AI systems this thread is about.