Shame on all of the people involved in this: the people in these companies, the journalists who shovel shit (hope they get replaced real soon), researchers who should know better, and dementia-ridden legislators.
So utterly predictable and slimy. All of you who are so gravely concerned about "alignment" in this context: give yourselves a pat on the back for hyping up science-fiction stories and enabling regulatory capture.
The fact that these systems can extrapolate well beyond their training data by learning algorithms is quite different from what has come before, and anyone stating that they "simply" predict the next token is severely shortsighted. Things don't have to be 'brain-like' to be useful or to have reasoning capabilities; we have evidence that these systems perform well on reasoning tasks, including causal reasoning, and we have mathematical proofs that show how.
So I don't understand your sentiment.
There were two very noteworthy (perhaps Nobel-prize-level?) breakthroughs in two completely different fields of mathematics (knot theory and representation theory) made by using these systems.
I would certainly not call that "useless", even if they're not quite Nobel-prize-worthy.
Also, "No one uses GATs in systems people discuss right now" ... Transformerare GATs (with PE) ... So, you're incredibly wrong.
And I’m so tired of this “transformers are just GNNs” nonsense that Petar (who happens to have invented GATs and has a vested interest in overstating their importance) has been pushing. Transformers are GNNs in only the most trivial way: if you make the graph fully connected and allow everything to interact with everything else. I.e., not really a graph problem. Not to mention that the use of positional encodings breaks the very permutation symmetry that GNNs were designed to preserve. In practice, no one is using GNN tooling to build transformers. You don’t see PyTorch Geometric or DGL in any of the codebases. In fact, you see the opposite: people exploring transformers to replace GNNs on graph problems and getting SOTA results.
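To make the "only in the most trivial way" point concrete: single-head dot-product self-attention can be written as message passing over a complete graph, where the "graph" contributes no structure at all. A minimal sketch in plain PyTorch (toy shapes, single head, no mask or multi-head; function names are mine):

    import torch
    import torch.nn.functional as F

    def self_attention(x, Wq, Wk, Wv):
        # Standard single-head scaled dot-product self-attention.
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = (q @ k.T) / k.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ v

    def attention_as_message_passing(x, Wq, Wk, Wv):
        # The same computation phrased as a GNN layer: every token is a
        # node, the edge set is the complete graph, and each node
        # aggregates softmax-weighted messages from all "neighbours"
        # (i.e., from everyone).
        n = x.shape[0]
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        out = torch.zeros_like(v)
        for i in range(n):
            scores = torch.stack([q[i] @ k[j] for j in range(n)])
            alpha = F.softmax(scores / k.shape[-1] ** 0.5, dim=0)
            out[i] = sum(alpha[j] * v[j] for j in range(n))
        return out

    torch.manual_seed(0)
    x = torch.randn(5, 8)
    Wq, Wk, Wv = (torch.randn(8, 8) for _ in range(3))
    print(torch.allclose(self_attention(x, Wq, Wk, Wv),
                         attention_as_message_passing(x, Wq, Wk, Wv),
                         atol=1e-5))  # True

Both functions compute the same output; the second just makes explicit that every node attends to every other node, which is exactly the degenerate case where graph machinery buys you nothing.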
It reminds me of the Bayesian-methods people who always swoop in after some method succeeds and say, “yes, but this is just a special case of a Bayesian method we’ve been talking about all along!” Yes, sure, but GATs have had six years to move the needle, and they’re nowhere to be found in the modern AI systems this thread is about.