
1. magogh+(OP)[view] [source] 2024-02-14 05:56:26
In 2015 he wrote this blog post about "The Unreasonable Effectiveness of Recurrent Neural Networks": https://karpathy.github.io/2015/05/21/rnn-effectiveness/

That blog post inspired Alec Radford at OpenAI to do the research that produced the "Unsupervised sentiment neuron": https://openai.com/research/unsupervised-sentiment-neuron

OpenAI then decided to see what would happen if they scaled up that model, leveraging the new Transformer architecture invented at Google, and created something called GPT: https://cdn.openai.com/research-covers/language-unsupervised...
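
For flavor, here is roughly what the char-rnn idea in that post boils down to, as a minimal sketch (all sizes and names here are illustrative, not his actual code):

    import numpy as np

    # Vanilla char-RNN in the spirit of the post: read one character,
    # update a hidden state, predict the next character.
    vocab_size, hidden_size = 65, 128
    rng = np.random.default_rng(0)
    Wxh = rng.normal(0, 0.01, (hidden_size, vocab_size))   # input -> hidden
    Whh = rng.normal(0, 0.01, (hidden_size, hidden_size))  # hidden -> hidden
    Why = rng.normal(0, 0.01, (vocab_size, hidden_size))   # hidden -> logits

    def step(h, x_idx):
        # One recurrent update: consume a character index, return the
        # new state and next-character logits.
        x = np.zeros(vocab_size)
        x[x_idx] = 1.0
        h = np.tanh(Wxh @ x + Whh @ h)
        return h, Why @ h

    def sample(seed_idx, n):
        # Generate n characters by feeding each prediction back in.
        h, idx, out = np.zeros(hidden_size), seed_idx, []
        for _ in range(n):
            h, logits = step(h, idx)
            p = np.exp(logits - logits.max())
            p /= p.sum()                        # softmax over the vocabulary
            idx = int(rng.choice(vocab_size, p=p))
            out.append(idx)
        return out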

replies(4): >>imjons+m2 >>jatins+w6 >>arugul+ja >>levido+9i
2. imjons+m2[view] [source] 2024-02-14 06:29:08
>>magogh+(OP)
Also in that article he says:

"In fact, I’d go as far as to say that

    The concept of attention is the most interesting recent architectural innovation in neural networks."

He wrote that when the initial attention paper was less than a year old, and two years before the transformer paper.
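
For anyone who hasn't read it, here is the attention idea in miniature. Note this is the scaled dot-product form popularized by the later transformer paper; the original 2014 paper scored positions with a small learned network, but the principle is the same (shapes below are made up):

    import numpy as np

    def attention(query, keys, values):
        # Score every position against the query, softmax the scores,
        # and return the weighted average of the values: a soft lookup.
        scores = keys @ query / np.sqrt(query.shape[-1])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ values

    # Toy example: 4 positions, dimension 8.
    rng = np.random.default_rng(0)
    K, V = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    q = rng.normal(size=8)
    context = attention(q, K, V)   # one vector summarizing the 4 positions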
3. jatins+w6[view] [source] 2024-02-14 07:20:19
>>magogh+(OP)
I read that post recently, and as someone who has not been deeply involved in ML, it felt prescient

Even the HN discussion around it had comments like "this feels like my baby learning to speak..", which are the same comparisons people were making when LLMs hit the mainstream in 2022

replies(1): >>sigmoi+I7
4. sigmoi+I7[view] [source] [discussion] 2024-02-14 07:33:03
>>jatins+w6
I had forgotten its existence by now, but I remember reading this post all those years back. Damn. I also remember thinking that this would be so cool if RNNs didn't suck at long contexts, even with an attention mechanism. In some sense, the only thing he needed was the transformer architecture and a "fuck, let's just do it" compute budget to end up at ChatGPT. He was always at the frontier of this field.
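
To make the long-context complaint concrete, a toy illustration (made-up sizes, not any real model): whatever the first token contributed has to survive every subsequent state update, and with contractive recurrent weights it shrinks away exponentially, whereas attention lets the last position look at the first one directly:

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 200, 64
    Whh = rng.normal(0, 0.4 / np.sqrt(d), (d, d))  # contractive recurrent weights

    h1 = rng.normal(size=d)
    h2 = h1 + 1e-3 * rng.normal(size=d)  # nudge the starting state a little
    for _ in range(T):                   # the nudge must survive T updates
        h1 = np.tanh(Whh @ h1)
        h2 = np.tanh(Whh @ h2)
    print(np.linalg.norm(h1 - h2))       # ~0: the early signal has washed out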
5. arugul+ja[view] [source] 2024-02-14 08:03:58
>>magogh+(OP)
Is it stated somewhere that Radford was inspired by that blog post?
replies(1): >>magogh+sO
6. levido+9i[view] [source] 2024-02-14 09:24:41
>>magogh+(OP)
He also wrote about the concept of Software 2.0
7. magogh+sO[view] [source] [discussion] 2024-02-14 14:24:52
>>arugul+ja
I tried to find where I heard that Radford was inspired by that blog post, but the closest thing I found is that the "Sentiment Neuron" paper (Learning to Generate Reviews and Discovering Sentiment: https://arxiv.org/pdf/1704.01444.pdf) mentions, in its "Discussion and Future Work" section, Karpathy's 2015 paper Visualizing and Understanding Recurrent Networks: https://arxiv.org/abs/1506.02078