zlacker

1. chille+(OP) 2022-05-24 00:23:28
> Neither OpenAI nor FAIR ever has the top score on anything unless Google delays publication.

This is ... very incorrect. I am very certain (95%+) that Google had nothing even close to GPT-3 at the time of its release. It's been 2 full years since GPT-3 was released, and even longer since OpenAI actually trained it.

That's to say nothing of the other things OpenAI and FAIR have released that were SOTA at the time of release (DALL-E 1, Jukebox, Poker, Diplomacy, Codex).

Google Brain and DeepMind have done a lot of great work, but to imply that they essentially have a monopoly on SOTA results, and that every SOTA result another lab has achieved is just due to Google delaying publication, is ridiculous.

replies(2): >>gwern+95 >>benree+B8
2. gwern+95 2022-05-24 01:09:58
>>chille+(OP)
Yeah, at the time, GB was still very big on mixture-of-experts models and bidirectional models like T5. (I'm not too enthusiastic about the former, but the latter has been a great model family; even if it's not GPT-3, it's still awesome.) DeepMind pivoted faster than GB, based on Gopher's reported training date, and GB followed some time after. But definitely neither had their own GPT-3-scale dense Transformer when GPT-3 was published.
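
For readers who haven't followed the dense-vs-MoE distinction being drawn here, the sketch below (plain NumPy, toy sizes, hard top-1 routing; an illustrative assumption, not any lab's actual architecture) shows the difference: in a dense feed-forward block every token goes through the same weights, while a mixture-of-experts layer routes each token to one of several smaller expert blocks.

    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_ff, n_experts, n_tokens = 16, 64, 4, 8

    def dense_ffn(x, w_in, w_out):
        # Dense block: every token is multiplied by the same (large) weights.
        return np.maximum(x @ w_in, 0) @ w_out

    def moe_ffn(x, router_w, experts):
        # MoE block: a router picks one expert per token (top-1 routing here),
        # so each token only touches a fraction of the total parameters.
        logits = x @ router_w                      # (n_tokens, n_experts)
        choice = logits.argmax(axis=-1)            # hard top-1 assignment
        out = np.zeros_like(x)
        for e, (w_in, w_out) in enumerate(experts):
            mask = choice == e
            if mask.any():
                out[mask] = np.maximum(x[mask] @ w_in, 0) @ w_out
        return out

    x = rng.standard_normal((n_tokens, d_model))
    dense = dense_ffn(x, rng.standard_normal((d_model, d_ff)),
                         rng.standard_normal((d_ff, d_model)))
    experts = [(rng.standard_normal((d_model, d_ff // n_experts)),
                rng.standard_normal((d_ff // n_experts, d_model)))
               for _ in range(n_experts)]
    sparse = moe_ffn(x, rng.standard_normal((d_model, n_experts)), experts)
    print(dense.shape, sparse.shape)   # both (8, 16)
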
replies(1): >>benree+Ja
3. benree+B8 2022-05-24 01:40:26
>>chille+(OP)
Any “brash generalization” is clearly going to be grossly incorrect in concrete cases, and while I have a little gossip from true insiders, it’s nowhere near enough to make definitive statements about specific progress on teams at companies that I’ve never worked for.

I added a bit of a disclaimer to my original post, but not enough to withstand detailed scrutiny. This is sort of the trouble with trying to talk about cutting-edge research in what amounts to a tweet: what's the right amount of oversimplified, emphatic statement to add legitimate insight without overstepping into being just full of shit?

I obviously don’t know that publication schedules at heavy-duty learning shops are deliberate and factor in other labs’ publications. The only one I know anything concrete about is FAIR, and even that’s badly dated knowledge.

I was trying to squeeze into a few hundred characters my very strong belief that Brain and DM haven’t let themselves be scooped since ResNet, based on my even stronger belief that no one has the muscle to do it.

To the extent that my oversimplification detracted from the conversation I regret that.

4. benree+Ja 2022-05-24 01:58:19
>>gwern+95
At the risk of sounding like I’m trying to defend a position that I’ve already conceded is an oversimplification, I’m frankly a little skeptical of how we can even know that.

GPT is, well, opaque. It’s somewhere between common knowledge and conspiracy theory that it gets a helping hand from Mechanical Turk-style human workers when it gets in over its head.

The exact story of why a BERT-style transformer, or any of the zillion other lookalikes, isn’t just over-fitting Wikipedia as you feed more corpus and compute into its insatiable maw has always seemed a little long on claims and light on reproducibility.

I don’t think there are many attention skeptics in language modeling; it’s a good idea that you can demo on a gaming PC. Transformers demonstrably work, and a better beam search (or whatever) hits the armchair Turing test harder for a given compute budget.
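
The "demo on a gaming PC" point is easy to make concrete: scaled dot-product attention, the mechanism in question, fits in a few lines of NumPy. This is a toy single-head sketch with random inputs and no learned projections, just to show the operation, not any particular model.

    import numpy as np

    def attention(q, k, v):
        # Scaled dot-product attention: each query position takes a
        # softmax-weighted average of the values, weighted by query-key similarity.
        d_k = q.shape[-1]
        scores = q @ k.T / np.sqrt(d_k)            # (n_q, n_k) similarity matrix
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v                          # (n_q, d_v)

    rng = np.random.default_rng(0)
    n_tokens, d_k, d_v = 6, 8, 8
    q = rng.standard_normal((n_tokens, d_k))
    k = rng.standard_normal((n_tokens, d_k))
    v = rng.standard_normal((n_tokens, d_v))
    print(attention(q, k, v).shape)                 # (6, 8)
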

But having seen some of this stuff play out at scale (and admittedly this is purely anecdotal), it looks to me like these things are basically asking the question: “if I overfit all human language on the Internet, is that a bad thing?”

It’s my personal suspicion that this is the dominant term, and it’s my personal belief that Google’s ability to do both corpus and model parallelism at Jeff Dean levels while simultaneously building out hardware to the exact precision required is unique by a long way.
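
To unpack the parallelism terms: "corpus" (data) parallelism splits the batch across devices while every device holds a full copy of the weights, whereas model parallelism splits the weights themselves. A toy NumPy illustration, with plain arrays standing in for device shards (an assumption for illustration, not a description of Google's actual stack):

    import numpy as np

    rng = np.random.default_rng(0)
    batch, d_in, d_out, n_devices = 8, 16, 32, 4
    x = rng.standard_normal((batch, d_in))
    w = rng.standard_normal((d_in, d_out))

    # Data parallelism: each "device" gets a slice of the batch and a full
    # copy of the weights; outputs are concatenated back together.
    data_shards = np.array_split(x, n_devices, axis=0)
    y_data = np.concatenate([shard @ w for shard in data_shards], axis=0)

    # Model parallelism: each "device" holds a slice of the weight matrix
    # (split along the output dimension) and sees the full batch.
    w_shards = np.array_split(w, n_devices, axis=1)
    y_model = np.concatenate([x @ shard for shard in w_shards], axis=1)

    # Both recover the same result as the single-device matmul.
    print(np.allclose(y_data, x @ w), np.allclose(y_model, x @ w))  # True True
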

But, to be more accurate than I was in my original comment, I don’t know most of that in the sense that would be required by peer review, let alone a jury. It’s just an educated guess.
