zlacker

[return to "Imagen, a text-to-image diffusion model"]
1. benree+al 2022-05-23 22:55:39
>>kevema+(OP)
I apologize in advance for the elitist-sounding tone. In my defense, I have nothing to do with the people I’m calling elite; I’m certainly not talking about myself.

Without a fairly deep grounding in this stuff it’s hard to appreciate how far ahead Brain and DM are.

Neither OpenAI nor FAIR ever has the top score on anything unless Google delays publication. And short of FAIR? D2 lacrosse. There are exceptions to such a brash generalization, NVIDIA’s group comes to mind, but it’s a very good rule of thumb. Or your whole face the next time you are tempted to doze behind the wheel of a Tesla.

There are two big reasons for this:

- the talent wants to work with the other talent, and through a combination of foresight and deep pockets Google got that exponent on their side right around the time NVIDIA cards started breaking ImageNet. Winning the Hinton bidding war clinched it.

- the current approach of “how many Falcon Heavy launches worth of TPU can I throw at the same basic masked attention with residual feedback and a cute Fourier coloring” inherently favors deep pockets, and obviously MSFT, sorry OpenAI has that, but deep pockets also non-linearly scale outcomes when you’ve got in-house hardware for multiply-mixed precision.

Now clearly we’re nowhere close to Maxwell’s Demon on this stuff, and sooner or later some bright spark is going to break the logjam of needing 10-100MM in compute to squeeze a few points out of a language benchmark. But the incentives are weird here: who, exactly, does it serve for us plebs to be able to train these things from scratch?

◧◩
2. dougab+bp 2022-05-23 23:27:43
>>benree+al
This characterization is not really accurate. OpenAI has had almost a 2-year lead with GPT-3 dominating the discussion of LLMs (large language models). Google didn’t release its paper on the powerful PaLM-540B model until recently. Similarly, CLIP, GLIDE, DALL-E, and DALL-E 2 have been incredibly influential in visual-language models. Imagen, while highly impressive, definitely is a catch-up piece of work (as was PaLM-540B).

Google clearly demonstrates their unrivaled capability to leverage massive quantities of data and compute, but it’s premature to declare that they’ve secured victory in the AI Wars.

◧◩◪
3. benree+Ap 2022-05-23 23:31:27
>>dougab+bp
I agree that it’s still a jump ball in a rapidly moving field; I was saying Google is far ahead, not that they’ve won.

And I don’t think whatever iteration of PaLM was cooking at the time GPT-3 started getting press would have looked too shabby.

I think Google crushed OpenAI on both GPT and DALL-E in short order because OpenAI published twice and someone had had enough.

◧◩◪◨
4. dougab+Wq 2022-05-23 23:42:14
>>benree+Ap
That’s pretty speculative and dubious (the holding back part) given the heavy bias to publication culture at Google Research and DeepMind. OpenAI has hardly been “crushed” here; PaLM and Imagen are solid, incremental advances, but given what came before them, not Earth-shattering.

If I were going to cite evidence for Alphabet’s “supremacy” in AI, I would’ve picked something more novel and surprising such as AlphaFold, or perhaps even Gato.

It’s not clear to me that Google has anything which compares to Reality Labs, although this may simply be my own ignorance.

Nvidia surely scooped Google with Instant Neural Graphics Primitives, in spite of Google publishing dozens of (often very interesting) NeRF papers. It’s not a war; all these works build on one another.

◧◩◪◨⬒
5. benree+Vt 2022-05-24 00:07:22
>>dougab+Wq
I want to be clear, all of this stuff is fascinating, expensive, and difficult. With the possible exception of a few trailer-park weirdos like me, it basically takes a PhD to even stay on top of the field, and you clearly know your stuff.

And to be equally clear, I have no inside baseball on how Brain/DM choose when to publish. I have some watercooler chat on the friendly but serious rivalry between those groups, but that’s about it.

I’m looking from the outside in at OpenAI getting all the press and attention, which sounds superficial but sooner or later turns into actual hires of actual star-bound postdocs, and Google laying a little low for a few years.

Then we get Gato, Imagen, and PaLM in the space of like what, 2 months?

Clearly I’m speculating that someone pulled the trigger, but I don’t think it’s like, absurd.

◧◩◪◨⬒⬓
6. dougab+dw 2022-05-24 00:25:50
>>benree+Vt
Scaling up improved versions of existing recipes can be done surprisingly fast if you have strong DL infrastructure. Also, GPT-3 was built on top of previous advances such as Google’s BERT. I’m surprised that it took Google so long to answer w/ PaLM, though it seems plausible to me that they wanted a clear enough qualitative advancement that people didn’t immediately say, “So what.”

You could’ve had the same reaction years ago when Google published GoogLeNet followed by a series of increasingly powerful Inception models - namely that Google would wind up owning the DNN space. But it didn’t play out that way, perhaps because Google dragged its feet releasing the models and training code, and by the time it did, there were simpler and more powerful models available like ResNet.

Meta’s recent release of the actual OPT LLM weights is probably going to have more impact than PaLM, unless Google can be persuaded to open up that model.
