zlacker

[return to "Imagen, a text-to-image diffusion model"]
1. Veedra+7w 2022-05-24 00:25:12
>>kevema+(OP)
I thought I was doing well after not being overly surprised by DALL-E 2 or Gato. How am I still not calibrated on this stuff? I know I am meant to be the one who constantly argues that language models already have sophisticated semantic understanding, and that you don't need visual senses to learn grounded world knowledge of this sort, but come on, you don't get to just throw T5 into a multimodal model as-is and have it work better than multimodal transformers! VLM[1] at least added fine-tuned internal components.

Good lord, we are screwed. And yet somehow I bet even this isn't going to kill off the "they're just statistical interpolators" meme.

[1] https://www.deepmind.com/blog/tackling-multiple-tasks-with-a...

2. benree+1x 2022-05-24 00:33:50
>>Veedra+7w
It’s just my opinion, but I think the meme you’re talking about is deeply related to other branches of science and philosophy, ranging from the trusty old saw about AI being anything a computer hasn’t done yet to deep meditations on the nature of consciousness.

They’re all fundamentally anthropocentric: people argue until they are blue in the face about what “intelligent” means, but it’s always implicit that what they really mean is “how much like me is this other thing”.

Language models, even more so than the vision models that got them funded, have empirically demonstrated that knowing the probability of two things being adjacent in some latent space is, at the boundary, indistinguishable from creating and understanding language.

I think the burden is on the bright hominids with both a reflexive language model and a sex drive to explain their pre-Copernican, unique place in the theory of computation rather than vice versa.

A lot of these problems just aren’t problems anymore if performance on tasks supersedes “consciousness” as the thing we’re studying.
