MobileDiffusion: Rapid text-to-image generation on-device

>>jasond+(OP)
Original paper: https://arxiv.org/abs/2311.16567

>>jasond+(OP)
some points that stood out to me:

1. they made a lot of careful tweaks to the UNet architecture - it seems like they ran many different ablations here ("In total, our endeavor consumes approximately 512 TPUs spanning 30 days").

2. the model distillation is based on previous UFOGen work from the same team https://arxiv.org/abs/2311.09257 (hence the UFO graphic in the diffusion-gan diagram)

3. they train their own 8-channel latent encoder / decoder ("VAE") from scratch (similar to Meta's Emu paper) instead of using the SD VAEs like many other papers do

4. they use an internal dataset of 150m image/text pairs (roughly the size of laion-highres)

5. they also reran SD training from scratch on this dataset to get their baseline performance
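
for a rough sense of scale on points 1 and 3, here is a back-of-envelope sketch. The 512 TPUs and 30 days come from the quote above, and the 8 latent channels from point 3. The 512x512 input size, the 8x spatial downsample factor, and the 4-channel latent used for comparison are my assumptions for illustration, not numbers from this thread; the helper names are made up.

```python
# Back-of-envelope arithmetic; not taken from the paper's code.
TPUS = 512   # from the quoted sentence
DAYS = 30    # from the quoted sentence

tpu_days = TPUS * DAYS
tpu_hours = tpu_days * 24


def latent_shape(height, width, channels, downsample=8):
    """Return (C, H/f, W/f). The factor f=8 is an assumption."""
    return (channels, height // downsample, width // downsample)


def num_elements(shape):
    """Total number of values in a (C, H, W) latent."""
    c, h, w = shape
    return c * h * w


# Assumed 512x512 input; 4 channels is the assumed comparison point.
four_ch = latent_shape(512, 512, channels=4)
eight_ch = latent_shape(512, 512, channels=8)
ratio = num_elements(eight_ch) / num_elements(four_ch)

print(f"{tpu_days} TPU-days, {tpu_hours} TPU-hours")
print(f"4-channel latent: {four_ch}, 8-channel latent: {eight_ch}, ratio: {ratio}")
```

under those assumptions the 8-channel latent simply carries twice as many values per image at the same spatial resolution, which is the trade-off against a smaller latent that is cheaper to denoise.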

>>tussa+1m
It did work, I used it myself. A quick search shows others who had my experience. This was late 2019 for me. Here's the first link and the Google post on rolling out in summer 2019:

"Google Assistant just made a dinner reservation for me... I knew this was coming... but mind blown!" https://www.reddit.com/r/googlehome/comments/ezv3us/google_a...

"Now, you can use it on all Pixel phones in 43 U.S. states.

All it takes is a few seconds to tell your Assistant where you'd like to go. Just ask the Assistant on your phone, “Book a table for four people at [restaurant name] tomorrow night.” The Assistant will then call the restaurant to see if it can accommodate your request. Once your reservation is successfully made, you’ll receive a notification on your phone, an email update and a calendar invite so you don’t forget."

https://blog.google/products/assistant/book-table-google-ass...

>>tussa+1m
It did materialize.

https://www.reddit.com/r/googlehome/comments/ezv3us/google_a...

Or the comments under https://youtu.be/-RHG5DFAjp8

It's probably hard to trigger these days because most places support OpenTable or similar.

>>jshear+df
it may turn out more like the imagen timeline

2022-05 - google imagen research paper posted >>31484562

2022-12 - imagen developers leave google to form ideogram

2023-08 - ideogram ships a version of imagen, free, for anyone who wants to use it https://ideogram.ai/publicly-available

2023-12 - google "imagen 2" is officially "generally available for Vertex AI customers on the allowlist (i.e., approved for access)." >>38628417

>>simult+XW
People search and remember things visually, even if they're not consciously aware of it. So on the Cybershow [0] we decided to jump in and use AI images as a quick way to visually tag episodes with something meaningful and fun.

We did that despite some moral ambivalence/uneasiness around AI "art".

For example, give me a "young and exciting Dana Meadows in front of a board of systems theory"

I'm not awful at photoshopping things, and sometimes that's the only way to get a specific image one has in mind. But generating the images saves time and lets us concentrate on writing and researching instead.

TBH if an artist/illustrator came along and said "Let me do the episode icons even though you can't pay me yet" I'd feel inclined to ask the AI to step aside.

[0] https://cybershow.uk/episodes.php

>>jasond+(OP)
https://arxiv.org/pdf/2311.16567.pdf

>>kj99+Mn1
Which laws do you mean and where do they apply?

This article talks a bit about the lack of legal power to fight against deepfakes: https://mcolaw.com/theres-not-much-we-can-legally-do-about-d...
