zlacker

Google Imagen 2

submitted by geox+(OP) on 2023-12-13 15:07:05 | 244 points 181 comments
[view article] [source] [go to bottom]

NOTE: showing posts with links only
1. simonw+Q7[view] [source] 2023-12-13 15:36:07
>>geox+(OP)
This post has more information: https://cloud.google.com/blog/products/ai-machine-learning/i...

I can't figure out how to try this thing. The closest I got was this sentence:

"To get started with Imagen 2 on Vertex AI, find our documentation or reach out to your Google Cloud account representative to join the Trusted Tester Program."

◧◩
5. coder5+ba[view] [source] [discussion] 2023-12-13 15:45:18
>>simonw+Q7
This page might be somewhat helpful: https://cloud.google.com/vertex-ai/docs/generative-ai/image/...

It also includes a link to the TTP form, although the form itself seems to make no reference to Imagen being part of the program anymore, confusingly. (Instead indicating that Imagen is GA.)

◧◩◪
9. Ologn+Bc[view] [source] [discussion] 2023-12-13 15:54:41
>>Mashim+qb
Stability AI's SDXL has gaps with text, but they seem to do a better job with DeepFloyd IF ( https://github.com/deep-floyd/IF ). I have done a lot of interesting text things with DeepFloyd.
25. verdve+cm[view] [source] 2023-12-13 16:33:51
>>geox+(OP)
For the peer comments:

- https://cloud.google.com/vertex-ai (marketing page)

- https://cloud.google.com/vertex-ai/docs (docs entry point)

- https://console.cloud.google.com/vertex-ai (cloud console)

- https://console.cloud.google.com/vertex-ai/model-garden (all the models)

- https://console.cloud.google.com/vertex-ai/generative (studio / playground)

Vertex AI is the umbrella for all of the Google models available through their cloud platform.

It still seems there is confusion (at Google) about whether this is TTP or GA. The docs say both, and the studio has a request-access link.

More: this page has a table of features and current access levels: https://cloud.google.com/vertex-ai/docs/generative-ai/image/...

It seems some features are GA while others are still in early access; in particular, image generation is still EA, or what they call "Restricted GA".

28. knodi1+Zm[view] [source] 2023-12-13 16:37:04
>>geox+(OP)
The prompt "A shot of a 32-year-old female, up and coming conservationist in a jungle; athletic with short, curly hair and a warm smile" produced an impressive image. But I ran the same prompt 3 times on my laptop in just a few minutes and got 3 almost-equally impressive images (using Stable Diffusion and a free model called devlishphotorealism_sdxl15).

https://imgur.com/a/4otrN17

30. brrrrr+Hp[view] [source] 2023-12-13 16:46:39
>>geox+(OP)
They should make it accessible at https://imagen.google, like Meta did with https://imagine.meta.com
32. rough-+rq[view] [source] 2023-12-13 16:49:54
>>geox+(OP)
The authors of the original Imagen paper have gone on to create https://ideogram.ai/
34. arthur+Vr[view] [source] 2023-12-13 16:54:17
>>geox+(OP)
I asked Imagen 2 to generate a transparent product icon, and it generated an actual grey-and-white checkerboard pattern as the background of the image... https://imgur.com/a/KA2yWHp
◧◩
42. gpm+rw[view] [source] [discussion] 2023-12-13 17:10:29
>>simonw+Q7
I think the process is:

1. Go to console.cloud.google.com

2. Go to model garden

3. Search imagegeneration

4. End up at https://console.cloud.google.com/vertex-ai/publishers/google...

And for whatever reason that is where the documentation is.

Sample request

    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json; charset=utf-8" \
        -d @request.json \
        "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagegeneration@002:predict"
Sample request.json

    {
      "instances": [
        {
          "prompt": "TEXT_PROMPT"
        }
      ],
      "parameters": {
        "sampleCount": IMAGE_COUNT
      }
    }
Sample response

    {
      "predictions": [
        {
          "bytesBase64Encoded": "BASE64_IMG_BYTES",
          "mimeType": "image/png"
        },
        {
          "mimeType": "image/png",
          "bytesBase64Encoded": "BASE64_IMG_BYTES"
        }
      ],
      "deployedModelId": "DEPLOYED_MODEL_ID",
      "model": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID",
      "modelDisplayName": "MODEL_DISPLAYNAME",
      "modelVersionId": "1"
    }
Disclaimer: Haven't actually tried sending a request...
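The same call sketched in Python, mirroring the sample request and response shapes above. Caveat like the parent: the endpoint, placeholders, and response fields are taken from the comment, not verified against current docs.

```python
import base64
import json

# Endpoint from the sample curl above; substitute your own project ID.
ENDPOINT = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/{project}"
    "/locations/us-central1/publishers/google/models/imagegeneration@002:predict"
)

def build_request(prompt: str, sample_count: int = 2) -> dict:
    """Build a body with the same structure as the sample request.json."""
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count},
    }

def decode_predictions(response: dict) -> list:
    """Pull raw PNG bytes out of a response shaped like the sample above."""
    return [
        base64.b64decode(p["bytesBase64Encoded"])
        for p in response.get("predictions", [])
    ]

if __name__ == "__main__":
    body = build_request("a watercolor painting of a lighthouse")
    print(json.dumps(body, indent=2))
    # POST `body` to ENDPOINT.format(project=...) with an
    # "Authorization: Bearer $(gcloud auth print-access-token)" header,
    # then feed the parsed JSON response to decode_predictions().
```

The POST itself is left as a comment since (like the parent) I haven't actually sent one; the helpers just show how the documented request/response shapes fit together.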
◧◩◪
43. passio+jy[view] [source] [discussion] 2023-12-13 17:17:15
>>celest+ow
https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dat...
◧◩
67. dang+TX[view] [source] [discussion] 2023-12-13 18:41:36
>>simonw+Q7
Ok, we'll change to that from https://deepmind.google/technologies/imagen-2/ above. Thanks!
◧◩◪
71. ravetc+YZ[view] [source] [discussion] 2023-12-13 18:49:37
>>tomCom+fL
Except they just came to an agreement https://www.theglobeandmail.com/politics/article-bill-c18-on...
◧◩
90. gollum+pa1[view] [source] [discussion] 2023-12-13 19:27:45
>>brettg+i41
Allegedly Imagen 2 is indeed better at producing hands: https://deepmind.google/technologies/imagen-2/

> Imagen 2’s dataset and model advances have delivered improvements in many of the areas that text-to-image tools often struggle with, including rendering realistic hands and human faces and keeping images free of distracting visual artifacts.

93. Jackso+wb1[view] [source] 2023-12-13 19:32:04
>>geox+(OP)
Kinda scratching my head at the purpose of the prompt understanding examples they show off. From previous papers I've seen in the space, shouldn't they be trying various compositional things like "A blue cube next to a red sphere" and variations thereof?

Instead they use

>The robin flew from his swinging spray of ivy on to the top of the wall and he opened his beak and sang a loud, lovely trill, merely to show off. Nothing in the world is quite as adorably lovely as a robin when he shows off - and they are nearly always doing it.

And the result they show off is a photograph of a robin, cool. SDXL[0] can do the exact same thing given the same prompt; in fact even SD1.5 can do it easily[1].

[0]https://i.imgur.com/rsgtYbf.png

[1]https://i.imgur.com/1rcQpcQ.png

◧◩◪
94. ForkMe+Mb1[view] [source] [discussion] 2023-12-13 19:33:23
>>nprate+R41
You don't even need a GPU anymore unless you care about realtime. A decent CPU can generate a 512x512 image in 2 seconds.

https://github.com/rupeshs/fastsdcpu

https://www.youtube.com/watch?v=s2zSxBHkNE0

◧◩
96. Cobras+cd1[view] [source] [discussion] 2023-12-13 19:37:50
>>brrrrr+Hp
Don't forget Bing Image Creator: https://www.bing.com/images/create

My kids found it organically and were happily creating all sorts of DALL·E 3 images.

◧◩◪
97. simonw+Ld1[view] [source] [discussion] 2023-12-13 19:39:23
>>karmas+C81
DALL-E 3 doesn't have Stable Diffusion's killer feature, which is the ability to use an image as input and influence that image with the prompt.

(DALL-E pretends to do that, but it's actually just using GPT-4 Vision to create a description of the image and then prompting based on that.)

Live editing tools like https://drawfast.tldraw.com/ are increasingly being built on top of Stable Diffusion, and are far and away the most interesting way to interact with image generation models. You can't build that on DALL-E 3.

◧◩◪◨
106. boh+fh1[view] [source] [discussion] 2023-12-13 19:55:21
>>bbor+f81
Using this: https://github.com/AUTOMATIC1111/stable-diffusion-webui

Then this: https://civitai.com/

And I have completely abandoned DALLE and will likely never use it again.

◧◩◪
115. mkl+pr1[view] [source] [discussion] 2023-12-13 20:50:52
>>kkkkkk+741
The documentation [1] says otherwise. Image generation is "Restricted General Availability (approved users)" and "To request access to use this Imagen feature, contact your Google account representative."

[1] https://cloud.google.com/vertex-ai/docs/generative-ai/image/...

◧◩◪
133. 6gvONx+6A1[view] [source] [discussion] 2023-12-13 21:37:33
>>gpm+rw
Once I finally got mostly set up for that, with billing and everything, it said it's only available to a limited number of customers, with a "request access" link to a Google form with further links (to enable https://aiplatform.googleapis.com/) which 404.

What a shitshow.

◧◩◪◨
138. andyba+8E1[view] [source] [discussion] 2023-12-13 21:58:35
>>jeffbe+My
>>38633910
◧◩◪◨⬒⬓
139. andyba+GE1[view] [source] [discussion] 2023-12-13 22:00:33
>>bbor+rw1
On Windows just use https://softology.pro/tutorials/tensorflow/tensorflow.htm

It installs dozens upon dozens of models and related scripts painlessly.

◧◩◪◨
152. gpm+eS1[view] [source] [discussion] 2023-12-13 23:19:35
>>6gvONx+6A1
Weird, just tried in my terminal and it works fine. My account definitely has no special permissions, I've never requested any, I've probably spent less than $100 total on it (and that almost entirely on domain names).

Results: https://imgur.com/a/JIiuDt9

◧◩◪
154. gpm+8T1[view] [source] [discussion] 2023-12-13 23:25:00
>>gpm+rw
Self-reply since I can't edit the post anymore. Tried this; the API seems to work just fine for me with no extra permissions.

Results (these are the only two images I generated): https://imgur.com/a/JIiuDt9

◧◩◪
167. Evgeni+Ky2[view] [source] [discussion] 2023-12-14 05:53:50
>>mkl+Er1
Sure, here's a sample query: https://imgur.com/a/8UDDac9 These are DALL-E 3, Imagen, and Imagen 2, in that order. I used code based on similar examples from GitHub [1]. According to the docs [2], imagegeneration@005 was released on the 11th, so I guessed it's Imagen 2, though there's no confirmation.

[1] https://github.com/GoogleCloudPlatform/generative-ai/blob/ma...

[2] https://console.cloud.google.com/vertex-ai/publishers/google...
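For what it's worth, the version seems to ride on the endpoint path as `imagegeneration@NNN` (the curl sample earlier in the thread uses @002). A tiny sketch of swapping the version; whether @005 really is Imagen 2 is only this comment's guess, and the URL scheme is inferred from the thread, not from current docs.

```python
def predict_url(project: str, version: str, region: str = "us-central1") -> str:
    """Build the :predict URL for a given imagegeneration model version,
    following the endpoint pattern used earlier in the thread."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{region}/publishers/google/models/imagegeneration@{version}:predict"
    )

# @002 is the version from the earlier curl sample; @005 is the guessed Imagen 2.
print(predict_url("my-project", "005"))
```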

◧◩◪◨⬒
178. Evgeni+qN3[view] [source] [discussion] 2023-12-14 16:19:11
>>kossTK+Pe3
To be fair, it is not always that awful. Here is a sample of results from a simpler subset of prompts I like to run on image generators: https://imgur.com/a/aO5S7yM Some are bad (the first two), but others are okay; it understands text pretty well, but the artifacts feel like something from years ago.

I still can't understand how it got released and advertised.
