zlacker

I'm kind of scratching my head at the purpose of the prompt-understanding examples they show off. Judging by previous papers I've seen in this space, shouldn't they be testing compositional prompts like "A blue cube next to a red sphere" and variations thereof?

Instead they use

>The robin flew from his swinging spray of ivy on to the top of the wall and he opened his beak and sang a loud, lovely trill, merely to show off. Nothing in the world is quite as adorably lovely as a robin when he shows off - and they are nearly always doing it.

And the result they show off is a photograph of a robin. Cool, but SDXL[0] can do the exact same thing given the same prompt; in fact, even SD1.5 can do it easily[1].

[0] https://i.imgur.com/rsgtYbf.png

[1] https://i.imgur.com/1rcQpcQ.png

replies(2): >>riskab+ao >>JayXon+Z71

>>Jackso+(OP)
I've developed two tests for AI image generators to see if they've actually advanced to "the next level". Take literally any AI image generator and give it one of these prompts:

"A flying squirrel gliding between trees": It won't be able to do it. Just telling it "flying squirrel" will often generate squirrels with bat wings coming off their backs.

Ahh, but that's just a tiny, specific thing missing from the data set! Surely that'll get fixed eventually as they add more training data...

"A fox girl hugging a bunny girl hugging a cat girl": The only way to make this work is with fancy stuff like Segment Anything (SAM) working with Stable Diffusion. Alternative prompts of the same thing:

"A fox girl and a bunny girl and a cat girl all hugging each other"

It's such a simple thing; generative AI can make three people hugging each other no problem. However, trying to get it to generate three different types of people in the same scene is really, really hard and largely dependent on luck.

replies(2): >>SushiH+3F >>DrSiem+pL

>>riskab+ao
I tested the prompts with DALL-E 3 (through the API).

The flying squirrel one was spot on: it showed an image of the trees and a squirrel with wings, which kind of looked like bat wings.

The three girls hugging each other, however, worked fairly well: it always created three different types of people, but all three never hugged each other. Either two of the three hugged, or nobody hugged anyone.

replies(1): >>therea+0f1

>>riskab+ao
In SD you can add words like twins, brothers, clones, repetition and copy to your negative prompt. It won't fix the problem, but it will help.

It would be a lot easier if ADetailer (After Detailer) could handle dynamic prompts.
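
For anyone who wants to try the negative-prompt trick, here is a rough sketch of assembling that string in Python. The helper name `build_negative_prompt` and the extra term "identical faces" are made up for illustration; only the five base terms come from the comment above.

```python
# Terms suggested above for discouraging look-alike characters.
DUPLICATE_TERMS = ["twins", "brothers", "clones", "repetition", "copy"]


def build_negative_prompt(extra_terms=(), base_terms=DUPLICATE_TERMS):
    """Join negative-prompt terms into one comma-separated string.

    Terms are lowercased and stripped, and repeats are dropped so the
    same word is not listed twice. (Hypothetical helper, not part of
    any Stable Diffusion tool.)
    """
    seen = []
    for term in list(base_terms) + list(extra_terms):
        cleaned = term.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ", ".join(seen)


prompt = "A fox girl and a bunny girl and a cat girl all hugging each other"
# "identical faces" is an illustrative extra term, not from the thread.
negative_prompt = build_negative_prompt(extra_terms=["Twins", "identical faces"])
print(negative_prompt)
```

If you use the Hugging Face diffusers library, a string like this would normally go into the `negative_prompt` argument of the pipeline call alongside `prompt`. That step is left out here because it needs model weights, and as said above, it helps rather than fixes the problem.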

>>Jackso+(OP)
The prompt is a quote from a book, and it mentions "opened his beak and sang a loud, lovely trill". The Imagen 2 robin does exactly that, but both SD models ignored it completely, and the SD1.5 robin isn't even on top of the wall.

>>SushiH+3F
(Deleted)