Z-Image: Powerful and highly efficient image generation model with 6B parameters

>>doener+(OP)
I've done some preliminary testing with Z-Image Turbo in the past week.

Thoughts

- It's fast (~3 seconds on my RTX 4090)

- Surprisingly capable of maintaining image integrity even at high resolutions (1536x1024, sometimes 2048x2048)

- The adherence is impressive for a 6B parameter model

Some tests (2 / 4 passed):

Personally I find it works better as a refiner model downstream of Qwen-Image 20b which has significantly better prompt understanding but has an unnatural "smoothness" to its generated images.

>>doener+(OP)
The [demo PDF](https://github.com/Tongyi-MAI/Z-Image/blob/main/assets/Z-Ima...) has ~50 photos of attractive young women sitting/standing alone, and exactly two photos featuring young attractive men on their own.

It's incredibly clear who the devs assume the target market is.

>>doener+(OP)
i have been testing this on my Framework Desktop. ComfyUI generally causes an amdgpu kernel fault after about 40 steps (across multiple prompts), so i spent a few hours building a workaround here https://github.com/comfyanonymous/ComfyUI/pull/11143

overall it's fun and impressive. decent results using LoRA. you can achieve good looking results with as few as 8 inference steps, which takes 15-20 seconds on a Strix Halo. i also created a llama.cpp inference custom node for prompt enhancement which has been helping with overall output quality.

>>vunder+fCk
On fal, it takes less than a second many times.

https://fal.ai/models/fal-ai/z-image/turbo/api

Couple that with the LoRA, in about 3 seconds you can generate completely personalized images.

The speed alone is a big factor, but if you put the model side by side with seedream, nanobanana and other models, it's definitely in the top 5, and that's a killer combo imho.

>>echelo+PCk
Yeah, I've definitely switched largely away from Flux. Much as I do like Flux (for prompt adherence), BFL's baffling licensing structure along with its excessive censorship makes it a no-go.

For ref, the Porcupine-cone creature that ZiT couldn't handle by itself in my aforementioned test was easily handled using a Qwen20b + ZiT refiner workflow and even with two separate models STILL runs faster than Flux2 [dev].

https://imgur.com/a/5qYP0Vc

>>idontw+qKk
Apparently - https://github.com/ivanfioravanti/z-image-mps

Supports MPS (Metal Performance Shaders). Using something that skips Python entirely along with an MLX or GGUF converted model file (if one exists) will likely be even faster.

>>muglug+TEk
Please write what you mean instead of making veiled implications. What is the point of beating around the bush here?

It's not clear to me what you mean either, especially since female models are overwhelmingly more popular in general[1].

[1]: "Female models make up about 70% of the modeling industry workforce worldwide" https://zipdo.co/modeling-industry-statistics/

>>muglug+TEk
It's interesting the handsome guy is literally Tony Leung Chiu-wai, https://www.imdb.com/name/nm0504897/, not even modified

>>doener+(OP)
As an AI outsider with a recent 24GB macbook, can I follow the quick start[1] steps from the repo and expect decent results? How much time would it take to generate a single medium quality image?

[1]: https://github.com/Tongyi-MAI/Z-Image?tab=readme-ov-file#-qu...
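For a rough sense of whether the weights alone fit in 24GB, here is a back-of-the-envelope sketch. It only assumes the 6B parameter count from the title and standard bytes-per-parameter figures for each precision; it ignores the text encoder, the VAE, and activation memory, so real usage will be higher. The function name `weight_gb` is made up for illustration.

```python
# Approximate storage cost per parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_gb(num_params, bytes_per_param):
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# 6B parameters, as stated in the thread title.
for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gb(6e9, size):.1f} GB")
```

This says nothing about speed; it only shows why the precision of the checkpoint matters on a machine with shared memory.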

>>rfoo+wSk
You tell me.

https://imgur.com/a/7FR3uT1

>>muglug+TEk
"The Internet is really, really great..."

https://www.youtube.com/watch?v=LTJvdGcb7Fs

>>amrrs+uGk
I don't know anything about paying for these services, and as a beginner, I worry about running up a huge bill. Do they let you set a limit on how much you pay? I see their pricing examples, but I've never tried one of these.

https://fal.ai/pricing

>>paweld+Uyk
Incredibly fast. On my 5090 with CUDA 13 (& the latest diffusers, xformers, transformers, etc...), 9 sampling steps and the "Tongyi-MAI/Z-Image-Turbo" model I get:

- 1.5s to generate an image at 512x512

- 3.5s to generate an image at 1024x1024

- ~26s to generate an image at 2048x2048

It uses almost all of the 32 GB of VRAM and nearly all of the GPU. I'm using the script from the HF post: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
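To compare those numbers across resolutions, it helps to normalise them to seconds per megapixel. This is just arithmetic on the timings quoted above, not a benchmark script; the 2048x2048 value is treated as roughly 26 s, and the helper name `seconds_per_megapixel` is made up for illustration.

```python
# Timings quoted above (RTX 5090, 9 sampling steps), in seconds.
# The 2048x2048 entry is approximate.
timings = {(512, 512): 1.5, (1024, 1024): 3.5, (2048, 2048): 26.0}


def seconds_per_megapixel(width, height, seconds):
    """Generation time divided by the image size in megapixels."""
    megapixels = width * height / 1_000_000
    return seconds / megapixels


for (w, h), t in timings.items():
    print(f"{w}x{h}: {seconds_per_megapixel(w, h, t):.2f} s/MP")
```

On these figures the per-megapixel cost is lowest at 1024x1024 and close to double that at 2048x2048, so the largest size is disproportionately expensive rather than scaling linearly with pixel count.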

>>sheeps+8Mk
(Not tested) though apparently it already exists: https://github.com/leejet/stable-diffusion.cpp/wiki/How-to-U...

>>soonti+agl
Thanks for the heads up. I just checked the site through several browsers and proxying through a VPN. There's no typo and it properly links to:

https://github.com/Tongyi-MAI/Z-Image

Screenshot of site with network tools open to indicate link

https://imgur.com/a/FZDz0K2

EDIT: It's possible that this issue might have existed in an old cached version. I'll purge the cache just to make sure.

>>tethys+del
This. You can also run most (if not all) of the models that Fal.ai hosts directly from the playground tab, including Z-Image Turbo.

https://fal.ai/models/fal-ai/z-image/turbo

>>thih9+sUk
Try koboldcpp with the kcppt config file. The easiest way by far.

Download the release here

* https://github.com/LostRuins/koboldcpp/releases/tag/v1.103

Download the config file here

* https://huggingface.co/koboldcpp/kcppt/resolve/main/z-image-...

Set +x on the koboldcpp executable and launch it, select 'Load config' and point it at the config file, then hit 'launch'.

Wait until the model weights are downloaded and launched, then open a browser and go to:

* http://localhost:5001/sdui

EDIT: This will work for Linux, Windows and Mac
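Since the first launch has to download the weights, the UI may not answer for a while. A minimal sketch of a wait loop, using only the Python standard library and the URL from the steps above; `wait_for_server` is a made-up helper, not part of koboldcpp, and the timeout values are arbitrary.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url, timeout=300.0, interval=2.0):
    """Poll url until it returns any HTTP response, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even though this path returned an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet (or the connection failed); retry shortly.
            time.sleep(interval)
    return False


# Example call, assuming koboldcpp is listening on its default port from the steps above:
# wait_for_server("http://localhost:5001/sdui")
```

It returns `True` as soon as anything answers on that address and `False` if the timeout runs out first.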

>>Copenj+Dyk
I have had good textual results with the Turbo version so far. Sometimes it drops a letter in the output, but most of the time it adheres well to both the text requested and the style.

I tried this prompt on my username: "A painted UFO abducts the graffiti text "Accrual" painted on the side of a rusty bridge."

Results: https://imgur.com/a/z-image-test-hL1ACLd
