Thoughts
- It's fast (~3 seconds on my RTX 4090)
- Surprisingly capable of maintaining image integrity even at high resolutions (1536x1024, sometimes 2048x2048)
- The adherence is impressive for a 6B parameter model
Some tests (2 / 4 passed):
Personally I find it works better as a refiner model downstream of Qwen-Image 20b which has significantly better prompt understanding but has an unnatural "smoothness" to its generated images.
It's incredibly clear who the devs assume the target market is.
overall it's fun and impressive. decent results using LoRA. you can achieve good looking results with as few as 8 inference steps, which takes 15-20 seconds on a Strix Halo. i also created a llama.cpp inherence custom node for prompt enhancement which has been helping with overall output quality.
https://fal.ai/models/fal-ai/z-image/turbo/api
Couple that with the LoRA, in about 3 seconds you can generate completely personalized images.
The speed alone is a big factor but if you put the model side by side with seedream and nanobanana and other models it's definitely in the top 5 and that's killer combo imho.
For ref, the Porcupine-cone creature that ZiT couldn't handle by itself in my aforementioned test was easily handled using a Qwen20b + ZiT refiner workflow and even with two separate models STILL runs faster than Flux2 [dev].
Supports MPS (Metal Performance Shaders). Using something that skips Python entirely along with a mlx or gguf converted model file (if one exists) will likely be even faster.
It's not clear to me what you mean either, especially since female models are overwhelmingly more popular in general[1].
[1]: "Female models make up about 70% of the modeling industry workforce worldwide" https://zipdo.co/modeling-industry-statistics/
[1]: https://github.com/Tongyi-MAI/Z-Image?tab=readme-ov-file#-qu...
- 1.5s to generate an image at 512x512
- 3.5s to generate an image at 1024x1024
- 26.s to generate an image at 2048x2048
It uses almost all the 32Gb Gb of VRAM and GPU usage. I'm using the script from the HF post: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
https://github.com/Tongyi-MAI/Z-Image
Screenshot of site with network tools open to indicate link
EDIT: It's possible that this issue might have existed in an old cached version. I'll purge the cache just to make sure.
Download the release here
* https://github.com/LostRuins/koboldcpp/releases/tag/v1.103
Download the config file here
* https://huggingface.co/koboldcpp/kcppt/resolve/main/z-image-...
Set +x to the koboldcpp executable and launch it, select 'Load config' and point at the config file, then hit 'launch'.
Wait until the model weights are downloaded and launched, then open a browser and go to:
* http://localhost:5001/sdui
EDIT: This will work for Linux, Windows and Mac
I tried this prompt on my username: "A painted UFO abducts the graffiti text "Accrual" painted on the side of a rusty bridge."