Thoughts
- It's fast (~3 seconds on my RTX 4090)
- Surprisingly capable of maintaining image integrity even at high resolutions (1536x1024, sometimes 2048x2048)
- The adherence is impressive for a 6B parameter model
Some tests (2 / 4 passed):
Personally I find it works better as a refiner model downstream of Qwen-Image 20b which has significantly better prompt understanding but has an unnatural "smoothness" to its generated images.
Is Flux 1/2/Kontext left in the dust by the Z Image and Qwen combo?
For ref, the Porcupine-cone creature that ZiT couldn't handle by itself in my aforementioned test was easily handled using a Qwen20b + ZiT refiner workflow and even with two separate models STILL runs faster than Flux2 [dev].