Thoughts
- It's fast (~3 seconds on my RTX 4090)
- Surprisingly capable of maintaining image integrity even at high resolutions (1536x1024, sometimes 2048x2048)
- The adherence is impressive for a 6B parameter model
Some tests (2 / 4 passed):
Personally I find it works better as a refiner model downstream of Qwen-Image 20b which has significantly better prompt understanding but has an unnatural "smoothness" to its generated images.
It is amazing how far behind Apple Silicon is when it comes to use non- language models.
Using the reference code from Z-image on my M1 ultra, it takes 8 seconds per step. Over a minute for the default of 9 steps.