Why bother using a product from a company notorious for failing to commit to most of its services, when you can run something that produces output that's pretty close (and maybe better) and is free to run, change, and train?
Stable Diffusion is the Linux-on-the-desktop of diffusion models IMO
(I agree w/ your comment on trusting Google - pretty sure they'll just phase this out eventually anyway, so I wouldn't bother trying it)
Because it costs $0.02 per image instead of $1000 on a graphics card and endless buggering around to set up.
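For what it's worth, the break-even math is easy to run. A back-of-envelope sketch in Python, taking both figures above at face value:

    # Assumes ~$0.02/image hosted vs a one-time ~$1000 GPU purchase
    gpu_cost = 1000.00
    hosted_cost_per_image = 0.02
    break_even = gpu_cost / hosted_cost_per_image
    print(f"GPU pays for itself after {break_even:,.0f} images")  # 50,000 images

So unless you're generating tens of thousands of images (or you value the control), the hosted price is hard to argue with.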
Linux entered the market at a time when paid alternatives were fully established and concentrated, having served users and companies for years until they were used to working with them. No paid txt2img offering comes anywhere close to that kind of market dominance for image generation. They don't offer anything that isn't available with free alternatives (they actually offer less) and are highly restrictive in comparison. Anyone doing anything beyond building disguised DALLE/Imagen clients has absolutely no incentive to use a paid service.
*it also takes like 15 mins to set up (this includes loading the models).
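If you go through Hugging Face's diffusers library instead of a UI, the whole setup really is a handful of lines (a minimal sketch; the model name is just one common choice):

    # pip install torch diffusers transformers accelerate
    import torch
    from diffusers import StableDiffusionPipeline

    # Weights download on the first run; later runs load from the local cache
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe("a watercolor fox in a snowy forest").images[0]
    image.save("fox.png")

Most of those 15 minutes go to the initial model download.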
This makes the image much more usable without editing.
(DALL-E pretends to do that, but it's actually just using GPT-4 Vision to create a description of the image and then prompting based on that.)
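If that's accurate, the round trip would look something like this sketch with the OpenAI Python client (model names and prompts are illustrative, not confirmed DALL-E internals):

    from openai import OpenAI

    client = OpenAI()

    # Step 1: a vision model writes a detailed description of the source image
    description = client.chat.completions.create(
        model="gpt-4-vision-preview",  # illustrative choice of vision model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in exhaustive detail."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }],
    ).choices[0].message.content

    # Step 2: regenerate from scratch using the description plus the edit;
    # no pixel from the original image actually survives this round trip
    result = client.images.generate(
        model="dall-e-3",
        prompt=description + " Now make the cat wear a top hat.",
    )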
Live editing tools like https://drawfast.tldraw.com/ are increasingly being built on top of Stable Diffusion, and are far and away the most interesting way to interact with image generation models. You can't build that on DALL-E 3.
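The trick behind those live tools is img2img at a very low step count. With diffusers plus an LCM-LoRA you can get per-frame latency down to canvas speed (a sketch of the general technique, not drawfast's actual stack):

    import torch
    from PIL import Image
    from diffusers import AutoPipelineForImage2Image, LCMScheduler

    pipe = AutoPipelineForImage2Image.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
    pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

    sketch = Image.open("canvas.png").convert("RGB")  # whatever the user just drew
    frame = pipe(
        "an oil painting of a lighthouse",
        image=sketch,
        num_inference_steps=4,  # LCM makes single-digit step counts usable
        strength=0.5,
        guidance_scale=1.0,
    ).images[0]

Re-run that on every canvas change and you get the drawfast-style feedback loop.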
Still, Stable Diffusion is losing the usability, tooling, and integration game. The people who care to make interfaces for it mostly treat it as an expert tool, not something for people who have never heard of image-generating AI. Many competing services have better out-of-the-box results (for people who don't know what a negative prompt is), easier hosting, user-friendly integrations in tools that matter, better hosted services, etc.
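(For anyone in that boat: a negative prompt is just a second prompt the sampler steers away from. In diffusers it's one extra argument, assuming a `pipe` like the one set up earlier:)

    image = pipe(
        "portrait photo of an astronaut",
        negative_prompt="blurry, low quality, watermark, extra fingers",
    ).images[0]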
I guess that turns out to be not as important for end users as you'd think.
Anyway, DeepFloyd/IF has great comprehension. It should be straightforward to improve that in Stable Diffusion; I can't tell you exactly why they haven't tried it.
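(For the curious: IF's comprehension largely comes from its big T5 text encoder. You can try stage I yourself through diffusers; a sketch, assuming you've accepted the gated license on Hugging Face:)

    import torch
    from diffusers import DiffusionPipeline

    # Stage I of three: it outputs 64x64 images that stages II/III upscale
    stage_1 = DiffusionPipeline.from_pretrained(
        "DeepFloyd/IF-I-XL-v1.0", variant="fp16", torch_dtype=torch.float16
    )
    stage_1.enable_model_cpu_offload()  # the T5 encoder alone is enormous

    image = stage_1("a neon sign that says OPEN").images[0]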
If you're just generating something for fun then DallE/MJ is probably sufficient, but if you're doing a project that requires specific details/style/consistency you're going to need way more tools. With SD/A1111 you can use a specific model (one that generates images in an anime style, for instance), use a ControlNet model for a specific pose, generate hundreds of candidate images (without having to pay for each one), hone your vision with tools like img2img/inpaint on the images you like, and if you're after a specific effect (a gif, for instance), use the many extensions created by the community to make it happen.
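To make the ControlNet piece concrete, here's roughly what locking a pose looks like with diffusers (model names are the common community checkpoints; A1111 exposes the same thing through its UI):

    import torch
    from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
    from diffusers.utils import load_image

    # An OpenPose skeleton extracted from a reference photo
    pose = load_image("pose.png")

    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # swap in your anime-style model here
        controlnet=controlnet, torch_dtype=torch.float16,
    ).to("cuda")

    # A batch of candidates in one call, all locked to the same pose
    images = pipe(
        ["anime character, dynamic lighting, detailed"] * 4,
        image=pose,
        negative_prompt=["lowres, bad anatomy"] * 4,
    ).images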
Then this: https://civitai.com/
And I have completely abandoned DALLE and will likely never use it again.
But it clearly didn't win in many scenarios, especially those that require precise text, which happens to matter more in commercial settings; cleaning up the gibberish text that OSS Stable Diffusion generates is tiring by itself.
It installs dozens upon dozens of models and related scripts painlessly.
Also not sure if it can be extended with LoRAs, or turned into a video/3D model, the same way an LDM can.
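With Stable Diffusion, at least, bolting on a community LoRA is a one-liner in diffusers (the checkpoint name below is a placeholder for any LoRA you'd grab from e.g. civitai):

    # Assumes `pipe` is a StableDiffusionPipeline as in the earlier sketches
    pipe.load_lora_weights("some-user/some-style-lora")  # placeholder repo id
    image = pipe("your prompt, in the LoRA's style").images[0]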
I'm one of the founders.