I can't figure out how to try this thing. The closest I got was this sentence:
"To get started with Imagen 2 on Vertex AI, find our documentation or reach out to your Google Cloud account representative to join the Trusted Tester Program."
Yet another documentation release by Google, promising impressive things that we cannot actually use, while the competition is readily available.
It also includes a link to the TTP form, although, confusingly, the form itself no longer seems to make any reference to Imagen being part of the program. (Instead it indicates that Imagen is GA.)
I can only compare it to Stable Diffusion, but Imagen 2 seems significantly more advanced.
Try to do anything with text in SDXL. It's not easy and it often messes up. I don't think you can get a clean logo with multiple text areas out of SDXL.
Look at the prompt and image of the robin. That is mighty impressive.
DALL-E 3 gets things right most of the time.
You can use Harrlogos XL to produce text with SDXL, although it's mostly limited to short captions and logos. The other route (ControlNets) is more involved, but actually useful.
I wouldn't say this until we are able to try it for ourselves. As we know, Google is prone to severe cherry picking and deceptive marketing.
- https://cloud.google.com/vertex-ai (marketing page)
- https://cloud.google.com/vertex-ai/docs (docs entry point)
- https://console.cloud.google.com/vertex-ai (cloud console)
- https://console.cloud.google.com/vertex-ai/model-garden (all the models)
- https://console.cloud.google.com/vertex-ai/generative (studio / playground)
Vertex AI is the umbrella for all of the Google models available through their cloud platform.
It still seems there is confusion (at Google) about whether this is TTP or GA. The docs say both, and the studio has a request-access link.
more... this page has a table with features and current access levels: https://cloud.google.com/vertex-ai/docs/generative-ai/image/...
Seems that some features are GA while others are still in early access; in particular, image generation is still EA, or what they call "Restricted GA".
I was hoping to see some of the research behind it, but there's nothing.
Is this just an end-run around incompetent security teams or something?
In addition to the models, you'll find a host of day-2 features like model monitoring and experiment tracking. Not having to vet and pick from 100+ new SaaS offerings for these is a nice problem not to have.
1. Go to console.cloud.google.com
2. Go to model garden
3. Search imagegeneration
4. End up at https://console.cloud.google.com/vertex-ai/publishers/google...
And for whatever reason that is where the documentation is.
Sample request
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/imagegeneration@002:predict"
Sample request.json
{
"instances": [
{
"prompt": "TEXT_PROMPT"
}
],
"parameters": {
"sampleCount": IMAGE_COUNT
}
}
Sample response
{
"predictions": [
{
"bytesBase64Encoded": "BASE64_IMG_BYTES",
"mimeType": "image/png"
},
{
"mimeType": "image/png",
"bytesBase64Encoded": "BASE64_IMG_BYTES"
}
],
"deployedModelId": "DEPLOYED_MODEL_ID",
"model": "projects/PROJECT_ID/locations/us-central1/models/MODEL_ID",
"modelDisplayName": "MODEL_DISPLAYNAME",
"modelVersionId": "1"
}
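For reference, here's a minimal Python sketch of the same call (I haven't verified it either; the helper names are mine, it assumes the endpoint and payload shapes shown above, and it shells out to `gcloud` for the token exactly like the curl does):

```python
import base64
import json
import subprocess
import urllib.request

# Same endpoint as the curl sample; PROJECT_ID stays a parameter.
ENDPOINT = (
    "https://us-central1-aiplatform.googleapis.com/v1/projects/{project}"
    "/locations/us-central1/publishers/google/models/imagegeneration@002:predict"
)

def build_payload(prompt, sample_count=2):
    """Mirror of the request.json shown above."""
    return {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count},
    }

def decode_predictions(response):
    """Turn the base64-encoded image bytes from a :predict response into raw bytes."""
    return [
        base64.b64decode(p["bytesBase64Encoded"])
        for p in response.get("predictions", [])
        if "bytesBase64Encoded" in p
    ]

def predict(project, prompt, sample_count=2):
    """Send the request, using gcloud for an access token (same as the curl sample)."""
    token = subprocess.check_output(
        ["gcloud", "auth", "print-access-token"], text=True
    ).strip()
    req = urllib.request.Request(
        ENDPOINT.format(project=project),
        data=json.dumps(build_payload(prompt, sample_count)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Each prediction's decoded bytes should be a PNG per the `mimeType` in the sample response, so you'd just write them out to `.png` files.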
Disclaimer: Haven't actually tried sending a request...

"Generative AI" is a learned, lossy compression codec. You should not be surprised that the range of outputs for a given input seems limited.
Would you have agreed this was the case at Twitter for a while?
FWIW I think I have 5+ google accounts. Have had them since gmail was in beta and have never been banned
Ambition trickles downwards and is killed upwards.
This has recently been resolved, though, with a compromise deal, so hopefully these services will soon be available here.
Overselling is not a winning strategy, especially when others are shipping genuinely good products.
Every time Google shows off something new, the first thing people now ask is what part Google faked (or extremely cherry-picked).
And also be prepared to wait somewhere between 6 and infinity months... at this point the Google Cloud account reps can't even grease the wheels for us.
Why bother using a product from a company that is notorious for failing to commit to most of their services, when you can run something which produces output that is pretty close (and maybe better) and is free to run and change and train?
To add insult to injury, they put out nice press releases and demos of their latest AI, but it isn't easily accessible or available until next year. The press and Wall Street gobble it up and the stock rises. Is it just for them?
Stable Diffusion is the Linux-on-the-desktop of diffusion models IMO
(I agree w/ your comment on trusting Google - pretty sure they'll just phase this off eventually anyway, so I wouldn't bother trying it)
> As always, you are in control of your data with ChatGPT.
Which is a flat-out lie. You can allegedly opt-out of them using your data for training, but you are still sending your data to a private corporation for processing/etc. which makes it totally unsuitable for handling sensitive or restricted data.
Because it costs $0.02 per image instead of $1000 on a graphics card and endless buggering around to set up.
Linux entered the market at a time when paid alternatives were fully established and concentrated, servicing users/companies for years who became used to working with them. No paid txt2img offering comes anywhere close to market dominance for image generation. They don't offer anything that isn't available with free alternatives (they actually offer less) and are highly restrictive in comparison. Anyone who is doing anything beyond disguised DALLE/Imagen clients, has absolutely no incentives to use a paid service.
*It also takes like 15 mins to set up (this includes loading the models).
This makes the image much more usable without editing.
OpenAI services are available in Canada but as an individual, $27/mo for ChatGPT Plus and then paying per use for the API is kinda a hard sell for me.
I'm needing a hardware refresh soon, so I think i'm just going to run the open source stuff locally once I get around to figuring out how to set that all up.
> Imagen 2’s dataset and model advances have delivered improvements in many of the areas that text-to-image tools often struggle with, including rendering realistic hands and human faces and keeping images free of distracting visual artifacts.
People just have different definitions of what coasting means. In general, don't think "doing nothing" or "avoiding work"; think "add certainty to process and decision-making like everyone else does", and, much more importantly, "avoid friction, because as soon as there's even a little bit, people leverage it".
More detail on what causes this:
- processes become elongated through what Steve Yegge called cookie-licking, more specifically, anyone above line level doing "I am the 10th person who needs to give a green light for this to happen"
- the elongated process taking so long with that many people that some people lose interest or move on or forget they already approved it
- business disruptions (ex. now Sundar told VP told VP told VP who told director to add GenAI goals)
- bad managers are __really__ bad at BigCo; there's so much insulation from reality due to the money printer, and a cultural bias towards "meh, everything's good!"
- managers trying to get stuff done rely on people who slavishly overwork to do the minimum possible for their _direct manager_ to be happy
- only needing to keep your manager happy, and your manager being focused on deploying limited resources, creates a suspicious untrusting atmosphere. The amount of othering and trash-talking is incredibly disturbing.
- _someone_ has to slavishly overwork on any given project because there's very little planning, due to the "meh, everything's good!" inclination, coupled with software being pretty hard to plan accurately anyway. So what's the point of planning it all?
- newly minted middle managers are used to clinging onto anything their manager cares about and overworking, so they end up being a massive bottleneck for their reports. New middle manager on my team's profile page looked like a military dictator's medals, 6 projects they were "leading", 1 of which they were actually working on and actually got done.
- The "coaster" realizes "if I go outside the remit of what my manager asked for, they A) won't care because they didn't ask for it B) which exposes me to non-zero friction because they'll constantly be wondering why I'm doing it at all C) I'll have to overwork because they won't help plan or distribute work because it was my idea to go beyond the bare minimum D) it's very, very hard to get promoted, especially based on work my manager didn't explicitly ask for E) the cultural bias here is strongly towards everything is okay all the time no matter what, so any visible friction will be attributed to me personally being difficult"
And that's _before_ you account for the genuine sociopathy you see increasingly as you move up the ladder.
Anecdote:
I waited _3 years_ to launch work I had done and 3 VPs asked for. Year 3, it came to a head b/c one of the 3 was like "wtf is going on!?" My team's product manager outright pretended our org's VP didn't want it, had 0 interest in it, after first pretending it didn't _come up at all_ in a meeting arranged to talk about it.
Within a couple weeks this was corrected by yet another VP meeting where they called in the PM's boss' boss' boss and the VP was like "fuck yeah I want this yesterday", but engineering middle manager and PM closed ranks to blame it on me. Engineering went with "Where's the plan / doc!?!?" (I won't even try to explain this, trust me, after 3 yrs they knew and there were docs), and both pretended I was interrupting meetings regularly (I was the only one who ever wrote anything on the agenda, and once we hit year 2.5, I was very careful to only speak when called upon because it was clear it was going to build up to this, as they were assigned the new shiny year-long project to rush a half-assed version of Cupertino's latest, as they were every year).
Instead they use
>The robin flew from his swinging spray of ivy on to the top of the wall and he opened his beak and sang a loud, lovely trill, merely to show off. Nothing in the world is quite as adorably lovely as a robin when he shows off - and they are nearly always doing it.
And they show off the result being a photograph of a robin, cool. SDXL[0] can do the exact same thing given the same prompt; in fact even SD 1.5 could do it easily[1].
My kids found it organically and were happily creating all sorts of DALL·E 3 images.
(DALL-E pretends to do that, but it's actually just using GPT-4 Vision to create a description of the image and then prompting based on that.)
Live editing tools like https://drawfast.tldraw.com/ are increasingly being built on top of Stable Diffusion, and are far and away the most interesting way to interact with image generation models. You can't build that on DALL-E 3.
Still, Stable Diffusion is losing the usability, tooling and integration game. The people who care to make interfaces for it mostly treat it as an expert tool, not something for people who have never heard of image generating AI. Many competing services have better out-of-the-box results (for people who don't know what a negative prompt is), easier hosting, user friendly integrations in tools that matter, better hosted services, etc.
>> generally available for Vertex AI customers on the allowlist (i.e., approved for access).
I guess that turns out to be not as important for end users as you'd think.
Anyway, DeepFloyd IF has great comprehension. It should be straightforward to bring that to Stable Diffusion; I can't tell you exactly why they haven't tried.
If you're just generating something for fun then DALL-E/MJ is probably sufficient, but if you're doing a project that requires specific details/style/consistency you're going to need way more tools. With SD/A1111 you can use a specific model (one that generates images in an anime style, for instance), use a ControlNet model for a specific pose, generate hundreds of potential images (without having to pay for each), use other tools like img2img/inpaint to hone your vision using the images you like, and if you're looking for a specific effect (like a GIF, for instance), you can use the many extensions created by the community to make it happen.
I'm guessing that number included product/program managers, not just "people managers".
Rewriting a Linux kernel module is "important", but rarely impactful.
Then this: https://civitai.com/
And I have completely abandoned DALLE and will likely never use it again.
But it clearly didn't win in many scenarios, especially those that require text to be precise, which happens to matter more in commercial settings; cleaning up the gibberish text generated by open-source Stable Diffusion is tiring in itself.
But I still agree with you - would rather have seen Google not give in to this sort of thing at all.
It was very different for Meta - they already don't like sending people away from their site so it was much easier for them to hold out.
[1] https://cloud.google.com/vertex-ai/docs/generative-ai/image/...
Threads? Its usage is down 90% since its launch six months ago, presumably because they kept the people who could launch stuff and got rid of the people who had some idea of what should be launched.
The "Blue Checkmark" system? Released with no thought at all, an absolute disaster. Stephen King had to publicly announce that, despite indications to the contrary, he was not a paid user, and he felt it was important to tell people because he didn't want the idea that he was a paid subscriber to harm his reputation. Same underlying problem: the people who could ship things were still shipping things, but the people who could figure out what to make were gone.
And yes, they did drastically reduce cost...and much more drastically reduce revenue.
They shipped quite a bit of stuff, like the blue tick or revenue sharing. Other than Musk courting fascists and other kinds of undesirables, Twitter as a product is doing fine. It might still go under, but if that happens it won't be because of a lack of employees.
OpenAI really shows us how it's done, or the way Mistral just dumps a torrent on everyone. That's marketing I can respect.
Stop attacking other people and mind your own business, especially if you're making stuff up.
"A flying squirrel gliding between trees": It won't be able to do it. Just telling it "flying squirrel" will often generate squirrels with bat wings coming off their backs.
Ahh, but that's just a tiny, specific thing missing from the data set! Surely that'll get fixed eventually as they add more training data...
"A fox girl hugging a bunny girl hugging a cat girl": The only way to make this work is with fancy stuff like Segment Anything (SAM) working with Stable Diffusion. Alternative prompts of the same thing:
"A fox girl and a bunny girl and a cat girl all hugging each other"
It's such a simple thing; generative AI can make three people hugging each other no problem. However, trying to get it to generate three different types of people in the same scene is really, really hard and largely dependent on luck.
One low-level issue is how long everything has to take because of tooling. Engineers have way too much patience for overcomplicated garbage and tend to obsess over pointless details. Kind of in the opposite direction of coasting, but still a real problem.
I've met many from the managerial class without these traits who seem to have no problem coasting and transcending actual meticulous work, because their game is all about personal career management, not the hyperfocus a lot of us here engage in daily.
What a shitshow.
It installs dozens upon dozens of models and related scripts painlessly.
Even calendaring was something that took ages for them to get right. For something like a decade you couldn't move an event from one calendar to another on Android - only via the desktop web view.
Google went from being an innovative company to a web version of IBM... a giant lumbering dinosaur that can't get out of its own way, and that everyone kinda needs but also deeply loathes.
Of course it became immediately obvious to me why the model isn't public. It's just not as good as advertised, that's why. Google should stop deceiving the public.
Not if you have no account and are not in the US. Before, when I clicked on a Twitter link it worked 99.9% of the time. Now it's a lottery: sometimes it loads without comments; most of the time it doesn't load at all.
The flying squirrel one was spot on: it showed an image of the trees, and a squirrel with wings, which kind of looked like bat wings.
The 3 girls hugging each other, however, worked fairly well: it always created 3 different types of people, but they never all hugged each other. Either two of the 3 hugged each other, or no one hugged anyone.
For camaraderie along the way:
- any peer to peer counseling / mentorship at your company. having someone senior in an unrelated division caring about it, and who I also could trust to be honest with me about when it was my fault vs. I was being railroaded helped a ton
- Blind (the app). Standard perils of internet anonymity and verbal brutality, but, at least you'll always get excellent advice. if you did your best, people aren't afraid to say it either.
- be aware of your company's policies on medical leave.
- leave sooner rather than later
Results (these are the only two images I generated): https://imgur.com/a/JIiuDt9
Would be a lot easier if AfterDetailer could handle dynamic prompts.
Android's advantage has always been that everyone else gets to play. And it's good that we have that. But they aren't exactly the beacon of innovation they think they are or claim to be in marketing copy.
Also not sure if it can be extended with LoRAs, or turned into a video/3D model the same way an LDM can.
I used to hardly ever see spam, except when looking at replies to famous huge accounts, now I get 2-5 follows/likes/mentions a day from fake accounts mostly of semi-naked girls with a link to a website.
And any reasonably active thread of replies to a tweet now surfaces the idiotic nonsense of blue tick subscribers to the top, rather than ranking by tweet quality/relevance.
I still think layoffs are bad because, honestly, I don't care about corporate profit or efficiency, but in this case it's a bit surprising how nothing concrete has actually changed even with an 80% reduction in staffing. 80% sounds apocalyptic to me, but again, Twitter just works like it did before, with the same annoying, never-fixed bugs (the occasional "something went wrong" on clicking tweets, etc.). Nothing close to the (technical) train wreck I would expect.
[1] https://github.com/GoogleCloudPlatform/generative-ai/blob/ma...
[2] https://console.cloud.google.com/vertex-ai/publishers/google...
Now, from a designer's perspective, honestly, I don't care too much who the provider of the image is, since one will have to work more on it anyway. So designers, illustrators, etc. are not the target for such platforms, even though that seems counter-intuitive. If you ask me which system was the source for an image used for a poster in the last 12 months... well, I may remember, but it is not of paramount importance to the end result. After a year of active usage of DALL-E 2/3, SDXL, and Midjourney (which is also SD of some sort), I can confidently state that there is much more work to do, and a lot of prompting, to actually get something unique and worth using. Sadly the time taken is proportionate to working with an actual artist. Of course, the latter is likely to be hit by this new innovation, but perhaps not so much.
From the perspective of someone integrating text-to-image (which is yet to be seen done in a reasonable manner, like for a quest game with generative images), API flexibility and cost would be the most important qualifiers. Even then it may actually be better to run SD/XL. From a cost perspective, all these services are still very pricey for anything more serious than a few one-shot images.
Yes.
> Much less spam accounts below every single post like how it used to be.
No, they're still there. They're even more there on popular posts.
A strange thing is that they never seem to ban the onlyfans bots, but they do hide them under "more replies" - so if you habitually expand that, you just keep seeing the same ones everywhere.
> but in this case it's a bit surprising how nothing concrete has actually changed even with 80% reduction in staffing
That's not too surprising, because what the other people were doing was changing stuff. So now they're gone, things won't change, ever.
That last picture is still so horribly bad it's no wonder Google made it almost impossible to access this tech.
How did Google drop the ball on AI like this when they pioneered the entire field?
Our team was flabbergasted that this could even be an issue.
I still can't understand how it got released and advertised.
I've used SD / Midjourney / DALL-E extensively and would say this is honestly shockingly bad, apart from the last two images.
Comparable to the first versions of the other services, with better contextual understanding but still lots of gnarly artifacts and weirdness going on.
Longer version. Sorry to torture the threads with these, but I've noticed people don't take 'BigCo is a weird, strange, place' stories seriously unless there's a full anecdote coupled to it:
Google was my first real job, got very very lucky with a transition from dropout waiter => startup founder => sold => 9 months later, did interviews as a joke and...passed?
My first few years, I didn't understand this was happening, and eventually we got transferred to Android, and it was just an absolute directionless wasteland for at least 4 months. I couldn't even begin to understand why my peers A) had no work B) were fine with it C) when we tried talking about this, it was like we were speaking different languages.
I saw it as a 'leadership opportunity' and butted my/our way in to a big project and picked up another. Huge stuff. Visual redo of key property, and on the side, got a fundamental change to the input method for the same property, delivered by me client side and server side, then wheeled and dealed to get it deployed cross-platform.
That whole year peers didn't invest in the visual redo, even though it was ostensibly our team's work. Our newly promoted manager never planned / assigned work to people, and was out for about 50% of that first year.
It turned into Lord of the Flies while they were out. Only 2 peers out of 4 worked on it. #3 helped out on a lower-key project. #4 focused on advocating for a feature that'd watch your screen and, e.g., tell you Infowars was Very Bad if you visited Infowars. At Old Google you could work on obviously bad ideas like this and you just wouldn't advance. It's a good thing that this would only last a month or two these days, if it happened at all.
Peer A was extremely confident but also extremely out of touch. For example, 2 weeks before launch they spent 5 minutes arguing with the partner team, telling them it was impossible that we had written all our code in $BINARY_A instead of $BINARY_B... which we had. When faced with the bare fact, they then went with "oh, no wonder nothing works" (???)
Peer B was relatively new to tech, so the histrionics the other would leap to had a massive influence on them. Always horrified we were doing anything at all without getting 3 separate approvals first, stapled to a direct request laying out exactly what was required, instead of just a Figma / GIF.
Peer B also got _insanely_ over-the-top mean to me after the project. Yet, they were nice and extremely intelligent generally.
That's when it finally clicked for me that something was off and I needed to approach the coasting question more inquisitively:
_what_ were they seeing differently?
They understood they were avoiding pain that they'd get ~0 credit for working through.
They were right.
I got excellent reviews from the partner team and product manager, I got awful reviews from Peer A and a meh one from peer B, and got a middling performance review after moving 2 mountains essentially solo.
Though I did get a $10K bonus; this was the standard payout for staying silent / not complaining after dealing with an obviously toxic situation.
I had to appeal to VPs for recommendations the next year to break through the "gee you moved two mountains and had great feedback from everyone _not_ on the team, but peer A and peer B didn't like you much"
I'm one of the founders.