zlacker

Everything around it seems so shady.

The strangest thing to me is that the shadiness seems completely unnecessary, and really requires a very critical eye for anything associated with OpenAI. Google seems like the good guy in AI lol.0

replies(3): >>ethbr1+J2 >>yaomin+48 >>Eisens+Tw

>>Spooky+(OP)
Google, the one who haphazardly allows diversity prompt rewriting to be layered on top of their models, with seemingly no internal adversarial testing or public documentation?

replies(1): >>ben_w+N4

>>ethbr1+J2
"We had a bug" is shooting fish in a barrel, when it comes to software.

I was genuinely concerned about their behaviour towards Timnit Gebru, though.

replies(2): >>ethbr1+Gm >>concor+zz

>>Spooky+(OP)
It's a shame that Gemini is so far behind ChatGPT. Gemini Advanced failed softball questions when I've tried it, but GPT works almost every time even when I push the limits.

Google wants to replace the default voice assistant with Gemini, I hope they can make up the gap and also add natural voice responses too.

replies(1): >>nebula+Rj

>>yaomin+48
You tried Gemini 1.5 or just 1.0? I got an invite to try 1.5 Pro which they said is supposed to be equivalent to 1.0 Ultra I think?

1.0 Ultra completely sucked but when I tried 1.5 it is actually quite close to GPT4.

It can handle most things as well as ChatGPT 4 and in some cases actually does not get stuck like GPT does.

I'd love to hear other peoples thoughts on Gemini 1.0 vs 1.5? Are you guys seeing the same thing?

I have developed a personal benchmark of 10 questions that resemble common tasks I'd like an AI to do (write some code, translate a PNG with text into usable content and then do operations on it, Work with a simple excel sheet and a few other tasks that are somewhat similar).

I recommend everyone else who is serious about evaluating these LLMs think of a series of things they feel an "AI" should be able to do and then prepare a series of questions. That way you have a common reference so you can quickly see any advancement (or lack of advancement)

GPT-4 kinda handles 7 of the 10. I say kinda because it also gets hung up on the 7th task(reading a game price chart PNG with an odd number of columns and boxes) depending on how you ask: They have improved over the last year slowly and steadily to reach this point.

Bard Failed all the tasks.

Gemini 1.0 failed all but 1.

Gemini 1.5 passed 6/10.

replies(2): >>a_wild+tn >>sema4h+pG

>>ben_w+N4
If you build a black box, and a bug that seems like it should have been caught in testing comes through, and there's limited documentation that the black box was programmed to do that, it makes me nervous.

Granted, stupid fun-sy public-facing image generation project.

But I'm more worried about the lack of transparency around the black box, and the internal adversarial testing that's being applied to it.

Google has an absolute right to build a model however they want -- but they should be able to proactively document how it functions, what it should and should not be used for, and any guardrails they put around it.

Is there anywhere that says "Given a prompt, Bard will attempt to deliver a racially and sexually diverse result set, and that will take precedence over historical facts"?

By all means, I support them building that model! But that's a pretty big 'if' that should be clearly documented.

replies(2): >>prepen+3s >>ben_w+tR

>>nebula+Rj
Gemini 1.0 Pro < Gemini 1.5 Pro < Gemini 1.0 Ultra < GPT-4V

GPT-4V is still the king. But Google's latest widely available offering (1.5 Pro) is close, if benchmarks indicate capability (questionable). Gemini's writing is evidently better, and vastly more so its context window.

replies(1): >>nebula+lp

>>a_wild+tn
Its nice to have some more potentially viable competition. Gemini has better OCR capabilities but its computation abilities seem to fall short....so I have it do the work with the OCR and then move the remainder of the work to GPT4 :)

>>ethbr1+Gm
> Google has an absolute right to build a model however they want

I don’t think anyone is arguing google doesn’t have the right. The argument is that google is incompetent and stupid for creating and releasing such a poor model.

replies(1): >>ethbr1+6T

>>Spooky+(OP)
Actually, the good guy in AI right now is Zuckerberg.

replies(1): >>api+lS2

>>ben_w+N4
It's specifically been trained to be, well, the best term is "woke" (despite the word's vagueness, LLMs mean you can actually have alignment towards very fuzzy ideas). They have started fixing things (e.g. it no longer changes between "would be an immense tragedy" and "that's a complex issue" depending on what ethnicity you talk about when asking whether it would be sad if that ethnicity went extinct), but I suspect they'll still end up a lot more biased than ChatGPT.

replies(1): >>ben_w+8o2

>>nebula+Rj
>a personal benchmark of 10 questions that resemble common tasks

That is an idea worth expanding on. Someone should develop a "standard" public list of 100 (or more) questions/tasks against which any AI version can be tested to see what the program's current "score" is (although some scoring might have to assign a subjective evaluation when pass/fail isn't clear).

replies(1): >>jprete+tu1

>>ethbr1+Gm
In general I agree with you, though I would add that Google doesn't have any kind of good reputation for documenting how their consumer facing tools work, and have been getting flak for years about perceived biases in their search results and spam filters.

>>prepen+3s
I try and call out my intent explicitly, because I hate when hot-button issues get talked past.

IMHO, there are distinct technical/documentation (does it?) and ethical (should it?) issues here.

Better to keep them separate when discussing.

>>sema4h+pG
Thats what a benchmark is, and they're all gamed by everyone training models, even if they don't intend to, because the benchmarks are in the training data.

The advantage of a personal set of questions is that you might be able to keep it out of the training set, if you don't publish it anywhere, and if you make sure cloud-accessed model providers aren't logging the conversations.

>>concor+zz
I think you win a prize for the first time someone has used "woke" when describing an issue to me, such that the vagueness of the term is not only acknowledged but also not a problem in its own right. Well done :)

>>Eisens+Tw
Also Mistral and a few others.