Plugins were a failure. GPTs are a little better, but I still don't see the product market fit. GPT-4 is still king, but not by that much any more. It's not even clear that they're doing great research, because they don't publish.
GPT-5 has to be incredibly good at this point, and I'm not sure that it will be.
Good.
I have no idea what's really going on inside that company, but the way the staff were acting on Twitter when Altman got the push was genuinely scary: major red flags, bad vibes, you name it, it reeked of it.
Idk, I just tried Gemini Ultra and it's so much worse than GPT4 that I am actually quite shocked. Asking it any kind of coding question turns into a frustrating and honestly bizarre waste of time: it hallucinates a whole new language syntax every time, then asks if you want to continue with non-working, in fact non-existent, option A or the equally non-existent option B, until you realise you've spent an hour just trying to make it output something that is even in the requested language and, finally, that it is completely useless.
I'm actually pretty astonished at how far Google is behind and that they released such a bunch of worthless junk at all. And have the chutzpah to ask people to pay for it!
Of course I'm looking forward to gpt-5 but even if it's only a minor step up, they're still way ahead.
I was surprised and touched by their loyalty, but maybe I missed something you noticed.
edit: as pointed out, this was indeed a pretty esoteric example. But the rest of my attempts were hardly better, if they had a response at all.
The language in question was only open sourced after GPT4's training cutoff, so I couldn't compare. That's actually why I tried it in the first place. And yes, I do expect it to be better - GPT4 isn't perfect but I don't recall it ever hallucinating quite that hard. In fact, its answer was basically that it didn't know.
And when I asked it questions with other, much less esoteric code like "how would you refactor this to be more idiomatic?" I'd get either "I couldn't complete your request. Rephrase your prompt and try again." or "Sorry, I can't help with that because there's too much data. Try again with less data." GPT-4 was helpful in both cases.
This isn’t a race to write the most lines of code or the most lines of text. It’s a race to write the most correct lines of code.
I’ll wait half an hour for a response if I know I’m getting at least staff engineer level tier of code for every question
Sufficiently accurate responses can be fed into other systems downstream and cleaned up. Even code responses can benefit from this by restricting output tokens using the grammar of the target language, or iterating until the code compiles successfully.
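To make the "iterate until it compiles" idea concrete, here's a minimal sketch in Python. The generate() function is just a placeholder standing in for whichever LLM API you use; the loop simply feeds the compiler error back into the next attempt:

    import subprocess
    import tempfile

    def generate(prompt: str) -> str:
        """Placeholder for whichever LLM call you use (OpenAI API, local model, etc.)."""
        raise NotImplementedError

    def generate_until_it_compiles(prompt: str, max_attempts: int = 3):
        """Ask the model for Python code, retrying with the compiler error until it compiles."""
        feedback = ""
        for _ in range(max_attempts):
            code = generate(prompt + feedback)
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
                path = f.name
            # Use the interpreter itself as the validator.
            result = subprocess.run(
                ["python", "-m", "py_compile", path],
                capture_output=True, text=True,
            )
            if result.returncode == 0:
                return code
            # Feed the compiler error back into the next attempt.
            feedback = ("\n\nThe previous attempt failed to compile:\n"
                        + result.stderr + "\nPlease fix it.")
        return None

Grammar-constrained decoding does the same kind of cleanup one level lower, by only allowing tokens the target language's grammar permits.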
And for a decent number of LLM-enabled use cases the functionality unlocked by these models is novel. When you're going from 0 to 1 people will just be amazed that the product exists.
As the popularity has exploded, and ethical questions have become increasingly relevant, it is probably worth taking some time to nail certain aspects down before releasing everything to the public for the sake of being first.
But so far nobody is even in the same ballpark. And not just freely distributed models, but proprietary ones backed by big money, as well.
It really makes one wonder what kind of secret sauce OpenAI has. Surely it can't just be all that compute that Microsoft bought them, since Google could easily match that, and yet...
It is practically unusable and I'll likely cancel my paid plan soon.
It's magic, until it isn't.
If ChatGPT doesn't have product-market fit, what actually has?
Here are some hilarious highlights: https://twitter.com/Suhail/status/1757573182138290284
I don't think it's hugely surprising given the massive hype. No doubt OpenAI are doing impressive things, but it's normal for the market to overvalue it initially as everyone tries to get on board, and then for it to fall back to a more sensible level.
Personally, the chat UI is the main limiting factor in my own adoption, because a) it’s not in the tool I’m trying to use, and b) it’s quicker for me to do the work than describe the work I need doing.
They were loyal to money, nothing to be touched by.
I'm not particularly interested in having it outright program for me (other than, say, to sketch how to do something as inspiration, which I'll rewrite rather than copy), because typically I'd want to do it a certain way, and it would take far longer to NLP an LLM into writing it in whatever syntax than to WhateverSyntaxProgram it myself.
I could understand the sentiment when you consider that OpenAI has really been doubling down just on LLMs recently, and forgoing a ton of research on other fronts.
They’re rapidly iterating though, and it’s refreshing to see them try a bunch of new things so quickly while every other company is comparatively slow to release anything.
Also, all the evidence is in this thread. Clearly people are unhappy about wasting time on LLMs, when the time that was wasted was the result of obviously bad output.
For lots of applications the speed/quality/price trade offs make a lot of sense.
For example if you are doing vanilla question answering over lots of documents then 3.5 or Mixtral are better than GPT4 because the speed is important.
I love using the smaller models; Starling LM 7B and Mistral 7B have been enough for many tasks like you mentioned.
Yeah, putting people out of work on an industrial scale is probably gonna have a pretty big effect on global GDP.
Edit: I forgot, NASA trained astronaut!
Add to that a company environment that seems to be built on money-crazed, stock-option-piling engineers and a CEO who seems to have gotten power-crazed... I mean, they grew far too fast, I guess...
Transformers were invented with the support of Google (by the researchers, not by Google).
The open community has been creating better and better models through a group effort; much like ML itself, it's way easier to try 100,000 ideas at a small scale than it is to try a couple of ideas at a large scale.
Initially it felt like the singularity was at hand. You played with it, got to know it, the computer was talking to you, it was your friend, it was exciting; then you got bored with your new friend and it wasn't as great as you remembered it.
Dating is often like this. You meet someone, have some amazing intimacy, then you really get to know them, work out it wasn't for you, and it's time to move on.
If it's a conversation with "format this loose data into XML" repeated several times and then a "now format it to JSON", I find it often has trouble determining that what you just asked for is the most important thing; I think the attention model gets confused by all the preceding text.
I've been anti-Google for a while now so I'm not biased.
I don't think OpenAI have this sewn up.
For some advanced reasoning you're 100% right, but many times you're doing document conversion, summarizing, or RAG, and in all these cases GPT 3.5 performs as well as, if not better than, GPT 4 (we can't ignore cost and speed) and it's very hard to distinguish between the two.
Multiple languages would be so useful for me.
I see how most people would prefer a better but slower model when price is equal, but I'm sure many prefer a worse $2/mo model over a better $20/mo model.
Though the TTS side has some trouble switching languages if only single words are embedded. A single German word inside an English sentence can really get butchered. More training needed on multilingual texts (and perhaps preserving italics). But anyways this is really only an issue for early language learning applications in my experience.
The Altman saga, allowing military use, and other small things step by step tarnish your reputation and push you toward mediocrity or worse.
Microsoft has many great development stories (read Raymond Chen's blog to be awed), but what they did at the end to other competitors and how they behave removed their luster, permanently for some people.
But used as autocomplete, it's definitely a time saver. Most of us read faster than we type.
People say that, but I don't get this line of reasoning. There was something new, I learned to work with it. At one point I knew what question to ask to get the answer I want and have been using that form ever since.
Nowadays I don't get the answer I want for the same input. How is that not a result of declining quality?
That on top of my own experiences, and heaps of anecdotes over the last year.
> How would they honestly be getting worse?
The models behind GPT-4 (which is rumored to be a mixture model)? Tuning, RLHF (which has long been demonstrated to dumb the model down). GPT-4 as in the thing that produces the responses you get through the API? Caching, load-balancing, whatever other tricks they do to keep costs down and availability up, to cope with the growing number of requests.
--
[0] - >>39361705
That would actually increase their standing in my eyes.
Not too far from where I live, Russian bombing is destroying homes of people whose language is similar to mine and whose "fault" is that they don't want to submit to rule from Moscow, direct or indirect.
If OpenAI can somehow help stop that, I am all for it.
I got some bad news for you then.
And, according to the UN, Turkey has used AI-powered, autonomous loitering drones to hit military convoys in Libya [1].
Regardless of us vs. them, AI shouldn't be a part of warfare, IMHO.
[0]: https://www.theguardian.com/world/2023/dec/01/the-gospel-how...
[1]: https://www.voanews.com/a/africa_possible-first-use-ai-armed...
In my experience, pausing for even a moment while talking already makes it submit what you've said. This makes a real conversation with pauses for thought difficult, because of the need to hurry before it cuts you off.
> Personally, the chat UI is the main limiting factor in my own adoption, because a) it’s not in the tool I’m trying to use, [...]
though I haven't tried it, through some combination of the effort to set it up & it not particularly appealing to me anyway. The best it could possibly be would be like pair programming (back seat) with someone who does things the same way as you, and reviewing their code. I read faster than I type, but probably don't review non-trivial code faster than I type it. (That's not a brag, I just mean I think it's harder and takes longer to reason about something you haven't written, to understand it, and be confident you're not missing anything or haven't failed to consider xyz.)
I am not saying this is anything, but it's definitely tingling my "something's up" senses.
Nor should nuclear weapons, guns, knives, or cudgels.
But we don’t have a way to stop them being used.
- NPR: https://www.npr.org/2021/06/01/1002196245/a-u-n-report-suggests-libya-saw-the-first-battlefield-killing-by-an-autonomous-d
- Lieber Institute: https://lieber.westpoint.edu/kargu-2-autonomous-attack-drone-legal-ethical/
- ICRC: https://casebook.icrc.org/case-study/libya-use-lethal-autonomous-weapon-systems
- UN report itself (Search for Kargu): https://undocs.org/Home/Mobile?FinalSymbol=S%2F2021%2F229&Language=E&DeviceType=Desktop&LangRequested=False
- Kargu itself: https://www.stm.com.tr/en/kargu-autonomous-tactical-multi-rotor-attack-uav
From my experience, the Turkish military doesn't like to talk about all the things they have.

- price per thing you use it with matters (a lot)
- making sure that under no circumstances is the involved information leaked (including being trained on) matters a lot in many use cases; while OpenAI does by now have some support for this, the degree to which you can enforce it is not enough for some use cases. In some cases this is a hard constraint due to legal regulations.
- geopolitics matters, sometimes. Being dependent on a US service is sometimes a no-go (using self-hosted US software is most times fine, tho). Even if you only operate in the EU.
- it's much easier to domain-adapt if the model's source/weights are accessible to a reasonable degree; while GPT-4 has a fine-tuning API, it's much, much less powerful, a direct consequence of the highly proprietary nature of GPT-4
- a lot of companies are not happy at all if they become highly reliant on a single service which can change at any time in how it acts, in its pricing model, or in whether it is available in your country at all. So basing your product on a less powerful but in turn replaceable or open source AI can be a good idea, especially if you are based in a country not on the best terms with the US.
- do you trust Sam Altman at all? I do not, and it seems short-sighted to do so. In which case some of the points above become more relevant
- 3.5-level models, especially in combination with domain adaptation, can be "good enough" for some use cases
> Nowadays I don't get the answer I want for the same input. How is that not a result of declining quality?
Is it really the same input? An argument could easily be made that as you’ve gotten accustomed to ChatGPT, you ask harder questions, use less descriptive language, etc.
Just imagine what valuation OpenAI would have as a grid monopolist combined with nVidia, ARM, Intel and AMD! Hundreds of trillions of dollars!
If that's the foundation your luster is built on - then it's not really ridiculous.
OpenAI popularized LLMs to the world with GPT-3, not too long before GPT-4 came out. They made a lot of big, cool changes shortly after GPT-4 - and everyone and their mother announced LLM projects and integrations in that time.
It's been about 9 months now, and not a whole lot has happened in the space.
It's almost as if the law of diminishing returns has kicked in.
My guess is it isn't; these systems are hard to trust, and the rhetoric "we're aiming for AGI" suggests to me that they know this and AGI might be the only surefire way out.
If you tried to replace all of a dev's duties with current LLMs it would be a disaster; making sense of all that info requires focus and background thinking processes simultaneously, which I don't believe we have yet.
I don't think so. In order to be virtuous, one should have some skin in the game. I would respect dedicated pacifists in Kyiv a lot more. I wouldn't agree with them, but at least they would be ready to face pretty stark consequences of their philosophical belief.
Living in Silicon Valley and proclaiming yourself a virtuous pacifist comes at negligible personal cost.
I will check out the links. Thanks a lot.
My experience is limited. I got it to berate me with a jailbreak. I asked it to do so, so the onus is on me to be able to handle the response.
I'm trying to think of unethical things it can do that are not in the realm of "you asked it for that information, just as you would have searched on Google", but I can only think of things like "how to make a bomb", suicide related instructions, etc which I would place in the "sharp knife" category. One has to be able to handle it before using it.
It's been increasingly giving the canned "As an AI language model ..." response for stuff that's not even unethical, just dicey, for example.
Note well: I haven't actually used it myself, so I'm speculating (guessing) rather than saying that this is how it is.
Talking to corporate HR is subjectively worse for most people, and objectively worse in many cases.
Google DeepMind is still an AI research powerhouse that is producing a ton of innovation both internal and publicly published.
The second that this tech was developed it became literally impossible to stop this from happening. It was a totally foreseeable consequence, but the researchers involved didn't care because they wanted to be successful and figured they could just try to blame others for the consequences of their actions.
The same nonsense happened with Apple, where like a month after they first released Apple Watch people were yelling "What's next???!!!! Apple is dying without Steve Jobs!"
Such an absurdly reductive take. Or how about just like nuclear energy and knives, they are incredibly useful, society advancing tools that can also be used to cause harm. It's not as if AI can only be used for warfare. And like pretty much every technology, it ends up being used 99.9% for good, and 0.1% for evil.
GPTs are also pretty good, and being able to invoke them in regular chat is also handy, but the lack of monetization and the ability to easily surface them outside of chatgpt is also kind of a problem. These problems are more fixable than the plugin issue IMO since I think the architecture of plugins is a limiting factor.
Miqu is pretty good. Sure, it's a leak...but there's nothing special there. It's just a 70b llama2 finetune.
I don't have logs detailed enough to be able to look it up, so I can't prove it. But for me, learning to work with AI tools like ChatGPT consists specifically of developing an intuition of what kind of answer to expect.
Maybe my intuition skewed a little over the months. It did not do that for open source models though. As a software developer understanding and knowing what to expect from a complex system is basically my profession. Not just the systems I build, maintain and integrate, but also the systems I use to get information, like search engines. Prompt engineering is just a new iteration of google-fu.
Since this intuition has not failed me in all those other areas and since OpenAI has an incentive to change the workings under the hood (cutting costs, adding barriers to keep it politically correct) and it is a closed source system that no-one from the outside can inspect, my bet is that it is them and not me.
If we cared about preventing LLMs from being used for violence, we would have poured more than a tiny fraction of our resources into safety/alignment research. We did not. Ergo, we don't care, we just want people to think we care.
I don't have any real issue with using LLMs for military purposes. It was always going to happen.
Overall a chatbot like GPT-4 may be useful, but not that useful as it stands.
If you can write well, it's not really going to improve your writing. Granted, you can automate a few tasks, but it does not give you 10X or even 2X improvement as sometimes advertised.
It might be useful here and there for coding, but it's not reliable.
Any unionising effort consists of employees convincing other employees to join them. Some people will care more about the union's goals than others, and you can be certain that those who care more will pester those that care less to join their cause.
What happened at OpenAI was not a union effort, but I believe the comparison is excellent to understand normal dynamics of employee-based efforts.
To me it feels like it detects whether the answer could be produced more cheaply by the code interpreter model or 4 Turbo and then offloads it to that, and they just kinda suck compared to OG 4.
I’ve watched it fumble and fail to solve a problem with CI; it took 3 attempts over 5 minutes of real time and just gave up in the end, on a problem that OG 4 can do one-shot with no preamble.
We may lack the motivation and agreement to ban particular methods of warfare, but the means to enforce that ban exists, and drastically reduces their use.
Watching tools decline is frustrating.
Unfortunately, there are no deep piles of gold without deep piles of corpses. It is inevitable, though. Prompted by the US military, other countries have also always pioneered or acquired advanced tech, and I don't see why AI would be any different: "never send a human to do a machine's job" is as ominous now as it is dystopian as machines increasingly become more human-like.
Do we, though? Sometimes, against smaller misbehaving players. Note that it doesn't necessarily stop them (Iran, North Korea), even though it makes their international position somewhat complicated.
Against the big players (the US, Russia, China), "threat of warfare and prosecution" does not really work to enforce anything. Russia rains death on Ukrainian cities every night, or attempts to do so while being stopped by AA. Meanwhile, Russian oil and gas are still being traded, including in EU.
In LLMs it’s even worse. To make it concrete, for how I use LLMs I will not only not pay for anything with less capability than GPT4, I won’t even use it for free. It could be that other LLMs could perform well on narrow problems after fine tuning, but even then I’d prefer the model with the highest metrics, not the lowest inference cost.
The todo comments can be prompted against; just tell it to always include complete runnable code, as its output will be executed in a sandbox without prior verification.
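As a rough illustration (the exact wording here is my own, not a tested recipe), that instruction can simply live in the system prompt:

    # Illustrative system prompt; the phrasing is an assumption, not a verified recipe.
    SYSTEM_PROMPT = (
        "Always return complete, runnable code with no placeholder or TODO comments. "
        "Your output will be executed in a sandbox without prior human verification, "
        "so any omitted section will simply fail."
    )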
Keep in mind GPT-3.5 was not an overnight craze. It took months before normal people even knew what it was.
Deepfakes are going to become a concern of everyday life whether you stop OpenAI from generating them or not. The cat is out of the proverbial bag. We as a society need to adjust to treating this sort of content skeptically, and I see no more appropriate way than letting a bunch of fake celebrity porn circulate.
What scares me about deepfakes is not the porn, it's the scams. The scams can actually destroy lives. We need to start ratcheting up social skepticism asap.
To the general public sure but not research which is what produces the models.
The idea that diminishing returns have hit because there hasn't been a new SOTA model in 9 months is ridiculous. Models take months just to train. OpenAI sat on 4 for over half a year after training was done, just red-teaming it.
In reality you have to know the strengths and weaknesses of any tool, and small/fast LLM can do a tremendous amount within a fixed scope. The people at Mistral get this.
So the assertion that small models aren’t as good just isn’t correct. They are amazing at certain things, and are incredibly faster and cheaper than larger models.
LLM are not AGI, they are tools that have specific uses we are still discovering.
If you aren’t trying to optimize your accuracy to start with and just saying “I’ll run the most expensive thing and assume it is better” with zero evaluation you’re wasting money, time, and hurting the environment.
Also, I don’t even like running Mistral if I can avoid it - a lot of tasks can be done with a fine tune of BERT or DistilBERT. It takes more work but my custom BERT models way outperform GPT-4 on bounded tasks because I have highly curated training data.
Within specialized domains you just aren’t going to see GPT-4/5/6 performing on par with expert curated data.
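Not the parent's actual pipeline, but the standard Hugging Face recipe for that kind of bounded classification task looks roughly like this (file names, column names, and label count are placeholders):

    # Rough sketch: fine-tuning DistilBERT on a small curated dataset for a bounded task.
    # Assumes CSV files with "text" and "label" columns; everything here is illustrative.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=3)  # num_labels depends on your task

    dataset = load_dataset("csv", data_files={"train": "curated_train.csv",
                                              "test": "curated_test.csv"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    dataset = dataset.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="distilbert-domain",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
    )
    trainer.train()

The point isn't the exact hyperparameters; it's that a few thousand well-curated examples and a small encoder can beat a giant general model on a narrow, well-defined task.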
It's not like the technology is going to disappear.
Convinced they do it on purpose.
Ok, I’m going to call b/s here unless your expectations of Google have not gone way down over the years. Google was night and day different results twenty years ago vs ten years ago vs today. If 2004 Google search was a “10 out of 10”, then 2014 it was an “8 out of 10”, and today barely breaks a “5” in quality of results in comparison and don’t even bother with the advanced query syntax you could’ve used in the 00’s, they flat ignore it now.
(Also, side note, reread what you said in this post again. Just a friendly note that the overall tone comes across a certain way you might not have intended)
People don't participate in murder and they think others shouldn't either.
People don't participate in wars (which are essentially large scale murder) and they think others shouldn't.
Murder happens anyway. War happens anyway.
Yet if someone says 'war bad' people jump and say 'virtue signaling', but no one does that when people say 'murder bad'.
There's some really weird moral entanglement happening in the minds of people that are so eager to call out virtue signaling.
The specific policies of OpenAI or Google or whatnot are irrelevant. The technology is out of the bag.
You can easily talk while you’re doing something else.
It may be that you're expecting it to do too much at once. Try giving smaller requests.