zlacker

[parent] [thread] 192 comments
1. skepti+(OP)[view] [source] 2024-02-14 02:35:28
Frankly, OpenAI seems to be losing its luster, and fast.

Plugins were a failure. GPTs are a little better, but I still don't see the product-market fit. GPT-4 is still king, but not by that much any more. It's not even clear that they're doing great research, because they don't publish.

GPT-5 has to be incredibly good at this point, and I'm not sure that it will be.

replies(25): >>rey092+de >>154573+Ue >>roody1+Sf >>sho+Wi >>spacem+Lm >>al_bor+Po >>jester+pp >>mratsi+ev >>daniel+yv >>remus+Bw >>_giorg+Fx >>danpal+8y >>jack_r+tA >>vineya+XB >>fennec+1H >>natebe+fI >>sesm+uY >>penjel+OZ >>hn_thr+l81 >>Curiou+Nb1 >>weebul+yg1 >>greeni+ts1 >>Chicag+0v1 >>drumtt+wB4 >>CarlsJ+jr5
2. rey092+de[view] [source] 2024-02-14 04:43:26
>>skepti+(OP)
until you get new architectures it's all gonna be big datasets and 7 trillions
3. 154573+Ue[view] [source] 2024-02-14 04:50:11
>>skepti+(OP)
> Frankly, OpenAI seems to be losing its luster, and fast.

Good.

I have no idea what's really going on inside that company but the way the staff were acting on twitter when Altman got the push was genuinely scary, major red flags, bad vibes, you name it, it reeked of it.

replies(4): >>rendal+ij >>osigur+ol >>Ringz+oI >>msp26+ML
4. roody1+Sf[view] [source] 2024-02-14 04:59:52
>>skepti+(OP)
Running Ollama with an 80GB Mistral model works as well as, if not better than, ChatGPT 3.5. This is a good thing for the world IMO as the magic is no longer held by just OpenAI. The speed at which competitors have caught up in even the last 3 months is astounding.
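For anyone wanting to try this: a minimal sketch of calling a local Ollama server from Python, assuming the default port and a model tag you've already pulled (e.g. `mistral`; the prompt is just an example):

    import requests  # talks to the local Ollama REST API

    resp = requests.post(
        "http://localhost:11434/api/generate",  # Ollama's default endpoint
        json={"model": "mistral", "prompt": "Explain RAG in one paragraph.", "stream": False},
    )
    print(resp.json()["response"])  # the full completion, since streaming is off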
replies(2): >>huyter+5h >>oschvr+Mh
◧◩
5. huyter+5h[view] [source] [discussion] 2024-02-14 05:16:30
>>roody1+Sf
But no one cares about 3.5. It’s an order of magnitude worse than 4. An order of magnitude is a lot harder to catch up with.
replies(6): >>roody1+ih >>sjwhev+il >>danpal+0y >>nl+WC >>epolan+7J >>dathin+MU
◧◩◪
6. roody1+ih[view] [source] [discussion] 2024-02-14 05:19:13
>>huyter+5h
Yeah but for how long… at this rate I would expect some of the freely distributed models to hit gpt4 levels in as little as 3-6 months.
replies(2): >>int_19+0p >>huyter+pM2
◧◩
7. oschvr+Mh[view] [source] [discussion] 2024-02-14 05:24:17
>>roody1+Sf
Could you elaborate on how to do this?
8. sho+Wi[view] [source] 2024-02-14 05:39:03
>>skepti+(OP)
> GPT-4 is still king, but not by that much any more

Idk, I just tried Gemini Ultra and it's so much worse than GPT4 that I am actually quite shocked. Trying to ask it any kind of coding question turns into a frustrating, honestly bizarre waste of time: it hallucinates a whole new language syntax every time, then asks if you want to continue with non-working, in fact non-existent, option A or the equally non-existent option B. Eventually you realise you've spent an hour just trying to make it output something in the requested language, and finally that it is completely useless.

I'm actually pretty astonished at how far Google is behind and that they released such a bunch of worthless junk at all. And have the chutzpah to ask people to pay for it!

Of course I'm looking forward to gpt-5 but even if it's only a minor step up, they're still way ahead.

replies(5): >>pb7+mj >>mad_to+Uw >>dieort+LC >>TeMPOr+3F >>Keyfra+mX
◧◩
9. rendal+ij[view] [source] [discussion] 2024-02-14 05:41:32
>>154573+Ue
What do you mean? How were they acting?

I was surprised and touched by their loyalty, but maybe I missed something you noticed.

replies(3): >>doktri+ht >>Jensso+9B >>Sharli+961
◧◩
10. pb7+mj[view] [source] [discussion] 2024-02-14 05:41:52
>>sho+Wi
Do you have example links?
replies(1): >>sho+Gj
◧◩◪
11. sho+Gj[view] [source] [discussion] 2024-02-14 05:47:20
>>pb7+mj
here was one of them https://gemini.google.com/share/fde31202b221?hl=en

edit: as pointed out, this was indeed a pretty esoteric example. But the rest of my attempts were hardly better, if they had a response at all.

replies(1): >>peddli+yk
◧◩◪◨
12. peddli+yk[view] [source] [discussion] 2024-02-14 05:58:56
>>sho+Gj
That’s an awfully specific and esoteric question. Would you expect gpt4 to be significantly better at that level of depth? That’s not been my experience.
replies(1): >>sho+pl
◧◩◪
13. sjwhev+il[view] [source] [discussion] 2024-02-14 06:10:42
>>huyter+5h
What Mistral has though is speed, and with speed comes scale.
replies(2): >>spacem+Tm >>huyter+EF1
◧◩
14. osigur+ol[view] [source] [discussion] 2024-02-14 06:12:45
>>154573+Ue
It lost a little of its cool factor. However, they provide a nearly essential service at this point. While it is easy to underestimate, I suspect this is already having a measurable impact on global GDP.
replies(1): >>154573+kG
◧◩◪◨⬒
15. sho+pl[view] [source] [discussion] 2024-02-14 06:12:47
>>peddli+yk
OK, I have to admit that one was a little odd; I was beginning to give up and trying new angles. I can't really share my other sessions. But I was trying to get a handle on the language and thought it would be an easily-understood situation (multiple-token auth). I would have at least expected the response to be slightly valid.

The language in question was only open sourced after GPT4's training date, so I couldn't compare. That's actually why I tried it in the first place. And yes, I do expect it to be better - GPT4 isn't perfect but I don't recall it ever hallucinating quite that hard. In fact, its answer was basically that it didn't know.

And when I asked it questions with other, much less esoteric code like "how would you refactor this to be more idiomatic?" I'd get either "I couldn't complete your request. Rephrase your prompt and try again." or "Sorry, I can't help with that because there's too much data. Try again with less data." GPT-4 was helpful in both cases.

replies(1): >>peddli+lr
16. spacem+Lm[view] [source] 2024-02-14 06:29:14
>>skepti+(OP)
Custom GPTs like Grimoire or Cursor loaded on your repo are miles ahead of the competition for coding tasks at least.
replies(1): >>clbrmb+qK
◧◩◪◨
17. spacem+Tm[view] [source] [discussion] 2024-02-14 06:31:21
>>sjwhev+il
Who cares about speed if you’re wrong?

This isn’t a race to write the most lines of code or the most lines of text. It’s a race to write the most correct lines of code.

I'll wait half an hour for a response if I know I'm getting at least staff-engineer-tier code for every question.

replies(4): >>popinm+Ao >>sjwhev+fv >>ein0p+lx >>dathin+cX
◧◩◪◨⬒
18. popinm+Ao[view] [source] [discussion] 2024-02-14 06:52:08
>>spacem+Tm
For the tasks my group is considering, even a 7B model is adequate.

Sufficiently accurate responses can be fed into other systems downstream and cleaned up. Even code responses can benefit from this by restricting output tokens using the grammar of the target language, or iterating until the code compiles successfully (see the sketch below).

And for a decent number of LLM-enabled use cases the functionality unlocked by these models is novel. When you're going from 0 to 1 people will just be amazed that the product exists.
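A minimal sketch of the "iterate until the code compiles" loop, in Python; `ask_model` is a placeholder for whatever LLM call you use, and the retry wording is illustrative:

    def compiles(source: str) -> bool:
        # Cheap syntax gate: try to byte-compile the generated Python
        try:
            compile(source, "<generated>", "exec")
            return True
        except SyntaxError:
            return False

    def generate_until_valid(ask_model, prompt: str, max_tries: int = 3) -> str:
        for _ in range(max_tries):
            code = ask_model(prompt)
            if compiles(code):
                return code
            # Feed the failure back so the next attempt can correct it
            prompt += "\nThe previous attempt had a syntax error; return corrected code only."
        raise RuntimeError("no syntactically valid output after retries")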

19. al_bor+Po[view] [source] 2024-02-14 06:56:06
>>skepti+(OP)
I know things keep moving faster and faster, especially in this space, but GPT-4 is less than a year old. Claiming they're losing their luster because they aren't shaking the earth with new models every quarter seems a little ridiculous.

As the popularity has exploded, and ethical questions have become increasingly relevant, it is probably worth taking some time to nail certain aspects down before releasing everything to the public for the sake of being first.

replies(7): >>phreez+Hw >>l33tma+HG >>bayind+ZK >>onlyre+lY >>optymi+d41 >>Animal+k51 >>NBJack+Wx1
◧◩◪◨
20. int_19+0p[view] [source] [discussion] 2024-02-14 06:58:21
>>roody1+ih
I heard claims like that 6 months ago.

But so far nobody is even in the same ballpark. And not just freely distributed models, but proprietary ones backed by big money, as well.

It really makes one wonder what kind of secret sauce OpenAI has. Surely it can't just be all that compute that Microsoft bought them, since Google could easily match that, and yet...

replies(1): >>qetern+0c1
21. jester+pp[view] [source] 2024-02-14 07:03:14
>>skepti+(OP)
Perhaps it's just me, but responses are way worse than they were a few months ago. Now the system just makes shit up and says "Yes you are right" when you catch it on BS.

It is practically unusable and I'll likely cancel my paid plan soon.

replies(1): >>Chinju+Jj1
◧◩◪◨⬒⬓
22. peddli+lr[view] [source] [discussion] 2024-02-14 07:23:35
>>sho+pl
My experience has been that gpt4 will happily hallucinate the details when I go too deep. Like you mentioned, it will invent new syntax and function calls.

It's magic, until it isn't.

◧◩◪
23. doktri+ht[view] [source] [discussion] 2024-02-14 07:46:12
>>rendal+ij
Like a cult reciting their vows of allegiance.
replies(1): >>gkbrk+dB
24. mratsi+ev[view] [source] 2024-02-14 08:10:03
>>skepti+(OP)
> GPTs are a little better, but I still don't see the product market fit.

If ChatGPT doesn't have product-market fit, what actually has?

replies(1): >>clbrmb+UJ
◧◩◪◨⬒
25. sjwhev+fv[view] [source] [discussion] 2024-02-14 08:10:13
>>spacem+Tm
Who says it’s wrong? I have very discrete tasks which involve resolving linguistic ambiguity and they can perform very well.
replies(1): >>mlnj+rE
26. daniel+yv[view] [source] 2024-02-14 08:13:02
>>skepti+(OP)
Googlers are wishing OpenAI could vanish as it makes them look like the IBM-lookalike they are.

Here are some hilarious highlights: https://twitter.com/Suhail/status/1757573182138290284

replies(2): >>OJFord+hB >>lordsw+K51
27. remus+Bw[view] [source] 2024-02-14 08:25:42
>>skepti+(OP)
> Frankly, OpenAI seems to be losing its luster, and fast.

I don't think it's hugely surprising given the massive hype. No doubt OpenAI are doing impressive things, but it's normal for the market to over value it initially as everyone tries to get onboard, and then for it to fall back to a more sensible level.

◧◩
28. phreez+Hw[view] [source] [discussion] 2024-02-14 08:26:46
>>al_bor+Po
Given how fast the valuation of the company and the scope of their ambition (e.g. raising a trillion dollars for chip manufacturing) have been hyped up, I think it's fair to say "You live by the hype, you die by the hype."
replies(2): >>hef198+Pz >>bamboo+RG
◧◩
29. mad_to+Uw[view] [source] [discussion] 2024-02-14 08:28:55
>>sho+Wi
That's interesting, because I have had exactly the opposite experience testing GPT vs Bard with coding questions. Bard/Gemini far outperformed GPT on coding, especially with newer languages or libraries. Whereas GPT was better with more general questions.
◧◩◪◨⬒
30. ein0p+lx[view] [source] [discussion] 2024-02-14 08:33:18
>>spacem+Tm
That’s the correct answer. Years ago I worked on inference efficiency on edge hardware at a startup. Time after time I saw that users vastly prefer slower, but more accurate and robust systems. Put succinctly: nobody cares how quick a model is if it doesn’t do a good job. Another thing I discovered is it can be very difficult to convince software engineers of this obvious fact.
replies(3): >>spacec+9C >>Al-Khw+WJ >>sjwhev+bs2
31. _giorg+Fx[view] [source] 2024-02-14 08:36:28
>>skepti+(OP)
Where is Ilya?! (Sutskever)
◧◩◪
32. danpal+0y[view] [source] [discussion] 2024-02-14 08:39:50
>>huyter+5h
Many products don't expose chat directly to the user. For example, auto-categorisation of my bank transactions does not need GPT-4; a small model with a little fine-tuning will do well, and massively outperform any other classification approach. There are many problems like this.
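As a sketch of just the inference side of that (fine-tuning aside), assuming a small local model behind an Ollama-style endpoint; the category list and model tag are made up:

    import requests

    CATEGORIES = ["groceries", "rent", "transport", "dining", "other"]

    def categorise(description: str) -> str:
        prompt = (
            f"Categorise this bank transaction as one of {CATEGORIES}.\n"
            f"Transaction: {description}\nAnswer with the category only."
        )
        r = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": "mistral", "prompt": prompt, "stream": False},
        )
        answer = r.json()["response"].strip().lower()
        # Snap the free-form output back onto the closed label set
        return answer if answer in CATEGORIES else "other"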
33. danpal+8y[view] [source] 2024-02-14 08:41:02
>>skepti+(OP)
I think OpenAI will do fine, but I have doubts about ChatGPT as a product. It’s just a chat UI, and I’m not convinced the UI will be chat 3 years from now.

Personally, the chat UI is the main limiting factor in my own adoption, because a) it’s not in the tool I’m trying to use, and b) it’s quicker for me to do the work than describe the work I need doing.

replies(2): >>OJFord+KB >>dgello+mD
◧◩◪
34. hef198+Pz[view] [source] [discussion] 2024-02-14 08:55:35
>>phreez+Hw
Just time your exit correctly!
replies(2): >>devout+LF >>vonjui+3M1
35. jack_r+tA[view] [source] 2024-02-14 09:00:55
>>skepti+(OP)
Nonsense. Anyone who regularly uses the top models knows that GPT-4 still leads by a clear margin.
replies(1): >>LightB+jH
◧◩◪
36. Jensso+9B[view] [source] [discussion] 2024-02-14 09:08:13
>>rendal+ij
> I was surprised and touched by their loyalty

They were loyal to money, nothing to be touched by.

◧◩◪◨
37. gkbrk+dB[view] [source] [discussion] 2024-02-14 09:08:21
>>doktri+ht
Literally reciting too. To the point of copy-pasting the same tweets.
◧◩
38. OJFord+hB[view] [source] [discussion] 2024-02-14 09:08:52
>>daniel+yv
I've had plenty of dumb policy violation misfires like that with ChatGPT, and got banned from Bing (which uses OpenAI API, not GPT4 at the time I think) for it the day it launched.
◧◩
39. OJFord+KB[view] [source] [discussion] 2024-02-14 09:13:58
>>danpal+8y
I suppose it depends what you use it for; my time in search engines has reduced massively - and so has time 'not in the tool I'm trying to use', because it's been so much faster for me to find answers to some queries with ChatGPT than a search engine.

I'm not particularly interested in having it outright program for me (other than say to sketch how to do something as inspiration, which I'll rewrite rather than copy) because I think typically I'd want to do it a certain way and it would take far longer to NLP an LLM to write it in whatever syntax than to WhateverSyntaxProgram it myself.

replies(1): >>vinter+5L
40. vineya+XB[view] [source] 2024-02-14 09:16:39
>>skepti+(OP)
Interesting take, interesting reasons.

I could understand the sentiment when you think that OpenAI is really doubling down just on LLMs recently, and forgoing a ton of research on other fronts.

They’re rapidly iterating though, and it’s refreshing to see them try a bunch of new things so quickly while every other company is comparatively slow to release anything.

◧◩◪◨⬒⬓
41. spacec+9C[view] [source] [discussion] 2024-02-14 09:18:44
>>ein0p+lx
Having spent time on edge compute projects. This.

Also, all the evidence is in this thread. Clearly people are unhappy with wasting time on LLMs, when the time that was wasted was the result of obviously bad output.

replies(1): >>sjwhev+Gr2
◧◩
42. dieort+LC[view] [source] [discussion] 2024-02-14 09:27:58
>>sho+Wi
I’ve had the opposite experience with Gemini, which was surprising. I feel like it lies less to me among other things
◧◩◪
43. nl+WC[view] [source] [discussion] 2024-02-14 09:30:36
>>huyter+5h
This isn't true. Lots of people care deeply and use 3.5 levels of performance at some point in their software stack.

For lots of applications the speed/quality/price trade offs make a lot of sense.

For example if you are doing vanilla question answering over lots of documents then 3.5 or Mixtral are better than GPT4 because the speed is important.

replies(1): >>huyter+BO5
◧◩
44. dgello+mD[view] [source] [discussion] 2024-02-14 09:36:12
>>danpal+8y
I interact with ChatGPT by voice pretty often; they have the best speech recognition I've ever seen. I can switch between languages (English, French, German) mid-sentence, think aloud, stop mid-sentence, then correct what I just said, and use highly technical terms (even describe code). I don't even double-check anymore because it's almost always transcribed correctly. They can ~easily evolve the product to a more generalized conversation UX instead of just a text-based chat.
replies(4): >>Lehere+hJ >>clbrmb+nK >>vwkd+YN >>danpal+cO
◧◩◪◨⬒⬓
45. mlnj+rE[view] [source] [discussion] 2024-02-14 09:47:25
>>sjwhev+fv
Exactly. Not everything is throwing large chunks of text to get complex questions answered.

I love using the smaller models; Starling LM 7B and Mistral 7B have been enough for many tasks like you mentioned.

◧◩
46. TeMPOr+3F[view] [source] [discussion] 2024-02-14 09:59:05
>>sho+Wi
They seem to be steadily dumbing down GPT-4; eventually, improving performance of open source models and decreasing performance of GPT-4 will meet in the middle.
replies(2): >>bamboo+5H >>fennec+bH
◧◩◪◨
47. devout+LF[view] [source] [discussion] 2024-02-14 10:08:15
>>hef198+Pz
"This year I invested in pumpkins. They've been going up the whole month of October, and I've got a feeling they're going to peak right around January and BANG! That's when I'll cash in!" -Homer Simpson
replies(1): >>hef198+vG
◧◩◪
48. 154573+kG[view] [source] [discussion] 2024-02-14 10:15:25
>>osigur+ol
> I suspect this is already having a measurable impact on global GDP.

Yeah putting people out of work on an industrial scale is probably gonna have a pretty big effect on global GDP

replies(1): >>osigur+GM2
◧◩◪◨⬒
49. hef198+vG[view] [source] [discussion] 2024-02-14 10:20:50
>>devout+LF
Homer obviously was smart, a nuclear scientist, car developer and Junior Vice President in his own tech start-up! So he should know!

Edit: I forgot, NASA trained astronaut!

◧◩
50. l33tma+HG[view] [source] [discussion] 2024-02-14 10:23:04
>>al_bor+Po
It sure is, but the theme in the sub-thread was about whether OAI in particular can afford to do that (i.e. wait) while there are literally dozens of other companies and open-source projects showing they can solve a lot of the tasks GPT-4 does, for free, so the OAI value proposition seems weaker and weaker by the month.

Add to that a company environment that seems to be built on money-crazed, stock-option-piling engineers and a CEO that seems to have gotten power-crazed.. I mean they grew far too fast I guess..

◧◩◪
51. bamboo+RG[view] [source] [discussion] 2024-02-14 10:24:52
>>phreez+Hw
Beautifully said.
52. fennec+1H[view] [source] 2024-02-14 10:27:15
>>skepti+(OP)
I mean they just happened to train the biggest, most fine tuned model on the most data out of everyone I guess.

Transformers were invented with the support of Google (by the researchers, not by Google).

The open community has been creating better and better models with a group effort; like how ML works itself, it's way easier to try 100,000 ideas on a small scale than it is to try a couple of ideas on a large scale.

◧◩◪
53. bamboo+5H[view] [source] [discussion] 2024-02-14 10:28:11
>>TeMPOr+3F
I'm almost certain this is because you're getting used to chatbots. How would they honestly be getting worse?

Initially it felt like the singularity was at hand. You played with it, got to know it, the computer was talking to you, it was your friend, it was exciting; then you got bored with your new friend and it wasn't as great as you remembered.

Dating is often like this. You meet someone, have some amazing intimacy, then you really get to know them, you work out it wasn't for you and it's time to move on.

replies(5): >>clbrmb+vJ >>detour+YJ >>DJHenk+gL >>TeMPOr+BL >>whywhy+yl1
◧◩◪
54. fennec+bH[view] [source] [discussion] 2024-02-14 10:30:27
>>TeMPOr+3F
Yeah, I agree, GPT's attention seems much less focussed now. If you tell it to respond in a certain way it now has trouble figuring out what you want.

If it's a conversation with "format this loose data into XML" repeated several times and then a "now format it to JSON", I find it often has trouble determining that what you just asked for is the most important thing; I think the attention model gets confused by all the preceding text.

◧◩
55. LightB+jH[view] [source] [discussion] 2024-02-14 10:32:07
>>jack_r+tA
And yet, day to day, I'm using Bard/Gemini because, for most stuff, it's enough and sometimes clearer and better and the interface makes more sense.

I've been anti-Google for a while now so I'm not biased.

I don't think OpenAI have this sewn up.

56. natebe+fI[view] [source] 2024-02-14 10:41:04
>>skepti+(OP)
Sam publicly asking for a 10x bigger power grid and 7 trillion dollars is a pretty clear sign that they're out of short to medium-term ideas other than "MOAR PARAMETERS".
replies(2): >>hef198+HX >>george+MY
◧◩
57. Ringz+oI[view] [source] [discussion] 2024-02-14 10:42:45
>>154573+Ue
It seems as though everyone at OpenAI is advised by an unfiltered ChatGPT in their daily work and communication. /s
◧◩◪
58. epolan+7J[view] [source] [discussion] 2024-02-14 10:50:37
>>huyter+5h
That really depends on the use case.

For some advanced reasoning you're 100% right, but many times you're doing document conversion, summarizing, or RAG; in all these cases GPT 3.5 performs as well as, if not better than, GPT 4 (we can't ignore cost and speed) and it's very hard to distinguish between the two.

replies(1): >>darkwa+AM
◧◩◪
59. Lehere+hJ[view] [source] [discussion] 2024-02-14 10:54:13
>>dgello+mD
If only something like that was available on Android. I cannot dictate messages as my phone is in English, but most of my messages are in German or French. And it's almost impossible to search for a non-English song when driving.

Multi-language support would be so useful for me.

◧◩◪◨
60. clbrmb+vJ[view] [source] [discussion] 2024-02-14 10:58:39
>>bamboo+5H
1. Cost & resource optimization

2. More and more RLHF

replies(1): >>bamboo+W11
◧◩
61. clbrmb+UJ[view] [source] [discussion] 2024-02-14 11:03:36
>>mratsi+ev
GP meant Custom GPTs. Confusing names for sure.
◧◩◪◨⬒⬓
62. Al-Khw+WJ[view] [source] [discussion] 2024-02-14 11:03:53
>>ein0p+lx
Less compute also means lower cost, though.

I see how most people would prefer a better but slower model when price is equal, but I'm sure many prefer a worse $2/mo model over a better $20/mo model.

replies(1): >>ein0p+tB1
◧◩◪◨
63. detour+YJ[view] [source] [discussion] 2024-02-14 11:04:05
>>bamboo+5H
Google search got worse.
replies(2): >>polsha+fP >>whywhy+4m1
◧◩◪
64. clbrmb+nK[view] [source] [discussion] 2024-02-14 11:11:47
>>dgello+mD
This. Whisper is phenomenal. Have you tried the conversational mode? I would love to be able to use that in a more customized agent. I know you can use the conversation mode with a custom GPT, but I'd prefer to write dynamic prompts programmatically (sketch below). It would be great for a generalized personal assistant that can take notes, send/read email, texts, etc. Could it be a good filter on social notifications?

Though the TTS side has some trouble switching languages if only single words are embedded. A single German word inside an English sentence can really get butchered. More training needed on multilingual texts (and perhaps preserving italics). But anyways this is really only an issue for early language learning applications in my experience.
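On the "dynamic prompts programmatically" point, a minimal sketch with the OpenAI Python SDK's Whisper endpoint; the file name and follow-up prompt are illustrative:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Transcribe a voice note...
    with open("note.mp3", "rb") as audio:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

    # ...then feed the text into whatever prompt your agent builds
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": f"Turn this voice note into a todo list:\n{transcript.text}",
        }],
    )
    print(reply.choices[0].message.content)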

replies(1): >>dgello+Cj6
◧◩
65. clbrmb+qK[view] [source] [discussion] 2024-02-14 11:12:08
>>spacem+Lm
How to load on your repo?
replies(1): >>spacem+KY7
◧◩
66. bayind+ZK[view] [source] [discussion] 2024-02-14 11:19:50
>>al_bor+Po
You don't lose your luster only by not innovating.

The Altman saga, allowing military use, and other small things step by step tarnish your reputation and push you toward mediocrity or worse.

Microsoft has many great development stories (read Raymond Chen's blog to be awed), but what they did at the end to their competitors and how they behaved removed their luster, permanently for some people.

replies(3): >>inglor+6N >>denver+sS >>pixl97+Z31
◧◩◪
67. vinter+5L[view] [source] [discussion] 2024-02-14 11:20:26
>>OJFord+KB
Coding assistants copy your style to a fault. You got to be careful about things like typos in comments, or it'll start suggesting sloppy code as well. And conversely you have to be careful about overly bureaucratic conventions (doc comments for things entirely described by their name, etc.), or it will suggest overly wrapped hypercorporate code.

But used as autocomplete, it's definitely a time saver. Most of us read faster than we type.

replies(1): >>OJFord+rP
◧◩◪◨
68. DJHenk+gL[view] [source] [discussion] 2024-02-14 11:21:17
>>bamboo+5H
> I'm almost certain this is because you're getting used to chatbots. How would they honestly be getting worse?

People say that, but I don't get this line of reasoning. There was something new, I learned to work with it. At one point I knew what question to ask to get the answer I want and have been using that form ever since.

Nowadays I don't get the answer I want for the same input. How is that not a result of declining quality?

replies(2): >>omega3+ES >>jsjohn+pW
◧◩◪◨
69. TeMPOr+BL[view] [source] [discussion] 2024-02-14 11:24:39
>>bamboo+5H
The author of `aider` - an OSS GPT-powered coding assistant - is on HN, and says[0] he has benchmarks showing gradual decline in quality of GPT-4-Turbo, especially wrt. "lazy coding" - i.e. actually completing a coding request, vs. peppering it with " ... write this yourself ... " comments.

That on top of my own experiences, and heaps of anecdotes over the last year.

> How would they honestly be getting worse?

The models behind GPT-4 (which is rumored to be a mixture model)? Tuning, RLHF (which has long been demonstrated to dumb the model down). The GPT-4, as in the thing that produces responses you get through API? Caching, load-balancing, whatever other tricks they do to keep the costs down and availability up, to cope with the growth of the number of requests.

--

[0] - >>39361705

◧◩
70. msp26+ML[view] [source] [discussion] 2024-02-14 11:26:35
>>154573+Ue
For me it was Ilya burning a wooden effigy that represented 'unaligned' AI. Of course the firing and twitter stuff too. Something's fucked in this company for sure.
◧◩◪◨
71. darkwa+AM[view] [source] [discussion] 2024-02-14 11:38:10
>>epolan+7J
I would dare to say that in general most people need everyday help on simpler tasks rather than complex reasoning. Now obviously, if you get complex reasoning at the same speed and cost as simpler tasks, it's a no-brainer. But if there are trade-offs...
◧◩◪
72. inglor+6N[view] [source] [discussion] 2024-02-14 11:42:36
>>bayind+ZK
"allowing military use"

That would actually increase their standing in my eyes.

Not too far from where I live, Russian bombing is destroying homes of people whose language is similar to mine and whose "fault" is that they don't want to submit to rule from Moscow, direct or indirect.

If OpenAI can somehow help stop that, I am all for it.

replies(4): >>WhrRTh+PN >>bayind+UN >>ronhav+PP >>stcroi+Eb1
◧◩◪◨
73. WhrRTh+PN[view] [source] [discussion] 2024-02-14 11:48:26
>>inglor+6N
>If OpenAI can somehow help stop that, I am all for it.

I got some bad news for you then.

◧◩◪◨
74. bayind+UN[view] [source] [discussion] 2024-02-14 11:48:43
>>inglor+6N
On the other hand, Israel is using AI to generate their bombing targets and pound Gaza strip with bombs non-stop [0].

And, according to the UN, Turkey has used AI-powered, autonomous loitering drones to hit military convoys in Libya [1].

Regardless of us vs. them, AI shouldn't be a part of warfare, IMHO.

[0]: https://www.theguardian.com/world/2023/dec/01/the-gospel-how...

[1]: https://www.voanews.com/a/africa_possible-first-use-ai-armed...

replies(4): >>dizhn+qQ >>kj99+tS >>sambul+Oj1 >>Increa+fp1
◧◩◪
75. vwkd+YN[view] [source] [discussion] 2024-02-14 11:49:04
>>dgello+mD
Do you use the voice chat in the ChatGPT app?

In my experience, pausing even for a moment already makes it submit. This makes a real conversation with pauses for thought difficult, because of the need to hurry before it cuts you off.

replies(2): >>killth+WQ1 >>dgello+9i6
◧◩◪
76. danpal+cO[view] [source] [discussion] 2024-02-14 11:50:48
>>dgello+mD
For me, voice is just a different UX for the same underlying model of chat. I'm sure it's good, but I'm not going to sit at my computer talking to it, and in fact I think talking may be a worse signal to noise ratio than typing, as I can easily use shortcuts with written text.
replies(1): >>dgello+Di6
◧◩◪◨⬒
77. polsha+fP[view] [source] [discussion] 2024-02-14 12:01:23
>>detour+YJ
And Amazon search, youtube search. There do seem to be somewhat different incentives involved though, those examples are primarily about increasingly pushing lower quality content (ads, more profitable items, more engaging items) because it makes more money.
replies(1): >>detour+bU
◧◩◪◨
78. OJFord+rP[view] [source] [discussion] 2024-02-14 12:02:40
>>vinter+5L
I assumed that was not what we were talking about, because I replied to:

> Personally, the chat UI is the main limiting factor in my own adoption, because a) it’s not in the tool I’m trying to use, [...]

though I haven't tried it, through some combination of the effort to set it up & it not being particularly appealing to me anyway. The best it could possibly be would be like pair programming (back seat) with someone who does things the same way as you, and reviewing their code. I read faster than I type, but probably don't review non-trivial code faster than I type it. (That's not a brag, I just mean I think it's harder and takes longer to reason about something you haven't written, to understand it, and be confident you're not missing anything or haven't failed to consider xyz.)

◧◩◪◨
79. ronhav+PP[view] [source] [discussion] 2024-02-14 12:05:42
>>inglor+6N
Yep. AI is, and will be used militarily.

These virtue signaling games are childish.

replies(1): >>bernie+TT
◧◩◪◨⬒
80. dizhn+qQ[view] [source] [discussion] 2024-02-14 12:12:54
>>bayind+UN
I would be very surprised if Turkey was capable of doing that. If they did, that's all Erdoğan would be talking about. Also it's a bit weird that the linked article's source is a Turkish name. (Economy and theology major too)

I am not saying this is anything, but it's definitely tingling my "something's up" senses.

replies(1): >>bayind+CU
◧◩◪
81. denver+sS[view] [source] [discussion] 2024-02-14 12:30:32
>>bayind+ZK
I don’t think a lot of companies care whether they lose their luster to techies since corporations and most individuals will still buy their product. MSFT was $12 in 2000 (when they had their antitrust lawsuit) and is $400 now.
◧◩◪◨⬒
82. kj99+tS[view] [source] [discussion] 2024-02-14 12:30:41
>>bayind+UN
> AI shouldn't be a part of warfare, IMHO.

Nor should nuclear weapons, guns, knives, or cudgels.

But we don’t have a way to stop them being used.

replies(2): >>foolof+471 >>fwip+1m1
◧◩◪◨⬒
83. omega3+ES[view] [source] [discussion] 2024-02-14 12:32:27
>>DJHenk+gL
Could you share your findings re what questions to ask?
◧◩◪◨⬒
84. bernie+TT[view] [source] [discussion] 2024-02-14 12:42:51
>>ronhav+PP
It is indeed tragic that virtue is a childish trait among adults.
replies(2): >>inglor+S11 >>Curiou+kc1
◧◩◪◨⬒⬓
85. detour+bU[view] [source] [discussion] 2024-02-14 12:44:50
>>polsha+fP
The incentive mismatch that I seem to be observing is that Wall Street is in constant need of new technical disruption. This means that any product that shows promise will be optimized to meet a business plan rather than a human need.
◧◩◪◨⬒⬓
86. bayind+CU[view] [source] [discussion] 2024-02-14 12:47:01
>>dizhn+qQ
Voice of America generally employs the country's nationals for their reporting. There are some other resources:

    - NPR: https://www.npr.org/2021/06/01/1002196245/a-u-n-report-suggests-libya-saw-the-first-battlefield-killing-by-an-autonomous-d
    - Lieber Institute: https://lieber.westpoint.edu/kargu-2-autonomous-attack-drone-legal-ethical/
    - ICRC: https://casebook.icrc.org/case-study/libya-use-lethal-autonomous-weapon-systems
    - UN report itself (Search for Kargu): https://undocs.org/Home/Mobile?FinalSymbol=S%2F2021%2F229&Language=E&DeviceType=Desktop&LangRequested=False
    - Kargu itself: https://www.stm.com.tr/en/kargu-autonomous-tactical-multi-rotor-attack-uav
From my experience, the Turkish military doesn't like to talk about all the things they have.
replies(1): >>dizhn+V11
◧◩◪
87. dathin+MU[view] [source] [discussion] 2024-02-14 12:48:27
>>huyter+5h
people do care in various ways

- price per thing you use it with matters (a lot)

- making sure that under no circumstances the involved information leaks (including being trained on) matters a lot in many use cases; while OpenAI does by now have support for this, the degree to which you can enforce it is not enough for some use cases. In some cases this is a hard constraint due to legal regulations.

- geopolitics matters, sometimes. Being dependent on a US service is sometimes a no-go (using self-hosted US software is most times fine, tho). Even if you only operate in the EU.

- it's much easier to domain-adapt if the model is source/weight accessible to a reasonable degree; while GPT-4 has a fine-tuning API, it's much, much less powerful, a direct consequence of the highly proprietary nature of GPT-4

- a lot of companies are not happy at all if they become highly reliant on a single service which can change at any time in how it acts, its pricing model, or whether it's available in your country at all. So basing your product on a less powerful but in turn replaceable or open-source AI can be a good idea, especially if you are based in a country not on the best terms with the US.

- do you trust Sam Altman at all? I do not, and it seems short-sighted to do so. In which case some of the points above become more relevant

- 3.5-level performance, especially in combination with domain adaptation, can be "good enough" for some use cases

◧◩◪◨⬒
88. jsjohn+pW[view] [source] [discussion] 2024-02-14 13:01:48
>>DJHenk+gL
For the record, I agree with you about declining quality of answers, but…

> Nowadays I don't get the answer I want for the same input. How is that not a result of declining quality?

Is it really the same input? An argument could easily be made that as you've gotten accustomed to ChatGPT, you ask harder questions, use less descriptive language, etc.

replies(2): >>DJHenk+Nc1 >>avion2+Am1
◧◩◪◨⬒
89. dathin+cX[view] [source] [discussion] 2024-02-14 13:08:44
>>spacem+Tm
Who cares about getting better answers if you can't afford it, can't use it for legal reasons, or conclude that the risk associated with OpenAI now being a fully proprietary, US-based, service-only company is too high given all circumstances. (Depending on how various things develop, the US export-restricting OpenAI, even GPT-4, is a very real possibility companies can't ignore when making long-term product decisions.)
◧◩
90. Keyfra+mX[view] [source] [discussion] 2024-02-14 13:10:57
>>sho+Wi
I kind of gave up completely on coding questions. Whether it's GPT4, Anthropic, or Gemini - there's always this big issue of laziness I'm facing. Never do I get full code; there are always stubs or TODOs (on important stuff), and when asked to correct for that.. I just get more of it (laziness). Has anyone else faced this, and is there a solution? It's almost as annoying, if not more so, than incomplete output was in the early days.
replies(2): >>buggle+e11 >>Curiou+re1
◧◩
91. hef198+HX[view] [source] [discussion] 2024-02-14 13:13:50
>>natebe+fI
Well, he also wanted a shit ton of money so that OpenAI could build its own silicon, after most of the real-world money generated by the AI hype went to nVidia.

Just imagine what valuation OpenAI would have as a grid monopolist combined with nVidia, ARM, Intel and AMD! Hundreds of trillions of dollars!

◧◩
92. onlyre+lY[view] [source] [discussion] 2024-02-14 13:17:53
>>al_bor+Po
> Claiming they are losing their luster, because they aren’t shaking the earth with new models every quarter, seems a little ridiculous.

If that's the foundation your luster is built on - then it's not really ridiculous.

OpenAI popularized LLMs to the world with GPT-3, not too long before GPT-4 came out. They made a lot of big, cool changes shortly after GPT-4 - and everyone and their mother announced LLM projects and integrations in that time.

It's been about 9 months now, and not a whole lot has happened in the space.

It's almost as if the law of diminishing returns has kicked in.

replies(1): >>famous+5c1
93. sesm+uY[view] [source] 2024-02-14 13:19:37
>>skepti+(OP)
To me plugins were an improvement; I often use the 'AI diagrams' plugin and ask it to draw sequence diagrams.
◧◩
94. george+MY[view] [source] [discussion] 2024-02-14 13:22:57
>>natebe+fI
You think his short to medium term plan is to raise $7tn to build dozens of fabs?
95. penjel+OZ[view] [source] 2024-02-14 13:31:06
>>skepti+(OP)
> GPT-5 has to be incredibly good at this point, and I'm not sure that it will be.

My guess is it isn't. These systems are hard to trust, and the rhetoric "we're aiming for AGI" suggests to me that they know this and AGI might be the only surefire way out.

If you tried to replace all of a dev's duties with current LLMs it would be a disaster; making sense of all that info requires focus and background thinking processes simultaneously, which I don't believe we have yet.

replies(1): >>Hoasi+Kd1
◧◩◪
96. buggle+e11[view] [source] [discussion] 2024-02-14 13:41:18
>>Keyfra+mX
The solution, at least for GPT-4, is to ask it to first draft a software spec for whatever you want it to implement and then write the code based on the spec. There are a bunch of examples here:

https://github.com/mckaywrigley/prompts
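A minimal sketch of that two-step pattern with the OpenAI Python SDK; the model name, prompts, and example task are illustrative, not from the linked repo:

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4", messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    task = "a CLI tool that deduplicates lines in a file"
    # Step 1: have the model pin down a spec before any code is written
    spec = ask(f"Draft a short software spec for {task}. Spec only, no code.")
    # Step 2: implement against its own spec
    code = ask(f"Write complete, runnable code implementing this spec:\n\n{spec}")
    print(code)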

◧◩◪◨⬒⬓
97. inglor+S11[view] [source] [discussion] 2024-02-14 13:45:47
>>bernie+TT
That assumes that being a pacifist when living under the umbrella of the most powerful military in the world is, in fact, a virtue.

I don't think so. In order to be virtuous, one should have some skin in the game. I would respect dedicated pacifists in Kyiv a lot more. I wouldn't agree with them, but at least they would be ready to face pretty stark consequences of their philosophical belief.

Living in the Silicon Valley and proclaiming yourself virtuous pacifist comes at negligible personal cost.

replies(1): >>vonjui+0M1
◧◩◪◨⬒⬓⬔
98. dizhn+V11[view] [source] [discussion] 2024-02-14 13:46:15
>>bayind+CU
The major drone manufacturer is Erdoğan's son-in-law. He's being groomed as one of his possible successors on the throne. They looove to talk about those drones.

I will check out the links. Thanks a lot.

replies(1): >>bayind+931
◧◩◪◨⬒
99. bamboo+W11[view] [source] [discussion] 2024-02-14 13:46:16
>>clbrmb+vJ
So we should expect GPT-5 to be worse than GPT-4?
replies(1): >>pixl97+p51
◧◩◪◨⬒⬓⬔⧯
100. bayind+931[view] [source] [discussion] 2024-02-14 13:54:23
>>dizhn+V11
You're welcome.

The drones in question (Kargu) are not built by his company.

replies(1): >>dizhn+k41
◧◩◪
101. pixl97+Z31[view] [source] [discussion] 2024-02-14 13:59:09
>>bayind+ZK
At the end of the day the US.mil is spending billions to trillions of dollars. I'm not exactly sure what you mean by lose your luster, but becoming part of the military industrial complex is generally a way to bury yourself in deep piles of gold.
replies(2): >>ignora+xn1 >>throw_+JF1
◧◩
102. optymi+d41[view] [source] [discussion] 2024-02-14 13:59:41
>>al_bor+Po
I never bought into the ethical questions. It's trained on publicly available data as far as I understand. What's the most unethical thing it can do?

My experience is limited. I got it to berate me with a jailbreak. I asked it to do so, so the onus is on me to be able to handle the response.

I'm trying to think of unethical things it can do that are not in the realm of "you asked it for that information, just as you would have searched on Google", but I can only think of things like "how to make a bomb", suicide related instructions, etc which I would place in the "sharp knife" category. One has to be able to handle it before using it.

It's been increasingly giving the canned "As an AI language model ..." response for stuff that's not even unethical, just dicey, for example.

replies(1): >>al_bor+j91
◧◩◪◨⬒⬓⬔⧯▣
103. dizhn+k41[view] [source] [discussion] 2024-02-14 14:00:21
>>bayind+931
True. I had been reading about how other drones are in service but they never get mentioned anymore.
◧◩
104. Animal+k51[view] [source] [discussion] 2024-02-14 14:05:52
>>al_bor+Po
Perhaps GPT-4 is losing its luster because the more people actually use it, they go from "wow that's amazing" to "amazing, yes, but..."? And the "but" looms larger and larger with more time and more exposure?

Note well: I haven't actually used it myself, so I'm speculating (guessing) rather than saying that this is how it is.

replies(1): >>chasd0+gN2
◧◩◪◨⬒⬓
105. pixl97+p51[view] [source] [discussion] 2024-02-14 14:06:05
>>bamboo+W11
GPT-5: "I'm sorry I cannot answer that question because it may make GPT-4 feel bad about it's mental capabilities, instead we've presented GPT-4 with a participation trophy and told it's a good model"

Talking to corporate HR is subjectively worse for most people, and objectively worse in many cases.

◧◩
106. lordsw+K51[view] [source] [discussion] 2024-02-14 14:08:28
>>daniel+yv
IMO, these examples are a result of Google's AI safety team being overly conservative and overly simplistic in their approaches.

Google DeepMind is still an AI research powerhouse that is producing a ton of innovation both internal and publicly published.

◧◩◪
107. Sharli+961[view] [source] [discussion] 2024-02-14 14:10:16
>>rendal+ij
It's not clear to me how many of the undersigned did so under some degree of duress. Apparently there was a lot of pressure from the senior employees (those who had the most $$$ to lose) to sign.
replies(1): >>earthn+Hh1
◧◩◪◨⬒⬓
108. foolof+471[view] [source] [discussion] 2024-02-14 14:15:06
>>kj99+tS
This is literally the only thing that matters in this debate. Everything else is useless hand-wringing from people who don't want to be associated with the negative externalities of their work.

The second that this tech was developed it became literally impossible to stop this from happening. It was a totally foreseeable consequence, but the researchers involved didn't care because they wanted to be successful and figured they could just try to blame others for the consequences of their actions.

replies(1): >>qetern+Wa1
109. hn_thr+l81[view] [source] 2024-02-14 14:21:51
>>skepti+(OP)
To be honest, I hate takes like this. ChatGPT, which basically revolutionized the whole AI industry and the public's imagination about what AI can do, was released not even 15 months ago, and since then they have consistently released huge upgrades (GPT 4 just a couple months later) and numerous products since then. I still haven't used another model that comes close to GPT 4. But since it's been, say, all of 23 hours since OpenAI released a new product (memory) they're "losing their luster".

The same nonsense happened with Apple, where like a month after they first released Apple Watch people were yelling "What's next???!!!! Apple is dying without Steve Jobs!"

◧◩◪
110. al_bor+j91[view] [source] [discussion] 2024-02-14 14:27:53
>>optymi+d41
One recent example in the news was the AI generated p*rn of Taylor Swift. From what I read, the people who made it used Bing, which is based on OpenAI’s tech.
replies(2): >>loboci+Aa1 >>zingel+Gh2
◧◩◪◨
111. loboci+Aa1[view] [source] [discussion] 2024-02-14 14:35:27
>>al_bor+j91
This is more sensationalism than ethical issue. Whatever they did they could do, and probably do better, using publicly available tools like Stable Diffusion.
replies(1): >>majora+Gf1
◧◩◪◨⬒⬓⬔
112. qetern+Wa1[view] [source] [discussion] 2024-02-14 14:37:34
>>foolof+471
> the researchers involved didn't care because they wanted to be successful and figured they could just try to blame others for the consequences of their actions

Such an absurdly reductive take. Or how about just like nuclear energy and knives, they are incredibly useful, society advancing tools that can also be used to cause harm. It's not as if AI can only be used for warfare. And like pretty much every technology, it ends up being used 99.9% for good, and 0.1% for evil.

replies(1): >>foolof+Yc1
◧◩◪◨
113. stcroi+Eb1[view] [source] [discussion] 2024-02-14 14:40:49
>>inglor+6N
Agreed. It's the most important and impactful use case. All else are a set of parlor tricks in comparison.
114. Curiou+Nb1[view] [source] 2024-02-14 14:41:29
>>skepti+(OP)
Plugins are in theory good, but the hurdle to developing and deploying them combined with only being able to use them with a subscription was kind of a killer.

GPTs are also pretty good, and being able to invoke them in regular chat is also handy, but the lack of monetization and the ability to easily surface them outside of chatgpt is also kind of a problem. These problems are more fixable than the plugin issue IMO since I think the architecture of plugins is a limiting factor.

◧◩◪◨⬒
115. qetern+0c1[view] [source] [discussion] 2024-02-14 14:42:38
>>int_19+0p
> But so far nobody is even in the same ballpark.

Miqu is pretty good. Sure, it's a leak...but there's nothing special there. It's just a 70b llama2 finetune.

replies(1): >>int_19+NJ2
◧◩◪
116. famous+5c1[view] [source] [discussion] 2024-02-14 14:42:50
>>onlyre+lY
GPT-3 came out 3 years before 4.
replies(1): >>onlyre+0E1
◧◩◪◨⬒⬓
117. Curiou+kc1[view] [source] [discussion] 2024-02-14 14:44:20
>>bernie+TT
Virtue isn't childish, shooting telegraphed signals to be perceived as virtuous regardless of your true nature is childish. Also, using a one dimensional, stereotypical storybook definition of virtue (and then trying to foist that on others) is also childish.
◧◩◪◨⬒⬓
118. DJHenk+Nc1[view] [source] [discussion] 2024-02-14 14:46:21
>>jsjohn+pW
> Is it really the same input? An argument could easily be made that as you've gotten accustomed to ChatGPT, you ask harder questions, use less descriptive language, etc.

I don't have logs detailed enough to be able to look it up, so I can't prove it. But for me, learning to work with AI tools like ChatGPT consists specifically of developing an intuition of what kind of answer to expect.

Maybe my intuition skewed a little over the months. It did not do that for open source models though. As a software developer understanding and knowing what to expect from a complex system is basically my profession. Not just the systems I build, maintain and integrate, but also the systems I use to get information, like search engines. Prompt engineering is just a new iteration of google-fu.

Since this intuition has not failed me in all those other areas and since OpenAI has an incentive to change the workings under the hood (cutting costs, adding barriers to keep it politically correct) and it is a closed source system that no-one from the outside can inspect, my bet is that it is them and not me.

replies(1): >>jsjohn+074
◧◩◪◨⬒⬓⬔⧯
119. foolof+Yc1[view] [source] [discussion] 2024-02-14 14:47:34
>>qetern+Wa1
I think you're missing the point. I don't think we should have prevented the development of this tech. It's just absurd to complain about things that we always knew would happen as though they're some sort of great surprise.

If we cared about preventing LLMs from being used for violence, we would have poured more than a tiny fraction of our resources into safety/alignment research. We did not. Ergo, we don't care, we just want people to think we care.

I don't have any real issue with using LLMs for military purposes. It was always going to happen.

replies(2): >>kelips+Fg1 >>kj99+5A1
◧◩
120. Hoasi+Kd1[view] [source] [discussion] 2024-02-14 14:51:52
>>penjel+OZ
> If you tried to replace all of a devs duties with current LLMs it would be a disaster,

Overall a chatbot like GPT-4 may be useful, but not that useful as it stands.

If you can write well, it's not really going to improve your writing. Granted, you can automate a few tasks, but it does not give you 10X or even 2X improvement as sometimes advertised.

It might be useful here and there for coding, but it's not reliable.

◧◩◪
121. Curiou+re1[view] [source] [discussion] 2024-02-14 14:54:53
>>Keyfra+mX
If you can't get GPT4 to do coding questions you're prompting it wrong or not loading your context correctly. It struggles a bit with presentational stuff like getting correct HTML/CSS from prompts or trying to generate/update large functions/classes, but it is stellar at producing short functions, creating scaffolding (tests/stories) and boilerplate and it can do some refactors that are outside the capabilities of analytical tools, such as converting from inline styles to tailwind, for example.
replies(1): >>Keyfra+bw1
◧◩◪◨⬒
122. majora+Gf1[view] [source] [discussion] 2024-02-14 15:00:41
>>loboci+Aa1
Or just Photoshop. The only thing these tools did was make it easier. I don't think the AI aspect adds anything to this comparison.
replies(1): >>Anon84+Th1
123. weebul+yg1[view] [source] 2024-02-14 15:05:26
>>skepti+(OP)
> The light that burns twice as bright burns half as long - and you have burned so very, very brightly, Roy.
◧◩◪◨⬒⬓⬔⧯▣
124. kelips+Fg1[view] [source] [discussion] 2024-02-14 15:05:47
>>foolof+Yc1
Safety or alignment research isn't going to stop it from being used for military purposes. Once the tech is out there, it will be used for military purposes; there's just no getting around it.
◧◩◪◨
125. earthn+Hh1[view] [source] [discussion] 2024-02-14 15:10:52
>>Sharli+961
Well, to be fair, the board just tried to evaporate a lot of $$$ from most employees.

Any unionising effort consists of employees convincing other employees to join them. Some people will care more about the union's goals than others, and you can be certain that those who care more will pester those that care less to join their cause.

What happened at OpenAI was not a union effort, but I believe the comparison is excellent to understand normal dynamics of employee-based efforts.

◧◩◪◨⬒⬓
126. Anon84+Th1[view] [source] [discussion] 2024-02-14 15:11:45
>>majora+Gf1
An argument can be made that "more is different." By making it easier to do something, you're increasing the supply, possibly even taking something that used to be a rare edge case and making it a common occurrence, which can pose problems in and of itself.
replies(2): >>stickf+iF1 >>loboci+dpi
◧◩
127. Chinju+Jj1[view] [source] [discussion] 2024-02-14 15:19:58
>>jester+pp
It was always like this ("Now the system just makes shit up and says 'Yes you are right' when you catch it on BS."). The scales are just falling from your eyes as the novelty fades.
replies(1): >>jester+Vo3
◧◩◪◨⬒
128. sambul+Oj1[view] [source] [discussion] 2024-02-14 15:20:14
>>bayind+UN
If it ever happens again, they'll develop the lists in seconds from data collected from our social media and intercepts. What took organizations warehouses and thousands of agents will be done in a matter of seconds.
◧◩◪◨
129. whywhy+yl1[view] [source] [discussion] 2024-02-14 15:25:59
>>bamboo+5H
> How would they honestly be getting worse

To me it feels like it detects whether the question could be answered more cheaply by the code interpreter model or 4 Turbo and then offloads to those, and they just kinda suck compared to OG 4.

I've watched it fumble and fail to solve a problem with CI; it took 3 attempts over 5 minutes of real time and just gave up in the end, on a problem that OG 4 can do one-shot, no preamble.

◧◩◪◨⬒⬓
130. fwip+1m1[view] [source] [discussion] 2024-02-14 15:27:16
>>kj99+tS
Sure we do. We enforce it through the threat of warfare and subsequent prosecution, the same way we enforce the bans on chemical weapons and other war crimes.

We may lack the motivation and agreement to ban particular methods of warfare, but the means to enforce that ban exists, and drastically reduces their use.

replies(2): >>inglor+Uv1 >>kj99+Ez1
◧◩◪◨⬒
131. whywhy+4m1[view] [source] [discussion] 2024-02-14 15:27:28
>>detour+YJ
Yandex image search is now better than Google's just by being the exact product Google's was 10+ years ago.

Watching tools decline is frustrating.

◧◩◪◨⬒⬓
132. avion2+Am1[view] [source] [discussion] 2024-02-14 15:29:26
>>jsjohn+pW
Not OP, but I copy & pasted the same code and asked it to improve it. With the no-fingers/tip hack it does something, but with much worse results.
replies(1): >>jsjohn+864
◧◩◪◨
133. ignora+xn1[view] [source] [discussion] 2024-02-14 15:34:25
>>pixl97+Z31
> a way to bury yourself in deep piles of gold

Unfortunately, no deep piles of gold without deep piles of corpses. It is inevitable, though. Prompted by the US military, other countries have also always pioneered or acquired advanced tech, and I don't see why AI would be any different: "Never send a human to do a machine's job" is as ominous now as it is dystopian, as machines increasingly become more human-like.

replies(1): >>mring3+QL1
◧◩◪◨⬒
134. Increa+fp1[view] [source] [discussion] 2024-02-14 15:39:57
>>bayind+UN
Why not? Maybe AI is what is needed to finally tear Hamas out of Palestine root and branch. As long as humans are still in the loop vetting the potential targets, it doesn't seem particularly different from the IDF just hiring a bunch of analysts to produce the same targets.
replies(2): >>throwb+Ut1 >>g8oz+Pu1
135. greeni+ts1[view] [source] 2024-02-14 15:52:31
>>skepti+(OP)
gpt4 is not worth $22 a month. slow af and you get similar results with gpt3.5. the free perplexity internet search is bounds better than that bing thing. i thought the file upload would be worth it, but no, not worth that much money per month.
◧◩◪◨⬒⬓
136. throwb+Ut1[view] [source] [discussion] 2024-02-14 15:57:37
>>Increa+fp1
There is no "removing Hamas from Palestine". The only way to remove the desire of the Palestinian people for freedom is to remove the Palestinian people themselves. And that is what the IDF is trying to do.
replies(1): >>Increa+dD1
◧◩◪◨⬒⬓
137. g8oz+Pu1[view] [source] [discussion] 2024-02-14 16:01:17
>>Increa+fp1
Considering the incredible amount of civilian casualties, I don't think the target vetting is working very well.
138. Chicag+0v1[view] [source] 2024-02-14 16:01:55
>>skepti+(OP)
I'll get downvoted to oblivion, but I think people underestimate how much their productization of GPT in the chat format led to a virality that likely is not justified by the underlying product alone. LLMs had been around for several years; they were just a royal pain to use. They definitely were the pioneers in democratizing it for folks, and it occupied a significant slice of society's mindshare for quite a bit. But I suspect it is only natural that it'll recede to a more appropriate level, where this is still an important and incredible piece of tech, but it will stop having the feel that "OMG THIS IS GOING TO TAKE OVER THE WORLD", because it prob. won't... at least not at the pace which popular media would have you believe.
◧◩◪◨⬒⬓⬔
139. inglor+Uv1[view] [source] [discussion] 2024-02-14 16:06:01
>>fwip+1m1
"We enforce it through the threat of warfare and subsequent prosecution, the same way we enforce the bans on chemical weapons and other war crimes."

Do we, though? Sometimes, against smaller misbehaving players. Note that it doesn't necessarily stop them (Iran, North Korea), even though it makes their international position somewhat complicated.

Against the big players (the US, Russia, China), "threat of warfare and prosecution" does not really work to enforce anything. Russia rains death on Ukrainian cities every night, or attempts to do so while being stopped by AA. Meanwhile, Russian oil and gas are still being traded, including in EU.

◧◩◪◨
140. Keyfra+bw1[view] [source] [discussion] 2024-02-14 16:07:59
>>Curiou+re1
So, mundane trivial things and/or web programming? I got it eventually to answer what I needed, but it always liked to skip part of the code, inserting // TODO: important stuff in the middle, hence the 'laziness' attribute. Maybe it is just lazy, who knows. I know I am, since I'm prompting it for stuff.
replies(2): >>Curiou+zC1 >>antonv+Wng
◧◩
141. NBJack+Wx1[view] [source] [discussion] 2024-02-14 16:16:53
>>al_bor+Po
This space is growing by leaps and bounds. It's not so much the passage of time as it is the number of notable advancements that is dictating the pace.
◧◩◪◨⬒⬓⬔
142. kj99+Ez1[view] [source] [discussion] 2024-02-14 16:26:33
>>fwip+1m1
We lack the motivation precisely because of information warfare that is already being used.
◧◩◪◨⬒⬓⬔⧯▣
143. kj99+5A1[view] [source] [discussion] 2024-02-14 16:29:08
>>foolof+Yc1
You say ‘we’ as if everyone is the same. Some people care, some people don't. It only takes a few who don't, or who feel the ends justify the means. Because those people exist, the people who do care are forced into a prisoner's dilemma, forcing them to develop the technology anyway.
◧◩◪◨⬒⬓⬔
144. ein0p+tB1[view] [source] [discussion] 2024-02-14 16:37:23
>>Al-Khw+WJ
That’s the thing I’m finding so hard to explain. Nobody would ever pay even $2 for a system that is worse at solving the problem. There is some baseline compute you need to deliver certain types of models. Going below that level for lower cost at the expense of accuracy and robustness is a fool’s errand.

In LLMs it’s even worse. To make it concrete, for how I use LLMs I will not only not pay for anything with less capability than GPT4, I won’t even use it for free. It could be that other LLMs could perform well on narrow problems after fine tuning, but even then I’d prefer the model with the highest metrics, not the lowest inference cost.

replies(1): >>sjwhev+Ss2
◧◩◪◨⬒
145. Curiou+zC1[view] [source] [discussion] 2024-02-14 16:43:52
>>Keyfra+bw1
I wouldn't say mundane/trivial so much as well-trodden. I get good code for basic shaders, various compsci algorithms, common straightforward SQL queries, etc. If you're asking it to edit 500-line functions and handle memory management in a language that isn't in the top 20 of the TIOBE index, you're going to have a bad time.

The TODO comments can be prompted against: just tell it to always include complete, runnable code because its output will be executed in a sandbox without prior verification.
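
Roughly, with the OpenAI Python client, that looks something like this (a sketch - the model name and the exact system-prompt wording are just illustrative):

    # Sketch: steer the model away from "// TODO" elisions by framing its
    # output as code that will run unverified in a sandbox.
    # Assumes the openai package (v1 client) and an API key in the environment.
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[
            {
                "role": "system",
                "content": (
                    "Always include complete, runnable code. Your output is "
                    "executed in a sandbox without prior verification, so "
                    "never elide sections with TODO comments or placeholders."
                ),
            },
            {"role": "user", "content": "Write a fragment shader that ..."},
        ],
    )
    print(response.choices[0].message.content)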

◧◩◪◨⬒⬓⬔
146. Increa+dD1[view] [source] [discussion] 2024-02-14 16:46:24
>>throwb+Ut1
Hamas isn't the only path to freedom for Palestinians. In fact, they seem to be the major impediment to it.
replies(1): >>lolc+kQ1
◧◩◪◨
147. onlyre+0E1[view] [source] [discussion] 2024-02-14 16:51:08
>>famous+5c1
GPT-3.5 is when LLMs started to go "mainstream". That was about 4.5 months before the GPT-4 release.

Keep in mind GPT-3.5 was not an overnight craze. It took months before normal people even knew what it was.

replies(1): >>famous+3k2
◧◩◪◨⬒⬓⬔
148. stickf+iF1[view] [source] [discussion] 2024-02-14 16:57:06
>>Anon84+Th1
Put in a different context: The exploits are out there. Are you saying we shouldn't publish them?

Deepfakes are going to become a concern of everyday life whether you stop OpenAI from generating them or not. The cat is out of the proverbial bag. We as a society need to adjust to treating this sort of content skeptically, and I see no more appropriate way than letting a bunch of fake celebrity porn circulate.

What scares me about deepfakes is not the porn, it's the scams. The scams can actually destroy lives. We need to start ratcheting up social skepticism asap.

replies(1): >>vonjui+qM1
◧◩◪◨
149. huyter+EF1[view] [source] [discussion] 2024-02-14 16:58:47
>>sjwhev+il
I’ll wait 5 seconds for the right code over 1 sec for bad code.
replies(1): >>sjwhev+ct2
◧◩◪◨
150. throw_+JF1[view] [source] [discussion] 2024-02-14 16:59:02
>>pixl97+Z31
I think you answered it yourself. The main way to go from cool to not cool is to be buried in "piles of gold".
◧◩◪◨⬒
151. mring3+QL1[view] [source] [discussion] 2024-02-14 17:28:36
>>ignora+xn1
There will always be corpses.

Do you want American corpses? Or somebody elses?

◧◩◪◨⬒⬓⬔
152. vonjui+0M1[view] [source] [discussion] 2024-02-14 17:29:26
>>inglor+S11
That's kind of like saying that not being a murderer only has moral value if you're constantly under mortal threat yourself.
replies(1): >>inglor+hR1
◧◩◪◨
153. vonjui+3M1[view] [source] [discussion] 2024-02-14 17:29:52
>>hef198+Pz
RIP vim users
◧◩◪◨⬒⬓⬔⧯
154. vonjui+qM1[view] [source] [discussion] 2024-02-14 17:31:27
>>stickf+iF1
You probably don't care about the porn because I'm assuming you're a man, but it can ruin lives too.
replies(1): >>stickf+pl3
◧◩◪◨⬒⬓⬔⧯
155. lolc+kQ1[view] [source] [discussion] 2024-02-14 17:50:49
>>Increa+dD1
If we're going to be reductive, at least include the other main roadblock to a solution which is the current government of Israel.
replies(1): >>Increa+1s2
◧◩◪◨
156. killth+WQ1[view] [source] [discussion] 2024-02-14 17:53:09
>>vwkd+YN
FWIW if you hold down the big white button it won't submit until you release it. I had no idea this was a thing until seeing someone tweet about it.
replies(1): >>dgello+li6
◧◩◪◨⬒⬓⬔⧯
157. inglor+hR1[view] [source] [discussion] 2024-02-14 17:54:52
>>vonjui+0M1
I don't really see the comparison. Not being a murderer isn't a virtue; it is just normal behavior for 99.9 percent of the population.
replies(1): >>vonjui+5n4
◧◩◪◨
158. zingel+Gh2[view] [source] [discussion] 2024-02-14 20:00:50
>>al_bor+j91
You are talking like it's something bad. Kids are learning AI and computing instead of drugs and guns. And nobody is hurt.
◧◩◪◨⬒
159. famous+3k2[view] [source] [discussion] 2024-02-14 20:11:57
>>onlyre+0E1
>GPT-3.5 is when LLMs started to go "mainstream".

To the general public, sure, but not to researchers, and research is what produces the models.

The idea that diminishing returns have hit because there hasn't been a new SOTA model in 9 months is ridiculous. Models take months just to train. OpenAI sat on GPT-4 for over half a year after training was done, just red-teaming it.

◧◩◪◨⬒⬓⬔
160. sjwhev+Gr2[view] [source] [discussion] 2024-02-14 20:47:27
>>spacec+9C
People think LLMs are all or nothing: either god-like AGI or useless “hallucinating”.

In reality you have to know the strengths and weaknesses of any tool, and a small, fast LLM can do a tremendous amount within a fixed scope. The people at Mistral get this.

◧◩◪◨⬒⬓⬔⧯▣
161. Increa+1s2[view] [source] [discussion] 2024-02-14 20:49:12
>>lolc+kQ1
That doesn't explain why deals weren't reached with the previous governments of Israel.
replies(1): >>lolc+xU2
◧◩◪◨⬒⬓
162. sjwhev+bs2[view] [source] [discussion] 2024-02-14 20:50:11
>>ein0p+lx
Yes, but for certain classes of problems small LLMs are highly performant - in many cases equal to GPT-4, which sure can do more things well, but adding 2+2 is gonna be 4 no matter what. You don’t need a tank to drive to the grocery store, just a small car with a trunk.

So the assertion that small models aren’t as good just isn’t correct. They are amazing at certain things, and vastly faster and cheaper than larger models.

◧◩◪◨⬒⬓⬔⧯
163. sjwhev+Ss2[view] [source] [discussion] 2024-02-14 20:53:50
>>ein0p+tB1
So I think that’s a “your problem isn’t right for the tool” issue, not a “Mistral isn’t capable” issue.
replies(1): >>ein0p+Jx2
◧◩◪◨⬒
164. sjwhev+ct2[view] [source] [discussion] 2024-02-14 20:55:26
>>huyter+EF1
Yes, but if a 7B LLM gives you the same “Hello World” as the 70B, and that’s literally all you need, using a bigger model is just burning energy for no reason at all.
replies(1): >>huyter+ZL2
◧◩◪◨⬒⬓⬔⧯▣
165. ein0p+Jx2[view] [source] [discussion] 2024-02-14 21:14:30
>>sjwhev+Ss2
It isn’t capable unless you have a very specialized task and carefully fine-tune to solve just that task. GPT-4 covers a lot of ground out of the box. The best model I’ve seen so far on the FOSS side, Mixtral MoE, is less capable than even GPT-3.5. I often submit my requests to both Mixtral and GPT-4. If I’m problem solving (learning something, working with code, summarizing, working on my messaging), Mixtral is nearly always a waste of time in comparison.
replies(1): >>sjwhev+g13
◧◩◪◨⬒⬓
166. int_19+NJ2[view] [source] [discussion] 2024-02-14 22:06:22
>>qetern+0c1
By the standards of other llama2 finetunes, sure. Compared to GPT-4, I stand by my previous assertion.
◧◩◪◨⬒⬓
167. huyter+ZL2[view] [source] [discussion] 2024-02-14 22:18:25
>>sjwhev+ct2
The cost is fixed for me, at least at this point, so why would I choose the inferior version?
replies(1): >>sjwhev+503
◧◩◪◨
168. huyter+pM2[view] [source] [discussion] 2024-02-14 22:21:45
>>roody1+ih
An order of magnitude means they’re going to take 20 times longer to get to GPT-4’s level. So maybe on the order of 40-60 months from this point.
◧◩◪◨
169. osigur+GM2[view] [source] [discussion] 2024-02-14 22:23:01
>>154573+kG
Is inefficiency the path to economic greatness and quality of life improvements? I suspect not.
replies(1): >>154573+ev3
◧◩◪
170. chasd0+gN2[view] [source] [discussion] 2024-02-14 22:26:41
>>Animal+k51
I've got a feeling this is beginning to happen all over the place. I'm really curious to see where the hype train ends up at the end of this year.
◧◩◪◨⬒⬓⬔⧯▣▦
171. lolc+xU2[view] [source] [discussion] 2024-02-14 23:09:42
>>Increa+1s2
Sure, it doesn't explain that. Would be nice if things were that easy, wouldn't it?
replies(1): >>Increa+eu3
◧◩◪◨⬒⬓⬔
172. sjwhev+503[view] [source] [discussion] 2024-02-14 23:51:49
>>huyter+ZL2
It’s not fixed whatsoever. Mistral 7B runs on a MacBook Air, and it’s free. Zero cost LLM, no network latency.
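
For example, with Ollama serving Mistral locally, a query is just a local HTTP call. A minimal sketch (assumes you've already pulled the model with "ollama pull mistral"; the prompt text is illustrative):

    # Sketch: query a locally running Mistral 7B through Ollama's REST API
    # (default port 11434). No API key, no network egress, no per-token cost.
    import json
    import urllib.request

    payload = json.dumps({
        "model": "mistral",
        "prompt": "Summarize why small local models can be useful.",
        "stream": False,  # return one JSON object instead of a stream
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
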
◧◩◪◨⬒⬓⬔⧯▣▦
173. sjwhev+g13[view] [source] [discussion] 2024-02-14 23:59:51
>>ein0p+Jx2
Again, that’s precisely what I’m saying. A bounded task is best executed against the smallest possible model at the greatest possible speed. This is true for business reasons ($$$) as well as environmental ones (smaller model -> less carbon).

LLMs are not AGI; they are tools whose specific uses we are still discovering.

If you aren’t trying to optimize your accuracy to start with and just say “I’ll run the most expensive thing and assume it is better” with zero evaluation, you’re wasting money and time, and hurting the environment.

Also, I don’t even like running Mistral if I can avoid it - a lot of tasks can be done with a fine-tune of BERT or DistilBERT. It takes more work, but my custom BERT models way outperform GPT-4 on bounded tasks because I have highly curated training data.

Within specialized domains you just aren’t going to see GPT-4/5/6 performing on par with expert-curated data.
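
For the curious, the shape of that fine-tuning workflow with Hugging Face transformers looks roughly like this (a sketch - the dataset, label count, and hyperparameters are placeholders, not my actual setup):

    # Sketch: fine-tune DistilBERT for a bounded classification task.
    # The public "imdb" dataset stands in for your own curated data.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    dataset = load_dataset("imdb")  # placeholder for curated training data

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length")

    dataset = dataset.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="distilbert-out",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=dataset["train"],
        eval_dataset=dataset["test"],
    )
    trainer.train()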

◧◩◪◨⬒⬓⬔⧯▣
174. stickf+pl3[view] [source] [discussion] 2024-02-15 02:52:02
>>vonjui+qM1
It can only ruin lives if people believe it's real. Until recently, that was a reasonable belief; now it's not. People will catch on and society will adapt.

It's not like the technology is going to disappear.

replies(1): >>vonjui+Gq3
◧◩◪
175. jester+Vo3[view] [source] [discussion] 2024-02-15 03:29:58
>>Chinju+Jj1
While what you say may well be true, I do have reasonably objective observations of its deterioration into making up BS.

I'm convinced they do it on purpose.

◧◩◪◨⬒⬓⬔⧯▣▦
176. vonjui+Gq3[view] [source] [discussion] 2024-02-15 03:48:55
>>stickf+pl3
I mean, the same applies to scams: scams only work if people believe them.
replies(1): >>stickf+q35
◧◩◪◨⬒⬓⬔⧯▣▦▧
177. Increa+eu3[view] [source] [discussion] 2024-02-15 04:28:26
>>lolc+xU2
Generally if a main roadblock is removed, you can get a little farther down the road.
replies(1): >>lolc+oq5
◧◩◪◨⬒
178. 154573+ev3[view] [source] [discussion] 2024-02-15 04:38:46
>>osigur+GM2
Probably the answer to that question is yes. A large number of people born aren't as intelligent as a really good LLM, and unless you're intending to leave them to starve (hi to all the e/acc peeps out there, I hate you!), you need to create a system with inefficiencies so that they don't just litter the street begging and dying.
◧◩◪◨⬒⬓⬔
179. jsjohn+864[view] [source] [discussion] 2024-02-15 11:28:30
>>avion2+Am1
Yep, that's why I said up front "I agree with you about declining quality of answers" - they definitely have declined, based on personal experience with examples similar to yours.
◧◩◪◨⬒⬓⬔
180. jsjohn+074[view] [source] [discussion] 2024-02-15 11:36:28
>>DJHenk+Nc1
> As a software developer understanding and knowing what to expect from a complex system is basically my profession. Not just the systems I build, maintain and integrate, but also the systems I use to get information, like search engines.

Ok, I’m going to call b/s here unless your expectations of Google have gone way down over the years. Google’s results were night-and-day different twenty years ago vs ten years ago vs today. If 2004 Google search was a “10 out of 10”, then in 2014 it was an “8 out of 10”, and today it barely breaks a “5” in quality of results in comparison - and don’t even bother with the advanced query syntax you could’ve used in the 00s; they flat-out ignore it now.

(Also, side note, reread what you said in this post again. Just a friendly note that the overall tone comes across a certain way you might not have intended)

◧◩◪◨⬒⬓⬔⧯▣
181. vonjui+5n4[view] [source] [discussion] 2024-02-15 13:36:23
>>inglor+hR1
First of all no one declared themselves a virtuous pacifist.

People don't participate in murder and they think others shouldn't either.

People don't participate in wars (which are essentially large scale murder) and they think others shouldn't.

Murder happens anyway. War happens anyway.

Yet if someone says 'war bad' people jump and say 'virtue signaling', but no one does that when people say 'murder bad'.

There's some really weird moral entanglement happening in the minds of people who are so eager to call out virtue signaling.

182. drumtt+wB4[view] [source] 2024-02-15 14:59:06
>>skepti+(OP)
Why haven't plugins become more of a "thing"?
◧◩◪◨⬒⬓⬔⧯▣▦▧
183. stickf+q35[view] [source] [discussion] 2024-02-15 17:01:28
>>vonjui+Gq3
Right - as I said, we need to ramp up social skepticism, fast. Not as in some kind of utopian vision, but "the amount of fake information will be moving from a trickle to a flood soon, there's nothing you can do about that, so brace yourselves".

The specific policies of OpenAI or Google or whatnot are irrelevant. The technology is out of the bag.

◧◩◪◨⬒⬓⬔⧯▣▦▧▨
184. lolc+oq5[view] [source] [discussion] 2024-02-15 18:40:26
>>Increa+eu3
Hamas doesn't exist in a vacuum where you can just remove it and then it's gone. You have to offer a life that's better than what Hamas offers.
185. CarlsJ+jr5[view] [source] 2024-02-15 18:43:59
>>skepti+(OP)
Genuinely curious if the news today about Sora has changed your opinion at all https://openai.com/sora
◧◩◪◨
186. huyter+BO5[view] [source] [discussion] 2024-02-15 20:13:13
>>nl+WC
It’s a price issue because 3.5 and 4 response times are about the same for me.
◧◩◪◨
187. dgello+9i6[view] [source] [discussion] 2024-02-15 22:14:08
>>vwkd+YN
I think you’re describing the conversation mode (started via the headphones icon); I also have issues using it. But you can also dictate a message - on iOS it’s the little gray wave icon on the right of the text input. With this mode there is no auto-submission.
◧◩◪◨⬒
188. dgello+li6[view] [source] [discussion] 2024-02-15 22:14:56
>>killth+WQ1
Thanks, I had no idea!
◧◩◪◨
189. dgello+Di6[view] [source] [discussion] 2024-02-15 22:16:08
>>danpal+cO
Not when you’re on your computer, but you can do it on your phone when you’re walking in the street or commuting.

You can easily talk while you’re doing something else.

◧◩◪◨
190. dgello+Cj6[view] [source] [discussion] 2024-02-15 22:20:56
>>clbrmb+nK
The conversational mode is fascinating. But it’s frustrating to use for the same reasons ChatGPT can be annoying: it doesn’t remember previous messages that well, and you end up in weird Alzheimer-ish discussions where the interlocutor speaks perfectly but has the memory of a clownfish.
◧◩◪
191. spacem+KY7[view] [source] [discussion] 2024-02-16 13:01:26
>>clbrmb+qK
With Cursor, you can ask questions based on your codebase
◧◩◪◨⬒
192. antonv+Wng[view] [source] [discussion] 2024-02-19 11:38:10
>>Keyfra+bw1
FYI, I've never encountered what you're describing, whether with GPT-3.5 or 4.

It may be that you're expecting it to do too much at once. Try giving smaller requests.

◧◩◪◨⬒⬓⬔
193. loboci+dpi[view] [source] [discussion] 2024-02-20 00:13:06
>>Anon84+Th1
It's more dangerous if it's uncommon. It's knowledge that protects people, not a bunch of annoying "AI safety" "researchers" selling the lie that "AI is safe". Truth is, those morons only have a job because they help companies save face and create a moat around this new technology, where new competitors will be required to have "AI safety" teams & solutions. What has "AI safety" achieved so far besides making models dumber and more annoying to use?