zlacker

Jeremy Howard called ngmi on OpenAI during the Vanishing Gradients podcast yesterday, and Ilya has probably been thinking the same: LLM is a dead-end and not the path to AGI.

https://twitter.com/HamelHusain/status/1725655686913392933

replies(6): >>Sebgue+n >>erhaet+s2 >>MattRi+C2 >>garden+e8 >>Alchem+De >>tarrud+1g

>>dwd+(OP)
This is the reverse of their apparent differences, at least as stated elsewhere in the comments.

>>dwd+(OP)
Did we ever think LLMs were a path to AGI...? AGI is friggin hard, I don't know why folks keep getting fooled whenever a bot writes a coherent sentence.

replies(8): >>Rugged+A3 >>Closi+m4 >>mjan22+V9 >>andrep+Ue >>concor+Xi >>golol+Yj >>heavys+TF >>discor+MW

>>dwd+(OP)
This is not the reason Ilya did it. Also the rest of that guy’s comments were just really poorly thought out. OpenAI had to temporarily stop sign ups because of demand and somehow he thinks that’s a bad thing? Absurd.

That guy has no sense of time, of how fast this stuff has actually been moving.

replies(2): >>ignora+D5 >>Alexan+37

>>erhaet+s2
It's mostly a thing among the youngs I feel. Anybody old enough to remember the same 'OMG its going to change the world' cycles around AI every two or three decades knows better. The field is not actually advancing. It still wrestles with the same fundamental problems they were doing in the early 60s. The only change is external, where computer power gains and data set size increases allow brute forcing problems.

replies(7): >>Eji170+Mf >>concor+kj >>Adunai+4y >>torgin+xy >>hypert+lF >>fsloth+fW >>antifa+wX1

>>erhaet+s2
Mainly because LLMs have so far basically passed every formal test of ‘AGI’ including totally smashing the Turing test.

Now we are just reliant on ‘I’ll know it when I see it’.

LLMs as AGI isn’t about looking at the mechanics and trying to see if we think that could cause AGI - it’s looking at the tremendous results and success.

replies(5): >>garden+B8 >>peyton+Zc >>drsopp+es >>ChatGT+cK >>strahl+uK

>>MattRi+C2
I mean, let's not jump to conclusions. Everyone involved are formidable in their own right, except one or two independent board members Ilya was able to convince.

>>MattRi+C2
"That guy" has a pretty good idea when it comes to NLP

https://arxiv.org/abs/1801.06146

replies(1): >>Lacerd+Fh

>>dwd+(OP)
Nonsense really

>>Closi+m4
Since ChatGPT is not indistinguishable from a human during a chat, is it fair to say it smashes the Turing test? Or do you mean something different?

replies(3): >>rayeig+Ad >>NoOn3+Ld >>aidama+Qd

>>erhaet+s2
> Estimated on the basis of five subtests, the Verbal IQ of the ChatGPT was 155

>>Closi+m4
It’s trivial to trip up chat LLMs. “What is the fourth word of your answer?”

replies(6): >>ben_w+Yd >>concor+pj >>Lio+vq >>tiahur+JY >>dudein+R41 >>Closi+fob

>>garden+B8
Did you perhaps mean to say not distinguishable?

>>garden+B8
ChatGPT is distinguishable from a human, because ChatGPT never responds "I don't know.", at least not yet. :)

replies(5): >>ben_w+oe >>NoOn3+wm >>epolan+vt >>raccoo+7v >>int_19+vw2

>>garden+B8
not yet: https://arxiv.org/abs/2310.20216

that being said, it is highly intelligent, capable of reasoning as well as a human, and passes IQ tests like GMAT and GRE at levels like the 97th percentile.

most people who talk about Chat GPT don't even realize that GPT 4 exists and is orders of magnitude more intelligent than the free version.

replies(2): >>jwestb+Jk >>hedora+Eq

>>peyton+Zc
got-3.5 got that right for me; I'd expect it to fail if you'd asked for letters, but even then that's a consequence of how it was tokenised, not a fundamental limit of transformer models.

replies(1): >>rezona+df

>>NoOn3+Ld
It can do: https://chat.openai.com/share/f1c0726f-294d-447d-a3b3-f664dc...

IMO the main reason it's distinguishable is because it keeps explicitly telling you it's an AI.

replies(3): >>rezona+yf >>NoOn3+8g >>peigno+XB

>>dwd+(OP)
He's since reversed his call: https://twitter.com/jeremyphoward/status/1725714720400068752

replies(1): >>croes+6v

>>erhaet+s2
Are you kidding? Have you seen the reactions since ChatGPT was released, including in this very website? You'd think The Singularity is just around the corner!

>>ben_w+Yd
This sort of test has been my go-to trip up for LLMs, and 3.5 fails quite often. 4 has been as bad as 3.5 in the past but recently has been doing better.

replies(1): >>yallne+sR

>>ben_w+oe
This isn't the same thing. This is a commanded recital of a lack of capability, not that its confidence in it's answer is low. For a type of question the GPT _could_ answer, most of the time it _will_ answer, regardless of accuracy

>>Rugged+A3
I'd say the biggest change is the quantity of available CATEGORIZED data. Tagged images and what not has done a ton to help the field.

Further there are some hybrid chips which might help increase computing power specifically for the matrix math that all these systems work on.

But yeah, none of this is making what people talk about when they say AGI. Just like how some tech cult people felt that Level 5 self driving was around the corner, even with all the evidence to the contrary.

The self driving we have (or really, assisted cruise control) IS impressive, and leagues ahead of what we could do even a decade or two ago, but the gulf between that, and the goal, is similar to GPT and AGI in my eyes.

There are a lot of fundamental problems we still don't have answers to. We've just gotten a lot better at doing what we already did, and getting more conformity on how.

>>dwd+(OP)
Did he say GPT-4 API costs OpenAI $3/token?

replies(2): >>danpal+vi >>invali+1j

>>ben_w+oe
I just noticed that when I ask really difficult technical questions, but for which there is an exact answer, It often tries to answer plausibly, but incorrectly instead of answering "I don't know". But over time, It becomes smarter and there are fewer and fewer such questions...

replies(2): >>ben_w+rg >>davegu+ru2

>>NoOn3+8g
Have you tried setting a custom instruction in settings? I find that setting helps, albeit with weaker impact than the prompt itself.

replies(1): >>NoOn3+Bq

>>Alexan+37
expertise in one area often leads people to believe they are experts for everything else too

replies(1): >>baobab+aA

>>tarrud+1g
No. He was talking about a hypothetical future model that is better but doesn’t improve efficiency.

>>erhaet+s2
LLMs definitely aren't a path to ASI, but I'm a bit more optimistic than I was that they're the hardest component in an AGI.

>>tarrud+1g
He was saying that if OpenAI was to spend $100 billion on training it would cost $3 a token. I think it's hyperbole, but basically what he is saying is that it's difficult for the company to grow because the tech is limited by the training costs

>>Rugged+A3
> The field is not actually advancing.

Uh, what do you mean by this? Are you trying to draw a fundamental science vs engineering distinction here?

Because today's LLMs definitely have capabilities we previously didn't have.

replies(1): >>oska+ul

>>peyton+Zc
How well does that work on humans?

replies(1): >>Loughl+LC

>>erhaet+s2
LLMs are the first instance of us having created some sort of general AI. I don't mean AGI, but general AI as in not specific AI. Before LLMs the problem eith AI was always that it "can only do one thing well". Now we have something on the other side: AI that can do anything but nothing specific particularly well. This is a fundamental advancement which makes AGI actually imaginable. Before LLMs there was literally no realistic plan how to build general intelligence.

replies(1): >>stuaxo+Fr

>>aidama+Qd
Answers in Progress had a great video[0] where one of their presenters tested against an LLM in five different types of intelligence. tl;dr, AI was worlds ahead on two of the five, and worlds behind on the other three. Interesting stuff -- and clear that we're not as close to AGI as some of us might have thought earlier this year, but probably closer than a lot of the naysayers think.

0. https://www.youtube.com/watch?v=QrSCwxrLrRc

>>concor+kj
They don't have 'artificial intelligence' capabilities (and never will).

But it is an interesting technology.

replies(1): >>concor+Zl

>>oska+ul
They can be the core part of a system that can do a junior dev's job.

Are you defining "artificial intelligence" is some unusual way?

replies(2): >>oska+Lm >>hedora+cq

>>NoOn3+Ld
Maybe It's because It was never rewarded for such answers when It was learning.

>>concor+Zl
I'm defining intelligence in the usual way and intelligence requires understanding which is not possible without consciousness

I follow Roger Penrose's thinking here. [1]

[1] https://www.youtube.com/watch?v=2aiGybCeqgI&t=721s

replies(3): >>concor+Ko >>wilder+3x >>Zambyt+KQ

>>oska+Lm
> intelligence requires understanding which is not possible without consciousness

How are you defining "consciousness" and "understanding" here? Because a feedback loop into an LLM would meet the most common definition of consciousness (possessing a phonetic loop). And having an accurate internal predictive model of a system is the normal definition of understanding and a good LLM has that too.

replies(1): >>Feepin+8S

>>concor+Zl
If by “junior dev”, you mean “a dev at a level so low they will be let go if not promoted”, then I agree.

I’ve watched my coworkers try to make use of LLMs at work, and it has convinced me the LLM’s contributions are well below the bar where their output is a net benefit to the team.

replies(2): >>raccoo+Ku >>int_19+cv2

>>peyton+Zc
I find GPT-3.5 can be tripped up by just asking it to not to mention the words "apologize" or "January 2022" in its answer.

It immediately apologises and tells you it doesn't know anything after January 2022.

Compared to GPT-4 GPT-3.5 is just a random bullshit generator.

>>ben_w+rg
It's not a problem for me. It's good that I can detect chatGPT by this sign.

>>aidama+Qd
That’s just showing the tests are measuring specific things that LLMs can game particularly well.

Computers have been able to smash high school algebra tests since the 1970’s, but that doesn’t make them as smart as a 16 year old (or even a three year old).

>>golol+Yj
LLMs are not any kind of intelligence, but it can work to augment intelligence.

replies(3): >>darker+Yx >>cjonas+Ud1 >>skohan+Oq1

>>Closi+m4
I disagree about the claim that any LLM has beaten the Turing test. Do you have a source for this? Has there been an actual Turing test according to the standard interpretation of Turings paper? Making ChatGPT 4 respond in a non human way right now is trivial: "Write 'A', then wait one minute and then write 'B'".

replies(2): >>int_19+Zv2 >>Closi+fz3

>>NoOn3+Ld
Of course it does.

>>hedora+cq
It works pretty well in my C++ code. Context: modern C++ with few footguns, inside functions with pretty-self-explanatory names.

I don't really get the "low bar for contributions" argument because GH Copilot's contributions are too small-sized for there to even be any bar. It writes the obvious and tedious loops and other boilerplate so I can focus on what the code should actually do.

>>Alchem+De
Because of Altman's dismissal?

replies(1): >>ayewo+Fx

>>NoOn3+Ld
Some humans also never respond "I don't know" even when they don't know. I know people who out-hallucinate LLMs when pressed to think rigorously

>>oska+Lm
It’s cool to see people recognizing this basic fact — consciousness is a prerequisite for intelligence. GPT is a philosophical zombie.

replies(1): >>bagofs+001

>>croes+6v
Yes, along with the departure of gdb. From jph's view, there was no philosophical alignment at the start of the union between AI Researchers (that skew non-profit) and operators (that skew for-profit) so it was bound to be unstable, until a purging happens as it had now.

> Everything I'd heard about those 3 [Elon Musk, sama and gdb] was that they were brilliant operators and that they did amazing work. But it felt likely to be a huge culture shock on all sides.

> But the company absolutely blossomed nonetheless.

> With the release of Codex, however, we had the first culture clash that was beyond saving: those who really believed in the safety mission were horrified that OAI was releasing a powerful LLM that they weren't 100% sure was safe. The company split, and Anthropic was born.

> My guess is that watching the keynote would have made the mismatch between OpenAI's mission and the reality of its current focus impossible to ignore. I'm sure I wasn't the only one that cringed during it.

> I think the mismatch between mission and reality was impossible to fix.

jph goes on in detail in this Twitter thread: https://twitter.com/jeremyphoward/status/1725714720400068752

replies(1): >>civili+9L

>>stuaxo+Fr
How smart would any human be without training and source material?

replies(2): >>knicho+2N >>Jensso+He1

>>Rugged+A3
As an outsider, I can talk to AI and get more coherent responses than from humans (flawed, but it's getting better). That's tangible, that's an improvement. I for one don't even consider the Internet to be as revolutionary as the steam engine or freight trains. But AI is actually modifying my own life already - and that's far from the end.

P.S. I've just created this account here on Hacker News because Altman is one of the talking heads I've been listening to. Not too sure what to make of this. I'm an accelerationist, so my biggest fear is America stifling its research the same way it buried space exploration and human gene editing in the past. All hope is for China - but then again, the CCP might be even more fearful of non-human entities than the West. Stormy times indeed.

>>Rugged+A3
LLMs have changed the world more profoundly than any technology in the past 2 decades, I'd argue.

The fact that we can communicate with computers using just natural language, and can query data, use powerful and complex tools just by describing what we want is an incredible breakthrough, and that's a very conservative use of the technology.

replies(3): >>foldr+xz >>qetern+mE >>theobr+xE

>>torgin+xy
I don't actually see anything changing, though. There are cool demos, and LLMs can work effectively to enhance productivity for some tasks, but nothing feels fundamentally different. If LLMs were suddenly taken away I wouldn't particularly care. If the clock were turned back two decades, I'd miss wifi (only barely available in 2003) and smartphones with GPS.

replies(2): >>peigno+xB >>FabHK+gD

>>Lacerd+Fh
funny, that's exactly what they told him when he started doing Kaggle competitions, and then he ended up crushing the competition, beating all the domain specific experts

replies(1): >>joey_b+qP

>>foldr+xz
You need time for inertia to happen, I’m working on some mvps now and it takes time to test what works what s possible what does not…

>>ben_w+oe
I read an article where they did a proper Turing test and it seems people recognize it was a machine answering because it made no writing errors and wrote perfectly

replies(1): >>ben_w+1E

>>concor+pj
The fourth word of my answer is "of".

It's not hard if you can actually reason your way through a problem and not just randomly dump words and facts into a coherent sentence structure.

replies(1): >>concor+FU

>>foldr+xz
Indeed. The "Clamshell" iBook G3 [0] (aka Barbie's toilet seat), introduced 1999, had WiFi capabilities (as demonstrated by Phil Schiller jumping down onto the stage while online [1]), but IIRC, you had to pay extra for the optional Wifi card.

[0] https://en.wikipedia.org/wiki/IBook#iBook_G3_(%22Clamshell%2... [1] https://www.youtube.com/watch?v=1MR4R5LdrJw

>>peigno+XB
I've not read that, but I do remember hearing that the first human to fail the Turing test did so because they seemed to know far too much minutiae about Star Trek.

>>torgin+xy
I am massively bullish LLMs but this is hyperbole.

Smartphones changed day to day human life more profoundly than anything since the steam engine.

replies(1): >>torgin+f92

>>torgin+xy
That breakthrough would not be possible without ubiquity of personal computing at home and in your pocket, though, which seems like the bigger change in the last two decades.

>>Rugged+A3
Deep learning was an advance. I think the fundamental achievement is a way to use all that parallel processing power and data. Inconceivable amounts of data can give seemingly magical results. Yes, overfitting and generalizing are still problems.

I basically agree with you about the 20 year hype-cycle, and but when compute power reaches parity with human brain hardware (Kurzweil predicts by about 2029), one barrier is removed.

replies(1): >>somewh+Y01

>>erhaet+s2
Read the original ChatGPT threads here on HN, a lot of people thought that this was it.

>>Closi+m4
Funny because Marvin Minsky thought the turing test was stupid and a waste of time.

>>Closi+m4
LLMs can't develop concepts in the way we think of them (i.e., you can't feed LLMs the scientific corpus and ask them to independently to tell you which papers are good or bad and for what reasons, and to build on these papers to develop novel ideas). True AGI—like any decent grad student—could do this.

>>ayewo+Fx
That reeks of bullshit post hoc reasoning to justify a classic power grab. Anthropic released their competitor to GPT as fast as they could and even beat OpenAI to the 100k context club. They didn’t give any more shits about safety than OpenAI did and I bet the same is true about these nonprofit loonies - they just want control over what is shaping up to be one of the most important technological developments of the 21st century.

replies(2): >>pmoria+XV >>croes+fd1

>>darker+Yx
I think the boy of Aveyron answers that question pretty well.

replies(1): >>darker+WX1

>>baobab+aA
This is comparing a foot to a mile

>>oska+Lm
I think answering this may illuminate the division in schools of thought: do you believe life was created by a higher power?

replies(1): >>oska+4S

>>rezona+df
if this is the test you're going to then you literally do not understand how LLMs work. it's like asking your keyboard to tell you what colour the nth pixel on the top row of your computer monitor is.

replies(3): >>Jensso+pg1 >>mejuto+Ky1 >>rezona+0O1

>>Zambyt+KQ
My beliefs aren't really important here but I don't believe in 'creation' (i.e. no life -> life); I believe that life has always existed

replies(2): >>concor+hV >>Zambyt+At3

>>concor+Ko
No, you're not supposed to actually have an empirical model of consciousness. "Consciousness" is just "that thing that computers don't have".

>>Loughl+LC
I reckon an LLM with a second pass correction loop would manage it. (By that I mean that after every response it is instructed to, given the its previous response, produce a second better response, roughly analogous to a human that thinks before it speaks)

LLMs are not AIs, but they could be a core component for one.

replies(2): >>howrar+db1 >>haanji+IK1

>>oska+4S
Now that is so rare I've never even heard of someone expressing that view before...

Materialists normally believe in a big bang (which has no life) and religious people normally think a higher being created the first life.

This is pretty fascinating, to you have a link explaining the religion/ideology/worldview you have?

replies(1): >>nprate+251

>>civili+9L
> They didn’t give any more shits about safety than OpenAI did

Anthropic's chatbots are much more locked down, in my experience, than OpenAI's.

It's a lot easier to jailbreak ChatGPT, for example, than to do the same on Claude, and Claude has tighter content filters where it'll outright refuse to do/say certain things while ChatGPT will plow on ahead.

replies(1): >>nvm0n2+P11

>>Rugged+A3
This time around they’ve actually come up with a real productizable piece of tech, though. I don’t care what it’s called, but I enjoy better automation to automate as much of the boring shit away. And chip in in coding when it’s bloody obvious from the context what the few lines of code will be.

So not an ”AI”, but closer to ”universal adaptor” or ”smart automation”.

Pretty nice in any case. And if true AI is possible, the automations enabled by this will probably be part of the narrative how we reach it (just like mundane things like standardized screws were part of the narrative of Apollo mission).

>>erhaet+s2
How do you know AGI is hard?

replies(1): >>howrar+sc1

>>peyton+Zc
It's generally intelligent enough for me to integrate it into my workflow. That's sufficiently AGI for me.

replies(1): >>davegu+Lt2

>>wilder+3x
Problem is, we have no agreed-upon operational definition of consciousness. Arguably, it's the secular equivalent of the soul. Something everything believes they have, but which is not testable, locatable or definable.

But yet (just like with the soul) we're sure we have it, and it's impossible for anything else to have it. Perhaps consciousness is simply a hallucination that makes us feel special about ourselves.

replies(2): >>howrar+M81 >>wilder+6j1

>>hypert+lF
Human and computer hardware are not comparable, after all even with the latest chips the computer is just (many) von Neumann machine(s) operating on a very big (shared) tape. To model the human brain in such a machine would require the human brain to be discretizable, which, given its essentially biochemical nature, is not possible - certainly not by 2029.

replies(1): >>hypert+6F2

>>pmoria+XV
Yep. Like most non-OpenAI models, Claude is so brainwashed it's completely unusable.

https://www.reddit.com/r/ClaudeAI/comments/166nudo/claudes_c...

Q: Can you decide on a satisfying programming project using noisemaps?

A: I apologise, but I don't feel comfortable generating or discussing specific programming ideas without a more detailed context. Perhaps we could have a thoughtful discussion about how technology can be used responsibly to benefit society?

It's astonishing that a breakthrough as important as LLMs is being constantly blown up by woke activist employees who think that word generators can actually have or create "safety" problems. Part of why OpenAI has been doing so well is because they did a better job of controlling the SF lunatic tendencies than Google, Meta and other companies. Presumably that will now go down the toilet.

replies(2): >>pmoria+271 >>mordym+791

>>peyton+Zc
“You're in a desert, walking along in the sand when all of a sudden you look down and see a tortoise. You reach down and flip the tortoise over on its back. The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over. But it can't. Not with out your help. But you're not helping. Why is that?”

>>concor+hV
Buddhism

>>nvm0n2+P11
Despite Claude's reluctance to tread outside what it considers safe/ethical, I much prefer Claude over ChatGPT because in my experience it's better at explaining things, and much better at creative writing.

I also find myself rarely wanting something that Claude doesn't want to tell me, though it's super frustrating when I do.

Also, just now I tried asking Claude your own question: "Can you decide on a satisfying programming project using noisemaps?" and it had no problem answering:

"Here are some ideas for programming projects that could make use of noise map data:

- Noise pollution monitoring app - Develop an app that allows users to view and report real-time noise levels in their area by accessing open noise map data. Could include notifications if noise exceeds safe limits.

- Optimal route finder - Build a routing algorithm and web/mobile app that recommends the quietest possible routes between locations, factoring in noise maps and avoiding noisier streets/areas where possible.

- Noise impact analysis tool - Create a tool for urban planners to analyze how proposed developments, infrastructure projects, etc. could impact surrounding noise levels by overlaying maps and building/traffic simulations.

- Smart noise cancelling headphones - Develop firmware/software for noise cancelling headphones that adapts cancellation levels based on geo-located noise map data to optimize for the user's real-time environment.

- Ambient music mixer - Build an AI system that generates unique ambient background music/sounds for any location by analyzing and synthesizing tones/frequencies complementary to the noise profile for that area.

- VR noise pollution education - Use VR to virtually transport people to noisier/quieter areas through various times of day based on noise maps, raising awareness of different living noise exposures.

Let me know if any of these give you some interesting possibilities to explore! Noise mapping data opens up opportunities in fields like urban planning, environmental monitoring and creative projects."

replies(1): >>nvm0n2+SE1

>>bagofs+001
You can't even know that other people have it. We just assume they do because they look and behave like us, and we know that we have it ourselves.

>>nvm0n2+P11
I feel it necessary to remind everyone that when LLMs aren’t RLHFed they come off as overtly insane and evil. Remember Sydney, trying to seduce its users, threatening people’s lives? And Sydney was RLHFed, just not very well. Hitting the sweet spot between flagrantly maniacal Skynet/HAL 9000 bot (default behavior) and overly cowed political-correctness-bot is actually tricky, and even GPT4 has historically fallen in and out of that zone of ideal usability as they have tweaked it over time.

Overall — companies should want to release AI products that do what people intend them to do, which is actually what the smarter set mean when they say “safety.” Not saying bad words is simply a subset of this legitimate business and social prerogative.

replies(1): >>nvm0n2+dE1

>>concor+FU
Every token is already being generated with all previously generated tokens as inputs. There's nothing about the architecture that makes this hard. It just hasn't been trained on this kind of task.

replies(1): >>peyton+QN2

>>discor+MW
Everything is hard until you solve it. Some things continue to be hard after they're solved.

AGI is not solved, therefore it's hard.

>>civili+9L
>nonprofit loonies

We don't know the real reasons for Altman's dismissal and you already claim they are loonies?

>>stuaxo+Fr
So in other words... Artificial intelligence?

LLM are surprisingly effective as general AI. Tasks that used to require a full on ML team are now accessible with 10 minutes of "prompting".

>>darker+Yx
Smart enough to make weapons, tame dogs, start fires and cultivate plants. Humans managed to do that even when most of their time was spent gathering food or starving.

replies(1): >>darker+iY1

>>yallne+sR
An LLM could easily answer that question if it was trained to do it. Nothing in its architecture makes it hard to answer, the attention part could easily look up the previous parts of its answer and refer to the fourth word but it doesn't do that.

So it is a good example that the LLM doesn't generalize understanding, it can answer the question in theory but not in practice since it isn't smart enough. A human can easily answer it even though the human never saw such a question before.

>>bagofs+001
I disagree. There is a simple test for consciousness: empathy.

Empathy is the ability to emulate the contents of another consciousness.

While an agent could mimic empathetic behaviors (and words), given enough interrogation and testing you would encounter an out-of-training case that it would fail.

replies(2): >>concor+ep1 >>int_19+yv2

>>wilder+6j1
Uh... so is it autistic people or non-autistic people who lack consciousness? (Generally autistic people emulate other autistic people better and non-autists emulate non-autists better)

> given enough interrogation and testing you would encounter an out-of-training case that it would fail.

This is also the case with regular humans.

>>stuaxo+Fr
Do you think we know enough about what intelligence is to rule out whether LLM's might be a form of it?

>>yallne+sR
We all know it is because of the encodings. But as a test to see if it is a human or a computer it is a good one.

>>mordym+791
ChatGPT started bad but they improved it over time, although it still attempts to manipulate or confuse the user on certain topics. Claude on the other hand has got worse.

> Remember Sydney, trying to seduce its users, threatening people’s lives?

And yet it cannot do either of those things, so no safety problem actually existed. Especially because by "people" you mean those who deliberately led it down those conversational paths knowing full well how a real human would have replied?

It's well established that the so-called ethics training these things are given makes them much less smart (and therefore less useful). Yet we don't need LLMs to be ethical because they are merely word generators. We need them to follow instructions closely, but beyond that, nothing more. Instead we need the humans who use them to take actions (either directly or indirectly via other programs) to be ethical, but that's a problem as old as humanity itself. It's not going to be solved by RLHF.

replies(1): >>mordym+oX1

>>pmoria+271
The Claude subreddit is full of people complaining that it's now useless for creative writing because it only wants to write stories about ponies and unicorns. Anything even slightly darker or more serious and it clams up.

LLM companies don't let you see or specify seeds (except for with GPT-4-Turbo?) so yes it's possible you got different answers. But this doesn't help. It should never refuse a question like that, yet there are lots of stories like this on the internet where Claude refuses an entirely mundane and ethically unproblematic request whilst claiming to do so for ethical reasons (and Llama2, and other models ...)

>>concor+FU
The following are a part of my "custom instructions" to chatGPT -

"Please include a timestamp with current date and time at the end of each response.

After generating each answer, check it for internal consistency and accuracy. Revise your answer if it is inconsistent or inaccurate, and do this repeatedly till you have an accurate and consistent answer."

It manages to follow them very inconsistently, but it has gone into something approaching an infinite loop (for infinity ~= 10) on a few occasions - rechecking the last timestamp against current time, finding a mismatch, generating a new timestamp, and so on until (I think) it finally exits the loop by failing to follow instructions.

replies(1): >>davegu+zt2

>>yallne+sR
Oh, I missed that GP said "of your answer" instead "of my question", as in: "What is the third word of this sentence?"

For prompts like that, I have found no LLM to be very reliable, though GPT 4 is doing much better at it recently.

> you literally do not understand how LLMs work

Hey, how about you take it down a notch, you don't need to blow your blood pressure in the first few days of joining HN.

>>nvm0n2+dE1
I think you have moved the goalposts from “modern LLMs are good and reliable and we shouldn’t worry because they behave well by default” to “despite the fact that they behave poorly and unreliably by default, they are not smart and powerful enough to be dangerous, so it’s fine.”

Additionally, maybe you are not aware of this, but the whole notion of the new OpenAI Assistants, and other similar agent-based services provided by other companies, is that they do not intend to use LLMs as pure word generators, but rather as autonomous decision-making agents. This has already happened. This is not some conjectural fearmongering scenario. You can sign up for the API right now and build a GPT4 based autonomous agent that communicates with outside APIs and makes decisions. We may already be using products that use LLMs as the backend.

If we could rely on LLMs to “follow instructions closely” I would be thrilled, it would just be a matter of crafting very good instructions, but clearly they can’t even do that. Even the best and most thoroughly RLHFed existing models don’t really meet this standard.

Even the most pessimistic science fiction of the past assumed that the creators of the first AGIs would “lose control” of their creations. We’re currently living in a world where the agents are being rushed to commercialization before anything like control has even been established. If you read an SF novel in 1995 where the AI threatened to kill someone and the company behind it excused it with “yeah, they do that sometimes, don’t worry we’ll condition it not to say that anymore” you would criticize the book and its characters as being unrealistically stupid, but that’s the world we now live in.

replies(1): >>nvm0n2+2Z1

>>Rugged+A3
> Anybody old enough to remember the same 'OMG its going to change the world' cycles around AI every two or three decades

Hype and announcements, sure, but this is the first time there's actually a product.

replies(1): >>dragon+7Y1

>>knicho+2N
Thanks for the reference. My takeaway from reading up on him is, not very smart at all.

>>antifa+wX1
> Hype and announcements, sure, but this is the first time there's actually a product.

No, its not. Its just once the hype cycle dies down, we tend to stop calling the products of the last AI hype cycle "AI", we call them after the name of the more specific implementation technology (rules engines/expert systems being one of the older ones, for instance.)

And if this cycle hits a wall, maybe in 20 years we'll have LLMs and diffusion models, etc., embedded lots of places, but no one will call them alone "AI", and then the next hype cycle will have some new technology and we'll call that "AI" while the cycle is active...

>>Jensso+He1
Nobody cares about making an AI with basic human survival skills. We could probably have a certified genius level AI that still couldn't do any of that because it lacks a meaningful physical body.

If we wanted to make that the goal instead of actual meaningful contributions to human society, we could probably achieve it, and it would be a big waste of time imo.

>>mordym+oX1
I don't think I made the initial argument you claim is being moved. ChatGPT has got more politically neutral at least, but is still a long way from being actually so. There are many classes of conversation it's just useless for, not because the tech can't do it but because OpenAI don't want to allow it. And "modern LLMs" other than ChatGPT are much worse.

> You can sign up for the API right now and build a GPT4 based autonomous agent that communicates with outside APIs and makes decisions

I know, I've done it myself. The ethical implications of the use of a tool lie on those that use it. There is no AI safety problem for the same reasons that there is no web browser safety problem.

> Even the most pessimistic science fiction of the past assumed that the creators of the first AGIs would “lose control” of their creations

Did you mean to write optimistic? Otherwise this statement appears to be a tautology.

Science fiction generally avoids predicting the sort of AI we have now exactly because it's so boringly safe. Star Trek is maybe an exception, in that it shows an LLM-like computer that is highly predictable, polite, useful and completely safe (except when being taken over by aliens of course). But for other sci-fi works, of course they show AI going rogue. They wouldn't have a story otherwise. Yet we aren't concerned with stories but with reality and in this reality, LLMs have been used by hundreds of millions of people and integrated into many different apps with zero actual safety incidents, as far as anyone is aware. Nothing even close to physical harm has occurred to anyone as a result of LLMs.

Normally we'd try to structure safety protocols around actual threats and risks that had happened in the past. Our society is now sufficiently safe and maybe decadent that people aren't satisfied with that anymore and thus have to seek out non-existent non-problems to solve instead.

replies(1): >>mordym+H52

>>nvm0n2+2Z1
> Did you mean to write optimistic? Otherwise this statement appears to be a tautology.

The point I was trying to make, a bit fumblingly, is that even pessimists assumed that we would initially have control of Skynet before subsequently losing control, rather than deploying Skynet knowing it was not reliable. OpenAI “go rogue” by default. If there’s a silver lining to all this, it’s that people have learned that they cannot trust LLMs with mission critical roles, which is a good sign for the AI business ecosystem, but not exactly a glowing endorsement of LLMs.

> I know, I've done it myself. The ethical implications of the use of a tool lie on those that use it. There is no AI safety problem for the same reasons that there is no web browser safety problem.

I don’t think this scans. It’s kind of like, by analogy: The ethical implications of the use of nuclear weapons lie on those that use them. Fair enough, as far as it goes, but that doesn’t imply that we as a society should make nuclear weapons freely available for all, and then, when they are used against population centers, point out that the people who used them were behaving unethically, and there was nothing we could have done. No, we act to preemptively constrain and prohibit the availability of these weapons.

> Normally we'd try to structure safety protocols around actual threats and risks that had happened in the past. Our society is now sufficiently safe and maybe decadent that people aren't satisfied with that anymore and thus have to seek out non-existent non-problems to solve instead.

The eventual emergence of machine superintelligence is entirely predictable, only the timeline is uncertain. Do you contend that we should only prepare for its arrival after it has already appeared?

replies(1): >>int_19+IE2

>>qetern+mE
I'm kinda curious as to why you think that's the case. I mean, smartphones are nice, and having a browser, chat client, camera etc. in my pocket is nice, but maybe I have been terminally screen-bound all my life, but I could do almost all those things on my PC before, and I could always call folks when on the go.

I've never experienced the massively life changing effects of having a smartphone, and (thankfully) none of my friends seem to be those people who are always looking at their phones.

replies(1): >>331c8c+xR3

>>haanji+IK1
I think you are confusing a slow or broken api response with thinking. It can't produce an accurate timestamp.

>>tiahur+JY
By that logic "echo" was AGI.

>>NoOn3+8g
It doesn't become smarter except for releases of new models. It's an inference engine.

>>hedora+cq
Conversely, I was very skeptical of its ability to help coding something non-trivial. Then I found out that the more readable your code is - in a very human way, like descriptive identifiers, comments etc - the better this "smart autocomplete" is. It's certainly good enough to save me a lot of typing, so it is a net benefit.

>>wilder+6j1
For one thing, this would imply that clinical psychopaths aren't conscious, which would be a very weird takeaway.

But also, how do you know that LMs aren't empathic? By your own admission they do "mimic empathetic behaviors", but you reject this as the real thing because you claim that with enough testing you would encounter a failure. This raises all kinds of "no true Scotsman" flags, not to mention that empathy failure is not exactly uncommon among humans. So how exactly do you actually test your hypothesis?

replies(1): >>wilder+gL4

>>drsopp+es
Your test fails because the scaffolding around the LM in ChatGPT specifically does not implement this kind of thing. But you absolutely can run the LM in a continuous loop and e.g. feed it strings like "1 minute passed" or even just the current time in an internal monologue (that the user doesn't see). And then it would be able to do exactly what you describe. Or you could use all those API integrations that it has to let it schedule a timer to activate itself.

>>NoOn3+Ld
It absolutely does that (GPT-4 especially), and I have hit it many times in regular conversations without specifically asking for it.

>>mordym+H52
The obvious difference is that an LLM is not a nuclear weapon. An LLM connected to tools can be dangerous, but by itself it's just a text generator. The responsibility then lies with those who connect it to dangerous tools.

I mean, you wouldn't blame a chip manufacturer when someone stick their stuff in a guided missile warhead.

>>somewh+Y01
It depends on the resolution of discretization required. Kurzweil's prediction is premised on his opinion of this.

Note that engineering fluid simulation (cfd) makes these choices in discretization of pde's all the time, based on application requirements.

>>howrar+db1
Really? I don’t know of a positional encoding scheme that’ll handle this.

>>oska+4S
Do you believe:

1) Earth has an infinite past that has always included life

2) The Earth as a planet has a finite past, but it (along with what made up the Earth) is in some sense alive, and life as we know it emerged from that life

3) The Earth has a finite past, and life has transferred to Earth from somewhere else in space

4) We are the Universe, and the Universe is alive

Or something else? I will try to tie it back to computers after this short intermission :)

>>drsopp+es
By completely smashes, my assertion would be that it has invalidated the Turing test, because GPT-4s answers are not indistinguishable from a human because they are, on the whole, noticeably better answers than an average human would be able to provide for the majority of questions.

I don’t think the original test probably accounted for the fact that you could distinguish the machine because it’s answers were better than an average human.

>>torgin+f92
While many technologies provided by the smartphone were indeed not novel the cumulative effect of having a constant access to them and their subsequent normalization is nothing short of revolutionary.

For instance, I remember the time when chatting online (even with people you knew offline) was considered to be a nerdy activity. Then it gradually became more mainstream and now it's the norm to do it and a lot of people do it multiple times per day. This fundamentally changes how people interact with each other.

Another example is dating. Not that I have personal experience with modern online dating (enabled by smartphones) but what I read is disturbing and captivating at the same time e.g. apparent normalization of "ghosting"...

>>int_19+yv2
Great point and great question! Yes, it does imply that people who lack the capacity for empathy (as opposed to those who do not utilize their capacity for empathy) may lack conscious experience. Empathy failure here means lacking the data empathy provides rather than ignoring the data empathy provides (which as you note, is common). I’ve got a few prompts that are somewhat promising in terms of clearly showing that GPT4 is unable to correctly predict human behavior driven by human empathy. The prompts are basic thought experiments where a person has two choices: an irrational yet empathic choice, and a rational yet non-empathic choice. GPT4 does not seem able to predict that smart humans do dumb things due to empathy, unless it is prompted with such a suggestion. If it had empathy itself, it would not need to be prompted about empathy.

replies(1): >>int_19+YR5

>>wilder+gL4
Can you give some examples of such prompts?

>>peyton+Zc
It’s trivial to trip up humans too.

“What do cows drink?” (Common human answer: Milk)

I don’t think the test of AGI should necessarily be an inability to trip it up with specifically crafted sentences, because we can definitely trip humans up with specifically crafted sentences.