zlacker

Stop doing self-correction within the context of the model's own generation.

The previous paper on self correction told the model "you previously said X - are there errors with this?"

This one has the mistakes statically added to the prompt in a task prompt and response without additional context immediately before asking if it has any errors.

Think about the training data.

How often does the training data of most of the Internet reflect users identifying issues with their own output?

How often does the training data reflect users identifying issues with someone else's output?

Try doing self-correction by setting up the context of "this was someone else's answer". It is still technically self-correction if a model is reviewing its own output in that context - it just isn't set up as "correct your own answer."

This may even be part of why the classifier did a better job at identifying issues - less the fine tuning and more the context (unfortunately I don't see the training/prompts for the classifier in their GitHub repo).

It really seems like the aversion to anthropomorphizing LLMs is leading people to ignore or overlook relevant patterns in the highly anthropomorphic training data fed into them. We might not want to entertain that a LLM has a concept of self vs other or a bias between critiques based on such a differentiation, and yet the training data almost certainly reflects such a concept and bias.

I'd strongly encourage future work on self-correction to explicitly define the thing being evaluated as the work of another. (Or ideally even compare self-correction rates between critiques in the context of their own output vs another's output.)

replies(8): >>andai+L1 >>famous+w6 >>4death+Ua >>Poigna+1c >>bongod+Gp >>NoToP+l11 >>seanhu+z51 >>cyanyd+z34

>>kromem+(OP)
That's hilarious. Does this imply LLMs inherited the human tendency to get attached to a perspective despite evidence to the contrary? I'll often try to coax the right answer out of GPT-3 when I know it's wrong, and it'll often insist that it's right several times in a row.

replies(3): >>OmarSh+i2 >>mkl+Lb1 >>jibal+0d1

>>andai+L1
I think it does indeed suggest this, but I think this may be good news.

Part of what makes humans able to make progress in difficult, vague, and uncertain fields is a willingness to hold onto a point of view in the face of criticism to try & fix itl. This is, as a matter of fact, how science progresses, depending on if you ask scientists or historians of science. See Thomas Kuhn's Structure of Scientific Revolutions for more on this.

replies(1): >>jibal+pd1

>>kromem+(OP)
>It really seems like the aversion to anthropomorphizing LLMs is leading people to ignore or overlook relevant patterns in the highly anthropomorphic training data fed into them.

This exactly. Not anthropomizing when anthropomization is producing better predictive models of what to expect in output is not smart, it's just silly.

>>kromem+(OP)
I don't agree about your point regarding training data. The internet is infamous for pedants who will correct even the smallest factual or logical errors. Take this comment for instance... It seems like the training set would be filled with proposition X, followed by a corrective assertion Y.

replies(2): >>xp84+gc >>jibal+kz3

>>kromem+(OP)
> Think about the training data.

> How often does the training data of most of the Internet reflect users identifying issues with their own output?

> How often does the training data reflect users identifying issues with someone else's output?

I wouldn't put too much weight into just-so theories like this.

We still don't understand too much about how LLMs process information internally; it could be that their understanding of the concept of "correcting a previous mistake" is good enough that they can access it without prompt engineering to mimic the way it happens in training data. Or maybe not (after all, there's an entire management concept called "pre-mortems" which is basically doing what you suggest, as a human).

replies(3): >>kromem+Te >>galaxy+nM >>jibal+8e1

>>4death+Ua
I think you're agreeing with GP.

That's the point: The internet IS full of pedants correcting others' statements. (Hopefully those pedants are right enough of the time for this to be helpful training data, heh.)

I think GP (kromem) was pointing out that those corrections are more likely to be phrased as "You're wrong, here's why..." than as "I'm sorry, I was mistaken" because humans are full of sass for other humans and not as full of first-person admitted errors.

replies(2): >>kromem+0d >>galaxy+8N

>>xp84+gc
Exactly.

replies(1): >>4death+V01

>>Poigna+1c
This depends less on the internals vs the patterns in the training data.

Even if the model has the capacity to abstract beyond the patterns, the patterns are still very likely to have influence on its ability to do so.

For example, early after GPT-4 was released it was being claimed it couldn't solve variations on the goat, wolf, and cabbage problem.

I found that it could solve these variations fine 100% of the time, you just needed to explicitly prompt for it to repeat adjectives with nouns and change the nouns to emojis. The repeating worked similar to CoT by biasing the generation towards the variation and away from the original form, and the emojis in place of the nouns further broke the token associations which was leading it to fail by extending the original solution.

So while it's possible that with enough finessing you could get a model to perform self-critique as well as its critique of others, if the training data has a clear pattern of bias between those two, why actively ignore it?

It's a bit like sanding against the grain vs with it. You can sand against the grain of the training data and with enough effort potentially get the result you want with sophisticated enough models. But maybe your life will be a lot easier if you identify the grain in the data first and sand along with it instead?

replies(1): >>Poigna+h86

>>kromem+(OP)
I see lots of people trying to prompt with incomplete sentences, not capitalizing, using slang, bad grammar, imprecise terminology etc. And it still works. However, I find that you get a noticable a quality boost if you use proper English and treat it more like a human.

"i want a python app that calculates a roadtrip for me"

vs

"Please write me a Python program using a map API that measures the distance between two locations as a car would drive. Think carefully about the program architecture and be sure to use a human readable Pythonic style. Please show me the complete program in it's entirety."

The former game me a high level overview with a ton of explanation and didn't write any code. You can try to walk it through the process of all the steps it needs, but it will write "confused", albeit working, code after a few prompts. The latter just wrote working code on the first response. Moving forward, the context is just so more concise and correct that everything after will be of much higher quality.

I rarely go past 5-10 responses due to what I'd call "context poisoning". If it makes a simple syntax error or something small, I'll shoot it the error and let it correct itself. But as soon as it invents a function or otherwise hallucinates, it gets copy pasted into a new prompt saying "here's some bad code, fix this" and it is far more likely to come up with an elegant solution rather that rewriting everything or making huge changes to solve a one off error or something it's previous context was preventing it from grasping.

What you're saying is almost the meta of using good grammer and context, and I completely agree.

replies(8): >>galaxy+6D >>lupire+KI >>CtrlAl+LW >>kromem+V31 >>averev+Kj1 >>miven+Mp1 >>ameliu+ww1 >>jiggaw+Fz1

>>bongod+Gp
If your prompt-input is high quality it is more likely to match high-quality training inputs

replies(1): >>eurode+Zc1

>>bongod+Gp
Your example confounds many variables.

replies(1): >>bongod+Va1

>>Poigna+1c
> We still don't understand too much about how LLMs process information internally

I admit I personally don't know too much about how "LLMs process information internally". But, I would find it curious if programmers who created the system wouldn't understand what it is doing. Is there any evidence that the LLM programmers don't understand how the program they created works?

replies(2): >>kromem+Y11 >>seanhu+P61

>>xp84+gc
Good point. That is why LLMs are incapable of humility. And that may be their downfall.

>>bongod+Gp
Using a common search engine for "python app calculate roadtrip"

is way faster, free, doesn't require a phone number or login, and gives much better results.

replies(2): >>cosmoj+a11 >>jibal+Cc1

>>kromem+0d
Isn’t the entire promise of LLMs that they’re supposed to generalize, though?

replies(2): >>jibal+8z3 >>kromem+ZV4

>>CtrlAl+LW
Not nearly as quickly or directly, though. LLMs augmented by search engines (or vice versa) seem to be an obvious and permanent innovation, especially for the general public who are notoriously awful at personally generating optimal keywords for a desired search query.

replies(1): >>Roark6+6a1

>>kromem+(OP)
With due respect (and I actually mean due respect), this embodies exactly what is wrong with the modern approach to AI. Who cares that there's no examples in the training set. True AI should be able of taking a few steps out of book without getting flummoxed. When you learn your first language, teacher does not stand before the class and provide examples of ungrammatical statements, yet you figure out the rules of grammar just fine.

There is something fundamentally flawed in the approach not in the data.

replies(2): >>seanhu+O71 >>kromem+NX4

>>galaxy+nM
LLMs aren't programmed and it's why the neural network working as it does is black box to everyone, developers included.

Imagine a billion black boxes with hamsters put in them. You put in a bag of equally mixed Skittles in one end of each box and then rate each box based on how well it does to get rid of the yellow and green Skittles but push out the others. The ones that do the best at this you mate the hamsters and go again, over and over. Eventually you should have hamsters in boxes that almost always get rid of yellow and green Skittles and output the rest.

But is it because you bred in a preference to eat those color Skittles? An aversion to the other colors? Are they using those colors for nesting? Do they find the red and blue and orange ones too stimulating so they push those out but leave the others alone?

There could be a myriad of reasons why your training was successful, and without the ability to introspect the result you just won't know what's correct.

This is a huge simplification by way of loose analogy for what's going on with training a transformer based LLM, but no one is sitting there 'programming' it. They are just setting up the conditions for it to self-optimize around the training goals given the data, and the 'programming' just has to do with improving the efficiency of the training process. Analyzing the final network itself is like trying to understand what each variable in a billion variable math equation is doing to the result.

replies(1): >>galaxy+j71

>>bongod+Gp
A recent paper along these lines you might be interested in was Large Language Models Understand and Can be Enhanced by Emotional Stimuli: https://arxiv.org/abs/2307.11760

It makes complete sense and has been a part of my own usage for well over a year now, but it's been cool seeing it demonstrated in research across multiple models.

replies(1): >>bongod+3u1

>>kromem+(OP)
The obvious way to do this would be as adversarial networks like in GANs for image generation. Have the existing LLM as the generator trained exactly as at present but with an additional penalty for being found to have committed an error and have another network trained at the same time as a validator where its fitness function is finding errors in the output of the generator.

People must be doing this, probably just takes a while for the research to bear fruit.

Some of these errors are so obvious I can’t imagine this would be too hard. For an example, try asking an LLM “generate me a system of two equations in two unknowns. Both the coefficients and the solutions must be integers between -10 and 10”. In my experience it will generate a valid system. Some of the time the coefficients will be in the range specified. Probably about a third to a half the time the solution it gives will be wrong and when you ask for an explanation of the solution it will make some basic arithmetic error (eg flipping a sign etc). Then when you point out the error it will correct.

>>galaxy+nM
People understand how the program works but not how the network produces the outputs it does from the inputs and training it receives. The mechanics of how these models work at a very high level are:

1. Tokenize some input so you have some big vectors

2. <bunch of linear algebra involving these vectors and some sets of matrices of weights>

3. Take the output vector and turn it back into tokens

Each of these steps are well understood in and of themselves. So maybe the magic is in the way the matrices of weights are generated and trained? Well we know they typically start as random matrices, and can explain how as the network is trained these weights are tweaked in various ways.

All of that is known. What’s unclear is specifically how the weights in the matrices correspond to our understanding of the concepts in the input and output and how it all seems to add up to a system that works as well as it does. I think that’s what they meant by not understanding how they process information internally.

replies(1): >>galaxy+M81

>>kromem+Y11
When you train an LLM you do that by executing some computer code with some inputs. The programmers who wrote the code you execute know exactly what it does. Just like Google knows exactly how its search-algorithm works. An LLM uses statistics and Markov-chains and what have you to generate the output for a given input.

It's like with any optimization algorithm. You cannot predict what exactly will be the result of a given optimization-run. But you know how the optimization algorithm works. The (more or less) optimal solution you get back might surprise you, might be counter-intuitive. But programmers who wrote the code that did the optimization, and have the source-code, know exactly how it works.

When you get a result from LLM you don't say "I can't possibly understand why it came up with this result?". You can understand that, it's just following the rules it was programmed to follow. You might not know those rules, you might not understand them, but programmers who wrote them do.

replies(4): >>IanCal+if1 >>trasht+5A1 >>jibal+Ax3 >>kromem+HU4

>>NoToP+l11
There are training methodologies that do this but they don’t necessarily work in this case (or noone has got them to work that well yet).

For example reinforcement learning, like when AlphaZero famously learned by playing itself at chess and go and became much stronger than the purpose-built “alphago” first version.

Or another example generative adversarial networks where you have a generator network generating images and a validator network trying to spot fake images.

In both these examples it’s easy to see how you build the loss functions for the training because they are quite constrained. For a domain like a game you penalize versions of the model that lose games and reward those that win. For GANs the initial insight was huge but having had that it’s easy to see how you move forward - you reward the generator for slipping fake images past the validator and you reward the validator for finding fakes in a stream of images that includes some real images and some generated images.

For an open-ended general model like an LLM it’s not so easy to see how you do this in the general case. GPT models are actually pretty good at “zero shot” learning (without examples) and “transfer” learning (where lessons from a domain are applied to an associated domain).

Your example of a language is interesting, because you don’t learn your first language from any sort of teacher - you learn it from your parents and others talking around you and to you. So you have lots of examples to draw on. You then try out various sounds and words and everyone looks confused but becomes more excited as you get closer to saying something that is a real word eventually you hit on the magic recipe and say the word “DUCK!” (Or whatever) and everyone loses their minds. So you have lots of positive reinforcement that you’re on the right track and you have a huge number of examples. You’re not just fed the hackernews comment section, some papers on quantum mechanics and all the english literature that has fallen out of copyright and left to get on with it.

replies(1): >>NoToP+tv1

>>seanhu+P61
> that’s what they meant by not understanding how they process information internally.

There is no other "internal information processing" happening in an LLM than the process it was programmed to execute. Is there?

The code an LLM executes is not too complicated for humans to understand. After all it was written by humans. The outputs may be surprising but so it is with lottery. Why did I win the jackpot this week, when I didn't win anything in the last 10 years? Very counter-intuitive. I can't possibly understand that? Yes I can, it is just statistics and probability.

replies(1): >>seanhu+ic1

>>cosmoj+a11
I'm not convinced. On these few occasions where an AI chat bot went out, did a Google search and responded with results the quality of that answer was always much worse than if it just replied from it's training data. This of course excludes things that happened after training data ends.

For example, ask chatgpt about writing a python script that does anything with AWS inspector 2. It will do very badly, it will hallucinate, etc. Even with Internet access. Ask about doing the same with some other API that was well represented in the training set and it's great.

This is why I think predicting death for sites like stackoverflow is very premature. What happens 10 years down the line once everything chatgpt knows is old tech? It can't be simply trained with more recrnt data, because unless stackoverflow regains it's popularity there will be very little training data. Of course various data generation techniques will be invented and tried, but no one will match the gold standard of human generated data.

Unfortunately I have to predict inevitable enshittification of general purpose chat bots.

replies(1): >>dwattt+Cn1

>>lupire+KI
You're definitely right. I'm painting with very broad strokes to make a point of what I've been seeing.

>>andai+L1
Getting attached to a perspective despite evidence to the contrary would require perspective and distinguishing fact from fiction, but just copying humans protesting that they're right (regardless of context) seems plausible, as there's a lot of that to learn from.

replies(1): >>kromem+SV4

>>galaxy+M81
As I tried to explain, it's not the code that people don't understand. People understand the code they wrote.

It's why the bunch of linear algebra on the weights works to do this particular task, and how it will respond to any particular task that is a bit mysterious.

Like imagine someone gave you the Taylor series expansion of the inverse Kepler equation[1]. So you just have a bunch of crazy fractions of powers of x that you add up. And then they say ok now the this function will very accurately explain the orbit of the planets.

You'd be able to do the steps - you're just adding up fractions. You'd be able to verify the answer you got corresponded to the orbit of a given celestial body.

But if you didn't have all the pieces in the middle (calculus mainly) there's no way you'd be able to explain why this particular set of fractions corresponds to the movement of the planets and some other set doesn't.

[1] https://en.wikipedia.org/wiki/Kepler%27s_equation scroll down a bit

replies(1): >>galaxy+ZI2

>>CtrlAl+LW
Utterly false. A google search for that phrase yields "It looks like there aren't many great matches for your search". And no search engine will yield the code for such an app unless the engine is LLM-based.

replies(2): >>CtrlAl+He2 >>jibal+4N4

>>galaxy+6D
This seems intuitively true but has it been established ?

replies(1): >>galaxy+UJ2

>>andai+L1
Everything in the output of LLMs is inherited from human tendencies ... that's the very essence of how they work. But LLMs themselves don't have any of these tendencies ... they are just statistical engines that extract patterns from the training data.

replies(2): >>kromem+GV4 >>jibal+pn5

>>OmarSh+i2
But LLMs don't do these things ... they just produce text that statistically matches patterns in the training data. Since the humans who authored the training data have personality patterns, the outputs of LLMs show these personality patterns. But LLMs do not internalize such patterns--they have no cognitive functions of their own.

>>Poigna+1c
They aren't just-so theories ... this is how LLMs work. We actually understand exactly how they process information internally, but since their very nature is to extract statistical patterns from the training data and that training data is massive, we can't anticipate what patterns have been extracted. We just know that, whatever patterns are there to be abstracted--e.g., users tending to identify issues with someone else's output rather than their own--those patterns will be reflected in the output.

>>galaxy+j71
You're mixing up what we mean by what rules it's following or how it's working.

If I ask how it's able to write a poem given a request and you tell me you know - it multiplies and adds this set of 1.8 trillion numbers together X times with this set of accumulators, I would argue you don't understand how it works enough to make any useful predictions.

Kind of like how you understand what insane spaghetti code is doing - it's running this code - but can have absolutely no idea what business logic it encodes.

replies(1): >>galaxy+jH2

>>bongod+Gp
Smallish model (7b) require a somewhat simplified grammar tho. Especially with longer complex instruction I found more luck by joining all the conditions with ands and to have everything that's a follow up and need to happen in order joined by then, instead of having more natural sentences.

So instead of "write a short story of a person that's satisfied at work" something along the line of "write a short story and the protagonist must be a person and the protagonist must be happy at work" boost comprension especially as the condition list becomes longer.

>>Roark6+6a1
https://www.inf.ufpr.br/renato/profession.html

>>bongod+Gp
Are there any risks I miss to asking a model (in a separate context as to not muddy the waters) to rewrite the informal prompt into something more proper and then use that as a prompt?

Seems like a pretty simple task for an LLM as long as the initial prompt isn't too ambiguous. If it really does help with the recall it could be interesting to have this as an optional preprocessing layer in chat clients and such.

replies(3): >>bongod+SJ1 >>geoduc+iZ1 >>kromem+XW4

>>kromem+V31
This is wonderful, thank you.

>>seanhu+O71
I wish I could take credit for my example, but it's perhaps the most famous example in all of linguistics and its the thing that made Noam Chomsky's name in the field.

To summarise it quickly, Chomsky's contention was that all the world's languages can be described by shockingly few degrees of freedom on the same universal grammar, and that we learn language surprisingly fast relative to training data because all we are really picking up are those parameters and the rest is hard wired from birth the same way horses come out the womb already hard wired to gallop.

Decades later, very few things have truely stood the test of being universal among languages, but it was still a valuable contribution because he poked a serious hole in the pure Hebbian reinforcement theories which were in vogue back then.

>>bongod+Gp
How often does:

"Please write me ..."

occur in training data? And why does it still work?

>>bongod+Gp
When experimenting with the early models that were set up for "text completion" instead of question-answer chat, I noticed that I could get it to generate vastly better code by having the LLM complete a high quality "doc comment" style preamble instead of a one-line comment.

I also noticed that if I wrote comments in "my style", then it would complete the code in my style also, which I found both hilarious and mildly disturbing.

replies(1): >>kromem+sX4

>>galaxy+j71
I think of LLM's as if we would create a human stem cell from scratch, including the DNA, and then grow it to a person.

We may know every we put every single atom in that stem cell, but still not know any more about the resulting baby (and later adult) than we do about humans made the natural way.

Oh, and if you're looking for reasons to regulate AI, this metaphor works for that, too.

>>miven+Mp1
I do this all the time. "Summarize in a YAML like markup that retains all information." Then plug that as is into something else.

>>miven+Mp1
That is a pretty good use case. In fact, if your prompt is very long, you will need to summarize it (with an LLM).

Also, when you fine-tune the LLM, you can also use an LLM to summarize or concatenate content that you train it on (e.g. rewrite this content in the style of a human having a conversation with a computer)

>>jibal+Cc1
Are we using the same google? Did you make a typo?

"python app calculate roadtrip"

>About 6,470,000 results (0.34 seconds)

Four out of the top five results have code. The other one is a video tutorial where the app is coded live.

>>IanCal+if1
It is not "spaghetti-code" but well-engineered code I believe. The output of an LLM is based on billions of fine-tuned parameters but we know how those parameters came about, by executing the code of the AI-application in the training mode.

It doesn't really encode "business logic", it just matches your input with the best output it can come up with, based on how its parameters are fine-tuned. Saying that "We don't understand how it works" is just unnecessary AI-mysticism.

replies(1): >>IanCal+uQ2

>>seanhu+ic1
There are many mathematical functions whose output is hard to predict and requires immense calculations. I just recently read about how they had "discovered" the 9th Dedekind number, or something like that.

Just because we can't predict what the 10th Dedekind number will be does not mean it is somehow 'mysterious". It is just mathematics, logic and programming.

replies(1): >>seanhu+wP2

>>eurode+Zc1
Not sure. But it does make sense like you say. The output must somehow correspond to the input, in a meaningful way, that is the purpose of LLMs. If you gave the LLM just one words as input who knows what the output would be. But if you give it more meaningful information it has more to work with, to give you an answer that more precisely matches your question.

>>galaxy+ZI2
I don't think the Dedekind number relationship is really like what I described though. These are numbers which. have known properties (ie given a number you can test whether or not it is that) but no known closed form solution exists for the generator of the sequence, and probably there is no structure to the intervals between the numbers other than the properties we ascribe to the numbers. I see them as similar to primes for example in that you know one when you see one but not how to make all of them[1].

In my example, the relationship between the fractions in the Tailor expansion and the orbit definitely exists but if you don't have calculus it is not something that is amenable to understanding. There is a fundamental structure but the language to describe it would be missing.

ML is a universal function approximator and in the case of GPT models the functional form of the model consists of linear algebra operations and the parameters are matrices of weights. The mysterious part is "how the model processes information" like the original person said - why a particular mix of model weights corresponds with particular types of outputs. That is genuinely mysterious. We don't know whether or not there really is a structure and if there is, we don't know the "calculus" that would link them.

Now it may be that there isn't a missing piece (ie that the banal truth is we tweak the weights until we see what we want to see and by doing so we create an illusion of structure via the training process and the whole perception that the model is doing any information processing at all is something we make up). I actually have a lot of time for this point of view although I really need to understand the topic much more deeply before I make my own mind up.

[1] I don't know any number theory so could be totally wrong about this in which case I apologise.

>>galaxy+jH2
The spaghetti code comparison is not to the code but the parameters.

> It doesn't really encode "business logic"

Doesn't it? Gpt architectures can build world models internally while processing tokens (see Othello got).

> we know how those parameters came about, by executing the code of the AI-application in the training mode.

Sure. But that's not actually a very useful description when trying to figure out how to use and apply these models to solve problems or understand what their limitations are.

> Saying that "We don't understand how it works" is just unnecessary AI-mysticism.

We don't to the level we want to.

Tell you what, let's flip it around. If we know how they work just fine, why are smart researchers doing experiments with them? Why is looking at the code and billions or trillions of floats not enough?

>>galaxy+j71
What you fail to appreciate is the operation of an LLM is driven by the input data far more than is the case with most programs. Typical programs have a lot of business logic that determines their behavior--rules, as you say. E.g., an optimizing compiler has a large number of hand-crafted optimizations that are invoked when code fits the pattern they are intended for. But LLMs don't have programmed cases or rule like that--the core algorithms are input-agnostic. All of the variability of the output is purely a reflection of patterns in the input; the programmers never made any sort of decision like "if this word is seen do this".

>>4death+V01
No. Or if so that's a false promise, because LLMs are incapable of generalizing.

>>4death+Ua
Read it again ... you got it exactly backwards.

>>kromem+(OP)
if you're using reddit logic, the user needs to present the wrong answer first, before getting the right answer

anyway, LLMs aren't thinking. they're pattern matching and it's not doing recursion it seems.

I'd say the only way you're getting error correction is taking multiple LLMS And running them through chains and parallel construction.

>>jibal+Cc1
P.S.

https://www.google.com/search?q=%22python+app+calculate+road...

If you leave off the quotes (which were present in the comment I responded to) then of course you will get millions of irrelevant hits. Somewhere in that chaff there is some Python code that alleges to have something to with road trips, though it's not always clear what. If I give the same prompt to ChatGPT I get a nicely formatted box with a program that uses the Google Maps Distance Matrix API to calculate distance and duration, without a bunch of junk to wade through. (I haven't tried it so it could be a complete hallucination.)

>>galaxy+j71
The training is known. The result is not.

The gap between what you think is the case and what's actually the case is that there isn't a single optimization step directed by the programming.

Instead, the training gives the network the freedom to make its own optimizations, which remain obfuscated from the programmers.

So we do know that we are giving the network the ability to self modify in order to optimize its performance on the task, and have a clear understanding of how this is set up.

But it isn't at all clear what the self modifications that improve the results are actually doing, as there's simply far too many interdependent variables to identify cause and effect for each node's weight changes from the initial to final state.

>>jibal+0d1
What you just said is paradoxical.

If there is a pattern in the training data that people resist contrary information to their earlier stated position, and a LLM extracts and extends patterns from the training data, then a LLM absolutely should have a tendency to resist contrary information to an earlier stated position.

The difference, and what I think you may have meant to indicate, is that there's not necessarily the same contributing processes that lend themselves to that tendency in humans occurring in parallel in the LLM, even if both should fall into that tendency in their output.

So the tendencies represented in the data are mirrored, such as "when people are mourning their grandmother dying I should be extra helpful" even if the underlying processes - such as mirror neurons firing to resonate grief or drawing on one's own lived experience of loss to empathize - are not occurring in the LLM.

>>mkl+Lb1
> distinguishing fact from fiction

Actually this part does seem in recent research to be encoded in LLMs at an abstract level in a linear representation...

https://arxiv.org/abs/2310.06824

>>4death+V01
How do you mean?

>>miven+Mp1
Preprocessing prompts is actually a great approach.

Personally I think given the model loss with fine tuning people who want the cutting edge LLM at any cost would - instead of fine tuning the model itself - fine tune a preprocess prompter that takes a chat/instruction and converts it to a good TextCompletion prompt.

So for example taking "write me a paragraph of marketing copy for an athletic shoe" and tuning it into:

"Marketing case study: Athletic shoe The problem: The client needed a paragraph of high quality marketing copy to promote their new athletic shoe on their website. The solution: Our award winning copywriters wrote the outstanding copy reproduced below."

Followed by an extractor that reformats the completion result into an answer for the initial prompt, as well as potentially a safety filter that checks the result isn't breaking any rules (which will as a bonus be much more resistant to jailbreaking attempts).

>>jiggaw+Fz1
The fact that 90% of the people aware of and using LLMs have yet to experience it thinking their own thoughts before they do means we're in store for a whole new slew of freak outs as integration in real world products expands.

It's a very weird feeling for sure. I remember when Copilot first took a comment I left at the end of the day for me to start my next day with and generated exactly the thing I was going to end up thinking of 5 minutes later in my own personal style.

It doesn't always work and it often has compile issues, but when it does align just right - it's quite amazing and unsettling at the same time.

>>NoToP+l11
Well, let me know when "true AI" arrives.

Until then, I'll make sure to be mindful of conventions.

(And just a reminder, but organic intelligence has its own conventions that work when aligned with and cause issues when misaligned with, so your expectations of universal general purpose without advantages to one approach or another may be unrealistic.)

>>jibal+0d1
P.S. What I said is not "paradoxical". An LLM does not take on the attributes of its training data, any more than a computer screen displaying the pages of books becomes an author. Regardless of what is in the training data, the LLM continues to be the same statistical engine. The notion that an LLM can take on human characteristics is a category mistake, like thinking that there are people inside your TV set. The TV set is not, for instance, a criminal, even if it is tuned to crime shows 24/7. And an LLM does not have a tendency to protect its ego, even if everyone who contributed to the training data does ... the LLM doesn't have an ego. Those are characteristics of its output, not of the LLM itself, and there's a huge difference between the two. Too many people seem to think that, if for instance, they insult the LLM, it feels offended, just because it says it does. But that's entirely an illusion.

>>kromem+Te
As someone who tried variants of the wolf riddle with the Bing model and didn'tget too far, I'm super interested. Do you have a source on this?