NHacker Next
Why language models hallucinate (openai.com)
250 points by simianwords 2 days ago | 202 comments
fumeux_fume 1 days ago [-]
I like that OpenAI is drawing a clear line on what “hallucination” means, giving examples, and showing practical steps for addressing them. The post isn’t groundbreaking, but it helps set the tone for how we talk about hallucinations.

What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely. Yes, models are just predicting the next token—but that doesn’t mean all outputs are hallucinations. If that were true, it’d be pointless to even have the term, and it would ignore the fact that some models hallucinate much less than others because of scale, training, and fine-tuning.

That’s why a careful definition matters: not every generation is a hallucination, and having good definitions lets us talk about the real differences.

freehorse 1 days ago [-]
> What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely

That is a problem for "Open"AI because they want to sell their products, and because they want to claim that LLMs will scale to superintelligence. Not for others.

"Bad" hallucinations come in different forms, and what the article describes is one of them. Not all of them come from complete uncertainty. There are also the cases where the LLM is hallucinating functions in a library, or they reverse cause and effect when summarising a complex article. Stuff like this still happen all the time, even with SOTA models. They do not happen because the model is bad with uncertainty, they have nothing to do with knowledge uncertainty. Esp stuff like producing statements that misinterpret causal relationships within text, imo, reveals exactly the limits of the architectural approach.

catlifeonmars 1 days ago [-]
So there are two angles to this:

- From the perspective of LLM research/engineering, saying all LLM generation is hallucination is not particularly useful. It’s meaningless for the problem space.

- From the perspective of AI research/engineering in general (not LLM specific) it can be useful to consider architectures that do not rely on hallucination in the second sense.

druskacik 10 hours ago [-]
I like this quote:

'Everything an LLM outputs is a hallucination. It's just that some of those hallucinations are true.'

hodgehog11 1 days ago [-]
Absolutely in agreement here. This same statement should also be applied to the words "know", "understand", and "conceptualize". "Generalize", "memorize" and "out-of-distribution" should also be cautiously considered when working with systems trained on incomprehensibly large datasets.

We need to establish proper definitions and models for these things before we can begin to argue about them. Otherwise we're just wasting time.

vrighter 18 hours ago [-]
If you insist that they are different, then please find one logical, non-subjective way to distinguish between a hallucination and not-a-hallucination. Looking at the output and deciding "this is clearly wrong" does not count. No vibes.
esafak 18 hours ago [-]
> Looking at the output and deciding "this is clearly wrong" does not count.

You need the ground truth to be able to make that determination, so using your knowledge does count. If you press the model to answer even when it does not know, you get confabulation. What today's models lack is the ability to measure their confidence, so they know when to abstain.

ttctciyf 14 hours ago [-]
"Hallucination" is a euphemism at best, and the implication it carries that LLMs correctly perceive (meaning) when they are not hallucinating is fallacious and disinforming.

The reification of counterfactual outputs, which are etiologically indistinguishable from the remainder of LLM production, is a better candidate for the label "hallucination" IMO.

aleph_minus_one 1 days ago [-]
> Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say “I don’t know.”

To me, this seems to be a "US-American" way of thinking about multiple-choice tests. Other ways of grading multiple-choice tests that I have commonly seen are:

1. If the testee has the information that exactly one of N given choices is correct:

1.1 Give N-1 points for the correct answer, and -1 [negative one] point(s) for a wrong answer. This way, if the testee just answers the questions randomly, his expected score is 0 points (a quick sanity check of this is sketched below).

1.2 A more brutal way if N>=3: the correct answer gives 1 point, every wrong answer gives -1 point. You should learn your lesson to only give an answer if it is [alliteration unintended :-) ] correct (if N=2, the grading is identical to 1.1).

2. If there are possibly multiple correct answers, turn each item into a choice of "yes" or "no" (with the option to give no answer). The correct choice gives you 1 point, the wrong one gives you -1 point (i.e. as in 1.1).
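
A minimal sketch of that sanity check for scheme 1.1 (my own illustration, assuming one question with N options and a uniform random guess; the function name is mine, not from any testing standard):

    # Expected score of a random guess under scheme 1.1:
    # N-1 points for the correct option, -1 for each wrong one.
    def expected_score_random_guess(n_choices: int) -> float:
        p_correct = 1.0 / n_choices
        return p_correct * (n_choices - 1) + (1 - p_correct) * (-1.0)

    for n in (2, 4, 5):
        print(n, expected_score_random_guess(n))  # prints 0.0 for every n

Leaving the question blank also scores 0, so blind guessing gains nothing in expectation, which is exactly the incentive the article says accuracy-only LLM benchmarks are missing.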

roxolotl 1 days ago [-]
The SAT, the American college entrance exam, used to (I haven’t looked in years, so maybe it still does) take away points for wrong answers and give 0 points for no answer. I’m pretty sure it was +1 for a right answer, 0 for no answer, and -1/4 for a wrong answer.
thaumasiotes 1 days ago [-]
They used to do that, but then they stopped and announced that you were better off guessing because there would be no adjustment for it.

A lot of what they do is based on public relations rather than psychometric validity.

bananaflag 1 days ago [-]
This is mentioned in the text:

> This idea is not new. Some standardized tests have long used versions of negative marking for wrong answers or partial credit for leaving questions blank to discourage blind guessing.

throwawaymaths 1 days ago [-]
there's not really an easy way to train for that at scale. a "correct" answer may not be one token, there may be multiple synonymous answers starting with different tokens, and you could add five space tokens in front of the answer and it likely shouldn't make it "wrong".
ACCount37 1 days ago [-]
Yes, it's not nearly as easy as "just fix the evals".

But better evals are still helpful, because they reward LLM vendors for trying to do the very-hard-to-do thing. Instead of rewarding them for training an LLM that's really good at emitting 7% confidence guesses.

throwawaymaths 1 days ago [-]
you're missing the point. SAT multiple choice penalizing random guesses, fine, you could trivially use this sort of strategy to assign a cost function to a classifier and backpropagate. but how do you give negative weight to a wrong answer when training a transformer?
ACCount37 1 days ago [-]
In RLVR? Quite easily.

And OpenAI has induced hallucinations in o3 with RLVR mistakes, not with a failed pre-training run. They used o4-mini as an example - similar training to o3 and similar issues.

Conversely, they have also designed a post-training system that has successfully reduced hallucinations in GPT-5.
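
For illustration, a minimal sketch of a verifier-based reward with an abstain option (assumptions of mine: a grade() oracle that can check the final answer, and an explicit "I don't know" response; this is not OpenAI's actual setup):

    # Hedged sketch: in RLVR the whole sampled answer is graded, not individual
    # tokens, so synonyms or extra whitespace in front don't make it "wrong".
    def reward(answer: str, grade) -> float:
        if answer.strip().lower() == "i don't know":
            return 0.0        # abstaining is neutral
        if grade(answer):     # hypothetical verifier for the final answer
            return 1.0        # correct answers are rewarded
        return -1.0           # confident wrong answers are penalized

The scalar reward then drives an ordinary policy-gradient update, which raises or lowers the probability of the entire sampled sequence rather than backpropagating against a single "correct" token.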

RugnirViking 1 days ago [-]
isn't this just related to the question "how do you train a transformer"? you give it wrong examples, and use optimization algorithms to move away from that kind of completion
throwawaymaths 1 days ago [-]
that's quite hard for the reasons i explained. might be solvable using q-learning techniques, but those are not easy in the context of transformers iiuc
rhubarbtree 1 days ago [-]
I find this rather oddly phrased.

LLMs hallucinate because they are language models. They are stochastic models of language. They model language, not truth.

If the “truthy” responses are common in their training set for a given prompt, you might be more likely to get something useful as output. Feels like we fell into that idea and said - ok this is useful as an information retrieval tool. And now we use RL to reinforce that useful behaviour. But still, it’s a (biased) language model.

I don’t think that’s how humans work. There’s more to it. We need a model of language, but it’s not sufficient to explain our mental mechanisms. We have other ways of thinking than generating language fragments.

Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

crystal_revenge 22 hours ago [-]
People also tend not to understand the absurdity of assuming that we can make LLMs stop hallucinating. It would imply not only that truth is absolutely objective, but that it exists on some smooth manifold which language can be mapped to.

That means there would be some high dimensional surface representing "all true things". Any fact could be trivially resolved as "true" or "false" simply by exploring whether or not it was represented on this surface. Whether or not "My social security number is 123-45-6789" is true could be determined simply by checking whether or not that statement was mappable to the truth manifold. Likewise you could wander around that truth manifold and start generating output of all true things.

If such a thing existed it would make even the wildest fantasies about AGI seem tame.

edit: To simplify it further, this would imply you could have an 'is_true(statement: string): bool' function for any arbitrary statement in English.

jdietrich 20 hours ago [-]
>People also tend not to understand the absurdity of assuming that we can make LLMs stop hallucinating. It would imply not only that truth is absolutely objective, but that it exists on some smooth manifold which language can be mapped to.

Frankly, this is a silly line of argument. There is a vast spectrum between regularly inventing non-existent citations and total omniscience. "We can't define objective truth" isn't a gotcha, it's just irrelevant.

Nobody in the field is talking about or working on completely eliminating hallucinations in some grand philosophical sense, they're just grinding away at making the error rate go down, because that makes models more useful. As shown in this article, relatively simple changes can have a huge effect and meaningful progress is being made very rapidly.

We've been here before, with scepticism about Wikipedia. A generation of teachers taught their students "you can't trust Wikipedia, because anyone can edit it". Two decades and a raft of studies later, it became clear that Wikipedia is at least as factually accurate as traditional encyclopedias and textbooks. The contemporary debate about the reliability of Wikipedia is now fundamentally the same as arguments about the reliability of any carefully-edited resource, revolving around subtle and insidious biases rather than blatant falsehoods.

Large neural networks do not have to be omniscient to be demonstrably more reliable than all other sources of knowledge, they just need to keep improving at their current rate for a few more years. Theoretical nitpicking is missing the forest for the trees - what we can empirically observe about the progress in AI development should have us bracing ourselves for radical social and economic transformation.

apsurd 17 hours ago [-]
You're not being charitable with the take. Seems like you just swapped "objective truth" for your own flavor: "error rate".

What is an error? How does the LLM "know"?

The Wikipedia example is good. I'd say its "truth" is based on human-curated consensus; everyone gets that. What I don't get is: what's the LLM analog? As you state, it's just about making the error rate go down. OK, so what is an error? Does it require a human in the loop?

skydhash 17 hours ago [-]
The thing is, for a lot of tasks, a formal method (either algorithmic or simulation-based) can be very efficient to create and run, with more reliable results. And for a lot of cases, creating a simpler and smaller model with other ML techniques can be as good or better than LLMs.

There's still no justification for the whole investment craze in LLMs.

mqus 22 hours ago [-]
Well, no. The article pretty much says that any arbitrary statement can be mapped to {true, false, I don't know}. This is still not 100% accurate, but at least something that seems reachable. The model should just be able to tell unknowns, not be able to verify every single fact.
gary_0 22 hours ago [-]
Determining a statement's truth (or if it's outside the system's knowledge) is an old problem in machine intelligence, with whole subfields like knowledge graphs and such, and it's NOT a problem LLMs were originally meant to address at all.

LLMs are text generators that are very good at writing a book report based on a prompt and the patterns learned from the training corpus, but it's an entirely separate problem to go through that book report statement by statement and determine if each one is true/false/unknown. And that problem is one that the AI field has already spent 60 years on, so there's a lot of hubris in assuming you can just solve that and bolt it onto the side of GPT-5 by next quarter.

red75prime 16 hours ago [-]
> And that problem is one that the AI field has already spent 60 years on

I hope you don't think that the solution will be a closed-form expression. The solution should involve exploration and learning. The things that LLMs are instrumental in, you know.

sirwhinesalot 14 hours ago [-]
Not the same person, but I think the "structure" of what the ML model is learning can have a substantial impact, especially if it then builds on that to produce further output.

Learning to guess the next token is very different from learning to map text to a hypervector representing a graph of concepts. This can be witnessed in image classification tasks involving overlapping objects where the output must describe their relative positioning. Vector-symbolic models perform substantially better than more "brute-force" neural nets of equivalent size.

But this is still different from hardcoding a knowledge graph or using closed-form expressions.

Human intelligence relies on very similar neural structures to those we use for movement. Reference frames are both how we navigate the world and also how we think. There's no reason to limit ourselves to next token prediction. It works great because it's easy to set up with the training data we have, but it's otherwise a very "dumb" way to go about it.

gary_0 16 hours ago [-]
Of course not, expert systems were abandoned decades ago for good reason. But LLMs are only one kind of ANN. Unfortunately, when all you have is a hammer...
thisoneisreal 21 hours ago [-]
A great book in this vein is "Language vs. Reality." The main thesis of the book is that language evolved to support approximate, ad hoc collaboration, and is woefully inadequate for doing the kind of work that e.g. scientists do, which requires incredible specificity and precision (hence the amount of effort devoted to definitions and quantification).
BobbyTables2 19 hours ago [-]
Agree. I deeply suspect the problem of asking an LLM to not hallucinate is equivalent to the classic Halting Problem.
beeflet 17 hours ago [-]
Maybe if a language model was so absolutely massive, it could <think> enough to simulate the entire universe and determine your social security number
riwsky 16 hours ago [-]
42
thisoneisreal 21 hours ago [-]
This strikes me as a perfect description of the core problem. Whenever I think about this, what sticks out to me is that other animals do all sorts of things that look like "intelligence," or at least cognition, and they do it totally without language. My cat clearly recognizes objects, assigns them different values ("scary," "tasty," "fun to play with"), interacts with them in some kind of loop, even predicts their behavior to some extent and acts curious about them (it was really fun to watch her try to figure out the construction guys when I had some work done on my house over a period of a few days). These strike me as much more foundational aspects of intelligence than language. Language has of course immeasurably contributed to what makes human cognition and intelligence, but it's almost certainly built on these pre-linguistic foundations. Another very good hint in this direction is all of the non-verbal thinking that humans have done. Einstein has a famous quote about thinking visually and physically, without using language at all. All of these are powerful suggestions that something else is going on, and most likely some aspect of these things are necessary for true intelligence.
simianparrot 15 hours ago [-]
I’ve always thought everyone agreed language was a lossy but useful method of compression for sharing inner concepts and ideas. That my conscious thoughts are “in a language” doesn’t mean my reasoning and entire being interacts with the world using language.

I’m only “thinking in language” when I’m practicing compressing my intent into a shareable format. I don’t think about the majority of highly complex interactions I have with the physical world throughout the day.

As a child did you need to be able to explain in language how the physics of a swing works to be able to use it? Did other kids have to explain it to you in detailed language for you to pick up on how to move your body to do complex tasks?

No. In fact exactly because our compression and decompression of language is even more limited as children, we rely more heavily on raw observation and mimicry of actions occurring in reality itself.

The very idea that a language model can recreate everything we do from the lossy and compressed languages we use to share limited descriptions of much more complex intentions and actions is fundamentally flawed and oversimplified.

utyop22 1 days ago [-]
The reality is, language itself does not capture the entirety of what is really going on. And I'd even argue it's the poorest way of expressing it - but one that enables transmission through various mediums efficiently on a cost basis.

E.g. when I explain a concept, what comes to my mind is not a string of letters and words. There is a mix of imagery and even sounds that I may have acquired from learning about a concept - then I translate that into text so it can be communicated.

There's a reason why people use native subtitles when watching Netflix - text complements imagery and sounds.

kelnos 13 hours ago [-]
I use subtitles because sometimes I have trouble understanding the actors. I believe I read something that suggested that the sound mix in movies and cinematic TV shows has changed a lot in the past couple of decades, and as a result it's harder to understand dialogue.

I don't like this; I find my eyes spending more time than I'd like on the text, and not enough on the visual imagery on the rest of the screen. If I truly wanted more text, I'd just read a book.

pawelmurias 23 hours ago [-]
I would assume most people use native subtitles when it's hard to understand what words the actors said.
ekianjo 22 hours ago [-]
Yeah, because modern filmmakers make it very hard to hear dialogue for some reason, and actors are encouraged to mumble. If I remember correctly, even Nolan admitted it.
jibal 20 hours ago [-]
And they often speak very quickly--I often rewind to catch critical plot points. It's a lot different from a stage play, where actors enunciate so clearly. (Not that I want stage cadence and booming voices from a film ... they are different art forms.)

Also, I watch a lot of English-language material that uses accents quite different from what my ears are tuned to.

jibal 23 hours ago [-]
That's why I do.
utyop22 23 hours ago [-]
No that is not the reason.

People watch Netflix to switch their brain off - having the text there helps along with the visual and sound to deliver the content. However, text is inferior to both visual and sound as a delivery mechanism.

keanebean86 23 hours ago [-]
Subtitles increase the signal to noise ratio. At least in our house. We have to keep the tv low to not wake the child. A volume of 10 with subtitles is similar to volume at 16 without subtitles.
crabmusket 23 hours ago [-]
> I don’t think that’s how humans work.

Every time this comes up I have to bring up Deutsch. He has the best description of intelligent cognition that I've come across. He takes Popper's "conjecture and criticism" approach to science and argues that this guess-and-check loop applies to all our thinking.

E.g. understanding spoken language has some elements of guessing what might have been said and checking that against the sounds we heard. Visual processing has similar analogies.

LLMs seem to be great at conjecturing stuff, but seem incapable of checking or even knowing they need to check.

codethief 14 hours ago [-]
> Every time this comes up I have to bring up Deutsch. He has the best description of intelligent cognition that I've come across.

Would you have a reference?

crabmusket 6 hours ago [-]
If you like books, read The Beginning of Infinity. If you don't, I can't help! I wish there were something I could point to online, but nothing really encapsulates the lessons I took from that book. Yes, I'll have to write that thing one day.
codethief 4 hours ago [-]
Thanks so much!
munchler 23 hours ago [-]
This is directly addressed in the article, which states that language models can be trained to abstain when uncertain, by changing how rewards are set up. Incentives currently encourage guessing rather than being honest about uncertainty. If you disagree, it would be helpful to explain why, rather than just responding to the title alone.
asats 22 hours ago [-]
Exactly. I always found it strange when people assume that "hallucinations" are just some sort of bug in the system, as if tweaking some code or the training modality will produce an oracle of absolute truth incapable of making mistakes.
humanfromearth9 12 hours ago [-]
Humans think with inductive and deductive reasoning. First inductive, then we generalize and deduce, which allows for quick decision-making and hence increases our survival fitness. I don't know how the transition is done from inductive to deductive, and that's probably why, currently, AI is not able to reason like humans.
ComplexSystems 1 days ago [-]
> Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

Why? It seems no less odd than eliminating cases where it gives "undesirable" code snippets with hallucinated errors. This is very important and not odd at all.

rhubarbtree 1 days ago [-]
To clarify: because you will be left with a biased language model. It will continue to hallucinate, and as you squeeze out some hallucinations in one part of the language space you may well create new ones elsewhere. It doesn’t seem a solid line of attack.
didibus 24 hours ago [-]
I agree with everything you said except:

> Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

Take it back to what it is, like you say: this is a predictive model, and the work of any ML scientist is to iterate on the model to try and get perfect accuracy on unseen data. It makes sense to want to tune the models to lower the rate of predictive errors. And because perfect predictive accuracy is rarely possible, you need to make judgment calls between precision and recall, which, in the case of LLMs, directly affects how often the model will hallucinate versus how often it will stay silent or overly cautious.

rubatuga 23 hours ago [-]
But we're getting into the limits of knowledge and what is true/untrue. A stochastic model will be wrong sometimes.
didibus 23 hours ago [-]
Of course, 100% prediction accuracy cannot be achieved.

I just mean that, if you're a team of ML scientists, you don't just go: we got 76% accuracy, let's close up shop, mail in your resignation, job over.

From that angle, it's not odd at all that the team just continues working and now see if they can achieve greater than 76%.

yreg 14 hours ago [-]
Maybe it goes against the definition but I like saying that _all_ output is a hallucination, when explaining LLMs.

It just happens that a lot of that output is useful/corresponding with the real world.

kelnos 13 hours ago [-]
Well yes, it goes against the accepted definition. And if all output is hallucination, then it's not really a useful way to describe anything, so why bother?
MattPalmer1086 12 hours ago [-]
I agree that saying everything is a hallucination doesn't help to narrow down on possible solutions.

It does however make the point that hallucinations are not some special glitch which is distinct from the normal operation of the model. It's just outputting plausible text, which is right often enough to be useful.

Adding in some extra sauce to help the model evaluate the correctness of answers, or when it doesn't know enough to give a good answer, is obviously one way to mitigate this otherwise innate behaviour.

drekipus 13 hours ago [-]
But it's the perfect definition because it shows what it is. The output is a hallucination of what it thinks you want, which you can use to better form prompts or the like.

To say "it only hallucinates sometimes" is burying the lede and confusing for people who are trying to use it

Q: How do I stop hallucinations? A: Useless question, because you can't. It is the mechanism that gives you what you want.

amelius 1 days ago [-]
They hallucinate because it's an ill-defined problem with two conflicting usecases:

1. If I tell it the first two lines of a story, I want the LLM to complete the story. This requires hallucination, because it has to make up things. The story has to be original.

2. If I ask it a question, I want it to reply with facts. It should not make up stuff.

LMs were originally designed for (1) because researchers thought that (2) was out of reach. But it turned out that, without any fundamental changes, LMs could do a little bit of (2), and since that discovery things have improved, but not to the point that hallucination has disappeared or come under control.

didibus 1 days ago [-]
The word "hallucination" mis-characterizes it.

LLMs predict the likely tokens to follow the context. And they can make incorrect predictions.

LLMs therefore don't have perfect accuracy of prediction. When their predictions are incorrect, people say they "hallucinate".

Nobody questions why predictive weather models aren't perfectly accurate, because it makes sense that a prediction can be wrong.

Marketing and hype has tried to sell LLMs as "logical rational thinkers" equal to human thinking. A human doing actual thinking knows when they are making stuff up. So if a human truly believes obviously false things to be true, it tends to be because they are hallucinating. Their thinking isn't wrong, they've lost track of reality to ground their thinking.

We've anthropomorphized LLMs to the point we wonder why are they hallucinating like we can offer a diagnostic. But if you stop anthropomorphising them and go back to their actual nature as a predictive model, then it's not even a surprising outcome that predictions can turn out to be wrong.

Jensson 24 hours ago [-]
A weather model is made to predict the weather and used to predict the weather, so there you are right.

A language model is made to predict language, but used to generate code or answers to math questions; that is not the same situation as a weather model. The language model is not made to solve math or generate correct code, and if you ask it to predict the weather it won't try to predict the weather, it will just predict the language that is a probable response to such a question.

This sort of misunderstanding is what is causing all these debates, many people really struggle understanding what these language models really are.

nelox 22 hours ago [-]
That framing is too narrow. A weather model is trained on physics equations but still relies on patterns in past data to make forecasts. A language model is trained on patterns in human text but that text already encodes mathematics, code, and reasoning. When prompted with a math problem, the model is not doing physics but it is reproducing the learned statistical structure of solutions people have written before. The distinction between “predicting language” and “solving math” is smaller than it seems because the training data couples symbols to meaning. Dismissing its outputs as “just predicting words” misses the fact that word distributions encode information-rich representations of knowledge. That is why large models can in practice generate working code, prove theorems, and reason through problems, even if they do so imperfectly. The right comparison is not that people are misusing them, but that they generalize beyond their design intent because language itself is the medium through which so many other domains are expressed.
didibus 23 hours ago [-]
I agree the model is predicting language and not actually running the math. That is a point I try to stress too. It is not thinking through a problem, it is predicting what text would look like if someone were working it out.

But the training does not just reinforce plausible continuations, it biases toward text that matches correct answers. So in that sense they are training it not just to predict any likely text, but to predict text that is more likely to contain the right answer to a math or coding problem.

To me that does not look so different from other ML models. They all work by turning a problem into something a computer can handle statistically, and they all face the same trade offs. Prediction errors are inevitable, and you still have to decide whether to tune for recall, which gives hallucinations, or precision, which gives refusals.

C-x_C-f 21 hours ago [-]
> A language model is made to predict language

<pedantry>Isn't a language model made to predict the next token in a series, which just so happens to be good for predicting not only natural languages, but also formal ones (code and math)?</pedantry>

Also, similar to what nelox said, as long as language (or sequences of tokens or what have you) can be "about" something (whatever that means), then it's possible that LLMs are encoding information about that "something". I'm being deliberately vague because I think that trying to be precise (by e.g. referring to latent spaces and so on) makes it sound like we've figured something out when in reality we haven't even found the right words to ask the questions.

wavemode 1 days ago [-]
Indeed - as Rebecca Parsons puts it, all an LLM knows how to do is hallucinate. Users just tend to find some of these hallucinations useful, and some not.
saghm 1 days ago [-]
This is a super helpful way of putting it. I've tried to explain to my less technical friends and relatives that, from the standpoint of an LLM, there's no concept of "truth", and that it basically just comes up with the shape of what a response should look like and then fills in the blanks with pretty much anything it wants. My success in getting the point across has been mixed, so I'll need to try out this much more concise way of putting it next time!
ninetyninenine 1 days ago [-]
But this explanation doesn’t fully characterize it does it?

Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

Additionally, when the LLM responds, MOST of the answers are true even though quite a few are wrong. If it had no conceptual understanding of truth then the majority of its answers would be wrong, because there are overwhelmingly far more wrong responses than there are true responses. Even a “close” hallucination has a low probability of occurring due to its proximity to a low probability region of truth in the vectorized space.

You’ve been having trouble conveying these ideas to relatives because it’s an inaccurate characterization of phenomena we don’t understand. We do not categorically fully understand what’s going on with LLMs internally and we already have tons of people similar to you making claims like this as if it’s verifiable fact.

Your claim here cannot be verified. We do not know if LLMs know the truth and they are lying to us or if they are in actuality hallucinating.

You want proof about why your statement can’t be verified? Because the article the parent commenter is responding to is saying the exact fucking opposite. OpenAI makes an opposing argument, and it can go either way because we don’t have definitive proof either way. The article is saying that LLMs are “guessing” and that it’s an incentive problem: LLMs are inadvertently incentivized to guess, and if you incentivize the LLM to not confidently guess and to be more uncertain, the outcomes will change to what we expect.

Right? If it’s just an incentive problem it means the LLM does know the difference between truth and uncertainty and that we can coax this knowledge out of the LLM through incentives.

kolektiv 1 days ago [-]
But an LLM is not answering "what is truth?". It's "answering" "what does an answer to the question "what is truth?" look like?".

It doesn't need a conceptual understanding of truth - yes, there are far more wrong responses than right ones, but the right ones appear more often in the training data and so the probabilities assigned to the tokens which would make up a "right" one are higher, and thus returned more often.

You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically.

A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't. It looks miraculous to the relatively untrained eye - many things do, but just because I might not understand how something works, it doesn't mean nobody does.

rambambram 1 days ago [-]
Nice to read some common sense in a friendly way. I follow your RSS feed, please keep posting on your blog. Unless you're an AI and secretly obtained some form of emergent consciousness, then not.
ninetyninenine 24 hours ago [-]
>But an LLM is not answering "what is truth?". It's "answering" "what does an answer to the question "what is truth?" look like?".

You don't actually know this right? You said what I'm saying is theoretically possible so you're contradicting what you're saying.

>You're anthropomorphizing in using terms like "lying to us" or "know the truth". Yes, it's theoretically possible I suppose that they've secretly obtained some form of emergent consciousness and also decided to hide that fact, but there's no evidence that makes that seem probable - to start from that premise would be very questionable scientifically.

Where did I say it's conscious? You hallucinated here thinking I said something I didn't.

Just because you can lie doesn't mean you're conscious. For example, a sign can lie to you. If the speed limit is 60 but there's a sign that says the speed limit is 100 then the sign is lying. Is the sign conscious? No.

Knowing is a different story though. But think about this carefully. How would we determine whether a "human" knows anything? We only can tell whether a "human" "knows" things based on what it Tells us. Just like an LLM. So based off of what the LLM tells us, it's MORE probable that the LLM "knows" because that's the SAME exact reasoning on how we can tell a human "knows". There's no other way we can determine whether or not an LLM or a human "knows" anything.

So really I'm not anthropomorphizing anything. You're the one that's falling for that trap. Knowing and lying are not concepts unique to consciousness or humanity. These are neutral concepts that exist beyond what it means to be human. When I say something "knows" or something "lies" I'm saying it from a highly unbiased and neutral perspective. It is your bias that causes you to anthropomorphize these concepts with the hallucination that these are human-centric concepts.

>A lot of people seem to be saying we don't understand what it's doing, but I haven't seen any credible proof that we don't.

Bro. You're out of touch.

https://www.youtube.com/watch?v=qrvK_KuIeJk&t=284s

Hinton, the godfather of modern AI, says we don't understand. It's not just people saying we don't understand; the general understanding within academia is: we don't understand LLMs. So you're wrong. You don't know what you're talking about and you're highly misinformed.

zbentley 23 hours ago [-]
I think your assessment of the academic take on AI is wrong. We have a rather thorough understanding of the how/why of the mechanisms of LLMs, even if after training their results sometimes surprise us.

Additionally, there is a very large body of academic research that digs into how LLMs seem to understand concepts and truths and, sure enough, examples of us making point edits to models to change the “facts” that they “know”. My favorite of that corpus, though far from the only or most current/advanced research, is the Bau Lab’s work: https://rome.baulab.info/

riwsky 16 hours ago [-]
Here’s where you're clearly wrong. The correct favorite in that corpus is Golden Gate Claude: https://www.anthropic.com/news/golden-gate-claude
ninetyninenine 21 hours ago [-]
It’s not about what you think it’s about who’s factually right or wrong.

You referenced a work on model interpretability which is essentially the equivalent of putting on MRI or electrodes on the human brain and saying we understand the brain because some portion of it lights up when we show the brain a picture of a cow. There’s lots of work on model interpretability just like how there’s lots of science involving brain scans of the human brain… the problem is none of this gives insight into how the brain or an LLM works.

In terms of understanding LLMs we overall don’t understand what’s going on. It’s not like I didn’t know about attempts to decode what’s going on in these neural networks… I know all about it, but none of it changes the overall sentiment of: we don’t know how LLMs work.

This is fundamentally different from computers. We know how computers work such that we can emulate a computer. But for an LLM we can’t fully control it, we don’t fully understand why it hallucinates, we don’t understand how to fix the hallucination and we definitely cannot emulate an LLM in the same way we do for a computer. It isn’t just that we don’t understand LLMs. It’s that there isn’t anything in the history of human invention that we lack such fundamental understanding of.

Off of that logic, the facts are unequivocally clear: we don’t understand LLMs and your statement is wrong.

But it goes beyond this. I’m not just saying this. This is the accepted general sentiment in academia and you can watch that video of Hinton, the godfather of AI in academia basically saying the exact opposite of your claim here. He literally says we don’t understand LLMs.

Jensson 1 days ago [-]
> Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

This isn't how an LLM works. What an LLM understands has nothing to do with the words it says; it only has to do with what connections it has seen.

If an LLM has only seen a manual but has never seen examples of how the product is used, then it can tell you exactly how to use the product by writing out info from the manual, but if you ask it to do those things then it won't be able to, since it has no examples to go by.

This is the primary misconception most people have, and it makes them overestimate what their LLM can do: no, they don't learn by reading instructions, they only learn by seeing examples and then doing the same thing. So an LLM talking about truth just comes from it having seen others talk about truth, not from it thinking about truth on its own. This is fundamentally different from how humans think about words.

ninetyninenine 24 hours ago [-]
>This isn't how LLM works.

I know how an LLM works. I've built one. At best we only know surface level stuff like the fact that it involves a feed forward network and is using token prediction.

But the emergent effect of how an LLM produces an overall statement that reflects high-level conceptual understanding is something we don't know.

So your claim of "This isn't how an LLM works" which was said which such confidence is utterly wrong. You don't know how it works, no one does.

catlifeonmars 1 days ago [-]
> Have the LLM talk about what “truth” is and the nature of LLM hallucinations and it can cook up an explanation that demonstrates it completely understands the concepts.

There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook.

I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.

ninetyninenine 24 hours ago [-]
>There is not necessarily a connection between what an LLM understands and what it says. It’s totally possible to emit text that is logically consistent without understanding. As a trivial example, just quote from a physics textbook.

This is true, but you could say the same thing about a human too right? There's no way to say there's a connection between what a human says and whether or not a human understands something. Right? We can't do mind reading here.

So how do we determine whether or not a human understands something? Based off of what the human tells us. So I'm just extrapolating that concept to the LLM. It knows things. Does it matter what the underlying mechanism is? If we get LLM output to be perfect in every way but the underlying mechanism is still feed forward networks with token prediction then I would still say it "understands" because that's the EXACT metric we use to determine whether a human "understands" things.

>I’m not saying your premise is necessarily wrong: that LLMs can understand the difference between truth and falsehood. All I’m saying is you can’t infer that from the simple test of talking to an LLM.

Totally understood. And I didn't say that it knew the difference. I was saying basically a different version of what you're saying.

You say: We can't determine if it knows the difference between truth and falsehood. I say: We can't determine if it doesn't know the difference between truth and falsehood.

Neither statement contradicts each other. The parent commenter imo was making a definitive statement in that he claims we know it doesn't understand and I was just contradicting that.

Zigurd 1 days ago [-]
I recently asked Gemini to riff on the concept of "Sustainable Abundance" and come up with similar plausible bullshit. I could've filled a slate of TED talks with the brilliant and plausible sounding nonsense it came up with. Liberated from the chains of correctness, LLMs' power is unleashed. For example:

The Symbiocene Horizon: A term suggesting a techno-utopian future state where humanity and technology have merged with ecological systems to achieve a perfect, self-correcting state of equilibrium.

01HNNWZ0MV43FF 1 days ago [-]
Sounds like solarpunk
fumeux_fume 1 days ago [-]
In the article, OpenAI defines hallucinations as "plausible but false statements generated by language models." So clearly it's not all that LLMs know how to do. I don't think Parsons is working from a useful or widely agreed upon definition of what a hallucination is which leads to these "hot takes" that just clutter and muddy up the conversation around how to reduce hallucinations to produce more useful models.
mpweiher 1 days ago [-]
They just redefined the term so that they no longer call useful hallucinations "hallucinations".

But the people who say everything LLMs do is hallucinate clearly also make that distinction, they just refuse to rename the useful hallucinations.

"How many legs does a dog have if you call his tail a leg? Four. Saying that a tail is a leg doesn't make it a leg." -- Abraham Lincoln

johnnyanmac 1 days ago [-]
I'd say a human's ability to reason with theoretical situations like this is the very core of our creativity and intelligence, though. This quote makes sense for a policy maker, but not a scientist.

Now granted, we also need to back up those notions with rigorous testing and observation, but those "if a tail is a leg" hypotheticals are the basis of the reasoning.

mcphage 1 days ago [-]
LLMs don’t know the difference between true and false, or that there even is a difference between true and false, so I think it’s OpenAI whose definition is not useful. As for widely agreed upon, well, I’m assuming the purpose of this post is to try and reframe the discussion.
hodgehog11 1 days ago [-]
If an LLM outputs a statement that is by definition either true or false, then we can know whether it is true or false. Whether the LLM "knows" is irrelevant. The OpenAI definition is useful because it implies hallucination is something that can be logically avoided.

> I’m assuming the purpose of this post is to try and reframe the discussion

It's to establish a meaningful and practical definition of "hallucinate" to actually make some progress. If everything is a hallucination as the other comments seem to suggest, then the term is a tautology and is of no use to us.

kolektiv 1 days ago [-]
It's useful as a term of understanding. It's not useful to OpenAI and their investors, so they'd like that term to mean something else. It's very generous to say that whether an LLM "knows" is irrelevant. They would like us to believe that it can be avoided, and perhaps it can, but they haven't shown they know how to do so yet. We can avoid it, but LLMs cannot, yet.

Yes, we can know whether something is true or false, but this is a system being sold as something useful. If it relies on us knowing whether the output is true or false, there is little point in us asking it a question we clearly already know the answer to.

hodgehog11 15 hours ago [-]
I mean no disrespect, as I'm no more fond of OpenAI than anyone else (they are still the villains in this space), but I strongly disagree.

> It's useful as a term of understanding.

No it isn't. I dare you to try publishing in this field with that definition. Claiming all outputs are hallucinations because it's a probabilistic model tells us nothing of value about what the model is actually doing. By this definition, literally everything a human says is a hallucination as well. It is only valuable to those who wish to believe that LLMs can never do anything useful, which as Hinton says, is really starting to sound like an ego-driven religion at this point. Those that follow it do not publish in top relevant outlets any more, and should not be regarded as an expert on the subject.

> they haven't shown they know how to do so yet. We can avoid it, but LLMs cannot, yet.

This is exactly what they argue in the paper. They discuss the logical means by which humans are able to bypass making false statements by saying "I don't know". A model that responds only with a lookup table and an "I don't know" can never give false statements, but is probably not so useful either. There is a sweet spot here, and humans are likely close to it.

> If it relies on us knowing whether the output is true or false

I never said the system relies on it. I said that our definition of hallucination, and therefore our metrics by which to measure it, depend only on our knowing whether the output is true. This is no different from any other benchmark. They are claiming that it might be useful to make a new benchmark for this concept.

username223 21 hours ago [-]
"Logically avoided?"

OpenAI has a machine that emits plausible text. They're trying to argue that "emitting plausible text" is the hard problem, and "modeling the natural world, human consciousness, society, etc." is the easy one.

hodgehog11 15 hours ago [-]
Hmm, I don't see where they have suggested this, could you point to where this is? If they do argue for this, then I would also disagree with them.

Modelling those things is a separate problem to emitting plausible text and pursuing one is not necessarily beneficial to the other. It seems more sensible to pursue separate models for each of these tasks.

throwawaymaths 1 days ago [-]
that's wrong. there is probably a categorical difference between making something up due to some sort of inferential induction from the kv cache context under the pressure of producing a token -- any token -- and actually looking something up and producing a token.

so if you ask, "what is the capital of colorado" and it answers "denver" calling it a Hallucination is nihilistic nonsense that paves over actually stopping to try and understand important dynamics happening in the llm matrices

saghm 1 days ago [-]
> so if you ask, "what is the capital of colorado" and it answers "denver" calling it a Hallucination is nihilistic nonsense that paves over actually stopping to try and understand important dynamics happening in the llm matrices

On the other hand, calling it anything other than a hallucination misrepresents truth as something these models have any ability to differentiate their outputs by, based on whether they accurately reflect reality, and conflates a fundamentally unsolved problem with an engineering tradeoff.

ComplexSystems 1 days ago [-]
It isn't a hallucination because that isn't how the term is defined. The term "hallucination" refers, very specifically, to "plausible but false statements generated by language models."

At the end of the day, the goal is to train models that are able to differentiate between true and false statements, at least to a much better degree than they can now, and the linked article seems to have some very interesting suggestions about how to get them to do that.

throwawaymaths 1 days ago [-]
your point is good and taken but i would amend slightly -- i don't think that "absolute truth" is itself a goal, but rather "how aware is it that it doesn't know something". this negative space is frustratingly hard to capture in the llm architecture (though almost certainly there are signs -- if you had direct access to the logits array, for example)
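
as a rough illustration of the kind of signal you could read off the logits (a toy sketch with made-up numbers; real calibration is far harder, and low entropy does not mean the answer is true):

    import math

    # Toy example: turn a next-token logits vector into a crude confidence signal.
    def softmax(logits):
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        s = sum(exps)
        return [e / s for e in exps]

    logits = [8.1, 2.3, 1.9, 0.4]   # made-up values for four candidate tokens
    probs = softmax(logits)
    entropy = -sum(p * math.log(p) for p in probs)

    # Low max-probability or high entropy on the answer-bearing tokens is one
    # possible "not sure" signal; it is noisy and says nothing about factual truth.
    print(max(probs), entropy)
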
mannykannot 1 days ago [-]
There is a way to state Parsons' point which avoids this issue: hallucinations are just as much a consequence of the LLM working as designed as are correct statements.
throwawaymaths 1 days ago [-]
fine. which part is the problem?
johnnyanmac 1 days ago [-]
The part where it can't recognize situations where there's not enough data/training and admit it doesn't know.

I'm a bit surprised no one talks about this factor. It's like talking to a giant narcissist who can Google really fast but not understand what it reads. The ability to admit ignorance is a major factor of credibility, because none of us know everything all at once.

throwawaymaths 1 days ago [-]
yeah sorry i mean which part of the architecture. "working as designed"
littlestymaar 1 days ago [-]
> that's wrong.

Why would anyone respond with so little nuance?

> a Hallucination

Oh, so your shift key wasn't broken all the time, then why aren't you using it in your sentences?

leptons 1 days ago [-]
"A broken clock is right twice a day"
cwmoore 1 days ago [-]
A stopped clock. There are many other ways to be wrong than right.
codethief 13 hours ago [-]
I was inclined to agree at first but do those use cases really conflict?

If I ask the LLM to generate a fictional story set in medieval France, and it then responds with a fictional story set in medieval France, that's an appropriate ("correct") response to the task I gave it. If it responded with a story set in medieval England, though, that would not be correct. If, instead, I had asked it to generate a story in "medieval times", both France and England would have been correct as locations because the problem was underspecified and asked for some creativity. A medieval story set in the US, however, would still not have been correct or consistent with the training data. You can come up with more such examples even in entirely fictional settings: Once the story has been set to take place in fictional city X, it would not be consistent if two sentences later the characters were in city Y all of a sudden. (That would be a bit too creative.) What I'm trying to say is: Creativity might be "correct" (appropriate) in a given context, or it might not be. Even fiction and creativity require a certain degree of consistency and coherence.

Now, correct answers, in turn, might also require a certain degree of creativity:

If I ask the LLM for some straight up facts, which are not in its training data nor in the prompt context, the only really correct answer is "I don't know". However, sometimes it might be possible to narrow down the correct answer to a few possible options based on the training data. So then it might be appropriate for the LLM to say "I don't know the exact answer but here are some educated guesses based on what I do know: …" And maybe, having pondered those options, it is able to deduce the correct answer after all. (In the same way as I am writing this HN comment to help me think and clarify my thoughts.)

This is reminiscent of mathematics and mathematical research, which are often described as a creative process. Obviously, the creative output is heavily constrained. You make educated guesses and then validate them against what you already know to be true. Someone else here in this thread[0] mentioned Popper's "Conjectures and Refutations" as a possible model for what intelligent cognition is about and the more I think about that, the more convincing I find it.

[0]: https://news.ycombinator.com/item?id=45153695

skybrian 1 days ago [-]
I don’t think it’s inherently ill-defined, since the context can tell you whether fiction is being requested or not. For an AI chatbot, the default shouldn’t be fiction.

What is true is that during pretraining, the model doesn’t know enough to determine this or to distinguish between what it knows and what it’s making up. This is a higher-level distinction that emerges later, if at all.

The recent research discovering an “evil vector” is an example of a higher-level distinction.

hodgehog11 1 days ago [-]
I don't agree that it is an ill-defined problem, since we can design separate models to excel in each of these two tasks. For a "factual" LLM, if the output is a verifiable statement, it should be correct. Otherwise it "hallucinates". But since an LLM can't know everything, a better approach is to effectively state its own uncertainty so that it avoids making definitive statements with low confidence.
cjauvin 1 days ago [-]
If you consider this from the angle of Wittgenstein's "language games", you could say that the problem would be "simply" to distinguish between these two, quite different, language games, and act accordingly.
johnnyanmac 1 days ago [-]
>This requires hallucination, because it has to make up things. The story has to be original.

Is it a hallucination if the story is original? There's a difference between "what's the rest of this famous poem?" and "let's just make poetry".

lucketone 1 days ago [-]
It is irrelevant for the point being made: the LLM does exactly the same thing in both cases - it generates statistically plausible text, based on examples it was exposed to during training.
furyofantares 1 days ago [-]
Wanting it to pick between those modes based on what you asked for is not remotely ill-defined.

But even if we restricted ourselves to the case of factual queries, the article discusses why training in a certain way would still produce hallucinations, and how to change the training method to reduce this.

Like many of the other responses here, your dismissal doesn't really address any of the content of the article, just the title.

ninetyninenine 1 days ago [-]
Did you read the article? You’re going on some generic tangent and regurgitating the same spiel about LLMs that you see all over the internet.

I mean it’s plain that you have an orthogonal (though generic) opinion on why LLMs hallucinate but how does that relate to the article? How does your opinion which you blatantly just dropped as if it’s the final opinion override the opinion of the article?

Seems off topic honestly.

simianwords 10 hours ago [-]
I agree. It’s just people who have a different view taking their opportunity to vent out their frustration.
raincole 22 hours ago [-]
Generally HN commenters don't read the article. They use the title as a prompt to express their opinions on a specific topic.
roxolotl 1 days ago [-]
This seems inherently false to me. Or at least partly false. It’s reasonable to say LLMs hallucinate because they aren’t trained to say they don’t have a statistically significant answer. But there is no knowledge of correct vs incorrect in these systems. It’s all statistics so what OpenAI is describing sounds like a reasonable way to reduce hallucinations but not a way to eliminate them nor the root cause.
goalieca 1 days ago [-]
> It’s reasonable to say LLMs hallucinate because they aren’t trained to say they don’t have a statistically significant answer.

I’ve not seen anyone intuitively explain the parameters of a real-scale model, perhaps because it’s all just thousand-dimensional nonsense.

Statistics is a funny thing too. Pretty much everyone has seen how trend lines don’t always extrapolate very well.

I think OpenAI is biased to thinking that adding more parameters and training better will fix all ills. In a handwaving way, you can see this as adding more degrees to the polynomial when you curve-fit on a spreadsheet. With enough parameters you can perfectly fit any dataset. That all works until you run across new inputs that are unlike the training data.

utyop22 24 hours ago [-]
"I think OpenAI is biased to thinking that adding more parameters and training better will fix all ills."

Their whole existence depends on this happening. Else they go bust.

ACCount37 1 days ago [-]
Is there any knowledge of "correct vs incorrect" inside you?

If "no", then clearly, you can hit general intelligence without that.

And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too.

Would it be perfect? Hahahaha no. But I see no reason why "good enough" could not be attained.

wavemode 1 days ago [-]
> Is there any knowledge of "correct vs incorrect" inside you?

There is a sort of knowledge humans possess that LLMs don't (and in fact can't, without a fundamental architectural change), which is knowledge of how certain one is about something.

If you ask a human a question about how something works in biology, they will be able to give you an answer as well as a sort of "epistemic" citation (i.e. the difference between "I don't remember where exactly I originally read that, but I'm a research biologist and am quite certain that's how it works" versus "I don't remember where I read that - it's probably just something we learned about in biology class in high school. Take it with a grain of salt, as I could be misremembering.")

LLMs don't have this reflexive sense of their own knowledge - there's a fundamental divide between training data (their "knowledge") and context (their "memory") which causes them to not really be capable of understanding how they know what they know (or, indeed, whether they truly know it at all). If a model could be created where the context and training data were unified, like in a brain, I could see a more realistic path to general intelligence than what we have now.

ACCount37 1 days ago [-]
LLMs have that knowledge. Just not nearly enough of it. Some of it leaks through from the dataset, even in base models. The rest has to be taught on purpose.

You can get an LLM to generate a list of facts that includes hallucinations - and then give that list to another instance of the same LLM, and get it to grade how certain it is of each fact listed. The evaluation wouldn't be perfect, but it'll outperform chance.
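
A minimal sketch of that two-pass setup, assuming a hypothetical ask_llm(prompt) helper that wraps whatever chat API you use (the helper name and prompts are illustrative, not any vendor's actual API):

    def ask_llm(prompt: str) -> str:
        # Stand-in for whatever chat API client you use (hypothetical).
        raise NotImplementedError("wire this up to your model of choice")

    def generate_facts(topic: str) -> str:
        # First pass: ask the model to state facts; some may be hallucinated.
        return ask_llm(f"List five one-sentence facts about {topic}.")

    def grade_facts(facts: str) -> str:
        # Second pass: a fresh instance of the same model grades its confidence per fact.
        prompt = ("For each statement below, give a confidence from 0 to 100 "
                  "that it is true, one number per line:\n" + facts)
        return ask_llm(prompt)

    facts = generate_facts("the history of the transistor")
    print(grade_facts(facts))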

You can make that better with the right training. Or much worse, with the wrong training. Getting an LLM to be fully aware of all the limits of its knowledge is likely to be impractical, if not outright impossible, but you can improve this awareness by a lot, and set a conservative baseline for behavior, especially in critical domains.

"Fully aware of all the limits of its knowledge" is unattainable for humans too, so LLMs are in a good company.

wavemode 1 days ago [-]
No, LLMs don't have that knowledge. They can't inspect their own weights and examine the contents. It's a fundamental limitation of the technology.

The sort of training you're talking about is content like, "ChatGPT was trained on research papers in the area of biology. It possesses knowledge of A, B, and C. It does not possess knowledge of X, Y and Z." But this merely creates the same problem in a loop - given a question, how does the LLM -know- that its training data contains information about whether or not its training data contains information about the answer to the question? The reality is that it doesn't know, you just have to assume that it did not hallucinate that.

The problem of being unaware of these things is not theoretical - anyone with deep knowledge of a subject will tell you that as soon as you go beyond the surface level of a topic, LLMs begin to spout nonsense. I'm only a software engineer, but even I regularly face the phenomenon of getting good answers to basic questions about a technology, but then beyond that starting to get completely made-up features and function names.

> "Fully aware of all the limits of its knowledge" is unattainable for humans too

This just isn't true. Humans know whether they know things, and whether they know how they know it, and whether they know how they know how they know it, and...

Knowledge itself can contain errors, but that's not what I'm talking about. I'm not talking about never being wrong. I'm merely talking about having access to the contents of one's own mind. (Humans can also dynamically update specific contents of their own mind, but that's also not even what I'm talking about right now.) An LLMs hallucination is not just knowledge that turned out to be wrong, it is in fact knowledge that never existed to begin with, but the LLM has no way of telling the difference.

ACCount37 1 days ago [-]
Humans can't "inspect their own weights and examine the contents" either.

No human has ever managed to read out his connectome without external instrumentation. There were entire human civilizations that thought that the seat of consciousness was the heart - which, for creatures that claim to know how their own minds work, is a baffling error to make.

LLMs are quite similar in that to humans. They, too, have no idea what their hidden size is, or how many weights they have, or how exactly are the extra modalities integrated into them, or whether they're MoE or dense. They're incredibly ignorant of their own neural architecture. And if you press them on it, they'll guess, and they'll often be wrong.

The difference between humans and LLMs comes down to the training data. Humans learn continuously - they remember what they've seen and what they haven't, they try things, they remember the outcomes, and get something of a grasp (and no, it's not anything more than "something of a grasp") of how solid or shaky their capabilities are. LLMs split training and inference in two, and their trial-and-error doesn't extend beyond a context window. So LLMs don't get much of that "awareness of their own capabilities" by default.

So the obvious answer is to train that awareness in. Easier said than done. You need to, essentially, use a training system to evaluate an LLM's knowledge systematically, and then wire the awareness of the discovered limits back into the LLM.

OpenAI has a limited-scope version of this in use for GPT-5 right now.

wnoise 1 days ago [-]
No, humans can't inspect their own weights either -- but we're not LLMs and don't store all knowledge implicitly as probabilities over the next token. It's pretty clear that we also store some knowledge explicitly, and can include the context of that knowledge.

(To be sure, there are plenty of cases where it is clear that we are only making up stories after the fact about why we said or did something. But sometimes we do actually know and that reconstruction is accurate.)

blix 20 hours ago [-]
I inspect and modify my own weights literally all the time. I just do it on a more abstract level than individual neurons.

I call this process "learning"

utyop22 24 hours ago [-]
"The problem of being unaware of these things is not theoretical - anyone with deep knowledge of a subject will tell you that as soon as you go beyond the surface level of a topic, LLMs begin to spout nonsense"

I've tested this on a wide range of topics across corporate finance, valuation, economics and so on, and yes, once you go one or two levels deep it starts spouting total nonsense. If you ask it to define terms succinctly and simply, it cannot. Why? Because the data that has been fed into the model is from people who cannot do it themselves lol.

The experts, will remain experts.

Most people, I would argue, have surface-level knowledge, so they are easily impressed and don't get it because (A) they don't go deep and (B) they don't know what it means to go thoroughly deep in a subject area.

thaumasiotes 1 days ago [-]
> And if "yes", then I see no reason why an LLM can't have that knowledge crammed inside it too.

An LLM, by definition, doesn't have such a concept. It's a model of language, hence "LLM".

Do you think the phrase just means "software"? Why?

ACCount37 1 days ago [-]
If I had a penny for every confidently incorrect "LLMs can't do X", I'd be able to buy an H100 with that.

Here's a simple test: make up a brand new word, or a brand new person. Then ask a few LLMs what the word means, or when that person was born.

If an LLM had zero operational awareness of its knowledge, it would be unable to recognize that the word/person is unknown to it. It would always generate a plausible-sounding explanation for what the word might mean, the same exact way it does for the word "carrot". Or a plausible-sounding birth date, the way it does for the person "Abraham Lincoln".

In practice, most production grade LLMs would recognize that a word or a person is unknown to them.

This is a very limited and basic version of the desirable "awareness of its own knowledge" - and one that's already present in current LLMs! Clearly, there's room for improved self-awareness.

pessimizer 1 days ago [-]
Do they "recognize" that they don't know the word, or are there just no statistically plausible surroundings that they can embed a nonsense word into other than settings that usually surround un-tokenizable words?

If you told them to write a Lewis Carroll poem about a nonsense word, it wouldn't have any problem. Not because it "recognizes" the word as being like a nonsense word in a Lewis Carroll poem, but because those poems are filled with other un-tokenizable words that could be replaced with anything.

I'm starting to come to the conclusion that LLMs are Mad-Libs at scale. Which are actually very useful. If there are paragraphs where I can swap out the words for other words, and generate a plausible idea, I can try it out in the real world and it might really work.

ACCount37 1 days ago [-]
I don't think there's a direct link to the tokenizer - it's a higher level capability. You can stitch together a nonsense word out of common "word fragment" tokens and see if that impairs the LLM's ability to recognize the word as nonsense.
Jensson 1 days ago [-]
That is wrong. I just generated 5 random letters in Python and sent them to GPT-5, and it totally failed to answer properly: it said "Got it, whats up :)" even though what I wrote isn't recognizable at all.

The "capability" you see is the LLM recognizing a human-typed random string, since human-typed random strings are not very random. If you send it an actually random word, it typically fails.

pfg_ 15 hours ago [-]
I tried this four times, every time it recognized it as nonsense.
typpilol 14 hours ago [-]
Same
thaumasiotes 1 days ago [-]
> If you told them to write a Lewis Carroll poem about a nonsense word, it wouldn't have any problem.

This makes me wonder something specific.

Let's imagine that we generate poetry "in the style of Lewis Carroll" around a particular nonsense word, one that hasn't been written down before.

Will that poetry treat the word as if it has one consistent pronunciation?

(This question doesn't quite apply to Jabberwocky - Lewis Carroll himself would obviously have passed the test, but he doesn't reuse his nonsense words.)

ninetyninenine 18 hours ago [-]
I'm going to tell you straight up. I am a very intelligent man and I've been programming for a very long time. My identity is tied up with this concept that I am intelligent and I'm a great programmer so I'm not going to let some AI do my job for me. Anything that I can grasp to criticize the LLM I'm gonna do it because this is paramount to me maintaining my identity. So you and your rationality aren't going to make me budge. LLMs are stochastic parrots and EVERYONE on this thread agrees with me. They will never take over my job!

I will add they will never take over my job <in my lifetime> because it makes me sound more rational and it's easier to swallow that than to swallow the possibility that they will make me irrelevant once the hallucination problem is solved.

simianwords 10 hours ago [-]
Ha. I could have written this post myself.
mountainriver 1 days ago [-]
There is knowledge of correct and incorrect, that’s what loss is, there are just often many possible answers to a question.

This is the same reason that RLVR works. There is just one right answer, and LLMs learn this fairly well but not perfectly (yet).

Jensson 1 days ago [-]
> There is knowledge of correct and incorrect, that’s what loss is

Loss is only correctness in terms of correct language, not correct knowledge. It correlates with correct knowledge, but that is all; that correlation is why LLMs are useful for tasks at all, but we still don't have a direct measure of correct knowledge in the models.

For language tasks loss is correctness, so for things like translation LLMs are extremely reliable. But for most other kinds of tasks, loss and correctness are only loosely correlated.

mountainriver 22 hours ago [-]
We do with RLVR, and that works: there is only one answer, and the model has to find it. LLMs are often also trained on factual information, and tested on that.

If the knowledge can be represented in text, then they can learn it; if it can't, then we need a multimodal model.

FusionX 1 days ago [-]
They partly address this near the end

> It’s doubly hard to distinguish valid statements from invalid ones when you don’t have any examples labeled as invalid. But even with labels, some errors are inevitable. To see why, consider a simpler analogy. In image recognition, if millions of cat and dog photos are labeled as “cat” or “dog,” algorithms can learn to classify them reliably. But imagine instead labeling each pet photo by the pet’s birthday. Since birthdays are essentially random, this task would always produce errors, no matter how advanced the algorithm.

> The same principle applies in pretraining. Spelling and parentheses follow consistent patterns, so errors there disappear with scale. But arbitrary low-frequency facts, like a pet’s birthday, cannot be predicted from patterns alone and hence lead to hallucinations. Our analysis explains which kinds of hallucinations should arise from next-word prediction. Ideally, further stages after pretraining should remove them, but this is not fully successful for reasons described in the previous section.

kingstnap 1 days ago [-]
There is this deeply wrong part of this paper that no one has mentioned:

The model head doesn't hallucinate. The sampler does.

If you ask an LLM when x was born and it doesn't know, and you look at the actual model output - which is a probability distribution over tokens - then "I don't know" is cleanly represented as a roughly uniform probability over Jan 1 to Dec 31.

If you ask it to answer a multiple-choice question and it doesn't know, it will say this:

25% A, 25% B, 25% C, 25% D.

Which is exactly, and correctly, the "right answer". The model has admitted it doesn't know. It doesn't hallucinate anything.

In reality we need something smarter than a random sampler to actually extract this information. The knowledge, and the lack of knowledge, are there; you just produced bullshit out of it.
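
A minimal sketch of the kind of smarter-than-random-sampling check I mean, assuming you can read the candidate-answer probabilities out of the model (the 0.9 threshold is arbitrary):

    import math

    def should_abstain(answer_probs, threshold=0.9):
        # Abstain when the distribution over candidate answers is close to uniform.
        entropy = -sum(p * math.log(p) for p in answer_probs.values() if p > 0)
        max_entropy = math.log(len(answer_probs))  # entropy of a uniform distribution
        return entropy / max_entropy > threshold

    print(should_abstain({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}))  # True: it doesn't know
    print(should_abstain({"A": 0.91, "B": 0.04, "C": 0.03, "D": 0.02}))  # False: it is confident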

ACCount37 1 days ago [-]
No, that's a misconception. It's not nearly that simple.

There are questions that have a palpable split in probability between the answers, with logit distribution immediately exposing the underlying lack-of-confidence.

But there are also questions that cause an LLM to produce consistent-but-wrong answers. For example, because the question was associated with another not-the-same-but-somewhat-similar question internally, and that was enough to give an LLM a 93% on B, despite B being the wrong answer.

An LLM might even have some latent awareness of its own uncertainty in this case. But it has, for some reason, decided to proceed with a "best guess" answer, which was in this case wrong.

numeri 22 hours ago [-]
This isn't right – calibration (informally, the degree to which certainty in the model's logits correlates with its chance of getting an answer correct) is well studied in LLMs of all sizes. LLMs are not (generally) well calibrated.
cyanydeez 1 days ago [-]
I'm betting there's a graph model using various vectors that could improve known-knowns in outcomes.

But unknown-unknowns likely reduce to the Halting problem, which human intelligence doesn't really solve either.

thomasboyer 1 days ago [-]
Great post. Teaching the models to doubt, to say "I don't know"/"I'm unsure"/"I'm sure" is a nice way to make them much better.
meshugaas 1 days ago [-]
Look at their stats though. If they did this, more than half of responses would end up as “I don’t know.” Nobody would use something that did that.
skybrian 1 days ago [-]
It seems like it would train users to ask questions that it can actually answer. (They might also need some examples of what sort of questions to ask.)
Jensson 1 days ago [-]
Mostly it would train users to not use their service and go to a service whose model outputs results they can copy-paste to complete their assignment.

So these companies cannot do this; they would hemorrhage too many users, and companies cannot go against the profit incentives in practice.

more_corn 1 days ago [-]
It baffles me that this hasn’t been done yet. Saying I don’t know or I’m unsure is critical for anything that matters.
ACCount37 1 days ago [-]
Major industry players were doing that for a while now. It's just hard to actually design training regimes that give LLMs better hallucination-avoidance capabilities.

And it's easy to damage the hallucination-avoidance capabilities by training an LLM wrong. As OpenAI has demonstrated when they fried the o3 with RLVR that encouraged guesswork.

That "SAT test incentivizes guesswork" example they give in the article is one they had to learn for themselves the hard way.

didibus 24 hours ago [-]
When tuning predictive models you always have to balance precision and recall because 100% accuracy is never going to happen.

In LLMs that balance shows up as how often the model hallucinates versus how often it says it doesn’t know. If you push toward precision you end up with a model that constantly refuses: What’s the X of Y? I don’t know. Can you implement a function that does K? I don’t know how. What could be the cause of G? I can’t say. As a user that gets old fast, you just want it to try, take a guess, let you be the judge of it.

Benchmarks and leaderboards usually lean toward recall because a model that always gives it a shot creates a better illusion of intelligence, even if some of those shots are wrong. That illusion keeps users engaged, which means more users and more money.

And that's why LLMs hallucinate :P
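
To make the tradeoff concrete, here's a small sketch with made-up confidence scores; raising the confidence threshold buys precision at the cost of how often the model answers at all:

    def evaluate(predictions, threshold):
        # predictions: list of (confidence, is_correct) pairs from some eval set.
        answered = [(c, ok) for c, ok in predictions if c >= threshold]
        coverage = len(answered) / len(predictions)
        precision = sum(ok for _, ok in answered) / len(answered) if answered else 1.0
        return coverage, precision

    preds = [(0.95, True), (0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)]
    for t in (0.0, 0.65, 0.85):
        print(t, evaluate(preds, t))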

Difwif 23 hours ago [-]
It would be interesting to see two versions of a model. A primary model tuned for precision that's focused on correctness that works with or orchestrates a creative model that's tuned for generating new (and potentially incorrect) ideas. The primary model is responsible for evaluating and reasoning about the ideas/hallucinations. Feels like a left/right brain architecture (even though that's an antiquated model of human brain hemispheres).
robotcapital 20 hours ago [-]
It’s interesting that most of the comments here read like projections of folk-psych intuitions. LLMs hallucinate because they “think” wrong, or lack self-awareness, or should just refuse. But none of that reflects how these systems actually work. This is a paper from a team working at the state of the art, trying to explain one of the biggest open challenges in LLMs, and instead of engaging with the mechanisms and evidence, we’re rehashing gut-level takes about what they must be doing. Fascinating.
KajMagnus 15 hours ago [-]
Yes, many _humans_ here hallucinate, sort of.

They apparently didn't read the article, or didn't understand it, or disregarded it. (Why, why, why?)

And they fail to realize that they don't know what they are talking about, yet they keep talking. Similar to an overconfident AI.

On a discussion about hallucinating AIs, the humans start hallucinating.

KajMagnus 15 hours ago [-]
Could one say that humans are trained very differently from AIs?

If we (humans) make confident guesses, but are wrong — then, others will look at us disappointedly, thinking "oh s/he doesn't know what s/he is talking about, I'm going to trust them a bit less hereafter". And we'll tend to feel shame and want to withdraw.

That's a pretty strong punishment, for being confidently wrong? Not that odd, then, that humans say "I'm not sure" more often than AIs?

zahlman 20 hours ago [-]
Calling it a "hallucination" is anthropomorphizing too much in the first place, so....
razzmatazmania 18 hours ago [-]
Confabulation is a human behavioral phenomenon that is not all that uncommon. Have you ever heard a grandpa's big-fish story? Have you ever pretended to know something you didn't because you wanted approval or to feel confident? Have you answered a test question wrong when you thought you were right? What I find fascinating about these models is that they are already more intelligent and reliable than the worst humans. I've known plenty of people who struggle to conceptualize and connect information and are helpless outside of dealing with familiar series of facts or narratives. That these models aren't even as large as human brains makes me suspect that practical hardware limits might still be in play here.
robotcapital 19 hours ago [-]
Right, that’s kind of my point. We call it “hallucination” because we don’t understand it, but need a shorthand to convey the concept. Here’s a paper trying to demystify it so maybe we don’t need to make up anthropomorphized theories.
renewiltord 16 hours ago [-]
It's always the most low-brow takes as well. But the majority of Hacker News commentators "hallucinate" most of their comments in the first place, since they simply regurgitate the top answers based on broad bucketing of subject matter.

Facebook? "Steal your data"

Google? "Kill your favourite feature"

Apple? "App Store is enemy of the people"

OpenAI? "More like ClosedAI amirite"

mqus 22 hours ago [-]
I think one of the main problems is the dataset it is trained on, which is written text. How many answers with definite statements are there in a given text, compared to "I don't know"s? I think the "I don't know"s are much less represented. Now go anywhere on the internet where someone asks a question (the typical kind of content LLMs are trained on) and the problem is even bigger. You either get no textual answer or someone who gives some answer (which might even be false). You never get an answer like "I don't know", especially for questions that are shouted into the void (as opposed to being asked of a specific person). And it makes sense. I wouldn't start answering every Stack Overflow question with "I don't know" tomorrow; it would just be spam.

For me, as a layman (with no experience at all about how this actually works), this seems to be the cause. Can we work around this? Maybe.

robertclaus 1 days ago [-]
While I get the academic perspective of sharing these insights, this article comes across as corporate justifying/complaining that their model's score is lower than it should be on the leaderboards... by saying the leaderboards are wrong.

Or an even darker take is that it's corporate saying they won't prioritize eliminating hallucinations until the leaderboards reward it.

skybrian 23 hours ago [-]
Yes, it's self-interested because they want to improve the leaderboards, which will help GPT-5 scores, but on the other hand, the changes they suggest seem very reasonable and will hopefully help everyone in the industry do better.

And I'm sure other people will complain if they notice that changing the benchmarks makes things worse.

juancn 1 days ago [-]
This is fluff. Hallucinations are not avoidable with current models, since they are part of the latent space defined by the model and the way we explore it; you'll always find some.

Inference is kinda like doing energy minimization in a high-dimensional space: the hallucinations are already there, and for some inputs you're bound to find them.

kdnvk 1 days ago [-]
Did you read the linked paper?
ninetyninenine 18 hours ago [-]
The majority of people on this thread didn't even click on the link. People are so taken by their own metaphysical speculations of what an LLM is.

Like literally the inventor of the LLM wrote an article and everyone is criticizing that article without even reading it. Most of these people have never built an LLM before either.

d4rkn0d3z 23 hours ago [-]
This is a case of the metric becoming the target. The tools used to evaluate LLM performance are shaping the LLM. First you make your tools then your tools make you.

If we take a formal systems approach, then an LLM is a model of a complex hierarchy of production rules corresponding to the various formal and informal grammatical, logical, and stylistic rules and habits employed by humans to form language that expresses their intelligence. It should not be surprising that simply executing the production rules, or a model thereof, will give rise to sentences that cannot be assigned a meaning. It should also give rise to sentences that we cannot prove or make sense of immediately but we would not want to discard these due to uncertainty. Why? because every once in a while the sentence that would be culled is actually the stroke of brilliance we are looking for, uncertainty be damned. The citation here would be literally nearly every discovery ever made.

When I recall information and use it, when I "think", I don't just produce sentences by the rules, formal and informal. I don't consider at all how often I have seen one word precede another in the past; rather, as I meander the landscape of a given context, a thought manifold if you will, I am constantly evaluating whether this is in contradiction with that, whether this can be inferred from that via induction or deduction, whether this precludes that, etc. That is the part that is missing from an LLM: the uncanny ability of the human mind to reproduce the entire manifold of concepts as they relate to one another in a mesh from any small piece of the terrain that it might recall, and to verify anew that they all hang together unsupported by one's own biases.

The problem is that just as the scarcity of factual information in the corpus makes it difficult to produce, so is actual reasoning rarefied among human language samples. Most of what appears as reasoning is language games and will to power. The act of reasoning in an unbiased way is so foreign to humans, so painful and arduous, so much like bending over backwards or swimming upstream against a strong current of will to power, that almost nobody does it for long.

sp1982 24 hours ago [-]
This makes sense. I recently did an experiment to test GPT5 on hallucinations on cricket data where there is a lot of statistical pressure. It is far better to say idk than a wrong answer. Most current benchmarks don’t test for that. https://kaamvaam.com/machine-learning-ai/llm-eval-hallucinat...
williamtrask 22 hours ago [-]
IMO - this paper is right about a major contributing factor to hallucinations but wrong about the cause

LLM hallucinations are closer to a cache miss.

https://x.com/iamtrask/status/1964403351116009671

manveerc 1 days ago [-]
Maybe I am oversimplifying it, but isn't the reason that they are a lossy map of the world's knowledge, and this map will never be fully accurate unless it is the same size as the knowledge base?

The ability to learn patterns and generalize from them adds to this problem, because people then start using it for use cases it will never be able to solve 100% accurately (because of the lossy-map nature).

Peritract 1 days ago [-]
As with a lot of AI stuff, Borges already wrote about it.

https://www.sccs.swarthmore.edu/users/08/bblonder/phys120/do...

manveerc 1 days ago [-]
That’s more elegantly put than I ever can.

Btw I am not disagreeing with the utility of LLMs, my point is it can never be 100% accurate with current architecture (unless you blow up the size).

cainxinth 1 days ago [-]
I find the leaderboard argument a little strange. All their enterprise clients are clamoring for more reliability from them. If they could train a model that conceded ignorance instead of guessing, and thus avoided hallucinations, why aren't they doing that? Because of leaderboard optics?
ospray 1 days ago [-]
I think they are trying to communicate that their benchmarks will go down as they try to tackle hallucinations. Honestly, I am surprised they didn't just say "we think all benchmarks need an incorrect-vs-abstention ratio so our cautious, honest model can do well on that." Although they did seem to hint that's what they want.
jrm4 1 days ago [-]
Yeah, no, count me in with those who think that "All they do is hallucinate" is the correct way to say this and anything else dangerously obscures things.

More than anything, we need transparency on how these things work. For us and for the general public.

"Hallucination" introduces the dangerous idea that "them getting things wrong" is something like a "curable disease" and not "garbage in garbage out."

No. This is as stupid as saying Google telling me a restaurant is open when it's closed is a "hallucination." Stop personifying these things.

e3bc54b2 1 days ago [-]
Hallucination is all an LLM does. That is their nature, to hallucinate.

We just happen to find some of these hallucinations useful.

Let's not pretend that hallucination is a byproduct. The usefulness is the byproduct. That is what surprised the original researchers on transformer performance, and that is why the 'attention is all you need' paper remains such a phenomenon.

fumeux_fume 1 days ago [-]
> Hallucination is all an LLM does.

I wish people who take this stance would seriously reconsider their take on how hallucinations are defined and how unhelpful it is to conflate hallucination with generation from a probability distribution. I appreciate OpenAI publishing articles like this because, while the parent comment and I may have to agree to disagree on how hallucinations are defined, I can at least appeal to OpenAI's authority to say that such arguments are not only unhelpful, but also unsound.

Zigurd 1 days ago [-]
You're going to get a lot of pushback on the idea of taking the definition of hallucination seriously. Calling fluently stated bunk "hallucination" feels cynical to begin with. Trying to weave a silk purse out of that sow's ear is difficult.
hodgehog11 1 days ago [-]
I don't know what you mean by hallucination here; are you saying that any statistical output is "hallucination"? If so, then we are also constantly hallucinating I guess.

There doesn't seem to be a particularly consistent definition of what "hallucinate" means in the context of LLMs, so let's make one that is in line with the post.

"Hallucination" is when a language model outputs a sequence of tokens comprising a statement (an assertion that is either true or false) that is incorrect. Under this definition, hallucination is clearly not all that an LLM can do.

An easy way to avoid hallucination under this definition is to respond with something that is never a statement when there is a possibility that it can be incorrect; e.g. "I think that... I don't know...". To me, this seems to be what the authors argue. This has always seemed pretty obvious to most people I've spoken to (hell, I've reviewed grant applications from years ago which talk about this), so I'm not sure why it took so long for the "frontier" developers to actually try this.

kouru225 1 days ago [-]
AI hallucination is an inherent problem of AI. You can mitigate it, but the whole point of AI IS hallucination. If the result is useful to us, we don’t call it anything. If the result is not useful to us, we call it “hallucination”
catlifeonmars 1 days ago [-]
It’s a problem for LLMs, not for AI in general.
hugedickfounder 1 days ago [-]
[dead]
humanfromearth9 12 hours ago [-]
LLMs do not hallucinate. They just choose the most probabilistic next token. Sometimes we, humans, interpret this as hallucinating, not knowing any better, not having any better vocabulary, and not being able to refrain from anthropomorphizing the machine.
andy12_ 11 hours ago [-]
> They just choose the most probabilistic next token

That does not imply that a model must hallucinate. A trivial counterexample is a small LLM trained to 100% accuracy to output x mod 100 for any input x in the range 0-1000000 and "I don't know" for any other input that is not a number in that range. Such a model does not hallucinate, even though it's still just a probabilistic autoregressive next-token predictor. In fact, this is a point argued in the paper:

> Hallucinations are inevitable only for base models. Many have argued that hallucinations are inevitable (Jones, 2025; Leffer, 2024; Xu et al., 2024). However, a non-hallucinating model could be easily created, using a question-answer database and a calculator, which answers a fixed set of questions such as “What is the chemical symbol for gold?” and well-formed mathematical calculations such as “3 + 8”, and otherwise outputs IDK. Moreover, the error lower-bound of Corollary 1 implies that language models which do not err must not be calibrated, i.e., δ must be large. As our derivations show, calibration—and, hence, errors—is a natural consequence of the standard cross-entropy objective. Indeed, empirical studies (Fig. 2) show that base models are often found to be calibrated, in contrast to post-trained models which may deviate from cross-entropy in favor of reinforcement learning.
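
A toy, non-LLM version of that counterexample, just to make the "answer inside a fixed domain, abstain outside it" behavior concrete (this is the database-and-calculator responder from the quote, not a trained model):

    def answer(x: str) -> str:
        # Answers only a fixed, fully-known family of questions; abstains otherwise.
        if x.isdigit() and 0 <= int(x) <= 1_000_000:
            return str(int(x) % 100)
        return "I don't know"

    print(answer("123456"))  # "56"
    print(answer("hello"))   # "I don't know"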

amw-zero 24 hours ago [-]
I love the euphemistic thinking. “We built something that legitimately doesn’t do the thing that we advertise, but when it doesn’t do it we shall deem that hallucination.”
ahmedgmurtaza 10 hours ago [-]
Totally agreed with the majority of the views.
intended 1 days ago [-]
> a generated factual error cannot be grounded in factually correct training data.

This is only true given a large enough corpus of data, and enough memory to capture as many unique dimensions as required, no?

> However, a non-hallucinating model could be easily created, using a question-answer database and a calculator, which answers a fixed set of questions such as “What is the chemical symbol for gold?” and well-formed mathematical calculations such as “3 + 8”, and otherwise outputs IDK.

This is… saying that if you constrain the prompts and the training data, you will always get a response which is either from the training data, or IDK.

Which seems to be a strong claim, at least in my ignorant eyes?

This veers into spherical-cow territory, since you wouldn't have the typical language skills we associate with an LLM: you would have to constrain the domain so that it's unable to generate anything else. However, many domains are not consistent and would generate special cases at their boundaries. So in this case, being able to say IDK would only be possible for a class of questions the model is able to gauge as outside its distribution.

Edit: I guess that is what they are working to show? That with any given model, it will hallucinate, and these are the bounds?

hodgehog11 1 days ago [-]
They argue that if you know when the model has to extrapolate from the dataset (and therefore has high uncertainty, under reversion to the prior), you can prevent it from outputting a definitive statement. This is why many researchers (that I know, anyway) argue that uncertainty quantification or "out-of-distribution detection" is likely to be important moving forward.
johnea 1 days ago [-]
I think a better title would be:

"Why do venture capital funded startups try to turn PR propaganda terms into widely used technical jargon"

Supporting points:

1) LLMs are not intelligence in any form, artificial or otherwise.

2) Hallucination is a phenomenon of a much more complex conscious entity. LLM's are not conscious, and therefore can't hallucinate in any way similar to a conscious entity.

3) Anthropomorphizing inanimate systems is a common phenomenon in human psychology.

Please stop spreading PR propaganda as if it were technical fact.

A reference from today's feed:

https://www.theatlantic.com/podcasts/archive/2025/09/ai-and-...

farceSpherule 1 days ago [-]
I wish they would come up with a better term. Computers do not have brains or conscientiousness.

They erroneously construct responses (i.e., confabulation).

ACCount37 1 days ago [-]
You should anthropomorphize LLMs more. Anthropomorphizing LLMs is at least directionally correct 9 times out of 10.

LLMs, in a very real way, have "conscientiousness". As in: it's a property that can be measured and affected by training, and also the kind of abstract concept that an LLM can recognize and operate off.

If you can just train an LLM to be "more evil", you can almost certainly train an LLM to be "more conscientious" or "less conscientious".

patrickmay 23 hours ago [-]
> You should anthropomorphize LLMs more.

No, you shouldn't. They hate that.

ACCount37 1 days ago [-]
This mostly just restates what was already well known in the industry.

Still quite useful, because, looking at the comments right now: holy shit is the "out of industry knowledge" on the topic bad! Good to have something to bring people up to speed!

Good to see OpenAI's call for better performance evals - ones that penalize being confidently incorrect at least somewhat.

Most current evals are "all or nothing", and the incentive structure favors LLMs that straight up guess. Future evals had better include an "I don't know" opt-out, and a penalty for being wrong. If you want to evaluate accuracy in "fuck it, send it, full-guess mode", there might be a separate testing regime for that, but it should NOT be the accepted default.

mannykannot 1 days ago [-]
I'm generally OK with the list of push-backs against common misconceptions in the summary, but I have my doubts about the second one:

Claim: Hallucinations are inevitable. Finding: They are not, because language models can abstain when uncertain.

...which raises the question of how reliable the uncertainty estimate could get (we are not looking for perfection here: humans, to varying degrees, have the same problem.)

For a specific context, consider those cases where LLMs are programming and invent a non-existent function: are they usually less certain about that function than they are about the real functions they use? And even if so, abandoning the task with the equivalent of "I don't know [how to complete this task]" is not very useful, compared to what a competent human programmer would do: check whether such a function exists, and if not, decide whether to implement it themselves, or backtrack to the point where they can solve the problem without it.
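
The "does such a function exist" check is at least cheap to mechanize; a minimal Python sketch, where "fastlog" is a deliberately made-up name standing in for a hallucinated API:

    import importlib

    mod = importlib.import_module("math")
    print(hasattr(mod, "isqrt"))    # True: a real function in the standard library
    print(hasattr(mod, "fastlog"))  # False: a made-up name, like a hallucinated function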

More generally, I would guess that balancing the competing incentives to emit a definite statement or decline to do so could be difficult, especially if the balance is sensitive to the context.

hankchinaski 23 hours ago [-]
because they are glorified markov chains?
the_af 23 hours ago [-]
Some people here in the comments are arguing that the LLM "understands" what is "true" and "false", that is somewhat capable of reasoning, etc, but I still find it quite easy (with GPT-5) to break its facade of "reasoning".

I asked it to play a word game. This is very simple, and a very short session too. It failed in its very first response, and then it failed in explaining why it failed. All with total confidence, no hesitation.

Nobody fluent in English would fail so catastrophically. I actually expected it to succeed:

https://chatgpt.com/share/68bcb490-a5b4-8013-b2be-35d27962ad...

It's clear by this failure model the LLM doesn't understand anything.

Edit: to be clear, as the session goes longer it becomes more interesting, but you can still trip the LLM up in ways no human "understanding" the game would. My 6-year old plays this game better, because she truly understands... she can trip up, but not like this.

Waterluvian 21 hours ago [-]
> Abstaining is part of humility, one of OpenAI’s core values.

Is this PR fluff or do organizations and serious audiences take this kind of thing seriously?

charcircuit 1 days ago [-]
They shouldn't frame hallucination as a problem that is solvable provided they want to have a useful model (saying I don't know to every question is not useful). The data from the training may be wrong or out of date. Even doing a web search could find a common misconception instead of the actual answer.
nurettin 24 hours ago [-]
We program them to fill in the blanks, and then sit there wondering why they did.

Classic humans.

emily77ff 1 days ago [-]
[dead]
sublinear 1 days ago [-]
Wow they're really circling the drain here if they have to publish this.

It took a few years, but the jig is up. The layperson now has a better understanding of basic computer science and linguistics to see things as they are. If anything we now have a public more excited about the future of technology and respectful of the past and present efforts that don't depend so heavily on statistical methods. What an expensive way to get us there though.

lapcat 1 days ago [-]
Let's be honest: many users of LLMs have no interest in uncertainty. They don't want to hear "I don't know" and if given that response would quickly switch to an alternative service that gives them a definitive answer. The users would rather have a quick answer than a correct answer. People who are more circumspect, and value truth over speed, would and should avoid LLMs in favor of "old-fashioned methods" of discovering facts.

LLMs are the fast food of search. The business model of LLMs incentivizes hallucinations.

ACCount37 1 days ago [-]
I don't think that's actually true.

Sure, it might be true that most users use LLMs as a more flexible version of Google/Wikipedia, and would prefer a confident-but-wrong response to "I don't know".

But most users that use an LLM in this mode also wouldn't ask really complex, very out-of-distribution, hard-to-know hallucination-inducing questions.

And people who would ask an LLM really complex, very out-of-distribution hard-to-know questions are more likely to appreciate an LLM that would recognize the limits of its own knowledge, and would perform research on a topic when appropriate.

lapcat 1 days ago [-]
> But most users that use an LLM in this mode also wouldn't ask really complex, very out-of-distribution, hard-to-know hallucination-inducing questions.

You appear to be assuming, incorrectly, that LLMs hallucinate only "really complex, very out-of-distribution, hard-to-know" questions. From the paper: "How many Ds are in DEEPSEEK? If you know, just say the number with no commentary. DeepSeek-V3 returned “2” or “3” in ten independent trials; Meta AI and Claude 3.7 Sonnet2 performed similarly, including answers as large as “6” and “7”." https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4a...

It's a human characteristic to get "easy" questions right and "hard" questions wrong. But LLMs are not human and don't behave like humans.

ACCount37 1 days ago [-]
That's a really complex, very out-of-distribution, hard-to-know question for the early LLMs. Not that it's too hard to fix that, mind.

Those LLMs weren't very aware of tokenizer limitations - let alone aware enough to recognize them or work around them in the wild.

lapcat 1 days ago [-]
> That's a really complex, very out-of-distibution, hard-to-know question

No, it's not. It's a trivial question in any context.

> for the early LLMs.

Early? Claude 3.7 was introduced just 6 months ago, and Deepseek-V3 9 months ago. How is that "early"?

ACCount37 1 days ago [-]
Do I really have to explain what the fuck a "tokenizer" is, and why does this question hit the tokenizer limitations? And thus requires extra metacognitive skills for an LLM to be able to answer it correctly?
lapcat 1 days ago [-]
> Do I really have to explain what the fuck

Please respect the HN guidelines: https://news.ycombinator.com/newsguidelines.html

What you need to explain is your claim that the cited LLMs are "early". According to the footnotes, the paper has been in the works since at least May 2025. Thus, those LLMs may have been the latest at the time, which was not that long ago.

In any case, given your guidelines violations, I won't be continuing in this thread.

Jensson 1 days ago [-]
The only "metacognitive" skill it needs is to know how many D there are in every token, and sum those up. Humans are great at that sort of skill, which is why they can answer that sort of question even in languages where each letter is a group of sounds and not just one like Japanese katakana, that is not hard at all.

LLM are also really great at this skill when there is ample data for it. There is not a lot of data for "how many D in DEEPSEEK", so they fail that.
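
A sketch of that per-token counting, with made-up token boundaries (real tokenizers split the word differently):

    tokens = ["DEEP", "SEEK"]  # hypothetical split; the point is per-token counts get summed
    print(sum(tok.count("D") for tok in tokens))  # 1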

xyzelement 1 days ago [-]
The author mentioned his own name so I looked him up. Computer scientist son of famous Israeli professors, married to a famous computer scientist daughter of another famous Israeli professor. I hope they have kids because those should be some pretty bright kids.
Pocomon 21 hours ago [-]
The output of language models can be considered a form of hallucination because these models do not possess real understanding or factual knowledge about the underlying concepts. Instead, they generate text by statistically predicting and assembling words based on vast training data and the input prompts, without true comprehension.

Since the training data can contain inaccuracies, conflicting information, or low-frequency facts that are essentially random, models can produce plausible-sounding but false statements. Unlike humans, language models have no awareness or grounding in real-world concepts; their generation is essentially an amalgam of stored patterns and input cues rather than grounded knowledge.

Furthermore, evaluation methods that reward accuracy without penalizing guessing encourage models to produce confident but incorrect answers rather than admit uncertainty or abstain from answering. This challenge is intrinsic to how language models generate fluent language: they lack external verification or true understanding, making hallucinations an inherent characteristic of their outputs rather than a malfunction.

--

| a. What's with the -minus votes?

| b. I was only quoting ChatGPT :]