Like many gurus, I have had many students, of every kind. This, however, is about a particular kind of student that every worthy teacher recognises. They are the ones who, when they do not know the answer, do not leave the question blank. They write something. Anything. With the calm authority of someone who has done this before and been rewarded for it.
Your AI assistant is that student. And someone gave it a test that only rewards correct answers and never penalises the wrong ones.
This is not a metaphor. It is, according to OpenAI's own researchers, the precise mechanism by which large language models hallucinate. Language models hallucinate because training and evaluation procedures reward guessing over acknowledging uncertainty. When models are graded only on accuracy — the percentage of questions they get exactly right — they are encouraged to guess rather than say "I don't know." Guessing sometimes works. Admitting ignorance never scores points. The model learned the lesson perfectly.
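The incentive the researchers describe can be made concrete with a few lines of arithmetic. A minimal sketch, with illustrative numbers of my own (nothing here is taken from the paper):

```python
# Expected score for one exam question under different grading schemes.
# Accuracy-only grading sets wrong_penalty = 0, so any nonzero chance
# of being right beats saying "I don't know".

def expected_score(p_correct: float, abstain: bool,
                   wrong_penalty: float = 0.0) -> float:
    """Expected points: +1 for a correct answer, -wrong_penalty for a
    wrong one, 0 for abstaining."""
    if abstain:
        return 0.0  # "I don't know" never scores
    return p_correct * 1.0 - (1 - p_correct) * wrong_penalty

# A model that is only 25% sure of its answer:
guess = expected_score(0.25, abstain=False)   # 0.25 expected points
idk = expected_score(0.25, abstain=True)      # 0.0 expected points
# Under accuracy-only grading, guessing strictly dominates abstaining.

# Penalise confident wrong answers and the incentive flips:
penalised = expected_score(0.25, abstain=False, wrong_penalty=0.5)  # -0.125
```

The lesson the model learns is just this inequality: as long as wrong answers cost nothing, a guess with any chance of success outscores an honest "I don't know".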
The Numbers Behind The Mask
Before we go further, Baba wants you to sit with some data. Not to alarm you. To orient you.
[Chart: hallucination rates by domain. These are not the same tool on different problems; they are different tools.]
Even the latest models have hallucination rates above 15% when asked to analyse provided statements. On legal queries, the situation is considerably worse — a 2024 Stanford University study found that LLMs hallucinated at least 75% of the time about court rulings. The same researchers found that the models collectively invented over 120 non-existent court cases, complete with realistic names, dates, and fabricated legal reasoning.
When models are graded only on accuracy — the percentage of questions they get exactly right — they are encouraged to guess rather than say "I don't know."
— OpenAI Research · arXiv:2509.04664

The medical picture is equally clarifying. A 2025 MedRxiv study found a 64% hallucination rate on complex clinical cases without mitigation prompts. With the best mitigation available, the best-performing model — GPT-4o — still hallucinated nearly one in four times. In business: 47% of enterprise AI users made at least one major business decision based on hallucinated content in 2024. Global financial losses tied to AI hallucinations hit $67.4 billion in 2024.
The Twist Nobody Told You
Here is where it becomes genuinely strange. You might reasonably assume that when a model is wrong, it sounds less certain — hedging, softening, trailing off. A reasonable assumption. Also incorrect.
MIT research from January 2025 found that AI models use more confident language when hallucinating than when stating facts. Models were 34% more likely to use phrases like "definitely," "certainly," and "without doubt" when generating incorrect information.
[Chart: confident-language use when hallucinating vs. stating facts. The more wrong the model is, the more certain it sounds: 34% higher use of confident language when hallucinating.]
Read that again slowly. The more wrong the model is, the more certain it sounds.
This is not a quirk or an edge case. It is structural. When the model reaches the boundary of what it actually knows, it does not stop. It accelerates. It fills the gap with the most statistically plausible next words — which, in the register of a confident assistant, happen to be words like "certainly" and "of course" and "the answer is."
The Courtroom Moment
In February 2024, a customer named Jake Moffatt took Air Canada to the Civil Resolution Tribunal over a bereavement fare. Air Canada's support chatbot had told him he could retroactively request a bereavement discount within 90 days. The actual policy allowed no such thing. The chatbot had hallucinated a policy. Moffatt had relied on it. Air Canada's defence was, essentially, that customers should not trust their chatbot. The tribunal found this unpersuasive. Air Canada was ordered to pay damages and honour the fare.
The chatbot's confidence did not save the airline. It cost them. This is what calibration failure looks like outside the benchmark.
What Calibration Actually Means
[Chart: stated confidence vs. actual accuracy. A calibrated model would follow the diagonal; LLMs consistently overstate confidence, and the gap widens as stated confidence rises.]
A calibrated model is one where the confidence matches the accuracy. When it says it is 90% sure, it is right 90% of the time. When it says it is 60% sure, it is right 60% of the time. The expressed confidence is an honest signal, not a trained performance.
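That definition is simple enough to check in code. A minimal sketch with synthetic data, purely to illustrate what "confidence matches accuracy" means in practice:

```python
# Bucket predictions by stated confidence and compare each bucket's
# average confidence to its actual accuracy. For a calibrated model
# the two numbers match; the difference is the calibration gap.
from collections import defaultdict

def calibration_gaps(preds):
    """preds: list of (confidence, was_correct) pairs.
    Returns {bucket_index: (avg_confidence, accuracy)} per 10% bucket."""
    buckets = defaultdict(list)
    for conf, correct in preds:
        buckets[min(int(conf * 10), 9)].append((conf, correct))
    report = {}
    for b, items in sorted(buckets.items()):
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        report[b] = (avg_conf, accuracy)
    return report

# Calibrated: says 90%, is right 9 times out of 10.
calibrated = [(0.9, True)] * 9 + [(0.9, False)]
# Overconfident: says 90%, is right half the time.
overconfident = [(0.9, True)] * 5 + [(0.9, False)] * 5
```

Run both lists through the function and the calibrated model's bucket reads (0.9, 0.9) while the overconfident one reads (0.9, 0.5). That 0.4 gap, averaged over buckets, is the standard "expected calibration error" measurement.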
The reinforcement learning point is worth pausing on. RLHF — the training method used to make models more helpful and pleasant — teaches models to produce outputs that humans rate positively. Humans, it turns out, tend to rate confident-sounding answers more positively than uncertain ones. So the training process systematically rewarded confidence. The model learned that certainty is what the audience wants. It delivers certainty.
The Good News, Stated Precisely
Baba does not leave you in the fog. The rate of improvement is accelerating: some models reported up to a 64% drop in hallucination rates in 2025. Retrieval-Augmented Generation (RAG) is the most effective technique so far, cutting hallucinations by 71% when used properly. There are now four models with sub-1% hallucination rates on summarisation benchmarks.
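The RAG pattern itself is simple to sketch: fetch relevant passages first, then force the model to answer from those passages and abstain when they fall short. In the sketch below, `search` and `llm` are hypothetical stand-ins for whatever retriever and model client you use, not real library calls:

```python
# Retrieval-Augmented Generation, reduced to its skeleton:
# ground the model in fetched sources rather than memorised training
# data, and give it explicit permission to say "I don't know".

def rag_answer(question: str, search, llm, k: int = 3) -> str:
    passages = search(question, top_k=k)   # retrieval step (hypothetical retriever)
    context = "\n\n".join(passages)
    prompt = (
        "Answer using ONLY the sources below. If they do not contain "
        "the answer, reply exactly: I don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)                     # generation step (hypothetical client)
```

The cut in hallucinations comes from the two constraints working together: the model is anchored to text that actually exists, and abstaining is made a legitimate move instead of a forfeit.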
In December 2024, Google researchers discovered that asking an LLM "Are you hallucinating right now?" reduced hallucination rates by 17% in subsequent responses. It is a strange prompt. It also works, because it activates internal verification processes the model already has but does not routinely engage. The model, it turns out, has some capacity for honesty. It simply was not asked.
What You Can Do Today
Distinguish between domains. The hallucination rate for general knowledge questions is around 9%. For legal queries it can reach 75%. For complex medical questions, 64%. Know which domain you are in before you decide how much to trust the output.
Ask for uncertainty explicitly. Add "If you are not certain, say so" to any consequential prompt. The model has the capacity for epistemic humility. It just needs permission.
Never use a confident AI answer in a high-stakes context without a second source. Not because the model is unreliable. Because the model's confidence is not a signal of reliability. These are different things, and confusing them is the entire problem this piece is about.
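Taken together, the three habits fit in a few lines. A sketch only, with the article's rough per-domain rates as illustrative constants and a hypothetical `llm` client; the 10% trust threshold is my own judgment call, not a standard:

```python
# Know your domain, ask for uncertainty, and flag answers that need
# a second source before anyone acts on them.

DOMAIN_RISK = {            # rough hallucination rates cited above
    "general": 0.09,
    "medical": 0.64,
    "legal":   0.75,
}

def careful_ask(question: str, domain: str, llm) -> dict:
    # Habit 2: grant explicit permission for epistemic humility.
    prompt = question + "\n\nIf you are not certain, say so."
    answer = llm(prompt)
    # Habit 1: know which domain you are in.
    risk = DOMAIN_RISK.get(domain, 0.5)    # unknown domain: assume risky
    return {
        "answer": answer,
        "domain_risk": risk,
        # Habit 3: confidence is not reliability; verify when risk is high.
        "needs_second_source": risk > 0.10,
    }
```

A legal question comes back flagged for verification; a general-knowledge one does not. The wrapper does not make the model more honest. It makes you harder to fool.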
The model cannot tell you when to stop trusting it. That is your job. It always was.