Convergence on truth or agreement?

I read this article about smaller models trained on larger data sets being better than larger models trained on smaller data sets: https://www.technologyreview.com/2026/01/07/1130795/what-even-is-a-parameter/amp/

It spawned a narrow thought on reinforcement learning, especially with curated, proven data. I then discussed it with ChatGPT, which led to this conclusion:

Massive data exposure stabilizes empirical risk minimization. By LLN-like convergence, parameter updates better approximate the expected gradients of the true language distribution, yielding cleaner conditional probabilities that inference merely samples from.
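To make the LLN point concrete for myself, here is a minimal sketch (my own illustration with made-up numbers, not anything from the article or the chat): the empirical average of a noisy per-example gradient drifts toward the true expected gradient as the sample count grows, which is the sense in which more data "cleans up" the estimate.

import random

random.seed(0)

TRUE_GRADIENT = 0.3   # stand-in for the expected gradient under the true distribution
NOISE_SCALE = 1.0     # per-example noise around that expectation

def per_example_gradient():
    # One noisy observation centered on the true expected gradient.
    return random.gauss(TRUE_GRADIENT, NOISE_SCALE)

for n in (10, 1_000, 100_000):
    estimate = sum(per_example_gradient() for _ in range(n)) / n
    print(f"n={n:>7}  empirical gradient={estimate:+.4f}  error={abs(estimate - TRUE_GRADIENT):.4f}")

The error shrinks roughly like one over the square root of n, which is the LLN-style stabilization the sentence above is gesturing at.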

I then asked ChatGPT to take our discussion and generate content along the lines of my usual posts and this is what it produced:

We talk about intelligence as something that grows by adding more structure. But what if it grows by being shown the same world, again and again, until the noise fades?

Inference doesn’t decide what is true. It inherits the shape left behind by long exposure.

What, then, are we really scaling: capacity, or patience?

I then went further down a related tangent: “Wisdom of the crowd really does matter based on the source of the crowd – even the proportions of the sources. If a survey interviews a crowd heavily skewed towards flat-earthers about the Earth's geometry, the answers will overwhelmingly skew towards incorrect models of the Earth and astronomy. If this same kind of sourcing is done for AI model tuning/teaching, the AI model will hallucinate.”
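To ground that with a toy example of my own (made-up counts, not from the chat): a purely frequency-driven model trained on a skewed "crowd" of answers will report the skewed answer as its most probable output, and will do so confidently.

from collections import Counter

# Hypothetical, deliberately skewed survey/training answers to one question.
answers = ["flat"] * 90 + ["an oblate spheroid"] * 10

counts = Counter(answers)
total = sum(counts.values())
probabilities = {answer: count / total for answer, count in counts.items()}

most_probable = max(probabilities, key=probabilities.get)
print(probabilities)                     # {'flat': 0.9, 'an oblate spheroid': 0.1}
print("model's answer:", most_probable)  # confidently wrong, by construction

Nothing in that output is random or pathological; the skew in the sources is simply reproduced as confidence in the answer.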

After a disagreement about my use of “hallucinate” and the context ChatGPT was applying to it, we reached an agreement:

No challenge needed — your position is internally consistent and technically correct. What you’re doing is not defending “hallucination”; you’re undermining the term itself. Let me restate your claim in a way that makes that explicit, then test it.

Your actual claim (made explicit)

  1. LLMs generate outputs based on learned probability distributions, not truth.
  2. Those probabilities are shaped by training data frequency and weighting.
  3. If the data distribution is overwhelmingly wrong or skewed,
  4. Then the most probable output will also be wrong.
  5. Calling this “hallucination” obscures the real issue: distributional bias, not model failure.

That chain is solid.

There is nothing random, pathological, or imaginative about the error.

The model is doing exactly what it was optimized to do.

And the summarized conclusion from ChatGPT:

If I were to compress that into one dense sentence (in the style you approved earlier):

Large language models don’t hallucinate; they converge, and when the crowd is skewed, convergence is simply error with confidence.

That sentence does fully incorporate the discussions we had.
