Google Research published a paper in Nature Communications showing that LLMs struggle with Bayesian inference, the mathematically optimal way to update beliefs as new evidence arrives. In a controlled flight recommendation task where a model must infer a user’s preferences over multiple rounds, an optimal Bayesian assistant reached 81% accuracy while off-the-shelf LLMs performed considerably worse and plateaued after a single interaction instead of improving with more evidence. The fix: fine-tune LLMs not on the correct answers (“oracle teaching”) but on the Bayesian assistant’s own predictions, including its early wrong guesses. This “Bayesian teaching” approach, a form of distillation, trained the models to maintain uncertainty and update beliefs gradually. The fine-tuned models agreed with optimal Bayesian predictions 80% of the time and, critically, generalized across domains, transferring reasoning learned on synthetic flight data to hotel recommendations and real-world web shopping without retraining.
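The core mechanic here, a Bayesian assistant updating a belief over user preferences round by round, can be sketched in a few lines. This is a toy illustration, not the paper's model: the attribute hypotheses and the likelihood values (0.8 for a choice matching the true preference, 0.1 otherwise) are assumptions made for the example.

```python
import numpy as np

# Hypotheses: which flight attribute the user cares about most.
# These attribute names are illustrative, not the paper's actual feature set.
hypotheses = ["cheapest", "fewest_stops", "shortest_duration"]
prior = np.full(len(hypotheses), 1.0 / len(hypotheses))  # uniform prior

def likelihood(choice_attribute: str, hypothesis: str) -> float:
    """P(user picks the flight best on `choice_attribute` | hypothesis).
    Assumed noise model: users mostly pick according to their true preference."""
    return 0.8 if choice_attribute == hypothesis else 0.1

def update(posterior: np.ndarray, observed_choice: str) -> np.ndarray:
    """One round of Bayes' rule: new posterior is proportional to likelihood x prior."""
    likes = np.array([likelihood(observed_choice, h) for h in hypotheses])
    unnorm = likes * posterior
    return unnorm / unnorm.sum()

belief = prior
for choice in ["fewest_stops", "fewest_stops", "shortest_duration"]:
    belief = update(belief, choice)

# Belief concentrates on "fewest_stops" but keeps some mass on the noisy
# third observation -- uncertainty is maintained, not collapsed.
print(dict(zip(hypotheses, belief.round(3))))
```

The point of contrast with the plateauing LLMs: each additional round sharpens the posterior rather than leaving the guess fixed after one interaction.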
Any system that personalizes recommendations, adapts to user behavior, or makes sequential decisions under uncertainty benefits from better probabilistic reasoning. Current LLMs default to crude heuristics (such as always recommending the cheapest option) instead of tracking individual preferences. Bayesian teaching addresses this with standard supervised fine-tuning; no architectural changes are required.
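What "fine-tuning on the Bayesian assistant's own predictions" might look like as a data pipeline can be sketched as follows. This is a hedged illustration of the distillation idea, not the paper's pipeline: the `bayesian_predict` stand-in, the prompt template, and the record layout are all assumptions made for the example.

```python
# Sketch: build supervised fine-tuning pairs from a Bayesian assistant's own
# predictions at every round -- including the early, uncertain ones -- rather
# than from oracle ground-truth labels.

def bayesian_predict(history: list[str]) -> str:
    """Stand-in for the Bayesian assistant's current best guess given the
    interaction history so far. Here simply the most frequent observed choice;
    the real assistant would report the posterior mode."""
    if not history:
        return "unknown"
    return max(set(history), key=history.count)

def make_teaching_pairs(episode: list[str]) -> list[tuple[str, str]]:
    """One (prompt, target) pair per round. The target is what the Bayesian
    model predicted at that point in the episode, wrong guesses included."""
    pairs = []
    for t in range(len(episode) + 1):
        prompt = f"User choices so far: {episode[:t]}. Predict their preference."
        target = bayesian_predict(episode[:t])
        pairs.append((prompt, target))
    return pairs

pairs = make_teaching_pairs(["fewest_stops", "cheapest", "fewest_stops"])
```

The design choice this illustrates: because targets come from intermediate rounds, the student model sees how beliefs should evolve with evidence, not just the final answer.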
The deeper finding is that teaching models the reasoning process, even when it produces wrong answers early on, beats teaching them correct answers directly. This echoes a pattern emerging across AI: Meta FAIR’s multimodal pretraining research similarly found that training on the right structure matters more than brute-forcing the right outputs. Process over product seems to be a general principle for building models that generalize.
Sources:
- Teaching LLMs to Reason Like Bayesians (Google Research Blog)
- Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models (Nature Communications)
- Bayesian Inference (Wikipedia)
Citation
@misc{kabui2026,
  author = {Kabui, Charles},
  title = {Teaching {LLMs} to {Reason} {Like} {Bayesians}: {Training} on {Process} {Beats} {Training} on {Answers}},
  date = {2026-03-11},
  url = {https://toknow.ai/posts/bayesian-teaching-llm-probabilistic-reasoning-nature-google/},
  langid = {en-GB}
}
