Shrinkage and Empirical Bayes
A week later. Bob had 40 tournaments on a new site and a posterior that barely budged. He wanted to know why the app was "ignoring his results."
Same booth. The waiter was circling again with the combo menu. Tau held up two fingers without looking up — a peace sign or a second "no," unclear. The waiter retreated.
Bob: OK so I played forty tournaments on ACR. €109 turbos. Ran great — 38% ROI over the set. The app shows me at something like 6%. What the hell is that?
Uncle Tau: That's the app not lying to you.
Bob: That's the app ignoring me.
Uncle Tau: No. The app is pulling your posterior toward a prior that's built out of every other player in that format. That pull has a name. It's called shrinkage.
Bob: I don't want to be shrunk. I want the app to show my results.
Uncle Tau: You want the app to show you forty tournaments' worth of variance and call it a skill number. That's not shrinkage. That's a lie with a decimal point. Forty tournaments in a €109 turbo is basically nothing. Your 38% is almost entirely noise. If the app quoted it back at you, it would be agreeing with the noise.
Bob: Fine. But why 6%? Where does that come from?
Uncle Tau: From the population. Every reg in that format, their results, their fields, their rake. The app builds a distribution over "people who sit in €109 ACR turbos" and that's its starting belief about you — before it's seen a single one of your hands.
Bob: That's the prior.
Uncle Tau: That's the prior. And here's the part people miss: the app didn't pull that prior out of its ass. It learned it. From the population. That's what the "empirical" in empirical Bayes means. The prior isn't hand-written by some guy in Hamburg. It's estimated from the field you're trying to beat.
Bob: So the prior is just — what, the average reg in that format?
Uncle Tau: Roughly. A bit more structure than that. It's a distribution over the population, so it has a centre and a width. The centre is "typical reg in this format." The width is "how much regs actually vary from each other in this format." Both matter. The centre is where the app starts. The width is how willing the app is to move away from it when your data arrives.
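A quick sketch of why the prior carries so much weight at forty tournaments. The numbers are illustrative, not the app's learned prior: assume a per-tournament ROI standard deviation of 1.7 buy-ins for a €109 turbo and a population prior of 2% ± 10%. The point is the comparison of widths.

```python
import math

# Illustrative numbers only -- not the app's learned prior.
sigma_data = 1.7    # assumed per-tournament ROI std dev, in buy-ins
prior_mean = 0.02   # assumed prior centre: typical reg around 2% ROI
prior_sd   = 0.10   # assumed prior width: regs vary by roughly +/- 10%

n = 40
sample_se = sigma_data / math.sqrt(n)   # width of Bob's 40-tournament sample ROI
print(f"sample ROI std error at n={n}: {sample_se:.1%}")   # ~26.9%
print(f"prior width:                   {prior_sd:.1%}")    # 10.0%
```

Under those assumptions the sample at forty tournaments is nearly three times as wide as the prior, which is why the prior does most of the talking in what follows.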
How the pull actually works
Bob: OK so you have a prior and I have data. How much of each goes into the posterior?
Uncle Tau: You're asking the right question. That ratio is the whole game. The posterior is a weighted average between the prior and the data, and the weights are set by how informative each side is.
Bob: "Informative" meaning what.
Uncle Tau: Inverse variance. A tight thing gets more weight than a loose thing. The prior has some width — call it the variance of the prior. Your data has some width too — the variance of your sample ROI, which shrinks roughly like one over n as you play more. The posterior mean lands at:
$$\hat{\mu}_{\text{post}} = w \cdot \bar{x} + (1 - w) \cdot \mu_{\text{prior}}$$
where the weight on your data is:
$$w = \frac{n / \sigma^2_{\text{data}}}{n / \sigma^2_{\text{data}} + 1 / \sigma^2_{\text{prior}}}$$
Bob: Translate.
Uncle Tau: The more tournaments you play, the bigger $n$ gets, the heavier your side of the scale. The tighter the prior is — the more the app has decided ACR €109 turbo regs are a narrow archetype — the heavier the prior's side. At forty tournaments, your data is loud but wide; the prior is quiet but narrow. The posterior lands closer to the prior.
Bob: At five hundred?
Uncle Tau: Your side outweighs the prior. The posterior drifts toward your sample mean.
Bob: At five thousand?
Uncle Tau: The prior is a rounding error. You're quoting yourself.
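Here is the same arithmetic as the formula above, run at Bob's three sample sizes. The inputs are illustrative guesses (a 38% sample ROI, the assumed 2% ± 10% prior, 1.7 buy-ins of per-tournament spread), chosen only because they roughly reproduce the 6% Bob is complaining about.

```python
import math

def shrink(x_bar, n, sigma_data, mu_prior, sigma_prior):
    """Posterior mean and width as a precision-weighted blend of data and prior."""
    data_precision  = n / sigma_data**2        # tighter as you play more
    prior_precision = 1 / sigma_prior**2       # fixed by the population
    w = data_precision / (data_precision + prior_precision)   # weight on your data
    post_mean = w * x_bar + (1 - w) * mu_prior
    post_sd   = math.sqrt(1 / (data_precision + prior_precision))
    return w, post_mean, post_sd

# Illustrative inputs only: Bob's 38% run, an assumed 2% +/- 10% population prior,
# and an assumed 1.7 buy-in per-tournament std dev.
for n in (40, 500, 5000):
    w, m, sd = shrink(x_bar=0.38, n=n, sigma_data=1.7, mu_prior=0.02, sigma_prior=0.10)
    print(f"n={n:5d}  weight on data = {w:.2f}  posterior = {m:.1%} +/- {sd:.1%}")

# n=   40  weight on data = 0.12  posterior = 6.4% +/- 9.4%
# n=  500  weight on data = 0.63  posterior = 24.8% +/- 6.1%
# n= 5000  weight on data = 0.95  posterior = 36.0% +/- 2.3%
```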
What this feels like on screen
Bob: So there's a twenty-tournament kid and a five-hundred-tournament reg. What do I actually see when I open their posteriors?
Uncle Tau: The kid's curve is basically the prior with a little thumb-nudge from his results. Centre somewhere near population, width almost as wide as the prior itself. If the kid ran hot in twenty he still shows up at maybe 3% on screen. If he ran cold he shows up at maybe −2%. The prior is doing almost all the work.
The five-hundred reg, you see a posterior that sits where his results say. Centre close to his sample mean. Width much narrower than the prior — his data has earned the right to speak for itself.
Bob: And the two-thousand guy?
Uncle Tau: The prior is barely a rumour. The curve is sharp, centred on his numbers, and the app is basically reporting back what he did.
Bob: That's Cramér–Rao territory, isn't it.
Uncle Tau: You've been reading. Yes. The Cramér–Rao bound puts a floor on how tight any estimator can be, and that floor shrinks as $n$ grows. The posterior width tracks that floor — it's the cleanest visual summary of "how much information does this sample actually contain" that exists in the product.
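A sketch of the two widths Tau is comparing, with the same illustrative numbers as before. The Cramér–Rao floor is taken here as simply σ/√n for the raw sample mean; the posterior width sits below it at small n because the prior contributes information the sample lacks, and it converges to the floor as the data takes over.

```python
import math

sigma_data, sigma_prior = 1.7, 0.10   # same illustrative numbers as before

for n in (20, 500, 2000, 20000):
    cr_floor = sigma_data / math.sqrt(n)                               # floor for the raw mean
    post_sd  = 1 / math.sqrt(n / sigma_data**2 + 1 / sigma_prior**2)   # posterior width
    print(f"n={n:6d}  Cramér–Rao floor = {cr_floor:.1%}   posterior width = {post_sd:.1%}")

# n=    20  Cramér–Rao floor = 38.0%   posterior width = 9.7%
# n=   500  Cramér–Rao floor = 7.6%    posterior width = 6.1%
# n=  2000  Cramér–Rao floor = 3.8%    posterior width = 3.6%
# n= 20000  Cramér–Rao floor = 1.2%    posterior width = 1.2%
```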
Why "just use my raw ROI" is strictly worse
Bob: OK devil's advocate. Why can't I just look at my own sample ROI and forget the prior?
Uncle Tau: Because at every sample size, shrinkage beats the raw sample mean on the thing you actually care about — out-of-sample prediction. This is not my opinion. Stein proved it in 1956, Efron and Morris extended it to empirical Bayes in the seventies, and the result has been standing unmolested ever since. If you're estimating multiple players' ROIs at once, the expected sum of squared errors of the shrunk estimates is lower than that of the raw means, whatever the true ROIs happen to be. Always.
Bob: Wait, always?
Uncle Tau: Always. In dimension three or higher — which you hit the moment you're tracking more than two players — the raw sample mean is inadmissible. There exists an estimator that dominates it in every state of the world. That estimator is shrinkage. Not on average. Not usually. In every state of the world.
Bob: That can't be right. If I ran a 40% ROI over a thousand tournaments, the shrunk estimate is worse than just saying 40%.
Uncle Tau: At a thousand tournaments the pull is much weaker than it was at forty, and it keeps weakening as you play. The shrunk number drifts toward your 40%, not away from it. What shrinkage protects you from is the ten other guys in your stable who ran a 40% ROI over forty tournaments. If you quote all eleven at 40%, you're going to stake the ten noise-runners and the one real winner at the same markup. Shrinkage makes them legible as a group. It separates signal from sample.
Bob: The app is doing this for me whether I like it or not.
Uncle Tau: On every page that gives you a number. Your own posterior is shrunk. Your Stable teammates are shrunk. Your scouting targets are shrunk. It's shrunk turtles all the way down.
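A minimal simulation of the Efron–Morris-style claim, not the app's code: eleven stable players whose true ROIs come from an assumed 2% ± 10% population, each seen through forty noisy tournaments. The shrunk estimates beat the raw sample means on total squared error reliably, run after run.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative simulation, not the app's model: a stable of 11 players whose true
# ROIs are drawn from an assumed 2% +/- 10% population, each observed through
# 40 tournaments with an assumed 1.7 buy-in per-tournament std dev.
K, n = 11, 40
mu_prior, sigma_prior, sigma_data = 0.02, 0.10, 1.7
se = sigma_data / np.sqrt(n)              # std error of each player's sample ROI

raw_err, shrunk_err = 0.0, 0.0
for _ in range(2000):                     # total squared error over many stables
    true_roi = rng.normal(mu_prior, sigma_prior, K)
    sample   = true_roi + rng.normal(0.0, se, K)

    # Empirical-Bayes shrinkage: estimate the prior's centre and spread from the
    # 11 sample ROIs themselves, then pull every player toward that centre.
    centre  = sample.mean()
    between = max(sample.var(ddof=1) - se**2, 0.0)   # spread net of sampling noise
    w       = between / (between + se**2)            # weight on each player's own data
    shrunk  = w * sample + (1 - w) * centre

    raw_err    += np.sum((sample - true_roi) ** 2)
    shrunk_err += np.sum((shrunk - true_roi) ** 2)

print(f"raw sample means : total squared error = {raw_err:.1f}")
print(f"shrunk estimates : total squared error = {shrunk_err:.1f}")  # reliably smaller
```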
The honest thing
Bob: You said earlier this was "the most honest thing the app does." Why that framing?
Uncle Tau: Because it's the app refusing to pretend your forty tournaments mean what your forty thousand would mean. Every other tool in this industry — every tracker, every Sharkscope readout, every forum braggart's graph — reports the raw sample mean and lets you do the self-deception yourself. The app does the hard part for you. It pulls your estimate toward where the population actually lives, weighted by how much evidence you've actually given it, and draws the width around it so you can see how much pulling happened.
Bob: And when the pull is obvious — like my 38% becoming 6% — it feels like the app is calling me a liar.
Uncle Tau: It's calling your sample a liar. Which is different from calling you one. Your sample is always a liar at small $n$. Variance lies louder than edge in the short run. Shrinkage is the app telling you which part of your number is echo and which part is you. At forty tournaments in a turbo format, almost all of it is echo.
Bob: So when do I get to quote my own number?
Uncle Tau: When your posterior narrows enough that the pull stops being interesting. For a speed format like €109 turbos, that's somewhere north of two thousand events — and that's just for the cash frequency parameter. The HU winrate takes an order of magnitude more. Until then, quote the posterior, not the sample.
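Under the same illustrative numbers as above, you can back out roughly how long that takes: solve the weight formula for the n at which your data reaches a given share of the posterior. The thresholds below are arbitrary, and the real answer depends on the format's actual variance and the prior the app has learned.

```python
import math

sigma_data, sigma_prior = 1.7, 0.10   # same illustrative numbers as before

def events_until(weight_on_data):
    # n at which  w = (n/sd^2) / (n/sd^2 + 1/sp^2)  reaches the target weight
    return weight_on_data / (1 - weight_on_data) * sigma_data**2 / sigma_prior**2

for target in (0.5, 0.75, 0.9):
    print(f"data gets {target:.0%} of the say after ~{events_until(target):,.0f} tournaments")

# data gets 50% of the say after ~289 tournaments
# data gets 75% of the say after ~867 tournaments
# data gets 90% of the say after ~2,601 tournaments
```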
One more knob: the prior can be wrong
Bob: What if the prior is just bad? What if ACR €109 turbos don't look like whatever the app thinks they look like?
Uncle Tau: Then the prior shifts as the app re-learns from the population. That's why it's empirical Bayes and not just Bayes. A textbook Bayes setup asks you to specify a prior by hand. Empirical Bayes estimates the prior from the data — the population's data, not yours. If the field changes, the prior updates on the next learn cycle. If the format's rake goes up or a new algorithm shows up, the population mean drifts and every posterior shifts with it.
Bob: So the prior is itself a posterior of something else.
Uncle Tau: Now you're seeing the fractal. The population prior is the posterior over the population, learned from every player in that format. Your personal posterior is learned on top of that. A stable's aggregate posterior is learned on top of your personal one. Bayes all the way up, shrinkage at every layer.
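What "the app learned the prior" could look like mechanically. This is a method-of-moments sketch under the same assumptions as before, not MUCHO MOTA's actual learn cycle: take every reg's raw sample ROI and volume, use the volume-weighted centre as the prior's centre, and strip the expected sampling noise out of the scatter to get the prior's width.

```python
import numpy as np

def learn_prior(sample_rois, counts, sigma_data=1.7):
    """Method-of-moments sketch of the 'empirical' in empirical Bayes.

    sample_rois : each reg's raw sample ROI in this format
    counts      : how many tournaments each of those ROIs is built on
    sigma_data  : assumed per-tournament ROI std dev (illustrative)
    """
    x = np.asarray(sample_rois, dtype=float)
    n = np.asarray(counts, dtype=float)

    centre = np.average(x, weights=n)             # prior centre: volume-weighted population mean
    noise  = sigma_data**2 / n                    # scatter each raw ROI picks up from sampling alone
    width2 = np.mean((x - centre) ** 2 - noise)   # leftover scatter = how much regs truly differ
    return centre, np.sqrt(max(width2, 0.0))

# A toy population of five regs with very different volumes:
centre, width = learn_prior([0.38, -0.06, 0.03, 0.11, -0.01], [40, 900, 2500, 300, 5000])
print(f"learned prior: {centre:.1%} +/- {width:.1%}")   # learned prior: 0.2% +/- 12.0%
```

If the field changes, rerunning this on the next batch of population data moves the centre and width, and every personal posterior built on top of it shifts with them.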
Bob: So what do I actually do with this.
Uncle Tau: Three things. One: when your posterior is obviously being pulled toward the prior, don't fight it — play more so your data can earn weight. Forty tournaments cannot outvote a thousand regs. Two: when someone shows you a graph of forty tournaments at 38% ROI with no posterior around it, assume noise until proven otherwise. Three: stop being insulted when the app quotes you a smaller number than your raw sample. It's not being shy. It's running the one estimator that's guaranteed to be less wrong than whatever you were about to do.
Bob: Got it. Thanks, Uncle Tau.
Uncle Tau: Go estimate your shapes, kid. Next time we'll talk about why a mean is one story and a posterior is a menu of stories — and why every compounding decision in your life is strictly worse when you act on the mean.
What's next
- Why Bayesian beats point estimates — what actually goes wrong when you collapse a posterior to its mean, and the coin-flip example that makes it obvious.
- Reading the Strategy tab — how the posterior width from this lesson shows up in SALSA output ranges at the tables.
- Priors and population learning — the learn cycle that builds and updates the population prior every morning.
Further reading
- Uncle Tau on the origin story: Bob and Uncle Tau: How a Bumhunter Who Read Too Much Built MUCHO MOTA on the muchomota Substack.
- The companion paper for the math: Bankroll Management in Large Poker Tournaments: A Maximum Entropy Extension, §8 "Bayesian Uncertainty."
- The Cramér–Rao floor appears in Nested Bellman–Bayes Optimization for Tournament Poker Bankroll Management, §2.3 and §4.4.
- Stein, 1956, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, Berkeley Symposium.
- Efron and Morris, 1975, Data analysis using Stein's estimator and its generalizations, Journal of the American Statistical Association.
- Cramér, 1946, Mathematical Methods of Statistics, Princeton University Press.
- Rao, 1945, Information and accuracy attainable in the estimation of statistical parameters, Bulletin of the Calcutta Mathematical Society.