Modeling and Compression

Exploring how data science models are essentially sophisticated forms of compression

[Image: abstract representation of data compression, with swirling patterns resembling a black hole]

There’s a running joke that data scientists are just wizards who traded robes for hoodies. We wave our algorithmic wands, mutter about hyperparameters, and (poof) insights appear. One night after shipping yet another model, it hit me that the spell might be simpler than we make it sound: maybe our models aren’t mystical oracles at all. Maybe they’re just exquisitely fussy compressors. Think about how you explain a complicated situation to a friend; you don’t dump every detail, you distill. You keep what matters, drop what doesn’t, and hope the essence survives the trip. Models do the same thing at industrial scale. They turn unruly datasets into compact summaries, CliffsNotes for reality, where a few parameters carry the weight of millions of observations. It felt like magic when I first saw it. It still does. But the trick has a name: compression.

Here’s the part we rarely say out loud: modeling is the art of losing information gracefully. Lossless compression is honest but bulky; lossy compression is daring and useful. Most machine learning lives in that daring middle, where we intentionally forget. The loss function is our editorial policy, our way of telling the model which omissions are forgivable and which are cardinal sins. When we tune it, we’re negotiating a trade: shrink the world enough to make it portable, but not so much that we lose the plot. And this isn’t just math; it’s judgment. Every feature we engineer, every regularizer we add, every architectural choice we make is a decision about what deserves to be remembered. In that light, training looks less like summoning intelligence and more like teaching a zip file to have taste.
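
To make the editorial-policy idea concrete, here is a minimal sketch in plain numpy: a ridge-style objective whose loss term insists on remembering the data while the penalty term charges rent for every bit of model we keep. The data, the coefficients, and the lambda values are all invented for illustration, not drawn from any real project.

```python
import numpy as np

# Toy data: 200 observations, 10 candidate features, only 3 of which matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(scale=0.3, size=200)

def objective(w, lam):
    fit = np.sum((X @ w - y) ** 2)   # fidelity: which omissions are unforgivable
    size = lam * np.sum(w ** 2)      # discipline: rent charged on the size of the summary
    return fit + size

# Closed-form minimizer of the objective above; a larger lam forgets more aggressively.
for lam in (0.0, 1.0, 100.0):
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
    print(f"lam={lam:>6}: objective={objective(w_hat, lam):10.1f}, w={np.round(w_hat, 2)}")
```

The interesting part isn’t the output; it’s the knob. `lam` is the editorial policy, deciding how much detail the summary is allowed to keep.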

Of course, some things refuse to be neatly packed. Try squeezing the ocean into a bottle and you quickly learn the bottle’s opinion doesn’t matter. Weather on a turbulent week, markets mid-panic, human behavior at scale. These are high-entropy realities that push back against our urge to compress. The humbling lesson is not that modeling fails, but that nature is telling us something: some systems are only compressible to the extent that their signal outmuscles their chaos. That’s why pruning often helps. When we remove details that look busy but don’t carry the story, the remaining signal stands taller. Counterintuitive as it sounds, you can learn more by seeing less, as long as what remains is the essence. Generalization is simply compression that traveled well. If a model handles data it’s never met, it didn’t memorize; it remembered the right things.
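
In the same spirit, pruning fits in a few lines: fit a model with more features than the truth needs, keep only the largest-magnitude weights, and ask whether held-out error survives the haircut. Everything below (the data, the noise level, the cut points) is a toy assumption, one way pruning can look rather than a recipe.

```python
import numpy as np

# Toy data: 20 candidate features, only 4 carry signal; the rest are noise.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 20))
X_test = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[:4] = [3.0, -2.0, 1.5, 1.0]
y_train = X_train @ w_true + rng.normal(scale=0.5, size=200)
y_test = X_test @ w_true + rng.normal(scale=0.5, size=100)

# Dense least-squares fit: it dutifully assigns weight to the noise features too.
w_fit = np.linalg.lstsq(X_train, y_train, rcond=None)[0]

def test_mse(w):
    return np.mean((X_test @ w - y_test) ** 2)

# Magnitude pruning: zero everything below the k-th largest |weight|,
# then check whether the compressed model still travels to unseen data.
for keep in (20, 10, 4):
    cutoff = np.sort(np.abs(w_fit))[-keep]
    w_pruned = np.where(np.abs(w_fit) >= cutoff, w_fit, 0.0)
    print(f"keep {keep:2d} weights -> test MSE {test_mse(w_pruned):.3f}")
```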

Then there’s causality, the difference between a spoiler-free recap and actually understanding the plot. Correlational models tell us what tends to happen; causal structure explains why. In compression terms, causality is a deeper code: a shorter description of how the world generates what we see. It’s the hidden mechanism that lets a summary predict the sequel, not just recount the pilot. And suddenly “signal vs. noise” stops being a cliché and becomes operational: signal is whatever survives compression and still drives outcomes; noise is what disappears without consequence. Every model is a noise filter with a personality. Tighten the filter and you risk throwing away character development; loosen it and you’re stuck reading breakfast descriptions when the plot is sprinting ahead.

Look at modeling through this lens and a lot of the craft clicks into place. Feature selection becomes curation. Model complexity becomes bitrate. Regularization becomes compression discipline. The question shifts from “How do I get to 99% accuracy?” to “What story about this data deserves to be told, and at what fidelity?” So the next time you reach for a shiny architecture, try a softer provocation: if this model is a compressor, what am I keeping, what am I discarding, and am I forgetting the right things? Because the best models don’t remember everything. They know what to forget. And if that sounds less like wizardry and more like good editing, well, maybe our job is less Gandalf and more a tasteful zip file with a sense of humor.
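
And if “model complexity becomes bitrate” sounds too metaphorical, a rough two-part description length makes it literal: a model costs the bits needed to state its parameters plus the bits needed to encode what it still gets wrong. The sketch below is back-of-the-envelope accounting in that spirit; the 32 bits per parameter and the Gaussian residual code are illustrative assumptions, and only comparisons between models, not the absolute numbers, mean anything.

```python
import numpy as np

def description_length_bits(w, X, y, bits_per_param=32):
    """Two-part code in the MDL spirit: bits to state the model, plus bits to state its mistakes."""
    n = len(y)
    residuals = y - X @ w
    sigma2 = max(float(np.mean(residuals ** 2)), 1e-12)
    # Bits to encode the residuals under a Gaussian code. Absolute values depend on
    # quantization choices glossed over here, so compare models rather than trusting the number.
    residual_bits = 0.5 * n * np.log2(2 * np.pi * np.e * sigma2)
    # Bits to transmit the parameters the model insists on keeping.
    param_bits = bits_per_param * np.count_nonzero(w)
    return param_bits + residual_bits

# Toy comparison: a dense 20-weight fit versus one that keeps only its 4 largest weights.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 20))
w_true = np.zeros(20)
w_true[:4] = [3.0, -2.0, 1.5, 1.0]
y = X @ w_true + rng.normal(scale=0.5, size=200)

w_dense = np.linalg.lstsq(X, y, rcond=None)[0]
cutoff = np.sort(np.abs(w_dense))[-4]
w_sparse = np.where(np.abs(w_dense) >= cutoff, w_dense, 0.0)

for name, w in (("dense", w_dense), ("sparse", w_sparse)):
    print(f"{name}: {description_length_bits(w, X, y):.0f} bits")
```

Comparing that number for a dense fit and a pruned one turns “simpler model” from taste into arithmetic.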


If intelligence is compression, read The Intelligence Inflection on why our bandwidth is now the bottleneck, not AI’s capabilities.