The Bitter Lesson’s Bitter Lesson

Originally published on Andrew’s Substack.


Richard Sutton and Dwarkesh Patel recently discussed the Bitter Lesson. Sutton argued that babies and animals don’t learn through imitation, so state-of-the-art LLMs are pursuing the wrong path by imitating humans through next-token prediction. While he is right about babies and animals, he overlooks the extraordinary computational savings this approach buys us.

Sutton’s position effectively suggests we should re-run evolution from scratch rather than inherit knowledge from our evolutionary and cultural history. This runs into the Bitter Lesson’s Bitter Lesson: if we discard everything humanity and nature have learned and attempt to re-learn policies from first principles, we must regenerate a body of experience comparable to what evolution consumed…potentially more than 10^50 operations once you account for the neural activity of every organism that contributed to our evolutionary trajectory. For reference, today’s largest AI training runs use around 10^26 operations.

The Computational Reality of Pure Learning

Evolution required approximately 4.5 billion years (4.5 * 10^9 years) of massively parallel experimentation across roughly 10^30 living organisms to develop our cognitive capabilities. When we train LLMs on human-generated text, we inherit the compressed output of this process. Every sentence encodes millions of years of evolutionary optimization for communication, reasoning, and world-modeling.
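To get a feel for where a number like 10^50 might come from, here is a rough, back-of-envelope Fermi sketch. The timescale, population, and compute figures are the ones quoted in this post; the per-organism operation rate is purely an illustrative assumption of mine.

```python
# Rough Fermi estimate of the compute bill for "re-running evolution".
# The 4.5e9 years, ~1e30 organisms, and ~1e26 figures come from this post;
# the per-organism operation rate is an illustrative assumption.

SECONDS_PER_YEAR = 3.15e7

years_of_evolution = 4.5e9        # age of life on Earth (figure from the post)
organisms_alive = 1e30            # rough concurrent population (figure from the post)
ops_per_organism_per_sec = 1e3    # ASSUMED average rate of useful "operations"

evolution_ops = (years_of_evolution * SECONDS_PER_YEAR
                 * organisms_alive * ops_per_organism_per_sec)

frontier_training_ops = 1e26      # this post's figure for today's largest runs

print(f"Evolutionary search:   ~{evolution_ops:.0e} operations")               # ~1e+50
print(f"Frontier training run: ~{frontier_training_ops:.0e} operations")       # 1e+26
print(f"Gap:                   ~{evolution_ops / frontier_training_ops:.0e}x") # ~1e+24
```

Even if the assumed per-organism rate is off by several orders of magnitude, the gap between the two totals remains astronomical, which is the point of the comparison.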

Even human babies inherit sophisticated neural architectures pre-optimized for language acquisition and social learning. The “blank slate” is already running highly optimized evolutionary code. What appears to be learning from scratch is actually building upon an extraordinarily sophisticated computational foundation.

Two Paradigms: Broadcasting vs. Broad Listening

Sutton’s framework illuminates a fundamental division in AI research between experiential learning (direct environmental interaction) and inherited learning (accumulated knowledge transmission). This distinction reveals something essential about intelligence itself.

Consider what happens when knowledge can’t be transmitted effectively. Animals keep getting hit by cars not because they can’t learn from experience, but because the animals that experience it struggle to communicate the danger to their peers and offspring. Each generation relearns the same lethal lessons…because they don’t have information technology.

To solve the problem of gathering lived experience across time and space, humans developed information technology.

We began with language roughly 250,000 years ago, creating two revolutionary capabilities: broadcasting (transmitting learned knowledge to others) and broad listening (synthesizing information from multiple sources into superior world models). Since then, we’ve advanced information technology by increasing the scale at which we can broadcast and broad listen.

This is wholly different from re-learning everything through direct experience. Modern physicists don’t re-derive calculus from first principles…they inherit humanity’s mathematical frameworks and extend them further.

The Knowledge Access Challenge

Nevertheless, Sutton has a point. LLMs have largely exhausted the public internet. So what’s next? Have LLMs run their course, requiring us to pivot to experiential learning?

Not necessarily. Current LLMs demonstrate broad listening at unprecedented scale, but they access only a fraction of humanity’s accumulated knowledge. Leading AI models are trained on datasets measured in hundreds of terabytes…for reference, you could store GPT-4’s training data on a few dozen consumer hard drives from Walmart. Meanwhile, the world has digitized an estimated 180 zettabytes of data, roughly a billion times more than what trained today’s leading models.

The vast majority of human knowledge remains locked in private databases, medical records, proprietary research, and institutional knowledge. Consider the scale:

  • Current LLM training data: ~100-200 terabytes
  • All digitized human knowledge: ~180 zettabytes (~180,000,000,000 terabytes)
  • Ratio: Over 1,000,000,000x more data exists than we currently use
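That ratio is easy to sanity-check, since a zettabyte is a billion terabytes in decimal units (the training-set figure below is just the midpoint of the range above):

```python
# Sanity check of the data-volume gap (decimal units: 1 ZB = 1e9 TB).
llm_training_data_tb = 150       # midpoint of the ~100-200 TB range above
digitized_world_zb = 180         # estimated digitized data worldwide
digitized_world_tb = digitized_world_zb * 1e9   # 1.8e11 TB

ratio = digitized_world_tb / llm_training_data_tb
print(f"~{ratio:.1e}x more digitized data than current training sets")  # ~1.2e+09x
```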

Imagine AI systems practicing broad listening across all human knowledge—every hospital’s patient data, every company’s operational insights, every research institution’s findings. This isn’t just more data; it’s higher-quality data that organizations continuously validate for accuracy because they depend on it for operational decisions. This would represent a paradigm shift comparable to the original development of language, potentially accessing millions of times more high-quality data than current models.

The challenge isn’t data scarcity…it’s enabling knowledge transfer while preserving control and ownership (privacy, safety, security, copyright, and so on).

Beyond Current Architectures

The computational advantages of inherited learning point toward AI architectures that maintain attribution and control, allowing data owners to contribute to collective intelligence while retaining governance over their knowledge. Such systems would implement broad listening at civilizational scale while respecting contributor interests.
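As a purely hypothetical sketch (not any existing system’s API), the shape of such an architecture might look like the following: each data owner answers a query locally, applies its own policy checks, and releases only an approved, attributed summary.

```python
# Hypothetical sketch of "broad listening with retained governance".
# Names and interfaces here are illustrative, not a real library's API.
from dataclasses import dataclass

@dataclass
class Contribution:
    owner: str      # attribution: whose knowledge this came from
    summary: str    # only the owner-approved output ever leaves the owner

class DataOwner:
    def __init__(self, name: str, documents: list[str]):
        self.name = name
        self.documents = documents  # stays local; never shipped to a central model

    def answer(self, query: str) -> Contribution | None:
        # A real system would do local retrieval plus policy checks
        # (privacy, safety, copyright) before releasing anything.
        hits = [d for d in self.documents if query.lower() in d.lower()]
        if not hits:
            return None
        return Contribution(owner=self.name, summary=f"{len(hits)} relevant records")

def broad_listen(query: str, owners: list[DataOwner]) -> list[Contribution]:
    """Aggregate approved contributions while preserving attribution."""
    return [c for o in owners if (c := o.answer(query)) is not None]

owners = [
    DataOwner("Hospital A", ["sepsis protocol v3", "ICU readmission audit"]),
    DataOwner("Lab B", ["sepsis biomarker study"]),
]
for c in broad_listen("sepsis", owners):
    print(f"{c.owner}: {c.summary}")
```

The toy code matters less than the interface it implies: queries travel to the data, and only governed, attributed answers travel back.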

Rather than viewing inherited learning as inferior to pure discovery, we should recognize it as the foundation for systems that can stand on the shoulders of all human knowledge. This represents the next evolution of the broad listening capabilities that made human civilization possible, scaled to encompass our entire species’ collective intelligence.

The technical and policy frameworks for implementing such systems are rapidly developing. The convergence of privacy-enhancing technologies, attribution-based AI architectures, and new approaches to data governance creates unprecedented opportunities to build AI systems that truly leverage the computational advantages of inherited learning at civilizational scale.

The bitter lesson’s bitter lesson reveals that the most successful intelligence strategy combines inherited optimization with continued learning capabilities. Nature spent billions of years learning not to start over—and the future belongs to AI systems that can inherit vast human knowledge while maintaining robust capabilities for novel discovery.
