20VC: AI Scaling Myths: More Compute is not the Answer | The Core Bottlenecks in AI Today: Data, Algorithms and Compute | The Future of Models: Open vs Closed, Small vs Large with Arvind Narayanan, Professor of Computer Science @ Princeton
Most important takeaway
The era of getting better AI by simply training bigger models on more data is ending: data is the real bottleneck, GPT-5 is unlikely to be as big a leap over GPT-4 as GPT-4 was over GPT-3.5, and the next wave of value will come from smaller, cheaper, on-device models plus a layer of agents and products built on top of commoditized foundation models. Treat sci-fi-style AGI fears and most “AI regulation” framings with skepticism; the real risks (deepfake nudes, social-media distribution of misinformation, market concentration) are concrete and call for targeted policy, not blanket AI restrictions.
Summary
Actionable insights and patterns from the conversation:
Tech/AI patterns to watch
- Scaling laws are plateauing. Compute alone won’t keep yielding GPT-3 to GPT-4 style jumps. Plan products as if frontier capability gains will be incremental, not exponential.
- Data, not compute, is the binding constraint. YouTube-scale corpora sound huge, but after de-duplication their text tokens are an order of magnitude smaller than what frontier models already ingest. New emergent capabilities from text are largely tapped out.
- Synthetic data helps with quality (filling gaps, rare languages, math) but won’t scale quantity. “Snake eating its tail” loops degrade rather than improve models. Quality > quantity.
- Smaller models are the near-term frontier. Cost, latency, privacy, and on-device deployment are pushing aggressive distillation. Training cost rises slightly, but inference cost, which dominates over a model’s lifetime, drops.
- Jevons paradox applies: as inference gets cheaper, total inference spend goes up, because models get embedded into more always-on workflows (email scanning, code generation with massive retry budgets, screen observation). A back-of-the-envelope cost sketch follows this list.
- Models will commoditize. Differentiated value will move up the stack to agents and products. Great ideas can come from 2-person startups or academic labs, not just hyperscalers.
- Tacit “show your work” knowledge inside organizations is the next data frontier. Agents will improve through closed-loop deployment inside enterprises, not passive observation; expect a slow, self-driving-car-style rollout as reliability is earned one nine at a time.
- Benchmarks are a minefield. Models are optimized (intentionally or not) to score well; vibes diverge from leaderboards. Evaluate by professionals’ real-world experience, not benchmarks. A bar-exam pass means almost nothing for actual legal work.
- AGI predictions from CEOs have been “one step away” for some 70 years. Treat overconfident timelines as marketing, not forecasting.
- Hardware/model cycle mismatch is real but temporary. Every exponential is a sigmoid; hardware cycles will catch up as models commoditize.
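To make the cost arithmetic from the distillation and Jevons bullets above concrete, here is a minimal back-of-the-envelope sketch in Python. Every figure in it (the $100M training cost, per-token prices, daily token volumes, two-year lifetime) is a hypothetical placeholder chosen for illustration, not a number from the episode.

```python
# Back-of-the-envelope: one-time training cost vs. lifetime inference spend,
# plus the Jevons-paradox effect of cheaper inference. Every number is a
# hypothetical placeholder chosen only to illustrate the shape of the math.

TRAINING_COST = 100e6  # one-time training cost in USD (hypothetical)

def lifetime_inference_cost(usd_per_mtok: float, tokens_per_day: float,
                            days: int = 730) -> float:
    """Total inference spend over an assumed two-year deployment."""
    return usd_per_mtok * (tokens_per_day / 1e6) * days

scenarios = {
    # name: (USD per million tokens, tokens served per day)
    "A: large model":     (10.0, 50e9),       # expensive inference, modest usage
    "B: distilled model": (1.0,  50e9 * 50),  # 10x cheaper, 50x more usage
}

for name, (price, tokens) in scenarios.items():
    spend = lifetime_inference_cost(price, tokens)
    print(f"{name}: ${spend / 1e6:,.0f}M inference over 2 years, "
          f"{spend / TRAINING_COST:.1f}x the ${TRAINING_COST / 1e6:,.0f}M training cost")
```

Under these assumptions, inference dominates lifetime cost in both scenarios ($365M and $1,825M against $100M of training), and a 10x cut in per-token price still raises total spend 5x once usage grows 50x: exactly the Jevons pattern described above.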
Career advice and posture
- Be a skeptical reader of hype. Narayanan built credibility by publicly disagreeing with consensus (compute scaling, AGI imminence, AI regulation framing). Develop a contrarian-but-rigorous voice.
- Don’t conflate yourself with “the typical user.” AI developers tend to be autodidacts and over-index on tools that suit them; most learners (and workers) need social scaffolding.
- For builders: pick product over AGI mission. Companies that assumed “AGI is so close we don’t need to ship products” lost ground (e.g., no ChatGPT mobile app for six months). Find product-market fit; AI doesn’t suspend normal startup rules.
- Jobs are bundles of tasks; AI automates tasks. Identify which tasks in your role AI subsumes and reposition toward the residual ones (judgment, relationships, integration, physical presence). Bank-teller analogy: automation can grow headcount by expanding the surface area of a business.
- For technologists: get into policy. 90% of policy work is frustrating, but the other 10% is high-leverage, and the field is starved for credible technical voices.
- Education and medicine will not be revolutionized by “AI in your pocket” alone. The social/physical components are load-bearing. Build complements to institutions, not replacements.
Risk framing for builders and policymakers
- The real near-term harms are concrete: non-consensual deepfake nudes, fake reviews, strain on the education system. Address these as specific harmful activities (the FTC banned fake reviews regardless of how they are produced), not as generic “AI regulation.”
- Misinformation is mostly a distribution problem (social media), not a generation problem. Place responsibility on platforms.
- Open vs. closed: trying to keep models out of “bad guys’” hands is a losing strategy because near-frontier models already run on personal devices. Design for a world where capable AI is universally available, and invest in AI-for-defense (e.g., automated vulnerability finding before software ships).
- Watch market concentration. Foundation models financed by cloud cash cows (Google, Amazon, Microsoft/OpenAI, Meta) raise antitrust questions regulators are starting to engage with.
- The “liar’s dividend” (people disbelieving real content) may matter more than fake content itself, raising the value of credible mainstream sources.
What Narayanan no longer believes
- He overestimated AI’s pace because GPT-4 appeared three months after GPT-3.5 but had in fact been in training for roughly 18 months. Nothing has clearly surpassed GPT-4 qualitatively since. Future gains likely need new scientific ideas (agents, new architectures), not just scale.
Chapter Summaries
- Intro and credentials: Narayanan, a Princeton CS professor and director of Princeton’s Center for Information Technology Policy, does technical AI research, studies societal effects, and advises policymakers.
- Crypto vs. AI hype: He soured on crypto by 2018 because the tech wasn’t the real bottleneck for the problems it claimed to solve and the philosophy of replacing institutions with scripts was wrong. AI has been a net positive; crypto has not.
- The compute question: GPT-3.5 to GPT-4 was mostly a size jump. Future cycles of order-of-magnitude bigger models are unlikely because data is the bottleneck; expect smaller, cheaper models with comparable capability.
- Data limits and synthetic data: YouTube-derived text is smaller than it sounds; synthetic data helps quality not quantity; tacit “whiteboard” knowledge inside organizations is not on the web and must be learned via deployed feedback loops.
- Why models are getting smaller: Cost, privacy, and on-device deployment dominate. Inference cost dominates lifetime cost; smaller models lower it even as training cost grows.
- Jevons paradox in AI: Cheaper inference means more pervasive use (always-on email/document scanning, mass-retry code generation) so total spend rises.
- Hardware vs. model cadence: Yes, frontier training outpaces hardware refresh, but every exponential is a sigmoid; both curves will taper and models will commoditize.
- Evaluation problems: Benchmarks get gamed, contaminated, and don’t reflect real work (the bar-exam example). Trust practitioner vibes over leaderboards.
- AGI predictions: CEO timelines have been “one step away” for 70 years. Each step reveals new complexity. Don’t over-index on overconfident forecasts.
- Products vs. AGI mission: OpenAI’s delayed ChatGPT mobile app reflected a flawed “AGI makes products obsolete” mindset. A disciplined company can pursue both, but most should prioritize product.
- Market concentration: Foundation models may consolidate around 3-4 cloud-backed providers; value will move to the agent/product layer. Regulators (US, UK CMA, EU) are paying attention.
- Regulation philosophy: Regulate harmful activities (fake reviews, deepfake nudes), not “AI” as a category. An “allow and watch” default is broadly right, with carve-outs for clearly serious harms.
- Misinformation debate: Narayanan thinks AI-generated misinformation fears are overblown; the real lever is distribution (social media) and source credibility. Stebbings pushes back; Narayanan concedes the harm but locates the fix in platforms and trust, not AI.
- Medicine and education: Skeptical of “GP in your pocket” and “tutor in your pocket” narratives — physical exams and social motivation matter. AI integrates best as a complement to institutions.
- Jobs: Replacement fears are overblown. Jobs are bundles of tasks, and AI rarely automates every one of the roughly 20 tasks that make up a typical job. Bank-teller-after-ATM is the canonical pattern.
- AI as a weapon: A category error; AI is an enabler (e.g., of cyber vulnerability discovery), not a weapon itself. Closed-model strategies will fail because capable models will be universally available; invest in AI-for-defense.
- What he stopped believing: His sense of the speed of progress was inflated by GPT-4’s misleadingly short three-month gap after GPT-3.5; bigger models alone aren’t working; new scientific ideas are needed.
- Biggest societal misconception: Sci-fi-driven fears of self-aware AI. Today’s architectures don’t lead there; if it ever happens, it will be a deliberate choice societies can regulate against.
- Quick-fire: Leaderboards are becoming less useful as the gap to real work grows; he’d push OpenAI toward radical transparency; agents should aim for the mundane parts of “Her”; Nvidia is trying to migrate up into services; tech policy is 90% frustration and 10% high leverage; he leans toward LeCun’s view that LLMs are an “off-ramp” on the road to superintelligence and new breakthroughs are needed; the question he wishes he were asked is about AI’s impact on children.