20VC: Perplexity's Aravind Srinivas on Will Foundation Models Commoditise, Diminishing Returns in Model Performance, OpenAI vs Anthropic: Who Wins & Why the Next Breakthrough in Model Performance will be in Reasoning
Most important takeaway
The next major leap in AI will come from models that can reason iteratively — producing an output, eliciting feedback, refining the rationale, and converging on a better answer — not from simply scaling parameters and tokens. The biggest beneficiaries of foundation-model commoditization will be application-layer companies that orchestrate models, data, and UX into products users cannot live without; trying to train your own frontier base model as a smaller player is a losing game.
Summary
Aravind Srinivas, co-founder and CEO of Perplexity, argues that brute-force scaling of foundation models is running into diminishing returns: training bigger models on more tokens only pays off if data curation, mixture-of-experts design, and training recipes are executed with extreme care. He pushes back on the idea that verticalized, domain-specific models will dominate: the “magic” of LLMs comes from emergent, general-purpose capabilities arising from training on enormously diverse data, not from narrow domain fine-tuning. Code is the rare domain with enough tokens to support specialization; most enterprise data is too thin to teach models new reasoning.
The next breakthrough, he believes, will be self-bootstrapped reasoning: models that generate an output, explain it, evaluate correctness, and iteratively train on their own rationales (echoing rumored work like OpenAI’s Q* and Stanford’s STaR paper on self-taught reasoners). This will be capital-intensive: each experiment burns inference compute, so only 4–5 well-funded labs can realistically play, and whoever cracks it first and pours all capital into scaling it could win decisively. His bet for who lands it first: OpenAI (capital + speed + lead) or Anthropic (algorithmic superiority, better post-training per dollar).
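The episode describes this loop only at a high level. As a rough illustration, here is a minimal sketch of a STaR-style self-training loop; the `generate` and `finetune` callables are hypothetical placeholders, not Perplexity’s or any lab’s actual API. The point is the structure: sample rationales, keep the ones that reach a verified answer, and train on them.

```python
# A minimal sketch of a STaR-style self-taught reasoning loop (Zelikman et al.).
# All callables here are hypothetical stand-ins, not any lab's real training API.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Example:
    question: str
    answer: str  # held-out correct answer, used only to filter rationales


def star_loop(
    generate: Callable[[str], Tuple[str, str]],               # question -> (rationale, predicted answer)
    finetune: Callable[[List[Tuple[str, str, str]]], None],   # train on (question, rationale, answer) triples
    dataset: List[Example],
    rounds: int = 3,
) -> None:
    """Each round: sample a rationale and answer for every question, keep only
    the rationales that led to the correct answer, then fine-tune on them.
    The model improves by training on its own verified chains of reasoning."""
    for _ in range(rounds):
        kept: List[Tuple[str, str, str]] = []
        for ex in dataset:
            rationale, predicted = generate(ex.question)   # inference cost paid per example, per round
            if predicted.strip() == ex.answer.strip():     # simple exact-match correctness filter
                kept.append((ex.question, rationale, ex.answer))
        if kept:
            finetune(kept)  # the next round's generations come from the updated model
```

The capital intensity Aravind points to shows up in the inner call to `generate`: every filtering round is a full inference pass over the dataset before any training happens.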
Actionable insights and patterns:
- Career advice (from Sam Altman, relayed by Aravind): identify what you’re naturally great at by finding things that feel easy to you but hard to others; that’s where you can be “mu plus two sigma” (two standard deviations above the mean) above the rest.
- Founders should ask themselves daily: “If today was my last day, would I still be doing this?” A consistent “no” means rethink priorities.
- “Competitors don’t kill startups — startups kill themselves.” Slow execution, indecisive CEOs, lack of focus, and inefficient capital use are the real pre-mortem causes (Dropbox vs. Box cited as example).
- For builders on top of foundation models: don’t try to be a foundation-model lab. Post-train, orchestrate models + data + UX, and you benefit whether models commoditize or not. The wrapper critique loses force when you assemble engineering, design, AI, and search talent in a way generic product/AI hires alone cannot replicate.
- On monetization: subscriptions ($20/month) are not high-margin enough at scale. Advertising remains the greatest business model of the last 50 years (~80% margins for Google), but only works at scale and only if you crack relevance. Diversify across subscription, ads, API, and enterprise to avoid Google’s mistake of over-greedy single-source revenue and to keep shareholders and users aligned (Bezos principle).
- Enterprise GTM is wide open in AI: no real lock-in yet, custom prompts port easily, brand and team size give early advantage but the game is just starting. Perplexity Enterprise was motivated by Google being the most-used enterprise tool — and AI-native search makes employers nervous about data leakage, opening the door for compliant alternatives.
- Tech pattern — long context vs. instruction following: models now have huge context windows but instruction-following degrades when prompts are stuffed. Long-context capability without robust instruction-following is why models still can’t reliably write entire codebases.
- Tech pattern — valuation insight: OpenAI/Anthropic’s value isn’t the current model, it’s “the machine that builds the machine” — the tacit knowledge and team that will produce the next model. They won’t get acquired while they keep shipping breakthroughs; the moment they stop, leverage collapses.
- Product pattern: new features must tie into existing user intent in the app. Meta’s Stories worked only when grafted onto the existing top-of-feed flow, not as a separate app. WhatsApp search integrations fail because users open WhatsApp to message, not to search.
- Browsers won’t be disrupted by chat UIs — people still need to browse, fill forms, log in. The real disruption comes when agents can act inside the browser (“start the podcast at Riverside,” “buy this on Amazon”), pointing toward an AI-native OS.
- Aravind’s biggest mindset change in the last year: take a longer-term view on people. Those who don’t hit the ground running immediately can still transform themselves given time.
- Biggest misconception in AI: that it’s a bubble because most people don’t yet use chatbots. The chat UI is unfamiliar — once AI is embedded in the form factors users already know (Gmail, Docs, Search), impact will be enormous.
Chapter Summaries
- Origin story: Aravind stumbled into ML through a college contest he won by brute-force random search without knowing what the algorithms meant, which gave him confidence to pursue the field. Reinforcement learning under Rich Sutton’s student, plus DeepMind’s Atari paper, pulled him into real AI research.
- Diminishing returns: Scaling still works but only with meticulous data curation, MoE design, and training-recipe expertise — only 3–4 labs do it well. Verticalized domain models are mostly a flawed bet (Bloomberg GPT example) because LLM magic comes from emergent general capabilities.
- Reasoning frontier: Current models match a median high-schooler; the next era requires models that iterate on their own rationales (Q*, STaR-style self-taught reasoners). When achieved, the $20/month pricing breaks: top-tier reasoning could command millions per session for advising decision-makers.
- Memory and long context: “Memory” splits into practical long context (already arriving at 1M–2M tokens) and true infinite memory (no algorithm yet). The current bottleneck is instruction-following degrading as context grows.
- Commoditization of foundation models: GPT-3.5-class models are commoditized; GPT-4-class are not yet. Real frontier-model competition is a 4–5 player game (OpenAI, Anthropic, Google, Meta, Mistral, possibly xAI). Whoever cracks bootstrapped reasoning first could consolidate the field.
- M&A vs. independence: Big clouds can’t poach individual researchers because talent clusters; OpenAI and Anthropic are valued for the “machine that builds the machine,” not just current models, so they won’t be acquired while they keep producing breakthroughs.
- Capital dynamics: Microsoft generates ~$330M/day in free cash flow; competing on raw capital is impossible. Solution: build a real business with revenue, like OpenAI’s $2B ARR.
- Business model: $20/month subscriptions are okay but not high-margin enough at scale. Advertising is the holy grail if relevance can be cracked without corrupting answers. Diversify across subs, ads, API, and enterprise.
- Enterprise: Perplexity Enterprise targets the AI-native-search-at-work problem with compliance/security. No lock-in exists yet in AI tooling — the game is just starting.
- Wrappers and moats: Application-layer companies are the biggest beneficiaries of model commoditization if they can assemble design + product + AI + search expertise — a combination generic hires can’t replicate.
- Fundraising reality: Despite memes, investors ask hard questions (what if OpenAI does this? what if Google ships?) — Perplexity’s edge is execution track record relative to team size and funding.
- Quick-fire: Biggest mindset shift — patience with people’s growth curves. Biggest AI misconception — it’s not a bubble, it’s underhyped once it lands in familiar form factors. Future of browsers — agentic actions inside traditional browsing, leading toward an AI-native OS. Hardest part of CEO life — constantly resolving contradictions. Pre-mortem cause of failure — self-inflicted: slow execution, indecisive CEO, lack of focus. 2034 vision — Perplexity as the indispensable assistant for facts and knowledge.