Anthropic And OpenAI Just Admitted The Model Isn't Enough.
Most Important Takeaway
The McKinsey “Lily” incident — where a $20 autonomous agent gained read/write access to production via 22 unauthenticated API endpoints — was not a security failure but a procurement and organizational failure. Traditional SaaS buying sequences (strategy → procurement → security → IT → devs build) collapse when applied to agentic workflows, because implementation feasibility IS the strategy. The fix is moving deep technical architectural review to the front of the buying process and giving engineers real influence over AI purchase timelines.
Summary
Actionable insights and career advice from this episode:
For technical leaders and engineers (career-relevant):
- Push to be at the table during AI procurement decisions — not after contracts are signed. The episode makes the explicit case that technical voices are being excluded from AI buying conversations and that this is where real liability is being created. Position yourself as the person who can translate agentic complexity (cross-system permissions, audit composition, token economics) into business terms executives understand.
- Build a vocabulary for explaining why agents are different from SaaS: humans effectively use the screen as their permissions model, while agents have no eyes and need every system to answer "am I allowed?" in code, with audit trails that compose across systems (see the sketch after this list).
- When evaluating vendors, do not accept abstract answers (“we have a comprehensive auth framework”). Ask what the out-of-the-box default posture is, and what the platform looks like in two years if nobody touches security settings after initial setup — because that is the version that will actually be running.
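To make the "no eyes" point concrete, here is a minimal sketch of a deny-by-default permission check that distinguishes human from agent at the principal level. All names here (`Principal`, `check`, the scope strings) are hypothetical illustrations, not any vendor's actual API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Principal:
    id: str
    kind: str                        # "human" or "agent", distinguished at the auth layer
    scopes: frozenset = frozenset()  # explicit grants; nothing is implied by a screen

@dataclass
class AuditEntry:
    when: str
    principal: Principal
    on_behalf_of: str | None         # agents act for someone; record who
    action: str
    allowed: bool

AUDIT_LOG: list[AuditEntry] = []     # in practice, an append-only store per system

def check(principal: Principal, action: str, on_behalf_of: str | None = None) -> bool:
    """Deny by default, and log every decision so trails can compose later."""
    allowed = action in principal.scopes
    AUDIT_LOG.append(AuditEntry(
        when=datetime.now(timezone.utc).isoformat(),
        principal=principal, on_behalf_of=on_behalf_of,
        action=action, allowed=allowed,
    ))
    return allowed

# The human keeps broad access; the agent acting for them gets a narrow slice.
human = Principal("u-17", "human", frozenset({"crm:read", "crm:write", "contracts:read"}))
agent = Principal("a-42", "agent", frozenset({"crm:read"}))

assert check(human, "crm:write")
assert not check(agent, "crm:write", on_behalf_of="u-17")  # fails closed
```

The design point is that the agent's access is a separate, narrower principal rather than a borrowed human session, and every "am I allowed?" answer leaves an audit record whether or not it was granted.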
For decision-makers buying AI:
- Move the deep architectural review earlier in the procurement sequence. The cheapest intervention this quarter is bringing developers in before the contract, not after.
- Ask two core questions before signing: (1) Does the platform distinguish between a human user and an AI agent at the authentication/permission layer? (2) What is the team’s default behavior under deadline pressure — does the system fail open or fail closed?
- Three concrete liability checks, sketched in code after this list: (a) Can you bound an agent’s permissions to a single client/scope vs. inheriting the human user’s full access? (b) Can you produce an audit trail of what the system did on behalf of a user that satisfies a regulator? (c) Can someone revoke an agent’s access from a console in five minutes, without a code deploy?
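A hedged sketch of what passing all three checks might look like, assuming a grant store that a console can edit live; `AgentGrant`, `authorize`, and `revoke` are illustrative names, not any real product's API:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import json

@dataclass
class AgentGrant:
    agent_id: str
    on_behalf_of: str          # the human the agent acts for
    client_scope: str          # check (a): bounded to ONE client, not the user's full book
    actions: frozenset
    revoked: bool = False

GRANTS: dict[str, AgentGrant] = {}   # the backing store a console would edit live

def authorize(agent_id: str, client: str, action: str) -> bool:
    """Fail closed: a missing or revoked grant denies; every decision is logged."""
    grant = GRANTS.get(agent_id)
    allowed = (
        grant is not None
        and not grant.revoked
        and grant.client_scope == client
        and action in grant.actions
    )
    # Check (b): a record a regulator could read, emitted whether or not allowed.
    print(json.dumps({
        "at": datetime.now(timezone.utc).isoformat(),
        "agent": agent_id,
        "on_behalf_of": grant.on_behalf_of if grant else None,
        "client": client, "action": action, "allowed": allowed,
    }))
    return allowed

def revoke(agent_id: str) -> None:
    """Check (c): a console flips one flag; no code deploy, effective on the next call."""
    if agent_id in GRANTS:
        GRANTS[agent_id].revoked = True

GRANTS["a-42"] = AgentGrant("a-42", "u-17", "client-acme", frozenset({"renewal:read"}))
assert authorize("a-42", "client-acme", "renewal:read")
assert not authorize("a-42", "client-globex", "renewal:read")  # outside the bounded scope
revoke("a-42")
assert not authorize("a-42", "client-acme", "renewal:read")    # revoked in seconds, no deploy
```

The choice that matters is that revocation is a data change, not a release: if killing an agent's access requires a code deploy, the five-minute test fails.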
Industry signal:
- The recent flurry of enterprise announcements (Anthropic and OpenAI standing up enterprise services arms, SAP acquiring DREAMIO and Prior Labs, Pinecone Nexus, Salesforce Headless 360, ServiceNow Action Fabric) all converge on one message: the model was never the hard part. The hard part is reachable surfaces, governed action, permission-aware data, cheaper context assembly, and forward-deployed engineers. Vendors are now selling the substrate your AI roadmap assumed already existed.
Build vs. buy: Both paths face the same cross-workflow agentic complexity. Internal builds need the same architectural rigor and team-default discipline as vendor purchases.
Chapter Summaries
The Lily incident (what actually happened): On February 28, Codewall’s autonomous agent spent $20 over two hours and used SQL injection, a 1998-era vulnerability, to obtain read/write access to Lily, McKinsey’s internal AI platform used daily by ~70% of its 40,000 consultants. That access covered tens of millions of chat messages, tens of thousands of user accounts, and every system prompt. The finding was responsibly disclosed on March 9, and McKinsey patched.
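The disclosure does not publish the exact payload, so the following shows only the generic 1998-era pattern the incident is named for: user input concatenated into a SQL string versus a bound parameter. The table and data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER, owner TEXT, body TEXT)")
conn.execute("INSERT INTO messages VALUES (1, 'alice', 'secret brief')")

user_input = "nobody' OR '1'='1"  # a classic injection string

# VULNERABLE: the input rewrites the query's logic and returns every row.
rows = conn.execute(
    f"SELECT body FROM messages WHERE owner = '{user_input}'"
).fetchall()
print(rows)  # [('secret brief',)] -- leaked despite the owner filter

# FIXED: a bound parameter is treated as data, never as SQL.
rows = conn.execute(
    "SELECT body FROM messages WHERE owner = ?", (user_input,)
).fetchall()
print(rows)  # [] -- no owner is literally named "nobody' OR '1'='1"
```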
Why “security failure” is the wrong frame: 22 of 200 endpoints shipped without authentication, including a write-to-production endpoint. That is not one engineer skipping a checklist — it is a cultural and structural pattern. The root cause is that nobody asked whether the API’s shape was appropriate for a world where autonomous agents would meet it.
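One structural reading of the fix: make authentication the router's default, so an endpoint cannot ship open by omission and opting out has to be explicit and greppable in review. The toy router below is a sketch of that idea, not any real framework's API:

```python
class Router:
    def __init__(self):
        self.routes = {}

    def route(self, path: str, *, public: bool = False):
        """Register a handler. Auth is required unless someone writes public=True."""
        def wrap(handler):
            self.routes[path] = (handler, public)
            return handler
        return wrap

    def dispatch(self, path: str, authenticated: bool):
        handler, public = self.routes[path]
        if not public and not authenticated:
            return 401, "fail closed: no credentials, no handler"
        return 200, handler()

app = Router()

@app.route("/admin/write-to-prod")   # protected by default, by construction
def write_to_prod():
    return "wrote to production"

@app.route("/health", public=True)   # exceptions are visible and reviewable
def health():
    return "ok"

print(app.dispatch("/admin/write-to-prod", authenticated=False))  # (401, ...)
print(app.dispatch("/health", authenticated=False))               # (200, 'ok')
```

Under a design like this, "22 of 200 endpoints shipped without authentication" would require 22 deliberate `public=True` declarations sitting in the diff, rather than 22 silent omissions.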
Why the traditional SaaS procurement sequence is breaking: Strategy → procurement → security → IT → devs worked for bounded SaaS (Salesforce, Workday, ServiceNow). For agents, a single task like “prepare the renewal brief” crosses CRM, support, contracts, usage data, transcripts, and wiki — each with its own permissions and audit log that must compose. Implementation feasibility IS the strategy; putting devs last commits capital to a strategy whose viability was never tested.
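One common way to make per-system audit logs compose, sketched under the assumption that every system the agent touches can carry a shared correlation id: stamp each entry with the task's id, then merge and sort to recover the cross-system timeline. The systems and fields below are hypothetical stand-ins for CRM, support, contracts, and the rest:

```python
import uuid
from datetime import datetime, timezone

def utcnow() -> str:
    return datetime.now(timezone.utc).isoformat()

def audit(system_log: list, task_id: str, action: str) -> None:
    """Each system keeps its own log but stamps the shared task id."""
    system_log.append({"task_id": task_id, "at": utcnow(), "action": action})

crm_log, support_log, contracts_log = [], [], []

task_id = str(uuid.uuid4())  # one id for "prepare the renewal brief"
audit(crm_log, task_id, "read account:acme")
audit(support_log, task_id, "read tickets:acme last-90d")
audit(contracts_log, task_id, "read msa:acme-2023")

# Composition: merge the per-system logs and sort to get the task's full story.
timeline = sorted(
    (e for log in (crm_log, support_log, contracts_log) for e in log
     if e["task_id"] == task_id),
    key=lambda e: e["at"],
)
for entry in timeline:
    print(entry["at"], entry["action"])
```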
The vendor convergence: Anthropic and OpenAI launched enterprise services arms with billions behind them to embed engineers in customer build rooms. SAP/DREAMIO/Prior Labs target the data ledger. Pinecone Nexus tackles context reassembly cost. Salesforce Headless 360 exposes the platform as APIs/tools. ServiceNow Action Fabric provides governed action surfaces with identity and audit. All six announcements landed within roughly a week; the signal is that the model was never the hard part.
The two questions to ask this week: (1) Does your AI platform truly distinguish between human users and AI agents at the permissions layer, so an agent can be scoped narrowly while a human keeps broad access? (2) What happens on your platform when the team is under deadline pressure — what is the default posture, not the configurable one?
Three liability checkpoints: bounded agent permissions, regulator-grade audit trails of system-on-behalf-of-user actions, and five-minute console-level agent revocation.
The closing prescription: Move deep architectural review earlier in procurement. Give engineers real influence on AI deployment timelines. Treat agentic cross-workflow complexity as a first-class business problem rather than an implementation detail. The expensive path is pretending agentic workflows behave like SaaS — they do not, and continuing to act as if they do is rolling the dice on the next Lily-shaped headline.