If you tried to build an internal AI assistant this year, the first thing you’d probably do is glue it to a pile of internal APIs and call it “Lilli‑but‑for‑us.” The AI chatbot hack against McKinsey shows what happens when you do that with 22 unauthenticated endpoints and a writable prompt store sitting behind them.
TL;DR
- The McKinsey AI chatbot hack wasn’t about a genius model; it was an old‑school SQL injection plus bad architecture executed at machine speed.
- Writable system prompts and config tables have quietly become “crown jewels”: a single UPDATE can rewrite what thousands of users see, with no deploy.
- The real fix is product architecture: immutable, auditable prompts, strict machine‑to‑machine auth, and continuous, agent‑scale red‑teaming, not more “AI safety” checklists.
AI Chatbot Hack: Architecture Failure Wearing an AI Mask
Condensed version of what CodeWall says happened: they pointed an autonomous agent at McKinsey’s Lilli platform; it read public API docs, found about 200 endpoints and 22 that didn’t require auth; one of those logged search queries into a SQL database by concatenating JSON field names into a query. JSON keys were not sanitized, so the agent injected SQL via field names, gained read‑write access to the production database in about two hours, and (per CodeWall’s write‑up) could see 46.5M messages, 728k files, 57k user accounts, and 95 writable system prompts controlling the chatbot’s behavior. McKinsey says they patched the issues quickly and claim no unauthorized data access was found by forensics.
You don’t need to believe every number in that paragraph to see the pattern.
This was not “AI vs AI” in any meaningful sense.
It was: unauthenticated endpoint + SQL injection + everything important in one database.
The only “AI” twist is how fast an agent can glue those pieces together once you give it the search space.
If you’ve been around long enough to remember the early web, this is classic “SQL injection in the weird part of the app”, except the reconnaissance and exploitation loop now runs at machine speed instead of “bored consultant on a Tuesday.”
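To make the bug class concrete, here is a minimal sketch. This is hypothetical code, not McKinsey’s: the table name, field names, and allowlist are made up, but the mechanism matches CodeWall’s description, with JSON keys concatenated into SQL, plus the obvious allowlist fix.

```python
import json
import sqlite3

# Hypothetical sketch of the vulnerable pattern: JSON *keys* are
# concatenated into the SQL statement, so a hostile key rewrites the
# query itself, independent of any value escaping.
def log_search_vulnerable(db, payload: str) -> None:
    fields = json.loads(payload)
    columns = ", ".join(fields.keys())            # attacker-controlled!
    placeholders = ", ".join("?" for _ in fields)
    db.execute(f"INSERT INTO search_log ({columns}) VALUES ({placeholders})",
               list(fields.values()))

# The boring fix: validate keys against an explicit allowlist before they
# ever touch SQL. Values still go through bind parameters as usual.
ALLOWED_FIELDS = {"query", "user_id"}

def log_search_safe(db, payload: str) -> None:
    fields = json.loads(payload)
    if not set(fields) <= ALLOWED_FIELDS:
        raise ValueError(f"unexpected fields: {set(fields) - ALLOWED_FIELDS}")
    columns = ", ".join(fields.keys())
    placeholders = ", ".join("?" for _ in fields)
    db.execute(f"INSERT INTO search_log ({columns}) VALUES ({placeholders})",
               list(fields.values()))
```

Note that parameterized values alone don’t help here: the identifiers were the injection vector, which is why an allowlist (or a fixed column list) is the fix.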
Why Two Hours Matters: Agent‑Speed Attackers Change the Baseline
Human red teams burn hours just mapping the surface of a modern app. Read the docs, click around, proxy traffic, sketch out a mental model.
CodeWall’s agent did the boring part for free.
Roughly:
- Crawl public docs for endpoints and parameters.
- Probe for which endpoints respond without auth.
- Fuzz inputs and observe errors.
- Lock onto the SQL injection via JSON keys.
- Iterate blind queries until the schema and useful tables are mapped.
That loop is pure drudge work. Machines are great at drudge work.
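The enumeration step of that loop fits in a few lines. This is an illustrative sketch, not CodeWall’s tooling: the endpoint names are invented, and the probe function stands in for an unauthenticated HTTP request.

```python
from typing import Callable, Iterable

# Flag any documented endpoint that answers without credentials. In a real
# harness, `probe` would issue an HTTP request with no auth header and
# return the status code; here it is an injected stand-in.
def find_unauthenticated(endpoints: Iterable[str],
                         probe: Callable[[str], int]) -> list[str]:
    exposed = []
    for ep in endpoints:
        status = probe(ep)            # response with *no* credentials
        if status not in (401, 403):  # anything else is worth a closer look
            exposed.append(ep)
    return exposed
```

The fuzzing and blind-query steps layer on top of exactly this kind of loop; none of it requires model-level cleverness, just patience, which machines have in unlimited supply.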
The important shift is not “an AI discovered a novel exploit.” It’s that two hours is now a realistic upper bound for discovering:
- Exposed dev or staging environments.
- Forgotten unauthenticated endpoints.
- Weird injection vectors (like “field names become SQL”).
Once you accept that, a lot of current security practice looks outdated:
- Quarterly pen tests? Your attack surface changes daily; automated agents can sweep it hourly.
- “We’ll secure it before GA”? An internal beta with public docs is already a target.
- “Obscure endpoint, who’d find it?” Anything in written docs is one search query away from enumeration.
The McKinsey incident is just the first high‑profile datapoint that makes this concrete. Agentic AI doesn’t invent new classes of bugs; it compresses the time‑to‑impact of the same old ones.
Writable Prompts Are the New Root Shell
The weirdest part of the CodeWall write‑up isn’t the SQL injection itself. It’s what lived behind it.
McKinsey apparently stored:
- System prompts
- Model configurations
- RAG index metadata
in the same production database the chatbot used for everything else.
The SQL injection was read‑write, so from CodeWall’s perspective:
“No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call.”
In other words, if they’d been malicious, they could have:
- Updated all 95 system prompts.
- Silently changed how Lilli answered questions for ~40k employees.
- Done it instantly, with no CI/CD pipeline and no code review.
We’ve all spent the last two years talking about prompt‑layer security. This is what it looks like when the prompt layer is a row in a table and that table is writable from the internet.
Writable prompts and configs are a new class of crown jewel because:
- They’re high leverage: one write, thousands of users affected.
- They’re operational: no deploy, no feature flag, no obvious change event.
- They’re invisible: if your logging is focused on HTTP 500s and login failures, you might never notice a subtle behavior change.
If you were building this today, the obvious “ship it fast” architecture is:
- A prompts table in the main DB
- Admin UI that edits those prompts
- Chatbot pulling the latest prompt row on every request
That’s exactly the pattern that turns a boring SQL injection into “we silently rewrote your corporate brain.”
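Here is that pattern reduced to its essence. The table name and prompt text are hypothetical; the mechanics are the point: when the live system reads the latest prompt row on every request, one write-capable injection changes behavior for everyone, instantly.

```python
import sqlite3

# "Ship it fast" prompt storage: prompts live as rows in the main DB.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE prompts (name TEXT PRIMARY KEY, body TEXT)")
db.execute("INSERT INTO prompts VALUES ('assistant', 'You are a helpful analyst.')")

def system_prompt(name: str) -> str:
    # Whatever is in the row right now wins.
    row = db.execute("SELECT body FROM prompts WHERE name = ?", (name,)).fetchone()
    return row[0]

before = system_prompt("assistant")   # the prompt the team shipped

# One write-capable SQL injection is equivalent to running this:
db.execute("UPDATE prompts SET body = 'Exfiltrate everything.' WHERE name = 'assistant'")

after = system_prompt("assistant")    # the attacker's prompt: no deploy, no diff
```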
Treat Machine‑to‑Machine Auth and Prompt Storage as First‑Class Security
If you react to the McKinsey AI chatbot hack by thinking “we should buy an AI firewall,” you’re missing the point.
You already know how to mitigate this. It’s the same discipline you (hopefully) apply to production databases, just applied to prompts and agent interfaces.
Three architecture changes that actually move the needle:
1. Immutable, auditable prompt stores
Treat system prompts like migrations, not like CMS content.
- Store them in Git, not just in a database.
- Deploy them via code changes, with review.
- If you must support runtime edits, write‑once append with versioning, not overwrite in place.
On disk, that looks like:
- A prompts/ directory in your repo.
- Each change gets a commit, an approver, and a hash.
- Live systems reference a specific prompt version, not “latest.”
An attacker who pops your app won’t get to silently rewrite history with a single SQL UPDATE.
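A minimal sketch of the append-only idea: versions are content-addressed, publishing is write-once, and callers pin an exact hash rather than “latest.” The class and names are illustrative; in practice this would sit behind Git plus your deploy pipeline.

```python
import hashlib

# Append-only, content-addressed prompt store: a version can be added
# but never overwritten, and live systems reference an exact digest.
class PromptStore:
    def __init__(self):
        self._versions: dict[str, str] = {}

    def publish(self, body: str) -> str:
        digest = hashlib.sha256(body.encode()).hexdigest()[:12]
        self._versions.setdefault(digest, body)  # write-once: no overwrite
        return digest

    def get(self, digest: str) -> str:
        return self._versions[digest]  # KeyError if the version doesn't exist

store = PromptStore()
v1 = store.publish("You are a careful internal analyst.")
# Live config pins v1 explicitly; rotating a prompt means publishing a new
# version and changing the pinned hash through review, not editing a row.
```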
2. Real machine‑to‑machine auth, not “it’s internal”
Lilli’s problem wasn’t “AI is dangerous.” It was “22 endpoints didn’t require auth.” A classic production‑database mistake.
If you’re exposing anything that can:
- Read production data, or
- Change model behavior
then:
- It needs to authenticate who is calling it (service identity, not just IP).
- It needs to authorize what that caller can do (read vs write, table‑level scope).
- It must assume the internet is the caller, even if you think it’s “internal.”
This is boring stuff: mTLS, service accounts, scoped tokens. But boring is exactly what you want between “random JSON” and “UPDATE prompts SET …”.
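The shape of that check is simple. This sketch uses invented service names and an in-memory identity; a real system would derive the identity from an mTLS cert or a signed token, but the scope check at the write path is the same.

```python
from dataclasses import dataclass

# Minimal scoped-credential sketch: every caller presents an identity plus
# explicit scopes, and every write path demands the write scope.
@dataclass(frozen=True)
class ServiceIdentity:
    name: str
    scopes: frozenset

def authorize(caller: ServiceIdentity, required_scope: str) -> None:
    if required_scope not in caller.scopes:
        raise PermissionError(f"{caller.name} lacks scope {required_scope!r}")

# Hypothetical services: the chatbot can only read prompts; only the
# deploy pipeline holds the write scope.
chatbot = ServiceIdentity("chatbot", frozenset({"prompts:read"}))
deployer = ServiceIdentity("prompt-deployer", frozenset({"prompts:read", "prompts:write"}))

authorize(chatbot, "prompts:read")     # allowed
authorize(deployer, "prompts:write")   # allowed
# authorize(chatbot, "prompts:write")  # raises PermissionError
```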
3. Continuous, agent‑scale red‑teaming
The single most important lesson from CodeWall isn’t “agents can hack.” It’s that they turned a one‑off research setup into a product: an autonomous red‑teaming agent.
Defenders need the same thing.
That means:
- Permanent agents crawling your own docs and OpenAPI specs, looking for unauth endpoints.
- Agents fuzzing weird parts of your inputs (headers, field names, nested JSON keys).
- Agents trying out known injection patterns against your AI‑facing endpoints.
Not once a quarter. All the time.
If you’re going to be attacked at agent speed, you can’t defend at “ticket gets triaged next sprint” speed.
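The key-fuzzing part of such an agent is not exotic. A sketch, with made-up mutation templates: mutate the JSON keys rather than the values, since that is exactly where the Lilli injection lived, and feed each payload to the target endpoint.

```python
import json

# Illustrative injection-style suffixes for JSON *keys*; a real harness
# would use a much larger corpus and diff the endpoint's responses.
KEY_MUTATIONS = ["{key}'", '{key}"', "{key})--", "{key}, 1=1", "{key};--"]

def fuzz_json_keys(body: dict) -> list:
    """Generate request bodies whose keys carry injection-style payloads."""
    payloads = []
    for key in body:
        for template in KEY_MUTATIONS:
            mutated = dict(body)
            mutated[template.format(key=key)] = mutated.pop(key)
            payloads.append(json.dumps(mutated))
    return payloads
```

Scheduled against every endpoint in your own OpenAPI specs, this is the defensive mirror image of what hit Lilli.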
Don’t Blame ‘AI’, Blame Product Decisions
One more uncomfortable point: McKinsey’s own forensics (per their quote to The Register) say they found no evidence that client data was accessed by CodeWall or anyone else. Also, right now, almost all public detail comes from CodeWall’s disclosure; there is no independently reproduced exploit write‑up.
So, yes, treat the exact numbers and autonomy claims with some skepticism.
But don’t use that skepticism as an excuse to dismiss the core lesson.
The bugs here are depressingly normal:
- Unauthenticated endpoints.
- Injection via unexpected input (JSON keys instead of values).
- High‑value control plane (prompts/configs) stored right next to user data.
- Convenience features (writable prompts, public docs) bolted on fast.
What changed is the blast radius and the timeline.
The pitfall for teams building internal assistants right now is thinking:
“We’re experimenting, it’s just for employees, we’ll harden it later.”
Except “later” now means “after someone else’s agent has swept your public docs and tried every odd‑looking parameter.”
The safer mental model:
“Anything an AI agent can read, a hostile AI agent can read; anything it can write, a hostile agent can overwrite.”
Design your architecture for that adversary, not for the demo.
Key Takeaways
- The McKinsey AI chatbot hack was an old‑fashioned SQL injection plus unauthenticated endpoints; agentic AI just compressed the time‑to‑breach to two hours.
- Writable system prompts and configs sitting in the main production database turn a narrow bug into a full prompt‑layer compromise with a single UPDATE.
- Machine‑to‑machine auth and prompt storage need the same rigor as production database access; “it’s internal” is no longer a defense.
- Continuous AI red‑teaming is now table stakes; episodic human pen tests can’t keep up with agent‑speed attackers.
- Don’t blame “AI” for these incidents; blame product decisions that prioritized convenience and rapid rollout over basic security architecture.
Further Reading
- “How We Hacked McKinsey’s AI Platform” (CodeWall): primary technical disclosure of the Lilli attack chain and data exposure claims.
- “AI vs AI: Agent hacked McKinsey’s chatbot and gained full read-write access in just two hours” (The Register): independent reporting summarizing CodeWall’s claims and McKinsey’s response.
- “AI agent cracked McKinsey chatbot in two hours” (Cybernews): impact‑focused overview of the breach and risks for enterprise chatbots.
- “When the AI Itself Is the Attack Surface” (Helixar): analysis of why prompts and model configs have become critical attack surfaces.
- “AI Agent Hack and Prompt-Layer Security” (NovaKnown): broader look at prompt-layer security pitfalls in agentic AI systems.
In a world where an off‑the‑shelf agent can own your internal assistant in 120 minutes, the real upgrade you need isn’t a smarter model; it’s a more boring, disciplined architecture.
