
Foundation First, Agents Second: What the McKinsey/Lilli Breach Pattern Reveals About Enterprise AI Security

This shift is more consequential than the cloud. The cloud was about WHERE to store data. Agentic AI is about WHAT RUNS your business.

The industry talks about AI governance as if it lives mainly in policy documents, steering committees, and approval workflows.

Those things matter.

But the most important governance decisions are usually made much earlier, in architecture choices that either enforce boundaries or quietly remove them.

Who can reach the system. How services are exposed. Whether request contracts are typed or permissive. Whether one compromise path can cross tenants, workspaces, and data domains. Whether the instructions that shape agent behavior live in a protected control layer or sit beside mutable operational data. Whether anyone can see abnormal behavior before an outsider points it out.

That is why the reported breach of McKinsey’s internal generative AI platform, Lilli, matters beyond the specifics of one incident.

If the public reporting is directionally accurate, the episode offers a useful case study in what happens when enterprise AI systems scale faster than the platform foundations underneath them.

That is the deeper issue leaders should take seriously.

The core risk in enterprise AI is not only bad answers.

It is weak architecture.

Foundation first, agents second

As organizations move from isolated copilots to broader agentic systems, the underlying platform becomes more important, not less.

Agentic systems tend to concentrate sensitive capabilities in one runtime environment:

  • enterprise data access
  • conversational history
  • retrieval and search layers
  • workflow orchestration
  • tool invocation
  • policy and prompt artifacts

When those capabilities are deployed on a well-governed platform, they can create substantial enterprise value.

When they are deployed on a platform with weak identity, weak containment, or limited observability, the same concentration can increase operational risk.

That is why the right sequence is:

foundation first, agents second.

Why this breach pattern matters

The breach pattern described in the source materials suggests a combination of issues rather than a single isolated defect.

Based on those materials, the reported incident appears to have involved some mix of:

  • insufficiently protected service entry points
  • an input surface vulnerable to malformed requests
  • limited containment across the data layer
  • inadequate separation between control artifacts and operational data
  • insufficient monitoring or detection

Whether every public detail is ultimately confirmed or refined, this set of issues provides a useful framework for evaluating enterprise AI architecture more broadly.

The lesson is not simply to watch for one bug class.

It is that agentic systems can magnify architectural mistakes.

The five failure points enterprise AI leaders should study

1. Publicly reachable interfaces without strong identity

The source materials indicate the reported attack path may have involved internet-reachable interfaces that lacked adequate authentication.

For enterprise AI, identity should be a default boundary, not an optional overlay.

If broad service surfaces are reachable without strong authentication and runtime context, the platform invites enumeration and automated abuse before downstream governance controls have a chance to help. In AI systems, this is especially important because exposed services are often tied to retrieval, orchestration, or workflow execution rather than simple data reads.

Organizations should evaluate whether their platforms:

  • require authentication by default
  • minimize publicly reachable interfaces
  • propagate user, tenant, and device context across service calls
  • maintain auditable records at entry points

If identity is weak at the platform edge, downstream controls become harder to trust.
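To make the "deny by default" posture concrete, here is a minimal sketch of a service entry point that refuses unauthenticated calls before any business logic runs and attaches tenant and user context for downstream services. The names (`Request`, `TENANT_KEYS`, `require_auth`) are illustrative assumptions, not details from the reported incident; a real platform would verify signed tokens rather than look up static keys.

```python
# Sketch: deny-by-default service boundary with context propagation.
# TENANT_KEYS is a stand-in for real token verification (e.g., signed JWTs).

from dataclasses import dataclass
from typing import Callable, Optional

TENANT_KEYS = {"key-abc": ("tenant-1", "alice"), "key-def": ("tenant-2", "bob")}

@dataclass
class Request:
    headers: dict
    body: dict
    # Identity context is attached only after authentication succeeds.
    tenant: Optional[str] = None
    user: Optional[str] = None

def require_auth(handler: Callable[[Request], dict]) -> Callable[[Request], dict]:
    """Wrap a handler so unauthenticated calls never reach business logic."""
    def wrapped(req: Request) -> dict:
        identity = TENANT_KEYS.get(req.headers.get("Authorization"))
        if identity is None:
            # Auditable rejection at the edge, before any downstream work.
            return {"status": 401, "error": "authentication required"}
        req.tenant, req.user = identity  # propagate context downstream
        return handler(req)
    return wrapped

@require_auth
def search(req: Request) -> dict:
    return {"status": 200, "tenant": req.tenant, "results": []}

print(search(Request(headers={}, body={"q": "quarterly plan"})))       # rejected
print(search(Request(headers={"Authorization": "key-abc"}, body={})))  # scoped to tenant-1
```

The key property is structural: the handler cannot be reached without identity, so "authentication by default" is not a convention individual teams must remember.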

2. Flexible input surfaces without tight contract discipline

The reported exploit path involved abuse of request structure rather than only obvious payload values.

That highlights the value of strong contract discipline.

Enterprise AI systems are often built for flexibility: permissive JSON payloads, hand-rolled service layers, rapid iteration, and “we’ll harden it later” access patterns. That can be expedient in early development, but risky at enterprise scale.

Mature platforms reduce ambiguity at the protocol layer through:

  • typed request schemas
  • constrained interface definitions
  • framework-enforced or generated access paths
  • avoidance of permissive raw query construction

The point is not rigidity for its own sake. It is reducing ambiguity that can create avoidable attack surfaces.
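One way to picture contract discipline is a parser that fails closed: unknown fields, missing fields, and out-of-range values are all rejected before the payload touches a query layer. The schema below (`SearchRequest`, its fields, the limit bounds) is a hypothetical illustration, not the actual interface of any platform discussed here.

```python
# Sketch: a typed request contract that rejects unexpected structure
# instead of passing permissive JSON straight through.

from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SearchRequest:
    query: str
    workspace_id: str
    limit: int = 10

def parse_request(payload: dict) -> SearchRequest:
    """Parse an untyped payload into a typed contract, failing closed."""
    allowed = {f.name for f in fields(SearchRequest)}
    unknown = set(payload) - allowed
    if unknown:
        # A permissive parser would silently carry these fields downstream.
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    req = SearchRequest(**payload)  # missing required fields raise here
    if not isinstance(req.query, str) or not isinstance(req.limit, int):
        raise TypeError("field type mismatch")
    if not 1 <= req.limit <= 100:
        raise ValueError("limit out of range")
    return req

ok = parse_request({"query": "pricing policy", "workspace_id": "ws-7"})
print(ok)

try:
    parse_request({"query": "x", "workspace_id": "ws-7", "$where": "1=1"})
except ValueError as e:
    print("rejected:", e)
```

In practice this role is usually played by schema frameworks or generated clients; the point is that request structure, not just request values, is constrained.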

3. Flat data planes without meaningful containment

A central lesson from the breach pattern is the importance of segmentation.

Enterprise AI platforms often unify many high-value assets in one operational surface: conversations, documents, embeddings, retrieval layers, user profiles, workspace metadata, and execution traces. That makes containment especially important.

Leaders should examine whether the platform enforces:

  • tenant isolation
  • workspace or domain separation
  • session-scoped memory boundaries
  • separate storage and secret paths
  • policy-aware access controls where required

A mature architecture should limit how far any single failure can travel.

A useful question here is simple:

If one request path is compromised, what remains walled off?

If the answer is unclear, the architecture likely needs stronger containment.
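The "what remains walled off" question can be made structural rather than procedural: give callers a view that applies the tenant predicate unconditionally, so no query path can omit it. This sketch uses invented names (`TenantScopedStore`, `ScopedView`) and an in-memory list in place of a real database.

```python
# Sketch: tenant scope enforced by the data-access layer itself,
# not by a per-query convention each developer must remember.

class TenantScopedStore:
    def __init__(self):
        self._rows = []  # every row carries its tenant label

    def put(self, tenant: str, doc: dict) -> None:
        self._rows.append({"tenant": tenant, **doc})

    def scoped(self, tenant: str) -> "ScopedView":
        """All reads go through a view that cannot see other tenants."""
        return ScopedView(self._rows, tenant)

class ScopedView:
    def __init__(self, rows, tenant):
        self._rows, self._tenant = rows, tenant

    def find(self, **filters) -> list:
        # The tenant predicate is applied unconditionally on every read.
        return [
            row for row in self._rows
            if row["tenant"] == self._tenant
            and all(row.get(k) == v for k, v in filters.items())
        ]

store = TenantScopedStore()
store.put("tenant-1", {"id": "d1", "title": "Playbook A"})
store.put("tenant-2", {"id": "d2", "title": "Playbook B"})

view = store.scoped("tenant-1")
print([r["id"] for r in view.find()])  # tenant-2's data is unreachable from this view
```

A compromised request path holding a `tenant-1` view still cannot enumerate `tenant-2` data, which is the containment property the question above is probing for.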

4. AI control logic stored too close to mutable operational data

One of the most important architectural questions in enterprise AI is where prompts, playbooks, policies, and behavioral controls live.

If those assets are stored in the same mutable operational plane as user data, a data-layer compromise may create a pathway to behavioral manipulation as well.

That is a different class of risk than ordinary record exposure.

It means the issue may affect not only what the system stores, but how it reasons, responds, routes work, or applies policy.

That is why organizations should prefer:

  • versioned governance artifacts
  • deployment-managed control assets
  • explicit separation between control-plane logic and user content
  • runtime controls that cannot be silently relaxed through ordinary data writes

This separation is important for both security and operational discipline.
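A minimal sketch of that separation, under assumed names: governance artifacts ship with the deployment as a checksummed, read-only asset, so an attacker with ordinary data-write access cannot silently rewrite prompts or tool policy. The artifact format, field names, and pinning scheme here are all illustrative.

```python
# Sketch: control-plane artifacts as versioned, integrity-checked,
# read-only assets, separate from the mutable operational data plane.

import hashlib
import json
from types import MappingProxyType

# Ships with the deployment; does not live in the operational database.
ARTIFACT = json.dumps({
    "version": "2026-01-15.1",
    "system_prompt": "You are a compliance-aware assistant...",
    "tool_policy": {"allow": ["search", "summarize"]},
}, sort_keys=True)

# In a real pipeline this digest is pinned at release time.
PINNED_SHA256 = hashlib.sha256(ARTIFACT.encode()).hexdigest()

def load_control_plane(artifact: str, pinned_digest: str):
    """Refuse to start if the control artifact fails its integrity check."""
    digest = hashlib.sha256(artifact.encode()).hexdigest()
    if digest != pinned_digest:
        raise RuntimeError("control artifact integrity check failed")
    # Top-level keys cannot be reassigned through this read-only view.
    return MappingProxyType(json.loads(artifact))

controls = load_control_plane(ARTIFACT, PINNED_SHA256)
print(controls["version"])

tampered = ARTIFACT.replace("summarize", "execute_shell")
try:
    load_control_plane(tampered, PINNED_SHA256)
except RuntimeError as e:
    print("blocked:", e)
```

Changing agent behavior then requires a new versioned release, which is exactly the auditable path the bullets above describe.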

5. Limited visibility into abuse, drift, or abnormal behavior

The source materials suggest that the reported issue may have persisted for a significant period before disclosure.

Whether every detail is eventually confirmed or not, the architectural lesson is clear:

you cannot contain what you cannot see.

Enterprise AI platforms should support more than basic infrastructure logging. They should provide visibility into:

  • service-entry activity
  • model interactions
  • policy and guardrail events
  • workflow execution paths
  • anomalous usage patterns across tools and sessions

Observability is not just a compliance requirement.

It is an operational control that supports investigation, detection, and rapid containment.
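As a toy illustration of observability as a control rather than a log file, the sketch below emits structured events and runs a deliberately simple anomaly check over them. The event fields and the threshold are assumptions for the example; a real platform would stream these to a SIEM and use far richer detection.

```python
# Sketch: structured platform events plus a trivial per-session anomaly check.

from collections import Counter

events = []

def log_event(kind: str, session: str, detail: dict) -> None:
    """Record one structured event; real systems ship these to a SIEM."""
    events.append({"kind": kind, "session": session, **detail})

def flag_anomalous_sessions(threshold: int = 5) -> list:
    """Flag sessions whose tool-invocation volume exceeds a simple threshold."""
    counts = Counter(e["session"] for e in events if e["kind"] == "tool_call")
    return [s for s, n in counts.items() if n > threshold]

# One normal session and one unusually tool-heavy session:
log_event("model_call", "s-1", {"model": "model-x", "tokens": 412})
log_event("tool_call", "s-1", {"tool": "search"})
for _ in range(8):
    log_event("tool_call", "s-2", {"tool": "search"})

print(flag_anomalous_sessions())
```

Even this crude check surfaces the noisy session; without structured events at the model and tool layer, there is nothing for any detection logic, crude or sophisticated, to run against.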

The question most teams still aren’t asking

Here is what most enterprise AI security discussions miss:

Even if every governance problem is solved — every endpoint authenticated, every interface typed, every data plane segmented, every control plane protected, and every behavior monitored — there remains a deeper question.

What about the knowledge?

Enterprise knowledge is rarely structured for machine consumption. Across regulated industries and complex organizations, knowledge exists in layers: conflicting policies, outdated procedures, semantic ambiguity, contradictory playbooks, and competing definitions of the same terms.

This is not a governance problem. It is a knowledge integrity problem.

When agents encounter inconsistent, contradictory, or structurally unsound knowledge, they do not malfunction. They normalize the contradiction and confidently cite wrong answers. A governed platform running on unsound knowledge does not become safer. It becomes more dangerous, because the governance makes the agent appear trustworthy even as it reproduces systematic errors from the knowledge layer.

That is why enterprise knowledge must be structured in a robust semantic framework that enforces consistency, surfaces contradictions, and provides agents with a reliable foundation for reasoning and decision-making.

This is not about natural language databases. It is about semantic knowledge operations: the systematic structuring, validation, and governance of enterprise knowledge as the critical infrastructure that agents depend on.

EnPraxis addresses this with the Semantic Knowledge-Operations Fabric, which enforces semantic consistency and structural integrity across enterprise knowledge, making it safe and reliable for agent consumption.

What secure enterprise AI should include

Taken together, these questions point to a practical baseline for enterprise AI security.

Organizations should look for platforms that provide:

  • zero-trust identity at service boundaries
  • typed and constrained service contracts
  • tenant, workspace, and session isolation
  • explicit separation between control plane and data plane
  • first-class observability across models, tools, workflows, and policies

These capabilities are not best viewed as add-on governance features.

They are foundational properties that determine whether governance can be enforced reliably in production.

This is where many organizations still misjudge the problem. They treat governance as a phase that comes after deployment. In reality, governance becomes real only when the runtime itself enforces boundaries that cannot be casually bypassed.

That is why architecture matters so much.

How EnPraxis approaches the problem

At EnPraxis / Empower, we approach enterprise AI from two foundations: the governed platform AND the structured knowledge base.

That means treating security, governance, and observability as platform properties rather than post-deployment enhancements. And it means treating knowledge integrity and semantic structure as prerequisites rather than optional enhancements.

In practice, that means prioritizing:

On the platform side:

  • runtime boundary enforcement
  • structured orchestration contracts
  • isolation and scoped context management
  • protected governance artifacts
  • operational visibility and auditability

On the knowledge side:

  • semantic consistency enforcement across enterprise information
  • structured knowledge fabric that surfaces contradictions
  • versioned, auditable knowledge operations
  • validation gates that prevent inconsistent or ambiguous knowledge from reaching agents

No serious platform should claim absolute protection from every future risk.

The more practical standard is whether the architecture is designed to reduce, contain, and surface material classes of failure before they become systemic. And whether the knowledge infrastructure supports reliable, consistent reasoning.

That is the standard enterprise AI buyers should increasingly expect.

Download the White Paper

The full analysis — five failure points, architectural requirements, and EnPraxis design principles — is available as a PDF white paper.

Conclusion

The McKinsey/Lilli breach pattern matters not because it is unique, but because it highlights questions every enterprise AI leader should now be asking.

Are our interfaces appropriately protected? Are our contracts disciplined? Is our data meaningfully contained? Is our control logic separated from mutable operational data? Can we see abnormal behavior early enough to act?

And: Is our enterprise knowledge structured, consistent, and reliable for machine consumption?

Those are the architecture questions that matter.

In enterprise AI, trust begins with strong platform foundations AND consistent knowledge integrity.

Foundation first.

Agents second.

Ready to see governed AI in action?

Learn how Empower AI helps regulated enterprises move from pilots to production-grade systems of action.