Per-Tenant Isolation Patterns for Multi-Customer LLM SaaS

Start with the threat model
Isolation in the data plane
Isolation in the control plane
Prompts and knowledge bases as tenant property
Routing and logging model calls per tenant
Budgets, quotas, circuit breakers
Audit and exportability
Incident playbooks
Where ConvoSuite fits

Every team that ships a multi-tenant LLM SaaS hits the same wall at month three: the easy demo of "send a question, get an answer" stops being good enough, because the second customer arrives and brings rules. They want their conversations isolated from the first customer's. They want their prompts not to leak. They want their cost capped. They want their compliance officer to be able to print a list of every model call made against their data last quarter.

None of this is novel. SaaS has been solving multi-tenancy since the 2000s. What is genuinely new in 2026 is that LLM workloads break two assumptions of the classical patterns: tokens are not rows, and a model call is not a stateless lookup. This article is a working playbook of the per-tenant isolation patterns we use at ConvoSuite, in roughly the order you would adopt them as a project grows from a pilot to a production SaaS.

1. Start with the actual threat model

"Tenant isolation" means different things to different stakeholders. The product manager hears "tenant A cannot see tenant B's chats." The security officer hears "a compromised tenant cannot escalate into the platform." The CFO hears "a misbehaving tenant cannot bankrupt us with a runaway model bill." All three are legitimate. The cheap mistake is to design only against the first one and discover the other two later.

Before you write any code, write down explicitly: which adversaries do you defend against, and what does success look like for each? For most B2B SaaS, the realistic adversaries are (a) a curious end-user inside a paying tenant who pokes at the URL bar, (b) a malicious prompt injected through user-supplied content, and (c) a buggy code path that mis-routes data between tenants under load. Nation-state adversaries are out of scope unless you sell to government.

2. Isolation in the data plane

For the storage layer, three patterns dominate, and they are not equivalent:

Shared table, tenant_id column. One Postgres table (with or without pgvector) or one OpenSearch index for everyone; every row carries a tenant_id, and every query carries a WHERE tenant_id = ? clause. Cheapest to operate. Most fragile against bugs: one forgotten WHERE clause leaks one tenant's data to another. Acceptable for small B2B SaaS where the blast radius of a leak is reputational, not regulatory.
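A minimal sketch of one common mitigation, assuming a hypothetical `TenantScopedChats` repository and a hypothetical `messages` table. The tenant ID is bound once, when the repository is created, and every query adds the predicate itself, so application code never writes the WHERE clause by hand. SQLite stands in for Postgres only to keep the example self-contained.

```python
import sqlite3


class TenantScopedChats:
    """Hypothetical repository bound to exactly one tenant at construction."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str) -> None:
        self._conn = conn
        self._tenant_id = tenant_id

    def add_message(self, conversation_id: str, body: str) -> None:
        # tenant_id is always written from the bound value, never from the caller.
        self._conn.execute(
            "INSERT INTO messages (tenant_id, conversation_id, body) VALUES (?, ?, ?)",
            (self._tenant_id, conversation_id, body),
        )

    def messages(self, conversation_id: str) -> list[str]:
        # The tenant predicate is added here, once, for every read.
        rows = self._conn.execute(
            "SELECT body FROM messages"
            " WHERE tenant_id = ? AND conversation_id = ? ORDER BY rowid",
            (self._tenant_id, conversation_id),
        )
        return [body for (body,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " tenant_id TEXT NOT NULL, conversation_id TEXT NOT NULL, body TEXT NOT NULL)"
)

acme = TenantScopedChats(conn, "acme")
globex = TenantScopedChats(conn, "globex")
acme.add_message("c1", "hello from acme")
globex.add_message("c1", "hello from globex")  # same conversation ID, other tenant

print(acme.messages("c1"))    # ['hello from acme']
print(globex.messages("c1"))  # ['hello from globex']
```

This narrows the failure mode rather than removing it: any raw SQL that bypasses the repository can still forget the filter, which is exactly the fragility of this tier.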

Shared cluster, per-tenant schema or index. One database server, one OpenSearch cluster — but each tenant gets its own schema (Postgres) or index (OpenSearch / Elasticsearch / vector store). Row-level mistakes can no longer leak across tenants; the worst case is a "wrong tenant connected" bug, which is loud and immediate instead of silent and chronic. This is the sweet spot for most SaaS up to a few hundred tenants.
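A sketch of how a request might be pinned to its tenant's schema, assuming tenant IDs are platform-issued slugs and a hypothetical `tenant_<id>` naming convention for schemas and indexes. The statement is generated rather than executed, so the example runs without a database.

```python
import re

# Hypothetical rule: tenant IDs are short lowercase slugs issued by the platform.
_TENANT_ID = re.compile(r"[a-z][a-z0-9_]{1,30}")


def tenant_namespace(tenant_id: str) -> str:
    """Map a tenant ID to its dedicated schema (Postgres) or index name."""
    if not _TENANT_ID.fullmatch(tenant_id):
        # Schema and index names cannot be bound as query parameters, so
        # reject anything that is not a plain slug instead of escaping it.
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"


def session_setup_sql(tenant_id: str) -> str:
    """Statement to run inside each request's transaction, before any query."""
    # SET LOCAL lasts only until the current transaction ends, so a pooled
    # connection cannot carry one tenant's search_path into the next request.
    return f"SET LOCAL search_path TO {tenant_namespace(tenant_id)}"


print(session_setup_sql("acme"))  # SET LOCAL search_path TO tenant_acme
```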

Per-tenant cluster. Each tenant gets its own database server, its own search cluster, its own object-store prefix with KMS keys it controls. Most expensive. The only credible story for regulated verticals (healthcare, finance, defence) where the auditor wants to see physical separation. Plan for the operational tax: you will be writing a small fleet manager.

Pick the level you can defend in front of a security reviewer, not the level your developers find most convenient. Moving up the ladder later is straightforward; moving down (consolidating) is rarely worth the engineering cost.
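One way to keep the ladder climbable is to make every data-access path ask a single resolver where a tenant lives. Moving a tenant up a tier then becomes a data migration plus a change to one record, rather than a code change scattered across the application. The sketch assumes a hypothetical `resolve_placement` function, a hypothetical shared DSN, and the `tenant_<id>` schema convention; it is illustrative, not a fleet manager.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    SHARED_TABLE = "shared_table"
    SHARED_CLUSTER = "shared_cluster"
    DEDICATED = "dedicated"


@dataclass(frozen=True)
class Placement:
    tier: Tier
    dsn: str                          # which database server to connect to
    schema: str                       # which schema to scope the session to
    row_filter_required: bool         # must every query add WHERE tenant_id = ?
    kms_key_id: Optional[str] = None  # tenant-controlled key, dedicated tier only


SHARED_DSN = "postgresql://shared-db.internal/app"  # hypothetical endpoint


def resolve_placement(
    tenant_id: str,
    tier: Tier,
    dedicated_dsn: Optional[str] = None,
    kms_key_id: Optional[str] = None,
) -> Placement:
    """Turn 'which tier is this tenant on?' into connection details.

    Assumes tenant_id has already been validated as a plain slug.
    """
    if tier is Tier.SHARED_TABLE:
        return Placement(tier, SHARED_DSN, "public", row_filter_required=True)
    if tier is Tier.SHARED_CLUSTER:
        return Placement(tier, SHARED_DSN, f"tenant_{tenant_id}", row_filter_required=False)
    if dedicated_dsn is None or kms_key_id is None:
        raise ValueError("dedicated tenants need their own DSN and KMS key")
    return Placement(tier, dedicated_dsn, "public", row_filter_required=False,
                     kms_key_id=kms_key_id)
```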

3. Isolation in the control plane

The control plane — who can configure prompts, knowledge bases, agents, and budgets — is where most of the day-to-day pain lives. A clean model has three layers:

Platform admins (your team) can see across all tenants but cannot read tenant data without an audited break-glass.
Tenant admins can configure their own tenant entirely but cannot see other tenants.
Tenant end-users can use the configured agents and see only their own conversations.
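The three layers can be expressed as a small authorisation check. A sketch, assuming hypothetical `Role`, `Principal`, and `Conversation` types; it also assumes that tenant admins get no blanket read access to their users' conversations (a policy choice the layers above leave open) and that the caller audit-logs any break-glass ticket it passes in.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    PLATFORM_ADMIN = "platform_admin"
    TENANT_ADMIN = "tenant_admin"
    END_USER = "end_user"


@dataclass(frozen=True)
class Principal:
    user_id: str
    role: Role
    tenant_id: Optional[str]  # None for platform admins


@dataclass(frozen=True)
class Conversation:
    tenant_id: str
    owner_id: str


def can_list_tenants(p: Principal) -> bool:
    # Cross-tenant metadata (which tenants exist, their tiers) is platform-only.
    return p.role is Role.PLATFORM_ADMIN


def can_manage_tenant(p: Principal, tenant_id: str) -> bool:
    # Tenant admins configure their own tenant and nothing else.
    return p.role is Role.TENANT_ADMIN and p.tenant_id == tenant_id


def can_read_conversation(
    p: Principal, conv: Conversation, break_glass_ticket: Optional[str] = None
) -> bool:
    if p.role is Role.PLATFORM_ADMIN:
        # Tenant data stays closed unless a (separately audited) ticket is given.
        return break_glass_ticket is not None
    # Everyone else: same tenant, and only their own conversations.
    return p.tenant_id == conv.tenant_id and p.user_id == conv.owner_id
```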

Implement this on top of OIDC or SAML, not bespoke auth. Reuse a battle-tested identity provider (Cognito, Entra ID, Auth0, Keycloak). Treat your tenant_id as a claim in the token, never as a query-string or local-storage value. Every backend route should derive the tenant from the verified token, not from anything the client says.
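In code, "derive the tenant from the verified token" can be as small as the function below. It is a sketch: it assumes your OIDC or SAML library has already verified the token's signature, issuer, audience, and expiry, and the claim name `tenant_id` is an assumption, since identity providers name custom claims differently. `Forbidden` and `tenant_for_request` are hypothetical names.

```python
from typing import Any, Mapping, Optional


class Forbidden(Exception):
    """Hypothetical error mapped to HTTP 403 by the web framework."""


def tenant_for_request(
    verified_claims: Mapping[str, Any], client_tenant_hint: Optional[str] = None
) -> str:
    """Return the tenant a request acts on, taken only from verified claims."""
    tenant_id = verified_claims.get("tenant_id")  # claim name is an assumption
    if not isinstance(tenant_id, str) or not tenant_id:
        raise Forbidden("token carries no tenant claim")
    # A client-supplied value (path segment, header, query string) is only
    # ever compared with the token, never used in its place.
    if client_tenant_hint is not None and client_tenant_hint != tenant_id:
        raise Forbidden("tenant in request does not match tenant in token")
    return tenant_id
```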

4. Prompts and knowledge bases are tenant property

This is the area that traditional SaaS patterns get wrong. A prompt template is not just configuration — it is intellectual property the tenant has paid you to host, and frequently it embeds business logic the tenant considers competitive. Treat prompt templates exactly as you would treat customer source code:

Version control them per tenant. Tenants will want to roll back.
Encrypt at rest with keys tied to the tenant where possible.
Never log full prompts at INFO level. The diff between a customer's prompt and your boilerplate often reveals their competitive secret.
Export them on request — "I want my prompts in a zip file" is a fair demand and a great way to lose a renewal if you can't satisfy it.
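A minimal sketch of the versioning, logging, and export rules, assuming a hypothetical in-memory `PromptStore`. A real implementation would sit on a database and add per-tenant encryption at rest, which is omitted here. Rollback appends the old text as a new version instead of deleting history, so the trail of what was live when stays intact.

```python
import hashlib
import io
import logging
import zipfile

log = logging.getLogger("prompts")


class PromptStore:
    """Hypothetical in-memory, append-only prompt history, keyed per tenant."""

    def __init__(self) -> None:
        # (tenant_id, prompt_name) -> versions, oldest first; version N is index N-1
        self._versions: dict[tuple[str, str], list[str]] = {}

    def save(self, tenant_id: str, name: str, text: str) -> int:
        history = self._versions.setdefault((tenant_id, name), [])
        history.append(text)
        # Log a short digest so changes are traceable without logging the prompt.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        log.info("prompt saved tenant=%s name=%s version=%d sha256=%s",
                 tenant_id, name, len(history), digest)
        return len(history)

    def current(self, tenant_id: str, name: str) -> str:
        return self._versions[(tenant_id, name)][-1]

    def rollback(self, tenant_id: str, name: str, version: int) -> int:
        history = self._versions[(tenant_id, name)]
        if not 1 <= version <= len(history):
            raise ValueError(f"no version {version} of {name!r}")
        # Re-publish the old text as a new version; history is never rewritten.
        return self.save(tenant_id, name, history[version - 1])

    def export_zip(self, tenant_id: str) -> bytes:
        """Every version of every prompt owned by one tenant, as a zip archive."""
        buf = io.BytesIO()
        with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
            for (owner, name), history in self._versions.items():
                if owner != tenant_id:
                    continue
                for number, text in enumerate(history, start=1):
                    zf.writestr(f"{name}/v{number}.txt", text)
        return buf.getvalue()
```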

Knowledge bases inherit the same rules, plus one more: chunks from tenant A's KB must never appear in tenant B's RAG context. The clean way to enforce this is at the index level (per-tenant index, queries scoped at connection time). The dirty way is at the application level (shared index, filter in the application). The dirty way is also the way you will eventually leak data when somebody adds a new endpoint and forgets the filter.
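The clean way can be sketched as a handle that is bound to one tenant when it is obtained, so the search call has no tenant parameter to forget. `TenantIndex` and `IndexRegistry` are hypothetical, and the brute-force cosine similarity over an in-memory list stands in for a real per-tenant vector index.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class TenantIndex:
    """Hypothetical stand-in for one tenant's own vector index."""
    chunks: list[tuple[str, list[float]]] = field(default_factory=list)

    def add(self, text: str, embedding: list[float]) -> None:
        self.chunks.append((text, embedding))

    def search(self, query: list[float], k: int = 3) -> list[str]:
        # Only this tenant's chunks are ever candidates; there is no filter to skip.
        ranked = sorted(self.chunks, key=lambda c: cosine(query, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


class IndexRegistry:
    def __init__(self) -> None:
        self._indexes: dict[str, TenantIndex] = {}

    def for_tenant(self, tenant_id: str) -> TenantIndex:
        # The tenant is chosen once, when the handle is obtained (ideally from
        # the verified token), not on every query.
        return self._indexes.setdefault(tenant_id, TenantIndex())
```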

5. Model calls: route per tenant, log per tenant, cap per tenant

Every outbound model call should carry, at minimum, a tenant ID and a request ID. From those two values you should be able to reconstruct, months later, exactly what happened: prompt, model, tokens in, tokens out, latency, cost. If you cannot, you cannot do incident response, you cannot do FinOps, you cannot answer a tenant's "what did you charge me for?" question. This is non-negotiable.
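A sketch of a per-call record, assuming hypothetical field names and made-up prices per 1,000 tokens (real prices vary by provider, model, and region). It stores a reference to the prompt rather than the prompt text, so the full prompt can live in a restricted store instead of in general logs while still being recoverable.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Optional

# Made-up USD prices per 1,000 tokens as (input, output); not real list prices.
PRICES_PER_1K = {"example-model-small": (0.0005, 0.0015)}


@dataclass(frozen=True)
class ModelCallRecord:
    tenant_id: str
    request_id: str
    model: str
    prompt_ref: str    # pointer into a restricted prompt store, not the prompt text
    tokens_in: int
    tokens_out: int
    latency_ms: int
    cost_usd: float
    timestamp: float   # Unix seconds


def record_call(
    tenant_id: str,
    model: str,
    prompt_ref: str,
    tokens_in: int,
    tokens_out: int,
    latency_ms: int,
    request_id: Optional[str] = None,
) -> ModelCallRecord:
    price_in, price_out = PRICES_PER_1K[model]  # KeyError for unpriced models, on purpose
    cost = tokens_in / 1000 * price_in + tokens_out / 1000 * price_out
    return ModelCallRecord(
        tenant_id=tenant_id,
        request_id=request_id or str(uuid.uuid4()),
        model=model,
        prompt_ref=prompt_ref,
        tokens_in=tokens_in,
        tokens_out=tokens_out,
        latency_ms=latency_ms,
        cost_usd=round(cost, 6),
        timestamp=time.time(),
    )


rec = record_call("acme", "example-model-small", "prompts/acme/greeting/v3",
                  tokens_in=1200, tokens_out=300, latency_ms=850, request_id="req-1")
print(json.dumps(asdict(rec)))  # one JSON line per call, ready for a log pipeline
```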

For routing, the right pattern is to put a tenant-aware gateway between your application and the model provider. The gateway is the only component that talks to Bedrock, Azure OpenAI, or OpenAI. It enforces three rules: (1) the request matches the tenant's allow-list of models, (2) the request is within the tenant's rate and spend budget, and (3) the response is logged before it goes back to the caller. AWS API Gateway in front of a Lambda function (with a Lambda authoriser handling the tenant check) is one way; a small in-cluster service is another; both work.
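A minimal in-process sketch of those three rules. `ModelGateway`, `TenantPolicy`, and the `provider` callable are hypothetical; the provider stands in for whichever SDK actually calls Bedrock, Azure OpenAI, or OpenAI, and is assumed to return the response text and its cost. The spend check runs before the call, so a tenant can overshoot its limit by at most one request's cost.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


class GatewayError(Exception):
    pass


@dataclass
class TenantPolicy:
    allowed_models: frozenset[str]
    requests_per_minute: int
    spend_limit_usd: float
    spent_usd: float = 0.0
    recent: list[float] = field(default_factory=list)  # monotonic times of recent calls


# Hypothetical provider signature: (model, prompt) -> (response_text, cost_usd)
Provider = Callable[[str, str], tuple[str, float]]


class ModelGateway:
    def __init__(self, policies: dict[str, TenantPolicy], provider: Provider,
                 audit_log: list[dict]) -> None:
        self._policies = policies
        self._provider = provider
        self._audit_log = audit_log

    def call(self, tenant_id: str, request_id: str, model: str, prompt: str) -> str:
        policy = self._policies.get(tenant_id)
        if policy is None:
            raise GatewayError(f"unknown tenant {tenant_id!r}")
        # Rule 1: the model must be on the tenant's allow-list.
        if model not in policy.allowed_models:
            raise GatewayError(f"model {model!r} not allowed for {tenant_id!r}")
        # Rule 2: rate (sliding 60-second window) and spend budget.
        now = time.monotonic()
        policy.recent = [t for t in policy.recent if now - t < 60.0]
        if len(policy.recent) >= policy.requests_per_minute:
            raise GatewayError("rate limit exceeded")
        if policy.spent_usd >= policy.spend_limit_usd:
            raise GatewayError("spend budget exhausted")
        policy.recent.append(now)
        text, cost = self._provider(model, prompt)
        policy.spent_usd += cost
        # Rule 3: write the log entry before the response leaves the gateway.
        self._audit_log.append({"tenant_id": tenant_id, "request_id": request_id,
                                "model": model, "cost_usd": cost})
        return text
```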

6. Budgets, quotas, and circuit breakers

An unbounded LLM bill is the second-most-cited reason for cancelled SaaS pilots. Build the budget primitive in early: each tenant has a daily token budget, a monthly token budget, and a "kill switch" the tenant admin can flip without calling support. Surface usage to the tenant in their own admin console in near real time. The combination of visibility and self-service control turns "you charged me too much" disputes into non-events.
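A sketch of the budget primitive, assuming a hypothetical `TokenBudget` kept per tenant and consulted before every model call. Counters reset on calendar boundaries of the server's local date, which is a simplification; a production version would persist the counters and choose a time zone per tenant.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TokenBudget:
    """Hypothetical per-tenant token budget with daily and monthly windows."""
    daily_limit: int
    monthly_limit: int
    killed: bool = False  # kill switch the tenant admin can flip
    _day: date = field(default_factory=date.today, init=False)
    _daily_used: int = field(default=0, init=False)
    _monthly_used: int = field(default=0, init=False)

    def _roll(self, today: date) -> None:
        # Reset counters when the calendar day or month changes.
        if (today.year, today.month) != (self._day.year, self._day.month):
            self._monthly_used = 0
        if today != self._day:
            self._daily_used = 0
        self._day = today

    def try_consume(self, tokens: int, today: Optional[date] = None) -> bool:
        """Reserve tokens if both windows allow it; False means reject the call."""
        self._roll(today or date.today())
        if self.killed:
            return False
        if (self._daily_used + tokens > self.daily_limit
                or self._monthly_used + tokens > self.monthly_limit):
            return False
        self._daily_used += tokens
        self._monthly_used += tokens
        return True

    def usage(self) -> dict[str, int]:
        """Figures to surface in the tenant's own admin console."""
        return {"daily_used": self._daily_used, "daily_limit": self.daily_limit,
                "monthly_used": self._monthly_used, "monthly_limit": self.monthly_limit}
```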

7. Audit and exportability

Every regulated tenant will ask, at some point, for an export of every model call made against their data over a specified time window. Build that endpoint on day one, even if no tenant has asked yet, because retrofitting it onto a logging pipeline that was not designed for it is several engineering weeks of unpleasant work. The export should include: timestamp, model name and version, prompt (or a documented redaction policy), response, tokens, cost, requesting user, and any tool calls made. Make it tamper-evident with a hash chain if you can, where each record includes the hash of the one before it, and sign the final hash; that small detail is what turns "log file" into "evidence."
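A hash chain can be sketched with the standard library alone. In this hypothetical export format each record carries the hash of the previous record, so editing, removing, or reordering an entry breaks verification of every later entry. Truncating the tail is only detectable if the final hash was recorded elsewhere, which is why it should be signed or stored somewhere the exporter cannot rewrite. `chain_records` and `verify_chain` are illustrative names, and records are assumed not to contain `prev_hash` or `hash` keys already.

```python
import hashlib
import json

GENESIS = "0" * 64  # conventional "previous hash" for the first record


def _canonical(record: dict) -> str:
    # Stable serialisation: same record, same bytes, same hash.
    return json.dumps(record, sort_keys=True, separators=(",", ":"))


def chain_records(records: list[dict]) -> list[dict]:
    """Return copies of the records with prev_hash and hash fields added."""
    prev = GENESIS
    chained = []
    for record in records:
        digest = hashlib.sha256((prev + _canonical(record)).encode("utf-8")).hexdigest()
        chained.append({**record, "prev_hash": prev, "hash": digest})
        prev = digest
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute every link; any edit or reordering makes this return False."""
    prev = GENESIS
    for entry in chained:
        record = {k: v for k, v in entry.items() if k not in ("prev_hash", "hash")}
        expected = hashlib.sha256((prev + _canonical(record)).encode("utf-8")).hexdigest()
        if entry["prev_hash"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```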

8. Have an incident playbook before you have an incident

Three scenarios deserve a written, rehearsed playbook:

Cross-tenant leak. Detected via the audit log, a tenant complaint, or an internal review. The playbook covers freeze (stop the bug), measure (which tenants, what data, what window), notify (tenants in scope, regulators if applicable), remediate (delete leaked artefacts, rotate keys), and review (what code path failed).
Tenant compromise. A tenant admin's credentials are stolen and used to extract data or run up a bill. The playbook covers detection (anomaly thresholds), containment (kill switch), recovery (key rotation, audit replay), and customer communication.
Provider outage. Bedrock or Azure OpenAI is down in your primary region. The playbook covers failover (degrade gracefully, switch model where possible), communication (status page, tenant emails), and post-mortem.
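For the tenant-compromise scenario, the "anomaly thresholds" step can start as simply as comparing the current hour's token usage with the tenant's own recent average. A sketch, assuming hypothetical function names and arbitrary defaults (5x the baseline, with a 10,000-token floor) that you would tune per tenant; containment here means flipping the same tenant kill switch described in the budgets section and raising an alert for a human.

```python
from statistics import mean


def is_anomalous(hourly_tokens: list[int], current_hour_tokens: int,
                 factor: float = 5.0, floor: int = 10_000) -> bool:
    """Flag usage far above the tenant's own recent hourly baseline.

    The floor keeps tiny or brand-new tenants from tripping on normal spikes.
    """
    baseline = mean(hourly_tokens) if hourly_tokens else 0.0
    return current_hour_tokens > max(baseline * factor, floor)


def contain_if_anomalous(tenant_id: str, hourly_tokens: list[int], current_hour_tokens: int,
                         kill_switches: dict[str, bool], alerts: list[str]) -> bool:
    """Detection plus containment: engage the kill switch and record an alert."""
    if not is_anomalous(hourly_tokens, current_hour_tokens):
        return False
    kill_switches[tenant_id] = True
    alerts.append(f"{tenant_id}: {current_hour_tokens} tokens this hour, kill switch engaged")
    return True
```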

9. Where ConvoSuite fits

ConvoSuite implements the patterns above as defaults: per-tenant configuration, OIDC-backed RBAC, an audited model gateway, exportable usage logs, and tenant-controlled budgets. Most customers turn them all on; a few enable only the subset that matches their threat model. If you are sketching a multi-tenant LLM SaaS and want a second opinion on the isolation architecture before you write the first line of code, we offer a free 45-minute architecture call, useful even if you decide to build it yourself.
