Most "GDPR for LLM" articles are written by people who have never sat across the table from a Data Protection Officer who actually reads what you give them. This checklist is the opposite. It is the list of questions a competent European DPO will ask before they will sign off on a private LLM deployment, in roughly the order they ask them, with the answer your engineering team has to be able to give without flinching.

It is not legal advice. It is a working document we use with customers who are deploying ConvoSuite in regulated EU environments — telco, banking, insurance, healthcare adjacents, and public sector adjacents. Adapt freely; do not skip steps.

1. Lawful basis and purpose

Before architecture, document the lawful basis under Article 6 GDPR for every processing activity the LLM will perform. "We built an assistant" is not a purpose. "The assistant summarises customer support tickets for internal triage" is. The purpose drives everything downstream: retention period, data minimisation, the scope of any DPIA, and the wording of your privacy notice.

For most B2B deployments the basis is legitimate interest (Art. 6(1)(f)) or contract performance (Art. 6(1)(b)). For HR or health use cases you may need explicit consent (Art. 9). Get this in writing from legal before you build, because the choice of basis constrains what you are allowed to do with the data later.

2. Data Protection Impact Assessment

An LLM that processes personal data at scale almost always triggers Article 35: "a high risk to the rights and freedoms of natural persons." That means a DPIA. Do not treat it as paperwork; treat it as architecture. The DPIA forces you to enumerate the categories of personal data flowing through the system, the risks (re-identification, sensitive-data leakage in prompts, model memorisation, third-party processor exposure), and the mitigations. The act of writing it will catch design problems your engineering team would otherwise discover in production.

3. Data residency, end to end

"The model runs in eu-central-1" is not enough. The whole pipeline has to be in scope: the embedding model, the vector store, the application backend, the logging pipeline, the backup destination, and the access points used by support and engineering. One Lambda in us-east-1 doing PII enrichment, one Datadog tenant defaulting to us, one developer SSH'ing in from outside the EU — any of them is a transfer the DPO will flag.

The defensible architecture pins every component to an EU region, uses customer-managed keys held in an EU KMS, and routes all support access through a jump host that is itself in the EU. Document the data flow as a diagram and have your network team confirm it matches reality, not the design.

4. Model-provider contracts

AWS, Microsoft, and OpenAI all publish data-processing agreements (DPAs) for their LLM services. Read them. The relevant clauses to scrutinise: (a) does the provider use your prompts and completions to train their models? (default for the enterprise offerings is "no", but you should confirm in writing), (b) what is the sub-processor list and how are you notified of changes?, (c) what is the procedure and SLA for a Data Subject Access Request that involves your prompts?, (d) what happens to your data on contract termination?

Bedrock and Azure OpenAI both offer enterprise terms where prompts and completions are not used for training and are not retained beyond the request lifecycle for abuse monitoring (typically 30 days, sometimes opt-out-able). Get a signed DPA on file before production.

5. Data minimisation in prompts

The cheapest GDPR-compliance win in any LLM system is to not send personal data to the model in the first place. Tokenise customer names to opaque IDs before the prompt. Strip e-mail addresses, phone numbers, IBANs, and free-text fields known to contain personal data. Use a tested redaction library; do not hand-roll regexes. Re-hydrate the tokenised IDs back to human-readable values in the application layer, after the model returns, never sending them upstream.

For the cases where the model genuinely needs personal data (a customer-service assistant that personalises responses), document why, restrict the data to the minimum necessary, and apply the same redaction to anything that is logged.

6. Retention and deletion

Define a retention period for every category of stored data: chat history, audit logs, prompt templates, KB documents, evaluation sets. Implement automated deletion at the end of the retention period, not "we'll do it manually." Test the deletion. Have a runbook for the Article 17 "right to erasure" request that includes the model-provider side — some providers retain abuse-monitoring logs that you cannot directly delete; you must request deletion via a documented support channel.

For embeddings: when a user invokes their right to erasure, the embeddings derived from their data must also be deleted, because embeddings are personal data in the technical sense (they can re-identify). Maintain a mapping from user/document IDs to embedding IDs so that deletion is mechanical.

7. Access control and audit

Every person who can see prompts, completions, KB content, or audit logs needs an audited role. Use the principle of least privilege: support staff see only the ticket they are working on, never a bulk export. Engineering access to production data goes through a break-glass with two-person approval and an automatic ticket. Audit logs are write-once, hashed, retained for the period the regulator expects (typically six years for financial-services-adjacent deployments).

8. Subject access requests

Build the Article 15 / Article 20 endpoints on day one. For any data subject, your system should be able to produce, within the 30-day SLA: every conversation involving that subject, every document about that subject, every model call that touched that subject's data, and a machine-readable export. The first DSAR you handle by hand will cost you more in engineering time than building the endpoint properly upfront.

9. Breach detection and notification

Article 33 requires you to notify the supervisory authority within 72 hours of awareness of a breach. To meet that SLA you need detection that does not depend on someone manually noticing. Set anomaly thresholds for: unusual volumes of model calls per tenant, cross-tenant data access patterns, prompt content matching sensitive-data regexes leaking outbound, downloads of bulk KB exports outside business hours. None of these are perfect signals; combined, they buy you the hours you need.

10. The output side

An LLM can produce personal data it was not given. Generated names, generated phone numbers, generated medical advice. Treat outputs as potentially containing personal data and apply the same controls: do not log outputs at INFO level, do not store outputs longer than the conversation requires, do not train on outputs without re-consent.

11. Where ConvoSuite fits

ConvoSuite ships with EU-only deployment profiles (eu-central-1, eu-west-1, sweden-central, west-europe), customer-managed KMS keys, audit-log export, configurable retention, prompt-side redaction hooks, and DSAR endpoints. The DPIA template above is also part of the customer onboarding pack. None of that absolves you of doing the work, but most of the mechanical compliance is wired in by default. If you are facing a DPO review and want a sanity check on the architecture before the meeting, we offer a free one-hour readiness review.