AI and Data Privacy: What Every Business Needs to Know Before They Start


She discovered it during a contract renewal review. The AI platform her team had been using for eight months had updated its terms of service four months earlier. The new terms permitted the vendor to use customer data to improve its models. The update had been communicated by email. Nobody had read it.

The customer data entered into the platform over the previous four months – query histories, product details, account information – had potentially been used for model training. She could not prove whether it had or had not. Her legal team’s advice was to disclose to affected customers. The conversations with those clients were not easy.

This is the most common AI data privacy failure: not a deliberate breach, but an assumption. An assumption that the tool worked the way it appeared to work. An assumption that the vendor’s practices were what the sales team implied. An assumption that terms of service updates would be material and clearly communicated.

The Data Privacy Risks Specific to AI

AI tools introduce data privacy risks that are distinct from conventional software. Understanding the difference is the starting point for managing them.

The most significant difference: many AI systems learn from data. Conventional software processes data and returns a result. AI systems can process data and incorporate it into the model – changing what the AI produces for everyone, based on what your data contained. The implication is that data entered into an AI system may not just be processed and returned. It may persist in a way that affects future outputs, including outputs for other customers.

IBM’s 2025 Cost of a Data Breach Report found that 13% of organisations reporting a breach had experienced incidents involving AI models or applications, with 97% of those organisations lacking proper AI access controls. The breach cost statistics are stark: the global average cost of a data breach in 2025 was $4.44 million, with US breaches averaging $10.22 million.

A 2026 DataGrail analysis found that 63% of vendors that advertise AI capabilities do not disclose a third-party AI subprocessor in their legal documentation – meaning the AI tool you are using may be routing your data through an additional AI provider whose data handling practices you have never reviewed.

The Four Questions Every AI Vendor Must Answer in Writing

1. Will My Data Be Used to Train or Improve Your Models?

This is the first and most important question. The answer should be unambiguous. Not ‘we take data privacy seriously.’ Not ‘we may use anonymised data for service improvement.’ A clear, contractually binding answer: your data will not be used to train or improve models that serve other customers.

If the answer is yes, or if the answer is qualified, you need to understand exactly what data, in what form, is used for training – and whether that use is compatible with your obligations to your customers, employees, and regulators.

Ask for the data processing agreement before negotiating price. A vendor who does not have one, or who says they are ‘working on it,’ does not have contractual clarity on this question. It is not a question to defer until later in the procurement process.

2. Who Are Your Third-Party AI Subprocessors?

Most AI platforms do not build every component of their system in-house. They use third-party AI models, cloud infrastructure providers, and specialist services. Each of these third parties may have access to your data.

A complete vendor data handling disclosure includes a list of all subprocessors, what data each one accesses, and the contractual obligations the vendor has imposed on those subprocessors regarding your data. If a vendor cannot produce this list, they cannot tell you with certainty where your data goes.

The 2026 DataGrail finding – that 63% of AI vendors do not disclose their third-party subprocessors – means this question is not answered by default. It must be asked explicitly, and the answer must be documented.

3. Where Is My Data Stored, and Does It Stay There?

Data sovereignty – the requirement that certain types of data remain within specific geographic boundaries – is a regulatory obligation in multiple sectors and jurisdictions. GDPR does not prohibit transfers, but it requires that personal data transferred outside the EEA receives protection essentially equivalent to that guaranteed within the EU. Sector-specific regulations in healthcare, financial services, and government create additional requirements.

Ask the vendor where your data is stored at rest and which regions it passes through in transit. Ask whether that storage location can change without notice. Ask what contractual mechanism ensures compliance with your specific regulatory obligations. If the answer is ‘our infrastructure is global and optimised for performance,’ that is not an answer – it is a description of a problem you need to resolve before signing.

4. What Happens to My Data When the Engagement Ends?

What is the vendor’s data deletion policy at contract termination? How long does data persist after the contract ends? Is deletion guaranteed and verifiable, or is it a policy commitment without audit capability?

Data that has been incorporated into a model cannot be surgically removed. If your data has been used for training, deletion of the raw data does not delete the model’s ‘memory’ of patterns derived from that data. This is a fundamental characteristic of how AI models work, and it is a disclosure that vendors rarely make proactively.

Where model training is a possibility – even a contractually limited one – understand what deletion means in practice before the contract is signed.
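The four questions above can be tracked as a simple record per vendor. The sketch below is illustrative only: the class name, field names, and the idea of storing a pointer to the written answer (for example a contract clause reference) are assumptions, not part of any standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class VendorAnswers:
    """Hypothetical record: each field holds a pointer to the WRITTEN answer
    (e.g. a contract or DPA clause reference), or None if none exists yet."""
    trains_on_customer_data: Optional[str] = None   # question 1
    subprocessors: Optional[str] = None             # question 2
    storage_location: Optional[str] = None          # question 3
    deletion_at_termination: Optional[str] = None   # question 4


def unanswered(answers: VendorAnswers) -> List[str]:
    """Return the names of questions with no documented written answer."""
    return [
        f.name
        for f in fields(answers)
        if not (getattr(answers, f.name) or "").strip()
    ]


# Usage: only question 1 has a documented answer so far.
record = VendorAnswers(trains_on_customer_data="DPA clause (reference here)")
for name in unanswered(record):
    print("Still needs a written answer:", name)
```

A record like this makes the gap visible before signing: a verbal assurance from a sales call leaves the field empty.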

Internal Controls That Protect Your Business

Vendor due diligence addresses the external risk. Internal controls address the risk that employees use AI tools in ways the organisation has not approved.

The most critical internal control is data classification aligned to AI access permissions. Not all data carries the same privacy risk. A tiered approach – defining which data categories may enter which AI systems – is more practical than a blanket prohibition and more protective than an absence of rules.

Tier one: publicly available data, generic business content, non-sensitive internal communications – approved for use in approved AI tools.

Tier two: internal operational data, business analysis, non-client-specific strategy content – approved only for AI tools with reviewed and accepted data handling terms.

Tier three: customer personal data, financial data, regulated information, confidential commercial data – not approved for any AI tool without a specific, reviewed data processing agreement.
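The three tiers can be expressed as a small policy check. This is a minimal sketch under one stated assumption: tool review levels are cumulative, so a tool with a reviewed data processing agreement also counts as having reviewed terms and as being on the approved list. The names `DataTier`, `ToolStatus`, and `may_enter` are hypothetical.

```python
from enum import IntEnum


class DataTier(IntEnum):
    PUBLIC = 1       # tier one: public data, generic content
    INTERNAL = 2     # tier two: internal operational data
    RESTRICTED = 3   # tier three: personal, financial, regulated data


class ToolStatus(IntEnum):
    UNAPPROVED = 0
    APPROVED = 1         # on the organisation's approved tool list
    TERMS_REVIEWED = 2   # data handling terms reviewed and accepted
    DPA_IN_PLACE = 3     # specific, reviewed data processing agreement


def may_enter(tier: DataTier, status: ToolStatus) -> bool:
    """Assumption: review levels are cumulative, so data of a given tier
    may enter any tool whose review level is at least that tier."""
    return int(status) >= int(tier)


# Usage: customer personal data into a tool with reviewed terms but no DPA.
if not may_enter(DataTier.RESTRICTED, ToolStatus.TERMS_REVIEWED):
    print("Blocked: tier three data requires a reviewed DPA.")
```

Encoding the rule this way keeps the policy in one place, so a change to the tiers does not have to be repeated in every tool integration.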

Communicate the tiers. Review them. Update them as the tool landscape and the regulatory environment change.

FAQ: AI Data Privacy

Does GDPR apply to AI tools used by businesses?

Yes. If your business processes personal data of EU residents – including employee data, customer data, or partner contact information – and that data enters an AI tool, GDPR applies. This means the AI vendor must be a contracted data processor with a compliant data processing agreement. Data transfers to AI tools hosted outside the EU or EEA require additional safeguards. Most AI vendors have a data processing agreement available on request – the risk is in deploying the tool before requesting and reviewing one.

How do you know if an AI vendor is using your data for training?

Read the data processing agreement and the terms of service – specifically the sections on data use, model improvement, and service enhancement. Look for language permitting the use of ‘aggregated’, ‘anonymised’, or ‘de-identified’ data for service improvement – this language often permits forms of model training. If the terms are ambiguous, ask the vendor to provide a written clarification, and get the answer incorporated into your contract, not just a sales email.
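A first-pass scan for this language can be automated before the documents go to legal review. The sketch below is a plain keyword search, not a legal analysis: the watch list combines the terms named above with the US spelling ‘anonymized’, and the function name and the sentence-splitting rule are assumptions.

```python
import re
from typing import List, Tuple

# Terms named in the text above, plus the US spelling (an added assumption).
WATCH_TERMS = (
    "aggregated",
    "anonymised",
    "anonymized",
    "de-identified",
    "model improvement",
    "service improvement",
    "service enhancement",
)


def flag_clauses(terms_text: str) -> List[Tuple[str, str]]:
    """Return (matched term, sentence) pairs for human review.

    Sentences are split naively on ., ! or ? followed by whitespace,
    which is good enough for a first pass but will miss some clauses.
    """
    sentences = re.split(r"(?<=[.!?])\s+", terms_text.strip())
    hits = []
    for sentence in sentences:
        lowered = sentence.lower()
        for term in WATCH_TERMS:
            if term in lowered:
                hits.append((term, sentence))
    return hits


# Usage with an invented sample clause.
sample = "We may use aggregated data to improve the service. Data is stored in the EU."
for term, sentence in flag_clauses(sample):
    print(f"[{term}] {sentence}")
```

A hit is a prompt to read the clause and ask the vendor for written clarification; an empty result does not mean the terms are safe.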

What should happen to business data when an AI contract ends?

The contract should specify a defined deletion timeline – typically 30 to 90 days after termination – with certification of deletion provided to you in writing. Where data may have been used for model training, ask the vendor what ‘deletion’ means for trained models specifically. For regulated data, the deletion terms may need to meet specific regulatory standards – involve your legal or compliance team in reviewing the termination clauses before signing.
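The deletion window is simple date arithmetic, and it is worth tracking so that a missing certificate is noticed. A minimal sketch follows; the function names are hypothetical, and the window length should come from your own contract rather than from the 30 to 90 day range mentioned above.

```python
from datetime import date, timedelta
from typing import Optional


def deletion_deadline(termination: date, window_days: int) -> date:
    """Last day by which the vendor must have deleted the data."""
    return termination + timedelta(days=window_days)


def certificate_overdue(
    termination: date,
    window_days: int,
    certificate_received: Optional[date],
    today: date,
) -> bool:
    """True if the window has passed and no written certificate exists."""
    if certificate_received is not None:
        return False
    return today > deletion_deadline(termination, window_days)


# Usage: contract ended 1 January 2025 with a 30-day deletion window.
ended = date(2025, 1, 1)
print("Deadline:", deletion_deadline(ended, 30))
print("Overdue:", certificate_overdue(ended, 30, None, date(2025, 2, 15)))
```

This only tracks raw data deletion and the certificate; it says nothing about data that may already have been used for model training.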
