Lost In Translation: When AI Reveals The Growing Divide Between Legal Reasoning And Machine Logic

In recent years, I’ve attended numerous meetings involving IT and legal teams where their goals and priorities seem completely out of sync. It often feels like two entirely different universes struggling to find common ground under tight deadlines. As one colleague put it: "Legal writes for people, IT builds for machines." Law thrives on interpretation, context, and mitigation, while IT relies on logic and deterministic workflows. Because of this, even minor misunderstandings can result in weeks of wasted effort building technical solutions that were never legally sound to begin with.

The problem, solution, and anticipated outcomes I describe here target all three groups: business leaders, IT professionals (especially those in Data & AI), and legal experts who are struggling to bridge the growing divide when rolling out compliant data solutions. Rather than viewing this as purely a communication problem, I offer a practical framework for turning legal intent into machine-readable, architecture-aware controls that can grow alongside modern data and AI ecosystems.

For a long time, this tension was manageable. It became glaringly obvious after the GDPR was adopted in 2016, and the surge of AI demands from every part of the business is about to expose the gap on a much larger scale.

Business: Results above all else

The business side focuses on growth, revenue, efficiency, and competitive edge. Their vocabulary revolves around KPIs, margins, and performance metrics. Compliance, from their viewpoint, is an obstacle to work around rather than a core objective. They are not being careless; they simply operate in an environment where outcomes are what matter most.

They aim to:

Understand customer behavior
Experiment with new features
Leverage AI for personalized experiences
Derive more value from their data

Legal: Risk management and defensibility

Legal doesn’t deal in absolutes. Demanding zero risk is neither practical nor realistic; instead, they work with acceptable risk. What they do insist on is that any risk taken is defensible, so that if questioned, they can prove responsible action was taken.

Their thinking revolves around:

Lawful basis
Proportionality
Mitigation
Demonstrable intent
Defensibility in case of investigation

Legislation is written in narrative form and is deliberately principle-based, leaving room for interpretation. Legal professionals are trained to interpret that narrative. However, they are not trained to design database schemas, configure access controls, or define system-level enforcement mechanisms.

IT: Deterministic Controls

IT requires absolute clarity and cannot work with narratives. "Reasonable safeguards" cannot be translated into code. When legal says, "It depends," IT hears, "I can’t build this."

IT needs clear answers to questions like:

Is this field considered personal data?
Can this dataset be used for training models?
What retention period must be enforced?
Should this attribute be masked or removed?
What exactly qualifies as anonymization in this context?

So, the business is focused on driving value and profitability, while Legal’s priority is ensuring compliance. At the same time, IT must support both by building dependable systems that deliver value while staying compliant. This creates an uneven burden of compliance responsibility, which in turn makes discussions and agreements between these departments painfully slow.

Image generated by author

How AI widens the gap

Previously, this was manageable because the pace of data usage was slower. Manual oversight was still feasible, and legal teams could review major initiatives one at a time. That era is ending as the volume and velocity of AI-driven data usage overwhelm traditional compliance models. Data is no longer analyzed in a linear fashion; it is continuously processed, combined, enriched, repurposed, and modeled. Autonomous agents can even trigger workflows, generate insights, and make decisions without any human review. At this scale, traditional legal oversight falls apart. Legal cannot manually evaluate every new data use case or "processing"¹ activity, and IT doesn’t have the bandwidth to interpret vague legal language every time an engineer builds a new pipeline. Meanwhile, the business will not put innovation on hold to wait for interpretive debates.

From legal text to architecture-aware compliance

Legal intent is rarely captured in a way that systems can validate. Instead, compliance lives in PDFs, policy documents, meeting minutes, and emails. That approach works up to a point, but it breaks down at scale.

What’s missing is a shared interface in a structured, machine-readable format, using concepts like structured metadata, policy-as-code, and data contracts to serve as translation layers. Instead of open-ended discussions, we can use AI to validate usage automatically and autonomously.

Machine-readable governance would allow Legal to define acceptable boundaries, IT to implement enforceable constraints, and the Business to clearly see what is or isn’t permitted. We need to shift from theoretical compliance to observable compliance.
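As a minimal sketch of what structured metadata could look like at field level, the snippet below records a per-field answer to the kind of questions IT asks (is it personal data, should it be masked or removed) and applies it mechanically. The names `FieldRule`, `Handling`, and `apply_rules` are hypothetical, and replacing a value with `***` is only a placeholder for whatever masking technique Legal and IT actually agree on.

```python
from dataclasses import dataclass
from enum import Enum


class Handling(Enum):
    KEEP = "keep"
    MASK = "mask"
    REMOVE = "remove"


@dataclass(frozen=True)
class FieldRule:
    """Hypothetical per-field answer to the questions IT needs settled."""
    is_personal_data: bool
    handling: Handling


def apply_rules(record: dict, rules: dict[str, FieldRule]) -> dict:
    """Return a copy of the record with each field kept, masked, or removed.

    Fields without a rule are dropped, so undocumented data is excluded by default.
    """
    result = {}
    for name, value in record.items():
        rule = rules.get(name)
        if rule is None or rule.handling is Handling.REMOVE:
            continue
        # "***" is a placeholder, not a real masking or anonymization technique.
        result[name] = "***" if rule.handling is Handling.MASK else value
    return result


rules = {
    "email": FieldRule(is_personal_data=True, handling=Handling.MASK),
    "ip_address": FieldRule(is_personal_data=True, handling=Handling.REMOVE),
    "page_viewed": FieldRule(is_personal_data=False, handling=Handling.KEEP),
}
record = {
    "email": "a@example.com",
    "ip_address": "203.0.113.7",
    "page_viewed": "/pricing",
    "debug": "x",  # no rule recorded, so it is dropped
}
print(apply_rules(record, rules))
```

The design choice worth noting is the default: a field nobody has classified is excluded rather than passed through, which keeps the burden on documenting data rather than on catching leaks afterwards.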

The proposal

The Core Idea: Eliminate human error in handoffs between Legal and IT

Unstructured human conversation is a poor vehicle for transferring precise, technical, legally-consequential decisions between people who don’t share a common language. The goal is not to remove human judgment, but to replace the parts of the process where human interaction introduces friction and error rather than adding value.

I propose a straightforward organizational principle: replace the unstructured human handoff with a structured, AI-assisted process. The failure point in traditional compliance is not that humans are involved; it’s that humans with different motivations are asked to reach precise, technically implementable agreements through open-ended conversation. The solution described here is designed to remove friction in a structured way by organizing inputs, automating validations, and saving judgment calls for the truly unclear cases, not the routine ones.

Before getting into the details, it’s helpful to clarify a few core ideas. These concepts are not technical and should be understood around the table by Business, Legal, and IT. In fact, aligning on a shared vocabulary across these groups is part of the solution itself.

Data Products, Output Ports, and Data Contracts

These concepts are often linked to Data Mesh, but you don’t need a mesh architecture to benefit from them. Data Mesh is an approach for handling data in large companies where ownership is given to the teams closest to the data rather than to a central department. Each team treats the data they own as a product—something they’re responsible for maintaining and delivering to others in a reliable, governed manner. Even if you’re using a centralized setup, adopting “data products,” “output ports,” and “data contracts” as common terms is still highly valuable. It gives everyone a clear way to describe what data exists, who’s allowed to use it, how they can access it, and for what purpose. Without this shared language, discussions about data governance between Business, Legal, and IT will continue to break down. Start with the terminology, then build the architecture.

Data Product Model — generated by author

Data Product core concepts — created by author

In practical terms, the data pipeline could look like this:

A data product (“website user behaviour”)
Shares data through an output port (a group of tables in a database)
Is covered by a data contract (which states that the website user behaviour data can only be used to improve the products sold on the website, not for marketing or customer profiling; must be kept for no more than three years; and must be pseudonymized before being used in any predictive or AI model)

Every layer is clearly defined. Every layer is enforceable.
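The three layers above can be sketched as plain data structures. This is an illustrative model only, not the Open Data Contract Standard: the class and attribute names, the table names, and the owning team are assumptions, and "three years" is approximated as 3 × 365 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DataContract:
    allowed_purposes: frozenset[str]
    prohibited_purposes: frozenset[str]
    max_retention_days: int
    pseudonymize_before_modelling: bool


@dataclass(frozen=True)
class OutputPort:
    name: str
    tables: tuple[str, ...]
    contract: DataContract


@dataclass(frozen=True)
class DataProduct:
    name: str
    owner: str
    output_ports: tuple[OutputPort, ...]


def retention_cutoff(contract: DataContract, today: date) -> date:
    """Records collected before this date exceed the contract's retention limit."""
    return today - timedelta(days=contract.max_retention_days)


contract = DataContract(
    allowed_purposes=frozenset({"product_improvement"}),
    prohibited_purposes=frozenset({"marketing", "customer_profiling"}),
    max_retention_days=3 * 365,  # "no more than three years", ignoring leap days
    pseudonymize_before_modelling=True,
)
port = OutputPort("behaviour_tables", ("page_views", "sessions"), contract)
product = DataProduct("website user behaviour", "web-analytics-team", (port,))

print(retention_cutoff(contract, date(2026, 1, 1)))
```

Because the contract hangs off the output port rather than the product, a second port on the same product could carry different terms without changing anything else.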

What a Data Contract Contains

The next illustration shows how a “Website User Behaviour” data product is assigned a data contract. Its output port is implemented as Iceberg tables in an S3 bucket, a setup common in modern data platforms where users can pick the tools they prefer. The contract travels with that output port: whoever accesses these tables, and for whatever reason, must follow the terms recorded in the contract.

Data Contract details – generated by author

A more detailed standard for data product contracts is the Open Data Contract Standard (ODCS)².

The data contract will either reference or spell out the legal purposes and other restrictions that apply to this layer of data access.

Data Contract in the context of legal terms and documents – created by author

How This Works in Practice

The organisations I’ve seen succeed are those that already speak the language of Data Products and Data Contracts. They capture the outcomes of human conversations and agreements as structured metadata and propagate them through data pipelines and consumption layers. However, reaching those agreements in the first place still takes significant effort, and keeping them accurate as the business evolves is even harder.

To address these challenges, I propose a three-phase, AI-supported workflow: PREP, MAP, and RUN.

PREP is done once and refined over time. MAP is triggered for each new data activity or change to an existing one. RUN is continuous, automated monitoring that operates once contracts are active. At no point is ambiguity quietly handed from one team to another and mistaken for alignment.

Proposed three phase approach – created by author

Phase 1 — PREP

Before data activities can be governed through this process, IT takes the lead in setting up the foundation: the data product catalogue, the output port standard, the data contract template, and the LLM-assisted interfaces used in the MAP phase. Everything that follows builds on this preparation.

That said, one lesson I’ve learned from working in and with large organisations is that waiting for everything to be perfect before you begin is pointless, because that day never arrives. Most organisations already have pieces of the puzzle, even if those pieces are scattered or incomplete. A sizable enterprise already running some form of Data Mesh might unknowingly be 80% there, while a smaller startup could still be building basic foundations. In reality, these transformations are typically iterative, messy, and happen alongside day-to-day business.

Start with what you have and work towards a proper data catalogue, focusing first on quick wins: high-value datasets that are frequently requested. For each “data product” (in quotes because that may not be your current terminology, and simply defining your collection of tables could be your first major step), capture only what matters most at this stage: the product name, the owning team, how it’s shared, and a one-line description of what it contains. Everything else—detailed column-level tagging, data-quality scores, lineage diagrams—can be added over time as the catalogue matures and the organisation gains confidence. A catalogue of five well-understood data products is far more useful than two hundred that are only half-documented.
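A starter catalogue along these lines needs very little structure. The sketch below assumes a hypothetical `CatalogueEntry` with just the four attributes mentioned above and a helper that reports which ones are still blank; the example products and team names are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CatalogueEntry:
    """The four attributes suggested as a starting point."""
    product_name: str
    owning_team: str
    shared_via: str
    description: str


def missing_fields(entry: CatalogueEntry) -> list[str]:
    """Names of attributes that are still blank for this entry."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


catalogue = [
    CatalogueEntry("website user behaviour", "web-analytics-team",
                   "Iceberg tables in S3", "Page views and sessions per visitor"),
    CatalogueEntry("order history", "", "database view", "Completed orders"),
]

for entry in catalogue:
    gaps = missing_fields(entry)
    status = "complete" if not gaps else f"missing: {', '.join(gaps)}"
    print(f"{entry.product_name}: {status}")
```

A check like this makes "half-documented" visible early, which supports the preference for a few well-understood products over many incomplete ones.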

Importantly, PREP cannot be a big-bang effort. A programme that tries to catalogue every data product, define every output port, and contract every dataset upfront will stall before it launches. The work required is too extensive, stakeholder interest will fade, and the organisation will move on without a framework.

The LLM interfaces follow the same step-by-step approach. A functional MVP could be a Claude instance connected to your catalogue data through a ready-made MCP server (I use Entropy Data because it is straightforward to set up for testing purposes).

The result is not a completed framework but a functional MVP targeting high-priority data sets.

Phase 2 — MAP

The MAP phase is triggered whenever the business proposes a new data processing activity or a change to an existing one: a new AI model, a pipeline, an access request, or a contract modification. It is the structured replacement for open-ended compliance discussions. There are three steps and three owners, with one clear handoff each time, so that no unresolved ambiguity is passed along unchecked.

Step 1 — Business initiates a change or fresh processing activity

When the business proposes anything related to data, for example a new AI capability or an additional source for an existing pipeline, this process is triggered. This is not a meeting. It is a guided discussion with an LLM set up to document the nature of the change or fresh processing activity and convert it into the information required to start the formal MAP process.

The LLM asks straightforward questions based on the framework configured by IT. It tailors its follow-up questions to the answers the business gives, drawing on existing metadata, company knowledge documents, and whatever else it is configured to access.

If the organisation states something unclear like “We want to use customer data differently,” the LLM will guide the conversation into the right structured approach and verify whether the request falls within the purposes already recorded on the relevant output port’s contract, or whether it constitutes a fresh processing activity requiring a new or updated contract.

The outcome is a structured summary of the change or fresh processing activity: the data product, the output port, the defined purpose, and any updates required.

Behind the scenes, this could involve YAML files with change tracking in Git, but the LLM could present it as a PDF or a page on a shared wiki.

Result: A structured summary of the change or fresh processing activity. The data product, output port, defined purpose, and scope of change along with any unanswered questions clearly highlighted.
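To make the Step 1 output concrete, here is a rough sketch of the structured summary as a record that can be serialised and version-controlled. It assumes the LLM conversation has already extracted the data product, output port, and purpose; the class `ChangeSummary`, the two classification labels, and the wording of the open question are hypothetical. JSON is used only because it ships with Python, and the same structure could equally be stored as YAML.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ChangeSummary:
    data_product: str
    output_port: str
    requested_purpose: str
    classification: str  # "within_existing_contract" or "new_processing_activity"
    open_questions: list[str] = field(default_factory=list)


def summarise_request(data_product: str, output_port: str,
                      requested_purpose: str,
                      contract_purposes: set[str]) -> ChangeSummary:
    """Classify a request against the purposes already recorded on the contract."""
    if requested_purpose in contract_purposes:
        return ChangeSummary(data_product, output_port, requested_purpose,
                             "within_existing_contract")
    # Anything outside the recorded purposes is flagged, not silently approved.
    return ChangeSummary(
        data_product, output_port, requested_purpose, "new_processing_activity",
        open_questions=[f"May '{output_port}' be used for '{requested_purpose}'?"],
    )


summary = summarise_request("website user behaviour", "behaviour_tables",
                            "marketing", {"product_improvement"})
print(json.dumps(asdict(summary), indent=2))
```

The unanswered question travels with the summary, so it reaches Step 2 explicitly and cannot be lost along the way.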

Step 2 — IT receives updated contracts

IT receives the structured summary from Step 1, whether that is a suggested new contract or an amendment to an existing one.

IT can accept the amendments as they stand (as they would have already been curated at this stage) or they can choose to examine them further.

IT resolves whatever it can and forwards only the genuine ambiguities. Questions sent to Legal are pre-scoped and specific. They are framed as precise binary choices, not open-ended narratives. This is what makes Step 3 efficient.

Result: A new draft contract or annotated amendment, technically confirmed, with a brief list of specific, scoped questions for Legal — each worded to prompt a definitive answer, not a written opinion.
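The triage in Step 2 can be pictured as a simple split: items IT can settle on its own, and items that need legal judgment and are forwarded as closed questions. In this sketch the `needs_legal_judgment` flag is set by hand; in practice that assessment is exactly the work IT does in this step. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenItem:
    question: str
    needs_legal_judgment: bool


@dataclass(frozen=True)
class LegalQuestion:
    question: str
    # A fixed answer set keeps the reply a decision, not a written opinion.
    options: tuple[str, ...] = ("yes", "no")


def triage(items: list[OpenItem]) -> tuple[list[str], list[LegalQuestion]]:
    """Split open items into those IT resolves and those forwarded to Legal."""
    resolved_by_it = [i.question for i in items if not i.needs_legal_judgment]
    for_legal = [LegalQuestion(i.question) for i in items if i.needs_legal_judgment]
    return resolved_by_it, for_legal


items = [
    OpenItem("Is the output port stored in an approved region?",
             needs_legal_judgment=False),
    OpenItem("May this output port be used for model training?",
             needs_legal_judgment=True),
]
resolved, forwarded = triage(items)
print(len(resolved), "resolved by IT;", len(forwarded), "forwarded to Legal")
```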

Step 3 — Legal reviews, decides, and signs off

Legal receives a near-final data contract with a brief list of specific questions that require legal judgment. They interact with an LLM loaded with the relevant regulatory context, whether it be GDPR, the AI Act, sector-specific regulations or the organisation’s own policies. The LLM walks them through each question systematically.

This is the critical design decision: the LLM is set up to guide Legal toward definitive answers. Not “It depends” or “We would need to assess,” but “Yes, this purpose aligns with the declared lawful basis on this output port,” “No, this output port may not be used for model training without explicit consent,” or “Permitted under these specific conditions, which must be recorded in the data contract.” Wherever Legal exercises judgment, that judgment is captured as a structured decision, not a paragraph of text that IT must interpret.

Legal owns what is signed off. The data contract is updated to reflect their decisions. If the access request is approved, the consuming data product is added to the catalogue and the contract governs its use automatically. If challenged later, the record shows exactly what was assessed, by whom, on what date, and on what basis. There is no uncertainty about accountability.

A few of the legal professionals I have spoken to over the last couple of years have already started recognising this shift themselves. From my perspective, the organisations that will navigate it best are the ones where legal teams become more architecturally informed and technically literate. The more they understand systems, data flows, and implementation realities, the more effectively they are able to own and validate the decisions being made. That is where the idea of a “human in the loop” starts becoming meaningful instead of symbolic. If legal teams remain purely advisory while increasingly relying on LLM-generated interpretations, they may end up trusting outputs they cannot fully verify themselves. Ironically, that dependency could itself become a compliance risk.

Result: A finalised, legally signed-off data contract attached to the relevant output port — ready for IT to implement directly, with no further translation required.
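A structured decision of the kind described in this step could be captured as a small immutable record. The sketch below is an assumption about shape, not a prescribed schema: `LegalDecision`, `Outcome`, and every field value are illustrative. The validation enforces one rule taken from the text, namely that a conditional approval must state its conditions.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Outcome(Enum):
    YES = "yes"
    NO = "no"
    CONDITIONAL = "permitted_under_conditions"


@dataclass(frozen=True)
class LegalDecision:
    question: str
    outcome: Outcome
    decided_by: str
    decided_on: date
    basis: str
    conditions: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # A conditional approval without conditions would push interpretation back to IT.
        if self.outcome is Outcome.CONDITIONAL and not self.conditions:
            raise ValueError("a conditional approval must list its conditions")
        if self.outcome is not Outcome.CONDITIONAL and self.conditions:
            raise ValueError("conditions are only valid on a conditional approval")


decision = LegalDecision(
    question="May this output port be used for model training?",
    outcome=Outcome.CONDITIONAL,
    decided_by="legal-counsel@example.com",      # illustrative value
    decided_on=date(2026, 1, 15),                # illustrative value
    basis="legitimate interests (illustrative)",
    conditions=("data is pseudonymized before training",),
)
print(decision.outcome.value, decision.conditions)
```

Freezing the record means a later change of mind has to be a new decision with its own date and author, which is what keeps the audit trail meaningful.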

Phase 3 — RUN

Once data contracts are in place on output ports, the RUN phase operates continuously and automatically. There are no meetings, no manual audits, and no trigger required from Business, Legal, or IT.

In the study “Automating Data Governance with Generative AI”³, the use case checked 110 data access requests against privacy policies in real time. It caught every issue a human expert flagged, plus 3.6 times more warnings, 80% of which experts later confirmed.

In practice, the RUN phase means the system is continuously performing tasks that no human team could handle manually at scale:

Automatically verifying every new data access request against existing contracts on output ports at the moment of submission — before any human even sees it.
Scanning the live data environment for new violations: whenever a restriction is added, the system identifies which previously approved contracts might be impacted and flags them for review.
Responding to governance questions instantly. Legal teams can ask in plain language: “Which data products currently lack a documented lawful basis?” or “Which output ports are being used for purposes not covered by their contracts?” The system provides answers immediately — no audit needed, no email threads.
Detecting policy drift — when regulations change or new internal rules are introduced, the system re-evaluates the entire contract portfolio to identify what needs updating.
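Some of the checks listed above reduce to simple queries once contracts exist as structured data. The following sketch deliberately swaps the LLM-based evaluation for plain deterministic checks, to show the minimum that structured contracts make possible. The `Contract` shape, the function names, and the sample data are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contract:
    output_port: str
    lawful_basis: Optional[str]
    approved_purposes: frozenset[str]


def is_request_allowed(contract: Contract, purpose: str) -> bool:
    """Check a new access request at the moment of submission."""
    return contract.lawful_basis is not None and purpose in contract.approved_purposes


def lacking_lawful_basis(contracts: list[Contract]) -> list[str]:
    """Answer: which output ports currently lack a documented lawful basis?"""
    return [c.output_port for c in contracts if c.lawful_basis is None]


def impacted_by_restriction(contracts: list[Contract],
                            restricted_purpose: str) -> list[str]:
    """Flag previously approved contracts that a new restriction might affect."""
    return [c.output_port for c in contracts
            if restricted_purpose in c.approved_purposes]


contracts = [
    Contract("behaviour_tables", "legitimate interests",
             frozenset({"product_improvement"})),
    Contract("crm_export", None, frozenset({"marketing"})),
]
print(is_request_allowed(contracts[0], "marketing"))
print(lacking_lawful_basis(contracts))
print(impacted_by_restriction(contracts, "marketing"))
```

Checks like these only flag candidates for review. Deciding what to do about a flagged contract still belongs to the MAP phase.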

As with earlier phases, begin small and expand gradually. Start by evaluating only new data requests. Over time, you might add periodic random checks of query logs to verify ongoing compliance. Alternatively, you could schedule quarterly audits. Adapt the approach to what your organisation actually needs.

Be realistic: the starting point won’t be perfect. The RUN phase will likely expose years of accumulated governance gaps. That’s expected. Problems you can see are problems you can fix.

The Outcome

The Business Value of Getting this Right

The benefits of truly automated, observable governance are easy to overlook if you haven’t yet noticed the wave approaching. Multiple vendors are already launching managed AI services.

What has struck me most over the past year is how rapidly non-technical staff are creating their own networks of AI agents and automations. Many are still experiments or side projects today, but it’s clear this will soon move into core business operations at scale. I’ve seen remarkably capable solutions built by people with minimal technical training — yet often with little understanding of the legal risks, governance obligations, or compliance requirements involved.

The study “AI agents under EU law: A compliance architecture for AI providers”⁴ suggests the current legal framework isn’t equipped for real-world, complex AI systems. Many experts expect legislation to lag behind, which means legal teams must be empowered to make risk-reducing decisions at scale.

In the approach I’ve outlined, because Legal’s decisions are captured as structured sign-offs on data contracts rather than written opinions, accountability is clear, defensible, and unambiguous. If anyone questions why a particular access request was approved, the record shows precisely what was evaluated, by whom, when, and under what conditions. This is the difference between claiming you acted responsibly and proving it.

My one concern with this approach

There’s one future scenario that worries me (aside from us all ending up like the humans in Wall-E). What if this plan works too well, becoming so polished that it breeds overconfidence?

Initially this feels like progress: messy, hard-to-follow human interactions become clean, structured handoffs that are easy to query. But because the transitions are so smooth, confidence grows faster than understanding.

Humans begin reviewing the appearance of the output rather than its substance. “It looks fine” replaces “Yes, it’s interpreting the request correctly.” Gradually, the “human in the loop” thins out. First we review, then we just approve, then we ask a second LLM to check whether the first one missed anything. Eventually, the organisation isn’t really transferring knowledge between people anymore; it’s transferring plausible summaries between systems. The danger isn’t one obvious AI mistake. The danger is that responsibility spreads across a chain of beautifully formatted outputs that nobody fully owns, understands, or has time to question. When something goes wrong, everyone can point to a handoff, a review, a summary, or an approval step. But the original context is gone. The loop still includes humans, technically. It just no longer includes human judgment where it counts.

What if we do nothing?

As Jean-Paul Sartre said: “Once we know and are aware, we are responsible for our action and our inaction.” Choosing not to change isn’t neutral — it’s a step backward. The AI Act’s phased enforcement means organisations that haven’t built operational literacy by the deadline are already non-compliant. The compliance window is narrowing while AI adoption keeps accelerating.

In reality, every month of delay adds to the burden. The longer legal guidance stays locked in unstructured documents, informal interpretations, and human-only review cycles, the closer it gets to being indistinguishable from non-compliance. And as regulators increasingly expect demonstrable literacy, traceability, and accountability, “we didn’t know” won’t be a credible excuse.

If I had a crystal ball

If I could peer into the future, I’d expect some version of the Gartner hype cycle’s “Plateau of Productivity” to play out.

I believe around 2027–2028, regulators will push back hard if we haven’t regained control. We’ll see several real-world horror stories about Agentic AI causing serious harm within the next two years. That pushback will be met by companies and consultancies promising better-governed implementations (like the one I’ve proposed), and then for years we’ll see this careful dance between regulators and companies testing the boundaries.

In the end, we’ll get through it — and it will probably be less dramatic than some of us (myself included) imagine.

[1] ‘Processing’ as defined in Article 4(2) of the General Data Protection Regulation (EU) 2016/679 (GDPR).
[2] Bitol. (2025). Open Data Contract Standard (ODCS) (v3.1.0). LF AI & Data Foundation. https://bitol-io.github.io/open-data-contract-standard
[3] Dietz, L. W., Wider, A., & Harrer, S. (2025). Automating Data Governance with Generative AI. AAAI/ACM Conference on AI, Ethics, and Society.
[4] Nannini, L., Smith, A. L., Maggini, M. J., Panai, E., Feliciano, S., Tiulkanov, A., Maran, E., Gealy, J., & Bisconti, P. (2026). AI agents under EU law: A compliance architecture for AI providers. arXiv.
