AI Hallucinations Hit Deloitte, EY, and a Top Law Firm. Here's the Real Lesson.
Why fake citations became a boardroom risk — and what they reveal about trusting AI with business documents
Quick answer: In 2025 and 2026, AI-related citation failures surfaced in professional reports and legal filings involving Deloitte, EY, and law firm Sullivan & Cromwell. The lesson for enterprises is not "don't use AI." It's that AI-generated claims, numbers, and citations must be grounded in verifiable source documents before they leave the organization.
What actually happened
The headlines piled up fast. Deloitte Australia agreed to partially refund a government report after apparent AI-generated errors were found, including references to nonexistent academic papers and a fabricated quote from a federal court judgment. A revised version disclosed that Azure OpenAI had been used in preparing the document. Weeks later, a Deloitte report for the government of the Canadian province of Newfoundland and Labrador drew scrutiny after false citations were found in a multi-million-dollar health plan; the province said Deloitte acknowledged that several references were incorrect.
It wasn't just Deloitte. In May 2026, AI-detection firm GPTZero published an investigation into an EY Canada report on loyalty-program safeguards, finding that most of its citations were hallucinated. The Financial Times later reported that EY withdrew the study, which included fake footnotes, made-up data, and a reference to a McKinsey report that did not exist. Around the same time, law firm Sullivan & Cromwell apologized to a New York court after an AI-assisted filing contained inaccurate citations and misquoted parts of the U.S. Bankruptcy Code.
These weren't obscure private drafts. They were government reports, public studies, and court filings — the documents organizations rely on to set policy, market expertise, and argue legal positions. The problem wasn't that the writing sounded weak. It sounded professional. That was the risk.
Why this matters beyond consulting
It's tempting to read these as isolated mistakes by professional-services firms. They aren't. They're a preview of what happens whenever AI produces high-stakes business content without verifiable source grounding — in financial filings, legal contracts, insurance claims, regulatory submissions, healthcare records, and internal executive reports alike.
The pattern is simple: AI can produce a confident sentence, a polished citation, or a plausible number even when the underlying source is missing, misread, or never verified. So the real enterprise risk isn't only hallucination. It's unverified authority. When a trusted organization publishes a false citation, the error borrows credibility from the logo attached to it. GPTZero researchers warned that fabricated information in reports from well-known firms can "poison the well," misleading future researchers and AI systems that later encounter the same material online. For enterprises, that's the lesson: a hallucinated citation isn't a typo. It's a governance failure.
The part most teams get backwards
After an AI incident, the usual response is predictable: add more human review, use a better model, write stricter prompts. All three can help. None solves the deeper problem alone. A stronger model can still reason from a broken input. A reviewer can still miss a fake citation if the system can't show where it came from.
The failure usually happened one step earlier — when a document, table, or citation was converted into data the AI could use. That's the layer most enterprises underinvest in. Before an AI system writes a report or triggers a workflow, it needs reliable evidence. In document-heavy work, that evidence must be extracted, structured, and tied back to the original source. Otherwise the system isn't reasoning on evidence. It's reasoning on a loose approximation of it.
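One way to make "tied back to the original source" concrete is to check that a quoted passage actually appears in the extracted text, and on which page. The Python sketch below is a minimal illustration, not DEEP Agent's API: the ExtractedPage type and the find_quote function are hypothetical names, the sample text is invented, and exact substring matching after whitespace normalization is an assumption chosen for simplicity.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ExtractedPage:
    """One page of extracted text, tied to its source (hypothetical schema)."""
    document_id: str
    page_number: int
    text: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_quote(quote: str, pages: Sequence[ExtractedPage]) -> Optional[ExtractedPage]:
    """Return the first page that contains the quote verbatim, or None."""
    needle = normalize(quote)
    if not needle:
        return None  # an empty quote cannot be grounded in anything
    for page in pages:
        if needle in normalize(page.text):
            return page
    return None


# Illustrative, made-up page text (not taken from any real judgment).
pages = [
    ExtractedPage("judgment.pdf", 1, "Background and procedural history."),
    ExtractedPage("judgment.pdf", 2, "The tribunal must\nconsider all evidence."),
]

match = find_quote("the  tribunal must consider", pages)
missing = find_quote("the tribunal may ignore evidence", pages)
```

Here match points to page 2 of the sample document, while missing is None: the second quote has no verbatim match and should go to a reviewer rather than into a report. Exact matching misses paraphrases and OCR errors, so it works as a first filter, not as a full verification step.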
What enterprises should do instead
The answer isn't to ban AI from professional work. It's to change what "AI output" means before it's trusted. Every AI-assisted report, filing, or summary should be able to answer five questions:
Where did this number come from?
Where did this citation come from?
Can the system point to the exact page, clause, or table?
Was the source actually retrieved, or merely generated?
Can a reviewer verify the evidence before the output leaves the organization?
If the answer is no, the organization doesn't have an AI writing problem. It has a source-grounding problem. That's why document AI matters upstream: the goal isn't just to extract text, but to turn complex documents into structured, reviewable, source-grounded evidence before a model reasons on them.
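The five questions above can be sketched as a simple release gate. This is a minimal sketch under stated assumptions, not a description of any real product: the SourceRef and Claim types, their field names, and the sample values are all hypothetical, and a real system would populate them from its own extraction and review steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceRef:
    """Where a claim came from (hypothetical schema)."""
    document_id: str
    page: int        # 1-based page number in the source document
    locator: str     # e.g. "Table 3, row 2" or "clause 4.1"
    retrieved: bool  # True only if the source was actually fetched and parsed


@dataclass(frozen=True)
class Claim:
    """A number, citation, or statement in an AI-assisted draft."""
    text: str
    source: Optional[SourceRef] = None
    reviewer_verified: bool = False


def grounding_problems(claim: Claim) -> List[str]:
    """List the reasons a claim is not yet safe to publish (empty if none)."""
    if claim.source is None:
        return ["no source attached"]
    problems = []
    if not claim.source.retrieved:
        problems.append("source was generated, not retrieved")
    if claim.source.page < 1 or not claim.source.locator.strip():
        problems.append("no exact page, clause, or table")
    if not claim.reviewer_verified:
        problems.append("not verified by a reviewer")
    return problems


def ready_to_release(claims: List[Claim]) -> bool:
    """A draft is releasable only if every claim has no grounding problems."""
    return all(not grounding_problems(c) for c in claims)


# Illustrative, made-up claims.
grounded = Claim(
    "Revenue rose 12%",
    SourceRef("annual-report.pdf", 14, "Table 3, row 2", retrieved=True),
    reviewer_verified=True,
)
ungrounded = Claim("A recent study supports this finding")

report_ok = ready_to_release([grounded, ungrounded])
```

In this example report_ok is False, because the second claim has no source attached. The design choice is that the gate blocks on missing provenance rather than on how plausible the text sounds, which is the distinction the questions above draw.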
This is the layer DEEP Agent, Korea Deep Learning's document AI platform, is built for. It reads complex documents with a vision-language model that understands layout, tables, and visual structure, then outputs structured data where extracted values can be traced back to the original page. It also supports fully on-premise deployment with no external network calls during inference — so sensitive reports, filings, and records can be processed inside your own environment. On the public OCRBench v2 English leaderboard (2026.03), KDL Frontier ranks #1 with an average score of 68.1 — ahead of Google's Gemini and OpenAI's GPT models on the same evaluation, which tests recognition, extraction, parsing, and reasoning on complex real-world documents.
Benchmark scores are useful. But the more important enterprise test is practical: can the system show where every extracted value came from? If not, the output isn't ready for high-stakes work.
The enterprise takeaway
These AI citation failures aren't really about one firm or one report. They're about a new habit: trusting AI-generated output before checking the source evidence behind it. A professional-looking answer is not a verified answer. A citation-shaped sentence is not a real citation.
So before the next AI-assisted report leaves your organization, ask the harder question: can we show where every number, citation, and claim came from? If the answer is no, the problem isn't just hallucination — it's weak evidence infrastructure. And that's the layer to fix first.
Bring a complex document — a financial report, legal filing, contract, or table-heavy PDF — to a 15-minute live session and see it converted into structured, source-grounded output. Request a demo at koreadeep
Frequently Asked Questions
Why do AI tools fabricate citations in reports? A language model can generate text that looks like a reference without retrieving or verifying a real source. This often happens when it fills gaps or summarizes incomplete context, and the resulting citations can be fake but convincing.
How can enterprises prevent AI hallucinations in documents? Start upstream. Parse documents into structured, source-grounded data before AI reasons on them, so every claim, number, and citation traces back to a real source a reviewer can verify.
Which firms were involved in recent AI citation failures? Publicly reported 2025–2026 cases include Deloitte Australia, Deloitte's Newfoundland and Labrador report, EY Canada's withdrawn loyalty study, and Sullivan & Cromwell's AI-assisted court filing.
Is this only a consulting-industry problem? No. The same risk appears anywhere AI touches high-stakes documents: finance, legal, insurance, healthcare, government, and regulatory reporting.
Where does DEEP Agent fit? Before AI generation, retrieval, or workflow automation. It turns complex documents into structured, source-grounded data so downstream AI systems work from verifiable evidence rather than loose text.