Recursive Fake Law
Can fake case law hallucinations become “true” over time?
No.
About a decade ago I was talking to my cofounder Drew. He was in undergrad and working on a machine learning algorithm to optimize traffic flow. I was a recent graduate, but had been immediately hired back to work for the University. I had this paradigmatic question for Drew about machine learning: if machine learning is mathematically just an intensification of existing behavior, how do you iteratively update to account for changes in the underlying truth the data is built on? He told me that in technical terms, that's called distribution drift. I get that the question is both heady and abstract, but it mattered to the traffic flow he was working on. It matters for legal data too. Hear me out below:
If, for instance, a learning algorithm is working to optimize traffic flow through a city, it (back then) would be trained on the existing ways that drivers navigate the city. Then it would be served up to users. Would this cause a calcification of existing behavior? Out of cognitive ease, or trust that the AI had superior knowledge, would drivers just drive the route it gave them? If more drivers now followed the AI-recommended driving patterns, would that increase the number of drivers along those routes? In turn, would that intensify the training data cyclically? It turns out route choice and traffic optimization research has been thinking about this problem for years. In legal AI, however, it's newer.
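To make that feedback loop concrete, here's a toy simulation in Python. It's our illustration only, not anything Drew actually built: the route names, counts, and follow rate are all made up, but it shows how retraining on AI-influenced behavior can snowball the most popular route.

```python
import random

# Toy simulation of the feedback loop: the recommender is retrained on routes
# drivers actually took, and some share of drivers simply follow the
# recommendation, so the most popular route snowballs.
random.seed(0)
route_counts = {"A": 500, "B": 300, "C": 200}  # hypothetical historical data
FOLLOW_RATE = 0.6  # fraction of drivers who take whatever the AI suggests

for generation in range(5):
    recommended = max(route_counts, key=route_counts.get)
    new_counts = {route: 0 for route in route_counts}
    for _ in range(1000):  # next cohort of drivers
        if random.random() < FOLLOW_RATE:
            choice = recommended  # cognitive ease: just follow the AI
        else:
            # otherwise choose in proportion to historically observed behavior
            choice = random.choices(
                list(route_counts), weights=list(route_counts.values())
            )[0]
        new_counts[choice] += 1
    route_counts = new_counts  # "retrain" on the new observations
    print(generation, route_counts)
```

Run it and route A's share keeps climbing generation over generation: the model keeps learning from behavior it increasingly caused.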
Some observers are now wondering whether this pattern might be happening with fictitious case law (i.e., hallucinated case law) in off-the-shelf models. For instance, in Williams v. Capital One Bank, N.A., 2025 U.S. Dist. LEXIS 49256, pro se litigant Riyan Williams sued Capital One Bank over a closed account. In a section of the opinion, the court describes that Williams may have relied on AI in the creation of his brief because a number of his citations did not exist. For instance, the court could not find "Pettway v. American Savings & Loan Association, 197 F. Supp. 489 (N.D. Ala. 1961)". Other cites may exist but contain citation formatting errors, or may be entirely nonexistent.
The problem is that the opinion fully reproduces the fictitious citations, just as we did here. Now, some commentators wonder what might happen with this material. Because the record of this fake citation now exists in a factual document, some observers are concerned it may appear to future AI as legitimate, or at least as existing somewhere. For instance, is our inclusion of the citation in this blog increasing its prevalence and causing AI to draw on it in the future? Who does this affect? Would it affect future pro se litigants using Claude? Or could it affect lawyers who use legal tech?
Truthfully, it's a further problem for pro se litigants and foundational models, but it shouldn't be a problem for any properly designed legal tools. Tools like David AI don't just query an LLM, have it generate case law, and hope that the LLM gets it right. In fact, our legal research for case law, statutory law, and PACER documents isn't probabilistic and doesn't involve a model at all. For us, other search methods surface the appropriate legal authorities, and a series of validation checks on David's outputs helps us control the output information. So fake cases existing isn't a problem for us: there is no opinion to pull with that name. Plus, we have our citation checker to verify any citations.
For us, and for other tools that have been built properly, there are checks that make sure a citation you make hasn't been fabricated by the AI. Now, other legal tech vendors? And off-the-shelf models? We can't speak to those.
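If you want a concrete picture of what "no opinion to pull with that name" means, here's a minimal sketch. To be clear, this is not David AI's actual retrieval system; the index and function are invented for illustration. The point is simply that a deterministic lookup against real opinions can't invent a case.

```python
# Hypothetical sketch: authorities are surfaced by looking them up in an index
# of real opinions rather than by asking a model to generate them, so a
# fabricated case simply returns nothing.
OPINION_INDEX = {  # stand-in for a real case-law index keyed by case name
    "brown v. board of education": "347 U.S. 483 (1954)",
}

def pull_opinion(case_name: str):
    """Deterministic lookup: either the opinion is in the index or it isn't."""
    return OPINION_INDEX.get(case_name.strip().lower())

print(pull_opinion("Brown v. Board of Education"))                      # '347 U.S. 483 (1954)'
print(pull_opinion("Pettway v. American Savings & Loan Association"))   # None -- nothing to cite
```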
This part is the technical exploration. If you're a nerd like us or an engineer, read on. Everyone else, just know you're fine.
Foundational models, or LLM-driven ways to generate authorities, could suffer from this, depending on how they are built.
Generally, LLMs shouldn't be trusted to generate citations, since historically the largest foundational models have been decoder-only models. These are the ones that functionally work by picking the next word from the previous n words. This is what people are talking about when they call an LLM a "more advanced version of your phone's next-word text suggestions" or a "next-word predictor," and it's a worrying thing to rely on for actual citations, since all citations follow a formula and have fairly similar constructions. For instance, to a pro se litigant, or a lawyer moving too quickly, the erroneous Pettway v. American Savings & Loan Association, 197 F. Supp. 489 (N.D. Ala. 1961) appears valid with respect to Bluebook citation format. But under some scrutiny, we lawyers should recognize that there isn't a Northern District of Alaska. In general, an LLM produces output along the average of its training data, so the structural pattern will nearly always look correct without being factual. In other words, it's the appearance of correctness in the form of the citation, without a citation to a text that actually exists.
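To see why form alone proves nothing, here's a small sketch. The regex and the "known citations" set are stand-ins we made up, not a real citator, but they show how the Pettway cite can pass a Bluebook-shaped format check while failing an existence check.

```python
import re

# Hypothetical sketch: a citation can pass a format check while still
# referring to a case that does not exist.
BLUEBOOK_PATTERN = re.compile(
    r"^(?P<parties>.+ v\. .+), "
    r"(?P<volume>\d+) (?P<reporter>[A-Za-z0-9. ]+?) (?P<page>\d+) "
    r"\((?P<court_year>[^)]+)\)$"
)

KNOWN_CITATIONS = {  # stand-in for a real reporter database
    "347 U.S. 483",  # Brown v. Board of Education
}

def looks_like_a_citation(cite: str) -> bool:
    """Checks form only: parties, volume, reporter, page, court/year parenthetical."""
    return BLUEBOOK_PATTERN.match(cite) is not None

def actually_exists(cite: str) -> bool:
    """Checks substance: is the volume/reporter/page combination in a trusted index?"""
    m = BLUEBOOK_PATTERN.match(cite)
    if not m:
        return False
    key = f"{m.group('volume')} {m.group('reporter')} {m.group('page')}"
    return key in KNOWN_CITATIONS

fake = "Pettway v. American Savings & Loan Association, 197 F. Supp. 489 (N.D. Ala. 1961)"
print(looks_like_a_citation(fake))  # True  -- the form is fine
print(actually_exists(fake))        # False -- no such opinion in the index
```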
An obvious first fix to hallucinations has always been utilizing some form of external knowledge to augment an LLM's prompt: RAG. And while there are dozens of ways to implement RAG into an LLM's prompt (be it with vector embeddings, knowledge graphs, SQL queries, and much more), this only guarantees that the model's input contains the additional context provided. It's much more likely an LLM will output a response containing a specific case citation if, within its input, you're providing that case, a description of its findings, and its citation than it would without all of that extra information.
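Here's roughly what that idea looks like at the prompt level, as a minimal sketch. The toy corpus, the keyword retrieval, and the call_llm placeholder are all our assumptions for illustration; a real system would retrieve from an authoritative case database using embeddings, a knowledge graph, or SQL.

```python
# Minimal RAG sketch (assumptions: a toy in-memory corpus and a stand-in
# call_llm placeholder instead of a real model call).
CASE_CORPUS = {
    "Brown v. Board of Education, 347 U.S. 483 (1954)":
        "Held that racial segregation in public schools violates the Equal Protection Clause.",
}

def retrieve(query, corpus):
    """Naive keyword retrieval: return cases whose text shares words with the query."""
    terms = set(query.lower().split())
    return [
        (cite, summary) for cite, summary in corpus.items()
        if terms & set((cite + " " + summary).lower().split())
    ]

def build_prompt(question):
    """Augment the prompt with retrieved authorities and their exact citations."""
    context = "\n".join(
        f"- {cite}: {summary}" for cite, summary in retrieve(question, CASE_CORPUS)
    )
    return (
        "Answer using ONLY the authorities below. Cite them exactly as written.\n"
        f"Authorities:\n{context}\n\nQuestion: {question}"
    )

def call_llm(prompt):
    """Placeholder for an actual model call."""
    raise NotImplementedError

print(build_prompt("Is segregation in public schools constitutional?"))
```

Even with the right authority sitting in the prompt, the decoder can still paraphrase or mangle the cite, which is why the post-processing below matters.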
However, you're not guaranteed that this will appear in the output, even if the input contains sufficient information for the model. This means robust post-processing steps are always necessary when interacting with LLMs, especially in fields where false information portrayed as fact is critically harmful, or where veracity is paramount.
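A rough sketch of what such a post-processing step could look like (again, this is our generic assumption, not any particular vendor's pipeline): pull everything citation-shaped out of the model's output and flag whatever can't be matched to a trusted index.

```python
import re

# Hypothetical post-processing check: extract every citation-shaped string
# from the model's draft and flag any that cannot be verified against a
# trusted index of real opinions.
CITATION_RE = re.compile(r"\b\d+ [A-Z][A-Za-z0-9. ]*? \d+\b")

TRUSTED_INDEX = {  # stand-in for a real citator / reporter database
    "347 U.S. 483",  # Brown v. Board of Education
    "410 U.S. 113",  # Roe v. Wade
}

def audit_output(model_output):
    """Return citations in the output that could not be verified."""
    found = CITATION_RE.findall(model_output)
    return [cite for cite in found if cite not in TRUSTED_INDEX]

draft = (
    "See Brown v. Board of Education, 347 U.S. 483 (1954); "
    "but cf. Pettway v. American Savings & Loan Association, 197 F. Supp. 489 (N.D. Ala. 1961)."
)
print(audit_output(draft))  # ['197 F. Supp. 489'] -- flag for human review before filing
```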