Dear Readers,
The legal press is buzzing about artificial intelligence upending research and brief‑writing. From glossy vendor brochures to breathless webinars, one might think the old‑fashioned method of pulling cases and crafting arguments by hand is already obsolete. Don't be fooled. In my previous post, I examined how AI can put lawyers at risk. Here, I explain what went wrong from a technical perspective.
I have met people who are skeptical of AI in general. That skepticism comes at their own peril; whether AI takes jobs or simply changes them, it is here to stay, and unfamiliarity with these tools will put a knowledge worker at a significant disadvantage. But legal research still requires human judgment to evaluate precedent, assess factual distinctions, and craft persuasive arguments—skills AI cannot replicate. At least not yet.
In my last Law & Learning Machines newsletter, I referenced a report indicating that, between 17 and 30 percent of the time, that would be unacceptable for a human.

Proliferation of LLM "Wrapper" Apps
Over the past year, dozens of startups have launched "wrapper" tools designed for legal office management and research. A wrapper is a thin software layer that adds a domain‑specific user interface, prompts, and integrations on top of a general large‑language model (LLM)—most commonly OpenAI’s GPT‑4, Anthropic’s Claude, or Google’s Gemini. Crucially, these wrappers do not train their own foundation models; they call the same underlying API that everyone else uses and simply re‑package the response. That means the core reasoning engine is identical across myriad products—the only differences are workflow design, data security measures, and pricing. The uniqueness of these tools stems from the dataset they use and how it interacts with retrieval augmented generation.
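For readers who want to see what this means in practice, here is a minimal Python sketch of a "wrapper." Everything in it is hypothetical (the prompt wording, the call_llm placeholder, the canned response); the point is that the product-specific layer is thin, while the answer itself comes from a shared general-purpose model.

```python
# Illustrative sketch only: what an LLM "wrapper" amounts to in code. The domain
# value lives in the prompt template and surrounding workflow; the reasoning comes
# from a general-purpose model reached through a provider API. The prompt text,
# function names, and canned response below are hypothetical.

LEGAL_SYSTEM_PROMPT = (
    "You are a legal research assistant. Answer only from the materials provided "
    "and cite the source of every proposition."
)

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Stand-in for the underlying foundation-model API call (OpenAI, Anthropic, Google, etc.)."""
    return "[response generated by the shared underlying model]"

def wrapper_answer(question: str) -> str:
    # The wrapper contributes the interface, prompts, and integrations;
    # the generated text itself comes from the same model everyone else calls.
    return call_llm(LEGAL_SYSTEM_PROMPT, question)

print(wrapper_answer("Does Rule 11 require a reasonable inquiry before filing?"))
```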
Retrieval‑Augmented Generation (RAG) – How It Works
Retrieval-Augmented Generation (RAG) is a hybrid approach that first retrieves relevant source documents and then generates an answer grounded in those sources, thereby reducing hallucinations and improving traceability. Unlike general‑purpose chatbots that rely solely on their internal training data to predict answers, a RAG system consults the designated knowledge base first and only then generates text.
Step 1 – Retrieve. When you pose a question, the system first searches an external knowledge base—e.g., a brief bank, commercial case‑law database, or your discovery documents—using keyword and/or vector similarity.
Step 2 – Augment. Those passages are injected into the prompt sent to the large‑language model. The LLM now “sees” both your question and the retrieved snippets.
Step 3 – Generate. The model drafts an answer that incorporates, quotes, or cites the supplied text. Because it is grounded in real sources, hallucination risk theoretically drops compared with a bare LLM call—but in actual use, erroneous or misleading citations can still surface.
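To make the three steps concrete, here is a toy, self-contained Python sketch. The "knowledge base" is three made-up passages, the retriever is a naive keyword-overlap score standing in for vector similarity, and the model call is stubbed out; none of this reflects any vendor's actual pipeline.

```python
# Toy RAG loop: retrieve, augment, generate. Sample passages, the keyword scorer,
# and call_llm are all illustrative placeholders.

KNOWLEDGE_BASE = [
    "Smith v. Jones (2019): sanctions require a showing of bad faith.",
    "Rule 11 requires a reasonable inquiry into the law and facts before filing.",
    "Doe v. Roe (2021): fabricated citations may support Rule 11 sanctions.",
]

def keyword_score(query: str, passage: str) -> int:
    """Toy relevance score: count shared words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    # Step 1 - Retrieve: rank the knowledge base and keep the top-k passages.
    return sorted(KNOWLEDGE_BASE, key=lambda p: keyword_score(query, p), reverse=True)[:k]

def build_prompt(question: str, passages: list[str]) -> str:
    # Step 2 - Augment: inject the retrieved passages into the prompt.
    sources = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using ONLY the sources below and cite each one.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for the foundation-model API call (Step 3 - Generate)."""
    return "[model-drafted answer grounded in the supplied sources]"

if __name__ == "__main__":
    question = "Can fabricated citations lead to sanctions?"
    print(call_llm(build_prompt(question, retrieve(question))))
```

Note that the guardrail lives entirely in the prompt: if retrieval returns weak or irrelevant passages, the model can still produce a confident but wrong answer, which is exactly the failure mode discussed next.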
These so‑called “hallucinations” arise when the model confidently invents facts, quotes, or citations that do not exist in the provided sources or anywhere in reality. RAG systems aim to mitigate this risk by forcing the model to ground its answer in retrieved documents before generating text—but the safeguard is imperfect, and low‑quality or irrelevant source material can still lead to bad outputs.
The recent Stanford study evaluating leading legal-research platforms that integrate RAG—and draw on proprietary case-law datasets—still found that answers were incorrect or misleading 17% to 33% of the time.
Notably, the platforms studied were not new startups, but rather the long-established heavyweights, Westlaw and LexisNexis, whose services have shaped modern legal practice by making research a primarily digital endeavor. In several headline‑grabbing hallucination cases, the lawyers involved seem to have relied too heavily on these brands’ decades‑long reputations instead of applying healthy skepticism—allowing nonexistent citations to slip into their filings.
Their premium monthly subscription fees likely reinforced the belief that these services were the uncontested gold standard ("you get what you pay for"), further muting the healthy skepticism lawyers should apply before assuring a court, in effect, "We ran it through AI."
The hidden risk of delegating research to AI systems is that counsel may not realize a cited authority is inaccurate—or entirely fictional—until it is too late, a danger the profession has never previously confronted. Delegation can even prove counterproductive, requiring more time than traditional research to confirm that each case exists, that every quotation is genuine, and that the factual background is kosher.
Time will tell if this is just a bump in the road on the way to AI‑assisted research or if this hallucination problem is here to stay.
Until Next Week.
Lance