Big Context Windows Bad. Data Curation Good.
(It’s bad design to just dump a ton of data into an AI simply because context windows are getting bigger and it’s easier to.)

When working with modern AI systems, the context window, the mechanism that determines how much input a model can process and recall at once, is a critical factor shaping output quality and operational efficiency. In legal work, this looks like feeding the model lots of documents alongside your query. In the rush to maximize utility, however, many users of off-the-shelf tools, and some developers of legal tech, simply dump large amounts of data into these extended context windows without considering the drawbacks. While this approach might seem like a quick solution, at 2nd Chair we think the realities of financial cost, latency, environmental impact, and performance compel us to build differently.
Financial Cost and Token Utilization
The financial implications of overloading a context window are significant. Each word or token added to the context window comes at a literal financial cost when using API-based language models. The longer the input, the higher the expense incurred for processing—even if much of the data isn't directly relevant to the task. A sprawling, uncurated context can lead to bloated token usage that drains resources unnecessarily, leaving users to foot the bill for inefficiency.
Large law firms might be able and willing to foot the bill. Some legal tech vendors might have free tokens through incentive programs or other cost-saving arrangements. But small and solo law firms, and larger firms with proportionally smaller tech budgets, can’t just dump files in and absorb those costs.
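To make the cost math concrete, here is a minimal back-of-envelope sketch in Python. The per-token price, tokens-per-page figure, and usage numbers are illustrative assumptions only, not 2nd Chair’s rates or any provider’s published pricing:

```python
# Back-of-envelope: what does context dumping cost vs. a curated context?
# All constants here are hypothetical placeholders; check your provider's rate card.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, illustrative only
TOKENS_PER_PAGE = 500                  # rough average for legal documents

def monthly_cost(pages_per_query: int, queries_per_month: int) -> float:
    """Monthly input-token cost of sending `pages_per_query` of context per query."""
    tokens = pages_per_query * TOKENS_PER_PAGE
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS * queries_per_month

# Dumping a 2,000-page matter file into every query, 300 queries a month:
print(f"uncurated: ${monthly_cost(2_000, 300):,.2f}/mo")  # uncurated: $900.00/mo
# Curating down to the 40 most relevant pages per query:
print(f"curated:   ${monthly_cost(40, 300):,.2f}/mo")     # curated:   $18.00/mo
```

The exact numbers will vary by model and vendor, but the shape of the result won’t: input cost scales linearly with context size, so a 50x reduction in context is a 50x reduction in that line item.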
Curation, by contrast, ensures that only the most relevant data is included in the context window, optimizing token usage and minimizing costs. Instead of throwing in unrelated or redundant information, careful selection allows users to achieve precise and effective results without breaking the bank. At 2nd Chair, we do a ton of document preprocessing to select what matters. That keeps model costs low, and we pass those savings on to our customers.
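What does curation look like in code? The sketch below is a deliberately simplified illustration, not our production pipeline: it assumes a basic TF-IDF relevance filter, ranking document chunks against the query and keeping only the top few, so the model sees a handful of relevant passages instead of the whole file.

```python
# A minimal illustration of context curation: rank document chunks by
# TF-IDF similarity to the query and keep only the top-k for the prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def curate_context(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    # Vectorize the query together with the chunks so they share one vocabulary.
    matrix = TfidfVectorizer().fit_transform([query] + chunks)
    # Score every chunk against the query (row 0).
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]

chunks = [
    "The lease term commences on January 1 and runs for five years.",
    "Lunch options near the courthouse include two delis and a cafe.",
    "Either party may terminate the lease with ninety days' written notice.",
]
for chunk in curate_context("When can the lease be terminated?", chunks, top_k=2):
    print(chunk)  # only the lease-relevant chunks reach the model
```

Real systems typically swap TF-IDF for embedding-based retrieval, but the principle is the same: select before you send.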
Latency: The Hidden Cost of Length
Long context windows inevitably affect latency, the time required for a model to generate responses. Each added token increases computational overhead, stretching out processing times. This sluggish performance can be frustrating in real-time applications, where responsiveness is key, whether it’s lawyers chatting with an AI, paralegals trying to turn documents around on a tight timeline, or an attorney working late who just wants to get home and sleep.
Curating data before submission reduces latency by ensuring only essential information reaches the model. Streamlined input means quicker computation and faster output, which improves both the user experience and operational efficiency.
Small firm or big firm, off-the-shelf tool or legal AI vendor: everyone feels latency degrade when large amounts of data are dumped into a model. There is a fulcrum balancing the convenience of dumping in lots of documents against the response time that grows with input size. Where that balance sits probably varies from user to user, but at 2nd Chair we prioritize speed (i.e., low latency). Importantly, that latency doesn’t have to be big if we simply build in ways that reduce the data sent to the model using preprocessing techniques.
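As a rough mental model, time-to-first-token grows with the size of the prompt the model has to ingest. The constants below are illustrative assumptions, not measurements of any particular provider:

```python
# Illustrative latency model: prompt-processing time grows with input size.
# Both constants are assumptions for illustration, not measured benchmarks.
OVERHEAD_S = 0.5               # fixed network/queueing overhead, in seconds
PREFILL_TOKENS_PER_S = 5_000   # assumed prompt-ingestion throughput

def time_to_first_token(input_tokens: int) -> float:
    return OVERHEAD_S + input_tokens / PREFILL_TOKENS_PER_S

print(f"full dump (1M tokens): {time_to_first_token(1_000_000):.1f}s")  # ~200.5s
print(f"curated (20K tokens):  {time_to_first_token(20_000):.1f}s")     # ~4.5s
```

Under these assumptions, curation turns a multi-minute wait into a few seconds, and the linear relationship holds even if the real constants differ.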
Environmental Cost and Sustainability
The environmental impact of extended context windows is an often-overlooked consequence. Loading more data into a context window demands additional computational power, which translates to higher energy consumption. Each query contributes to the carbon footprint associated with server farms and data centers globally. In an age of increasing environmental awareness, pouring excessive, extraneous, or redundant data into a model runs counter to sustainability goals.
By curating context data, users can reduce the workload for AI systems, thereby limiting energy consumption and minimizing their environmental impact. A lean input strategy aligns better with green technology practices, offering a more responsible approach to AI utilization.
Even large or AI-enthusiastic law firms should be mindful that unneeded computation is wasteful computation.
Performance Issues and Diminished Accuracy
Performance is often where overloading context windows delivers its most damaging blow. Feeding models sprawling datasets can dilute relevance and overwhelm the system, leading to errors, confusion, or suboptimal results. Extended context windows don’t guarantee smarter or more accurate outputs; in fact, they often degrade performance because of the excess noise in processing. Think of it this way: if you wanted a human associate attorney or paralegal to assist you with document review or discovery, why would you hand them extra documents? The more documents you give them, the more likely they are to make a mistake. If you could separate out the generally relevant files beforehand, wouldn’t that help them work? AI is no different. The more irrelevant information an entity (model or person) is given, the more likely it is to form wrongful associations or respond incorrectly.
Curating data ensures that the context remains tightly focused and relevant, enabling the model to perform at its best.
Conclusion
It’s easy for users, and for less sophisticated engineers, to get excited about big context windows. But at 2nd Chair we just build what our users need. That’s how we keep costs low so we can democratize AI tools for the legal profession. Legal research shouldn’t cost an arm and a leg, and you shouldn’t hate being upsold on legal tech add-ons. We build smart so you can have strong technology at a fair price.