Introducing the Kelvin Legal DataPack, the Largest Legal Training Dataset

Over 150 billion legal and financial tokens with clean provenance and commercial licensing


Chicago, IL - August 21, 2023

2023 is the year that generative AI went mainstream, with hundreds of millions of users exploring models like ChatGPT, Bard, and Llama. After observing these capabilities, and achievements like GPT-4 passing the bar exam, many leading legal organizations have begun to explore how to use generative AI models, or even build their own.

In support of these efforts, 273 Ventures, the company behind the Kelvin Legal Data OS, is announcing the Kelvin Legal DataPack - a dataset containing over 150B tokens of foundational legal, regulatory, and financial text that can be leveraged to support organizations across their AI journey.

The Kelvin Legal DataPack is the first large-scale legal dataset with clear provenance and commercial use rights; it also includes enrichment and annotation to support a wide variety of use cases.

“High-quality data is the foundation of all large language models. Since last year, the team has been hard at work collecting, curating, and enriching the largest corpus of legal documents directly available for bulk download by the industry. It’s a proud moment for us, as we built the DataPack with our own Kelvin Legal Data OS, proving its extensive and scalable capabilities processing and enriching legal data,” says Michael Bommarito, CEO of 273 Ventures.

Model performance is critically impacted by the quality and quantity of training data. Whether used to train embedding models or as part of training or tuning extractive or generative models, the Kelvin Legal DataPack’s mixture of curated legal data sources enables organizations to rapidly build or customize high-quality, compliant AI models.

“Coming out of stealth mode, the Kelvin team is already hard at work with various legal organizations including law firms, legal departments, and other legaltech providers to help bring enterprise legal AI offerings to market through the Kelvin Legal Data OS. The Kelvin Legal DataPack will only further accelerate and improve these efforts, and we’re very excited to see what our customers will be able to accomplish with this offering,” says Daniel Martin Katz, CSO of 273 Ventures.

The Kelvin Legal DataPack also contains several separately available collections, including the Kelvin Contract DataPack with nearly 20B tokens. The Contract DataPack is designed to support common customer use cases, such as the development of playbook automation and market comparison analytics. As competitive forces encourage many organizations to “personalize” their own GPT-like models, it is critical to combine foundational data with task- and organization-specific knowledge. The Kelvin Legal Data OS is designed to help organizations achieve this goal by connecting their internal data and systems, and the Kelvin Legal DataPacks further extend the capability of firms to develop their own custom LLMs.

“The appeal of LLMs is clear in theory to every knowledge worker, but unfortunately real implementations are often hampered by both internal and external restrictions due to how models were trained and data flows. Forward-looking organizations are now opting to build their own models with clear provenance to manage business continuity risk, address information security concerns, and meet global data protection requirements. Our Legal DataPacks will allow firms to do this more quickly and compliantly. We look forward to supporting organizations as they navigate this competitive landscape,” says Jillian Bommarito, CIPP/US/E and Chief Risk Officer of 273 Ventures.

About 273 Ventures

The 273 Ventures team has developed the Kelvin Legal Data OS to organize and connect data from various structured and unstructured sources, including documents like contracts and briefs, timekeeping entries, and laws and rules. Kelvin ships with automation for legal-specific use cases, like due diligence and regulatory monitoring, as well as connectors for common systems like Aderant and TeamConnect. Kelvin is LLM-agnostic, with support for practically all commercially available large language models, including GPT-4, Claude, and Llama 2. Kelvin is a modern purpose-built platform specifically designed for the legal industry, with an emphasis on compliance with information security standards and data protection laws. Kelvin can run on your own physical server or private cloud, on a developer's laptop, or in any public or hybrid cloud environment.

To learn more, contact us at hello@273ventures.com or visit kelvin.legal.
