KL3M and the Kelvin Legal DataPack Find a New Home: Introducing the ALEA Institute
273 Ventures donates KL3M, the Kelvin Legal DataPack, and supporting tools to the newly founded ALEA Institute, a 501(c)(3) nonprofit advancing open-source legal AI.
Michael Bommarito
CEO, 273 Ventures
A Journey That Started in December 2022
In December 2022, when we founded 273 Ventures, the legal industry was just beginning to grapple with what large language models might mean for the practice of law. ChatGPT had launched weeks earlier. GPT-4 was still months away. Most legal organizations were watching from the sidelines, unsure whether this was a passing trend or a fundamental shift.
We were not watching from the sidelines. We had spent the prior decade researching computational law, natural language processing, and machine learning applied to legal text. We knew that the quality of AI systems depends on the quality of their training data, and that the legal industry had a serious data problem. Most general-purpose LLMs were trained on web scrapes that underrepresented legal and regulatory text, introduced copyright concerns, and lacked the careful provenance tracking that regulated industries require.
So we built the Kelvin Legal DataPack.
Building the DataPack and Training KL3M
Over the course of 2023, our team assembled what became the largest curated legal training corpus in existence: more than 2 trillion tokens of legal, regulatory, and financial text sourced directly from authoritative public repositories. Every document in the DataPack has clear provenance. Every source is documented. The corpus spans federal and state statutes, regulations, case law, SEC filings, patent documents, government contracts, and more.
With this foundation, we trained KL3M, the first large language model built exclusively on legally sourced training data with transparent provenance. KL3M was not just an academic exercise. It became the first large language model certified by Fairly Trained, an independent organization that evaluates whether AI models respect the rights of content creators. That certification validated what we had set out to prove: you can build powerful language models without cutting corners on data rights.
We also developed All the Patents, a comprehensive dataset of U.S. patent documents, along with a suite of supporting tools for legal document processing, tokenization, and data pipeline management.
The Decision to Donate
As these projects matured, we faced a strategic question. The DataPack, KL3M, and the surrounding tools had become something larger than a product line. They had become infrastructure—resources that the entire legal AI community could benefit from. Researchers at universities were using our data. Startups were building on KL3M. Legal organizations were referencing our provenance methodology.
We realized that the greatest impact would come from making these resources fully open and placing them under the stewardship of an organization whose sole mission is to advance open, ethical AI research. Keeping them inside a commercial entity, even one as committed to openness as 273 Ventures, would inevitably create tensions between commercial priorities and community needs.
Introducing the ALEA Institute
In August 2024, members of the 273 Ventures team founded the ALEA Institute as an independent 501(c)(3) nonprofit organization. ALEA’s mission is to advance legal, ethical, and accessible AI through open-source research, data, and tools.
The name ALEA comes from the Latin alea iacta est, “the die is cast.” It reflects our conviction that the decisions we make now about AI training data, model transparency, and research accessibility will shape the trajectory of legal AI for decades.
What We Donated
The following assets have been transferred from 273 Ventures to the ALEA Institute:
- Kelvin Legal DataPack: The full 2T+ token legal training corpus, including all source documentation and provenance metadata
- KL3M Model Family: All model weights, training configurations, and evaluation benchmarks for the KL3M series
- All the Patents: The complete U.S. patent document dataset
- Supporting Tools: Data processing pipelines, tokenizers, and utilities developed alongside these projects
All of these resources are now available under open-source licenses through the ALEA Institute.
What This Means for 273 Ventures
273 Ventures remains fully committed to our commercial mission: helping legal organizations navigate AI transformation through training, strategic advisory, and products. Our Kelvin product line continues under active development. Our consulting and training practices continue to grow.
This donation sharpens our focus. We are a consulting and product company. ALEA is a research organization. Both are stronger when their missions are clearly defined.
What This Means for the Community
If you have been using KL3M models, the Kelvin Legal DataPack, or any of our open-source tools, nothing changes in terms of access. Everything remains available, now with the permanence and independence that nonprofit stewardship provides. Contributions, collaborations, and research partnerships are welcome through the ALEA Institute.
We encourage researchers, developers, and legal organizations to explore what ALEA has to offer:
- ALEA Institute: aleainstitute.ai
- KL3M on Hugging Face: huggingface.co/alea-institute
- GitHub: github.com/alea-institute
This is not an ending. It is a beginning: for ALEA, for the community, and for the next chapter of open legal AI research.