Public Data Project

Preserving Civic Ground Truth

What Is the Public Data Project?

In early 2025, the Library Innovation Lab launched the Public Data Project, a major effort to collect and publish federal datasets with proof of authenticity and provenance. Our work began with copying 311,000 datasets from Data.gov between November 2024 and January 2025. More recently, we moved to capture all public domain Smithsonian data.

The driving focus for this work is the “one copy problem.” Simply put, information that exists in a single location, is supported by a single funding stream, or is administered by a single entity is at considerable risk of disappearing or, worse, being changed without notice. This problem, which has long been an area of focus and advocacy for us, threatens our cultural memory: our ability to access the data we need to understand what has happened so that we can plan where we are going. The sweeping loss of public data on the federal web beginning in 2025 is only the latest, and largest, demonstration of the internet’s fragility and vulnerability to shocks.

The Public Data Project is equipping a nationwide network of libraries, archives, and nonprofits with the tools they need to safeguard the most vulnerable U.S. federal data and to build the technical, organizational, and human infrastructure required for long-term, low-cost stewardship of public information. It builds on our history with large-scale data and digital preservation projects, such as the Caselaw Access Project and Perma.cc.

The Public Data Project’s current work includes:

  • Producing open-source data monitoring tools in collaboration with America’s Data Index;
  • Enhancing federal data access and visualization in collaboration with Radiant Earth;
  • Developing a graduate-level training curriculum for the next generation of librarians;
  • Rethinking inter-governmental and inter-institutional frameworks for digital mutual aid to preserve cultural memory.

Check this page, as well as our blog, for updates on this project. We are grateful for the support of the John D. and Catherine T. MacArthur Foundation and the Rockefeller Brothers Fund (RBF). The opinions and views here do not necessarily state or reflect those of the contributors.

About Us

The Public Data Project was born out of the Harvard Law School Library’s commitment, sustained for centuries, to stewarding legal history and government documents in collections ranging from original copies of the Magna Carta to U.S. government documents from the early days of the Federal Depository Library Program. It is part of the Library Innovation Lab’s (LIL) mission to bring library principles to technological frontiers.

The Project Lead is Dr. Molly Hardy, who works closely with LIL Director Jack Cushman to guide and administer the project. Senior Software Engineer Christopher Setzer is the technical lead on the project, and Product and Research Manager Halle Burns oversees tool development and data acquisitions. Harmony Eidolon supports all aspects of the project’s work. To learn more about the LIL team generally, please visit the About page. To contact the project, email publicdata@law.harvard.edu.

Updates

News