Public Data Project

Preserving civic ground truth.

What Is the Public Data Project?

The Public Data Project builds cutting-edge tools and interfaces for accessing, discovering, and monitoring large government datasets and multimodal collections. We preserve and make accessible large datasets in service of this development and in response to risks. We position our work within the collective effort to change the cultural climate around government data, so that it is understood as a resource held in the public trust. Our work began with copying 311,000 datasets from Data.gov between November 2024 and January 2025. More recently, we captured all public domain Smithsonian data.

Through technological innovation, inter-institutional collaboration, and pedagogical development, the Public Data Project is equipping a nationwide network of libraries, archives, and nonprofits with the tools they need to safeguard the most vulnerable U.S. federal data and to build the technical, organizational, and human infrastructure required for long-term, low-cost stewardship of public information. The Public Data Project builds on the history of the Library Innovation Lab (LIL) with large-scale data and digital preservation projects, such as the Caselaw Access Project and Perma.cc.

The Public Data Project’s current work includes:

  • Producing open-source data monitoring tools in collaboration with America’s Data Index;
  • Enhancing federal data access and visualization in collaboration with Radiant Earth;
  • Developing pedagogical materials for the next generation of librarians and technologists;
  • Rethinking inter-governmental and inter-institutional frameworks for digital mutual aid to preserve cultural memory.

Check this page, as well as our blog, for updates to this project. To receive our quarterly newsletter, please sign up using our form.

For press and other inquiries, please contact the project at publicdata@law.harvard.edu.

We are grateful for the support of the John D. and Catherine T. MacArthur Foundation and the Rockefeller Brothers Fund (RBF). The opinions and views here do not necessarily state or reflect those of the contributors.

Photo credit: Warren K. Leffler, courtesy Library of Congress

About Us

The Public Data Project was born out of the Harvard Law School Library’s centuries-long commitment to stewarding legal history and government documents, in collections ranging from original copies of the Magna Carta to U.S. government documents from the early days of the Federal Depository Library Program. It is part of LIL’s mission to bring library principles to technological frontiers.

Guided by Harvard Faculty and Law School Library Director Professor Jonathan Zittrain and Assistant Dean Amanda Watson, Dr. Molly Hardy leads the Public Data Project, working closely with LIL Director Jack Cushman. Senior Software Engineer Christopher Setzer is technical lead on the project, and Product and Research Manager Halle Burns oversees tool development and data acquisitions. Harmony Eidolon supports all aspects of the project’s work, and Jacob Rhoades designs our graphics.

Advisory Board

  • Jim Cowie, Internet History Initiative
  • JJ Dearborn, Data Futures
  • Paul Ford, Aboard
  • Gretchen Gehrke, Environmental Data & Governance Initiative
  • James R. Jacobs, Stanford University Libraries
  • Lynda Kellam, University of Pennsylvania Libraries & Data Rescue Project
  • Merrilee Proffitt, Internet Archive
  • Jed Sundwall, Radiant Earth

Updates

News