On May 7, 2026, Molly Hardy, Project Lead for the Public Data Project, sat down for an interview with Chris Marcum, Senior Fellow for Data Policy at the Data Foundation and former Senior Statistician at the White House Office of Management and Budget. Please click the video above to watch and listen; the interview transcript below has been lightly edited for clarity.
Public Data Project:
Hello, my name is Molly Hardy, and I’m here at the Library Innovation Lab’s Public Data Project. I’m the director of the project. And I’m very pleased to be welcoming Senior Fellow for Data Policy at the Data Foundation and former Senior Statistician at the White House Office of Management and Budget, Chris Marcum. Chris and I are going to have a conversation that’ll go about 45 minutes. It centers on a report that Chris recently published, The Integrity of Public Access to Federal Data: Evaluating Disruptions to Open Government Data, 2025–2026.
And through his explanation of the flaws in the evidence cited to assess government data loss since 2025, Chris explains the complexities and intricacies of government data collection and distribution, offering those of us in the library community real insights into how we might move forward in our work to preserve and make accessible government data. Government documents and data librarians have been thinking about the preservation of and access to government publications for decades. See, for example, James A. Jacobs and James R. Jacobs’s Preserving Government Information: Past, Present, and Future.
And as the Internet Archive’s recent Information Stewardship Forum 2026 on building shared practices for the preservation and access of government information highlighted, librarians, technologists, policymakers, and community advocates need to work together to address the fragmentation and challenges in preserving and accessing government information. And I want to add a quick plug here for the Preservation of Government Information call to action that folks may want to check out and sign that came out of that meeting in San Francisco.
So in February 2025, the Library Innovation Lab announced its archive of the federal data clearinghouse, Data.gov, and our Public Data Project emerged from this effort. In October of last year, we shared Data.gov Archive Search, an interface for exploring this important collection of government datasets. This work builds on recent advancements in lightweight, browser-based querying to enable discovery of more than 311,000 datasets comprising almost 18 terabytes of data on topics ranging from automobile recalls to chronic disease indicators.
So, given his illustrious career in advocating for the preservation of and access to government data, the Public Data Project has learned a lot from Chris. And we greatly value this recent report that he’s issued, again, called The Integrity of Public Access to Federal Data. And I’m so pleased today to have a chance to sit down with Chris and ask him to expand on areas of the report that might be of particular interest to the library community. So, welcome, Chris.
Chris Marcum:
Thanks so much, Molly. I’m super excited to be here. I’m just tickled that you all at the Public Data Project have asked me to come and speak with you today about the report. And I’m just really, really honored. Thank you.
Public Data Project:
Absolutely. Could you just tell our audience a little bit about your background? I think it’s really fascinating, and it would be helpful for folks to understand where you’re coming from.
Chris Marcum:
Yeah, sure. So first and foremost, I’m an open science advocate, and have been steeped in information policy in the U.S. federal government for the last five or six years.
But that’s not what I was trained in: I have a PhD in sociology, and I did a postdoc in economics and statistics at the RAND Corporation, where I was looking at vaccination uptake behavior during the potential H1N1 pandemic that didn’t turn out to be a pandemic, thanks to high-quality CDC data that the late Dr. Nancy Cox was able to share.
So eventually I ended up at the NIH. I was doing basic research as a methodologist on biobehavioral health and social networks in the context of heritable disease. And I started getting this policy itch. I was like, we write really, really great research papers. We produce a lot of amazing data. But ultimately, the impact of that is pretty limited. We’re talking to a very narrow audience of other researchers. And I really wanted to have a broader impact.
And so I started looking for opportunities to do more policy-related work. But NIH is not a policy-setting agency, outside of policy for the NIH itself. And so I wanted to really think about how to cut my teeth in policy.
So I joined some committees in the intramural research program. We have a scientific review committee that’s like the Center for Scientific Review for extramural [research], where we were reviewing other intramural scientists’ research. And then I got involved in the data access committees. And that really accelerated my interest in information policy. I was able to go over to help set up a new program in the Office of the Director at NIAID — National Institute of Allergy and Infectious Diseases — called the Office of Data Science and Emerging Technologies. And that was done right at the start of the pandemic. So the work there was really in data sharing and training, teaching people how to share data effectively, and standing up a new data access committee.
And that launched me onto the national stage, where I ended up being invited to President Biden’s Fast Track Action Committee on Scientific Integrity. And that led Alondra Nelson at the White House Office of Science and Technology Policy to invite me to lead open science for the Biden-Harris administration, all before I got to OMB later on. So it’s been a long, winding career road.
Public Data Project:
That’s fascinating. It’s such an intersection of direct policy work, as you say, as well as the work that we in libraries are concerned with around preservation and access. It’s really great to have your perspective here.
And so if you don’t mind, we’ll just go ahead and jump into the report. And we librarians, we love lists. We love indexes. We love bibliographies. We love catalogs, right? And so a point that you repeatedly returned to in the report, one that I really took to heart, is that the Federal Data Catalog, often referred to as the FDC, is neither a repository nor a stable indicator of data accumulation or loss.
So I’m wondering: can you tell us what it is then? That is to say, how is it best understood? And if you could explain the relationship between the Federal Data Catalog and Data.gov, that would be really helpful.
Chris Marcum:
Yeah, this is a nuance in federal information policy that is not well understood or appreciated, even by the members of Congress who, ostensibly anyway, should have an interest or a stake here. So the Federal Data Catalog is a statutory requirement in the Foundations for Evidence-Based Policymaking Act. It’s in Title II, which is also known as the Open Government Data Act. And it basically establishes a centralized catalog or index of every agency’s federal data assets.
And previously, there had been an initiative started by the Obama administration that launched Data.gov, which is hosted by GSA. Now, Data.gov did not serve as a repository. This is not where data is being deposited in the sense of, like, an institutional repository that many libraries are most familiar with. Instead, it just pulled in the information that agencies were indexing on their own inventories of data.
And so when the Foundations for Evidence-Based Policymaking Act was passed, it just made a lot of sense, right, to take advantage of the infrastructure that Data.gov provided. The way we like to characterize it is that Data.gov provides the Federal Data Catalog: Data.gov is the landing place for the Federal Data Catalog.
The Federal Data Catalog is an aggregation of what are known in the statute, in the Open Government Data Act, as agency comprehensive data inventories. This is just an index of every data asset they hold, but not the data themselves.
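[Editor’s note: To make the index-versus-repository distinction concrete, here is a minimal sketch of what a single catalog entry might look like, loosely following the DCAT-US metadata schema that agency inventories use. The agency, identifiers, and URLs are hypothetical.]

```python
# A minimal, hypothetical catalog entry in the spirit of DCAT-US.
# Note what is absent: the data themselves. The catalog holds only
# metadata plus a pointer to wherever the agency hosts the file.
catalog_entry = {
    "title": "Example Agency Widget Counts, 2024",         # hypothetical
    "description": "Annual counts of widgets inspected.",  # hypothetical
    "identifier": "https://data.example.gov/id/widget-counts-2024",
    "accessLevel": "public",
    "modified": "2024-12-01",
    "publisher": {"name": "Example Agency"},
    "distribution": [
        {
            "downloadURL": "https://data.example.gov/files/widget-counts-2024.csv",
            "mediaType": "text/csv",
        }
    ],
}
```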
Public Data Project:
Okay, so I understand it’s not a repository, but I don’t understand completely why it’s not comprehensive. I mean, the words you just used would make me think that if every agency is submitting their indices, why isn’t it comprehensive?
Chris Marcum:
Yeah, this is a really good question. It comes down to the practicalities of implementation.
So today, there are over 500,000 datasets listed on Data.gov. Most of those are federal data assets. There are some data assets in there from state and local governments because Data.gov will index them if they’re supplied to the GSA, the General Services Administration that administers Data.gov.
But then there’s the question of why the Federal Data Catalog isn’t comprehensive when, in fact, federal agencies are required by statute to have comprehensive data inventories.
And if you think about that number, around 500,000, it’s probably an order of magnitude lower than the actual number of federal data assets that federal agencies hold. And if you go back and you think about the complexity of all of the types of data and what is defined as a data asset that an agency might hold, you have to think back over the course of the history of that agency, and they might hold on to datasets for a long time. It becomes just a huge challenge to be able to index them, to digitize those. Some of those data assets are probably still on paper. Many of them have probably ended up, to some extent, in the National Archives already. And so there has been a loss of the record of those data.
And so it’s a complicated problem. It’s really challenging for an agency to be able to do a comprehensive inventory.
But there is hope. While I was at the Office of Management and Budget (when I say “we,” [I mean] OMB), I was one of the leads on the development of an implementation guidance memo known as M-25-05, where we tried to translate Congress’s intent into an implementation strategy for the agencies to comply with the law on comprehensive data inventories.
And what’s really interesting about that is that the hope was that it would guide agencies to make sure that they have a forward-looking perspective. So everything that comes in now should be open by default, and existing data assets should be prioritized based on strategies that you, your privacy officials, your chief information officers, and your stakeholders might have for all the past data.
And so really it’s a forward-looking guidance document. And that comes back to the under-resourcing that agencies, and chief data officers’ staffs, are faced with. They’re under-resourced in terms of budget and staffing, and it would take an army for every agency to be able to do this comprehensively.
Public Data Project:
Yeah, that’s great. That’s really, really helpful and sobering to understand. Thank you for taking the time to explain that to us.
Another thing that really struck me in the report, that really just rang true — my own background is in the history of bibliography. And you talk about a lack of consistent or transparent methodologies generally across the government and in the care of federal data assets. And one distinct part of that lack is in definitions — that is, clear definitions.
And you offer some helpful nuance when you distinguish between deletion, access removal, and discontinuation around federal data. That’s really important because when we’re talking about data rescue and things like that, those lines often get blurred. And it’s really important to remember how and why data might not be accessible.
But I was wondering, too, just at a very basic level, do we have a definition for federal data? Does it come down to who is collecting the data? Or because we know that contractors often do this work, is it who’s funding the collecting of the data? Something else? And then I guess I would just layer in, too, how and why might that definition matter? And I have some ideas myself related to what you were saying earlier, but I would love to hear if you had any additional thoughts on that.
Chris Marcum:
Yeah, so I wouldn’t say there’s a definition of federal data with the qualifier “federal,” but there is a definition of data in the Foundations for Evidence-Based Policymaking Act, as well as some other statutes.
And that definition is technical and a little bit boring, but — I’m going to use some acronyms — in 44 U.S.C. [3502], Congress has defined data as recorded information acquired or maintained by an agency, I believe. [Note: 44 U.S.C. 3502 defines “data” as “recorded information, regardless of form or the media on which the data is recorded”; related terms such as “data asset” refer to data maintained by an agency.] And so in the Open Government Data Act, there’s a provision that talks about recorded information, regardless of its form or the media on which the data is recorded, and that it’s acquired or maintained by the agency.
That is really important because in the modern age, we think of data as being digital, right? But this really gives a definition of data that is broader and that can include recorded information on paper, recorded information on [other media]. What I love to imagine is these new forms of data preservation where we have, like, crystals being inscribed. Or data being recorded in genomes, for example, has been a novel thing. So it’s a really broad definition.
And when you ask [about contractors], let’s say a contractor is working with the federal government and they’re collecting data. By statute, that data is owned by the federal government. It’s federal data. And so under the Evidence Act, the Open Government Data Act, what is very clear is that those data assets do need to be inventoried, along with any encumbrances on them. Let’s say that an agency partners with an organization that provides proprietary data for some services. If the agency is maintaining those data, or acquires them under whatever legal definition their lawyers can come up with, that has to be inventoried.
But the encumbrances on those data also need to be disclosed very transparently in the metadata. So the comprehensive data inventories have to say whether or not there’s copyright associated, and how the public can access it, if the public can access it, for example. I think the biggest component is transparency about the fact that the agency has access to that data.
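[Editor’s note: As a rough illustration of the transparency point Chris makes, here is a hedged sketch of a check an inventory maintainer might run to flag entries that omit the fields where encumbrances such as copyright or restricted access would be disclosed. The field names follow DCAT-US, but the validation rule itself is an illustrative assumption, not anything prescribed by the statute or by M-25-05.]

```python
def missing_disclosure_fields(entry: dict) -> list[str]:
    """Flag metadata fields where access encumbrances should be disclosed.

    The set of required fields here is an illustrative assumption, not a
    rule from the Open Government Data Act or M-25-05.
    """
    required = ["accessLevel", "rights", "license"]
    # In DCAT-US, "rights" applies when access is restricted, so treat it
    # as optional for fully public entries (an assumption for this sketch).
    if entry.get("accessLevel") == "public":
        required.remove("rights")
    return [field for field in required if not entry.get(field)]

# Example: a fully public entry with a license needs no further disclosure.
print(missing_disclosure_fields({"accessLevel": "public", "license": "CC0"}))  # []
```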
Public Data Project:
Right, so the agency has access to it. And this word “maintained,” I might postulate, is even more nebulous than the word “preserve.” What does that mean in this context — to maintain that data?
Chris Marcum:
Yes, so does it mean that the agency has ingested it into their institutional repository? Does it mean that it’s stored on a computer in just one person’s office?
The chief data officers have to all go through this exercise where they have to figure out what the definition means to the agency’s mission. And so “maintained” here, I think, encompasses deposit in repositories. So if data is put into an institutional repository, or is regularly used, or accessed via the cloud, there’s a good argument to say the federal agency is maintaining that data.
Certainly data that are being updated, or are being cleaned, or being processed or used are also being maintained. And so that’s been a very easy one to handle.
Where it becomes more nebulous is on derivative datasets. And so you can imagine that you have a large corpus of data where you’ll have a dataset that lots of agencies create sub-datasets from: maybe bespoke use cases, or little research projects. Are those data assets, and do they count as something being maintained?
And so that becomes more of a product-focused approach to data. Is the thing that needs to be inventoried the parent dataset or any of these child datasets that might propagate after them? And that’s more complicated.
Public Data Project:
And returning to this concept of parent and child datasets, am I right to say that that is part of the reason that the numbers of datasets in Data.gov can fluctuate so wildly?
Chris Marcum:
Yeah. So one of the things that happened early on last year that got a lot of press and attention, even from members of Congress, was that there was a lot of fluctuation in the top-level counts shortly after the inauguration and through the month of February. Data.gov provides a top-level count, the number of data assets indexed in the Federal Data Catalog, and it was bouncing around on the order of a few thousand datasets.
And it just so happens that one of the very mundane reasons that can happen is because Data.gov is dynamic. It pulls in information from the federal agencies. And so if federal agencies are updating their comprehensive data inventories, then that will be reflected on Data.gov.
One of the big ways that that number can change is when an agency decides to put a series of data into a collection. Historically, instead of enumerating every single one of the child datasets, Data.gov counted only the collection. You can imagine a project that has five datasets, and they get put into a single collection; the inventory count then goes down by four because only the collection is being counted.
Now, the new iteration of Data.gov, the new update, doesn’t do that. It actually counts the individual data assets inside a collection. This has been something the community has wanted for a long time, and GSA is finally being responsive by updating Data.gov to more accurately reflect the true count of datasets.
But it can happen the other way, too. You can imagine deciding that a collection is no longer one entity: these are separate datasets, with separate maintenance and update tracks, and they get broken out of the collection. That can also happen.
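[Editor’s note: A toy worked example of the bookkeeping Chris describes, comparing the old counting rule with the new one. The dataset names are made up.]

```python
# Five hypothetical datasets from one project, rolled up into a collection.
collection = {
    "title": "Example Survey Series",
    "children": [f"Example Survey, Wave {i}" for i in range(1, 6)],
}

# Old behavior: only the collection itself was counted, so grouping five
# datasets into one collection made the top-level count drop by four.
old_count = 1

# New behavior: each child data asset inside the collection is counted.
new_count = len(collection["children"])

print(old_count, new_count)  # 1 5
```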
Public Data Project:
I just want to understand better. When you talk about Data.gov pulling in from federal agencies, is that automated? First question. And second question: how does it then relate to what you said earlier about it being statutory that this happens, that federal agencies contribute? So is it like there’s this automated process that you do or don’t sign up for? What is actually going on there with the vacuuming-in of data?
Chris Marcum:
Really good question. That is one of the mysteries in information policy.
So the way that this massive federated apparatus works — Cole Donovan and I recently wrote a piece for the Federation of American Scientists where we have a very simple sentence that I think has a lot of impact: “governing is hard.” And in this case, governing data is hard.
So I want to point your listeners to a resource, resources.data.gov, where they outline some of this process; look there for the information on data sources for Data.gov.
So what happens is the statute requires every agency to have a comprehensive data inventory. These inventories become the data sources that are harvested by Data.gov. And some agencies have more than one, even though the statute says they have to have one.
Again, the complexities of implementation mean that [there are exceptions]. Like, the Census’s TIGER files have their own inventory because they’re updated with some regularity and they’re complex. And these are the shapefiles that give us our maps, basically, for the country. They’re relied upon by pretty much everything and they’re taken for granted because we all use them on Google Maps and other platforms.
And so what will happen is these inventories are promulgated at the agency level. They sit on agency servers. And then GSA has a harvesting routine that happens pretty much daily that goes through, crawls those sources, and then pulls in the information, updating its master list, which is the Federal Data Catalog.
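[Editor’s note: Here is a minimal sketch of the harvesting pattern Chris describes, assuming each agency publishes its inventory as a data.json file on its own server, the convention dating to the Project Open Data era. The agency URLs are placeholders, and GSA’s actual daily routine is considerably more elaborate.]

```python
import requests

# Hypothetical agency inventory endpoints. By convention, agencies publish
# their data inventories at /data.json on their own servers.
AGENCY_SOURCES = [
    "https://www.example-agency-one.gov/data.json",
    "https://www.example-agency-two.gov/data.json",
]

def harvest(sources: list[str]) -> dict[str, dict]:
    """Pull each agency inventory and merge entries into one master list,
    keyed by identifier: a bare-bones stand-in for the daily routine that
    assembles the Federal Data Catalog."""
    master: dict[str, dict] = {}
    for url in sources:
        inventory = requests.get(url, timeout=30).json()
        # In the data.json convention, entries live under the "dataset" key.
        for entry in inventory.get("dataset", []):
            master[entry["identifier"]] = entry
    return master
```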
Public Data Project:
Okay, thank you. And so then to return to the Federal Data Catalog, that’s the lodestone, the cornerstone of all of this. Thinking back, just to return to our initial conversation about its incompleteness. Were you made information czar, what would you do to make it more complete?
If we were to say that it would be a civic good to have a complete catalog, what would we do to get to that completeness?
Chris Marcum:
So I would first and foremost recognize that it is an extremely difficult task for the agencies.
And so, as I’ve said, notwithstanding the resource and staffing limitations, if we had some statutory authority with an appropriated budget sufficient to accomplish this, it would be really helpful for every agency to establish a data governance board that then goes through all of the use cases with effectively every staff member.
And we did this exercise in the Office of Management and Budget, or started to before I departed last year. Our CIO, the Chief Information Officer, brought us together, about 20 or 30 of the staff members, to just talk about: hey, what data do you use? What data is important to you? What data do you store on your computer? What data do you make derivative datasets from? What do you need from us that you don’t have access to?
And that started the process for establishing a comprehensive data inventory within that office.
Establishing a data governance board that then goes out and makes sure the staff are trained in data access and management best practices, but are also aware of the need for inventorying all the data assets and to make sure the definitions for those data assets are governed — that would be what I would do. And I would make that a requirement for every agency and have the agencies report back up to, say, the Office of Management and Budget or another appropriate office as things evolve in the government.
Public Data Project:
That’s great. Thank you. And in that work — you mentioned the National Archives, that some things go there. Of course, we’ve got our Library of Congress, which I realize has a somewhat complicated history when it comes to this kind of work. But I’m just wondering, are there library/archive institutions within the government already that would play a role here? Or is that a big lack?
Chris Marcum:
No, I think there are. I mean, it’s “yes and.” So, yes, there is a role for the National Archives. Obviously, the National Archives have to help agencies with their final disposition of all of their records and information that appear in datasets. Of course, those are records, and they are subject to the Federal Records Act for the most part.
So you have the National Archives, which has responsibilities on archiving information. They also have responsibilities for promulgating standards. They do the classification standards. And so it’s really helpful for agencies to be able to take advantage of this existing body of knowledge around, what is this? Is this controlled unclassified information? Is this secure information? And there’s already a lingua franca available.
There’s also, like you said, great expertise in the library community within the federal government. And one of the areas that I just love to talk about is that many agencies hold material collections. Obviously, we think of maybe the big ones, like the Smithsonian. There’s a huge material collection, huge libraries.
But then there are more nuanced cases like at NIST, the National Institute of Standards and Technology. They’ve got their reference materials database, a reference materials library. That is a licensed library that people pay to have access to. But they have a lot of knowledge on how to curate information in a structured manner for accessibility and preservation for the long term.
And so greater interagency coordination is absolutely necessary for the success of this. I like to even point to the fact that NIST a few years ago developed the Research Data Framework, where they provide a governance strategy for federally funded research data. And so this goes beyond just what the agency themselves are requiring or producing, to that which their grantees produce.
Public Data Project:
I’m thinking, too, another example might be the NASA Library, which of course was recently in the news and in peril, right?
Chris Marcum:
Yeah, so not all of the NASA libraries, just the library at Goddard has been shuttered. [Note: Additional NASA library closures have been reported since 2022.] And that is a tragedy, because Goddard represents a wealth of material and informational assets that really require librarian stewardship.
And to have those assets transferred either to the National Archives or probably, as the case may be now, shuttered and just locked behind a door while that process unfolds, really does not do a service to the public good. And it certainly doesn’t do a service to the researchers who rely on those resources at the lab.
Public Data Project:
For sure. Our conversation has naturally shifted from questions around basic preservation to access. I think it’s worth reflecting for a second on the ways in which the work of the government, when done best, is transparent. And that’s another way of saying it is accessible to all. And that is the goal. That is half of the reason that libraries exist: preservation and access, right? And so there’s a very natural connection and shared mission in terms of the public good. So, yeah, that just all makes a lot of sense to me.
I would be remiss were I not to bring up metadata, because we always want to talk about metadata. All roads lead to metadata. You note in your report that inaccurate metadata is a major issue, along with the misclassification of datasets and misleading, rotting URLs: the kind of maintenance work that librarians are quite familiar with.
So I was just curious, in terms of metadata standards — I know they exist. Is that the issue, that the standards aren’t hitting it quite right? Is it an implementation issue? Is it something that’s happening in the aggregation? Where does the inaccuracy creep in? And then also the misclassification, and this obviously missing maintenance work. Lots on the table there, if you’d be willing to pick up any of that.
Chris Marcum:
I’m going to answer you with an answer I think you’re really going to appreciate. I think that the amount of, let’s just call it error, in agencies’ comprehensive data inventories is a strong indication of the need for more information scientists in those agencies, like librarians, like repository experts, to help with the curation.
Because ultimately the information in the metadata catalogs is only as good as it is entered, typically by people. And so you get a lot of errors that can occur based on human input error. You also get errors that occur when, say, a CIO migrates a system to a new server, and then all of a sudden the links for the data sources are all broken. We’ve seen that happen in the past. An API that might serve up information about data, or serve the data itself, might change. It might change vendors, and then that API might have a different URL propagation system. It takes time to catch all of this, of course. But if agencies had good information systems experts and information scientists available before these decisions are made, that would help tremendously with reducing the amount of error in the future.
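[Editor’s note: As a concrete illustration of the link rot Chris describes, here is a small hedged sketch that walks catalog entries and reports distribution URLs that no longer resolve. The logic is illustrative; a production check would retry, respect rate limits, and distinguish moved resources from dead ones.]

```python
import requests

def find_broken_links(entries: list[dict]) -> list[tuple[str, str]]:
    """Return (title, url) pairs for download URLs that no longer resolve."""
    broken = []
    for entry in entries:
        for dist in entry.get("distribution", []):
            url = dist.get("downloadURL")
            if not url:
                continue
            try:
                resp = requests.head(url, allow_redirects=True, timeout=10)
                if resp.status_code >= 400:
                    broken.append((entry["title"], url))
            except requests.RequestException:
                broken.append((entry["title"], url))
    return broken
```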
On classification, I found this really fascinating for data in the Federal Data Catalog, because the law is not clear. And I will say that having struggled for a long time with my colleagues at OMB on how to communicate what constitutes a data asset, a public data asset, an open government data asset — these things are all in statute, but the distinction between them is not as clear as Congress could have made them. And part [of it] is probably because there certainly wasn’t an MIS or someone with an information sciences background writing the law, per se.
And so what we found is that the interpretation historically has been left up to the agencies and left up to individual subject matter experts or individual staffers. And so you get this really interesting mosaic of what gets captured as a data asset. And so it can range anywhere from a PDF of an infographic to, you know, the Census. And the diversity of that is just wild.
I think, hopefully, M-25-05, the implementation guidance, provides some additional clarity on the structured nature that we expect of data. It’ll give agencies more clarity, and they’ll also exercise more care in classifying their assets as they go through their prioritization of which assets need to move from federal data assets to public data assets to open government data assets.
And again, it’s a tough problem. The other part of me is like, I love the fact that I can find, for example, CDC’s anti-smoking infographics on the Federal Data Catalog. But I just don’t think they belong there. And so it’s like, I love that they’re preserved and that they’re available. But are they data assets?
Public Data Project:
Right. You talk in the report in really helpful ways about the distinction between data tools and data sources. And what is it that we need to be advocating for? The tools are amazingly powerful and they’re wonderful. And yet without the data behind them, there’s no there there.
Chris Marcum:
Yeah, it’s so fascinating, because what enables many of the tools that have been taken down by this administration and put back up by civil society organizations is the fact that the underlying data have remained publicly accessible and publicly available.
And so if you don’t preserve that data, then the tools, as you said, are kind of useless, right? Because they don’t have the high-quality information that you require.
On the other hand, I am a strong believer in democratizing data and making it accessible and approachable to people. Denice Ross and I recently produced and published a Federal Data Field Guide. It helps to make federal data just more approachable. And it is, in effect, a type of data tool because it’s like an aggregation of all of these different data types. It provides an ontology.
I really do have an appreciation for democratization. I think the data tools really do provide that accessibility. And I think the modus operandi of this administration is to increase friction in the approachability of publicly accessible data. And so if you take down the tools that help everyday people interpret federal data, I think that’s part of the goal — even if you maintain access to the online data itself. So I’m right there with you. And the distinction is really important and it needs to be emphasized. Ultimately, if we’re targeting preservation, we definitely have to handle the underlying data because without the data, you don’t have the tools.
And I’d also want to add another nuance, something I thought a lot about when I was a senior statistician and senior scientist at OMB: data reports. Data tools, typically, are interactive, and they help you interpret. But a lot of the economy relies on economic reports where the underlying data are confidential statistical data. They’re not readily publicly accessible. You have to go through a clearance process to get access to them, either through the Federal Statistical Research Data Center program or through the agency research data centers themselves. And there are costs associated with that. You have to be licensed and get clearance.
And so instead, what the agencies do is they create these wonderful quarterly, monthly, yearly reports that provide aggregated statistical data and information.
Many economists, many reporters, they consider that to be data, right? This is the federal economic data. It is not the dataset that underlies those data. It’s just the aggregations. And so that’s another really important nuance I didn’t talk about in my report, but is one that we have to really think about because these are costly. And the statistical agencies that produce them are under resource constraints and under threat.
Public Data Project:
Exactly. And the level of expertise it takes to produce them — the people who really know the data.
I’ll just ask one last question. What I’d love to close our conversation around is federal workers. And we’re not too far from May Day, a fitting time to honor federal workers. As you know, I was DOGE’d myself a year ago, so this is a topic very close to my heart.
I’m going to embarrass you a little bit and quote from your own writing, because I was really struck by these sentences. You write, “By hollowing out subject matter experts and other critical staff across agencies, the administration reduced data integrity capacity in a systemic manner. Ultimately, this systemic disruption created lasting deficits in the nation’s ability to reliably collect, protect, and disseminate the vital data necessary for informed policymaking, economic forecasting, and scientific research.” I just thought that really summed it up in a lovely way.
So I wanted to see if you had any closing reflections on the relationship between the precarity of federal data and the slashing of the federal workforce.
Chris Marcum:
Not only are data about people, but the entire data infrastructure relies on people. And the reduction in workforce capacity? There is irreplaceable, non-AI-replaceable damage that has been done in this current administration to the federal workforce.
And you see some recalcitrance by the administration at this point in acknowledging that, where the Office of Personnel Management is touting that they’re going to hire thousands of tech workers. But they had just fired, like, 300,000. Or 300,000 or so had departed.
So I would say, first and foremost, this is Public Service Recognition Week. And for public servants like you and me, whether you departed the federal workforce of your own volition, like me, or not, like you, I think it’s incredibly important to recognize that subject matter expertise is absolutely essential for the integrity of federal data and for the integrity of maintaining public access to federal data.
Public Data Project:
That’s great. Thank you so much. And I think that’s the perfect place to end. And I just want to say thank you so much for your work.
And again, to give a shout out to The Integrity of Public Access to Federal Data, this fabulous report that Chris recently published. And we encourage everyone generally to pay attention to your work, because it’s just so valuable to all of us on so many levels. So thank you.
Chris Marcum:
Well, thank you, Molly, and thanks to your project and the great work that you all are doing with both the Data.gov project and everything that LIL is doing. I really appreciate it and really appreciate the opportunity to talk to you.