Preserving the Federal Data Trump Is Trying to Purge

A photo illustration on the concept of saving files, featuring the floppy-disk save icon and an arrow pointing to it, among images of documents.

Researchers say the data purges are also breaking the public’s trust.

Illustration by Justin Morrison/Inside Higher Ed

Within days of taking office, the Trump administration began purging federal demographic data—on a wide range of topics, including public health, education and climate—from government websites to comply with the president’s bans on “gender ideology” and diversity, equity and inclusion initiatives.

Over the past five months, more than 3,000 taxpayer-funded data sets collected by federal agencies, including the Centers for Disease Control and Prevention, the National Center for Education Statistics and the Census Bureau, have been caught in the crossfire. Many of them are congressionally mandated.

One of the first data sets to disappear was the White House Council on Environmental Quality’s Climate and Economic Justice Screening Tool, an interactive map of U.S. Census tracts “marginalized by underinvestment and overburdened by pollution,” according to a description written under a previous administration.

It’s the type of detailed, comprehensive data academics rely on to write theses, dissertations, articles and books that often help to inform public policy. And without access to it and reams of other data sets, researchers in the United States and beyond won’t have the information they need to identify social, economic and technological trends and forge potential solutions.

“Removing this data is removing a big piece of knowledge from humanity,” said Cathy Richards, a civic science fellow and data inclusion specialist at the Open Environmental Data Project, which aims to strengthen the role of data in environmental and climate governance. “A lot of science is about innovating on what people did before. New scientists work with data they may have never seen before, but they’re using the knowledge that came before them to create something better. I don’t think we fully understand the impact [that] deleting 50 years of knowledge will have on science in the future.”

That’s why she and scores of other concerned academic librarians, researchers and data whizzes are collaborating, many of them as unpaid volunteers, to preserve as much of that data as they can on nongovernment websites. The groups involved include OEDP, which is a founding member of the larger Public Environmental Data Partners coalition; the Data Rescue Project; Safeguarding Research and Culture; the Internet Archive; the End of Term Archive; and the Data.gov Archive, which is run by the Harvard Law School Library.

For Richards at OEDP, data-preservation efforts started right after Trump won the election in November.

She and her colleagues remembered how Trump, a climate change denier, had removed some data, most of it environmental, in 2017, and they wanted to get a head start on preserving any data that could become a target during his second term. OEDP launched in 2020 in response to the first Trump administration’s environmental policies, which prioritized fossil fuel extraction. The group compiled a list of about 200 potentially vulnerable federal data sets that researchers said would be critical to continuing their work. Its members spent the last two months of 2024 and the first weeks of 2025 downloading as many of those data sets as they could ahead of Trump’s Jan. 20 inauguration, then transferred them to stable, independent and publicly accessible webpages.

“That took time,” Richards said, noting that not every data set and its accompanying metadata was easy to replicate. “Each varied significantly. Some required scraping. In one case I had to manually download 400 files, clicking each one every few minutes.”

While they made a lot of headway, OEDP’s small team wasn’t able to preserve all of the data sets on their list by late January. And once Trump took office, the research community’s fears that the president would start scrubbing federal data were quickly realized.

“Data started to go down very quickly,” and at a much larger scale than in 2017, Richards said, with anything that mentioned race, gender or the LGBTQ+ community, among other keywords, becoming a target. “We started getting emails from people saying these websites were no longer working, panicking because they needed it to finish their thesis.”

As of this month, OEDP has completed archiving about 100 data sets, including the CDC’s Pregnancy Mortality Surveillance System, the Census Bureau’s American Community Survey, and the White House’s Climate and Economic Justice Screening Tool. As it works to complete dozens more, it’s also in communication with the other data-preservation efforts to make sure the work isn’t duplicated and that researchers and the general public can maintain access to as much data as possible.

‘Disrupted Trust’

Prior to Trump’s inauguration, 307,851 data sets were available on Data.gov. One month later, the number had dipped to 304,621. In addition to data-rescue efforts, the winnowing prompted outcry from the research community.

“As scientists who rely on these data to understand the causes and consequences of population change for individuals and communities, but also as taxpayers who have supported the collection, dissemination, and storage of these data, we are deeply concerned,” read a joint statement that the Population Association of America and the Association of Population Centers published in early February. “Removing data indiscriminately, even temporarily, from secure portals maintained by federal agencies undermines trust in the nation’s statistical and scientific research agencies and puts the integrity of these data at risk.”

Federal judges have since ordered the government to restore many of the deleted data sets—as of Sunday, Data.gov said there are 311,609 data sets available—and the Trump administration has complied, albeit reluctantly. For instance, the CDC’s Social Vulnerability Index, which since 2007 has tracked communities that may need support before, during or after natural disasters, came back online in February. But it now has a warning label from the Trump administration, which claims that the information does “not reflect biological reality” and the government therefore “rejects it.”

Richards, of OEDP, remains skeptical about the return of some of the data, speculating that the government may alter it to better fit its ideological narratives before restoring it. Thus, capturing the data before it gets taken down in the first place is “important for us to have that baseline proof that this is how things were on Jan. 18 and 19,” she said.

Lynda Kellam, a longtime academic data librarian who is helping to run the Data Rescue Project—which has already finished archiving some 1,000 federal data sets with the help of hundreds of volunteers—said she’s also “a little bit pessimistic” about the future of data collection. That’s not only because the Trump administration has fired thousands of federal workers who carry out that data collection, canceled billions in research contracts and removed reams of public data; it’s also because the Department of Government Efficiency has accessed protected personal data contained within some of those data sets.

“How do we actually talk to people about what’s protected and what those protections are for the data the government is collecting? DOGE has disrupted that trust,” she said. “For example, someone sent us a message asking us why they should participate in the American Community Survey when they weren’t sure what was going to happen with their (confidential, legally protected) data … There are still those protections in place, but there’s skepticism about whether those protections will hold because of what has happened in the past five months.”

Some legal protections are already eroding. On Friday, the U.S. Supreme Court sided with the Trump administration in determining that DOGE should have—for now—access to information collected by the Social Security Administration, including Social Security numbers, medical and mental health records, and family court information. (The case is now headed to a federal appeals court in Virginia that will decide on its merits.)

Henrik Schönemann, a digital history and humanities expert at Humboldt University of Berlin who helps run the Safeguarding Research and Culture initiative, said efforts to rescue federal data collections are vital to the global research community. The initiative has also archived high volumes of federal data since January. “Even if the United States falls out of it, we are still here and we still need this data,” he said. And if and when this political moment passes, “hopefully having this data can help [the United States] rebuild.”

While Schönemann thinks it’s an “illusion” that independent federal data-preservation efforts can effectively counter the United States’ slide into autocracy, he believes the work is better than nothing.

“It’s building communities and showing people they can do something about it,” he said. “And maybe this empowerment could lead them to feeling empowered in other areas and give people hope.”

(This article has been updated to correct the spelling of Lynda Kellam's name.)