Think back to early January 2023. Republicans were preparing to take control of the House and disband a committee that Democrats set up to investigate the Jan. 6 insurrection. In the process, they were expected to scrub the committee’s website and all the evidence that had been collected there, including an interactive timeline of the day’s events.

A team of internet archivists had other plans. In the days before the handover, they logged every website, video and document the committee had published online before it could disappear forever. They worked against the clock to save the records, like a scene in the kind of nerdy political thriller that only captivates Washington.

That moment was a bit of unique drama in a longer, and very serious, effort to preserve digital records vital to our democracy. As more information is shared exclusively online, saving the political corners of the internet, particularly government websites, is crucial to capturing our collective history for future generations. And as the Jan. 6 committee example shows, it can also protect that history from tampering in a hyper-partisan political climate.

Now, that same group of archivists, a coalition from government, academia and nonprofits, has begun capturing the Biden administration’s digital footprint. The monthslong undertaking is called the End of Term Archive, and it has occurred every four years since the George W. Bush administration.

Archivists first amass a sprawling list of public government URLs. They then catalog all of those websites (and the websites within those websites) and capture a snapshot of their content. In the end, it’s as much as 300 terabytes’ worth of material.

“Think of it like a spider,” Mark Graham, the project’s archivist-in-chief, said on the POLITICO Tech podcast. “You start at one place and then you begin crawling out as far as you can see on these different websites.
And that’s how you try to get a pretty good overview.”

Graham works for the Internet Archive, a nonprofit that aims to preserve digital history, perhaps best known for its long-running project called the Wayback Machine, which allows you to look up old versions of websites. He also leads the End of Term Archive, which is a joint effort with the Library of Congress, University of North Texas Libraries, Stanford University Libraries, the U.S. Government Publishing Office and the National Archives and Records Administration.

Right now the End of Term Archive is preparing for its initial “crawl” of government websites next month, and will then do another around the inauguration in January, Graham said. And a digital copy of those websites will be available almost immediately to the public via the Wayback Machine.

But this particular data is also offered in bulk to academics and computer scientists for research projects. That’s different from most websites archived through the Wayback Machine, Graham said, which can only be accessed through individual URLs. Government websites are considered “publicly accessible” and therefore can be downloaded in bulk.

In recent years, huge datasets like these archives have been drafted for another use: training artificial intelligence. Similar data has already been used to create AI systems that can explain the U.S. legislative process or talk through the content of particular bills, for example.

“This is a relatively new area, but it’s certainly something that everyone is paying attention to,” Graham said.

The most traditional use, turning ephemeral websites into a firm historical record, is still the most important for the future, in Graham’s view. For historians, government watchdogs and journalists, the End of Term Archive has become a tool for seeing how government websites change over time and across administrations.

Many of those changes are innocuous, Graham said. Some websites simply go dormant or get replaced.
In other instances, however, politics are at play. During the Trump administration, for instance, the Environmental Protection Agency made headlines for eliminating a website focused on fighting climate change, and for removing references to climate change across other websites and government documents.

But for Graham and his band of archivists, the objective is simpler.

“We’re just doing our part to try to preserve the digital artifacts of our time, to help preserve and make available and make useful our cultural heritage,” Graham said. “And if that has the effect of causing people to think twice when they may try to change that record, so be it.”