On Thursday, November 2, it was announced that owner Joe Ricketts had abruptly shut down the online-only, city-centric news outlets Gothamist and DNAinfo, archives and all, in response to their newsroom’s vote to unionize. Both online newspapers, Gothamist (along with LAist, DCist, Chicagoist, and SFist) and DNAinfo, were updated numerous times each day, with a focus on local news, events, food, and culture.
This special edition of the Web Archiving Roundup takes a look at what others are saying about Gothamist and DNAinfo — and online news — in the wake of their sudden shutdown.
- Archive, archive, archive: NiemanLab links to several external efforts to archive both Gothamist and DNAinfo, and reminds us of the risks of ‘billionaire-funded media.’ (Archived link.)
- What We Lose in the Disappearing Digital Archive: on Splinter, David Uberti writes: ‘It’s likely that additional existing [online] publications will close in the face of economic upheaval, leaving their sites vulnerable to technical failure without consistent upkeep.’ Uberti also speaks with Abbie Grotke, web archiving team lead at the Library of Congress, who discusses the difficulties of capturing online news. (Archived link.)
- When your server crashes, you could lose decades of digital news content — forever: in 2014, the Columbia Missourian suffered a server crash and ‘in less than a second, the newspaper’s digital archive of fifteen years of stories and seven years of photojournalism were gone forever.’ What’s worse, as Edward McCain writes, is that ‘very little is known about the policies and practices of news organizations when it comes to born-digital content.’ (Archived link.)
- If a Pulitzer-finalist 34-part series of investigative journalism can vanish from the web, anything can: written in 2015, ‘Raiders of the Lost Web’ argues that ‘the web, as it appears at any one moment, is a phantasmagoria. It’s not a place in any reliable sense of the word. It is not a repository. It is not a library. It is a constantly changing patchwork of perpetual nowness. You can’t count on the web, okay? It’s unstable. You have to know this.’ (Archived link.)
Tools and additional links:
Conference alert: on November 15 and 16, follow along with Dodging the Memory Hole, a conference dedicated to the issue of preserving born-digital news content.
After a brief hiatus, the Web Archiving Roundup is back this month. Here are a few quick links on recent web archiving topics:
- How the Victoria and Albert Museum collected WeChat: “How do you collect an app? What is the thing you’re actually collecting? And what for?”
- Ashley Blewer asks, “How do web archiving frameworks work?” and starts with the big picture: “If you wish to explain how web archiving works from a technical standpoint, you must first understand the ecosystem.”
- Collecting social media a bite at a time at the National Library of New Zealand: it “worries us that some of our documentary heritage may be lost if we don’t start collecting content” from social media.
- Can machine-learning models successfully identify content-rich PDF and Word documents from web archives? With support from the Institute of Museum and Library Services, the University of North Texas aims to find out.
- Rhizome to Host National Forum on Ethics and Archiving the Web: March 22-24, 2018, in conjunction with Documenting the Now and the New Museum in New York City.
- Is your organization involved in web archiving, or in the process of planning a web archive? If yes, and your organization is based in the United States, you have until November 17 to take this year’s NDSA Web Archiving Survey!
A few quick links on web archiving topics:
- SAA joins 60 other organizations in signing a letter to the Office of Management and Budget about the removal of online government information.
- Kalev Leetaru, of the GDELT Project, asks in Forbes: “Why aren’t we doing more with our web archives?”
- The recent presidential transition has led to multiple news stories about web pages and data being removed from government websites.
- Web archiving activity around government websites
Weekly web archiving roundup for the week of September 18, 2015:
Weekly web archiving roundup for the week of September 10, 2015:
Weekly web archiving roundup for the week of August 14, 2015:
Weekly web archiving roundup for the week of July 24, 2015:
Weekly web archiving roundup for the week of July 16, 2015:
Weekly web archiving roundup for the week of July 8, 2015:
- “Digital Underground”, by Ann Powers. Who Will Make Sure The Internet’s Vast Musical Archive Doesn’t Disappear?
- “The Web Will Either Kill Science Journals or Save Them”, by Julia Greenberg. In a study published last week, Vincent Larivière, along with his co-authors Stefanie Haustein and Philippe Mongeon, found that in the natural and medical sciences as well as the social sciences and humanities, five major publishers “account for more than 50 percent of all papers published in 2013.” Those publishers include Reed-Elsevier, Wiley-Blackwell, Springer, and Taylor & Francis. (The fifth differs between the two major fields: the American Chemical Society for the hard sciences, Sage Publications for the more social ones.)
- “Does the rise of ephemeral content spell the death of archives?”, by Melody Kramer. As news sites negotiate with Facebook to publish material directly on the platform, Facebook’s role in determining what news to surface, what news to censor, and how original content published on the platform is archived should be examined more closely.
Weekly web archiving roundup for the week of June 24, 2015:
- “A New Interface and New Web Archive Content at Loc.gov”, by Abbie Grotke. Recently, the Library of Congress launched a significant amount of new Web Archive content on the Library’s Web site, as part of a continued effort to integrate the Library’s Web Archives into the rest of the loc.gov web presence.
- “Dodge that Memory Hole: Saving Digital News”, by Abbey Potter. Newspapers are some of the most-used collections at libraries. Networked digital technologies have changed how we communicate with each other and have rapidly changed how information is disseminated. These changes have had a drastic effect on the news industry, disrupting delivery mechanisms, upending business models and dispersing resources across the world wide web.
- “Data is immortal, but not immune to decay”, by Martin Doyle. Data exists in a dangerous state of near-nonexistence. Few businesses would risk not having backups in place. With cloud computing becoming commonplace in enterprise, we’ve come to accept that our data will be replicated and stored in duplicate.
- “Facing the Challenge of Web Archives Preservation Collaboratively: The Role and Work of the IIPC Preservation Working Group”, by the IIPC Preservation Working Group, D-Lib Magazine. This D-Lib Magazine article describes the goals and activities of the IIPC Preservation Working Group (PWG), including a survey of the current state of preservation in member web archives and a number of collaborative projects the group is developing. These resources are designed to help address the preservation of, and long-term access to, the web by sharing ideas and experiences and by building up databases of information to support preservation strategies and actions.