Web Archiving Roundup — Gothamist edition: November 13, 2017

On November 13, 2017 By webarchrt In News, Weekly roundup

Gothamist shutdown:
On Thursday, November 2, it was announced that the online-only, city-centric news outlets Gothamist and DNAinfo had been abruptly shuttered, archives and all, by owner Joe Ricketts in response to the organization’s vote to unionize. Both online newspapers, Gothamist (along with LAist, DCist, Chicagoist, and SFist) and DNAinfo, were updated numerous times each day, with a focus on local news, events, food, and culture.

This special edition of the Web Archiving Roundup takes a look at what others are saying about Gothamist and DNAinfo — and online news — in the wake of their sudden shutdown.

Archive, archive, archive: NiemanLab links to several external efforts to archive both Gothamist and DNAinfo, and reminds us of the risks of ‘billionaire-funded media.’ (Archived link.)
What We Lose in the Disappearing Digital Archive: on Splinter, David Uberti writes: ‘It’s likely that additional existing [online] publications will close in the face of economic upheaval, leaving their sites vulnerable to technical failure without consistent upkeep.’ Uberti also speaks with Abbie Grotke, web archiving team lead at the Library of Congress, who discusses the difficulties of capturing online news. (Archived link.)
When your server crashes, you could lose decades of digital news content — forever: in 2014, the Columbia Missourian suffered a server crash and ‘in less than a second, the newspaper’s digital archive of fifteen years of stories and seven years of photojournalism were gone forever.’ What’s worse, as Edward McCain writes, is that ‘very little is known about the policies and practices of news organizations when it comes to born-digital content.’ (Archived link.)
If a Pulitzer-finalist 34-part series of investigative journalism can vanish from the web, anything can: written in 2015, ‘Raiders of the Lost Web’ argues that ‘the web, as it appears at any one moment, is a phantasmagoria. It’s not a place in any reliable sense of the word. It is not a repository. It is not a library. It is a constantly changing patchwork of perpetual nowness. You can’t count on the web, okay? It’s unstable. You have to know this.’ (Archived link.)

Tools and additional links:

From @rhizome and Webrecorder.io: a quick how-to video for recovering articles from the DNAinfo sites with @webrecorder_io, https://instagram.com/p/BbBH207Fx7y/.
Made by @xn9q8h and @turtlekiosk: the Gothamist Archive Retrieval Tool, which retrieves clips from Google AMP caches.
After Gothamist: how to read Web pages that have gone to their grave, from USA Today.

Conference alert: on November 15 and 16, follow along with Dodging the Memory Hole, a conference dedicated to the issue of preserving born-digital news content.

Web Archiving Roundup: October 2017

On October 19, 2017 By webarchrt In News, Weekly roundup

After a brief hiatus, the Web Archiving Roundup is back this month. Here are a few quick links on recent web archiving topics:

How the Victoria and Albert Museum collected WeChat: “How do you collect an app? What is the thing you’re actually collecting? And what for?”
- Live URL
- Archived URL
Ashley Blewer asks, “How do web archiving frameworks work?” “If you wish to explain how web archiving works from a technical standpoint, you must first understand the ecosystem.”
- Live URL
- Archived URL
Collecting social media a bite at a time at the National Library of New Zealand: It “worries us that some of our documentary heritage may be lost if we don’t start collecting content” from social media.
- Live URL
- Archived URL
Can machine-learning models successfully identify content-rich PDF and Word documents from web archives? With support from the Institute of Museum and Library Services, the University of North Texas aims to find out.
- Live URL
- Archived URL
Rhizome to Host National Forum on Ethics and Archiving the Web: March 22-24, 2018, in conjunction with Documenting the Now and the New Museum in New York City.
- Live URL
- Archived URL
Is your organization involved in web archiving, or in the process of planning a web archive? If yes, and your organization is based in the United States, you have until November 17 to take this year’s NDSA Web Archiving Survey!
- Live URL
- Archived URL: n/a

Web Archiving Roundup: February 2017

On February 28, 2017 By webarchrt In News, Weekly roundup

A few quick links on web archiving topics:

SAA joins 60 other organizations in signing a letter to the Office of Management and Budget about the removal of online government information.
Kalev Leetaru, from the GDELT Project, writes in Forbes about “Why aren’t we doing more with our web archives?”
The recent presidential transition has led to multiple news stories about web pages and data being removed from government websites.
- At Wired “Gun violence researchers race to protect data from Trump”
- At TechCrunch “White House LGBT’s rights page has disappeared”
- At Wired “Diehard coders just rescued NASA’s earth science data”
- At Quartz “Guerrilla archivists developed an app to save science data from the Trump administration”
Web archiving activity around government websites

Weekly web archiving roundup: September 18, 2015

On September 18, 2015 By webarchrt In International News, News, Uncategorized, Weekly roundup

Weekly web archiving roundup for the week of September 18, 2015:

Writing WARCs (Kristinn Sigurðsson): A call for better library support for the WARC format
TVEyes Wide Shut: Ruling on Broadcast Archiving Service Undermines Fair Use (Vera Ranieri): A court decision in Fox News’s case against TVEyes has troubling implications for innovative uses of media
Copyright Fair Use: 1 Win, 1 Maybe and Two Losses for TVEyes (Ira Sacks and Erika Stallings): Another take on the TVEyes court case decision
10 Years of the Web Archive–What have we saved: A talk given by Andy Jackson, Web Archiving Technical Lead, at the IIPC General Assembly 2015
The Internet Is Failing The Website Preservation Test (Ron Miller): A journalist explores the impact of “Internet memory loss”
When using an archive could put it in danger (Peter Webster): A Conservative party episode in the UK illustrates how a wider understanding of web archiving could make archived material more vulnerable
IIT Humanities Professor Discusses What Happens To Our Data Once We Die (Karis Hustad): Professor Mel Hogan will teach a class this fall on death, memories, and decay in the digital world
What does the web remember of its deleted past? (Dr. Anat Ben-David): “On March 30 2010, the country-code top-level domain of the former Yugoslavia, .yu, was deleted from the Internet…”

Weekly web archiving roundup: September 10, 2015

On September 10, 2015 By webarchrt In Blog news, International News, News, Weekly roundup

Weekly web archiving roundup for the week of September 10, 2015:

Playing at Web Archiving: Why not use an interactive fiction engine to build a “web archiving simulator” that takes you through the core web archiving life-cycle?
Cooking Up a Solution to Link Rot: Sixty-six to seventy-three percent of web addresses in the footnotes of three Harvard law journals and nearly 50 percent of web addresses in U.S. Supreme Court decisions from 1996 to 2012 suffered from reference rot.
The Story of How the Internet Came to India–An Insider’s Account: On August 15, 1995, Videsh Sanchar Nigam Limited (VSNL) launched public Internet access in India. IBNLive is commemorating 20 years of the Internet in India with a special series.
Beginner’s Guide to Web Archives, Part 3: Coming to the end of his short time working on web archives at the British Library, science-policy intern Peter Spooner reflects on the process of creating a web archive special collection.
So You Want to Get Started in Web Archiving? Fear not, there are a few places to visit to get a quick sense of what’s going on!
LANL’s Time Travel Portal, Part 1 and Part 2: The Time Travel portal, launched in February 2015, provides cross-system discovery of Mementos.
Viral Content in the UK Domain: A look at how the web archive of the UK Domain approaches malware.
Using Mathematica to Plot Locations Mentioned in Web Archives: What could we do with Mathematica 10’s (relatively) new geographic visualization services?
Soft Launch of WebArchives.ca: WebArchives.ca provides access to the University of Toronto’s Archive-It Collection of Canadian Political Parties and Political Interest Groups, which they have been collecting since late 2005.

Weekly web archiving roundup: August 14, 2015

On August 14, 2015 By webarchrt In News, Weekly roundup

Weekly web archiving roundup for the week of August 14, 2015:

“Brewster Kahle: A digital library, 13 min”. How Brewster Kahle and the Internet Archive will preserve the infinite information on the Web.

“NYU’s Fales Library acquires Triple Canopy’s Archive”. New York University’s Fales Library & Special Collections acquired the archives of Triple Canopy, the widely admired New York art and literary journal, published almost exclusively on-line since 2007.

“Smarsh First-Half Sales up 45 Percent Year-over-Year”. Smarsh, a provider of hosted archiving solutions for compliance and e-discovery, today announced its 2015 first-half sales grew 45 percent over the same time period a year ago.

“Express & Star photos archive: Your chance to tell us what you think”. Readers can help shape plans to digitise hundreds of thousands of photos from the Express & Star archive by having their say in a new survey.

Weekly web archiving roundup: July 24, 2015

On July 24, 2015 By webarchrt In News, Weekly roundup

Weekly web archiving roundup for the week of July 24, 2015:

“Google accidentally reveals data on ‘right to be forgotten’ requests”, by Sylvia Tippmann and Julia Powles. Data shows 95% of Google privacy requests are from citizens out to protect personal and private information – not criminals, politicians and public figures.

“Google’s hidden data reveals details of ‘right to be forgotten’ requests”, by Mona Lalwani. So far, Google has received about 281,000 individual requests that add up to over one million links.

“What if journalists weren’t controlled by tech? A conversation with Dave Winer”, by Melody Kramer. Journalists conversing about tech and journalism, with one question about archiving.

Weekly web archiving roundup: July 16, 2015

On July 16, 2015 By webarchrt In International News, News, Weekly roundup

Weekly web archiving roundup for the week of July 16, 2015:

“BBC Publishes ‘Right to be Forgotten’ Archive”, by John Lister. BBC has published a list of its articles which are no longer linked to through Google, due to a controversial European law.

“#Ramadaninaday: BBC curates social media conversation around Ramazan”. BBC Asian Network has initiated a unique social media campaign titled ‘Ramadan in a day’ to capture the spirit of Ramazan. The event will archive direct conversation with Muslims online from sunrise to sunset.

“Exploring Web Archiving at the Library of Congress”, by Samantha Abrams. Guest post by Samantha Abrams, an intern for the Web Archiving Team at the Library of Congress.

Weekly web archiving roundup: July 8, 2015

On July 8, 2015 By webarchrt In News, Weekly roundup

Weekly web archiving roundup for the week of July 8, 2015:

“Digital Underground”, by Ann Powers. Who Will Make Sure The Internet’s Vast Musical Archive Doesn’t Disappear?

“Twitter shut down a site that saved politicians’ deleted tweets”, by Colin Lecher. Politwoops, a project from the Sunlight Foundation, launched in 2012 with a simple mission: save the deleted tweets that politicians would rather you didn’t see. Last night, Twitter shut the project down.

“Sunlight Foundation ‘mystified’ as to why Twitter killed Politwoops”, by Mark Sullivan. A day after news hit that Twitter cut off the feed that fueled the deleted politician tweet site Politwoops, the site’s owner, the Sunlight Foundation, said it’s confused about the real reasons for the move.

“MoMA Is Archiving Its Exhibition Websites Before They Expire”, by Allison Meier. Soon over 200 exhibition websites for the Museum of Modern Art (MoMA), going back to its first web experiments in 1995, will be totally archived, from their images to their code.

“The Web Will Either Kill Science Journals or Save Them”, by Julia Greenberg. In a study published last week, Vincent Larivière, along with his co-authors Stefanie Haustein and Philippe Mongeon, found that in the natural and medical sciences as well as the social sciences and humanities, five major publishers “account for more than 50 percent of all papers published in 2013.” Those publishers include Reed-Elsevier, Wiley-Blackwell, Springer, and Taylor & Francis. (The fifth differs for the two major fields: American Chemical Society for the hard sciences, Sage Publications for the more social ones.)

“Does the rise of ephemeral content spell the death of archives?”, by Melody Kramer. As news sites negotiate with Facebook to publish material directly on the platform, Facebook’s role in determining what news to surface, what news to censor, and how original content published on the platform is archived should be examined more closely.

“Google Photos – can it keep your digital memories safe?”, by Chloe Green. Nik Stanbridge, VP marketing for Arkivum, questions the long-term viability of cloud storage services like Google Photos for keeping important data.

Weekly web archiving roundup: June 24, 2015

On June 24, 2015 By webarchrt In News, Weekly roundup

Weekly web archiving roundup for the week of June 24, 2015:

“Uh, you probably don’t want to tweet to @POTUS, actually”, by Caitlin Dewey. After Barack Obama belatedly joined Twitter last month, in his official, presidential capacity, dozens of Twitter denizens began tweeting him sex jokes, threats and other unprintable inanities.

“A New Interface and New Web Archive Content at Loc.gov”, by Abbie Grotke. Recently the Library of Congress launched a significant amount of new Web Archive content on the Library’s Web site, as a part of a continued effort to integrate the Library’s Web Archives into the rest of the loc.gov web presence.

“This start-up is helping the government keep track of social media”, by Amrita Jayakumar. Anil Chawla, a former IBM engineer, founded ArchiveSocial in 2011 to design an efficient way to store and search social media content.

“Dodge that Memory Hole: Saving Digital News”, by Abbey Potter. Newspapers are some of the most-used collections at libraries. Networked digital technologies have changed how we communicate with each other and have rapidly changed how information is disseminated. These changes have had a drastic effect in the news industry, disrupting delivery mechanisms, upending business models and dispersing resources across the world wide web.

“The Archive Is Closed”, by Scott McLemee. Update on the Twitter archive at the Library of Congress.

“Data is immortal, but not immune to decay”, by Martin Doyle. Data exists in a dangerous state of near non-existence. Few businesses would risk not having backups in place. With cloud computing becoming commonplace in enterprise, we’ve come to accept that our data will be replicated and stored in duplicate.

“The UK Web Archive, Born Digital Sources and Rethinking the Future of Research”, by Tim Hitchcock. Blog post derived from a short talk Hitchcock gave at the British Library in May 2015 focused on using the UK Web Archive. Written with PhD students in mind, it forms a meditation on the opportunities created when we are working with web sites rather than print.

“Facing the Challenge of Web Archives Preservation Collaboratively: The Role and Work of the IIPC Preservation Working Group”, by the IIPC Preservation Working Group. D-Lib Magazine article on the goals and activities of the IIPC Preservation Working Group (PWG), including a survey about the current state of preservation in member web archives and a number of collaborative projects which the Preservation Working Group is developing. These resources are designed to help address preservation of and long-term access to the web by sharing ideas and experiences, and by building up databases of information for support of preservation strategies and actions.