Web Archiving Roundup: March 2020

In addition to the IIPC’s collaborative collecting project on the coronavirus, send me your institutionally curated COVID-19 web archiving projects. I will compile them and post them to the section website. My email address is nmg266 at nyu dot edu.

We are also looking for proposals for participation in the Web Archiving Section’s annual section meeting on the theme of “web archiving and rights.”

Multiple web archiving events have been postponed due to the coronavirus. Archives Unleashed has changed to a remote-only event. The IIPC Web Archiving Conference and General Assembly has been postponed until September 28-30, 2020. Engaging with Web Archives is also postponed.

Archive-it is hosting a webinar on April 1 on the Wayback API. And the Community Webs project received additional funding! Other Archive-it news can be found on their blog.

In addition to the establishment of the coronavirus web archive, the IIPC has multiple announcements, including the newly elected steering committee member organizations and a published address from the IIPC Chair on IIPC-funded projects, consortium agreement renewals, collaboration with CLIR, training materials, and collaborative collecting. The IIPC blog also featured a guest post from the National Library of Australia on their migration to Python Wayback.

A new version of MemGator was released!

Happy 15th birthday to the UK Web Archive!

And lastly, here is a GitHub repository archiving individual stories from Wuhan during the coronavirus pandemic.

Stay home, wash your hands, stay healthy, and archive websites!

Web Archiving Roundup: February 2020

We are co-hosting a webinar with Archive-it on web archiving collection development policies and goals on February 18. Join us!

Archives Unleashed releases new features for TWUT and notebooks for working with the Archives Unleashed toolkit. They also released a preprint of “The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives” by Nick Ruest, Jimmy Lin, Ian Milligan, and Samantha Fritz, a report on their progress over the last three years of the project. Their findings include the ability to generate “derivative products” so users can manipulate web archive data just like any other dataset without specialized knowledge. Other Archives Unleashed news can be found in their newsletter.

IIPC announced their steering committee election nomination statements and featured a blog post from the Royal Library of Belgium on potentially setting up their own web archives as part of the PROMISE project.

The DIGHUMLAB announces the WARCnet Kickoff meeting from May 4-6, 2020 at Aarhus, Denmark. WARCnet is a transnational web archives studies network and this event will be the first in a number of conferences focused on the study of national and transnational web domains and events using web archives.

And here is a handy blog post by Jeff Huang on how best to create a website that is easily archivable, titled “This Page is Designed to Last.”

Web Archiving Roundup: January 2020

The Library of Congress Web Archiving Team published a blog post announcing the release of 4,258 new web archives across 97 event and thematic collections. The blog post also details how they made their web resources more discoverable through adding collection pages and adding more description on the collection level.

Niels Kerssens published an article in Internet Histories on legacies in European internet history, specifically Euronet.

Carnegie Hall Archives reported on their work archiving Carnegie Hall’s websites using Webrecorder. Currently, they have captured the legacy version of the Carnegie Hall Performance History Search and the Games and Listening Guides websites. Webrecorder also published new tutorial videos.

The Boston Globe published an article about the Digital Preservation Coalition’s BitList 2019, the global list of digitally endangered species. The article mentions web archiving, the Internet Archive, and ArchiveTeam.

David Rosenthal reviews Supporting Web Archiving via Web Packaging on his blog and argues that it wastes bandwidth, requires publishers to make decisions that benefit aggregators, cannot safely use cookies or user storage, and has authentication risks.

Samantha Abrams, Alexis Antracoli, Rachel Appel, Celia Caust-Ellenbogen, Sarah Denison, Sumitra Duncan, and Stefanie Ramsay published an article in the American Archivist detailing a usability study of Archive-It, titled Sowing the Seeds for More Usable Web Archives: A Usability Study of Archive-It. The article synthesizes the feedback from the study and identifies areas of improvement for the Archive-It service.

Lastly, in the Journal of Contemporary Archival Studies, Gregory Wiedeman details a computer-assisted approach to describing web archives: an integration between ArchivesSpace and Archive-It that would automatically update description for new web crawls.

Web Archiving Roundup: December 2019

The deadline for the IIPC General Assembly and Web Archiving Conference has been extended until tomorrow! Propose today!

“Web Archaeology,” a special issue of TMG-Journal for Media History, focuses on web history and the practices of web archiving. Topics include social media and platform historiography (video on the subject), the history of YouTube and informal archival practices, and Dutch articles on blogs, sustainable long-term preservation, and the history of the Netherlands’ first online digital literary journal.

Archives Unleashed released twut, a library for Apache Spark that extracts tweet ids (dehydration), user information, tweet text, and tweet times.
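For readers unfamiliar with the term, “dehydration” means reducing a set of tweets to their IDs alone. The sketch below illustrates the idea in plain Python on line-delimited tweet JSON. It is not twut’s API (twut runs on Apache Spark), and the function name `dehydrate` and the sample records are hypothetical; it only assumes each record carries an `id_str` or `id` field.

```python
import json
from typing import Iterable, Iterator


def dehydrate(lines: Iterable[str]) -> Iterator[str]:
    """Yield the tweet ID from each line of line-delimited tweet JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Assumption: the ID is stored as a string in "id_str",
        # with the numeric "id" field as a fallback.
        tweet_id = tweet.get("id_str") or tweet.get("id")
        if tweet_id is not None:
            yield str(tweet_id)


# Hypothetical sample records, for illustration only.
sample = [
    '{"id": 1, "id_str": "1", "text": "hello", "user": {"screen_name": "a"}}',
    '',
    '{"id": 2, "id_str": "2", "text": "world", "user": {"screen_name": "b"}}',
]
tweet_ids = list(dehydrate(sample))
```

A list of IDs like this can be shared in place of the full tweet data, and the full records retrieved again later from the IDs.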

Issue 3-4, Volume 3 of Internet Histories was published. Articles mostly focus on computational methods for analyzing internet history, the history of the dark web, internet routing, encrypted messaging, and retrospective web archiving.

LC Labs at the Library of Congress provides an update on the Computing Cultural Heritage in the Cloud Project, a pilot project to combine new technologies and the digital collections at the Library of Congress to support digital research at scale. For more information on the project, listen in to their first call on December 13, 1-2 PM EST.

The Internet Archive blog provides an update on the News Measures Research Project, a project to examine the health of local community news by analyzing the amount and type of local news coverage in a sample of communities.

The UK Web Archive blog has two posts up about the Olympic Games and Militarism using the web archives as a resource.

Webrecorder used their tool Autopilot to obtain a representative sample of Yahoo Groups, which will no longer be hosted by Yahoo after December 14.

Archive-it summarizes their first Global Partners Meeting at iPRES. The blog post also talks about their talks at the Time Machine 2019 conference in Dresden, Germany and the FORCE11 2019 conference in Edinburgh, Scotland.

Do you have #webarchiving news? Tweet at us @WebArch_RT and we will be sure to feature it in next month’s roundup!

Web Archiving Roundup: November 2019

Happy World Digital Preservation Day! Check out the hashtag #WDPD2019 to see how institutions are celebrating the day.

In web archiving news…

Check out the #WebArchiveWednesday hashtag on Twitter for web archiving highlights across the world.

Call for papers closes on November 16, 2019 for the first Engaging with Web Archives Conference on April 15-16, 2020 at Maynooth University Arts and Humanities Institute, Co. Kildare, Ireland.

Call for papers closes on December 1, 2019 for the International Internet Preservation Consortium (IIPC) General Assembly and the Web Archiving Conference (WAC) in Montréal, from May 11-13, 2020.

The #metoo Digital Media Collection at the Schlesinger Library on the History of Women in America at Harvard’s Radcliffe Institute for Advanced Study is available for research.

The Expatriate Archive Centre announces their blog archive.

“Web archives as a data resource for digital scholars” by Eveline Vlassenroot et al. was published in the International Journal of Digital Humanities.

The Web Science and Digital Libraries Research Group at Old Dominion University has had a series of blog posts on web archiving this fall, including a summary of CEDWARC (Continuing Education to Advance Web Archiving), the interaction between search engine caches and web archives, and a four-part series on content drift in Library and Archives Canada, the National Library of Ireland, the Public Record Office of Northern Ireland, and WebCite. They have also launched the Dark and Stormy Archives (DSA) project, which exists to provide storytelling solutions to improve the understanding of web archive collections.

David Rosenthal reviewed the MementoMap Framework for flexible and adaptive web archive profiling.

Webrecorder announces its API and WASAPI support and discontinues its anonymous capturing services.

Archive-it announced its APIs and integrations! Visit the Archive-it help center for more detailed information on implementation and documentation.

The Internet Archive also announced the Whole Earth Web Archive, a project to explore ways to improve access to the archived websites of underrepresented nations around the world. Currently, the project consists of archived web content for a sample set of 50 small nations from the Internet Archive’s web archive, with a special search and access feature on top of this subcollection and a dedicated discovery portal for searching and browsing.

The Internet Archive has also put into beta testing two new features in the general Wayback Machine: “Changes” and “Collections.” The “Changes” feature allows the user to compare any two captures for additions and deletions. The “Collections” feature provides the user a list of all the collections that include a given URL. They also expanded the “Save Page Now” tool to include outlinks.
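To make the idea of comparing two captures concrete, here is a minimal sketch using Python’s standard `difflib` module. It is not how the Wayback Machine implements “Changes”; the function name `capture_changes` and the sample page text are hypothetical, and the captures are assumed to be already reduced to plain text.

```python
import difflib
from typing import List, Tuple


def capture_changes(old_text: str, new_text: str) -> Tuple[List[str], List[str]]:
    """Return (added, deleted) lines between two captures of a page."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    added: List[str] = []
    deleted: List[str] = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" counts as both a deletion and an addition.
        if tag in ("delete", "replace"):
            deleted.extend(old_lines[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added, deleted


# Hypothetical text from two captures of the same URL.
earlier = "Welcome\nOpening hours: 9-5\nContact us"
later = "Welcome\nOpening hours: 10-4\nContact us\nNow on social media"
added, deleted = capture_changes(earlier, later)
```

A line-level comparison like this treats an edited line as one deletion plus one addition, which is usually enough to spot what changed between two dates.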

Wikipedia and the Internet Archive have teamed up to link to digitized books in the Internet Archive collections to minimize link rot in Wikipedia’s citations.

The Consumer Product Safety Commission’s posts to Twitter are the first memes from a government agency to be added to the Library of Congress.

And lastly, a Vice article on making the case for a national register of historic places for websites.

Do you have #webarchiving news? Tweet at us @WebArch_RT and we will be sure to feature it in next month’s roundup!

Web Archiving Roundup: October 2019

The Internet Archive and Center for Open Science announce collaboration to preserve open science data, funded by the IMLS National Leadership Grants for Libraries program.

On August 12, Harvard Library Preservation Services hosted a regional meetup of Archive-it subscribers. You can learn about the meetup on the Harvard Library Communications blog.

The first workshop of the Continuing Education to Advance Web Archiving (CEDWARC) is taking place at George Washington University on October 28. It is the first in a series of workshops to train library and archive professionals to use web archiving tools to answer research questions and enable new web archiving services based on these tools.

The Archives Unleashed Project has put out a call for participation for an Archives Unleashed Datathon at Columbia University Libraries on March 26-27, 2020. The project team invites archivists, researchers, librarians, computer scientists, and web archiving enthusiasts to join us over the course of two days to collaboratively work with web collections and explore cutting-edge research tools through hands-on experience. Proposals are due November 1, 2019.

Presentations from the Web Archiving & Data Services international partners meeting on September 20, 2019 at iPRES 2019 can be viewed here.

Call for papers opens on October 14, 2019 for the International Internet Preservation Consortium (IIPC) General Assembly and the Web Archiving Conference (WAC) in Montréal, from May 11-13, 2020.

An article on web archiving, “Please, My Digital Archive. It’s Very Sick.” by Tanner Howard, was published by Lapham’s Quarterly on September 4, 2019.

Webrecorder had a bunch of updates! First, Webrecorder released Autopilot, a new feature that can perform actions on the current web page loaded in Webrecorder, similar to a human user: clicking buttons, scrolling down, expanding sections, and so on. Webrecorder also released a Desktop app. The app is a fully self-contained version of the online service built for a desktop environment with minimal modifications: a single-user Webrecorder that can run on a computer without requiring a connection to any centralized service. Users can now have the full web archiving capabilities of Webrecorder on their own machines without having to install Docker or use the command-line.

Finally, WABAC (Web Archive Browsing Advanced Client), a project by Webrecorder, was released. More information on the WABAC and client-side replay technology can be found in a blog post by Ilya Kreymer on the Webrecorder blog and DSHR’s blog.

Upcoming US meetups: DLF Forum & NDSA Digital Preservation, October 14-17 in Tampa, FL and Web Archiving Texas on November 6 at Baylor University.

Do you have #webarchiving news? Tweet at us @WebArch_RT and we will be sure to feature it in next month’s roundup!


Web Archiving Section Leadership Announcement

The Web Archiving Section is excited to announce the new leadership for the 2019-2020 Steering Committee! The leadership roster is as follows:

Chair: Emily Ward
Vice-Chair/Chair-Elect: Tori Maches
Education Coordinator: Julia Corrin
Communications Manager: Nicole Greenhouse
Secretary: Kelsey O’Connell
Student Member: Lydia Andeskie

Congratulations and best of luck to those newly elected!

Web Archiving Roundup: July, 2019

The Web Archiving and Metadata Digital Object Sections will hold a joint event during the SAA Annual Meeting in Austin, TX. Join us on Saturday, August 3rd for a debate on descriptive metadata and web archiving.

The 2019 Archive-It Partner Meeting coincides with SAA’s Annual Meeting; registration is still open.

Graphic designer Sam Henri Gold has been archiving Apple ads from the 1970s to the present. You can take a look at the archive directly from the article.

ArchiveSpark 3.0 is now available. Take a look at the updates on GitHub.

Check out this article about a high school student’s experience working for the Archives Unleashed team.

The latest issue of the Newsletter from the ESRC National Centre for Research Methods includes an article on research challenges using web archives for social research.

Registration is still open for the Specialized Data Curation Workshop hosted by the Data Curation Network at Washington University in St. Louis.

The Digital Preservation Coalition is crowd-sourcing a list of endangered digital materials. Nominations close on Friday August 30th, 2019.

Web Archiving Roundup: June, 2019

This month the Sunlight Foundation, an organization advocating for open government, published an article on the weaknesses in Federal Agencies’ archival practices.

Catch the livestream recording from the IIPC Web Archiving Conference.

You can now take a look at the slides from Ian Milligan & Nick Ruest’s presentation on the Archives Unleashed Cloud Project at the IIPC Web Archiving Conference.

Both the paper and the presentation slides on Judging Visual Correspondence in Web Archives are now available. This presentation was given by Brenda Reyes Ayala at the ACM/IEEE Joint Conference on Digital Libraries (JCDL).

The Web Science and Digital Libraries Research Group at Old Dominion University has posted a trip report on the ACM/IEEE Joint Conference on Digital Libraries (JCDL).

Rhizome recently announced an opportunity for Webrecorder users and community members to financially support Webrecorder. Benefits include the full range of tools and expanded storage for web collections.

Check out this i-D magazine article on the case for archiving our digital lives.

Web Archiving Roundup: May, 2019

UPDATE – Join the ALCTS Metadata Interest Group Meeting during ALA Annual 2019 for a presentation and Q&A on the Library of Congress Web Archiving Program on Sunday, June 23, 2019, 9:00-10:00AM at the Marriott Marquis.

Now accepting nominations for the SAA Web Archiving Section’s 2019-2020 Steering Committee: https://www2.archivists.org/groups/web-archiving-section/now-accepting-2019-2020-steering-committee-nominations.

Registration for the IIPC Web Archiving Conference ends May 24. The conference will be hosted by the National and University Library of Croatia in Zagreb and coincides with the 15th anniversary of the Croatian Web Archive (HAW).

For members of the Digital Preservation Coalition, the DPC Web Archiving & Preservation Task Force is inviting delegates to a meeting on July 18 in London. The meeting is free for DPC members; registration ends July 10.

The IIPC Content Development Group is asking for contributions to their Climate Change Collection and their Artificial Intelligence Collection.

Ben Els, Digital Curator at the National Library of Luxembourg, gives us a glimpse into the effort to capture the Luxembourg elections.

Seth Denbo, Director of Scholarly Communication and Digital Initiatives at the American Historical Association, strikes a chord on the challenges of scale in an article titled Data Overload.

The Atlantic has an article on the implication of AI vacuum cleaners from tech companies.

You can now read the paper presented by the Library of Congress at the 2018 World Library and Information Congress, titled Institutions as Social Media Collector: Lessons Learned from the Library of Congress.

The National Library of the Netherlands has recently launched a collection of archived websites from the Chinese Community in the Netherlands.

ECAL (École cantonale d’art de Lausanne) has launched a website called Information Mesh celebrating the 30th anniversary of the World Wide Web.