Web Archiving Roundup: November 2020

Welcome to a new year with the Web Archiving Section! 

The new section Steering Committee met for the first time earlier this month and we’re all very excited about the coming year! One of our goals is to make the blog more active. We’d like to invite you to participate by submitting news, announcements, and other topics of interest. If you have a topic that you’d like to expand on, we’re also looking for guest contributors. Please send items and suggestions to April Feldman. Hope to hear from you soon.

Since we’ve all shared our “brief introductions” with the section via our candidate statements, we’ve decided not to repeat ourselves. Instead, in this post, we’d like to introduce all of you to our favorite web collections. Our only requirement was that it be a collection we’ve personally worked on.

Tori Maches, Digital Archivist, UC San Diego

My favorite UCSD web archive collection is our San Diego Local Governments collection. It’s one of our oldest collections, dating back to 2007, and covers local municipal and government agency websites, as well as websites related to local government activity. I grew up in San Diego, and remember interacting with some of these websites when they looked the way they did in the earliest crawls. Recontextualizing websites you used in high school as historical objects is a weird experience, and it’s a good reminder of how quickly things change online.

Melissa Wertheimer, Music Reference Specialist, Library of Congress

I’m excited to share the LC Commissioned Composers Web Archive. This was my first venture with web archives, and I’ve been curating it since 2018. I’m a flutist who specializes in contemporary repertoire, so the topic felt natural! I think web archives are perfect for a digital record of the Music Division’s active commissioning of living jazz and classical composers. This collection is a resource in itself, but also leads users to our unique collection materials through abstracts with links to finding aids for composers’ papers and online catalog records for commissions’ manuscripts and electronic files.

Ryder Kouba, Collections Archivist, University of Hong Kong (Pok Fu Lam)

The Egyptian Politics and Revolution Collection (started by Stephen Urgola and Carolyn Runyon) documented Egyptian politics from 2011 to the present day (though politics have been dead since 2014 or so). As many archivists have written about collecting in the US, concerns over privacy were paramount, though difficult to address in a (for a time) fluid situation. Website blocking and censorship made finding content more difficult, but also made our work more important in providing access to blocked websites through the Wayback Machine (which was itself temporarily blocked in Egypt; naturally, we had concerns over exactly why the government decided to reverse that decision).

April Feldman, University Archivist, California State University, Northridge

We don’t have an active web-archiving program per se, so I haven’t worked on a favorite collection yet. We’re trying to implement a starter program, something small and scalable. In the meantime, we’ve just started using the Wayback Machine to capture limited Cal State Northridge websites and Wakelet for websites and social media posts related to CSUN’s Covid-19 response. I’m a project team of one right now, trying to figure it out as I go (love that OJT!). I’m completely open to suggestions or tales of woe if anyone wants to share.

Kiera Sullivan, University Records Processor, UC San Diego

The collection that I would like to highlight is the UC San Diego Web Archives collection. This is perhaps unsurprising, considering my role in the University Archives! This large collection captures a wealth of information from websites across the campus administration and community. Part of what I love about this collection is that although it is already extensive, there is a ton of room to grow and refine. This year, I will be working on identifying gaps in our collecting of student and community content, scoping crawls to address those gaps, and also improving the existing metadata across the collection.

Allison Fischbach, Research and Archives Associate, Towson University / MLIS student in Archives and Digital Curation, University of Maryland iSchool

The collection I am most proud of is our COVID-19 University Response Collection. This is the first web collection I have curated, and it is especially prescient as the pandemic continues. What I like about this collection is that it includes valuable information about both how institutional operations responded to COVID-19 and how the community continues to enact positive change. It is a collection that continues to grow, and one I know will have monumental value to future users. I feel fortunate to be able to work with it as a student archivist.

Web Archiving Roundup: August 2020

To start: Join us August 25 at 4:30 PM EST for the Web Archiving Section Meeting! Since March, archivists and information professionals have been focused on documenting COVID-19 and its effects on their communities. Web archiving is at the forefront of the documentation effort. The meeting will consist of a guided discussion focusing on the methods by which archivists create spontaneous and event-based collections, considering all aspects of the web archiving lifecycle, from collection development to scoping to description and access efforts. We are most interested in hearing about collecting frameworks that your institution is working on, so come discuss with us! We will also do some light section business and talk about elections. Registration link here: zoom.us/meeting/register/…

On to other news:

Publications from Stanford University Press’s digital initiative now have nearly complete web-archive versions thanks to a 2020 partnership with Webrecorder. The blog post goes into detail on technical specs, specifically ReplayWeb.page and WACZ. Webrecorder also launched a forum for discussions and announcements about the software.

Rhizome’s Conifer introduces Periphery, a tool for collection owners to define how missing resources are expressed during the replay of archived content.

The “Archiving the Black Web” project was started at the African-American Research Library and Cultural Center in Fort Lauderdale.

The Joint Conference on Digital Libraries took place August 1-5 and all of their papers are now available online. Talks related to web archiving include Making Recommendations from Web Archives for, The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives, Identifying Documents In-Scope of a Collection from Web Archives, and The Case For Alternative Web Archival Formats To Expedite The Data-To-Insight Cycle.

From IIPC: A description of the new Robustify link service from Memento and an overview of the Danish coronavirus web collection.

Archives Unleashed was also busy! They released their 2017-2020 community report and announced a partnership with Archive-it to integrate services and provide easy-to-use, scalable tools to researchers using web archives.

See ya on Tuesday!





Web Archiving Roundup: July 2020

Time for our monthly round-up! Hope everyone is staying well and safe!

From IIPC: The Content Development Group reports on their 2019-2020 work, including the COVID-19 and Climate Change collections; Library and Archives Canada reports on their COVID-19 collecting; and what’s new from the Croatian Web Archive. IIPC is also hosting two upcoming webinars on visualizing web archives and Jupyter Notebooks.

Twitter got harder to capture :/

Archive-it announced their virtual partner meeting on October 7, along with new staff and other updates. Their next open call is July 29.

It’s all available online…until it’s not!

Video and article exploring base memes (the original meme a derivative meme is based off of) in web archives.

Only one article this month from Internet Histories: Forensic approaches to evaluating primary sources in internet history research: reconstructing early Web-based archival work (1989–1996) by James A. Hodges.

And a summary of a study reviewing WARC validation tools.

SAA is meeting virtually August 3-7!

Web Archiving Roundup: June 2020

Documenting the Now published its Archivists Supporting Activists list, a call to action to archivists and memory workers to help support activists documenting violence by police toward black people.

The IIPC blog was busy this month: National Széchényi Library reports back on another year of web archiving; summaries of the National Library of Luxembourg, National Library of New Zealand, Bibliothèque et Archives nationales du Québec, and the Bibliothèque nationale de France’s coronavirus collecting; a blog post on the future of playback, specifically discussing pywb and OpenWayback; announcement of the IIPC training program; discussion of GLAM Workbench, which provides researchers with examples, tools, and documentation to help them explore and use the online collections of libraries, archives, and museums; exploration of previous annual meetings since this year’s meeting was cancelled; and reflections on training new web archivists.

IIPC also opened their discretionary funding program; proposals are due September 15.

The UK Web Archive blog discussed their 2019 UK General Election web archives; a case study on using Webrecorder for UK politicians’ social media pages; and WARCnet, a network that aims “to promote high-quality national and transnational research” on web domains and events on the web.

Call for contributions for the 4th RESAW Conference (June 17-18, 2021) on mainstream vs marginal content in Web history and Web archives; contributions are due October 15, 2020.

Webrecorder changed its name to Conifer!

Internet Histories has articles on international internet governance in the 1990s; selective disembodiment in the early Wired magazine; history of cyberactivism in Brazil; efficacy of civil society in global internet governance; and Weibo.

https://replayweb.page/ launched, which allows for replay of archived websites in your browser.

Announcement of version 0.80.0 of the Archives Unleashed Toolkit and revamped user documentation.

Archive-it shared the results of their 2020 State of the WARC survey.


Web Archiving Roundup: May 2020

Thank you to everyone who sent me their COVID-19 web archiving projects. You can view the list in our last blog post. I am still collecting submissions, so feel free to shoot me an email at nmg266 at nyu dot edu.

On to the roundup:

From the IIPC blog, the folks at Bibliotheca Alexandrina (BA) and the National Library of New Zealand (NLNZ) share an update on their tool for scalable web archive visualization: LinkGate.

Archives Unleashed released their Spring newsletter! The newsletter features information about the Archives Unleashed Toolkit, Cloud, and Notebooks; a summary of the remote New York datathon; web archiving articles and resources for use during the COVID-19 pandemic; and other presentations from the team.

Archive-it and NYARC share the national forum report on Advancing Art Libraries and Curated Web Archives, including the recording of the 2020 ARLIS/NA Web Archiving SIG Meeting.

Archive-it has also written a number of blog posts and updates, given the uptick in web archiving due to the pandemic. These updates give information on special pricing and cost sharing, resources, community news on COVID-19 web archiving projects, as well as introductory programming for people new to web archiving.

The British Library releases its Brexit collection and describes their work on COVID-19 collecting and how to use the collections while the reading room is closed.

There are a number of new articles on using web archives as primary resources, including Helena Byrne on using web archives to review football history, Will Mari on the racial, gendered, and class-based origins of the early internet, and Michael Stevenson & Anne Helmond on legacy systems.

The Digital Preservation Coalition has a number of posts related to web archiving and the coronavirus pandemic, including If These WARCs Could Talk: Learning from Archived Web & Social Media Covid-19 Collections and Capturing the UK Government Response to the Coronavirus (COVID-19) Pandemic at The National Archives UK. There is also a nice write up on how researchers use the archived web.

I presented at METRO with Mark Graham (Internet Archive), Gary Price (INFOdocket), and Alexander Thurman (Columbia University) on Documenting the Present Moment.

Is the WARC the best web archiving format?

The Library of Congress web archiving team celebrated its 20th anniversary! They also posted an interview with a retiring web archivist, Gina Jones. The LoC web archiving team also was featured in the New York Times.

Presentations from the WARCnet kickoff meeting can be found on the WARCnet website. The aim of the WARCnet network is to promote high-quality national and transnational research that will help us to understand the history of (trans)national web domains and of transnational events on the web, drawing on the increasingly important digital cultural heritage held in national web archives.

Version 2.3 of Social Feed Manager was released!

The International Journal of Digital Humanities is putting out a call for a special issue on digital humanities and web archives. Abstracts are due June 1, 2020.

Web Archiving and Digital Libraries workshop as part of JCDL 2020 is looking for submissions. Submissions are due on June 6, 2020.

The race to save the first draft of coronavirus history from internet oblivion in the MIT Technology Review.

Wow, that was a lot. Thanks to my colleague Amy for helping me with this roundup. Stay healthy everyone!

Web Archiving Round-up: COVID-19 Edition

As part of an initiative of the section, I have worked to compile a list of projects started by archivists and librarians across the country in response to the COVID-19 pandemic. As of this writing, over 40 institutions have submitted their work to the spreadsheet. It ranges from international efforts to capture websites associated with COVID-19 to localized efforts from institutions across the United States and Canada. Archivists and librarians have used many different tools to do this work, including Archive-it, Webrecorder, wget, twarc, and other custom tools. The work that these folx have done is both important and emotionally challenging and I want to acknowledge this labor. People are powering this effort: as much as we think this work is automatic, it involves a significant amount of human labor to select, appraise, assess the websites for quality assurance, and adequately describe these materials. I also want to acknowledge all essential workers during this time of crisis; without them, I would not be able to continue to do this work from the safety of my home in New York.

Click here to view the COVID-19 Web Archives Spreadsheet.

As of right now, the spreadsheet is only open to comment. Do you have a collecting project that you want to be included in the spreadsheet? Shoot me an email at nicole dot greenhouse at nyu dot edu.

Stay safe, stay healthy.


Web Archiving Roundup: March 2020

In addition to the IIPC’s collaborative collecting project on the coronavirus, send me your institutionally curated COVID-19 web archiving projects. I will compile them and post them to the section website. My email address is nmg266 at nyu dot edu.

We are also looking for proposals for participation in the Web Archiving Section’s annual section meeting on the theme of “web archiving and rights.”

Multiple web archiving events have been postponed due to the coronavirus. Archives Unleashed has changed to a remote-only event. The IIPC Web Archiving Conference and General Assembly has been postponed until September 28-30, 2020. Engaging with Web Archives is also postponed.

Archive-it is hosting a webinar on April 1 on the Wayback API. And the Community Webs project received additional funding! Other Archive-it news can be found on their blog.

In addition to the establishment of the coronavirus web archive, the IIPC has multiple announcements, including the newly elected steering committee member organizations and a published address from the IIPC Chair on IIPC funded projects, consortium agreement renewals, collaboration with CLIR, training materials, and collaborative collecting. The IIPC blog also featured a guest post from the National Library of Australia on their migration to Python Wayback.

A new version of MemGator was released!

Happy 15th birthday to the UK Web Archive!

And lastly, here is a GitHub repository archiving individual stories from Wuhan during the coronavirus pandemic.

Stay home, wash your hands, stay healthy, and archive websites!

Web Archiving Roundup: February 2020

We are co-hosting a webinar with Archive-it on web archiving collection development policies and goals on February 18. Join us!

Archives Unleashed releases new features to twut and notebooks for working with the Archives Unleashed Toolkit. They also released a preprint of “The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives” by Nick Ruest, Jimmy Lin, Ian Milligan, and Samantha Fritz, a report of their progress over the last three years working on the project. Their findings include the ability to generate “derivative products” so users can manipulate web archives data just like any other dataset without specialized knowledge. Other Archives Unleashed news can be found in their newsletter.

IIPC announced their steering committee election nomination statements and featured a blog post from the Royal Library of Belgium on potentially setting up their own web archives as part of the PROMISE project.

The DIGHUMLAB announces the WARCnet Kickoff meeting from May 4-6, 2020 in Aarhus, Denmark. WARCnet is a transnational web archives studies network and this event will be the first in a number of conferences focused on the study of national and transnational web domains and events using web archives.

And here is a handy blog post by Jeff Huang on how best to create a website that is easily archivable, titled “This Page is Designed to Last.”

Web Archiving Roundup: January 2020

The Library of Congress Web Archiving Team published a blog post announcing the release of 4,258 new web archives across 97 event and thematic collections. The blog post also details how they made their web resources more discoverable through adding collection pages and adding more description on the collection level.

Niels Kerssens published an article in Internet Histories on legacies in European internet history, specifically Euronet.

Carnegie Hall Archives reported on their work archiving the Carnegie Hall’s websites using Webrecorder. Currently, they have captured the legacy version of the Carnegie Hall Performance History Search and the Games and Listening Guides websites. Webrecorder also published new tutorial videos.

The Boston Globe published an article about the Digital Preservation Coalition’s BitList 2019, the global list of digitally endangered species. The article mentions web archiving, the Internet Archive, and ArchiveTeam.

David Rosenthal reviews Supporting Web Archiving via Web Packaging on his blog and argues that it wastes bandwidth, requires publishers to make decisions that benefit aggregators, cannot safely use cookies or user storage, and has authentication risks.

Samantha Abrams, Alexis Antracoli, Rachel Appel, Celia Caust-Ellenbogen, Sarah Denison, Sumitra Duncan, and Stefanie Ramsay published an article in the American Archivist detailing a usability study of Archive-It, titled Sowing the Seeds for More Usable Web Archives: A Usability Study of Archive-It. The article synthesizes the feedback on the study and identifies areas of improvement for the Archive-it service.

Lastly, in the Journal of Contemporary Archival Studies, Gregory Wiedeman details a computer-assisted approach to describing web archives: an integration between ArchivesSpace and Archive-It that would automate updated description for new web crawls.

Web Archiving Roundup: December 2019

Deadline for IIPC General Assembly and Web Archiving Conference has been extended until tomorrow! Propose today!

“Web Archaeology,” a special issue of TMG-Journal for Media History, focuses on web history and the practices of web archiving. Topics include social media and platform historiography (video on the subject), history of YouTube and informal archival practices, and Dutch articles on blogs, sustainable long-term preservation, and the history of the Netherlands’ first online digital literary journal.

Archives Unleashed released twut, a library for Apache Spark that extracts tweet ids (dehydration), user information, tweet text, and tweet times.

Issue 3-4, Volume 3 of Internet Histories was published. Articles mostly focus on computational methods for analyzing internet history, the history of the dark web, internet routing, encrypted messaging, and retrospective web archiving.

LC Labs at the Library of Congress provides an update on the Computing Cultural Heritage in the Cloud Project, a pilot project to combine new technologies and the digital collections at the Library of Congress to support digital research at scale. For more information on the project, listen in to their first call on December 13, 1-2 PM EST.

The Internet Archive blog provides an update on the News Measures Research Project, a project to examine the health of local community news by analyzing the amount and type of local news coverage in a sample of communities.

The UK Web Archive blog has two posts up about the Olympic Games and Militarism using the web archives as a resource.

Webrecorder used their tool Autopilot to obtain a representative sample of Yahoo Groups, which will no longer be hosted by Yahoo after December 14.

Archive-it summarizes their first Global Partners Meeting at iPRES. The blog post also covers their talks at the Time Machine 2019 conference in Dresden, Germany and the FORCE11 2019 conference in Edinburgh, Scotland.

Do you have #webarchiving news? Tweet at us @WebArch_RT and we will be sure to feature it in next month’s roundup!