Web Archiving Roundup: August 2020

To start: Join us August 25 at 4:30 PM EST for the Web Archiving Section Meeting! Since March, archivists and information professionals have been focused on documenting COVID-19 and its effects on their communities. Web archiving is at the forefront of the documentation effort. The meeting will consist of a guided discussion focusing on how archivists go about creating spontaneous and event-based collections, considering all aspects of the web archiving lifecycle, from collection development to scoping to description and access efforts. We are most interested in hearing about collecting frameworks that your institution is working on, so come discuss with us! We will also do some light section business and talk about elections. Registration link here: zoom.us/meeting/register/…

On to other news:

Publications from Stanford University Press’s digital initiative now have nearly complete web-archive versions thanks to a 2020 partnership with Webrecorder. The blog post goes into detail on technical specs, specifically ReplayWeb.page and WACZ. Webrecorder also launched a forum for discussions and announcements about the software.

Rhizome’s Conifer introduces Periphery, a tool for collection owners to define how missing resources are expressed during the replay of archived content.

The “Archiving the Black Web” project was started at the African-American Research Library and Cultural Center in Fort Lauderdale.

The Joint Conference on Digital Libraries took place August 1-5 and all of their papers are now available online. Talks related to web archiving include: Making Recommendations from Web Archives for; The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives; Identifying Documents In-Scope of a Collection from Web Archives; and The Case For Alternative Web Archival Formats To Expedite The Data-To-Insight Cycle.

From IIPC: A description of the new Robustify link service from Memento and an overview of the Danish coronavirus web collection.

Archives Unleashed was also busy! They released their 2017-2020 community report and announced their partnership with Archive-it to collaborate and integrate services, with the goal of providing easy-to-use, scalable tools to researchers using web archives.

See ya on Tuesday!





Web Archiving Roundup: July 2020

Time for our monthly round-up! Hope everyone is staying well and safe!

From IIPC: The Content Development Group reports on their 2019-2020 work, including the COVID-19 and Climate Change collections; Library and Archives Canada reports on their COVID-19 collecting; and what’s new from the Croatian Web Archive. IIPC is also hosting two upcoming webinars on visualizing web archives and Jupyter Notebooks.

Twitter got harder to capture :/

Archive-it announced their virtual partner meeting on October 7, along with new staff and other updates. Their next open call is July 29.

It’s all available online…until it’s not!

Video and article exploring base memes (the original meme a derivative meme is based off of) in web archives.

Only one article this month from Internet Histories: Forensic approaches to evaluating primary sources in internet history research: reconstructing early Web-based archival work (1989–1996) by James A. Hodges.

And a summary of a study reviewing WARC validation tools.

SAA is meeting virtually August 3-7!

Web Archiving Roundup: June 2020

Documenting the Now published its Archivists Supporting Activists list, a call to action to archivists and memory workers to help support activists documenting violence by police toward black people.

The IIPC blog was busy this month: National Széchényi Library reports back on another year of web archiving; summaries of the National Library of Luxembourg, National Library of New Zealand, Bibliothèque et Archives nationales du Québec, and the Bibliothèque nationale de France’s coronavirus collecting; a blog post on the future of playback, specifically discussing pywb and OpenWayback; an announcement of the IIPC training program; a discussion of GLAM Workbench, which provides researchers with examples, tools, and documentation to help them explore and use the online collections of libraries, archives, and museums; an exploration of previous annual meetings, since this year’s meeting was cancelled; and reflections on training new web archivists.

IIPC also opened their discretionary funding program; proposals are due September 15.

The UK Web Archive blog discussed their 2019 UK General Election web archives; a case study on using Webrecorder for UK politicians’ social media pages; and WARCnet, a network whose aim is “to promote high-quality national and transnational research” on web domains and events on the web.

Contributions for the 4th RESAW Conference (June 17-18, 2021), on mainstream vs marginal content in Web history and Web archives, are due October 15, 2020.

Webrecorder changed its name to Conifer!

Internet Histories has articles on international internet governance in the 1990s; selective disembodiment in the early Wired magazine; history of cyberactivism in Brazil; efficacy of civil society in global internet governance; and Weibo.

https://replayweb.page/ launched, which allows for replay of archived websites in your browser.

Announcement of version 0.80.0 of the Archives Unleashed Toolkit and revamped user documentation.

Archive-it shared the results of their 2020 State of the WARC survey.


Web Archiving Roundup: May 2020

Thank you to everyone who sent me their COVID-19 web archiving projects. You can view the list in our last blog post. I am still collecting submissions, so feel free to shoot me an email at nmg266 at nyu dot edu.

On to the roundup:

From the IIPC blog, the folks at Bibliotheca Alexandrina (BA) and the National Library of New Zealand (NLNZ) share an update on their tool for scalable web archive visualization: LinkGate.

Archives Unleashed released their Spring newsletter! The newsletter features information about the Archives Unleashed Toolkit, Cloud, and Notebooks; a summary of the remote New York datathon; web archiving articles and resources for use during the COVID-19 pandemic; and other presentations from the team.

Archive-it and NYARC share the national forum report on Advancing Art Libraries and Curated Web Archives, including the recording of the 2020 ARLIS/NA Web Archiving SIG Meeting.

Archive-it has also written a number of blog posts and updates, given the uptick in web archiving due to the pandemic. These updates give information on special pricing and cost sharing, resources, community news on COVID-19 web archiving projects, as well as introductory programming for people new to web archiving.

The British Library releases its Brexit collection and describes its work on COVID-19 collecting and how to use the collections while the reading room is closed.

There are a number of new articles on using web archives as primary resources, including Helena Byrne on using web archives to review football history, Will Mari on the racial, gendered, and class-based origins of the early internet, and Michael Stevenson & Anne Helmond on legacy systems.

The Digital Preservation Coalition has a number of posts related to web archiving and the coronavirus pandemic, including If These WARCs Could Talk: Learning from Archived Web & Social Media Covid-19 Collections and Capturing the UK Government Response to the Coronavirus (COVID-19) Pandemic at The National Archives UK. There is also a nice write-up on how researchers use the archived web.

I presented at METRO with Mark Graham (Internet Archive), Gary Price (INFOdocket), and Alexander Thurman (Columbia University) on Documenting the Present Moment.

Is the WARC the best web archiving format?

The Library of Congress web archiving team celebrated its 20th anniversary! They also posted an interview with a retiring web archivist, Gina Jones. The LoC web archiving team also was featured in the New York Times.

Presentations from the WARCnet kickoff meeting can be found on the WARCnet website. The aim of the WARCnet network is to promote high-quality national and transnational research that will help us to understand the history of (trans)national web domains and of transnational events on the web, drawing on the increasingly important digital cultural heritage held in national web archives.

Version 2.3 of Social Feed Manager was released!

The International Journal of Digital Humanities is putting out a call for a special issue on digital humanities and web archives. Abstracts are due June 1, 2020.

The Web Archiving and Digital Libraries workshop, part of JCDL 2020, is looking for submissions. Submissions are due on June 6, 2020.

The race to save the first draft of coronavirus history from internet oblivion in the MIT Technology Review.

Wow, that was a lot. Thanks to my colleague Amy for helping me with this roundup. Stay healthy everyone!

Web Archiving Round-up: COVID-19 Edition

As part of an initiative of the section, I have worked to compile a list of projects started by archivists and librarians across the country in response to the COVID-19 pandemic. As of this writing, over 40 institutions have submitted their work to the spreadsheet. It ranges from international efforts to capture websites associated with COVID-19 to localized efforts from institutions across the United States and Canada. Archivists and librarians have used many different tools to do this work, including Archive-it, Webrecorder, wget, twarc, and other custom tools. The work that these folx have done is both important and emotionally challenging and I want to acknowledge this labor. People are powering this effort: as much as we think this work is automatic, it involves a significant amount of human labor to select, appraise, assess the websites for quality assurance, and adequately describe these materials. I also want to acknowledge all essential workers during this time of crisis; without them, I would not be able to continue to do this work from the safety of my home in New York.

Click here to view the COVID-19 Web Archives Spreadsheet.

As of right now, the spreadsheet is only open to comment. Do you have a collecting project that you want to be included in the spreadsheet? Shoot me an email at nicole dot greenhouse at nyu dot edu.

Stay safe, stay healthy.


Web Archiving Roundup: March 2020

In addition to the IIPC’s collaborative collecting project on the coronavirus, send me your institutionally curated COVID-19 web archiving projects. I will compile them and post them to the section website. My email address is nmg266 at nyu dot edu.

We are also looking for proposals for participation in the Web Archiving Section’s annual section meeting on the theme of “web archiving and rights.”

Multiple web archiving events have been postponed due to the coronavirus. Archives Unleashed has changed to a remote-only event. The IIPC Web Archiving Conference and General Assembly has been postponed until September 28-30, 2020. Engaging with Web Archives is also postponed.

Archive-it is hosting a webinar on April 1 on the Wayback API. And the Community Webs project received additional funding! Other Archive-it news can be found on their blog.

In addition to the establishment of the coronavirus web archive, the IIPC has multiple announcements, including the newly elected steering committee member organizations and a published address from the IIPC Chair on IIPC funded projects, consortium agreement renewals, collaboration with CLIR, training materials, and collaborative collecting. The IIPC blog also featured a guest post from the National Library of Australia on their migration to Python Wayback.

A new version of MemGator was released!

Happy 15th birthday to the UK Web Archive!

And lastly, here is a GitHub repository archiving individual stories from Wuhan during the coronavirus pandemic.

Stay home, wash your hands, stay healthy, and archive websites!

Web Archiving Roundup: February 2020

We are co-hosting a webinar with Archive-it on web archiving collection development policies and goals on February 18. Join us!

Archives Unleashed releases new features to TWUT and notebooks for working with the Archives Unleashed Toolkit. They also released a preprint of “The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives” by Nick Ruest, Jimmy Lin, Ian Milligan, and Samantha Fritz, a report of their progress over the last three years working on the project. Their findings include the ability to generate “derivative products” so users can manipulate web archives data just like any other dataset without specialized knowledge. Other Archives Unleashed news can be found in their newsletter.

IIPC announced their steering committee election nomination statements and featured a blog post from the Royal Library of Belgium on potentially setting up their own web archives as part of the PROMISE project.

The DIGHUMLAB announces the WARCnet Kickoff meeting from May 4-6, 2020 in Aarhus, Denmark. WARCnet is a transnational web archives studies network and this event will be the first in a number of conferences focused on the study of national and transnational web domains and events using web archives.

And here is a handy blog post by Jeff Huang on how best to create a website that is easily archivable, titled “This Page is Designed to Last.”

Web Archiving Roundup: January 2020

The Library of Congress Web Archiving Team published a blog post announcing the release of 4,258 new web archives across 97 event and thematic collections. The blog post also details how they made their web resources more discoverable through adding collection pages and adding more description on the collection level.

Niels Kerssens published an article in Internet Histories on legacies in European internet history, specifically Euronet.

Carnegie Hall Archives reported on their work archiving Carnegie Hall’s websites using Webrecorder. Currently, they have captured the legacy version of the Carnegie Hall Performance History Search and the Games and Listening Guides websites. Webrecorder also published new tutorial videos.

The Boston Globe published an article about the Digital Preservation Coalition’s BitList 2019, the global list of digitally endangered species. The article mentions web archiving, the Internet Archive, and ArchiveTeam.

David Rosenthal reviews Supporting Web Archiving via Web Packaging on his blog and argues that it wastes bandwidth, requires publishers to make decisions that benefit aggregators, cannot safely use cookies or user storage, and has authentication risks.

Samantha Abrams, Alexis Antracoli, Rachel Appel, Celia Caust-Ellenbogen, Sarah Denison, Sumitra Duncan, and Stefanie Ramsay published an article in the American Archivist detailing a usability study of Archive-It, titled Sowing the Seeds for More Usable Web Archives: A Usability Study of Archive-It. The article synthesizes the feedback on the study and identifies areas of improvement for the Archive-it service.

Lastly, in the Journal of Contemporary Archival Studies, Gregory Wiedeman describes a computer-assisted approach to describing web archives, detailing an integration between ArchivesSpace and Archive-It that would automate updated description for new web crawls.

Web Archiving Roundup: December 2019

Deadline for IIPC General Assembly and Web Archiving Conference has been extended until tomorrow! Propose today!

“Web Archaeology,” a special issue of TMG-Journal for Media History, focuses on web history and the practices of web archiving. Topics include social media and platform historiography (video on the subject), history of YouTube and informal archival practices, and Dutch articles on blogs, sustainable long-term preservation, and the history of the Netherlands’ first online digital literary journal.

Archives Unleashed released twut, a library for Apache Spark that extracts tweet ids (dehydration), user information, tweet text, and tweet times.
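
For readers new to the term, "dehydration" means reducing full tweet records to just their IDs. The following is a minimal plain-Python sketch of that idea, not twut's own Spark API. It assumes line-delimited tweet JSON with an "id_str" or "id" field, and the function name and sample records are hypothetical, made up for illustration.

```python
import json


def dehydrate(lines):
    """Yield tweet IDs (as strings) from line-delimited tweet JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Assumption: the record carries the ID as "id_str" (string)
        # or, failing that, as a numeric "id".
        tweet_id = tweet.get("id_str") or tweet.get("id")
        if tweet_id is not None:
            yield str(tweet_id)


# Hypothetical sample records, not real tweets.
sample = [
    '{"id_str": "1201", "text": "first"}',
    "",
    '{"id": 1202, "text": "second"}',
]
ids = list(dehydrate(sample))
```

The resulting list of IDs can be shared in place of the full tweet data.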

Issue 3-4, Volume 3 of Internet Histories was published. Articles mostly focus on computational methods for analyzing internet history, the history of the dark web, internet routing, encrypted messaging, and retrospective web archiving.

LC Labs at the Library of Congress provides an update on the Computing Cultural Heritage in the Cloud Project, a pilot project to combine new technologies and the digital collections at the Library of Congress to support digital research at scale. For more information on the project, listen in to their first call on December 13, 1-2 PM EST.

The Internet Archive blog provides an update on the News Measures Research Project, a project to examine the health of local community news by analyzing the amount and type of local news coverage in a sample of communities.

The UK Web Archive blog has two posts up about the Olympic Games and Militarism using the web archives as a resource.

Webrecorder used their tool Autopilot to obtain a representative sample of Yahoo Groups, which will no longer be hosted by Yahoo after December 14.

Archive-it summarizes their first Global Partners Meeting at iPRES. The blog post also covers their talks at the Time Machine 2019 conference in Dresden, Germany and the FORCE11 2019 conference in Edinburgh, Scotland.

Do you have #webarchiving news? Tweet at us @WebArch_RT and we will be sure to feature it in next month’s roundup!

Web Archiving Roundup: November 2019

Happy World Digital Preservation Day! Check out the hashtag #WDPD2019 for how institutions are celebrating the day.

In web archiving news…

Check out the #WebArchiveWednesday hashtag on Twitter for web archiving highlights across the world.

Call for papers closes on November 16, 2019 for the first Engaging with Web Archives Conference on April 15-16, 2020 at Maynooth University Arts and Humanities Institute, Co. Kildare, Ireland.

Call for papers closes on December 1, 2019 for the International Internet Preservation Consortium (IIPC) General Assembly and the Web Archiving Conference (WAC) in Montréal, from May 11-13, 2020.

The #metoo Digital Media Collection at the Schlesinger Library on the History of Women in America at Harvard’s Radcliffe Institute for Advanced Study is available for research.

The Expatriate Archive Centre announces their blog archive.

“Web archives as a data resource for digital scholars” by Eveline Vlassenroot et al. was published in the International Journal of Digital Humanities.

The Web Science and Digital Libraries Research Group at Old Dominion University has had a series of blog posts on web archiving this fall, including a summary of CEDWARC (Continuing Education to Advance Web Archiving), the interaction between search engine caches and web archives, and a four-part series on content drift in the Library and Archives Canada, National Library of Ireland, Public Record Office of Northern Ireland, and WebCite. They also have launched the Dark and Stormy Archives (DSA) project, which exists to provide storytelling solutions to improve the understanding of web archive collections.

David Rosenthal reviewed the MementoMap Framework for flexible and adaptive web archive profiling.

Webrecorder announces its API and WASAPI support and discontinues its anonymous capturing services.

Archive-it announced its APIs and integrations! Visit the Archive-it help center for more detailed information on implementation and documentation.

The Internet Archive also announced the Whole Earth Web Archive, a project to explore ways to improve access to the archived websites of underrepresented nations around the world. Currently, the project consists of archived web content for a sample set of 50 small nations from the Internet Archive’s web archive with a special search and access feature on top of this subcollection and a dedicated discovery portal for searching and browsing.

The Internet Archive has also put into beta testing two new features in the general Wayback Machine: “Changes” and “Collections.” The “Changes” feature allows the user to compare any two captures for additions and deletions. The “Collections” feature provides the user a list of all the collections that a given URL is included in. They also expanded the “Save Page Now” tool to include outlinks.

Wikipedia and the Internet Archive have teamed up to link to digitized books in the Internet Archive collections to minimize link rot in Wikipedia’s citations.

The Consumer Product Safety Commission’s posts to Twitter are the first memes from a government agency to be added to the Library of Congress.

And lastly, a Vice article making the case for a national register of historic places for websites.

Do you have #webarchiving news? Tweet at us @WebArch_RT and we will be sure to feature it in next month’s roundup!