Web Archiving Roundup: October 2019

The Internet Archive and Center for Open Science announce collaboration to preserve open science data, funded by the IMLS National Leadership Grants for Libraries program.

On August 12, Harvard Library Preservation Services hosted a regional meetup of Archive-It subscribers. You can learn about the meetup on the Harvard Library Communications blog.

The first workshop of the Continuing Education to Advance Web Archiving (CEDWARC) project is taking place at George Washington University on October 28. It is the first in a series of workshops to train library and archive professionals to use web archiving tools to answer research questions and enable new web archiving services based on these tools.

The Archives Unleashed Project has put out a call for participation for an Archives Unleashed Datathon at Columbia University Libraries on March 26-27, 2020. The project team invites archivists, researchers, librarians, computer scientists, and web archiving enthusiasts to join us over the course of two days to collaboratively work with web collections and explore cutting-edge research tools through hands-on experience. Proposals are due November 1, 2019.

Presentations from the Web Archiving & Data Services international partners meeting on September 20, 2019 at iPRES 2019 can be viewed here.

The call for papers opens on October 14, 2019 for the International Internet Preservation Consortium (IIPC) General Assembly and the Web Archiving Conference (WAC) in Montréal, May 11-13, 2020.

An article on web archiving, “Please, My Digital Archive. It’s Very Sick.” by Tanner Howard, was published by Lapham’s Quarterly on September 4, 2019.

Webrecorder had a bunch of updates! First, Webrecorder released Autopilot, a new feature that can perform actions on the current web page loaded in Webrecorder, similar to a human user: clicking buttons, scrolling down, expanding sections, and so on. Webrecorder also released a Desktop app. The app is a fully self-contained version of the online service built for a desktop environment with minimal modifications: a single-user Webrecorder that can run on a computer without requiring a connection to any centralized service. Users can now have the full web archiving capabilities of Webrecorder on their own machines without having to install Docker or use the command line.

Finally, WABAC (Web Archive Browsing Advanced Client), a project by Webrecorder, was released. More information on WABAC and client-side replay technology can be found in a blog post by Ilya Kreymer on the Webrecorder blog and on DSHR’s blog.

Upcoming US meetups: DLF Forum & NDSA Digital Preservation, October 14-17 in Tampa, FL, and Web Archiving Texas on November 6 at Baylor University.

Do you have #webarchiving news?  Tweet at us @WebArch_RT and we will be sure to feature it in next month’s roundup!

 

Web Archiving Roundup: April, 2019

The Library of Congress Digital Collections Development Coordinator position closes soon (May 1st)!

Archive-It is hosting a training webinar on the Web Archiving Systems API (WASAPI). Zoom in on Wednesday, May 29th at 11am Pacific, 2pm Eastern.

This one came through my Archive-It listserv this morning; a webinar about WASAPI on May 29: https://support.archive-it.org/hc/en-us/community/posts/360043527372-New-training-webinar-The-Web-Archiving-Systems-API-WASAPI-. Register here.

Registration for the Society of American Archivists Annual Meeting is now open.

The ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) has published a list of accepted workshops and tutorials.

You can now watch Michael L. Nelson’s keynote “Web Archives at the Nexus of Good Fakes and Flawed Originals” at CNI’s April meeting.

Rhizome presents “The Art Happens Here: Net Art’s Archival Poetics,” an exhibition using web archives as art. If you happen to be in New York, you can see this exhibit at the New Museum until May 26.

The National Library of China will archive 200 billion Weibo posts in a project to preserve China’s digital heritage.

Katie Cuyler, Librarian at University of Alberta Libraries, on archiving Alberta’s climate change data, which is in danger of disappearing with the change of government. You can access the collection here.

Jason Scott, from TEXTFILES.COM, recently published The MySpace Dragon Hoard, a collection of 450,000 mp3s from MySpace dating from 2008 to 2010. These songs were gathered prior to the recent MySpace data loss.

Check out Karl Blumenthal’s trick to crawl special emojis from Twitter.

Some interesting food for thought from Stephen Dowling’s BBC article.

Weekly web archiving roundup: January 22, 2014

Here’s the weekly web archiving roundup for January 22, 2014!  These news items will be posted each Wednesday, and will be pertinent to anyone with an interest in web archiving, whether it’s something you do every day or just something you find intriguing.

  • “So who owns the Internet?” by Christina Pazzanese, Harvard Gazette: “A clash over who should decide which information flows through Internet networks — and at what price — is now before a Washington, D.C., federal appeals court in a landmark case that could grant Internet service providers (ISPs) the unfettered power to turn the information superhighway into a private toll road.”: http://news.harvard.edu/gazette/story/2014/01/so-who-owns-the-internet/

Tools:

ArchiveTeam Warrior

Check it out because 1) it’s pretty cool and 2) you can use it to help ArchiveTeam archiving efforts, currently including the archiving of Yahoo! Messages.

http://tracker.archiveteam.org/

“The ArchiveTeam Warrior is a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive — and it’s really easy to do!

The warrior is a virtual machine, so there is no risk to your computer. The warrior will only use your bandwidth and some of your disk space.

The warrior runs on Windows, OS X and Linux. You’ll need VirtualBox (recommended), VMware or a similar program to run the virtual machine.”

Weekend Web Archiving Fun: Web Archiving Integration Layer (WAIL)

Check out this new tool from Mat Kelly!

“Web Archiving Integration Layer (WAIL) is a graphical user interface (GUI) atop multiple web archiving tools intended to be used as an easy way for anyone to preserve and replay web pages.

Tools included and accessible through the GUI are Heritrix 3.1.0, Wayback 1.6, and warc-proxy. Support packages include Apache Tomcat, phantomjs and pyinstaller.

WAIL is written mostly in Python and a small amount of JavaScript.”