Weekly web archiving roundup: October 16, 2014

Weekly web archiving roundup for the week of October 16, 2014:

  • Event: “WEB ARCHIVING: EXPERIENCES, PERSPECTIVES, AND POSSIBILITIES”, from the Metropolitan New York Library Council. This panel discussion brings together a variety of professionals involved with various web archiving projects at their institutions. They will discuss the importance and role of web archiving in the larger library and information science arena, how their work interacts with other traditional library departments, and how web archiving occurs.
  • Video: “The Future of Web Archiving”, from Gary Price.  The following video was recorded at NDIIPP’s Digital Preservation 2014 held at the Library of Congress on Wednesday, July 23, 2014 and made available online recently. We’ve also included a link to a slide deck as well as a text transcript.
  • Newly launched blogs of the New York and Boston cohorts of the National Digital Stewardship Residency (NDSR) program (http://ndsr.nycdigital.org/ and http://ndsrboston2014.wordpress.com/).  The National Digital Stewardship Residency is working to develop the next generation of digital stewardship professionals who will be responsible for acquiring, managing, preserving, and making accessible our nation’s digital assets. Residents serve 9-month, paid residencies in host institutions working on digital stewardship initiatives. Host institutions receive the dedicated contribution of a recent graduate who has received advanced training in digital stewardship.

Weekly web archiving roundup: October 8, 2014

Weekly web archiving roundup for the week of October 8, 2014:

  • “Guidance on building archivable websites”, from Nicholas Taylor. A major challenge for web archivists is the low visibility that downstream archiving has on upstream web content creation. And, yet, deliberate and inadvertent architectural decisions made by web content creators strongly impact the ease or difficulty with which their websites can be captured and faithfully re-presented.
  • ““Ce projet un peu feu”: Web Archiving Art Resources with NYARC”, by Walter Schlect.  Internship report from grant-funded partnership between the New York Art Resources Consortium (NYARC) and Pratt Institute called M-LEAD TWO.
  • “Integrating the Live and Archived Web Viewing Experience with Mink”, by Matt Kelly.  The Chrome extension Mink (short for Minkowski Space) queries all the public web archives (via the Memento aggregator) in the background and will display the number of mementos (that is, the number of captures of the web page) available at the bottom right of the page.

Weekly web archiving roundup: October 1, 2014

Weekly web archiving roundup for the week of October 1, 2014:

  • “Creepypastas, Memes, Lolspeak & Boards: The Scope of a Digital Culture Web Archive”, by Trevor Owens.  An update on the development and curation of the American Folklife Center’s digital culture web archive. AFC’s culture web archive aims to capture “a set of sites that best document elements of the various digital vernaculars which have emerged through networked and computer-mediated communication.” Post includes description of scope and some of the sites crawled as well as a call out for site nominations.

Weekly web archiving roundup: September 24, 2014

Weekly web archiving roundup for the week of September 24, 2014:

  • “NLM to Broaden Its Institutional Web Site Archiving”, from the National Library of Medicine Bulletin. NLM will begin using the Internet Archive’s Archive-It service to capture and preserve periodic snapshots of a wider landscape of NLM Web resources. Concurrently, NLM will cease applying its previous permanence policy to individual Web pages and documents.

Weekly web archiving roundup: September 18, 2014

Weekly web archiving roundup (long overdue!) for the week of September 18, 2014:

  • “NCSU Libraries developing toolkit to make it easier to collect and preserve social media”: The NCSU Libraries have been awarded a grant to develop a web toolkit that will tackle the problem of capturing and preserving social media conversations. The team will investigate social media use associated with NC State and will develop software, procedures, and documentation to cost-effectively implement social media archiving at the NCSU Libraries. This will then be extended to the larger community in 2015.
  • “NLM to Broaden Its Institutional Web Site Archiving”: NLM will begin using the Internet Archive’s Archive-It service to capture and preserve periodic snapshots of a wider landscape of NLM Web resources. Concurrently, NLM will cease applying its previous permanence policy to individual Web pages and documents.
  • “Current Quality Assurance Practices in Web Archiving [paper]”, by Brenda Reyes Ayala, Mark Edward Phillips, and Lauren Ko. This paper presents the results of a survey of quality assurance (QA) practices within the field of web archiving. To understand current QA practices, the authors surveyed 54 institutions engaged in web archiving, which included national libraries, colleges and universities, and museums and art libraries.

Memento for Chrome review: Guest Post by Cliff Hight

The following is a guest post by Cliff Hight, University Archivist, Kansas State University Libraries.

The Memento for Chrome extension allows users of Google’s web browser, Chrome, to see previous versions of web pages. I enjoyed testing the tool and seeing how certain sites have changed over the years. By the way, do any of you remember how Yahoo! looked in 1996? With a few clicks of a mouse, you can now.


To give you a flavor of how it works, I’ll walk you through (with pictures!) my experience.

1)      After installing the extension, I used my institution’s home page as a test bed.


2)      I clicked on the clock to the right of the browser’s address bar and set the web time to which I wanted to travel—arbitrarily selected as April 14, 2010.


3)      I opened the context menu by right-clicking (or control-click for Mac users) on the page, selecting “Memento Time Travel,” and clicking the “Get near Wed, 14 Apr 2010 18:54:30 GMT” option.


4)      Voila! A view of how the Kansas State University Libraries website looked in the Internet Archive’s Wayback Machine on March 6, 2009.


You might have noticed that the date I was seeking and the date of the archived site were not the same. As it turns out, the developers note on their page that the extension has two limitations: it cannot “obtain a prior version of a page when none have [sic] been archived and time travel into the future.” Because my institution’s site was not captured in the Wayback Machine on April 14, 2010, there was nothing from that date to show. Instead, the tool went with the closest earlier capture, which happened to be from March 6, 2009.
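
The fallback described above can be sketched in a few lines of Python. This is only an illustration of the behavior reported in this post (picking the latest capture at or before the requested date), not the extension's actual code. The function name `nearest_prior_capture` and the 2008 and 2011 capture dates are hypothetical; only the March 6, 2009 capture and the requested timestamp come from the walkthrough.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def nearest_prior_capture(captures, target):
    """Return the latest capture at or before `target`, or None if there is none."""
    ordered = sorted(captures)
    # bisect_right gives the index of the first capture strictly after target.
    idx = bisect_right(ordered, target)
    return ordered[idx - 1] if idx else None


# Hypothetical capture list for one page; only the 2009 date is from the post.
captures = [
    datetime(2008, 5, 1, tzinfo=timezone.utc),   # hypothetical
    datetime(2009, 3, 6, tzinfo=timezone.utc),   # capture shown in step 4
    datetime(2011, 1, 15, tzinfo=timezone.utc),  # hypothetical
]

# The date requested in step 3: Wed, 14 Apr 2010 18:54:30 GMT.
target = datetime(2010, 4, 14, 18, 54, 30, tzinfo=timezone.utc)

print(nearest_prior_capture(captures, target))  # 2009-03-06 00:00:00+00:00
```

If no capture exists at or before the requested date, the function returns None, which mirrors the first limitation the developers mention.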

You may also have seen additional options on the context submenu. Selecting “Get near current time” sends you to the most recently archived version of the page. The “Get at current time” option takes you to the live version of the page, and the “Got” line tells you which page you are currently seeing.

This extension draws on various web archives, such as the Wayback Machine and the British Library Web Archive, to provide easy access to earlier versions of webpages. It also claims to provide archived pages of Wikipedia in all available languages. In my use of the tool, it was more convenient than going to the Wayback Machine every time I wanted to see older versions of websites.

Like most technology products, the extension has some bugs. In my tests, there were a couple of times on different websites when I set a date, looked at the current version of the page, clicked to see the older version, and waited while nothing happened. To get it working again, I went back to the date box, changed the date by a day, and had success in seeing the older version. I’m not sure why it had those hiccups (and it would not surprise me if there was user error), but know as you begin to use the tool that there might be some kinks to work through.

The developers of Memento include the Prototyping Team of the Research Library of the Los Alamos National Laboratory and the Computer Science Department of Old Dominion University. Based on information on the Memento website, the extension began development in 2009 and its most recent update was in November 2013. On the plugin page, the developers state that “Memento for Chrome allows you to seamlessly navigate between the present web and the web of the past. It turns your browser into a web time travel machine that is activated by means of a Memento sub-menu that is available on right-click.” And, to learn more about the technical side of the project, you can see their Memento Guide and Request for Comments pages.

The Memento for Chrome extension is a helpful tool that allows users to easily peruse websites and see how they have changed through the years. I would recommend adding it to your toolbox as you seek to view the history of the web.

Roundtable survey results

Here’s a link to Web Archiving Roundtable survey results presented by John Bence and Anna Perricci of the SAA Web Archiving Roundtable at SAA 2014.

John and Anna designed and administered a survey to assess the needs and preferences of community members who would like to learn more about web archives. This presentation gives more information about the findings of the survey and the path forward to meet the needs described by those who responded.



Weds, 8/13, 3:30.

Welcome and Remarks from Chair Tessa Fallon and Vice Chair Trevor Alvord

Web Archiving Survey Results, John Bence and Anna Perricci

2013 NDSA Web Archiving Survey Highlights, Nicholas Taylor

Jason Scott, Internet Archive, Archive Team
Jason Scott is the proprietor of http://TEXTFILES.COM, historian, filmmaker, archivist, famous cat maintenance staff. He works on/for/over the Internet Archive.

Jimmy Lin, UMD, “Infrastructure for Supporting Exploration and Discovery in Web Archives”

Jimmy Lin is an Associate Professor in the College of Information Studies (The iSchool) at the University of Maryland, with a joint appointment in the Institute for Advanced Computer Studies (UMIACS) and an affiliate appointment in the Department of Computer Science. He graduated with a Ph.D. in Electrical Engineering and Computer Science from MIT in 2004. Lin’s research lies at the intersection of information retrieval and natural language processing; his current work focuses on large-scale distributed algorithms and infrastructure for data analytics. From 2010 to 2012, Lin spent an extended sabbatical at Twitter, where he worked on services designed to surface relevant content to users and analytics infrastructure to support data science. He continues to engage with Twitter on various aspects of big data and data science.

Weekly web archiving roundup: August 6, 2014

Weekly web archiving roundup for the week of August 6, 2014:

Meet the Web Archiving Roundtable Leaders for 2014-2015!

Kate Stratton: Vice Chair/Chair Elect

Kate Stratton is Assistant Archivist at the Gates Archive where she has multi-functional responsibilities, particularly in the acquisition, accession, and description of incoming born-digital, digitized, and analog materials. Kate also manages the Archive’s web archiving program and tools. Prior to joining the Gates Archive, Kate was Research Library Fellow at Emory University in the Manuscript, Archives, and Rare Book Library and the Legacy Finding Aids Research Assistant at the Southern Historical Collection, University of North Carolina at Chapel Hill. She received her Master of Science in Library Science from the University of North Carolina at Chapel Hill. She has previously served on the Steering Committee and Education Sub-committee of the Records Management Roundtable.


Benn Joseph: Web Liaison

Benn Joseph joined Northwestern University in 2009 as a Manuscript Librarian, and splits his time between the Northwestern University Archives and the Charles Deering McCormick Library of Special Collections where he manages both paper-based and digital collections. He is also responsible for developing digital archives workflows for the library, and manages the university’s web archiving initiative. Prior to Northwestern he worked at the Chicago History Museum processing manuscript and photographic collections, and at Benedictine University as the Special Collections Librarian. Benn received his MSLS degree from the University of North Carolina at Chapel Hill in 2006, and is a member of the 2013 Archives Leadership Institute cohort.


Anna Perricci: Education Coordinator

While completing my MSI in Archives and Records Management at the University of Michigan with a focus on digital preservation, I also earned a graduate teaching certificate.  At the New York Public Library, I served as a preservation librarian and then was responsible for education, outreach and statistical analysis at ARTstor.  On a volunteer basis I have helped artists and activists in New York City preserve their work in multiple venues including in conjunction with FIGMENT (a participatory art festival) and the Archives Working Group of Occupy Wall Street.  In May 2013, I started as a Web Archiving Project Librarian at Columbia University.  A main goal of my role is to foster collaboration and improvements in web archiving.  I work with colleagues to form new collection development models, address technical challenges and implement strategies for the stewardship of web archives.  We are building strong networks of professionals with an interest in web archiving to extend current web archiving models and methods.


John Bence: Social Media Manager

John Bence is the University Archivist in the Manuscript, Archives, and Rare Book Library at Emory University. John leads the Emory University Archives efforts in acquiring, arranging, describing, and providing access to University records and special collections in all formats, as well as coordinating the development of records management programs at Emory.


Rachel Taketa: Social Media Manager

Rachel Taketa is the Library Specialist for the Industry Documents Digital Libraries at UC San Francisco’s Library and Center for Knowledge Management. She earned her BA from California State University, East Bay and her MLIS from San Jose State University.  In her capacity as Library Specialist, Rachel processes digital assets for the Legacy Tobacco Documents Library (LTDL) and the Drug Industry Documents Archive (DIDA), internationally utilized archives of tobacco and pharmaceutical industry documents. In addition, she co-teaches workshops and webinars, provides reference interviews, and manages the archives’ websites and social presence in the form of blogs, listservs and Twitter accounts.  In 2009, Rachel initiated the Industry Documents Digital Libraries’ Web Archives, which have produced the California Tobacco Control Web Archives and the newly planned E-Cigarette Web Archive for use by researchers investigating the tobacco industry’s strategies and tactics surrounding the marketing of new nicotine products.


Trevor Alvord graduates to Chair for the 2014/2015 year.  Congrats everyone!