Archive-It Updates: Guest Post by Kristine Hanna (Director, Internet Archive)

A big year for web archiving

2013 has already been a big year for web archiving, with over 250 partner organizations worldwide archiving the web using Archive-It. We are excited about the wide range of use cases and experience levels with web archiving represented in the Archive-It community. As we grow, we are focusing on bringing a stronger and more robust service offering to our partner organizations.

Archiving the web has come a long way since Archive-It was first deployed in early 2006, but we also realize how much more work needs to be done. The web remains “a mess,” and we are grateful to the Archive-It community, which continues to help us find solutions. A few months ago we published our web archiving white paper (http://www.archive-it.org/publications) in an effort to build a framework and best practices around many of the use cases we have seen evolve inside the Archive-It community over the last seven years.

Heritrix upgrade and 4.8

In February we upgraded to the newest version of the crawler software, Heritrix 3.1. For our partners this means faster crawls and more complete captures of websites, particularly for sites that contain JavaScript.

In May we went live with our 4.8 release, which included more than 16 partner-requested features. Here are a few:

  • New Wayback Quality Assurance Tool: Enables partners to find missing embedded URLs while browsing their archived content. A subsequent patch crawl of the missing URLs can dramatically improve the overall look and functionality of the archived site.
  • Increased functionality for archiving PDFs: A new feature enables partners to quickly find, view, and add metadata to new PDFs that were archived since their last crawl.
  • IP Authentication: Option to restrict access to archived websites to a particular IP address, such as a reading room or library.
  • Vanity URL: Customizable URLs for partners’ public home pages on http://www.archive-it.org.
  • Import Metadata: Ability to import a spreadsheet to add to or replace metadata for thousands of records at a time.

Archive-It 5.0: The next generation web archiving application

Later this year, the Archive-It team will turn its focus to the next generation of the web application, which we are calling 5.0. We expect the development to take 6-8 months, and we will continue to run the existing 4.8 version while we work with partners on reinvigorating the web application. As always, it will be a priority to incorporate partners’ feedback and requests into 5.0 to ensure that the service continues to be relevant and useful to the library and archiving communities.

So far, we are focusing our attention on:

  • adding capture and display solutions for social media sites and dynamic content
  • transitioning full text search from NutchWAX to Solr to accompany our metadata search (which made the transition to Solr in 2011)
  • enhancing QA and crawl analysis tools
  • designing a more intuitive user interface

We welcome additional feature requests and suggestions. If you would like to learn more about 5.0, take a look at our wiki page: https://webarchive.jira.com/wiki/pages/viewpage.action?pageId=49414243

Get In Touch

Please come find the Archive-It team at the following conferences, where we will be presenting or showing a poster:

Association of Canadian Archivists General Meeting, June 15: http://archivists.ca/content/program

Digital Preservation 2013, Alexandria, VA, July 23-25: http://www.digitalpreservation.gov/meetings/ndiipp13.html

Society of American Archivists, August 15-17: http://www2.archivists.org/conference/2013/new-orleans

Our 2013 partner meeting is scheduled for November 12 in Salt Lake City, Utah: http://archiveitmeeting2013.wordpress.com/about/

We appreciate all the valuable feedback we receive from our partners and the library/archive community on ways to make the Archive-It service better.

Follow us on Twitter: @archiveitorg

Find us on Facebook: https://www.facebook.com/ArchiveIt

Read our blog: http://blog.archive-it.org

Bylaws for the Web Archiving Roundtable: We Need Your Comments!

The Web Archiving Roundtable bylaws are now available for our members to review. Comments are welcome and will be accepted until July 15th. In compliance with SAA policy, SAA staff have reviewed the bylaws, which are available at the following link:

https://docs.google.com/document/d/1R4cny3pWiT6ciEZZou1f5OsGv6jvU94034A6-ABLVGc/edit?usp=sharing

We will hold a referendum on the bylaws and election of officers at our inaugural meeting in New Orleans. Future elections, as stated in the bylaws, will be held via the standard online system SAA provides. Once we ratify the bylaws, they will be passed on to the SAA Council.

Thank you,

Trevor Alvord