A big year for web archiving
2013 has already been a big year for web archiving, with over 250 partner organizations worldwide archiving the web using Archive-It. We are excited by the wide range of use cases and levels of web archiving experience represented in the Archive-It community. As we grow, we are focused on offering our partner organizations a stronger, more robust service.
Archiving the web has come a long way since Archive-It was first deployed in early 2006, but we also realize how much more work needs to be done. The web remains “a mess,” and we are grateful to the Archive-It community, which continues to help us find solutions. A few months ago we published our web archiving white paper to build a framework and best practices around the many use cases we have seen evolve inside the Archive-It community over the last seven years. It is available at http://www.archive-it.org/publications
Heritrix upgrade and 4.8
In May we went live with our 4.8 release, which included more than 16 partner-requested features. Here are a few:
- New Wayback Quality Assurance Tool: Enables partners to find missing embedded URLs while browsing their archived content. A subsequent patch crawl of the missing URLs can dramatically improve the overall look and functionality of the archived site.
- Increased functionality for archiving PDFs: Partners can quickly find, view, and add metadata to new PDFs archived since their last crawl.
- IP Authentication: Option to restrict access to archived websites to particular IP addresses, such as those of a reading room or library.
- Vanity URLs: Customizable URLs for partners’ public home pages on http://www.archive-it.org.
- Import Metadata: Ability to import a spreadsheet to add to or replace metadata for thousands of records at a time.
Archive-It 5.0: The next generation web archiving application
Later this year, the Archive-It team will turn its focus to the next generation of the web application, which we are calling 5.0. We expect development to take 6-8 months, and we will continue to run the existing 4.8 version while we work with partners on reinvigorating the web application. As always, incorporating partners’ feedback and requests into 5.0 will be a priority, to ensure that the service remains relevant and useful to the library and archives communities.
So far, we are focusing on:
- additional capture and display solutions for social media sites and dynamic content
- transitioning full text search from NutchWAX to SOLR to accompany our metadata search (which made the transition to SOLR in 2011)
- enhanced QA and crawl analysis tools
- designing a more intuitive user interface
To learn more about 5.0, see our wiki page: https://webarchive.jira.com/wiki/pages/viewpage.action?pageId=49414243. We welcome additional feature requests and suggestions.
Get In Touch
Please come find the Archive-It team at the following conferences, where we will be presenting or showing a poster:
- Association of Canadian Archivists General Meeting, June 15: http://archivists.ca/content/program
- Digital Preservation 2013, Alexandria, VA, July 23-25: http://www.digitalpreservation.gov/meetings/ndiipp13.html
- Society of American Archivists, New Orleans, LA, August 15-17: http://www2.archivists.org/conference/2013/new-orleans
Our 2013 partner meeting is scheduled for November 12 in Salt Lake City, Utah: http://archiveitmeeting2013.wordpress.com/about/
We appreciate all the valuable feedback we receive from our partners and the library/archive community on ways to make the Archive-It service better.
Follow us on Twitter: @archiveitorg
Find us on Facebook: https://www.facebook.com/ArchiveIt
Our blog http://blog.archive-it.org