Weekly web archiving roundup for the week of March 11, 2015:
- “Introducing the Federal Web Archiving Working Group”, by Michael Neubert.
- “Crawling websites using RSS feeds”, from Kristinn Sigurdsson. A post on the advantages of crawling websites via their RSS feeds. With Kris’ CrawlRSS add-on for Heritrix, you can crawl sites (notably news sites) both more effectively (capturing more good content) and more efficiently (capturing less redundant content), because the RSS feed points the crawler at the most critical content on the site.
- “Using Warcbase to Generate a Link Graph of the Wide Web Scrape”, from Ian Milligan. An interesting post that, in places, ties web content to community archiving.
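
For readers curious about what RSS-driven crawling looks like in practice, here is a minimal Python sketch of the general idea: read a feed, pull out the item links, and queue only the ones not seen before. It is not how CrawlRSS or Heritrix work internally. The sample feed, the `news.example.org` URLs, and the `new_item_links` helper are all made up for illustration, and a real crawler would fetch the feed over HTTP on a schedule.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, inlined so the example runs without network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://news.example.org/</link>
    <item>
      <title>Story A</title>
      <link>http://news.example.org/story-a</link>
    </item>
    <item>
      <title>Story B</title>
      <link>http://news.example.org/story-b</link>
    </item>
  </channel>
</rss>
"""


def new_item_links(feed_xml, seen):
    """Return item links from the feed that are not yet in `seen`.

    `seen` is updated in place, so calling this again with the same
    feed yields nothing new (the "less redundant content" part).
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link and link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


# Pretend Story A was captured during an earlier crawl.
seen_links = {"http://news.example.org/story-a"}
to_crawl = new_item_links(SAMPLE_FEED, seen_links)
print(to_crawl)
```

Only the unseen story ends up in `to_crawl`, which is the efficiency gain the post describes: the feed tells the crawler what is new, so it does not have to re-crawl the whole site to find out.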