Library and Archives Canada Coffee Chat: Video and Slides

On Tuesday, May 10, 2022 the SAA Web Archiving Section was honored to host Library and Archives Canada’s Tom Smyth discussing web and social media preservation.

A video of the coffee chat and presenter slides will be available for a limited time.


Library and Archives Canada Coffee Chat

Presenter Slides

Web and Social Media Preservation Program at the National Library and Archives Canada

Coffee Chat Recap: U.S. Federal Government Web Archiving

By Susan Paterson, SAA Web Archiving Section Vice-Chair and liaison librarian at Koerner Library at the University of British Columbia Vancouver.

The Web Archiving Section hosted its first coffee chat of the year on March 29, 2022. Melissa Wertheimer, Chair of the Web Archiving Section, led a panel discussion on US Federal Government web archiving activities. We were joined by web archiving experts from the National Library of Medicine, Government Publishing Office, the Smithsonian and the Library of Congress. Topics included content curation, collaboration among agencies, staffing, workflow models, successes and challenges, and the impact of COVID-19 on their collecting activities.

The coffee chat recording is available to viewers until summer 2022.

National Library of Medicine – History of Medicine Division, Digital Manuscripts Program

The session began with an informative presentation by Christie Moffatt, Manager of the Digital Manuscripts Program, History of Medicine Division at the National Library of Medicine (NLM). Christie described NLM’s approach to web collecting, an activity which began in 2009 with their Health and Medical Blog Collection. Christie provided an overview of the NLM web archive collections, and the collection development policies that guide their web collecting.

Christie noted that thematic and events-based collections have grown and are a key component of their collection building. The Global Health Events Web Archive collection, one of their largest, began in 2015 with the Ebola outbreak. With the World Health Organization’s announcement of a global health pandemic in January 2020, NLM started work on a COVID-19 pandemic collection. Interestingly, this specific designation of a global health emergency is worked into their collection development policy, and the NLM takes responsibility for building a web archive once this designation is made. An aim of the COVID-19 collection is to ensure that a diversity of perspectives on both the impact of and the response to the pandemic is archived. Tools and communication used for outreach during the pandemic (e.g., TikTok, Twitter), as well as personal narratives of the pandemic, make up part of the collection. It is the Library’s largest collection, with 3.5 FTEs working on the project. The National Library of Medicine’s Archive-It website can be explored here.

Government Publishing Office

Dory Bower, Archives Specialist at the Government Publishing Office (GPO), provided an overview of GPO’s web archiving activities and explained how Title 44 of the U.S. Code provides the mandate for Public Printing and Documents. Specifically, Chapter 19 discusses the Depository Library Program. Federal agencies should be distributing and publishing their documents through the GPO. If a federal agency is not doing so, the Superintendent of Documents should be notified. Unfortunately, with born-digital publications and publishing directly onto websites, this is not happening and material is being missed. GPO joined the Archive-It community in 2012 and since then has been using web archiving to help fill this gap.

The web archive is part of the overall GPO Digital Collection Development Plan. For material to be part of the GPO collection, it must be federally funded and be within the scope of the Federal Depository Library Program (FDLP). GPO is also focusing on collections of interest geared towards women, tribal libraries, and Native American communities, just to name a few. GPO maintains over 213 collections in Archive-It, making up 38.3 TB of data and consisting of over 392 million URLs. You can explore the Federal Depository Library Program Web Archive here.

Smithsonian Institution – Libraries and Archives

Lynda Schmitz Fuhrig, Digital Archivist at the Smithsonian Institution Libraries and Archives, presented next. Lynda provided a fascinating overview of the web archiving work that’s being done at the Smithsonian, which celebrated its 175th anniversary in 2021.

The Libraries and Archives of the Smithsonian is the official repository and record keeper for the Smithsonian. Their responsibilities include sharing, collecting, and preserving Smithsonian activities and output which includes documenting its unique web presence, which launched in 1995. They now have nearly 400 public websites and blogs and a very active social media presence covering Twitter, Instagram, Flickr, and YouTube – just to name a few.

Like the National Library of Medicine, the Smithsonian has documented the impacts and effects of the COVID-19 pandemic on America. For example, beginning in March 2020, more focused and frequent crawls were necessary to document the altered scheduling of the closing and reopening of the Museums and the Zoo. Additionally, the closure of museums created a need for an increased digital presence, and the Smithsonian launched several new websites and initiatives, including Care Package, Vaccines & US and Our Shared Future.

Audience members were particularly interested in their web and social media workflows and tools. Along with Conifer and Browsertrix, the Smithsonian uses Netlytic, a tool developed by the Social Media Lab at Ryerson University, to collect Smithsonian hashtags and accounts.

They are one of the few organizations that download the WARCs from Archive-It. They developed an in-house tool called WARC Retriever, which they hope to release on GitHub later this year. Lynda’s summation was poignant: “The Smithsonian web collections will continue to tell the history and stories of the Smithsonian.” You can explore the Smithsonian Archive-It page here.

Library of Congress – Web Archiving Program

To round out the panel, Meghan Lyon and Lauren Baker, Digital Collection Specialists from the Library of Congress (LOC) Web Archiving Program, provided an overview of the activities at LOC. The Web Archiving Program began in 2000 and is part of the Library’s Digital Content Management Section.

The LOC web archives consist of 3 PB of data organized into 174 collections, 75 of which are active collections. Like many of the other speakers, the Web Archiving Team collaborates frequently on web archive collections, relying on the contributions of collaborators around the Library. The Collection Development Office helps guide collection initiatives, and various committees review subject proposals, select content to archive, and determine collection focus. LOC comprehensively collects content from Legislative Branch Agencies and U.S. House and Senate offices and committees. They collect content about U.S. national elections and manage other events-based and thematic collections.

Meghan and Lauren addressed the issue of permission and web archiving. Their permissions policy, which is based on copyright law, is determined by the Office of the General Counsel. Permission requests must be sent to site owners for anything selected for web archiving. There are two permission requests: a permission to crawl, and a permission to display based on the country of publication and the category of the entity. You can explore the LOC Web Archiving Program website here.

The panel closed out the session by discussing how they became interested in web archiving and how their careers started in the field. Their initial experiences ranged from practicums to learning on the job. The remainder of the conversation also covered trends and the future of web archiving tools – including what improvements people hope for and what better tools for harvesting and user awareness might look like. The session was well attended with 181 registrants and over 80 attendees. Thank you to everyone who presented and who attended for such an engaging hour. Stay tuned for our next coffee chat, which will be in May!

Introducing the DocNow App

This week’s post was written by Zakiya Collier, Community Manager at Documenting the Now.

This week the Documenting the Now project announces the release of DocNow, an application for appraising, collecting, and gathering consent for Twitter content. DocNow reimagines the relationship between content creators and social media analysts (archivists and researchers) by addressing two of the most challenging issues of social media archiving practice—the issues of consent and appraisal.

The Documenting the Now Project is happy to release version 1.0 of our open-source tool freely for anyone to use. Read all about the app, what it does, and how to engage with the DocNow project team for support and providing feedback.

Over the last seven years, Documenting the Now has helped to foster an environment where a broader sector of cultural memory workers can learn about web archiving tools and practices and can become involved with web archiving networks. This has largely been achieved by practicing transparency and inviting people who have traditionally been left out of established web content archiving networks into the project to participate, namely students, activists, and archivists who represent marginalized communities and who work in community-centered organizations, HBCUs, public libraries, community-based archives, and tribal libraries and archives.

Documenting the Now was a response to the need among scholars, activists, archivists, and other memory workers for new tools that would provide easily-accessible and user-friendly means to collect, visualize, analyze, and preserve web and social media content to better document public events. In addition, it aimed to respond to questions and concerns related to ethics, safety, intellectual property, and access issues for the collection, preservation, and dissemination of Twitter data in particular.

Documenting the Now has also developed community-centered web and social media archiving tools that both prioritize care for content creators and robust functionality for users:

  • Twarc – a command line tool and Python library for collecting tweet data from Twitter’s official API
  • Hydrator – a desktop application for turning Tweet ID datasets back into tweet data to use in your research
  • Social Humans – a label system to specify the terms of consent for social media content
  • The Catalog – a community-sourced clearinghouse to access and share tweet identifier datasets

In continuing to support and develop tools that embody ethical practices for social media archiving, the DocNow app joins this suite of tools. DocNow is an application for appraising, collecting, and gathering consent for Twitter content and includes several new features including:

  • Trends tab to view trending topics across the globe in real time
  • Explore tab to view content by users, media, URLs, and related hashtags all on one screen
  • Live testing and refining of collecting parameters on recent tweets
  • Tweets per hour calculator to easily identify Twitter bot accounts
  • Search and Collect tweets back in time via Search API and forwards with Stream API
  • Activate toggle to start collecting tweets and send a notification tweet to encourage transparency and communication in Twitter data collection
  • Collections tab to share information about your collection with the public
  • “Find Me” and Insights Overview features to specify and gather consent using Social Humans labels
  • Download Tweet ID archive for sharing following Twitter’s terms of service
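
The tweets-per-hour idea in the list above comes down to simple arithmetic: count an account's tweets and divide by the hours it was active. The sketch below is only an illustration of that calculation, not DocNow's own code. The function name, the input shape (pairs of user name and ISO 8601 timestamp), and the sample accounts are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def tweets_per_hour(tweets):
    """Return {user: average tweets per hour} over each user's active span.

    `tweets` is an iterable of (user, ISO 8601 timestamp) pairs. This input
    shape is an assumption for the example, not a DocNow data format.
    """
    times = defaultdict(list)
    for user, created_at in tweets:
        times[user].append(datetime.fromisoformat(created_at))

    rates = {}
    for user, stamps in times.items():
        span_hours = (max(stamps) - min(stamps)).total_seconds() / 3600
        # Treat spans shorter than one hour as one hour so that a short
        # burst is not divided by a value close to zero.
        rates[user] = len(stamps) / max(span_hours, 1.0)
    return rates


# Hypothetical sample data: one occasional poster, one rapid poster.
sample = [
    ("steady_human", "2022-03-01T09:00:00"),
    ("steady_human", "2022-03-01T13:00:00"),
    ("busy_account", "2022-03-01T09:00:00"),
    ("busy_account", "2022-03-01T09:10:00"),
    ("busy_account", "2022-03-01T09:20:00"),
    ("busy_account", "2022-03-01T09:30:00"),
]
rates = tweets_per_hour(sample)
```

A high rate is only a hint that an account may be automated; deciding where to draw the line is an appraisal choice, not something the arithmetic settles.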

The DocNow app also works in concert with other Documenting the Now tools, creating a four-step social media archiving journey for users:

Step 1: Collect content with the DocNow App by activating a search. Set collection limits and explore insights as your collection grows.
Step 2: Download your archive from the DocNow App, which includes a Tweet Viewer, Tweet IDs, and media files.
Step 3: Hydrate your Tweet IDs from the archive’s tweets.csv file back into full tweets using DocNow’s Hydrator desktop application.
Step 4: Describe your collection and share your Tweet IDs with other researchers by adding them to the DocNow Catalog.
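
Between downloading an archive and hydrating or sharing it, it can help to pull the tweet IDs out of the CSV into a plain list with one ID per line. The sketch below shows one way to do that with Python's standard library. It makes assumptions that should be checked against a real archive: the column name `id` is a guess, and an in-memory sample stands in for tweets.csv.

```python
import csv
import io


def extract_tweet_ids(csv_file, id_column="id"):
    """Collect unique tweet IDs, in file order, from an open CSV file.

    The column name is an assumption; check the header row of your own
    tweets.csv and pass the right name if it differs.
    """
    seen = set()
    ids = []
    for row in csv.DictReader(csv_file):
        tweet_id = (row.get(id_column) or "").strip()
        # Keep IDs as strings: they are long integers that some
        # spreadsheet tools silently round.
        if tweet_id.isdigit() and tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


# In-memory stand-in for tweets.csv, with a duplicate and a bad row.
sample_csv = io.StringIO(
    "id,created_at\n"
    "1501234567890123456,2022-03-08\n"
    "1501234567890123456,2022-03-08\n"
    "not-an-id,2022-03-08\n"
    "1509876543210987654,2022-03-09\n"
)
tweet_ids = extract_tweet_ids(sample_csv)

# One ID per line, ready to be written to a text file.
id_list_text = "\n".join(tweet_ids) + "\n"
```

With a real file, the same function can be called on `open("tweets.csv", newline="", encoding="utf-8")`.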

Ways to Use DocNow
There are 3 different ways to use DocNow including joining the community instance, running DocNow locally on a computer, and installing an instance of DocNow in the cloud. The Community Instance is a great way to get familiar with the tool before committing to running an instance but those with development skills may want to administer their own instance of DocNow locally or in the cloud.

  1. Join Documenting the Now’s hosted community instance
  2. Run DocNow locally on your machine
  3. Install your own instance in the cloud

For help with installation and getting started, the Documenting the Now team will host community conversations. Dates will be announced soon! More information about the DocNow App can be found here.

Documenting the Now is seeking community input on all of our features as we continue to develop DocNow. Please join our Slack channel by going to our website or email us at

US Government Web Archive Coffee Chat: Video and Slides

On Tuesday, March 29, 2022, the SAA Web Archiving Section was honored to host Lauren Baker and Meghan Lyon, Library of Congress; Dory Bower, Government Publishing Office; Lynda Schmitz Fuhrig, Smithsonian Institution Archives; and Christie Moffatt, National Library of Medicine at the National Institutes of Health. Section chair Melissa Wertheimer moderated the panel.

A video of the coffee chat and presenter slides will be available for a limited time.


US Government Web Archive Coffee Chat

Presenter Slides

Web Collecting at the National Library of Medicine, presented by Christie Moffatt

Web and Social Media Archiving, presented by Lynda Schmitz Fuhrig

Library of Congress Web Archiving, presented by Meghan Lyon & Lauren Baker

Coffee Chat Recap: Accessibility and Web Archiving

By: Kiera Sullivan and Lydia Tang

The Web Archiving and Accessibility & Disability sections co-hosted an engaging coffee chat about accessibility and web archiving on May 19, 2021. Dr. Lydia Tang, Immediate Past Chair of A&DS, gave a talk on accessibility tools, accessible web design, and web archiving. Big thanks to Dr. Tang for her presentation (view her slides!), and also to everyone who attended, asked questions, and shared their thoughts.

Dr. Tang’s presentation highlighted the idea of designing accessibility from the very beginning. She pointed out that the way a website is designed will affect how a screen reader navigates the content. Consider the web content that we collect – or may one day collect – in our web archiving activities. How accessible are archived websites for screen readers? They are only as accessible as they were created – which has been an evolving concept for developers.  Can or should we attempt to remediate inaccessible archived websites? Further, how accessible are the platforms that we use to keep our web archives, and how about our own websites?

The good news is this is one wheel we do not need to reinvent. The Web Accessibility Initiative (WAI) of the World Wide Web Consortium (W3C) has published a series of web accessibility guidelines. The Web Content Accessibility Guidelines (WCAG) are an industry standard that is continually being updated, and they are a tremendous resource for making web content accessible to all users. As archivists we cultivate expertise in preserving material and making it accessible, whatever the format. If web content is one of the formats we work with, perhaps it is a professional duty to learn to recognize the components of the websites with which we engage and which we collect. These guidelines are an excellent place to start. Additionally, there are a number of groups that work on building resources, training, and awareness of accessibility best practices, such as the DLF Digital Accessibility Working Group and the Society of American Archivists’ Accessibility & Disability Section.

Dr. Tang shared her go-to tool for evaluating web accessibility: the WAVE tool, which she uses regularly as a browser extension for Chrome. Using WAVE, she demonstrated how easy it is to analyze a website for automatically detected errors such as missing alt-text, contrast issues, and other under-the-hood issues that might not be apparent but could greatly impact the website experience for people using screen readers and other assistive technology. It certainly isn’t a catch-all but is at least a big first step. She also shared other tools and resources, including WebAIM’s Contrast Checker, and a variety of assistive tech programs to try out – see her slides for a list of these!
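
To make the idea of a contrast issue concrete, here is a minimal sketch of the contrast ratio calculation defined in WCAG 2.x, which is the kind of check a contrast checker performs. It is not the code behind WAVE or WebAIM's tool; the function names are made up for this example. The formula and the 4.5:1 threshold for normal-size text at level AA come from the guidelines.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as (R, G, B) in 0-255."""
    def linearize(channel):
        c = channel / 255
        # Piecewise sRGB-to-linear conversion as written in WCAG 2.x.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors, from 1.0 (same) up to 21.0."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground, background):
    """True if the pair meets the 4.5:1 minimum for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5


WHITE = (255, 255, 255)
black_on_white = contrast_ratio((0, 0, 0), WHITE)
dark_gray_ok = passes_aa_normal_text((118, 118, 118), WHITE)
lighter_gray_ok = passes_aa_normal_text((119, 119, 119), WHITE)
```

The two grays differ by a single step per channel, yet one passes and the other does not, which is why contrast problems are hard to judge by eye.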

One powerful takeaway from the conversation was the idea that we need to encourage and facilitate a culture of accessibility in our workplaces. Accessibility should be a regular component of our usual discussions about workflows and projects. It means thinking about the tools and templates that we use, becoming familiar with their accessibility features, and improving our own use of them. Coffee chat attendees shared their ideas for how we can achieve this, including playing around with dark mode, high contrast mode, and color schemes on our websites. The discussion also included looking at a WARC file together and questioning whether WARCs preserve alt-text and other accessibility features of a website.  We wrapped up the hour by considering whether accessibility statements – incorporated in the Conditions Governing Access note or a similar note – should be a required element for descriptive standards.
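
On the question of alt-text in WARCs: a WARC response record stores the HTTP response as it was captured, so alt attributes that were present in the original HTML travel along in the archived payload. The sketch below shows how one might audit a page's HTML for images with and without alt text using Python's standard library. Reading the HTML out of a WARC file is left out here (that step would need a WARC reader), so a short hypothetical HTML string stands in for an archived page, and the class name is invented for the example.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Sort <img> tags into those with an alt attribute and those without."""

    def __init__(self):
        super().__init__()
        self.with_alt = []      # (src, alt text) pairs, alt may be empty
        self.missing_alt = []   # src values of images with no alt attribute

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        if "alt" in attributes:
            # A bare `alt` with no value parses as None; treat it as empty.
            self.with_alt.append((src, attributes["alt"] or ""))
        else:
            self.missing_alt.append(src)


# Hypothetical HTML standing in for the payload of an archived page.
archived_html = (
    '<p>Welcome</p>'
    '<img src="logo.png" alt="Library logo">'
    '<img src="spacer.gif" alt="">'
    '<img src="chart.png">'
)
audit = AltTextAudit()
audit.feed(archived_html)
```

An empty alt attribute is kept separate from a missing one on purpose: empty alt text is a legitimate way to mark a decorative image, while a missing attribute is what automated checkers flag as an error.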

Did you attend this coffee chat? Have any afterthoughts, questions, or key takeaways that you would like to share? Please feel free to drop your thoughts in the comment section of this post. Do you belong to an SAA section that might be interested in co-hosting a coffee chat with the Web Archiving section? Let us know!

What coffee chat conversations would YOU like to see the Web Archiving section take part in? Leave your suggestions in the comments or reach out to any of the steering committee members with your ideas!

What do you get when you hand a Master’s student a web archiving program?: A Two-Part Series

The following is a guest post by Grace Moran, Graduate Web Archiving Assistant – Library Administration, University of Illinois Library

Websites are fleeting; finding a way to preserve them and ensure accessibility for end-users is one of the greatest challenges facing record-keeping institutions today. The window for capturing a website is minute, with many pages disappearing within 2-4 years of their creation (See Ben-David, “2014 Not Found,” in the further reading list). This two-part guest blog series documents my efforts over this academic year to develop, standardize, and envision a future for the University of Illinois web archives collections through our subscription to Archive-It. This first post will explore the deeply related issues of metadata and access for web archives, as well as how they influence my proposal for the future of our web collections.

About me. I am a graduate student getting my MS in Library & Information Sciences at the University of Illinois, graduating this May. I have been working for my supervisor, Dr. Chris Prom (Associate Dean, Office of Digital Strategies) since the Spring semester of my senior year of my bachelor’s. Last year, he gave me the option of transitioning over from digital special collections and workflows and focusing on our young web-archiving program. This opportunity has taught me so much, and I want to share that with others. 

Now, for a bit of background on the program. The University of Illinois has been capturing web pages through Archive-It since 2015, but the program itself has been inconsistent in its stewardship. This year, my charge is the evaluation of the status of the University of Illinois subscription to Archive-It and the creation of a plan for its continued success. The culmination of the role will be a final report outlining my accomplishments for the year, evaluating programmatic needs for a web-archiving program, conversations I have had with stakeholders in the program, and recommendations for the future of web archiving at the University of Illinois. The challenges facing a burgeoning web-archiving program are numerous and varied, ranging from policy to accessibility. Preferably, by the end of the year, I will have identified opportunities for growth and laid out a plan for the future, ideally making a case for a dedicated web-archiving position when the budget allows. With 5 TB of data archived since 2015, the care and keeping of these collections is of utmost importance; the time invested by the University of Illinois Library must not be wasted.

My experience thus far has shown me that managing a web archive is no small feat. (Is this an obvious statement? Yes, but that doesn’t make it any less true.) Two of the most urgent issues I have seen are metadata and access; it is no secret that these two go hand-in-hand. The debate around metadata is not new. The OCLC Research Library Partnership Metadata Working Group identified this same problem in their 2017 report “Developing Web Archiving Metadata Best Practices to Meet User Needs” (in the further reading list below). As it stands now, we don’t have a standard for documenting archived websites. Standards such as DACS and MARC don’t fully account for the unique nature of websites, so some institutions have adopted a hybrid approach, using both archival and bibliographic metadata. The question remains whether or not a hybrid approach is sufficient. Is it necessary to create a new standard? Would this over-complicate things?

On the Archive-It platform, there are three possible levels of metadata: 

  1. Collection-level
  2. Seed-level
  3. Document-level

Our collections at the University of Illinois tend to have collection-level and seed-level metadata. This is due to a couple of factors. First, the labor of editing seed metadata takes enough hours without also creating document metadata; law of diminishing returns. Second, what is/isn’t a document is a bit difficult to understand on the Archive-It platform, making it unclear what benefit there is in creating document-level metadata. For now, the goal is to make collection- and seed-level metadata consistent across collections and make it standard practice to describe websites when they are crawled.
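
One way to picture the consistency goal is a small check that lists which seeds are missing the fields a local policy requires. The sketch below is hypothetical throughout: the field names, the record layout, and the example URLs are assumptions made for illustration, not Archive-It's export format or our actual policy.

```python
# Hypothetical local policy: fields every seed record should carry.
REQUIRED_SEED_FIELDS = ("title", "description", "creator")


def find_metadata_gaps(seeds, required=REQUIRED_SEED_FIELDS):
    """Return {seed URL: [missing field names]} for incomplete seeds."""
    gaps = {}
    for seed in seeds:
        metadata = seed.get("metadata", {})
        missing = [
            field for field in required
            if not str(metadata.get(field, "")).strip()
        ]
        if missing:
            gaps[seed["url"]] = missing
    return gaps


# Made-up seed records for illustration.
seeds = [
    {
        "url": "https://library.example.edu/",
        "metadata": {
            "title": "Library home page",
            "description": "Main site of the library.",
            "creator": "University Library",
        },
    },
    {
        "url": "https://news.example.edu/",
        "metadata": {"title": "Campus news", "description": "   "},
    },
    {"url": "https://events.example.edu/"},
]
gaps = find_metadata_gaps(seeds)
```

A report like this could be run whenever new seeds are added, so that describing a website becomes part of the crawl routine instead of a cleanup project later.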

Deeply tied into the issue of metadata is that of access; how do we make sure that content reaches end-users? If a tree falls in the forest… I’m not going to finish that; too cliché. You get the idea. My point is this: taking the time to preserve digital materials is nearly meaningless if no one gets to enjoy the fruits of the labor. 

What does access look like for University of Illinois web collections? Right now, an end-user would have to navigate to the public-facing Archive-It website and either search “University of Illinois” or stumble across one of our collections because of a keyword search. We are not in any way alone in this; institutions all over are struggling with the idea of making collections searchable.

So what’s the solution? How do we implement an access system without relying on end-users to know we participate in web archiving and have collections on a website referenced nowhere in our systems? In my final report to the library, I am going to recommend both short- and long-term solutions. Short-term, Archive-It allows subscribers to add a search bar to their discovery systems allowing them to search through their Archive-It collections. This simple, elegant, and quick solution would make a significant difference for researchers. A long-term, more involved solution would be to index all of our pages locally and work to allow full-text search of collections. In the era of Google, our users are used to full-text search; discovery systems should fit that behavior.
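
To give a sense of what indexing pages locally for full-text search means, here is a toy inverted index: a lookup table from each word to the pages that contain it. This is a sketch of the general technique only. The page texts and URLs are invented, and a production system would more likely rely on a dedicated search engine than on code like this.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs of pages that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Invented page texts standing in for text extracted from archived pages.
pages = {
    "https://example.edu/covid": "Campus COVID-19 response and testing sites",
    "https://example.edu/news": "Campus news: library hours and testing updates",
    "https://example.edu/about": "About the university library",
}
index = build_index(pages)
hits = search(index, "campus testing")
```

Building the index once and querying it many times is what makes full-text search fast; the cost moves to extracting text from the archived pages and keeping the index current after each crawl.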

If you are working with web archives, I hope that something in this post resonated with you. I hope that institutions will begin to collaborate more and find standard solutions so we can truly harness this technology to our advantage.

In my next blog post a month from now, I’ll talk about the more human side of things: policy and personnel. I also want to share a bit about our efforts to document COVID-19 at the University of Illinois. Stay safe, stay healthy, and stay tuned!

Further Reading:

  1. Belovari, S. (2017). Historians and web archives. Archivaria, 83(1), 59-79.
  2. Ben-David, A. (2019). 2014 not found: a cross-platform approach to retrospective web archiving. Internet Histories, 3(3-4), 316-342.
  3. Bragg, M., & Hanna, K. (2013). The web archiving life cycle model. Archive-It. https://archive-it.org/static/files/archiveit_life_cycle_model.pdf (visited 14.2.14).
  4. Brunelle, J. F., Kelly, M., Weigle, M. C., & Nelson, M. L. (2016). The impact of JavaScript on archivability. International Journal on Digital Libraries, 17(2), 95-117.
  5. Condill, K. (2017). The Online Media Environment of the North Caucasus: Issues of Preservation and Accessibility in a Zone of Political and Ideological Conflict. Preservation, Digital Technology & Culture, 45(4), 166.
  6. Costa, M., Gomes, D., & Silva, M. J. (2017). The evolution of web archiving. International Journal on Digital Libraries, 18(3), 191-205.
  7. Dooley, J. M., & Bowers, K. (2018). Descriptive metadata for web archiving: Recommendations of the OCLC research library partnership web archiving metadata working group. OCLC Research.
  8. Dooley, J. M., Farrell, K. S., Kim, T., & Venlet, J. (2018). Descriptive Metadata for Web Archiving: Literature Review of User Needs. OCLC Research.
  9. Dooley, J. M., Farrell, K. S., Kim, T., & Venlet, J. (2017). Developing web archiving metadata best practices to meet user needs. Journal of Western Archives, 8(2), 5.
  10. Dooley, J., & Samouelian, M. (2018). Descriptive Metadata for Web Archiving: Review of Harvesting Tools. OCLC Research.
  11. Farag, M. M., Lee, S., & Fox, E. A. (2018). Focused crawler for events. International Journal on Digital Libraries, 19(1), 3-19.
  12. Graham, P. M. (2017). Guest Editorial: Reflections on the Ethics of Web Archiving. Journal of Archival Organization, 14(3-4): 103-110.
  13. Grotke, A. (2011). Web Archiving at the Library of Congress. Computers in Libraries, 31(10), 15-19.
  14. Jones, Gina M. and Neubert, Michael (2017) “Using RSS to Improve Web Harvest Results for News Web Sites,” Journal of Western Archives: Vol. 8: Iss. 2, Article 3.
  15. Littman, J., Chudnov, D., Kerchner, D., Peterson, C., Tan, Y., Trent, R., … & Wrubel, L. (2018). API-based social media collecting as a form of web archiving. International Journal on Digital Libraries, 19(1), 21-38.
  16. Maemura, E., Worby, N., Milligan, I., & Becker, C. (2018). If these crawls could talk: Studying and documenting web archives provenance. Journal of the Association for Information Science and Technology, 69(10), 1223-1233.
  17. Masanès, J. (2005). Web archiving methods and approaches: A comparative study. Library trends, 54(1), 72-90.
  18. Pennock, Maureen (2013, March). Web-Archiving: DPC Technology Watch Report. Digital Preservation Coalition.
  19. Summers, E. (2020). Appraisal Talk in Web Archives. Archivaria, 89(1), 70-102.
  20. Thomson, Sara Day (2016, February). Preserving Social Media: DPC Technology Watch Report. Digital Preservation Coalition. 
  21. Weber, M. (2017). The tumultuous history of news on the web. In Brügger N. & Schroeder R. (Eds.), The Web as History: Using Web Archives to Understand the Past and the Present (pp. 83-100). London: UCL Press. doi:10.2307/j.ctt1mtz55k.10
  22. Webster, Peter (2020). How Researchers use the Archived Web: DPC Technology Watch Guidance Note. Digital Preservation Coalition.
  23. Wiedeman, Gregory (2019) “Describing Web Archives: A Computer-Assisted Approach,” Journal of Contemporary Archival Studies: Vol. 6, Article 31.

Web Archiving Roundup: October 2019

The Internet Archive and Center for Open Science announce collaboration to preserve open science data, funded by the IMLS National Leadership Grants for Libraries program.

On August 12, Harvard Library Preservation Services hosted a regional meetup of Archive-It subscribers. You can learn about the meetup on the Harvard Library Communications blog.

The first workshop of the Continuing Education to Advance Web Archiving (CEDWARC) is taking place at George Washington University on October 28.  It is the first in a series of workshops to train library and archive professionals to use web archiving tools to answer research questions and enable new web archiving services based on these tools.

The Archives Unleashed Project has put out a call for participation for an Archives Unleashed Datathon at Columbia University Libraries on March 26-27, 2020. The project team invites archivists, researchers, librarians, computer scientists, and web archiving enthusiasts to join us over the course of two days to collaboratively work with web collections and explore cutting-edge research tools through hands-on experience. Proposals are due November 1, 2019.

Presentations from the Web Archiving & Data Services international partners meeting on September 20, 2019 at iPRES 2019 can be viewed here.

Call for papers opens on October 14, 2019 for the International Internet Preservation Consortium (IIPC) General Assembly and the Web Archiving Conference (WAC) in Montréal, from May 11-13, 2020.

An article on web archiving, “Please, My Digital Archive. It’s Very Sick.” by Tanner Howard, was published by Lapham’s Quarterly on September 4, 2019.

Webrecorder had a bunch of updates! First, Webrecorder released Autopilot, a new feature that can perform actions on the current web page loaded in Webrecorder, similar to a human user: clicking buttons, scrolling down, expanding sections, and so on. Webrecorder also released a Desktop app. The app is a fully self-contained version of the online service built for a desktop environment with minimal modifications: a single-user Webrecorder that can run on a computer without requiring a connection to any centralized service. Users can now have the full web archiving capabilities of Webrecorder on their own machines without having to install Docker or use the command-line.

Finally, WABAC (Web Archive Browsing Advanced Client), a project by Webrecorder, was released. More information on the WABAC and client-side replay technology can be found in a blog post by Ilya Kreymer on the Webrecorder blog and DSHR’s blog.

Upcoming US meetups: DLF Forum & NDSA Digital Preservation, October 14-17 in Tampa, FL and Web Archiving Texas on November 6 at Baylor University.

Do you have #webarchiving news?  Tweet at us @WebArch_RT and we will be sure to feature it in next month’s roundup!


Web Archiving Roundup: April, 2019

The Library of Congress Digital Collections Development Coordinator position closes soon (May 1st)!

Archive-It is hosting a training webinar on Web Archiving Systems API (WASAPI). Zoom in on Wednesday, May 29th at 11am Pacific, 2pm Eastern.

Registration for the Society of American Archivists Annual Meeting is now open.

The ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL) has published a list of accepted workshops and tutorials.

You can now watch Michael L. Nelson’s keynote “Web Archives at the Nexus of Good Fakes and Flawed Originals” at CNI’s April meeting.

Rhizome presents “The Art Happens Here: Net Art’s Archival Poetics,” an exhibition using web archives as art. If you happen to be in New York, you can see this exhibit at the New Museum until May 26.

The National Library of China will archive 200 billion Weibo posts in a project to preserve China’s digital heritage.

Katie Cuyler, Librarian at University of Alberta Libraries, on archiving Alberta’s climate change data in danger of disappearing with the change of government. You can access the collection here.

Jason Scott, from TEXTFILES.COM, recently published The MySpace Dragon Hoard, a collection of 450,000 mp3s from MySpace between 2008-2010. These songs were gathered prior to the recent MySpace data loss.

Check out Karl Blumenthal’s trick to crawl special emojis from Twitter.

Some interesting food for thought from Stephen Dowling’s BBC article.

Weekly web archiving roundup: January 22, 2014

Here’s the weekly web archiving roundup for January 22, 2014!  These news items will be posted each Wednesday, and will be pertinent to anyone with an interest in web archiving, whether it’s something you do every day or just something you find intriguing.

  • “So who owns the Internet?” by Christina Pazzanese, Harvard Gazette: “A clash over who should decide which information flows through Internet networks — and at what price — is now before a Washington, D.C., federal appeals court in a landmark case that could grant Internet service providers (ISPs) the unfettered power to turn the information superhighway into a private toll road.”


ArchiveTeam Warrior

Check it out because 1) it’s pretty cool and 2) you can use it to help ArchiveTeam archiving efforts, currently including the archiving of Yahoo! Messages.

“The ArchiveTeam Warrior is a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive — and it’s really easy to do!

The warrior is a virtual machine, so there is no risk to your computer. The warrior will only use your bandwidth and some of your disk space.

The warrior runs on Windows, OS X and Linux. You’ll need VirtualBox (recommended), VMware or a similar program to run the virtual machine.”