Call for Web Archiving Section Steering Committee Members (deadline extended)

The Web Archiving Section is excited to accept nominations for the following Steering Committee positions for the 2022-2023 year!

  • Vice-Chair/Chair-Elect
  • Secretary
  • Communications Manager
  • Education Coordinator
  • Student Member

If you or someone you know would like to run for a position on the Web Archiving Section Steering Committee, please fill out this form by August 2022 with the following:

  • Candidate Name
  • Job Title and Institution, if applicable
  • Bio and Candidate Statement (1-2 paragraphs)
  • Title of Steering Committee position sought

Position descriptions can be found below. Please keep in mind that membership in the Web Archiving Section is required in order to participate in elections through candidacy or casting a ballot. You may only run for one position. To learn more about the Web Archiving Section, check out the Web Archiving Section microsite and the Web Archiving Section blog.

We look forward to hearing from you!

Position Descriptions:

Vice-Chair/Chair-Elect: The Vice-Chair serves for two years, the first year as Chair-Elect and the second year as Chair.

  • Supports duties and responsibilities of the Chair as assigned.
  • Operates as acting Chair in the absence of the Chair.
  • Serves as member of the Steering Committee.
  • Fulfills all responsibilities specified in Section IX: Sections of the SAA Governance Manual.

Secretary (two-year term)

  • In consultation with the Chair and Vice-Chair, establishes all Steering Committee meetings.
  • Calls for and distributes agenda items for Steering Committee meetings. 
  • Records meeting minutes and distributes them to the Steering Committee. 
  • Serves as member of the Steering Committee.

Education Coordinator (two-year term)

  • Serves as the section’s liaison to the SAA Education Committee.
  • Arranges informal online meet-ups for members.
  • Prepares educational experiences, such as guest speakers.
  • Serves as member of the Steering Committee.

Communications Manager (one-year term)

  • Maintains and updates the section’s microsite, blog, and Twitter feed.
  • Keeps section’s email list recipients informed on section news, events, and regular activities.
  • Serves as a member of the Steering Committee.

Student Member (one-year term)

  • Serves as a liaison to SAA student chapters and groups. 
  • Serves as a member of the Steering Committee. 
  • Must be an actively enrolled student and student member of SAA at the time of election.

Coffee Chat Recap with Library and Archives Canada’s Tom Smyth

By Susan Paterson, SAA Web Archiving Section Vice-Chair and liaison librarian at Koerner Library at the University of British Columbia Vancouver.

The Web Archiving Section hosted another successful Coffee Chat, which featured Tom Smyth, Program Manager of the Web and Social Media Preservation Program within the Digital Preservation Division at Library and Archives Canada (LAC). Tom discussed the evolution of the program, which began in December 2005. The collection exceeds 100 TB of content and includes over 2.64 billion assets from wide-ranging collections such as Government of Canada websites and thematic and rapid response collections.

At the start of LAC’s National Web Archive Program, collection activities focused on the Government of Canada domain, provincial and federal elections, Canada’s experience in the Summer and Winter Olympics and Paralympics, and commemorative events such as state funerals and the War of 1812. In 2013, the program expanded to include thematic collections and events-based collections such as COVID-19, the 2022 trucking convoy protest in Ottawa, and current Canadian perspectives on the war in Ukraine. The program also conducts rescue and preservation harvesting to ensure that federal government websites are preserved before they are decommissioned or their content is moved to another website.

So who does all this work at LAC? The web archiving team consists of digital librarians and archivists who bring their own unique perspectives to web archival curation. On the technical side, the program’s senior crawl engineers are a critical component in tackling complex quality control issues. Tom and his team ensure that data curation aligns with user requirements and includes a digital humanities perspective with the purpose of building datasets for data historians, researchers and scholars. Tom used the following example to explain how the digital librarians and digital archivists approach collection curation: “We ask the question, ‘Twenty years from now, when a digital historian sits down to write about the history of COVID-19 and its impact on Canada, what kind of data and sources do they wish they would have?’ This influences how we select for curation.” COVID-19 has demonstrated that web archiving is a key resource for documenting history, and the pandemic has influenced how they collect materials.

LAC’s COVID-19 web archive consists of over 2000 resources, 16 TB of data and over 478 million objects, including 34 newspapers in both official languages representing various political and regional perspectives from coast to coast – and it continues to grow. The collection includes social media and over 4 million tweets concentrating on COVID-19 dialogue and its impact on Canadians.

Tom discussed the “black hole of quality control” (QC) and described the importance of using a methodological approach when conducting QC. He explained the importance of both a framework and a balance, as QC can be a never-ending project. For more on LAC’s approach to quality control, you can view one of the 2021 IIPC presentations from Tom Smyth and Patricia Klambauer, “The Black Hole of Quality Control: Toward a Framework for Managing QC Effort to Ensure Value.”

The importance of web archive finding aids was a key thread throughout the talk and later in the discussion. As Tom explained, finding aids enable researchers to see at a glance whether the collection or datasets are helpful in addressing their research questions. Ideal finding aids would include practical information such as a master seedlist, how many seeds for each theme, resource distribution, type of resource, language, QC status, and metadata specifications. 
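
As a rough illustration of the kind of at-a-glance summary such a finding aid might offer, the following minimal Python sketch counts seeds per theme and language and tallies QC status. The seed records, field names (`theme`, `language`, `qc_done`), and URLs are all hypothetical and are not drawn from LAC's actual seed lists or metadata specifications.

```python
from collections import Counter

# Hypothetical seed records; a real master seed list would have its own fields.
seeds = [
    {"url": "https://example.org/news", "theme": "COVID-19", "language": "en", "qc_done": True},
    {"url": "https://example.org/nouvelles", "theme": "COVID-19", "language": "fr", "qc_done": False},
    {"url": "https://example.org/election", "theme": "Elections", "language": "en", "qc_done": True},
]


def summarize_seeds(seed_records):
    """Build a small summary a finding aid could present to researchers."""
    return {
        "total_seeds": len(seed_records),
        "seeds_per_theme": dict(Counter(s["theme"] for s in seed_records)),
        "seeds_per_language": dict(Counter(s["language"] for s in seed_records)),
        "qc_complete": sum(1 for s in seed_records if s["qc_done"]),
    }


summary = summarize_seeds(seeds)
print(summary)
```

A summary like this is only a starting point; resource distribution and metadata specifications would need their own, collection-specific descriptions.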

The Prime Minister of Canada’s website (www.pm.gc.ca) was used as an example to describe why web archiving is so important. Back in 2006, LAC had only five days to ensure that all of outgoing Prime Minister Paul Martin’s website was captured before the site was replaced with the website of newly elected Prime Minister Stephen Harper. Tom underlined the importance of being proactive and the need to have good working relationships with partners and government departments such as the Privy Council Office, which supports the Office of the Prime Minister. 

The Truth and Reconciliation Commission Web Archive (TRC) is another example of collaboration between LAC and other institutions. LAC worked jointly with the University of Winnipeg Library, the University of Manitoba Libraries and the National Centre for Truth and Reconciliation (NCTR) to develop this collection. The relationship with Indigenous Peoples is a concentration for LAC and a constant subject of collection. All of these collections will be publicly findable in the Government of Canada Web Archive, which is planned to be released in the fall of 2022. 

Tom discussed the ethical and legal considerations of web archiving. The Library and Archives of Canada Act (S.C. 2004, c. 11, s.8(2)) empowers LAC to collect pertinent web content. Steps are taken to ensure the copyright owner is informed that their site will be archived and takedown requests are respected. Even with LAC’s extensive web archiving experience and resources, some websites just can’t be crawled due to their structure and the need for human input. 

LAC is a client of the Internet Archive and uses the Archive-It software. All of the WARCs are transferred back to LAC for local digital preservation via LTO tape.

Tom closed his talk on a hopeful note. Our efforts today will help us to reduce the likelihood of what Vint Cerf has termed a “digital black hole” and “the forgotten century.” The global community of web archivists is working in concert to ensure that nationally relevant materials are captured, preserved and made available for future generations.

Virtual Joint Section Meeting: Web Archiving & Performing Arts

SAA’s Web Archiving Section and Performing Arts Section will hold a virtual joint section meeting on Thursday, August 11, 2022 from 1-2:30 pm CT.

The steering committees invite colleagues to present case studies (up to 10 min) about the intersections between special collections, web archives, and born-digital records in all areas of the performing arts. Possible topics could include:

  • Projects and workflows for archiving web presences and born-digital records of performing arts organizations, artists, and companies, including social media
  • How and why researchers use and interact with performing arts web archives
  • Data computation, analysis, and visualization projects based on performing arts web archives and born-digital records
  • Collection development policy initiatives to incorporate web archives into performing arts special collections
  • Including performing arts web archives in finding aids
  • Reference and engagement work around performing arts web archives and born-digital records
  • Web archiving and post-custodial performing arts archives
  • Performing arts web archives design for mixed audiences/diverse patrons

For consideration, please send a presentation title, brief description (100 words or less), and name(s) of presenter(s) to Performing Arts Section Co-Chair Cecily Marcus (cecily.marcus@mnhs.org) and Web Archiving Section Chair Melissa Wertheimer (melissa.wertheimer@gmail.com) by July 1, 2022.

Library and Archives Canada Coffee Chat: Video and Slides

On Tuesday, May 10, 2022, the SAA Web Archiving Section was honored to host Library and Archives Canada’s Tom Smyth discussing web and social media preservation.

A video of the coffee chat and presenter slides will be available for a limited time.

Video

Library and Archives Canada Coffee Chat

Presenter Slides

Web and Social Media Preservation Program at the National Library and Archives Canada

Coffee Chat Recap: U.S. Federal Government Web Archiving

By Susan Paterson, SAA Web Archiving Section Vice-Chair and liaison librarian at Koerner Library at the University of British Columbia Vancouver.

The Web Archiving Section hosted its first coffee chat of the year on March 29, 2022. Melissa Wertheimer, Chair of the Web Archiving Section, led a panel discussion on US Federal Government web archiving activities. We were joined by web archiving experts from the National Library of Medicine, the Government Publishing Office, the Smithsonian and the Library of Congress. Topics included content curation, collaboration amongst agencies, staffing, workflow models, successes and challenges, and the impact of COVID-19 on their collecting activities.

The coffee chat recording is available to viewers until summer 2022.

National Library of Medicine – History of Medicine Division, Digital Manuscripts Program

The session began with an informative presentation by Christie Moffatt, Manager of the Digital Manuscripts Program, History of Medicine Division at the National Library of Medicine (NLM). Christie described NLM’s approach to web collecting, an activity which began in 2009 with their Health and Medical Blog Collection. Christie provided an overview of the NLM web archive collections, and the collection development policies that guide their web collecting.

Christie noted that thematic and events-based collections have grown and are a key component of their collection building. The Global Health Events Web Archive collection, one of their largest, began in 2015 with the Ebola outbreak. With the World Health Organization’s declaration of a global health emergency in January 2020, NLM started work on a COVID-19 pandemic collection. Interestingly, this specific designation of a global health emergency is worked into their collection development policy, and NLM takes responsibility for building a web archive once this designation is made. An aim of the COVID-19 collection is to ensure that a diversity of perspectives on both the impact of and the response to the pandemic is archived. Tools and communication used for outreach during the pandemic (e.g., TikTok, Twitter) as well as personal narratives of the pandemic make up part of the collection, which is the Library’s largest, with 3.5 FTEs working on the project. The National Library of Medicine’s Archive-It website can be explored here.

Government Publishing Office

Dory Bower, Archives Specialist at the Government Publishing Office (GPO), provided an overview of GPO’s web archiving activities and explained that Title 44 of the U.S. Code, Public Printing and Documents, is the legislative mandate for this work. Specifically, Chapter 19 discusses the Depository Library Program. Federal agencies should be distributing and publishing their documents through the GPO. If a federal agency is not doing so, the Superintendent of Documents should be notified. Unfortunately, with born-digital publications and publishing directly onto websites, this is not happening and material is being missed. GPO joined the Archive-It community in 2012 and since then has been using web archiving to help fill this gap.

The web archive is part of the overall GPO Digital Collection Development Plan. For material to be part of the GPO collection, it must be federally funded and be within the scope of the Federal Depository Library Program (FDLP). GPO is also focusing on collections of interest geared towards women, tribal libraries, and Native American communities, just to name a few. GPO maintains over 213 collections in Archive-It, making up 38.3 TB of data and consisting of over 392 million URLs. You can explore the Federal Depository Library Program Web Archive here.

Smithsonian Institution – Libraries and Archives

Lynda Schmitz Fuhrig, Digital Archivist at the Smithsonian Institution Libraries and Archives, presented next. Lynda provided a fascinating overview of the web archiving work that’s being done at the Smithsonian, which celebrated its 175th anniversary in 2021.

The Libraries and Archives of the Smithsonian is the official repository and record keeper for the Smithsonian. Their responsibilities include sharing, collecting, and preserving Smithsonian activities and output, including documenting its unique web presence, which launched in 1995. They now have nearly 400 public websites and blogs and a very active social media presence covering Twitter, Instagram, Flickr, and YouTube, just to name a few.

Like the National Library of Medicine, the Smithsonian has documented the impacts and effects of the COVID-19 pandemic on America. For example, beginning in March 2020, more focused and frequent crawls were necessary to document the altered scheduling of the closing and reopening of the Museums and the Zoo. Additionally, the closure of museums created a need for an increased digital presence, and the Smithsonian launched several new websites and initiatives, including Care Package, Vaccines & US and Our Shared Future.

Audience members were particularly interested in their web and social media workflows and tools. Along with Conifer and Browsertrix, the Smithsonian uses Netlytic, which was developed by the Social Media Lab at Ryerson University, to collect Smithsonian hashtags and accounts.

They are one of the few organizations that download the WARCs from Archive-It. They developed an in-house tool called WARC Retriever, which they hope to release on GitHub later this year. Lynda’s summation was poignant: “The Smithsonian web collections will continue to tell the history and stories of the Smithsonian.” You can explore the Smithsonian Archive-It page here.

Library of Congress – Web Archiving Program

To round out the panel, Meghan Lyon and Lauren Baker, Digital Collection Specialists from the Library of Congress (LOC) Web Archiving Program, provided an overview of the activities at LOC. The Web Archiving Program began in 2000 and is part of the Library’s Digital Content Management Section.

The LOC web archives consist of 3 PB of data organized into 174 collections, 75 of which are active collections. Like many of the other speakers, the Web Archiving Team collaborates frequently on web archive collections, relying on the contributions of collaborators around the Library. The Collection Development Office helps guide collection initiatives, and various committees review subject proposals, select content to archive, and determine collection focus. LOC comprehensively collects content from Legislative Branch Agencies and U.S. House and Senate offices and committees. They collect content about U.S. national elections and manage other events-based and thematic collections.

Meghan and Lauren addressed the issue of permission and web archiving. Their permissions policy is determined by the Office of the General Counsel and is based on copyright law. Permission requests must be sent to site owners for anything selected for web archiving. There are two permission requests: a permission to crawl, and a permission to display based on the country of publication and the category of the entity. You can explore the LOC Web Archiving Program website here.

The panel closed out the session by discussing how they became interested in web archiving and how their careers started in the field. Their initial experiences ranged from practicums to learning on the job. The remainder of the conversation covered trends and the future of web archiving tools, including what improvements people hope for and what better tools for harvesting and user awareness might look like. The session was well attended with 181 registrants and over 80 attendees. Thank you to everyone who presented and who attended for such an engaging hour. Stay tuned for our next coffee chat, which will be in May!

Introducing the DocNow App

This week’s post was written by Zakiya Collier, Community Manager at Documenting the Now.

This week the Documenting the Now project announces the release of DocNow, an application for appraising, collecting, and gathering consent for Twitter content. DocNow reimagines the relationship between content creators and social media analysts (archivists and researchers) by addressing two of the most challenging issues of social media archiving practice—the issues of consent and appraisal.

The Documenting the Now Project is happy to release version 1.0 of our open-source tool freely for anyone to use. Read all about the app, what it does, and how to engage with the DocNow project team for support and providing feedback.

Over the last seven years, Documenting the Now has helped to foster an environment where a broader sector of cultural memory workers can learn about web archiving tools and practices and can become involved with web archiving networks. This has largely been achieved by practicing transparency and inviting people who have traditionally been left out of established web content archiving networks into the project to participate, namely students, activists, and archivists who represent marginalized communities and who work in community-centered organizations, HBCUs, public libraries, community-based archives, and tribal libraries and archives.

Documenting the Now was a response to the need among scholars, activists, archivists, and other memory workers for new tools that would provide easily-accessible and user-friendly means to collect, visualize, analyze, and preserve web and social media content to better document public events. In addition, it aimed to respond to questions and concerns related to ethics, safety, intellectual property, and access issues for the collection, preservation, and dissemination of Twitter data in particular.

Documenting the Now has also developed community-centered web and social media archiving tools that prioritize both care for content creators and robust functionality for users:

  • Twarc – a command line tool and Python library for collecting tweet data from Twitter’s official API
  • Hydrator – a desktop application for turning Tweet ID datasets back into tweet data to use in your research
  • Social Humans – a label system to specify the terms of consent for social media content
  • The Catalog – a community-sourced clearinghouse to access and share tweet identifier datasets

In continuing to support and develop tools that embody ethical practices for social media archiving, the DocNow app joins this suite of tools. DocNow is an application for appraising, collecting, and gathering consent for Twitter content and includes several new features including:

  • Trends tab to view trending topics across the globe in real time
  • Explore tab to view content by users, media, URLs, and related hashtags all on one screen
  • Live testing and refining of collecting parameters on recent tweets
  • Tweets per hour calculator to easily identify Twitter bot accounts
  • Search and Collect tweets back in time via Search API and forwards with Stream API
  • Activate toggle to start collecting tweets and send a notification tweet to encourage transparency and communication in Twitter data collection
  • Collections tab to share information about your collection with the public
  • “Find Me” and Insights Overview features to specify and gather consent using Social Humans labels
  • Download Tweet ID archive for sharing following Twitter’s terms of service
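
The idea behind a tweets-per-hour figure can be shown with simple arithmetic. The sketch below is only an illustration and is not DocNow's implementation: the function name, its parameters, and the sample numbers are hypothetical, and what rate counts as "bot-like" is left to the reader's judgment.

```python
from datetime import datetime


def tweets_per_hour(tweet_count, first_seen, last_seen):
    """Average tweets per hour between two points in time (hypothetical helper)."""
    hours = (last_seen - first_seen).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("last_seen must be later than first_seen")
    return tweet_count / hours


# Made-up example: 2,400 tweets over a 100-hour window.
rate = tweets_per_hour(2400, datetime(2022, 1, 1, 0, 0), datetime(2022, 1, 5, 4, 0))
print(rate)  # 24.0
```

A sustained rate this high over several days may be worth a closer look during appraisal, though a high rate alone does not prove an account is automated.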

The DocNow app also works in concert with other Documenting the Now tools, creating a four-step social media archiving journey for users:

Step 1: Collect content with the DocNow App by activating a search. Set collection limits and explore insights as your collection grows.
Step 2: Download your archive from the DocNow App, which includes a Tweet Viewer, Tweet IDs, and media files.
Step 3: Hydrate your Tweet IDs from the archive’s tweets.csv file back into full tweets using DocNow’s Hydrator desktop application.
Step 4: Describe your collection and share your Tweet IDs with other researchers by adding them to the DocNow Catalog.
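
To prepare for Steps 3 and 4, it can help to pull the tweet IDs out of tweets.csv into a plain, de-duplicated list. The following Python sketch shows one way to do that with the standard library. It assumes the IDs live in a column named `id`, which is a guess rather than a documented fact about DocNow's export, so check the header row of your own file first. The sample data is made up.

```python
import csv
import io


def read_tweet_ids(csv_file, column="id"):
    """Return unique tweet IDs from a CSV file object, in first-seen order.

    IDs are kept as strings so that very large numbers are never rounded.
    The column name is an assumption; adjust it to match your file.
    """
    ids = []
    seen = set()
    for row in csv.DictReader(csv_file):
        tweet_id = (row.get(column) or "").strip()
        if tweet_id and tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


# Made-up stand-in for a tweets.csv file.
sample = io.StringIO("id,text\n101,first\n102,second\n101,duplicate\n")
tweet_ids = read_tweet_ids(sample)
print("\n".join(tweet_ids))
```

With a real file, you would pass `open("tweets.csv", newline="", encoding="utf-8")` instead of the in-memory sample and write the resulting IDs one per line to a text file.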

Ways to Use DocNow

There are three different ways to use DocNow: joining the community instance, running DocNow locally on a computer, and installing an instance of DocNow in the cloud. The Community Instance is a great way to get familiar with the tool before committing to running an instance, but those with development skills may want to administer their own instance of DocNow locally or in the cloud.

  1. Join Documenting the Now’s hosted community instance
  2. Run DocNow locally on your machine
  3. Install your own instance in the cloud

For help with installation and getting started, the Documenting the Now team will host community conversations. Dates will be announced soon! More information about the DocNow App can be found here.

Documenting the Now is seeking community input on all of our features as we continue to develop DocNow. Please join our Slack channel by going to our website or email us at info@docnow.io.

US Government Web Archive Coffee Chat: Video and Slides

On Tuesday, March 29, 2022, the SAA Web Archiving Section was honored to host Lauren Baker and Meghan Lyon, Library of Congress; Dory Bower, Government Publishing Office; Lynda Schmitz Fuhrig, Smithsonian Institution Archives; and Christie Moffatt, National Library of Medicine at the National Institutes of Health. Section chair Melissa Wertheimer moderated the panel.

A video of the coffee chat and presenter slides will be available for a limited time.

Video

US Government Web Archive Coffee Chat

Presenter Slides

Web Collecting at the National Library of Medicine, presented by Christie Moffatt

Web and Social Media Archiving, presented by Lynda Schmitz Fuhrig

Library of Congress Web Archiving, presented by Meghan Lyon & Lauren Baker

Learning How to Web Archive: A Graduate Student Perspective

This week’s post was written by Amanda Greenwood, Project Archivist at Union College’s Schaffer Library in Schenectady, New York.

While it is not a new aspect of archival work, web archiving is an important practice that has advanced significantly in the past ten years. Developers are creating new web archiving tools and improving the current ones, but the process can be challenging because of dynamic web content, changing web technology, and ethical concerns. However, the need for preserving web-based material has become a priority for improving preservation, creating “greater equity and justice in our preservation practices, and [finding] ways to safeguard the existence of historical records that will allow us in future to bear witness, with fairness and truth and in a spirit of reconciliation, to our society’s response to COVID-19” and other social justice issues. The explosion of web archiving initiatives in various institutions and organizations has created web archivists of us all; however, how difficult is it for someone with zero experience to learn this important skill?

From October 2020 to May 2021, I held the Anna Radkowski-Lee Web Archives Graduate Assistantship at the University at Albany, State University of New York. My responsibility was to manage the web archives collections with Archive-It through web crawling, scheduling, reading host reports, and rescoping. These activities would culminate with a meeting at the end of my assistantship to discuss appraisal of the active collections. While I was excited to learn this archival skill, I did not have any experience with web development, so translating the host reports took a few months to learn because I did not understand some elements of the URLs that were in the host reports. For example, I did not know that some parts of a URL told me the website was a WordPress site, and I had never heard of “robots.txt” before. Thus, my supervisor, University Archivist Greg Wiedeman, spent a lot of time at the beginning of the assistantship teaching me how to translate the host reports. I needed to learn HTTP codes and other parameters that would help me make sense of how to rescope the crawls more efficiently.
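
For readers who are equally new to these terms, the short Python sketch below shows what a robots.txt rule and an HTTP status code mean in practice, using only the standard library. It is a general illustration and does not reflect how Archive-It itself processes either one. The robots.txt content, the crawler name `example-crawler`, and the URLs are invented for the example.

```python
from http import HTTPStatus
from urllib.robotparser import RobotFileParser

# Invented robots.txt: every crawler is asked to stay out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

allowed = parser.can_fetch("example-crawler", "https://example.edu/faculty/page.html")
blocked = parser.can_fetch("example-crawler", "https://example.edu/private/notes.html")


def describe_status(code):
    """Turn a numeric HTTP status code into a short readable description."""
    classes = {1: "informational", 2: "success", 3: "redirect",
               4: "client error", 5: "server error"}
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"
    return f"{code} {phrase} ({classes.get(code // 100, 'unknown class')})"


print(allowed, blocked)
print(describe_status(404))
```

Knowing that the first digit of a status code gives its general class (2xx success, 3xx redirect, 4xx client error, 5xx server error) goes a long way when scanning a crawl report for problems.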

I really appreciated the support from everyone at the Archive-It Help Center because they were instrumental in helping me solve a lot of problems related to the crawls. However, I felt the Help Center website instructions and tutorials were a bit difficult to follow at times. I think they are probably easier for more experienced users to understand, or at least for users who are familiar with web development. The other frustrating element was that managing the collections via web crawling was extremely time-consuming, but I was only allowed to work 20 hours a week per the assistantship. It was quite a substantial role, and I realized the job required someone to work on this full-time and not only 4-5 hours a day.

The unpredictability, frequency, and length of website updates also proved to be a large obstacle. I would work hard to efficiently scope one collection and schedule it, but the host report would come back with an error because the website was being updated. I often had to put those collections aside and return to them at a later date, but then the update would yield a whole new set of problems with the host report, and I would spend more time rescoping and rerunning test crawls for those collections. Test crawls would also require multiple runs and constant rescoping, which meant that I had to run longer test crawls, and some of those would take four weeks. The host reports were difficult to decipher because of crawler traps or long strings of characters. Learning how to read the host reports took the longest amount of time.
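
One common sign of a crawler trap is a URL whose path keeps repeating the same segments, or that grows extremely long. The Python sketch below flags such URLs with a simple heuristic. It is an assumption-laden illustration, not a feature of Archive-It: the function name, the thresholds, and the example URLs are all hypothetical, and real traps can take other forms.

```python
from collections import Counter
from urllib.parse import urlparse


def looks_like_crawler_trap(url, max_length=200, max_repeats=3):
    """Flag URLs that are very long or repeat a path segment many times.

    The thresholds are arbitrary starting points, not established values.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    most_common = Counter(segments).most_common(1)
    repeats = most_common[0][1] if most_common else 0
    return len(url) > max_length or repeats >= max_repeats


trap = looks_like_crawler_trap(
    "https://example.edu/calendar/2021/calendar/2021/calendar/2021/"
)
normal = looks_like_crawler_trap("https://example.edu/faculty/smith/index.html")
print(trap, normal)
```

A check like this can help triage a long list of URLs, but any flagged pattern still needs a human look before it is scoped out of a crawl.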

Additionally, I would get so excited at a potentially strong test crawl, but after QA I would notice dynamic content like interactive web elements was not captured. Thus, I would need to rescope and rerun the test crawls and use other web crawling tools like Conifer. I looked into using other open source tools such as WARCreate, WAIL, and Wget, but Conifer ultimately helped me with what I needed at the time.

Moreover, we often were presented with impromptu requests to archive a faculty website in a timely fashion because of a retirement or because the website host was taking down the website, so it was stressful prioritizing those new projects. Some of the faculty websites utilized advanced web development elements, so I needed to use the aforementioned open source web crawling tools to capture the site and upload the WARC file into Archive-It.

In sum, having zero web development and coding experience did not impede my ability to learn web archiving, although having it would have helped me significantly. Having a subscription to Archive-It can be helpful because of the technical support and video tutorials, but open access tools and software are plentiful, and the community is helpful and supportive in terms of training and instructive documentation. With my web archiving training, I was able to help other institutions initiate their own web archiving programs, and I hope to continue on this trajectory in the future.

Source:

Jones EW, Sweeney S, Milligan I, Bak G, and McCutcheon J-A. 2021. Remembering is a form of honouring: preserving the COVID-19 archival record. FACETS 6: 545–568. doi:10.1139/facets-2020-0115

Community Input for Web Archiving at Small and Rural Libraries

This week’s post was written by Grace McGann (Moran), Teen Librarian, Tipp City Public Library.

Before I begin, I want to acknowledge that Tipp City Public Library is on the unceded lands of the Kaskaskia, Shawnee, Hopewell, and Myaamia.

The Tipp City Public Library has been open for nearly a century now, in a town rich with history. We are located in the downtown historic district of a city whose population falls just below 10,000. Because we are such an ingrained part of the community, it is difficult to work here and not notice how our patrons care about our city’s history. This blog post briefly examines the importance of involving our respective communities in the collection development process.

My previous experience in web archiving at the University of Illinois drove me to apply to the Community Webs program at the Internet Archive. This program was built for small cultural heritage institutions to create a diverse web history while using Archive-It technology at zero cost.

Having never built a web archive from the ground up, one of the first questions I faced was: where to start? I find that with any collection, the problem isn’t necessarily finding materials; it is choosing a specific policy of selection and collection. In a previous blog post, I made an argument for separate collection development policies for web archiving. Having one in place makes evaluating websites simpler, especially when the discovery process relies on outside entities such as community members.

After joining Community Webs just a few months ago, I had a curated list of websites for capture. However, these only came from my cursory searches around the internet. In order to move beyond those first websites and create a more representative archive, I leveraged the local network. First, I reached out to a sociology professor at the University of Dayton, Dr. Leslie Picca. I knew that she was doing research on race and could possibly have connections to colleagues in the digital humanities. She led me to Dr. Todd Uhlman of the history department at UD.

Connecting with Dr. Uhlman changed my thinking about how to build this web archive. In learning about the digital humanities work he is doing in the Greater Dayton Area, I found valuable websites to be captured and preserved.

After speaking to Dr. Uhlman and agreeing to capture his content, I was contacted by community members who wanted websites to be preserved. After evaluating and capturing the websites, I created a community input form. While I am still waiting for this form to gain more traction, my theory is that community input is crucial for web archiving at small cultural heritage institutions.

I am not alone in this assertion. Papers about community archival practices demonstrate an urgent need for this sort of involvement. Zavala et al. (2017), though speaking more about physical archives, shared this:

There is no reason why government or university archives could not engender post-custodial practice, foster community autonomy and promote shared governance, if only they are willing to share power and authority with the communities they have historically left out.

By changing the way we engage in collection development, we challenge the systems of oppression that have been institutionalized within record-keeping institutions (whether we are aware of them or not).

I have a lot left to say about web archiving, but I want to drive this point home. Archives, whatever form they take, provide cultural value. Culture does not exist without community. Therefore, it’s actually pretty simple: communities should help create archives.

Greetings from the Web Archiving Section: A New Beginning!

The 2021-2022 Steering Committee met for the first time in September to discuss our goals for the upcoming year, and we are looking forward to continuing with our Coffee Chats and publishing some great content in the form of blog and Twitter posts. We would also love for people in our section to get involved and share their ideas, so we invite you to participate by submitting news, announcements, and topics of interest. We also welcome guest contributors to the blog, so please feel free to contact us with your ideas. Members of other SAA sections are encouraged to collaborate with us as well. Please send items and suggestions to Rosie Grant.

We have a few new members of the Steering Committee this year, so we are introducing ourselves by sharing our names, roles, institutional affiliations, goals for the section, our favorite web archives, and something we value most or find fascinating about web archiving work.

Melissa Wertheimer

Hello! My name is Melissa Wertheimer, and I’m excited to be back this year as the Chair of the Web Archiving Section (WAS). I’m a Music Reference Specialist at the Library of Congress where my performing arts subject area duties include special collections librarianship, web archives curation, outreach, and acquisitions. I curate and manage three web archive collections for the Music Division, contribute to the Library’s interdisciplinary Coronavirus Web Archive, and represent the Special Collections Directorate on the internal Web Archives Collection Development Group. WAS had a great year last year, and this year’s steering committee is an enthusiastic and energetic group of people both returning (Kiera, Ryder, and myself) and new (Susan, Rosie, and Amanda) to the team. My primary goal leading the section this year is to build upon last year’s momentum with more Coffee Chats, blog and social media activity, and collaboration across SAA sections.

My “favorite” web archive collection always changes. This year, I’ll say it’s the Professional Organizations for Performing Arts Web Archive that I’ve curated at the Library of Congress because it relates to so many other disciplines like labor organizing, education, medicine, and technology. It’s a unique way to demonstrate the impact of the performing arts throughout the fabric of society. Over 100 items are available now, and my seed list has over 500 URLs; my goal is to use scale to preserve interrelatedness and the context of creation as much as possible by crawling websites that relate to each other through hierarchies and affiliations. Do any of our readers do the same with thematic web archives? One of the things I value most about web archives is that I can use many parts of my brain – archivist, subject specialist, digital curator – to curate meaningful collections that speak to the moment.

Susan Paterson

Greetings! My name is Susan Paterson and I’m the current Vice-Chair of the Web Archiving Section. I am a liaison librarian at Koerner Library at the University of British Columbia Vancouver campus, which is located on the traditional, ancestral, and unceded territory of the xʷməθkʷəy̓əm (Musqueam) people. My liaison areas are government information, social work and French studies. My web archiving work stems from my role as a government information librarian. I have been involved in various groups in Canada to document and preserve historical government content. I am the current chair of the Canadian Government Information – Digital Preservation Network, made up of dedicated librarians across Canada who strive to curate federal Canadian government content as well as thematic government collections.

This is my first year as both an SAA and WAS member and I hope to learn much more about web archiving from my colleagues. I’m excited to network and collaborate with other web archiving enthusiasts as well as get to know other sections of SAA. I hope to build on the tremendous work of past committee members to make this section both a welcoming and an invaluable resource for web archivists. One of my favourite web archives I’ve helped to curate this year has probably been the COVID-19 Pandemic in British Columbia collection. We started to curate the collection at the start of March 2020 and we are still crawling important content. The sites are updated so frequently that I hope we’ve captured important content for future researchers.

Kiera Sullivan

Hi, all! My name is Kiera Sullivan (she/her/hers) and I am the Secretary of the Web Archiving Section. I recently took up a new position as Processing Archivist for the County of San Diego, California. Previously, I was the University Records Processor at UC San Diego Special Collections & Archives. I am so happy to be returning for another year as Secretary of this section. Our section’s coffee chats were a major highlight for me last year, and I am excited for them to continue. The Steering Committee is making the coffee chats a priority again this year, and we are also hoping to engage other SAA sections in this endeavor. If you have any topics or themes that you’d like to suggest for a coffee chat, please do let us know!

One of my favorite web archive collections has to be the National Library of Ireland Collections 2011-2018, collected by the National Library of Ireland. I have recently been doing personal research on the Easter Rising of 1916, and am so happy I found this treasure trove. This collection includes quite a few websites of interest, many of which were captured for the centennial in 2016. What I find most fascinating about web archiving is that it is so dynamic, just like the content that it captures. It seems there are always tools, methods, and ideas to consider, and that makes it very exciting and gratifying.

Ryder Kouba

Hi, my name is Ryder Kouba and I’m the Education Coordinator for the Web Archiving Section. I recently began my position as Librarian and Archivist at the American Center of Research in Amman, Jordan, and am excited to start a web archiving program to document materials relevant to the institution. Previously I managed the Archive-It account for the American University in Cairo and then used Webrecorder to capture relevant pages while at the University of Hong Kong Archives. This is my second year as Education Coordinator, and this year I’m hoping to facilitate training in web archiving tools; it would be great to learn from archivists who have experience implementing tools such as Heritrix and Webrecorder since the command line can be a bit intimidating.

Last year, I mentioned my favorite web archive was the Politics and Revolution collection at the American University in Cairo; this year I’m proud of the 50 gigabytes of social media data I captured while working at the University of Hong Kong, documenting student protests and reaction to recent events in Hong Kong, though it is not accessible yet.

Rosie Grant

Greetings SAA – I’m Rosie Grant, and I’m the current SAA Web Archives Marketing and Communications Manager. I work for the College of Arts and Humanities at the University of Maryland, where I’m also a graduate student at the UMD iSchool, focusing on archives and digital curation. On the side, I run a TikTok about local cemeteries in the DC area. Before that, I worked as the digital manager at the National Building Museum in Washington, DC. What I hope for in this committee is to be able to feature and amplify the current stories of people in web archiving and the work of this talented community. There are so many tools and resources being made available that I’m looking to learn more about myself.

Recently I’ve been appreciating the crowdsourced web archives of Find A Grave, owned by Ancestry.com and a great source for looking up cemetery history and a database of individuals buried around the country. It’s great for genealogical purposes as well as for cemeteries working to collect information about those interred there. What I find the most valuable about web archiving is the amount of open source tools available for archivists. Often when I think about web archives, I think about technology that is expensive and inaccessible for smaller and under-resourced archives. It’s incredible to learn about the number of tools being made available that are more accessible to such groups.

Amanda Greenwood

Hello everyone! My name is Amanda Greenwood (she/her/hers), and I am the Student Member of the Web Archiving Section. Currently, I am finishing up my MSIS in Archives and Records Administration at the University at Albany, New York (UAlbany), and I am working in the Special Collections/Archives and Preservation departments at UAlbany. In 2020, I was awarded the Anna Radkowski-Lee Web Archives Graduate Assistantship at UAlbany, and I managed over 40 web archives collections using Archive-It and Conifer. This year, I will contribute to the committee by helping to manage our section blog, so I am interested in recruiting writers who can help keep our community up-to-date with trends, new and helpful technology, and important web archiving projects.

My favorite web archive is one that I captured and preserved earlier this year. The No Gun Ri Digital Archive was created by UAlbany’s Dr. Donghee Sinn, along with members of her No Gun Ri Research Team, and it acts as a digital memorial that describes a mass killing of South Korean civilians by the United States Army at the beginning of the Korean War. It serves not only to make others aware of this crime, but also to give a voice to the survivors and families of the massacred so that they can tell this event from their point of view. What I value most about web archives is that websites from the past are made accessible today. I love visiting the Internet Archive’s Wayback Machine and randomly looking at websites that were originally put on the web in the late 1990s!