Lessons from pandemic web archiving

This post was written by Web Archiving Section chair Tori Maches, Digital Archivist at UC San Diego.

When my supervisor first spoke with me about COVID-19 web archiving, it was late February 2020, COVID-19 hadn’t been declared a pandemic yet, and there was no confirmed community spread in San Diego. A few weeks later, the UC San Diego campus had been evacuated, the governor had issued a stay-at-home order, and I was adapting to a new normal while documenting the campus and county COVID-19 response.

Over the last 17 months, I’ve captured web content related to the campus and San Diego County COVID-19 response from campus, local government, and local news sites. Although I’ve been responsible for UC San Diego’s web archiving work since I started working here three years ago, the COVID-19 collection has felt like a crash course. With that in mind, this post will cover challenges I’ve run into during this roller coaster of a time, as well as lessons I’ve learned and continue to learn.

Documenting a pandemic meant dealing with both logistical and emotional challenges. Once pages stop explicitly mentioning COVID-19, for instance (e.g. talking about measures or guidelines, but assuming a reader will know why they exist), it can be harder to identify relevant material. To avoid missing material, I kept an eye on local news to see if anything new was happening, and paid attention to new linked pages and resources when I did QA for recent crawls. Coworkers also gave me updates about initiatives, policy changes, and available information, all of which informed my collecting. This helped me balance my web archiving work with other job duties, while still collecting new material as needed. Since I was the only person working on the collection for the most part, this was crucial.

Scope creep was an even greater challenge. The usual “digital FOMO” multiplies tenfold when you’re documenting a historic, rapidly changing event – it’s hard to argue with the impulse to capture everything when the world feels like it’s on fire. I had to step back, ask myself “should I be capturing this?” and recognize that sometimes the answer would be no. Some content was duplicative, was available in “good enough” form elsewhere, or would require a lot of care and thought to capture the right way. I kept my scope to the campus and county response partly because that was my original charge and I didn’t have the bandwidth to expand, and partly because I wasn’t sure I could capture social media during an ongoing traumatic event in a sensitive way.

Self-care was also a pretty significant challenge. Working on this collection was incredibly rewarding, and I’m proud of how it’s turned out. At the same time, documenting my hometown’s pandemic response is one of the hardest things I’ve ever done. It took me a while to build breaks into my QA workflows, or ask for help when I needed it, but it was absolutely worth it. Some of my coworkers were able to help with QA for a while, which was invaluable. Talking with friends and colleagues who have worked with trauma records helped as well. It was a good reminder that many of us have done and are doing this kind of work, and we’re not doing it alone.

I’m proud of the work I’ve done over the past year and a half. I’ve documented policy changes, testing initiatives, calls for volunteers, messages of hope and encouragement, and people working together to survive and protect each other in an almost unprecedented time. I’ve learned once again that archival work is about managing and mitigating loss, especially when working with high volumes of material. Keeping my collection scope in mind, remembering I wasn’t the only person documenting this time, and thinking about the logistical and ethical implications of what I collected all helped. I also learned that sometimes, the most productive thing to do is take a break. Most of all, pandemic web archiving was a good reminder that archival work isn’t done in a vacuum, even if you’re the only person working on a collection day to day.

Coffee Chat Recap: Accessibility and Web Archiving

By: Kiera Sullivan and Lydia Tang

The Web Archiving and Accessibility & Disability sections co-hosted an engaging coffee chat about accessibility and web archiving on May 19, 2021. Dr. Lydia Tang, Immediate Past Chair of A&DS, gave a talk on accessibility tools, accessible web design, and web archiving. Big thanks to Dr. Tang for her presentation (view her slides!), and also to everyone who attended, asked questions, and shared their thoughts.

Dr. Tang’s presentation highlighted the idea of designing for accessibility from the very beginning. She pointed out that the way a website is designed will affect how a screen reader navigates the content. Consider the web content that we collect – or may one day collect – in our web archiving activities. How accessible are archived websites for screen readers? They are only as accessible as they were when they were created – and accessible design has been an evolving concept for developers. Can or should we attempt to remediate inaccessible archived websites? Further, how accessible are the platforms that we use to keep our web archives, and how about our own websites?

The good news is this is one wheel we do not need to reinvent. The Web Accessibility Initiative (WAI) of the World Wide Web Consortium (W3C) has published a series of web accessibility guidelines. The Web Content Accessibility Guidelines (WCAG) are an industry standard that is continually being updated, and they are a tremendous resource for making web content accessible to all users. As archivists we cultivate expertise in preserving material and making it accessible, whatever the format. If web content is one of the formats we work with, perhaps it is a professional duty to learn to recognize the components of the websites with which we engage and which we collect. These guidelines are an excellent place to start. Additionally, there are a number of groups that work on building resources, training, and awareness of accessibility best practices, such as the DLF Digital Accessibility Working Group and the Society of American Archivists’ Accessibility & Disability Section.

Dr. Tang shared her go-to tool for evaluating web accessibility: the WAVE tool, which she uses regularly as a browser extension for Chrome. Using WAVE, she demonstrated how easy it is to analyze a website for automatically detected errors such as missing alt-text, contrast issues, and other under-the-hood issues that might not be apparent but could greatly impact the website experience for people using screen readers and other assistive technology. It certainly isn’t a catch-all but is at least a big first step. She also shared other tools and resources, including WebAIM’s Contrast Checker, and a variety of assistive tech programs to try out – see her slides for a list of these!
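For readers curious about what a contrast checker measures, here is a minimal sketch of the contrast-ratio calculation defined in WCAG 2.x. It is an illustration only, not the code behind WAVE or WebAIM’s Contrast Checker; the function names are made up for this example, and it assumes colors are given as six-digit hex strings.

```python
def _channel_to_linear(value_8bit: int) -> float:
    """Convert one 0-255 sRGB channel to its linear value (WCAG 2.x definition)."""
    c = value_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color such as '#1a2b3c' (hypothetical helper)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * _channel_to_linear(r)
            + 0.7152 * _channel_to_linear(g)
            + 0.0722 * _channel_to_linear(b))


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors; ranges from 1 (identical) to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground: str, background: str) -> bool:
    """WCAG level AA asks for at least 4.5:1 for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5
```

For example, `passes_aa_normal_text("#767676", "#ffffff")` is true, while the slightly lighter gray `#777777` on white falls just short of 4.5:1.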

One powerful takeaway from the conversation was the idea that we need to encourage and facilitate a culture of accessibility in our workplaces. Accessibility should be a regular component of our usual discussions about workflows and projects. It means thinking about the tools and templates that we use, becoming familiar with their accessibility features, and improving our own use of them. Coffee chat attendees shared their ideas for how we can achieve this, including playing around with dark mode, high contrast mode, and color schemes on our websites. The discussion also included looking at a WARC file together and questioning whether WARCs preserve alt-text and other accessibility features of a website.  We wrapped up the hour by considering whether accessibility statements – incorporated in the Conditions Governing Access note or a similar note – should be a required element for descriptive standards.
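On the WARC question: a WARC response record stores the server’s response as it was received, so alt attributes present in the original HTML should still be in the archived payload. One rough way to check a capture is to scan its HTML for images with no alt attribute at all. The sketch below assumes the HTML has already been extracted from the WARC record into a string (that step is not shown), and the class and function names are hypothetical. It covers only one of the many checks a tool like WAVE performs.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        # An empty alt="" is valid for decorative images, so only a
        # completely absent alt attribute is flagged here.
        if "alt" not in attr_map:
            self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html: str) -> list:
    """Return the src values of images lacking an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html)
    return finder.missing
```

Running `images_missing_alt` over an archived page’s HTML gives a quick list of images to review by hand; it says nothing about whether the alt text that does exist is meaningful.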

Did you attend this coffee chat? Have any afterthoughts, questions, or key takeaways that you would like to share? Please feel free to drop your thoughts in the comment section of this post. Do you belong to an SAA section that might be interested in co-hosting a coffee chat with the Web Archiving section? Let us know!

What coffee chat conversations would YOU like to see the Web Archiving section take part in? Leave your suggestions in the comments or reach out to any of the steering committee members with your ideas!

Web Archiving Many Voices: Documenting COVID-19 and Marginalized Communities at Arizona State University

This week’s post was written by Shannon Walker, University Archivist, Arizona State University.

The COVID-19 pandemic gives us, as memory workers, the unusual opportunity to document a crisis as it is occurring. As with many institutions, University Archives staff at Arizona State University quickly recognized the need to document a rapidly unfolding history and its impact on our institution and community through the use of our web archiving tools.

Several factors highlighted the urgency of the task. Firstly, this was likely a once-in-a-lifetime occurrence. We did not have time to develop a new process or procedure for “how to document a pandemic.” Secondly, the crisis itself has a temporal nature: it was important to capture sites that were continuously updating, knowing that in six months or a year they might no longer refer to COVID-19 (we hope!).

Deciding What to Capture

While there is no official mission statement for the web archiving program @ ASU, we have developed a draft of general guidelines for prioritizing the websites we capture, preserve and make available for research. Essentially:

The purpose of the Web Archiving Program at the ASU Library is to develop best practices for capturing born-digital, web-based information produced by the University. The first priority is identifying sites that complement the University Archives’ Collection Development Policy. The Web Archiving Program will also attempt to capture web-based publications for the broader ASU umbrella, especially as mandated by the University Archives Records Management Program, as well as current events and specialized collections.

In addition to being guided by our internal guidelines, we also had in mind ASU’s Charter:

ASU is a comprehensive public research university, measured not by whom it excludes, but by whom it includes and how they succeed; advancing research and discovery of public value; and assuming fundamental responsibility for the economic, social, cultural and overall health of the communities it serves.

So, with both principles guiding our web archiving selection criteria we sought to address some of the following questions:

  • As we document COVID-19 at ASU, how can we be inclusive?
  • As we select sites to capture, do they document the more prominent voices on campus or marginalized communities or both?
  • Who might we be excluding? Can excluded groups be included through web archiving?
  • When a researcher is using these materials 10 years down the road what will they see? What will they not see?

Creating the COVID-19 @ ASU Response collection

The initial phase of this effort included identifying, reviewing and capturing our own institution’s websites. We wanted to document the School’s official response as well as the experiences of employees and students. We sought out and captured seeds from the Office of the President, the Office of the Provost and ASU Now, one of our campus news resources. We also captured a site from the University’s governing body, the Arizona Board of Regents.

Additionally, we sought to document the impact of the pandemic on the student experience. We were able to identify seeds from the Admissions Office, Financial Aid, University Housing, and Greek Life (fraternities/sororities). In addition, a few individual schools created lists of resources for their students. Many of these lists focused on the mental, emotional, and financial consequences of the pandemic, recognizing the impact on the whole student, far beyond the logistics of classes and technology.

As we were identifying seeds to capture we were fortunate to locate a few that provided the opportunity to document the voice and experiences of underrepresented groups on campus. We say fortunate because we know that marginalized groups do not always have a prominent presence on the website, but they are an important part of what we hoped to document. Some of the sites included in this group were the Alliance of Indigenous Peoples, ASU’s American Indian Policy Institute, the International Student and Scholars Center, and the American Indian Student Support Services. 

Importantly, many of these sites addressed issues of equity and the digital divide, which was a significant issue for students here at Arizona State University as we went to online classes. 

Challenges and Unexpected Discoveries

As with any project, there were challenges that made us profoundly aware of the limits of web archiving. For one, not all campus experiences are captured on published campus websites. We know that many of the raw experiences of students, staff and faculty were being documented on social media sites. However, we chose not to capture these sites because of privacy concerns.  In addition, some website captures did not go well. After reviewing them, we could see that elements of the original site were not captured. We continue to work on improving those crawls. 

As an added bonus to our efforts, we were able to identify and collect materials in other digital formats (namely PDF or JPG) that further document COVID-19 at ASU. An additional unexpected discovery was that a few of the sites we targeted for crawls were already being well captured by the Internet Archive. In that case, we needed to figure out how to point to them as part of our curated collection.
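As one illustration of what “pointing to” an existing Internet Archive capture can look like, the sketch below builds a Wayback Machine replay link from a page address and a capture time. This is not a description of how ASU solved the problem. It relies on the public Wayback Machine URL pattern (a 14-digit timestamp followed by the original URL); the function name and the example address are hypothetical.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, captured_at: datetime) -> str:
    """Build a Wayback Machine replay link for a page at a given time.

    The Wayback Machine serves the capture closest to the requested
    timestamp, so the time does not need to match a capture exactly.
    """
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Hypothetical example address, for illustration only.
link = wayback_url("https://example.edu/covid", datetime(2020, 4, 1, 12, 0, 0))
```

A link built this way could then be recorded in a finding aid or collection description alongside locally captured seeds.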

Conclusion

Our efforts to identify, capture and collect websites related to COVID-19 at Arizona State University are only a part of the pandemic story for future researchers. We hope that the sites we chose, and were successfully able to capture, will present a varied and diverse perspective considering the limits of web archiving technology. It is not a perfect tool but it is timely and nimble, and becoming an increasingly important part of our toolkit as we seek to expand the narrative of our school’s history.

Pitching Web Archiving to your Institution: Where to Begin?

This post was written by guest blogger Andrea Belair, Archives and Special Collections Librarian at Union College in Schenectady, NY.

Does your organization engage in web archiving? Many organizations now collect websites in some capacity. Over the years, I think it’s become easier to make an argument that web content should be collected. Most of us, even the least tech-savvy of us, understand that web pages are not permanent and get taken down or deleted when not maintained. Many hard lessons have already been learned from content lost overnight: a tweet deleted by its author, or a page that is no longer available on a site. The public is now regularly pointed to a website to get vital information, much to the detriment of those who might not have access to a computer; however, the website is the public record, the source of information. We have seen this recently with the COVID-19 pandemic, where most information about the disease and its vaccines has been published online.

If you intend to make a pitch to start a web archiving program at your institution, of course, you need to consider your audience: What do they care about? More importantly, though, do you belong to a public or private institution? It is likely that your organization is creating web content, and, similarly, it’s likely that this web content can be considered institutional records of your organization. 

At a former job, during a meeting about web archiving, one attendee told me that I needed to take off my records management hat and put on my archivist hat. The university archivist arrived just after that statement, running a little late. Having missed the rest of the conversation, he sat down and stated, “This is really about records management.” In terms of collecting and archiving web content, there is the collecting side and an argument to be made there, and there is the side from the vantage point of institutional records. It can be hard to show scholarly uses for web collecting, although now it’s an easier case to make when you look at the amazing web collections of the Library of Congress, for example.

However, if you’re pitching a web archiving program, think about your stakeholders. If your stakeholders are, say, executive administration in a private college, you might want to reconsider a pitch that focuses on history, or some variation of why history is important. Executives know that history is important, but they consider it the job of the library to take care of it, and they are not likely to think it justifies a larger budget allocation.

I recommend that, instead, you look at the requirements of your accrediting institution, if you are a private college. If you are a public college, look at your local, state and federal regulations and mandates on records. Web pages are official documents of the institution, and they often fall within mandated records retention schedules. For private colleges, it is very possible that the accrediting organization to which your institution belongs has records requirements as well, although I am not familiar with all accrediting institutions, unfortunately. If there is a legal mandate to be met, go that route. If there is an accrediting mandate to be met, go that route.

It’s not that the case for history and its importance can never be made, however. If there is a new mission statement or vision statement at your institution, and you can connect it to your pitch, by all means: now is your chance. If there is an upcoming inauguration or similar major event, that is another opportunity, because we all want to make sure that this important event and person is written into history.

What do you get when you hand a Master’s student a web archiving program?: Part Two

The following is a guest post by Grace Moran, Graduate Web Archiving Assistant – Library Administration, University of Illinois Library

In the previous installment of this series, I explored the complementary issues of metadata and access for web archives. In this second and final installment, the more human aspects of this year’s endeavor come to the fore: policy and personnel. I will also briefly describe how the current pandemic has informed web-archiving efforts at the University of Illinois.

If you did not read the previous blog post, let me revisit my background. I am the Graduate Web Archiving Assistant, working for Dr. Christopher Prom, the Associate Dean for Digital Strategies at the University of Illinois. This year, I have been charged with participating in day-to-day activities related to web archiving such as running crawls and doing quality assurance. I also engage in high-level organizational thinking about the future stewardship of the library’s burgeoning web archiving program. The program began in 2015, and since then the University of Illinois has captured 5 TB worth of data using our Archive-It subscription. Multiple units take part in the curatorial side of this endeavor: the University Archives, the American Library Association Archives, Faculty Papers, and the International & Area Studies Library. We are hoping to start running crawls for the Illinois History & Lincoln Collections in the near future.

As I noted, this post is focused on policy and personnel. These are two areas that currently present a challenge for my institution. We do not have centralized documentation, a web archiving-specific collection development policy, or a position other than my own dedicated to web archiving. The following is what I envision the program looking like in the future and will be highlighted in my final report to my supervisor at the end of the academic year.

What does policy mean for web archiving? I believe institutional web archiving policies and procedures should be composed of the following:

  • A collection development policy unique to the institution’s web archiving program (a general organization-wide collection development policy does not suffice given the unique nature of the content being collected)
  • A clear, centralized workflow outlining how crawls are to be run, troubleshooting documentation, and chain-of-command for web archiving
  • A statement on copyright and ethics in web archiving (Niu in “An Overview of Web Archiving,” cited below, touches on copyright)

Though it may be extremely obvious to some readers, it is worth saying: policy should be public. As someone who works for a public university, I am painfully aware of the importance of policy accessibility for our stakeholders.

What about personnel? Who should be running a web-archiving program? How many people should be involved? Of course, this is something that varies from institution to institution; however, my experience has made clear the need for a dedicated point person. This person could be:

  • A graduate web archiving position, like my own, working 20 hours a week to coordinate crawls across units, run Quality Assurance, and populate metadata fields.
  • A civil service or academic professional position with at least a 50% appointment to web archiving. If an institution is looking to grow their web archiving program, they should consider making this a 100% appointment for the first couple years and then having the point person slowly transition towards additional activities related to digital strategies of the library.

I should note here that a graduate web archiving assistant is a great way to support your web archiving program (yes, I am biased) but there are some drawbacks to placing this responsibility on the shoulders of a temporary employee. If you are just beginning your program, you may find that a part-time position does not fulfill the needs of your program. Additionally, there are advantages to long-term employees who have institutional knowledge and memory and therefore, understand the administrative history of digital programs within your organization. Time is lost when it is necessary to re-train someone for a position annually or bi-annually.

Side Note: I want to make clear that graduate employees are so important. They bring a fresh set of eyes to problems, and the opportunity to learn from a graduate position can be absolutely priceless for someone like me. Please consider funding your graduate students; there are a great number out there pursuing an unfunded MLIS and paying off student debt for years to come.

Finally, I want to highlight a unique opportunity given to web archiving programs this past year. COVID-19 has devastated lives the world over; it has also provided inspiration for innovation and creativity. This has been true at the University of Illinois at Urbana-Champaign. From a novel saliva-based PCR test, to rigorous testing protocols, to creating a ventilator in 12 days, the institution has tackled this problem head-on. Bethany Anderson, the Natural & Applied Sciences Archivist, and I have collaborated over the past year to run crawls to document COVID-19 at the university and celebrate what we have accomplished in one of the darkest years we have seen. To check out pages we have documented, you can visit https://archive-it.org/collections/13880. This is a great example of how web archiving allows us to document important moments now and preserve the historical record (which is increasingly electronic) for the benefit of future researchers.

I hope that you have identified with some part of this blog series; my hope is that if we create a dialogue about our triumphs and struggles, we can all learn something.

Sources Mentioned

Niu, Jinfang, “An Overview of Web Archiving” (2012). School of Information Faculty Publications. 308.
https://scholarcommons.usf.edu/si_facpub/308

“COVID-19 Response at the University of Illinois” (2021). The Board of Trustees of the University of Illinois Urbana-Champaign. https://archive-it.org/collections/13880

For further questions, I can be reached at gmoran6@illinois.edu

What do you get when you hand a Master’s student a web archiving program?: A Two-Part Series

The following is a guest post by Grace Moran, Graduate Web Archiving Assistant – Library Administration, University of Illinois Library

Websites are fleeting; finding a way to preserve them and ensure accessibility for end-users is one of the greatest challenges facing record-keeping institutions today. The window for capturing a website is minute, with many pages disappearing within 2-4 years of their creation (See Ben-David, “2014 Not Found,” in the further reading list). This two-part guest blog series documents my efforts over this academic year to develop, standardize, and envision a future for the University of Illinois web archives collections through our subscription to Archive-It. This first post will explore the deeply related issues of metadata and access for web archives, as well as how they influence my proposal for the future of our web collections.

About me. I am a graduate student getting my MS in Library & Information Sciences at the University of Illinois, graduating this May. I have been working for my supervisor, Dr. Chris Prom (Associate Dean, Office of Digital Strategies) since the Spring semester of my senior year of my bachelor’s. Last year, he gave me the option of transitioning over from digital special collections and workflows and focusing on our young web-archiving program. This opportunity has taught me so much, and I want to share that with others. 

Now, for a bit of background on the program. The University of Illinois has been capturing web pages through Archive-It since 2015, but the program itself has been inconsistent in its stewardship. This year, my charge is the evaluation of the status of the University of Illinois subscription to Archive-It and the creation of a plan for its continued success. The culmination of the role will be a final report outlining my accomplishments for the year, evaluating programmatic needs for a web-archiving program, conversations I have had with stakeholders in the program, and recommendations for the future of web archiving at the University of Illinois. The challenges facing a burgeoning web-archiving program are numerous and varied, ranging from policy to accessibility. Preferably, by the end of the year, I will have identified opportunities for growth and laid out a plan for the future, ideally making a case for a dedicated web-archiving position when the budget allows. With 5 TB of data archived since 2015, the care and keeping of these collections is of utmost importance; the time invested by the University of Illinois Library must not be wasted.

My experience thus far has shown me that managing a web archive is no small feat. (Is this an obvious statement? Yes, but that doesn’t make it any less true.) Two of the most urgent issues I have seen are metadata and access; it is no secret that these two go hand-in-hand. The debate around metadata is not new. The OCLC Research Library Partnership Metadata Working Group identified this same problem in their 2017 report “Developing Web Archiving Metadata Best Practices to Meet User Needs” (in the further reading list below). As it stands now, we don’t have a standard for documenting archived websites. Standards such as DACS and MARC don’t fully account for the unique nature of websites, so some institutions have adopted a hybrid approach, using both archival and bibliographic metadata. The question remains whether or not a hybrid approach is sufficient. Is it necessary to create a new standard? Would this over-complicate things?

On the Archive-It platform, there are three possible levels of metadata: 

  1. Collection-level
  2. Seed-level
  3. Document-level

Our collections at the University of Illinois tend to have collection-level and seed-level metadata. This is due to a couple of factors. First, the labor of editing seed metadata takes enough hours without also creating document metadata; it is a case of diminishing returns. Second, what is and isn’t a document is a bit difficult to understand on the Archive-It platform, making it unclear what benefit there is in creating document-level metadata. For now, the goal is to make collection- and seed-level metadata consistent across collections and make it standard practice to describe websites when they are crawled.
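A consistency goal like this can be partly checked by script. The sketch below is a hypothetical illustration, not our actual workflow: it assumes seed metadata has been exported into a list of dictionaries, and the required field names (title, description, creator) are placeholders rather than a documented Archive-It schema.

```python
# Hypothetical set of fields every seed record is expected to fill in.
REQUIRED_SEED_FIELDS = ("title", "description", "creator")


def missing_fields(record: dict, required=REQUIRED_SEED_FIELDS) -> list:
    """Return the required fields that are absent or blank in one record."""
    return [
        field for field in required
        if not str(record.get(field) or "").strip()
    ]


def audit_seeds(seeds: list) -> dict:
    """Map each seed URL to its missing fields, skipping complete records."""
    report = {}
    for seed in seeds:
        gaps = missing_fields(seed)
        if gaps:
            report[seed.get("url", "(unknown seed)")] = gaps
    return report
```

Running `audit_seeds` over an export would produce a short to-do list of seeds whose descriptions need attention before the next crawl.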

Deeply tied into the issue of metadata is that of access; how do we make sure that content reaches end-users? If a tree falls in the forest… no, I’m not going to finish that; too cliché. You get the idea. My point is this: taking the time to preserve digital materials is nearly meaningless if no one gets to enjoy the fruits of the labor.

What does access look like for University of Illinois web collections? Right now, an end-user would have to navigate to the public-facing Archive-It website and either search “University of Illinois” or stumble across one of our collections because of a keyword search. We are not in any way alone in this; institutions all over are struggling with the idea of making collections searchable.

So what’s the solution? How do we implement an access system without relying on end-users to know we participate in web archiving and have collections on a website referenced nowhere in our systems? In my final report to the library, I am going to recommend both short- and long-term solutions. Short-term, Archive-It allows subscribers to add a search bar to their discovery systems allowing them to search through their Archive-It collections. This simple, elegant, and quick solution would make a significant difference for researchers. A long-term, more involved solution would be to index all of our pages locally and work to allow full-text search of collections. In the era of Google, our users are used to full-text search; discovery systems should fit that behavior.
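To give a sense of what local full-text indexing involves at its very simplest, here is a toy inverted index. It is a sketch under generous assumptions: the text of each archived page has already been extracted into a string keyed by its URL, everything fits in memory, and a search means “pages containing every query word.” A production system would more likely use a dedicated search engine; the function names here are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict) -> dict:
    """Build an inverted index: each word maps to the set of URLs using it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index: dict, query: str) -> set:
    """Return the URLs whose text contains every word in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

Even this toy version shows the trade-off: the index must be rebuilt or updated after every crawl, which is part of why local indexing is the longer-term option.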

If you are working with web archives, I hope that something in this post resonated with you. I hope that institutions will begin to collaborate more and find standard solutions so we can truly harness this technology to our advantage.

In my next blog post a month from now, I’ll talk about the more human side of things: policy and personnel. I also want to share a bit about our efforts to document COVID-19 at the University of Illinois. Stay safe, stay healthy, and stay tuned!

Further Reading:

  1. Belovari, S. (2017). Historians and web archives. Archivaria, 83(1), 59-79.
  2. Ben-David, A. (2019). 2014 not found: a cross-platform approach to retrospective web archiving. Internet Histories, 3(3-4), 316-342.
  3. Bragg, M., & Hanna, K. (2013). The web archiving life cycle model. Archive-It. https://archive-it.org/static/files/archiveit_life_cycle_model.pdf (visited 14.2.14).
  4. Brunelle, J. F., Kelly, M., Weigle, M. C., & Nelson, M. L. (2016). The impact of JavaScript on archivability. International Journal on Digital Libraries, 17(2), 95-117.
  5. Condill, K. (2017). The Online Media Environment of the North Caucasus: Issues of Preservation and Accessibility in a Zone of Political and Ideological Conflict. Preservation, Digital Technology & Culture, 45(4), 166.
  6. Costa, M., Gomes, D., & Silva, M. J. (2017). The evolution of web archiving. International Journal on Digital Libraries, 18(3), 191-205.
  7. Dooley, J. M., & Bowers, K. (2018). Descriptive metadata for web archiving: Recommendations of the OCLC research library partnership web archiving metadata working group. OCLC Research.
  8. Dooley, J. M., Farrell, K. S., Kim, T., & Venlet, J. (2018). Descriptive Metadata for Web Archiving: Literature Review of User Needs. OCLC Research.
  9. Dooley, J. M., Farrell, K. S., Kim, T., & Venlet, J. (2017). Developing web archiving metadata best practices to meet user needs. Journal of Western Archives, 8(2), 5.
  10. Dooley, J., & Samouelian, M. (2018). Descriptive Metadata for Web Archiving: Review of Harvesting Tools. OCLC Research.
  11. Farag, M. M., Lee, S., & Fox, E. A. (2018). Focused crawler for events. International Journal on Digital Libraries, 19(1), 3-19.
  12. Graham, P. M. (2017). Guest Editorial: Reflections on the Ethics of Web Archiving. Journal of Archival Organization, 14(3-4), 103-110.
  13. Grotke, A. (2011). Web Archiving at the Library of Congress. Computers in Libraries, 31(10), 15-19.
  14. Jones, G. M., & Neubert, M. (2017). Using RSS to Improve Web Harvest Results for News Web Sites. Journal of Western Archives, 8(2), Article 3.
  15. Littman, J., Chudnov, D., Kerchner, D., Peterson, C., Tan, Y., Trent, R., … & Wrubel, L. (2018). API-based social media collecting as a form of web archiving. International Journal on Digital Libraries, 19(1), 21-38.
  16. Maemura, E., Worby, N., Milligan, I., & Becker, C. (2018). If these crawls could talk: Studying and documenting web archives provenance. Journal of the Association for Information Science and Technology, 69(10), 1223-1233.
  17. Masanès, J. (2005). Web archiving methods and approaches: A comparative study. Library trends, 54(1), 72-90.
  18. Pennock, M. (2013, March). Web-Archiving: DPC Technology Watch Report. Digital Preservation Coalition.
  19. Summers, E. (2020). Appraisal Talk in Web Archives. Archivaria, 89(1), 70-102.
  20. Thomson, S. D. (2016, February). Preserving Social Media: DPC Technology Watch Report. Digital Preservation Coalition.
  21. Weber, M. (2017). The tumultuous history of news on the web. In Brügger N. & Schroeder R. (Eds.), The Web as History: Using Web Archives to Understand the Past and the Present (pp. 83-100). London: UCL Press. doi:10.2307/j.ctt1mtz55k.10
  22. Webster, P. (2020). How Researchers use the Archived Web: DPC Technology Watch Guidance Note. Digital Preservation Coalition.
  23. Wiedeman, G. (2019). Describing Web Archives: A Computer-Assisted Approach. Journal of Contemporary Archival Studies, 6, Article 31. https://elischolar.library.yale.edu/jcas/vol6/iss1/31.

Web Archiving Roundup: November 2020

Welcome to a new year with the Web Archiving Section! 

The new section Steering Committee met for the first time earlier this month and we’re all very excited about the coming year! One of our goals is to make the blog more active. We’d like to invite you to participate by submitting news, announcements, and other topics of interest. If you have a topic that you’d like to expand on, we’re also looking for guest contributors. Please send items and suggestions to April Feldman. Hope to hear from you soon.

Since we’ve all shared our “brief introductions” with the section via our candidate statements, we’ve decided not to repeat ourselves. Instead, in this post, we’d like to introduce all of you to our favorite web collections. Our only requirement was that it be a collection we’ve personally worked on.

Tori Maches, Digital Archivist, UC San Diego

My favorite UCSD web archive collection is our San Diego Local Governments collection. It’s one of our oldest collections, dating back to 2007, and covers local municipal and government agency websites, as well as websites related to local government activity. I grew up in San Diego, and remember interacting with some of these websites when they looked the way they did in the earliest crawls. Recontextualizing websites you used in high school as historical objects is a weird experience, and it’s a good reminder of how quickly things change online.

Melissa Wertheimer, Music Reference Specialist, Library of Congress

I’m excited to share the LC Commissioned Composers Web Archive. This was my first venture with web archives, and I’ve been curating it since 2018. I’m a flutist who specializes in contemporary repertoire, so the topic felt natural! I think web archives are perfect for a digital record of the Music Division’s active commissioning of living jazz and classical composers. This collection is a resource in itself, but also leads users to our unique collection materials through abstracts with links to finding aids for composers’ papers and online catalog records for commissions’ manuscripts and electronic files.

Ryder Kouba, Collections Archivist, University of Hong Kong (Pok Fu Lam)

The Egyptian Politics and Revolution Collection (started by Stephen Urgola and Carolyn Runyon) documented Egyptian politics from 2011 to the present day (though politics have been dead since 2014 or so). As many archivists have written about collecting in the US, concerns over privacy were paramount, though difficult to address in a (for a time) fluid situation. Website blocking and censorship made finding content more difficult, but also made our work more important in providing access to blocked websites through the Wayback Machine (which was temporarily blocked in Egypt; logically, we had concerns over exactly why the government decided to reverse that decision).

April Feldman, University Archivist, California State University, Northridge

We don’t have an active web-archiving program per se, so I haven’t worked on a favorite collection yet. We’re trying to implement a starter program, something small and scalable. In the meantime, we’ve just started using the Wayback Machine to capture limited Cal State Northridge websites and Wakelet for websites and social media posts related to CSUN’s COVID-19 response. I’m a project team of one right now, trying to figure it out as I go (love that OJT!). I’m completely open to suggestions or tales of woe if anyone wants to share.

Kiera Sullivan, University Records Processor, UC San Diego

The collection that I would like to highlight is the UC San Diego Web Archives collection. This is perhaps unsurprising, considering my role in the University Archives! This large collection captures a wealth of information from websites across the campus administration and community. Part of what I love about this collection is that although it is already extensive, there is a ton of room to grow and refine. This year, I will be working on identifying gaps in our collecting of student and community content, scoping crawls to address those gaps, and also improving the existing metadata across the collection.

Allison Fischbach, Research and Archives Associate, Towson University / MLIS student in Archives and Digital Curation, University of Maryland iSchool

The collection I am most proud of is our COVID-19 University Response Collection. This is the first web collection I have curated, and it is especially prescient as the pandemic continues. What I like about this collection is that it includes valuable information both about how institutional operations responded to COVID-19 and about how the community continues to enact positive change. It is a collection that continues to grow, and one I know will have monumental value to future users. I feel fortunate to be able to work with it as a student archivist.

Web Archiving Roundup: August 2020

To start: Join us August 25 at 4:30 PM EST for the Web Archiving Section Meeting! Since March, archivists and information professionals have been focused on documenting COVID-19 and its effects on their communities. Web archiving is at the forefront of the documentation effort. The meeting will consist of a guided discussion focusing on how archivists go about creating spontaneous and event-based collections, considering all aspects of the web archiving lifecycle, from collection development to scoping to description and access efforts. We are most interested in hearing about collecting frameworks that your institution is working on, so come discuss with us! We will also do some light section business and talk about elections. Registration link here: zoom.us/meeting/register/…

On to other news:

Publications from Stanford University Press’s digital initiative now have nearly complete web-archive versions thanks to a 2020 partnership with Webrecorder. The blog post goes into detail on technical specs, specifically ReplayWeb.page and WACZ. Webrecorder also launched a forum for discussions and announcements about the software.

Rhizome’s Conifer introduces Periphery, a tool for collection owners to define how missing resources are expressed during the replay of archived content.

The “Archiving the Black Web” project was started at the African-American Research Library and Cultural Center in Fort Lauderdale.

The Joint Conference on Digital Libraries took place August 1-5 and all of their papers are now available online. Talks related to web archiving include: Making Recommendations from Web Archives for; The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives; Identifying Documents In-Scope of a Collection from Web Archives; and The Case For Alternative Web Archival Formats To Expedite The Data-To-Insight Cycle.

From IIPC: A description of the new Robustify link service from Memento and an overview of the Danish coronavirus web collection.

Archives Unleashed was also busy! They released their 2017-2020 community report and announced a partnership with Archive-It to integrate services and provide easy-to-use, scalable tools to researchers using web archives.

See ya on Tuesday!

Web Archiving Roundup: July 2020

Time for our monthly round-up! Hope everyone is staying well and safe!

From IIPC: The Content Development Group reports on their 2019-2020 work, including the COVID-19 and Climate Change collections; Library and Archives Canada reports on their COVID-19 collecting; and what’s new from the Croatian Web Archive. IIPC is also hosting two upcoming webinars on visualizing web archives and Jupyter Notebooks.

Twitter got harder to capture :/

Archive-It announced their virtual partner meeting on October 7, along with new staff and other updates. Their next open call is July 29.

It’s all available online…until it’s not!

Video and article exploring base memes (the original meme a derivative meme is based off of) in web archives.

Only one article this month from Internet Histories: Forensic approaches to evaluating primary sources in internet history research: reconstructing early Web-based archival work (1989–1996) by James A. Hodges.

And a summary of a study reviewing WARC validation tools.

SAA is meeting virtually August 3-7!

Web Archiving Roundup: June 2020

Documenting the Now published its Archivists Supporting Activists list, a call to action to archivists and memory workers to help support activists documenting violence by police toward black people.

The IIPC blog was busy this month: the National Széchényi Library reports back on another year of web archiving; summaries of the National Library of Luxembourg, National Library of New Zealand, Bibliothèque et Archives nationales du Québec, and the Bibliothèque nationale de France’s coronavirus collecting; a blog post on the future of playback, specifically discussing pywb and OpenWayback; announcement of the IIPC training program; discussion of GLAM Workbench, which provides researchers with examples, tools, and documentation to help them explore and use the online collections of libraries, archives, and museums; exploration of previous annual meetings since this year’s meeting was cancelled; and reflections on training new web archivists.

IIPC also opened their discretionary funding program; proposals are due September 15.

The UK Web Archive blog discussed their 2019 UK General Election web archives; a case study on using Webrecorder for UK politicians’ social media pages; and WARCnet, a network that aims “to promote high-quality national and transnational research” of web domains and events on the web.

Call for contributions for the 4th RESAW Conference (June 17-18, 2021) on mainstream vs. marginal content in Web history and Web archives; submissions are due October 15, 2020.

Webrecorder changed its name to Conifer!

Internet Histories has articles on international internet governance in the 1990s; selective disembodiment in the early Wired magazine; history of cyberactivism in Brazil; efficacy of civil society in global internet governance; and Weibo.

https://replayweb.page/ launched, which allows for replay of archived websites in your browser.

Announcement of version 0.80.0 of the Archives Unleashed Toolkit and revamped user documentation.

Archive-It shared the results of their 2020 State of the WARC survey.