Pitching Web Archiving to your Institution: Where to Begin?

This post was written by guest blogger Andrea Belair, Archives and Special Collections Librarian at Union College in Schenectady, NY.

Does your organization engage in web archiving? Today, many organizations collect websites in some capacity. Over the years, I think it’s become easier to make an argument that web content should be collected. Most of us, even the least tech-savvy of us, understand that web pages are not permanent and get taken down or deleted when not maintained. Many hard lessons have already been learned from content lost overnight: a tweet deleted by its author, or a page that is no longer available on a site. The public is now regularly pointed to a website to get vital information, much to the detriment of those who might not have access to a computer; nevertheless, the website is the public record, the source of information. We have seen this recently with the COVID-19 pandemic, where most information about the disease and its vaccines has been published online.

If you intend to make a pitch to start a web archiving program at your institution, of course, you need to consider your audience: What do they care about? More importantly, though, do you belong to a public or private institution? It is likely that your organization is creating web content, and, similarly, it’s likely that this web content can be considered institutional records of your organization. 

At a former job, during a meeting about web archiving, one attendee told me that I needed to take off my records management hat and put on my archivist hat. The university archivist arrived just after that statement, running a little late. Having missed the rest of the conversation, he sat down and stated, “This is really about records management.” In terms of collecting and archiving web content, there is the collecting side, and an argument to be made there, and there is the institutional records side. It can be hard to show scholarly uses for web collecting, although now it’s an easier case to make when you look at the amazing web collections of the Library of Congress, for example. However, if you’re pitching a web archiving program, think about your stakeholders. If your stakeholders are, say, executive administration in a private college, you might want to reconsider a pitch that focuses on history, or some variation of why history is important. Stakeholders and executives know that history is important, but they consider it the job of the library to take care of that, and they are not likely to think it justifies a larger budget allocation. I recommend that, instead, you look at the requirements of your accrediting organization, if you are a private college. If you are a public college, look at your local, state, and federal regulations and mandates on records. Web pages are official documents of the institution, and they often fall within mandated records retention schedules. For private colleges, it is very possible that the accrediting organization to which your institution belongs has records requirements as well, although I am not familiar with all accrediting organizations, unfortunately. If there is a legal mandate to be met, go that route. If there is an accrediting mandate to be met, go that route.

It’s not that the case for history and its importance can never be made, however. If there is a new mission statement or vision statement at your institution, and you can connect it to your pitch, by all means: now is your chance. If there is an upcoming inauguration or similar major event, that is another opportunity, because we all want to make sure that this important event and person are written into history.

What do you get when you hand a Master’s student a web archiving program?: Part Two

The following is a guest post by Grace Moran, Graduate Web Archiving Assistant – Library Administration, University of Illinois Library

In the previous installment of this series, I explored the complementary issues of metadata and access for web archives. In this second and final installment, the more human aspects of this year’s endeavor come to the fore: policy and personnel. I will also briefly describe how the current pandemic has informed web-archiving efforts at the University of Illinois.

If you did not read the previous blog post, let me revisit my background. I am the Graduate Web Archiving Assistant, working for Dr. Christopher Prom, the Associate Dean for Digital Strategies at the University of Illinois. This year, I have been charged with participating in day-to-day activities related to web archiving such as running crawls and doing quality assurance. I also engage in high-level organizational thinking about the future stewardship of the library’s burgeoning web archiving program. The program began in 2015, and since then the University of Illinois has captured 5 TB worth of data using our Archive-It subscription. Multiple units take part in the curatorial side of this endeavor: the University Archives, the American Library Association Archives, Faculty Papers, and the International & Area Studies Library. We are hoping to start running crawls for the Illinois History & Lincoln Collections in the near future.

As I noted, this post is focused on policy and personnel. These are two areas that currently present a challenge for my institution. We do not have centralized documentation, a web archiving-specific collection development policy, or a position other than my own dedicated to web archiving. The following is what I envision the program looking like in the future and will be highlighted in my final report to my supervisor at the end of the academic year.

What does policy mean for web archiving? I believe institutional web archiving policies and procedures should be composed of the following:

  • A collection development policy unique to the institution’s web archiving program (a general organization-wide collection development policy does not suffice given the unique nature of the content being collected)
  • A clear, centralized workflow outlining how crawls are to be run, troubleshooting documentation, and chain-of-command for web archiving
  • A statement on copyright and ethics in web archiving (Niu in “An Overview of Web Archiving,” cited below, touches on copyright)

Though it may be extremely obvious to some readers, it is worth saying: policy should be public. As someone who works for a public university, I am painfully aware of the importance of policy accessibility for our stakeholders.

What about personnel? Who should be running a web-archiving program? How many people should be involved? Of course, this is something that varies from institution to institution; however, my experience has made clear the need for a dedicated point person. This person could be:

  • A graduate web archiving position, like my own, working 20 hours a week to coordinate crawls across units, run Quality Assurance, and populate metadata fields.
  • A civil service or academic professional position with at least a 50% appointment to web archiving. If an institution is looking to grow their web archiving program, they should consider making this a 100% appointment for the first couple years and then having the point person slowly transition towards additional activities related to digital strategies of the library.

I should note here that a graduate web archiving assistant is a great way to support your web archiving program (yes, I am biased), but there are some drawbacks to placing this responsibility on the shoulders of a temporary employee. If you are just beginning your program, you may find that a part-time position does not fulfill the needs of your program. Additionally, there are advantages to long-term employees who have institutional knowledge and memory and therefore understand the administrative history of digital programs within your organization. Time is lost when it is necessary to re-train someone for a position annually or bi-annually.

Side Note: I want to make clear that graduate employees are so important. They bring a fresh set of eyes to problems, and the opportunity to learn from a graduate position can be absolutely priceless for someone like me. Please consider funding your graduate students; there are a great number out there pursuing an unfunded MLIS and paying off student debt for years to come.

Finally, I want to highlight a unique opportunity given to web archiving programs this past year. COVID-19 has devastated lives the world over; it has also provided inspiration for innovation and creativity. This has been true at the University of Illinois at Urbana-Champaign. From a novel saliva-based PCR test, to rigorous testing protocols, to creating a ventilator in 12 days, the institution has tackled this problem head-on. Bethany Anderson, the Natural & Applied Sciences Archivist, and I have collaborated over the past year to run crawls to document COVID-19 at the university and celebrate what we have accomplished in one of the darkest years we have seen. To check out pages we have documented, you can visit https://archive-it.org/collections/13880. This is a great example of how web archiving allows us to document important moments now and preserve the historical record (which is increasingly electronic) for the benefit of future researchers.

I hope that you have identified with some part of this blog series; my hope is that if we create a dialogue about our triumphs and struggles, we can all learn something.

Sources Mentioned

Niu, Jinfang, “An Overview of Web Archiving” (2012). School of Information Faculty Publications. 308.

“COVID-19 Response at the University of Illinois” (2021). The Board of Trustees of the University of Illinois Urbana-Champaign. https://archive-it.org/collections/13880

For further questions, I can be reached at gmoran6@illinois.edu

What do you get when you hand a Master’s student a web archiving program?: A Two-Part Series

The following is a guest post by Grace Moran, Graduate Web Archiving Assistant – Library Administration, University of Illinois Library

Websites are fleeting; finding a way to preserve them and ensure accessibility for end-users is one of the greatest challenges facing record-keeping institutions today. The window for capturing a website is minute, with many pages disappearing within 2-4 years of their creation (See Ben-David, “2014 Not Found,” in the further reading list). This two-part guest blog series documents my efforts over this academic year to develop, standardize, and envision a future for the University of Illinois web archives collections through our subscription to Archive-It. This first post will explore the deeply related issues of metadata and access for web archives, as well as how they influence my proposal for the future of our web collections.

About me. I am a graduate student getting my MS in Library & Information Sciences at the University of Illinois, graduating this May. I have been working for my supervisor, Dr. Chris Prom (Associate Dean, Office of Digital Strategies) since the Spring semester of my senior year of my bachelor’s. Last year, he gave me the option of transitioning over from digital special collections and workflows and focusing on our young web-archiving program. This opportunity has taught me so much, and I want to share that with others. 

Now, for a bit of background on the program. The University of Illinois has been capturing web pages through Archive-It since 2015, but the program itself has been inconsistent in its stewardship. This year, my charge is the evaluation of the status of the University of Illinois subscription to Archive-It and the creation of a plan for its continued success. The culmination of the role will be a final report outlining my accomplishments for the year, programmatic needs for a web-archiving program, conversations I have had with stakeholders in the program, and recommendations for the future of web archiving at the University of Illinois. The challenges facing a burgeoning web-archiving program are numerous and varied, ranging from policy to accessibility. Preferably, by the end of the year, I will have identified opportunities for growth and laid out a plan for the future, ideally making a case for a dedicated web-archiving position when the budget allows. With 5 TB of data archived since 2015, the care and keeping of these collections is of utmost importance; the time invested by the University of Illinois Library must not be wasted.

My experience thus far has shown me that managing a web archive is no small feat. (Is this an obvious statement? Yes, but that doesn’t make it any less true.) Two of the most urgent issues I have seen are metadata and access; it is no secret that these two go hand-in-hand. The debate around metadata is not new. The OCLC Research Library Partnership Metadata Working Group identified this same problem in their 2017 report “Developing Web Archiving Metadata Best Practices to Meet User Needs” (in the further reading list below). As it stands now, we don’t have a standard for documenting archived websites. Standards such as DACS and MARC don’t fully account for the unique nature of websites, so some institutions have adopted a hybrid approach, using both archival and bibliographic metadata. The question remains whether or not a hybrid approach is sufficient. Is it necessary to create a new standard? Would this over-complicate things?

On the Archive-It platform, there are three possible levels of metadata: 

  1. Collection-level
  2. Seed-level
  3. Document-level

Our collections at the University of Illinois tend to have collection-level and seed-level metadata. This is due to a couple of factors. First, the labor of editing seed metadata takes enough hours without also creating document metadata; law of diminishing returns. Second, what is/isn’t a document is a bit difficult to understand on the Archive-It platform, making it unclear what benefit there is in creating document-level metadata. For now, the goal is to make collection- and seed-level metadata consistent across collections and make it standard practice to describe websites when they are crawled.
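As a rough illustration of what "consistent" seed-level metadata could mean in practice, the sketch below flags seeds that are missing descriptive fields. It is only a sketch under stated assumptions: the field names (`title`, `description`, `creator`), the record layout, and the sample seeds are all hypothetical, and none of it reflects Archive-It's actual export format or the University of Illinois collections.

```python
# Hypothetical required fields; a real program would take these from its own metadata policy.
REQUIRED_FIELDS = ("title", "description", "creator")


def missing_fields(seed, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank for one seed record."""
    return [name for name in required if not str(seed.get(name) or "").strip()]


def audit_seeds(seeds, required=REQUIRED_FIELDS):
    """Map each incomplete seed URL to the list of fields it still needs."""
    report = {}
    for seed in seeds:
        gaps = missing_fields(seed, required)
        if gaps:
            report[seed["url"]] = gaps
    return report


# Made-up sample records, for illustration only.
sample_seeds = [
    {
        "url": "https://example.edu/",
        "title": "Example University",
        "description": "Home page",
        "creator": "Example University",
    },
    {"url": "https://example.edu/news", "title": "News", "description": " "},
]

print(audit_seeds(sample_seeds))
```

A report like this could be run after each round of crawls, so that description happens when a website is captured rather than as a backlog project.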

Deeply tied into the issue of metadata is that of access; how do we make sure that content reaches end-users? If a tree falls in the forest…..no I’m not going to finish that; too cliché. You get the idea. My point is this: taking the time to preserve digital materials is nearly meaningless if no one gets to enjoy the fruits of the labor. 

What does access look like for University of Illinois web collections? Right now, an end-user would have to navigate to the public-facing Archive-It website and either search “University of Illinois” or stumble across one of our collections because of a keyword search. We are not in any way alone in this; institutions all over are struggling with the idea of making collections searchable.

So what’s the solution? How do we implement an access system without relying on end-users to know we participate in web archiving and have collections on a website referenced nowhere in our systems? In my final report to the library, I am going to recommend both short- and long-term solutions. Short-term, Archive-It allows subscribers to add a search bar to their discovery systems allowing them to search through their Archive-It collections. This simple, elegant, and quick solution would make a significant difference for researchers. A long-term, more involved solution would be to index all of our pages locally and work to allow full-text search of collections. In the era of Google, our users are used to full-text search; discovery systems should fit that behavior.
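To give a sense of what local full-text indexing involves at its simplest, here is a toy inverted index in Python. This is a sketch, not a proposal for a production system: it assumes the text of each archived page has already been extracted, and the URLs, page text, and function names are invented for the example. A real deployment would more likely rely on an established search engine than on hand-written code.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index mapping each word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Made-up page text standing in for the extracted text of archived pages.
pages = {
    "https://example.edu/covid": "Saliva-based testing protocols for campus",
    "https://example.edu/library": "Library hours and campus services",
}

index = build_index(pages)
print(search(index, "campus testing"))
```

The design choice here is the usual one for full-text search: do the expensive work once at indexing time so that each query is a cheap set intersection.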

If you are working with web archives, I hope that something in this post resonated with you. I hope that institutions will begin to collaborate more and find standard solutions so we can truly harness this technology to our advantage.

In my next blog post a month from now, I’ll talk about the more human side of things: policy and personnel. I also want to share a bit about our efforts to document COVID-19 at the University of Illinois. Stay safe, stay healthy, and stay tuned!

Further Reading:

  1. Belovari, S. (2017). Historians and web archives. Archivaria, 83(1), 59-79.
  2. Ben-David, A. (2019). 2014 not found: a cross-platform approach to retrospective web archiving. Internet Histories, 3(3-4), 316-342.
  3. Bragg, M., & Hanna, K. (2013). The web archiving life cycle model. Archive-It. https://archive-it.org/static/files/archiveit_life_cycle_model.pdf (visited 14.2.14).
  4. Brunelle, J. F., Kelly, M., Weigle, M. C., & Nelson, M. L. (2016). The impact of JavaScript on archivability. International Journal on Digital Libraries, 17(2), 95-117.
  5. Condill, K. (2017). The Online Media Environment of the North Caucasus: Issues of Preservation and Accessibility in a Zone of Political and Ideological Conflict. Preservation, Digital Technology & Culture, 45(4), 166.
  6. Costa, M., Gomes, D., & Silva, M. J. (2017). The evolution of web archiving. International Journal on Digital Libraries, 18(3), 191-205.
  7. Dooley, J. M., & Bowers, K. (2018). Descriptive metadata for web archiving: Recommendations of the OCLC research library partnership web archiving metadata working group. OCLC Research.
  8. Dooley, J. M., Farrell, K. S., Kim, T., & Venlet, J. (2018). Descriptive Metadata for Web Archiving: Literature Review of User Needs. OCLC Research.
  9. Dooley, J. M., Farrell, K. S., Kim, T., & Venlet, J. (2017). Developing web archiving metadata best practices to meet user needs. Journal of Western Archives, 8(2), 5.
  10. Dooley, J., & Samouelian, M. (2018). Descriptive Metadata for Web Archiving: Review of Harvesting Tools. OCLC Research.
  11. Farag, M. M., Lee, S., & Fox, E. A. (2018). Focused crawler for events. International Journal on Digital Libraries, 19(1), 3-19.
  12. Graham, P. M. (2017). Guest Editorial: Reflections on the Ethics of Web Archiving. Journal of Archival Organization, 14(3-4): 103-110.
  13. Grotke, A. (2011). Web Archiving at the Library of Congress. Computers in Libraries, 31(10), 15-19.
  14. Jones, Gina M. and Neubert, Michael (2017) “Using RSS to Improve Web Harvest Results for News Web Sites,” Journal of Western Archives: Vol. 8 : Iss. 2 , Article 3. 
  15. Littman, J., Chudnov, D., Kerchner, D., Peterson, C., Tan, Y., Trent, R., … & Wrubel, L. (2018). API-based social media collecting as a form of web archiving. International Journal on Digital Libraries, 19(1), 21-38.
  16. Maemura, E., Worby, N., Milligan, I., & Becker, C. (2018). If these crawls could talk: Studying and documenting web archives provenance. Journal of the Association for Information Science and Technology, 69(10), 1223-1233.
  17. Masanès, J. (2005). Web archiving methods and approaches: A comparative study. Library trends, 54(1), 72-90.
  18. Pennock, Maureen (2013, March). Web-Archiving: DPC Technology Watch Report. Digital Preservation Coalition.
  19. Summers, E. (2020). Appraisal Talk in Web Archives. Archivaria, 89(1), 70-102.
  20. Thomson, Sara Day (2016, February). Preserving Social Media: DPC Technology Watch Report. Digital Preservation Coalition. 
  21. Weber, M. (2017). The tumultuous history of news on the web. In Brügger N. & Schroeder R. (Eds.), The Web as History: Using Web Archives to Understand the Past and the Present (pp. 83-100). London: UCL Press. doi:10.2307/j.ctt1mtz55k.10
  22. Webster, Peter (2020). How Researchers use the Archived Web: DPC Technology Watch Guidance Note. Digital Preservation Coalition.
  23. Wiedeman, Gregory (2019) “Describing Web Archives: A Computer-Assisted Approach,” Journal of Contemporary Archival Studies: Vol. 6, Article 31. https://elischolar.library.yale.edu/jcas/vol6/iss1/31.

Web Archiving Roundup: November 2020

Welcome to a new year with the Web Archiving Section! 

The new section Steering Committee met for the first time earlier this month and we’re all very excited about the coming year! One of our goals is to make the blog more active. We’d like to invite you to participate by submitting news, announcements, and other topics of interest. If you have a topic that you’d like to expand on, we’re also looking for guest contributors. Please send items and suggestions to April Feldman. Hope to hear from you soon.

Since we’ve all shared our “brief introductions” with the section via our candidate statements we’ve decided not to repeat ourselves. Instead, in this post, we’d like to introduce all of you to our favorite web collections. Our only requirement was that it be a collection we’ve personally worked on.

Tori Maches, Digital Archivist, UC San Diego

My favorite UCSD web archive collection is our San Diego Local Governments collection. It’s one of our oldest collections, dating back to 2007, and covers local municipal and government agency websites, as well as websites related to local government activity. I grew up in San Diego, and remember interacting with some of these websites when they looked the way they did in the earliest crawls. Recontextualizing websites you used in high school as historical objects is a weird experience, and it’s a good reminder of how quickly things change online.

Melissa Wertheimer, Music Reference Specialist, Library of Congress

I’m excited to share the LC Commissioned Composers Web Archive. This was my first venture with web archives, and I’ve been curating it since 2018. I’m a flutist who specializes in contemporary repertoire, so the topic felt natural! I think web archives are perfect for a digital record of the Music Division’s active commissioning of living jazz and classical composers. This collection is a resource in itself, but also leads users to our unique collection materials through abstracts with links to finding aids for composers’ papers and online catalog records for commissions’ manuscripts and electronic files.

Ryder Kouba, Collections Archivist, University of Hong Kong (Pok Fu Lam)

The Egyptian Politics and Revolution Collection (started by Stephen Urgola and Carolyn Runyon) documented Egyptian politics from 2011-present day (though politics have been dead since 2014 or so). As many archivists have written about collecting in the US, concerns over privacy were paramount, though difficult in a (for a time) fluid situation. Website blocking and censorship also made our work more difficult in finding content, but more important in providing access to blocked websites through the Wayback Machine (which was temporarily blocked in Egypt; logically, we had concerns over exactly why the government decided to reverse that decision).

April Feldman, University Archivist, California State University, Northridge

We don’t have an active web-archiving program per se, so I haven’t worked on a favorite collection yet. We’re trying to implement a starter program, something small and scalable. In the meantime, we’ve just started using the Wayback Machine to capture limited Cal State Northridge websites and Wakelet for websites and social media posts related to CSUN’s Covid-19 response. I’m a project team of one right now, trying to figure it out as I go (love that OJT!) I’m completely open to suggestions or tales of woe if anyone wants to share.

Kiera Sullivan, University Records Processor, UC San Diego

The collection that I would like to highlight is the UC San Diego Web Archives collection. This is perhaps unsurprising, considering my role in the University Archives! This large collection captures a wealth of information from websites across the campus administration and community. Part of what I love about this collection is that although it is already extensive, there is a ton of room to grow and refine. This year, I will be working on identifying gaps in our collecting of student and community content, scoping crawls to address those gaps, and also improving the existing metadata across the collection.

Allison Fischbach, Research and Archives Associate, Towson University / MLIS student in Archives and Digital Curation, University of Maryland iSchool

The collection I am most proud of is our COVID-19 University Response Collection. This is the first web collection I have curated, and it is especially pertinent as the pandemic continues. What I like about this collection is that it includes both valuable information about how institutional operations responded to COVID-19, as well as how the community continues to enact positive change. It is a collection that continues to grow, and one I know will have monumental value to future users. I feel fortunate to be able to work with it as a student archivist.

Web Archiving Roundup: August 2020

To start: Join us August 25 at 4:30 PM EST for the Web Archiving Section Meeting! Since March, archivists and information professionals have been focused on documenting COVID-19 and its effects on their communities. Web archiving is at the forefront of the documentation effort. The meeting will consist of a guided discussion focusing on the methods in which archivists go about creating spontaneous and event-based collections, considering all aspects of the web archiving lifecycle, from collection development to scoping to description and access efforts. We are most interested in hearing about collecting frameworks that your institution is working on, so come discuss with us! We also will do some light section business and talk about elections. Registration link here: zoom.us/meeting/register/…

On to other news:

Publications from Stanford University Press’s digital initiative now have nearly complete web-archive versions thanks to a 2020 partnership with Webrecorder. The blog post goes into detail on technical specs, specifically ReplayWeb.page and WACZ. Webrecorder also launched a forum for discussions and announcements about the software.

Rhizome’s Conifer introduces Periphery, a tool for collection owners to define how missing resources are expressed during the replay of archived content.

The “Archiving the Black Web” project was started at the African-American Research Library and Cultural Center in Fort Lauderdale.

The Joint Conference on Digital Libraries took place August 1-5 and all of their papers are now available online. Talks related to web archiving include Making Recommendations from Web Archives for; The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives; Identifying Documents In-Scope of a Collection from Web Archives; and The Case For Alternative Web Archival Formats To Expedite The Data-To-Insight Cycle.

From IIPC: A description of the new Robustify link service from Memento and an overview of the Danish coronavirus web collection.

Archives Unleashed was also busy! They released their 2017-2020 community report and announced their partnership with Archive-It to collaborate and integrate services to provide easy-to-use, scalable tools to researchers using web archives.

See ya on Tuesday!





Web Archiving Roundup: July 2020

Time for our monthly round-up! Hope everyone is staying well and safe!

From IIPC: The Content Development Group reports on their 2019-2020 work, including the COVID-19 and Climate Change collections; Library and Archives Canada reports on their COVID-19 collecting; and what’s new from the Croatian Web Archive. IIPC is also hosting two upcoming webinars on visualizing web archives and Jupyter Notebooks.

Twitter got harder to capture :/

Archive-It announced their virtual partner meeting on October 7, along with new staff and other updates. Their next open call is July 29.

It’s all available online…until it’s not!

Video and article exploring base memes (the original meme a derivative meme is based off of) in web archives.

Only one article this month from Internet Histories: Forensic approaches to evaluating primary sources in internet history research: reconstructing early Web-based archival work (1989–1996) by James A. Hodges.

And a summary of a study reviewing WARC validation tools.

SAA is meeting virtually August 3-7!

Web Archiving Roundup: June 2020

Documenting the Now published its Archivists Supporting Activists list, a call to action to archivists and memory workers to help support activists documenting violence by police toward black people.

The IIPC blog was busy this month: National Széchényi Library reports back on another year of web archiving; summaries of the National Library of Luxembourg, National Library of New Zealand, Bibliothèque et Archives nationales du Québec, and the Bibliothèque nationale de France’s coronavirus collecting; a blog post on the future of playback, specifically discussing pywb and OpenWayback; announcement of the IIPC training program; discussion of GLAM Workbench, which provides researchers with examples, tools, and documentation to help them explore and use the online collections of libraries, archives, and museums; exploration of previous annual meetings since this year’s meeting was cancelled; and reflections on training new web archivists.

IIPC also opened their discretionary funding program; proposals are due September 15.

The UK Web Archive blog discussed their 2019 UK General Election web archives; a case study on using Webrecorder for UK politicians’ social media pages; and WARCnet, a network whose aim “is to promote high-quality national and transnational research” of web domains and events on the web.

Call for contributions for the 4th RESAW Conference on June 17-18, 2021 on mainstream vs marginal content in Web history and Web archives due October 15, 2020.

Webrecorder changed its name to Conifer!

Internet Histories has articles on international internet governance in the 1990s; selective disembodiment in the early Wired magazine; history of cyberactivism in Brazil; efficacy of civil society in global internet governance; and Weibo.

https://replayweb.page/ launched, which allows for replay of archived websites in your browser.

Announcement of version 0.80.0 of the Archives Unleashed Toolkit and revamped user documentation.

Archive-It shared the results of their 2020 State of the WARC survey.


Web Archiving Roundup: May 2020

Thank you to everyone who sent me their COVID-19 web archiving projects. You can view the list in our last blog post. I am still collecting submissions, so feel free to shoot me an email at nmg266 at nyu dot edu.

On to the roundup:

From the IIPC blog, the folks at Bibliotheca Alexandrina (BA) and the National Library of New Zealand (NLNZ) share an update on their tool for scalable web archive visualization: LinkGate.

Archives Unleashed released their Spring newsletter! The newsletter features information about the Archives Unleashed Toolkit, Cloud, and Notebooks; a summary of the remote New York datathon; web archiving articles and resources for use during the COVID-19 pandemic; and other presentations from the team.

Archive-It and NYARC share the national forum report on Advancing Art Libraries and Curated Web Archives, including the recording of the 2020 ARLIS/NA Web Archiving SIG Meeting.

Archive-It has also written a number of blog posts and updates, given the uptick in web archiving due to the pandemic. These updates give information on special pricing and cost sharing, resources, community news on COVID-19 web archiving projects, as well as introductory programming for people new to web archiving.

The British Library released its Brexit collection and described their work on COVID-19 collecting and how to use the collections while the reading room is closed.

There are a number of new articles on using web archives as primary resources, including Helena Byrne on using web archives to review football history, Will Mari on the racial, gendered, and class-based origins of the early internet, and Michael Stevenson & Anne Helmond on legacy systems.

The Digital Preservation Coalition has a number of posts related to web archiving and the coronavirus pandemic, including If These WARCs Could Talk: Learning from Archived Web & Social Media Covid-19 Collections and Capturing the UK Government Response to the Coronavirus (COVID-19) Pandemic at The National Archives UK. There is also a nice write-up on how researchers use the archived web.

I presented at METRO with Mark Graham (Internet Archive), Gary Price (INFOdocket), and Alexander Thurman (Columbia University) on Documenting the Present Moment.

Is the WARC the best web archiving format?

The Library of Congress web archiving team celebrated its 20th anniversary! They also posted an interview with a retiring web archivist, Gina Jones. The LoC web archiving team also was featured in the New York Times.

Presentations from the WARCnet kickoff meeting can be found on the WARCnet website. The aim of the WARCnet network is to promote high-quality national and transnational research that will help us to understand the history of (trans)national web domains and of transnational events on the web, drawing on the increasingly important digital cultural heritage held in national web archives.

Version 2.3 of Social Feed Manager was released!

The International Journal of Digital Humanities is putting out a call for a special issue on digital humanities and web archives. Abstracts are due June 1, 2020.

Web Archiving and Digital Libraries workshop as part of JCDL 2020 is looking for submissions. Submissions are due on June 6, 2020.

The race to save the first draft of coronavirus history from internet oblivion in the MIT Technology Review.

Wow, that was a lot. Thanks to my colleague Amy for helping me with this roundup. Stay healthy everyone!

Web Archiving Round-up: COVID-19 Edition

As part of an initiative of the section, I have worked to compile a list of projects started by archivists and librarians across the country in response to the COVID-19 pandemic. As of this writing, over 40 institutions have submitted their work to the spreadsheet. The work ranges from international efforts to capture websites associated with COVID-19 to localized efforts from institutions across the United States and Canada. Archivists and librarians have used many different tools to do this work, including Archive-it, Webrecorder, wget, twarc, and other custom tools. The work that these folx have done is both important and emotionally challenging, and I want to acknowledge this labor. People are powering this effort. As much as we think this work is automatic, it involves a significant amount of human labor to select, appraise, assess the websites for quality assurance, and adequately describe these materials. I also want to acknowledge all essential workers during this time of crisis. Without them, I would not be able to continue to do this work from the safety of my home in New York.

Click here to view the COVID-19 Web Archives Spreadsheet.

As of right now, the spreadsheet is only open to comment. Do you have a collecting project that you want to be included in the spreadsheet? Shoot me an email at nicole dot greenhouse at nyu dot edu.

Stay safe, stay healthy.


Web Archiving Roundup: March 2020

In addition to the IIPC’s collaborative collecting project on the coronavirus, send me your institutionally curated COVID-19 web archiving projects. I will compile them and post them to the section website. My email address is nmg266 at nyu dot edu.

We are also looking for proposals for participation in the Web Archiving Section’s annual section meeting on the theme of “web archiving and rights.”

Multiple web archiving events have been postponed due to the coronavirus. Archives Unleashed has changed to a remote-only event. The IIPC Web Archiving Conference and General Assembly has been postponed until September 28-30, 2020. Engaging with Web Archives is also postponed.

Archive-it is hosting a webinar on April 1 on the Wayback API. And the Community Webs project received additional funding! Other Archive-it news can be found on their blog.

In addition to the establishment of the coronavirus web archive, the IIPC has multiple announcements, including the newly elected steering committee member organizations and a published address from the IIPC Chair on IIPC funded projects, consortium agreement renewals, collaboration with CLIR, training materials, and collaborative collecting. The IIPC blog also featured a guest post from the National Library of Australia on their migration to Python Wayback.

A new version of MemGator was released!

Happy 15th birthday to the UK Web Archive!

And lastly, here is a GitHub repository archiving individual stories from Wuhan during the coronavirus pandemic.

Stay home, wash your hands, stay healthy, and archive websites!