IIPC 2017 – Day Two | Library Innovation Lab

Most of us attended the technical track on day two of IIPC 2017. (See also Matt’s post about the first day.) Andrew Jackson of the British Library expanded on his talk from the previous day about workflows for ingesting and processing web archives. Nick Ruest and Ian Milligan described WALK, or Web Archiving for Longitudinal Knowledge, a system for aggregating Canadian web archives, generating derivative products, and making them accessible via search and visualizations. Gregory Wiedeman from University at Albany, SUNY, described his process for automating the creation of web archive records in ArchivesSpace and adding descriptive metadata using Archive-It APIs according to DACS (Describing Archives: A Content Standard).

After the break, the Internet Archive’s Jefferson Bailey roared through a presentation of IA’s new tools, including systems for analysis, search, capture (Brozzler), and availability. Mat Kelly from Old Dominion University described three tools for enabling non-technical users to create, index, and view web archives: WARCreate, WAIL, and Mink. Lozana Rossenova and Ilya Kreymer of Rhizome demonstrated the use of containerized browsers for playback of web content that is no longer usable in modern browsers (think Java applets), as well as some upcoming features in Webrecorder for patching content into incomplete captures.

Following lunch, Fernando Melo and João Nobre from Arquivo.pt described their new APIs for search and temporal analysis of Portuguese web archives. Nicholas Taylor of Stanford University Libraries talked about the ongoing rearchitecture of LOCKSS (Lots of Copies Keep Stuff Safe), expanding its role from a focus on the archiving of electronic journals to a tool for preserving web archives and other digital objects more generally. (In the Q&A, LOCKSS founder David Rosenthal mentioned the article “Familiarity breeds contempt: the honeymoon effect and the role of legacy code in zero-day vulnerabilities”.) Jefferson Bailey returned, along with Naomi Dushay, also from the Internet Archive, to talk about WASAPI (the Web Archiving Systems API) for transfer of data between archives.

After another break, LIL’s own Jack Cushman took the stage with Ilya Kreymer for a fantastic presentation of warc.games, a tool for exploring security issues in web archives: serving a captured web page is very much akin to hosting attacker-supplied content, and warc.games provides a series of challenges for trying out different kinds of attacks against a simplified local web archive. Mat Kelly then returned with David Dias of Protocol Labs to discuss InterPlanetary Wayback, which stores web archive files in IPFS, the InterPlanetary File System. Finally, Andrew Jackson wrapped up the session by leading a discussion of planning for an IIPC hackathon or other mechanism for gathering to code.

Thanks, all, for another excellent day!