Git physical

This is a guest blog post by our summer fellow Miglena Minkova.

Last week at LIL, I had the pleasure of running a pilot of git physical, the first part of a series of workshops aimed at introducing git to artists and designers through creative challenges. In this workshop I focused on covering the basics: three-tree architecture, simple git workflow, and commands (add, commit, push). These lessons were fairly standard but contained a twist: the whole thing was completely analogue!

The participants, a diverse group of fellows and interns, engaged in a simplified version control exercise. Each participant was tasked with designing a postcard about their summer at LIL. Following basic git workflow, they took their designs from the working directory, through the staging index, to the version database, and to the remote repository where they displayed them. In the process they “pushed” five versions of their postcard design, each accompanied by a commit note. Working in this way allowed them to experience the workflow in a familiar setting and learn the basics in an interactive and social environment. By the end of the workshop everyone had ideas on how to implement git in their work and was eager to learn more.
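For readers curious what the participants were acting out, here is a minimal on-screen version of that same workflow. This is my own sketch, not part of the workshop materials; the file name and commit message are invented, and the push is commented out because it needs a configured remote (in the room, the "remote" was a wall display).

```shell
mkdir -p postcard
git -C postcard init -q
echo "summer at LIL, v1" > postcard/postcard.txt
git -C postcard add postcard.txt                  # working directory -> staging index
git -C postcard -c user.name=Fellow -c user.email=fellow@example.com \
    commit -q -m "postcard v1"                    # staging index -> version database
# git -C postcard push origin main                # version database -> remote repository
git -C postcard log --oneline                     # one "version" now recorded
```

Repeating the add/commit cycle five times, once per postcard design, mirrors the five paper "pushes" each participant made.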

Timelapse gif by Doyung Lee

Not to mention some top-notch artwork was created.

The workshop was followed by a short debriefing session and Q&A.

Check GitHub for more info.

Alongside this overview, I want to share some of the thinking that went on behind the scenes.

Starting with some background. Artists and designers perform version control in their work, but in a much different way than developers do with git. They often use error-prone strategies to track document changes, such as saving files in multiple places using obscure file naming conventions, working in large master files, or relying on built-in software features. At best these strategies result in inconsistencies, duplication, and bloated disk storage; at worst, irreversible mistakes, loss of work, and multiple conflicting documents. Despite experiencing some of the same problems as developers, artists and designers are largely unfamiliar with git (exceptions exist).

The impetus for teaching artists and designers git was my personal experience with it. I had not been formally introduced to the concept of version control or git through my studies, nor my work. I discovered git during the final year of my MLIS degree, when I worked with an artist to create a modular open source digital edition of an artist’s book. This project helped me see git as a ubiquitous tool with versatile application across multiple contexts and practices, the common denominator of which is making, editing, and sharing digital documents.

I realized that I was faced with a challenge: How do I get artists and designers excited about learning git?

I used my experience as a design-educated digital librarian to create relatable content and tailor delivery to the specific characteristics of the audience: highly visual, creative, and non-technical.

Why create another git workshop? There are, after all, plenty of good quality learning resources out there and I have no intention of reinventing the wheel or competing with existing learning resources. However, I have noticed some gaps that I wanted to address through my workshop.

First of all, I wanted to focus on accessibility and have everyone start on equal ground, with no prior knowledge or technical skills required. Even the simplest beginner-level tutorials and training materials rely heavily on technology and the CLI (Command Line Interface) as a way of introducing new concepts. Notoriously intimidating for non-technical folk, the CLI seems inevitable given the fact that git is a command line tool. The inherent expectation of using technology to teach git means that people need to learn the architecture, terminology, workflow, commands, and the CLI all at the same time. This seems ambitious and a tad unrealistic for an audience of artists and designers.

I decided to put the technology on hold and combine several pedagogies to leverage learning: active learning, learning through doing, and project-based learning. To contextualize the topic, I embedded elements of the practice of artists and designers by including an open-ended creative challenge to serve as a trigger and an end goal. I toyed with different creative challenges using deconstruction, generative design, and surrealist techniques. However, these seemed to steer away from the main goal of the workshop. They also made it challenging to narrow down the scope, especially as I realized that no single workflow can embrace the diversity of creative practices. In the end, I chose to focus on versioning a combination of image and text in a single document. This helped to define the learning objectives and cover only one functionality: the basic git workflow.

I considered it important to introduce concepts gradually in a familiar setting, using analogue means to visualize black-box concepts and processes. I wanted to employ abstraction to present the git workflow in a tangible, easily digestible, and memorable way. To achieve this, the physical environment and setup were crucial for the delivery of the learning objectives. In terms of designing the workspace, I assigned and labelled different areas of the space to represent the components of git’s architecture. I made use of directional arrows to illustrate the workflow sequence alongside the commands that needed to be executed, and used a “remote” as a way of displaying each version on a timeline. Low-tech or no-tech solutions such as carbon paper were used to make multiple copies. It took several experiments to get the sketchpad layering right, especially as I did not want to introduce manual redundancies that do little justice to git.

Thinking over the audience interaction, I had considered role play and collaboration. However, these modes did not enable each participant to go through the whole workflow and fell short of addressing the learning objectives. Instead, I provided each participant with initial instructions to guide them through the basic git workflow and repeat it over and over again using their own design work. The workshop was followed by a debriefing which articulated the specific benefits for artists and designers, outlined use cases depending on the type of work they produce, and featured some existing examples of artwork done using git. This was to emphasize that the workshop did not offer a one-size-fits-all solution, but rather a tool that artists and designers can experiment with and adopt in many different ways in their work.

I want to thank Becky and Casey for their editing work.

Going forward, I am planning to develop a series of workshops introducing other git functionality such as basic merging and branching, diff-ing, and more, and tag a lab exercise to each of them. By providing multiple ways of processing the same information I am hoping that participants will successfully connect the workshop experience and git practice.

AALL 2017: The Caselaw Access Project Hits Austin

Members of the LIL team including Adam, Anastasia, Brett and Caitlin visited Texas this past weekend to participate in the American Association of Law Libraries Conference in Austin. Tacos were eaten, talks were given (and attended) and friends were made over additional tacos.

Brett and Caitlin had the chance to meet dozens of law librarians, court staff and others while manning the table in the main hall:

On Monday Adam and Anastasia presented “Case Law as Data: Making It, Sharing It, Using It”, discussing the CAP project and exploring ways to use the new legal data the project is surfacing.

After their presentation they asked those that attended for ideas on ways to use the data and received an incredible response — over 60 ideas were tossed out by those there!

This year’s AALL was a hot spot of good ideas, conversation and creative thought. Thanks AALL and inland Texas!

A Million Squandered: The “Million Dollar Homepage” as a Decaying Digital Artifact

In 2005, British student Alex Tew had a million-dollar idea. He launched a website that presented initial visitors with nothing but a 1000×1000 canvas of blank pixels. At a cost of $1/pixel, visitors could permanently claim 10×10 blocks of pixels and populate them however they’d like. Pixel blocks could also be embedded with URLs and tooltip text of the buyer’s choosing.

The site took off, raising a total of $1,037,100 (the last 1,000 pixels were auctioned off for $38,100). Its customers and content demonstrate a massive range of variation, from individuals bragging about their disposable income to payday loan companies and media promoters. Some purchased minimal 10×10 blocks, while others strung together thousands of pixels to create detailed graphics. The biggest graphic on the page, a chain of pixel blocks purchased by a seemingly defunct domain, contains $10,800 worth of pixels.

The largest graphic on the Million Dollar Homepage

While most of the graphical elements on the Million Dollar Homepage are promotional in nature, it seems safe to say that the buying craze was motivated by a deeper fixation on the site’s perceived importance as a digital artifact. A banner at the top of the page reads “Own a Piece of Internet History,” a fair claim given the coverage that it received in the blogosphere and in the popular press. To buy a block of pixels was, in theory, to leave one’s mark on a collective accomplishment reflective of the internet’s enormous power to connect people and generate value.

But to what extent has this history been preserved? Does the Million Dollar Homepage represent a robust digital artifact 12 years after its creation, or has it fallen prey to the ephemerality common to internet content? Have the forces of link rot and administrative neglect rendered it a shell of its former self?

The Site

On the surface, there is little amiss with the site: its landing page retains its early 2000’s styling, save for an embedded Twitter link in the upper left corner. The (now full) pixel canvas remains intact, saturated with the eye-melting color palettes of an earlier internet era. Overall, the site’s landing page gives the impression of having been frozen at the time of its completion.

A screenshot of the Million Dollar Homepage captured in July of 2017

However, efforts to access the other pages linked on the site’s navigation bar return unformatted 404 messages. The “contact me” link redirects to the creator’s Twitter page. It seems that the site has been stripped of its functional components, leaving little but the content of the pixel canvas itself.

Still, the canvas remains a largely intact record of the aesthetics and commercialization patterns of the internet circa 2005. It is populated by pixelated representations of clunky fonts, advertisements for sketchy looking internet gambling sites, and promises of risqué images. Many of the pixel blocks bear a familial resemblance to today’s clickbait banner ads, with scantily clothed models and promises of free goods and content. Of course, this eye-catching pixel art serves a specific purpose: to get the user to click, redirecting to a site of the buyer’s choosing. What happens when we do?

Internet links are not always permanent. As pages are deleted or renamed, backends are restructured, and domain namespaces change hands, previously reachable content and resources can be replaced by 404 pages. This “link rot” is the target of the Library Innovation Lab’s project, which allows individuals and institutions to create archived snapshots of webpages hosted at trustable, static URLs.

Over the decade or so since the Million Dollar Homepage sold its last pixel, link rot has ravaged the site’s embedded links. Of the 2,816 links that were embedded on the page (accounting for a total of 999,400 pixels), 547 are entirely unreachable at this time. A further 489 redirect to a different domain or to a domain resale portal, leaving 1,780 reachable links. Most of the domains to which these links correspond are for sale or devoid of content.
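The post doesn't include the script behind this tally, but the classification logic can be sketched. The function below is my own illustration, not the author's code: it buckets each link into the three categories used above, given the HTTP status (or `None` on a connection error) and the URL the request finally landed on. An actual audit would fetch each URL (e.g. with `urllib.request`, following redirects) to obtain those two values.

```python
from urllib.parse import urlsplit

def classify(original_url, status, final_url):
    """Bucket a link the way the post's tally does:
    'unreachable' (no response or an HTTP error),
    'redirect'    (landed on a different domain),
    'reachable'   (answered from its original domain)."""
    if status is None or status >= 400:
        return "unreachable"
    if urlsplit(final_url).hostname != urlsplit(original_url).hostname:
        return "redirect"
    return "reachable"

# e.g. a dead gambling site:   classify(url, None, "")        -> "unreachable"
# e.g. a domain resale portal: classify(url, 200, parked_url) -> "redirect"
```

Counting `classify()` results over all 2,816 embedded links would reproduce the 547 / 489 / 1,780 split reported above.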

A visualization of link rot in the Million Dollar Homepage. Pixel blocks shaded in red link to unreachable or entirely empty pages, blocks shaded in blue link to domain redirects, and blocks shaded in green are reachable (but are often for sale or have limited content) [Note: this image replaces a previous image which was not colorblind-safe]

The 547 unreachable links are attached to graphical elements that collectively take up 342,000 pixels (face value: $342,000). Redirects account for a further 145,000 pixels (face value: $145,000). While it would take a good deal of manual work to assess the reachable pages for content value, the majority do not seem to reflect their original purpose. Though the Million Dollar Homepage’s pixel canvas exists as a largely intact digital artifact, the vast web of sites which it publicizes has decayed greatly over the course of time.

The decay of the Million Dollar Homepage speaks to a pressing challenge in the field of digital archiving. The meaning of a digital artifact to a viewer or researcher is often dependent on the accessibility of other digital artifacts with which it is linked or otherwise networked — a troubling proposition given the inherent dynamism of internet links and addresses. The process of archiving a digital object does not, therefore, necessarily end with the object itself.

What, then, is to be done about the Million Dollar Homepage? While it has clear value as an example of the internet’s ever-evolving culture, emergent potential, and sheer bizarreness, the site reveals itself to be little more than an empty directory upon closer inspection. For the full potential of the Million Dollar Homepage as an artifact to be realized, the web of sites which it catalogues would optimally need to be restored as it existed when the pixels were sold. Given the existence of powerful and widely accessible tools such as the Wayback Machine, this kind of restorative curation may well be within reach.

LIL Talks: Comedy

This is a guest post by our LIL interns — written by Zach Tan with help from Anna Bialas and Doyung Lee

This week, LIL’s resident comic (and staff member) Brett Johnson taught a room full of LIL staff, interns, and fellows the finer intricacies of stand-up comedy, including the construction of a set, joke writing, and the challenges and high points of the craft.

As one example, Brett showed and broke down multiple jokes into the core structure of setup and punch line (or, platform and dismount) for analysis. We were also given insight into an industry where we often take for granted the sheer amount of work, honing, and refining that goes into a set.

We also explored what it meant to be a comic, and how the immediacy of audience reaction and enjoyment means that stand-up comedy is one of the only art forms with an extremely evident (and sometimes brutal) line between success and failure.

Though the talk was littered with choice jokes and funny bits, we definitely came away with a refreshing look into some aspects of stand-up comedy that rarely get noticed.

At IIPC

At IIPC last week, Jack Cushman (LIL developer) and Ilya Kreymer (former LIL summer fellow) shared their work on security considerations for web archives, including a sandbox for developers interested in exploring web archive security.

Slides: repo:

David Rosenthal of Stanford also has a great write-up on the presentation.

IIPC 2017 – Day Three

On day three of IIPC 2017 (day 1, day 2), we heard more about what I see as the two main themes of the conference: archives users and metadata for provenance.

On the user front, I’ll point out Sumitra Duncan’s talk on NYARC Discovery; like WALK, presented yesterday, this project aggregates search across multiple archives, improving access for users. Peter Webster of Webster Research & Consulting and Chris Fryer from the Parliamentary Archives spoke about their study of the archive’s users: the questions of what users want and need, and how they actually use the archive, are fundamental. How we think archives should or could be used may not be as pertinent as we imagine…

On the metadata front, Emily Maemura and Nicholas Worby from the University of Toronto spoke about the ways in which documentation and curatorial process affect users’ experience of and access to archives — the staffing history of a collecting organization, for example, could be an important part of understanding why a web archive contains what it does. Jackie Dooley (OCLC Research), Alexis Antracoli (Princeton University), and Karen Stoll Farrell (Frick Art Reference Library) presented their work on developing web archiving metadata best practices to meet user needs — and it becomes clear that my two main themes could really be seen as one. OCLC Research will issue their reports in July.

I’ll also point out Nicholas Taylor’s excellent talk on the legal use cases for archives, and, of course, LIL’s Anastasia Aizman and Matt Phillips, who gave a super talk on their ongoing work on comparing web archives. Thanks again, and hope to see you all next year!

IIPC 2017 – Day Two

Most of us attended the technical track on day two of IIPC 2017. (See also Matt’s post about the first day.) Andrew Jackson of the British Library expanded on his talk the previous day about workflows for ingesting and processing web archives. Nick Ruest and Ian Milligan described WALK, or Web Archiving for Longitudinal Knowledge, a system for aggregating Canadian web archives, generating derivative products, and making them accessible via search and visualizations. Gregory Wiedeman from University at Albany, SUNY, described his process for automating the creation of web archive records in ArchivesSpace and adding descriptive metadata using Archive-It APIs according to DACS (Describing Archives: A Content Standard).

After the break, the Internet Archive’s Jefferson Bailey roared through a presentation of IA’s new tools, including systems for analysis, search, capture (Brozzler), and availability. Mat Kelly from Old Dominion University described three tools for enabling non-technical users to create, index, and view web archives: WARCreate, WAIL, and Mink. Lozana Rossenova and Ilya Kreymer of Rhizome demonstrated the use of containerized browsers for playback of web content that is no longer usable in modern browsers (think Java applets), as well as some upcoming features in Webrecorder for patching content into incomplete captures.

Following lunch, Fernando Melo and João Nobre described their new APIs for search and temporal analysis of Portuguese web archives. Nicholas Taylor of Stanford University Libraries talked about the ongoing rearchitecture of LOCKSS (Lots of Copies Keep Stuff Safe), expanding its role from a focus on the archiving of electronic journals to a tool for preserving web archives and other digital objects more generally. (In the Q&A, LOCKSS founder David Rosenthal mentioned the article “Familiarity breeds contempt: the honeymoon effect and the role of legacy code in zero-day vulnerabilities”.) Jefferson Bailey returned, along with Naomi Dushay, also from the Internet Archive, to talk about WASAPI (the Web Archiving Systems API) for transfer of data between archives.

After another break, LIL’s own Jack Cushman took the stage with Ilya Kreymer for a fantastic presentation of a tool for exploring security issues in web archives: serving a captured web page is very much akin to hosting attacker-supplied content, and the tool provides a series of challenges for trying out different kinds of attacks against a simplified local web archive. Mat Kelly then returned with David Dias of Protocol Labs to discuss InterPlanetary Wayback, which stores web archive files in IPFS, the InterPlanetary File System. Finally, Andrew Jackson wrapped up the session by leading a discussion of planning for an IIPC hackathon or other mechanism for gathering to code.

Thanks, all, for another excellent day!

IIPC 2017 – Day One

It’s exciting to be back at IIPC this year to chat about web archives!

The conference kicked off on Wednesday, June 14, at 9:00 with coffee, snacks, and familiar faces from all parts of the world. Web archives bring us together physically!

So many people to meet. So many collaborators to greet!

Jane Winters and Nic Taylor welcomed us. It’s wonderful to converse and share in this space — grand, human, bold, warm, strong. Love the Senate House at University of London. Thank you so much for hosting us!

Leah Lievrouw, UCLA

Web history and the landscape of communication/media research

Leah told us that computers are viewed today as a medium — as human communication devices. This view is common now, but it hasn’t been for long. Computers as a medium was a very fringe idea even in the early 80s.

We walked through a history of communications to gain more understanding of computers as human communication devices and started with some history of information organization and sharing.

Paul Otlet pushed efforts forward to organize all of the world’s information in late 19th-century Belgium and France.

The Cold War Intellectuals by J. Light describes how networked information moved from the government and the military to the public.

And, how that networked information became interesting when it was push and pull — send an email and receive a response, or send a message on a UNIX terminal to another user and chat. Computers are social machines, not just calculating machines. Leah took us through how the internet and early patterns of the web were formed by the time and the culture — in this case, the incredible activity of Stanford and Berkeley, the milieu of the Bay Area: bits and Boolean logic through psychedelics. Fred Turner’s From Counterculture to Cyberculture is a fantastic read on this scene.

Stewart Brand, Ted Nelson, the WELL online community, and so on.

We’re still talking about way before the web here. The idea of networked information was there, but we didn’t have a protocol (http) or a language (html) being used (web browser) at large scale (the web). Wired Cities by Dutton, Blumer, Kraemer sounds like a fantastic read to understand how mass wiring/communication made a massive internet/web a possibility!

The Computer as Communication Device, described by J.C.R. Licklider and Bob Taylor, was a clear vision of the future — we’re still not at a place where computers understand us as humans; we’re still fairly rigid, with defined request-and-response patterns.

The web was designed to access and create docs, that’s it. Early search engines and browsers exchanged discrete documents — we thought about the web as discrete, linked documents.

Then, user generated content came along — wikis, blogs, tagging, social network sites. Now it’s easy for lots of folks to create content, and the network is even more powerful as a communication tool for many people!

The next big phase came with mobile — around the mid-2000s. More and more and more people! Data subject (data cloud or data footprint) is an approach that has felt interesting recently at UCLA. Maybe it’s real-time “flows” rather than “stacks” of docs or content.

Technology as cultural material and material culture.

University of London is a fantastic space!

Jefferson Bailey, Internet Archive

Advancing access and interface for research use of web archives

Internet Archive is a massive archive! 32 Petabytes (with duplications)

And, they have search APIs!!

Holy smokes!!! Broad access to wayback without a URL!!!!!!!

IA has been working on a format called WAT. It’s about 20-25% the size of a WARC and contains just about everything (including title, headers, links) except the content. And, it’s a JSON format!
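To make that concrete: a WAT record's payload is a JSON object describing one capture. The sample below is invented (real WAT records live inside a WARC container and carry many more fields), but the nesting follows the published WAT layout, so this is roughly what pulling a page title and outlinks out of one record looks like.

```python
import json

# A toy WAT-style payload; field names follow the WAT layout,
# but this particular record is made up for illustration.
raw = '''{"Envelope": {
  "WARC-Header-Metadata": {"WARC-Target-URI": "http://example.com/"},
  "Payload-Metadata": {"HTTP-Response-Metadata": {
    "HTML-Metadata": {
      "Head": {"Title": "Example Domain"},
      "Links": [{"url": "http://example.com/about", "text": "About"}]}}}}}'''

def title_and_links(record):
    """Extract the page title and outlink URLs from one WAT record."""
    html = (record["Envelope"]["Payload-Metadata"]
                  ["HTTP-Response-Metadata"]["HTML-Metadata"])
    return html["Head"]["Title"], [l["url"] for l in html.get("Links", [])]

title, links = title_and_links(json.loads(raw))
```

Because all the structure survives but the page content doesn't, you get the 20-25% size savings while keeping everything needed for link-graph and metadata experiments.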

Fun experiments happen when you have tons of web archives!!! US Military powerpoints are among the gems!

Digital Desolation, Tatjana Seitz

A story about a homepage can be generated using its layout elements (tables, fonts, and so on). Maybe the web counter and the alert box mark the page in time and can be used to understand the page!

Analysis of data capture cannot be purely technical, has to be socio-technical.

Digital desolation is a term that describes abandoned sites on the web. Sites that haven’t been restyled. Sites age over time. (Their wrinkles are frames and tables !!?? lol)

Old sites might not bubble to the top in today’s search engines — they’re likely at the long tail of what is returned. You have to work to find good old pages.

The team grabbing some morning coffee

Ralph Schroeder, Oxford Internet Institute

Web archives and theories of the web

Ralph is looking at how information is used and pursued. How do you seek information? Not many people ask this core question. Some interesting researcher (anyone know?) in Finland does, though. He sits down with folks and asks “how do you think about getting information when you’re just sitting in your house? How does your mind seek information?”

Googlearchy — a few sites exist that dominate!

You can look down globally at which websites dominate the attention space. The idea that we’d all come together in a one global culture, that hasn’t happened yet — instead, there’s been a slow crystallization of different clusters.

It used to be an anglo-ization of the web; now things may have moved toward South Asia — Angela Wu talks about this.

Some measurements show that Americans and Chinese users devote their attention to about the same bubble of websites — it might be that Americans are no more outward looking than the Chinese.

We need a combined quantitative and qualitative study of web attention — we don’t access the web by typing in a URL (unless you’re at the Internet Archive); we go to Google.

It’s hard to know about internet as a human right.

Maybe having reliable information about health could be construed as civil rights.

And unreliable, false information goes against human rights.

London is a delightful host for post-conference wanderings

Oh, dang, it’s lunch already. It’s been a fever of web archiving!

We have coverage at this year’s IIPC! What a fantastic way to attend a conference — with the depth and breadth of much of the team!

Anastasia Aizman, Becky Cremona, Jack Cushman, Brett Johnson, Matt Phillips, and Ben Steinberg are in attendance this year.

Caroline Nyvang, Thomas Hvid Kromann & Eld Zierau

Continuing the web at large

The authors conducted a survey of 35 master’s theses from the University of Copenhagen and found that they contained 899 web references: 26.4 web references on average, with a minimum of 0 and a maximum of 80.

About 80% of links in theses were not dated or only loosely dated — URLs without dates are not reliable for citations?

Students are not consistent when they refer to web material, even if they followed well-known style guides. The speakers studied another corpus — 10 Danish academic monographs — and found similar variation around citations. Maybe we can work toward a good reference style?

Form of suggested reference might be something like

Where page is the content coverage, or thing the author is citing. Fantastic!

What if we were to make the content coverage a fragment identifier (the stuff after the # in the address)? Maybe something like this: <timestamp>/<url>#<content coverage>

And totally unrelated, this fridge was spotted later that day on the streets of London. We need a fridge in LIL. Probably not worth shipping back though.

Some Author, some organization

The UK Web Archive has been actively grabbing things from the web since 2004.

Total collection of 400 TB of UK websites only, imposing a “territorial” boundary — .uk, .scot, .cymru, etc.

Those TLDs are not everything, though — a work also qualifies if it is made available from a website with a .uk domain name or if its publisher is physically based in the UK.

Fantastic first day!! Post-conference toast (with a bday cheers!)!!

Recap, decompress, and keep the mind active for day two of IIPC!

The day was full of energy, ideas, and friendly folks sharing their most meaningful work. An absolute treat to be here and share our work! Two more days to soak up!

LIL Talks: The 180-Degree Rule in Film

This week, Jack Cushman illustrated how hard it is to make a film, or rather, how easy it is to make a bad film. With the assistance of LIL staff and interns, he directed a tiny film of four lines in about four minutes, then used it as a counter-example. Any mistake can break the suspension of disbelief, and amateurs are likely to make many mistakes. Infelicities in shot selection, lighting, sound, wardrobe and makeup, set design, editing, color, and so on destroy the viewer’s immersion in the film.

An example is the 180-degree rule: in alternating shots over the shoulders of two actors facing each other, the cameras must remain on the same side of the imaginary line joining the two actors. Breaking this rule produces cuts where the spatial relationship of the two actors appears to change from shot to shot.

After some discussion of the differences between our tiny film and Star Wars, Jack gauged his crew’s enthusiasm, and directed another attempt, taking only slightly longer to shoot than the first try. Here are some stills from the set.