The conference kicked off on Wednesday, June 14, at 9:00 with coffee, snacks, and familiar faces from all parts of the world. Web archives bring us together physically!
Jane Winters and Nic Taylor welcomed us. It’s wonderful to converse and share in this space: grand, human, bold, warm, strong. Love the Senate House at the University of London. Thank you so much for hosting us!
Leah Lievrouw, UCLA: Web history and the landscape of communication/media research
Leah told us that computers are viewed today as a medium, as human communication devices. This view is common now, but it hasn’t been for long: computers as a medium was a fringe idea even in the early 80s.
We walked through a history of communications to gain more understanding of computers as human communication devices and started with some history of information organization and sharing.
Paul Otlet pushed efforts forward to organize all of the world’s information in late 19th-century Belgium and France.
Cold War Intellectuals by Jennifer Light describes how networked information moved from the government and the military to the public.
And that networked information became interesting when it was push and pull: send an email and receive a response, or send a message from a UNIX terminal to another user and chat. Computers are social machines, not just calculating machines. Leah took us through how the internet and early patterns of the web were formed by the time and the culture, in this case the incredible activity at Stanford and Berkeley, the milieu of the Bay Area: bits and boolean logic through psychedelics. Fred Turner’s From Counterculture to Cyberculture is a fantastic read on this scene.
We’re still talking about way before the web here. The idea of networked information was there, but we didn’t have a protocol (HTTP) or a language (HTML) being used (in a web browser) at large scale (the web). Wired Cities by Dutton, Blumler, and Kraemer sounds like a fantastic read for understanding how mass wiring/communication made a massive internet/web possible!
The Computer as a Communication Device by J.C.R. Licklider and Bob Taylor laid out a clear vision of the future. We’re still not at a place where computers understand us as humans; we’re still fairly rigid, with defined request and response patterns.
The web was designed to access and create documents, that’s it. Early search engines and browsers exchanged discrete documents; we thought about the web as discrete, linked documents.
Then user-generated content came along: wikis, blogs, tagging, social network sites. Now it’s easy for lots of folks to create content, and the network is even more powerful as a communication tool for many people!
The next big phase came with mobile, around the mid-2000s. More and more and more people! The data subject (a data cloud or data footprint) is an approach that has felt interesting recently at UCLA. Maybe it’s real-time “flows” rather than “stacks” of docs or content.
Technology as cultural material and material culture.
Jefferson Bailey, Internet Archive: Advancing access and interface for research use of web archives
Internet Archive is a massive archive! 32 Petabytes (with duplications)
And, they have search APIs!!
Holy smokes!!! Broad access to wayback without a URL!!!!!!!
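To make that concrete, here’s a small sketch of what querying the Wayback CDX search API looks like. The query URL shape is the real CDX endpoint; the response below is a made-up sample in the documented shape (first row is field names, remaining rows are captures), so no network call is needed here.

```python
import json
from urllib.parse import urlencode

# The Wayback CDX server (web.archive.org/cdx/search/cdx) answers queries
# shaped like this one. The sample response below is illustrative, not a
# real capture record.
query = "https://web.archive.org/cdx/search/cdx?" + urlencode({
    "url": "example.org",
    "output": "json",
    "limit": 3,
})

# With output=json the server returns a list of rows; the first row names
# the fields and the rest are captures.
sample = json.loads("""[
["urlkey","timestamp","original","mimetype","statuscode","digest","length"],
["org,example)/","19970101120000","http://example.org/","text/html","200","ABC123","1024"]
]""")

fields, rows = sample[0], sample[1:]
captures = [dict(zip(fields, row)) for row in rows]
print(captures[0]["timestamp"])  # 19970101120000
```

Fetching `query` with any HTTP client and feeding the body to `json.loads` gives you the same list-of-rows structure to zip into dicts.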
IA has been working on a format called WAT. A WAT file is about 20-25% the size of a WARC and contains just about everything (including title, headers, and links) except the content. And it’s a JSON format!
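A quick sketch of what digging into one WAT record’s JSON payload can look like. The nesting below follows the commonly documented Envelope/Payload-Metadata layout, but it is simplified and the values are made up for illustration, so treat the exact key paths as assumptions.

```python
import json

# Illustrative WAT payload: a simplified version of the usual
# Envelope -> Payload-Metadata -> HTTP-Response-Metadata -> HTML-Metadata
# nesting, with fabricated values.
wat_payload = json.loads("""{
  "Envelope": {
    "WARC-Header-Metadata": {"WARC-Target-URI": "http://example.org/"},
    "Payload-Metadata": {
      "HTTP-Response-Metadata": {
        "HTML-Metadata": {
          "Head": {"Title": "Example Domain"},
          "Links": [{"path": "A@/href", "url": "http://example.org/about"}]
        }
      }
    }
  }
}""")

# Pull out the page title and outbound links without ever touching
# the page content itself; that's the appeal of WAT.
html = wat_payload["Envelope"]["Payload-Metadata"]["HTTP-Response-Metadata"]["HTML-Metadata"]
title = html["Head"]["Title"]
links = [link["url"] for link in html.get("Links", [])]
print(title, links)
```

Link-graph and title analyses over huge collections become much cheaper when this is all you have to parse.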
Fun experiments when you have tons of web archives!!! Gifcities.org and US Military powerpoints are two gems!
Digital Desolation, Tatjana Seitz
A story about a homepage can be generated from its layout elements (tables, fonts, and so on). Maybe the web counter and the alert box mark the page in time and can be used to understand it!
Analysis of data capture cannot be purely technical, has to be socio-technical.
Digital desolation is a term that describes abandoned sites on the web: sites that haven’t been restyled. Sites age over time. (Their wrinkles are frames and tables!!?? lol)
Old sites might not bubble to the top in today’s search engines — they’re likely at the long tail of what is returned. You have to work to find good old pages.
Ralph Schroeder, Oxford Internet Institute
Web archives and theories of the web
Ralph is looking at how information is used and pursued. How do you seek information? Not many people ask this core question. One interesting researcher (anyone know who?) in Finland does, though. He sits down with folks and asks “how do you think about getting information when you’re just sitting in your house? How does your mind seek information?”
Googlearchy — a few sites exist that dominate!
You can look down globally at which websites dominate the attention space. The idea that we’d all come together in one global culture hasn’t happened yet; instead, there’s been a slow crystallization of different clusters.
It used to be an Anglo-ization of the web; now things may have moved toward South Asia. Angela Wu talks about this.
Some measurements show that Americans and Chinese devote their attention to about the same-sized bubble of websites; it might be that Americans are no more outward-looking than the Chinese are.
We need a combined quantitative and qualitative study of web attention. We don’t access the web by typing in a URL (unless you’re in the Internet Archive); we go to Google.
It’s hard to pin down internet access as a human right.
Maybe having reliable information about health could be construed as a civil right.
And unreliable, false information goes against human rights.
Oh, dang, it’s lunch already. It’s been a fever of web archiving!
We have coverage at this year’s IIPC! What a fantastic way to attend a conference — with the depth and breadth of much of the Perma.cc team!
Anastasia Aizman, Becky Cremona, Jack Cushman, Brett Johnson, Matt Phillips, and Ben Steinberg are in attendance this year.
Caroline Nyvang, Thomas Hvid Kromann & Eld Zierau
The authors conducted a survey of 35 master’s theses from the University of Copenhagen and found 899 web references in all: 26.4 web refs on average, with a minimum of 0 and a maximum of 80.
About 80% of the links in the theses were undated or only loosely dated. Are URLs without dates unreliable as citations?
Students are not consistent when they refer to web material, even when they follow well-known style guides. The speakers studied another corpus of 10 Danish academic monographs and found similar variation in citations. Maybe we can work toward a good reference style?
Form of suggested reference might be something like
Where page is the content coverage, that is, the thing the author is citing. Fantastic!
What if we were to put the content coverage in a fragment identifier (the stuff after the # in the address)? Maybe something like this,
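As a rough sketch of the fragment-identifier idea: the snippet below attaches a content-coverage note to a URL’s fragment and reads it back. The `coverage=` fragment syntax and the example URL are both made up for illustration; the talk didn’t specify an exact format.

```python
from urllib.parse import urlsplit, urlunsplit

COVERAGE_PREFIX = "coverage="  # hypothetical fragment convention, not a standard

def with_coverage_fragment(url, coverage):
    """Attach a content-coverage note as the URL fragment (illustrative only)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       parts.query, COVERAGE_PREFIX + coverage))

def coverage_from(url):
    """Read the coverage note back out of the fragment, if present."""
    frag = urlsplit(url).fragment
    return frag[len(COVERAGE_PREFIX):] if frag.startswith(COVERAGE_PREFIX) else None

cited = with_coverage_fragment("https://example.org/report", "section-3-methods")
print(cited)                 # https://example.org/report#coverage=section-3-methods
print(coverage_from(cited))  # section-3-methods
```

A nice property of fragments is that servers ignore them, so a coverage note like this wouldn’t change what gets fetched or archived.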
Some Author, some organization
The UK Web Archive has been actively grabbing things from the web since 2004.
A total collection of 400 TB of UK websites only, imposing a “territorial” boundary: .uk, .scot, .cymru, etc.
Those TLDs are not everything, though: a site also qualifies if the work is made available from a website with a UK domain name or if the person publishing it is physically based in the UK.
The day was full of energy, ideas, and friendly folks sharing their most meaningful work. An absolute treat to be here and share our work! Two more days to soak up!