If you had to store something for 100 years, how would you do it?

Century-Scale Storage

Maxwell Neely-Cohen

The Building by the Plum Orchard

On the north side of downtown San Jose, tucked against a gentle curve in California State Route 87 and the Guadalupe River that it follows, sits a nondescript single-story off-white building with tinted windows. As of the time of writing this, signs exclaim that 99 Notre Dame Avenue is available for lease. The two adjacent lots are also empty and being used as parking lots, mostly as overflow for the municipal courthouse a few blocks away. Across the street the condos start, glassy and standardized, one pale red and cream and the next a mixture of aqua and silver, stretching to the sky for blocks east toward City Hall and south until you hit the looming corporate headquarters of Adobe Inc. The low building’s only active neighbor is a weightlifting and martial arts gym in the adjoining warehouse that gives no hint of what used to be built there. Otherwise it sits alone.

99 Notre Dame Avenue, 1953

Reprint Courtesy of IBM Corporation © 2024

In the 1950s, 99 Notre Dame Avenue housed IBM’s first West Coast laboratory. Back then it overlooked a plum orchard. Between 1952 and 1956, a team of engineers led by a former high school science teacher designed and built the IBM 350 disk storage unit, part of the IBM 305 RAMAC, the first computer system that included something resembling a hard drive.

Before RAMAC, storing and accessing computer data was a laborious process that involved feeding stacks of punch cards through machines. Other early solutions, like storing data on magnetic tape, were effective but slow. The IBM team created spinning aluminum disks read by a magnetic head on a moving arm, which allowed data to be retrieved in a literal blink. The 24-inch platters were stacked 50 at a time in a cylinder. They rotated at close to 1200 rpm. Even in the 1950s, with the room-sized console only capable of storing 3.75 megabytes and weighing over a ton, this machine could retrieve data in 800 milliseconds.

The revolutionary element of the hard disk drive was not that it stored data for computers—there were plenty of other methods for that—but that you could store data that could then be accessed almost instantly. Your storage could be constantly connected to your system, an integral component, representing a tremendous shift in both the technical and conceptual idea of what a computer could even be.

RAMAC is the ancestor of every hard drive, every server, every relational database, every cloud. 99 Notre Dame Avenue is its birthplace. For digital storage, this is the Trinity Test Site, the explosive center from which all else follows.

RAMAC’s massive aluminum disks were coated in iron oxide, with little magnetic slots for data, bits to be read as they spun. The RAMAC was originally marketed narrowly toward accountants; IBM built and leased around 1,000 RAMAC 305 systems to businesses that used punch card systems. But within six years, the IBM 350 storage unit was completely obsolete, replaced by new model numbers and new designs. The returned units were scrapped, one by one. The march to create something smaller, faster, denser, and cheaper forced them off the market in less than a decade.

RAMAC actuator and disk stack

Courtesy of the Computer History Museum

Only three RAMAC 305 systems and seven individual 350 disk drives in various configurations are known to have survived. A complete mechanical assembly of a 350 drive was restored in 2002 and sits in the collection of the Computer History Museum. According to a 2014 Wired magazine report, Citation: Tech Time Warp of the Week: The World's First Hard Drive, 1956 (Wired, 2014) during that restoration researchers found data still present and readable on the 350, from a Canadian insurance company, car manufacturers, and the 1963 World Series. “The RAMAC data is thermodynamically stable for longer than the expected lifetime of the universe,” said Joe Feng, one of the engineers who worked on the restoration.

From an isolated technical and engineering perspective, IBM created a storage medium that could last much longer than a hundred years, even long beyond any reasonable definition of forever, on the first try, without that even being the goal. Since the West Coast laboratory team was exploring a novel design and process, they used extremely hardy materials and mechanisms, focusing on functional reliability above all else. Yet today, 68 years later, the parts for the RAMAC are no longer being manufactured, and the machines that fabricated those parts no longer exist. It took the collaboration of several institutions to restore the necessary hardware to make possible the recovery of data from a single unit that survived. So many other RAMAC drives did not make it. The theoretical longevity provided by its sturdy materials and robust mechanical design could not guarantee its continued use.

In the present day, our records, our artifacts, our publications, and our art no longer only inhabit the physical world. The intellectual and cultural output that we rely on and consume predominantly lives on screens, electromagnetically stored in bits and transmitted through packets and wires. Over the past two decades museums and archives have raised and spent billions of dollars to digitize their holdings, to say nothing of the countless individual citizen archivists painstakingly assembling digital collections on their own. Our hardware and software infrastructure is not built for this reality. It is tailored to the short term, without any concern for its long-term durability.

This piece looks at a single question. If you, right now, had the goal of digitally storing something for 100 years, how should you even begin to think about making that happen? How should the bits in your stewardship be stored with such a target in mind? How do our methods and platforms look when considered under the harsh unknowns of a century? There are plenty of worthy related subjects and discourses that this piece does not touch at all. This is not a piece about the sheer volume of data we are creating each day, and how we might store all of it. Nor is it a piece about the extremely tough curatorial process of deciding what is and isn’t worth preserving and storing. It is about longevity, about the potential methods of preserving what we make for future generations, about how we make bits endure. If you had to store something for 100 years, how would you do it? That’s it.

Even accounting for the human predilection for nice round numbers and the decimal system (10 fingers, 10 toes), 100 years is arbitrary. But 100 years is our metric precisely because it is attainable. It is a scale within the outer possibilities of a single human lifetime but not of a single human career. It is a duration that cannot be attained through individual force of will. It requires planning and organization across at least one generational replacement. It is a broad enough time period that the chance for social, economic, and technological change is absolute, yet close enough that all context does not collapse. It is survival from the end of the Napoleonic Wars to the beginning of World War I, from the invention of the shortwave radio to the age of Facebook, from the reign of King Henry V to Martin Luther publishing his Ninety-five Theses, from the first performance of Igor Stravinsky’s The Rite of Spring to the release of Daft Punk’s final album, from the Battle of Gettysburg to the signing of the Partial Nuclear Test Ban Treaty, from the patenting of the telephone to the release of the Apple I. We picked a century scale because most physical objects can survive 100 years in good care. It is attainable, and yet we selected it because the designs of mainstream digital storage mediums are nowhere close to even considering this mark.

No single methodology that we discuss holds an obvious answer to this question, and that is fine, particularly because professional archivists recommend making and storing multiple copies of anything in multiple formats as a best practice. For example, the Smithsonian endorses Citation: Best Practices for Storing, Archiving and Preserving Data (Smithsonian Libraries, 2023) a “3-2-1 Rule” when it comes to data storage: “3 copies of the data, stored on 2 different media, with at least 1 stored off-site or in the cloud.” Or as archivist Trevor Owens puts it in his seminal text Theory and Craft of Digital Preservation, Citation: The Theory and Craft of Digital Preservation (Trevor Owens, 2018) “In digital preservation we place our trust in having multiple copies. We cannot trust the durability of digital media, so we need to make multiple copies.” When storing digital data, archivists recommend utilizing file formats that are widespread and not dependent on a single commercial entity—in the words of the Smithsonian, Citation: Smithsonian Data Management Best Practices (Smithsonian Libraries, 2018) “non-proprietary, platform-independent, unencrypted, lossless, uncompressed, [and] commonly used.” But at the century scale, even our most widely adopted file formats are completely untested. Digital history is not long enough to definitively settle on best practices.

With digital storage there will always be two separate but equal battlefields of maintenance to consider: maintenance of the digital holdings and software environments in which they live, and the simple physical maintenance of the hardware and architecture that contain them. Every technology and methodology we discuss keeps these sorts of principles in mind, that solutions are never singular (nor should they be). But we still try to analyze how each one’s design and nature stacks up against the rigors of a hundred-year scale, and how each might deal with the savage threats a century may bring.

Hard Drives

Putting data on a hard drive is an act of writing. It is inscription with electromagnetism, which, given the right connected hardware and software, allows near-instant reading.

It’s tempting to say that they don’t make hard drives like they used to, and to a certain extent, it’s true. Seven decades of hard drive design have seen an unceasing sprint toward increased speed, capacity, density, efficiency, and short-term reliability, all while decreasing physical size and weight. Long-term reliability is a fringe concern. Hard drive manufacturers assume that anyone serious about their data will replace their storage set-ups at least once a decade as technology evolves.

500GB Western Digital Scorpio Blue hard drive, 2013

Photo by Evans-Amos, licensed under GNU Free Documentation License

Contemporary hard drives come in two main flavors: hard disk drives, where data is stored on a spinning magnetic-coated metal platter, and solid state drives, where data is stored as electrical charge in interconnected semiconductor memory cells. The basic design and principles of contemporary hard disk drives are not all that different from RAMAC. A hard drive, both then and now, works by using electromagnetism to store bits within two states, +M or -M, yes or no, 1 or 0, imprinted onto a magnetic disk.

The fundamental issue with hard disk drives is that they are mechanical. A platter spins. An arm reads. There are motors involved. Actuators. Mechanisms. All small and inaccessible to regular maintenance. Mechanical parts, as a rule, fail. When things move, they break. Despite this limitation, hard disk drives kept in the right conditions have the theoretical ability to last a long time. The magnetic disks that the data is actually stored on can be hardy. Their manufactured form factors seem to have stabilized over the last decade now that the race toward lighter and smaller is the province of solid state drives.

These SSDs, on the other hand, have the advantages of no moving parts and fast read and write speeds. They dominate the world around us: sitting in our pockets, as the standard in our computers, on every sort of server or system going for speed. In short-term everyday use, their lack of moving parts means they can handle movement and physical shock without mechanical breakage. This structure comes at a cost. Solid state drives have a finite lifespan, a limited number of times each cell can be written before the insulation that holds charged electrons inside the miniature transistors degrades. While these limits are high, and keep rising, they are still there, and when operating on a century scale they immediately become a concern, especially in a use case where data is being continually written. SSDs also eventually lose their ability to hold data if left unpowered and unused for too long. While the exact spans vary depending on the hardware, any SSD-based long-term storage solution would require regular copying and re-copying, and careful management and placement of the drives in optimal conditions, particularly in terms of temperature. Even without mechanical parts, SSDs eventually fail with age. They have an expiration date.
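To illustrate why write endurance becomes a planning concern at this scale, here is a back-of-the-envelope sketch; the endurance rating and daily write volume below are hypothetical placeholder figures, not the specifications of any real drive.

```python
# Rough sketch of how an SSD's rated write endurance bounds its service life.
# Both figures below are hypothetical placeholders, not real product specifications.

rated_endurance_tb = 600      # assumed total terabytes that may be written (a "TBW" rating)
daily_writes_tb = 0.05        # assumed workload: 50 GB written per day

years_until_worn_out = rated_endurance_tb / daily_writes_tb / 365
print(f"Estimated years before the write-endurance limit: {years_until_worn_out:.0f}")

# This models only wear from writing. An SSD left unpowered can also lose data
# through charge leakage, so endurance alone says nothing about shelf life.
```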

1TB Patriot P210 Internal SSD, 2023

Photo by Jacek Halicki, licensed under CC BY-SA 4.0

For those serious about long-term storage, single hard drives are never used alone. In 1988, the computer scientists Garth Gibson, Randy Katz, and David Patterson proposed redundant arrays of inexpensive disks (RAID), positing that multiple commercially available drives could be superior in reliability to the centralized mainframe disk drives of the previous era. RAID is driven by the realization that it can be cheaper, easier, and more resilient to replace one small element of a large array than to replace one singular expensive piece of complicated equipment. It’s a component of what could be called digital Fordism, a digital order marked by the mass production of standardized hardware products meant for standardized software, used by the masses and specialists alike.

RAID levels and variants have evolved over the ensuing decades, allowing for different optimal configurations depending on the goal, chasing speed or capacity or reliability versus cost. For example, RAID 6, which keeps two parity blocks per stripe, can sustain two drive failures and still not result in a loss of data. Even the RAMAC, all those years ago, included a parity bit, an error detection system. A single bit, set to zero or one, that records whether the count of ones in a set of bits is odd or even. If the recomputed parity no longer matches the stored parity bit, an error has occurred. Higher-level RAID systems provide an extreme version of this, parity at the drive level, which can not only sustain multiple failures but detect and correct the affected data.
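To make the parity idea concrete, here is a minimal sketch of an even-parity check over a set of bits, not the RAMAC’s or any RAID controller’s actual implementation: the stored parity bit records whether the count of ones is even, and a mismatch on recomputation signals that some bit has flipped.

```python
# Minimal even-parity sketch: detects (but cannot locate or correct) a single flipped bit.

def parity_bit(bits):
    """Return 0 if the count of ones is even, 1 if it is odd."""
    return sum(bits) % 2

data = [1, 0, 1, 1, 0, 1, 0, 0]
stored_parity = parity_bit(data)   # written alongside the data

data[3] ^= 1                       # simulate a single bit flipping in storage

if parity_bit(data) != stored_parity:
    print("Parity mismatch: an error has occurred")
```

A single parity bit can only flag that something changed; the two independent parity blocks that RAID 6 maintains are what let a controller go further and reconstruct the contents of failed drives rather than merely report an error.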

RAID arrays require maintenance, checks, and physical inspections when running over long periods. Attaining century-scale storage using hard drive systems is less a question of technology than one of institution-building, funding, real estate, logistics, culture, and a commitment to digitally preserving everything surrounding and interfacing with your storage system.

Say you set up multiple top-of-the-line fancy RAID 6 servers in a dozen perfectly climate controlled bunkers around the world. To achieve century-scale storage, you would have to create, fund, and ensure the survival of an institution to maintain, financially support, and remember them. This institution would also have to preserve the file formats, software, hardware, operating system, and every other digital element the data you are storing relies on, and continue to develop means to access them. (This will be a recurring theme in this piece: Preserving digital data also requires preserving the means to access that data, just as preserving a book requires preserving the language in which it is written.)

Past the scale of a human lifetime, technological solutions can become uncertain, and sometimes shockingly counterintuitive. For example, it would be easy to say we should completely write off contemporary SSDs for century-scale storage given the depth of commitment that they require. But the fact that a long-term storage scheme using SSDs would require hypervigilant care and maintenance, the development of a practice, could arguably be viewed as an advantage when it comes to century-scale storage. Part of the reason books have the capacity to last over a century or even a millennium is that eventually they have to be reprinted. Fragility, and the culture it creates, can be an asset in inspiring the sort of care necessary for the long term. A system that seeks the indestructible or infallible has the potential to encourage overconfidence, nonchalance, and the ultimate enemy of all archives, neglect.

The other major advantage that the hard drive has—speed of accessibility—is also worth considering as a factor in a century-scale storage solution. No need to swap out records, tapes, or reels. No need to rely on a connection to a broader global network or an independent company that might go out of business. Whatever data you want is there, in an instant. Your holdings can be shared and showcased without relying on any other party. If the collection you’re preserving isn’t exclusively for use 100 years from now, but rather is going to be used throughout the next 100 years, speed of accessibility must be balanced against other factors to some degree, particularly the security of storage. This is something that archivists, preservationists, and curators are constantly balancing in the professional management of physical institutional collections. Even so, accessibility can be a tremendous asset in building the sort of community that can survive over the long term. Proper care and maintenance require resources, funding, and labor. It’s hard to induce these while also hiding from the world.

When used for anything resembling “cold” storage—storage designed to not be readily accessible, that can sit locked away for decades—hard drives present a risky selection. Fire, water, physical impact, heat, moisture, and static electricity are constant threats. As RAMAC teaches us, even when the data survives, hard drives are merely one component of an entire computer system that also needs to be preserved.

RAMAC also shows us that hardware fragility is a choice. Hardware firms could make astronomically more resilient products under a different incentive structure. Just like with buggy software, the problem is not a lack of engineering know-how but one of shareholder pressure and the incentive for short-term growth. As it stands, hard drives themselves are cheap and getting cheaper. You can buy a 24 TB hard drive for $439, over a hundred dollars less than the cost of a single month of 24 TB storage on Amazon’s S3 cloud storage platform. We could have much hardier hard drives, but they would likely come at increased financial cost, or require a complete engineering reorientation.

With the right resources, hosting and using one’s own hard drive systems and servers could be a reasonable component of a long-term storage effort. They are accessible, always ready to be copied to other drives and mediums, and built, in a sense, to be upgraded to future versions of both hardware and software. Having an on-site backup of a digital archive, completely within your control, can be a worthy contingency if you are primarily relying on a distributed or cloud solution. Having a hard drive system as part of such a design, one meant to be upgraded or updated every few years, is completely reasonable, as long as the need for future adaptation is understood and properly funded. The very fact that hard disk drives are designed to be replaced incentivizes us to think carefully about planning for the future. Whether we respond rationally to that incentive is another story.

The Cloud

Google Drive passed one billion users in 2018. Citation: Google Drive is about to hit 1 billion users (The Verge, 2018) Dropbox currently claims 700 million registered users. Citation: Fact Sheet (Dropbox) According to a 2022 10-K filing, Citation: Amazon Web Services owns 11.9 million square feet of property, leases 14.1 million square feet (Data Center Dynamics, 2022) Amazon Web Services leases 14 million square feet of real estate and owns close to another 12 million square feet. We live and compute in the cloud’s world. The cloud is the dominant method for how we store data, and how the software that retrieves that data runs. Software platforms that don’t attempt to enforce the cloud as the default storage option are becoming an endangered species. Of the 39 archives, libraries, and collectors I surveyed for this project, 27 use a cloud storage service as the primary site of their digital collections. Of those, 18 use a separate cloud storage service as a secondary backup, and an additional four that have their primary storage on-site or in a decentralized scheme use a cloud storage service as a backup.

Google Data Center, The Dalles, Oregon

Map data ©2024 Google

The cloud is an aggregation of data centers. These data centers run servers, which rely on massive arrays and combinations of the computational and hard drive technologies we have already discussed to store large amounts of data for millions of clients. To store data in the cloud is to outsource that storage, to give it over to a custodian, a guardian, whose sole purpose is receiving, safeguarding, and delivering that data for whomever is willing to pay. This charge has proven lucrative for the companies that have chosen to undertake it. While the biggest players often own and run their own data centers, there are tens of thousands of smaller businesses that offer cloud services to any number of clients, including the bigger services themselves.

According to Amazon, Citation: Celebrate Amazon S3’s 17th birthday at AWS Pi Day 2023 (Amazon, 2023) S3 “holds more than 280 trillion objects and averages over 100 million requests a second. To protect data integrity, Amazon S3 performs over four billion checksum computations per second.” The total of human production under the stewardship of Amazon S3, Microsoft Azure, and Google Cloud is unfathomable. It is a black hole, a star so dense it has collapsed unto itself and is only getting heavier and heavier.

The advantages of employing a cloud storage service are obvious. The burden of upgrading hardware and software is offloaded onto specialists. The physical demands of architecture, of physical protection from the elements, are no longer a concern for the customer. The cloud is accessible with an internet connection, and part of a massive existing infrastructure with many other stakeholders. This network effect of so many clients gives a sense of security unto itself, not unlike the “too big to fail” culture of banking. If you store your data trove in the same place as massive investment banks and the CIA, Citation: The Details About the CIA's Deal With Amazon (The Atlantic, 2014) it’s easy to imagine that the power of your fellow stakeholders would have some effect on the reliability and continued availability of the product.

The cloud’s current data center regime is only designed for conditions of utter stability. The physical threats to data centers are not dissimilar to the threats faced by traditional libraries, with a few additions: fire, water, physical destruction, neglect of maintenance, power failures, connection failures, theft, vandalism, and the constant forever need for software that works. During the writing of this piece, in July 2024, a CrowdStrike update bug caused archives that were using Microsoft Azure’s cloud storage services to lose access to their holdings. Natural disasters, wars, and political upheavals are all capable of causing immediate and irrevocable disruptions. Despite the internet’s founding dream, its birth ideal, of being a telecommunications network that could survive a nuclear attack, it’s fairly certain any substantive nuclear exchange would render the cloud unusable. Even aside from such nightmare scenarios, the cloud is made possible by a relatively small number of undersea cables that require constant maintenance. Citation: The Cloud Under the Sea (The Verge, 2024) Any blue water naval power already has the firepower and capability to severely damage global access to the internet, and thus the cloud. The global geographic distribution of data centers heavily tilts toward the U.S. and Europe. Citation: Leading countries by number of data centers 2024 (Statista, 2024) The cloud is fairly centralized, because the companies that run it are fairly centralized.

Cloud storage requires paying someone, an outside entity, for as long as you are engaged in the act of storing. This can be more expensive than other methods (S3’s “Standard” option is $23 per TB per month as of November 2024), especially over extremely long periods. Amazon S3 has tried to combat this by offering a storage class for slower but more permanent storage, “Glacier,” designed to be competitive with offline cold storage options, by separating storage pricing from retrieval pricing (their “Flexible Retrieval” option is $3.60 per TB per month as of November 2024). But you still have to pay them. Every month or every year. Forever. You can turn off the machines that you own for a while and then turn them back on, and everything you stored will still be there, but if you stop paying your cloud storage fee the data is gone, probably forever.
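To make the compounding cost concrete, here is a naive sketch using the list prices quoted above and the 24 TB figure from the earlier hard drive comparison; it assumes, unrealistically, that prices, the services, and the dollar all hold steady for a century.

```python
# Naive century-scale cost sketch using the November 2024 list prices quoted above.
# Assumes (unrealistically) that prices never change over 100 years.

tb_stored = 24                 # the 24 TB figure from the earlier hard drive comparison
months = 100 * 12

for tier, dollars_per_tb_month in [("S3 Standard", 23.00), ("Glacier Flexible Retrieval", 3.60)]:
    total = tb_stored * dollars_per_tb_month * months
    print(f"{tier}: ${total:,.0f} to keep {tb_stored} TB for a century")
```

Set against the $439 price of the 24 TB drive mentioned earlier, the arithmetic makes the tradeoff vivid, even before accounting for the drives’ own replacement cycles and the institution needed to keep replacing them.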

The cloud requires trust. Assessing the cloud from the perspective of century-scale storage is less about the technical abilities and configurations on offer than about organizational structures, and even values. Right now, the dominant cloud storage options are exclusively administered by companies. In November of 2018, at an all-hands meeting in Seattle, an Amazon employee asked CEO and founder Jeff Bezos what he had learned from the recent bankruptcy of Sears. “Amazon is not too big to fail… In fact, I predict one day Amazon will fail. Citation: Jeff Bezos makes surprise admission about Amazon's life span (Business Insider, 2018) Amazon will go bankrupt. If you look at large companies, their lifespans tend to be 30-plus years, not a hundred-plus years,” Bezos answered. Bezos was right. Most companies do not last long. They get acquired or split up into pieces or go bankrupt or decline into something much smaller or are upended by catastrophic geopolitical events.

In 2012, Richard Foster, a professor at Yale School of Management, found Citation: Can a company live forever? (BBC, 2012) that the average lifespan of companies listed on the S&P 500 had been decreasing precipitously. It had dropped from 67 years in the 1920s to just 15 years. Most companies, even wildly successful behemoths, don’t even last 50 years, let alone a hundred.

It could be that data storage service companies are immune to this trend. Among the oldest companies in the United States that are not farms, pubs, breweries, or inns are several insurance companies. The Philadelphia Contributorship (1752), Insurance Company of North America (now known as Chubb, 1792), and Baltimore Equitable (1794) are all in their third century of business. This is also true globally (Bilsener Gilde of Germany, 1642; Hamburger Feuerkasse of Germany, 1676; Lloyd’s of London, 1688). These are companies engaged in the long-term management of assets, fluent in risk. One could imagine a storage service or data center adopting the sort of culture that might lead to similar longevity, but the cloud is so far the province of companies that are mostly nonspecialized, with cultures focused on driving growth-exploding paradigm shifts rather than stability.

Over the long term, there is serious risk that the three dominant players in cloud storage, or other upstart firms that wish to follow in their footsteps, discontinue their services out of necessity, preference, or optimization as they move on to whatever the next shiny thing is that they believe can drive growth. Google has been particularly guilty of this behavior, shutting down its own products with such regularity that it has become a running joke among those of us who are terminally online. Citation: Killed by Google During the course of writing this piece, Google announced that it was killing its URL Shortener, actively contributing to link rot and abetting the degradation of the world wide web.

To trust the entities offering cloud storage on a century scale would require a shift in their ethics, culture, terms, and contracts, but we can imagine what that shift could look like.

Every time I’ve spoken to a gallery guard at an art museum, I’ve found that they hold a deep sense of reverence and responsibility for the collections under their watch. This has been repeatedly reflected in both journalism covering the people with these jobs Citation: The Secret Lives of Museum Guards (The New Yorker, 2015) and their own accounts. Citation: All the Beauty in the World: The Metropolitan Museum of Art and Me (Patrick Bringley, 2023) They stand there, day after day, dealing with entitled tourists, escaped toddlers, food smugglers, hyped-up school groups, and some of the more annoying varieties of possible human behaviors, and they don’t seem to lose a sense of the importance of their work. Even when the institutions they work for don’t pay them enough, this sense of reverence is retained. Every single day they are doing their best to protect that which is in their care. They feel a sense of duty during their time in those rooms.

I have struggled to find that similar sense of duty and reverence when interviewing big tech employees working on storage products, but there is no reason it could not exist. I’ve certainly observed their desire to care for the personal, the sense that they should make something that works so people don’t lose their treasured memories. They also strive for reliability on behalf of customers that are businesses, understanding that a donut shop needs their accounting records to work or they might shut down. But outside of a rare moment of marketing and a few moments of rhetoric, I feel comfortable generalizing that a culture of stewardship, a sense of the stakes, is not widespread within technology companies, nor is building one a priority.

The current web pages and marketing for Microsoft Azure and Google Cloud do not mention cultural or historical preservation at any point. Only Amazon S3 mentions the concept, presenting the case study of migrating the BBC’s 100-year-old archive Citation: The BBC Preserves 100 Years of History Using Amazon S3 (BBC, 2024) to its Glacier system between case studies involving Salesforce data lakes and generative AI.

At this precise moment all of these services mention AI (a lot) and how it’s going to change everything. “Is your infrastructure AI-ready?” Microsoft Azure’s landing page asks. Google Cloud encourages you to “build what’s next in generative AI.” Two years ago their marketing materials mentioned web3 and the metaverse (a lot) and how it was going to change everything, and how if your business did not adapt you were going to be left behind—yet those sentiments no longer appear.

“I didn’t even know we had any clients like that,” a Microsoft product manager told me when I asked how she felt about protecting archives. “I have a hard time convincing anyone else it matters,” an Amazon engineer said. “There are some higher-ups that genuinely seem to care more, but that doesn’t filter down.”

The cloud does not exist in a vacuum. It is dependent on a far-reaching fabric of interactions, telecoms, internet service providers, and hardware manufacturers, all of which are motivated by timescales far removed from a century (often the daily whims of the market and a fiduciary responsibility to shareholders to maximize returns on a quarterly basis). The Jack Welch school of shareholder supremacy is completely incompatible with the sorts of values that would ensure a cloud storage provider would reliably exist for a century.

Given the network effect of mass reliance on the cloud, one would hope there is a corporate culture of responsibility around safety and the seriousness of their charge. But such corporate cultures, even when they do exist, are fragile.

Another manifestation of the cloud is as a means by which digital products are currently offered, sold, and consumed. Digital copies of books, film, music, games, and journalism, some released under subscription models, some available on digital marketplaces, are all part of our cloud environment. Over the past decade, the collection development budgets of most libraries have moved steadily away from physical books Citation: A Complex Landscape | Budgets and Funding 2024 (Library Journal, 2024) and towards cloud-based digital subscriptions.

A potential approach to century-scale storage for any published work of literature, knowledge, art, or science would be to simply trust rights holders and IP developers to keep all that they offer safe until it enters the public domain, but under current conditions such trust is impossible. The publicly traded corporation, as an entity operating in today’s paradigm of shareholder supremacy and today’s copyright law, cannot be trusted as a partner in the preservation of anything. We have seen video game companies fail to preserve their own IP and then ruthlessly pursue litigation when fans attempt to pick up the slack, film studios refuse to release entire films and shelve television shows in order to claim write-offs and avoid paying residuals, digital marketplaces disable content that consumers have already purchased, an entire generation of web art rendered nonfunctional, Citation: Emulation or it Didn’t Happen (Rhizome, 2020) and countless songs, books, shows, games, software, and even entire formats simply vanish without warning, even when contained within the “libraries” of customers who allegedly “own” them.

This process is constant and ongoing. During the short period in which this piece was written, Paramount unexpectedly deleted the entire archive of MTV News, including work which does not appear to have been saved by the Internet Archive’s Wayback Machine; GameStop abruptly shut down Game Informer, as recently as 2011 one of the three highest-circulating magazines in the US, and disappeared its archives; and reporting revealed that parts of the archives of some of the most storied local newspapers in the country, including the Village Voice, had been taken over by LLM-generated clickbait. These may seem like small things. They are not. In the case of MTV News, two decades of documentation that included some of the most impactful moments in music history were put at risk of loss.

While various international law frameworks for the protection of cultural and intellectual heritage during wars have existed since the 19th century, no such frameworks exist for peacetime. If they did, they might radically change our capacity to trust both the custodians of the cloud and corporate rights holders. A different legal and civic order would affect this entire analysis. But for now, the cloud is only governed by itself.

Removable Media

The oldest vinyl record I own is a 1951 pressing of Tchaikovsky’s Piano Concerto No. 1 in B-flat Minor, played by Vladimir Horowitz and the NBC Symphony Orchestra at Carnegie Hall. On the crimson cover an illustrated pair of disembodied hands play an angled keyboard. It’s worth between $3 and $12 at present. The record is a re-press, as the recording was made in 1943 (of sheet music first published in 1875 and revised into its final form in 1890). Recordings of the concert also exist on YouTube, Citation: Vladimir Horowitz-Toscanini: Tchaikovsky Concerto No. 1, Op. 23 (1943/NBC Symphony Orchestra) (YouTube, 2024) digitally imported from various analog versions. The record on my shelf is not yet a successful implementation of century-scale storage, depending on your definition, but it’s getting there. If you go on Discogs, the crowdsourced tool for cataloging music releases that also functions as a global marketplace, you can shop for thousands of records that are over a century old. Citation: Shop Vinyl Records, CDs, and More released in 1902 to 1925 (Discogs) You need to have a turntable capable of rotating at 78-rpm speed in order to play them, but if you do, you can listen to a recording from over a hundred years ago stored on an analog shellac disc.

The most famous examples of faith in the stability of records as a storage format are currently 15.2 billion miles and 12.7 billion miles from the Earth respectively. They are traveling at 38,026.77 mph and 34,390.98 mph relative to Earth, beyond the heliosphere in interstellar space. The Voyager Golden Records housed on the Voyager space probes are made from gold- and nickel-plated copper instead of vinyl. They contain greetings in 55 human languages; 26 musical recordings including the works of Blind Willie Johnson, Chuck Berry, and Johann Sebastian Bach; field recordings of nature; and 116 encoded images of life. Each record is designed to last over a billion years.

The Golden Record is the ultimate outlier, a small preview of what might be achieved if a society brought real resources to the preservation and curation of cultural production. Most removable storage is not designed to travel the cosmos and last for over a billion years. Still, even in their less durable versions, records, tapes, and optical discs represent a storage regime that is both replicable and distributable, as well as standardizable enough that consumer hardware can access it, and display it immediately. These formats are by nature air-gapped, immune to cyberattack, buggy software updates, and accidental deletions.

The Golden Record, 1977

Public domain, NASA/JPL

Optimized for the rigors of a past era of physical commerce, removable media formats all have some degree of shelf stability, but like everything else they wither under the stresses of time. Vinyl records hate heat and wear out with each listen. Cassette tapes slowly shed their oxide layer, especially under high heat and humidity, while magnetic fields, even those resulting from small consumer electronics, wreak havoc on a tape’s magnetic particles. VHS tapes experience similar afflictions. Fungi attack floppy disks. Motion picture film reels are tormented by an array of maladies. The cellulose nitrate film base used in the first half of the 20th century is known to disintegrate into dust when it isn’t literally exploding into spectacular flame. The cellulose acetate that replaced it suffers from “vinegar syndrome,” where its acetate base decays, emitting a pungent smell. Polyester film is troubled by fading colors. Film archives now store many reels in humidity-controlled frozen vaults to slow down these reactions.

Digital removable form factors like CDs, DVDs, and other optical discs are plagued by chaotic and unpredictable chemical meltdowns that erode their playback capability. There is no consensus on a singular cause of “disc rot” because manufacturing standards for CDs and DVDs were so varied that it’s impossible to identify a universal accelerant. Some discs appear totally fine, ready to last hundreds of years. Others seem almost like they were designed to degenerate after an appallingly short amount of time, as if they were not Céline Dion’s Greatest Hits but intentionally self-destructing messages out of Mission Impossible. Some discs bronze or speckle. Some shed little strips of their reflective layer. Some go bizarrely translucent, a magic trick, as if their previous existence were an illusion.

Humidity, pests, light, magnetism, and heat can destroy almost anything. Mitigating them can help preserve almost anything.

With the right conditions and care, any of these formats can last an awfully long time, but at that point they are artifacts like any other physical singular works selected for preservation. They are sculptures, tapestries, costumes, or paintings. The storage medium is just as much the treasured object as the content it is holding. And the ability to play or access the media on them must also be preserved. In 2021, lawyer and FOIA expert Michael Ravnitzky filed a request for copies of video footage of a lecture by legendary computer scientist Admiral Grace Hopper that were present in the National Security Agency’s archives. The NSA denied the request Citation: Admiral Grace Hopper’s landmark lecture is found, but the NSA won’t release it (Muckrock, 2024) in May of 2024, stating that the agency no longer owned a machine capable of playing back the AMPEX video tapes in their collection.

The Unicorn Defends Itself, from the Unicorn Tapestries c.1495-1505

Public domain, part of The Met's Open Access Initiative

If you don’t plan on preserving a turntable along with a record collection, you’d better be able to build one. This is where analog formats have a potential long-term advantage over digital ones. The creators of the Voyager Golden Record reckoned that it’s easier to communicate instructions on how to build a record player to a future viewer than it is to communicate how to build a computer. Anthropomorphic issues when theorizing billion-year-in-the-future interactions with aliens aside, it certainly seems true that constructing a simple mechanical device is easier than constructing a multi-level hardware and software system.

For the purposes of century-scale storage, a knock against these formats might be that as mass consumer products they are already largely obsolete, replaced by a vast streaming infrastructure that intangibly beams content down from the cloud. However, even in a minority capacity, many of them continue to have lives as products. Vinyl records have been fully resurrected now for over a decade, having even surpassed sales of CDs, the format that supposedly marked their demise. Gen Z collectors are buying cassette tapes Citation: Gen Z Loves Cassettes. But Wait, How Do These Things Work? (Wall Street Journal, 2024) in massive quantities. Even when sold, produced, or maintained as niche offerings, these formats have serious potential that merits serious consideration.

In 1998, the Japanese Diet passed the Electronic Books Preservation Act, which required certain tax and accounting data to be stored digitally for 100 years. The government created and mandated a quality standard for optical discs that could reach this mark. Pioneer developed an optical disc drive, the BDR-WX01DM, and archival-level discs to meet this standard. Equally promising are M-DISCs, a slightly thicker Blu-ray disc design manufactured by both Ritek and Verbatim, which claim a lifespan of 1000 years when properly stored. M-DISCs passed both ECMA and ISO/IEC standards with a rating of several hundred years, and different durability and stress tests of M-DISCs have confirmed a high degree of resilience depending on conditions. But both formats are new, and these claims are untested in temporal reality. And like all other removable media, they require a physical space to be held, kept safe, and remembered.

Tape drives, which have been used for mainframe computer storage since the 1950s, are remarkably enduring. Now specifically designed for long-term “cold” storage, tape drives have high capacities. As IBM’s Mark Lantz wrote in 2018, Citation: Why the Future of Data Storage is (Still) Magnetic Tape (IEEE Spectrum, 2018) “a single robotic tape library can contain up to 278 petabytes of data. Storing that much data on compact discs would require more than 397 million of them, which if stacked would form a tower more than 476 kilometers high.” Unlike the magnetic tape in cassettes or VHS tapes, the tape itself is much more resilient to physical degradation, and enclosed in cartridges made of hardier materials. Tape requires far less energy and computational minding than other types of computer hard drive systems.
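Lantz’s comparison is easy to sanity-check; the sketch below assumes a 700 MB CD capacity and a 1.2 mm disc thickness, which are standard figures for the format but assumptions on my part rather than numbers from the quote.

```python
# Back-of-the-envelope check of the tape-library-versus-CDs comparison quoted above.
# Assumes a 700 MB CD and a 1.2 mm disc thickness (standard figures, not from the quote).

library_bytes = 278e15         # 278 petabytes
cd_bytes = 700e6               # 700 megabytes per disc
cd_thickness_m = 1.2e-3        # 1.2 millimeters per disc

discs = library_bytes / cd_bytes
stack_km = discs * cd_thickness_m / 1000

print(f"About {discs / 1e6:.0f} million discs, stacked roughly {stack_km:.0f} km high")
```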

Internals of Ampex Fine Line F-44, a 3-head Ampex home-use audio tape recorder, c. 1965

Photo by Gregory F. Maxwell, licensed under GNU Free Documentation License

Tape also has the advantage of being relatively cheap, far cheaper over the medium term than cloud-based offerings or constantly upgrading your own server or hard drive hardware. In 2021, IBM priced their own LTO-9 tape solutions at $5.89 a terabyte. Citation: IBM ships new LTO 9 Tape Drives with greater density, performance, and resiliency (IBM, 2021) It’s no accident that tape drives are the standard for most large-scale cold storage. Most film companies and film libraries, banks, insurance agencies, law firms, and national archives keep a copy of at least some of their data on magnetic tape.

However, tape drive systems, depending on their complexity, can have high initial set-up costs. They are not currently produced, sold, or advertised to individual consumers. Depending on the exact hardware, writing to tape drives can be slow—slow enough that updating them incorrectly can even threaten the integrity of the data being stored. The drives that write to tapes are not designed to be regularly moved. The tapes themselves are primarily designed to sit in the perfect space you made for them. They are vulnerable to anything that might threaten the physical site within which they are contained. They are also almost entirely meant for use in a singular system and cannot claim the advantage of the naturally occurring decentralizations that follow when products are developed for consumer use.

Judged against other available storage technologies in a vacuum, away from the organizational, financial, architectural, and social structures around them, tape storage is probably the best bet for single-site storage of a large digital collection over a few decades, as long as that collection never has to be updated. But tape’s life is still measured in decades at most, designed to be replaced by a successor system long before we reach our 100-year mark. Tape drive vendors still typically market their competitive pricing advantages on scales of only five or ten years. And tape still requires physical caretaking, space, and a watchful eye.

While more expensive tape systems allow for fast data retrieval times, the very fact that they are not easily accessible is one of their trademarks, providing the security of an air gap against digital threats. This lack of accessibility is both a strength and a weakness. As previously discussed, the immediate and instant accessibility of one’s holdings can be a positive factor in preservation over a century scale. Any solution best designed for a hidden bunker, a place to be kept from the world, from the public, from society, runs the risk of not being able to inspire the social, political, and financial conditions to ensure proper care and maintenance.

We have grouped analog and digital removable storage methods, some meant for mass consumption and some meant for one-off institutional storage, all within the same analysis because they share the possibility for both mobility and replication detached from the systems that read them. It is absolutely possible to encode Google Chrome onto a vinyl record, just as we can encode the thousand-year-old musical compositions of Hildegard of Bingen in digital file formats, and just as the Voyager Golden Record encoded visual images of Earth onto its grooves.

Let’s imagine a world where six months from now, all digital music somehow flashes out of existence. Spotify, Apple Music, Tidal, and every other digital music service lose their holdings. Every hard drive in the world with music on it is erased. The record label backups are gone. Let’s pretend a digital copy of Sabrina Carpenter’s album Short n’ Sweet accidentally survives on both a single tape drive in a library, and on the thousands of records, tapes, and CDs sold to fans. Even as products of the streaming era, where physical copies represent a mere fraction of how people are listening to music, if you asked me to bet on which would make it a century in this imaginary scenario, I’d bet on the records, tapes, and CDs—and the fans, their heirs, and successor fans. I would bet on whatever Discogs-esque service exists in 2124.

Make It Physical: Print and Rock

If this project were called “Millennia-Scale Storage,” the historical record would suggest two particularly successful methods for ensuring the survival of written and visual media—carving in stone or inscribing on a clay tablet.

Humans painted, engraved, and shaped stone for tens of thousands of years before anything resembling civilization or written history. Many of these works endure, only ceasing to exist through violent modification or annihilation, whether by a human being or natural force.

Thousands of years later, starting as early as 9000 BCE with simple small counting tokens and reaching widespread adoption around 3000 BCE, ancient Sumerians, Babylonians, Minoans, Mycenaeans, and Hittites wrote with sharp styluses on pieces of wet clay, which were then left to dry in the sun or baked in a kiln. These practices spread relatively quickly. We have over half a million of these tablets today, and the number of survivors continues to go up with new recoveries and excavations. These Bronze and Iron Age writings are in extraordinary shape, resistant to many would-be means of destruction. They encompass government archives, commercial documents, lists of battles, receipts, letters, debates, hymns, essays, laws, stories, mathematical theorems, recipes, and medical texts. The collection includes the now-internet-famous “complaint tablet to Ea-nāṣir,” a customer complaint (regarding the substandard quality of already-paid-for copper ingots) that has survived over 3,700 years. We have 1,800 years of astronomical records, the Epic of Gilgamesh, and 382 diplomatic letters exchanged between the court of Akhenaten, the pharaoh of Egypt, and neighboring major powers. The clay tablets reach across an unfathomable stretch of time with an almost astonishing ease.

A brief look at historical sites around the world demonstrates the possibilities of not only physicalizing treasured digital archives, but of turning them into architecture, or using the architecture that surrounds them in ways that can enhance the possibility of century-scale survival. Computers are made of rocks, after all. Maybe we should be reversing the process and writing source code in stone, engraving our most important functions into walls.

The reasons for us to not resurrect these processes are obvious. Both stone inscription and clay tablet writing are inordinately time-intensive, slow, immutable, limited, and unbearably heavy. Even laser engraving on standardized surfaces costs hundreds of dollars for relatively short phrases, not to mention the cost of the stone itself. The current typical cost of a headstone, which usually only mentions a name and some dates, is $1,000–$3,000. To encode all 4.7 billion words of English Wikipedia (as of October 20, 2024) into stone would require tens of thousands of people and tens of billions of dollars. Any correction or update could force a restart. To encode digital images, one would not only have to include billions of characters of code, but instructions to build a computer, operating system, and software capable of interpreting that massive set of encoded characters, and then input them without making any errors, a task that could take human teams centuries itself.

Monuments are targets that require protection. Every war is a war on architecture. Physical monuments, particularly those that are contemporary, don’t always enjoy the benefits of national and cultural security. Without the constant guard and competence of a security force, the heaviest stone is easily destroyable, even by forces without access to advanced weaponry. The Georgia Guidestones were a mysterious 19-foot-tall, 118-ton granite monument created in 1980 by anonymous private citizens that espoused a set of ideological precepts in eight languages to guide humanity post-apocalypse. For 808 words, their construction cost over $400,000 in today’s dollars. They were bombed and destroyed by unknown vandals in July of 2022. Intended for a far-flung future, they only lasted 42 years.

The Georgia Guidestones in 2020

Public domain

More modern methods for preserving the written word are compelling in their own way. My bookshelves contain plenty of works over 100 years old: Sumerian texts telling the story of the ancient Mesopotamian goddess Inanna, the works of Homer, Sappho, and Euripides, of Marco Polo, Miguel de Cervantes, and Mary Shelley. These attained century-scale storage by being printed and reprinted, copied, protected, translated, and adapted into new formats, with these burdens distributed over centuries of caretakers. Even as individual objects, books can last quite a while when stored and cared for in the proper conditions. I have a few dozen that comfortably pass the hundred-year mark. The oldest book I own is a volume of Cicero’s Orator printed in Venice in 1554. The specific object is not valuable or important (I bought it for £40 in London in 2014), and yet it has traveled through space and time, conquering centuries, families, and continents without any evidence of having entered the care of a professional archivist or institution. It is still a functional book. The pages turn. The pale vellum binding stands straight. It survives.

This is supposed to be a piece about the best options for making digital storage last a century or more, with an implied focus on fancy storage technologies and novel archival schemes. But even today, if our goal is storing information for a century, we should not underrate the power of print.

Print books in physical codex form naturally decentralize to an extreme degree, finding their way into not only institutions but the collections of individuals. They are like plants whose seeds have adapted to float and spread in the wind, engineered to end up exactly where you would least suspect, and persist. Even small self-published print runs of artist zines can end up in the Museum of Modern Art or the National Archives.

Unlike digital storage, the survival of print requires physical libraries, even if those libraries are shelves in an individual’s home. It requires that those libraries be protected from fire, water, and pests like silverfish. While the cotton- and linen-infused long-fiber paper of centuries past is remarkably sturdy and robust, mechanical wood-pulp paper has shorter fibers, and is susceptible to acidification as it ages.

Les Singuliers et Nouveaux Portraicts by Federico de Vinciolo, 1588

Public domain, part of The Met's Open Access Initiative

The advantage of print is that it can be a practice. What was printed before can be reprinted. The downside is that, in order to take advantage of the full preservational powers of the codex form, what you are saving and printing has to be valued by the public. The printing press is a creature of the market, built to replicate based on demand. Still, just as with physical copies of music and film, the multiplicative scale of even small print runs dwarfs what you see with most digital backup methods. Even small independent publishers of edgy literature routinely print 5,000 books as a starting point for a print run. Tiny poetry presses regularly print 1,000–2,000 copies of a collection. In our current streaming era, how often do thousands of people voluntarily keep the exact same stored digital object on their own hard drive? Increasingly few. Physical libraries and readers accomplish this daily, to great effect.

The issue with books is their number. There are already a lot of them. The volume of books runs up against the human capacity to care for them. In 2010, Google tried to calculate the total number of books ever written and published and arrived at 129,864,880. Citation: Google Book Search Estimates vary, but each year somewhere between one and four million more are published. As of 2022, the Library of Congress, the most well-funded library on Earth, has only 25 million books. Given the physical space storing print requires, the scope of human publishing necessitates curation and culling. Millions of volumes of text can be stored digitally on a hard drive the size of your fingertip, volumes that in physical form would require multiple buildings. The greatest challenge to the century-scale storage potential of the print codex is that once a book is 100 years old, there is no guarantee anyone will care enough about what lies within it to take on the demands of its care.

Dispersal

One solution to century-scale storage is to scatter your holdings, to put copy after copy all over the world, so that no disaster, war, or sudden loss of funding could ever threaten a digital collection’s survival. Right now, the internet and computation are not decentralized. As Janus Kopfstein noted in the New Yorker in 2013, Citation: The Mission to Decentralize the Internet (The New Yorker, 2013) “a staggering percentage of communications flow through a small set of corporations—and thus, under the profound influence of those companies and other institutions.” This concentration has only accelerated in the intervening decade. Our access to the internet is controlled by telecommunications firms that openly employ anticompetitive practices without serious recourse, often avoiding each other’s turf Citation: Report: Most Americans Have No Real Choice in Internet Providers (Institute for Local Self-Reliance, 2020) because direct competition would limit their rent-seeking and profits. Only a handful of computer operating systems have anything approaching widespread adoption. Chips, graphics cards, and yes, hard drives, are made by a relatively small number of companies, and this is even more true of the parts that comprise them. AMD, Apple, ARM, Broadcom, Marvell, MediaTek, Qualcomm, and Nvidia are all semiconductor customers of Taiwan Semiconductor Manufacturing Company. This year, U.S. Commerce Secretary Gina Raimondo said Citation: US official says Chinese seizure of TSMC in Taiwan would be 'absolutely devastating' (Reuters, 2024) that the United States buys 92% of its leading-edge chips from TSMC.

Despite occasional promises, small head nods, and paeans to the contrary, major firms have not been converting their products to open protocols. iMessages are still blue and everyone else is green. Meta products are only interoperable with other Meta products. X, née Twitter, is not interoperable with anything. Attempts to re-decentralize the internet, like the self-hosting platform arkOS, Citation: Sunset (arkOS, 2017) have regularly run out of resources and been discontinued.

Still, some attempts at decentralization have been more successful. LOCKSS (Lots of Copies Keep Stuff Safe) is a digital preservation strategy, protocol, and software developed by Victoria Reich and David Rosenthal in 1999 at Stanford Libraries. Rosenthal has spent decades working on and writing about the possibilities and pitfalls in long-term digital storage (this piece would absolutely not exist without his work). Citation: Keeping Bits Safe: How Hard Can It Be? (ACM Queue, 2010) For LOCKSS, multiple copies of academic journals are stored across a distributed network; each copy in the system periodically checks itself against other copies for damage and discrepancies, a process of polling and repair. They whisper to each other, sharing checksums, ensuring that their copies remain uncorrupted. If a node detects a discrepancy, an injury, it sends out a silent SOS, and another node, a digital Samaritan, comes to its aid, offering a pristine copy to heal the wound. But each node is autonomous, individually responsible for tending to its own copies and paying its subscriptions.
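
The gist of that polling-and-repair loop can be sketched in a few lines of Python. This is a toy illustration only, not the actual LOCKSS protocol, which uses sampled, tamper-resistant opinion polls among peers rather than a naive majority vote; the node names and content below are invented.

```python
import hashlib

class Node:
    """A toy preservation node holding one copy of a work (not real LOCKSS)."""
    def __init__(self, name: str, content: bytes):
        self.name = name
        self.content = content

    def checksum(self) -> str:
        return hashlib.sha256(self.content).hexdigest()

def poll_and_repair(nodes):
    """Nodes compare checksums; a node in the minority assumes its own copy
    is damaged and takes a fresh copy from a peer in the majority."""
    votes = {}
    for node in nodes:
        votes.setdefault(node.checksum(), []).append(node)
    # Treat the checksum held by the most nodes as the authoritative version.
    majority_sum, majority_nodes = max(votes.items(), key=lambda kv: len(kv[1]))
    good_copy = majority_nodes[0].content
    for checksum, holders in votes.items():
        if checksum != majority_sum:
            for node in holders:
                print(f"{node.name}: damage detected, repairing from a peer")
                node.content = good_copy

# Three copies of a journal issue; one has suffered bit rot.
nodes = [Node("library-a", b"Vol. 12, Issue 3 ..."),
         Node("library-b", b"Vol. 12, Issue 3 ..."),
         Node("library-c", b"Vol. 12, Issue 3 ..?")]  # corrupted byte
poll_and_repair(nodes)
assert len({n.checksum() for n in nodes}) == 1  # all copies agree again
```

The essential property is that no copy trusts itself: each node is only as healthy as the agreement of its peers.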

What LOCKSS represents is an attempt to put a decentralized and reliable storage system within the control of a community. The result is a network of 80 research and public libraries sharing Citation: LOCKSS Program “custody of the scholarly record on library-owned storage, not in the cloud.” Members of LOCKSS pay a fee based on their budgets, an attempt to spread the financial burden beyond one institution.

LOCKSS is careful. It is narrowly tailored, built to respect copyright holders and institutional administrations. It is limited to 13,200 journals and 23,600 books under a set of labored agreements with publishers. The growth of the system is constrained by its strict enforcement of intellectual property rights and the inherent costs of maintaining a place within the network. These costs are ongoing and persistent, and unlike specific individual works that eventually enter the public domain, the current regime of academic publishers of journals and periodicals intends to continually publish new material to be held under copyright, and charge for it, forever. That LOCKSS achieved its scale in the current copyright regime is unusual.

The fundamental idea of LOCKSS—mutual decentralized stewardship—recalls much earlier forms of online file-sharing. As soon as computers could talk to each other, people used platforms like bulletin board systems, Usenet, and IRC to share data with all those who were connected. For a brief moment, in the late 1990s and 2000s, file-sharing (and later, torrent systems) spread massive amounts of music and video files with impunity despite limited bandwidth. Such structures, of course, allowed intellectual property rights to be ignored with abandon. Copies of goods that were contemporaneously being sold could be quickly acquired for free. And there were other problems. Malware was rampant, the time-intensive burden of collection management was shifted onto individuals, and what was available was completely dependent on the whims of the uploaders. Most significantly, these platforms were all run on centralized servers, which meant that once courts and states attacked them, they vanished. If decentralized storage were to have a future, it would have to be one in which no user could access the files of any other user, one in which collections were walled off.

The progeny of these platforms still exist, and in some cases, thrive, though they are no longer a dominant means of distributing media. Sci-Hub, Library Genesis, and Z-Library offer academic journal articles for free to anyone who wants to download them, flouting intellectual property laws and invoking the right to science and culture under Article 27 of the Universal Declaration of Human Rights. These platforms are, in effect, an illegal, decentralized mirror of initiatives like LOCKSS, piracy that also functions as an insurance policy in the case of a future global meltdown. The singular and well-defined missions of these efforts help make them popular and contribute to their survival, despite their murky legal status and the vast powers arrayed against them. Their tremendous narrative strength—promoting universalist causes against overwhelming odds, not to mention the mythical appeal of being outlaws—has resulted in fierce protection from fans and has incentivized care and stewardship from this loyal community.

It’s worth considering the efficacy of piracy and the intentional breaking of intellectual property law as a long-term preservation tactic. Abigail De Kosnik, a professor in the Berkeley Center for New Media, contends Citation: Piracy Is the Future of Culture (Abigail De Kosnik, 2019) that, given the nature of digital cultural output and the failures of the current corporate and institutional orders to properly care for them, piracy-based media preservation efforts are more likely to survive catastrophic future events than traditional institutions. On the other hand, as the notorious prosecution of Aaron Swartz or the legal cases against the Internet Archive demonstrate, engaging in copyright infringement at scale runs the constant risk of sanction and shutdown from state actors.

Fully decentralized systems present a more elusive target for such actions. The InterPlanetary File System is a decentralized protocol designed to create a peer-to-peer network for storing and sharing files in a distributed structure. Instead of relying on centralized servers, IPFS uses content addressing, where each file is identified by a unique hash derived from its contents. This allows a specific file to be retrieved from any node in the network that holds a copy of the corresponding content, all shared on a global network built to recognize and advertise multiple instances of the same file. IPFS functions like one giant torrent but allows users to download or seed only a part of the whole.
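
A rough sketch of the content-addressing idea, in Python: the address is derived from the bytes themselves, so any node holding those bytes can serve the request, and the requester can verify what it receives by rehashing. Real IPFS CIDs layer multihash formats, chunking, and a distributed hash table on top of this, so treat the following, with its invented data, as a simplified illustration.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the bytes themselves (IPFS CIDs add multihash
    and encoding layers on top of this basic idea)."""
    return hashlib.sha256(data).hexdigest()

# Any node that stores the block can serve it; the address never names a server.
node_a = {}
node_b = {}
block = b"<html>A page worth keeping</html>"
cid = content_address(block)
node_a[cid] = block
node_b[cid] = block  # a second, independent copy under the same address

def fetch(cid: str, nodes):
    """Ask peers for the block and verify it by rehashing, so a corrupted or
    malicious copy is rejected no matter which node supplied it."""
    for node in nodes:
        data = node.get(cid)
        if data is not None and content_address(data) == cid:
            return data
    raise KeyError("no node could supply a valid copy")

assert fetch(cid, [node_a, node_b]) == block
```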

In 2017, when Turkish courts banned Wikipedia, IPFS allowed an entire copy to be distributed to bypass the ban. After crackdowns on the aforementioned Library Genesis and Z-Library, their holdings were migrated to IPFS. The long-term success of holdings stored with IPFS is dependent on the digital archival practices of each individual participant, and reliant on a level of participation that can be, as any open-source developer can tell you, fragile.

In 2017 Protocol Labs, the developers of IPFS, launched Filecoin, a cryptocurrency-based digital storage system partially based on the IPFS architecture. It attempts to incentivize participation by compensating those who provide storage space with a cryptocurrency. Filecoin is not alone. Arweave, Storj, Sia, BitTorrent Token, and Safecoin are all variations on the same theme, new attempts at an older dream: creating a market system that can connect all the unused digital storage scattered about the planet to those who might need it. We have always had vast surpluses of unused digital storage space and no viable marketplace to harness this excess, which could allow those with extra storage to profit and give those looking to store access to a cheap distributed market.

Fully blockchain-based systems, where each new piece of data gets added to the end of a chain that is then replicated in every instance and every node, have a complete “persistence mechanism.” The entire record of data is stored in an immutable, decentralized ledger across multiple nodes, ensuring transparency but consuming significant storage because the entire transaction history exists at every node. Because this mechanism is not viable for storing large amounts of data, most cryptocurrency-based storage solutions rely on contract-based persistence mechanisms, implemented through so-called “smart contracts,” and thus store only essential data directly within the contract. This approach avoids the replication of the entire blockchain history.

Coin-based storage systems work by incentivizing users to store the data entrusted to them. They are designed to constantly verify that storage providers are storing an unaltered, undamaged copy of that which has been entrusted to them, and confirm that that storage is continuing over time. To Filecoin’s credit, rather than walling off their system, they offer storage solutions compatible with Amazon S3. They appear to be genuinely interested in storing archival data and working with archival and educational institutions. Their associated charitable organization, the Filecoin Foundation for the Decentralized Web, provided financial support to the Library Innovation Lab at Harvard Law School, allowing for the creation of this piece.
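
The verification problem these systems face can be illustrated with a simplified challenge-response audit: the depositor keeps a batch of precomputed salted checksums and periodically demands that the provider recompute one, something only a party still holding the intact bytes can do. Filecoin’s actual proofs of replication and spacetime are far more elaborate and cryptographic; this Python sketch, with invented data, only shows the basic shape of the incentive.

```python
import hashlib, os, random

def salted_digest(salt: bytes, data: bytes) -> str:
    """Only someone holding the full, intact data can compute this value."""
    return hashlib.sha256(salt + data).hexdigest()

# Before handing the file over, the depositor precomputes answers to a batch
# of one-time challenges, so it can audit later without keeping the file.
data = b"an archive worth paying someone to keep"
salts = [os.urandom(16) for _ in range(100)]
expected = {salt: salted_digest(salt, data) for salt in salts}

class Provider:
    def __init__(self, stored: bytes):
        self.stored = stored

    def respond(self, salt: bytes) -> str:
        # An honest provider recomputes the digest from the bytes it stores.
        return salted_digest(salt, self.stored)

honest = Provider(data)
negligent = Provider(b"")  # quietly deleted the file to resell the space

salt = random.choice(salts)
assert honest.respond(salt) == expected[salt]      # passes the audit, gets paid
assert negligent.respond(salt) != expected[salt]   # fails the audit, forfeits payment
```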

In 2018, digital preservationist and LOCKSS co-founder David Rosenthal argued Citation: The Four Most Expensive Words in the English Language (David Rosenthal, 2018) that cryptocurrency-based decentralized storage networks will never catch up to centralized cloud storage offerings on reliability, price, speed, and access terms. The need for encryption of all storage assets for security reasons, the lack of stable pricing, and the constant need for storage market liquidity all create potential long-term issues. Lastly, as Rosenthal also points out, if contributing storage space to these services does become profitable, there is a profound risk of centralization of storage providers: If providing storage generates revenue, that revenue will centralize because it is incentivized to centralize, just like other supposedly decentralized offerings in an unregulated market context.

The untested legal status of these systems also poses potential problems. Storing copies of copyrighted intellectual property could lead to problems within the market itself if providers in certain jurisdictions are legally forced to delete data they were contracted to store. Citation: Document how removal of data for legal reasons (Github/Filecoin, 2018)

None of these schemes have so far proven that they can function, let alone thrive, as viable marketplaces for a sustained period of time, nor that they can reliably incentivize storage in times of strife or scarcity. Since the development of large-scale trading civilizations, no region on Earth has seen a century pass without a significant economic crisis, shock, or shortage. On the century scale, these events can be severe, capable of toppling regimes, destroying nation-states, and sparking conflicts that lead to deaths measured in the millions. To directly peg an archival storage method to a market system with stakeholders that feed on volatility is equivalent to burying your hard drives in a 100-year flood zone. If a cryptocurrency-backed decentralized storage solution is going to be viable in the long term for cultural and intellectual institutions and collectors—organizations and individuals that tend to have extremely sensitive budgetary practices—it has to find a way to limit and mitigate the effect of these shocks on its pricing.

Blockchain-based implementations also run the risk of being intellectually dismissed or even politically targeted by those turned off by the speculative financial use of the technology and its tainted history of grift, fraud, rent-seeking, greed, and anti-statist techno-libertarian fantasies. I know this because I am one of those who harbor such instinctive negative reactions. Even if you believe these reactions are not fair, they are an excellent example of the human volatilities one must consider in evaluating a technology for century-scale storage. If you construct your storage scheme using a culturally and politically volatile technology, that in itself presents a risk long after you are gone. As with every other method described here, the method must be preserved along with what is being stored.

The cryptocurrency and blockchain communities, and their related firms, have so far invested little in the necessary digital-preservation grunt work that might allow their protocols, and the software and hardware those protocols run on, to endure. Crypto firms have also not mounted any significant challenges to the centralized hardware and telecommunications companies that make their models possible. If they are serious about wanting truly decentralized and resilient solutions to help store human cultural memory, they should use their resources to attack, subvert, and replace the centralized telecommunications, hardware, and software behemoths that they currently rely upon. A crypto community that is serious about a decentralized internet should be in an all-out war with the Verizons, AT&Ts, Comcasts, Starlinks, and Spectrums of the world, and treating the dominance of a firm like NVIDIA with utter hostility.

But these critiques are secondary. We can imagine an alternate decentralized storage technology that doesn’t relate to cryptocurrency at all and still arrive at the real evaluative question present here: that of centralization versus decentralization in archival practice itself.

Andrew Pettegree and Arthur der Weduwen’s The Library: A Fragile History Citation: The Library: A Fragile History (Andrew Pettegree & Arthur der Weduwen, 2021) opens with the anecdote of a 16th-century Dutch scholar arriving for his appointment at the Holy Roman Emperor’s library to find it in a state of utter neglect and destitution. The printing press had only been around a century, but in that short time, the greatest enemy of archives—neglect—had already struck. We can fret about all manner of dramatic disasters. Global thermonuclear war, asteroid impacts, caldera volcanoes, x-risks, Skynet, cultural revolutions, second comings, alien invasions, Malthusian crises, birthrate collapses, pandemics, solar flares, and Local Group supernovae. We can try to engineer around every variety of society-threatening catastrophe, the seas boiling and the ground rumbling and the cities burning. We can imagine how decentralization could provide security against destructive scenarios, how it would protect an archive in case of invasion, fire, bombing, and cyberattack. But none of those are what primarily kills archives. Boring human neglect kills archives.

The most pressing question for decentralized storage services is: Can they inspire care?

A library subject catalog

The subject catalog ("Schlagwortkatalog") of the University Library of Graz

Photo by Dr. Marcus Gossler, licensed under GNU Free Documentation License

There are certainly situations where centralization has proven disastrous. In Bosnia, the National Archives in Sarajevo were seriously damaged during a series of demonstrations and riots in 2014. In 1984, the Sikh Reference Library in Amritsar, Punjab, was targeted in an Indian military operation and its entire collection confiscated. It has not yet been returned and is presumed lost. The Boxer Rebellion in 1900 claimed Beijing’s Hanlin Academy library. The only known manuscripts of both Beowulf and Sir Gawain and the Green Knight survived a fire at the Cottonian Library in London in 1731. Other volumes were not so lucky. History is replete with the destruction and loss of libraries and books. World War II alone destroyed or damaged millions of library-held volumes.

I have been avoiding mentioning the most famous destruction of a library in history, that of the fabled Library of Alexandria, not least because the time and circumstances of its destructions (plural) are not authoritatively determined. But I would offer the impressions of Richard Ovenden, Citation: The Story of the Library of Alexandria Is Mostly a Legend, But the Lesson of Its Burning Is Still Crucial Today (Time, 2020) author of Burning the Books: A History of the Deliberate Destruction of Knowledge, Citation: Burning the Books: A History of the Deliberate Destruction of Knowledge (John Murray, 2020) discussing Edward Gibbon’s account of the library’s fate in The History of the Decline and Fall of the Roman Empire: Citation: The History of the Decline and Fall of the Roman Empire (Edward Gibbon, 1776) “For Gibbon, the Library of Alexandria was one of the great achievements of the classical world and its destruction—which he concludes was due to a long and gradual process of neglect and growing ignorance—was a symbol of the barbarity that overwhelmed the Roman Empire, allowing civilization to leach away the ancient knowledge that was being re-encountered and appreciated in his own day. The fires were major incidents in which many books were lost, but the institution of the library disappeared more gradually both through organizational neglect and through the gradual obsolescence of the papyrus scrolls themselves.”

If your goal in century-scale storage is avoiding kinetic, Hollywood-ready catastrophes, then decentralized solutions are ideal, but whether they can combat neglect is less clear. If a decentralized scheme wants to be successful at century scale, neglect is what it should and must attack.

One of the few clear benefits of centralization is that it inspires care. If people know something is important, of value, potentially even the last of something, they tend to fight every day to protect it. The history of war, strife, and disaster is also the history of archivists, curators, artists, scientists, and passionate Samaritan bystanders saving works from impending destruction at great personal risk and sacrifice. The survivorship bias present in the human canon is merely an echo of thousands of acts of heroism.

The Bibliothèque nationale de France, previously the Royal Library, has survived 16 kings, two emperors, five republics, six full-scale revolutions, the Hundred Years’ War, the French Civil War, the Italian Wars, the Thirty Years’ War, the Franco-Dutch War, the Nine Years’ War, the War of the Spanish Succession, the Seven Years’ War, the Napoleonic Wars, the Franco-Prussian War, World War I, and World War II. It has, at times, safeguarded works that had no other known caretakers. The Bibliothèque nationale de France is neither an outlier nor a case of survivorship bias, as many national libraries attain century-scale storage even while withstanding violent changes to the states they serve.

A fairly large portion of human literature, science, art, and music has survived precisely because it has been relatively centralized. Despite the obvious risks of putting all one’s eggs in one basket, we should not dismiss centralization too quickly. Can we really trust the anonymous contributors to a distributed cryptocoin-backed storage service to operate with the same level of care as professional librarians in a centralized institution or obsessive individual collectors? Can we trust that, in the face of a disaster, a malicious government, or a marauding force, they would fight to protect their holdings? Or would they instead relax in the knowledge that somewhere else there is another copy, that write-blockers, error-correcting checksums, and encryption ensure that they are not alone? Therein lies the problem for distributed systems: What if every other node in the distributed network also assumes this security?

Over a hundred years, eventually, the havocs come. A distributed system runs the risk of overconfidence and a lack of individual responsibility. During World War II, librarians and curators in centralized institutions smuggled works directly out of the hands of the Gestapo and SS. Some refused to flee and stayed working under occupation at great personal risk, even pretending to work with the enemy (thus also risking targeting from resistance forces), while compiling ledgers that tracked the destinations of looted collections. Librarians in Lithuania concealed ancient Jewish texts in local church basements. In Poland, they hid 13th-century monastery manuscripts in bank vaults. Whole archives were moved across borders, under darkness, with brutal and certain death stalking anyone who might be caught in the act.

Still, there are plenty of examples of successful distributed or decentralized efforts worth considering. Some of the oldest libraries in the world—Saint Catherine’s Monastery, Al-Qarawiyyin, Nalanda University, the Vatican Library, Sakya Monastery—are arguably the surviving nodes of a network of keepers of religious texts. They are made possible by the first principle of print—of the codex and the scroll and even the manuscript—that it exists to be copied, to be multiple. This in itself is an endorsement of the merits of decentralization.

Globally, the performing arts, theater, dance, and music have all utilized decentralization as a preservation tactic to a staggering level of success. In the case of European baroque and classical music, thousands of orchestras and music schools across the world have used, and still use, the act of collecting, copying, printing, studying, and playing to safeguard and transmit works across the centuries. Even original instruments, now worth fortunes and hundreds of years old—the Stradivari, Amati, Guarneri, Ruggieri, Guadagnini, et al.—are still being played, preserved, passed down, and held in trust for the next generation of players. Periods of war and upheaval have seen small groups of musicians playing chamber music together in whatever spaces were available to them, allowing musical works to conquer time.

Decentralized fan culture is inherently protective. Individual enthusiasts and digital pirates gathering in forums and Discord channels have done an incredible job preserving literary, music, video game, and film history through aggregation, emulation, and decentralized distribution. High-quality versions of the original unaltered cuts of films like Star Wars (which are no longer commercially available) are being preserved and held in this way. Volunteer teams of “rogue archivists” Citation: Archive Team have been engaged in decades-long efforts to save digital and web assets in danger of abandonment or destruction.

A personal collection of objects, the books on your bookshelf, for example, can easily engender a substantive emotional connection. What will be key for decentralized storage systems is developing similar mechanics. The most successful volunteer decentralized computing project in history, the 1999–2020 SETI@home project—which analyzed collected radio signals in the search for signs of extraterrestrial intelligence Citation: A Brief History of SETI@Home (The Atlantic, 2017)—points to the ways such a scheme might be possible. Hundreds of thousands of computer users, including then-teenagers like me, gladly turned over their computers to this task. This was not accomplished with a promise of financial compensation, but an appeal to the sheer scope and grandeur of the mission and a genuine invitation to participate in something that could matter.

What is consistent about these examples is that they all involve groups who care. The most enduring decentralized efforts don’t owe their success to technological or organizational innovation, but rather to having enlisted generations of people with an emotional and intellectual investment in their worth. For both cloud storage services and distributed storage schemes, the question is whether they can provoke the necessary level of passion and watchfulness. Are they and their technologies empowering those who care, or setting them up to fail? Can cloud storage corporations transform themselves into wardens? Can distributed storage systems turn each node into a guardian?

Answers and Non-Answers

I have mostly been beating around the bush here for 12,000 words. One can make a real argument that storage methods and media are largely irrelevant to survival over such long periods. The success of century-scale storage comes down to the same thing that storage and preservation of any duration does: maintenance. The everyday work of a human being caring for something. If a collection enjoys proper maintenance and care for 400 years, odds are, that collection will survive 400 years. How it is stored will evolve or change as it is maintained, but if there are maintainers, it will persist.

This will stay true even with huge potential advancements in storage media on the horizon—foremost among them DNA storage, Citation: DNA: The Ultimate Data-Storage Solution (Scientific American, 2021) with its incredible capacity for density and replication. The method is currently limited by a painfully slow read/write speed and several processes that have not yet begun to be invented, but once it’s here, that technology will still have to be maintained.

Digital storage relies on software. All software and file formats are dependent on upkeep and preservation, as the march of technological advancement renders the hardware and software previously used to read and create media obsolete. Longstanding software is rare enough that it can become an object of fascination. In 2015, MIT Technology Review writer Glenn Fleishman answered a reader’s question Citation: What Is the Oldest Computer Program Still in Use? (MIT Technology Review, 2015) about what the oldest computer program still in use was. He concluded that the oldest was a Defense Department contracts management and tracking system, MOCAS, first created in 1958. It is still in use today, despite its scheduled retirement date Citation: Future of MOCAS (2018) of October 1, 2002. Fleishman also referenced a 1948 IBM 402 punch card system for inventory and accounting that was still being used by a Texas-based water filtration device manufacturer.

An IBM punch card

Public domain, via Wikimedia Commons

The IRS’s Individual Master File, the primary system for storing and processing tax submissions and inputting their data, was originally written in COBOL for IBM System/360 mainframe computers and has been running since the 1960s. There are parts of the UNIX codebase that have been continuously in use since the operating system’s start in 1969. There are likely implementations of assembly language that have been going since the 1950s.

It’s hard to determine the oldest piece of continuous digitally stored data that is not software or code itself and was never physicalized and re-digitized. Based on who the early adopters of hard drive and tape storage systems were, I would hazard that it’s a piece of meteorological or seismographic data recorded at a university on the West Coast of the United States, but that’s just a guess. The fact that some of these datasets were also held in print archives and later reentered into digital databases makes it hard to say for sure. The National Oceanic and Atmospheric Administration’s primary computer weather data system and Data Buoy systems have both been in use since 1970. These datasets have persisted through vigilance and the grinding attention of generations of scientists and their students. But none of them are close to attaining our 100-year mark.

Our digital tools fall into obsolescence and disrepair at an astonishing pace. Totally aside from issues related to preservation and storage, the risks when we fail to maintain software, and the knowledge and capacity to maintain it, are real and exigent. During the COVID-19 pandemic, several state and local governments found themselves in desperate need of COBOL programmers, as their unemployment insurance systems still ran on software built in the relatively ancient language. If you want to store something for a hundred years, the ability to read and retrieve that stored item in the future is critical. And the only way to ensure that ability is to preserve the software that allows you to access your data, preserve the hardware that can run that software, and preserve the knowledge and skills required to maintain the entire system.

While plenty of computer scientists and thinkers, like co-inventor of the internet Vinton Cerf, Citation: Bit rot (on digital vellum) | Vint Cerf | TEDxRoma (YouTube, 2014) have nobly proposed or theorized universal-file-format schemes that might change this reality, those schemes remain a speculative fantasy. The only currently viable way to preserve software is through the hard everyday work of maintenance, adaptation, and emulation. Right now, there is no shortcut or magic format. There is no hack.

But there are tools, efforts, and protocols that try to ease these burdens. The Web ARChive (WARC) file format is a standard for preserving web-based holdings that accommodates all sorts of secondary content. Fedora is a digital resource management system built from the ground up to preserve digital assets. CLOCKSS is an independent nonprofit implementation of LOCKSS technology intended as a long-term dark archive for journals and books. Rhizome, a digital art and culture organization that works out of the New Museum in New York City, has a dedicated digital preservation team working to preserve digital works. ArchiveBox is a self-hosted solution for archiving the web. The Media Archaeology Lab at the University of Colorado Boulder preserves and documents obsolete media. John Bowers, Jack Cushman, Jayshree Sarathy, and Jonathan Zittrain, here at the Library Innovation Lab, have proposed “Strong Dark Archives,” Citation: ‘Time Capsule’ Archiving Through Strong Dark Archives (SDA): Designing Trustable Distributed Archives for Sensitive Materials (Harvard Public Law Working Paper No. 22-17, 2022) a protocol for born-digital sealed records that must be protected for security or legal reasons. There are dozens of other worthy projects and examples. The librarians and archivists of the world have been tackling the challenges of digital preservation for decades—the issue is that no one else is.
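
As a concrete sense of what working with these formats looks like, here is a minimal sketch of writing a single captured page into a WARC file using warcio, an open-source Python library commonly used for reading and writing WARCs. The URL and payload below are invented placeholders, and a real crawl would of course capture live HTTP responses rather than a hard-coded string.

```python
from io import BytesIO
from warcio.warcwriter import WARCWriter
from warcio.statusandheaders import StatusAndHeaders

# Write one captured page into a compressed, standards-conformant WARC file.
with open("capture.warc.gz", "wb") as output:
    writer = WARCWriter(output, gzip=True)
    http_headers = StatusAndHeaders(
        "200 OK",
        [("Content-Type", "text/html; charset=utf-8")],
        protocol="HTTP/1.0",
    )
    record = writer.create_warc_record(
        "http://example.com/",  # hypothetical URL for illustration
        "response",
        payload=BytesIO(b"<html><body>A page worth keeping</body></html>"),
        http_headers=http_headers,
    )
    writer.write_record(record)
```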

The real solution to century-scale storage, especially at scale, is to change this reality. Successful century-scale storage will require a massive investment in digital preservation, a societal commitment. Politicians, governments, companies, and investors will have to be convinced, incentivized, or even bullied.

The United States allocates scant resources to the practice and problem of archiving and preservation in general. I tried to calculate what percentage of U.S. GDP is spent on libraries and archives—not just digital preservation, not even just preservation, but what sort of resources were allocated to the entire category. I aggregated budget reports from national, state, and local agencies, nonprofit institutions, industry groups, and corporate archives; assessed the productive capacities of the industries that serve these groups; spoke with economists, experts, and analysts at UBS, Morgan Stanley, and the Congressional Budget Office—and was never able to get close to an estimate that cracked 0.1 percent of GDP. According to the Institute of Museum and Library Services (IMLS) Public Libraries Survey, public libraries in the United States had a total operating expenditure of about $13 billion in FY 2018. The National Archives requested $481 million for their 2024 budget. The private sector spends very little on its own archival efforts. Even extremely large companies tend to employ a single corporate archivist, if that. Relative to the size of any other part of our government or economy, these numbers are tiny.

Software is running our world. Spending so little to attempt to preserve something so important is a scandal.

The best option to ensure century-scale storage is to radically change this order. Any storage provider serious about being a viable long-term storage option should be screaming about software preservation at every opportunity. If the corporate stakeholders in this space are serious about providing long-term storage to customers, they should wield the full power of their financial, human, and political capital to make digital preservation a greater priority.

Every time a media company destroys an archive, every time a video game company prosecutes the preservers of content it has abandoned, every time a tech company kills a well-used product with no plan for preservation, these actions should be met with attention and resistance.

We are on the brink of a dark age, Citation: Raiders of the Lost Web (The Atlantic, 2015) or have already entered one. The scale of art, music, and literature being lost each day as the World Wide Web shifts and degenerates represents the biggest loss of human cultural production since World War II. My generation was continuously warned by teachers, parents, and authority figures that we should be careful online because the internet is written in ink, and yet it turned out to be the exact opposite. As writer and researcher Kevin T. Baker remarked, Citation: X (formerly Twitter), 2024 “On the internet, Alexandria burns daily.”

The Library of Alexandria, 19th-century artistic rendering by German artist O. Von Corven

19th-century artistic rendering of The Library of Alexandria

Public domain, "The Great Library of Alexandria" by O. Von Corven

For century-scale storage, you aren’t fighting against mere mortal enemies—you’re waging a battle against the raging and unkind powers of geology, physics, and chemistry, not to mention the inexhaustible fallibility of humanity as a species. No quarter will be given.

If you want to store something for 100 years, what are the best methods for ensuring its survival? Hold it within a social or governmental structure that is most likely to facilitate maintenance and care. Be under the protection of or affiliated with the right nation-state (for example, one could argue that the holdings of the Library of Congress are backed up by the full force of the United States nuclear arsenal). Be part of a major religion. Be part of an aristocracy. Be part of a prominent artistic or intellectual scene, or a participant in an artistic or intellectual tradition.

Still, each of these governance structures also presents risks. “There is no political power without power over the archive,” Jacques Derrida wrote in Archive Fever. The centralized power structures of monarchies, despotic states, military dictatorships, and single-party rule all have a penchant for the intentional destruction of artifacts and records in order to maintain control. Clan-based systems regularly destroy that which is not contained within them. Dominant free-market ideologies actively incentivize the mass abandonment of anything that does not have the market value to sustain itself. Nation-states, even social democracies that rank highly on various freedom indices, have a spectacular capacity for both conscious and accidental censorship, selective preservation, and desertion of the artifacts under their care. As Fernando Báez writes in A Universal History of the Destruction of Books, Citation: A Universal History of the Destruction of Books: From Ancient Sumer to Modern Iraq (Fernando Báez and Alfred MacAdam, 2008) “It’s a common error to attribute the destruction of books to ignorant men unaware of their hatred. After twelve years of study, I’ve concluded that the more cultured a nation or a person is, the more willing each is to eliminate books under the pressure of apocalyptic myths.”

In order to survive, a data storer, and the makers of the tools they use, must be prepared to adopt a skeptical and even defiant attitude toward the societies in which they live. They must accept the protection of a patron while also preparing for the possibility of betrayal. If you’re wondering why much of this essay takes such an antagonistic pose toward external political and economic actors, while also considering the fruits of their offerings, it is because the century-scale archivist must sometimes be in service of an ideology that only answers to itself—to the protection of the collected artifacts at all costs. This ideology, an “Archivism,” entails a belief in the preservation of that which we make and think for future generations, at the expense of anything else. Century-scale storage can span methods and platforms, be enabled by governments and titans of industry, be helped by religions, cultures, artists, scenes, fans, collectors, technocrats, and engineers, but it must, at the end of the day, retain its values internally.

This is where, once again, the only true solution is an aggressive and massive investment in archives, libraries, digital preservationists, and software and hardware maintainers at every level, in every form of practice and economic circumstance. This needs to happen not just for states, corporations, and institutions, but for hobbyists and consumers. Many of our most treasured artistic and intellectual artifacts survived for decades in the hands of individuals long before they entered institutional care.

Resilience over time is not something that can be designed at the moment of inception and then forgotten. Century-scale storage requires a watchful eye that can adapt to new threats, to new paradigms, to that which could not be previously imagined. The goal of century-scale storage must be to preserve that which we have created so that others, those we will never meet, may experience their intricacies and ecstasies, their capacities for enlightenment. This should be done by whatever means necessary, by whatever method or decision ensures the possibility of that future, one day at a time, with a willingness to change at any moment, to scrap and claw against the forces attempting to smother the light.

If you are a company that offers a storage product, how can you help the long-term digital storage of archival material? Try to find a new investment model, one which might allow you to build for the longer term. Embrace open protocols. Support Right To Repair laws and build hardware that is repairable. Attack firms on other parts of the network and computational chains bent on centralization and monopoly. Help fund and implement a completely new paradigm of the digital preservation of software.

And if you, an individual reading this, want to store something and ensure it survives a century, what should you do? More than one thing. You should combine every method available to you, layers of backups, armies of copies, and most of all, practices and sites that encourage a culture of watchfulness and care. You should fight for a society that values the sciences and arts and that which they produce. And then, each day, you should do whatever it takes to keep your something safe, do whatever you can to empower the next generation to do the same, and then entrust that battle to them, to repeat into futurity.

In 2009, a couple renovated an abandoned house in sleepy St. Anne, Illinois. The house was in bad shape, clearly vandalized, ransacked, cracking and warping. They found papers strewn and stacked all over, with the name “Florence Price” written on them again and again. They had unwittingly discovered the composer’s former vacation house 56 years after her death, and a dozen works that had been thought long-lost, including two violin concertos and her fourth symphony. In 2015, the editor of a literary magazine was trawling through Princeton University’s rare book archive when he came across an unreleased short story by F. Scott Fitzgerald, which had been intended for publication but never released because of a conflict between the author and his agent. An unpublished Edith Wharton story was found at Yale’s Beinecke Rare Book & Manuscript Library that same year. In 2010, at that same archive, Richard Wright’s daughter found a previously unpublished novel manuscript hidden within his papers. If you walked into a record store in the early 1950s and asked them for a copy of Vivaldi’s “Four Seasons,” now one of the most ubiquitous pieces of music in the world, chances are they would have had no idea what you were talking about. In the 1920s, a small group of Vivaldi enthusiasts in Italy scoured local libraries and repositories for what was then thought of as a minor figure’s works, eventually finding half his scores in a monastery in Piedmont, and the other half held by a wealthy aristocratic family. The group found financial backers, purchased both collections for the University of Turin, and went to work for the next three decades resurrecting a musical reputation. By the 1960s, Vivaldi was omnipresent. When the Czech National Museum was re-cataloging and digitizing their archives in 2015, they found a long-lost 1785 collaboration between Wolfgang Amadeus Mozart and Antonio Salieri. Just this year, the municipal libraries of Leipzig revealed that another Mozart composition had been rediscovered within their holdings. A few weeks before this piece was published, the New York Times revealed that a curator at the Morgan Library & Museum had unearthed a previously unknown Chopin waltz. Citation: Hear a Chopin Waltz Unearthed After Nearly 200 Years (The New York Times, 2024)

All of these works were republished, rerecorded, or re-performed to great acclaim. What was lost was found. Small individual acts of care, spread over generations, led to their survival and rediscovery. The digital versions of these miracles can and will happen. One day, someone will find the flash drive on the ransacked floor of a house, the forgotten server in the ruin of a data center, the file in the bowels of a database. It will matter. Even when contents have been damaged or forgotten, acts of previous care can bear fruit decades later. They are the difference between recovery and despair.

About the Author

Maxwell Neely-Cohen is a fellow at the Library Innovation Lab. His nonfiction and essays have appeared in places like The New Republic, SSENSE, and BOMB Magazine. His non-writing work has spanned theater, video games, dance, and music. His experiments with technology have been acclaimed by The New York Times Magazine, Frieze, and The Financial Times. Before his literary and artistic career, he worked as a conflict analyst studying social upheaval, nuclear weapons, and the effects of asymmetric warfare on societies and economies. He lives in New York City.

Credits

Edited by
Clare Stanton

Additional Editing
Meg Miller

Copy Editing
Gillian Brassil

Art and Design
Shelby Wilson and Alex Miller

Web Accessibility
Rebecca Cremona and Ben Steinberg

Support for this project was provided by the Filecoin Foundation for the Decentralized Web

Published by the Library Innovation Lab at Harvard Law School