
Born digital

Under one of my bookcases there is a box containing 40 or 50 5.25-inch floppy disks next to an old floppy drive of the same size. The disks were created in SuperScripsit in the early 1980s, and require an emulator that pretends my Core2Duo is a TRS-80 Model III.

If, like me, you have had a computer for any length of time you, too, have stowed somewhere a batch of old files that you save because they are or were important to you but that you're not sure you could actually read, though you keep meaning to plug that old drive in and find out. But the Domesday Book, drafted in 1085, is still perfectly readable. In fact, it's more readable than a 1980s digital Domesday Book that was unreadable only 15 years after its creation because the technology it was stored on was outmoded.

The average life of an electronic document before it becomes obsolete is seven years. And that's if it survives that long. Paper can last centuries – and the National Archives, which holds 900 years of Britain's records, has to think in centuries.

This week, the National Archives announced it was teaming up with Microsoft to ensure that the last decade or two of government archives do not become a black hole in history.

The problem of preserving access to today's digital documents is not newly discovered. Digital preservation and archiving were on the list of topics of interest in 1997, when the Foundation for Information Policy Research was founded. Even before that, NASA had discovered the problem, in connection with the vast amounts of data collected at taxpayer expense by the various space missions. Librarians have known all along that the many format changes of the digital age posed far greater problems than deciphering an unfamiliar language chiseled into a chunk of stone.

But it takes a while for non-technical people to understand how complex a problem it really is. Most people, Natalie Ceeney, chief executive of the National Archives, said on Tuesday, think all you have to do is make back-ups. But for an archivist this isn't true, even for the simple case of, say, a departmental letter written in the early 1980s in WordStar. The National Archives wants to preserve not only the actual text of the letter but its look, feel, and functionality. To do that, you need to be able to open the document in the software in which it was originally created – which means having a machine you can run that software on. Lather, rinse, and repeat for any number of formerly common but now obsolete systems. The National Archives estimates it has 580TB of data in obsolete formats. And more new formats are being invented every day: email, Web, instant messages, telephone text messages, databases, ministers' blogs, internal wikis…and as they begin to interact without human intervention that will be a whole new level of complication.

"We knew in the paper world what to keep," Ceeney said. "In the digital world, it's harder to know. But if we tried to keep everything we'd be spending the entire government budget on servers."

So for once Microsoft is looking like a good guy in providing the National Archives with Virtual PC 2007, which (it says here) combines earlier versions of Windows and Office in order to make sure that all government documents that were created using Microsoft products can be opened and read. Naturally, that isn't everything; but it's a good start. Gordon Frazer, Microsoft's UK managing director, promised open formats (or at least, Open XML) for the future. The whole mess is part of a four-year Europe-wide project called Planets.

Digital storage is surprisingly expensive compared to, say, books or film. A study reported by the head of preservation for the Swedish national archives shows that digital can cost up to eight times as much (PDF, see p4) as the same text on paper. But there is a valuable trade-off: the digital version can be easily accessed and searched by far more people. The National Archives' Web site had 66 million downloads in 2006, compared to the 250,000 visitors to its physical premises in Kew.

Listening to this discussion live, you longed to say, "Well, just print it all out, then." But even if you decided to waive the requirements for original look, feel, and functionality, not everything could be printed out anyway. (Plus, the National Archives casually mentions that its current collection of government papers is 175 kilometres long already.) The most obvious case in point is video evidence, now being kept by police in huge amounts – and, in cases of unsolved crimes or people who have been sentenced for serious crimes, for long periods. Can't be printed. But even text-based government documents: when these were created on paper you saved the paper. The documents of the last 20 years were born digital. Paper is no longer the original but the copy. The National Archives is in the business of preserving originals.

Nor, of course, does it work to say, "Let the Internet Archive take care of it": too much of the information is not published on the Web but held in internal government systems, from where it will be due to emerge in a few decades under Britain's 30-year rule. Hopefully we'll know before then that this initiative has been successful.

Wendy M. Grossman’s Web site has an extensive archive of her books, articles, and music, and an archive of all the earlier columns in this series. Readers are welcome to post here, at net.wars home, at her personal blog, or by email to netwars@skeptic.demon.co.uk (but please turn off HTML).

