Web archiving

How we're preserving the web

How do you preserve a website when it's constantly being redesigned? That's the challenge that the British Library's web archiving team faces.

The Library collects websites in the UK domain, selecting them primarily for their research value, for how well they represent British social history, cultural diversity and heritage, or because they demonstrate innovative use of web technology. We work closely with other national libraries and archives across the world, and we're a member of the International Internet Preservation Consortium (IIPC), which was formed to look at ways of preserving websites.

The challenges

The first challenge we face is getting permission to preserve the site. Sometimes the site is taken down before we can contact the owner, and even when we do reach the owner, only around 25% of those we contact grant permission.

Technically, preserving a website is a very complicated process. The software we use isn't keeping pace with the technology that's used to build and support websites. This means that we often can't preserve the site exactly as it stands, so when we archive the site and provide access to it, we're trying to show content that we can't control. For example, a simple code error in the original site can cause real problems in archiving it.

We also often can't replicate the look and feel of the original site, because the technology we use can't access JavaScript or streaming technology. There's a long-term presentation issue too, because some content can only be seen by using older versions of web browsers or plug-ins. This means that sites may not display correctly unless we understand, and can reproduce, the technical environment they were originally built for.
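To illustrate why JavaScript causes trouble, here is a minimal sketch in Python. It is not the Library's harvesting software; the sample page and the names `LinkExtractor` and `extract_static_links` are made up for this example. It assumes a harvester that only reads the HTML it downloads and never runs scripts, so a link that a script would add in the browser is never discovered.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags present in the downloaded markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only real <a> tags in the markup are seen; script text is not executed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_static_links(html):
    """Return the links a script-blind harvester would find in this page."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one ordinary link, one link that only exists once the
# script has run in a browser.
SAMPLE_PAGE = """
<html><body>
<a href="/about.html">About</a>
<script>
  var a = document.createElement("a");
  a.href = "/news/" + "latest.html";
  document.body.appendChild(a);
</script>
</body></html>
"""

found = extract_static_links(SAMPLE_PAGE)
print(found)  # the script-generated /news/latest.html link is missing
```

In this sketch the harvester would follow the "About" link but never reach the news page, which is the kind of gap that makes an archived copy differ from the live site.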

As members of the IIPC, we're helping to tackle these issues, developing new software and looking at ways of harvesting content.

A collaborative affair

The web archiving team sits in the Library’s technical directorate, eIS. In recognition of the curatorial challenges involved, we work closely with colleagues in Scholarship & Collections to ensure that curatorial concerns are taken into consideration throughout the web archiving process.

Curators in Scholarship & Collections provide advice on which websites are to be preserved. Our web archivist works closely with the Library's subject specialists to identify websites that are worth preserving - either because of their content, their social, cultural or historical importance, or their design.

Then the technical team makes sure that we have the resources and infrastructure to capture and store the site. Once we've gained permission to archive the site, the web archivist ensures it's harvested correctly. Then we carry out quality control checks to see if the site we have replicates the look and feel of the original. We also have to take the size of the site into account, as we only have a finite capacity for storing sites. There's never a dull moment!
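The permission and capacity steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Candidate` record, the `plan_harvest` function, the example.org addresses and the size figures are all hypothetical, and the real workflow also involves curatorial judgement and quality control that a few lines of code can't capture.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A site nominated for archiving (hypothetical record for this sketch)."""
    url: str
    permission_granted: bool
    estimated_size_mb: int


def plan_harvest(candidates, capacity_mb):
    """Pick sites to harvest, in order, without exceeding the storage budget.

    Sites without permission are skipped, as are sites that would push the
    running total past capacity_mb. Returns (urls_to_harvest, mb_used).
    """
    planned = []
    used = 0
    for site in candidates:
        if not site.permission_granted:
            continue  # no permission from the owner, so we can't archive it
        if used + site.estimated_size_mb > capacity_mb:
            continue  # too large for the remaining storage
        planned.append(site.url)
        used += site.estimated_size_mb
    return planned, used


# Made-up nominations and sizes, purely for illustration.
nominations = [
    Candidate("http://a.example.org/", True, 300),
    Candidate("http://b.example.org/", False, 50),
    Candidate("http://c.example.org/", True, 250),
    Candidate("http://d.example.org/", True, 150),
]

urls, used_mb = plan_harvest(nominations, capacity_mb=500)
print(urls, used_mb)
```

Here the second site is dropped for lack of permission and the third because it wouldn't fit in the remaining space, which mirrors the two constraints the team has to balance before any harvesting starts.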

Providing access

The Library's users want easy access to all the material on a given subject, including relevant websites. We're integrating the web archive into the Library’s catalogue so that when someone searches the catalogue, relevant content from the web archive shows up in the search results.

The web archive is important to the Library because it adds another layer of world knowledge to our collections. It's also globally important, as the web continues to grow into an essential source of information and communication. Working on this kind of initiative gives our technical people the chance to liaise with leading web archivists around the world and develop a wide range of skills, as well as keeping the British Library at the forefront of this work.