Web Archiving Programme
The Web Archiving Programme collects, makes accessible and preserves web resources of scholarly and cultural importance from the UK domain. Our objectives are:
- to build a comprehensive web archive as part of the British Library’s digital collection
- to preserve the archive so that it remains accessible in the future
- to put in place people, processes and systems so that the Library can fulfil its obligations with respect to legal deposit of web resources
Selective web archiving
Since 2004, the British Library has been selectively archiving, with permission, websites with research value that are representative of British social history and cultural heritage. Our Collection Development Policy states the criteria we use to select websites from the UK domain. Websites archived to date are made available through the UK Web Archive, along with additional material archived by the National Library of Wales, the Joint Information Systems Committee and the Wellcome Library.
The UK Web Archive contains regular snapshots of over 5,000 websites and offers full-text, title and URL search. The archive can also be browsed by Title, by Subject and by Special Collection.
Exploring domain-scale web archiving
The implementation of Legal Deposit for UK online publications, expected in 2011, means that the Library will have a mandate to collect and preserve freely available UK online publications. The Web Archiving Team is exploring the technical and curatorial challenges of collecting a much larger proportion of the UK domain in future. Through large-scale discovery crawls and semantic analysis, we aim to build a better understanding of the boundaries and characteristics of the UK domain. We will also put in place a system capable of scaling up to the size of the challenge: the UK web space is expected to contain over 11 million websites by 2011.
Integration with Library collections and systems
We are working with the Digital Library Programme and the Digital Preservation Team on ingest, storage and long-term preservation of web archives in the Digital Library System, initially involving our selective archive. Access to web archives will be provided in the Reading Rooms, through the Library’s Resource Discovery System, Primo. The Primo beta service can already find and display archived websites (hint: try searching for ‘Robin Cook’ or ‘Argotist’ or ‘rhyming slang’).
Developing web archiving tools
In recent years, the BL has been leading the development of key web archiving software tools on behalf of the international web archiving community.
The Web Curator Tool (WCT), which has been designed to manage the selective web archiving process, started as a collaborative project with the National Library of New Zealand, and has since been adopted by the National Library of Norway. The BL releases periodic revisions via Sourceforge.
Heritrix is open-source crawler software that is widely used by national libraries and archives around the world for web archiving. Version 3.0 of Heritrix was released in December 2009. The BL is part of a multinational group of libraries working on ‘smart’ extensions to Heritrix to provide better support for large-scale domain crawling.
Working with others
The British Library is a founder member of the International Internet Preservation Consortium (IIPC), which brings together national libraries and other organisations interested in web archiving to share experience and promote the use of common standards and tools. Members of the Web Archiving Team currently chair two of the four IIPC working groups: the Access Working Group and the Harvesting Working Group.
From 2004 to 2008 the British Library was also the lead partner in the UK Web Archiving Consortium (UKWAC), which comprised six organisations: the BL, the Joint Information Systems Committee, the National Archives, the National Library of Wales, the National Library of Scotland and the Wellcome Trust. UKWAC shared a common infrastructure to get selective web archiving started in the UK. It has since become a strategic group within the Digital Preservation Coalition (DPC), providing leadership and encouraging collaboration for UK web archiving activities. For more information on UKWAC / DPC, see http://www.dpconline.org/about/web-archiving-and-preservation-task-force.html.
Contact information
Helen Hockx-Yu
Web Archiving Programme Manager - eIS
The British Library
96 Euston Road
London
NW1 2DB
United Kingdom
Tel: +44 (0)20 7412 7184
E-mail: helen.hockx-yu@bl.uk

