Harvesting Digital Heritage

New Zealand-led partnership breaks new ground in the management of the world's digital heritage by developing a new system for collecting web pages for digital heritage archives.

The National Library of New Zealand Te Puna Mātauranga o Aotearoa, The British Library and Sytec, a subsidiary of TelstraClear, have announced the successful development of a web harvesting management system.

The system, known as the web curator tool, will enable organisations to easily gather online material for storage in digital archives.

The web curator tool is the latest development in the practice of web harvesting (using software to 'crawl' through a specified section of the world wide web, and gather 'snapshots' of websites, including the images and documents posted on them). The web curator tool is a further advance in the race to ensure the world's digital heritage is preserved for future generations and not lost through obsolescence and the temporary nature of the web.

The partnership was brought together under the auspices of the International Internet Preservation Consortium (IIPC) to find a desktop solution to the challenge of collecting web material that would allow widespread implementation of web harvesting without requiring a high level of technical understanding within organisations.

The Consortium asked the National Library of New Zealand and the British Library to work together to develop a solution that will manage the web harvesting process and can be adapted for all consortium members and other institutions. The project was funded entirely by the national libraries, with IIPC members contributing to the initial solution requirements.

The web curator tool has been developed as an enterprise-class solution. It is interoperable with other organisational systems and has a user-centred design. The web curator tool enables users to select, describe and harvest online publications without requiring an in-depth knowledge of web harvesting technology. It is auditable and workflow-driven: it identifies the content for archiving and then manages it through permissions, selection, description, scoping, harvesting and quality review.

The National Library of New Zealand and The British Library will integrate the web curator tool into their own digital preservation programmes and the system will be shared with other organisations around the world as an open source release before the end of the year.

'It is very exciting to be involved with the British Library and the IIPC in this flagship project. It is also a great pleasure for us to contribute to a project that will benefit all participants in the digital preservation space,' says Penny Carnaby, National Librarian and Chief Executive of the National Library of New Zealand.

Ms Carnaby noted that it was a tribute to the teams involved that the project has been completed on time, within budget and with no concern for the distances involved. Ms Carnaby described the successful partnership as a very telling example of the decreasing grip of the 'tyranny of distance' in a digital world.

"The British Library is delighted to be collaborating in this project on behalf of the International Internet Preservation Consortium," said Stephen Green, the British Library's Web Archiving Programme Manager.

"The web curator tool forms a key part of our strategy, allowing us to augment an automated uk-domain level 'harvest' of websites with the collection of specific sites that we consider to be an important part of British cultural heritage. We hope that the Web Curator Tool will benefit the wider community, which is why it will be released as open-source software. Going forward, the British Library will be working with the National Library of New Zealand and the IIPC to seek ways of maintaining and enhancing the software over time."

Brendon Price, Software Solutions Team Leader for Sytec, says the company is pleased to have designed a tool that will record the history of the Internet for the archiving community around the world.

"It's exciting to work on a project of this magnitude. The project was challenging in that while our primary customers were the National Library of New Zealand and the British Library, the solution also had to be compatible with the other systems used by the IIPC."

"We've developed an integrated system that eliminates many of the previous problems faced when archiving web content, improving the workflow and integration of the harvesting operation," he says.

For further information

Courtney Johnston
National Library of New Zealand
T: +64 4 474 3013
E: courtney.johnston@natlib.govt.nz

Ben Sanderson
The British Library
T: +44 (0)1937 546126
M: +44 78100 56848
E: Ben.Sanderson@bl.uk

Jodine Laing
TelstraClear Ltd
T: +64 9 912 5343
M: +64 29 912 5343
E: Jodine.Laing@team.telstraclear.co.nz

About the National Library of New Zealand Te Puna Mātauranga o Aotearoa

The National Library of New Zealand Te Puna Mātauranga o Aotearoa has a vision of New Zealanders connected with information important to all aspects of their lives. The National Library and the Alexander Turnbull Library provide access to the nation's documentary heritage, preserve this heritage so that future generations of New Zealanders can explore and enjoy it, provide resources to schools that support all teaching and learning in New Zealand, and foster relationships with communities, including Māori, in New Zealand and throughout the world. For more information visit www.natlib.govt.nz

About the British Library

The British Library is the national library of the United Kingdom and one of the world's greatest research libraries. It provides world-class information services to the academic, business, research and scientific communities and offers unparalleled access to the world's largest and most comprehensive research collection. The Library's collection has developed over 250 years and exceeds 150 million separate items representing every age of written civilisation. It includes: books, journals, manuscripts, maps, stamps, music, patents, newspapers and sound recordings in all written and spoken languages. Further information is available on the Library's website at www.bl.uk.

About Sytec

Sytec is a leading IT&T services company committed to providing clients with a competitive advantage through the provision of balanced technology advice, solutions, and systems management. Established in 1987 as a privately owned company, Sytec was purchased by TelstraClear in November 2004 thereby giving Sytec not only a strong financial parent but also access to a broader spectrum of ICT technologies.

Background Information

  • The International Internet Preservation Consortium (IIPC) was formed in July 2003 by national libraries from around the world and the Internet Archive (IA) to participate in projects for developing tools and approaches supporting the archiving of web content. http://netpreserve.org/about/index.php
  • The web curator tool development will be completed at the end of September 2006. The total cost of the project is around $400,000 shared equally between the National Library of New Zealand and the British Library.
  • The National Library of New Zealand's previous web harvesting methods relied on a combination of HTTrack web harvesting software, manual processes, and document management systems.
  • The National Library of New Zealand (Te Puna Mātauranga o Aotearoa) Act 2003 requires the Library to collect, preserve and make accessible digital collections, along with the traditional paper collections, in ways that ensure current and future access to New Zealand's documentary heritage. The Act also extended legal deposit to include digital material, including websites; this provision came into force on 12 August 2006, authorising the National Librarian to copy Internet documents. For further information, see http://www.natlib.govt.nz/en/services/5legaldeposit.html
  • The web curator tool has been developed under the umbrella of the National Digital Heritage Archive (NDHA) Programme. The National Library established the NDHA Programme in 2004 to manage the development of software that will process the ingest, storage, preservation and access of published digital material obtained through legal deposit and donated unpublished digital material.
  • The National Library recently announced a partnership with Endeavor Information Systems, a library software company, to create a comprehensive solution for permanent storage and access to digital collections to be known as the National Digital Heritage Archive (NDHA).
  • http://www.natlib.govt.nz/bin/media/pr?item=1154900360
    http://www.natlib.govt.nz/bin/media/pr?item=1154899822
    http://www.natlib.govt.nz/en/whatsnew/4initiatives.html#NDHA%20Kronos

  • Material harvested by the web curator tool will be archived in the NDHA.
  • The National Library of New Zealand (Te Puna Mātauranga o Aotearoa) Act 2003: http://www.natlib.govt.nz/files/Act03-19.pdf