International collaboration steers future of mass digitisation

01 July 2009

Feeding into the EU's i2010 vision to significantly improve access to Europe's cultural heritage, the British Library and the University of Salford have teamed up with a group of 15 institutions from across the continent as part of the four-year IMPACT project - IMProving ACcess to Text - to remove the barriers that stand in the way of the mass digitisation of the European cultural heritage.

Led by the National Library of the Netherlands, Koninklijke Bibliotheek, the IMPACT project aims to share expertise from across Europe and establish international best practice guidelines, with a view to speeding up, standardising and enhancing the quality of mass digitisation by establishing a Centre of Competence for text-based digitisation. As one of the main participants, the British Library has taken the lead on one of IMPACT's four sub-projects, establishing the operational context of the work carried out by contributors to the project.

Mass digitisation has become one of the most prominent issues in the library world over the last five years, with a number of experienced libraries in Europe already scanning millions of pages each year. To help establish some standardisation over the course of the project, the British Library's team will lead work on a set of 'Decision Support Tools' focused on practical implementation support, providing guidance on digitisation workflow, the capturing of material and the organisation of metadata based on the real-world experiences of project partners. These measures, announced at the first IMPACT conference in April, will help ensure new material can be digitised successfully and feed into existing workflows.

Aly Conteh, eStrategy & Information Systems, Programme Manager, British Library said: "It is absolutely vital institutions like the British Library, the National Library of the Netherlands and technical experts like the University of Salford work together, sharing our experiences and resolving the challenges we face in digitising historic texts, to ensure that we deliver the digital resources that are sustainable and meet the expectations of the 21st century researcher."

With extensive experience in the digitisation of historic material, the British Library has also been working closely with technical experts at the internationally distinguished Pattern Recognition and Image Analysis (PRImA) research group, University of Salford, exploring methods of improving Optical Character Recognition (OCR) for use in the digitisation of less standardised material. OCR technology was absolutely vital for the delivery of the Library's recent digitisation of 19th-century UK newspapers (http://newspapers.bl.uk/blcs), allowing the text to be fully searchable, but the current technology has its limitations.

Dr Apostolos Antonacopoulos, Director of the PRImA research group at the University of Salford commented: "This collaboration presents a unique opportunity to make a significant world-wide impact on the digitisation of historical documents, by focusing extensive research expertise to exceptional material in both breadth and volume, such as the collections in the British Library. So far libraries and archives around the world rely on service providers whose best technologies are designed primarily for modern business documents (the service providers' largest commercial market) and cannot take fully into account the nuances of and problems posed by ageing books and newspapers."

Through collaboration, IMPACT has already established methods for overcoming issues with geometric correction, border removal and binarisation, and is looking at examples of best practice from around the world, such as the Australian Newspaper Digitisation project's cutting-edge application of collaborative user-generated corrections, to increase resource discovery success for historic mass digitisation.

Further announcements will be made about the progress of IMPACT and further public events through www.impact-project.eu/news

For more information please contact:

Jacob Lant, Press Officer, British Library
+44 (0)20 7412 7105 / jacob.lant@bl.uk
Follow us on Twitter http://twitter.com/blpressoffice

Jamie Brown
Press and PR Office
T +44 (0)161 295 5361
j.brown@salford.ac.uk

Out-of-hours telephone: +44 (0)20 7412 7150

Notes to editors

The British Library is the national library of the United Kingdom and one of the world's greatest research libraries. It provides world class information services to the academic, business, research and scientific communities and offers unparalleled access to the world's largest and most comprehensive research collection. The Library's collection has developed over 250 years and exceeds 150 million separate items representing every age of written civilisation. It includes books, journals, manuscripts, maps, stamps, music, patents, newspapers and sound recordings in all written and spoken languages. www.bl.uk / http://twitter.com/britishlibrary

With origins stretching back to 1838, the University of Salford has a rich history of innovation in the community and economy. It has almost 20,000 students from over 140 countries - a diverse mix attracted by the range of study programmes, the 'campus in the city' location and a growing international reputation. The University is leading the innovation agenda in research, teaching, and its work with business and the community. To achieve this, Salford is investing more than ever before in facilities, and forging partnerships locally, nationally and across the world. Only a mile and a half from the centre of Manchester, the University is at the heart of one of the most dynamic economic areas of the UK - with all the advantages in terms of student experience, business links and opportunities for investment. www.salford.ac.uk

PRImA is a group of researchers aiming to develop world-class Pattern Recognition and Image Analysis techniques for real-world problems. Methods developed by PRImA members and their associates have gained international academic standing and are currently in use in industry. The general goal of PRImA research is the extraction of higher-level information from any data (images, signals and other numerical data), with a primary focus on the processing and analysis of images. The work of PRImA in digital restoration and recognition of historical documents is among the best known internationally and has won research awards. PRImA, currently the most active document analysis research group in the UK, is part of a Europe-wide consortium (the IMPACT project) whose goal is to dramatically improve techniques for the mass digitisation of the vast collections held by the British Library and other great European libraries. www.primaresearch.org

IMPACT is a European project that aims to speed up the process and enhance the quality of mass digitisation in Europe. The IMPACT research programme will significantly improve electronic access to historical printed text through the development and use of innovative Optical Character Recognition software and linguistic technologies. IMPACT will also build capacity in mass digitisation across Europe. The fifteen partners (seven libraries, six research institutes and two private sector companies) collectively constitute a Centre of Competence that will share best practice and expertise with the cultural heritage communities in Europe.

The project started on 1 January 2008 and will run for four years. http://www.impact-project.eu