This pilot project will digitise rare and unique printed books from the British Library's South Asian printed books collection and enhance the catalogue records to automate searching and aid discovery by researchers.
At the end of 2015, an international partnership led by the British Library received funding from the Newton Fund to digitise rare material from its South Asian printed books collection. The Two Centuries of Indian Print project has digitised more than 1,000 early printed Bengali books, which are now available online. It has also digitised 600 books printed in Assamese, Sylheti and Urdu, which are being made available online.
Highlights from the digitised collection are available through our online exhibition Early Indian Printed Books. To view all books digitised by the project, search the British Library catalogue and select 'I want this', then 'view digital content online'.
The project is exploring how digital research methods and tools can be applied to this digitised collection. You can browse an interactive map and data visualisation showing the location and activity of book printers operating in nineteenth-century Kolkata. The project will also deliver digital skills workshops and training sessions at Indian institutions to support innovative research within South Asian studies.
This project is a partnership between the British Library, the School of Cultural Texts and Records (SCTR) of Jadavpur University, Srishti Institute of Art, Design and Technology, and the Library at SOAS University of London, working with the National Library of India, the National Mission on Libraries, and other institutions in India.
We hope to extend the project to other languages in future phases as further funding becomes available.
For media enquiries, please contact Ben Sanderson: firstname.lastname@example.org.
For the first time, the project has made the Library's collection of bound Quarterly Lists freely available in digital format. These are descriptive catalogue records, issued quarterly for each province of British India, of books published between 1867 and 1947. The Quarterly Lists are available to download as searchable PDFs and as OCR XML via the British Library's Shared Research Repository.
In 2017 and 2019 we ran competitions to find an optimal solution for automatically transcribing the Bengali books and Quarterly Lists. Most recently, we have trained and are now using the Transkribus text recognition tool to create OCR transcriptions of the Bengali books. The OCR will enable full-text searching of the books through the British Library's Universal Viewer. The project has been supported by the School of Cultural Texts and Records at Jadavpur University, who manually transcribed more than 150 pages of Bengali text that were used in the competitions and to train Transkribus. These accurate transcriptions are freely available to download from the British Library's Research Repository and can be used to train OCR systems on your own Bengali collections. We would love to hear how you have found working with the datasets, or whether you would like to try OCR on our Assamese, Sylheti and Urdu books.
Events & Outreach
In March 2021 the project collaborated with the West Bengal Wikimedians User Group to run a Wikisource competition in which volunteers proofread OCR-generated text from our collection of digitised Bengali books. Over the month-long competition, 17 volunteers, most of them based in India, fully corrected the text of 20 books, producing more than 2,500 pages of error-free transcriptions. Including these books in Wikisource has made them more accessible, as the platform offers a side-by-side view of page images and their transcriptions, as well as automatic translation of the transcriptions from Bengali into other languages. A project page records the past and planned activity of the Two Centuries of Indian Print project on Wikisource.
Since 2016, the British Library has hosted the ‘South Asia Series’ of talks, inspired by the ‘Two Centuries of Indian Print’ project and the BL South Asia collection. The series has featured academics and researchers from the UK and abroad, who shared their cutting-edge research in discussions chaired by curators and specialists in the field.
Digital Skills Workshops
In August 2019 the project led an international workshop at the National Centre for Biological Sciences, Bangalore. Library professionals from 26 Indian institutions learned about the British Library's approach to digitisation standards and workflows. Attendees took part in practical activities introducing them to optical character recognition (OCR) tools that can be used to process texts in Indian languages, and developed strategic approaches to managing digitisation projects.
This workshop built on six previous workshops delivered by the project in collaboration with partner institutions in India, which together engaged more than 200 participants. In February 2019 digitisation workshops were held with the Asiatic Society of Mumbai and the Indian Institute of Technology, Indore. In July 2018 our Digital Curator led a training event at the India International Centre in collaboration with the American Institute of Indian Studies and Ashoka University. The event was attended by archivists representing academic and cultural institutions from across India, as well as from Cambodia and Australia.
The same workshop had previously been held in October 2017 at the International Conference of Asian Libraries, at Jamia Millia Islamia University, New Delhi. The event introduced librarians from across India to the digitisation standards practised by the British Library. A panel of speakers from the Centre for Studies in Social Sciences, Kolkata, and the India International Centre also presented the digitisation work undertaken at their institutions.
In July 2017 we hosted the second of three skills-sharing workshops at Jadavpur University, Kolkata. The event, Developments with Optical Character Recognition for Bangla, addressed the challenges and opportunities of OCR and computational linguistics in opening up vast quantities of knowledge to digital researchers. Attendees from 10 institutions, with backgrounds in information science, academia and computer science, experimented with a range of state-of-the-art OCR tools for Bangla, including the open-source Tesseract OCR engine. You can view a guide on how to install and use the latest version of Tesseract to produce OCR for your own materials.
Participants in the 'Developments with Optical Character Recognition for Bangla' workshop experiment with different OCR tools
In December 2016, the project's first workshop took place at Jadavpur University, Kolkata, where library and information professionals from cultural heritage institutions in Bengal took part in a one-day event to learn how information technology is transforming humanities research today and, in turn, library services. View the agenda for the workshop.
In July 2017, we held a two-day academic symposium on South Asian Book History at Jadavpur University, which brought together researchers, scholars and Digital Humanities practitioners from the UK, India, Bangladesh and Nepal. Twenty-five speakers across seven panel sessions discussed cutting-edge research in the field. View abstracts from the panel sessions. Videos of the talks can be watched at bl.uk/early-indian-printed-books/vidoes
Workshop on Islam and Print in South Asia
Researchers from India, Bangladesh, Pakistan, Europe and America gathered for two workshops held at the British Library on 28 September and 26 October 2018. You can view the programme and abstracts of the talks.
You can support this project and help make the Indian Print Collection freely available online to all.