This project has digitised rare and unique printed books from the British Library's South Asian printed books collection and enhanced the catalogue records to automate searching and aid discovery by researchers.
At the end of 2015, an international partnership led by the British Library received funding from the Newton Fund to digitise rare material from its South Asian printed books collection. Running from 2016 to 2019, the Two Centuries of Indian Print project enabled the cataloguing and digitisation of over 1,600 printed books from the South Asian collections, dating from 1713 to 1914, in a range of languages including Bengali, Assamese, Sylheti and Urdu. These books are now available online.
Highlights from the digitised collection are available through our online exhibition Early Indian Printed Books. To view all books digitised by the project, search the British Library catalogue, select 'I want this' and then select 'View digital content online'.
The project also explored how digital research methods and tools can be applied to this digitised collection. You can explore an interactive map and data visualisation showing the location and activity of book printers operating in nineteenth-century Kolkata. Digital skills workshops and training sessions were delivered at Indian institutions to support innovative research within South Asian studies.
For this venture the British Library partnered with the School of Cultural Texts and Records (SCTR) at Jadavpur University, the Srishti Institute of Art, Design and Technology, the Library at SOAS University of London, the National Library of India, the National Mission on Libraries and other institutions in India.
To complement the project, the British Library hosted a series of South Asian Seminars in which academics and researchers from the UK and abroad shared their research and knowledge, including discussions chaired by curators and specialists in the field. The talks were inspired by the Two Centuries of Indian Print project, often referenced the British Library collections, and covered topics related to South Asian history.
The South Asia Series talks from 2016-2019 are available to listen to through SoundCloud.
Recordings from seminars that took place in 2021 are available on YouTube.
At the start of 2022 the project received follow-on funding from AHRC-UKRI to hold events celebrating 75 years of Indian Independence and marking the end of the Two Centuries of Indian Print project. The events will run from July to December and will include film screenings, show and tells, community workshops and research seminars. Find out more about what the project involves here.
For media enquiries, please contact Ben Sanderson: email@example.com.
For the first time, the project has made the Library's collection of bound Quarterly Lists freely available in digital format. These are descriptive catalogue records of books published in British India, compiled each quarter for each province between 1867 and 1947. The Quarterly Lists are available to download as searchable PDFs and as OCR XML via the British Library's Shared Research Repository.
Since the project’s inception, we have been exploring solutions to automate the recognition of Indian language texts. In 2017 and 2019 we ran competitions to find an optimal solution for automatically transcribing the Bengali books and the Quarterly Lists. We have since trained, and are now using, the Transkribus text recognition tool to create OCR transcriptions of the Bengali books. The OCR will enable full-text searching of the books through the British Library's Universal Viewer. The project has been helped by the School of Cultural Texts and Records at Jadavpur University, which manually transcribed more than 150 pages of Bengali text that were used in the competitions and to train Transkribus. These accurate transcriptions are freely available to download from the British Library's Research Repository and can be used to train OCR systems on your own Bengali collections. We are currently testing solutions for Urdu OCR and would welcome contact from groups developing tools or researching OCR for Urdu collections. We would also encourage anyone using our existing datasets to send us their feedback.
Events & Outreach
In 2021 the project collaborated with the West Bengal Wikimedians User Group to run two Wikisource competitions in which volunteers proofread OCR-generated text from our collection of digitised Bengali books. Across both competitions the community on Bengali Wikisource corrected more than 5,000 pages of text. The corrected text is being validated by Wikisource administrators, and more than 35 books have so far been made publicly available. Users can view page images and transcripts side by side, automatically translate them into more than 100 languages, or download them for further exploration. We continue to add books to Wikisource and will again be partnering with the Wikimedia Foundation in 2022 to integrate the project’s metadata into Wikidata.
Digital Skills Workshops
Since 2016 the project has run several ‘capacity building’ workshops in India. Across the workshops, more than 200 participants from cultural heritage institutions have come together to exchange knowledge on digitisation standards and workflows, data curation and digital humanities. The workshops have featured talks from industry experts as well as practical activities in which participants gained hands-on experience with optical character recognition software and devised strategies for managing digitisation projects. Workshops have taken place at Jadavpur University’s School of Cultural Texts and Records (2016 and 2017), Jamia Millia Islamia University (2017), the American Institute of Indian Studies (2018), the Asiatic Society of Mumbai (2019), the International Institute of Technology, Indore (2019), and the National Centre for Biological Sciences, Bangalore (2019). The project is planning more workshops for 2022.
Participants of the workshop held at the National Centre for Biological Sciences, Bangalore, August 2019
In July 2017, we held a two-day academic symposium on South Asian Book History at Jadavpur University, which brought together researchers, scholars and Digital Humanities practitioners from the UK, India, Bangladesh and Nepal. Across seven panel sessions, 25 speakers discussed cutting-edge research in the field. View abstracts from the panel sessions. Videos of the talks can be watched at bl.uk/early-indian-printed-books/vidoes
Workshop on Islam and Print in South Asia
Researchers from India, Bangladesh, Pakistan, Europe and America gathered for two workshops, which took place at the British Library on 28 September and 26 October 2018. You can view the programme and the abstracts of the talks.