Quarterly Lists: Digitally Researching Catalogues of Indian Books
- Article written by: Tom Derrick
As well as digitising rare early printed Indian books, the Two Centuries of Indian Print project is making available online some wonderful catalogues held by the library. Generally known as the Quarterly Lists, they record, quarter by quarter and province by province, books published in British India between 1867 and 1947.
The catalogues complement the project's digitised Bengali printed books, such as Koner Ma Kande. I'd like to share a bit more about what the Quarterly Lists are, and what we are doing to make them as accessible as possible for book historians who want to apply digital research methods to explore their rich contents.
Firstly, a little more about the origins of these catalogues. With the passing of The (Indian) Press and Registration of Books Act, 1867, it became mandatory for every book published in the provinces of British India to be sent to the provincial secretariat library for registration. The India Office Library and the British Museum Library in London, whose collections were later united in the British Library, were each given the privilege of requesting books from these lists free of charge, in what amounted to a colonial legal deposit arrangement.
The act was passed to record the ever-growing number of publications issuing from printing presses throughout India, and its purpose was political as well as archival. Not every work that came off the presses was recorded in the lists, and only a small percentage were actually deposited in the London collections: the library curators in London selected only those works they thought important or interesting. The Quarterly Lists were originally published as appendices to the official provincial gazettes, such as the Calcutta Gazette and the Bihar and Orissa Gazette.
Now that the Quarterly Lists have been digitised for the first time, we have applied optical character recognition (OCR) to them and created an ALTO XML file for every page. ALTO is a standard format that records not only the recognised text but also where each word sits on the page, so it preserves the layout of the original. This enables researchers to apply computational tools and methods across all 100,000 pages of the lists to answer their questions about book history. The lists are a rich seam of bibliographic data about books published throughout India, including the names and addresses of printers and publishers, the price of each publication and how many copies were printed. Researchers interested in what the history of book publishing reveals about a particular time and place can access the full dataset of OCR XML and searchable PDFs at data.bl.uk/twocenturies-quarterlylists/.
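For readers new to ALTO, here is a minimal sketch of pulling plain text out of a page using only Python's standard library. It assumes the usual ALTO structure, in which each TextLine contains String elements whose CONTENT attribute holds a recognised word. The namespace is stripped so the same code works across ALTO versions. The function names and the tiny SAMPLE page are hypothetical, invented for illustration; they are not taken from the Quarterly Lists dataset.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(root: ET.Element) -> list[str]:
    """Return one string per ALTO TextLine, joining the CONTENT of its String children."""
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"  # SP (space) elements are skipped
            ]
            lines.append(" ".join(words))
    return lines


# Hypothetical, hand-written ALTO fragment (not real Quarterly Lists data).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="Sample"/><SP/><String CONTENT="title"/><SP/><String CONTENT="line"/></TextLine>
    <TextLine><String CONTENT="Second"/><SP/><String CONTENT="sample"/><SP/><String CONTENT="line"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_lines(ET.fromstring(SAMPLE)):
        print(line)
```

For a page saved to disk, `alto_lines(ET.parse("page.xml").getroot())` would do the same job, where page.xml is a placeholder filename. Because ALTO also carries word coordinates (the HPOS and VPOS attributes), the same loop could be extended to reconstruct the tabular columns of the lists rather than just their running text.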
Through the Digital Research strand of the project, we will be seeking out innovative research groups willing to take a crack at reducing the character error rate, and improving the accuracy of tabular text recognition and extraction, for the Quarterly Lists. With that in mind, we have launched a competition through the University of Salford's PRIMA Research Lab as part of the International Conference on Document Analysis and Recognition (ICDAR), taking place in Kyoto, Japan, in November 2017. The competition seeks accurate, automated transcription of both the Bengali books and the Quarterly Lists. So if you or anyone you know would like to enter, do please register: you could be contributing to this landmark project, and picking up an award for your troubles!
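Character error rate is usually defined as the number of single-character insertions, deletions and substitutions needed to turn an OCR output into a correct reference transcription, divided by the length of that reference. The sketch below implements that common definition with a standard Levenshtein edit distance. It is an illustration only and not necessarily the exact measure used by the competition. It also assumes characters are counted as Unicode code points, which for Bengali script may differ from counting visible grapheme clusters.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance from reference to OCR hypothesis, normalised by reference length."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One substitution (t -> i) and one deletion (a missing t) over 16 reference characters.
    print(character_error_rate("Calcutta Gazette", "Calcutia Gazete"))  # 0.125
```

Lower is better: a CER of 0.125 means one character in eight needs correcting. Normalising by the reference length means a very noisy output can score above 1.0, which is why CER is reported as a rate rather than a percentage accuracy.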
The text in this article is available under the Creative Commons License.