Transcription of Kaye and Johnston's catalogue of manuscripts
This small but moderately complex project used AI to transcribe a catalogue, but still required considerable human effort to review and refine the output.
5 November 2025This small but moderately complex project used AI to transcribe a catalogue, but still required considerable human effort to review and refine the output.
5 November 2025Blog series Untold lives
Author Maddy Clark, Digitisation Officer

Page 1169 of the printed catalogue of European Manuscripts Volume II, Part II showing an overview summary of the Wilson Manuscripts.
A recent addition to the Research Repository is the transcription of the Catalogue of Manuscripts in European Languages Volume II Part II: Minor Collections and Miscellaneous Manuscripts, Section II- Nos. 539-842 by George Rusby Kaye (1866–1929) and Edward Hamilton Johnston (1885–1942).
This catalogue is a continuation of Kaye and Johnston Vol. II, Part II, Section 1, Nos. 1–538, (1937), and was completed by Johnston following Kaye’s death in 1929, covering manuscripts numbered 539–842. It was printed for publication but never published.
This transcription has been created from the copy held by the India Office Records & Private Papers section at the British Library. This particular copy contains some useful additional annotations by later curators that broaden the original text’s descriptions, including further details regarding the provenance of some collections, as well as serving to correct some mistakes. These notes have been added in the published transcription.
This project was completed using Transkribus, an AI text recognition software for historical documents. For the initial transcription and creation of ground truth data, 75 pages were digitised using a scanning tent. A community-developed AI model was initially used for the first phase of transcription, however, as it is a more ‘generic’ model it was not attuned to the conventions of Kaye and Johnston. It struggled with the following:

Pages 1208-1209 of the Printed Catalogue of European Manuscripts Volume II, Part II being edited in the Transkribus programme.
The transcriptions created from the community model, once corrected, provided the ground truth data that could be used to create a more precise custom transcription model. Transcription models aren’t always perfect on the first try and are improved by 'feeding’ it more ground truth data based on corrected pages to make it as accurate as possible. The first custom model had a ‘Character Error Rate’ (CER) of 3.77%, based on 150 pages of text. That percentage indicates that out of 100 characters, only 3.77 would be wrong. This was definitely not an accurate reflection of its abilities, but it was a promising start.
In total three custom models were developed, with the final model being trained on 99 ground truth manually corrected pages, comprised of 80,668 words, and with a CER of just 0.91%. Success! While some human quality control was required on all pages, human text correction time was significantly reduced by 3-10 minutes. Some other strange quirks did make themselves known in later models, such as upside-down text and the occasional complete gibberish being transcribed.
One of the clear conclusions from this project is that AI when used on a small and moderately complex project such as this still relies heavily on human effort. It can make the repetitive, time-consuming parts more manageable, but a person still needs to review and refine the output. In this case, that meant around 320 hours of human effort.
The final PDF of the transcription is now available on the Research Repository, along with an additional document outlining the project in more detail. Both are free to download and search.
Kaye and Johnston Vol. II, Part II, Section 1, Nos. 1–538, (1937) is accessible via Google Books (as of 2025)
Other published catalogues in this series include:

This blog is part of our Untold Lives series, sharing stories of people’s lives from our collections. Stories from around the world, from the dawn of history to the present day, are told through the written word, images, audio-visual and digital materials.
We hope to inspire new research and encourage enjoyment, knowledge and understanding of the British Library and its collections.

You can access millions of collection items for free. Including books, newspapers, maps, sound recordings, photographs, patents and stamps.