
Datasets derived from and about British Library collections suitable for use in research and projects that use digital tools or methods.
About the collection
The British Library is committed to making available datasets derived from and about the collections for use in research and projects that use digital tools or methods.
Digitised Collection Datasets
These datasets of images, associated OCR (optical character recognition, or automatically transcribed text) files and more are exceptionally diverse, encompassing digitised books, prints, drawings, maps, art works and photographs, as well as illuminated manuscripts and bookbindings. These collections are useful in a wide range of research, from developing new technologies such as pattern recognition applications for handwriting recognition to analysing how illustration techniques have changed over time.
Metadata
The Library’s vast catalogue and holdings information is as rich a data source for research as the individual items it describes. Bibliographic and other metadata about our collections, when viewed at scale, can provide great insight into shifting trends in publication, the emergence and fluctuation of social attitudes and taste, and even how research focuses have changed over time.
The sections below direct you to where you can find many of our datasets. Please contact digitalresearch@bl.uk with any questions. For standard metadata services please contact metadata@bl.uk.
What is available online?
British Library Collection Metadata: The Metadata Services team provides a range of services for researchers requiring bibliographic information for research at scale. This includes providing access in a variety of ways to the British National Bibliography (BNB), which has recorded the publishing activity of the United Kingdom and the Republic of Ireland since 1950 and as such is the single most comprehensive listing of UK book and serial titles.
BL Research Repository: Material in this repository includes a range of datasets produced by our staff or research associates. The British Library is an Independent Research Organisation and as such undertakes significant research, including computationally driven research, that is made discoverable here and meets the Open Access mandates of research funders.
Digitised Printed Books 18th-19th Century: 68,000+ digitised volumes (around 25 million pages) published between 1789 and 1914, covering a wide range of subject areas including philosophy, history, poetry and literature. The original JPEG 2000 images, PDFs, XML and OCR files in ALTO format are available for computational research via a range of different methods, including the IIIF API.
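As an illustration of working with the OCR files mentioned above, the following is a minimal sketch of pulling plain text out of an ALTO XML page. It assumes the usual ALTO layout in which each `TextLine` element contains `String` elements carrying the recognised word in a `CONTENT` attribute. The sample document, its namespace version and the function names are illustrative assumptions, not taken from the British Library files themselves.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one string per TextLine, joining the CONTENT of its String children."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(w for w in words if w))
    return lines


# Hypothetical, hand-written sample page (not real British Library data).
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="The"/><SP/><String CONTENT="British"/><SP/><String CONTENT="Library"/>
    </TextLine>
    <TextLine><String CONTENT="London"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

for line in alto_lines(SAMPLE):
    print(line)
```

Matching on local tag names rather than a fixed namespace is a deliberate choice here, since the ALTO schema version used by a given file may differ from the one in the sample.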
Early English Books Online (EEBO) 1473-1700: Thousands of texts from British Library collections have been digitised and included in the Early English Books Online Platform. 25,000 texts in EEBO have now been fully transcribed as part of the Text Creation Partnership Phase I. More information about this dataset and download links can be found on the Oxford EEBO-TCP site.
Image collections (Historical Illustrations/Photographs) on Flickr Commons: Over 1 million images are available on the British Library’s Flickr Commons, mainly extracted programmatically from the pages of the Digitised Printed Books 18th-19th Century collection referenced above, but also from other sources. The Flickr API can be used to download large sets of these images directly, along with metadata such as user-generated tag information.
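A rough sketch of how a paged Flickr API request for an account's public photos might be built is given below. It uses the Flickr REST endpoint and the `flickr.people.getPublicPhotos` method, asking for tags and the original-size image URL through the `extras` parameter. The API key and user ID are placeholders that you would need to supply yourself, and the helper name `public_photos_url` is hypothetical. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode, urlparse, parse_qs

FLICKR_REST = "https://api.flickr.com/services/rest/"


def public_photos_url(api_key, user_id, page=1, per_page=100):
    """Build a request URL for one page of an account's public photos."""
    params = {
        "method": "flickr.people.getPublicPhotos",
        "api_key": api_key,
        "user_id": user_id,
        "extras": "tags,url_o",   # also return tags and the original image URL
        "per_page": per_page,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,      # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


# Placeholder credentials: replace with your own key and the account's user ID.
url = public_photos_url("YOUR_API_KEY", "ACCOUNT_USER_ID", page=2)
print(url)
print(parse_qs(urlparse(url).query)["method"])
```

To collect a large set, the same URL would be requested for successive `page` values until the page count reported in the response is reached.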
Image collections (Historical Illustrations/Photographs) on Wikimedia Commons: British Library materials on Wikimedia Commons are uploaded under open licences; however, exact specifications may vary from file to file. Wikimedia provides detailed guidance on this. A variety of download tools are available; please note that all tools are maintained and run by volunteers and, as such, issues may take time to resolve. Using ‘Categories’ on Commons, such as Category:Collections_of_the_British_Library, allows you to view and download specific collections.
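Besides the volunteer-run tools, category contents can also be listed through the MediaWiki API that Wikimedia Commons exposes. The sketch below builds a `list=categorymembers` query for the category named above and extracts file titles from a response. The helper names are hypothetical and the sample response is hand-written to show the expected shape, not real data. No network request is made.

```python
import json
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def category_files_url(category, limit=50, cmcontinue=None):
    """Build a query URL listing the files placed directly in a Commons category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",      # files only; subcategories are not followed
        "cmlimit": limit,
        "format": "json",
    }
    if cmcontinue:
        # Continuation token returned by the previous page of results.
        params["cmcontinue"] = cmcontinue
    return COMMONS_API + "?" + urlencode(params)


def file_titles(response_text):
    """Extract the 'title' of every member from a categorymembers response."""
    data = json.loads(response_text)
    return [m["title"] for m in data.get("query", {}).get("categorymembers", [])]


# Hand-written sample in the shape of a categorymembers response (not real data).
SAMPLE_RESPONSE = json.dumps({
    "query": {
        "categorymembers": [
            {"pageid": 1, "ns": 6, "title": "File:Example one.jpg"},
            {"pageid": 2, "ns": 6, "title": "File:Example two.jpg"},
        ]
    }
})

print(category_files_url("Category:Collections_of_the_British_Library"))
print(file_titles(SAMPLE_RESPONSE))
```

Because `cmtype=file` returns only files placed directly in the category, a top-level category that mostly holds subcategories may return few results; in that case the subcategories would need to be queried in turn.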
News Data: Data generated from digitised newspaper holdings and catalogue records provide valuable perspectives on newspapers past and present held by the British Library. The datasets allow researchers to explore the newspaper collections by time period and place, or deep dive into individual titles by viewing OCR (Optical Character Recognition) text in XML format one year of publication at a time.
UK Theses: The E-Theses Online Service (EThOS) provides online access to UK doctoral theses dating as far back as the 19th century via a single web interface. Anyone may access the metadata relating to each thesis free of charge, without prior permission, for not-for-profit purposes.
UK Web Archive data: Web Archiving collects, preserves and makes available web resources from the UK domain, and a number of historical datasets and derived data are available for research purposes.
What is available in our Reading Rooms?
Should you wish to view the original materials which these records describe, visit any one of the British Library’s 11 Reading Rooms at St Pancras where the Reference team will be happy to help you.