Datasets for content mining


Books from the early modern period and the 19th century are among the thousands of digitised items available for content mining.

About the collection

A variety of digitised content is available for content mining, including some 68,000 digitised volumes of 19th-century books covering literature, history and philosophy.

For guidance on the UK copyright exception for text and data mining, see the Intellectual Property Office's "Exceptions to copyright: Research" and the guidance from the Chartered Institute of Library and Information Professionals (CILIP).

If you have a research idea that relies on content mining, please get in touch to discuss it.

What is available online?

As part of its work to open its data to wider use, the British Library is making copies of some of its collection-related datasets available for research and creative purposes. We aim to describe each collection in terms of its data format (images, full text, metadata, etc.), licences, temporal and geographic scope, originating purpose (e.g. specific digitisation projects or exhibitions) and collection, and related subjects or themes. This site is a beta and is in the early stages of development.

19th Century Printed Books: more than 60,000 digitised volumes (around 25 million pages) published between 1789 and 1900, covering a wide range of subject areas including philosophy, history, poetry and literature. The original JPEG 2000 (JP2) images, PDFs, XML and OCR files are available for download. The full list of titles included in this collection is available to download as a 20 MB XLS spreadsheet.

Early English Books Online (EEBO) 1473–1700: includes thousands of digitised texts from the British Library's collections. Some 25,000 texts in EEBO have now been fully transcribed as part of Phase I of the Text Creation Partnership (TCP).

What is available in our Reading Rooms?

Should you wish to view the original materials from which these datasets are derived, visit any one of the British Library’s 11 Reading Rooms at St Pancras, where the Reference Enquiry Team will be happy to help you.