The News Data collection comprises a range of downloadable, shareable and reusable datasets derived from metadata created from the British Library’s news collections.
About the collection
Data generated from digitised newspaper holdings and catalogue records offers valuable perspectives on the past and present newspapers held by the British Library. The datasets allow researchers to explore the newspaper collections by time period and place. Researchers can also examine individual titles in depth through OCR (Optical Character Recognition) text in XML format, one year of publication at a time.
In the future we will add data from other news collections, including radio, television and web news.
What is available online?
News data outputs are freely available via the British Library Research Repository.
Full-text records of newspaper titles digitised from the British Library collection. Each file contains one newspaper's output for a single year, with OCR text in XML format.
Full-text records of historic press directories (which list newspapers and other journals) digitised from the British Library collection. Each file contains the run of a press directory over selected years, with OCR text in XML format.
Title-level listings of news collections held by the British Library, based on data extracted from the British Library catalogue, with some cleaning and enhancement applied.
We have produced visualisations and guides for exploring the data, including:
Interactive newspaper catalogue (British Library Newspapers: Explore the collection)
German language newspapers in North America
UK & Ireland newspaper title trends
Longest running UK newspapers still published today
Non-English language newspapers in the UK