Opening Up Speech Archives


Opening up Speech Archives was a project (2012-2013) studying the use of speech-to-text and speech recognition technologies in research. It was funded by the Arts & Humanities Research Council (AHRC) as part of its Digital Transformations in Arts and Humanities theme.

About the project

Speech-to-text technologies have the potential to transform how academic research is conducted. Such technologies take a digital audio file of speech and convert it into word-searchable text. Accuracy varies, and the results are comparable to uncorrected OCR (optical character recognition) output for printed text.

Speech recognition services are becoming increasingly familiar to the general public as smartphone applications, but the technological challenge is far greater when it comes to tackling large-scale speech archives.

The British Library has over one million speech-based recordings, and is keen to offer equality of searching across all media, print and audiovisual, to provide the optimum service for researchers.

Opening up Speech Archives did not focus on building a technical solution for the British Library, but rather on surveying the field and engaging with researchers across a range of subject disciplines, to learn how speech-to-text technologies can best serve scholarly needs.

Outputs of the project

  • Opening up Speech Archives conference, held 8 February 2013, bringing together product developers, service providers, archivists, curators, librarians, technicians and researchers from various disciplines – see conference report
  • In-house demonstration services developed in partnership with three suppliers – HP Autonomy, GreenButton, and Nexidia
  • Workshops and interviews with researchers
  • Survey of speech-to-text services and projects
  • Searching Speech demonstrator, presenting 8,000 hours of audio and video from the British Library collection, indexed using the MAVIS system developed by Microsoft Research and implemented for the Library by GreenButton. It featured television and radio news from 2011 (Al-Jazeera English, CNN, NHK World and BBC Radio 4), historic radio programmes and oral history interviews.


The following sites are examples of the use of speech-to-text to enhance searching of resources:

  • Democracy Live – BBC News site collating video broadcasts of UK and European parliamentary proceedings
  • Oxford University Podcasts – archive of podcasts with automatic keywords generated using speech-to-text
  • ScienceCinema – research from the US Department of Energy and the European Organization for Nuclear Research (CERN)
  • Voxalead – multimedia news test site, searching across freely available web news sites from around the world, bringing together programme descriptions, subtitles and speech-to-text transcripts
  • World Service Radio Archive prototype – demonstration service using keywords generated by speech-to-text (available to registered users only)

Further information

Luke McKernan
The British Library
96 Euston Road
United Kingdom

Tel: +44 (0)20 7412 7442
Fax: +44 (0)20 7412 7441

