Welcome to BibleTTS

Paper - Data - GitHub

BibleTTS is a large, high-quality, open text-to-speech dataset with up to 86 hours of aligned, single-speaker, studio-quality 48 kHz recordings per language. We release aligned speech and text for six languages spoken in Sub-Saharan Africa, with unaligned data available for four additional languages. The data is derived from the Biblica open.bible project and released under a commercial-friendly CC-BY-SA license.

Corpus Statistics

The BibleTTS corpus consists of high-quality audio released as 48 kHz, 24-bit, mono-channel FLAC files. Each language is covered by a single speaker recorded under professional-quality, close-microphone conditions (i.e., without background noise or echo). Few public speech corpora offer both this volume of data per speaker and audio quality high enough for building TTS models. Furthermore, the corpus covers languages that are under-represented in today’s voice technology landscape, both in academia and in industry.
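As a quick sanity check on a downloaded file, the stated format (48 kHz, 24-bit, mono) can be read from the FLAC STREAMINFO header with the Python standard library alone. The sketch below is illustrative and not part of the BibleTTS release: the function names are our own, and it assumes the file begins directly with the `fLaC` marker (no ID3 tag prepended).

```python
def read_flac_streaminfo(data: bytes) -> dict:
    """Parse the STREAMINFO block at the start of a FLAC byte stream."""
    if len(data) < 42 or data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (or fewer than 42 bytes given)")
    if data[4] & 0x7F != 0:
        raise ValueError("first metadata block is not STREAMINFO")
    # Bytes 8..17 hold the block and frame size limits. The next 8 bytes pack:
    # sample rate (20 bits), channels - 1 (3 bits), bits per sample - 1 (5 bits),
    # total samples (36 bits).
    packed = int.from_bytes(data[18:26], "big")
    sample_rate = packed >> 44
    total_samples = packed & ((1 << 36) - 1)
    return {
        "sample_rate": sample_rate,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "duration_seconds": total_samples / sample_rate if sample_rate else 0.0,
    }


def matches_bibletts_format(info: dict) -> bool:
    """True if the header reports 48 kHz, 24-bit, mono audio."""
    return (
        info["sample_rate"] == 48000
        and info["bits_per_sample"] == 24
        and info["channels"] == 1
    )


def check_file(path: str) -> bool:
    """Read only the first 42 bytes (marker + block header + STREAMINFO)."""
    with open(path, "rb") as f:
        return matches_bibletts_format(read_flac_streaminfo(f.read(42)))
```

Calling `check_file` on a local FLAC path of your choosing returns `True` when the header matches the format described above. Libraries such as `soundfile` expose the same fields if you prefer not to parse the header by hand.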

Our aligned data is publicly available on OpenSLR.

Language      Unaligned Hours   Aligned Hours   Aligned Verses   Sample
Ewe           100.1             86.8            24,957           listen
Hausa         103.2             86.6            40,603           listen
Kikuyu        90.6              --              --               --
Lingala       151.7             71.6            15,117           listen
Luganda       110.4             --              --               --
Luo           80.4              --              --               --
Chichewa      115.9             --              --               --
Akuapem Twi   75.7              67.1            28,238           listen
Asante Twi    82.6              74.9            29,021           listen
Yoruba        93.6              33.3            10,228           listen
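Two derived figures help when reading the table: the fraction of the unaligned audio that survives alignment, and the mean duration of an aligned verse. The short sketch below computes both from the numbers above. It assumes the aligned hours cover exactly the listed aligned verses; the variable and function names are ours.

```python
# (unaligned hours, aligned hours, aligned verses) copied from the table;
# None marks languages with no aligned release.
CORPUS = {
    "Ewe": (100.1, 86.8, 24957),
    "Hausa": (103.2, 86.6, 40603),
    "Kikuyu": (90.6, None, None),
    "Lingala": (151.7, 71.6, 15117),
    "Luganda": (110.4, None, None),
    "Luo": (80.4, None, None),
    "Chichewa": (115.9, None, None),
    "Akuapem Twi": (75.7, 67.1, 28238),
    "Asante Twi": (82.6, 74.9, 29021),
    "Yoruba": (93.6, 33.3, 10228),
}


def summarize(corpus):
    """Return per-language retained fraction and mean verse length in seconds."""
    rows = {}
    for lang, (unaligned_h, aligned_h, verses) in corpus.items():
        if aligned_h is None:
            continue  # unaligned-only language
        rows[lang] = {
            "retained_fraction": aligned_h / unaligned_h,
            "mean_verse_seconds": aligned_h * 3600 / verses,
        }
    return rows


if __name__ == "__main__":
    for lang, row in summarize(CORPUS).items():
        print(f"{lang:12s} retained {row['retained_fraction']:.0%}, "
              f"mean verse {row['mean_verse_seconds']:.1f} s")
```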


All trained models are integrated into Coqui TTS and can be demoed on Hugging Face Spaces.


All models are end-to-end VITS speech synthesis models trained as described in the paper.
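For local synthesis, a minimal sketch using the Coqui TTS Python API (`pip install TTS`) might look like the following. The model identifiers are an assumption based on Coqui's catalogue naming and should be confirmed with `tts --list_models` before use; the `COQUI_MODELS` mapping and the `synthesize` helper are our own illustrative names.

```python
# Assumed Coqui model identifiers for the six aligned languages.
# Verify against the output of `tts --list_models`.
COQUI_MODELS = {
    "Ewe": "tts_models/ewe/openbible/vits",
    "Hausa": "tts_models/hau/openbible/vits",
    "Lingala": "tts_models/lin/openbible/vits",
    "Akuapem Twi": "tts_models/tw_akuapem/openbible/vits",
    "Asante Twi": "tts_models/tw_asante/openbible/vits",
    "Yoruba": "tts_models/yor/openbible/vits",
}


def synthesize(language: str, text: str, out_path: str) -> str:
    """Synthesize `text` with the VITS model for `language` into a WAV file."""
    # Imported lazily so the mapping above can be used without Coqui installed.
    from TTS.api import TTS

    tts = TTS(model_name=COQUI_MODELS[language])
    tts.tts_to_file(text=text, file_path=out_path)
    return out_path
```

The first call for a given language downloads the checkpoint and config, so it needs network access; later calls reuse the local cache.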

TTS samples coming soon!

Language      Model checkpoint   Config file   In-domain sample   Out-of-domain sample
Ewe           link               link          listen             listen
Hausa         link               link          1, 2, 3            listen
Kikuyu        --                 --            --                 --
Lingala       link               link          listen             listen
Luganda       --                 --            --                 --
Luo           --                 --            --                 --
Chichewa      --                 --            --                 --
Akuapem Twi   link               link          listen             --
Asante Twi    link               link          listen             listen
Yoruba        link               link          listen             listen

Alignment methodology

  1. Segmentation using existing verse timestamps (Sec 4.1.1)
  2. Forced alignment using pre-trained acoustic models (Sec 4.1.2)
  3. Forced alignment from scratch (Sec 4.1.3)
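To illustrate the first approach, the sketch below turns a list of verse start times for one chapter into verse-level segments and 48 kHz sample ranges. It is only a schematic of timestamp-based segmentation, not the authors' code: the input format (verse ID plus start time in seconds) and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    verse_id: str
    start: float  # seconds from the start of the chapter recording
    end: float


def segments_from_timestamps(verse_starts, chapter_duration):
    """Cut a chapter into verses; each verse ends where the next one starts.

    verse_starts: list of (verse_id, start_seconds), sorted by start time.
    chapter_duration: length of the chapter recording in seconds.
    """
    segments = []
    for i, (verse_id, start) in enumerate(verse_starts):
        is_last = i + 1 == len(verse_starts)
        end = chapter_duration if is_last else verse_starts[i + 1][1]
        if not 0 <= start < end <= chapter_duration:
            raise ValueError(f"inconsistent timestamps at {verse_id}")
        segments.append(Segment(verse_id, start, end))
    return segments


def to_sample_range(segment, sample_rate=48000):
    """Convert a segment's times to (first_sample, end_sample) indices."""
    return round(segment.start * sample_rate), round(segment.end * sample_rate)
```

Each sample range can then be used to slice the decoded chapter audio into one file per verse.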

Outlier detection

Data-checker code for outlier detection (Sec 4.2)
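The general idea behind this kind of check can be sketched as follows: flag utterances whose duration is implausible for the length of their transcript. This is not the data-checker itself. It swaps in a simple robust z-score on seconds per character, and the threshold and names are assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(utterances, threshold=3.5):
    """Return IDs of utterances with an unusual seconds-per-character rate.

    utterances: list of (utt_id, text, duration_seconds).
    threshold: cutoff on the robust z-score (median/MAD based).
    """
    rates = [dur / max(len(text), 1) for _, text, dur in utterances]
    med = median(rates)
    mad = median(abs(r - med) for r in rates)
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged
    flagged = []
    for (utt_id, _, _), rate in zip(utterances, rates):
        # 0.6745 rescales the MAD so the score is comparable to a z-score
        robust_z = 0.6745 * (rate - med) / mad
        if abs(robust_z) > threshold:
            flagged.append(utt_id)
    return flagged
```

A segment that is much too long or too short for its text usually points to a misaligned boundary or a transcript mismatch, so flagged items are candidates for manual review or removal.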

TTS model training

VITS TTS models were trained with the coqui-ai TTS toolkit (Sec 5)