Welcome to BibleTTS
BibleTTS is a large, high-quality, open text-to-speech dataset with up to 86 hours of aligned, single-speaker, studio-quality 48 kHz recordings per language. We release aligned speech and text for six languages spoken in Sub-Saharan Africa, with unaligned data available for four additional languages. All data is derived from the Biblica open.bible project and is released under a commercial-friendly CC-BY-SA license.
Corpus Statistics
The BibleTTS corpus consists of high-quality audio released as 48 kHz, 24-bit, mono-channel FLAC files. The recordings for each language come from a single speaker recorded under professional-quality, close-microphone conditions (i.e., without background noise or echo). Few public speech corpora offer this much data per speaker at an audio quality suitable for building TTS models. Furthermore, the corpus covers languages that are under-represented in today’s voice technology landscape, both in academia and in industry.
Our aligned data is publicly available on OpenSLR.
Language | Unaligned Hours | Aligned Hours | Aligned Verses | Sample |
---|---|---|---|---|
Ewe | 100.1 | 86.8 | 24,957 | listen |
Hausa | 103.2 | 86.6 | 40,603 | listen |
Kikuyu | 90.6 | -- | -- | -- |
Lingala | 151.7 | 71.6 | 15,117 | listen |
Luganda | 110.4 | -- | -- | -- |
Luo | 80.4 | -- | -- | -- |
Chichewa | 115.9 | -- | -- | -- |
Akuapem Twi | 75.7 | 67.1 | 28,238 | listen |
Asante Twi | 82.6 | 74.9 | 29,021 | listen |
Yoruba | 93.6 | 33.3 | 10,228 | listen |
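The figures in the table above can be combined into two rough per-language indicators: the average length of an aligned verse and the share of raw audio that remains after alignment. The following is a minimal sketch using only the numbers from the table; the function names are illustrative, and the "retained" ratio assumes the aligned audio is a subset of the unaligned recordings.

```python
# Figures copied from the corpus statistics table:
# language -> (unaligned hours, aligned hours, aligned verses)
ALIGNED_STATS = {
    "Ewe": (100.1, 86.8, 24957),
    "Hausa": (103.2, 86.6, 40603),
    "Lingala": (151.7, 71.6, 15117),
    "Akuapem Twi": (75.7, 67.1, 28238),
    "Asante Twi": (82.6, 74.9, 29021),
    "Yoruba": (93.6, 33.3, 10228),
}


def mean_verse_seconds(aligned_hours, aligned_verses):
    """Average duration of one aligned verse, in seconds."""
    return aligned_hours * 3600.0 / aligned_verses


def aligned_fraction(unaligned_hours, aligned_hours):
    """Share of the raw audio that is also available as aligned data.

    Assumption: the aligned audio is a subset of the unaligned audio.
    """
    return aligned_hours / unaligned_hours


if __name__ == "__main__":
    for language, (raw_h, aligned_h, verses) in ALIGNED_STATS.items():
        print(
            f"{language:12s} "
            f"{mean_verse_seconds(aligned_h, verses):6.2f} s/verse "
            f"{aligned_fraction(raw_h, aligned_h):6.1%} retained"
        )
```

For example, Ewe works out to roughly 12.5 seconds per aligned verse and Hausa to roughly 7.7 seconds.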
Demo
All trained models are integrated into Coqui TTS and can be tried out on Hugging Face Spaces:
https://huggingface.co/spaces/coqui/CoquiTTS
TTS model links and samples
All models are end-to-end VITS speech synthesis models trained as described in the paper.
TTS samples coming soon!
Language | Model checkpoint | Config file | In-domain sample | Out-of-domain sample |
---|---|---|---|---|
Ewe | link | link | listen | listen |
Hausa | link | link | 1, 2, 3 | listen |
Kikuyu | -- | -- | -- | -- |
Lingala | link | link | listen | listen |
Luganda | -- | -- | -- | -- |
Luo | -- | -- | -- | -- |
Chichewa | -- | -- | -- | -- |
Akuapem Twi | link | link | listen | -- |
Asante Twi | link | link | listen | listen |
Yoruba | link | link | listen | listen |
Links to code
Alignment methodology
- Segmentation using existing verse timestamps (Sec 4.1.1)
- Forced alignment using pre-trained acoustic models (Sec 4.1.2)
- Forced alignment from scratch (Sec 4.1.3)
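To give a feel for the first approach in the list above, here is a minimal sketch of cutting a chapter-length recording into verses from known verse start times. It is not the project's alignment code: the function names, the `(verse_id, start_seconds)` input layout, and the verse ids used in the example are all hypothetical. The only figure taken from this page is the 48 kHz sample rate.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE = 48_000  # BibleTTS audio is released at 48 kHz


def verse_spans(
    timestamps: Sequence[Tuple[str, float]],
    chapter_seconds: float,
    sample_rate: int = SAMPLE_RATE,
) -> List[Tuple[str, int, int]]:
    """Turn (verse_id, start_seconds) pairs into (verse_id, first, end) sample spans.

    Each verse runs from its own start time to the start of the next verse;
    the final verse runs to the end of the chapter recording. `end` is exclusive.
    """
    ordered = sorted(timestamps, key=lambda item: item[1])
    spans = []
    for i, (verse_id, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else chapter_seconds
        if not 0.0 <= start < end <= chapter_seconds:
            raise ValueError(f"bad timestamp for {verse_id}: {start}..{end}")
        spans.append((verse_id, round(start * sample_rate), round(end * sample_rate)))
    return spans


def cut(samples: Sequence[int], span: Tuple[str, int, int]) -> Sequence[int]:
    """Slice one verse out of a chapter-length sequence of samples."""
    _, first, end = span
    return samples[first:end]


if __name__ == "__main__":
    # Hypothetical verse ids and start times for a 6-second "chapter".
    stamps = [("GEN_001_001", 0.0), ("GEN_001_002", 1.5), ("GEN_001_003", 4.25)]
    for span in verse_spans(stamps, chapter_seconds=6.0):
        print(span)
```

Working in integer sample indices rather than seconds keeps adjacent verses contiguous, so no audio is dropped or duplicated at the verse boundaries.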
Outlier detection
Data-checker code for outlier detection (Sec 4.2)
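As a rough illustration of what outlier detection on aligned utterances can look like, the sketch below flags verses whose speaking rate (characters per second) lies far from the median. This is an illustrative heuristic only, not the data-checker's actual method; the function name, the input layout, and the threshold of 3.5 on the modified z-score are assumptions made for the example.

```python
from statistics import median
from typing import Dict, List, Tuple


def speaking_rate_outliers(
    utterances: Dict[str, Tuple[str, float]],
    threshold: float = 3.5,
) -> List[str]:
    """Return ids whose characters-per-second rate is far from the median.

    `utterances` maps an id to (transcript, duration_seconds). Distance is the
    modified z-score, 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. Utterances with a non-positive duration are always
    flagged.
    """
    flagged = [uid for uid, (_, secs) in utterances.items() if secs <= 0]
    rates = {
        uid: len(text) / secs
        for uid, (text, secs) in utterances.items()
        if secs > 0
    }
    if not rates:
        return sorted(flagged)
    centre = median(rates.values())
    mad = median(abs(rate - centre) for rate in rates.values())
    if mad == 0:
        # At least half of the rates equal the median; the score is undefined.
        return sorted(flagged)
    flagged.extend(
        uid
        for uid, rate in rates.items()
        if 0.6745 * abs(rate - centre) / mad > threshold
    )
    return sorted(flagged)
```

A verse whose audio is much shorter or longer than its text suggests is a typical symptom of a misplaced alignment boundary, which is why a rate-based check is a common first filter.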
TTS model training
VITS TTS models were trained with Coqui TTS from coqui-ai (Sec 5)