NoGoolag

Mozilla publishes the largest public transcribed voice dataset.

Mozilla has released the largest dataset of human voices built entirely through crowdsourcing. The dataset covers 18 languages and totals nearly 1,400 hours of recorded voice data from more than 42,000 contributors.

From the outset, our vision for Common Voice has been to create the world's most diverse voice dataset, optimized specifically for the development of speech technology. We have also promised to make the dataset freely accessible so that start-ups, researchers and anyone else interested in speech technologies can use the high-quality transcribed speech data we have collected.

Today, we are pleased to present our first multilingual dataset. It covers 18 languages, including English, French, German and Mandarin Chinese (Traditional), as well as languages such as Welsh and Kabyle. The new dataset contains approximately 1,400 hours of voice recordings from more than 42,000 people.

With this release, the Common Voice dataset is now the largest of its kind, thanks to the support of tens of thousands of people who have contributed their voices and written sentences to the public domain (CC0). The complete dataset is now available for download on the Common Voice page.

Web: https://voice.mozilla.org/en/datasets

📡 @NoGoolag
#mozilla #dataset #voice #crowdsourcing #multilingual #speech
