Meta has announced the development of Voicebox, an artificial intelligence tool capable of generating human speech. The model needs to be fed a few sentences recorded by the user. Voicebox can then create new audio from written text. Mark Zuckerberg himself appeared in a clip released on Instagram speaking good Portuguese, complete with a distinctly carioca “s” in the word for “everyone”. All of it was AI-generated.
According to the digital conglomerate, a 2-second audio sample is enough for the system to produce new speech. The idea is to use text-to-speech to avoid the hassle of re-recording audio material.
According to the company, the technology would allow visually impaired people to hear messages from friends, and would give a voice to non-player characters in games, the famous NPCs. Voicebox could also provide natural-sounding voices for voice assistants.
Easy content editing
Another important point concerns content editing. In the example, Zuckerberg is recording audio when a horn is heard. The tool, however, manages to “clean” the material. Professional and amateur software with a similar function already exists, so it remains to be seen how the feature would reach Meta’s applications.
Incidentally, the company has not officially announced any plans to bring Voicebox to Instagram, WhatsApp or Facebook. For now, it seems that Zuckerberg just wants to demonstrate the advances the company is making in the field of generative AI. This is its main focus at the moment, along with the (long-term) development of metaverse technologies.
Competition is also at work
Meta is not alone in the research and development of generative AI for voice. Friday’s announcement reminded me of Vall-E, a system introduced by Microsoft in January that takes a short audio sample of a person speaking and uses it to generate new speech.