Microsoft's Speech Generator Has Become So Good That It's Not Being Released!

07/14/2024

TECH NEWS – VALL-E 2 remains a research project because Microsoft says it could pose a significant risk of malicious use.

The Redmond-based tech giant said in a blog post that its latest neural codec language model for speech synthesis “achieves human parity for the first time,” meaning the model has become so sophisticated that its generated speech is nearly impossible to distinguish from that of a real person, and it needs only a very limited sample and prompt to do so. Given just a few seconds of speech, VALL-E 2 draws on a large training library to match the pronunciation, intonation, and vocal variation of the sample, producing synthesized speech that sounds thoroughly convincing.

In the blog post, Microsoft presents several examples of how the zero-shot text-to-speech (TTS) process can produce remarkably high-quality speech from 3-10 seconds of material. The ethics statement in the post also deserves attention. In it, Microsoft states that it has no plans to release VALL-E 2 to the public: “VALL-E 2 is a research project only. At this time, we have no plans to incorporate VALL-E 2 into a product or to release it to the public. There may be potential risks in misusing the model, such as spoofing voice identification or impersonating a particular speaker. We conducted the experiments under the assumption that the user agrees to be the target speaker in speech synthesis. If the model is generalized to unseen speakers in the real world, it should include a protocol to ensure that the speaker consents to the use of his or her voice and a synthesized speech recognition model.”

Microsoft previously made a similar decision regarding VASA-1, a technology that can take a still image and create a video in which the person in the image moves convincingly. What we don’t understand is what the company intends to do with this technology. If Microsoft has created it, it will presumably use it for something, but if the public can’t use it, who will?

Source: PCGamer, Microsoft

Angyal Anikó

Anikó, our news editor and communication manager, is more interested in the business side of the gaming industry. She has worked at banks and has extensive knowledge of the business world. Still, she likes puzzle and story-oriented games, like Sherlock Holmes: Crimes & Punishments, which is her favourite title. She also played The Sims 3, but after accidentally killing a whole Sim family, swore not to play it again.