Friday, September 6, 2019

Real Time Voice Cloning

Many people are concerned about deepfake technologies like FaceApp. These technologies are fascinating, but their implications are also terrifying. Real Time Voice Cloning, started by Corentin Jemine on GitHub, is one of these faking technologies, but instead of swapping faces, this app clones voices.

Jemine, also known as CorentinJ on GitHub, has developed a Voice Cloning Toolbox that can take a 5-second clip of a person speaking and simulate their voice. The user simply types in a phrase, and the toolbox generates audio that sounds like the actual person speaking what was typed. It is an implementation of the paper hosted at https://arxiv.org/pdf/1806.04558.pdf. It converts the 5-second audio clip into a numerical representation of the voice, which is then used to condition a text-to-speech model so that it generates new speech in that voice. This is all part of a three-stage deep learning framework called SV2TTS, the three stages being the encoder, the synthesizer, and the vocoder.
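To make the three stages a little more concrete, here is a toy sketch in Python of how data might flow through an encoder, a synthesizer, and a vocoder. It is not the code from the repo: the function names, the sizes, and the arithmetic inside each stage are all hypothetical stand-ins for what are really neural networks. The only thing it tries to show is the shape of the pipeline, where a short clip becomes a fixed-length vector, that vector plus typed text becomes spectrogram-like frames, and those frames become audio samples.

```python
import math
from typing import List

# All sizes below are made up for illustration; they are not the real model's.
EMBED_DIM = 8   # length of the toy speaker embedding
N_MELS = 4      # values per spectrogram-like frame
HOP = 16        # audio samples produced per frame


def encode_speaker(reference_audio: List[float]) -> List[float]:
    """Stage 1 (encoder): squeeze a reference clip into a fixed-length, unit-norm vector."""
    chunk = max(1, len(reference_audio) // EMBED_DIM)
    # Stand-in for a neural network: average loudness of each chunk of the clip.
    raw = [
        sum(abs(s) for s in reference_audio[i * chunk:(i + 1) * chunk]) / chunk
        for i in range(EMBED_DIM)
    ]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    return [v / norm for v in raw]


def synthesize(text: str, embedding: List[float]) -> List[List[float]]:
    """Stage 2 (synthesizer): typed text plus the speaker vector -> one frame per character."""
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0  # toy "content" value derived from the character
        # The speaker vector shifts every frame, so the same text differs per voice.
        frames.append([base + embedding[m % EMBED_DIM] for m in range(N_MELS)])
    return frames


def vocode(frames: List[List[float]]) -> List[float]:
    """Stage 3 (vocoder): turn frames into audio samples (here, scaled sine bursts)."""
    samples = []
    for frame in frames:
        level = sum(frame) / len(frame)
        for n in range(HOP):
            samples.append(level * math.sin(2 * math.pi * n / HOP))
    return samples


def clone_voice(reference_audio: List[float], text: str) -> List[float]:
    """Run the three stages in order: encoder, then synthesizer, then vocoder."""
    embedding = encode_speaker(reference_audio)
    return vocode(synthesize(text, embedding))


# Example: a fake reference clip (a sine wave) and a typed phrase.
reference_clip = [math.sin(0.01 * i) for i in range(1000)]
audio = clone_voice(reference_clip, "hello")
print(len(audio), "samples generated")
```

Notice that nothing is trained when the voice is cloned. In this sketch, as in the framework described above, the reference clip only produces a vector that steers the synthesizer.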

A video showing the exact process to record and synthesize the voice can be found below:



The software is written mostly in Python. GitHub's language breakdown attributes the remaining 1.2% to Jupyter Notebook, an open source web application that lets users create and share documents containing live code.

I was interested in this repo because the thought of someone copying my voice so accurately fascinates and terrifies me far more than the thought of someone taking my face. A voice is so personal, whereas people look like other people all the time. If this software develops further, along with things like FaceApp, I think we could see a change in the way we represent ourselves online. I also think this is a great example of collaboration on GitHub, and a good first look at an open source project.

This software is located on GitHub here.
