Google's new speech translator preserves the speaker's voice and tone

Google has introduced Translatotron, an experimental neural network that can translate speech directly into another language without an intermediate text representation, while preserving the speaker's voice characteristics and speech rate, according to the company's blog. The system, which uses long short-term memory (LSTM), takes voice input, processes it as a spectrogram, and on that basis generates a new spectrogram in the target language. Under certain conditions this can increase not only the speed of translation but also its accuracy. A fuller description of the work is given in a paper published in an online repository of scientific articles.

“Translatotron is the first end-to-end model that can directly translate speech from one language into speech in another language, retaining the characteristics of the source speaker's voice,” the company said in its official blog.

Google noted that most modern speech translation systems are built as a cascade, in which the task is divided into several simpler subtasks. First, automatic speech recognition converts the speech into text. Then machine translation converts that text from one language to another. Finally, the translated text is converted back into speech, in a voice that is almost always different from the original speaker's.
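The three cascade stages described above can be sketched in a few lines of Python. This is only a toy illustration: the `Audio` class, the function names, and the two-word lexicon are hypothetical stand-ins, not Google's components. It shows where the speaker's voice is lost, namely at the point where speech is reduced to plain text.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Toy stand-in for a speech recording (hypothetical, for illustration)."""
    words: tuple  # the spoken content
    voice: str    # the speaker's vocal characteristics


# Hypothetical two-word Spanish-to-English lexicon.
TOY_LEXICON = {"hola": "hello", "mundo": "world"}


def recognize_speech(audio):
    """Stage 1: speech -> source-language text. The voice is dropped here."""
    return " ".join(audio.words)


def translate_text(text):
    """Stage 2: source-language text -> target-language text."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def synthesize_speech(text, voice="synthetic"):
    """Stage 3: target-language text -> speech in a default voice."""
    return Audio(words=tuple(text.split()), voice=voice)


def cascade_translate(audio):
    """Run the three stages in sequence; only text passes between them."""
    text = recognize_speech(audio)
    translated = translate_text(text)
    return synthesize_speech(translated)


source = Audio(words=("hola", "mundo"), voice="speaker-A")
result = cascade_translate(source)
print(result)
```

Because only a string crosses the stage boundaries, nothing about the original speaker can reach the synthesis stage unless it is passed along separately.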

The cascade approach has proved efficient and practical, and it is used in most translation systems, including Google's. However, Google's AI researchers believe the approach is not perfect. Errors can occur at every stage, which lowers the overall quality of the final result. Google believes that an end-to-end translation model may outperform the cascade by removing the intermediate stage in which speech is first converted into text.
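A small Python calculation can illustrate why errors at each stage lower the overall quality. It assumes that each stage succeeds independently with a fixed probability; the stage names and accuracy figures are invented for illustration and are not Google's measurements.

```python
import math

# Hypothetical per-stage success rates, chosen only for illustration.
stage_accuracy = {
    "speech_recognition": 0.95,
    "text_translation": 0.90,
    "speech_synthesis": 0.98,
}

# Under the independence assumption, the cascade is correct only when
# every stage is correct, so the probabilities multiply.
cascade_accuracy = math.prod(stage_accuracy.values())

weakest_stage = min(stage_accuracy.values())
print(f"cascade: {cascade_accuracy:.3f}, weakest stage: {weakest_stage:.2f}")
```

Under these assumptions the combined figure can never exceed that of the weakest stage. This is only the intuition behind removing a stage; it says nothing about whether a particular end-to-end model is in fact more accurate.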

As Google explains, the cascade approach does not resemble how people who know several languages mentally translate speech from one language to another. How that works is hard to describe, but translators are unlikely to agree that they first transcribe the speech in their head, then visualize the text, translate it into the other language, and then simply read it out.

Spectrograms of the source speech and the translated speech. The translation quality is admittedly not the best, but it sounds more natural.

Imitating cognitive abilities is one of the principles of machine learning. Translatotron's developers decided to use spectrograms of the source speech as the input for translation (a spectrogram is an image showing how the power spectral density of a signal changes over time) and to generate new spectrograms for the target language from them. This approach differs from the cascade method. The researchers note that, like any other approach, the new system has its advantages and disadvantages.
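To make the notion of a spectrogram concrete, here is a minimal sketch in pure Python that computes a plain power spectrogram of a synthetic tone. The frame length, hop size, sample rate, and Hann window are assumptions chosen for illustration, and the result is not necessarily the feature representation Translatotron uses.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |DFT|^2 for the non-negative frequency bins of one frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectrogram(signal, frame_len=64, hop=32):
    """Split the signal into overlapping Hann-windowed frames and
    return one power spectrum per frame (rows = time, columns = frequency)."""
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
        for i in range(frame_len)
    ]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(power_spectrum(frame))
    return frames


# One second of a 100 Hz sine tone sampled at 800 Hz (illustrative values).
SAMPLE_RATE = 800
TONE_HZ = 100
signal = [
    math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE)
    for t in range(SAMPLE_RATE)
]

spec = spectrogram(signal)
# Find the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / 64
print(len(spec), "frames; peak at", peak_hz, "Hz")
```

Each row of `spec` describes how signal power is distributed over frequency during one short slice of time. Stacking the rows gives the image-like representation that a model of this kind would take as input and produce as output.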

One advantage of the end-to-end method is that, despite its complexity, it is a single-stage process rather than a multi-stage one. Thus, given sufficient processing power, Translatotron can perform translation faster. Even more important, the system retains the character of the original speech in the translation, including the speaker's voice and speech rate, instead of reproducing the translation in a neutral synthetic voice.

Those who understand linguistics, as well as those who work on speech synthesis, will surely agree that in translation it matters not only what a person says but also how they say it. Changing the expression of the original speech in the translation can radically change the meaning of what was said. Google has published examples of Translatotron's output. When listening to them, pay attention less to the quality of the translation than to the intonation.

Translatotron's developers acknowledge that the system's translation accuracy does not yet surpass that of traditional cascade systems, but like any machine learning model it can improve over time. Given the advantage of preserving the speaker's original voice in the translated speech, further research in this area may prove useful for Google's future AI-based translation systems.