Last year was widely seen as a groundbreaking year for AI technology. Google moved closer to its ambitions, and may even have achieved them, with the development of “Tacotron 2”, an artificial intelligence (AI) system that converts text to speech.
The development came shortly after Google’s Indian-origin CEO, Sundar Pichai, said that the tech giant was gradually shifting its focus from “mobile-first” to “AI first”. He made the statement at the Google I/O 2017 developers conference, where he also launched new Google products and features such as Smart Reply for Gmail, Google Lens and Google Assistant for iPhone.
According to the paper published on arXiv.org, Tacotron 2 works in two stages. It first generates a spectrogram from the text, a visual representation of how the speech should sound. That representation is then fed into Google’s existing WaveNet algorithm, which turns it into audio and brings AI closer than ever to mimicking human speech indiscernibly. The system is designed to master different voices and even generate artificial breaths.
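For readers unfamiliar with the term, the short Python sketch below shows what a spectrogram is by computing one from a test tone using a plain short-time Fourier transform. It runs in the opposite direction to Tacotron 2, which predicts a spectrogram from text using a neural network, and it is not Google's code; the function names, frame size, hop size and sample rate are all assumptions chosen for illustration.

```python
import cmath
import math


def sine(freq_hz, sample_rate, n_samples):
    """Generate a pure test tone (illustrative input, not real speech)."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window tapers the frame edges to reduce spectral leakage.
        windowed = [
            s * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        magnitudes = []
        # Direct DFT over the non-negative frequency bins only.
        for k in range(frame_size // 2 + 1):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


SAMPLE_RATE = 8000  # assumed sample rate in Hz
FRAME_SIZE = 64     # assumed samples per frame

tone = sine(1000, SAMPLE_RATE, 256)
spec = spectrogram(tone, frame_size=FRAME_SIZE, hop=32)

# The strongest bin in the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(len(spec), "frames,", len(spec[0]), "bins, peak near", peak_hz, "Hz")
```

Each row of `spec` is one slice of time and each column one frequency band, which is why a spectrogram can be drawn as an image. A system like the one described would produce such a grid from text, and a separate model would then turn it into sound.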
The developers of the AI system were quoted as saying, “Our model achieves a mean opinion score (MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech.”
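A mean opinion score is simply the average of ratings that human listeners give to audio samples, conventionally on a scale from 1 (bad) to 5 (excellent). The sketch below shows the arithmetic; the ratings are made up for illustration and are not data from Google's study, and the function name is hypothetical.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, approximate 95% confidence half-width) for 1-5 ratings."""
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be between 1 and 5")
    mos = statistics.mean(ratings)
    # Normal-approximation interval; a rough guide only for small samples.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical listener ratings, invented for this example.
example_ratings = [5, 4, 5, 4, 5, 5, 4, 5]
mos, half_width = mean_opinion_score(example_ratings)
print(f"MOS = {mos:.3f} +/- {half_width:.3f}")
```

Because the score is an average over listeners, small gaps such as the one between 4.53 and 4.58 are best read alongside the spread of the ratings, which is what the confidence half-width is meant to indicate.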
Google explained that “Tacotron 2” can detect from context the difference between the noun “desert” and the verb “desert,” as well as the noun “present” and the verb “present,” and alter its pronunciation accordingly. The company further explained that the AI system can place emphasis on capitalised words and apply the proper inflection when asking a question rather than making a statement.