SSML using the Chrome TTS API

I'm trying to give a little more clarity to TTS sentences by indicating emphasis, etc. I'm using the Chrome TTS API, which indicates that it accepts SSML-formatted documents in addition to raw text.
After many attempts, and after reading a few comments on the web, it doesn't look like this is actually supported, or possibly it is left to individual voices to implement.
Does anyone know:
Has SSML been abandoned under Chrome?
If not, is there any indication whether they expect to support it via native voices, or whether they're hoping someone else will implement it?
Do any Chrome voices currently exist that support this?
Thanks!

I'm a Chrome engineer. SSML support has not been implemented yet, but it's planned. Obviously not all engines would support it, but when we implement SSML support we'll also implement support for stripping SSML from engines that don't support it.
Sorry, the documentation is misleading here.
Star this bug to express interest and get notified when it's fixed: https://code.google.com/p/chromium/issues/detail?id=88072
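Until that lands, a client can approximate the stripping step itself before handing text to `chrome.tts.speak`. The sketch below illustrates the idea in Python; `strip_ssml` is a hypothetical helper, not Chrome's implementation, and it assumes the utterance is either plain text or a well-formed SSML document rooted at `<speak>`.

```python
import xml.etree.ElementTree as ET


def strip_ssml(utterance: str) -> str:
    """Return plain text for engines without SSML support (hypothetical helper).

    If the utterance is a well-formed document whose root element is <speak>,
    keep only its text content; anything else is returned unchanged.
    """
    try:
        root = ET.fromstring(utterance)
    except ET.ParseError:
        return utterance  # not XML, so treat it as raw text
    # Drop a namespace prefix such as "{http://www.w3.org/2001/10/synthesis}".
    if root.tag.rsplit("}", 1)[-1] != "speak":
        return utterance
    # itertext() walks every text node, so tags like <emphasis> or <prosody>
    # disappear while their contents survive; then collapse whitespace runs.
    return " ".join("".join(root.itertext()).split())


print(strip_ssml('<speak>Hello <emphasis level="strong">world</emphasis>!</speak>'))
print(strip_ssml("Just raw text"))
```

This only keeps text, so elements whose meaning lives in attributes (for example `<sub alias="...">`) lose that meaning; it degrades the markup rather than interpreting it.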

If anyone's looking at this later, you can control prosody on Mac Chrome using Apple's native command syntax, at least for the default voices:
the square root of [[pbas +4]] 2 [[char LTRL]]a[[char NORM]] to the [[pbas +4]] 14 [[char LTRL]]x[[char NORM]]
Documented here.
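If you need many such strings, the embedded commands can be generated instead of typed by hand. A small Python sketch using only the `pbas` and `char LTRL`/`char NORM` commands from the example above; the helper names `raise_pitch` and `spell_literally` are hypothetical.

```python
def raise_pitch(text: str, delta: int = 4) -> str:
    """Prefix text with a relative pitch-base change, e.g. "[[pbas +4]] 2"."""
    return f"[[pbas {delta:+d}]] {text}"


def spell_literally(text: str) -> str:
    """Speak text in literal character mode, then switch back to normal."""
    return f"[[char LTRL]]{text}[[char NORM]]"


# Rebuild the utterance from the example above.
utterance = (
    f"the square root of {raise_pitch('2')} {spell_literally('a')} "
    f"to the {raise_pitch('14')} {spell_literally('x')}"
)
print(utterance)
```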

Related

Using Microsoft attributes with Google TTS

In my application, I am already using Google TTS, but I am impressed by Microsoft TTS because it provides many more useful attributes than Google. Since I am more familiar with Google, I would like to keep my implementation but still be able to use MS attributes like:
<mstts:express-as style="cheerful">
That'd be just amazing!
</mstts:express-as>
Is that possible?
There are no style attributes in Google Text-to-Speech, but you can change the Standard voice to a WaveNet voice[1].
The WaveNet voice synthesizes speech with more human-like emphasis and inflection on syllables, phonemes, and words. You can see all the supported voices in Google Text-to-Speech[2].
[1]https://cloud.google.com/text-to-speech/docs/wavenet#wavenet_voices
[2]https://cloud.google.com/text-to-speech/docs/voices
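Choosing a WaveNet voice comes down to the `voice.name` field of the synthesis request. Below is a sketch of the JSON body for the Cloud Text-to-Speech REST method `text:synthesize`, built with the standard library only. The voice name `en-US-Wavenet-D` is just an example (check the voice list [2]), and the `<prosody>` values are an assumed approximation of a livelier delivery, since Google has no equivalent of `<mstts:express-as>`.

```python
import json
from xml.sax.saxutils import escape


def build_synthesize_body(text: str,
                          voice_name: str = "en-US-Wavenet-D",
                          language_code: str = "en-US") -> str:
    """Build a JSON body for POST https://texttospeech.googleapis.com/v1/text:synthesize."""
    # Standard SSML stands in for a speaking style: a slightly faster rate
    # and a pitch two semitones higher. This is an approximation, not a style.
    ssml = (
        "<speak>"
        f'<prosody rate="110%" pitch="+2st">{escape(text)}</prosody>'
        "</speak>"
    )
    body = {
        "input": {"ssml": ssml},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    return json.dumps(body)


print(build_synthesize_body("That'd be just amazing!"))
```

Authentication and the HTTP call itself are left out; the response carries the audio as base64 in its `audioContent` field.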

Using different intonations with Watson text to speech

I am developing a PoC using Watson text to speech and Watson conversation.
Sometimes the chatbot needs to ask a question, so I'd like text to speech to synthesize the voice with an interrogative intonation.
Is this possible?
Watson Text to Speech supports SSML, and has expressive SSML tags.
The one you want is Uncertainty, which is defined as one that "conveys an uncertain, interrogative message".
Example:
<express-as type="Uncertainty">
Could she still be in the office? She told me that she might leave early.
</express-as>
More details on its usage are here:
https://console.bluemix.net/docs/services/text-to-speech/SSML-expressive.html#the-express-as-element
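To send that markup, the SSML goes in the `text` field of a `POST /v1/synthesize` request, with the voice as a query parameter and the audio format in the `Accept` header. A minimal sketch that only builds the request: `SERVICE_URL` is a placeholder for your own instance, authentication (API key) is omitted, and the default voice `en-US_AllisonVoice` is an assumption; use one of the voices the linked page lists as supporting expressive SSML.

```python
import json
import urllib.parse
import urllib.request
from xml.sax.saxutils import escape

# Placeholder: substitute the URL of your own Text to Speech instance.
SERVICE_URL = "https://example.invalid/text-to-speech/api"


def build_question_request(question: str,
                           voice: str = "en-US_AllisonVoice") -> urllib.request.Request:
    """Wrap a question in <express-as type="Uncertainty"> and build the request."""
    ssml = (
        '<speak><express-as type="Uncertainty">'
        f"{escape(question)}"
        "</express-as></speak>"
    )
    query = urllib.parse.urlencode({"voice": voice})
    return urllib.request.Request(
        f"{SERVICE_URL}/v1/synthesize?{query}",
        data=json.dumps({"text": ssml}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "audio/wav"},
        method="POST",
    )


req = build_question_request("Could she still be in the office?")
print(req.get_method(), req.full_url)
```

After adding credentials, `urllib.request.urlopen(req)` would return the WAV audio as the response body.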
Yes, you can certainly use text-to-speech (TTS) for output and speech-to-text (STT) for input. You would need to use a middleware or app layer to drive the conversation and route the input/output to the other services (see "how to use" in the docs).
I have used the following TJBot recipe as a simple, good starter for some projects: https://github.com/damiancummins/tell_the_time
Unfortunately, concatenative TTS may have trouble producing correct intonation in questions. If you find it happens consistently or too often, please open a bug.
If you have a specific question that gets incorrect intonation, try rephrasing it slightly if possible. A useful trick for this voice may be to use a double question mark ('??').

Best way to communicate to Catia via Browser

I have a question and was hoping you could help me out. I have built an API to communicate between Catia (a CAD application) and my browser, so I can create parts/products, read and write parameters, etc.
One of my problems is that the only way I know to do this is via ActiveX, which I really don't want to do, as it forces everyone to use IE11. Since CatiaV5 is fairly old software, there won't be any elegant way to use it via some sort of RESTful API or similar.
I've been working with JavaScript for a while now and have built apps for phones, but I have no idea about browser extensions, so my question is this: is it possible to write a Chrome extension that uses COM DLLs to set up a connection to a piece of software (in this case Catia) and work with it?
From the information I found, NPAPI plug-ins look like they could solve my problem, but NPAPI is also being phased out.
Do you think it's still worth digging deeper into writing such an NPAPI plug-in, or is there a more elegant way you can think of?
I'd be happy to hear any ideas and suggestions. Thanks in advance,
Greetings, Chris
In modern browsers the only way to do this would be using native messaging:
Chrome
Firefox
Edge
FireBreath2 has an abstraction for building C++ plugins which supports native messaging, though the docs are still a little sparse.
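With native messaging, the extension talks to a separate host program over stdin/stdout: each message is UTF-8 JSON preceded by its length as a 32-bit integer in native byte order. The host, not the extension, is where COM calls into Catia would live. A minimal Python sketch of that framing; the echo dispatch and the `getParameter`/`Length.1` message are hypothetical stand-ins for real Catia commands.

```python
import io
import json
import struct
import sys
from typing import BinaryIO, Optional


def read_message(stream: BinaryIO) -> Optional[dict]:
    """Read one length-prefixed JSON message; return None at end of input."""
    raw_length = stream.read(4)
    if len(raw_length) < 4:
        return None  # the browser closed the pipe
    (length,) = struct.unpack("=I", raw_length)  # 32-bit, native byte order
    return json.loads(stream.read(length).decode("utf-8"))


def write_message(stream: BinaryIO, message: dict) -> None:
    """Write one message using the same length-prefixed framing."""
    payload = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("=I", len(payload)))
    stream.write(payload)
    stream.flush()


def main() -> None:
    """Host entry point: answer each request from the extension.

    Hypothetical dispatch: a real host would call into Catia through COM
    here (for example via pywin32) instead of echoing the request back.
    """
    while (message := read_message(sys.stdin.buffer)) is not None:
        write_message(sys.stdout.buffer, {"echo": message})


# Round trip through an in-memory buffer instead of the real pipes.
buffer = io.BytesIO()
write_message(buffer, {"cmd": "getParameter", "name": "Length.1"})
buffer.seek(0)
print(read_message(buffer))
```

The host script launched by the browser would call `main()`; it also has to be registered with a host manifest, as described in each browser's native messaging documentation.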
The Zetakey browser supports NPAPI.
We provide an embedded HTML5 browser system for industrial and enterprise applications.
www.zetakey.com
Best regards,
Jack Wong

Convert festival tts to flite tts

I currently have a TTS built using Festival and Festvox. I need to convert these voices and build a TTS in Flite. Apparently you can do the conversion using Festvox (the Festvox and Flite websites say so, but give no proper steps on how to do it). Can someone please help me with it, as I am new to this area?
Thanks in advance.
Just in case anyone else is wondering the same: I found the steps in this document useful. Also subscribe to the mailing lists and feel free to ask questions there.
I must mention, though, that I never implemented the TTS using Flite; I went with eSpeak instead.

Speech Recognition API

I need to automatically transcribe some short MP3s as part of a proof of concept I am working on. I am currently looking into cloud solutions or web API services to send the MP3 as a simple HTTP request and receive a transcription back.
The only free/open-source solution I have found is here, but the demos don't seem to work (at least not on the files I need to transcribe). I have found some enterprise solutions for call centers, but so far nothing I can simply integrate into a project.
Are there any web based speech recognition services available? One that is able to filter out small noise would be a plus.
Here is an unofficial method to access Google's ASR capability. I tested it yesterday and it still works: you can get JSON-style ASR output with words and associated confidence scores from FLAC audio sampled at 16 kHz.
You can also try the Windows 7 speech recognition engine to produce subtitles. Here is the tool for that.
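For the Google route, the MP3s would first need converting to FLAC at 16 kHz. A sketch that builds and runs an ffmpeg command for that, assuming ffmpeg is installed and on the PATH; `-ar` sets the sample rate, and downmixing to mono with `-ac 1` is my assumption rather than something the answer states.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def flac_command(mp3_path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command converting an MP3 to mono FLAC at sample_rate Hz."""
    out_path = Path(mp3_path).with_suffix(".flac")  # codec inferred from .flac
    return ["ffmpeg", "-y", "-i", mp3_path,
            "-ar", str(sample_rate), "-ac", "1", str(out_path)]


def convert(mp3_path: str) -> None:
    """Run the conversion, raising if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(flac_command(mp3_path), check=True)


print(flac_command("clips/a.mp3"))
```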
This may be a good match. Also, their TechCrunch profile (see this) lists competitors as: SimulScribe, SpinVox, Vlingo, Nuance, Microsoft, Google
Some of these links may be helpful.
Vlingo, Bing and Google have recognizers in the cloud, but I don't think they make them publicly programmable. I believe they are accessible only from their authorized clients.
For a proof of concept (and low volume), have you considered just using the desktop speech engines that come with Windows 7? The question "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?" may be helpful. The MS desktop recognizers ship with a dictation grammar, and it sounds like that is what you will need.