What is the workflow between VoiceXML and Speech Synthesis? - text-to-speech

I would like to know how a VoiceXML document is rendered by the text-to-speech engine of a speech server. The VXML document contains the text that is supposed to be converted into audio. If the TTS server understands MRCP, what is the VXML document converted into so that the speech server can understand it, and how?

The VoiceXML document as a whole is not parsed by the TTS engine. Instead, the VoiceXML browser is responsible for extracting the prompt, including any Speech Synthesis Markup Language (SSML) markup included in the VoiceXML document, and passing just that text to the TTS engine via MRCP.
You can find more info on SSML from the W3C specification: SSML 1.0 Specification
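As a rough illustration of the extraction step described above, here is a minimal Python sketch of what a VoiceXML browser does conceptually: it finds each prompt and wraps its content (text plus any inline SSML) in a standalone SSML speak document. The sample document, the function names, and the "en-US" fallback are assumptions made for this example; a real browser would then hand the result to the TTS engine, typically as the body of an MRCP SPEAK request, which is not shown here.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical VoiceXML document, used only for illustration.
SAMPLE_VXML = """<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form>
    <block>
      <prompt>Welcome. <emphasis>Please</emphasis> hold.</prompt>
    </block>
  </form>
</vxml>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def copy_without_namespace(elem, tag=None):
    """Copy an element tree, renaming tags to their local names."""
    new = ET.Element(tag or local_name(elem.tag), dict(elem.attrib))
    new.text = elem.text
    for child in elem:
        copied = copy_without_namespace(child)
        copied.tail = child.tail
        new.append(copied)
    return new


def prompts_as_ssml(vxml_text):
    """Return one SSML document (as a string) per <prompt> element."""
    root = ET.fromstring(vxml_text)
    lang = root.get(XML_LANG, "en-US")  # assumed fallback language
    documents = []
    for elem in root.iter():
        if local_name(elem.tag) != "prompt":
            continue
        speak = copy_without_namespace(elem, tag="speak")
        speak.attrib.clear()  # prompt-only attributes do not belong on <speak>
        speak.set("version", "1.0")
        speak.set(XML_LANG, lang)
        documents.append(ET.tostring(speak, encoding="unicode"))
    return documents


if __name__ == "__main__":
    for ssml in prompts_as_ssml(SAMPLE_VXML):
        print(ssml)
```

Only the prompt content reaches the synthesizer in this sketch; the surrounding form and dialog logic stay in the browser.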

Related

Mixing languages in the same SSML

If I send this small piece of SSML to the speech processor, I get two voices:
<speak version='1.0' xml:lang='es-ES'>
<voice xml:lang='es-ES' xml:gender='Male' name='Microsoft Server Speech Text to Speech Voice (es-ES, Pablo, Apollo)'>
<p>
<s>Hola </s>
<s xml:lang='en'>Hello</s>
<s>¿Cómo estas?.</s>
</p>
</voice>
</speak>
A man speaks the Spanish and a woman speaks the English. Is this a limitation of the Project Oxford Text to Speech engine? In other words, I would expect the same voice to speak several languages, but it looks like this is not the case.
To quote the SSML spec,
Specifying xml:lang does not imply a change in voice, though this may indeed occur. When a given voice is unable to speak content in the indicated language, a new voice may be selected by the processor.
While the current fallback behavior leaves something to be desired, the recommendation is to create multiple voice nodes and pick a voice explicitly whenever the language switches.
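A minimal Python sketch of that recommendation follows: each language segment gets its own voice node with an explicitly named voice. The voice names used here ("example-es-voice", "example-en-voice") are placeholders, not real voices; substitute names that your speech service actually lists.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml(segments, default_lang="es-ES"):
    """Build SSML with one explicit <voice> node per (lang, voice_name, text)."""
    speak = ET.Element("speak", {"version": "1.0", XML_LANG: default_lang})
    for lang, voice_name, text in segments:
        voice = ET.SubElement(speak, "voice", {XML_LANG: lang, "name": voice_name})
        voice.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    # Placeholder voice names, assumed for illustration only.
    print(build_ssml([
        ("es-ES", "example-es-voice", "Hola"),
        ("en-US", "example-en-voice", "Hello"),
        ("es-ES", "example-es-voice", "¿Cómo estás?"),
    ]))
```

Naming the voice for every segment avoids relying on the processor's fallback choice when it meets a language the current voice cannot speak.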

Read from a searchable pdf, without ocr

I'm currently using my scanner to turn my PDFs into searchable PDFs. The OCR is already taken care of, since I can use Ctrl+F within the PDF.
How can I get at the OCR'd content from my program, though?
I'm open to using Java or Ruby; the question is more or less language-agnostic. Is the OCR'd text openly accessible by reading the file?
I'm not sure how your OCR software creates the PDF, but could you use a third-party library such as JPedal or iText, or a tool such as Xpdf, to extract the text from the resulting PDF?
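As one possible way to try this, the sketch below shells out to the pdftotext command-line tool that ships with Xpdf. It assumes pdftotext is installed and on the PATH, and "scan.pdf" is a hypothetical file name. Whether any text comes back depends on the scanner software having embedded a text layer in the PDF.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Build the pdftotext invocation; '-' sends the text to stdout."""
    return ["pdftotext", "-layout", pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext and return the embedded text layer as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout


if __name__ == "__main__":
    # "scan.pdf" is a placeholder path for this example.
    print(extract_text("scan.pdf"))
```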

Starting a Text to Speech in your Language

Is there an open-source library that I could feed letters and sounds into to produce a text-to-speech system?
What must I do to start from scratch? Python would be my language of choice, so where should I be headed to develop my own text-to-speech system for my language?
Here's a list of a few Open Source TTS engines:
MBrola
FreeTTS
Festival Speech Synthesis
FLite
Festvox
GnuSpeech
Epos Speech
Maybe one of these covers what you're looking for.

Text to speech - does the Microsoft TTS SDK support the Arabic language?

I would like to know whether the Microsoft TTS SDK supports the Arabic language. If it does, how? If it does not, is there any way to convert Arabic text to speech?
I want my program to read Arabic text using VB 2008.
Microsoft's TTS SDK (SAPI) is language-agnostic - it relies on TTS engines to actually transform the text to speech.
That being said, as far as I know, there are no free Arabic TTS engines available.
Acapela has an Arabic male and female voice available for purchase, but I have no idea how much it costs.

TTS - Text to Speech Synthesis System

I am trying to make an HTML page that includes a TTS (text-to-speech synthesis) feature. Please suggest some good online demos.
Also, please let me know whether Google provides any API for TTS.
Thanks a lot.
There is an unofficial Google API; its limit is 100 characters per request:
http://translate.google.com/translate_tts?q=Hello+Sanket
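Given that 100-character limit, longer text has to be split before it is sent. The Python sketch below only builds the request URLs and does not fetch anything; the helper names are made up for this example, and because the endpoint is unofficial it may change or stop working at any time.

```python
from urllib.parse import urlencode

BASE_URL = "http://translate.google.com/translate_tts"
MAX_CHARS = 100  # limit quoted above


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, on word boundaries."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def tts_urls(text):
    """Return one URL per chunk, using only the `q` parameter shown above."""
    return [f"{BASE_URL}?{urlencode({'q': chunk})}" for chunk in split_text(text)]


if __name__ == "__main__":
    for url in tts_urls("Hello Sanket"):
        print(url)
```

Each URL would yield a separate audio clip, so a client would have to play or concatenate the clips in order.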