Can any TTS engine change a voice's language, and subsequently its phoneme? - text-to-speech

Lets say I want to have some English text spoken in an Italian accent.
Many of the engine demos I have tried on their respected sites will have the Italian language available, but when you try to get it to pronounce a few sentences in English, they often become highly unintelligible because they are operating by a different phoneme.
There are phoneme tags in SSML, and I know one site that allows you to actually demo with SSML. I try putting in this common and generic Italian conversation into their Italian voice:
Mama mia! Princess Peach and my friends have been kidnapped?
Chase Bowser, so we can eat some spaghetti!
And it is fairly unintelligible. Utilizing SSML or something else; Can I keep the accent, but correct the speech phoneme enough to make it intelligible?

You can hire a voice-talent with Italian accent and make a new TTS model where such option is available. Even with a several hours of speech you can get a decent model.
The second option is speech morphing, but it requires some efforts as well as knowledge in the domain.

Related

Is there a way to convert speech directly into SSML?

Just as one is able to use various speech-to-text 'dictation' tools to convert spoken word into its corresponding text, I would like to know if there are similar such tools for converting spoken word into its corresponding SSML. That is, it will provide the text in addition to the relevant SSML tags associated with any intonation, prosody, pauses/breaks, inflection, etc... present in the speaker's voice.
I work on building Voice apps. In a recent project I was working on, we needed the text to sound exactly right, with all the associated intonations, prosody, pauses/breaks, inflection, etc.
On extensive research, we found that the only way to make the text sound like being spoken by a real person is either to use SSML (still not perfect) or a recorded mp3.
If you're trying to get the real person feel for a project, the best way to execute it is to utilize a human. I would suggest you record the mp3 (/get it recorded by a professional) instead of trying to get SSML from voice.
The reason we use SSML is exactly that computers cannot understand the associated intonations, prosody, pauses/breaks, inflection, etc. of human speech.
If your goal is to get SSML, then the best way would be to convert text to SSML. For this, I'd suggest taking a peek here:
W3C SSML
Google SSML
Amazon SSML
This is to the best of our knowledge # mid July 2018.
If anyone has more info please feel to add to this answer.
Hope this helps :3

How can I make language translation integration in pdf reader? Is there any Option?

I am a Portuguese learner and also a engineering student with lectures in Portuguese which is a big headache. I frequently need to look over Portuguese files provided by professors. I wish there could be a pdf reader that has a translation facility so that every time I had a doubt I could just right click over or something like that to get english translation of it. Is there any option rather than just copying the words and making google translate or whatever? It's becauses I need to read lot.

How I can start building wordnet for Turkish language to use in sentiment analysis

Although I hold EE background, I didn't get chance to attend Natural Language processing classes.
I would like to build sentiment analysis tool for Turkish language. I think it is best to create a Turkish wordnet database rather than translating the text to English and analyze it with buggy translated text with provided tools. (is it?)
So what do you guys recommend me to do ? First of all taking NLP classes from an open class website? I really don't know where to start. Could you help me and maybe provide me step by step guide? I know this is an academic project but I am interested to build skills as a hobby in that area.
Thanks in advance.
Here is the process I have used before (making Japanese, Chinese, German and Arabic semantic networks):
Gather at least two English/Turkish dictionaries. They must be independent, not derived from each other. You can use Wikipedia to auto-generate one of your dictionaries. If you need to publish your network, then you may need open source dictionaries, or license fees, or a lawyer.
Use those dictionaries to translate English Wordnet, producing a confidence rating for each synset.
Keep those with strong confidence, manually approving or fixing through those with medium or low confidence.
Finish it off manually
I expanded on this in the "Automatic Translation Of WordNet" section of my 2008 paper: http://dcook.org/mlsn/about/papers/nlp2008.MLSN_A_Multilingual_Semantic_Network.pdf
(For your stated goal of a Turkish sentiment dictionary, there are other approaches, not involving a semantic network. E.g. "Semantic Analysis and Opinion Mining", by Bing Liu, is a good round-up of research. But a semantic network approach will, IMHO, always give better results in the long run, and has so many other uses.)

How to use CFStringTokenizer with Chinese and Japanese?

I'm using the code here to split text into individual words, and it's working great for all languages I have tried, except for Japanese and Chinese.
Is there a way that code can be tweaked to properly tokenize Japanese and Chinese as well? The documentation says those languages are supported, but it does not seem to be breaking words in the proper places. For example, when it tokenizes "新しい" it breaks it into two words "新し" and "い" when it should be one (I don't speak Japanese, so I don't know if that is actually correct, but the sample I have says that those should all be one word). Other times it skips over words.
I did try creating Chinese and Japanese locales, while using kCFStringTokenizerUnitWordBoundary. The results improved, but are still not good enough for what I'm doing (adding hyperlinks to vocabulary words).
I am aware of some other tokenizers that are available, but would rather avoid them if I can just stick with core foundation.
[UPDATE] We ended up using mecab with a specific user dictionary for Japanese for some time, and have now moved over to just doing all of this on the server side. It may not be perfect there, but we have consistent results across all platforms.
If you know that you're parsing a particular language, you should create your CFStringTokenzier with the correct CFLocale (or at the very least, the guess from CFStringTokenizerCopyBestStringLanguage) and use kCFStringTokenizerUnitWordBoundary.
Unfortunately, perfect word segmentation of Chinese and Japanese text remains an open and complex problem, so any segmentation library you use is going to have some failings. For Japanese, CFStringTokenizer uses the MeCab library internally and ICU's Boundary Analysis (only when using kCFStringTokenizerUnitWordBoundary, which is why you're getting a funny break with "新しい" without it).
Also have a look at NSLinguisticTagger.
But by itself won't give you much more.
Truth be told, these two languages (and some others) are really hard to programatically tokenize accurately.
You should also see the WWDC videos on LSM. Latent Semantic Mapping. They cover the topic of stemming and lemmas. This is the art and science of more accurately determining how to tokenize meaningfully.
What you want to do is hard. Finding word boundaries alone does not give you enough context to convey accurate meaning. It requires looking at the context and also identifying idioms and phrases that should not be broken by word. (Not to mention grammatical forms)
After that look again at the available libraries, then get a book on Python NLTK to learn what you really need to learn about NLP to understand how much you really want to pursue this.
Larger bodies of text inherently yield better results. There's no accounting for typos and bad grammar. Much of the context needed to drive logic in analysis implicit context not directly written as a word. You get to build rules and train the thing.
Japanese is a particularly tough one and many libraries developed outside of Japan don't come close. You need some knowledge of a language to know if the analysis is working. Even native Japanese people can have a hard time doing the natural analysis without the proper context. There are common scenarios where the language presents two mutually intelligible correct word boundaries.
To give an analogy, it's like doing lots of look ahead and look behind in regular expressions.

Resources for programs teaching natural languages

What API's and data sets are available for use in programs to teach natural languages e.g. to aid in learning to read/write/listen/speak a 2nd language? These could be web or traditional API's to dictionaries, translation services, associations of words / concepts to images, sounds e.g. spoken words or phrases, movies, or sets of flashcard decks. Also of interest are websites that could be spidered to obtain local data sets for offline use.
As a start, I note that that the Google translate API can be accessed programmatically.
There is an online web course for Swedish with sound files.
There are online texts and MP3 files for many languages including Swedish from the Foreign Service Institute.
I am especially interested in resources for Swedish, but feel free to add resources for other languages. Please tag any answers with the relevant language or languages.
I think spidering sites that provide language learning content is against the ToU for most of them.
If you're interested in attempting to analyze pronunciation and grade its correctness, this API may be worth referencing. I don't have personal experience with it, however - algorithms for this are really still in their infancy, and don't provide a lot of value to the average consumer at this point.