I want to build my own TTS (Text-to-Speech) app using HTS (the HMM-based Speech Synthesis System) for the Arabic language.
I have failed to find any step-by-step instructions on how to build a synthesizer using HTS. What I have done so far is download the speaker-dependent demo from the HTS website, train on that data, and test it in Festival (English speaker).
Now I don't know which files I should change in the HTS demo to build a voice for my own language.
First, build a Festival unit-selection voice for your language by following the Building Synthetic Voices guide.
Once you have the voice and the required lab and utt files, run the training.pl script from HTS with the paths updated to point at your database, and it will build the voice for you.
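To make the second step concrete, here is a minimal sketch in Python. The directory layout and script names follow the standard HTS speaker-dependent demo as I remember it, and every path is a placeholder you would adapt to your own Arabic database:

```python
# Minimal sketch: stage your own database into an HTS demo checkout and start training.
# All paths and script names below are assumptions based on the usual HTS
# speaker-dependent demo layout; adjust them to match your copy and your data.
import shutil
import subprocess
from pathlib import Path

HTS_DEMO = Path("HTS-demo")   # placeholder: your unpacked HTS demo directory
MY_DB = Path("arabic_db")     # placeholder: your recordings and Festival-generated labels

# 1. Copy raw waveforms and the full-context label files produced from your utterances.
for raw in (MY_DB / "raw").glob("*.raw"):
    shutil.copy(raw, HTS_DEMO / "data" / "raw")
for lab in (MY_DB / "labels" / "full").glob("*.lab"):
    shutil.copy(lab, HTS_DEMO / "data" / "labels" / "full")

# 2. Kick off training. The demo drives everything from a Perl script; the exact
#    script and config names may differ in your HTS version.
subprocess.run(
    ["perl", "scripts/Training.pl", "scripts/Config.pm"],
    cwd=HTS_DEMO,
    check=True,
)
```

The point is that the demo's training pipeline is reusable as-is; what changes for a new language is mainly the database and the language-dependent label files (and the related configuration).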
I'm trying to build an app that does image recognition using the phone camera. I have seen a lot of videos where, using the camera, the app identifies where a person is, which feelings they have, or things like that, in real time.
I need to build an app like this. I know it's not an easy task, but I need to know which technologies can be used to achieve this in a mobile app.
Is it TensorFlow?
Are there libraries that help achieve this?
Or do I need to build a full machine learning / AI app myself?
Sorry for asking such a general question, but I need some insights.
Regards
If you are trying to do this for the iOS platform, you could use a starter kit here: https://developer.ibm.com/patterns/build-an-ios-game-powered-by-core-ml-and-watson-visual-recognition/ for step-by-step instructions.
https://github.com/IBM/rainbow is the repo it references.
You train your vision model on the IBM Cloud using Watson Visual Recognition, which just needs example images to learn from. Then you download the model into your iOS app and deploy with Xcode. It will "scan" the live camera feed for the classes defined in your model.
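If it helps to see the training step as code rather than clicks, here is a hedged sketch using the ibm-watson Python SDK. The starter kit itself drives this through the Watson tooling, so the class, method, and zip-file arguments below are my assumptions about that SDK and may differ between versions:

```python
# Sketch only: train a Watson Visual Recognition classifier from zipped example images.
# Assumes the ibm-watson Python SDK; argument names vary between SDK releases,
# so treat this as an outline of the flow rather than a drop-in script.
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import VisualRecognitionV3

authenticator = IAMAuthenticator("YOUR_API_KEY")  # placeholder credential
visual_recognition = VisualRecognitionV3(
    version="2018-03-19",
    authenticator=authenticator,
)

# Each zip file holds example images for one class you want the model to recognize.
with open("rainbow_examples.zip", "rb") as positives, \
     open("background_examples.zip", "rb") as negatives:
    classifier = visual_recognition.create_classifier(
        name="my-game-objects",                    # hypothetical classifier name
        positive_examples={"rainbow": positives},
        negative_examples=negatives,
    ).get_result()

print(classifier["classifier_id"])  # the trained model you later pull into the iOS app
```

Once training finishes, the starter kit walks you through downloading the resulting Core ML model into the Xcode project.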
I see you tagged TensorFlow (which is not part of this starter kit), but if you're open to other technologies, I think it would be very helpful.
Are there any tools or methods for automated testing of games developed in cocos2d-x?
FYI: Appium and other tools try to find Android and iOS UI elements, which I don't use in my games.
I highly recommend using image recognition for mobile game testing; Appium also supports image recognition.
You can use Applitools for automatic image comparison with Appium. In fact, Applitools automates not only the capture of the screenshot but also its validation, and it has sophisticated algorithms to avoid false failures in the image comparison. The Applitools Selenium SDK works out of the box with Appium, as well as with any other WebDriver implementation, and is cross-platform and cross-device. Applitools has a free registration plan, and the SDKs are open source and available on GitHub.
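Here is a rough sketch of how that fits together with the Appium and Applitools Python SDKs. The device capabilities, app path, and API key are placeholders, the capability-dict style follows the older Appium Python client, and the exact Eyes method names can vary a little between SDK versions:

```python
# Sketch: take a visual checkpoint of a game screen with Appium + Applitools Eyes.
# Capabilities, paths, and the API key below are placeholders for your own setup.
from appium import webdriver
from applitools.selenium import Eyes

caps = {
    "platformName": "Android",
    "deviceName": "emulator-5554",       # placeholder device
    "app": "/path/to/your-game.apk",     # placeholder app path
    "automationName": "UiAutomator2",
}
driver = webdriver.Remote("http://localhost:4723/wd/hub", caps)

eyes = Eyes()
eyes.api_key = "YOUR_APPLITOOLS_API_KEY"  # placeholder key
try:
    eyes.open(driver, "Cocos2d-x game", "Main menu visual test")
    eyes.check_window("Main menu")  # captures the screen and validates it against the baseline
    eyes.close()                    # fails the test if the comparison does not pass
finally:
    eyes.abort_if_not_closed()      # cleans up if close() was never reached
    driver.quit()
```

Because the check works on pixels rather than UI elements, it does not matter that the cocos2d-x scene exposes no native Android/iOS widgets.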
For more information, see the links below:
Image Recognition Part 1
Image Recognition Part 2
Also take a look at Sikuli.
Using Google Translate, it is possible to generate TTS mp3 audio files from English or Chinese input using a simple URL. For example,
URL for English TTS for the word "English":
http://translate.google.com/translate_tts?tl=en&q=English
URL for Chinese TTS for the word 中文:
http://translate.google.com/translate_tts?tl=zh_CN&q=中文
How can I do the same using IME Hanyu Pinyin with tonal notations as input? For example, I want to generate the audio TTS file for zhōng wén (instead of 中文).
I have searched high and low on this website for a solution but was not successful. I do apologise if I have overlooked a solution previously offered here. Thanks
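For reference, the URL pattern described above can be scripted, for example with Python's requests library (a small sketch; translate_tts is not an official API, so it may require a browser-like User-Agent and can change or be rate-limited at any time):

```python
# Sketch: fetch an MP3 from the (unofficial) translate_tts endpoint shown above.
# This is not a documented API, so headers and parameters may need adjusting.
import requests

def fetch_tts(text: str, lang: str, out_path: str) -> None:
    """Save Google Translate TTS audio for `text` in language `lang` to out_path."""
    response = requests.get(
        "http://translate.google.com/translate_tts",
        params={"tl": lang, "q": text},
        headers={"User-Agent": "Mozilla/5.0"},  # the endpoint tends to reject bare clients
        timeout=10,
    )
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)

fetch_tts("English", "en", "english.mp3")
fetch_tts("中文", "zh_CN", "zhongwen.mp3")
```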
Hi, I have an Adobe AIR 2 project which records some short sounds via the microphone. I am able to save the streams as WAV files but need them to be saved as MP3 (for replaying in the Flash Player).
Does anyone know if this is possible?
If it isn't, is there any way to get the Flash Player to play audio in WAV form?
Any hints appreciated.
If you are using Adobe AIR 2, then you should be looking for a non-AS3 command-line tool to convert the WAV file to MP3. The process is fairly CPU-intensive and would take a LONG time in ActionScript even if there were a library out there that accomplished the task (which I haven't heard of).
My suggestion is to include a tool like LAME with your application and pass your WAV file to it (essentially running the process in another thread, in C). The only downside is providing an executable for each OS you'll be deploying to; if you intend to support Windows, Mac, and Linux, that could be up to three different WAV-to-MP3 command-line tools.
Link for LAME: http://lame.sourceforge.net/
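To show the kind of command the bundled encoder ends up running, here is a stand-in sketch. The Python below is only an illustration of the external call (in AIR 2 itself you would launch the encoder via NativeProcess), and the file names and quality flag are placeholders:

```python
# Stand-in sketch of the external LAME call described above: WAV in, MP3 out.
# In the AIR application you would launch the same command with NativeProcess;
# this just shows the command-line conversion step with placeholder file names.
import subprocess

def wav_to_mp3(wav_path: str, mp3_path: str) -> None:
    """Convert a WAV recording to MP3 by shelling out to the LAME encoder."""
    subprocess.run(
        ["lame", "-V2", wav_path, mp3_path],  # -V2 is a common VBR quality preset
        check=True,
    )

wav_to_mp3("recording.wav", "recording.mp3")
```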
Seems to be possible: http://www.jordansthings.com/blog/?p=5. No ready-made source, but the libraries listed there should help. (I would just decompile it if need be.)
Is there any code out there that I can use in Cocoa to recognize text from photos? Let's say I snap a photo with my iPhone of a page of a book. I'd like to capture the text in it.
There is the Tesseract OCR toolkit, an open-source OCR engine currently maintained by Google. "Olipion" created a cross-compilation tutorial for getting it onto the iPhone. I would say that this is a good place to start.
However, there are reasons why you might not want to do OCR on the phone even if you could. Some of these include:
Even the new iPhone 4's processor is not that fast, and since your app can't really run in the background doing the processing, the user experience might not be optimal.
Running OCR on a mobile device would probably be a killer for battery life.
Every time you want to update the OCR engine, everybody who installed your app would have to upgrade.
For an always-connected mobile device, running the OCR on a server somewhere would probably be better. You could upgrade your OCR software easily, you could run much more powerful algorithms than a mobile device could handle, and so on.
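If you go the server route, the OCR step itself is small. Here is a sketch assuming Tesseract plus the pytesseract and Pillow Python packages on the server; the upload/API plumbing around it is up to you, and the file name is a placeholder:

```python
# Sketch: server-side OCR of an uploaded photo using Tesseract.
# Assumes the tesseract binary plus the pytesseract and Pillow packages are installed.
from PIL import Image
import pytesseract

def ocr_page(image_path: str) -> str:
    """Return the text Tesseract finds in the photo at image_path."""
    image = Image.open(image_path)
    return pytesseract.image_to_string(image)

if __name__ == "__main__":
    print(ocr_page("book_page.jpg"))  # placeholder file name
```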
I am not so sure that you would be able to get good results from photos taken using a mobile camera -- accuracy of OCR systems goes way down with the kind of poorly lit, noisy, distorted images likely to be captured using a phone camera.
As far as commercial products go, there is Evernote, which gives you OCR capability if you buy their premium service.
As an alternative to machine OCR, there is always Mechanical Turk, where you could pay people a small amount to do the OCR for you. They would probably do better at transcription given the image source.