So I put Google TTS on my website (http://jieshangxiaochi.blogspot.com/2015/04/guge-putonghua-yanjiang.html).
Three problems:
1. Google TTS only accepts a maximum of 100 characters. How do I increase this limit?
2. Chinese speech rate is too fast. How do I reduce the speed?
3. When a link to a particular TTS clip is opened, clicking 'Play' does nothing; one must click 'Refresh' first. Why is that? (link example: http://translate.google.com/translate_tts?tl=zh-CN&q=%E6%8C%96%E9%BC%BB%E5%AD%94%20%E7%9A%84-%20%E7%AC%AC%E4%BA%8C%E9%83%A8%E5%88%86%E3%80%82%E5%94%89%EF%BC%8C%E5%A7%90%E6%80%8E%E4%B9%88%E8%BF%99%E4%B9%88%E5%80%92%E9%9C%89%EF%BC%81%E5%81%8F%E5%81%8F%E5%B0%B1%E5%9C%A8%E6%88%91%E6%8C%96%E9%BC%BB%E5%AD%94%E6%8C%96%E5%BE%97%E5%85%A5%E8%BF%B7%E7%9A%84%E6%97%B6%E5%80%99%EF%BC%8C%E8%84%9A%E6%AD%A5%E5%A3%B0%E4%BC%A0%E6%9D%A5%EF%BC%8C%E4%B8%80%E4%B8%AA%E7%94%B7%E7%A5%9E%E4%B8%80%E6%A0%B7%E7%9A%84%E7%94%B7%E5%AD%90%E7%9C%8B%E5%88%B0%E4%BA%86%E6%88%91%E6%8C%96%E9%BC%BB%E5%AD%94%EF%BC%8C%E5%9B%A0%E4%B8%BA%E6%88%91%E6%B2%A1%E6%9C%89%E5%81%9C%E6%AD%A2%E3%80%82%E5%A4%AA%E5%B0%B4%E5%B0%AC%E4%BA%86%EF%BC%8C%E5%93%8E%E5%91%80%EF%BC%8C%E5%B0%B4%E5%B0%AC%E6%AD%BB%E4%BA%86%E3%80%82)
What I have tried: Google around.
On problem #1: People say Google Translate's speech now has no character limit, so why is mine still limited to 100?
On problem #2: Adding a dot between every pair of Chinese characters. This slowed down the speech, but it became hard to differentiate one sentence from another. Using two dots created the same delay as one dot.
On problem #3: I am at a loss here.
Related
I went through the documentation of Google Text-to-Speech SSML.
https://developers.google.com/assistant/actions/reference/ssml#prosody
So there is a tag called <prosody>, which per the W3 specification referenced in that documentation can accept an attribute called duration: a value in seconds or milliseconds for the desired time to take to read the contained text.
So <speak><prosody duration='6s'>Hello, How are you?</prosody></speak> should take 6 seconds for Google Text-to-Speech to speak! But when I try it here https://cloud.google.com/text-to-speech/ , it's not working, and I also tried it via the REST API.
Does Google Text-to-Speech not take the duration attribute into account? If it doesn't, is there a way to achieve the same effect?
There are two ways I know of to solve this:
First option: call Google's API twice: use the first call to measure the duration of the spoken audio, and the second call to adjust the rate parameter accordingly (see the sketch after this list).
Pros: Better audio quality? (this is subjective and depends on taste as well as the application's requirements)
Cons: Doubles the cost and processing time.
Second option:
Post-process the audio using a specialized library such as ffmpeg
Pros: Cost effective and can be fast if implemented correctly.
Cons: Some knowledge of the concepts and the usage of an audio post-processing library is required (no need to become an expert though).
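Here is a minimal sketch of the first option using the Google Cloud Text-to-Speech Python client. The target duration, voice, and file names are assumptions for illustration; LINEAR16 output is WAV, so its length can be read with the standard wave module, and speaking_rate must stay within the API's 0.25-4.0 range.

    import io
    import wave
    from google.cloud import texttospeech

    def synthesize(client, text, rate=1.0):
        # Request WAV (LINEAR16) audio at the given speaking rate.
        response = client.synthesize_speech(
            input=texttospeech.SynthesisInput(text=text),
            voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
            audio_config=texttospeech.AudioConfig(
                audio_encoding=texttospeech.AudioEncoding.LINEAR16,
                speaking_rate=rate,
            ),
        )
        return response.audio_content

    def wav_seconds(audio_bytes):
        # Duration of a WAV byte string: frames / sample rate.
        with wave.open(io.BytesIO(audio_bytes)) as w:
            return w.getnframes() / w.getframerate()

    client = texttospeech.TextToSpeechClient()
    text, target = "Hello, how are you?", 6.0  # 6 s target, as in the question
    first = synthesize(client, text)
    # Speaking faster shortens the audio, so rate = measured / target,
    # clamped to the range the API accepts.
    rate = max(0.25, min(4.0, wav_seconds(first) / target))
    with open("out.wav", "wb") as f:
        f.write(synthesize(client, text, rate))

For the second option, a single ffmpeg invocation such as ffmpeg -i in.wav -filter:a atempo=1.5 out.wav stretches or squeezes the audio without shifting its pitch (the atempo filter accepts roughly 0.5-2.0 per instance, so larger factors need to be chained).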
As Mr Lister already mentioned, the documentation clearly says:
<prosody>
Used to customize the pitch, speaking rate, and volume of text
contained by the element. Currently the rate, pitch, and volume
attributes are supported.
The rate and volume attributes can be set according to the W3
specifications.
You can test it using the UI.
In particular you can use things like
rate="low"
or
rate="80%"
to adjust the speed. However, that is as far as you can go with Google TTS.
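Put together, such a request looks like this (the sentence is just a placeholder):

    <speak>
      <prosody rate="80%">Hello, how are you?</prosody>
    </speak>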
AWS Polly does support what you need, but only on Standard voices (not Neural).
Here is the documentation.
Setting a Maximum Duration for Synthesized Speech
Polly also has a UI to do a quick test.
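As a rough illustration based on that documentation page, a Polly request capping the audio length might look like the following (again, the sentence is a placeholder, and this works only on the Standard voices mentioned above):

    <speak>
      <prosody amazon:max-duration="6s">Hello, how are you?</prosody>
    </speak>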
I mean recognition per sentence, not per letter. Something like a doctor's prescription handwriting, which is hard to read, not just normal handwriting.
For example:
I use data mining or machine learning to train on handwritten paper.
A user scans a paper with hard-to-read writing.
The application does some image processing.
And the output is the sentences from the paper.
And what device should I use (scanner or webcam)?
I am a newbie. If possible, I need some examples in VB.NET with EmguCV/OpenCV, and pointers to research journals.
Any help would be appreciated.
Welcome to Stack Overflow! The answer to your question is twofold:
a. If you want to recognize handwriting that has already happened, i.e. it is presented to you as an image, you are in trouble. Computer vision is still not good enough to provide you with reasonable accuracy.
b. If you have a chance to recognize handwriting "as it's happening", you are in luck. Download, for example, the Gesture Search app from the Android Play Store and you are in business.
The difference between the two scenarios is subtle but significant. In the second case you have an extra piece of information that makes handwriting recognition possible: the timing of each stroke. In other words, instead of an image with handwriting, you have a bunch of strokes that are all labeled with their timestamps. You can think of it as a sequence of lines and curves or as image segmentation; either way, this provides a big hint for the system. Additional help comes from the dictionary on your phone, but that is typically used by any handwriting system.
Android of course has an open-source library for stroke recognition (find more on your own). If you still want to go for recognizing images, though, you have to first detect text (e.g. as a bounding box) and second, run any of the existing engines on the detected regions. For text detection I can recommend MSER. But be careful trying to implement even text detection on your own; you are entering a world of pain here ;). Here is an article that can help.
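For a feel of what MSER-based detection looks like, here is a minimal sketch in Python with OpenCV (EmguCV wraps the same calls for .NET; the file names are placeholders and the detector parameters are left at their defaults):

    import cv2

    img = cv2.imread("scan.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    mser = cv2.MSER_create()               # tune parameters for your scans
    regions, _ = mser.detectRegions(gray)  # each region is an array of points

    for pts in regions:
        # Draw a bounding box around each candidate text region.
        x, y, w, h = cv2.boundingRect(pts)
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)

    cv2.imwrite("regions.png", img)  # hand the boxed regions to an OCR engine

Note that MSER gives you candidate regions, not text; grouping the boxes into words and lines is where the real work starts.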
As for learning how to recognize text from images on the Internet, this can be your plan B or C or Z once you master the above-mentioned stages. Don't try to abuse learning methods and make them do the hard work for you; you will hit a wall if you don't understand what's going on under the hood.
Like, why do websites insist on users entering CAPTCHAs?
I think that even robots are capable of reading those...
I want to know how this CAPTCHA thing works. :/
In the past ten years robots were unable to identify distorted text as entered in a CAPTCHA. But nowadays robots can identify distorted text and even door number plates, which are used as CAPTCHAs on so many websites.
I am not sure if this is the perfect answer, but you might want to take a look at this blog post, where the author claims that so far no program or robot has been able to read CAPTCHAs or other forms of distorted text: while computer programs can read simple text, or text in images, no system has so far been developed that can scan and read distorted text or CAPTCHAs.
There is a thing called the Turing test, which differentiates a machine from a human. CAPTCHAs are designed so that machines fail that test. But using AI, machines have nowadays started passing Turing tests.
Captchas will typically display alphanumeric characters in a distorted way so that the human brain can process them in a way automatic character recognition cannot. This trick is how the website administrator can tell humans and robots apart.
As machine learning algorithms get better, captchas are getting more complicated for actual humans to solve. This is an arms race between website administrators trying to keep their site safe from robots and hackers trying to create fake accounts in an automated way.
Google's reCAPTCHA asks you to read two fields. One is an actual captcha; the other is an image that Google's machine-learning system has failed to read (such as a house street number from Google Street View). By solving this captcha, you are helping the machine learn.
Are there any easy-to-use free or cheap speech synthesis libraries for PIC and/or ARM embedded systems where code size is more important than speech quality? Nowadays it seems that a 1 MB package is considered "compact", but a lot of microcontrollers are smaller than that. Back in the 1980s Apple hired a contractor to produce MacinTalk, which offered reasonable-quality speech in a 26K package which ran on a 7.16 MHz 68000, and a program called SAM could produce speech that wasn't quite as good, but still serviceable, in a 16K package that ran on a 1 MHz 6502. The SpeakJet runs a speech-synthesis algorithm on some type of PIC.
I probably wouldn't particularly need to produce speech, but would want to be able to speak messages formed from a number of pre-set words. Obviously it would be possible to simply prerecord all the messages, but with a vocabulary of e.g. 100 words, I would think that storing 16K worth of code plus maybe 1K worth of phonetic strings would be more compact than storing audio for 100 words.
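To put rough numbers on that assumption (half a second per word at 8 kHz, 8-bit mono; all figures here are guesses, not measurements):

    # Back-of-envelope comparison: raw stored audio for 100 words
    # versus a small synthesizer plus phonetic strings.
    WORDS = 100
    SECONDS_PER_WORD = 0.5
    SAMPLE_RATE = 8000                 # bytes per second at 8-bit mono

    raw_audio = WORDS * SECONDS_PER_WORD * SAMPLE_RATE
    synth = 16 * 1024 + 1 * 1024       # 16K code + 1K phonetic strings

    print(raw_audio)  # 400000.0 -> ~400 KB of stored audio
    print(synth)      # 17408    -> ~17 KB for the synthesis approach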
Alternatively, if I wanted to store audio for 100 words, what would be the best way of generating a set of words that would flow naturally together? On older-style speech synthesizers, any given word could be spoken three ways: neutral inflection, falling inflection (as if followed by a period), or rising inflection (followed by a question mark). Words with neutral inflection could be spliced together in any order and sound fine. The text-to-wave tools I've found, though, seem to like to add finer details of inflection which sound "off" if words are cut apart and resequenced. Are there any tools which are designed for producing waves that can be concatenated and spliced nicely? If I do use such a tool, what audio format would be best for storing the waves so as to allow efficient decoding on a small microcontroller?
Last time I did this I was able to add hardware like http://www.sparkfun.com/products/9578 . There may be patent liabilities in your environment, like the ones I ran into, that force a commercial software stack or an off-the-shelf (OTS) chip.
Otherwise, I've used http://www.speech.cs.cmu.edu/flite/ for more lenient projects, and it worked well.
Hello. Due to reasons of Chinese paranoia, and Google being a bunch of pansies, I am in the situation where I need to alter a number of GPS waypoints stored in a GPX file so they are correctly aligned with the Google map, which is not correctly aligned... for reasons of the aforementioned paranoia.
So I have a waypoint at a known landmark (a railway station) and I can see that landmark on the Google map. I would like to be able to move the waypoint in my GPX file onto the one on the map and have all the other waypoints adjust accordingly.
This could be achieved by creating a new waypoint over the station on the map, calculating and then applying the difference, or with some kind of GUI drag-and-drop.
I have no idea how to go about this and wonder if anyone knows of a decent solution, other than persuading Google to stop being pansies....
Of course Google could change their magic misalignment randomly, and then I'm truly screwed, but hey ho.
Well, you could build a small web app that takes your GPX track and overlays it on Google Maps. Then write some code to let the user enter some number of "corrected pairs", where they click on the GPX point and then the corresponding point on Google Maps. Once they have done this for n points, where n is however many you need for the accuracy you want, you can calculate an average errorX and errorY. Then for each GPX point you do X + errorX and Y + errorY, which should be good on average; a rough sketch follows below.
Does that make sense?
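Here is a minimal sketch of that averaged-offset idea in Python (the element handling assumes a GPX 1.1 file; the coordinate pairs and file names are hypothetical, and a constant shift is only a fair approximation over a small area):

    import xml.etree.ElementTree as ET

    GPX_NS = "http://www.topografix.com/GPX/1/1"
    ET.register_namespace("", GPX_NS)  # keep clean tags in the output file

    # Each pair: ((gpx_lat, gpx_lon), (map_lat, map_lon)) as clicked by the user.
    corrected_pairs = [
        ((31.2304, 121.4737), (31.2292, 121.4795)),  # hypothetical station fix
    ]

    err_lat = sum(m[0] - g[0] for g, m in corrected_pairs) / len(corrected_pairs)
    err_lon = sum(m[1] - g[1] for g, m in corrected_pairs) / len(corrected_pairs)

    tree = ET.parse("track.gpx")
    for wpt in tree.getroot().iter("{%s}wpt" % GPX_NS):
        # Shift every waypoint by the average error.
        wpt.set("lat", str(float(wpt.get("lat")) + err_lat))
        wpt.set("lon", str(float(wpt.get("lon")) + err_lon))
    tree.write("track_shifted.gpx")

(Track points, trkpt elements, would get the same treatment.)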
Thanks for the reply, TheSteveO. I'd forgotten about this; in the end I used the rather handy JavaScript library provided here:
http://www.movable-type.co.uk/scripts/latlong.html
to build myself a simple command-line script which loads and realigns all the coordinates based on, as you suggested, the difference between a known point on Google Maps and a waypoint of the same place.
I did attempt to implement it in PHP but unfortunately ran into a slew of floating-point math problems and, being pressed for time, just went the JavaScript route.