Quality degradation with Google TTS service - text-to-speech

I'm using the Google TTS API with Wavenet voices, and for no apparent reason the returned audio has lost quality and sounds distorted. The degradation happens occasionally, with different texts, both longer and shorter.
You can check the audio here.
The SSML used was the following:
<speak><prosody pitch='-3st' rate='105%'>Los Angeles on 2020-02-12 and 2020-02-13</prosody></speak>
Does anyone have any idea why this may be happening?
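For reference, a request along these lines reproduces the setup with the official Node.js client (a minimal sketch; the en-US-Wavenet-D voice name and the LINEAR16/24 kHz output are assumptions, not details taken from the report above):

// Minimal sketch using @google-cloud/text-to-speech.
// Credentials are read from GOOGLE_APPLICATION_CREDENTIALS as usual.
import * as textToSpeech from '@google-cloud/text-to-speech';
import { writeFile } from 'fs/promises';

async function synthesize(): Promise<void> {
  const client = new textToSpeech.TextToSpeechClient();

  const [response] = await client.synthesizeSpeech({
    input: {
      ssml: "<speak><prosody pitch='-3st' rate='105%'>Los Angeles on 2020-02-12 and 2020-02-13</prosody></speak>",
    },
    // The Wavenet voice name and LINEAR16/24 kHz output below are assumptions.
    voice: { languageCode: 'en-US', name: 'en-US-Wavenet-D' },
    audioConfig: { audioEncoding: 'LINEAR16' as const, sampleRateHertz: 24000 },
  });

  // Keep the raw response so good and degraded runs can be compared directly.
  await writeFile('output.wav', response.audioContent as Uint8Array);
}

synthesize().catch(console.error);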

Related

WebRTC: Bad Encoding Performance for Screensharing via CGDisplayStream (h264/vp8/vp9)

I am using the Objective-C framework for WebRTC to build a screen-sharing app. The video is captured using CGDisplayStream. I have a working demo, but at 2580x1080 I get only 3-4 fps. My googAvgEncodeMs is around 100-300 ms (ideally it should be under 10 ms), which explains why the screen sharing is far from fluid (30 fps+). I also switched between codecs (h264/vp8/vp9), but with all of them I get the same slow experience. The contentType in WebRTC is set to screen (values: [screen, realtime]).
The CPU usage of my Mac is then between 80-100%. My guess is that there is some major optimisation (qpMax, hardware acceleration, etc.) in the C++ code of the codecs that I have missed. Unfortunately, my knowledge of codecs is limited.
Also interesting: even when I lower the resolution to 320x240, googAvgEncodeMs is still in the range of 30-60 ms.
I am running this on a 2018 15-inch MacBook Pro. When running a random WebRTC session inside Chrome/Firefox etc., I get smoother results than with the vanilla WebRTC framework.
WebRTC uses software encoding by default, and that is the real culprit; encoding 2580x1080 in software is not going to be practical. Try reducing the horizontal and vertical resolution by half and it will improve performance, with some loss in quality. Also, if you are doing screen sharing and smooth video is not critical, you can drop the frame rate to 10 frames per second. The logical long-term solution is to figure out how to incorporate hardware acceleration.
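The question is about the native Objective-C framework, but as a rough illustration of those two knobs, here is how the same advice looks in the browser flavour of WebRTC using RTCRtpSender parameters (a sketch; `pc` is assumed to be the peer connection carrying the screen track, and support for these fields varies by browser):

// Sketch: halve the encoded resolution and cap the frame rate of the
// screen-share sender. `pc` is assumed to already exist.
async function throttleScreenShare(pc: RTCPeerConnection): Promise<void> {
  const sender = pc.getSenders().find((s) => s.track?.kind === 'video');
  if (!sender) return;

  const params = sender.getParameters();
  if (!params.encodings || params.encodings.length === 0) return;

  params.encodings[0].scaleResolutionDownBy = 2; // e.g. 2580x1080 -> 1290x540
  params.encodings[0].maxFramerate = 10;         // screen content, fluidity not critical

  await sender.setParameters(params);
}

The same two reductions apply regardless of the capture path: the encoder simply receives fewer, smaller frames.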

WebRTC : Video black screen bitrate

Is the bit rate of the black screen shown when video is muted the same as the original video's bit rate, or is it significantly less because it is just a black screen?
It is significantly less, since there is essentially no video information being sent to the remote party. How much less depends on a lot of factors (connection quality, etc.).
I just did a quick test and the outgoing bit rate at 640x480 @ 27 fps was around 900 kbps to 1 Mbps. Disabling the video track of the stream resulted in an outgoing bit rate of 30 kbps.
Please keep in mind that this was only a simple test. You can get this kind of information yourself by evaluating the reports in peerConnection.getStats().
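As a rough sketch of how to pull the outgoing video bitrate out of those reports (assuming `pc` is your RTCPeerConnection):

// Estimate the outgoing video bitrate by sampling outbound-rtp stats twice
// and diffing bytesSent over the interval.
async function outgoingVideoBitrateKbps(pc: RTCPeerConnection, intervalMs = 1000): Promise<number> {
  const sampleBytesSent = async () => {
    let bytes = 0;
    (await pc.getStats()).forEach((report) => {
      if (report.type === 'outbound-rtp' && report.kind === 'video') {
        bytes += report.bytesSent ?? 0;
      }
    });
    return bytes;
  };

  const before = await sampleBytesSent();
  await new Promise((resolve) => setTimeout(resolve, intervalMs));
  const after = await sampleBytesSent();

  return ((after - before) * 8) / intervalMs; // bits per millisecond == kbps
}

Calling outgoingVideoBitrateKbps(pc) before and after disabling the video track should show a drop of the same order as the numbers above.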
Some documentation and libraries for getStats:
https://www.w3.org/TR/webrtc-stats
https://github.com/muaz-khan/getStats
https://www.callstats.io/2015/07/06/basics-webrtc-getstats-api/
I came across chrome://webrtc-internals, which has built-in tracking for bit rates and other good features.
As seen in its graphs, the bit rate before the video was muted was ~150 kbps, which drops to ~30 kbps on muting the video.

Core Audio CAPlayThrough example with interface and guitar quality

I'm playing around with the Core Audio CAPlayThrough example provided by Apple. I'm not doing anything fancy, just attempting to get my guitar to pass through an audio interface (M-Audio Fast Track Pro) to my computer and then back out through the interface into my headphones. I'm getting some audio to pass through, but the quality is terrible. I have the sample rate set to 48 kHz on both the input and output. Is there something I'm missing? I suspect it may be an issue with the bit rate, but I'm not sure how to change that. Any guesses as to what may be causing this quality issue?

OCR (reading text from photos) in Cocoa?

Is there any code out there that I can use in Cocoa to recognize text from photos? Let's say I snap a photo with my iPhone of a page of a book; I'd like to capture the text in it.
There is the Tesseract OCR toolkit, an open-source OCR engine currently maintained by Google. "Olipion" created a cross-compilation tutorial for getting it running on the iPhone. I would say that this is a good place to start.
However, there are reasons why you might not want to do OCR on the phone even if you could. Some of these include:
Even the new iPhone 4's processor is not that fast, and since your app can't really run in the background doing the processing, the user experience might not be optimal.
Running OCR on a mobile device would probably be a killer for battery life.
Every time you wanted to update the OCR engine, everybody who installed your app would have to upgrade.
For an always-connected mobile device, running the OCR on a server somewhere would probably be better: you could upgrade your OCR software easily, run much more powerful algorithms than a mobile device could handle, and so on.
I am not so sure that you would be able to get good results from photos taken using a mobile camera -- accuracy of OCR systems goes way down with the kind of poorly lit, noisy, distorted images likely to be captured using a phone camera.
As far as commercial products go, there is Evernote, which gives you OCR capability if you buy their premium service.
As an alternative to machine OCR, there is always Mechanical Turk, where you could pay people a small amount to do the OCR for you; they would probably do better at transcription given the image source.

Apps using Google Wave

I just watched the Google Wave keynote video from Google I/O, and I must say I was very impressed with pretty much everything mentioned in the video; the possibilities with Google Wave are enormous.
I'd like to ask if there are any projects using Google Wave already in beta (a usable stage), and I would also like to know when Google Wave is supposed to be available for the rest of us who didn't attend Google I/O.
As great as the technology is, it is safe to say it will only be used to find more inventive ways for us to:
Not socialize in real-life
Make communications that would be ill-advised in real-life
Buy things we haven't seen in real life
Unlearn things that are useful in real life (like spelling)
Joking aside, you can sign up for the sandbox (as I have) and play around with apps and robots and whatever. You can sign up here for the developer preview and have a look at what is going on!
You could also run your own wave setup using the information here and experiment with the down-and-dirty!
You can request developer access to the wave sandbox at: https://services.google.com/fb/forms/wavesignupfordev/
It might take a few weeks.