Is there a way of using the <prosody> tag in SSML to adjust individual words without a pause (without using a post-processor) - text-to-speech

When using the prosody tag in SSML with Google Cloud TTS, I cannot adjust the attributes of individual words without creating an unwanted pause.
The code below creates a lag between 'New' and 'Video'. It has been suggested that a postprocessor can remove these pauses, but I'd like to know if there's a way of doing it directly within the code itself?
<speak>
Hello, and welcome to this<prosody pitch="+3st">New</prosody>Video Tutorial.
</speak>

After testing, it appears there isn't a way of doing this using Google Cloud TTS. You can manually edit the sound file after generating it, but that defeats the object of the exercise.

I don't have the cleanest answer, as what you are asking for is not well supported. Prosody's pitch contour lets you change the tone of voice at different parts of the sentence.
Example of Prosody contour
<speak><prosody contour="(0%, +20Hz) (20%, +30%) (100%, +20%)"> Hello friends! </prosody></speak>
I am still playing around with this, but it seems like a tedious way of getting what you want done.
Using contour
contour takes a string of tuples: "(position in sentence as a percentage, pitch adjustment) (..., ...)"
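To avoid hand-writing those tuples, here is a minimal Python sketch that builds the contour string from a list of (position, pitch change) pairs. The function name contour_ssml is made up for illustration, and the sketch only assembles the markup; it does not check whether a given TTS engine actually honours contour.

```python
from xml.sax.saxutils import escape, quoteattr


def contour_ssml(text, points):
    """Wrap text in <prosody contour="...">.

    points is a list of (position_percent, pitch_change) pairs,
    e.g. (20, "+30%"). Hypothetical helper, not part of any TTS SDK.
    """
    contour = " ".join(f"({pos}%,{change})" for pos, change in points)
    # quoteattr adds the surrounding quotes and escapes the attribute value
    return (
        f"<speak><prosody contour={quoteattr(contour)}>"
        f"{escape(text)}</prosody></speak>"
    )


print(contour_ssml("Hello friends!", [(0, "+20Hz"), (20, "+30%"), (100, "+20%")]))
```

The resulting string can be passed as the SSML input of whatever synthesis call you already use.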
I hope this helped and best of luck on your work!

Related

Intellij IDEA SDK - How can I programmatically handle spellcheck 'typos'?

Wrote a plugin to handle some custom format stuff in yaml files that I've written for a huge project. It's a chat bot that can respond in a huge number of ways. There is a lot of slang and non-standard words in the yaml.
I don't want to disable spellchecking as I want to fix legitimate spelling errors. But the annotations under the "misspelled" slang words are conflicting with the annotations in my plugin, and causing issues.
One yaml file has 349 "typos". 10% or so are legit. The rest are slang and custom words.
I need to do one of two things. Either add those words to the dictionary (I've found the method to do that - SpellCheckerManager.getInstance(project).acceptWordAsCorrect()) OR get a list of the words and create a custom dictionary from them. Both approaches require me to grab a list of all typos in the document/editor/project.
That's the part I can't find. Looked everywhere. (List of current Annotations? List of current Problems?) Googled my fingers off. Anyone able to point me in the right direction?
This is not the IDEAL solution, but it worked for my means, and I'm leaving the answer in case this is googled.
In DaemonCodeAnalyzerImpl, there is a method:
DaemonCodeAnalyzerImpl.getHighlights(Document document, HighlightSeverity minSeverity, Project project);
This returns a list of all highlights in the document. The method is annotated with @TestOnly, and the docs state that it should only be used in test code because it bypasses the normal way of accessing highlights. It still works in non-test code, however.
Since the only thing I wanted was the strings of the typos, I pulled the list, then looped through the HighlightInfo objects in it and called .getText() on each one.
No danger of screwing anything up.
Then pushed all those strings into:
SpellCheckerManager.getInstance(project).acceptWordAsCorrect(word, project);
Voilà! All currently highlighted typos are now added to the dictionary.
Proper solution? No. Good enough for what I needed to accomplish? Yup.

Unwanted pauses when using <prosody> tag in SSML for TTS

I am writing and marking up spoken utterances for a VUI tool. We are using Google Cloud WaveNet for our TTS service, and I have been trying to use SSML to make the TTS output more natural. When I add the "prosody" tag, the TTS output adds a pause before the start of the tag, as in the example below:
<speak>
Rebecca is allergic to <prosody rate="slow" range="high">soybean oil.</prosody> Would you like to cancel this order?
</speak>
In this example, the TTS output pauses between "to" and "soybean oil". This is just a silly example sentence, but in our real product we need to use this kind of tag to provide emphasis and differentiation between complex words.
Has anyone else experienced this issue? Any tips?
It looks like range isn't part of the Google Cloud TTS SSML spec. It is part of Microsoft's spec though, so maybe that's what you were thinking of.
If you're still trying to get rid of a gap like that, you could theoretically use the <seq> tag to get the segments to slightly overlap, but that seems like it'd be super difficult.

Can someone explain what a filter chain is in GPUImage in simple words?

I realize GPUImage has been well documented and there's a lot of instructions on how to use it on the main github page. However, it fails to explain what a filter chain is - what's addTarget? What's missing is a simple enough diagram showing what needs to be added to what. Is it always GPUImageView (source?) -> add target -> [filter]? I'm sorry if this sounds daft, but I fail to follow the correct sequence given there are so many ways of using it. To me, it sounds like you're connecting it the other way round (such as saying: Connect the socket to the TV). Why not add filter to the source? I'm trying to use it but I get lost in all the addTargets. Thanks!
You can think of it as a series of inputs and outputs. Look in the GPUImage framework project to see which are inputs (typically filters) and which are outputs (imageview, moviewriter, etc.). Every target affects the next target in the chain.
Example:
GPUImageMovie -> GPUImageSepiaFilter -> GPUImageMovieWriter
The movie is sent to the sepia filter, which does its job. The movie with the sepia filter applied is then sent to the movie writer, which exports it.
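If it helps, the direction of addTarget: can be modelled in a few lines of Python. This is only a toy sketch of the idea, not GPUImage code: the Node and Sink classes and the add_target method are invented for illustration. The point is that each node pushes its output to its targets, so you call it on the upstream node and pass the downstream one.

```python
class Node:
    """Toy stand-in for a source or filter: transforms a frame, then forwards it."""

    def __init__(self, name, transform=lambda frame: frame):
        self.name = name
        self.transform = transform
        self.targets = []

    def add_target(self, target):
        # "Send my output to this node": called on the upstream node.
        self.targets.append(target)

    def process(self, frame):
        out = self.transform(frame)
        for target in self.targets:
            target.process(out)


class Sink(Node):
    """Toy stand-in for an output (view, movie writer): just collects frames."""

    def __init__(self, name):
        super().__init__(name)
        self.received = []

    def process(self, frame):
        self.received.append(frame)


movie = Node("movie")
sepia = Node("sepia", lambda frame: f"sepia({frame})")
writer = Sink("writer")

movie.add_target(sepia)   # movie -> sepia
sepia.add_target(writer)  # sepia -> writer

movie.process("frame1")
print(writer.received)
```

Read each add_target line as "my output goes there", which is why the source comes first and the view or writer comes last.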
To help visualize what's going on, any node editor program typically uses this scheme. Think of calling addTarget: as one of the connections in the attached image.
A Google image search for "Node Editor" will give you plenty of other images to help picture what adding targets does.

How to implement an NSTextView that performs on-the-fly markup to RTF conversions

I'm trying to build an NSTextView that can take "marked up" input that is automatically translated into beautiful RTF-style text while the user types.
The idea is to let the user enter text in "plain text" format, but to "beautify" it on the spot, e.g.
H1 A quick list:
* first item
* second item
would be translated into a first line with a header font, followed by a bulleted list.
I have found plenty of potential ways of doing this, but the Text System is incredibly complicated (with reason) and I don't want to start "cooking my own" if there is already something suitable built-in. BTW I would be happy with a Snow Leopard only API.
The first thing I thought of was "data detectors", but I can't find a public API for doing this.
Having reached the end of the road with that, I turned to the new "Text Input Sources API". This does all kinds of things, but the "data-driven input methods" section of the WWDC 2006 presentation "Take Charge of the Text Input" seems interesting in my context. Beyond that single presentation slide however nothing seems to exist anywhere, so it's a bit of a dead end again.
Finally, I had a look at the NSSpellChecker class, which is also supposed to offer completion features and automatic corrections, but I'm not sure how this could be re-purposed for my requirements either.
At the moment, I'm tempted to just re-parse the entire NSTextStorage manually and make the changes myself when the user stops typing, but I'm sure there are cleverer heads around this forum.
Any advice or pointers in the right direction would be greatly appreciated.
Neither data detectors nor the spell checker is appropriate for this task. Assuming you're just looking for a way to pass the input to a parser/formatter you already have, interfacing with the text system isn't too difficult. You're on the right track with handling edits to NSTextStorage.
Along those lines, there's no need to re-parse the entire thing when the user stops. The text system sends you the modified range and gives you the opportunity to act on those changes (and even reject them out of hand). Since all changes funnel through this (typing, pasting, dropping...), this is the point where you want to intercede.
Because you're dealing with headings and bulleted lists, I'd get the enclosing paragraph of the modified range. This gives you a nice, round unit of work that is easily discovered and perfectly fits what you're trying to accomplish.
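To make the "enclosing paragraph" step concrete, here is a language-neutral sketch in Python of the range arithmetic (in Cocoa, NSString's paragraphRangeForRange: does this part for you). The function names and the "H1 " / "* " markers are assumptions taken from the example in the question, not a real markup spec.

```python
def paragraph_range(text, location, length):
    """Return (start, end) of the paragraph(s) enclosing the edited range.

    end is exclusive and includes the trailing newline, if there is one.
    """
    # Start just after the last newline before the edit (or at 0).
    start = text.rfind("\n", 0, location) + 1
    # Look for the newline that ends the paragraph holding the last edited char.
    search_from = location + max(length - 1, 0)
    newline = text.find("\n", search_from)
    end = len(text) if newline == -1 else newline + 1
    return start, end


def classify(paragraph):
    """Decide which style a paragraph should get (hypothetical markers)."""
    if paragraph.startswith("H1 "):
        return "heading"
    if paragraph.startswith("* "):
        return "bullet"
    return "body"


text = "H1 A quick list:\n* first item\n* second item"
edit_location = text.index("first")  # pretend the user just typed "first"
start, end = paragraph_range(text, edit_location, len("first"))
print(repr(text[start:end]), classify(text[start:end]))
```

Only the slice text[start:end] needs to be re-parsed and re-styled after each edit, rather than the whole storage.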
Good luck!

Hexadecimal numpad

The project I am currently working on requires a lot of hexadecimal numbers to be entered into the code.
I once saw a pic of an old keyboard with a hexadecimal numpad (has A-F letters on it also) replacing the normal numpad. Anyone know where I can get one of these?
The IPv6 Buddy keypad should work well for hexadecimal input.
http://www.ipv6buddy.com/
If you can get your hands on one of the retired space shuttles, they have one!
I have an old Heathkit learning toy with a hex numpad because the only way to program it was to assemble code by hand (it came with a 6800 manual and some notepads) into the online monitor. This was actually fun!
Mine is missing the 'D' button however.
Great idea with the programmable keypad. I think I am going to pick up one of these: the DX1 input system. It works for any reconfiguring I might want to do.
Is this the one you're talking about?
http://www.cpmuseum.com/Exhibits/Apple%20Lane/7603/7603-0005/images/000%20Front%20View.jpg
While this has a lot of "gee whiz" appeal, I have to say:
You have two hands. Use them. A-F are all reachable with the left hand on a standard keyboard while your right hand is on the num-pad. Instead of putting muscle-memory time into some arcane Hex-pad, you'll be learning to touch-type with your left hand, which has application outside your current project.
Better yet, come up with a smarter way of getting the hex codes into your code. Write a script that extracts them from your data-source and into your code as symbolic variables... or whatever.
EDIT
OK, I'll give you the benefit of the doubt. Let's assume you're working on a hardware project and need to provide a specialized interface for your user. Maybe a programmable keypad would fit the bill?
Not sure of the specifics right now, but I'm pretty sure you can easily write a keyboard remapper. You could remap the QWASDF keys to ABCDEF in order to type them more quickly. That way you could use two hands to type. Or, if you are in control of the program they are being typed into, you could just translate the keys in code on the fly. You also might want to try out the Microsoft Keyboard Layout Creator.
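For the "translate the keys in code" option, a rough Python sketch might look like the following. The mapping (Q, W, A, S, D, F standing in for A to F) follows the suggestion above; the function name is made up, and how you hook this into your input handling is up to your program.

```python
# Map the left-hand cluster Q W A S D F onto the hex digits A-F.
# Each typed character is translated exactly once, so A -> C does not
# interfere with Q -> A.
REMAP = str.maketrans("QWASDFqwasdf", "ABCDEFABCDEF")


def remap_hex_keys(typed):
    """Translate remapped keystrokes into a normal hex string."""
    return typed.translate(REMAP)


typed = "1q2f"                   # digits from the numpad, letters from the left hand
hex_text = remap_hex_keys(typed)
print(hex_text, int(hex_text, 16))
```

Digits pass through unchanged, so the right hand can stay on the numpad.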