NSSpeechRecognizer with wildcard command elements - objective-c

The docs for NSSpeechRecognizer state that complex, multi-step actions can be executed from single spoken commands, such as:
“schedule a meeting with Adam and John tomorrow at ten o’clock.”
I'm able to execute simple commands which are preprogrammed, but I don't see how the above could be interpreted using the class. It seems like
“schedule a * with * *”
should be a command. Any idea if something like this is possible? Or are we just supposed to pass an infinite number of possible commands to the recognizer?

It does not appear to me from the NSSpeechRecognizer documentation that it will support using complex phrases like the example you have given. To get the semantic meaning from a phrase like this you would use a system that supports multislot grammars like most IVR systems that support the VoiceXML standard. It looks to me that this speech recognition API only supports passing in simple commands as an array and not specifying complex grammar rules. With this type of system you would have to implement what is called a directed dialog, which might go something like this:
C: What would you like to do?
U: Schedule a meeting.
C: Tell me the first person that you would like to attend?
U: Adam.
C: Tell me the next person that will attend or say "done" if the attendee list is complete.
U: John.
C: Tell me the next person that will attend or say "done" if the attendee list is complete.
U: Done.
C: What day is this meeting?
U: Tomorrow.
C: What time is the meeting?
U: Ten O'Clock.
C: Thank you. Your meeting has been scheduled.
Using a directed dialog, you can restrict the expected commands/utterances at each step to a much smaller, well-defined list. The list of possible names could still be quite large, though, unless you cull it from the user's contact list.
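As a rough illustration of the directed dialog above, here is a minimal Python sketch of the state machine behind it. It is not NSSpeechRecognizer code: `run_dialog`, the state names, and the small `DAYS` and `TIMES` vocabularies are all hypothetical, and the "recognizer" is simply a list of already-recognized utterances. The point is that each state only has to listen for a short list of phrases.

```python
# Hypothetical per-state vocabularies; a real app would register the
# current state's list with the recognizer as its command set.
DAYS = {"today", "tomorrow"}
TIMES = {"nine o'clock", "ten o'clock", "eleven o'clock"}


def run_dialog(utterances, known_names):
    """Walk the fixed dialog; return the filled slots, or None if unfinished."""
    slots = {"action": None, "attendees": [], "day": None, "time": None}
    state = "action"
    for said in (u.strip().lower() for u in utterances):
        if state == "action":
            if said == "schedule a meeting":
                slots["action"] = "schedule_meeting"
                state = "attendee"
        elif state == "attendee":
            if said == "done" and slots["attendees"]:
                state = "day"
            elif said in known_names:
                slots["attendees"].append(said)
        elif state == "day":
            if said in DAYS:
                slots["day"] = said
                state = "time"
        elif state == "time":
            if said in TIMES:
                slots["time"] = said
                state = "finished"
        # Anything not in the current state's vocabulary is ignored,
        # which corresponds to repeating the prompt.
    return slots if state == "finished" else None


result = run_dialog(
    ["Schedule a meeting", "Adam", "John", "Done", "Tomorrow", "Ten o'clock"],
    known_names={"adam", "john"},
)
```

Here `known_names` stands in for the names culled from the user's contact list.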

My interpretation of the docs is that you would need to accumulate and operate on any compound state on your own. You provide NSSpeechRecognizer with a set of discrete words/phrases that it should recognize as 'commands', and it reports to you when it has recognized them.
For the example you've given, I think you'll run into problems when you get to the "Adam and John" part -- it's not an arbitrary dictation engine. But, for fun, let's try to imagine how we might do this:
You might tell it you want to recognize the following phrases as 'commands':
"schedule a"
"meeting" (and perhaps "appointment", "playdate", etc)
"with"
"Adam and John"
"tomorrow" (and probably other related things like "today", "two days from now", all the days of the week, etc)
"ten o'clock"
As words/phrases are recognized, you could create a stack of semantically related words/phrases based on previously recognized words/phrases. So, for instance, it recognizes the "schedule a" phrase, and you know that there should be more info coming to fill out the semantic context, so you push that phrase onto the stack. Next, it recognizes "meeting". Your app says 'sure, a meeting is something that can be scheduled' and pushes it onto the stack as well. If the next word it recognized wasn't germane to the previously-recognized "schedule a" command, then it would clear the stack. If, at any point, the elements on the stack satisfy some pre-defined criteria for a fully formed expression of semantic intent, then your app can take the appropriate action based on that intent. There's obviously a temporal element to this as well. If the next thing required to establish semantic context doesn't arrive in a reasonable amount of time, the semantic context stack should get cleared.
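To make the idea concrete, here is a small Python sketch of that stack, under some simplifying assumptions: the expected phrases form one fixed sequence (`EXPECTED`), the timeout is an arbitrary 3 seconds, and the caller passes in the timestamp of each recognition. `IntentStack` and `feed` are hypothetical names, and nothing here talks to NSSpeechRecognizer; `feed` is what you would call from the recognizer's "did recognize command" callback.

```python
# Each position in the expression accepts one of a small set of phrases.
EXPECTED = [
    {"schedule a"},
    {"meeting", "appointment"},
    {"with"},
    {"adam and john"},
    {"today", "tomorrow"},
    {"ten o'clock"},
]


class IntentStack:
    def __init__(self, timeout=3.0):
        self.timeout = timeout    # seconds allowed between phrases (assumed)
        self.stack = []
        self.last_time = None

    def feed(self, phrase, now):
        """Push a recognized phrase; return the full expression when complete."""
        # Temporal element: too long since the last phrase clears the context.
        if self.last_time is not None and now - self.last_time > self.timeout:
            self.stack.clear()
        self.last_time = now

        if phrase in EXPECTED[len(self.stack)]:
            self.stack.append(phrase)
        else:
            # Not germane to what came before: clear, but let the phrase
            # start a new expression if it is a valid opener.
            self.stack.clear()
            if phrase in EXPECTED[0]:
                self.stack.append(phrase)

        if len(self.stack) == len(EXPECTED):
            done, self.stack = self.stack, []
            return done
        return None


recognizer_events = ["schedule a", "meeting", "with", "adam and john",
                     "tomorrow", "ten o'clock"]
intent = IntentStack()
results = [intent.feed(p, now=float(i)) for i, p in enumerate(recognizer_events)]
```

Only the last call returns the completed list of phrases; every earlier call returns `None` while the context is still being accumulated.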
A similar system, conceptually, is the iOS/MacOS touch/trackpad gesture recognition system. When a tap touch happens, the OS has to recognize the single tap, and acknowledge the possibility that that is the entire user intent, but it also has to manage the possibility that it might receive another tap very shortly, turning the single tap into a double tap. It will have to accumulate this state over time, and infer the user intent by looking at the combination of discrete events.
You're not going to get such functionality from NSSpeechRecognizer for free, and being that it's not a dictation engine, you also won't get arbitrary 'tokens' from it (like "Adam and John", assuming you're not registering some giant list of names all as potential commands.) Even so, that doesn't mean it couldn't be leveraged to do some pretty neat stuff using a mechanism like I described. It's just that you're gonna have to write it yourself.
Good luck!

Related

Unreal Engine 4 Blueprints - how to set branch condition on get actor of class

I have been working on a simple game in Unreal Engine 4. I am trying to make it so when a player is hit by a cube they take damage. However, I am stuck on creating a condition. I have previously used:
[Blueprint screenshot not preserved]
to set up a condition where a player only takes damage if they are touched by the cube (in my cube pawn blueprint).
This doesn't work, however, when trying to set up my health bar:
[Blueprint screenshot not preserved]
This shows that I am now using entirely new variables to attempt to get a successfully updating bar.
Unless I set it up so that the player takes damage only when a cube hits them, the player will take damage from simply jumping or walking into other surfaces.
I have created a function that successfully updates my current health and max health, so I don't need to show, or need help with, the maths or updating the widget. Is there a way for me to use the branch to create an if statement that checks that the contact is from a cube?
I am quite new to blueprints and have mostly developed through the use of tutorials. If you need clarity on my question or you don't understand what I am asking please leave a comment and I will try to update. I have looked long and hard for an answer, but I have found that Unreal Engine 4 hasn't got many questions that I can tailor the answer to my situation. If the answer is already in another post on this website, comment saying so and I will remove this post.
Thanks for any help you can give me :)
(This also has an itch.io page for me to quickly share with my friends, so I will also credit the person who helped me there.)
If I've understood your question correctly, I believe you are just asking how to check if the cube has hit the player character and not some other actor.
Instead of using an if statement as you suggested, you can just cast the Other Actor property to your first person character. If the cast is successful, the cube hit the character; if the cast fails, it hit something else. On a successful cast you can then call the damage function which you said you've already created. Below is a basic example you could use in your cube blueprint. You will also need to make sure you have a collision box surrounding your cube mesh (and your character, but I can already see that in your screenshot).
[Blueprint screenshot not preserved]
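For readers more used to text code than Blueprint graphs, the cast-then-damage logic can be written as a loose Python analogy. This is not Unreal Engine code: `Actor`, `PlayerCharacter`, `Wall`, `take_damage` and `on_cube_hit` are hypothetical stand-ins, and `isinstance` plays the role of the Cast node with its success and failure branches.

```python
class Actor:
    """Stand-in for any actor the cube can collide with."""


class PlayerCharacter(Actor):
    def __init__(self, health=100):
        self.health = health

    def take_damage(self, amount):
        # Stand-in for the health-updating function the asker already has.
        self.health = max(0, self.health - amount)


class Wall(Actor):
    pass


def on_cube_hit(other_actor, damage=10):
    """Apply damage only if the thing that was hit is the player."""
    if isinstance(other_actor, PlayerCharacter):   # "cast succeeded"
        other_actor.take_damage(damage)
        return True
    return False                                   # "cast failed": ignore


player = PlayerCharacter()
hit_player = on_cube_hit(player)
hit_wall = on_cube_hit(Wall())
```

The type check replaces a separate Branch node: there is no boolean to compute, because the success or failure of the cast is already the condition.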

Is this valid BPMN?

I have an XOR gate connected to 4 activities. Each activity is then connected to the same subprocess. Not looking for answers or solutions - just general advice related to BPMN modelling.
My issue is that I think this design pattern could be modelled better. I'm also not sure if it is valid. Does anyone have any pointers in the right direction?
I'm essentially trying to model "for each component, do inspection" (the inspection is the same set of steps, applied to a different component).
If you are really concerned about validity, you can use a configurable BPMN linter such as https://github.com/bpmn-io/bpmnlint (I have no affiliation with this project).
With such a tool you can statically check against the common design errors.
Regarding your second question, whether "this is valid", I would say: almost.
Diagram 1: The closing XOR gateway should be placed before the "Inspect component" activity. The four arrows of the individual "Locate component xxx" activities should all merge into this gateway, thereby mirroring the initial XOR gateway whence the four arrows leave. From that merging/closing XOR gateway, you should draw a single arrow towards "Inspect component".
Diagram 2: You should equally place a closing XOR gateway at the end of your process, just before the end event, in order to merge the arrows from the "No" branch of the third XOR gateway and the "Record other abnormalities" activity. Otherwise, your process will never finish if you carry out "Record other abnormalities", because there is no outgoing arrow from that activity.
While the points above are objective because they relate to the BPMN syntax, the answer to your first question, whether "this design pattern could be modeled better", is based on opinion. Here is what I would try to improve:
In diagram 1, the question "Component successfully inspected?" will never be answered with "Yes". What happens when all components have indeed been inspected? Your process would be stuck.
I have a feeling you do not need the individual "Locate component x" activities at this level. In the diagram, the location of each component does not change the course of subsequent activities. I would rather see this activity as the first task in the "Inspect component" sub-process. But if you insist on modeling it this way, then I would turn the exclusive gateway into an inclusive one and add an arrow that goes straight to the end event. That way, multiple inspections or none can happen and you will not get stuck.
Considering the two previous points, you can represent a "For each" loop with a sequential activity as shown below. The three horizontal lines signify that you repeat the sub-process sequentially (as opposed to in parallel). I would see this process itself as a sub-process of a wider process such as "Maintain vehicle".

Is there a way to escape a keyword in a Gherkin feature or scenario description?

In Gherkin, you can have free-form text that describes a scenario, a feature, etc. These descriptions are not used by, say, a test runner, but are for you to describe important additional information to another human.
The documentation for Gherkin says that these cannot start a line with one of the other keywords, such as Given, When or Then. Yet, sometimes the best description I could give would be to start with one of these keywords.
I'm sort of making this up as I go here, but here is an example of what I wish I could do:
Scenario: Many notifications at the same time get combined
  When we have a lot of notifications being posted at once, it causes problems
  for humans. They can't make sense of that much new information all at once.
  So if we are ever in a situation where we are posting lots of
  notifications in a short time period, we will take the one with the highest
  severity and show it with the other notifications as "child" notifications,
  accessible via a link that says, "And N other issues."

  Given a notification posted today at 11:03:25
  And a notification posted today at 11:03:26
  And a notification posted today at 11:03:26
  And a notification posted today at 11:03:27
  When a notification is posted at 11:03:28
  Then the notification list will contain 1 notification
  And that notification should contain 4 child notifications
The problem I have is that because my description starts with a When, the tools assume that I've started my specific steps, and blow up on the next line, which doesn't start with a keyword.
I've considered:
Commenting out the first line or the entire description (that seems more consistent to me) but to me, there is a semantic difference between a comment with # and a description.
Rewording the thing to not start with a "When". For example, if it started with, "In times where we have a lot of notifications..." but that's less readable, which is the point with Gherkin-style specifications.
If it wasn't the first word in the whole description, I might be able to get away with simply wrapping my lines differently so that the "When" starts in the middle of a line instead of the beginning, but in this case, I don't have that option.
Those options just seem like workarounds that feel sub-optimal.
Is there a way to "escape" these keywords to tell the system that some usage of "When" is really still just part of the description and not a keyword? If not, is there some sort of accepted best practice or guideline for how people should handle situations like this?
You could use # in the beginning of the line (it's used for writing comments).
Ex:
# When a notification is posted...
You are misinterpreting the language spec. You can describe a feature and use keywords at the beginning of a line. The example you posted gets interpreted as steps in a scenario because the description appears after the Scenario keyword.
Just as Mr Cas said in his answer, you need comments.
Feature: Given a feature title
  When I use keywords up here
  Then it is allowed

  Scenario: When I use keywords after the title to describe a scenario
    # Then I need to use comments

Naming conventions for intents, events and contexts

I am wondering if there are any agreed upon naming conventions for intents, events and contexts in Dialogflow.
If there are none, then I would appreciate if you shared your own naming conventions!
I find that it's a bit of a contradiction to say 'it doesn’t really matter as long as it‘s easy to understand for others'. If there were naming conventions, it would be much easier for someone to understand a new Dialogflow bot.
Here's my take:
Intents
I use dots to group intents and imply a hierarchy. The first part of the intent name is ideally just one word that clearly indicates the main subject of the intent. For example
name would be an intent that receives a user's name as an input. name.confirm would be the follow-up intent that receives confirmation of the name. name.confirm.yes would be the intent where the user has given confirmation.
This is in the context of a bot which is gathering contact data so the input function is implied. In a more mixed-type chatbot, you could add the type of intent as a first word to categorize your intents better. E.g. input.name.confirm.yes or FAQ.shipping.overseas or smalltalk.agent.location ('Where are you?').
I use the same approach for fallback intents: fallback.name would be the fallback intent that is triggered when the bot is waiting for the user to input their name but doesn't understand the answer.
Contexts
For contexts I use snake case. For example awaiting_email would be the context that is set when the bot is waiting for the user to input their email address. After I have the email address I would set a context email to carry forward the information so that other intents can use it as a context. Or if I'm collecting several pieces of data about the user, I will set the context user and other intents can access certain parameters e.g. via user.email.
I made a video about the topic as well: https://youtu.be/kgKuS2RJcy4
It's obvious that everyone is coming from a slightly different angle because their area of application is different. I'm sure we'll get to a common standard eventually!
I am going to answer this from the perspective of a service-based company's project.
In my project, we have used a naming convention for intents similar to that of the built-in small-talk intents, because it's easy to understand and categorize, e.g. FAQ.Company.your_question, Buy.Drinks.coffee etc.
(For some unknown reason we capitalize the first letter of the main categories of intents; in small-talk all letters are lower case, as they should be.)
For events we have used the notation commonly used for constants, like INVOKE_EVENT.
For parameters and contexts, we used snake_case, e.g. coffee_cost.
Basically, it doesn't really matter as long as it's easy to understand and replicate. But you should always have a basic structure which you and your whole team follows throughout the project.
There aren't any, unfortunately, and the system is flexible enough that it doesn't matter too much. Pick names that make sense (duh).
Although most of the examples use it, I avoid using a space in the name. I treat them more like function names, so having a space in it breaks my aesthetics.
I tend to group Intents based around what part of the conversation they're working on, which is managed through the use of contexts that are set, and separate the part and subpart designations by dots, so it vaguely looks like package designations. I'll have Intents named something like
calculate.fallback
calculate.number
calculate.operation
fallback
welcome
Where the "calculate" ones all have an Input Context of "calculate".
Most of all, remember that Intents (and thus their names) represent what the user says and not what your code does with that. This is the big way that it differs from a function name.
Honestly, it really doesn't matter! As long as it's easy to replicate in code and clear to see/understand for anyone else that might be working on your agent, then anything is fine. Generally, though, using typical coding notation such as CamelCase is probably not a bad idea.

Displaying Korean Characters - iOS App

I am trying to display Korean text in my iPhone app. The app appends the Unicode of letters one by one to an NSMutableString and displays the string on the screen after each letter is appended.
I understand that there are some rules for conjoining letters (Jamo).
Is there a function for automatically applying all these rules to a string of letters or do I need to write code to make changes (e.g., changing a consonant to a tail consonant if there is a vowel before it)?
FCA. It's you who sent email to me, right? Because the more detailed question is here, I will try (my best) to answer here instead of replying to your email.
By reading the whole text you and people wrote here, I figured out that you are making a Korean handwriting recognition software. So, you would not enjoy the luxury of the Korean input method provided by Apple.
There are two things for me to say. Let's go one by one. (I believe you are already aware of one of the two things I'm going to explain.)
How to compose Hangul text.
So, by reading your inquiry, it should not be about Unicode composed/decomposed Korean strings (or just a series of Ja (consonants) and Mo (vowels)). Your question looks to be about "how to determine if a consonant a user writes is the last consonant of the current syllable (your term is tail consonant, right?) or the beginning consonant of the next syllable".
Best thing is to learn Korean, but let me briefly explain it.
Let's say you write 소방차 (a Fire dept. car.)
You are to write : ㅅㅗㅂㅏㅇㅊㅏ
(Again I'm not talking about the decomposed form of Unicode. It's about how people write Korean text.)
When you type ㅗ (which is the 2nd char), the display system temporarily displays 소 by attaching the ㅗ to its preceding ㅅ, and it looks the result up in a table of Korean syllables. (Although Hangul is assembled JoHap style (조합형), which is called composite style, there are also tables of allowed Korean syllables defined in Korean standards, called Wansung style (완성형). So, you test the "assembled" syllable against the table to see if there is such a syllable.) You will find "소" in the table. So, you will display "소".
Now the next char, "ㅂ" is written. Then here it becomes a little complicated. Because there is a syllable "솝" in the table, first it will attach ㅂ to the preceding syllable. So, it will display "솝". However, things are not determined yet completely. A user writes the next char, "ㅏ". It's pretty sure that there is no syllable without first/beginning consonant (Ja). It will look up the table, but fail to find a syllable "ㅏ".
So, it will guess that the ㅂ attached to the previous syllable actually belongs to the 2nd syllable, and it should display "소바". Now, ㅇ is typed. Then it tries to attach the ㅇ to the second syllable. So it displays 소방. (At this moment it can also look up 방 in the table, and it is found.)
Now, "ㅊ" is typed. Probably internally it can test 소방ㅊ, where ㅇ and ㅊ both sit under 바 as a double final consonant, the way ㄹ and ㄱ do in 밝. (I can't write it, because there is no such syllable with ㅇ and ㅊ together under 바.) Since there is no such syllable, it instantly determines that ㅊ belongs to the next syllable.
Then "ㅏ" is typed. It will assemble ㅊ and ㅏ to make 차. When you press the space key or return key or any other white space key, it will finish composing Hangul.
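The walkthrough above can be sketched as a tiny composition automaton in Python. This is a simplified illustration, not how Apple's input method or libhangul is actually implemented: it only handles single final consonants (no compound vowels, no double finals such as ㄺ), and instead of a Wansung lookup table it uses the standard Unicode arithmetic for precomposed syllables, `0xAC00 + (initial * 21 + medial) * 28 + final`. The function names `syllable` and `compose` are made up for this sketch.

```python
# Compatibility jamo in Unicode syllable order: 19 initials, 21 medials,
# and 27 finals plus "no final" at index 0.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def syllable(cho, jung, jong=""):
    """Build one precomposed syllable from initial, medial, optional final."""
    index = (CHO.index(cho) * 21 + JUNG.index(jung)) * 28 + JONG.index(jong)
    return chr(0xAC00 + index)


def compose(jamos):
    """Compose a stream of jamo into syllables, one character at a time."""
    out = []
    cho = jung = jong = ""

    def flush():
        nonlocal cho, jung, jong
        if cho and jung:
            out.append(syllable(cho, jung, jong))
        elif cho:
            out.append(cho)          # lone consonant, nothing to attach to
        cho = jung = jong = ""

    for j in jamos:
        if j in JUNG:
            if cho and not jung:
                jung = j             # ㅅ + ㅗ -> 소
            elif cho and jung and jong:
                # 솝 + ㅏ: the final ㅂ really starts the next syllable.
                moved, jong = jong, ""
                flush()
                cho, jung = moved, j
            else:
                flush()
                out.append(j)        # vowel with no initial consonant
        elif j in CHO:
            if cho and jung and not jong and j in JONG:
                jong = j             # tentatively attach as final consonant
            else:
                flush()
                cho = j              # starts a new syllable
        else:
            flush()
            out.append(j)            # not a jamo: pass through unchanged
    flush()
    return "".join(out)


word = compose("ㅅㅗㅂㅏㅇㅊㅏ")
```

Feeding it ㅅㅗㅂㅏㅇㅊㅏ reproduces the sequence described above: ㅂ is first held as the final consonant of 솝, then moved to the next syllable when ㅏ arrives.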
This is a simple case. In Korean, there are more complicated syllables like 빨, 꼭, 헗, etc. For the first consonants, 복자음 (BokJaUm, double consonants) like ㅃ and ㄲ in 빨 and 꼭, people type ㅂ and ㄱ while pressing the shift key. Then it will display ㅃ and ㄲ. So, working out how many consonants there are and determining where each belongs (previous syllable or next syllable) can be easy if a user types with a keyboard. (However, there are some nice Korean input methods for Windows and Xterm which allow typing ㅂ twice to make ㅃ. It's kind of an intelligent feature. But testing text like 빱빠라빱 or 흙을 can be complicated, because you end up testing 3 or 4 consonants grouped like {1, 3}, {2, 2}, {3, 1}.)
The bad news is... because you are writing handwriting recognition, you may need to handle such complicated cases if you feed recognized Hangul characters one by one into a Korean input method engine. However, if you write your own input method in your app, you can maintain your own state machine, so it can be easier. As you can see, it's a trade-off between depending on an existing input method engine and ingesting each char into it, or writing your own. (Hmmm... wait... Maybe the input method engine can handle those complicated cases too.)
FYI, I would like to introduce two open source projects. One is a Korean input method Finder module for Mac, and the other is an input method engine with which you can make a Korean input method. Also, there is a Korean input method for X-Windows hosted here. If you prefer Windows project to look up, here is one.
The latter two were hosted at KLDP.net, a Korean open source project hosting site, but they were moved to Google code. As far as I can remember, "SaeNaRu" and "Nabi" (butterfly) can support typing the same consonant twice to make a double consonant.
For more detailed information, you can look up libhangul and nabi. (I remember that the input method part of the code was almost the same between libhangul and nabi before. But at that time they were separated and expected to evolve independently. So, I guess that they are different now.)
OK. The first thing is done.
Now let's move on to the second issue. (This is the part I said you may know about already. But just to complete my explanation, let me explain this also.)
It's about what characters to choose as input to your probable Korean input method state machine or an engine like libhangul. There are basically two representations of composed (on display) Hangul characters: composed and decomposed. The composed one contains fully composed chars. For example, in 사랑합니다, each syllable, 사, 랑, 합, 니, 다, is saved as such. They are not stored as ㅅ, ㅏ, ㄹ, ㅏ, ㅇ, ㅎ, ㅏ, ㅂ, ㄴ, ㅣ, ㄷ, ㅏ.
That is composed representation in Unicode. This representation is usually used by text editors, etc. The other representation is decomposed in Unicode. It's like ㅅ, ㅏ, ㄹ, ㅏ, ㅇ, ㅎ, ㅏ, ㅂ, ㄴ, ㅣ, ㄷ, ㅏ.
This representation is usually used by file systems. For example, if you put a file name in Hangul on Windows, and access the folder which contains it from Mac, it will be displayed like ㅅㅏㄹㅏㅇㅎㅏㅂㄴㅣㄷㅏ although it is displayed as 사랑합니다 on Windows.
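The two representations correspond to Unicode normalization forms NFC (composed) and NFD (decomposed). A short Python sketch using the standard `unicodedata` module shows the difference; the variable names are only for illustration.

```python
import unicodedata

text = "사랑합니다"                                  # 5 precomposed syllables

decomposed = unicodedata.normalize("NFD", text)     # conjoining jamo sequence
recomposed = unicodedata.normalize("NFC", decomposed)

# Code points of the first syllable after decomposition.
first_syllable = [f"U+{ord(c):04X}" for c in decomposed[:2]]

print(len(text), len(decomposed), first_syllable)
print(recomposed == text)
```

The decomposed string has one code point per jamo (twelve for this word), and normalizing back to NFC restores the five original syllables, so the two forms are interchangeable as long as you normalize before comparing or displaying.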
However, there is another set of characters, if memory serves, which is just a list of Hangul consonants and vowels. Although they may look the same as or similar to decomposed syllables, they are actually different in that they are drawn in the middle of the space where a character is drawn. Their purpose is to present Hangul characters in Korean alphabet tables or things like that for educational purposes (or any other purpose).
So, I'm not sure which characters (i.e. the decomposed ones or the characters for the list of Hangul consonants and vowels) to feed to the input method state machine or input method engine you choose or implement. If you implement it, it's your choice, but if you use some external library for the engine, you need to figure it out.
Also, as I mentioned in my blog post, there are two variants in each composed and decomposed representation, which are all defined in Unicode standard. So, well.. yeah.. I agree. It's quite a bit of work.
As for me, I tried to make an input method for Mac (when Apple announced they would get rid of the Finder plugin architecture for security reasons), but at that time libhangul (yeah, I tried to use it) was being changed a lot, so I decided to hold off until it stabilized. Then I became very busy at work and was tired when I got home, so I didn't make progress on my own input method. I believe the state of the libhangul project is much better now than ever, so it's worth at least taking a look at it.
Also, if you don't have Windows, it would be good to try hanterm or any xterm derivatives which supports Hangul input in itself. The source code will be available at their hosting web site.
Good luck with your project, and if there are more things to ask me, please do so.
Check out these system-level text-input facilities. I have never used them, but they look promising.
http://developer.apple.com/library/ios/#documentation/StringsTextFonts/Conceptual/TextAndWebiPhoneOS/CustomTextProcessing/CustomTextProcessing.html#//apple_ref/doc/uid/TP40009542-CH4-SW8
http://developer.apple.com/library/ios/#documentation/UIKit/Reference/UITextInput_Protocol/Reference/Reference.html#//apple_ref/occ/intf/UITextInput
Because iOS doesn't support system-wide keyboard customization, everybody just uses the system-default input facility. And handling of Hangul composition differs on every operating system or platform (MS/Apple/Samsung/LG or others). So the best way is to use a system-supplied facility such as UITextField, for consistency for users. Otherwise you should accurately simulate how your platform OS does it. Of course you can make it yourself, but users won't like it.
Though I'm not an expert on this topic (Korean Hangul composition), I don't think there's a simple algorithm without table lookup. Anyway, if you really want to implement it yourself, these are the core problems you have to handle:
Compositing your visual symbols into the consonants and vowels which are defined in Unicode.
Determining initial consonants / final consonants by the placement of vowels.
It wouldn't be so hard, but the ability to modify the preceding character sequence is required. You cannot implement Korean input with only a one-way stream unless you have separate keys for the initial and final consonants, which look the same.
Unicode defines the full set of valid Jamo components. Usually those components are too many to be presented on a device, and that would also be inefficient. Most Korean input systems decompose those Jamo again and composite them once before compositing the final letter. You can also identify and decompose them visually, just like Korean people do.
After you get the initial/final consonants and vowels which are defined in the Unicode standard, the Unicode normalization feature (such as -[NSString precomposedStringWithCompatibilityMapping]) will do the rest of the job.
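-[NSString precomposedStringWithCompatibilityMapping] corresponds to Unicode normalization form KC (NFKC). As a hedged illustration in Python rather than Objective-C, the standard `unicodedata` module shows what that normalization does with standalone (compatibility) jamo, and why the initial/final decision still has to be made first: NFKC maps a standalone consonant to its initial form, so it will not produce a final consonant on its own.

```python
import unicodedata

# Standalone "compatibility" jamo, as found in alphabet charts.
siot, a, rieul = "\u3145", "\u314F", "\u3139"       # ㅅ, ㅏ, ㄹ

# Initial consonant + vowel: NFKC composes them into one syllable.
sa = unicodedata.normalize("NFKC", siot + a)

# A trailing standalone consonant is mapped to its *initial* form,
# so it is left beside the syllable instead of becoming a final.
sa_then_rieul = unicodedata.normalize("NFKC", siot + a + rieul)

# Supplying an explicit final-consonant jamo (U+11AF) does compose.
sal = unicodedata.normalize("NFC", "\u1109\u1161\u11AF")

print(sa, len(sa_then_rieul), sal)
```

In other words, normalization does the final assembly, but only after your own code has decided which consonants are initials and which are finals and has chosen the corresponding conjoining jamo.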
libhangul (code.google.com/p/libhangul ) does the conversion! It has several functions to handle different types of keyboards (i.e., keyboards with different layouts) and converting the keys to the Unicodes of Hanguls.
It also has several functions which combine the Hanguls to make syllables (they basically implement table lookups that Eonil has mentioned in his response).
Libhangul stores the Hanguls in its buffer as it receives them (it does not output them). After receiving enough Hanguls and successfully converting them into a syllable, it outputs the syllable. Unfortunately, this is quite confusing for the user. The way around this is displaying the buffer content on the screen. After receiving a new Hangul, what has been displayed must be erased. If a syllable has been successfully formed, then the syllable is displayed. Otherwise, the buffer content is displayed again. Note that you can’t just display the new Hangul on the screen. You must erase what you have displayed before and read the previous Hanguls and the new one from the buffer and display them on the screen again.
The reason is that Libhangul may change the code for the previous Hanguls stored in the buffer to make it possible to combine them with the new Hangul. This way, you will get the updated Hanguls.
Also note if the user changes the location of the cursor, the buffer must be emptied.
Additionally, if the user presses backspace, then, the last Hangul displayed on the screen must be erased and must be removed from the buffer.
Libhangul also has some features for correcting typos. For example, if you type ᅡ and ᄉ, it converts them into 사.
Thank you JongAm Park and Eonil for your help and thoughtful comments! Since my reputation is less than 15 at this point, I can’t upvote your answers, but I will do when I can.