How to create a grammar containing the text to train - SAPI

I've read Eric's blog post on how to train the SAPI recognizer and followed the pseudocode, but I don't know how to create a grammar containing the text to train. I have the correct transcript of the audio, but I don't know how to connect this transcript file to the grammar. Do I need to create an XML file? Could you tell me the interface name? Thank you so much.

Read the transcript file into a string, then use ISpGrammarBuilder::AddWordTransition to define a simple grammar containing that string. Then process the audio from the wave file, and you should get a recognition event.

Related

ANTLR - How to generate output identical to the input file? (source-to-source transformation)

Let's say I have a source code file. I want to give this file to ANTLR, generate the same code, and save it to an output file.
Usage:
To beautify the input file.
To add some comments to the input file.
To inject some code into the input file.
Is it possible to do such a thing by exploiting ANTLR?
Basically, I am trying to do a source-to-source transformation with ANTLR from C/C++ to C/C++.
I am interested in adding, deleting, replacing, or modifying some lines of the code and generating output that complies with C/C++ language rules.
P.S.: Please let me know if you know any other tool (other than Clang) that does the same thing: parses C/C++ (or even Fortran), provides events to the user, and lets the user modify the source code.

Best way to save source line information in an ANTLR 4.7.1 lexer/parser

All,
I'm fairly new to ANTLR, so the solution may be trivial, but it escapes me. (I have a lot of experience with parsers and scanners, just not with ANTLR-generated ones.)
I'm recoding an assembler for a 32-bit (National Series 32000) CPU. It was originally coded using C++/(f)lex/yacc/bison, but is being ported to Java 8. One of my requirements is to produce a listing file that contains addresses, generated code, source lines, etc.
I have an object that can contain all of the information I need (e.g., source line, generated code, etc.), and I would like to associate that object with each token. My questions are:
1) What is the best way to capture a source line? I considered using the lexer (+ modes) to capture a source line, but found no way to capture a source line and then reject (or push back) the input to make it available for subsequent processing. I know that CharStream buffers its entire input stream in one fell swoop. Would subclassing CharStream to construct my container and capture source line contents be an appropriate approach?
2) How to associate my container object with each token? I suspect subclassing Token and creating a custom TokenFactory is required, but am uncertain how to connect a custom CharStream to Token. (This is why I liked the concept of using the lexer to capture individual lines.)
Thanks for any help!
There's no need to capture position information manually. Each token (normally an instance of CommonToken) comes with line and char offset values, plus a few more such as the token index (the index of the token in the token stream) and start/stop indices, which give you the character indexes in the original input text.
The resulting parse tree also contains references to the token or symbol that make up a rule context or terminal node. So you can look up positions at any time, always connected to a particular parser rule.
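As a rough illustration of how per-token line numbers can drive a listing file, here is a minimal Python sketch. It does not use ANTLR: `Tok` is a hypothetical stand-in for a token that carries a 1-based line number and a 0-based column (the convention ANTLR tokens follow, exposed in the Java runtime through Token.getLine() and Token.getCharPositionInLine()), and `build_listing` and the sample assembler text are invented for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for a lexer token: text, 1-based line, 0-based column.
Tok = namedtuple("Tok", "text line column")


def build_listing(source, tokens):
    """Group token texts by line number and pair each group with its source line."""
    lines = source.splitlines()
    by_line = {}
    for tok in tokens:
        by_line.setdefault(tok.line, []).append(tok.text)
    # Line numbers are 1-based, so subtract 1 to index into the list of lines.
    return [(n, lines[n - 1], by_line[n]) for n in sorted(by_line)]


# Illustrative input only; a real lexer would produce the tokens.
source = "movd r0, r1\naddd r1, r2\n"
tokens = [Tok("movd", 1, 0), Tok("r0", 1, 5), Tok("addd", 2, 0)]

for number, text, toks in build_listing(source, tokens):
    print(f"{number:4d}  {text}    ; tokens: {' '.join(toks)}")
```

Because the source line is looked up from the token's line number, nothing extra needs to be captured in the lexer; the original text only has to be kept around (or re-read) when the listing is written.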

Can GNU Radio modulate and demodulate a text file?

I'm new to SDR, DSP, and GNU Radio. My goal is to create an FSK demodulator for a project at work (described in this question), but since I haven't been making progress, I'm trying to teach myself some of the basics.
For practice, I'm trying to set up a GNU Radio flowgraph that reads a text file, modulates it, then demodulates it, returning the same text as output.
Basic question: is it possible to read a text file, mod/demod, then return plain, readable text using GNU Radio? I'm trying to send and receive something simple, like "Test, one two three."
Next question: if the above is possible, where am I going wrong in the following flowgraph (the output file has a size of ~200 kB, but appears blank)?
Thanks for any advice!
The file sink acts as a giant buffer for the data type that you choose.
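To get readable text output, I used a byte file sink and converted the binary values to ASCII/UTF-8 characters, i.e., added 48 to each value in the stream (48 is the ASCII code for '0', so the values 0 to 9 become the characters '0' to '9').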
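A minimal sketch of that conversion done outside GNU Radio, in plain Python. It assumes the byte file sink wrote one small integer per byte (for example a demodulated bit, 0 or 1); that layout is an assumption, not something stated in the answer. The helper names are hypothetical, and `pack_bits` (which additionally assumes most-significant-bit-first ordering) goes one step beyond the answer to show how such bits could be turned back into characters.

```python
def to_ascii_digits(raw: bytes) -> str:
    """Add 48 (the ASCII code of '0') to each byte so values 0-9 print as '0'-'9'."""
    return "".join(chr(b + 48) for b in raw)


def pack_bits(raw: bytes) -> bytes:
    """Pack each group of 8 bit-valued bytes (assumed MSB first) into one byte."""
    out = bytearray()
    usable = len(raw) - len(raw) % 8  # ignore an incomplete trailing group
    for i in range(0, usable, 8):
        value = 0
        for bit in raw[i:i + 8]:
            value = (value << 1) | (bit & 1)
        out.append(value)
    return bytes(out)


# The letter 'T' is 0x54, i.e. the bits 01010100.
bits = bytes([0, 1, 0, 1, 0, 1, 0, 0])
print(to_ascii_digits(bits))
print(pack_bits(bits))
```

A file full of raw 0 and 1 byte values looks blank in a text editor because those are non-printing control characters, which is why some conversion is needed before the output is readable.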

Saving / exporting a file in Objective-C that I can then open in Ruby

I am an absolute beginner, so I am sorry if this question has been asked before and I simply couldn't find it because I lacked the right search terms. Feel free to point me to the right posts and delete this one. Apologies in advance.
I want to write software that imports a list and links every word or sentence on that list to an audio file. I then want to export the whole thing: the list, the audio files, and the relations between the words and the audio files, in order to use everything with a different app written in a different programming language (that is all yet to come; it will probably be in Ruby).
Since I will probably not be able to open Core Data files with Ruby, which file format would be best so that I can use it in Ruby? Or will I have to save all audio files individually, as audio files, and have a separate text file that links the words to the files? This sounds... wrong? :(
Sorry I am so lost right now!
You can use a JSON file to hold all your data. It is widely accepted as a data interchange format. It is better not to embed the audio files in another file, though. Instead, save the path to each audio file.
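A minimal sketch of what such a file could look like, written in Python only to show the shape of the data. The key names (`entries`, `text`, `audio`) and the file paths are hypothetical; any consistent layout would do.

```python
import json

# Each entry links one word or sentence to the path of its audio file.
entries = [
    {"text": "hello", "audio": "audio/hello.m4a"},
    {"text": "good morning", "audio": "audio/good_morning.m4a"},
]

# Serialize the whole list; this string is what would be written to disk.
manifest = json.dumps({"entries": entries}, indent=2)
print(manifest)

# Reading it back: build a lookup from text to audio path.
loaded = json.loads(manifest)
lookup = {entry["text"]: entry["audio"] for entry in loaded["entries"]}
print(lookup["hello"])
```

The audio files themselves stay as ordinary files next to the JSON file. On the Ruby side, the standard json library can read the same file with JSON.parse.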

How to get parse tree in ANTLRWorks 2?

I am currently using ANTLRWorks 2, and I don't know how to interpret an example input in it. In ANTLRWorks 1.5.2 there is an interpreter tab where you can paste your example. Is there something like that in ANTLRWorks 2? How can you get the parse tree for an input? Does the input have to be in a specific file?
BTW, I couldn't get any result by using Run->Run in TestRig and uploading any input file.
How can you get the parse tree for an input?
The way you already tried: Run → Run in TestRig
Does the input have to be in a specific file?
Yes.
BTW, I couldn't get any result by using Run->Run in TestRig and uploading any input file.
Then you probably did something wrong. Incorrect input file? Incorrect start rule? Pretty hard to say, really.
There's also the ANTLR4 plugin for IntelliJ which does interpreting on the fly and produces the parse tree. Also a very nice tool!