Take input from function parameters, not a file - antlr

All of the examples which I see read in a file the lex & parse it.
I need to have a function which takes a string (char *, I'm generating C code) as parameter, and acts upon that.
How can I do that best? I thought of writing the string to a stream, then feeding that to the lexer, but it doesn't feel right. Is there any better way?
Thanks in advance

You would need to use the antlr3NewAsciiStringInPlaceStream method.
You didn't say what version of Antlr you were using so I'll assume Antlr v3.
The inputs to this method are the string to parse, it's length and then you can probably use NULL for the last input.
This produces an input stream similar to the antlr3AsciiFileStreamNew that you would use to parse a file.
I see that you mentioned writing the input to a stream. If you can use C++ then that's the best method you'll probably come by.
This is the barebones code I normally use:
std::istringstream issInput(std::cin); // make this an ifstream if you want to parse a file
Lexer oLexer(issInput);
Parser oParser(oLexer);
oFactory("CommonASTWithHiddenTokens",&antlr::CommonASTWithHiddenTokens::factory);
antlr::ASTFactory oFactory;
oParser.initializeASTFactory(oFactory);
oParser.setASTFactory(&oFactory);
oParser.main();
antlr::RefAST ast = oParser.getAST();
if (ast)
{
TreeWalker oTreeWalker;
oTreeWalker.main(ast, rPCode);
}

I think you should feed it to a stream. You could feed it to stdin if you'd like. That way, your code shouldn't differ too much from reading strings from a file.

Related

ANTLR4 and parsing a type-length-value format

I am trying create a grammar for a format that follows a type-length-value convention. Can ANTLR4 read in a length value and then parse that many characters?
NO ...
From your question (which is very short so I could miss something ...) I gather you are mixing grammar and encoding rules.
When you say type-length-value, it sounds like an encoding rule to me (how to serialize a data). In my experience, you write this code yourself.
A grammar is at a higher level: it's a piece of text that describes something. Antlr will help you breaking this text into tokens and then into a tree that you can navigate.
This step only handles text: if you were going that way to solve your problem, you would still have to handle type, length and value yourself.
EDIT:
with a bit of googling I found this https://github.com/NickstaDB/SerializationDumper

Can we do string manipulation and conditional check in smooks?

I want to manipulate a large text file, which is coming as TEXT and want to use smooks to manipulate it. The text file contains large number of lines. And from each line, i have to split the characters and get information out of that.
Eg: i do following in java;
row.substring(0, 4)
row.substring(4, 64)
I have to convert the text content to CSV file.
Can we do exact same string manipulation in smooks too? (that is in smooks configuration can i do that?) I believe i can use Fixed Length processing for that?
How to add IF ELSE condition in smooks configuration?
Like in java;
if (row.length() == 900) {
//DO
}else(){
//DO
}
We can do string manipulation using fixed length reader[1]. but still i do not find a way to do condition check.
Eg: if /else
[1]http://www.smooks.org/mediawiki/index.php?title=V1.4:Smooks_v1.4_User_Guide#XML
If the format does not fit the flatfile reader, then you might be able to use the regex reader: https://github.com/smooks/smooks/tree/v1.5.1/smooks-examples/flatfile-to-xml-regex/
As for the conditional stuff... you really need to bind the data fragments into a Java model of some sort (real or virtual) and then conditionally process those fragments by either adding elements on the visitors being applied, or process the fragments by routing them to another process that processes them in parallel, which is a far better way of processing a huge data stream.

software\tool for checking syntax format

Im looking for a tool that can validate if a given text\paragraph subject to a specific format .
for example :
I can be able to check if the text is as following :
xxx{
sss:aaa;
}
yyy();
preferably open source tool, with easy rule sets like xml or something .
by text i mean a string that i get from i.e fgets(), or any function that reads from a file .
For something like this I'd suggest a parser (see, for instance, What is Parse/parsing?). You can build one from a definition of the language that you want to parse using a parser generator like Yacc or its free GNU equivalent Bison, or any number of other parser generators, many of which are also freely available.
Most parsers are used to transform a text that complies with a grammar into some other form (e.g. an intermediate language or a machine code) but that isn't neccesary - in your case the parser could simply say (at a minimum) "Yes" if the text conforms to a given grammar.
Parsers for simple grammars can be built by hand but, if you have the tools available, using a parser generator is easier and more robust in my experience.
Further, the text that you've shown is similar to a portion of code written in the C language (something close to a struct declaration followed by a function call), so you would be able to re-use parts of the grammar that you need from an existing Yacc grammar for C like this one.

equivalent of normalize function of DOM in JDOM

Can some tell me the function similar to normalize() of DOM in JDOM? I actually want to normalize the XML content and serialise it through XMLSerializer.
Thank You
Sam
Sandeep.
JDOM does not have a direct 'normalize' concept. Writing one would not be particularly hard, though. On the other hand, your intention is to output the XML in some format, and all the JDOM Output mechanisms will normalize the data for you.
So, for example, if you want to output the JDOM document as plain XML text, you can use the XMLOutputter class in org.jdom2.output and use an appropriate org.jdom2.output.Format instance (say, Format.getPrettyFormat() - do not use getRawFormat() as the raw formatter will not normalize the output at all).
In addition to outputting the JDOM document as text-based XML, you can also output to a DOM document, a SAX even stream, and even StAX streams. Each of these will produce a 'Normalized' output.
So, what you want to do (probably), is to:
Document mudoc = .....;
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(mydoc, somestream);
Rolf

How do I match non-ASCII characters with RegexKitLite?

I am using RegexKitLite and I'm trying to match a pattern.
The following regex patterns do not capture my word that includes N with a titlde: ñ.
Is there a string conversion I am missing?
subjectString = #"define_añadir";
//regexString = #"^define_(.*)"; //this pattern does not match, so I assume to add the ñ
//regexString = #"^define_([.ñ]*)"; //tried this pattern first with a range
regexString = #"^define_((?:\\w|ñ)*)"; //tried second
NSString *captured= [subjectString stringByMatching:regexString capture:1L];
//I want captured == añadir
Looks like an encoding problem to me. Either you're saving the source code in an encoding that can't handle that character (like ASCII), or the compiler is using the wrong encoding to read the source files. Going back to the original regex, try creating the subject string like this:
subjectString = #"define_a\xC3\xB1adir";
or this:
subjectString = #"define_a\u00F1adir";
If that works, check the encoding of your source code files and make sure it's the same encoding the compiler expects.
EDIT: I've never worked with the iPhone technology stack, but according to this doc you should be using the stringWithUTF8String method to create the NSString, not the #"" literal syntax. In fact, it says you should never use non-ASCII characters (that is, anything not in the range 0x00..0x7F) in your code; that way you never have to worry about the source file's encoding. That's good advice no matter what language or toolset you're using.