ANTLR grammar with brackets and whitespace

I'm using ANTLRWorks 2.1 to create a grammar for ANTLR v4. The rParanthesis rule is not recognised for some reason. I have searched a lot but couldn't find out why. Can you spot any errors?
grammar bracketsGrammar;
OPENINGBRACKET : '[';
CLOSINGBRACKET : ']';
lParanthesis : OPENINGBRACKET ;
rParanthesis : CLOSINGBRACKET ;
WS : ' ' ->skip;
WORD : ~[ "]+ ;
parenthesizedWord : lParanthesis WS+ WORD WS+ rParanthesis ;
fullfile: parenthesizedWord EOF ;
And my input is
[ Manuel ]
And the output is
(fullfile (parenthesizedWord (lParanthesis [) Manuel ]) <EOF>)
As you can see both [ and ] are part of the output but my rParanthesis is not recognised.
Thanks for your help
Manuel

You skip spaces in the lexer, so the parser never sees a WS token, but parenthesizedWord still uses WS in its rule body: remove those references.
It shouldn't be:
parenthesizedWord : lParanthesis WS+ WORD WS+ rParanthesis ;
but:
parenthesizedWord : lParanthesis WORD rParanthesis ;
instead.
And then the following parse tree will be created:
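To see why the WS references can never match, here is a minimal sketch in Python (not ANTLR) of a lexer that drops skipped tokens before the parser sees them. The names `lex`, `RULE_WITH_WS` and `RULE_WITHOUT_WS` are made up for illustration, and the first-match regex alternation only approximates ANTLR's longest-match rule, which happens to give the same tokens for this input.

```python
import re

# Token patterns mirroring the lexer rules of the question, in the same order.
TOKEN_SPEC = [
    ("OPENINGBRACKET", r"\["),
    ("CLOSINGBRACKET", r"\]"),
    ("WS", r" "),
    ("WORD", r'[^ "]+'),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(text, skip=("WS",)):
    """Return the token type names, dropping the skipped types (like '-> skip')."""
    tokens = []
    for match in MASTER.finditer(text):
        if match.lastgroup not in skip:
            tokens.append(match.lastgroup)
    return tokens


# Token sequences the two versions of parenthesizedWord expect.
RULE_WITH_WS = ["OPENINGBRACKET", "WS", "WORD", "WS", "CLOSINGBRACKET"]
RULE_WITHOUT_WS = ["OPENINGBRACKET", "WORD", "CLOSINGBRACKET"]

tokens = lex("[ Manuel ]")
print(tokens)
print("rule with WS matches:   ", tokens == RULE_WITH_WS)
print("rule without WS matches:", tokens == RULE_WITHOUT_WS)
```

Calling `lex("[ Manuel ]", skip=())` keeps the spaces in the token stream, which is the only situation in which the rule containing WS could match.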

Complex MongoDB query in Mule 4

I am trying to run a MongoDB query in Mule with the $in operator, but Mule says: Invalid input '$', expected Namespace or NameIdentifier
I have a collection that stores user authorization:
{
"_id" : ObjectId("584a0dea073d4c3e976140a9"),
"partnerDataAccess" : [
{
"factoryID" : "Fac-1",
"partnerID" : "Part-1"
}
],
"userID" : "z12",
}
{
"_id" : ObjectId("584f5eba073d4c3e976140ab"),
"partnerDataAccess" : [
{
"factoryID" : "Fac-1",
"partnerID" : "Part-2"
},
{
"factoryID" : "Fac-2",
"partnerID" : "Part-2"
}
],
"userID" : "w12",
}
The flow submits a userID and a partnerID and queries the database to see whether an authorization exists.
When I query from Robo 3T, I write queries like this,
e.g. for user w12 and partner Part-2:
db.getCollection('user').find({
userID:"w12", "partnerDataAccess.partnerID": {$in : ["Part-2", "ALL"]}
})
$in is used because there is an "ALL" setting for admins.
But when I put the find part into the MongoDB connector, Mule gives an error both during development and at runtime.
Hardcoded:
<mongo:find-one-document collectionName="user" doc:name="Find one document" doc:id="a03a6689-6b9d-473c-b8a6-3b8d1e989e38" config-ref="MongoDB_Config">
<mongo:find-query ><![CDATA[#[{
userID:"w12",
"partnerDataAccess.partnerID": {$in : ["Part-2", "ALL"]}
}]]]></mongo:find-query>
</mongo:find-one-document>
Parameterized:
<mongo:find-one-document collectionName="user" doc:name="Find one document" doc:id="a03a6689-6b9d-473c-b8a6-3b8d1e989e38" config-ref="MongoDB_Config">
<mongo:find-query ><![CDATA[#[{
userID: payload.User,
"partnerDataAccess.partnerID": {$in : [ payload.partner, "ALL"]}
}]]]></mongo:find-query>
</mongo:find-one-document>
Error:
during development:
Invalid input '$', expected } or ~ or , (line 3, column 38):
Runtime:
Message : "Script '{
userID:"w12",
"partnerDataAccess.partnerID": {$in : ["Part-2", "ALL"]}
} ' has errors:
Invalid input '$', expected Namespace or NameIdentifier (line 3, column 38):
at 3 : 3" evaluating expression:
I have tried removing the $ or escaping the $ with a backslash, but it does not work.
I know my query is not actually complex; any help is welcome.
I seem to have found the correct way: quote the key and escape the $ inside the quotes.
><![CDATA[#[{
userID:"w12",
"partnerDataAccess.partnerID": {"\$in" : ["Part-2", "ALL"]}
}]]]>
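For reference, the intent of the filter can be sketched in plain Python, with no MongoDB driver or Mule involved. `build_filter`, `matches` and `find_one` are hypothetical helpers written for this illustration, and `matches` only understands this one filter shape. The sketch also shows that "$in" is an ordinary string key of the filter document, which fits the observation that quoting it makes the expression parse.

```python
# Sample documents from the question (without _id).
users = [
    {"userID": "z12",
     "partnerDataAccess": [{"factoryID": "Fac-1", "partnerID": "Part-1"}]},
    {"userID": "w12",
     "partnerDataAccess": [{"factoryID": "Fac-1", "partnerID": "Part-2"},
                           {"factoryID": "Fac-2", "partnerID": "Part-2"}]},
]


def build_filter(user_id, partner_id):
    """Build the filter document; "$in" is just a string key."""
    return {
        "userID": user_id,
        "partnerDataAccess.partnerID": {"$in": [partner_id, "ALL"]},
    }


def matches(doc, flt):
    """Evaluate this specific filter shape against one document."""
    if doc.get("userID") != flt["userID"]:
        return False
    allowed = flt["partnerDataAccess.partnerID"]["$in"]
    # The dotted path matches if any array element has an allowed partnerID.
    return any(entry.get("partnerID") in allowed
               for entry in doc.get("partnerDataAccess", []))


def find_one(docs, flt):
    """Return the first matching document, or None."""
    return next((doc for doc in docs if matches(doc, flt)), None)


print(find_one(users, build_filter("w12", "Part-2")))
print(find_one(users, build_filter("z12", "Part-2")))
```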

ANTLR4 - using hidden Tokens in parser rules

I'm a complete noob with ANTLR, so apologies if this is a really basic question.
I'm trying to parse a file that has a weird JSON-like syntax. These files are huge, hundreds of MB, so I'm avoiding creating the parse tree and I'm just using grammar actions to manipulate the data into what I want.
As usual, I'm sending Whitespaces and Newlines to the HIDDEN channel. However, there are a couple cases where it'd be helpful if I could detect that the next character is one of those, because that delimits the property value.
Here's an excerpt from a file
game_speed=4
mapmode=0
dyn_title=
{
title="e_dyn_188785"
nick=nick_the_just hist=yes
base_title="k_mongolia"
is_custom=yes
is_dynamic=yes
claim=
{
title=k_bulgaria
pressed=yes
weak=yes
}
claim=
{
title=c_karvuna
pressed=yes
}
claim=
{
title=c_tyrnovo
}
claim=
{
title=c_mesembria
pressed=yes
}
}
And here's the relevant parts of my grammar:
property: key ASSIGNMENT value { insertProp(stack[scopeLevel], $key.text, currentVal) };
key: (LOWERCASE | UPPERCASE | UNDERSCORE | DIGIT | DOT | bool)+;
value:
bool { currentVal = $bool.text === 'yes' }
| string { currentVal = $string.text.replace(/\"/gi, '') }
| number { currentVal = parseFloat($number.text, 10) }
| date { currentVal = $date.text }
| specific_value { currentVal = $specific_value.text }
| (numberArray { currentVal = toArray($numberArray.text) }| array)
| object
;
bool: 'yes' | 'no';
number: DASH? (DIGIT+ | (DIGIT+ '.' DIGIT+));
string:
'"'
( number
| bool
| specific_value
| NONALPLHA
| UNDERSCORE
| DOT
| OPEN_CURLY_BRACES
| CLOSE_CURLY_BRACES
)*
'"'
;
specific_value: (LOWERCASE | UPPERCASE | UNDERSCORE | DASH | bool)+ ;
WS: ([\t\r\n] | ' ') -> channel(HIDDEN);
NEWLINE: ( '\r'? '\n' | '\r')+ -> channel(HIDDEN);
So, as you can see, the input syntax can have property values that are strings but are not delimited by ". And, in fact, for some odd reason, sometimes the next property appears on the same line. Because WS and NEWLINE are ignored, the parser doesn't recognise where the specific_value rule ends, so it grabs part of the next key as well. See the output example below:
{
game_speed: 4,
mapmode: 0,
dyn_title:
{
title: 'e_dyn_188785',
nick: 'nick_the_just\t\t\this',
t: true,
base_title: 'k_mongolia',
is_custom: true,
is_dynamic: true,
claim: { title: 'k_bulgaria\n\t\t\t\tpresse', d: true, weak: true },
claim2: { title: 'c_karvuna\n\t\t\t\tpresse', d: true },
claim3: { title: 'c_tyrnovo' },
claim4: { title: 'c_mesembria\n\t\t\t\tpresse', d: true
}
},
What's an appropriate solution here to specify that specific_value shouldn't grab any characters once it reaches a WS or NEWLINE?
Thanks in advance! :D
I'd handle as much as possible in the lexer (like identifiers, numbers and strings). That could look like this in your case:
grammar JsonLike;
parse
: object? EOF
;
object
: '{' key_value* '}'
;
key_value
: key '=' value
;
key
: SPECIFIC_VALUE
| BOOL
// More tokens that can be a key?
;
value
: object
| array
| BOOL
| STRING
| NUMBER
| SPECIFIC_VALUE
;
array
: '[' value+ ']'
;
BOOL
: 'yes'
| 'no'
;
STRING
: '"' ( ~["\\] | '\\' ["\\] )* '"'
;
NUMBER
: '-'? [0-9]+ ( '.' [0-9]+ )?
;
SPECIFIC_VALUE
: [a-zA-Z_] [a-zA-Z_0-9]*
;
SPACES
: [ \t\r\n]+ -> channel(HIDDEN)
;
Resulting in the following parse:
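Independently of ANTLR, the effect of recognising whole identifiers in the lexer can be checked with a small regex tokenizer in Python. This is only a sketch: `tokenize` and the PUNCT token type are made up here, and the lookahead on BOOL stands in for ANTLR's longest-match behaviour, so that an identifier merely starting with "yes" or "no" stays a SPECIFIC_VALUE.

```python
import re

# Patterns adapted from the lexer rules above; PUNCT covers the literal tokens.
TOKEN_SPEC = [
    ("STRING", r'"(?:[^"\\]|\\["\\])*"'),
    ("NUMBER", r"-?[0-9]+(?:\.[0-9]+)?"),
    ("BOOL", r"(?:yes|no)(?![a-zA-Z_0-9])"),
    ("SPECIFIC_VALUE", r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("PUNCT", r"[={}\[\]]"),
    ("SPACES", r"[ \t\r\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (type, text) pairs; whitespace only separates tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SPACES":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, value in tokenize('nick=nick_the_just hist=yes'):
    print(kind, value)
```

Because each identifier is a single token, whitespace ends it even though whitespace itself never reaches the parser, so `nick_the_just` and `hist` can no longer run together.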

Grammar.parse seems to loop forever and use 100% CPU

Reposted from the #perl6 IRC channel, by jkramer, with permission
I'm playing with grammars and trying to parse an ini-style file but somehow Grammar.parse seems to loop forever and use 100% CPU. Any ideas what's wrong here?
grammar Format {
token TOP {
[
<comment>*
[
<section>
[ <line> | <comment> ]*
]*
]*
}
rule section {
'[' <identifier> <subsection>? ']'
}
rule subsection {
'"' <identifier> '"'
}
rule identifier {
<[A..Za..z]> <[A..Za..z0..9_-]>+
}
rule comment {
<[";]> .*? $$
}
rule line {
<key> '=' <value>
}
rule key {
<identifier>
}
rule value {
.*? $$
}
}
Format.parse('lol.conf'.IO.slurp)
Token TOP has the * quantifier on a subregex that can parse an empty string (because both <comment> and the group that contains <section> have a * quantifier on their own).
If the inner subregex matches the empty string, it can do so infinitely many times without advancing the cursor. Currently, Perl 6 has no protection against this kind of error.
It looks to me like you could simplify your code to
token TOP {
<comment>*
[
<section>
[ <line> | <comment> ]*
]*
}
There is no need for the outer [...]* group, because the inner [ <line> | <comment> ]* already matches comments that appear before the next section.
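The runaway loop described above can be reproduced outside Raku with a toy parser combinator in Python. This is a sketch under assumptions: `literal` and `star` are hypothetical helpers, the `guard` flag shows the kind of zero-width check the answer says is missing, and `limit` exists only so that the unguarded case stops with an error instead of spinning forever.

```python
def literal(s):
    """Parser matching the exact string s; returns the new position or None."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def star(parser, guard=True, limit=1000):
    """Naive '*' quantifier: apply parser repeatedly until it fails."""
    def parse(text, pos):
        for _ in range(limit):
            new_pos = parser(text, pos)
            if new_pos is None:
                return pos
            if guard and new_pos == pos:
                # Inner parser matched the empty string: stop instead of looping.
                return pos
            pos = new_pos
        raise RuntimeError("iteration limit reached without the inner parser failing")
    return parse


comments = star(literal("#"))               # like <comment>*: may match nothing
guarded = star(comments)                    # like [ <comment>* ]* with a guard
unguarded = star(comments, guard=False)     # like [ <comment>* ]* without one

print(guarded("##x", 0))
try:
    unguarded("##x", 0)
except RuntimeError as err:
    print("unguarded:", err)
```

The unguarded version keeps succeeding at the same position because the inner `*` never fails, which is the same situation as the outer `[...]*` in the original TOP token.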

How to specify a custom attribute on a generic parameter in VB.NET?

Textbook question, but I've done my googling, and I couldn't find anything.
Given a custom attribute named SomeAttribute, how do you do the following, in VB.NET?
void SomeMethod<[Some] T>()
{
}
I tried this:
Sub SomeMethod(<Some> Of T)()
End Sub
and
Sub SomeMethod(Of <Some> T)()
End Sub
But both fail to compile, with the error pointing at <Some>.
Given the silence here, and because I really needed an answer, I dug into the VB.NET Language Specification.
It never says explicitly whether this is supported or not, but it does have some formal grammar definitions which suggest that this isn't supported by VB.NET.
Specifically, section 9.2.1 defines the following productions for method declaration:
SubSignature ::= Sub Identifier [ TypeParameterList ]
[ OpenParenthesis [ ParameterList ] CloseParenthesis ]
In 9.2.5, parameters are defined as follows:
ParameterList ::=
Parameter |
ParameterList Comma Parameter
Parameter ::=
[ Attributes ] [ ParameterModifier+ ] ParameterIdentifier [ As TypeName ]
[ Equals ConstantExpression ]
And section 13.3 defines TypeParameterList:
TypeParameterList ::=
OpenParenthesis Of TypeParameters CloseParenthesis
TypeParameters ::=
TypeParameter |
TypeParameters Comma TypeParameter
TypeParameter ::=
[ VarianceModifier ] Identifier [ TypeParameterConstraints ]
VarianceModifier ::=
In | Out
TypeParameterConstraints ::=
As Constraint |
As OpenCurlyBrace ConstraintList CloseCurlyBrace
ConstraintList ::=
ConstraintList Comma Constraint |
Constraint
Constraint ::= TypeName | New | Structure | Class
Attributes make an appearance in the parameter list (and, for functions, in the return type), but the TypeParameterList is completely devoid of anything related to attributes.
So I'm going to go ahead and claim that VB.NET 10 (shipping with VS2012) does not support attributes on generic type parameters.

ANTLR v3 - Use of syntactic predicate for lookahead

Still learning how to properly use ANTLR... Here's my problem.
Say you have a subset of a UML grammar and an ANTLR lexer/parser with the following rules:
// Parser Rules
model
: 'MODEL' IDENTIFIER list_dec
;
list_dec
: declaration*
;
declaration
: class_dec ';'
| association ';'
| generalization ';'
| aggregation ';'
;
class_dec
: 'CLASS' IDENTIFIER class_content
;
...
association
: 'RELATION' IDENTIFIER 'ROLES' two_roles
;
two_roles
: role ',' role
;
role
: 'CLASS' IDENTIFIER multiplicity
;
...
I would like the 'role' rule to allow the IDENTIFIER token only if it matches an existing class IDENTIFIER. In other words, if you are given an input file and you run the lexer/parser on it, then all the classes that are referenced (e.g. the IDENTIFIER in the association rule) should exist. The problem is that a class might not exist (yet) at runtime (it can be declared anywhere in the file). What is the best approach to this?
Thanks in advance...
This is probably best done after parsing. The parser creates some sort of tree for you, and afterwards you walk the tree and collect information about declared classes, and walk it a second time to validate the role tree/rule.
Of course, some things could be done with a bit of custom code:
grammar G;
options {
...
}
@parser::members {
java.util.Set<String> declaredClasses = new java.util.HashSet<String>();
}
model
: 'MODEL' IDENTIFIER list_dec
;
...
class_dec
: 'CLASS' id=IDENTIFIER class_content
{
declaredClasses.add($id.text);
}
;
...
role
: 'CLASS' id=IDENTIFIER multiplicity
{
if(!declaredClasses.contains($id.text)) {
// warning or exception in here
}
}
;
...
EDIT
Or with custom methods:
@parser::members {
java.util.Set<String> declaredClasses = new java.util.HashSet<String>();
void addClass(String id) {
boolean added = declaredClasses.add(id);
if(!added) {
// 'id' was already present, do something, perhaps?
}
}
void checkClass(String id) {
if(!declaredClasses.contains(id)) {
// exception, error or warning?
}
}
}
...
class_dec
: 'CLASS' id=IDENTIFIER class_content {addClass($id.text);}
;
role
: 'CLASS' id=IDENTIFIER multiplicity {checkClass($id.text);}
;
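Note that the embedded actions above check a role at the moment it is parsed, so a class declared later in the file would be reported as missing. The two-pass approach mentioned at the start of the answer avoids that. Here is a minimal sketch in Python, assuming the parse result has already been reduced to a flat list of tuples; that representation, the function `undeclared_roles` and the sample names are all hypothetical.

```python
def undeclared_roles(declarations):
    """Return (association, class) pairs whose class is never declared.

    declarations is a list of ("class", name) or
    ("association", name, [role_class, role_class]) tuples.
    """
    # Pass 1: collect every declared class, wherever it appears in the file.
    declared = {decl[1] for decl in declarations if decl[0] == "class"}

    # Pass 2: validate each role against the complete set.
    missing = []
    for decl in declarations:
        if decl[0] == "association":
            for role_class in decl[2]:
                if role_class not in declared:
                    missing.append((decl[1], role_class))
    return missing


model = [
    ("association", "Owns", ["Person", "Car"]),   # uses classes declared later
    ("class", "Person"),
    ("class", "Car"),
    ("association", "Drives", ["Person", "Truck"]),
]
print(undeclared_roles(model))
```

Only the reference to `Truck` is reported; the forward references in `Owns` are accepted because validation starts after all declarations have been collected.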