CFG OPLIST Confusion - grammar

I am trying to derive a grammar for this expression:
An operands list is a comma-separated list of zero or more terms bracketed with parentheses.
It should be like OPList = ...
Could someone help me out? The English on this has got me a little lost

OK, lets look at the English. Comma-separated terms is easy:
term, term, term, ...
Zero or more is easy, as that says the whole list can be empty. Bracketed with parenthesis is ambiguous. It could mean the whole list can be in brackets like this:
(term, term, term)
Or, it could mean that each term is in brackets, like this:
(term), (term), (term)
The use of the with would usually imply the latter style.
This gives us a CFG of:
OPList : element | List "," element |
List : element "," List | element
element : "(" term ")"

Related

Spring Data JPA Like or Containing

I'm messing with Spring Boot Data JPA and, reading the documentation, I got confused. Whats the difference?
What I understood is, the "Like" operator makes SQL without the "%" surrounding my String (where name like 'String') and the "Containing" operator makes SQL with the "%" surrounding my String (where name like '%String%'). Am I wrong?
I used the "Like" operator and he works fine in situations where the "%" is required in both sides so, I'm really confused!
It is correct that you can emulate a containing with a like.
The differences are:
you have to enclose your search string with wildcards yourself when using like.
you can have wildcards not only at the beginning or end but also in the middle, multiple wildcards in the middle and different wildcards like _ which matches a single character.
a final subtile difference is that containing will escape wildcards contained in your search argument, which like would not. So when searching for abc%def the two behave differently
| containing | like (with additional `%` around the searchstring)
-------------------------------------------------------------------------------------
123abc%def456 | matches | matches
123abcXYZdef456 | does not match | matches

ANTLR4 : ordering problem of parser rules for a keyword used in several rules (AND, BETWEEN AND)

I am having a problem while parsing some SQL typed string with ANTLR4.
The parsed string is :
WHERE a <> 17106
AND b BETWEEN c AND d
AND e BTW(f, g)
Here is a snippet of my grammar :
where_clause
: WHERE element
;
element
: element NOT_EQUAL_INFERIOR element
| element BETWEEN element AND element
| element BTW LEFT_PARENTHESIS element COMMA_CHAR element RIGHT_PARENTHESIS
| element AND element
| WORD
;
NOT_EQUAL_INFERIOR: '<>';
LEFT_PARENTHESIS: '(';
RIGHT_PARENTHESIS: ')';
COMMA_CHAR: ',';
BETWEEN: B E T W E E N;
BTW: B T W;
WORD ... //can be anything ... it doesn't matter for the problem.
(source: hostpic.xyz)
But as you can see on that same picture, the tree is not the "correct one".
ANTLR4 being greedy, it englobes everything that follows the BETWEEN in a single "element", but we want it to only take "c" and "d".
Naturally, since it englobes everything in the element rule, it is missing the second AND of the BETWEEN, so it fails.
I have tried changing order of the rules (putting AND before BETWEEN), I tried changing association to right to those rules (< assoc=right >), but those didn't work. They change the tree but don't make it the way I want it to be.
I feel like the error is a mix of greediness, association, recursivity ... Makes it quite difficult to look for the same kind of issue, but maybe I'm just missing the correct words.
Thanks, have a nice day !
I think you misuse the rule element. I don't think SQL allows you to put anything as left and right limits of BETWEEN.
Not tested, but I'd try this:
expression
: expression NOT_EQUAL_INFERIOR expression
| term BETWEEN term AND term
| term BTW LEFT_PARENTHESIS term COMMA_CHAR term RIGHT_PARENTHESIS
| expression AND expression
| term
;
term
: WORD
;
Here your element becomes expression in most places, but in others it becomes term. The latter is a dummy rule for now, but I'm pretty sure you'd want to also add e.g. literals to it.
Disclaimer: I don't actually use ANTLR (I use my own), and I haven't worked with the (rather hairy) SQL grammar in a while, so this may be off the mark, but I think to get what you want you'll have to do something along the lines of:
...
where_clause
: WHERE disjunction
;
disjunction
: conjunction OR disjunction
| conjunction
;
conjunction
: element AND conjunction
| element
;
element
: element NOT_EQUAL_INFERIOR element
| element BETWEEN element AND element
| element BTW LEFT_PARENTHESIS element COMMA_CHAR element RIGHT_PARENTHESIS
| WORD
;
...
This is not the complete refactoring needed but illustrates the first steps.

ANTLR recognize single character

I'm pretty sure this isn't possible, but I want to ask just in case.
I have the common ID token definition:
ID: LETTER (LETTER | DIG)*;
The problem is that in the grammar I need to parse, there are some instructions in which you have a single character as operand, like:
a + 4
but
ab + 4
is not possible.
So I can't write a rule like:
sum: (INT | LETTER) ('+' (INT | LETTER))*
Because the lexer will consider 'a' as an ID, due to the higher priority of ID. (And I can't change that priority because it wouldn't recognize single character IDs then)
So I can only use ID instead of LETTER in that rule. It's ugly because there shouldn't be an ID, just a single letter, and I will have to do a second syntactic analysis to check that.
I know that there's nothing to do about it, since the lexer doesn't understand about context. What I'm thinking that maybe there's already built-in ANTLR4 is some kind of way to check the token's length inside the rule. Something like:
sum: (INT | ID{length=1})...
I would also like to know if there are some kind of "token alias" so I can do:
SINGLE_CHAR is alias of => ID
In order to avoid writing "ID" in the rule, since that can be confusing.
PD: I'm not parsing a simple language like this one, this is just a little example. In reality, an ID could also be a string, there are other tokens which can only be a subset of letters, etc... So I think I will have to do that second analysis anyways after parsing the entry to check that syntactically is legal. I'm just curious if something like this exists.
Checking the size of an identifier is a semantic problem and should hence be handled in the semantic phase, which usually follows the parsing step. Parse your input with the usual ID rule and check in the constructed parse tree the size of the recognized ids (and act accordingly). Don't try to force this kind of decision into your grammar.

Velocity / Marketo issue with if statements that include a space

I am writing some Velocity Script as part of a Marketo email template that requires that I check if an boolean attribute on a lead is set or not.
When I attempt to display something associated with a lead in my system I can do something like;
{{lead.myName}}
This also works for fields that have spaces in them;
{{lead.my name}}
When it comes to using that field for #setting or #ifing something then it doesn't work as well.
#if($lead.my name) throws an error saying that an unexpected space has been found.
I have tried variants like #if(${lead.my name}) to no avail.
Any help / pointers would be massively helpful.
Actual use case
In my example the field I need to access is called lead.Subscribed to Innovation (L) 1, I don't think the brackets will cause a problem, certainly any error messages have been space related.
According to User Guide variables cannot have spaces
A VTL Identifier must start with an alphabetic character (a .. z or A .. Z). The rest of the characters are limited to the following types of characters:
alphabetic (a .. z, A .. Z)
numeric (0 .. 9)
hyphen ("-")
underscore ("_")
even with the curly brackets :
this is valid:
#set( ${myemail} = "email#email.com" )
while trhis is invalid:
#set( ${my email} = "email#email.com" )
My best guess will be to change the source system to comply with the velocity naming convention.

Ambiguous grammar?

hi
there is this question in the book that said
Given this grammer
A --> AA | (A) | epsilon
a- what it generates\
b- show that is ambiguous
now the answers that i think of is
a- adjecent paranthesis
b- it generates diffrent parse tree so its abmbiguous and i did a draw showing two scenarios .
is this right or there is a better answer ?
a is almost correct.
Grammar really generates (), ()(), ()()(), … sequences.
But due to second rule it can generate (()), ()((())), etc.
b is not correct.
This grammar is ambiguous due ot immediate left recursion: A → AA.
How to avoid left recursion: one, two.
a) Nearly right...
This grammar generates exactly the set of strings composed of balanced parenthesis. To see why is that so, let's try to make a quick demonstration.
First: Everything that goes out of your grammar is a balanced parenthesis string. Why?, simple induction:
Epsilon is a balanced (empty) parenthesis string.
if A is a balanced parenthesis string, the (A) is also balanced.
if A1 and A2 are balanced, so is A1A2 (I'm using too different identifiers just to make explicit the fact that A -> AA doesn't necessary produces the same for each A).
Second: Every set of balanced string is produced by your grammar. Let's do it by induction on the size of the string.
If the string is zero-sized, it must be Epsilon.
If not, then being N the size of the string and M the length of the shortest prefix that is balanced (note that the rest of the string is also balanced):
If M = N then you can produce that string with (A).
If M < N the you can produce it with A -> AA, the first M characters with the first A and last N - M with the last A.
In either case, you have to produce a string shorter than N characters, so by induction you can do that. QED.
For example: (()())(())
We can generate this string using exactly the idea of the demonstration.
A -> AA -> (A)A -> (AA)A -> ((A)(A))A -> (()())A -> (()())(A) -> (()())((A)) -> (()())(())
b) Of course left and right recursion is enough to say it's ambiguous, but to see why specially this grammar is ambiguous, follow the same idea for the demonstration:
It is ambiguous because you don't need to take the shortest balanced prefix. You could take the longest balanced (or in general any balanced prefix) that is not the size of the string and the demonstration (and generation) would follow the same process.
Ex: (())()()
You can chose A -> AA and generate with the first A the (()) substring, or the (())() substring.
Yes you are right.
That is what ambigious grammar means.
the problem with mbigious grammars is that if you are writing a compiler, and you want to identify each token in certain line of code (or something like that), then ambigiouity wil inerrupt you in identifying as you will have "two explainations" to that line of code.
It sounds like your approach for part B is correct, showing two independent derivations for the same string in the languages defined by the grammar.
However, I think your answer to part A needs a little work. Clearly you can use the second clause recursively to obtain strings like (((((epsilon))))), but there are other types of derivations possible using the first clause and second clause together.