JavaCC: Defining a *password* token or grammar rule - passwords

I'm using JavaCC do simulate a small part of SQL grammars, and I'm having a problem with defining a password.
I'm writting grammar rules for a
CREATE USER user_name IDENTIFIED BY a_password
statement, and I'm stuck. Since a password can match with ANYTHING like asdkj*!##, or !#%^%ASDjnkj, _ASDJLJK##& etc. Note that in Oracle, it's totally legal to input your password without single quote mark ('). I could solve this problem easily if the quote marks are compulsory, but unfortunately they're not.
I've tried many ways to define a token/grammar rule for this password, but it didn't work as I expected, the latest rule I've tried is:
TOKEN : {
< S_PASSWORD: ( < DIGIT > | < LETTER > |< S_PASSCHAR >)+ >
| <#S_PASSCHAR : "!"|"#"|"#"|"$"|"%"|"^"|"&"|"*" >
| <#LETTER: ["a"-"z", "A"-"Z", "_"]>
| <#DIGIT: ["0" - "9"]>
}
But since < S_PASSWORD > can match ANYTHING, any other token that I defined earlier will be match with it, and I always get a JavaCC warning like this:
Warning: "#" cannot be matched as a string literal token at line 33515, column 13. It will be matched as < S_PASSWORD >.
There are similar suggestions from my friends, but they didn't work either.
Can someone help me with this?

Assuming there is some lexical way to tell where the password begins and ends, you can use lexical states. For example, if the sequence IDENTIFIED BY is only ever followed by spaces that are followed by a password, you make a state machine so that IDENTIFIED transitions from DEFAULT to S0. In S0 spaces are skipped and BY transitions to S1. In S1 spaces are skipped and a sequence of password characters is a PASSWORD token; the PASSWORD token transitions back to DEFAULT. Of course this only works if IDENTIFIED BY can only ever be followed by a password. Also in S0 you need to by able to deal with all the normal stuff, so most of your token rules should apply in both states S0 and DEFAULT but transition to DEFAULT. See the FAQ for more on lexical states.
If BY is only ever followed by a password, then it is even easier, as you don't need S0.
Edit
Here are some example rules. If the keyword BY is only ever followed by a password, you only need two states
TOKEN : { <BY : "BY"> : S1>
<S1> TOKEN : { <PASSWORD : ( <PASSWORDCHAR> )+ } : DEFAULT }
<DEFAULT, S1> : SKIP { " " } // Stays in the same state.
If you can use IDENTIFIED followed by BY then you need three states
<DEFAULT, S0> TOKEN : { <CREATE : "CREATE"> : DEFAULT } // And similar for most token rules.
<DEFAULT, S0> TOKEN : { <IDENTIFIED : "IDENTIFIED"> : S1 }
<S0> TOKEN : { <BY : "BY"> : S1> // BYs that follow IDENTIFIED
<DEFAULT> TOKEN : { <BY : "BY"> : DEFAULT } BYs that don't follow IDENTIFIED.
<S1> TOKEN : { <PASSWORD : ( <PASSWORDCHAR> )+ } : DEFAULT }
<DEFAULT, S0, S1> : SKIP { " " } // Stays in the same state.

Related

How to commit to an alternation branch in a Raku grammar token?

Suppose I have a grammar with the following tokens
token paragraph {
(
|| <header>
|| <regular>
)
\n
}
token header { ^^ '---' '+'**1..5 ' ' \N+ }
token regular { \N+ }
The problem is that a line starting with ---++Foo will be parsed as a regular paragraph because there is no space before "Foo". I'd like to fail the parse in this case, i.e. somehow "commit" to this branch of the alternation, e.g. after seeing --- I want to either parse the header successfully or fail the match completely.
How can I do this? The only way I see is to use a negative lookahead assertion before <regular> to check that it does not start with ---, but this looks rather ugly and impractical, considering that my actual grammar has many more than just these 2 branches. Is there some better way? Thanks in advance!
If I understood your question correctly, you could do something like this:
token header {
^^ '---' [
|| '+'**1..5 ' ' \N+
|| { die "match failed near position $/.pos()" }
]
}

Can't handle HTTP multiple attribute values in Perl

I'm facing with a really strange issue. I interfaced a SAML authentication with OTRS which is an ITSM written in Perl and the Identity Provider sends the attributes as follow :
LoginName : dev-znuny02
mail : test2#company.dev
Profile : company.autre.idp.v2()
Profile : company.autre.mcf.sp(dev)
givenName : MyName
sn : Test2
I handle these with a module called Mod_Auth_Mellon and as you can see the attribute Profile is multivaluated. In short I retrieve all of these values with the following snippet :
sub new {
my ( $Type, %Param ) = #_;
# allocate new hash for object
my $Self = {};
bless( $Self, $Type );
$Self->{ConfigObject} = $Kernel::OM->Get('Kernel::Config');
$Self->{UserObject} = Kernel::System::User->new( %{$Self} );
# Handle header's attributes
$Self->{loginName} = 'MELLON_LoginName';
$Self->{eMail} = 'MELLON_mail';
$Self->{Profile_0} = 'MELLON_Profile_0';
$Self->{Profile_1} = 'MELLON_Profile_1';
$Self->{gName} = 'MELLON_givenName';
$Self->{sName} = 'MELLON_sn';
return $Self;
}
sub Auth {
my ( $Self, %Param ) = #_;
# get params
my $lname = $ENV{$Self->{loginName}};
my $email = $ENV{$Self->{eMail}};
my $profile0 = $ENV{$Self->{Profile_0}};
my $profile1 = $ENV{$Self->{Profile_1}};
my $gname = $ENV{$Self->{gName}};
my $sname = $ENV{$Self->{sName}};
...
}
I can handle all the values of the attributes except the attribute Profile. When I take a look to the documentation, they said :
If an attribute has multiple values, then they will be stored as MELLON_<name>_0, MELLON_<name>_1, MELLON_<name>_2
To be sure, I activated the diagnostics of the Mellon module and indeed I receive the information correctly :
...
MELLON_LoginName : dev_znuny02
MELLON_LoginName_0 : dev_znuny02
MELLON_mail : test2#company.dev
MELLON_mail_0 : test2#company.dev
MELLON_Profile : company.autre.idp.v2()
MELLON_Profile_0 : company.autre.idp.v2()
MELLON_Profile_1 : company.autre.mcf.sp(dev)
...
When I try to manipulate the MELLON_Profile_0 or MELLON_Profile_1 attributes in the Perl script, the variable assigned to it seems empty. Do you have any idea on what can be the issue here ?
Any help is welcome ! Thanks a lot guys
PS : I have no control on the Identity Provider so I can't edit the attributes sent
I didn't managed to make it work but I found a workaround to prevent users who don't have the Profile attribute value from logging into the application:
MellonCond Profile company.autre.mcf.sp(dev)
according the documentation :
You can also utilize SAML attributes to control whether Mellon authentication succeeds (a form of authorization). So even though the IdP may have successfully authenticated the user you can apply additional constraints via the MellonCond directive. The basic idea is that each MellonCond directive specifies one condition that either evaluates to True or False.

How to get the line of a token in parse rules?

I have searched everywhere and can't find a solution. I am new to ANTLR and for an assignment, I need to print out (using similar syntax that I have below) an error message when my parser comes across an unidentified token with the line number and token. The documentation for antlr4 says line is an attribute for Token objects that gives "[t]he line number on which the token occurs, counting from 1; translates to a call to getLine. Example: $ID.line."
I attempted to implement this in the following chunk of code:
not_valid : not_digit { System.out.println("Line " + $not_digit.line + " Contains Unrecognized Token " $not_digit.text)};
not_digit : ~( DIGIT );
But I keep getting the error
unknown attribute line for rule not_digit in $not_digit.line
My first thought was that I was applying an attribute for a lexer token to a parser rule because the documentation splits Token and Rule attributes into two different tables. so then I change the code to be:
not_valid : NOT_DIGIT { System.out.println("Line " + $NOT_DIGIT.line + " Contains Unrecognized Token " $NOT_DIGIT.text)};
NOT_DIGIT : ~ ( DIGIT ) ;
and also
not_valid : NOT_DIGIT { System.out.println("Line " + $NOT_DIGIT.line + " Contains Unrecognized Token " $NOT_DIGIT.text)};
NOT_DIGIT : ~DIGIT ;
like the example in the documentation, but both times I got the error
rule reference DIGIT is not currently supported in a set
I'm not sure what I am missing. All my searches turn up how to do this in Java outside of the parser, but I need to work in the action code (I think that's what it is called) in the parser.
A block like { ... } is called an action. You embed target specific code in it. So if you're working with Java, then you need to write Java between { and }
A quick demo:
grammar T;
parse
: not_valid EOF
;
not_valid
: r=not_digit
{
System.out.printf("line=%s, charPositionInLine=%s, text=%s\n",
$r.start.getLine(),
$r.start.getCharPositionInLine(),
$r.start.getText()
);
}
;
not_digit
: ~DIGIT
;
DIGIT
: [0-9]
;
OTHER
: ~[0-9]
;
Test it with the code:
String source = "a";
TLexer lexer = new TLexer(CharStreams.fromString(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();
which will print:
line=1, charPositionInLine=0, text=a

How to override the NQPMatch.Str function

... Or how to change $<sigil>.Str value from token sigil { ... } idependently from the matched text. Yes I'm asking how to cheat grammars above (i.e. calling) me.
I am trying to write a Slang for Raku without sigil.
So I want the nogil token, matching anything <?> to return NqpMatch that stringifies: $<sigil>.Str to '$'.
Currently, my token sigil look like that
token sigil {
| <[$#%&]>
| <nogil> { say "Nogil returned: ", lk($/, 'nogil').Str; # Here It should print "$"
}
}
token nogil-proxy {
| '€'
| <?>
{log "No sigil:", get-stack; }
}
And the method with that should return a NQPMatch with method Str overwritten
method nogil {
my $cursor := self.nogil-proxy;
# .. This si where Nqp expertise would be nice
say "string is:", $cursor.Str; # here also it should print "$"
return $cursor;
}
Failed try:
$cursor.^cache_add('Str', sub { return '$'; } );
$cursor.^publish_method_cache;
for $cursor.^attributes { .name.say };
for $cursor.^methods { .name.say };
say $cursor.WHAT.Str;
nqp::setmethcacheauth($cursor, 0);
Currently, most of my tests work but I have problems in declarations without my (with no strict) like my-var = 42; because they are considered as method call.
#Arne-Sommer already made a post and an article. This is closely related. But this questions aims:
How can we customize the return value of a compile-time token and not how to declare it.
Intro: The answer, pointed by #JonathanWorthington:
Brief: Use the mixin meta function. (And NOT the but requiring compose method.)
Demo:
Create a NQPMatch object by retrieving another token: here the token sigil-my called by self.sigil-my.
Use ^mixin with a role
method sigil { return self.sigil-my.^mixin(Nogil::StrGil); }
Context: full reproducible code:
So you can see what type are sigil-my and Nogil::StrGil. But I told you: token (more than method) and role (uninstantiable classes).
role Nogil::StrGil {
method Str() {
return sigilize(callsame);
}
}
sub EXPORT(|) {
# Save: main raku grammar
my $main-grammar = $*LANG.slang_grammar('MAIN');
my $main-actions = $*LANG.slang_actions('MAIN');
role Nogil::NogilGrammar {
method sigil {
return self.sigil-my.^mixin(Nogil::StrGil);
}
}
token sigil-my { | <[$#%&]> | <?> }
# Mix
my $grammar = $main-grammar.^mixin(Nogil::NogilGrammar);
my $actions = $main-actions.^mixin(Nogil::NogilActions);
$*LANG.define_slang('MAIN', $grammar, $actions);
# Return empty hash -> specify that we’re not exporting anything extra
return {};
}
Note: This opens the door to mush more problems (also pointed by jnthn question comments) -> -0fun !

JavaCC grammar - parse everything to the end of the file

I am using FeatureBNF (and so in essence I am using JavaCC) to try and write a grammar that will produce a (very) simple parser to parse Gherkin files.
An example Gherkin file:
Feature: Calculator
In order to avoid silly mistakes
As a math idiot
I want to be told the sum of two numbers
Scenario: Add two numbers
Given I have entered 50 into the calculator
And I have also entered 70 into the calculator
When I press add
Then the result should be 120 on the screen
All I want to do, to begin with, is parse this into a Feature that has the name Calculator and a Body, which is the entirety of the rest of the file.
I have struggled with the part where I'm trying to read the rest of the file into the Body however. I think maybe partially because there is no 'natural' delimiters for when one section ends - it's denoted by a newline.
Trying the following grammar:
<DEFAULT> TOKEN :
{
<FEATURE: "Feature: " >
| <#LETTER: ["\u0027","\u0041"-"\u005a","\u005f","\u0061"-"\u007a"] >
| <FEATURE_NAME: (<LETTER>)+ >
| <NEWLINE: ("\r\n" | "\n\r" | "\r" | "\n") >
| <TEXT : ~[] >
}
GRAMMARSTART
Feature :
<FEATURE> FeatureName <NEWLINE>
Body
<EOF>
;
FeatureName: <FEATURE_NAME>;
Body: (<TEXT>)*;
I get the error:
[java] java.lang.reflect.InvocationTargetException
... lots of stack trace removed...
[java] Caused by: cide.gparser.ParseException: Encountered "\r\n" (5) at line 2, column 1.
[java] Was expecting one of:
[java] <EOF>
[java] <TEXT> ...
I have been able to achieve what I want by adding some delimiters in to the Gherkin file and using lexical states, like so:
Feature: Calculator #TITLEEND
#BODYSTART
In order to avoid silly mistakes
As a math idiot
I want to be told the sum of two numbers
Scenario: Add two numbers
Given I have entered 50 into the calculator
And I have also entered 70 into the calculator
When I press add
Then the result should be 120 on the screen
#BODYEND
With the following relevant parts of the grammar:
<DEFAULT, IN_BODY> SPECIAL_TOKEN : {
" " | "\t" | "\n" | "\r" | "\f"
}
<DEFAULT> TOKEN : {
<FEATURE: "Feature: " >
| <#LETTER: ["\u0027", "\u0041"-"\u005a", "\u005f", "\u0061"-"\u007a"] >
| <FEATURE_NAME: (<LETTER>)+ >
| <ENDFEATURETITLE: "#TITLEEND" >
}
<DEFAULT> TOKEN : { <BODYSTART : "#BODYSTART"> : IN_BODY }
<IN_BODY> TOKEN : { <TEXT : ~[] > }
<IN_BODY> TOKEN : { <BODYEND : "#BODYEND"> : DEFAULT }
GRAMMARSTART
Feature:
<FEATURE> FeatureName <ENDFEATURETITLE>
Body
<EOF>;
FeatureName: <FEATURE_NAME>;
Body: <BODYSTART> Text <BODYEND>;
Text: (<TEXT>)*;
But I am sure I must be missing something and would like to be able to achieve this without having to annotate the feature files. What is a better way to do this?
SIDE NOTE
FeatureBNF builds on top of JavaCC and outputs a grammar file for JavaCC to process. I am completely new to both FeatureBNF and JavaCC, but they seem similar enough that I hope this question might be applicable to JavaCC gurus. (FeatureBNF uses JavaCC syntax for the lexical specifications and then its own format for the grammar's production rules.)
Based on your grammar, you can switch states after the first newline, so the following lexical grammar will suffice:
<DEFAULT> TOKEN : {
<FEATURE: "Feature: " >
| <#LETTER: ["\u0027", "\u0041"-"\u005a", "\u005f", "\u0061"-"\u007a"] >
| <FEATURE_NAME: (<LETTER>)+ >
| <ENDFEATURETITLE: "#TITLEEND" >
| <NEWLINE: ("\r\n" | "\n\r" | "\r" | "\n") > : IN_BODY
}
<IN_BODY> TOKEN : { <TEXT : ~[] > }
Now the syntactic grammar is
Feature:
<FEATURE> FeatureName <NEWLINE>
Body
<EOF>;
FeatureName: <FEATURE_NAME>;
Body: (<TEXT>)*;