What does the double asterisk mean in proguard rules? - proguard

What does this rule mean in proguard for example:
-keep class myjava.** {*;}
I understand that the {*;} part means all fields and methods in the class. But what do the two asterisks mean in the package name?
Thanks in advance.

From the manual:
Types in classname, annotationtype, returntype, and argumenttype can contain
wildcards: '?' for a single character, '*' for any number of characters (but
not the package separator), '**' for any number of (any) characters, '%' for
any primitive type, '***' for any type, and '...' for any number of arguments.
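To make the difference between '*' and '**' concrete, here is a rough Python sketch that translates a ProGuard class-name pattern into a regular expression. It is only an illustration of the wildcard rules quoted above, not how ProGuard itself is implemented; the function names are made up for this example, only '?', '*' and '**' are handled, and '?' is assumed not to match the package separator.

```python
import re

def proguard_class_pattern_to_regex(pattern):
    """Translate a ProGuard class-name pattern into a compiled regex (sketch)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")       # '**': any characters, including '.'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^.]*")    # '*': any characters except the package separator
            i += 1
        elif pattern[i] == "?":
            parts.append("[^.]")     # '?': one character (assumed to exclude '.')
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def matches(pattern, class_name):
    """True if the whole class name matches the ProGuard-style pattern."""
    return proguard_class_pattern_to_regex(pattern).fullmatch(class_name) is not None

print(matches("myjava.**", "myjava.sub.pkg.Foo"))  # True: '**' crosses package boundaries
print(matches("myjava.*", "myjava.sub.pkg.Foo"))   # False: '*' stops at the next '.'
```

So -keep class myjava.** {*;} keeps every class in myjava and in all of its subpackages, whereas myjava.* would only cover classes directly in myjava.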


Escaping double quotes inside backticks

Is there a way to use double quotes in a method name, like this?
@Test
fun `should do "something"`() {
// ...
}
It works with ', but not with ". Is there a way to escape double quotes?
(I'm getting a compilation error due to incorrect syntax)
Special characters can be used in method names if escaped with back-ticks.
Your example compiles in my Kotlin project.
But it also depends on the target platform: when compiling for Android, the set of allowed characters is more restrictive.
See discussion here - https://discuss.kotlinlang.org/t/more-characters-allowed-for-identifiers-than-grammar-specifies-what-is-supported/2359/11
And the grammar definition here - https://kotlinlang.org/docs/reference/grammar.html#Identifier

Making character class with modifier symbols in Perl 6

I'd like to make a user-defined character class of "vowels", which will match any literal English vowel letter (a, e, i, o, u) as well as any of these letters with any possible diacritics: ắ ḗ ú̱ å ų̄ ẹ́ etc.
This is what I've tried to do, but it doesn't work:
> my $vowel = / <[aeiou]> <:Sk>* /
/ <[aeiou]> <:Sk>* /
> "áei" ~~ m:g/ <$vowel> /
(「e」 「i」)
You could try using the ignoremark adverb:
The :ignoremark or :m adverb instructs the regex engine to only
compare base characters, and ignore additional marks such as combining
accents.
For your example:
my $vowel = /:m<[aeiou]>/;
.say for "áeikj" ~~ m:g/ <$vowel> /;
Output:
「á」
「e」
「i」
The reason you can't match a vowel with a combining character using / <[aeiou]> <:Sk>* / is that strings in Perl 6 are operated on at the grapheme level. At that level, ų̄ is already just a single character, and <[aeiou]> being a character class already matches one whole character.
The right solution is, as Håkon pointed out in the other answer, to use the ignoremark adverb. You can put it before the regex like rx:m/ <[aeiou]> / or inside of it, or even turn it on and off at different points with :m and :!m.
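For readers who don't use Perl 6, the same idea (compare only the base character and ignore the marks attached to it) can be sketched in Python. This is an approximation, not Perl 6's grapheme handling: Python strings are sequences of code points, so the sketch decomposes the text with NFD and glues combining marks back onto the preceding base character by hand. The function name is invented for the example.

```python
import unicodedata

def match_base_vowels(text, vowels="aeiou"):
    """Return the characters of text whose base letter is a vowel, marks included."""
    decomposed = unicodedata.normalize("NFD", text)
    clusters = []
    for ch in decomposed:
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch       # attach the mark to the previous base character
        else:
            clusters.append(ch)      # start a new base character
    # Compare only the base character, then recompose what is returned.
    return [unicodedata.normalize("NFC", c) for c in clusters if c[0].lower() in vowels]

print(match_base_vowels("\u00e1eikj"))  # ['á', 'e', 'i']
```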

antlr4: need to convert sequences of symbols to characters in lexer

I am writing a parser for Wolfram Language. The language has a concept of "named characters", which are specified by a name delimited by \[, and ]. For example: \[Pi].
Suppose I want to specify a regular expression for an identifier. Identifiers can include named characters. I see two ways to do it: one is to have a preprocessor that would convert all named characters to their unicode representation, and two is to enumerate all possible named characters in their source form as part of the regular expression.
The second approach does not seem feasible because there are a lot of named characters. I would prefer to have ranges of unicode characters in my regex.
So I want to preprocess my token stream. In other words, it seems to me that the lexer needs to check if the named characters syntax is correct and then look up the name and convert it to unicode.
But if the syntax is incorrect or the name does not exist I need to tell the user about it. How do I propagate this error to the user and yet let antlr4 recover from the error and resume? Maybe I can sort of "pipe" lexers/parsers? (I am new to antlr).
EDIT:
In Wolfram Language I can have this string as an identifier: \[Pi]Squared. The part between brackets is called "named character". There is a limited set of named characters, each of which corresponds to a unicode code point. I am trying to figure out how to tokenize identifiers like this.
I could have a rule for my token like this (simplified to just a combination of named characters and ASCII characters):
NAME : ('\\[' [a-z]+ ']'|[a-zA-Z])+ ;
but I would like to check if the named character actually exists (and other attributes such as if it is a letter, but the latter part is outside of the scope of the question), so this regex won't work.
I considered making a list of allowed named characters and just making a long regex that enumerates all of them, but this seems ugly.
What would be a good approach to this?
END OF EDIT
A common approach is to write the lexer/parser to allow syntactically correct input and defer semantic issues to the analysis of the generated parse tree. In this case, the lexer can naively accept named characters:
NChar : NCBeg .*? RBrack ;
fragment NCBeg : '\\[' ;
fragment LBrack: '[' ;
fragment RBrack: ']' ;
Update
In the parser, allow the NChar's to exist in the parse-tree as discrete terminal nodes:
idents : ident+ ;
ident : NChar // named character string
| ID // simple character string?
| Literal // something quoted?
| ....
;
This makes analysis of the parse tree considerably easier: each ident context will contain only one non-null value for a discretely identifiable alt; and isolates analysis of all ordering issues to the idents context.
Update2
For an input \[Pi]Squared, the parse tree form that would be easiest to analyze would be an idents node with two well-ordered children, \[Pi] and Squared.
Best practice would be not to pack both children into the same token: you would then have to manually break the token text into its two parts later to check whether it contains a valid named character and whether the particular sequence of parts is allowable.
No regex is going to allow conclusive verification of the named characters. That will require a list. Tightening the lexer definition of an NChar can, however, achieve a result equivalent to a regex:
NChar : NCBeg [A-Z][A-Za-z]+ RBrack ;
If the concern is that there might be a space after the named character, consider that this circumstance is likely better treated with a semantic warning as opposed to a syntactic error. Rather than skipping whitespace in the lexer, put the whitespace on the hidden channel. Then, in the verification analysis of each idents context, check the hidden channel for intervening whitespace and issue a warning as appropriate.
----
A parse-tree visitor can then examine, validate, and warn as appropriate regarding unknown or misspelled named characters.
To do the validation in the parser, if more desirable, use a predicated rule to distinguish known from unknown named characters:
@members {
ArrayList<String> keyList = .... // list of named chars
public boolean inList(String id) {
return keyList.contains(id) ;
}
}
nChar : known
| unknown
;
known : NChar { inList($NChar.getText()) }? ;
unknown : NChar { error("Unknown " + $NChar.getText()); } ;
The inList function could implement a distance metric to detect misspellings, but correcting the text directly in the parse-tree is a bit complex. Easier to do when implemented as a parse-tree decoration during a visitor operation.
Finally, a scrape and munge of the named characters into a usable map (both unicode and ascii) is likely worthwhile to handle both representations as well as conversions and misspelling.
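Outside of ANTLR, the "accept anything well-formed, validate against a list afterwards" idea might look roughly like the Python sketch below. The table of named characters is a tiny hypothetical stand-in for the scraped map mentioned above, and the function and variable names are made up for illustration. Unknown names produce a warning and processing continues, which is the recovery behaviour the question asks about.

```python
import re

# Hypothetical, deliberately tiny table; a real one would be scraped from the
# Wolfram Language documentation.
NAMED_CHARS = {"Pi": "\u03c0", "Alpha": "\u03b1", "Beta": "\u03b2"}

# Mirrors the lexer split: either a named character or a run of ASCII letters.
TOKEN = re.compile(r"\\\[(?P<nchar>[A-Z][A-Za-z]+)\]|(?P<id>[A-Za-z]+)")

def resolve_identifier(source):
    """Return (resolved_text, warnings) for one identifier-like string."""
    resolved, warnings, pos = [], [], 0
    while pos < len(source):
        m = TOKEN.match(source, pos)
        if m is None:
            warnings.append(f"unexpected character {source[pos]!r} at offset {pos}")
            pos += 1
            continue
        name = m.group("nchar")
        if name is None:
            resolved.append(m.group("id"))            # plain ASCII part
        elif name in NAMED_CHARS:
            resolved.append(NAMED_CHARS[name])        # known named character
        else:
            warnings.append(f"unknown named character \\[{name}] at offset {m.start()}")
            resolved.append(m.group(0))               # keep the source text and carry on
        pos = m.end()
    return "".join(resolved), warnings

print(resolve_identifier(r"\[Pi]Squared"))  # ('πSquared', [])
```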

How to define (f)lex/bison pattern ( /* comment*/ ) and/or ( 100 /* comment*/ )

How can I define the lex patterns ( ), ( /* rem */ ), and ( /* foo */ 100 /* foo */ ) using the GNU (f)lex tool?
_space [ \t]
id [a-zA-Z_]+[a-zA-Z0-9_]
digit [0-9]
math_ops [\+\-\/\*\^\%]
rem_expr (({_space}*)*|("/*".*"*/")*|("//".*)*|([\n]*))*
arr_digid ("("*({digit}*|{id}*)*")"*){arr_expr1}*{math_ops}+
arr_expr1 {rem_expr}*{digit}*{rem_expr}*
arr_expr2 {rem_expr}*
%%
\({arr_expr2}*\) {
return _REM_;
}
\({arr_expr1}*\) {
return _PATTERN2_;
}
Generally, you do not return comments or whitespace from a lexer. Why would you? They are, by definition, not part of the semantics of the program you are trying to parse.
On the whole, the easiest way to deal with them is to just ignore them. Below, the first pattern matches any whitespace character other than newline (Use [[:space:]] to also ignore newlines), and the second one is a way of matching C-style comments. ("/*".*"*/" doesn't work because it will match from the beginning of the first comment on a line to the end of the last one.)
[[:blank:]] ;
[/][*][^*]*[*]+([^/*][^*]*[*]+)*[/] ;
The fact that the patterns do not have an action (or, in general, do not have a return statement in their action) means that the (f)lex-generated scanner will simply proceed to analyze the next token.
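If you want to see the difference between the two comment patterns without building a scanner, here is a quick check using Python's re module. It is only a transcription for experimenting: Python regex syntax is close to flex's for these constructs but not identical, and the comment pattern is written in its usual form, with the parenthesised group repeated zero or more times.

```python
import re

# Naive pattern: '.*' is greedy, so it runs to the last '*/' on the line.
GREEDY = re.compile(r"/\*.*\*/")

# Comment pattern: after each run of stars, either the comment ends with '/'
# or scanning continues with a character that is neither '/' nor '*'.
COMMENT = re.compile(r"/\*[^*]*\*+([^/*][^*]*\*+)*/")

line = "/* foo */ 100 /* bar */"
greedy_matches = [m.group(0) for m in GREEDY.finditer(line)]
precise_matches = [m.group(0) for m in COMMENT.finditer(line)]

print(greedy_matches)   # ['/* foo */ 100 /* bar */']
print(precise_matches)  # ['/* foo */', '/* bar */']
```

With the naive pattern the 100 between the two comments is swallowed, which is exactly the problem described above.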
Some other notes:
It's really not necessary to define a shortcut for every pattern. There is no problem with putting a pattern directly in the lex rules. And you certainly don't need to define shortcuts for character classes which already have shortcuts (like [[:blank:]] and [[:digit:]]).
You don't need to backslash escape characters inside a character class, although with a couple of characters order is important. (That's why I used [*] in the C-comment pattern; I could equally have used "*" or \*, but I personally prefer [*].) So you could have defined:
math_ops [+/*^%-]
The - must go either at the end or the beginning of the list; ^ cannot go at the beginning, and (though you don't use it) ] would have to go at the beginning. The only character which requires backslash-escaping is a backslash itself.
However, my preference is always to let single-character tokens be handled with a single default rule at the end:
. { return yytext[0]; }
This is much more maintainable, and avoids the need to invent arbitrary token names for single-character tokens. You can just use a single-quoted character in your bison/yacc file.

Solve ambiguity in my grammar with LALR parser

I'm using whittle to parse a grammar, but I'm running into the classical LALR ambiguity problem. My grammar looks like this (simplified):
<comment> ::= '{' <string> '}' # string enclosed in braces
<tag> ::= '[' <name> <quoted-string> ']' # [tagname "tag value"]
<name> ::= /[A-Za-z_]+/ # subset of all printable chars
<quoted-string> ::= '"' <string> '"' # string enclosed in quotes
<string> ::= /[:print:]/ # regex for all printable chars
The problem, of course, is <string>. It contains all printable characters and is therefore very greedy. Since it's an LALR parser, it tries to parse a <name> as a <string> and everything breaks. The grammar complicates things because it uses different string delimiters for different things, which is why I tried to make the <string> rule in the first place.
Is there a canonical way to normalize this grammar to make it LALR compliant, if it's even possible?
This is not "the classical LALR ambiguity problem", whatever that might be. It is simply an error in the lexical specification of the language.
I took a quick glance at the Whittle readme, but it didn't bear any resemblance to the grammar in the OP. So I'm assuming that the text in the OP is conceptual rather than literal, and the fact that it includes the obviously incorrect
<string> ::= /[:print:]/ # regex for all printable chars
is just a typo.
Better would have been /[:print:]*/, assuming that Ruby lets you get away with [:print:] rather than the Posix-standard [[:print:]].
But that wouldn't be correct either because lexing (usually) matches the longest possible string, and consequently that will gobble up the closing quote and any following text.
So the correct solution for quoted-string is to write it out correctly:
<quoted-string> ::= /"[^"]*"/
or even
<quoted-string> ::= /"([^\\"]|\\.)*"/
# any number of characters other than quote or escape, or escaped pairs
You might have other ideas about how to escape internal double quotes; those are just examples. In both cases, you need to postprocess the token in order to (at least) strip the double quotes and possibly interpret escape sequences. That's just the way it goes.
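As an illustration of the second pattern plus the postprocessing step, here is a small Python sketch. The escape convention is an assumption made for the example (a backslash simply makes the next character literal), and unquote is a made-up helper name.

```python
import re

# Same shape as the second pattern: characters other than quote or backslash,
# or backslash-escaped pairs, between double quotes.
QUOTED = re.compile(r'"((?:[^\\"]|\\.)*)"')

def unquote(token):
    """Strip the surrounding quotes and resolve backslash escapes (sketch)."""
    m = QUOTED.fullmatch(token)
    if m is None:
        raise ValueError(f"not a quoted string: {token!r}")
    # Assumed convention: '\x' stands for the literal character 'x'.
    return re.sub(r"\\(.)", r"\1", m.group(1))

print(unquote(r'"say \"hi\""'))                 # say "hi"
print(QUOTED.match('"a" rest "b"').group(0))    # "a"
```

The second line shows that the token stops at the first unescaped closing quote instead of gobbling up the following text.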
Your comment sequences present a more difficult issue, assuming that your intention was that a comment might include nested braces (eg. {This comment {with this} ends here}) because the nested brace syntax is not regular and thus cannot be matched with a regular expression. Of course, very few "regular expression" libraries are really regular these days, and I don't know if Ruby contains some sort of brace-counting extension, like for example Lua's pattern syntax. The nested brace syntax is certainly context-free but to actually parse it you need to lexically analyze the contents of the outer {...} in a different way than the rest of the program.
It is this latter observation, and not any weakness in the LALR algorithm, that is causing you pain, and I'd say that this is a weakness with the (mostly undocumented afaics) lexical analysis section of whittle. In a flex-generated lexer, for example, it would be normal to use start conditions to separate the lexical environments (program / quoted string / braced comment), and the parser would then have no ambiguity.
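To show what "separate lexical environments" means in practice, here is a hand-written Python sketch of a tokenizer for the grammar in the question. It is not whittle and not flex; the token names are invented, and the brace counting stands in for what a start condition with a depth counter would do in a flex scanner.

```python
import re

NAME = re.compile(r"[A-Za-z_]+")
QUOTED = re.compile(r'"(?:[^\\"]|\\.)*"')

def tokenize(src):
    """Split src into (kind, text) pairs, switching rules by context (sketch)."""
    tokens, pos = [], 0
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
        elif ch == "{":
            # Comment environment: count braces until the outer one closes.
            depth, start = 1, pos
            pos += 1
            while pos < len(src) and depth > 0:
                if src[pos] == "{":
                    depth += 1
                elif src[pos] == "}":
                    depth -= 1
                pos += 1
            if depth:
                raise ValueError(f"unterminated comment starting at offset {start}")
            tokens.append(("COMMENT", src[start + 1:pos - 1]))
        elif ch == '"':
            # Quoted-string environment: one regex covers the whole token.
            m = QUOTED.match(src, pos)
            if m is None:
                raise ValueError(f"unterminated string starting at offset {pos}")
            tokens.append(("QUOTED", m.group(0)[1:-1]))
            pos = m.end()
        elif ch in "[]":
            tokens.append((ch, ch))
            pos += 1
        else:
            # Program environment: only names are allowed here.
            m = NAME.match(src, pos)
            if m is None:
                raise ValueError(f"unexpected character {ch!r} at offset {pos}")
            tokens.append(("NAME", m.group(0)))
            pos = m.end()
    return tokens

print(tokenize('[Event "F/S Return Match"] {This comment {with this} ends here}'))
```

Because the tokenizer decides which rules apply from the opening delimiter, the parser only ever sees NAME, QUOTED and COMMENT tokens and never has to choose between a name and a string.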
Hope that helps.