ANTLR (Lexer): separate arbitrary identifiers from keywords

I'm trying to create a (simple) Lexer for bat/cmd files (for syntax coloring). As part of this task, I need to separate keywords from (arbitrary) identifiers. But according to this answer ANTLR tries to let the longest match win over shorter ones. My grammar looks like this so far
lexer grammar CmdLexer;
Identifier
: IdentifierNonDigit
( IdentifierNonDigit
| Digit
)+
;
Number
: Digit+
;
fragment IdentifierNonDigit
: [a-zA-Z_\u0080-\uffff]
;
fragment Digit
: [0-9]
;
Punctuation
: [\u0021-\u002f\u003a-\u0040\u005b-\u0060\u007b-\u007f]+
;
Keyword
: A P P E N D
| A T
| A T T R I B
| B R E A K
| C A L L
| C D
| C H C P
| C H D I R
| C L S
| C O L O R
| C O P Y
| D A T E
| D E L
| D I R
| D O
| E C H O
| E D I T
| E N D L O C A L
| E Q U
| E X I S T
| E X I T
| F C
| F O R
| F T Y P E
| G O T O
| G E Q
| G T R
| I F
| I N
| L E Q
| L S S
| M D
| M K D I R
| M K L I N K
| M O R E
| M O V E
| N E Q
| N O T
| N U L
| P A T H
| P A U S E
| P O P D
| P U S H D
| R D
| R E N
| R E N A M E
| S E T
| S E T L O C A L
| S H I F T
| S T A R T
| T I T L E
| T R E E
| T Y P E
| W H E R E
| W H O A M I
| X C O P Y
;
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
Whitespace
: [ \t]+
-> skip
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
LineComment
: ( '#'? R E M ~[\r\n]*
| '#'? '::' ~[\r\n]*
)
-> skip
;
but it always matches everything as Identifier, even words like append or CALL. I don't see how modes would solve this problem here, so how can I give one rule (here Keyword) higher priority than another (here Identifier)?

But according to this answer ANTLR tries to let the longest match win over shorter ones.
It does and that should be what you want. Note that this rule (the so-called maximal munch rule) has nothing to do with whether append is matched as a keyword or identifier. It has to do with whether appendix is matched as the keyword append, followed by the identifier ix; or as the single identifier appendix. Since the latter is clearly what one wants in most contexts, the maximal munch rule is useful.
What matters in this case though is what happens if multiple rules produce a match of the same length. In that case ANTLR applies the rule that is defined first in the grammar. So if you change the order of your definitions so that Keyword comes before Identifier, the Keyword rule will take precedence in cases where both rules would produce a match of the same length (and the longest match would still win in cases where the lengths differ). So an input like append appendix would be tokenized as the keyword append, followed by the identifier appendix, which should be what you want.
PS: I'm not sure where/how your lexer is going to be used, but generally you'd want to distinguish between different keywords instead of having one rule that matches all keywords. If the tokens are going to be used as input to a parser, the information that something is a keyword is not very useful without knowing which keyword it is.
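The two rules described above (longest match wins, ties go to the rule defined first) can be sketched outside ANTLR. The following is a minimal Python illustration, not ANTLR's actual implementation; the RULES list, the shortened keyword subset and the tokenize function are all hypothetical names chosen for this example.

```python
import re

# Rules in definition order; on a tie the earlier rule wins, as described above.
# The keyword list is a shortened, hypothetical subset of the one in the question.
# No keyword here is a prefix of another, so the alternation yields the longest keyword.
RULES = [
    ("Keyword", re.compile(r"append|call|echo", re.IGNORECASE)),
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z_0-9]+")),
    ("Whitespace", re.compile(r"[ \t]+")),
]

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # A strictly longer match replaces the current best (maximal munch).
            # Using '>' rather than '>=' keeps the earlier rule on a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        if best_name != "Whitespace":  # mimics '-> skip'
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

print(tokenize("append appendix CALL"))
```

With Keyword listed first, append and CALL come out as keywords while appendix stays an identifier because the Identifier match is longer. Moving the Identifier entry ahead of Keyword in RULES reproduces the behaviour from the question, where every word is reported as Identifier.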

Related

Removing indirect left recursion from this grammar

I'm trying to figure out how to remove the indirect left recursion from the logical keyword expressions within my Rust port of a Ruby parser (https://github.com/kenaniah/ruby-parser/blob/master/ruby-parser/src/parsers/expression/logical.rs). The grammar looks like:
E --> N | A | O | t
N --> n E
A --> E a E
O --> E o E
E = expression
A = keyword_and_expression
O = keyword_or_expression
N = keyword_not_expression
How would I go about transforming this to remove the recursion in A and O?
According to this factorization tool, the resulting grammar would be:
E -> N
| A
| O
| t
N -> n E
A -> n E a E A'
| O a E A'
| t a E A'
O -> n E o E O'
| n E a E A' o E O'
| t a E A' o E O'
| t o E O'
A' -> a E A'
| ϵ
O' -> a E A' o E O'
| o E O'
| ϵ
Looks like the factorizations for A and O ended up being rather complex thanks to the multiple productions of E.
I think the problem here is not the indirect recursion but rather the ambiguity.
If it were just indirect recursion, you could simply substitute the right-hand sides of N, A and O, eliminating the indirect recursion:
E → n E
| E a E
| E o E
| t
In order to get rid of the direct left recursion, you can left-factor:
E → n E
| E A'
| t
A'→ a E
| o E
and then remove the remaining left-recursion:
E → n E E'
| t E'
E'→ ε
| A' E'
A'→ a E
| o E
That has no left-recursion, direct or indirect, and every rule starts with a unique terminal. However, it's not LL(1), because there's a first/follow conflict.
That's really not surprising, because the original grammar was highly ambiguous, and left-recursion elimination doesn't eliminate ambiguity. The original grammar really only makes sense if accompanied by an operator precedence table.
That table indicates that AND and OR are left-associative operators with the same precedence (a slightly unusual decision), while NOT is a unary operator with higher precedence. That, in turn, means that the BNF should have been written something like this:
E → A
| O
| N
A → E a N
O → E o N
N → n N
| t
The only difference from the grammar in the OP is the removal of ambiguity, as indicated by the precedence table.
Again, the first step is to substitute non-terminals A and O in order to make the left-recursion direct:
E → E a N
| E o N
| N
N → n N
| t
This is essentially the same grammar as the grammar for arithmetic expressions (ignoring multiplication, since there's only one precedence level), and the left-recursion can be eliminated directly:
E → N E'
E' → a E
| o E
| ε
N → n N
| t
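As a rough check that the final grammar is usable for a hand-written parser, here is a minimal recursive-descent sketch in Python (the question's project is in Rust; Python is used here only for brevity). It assumes single-character tokens n, a, o and t, and the function names are made up for this example. Since E → N E' is the only production for E, E' → a E is the same as a N E', so E' is written as a loop, which also makes it easy to build a left-associative tree.

```python
def parse(tokens):
    """Parse a token string such as 'ntat' into a nested tuple, or raise ValueError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_n():
        # N -> n N | t
        if peek() == "n":
            advance()
            return ("not", parse_n())
        if peek() == "t":
            advance()
            return "t"
        raise ValueError(f"expected 'n' or 't' at position {pos}")

    def parse_e():
        # E -> N E', with E' -> a N E' | o N E' | epsilon written as a loop
        left = parse_n()
        while peek() in ("a", "o"):
            op = "and" if peek() == "a" else "or"
            advance()
            # Folding into 'left' makes both operators left-associative
            left = (op, left, parse_n())
        return left

    tree = parse_e()
    if pos != len(tokens):
        raise ValueError(f"unexpected token at position {pos}")
    return tree

print(parse("tatot"))
print(parse("ntat"))
```

Each function decides what to do by looking at one token only, which is the property the transformed grammar was meant to provide.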

SQL pairing adjacent rows

Using PostgreSQL (latest). I'm a total noob.
I have a view that always gives me a table with an even number of rows, with no duplicates (the letters are analogous to unique keys) and no nulls. Let's call it letter_view:
| letter |
|-------------|
| A |
| B |
| C |
| D |
| E |
| F |
| G |
| H |
My view already uses an ORDER BY clause so the table is pre-sorted.
What I'm trying to do is merge every two rows into a single row with each value from those two rows. So for n rows, I need the result set to have n / 2 rows with combined adjacent rows.
| l1 | l2 |
|-------|------|
| A | B |
| C | D |
| E | F |
| G | H |
I've tried using lead and I think I'm close but I can't quite get it in the format I need.
My best query attempt looks like this:
SELECT letter AS letter_1, lead(letter, 1) OVER (PARTITION BY 2) AS letter_2 from letter_view;
but I get:
letter_1 | letter_2
----------+----------
A | B
B | C <--- Don't need this
C | D
D | E <--- Don't need this
E | F
F | G <--- Don't need this
G | H
H | <--- Don't need this
(8 rows)
I checked several other answers on SO, and looked through
the PostgreSQL docs and w3C SQL tutorials but I can't find a succinct answer.
What is this technique called and how would I do it?
I'm trying to do this in pure SQL if possible.
I know I could use multiple queries with LIMIT and OFFSET to get the data with multiple selects or potentially by using a cursor but that seems very inefficient for large input sets although I could be totally wrong. Again, total noob.
Any help in the right direction is highly appreciated.
You can use lead() to get the next value . . . but you need a way to filter as well. I would suggest row_number():
select letter_1, letter_2
from (select letter AS letter_1,
lead(letter, 1) OVER (PARTITION BY 2 order by ??) AS letter_2,
row_number() over (partition by 2 order by ??) as seqnum
from letter_view
) lv
where seqnum % 2 = 1;
Notes:
I included the partition clause as you have in the original code. I don't know what "2" refers to.
You should be explicit about the order by. It is not wise to depend on the ordering of some underlying table or view.
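To make the logic of that query concrete, here is a small Python sketch that mimics row_number() and lead() over an in-memory list. It is only an illustration of the filtering idea, not a replacement for the SQL; the function name pair_adjacent is made up, and the input is assumed to be already sorted in the intended order.

```python
def pair_adjacent(rows):
    """Pair each odd-numbered row with the row that follows it."""
    pairs = []
    # seqnum plays the role of row_number(): 1, 2, 3, ...
    for seqnum, value in enumerate(rows, start=1):
        # Like lead(value, 1): the next row's value, or None past the end.
        # rows is 0-based, so index seqnum is the row after the current one.
        next_value = rows[seqnum] if seqnum < len(rows) else None
        # Same filter as 'where seqnum % 2 = 1'
        if seqnum % 2 == 1:
            pairs.append((value, next_value))
    return pairs

print(pair_adjacent(list("ABCDEFGH")))
```

Keeping only the odd-numbered rows drops exactly the B, D, F and H rows marked "Don't need this" in the question.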

MS Access: Show entire group, based on another field

I would like to show all classes and grades of only those students who have at least one "F" grade.
Here is the source table:
ID | Students | Class | Grade
1 | Addams, W | History | A
2 | Addams, W | Biology | A
3 | Addams, W | French | B
4 | Jetson, E | Spanish | B
5 | Jetson, E | Geometry | B
6 | Jetson, E | Biology | F
7 | Rubble, B | English | F
8 | Rubble, B | Geometry | B
9 | Rubble, B | Biology | B
10 | Flintstone, P | Music | A
11 | Flintstone, P | Spanish | B
Here is a report, grouped by Students:
Addams, W
---------------French B
---------------Biology A
---------------History A
Flintstone, P
---------------Spanish B
---------------Music A
Jetson, E
---------------Biology F
---------------Geometry B
---------------Spanish B
Rubble, B
---------------Biology B
---------------Geometry B
---------------English F
Again, I would like to show all classes and grades of only those students who have at least one "F" grade, as seen below:
Jetson, E
---------------Biology F
---------------Geometry B
---------------Spanish B
Rubble, B
---------------Biology B
---------------Geometry B
---------------English F
Any assistance would be much appreciated.
Create a query with your table as the source. Drop in the Students field in the first column and put the following formula in the second column: IIf([Grade]="F",1,0), then save the query. (By default Access will name this column "Expr1", but you can change it to whatever you like.)
Create a 2nd query using query 1 as the source, drop in the 2 columns from query 1, group on the Students column, sum the column with the formula, add a criterion of >=1 to that column, and save. You now have a table with only the students who have at least one "F". (You enable grouping by placing the cursor in the grid section at the bottom of the query, right-clicking, and selecting "Totals" from the prompt box.)
Create a 3rd query tying the 2nd query to your original source table by connecting the Students field with a 1 to 1 match (i.e. join type 1).
You can query your table using a subquery on the same table to find any students having an F grade:
SELECT a.ID, a.Students, a.Class, a.Grade
FROM yourtable AS a
WHERE EXISTS
(
SELECT '1'
FROM yourtable AS b
WHERE a.Students = b.Students
AND b.Grade = 'F'
);
Next, base your report on the above query.
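The EXISTS query can be tried outside Access. The sketch below runs it against an in-memory SQLite database from Python, using the sample rows from the question. SQLite is only a stand-in here (Access SQL differs in places), and the ORDER BY is added just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (ID INTEGER, Students TEXT, Class TEXT, Grade TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, "Addams, W", "History", "A"),
        (2, "Addams, W", "Biology", "A"),
        (3, "Addams, W", "French", "B"),
        (4, "Jetson, E", "Spanish", "B"),
        (5, "Jetson, E", "Geometry", "B"),
        (6, "Jetson, E", "Biology", "F"),
        (7, "Rubble, B", "English", "F"),
        (8, "Rubble, B", "Geometry", "B"),
        (9, "Rubble, B", "Biology", "B"),
        (10, "Flintstone, P", "Music", "A"),
        (11, "Flintstone, P", "Spanish", "B"),
    ],
)

# Keep every row of a student for whom at least one 'F' row exists
query = """
SELECT a.ID, a.Students, a.Class, a.Grade
FROM yourtable AS a
WHERE EXISTS (
    SELECT 1
    FROM yourtable AS b
    WHERE a.Students = b.Students
      AND b.Grade = 'F'
)
ORDER BY a.ID
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Only the rows for Jetson, E and Rubble, B are returned, including their non-F grades.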
OP here: Both answers worked; thanks again. Here is the code for the second answer:
Query 1
SELECT Students2.Students, IIf([Grade]="F",1,0) AS F_grade
FROM Students2;
Query 2
SELECT Query1.Students, Sum(Query1.F_grade) AS SumOfF_grade
FROM Query1
GROUP BY Query1.Students
HAVING (((Sum(Query1.F_grade))>=1));
Query 3
SELECT Students2.Students, Students2.Class, Students2.Grade
FROM Students2 INNER JOIN Query2 ON Students2.Students = Query2.Students;

Verifying a cycle of values in table

I have a table which looks like this (the real table has dates and time in place of the Letters):
| assigned | start | end
| xyz | A | B
| xyz | B | C
| xyz | C | D
| xyz | D | E
| xyz | E | F
| fgh | A | B
| fgh | B | C
etc.
There is a rotation with each assigned code (xyz,fgh and so on) where 'end' is congruent with the next 'start' up to a value indicating a defined end (here 'F').
I am looking for a statement which scans/verifies that this rotation is indeed occurring, that it starts at A and ends with F and did every step up until then.
Any help is greatly appreciated.
edit: The rotation always uses 5 rows (or 4 steps), even if the interval length can change in between.
This is really a hack that works because the dates are replaced by characters, but it might give you ideas on how to make it work for real.
select * from (
select a_code, min(a_start) as thestart, max(a_end) as theend,
substring(group_concat(a_start order by a_start), 3) as starts,
substring(group_concat(a_end order by a_end), 1, length(group_concat(a_end))-2) as ends
from so_test
group by a_code ) as grpSelect
where thestart = 'a'
and theend = 'f'
and starts = ends
The group_concat of a_start for xyz produces the string 'a,b,c,d,e', while the group_concat of a_end produces 'b,c,d,e,f'. The substring calls remove the a from the start of the first string and the f from the end of the second, so that the outer query can compare 'b,c,d,e' in both strings.
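For real dates, the same check may be easier to express procedurally. The following Python sketch is one possible way to do it, under the assumption that the rows have been fetched as (assigned, start, end) tuples and that start values sort in rotation order; the function name complete_rotations and its parameters are made up for this example.

```python
from collections import defaultdict

def complete_rotations(rows, first="A", last="F"):
    """Return the assigned codes whose steps chain from `first` to `last` without gaps."""
    by_code = defaultdict(list)
    for assigned, start, end in rows:
        by_code[assigned].append((start, end))

    complete = []
    for code, steps in by_code.items():
        steps.sort()  # order the steps by their start value
        # Every step must end where the next one starts
        chained = all(prev[1] == nxt[0] for prev, nxt in zip(steps, steps[1:]))
        if chained and steps[0][0] == first and steps[-1][1] == last:
            complete.append(code)
    return complete

rows = [
    ("xyz", "A", "B"), ("xyz", "B", "C"), ("xyz", "C", "D"),
    ("xyz", "D", "E"), ("xyz", "E", "F"),
    ("fgh", "A", "B"), ("fgh", "B", "C"),
]
print(complete_rotations(rows))
```

Here xyz is reported as complete, while fgh is not because its last step ends at C rather than F.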

Set column value if certain record exists in same table

How could I create a view for a table with a subset of the table's records, all of the table's columns, plus an additional "flag" column whose value is set to 'X' if the table contains a certain type of record? For example, consider the following relations table Relations, where the values for type stand for 'H'-human, 'D'-dog:
id | type | relation | related
--------------------------------
H1 | H | knows | D2
H1 | H | owns | D2
H2 | H | knows | D1
H2 | H | owns | D1
H3 | H | knows | D1
H3 | H | knows | D2
H3 | H | treats | D1
H3 | H | treats | D2
D1 | D | bites | H3
D2 | D | bites | H3
There may not be any particular order of records in this table.
I seek to create a view Humans which will contain all human-to-dog knows relations from Relations, all of Relations' columns, and an additional column isOwner storing 'X' if the human in a given relation owns someone:
id | type | relation | related | isOwner
------------------------------------------
H1 | H | knows | D2 | X
H2 | H | knows | D1 | X
H3 | H | knows | D1 |
but I am struggling quite a bit with this. Do you know of a way to do it, preferably in one CREATE VIEW call, or any way really?
CREATE VIEW vHumanDogRelations
AS
SELECT
id,
type,
relation,
related,
-- Consider using a bit 0/1 instead
CASE
WHEN EXISTS (
SELECT 1
FROM Relations
WHERE
id = r.id
AND related = r.related -- Owns someone or this related only?
AND relation = 'owns'
) THEN 'X'
ELSE ''
END AS isOwner
FROM Relations r
WHERE
relation = 'knows'
AND type = 'H'
AND related LIKE 'D%'; -- related holds ids such as D1 and D2, so match on the prefix
You should be able to put the following select into the view definition:
select *, case when exists
(select * from Relations where id = r.id and relation= 'owns') then 'X'
else '' end as isOwner
from Relations r
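The CASE WHEN EXISTS approach can be tried against the sample data with an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite stands in for the asker's database, the aliases r and o are arbitrary, and the flag is set when the human owns anyone at all, which is one reading of the requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Relations (id TEXT, type TEXT, relation TEXT, related TEXT)")
conn.executemany(
    "INSERT INTO Relations VALUES (?, ?, ?, ?)",
    [
        ("H1", "H", "knows", "D2"), ("H1", "H", "owns", "D2"),
        ("H2", "H", "knows", "D1"), ("H2", "H", "owns", "D1"),
        ("H3", "H", "knows", "D1"), ("H3", "H", "knows", "D2"),
        ("H3", "H", "treats", "D1"), ("H3", "H", "treats", "D2"),
        ("D1", "D", "bites", "H3"), ("D2", "D", "bites", "H3"),
    ],
)

# One view: 'knows' rows of humans, flagged when the same id has any 'owns' row
conn.execute("""
CREATE VIEW Humans AS
SELECT r.id, r.type, r.relation, r.related,
       CASE WHEN EXISTS (
                SELECT 1
                FROM Relations AS o
                WHERE o.id = r.id
                  AND o.relation = 'owns'
            ) THEN 'X' ELSE '' END AS isOwner
FROM Relations AS r
WHERE r.type = 'H'
  AND r.relation = 'knows'
""")

humans = conn.execute("SELECT * FROM Humans ORDER BY id, related").fetchall()
for row in humans:
    print(row)
```

H1 and H2 are flagged with 'X', while both of H3's knows rows get an empty flag because H3 has no owns row.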
You could also use PIVOT to achieve the desired result. I'll explain the method in detail because the final query may appear confusing.
First, derive a subset from Relations where type is H and relation is either knows or owns, replacing the owns value with X:
SELECT
id,
type,
relation = CASE relation WHEN 'owns' THEN 'X' ELSE relation END,
related
FROM Relations
WHERE type = 'H'
AND relation IN ('knows', 'owns')
Based on your example, you'll get this:
id type relation related
-- ---- -------- -------
H1 H knows D2
H1 H owns D2
H2 H knows D1
H2 H owns D1
H3 H knows D1
H3 H knows D2
Next, apply this PIVOT clause to the result of the first query:
PIVOT (
MAX(relation) FOR relation IN (knows, X)
) AS p
It will group rows with identical id, type, related values into a single row and split relation into two columns, knows and X:
id type related knows X
-- ---- ------- ----- ----
H1 H D2 knows X
H2 H D1 knows X
H3 H D1 knows NULL
H3 H D2 knows NULL
At this point you only need to rearrange the column set slightly for the output in the main SELECT clause, renaming knows to relation and X to isOwner along the way:
SELECT
id,
type,
relation = knows,
related,
isOwner = X
...
Output:
id type relation related isOwner
-- ---- -------- ------- -------
H1 H knows D2 X
H2 H knows D1 X
H3 H knows D1 NULL
H3 H knows D2 NULL
NULLs, of course, can easily be substituted with empty strings, if that is necessary.
One final touch may be to add this additional filter to the main query:
WHERE knows IS NOT NULL
just in case there can be people that own dogs without actually knowing them (and you don't want those in the output).
So, the complete query would look like this:
SELECT
id,
type,
relation = knows,
related,
isOwner = X
FROM (
SELECT
id,
type,
relation = CASE relation WHEN 'owns' THEN 'X' ELSE relation END,
related
FROM Relations
WHERE type = 'H'
AND relation IN ('knows', 'owns')
) AS s
PIVOT (
MAX(relation) FOR relation IN (knows, X)
) AS p
WHERE knows IS NOT NULL
;
A SQL Fiddle demo for this solution is available here.