Removing indirect left recursion from this grammar

I'm trying to figure out how to remove the indirect left recursion from the logical keyword expressions within my Rust port of a Ruby parser (https://github.com/kenaniah/ruby-parser/blob/master/ruby-parser/src/parsers/expression/logical.rs). The grammar looks like:
E --> N | A | O | t
N --> n E
A --> E a E
O --> E o E
E = expression
A = keyword_and_expression
O = keyword_or_expression
N = keyword_not_expression
How would I go about transforming this to remove the recursion in A and O?

According to this factorization tool, the resulting grammar would be:
E -> N
| A
| O
| t
N -> n E
A -> n E a E A'
| O a E A'
| t a E A'
O -> n E o E O'
| n E a E A' o E O'
| t a E A' o E O'
| t o E O'
A' -> a E A'
| ϵ
O' -> a E A' o E O'
| o E O'
| ϵ
Looks like the factorizations for A and O ended up being rather complex thanks to the multiple productions of E.

I think the problem here is not the indirect recursion but rather the ambiguity.
If it were just indirect recursion, you could simply substitute the right-hand sides of N, A and O, eliminating the indirect recursion:
E → n E
| E a E
| E o E
| t
In order to get rid of the direct left recursion, you can left-factor:
E → n E
| E A'
| t
A'→ a E
| o E
and then remove the remaining left-recursion:
E → n E E'
| t E'
E'→ ε
| A' E'
A'→ a E
| o E
That has no left-recursion, direct or indirect, and every rule starts with a unique terminal. However, it's not LL(1), because there's a first/follow conflict.
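The first/follow conflict can be checked mechanically. The following is a minimal sketch, not part of the original answer, that computes FIRST and FOLLOW sets for the three-rule grammar above with a plain fixed-point iteration. The names `GRAMMAR`, `compute_first` and `compute_follow` are made up for illustration, and `$` is assumed as the end-of-input marker.

```python
EPS = "ε"

# The grammar after left-recursion removal; [] is the empty production.
GRAMMAR = {
    "E":  [["n", "E", "E'"], ["t", "E'"]],
    "E'": [[], ["A'", "E'"]],
    "A'": [["a", "E"], ["o", "E"]],
}

def first_of_seq(seq, first):
    """FIRST set of a symbol sequence; contains EPS if the whole sequence can vanish."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                new = first_of_seq(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first, start="E"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue  # terminals have no FOLLOW set
                    rest = first_of_seq(prod[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)

# E' is nullable, so LL(1) requires FIRST(E') and FOLLOW(E') to be disjoint.
conflict = (FIRST["E'"] - {EPS}) & FOLLOW["E'"]
print(sorted(conflict))  # ['a', 'o']
```

Both `a` and `o` can start E' and can also follow it, so with one token of lookahead the parser cannot tell whether to expand E' or to take the empty production.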
That's really not surprising, because the original grammar was highly ambiguous, and left-recursion elimination doesn't eliminate ambiguity. The original grammar really only makes sense if accompanied by an operator precedence table.
That table indicates that AND and OR are left-associative operators with the same precedence (a slightly unusual decision), while NOT is a unary operator with higher precedence. That, in turn, means that the BNF should have been written something like this:
E → A
| O
| N
A → E a N
O → E o N
N → n N
| t
The only difference from the grammar in the OP is the removal of ambiguity, as indicated by the precedence table.
Again, the first step is to substitute non-terminals A and O in order to make the left-recursion direct:
E → E a N
| E o N
| N
N → n N
| t
This is essentially the same grammar as the grammar for arithmetic expressions (ignoring multiplication, since there's only one precedence level), and the left-recursion can be eliminated directly:
E → N E'
E' → a E
| o E
| ε
N → n N
| t
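As an illustration of how this final grammar maps onto a hand-written parser, here is a minimal recursive-descent sketch in Python (the question's parser is in Rust; Python is used here only for brevity). It assumes tokens are the single-character strings `n`, `a`, `o` and `t`, and the tuple-based AST shape is invented for the example. The E' rule is unrolled into a loop, which is one common way to keep AND and OR left-associative.

```python
def parse(tokens):
    """Parse a list of tokens ('n', 'a', 'o', 't') into a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_n():
        # N → n N | t
        if peek() == "n":
            advance()
            return ("not", parse_n())
        if peek() == "t":
            advance()
            return "t"
        raise SyntaxError(f"expected 'n' or 't' at position {pos}, found {peek()!r}")

    def parse_e():
        # E → N E', where E' → a E | o E | ε is unrolled into a loop.
        # Folding into `left` on each iteration gives left associativity.
        left = parse_n()
        while peek() in ("a", "o"):
            op = "and" if advance() == "a" else "or"
            left = (op, left, parse_n())
        return left

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse(list("tantot")))
# ('or', ('and', 't', ('not', 't')), 't')
```

The input `t a n t o t` groups as `(t and (not t)) or t`: NOT binds tighter than the binary operators, and the binary operators associate to the left with equal precedence.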


Recursive split of path with H2 DB and SQL

I've path names of the following common form (path depth not limited):
/a/b/c/d/e/...
Example
/a/b/c/d/e
Expected result
What I'd like to achieve now is to split the path into a table containing the folder and the respective parent:
| parent | folder |
|---|---|
| /a/b/c/d/ | e |
| /a/b/c/ | d |
| /a/b/ | c |
| /a/ | b |
| / | a |
The capabilities of the H2 db are a bit limited when it comes to splitting strings, thus my assumption was it must be solved recursively (especially since the path depth is not limited).
Any help would be appreciated :)
You need to use a recursive query, for example:
WITH RECURSIVE CTE(S, F, T) AS (
SELECT '/a/b/c/d/e', 0, 1
UNION ALL
SELECT S, T, LOCATE('/', S, T + 1)
FROM CTE
WHERE T <> 0
)
SELECT
SUBSTRING(S FROM 1 FOR F) PARENT,
SUBSTRING(S FROM F + 1 FOR
CASE T WHEN 0 THEN CHARACTER_LENGTH(S) ELSE T - F - 1 END) FOLDER
FROM CTE WHERE F > 0;
It produces
| PARENT | FOLDER |
|---|---|
| / | a |
| /a/ | b |
| /a/b/ | c |
| /a/b/c/ | d |
| /a/b/c/d/ | e |
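To make the iteration in the recursive query easier to follow, here is a rough Python equivalent of the same idea: walk from one `/` to the next and emit everything up to the current slash as the parent and everything between the two slashes as the folder. This is only an illustrative sketch outside the database. The function name `split_path` is made up, and the path is assumed to be absolute with no trailing slash, as in the example.

```python
def split_path(path):
    """Return (parent, folder) pairs for each component of an absolute path."""
    rows = []
    start = 0  # index of the '/' that precedes the current folder (F in the CTE)
    while True:
        nxt = path.find("/", start + 1)          # next '/', or -1 if none is left (T in the CTE)
        end = len(path) if nxt == -1 else nxt
        rows.append((path[: start + 1], path[start + 1 : end]))
        if nxt == -1:
            break
        start = nxt
    return rows


for parent, folder in split_path("/a/b/c/d/e"):
    print(parent, folder)
```

Each loop iteration corresponds to one row produced by the recursive part of the CTE, and the loop ends when no further slash is found, just as the query stops recursing once LOCATE returns 0.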
Do something like this:
with recursive
p(p) as (select '/a/b/c/d/e' as p),
t(path, parent, folder, i) as (
select
p,
REGEXP_REPLACE(p, '(.*)/\w+', '$1'),
REGEXP_REPLACE(p, '.*/(\w+)', '$1'),
1
from p
union
select
t.parent,
REGEXP_REPLACE(t.parent, '(.*)/\w+', '$1'),
REGEXP_REPLACE(t.parent, '.*/(\w+)', '$1'),
t.i + 1
from t
where t.parent != ''
)
select *
from t;
resulting in
|PATH |PARENT |FOLDER|I |
|----------|--------|------|---|
|/a/b/c/d/e|/a/b/c/d|e |1 |
|/a/b/c/d |/a/b/c |d |2 |
|/a/b/c |/a/b |c |3 |
|/a/b |/a |b |4 |
|/a | |a |5 |
Not sure if you're really interested in trailing / characters, but you can easily fix the query according to your needs.

ANTLR (Lexer): separate arbitrary identifiers from keywords

I'm trying to create a (simple) Lexer for bat/cmd files (for syntax coloring). As part of this task, I need to separate keywords from (arbitrary) identifiers. But according to this answer ANTLR tries to let the longest match win over shorter ones. My grammar looks like this so far
lexer grammar CmdLexer;
Identifier
: IdentifierNonDigit
( IdentifierNonDigit
| Digit
)+
;
Number
: Digit+
;
fragment IdentifierNonDigit
: [a-zA-Z_\u0080-\uffff]
;
fragment Digit
: [0-9]
;
Punctuation
: [\u0021-\u002f\u003a-\u0040\u005b-\u0060\u007b-\u007f]+
;
Keyword
: A P P E N D
| A T
| A T T R I B
| B R E A K
| C A L L
| C D
| C H C P
| C H D I R
| C L S
| C O L O R
| C O P Y
| D A T E
| D E L
| D I R
| D O
| E C H O
| E D I T
| E N D L O C A L
| E Q U
| E X I S T
| E X I T
| F C
| F O R
| F T Y P E
| G O T O
| G E Q
| G T R
| I F
| I N
| L E Q
| L S S
| M D
| M K D I R
| M K L I N K
| M O R E
| M O V E
| N E Q
| N O T
| N U L
| P A T H
| P A U S E
| P O P D
| P U S H D
| R D
| R E N
| R E N A M E
| S E T
| S E T L O C A L
| S H I F T
| S T A R T
| T I T L E
| T R E E
| T Y P E
| W H E R E
| W H O A M I
| X C O P Y
;
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
Whitespace
: [ \t]+
-> skip
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
LineComment
: ( '#'? R E M ~[\r\n]*
| '#'? '::' ~[\r\n]*
)
-> skip
;
but it always matches everything as Identifier, even words like append or CALL. I don't see how modes would solve this problem here, so how can I give one rule (here Keyword) higher priority over another (here Identifier)?
But according to this answer ANTLR tries to let the longest match win over shorter ones.
It does and that should be what you want. Note that this rule (the so-called maximal munch rule) has nothing to do with whether append is matched as a keyword or identifier. It has to do with whether appendix is matched as the keyword append, followed by the identifier ix; or as the single identifier appendix. Since the latter is clearly what one wants in most contexts, the maximal munch rule is useful.
What matters in this case though is what happens if multiple rules produce a match of the same length. In that case ANTLR applies the rule that is defined first in the grammar. So if you change the order of your definitions so that Keyword comes before Identifier, the Keyword rule will take precedence in cases where both rules would produce a match of the same length (and the longest match would still win in cases where that's not the case). So an input like append appendix would be tokenized as the keyword append, followed by the identifier appendix, which should be what you want.
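The two rules (longest match wins, ties go to the rule defined first) can be imitated in a few lines of Python. This is only a toy model to illustrate the behaviour described above, not how ANTLR is implemented. The `RULES` list, the three-keyword subset and the simplified identifier pattern are assumptions made for the example.

```python
import re

# Rules in definition order: Keyword first, then Identifier.
RULES = [
    ("Keyword", re.compile(r"(?:append|call|echo)", re.IGNORECASE)),
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z_0-9]+")),
]

def tokenize(text, rules=RULES):
    """Tokenize with maximal munch; on equal length the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the rule listed first keeps the token.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("append appendix"))
# [('Keyword', 'append'), ('Identifier', 'appendix')]
print(tokenize("append appendix", list(reversed(RULES))))
# [('Identifier', 'append'), ('Identifier', 'appendix')]
```

With Identifier listed first, as in the grammar from the question, every keyword ties with Identifier and loses, which matches the behaviour the question describes.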
PS: I'm not sure where/how your lexer is going to be used, but generally you'd want to distinguish between different keywords instead of having one rule that matches all keywords. If the tokens are going to be used as input to a parser, the information that something is a keyword is not very useful without knowing which keyword it is.

SQL pairing adjacent rows

Using PostgreSQL (latest). I'm a total noob.
I have a view that always gives me a table of an even number of rows- no duplicates (the letters are analogous to unique keys) and no nulls, let's call it letter_view:
| letter |
|-------------|
| A |
| B |
| C |
| D |
| E |
| F |
| G |
| H |
My view already uses an ORDER BY clause so the table is pre-sorted.
What I'm trying to do is merge every two rows into a single row
with each value from those two rows. So for n rows, I need the result set to have
n / 2 rows with combined adjacent rows.
| l1 | l2 |
|-------|------|
| A | B |
| C | D |
| E | F |
| G | H |
I've tried using lead and I think I'm close but I can't quite get it in the format I need.
My best query attempt looks like this:
SELECT letter AS letter_1, lead(letter, 1) OVER (PARTITION BY 2) AS letter_2 from letter_view;
but I get:
letter_1 | letter_2
----------+----------
A | B
B | C <--- Don't need this
C | D
D | E <--- Don't need this
E | F
F | G <--- Don't need this
G | H
H | <--- Don't need this
(8 rows)
I checked several other answers on SO, and looked through
the PostgreSQL docs and w3C SQL tutorials but I can't find a succinct answer.
What is this technique called and how would I do it?
I'm trying to do this in pure SQL if possible.
I know I could use multiple queries with LIMIT and OFFSET to get the data with multiple selects or potentially by using a cursor but that seems very inefficient for large input sets although I could be totally wrong. Again, total noob.
Any help in the right direction is highly appreciated.
You can use lead() to get the next value . . . but you need a way to filter as well. I would suggest row_number():
select letter_1, letter_2
from (select letter AS letter_1,
lead(letter, 1) OVER (PARTITION BY 2 order by ??) AS letter_2,
row_number() over (partition by 2 order by ??) as seqnum
from letter_view
) lv
where seqnum % 2 = 1;
Notes:
I included the partition clause as you have in the original code. I don't know what "2" refers to.
You should be explicit about the order by. It is not wise to depend on the ordering of some underlying table or view.
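For a quick way to try the lead() plus row_number() idea, here is a small sketch that runs the same kind of query against an in-memory SQLite database through Python's sqlite3 module. SQLite only stands in for PostgreSQL here, and window functions require SQLite 3.25 or newer. The table stands in for letter_view, ordering by letter is an assumption that fills the `??` placeholders, and the constant PARTITION BY 2 is dropped because a constant partition puts all rows in one group anyway.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE letter_view (letter TEXT)")
conn.executemany("INSERT INTO letter_view VALUES (?)", [(c,) for c in "ABCDEFGH"])

PAIR_SQL = """
SELECT letter_1, letter_2
FROM (SELECT letter AS letter_1,
             lead(letter, 1) OVER (ORDER BY letter) AS letter_2,
             row_number()    OVER (ORDER BY letter) AS seqnum
      FROM letter_view) AS lv
WHERE seqnum % 2 = 1      -- keep only rows 1, 3, 5, ... of the ordered set
ORDER BY seqnum
"""

pairs = conn.execute(PAIR_SQL).fetchall()
print(pairs)
# [('A', 'B'), ('C', 'D'), ('E', 'F'), ('G', 'H')]
```

Every odd-numbered row carries its successor in letter_2, so filtering on the odd sequence numbers leaves n / 2 rows.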

MS Access: Show entire group, based on another field

I would like to show all classes and grades of only those students who have at least one "F" grade.
Here is the source table:
ID | Students | Class | Grade
1 | Addams, W | History | A
2 | Addams, W | Biology | A
3 | Addams, W | French | B
4 | Jetson, E | Spanish | B
5 | Jetson, E | Geometry | B
6 | Jetson, E | Biology | F
7 | Rubble, B | English | F
8 | Rubble, B | Geometry | B
9 | Rubble, B | Biology | B
10 | Flintstone, P | Music | A
11 | Flintstone, P | Spanish | B
Here is a report, grouped by Students:
Addams, W
---------------French B
---------------Biology A
---------------History A
Flintstone, P
---------------Spanish B
---------------Music A
Jetson, E
---------------Biology F
---------------Geometry B
---------------Spanish B
Rubble, B
---------------Biology B
---------------Geometry B
---------------English F
Again, I would like to show all classes and grades of only those students who have at least one "F" grade, as seen below:
Jetson, E
---------------Biology F
---------------Geometry B
---------------Spanish B
Rubble, B
---------------Biology B
---------------Geometry B
---------------English F
Any assistance would be much appreciated.
Create a query with your table as the source. Drop in the Students field in the first column and put the following formula in the second column: IIf([Grade]="F",1,0), then save the query. (By default Access will name this column "Expr1", but you can change it to whatever you like.)
Create a second query using query 1 as the source. Drop in the two columns from query 1, turn on totals, group by the Students column, sum the formula column, add a criterion of >=1 to that column, and save. You now have a result set with only the students who have at least one "F". (To turn on totals, place the cursor in the grid section at the bottom of the query, right-click and select "Totals".)
Create a third query that ties the second query to your original source table by joining on the Students field with an inner join (i.e. join type 1).
You can query your table using a subquery on the same table to check for any rows with an F grade:
SELECT a.ID, a.Students, a.Class, a.Grade
FROM yourtable AS a
WHERE EXISTS
(
SELECT '1'
FROM yourtable AS b
WHERE a.Students = b.Students
AND b.Grade = 'F'
);
Next, base your report on the above query.
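The EXISTS query can be tried outside Access as well. The sketch below loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 module and runs the same statement. SQLite is only a stand-in for the Access engine, the table name yourtable is the placeholder from the answer, and the ORDER BY is added just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (ID INTEGER, Students TEXT, Class TEXT, Grade TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, "Addams, W", "History", "A"),
        (2, "Addams, W", "Biology", "A"),
        (3, "Addams, W", "French", "B"),
        (4, "Jetson, E", "Spanish", "B"),
        (5, "Jetson, E", "Geometry", "B"),
        (6, "Jetson, E", "Biology", "F"),
        (7, "Rubble, B", "English", "F"),
        (8, "Rubble, B", "Geometry", "B"),
        (9, "Rubble, B", "Biology", "B"),
        (10, "Flintstone, P", "Music", "A"),
        (11, "Flintstone, P", "Spanish", "B"),
    ],
)

# Keep every row of a student for whom at least one row with grade F exists.
result = conn.execute(
    """
    SELECT a.ID, a.Students, a.Class, a.Grade
    FROM yourtable AS a
    WHERE EXISTS (SELECT 1
                  FROM yourtable AS b
                  WHERE a.Students = b.Students
                    AND b.Grade = 'F')
    ORDER BY a.ID
    """
).fetchall()

for row in result:
    print(row)
```

Only the rows for Jetson, E and Rubble, B come back (IDs 4 through 9), including their non-F grades.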
OP here: Both answers worked; thanks again. Here is the code for the three-query answer:
Query 1
SELECT Students2.Students, IIf([Grade]="F",1,0) AS F_grade
FROM Students2;
Query 2
SELECT Query1.Students, Sum(Query1.F_grade) AS SumOfF_grade
FROM Query1
GROUP BY Query1.Students
HAVING (((Sum(Query1.F_grade))>=1));
Query 3
SELECT Students2.Students, Students2.Class, Students2.Grade
FROM Students2 INNER JOIN Query2 ON Students2.Students = Query2.Students;

Verifying a cycle of values in table

I have a table which looks like this (the real table has dates and time in place of the Letters):
| assigned | start | end
| xyz | A | B
| xyz | B | C
| xyz | C | D
| xyz | D | E
| xyz | E | F
| fgh | A | B
| fgh | B | C
etc.
There is a rotation with each assigned code (xyz,fgh and so on) where 'end' is congruent with the next 'start' up to a value indicating a defined end (here 'F').
I am looking for a statement which scans/verifies that this rotation is indeed occurring, that it starts at A and ends with F, and that it went through every step in between.
Any help is greatly appreciated.
edit: The rotation always uses 5 rows (or 4 steps), even if the interval length can change in between.
This is really a hack that works because the dates are replaced by characters, but it might give you ideas on how to make it work for real.
select * from (
select a_code, min(a_start) as thestart, max(a_end) as theend,
substring(group_concat(a_start order by a_start), 3) as starts,
substring(group_concat(a_end order by a_end), 1, length(group_concat(a_end))-2) as ends
from so_test
group by a_code ) as grpSelect
where thestart = 'a'
and theend = 'f'
and starts = ends
The group_concat of a_start for xyz produces the string 'a,b,c,d,e', while the group_concat of a_end produces 'b,c,d,e,f'. The substring calls remove the leading 'a,' from the first string and the trailing ',f' from the second, so that the outer query can compare 'b,c,d,e' in both.
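Since the string comparison only works by accident of the single-letter placeholders, it may help to see the underlying check spelled out. The following Python sketch, which is outside SQL and not part of the answer above, follows the chain for each assigned code from the expected first value and verifies that every row is used and that the chain ends at the expected last value. The function names and the default values 'A' and 'F' are assumptions for illustration. Real dates or timestamps would work the same way as long as each end value equals the next start value exactly.

```python
from collections import defaultdict

def rotation_is_complete(steps, first="A", last="F"):
    """steps: list of (start, end) pairs belonging to one assigned code."""
    nxt = dict(steps)                 # maps each start to its end
    if len(nxt) != len(steps):
        return False                  # the same start value appears twice
    current, used = first, 0
    # Follow start -> end links; the counter also guards against loops.
    while current in nxt and used < len(steps):
        current = nxt[current]
        used += 1
    return current == last and used == len(steps)

def verify(table, first="A", last="F"):
    """table: iterable of (assigned, start, end) rows; returns {assigned: bool}."""
    groups = defaultdict(list)
    for assigned, start, end in table:
        groups[assigned].append((start, end))
    return {code: rotation_is_complete(steps, first, last)
            for code, steps in groups.items()}


TABLE = [
    ("xyz", "A", "B"), ("xyz", "B", "C"), ("xyz", "C", "D"),
    ("xyz", "D", "E"), ("xyz", "E", "F"),
    ("fgh", "A", "B"), ("fgh", "B", "C"),
]
print(verify(TABLE))
# {'xyz': True, 'fgh': False}
```

In the sample data xyz walks all the way from A to F, while fgh stops at C and is reported as incomplete.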