How to optimize this grammar rule?

I am implementing a grammar using the TatSu python library. My grammar is working OK but there is one rule that is eating quite a bit of time. On a block of around 3000 lines (part of a bigger grammar), if I take this full rule, it takes about 42s to parse the entire block. If I reduce this rule to just a few tokens, the runtime drops from 42s to 33s (~20% improvement).
The rule is shown below; it should match a series of events separated by '/'.
events
=
'/'%{event}
;
event
=
'D' | 'U' | 'Z' | 'P' | 'L' | 'H' | 'x' | 'X' | 'T' | 'V' | 'l' | 'h' | 't' | 'v' | 'N' | 'A' | 'B' | 'F' | '?' | 'G' | 'R' | 'Q' | 'M'
| 'ForceDown' | 'ForceUp' | 'ForceOff' | 'ForcePrior' | 'CompareLow' | 'CompareHigh' | 'CompareUnknown' | 'CompareOff'
| 'CompareValid' | 'CompareLowWindow' | 'CompareHighWindow' | 'CompareOffWindow' | 'CompareValidWindow' | 'ForceUnknown'
| 'LogicLow' | 'LogicHigh' | 'LogicZ' | 'Unknown' | 'ExpectHigh' | 'ExpectLow' | 'ExpectOff' | 'Marker'
;
If I change event to the following, I get faster parsing.
event
=
/[DUZPLHXT]/
;
So is it possible to improve the rule above in some way to get faster processing? Thanks for any ideas.

As you noted, for a rule with many options which are just tokens it's much more efficient to use patterns (regular expressions).
But runtime ultimately depends on how some rules call upon each other.
A simple optimization you can try is to add a cut (~) expression so each event is tried at most once (although a cut should be implicit in the % expression).
event
=
(
'D' | 'U' | 'Z' | 'P' | 'L' | 'H' | 'x' | 'X' | 'T' | 'V' | 'l' | 'h' | 't' | 'v' | 'N' | 'A' | 'B' | 'F' | '?' | 'G' | 'R' | 'Q' | 'M'
| 'ForceDown' | 'ForceUp' | 'ForceOff' | 'ForcePrior' | 'CompareLow' | 'CompareHigh' | 'CompareUnknown' | 'CompareOff'
| 'CompareValid' | 'CompareLowWindow' | 'CompareHighWindow' | 'CompareOffWindow' | 'CompareValidWindow' | 'ForceUnknown'
| 'LogicLow' | 'LogicHigh' | 'LogicZ' | 'Unknown' | 'ExpectHigh' | 'ExpectLow' | 'ExpectOff' | 'Marker'
) ~
;
That said, because the rule is essentially lexical, I'd opt for a regular expression. Inside a pattern the alternatives are written without quotes, and longer names must come before any shorter alternative that is a prefix of them (e.g. CompareLowWindow before CompareLow, ForceDown before F), because regex alternation takes the first branch that matches.
event
=
/(?x)
ForceDown | ForceUp | ForceOff | ForcePrior | ForceUnknown
| CompareLowWindow | CompareHighWindow | CompareOffWindow | CompareValidWindow
| CompareLow | CompareHigh | CompareUnknown | CompareOff | CompareValid
| LogicLow | LogicHigh | LogicZ | Unknown | ExpectHigh | ExpectLow | ExpectOff | Marker
| [DUZPLHxXTVlhtvNABF?GRQM]
/
;
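The ordering concern in a regex alternation can be checked outside the grammar. The following is a minimal sketch using Python's standard `re` module rather than TatSu itself; the names `EVENT` and `scan_events` are made up for the illustration, and the scanner assumes at least one event in the input.

```python
import re

# Longer names come first so that 'CompareLowWindow' is not cut short by
# 'CompareLow', and 'ForceDown' is not cut short by the single letter 'F'.
# The single-letter events (including '?') live in one character class.
EVENT = re.compile(r"""(?x)
    ForceDown | ForceUp | ForceOff | ForcePrior | ForceUnknown
  | CompareLowWindow | CompareHighWindow | CompareOffWindow | CompareValidWindow
  | CompareLow | CompareHigh | CompareUnknown | CompareOff | CompareValid
  | LogicLow | LogicHigh | LogicZ | Unknown
  | ExpectHigh | ExpectLow | ExpectOff | Marker
  | [DUZPLHxXTVlhtvNABF?GRQM]
""")


def scan_events(text):
    """Scan a '/'-separated event string left to right (hypothetical helper)."""
    pos, events = 0, []
    while True:
        m = EVENT.match(text, pos)
        if m is None:
            raise ValueError(f"no event at offset {pos}")
        events.append(m.group())
        pos = m.end()
        if pos == len(text):
            return events
        if text[pos] != "/":
            raise ValueError(f"expected '/' at offset {pos}")
        pos += 1  # skip the separator
```

If the single letters were listed first, scanning `ForceDown` would stop after `F` and then fail on the missing separator, which is why the order matters here.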

Related

Generate random pairs SQL

Suppose we have these two tables.
TABLE1:
|column_1 | ... |
--------------------
| 'a' | ... |
| 'b' | ... |
| 'c' | ... |
| 'd' | ... |
| 'e' | ... |
TABLE2:
|column_1 | ... |
--------------------
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
| 5 | ... |
I want to pair every row of TABLE1 with some random rows from TABLE2, where each row of TABLE1 gets a random number (1, 2 or 3) of distinct rows from TABLE2.
An output could be:
|column_1 | column_2 |
---------------------------
| 'a' | 1 |
| 'a' | 2 |
| 'a' | 5 |
| 'b' | 5 |
| 'c' | 3 |
| 'c' | 4 |
| 'd' | 3 |
| 'e' | 3 |
| 'e' | 5 |
| 'e' | 1 |
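JOIN LATERAL did the thing for me.
SELECT *
FROM TABLE1
LEFT JOIN LATERAL (
    SELECT *
    FROM TABLE2
    -- shuffle first, so the rows kept by LIMIT are random and not just the first n
    ORDER BY RANDOM()
    LIMIT FLOOR(RANDOM() * 3 + 1)
) a ON TRUE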
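To make the intended result concrete, here is a small Python sketch of the pairing logic itself (not of the SQL): for each left row, draw a count between 1 and 3 and sample that many distinct right rows. The function name `random_pairs` and its parameters are hypothetical, and the sketch assumes the right-hand table is not empty.

```python
import random


def random_pairs(left_rows, right_rows, max_per_row=3, rng=random):
    """Pair each left row with 1..max_per_row distinct random right rows."""
    pairs = []
    for left in left_rows:
        # Never ask for more rows than the right-hand side has.
        k = rng.randint(1, min(max_per_row, len(right_rows)))
        # sample() draws without replacement, so the rows are distinct.
        for right in rng.sample(right_rows, k):
            pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    for pair in random_pairs(list("abcde"), [1, 2, 3, 4, 5]):
        print(pair)
```

Note that the count and the sample are drawn separately for every left row, which is the behaviour the example output in the question shows.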

SQL - BigQuery - Using Group & MAX in several columns - Similar to a pivot table

How would you approach this via SQL? Let's take this example
| id | type | score_a | score_b | score_c | label_a | label_b | label_c |
|----|------|---------|---------|---------|---------|---------|---------|
| 1 | A | 0.9 | | | L1 | | |
| 1 | B | | 0.7 | | | L2 | |
| 1 | B | | 0.2 | | | L3 | |
| 1 | C | | | 0.2 | | | L4 |
| 1 | C | | | 0.18 | | | L5 |
| 1 | C | | | 0.12 | | | L6 |
| 2 | A | 0.6 | | | L1 | | |
| 2 | A | 0.3 | | | L2 | | |
I want to return the max score per type together with the matching label_X, almost like a pivot table but with these custom column names. So the outcome of the above would be:
| id | type | score_a | label_a | score_b | label_b | score_c | label_c |
|----|------|---------|---------|---------|---------|---------|---------|
| 1 | A | 0.9 | L1 | 0.7 | L2 | 0.2 | L4 |
| 2 | A | 0.6 | L1 | NULL | NULL | NULL | NULL |
Something like this is wrong as it yields both results per type per label
SELECT id,
MAX(score_a) as score_a,
label_a,
MAX(score_b) as score_b,
label_b as label_b,
MAX(score_c) as score_c,
label_c
FROM sample_table
GROUP BY id, label_a, label_b, label_c
Is there an easy way to do this in SQL? I'm doing it in BigQuery right now and also tried a pivot table, but still had no luck flattening these into one big row with several columns.
Any other ideas?
UPDATE
Expanding on what BGM mentioned about design; the source of this data is a table with the following form:
| id | type | label | score |
|----|------|-------|-------|
| 1 | A | L1 | 0.9 |
| 1 | B | L2 | 0.7 |
| 1 | B | L3 | 0.2 |
| 1 | C | L4 | 0.6 |
| 1 | C | L5 | 0.2 |
That gets converted to a flattened state as depicted at the top of this question using a query like
SELECT id,
       type,
       MAX(CASE WHEN type = 'A' THEN score ELSE 0 END) AS score_a,
       MAX(CASE WHEN type = 'B' THEN score ELSE 0 END) AS score_b,
       MAX(CASE WHEN type = 'C' THEN score ELSE 0 END) AS score_c,
       -- labels
       (CASE WHEN type = 'A' THEN label ELSE '' END) AS label_a,
       (CASE WHEN type = 'B' THEN label ELSE '' END) AS label_b,
       (CASE WHEN type = 'C' THEN label ELSE '' END) AS label_c
FROM table
GROUP BY id, label_a, label_b, label_c
Do you think the intermediate step is unnecessary to get to the final solution?
You can do conditional aggregation. In BigQuery, arrays come in handy for this:
select
id,
max(score_a) score_a,
array_agg(label_a order by score_a desc limit 1)[offset(0)] label_a,
max(score_b) score_b,
array_agg(label_b order by score_b desc limit 1)[offset(0)] label_b,
max(score_c) score_c,
array_agg(label_c order by score_c desc limit 1)[offset(0)] label_c
from mytable
group by id
Note: in terms of design, you should not have multiple columns to store the scores and labels per type; you already have a column that represents the type, so you only need two more columns, one for the score and one for the label.
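The "max score plus its label, per type" idea can also be illustrated directly on the long-form source table from the question's update. This is a plain Python sketch of the logic, not BigQuery code; `best_per_type` is a hypothetical helper and the rows are the ones shown in the update.

```python
def best_per_type(rows):
    """rows: iterable of (id, type, label, score).

    Returns {id: {type: (score, label)}}, keeping for every id and type
    the label that belongs to the highest score.
    """
    best = {}
    for row_id, type_, label, score in rows:
        per_id = best.setdefault(row_id, {})
        if type_ not in per_id or score > per_id[type_][0]:
            per_id[type_] = (score, label)
    return best


SOURCE_ROWS = [
    (1, "A", "L1", 0.9),
    (1, "B", "L2", 0.7),
    (1, "B", "L3", 0.2),
    (1, "C", "L4", 0.6),
    (1, "C", "L5", 0.2),
]

if __name__ == "__main__":
    print(best_per_type(SOURCE_ROWS))
```

Working from the long-form table this way suggests the intermediate wide table may not be needed, since the score and label travel together in each row.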

Join four tables in SQL

I am trying to join four tables that contain almost the same data.
+------------------+------------------+-------------------+-------------------+--+
| tableone.letters | tabletwo.letters | tablethree.letters| tablefour.letters | |
+------------------+------------------+-------------------+-------------------+--+
| 'a' | 'a' | 'a' | 'a' | |
+------------------+------------------+-------------------+-------------------+--+
| 'b' | 'b' | 'b' | 'e' | |
+------------------+------------------+-------------------+-------------------+--+
| 'c' | 'c' | 'c' | 'g' | |
+------------------+------------------+-------------------+-------------------+--+
| 'd' | 'd' | 'e' | | |
+------------------+------------------+-------------------+-------------------+--+
| 'e' | 'f' | | | |
+------------------+------------------+-------------------+-------------------+--+
| 'f' | 'g' | | | |
+------------------+------------------+-------------------+-------------------+--+
| 'g' | 'h' | | | |
+------------------+------------------+-------------------+-------------------+--+
| 'h' | 'i' | | | |
+------------------+------------------+-------------------+-------------------+--+
| 'i' | | | | |
+------------------+------------------+-------------------+-------------------+--+
| 'j' | | | | |
+------------------+------------------+-------------------+-------------------+--+
SELECT DISTINCT tableone.letters, tabletwo.letters, tablethree.letters, tablefour.letters FROM querytesting.tableone
FULL JOIN querytesting.tabletwo
ON tableone.letters = tabletwo.letters
FULL JOIN querytesting.tablethree
ON tabletwo.letters = tablethree.letters
FULL JOIN querytesting.tablefour
ON tablethree.letters = tablefour.letters;
When I join them I get the following result:
+------+------+------+------+--+
| "a" | "a" | "a" | "a" | |
+------+------+------+------+--+
| "b" | "b" | "b" | null | |
+------+------+------+------+--+
| "c" | "c" | "c" | null | |
+------+------+------+------+--+
| "d" | "d" | null | null | |
+------+------+------+------+--+
| "e" | null | null | null | |
+------+------+------+------+--+
| "f" | "f" | null | null | |
+------+------+------+------+--+
| "g" | "g" | null | null | |
+------+------+------+------+--+
| "h" | "h" | null | null | |
+------+------+------+------+--+
| "i" | "i" | null | null | |
+------+------+------+------+--+
| "j" | null | null | null | |
+------+------+------+------+--+
| "k" | null | null | null | |
+------+------+------+------+--+
| "l" | null | null | null | |
+------+------+------+------+--+
| null | null | "e" | "e" | |
+------+------+------+------+--+
| null | null | null | "g" | |
+------+------+------+------+--+
This is not the result I expected. I wanted the 'e' and 'g' in the third and fourth column to line up perfectly with the 'e' and 'g' in the first column.
Is there any way to do that?
Thank you in advance!
Given your explanation, all you need to do is FULL JOIN every table against tableone, though I'm not sure this is what you actually intend, especially if the data shown is only a sample:
SELECT DISTINCT tableone.letters, tabletwo.letters, tablethree.letters, tablefour.letters
FROM tableone
FULL JOIN tabletwo
ON tableone.letters = tabletwo.letters
FULL JOIN tablethree
ON tableone.letters = tablethree.letters
FULL JOIN tablefour
ON tableone.letters = tablefour.letters;
See it working here: http://sqlfiddle.com/#!17/acc44/4
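The alignment that joining everything against tableone produces can be pictured with a short Python sketch. It is only an illustration under two assumptions that hold for the sample data: values are unique within each table, and every value of the other tables also appears in tableone (a real FULL JOIN would add extra rows otherwise). `align_on_first` is a hypothetical name.

```python
def align_on_first(first, *others):
    """One output row per value of `first`; each other column shows the
    same value when that table contains it, else None."""
    other_sets = [set(o) for o in others]
    return [(v, *(v if v in s else None for s in other_sets)) for v in first]


tableone = list("abcdefghij")
tabletwo = list("abcdfghi")
tablethree = list("abce")
tablefour = list("aeg")

if __name__ == "__main__":
    for row in align_on_first(tableone, tabletwo, tablethree, tablefour):
        print(row)
```

In the output, 'e' and 'g' in the third and fourth columns sit on the same row as 'e' and 'g' in the first column, which is the layout the question asks for.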

Nested case with multiple sub conditions

I'm having trouble understanding how to nest CASE statements properly.
(MSSQL Server 2012)
Let's have the following table given.
The Column StatusMissing is what I want to create
+------+--+------+--+------+--+------+--+------+--+------+--+---------------+
| a1 | | a2 | | a3 | | b1 | | c1 | | d2 | | StatusMissing |
+------+--+------+--+------+--+------+--+------+--+------+--+---------------+
| OK | | OK | | OK | | OK | | OK | | OK | | AllOK |
| NULL | | NULL | | OK | | OK | | OK | | OK | | As |
| OK | | NULL | | OK | | OK | | OK | | OK | | As |
| OK | | OK | | NULL | | OK | | OK | | OK | | As |
| OK | | OK | | OK | | NULL | | OK | | OK | | B |
| OK | | OK | | OK | | OK | | NULL | | OK | | C |
| OK | | OK | | OK | | OK | | OK | | NULL | | D |
| NULL | | OK | | OK | | NULL | | NULL | | OK | | ABC |
| NULL | | OK | | OK | | OK | | NULL | | NULL | | ACD |
| NULL | | OK | | OK | | NULL | | OK | | NULL | | ABD |
| NULL | | OK | | OK | | NULL | | NULL | | NULL | | ABCD |
| NULL | | OK | | OK | | OK | | NULL | | NULL | | ACD |
| OK | | OK | | OK | | NULL | | NULL | | OK | | BC |
| OK | | OK | | OK | | OK | | OK | | OK | | AllOK |
| OK | | NULL | | OK | | OK | | NULL | | OK | | AC |
| OK | | OK | | OK | | NULL | | OK | | NULL | | BD |
| OK | | OK | | OK | | OK | | NULL | | NULL | | CD |
+------+--+------+--+------+--+------+--+------+--+------+--+---------------+
First, to understand the concept of nesting I simplified the table:
+------+--+------+--+------+
| a1 | | a2 | | b1 |
+------+--+------+--+------+
| OK | | OK | | OK |
| OK | | OK | | NULL |
| OK | | NULL | | OK |
| NULL | | OK | | OK |
| NULL | | NULL | | OK |
| NULL | | OK | | NULL |
| OK | | NULL | | NULL |
+------+--+------+--+------+
These attempts lead to these failures.
Query1
SELECT a1, a2, b1, 'StatusMissing' =
    CASE
        WHEN a1 IS NULL
        THEN
            CASE
                WHEN a1 IS NULL
                THEN
                    CASE
                        WHEN b1 IS NULL
                        THEN 'AB'
                    END
                ELSE 'A'
            END
        WHEN b1 IS NULL
        THEN 'B'
        ELSE 'AllOK'
    END
FROM Table;
Result1:
+------+--+------+--+------+--+---------------+
| a1 | | a2 | | b1 | | StatusMissing |
+------+--+------+--+------+--+---------------+
| OK | | OK | | OK | | AllOK |
| OK | | OK | | NULL | | B |
| OK | | NULL | | OK | | AllOK |
| NULL | | OK | | OK | | NULL |
| NULL | | NULL | | OK | | NULL |
| NULL | | OK | | NULL | | AB |
| OK | | NULL | | NULL | | B |
+------+--+------+--+------+--+---------------+
Query2 (Else as main)
SELECT a1, a2, b1, 'Status' =
    CASE
        WHEN a1 IS NOT NULL AND a2 IS NOT NULL AND b1 IS NOT NULL
        THEN 'AllOK!'
        ELSE
            CASE
                WHEN a2 IS NOT NULL OR a2 IS NOT NULL
                THEN
                    CASE
                        WHEN b1 IS NULL
                        THEN 'AB'
                    END
                WHEN b1 IS NULL
                THEN 'B'
                ELSE 'A'
            END
    END
FROM Table;
Result2
+------+--+------+--+------+--+---------------+
| a1 | | a2 | | b1 | | StatusMissing |
+------+--+------+--+------+--+---------------+
| OK | | OK | | OK | | AllOK |
| OK | | OK | | NULL | | AB |
| OK | | NULL | | OK | | A |
| NULL | | OK | | OK | | NULL |
| NULL | | NULL | | OK | | A |
| NULL | | OK | | NULL | | AB |
| OK | | NULL | | NULL | | B |
+------+--+------+--+------+--+---------------+
What the hell am I doing wrong?
I'm quite new to SQL, so if there is a proper function to do this I would appreciate the info!
EDIT:
I mean, if something like this were possible in SQL:
Column StatusMissing = ' missing'
If(a1 == NULL) { StatusMissing += 'A'}
EDIT2:
The column StatusMissing IS NOT THERE!
I want to create it using the SQL statements like below.
SELECT .... Status =
So basically I only have A1,A2,B1 (in the simple table). Please don't get confused with the first Table. It's only there to SHOW HOW IT SHOULD look like.
For the simplified table, assuming the columns are of type nvarchar, try using UPDATE:
UPDATE [dbo].[StatusMissing]
SET result='';
UPDATE [dbo].[StatusMissing]
SET result= CONCAT(result , 'A')
WHERE a1 is null or a2 is null;
UPDATE [dbo].[StatusMissing]
SET result= CONCAT(result , 'B')
WHERE b1 is null ;
UPDATE [dbo].[StatusMissing]
SET result= 'AllOK'
WHERE result ='';
This can be done in one step as well.
I might suggest that you make two small modifications to your output:
Instead of "As", just say "A".
Instead of "AllOK", just leave the field blank.
With these modifications, the rules are pretty easy:
select t.*,
((case when a1 is null or a2 is null or a3 is null then 'A' else '' end) +
(case when b1 is null then 'B' else '' end) +
(case when c1 is null then 'C' else '' end) +
(case when d2 is null then 'D' else '' end)
) as StatusMissing
from table t;
If you do want your version, a subquery is perhaps the easiest way:
select t. . . .,
(case when StatusMissing = '' then 'AllOK'
when StatusMissing = 'A' then 'As'
else StatusMissing
end) as StatusMissing
from (select t.*,
((case when a1 is null or a2 is null or a3 is null then 'A' else '' end) +
(case when b1 is null then 'B' else '' end) +
(case when c1 is null then 'C' else '' end) +
(case when d2 is null then 'D' else '' end)
) as StatusMissing
from table t
) t
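You can play with COALESCE and a couple of CASE conditions. Each argument has to evaluate to NULL when it does not apply, so that COALESCE falls through to the next one:
SELECT a1,
       a2,
       a3,
       b1,
       c1,
       d2,
       COALESCE(
           -- only the A group is missing: report 'As'
           CASE WHEN b1 = 'OK'
                 AND c1 = 'OK'
                 AND d2 = 'OK'
                 AND (a1 IS NULL OR a2 IS NULL OR a3 IS NULL)
                THEN 'As'
           END,
           -- otherwise build the letter string; NULLIF turns '' into NULL
           NULLIF(
               CASE WHEN (a1 IS NULL OR a2 IS NULL OR a3 IS NULL) THEN 'A' ELSE '' END
             + CASE WHEN b1 IS NULL THEN 'B' ELSE '' END
             + CASE WHEN c1 IS NULL THEN 'C' ELSE '' END
             + CASE WHEN d2 IS NULL THEN 'D' ELSE '' END,
               ''),
           'AllOK') AS 'StatusMissing'
FROM Table;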
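The rule behind these queries is the "append a letter per missing group" idea from the question's edit. A minimal Python sketch of that rule, with a row modelled as a dict and NULL as `None`, might look as follows; `status_missing` and `GROUPS` are hypothetical names, and the column names are the ones from the first table.

```python
# Which columns belong to which letter (taken from the table in the question).
GROUPS = {
    "A": ("a1", "a2", "a3"),
    "B": ("b1",),
    "C": ("c1",),
    "D": ("d2",),
}


def status_missing(row):
    """Return 'AllOK', 'As', or the letters of every group with a NULL column."""
    missing = "".join(
        letter
        for letter, cols in GROUPS.items()
        if any(row.get(col) is None for col in cols)
    )
    if missing == "":
        return "AllOK"
    if missing == "A":
        return "As"  # the question's special spelling when only A is missing
    return missing


if __name__ == "__main__":
    row = {"a1": None, "a2": "OK", "a3": "OK", "b1": None, "c1": None, "d2": "OK"}
    print(status_missing(row))
```

Building the string by concatenation avoids nesting entirely: each group is tested once and independently of the others.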

Select 5 of each distinct value

I have the following table in PostgreSQL:
| a | b | c |
===================
| 'w' | 2 | 3 |
| 'w' | 7 | 2 |
| 'w' | 8 | 1 |
| 'w' | 3 | 6 |
| 'w' | 0 | 8 |
| 'w' | 2 | 9 |
| 'w' | 2 | 9 |
| 'z' | 4 | 9 |
| 'z' | 0 | 9 |
| 'z' | 0 | 8 |
| 'z' | 3 | 6 |
| 'z' | 2 | 7 |
| 'z' | 3 | 1 |
| 'z' | 3 | 2 |
| 'z' | 3 | 3 |
I want to select all records, but limit them to 5 records for each distinct value in column a.
So the result would look like:
| a | b | c |
===================
| 'w' | 2 | 3 |
| 'w' | 7 | 2 |
| 'w' | 8 | 1 |
| 'w' | 3 | 6 |
| 'w' | 0 | 8 |
| 'z' | 4 | 9 |
| 'z' | 0 | 9 |
| 'z' | 0 | 8 |
| 'z' | 3 | 6 |
| 'z' | 2 | 7 |
What is the most efficient way to achieve that in RoR? Thanks!
You can use row_number, but you have to specify an order or you will get unpredictable results:
with cte as (
select
*,
row_number() over(partition by a order by b, c) as row_num
from table1
)
select a, b, c
from cte
where row_num <= 5
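To try the query without a PostgreSQL server, the sketch below runs the same CTE against an in-memory SQLite database through Python's standard `sqlite3` module. This assumes the bundled SQLite is version 3.25 or newer (the first with window functions); the helper name `top_n_per_group` is made up, and an outer ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

ROWS = [
    ("w", 2, 3), ("w", 7, 2), ("w", 8, 1), ("w", 3, 6),
    ("w", 0, 8), ("w", 2, 9), ("w", 2, 9),
    ("z", 4, 9), ("z", 0, 9), ("z", 0, 8), ("z", 3, 6),
    ("z", 2, 7), ("z", 3, 1), ("z", 3, 2), ("z", 3, 3),
]


def top_n_per_group(rows, n=5):
    """Keep at most n rows per value of column a, ordered by b, c."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (a TEXT, b INTEGER, c INTEGER)")
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH cte AS (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY a ORDER BY b, c) AS row_num
            FROM table1
        )
        SELECT a, b, c
        FROM cte
        WHERE row_num <= ?
        ORDER BY a, row_num
        """,
        (n,),
    ).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in top_n_per_group(ROWS):
        print(row)
```

Because the window is ordered by b, c, the five rows kept per group are the five smallest by that ordering, not the first five in insertion order as in the question's example output.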