Why does this yacc code generate shift/reduce conflicts

I don't know why the following grammar generates shift/reduce conflicts:
primary_no_literal_expression
: IDENTIFIER
{
$$ = mioc_create_identifier_expression($1);
}
| IDENTIFIER LC RC // shift/reduce conflicts
{
}
| function_no_argument_call_expression
| function_with_argument_call_expression
| primary_no_literal_expression slice_index_expression
| primary_no_literal_expression DOT IDENTIFIER
| primary_no_literal_expression DOT IDENTIFIER LP RP
| primary_no_literal_expression DOT function_with_argument_call_expression
;
yacc reports: State 53 conflicts: 1 shift/reduce
y.output shows for state 53:
state 53
95 primary_no_literal_expression: IDENTIFIER .
96 | IDENTIFIER . LC RC
103 function_no_argument_call_expression: IDENTIFIER . LP RP
107 function_with_argument_list: IDENTIFIER . LP argument_list RP
LP shift, and go to state 105
LC shift, and go to state 106
LC [reduce using rule 95 (primary_no_literal_expression)]
$default reduce using rule 95 (primary_no_literal_expression)
state 106
96 primary_no_literal_expression: IDENTIFIER LC . RC
RC shift, and go to state 159
...
state 159
96 primary_no_literal_expression: IDENTIFIER LC RC .
$default reduce using rule 96 (primary_no_literal_expression)
The rules for primary_no_literal_expression (95-102) from y.output:
95 primary_no_literal_expression: IDENTIFIER
96 | IDENTIFIER LC RC
97 | function_no_argument_call_expression
98 | function_with_argument_call_expression
99 | primary_no_literal_expression slice_index_expression
100 | primary_no_literal_expression DOT IDENTIFIER
101 | primary_no_literal_expression DOT IDENTIFIER LP RP
102 | primary_no_literal_expression DOT function_with_argument_call_expression
LC [reduce using rule 95 (primary_no_literal_expression)]
The primary_no_literal_expression alternative "IDENTIFIER LC RC" is rule 96, and it needs both LC and RC. Why does the parser consider reducing (by rule 95) as soon as it has read just the LC?
Full code:
y.output https://controlc.com/56d66aea/fullscreen.php?hash=4f3bc1c214e1f3347b3a64df69a9b519&toolbar=true&linenum=false
test.y https://controlc.com/04d49e1b/fullscreen.php?hash=6dde9e69b9ea5873ff6ac234474a5927&toolbar=true&linenum=false

The shift/reduce conflict arises because there is some context where primary_no_literal_expression is used (on the right-hand side of some rule) in which it can be followed by an LC token. That means after seeing an IDENTIFIER token, when the next token (lookahead) is LC, the parser doesn't know whether it should shift the LC (to eventually match the primary_no_literal_expression: IDENTIFIER LC RC rule) or reduce the primary_no_literal_expression: IDENTIFIER rule (to match whatever it is that allows an LC after a primary_no_literal_expression).
You need to find that rule and figure out what to do about it -- either get rid of one of the rules (if things are ambiguous), or figure out how to know which rule to match (based on additional lookahead and/or lexer feedback, or something else).
In your case the culprit (one of them, at least) is probably the rule if_statement: IF logical_or_expression block, since a logical_or_expression can expand to (or end with) a primary_no_literal_expression, and a block can start with an LC. So the conflict is telling you the parser can't tell where an expression like this ends and the block begins.
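One common fix (a sketch, not the full grammar) is to make the end of the condition explicit, for example by requiring parentheses around it; the LP and RP tokens already exist in this grammar:
if_statement
: IF LP logical_or_expression RP block
;
Requiring the RP tells the parser exactly where the expression ends, so an LC that follows can only start the block.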

Related

ERROR: Function SUBSTR requires a character expression as argument 1. and adding zeroes in front of data

My end goal is to add zeroes in front of my data, so 918 becomes 0918 and 10 becomes 0010, padded to a width of 4 characters. My solution so far is to use SUBSTR, as I do below:
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_DAGLIGEKORREKTION_0000 AS
SELECT (SUBSTR(line_item, 1, 4)) AS line_item,
(SUBSTR(column_item, 1, 4)) AS column_item
FROM QUERY_FOR_DAGLIGEKORREKTIONER t1;
QUIT;
But when I run my query I get the following error:
ERROR: Function SUBSTR requires a character expression as argument 1.
ERROR: Function SUBSTR requires a character expression as argument 1.
This is my data set:
line_item column_item
918 10
230 10
260 10
918 10
918 10
918 10
70 10
80 10
110 10
250 10
35 10
What am I doing wrong? And is there another, maybe easier, way to add zeroes in front of my data?
I hope you can lead me in the right direction.
In SAS you can associate a format with a numeric variable to specify how the value is rendered when output in a report or displayed in a query result.
Example:
Specify a column to be displayed using the Z<n>. format.
select <numeric-var> format=z4.
The underlying column is still numeric.
If you want to convert the numeric result permanently to a character type, use the PUT function.
select PUT(<numeric-expression>, Z4.) as <column-name>
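Applied to the query from the question, that looks like this (a sketch reusing the dataset and column names from the question; note the output columns are now character, not numeric):
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_DAGLIGEKORREKTION_0000 AS
SELECT PUT(line_item, z4.) AS line_item,
PUT(column_item, z4.) AS column_item
FROM QUERY_FOR_DAGLIGEKORREKTIONER t1;
QUIT;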
Searching for something similar to the Oracle solution by @d r, I found the following solution to the problem:
put(line_item, z4.) AS PAD_line_item,
put(column_item, z4.) AS PAD_column_item,
resulting in:
line_item column_item
0918 0010
0230 0010
0260 0010
0918 0010
0918 0010
0918 0010
0070 0010
0080 0010
0110 0010
0250 0010
0035 0010
I hope this will help someone in the future with leading zeroes.
Oracle
Select
LPAD(1, 4, '0') "A",
LPAD(12, 4, '0') "B",
LPAD(123, 4, '0') "C",
LPAD(1234, 4, '0') "D",
LPAD(12345, 4, '0') "E"
From Dual
--
-- R e s u l t
--
-- A B C D E
-- ---- ---- ---- ---- ----
-- 0001 0012 0123 1234 1234
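Note that LPAD truncates values wider than the target length: in column E, 12345 comes out as 1234.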
Add 10,000 to the value; cast the result to a VARCHAR(5) (or longer); take SUBSTR(2,4) of that.
SELECT
SUBSTR((line_item + 10000)::VARCHAR(5),2,4) AS s_line_item
, SUBSTR((column_item + 10000)::VARCHAR(5),2,4) AS s_column_item
FROM indata;
-- out s_line_item | s_column_item
-- out -------------+---------------
-- out 0918 | 0010
-- out 0230 | 0010
-- out 0260 | 0010
-- out 0918 | 0010
-- out 0918 | 0010
-- out 0918 | 0010
-- out 0070 | 0010
-- out 0080 | 0010
-- out 0110 | 0010
-- out 0250 | 0010
-- out 0035 | 0010
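Note that this arithmetic trick assumes every value is below 10,000; a five-digit value such as 12345 would become 22345, and SUBSTR(2,4) would return 2345.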

SQL Server not displaying currency symbol correctly

I have the following encoding problem: in 2 databases I should have the same data (the 2nd database is a newer version of the 1st). In some tables, characters are not displayed correctly, such as a currency table that holds the names and symbols of different currencies.
I use SSMS to query both databases.
Id Name R7 R8 Name Different
-------------------------------------
148 DZD DZD 0
37 EGP £ £ EGP 1
149 ERN ERN 0
150 ETB ETB 0
1 EUR € € EUR 1
40 FJD $ $ FJD 0
39 FKP £ £ FKP 1
2 GBP £ £ GBP 1
151 GEL GEL 0
46 GGP £ £ GGP 1
42 GHC ¢ ¢ GHC 1
152 GHS GHS 0
Both tables (Currency) have the same structure and collation for the symbol columns (R7 & R8): SQL_Latin1_General_CP1_CI_AS. I have tried to look up encoding solutions, but have run out of ideas for what to ask Google.
Does anyone know what might cause R7 to display incorrectly while R8 displays correctly?
(The column definitions for R7 and R8 were posted as screenshots and are not reproduced here.)
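One way to narrow this down is to check whether the two columns differ in what they store or only in how it is rendered. A diagnostic sketch (the Currency table name comes from the question; adjust the column names to your schema):
SELECT Id, Name,
R7, CONVERT(varbinary(16), R7) AS r7_bytes, -- raw bytes actually stored
R8, CONVERT(varbinary(16), R8) AS r8_bytes
FROM Currency
WHERE Name IN ('EGP', 'EUR', 'GBP');

-- Compare the declared column types as well; a varchar vs nvarchar mismatch,
-- or data loaded through a client using a different code page, commonly
-- produces exactly this two-byte degradation (e.g. 0xC2A3 for £ vs 0xA3 for £).
SELECT c.name, t.name AS type_name, c.collation_name
FROM sys.columns c
JOIN sys.types t ON t.user_type_id = c.user_type_id
WHERE c.object_id = OBJECT_ID('Currency');
If r7_bytes and r8_bytes differ, the problem is in the stored data (how it was inserted), not in how SSMS displays it.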

Alphanumeric Sorting in PostgreSQL 9.4

I have a table in a PostgreSQL 9.4 database in which one of the columns contains values that mix numbers and letters, in the following format:
1
10
10A
10A1
1A
1A1
1A1A
1B
1C
1C1
2
65
89
The format is: it starts with a number, then a letter, then a number, then a letter, and so on. I want to sort the field like below:
1
1A
1A1
1A1A
1B
1C
1C1
2
10
10A
10A1
65
89
But when sorting, 10 comes before 2, because a plain text sort compares character by character. Please suggest a possible query to obtain the desired result.
Thanks in advance
Try this:
SELECT *
FROM table_name
ORDER BY (substring(column_name, '^[0-9]+'))::int -- cast to integer
,coalesce(substring(column_name, '[^0-9_].*$'),'')
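A self-contained demo against the sample data (the table and column names here are placeholders):
CREATE TABLE demo (col text);
INSERT INTO demo VALUES
('1'),('10'),('10A'),('10A1'),('1A'),('1A1'),('1A1A'),
('1B'),('1C'),('1C1'),('2'),('65'),('89');

SELECT col
FROM demo
ORDER BY (substring(col, '^[0-9]+'))::int -- leading digits, compared numerically
,coalesce(substring(col, '[^0-9_].*$'),''); -- the rest, compared as text
This returns the rows in the desired order shown above.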

Import (non-CSV) text data to PostgreSQL, which is separated via spaces and one capital letter

This is the first time I am working with SQL. I am using PostgreSQL on Windows 7 64-bit.
I have the following (large) .txt file of tweets built like this:
T 2009-06-07 02:07:41
U http://twitter.com/cyberplumber
W SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
As you see, all three "columns" are tagged in the following fashion: T \t (the same goes for U and W), instead of being separated by the traditional comma (,).
I would like to import the whole file into a SQL table with columns named date, user and text_msg.
I am guessing I will probably have to parse it in some way. Any ideas on how to get the data into a table in the simplest and most efficient manner? Please also consider that the .txt files in question are rather huge (>4GB), so there is no easy way for me to edit them manually.
Quick&dirty hack:
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE lutser
( id SERIAL NOT NULL PRIMARY KEY
, ztxt text
);
CREATE TABLE tweetdeck
( id SERIAL NOT NULL PRIMARY KEY
, stamp timestamp NOT NULL
, zurl text
, ztxt text
);
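-- Load the raw lines one per row; the SERIAL id preserves the original
-- line order, which the join below relies on.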
COPY lutser(ztxt)
FROM '/tmp/tweet.dat'
;
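-- Stitch each group of three consecutive lines (T, U, W) back into one
-- record, stripping the leading tag letter and the whitespace after it.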
INSERT INTO tweetdeck (stamp, zurl, ztxt)
SELECT regexp_replace( t.ztxt, E'^[A-Z][ \t]*', '')::timestamp
, regexp_replace( u.ztxt, E'^[A-Z][ \t]*', '')
, regexp_replace( w.ztxt, E'^[A-Z][ \t]*', '')
FROM lutser t
JOIN lutser u ON u.id = t.id+1
JOIN lutser w ON w.id = t.id+2
WHERE t.id %3 = 1
AND LEFT(t.ztxt,1) = 'T' -- Should be redundant, Won't harm
AND LEFT(u.ztxt,1) = 'U'
AND LEFT(w.ztxt,1) = 'W'
;
SELECT * FROM lutser;
SELECT * FROM tweetdeck;
Results:
COPY 9
INSERT 0 3
id | ztxt
----+--------------------------------------------------------------------------------------------------------------------------------------------------
1 | T 2009-06-07 02:07:31
2 | U http://twitter.com/cyberplumber
3 | W SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
4 | T 2009-06-07 02:07:41
5 | U http://twitter.com/cyberplumber
6 | W SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
7 | T 2009-06-07 02:07:51
8 | U http://twitter.com/cyberplumber
9 | W SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
(9 rows)
id | stamp | zurl | ztxt
----+---------------------+---------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------
1 | 2009-06-07 02:07:31 | http://twitter.com/cyberplumber | SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
2 | 2009-06-07 02:07:41 | http://twitter.com/cyberplumber | SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
3 | 2009-06-07 02:07:51 | http://twitter.com/cyberplumber | SPC Severe Thunderstorm Watch 339: WW 339 SEVERE TSTM KS NE 070200Z - 070800Z URGENT - IMMEDIATE BROADCAST REQUE.. http://tinyurl.com/5th9sw
(3 rows)
Try the following:
First, create an appropriate table in SQL like so:
CREATE TABLE tweet(
ts timestamp, -- if inserting the values as timestamps gives errors, change to 'TEXT'
url TEXT, -- there are smarter UDTs available for URLs too
message TEXT
);
Then go on and try to run a standard COPY statement, something like the following:
COPY tweet
FROM E'c:\\my dir\\filename' -- E'' escaped string: each backslash in the Windows path is written as \\
WITH (FORMAT text); -- the default delimiter for format text is a tab
Finally, pray that you'll have enough memory and log space for >4GB of files. For more info regarding the COPY command, see http://www.postgresql.org/docs/9.2/static/sql-copy.html

PostgreSQL: strange collision of ORDER BY and LIMIT/OFFSET

I'm trying to do this in PostgreSQL 9.1:
SELECT m.id, vm.id, vm.value
FROM m
LEFT JOIN vm ON vm.m_id = m.id and vm.variation_id = 1
ORDER BY lower(trim(vm.value)) COLLATE "C" ASC LIMIT 10 OFFSET 120
The result is:
id | id | value
----+-----+---------------
504 | 511 | "andr-223322"
506 | 513 | "andr-322223"
824 | 831 | "angHybrid"
866 | 873 | "Another thing"
493 | 500 | "App update required!"
837 | 844 | "App update required!"
471 | 478 | "April"
905 | 912 | "Are you sure you want to delete this thing?"
25 | 29 | "Assignment"
196 | 201 | "AT ADDRESS"
Ok, let's execute the same query with OFFSET 130:
id | id | value
----+-----+---------------
196 | 201 | "AT ADDRESS"
256 | 261 | "Att Angle"
190 | 195 | "Att Angle"
273 | 278 | "Att Angle:"
830 | 837 | "attAngle"
475 | 482 | "August"
710 | 717 | "Averages"
411 | 416 | "AVG"
692 | 699 | "AVG SHAPE"
410 | 415 | "AVGs"
and we see our AT ADDRESS item again, but at the beginning!!!
The fact is that the vm table contains the following two items:
id | m_id | value
----+------+---------------
201 | 196 | "AT ADDRESS"
599 | 592 | "At Address"
I cure this situation with a workaround:
(lower(trim(vm.value)) || vm.id)
but What The Hell ???!!!
Why do I have to use a workaround?
Swearing won't change the SQL standard that defines this behavior.
The order of rows is undefined unless specified in ORDER BY. The manual:
If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.
Since you didn't define an order for these two peers (in your sort order):
id | m_id | value
----+------+---------------
201 | 196 | "AT ADDRESS"
599 | 592 | "At Address"
.. you get arbitrary ordering - whatever is convenient for Postgres. A query with LIMIT often uses a different query plan, which can explain different results.
Fix
ORDER BY lower(trim(vm.value)) COLLATE "C", vm.id;
Or (maybe more meaningful - possibly also tune to existing indexes):
ORDER BY lower(trim(vm.value)) COLLATE "C", vm.value, vm.id;
(This is unrelated to the use of COLLATE "C" here, btw.)
Don't concatenate for this purpose; that's much more expensive and potentially makes it impossible to use an index (unless you have an index on that precise expression). Add another expression that kicks in when prior expressions in the ORDER BY list leave ambiguity.
Also, since you have a LEFT JOIN there, rows in m without match in vm have null values for all current ORDER BY expressions. They come last and are sorted arbitrarily otherwise. If you want a stable sort order overall, you need to deal with that, too. Like:
ORDER BY lower(trim(vm.value)) COLLATE "C", vm.id, m.id;
Asides
Why store the double quotes? Seems to be costly noise. You might be better off without them. You can always add the quotes on output if need be.
Many clients cannot deal with the same column name multiple times in one result. You need a column alias for at least one of your id columns: SELECT m.id AS m_id, vm.id AS vm_id .... Goes to show why "id" for a column is an anti-pattern to begin with.
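Putting it together for the query in the question (stable tiebreakers plus column aliases):
SELECT m.id AS m_id, vm.id AS vm_id, vm.value
FROM m
LEFT JOIN vm ON vm.m_id = m.id AND vm.variation_id = 1
ORDER BY lower(trim(vm.value)) COLLATE "C", vm.id, m.id
LIMIT 10 OFFSET 120;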