Apache Calcite Fails to Parse Its Own Google BigQuery Output

I'm new to Apache Calcite and am running into a strange "gap" - given a simple select query:
select * from orders
I parse it using:
SqlParser.Config sqlParserConfig = SqlParser
.configBuilder()
.setConformance(SqlConformanceEnum.LENIENT)
.build();
SqlParser parser = SqlParser.create(sqlQuery, sqlParserConfig);
SqlNode parseAst = parser.parseQuery();
CalciteCatalogReader catalogReader = buildCatalogReader(schema, typeFactory);
SqlValidator.Config validatorConf = SqlValidator.Config.DEFAULT.withSqlConformance(SqlConformanceEnum.BIG_QUERY);
validatorConf = validatorConf.withIdentifierExpansion(true);
SqlValidator validator = SqlValidatorUtil.newValidator(SqlStdOperatorTable.instance(),
catalogReader, typeFactory,
validatorConf);
validator.validate(parseAst).toSqlString(BigQuerySqlDialect.DEFAULT).toString();
Which results in:
SELECT ORDERS.o_orderkey, ORDERS.o_custkey, ORDERS.o_orderstatus, ORDERS.o_totalprice, ORDERS.`o_order date`, ORDERS.o_orderpriority, ORDERS.o_clerk, ORDERS.o_shippriority, ORDERS.o_comment
FROM ORDERS AS ORDERS
(reasonable; note the quoted identifier for `o_order date`, necessary because of the whitespace in the column name)
If I then take that query string and pass it back through (setting conformance to SqlConformanceEnum.BIG_QUERY) the parse fails with:
org.apache.calcite.sql.parser.SqlParseException: Lexical error at line 1, column 95. Encountered: "`" (96), after : ""
That was puzzling on its face, so I tried again with a parser config:
SqlParser.Config sqlParserConfig = SqlParser
.configBuilder()
.setConformance(getConformance(SqlConformanceEnum.BIG_QUERY))
.setQuoting(Quoting.BACK_TICK_BACKSLASH)
.build();
to force handling of backtick-quoted identifiers, and I end up with:
SELECT ORDERS.o_orderkey AS O_ORDERKEY, ORDERS.o_custkey AS O_CUSTKEY, ORDERS.o_orderstatus AS O_ORDERSTATUS, ORDERS.o_totalprice AS O_TOTALPRICE, ORDERS.`o_order date`, ORDERS.o_orderpriority AS O_ORDERPRIORITY, ORDERS.o_clerk AS O_CLERK, ORDERS.o_shippriority AS O_SHIPPRIORITY, ORDERS.o_comment AS O_COMMENT
FROM ORDERS AS ORDERS
which is usable... but:
why does the "default" conformance for BigQuery set DOUBLE_QUOTE when BigQuery uses backticks?
why does loop-parsing produce a different query from the input? (Running the aliased query back through a second time returns the same query, so it does stabilize, but the initial inconsistency is weird.)

I've performed a JOIN using bigrquery and the dbGetQuery function. Now I'd like to query the temporary table I've created but can't connect

I'm afraid that if a bunch of folks start running my actual code I'll be billed for the queries so my example code is for a fake database.
I've successfully established my connection to BigQuery:
con <- dbConnect(
bigrquery::bigquery(),
project = 'myproject',
dataset = 'dataset',
billing = 'myproject'
)
Then performed a LEFT JOIN using the coalesce function:
dbGetQuery(con,
"SELECT
`myproject.dataset.table_1`.Pokemon,
coalesce(`myproject.dataset.table_1`.Type_1,`myproject.dataset.table_2`.Type_1) AS Type_1,
coalesce(`myproject.dataset.table_1`.Type_2,`myproject.dataset.table_2`.Type_2) AS Type_2,
`myproject.dataset.table_1`.Total,
`myproject.dataset.table_1`.HP,
`myproject.dataset.table_1`.Attack,
`myproject.dataset.table_1`.Special_Attack,
`myproject.dataset.table_1`.Defense,
`myproject.dataset.table_1`.Special_Defense,
`myproject.dataset.table_1`.Speed,
FROM `myproject.dataset.table_1`
LEFT JOIN `myproject.dataset.table_2`
ON `myproject.dataset.table_1`.Pokemon = `myproject.dataset.table_2`.Pokemon
ORDER BY `myproject.dataset.table_1`.ID;")
The JOIN produced the table I intended and now I'd like to query that table but like...where is it? How do I connect? Can I save it locally so that I can start working my analysis in R? Even if I go to BigQuery, select the Project History tab, select the query I just ran in RStudio, and copy the Job ID for the temporary table, I still get the following error:
Error: Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Run `rlang::last_error()` to see where the error occurred.
And if I follow up:
> rlang::last_error()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
1. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. DBI:::.local(conn, statement, ...)
5. bigrquery::dbSendQuery(conn, statement, ...)
6. bigrquery:::BigQueryResult(conn, statement, ...)
7. bigrquery::bq_job_wait(job, quiet = conn@quiet)
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/rlang_error>
Job 'poke-340100.job_y0IBocmd6Cpy-irYtNdLJ-mWS7I0.US' failed
x Syntax error: Unexpected string literal 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae' at [2:6] [invalidQuery]
Backtrace:
x
1. +-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
2. \-DBI::dbGetQuery(con, "SELECT *\nFROM 'poke-340100:US.bquxjob_7c3a7664_17ed44bb4ae'\nWHERE Type_1 IS NULL;")
3. \-DBI:::.local(conn, statement, ...)
4. +-DBI::dbSendQuery(conn, statement, ...)
5. \-bigrquery::dbSendQuery(conn, statement, ...)
6. \-bigrquery:::BigQueryResult(conn, statement, ...)
7. \-bigrquery::bq_job_wait(job, quiet = conn@quiet)
Can someone please explain? Is it just that I can't query a temporary table with the bigrquery package?
From looking at the documentation here and here, the problem might just be that you did not assign the results anywhere.
local_df = dbGetQuery(...
should take the results from your database query and copy them into local R memory. Take care, as there is no check on the size of the results, so it is easy to run out of memory when doing this.
You have tagged the question with dbplyr, but it looks like you are just using the DBI package. If you want to be writing R and have it translated to SQL, then you can do this using dbplyr. It would look something like this:
con <- dbConnect(...) # your connection details here
remote_tbl1 = tbl(con, from = "table_1")
remote_tbl2 = tbl(con, from = "table_2")
new_remote_tbl = remote_tbl1 %>%
left_join(remote_tbl2, by = "Pokemon", suffix = c("",".y")) %>%
mutate(Type_1 = coalesce(Type_1, Type_1.y),
Type_2 = coalesce(Type_2, Type_2.y)) %>%
select(ID, Pokemon, Type_1, Type_2, ...) %>% # list your return columns
arrange(ID)
When you use this approach, new_remote_tbl can be thought of as a new table in the database which you can query and manipulate further. (It is not actually a table - no data was saved to disc - but you can query it and interact with it as if it were and the database will produce it for you on demand).
There are some limitations of working with a remote table (the biggest is you are limited to commands that dbplyr can translate into SQL). When you want to copy the current remote table into local R memory, use collect:
local_df = new_remote_tbl %>%
collect()

Fetching attribute from JSON string with JSON_VAL cause "<attribute> is invalid in the used context" error

A proprietary third-party application stores JSON strings in its database like this one:
{"state":"complete","timestamp":1614776473000}
I need the timestamp and found out that DB2 offers JSON functions. Since the JSON is stored as a string in the PROF_VALUE column, I guess that converting it with SYSTOOLS.JSON2BSON is required before I can use JSON_VAL to fetch the timestamp:
SELECT SYSTOOLS.JSON_VAL(SYSTOOLS.JSON2BSON(PROF_VALUE), "timestamp", "f")
FROM EMPINST.PROFILE_EXTENSIONS ext
WHERE PROF_PROPERTY_ID = 'touchpointState'
This causes an error saying that timestamp is invalid in the used context (SQLCODE=-206, SQLSTATE=42703, DRIVER=4.26.14). The same error is thrown when I remove the JSON2BSON call like this:
SELECT SYSTOOLS.JSON_VAL(PROF_VALUE, "timestamp", "f")
Also not working with the same error (different data-types):
SELECT SYSTOOLS.JSON_VAL(SYSTOOLS.JSON2BSON(PROF_VALUE), "state", "s:1000")
SELECT SYSTOOLS.JSON_VAL(PROF_VALUE, "state", "s:1000")
I don't understand this error. My syntax follows the documented JSON_VAL ( json-value , search-string , result-type ) and is the same as in the examples, which show how to fetch the name field of an object.
I also played around a bit with JSON_TABLE to use raw input data for testing (instead of the database data), but it does not seem suitable for that.
SELECT *
FROM TABLE(SYSTOOLS.JSON_TABLE( SYSTOOLS.JSON2BSON('{"state":"complete","timestamp":1614776473000}'), 'state','s:32')) DATA
This gave me a table with one row: Type = 2 and Value = complete.
I had two problems in my query. First, double quotes " denote object references (identifiers such as column names), so "timestamp" was looked up as a column, which explains the error. I wasn't aware of the difference, because in most databases I have used so far, single ' and double quotes " are interchangeable.
The second problem is that JSON_VAL needs to be called without the SYSTOOLS prefix, while the prefix is still needed on SYSTOOLS.JSON2BSON(PROF_VALUE).
With those changes, the following query worked:
SELECT JSON_VAL(SYSTOOLS.JSON2BSON(PROF_VALUE), 'timestamp', 'f')
FROM EMPINST.PROFILE_EXTENSIONS ext
WHERE PROF_PROPERTY_ID = 'touchpointState'

PYTHON - Using double quotes in SQL constant

I have a SQL query stored in a constant. One of the fields that I need to put in my where clause is USER, which is a keyword. To run the query I put the keyword in double quotes.
I have tried all of the suggestions from here yet none seem to be working.
Here is what I have for my constant:
SELECT_USER_SECURITY = "SELECT * FROM USER_SECURITY_TRANSLATED WHERE \"USER\" = '{user}' and COMPANY = " \
"'company_number' and TYPE NOT IN (1, 4)"
I am not sure how to get this query to work from my constant.
I also tried wrapping the whole query in """. I am getting a key error on the USER.
SELECT_USER_SECURITY = """SELECT * FROM USER_SECURITY_TRANSLATED WHERE "USER" = '{user}' and
COMPANY = 'company_number' and TYPE NOT IN (1, 4)"""
Below is the error I am getting:
nose.proxy.KeyError: 'user'
So the triple-quoted solution was the best one. The problem I was running into was that I had not included the "user" key in the dictionary of params used to format the query.
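A minimal sketch of the behaviour described above, assuming the constant is filled in with str.format and a dictionary of parameters. The two dictionaries and the user name jsmith are made up for illustration; they are not from the original code.

```python
# The constant from the question, written as a triple-quoted string so the
# double quotes around the reserved word USER need no escaping.
SELECT_USER_SECURITY = """SELECT * FROM USER_SECURITY_TRANSLATED WHERE "USER" = '{user}' and
COMPANY = 'company_number' and TYPE NOT IN (1, 4)"""

# Hypothetical parameter dictionaries, for illustration only.
params_missing_key = {"company": "001"}   # no "user" key
params_complete = {"user": "jsmith"}

try:
    SELECT_USER_SECURITY.format(**params_missing_key)
    error_message = None
except KeyError as exc:
    # str.format raises KeyError naming the placeholder it could not fill.
    error_message = repr(exc)

# With the key present, the placeholder is substituted and the quoted
# identifier "USER" is left untouched.
query = SELECT_USER_SECURITY.format(**params_complete)

print(error_message)
print(query)
```

If the value of user comes from outside the program, passing it as a bound parameter through the database driver is generally safer than formatting it into the SQL string.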

Multiple parameter values

I have a problem with BIRT when I try to pass multiple values from a report parameter.
I'm using BIRT 2.6.2 and Eclipse.
I'm trying to pass multiple values from "JDSuser", the last parameter of a cascading parameter group. The parameter is allowed to have multiple values and I'm using a list box.
To do that, I write my SQL query with a where ... in clause and replace the placeholder text with JavaScript. Otherwise the BIRT SQL query can't receive multiple values from a report parameter.
My SQL query is
select jamacomment.createdDate, jamacomment.scopeId,
jamacomment.commentText, jamacomment.documentId,
jamacomment.highlightQuote, jamacomment.organizationId,
jamacomment.userId,
organization.id, organization.name,
userbase.id, userbase.firstName, userbase.lastName,
userbase.organization, userbase.userName,
document.id, document.name, document.description,
user_role.userId, user_role.roleId,
role.id, role.name
from jamacomment jamacomment left join
userbase on userbase.id=jamacomment.userId
left join organization on
organization.id=jamacomment.organizationId
left join document on
document.id=jamacomment.documentId
left join user_role on
user_role.userId=userbase.id
right join role on
role.id=user_role.roleId
where jamacomment.scopeId=11
and role.name in ( 'sample grupa' )
and userbase.userName in ( 'sample' )
and my javascript code for that dataset on beforeOpen state is:
if( params["JDSuser"].value[0] != "(All Users)" ){
this.queryText=this.queryText.replaceAll('sample grupa', params["JDSgroup"]);
var users = params["JDSuser"];
//var userquery = "'";
var userquery = userquery + users.join("', '");
//userquery = userquery + "'";
this.queryText=this.queryText.replaceAll('sample', userquery);
}
I tried many different quote variations. With this one I get no error messages, but if I choose one value I get no data from the database, and if I choose two or more values I only get data for the last chosen value.
If I uncomment one of those additional quote script lines, then I get a syntax error like this:
The following items have errors:
Table (id = 597):
+ An exception occurred during processing. Please see the following message for details: Failed to prepare the query execution for the
data set: Organization Cannot get the result set metadata.
org.eclipse.birt.report.data.oda.jdbc.JDBCException: SQL statement does not return a ResultSet object. SQL error #1:You have an error in
your SQL syntax; check the manual that corresponds to your MySQL
server version for the right syntax to use near 'rudolfs.sviklis',
'sample' )' at line 25 ;
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to
your MySQL server version for the right syntax to use near
'rudolfs.sviklis', 'sample' )' at line 25
Also, I should tell you that I'm doing this by following a working example. Everything is the same; the previous code resulted in the same syntax error, so I changed it to this script, which does the same thing.
The example is available here:
http://developer.actuate.com/community/forum/index.php?/files/file/593-default-value-all-with-multi-select-parsmeter/
If someone could give me at least a clue to what I should do that would be great.
You should always use the value property of a parameter, i.e.:
var users = params["JDSuser"].value;
It is not necessary to surround "userquery" with quotes, because these quotes are already present in the SQL query around 'sample'. Furthermore, there is a mistake because userquery is not yet defined at this line:
var userquery = userquery + users.join("', '");
This might introduce a string such as "null" in your query. Therefore remove all references to the userquery variable and just use this expression at the end:
this.queryText=this.queryText.replaceAll('sample', users.join("','"));
Notice that I removed the blank space in the join expression. Finally, once it works, you probably need to make your report input more robust by testing whether the value is null:
if( params["JDSuser"].value!=null && params["JDSuser"].value[0] != "(All Users)" ){
//Do stuff...
}
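The substitution described above can be tried outside BIRT. The following is a Python illustration (not BIRT script) of why joining the selected values with ',' between the quotes already present in the query yields a valid IN list. The helper name expand_in_list and the user names are hypothetical.

```python
def expand_in_list(query_text, placeholder, values):
    """Replace a placeholder that already sits between single quotes
    with the values joined by "','", so every value ends up quoted."""
    return query_text.replace(placeholder, "','".join(values))


# Fragment of the query from the question; 'sample' is the placeholder.
query_text = "and userbase.userName in ( 'sample' )"

# One selected value: the existing quotes simply wrap it.
one_user = expand_in_list(query_text, "sample", ["alice"])

# Two selected values: the separator closes one quote and opens the next.
two_users = expand_in_list(query_text, "sample", ["alice", "bob"])

print(one_user)
print(two_users)
```

This sketch does no escaping, so a value containing a single quote would still break the generated SQL.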

JSON Parser Has Issue With Sql Syntax But Query Works

To start, I had this error:
Error parsing data org.json.JSONException: Value You of type java.lang.String cannot be converted to JSONObject
After some searching, I found a potential solution using substring to see if there were just some phantom characters causing an issue: 'json.substring(3)'
After trying different substring amounts, I got to json.substring(36) and it finally showed me more than 5 letters at a time:
Error parsing data org.json.JSONException: Expected literal value at character 0 of ; check the manual that corresponds to your MySQL server version for the right syntax to use near 'AND from_user = 277976949048048 AND (status = 'pending' OR status = 'accepted'))' at line 2
Maybe this new 'expected literal' thing is caused by doing the substring 36, but either way, it seems like there is an issue with my SQL syntax, even though I tested the query directly on the server and it works perfectly. Here is my SQL query:
$result = mysql_query("SELECT status FROM users join requests on users.facebook_id=requests.from_user
WHERE (to_user = $id AND from_user = $fbid AND (status = 'pending' OR status = 'accepted'))
OR (from_user = $id AND to_user = $fbid AND (status = 'pending' OR status = 'accepted'))
LIMIT 1") or die(mysql_error());
Any help much appreciated because now I'm officially stumped.
I had some issues in the past with parsing JSON files including huge integer numbers.
The number 277976949048048 seems too large to be treated as an integer, I would suggest treating it as a string.
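As a quick check of the claim above: the value does exceed the range of a signed 32-bit integer, although it fits in 64 bits. A small Python sketch follows, which also shows the suggested workaround of carrying the ID as a string in JSON. The field name from_user is taken from the query; the payload itself is made up.

```python
import json

FB_ID = 277976949048048      # the ID from the error message
INT32_MAX = 2**31 - 1        # largest signed 32-bit integer
INT64_MAX = 2**63 - 1        # largest signed 64-bit integer

exceeds_int32 = FB_ID > INT32_MAX
fits_int64 = FB_ID <= INT64_MAX

# Workaround suggested above: send the ID as a JSON string, so no parser
# has to decide which integer type to use for it.
payload = json.dumps({"from_user": str(FB_ID)})
decoded = json.loads(payload)

print(exceeds_int32, fits_int64)
print(payload)
```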