Spark SQL parse exception [duplicate] - sql

I'm new to Spark SQL. Is there an equivalent to "CASE WHEN 'CONDITION' THEN 0 ELSE 1 END" in Spark SQL?
select case when 1=1 then 1 else 0 end from table
Thanks
Sridhar

Before Spark 1.2.0
The supported syntax (which I just tried out on Spark 1.0.2) seems to be
SELECT IF(1=1, 1, 0) FROM table
This recent thread http://apache-spark-user-list.1001560.n3.nabble.com/Supported-SQL-syntax-in-Spark-SQL-td9538.html links to the SQL parser source, which may or may not help depending on your comfort with Scala. At the very least the list of keywords starting (at time of writing) on line 70 should help.
Here's the direct link to the source for convenience: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala.
Update for Spark 1.2.0 and beyond
As of Spark 1.2.0, the more traditional syntax is supported, in response to SPARK-3813: search for "CASE WHEN" in the test source. For example:
SELECT CASE WHEN key = 1 THEN 1 ELSE 2 END FROM testData
Update for most recent place to figure out syntax from the SQL Parser
The parser source can now be found here.
Update for more complex examples
In response to a question below, the modern syntax supports complex Boolean conditions.
SELECT
CASE WHEN id = 1 OR id = 2 THEN "OneOrTwo" ELSE "NotOneOrTwo" END AS IdRedux
FROM customer
You can involve multiple columns in the condition.
SELECT
CASE WHEN id = 1 OR state = 'MA'
THEN "OneOrMA"
ELSE "NotOneOrMA" END AS IdRedux
FROM customer
You can also nest CASE WHEN ... THEN expressions.
SELECT
  CASE WHEN id = 1
       THEN "OneOrMA"
       ELSE
         CASE WHEN state = 'MA' THEN "OneOrMA" ELSE "NotOneOrMA" END
  END AS IdRedux
FROM customer

For Spark 2.x and later
Spark's when function
From the documentation:
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer.
// Scala:
people.select(when(col("gender") === "male", 0)
.when(col("gender") === "female", 1)
.otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
.when(col("gender").equalTo("female"), 1)
.otherwise(2))
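For comparison, the same logic can be written in Spark SQL (a minimal sketch; it assumes people has been registered as a view and has the gender column from the example above):
SELECT CASE WHEN gender = 'male' THEN 0
            WHEN gender = 'female' THEN 1
            ELSE 2
       END AS gender_code
FROM people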

This syntax worked for me in Databricks:
select
  org,
  patient_id,
  case
    when (age is null) then 'Not Available'
    when (age < 15) then 'Less than 15'
    when (age >= 15 and age < 25) then '15 to 25'
    when (age >= 25 and age < 35) then '25 to 35'
    when (age >= 35 and age < 45) then '35 to 45'
    when (age >= 45) then '45 and Older'
  end as age_range
from demo

An analog of Oracle SQL's decode() function can be implemented in Spark SQL as follows:
case
  when exp1 in ('a','b','c')
    then element_at(map('a','A','b','B','c','C'), exp1)
  else exp1
end
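For reference, the Oracle original this emulates would look like the following (illustrative only; exp1 and some_table are placeholder names):
SELECT DECODE(exp1, 'a', 'A', 'b', 'B', 'c', 'C', exp1)
FROM some_table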

Based on my current production code, this works:
val identifierDF =
tempIdentifierDF.select(tempIdentifierDF("t_item_account_id"),
when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_cusip")),100)
.when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_ticker")),100)
.when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_isin")),100)
.when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_sedol")),100)
.when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_valoren")),100)
.otherwise(0)
.alias("identifier_in_description_score")
)

The Spark DataFrame API (Python version) also lets you run this kind of query:
df.selectExpr('time',
    'CASE WHEN (time > 1) THEN time * 1.1 ELSE time END AS updated_time')
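Equivalently, you can register the DataFrame as a temporary view with df.createOrReplaceTempView('events') (view name assumed) and run the same expression as plain SQL:
SELECT time,
       CASE WHEN time > 1 THEN time * 1.1 ELSE time END AS updated_time
FROM events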

Related

Case expression with Boolean from PostgreSQL to SQL Server

I am translating a query from PostgreSQL to SQL Server. I didn't write the query in PostgreSQL, and it's quite complicated for my level of knowledge, so I don't understand every piece of it.
From my understanding: we are trying to find the max version from p_policy, and when insurancestatus is 7 or 14, or transactiontype is 'CAN', we compare two dates (whose format is BIGINT).
This is the PG Query:
SELECT *
FROM BLABLABLA
WHERE
pol.vnumber = (
SELECT MAX(pol1.vnumber)
FROM p_policy pol1
AND ( CASE WHEN pol1.insurancestatus IN (7,14)
or pol1.transactiontype IN ('CAN')
-- ('CAN','RCA')
THEN pol1.veffectivedate = pol1.vexpirydate
ELSE pol1.veffectivedate <> pol1.vexpirydate
END
)
AND pol1.vrecordstatus NOT IN (30,254)
etc.
I am used to having a WHERE clause that compares a column to a value. I understand that the CASE expression here yields a boolean, but surely that must still be compared to something?
Anyway, the main purpose is to make it work in SQL Server, but I believe SQL Server can't parse a CASE expression whose THEN branch is a comparison.
This is what I tried:
SELECT *
FROM BLABLABLA
WHERE pol.vnumber =
(
SELECT MAX(pol1.vnumber)
FROM p_policy pol1
WHERE sbuid = 4019
AND ( CASE WHEN pol1.insurancestatus IN (7,14)
or pol1.transactiontype IN ('CAN')
THEN CASE
WHEN pol1.veffectivedate = pol1.vexpirydate THEN 1
WHEN pol1.veffectivedate <> pol1.vexpirydate THEN 0
END
END
)
AND pol1.vrecordstatus NOT IN (30,254)
etc.
And then I get this error from SQL Server (which points directly at the last line of the current code, so after the nested CASE expression):
Msg 4145, Level 15, State 1, Line 55
An expression of non-boolean type specified in a context where a condition is expected, near 'AND'.
Thank you! Let me know if it is not clear.
I think you want boolean logic. The CASE expression would translate as:
(
(
(pol1.insurancestatus IN (7,14) OR pol1.transactiontype = 'CAN')
AND pol1.veffectivedate = pol1.vexpirydate
) OR (
NOT (pol1.insurancestatus IN (7,14) OR pol1.transactiontype = 'CAN')
AND pol1.veffectivedate <> pol1.vexpirydate
)
)
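Plugged back into the subquery, the rewritten predicate would look like this (a sketch; sbuid = 4019 is carried over from your attempt):
SELECT MAX(pol1.vnumber)
FROM p_policy pol1
WHERE pol1.sbuid = 4019
  AND (
        (
          (pol1.insurancestatus IN (7,14) OR pol1.transactiontype = 'CAN')
          AND pol1.veffectivedate = pol1.vexpirydate
        ) OR (
          NOT (pol1.insurancestatus IN (7,14) OR pol1.transactiontype = 'CAN')
          AND pol1.veffectivedate <> pol1.vexpirydate
        )
      )
  AND pol1.vrecordstatus NOT IN (30,254)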
There are two main issues with your snippet, SQL Server-syntax-wise.
SELECT * FROM BLABLABLA WHERE
pol.vnumber = /* PROBLEM 1: we haven't defined pol yet; SQL Server has no idea what pol.vnumber is here, so you're going to get an error when you resolve your boolean issue */
(
SELECT MAX(pol1.vnumber)
FROM p_policy pol1
WHERE sbuid = 4019
AND ( CASE WHEN pol1.insurancestatus IN (7,14)
or pol1.transactiontype IN ('CAN')
THEN CASE
WHEN pol1.veffectivedate = pol1.vexpirydate THEN 1
WHEN pol1.veffectivedate <> pol1.vexpirydate THEN 0
END
END
) /* PROBLEM 2: Your CASE expression returns a 1 or a 0,
which means your WHERE is saying
WHERE sbuid = 4019
AND (1)
AND pol1.vrecordstatus NOT IN (30,254)
SQL Server doesn't like that. I think you meant to add a boolean comparison using your 1 or 0 after the parenthesis,
like this: */
= 1
AND pol1.vrecordstatus NOT IN (30,254)

Oracle PLSQL using a dynamic variable in a where clause

For testing in Toad, I have the following code:
select ...
from ...
where term_code = :termcode AND
(
case
when :theSubject is not null then SUBJ_CODE = :theSubject
else 1 = 1
end
)
AND ptrm_code <> 8
In short: if :theSubject is not entered (is null), I want to display all the courses; otherwise I want to display only those where subj_code matches the value entered in the variable window in Toad.
But I get an error:
[Error] Execution (77: 68): ORA-00905: missing keyword
in here:
when :theCourse is not null then sect.SSBSECT_SUBJ_CODE = theCourse
You can use boolean logic:
where
term_code = :termcode
and (:theSubject is null or subj_code = :theSubject)
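If you would rather keep a CASE, have it return a value and compare that value outside the expression; Oracle accepts this form (a sketch of the same filter):
where term_code = :termcode
  and 1 = case
            when :theSubject is null then 1
            when subj_code = :theSubject then 1
            else 0
          end
  and ptrm_code <> 8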

Difference between querying from Impala and querying from Hive?

I have a Hive source table which contains:
select count(*) from dev_lkr_send.pz_send_param_ano;
--25283 lines
I am trying to get all of the table's rows and put them into a DataFrame using Spark 2 (Scala). I did the following:
val dfMet = spark.sql(s"""SELECT
CD_ANOMALIE,
CD_FAMILLE,
libelle AS LIB_ANOMALIE,
to_date(substr(MAJ_DATE, 1, 19), 'YYYY-MM-DD HH24:MI:SS') AS DT_MAJ,
CLASSIFICATION,
NB_REJEUX,
case when indic_cd_erreur = 'O' then 1 else 0 end AS TOP_INDIC_CD_ERREUR,
case when invalidation_coordonnee = 'O' then 1 else 0 end AS TOP_COORDONNEE_INVALIDE,
case when typ_mvt = 'S' then 1 else 0 end AS TOP_SUPP,
case when typ_mvt = 'S' then to_date(substr(dt_capt, 1, 19), 'YYYY-MM-DD HH24:MI:SS') else null end AS DT_SUPP
FROM ${use_database}.pz_send_param_ano""")
When I execute dfMet.count() it returns: 46314
Any ideas about the source of the difference?
EDIT1:
Trying the same query from Hive returns the same value as the DataFrame (I was querying from the Impala UI before).
Can someone explain the difference, please? I am working on Hue 4.
A potential source of the difference is that your Hive query is returning the result from the metastore, which is out of date, rather than running a fresh count against the table.
If you have hive.compute.query.using.stats set to true and the table has statistics computed, then the count will be answered from the metastore. If this is the case, your stats may simply be out of date, and you need to recompute them.
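A quick way to check and fix this (a sketch; these are standard Hive and Impala statements, with the table name taken from the question):
-- Hive: see whether counts may be answered from stored statistics
SET hive.compute.query.using.stats;
-- Hive: recompute statistics so the metastore matches the data
ANALYZE TABLE dev_lkr_send.pz_send_param_ano COMPUTE STATISTICS;
-- Impala: pick up metadata changes made outside Impala
INVALIDATE METADATA dev_lkr_send.pz_send_param_ano;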

Denodo - Unable to case DATE>= addday(cast(now() as date),-365)

I've encountered an issue when trying to obtain the following output:
"If x_date >= now-365 then 1 else 0"
My select statement reads:
SELECT
id,
x_date,
CASE x_date
WHEN x_date >= addday(cast(now() as date),-365) then 1
else 0
end as output
I'm receiving an error message that reads:
"SQL Error [30100] [HY000]: CASE argument case((xdate,ge,[addday(trunc(cast('date', now(), 'DATE')) '-365')], utc_il8n), 'true', 'false') is not compatible with the rest of the values.
Has anyone else performed a similar operation with dates in a CASE statement? The Addday works fine and returns 2017-01-05.
The issue is with the CASE syntax: CASE x_date WHEN ... starts the simple form of CASE, which matches x_date against a list of values, so the boolean comparison after WHEN is treated as a value of an incompatible type. The searched form drops the column after CASE. I think reading this and other sources may have caused the confusion: https://www.techonthenet.com/sql_server/functions/case.php
It should read:
SELECT
id,
x_date,
CASE WHEN x_date >= addday(cast(now() as date),-365) then 1
else 0
end as output
