I'm new to SPARK-SQL. Is there an equivalent to "CASE WHEN 'CONDITION' THEN 0 ELSE 1 END" in SPARK SQL ?
select case when 1=1 then 1 else 0 end from table
Thanks
Sridhar
Before Spark 1.2.0
The supported syntax (which I just tried out on Spark 1.0.2) seems to be
SELECT IF(1=1, 1, 0) FROM table
This recent thread http://apache-spark-user-list.1001560.n3.nabble.com/Supported-SQL-syntax-in-Spark-SQL-td9538.html links to the SQL parser source, which may or may not help depending on your comfort with Scala. At the very least the list of keywords starting (at time of writing) on line 70 should help.
Here's the direct link to the source for convenience: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala.
Update for Spark 1.2.0 and beyond
As of Spark 1.2.0, the more traditional syntax is supported, in response to SPARK-3813: search for "CASE WHEN" in the test source. For example:
SELECT CASE WHEN key = 1 THEN 1 ELSE 2 END FROM testData
Update: the most recent place to figure out the syntax from the SQL parser
The parser source can now be found here.
Update for more complex examples
In response to a question below, the modern syntax supports complex Boolean conditions.
SELECT
CASE WHEN id = 1 OR id = 2 THEN "OneOrTwo" ELSE "NotOneOrTwo" END AS IdRedux
FROM customer
You can involve multiple columns in the condition.
SELECT
CASE WHEN id = 1 OR state = 'MA'
THEN "OneOrMA"
ELSE "NotOneOrMA" END AS IdRedux
FROM customer
You can also nest CASE WHEN ... THEN expressions.
SELECT
CASE WHEN id = 1
THEN "OneOrMA"
ELSE
CASE WHEN state = 'MA' THEN "OneOrMA" ELSE "NotOneOrMA" END
END AS IdRedux
FROM customer
For Spark 2.x and later
Spark when function
From the documentation:
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer.
// Scala (requires: import org.apache.spark.sql.functions.{col, when}):
people.select(when(col("gender") === "male", 0)
  .when(col("gender") === "female", 1)
  .otherwise(2))
// Java (requires: import static org.apache.spark.sql.functions.*;):
people.select(when(col("gender").equalTo("male"), 0)
  .when(col("gender").equalTo("female"), 1)
  .otherwise(2));
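The same encoding also works in plain Spark SQL if you'd rather register the DataFrame as a view; here's a minimal sketch (the people view and gender column are just the ones from the example above):
SELECT CASE WHEN gender = 'male' THEN 0
            WHEN gender = 'female' THEN 1
            ELSE 2
       END AS gender_code
FROM people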
This syntax worked for me in Databricks:
select
org,
patient_id,
case
when (age is null) then 'Not Available'
when (age < 15) then 'Less than 15'
when (age >= 15 and age < 25) then '15 to 25'
when (age >= 25 and age < 35) then '25 to 35'
when (age >= 35 and age < 45) then '35 to 45'
when (age >= 45) then '45 and Older'
end as age_range
from demo
An analog of Oracle SQL's decode() function can be implemented in Spark SQL as follows:
case
when exp1 in ('a','b','c')
then element_at(map('a','A','b','B','c','C'), exp1)
else exp1
end
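For context, here's that fragment inside a complete query; a minimal sketch, assuming a hypothetical table t with a string column exp1 (element_at and map require Spark 2.4+):
SELECT exp1,
       CASE WHEN exp1 IN ('a', 'b', 'c')
            THEN element_at(map('a', 'A', 'b', 'B', 'c', 'C'), exp1)
            ELSE exp1
       END AS decoded
FROM t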
Based on my current production code, this works:
val identifierDF = tempIdentifierDF.select(
  tempIdentifierDF("t_item_account_id"),
  when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_cusip")), 100)
    .when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_ticker")), 100)
    .when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_isin")), 100)
    .when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_sedol")), 100)
    .when(tempIdentifierDF("h_description").contains(tempIdentifierDF("t_valoren")), 100)
    .otherwise(0)
    .alias("identifier_in_description_score")
)
The Spark DataFrame API (Python version) also lets you write this kind of query:
df.selectExpr('time',
    'CASE WHEN (time > 1) THEN time * 1.1 ELSE time END AS updated_time')
I have a stored procedure into which I pass a number of variables, and I create a dynamic WHERE clause using the "AND (@var IS NULL OR table.field = @var)" pattern, and it works great... EXCEPT
One of the variables (@statusLevelID) can be either 1-9 or 10. When the value passed in is 10, I want to return ONLY those projects that have statusLevelID = 10. When the value passed in is between 1 and 9, I want to return all of the projects with a statusLevelID between that value (say, 5) and 9, i.e. less than 10.
I've got each part working perfectly independently but I'm lost on how to get them to work together.
AND ((@statusLevelID IS NULL OR project.statusLevelID >= @statusLevelID) AND (@statusLevelID IS NULL OR project.statusLevelID < 10))
AND (@statusLevelID IS NULL OR project.statusLevelID = 10)
Using an "OR" just gives me ALL of the projects.
AND ((@statusLevelID IS NULL OR project.statusLevelID >= @statusLevelID) AND (@statusLevelID IS NULL OR project.statusLevelID < 10)
OR (@statusLevelID IS NULL OR project.statusLevelID = 10))
I was thinking a CASE statement might work here but I'm not exactly sure how to implement that.
Any assistance is greatly appreciated. Thanks in advance.
You want one boolean expression connected by ORs:
AND ( (@statusLevelID IS NULL) OR
      (@statusLevelID = project.statusLevelID) OR
      (@statusLevelID <> 10 AND project.statusLevelID >= @statusLevelID AND project.statusLevelID < 10)
    )
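If you'd still like the CASE expression you were considering, the same logic can be written that way too; a sketch that should behave like the OR form above, given that @statusLevelID is either 1-9, 10, or NULL:
AND 1 = CASE
            WHEN @statusLevelID IS NULL THEN 1
            WHEN @statusLevelID = 10 AND project.statusLevelID = 10 THEN 1
            WHEN @statusLevelID < 10 AND project.statusLevelID >= @statusLevelID
                 AND project.statusLevelID < 10 THEN 1
            ELSE 0
        END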
You can write the condition like so:
AND (
    @statusLevelID IS NULL
    OR
    project.statusLevelID BETWEEN @statusLevelID AND IIF(@statusLevelID = 10, 10, 9)
)
I am working on a fix to a SQL query to include a WHERE clause with less-than-or-equal and greater-than-or-equal comparisons in a CASE statement, but whenever I try to run it I get the same error message saying the <= token was not valid. I have tried the symbols both as HTML entities and wrapped in CDATA brackets; both give the same error. The MyBatis query is below.
WHERE A.WDATE BETWEEN #{fdate} AND #{tdate}
AND A.FAC LIKE #{fac}
<if test = 'good != "%"'>
AND SUBSTR(A.ITDSC,21,4) NOT LIKE #{good}
</if>
<if test = 'size != ""'>
AND CASE WHEN #{size} = '16' THEN SUBSTR(A.ITDSC,13,2) <= '16' ELSE SUBSTR(A.ITDSC,13,2) >= '17' END
</if>
Escaping <= and >= with &lt;= and &gt;= looks OK and works for me.
But that part of the where-clause
AND
CASE
WHEN #{size} = '16' THEN SUBSTR(A.ITDSC,13,2) <= '16'
ELSE SUBSTR(A.ITDSC,13,2) >= '17'
END
looks suspicious.
In DB2 the CASE has to be part of an expression, e.g.
WHERE xyz >= CASE WHEN ... THEN ... END
or
WHERE
(CASE WHEN ... THEN ... ELSE ... END) >= xyz
See case-expressions for more info and examples.
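Applied to the predicate in the question, one valid form keeps the CASE on one side of a single comparison; this is a sketch, and it assumes the two-character substring is zero-padded so the string comparison orders like numbers:
AND SUBSTR(A.ITDSC,13,2)
    BETWEEN CASE WHEN #{size} = '16' THEN '00' ELSE '17' END
        AND CASE WHEN #{size} = '16' THEN '16' ELSE '99' END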
Try this, maybe it will work ;)
WHERE A.WDATE BETWEEN #{fdate} AND #{tdate}
AND A.FAC LIKE #{fac}
<if test = 'good != "%"'>
AND SUBSTR(A.ITDSC,21,4) NOT LIKE #{good}
</if>
<if test = 'size != ""'>
AND ((#{size} = '16' AND SUBSTR(A.ITDSC,13,2) &lt;= '16') OR SUBSTR(A.ITDSC,13,2) &gt;= '17')
</if>
I was able to solve the problem by using two <if> statements for size: one for size == '16' and the other for size == '17'.
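For reference, that two-<if> approach might look something like this (a sketch following the mapper fragment above, with the comparison operators kept XML-escaped):
<if test = 'size == "16"'>
  AND SUBSTR(A.ITDSC,13,2) &lt;= '16'
</if>
<if test = 'size == "17"'>
  AND SUBSTR(A.ITDSC,13,2) &gt;= '17'
</if>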
I'm trying to figure out how to make this formula work in SQL Server 2012, and it has me completely stumped. In the first CASE statement, where I am trying to set the DateUpdated column, the middle line is
WHEN standardunitcost > (averageunitcost + 2.000000) THEN GETDATE()
I need to add something extra in there to make sure that (AverageUnitCost + 2.000000) is greater than 22. When I try to set it up as ((AverageUnitCost + 2.000000) > 22.000000), it is incompatible.
Can someone explain to me why I cannot do it the way I am currently trying, and how to make this work properly? Also, I'm sorry if this is in the incorrect place, or has been asked before, but I'm not really sure what to search for to solve this!
UPDATE [mas_wgd].[dbo].[CI_Item]
SET dateupdated = CASE
WHEN StandardUnitCost < AverageUnitCost THEN GETDATE()
WHEN standardunitcost > (AverageUnitCost + 2.000000) THEN GETDATE()
WHEN StandardUnitCost < 22.000000 THEN GETDATE()
ELSE dateupdated
END
WHERE ProductLine IN ('A010', 'A020', 'A030', 'A040', 'A050', 'A060', 'A070', 'A080', 'A090', 'A100', 'A110', 'A120', 'A130', 'A130', 'A140', 'A150', 'A200', 'A250', 'A300', 'A350', 'A400', 'A450', 'A500', 'A550', 'A600', 'AGNC', 'C010', 'C020', 'C030', 'C040', 'C050', 'C060', 'C070', 'C080', 'C090', 'C100', 'C110', 'C120', 'C130', 'C130', 'C140', 'C150', 'C200', 'C250', 'C300', 'C350', 'C400', 'C450', 'C500', 'C550', 'C600', 'CGNC')
UPDATE [mas_wgd].[dbo].[CI_Item]
SET Standardunitcost = CASE
WHEN (AverageUnitCost between 0.010000 and 22.000000) THEN 22.00000
WHEN AverageUnitCost > 22.000000 THEN AverageUnitCost + 2.000000
ELSE StandardUnitCost
END
WHERE ProductLine IN ('A010', 'A020', 'A030', 'A040', 'A050', 'A060', 'A070', 'A080', 'A090', 'A100', 'A110', 'A120', 'A130', 'A130', 'A140', 'A150', 'A200', 'A250', 'A300', 'A350', 'A400', 'A450', 'A500', 'A550', 'A600', 'AGNC', 'C010', 'C020', 'C030', 'C040', 'C050', 'C060', 'C070', 'C080', 'C090', 'C100', 'C110', 'C120', 'C130', 'C130', 'C140', 'C150', 'C200', 'C250', 'C300', 'C350', 'C400', 'C450', 'C500', 'C550', 'C600', 'CGNC')
The exact syntax that is not working has not been posted, but is implied as being:
WHEN standardunitcost > ((Averageunitcost + 2.000000) > 22.000000) THEN GETDATE()
You cannot do compound comparison operations in SQL like you can in many languages. These need to be broken out into individual operations combined with the logical operator AND.
CASE
WHEN StandardUnitCost < AverageUnitCost THEN GETDATE()
WHEN (AverageUnitCost + 2.0) > 22
AND StandardUnitCost > (AverageUnitCost + 2.0) THEN GETDATE()
WHEN StandardUnitCost < 22.0 THEN GETDATE()
ELSE DateUpdated
END
But what about when [StandardUnitCost] = 22.0? Currently if StandardUnitCost is not "<" AverageUnitCost (the first condition), then this logic is only checking for > 22.000002 (or something like that) and < 22.0. It just seems like at least one of those comparisons needs to be either <= or >=.