SQL Syntax for CASE command with multiple WHEN value - sql

TL;DR
Is it possible to use "IN" syntax after "WHEN" if the condition is at the CASE level?
My scenario:
I am writing a SQL CASE statement that checks one value against multiple WHEN values.
The CASE condition is complex (and long), so I don't want to repeat it at the WHEN level.
This works :
CASE
WHEN ( SELECT VALUE FROM Tab1 INNER JOIN Tab2 ON Tab1 ....very long statement) IN ('A','B','C') THEN 1
WHEN ( same very long statement as above) IN ('D','E','F') THEN 2
WHEN ( same very long statement as above) IN ... etc
END
I would like to make it more readable, like this, but the syntax below fails:
CASE ( SELECT VALUE FROM Tab1 INNER JOIN Tab2 ON Tab1 ....very long statement)
WHEN IN ('A','B','C') THEN 1 -- fails syntax error
WHEN 'D' OR 'E' OR 'F' THEN 2 -- also fails syntax error
END
Of course, I am trying to avoid listing every value with the same outcome in a separate WHEN.
The syntax below works, but the list of values gets very long:
CASE ( SELECT VALUE FROM Tab1 INNER JOIN Tab2 ON Tab1 ....very long statement)
WHEN 'A' THEN 1
WHEN 'B' THEN 1
WHEN 'C' THEN 1
WHEN 'D' THEN 2
WHEN 'E' THEN 2
WHEN 'F' THEN 2
....
END
What can SQL do for me there ?

Formulate the long query as you did, and CROSS JOIN the main query with it:
SELECT
base_query.other
, base_query.columns
, base_query.otherquery
, CASE
WHEN xcross.result IN ('A','B','C') THEN 1
WHEN xcross.result IN ('D','E','F') THEN 2
WHEN xcross.result IN ('G','H','I') THEN 3
ELSE NULL
END
FROM other_table ot
JOIN yet_other_table yot ON ot.join_col = yot.join_col
CROSS JOIN (
SELECT val AS result FROM Tab1 INNER JOIN Tab2 ON Tab1 ....very long statement
) AS xcross
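A minimal runnable sketch of the CROSS JOIN idea, using Python's built-in sqlite3 so it can be tried anywhere. The tables orders and settings and their columns are hypothetical stand-ins for the main query and the "very long statement"; only the shape of the query follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, item TEXT);    -- hypothetical main table
    CREATE TABLE settings (name TEXT, val TEXT);    -- hypothetical source of the single value
    INSERT INTO orders VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO settings VALUES ('grade', 'E');
""")

# The long expression is written once, in the derived table x,
# and each WHEN can then use IN against x.result.
rows = conn.execute("""
    SELECT o.id,
           CASE
               WHEN x.result IN ('A', 'B', 'C') THEN 1
               WHEN x.result IN ('D', 'E', 'F') THEN 2
               ELSE NULL
           END AS bucket
    FROM orders AS o
    CROSS JOIN (
        SELECT val AS result FROM settings WHERE name = 'grade'
    ) AS x
    ORDER BY o.id
""").fetchall()

print(rows)
```

Note that this only behaves like a scalar subquery when the derived table returns exactly one row: with zero rows the CROSS JOIN returns nothing, and with several rows every main row is repeated.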

Related

IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands

I am trying to write a query in azure databricks and I am getting the following error
"IN/EXISTS predicate sub-queries can only be used in Filter/Join and a few commands"
This is the code I am using.
SELECT id,
(CASE WHEN id in (SELECT id from aTable) THEN 1 ELSE 0 END) as a,
(CASE WHEN id in (SELECT id from bTable) THEN 1 ELSE 0 END) as b,
(CASE WHEN id in (SELECT id from cTable) THEN 1 ELSE 0 END) as c
FROM table
I read that sql doesn't let you do this because the case statements are evaluated row by row, and it wants to prevent you from doing a SELECT statement for each row evaluation. If that is the case, is there an alternative or workaround to accomplish this? Thanks
Databricks does not support subqueries using IN or EXISTS in CASE statements. As an alternative, consider outer joining each one to the master table.
The query could follow the structure below:
select .....
case when a.id is not null then a
when b.id is not null then b
end as id
from Table_t t LEFT JOIN (select id from aTable ) a ON t.id=a.id LEFT JOIN(
select id from bTable) b ON t.id=b.id
..................
I tried to reproduce a similar scenario and got the same error.
Regardless of whether it is contained in a CASE WHEN, the IN operator with a subquery only works in filters, not in projections. If you explicitly supply values in the IN clause instead of using a subquery, it works fine.
To work around this, I left joined the tables and then checked for NULL in the CASE statement.
This query might work:
%sql
SELECT t.Id,
(CASE WHEN at.Id is not null THEN 1 ELSE 0 END) as a,
(CASE WHEN bt.Id is not null THEN 1 ELSE 0 END) as b,
(CASE WHEN ct.Id is not null THEN 1 ELSE 0 END) as c
FROM table t
LEFT JOIN aTable at ON t.Id = at.Id
LEFT JOIN bTable bt ON t.Id = bt.Id
LEFT JOIN cTable ct ON t.Id = ct.Id
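A small sketch of what the LEFT JOIN form produces, run through Python's sqlite3. SQLite itself accepts IN subqueries in the select list, so this does not reproduce the Databricks error; it only shows the output of the join-based rewrite. The table names main_t, a_t and b_t and the rows in them are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_t (id INTEGER);   -- hypothetical master table
    CREATE TABLE a_t (id INTEGER);      -- hypothetical lookup tables
    CREATE TABLE b_t (id INTEGER);
    INSERT INTO main_t VALUES (1), (2), (3);
    INSERT INTO a_t VALUES (1), (3);
    INSERT INTO b_t VALUES (2);
""")

# A non-NULL id on the right side of the LEFT JOIN means "found in that table".
flags = conn.execute("""
    SELECT t.id,
           CASE WHEN a.id IS NOT NULL THEN 1 ELSE 0 END AS a,
           CASE WHEN b.id IS NOT NULL THEN 1 ELSE 0 END AS b
    FROM main_t AS t
    LEFT JOIN a_t AS a ON t.id = a.id
    LEFT JOIN b_t AS b ON t.id = b.id
    ORDER BY t.id
""").fetchall()

print(flags)
```

If a lookup table can contain the same id more than once, the join repeats the master row, so it may be safer to join against SELECT DISTINCT id from each lookup table.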

Scalar subquery produced more than one element, using UNNEST

I have the following statement. From what I have read, UNNEST should be used, but I don't know how:
select
(
select
case
when lin = 4 then 1
when lin != 4 then null
end as FLAG,
from table_1
WHERE
table_1.date = table_2.date
AND
table_1.acc = table_2.acc
) as FLAG
FROM
table_2
I have a lot of subqueries and that's why I can't use LEFT JOIN.
Currently my table_2 has 13 million records, and table_1 has 400 million records, what I want is to be able to show a FLAG for each account, knowing that the data universes are different.
I can't use LEFT JOIN ...
Scalar subquery produced more than one element ...
Simply use the version of your query below, which is logically equivalent to your original query but eliminates the issue you have:
select *,
(
select 1 as FLAG
from table_1
WHERE
table_1.date = table_2.date
AND
table_1.acc = table_2.acc
AND lin = 4
LIMIT 1
) as FLAG
FROM
table_2
This indicates that you expect to see one and only one value from the subquery but when you join to table_1 you are getting duplicates on the date and acc fields. If you aggregate the value or remove the duplicates from table_1 that should solve your issue, although at that point why not just use a more efficient JOIN?
-- This will solve your immediate problem, use any aggregation
-- technique, I picked MAX because why not. I do not know your use case.
select
(
select
MAX(case
when lin = 4 then 1
when lin != 4 then null
end) as FLAG
from table_1
WHERE
table_1.date = table_2.date
AND
table_1.acc = table_2.acc
) as FLAG
FROM
table_2
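To see what the aggregated subquery returns per account, here is a sketch in Python's sqlite3. SQLite does not raise the "more than one element" error (it silently takes the first row), so this only illustrates the result of the MAX version, not the error itself. The sample rows are invented; the table and column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (date TEXT, acc INTEGER, lin INTEGER);
    CREATE TABLE table_2 (date TEXT, acc INTEGER);
    INSERT INTO table_2 VALUES ('2024-01-01', 10), ('2024-01-01', 20), ('2024-01-01', 30);
    -- acc 10 has two matching rows: the kind of duplicate that breaks a scalar subquery
    INSERT INTO table_1 VALUES ('2024-01-01', 10, 4), ('2024-01-01', 10, 7),
                               ('2024-01-01', 20, 5);
""")

# MAX collapses all matching rows to a single value, so the subquery is always scalar.
rows = conn.execute("""
    SELECT t2.acc,
           (SELECT MAX(CASE WHEN t1.lin = 4 THEN 1 END)
            FROM table_1 AS t1
            WHERE t1.date = t2.date AND t1.acc = t2.acc) AS flag
    FROM table_2 AS t2
    ORDER BY t2.acc
""").fetchall()

print(rows)
```

Account 10 gets the flag because one of its two rows has lin = 4; accounts 20 and 30 get NULL (no lin = 4 row, and no matching row at all, respectively).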
A better way to do this would be
select
case
when t1.lin = 4 then 1
when t1.lin != 4 then null
end as FLAG
FROM
table_2 t2
LEFT JOIN
table_1 t1 ON (
t1.date = t2.date
AND t1.acc = t2.acc
)
As you said, left joins do not work for you. If you want multiple results to be nested in the same subquery then just wrap your subquery in an ARRAY() function to allow for repeated values.
select
ARRAY(
select
case
when lin = 4 then 1
when lin != 4 then null
end as FLAG
from table_1
WHERE
table_1.date = table_2.date
AND
table_1.acc = table_2.acc
-- Now you must filter out results that will return
-- NULL because NULL is not allowed in an ARRAY
AND
table_1.lin = 4
) as FLAG
FROM
table_2

SQL LEFT JOIN with conditional CASE statements

Hopefully this is a quickie
SELECT *
FROM T
left JOIN J ON
CASE
WHEN condition1 THEN 1 --prefer this option even if CASE2 has a value
WHEN condition2 THEN 2
ELSE 0
END = 1 (edit: but if 1 does not satisfy, then join on 2)
Both cases return results, but I want THEN 1 to supersede THEN 2 and be the lookup priority
Can I have SQL do something like join on max(CASE)?
Basically I am trying to duplicate a nested INDEX/MATCH from Excel
edit: what I am hearing is that the CASE should stop at the first WHEN that returns TRUE, but it doesn't behave like that when I test:
SELECT *
FROM T
left JOIN J ON
CASE
WHEN condition1 THEN 1 --prefer this option even if CASE2 has a value
WHEN condition2 THEN 1
ELSE 0
END = 1
It seems to prefer the 2nd THEN 1 sometimes, but not always... Is there anything I am missing that would cause this behavior?
It doesn't matter which of the conditions causes the rows to match in a join. There are legitimate reasons to use a case expression in a join, but I think you just want to OR your conditions together and then use the case expression to output a ranked reason for the match.
SELECT *, CASE WHEN <condition1> THEN 1 WHEN <condition2> THEN 2 END as match_code
FROM T LEFT OUTER JOIN J ON <condition1> or <condition2>
I don't know what to picture regarding the "nested INDEX/MATCH" from Excel. If I'm on the wrong track above then perhaps you're looking for a nested case expression?
Now if your conditions will have matches across different rows and you only want to keep one then...
WITH matches AS (
SELECT *, CASE WHEN <condition1> THEN 1 WHEN <condition2> THEN 2 END AS match_code
FROM T LEFT OUTER JOIN J ON <condition1> OR <condition2>
), ranked as (
SELECT *, MIN(match_code) OVER (PARTITION BY ???) AS keeper
FROM matches
)
SELECT ...
FROM ranked
WHERE match_code = keeper
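A runnable sketch of the ranked-match pattern above, using Python's sqlite3 (window functions need SQLite 3.25 or newer). The key columns k1 and k2, the concrete conditions J.k1 = T.k1 and J.k2 = T.k2, and the choice to partition by T.id are all assumptions made for the example; the answer leaves them open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (id INTEGER, k1 TEXT, k2 TEXT);
    CREATE TABLE J (jid INTEGER, k1 TEXT, k2 TEXT);
    INSERT INTO T VALUES (1, 'x', 'y'), (2, 'q', 'z'), (3, 'n', 'n');
    INSERT INTO J VALUES (100, 'x', 'other'),   -- matches T 1 via condition1
                         (101, 'nope', 'y'),    -- matches T 1 via condition2
                         (102, 'nope2', 'z');   -- matches T 2 via condition2
""")

rows = conn.execute("""
    WITH matches AS (
        SELECT T.id AS t_id, J.jid AS j_id,
               CASE WHEN J.k1 = T.k1 THEN 1
                    WHEN J.k2 = T.k2 THEN 2 END AS match_code
        FROM T LEFT OUTER JOIN J ON J.k1 = T.k1 OR J.k2 = T.k2
    ), ranked AS (
        SELECT *, MIN(match_code) OVER (PARTITION BY t_id) AS keeper
        FROM matches
    )
    SELECT t_id, j_id, match_code
    FROM ranked
    -- keeper IS NULL keeps T rows with no match at all; a bare
    -- "match_code = keeper" would drop them because NULL = NULL is not true
    WHERE match_code = keeper OR keeper IS NULL
    ORDER BY t_id
""").fetchall()

print(rows)
```

Row 1 of T matches two J rows, and only the condition1 match (code 1) survives. Row 3 has no match and is kept only because of the extra keeper IS NULL test.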
Well, you can always have several conditions in your CASE Statements:
SELECT *
FROM T
left JOIN J ON
CASE
WHEN condition1 THEN 1 --prefer this option even if CASE2 has a value
WHEN condition2 AND NOT condition1 THEN 2
ELSE 0
END = 1
--UPDATED--
If both of your conditions are required to match, but condition1 is optional then you can try this statement too:
SELECT *
FROM T
left JOIN J ON
CASE
WHEN condition1 And condition2 THEN 1 --Both conditions match
WHEN condition2 THEN 2 -- condition1 has no match
ELSE 0
END = 1
You can use a WITH clause to do this in two steps:
With first_join as
(SELECT *
FROM T
left JOIN J ON condition1)
select * from first_join
join J On case when Name_of_2nd_condition is null
then condition2
ELSE null end
You can use the CROSS APPLY /OUTER APPLY operator : https://www.mssqltips.com/sqlservertip/1958/sql-server-cross-apply-and-outer-apply/
SELECT *
FROM T
OUTER APPLY (SELECT TOP 1 *
FROM J
WHERE condition1 OR condition2
ORDER BY order) J

Determine which values are not in sql table

I have this query in oracle:
select * from table where col2 in (1,2,3,4);
lets say I got this result
col1 | col2
-----------
a 1
b 2
My 'in (1,2,3,4)' part has 20 or more options. How can I determine which values were not found in my table? In my example, 3 and 4 don't exist in the table.
You can't, at least not in the way you want.
You need to insert the values you want to find into a table and then select all the values which don't exist in the desired table.
Lets say the data you want to find is in A and you want to know which doesn't exist in B.
SELECT *
FROM table_a A
WHERE NOT EXISTS (SELECT *
FROM table_b B
WHERE B.col1 = A.col1);
IN lists are stupid, or at least not very useful. Use a SQL Type collection to store your values instead because we can turn them into tables.
In this example I'm using the obscure SYS.KU$_OBJNUMSET type, which is the only nested table of Number I know of on 10g. (There's lots more in 11g).
So
select t.column_value
from table ( SYS.KU$_OBJNUMSET (1,2,3,4) ) t
left join your_table
on col2 = t.column_value
where col2 is null;
Here would be a way to do it if you're just using integers for your specific example:
SELECT *
FROM (
Select Rownum r
From dual
Connect By Rownum IN (1,2,3,4)
) T
LEFT JOIN YourTable T2 ON T.r = T2.Col2
WHERE T2.Col2 IS NULL
And the Fiddle.
This creates a table out of your where criteria 1,2,3,4 and uses that to LEFT JOIN on.
--EDIT
Because values aren't ints, here is another "ugly" option:
SELECT *
FROM (
Select 'a' r From dual UNION
Select 'b' r From dual UNION
Select 'c' r From dual UNION
Select 'd' r From dual
) T
LEFT JOIN YourTable T2 ON T.r = T2.Col2
WHERE T2.Col2 IS NULL
http://www.sqlfiddle.com/#!4/5e769/2
Good luck.
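The common idea in these answers (turn the IN list into rows, then LEFT JOIN and keep the rows with no match) can be sketched with Python's sqlite3. This uses a CTE over VALUES, which is SQLite syntax and not the Oracle syntax from the answers; the table name your_table and its rows mirror the question's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (col1 TEXT, col2 INTEGER);
    INSERT INTO your_table VALUES ('a', 1), ('b', 2);
""")

# wanted holds the former IN list as rows; values without a join partner are missing.
missing = [row[0] for row in conn.execute("""
    WITH wanted(v) AS (VALUES (1), (2), (3), (4))
    SELECT w.v
    FROM wanted AS w
    LEFT JOIN your_table AS t ON t.col2 = w.v
    WHERE t.col2 IS NULL
    ORDER BY w.v
""")]

print(missing)
```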

grouping records in one temp table

I have a table where one column has duplicate records but other columns are distinct. so something like this
Code SubCode version status
1234 D1 1 A
1234 D1 0 P
1234 DA 1 A
1234 DB 1 P
5678 BB 1 A
5678 BB 0 P
5678 BP 1 A
5678 BJ 1 A
0987 HH 1 A
So in the above table, SubCode and Version are unique values whereas Code is repeated. I want to transfer records from the above table into a temporary table. The only records I would like to transfer are those where ALL the subcodes for a code have a status of 'A', and I want them in the temp table only once.
So from the example above, the temporary table should only have
5678 and 0987, since all the subcodes relative to 5678 have a status of 'A' and all subcodes for 0987 (it only has one) have a status of 'A'. 1234 is omitted because its subcode 'DB' has a status of 'P'.
I'd appreciate any help!
Here's my solution
SELECT Code
FROM
(
SELECT
Code,
COUNT(SubCode) as SubCodeCount,
SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
as SubCodeCountWithA
FROM
(
SELECT
Code,
SubCode,
SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
as ACount
FROM CodeTable
GROUP BY Code, SubCode
) sub
GROUP BY Code
) sub2
WHERE SubCodeCountWithA = SubCodeCount
Let's break it down from the inside out.
SELECT
Code,
SubCode,
SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
as ACount
FROM CodeTable
GROUP BY Code, SubCode
Group up the codes and subcodes (each row is a distinct pairing of Code and SubCode). See how many A's occurred in each pairing.
SELECT
Code,
COUNT(SubCode) as SubCodeCount,
SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
as SubCodeCountWithA
FROM
--previous
GROUP BY Code
Regroup those pairings by Code (now each row is a Code) and count how many subcodes there are, and how many subcodes had an A.
SELECT Code
FROM
--previous
WHERE SubCodeCountWithA = SubCodeCount
Emit those codes which have the same number of subcodes as subcodes with A's.
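As a check, the nested query can be run against the question's sample rows with Python's sqlite3. The table name CodeTable comes from the answer; storing Code as text (to keep the leading zero of 0987) is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CodeTable (Code TEXT, SubCode TEXT, Version INTEGER, Status TEXT)")
conn.executemany(
    "INSERT INTO CodeTable VALUES (?, ?, ?, ?)",
    [
        ("1234", "D1", 1, "A"), ("1234", "D1", 0, "P"), ("1234", "DA", 1, "A"),
        ("1234", "DB", 1, "P"), ("5678", "BB", 1, "A"), ("5678", "BB", 0, "P"),
        ("5678", "BP", 1, "A"), ("5678", "BJ", 1, "A"), ("0987", "HH", 1, "A"),
    ],
)

# Inner query: A's per (Code, SubCode). Middle: subcodes per Code vs subcodes with an A.
codes = sorted(row[0] for row in conn.execute("""
    SELECT Code
    FROM (
        SELECT Code,
               COUNT(SubCode) AS SubCodeCount,
               SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END) AS SubCodeCountWithA
        FROM (
            SELECT Code, SubCode,
                   SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END) AS ACount
            FROM CodeTable
            GROUP BY Code, SubCode
        ) sub
        GROUP BY Code
    ) sub2
    WHERE SubCodeCountWithA = SubCodeCount
"""))

print(codes)
```

Code 1234 drops out because DB has no 'A' row, so only two of its three subcodes count.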
It's a little unclear whether or not the version column comes into play. For example, do you only want to consider rows with the largest version, or should it count if ANY row for a subcode has an "A"? Take 5678, BB for example, where version 1 has an "A" and version 0 has a "P". Is 5678 included because at least one row of subcode BB has an "A", or is it because version 1 has an "A"?
The following code assumes that you want all codes where every subcode has at least one "A" regardless of the version.
SELECT
T1.code,
T1.subcode,
T1.version,
T1.status
FROM
MyTable T1
WHERE
(
SELECT COUNT(DISTINCT subcode)
FROM MyTable T2
WHERE T2.code = T1.code
) =
(
SELECT COUNT(DISTINCT subcode)
FROM MyTable T3
WHERE T3.code = T1.code AND T3.status = 'A'
)
Performance may be abysmal if your table is large. I'll try to come up with a query that is likely to have better performance since this was off the top of my head.
Also, if you explain the full extent of your problem maybe we can find a way to get rid of that temp table... ;)
Here are two more possible methods. Still a lot of subqueries, but they look like they will perform better than the method above. They are both very similar, although the second one here had a better query plan in my DB. Of course, with limited data and no indexing that's not a great test. You should try all of the methods out and see which is best for your database.
SELECT
T1.code,
T1.subcode,
T1.version,
T1.status
FROM
MyTable T1
WHERE
EXISTS
(
SELECT *
FROM MyTable T2
WHERE T2.code = T1.code
AND T2.status = 'A'
) AND
NOT EXISTS
(
SELECT *
FROM MyTable T3
LEFT OUTER JOIN MyTable T4 ON
T4.code = T3.code AND
T4.subcode = T3.subcode AND
T4.status = 'A'
WHERE T3.code = T1.code
AND T3.status <> 'A'
AND T4.code IS NULL
)
SELECT
T1.code,
T1.subcode,
T1.version,
T1.status
FROM
MyTable T1
WHERE
EXISTS
(
SELECT *
FROM MyTable T2
WHERE T2.code = T1.code
AND T2.status = 'A'
) AND
NOT EXISTS
(
SELECT *
FROM MyTable T3
WHERE T3.code = T1.code
AND T3.status <> 'A'
AND NOT EXISTS
(
SELECT *
FROM MyTable T4
WHERE T4.code = T3.code
AND T4.subcode = T3.subcode
AND T4.status = 'A'
)
)
In your select, add a where clause that reads:
Select [stuff]
From Table T
Where Exists
(Select * From Table
Where Code = T.Code
And Status = 'A')
And Not Exists
(Select * From Table I
Where Code = T.Code
And Not Exists
(Select * From Table
Where Code = I.Code
And SubCode = I.SubCode
And Status = 'A'))
In English,
Show me the rows,
where there is at least one row with status 'A',
and there are NO rows with any specific subcode,
that do not have at least one row with that code/subcode, with status 'A'
INSERT theTempTable (Code)
SELECT t.Code
FROM theTable t
LEFT OUTER JOIN theTable subT ON (t.Code = subT.Code AND subT.status <> 'A')
WHERE subT.Code IS NULL
GROUP BY t.Code
This should do the trick. The logic is a little tricky, but I'll do my best to explain how it is derived.
The outer join combined with the IS NULL check allows you to search for the absence of a criteria. Combine that with the inverse of what you're normally looking for (in this case status = 'A') and the query succeeds when there are no rows that do not match. This is the same as ((there are no rows) OR (all rows match)). Since we know that there are rows due to the other query on the table, all rows must match.
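For comparison, here is the anti-join run on the question's sample rows with Python's sqlite3 (as a plain SELECT, without the INSERT into the temp table). The table name CodeTable and the text type for Code are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CodeTable (Code TEXT, SubCode TEXT, Version INTEGER, Status TEXT)")
conn.executemany(
    "INSERT INTO CodeTable VALUES (?, ?, ?, ?)",
    [
        ("1234", "D1", 1, "A"), ("1234", "D1", 0, "P"), ("1234", "DA", 1, "A"),
        ("1234", "DB", 1, "P"), ("5678", "BB", 1, "A"), ("5678", "BB", 0, "P"),
        ("5678", "BP", 1, "A"), ("5678", "BJ", 1, "A"), ("0987", "HH", 1, "A"),
    ],
)

# A code survives only if the join finds no row at all with a status other than 'A'.
strict_codes = sorted(row[0] for row in conn.execute("""
    SELECT t.Code
    FROM CodeTable AS t
    LEFT OUTER JOIN CodeTable AS subT
           ON t.Code = subT.Code AND subT.Status <> 'A'
    WHERE subT.Code IS NULL
    GROUP BY t.Code
"""))

print(strict_codes)
```

On this data the query returns only 0987, because 5678 still has a 'P' row (BB, version 0). So this version treats every row as significant, regardless of version; whether that is what's wanted depends on how the version column should count, which the question leaves open.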