SQL - "Case when" vs condition - sql

I have a table Table_A with columns A, B and C whereby column C needs to be summed, but only if column B is a certain value. Otherwise column C may not always contain a value to be summed.
So the normal SQL of:
SELECT A, B, SUM(C)
FROM Table_A
WHERE B = 'value condition'
GROUP BY A,B
Works well. However, I thought I could use "CASE WHEN" to catch the conditions with zero, say, like this:
SELECT
A, B,
CASE WHEN B = 'value condition' THEN SUM(C) ELSE 0 END
FROM Table_A
GROUP BY A, B
I get an error, referring to the fact that it is still trying to SUM a value which is not in the condition. Am I missing something? Or have I misinterpreted CASE WHEN?

You want conditional aggregation. The CASE expression should appear inside SUM:
SELECT A, SUM(CASE WHEN B = 'value condition' THEN C ELSE 0 END) AS total
FROM Table_A
GROUP BY A;
Note that B probably does not belong in the GROUP BY clause, given that you want to conditionally aggregate using its values.
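To see the difference from the WHERE version, here is a small sketch with made-up sample rows (the data is hypothetical; only the table and column names come from the question). Groups that have no matching rows still appear, with a total of 0:

```sql
-- Hypothetical sample data; table and column names follow the question.
CREATE TABLE Table_A (A VARCHAR(10), B VARCHAR(20), C INT);

INSERT INTO Table_A (A, B, C) VALUES ('x', 'value condition', 5);
INSERT INTO Table_A (A, B, C) VALUES ('x', 'value condition', 7);
INSERT INTO Table_A (A, B, C) VALUES ('x', 'other', 100);
INSERT INTO Table_A (A, B, C) VALUES ('y', 'other', 3);

-- The CASE runs per row, before aggregation: non-matching rows contribute 0.
SELECT A,
       SUM(CASE WHEN B = 'value condition' THEN C ELSE 0 END) AS total
FROM Table_A
GROUP BY A
ORDER BY A;
-- x | 12
-- y | 0
```

With the WHERE B = 'value condition' version, group y would not be returned at all.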

Related

Not a group expression: troubleshooting an Oracle query

Oracle query
I have a column that contains the hardcoded value 'N/A' as well as other character values. I need to write a SELECT query that returns the MIN of this column, grouped by a set of other columns. The challenge is that I need to replace the hardcoded value 'N/A' with another string, 'Abc', while applying the MIN function.
Option 1: nvl won't work as the value is hardcoded
Option 2: decode in the select statement along with min clause in the decode list, and group by clause with the other columns used in the select list
However, getting an error
ORA-00979 : not a group expression.
Example :
Select a, b, decode(z,'N/A','abc',min(z))
From table1, table2
Where table1.p = table2.q
Group by a, b
Having table1.c >= table2.d
You should be using DECODE inside the MIN function, not the other way around. But, I would probably just use a single CASE expression here:
SELECT
a,
b,
MIN(CASE WHEN z = 'N/A' THEN 'abc' ELSE z END) AS min_value
FROM table1 t1
INNER JOIN table2 t2
ON t1.p = t2.q
GROUP BY
a,
b;
The CASE expression above takes the minimum value of z for each group. The only difference from a plain MIN(z) is that any 'N/A' value is treated as 'abc' before the minimum is computed.
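A minimal sketch of that behaviour, assuming a single hypothetical table1 that already holds a, b and z (the join is left out to keep the example short, and the sample rows are invented):

```sql
-- Hypothetical sample data; only the column names come from the question.
CREATE TABLE table1 (a INT, b INT, z VARCHAR(10));

INSERT INTO table1 (a, b, z) VALUES (1, 1, 'N/A');
INSERT INTO table1 (a, b, z) VALUES (1, 1, 'bcd');
INSERT INTO table1 (a, b, z) VALUES (2, 2, 'aaa');
INSERT INTO table1 (a, b, z) VALUES (2, 2, 'bcd');

-- 'N/A' is replaced row by row, and MIN then sees only the replaced values.
SELECT a,
       b,
       MIN(CASE WHEN z = 'N/A' THEN 'abc' ELSE z END) AS min_value
FROM table1
GROUP BY a, b
ORDER BY a, b;
-- 1 | 1 | abc
-- 2 | 2 | aaa
```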

Filter if values provided otherwise return everything

Say I have a table t with 2 columns:
a int
b int
I can do a query such as:
select b
from t
where b > a
and a in(1,2,3)
order by b
where 1,2,3 is provided from the outside.
Obviously, the query can return no rows. In that case, I'd like to select everything as if the query did not have the and a in(1,2,3) part. That is, I'd like:
if exists (
select b
from t
where b > a
and a in(1,2,3)
)
select b
from t
where b > a
and a in(1,2,3)
order by b
else
select b
from t
where b > a
order by b
Is there a way to do this:
Without running two queries (one for exists, the other one the actual query)
That is less verbose than repeating queries (real queries are quite long, so DRY and all that stuff)
Using NOT EXISTS with a Sub Query to Determine if condition exists
SELECT b
FROM
t
WHERE
b > a
AND (
NOT EXISTS (SELECT 1 FROM t WHERE b > a AND a IN (1,2,3))
OR a IN (1,2,3)
)
ORDER BY
b
This works because, when at least one row satisfies the condition, NOT EXISTS is false and only the rows matching a IN (1,2,3) pass the OR. When no row satisfies the condition, NOT EXISTS is true for every row, so ALL rows with b > a are returned.
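Here is a small sketch of the same pattern against made-up data, run once with a list that matches and once with a list that does not. The sample rows and the second list (7, 8) are assumptions for illustration, and the existence check is written against t with the b > a filter included so that it mirrors the question's EXISTS test:

```sql
-- Hypothetical sample data for the two-column table t from the question.
CREATE TABLE t (a INT, b INT);

INSERT INTO t (a, b) VALUES (1, 5);
INSERT INTO t (a, b) VALUES (2, 1);
INSERT INTO t (a, b) VALUES (4, 9);
INSERT INTO t (a, b) VALUES (5, 8);

-- Case 1: the list matches a qualifying row, so the filter applies.
SELECT b
FROM t
WHERE b > a
  AND (NOT EXISTS (SELECT 1 FROM t WHERE b > a AND a IN (1, 2, 3))
       OR a IN (1, 2, 3))
ORDER BY b;
-- 5

-- Case 2: nothing matches the list, so every row with b > a comes back.
SELECT b
FROM t
WHERE b > a
  AND (NOT EXISTS (SELECT 1 FROM t WHERE b > a AND a IN (7, 8))
       OR a IN (7, 8))
ORDER BY b;
-- 5
-- 8
-- 9
```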
Or with a common table expression and a window function with conditional aggregation:
WITH cte AS (
SELECT
b
,CASE WHEN a IN (1,2,3) THEN 1 ELSE 0 END as MeetsCondition
,COUNT(CASE WHEN a IN (1,2,3) THEN a END) OVER () as ConditionCount
FROM
t
WHERE
b > a
)
SELECT
b
FROM
cte
WHERE
(ConditionCount > 0 AND MeetsCondition = 1)
OR (ConditionCount = 0)
ORDER BY
b
I find it a bit "ugly". It might be better to materialize the output of your query in a temp table and then, based on the row count of the temp table, run either the first or the second query (this reduces the number of accesses to the original table from 3 to 2, and you can add a flag marking the rows that satisfy your condition so that you need not repeat it). Other than that, read below.
Bear in mind, though, that an EXISTS query should execute quickly: it stops as soon as it finds any row that satisfies the condition.
You could achieve this by using UNION ALL to combine the result set of the constrained query with that of the full query (without the constraint on column a), and then deciding what to show, depending on the output of the first query, with a CASE expression.
How the CASE expression works: when any row from the constrained part of the query is found, return the result set of the constrained query; otherwise return everything, omitting the constraint.
If your database supports using CTE use this solution:
with tmp_data as (
select *
from (
select 'constraint' as type, b
from t
where b > a
and a in (1,2,3) -- here goes your constraint
union all
select 'full query' as type, b
from t
where b > a
) foo
)
SELECT b
FROM tmp_data
WHERE
CASE WHEN (select count(*) from tmp_data where type = 'constraint') > 0
THEN type = 'constraint'
ELSE type = 'full query'
END
;
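A CASE that returns a boolean, as in the WHERE clause above, is accepted by PostgreSQL and MySQL but, as far as I know, not by SQL Server. A sketch of a more portable variant replaces the CASE with plain EXISTS / NOT EXISTS predicates. The column is renamed to src here to avoid relying on type as an identifier, and the sample rows are invented:

```sql
-- Hypothetical sample data for the two-column table t from the question.
CREATE TABLE t (a INT, b INT);

INSERT INTO t (a, b) VALUES (1, 5);
INSERT INTO t (a, b) VALUES (2, 1);
INSERT INTO t (a, b) VALUES (4, 9);
INSERT INTO t (a, b) VALUES (5, 8);

WITH tmp_data AS (
    SELECT 'constraint' AS src, b
    FROM t
    WHERE b > a
      AND a IN (1, 2, 3)          -- here goes your constraint
    UNION ALL
    SELECT 'full query' AS src, b
    FROM t
    WHERE b > a
)
SELECT b
FROM tmp_data
-- Keep constrained rows if any exist; otherwise keep the unconstrained rows.
WHERE (src = 'constraint'
       AND EXISTS (SELECT 1 FROM tmp_data WHERE src = 'constraint'))
   OR (src = 'full query'
       AND NOT EXISTS (SELECT 1 FROM tmp_data WHERE src = 'constraint'))
ORDER BY b;
-- 5
```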

Sum of null columns in SQL

I have a table where A, B and C allow null values.
SELECT
A, B, C,
A + B + C AS 'SUM'
FROM Table
If my A, B and C values are 10, NULL and NULL, my SUM column shows nothing in it.
Is there any way to fix this, and display SUM as 10? OTHER THAN converting NULLs to zeros?
You could use COALESCE, which returns the column's value, or an alternative value if the column is NULL.
So:
COALESCE(A,0) + COALESCE(B,0) + COALESCE(C,0) AS 'SUM'
In this case you need to use IsNull(Column, 0) to ensure it is always 0 at minimum.
SELECT
A, B, C,
IsNull(A,0) + IsNull(B,0) + IsNull(C,0) AS 'SUM'
FROM Table
ISNULL() specifies what to return when a value is NULL: if the column is NULL, the 0 you supplied is returned instead.
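The reason the plain addition shows nothing is that any arithmetic involving NULL yields NULL. A short sketch with the question's values, using a hypothetical table named nums (Table itself is a reserved word) and the standard COALESCE rather than the SQL Server-specific ISNULL:

```sql
-- Hypothetical table; the first row uses the values from the question.
CREATE TABLE nums (A INT, B INT, C INT);

INSERT INTO nums (A, B, C) VALUES (10, NULL, NULL);
INSERT INTO nums (A, B, C) VALUES (1, 2, 3);

SELECT A, B, C,
       A + B + C AS raw_sum,   -- NULL as soon as any operand is NULL
       COALESCE(A, 0) + COALESCE(B, 0) + COALESCE(C, 0) AS total
FROM nums
ORDER BY A;
-- 1  | 2    | 3    | 6    | 6
-- 10 | NULL | NULL | NULL | 10
```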

How can I group by two rows in SQL?

In the result of an SQL Select command I have two rows:
A | B
B | A
A|B and B|A mean the same to me. I want only one of them to be selected by the SQL command.
How can I do that?
I have a SELECT command that joins the table to itself (natural join), like this:
SELECT a.coloumn ,b.coloumn
FROM table a,table b
where .... (not important)
and b.coloumn IN (
SELECT coloumn
FROM table
where ... (the same like above)
)
and b.coloumn != a.coloumn ;
And after that I have multiple columns.
You neither told us your column names nor your table name, but assuming you have two columns A and B in a table named the_table then the following will do:
select distinct least(a,b), greatest(a,b)
from the_table;
If you want to group by them using standard SQL:
select (case when a < b then a else b end) as a,
(case when a < b then b else a end) as b,
count(*) as cnt
from table t
group by (case when a < b then a else b end),
(case when a < b then b else a end);
Oracle supports the greatest() and least() functions, but not all databases do.
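As a quick check of the CASE-based version, here is a sketch on invented rows in the_table. The output columns are named lo and hi here (my choice, not from the answers) so they are not confused with the source columns:

```sql
-- Hypothetical sample data: A|B and B|A should collapse into one pair.
CREATE TABLE the_table (a VARCHAR(10), b VARCHAR(10));

INSERT INTO the_table (a, b) VALUES ('A', 'B');
INSERT INTO the_table (a, b) VALUES ('B', 'A');
INSERT INTO the_table (a, b) VALUES ('A', 'C');

-- Put the smaller value first in every row, then group on the ordered pair.
SELECT (CASE WHEN a < b THEN a ELSE b END) AS lo,
       (CASE WHEN a < b THEN b ELSE a END) AS hi,
       COUNT(*) AS cnt
FROM the_table
GROUP BY (CASE WHEN a < b THEN a ELSE b END),
         (CASE WHEN a < b THEN b ELSE a END)
ORDER BY lo, hi;
-- A | B | 2
-- A | C | 1
```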
Another possible solution is:
select a, b from the_table
union
select b, a from the_table
This would work fine even if there are NULL values.

Using new columns in the "where" clause

Hive rejects this code:
select a, b, a+b as c
from t
where c > 0
saying Invalid table alias or column reference 'c'.
Do I really need to write something like this?
select * from
(select a, b, a+b as c
from t)
where c > 0
EDIT:
The computation of c is complex enough that I do not want to repeat it in where a + b > 0.
I need a solution that works in Hive.
Use a Common Table Expression if you want to use derived columns.
with x as
(
select a, b, a+b as c
from t
)
select * from x where c >0
You can run this query like this or with a Common Table Expression
select a, b, a+b as c
from t
where a+b > 0
Refer to the order of logical query processing below to see whether a derived column can be used in another clause.
Keyed-In Order
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
Logical Query Processing Phases
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You are close, you can do this
select a, b, a+b as c
from t
where a+b > 0
It would have to look like this:
select a, b, a+b as c
from t
where a+b > 0
An easy way to explain/remember this: a query cannot reference, in its WHERE clause, an alias assigned in its own SELECT list, because WHERE is evaluated before SELECT. It would, however, work if you did this:
SELECT a,b,c
FROM(
select a, b, a+b as c
from t) as [calc]
WHERE c > 0
This syntax would work because the alias is assigned in a subquery.
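Note that the bracketed alias [calc] is SQL Server style. For Hive, which the question asks about, a plain alias on the subquery should do. A sketch under that assumption, with invented sample rows and Hive-style setup statements (INSERT ... VALUES needs a reasonably recent Hive version):

```sql
-- Hypothetical setup; only the query shape comes from the question.
CREATE TABLE t (a INT, b INT);
INSERT INTO TABLE t VALUES (1, 2), (-5, 1);

-- c is computed once in the derived table, so the outer WHERE can see it.
SELECT calc.a, calc.b, calc.c
FROM (
    SELECT a, b, a + b AS c
    FROM t
) calc
WHERE calc.c > 0;
-- 1 | 2 | 3
```

This keeps the expression for c in one place, which matters when it is too complex to repeat in the WHERE clause.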
no
just:
select a, b, a+b as c
from t
where a+b > 0
Note: in MySQL at least, ORDER BY and GROUP BY accept column (or expression) positions.
E.g. group by 2, order by 1 would get you one row per value of column 2 (whether a field name or an expression), ordered by column 1 (field or expression).
Also, some RDBMSs do let you refer to the column alias as you first attempted.