COUNT(CASE WHEN ...) vs. CASE WHEN ... THEN COUNT gives different results; can someone explain why? - sql

When I use method 1, COUNT(CASE WHEN ...), I get the output that I want. When I use method 2, CASE WHEN ... THEN COUNT, I get a diagonal matrix of sorts, which is not what I am looking for.
My steps are :
i) Created a dummy table
INSERT INTO job (jobid, jobname, [priority])
VALUES ('something', '1', 1),
('something', '2', 2),
('something', '3', 3),
('something', '4', 4),
('something', '5', 5),
('something', '6', 1),
('something', '7', 1),
('something', '8', 3),
('something', '9', 3),
('something', '10', 2);
ii) method 1 : COUNT (CASE WHEN....)
SELECT
COUNT(CASE WHEN [Priority] = 1 THEN 1 ELSE NULL END ) as Priority1,
COUNT(CASE WHEN [Priority] = 2 THEN 1 ELSE NULL END )as Priority2,
COUNT(CASE WHEN [Priority] = 3 THEN 1 ELSE NULL END )as Priority3
FROM job
Result :
Priority1 Priority2 Priority3
3 2 3
iii) method 2 : CASE WHEN .... COUNT
SELECT
CASE WHEN [Priority] = 1 THEN COUNT(*) END as Priority1,
CASE WHEN [Priority] = 2 THEN COUNT(*) END as Priority2,
CASE WHEN [Priority] = 3 THEN COUNT(*) END as Priority3
FROM job
GROUP BY [Priority]
Result :
Priority1 Priority2 Priority3
3 NULL NULL
NULL 2 NULL
NULL NULL 3
NULL NULL NULL
NULL NULL NULL
Method 1 gives me the right result, but method 2's output surprised me. I was expecting the same result as method 1!

Method 1:
There is no GROUP BY, so the aggregate function COUNT is applied to the table as a whole. COUNT ignores NULL values, so each COUNT only counts the rows where its CASE condition matched (rows where [Priority] is anything else produce NULL and are skipped). That is why you get a single row.
Method 2:
Because of the GROUP BY, the aggregate function COUNT is evaluated once per group, not once for the whole table. So the result contains as many rows as there are distinct [Priority] values in the table. The result contains NULLs because the CASE expressions have no ELSE branch: when a group's [Priority] does not satisfy the condition, it is the CASE (not COUNT) that returns NULL.

You have a group by in the second method, so you are going to get one row per value in Priority.
So you sort of want:
SELECT CASE WHEN [Priority] = 1 THEN COUNT(*) END as Priority1,
CASE WHEN [Priority] = 2 THEN COUNT(*) END as Priority2,
CASE WHEN [Priority] = 3 THEN COUNT(*) END as Priority3
FROM job;
But this won't work, because [Priority] is neither aggregated nor part of a GROUP BY.
Hmmm, you are basically back to your first method, where the condition is in the argument to the aggregation function. Your expectation is wrong. Use the first method (although I personally prefer using SUM() to COUNT()).
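To see both behaviours side by side, here is a minimal sketch that replays the question's data in an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and the column types are my assumptions (the question uses SQL Server bracket syntax); the queries themselves are the ones discussed above, plus the SUM() variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question only shows the INSERT.
conn.execute("CREATE TABLE job (jobid TEXT, jobname TEXT, priority INTEGER)")
priorities = [1, 2, 3, 4, 5, 1, 1, 3, 3, 2]  # same order as in the question
conn.executemany(
    "INSERT INTO job VALUES ('something', ?, ?)",
    [(str(i + 1), p) for i, p in enumerate(priorities)],
)

# Method 1: no GROUP BY, so the whole table is one group and one row comes back.
method1 = conn.execute("""
    SELECT COUNT(CASE WHEN priority = 1 THEN 1 END),
           COUNT(CASE WHEN priority = 2 THEN 1 END),
           COUNT(CASE WHEN priority = 3 THEN 1 END)
    FROM job
""").fetchall()

# Method 2: GROUP BY returns one row per distinct priority; the CASE
# only lets COUNT(*) through in the column that matches that group.
method2 = conn.execute("""
    SELECT CASE WHEN priority = 1 THEN COUNT(*) END,
           CASE WHEN priority = 2 THEN COUNT(*) END,
           CASE WHEN priority = 3 THEN COUNT(*) END
    FROM job
    GROUP BY priority
    ORDER BY priority
""").fetchall()

# SUM() variant of method 1: add 1 for a match and 0 otherwise.
method1_sum = conn.execute("""
    SELECT SUM(CASE WHEN priority = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN priority = 2 THEN 1 ELSE 0 END),
           SUM(CASE WHEN priority = 3 THEN 1 ELSE 0 END)
    FROM job
""").fetchall()

print(method1)   # a single row of counts
print(method2)   # one row per priority value, mostly NULL (None)
```

The last two rows of method 2 come from priorities 4 and 5, which match none of the three CASE conditions.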

Related

how to avoid duplicates in hive query

I have two tables:
table1
the_date | my_id |
02/03/2021,123
02/03/2021, 1234
02/03/2021, 12345
table2
the_date | my_id |seq | txt
02/03/2021, 1234, 1 , 'OK'
02/03/2021, 12345, 1, 'OK'
02/03/2021, 12345, 2, 'HELLO HI THERE'
02/03/2021, 123456, 1, 'Ok'
Here is my code:
WITH AB AS (
SELECT A1.my_id
FROM DB1.table1 A1 , DB1.MSG_REC A2 WHERE
A1.my_id=A2.my_id
),
BC AS (
SELECT AB.the_date
COUNT ( DISTINCT (CASE WHEN (TXT like '%OK%') THEN AB.my_id ELSE NULL END )) AS
CASE1 ,
COUNT ( DISTINCT (CASE WHEN (TXT like '%HELLO HI THERE%') THEN AB.my_id ELSE NULL END )) AS
CASE2
FROM AB left JOIN DB1.my_id BC ON AB.my_id =BC.my_id
The issue with the above is that the value '12345' is counted twice, because it satisfies both of the case statements.
That causes duplicates when capturing the count metrics. Is there a way to evaluate the first case and then the second case, but exclude from the second any "my_id" records already matched by the first?
So for example, when it is time to run the above script and the first case executes, it will pick up the below records and the count would be 3
02/03/2021, 1234, 1 , 'OK'
02/03/2021, 12345, 1, 'OK'
02/03/2021, 123456, 1, 'Ok'
The second case should only be looping through the below records and the count would be only 1
02/03/2021, 12345, 2, 'HELLO HI THERE'
CASE1 would be 4 and CASE2 would be 2 if I don't create a condition to circumvent this issue. Any tips or suggestions?
Assign a single case to each ID before the DISTINCT aggregation, then do the distinct aggregation. That way the same ID is never counted in different cases. See the comments in the code:
select --do final distinct aggregation
count(distinct (case when assigned_case='CASE1' then my_id else null end ) ) as CASE1,
count(distinct (case when assigned_case='CASE2' then my_id else null end ) ) as CASE2
from
(
select my_id,
--assign single CASE to all rows with the same id based on some logic:
case when case1_flag = 1 then 'CASE1'
when case2_flag = 1 then 'CASE2'
else NULL
end as assigned_case
from
(--calculate all CASE flags for each ID
select AB.my_id,
max(CASE WHEN (TXT like '%OK%') THEN 1 ELSE NULL END) over (partition by AB.my_id) as case1_flag,
max(CASE WHEN (TXT like '%HELLO HI THERE%') THEN 1 ELSE NULL END) over (partition by AB.my_id) as case2_flag
from ...
) s
) s
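As a rough check of the idea, here is a sketch that runs against SQLite through Python's sqlite3 module rather than Hive, so treat it as an illustration only. The table name msgs is hypothetical and holds just the table2 rows from the question (the join to table1 is left out), the per-ID flags are computed with GROUP BY instead of a window function, and UPPER() is applied so that 'Ok' matches regardless of how the engine treats case in LIKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "msgs" is a hypothetical stand-in for table2 in the question.
conn.execute("CREATE TABLE msgs (the_date TEXT, my_id INTEGER, seq INTEGER, txt TEXT)")
conn.executemany(
    "INSERT INTO msgs VALUES ('02/03/2021', ?, ?, ?)",
    [
        (1234, 1, "OK"),
        (12345, 1, "OK"),
        (12345, 2, "HELLO HI THERE"),
        (123456, 1, "Ok"),
    ],
)

# Naive version: ID 12345 satisfies both conditions and lands in both counts.
naive = conn.execute("""
    SELECT COUNT(DISTINCT CASE WHEN UPPER(txt) LIKE '%OK%' THEN my_id END),
           COUNT(DISTINCT CASE WHEN UPPER(txt) LIKE '%HELLO HI THERE%' THEN my_id END)
    FROM msgs
""").fetchone()

# Deduplicated version: compute both flags once per ID, then let CASE1
# take precedence so an ID is counted under at most one case.
deduped = conn.execute("""
    SELECT COUNT(CASE WHEN case1_flag = 1 THEN 1 END),
           COUNT(CASE WHEN case1_flag IS NULL AND case2_flag = 1 THEN 1 END)
    FROM (
        SELECT my_id,
               MAX(CASE WHEN UPPER(txt) LIKE '%OK%' THEN 1 END) AS case1_flag,
               MAX(CASE WHEN UPPER(txt) LIKE '%HELLO HI THERE%' THEN 1 END) AS case2_flag
        FROM msgs
        GROUP BY my_id
    ) AS per_id
""").fetchone()

print(naive, deduped)
```

With this data the second count drops once 12345 is assigned to CASE1 only, which is the effect the precedence rule is meant to have.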

SQL query to get repeating column value that have other columns in a certain condition

Let's say we have a table with the schema below.
create table result
(
id int,
task_id int,
test_name string,
test_result string
);
And the dataset populated in this table looks like this.
insert into result
values (1, 1, 'test_a', 'pass'),
(2, 1, 'test_b', 'fail'),
(3, 1, 'test_c', 'pass'),
(4, 1, 'test_d', 'pass'),
(5, 2, 'test_a', 'pass'),
(6, 2, 'test_b', 'pass'),
(7, 2, 'test_c', 'pass'),
(8, 2, 'test_d', 'pass');
Basically, a single task has multiple test result entries. I want to retrieve the task_id for which test_b failed but all the other tests passed. So in this example it should return only task_id 1.
I've tried EXISTS and HAVING, but they don't seem to work in this case. I'm new to SQL. How can I implement it?
I would just use aggregation with a having clause:
select task_id
from result
group by task_id
having sum(case when test_name = 'test_b' and test_result = 'fail' then 1 else 0 end) = 1 and
sum(case when test_result = 'pass' then 1 else 0 end) = count(*) - 1;
The first condition validates that test_b failed. The second counts the number of passes, which should be one less than the number of rows for the task.
If your database supports except (or minus), you can use set-based operations:
select task_id
from result
where test_name = 'test_b' and test_result = 'fail'
except
select task_id
from result
where test_name <> 'test_b' and test_result = 'fail'
Maybe select the distinct task IDs that have a fail result:
select distinct [task_id], [test_result]
from [result]
where [test_result] = 'fail'
Note that this query will scan the entire table unless there is an index on test_result.
The following code first counts the test results per task, counts whether 'test_b' failed, and counts all passes. The outer select ensures that 'test_b' failed and all the others passed.
select task_id from (
select
task_id,
count(test_result) numberoftakers,
sum(case when test_result<>'pass' AND test_name='test_b' then 1 else 0 end) numberoffailb,
sum(case when test_result='pass' then 1 else 0 end) numberofallpasses
from result
group by task_id) a
where numberoftakers=numberoffailb+numberofallpasses and numberoffailb=1
Assuming that (task_id, test_name) is a unique key of your table, you can indeed use (not) exists, along with a correlated subquery which ensures that no other record with the same task_id failed.
select task_id
from result r
where
test_name = 'test_b'
and test_result = 'fail'
and not exists (
select 1
from result r1
where
r1.task_id = r.task_id
and r1.id != r.id
and r1.test_result = 'fail'
)
The left join anti-join pattern also comes to mind:
select r.task_id
from result r
left join result r1
on r1.task_id = r.task_id
and r1.id != r.id
and r1.test_result = 'fail'
where
r.test_name = 'test_b'
and r.test_result = 'fail'
and r1.id is null
Demo on DB Fiddle - Both queries return:
| task_id |
| :------ |
| 1 |
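For anyone who wants to try these without a fiddle, here is a small sketch that loads the question's data into an in-memory SQLite database via Python's sqlite3 module and runs three of the approaches above. SQLite and the TEXT column types are my assumptions; the queries are otherwise the ones posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE result (id INTEGER, task_id INTEGER, test_name TEXT, test_result TEXT)"
)
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?)",
    [
        (1, 1, "test_a", "pass"), (2, 1, "test_b", "fail"),
        (3, 1, "test_c", "pass"), (4, 1, "test_d", "pass"),
        (5, 2, "test_a", "pass"), (6, 2, "test_b", "pass"),
        (7, 2, "test_c", "pass"), (8, 2, "test_d", "pass"),
    ],
)

# Aggregation with HAVING: test_b failed and every other row passed.
having_ids = [row[0] for row in conn.execute("""
    SELECT task_id
    FROM result
    GROUP BY task_id
    HAVING SUM(CASE WHEN test_name = 'test_b' AND test_result = 'fail' THEN 1 ELSE 0 END) = 1
       AND SUM(CASE WHEN test_result = 'pass' THEN 1 ELSE 0 END) = COUNT(*) - 1
""")]

# Set-based: tasks where test_b failed, minus tasks where anything else failed.
except_ids = [row[0] for row in conn.execute("""
    SELECT task_id FROM result WHERE test_name = 'test_b' AND test_result = 'fail'
    EXCEPT
    SELECT task_id FROM result WHERE test_name <> 'test_b' AND test_result = 'fail'
""")]

# NOT EXISTS: test_b failed and no other row of the same task failed.
not_exists_ids = [row[0] for row in conn.execute("""
    SELECT r.task_id
    FROM result r
    WHERE r.test_name = 'test_b'
      AND r.test_result = 'fail'
      AND NOT EXISTS (
          SELECT 1 FROM result r1
          WHERE r1.task_id = r.task_id
            AND r1.id != r.id
            AND r1.test_result = 'fail'
      )
""")]

print(having_ids, except_ids, not_exists_ids)
```

All three agree on the sample data; they can differ on data with results other than 'pass' and 'fail', since only the HAVING version requires the other rows to be 'pass'.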

Return single value when checking table rows values

I'm trying to return a single value from a table with a lot of rows if a condition is met.
For example, I have a table (ID (pk), CODE (pk), DESCRIPTION) which has a lot of rows. How can I return in a single row if..
SELECT CASE
WHEN CODE IN ('1', '2') THEN '100'
WHEN CODE IN ('2', '3') THEN '200'
WHEN CODE IN ('5', '7') THEN '300'
END AS ASDASD
FROM TABLE
WHERE ID = 1;
The problem is that CODE must be checked against both values in a list, not just one of them. As written, the query already returns a value if, for example, that ID only has the code '2'.
ASDASD
NULL
'200'
And I want to return just '200' because that ID has got code '2' and '3'.
Assuming codes are not duplicated for a particular id:
SELECT ID,
(CASE WHEN SUM(CASE WHEN CODE IN ('1', '2') THEN 1 ELSE 0 END) = 2
THEN '100'
WHEN SUM(CASE WHEN CODE IN ('2', '3') THEN 1 ELSE 0 END) = 2
THEN '200'
WHEN SUM(CASE WHEN CODE IN ('5', '7') THEN 1 ELSE 0 END) = 2
THEN '300'
END) AS ASDASD
FROM TABLE
WHERE ID = 1
GROUP BY ID;
I added ID to the SELECT, just because this might be useful for multiple ids.
You could try conditional aggregation, as follows:
SELECT CASE
WHEN MAX(DECODE(code, '1', 1)) = 1 AND MAX(DECODE(code, '2', 1)) = 1
THEN '100'
WHEN MAX(DECODE(code, '2', 1)) = 1 AND MAX(DECODE(code, '3', 1)) = 1
THEN '200'
WHEN MAX(DECODE(code, '5', 1)) = 1 AND MAX(DECODE(code, '7', 1)) = 1
THEN '300'
END AS asdasd
FROM TABLE
WHERE ID = 1;
DECODE() is a handy Oracle function that compares an expression (code) to a series of values and returns results accordingly. Basically, condition MAX(DECODE(code, '1', 1)) = 1 ensures that at least one row has code = '1'.
PS: are you really storing numbers as strings? If code has a numeric datatype, please remove the single quotes in the above query.
You need to check the number returned by a query like this:
SELECT COUNT(DISTINCT CODE) FROM TABLE WHERE ID = 1 AND CODE IN ('1', '2')
If this number is 2 then ID = 1 has both CODE values '1' and '2'.
SELECT
CASE
WHEN (SELECT COUNT(DISTINCT CODE) FROM TABLE WHERE ID = 1 AND CODE IN ('1', '2')) = 2 THEN '100'
WHEN (SELECT COUNT(DISTINCT CODE) FROM TABLE WHERE ID = 1 AND CODE IN ('2', '3')) = 2 THEN '200'
WHEN (SELECT COUNT(DISTINCT CODE) FROM TABLE WHERE ID = 1 AND CODE IN ('5', '7')) = 2 THEN '300'
END AS ASDASD
FROM TABLE
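Here is a quick sketch of the SUM-based conditional aggregation on some made-up rows, run against SQLite through Python's sqlite3 module. The table name codes and the sample data are hypothetical (TABLE is a reserved word, and the question shows no data); like the answer it is based on, it assumes codes are not duplicated for an ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "codes" and its rows are hypothetical sample data for illustration.
conn.execute("CREATE TABLE codes (id INTEGER, code TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO codes (id, code) VALUES (?, ?)",
    [
        (1, "2"), (1, "3"),            # has both '2' and '3'
        (2, "2"),                      # has only one code of any pair
        (3, "1"), (3, "5"), (3, "7"),  # has both '5' and '7'
    ],
)

# One row per ID; a label is returned only when BOTH codes of a pair exist.
labels = conn.execute("""
    SELECT id,
           CASE WHEN SUM(CASE WHEN code IN ('1', '2') THEN 1 ELSE 0 END) = 2 THEN '100'
                WHEN SUM(CASE WHEN code IN ('2', '3') THEN 1 ELSE 0 END) = 2 THEN '200'
                WHEN SUM(CASE WHEN code IN ('5', '7') THEN 1 ELSE 0 END) = 2 THEN '300'
           END AS asdasd
    FROM codes
    GROUP BY id
    ORDER BY id
""").fetchall()

print(labels)
```

Adding WHERE id = 1 before the GROUP BY narrows this to the single row asked for in the question.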

SQL query to get count based on filtered status

I have a table which has two columns, CustomerId & Status (A, B, C).
A customer can have multiple status in different rows.
I need to get the count of different status based on following rules:
If the status of a customer is A & B, he should be counted in Status A.
If status is both B & C, it should be counted in Status B.
If status is all three, it will fall in status A.
What I need is a table with status and count.
Could someone please help?
I know that someone will ask me to write my query first, but I couldn't work out how to implement this logic in a query.
You could play with different variations of this:
select customerId,
case when HasA+HasB+HasC = 3 then 'A'
when HasA+HasB = 2 then 'A'
when HasB+HasC = 2 then 'B'
when HasA+HasC = 2 then 'A'
when HasA is null and HasB is null and HasC is not null then 'C'
when HasB is null and HasC is null and HasA is not null then 'A'
when HasC is null and HasA is null and HasB is not null then 'B'
end as overallStatus
from
(
select customerId,
max(case when Status = 'A' then 1 end) HasA,
max(case when Status = 'B' then 1 end) HasB,
max(case when Status = 'C' then 1 end) HasC
from tableName
group by customerId
) as t;
I like to use Cross Apply for this type of query as it allows for use of the calculated status in the Group By clause.
Here's my solution with some sample data.
Declare @Table Table (Customerid int, Stat varchar(1))
INSERT INTO @Table (Customerid, Stat )
VALUES
(1, 'a'),
(1 , 'b'),
(2, 'b'),
(2 , 'c'),
(3, 'a'),
(3 , 'b'),
(3, 'c')
SELECT
ca.StatusGroup
, COUNT(DISTINCT Customerid) as Total
FROM
@Table t
CROSS APPLY
(VALUES
(
CASE WHEN
EXISTS
(SELECT 1 FROM @Table x where x.Customerid = t.CustomerID and x.Stat = 'a')
AND EXISTS
(SELECT 1 FROM @Table x where x.Customerid = t.CustomerID and x.Stat = 'b')
THEN 'A'
WHEN
EXISTS
(SELECT 1 FROM @Table x where x.Customerid = t.CustomerID and x.Stat = 'b')
AND EXISTS
(SELECT 1 FROM @Table x where x.Customerid = t.CustomerID and x.Stat = 'c')
THEN 'B'
ELSE t.stat
END
)
) ca (StatusGroup)
GROUP BY ca.StatusGroup
I edited this to deal with customers who have only one status, in which case it returns A, B or C depending on the customer's status.
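If the rules boil down to "A beats B, and B beats C" (my reading of the question, which also sends an A & C customer to A, as the first answer does), the per-customer flags can be collapsed into one CASE and then counted. A minimal sketch in SQLite via Python's sqlite3 module, using the sample rows from the answer above; the table name customer_status is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; rows are the sample data from the answer above.
conn.execute("CREATE TABLE customer_status (customer_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO customer_status VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "b"), (2, "c"), (3, "a"), (3, "b"), (3, "c")],
)

# Inner query: one row per customer with the highest-priority status
# (assumed order A, then B, then C). Outer query: customers per status.
status_counts = conn.execute("""
    SELECT overall_status, COUNT(*) AS total
    FROM (
        SELECT customer_id,
               CASE WHEN MAX(CASE WHEN UPPER(status) = 'A' THEN 1 END) = 1 THEN 'A'
                    WHEN MAX(CASE WHEN UPPER(status) = 'B' THEN 1 END) = 1 THEN 'B'
                    ELSE 'C'
               END AS overall_status
        FROM customer_status
        GROUP BY customer_id
    ) AS per_customer
    GROUP BY overall_status
    ORDER BY overall_status
""").fetchall()

print(status_counts)
```

Customers 1 and 3 fall under A and customer 2 under B; a status with no customers simply does not appear in the output.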

oracle adding, then avg with null

I have sql like:
select avg(decode(type, 'A', value, null) + decode(type, 'B', value, null)) from table;
The problem with this is that some of these types can be null, so the addition will result in null, because adding anything to null gives null. You might think I could change the decode default from null to 0, but that makes avg() count those rows as part of its averaging, which I don't want.
Ideally the addition would just ignore the nulls and just not try to add them to the rest of the values.
So let's say my numbers are:
5 + 6 + 5
3 + 2 + 1
4 + null + 2
They total 28 and I'd want to divide by 8 (ignore the null), but if I change the null to 0 in the decode, the avg will then divide by 9 which isn't what I want.
As written, your code should always return null, since if the first decode returns value, then the second decode must always return null. I'm going to assume that you made an error in genericizing your code and that what you really meant was this:
avg(decode(type1, 'A', value1, null) + decode(type2, 'B', value2, null))
(Or, instead of type1, it could be a.type. The point is that the fields in the two decodes are meant to be separate fields)
In this case, I think the easiest thing to do is check for nulls first:
avg(case when type1 is null and type2 is null then null
else case type1 when 'A' then value1 else 0 end
+ case type2 when 'B' then value2 else 0 end
end)
(I replaced decode with case because I find it easier to read, but, in this case decode would work just as well.)
Doing a sum here is overcomplicated. Just output the values with a CASE, and you are done.
SELECT AVG(
CASE WHEN type = 'A' OR type = 'B'
THEN value
ELSE null
END
)
FROM table
A simple workaround would be to calculate the average yourself:
select
-- The sum of all values with type 'A' or 'B'
sum(decode(type, 'A', value, 'B', value, 0)) /
-- ... divided by the "count" of all values with type 'A' or 'B'
sum(decode(type, 'A', 1, 'B', 1, 0))
from table;
But given the way AVG() works, it would probably be sufficient to remove the addition and put everything in a single DECODE():
select avg(decode(type, 'A', value, 'B', value, null)) from table
The logic here is a bit complicated:
select avg((case when type = 'A' then value else 0 end) + (case when type = 'B' then value else 0 end))
from table
where type in ('A', 'B')
The where clause guarantees that you have at least one "A" or "B". The problem is arising when you have no examples of "A" or "B".
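The difference between returning null and returning 0 from the conditional can be checked with a few rows. This is a sketch in SQLite via Python's sqlite3 module, not Oracle, so CASE stands in for DECODE; the table name readings and its rows are hypothetical, chosen so that the eight qualifying values total 28 as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data: eight 'A'/'B' values summing to 28,
# plus one row whose type is NULL and must not affect the average.
conn.execute("CREATE TABLE readings (type TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("A", 5), ("B", 6), ("A", 5), ("B", 3), ("A", 2),
     ("B", 1), ("A", 4), (None, 7), ("B", 2)],
)

# NULL for non-matching rows: AVG skips them, so the divisor is 8.
avg_ignoring_nulls = conn.execute(
    "SELECT AVG(CASE WHEN type IN ('A', 'B') THEN value END) FROM readings"
).fetchone()[0]

# 0 for non-matching rows: AVG counts them, so the divisor becomes 9.
avg_with_zero = conn.execute(
    "SELECT AVG(CASE WHEN type IN ('A', 'B') THEN value ELSE 0 END) FROM readings"
).fetchone()[0]

# Manual average: sum of matching values over the count of matching rows.
# The 1.0 factor forces floating-point division in SQLite.
manual_avg = conn.execute("""
    SELECT 1.0 * SUM(CASE WHEN type IN ('A', 'B') THEN value ELSE 0 END)
               / SUM(CASE WHEN type IN ('A', 'B') THEN 1 ELSE 0 END)
    FROM readings
""").fetchone()[0]

print(avg_ignoring_nulls, avg_with_zero, manual_avg)
```

The first and third results agree (28 / 8), while the ELSE 0 version is pulled down to 28 / 9 by the extra row.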