I am trying to compute two sums in one query. However, I want one of the sums to include only rows that meet a certain condition, without affecting the other sum.
What I had in mind was something like select x where condition = 1 from AC, which is not valid SQL.
Here is the sample query. I want the second selection [sum(t.match)] to include only rows where match = 1 in the subquery, while the first still returns the total of all qqty.
select
sum(t.qqty), sum(t.qqty)
from
(select
car, cqty, qqty,
case when cqty = qqty then 1 else 0 end as match,
location, state) t
Use conditional aggregation, that is, a case expression as the argument to sum():
select sum(t.qqty), sum(case when condition = 1 then t.qqty else 0 end)
from t;
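A minimal sketch of the same idea, run through Python's sqlite3 module so it can be tried end to end. The table contents are made up for illustration, and the condition is assumed to be cqty = qqty, as in the question's match column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the column names come from the question.
conn.execute("CREATE TABLE t (car TEXT, cqty INTEGER, qqty INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", 5, 5), ("b", 3, 4), ("c", 2, 2)],
)

# First SUM covers every row; second SUM only adds qqty where cqty = qqty.
total_qqty, matched_qqty = conn.execute(
    """
    SELECT SUM(qqty),
           SUM(CASE WHEN cqty = qqty THEN qqty ELSE 0 END)
    FROM t
    """
).fetchone()

print(total_qqty, matched_qqty)
```

The two aggregates are independent: the CASE only changes what the second SUM sees, so the first still totals every row.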
Related
I have the following table. Using sqlite DB
Item, Result
A, Pass
B, Pass
A, Fail
B, Fail
I want to realize the above table as below using some query.
Item, Total, Accept, Reject
A, 2, 1(50%), 1(50%)
B, 2, 1(50%), 1(50%)
How should I construct this query?
You can try PIVOT() if your DBMS supports it. Then use CONCAT or the || operator, depending on the DBMS.
Query:
SELECT
item,
total,
SUM(Pass)||'('|| CAST((SUM(Pass)*1.0/total*1.0)*100.0 AS DECIMAL)||'%)' AS Accept,
SUM(Fail)||'('|| CAST((SUM(Fail)*1.0/total*1.0)*100.0 AS DECIMAL)||'%)' AS Reject
FROM
(
SELECT
Item,
result,
COUNT(result) OVER(PARTITION BY item ORDER BY result ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total,
CASE
WHEN Result = 'Pass' then 1
ELSE 0
END AS Pass,
CASE
WHEN Result = 'Fail' then 1
ELSE 0
END AS Fail
FROM t
) AS j
GROUP BY item, total
Query explanation:
Since SQLite does not support PIVOT, we create the Pass and Fail flags manually using CASE expressions.
To calculate total, COUNT is used as an analytic (window) function here. It is basically a shortcut that computes the count per item and places it on every row.
Then, in the outer query, we calculate the percentages and use || as the concatenation operator to join each sum with its percentage of the total.
See demo in db<>fiddle
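For comparison, here is a sketch of a shorter variant that skips the window function and relies on a plain GROUP BY with conditional aggregation. This is not the query from the answer above, just an alternative under the assumption that only per-item totals are needed; it is run through Python's sqlite3 module with the sample rows from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Item TEXT, Result TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", "Pass"), ("B", "Pass"), ("A", "Fail"), ("B", "Fail")],
)

# COUNT(*) per group is the total; each CASE-based SUM counts one outcome.
rows = conn.execute(
    """
    SELECT Item,
           COUNT(*) AS Total,
           SUM(CASE WHEN Result = 'Pass' THEN 1 ELSE 0 END) || '(' ||
             CAST(ROUND(100.0 * SUM(CASE WHEN Result = 'Pass' THEN 1 ELSE 0 END) / COUNT(*)) AS INTEGER)
             || '%)' AS Accept,
           SUM(CASE WHEN Result = 'Fail' THEN 1 ELSE 0 END) || '(' ||
             CAST(ROUND(100.0 * SUM(CASE WHEN Result = 'Fail' THEN 1 ELSE 0 END) / COUNT(*)) AS INTEGER)
             || '%)' AS Reject
    FROM t
    GROUP BY Item
    ORDER BY Item
    """
).fetchall()

for row in rows:
    print(row)
```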
Simplified example:
In hive, I have a table t with two columns:
Name, Value
Bob, 2
Betty, 4
Robb, 3
I want to do a case when that uses the total of the Value column:
Select
Name
, CASE
When value>0.5*sum(value) over () THEN '0'
When value>0.9*sum(value) over () THEN '1'
ELSE '2'
END as var
From table
I don’t like the fact that sum(value) over () is computed twice. Is there a way to compute this only once? As an added twist, I want to do this in one query, so without declaring user variables.
I was thinking of scalar queries:
With total as
(Select sum(value) from table)
Select
Name
, CASE
When value>0.5*(select * from total) THEN '0'
When value>0.9*(select * from total) THEN '1'
ELSE '2'
END as var
From table;
But this doesn’t work.
TLDR: Is there a way to simplify the first query without user variables ?
Don't worry about that. Let the optimizer worry about it. But, you can use a subquery or CTE if you don't want to repeat the expression:
select Name,
(case when value > 0.5 * total then '0'
when value > 0.9 * total then '1'
else '2'
end) as var
From (select t.*, sum(value) over () as total
from table t
) t;
Cross join a subquery that fetches the sum to the table:
Select
t.Name
, CASE
When t.value>0.9*tt.value THEN '1'
When t.value>0.5*tt.value THEN '0'
ELSE '2'
END as var
From table t cross join (select sum(value) value from table) tt
and change the order of the WHEN clauses in the CASE expression because as they are, the 2nd case will never succeed.
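To see why the order of the WHEN clauses matters, here is a small sketch that runs both orderings against SQLite (not Hive) through Python's sqlite3 module. The values are hypothetical, chosen so that one row exceeds 90% of the total; CASE returns the first branch that matches, so with the original order that row never reaches the 0.9 test:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, value INTEGER)")
# Hypothetical values: the total is 100 and Robb holds 97 of it.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Bob", 1), ("Betty", 2), ("Robb", 97)],
)

QUERY = """
    SELECT t.name, CASE {branches} ELSE '2' END AS var
    FROM t CROSS JOIN (SELECT SUM(value) AS total FROM t) tt
    ORDER BY t.name
"""
# Original order from the question: the 0.5 test shadows the 0.9 test.
original = ("WHEN t.value > 0.5 * tt.total THEN '0' "
            "WHEN t.value > 0.9 * tt.total THEN '1'")
# Reordered: the stricter test is checked first.
reordered = ("WHEN t.value > 0.9 * tt.total THEN '1' "
             "WHEN t.value > 0.5 * tt.total THEN '0'")

original_rows = dict(conn.execute(QUERY.format(branches=original)).fetchall())
reordered_rows = dict(conn.execute(QUERY.format(branches=reordered)).fetchall())

print(original_rows)
print(reordered_rows)
```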
Since I/O is the major factor that slows down Hive queries, we should strive to reduce the number of stages to get better performance.
So it's better not to use a sub-query or CTE here.
Try this SQL with a global window clause:
select
name,
case
when value > 0.5*sum(value) over w then '0'
when value > 0.9*sum(value) over w then '1'
else '2'
end as var
from my_table
window w as (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
In this case, a window clause is the recommended way to reduce code repetition.
Both the windowing and the sum aggregation will be computed only once. You can run explain select..., confirming that only ONE meaningful MR stage will be launched.
Edit:
1. A simple select clause on a subquery is not something to worry about. It can be pushed down to the last phase of the subquery, which avoids an additional MR stage.
2. Two identical aggregations residing in the same query block will only be evaluated once. So don’t worry about potential repeated calculation.
I have a database people that looks like this:
I wanted to count the occurrences of state='CA'.
My first attempt was:
SELECT COUNT(state='CA')
FROM people
;
this returned 1 row with a value of 1000. So I thought that there were 1000 people from CA in the database.
This turns out to be incorrect. I know that there are 127, which I can verify with the query
SELECT COUNT(*)
FROM people
WHERE state='CA'
;
which returns 1 row with a value of 127.
I understand how the second query works. However, I do not understand what is wrong with the first one. What is it returning?
If you want to see what's going on, run the query:
select state='CA' from people;
You will see that you will get one result for each row in people, with the value 0 or 1 (or True/False). What you've selected is whether state='CA' for each row, and there will be just as many of those results as there are rows.
You can't constrain a COUNT statement within the statement, you have to do that via the WHERE clause as in your second example.
COUNT is not SUM. Your first query is wrong because it does not return the number of rows where the condition is true; it returns the number of rows where the expression is not null, whether it is true or false.
If you want a filtered count, you must use a WHERE condition (as in your second query); otherwise you must use an IF or a CASE expression inside the sum() function, e.g.:
Select sum(case
when state='CA' then 1 else 0
end) as my_result from People;
Or, if you want to use count, return null rather than 0 from the CASE, since count skips nulls:
Select count(case
when state='CA' then 1 else null
end) as my_result from People;
Try this:
Select count(case when state='CA' then 1 else null end) as xyz from People;
The first query will work if you put a CASE WHEN inside the aggregate.
For example, the query below returns the count of CA rows:
SELECT sum( case when state='CA' then 1 else 0 end)
FROM people
In the first query, state='CA' in the SELECT list is evaluated as a comparison for each of the 1000 rows; it does not filter them. That is what SELECT does: it does not reduce the number of rows returned, it only computes values from them.
In the second query, the WHERE clause filters the rows first, and then the SELECT clause runs the COUNT function on what is left.
A query is evaluated in a logical order: it starts with FROM, then WHERE, GROUP BY and HAVING, then SELECT, and finally ORDER BY.
To answer the actual question - why do you get 1000? I'm guessing that there are 1000 rows in your database, or at least 1000 where state is not null. Count will return the number of rows where the thing inside the () is not null and as one of your comments says, the part inside your () will return either true or false, neither of which is null, so will count them all. Your second example is of course the right way to do it.
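The difference is easy to reproduce on a tiny table. The sketch below uses SQLite through Python's sqlite3 module with made-up rows, one of which has a NULL state; in SQLite a comparison yields 1, 0, or NULL, so the numbers apply to SQLite and other databases may treat boolean expressions differently:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, state TEXT)")
# Hypothetical rows: two in CA, one in NY, one with no state recorded.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ann", "CA"), ("Bo", "NY"), ("Cy", "CA"), ("Di", None)],
)

count_expr, sum_expr, count_case = conn.execute(
    """
    SELECT COUNT(state = 'CA'),                       -- counts non-NULL results (1 or 0)
           SUM(state = 'CA'),                         -- adds up the 1s only
           COUNT(CASE WHEN state = 'CA' THEN 1 END)   -- NULL for non-matches, so skipped
    FROM people
    """
).fetchone()

(count_where,) = conn.execute(
    "SELECT COUNT(*) FROM people WHERE state = 'CA'"
).fetchone()

print(count_expr, sum_expr, count_case, count_where)
```

COUNT(state = 'CA') comes out as the number of rows with a non-NULL state, which is why the original query reported every row.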
I have the below sql and would like to add a filter clause within the window function. Is this possible?
select ROUND(SUM(M.CHRG_RATE/M.CONTRACTUAL_RATE) OVER
(PARTITION BY M.PROGRAM),0) AS BILLED_MEMBERS_PER_MONTH2
from tableA
where 1=1
I was thinking I could wrap a case statement?
Something like:
You can use case to leave rows out of the sum. For example, this only sums rows where the flag column equals 'Q':
SUM(CASE WHEN M.FLAG = 'Q' THEN M.CHRG_RATE/M.CONTRACTUAL_RATE END)
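Here is a sketch of how that CASE fits inside the windowed SUM, run on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The FLAG column comes from the example above; the table layout and the rows are assumptions made for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout for tableA; only the column names used in the query matter.
conn.execute(
    "CREATE TABLE tableA (PROGRAM TEXT, FLAG TEXT, CHRG_RATE REAL, CONTRACTUAL_RATE REAL)"
)
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?, ?)",
    [
        ("P1", "Q", 100.0, 50.0),   # counted: 2.0
        ("P1", "X", 300.0, 100.0),  # skipped by the CASE
        ("P1", "Q", 90.0, 30.0),    # counted: 3.0
        ("P2", "X", 10.0, 5.0),     # skipped, so P2 has nothing to sum
    ],
)

# The CASE yields NULL for rows that fail the test, and SUM ignores NULLs.
rows = conn.execute(
    """
    SELECT DISTINCT M.PROGRAM,
           ROUND(SUM(CASE WHEN M.FLAG = 'Q'
                          THEN M.CHRG_RATE / M.CONTRACTUAL_RATE END)
                 OVER (PARTITION BY M.PROGRAM), 0) AS BILLED_MEMBERS_PER_MONTH2
    FROM tableA M
    ORDER BY M.PROGRAM
    """
).fetchall()

print(rows)
```

A partition with no matching rows gets NULL rather than 0; adding ELSE 0 to the CASE would change that if a zero is preferred.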
I have a query which runs fine and does two types of work, COUNT and SUM.
Something like
select
id,
Count (contracts) as countcontracts,
count(something1),
count(something1),
count(something1),
sum(cost) as sumCost
from
table
group by
id
My problem is: if there is no contract for a given ID, it will return 0 for COUNT and NULL for SUM. I want to see NULL instead of 0.
I was thinking about case when Count (contracts) = 0 then null else Count (contracts) end, but I don't want to do it this way because I have more than 12 count positions in the query and it is processing a large number of records, so I think it may slow down query performance.
Are there any other ways to replace 0 with NULL?
Try this:
select NULLIF ( Count(something) , 0)
Here are three methods:
1. (case when count(contracts) > 0 then count(contracts) end) as countcontracts
2. sum(case when contracts is not null then 1 end) as countcontracts
3. nullif(count(contracts), 0)
All three of these require writing more complicated expressions. However, this really isn't that difficult. Just copy the line multiple times, and change the name of the variable on each one. Or, take the current query, put it into a spreadsheet and use spreadsheet functions to make the transformation. Then copy the function down. (Spreadsheets are really good code generators for repeated lines of code.)
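A quick way to check the NULLIF variant is to run it on a toy table. The sketch below uses SQLite through Python's sqlite3 module; the table name deals and its rows are hypothetical, with id 2 standing in for an ID that has no contracts:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: id 2 has a row but no contract and no cost.
conn.execute("CREATE TABLE deals (id INTEGER, contracts TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO deals VALUES (?, ?, ?)",
    [(1, "c1", 10.0), (1, "c2", 5.0), (2, None, None)],
)

# NULLIF(x, 0) returns NULL when x equals 0 and x otherwise.
rows = conn.execute(
    """
    SELECT id,
           NULLIF(COUNT(contracts), 0) AS countcontracts,
           SUM(cost) AS sumCost
    FROM deals
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(rows)
```

Python shows SQL NULL as None, so the second row has None for both the count and the sum.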