Replace set of values with single value and sum - sql

I have a table as follows:
id group1 group2 is_bar amount
1 a bar1 true 100
2 a bar2 true 200
3 a baz false 150
4 a baz false 250
5 b bar1 true 50
Every time is_bar is true, I'd like to replace the value in group2 and sum over amount resulting in:
group1 group2 amount
a bar 300
a baz 400
b bar 50
I currently do this using a subquery and then grouping by every other column in the table! But this seems a bit noob to me:
SELECT group1, group2, sum(amount) FROM
(
SELECT group1,
CASE WHEN is_bar THEN 'bar' ELSE group2 END as group2,
amount
FROM foo
) new_foo
GROUP BY group1, group2
ORDER BY group1, group2;
Is there a smart-person solution to this?

I believe this should work:
SELECT
group1
, CASE WHEN is_bar THEN 'bar' ELSE group2 END as group2
, SUM(AMOUNT)
FROM foo
GROUP BY group1, CASE WHEN is_bar THEN 'bar' ELSE group2 END
As @Patrick mentioned in the comments, you can replace the long expressions in the GROUP BY with GROUP BY 1, 2.
This will automatically refer to columns 1 and 2 (first and second) in the SELECT statement and the query will have the same output. But, if you add a different column as the first, you will have to make sure the GROUP BY still works as intended by changing or adding a number/condition representing the first column.
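To check that the two forms are equivalent, here is a minimal sketch using Python's built-in sqlite3 module with the sample data from the question. SQLite is only a stand-in here (an assumption, since the question does not name the engine), and is_bar is stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo (id INTEGER, group1 TEXT, group2 TEXT, is_bar INTEGER, amount INTEGER);
INSERT INTO foo VALUES
  (1, 'a', 'bar1', 1, 100),
  (2, 'a', 'bar2', 1, 200),
  (3, 'a', 'baz',  0, 150),
  (4, 'a', 'baz',  0, 250),
  (5, 'b', 'bar1', 1, 50);
""")

# Variant 1: repeat the CASE expression in the GROUP BY.
explicit = conn.execute("""
SELECT group1,
       CASE WHEN is_bar THEN 'bar' ELSE group2 END AS group2,
       SUM(amount)
FROM foo
GROUP BY group1, CASE WHEN is_bar THEN 'bar' ELSE group2 END
ORDER BY 1, 2
""").fetchall()

# Variant 2: refer to the first and second SELECT columns by position.
positional = conn.execute("""
SELECT group1,
       CASE WHEN is_bar THEN 'bar' ELSE group2 END AS group2,
       SUM(amount)
FROM foo
GROUP BY 1, 2
ORDER BY 1, 2
""").fetchall()

print(explicit)  # [('a', 'bar', 300), ('a', 'baz', 400), ('b', 'bar', 50)]
```

Both variants return the same three rows, matching the desired output in the question.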

It turns out that in PostgreSQL you can GROUP BY an aliased column, as stated in this part of the docs:
In strict SQL, GROUP BY can only group by columns of the source table
but PostgreSQL extends this to also allow GROUP BY to group by columns
in the select list. Grouping by value expressions instead of simple
column names is also allowed.
This SQL Fiddle shows that in action:
SELECT
group1,
CASE WHEN is_bar THEN 'bar' ELSE group2 END group2_grouped,
SUM(AMOUNT)
FROM foo
GROUP BY group1, group2_grouped
The problem arises when you give the CASE expression an alias that is the same as an existing column name: PostgreSQL will then GROUP BY the original column, not the alias. This is mentioned in the docs:
In case of ambiguity, a GROUP BY name will be interpreted as an
input-column name rather than an output column name.


How to set the first column value of a GROUPING SETS

I have the following request :
SELECT column1, column2, SUM(column3) as total
FROM my_table
GROUP BY GROUPING SETS ((column1, column2), ())
Which returns :
Name1 Name2 QTT
AB CD 15
ZE EF 15
None None 30
So |None|None|30 is the row produced by the empty grouping set in GROUPING SETS.
But I am wondering how to define the first None to be something else :
|SubTotal|None|30
For example.
The final output would be :
Name1 Name2 QTT
AB CD 15
ZE EF 15
SubTotal None 30
The dialect is not provided, but the counterpart to GROUPING SETS is the GROUPING function (GROUPING_ID in some dialects):
Describes which of a list of expressions are grouped in a row produced by a GROUP BY query.
GROUPING_ID is not an aggregate function, but rather a utility function that can be used alongside aggregation, to determine the level of aggregation a row was generated for
CREATE TABLE my_table(Name1 TEXT, Name2 TEXT, QTT INT)
AS SELECT 'AB','CD',15
UNION SELECT 'ZE','EF',15;
SELECT CASE WHEN GROUPING_ID(Name1,Name2) = 3 THEN 'Subtotal' ELSE Name1 END AS Name1
,Name2, SUM(QTT) as total
FROM my_table
GROUP BY GROUPING SETS ((Name1, Name2), ());
Output:
Related:
SQL Server GROUPING_ID
PostgreSQL GROUPING
MySQL GROUPING
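If the engine at hand has no GROUPING SETS at all, the same labelled total row can be emulated with UNION ALL. The sketch below is a swapped-in technique, not the GROUPING_ID approach above, and it runs on SQLite through Python's sqlite3 module purely for illustration. The is_total column is a hypothetical helper used only to sort the total row last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (Name1 TEXT, Name2 TEXT, QTT INT);
INSERT INTO my_table VALUES ('AB', 'CD', 15), ('ZE', 'EF', 15);
""")

rows = conn.execute("""
SELECT Name1, Name2, total
FROM (
    -- Detail rows: the (Name1, Name2) grouping set.
    SELECT Name1, Name2, SUM(QTT) AS total, 0 AS is_total
    FROM my_table
    GROUP BY Name1, Name2
    UNION ALL
    -- Total row: the empty grouping set, labelled explicitly.
    SELECT 'Subtotal', NULL, SUM(QTT), 1
    FROM my_table
)
ORDER BY is_total, Name1, Name2
""").fetchall()

print(rows)  # [('AB', 'CD', 15), ('ZE', 'EF', 15), ('Subtotal', None, 30)]
```

Where GROUPING or GROUPING_ID is available, it is usually the better choice because the table is scanned once and a genuine NULL in Name1 cannot be mistaken for the total row.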

In SQL, how do you count if any rows in a group match a certain criteria?

I'm new to SQL, but I have a dataset that has students, their class subjects, and if there was an error in their work. I want to know how many students have at least 1 error in any subject. Thus, whether a student has one subject with an error (like students 2 and 3 in the example) or multiple errors (like student 4), they'd be flagged. Only if they have no errors should they be categorized as 'no'.
I know I have to use GROUP BY and COUNT, and I'm thinking I have to use HAVING as well, but I can't seem to put it together. Here's a sample dataset:
ID Rating Error
==========================================
1 English No
1 Math No
2 English Yes
2 Math No
2 Science No
3 English Yes
4 English Yes
4 Math Yes
And the desired output:
Error Count Percent
==========================================
No 1 .25
Yes 3 .75
There are many different ways to do it. Here is one example using a CTE (common table expression):
with t as (
select
id,
case when sum(case when error='Yes' then 1 else 0 end) > 0 then 'Yes' else 'No' end as error
from students
group by id
)
select
error,
count(*),
(0.0 + count(*)) / (select count(*) from t) as perc
from t
group by error
Basically, the inner query (t) calculates the error status for each student, and the outer query calculates the count and percentage for each status.
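Here is a quick way to try the CTE query against the sample data, sketched with Python's sqlite3 module. SQLite is an assumption (the question does not name an engine), and the table and column names follow the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER, rating TEXT, error TEXT);
INSERT INTO students VALUES
  (1, 'English', 'No'),  (1, 'Math', 'No'),
  (2, 'English', 'Yes'), (2, 'Math', 'No'), (2, 'Science', 'No'),
  (3, 'English', 'Yes'),
  (4, 'English', 'Yes'), (4, 'Math', 'Yes');
""")

rows = conn.execute("""
WITH t AS (
    -- One row per student: 'Yes' if any of their subjects has an error.
    SELECT id,
           CASE WHEN SUM(CASE WHEN error = 'Yes' THEN 1 ELSE 0 END) > 0
                THEN 'Yes' ELSE 'No' END AS error
    FROM students
    GROUP BY id
)
SELECT error,
       COUNT(*),
       (0.0 + COUNT(*)) / (SELECT COUNT(*) FROM t) AS perc
FROM t
GROUP BY error
ORDER BY error
""").fetchall()

print(rows)  # [('No', 1, 0.25), ('Yes', 3, 0.75)]
```

Adding 0.0 forces floating-point division; without it, integer division would give 0 for both rows.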
There are several useful functions you can use:
bool_or(boolean) → boolean - Returns TRUE if any input value is TRUE, otherwise FALSE.
if(condition, true_value, false_value) - Evaluates and returns true_value if condition is true, otherwise evaluates and returns false_value.
select count(distinct id) - to count distinct ids.
with dataset (ID,Rating,Error) as (
values (1,'Math','No'),
(2,'English','Yes'),
(1,'English','No'),
(2,'Math','No'),
(2,'Science','No'),
(3,'English','Yes'),
(4,'English','Yes'),
(4,'Math','Yes')
)
select if(has_error, 'Yes', 'No') Error,
count(*) Count,
cast(count(*) as double) / (select count(distinct id) from dataset) Percent
from (
select bool_or(Error = 'Yes') has_error
from dataset
group by id
)
group by has_error;
Output:
Error Count Percent
Yes 3 0.75
No 1 0.25
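bool_or and if are not available in every engine. As a rough portable equivalent, MAX over a comparison plays the role of bool_or wherever comparisons evaluate to 0/1, and CASE replaces if. The sketch below assumes SQLite (via Python's sqlite3 module); the aliases label, n and pct are illustrative names, not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataset (ID INTEGER, Rating TEXT, Error TEXT);
INSERT INTO dataset VALUES
  (1, 'Math', 'No'),     (2, 'English', 'Yes'), (1, 'English', 'No'),
  (2, 'Math', 'No'),     (2, 'Science', 'No'),  (3, 'English', 'Yes'),
  (4, 'English', 'Yes'), (4, 'Math', 'Yes');
""")

rows = conn.execute("""
SELECT CASE WHEN has_error THEN 'Yes' ELSE 'No' END AS label,
       COUNT(*) AS n,
       CAST(COUNT(*) AS REAL) / (SELECT COUNT(DISTINCT ID) FROM dataset) AS pct
FROM (
    -- Error = 'Yes' evaluates to 1 or 0 in SQLite, so MAX acts like bool_or.
    SELECT MAX(Error = 'Yes') AS has_error
    FROM dataset
    GROUP BY ID
)
GROUP BY has_error
ORDER BY has_error DESC
""").fetchall()

print(rows)  # [('Yes', 3, 0.75), ('No', 1, 0.25)]
```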

SQL: How to exclude group from result set by one of the elements, not using subqueries

Input:
id group_id type_id
1 1 aaaaa
2 1 BAD
3 2 bbbbb
4 2 ccccc
5 3 ddddd
6 3 eeeee
7 3 aaaaa
I need to output the group_ids that consist only of members for which type_id <> 'BAD'. A whole group with at least one BAD member should be excluded.
Use of subqueries (or CTE or NOT EXISTS or views or T-SQL inline functions) is not allowed!
Use of except is not allowed!
Use of cursors is not allowed.
Any solutions which trick the rules above are appreciated. Any RDBMS is ok.
Bad example solution producing correct results, (using except):
select distinct group_id
from input
except
select group_id
from input
where type_id = 'bad'
group by group_id, type_id
Output:
group_id
2
3
I would just use group by and having:
select group_id
from input
group by group_id
having min(type_id) = 'good' and max(type_id) = min(type_id);
This particular version assumes that type_id (as in the question) does not take on NULL values. It is easily modified to take that into account.
EDIT:
If you are looking for one bad, then just do:
select group_id
from input
where type_id = 'bad'
group by group_id;
Group by group_id and count occurrences of 'BAD':
select group_id
from mytable
group by group_id
having count(case when type_id = 'BAD' then 'count me' end) = 0;
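The conditional count can be verified against the input from the question. This is a minimal sketch on SQLite through Python's sqlite3 module (any RDBMS was allowed by the question); the table name mytable follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, group_id INTEGER, type_id TEXT);
INSERT INTO mytable VALUES
  (1, 1, 'aaaaa'), (2, 1, 'BAD'),
  (3, 2, 'bbbbb'), (4, 2, 'ccccc'),
  (5, 3, 'ddddd'), (6, 3, 'eeeee'), (7, 3, 'aaaaa');
""")

rows = conn.execute("""
SELECT group_id
FROM mytable
GROUP BY group_id
-- CASE yields NULL for non-BAD rows, and COUNT ignores NULLs,
-- so the count is the number of BAD members in the group.
HAVING COUNT(CASE WHEN type_id = 'BAD' THEN 1 END) = 0
ORDER BY group_id
""").fetchall()

print(rows)  # [(2,), (3,)]
```

No subquery, EXCEPT, or cursor is involved: the filter happens entirely in the HAVING clause.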

How to check for sequential ordering in having clause in HIVE?

I want to be able to write a query that will tell me which groups of my data do not have every number in a sequence.
For example, my table looks like this:
Columns: sequence group1
0 ADM
1 ADM
0 GDM
2 GDM
3 GDM
0 WJK
And, I want to know which unique values in group1 contain all of the numbers starting at 0 and counting. So, in this instance, ADM and WJK would get returned, but GDM would not. GDM would not, because it goes from 0, 2, 3 and skips 1.
How would I write a query in HIVE to tell me which unique values in column group1 contain all integers sequentially?
SELECT group1
FROM
TableName
GROUP BY
group1
HAVING
COUNT(*) = MAX(sequence) - MIN(sequence) + 1
This works whether the sequence starts at 0 or at some other integer, provided a group contains no duplicate values.
You can use count and max, since the sequence starts from zero:
select group1
from my_table
group by group1
having count(*) = max(sequence) +1
For your last comment, you can then use:
select group1
from my_table
group by group1
having count(distinct sequence) = max(sequence) +1
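To see how count(*) and count(distinct sequence) differ, here is a small sketch run on SQLite through Python's sqlite3 module rather than Hive (an assumption made only so the example is self-contained). The DUP group is hypothetical data added to show a duplicated value; the other rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (sequence INTEGER, group1 TEXT);
INSERT INTO my_table VALUES
  (0, 'ADM'), (1, 'ADM'),
  (0, 'GDM'), (2, 'GDM'), (3, 'GDM'),
  (0, 'WJK'),
  (0, 'DUP'), (0, 'DUP'), (1, 'DUP');  -- hypothetical group with a repeated 0
""")

def complete_groups(count_expr):
    # Groups whose count equals max(sequence) + 1, i.e. no gaps from 0 upward.
    sql = f"""
        SELECT group1
        FROM my_table
        GROUP BY group1
        HAVING {count_expr} = MAX(sequence) + 1
        ORDER BY group1
    """
    return [row[0] for row in conn.execute(sql)]

plain = complete_groups("COUNT(*)")
distinct = complete_groups("COUNT(DISTINCT sequence)")

print(plain)     # ['ADM', 'WJK']
print(distinct)  # ['ADM', 'DUP', 'WJK']
```

GDM is rejected by both checks because 1 is missing. DUP passes only the distinct version, since its repeated 0 inflates count(*).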
If sequence can have duplicates, then you need to be careful. One method is to take a real sequence, subtract it, and be sure the difference is constant. Use dense_rank() rather than row_number() so that duplicate values share the same rank:
select group1
from (select t.*,
(sequence - dense_rank() over (partition by group1 order by sequence)) as diff
from t
) t
group by group1
having min(diff) = max(diff);

Grouping by intervals

Given a table (mytable) containing a numeric field (mynum), how would one go about writing an SQL query which summarizes the table's data based on ranges of values in that field rather than each distinct value?
For the sake of a more concrete example, let's make it intervals of 3 and just "summarize" with a count(*), such that the results tell the number of rows where mynum is 0-2.99, the number of rows where it's 3-5.99, where it's 6-8.99, etc.
The idea is to compute some function of the field that has constant value within each group you want:
select count(*), floor(mynum/3.0) foo from mytable group by foo;
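As a concrete check of the bucketing idea, the sketch below runs it on SQLite through Python's sqlite3 module. Two assumptions: the sample values are made up for illustration, and CAST(... AS INTEGER) stands in for floor because not every SQLite build ships the math functions. The cast truncates toward zero, so it matches floor only for non-negative values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (mynum REAL);
-- Hypothetical sample values covering three intervals of width 3.
INSERT INTO mytable VALUES (0), (1.5), (2.99), (3), (5.99), (6), (8);
""")

rows = conn.execute("""
SELECT CAST(mynum / 3.0 AS INTEGER) * 3 AS range_start,  -- lower bound of the interval
       COUNT(*) AS n
FROM mytable
GROUP BY CAST(mynum / 3.0 AS INTEGER)
ORDER BY range_start
""").fetchall()

print(rows)  # [(0, 3), (3, 2), (6, 2)]
```

Multiplying the bucket number back by the interval width gives a readable lower bound (0, 3, 6) instead of a bare bucket index.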
I do not know if this is applicable to MySQL, but in SQL Server I think you can "simply" use a CASE expression in both the select list AND the group by list.
Something like:
select
CASE
WHEN id <= 20 THEN 'lessthan20'
WHEN id > 20 and id <= 30 THEN '20and30' ELSE 'morethan30' END,
count(*)
from Profiles
where 1=1
group by
CASE
WHEN id <= 20 THEN 'lessthan20'
WHEN id > 20 and id <= 30 THEN '20and30' ELSE 'morethan30' END
returns something like
column1 column2
---------- ----------
20and30 3
lessthan20 3
morethan30 13