Creating a group by of a group by - sql

I'm sure this is really straightforward, but I'm struggling! I'd like to group by the number of times each value occurs. For example, case_id '10' and case_id '20' each occur twice, so the count for 'two occurrences' would be two.
Data table:
id | case_id
---------------
0 | 10
1 | 10
2 | 20
3 | 20
4 | 30
5 | 30
6 | 30
7 | 40
8 | 40
7 | 40
8 | 40
Creates this:
occurrences of a case_id | count
---------------------------------
2 | 2
3 | 1
4 | 1
Thank you!

Use an inner query:
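SELECT occurences, COUNT(*) cnt
FROM (
    -- one row per case_id, holding how many times it occurs
    SELECT COUNT(*) occurences FROM mytable GROUP BY case_id
) x
-- then count how many case_ids share each occurrence count
GROUP BY occurences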
Demo on DB Fiddle:
| occurences | cnt |
| ---------- | --- |
| 2 | 2 |
| 3 | 1 |
| 4 | 1 |
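To try this locally, here is a minimal sketch that loads the question's sample rows and runs the same two-level aggregation written as a CTE. The column types are assumptions, since the question does not give a schema, and the alias occurrences is just an illustrative name.

```sql
-- Assumed schema: both columns as integers; rows copied from the question.
CREATE TABLE mytable (id INT, case_id INT);
INSERT INTO mytable (id, case_id) VALUES
  (0, 10), (1, 10),
  (2, 20), (3, 20),
  (4, 30), (5, 30), (6, 30),
  (7, 40), (8, 40), (7, 40), (8, 40);

-- Step 1: count rows per case_id. Step 2: count case_ids per row count.
WITH per_case AS (
  SELECT case_id, COUNT(*) AS occurrences
  FROM mytable
  GROUP BY case_id
)
SELECT occurrences, COUNT(*) AS cnt
FROM per_case
GROUP BY occurrences
ORDER BY occurrences;
```

The CTE form is equivalent to the derived table; it only names the intermediate step, which can help when more levels are added.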

Related

Postgres Query: Select the row with maximum value on a column from two distinct rows

user_id | sum | app_id | app_count
---------+------+--------+-----------
1 | 100 | 3 | 1
2 | 300 | 2 | 1
4 | 1100 | 1 | 2
4 | 1100 | 4 | 1
How do I write a query that returns one row per user_id, keeping the row with the highest app_count?
Here is the result I want:
user_id | sum | app_id | app_count
---------+------+--------+-----------
1 | 100 | 3 | 1
2 | 300 | 2 | 1
4 | 1100 | 1 | 2
In Postgres, you would use distinct on:
select distinct on (user_id) t.*
from t
order by user_id, app_count desc;
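If the query also has to run on a database without distinct on, a window function gives the same shape of result. This is only a sketch: it assumes the table is named t as above, and rn and ranked are illustrative names.

```sql
select user_id, sum, app_id, app_count
from (
  select t.*,
         -- number the rows of each user, highest app_count first
         row_number() over (partition by user_id order by app_count desc) as rn
  from t
) ranked
where rn = 1          -- keep only the top row per user
order by user_id;
```

As with distinct on, ties on app_count are broken arbitrarily unless a further column (for example app_id) is added to the order by.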

Get min and count rows group by id

EDIT: I forgot an important detail.
I have a postgresql table like this:
| id | n_1 | n_2 |
| 1 | 3 | 5 |
| 1 | 2 | 6 |
| 1 | 8 | 4 |
| 1 | 1 | 5 |
| 2 | 4 | 3 |
| 2 | 5 | 1 |
I want to get the min values, and a count of only the rows where n_2 >= min(n_1):
| id | n_1 | n_2 | count |
| 1 | 1 | 4 | 4 |
| 2 | 4 | 1 | 0 |
That is, for each id: the min of n_1, the min of n_2, and the number of records where n_2 >= min(n_1).
Any help?
Here is how you can do it, grouping by id (table is a reserved word, so the table is called t here):
select id, min(n_1), min(n_2), count(case when n_2 >= min_n_1 then 1 end)
from (select *, min(n_1) over (partition by id) as min_n_1 from t) t
group by id
Without the n_2 >= min(n_1) condition, this is just aggregation (note that count(*) counts every row, so id 2 gets 2 rather than 0):
select id, min(n_1), min(n_2), count(*)
from t
group by id;
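In Postgres 9.4 or later, the conditional count can also be written with a filter clause. A minimal sketch, assuming the table is named t; the aliases n_1_min, n_2_min, cnt and s are made up for illustration:

```sql
select id,
       min(n_1) as n_1_min,
       min(n_2) as n_2_min,
       -- count only the rows whose n_2 reaches the per-id minimum of n_1
       count(*) filter (where n_2 >= min_n_1) as cnt
from (
  -- attach the per-id minimum of n_1 to every row
  select t.*, min(n_1) over (partition by id) as min_n_1
  from t
) s
group by id
order by id;
```

On the sample data, id 1 has min(n_1) = 1 and all four n_2 values reach it, while id 2 has min(n_1) = 4 and neither n_2 value does, which matches the expected counts of 4 and 0.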

Multiple column count and aggregation

I have the following table, which has multiple entries per order; each entry can be either rejected or approved.
Amount | Approved | Rejected | OrderNo
-------------------------------------------
10 | N | Y | 10
20 | Y | N | 10
30 | N | N | 10
40 | Y | N | 10
22 | N | Y | 11
10 | N | N | 10
--------------------------------------------
I want to build a result set that summarises it per order:
OrderNo | TotalEntries | Approved_Or_Rejected_Entries | TotalAmount
-----------------------------------------------------------------
10 | 5 | 3 | 110
11 | 1 | 1 | 22
Use conditional aggregation:
select
orderno,
count(*) totalentries,
sum(case when 'Y' in (approved, rejected) then 1 else 0 end) approved_or_rejected,
sum(amount) total_amount
from mytable
group by orderno
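To check the idea against the sample rows, something like the following could be used. The column types are guesses from the data shown, and the variant below counts with count(case ...) instead of sum, relying on count skipping NULLs.

```sql
-- Assumed schema; types are inferred from the sample rows, not given in the question.
CREATE TABLE mytable (amount INT, approved CHAR(1), rejected CHAR(1), orderno INT);
INSERT INTO mytable (amount, approved, rejected, orderno) VALUES
  (10, 'N', 'Y', 10),
  (20, 'Y', 'N', 10),
  (30, 'N', 'N', 10),
  (40, 'Y', 'N', 10),
  (22, 'N', 'Y', 11),
  (10, 'N', 'N', 10);

SELECT orderno,
       COUNT(*) AS totalentries,
       -- the CASE yields NULL for undecided entries, and COUNT ignores NULLs
       COUNT(CASE WHEN approved = 'Y' OR rejected = 'Y' THEN 1 END) AS approved_or_rejected_entries,
       SUM(amount) AS totalamount
FROM mytable
GROUP BY orderno
ORDER BY orderno;
```

For order 10 this counts the three entries that carry a 'Y' out of five, with a total amount of 110.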

How to sum rows before a condition is met in SQL

I have a table which has multiple records for the same id. Looks like this, and the rows are sorted by sequence number.
+----+--------+----------+----------+
| id | result | duration | sequence |
+----+--------+----------+----------+
| 1 | 12 | 7254 | 1 |
+----+--------+----------+----------+
| 1 | 12 | 2333 | 2 |
+----+--------+----------+----------+
| 1 | 11 | 1000 | 3 |
+----+--------+----------+----------+
| 1 | 6 | 5 | 4 |
+----+--------+----------+----------+
| 1 | 3 | 20 | 5 |
+----+--------+----------+----------+
| 2 | 1 | 230 | 1 |
+----+--------+----------+----------+
| 2 | 9 | 10 | 2 |
+----+--------+----------+----------+
| 2 | 6 | 0 | 3 |
+----+--------+----------+----------+
| 2 | 1 | 5 | 4 |
+----+--------+----------+----------+
| 2 | 12 | 3 | 5 |
+----+--------+----------+----------+
E.g. for id=1, I would like to sum the duration of all rows up to and including the row where result=6, which is 7254+2333+1000+5. Likewise for id=2, it would be 230+10+0. Anything after the row where result=6 is left out.
My expected output:
+----+----------+
| id | duration |
+----+----------+
| 1 | 10592 |
+----+----------+
| 2 | 240 |
+----+----------+
The sequence has to be in ascending order.
I'm not sure how I can do this in sql.
Thank you in advance!
I think you want:
select t.id, sum(t.duration)
from t
where t.sequence <= (select t2.sequence
                     from t t2
                     where t2.id = t.id and t2.result = 6
                    )
group by t.id;
In PrestoDB, I would recommend window functions:
select id, sum(duration)
from (select t.*,
             min(case when result = 6 then sequence end) over (partition by id) as sequence_6
      from t
     ) t
where sequence <= sequence_6
group by id;
You can use a simple aggregate query with a condition that uses a subquery to recover the sequence of the record whose result is 6:
SELECT t.id, SUM(t.duration) total_duration
FROM mytable t
WHERE t.sequence <= (
SELECT sequence
FROM mytable
WHERE id = t.id AND result = 6
)
GROUP BY t.id
This demo on DB Fiddle with your test data returns :
| id | total_duration |
| --- | -------------- |
| 1 | 10592 |
| 2 | 240 |
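The scalar subquery fails if an id has more than one row with result = 6. A possible alternative, sketched here under the assumption of a database with window frames (Postgres, PrestoDB, MySQL 8+), keeps every row that has no result = 6 row before it; sixes_before and s are illustrative names.

```sql
SELECT id, SUM(duration) AS total_duration
FROM (
  SELECT t.*,
         -- how many result = 6 rows appear strictly before this row, per id
         SUM(CASE WHEN result = 6 THEN 1 ELSE 0 END)
           OVER (PARTITION BY id
                 ORDER BY sequence
                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS sixes_before
  FROM mytable t
) s
-- the frame is empty for the first row of each id, so the sum is NULL there
WHERE COALESCE(sixes_before, 0) = 0
GROUP BY id
ORDER BY id;
```

Note one behavioural difference: for an id with no result = 6 row at all, this version sums every row, whereas the subquery version returns nothing for that id.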
A basic group by query sums the duration per id (note that it sums every row and does not apply the result = 6 cutoff):
select
id,
sum(duration) duration
from t
group by id
For a specific id:
select
id,
sum(duration) duration
from t
where id = 1
group by id
If you want to include the totals in your result set:
select id, duration, sequence from t
union all
select
id,
sum(duration) duration,
null sequence
from t
group by id

SQL - Select distinct on two column

I have this table 'words' with more information:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 1 | 1 | 1
| 2 | 1 | 1
| 3 | 1 | 1
| 4 | 1 | 2
| 5 | 1 | 2
| 6 | 1 | 2
| 7 | 2 | 3
| 8 | 2 | 3
| 9 | 2 | 3
| 10 | 2 | 4
| 11 | 2 | 4
| 12 | 3 | 5
| 13 | 3 | 5
| 14 | 3 | 6
| 15 | 3 | 6
| 16 | 3 | 6
And this query gives me 3 random ids from different categories, but not from different themes as well:
SELECT Id
FROM words
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
What I want as result is:
+---------+------------+-----------
| ID |ID_CATEGORY | ID_THEME |
+---------+------------+-----------
| 2 | 1 | 1
| 7 | 2 | 3
| 14 | 3 | 6
That is, no category or theme is repeated.
When you use GROUP BY, the select list cannot include a column that is neither in the GROUP BY clause nor wrapped in an aggregate function. So, in your query, Id cannot be included in the select list.
So you need to do something a bit more complex:
SELECT Id_Category, Id_Theme,
(SELECT Id FROM Words W
WHERE W.Id_Category = G.Id_Category AND W.Id_Theme = G.Id_Theme
ORDER BY RAND() LIMIT 1
) Id
FROM Words G
GROUP BY Id_Category, Id_Theme
ORDER BY RAND()
LIMIT 3
NOTE: the query groups by the required columns, and the subselect picks a random Id from all the Ids in each group. The main query is then ordered randomly and limited to three rows.
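The grouped query can still return two themes from the same category. If each theme belongs to exactly one category, as in the sample data, picking one random row per category is enough to avoid repeating either. A sketch under that assumption, and assuming MySQL 8.0 or later for window functions (the question does not state a version); rn and picked are illustrative names:

```sql
SELECT Id, Id_Category, Id_Theme
FROM (
  SELECT w.*,
         -- shuffle the rows inside each category and number them
         ROW_NUMBER() OVER (PARTITION BY Id_Category ORDER BY RAND()) AS rn
  FROM words w
) picked
WHERE rn = 1          -- one random row per category
ORDER BY RAND()       -- then pick three categories at random
LIMIT 3;
```

If themes could be shared between categories, this would no longer guarantee distinct themes and a different approach would be needed.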