SQL Query to do grouping across a join


I've an enrollment table containing student IDs, course IDs and teacher IDs.
| sID | cID | tID |
|-----|-----|-----|
| 1   | 1   | 1   |
| 1   | 2   | 2   |
| 1   | 3   | 3   |
| 2   | 1   | 1   |
| 2   | 3   | 5   |
| 3   | 1   | 1   |
| 3   | 2   | 2   |
I would like to get a table that can tell me how many students are in each course with a given professor. In other words, I'd like this:
| cID | tID | numOfStudents |
|-----|-----|---------------|
| 1   | 1   | 3             |
| 2   | 2   | 2             |
| 3   | 3   | 1             |
| 3   | 5   | 1             |
I've tried:
SELECT cID, tID, count(sID)
FROM enrollment
GROUP BY tID
but this kind of query, in the various combinations I have tried, does not work for me. Does anyone have any other suggestions?

Just add cID to the GROUP BY:
SELECT cID, tID, count(*)
FROM enrollment
GROUP BY cID, tID
From the docs:
When GROUP BY is present, it is not valid for the SELECT list
expressions to refer to ungrouped columns except within aggregate
functions, since there would be more than one possible value to return
for an ungrouped column.

SELECT cID, tID, count(sID)
FROM enrollment
GROUP BY 1,2
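
To check the fix against the sample data, here is a minimal sketch that loads the enrollment rows from the question into an in-memory SQLite database and groups by both cID and tID. Using Python's sqlite3 module is an assumption made only for illustration (the question does not name a database), and the function name is hypothetical.

```python
import sqlite3

# Enrollment rows copied from the question: (sID, cID, tID).
ROWS = [
    (1, 1, 1), (1, 2, 2), (1, 3, 3),
    (2, 1, 1), (2, 3, 5),
    (3, 1, 1), (3, 2, 2),
]


def students_per_course_teacher():
    """Count students for every (cID, tID) pair in the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollment (sID INTEGER, cID INTEGER, tID INTEGER)")
    conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", ROWS)
    # Every non-aggregated column in the SELECT list appears in GROUP BY.
    result = conn.execute(
        """
        SELECT cID, tID, COUNT(*) AS numOfStudents
        FROM enrollment
        GROUP BY cID, tID
        ORDER BY cID, tID
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in students_per_course_teacher():
        print(row)
```

The four rows returned match the desired table in the question, one row per course and teacher combination.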

Related

How does having clause filter rows in mysql?

As far as I know, the HAVING clause is used to filter rows for each group.
I have a table that stores scores of students.
create table sc
(
`classid` int,
`studentid` int,
`score` int
);
Here is the sample data:
+---------+-----------+-------+
| classid | studentid | score |
+---------+-----------+-------+
| 1 | 1 | 50 |
| 1 | 2 | 59 |
| 1 | 3 | 80 |
| 1 | 4 | 68 |
| 1 | 5 | 70 |
| 1 | 6 | 20 |
| 1 | 7 | 90 |
| 1 | 8 | 100 |
| 1 | 9 | 25 |
| 2 | 1 | 51 |
| 2 | 2 | 59 |
| 2 | 3 | 80 |
| 2 | 4 | 68 |
| 2 | 5 | 70 |
| 2 | 6 | 30 |
| 2 | 7 | 44 |
| 2 | 8 | 80 |
| 3 | 1 | 20 |
| 1 | 11 | 30 |
| 1 | 12 | 40 |
+---------+-----------+-------+
And I want to query the max score of each class, so I wrote this SQL statement:
select *
from sc
group by classid
having score = max(score);
But the output is not what I expect. The output only prints one row.
+---------+-----------+-------+
| classid | studentid | score |
+---------+-----------+-------+
| 3 | 1 | 20 |
+---------+-----------+-------+

If your SELECT list contains columns that are not wrapped in an aggregate function such as sum(), max(), or avg(), those columns also need to be present in your GROUP BY. Older versions of MySQL (before ONLY_FULL_GROUP_BY was enabled by default in 5.7.5) do not raise an error when you skip this step. Instead, MySQL picks an arbitrary value from each group for those columns, so the result is not deterministic.
HAVING is one of the later steps in the logical order of execution: it runs after GROUP BY, on rows that have already been collapsed into groups (window functions and ORDER BY come later still). Because of that, score = max(score) doesn't make much sense. At that point score is either aggregated or it is not; you can't compare the aggregated and non-aggregated state of the same column at the same time.
Instead you want a correlated subquery:
SELECT *
FROM sc AS dt
WHERE score =
(
SELECT MAX(score)
FROM sc AS dt2
WHERE dt.classid = dt2.classid
);
Alternatively you can use window functions:
SELECT *
FROM
(
SELECT classid,
studentid,
score,
MAX(score) OVER (PARTITION BY classid) AS maxclassscore
FROM sc
) dt
WHERE score = maxclassscore;
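
As a quick check of the correlated-subquery approach, the sketch below runs it against the sample scores from the question. It assumes the question's table name sc and uses an in-memory SQLite database through Python's sqlite3 module rather than MySQL; the function name is hypothetical.

```python
import sqlite3

# Sample data copied from the question: (classid, studentid, score).
SCORES = [
    (1, 1, 50), (1, 2, 59), (1, 3, 80), (1, 4, 68), (1, 5, 70),
    (1, 6, 20), (1, 7, 90), (1, 8, 100), (1, 9, 25),
    (2, 1, 51), (2, 2, 59), (2, 3, 80), (2, 4, 68), (2, 5, 70),
    (2, 6, 30), (2, 7, 44), (2, 8, 80),
    (3, 1, 20), (1, 11, 30), (1, 12, 40),
]


def top_scores_per_class():
    """Return every row whose score equals the maximum score of its class."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sc (classid INTEGER, studentid INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO sc VALUES (?, ?, ?)", SCORES)
    # The inner query is re-evaluated for the class of each outer row.
    result = conn.execute(
        """
        SELECT classid, studentid, score
        FROM sc AS dt
        WHERE score = (
            SELECT MAX(score)
            FROM sc AS dt2
            WHERE dt2.classid = dt.classid
        )
        ORDER BY classid, studentid
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_scores_per_class():
        print(row)
```

Note that class 2 yields two rows, because two students share the top score of 80. Filtering on equality with the maximum keeps ties.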

Your query is not valid. If you compare against the overall maximum instead, your data returns only one row, because only one student scored 100:
SELECT * FROM sc WHERE score =
(select max(score) maxscore FROM sc);
classid | studentid | score
------: | --------: | ----:
1 | 8 | 100

Selecting rows that don't have duplicates

Let's say I have the following table:
| sku | id | value | count |
|-----|----|-------|-------|
| A | 1 | 1 | 2 |
| A | 1 | 2 | 2 |
| A | 3 | 3 | 3 |
I want to select rows that don't have the same count for the same id. So my desired outcome is:
| sku | id | value | count |
|-----|----|-------|-------|
| A | 3 | 3 | 3 |
I need something that works with Postgres 10

A simple method is window functions:
select t.*
from (select t.*, count(*) over (partition by sku, id) as cnt
from t
) t
where cnt = 1;
This assumes you really mean the sku/id combination.
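
A minimal sketch of the window-function approach on the sample table, run in an in-memory SQLite database through Python's sqlite3 module. This assumes SQLite 3.25 or newer (the first release with window functions) as a stand-in for Postgres 10; the function name and the sub alias are hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (sku, id, value, count).
ROWS = [("A", 1, 1, 2), ("A", 1, 2, 2), ("A", 3, 3, 3)]


def rows_without_duplicates():
    """Keep only rows whose (sku, id) combination occurs exactly once."""
    conn = sqlite3.connect(":memory:")
    # "count" is quoted so it cannot be confused with the COUNT function.
    conn.execute('CREATE TABLE t (sku TEXT, id INTEGER, value INTEGER, "count" INTEGER)')
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    result = conn.execute(
        """
        SELECT sku, id, value, "count"
        FROM (
            SELECT t.*, COUNT(*) OVER (PARTITION BY sku, id) AS cnt
            FROM t
        ) AS sub
        WHERE cnt = 1
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(rows_without_duplicates())
```

The window count is attached to every row without collapsing the rows, which is why the outer query can still return all original columns.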

Hive - over (partition by ...) with a column not in group by

Is it possible to do something like:
select
avg(count(distinct user_id))
over (partition by some_date) as average_users_per_day
from user_activity
group by user_type
(notably, the partition by column, some_date, is not in the group by columns)
The idea I'm going for is something like: the average users per day by user type.
I know how to do it using subqueries (see below), but I'd like to know if there is a nice way using only over (partition by ...) and group by.
Notes:
From reading this answer, my understanding (correct me if I'm wrong) is that the following query:
select
avg(count(distinct a)) over (partition by b)
from foo
group by b
can be expanded equivalently to:
select
avg(count_distinct_a)
from (
select
b,
count(distinct a) as count_distinct_a
from foo
group by b
)
group by b
And from that, I can tweak it a bit to achieve what I want:
select
avg(count_distinct_user_id) as average_users_per_day
from (
select
user_type,
count(distinct user_id) as count_distinct_user_id
from user_activity
group by user_type, some_date
)
group by user_type
(notably, the inner group by user_type, some_date differs from the outer group by user_type)
I'd like to be able to tell the partition by-group by interaction to use a "sub-group-by" for the windowing part. Please let me know if my understanding of partition by/group by is completely off.
EDIT: Some sample data and desired output.
Source table:
+---------+-----------+-----------+
| user_id | user_type | some_date |
+---------+-----------+-----------+
| 1 | a | 1 |
| 1 | a | 2 |
| 2 | a | 1 |
| 3 | a | 2 |
| 3 | a | 2 |
| 4 | b | 2 |
| 5 | b | 1 |
| 5 | b | 3 |
| 5 | b | 3 |
| 6 | c | 1 |
| 7 | c | 1 |
| 8 | c | 4 |
| 9 | c | 2 |
| 9 | c | 3 |
| 9 | c | 4 |
+---------+-----------+-----------+
Sample intermediate table (for reasoning with):
+-----------+-----------+---------------------+
| user_type | some_date | distinct_user_count |
+-----------+-----------+---------------------+
| a | 1 | 2 |
| a | 2 | 2 |
| b | 1 | 1 |
| b | 2 | 1 |
| b | 3 | 1 |
| c | 1 | 2 |
| c | 2 | 1 |
| c | 3 | 1 |
| c | 4 | 2 |
+-----------+-----------+---------------------+
SQL is: select user_type, some_date, count(distinct user_id) from user_activity group by user_type, some_date.
Desired result:
+-----------+---------------------+
| user_type | average_daily_users |
+-----------+---------------------+
| a | 2 |
| b | 1 |
| c | 1.5 |
+-----------+---------------------+
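
For reference, the two-level subquery approach from the question can be checked against the sample data with the sketch below. It runs on an in-memory SQLite database through Python's sqlite3 module, not on Hive, so it only confirms the logic and the desired result; the daily alias and the function name are hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (user_id, user_type, some_date).
ACTIVITY = [
    (1, "a", 1), (1, "a", 2), (2, "a", 1), (3, "a", 2), (3, "a", 2),
    (4, "b", 2), (5, "b", 1), (5, "b", 3), (5, "b", 3),
    (6, "c", 1), (7, "c", 1), (8, "c", 4), (9, "c", 2), (9, "c", 3), (9, "c", 4),
]


def average_daily_users():
    """Average the per-day distinct user counts within each user type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_activity (user_id INTEGER, user_type TEXT, some_date INTEGER)"
    )
    conn.executemany("INSERT INTO user_activity VALUES (?, ?, ?)", ACTIVITY)
    # Inner query: one row per (user_type, some_date), the intermediate table.
    # Outer query: average those daily counts per user_type.
    result = conn.execute(
        """
        SELECT user_type, AVG(distinct_user_count) AS average_daily_users
        FROM (
            SELECT user_type, some_date,
                   COUNT(DISTINCT user_id) AS distinct_user_count
            FROM user_activity
            GROUP BY user_type, some_date
        ) AS daily
        GROUP BY user_type
        ORDER BY user_type
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in average_daily_users():
        print(row)
```

The inner query reproduces the intermediate table above, and the outer query yields the desired averages of 2, 1, and 1.5.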

How to get the max row count grouped by the ID in sql

Say I have this table:
(The Row column is a running count within each ID.)
ID | Row | State |
1 | 1 | CA |
1 | 2 | AK |
2 | 1 | KY |
2 | 2 | GA |
2 | 3 | FL |
3 | 1 | WY |
3 | 2 | HI |
3 | 3 | NY |
3 | 4 | DC |
4 | 1 | RI |
I'd like to generate a new column that would have the highest number in the Row column grouped by the ID column for each row. How would I accomplish this? I've been messing around with MAX(), GROUP BY, and some partitioning but I'm getting different errors each time. It's difficult to finesse this correctly. Here's my target output:
ID | Row | State | MaxRow
1 | 1 | CA | 2
1 | 2 | AK | 2
2 | 1 | KY | 3
2 | 2 | GA | 3
2 | 3 | FL | 3
3 | 1 | WY | 4
3 | 2 | HI | 4
3 | 3 | NY | 4
3 | 4 | DC | 4
4 | 1 | RI | 1

Use the window version of MAX:
SELECT ID, Row, State, MAX(Row) OVER (PARTITION BY ID) AS MaxRow
FROM mytable

You could join between a query on the table and an aggregate table:
SELECT t.*, max_row
FROM t
JOIN (SELECT id, MAX([row]) AS max_row
FROM t
GROUP BY id) agg ON t.id = agg.id

First write a query that uses GROUP BY id and MAX to get the highest number for each id. Then use that query as a subquery and inner join it to the table on id.
Finally, select the max column from the subquery to obtain your result.
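
The window-function and join-with-aggregate approaches described in these answers can be compared side by side. The sketch below is an illustration on an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or newer is assumed for the window function); the table name mytable follows the first answer, and the function name is hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (ID, Row, State).
ROWS = [
    (1, 1, "CA"), (1, 2, "AK"),
    (2, 1, "KY"), (2, 2, "GA"), (2, 3, "FL"),
    (3, 1, "WY"), (3, 2, "HI"), (3, 3, "NY"), (3, 4, "DC"),
    (4, 1, "RI"),
]

WINDOW_SQL = """
    SELECT ID, "Row", State, MAX("Row") OVER (PARTITION BY ID) AS MaxRow
    FROM mytable
    ORDER BY ID, "Row"
"""

JOIN_SQL = """
    SELECT t.ID, t."Row", t.State, agg.MaxRow
    FROM mytable AS t
    JOIN (SELECT ID, MAX("Row") AS MaxRow
          FROM mytable
          GROUP BY ID) AS agg ON agg.ID = t.ID
    ORDER BY t.ID, t."Row"
"""


def max_row_per_id():
    """Run both variants and return their result sets as a pair."""
    conn = sqlite3.connect(":memory:")
    # "Row" is quoted because ROW is an SQL keyword.
    conn.execute('CREATE TABLE mytable (ID INTEGER, "Row" INTEGER, State TEXT)')
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    window_result = conn.execute(WINDOW_SQL).fetchall()
    join_result = conn.execute(JOIN_SQL).fetchall()
    conn.close()
    return window_result, join_result


if __name__ == "__main__":
    window_result, join_result = max_row_per_id()
    print(window_result == join_result)
    for row in window_result:
        print(row)
```

Both queries return the same ten rows here. The window version reads the table once in the query text, while the join version also works on engines without window functions.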

How to return smallest value inside the resultset as a separate column in SQL?

I've been struggling with the following SQL query.
My resultset is now:
| Id | Customer | Sales |
| 1 | 1 | 10 |
| 2 | 1 | 20 |
| 3 | 2 | 30 |
| 4 | 2 | 40 |
What I'd like to do is to add additional column that shows the smallest sale for that customer:
| Id | Customer | Sales | SmallestSale |
| 1 | 1 | 10 | 10 |
| 2 | 1 | 20 | 10 |
| 3 | 2 | 30 | 30 |
| 4 | 2 | 40 | 30 |
Since the SELECT query that produces those three columns is already rather complex, I'd like to avoid subqueries.
Any ideas?
Mika

Assuming your RDBMS supports windowed aggregates:
SELECT Id,
Customer,
Sales,
MIN(Sales) OVER (PARTITION BY Customer) AS SmallestSale
FROM YourTable

select s.Id, s.Customer, s.Sales, sm.SmallestSale
from Sales s
inner join (
select Customer, min(sales) as SmallestSale
from Sales
group by Customer
) sm on s.Customer = sm.Customer
