How do you use two aggregate functions for separate tables in a join? - sql

Sorry if this is a noob question!
I have two tables - a movie and a comment table.
I am trying to return the movie name and each comment for that movie, as long as the movie has more than one comment associated with it.
Here are my tables
test_movies=# SELECT * FROM movie;
id | name | rating | release_date | original_copy_location
----+------------------------------------+--------+--------------+------------------------
1 | Cruella | 9 | 2021-05-28 | 4
7 | Shutter Island | 9 | 2010-02-19 | 4
9 | Grown Ups | 7 | 2010-06-25 | 4
11 | Guardians of the Galaxy: Volume 1 | 8 | 2014-09-01 | 4
14 | The RIng | 8 | 2002-10-18 | 4
17 | Digimon: The Movie | 6 | 2000-01-10 | 4
19 | Star Wars Episode 1 | 5 | 1999-06-21 | 4
20 | Ghosts Of Mars | 5 | 1998-09-15 | 4
5 | Interstellar | 8 | 2014-11-07 | 1
10 | Mean Girls | 8 | 2004-04-30 | 1
12 | Captain America: The First Avenger | 7 | 2011-07-22 | 1
15 | Get Out | 6 | 2017-02-24 | 1
6 | The Dark Knight | 10 | 2008-07-18 | 2
16 | Pokemon: The First Movie | 5 | 1998-11-10 | 2
18 | The Last Dance | 8 | 2020-05-01 | 2
8 | Just Go With It | 8 | 2011-02-11 | 3
13 | The Blair Witch Project | 8 | 1999-08-29 | 3
(17 rows)
test_movies=# SELECT * FROM comments;
c_id | c_comment | c_movie | c_user
------+--------------------------------------+---------+--------
1 | testing comment 1 | 16 | 4
2 | testing comment 1 | 1 | 1
3 | testing comment 1 | 1 | 2
4 | testing comment 1 | 8 | 5
5 | testing comment 1 | 6 | 3
6 | testing comment 1 | 12 | 2
7 | testing comment 1 | 20 | 3
8 | testing comment 1 | 16 | 5
9 | testing comment 1 | 17 | 4
10 | testing comment 1 | 12 | 2
(10 rows)
The output I'm trying to get is this:
name | c_comment
------------------------+-------------------------------------
Cruella | testing comment 1
Cruella | testing comment 1
Pokemon: The First Movie | testing comment 1
Pokemon: The First Movie | testing comment 1
Captain America: The First Avenger | testing comment 1
Captain America: The First Avenger | testing comment 1
The problem with my queries is that I can't figure out how to return both the movie name and comment associated with it using aggregate functions.
If I use the count in the first select statement it returns all rows:
SELECT m.name, c.c_comment FROM movie m, comments c WHERE m.id = c.c_movie GROUP BY m.name, c.c_comment HAVING COUNT(m.name) >= 1;
If I try the below subquery I get the error - ERROR: subquery must return only one column
SELECT m.name, c.c_comment FROM movie m, comments c WHERE m.id = c.c_movie AND(SELECT m.name, COUNT(c.c_movie) FROM movie m, comments c WHERE m.id =c.c_movie GROUP BY name HAVING COUNT(c.c_movie) > 1);
Still a bit new to SQL as I'm a student and having a tough time figuring this query out lol.
Thanks in advance!

Something like this could work
select m.name, c.c_comment
from movie m
join comments c
on c.c_movie = m.id
where exists (select 1 from comments cc where cc.c_movie=m.id group by c_movie having count(*)>1)
It's standard SQL, but you cannot work with MySQL and PostgreSQL at the same time... 🤔

Use window functions!
select m.name, c.c_comment
from movie m join
(select c.*, count(*) over (partition by c_movie) as cnt
from comments c
) c
on c.c_movie = m.id
where cnt > 1;
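
For anyone who wants to try this out, here is a minimal sketch using Python's built-in sqlite3 module with a subset of the sample data from the question. It uses IN with a GROUP BY / HAVING subquery, a third variant alongside the EXISTS and window-function answers. The subquery returns a single column, which is what the "subquery must return only one column" error was complaining about. SQLite stands in for PostgreSQL here, so treat it as an illustration rather than a tested PostgreSQL session.

```python
import sqlite3

# In-memory database with a subset of the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comments (c_id INTEGER PRIMARY KEY, c_comment TEXT,
                       c_movie INTEGER, c_user INTEGER);
INSERT INTO movie VALUES
  (1, 'Cruella'), (6, 'The Dark Knight'), (8, 'Just Go With It'),
  (12, 'Captain America: The First Avenger'),
  (16, 'Pokemon: The First Movie'),
  (17, 'Digimon: The Movie'), (20, 'Ghosts Of Mars');
INSERT INTO comments VALUES
  (1, 'testing comment 1', 16, 4), (2, 'testing comment 1', 1, 1),
  (3, 'testing comment 1', 1, 2),  (4, 'testing comment 1', 8, 5),
  (5, 'testing comment 1', 6, 3),  (6, 'testing comment 1', 12, 2),
  (7, 'testing comment 1', 20, 3), (8, 'testing comment 1', 16, 5),
  (9, 'testing comment 1', 17, 4), (10, 'testing comment 1', 12, 2);
""")

# The subquery returns exactly one column (c_movie), so it can feed IN.
rows = conn.execute("""
    SELECT m.name, c.c_comment
    FROM movie m
    JOIN comments c ON c.c_movie = m.id
    WHERE c.c_movie IN (SELECT c_movie
                        FROM comments
                        GROUP BY c_movie
                        HAVING COUNT(*) > 1)
    ORDER BY m.name, c.c_id
""").fetchall()

for name, comment in rows:
    print(name, "|", comment)
```

With this data only Cruella, Pokemon: The First Movie and Captain America: The First Avenger have more than one comment, so each of them appears twice.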

Related

Generate 'average' column from sub query and ROW_NUMBER window function in SQL SELECT

I have the following SQL Server tables (with sample data):
Questionnaire
id | coachNodeId | youngPersonNodeId | complete
1 | 12 | 678 | 1
2 | 12 | 52 | 1
3 | 30 | 99 | 1
4 | 12 | 678 | 1
5 | 12 | 678 | 1
6 | 30 | 99 | 1
7 | 12 | 52 | 1
8 | 30 | 102 | 1
Answer
id | questionnaireId | score
1 | 1 | 1
2 | 2 | 3
3 | 2 | 2
4 | 2 | 5
5 | 3 | 5
6 | 4 | 5
7 | 4 | 3
8 | 5 | 4
9 | 6 | 1
10 | 6 | 3
11 | 7 | 5
12 | 8 | 5
ContentNode
id | text
12 | Zak
30 | Phil
52 | Jane
99 | Ali
102 | Ed
678 | Chris
I have the following T-SQL query:
SELECT
Questionnaire.id AS questionnaireId,
coachNodeId AS coachNodeId,
coachNode.[text] AS coachName,
youngPersonNodeId AS youngPersonNodeId,
youngPersonNode.[text] AS youngPersonName,
ROW_NUMBER() OVER (PARTITION BY Questionnaire.coachNodeId, Questionnaire.youngPersonNodeId ORDER BY Questionnaire.id) AS questionnaireNumber,
score = (SELECT AVG(score) FROM Answer WHERE Answer.questionnaireId = Questionnaire.id)
FROM
Questionnaire
LEFT JOIN
ContentNode AS coachNode ON Questionnaire.coachNodeId = coachNode.id
LEFT JOIN
ContentNode AS youngPersonNode ON Questionnaire.youngPersonNodeId = youngPersonNode.id
WHERE
(complete = 1)
ORDER BY
coachNodeId, youngPersonNodeId
This query outputs the following example data:
questionnaireId | coachNodeId | coachName | youngPersonNodeId | youngPersonName | questionnaireNumber | score
1 | 12 | Zak | 678 | Chris | 1 | 1
2 | 12 | Zak | 52 | Jane | 1 | 3
3 | 30 | Phil | 99 | Ali | 1 | 5
4 | 12 | Zak | 678 | Chris | 2 | 4
5 | 12 | Zak | 678 | Chris | 3 | 4
6 | 30 | Phil | 99 | Ali | 2 | 2
7 | 12 | Zak | 52 | Jane | 2 | 5
8 | 30 | Phil | 102 | Ed | 1 | 5
To explain what's happening here… There are various coaches whose job is to undertake questionnaires with various young people, and log the scores. A coach might, at a later date, repeat the questionnaire with the same young person several times, hoping that they get a better score. The ultimate goal of what I'm trying to achieve is that the managers of the coaches want to see how well the coaches are performing, so they'd like to see whether the scores for the questionnaires tend to go up or not. The window function represents a way to establish how many times the questionnaire has been undertaken by the same coach/young person combo.
I need to be able to determine the average score based on the questionnaire number. So for example, the coach 'Zak' logged scores of '1' and '3' for his first questionnaires (where questionnaireNumber = 1) so the average would be 2. For his second questionnaires (where questionnaireNumber = 2) the scores were '3' and '5' so the average would be 4. So in analysing this data we know that over time Zak's questionnaire scores have improved from an average of '2' the first time to an average of '4' the second time.
I feel like the query needs to be grouped by the coachNodeId and questionnaireNumber values so it would output something like this (I've omitted the questionnaireId, youngPersonNodeId, youngPersonName and score columns; they're only used to derive the averageScore and wouldn't be useful the way the results are grouped):
coachNodeId | coachName | questionnaireNumber | averageScore
12 | Zak | 1 | 2 (calculation: (1 + 3) / 2)
12 | Zak | 2 | 4 (calculation: (3 + 5) / 2)
12 | Zak | 3 | 4 (only one value: 4)
30 | Phil | 1 | 5 (calculation: (5 + 5) / 2)
30 | Phil | 2 | 2 (only one value: 2)
Could anyone suggest how I can modify my query to output the average scores based on the score from the sub-query and the ROW_NUMBER window function? I've hit the limits of my SQL skills!
Many thanks.
It is a bit hard to tell without sample data, but I think you are describing aggregation:
SELECT q.coachNodeId AS coachNodeId,
cn.[text] AS coachName,
q.youngPersonNodeId AS youngPersonNodeId,
ypn.[text] AS youngPersonName,
AVG(a.score) AS averageScore
FROM Questionnaire q JOIN
ContentNode cn
ON q.coachNodeId = cn.id JOIN
ContentNode ypn
ON q.youngPersonNodeId = ypn.id LEFT JOIN
Answer a
ON a.questionnaireId = q.id
WHERE q.complete = 1
GROUP BY q.coachNodeId, cn.[text],
q.youngPersonNodeId, ypn.[text]
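
The query above groups per coach and young person. To get the average per coach and questionnaire number that the question asks for, one possible approach is to compute the ROW_NUMBER in a CTE first and then aggregate over it. The sketch below runs that idea against the question's sample data with Python's sqlite3 module. This is SQLite rather than SQL Server, and the CTE name `scored` is just an illustrative choice. SQLite's AVG always returns a floating-point value, whereas SQL Server's AVG over an int column returns an int, so the figures can differ from the whole numbers in the question.

```python
import sqlite3

# Assumption: the bundled SQLite is 3.25 or newer (needed for ROW_NUMBER).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Questionnaire (id INTEGER PRIMARY KEY, coachNodeId INTEGER,
                            youngPersonNodeId INTEGER, complete INTEGER);
CREATE TABLE Answer (id INTEGER PRIMARY KEY, questionnaireId INTEGER,
                     score INTEGER);
CREATE TABLE ContentNode (id INTEGER PRIMARY KEY, text TEXT);
INSERT INTO Questionnaire VALUES
  (1, 12, 678, 1), (2, 12, 52, 1), (3, 30, 99, 1), (4, 12, 678, 1),
  (5, 12, 678, 1), (6, 30, 99, 1), (7, 12, 52, 1), (8, 30, 102, 1);
INSERT INTO Answer VALUES
  (1, 1, 1), (2, 2, 3), (3, 2, 2), (4, 2, 5), (5, 3, 5), (6, 4, 5),
  (7, 4, 3), (8, 5, 4), (9, 6, 1), (10, 6, 3), (11, 7, 5), (12, 8, 5);
INSERT INTO ContentNode VALUES
  (12, 'Zak'), (30, 'Phil'), (52, 'Jane'),
  (99, 'Ali'), (102, 'Ed'), (678, 'Chris');
""")

rows = conn.execute("""
    WITH scored AS (
        -- One row per questionnaire: its sequence number within the
        -- coach/young person pair and the average of its answers.
        SELECT q.coachNodeId,
               ROW_NUMBER() OVER (
                   PARTITION BY q.coachNodeId, q.youngPersonNodeId
                   ORDER BY q.id) AS questionnaireNumber,
               (SELECT AVG(a.score)
                FROM Answer a
                WHERE a.questionnaireId = q.id) AS score
        FROM Questionnaire q
        WHERE q.complete = 1
    )
    SELECT s.coachNodeId,
           cn.text AS coachName,
           s.questionnaireNumber,
           AVG(s.score) AS averageScore
    FROM scored s
    JOIN ContentNode cn ON cn.id = s.coachNodeId
    GROUP BY s.coachNodeId, cn.text, s.questionnaireNumber
    ORDER BY s.coachNodeId, s.questionnaireNumber
""").fetchall()

for row in rows:
    print(row)
```

The same shape should carry over to T-SQL with `[text]` in place of `text`, but I have only run it in SQLite.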

Can't figure out a simple SQL query

Might be very simple, but I've been digging for a few days now... I just can't figure out how to make this SQL query in Access...
In reference to the tables below, I'm looking for the query that can extract all the ITEMS for a specific Shop (i.e. 1:Alpha) from a specific GROUP (i.e. 1:Tools) that are NOT in the report for 2014... in this case ITEMS.IDs 6, 8, 9 and 10!
Tables:
Years
ID | Year
-----------------------------------------------
1 | 2014
2 | 2015
Shops
ID | ShopName
-----------------------------------------------
1 | Alpha
2 | Bravo
Items
ID | StockNbr | Description | GroupID
-----------------------------------------------
1 | 00-1200 | Ratchet 1/4 | 1
2 | 00-1201 | Ratchet 1/2 | 1
3 | 00-1300 | Screwdriver Philips No1 | 1
4 | 01-5544 | Banana | 2
5 | 00-4457 | Apple | 2
6 | 21-8887 | Hammer | 1
7 | 21-6585 | Drill | 1
8 | 21-4499 | Multimeter | 1
9 | 21-5687 | Digital Caliper | 1
10 | 22-7319 | File Set | 1
...
Groups
ID | GroupName
-----------------------------------------------
1 | Tools
2 | Fruits
REPORTS
ID | YearID | ShopID | ItemID
-----------------------------------------------
1 | 1 | 1 | 1
2 | 1 | 1 | 2
3 | 1 | 1 | 3
4 | 1 | 1 | 4
5 | 1 | 1 | 7
6 | 1 | 2 | 5
7 | 1 | 2 | 8
8 | 1 | 2 | 10
I've tried this, but then I realized it doesn't take the shops into consideration: it lists all items that are not listed in Reports at all, so if Reports has an item for shop 2, it won't list it either...
SELECT Items.ID, Items.StockNbr, Items.Description, Items.GroupID, Reports.YearID, Reports.ShopID
FROM Reports
RIGHT JOIN Items ON Reports.ItemID = Items.ID
WHERE (((Items.GroupID)=1) AND ((Reports.UnitID) Is Null))
ORDER BY Items.StockNbr;
Thank you!
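I think you're looking for an anti-join. There are several ways to do this. Here's one using NOT IN. Note that Reports.YearID holds the ID from the Years table rather than the year itself, so 2014 is looked up through Years.
select i.* from items i
where i.GroupId = 1
and i.ID NOT IN (
select r.ItemID from reports r
where r.ShopID = 1
and r.YearID = (select y.ID from Years y where y.[Year] = 2014)
)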
If Reports has no row referencing a given Items.ID, then there is no ShopID or YearID associated with that item. This query lists those items:
select *
from items
left join reports on items.id = reports.itemid
where reports.itemid IS NULL
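
To make the shop and year part concrete, here is a small sketch with Python's sqlite3 module and the sample tables from the question. It writes the anti-join as a LEFT JOIN and puts the shop and year conditions in the ON clause, so an item reported only by another shop still counts as missing. This is SQLite, not Access, and Access is pickier about what it accepts in a join's ON clause, so take it as an illustration of the logic rather than something to paste into Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Years (ID INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Items (ID INTEGER PRIMARY KEY, StockNbr TEXT,
                    Description TEXT, GroupID INTEGER);
CREATE TABLE Reports (ID INTEGER PRIMARY KEY, YearID INTEGER,
                      ShopID INTEGER, ItemID INTEGER);
INSERT INTO Years VALUES (1, 2014), (2, 2015);
INSERT INTO Items VALUES
  (1, '00-1200', 'Ratchet 1/4', 1), (2, '00-1201', 'Ratchet 1/2', 1),
  (3, '00-1300', 'Screwdriver Philips No1', 1), (4, '01-5544', 'Banana', 2),
  (5, '00-4457', 'Apple', 2), (6, '21-8887', 'Hammer', 1),
  (7, '21-6585', 'Drill', 1), (8, '21-4499', 'Multimeter', 1),
  (9, '21-5687', 'Digital Caliper', 1), (10, '22-7319', 'File Set', 1);
INSERT INTO Reports VALUES
  (1, 1, 1, 1), (2, 1, 1, 2), (3, 1, 1, 3), (4, 1, 1, 4),
  (5, 1, 1, 7), (6, 1, 2, 5), (7, 1, 2, 8), (8, 1, 2, 10);
""")

# Only report rows for shop 1 in 2014 are allowed to match an item;
# items left without a match (r.ID IS NULL) are the ones we want.
missing = conn.execute("""
    SELECT i.ID, i.Description
    FROM Items i
    LEFT JOIN Reports r
           ON r.ItemID = i.ID
          AND r.ShopID = 1
          AND r.YearID = (SELECT ID FROM Years WHERE Year = 2014)
    WHERE i.GroupID = 1
      AND r.ID IS NULL
    ORDER BY i.ID
""").fetchall()

for item_id, description in missing:
    print(item_id, description)
```

On the sample data this lists items 6, 8, 9 and 10, matching what the question expects. Items 8 and 10 are included even though shop 2 reported them.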

Strange window function behaviour

I have the following set of data:
player | score | day
--------+-------+------------
John | 3 | 02-01-2014
John | 5 | 02-02-2014
John | 7 | 02-03-2014
John | 9 | 02-04-2014
John | 11 | 02-05-2014
John | 13 | 02-06-2014
Mark | 2 | 02-01-2014
Mark | 4 | 02-02-2014
Mark | 6 | 02-03-2014
Mark | 8 | 02-04-2014
Mark | 10 | 02-05-2014
Mark | 12 | 02-06-2014
Given two time ranges:
02-01-2014..02-03-2014
02-04-2014..02-06-2014
I need to get the average score for each player within each time range. The ultimate result I'm trying to achieve is this:
player | period_1_score | period_2_score
--------+----------------+----------------
John | 5 | 11
Mark | 4 | 10
The original algorithm I came up with was:
perform SELECT with two values, derived by partitioning the set of scores into two for each time period
over the first SELECT, perform another one, grouping the set by player name.
I'm stuck on step 1: running the following query:
SELECT
player,
AVG(score) OVER (PARTITION BY day BETWEEN '02-01-2014' AND '02-03-2014') AS period_1,
AVG(score) OVER (PARTITION BY day BETWEEN '02-04-2014' AND '02-06-2014') AS period_2;
This gets me an incorrect result (note how the period 1 and period 2 average scores are the same):
player | period_1_score | period_2_score
--------+----------------+----------------
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
I think I don't fully understand how window functions work... I have 2 questions:
What is wrong with my query?
How do I do it right?
You don't need a window function for this.
Try:
select
player
,avg(case when day BETWEEN '02-01-2014' AND '02-03-2014' then score else null end) as period_1_score
,avg(case when day BETWEEN '02-04-2014' AND '02-06-2014' then score else null end) as period_2_score
from <your data>
group by player
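
As for what went wrong in the original query: PARTITION BY on a boolean expression only splits the rows into a "true" group and a "false" group. It does not filter anything, and it does not separate players. Conditional aggregation works because AVG ignores NULLs, so each CASE only contributes the rows inside its period. Here is a minimal runnable sketch using Python's sqlite3 module. The table name `scores` is made up for the example, and the dates are stored as ISO strings so that BETWEEN compares them correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "scores" is a hypothetical table name; the question does not give one.
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        ("John", 3, "2014-02-01"), ("John", 5, "2014-02-02"),
        ("John", 7, "2014-02-03"), ("John", 9, "2014-02-04"),
        ("John", 11, "2014-02-05"), ("John", 13, "2014-02-06"),
        ("Mark", 2, "2014-02-01"), ("Mark", 4, "2014-02-02"),
        ("Mark", 6, "2014-02-03"), ("Mark", 8, "2014-02-04"),
        ("Mark", 10, "2014-02-05"), ("Mark", 12, "2014-02-06"),
    ],
)

# CASE yields NULL outside the period, and AVG skips NULLs.
rows = conn.execute("""
    SELECT player,
           AVG(CASE WHEN day BETWEEN '2014-02-01' AND '2014-02-03'
                    THEN score END) AS period_1_score,
           AVG(CASE WHEN day BETWEEN '2014-02-04' AND '2014-02-06'
                    THEN score END) AS period_2_score
    FROM scores
    GROUP BY player
    ORDER BY player
""").fetchall()

for row in rows:
    print(row)
```

This gives 5 and 11 for John and 4 and 10 for Mark, which is the result the question asks for.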

Simple Sum & Group

I have a table that has 2 simple fields: RoomNumber & RoomEarned
I would like to group the rooms together that have multiple RoomEarned Values and combine their sum. Basically adding the value together inline.
basically making this table..
RoomNumber | RoomEarned
1 | 13.23
2 | 23.79
3 | 50.75
4 | 32.90
10 | 11.31
11 | 31.83
12 | 13.92
12 | 18.82
13 | 41.87
14 | 87.74
15 | 100.83
into this...
RoomNumber | RoomEarned
1 | 13.23
2 | 23.79
3 | 50.75
4 | 32.90
10 | 11.31
11 | 31.83
12 | 32.74
13 | 41.87
14 | 87.74
15 | 100.83
Obviously it's a grouping function, but my abilities fall terribly short.
any ideas?
select RoomNumber, SUM(RoomEarned) from MyTable group by RoomNumber
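
If you want to check it against the sample data, here is a quick sketch with Python's sqlite3 module. The table name MyTable comes from the query above, and the REAL column type is an assumption. The only room with two rows is 12, and its two values get added together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question only shows the values.
conn.execute("CREATE TABLE MyTable (RoomNumber INTEGER, RoomEarned REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [
        (1, 13.23), (2, 23.79), (3, 50.75), (4, 32.90), (10, 11.31),
        (11, 31.83), (12, 13.92), (12, 18.82), (13, 41.87),
        (14, 87.74), (15, 100.83),
    ],
)

# One output row per RoomNumber, with its RoomEarned values summed.
totals = conn.execute("""
    SELECT RoomNumber, SUM(RoomEarned) AS RoomEarned
    FROM MyTable
    GROUP BY RoomNumber
    ORDER BY RoomNumber
""").fetchall()

for room, earned in totals:
    print(room, round(earned, 2))
```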

Simple psql count query

I am very new to postgresql and would like to generate some summary data from our table
We have a simple message board - table name messages which has an element ctg_uid. Each ctg_uid corresponds to a category name in the table categories.
Here are the categories select * from categories ORDER by ctg_uid ASC;
ctg_uid | ctg_category | ctg_creator_uid
---------+--------------------+-----------------
1 | general | 1
2 | faults | 1
3 | computing | 1
4 | teaching | 2
5 | QIS-FEEDBACK | 3
6 | QIS-PHYS-FEEDBACK | 3
7 | SOP-?-CHANGE | 3
8 | agenda items | 7
10 | Acq & Process | 2
12 | physics-jobs | 3
13 | Tech meeting items | 12
16 | incident-forms | 3
17 | ERRORS | 3
19 | Files | 10
21 | QIS-CAR | 3
22 | doses | 4
24 | admin | 3
25 | audit | 3
26 | For Sale | 4
31 | URGENT-REPORTS | 4
34 | dt-jobs | 3
35 | JOBS | 3
36 | IN-PATIENTS | 4
37 | Ordering | 4
38 | dep-meetings | 4
39 | reporting | 4
What I would like to do is count the frequency of each category across all messages on our board.
I can do it on a category by category basis
SELECT count(msg_ctg_uid) FROM messages where msg_ctg_uid='13';
However is it possible to do this in a one liner?
The following gives the category and ctg_uid for each message
SELECT ctg_category, msg_ctg_uid FROM messages INNER JOIN categories ON (ctg_uid = msg_ctg_uid);
but SELECT ctg_category, count(msg_ctg_uid) FROM messages INNER JOIN categories ON (ctg_uid = msg_ctg_uid);
gives me the error ERROR: column "categories.ctg_category" must appear in the GROUP BY clause or be used in an aggregate function
How do I aggregate the frequency of each category ?
You're missing the GROUP BY clause:
SELECT ctg_category, count(msg_ctg_uid)
FROM messages INNER JOIN categories ON (ctg_uid = msg_ctg_uid)
GROUP BY ctg_category;
This means you want the count per ctg_category.
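
Here is a small sketch of the grouped count using Python's sqlite3 module. The rows in messages are invented for the example (the question does not show any), and msg_uid is a made-up key column. Only three of the categories are loaded. The second query is an optional variant: starting from categories with a LEFT JOIN also reports categories that have no messages, with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (ctg_uid INTEGER PRIMARY KEY, ctg_category TEXT,
                         ctg_creator_uid INTEGER);
-- msg_uid and the message rows below are hypothetical sample data.
CREATE TABLE messages (msg_uid INTEGER PRIMARY KEY, msg_ctg_uid INTEGER);
INSERT INTO categories VALUES
  (1, 'general', 1), (2, 'faults', 1), (13, 'Tech meeting items', 12);
INSERT INTO messages VALUES (1, 13), (2, 13), (3, 1), (4, 13);
""")

# Count per category, only for categories that have messages.
counts = conn.execute("""
    SELECT ctg_category, COUNT(msg_ctg_uid)
    FROM messages
    INNER JOIN categories ON ctg_uid = msg_ctg_uid
    GROUP BY ctg_uid, ctg_category
    ORDER BY ctg_uid
""").fetchall()

# Variant: keep every category; COUNT skips the NULLs from unmatched rows.
counts_with_zero = conn.execute("""
    SELECT ctg_category, COUNT(msg_ctg_uid)
    FROM categories
    LEFT JOIN messages ON msg_ctg_uid = ctg_uid
    GROUP BY ctg_uid, ctg_category
    ORDER BY ctg_uid
""").fetchall()

print(counts)
print(counts_with_zero)
```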