Strange window function behaviour - sql

I have the following set of data:
player | score | day
--------+-------+------------
John | 3 | 02-01-2014
John | 5 | 02-02-2014
John | 7 | 02-03-2014
John | 9 | 02-04-2014
John | 11 | 02-05-2014
John | 13 | 02-06-2014
Mark | 2 | 02-01-2014
Mark | 4 | 02-02-2014
Mark | 6 | 02-03-2014
Mark | 8 | 02-04-2014
Mark | 10 | 02-05-2014
Mark | 12 | 02-06-2014
Given two time ranges:
02-01-2014..02-03-2014
02-04-2014..02-06-2014
I need to get the average score for each player within each time range. The result I'm ultimately trying to achieve is this:
player | period_1_score | period_2_score
--------+----------------+----------------
John | 5 | 11
Mark | 4 | 10
The original algorithm I came up with was:
perform SELECT with two values, derived by partitioning the set of scores into two for each time period
over the first SELECT, perform another one, grouping the set by player name.
I'm stuck on step 1: running the following query:
SELECT
player,
AVG(score) OVER (PARTITION BY day BETWEEN '02-01-2014' AND '02-03-2014') AS period_1,
AVG(score) OVER (PARTITION BY day BETWEEN '02-04-2014' AND '02-06-2014') AS period_2;
This gives me an incorrect result (note how the period_1 and period_2 average scores are the same):
player | period_1_score | period_2_score
--------+----------------+----------------
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
John | 5 | 5
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
Mark | 4 | 4
I think I don't fully understand how window functions work... I have 2 questions:
What is wrong with my query?
How do I do it right?

You don't need a window function for this; conditional aggregation with a plain GROUP BY is enough.
Try:
select
player
,avg(case when day BETWEEN '02-01-2014' AND '02-03-2014' then score else null end) as period_1_score
,avg(case when day BETWEEN '02-04-2014' AND '02-06-2014' then score else null end) as period_2_score
from <your data>
group by player
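As a quick way to try the conditional-aggregation idea above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `scores` and the ISO-formatted dates are assumptions made for this illustration (the question does not name the table, and ISO strings compare correctly as text in SQLite); the data is the sample from the question.

```python
import sqlite3

# In-memory database; "scores" is a hypothetical table name for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, score INTEGER, day TEXT)")

# Sample data from the question, with days rewritten as ISO dates (assumption).
rows = [
    ("John", 3, "2014-02-01"), ("John", 5, "2014-02-02"), ("John", 7, "2014-02-03"),
    ("John", 9, "2014-02-04"), ("John", 11, "2014-02-05"), ("John", 13, "2014-02-06"),
    ("Mark", 2, "2014-02-01"), ("Mark", 4, "2014-02-02"), ("Mark", 6, "2014-02-03"),
    ("Mark", 8, "2014-02-04"), ("Mark", 10, "2014-02-05"), ("Mark", 12, "2014-02-06"),
]
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)

# CASE yields NULL outside the period, and AVG ignores NULLs,
# so each column averages only the rows of its own period.
query = """
SELECT player,
       AVG(CASE WHEN day BETWEEN '2014-02-01' AND '2014-02-03' THEN score END) AS period_1_score,
       AVG(CASE WHEN day BETWEEN '2014-02-04' AND '2014-02-06' THEN score END) AS period_2_score
FROM scores
GROUP BY player
ORDER BY player
"""
result = conn.execute(query).fetchall()
print(result)  # [('John', 5.0, 11.0), ('Mark', 4.0, 10.0)]
```

The key point is that the period test lives inside the aggregate rather than in a PARTITION BY clause, so one row per player comes back with one average per period.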


How do you use two aggregate functions for separate tables in a join?

Sorry if this is a noob question!
I have two tables - a movie and a comment table.
I am trying to return output of the movie name and each comment for that movie as long as that movie has more than 1 comment associated to it.
Here are my tables
test_movies=# SELECT * FROM movie;
id | name | rating | release_date | original_copy_location
----+------------------------------------+--------+--------------+------------------------
1 | Cruella | 9 | 2021-05-28 | 4
7 | Shutter Island | 9 | 2010-02-19 | 4
9 | Grown Ups | 7 | 2010-06-25 | 4
11 | Guardians of the Galaxy: Volume 1 | 8 | 2014-09-01 | 4
14 | The RIng | 8 | 2002-10-18 | 4
17 | Digimon: The Movie | 6 | 2000-01-10 | 4
19 | Star Wars Episode 1 | 5 | 1999-06-21 | 4
20 | Ghosts Of Mars | 5 | 1998-09-15 | 4
5 | Interstellar | 8 | 2014-11-07 | 1
10 | Mean Girls | 8 | 2004-04-30 | 1
12 | Captain America: The First Avenger | 7 | 2011-07-22 | 1
15 | Get Out | 6 | 2017-02-24 | 1
6 | The Dark Knight | 10 | 2008-07-18 | 2
16 | Pokemon: The First Movie | 5 | 1998-11-10 | 2
18 | The Last Dance | 8 | 2020-05-01 | 2
8 | Just Go With It | 8 | 2011-02-11 | 3
13 | The Blair Witch Project | 8 | 1999-08-29 | 3
(17 rows)
test_movies=# SELECT * FROM comments;
c_id | c_comment | c_movie | c_user
------+--------------------------------------+---------+--------
1 | testing comment 1 | 16 | 4
2 | testing comment 1 | 1 | 1
3 | testing comment 1 | 1 | 2
4 | testing comment 1 | 8 | 5
5 | testing comment 1 | 6 | 3
6 | testing comment 1 | 12 | 2
7 | testing comment 1 | 20 | 3
8 | testing comment 1 | 16 | 5
9 | testing comment 1 | 17 | 4
10 | testing comment 1 | 12 | 2
(10 rows)
The output I'm trying to get is this:
name | c_comment
------------------------+-------------------------------------
Cruella | testing comment 1
Cruella | testing comment 1
Pokemon:The First Movie | testing comment 1
Pokemon:The First Movie | testing comment 1
Captain America | testing comment 1
Captain America | testing comment 1
The problem with my queries is that I can't figure out how to return both the movie name and comment associated with it using aggregate functions.
If I use the count in the first select statement it returns all rows:
SELECT m.name, c.c_comment FROM movie m, comments c WHERE m.id = c.c_movie GROUP BY m.name, c.c_comment HAVING COUNT(m.name) >= 1;
If I try the below subquery I get the error - ERROR: subquery must return only one column
SELECT m.name, c.c_comment FROM movie m, comments c WHERE m.id = c.c_movie AND(SELECT m.name, COUNT(c.c_movie) FROM movie m, comments c WHERE m.id =c.c_movie GROUP BY name HAVING COUNT(c.c_movie) > 1);
Still a bit new to SQL as I'm a student and having a tough time figuring this query out lol.
Thanks in advance!
Something like this could work
select m.name, c.c_comment
from movie m
join comments c
on c.c_movie = m.id
where exists (select 1 from comments cc where cc.c_movie=m.id group by c_movie having count(*)>1)
It's standard sql, but you cannot work with mysql and postgresql at the same time... 🤔
Use window functions!
select m.name, c.c_comment
from movie m join
(select c.*, count(*) over (partition by c_movie) as cnt
from comments c
) c
on c.c_movie = m.id
where cnt > 1;
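Both answers above can be checked side by side with a small sketch in Python's built-in sqlite3 module. This is only an illustration under some assumptions: the `movie` table is reduced to the `id` and `name` columns and to the movies that have comments, and the window-function variant needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comments (c_id INTEGER PRIMARY KEY, c_comment TEXT,
                       c_movie INTEGER, c_user INTEGER);
""")

# Subset of the question's movie rows (only those referenced by comments).
movies = [
    (1, "Cruella"), (6, "The Dark Knight"), (8, "Just Go With It"),
    (12, "Captain America: The First Avenger"),
    (16, "Pokemon: The First Movie"), (17, "Digimon: The Movie"),
    (20, "Ghosts Of Mars"),
]
# (c_id, c_movie, c_user) triples from the question's comments table.
comment_keys = [(1, 16, 4), (2, 1, 1), (3, 1, 2), (4, 8, 5), (5, 6, 3),
                (6, 12, 2), (7, 20, 3), (8, 16, 5), (9, 17, 4), (10, 12, 2)]
conn.executemany("INSERT INTO movie VALUES (?, ?)", movies)
conn.executemany(
    "INSERT INTO comments VALUES (?, 'testing comment 1', ?, ?)", comment_keys)

# Variant 1: keep a movie only if a grouped subquery finds more than one comment.
exists_query = """
SELECT m.name, c.c_comment
FROM movie m
JOIN comments c ON c.c_movie = m.id
WHERE EXISTS (SELECT 1 FROM comments cc
              WHERE cc.c_movie = m.id
              GROUP BY cc.c_movie
              HAVING COUNT(*) > 1)
ORDER BY m.name
"""

# Variant 2: attach the per-movie comment count to every comment row, then filter.
window_query = """
SELECT m.name, c.c_comment
FROM movie m
JOIN (SELECT c_movie, c_comment,
             COUNT(*) OVER (PARTITION BY c_movie) AS cnt
      FROM comments) c
  ON c.c_movie = m.id
WHERE c.cnt > 1
ORDER BY m.name
"""

exists_rows = conn.execute(exists_query).fetchall()
window_rows = conn.execute(window_query).fetchall()
names = [name for name, _ in exists_rows]
print(names)
```

Either way, the aggregate decides which movies qualify while the outer query still returns one row per comment, which is what the GROUP BY attempt in the question could not do.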

Grouping the rows on the basis of specific condition in SQL Server

I want to group the rows on the basis of a specific condition.
The table structure is something like this
EmpID | EmpName | TaskId | A_Shift_Status | B_Shift_Status | C_Shift_Status | D_Shift_Status
1 | John | 1 | 1 | null | 2 | 1
1 | John | 2 | 1 | null | 1 | 1
2 | Mike | 3 | 1 | 1 | 2 | 1
2 | Mike | 4 | null | 1 | null | 1
3 | Steve | 5 | null | 1 | 2 | 1
3 | Steve | 6 | 1 | null | 2 | 1
The criteria will be
Done 1
Pending 2
NA 3
The expected output is to group the employees by task and the status will be on the following condition
if ALL tasks are done by any employee then the status will be done
(i.e. 1)
if ANY of the tasks is incomplete then the status will be
incomplete/pending (i.e. 2)
So the desired output will be
EmpID | EmpName | A_Shift_Status | B_Shift_Status | C_Shift_Status | D_Shift_Status
1 | John | 1 | null | 2 | 1
2 | Mike | 1 | 1 | 2 | 1
3 | Steve | 1 | 1 | 2 | 1
So in other terms summary/grouping should only show complete/done (i.e. 1) when all the rows of a particular shift column of an employee have status as complete/done (i.e. 1)
Based on your data (where the criteria are 1, 2 and NULL for n/a), a simple 'group by' the employee, and MAX of the columns, should work e.g.,
SELECT
yt.EmpID,
yt.EmpName,
MAX(yt.A_Shift_Status) AS A_Shift_Status,
MAX(yt.B_Shift_Status) AS B_Shift_Status,
MAX(yt.C_Shift_Status) AS C_Shift_Status,
MAX(yt.D_Shift_Status) AS D_Shift_Status
FROM
yourtable yt
GROUP BY
yt.EmpID,
yt.EmpName;
For the shift statuses
If any of them are 2, it returns 2
otherwise if any of them are 1, it returns 1
otherwise it returns NULL
Notes re 1/2/3 (which was specified as criteria) vs 1/2/NULL (which is in the data)
It gets a little trickier if the inputs are supposed to use 1/2/3 instead of 1/2/NULL. Let us know if you are changing the inputs to reflect that.
If the input is fine as NULLs, but you need the output to have '3' for n/a (nulls), you can put an ISNULL or COALESCE around the MAX statements e.g., ISNULL(MAX(yt.A_Shift_Status), 3) AS A_Shift_Status
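To see the MAX-per-column behaviour on the sample rows, here is a minimal sketch using Python's built-in sqlite3 module. The table name `yourtable` follows the placeholder used in the answer; SQLite stands in for SQL Server here, which is an assumption that holds for this query because MAX ignores NULLs in both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE yourtable (
    EmpID INTEGER, EmpName TEXT, TaskId INTEGER,
    A_Shift_Status INTEGER, B_Shift_Status INTEGER,
    C_Shift_Status INTEGER, D_Shift_Status INTEGER)
""")

# Sample rows from the question; None is stored as NULL.
rows = [
    (1, "John", 1, 1, None, 2, 1),
    (1, "John", 2, 1, None, 1, 1),
    (2, "Mike", 3, 1, 1, 2, 1),
    (2, "Mike", 4, None, 1, None, 1),
    (3, "Steve", 5, None, 1, 2, 1),
    (3, "Steve", 6, 1, None, 2, 1),
]
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# MAX skips NULLs: any 2 (pending) wins over 1 (done),
# and a column that is NULL on every row stays NULL.
query = """
SELECT yt.EmpID, yt.EmpName,
       MAX(yt.A_Shift_Status) AS A_Shift_Status,
       MAX(yt.B_Shift_Status) AS B_Shift_Status,
       MAX(yt.C_Shift_Status) AS C_Shift_Status,
       MAX(yt.D_Shift_Status) AS D_Shift_Status
FROM yourtable yt
GROUP BY yt.EmpID, yt.EmpName
ORDER BY yt.EmpID
"""
summary = conn.execute(query).fetchall()
print(summary)
```

The result matches the desired output in the question: John keeps a NULL for the B shift because neither of his rows has a value there.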

Generate 'average' column from sub query and ROW_NUMBER window function in SQL SELECT

I have the following SQL Server tables (with sample data):
Questionnaire
id | coachNodeId | youngPersonNodeId | complete
1 | 12 | 678 | 1
2 | 12 | 52 | 1
3 | 30 | 99 | 1
4 | 12 | 678 | 1
5 | 12 | 678 | 1
6 | 30 | 99 | 1
7 | 12 | 52 | 1
8 | 30 | 102 | 1
Answer
id | questionnaireId | score
1 | 1 | 1
2 | 2 | 3
3 | 2 | 2
4 | 2 | 5
5 | 3 | 5
6 | 4 | 5
7 | 4 | 3
8 | 5 | 4
9 | 6 | 1
10 | 6 | 3
11 | 7 | 5
12 | 8 | 5
ContentNode
id | text
12 | Zak
30 | Phil
52 | Jane
99 | Ali
102 | Ed
678 | Chris
I have the following T-SQL query:
SELECT
Questionnaire.id AS questionnaireId,
coachNodeId AS coachNodeId,
coachNode.[text] AS coachName,
youngPersonNodeId AS youngPersonNodeId,
youngPersonNode.[text] AS youngPersonName,
ROW_NUMBER() OVER (PARTITION BY Questionnaire.coachNodeId, Questionnaire.youngPersonNodeId ORDER BY Questionnaire.id) AS questionnaireNumber,
score = (SELECT AVG(score) FROM Answer WHERE Answer.questionnaireId = Questionnaire.id)
FROM
Questionnaire
LEFT JOIN
ContentNode AS coachNode ON Questionnaire.coachNodeId = coachNode.id
LEFT JOIN
ContentNode AS youngPersonNode ON Questionnaire.youngPersonNodeId = youngPersonNode.id
WHERE
(complete = 1)
ORDER BY
coachNodeId, youngPersonNodeId
This query outputs the following example data:
questionnaireId | coachNodeId | coachName | youngPersonNodeId | youngPersonName | questionnaireNumber | score
1 | 12 | Zak | 678 | Chris | 1 | 1
2 | 12 | Zak | 52 | Jane | 1 | 3
3 | 30 | Phil | 99 | Ali | 1 | 5
4 | 12 | Zak | 678 | Chris | 2 | 4
5 | 12 | Zak | 678 | Chris | 3 | 4
6 | 30 | Phil | 99 | Ali | 2 | 2
7 | 12 | Zak | 52 | Jane | 2 | 5
8 | 30 | Phil | 102 | Ed | 1 | 5
To explain what's happening here… There are various coaches whose job is to undertake questionnaires with various young people, and log the scores. A coach might, at a later date, repeat the questionnaire with the same young person several times, hoping that they get a better score. The ultimate goal of what I'm trying to achieve is that the managers of the coaches want to see how well the coaches are performing, so they'd like to see whether the scores for the questionnaires tend to go up or not. The window function represents a way to establish how many times the questionnaire has been undertaken by the same coach/young person combo.
I need to be able to determine the average score based on the questionnaire number. So for example, the coach 'Zak' logged scores of '1' and '3' for his first questionnaires (where questionnaireNumber = 1) so the average would be 2. For his second questionnaires (where questionnaireNumber = 2) the scores were '3' and '5' so the average would be 4. So in analysing this data we know that over time Zak's questionnaire scores have improved from an average of '2' the first time to an average of '4' the second time.
I feel like the query needs to be grouped by the coachNodeId and questionnaireNumber values so it would output something like this (I've omitted the questionnaireId, youngPersonNodeId, youngPersonName and score columns as they aren't crucial for the output; they're only used to derive the averageScore and wouldn't be useful the way the results are grouped):
coachNodeId | coachName | questionnaireNumber | averageScore
12 | Zak | 1 | 2 (calculation: (1 + 3) / 2)
12 | Zak | 2 | 4 (calculation: (3 + 5) / 2)
12 | Zak | 3 | 4 (only one value: 4)
30 | Phil | 1 | 5 (calculation: (5 + 5) / 2)
30 | Phil | 2 | 2 (only one value: 2)
Could anyone suggest how I can modify my query to output the average scores based on the score from the sub-query and the ROW_NUMBER window function? I've hit the limits of my SQL skills!
Many thanks.
It is a bit hard to tell without sample data, but I think you are describing aggregation:
SELECT q.coachNodeId AS coachNodeId,
cn.[text] AS coachName,
q.youngPersonNodeId AS youngPersonNodeId,
ypn.[text] AS youngPersonName,
AVG(score)
FROM Questionnaire q JOIN
ContentNode cn
ON q.coachNodeId = cn.id JOIN
ContentNode ypn
ON q.youngPersonNodeId = ypn.id LEFT JOIN
Answer a
ON a.questionnaireId = q.id
WHERE complete = 1
GROUP BY q.coachNodeId, cn.[text],
q.youngPersonNodeId, ypn.[text]
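The query above averages per coach and young person. If the goal is instead one row per coach and questionnaireNumber, as the question describes, one possible approach is to number the questionnaires in a CTE and aggregate over that. The sketch below is an illustration, not the answerer's method: it runs on SQLite (3.25 or newer) through Python's sqlite3 module rather than SQL Server, and the CTE name `numbered` is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Questionnaire (id INTEGER PRIMARY KEY, coachNodeId INTEGER,
                            youngPersonNodeId INTEGER, complete INTEGER);
CREATE TABLE Answer (id INTEGER PRIMARY KEY, questionnaireId INTEGER, score INTEGER);
CREATE TABLE ContentNode (id INTEGER PRIMARY KEY, [text] TEXT);
""")
conn.executemany("INSERT INTO Questionnaire VALUES (?, ?, ?, ?)", [
    (1, 12, 678, 1), (2, 12, 52, 1), (3, 30, 99, 1), (4, 12, 678, 1),
    (5, 12, 678, 1), (6, 30, 99, 1), (7, 12, 52, 1), (8, 30, 102, 1)])
conn.executemany("INSERT INTO Answer VALUES (?, ?, ?)", [
    (1, 1, 1), (2, 2, 3), (3, 2, 2), (4, 2, 5), (5, 3, 5), (6, 4, 5),
    (7, 4, 3), (8, 5, 4), (9, 6, 1), (10, 6, 3), (11, 7, 5), (12, 8, 5)])
conn.executemany("INSERT INTO ContentNode VALUES (?, ?)", [
    (12, "Zak"), (30, "Phil"), (52, "Jane"), (99, "Ali"), (102, "Ed"), (678, "Chris")])

# Step 1 (CTE): number each questionnaire within its coach/young-person pair
# and compute its own average score.
# Step 2: average those per-questionnaire scores by coach and questionnaireNumber.
query = """
WITH numbered AS (
    SELECT q.coachNodeId,
           ROW_NUMBER() OVER (PARTITION BY q.coachNodeId, q.youngPersonNodeId
                              ORDER BY q.id) AS questionnaireNumber,
           (SELECT AVG(a.score) FROM Answer a
            WHERE a.questionnaireId = q.id) AS score
    FROM Questionnaire q
    WHERE q.complete = 1
)
SELECT n.coachNodeId, cn.[text] AS coachName,
       n.questionnaireNumber, AVG(n.score) AS averageScore
FROM numbered n
LEFT JOIN ContentNode cn ON cn.id = n.coachNodeId
GROUP BY n.coachNodeId, cn.[text], n.questionnaireNumber
ORDER BY n.coachNodeId, n.questionnaireNumber
"""
averages = conn.execute(query).fetchall()
for row in averages:
    print(row)
```

Note that SQLite's AVG returns floating-point values, whereas SQL Server's AVG over an integer column returns a truncated integer, so the figures printed here are not rounded the way the tables in the question are.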

How to correctly use AVG in a query?

In a PostgreSQL database I have a table called answers which looks like this:
| EMPLOYEE | QUESTION_ID | QUESTION_TEXT | OPTION_ID | OPTION_TEXT |
|----------|-------------|------------------------|-----------|--------------|
| Bob | 1 | Do you like soup? | 1 | 1 |
| Alex | 1 | Do you like soup? | 9 | 9 |
| Oliver | 1 | Do you like soup? | 6 | 6 |
| Bob | 2 | Do you like ice cream? | 3 | 3 |
| Alex | 2 | Do you like ice cream? | 9 | 9 |
| Oliver | 2 | Do you like ice cream? | 8 | 8 |
| Bob | 3 | Do you like summer? | 2 | 2 |
| Alex | 3 | Do you like summer? | 9 | 9 |
| Oliver | 3 | Do you like summer? | 8 | 8 |
In this table you can see that I have 3 questions and the users' answers to them. Users answer questions on a scale of one to ten. I'm trying to find the number of users whose average answer to questions 1, 2 and 3 is greater than 5, without a deeply nested subquery. For example, only 2 users have an avg(option_text) greater than 5 across the three questions: Alex and Oliver.
I tried to use this script, but it doesn't work as I expected:
SELECT
SUM(CASE WHEN (AVG(OPTION_ID) FILTER(WHERE QUESTION_ID IN(61, 62))) > 5 THEN 1 ELSE 0 END) AS COUNT
FROM
ANSWERS;
ERROR:
SQL Error [42803]: ERROR: aggregate function calls cannot be nested
You can select all employees that have an average response of greater than 5 for questions 1,2,3 with a group by query
select employee, avg(option_id)
from answers
where question_id in (1,2,3)
group by employee
having avg(option_id) > 5
and count(distinct question_id) = 3
-- the last part is only needed if you only want employees that answered all questions
To count the number of users that have an average that's greater than 5
select count(*) from (
select employee
from answers
where question_id in (1,2,3)
group by employee
having avg(option_id) > 5
and count(distinct question_id) = 3
) as t
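The two-step idea above (filter employees with HAVING, then count the surviving groups) can be tried with a short sketch using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here as an assumption, and only the columns the query needs are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE answers (employee TEXT, question_id INTEGER, option_id INTEGER)")

# Sample data from the question (question text columns left out).
rows = [
    ("Bob", 1, 1), ("Alex", 1, 9), ("Oliver", 1, 6),
    ("Bob", 2, 3), ("Alex", 2, 9), ("Oliver", 2, 8),
    ("Bob", 3, 2), ("Alex", 3, 9), ("Oliver", 3, 8),
]
conn.executemany("INSERT INTO answers VALUES (?, ?, ?)", rows)

# Inner query: one row per employee whose average over questions 1-3 exceeds 5
# and who answered all three. Outer query: count those rows.
query = """
SELECT COUNT(*) FROM (
    SELECT employee
    FROM answers
    WHERE question_id IN (1, 2, 3)
    GROUP BY employee
    HAVING AVG(option_id) > 5
       AND COUNT(DISTINCT question_id) = 3
) AS t
"""
(count,) = conn.execute(query).fetchone()
print(count)  # 2 (Alex and Oliver)
```

The derived table avoids the "aggregate function calls cannot be nested" error because AVG and COUNT run at different query levels.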
The following query should work:
SELECT
DISTINCT COUNT(*) OVER () AS CNT
FROM ANSWERS
WHERE QUESTION_ID NOT IN(61, 62)
GROUP BY EMPLOYEE
HAVING AVG(OPTION_ID) > 5

SQL Order by group of specific values

I tried to find a solution, but I may have searched for the wrong terms, which is why I need your help :(
There are 6 different categories, with different values. I want to select all of them, but ordered in 2 different groups: the first would contain all categories between 1 and 3, ordered by another value.
In the same query, I want to display the categories between 4 and 6, ordered by another value.
The best way to explain is to show a before and after of what I would like:
BEFORE
|Category | name |
| 1 | Barney |
| 6 | Ted |
| 6 | Anita |
| 3 | Jessica |
| 2 | Marshall |
| 3 | Lily |
| 4 | Robin |
| 2 | Bryan |
| 5 | Oliver |
AFTER
|Category | name | ----- Alphabetic sort
| 1 | Barney |
| 2 | Bryan |
| 3 | Jessica |
| 3 | Lily |
| 2 | Marshall |
---------------------------Imaginary line which separates the 2 groups: categories 1 2 3 and 4 5 6
| 6 | Anita |
| 5 | Oliver |
| 4 | Robin |
| 6 | Ted |
I hope you understand what I mean!
Thank you for your help!
Try this:
ORDER BY CASE
WHEN category IN (1, 2, 3) THEN 1
WHEN category IN (4, 5, 6) THEN 2
ELSE 3
END,
name
The query uses a CASE expression in order to group together category subsets: subset 1, 2, 3 is assigned a value of 1 and hence has the greatest priority. Subset 4, 5, 6 is assigned a value of 2, whereas the rest of categories get the lowest priority, i.e. the value of 3.
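A small sketch of this ordering, run through Python's built-in sqlite3 module on the sample data from the question. The table name `people` is an assumption for the illustration, since the question does not name the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (category INTEGER, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)", [
    (1, "Barney"), (6, "Ted"), (6, "Anita"), (3, "Jessica"), (2, "Marshall"),
    (3, "Lily"), (4, "Robin"), (2, "Bryan"), (5, "Oliver"),
])

# First sort key: which group the category falls into (1-3 before 4-6).
# Second sort key: the name, alphabetically within each group.
query = """
SELECT category, name
FROM people
ORDER BY CASE
           WHEN category IN (1, 2, 3) THEN 1
           WHEN category IN (4, 5, 6) THEN 2
           ELSE 3
         END,
         name
"""
ordered = conn.execute(query).fetchall()
for category, name in ordered:
    print(category, name)
```

The rows come back as Barney, Bryan, Jessica, Lily, Marshall, then Anita, Oliver, Robin, Ted, which is the "AFTER" listing from the question.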
If you are using MySQL or PostgreSQL you can easily get this by using ORDER BY category>3, name, assuming there are only these two possible groups.
In your SELECT statement try:
... ORDER BY IF(Category<4,0,1) ASC, name ASC