sql HAVING & averages [closed] - sql

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 8 years ago.
Improve this question
Learning the having clause, and averages (did AVG before), and I have to admit, I'm confused.
I'll post a problem here, one of the simpler ones on this weeks' assignment, and ask someone to explain it to me in 'real world' terminology (not the textbook language, which is sparse at best).I kinda see what I need to do, but averaging multiple values within same column has me somewhat confused, as does using 'Having' over 'Where'.
SQL> desc inventory
Name
----------------------------
BOOK_CODE
BRANCH_NUM
ON_HAND
SQL> SELECT * FROM INVENTORY;
BOOK BRANCH_NUM ON_HAND
---- ---------- ----------
0180 1 2
0189 2 2
0200 1 1
0200 2 3
0378 3 2
079X 2 1
079X 3 2
079X 4 3
0808 2 1
1351 2 4
1351 3 2
1382 2 1
138X 2 3
2226 1 3
2226 3 2
2226 4 1
2281 4 3
2766 3 2
2908 1 3
2908 4 1
3350 1 2
3743 2 1
3906 2 1
3906 3 2
5163 1 1
5790 4 2
6128 2 4
6128 3 3
6328 2 2
669X 1 1
6908 2 2
7405 3 2
7443 4 1
7559 2 2
8092 3 1
8720 1 3
9611 1 2
9627 3 5
9627 4 2
9701 1 2
9701 2 1
9701 3 3
9701 4 2
9882 3 3
9883 2 3
9883 4 2
9931 1 2

Maybe this helps:
You know how SELECT determines which COLUMNS to return? And you use WHERE in order to choose which ROWS will be returned? Obvious so far, right. But when you aggregate values from multiple rows together into one returned value, then you have to use HAVING if you want to perform some filtering on those.
For example:
SELECT author, title
FROM books
WHERE author = 'Twain, Mark'
would give you the list of all the books Mark Twain wrote.
And
SELECT author, count(*)
FROM books
GROUP BY author
would give you the count of books written by each author.
Finally, if you wanted only those authors who had written 3 or more books:
SELECT author, count(*)
FROM books
GROUP BY author
HAVING count(*) >= 3
I intentionally simplified the data here, but you can probably see how it applies. Another example could be to show all the authors whose books average less than $10:
SELECT author, average(price)
FROM books
GROUP BY author
HAVING average(price) < 10
If you tried to use WHERE here, all you could do is filter out rows in which the price of a single book was less:
SELECT author, average(price)
FROM books
GROUP BY author
WHERE price < 10
That would be the wrong answer of course, mostly because it would fail. But even if it returned a result, it would be the average price of all books less than $10 -- not the same at all.

Related

Cant merge two queries with different columns

I'm studying SQL and somehow I'm stuck with a question. I have 2 tables ('users' and 'follows').
Follows Table
user_id
follows
date
1
2
1993-09-01
2
1
1989-01-01
3
1
1993-07-01
2
3
1994-10-10
3
2
1995-03-01
4
2
1988-08-08
4
1
1988-08-08
1
4
1994-04-02
1
5
2000-01-01
5
1
2000-01-02
5
6
1986-01-10
7
1
1990-02-02
1
7
1996-10-01
1
8
1993-09-03
8
1
1995-09-01
8
9
1995-09-01
9
8
1996-01-10
7
8
1993-09-01
3
9
1996-05-30
4
9
1996-05-30
Users Table
user_id
first_name
last_name
school
1
Harry
Potter
Gryffindor
2
Ron
Wesley
Gryffindor
3
Hermonie
Granger
Gryffindor
4
Ginny
Weasley
Gryffindor
5
Draco
Malfoy
Slytherin
6
Tom
Riddle
Slytherin
7
Luna
Lovegood
Ravenclaw
8
Cho
Chang
Ravenclaw
9
Cedric
Diggory
Hufflepuff
I need to list all rows from follows where someone from one house follows someone from a different house. I tried to make 2 queries, one to get all houses related to follows.user_id and another one with all houses related to follows.follows and "merge" then:
select a.nome_id, a.user_id_house, b.follows_id, b.follows_house
from ( select follows.user_id as nome_id
, users.house as user_id_house
from follows inner join users
ON users.user_id = follows.user_id
) as a,
( select follows.follows as follows_id
, users.house as follows_house
from follows inner join users
ON follows.user_id = users.user_id
) as b
where a.user_id_house <> b.follows_house;
The problem is that the result is like 400 rows, its not right. I have no idea how I could solve this.
Try this
SELECT follows.user_id, users.school, followers.user_id, followers.school FROM follows
JOIN users ON follows.user_id=users.user_id
JOIN users as followers ON follows.follows=followers.user_id
WHERE users.school <> followers.school
Note: Pay attention to naming in my answer
Thanks for correcting to Thorsten Kettner

CTE Hierarchy Question - 1 table, 2 columns

I'm new to CTEs, and I am slowly starting to understand the basics.
I have a table that essentially goes in this pattern:
CUSTOMER X
CUSTOMER Y
1
1
1
2
2
3
3
4
3
5
4
5
5
6
I was wondering if a CTE would help return the numbers 1 through 6 (CUSTOMER Y) if Number 1 in CUSTOMER X had a specific column flagged for relevancy.
Customer 1 would be considered the main customer, while 2 - 6 could be stores related to said customer.
My end goal would be propogating down this relevancy flag for customers 2 - 6 if customer 1 has it and I'm currently trying to figure out how to get that full list.
I'd want the CTE to return a distinct list of customers.
CUSTOMER
1
2
3
4
5
6

Comma separated values to results

I have 3 tables namely Question, Feedback and FeedbackResult. I wish to generate a report where I need to find the rating average against each question.
Questions
---------
QuestionId Question
1 Question 1
2 Question 2
3 Question 3
4 Question 4
5 Question 5
Feedback
--------
FeedbackId Questions QuestionCount
2 1,2,3,4,5 5 -- Questions column has questionid from Question table
FeedbackResults
---------------
FeedbackResId FeedbackId Answers
1 2 4,3,5,2,3 -- these answers are ratings(1 to 5) against each question
2 2 4,2,3,4,5 -- which means 4 is the rating for QuestionId 1, 2 is for QuestionId 2 etc
3 2 5,3,4,3,2
4 2 4,1,1,1,2
I wish to get result as average rating against each question
Question Rating
Question 1 3.5
Question 2 4
Question 3 5
Question 4 2
Question 5 4.5
Edit
Should I redesign the database table as
FeedbackResults
---------------
FeedbackResId FeedbackId QuetionID Answers UserId
1 2 4 4 1
2 2 1 3 1
3 2 5 3 1
4 2 1 2 2

In a game show database scenario, how do I fetch the average total episode score per season in a single query?

Pardon the title gore. I'm having trouble finding a good way to express my question, which is endemic to the problem.
The Tables
season
id name
---- ------
1 Season 1
2 Season 2
3 Season 3
episode
id season_id number title
---- ----------- -------- ---------------------------------------
1 1 1 Pilot
2 1 2 1x02 - We Got Picked Up
3 1 3 1x03 - This is the Third Episode
4 2 1 2x01 - We didn't get cancelled.
5 2 2 2x02 - We're running out of ideas!
6 3 1 3x01 - We're still here.
7 3 2 3x02 - Okay, this game show is dying.
8 3 3 3x03 - Untitled
score
id episode_id score contestant_id (table not given)
---- ------------ ------- ---------------------------------
1 1 35 1
2 1 -12 2
3 1 8 3
4 1 5 4
5 2 13 1
6 2 -2 5
7 2 3 3
8 2 -14 6
9 3 -14.5 1
10 3 -3 2
11 3 1.5 7
12 3 9.5 5
13 4 22.8 1
14 4 -3 8
15 5 2 1
16 5 13.5 9
17 5 7 3
18 6 13 1
19 6 -84 10
20 6 12 11
21 7 3 1
22 7 10 2
23 8 29 1
24 8 1 5
As you can see, you have multiple episodes per season, and multiple scores per episode (one score per contestant). Contestants can reappear in later episodes (irrelevant), scores are floating point values, and there can be an arbitrary number of scores per episode.
So what am I looking for?
I'd like to get the average total episode score per season, where the total episode score is the sum of all the scores in an episode. Mathematically, this comes out to be the sum of all scores in a season divided by the number of episodes. Easy enough to comprehend, but I have had trouble doing it in a single query and getting the correct result. I'd like an output like the following:
name average_total_episode_score
---------- -----------------------------
Season 1 9.83
Season 2 21.15
Season 3 -5.33
The top-level query needs to be on the season table as it will be combined with other, similar queries on the same table. It's easy enough to do this with an aggregate in a subquery, but an aggregation executes the subquery, failing my single-query requirement. Can this be done in a single query?
Hope this should work
Select s.id, avg(score)
FROM Season S,
Episode e,
Score sc
WHERE s.id = e.season_id
AND e.id = sc.episode_id
Group by s.id
Okay, just figured it out. As usual, I had to write and post a whole book before the simple solution descended upon me.
The problem in my query (which I didn't give in the question) was the lack of a DISTINCT count. Here is a working query:
SELECT
"season"."id",
"season"."name",
(SUM("score"."score") / COUNT(DISTINCT "episode"."id")) AS "average_total_episode_score"
FROM "season"
LEFT OUTER JOIN "episode"
ON ("season"."id" = "episode"."season_id")
LEFT OUTER JOIN "score"
ON ("episode"."id" = "score"."episode_id")
GROUP BY "season"."id"
select Se.id AS Season_Id, sum(score) As season_score, avg(score) from score S join episode E ON S.episode_id = E.id
join Season se ON se.id = e.season_id group by se.id

Query: Employee Training Schedules Based on Position/Workrole

My company sends folks to training. Based on projected new hires/transfers, I was asked to generate a report that estimates the number of seats we need in each course broken out by quarter.
Question: My question is two-fold:
What is the best way to represent a sequence of courses (i.e. prerequisites) in a relational DB?
How do I create the query(-ies) necessary to produce the following desired output:
Desired Output:
ID PersonnelID CourseID ProjectedStartDate ProjectedEndDate
1 1 1 1/14/2017 1/14/2017
2 2 1 2/17/2017 2/17/2017
3 2 2 2/18/2017 2/19/2017
4 2 3 2/20/2017 2/20/2017
5 3 49 1/18/2017 2/03/2017
6 …
Background Info: The courses are taken in-sequence: the first few courses are orientation courses for the company, and later courses are more specific to the employee's workrole. There are over 50 different courses, 40 different workroles and we're projecting ~1k new hires/transfers. Each work role must take a sequence of courses in a prescribed order, but I'm having trouble representing this ordering and subsequently writing the necessary query.
Existing Tables:
I have several tables that I've used to store the data: Personnel, LnkPersonnelToWorkroles,Workroles, LnkWorkrolesToCourses, and Courses (there's many others as well, but I omit them for the sake of scoping this question down). Here's some notional data from these tables:
Personnel (These are the projected new hires and their estimated arrival date.)
ID DisplayName RequiredCompletionDate
1 Kristel Bump 10/1/2016
2 Shelton Franke 3/11/2017
3 Shaunda Launer 4/16/2017
4 Clarinda Kestler 3/13/2017
5 My Wimsatt 6/6/2017
6 Gillian Bramer 10/25/2016
7 ...
Workroles (These are the positions in the company)
ID Workrole
1 Manager
2 Secretary
3 Admin Asst.
4 ...
LnkPersonnelToWorkroles (Links projected new hires to their projected workrole)
ID PersonnelID WorkroleID
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 ...
Courses (All courses available)
ID CourseName LengthInDays
1 Orientation 1
2 Email Etiquette 2
3 Workplace Safety 1
4 ...
LnkWorkrolesToCourses
(Links workroles to their required courses in a Many-to-Many relationship)
ID WorkroleID CourseID
1 1 1
2 2 1
3 2 2
4 2 3
5 3 49
6 ...
Thoughts: My approach is to first develop a person-by-person schedule based upon the new hire's target completion date and workrole. Then for each class, I could sum the number of new hires starting in that quarter.
I've considered trying to represent the courses in the most general way I could think of (i.e. using a directed acyclic graph), but since most of the courses have only a single prerequisite course, I think it's much easier to represent the prerequisites using the Prerequisites table below; however, I don't know how I would use this in a query.
Prerequisites (Is this a good idea?)
ID CourseID PrereqCourseID
1 2 1
2 3 1
3 4 1
4 5 4
5 ...
Note: I am not currently concerned with whether or not the courses are actually offered on those days; we will figure out the course schedules once we know approximately how many we need each quarter. Right now, we're trying to estimate the demand for each course.
Edit 1: To clarify the Desired Output table: if the person begins course 1 on day D, then they can't start course 2 until after they finish course 1, i.e. until the next day. For courses with a length L >1 days, the start date for a subsequent courses is delayed L days. Notice this effect playing out for workrole ID 2 in the Desired Output table: He is expected to arrive on 2/17, start and complete course 1 the same day, begin course 2 the next day (on 2/18), and finish course 2 the day after that (on 2/19).
I'm posting this answer because it gives me an approximate solution; other answers are still welcome.
I avoided a prerequisite table altogether and opted for a simpler approach: a partial ordering of the courses.
First, I drew the course prerequisite tree; it looked similar to this image:
I defined a partial ordering of the courses based on their depth in the prerequisite tree. In the picture above, CHM124 and High School Chem w/ Lab are priority 1, CHM152 is priority 2, CHM 153 is priority 3, CHM260 and CHM 270 are priority 4, and so on... This partial ordering was stored in the CoursePriority table:
CoursePriority:
ID CourseID Priority
1 1 1
2 2 2
3 3 3
4 4 3
5 5 4
6 6 3
7 ...
So that no two courses would every be taken at the same time, I perturbed each course's priority by a small random number using the following Update query:
UPDATE CoursePriority SET CoursePriority.Priority = [Priority]+Rnd([ID])/1000;
(I used [ID] as input to the Rnd method to ensure each course was perturbed by a different random number.) I ended up with this:
ID CourseID Priority
1 1 1.000005623
2 2 2.000094955
3 3 3.000036401
4 4 3.000052486
5 5 4.000076711
6 6 3.00000535
7 ...
The approach above answers my first question "What is the best [sensible] way to represent a sequence of courses (i.e. prerequisites) in a relational DB?" Now as for generating the course schedule...
First, I created a query qryLnkCoursesPriorities to link Courses to the CoursePriority table:
SELECT Courses.ID AS CourseID, Courses.DurationInDays, CoursePriority.Priority
FROM Courses INNER JOIN CoursePriority ON Courses.ID = CoursePriority.CourseID;
Result:
CourseID DurationInDays Priority
1 35 1.000076177
2 21 2.000148297
3 28 3.000094352
4 14 3.000081442
5...
Second, I created the qryWorkrolePriorityDelay query:
SELECT LnkWorkrolesToCourses.WorkroleID, qryLnkCoursePriorities.CourseID AS CourseID, qryLnkCoursePriorities.Priority, qryLnkCoursePriorities.DurationInDays, ([DurationInDays]+Nz(DSum("DurationInDays","qryLnkCoursePriorities","[Priority]>" & [Priority] & ""))) AS LeadTimeInDays
FROM LnkWorkrolesToCourses INNER JOIN qryLnkCoursePriorities ON LnkWorkrolesToCourses.CourseID = qryLnkCoursePriorities.CourseID
ORDER BY LnkWorkrolesToCourses.WorkroleID, qryLnkCoursePriorities.Priority;
Simply put: The qryWorkrolePriorityDelay query tells me how many days in advance each course should be taken to ensure the new hire can complete all subsequent courses prior to their required training completion deadline. It looks like this:
WorkroleID CourseID Priority DurationInDays LeadTimeInDays
1 7 1.000060646 7 147
1 1 1.000076177 35 140
1 2 2.000148297 21 105
1 4 3.000081442 14 84
1 6 3.000082824 14 70
1 3 3.000094352 28 56
1 5 4.000106905 28 28
2...
Finally, I was able to bring this all together to create the qryCourseSchedule query:
SELECT Personnel.ID AS PersonnelID, LnkWorkrolesToCourses.CourseID, [ProjectedHireDate]-[leadTimeInDays] AS ProjectedStartDate, [ProjectedHireDate]-[leadTimeInDays]+[Courses].[DurationInDays] AS ProjectedEndDate
FROM Personnel INNER JOIN (((LnkWorkrolesToCourses INNER JOIN (Courses INNER JOIN qryWorkrolePriorityDelay ON Courses.ID = qryWorkrolePriorityDelay.CourseID) ON (Courses.ID = LnkWorkrolesToCourses.CourseID) AND (LnkWorkrolesToCourses.WorkroleID = qryWorkrolePriorityDelay.WorkroleID)) INNER JOIN LnkPersonnelToWorkroles ON LnkWorkrolesToCourses.WorkroleID = LnkPersonnelToWorkroles.WorkroleID) INNER JOIN CoursePriority ON Courses.ID = CoursePriority.CourseID) ON Personnel.ID = LnkPersonnelToWorkroles.PersonnelID
ORDER BY Personnel.ID, [ProjectedHireDate]-[leadTimeInDays]+[Courses].[DurationInDays];
This query gives me the following output:
PersonnelID CourseID ProjectedStartDate ProjectedEndDate
1 7 5/7/2016 5/14/2016
1 1 5/14/2016 6/18/2016
1 2 6/18/2016 7/9/2016
1 4 7/9/2016 7/23/2016
1 6 7/23/2016 8/6/2016
1 3 8/6/2016 9/3/2016
1 5 9/3/2016 10/1/2016
2...
With this output, I created a pivot table, where course start dates were grouped by quarter and counted. This gave me exactly what I needed.