Calculate the variance of the weights of all players [closed] - sql

We are given data from the NBA, where each table is described as follows:
coaches_season, each tuple of which describes the performance of one coach in one season;[cid, year, yr_order, year, season_win, season_loss, play_off_win, play_off_loss, tid]
teams, each tuple of which gives the basic information of a team; [tid, location, name, league]
players, each tuple of which gives the basic information of one player; [ilkid, firstname, lastname, position, first_season, last_season, h_feet, h_inches, weight, college, birthday]
player_rs, each tuple of which gives the detailed performance of one player in one regular season; [ilkid,tid, pts, asts, of, ftm, tpa,tpm, fgm,fga, fta, blk, turnover, stl, dreb, oreb, reb, minutes, gp, league, lastname, firstname, year]
player_rs_career, each tuple of which gives the detailed regular-season performance of one player in his career;[ilkid, firstname, lastname, fga, fgm, fta, ftm, tpa, tpm, pf, stl, oreb, minutes, gp, dreb, asts, turnover, blk, reb, league]
draft, each tuple of which shows the information of an NBA draft. [draft_year, firstname, lastname, draft_round, tid, selection, draft_from, ilkid, league]
I have worked out many of the queries but am stuck on these three:
I) For each college, print the college name and average number of drafts (per season) they sent to NBA. However, only report those colleges that sent drafts in at least 3 seasons.
II) Calculate the variance of the weights of all players;
III) Print the first and last names of those who either scored more than 12000 points in their careers or played for more than 12 seasons.

In this case especially, please post the DDL for each table, including primary and foreign key definitions. Also include sample data as text, not an image, and the expected output from that data. Further, you may want to include definitions for the column names, as not everyone is familiar with the acronyms used for the NBA.
With that said, I'll give it a stab. Note that since you included neither test data nor table definitions, the queries have NOT been tested.
-- I) For each college, print the college name and average number of drafts (per season) they sent to NBA.
-- However, only report those colleges that sent drafts in at least 3 seasons.
-- assumptions:
-- draft_year integer specifying calendar year of draft (one draft season per year)
-- draft_from text name of college
-- average number of drafts (per season) = average over the seasons in which the college sent at least one draft
-- "at least 3 seasons" = at least 3 distinct draft years, not necessarily consecutive
select draft_from, round(avg(drc), 2) avg_drafts
from (
select draft_from, draft_year, count(*) drc
from draft
group by draft_from, draft_year
) t
group by draft_from
having count(*) >= 3;
-- II) Calculate the variance of the weights of all players;
-- assumption: weight defined as float;
-- sample variance:
select var_samp(weight) from players;
-- or population variance:
select var_pop(weight) from players;
-- III) Print the first and last names of those who either scored more than 12000 points in their careers or played for more than 12 seasons.
-- assumption fgm => field goals made = 2 points each
-- ftm => free throws made = 1 point each
-- tpm => 3 point shots made = 3 points each
-- (if fgm already counts three-pointers, use ftm + (2*fgm) + tpm instead)
-- ilkid => PK in players and FK in player_rs_career
-- table player_rs_career does include last/current season
-- note player_rs_career does NOT contain year/season, unless hidden by undescribed column name
select distinct *
from (select p.firstname, p.lastname
, sum(ftm + (2*fgm) + (3*tpm)) over (partition by p.ilkid) points
, (coalesce (p.last_season, extract (year from now())::integer) - p.first_season + 1) seasons
from players p
join player_rs_career pc
on p.ilkid = pc.ilkid
) pp
where points > 12000
or seasons > 12;
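To make the difference between the two variance functions in query II concrete, here is a minimal sketch using Python's sqlite3 module. SQLite has no var_samp or var_pop, so the sketch computes both from their definitions and checks them against Python's statistics module. The four weights and the reduced players table are made up for illustration.

```python
import sqlite3
import statistics

# Hypothetical sample weights; the real values live in players.weight.
weights = [180.0, 200.0, 220.0, 240.0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (ilkid TEXT PRIMARY KEY, weight REAL)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?)",
    [(f"P{i}", w) for i, w in enumerate(weights)],
)

# Population variance: mean of the squares minus the square of the mean.
# Sample variance rescales the population variance by n / (n - 1).
var_pop, var_samp = conn.execute(
    """
    SELECT AVG(weight * weight) - AVG(weight) * AVG(weight) AS var_pop,
           (AVG(weight * weight) - AVG(weight) * AVG(weight))
               * COUNT(weight) / (COUNT(weight) - 1.0) AS var_samp
    FROM players
    WHERE weight IS NOT NULL
    """
).fetchone()

print("population variance:", var_pop)
print("sample variance:", var_samp)
```

Use the population form when the table holds every player of interest, and the sample form when the rows are treated as a sample of a larger population. The mean-of-squares shortcut can lose precision on large values, so prefer the built-in functions where the database provides them.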

Related

SQL: nested query with tuple constructor

I have some difficulties dealing with an SQL exercise for my Intro to Database course. The SQL standard we mainly use is the Oracle one (the one compatible with Apex).
I have the following SQL database (primary keys bold):
TEENAGER(SSN, Name, Surname, BirthDate, CityOfResidence, Sex)
ACTIVITY(ActivityCode, AName, Description, Category)
SUMMER-CAMP(CampCode, CampName, City)
SUBSCRIPTION-TO-ACTIVITY-IN-SUMMER-CAMP(SSN,ActivityCode, CampCode,
SubscriptionDate)
This is what the exercise asks:
"For each teenager, born before 2005, who subscribed to activities
organized by at least 5 different summer camps, show name, surname,
birth date of the teenager and the name of each summer camp to which
the teenager subscribed to all the different activities organized by
the camp."
I do not have any problem finding the SSNs of the teenagers born before 2005 and who subscribed to at least 5 camps and I am able to find the number of different activities organized by the camp. How do I manage to use this information to find the final result?
Now, this is my attempt to a solution (I added two in-line comments with "#" for clarity):
FROM TEENAGER T, SUMMER-CAMP SC, SUBSCRIPTION-TO-ACTIVITY-IN-SUMMER-CAMP STAISC
WHERE T.SSN = STAISC.SSN AND STAISC.CampCode = SC.CampCode
AND SSN IN (SELECT T.SSN #born before 2005 and at least 5 camps
FROM TEENAGER T, SUBSCRIPTION-TO-ACTIVITY-IN-SUMMER-CAMP STAISC
WHERE T.BirthDate < TO_DATE('01/01/2005', 'DD/MM/YYYY')
AND T.SSN = STAISC.SSN
GROUP BY T.SSN
HAVING COUNT(DISTINCT STAISC.CampCode) > 4)
GROUP BY STAISC.CampCode, T.SSN
HAVING (STAISC.CampCode, COUNT(DISTINCT ActivityCode)) IN (SELECT CampCode, COUNT(DISTINCT ActivityCode) #number of activities in camps
FROM SUBSCRIPTION-TO-ACTIVITY-IN-SUMMER-CAMP
GROUP BY CampCode)
As you can see, I am using a tuple constructor in the outer-most query in a HAVING clause to try and use the information about the total number of activities organised in a camp. Am I allowed to do that and would it work? (The professor did not give us any database since in the exam we will have to write down the query without being able to run it).
Thanks in advance!
I am answering my own question since I found a correct solution:
SELECT T.SSN, SC.CampCode, T.Name, T.Surname, T.BirthDate, SC.CampName
FROM TEENAGER T, SUMMER-CAMP SC, SUBSCRIPTION-TO-ACTIVITY-IN-SUMMER-CAMP STAISC
WHERE T.BirthDate < TO_DATE('01/01/2005', 'DD/MM/YYYY')
AND T.SSN = STAISC.SSN AND STAISC.CampCode = SC.CampCode
AND T.SSN IN (SELECT SSN FROM SUBSCRIPTION-TO-ACTIVITY-IN-SUMMER-CAMP
GROUP BY SSN
HAVING COUNT(DISTINCT CampCode) >= 5)
GROUP BY T.SSN, SC.CampCode, T.Name, T.Surname, T.BirthDate, SC.CampName
HAVING COUNT(DISTINCT STAISC.ActivityCode) = (SELECT COUNT(DISTINCT STAISC2.ActivityCode)
FROM SUBSCRIPTION-TO-ACTIVITY-IN-SUMMER-CAMP STAISC2
WHERE STAISC2.CampCode = SC.CampCode)
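The core of this solution is the comparison of two counts: the distinct activities one teenager subscribed to in a camp against the distinct activities of that camp overall. Below is a minimal sketch of just that step, run on SQLite through Python's sqlite3 module. The table name subscription and the sample rows are assumptions for illustration, and the birth date and "at least 5 camps" filters are left out to keep the data small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscription (SSN TEXT, ActivityCode TEXT, CampCode TEXT);
-- Camp C1 has activities A1 and A2; camp C2 has only A1.
INSERT INTO subscription VALUES
  ('T1', 'A1', 'C1'), ('T1', 'A2', 'C1'), ('T1', 'A1', 'C2'),
  ('T2', 'A1', 'C1'), ('T2', 'A1', 'C2');
""")

# Keep a (teenager, camp) pair only when the teenager's distinct activities
# in that camp equal the camp's total number of distinct activities.
rows = conn.execute("""
    SELECT s.SSN, s.CampCode
    FROM subscription s
    GROUP BY s.SSN, s.CampCode
    HAVING COUNT(DISTINCT s.ActivityCode) =
           (SELECT COUNT(DISTINCT s2.ActivityCode)
            FROM subscription s2
            WHERE s2.CampCode = s.CampCode)
    ORDER BY s.SSN, s.CampCode
""").fetchall()

print(rows)
```

T2 is dropped for camp C1 because T2 subscribed to only one of its two activities. As in the query above, a camp's activities are inferred from the subscriptions themselves, since the schema has no table linking activities to camps.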

SQL query , group by only one column

I want to group this query by project only, because there are two records for the same project but I only want one.
But when I add a GROUP BY clause, it asks me to add the other columns as well, and then the grouping does not work.
DECLARE @Year varchar(75) = '2018'
DECLARE @von DateTime = '1.09.2018'
DECLARE @bis DateTime = '30.09.2018'
select new_projekt ,new_geschftsartname, new_mitarbeitername, new_stundensatz
from Filterednew_projektkondition ps
left join Filterednew_fakturierungsplan fp on ps.new_projekt = fp.new_hauptprojekt1
where ps.statecodename = 'Aktiv'
and fp.new_startdatum >= @von +'00:00:00'
and fp.new_enddatum <= @bis +'23:59:59'
--and new_projekt= Filterednew_projekt.new_
--group by new_projekt
Look at the column new_projekt. Rows 2 and 3 have the same project, but I want it to appear only once. Because the other columns differ, this is not possible.
If it's of interest, there is another column, the project condition id, which is unique for both.
You can't ask a database to decide arbitrarily for you which records should be thrown away when doing a group. You have to be precise and specific.
Example, here is some data about a person:
Name, AddressZipCode
John Doe, 90210
John Doe, 12345
SELECT name, addresszipcode FROM person INNER JOIN address on address.personid = person.id
There are two addresses stored for this one guy, the person data is repeated in the output!
"I don't want that. I only want to see one line for this guy, together with his address"
Which address?
That's what you have to tell the database
"Well, obviously his current address"
And how do you denote that an address is current?
"It's the one with the null enddate"
SELECT name, addresszipcode FROM person INNER JOIN address on address.personid = person.id WHERE address.enddate IS NULL
If you still get two addresses out, there are two address records with a null end date. You have data that is in violation of your business data modelling principles ("a person's address history shall have at most one address that is current, denoted by a null end date"), so fix the data.
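A minimal sketch of the "current address" rule, run on SQLite through Python's sqlite3 module. The end date value is made up for illustration. It also shows why the test has to be written with IS NULL: a comparison with = NULL is never true, so it returns no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (personid INTEGER, addresszipcode TEXT, enddate TEXT);
INSERT INTO person VALUES (1, 'John Doe');
INSERT INTO address VALUES (1, '12345', '2015-06-30');  -- past address
INSERT INTO address VALUES (1, '90210', NULL);          -- current address
""")

base = """
    SELECT name, addresszipcode
    FROM person
    INNER JOIN address ON address.personid = person.id
    WHERE address.enddate {predicate}
"""

# "= NULL" evaluates to unknown for every row, so nothing is returned.
with_equals = conn.execute(base.format(predicate="= NULL")).fetchall()
# "IS NULL" is the predicate that actually matches missing end dates.
with_is_null = conn.execute(base.format(predicate="IS NULL")).fetchall()

print(with_equals)
print(with_is_null)
```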
"Why can't i just group by name?"
You can, but if you do, you still have to tell the database how to accumulate the non-name data that it shows you. You want address data out of it, it has 2 addresses it wants to show you, and you have to tell it which to discard. You could do this:
SELECT name, MAX(addresszipcode) FROM person INNER JOIN address on address.personid = person.id GROUP BY name
"But I don't want the max zipcode? That doesn't make sense"
OK, use the MIN, the SUM, the AVG, anything that makes sense. If none of these make sense, then use something that does, like the address line that has the highest end date, or the lowest end date that is a future end date. If you only want one address on show, you must decide how to boil that data down to just one record. You have to write the rule for the database to follow, and there is no question about it: you have to create a rule, so make it a rule that describes what you actually want.
OK, so say you created a rule: you want only the rows with the minimum new_stundensatz.
DECLARE @Year varchar(75) = '2018'
DECLARE @von DateTime = '1.09.2018'
DECLARE @bis DateTime = '30.09.2018'
select new_projekt ,new_geschftsartname, new_mitarbeitername, new_stundensatz
from
(SELECT *, ROW_NUMBER() OVER(PARTITION BY new_projekt ORDER BY new_stundensatz) rown FROM Filterednew_projektkondition) ps
left join
Filterednew_fakturierungsplan fp on ps.new_projekt = fp.new_hauptprojekt1
where ps.statecodename = 'Aktiv'
and fp.new_startdatum >= @von +'00:00:00'
and fp.new_enddatum <= @bis +'23:59:59'
and ps.rown = 1
Here I've used an analytic operation to number the rows in your PS table. They're numbered in order of ascending new_stundensatz, starting with 1. The numbering restarts when the new_projekt changes, so each new_projekt will have a number 1 row, and then we make that a condition of the WHERE.
(Helpful side note for applying this technique in future: if it were the FP table we were adding a row number to, we would need to put AND fp.rown = 1 in the ON clause, not the WHERE clause, because putting it in the WHERE would make the LEFT join behave like an INNER, hiding rows that don't have any matching FP record.)
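Both points can be tried out on a small data set. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later). The tables ps and fp are cut-down stand-ins for the two filtered views, and the betrag column and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ps (new_projekt TEXT, new_mitarbeitername TEXT, new_stundensatz REAL);
CREATE TABLE fp (new_hauptprojekt1 TEXT, betrag REAL);
INSERT INTO ps VALUES ('P1', 'Anna', 90), ('P1', 'Ben', 120), ('P2', 'Cara', 100);
INSERT INTO fp VALUES ('P1', 500), ('P1', 700);  -- P2 has no fp row
""")

# Row number on the left table: one row per project, lowest hourly rate first.
one_per_project = conn.execute("""
    SELECT new_projekt, new_mitarbeitername, new_stundensatz
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY new_projekt
                                       ORDER BY new_stundensatz) AS rown
          FROM ps) ps
    WHERE ps.rown = 1
    ORDER BY new_projekt
""").fetchall()

# Row number on the right table of a LEFT JOIN.
numbered_fp = """(SELECT *, ROW_NUMBER() OVER (PARTITION BY new_hauptprojekt1
                                              ORDER BY betrag) AS rown
                  FROM fp)"""

# Condition in ON: projects without an fp row survive the join.
in_on = conn.execute(f"""
    SELECT DISTINCT ps.new_projekt
    FROM ps LEFT JOIN {numbered_fp} fp
      ON ps.new_projekt = fp.new_hauptprojekt1 AND fp.rown = 1
    ORDER BY ps.new_projekt
""").fetchall()

# Condition in WHERE: the NULL-extended rows are filtered out again.
in_where = conn.execute(f"""
    SELECT DISTINCT ps.new_projekt
    FROM ps LEFT JOIN {numbered_fp} fp
      ON ps.new_projekt = fp.new_hauptprojekt1
    WHERE fp.rown = 1
    ORDER BY ps.new_projekt
""").fetchall()

print(one_per_project, in_on, in_where)
```

Project P2 has no fp row, so it is kept when the row number test sits in the ON clause and lost when it sits in the WHERE clause.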

How can I select the highest counts attributes from different groups?

So I have a table with players data(name, team, etc..) and a table with goals (player who scored it, local team, etc...). What I need to do is, get from each team the highest scorer. So the result I'm getting is something like:
germany - whatever name - 1
germany - another dude - 5
spain - another name - 8
italy - one more name - 6
As you can see teams repeat, and I want them not to, just get the highest scorer of each team.
Right now I have this:
SELECT P.TEAM_PLAYER, G.PLAYER_GOAL, COUNT(*) AS "TOTAL GOALS" FROM PLAYER P, GOAL G
WHERE TO_CHAR(G.DATE_GOAL, 'YYYY')=2002
AND P.NAME = G.PLAYER_GOAL
GROUP BY G.PLAYER_GOAL, P.TEAM_PLAYER
HAVING COUNT(*)>=ALL (SELECT COUNT(*) FROM PLAYER P2 where P.TEAM_PLAYER = P2.TEAM_PLAYER GROUP BY P2.TEAM_PLAYER)
ORDER BY COUNT(*) DESC;
I am 100% sure I'm close, and I'm pretty sure I have to do this with the HAVING feature, but I can't get it right.
Without the HAVING it returns a list of all the players, their teams and how many goals have they scored, now I want to cut it down to only one player for each team.
PS: the teams in the GOAL table are the local and visiting teams, so I have to use the PLAYER table to get a player's team. Also, the GOAL table is not a list of players with the number of goals they have scored, but a list of every individual goal and the player who scored it.
If I understand correctly, you can try this query.
Just take the MAX of the PLAYER_GOAL column and use SUM(G.PLAYER_GOAL) instead of COUNT(*).
SELECT P.TEAM_PLAYER,
MAX(G.PLAYER_GOAL) "PLAYER_GOAL",
SUM(G.PLAYER_GOAL) AS "TOTAL GOALS"
FROM PLAYER P
INNER JOIN GOAL G
ON P.NAME = G.PLAYER_NAME
WHERE TO_CHAR(G.DATE_GOAL, 'YYYY')=2002
GROUP BY P.TEAM_PLAYER
ORDER BY SUM(G.PLAYER_GOAL) DESC;
NOTE:
Avoid using commas to join tables; it's an old join style. You can use INNER JOIN instead.
Edit
I don't know your table schema, but this query might work.
Use a subquery to contain your current result set, then apply the MAX function and GROUP BY.
SELECT T.TEAM_PLAYER,
T.PLAYER_GOAL,
MAX(TOTAL_GOALS) AS "TOTAL GOALS"
FROM
(
SELECT P.TEAM_PLAYER, G.PLAYER_GOAL, COUNT(*) AS "TOTAL_GOALS" FROM
PLAYER P, GOAL G
WHERE TO_CHAR(G.DATE_GOAL, 'YYYY')=2002
AND P.NAME = G.PLAYER_GOAL
GROUP BY G.PLAYER_GOAL, P.TEAM_PLAYER
HAVING COUNT(*)>=ALL (SELECT COUNT(*) FROM PLAYER P2 where P.TEAM_PLAYER = P2.TEAM_PLAYER GROUP BY P2.TEAM_PLAYER)
) T
GROUP BY T.TEAM_PLAYER,
T.PLAYER_GOAL
ORDER BY MAX(TOTAL_GOALS) DESC
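For the goal stated in the question (one top scorer per team), a window function is another option. The following is a minimal sketch and not part of the answer above. It runs on SQLite through Python's sqlite3 module, so strftime stands in for Oracle's TO_CHAR, and the player names and goal rows are made up. It counts goals per player first, ranks players within each team, and keeps rank 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (name TEXT PRIMARY KEY, team_player TEXT);
CREATE TABLE goal (player_goal TEXT, date_goal TEXT);  -- one row per goal
INSERT INTO player VALUES ('A', 'germany'), ('B', 'germany'), ('C', 'spain');
INSERT INTO goal VALUES
  ('A', '2002-06-01'), ('A', '2002-06-05'), ('A', '2002-06-11'),
  ('B', '2002-06-21'),
  ('C', '2002-06-02'), ('C', '2002-06-07'),
  ('C', '2001-05-01');  -- outside 2002, must be ignored
""")

top_scorers = conn.execute("""
    WITH per_player AS (
        -- goals per player in 2002, with the team taken from PLAYER
        SELECT p.team_player, g.player_goal, COUNT(*) AS total_goals
        FROM player p
        INNER JOIN goal g ON p.name = g.player_goal
        WHERE strftime('%Y', g.date_goal) = '2002'
        GROUP BY p.team_player, g.player_goal
    ),
    ranked AS (
        -- rank players inside each team by their goal count
        SELECT team_player, player_goal, total_goals,
               RANK() OVER (PARTITION BY team_player
                            ORDER BY total_goals DESC) AS rnk
        FROM per_player
    )
    SELECT team_player, player_goal, total_goals
    FROM ranked
    WHERE rnk = 1
    ORDER BY team_player
""").fetchall()

print(top_scorers)
```

RANK returns every player tied for first place in a team. ROW_NUMBER would pick exactly one of them, but which one is arbitrary unless the ORDER BY adds a tie-breaker.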

finding people with possible incorrectly spelled cities where zip codes match

I am trying to create a report that will return a list of people whose cities most likely need to be corrected.
I was thinking of comparing the data against other data within the table to leverage the assumption that most of the cities are spelled correctly. Take Albuquerque, for example. We have records for many of the zip codes, but the city isn't always spelled correctly.
I can't figure out my next step.
Here's what I have started with:
SELECT city, zip_5_digits, COUNT(*) AS "COUNT"
FROM people
INNER JOIN addresses
ON addresses.people_id = people.id
AND city LIKE 'Albu%que'
GROUP BY city, zip_5_digits
Doing this results in
Albuqureque 87108 1
Albuquerque 87108 238
Albuqerque 87109 1
Albuquerque 87109 34
What I'd like to do is, for each row, find the maximum records where the zip code matches but the city does not match. If there is no match, I want to return that record, and I'll use this to return people's id and names, since I most likely need to correct the name of the city for those people who have it mis-spelled.
This is hard, because some "cities" have very few residents. And, some zip codes might just have a small part of a city.
I would recommend two rules:
Look at zip codes that have at least a certain number of people -- say 100.
Look at cities in the zip code that have less than some number -- say 5.
These are candidates for misspellings:
SELECT pa.*
FROM (SELECT city, zip_5_digits, COUNT(*) AS cnt,
MAX(COUNT(*)) OVER (PARTITION BY zip_5_digits) as max_cnt,
SUM(COUNT(*)) OVER (PARTITION BY zip_5_digits) as sum_cnt
FROM people p INNER JOIN
addresses a
ON a.people_id = p.id
GROUP BY city, zip_5_digits
) pa
WHERE sum_cnt >= 100 AND cnt <= 5;
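The two rules can be tried on a small data set. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or later). The thresholds are scaled down from 100 and 5 to 10 and 2 to fit the made-up sample, and the grouping and the windowed sum are split into two nested steps, which is an adaptation of the query above rather than a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY);
CREATE TABLE addresses (people_id INTEGER, city TEXT, zip_5_digits TEXT);
""")

# Made-up sample: one busy zip code with a rare spelling, one tiny zip code.
sample = (
    [("Albuquerque", "87108")] * 12 + [("Albuqureque", "87108")]
    + [("Albuquerque", "87109")] * 3 + [("Albuqerque", "87109")]
)
for pid, (city, zip_code) in enumerate(sample, start=1):
    conn.execute("INSERT INTO people VALUES (?)", (pid,))
    conn.execute("INSERT INTO addresses VALUES (?, ?, ?)", (pid, city, zip_code))

MIN_ZIP_TOTAL = 10   # rule 1: zip code must have at least this many people
MAX_CITY_COUNT = 2   # rule 2: spelling must be at most this frequent

candidates = conn.execute("""
    SELECT city, zip_5_digits, cnt, sum_cnt
    FROM (SELECT city, zip_5_digits, cnt,
                 SUM(cnt) OVER (PARTITION BY zip_5_digits) AS sum_cnt
          FROM (SELECT a.city, a.zip_5_digits, COUNT(*) AS cnt
                FROM people p
                INNER JOIN addresses a ON a.people_id = p.id
                GROUP BY a.city, a.zip_5_digits) per_city) pa
    WHERE sum_cnt >= ? AND cnt <= ?
""", (MIN_ZIP_TOTAL, MAX_CITY_COUNT)).fetchall()

print(candidates)
```

The rare spelling in zip code 87109 is not reported because that zip code has only four people in the sample, which is the first rule at work.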

Help with SQL query (Calculate a ratio between two entities)

I'm trying to calculate a ratio between two entities but am having some trouble with the query.
The principle is the same as in, say, a forum, where you say:
A user gets points for every new thread. Then, calculate the ratio of points to the number of threads.
Example:
User A has 300 points. User A has started 6 threads. The point ratio is: 50:6
My schemas look as follows:
student(studentid, name, class, major)
course(courseid, coursename, department)
courseoffering(courseid, semester, year, instructor)
faculty(name, office, salary)
gradereport(studentid, courseid, semester, year, grade)
The relations are as follows:
Faculty(name) = courseoffering(instructor)
Student(studentid) = gradereport (studentid)
Courseoffering(courseid) = course(courseid)
Gradereport(courseid) = courseoffering(courseid)
I have this query to select the faculty members who are teaching one or more students:
SELECT COUNT(faculty.name) FROM faculty, courseoffering, gradereport, student WHERE faculty.name = courseoffering.instructor AND courseoffering.courseid = gradereport.courseid AND gradereport.studentid = student.studentid
My problem is finding the ratio between a faculty member's salary and the number of students they are teaching.
Say a teacher gets 10.000 in salary and teaches 5 students; then his ratio should be 1:5.
I hope that someone has an answer to my problem and understands what I'm having trouble with.
Thanks
Mestika
Some further explanation and examples of my problem and request:
Employee 1: Salary = 10.000 | # of courses he teaches: 3 | # of students (in total) following those 3 courses: 15.
Then, Employee 1 earns 666,7 per student. (I believe this is the ratio.)
Employee 2: Salary = 30.000 | # of courses he teaches: 1 | # of students (in total) following that 1 course: 6.
Then, Employee 2 earns 5000 per student.
You are completely right that my own ratio examples don't make sense, so I will try to explain further.
What I am seeking to do is find out how much salary each faculty member has relative to the number of students they are teaching. I imagine that it is a simple matter of dividing a faculty member's salary by the number of students following a course that the member is teaching.
I get an error when I run your query; my MySQL seems to have a problem with the convert part, but otherwise your query seems correct.
I haven't tried the convert statement before, but is it necessary to convert them, and why? Couldn't I, for each faculty member that meets the conditions, find the number of students attending the course, then take that faculty member's salary and divide it by the number of students found?
When I look at your first example, it says that 300 points for 6 threads works out to a 50:6 ratio. Don't you mean in your later example that a 10000 salary for 5 students works out to a 2000:5 ratio, not a 1:5 ratio?
Anyway, if my understanding of your example is correct, then this should be a good solution:
select f.name, f.salary, count(s.studentid) as noofstudents, convert(varchar(50), f.salary / count(s.studentid)) + ':' + convert(varchar(10), count(s.studentid)) as ratio
from faculty f
inner join courseoffering co on f.name = co.instructor
inner join gradereport gr on co.courseid = gr.courseid
inner join student s on gr.studentid = s.studentid
where co.semester = @semester
and co.year = @year
group by f.name, f.salary
Perhaps you could expand on your question a bit if this isn't what you're looking for.
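The follow-up example from the question (Employee 1 with a salary of 10.000 and 15 students, Employee 2 with 30.000 and 6 students) can be checked with a minimal sketch that returns the salary per student as a plain number, without any string conversion. It uses Python's sqlite3 module. The course ids, the semester and year values, and the student ids are made up, and counting distinct students is an assumption about what "number of students" means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE faculty (name TEXT PRIMARY KEY, salary REAL);
CREATE TABLE courseoffering (courseid TEXT, semester TEXT, year INTEGER,
                             instructor TEXT);
CREATE TABLE gradereport (studentid INTEGER, courseid TEXT, semester TEXT,
                          year INTEGER);
INSERT INTO faculty VALUES ('Employee 1', 10000), ('Employee 2', 30000);
INSERT INTO courseoffering VALUES
  ('C1', 'fall', 2010, 'Employee 1'), ('C2', 'fall', 2010, 'Employee 1'),
  ('C3', 'fall', 2010, 'Employee 1'), ('C4', 'fall', 2010, 'Employee 2');
""")

# Employee 1: 3 courses with 5 students each; Employee 2: 1 course with 6.
enrolment = {"C1": range(1, 6), "C2": range(6, 11),
             "C3": range(11, 16), "C4": range(16, 22)}
for courseid, students in enrolment.items():
    conn.executemany(
        "INSERT INTO gradereport VALUES (?, ?, 'fall', 2010)",
        [(studentid, courseid) for studentid in students],
    )

rows = conn.execute("""
    SELECT f.name, f.salary,
           COUNT(DISTINCT gr.studentid) AS noofstudents,
           ROUND(f.salary / COUNT(DISTINCT gr.studentid), 1) AS salary_per_student
    FROM faculty f
    INNER JOIN courseoffering co ON f.name = co.instructor
    INNER JOIN gradereport gr ON gr.courseid = co.courseid
                             AND gr.semester = co.semester
                             AND gr.year = co.year
    GROUP BY f.name, f.salary
    ORDER BY f.name
""").fetchall()

for row in rows:
    print(row)
```

The join to gradereport also matches on semester and year, so that a course offered in several terms does not credit one instructor with another term's students. Faculty members with no students do not appear at all, which also avoids a division by zero.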