Selecting and comparing the averages of multiple SQL tables

I'm building an NBA database with random data for one team (for example, the Lakers). I have one table per season, and each season table stores the game number, player name, points, assists, rebounds, steals and blocks. What I want to do, and don't know how to express in one or more SQL statements, is this: select the average points, assists, rebounds, etc. of ONE player in ONE season, for every player who, for example, averaged more than 25 points, 5 assists and 5 rebounds in a SINGLE season. If a player did it in several seasons, that player's averages should appear once for each of those seasons.

Written in PostgreSQL: the CTE stacks all your season tables with a UNION, calculating the averages per player from each table. Then you simply filter the CTE.
with seasons as (
-- one branch per season table; the literal labels the season
select '2018' as season, player,
avg(points) as avg_points,
avg(assists) as avg_assists,
avg(rebounds) as avg_rebounds,
avg(steals) as avg_steals,
avg(blocks) as avg_blocks
from table1
group by player
union all
select '2019' as season, player,
avg(points) as avg_points,
avg(assists) as avg_assists,
avg(rebounds) as avg_rebounds,
avg(steals) as avg_steals,
avg(blocks) as avg_blocks
from table2
group by player
union all
select '2020' as season, player,
avg(points) as avg_points,
avg(assists) as avg_assists,
avg(rebounds) as avg_rebounds,
avg(steals) as avg_steals,
avg(blocks) as avg_blocks
from table3
group by player
)
-- keep only the player-seasons that clear every threshold
select *
from seasons
where avg_points > 25
and avg_assists > 5
and avg_rebounds > 5
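A quick way to sanity-check the shape of this query is to run it against a throwaway database. The sketch below uses Python's built-in sqlite3 module rather than PostgreSQL, with two hypothetical season tables (season_2018, season_2019) trimmed to the columns the filter needs, and made-up rows.

```python
import sqlite3

# Hypothetical setup: one table per season, trimmed to the columns the filter uses.
conn = sqlite3.connect(":memory:")
for table in ("season_2018", "season_2019"):
    conn.execute(
        f"CREATE TABLE {table} "
        "(game INTEGER, player TEXT, points INTEGER, assists INTEGER, rebounds INTEGER)"
    )

# Made-up box scores: Player A clears the bar only in 2018, Player B in both seasons.
conn.executemany("INSERT INTO season_2018 VALUES (?, ?, ?, ?, ?)", [
    (1, "Player A", 30, 6, 7), (2, "Player A", 28, 8, 6),
    (1, "Player B", 26, 6, 6), (2, "Player B", 30, 6, 8),
])
conn.executemany("INSERT INTO season_2019 VALUES (?, ?, ?, ?, ?)", [
    (1, "Player A", 20, 6, 7), (2, "Player A", 22, 8, 6),
    (1, "Player B", 27, 7, 5), (2, "Player B", 29, 5, 9),
])

# One branch per season table, each labelled with its season, then filter the CTE.
query = """
WITH seasons AS (
    SELECT '2018' AS season, player, AVG(points) AS avg_points,
           AVG(assists) AS avg_assists, AVG(rebounds) AS avg_rebounds
    FROM season_2018 GROUP BY player
    UNION ALL
    SELECT '2019', player, AVG(points), AVG(assists), AVG(rebounds)
    FROM season_2019 GROUP BY player
)
SELECT season, player, avg_points
FROM seasons
WHERE avg_points > 25 AND avg_assists > 5 AND avg_rebounds > 5
ORDER BY season, player
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Player B shows up once per qualifying season, which is the behaviour asked for. UNION ALL is used because the season label already makes every row distinct, so there is nothing for a plain UNION to de-duplicate.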


count the number of times a combination of values occurs

Dataset looking at the types of crime for a given city.
Incident ID | Incident Code | Incident Category | Incident Subcategory | Incident Description
618691 | 4134 | Assault | Simple Assault | Battery
618691 | 15300 | Offences Against The Family And Children | Other | Hate Crime (secondary only)
618701 | 7053 | Vehicle Impounded | Vehicle Impounded | Vehicle, Impounded
618701 | 65010 | Traffic Violation Arrest | Traffic Violation Arrest | Traffic Violation Arrest
618701 | 65050 | Other Miscellaneous | Other | Driving While Under The Influence Of Alcohol
626010 | 5043 | Burglary | Burglary - Residential | Burglary, Residence, Unlawful Entry
626010 | 6381 | Larceny Theft | Larceny Theft - Other | Embezzlement from Dependent or Elder Adult by Caretaker
626010 | 7041 | Recovered Vehicle | Recovered Vehicle | Vehicle, Recovered, Auto
626010 | 16650 | Drug Offense | Drug Violation | Methamphetamine Offense
Each IncidentID has 2, 3, or 4 Incident Codes associated with it.
I want to be able to count the number of times each combination of 2, 3, or 4 Incident Codes appears in the entire dataset.
For example:
Incident Codes 4134, 15300: x amount of times
Incident Codes 7053, 65010, 65050: x amount of times
Incident Codes 5043, 6381, 7041, 16650: x amount of times
I apologize if I've given a poor explanation - this is my first post on SO and quite frankly I don't know how to best communicate this question.
I don't know what SQL to run to get my answer. The closest I've come is the post "Select combination of two columns, and count occurrences of this combination", but there the data is already split into two columns, and mine is not.
My thought is to split the additional codes into separate columns, but perhaps there is a way to have the query do the calculation without that.
I appreciate any and all input you may be able to give!
Let's suppose your table is named "TableX". I think this query should be close to what you need:
Select T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode, Count(1) AS AmountOfTimes
From TableX T1
Join TableX T2 ON T2.IncidentID = T1.IncidentID AND
T2.IncidentCode <> T1.IncidentCode
Left Join TableX T3 ON T3.IncidentID = T1.IncidentID AND
T3.IncidentCode <> T1.IncidentCode AND
T3.IncidentCode <> T2.IncidentCode
Left Join TableX T4 ON T4.IncidentID = T1.IncidentID AND
T4.IncidentCode <> T1.IncidentCode AND
T4.IncidentCode <> T2.IncidentCode AND
T4.IncidentCode <> T3.IncidentCode
Group By T1.IncidentCode, T2.IncidentCode, T3.IncidentCode, T4.IncidentCode
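One way to keep the self-join idea but count each set of codes only once, whatever order it was entered in, is to force the codes into ascending order (`>` instead of `<>`) and then keep only the chain that uses all of the incident's codes. This is a variant, not the query above. The sketch runs it with Python's sqlite3 module against the sample codes from the question plus one hypothetical extra incident (700000), and it assumes no incident lists the same code twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableX (IncidentID INTEGER, IncidentCode INTEGER)")
# Sample codes from the question, plus a hypothetical incident 700000 that repeats
# the 4134/15300 pair with the codes entered in the opposite order.
conn.executemany("INSERT INTO TableX VALUES (?, ?)", [
    (618691, 4134), (618691, 15300),
    (618701, 7053), (618701, 65010), (618701, 65050),
    (626010, 5043), (626010, 6381), (626010, 7041), (626010, 16650),
    (700000, 15300), (700000, 4134),
])

query = """
WITH n AS (
    SELECT IncidentID, COUNT(*) AS k FROM TableX GROUP BY IncidentID
)
SELECT c1, c2, c3, c4, COUNT(*) AS AmountOfTimes
FROM (
    -- Strictly ascending codes, so each set is produced in one order only.
    SELECT T1.IncidentID, T1.IncidentCode AS c1, T2.IncidentCode AS c2,
           T3.IncidentCode AS c3, T4.IncidentCode AS c4
    FROM TableX T1
    JOIN TableX T2 ON T2.IncidentID = T1.IncidentID AND T2.IncidentCode > T1.IncidentCode
    LEFT JOIN TableX T3 ON T3.IncidentID = T1.IncidentID AND T3.IncidentCode > T2.IncidentCode
    LEFT JOIN TableX T4 ON T4.IncidentID = T1.IncidentID AND T4.IncidentCode > T3.IncidentCode
) chains
JOIN n ON n.IncidentID = chains.IncidentID
-- Keep only the chain that uses every code of the incident (drops partial subsets).
WHERE 2 + CASE WHEN c3 IS NULL THEN 0 ELSE 1 END
        + CASE WHEN c4 IS NULL THEN 0 ELSE 1 END = n.k
GROUP BY c1, c2, c3, c4
ORDER BY AmountOfTimes DESC, c1
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both incidents with the 4134/15300 pair land in the same row even though the codes were entered in different orders.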
You would probably be better off NOT trying to get all three example code sets (2, 3 and 4 codes) in one query, and here is why. Let's say one officer enters their data as codes 1, 2, 3, another enters codes 3, 1, 2, and yet another enters 2, 3, 1. They are all the same "set" of codes, just in a different order. If you rely on the first code being the same, you would get 3 different rows showing the same thing, each with a count of 1.
You would be better served by running a separate query for each set, with a WHERE and HAVING clause based on just the codes in the "set" you are interested in. Something simple like
select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
group by
YT.IncidentID
having
count(*) = 2
This returns every incident that has BOTH codes, even if the incident is also associated with a 3rd and/or 4th code. The number of rows returned IS your count.
So take your codes of interest, e.g. 1 and 2: each incident can carry up to 2 more codes, which adds 30+ possible combinations for codes 3 and 4. If you don't care about those "extra" codes, they do not throw off your count for the precise codes you are looking for.
Then, to get the counts for your other "what if" scenarios, all you have to do is change the IN list and change the HAVING count to match the number of codes. Since you are only filtering on the specific codes in question, you keep only the incidents that have all of them, regardless of any extra incident codes.
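To see that the `WHERE ... IN` / `HAVING count(*) = N` pattern ignores both entry order and extra codes, here is a small sketch with Python's sqlite3 module. The incidents are hypothetical; the query has the same shape as the one above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (IncidentID INTEGER, IncidentCode INTEGER)")
# Hypothetical rows: incident 1 has the pair in one order, incident 2 in the other
# order plus an extra code, incident 3 has only one of the two codes.
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [
    (1, 4134), (1, 15300),
    (2, 15300), (2, 7053), (2, 4134),
    (3, 4134), (3, 65010),
])

# Keep only the codes of interest, then require all of them per incident.
query = """
SELECT YT.IncidentID, COUNT(*) AS HowMany
FROM YourTable YT
WHERE YT.IncidentCode IN (4134, 15300)
GROUP BY YT.IncidentID
HAVING COUNT(*) = 2
ORDER BY YT.IncidentID
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Incidents 1 and 2 both qualify; incident 3 is dropped because only one of the two codes survives the IN filter. This relies on an incident never listing the same code twice.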
where
YT.IncidentCode in ( 7053, 65010, 65050 )
group by
YT.IncidentID
having
count(*) = 3
and for the 4-code set:
where
YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
group by
YT.IncidentID
having
count(*) = 4
Now, if you only really care about the final count for each set, just wrap the query in one more select that counts the rows returned, such as
select
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
group by
YT.IncidentID
having
count(*) = 2 ) PreQualified
Then, if you want to run this over a time period (for example, using the date of the incident) and keep running the same counts, you can expand it with a UNION of one query per set, like this:
select
'Assault and Offenses against Family and Children' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 4134, 15300 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 2 ) PreQualified
UNION
select
'Vehicle Impound, Traffic Arrest, Other Misc' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 7053, 65010, 65050 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 3 ) PreQualified
UNION
select
'Burglary, Theft, Drugs and Vehicle Recovery' as Activity,
count(*) NumberOfIncidents
from
( select
YT.IncidentID,
count(*) HowMany
from
YourTable YT
where
YT.IncidentCode in ( 5043, 6381, 7041, 16650 )
AND WhateverDateFilters...
group by
YT.IncidentID
having
count(*) = 4 ) PreQualified
Notice that each query in the UNION returns the same number and order of columns. So it returns a list (in this case) of 3 rows, each with a description and a count per category, regardless of the physical order in which the incident codes were entered, and even if extra 3rd and 4th codes were entered when you are only looking for a 2-code set.
Sometimes a generic query (as in the left-join sample) is OK, and there is nothing wrong with it, but ask yourself how much flexibility you need and whether you want to drill into each permutation just to get your final numbers.
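If the list of code sets keeps growing, the per-set queries can also be generated instead of hand-writing one UNION branch each. A minimal sketch with Python's sqlite3 module: the helper name `count_incidents_with_all` is hypothetical, the date filter is left out, and the data is the sample from the question plus one made-up incident (700000).

```python
import sqlite3

def count_incidents_with_all(conn, codes):
    """Count incidents that carry every code in `codes` (extra codes are allowed)."""
    placeholders = ", ".join("?" for _ in codes)
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT IncidentID
            FROM YourTable
            WHERE IncidentCode IN ({placeholders})
            GROUP BY IncidentID
            HAVING COUNT(*) = ?
        ) PreQualified
    """
    # Bind the codes first, then the required count for the HAVING clause.
    return conn.execute(sql, (*codes, len(codes))).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (IncidentID INTEGER, IncidentCode INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [
    (618691, 4134), (618691, 15300),
    (618701, 7053), (618701, 65010), (618701, 65050),
    (626010, 5043), (626010, 6381), (626010, 7041), (626010, 16650),
    # Hypothetical extra incident: same pair in reverse order, plus one unrelated code.
    (700000, 15300), (700000, 4134), (700000, 65010),
])

activities = [
    ("Assault and Offenses against Family and Children", (4134, 15300)),
    ("Vehicle Impound, Traffic Arrest, Other Misc", (7053, 65010, 65050)),
    ("Burglary, Theft, Drugs and Vehicle Recovery", (5043, 6381, 7041, 16650)),
]
report = {label: count_incidents_with_all(conn, codes) for label, codes in activities}
for label, n in report.items():
    print(f"{label}: {n}")
```

Binding the codes as parameters rather than pasting them into the SQL text keeps the generated statement safe and lets the HAVING count follow the size of each set automatically.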

Check if 20% players of a tournament has played 8 or more rounds in previous year - Query Optimization - BigQuery

So I have a table "tournaments" with the following attributes/columns:
player_name
tournament_name
round_1, round_2, ......, round_10
round_1_date, round_2_date, ......, round_10_date
Other columns
I want to check whether at least 20% of the players in a particular tournament played 8 or more rounds in the previous year. If a player played a round, that round's column holds a score; otherwise it is null.
For example, if a player played rounds 1 and 2, then round_1 and round_2 hold scores and round_1_date and round_2_date are populated; all the other round columns are null.
(The tournament's start date can be taken as round_1_date.)
I have written the following query, which gives accurate results, but I believe there is a better, more optimized approach that would take less time, since this query will run many times in a SQL loop over a large dataset. The query returns true or false.
SELECT
(CAST((total_players_meeting_threshold_criteria/COUNT(t4.player_name)*100>=20) AS string)) AS result
FROM (
SELECT
COUNT(*) total_players_meeting_threshold_criteria,
FROM (
SELECT
(player_name),
COUNT(round_1)+COUNT(round_2)+COUNT(round_3)+COUNT(round_4)+COUNT(round_5)+COUNT(round_6)+COUNT(round_7)+COUNT(round_8)+COUNT(round_9)+COUNT(round_10) AS total_rounds_played,
FROM
`Golf_DB_Women_Dataset.tournaments` t2
WHERE
player_name IN (
SELECT
DISTINCT player_name,
FROM
`tournaments` t1
WHERE
tournament_name = 'Tournament of Champions')
AND t2.round_1_date BETWEEN DATE_SUB((
SELECT
round_1_date
FROM
`tournaments`
WHERE
tournament_name = 'Tournament of Champions'
LIMIT
1), INTERVAL 1 YEAR)
AND DATE_SUB((
SELECT
round_1_date
FROM
`tournaments`
WHERE
tournament_name = 'Tournament of Champions'
LIMIT
1), INTERVAL 1 DAY)
GROUP BY
player_name
HAVING
total_rounds_played >= 8
ORDER BY
player_name) ) AS threshold_result,
`tournaments` t4
WHERE
t4.tournament_name = 'Tournament of Champions'
GROUP BY
total_players_meeting_threshold_criteria
Any help will be highly appreciated.
Thank you
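One common way to make this cheaper is to look up the tournament's start date and its field of players once, in CTEs, instead of repeating the `LIMIT 1` subqueries, and then do a single grouped pass over the previous year. The sketch below shows that structure with Python's sqlite3 module rather than BigQuery, so the date arithmetic (`date(d, '-1 year')`) and counting non-null rounds with `(round_n IS NOT NULL)` are SQLite-specific; in BigQuery the date window would use `DATE_SUB` as in your query. It assumes all rows of the target tournament share one round_1_date (hence `MIN`), omits the date columns other than round_1_date, and uses made-up players and events.

```python
import sqlite3

ROUNDS = [f"round_{i}" for i in range(1, 11)]
TARGET = "Tournament of Champions"

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed schema: per-round date columns other than round_1_date are omitted.
conn.execute(
    "CREATE TABLE tournaments (player_name TEXT, tournament_name TEXT, "
    + ", ".join(f"{r} INTEGER" for r in ROUNDS)
    + ", round_1_date TEXT)"
)

def add(player, tournament, rounds_played, start):
    """Insert one row with `rounds_played` scored rounds and NULLs for the rest."""
    scores = [70] * rounds_played + [None] * (10 - rounds_played)
    conn.execute(f"INSERT INTO tournaments VALUES ({', '.join('?' * 13)})",
                 (player, tournament, *scores, start))

for p in "ABCDE":                        # five players in the target field
    add(p, TARGET, 4, "2021-01-20")
add("A", "Event X", 4, "2020-03-01")     # A: 4 + 4 = 8 rounds in the prior year
add("A", "Event Y", 4, "2020-09-01")
add("B", "Event X", 4, "2020-03-01")     # B: only 4 rounds
add("C", "Event Z", 8, "2019-06-01")     # C: 8 rounds, but more than a year earlier
add("F", "Event X", 10, "2020-03-01")    # F: not in the target field

rounds_played = " + ".join(f"({r} IS NOT NULL)" for r in ROUNDS)
query = f"""
WITH start AS (
    SELECT MIN(round_1_date) AS d FROM tournaments WHERE tournament_name = :name
),
field AS (
    SELECT DISTINCT player_name FROM tournaments WHERE tournament_name = :name
),
qualified AS (
    SELECT t.player_name
    FROM tournaments t
    JOIN field f ON f.player_name = t.player_name
    CROSS JOIN start s
    WHERE t.round_1_date >= date(s.d, '-1 year') AND t.round_1_date < s.d
    GROUP BY t.player_name
    HAVING SUM({rounds_played}) >= 8
)
SELECT pct, pct >= 20
FROM (SELECT 100.0 * (SELECT COUNT(*) FROM qualified)
                   / (SELECT COUNT(*) FROM field) AS pct)
"""
pct, meets = conn.execute(query, {"name": TARGET}).fetchone()
print(pct, bool(meets))
```

Only player A qualifies, so 1 of 5 players (20%) meets the threshold. The window `>= start - 1 year AND < start` matches your `BETWEEN ... AND start - 1 day` for date values.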

SQL MAx function with multiple columns showing in apex?

I am trying to write an SQL query that shows the year, player name, and PPG of the player with the highest PPG in each year in our database.
We have a Players table with all the stats, and a team table with stats linked to each season as a team total.
What I want to do is get the highest scorer from each season so:
2010: Jake 10 ppg
2011: Jake 12 ppg
2012: Carl 13 ppg
Etc.
Here is my current query:
SELECT Year, PlayerName, MAX(PPG) AS PPG
FROM PLAYERS_T, TEAM_T
GROUP BY Year
ORDER BY PPG;
However, this is not working. What do I need to do to make it work?
This should work, but it will return duplicate rows when two players have the same PPG in a year. I don't know what the Team table is used for there.
WITH PLAYERS_T as (
SELECT 2010 "Year", 'Jake' "PlayerName", 10 ppg
UNION
SELECT 2011 "Year", 'Jake' "PlayerName", 12 ppg
UNION
SELECT 2012 "Year", 'Carl' "PlayerName", 13 ppg
)
SELECT T1."Year", T1."PlayerName", T1.PPG
FROM PLAYERS_T T1
LEFT JOIN PLAYERS_T T2
ON T1."Year" = T2."Year"
AND T1.PPG < T2.PPG
WHERE T2."Year" IS NULL
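To see the tie behaviour mentioned above, here is the same anti-join run with Python's sqlite3 module on a made-up PLAYERS_T table in which two players share the top PPG in 2011. The table and quoted column names follow the demo; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE PLAYERS_T ("Year" INTEGER, "PlayerName" TEXT, PPG REAL)')
conn.executemany("INSERT INTO PLAYERS_T VALUES (?, ?, ?)", [
    (2010, "Jake", 10), (2010, "Carl", 8),
    (2011, "Jake", 12), (2011, "Carl", 12),   # hypothetical tie in 2011
    (2012, "Carl", 13), (2012, "Jake", 9),
])

query = '''
SELECT T1."Year", T1."PlayerName", T1.PPG
FROM PLAYERS_T T1
LEFT JOIN PLAYERS_T T2
  ON T1."Year" = T2."Year" AND T1.PPG < T2.PPG
WHERE T2."Year" IS NULL          -- nobody in the same year scored more
ORDER BY T1."Year", T1."PlayerName"
'''
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Both 2011 players come back because neither has a higher-scoring teammate that year; if only one row per year is wanted, a tie-breaker has to be added to the join condition.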
Try this one:
SELECT players_T.playername, players_T.ppg, players_T.year
FROM
(SELECT year, MAX(PPG) AS mx
FROM players_T
GROUP BY year) sub
INNER JOIN players_T ON sub.mx = players_T.ppg
WHERE sub.year = players_T.year
ORDER BY players_T.year
The subquery finds the max PPG per year. We then join it with the players table on the PPG to find the player name, and the WHERE clause makes sure the year matches too. The result should be the player name, PPG and year together. Let me know what you find!
Edit: added the WHERE clause on year.
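A quick check of this join-to-the-per-year-max approach with Python's sqlite3 module, on hypothetical rows. Jake's 2012 PPG is deliberately equal to 2010's max, so the year check matters, and 2011 contains a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players_T (year INTEGER, playername TEXT, ppg REAL)")
conn.executemany("INSERT INTO players_T VALUES (?, ?, ?)", [
    (2010, "Jake", 10), (2010, "Carl", 8),
    (2011, "Jake", 12), (2011, "Carl", 12),   # hypothetical tie
    (2012, "Carl", 13), (2012, "Jake", 10),   # would match 2010's max without the year check
])

query = """
SELECT players_T.playername, players_T.ppg, players_T.year
FROM (SELECT year, MAX(ppg) AS mx FROM players_T GROUP BY year) sub
INNER JOIN players_T ON sub.mx = players_T.ppg
WHERE sub.year = players_T.year          -- the max must come from the same year
ORDER BY players_T.year, players_T.playername
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

For an inner join, putting the year condition in the WHERE clause or in the ON clause gives the same result; like the anti-join, this returns every player tied for a year's max.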

SQL - Get consecutively minimum numbers

Title may not make sense so I will provide some context.
I have a table, call it Movies.
A movie tuple has the values: Name, Director, Genre, Year
I'm trying to create a query that returns all directors who have never released two consecutive horror films more than 4 years apart.
I'm not sure where to begin, but I'm trying to start by creating a query that, given a specific year, returns the next smallest year, so that I can check whether the difference between the two is no more than 4, and keep doing that for all movies.
My attempt was:
SELECT D1.Director
FROM Movies D1
WHERE D1.Director NOT IN
(SELECT D2.Director FROM Director D2
WHERE D2.Director = D1.Director
AND D2.Genre = 'Horror'
AND D1.Genre = 'Horror' AND D2.Year - D1.Year > 4
OR D1.Year - D2.Year > 4)
which does not work for obvious reasons.
I've also made a few attempts using joins; they work for films that follow a pattern such as 2000, 2003, 2006, but fail when there are more than 3 films.
You could try this:
Select all the data, and use LAG or LEAD to return the previous or next year. Then look at the difference between the two.
WITH TempTable AS (
SELECT
Name,
Director,
Year,
-- previous horror release by the same director (NULL for the first one)
LAG(Year) OVER (PARTITION BY Director ORDER BY Year ASC) AS PriorYear
FROM
Movies
WHERE
Genre = 'Horror'
)
SELECT
Director
FROM
TempTable
GROUP BY
Director
HAVING
-- largest gap between consecutive horror films; a single film has no gap
COALESCE(MAX(Year - PriorYear), 0) <= 4
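A small sketch of the LAG approach with Python's sqlite3 module (window functions need SQLite 3.25 or later), on hypothetical rows. Partitioning by Director alone matters: partitioning by Name as well would put every film in its own partition, so PriorYear would always be NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (Name TEXT, Director TEXT, Genre TEXT, Year INTEGER)")
conn.executemany("INSERT INTO Movies VALUES (?, ?, ?, ?)", [
    # Hypothetical rows: Ann's horror films are 3 years apart,
    # Bob has a 6-year gap between horror films, Cy has a single horror film.
    ("A1", "Ann", "Horror", 2000), ("A2", "Ann", "Horror", 2003),
    ("A3", "Ann", "Horror", 2006), ("A4", "Ann", "Horror", 2009),
    ("B1", "Bob", "Horror", 2000), ("B2", "Bob", "Comedy", 2003),
    ("B3", "Bob", "Horror", 2006),
    ("C1", "Cy", "Horror", 2010),
])

query = """
WITH TempTable AS (
    SELECT Director, Year,
           -- previous horror release by the same director (NULL for the first one)
           LAG(Year) OVER (PARTITION BY Director ORDER BY Year) AS PriorYear
    FROM Movies
    WHERE Genre = 'Horror'
)
SELECT Director
FROM TempTable
GROUP BY Director
HAVING COALESCE(MAX(Year - PriorYear), 0) <= 4
ORDER BY Director
"""
directors = [d for (d,) in conn.execute(query)]
print(directors)
```

Bob's comedy is filtered out before LAG runs, so his horror gap is 6 years and he is excluded. Cy qualifies because a single film has no gap; drop the COALESCE if such directors should be left out.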
Try this:
SELECT director FROM (
SELECT m1.director, m1.name, min(m2.year - m1.year) as gap_to_next
FROM movies m1, movies m2
WHERE m1.director = m2.director and m1.name <> m2.name and m1.year <= m2.year
and m1.genre = 'Horror' and m2.genre = 'Horror'
group by m1.director, m1.name
) d1
group by director
having max(gap_to_next) <= 4
First, the inner SELECT pairs each of a director's horror movies with every later (or same-year) horror movie by that director and computes the year difference; taking the minimum per movie gives the gap to the next (consecutive) film. The outer query then keeps the directors whose largest consecutive gap is at most 4 years.
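The pairwise self-join idea can be checked the same way. This sketch (Python's sqlite3 module, hypothetical rows) computes each horror film's gap to the same director's next horror film and keeps directors whose largest gap is at most 4 years. A director with only one horror film has no pair, so they drop out of this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (name TEXT, director TEXT, genre TEXT, year INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    # Hypothetical rows: Ann's gaps are 3 years, Bob has a 6-year horror gap,
    # Cy has a single horror film.
    ("A1", "Ann", "Horror", 2000), ("A2", "Ann", "Horror", 2003),
    ("A3", "Ann", "Horror", 2006), ("A4", "Ann", "Horror", 2009),
    ("B1", "Bob", "Horror", 2000), ("B2", "Bob", "Comedy", 2003),
    ("B3", "Bob", "Horror", 2006),
    ("C1", "Cy", "Horror", 2010),
])

query = """
SELECT director FROM (
    -- gap from each horror film to the same director's next horror film
    SELECT m1.director, m1.name, MIN(m2.year - m1.year) AS gap_to_next
    FROM movies m1
    JOIN movies m2
      ON m1.director = m2.director AND m1.name <> m2.name AND m1.year <= m2.year
    WHERE m1.genre = 'Horror' AND m2.genre = 'Horror'
    GROUP BY m1.director, m1.name
) d1
GROUP BY director
HAVING MAX(gap_to_next) <= 4
ORDER BY director
"""
directors = [d for (d,) in conn.execute(query)]
print(directors)
```

The self-join compares every pair of a director's horror films, so it grows quadratically with the number of films per director; the window-function version needs a single sorted pass instead.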

Top N results grouped Oracle SQL

I want to write a query that allows me to only get the specific data I want and nothing more.
We will use TVs as an example. I have three brands of TVs and I want to see the top ten selling models of each brand, so I only want 30 rows back. One solution is UNIONs, but that can get messy fast. Ideally there would be something like WHERE ROWNUM applied per group.
SELECT
A.Brand
, A.Model
, A.Sales
FROM
( SELECT
TV.Brand
, TV.Model
, SUM(TV.SALES) AS SALES
FROM TV_TABLE as TV
ORDER BY
TV.Brand
, SALES DESC
) A
WHERE ROWNUM <10
With the code above I get the top results of the inner query overall, not 10 from each grouping.
What I want to see is something like this:
Brand: Model: Sales
Sony: x10: 20
Sony: X20: 18
Sony: X30: 10
VISIO: A40: 40
VISIO: A20: 10
This is an oversimplified example; in practice I'll need 20-50 groupings and would like to avoid downloading all of the data and using a pivot.
select Brand, Model, SALES
from(
select Brand, Model, SALES,row_number()over(partition by Brand order by SALES desc) rn
from (
SELECT TV.Brand, TV.Model, SUM(TV.SALES) AS SALES
FROM TV_TABLE TV
group BY TV.Brand,TV.Model
)a
)b
where rn <= 10
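A small sketch of the ROW_NUMBER approach with Python's sqlite3 module (window functions need SQLite 3.25 or later), on made-up sales rows and with N = 2 instead of 10 so the output stays short. The query is written to run in SQLite; the same shape works in Oracle, where the table alias must be written without AS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TV_TABLE (Brand TEXT, Model TEXT, Sales INTEGER)")
# Made-up rows; X10 appears twice to show that sales are summed per model first.
conn.executemany("INSERT INTO TV_TABLE VALUES (?, ?, ?)", [
    ("Sony", "X10", 12), ("Sony", "X10", 8), ("Sony", "X20", 18),
    ("Sony", "X30", 10), ("VISIO", "A40", 40), ("VISIO", "A20", 10),
    ("LG", "L1", 5),
])

query = """
SELECT Brand, Model, Sales
FROM (
    -- number the models within each brand, best seller first
    SELECT Brand, Model, Sales,
           ROW_NUMBER() OVER (PARTITION BY Brand ORDER BY Sales DESC) AS rn
    FROM (SELECT Brand, Model, SUM(Sales) AS Sales
          FROM TV_TABLE
          GROUP BY Brand, Model) a
) b
WHERE rn <= :n
ORDER BY Brand, rn
"""
rows = conn.execute(query, {"n": 2}).fetchall()
for row in rows:
    print(row)
```

Each brand contributes at most N rows, so 20-50 groupings just means 20-50 partitions in the same single query. ROW_NUMBER breaks ties arbitrarily; RANK would keep every model tied at the cut-off.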
SELECT TV.Brand, TV.Model, SUM(TV.SALES) AS SALES
FROM TV_TABLE TV
group by TV.Brand, TV.Model
order by SUM(TV.SALES) desc, TV.Brand
fetch first 30 rows only