Query associated MAX column with all its fields in Postgres

I have three database tables:
car (id)
speed (id, actual_speed, car_id, gear_id)
gear (id)
I would like to select the max speed of every car and the gear in which it reaches that speed. I came up with the following query:
SELECT MAX(speed.actual_speed)
FROM car
INNER JOIN speed ON car.id = speed.car_id
GROUP BY car.id;
This query works but doesn't return the gear. If I add gear_id to the select list (SELECT MAX(speed.actual_speed), speed.gear_id), the database complains that gear_id must appear in the GROUP BY clause or be used in an aggregate function.
But if I add it to the GROUP BY (GROUP BY car.id, speed.gear_id), the query returns the max speed for every gear, which I'm not interested in.
Is there maybe a way to get back all cars with their max speed and the gear they achieve it in?

A simple and portable solution uses a correlated subquery:
select s.*
from speed s
where s.actual_speed = (select max(s1.actual_speed) from speed s1 where s1.car_id = s.car_id)
This would benefit from an index on (car_id, actual_speed).
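A minimal sketch of the correlated-subquery approach, run against an in-memory SQLite database as a stand-in for Postgres. The sample rows and the index name `speed_car_speed` are made up. Note that a car reaching its top speed in two different gears comes back as two rows.

```python
import sqlite3

# In-memory SQLite stands in for Postgres; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE speed (
        id INTEGER PRIMARY KEY,
        actual_speed INTEGER,
        car_id INTEGER,
        gear_id INTEGER
    );
    -- Hypothetical index name; supports the per-car MAX lookup.
    CREATE INDEX speed_car_speed ON speed (car_id, actual_speed);
    INSERT INTO speed (actual_speed, car_id, gear_id) VALUES
        (120, 1, 4), (150, 1, 5),
        (90, 2, 3), (110, 2, 4), (110, 2, 5);  -- car 2 ties in gears 4 and 5
""")

# Keep every row whose speed equals the maximum recorded for the same car.
rows = conn.execute("""
    SELECT s.car_id, s.actual_speed, s.gear_id
    FROM speed s
    WHERE s.actual_speed = (SELECT MAX(s1.actual_speed)
                            FROM speed s1
                            WHERE s1.car_id = s.car_id)
    ORDER BY s.car_id, s.gear_id
""").fetchall()
print(rows)  # [(1, 150, 5), (2, 110, 4), (2, 110, 5)]
```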
In Postgres, I would recommend distinct on:
select distinct on (car_id) s.*
from speed s
order by car_id, actual_speed desc
Or, if you want to allow ties:
select *
from (
select s.*, rank() over(partition by car_id order by actual_speed desc) rn
from speed s
) s
where rn = 1
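To see how the two Postgres variants differ on ties, here is a sketch in SQLite with made-up data. SQLite has no DISTINCT ON, so ROW_NUMBER() with a gear_id tie-breaker is swapped in to pick exactly one row per car the way DISTINCT ON would; RANK() keeps both gears where car 2 ties. The helper `top_rows` is hypothetical and interpolates trusted SQL text only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE speed (id INTEGER PRIMARY KEY, actual_speed INTEGER,
                        car_id INTEGER, gear_id INTEGER);
    INSERT INTO speed (actual_speed, car_id, gear_id) VALUES
        (120, 1, 4), (150, 1, 5), (90, 2, 3), (110, 2, 4), (110, 2, 5);
""")

def top_rows(window_expr):
    """Return the rows ranked 1 within each car by the given window expression."""
    return conn.execute(f"""
        SELECT car_id, actual_speed, gear_id
        FROM (
            SELECT s.*, {window_expr} AS rn
            FROM speed s
        ) ranked
        WHERE rn = 1
        ORDER BY car_id, gear_id
    """).fetchall()

# RANK(): tied rows share rank 1, so ties are kept.
with_ties = top_rows("RANK() OVER (PARTITION BY car_id ORDER BY actual_speed DESC)")
# ROW_NUMBER(): exactly one row per car, like DISTINCT ON (gear_id breaks the tie).
one_per_car = top_rows(
    "ROW_NUMBER() OVER (PARTITION BY car_id ORDER BY actual_speed DESC, gear_id)"
)
print(with_ties)    # [(1, 150, 5), (2, 110, 4), (2, 110, 5)]
print(one_per_car)  # [(1, 150, 5), (2, 110, 4)]
```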

Related

How to work around correlated subqueries that reference other tables, without using a JOIN

I am working with the BigQuery public dataset bigquery-public-data.austin_crime.crime. My goal is to output three columns: the description (of the crime), its count, and the top district for that particular description (crime).
I am able to get the first two columns with this query.
select
a.description,
count(*) as district_count
from `bigquery-public-data.austin_crime.crime` a
group by description order by district_count desc
I was hoping to get it all done in one query, so I tried the following to add a third column showing the top district for each description (crime):
select
a.description,
count(*) as district_count,
(
select district from
( select
district, rank() over(order by COUNT(*) desc) as rank
FROM `bigquery-public-data.austin_crime.crime`
where description = a.description
group by district
) where rank = 1
) as top_District
from `bigquery-public-data.austin_crime.crime` a
group by description
order by district_count desc
The error I am getting is: "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN."
I think I could do this with joins. Does anyone have a better solution, possibly one that avoids a join?
Below is for BigQuery Standard SQL
#standardSQL
SELECT description,
ANY_VALUE(district_count) AS district_count,
STRING_AGG(district ORDER BY cnt DESC LIMIT 1) AS top_district
FROM (
SELECT description, district,
COUNT(1) OVER(PARTITION BY description) AS district_count,
COUNT(1) OVER(PARTITION BY description, district) AS cnt
FROM `bigquery-public-data.austin_crime.crime`
)
GROUP BY description
-- ORDER BY district_count DESC
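SQLite has no STRING_AGG(... ORDER BY ... LIMIT 1), so this sketch swaps in ROW_NUMBER() over pre-aggregated counts to produce the same three columns without a correlated subquery. The `crime` table and its rows are made-up stand-ins for the public dataset. District ties are broken alphabetically here, which is an assumption; the BigQuery query picks any one of the tied districts.

```python
import sqlite3

# Made-up rows standing in for bigquery-public-data.austin_crime.crime.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crime (description TEXT, district TEXT)")
conn.executemany("INSERT INTO crime VALUES (?, ?)", [
    ("THEFT", "A"), ("THEFT", "B"), ("THEFT", "B"), ("THEFT", "C"),
    ("BURGLARY", "C"), ("BURGLARY", "C"), ("BURGLARY", "A"),
    ("ASSAULT", "D"),
])

rows = conn.execute("""
    WITH counts AS (              -- one row per (description, district)
        SELECT description, district, COUNT(*) AS cnt
        FROM crime
        GROUP BY description, district
    ), ranked AS (
        SELECT description, district,
               SUM(cnt) OVER (PARTITION BY description) AS district_count,
               ROW_NUMBER() OVER (PARTITION BY description
                                  ORDER BY cnt DESC, district) AS rn
        FROM counts
    )
    SELECT description, district_count, district AS top_district
    FROM ranked
    WHERE rn = 1
    ORDER BY district_count DESC, description
""").fetchall()
print(rows)  # [('THEFT', 4, 'B'), ('BURGLARY', 3, 'C'), ('ASSAULT', 1, 'D')]
```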

SQL max trophy count

I created a basketball database in SQL. My teacher gave me a task: print the basketball players from my database with the max trophy count. So I wrote this little bit of code:
select surname ,count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on player_id=p.id
group by p.surname
SQL returned every player with their trophy count, but I want it to print only the player(s) with the highest count.
I read about subqueries (a SELECT inside a SELECT), but I don't understand how they work; I tried, but it didn't work.
Use TOP:
SELECT TOP 1 surname, COUNT(player_id) AS trophy_count -- or TOP 1 WITH TIES
FROM dbo.Players p
LEFT JOIN Trophies t
ON t.player_id = p.id
GROUP BY p.surname
ORDER BY COUNT(player_id) DESC;
If you want to get all ties for the highest count, then use SELECT TOP 1 WITH TIES.
;WITH CTE AS
(
select surname, count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
select *
from CTE
where trophy_count = (select max(trophy_count) from CTE)
While SELECT TOP 1 WITH TIES works (and is probably more efficient), I would say this code is more useful in the real world, because a very simple modification lets it find the max, min, or a specific trophy count.
It does the GROUP BY first and then lets you specify which results you want back. In this instance you can use:
max(trophy_count) - get the maximum
min(trophy_count) - get the minimum
where trophy_count = 3 (instead of the subquery) - get a specific trophy count
avg(trophy_count) - get the average trophy count
There are many others; search for "SQL aggregate functions".
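A quick sketch of the CTE version in SQLite, with the `dbo.` prefix dropped and made-up players and trophies. Swapping the aggregate in the filter switches between max and min, and tied players come back together. The helper `players_where` is hypothetical and interpolates a trusted SQL fragment, so don't pass user input to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE Trophies (id INTEGER PRIMARY KEY, player_id INTEGER);
    INSERT INTO Players (id, surname) VALUES
        (1, 'Jordan'), (2, 'Bird'), (3, 'Johnson'), (4, 'Pippen');
    INSERT INTO Trophies (player_id) VALUES (1), (1), (1), (2), (3), (3), (3);
""")

def players_where(condition):
    """Group first in the CTE, then filter the grouped rows with `condition`."""
    return conn.execute(f"""
        WITH cte AS (
            SELECT p.surname, COUNT(t.player_id) AS trophy_count
            FROM Players p
            LEFT JOIN Trophies t ON t.player_id = p.id
            GROUP BY p.surname
        )
        SELECT surname, trophy_count
        FROM cte
        WHERE {condition}
        ORDER BY surname
    """).fetchall()

most = players_where("trophy_count = (SELECT MAX(trophy_count) FROM cte)")
fewest = players_where("trophy_count = (SELECT MIN(trophy_count) FROM cte)")
print(most)    # [('Johnson', 3), ('Jordan', 3)]
print(fewest)  # [('Pippen', 0)]
```

The LEFT JOIN with COUNT(t.player_id) is what lets a player with no trophies show up with a count of 0.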
You will eventually go down the rabbit hole of needing to subdivide this (for example by week or by league). Then you will want to use window functions with a CTE or subquery.
For your example:
;with cte_base as
(
-- Set your detail here (this step is only needed if you are looking at aggregates)
select p.surname, count(t.player_id) Ct
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
, cte_ranked as
-- Dense_rank is chosen because of ties
-- Add a partition by clause to break out your detail, e.g. partition by league
(
select *
, dr = DENSE_RANK() over (order by Ct desc)
from cte_base
)
select *
from cte_ranked
where dr = 1 -- Bring back only the #1 of each partition
This is overkill, but it helps lay the foundation for handling much more complicated queries. Tim Biegeleisen's answer is more than adequate to answer your question.
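Here is a sketch of where that foundation leads, assuming a hypothetical `league` column on Players: partitioning DENSE_RANK() by league returns the top trophy winner(s) of each league, ties included. SQLite stands in for SQL Server, so the T-SQL `dr = DENSE_RANK() ...` alias style is written as `DENSE_RANK() ... AS dr`; the data is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the league column exists only for this example.
    CREATE TABLE Players (id INTEGER PRIMARY KEY, surname TEXT, league TEXT);
    CREATE TABLE Trophies (id INTEGER PRIMARY KEY, player_id INTEGER);
    INSERT INTO Players VALUES
        (1, 'Jordan', 'NBA'), (2, 'Bird', 'NBA'), (3, 'Johnson', 'NBA'),
        (4, 'Taurasi', 'WNBA'), (5, 'Leslie', 'WNBA');
    INSERT INTO Trophies (player_id) VALUES
        (1), (1), (1), (2), (3), (3), (3), (4), (4), (5);
""")

rows = conn.execute("""
    WITH cte_base AS (
        SELECT p.league, p.surname, COUNT(t.player_id) AS ct
        FROM Players p
        LEFT JOIN Trophies t ON t.player_id = p.id
        GROUP BY p.id, p.league, p.surname
    ), cte_ranked AS (
        -- One ranking per league; DENSE_RANK keeps ties at rank 1.
        SELECT *, DENSE_RANK() OVER (PARTITION BY league ORDER BY ct DESC) AS dr
        FROM cte_base
    )
    SELECT league, surname, ct
    FROM cte_ranked
    WHERE dr = 1
    ORDER BY league, surname
""").fetchall()
print(rows)  # [('NBA', 'Johnson', 3), ('NBA', 'Jordan', 3), ('WNBA', 'Taurasi', 2)]
```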

Pulling max values grouped by a variable with other columns in SQL

Say I have three columns in a very large table: a timestamp (last_time_started), a player name (e.g. Michael Jordan), and the team he was on the last time he started (e.g. Washington Wizards, Chicago Bulls). How do I pull the last time each player started, grouped by player, along with the team? For example, if I did:
select max(last_time_started), player, team
from table
group by 2
I would not know which team the player was on when he played his last game, which is important to me.
In Postgres, the most efficient way is to use DISTINCT ON:
SELECT DISTINCT ON (player)
last_time_started,
player,
team
FROM the_table
ORDER BY player, last_time_started DESC;
Using a window function is usually the second-fastest solution; using a join with a derived table is usually the slowest alternative.
Here are a couple of ways to do this in Postgres:
With windowing functions:
SELECT last_time_started, player, team
FROM
(
SELECT
last_time_started,
player,
team,
CASE WHEN max(last_time_started) OVER (PARTITION BY PLAYER) = last_time_started then 'X' END as max_last_time_started
FROM table
) t
WHERE max_last_time_started = 'X';
Or with a correlated subquery:
SELECT last_time_started, player, team
FROM table t1
WHERE last_time_started = (SELECT max(last_time_started) FROM table WHERE table.player = t1.player);
Try this solution
select s.*
from table s
inner join (
select max(t.last_time_started) as last_time_started, t.player
from table t
group by t.player) v on s.player = v.player and s.last_time_started = v.last_time_started
This approach should also be faster, because it does not need a join:
select v.last_time_started,
v.player,
v.team
from (
select t.last_time_started,
t.player,
t.team,
row_number() over (partition by t.player order by last_time_started desc) as n
from table t
) v
where v.n = 1
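A sketch of the ROW_NUMBER() answer in SQLite. The table is called `starts` because `table` is a reserved word, and the rows and dates are illustrative only. ISO-8601 date strings are assumed so that text ordering matches chronological ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE starts (last_time_started TEXT, player TEXT, team TEXT)")
conn.executemany("INSERT INTO starts VALUES (?, ?, ?)", [
    ("1998-06-14", "Michael Jordan", "Chicago Bulls"),
    ("2003-04-16", "Michael Jordan", "Washington Wizards"),
    ("2016-04-13", "Kobe Bryant", "Los Angeles Lakers"),
])

# Number each player's starts from newest to oldest and keep the newest one,
# so the team comes from the same row as the latest timestamp.
rows = conn.execute("""
    SELECT last_time_started, player, team
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY t.player
                                  ORDER BY t.last_time_started DESC) AS n
        FROM starts t
    ) v
    WHERE v.n = 1
    ORDER BY player
""").fetchall()
print(rows)
# [('2016-04-13', 'Kobe Bryant', 'Los Angeles Lakers'),
#  ('2003-04-16', 'Michael Jordan', 'Washington Wizards')]
```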

Producing n rows per group

It is known that GROUP BY produces one row per group. I want to produce multiple rows per group. The particular use case is, for example, selecting two cheapest offerings for each item.
It is trivial for two or three elements in the group:
select type, variety, price
from fruits
where price = (select min(price) from fruits as f where f.type = fruits.type)
or price = (select min(price) from fruits as f where f.type = fruits.type
and price > (select min(price) from fruits as f2 where f2.type = fruits.type));
(Select n rows per group in mysql)
But I am looking for a query that can show n rows per group, where n is arbitrarily large. In other words, a query that displays 5 rows per group should be convertible to a query that displays 7 rows per group by just replacing some constants in it.
I am not constrained to any DBMS, so I am interested in any solution that runs on any DBMS. It is fine if it uses some non-standard syntax.
For any database that supports analytic functions (window functions), this is relatively easy:
select *
from (select type,
variety,
price,
rank() over ([partition by something]
order by price) rnk
from fruits) rank_subquery
where rnk <= 3
If you omit the [partition by something], you'll get the top three overall rows. If you want the top three for each type, you'd partition by type in your rank() function.
Depending on how you want to handle ties, you may want to use dense_rank() or row_number() rather than rank(). If two rows tie for first, using rank, the next row would have a rnk of 3 while it would have a rnk of 2 with dense_rank. In both cases, both tied rows would have a rnk of 1. row_number would arbitrarily give one of the two tied rows a rnk of 1 and the other a rnk of 2.
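The tie behaviour described above can be checked directly. This sketch runs all three functions side by side in SQLite over made-up prices; a `variety` tie-breaker is added to ROW_NUMBER() only to make its otherwise arbitrary order reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (type TEXT, variety TEXT, price INTEGER)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", [
    ("apple", "fuji", 1), ("apple", "gala", 1),  # tie for first
    ("apple", "limbertwig", 3), ("apple", "granny", 4),
])

rows = conn.execute("""
    SELECT variety, price,
           RANK()       OVER (ORDER BY price)          AS rnk,
           DENSE_RANK() OVER (ORDER BY price)          AS dense_rnk,
           ROW_NUMBER() OVER (ORDER BY price, variety) AS row_num
    FROM fruits
    ORDER BY price, variety
""").fetchall()
for row in rows:
    print(row)
# ('fuji', 1, 1, 1, 1)
# ('gala', 1, 1, 1, 2)
# ('limbertwig', 3, 3, 2, 3)
# ('granny', 4, 4, 3, 4)
```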
To save anyone some time: at the time of writing, this apparently won't work, because MySQL does not support LIMIT in IN subqueries (see https://dev.mysql.com/doc/refman/5.7/en/subquery-restrictions.html).
I've never been a fan of correlated subqueries, as most uses I saw for them could usually be written more simply, but I think this has changed my mind... a little. (This is for MySQL.)
SELECT `type`, `variety`, `price`
FROM `fruits` AS f2
WHERE `price` IN (
SELECT DISTINCT `price`
FROM `fruits` AS f1
WHERE f1.type = f2.type
ORDER BY `price` ASC
LIMIT X
)
;
Where X is the "arbitrary" value you wanted.
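MySQL rejects LIMIT inside an IN subquery, but SQLite accepts it, so this sketch uses SQLite to show what the query computes: every variety priced at one of the X lowest distinct prices of its type, ties included. X is passed as a bound parameter; the function name `cheapest_prices` and the data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (type TEXT, variety TEXT, price INTEGER)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", [
    ("apple", "fuji", 24), ("apple", "granny", 24),   # same price
    ("apple", "gala", 279), ("apple", "limbertwig", 287),
    ("orange", "valencia", 359), ("orange", "navel", 936),
])

def cheapest_prices(x):
    """Varieties whose price is among the x lowest distinct prices of their type."""
    return conn.execute("""
        SELECT type, variety, price
        FROM fruits AS f2
        WHERE price IN (
            SELECT DISTINCT price
            FROM fruits AS f1
            WHERE f1.type = f2.type
            ORDER BY price ASC
            LIMIT ?
        )
        ORDER BY type, price, variety
    """, (x,)).fetchall()

print(cheapest_prices(1))
# [('apple', 'fuji', 24), ('apple', 'granny', 24), ('orange', 'valencia', 359)]
```

Because the subquery limits distinct prices rather than rows, X = 1 can still return two apples.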
If you know how you want to limit further in cases of duplicate prices, and the data permits such limiting ...
SELECT `type`, `variety`, `price`
FROM `fruits` AS f2
WHERE (`price`, `other_identifying_criteria`) IN (
SELECT DISTINCT `price`, `other_identifying_criteria`
FROM `fruits` AS f1
WHERE f1.type = f2.type
ORDER BY `price` ASC, `other_identifying_criteria` [ASC|DESC]
LIMIT X
)
;
"greatest N per group problems" can easily be solved using window functions:
select type, variety, price
from (
select type, variety, price,
dense_rank() over (partition by type order by price) as rnk
from fruits
) t
where rnk <= 5;
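With the window-function query, going from 5 to 7 rows per group only means changing one constant. In this SQLite sketch with made-up data, that constant becomes a bound parameter `n` of a hypothetical helper `n_cheapest_per_type`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (type TEXT, variety TEXT, price INTEGER)")
conn.executemany("INSERT INTO fruits VALUES (?, ?, ?)", [
    ("apple", "fuji", 24), ("apple", "gala", 279), ("apple", "limbertwig", 287),
    ("orange", "valencia", 359), ("orange", "navel", 936),
    ("pear", "bartlett", 214), ("pear", "bradford", 605),
])

def n_cheapest_per_type(n):
    """Up to n price levels per type (dense_rank keeps every row tied on price)."""
    return conn.execute("""
        SELECT type, variety, price
        FROM (
            SELECT type, variety, price,
                   DENSE_RANK() OVER (PARTITION BY type ORDER BY price) AS rnk
            FROM fruits
        ) t
        WHERE rnk <= ?
        ORDER BY type, price
    """, (n,)).fetchall()

print(n_cheapest_per_type(1))
# [('apple', 'fuji', 24), ('orange', 'valencia', 359), ('pear', 'bartlett', 214)]
```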
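If window functions aren't an option, CROSS APPLY also works. (Ranking functions such as ROW_NUMBER() and DENSE_RANK() are already available in SQL Server 2005; SQL Server 2012 added window frames for aggregates.) Try this out: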
SQL Server 2005 and Above Solution
DECLARE @yourTable TABLE(Category VARCHAR(50), SubCategory VARCHAR(50), price INT)
INSERT INTO @yourTable
VALUES ('Meat','Steak',1),
('Meat','Chicken Wings',3),
('Meat','Lamb Chops',5);
DECLARE @n INT = 2;
SELECT DISTINCT Category,CA.SubCategory,CA.price
FROM @yourTable A
CROSS APPLY
(
SELECT TOP (@n) SubCategory,price
FROM @yourTable B
WHERE A.Category = B.Category
ORDER BY price DESC
) CA
Result (the two highest-priced subcategories per category):
Category   SubCategory     price
---------- --------------- -----
Meat       Chicken Wings   3
Meat       Lamb Chops      5

Query in sql to get the top 10 percent in standard sql (without limit, top and the likes, without window functions)

I'm wondering how to retrieve the top 10% of athletes in terms of points without using clauses such as TOP, LIMIT, etc.; just a plain SQL query.
My idea so far:
Table Layout:
Score:
ID | Name | Points
Query:
select *
from Score s
where 0.10 * (select count(*) from Score x) >
(select count(*) from Score p where p.Points > s.Points)
Is there an easier way to do this? Any suggestions?
In most databases, you would use the ANSI standard window functions:
select s.*
from (select s.*,
count(*) over () as cnt,
row_number() over (order by points desc) as seqnum
from Score s
) s
where seqnum*10 <= cnt;
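A sketch of the window-function approach on 20 made-up athletes in SQLite (window functions need SQLite 3.25 or later). It orders by points descending so that the first 10% of rows (here 2) are the highest scorers, and uses `<=` so that, for example, exactly 1 of 10 athletes qualifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Score (ID INTEGER PRIMARY KEY, Name TEXT, Points INTEGER)")
conn.executemany(
    "INSERT INTO Score VALUES (?, ?, ?)",
    [(i, f"athlete{i}", i * 10) for i in range(1, 21)],  # 20 made-up athletes
)

rows = conn.execute("""
    SELECT ID, Name, Points
    FROM (
        SELECT s.*,
               COUNT(*) OVER () AS cnt,                           -- total athletes
               ROW_NUMBER() OVER (ORDER BY Points DESC) AS seqnum -- 1 = best
        FROM Score s
    ) s
    WHERE seqnum * 10 <= cnt
    ORDER BY Points DESC
""").fetchall()
print(rows)  # [(20, 'athlete20', 200), (19, 'athlete19', 190)]
```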
Try:
select s1.id, s1.name, s1.points, count(s2.points)
from score s1
left join score s2 on s2.points > s1.points
group by s1.id, s1.name, s1.points
having count(s2.points) < (select count(*)*.1 from score)
Basically, this counts the players with a higher score than the current one; if that count is less than 10% of the number of all scores, the player is in the top 10%. The LEFT JOIN keeps the top scorer, who has nobody above them.
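The same counting idea as a runnable SQLite sketch on 10 made-up athletes. It uses a LEFT JOIN so the top scorer, who has nobody above them, still produces a row (an inner join would drop that athlete), and a strict `<` so that 10 athletes yield exactly 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE score (id INTEGER PRIMARY KEY, name TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?)",
    [(i, f"athlete{i}", i * 10) for i in range(1, 11)],  # 10 made-up athletes
)

rows = conn.execute("""
    SELECT s1.id, s1.name, s1.points, COUNT(s2.points) AS higher
    FROM score s1
    LEFT JOIN score s2 ON s2.points > s1.points   -- everyone who beat s1
    GROUP BY s1.id, s1.name, s1.points
    HAVING COUNT(s2.points) < (SELECT COUNT(*) * 0.1 FROM score)
    ORDER BY s1.points DESC
""").fetchall()
print(rows)  # [(10, 'athlete10', 100, 0)]
```

The self-join compares every pair of rows, so on large tables the window-function version is likely to scale better.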
The PERCENTILE_DISC function is standard SQL and can help you here. Not every SQL implementation supports it, but the following should work in SQL Server 2012, for example. If you need to be particular about ties, or about what the top 10% means when there are fewer than 10 athletes, make sure this computes what you want. PERCENTILE_CONT may be a better option for some questions.
WITH C(cutoff) AS (
SELECT DISTINCT
PERCENTILE_DISC(0.90)
WITHIN GROUP (ORDER BY points)
OVER ()
FROM Score
)
SELECT *
FROM Score JOIN C
ON points >= cutoff;