Pulling max values grouped by a variable with other columns in SQL - sql

Say I have three columns in a very large table: a timestamp variable (last_time_started), a player name (Michael Jordan), and the team he was on the last time he started (Washington Wizards, Chicago Bulls), how do I pull the last time a player started, grouped by player, showing the team? For example:
if I did
select max(last_time_started), player, team
from table
group by 2
I would not know which team the player was on when he played his last game, which is important to me.

In Postgres the most efficient way is to use distinct on():
SELECT DISTINCT ON (player)
last_time_started,
player,
team,
FROM the_table
ORDER BY player, last_time_started DESC;
Using a window function is usually the second fastest solution, using a join with a derived table is usually the slowest alternative.

Here's a couple of ways to do this in Postgres:
With windowing functions:
SELECT last_time_started, player, team
FROM
(
SELECT
last_time_started,
player,
team,
CASE WHEN max(last_time_started) OVER (PARTITION BY PLAYER) = last_time_started then 'X' END as max_last_time_started
FROM table
)
WHERE max_last_time_started = 'x';
Or with a correlated subquery:
SELECT last_time_started, player, team
FROM table t1
WHERE last_time_started = (SELECT max(last_time_started) FROM table WHERE table.player = t1.player);

Try this solution
select s.*
from table s
inner join (
select max(t.last_time_started) as last_time_started, t.player
from table t
group by t.player) v on s.player = t.player and s.last_time_started = t.last_time_started
Also this approach should be faster, because it does not contain join
select v.last_time_started,
v.player,
v.team
from (
select t.last_time_started,
t.player,
t.team,
row_number() over (partition by t.player order by last_time_started desc) as n
from table t
) v
where v.n = 1

Related

Query associated MAX column with all it's fields in Postgres

I have three database tables:
car
id
speed
id
actual_speed
car_id
gear_id
gear
id
I would like to select max speeds of all cars and the gear they achieve the max speed with. I got to the following query:
SELECT MAX(speed.actual_speed)
FROM car
INNER JOIN speed ON car.id = speed.car_id
GROUP BY car.id;
This query works but doesn't return the gear. If I include gear_id in the select SELECT MAX(speed.actual_speed), speed.gear_id. The database complains that gear_id should be included in group by or aggregate function.
But if I include it in the group by GROUP BY car.id, speed.gear_id the query returns max speeds for all gears which I'm not interested in.
Is there maybe a way to get back all cars with their max speed and the gear they achieve it in?
A simple and portable solution uses a correlated subquery:
select s.*
from speed s
where s.actual_speed = (select max(s1.actual_speed) from speed s1 where s1.car_id = s.car_id)
This would benefit an index on (car_id, actual_speed).
In Postgres, I would recommend distinct on:
select distinct on (car_id) s.*
from speed s
order by car_id, actual_speed desc
Or, if you want to allow ties:
select *
from (
select s.*, rank() over(partition by car_id order by actual_speed desc) rn
from speed s
) s
where rn = 1

SQL query to get max homeruns for each year

In the database we have the table batter, which contains a record for each batter each year they played as well as their game stats for that year. How would I write a query to select each player that has the max number of homeruns out of all players?
I've been trying with
SELECT *
FROM master
WHERE batting.HR = (
SELECT MAX(batting.HR)
FROM batting
)
But have had no luck. How could I do this?
SELECT b.*
FROM batter b
INNER JOIN (
select year, Max(hr) as MaxHR
from batter
group by year
) y on b.year = y.year and b.HR = y.MaxHR
Note this is vulnerable to two players tying. It's not clear what you want to see in that case.
Assuming your RDBMS supports SQL2003, you can use a window function:
SELECT *
FROM
(
SELECT
batter,
year,
hr,
MAX(hr) OVER (PARTITION BY year) AS year_best_hr
FROM batting
)
WHERE hr = year_best_hr

Select entry of each group having exactly 1 entry

I am looking for an optimized query
let me show you a small example.
Lets suppose I have a table having three field studentId, teacherId and subject as
Now I want those data in which a physics teacher is teaching to only one student, i.e
teacher 300 is only teaching student 3 and so on.
What I have tried till now
select sid,tid from tabletesting with(nolock)
where tid in (select tid from tabletesting with(nolock)
where subject='physics' group by tid having count(tid) = 1)
and subject='physics'
The above query is working fine. But I want different solution in which I don't have to scan the same table twice.
I also tried using Rank() and Row_Number() but no result.
FYI :
I have showed you an example, this is not the actual table i am playing with, my table contain huge number of rows and columns and where clause is also very complex(i.e date comparison etc.), so I don't want to give the same where clause in subquery and outquery.
You can do this with window functions. Assuming that there are no duplicate students for a given teacher (as in your sample data):
select tt.sid, tt.tid
from (select tt.*, count(*) over (partition by teacher) as scnt
from TableTesting tt
) tt
where scnt = 1;
Another way to approach this, which might be more efficient, is to use an exists clause:
select tt.sid, tt.tid
from TableTesting tt
where not exists (select 1 from TableTesting tt1 where tt1.tid = tt.tid and tt1.sid <> tt.sid)
Another option is to use an analytic function:
select sid, tid, subject from
(
select sid, tid, subject, count(sid) over (partition by subject, tid) cnt
from tabletesting
) X
where cnt = 1

SQL query to select distinct row with minimum value

I want an SQL statement to get the row with a minimum value.
Consider this table:
id game point
1 x 5
1 z 4
2 y 6
3 x 2
3 y 5
3 z 8
How do I select the ids that have the minimum value in the point column, grouped by game? Like the following:
id game point
1 z 4
2 y 5
3 x 2
Use:
SELECT tbl.*
FROM TableName tbl
INNER JOIN
(
SELECT Id, MIN(Point) MinPoint
FROM TableName
GROUP BY Id
) tbl1
ON tbl1.id = tbl.id
WHERE tbl1.MinPoint = tbl.Point
This is another way of doing the same thing, which would allow you to do interesting things like select the top 5 winning games, etc.
SELECT *
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Point) as RowNum, *
FROM Table
) X
WHERE RowNum = 1
You can now correctly get the actual row that was identified as the one with the lowest score and you can modify the ordering function to use multiple criteria, such as "Show me the earliest game which had the smallest score", etc.
This will work
select * from table
where (id,point) IN (select id,min(point) from table group by id);
As this is tagged with sql only, the following is using ANSI SQL and a window function:
select id, game, point
from (
select id, game, point,
row_number() over (partition by game order by point) as rn
from games
) t
where rn = 1;
Ken Clark's answer didn't work in my case. It might not work in yours either. If not, try this:
SELECT *
from table T
INNER JOIN
(
select id, MIN(point) MinPoint
from table T
group by AccountId
) NewT on T.id = NewT.id and T.point = NewT.MinPoint
ORDER BY game desc
SELECT DISTINCT
FIRST_VALUE(ID) OVER (Partition by Game ORDER BY Point) AS ID,
Game,
FIRST_VALUE(Point) OVER (Partition by Game ORDER BY Point) AS Point
FROM #T
SELECT * from room
INNER JOIN
(
select DISTINCT hotelNo, MIN(price) MinPrice
from room
Group by hotelNo
) NewT
on room.hotelNo = NewT.hotelNo and room.price = NewT.MinPrice;
This alternative approach uses SQL Server's OUTER APPLY clause. This way, it
creates the distinct list of games, and
fetches and outputs the record with the lowest point number for that game.
The OUTER APPLY clause can be imagined as a LEFT JOIN, but with the advantage that you can use values of the main query as parameters in the subquery (here: game).
SELECT colMinPointID
FROM (
SELECT game
FROM table
GROUP BY game
) As rstOuter
OUTER APPLY (
SELECT TOP 1 id As colMinPointID
FROM table As rstInner
WHERE rstInner.game = rstOuter.game
ORDER BY points
) AS rstMinPoints
This is portable - at least between ORACLE and PostgreSQL:
select t.* from table t
where not exists(select 1 from table ti where ti.attr > t.attr);
Most of the answers use an inner query. I am wondering why the following isn't suggested.
select
*
from
table
order by
point
fetch next 1 row only // ... or the appropriate syntax for the particular DB
This query is very simple to write with JPAQueryFactory (a Java Query DSL class).
return new JPAQueryFactory(manager).
selectFrom(QTable.table).
setLockMode(LockModeType.OPTIMISTIC).
orderBy(QTable.table.point.asc()).
fetchFirst();
Try:
select id, game, min(point) from t
group by id

SQL: max(count((x))

I have a table of baseball fielding statistics for a project. There are many fields on this table, but the ones I care about for this are playerID, pos (position), G (games).
This table is historical so it contains multiple rows per playerID (one for each year/pos). What I want to be able to do is return the position that a player played the most for his career.
First what I imaging I have to do is count the games per position per playerID, then return the max of it. How can this be done in SQL? I am using SQL Server. On a side note, there may be a situation where there are ties, what would max do then?
If the player played in the same position over multiple teams over multiple games, I'd be more apt to use the sum() function, instead of count, in addition to using a group by statement, as a sub-query. See code for explanation.
SELECT playerID, pos, MAX( g_sum )
FROM (
SELECT DISTINCT playerID, pos, SUM( G ) as g_sum
FROM player_stats
GROUP BY id, pos
ORDER BY 3 DESC
) game_sums
GROUP BY playerID
It may not be the exact answer, at least it's a decent starting point and it worked on my lame testbed that I whipped up in 10 minutes.
As far as how max() acts with ties: It doesn't (as far as I can tell, at least). It's up to the actual GROUP BY statement itself, and where and how that max value shows up within the query or sub query.
If we were to include pos in the outer GROUP BY statement, in the event of a tie, it would show you both positions and the amount of games the player has played at said positions (which would be the same number). With it not in that GROUP BY statement, the query will go with the last given value for that column. So if position 2 showed up before position 3 in the sub query, the full query will show position 3 as the position that the player has played the most games in.
In SQL, I believe this will do it. Given that the same subquery is needed twice, I expect that doing this as a stored procedure would be more efficient.
SELECT MaxGamesInAnyPosition.playerID, GamesPerPosition.pos
FROM (
SELECT playerID, Max(totalGames) As maxGames
FROM (
SELECT playerID, pos, SUM(G) As totalGames
FROM tblStats
GROUP BY playerId, pos) Tallies
GROUP BY playerID) MaxGamesInAnyPosition
INNER JOIN (
SELECT playerID, pos, SUM(g) As totalGames
FROM tblStats
GROUP BY playerID, pos) GamesPerPosition
ON (MaxGamesInAnyPosition.playerID=GamesPerPosition.playerId
AND MaxGamesInAnyPosition.maxGames=GamesPerPosition.totalGames)
does not look pretty, but it is direct translation of what I built in linq to sql, give it a try and see if that's what you want:
SELECT [t2].[playerID], (
SELECT TOP (1) [t7].[pos]
FROM (
SELECT [t4].[playerID], [t4].[pos], (
SELECT COUNT(*)
FROM (
SELECT DISTINCT [t5].[G]
FROM [players] AS [t5]
WHERE ([t4].[playerID] = [t5].[playerID]) AND ([t4].[pos] = [t5].[pos])
) AS [t6]
) AS [value]
FROM (
SELECT [t3].[playerID], [t3].[pos]
FROM [players] AS [t3]
GROUP BY [t3].[playerID], [t3].[pos]
) AS [t4]
) AS [t7]
WHERE [t2].[playerID] = [t7].[playerID]
ORDER BY [t7].[value] DESC
) AS [pos]
FROM (
SELECT [t1].[playerID]
FROM (
SELECT [t0].[playerID]
FROM [players] AS [t0]
GROUP BY [t0].[playerID], [t0].[pos]
) AS [t1]
GROUP BY [t1].[playerID]
) AS [t2]
Here is a second answer, much better (I think) than my first kick at the can last night. Certainly much easier to read and understand.
SELECT playerID, pos
FROM (
SELECT playerID, pos, SUM(G) As totGames
FROM tblStats
GROUP BY playerID, pos) Totals
WHERE NOT (Totals.totGames < ANY(
SELECT SUM(G)
FROM tblStats
WHERE Totals.playerID=tblStats.playerID
GROUP BY playerID, pos))
The subquery ensures that all rows will be thrown out if its games total at that given position is smaller than the number of games that player played at any other position.
In case of ties, the player in question will have all tied rows appear, as none of the tied records will be thrown out.