Find max over multiple columns - sql

I am trying to query a list of meetings from the most recent semester, where semester is determined by two fields (year, semester). Here's a basic outline of the schema:
Otherfields Year Semester
meeting1 2014 1
meeting2 2014 1
meeting3 2013 2
... etc ...
As the max should be considered for the Year first, and then the Semester, my results should look like this:
Otherfields Year Semester
meeting1 2014 1
meeting2 2014 1
Unfortunately simply using the MAX() function on each column separately will try to find Year=2014, Semester=2, which is incorrect. I tried a couple approaches using nested subqueries and inner joins but couldn't quite get something to work. What is the most straightforward approach to solving this?

Using a window function:
SELECT Year, Semester, RANK() OVER(ORDER BY Year DESC, Semester DESC) R
FROM your_table;
R will be a column containing the "rank" of the pair (Year, Semester). You can then use this column as a filter, for instance:
WITH TT AS (
SELECT Year, Semester, RANK() OVER(ORDER BY Year DESC, Semester DESC) R
FROM your_table
)
SELECT ...
FROM TT
WHERE R = 1;
If you don't want gaps between ranks, you can use DENSE_RANK instead of RANK.
This answer assumes you use an RDBMS that offers window functions (MySQL, for example, only added them in version 8.0).
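As a minimal sketch of the RANK() filter on the question's sample rows, the snippet below uses Python's built-in sqlite3 module. That choice is an assumption made only to keep the example self-contained, and it needs SQLite 3.25 or later for window functions. The table name `meeting` and the helper `latest_semester_meetings` are hypothetical.

```python
import sqlite3


def latest_semester_meetings(rows):
    """Return the meetings of the most recent (year, semester) via RANK()."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table mirroring the schema outlined in the question.
    conn.execute(
        "CREATE TABLE meeting (otherfields TEXT, year INTEGER, semester INTEGER)"
    )
    conn.executemany("INSERT INTO meeting VALUES (?, ?, ?)", rows)
    query = """
        WITH tt AS (
            SELECT otherfields, year, semester,
                   RANK() OVER (ORDER BY year DESC, semester DESC) AS r
            FROM meeting
        )
        SELECT otherfields, year, semester
        FROM tt
        WHERE r = 1          -- every row tied for the latest pair has rank 1
        ORDER BY otherfields
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


sample = [("meeting1", 2014, 1), ("meeting2", 2014, 1), ("meeting3", 2013, 2)]
print(latest_semester_meetings(sample))
```

Because all rows sharing the latest (year, semester) pair tie at rank 1, RANK and DENSE_RANK behave identically for this particular filter.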

I wouldn't be surprised if there's a more efficient way to do this (and avoid the duplicate subquery), but this will get you the answer you want:
SELECT * FROM table WHERE Year =
(SELECT MAX(Year) FROM table)
AND Semester =
(SELECT MAX(Semester) FROM table WHERE Year =
(SELECT MAX(Year) FROM table))
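To see why the nesting matters, here is a small sketch that runs both the naive "MAX on each column" filter and the nested version against the sample rows. It assumes SQLite through Python's sqlite3 module purely for illustration, and the table name `meeting` is a stand-in for the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meeting (otherfields TEXT, year INTEGER, semester INTEGER)"
)
conn.executemany(
    "INSERT INTO meeting VALUES (?, ?, ?)",
    [("meeting1", 2014, 1), ("meeting2", 2014, 1), ("meeting3", 2013, 2)],
)

# Naive version: the two maxima are independent, so it looks for (2014, 2).
naive = conn.execute("""
    SELECT otherfields, year, semester
    FROM meeting
    WHERE year = (SELECT MAX(year) FROM meeting)
      AND semester = (SELECT MAX(semester) FROM meeting)
""").fetchall()

# Nested version: the semester maximum is taken only within the latest year.
nested = conn.execute("""
    SELECT otherfields, year, semester
    FROM meeting
    WHERE year = (SELECT MAX(year) FROM meeting)
      AND semester = (SELECT MAX(semester) FROM meeting
                      WHERE year = (SELECT MAX(year) FROM meeting))
    ORDER BY otherfields
""").fetchall()

print("naive: ", naive)
print("nested:", nested)
```

On this data the naive filter matches nothing, because no meeting exists for semester 2 of 2014, while the nested filter returns the two 2014 meetings.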

Here's Postgres:
with table2 as /* CTE acting as a virtual temporary table */
(
select *, year::text || semester::text as yearsemester
from table
)
select Otherfields, year, semester
from table2
where yearsemester =
(
select max(yearsemester) /* latest year/semester overall, not per meeting */
from table2
) /* assumes 4-digit years and single-digit semesters, so the text sorts correctly */

I've been overthinking this, there's a much simpler way to get this:
SELECT Meeting.year, Meeting.semester, Meeting.otherFields
FROM Meeting
JOIN (SELECT year, semester
FROM (SELECT year, semester
FROM Meeting
ORDER BY year DESC, semester DESC)
WHERE ROWNUM = 1) MostRecent -- ROWNUM is assigned before ORDER BY, so sort in an inner query first
ON MostRecent.year = Meeting.year
AND MostRecent.semester = Meeting.semester
Note that variations of this should work for pretty much all dbs (anything that supports a limiting clause in a subquery); here's the MySQL version, for example:
SELECT Meeting.year, Meeting.semester, Meeting.otherFields
FROM Meeting
JOIN (SELECT year, semester
FROM Meeting
ORDER BY year DESC, semester DESC
LIMIT 1) MostRecent
ON MostRecent.year = Meeting.year
AND MostRecent.semester = Meeting.semester
Given some of the data in this answer this should be performant for Oracle, and I suspect other dbs as well (given the shortcuts the optimizer is allowed to take). This should be able to replace the use of things like ROW_NUMBER() in most instances where no partitioning clause is provided (no window).
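The "pick the top row, then join back" idea can be tried out with the sketch below. It assumes SQLite (through Python's sqlite3 module), whose LIMIT syntax matches the MySQL variant above. The sample rows and the lowercase table name `meeting` are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meeting (otherfields TEXT, year INTEGER, semester INTEGER)"
)
conn.executemany(
    "INSERT INTO meeting VALUES (?, ?, ?)",
    [
        ("meeting1", 2014, 1),
        ("meeting2", 2014, 1),
        ("meeting3", 2013, 2),
        ("meeting4", 2013, 1),
    ],
)

# The subquery sorts by (year, semester) and keeps only the first pair;
# the join then pulls in every meeting that shares that pair.
most_recent = conn.execute("""
    SELECT m.otherfields, m.year, m.semester
    FROM meeting AS m
    JOIN (SELECT year, semester
          FROM meeting
          ORDER BY year DESC, semester DESC
          LIMIT 1) AS latest
      ON latest.year = m.year
     AND latest.semester = m.semester
    ORDER BY m.otherfields
""").fetchall()

print(most_recent)
```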

Why not simply use ORDER BY?
It is easier to handle and less messy.
SELECT * FROM table
Where Year = (Select Max(Year) from table) /* optional clause to select only 2014 */
Order by Semester ASC, Year DESC, Otherfields; /* numerically lowest semester first; for equal semesters, latest year first */
EDIT
If you need a limited number of results from 2014, use the LIMIT clause (for MySQL):
SELECT * FROM table
Where Year = (Select Max(Year) from table)
Order by Semester ASC, Year DESC, Otherfields
LIMIT 10;
It orders first and then applies the LIMIT of 10, so you get your limited result set.
With the WHERE clause, only the 2014 rows are returned; without it, the sample data comes back in this order:
Otherfields Year Semester
meeting1 2014 1
meeting2 2014 1
meeting3 2013 2

Answering my own question here:
This query was run in a stored procedure, so I went ahead and found the maximum year/semester in separate queries before the rest of the query. This is most likely inefficient and inelegant, but it is also the most understandable method: I don't need to worry about other members of my team getting confused by it. I'll leave this question here since it's generally applicable to many other situations, and there appear to be some good answers providing alternative approaches.
-- Find the most recent year.
SELECT MAX(year) INTO max_year FROM meeting;
-- Find the most recent semester in the year.
SELECT MAX(semester) INTO max_semester FROM meeting WHERE year = max_year;
-- Open a ref cursor for meetings in most recent year/semester.
OPEN meeting_list FOR
SELECT otherfields, year, semester
FROM meeting
WHERE year = max_year
AND semester = max_semester;
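The same step-by-step lookup can be sketched outside a stored procedure. The version below is an illustration only, assuming Python's sqlite3 module and a hypothetical helper named `meetings_in_latest_semester`; the two intermediate values play the role of max_year and max_semester above.

```python
import sqlite3


def meetings_in_latest_semester(conn):
    """Find the latest year, then its latest semester, then fetch the rows."""
    # Step 1: most recent year (None if the table is empty).
    max_year = conn.execute("SELECT MAX(year) FROM meeting").fetchone()[0]
    # Step 2: most recent semester within that year.
    max_semester = conn.execute(
        "SELECT MAX(semester) FROM meeting WHERE year = ?", (max_year,)
    ).fetchone()[0]
    # Step 3: meetings in that year/semester.
    return conn.execute(
        "SELECT otherfields, year, semester FROM meeting "
        "WHERE year = ? AND semester = ? ORDER BY otherfields",
        (max_year, max_semester),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meeting (otherfields TEXT, year INTEGER, semester INTEGER)"
)
conn.executemany(
    "INSERT INTO meeting VALUES (?, ?, ?)",
    [("meeting1", 2014, 1), ("meeting2", 2014, 1), ("meeting3", 2013, 2)],
)
print(meetings_in_latest_semester(conn))
```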

Related

show only user with at least one entry per month

I have two tables, let's say one is called User and the other one is called Data
Every User has many many entries in the Data table.
The Data table has the UserID and Dates included.
I would like to make a SQL query where I only get users with at least one entry per month in year 2019.
I have no idea how to do that.
You should really mention your database type. Treat this more like pseudo-code for now. But if you update your question, I can update my answer.
SELECT userID,
YEAR(Dates),
COUNT(DISTINCT MONTH(Dates))
FROM Data
WHERE YEAR(Dates) = 2019
GROUP BY UserId,
YEAR(Dates)
HAVING COUNT(DISTINCT MONTH(Dates))=12
Since you are looking only at the year 2019, you can exclude it from the GROUP BY clause. If you need to adjust the minimum entries for MONTH, I would suggest:
WITH CTE AS (
SELECT userID,
MONTH(Dates) as [month],
COUNT(*) as TotalEntriesPerMonth
FROM Data
WHERE YEAR(Dates) = 2019
GROUP BY UserId, MONTH(Dates)
HAVING COUNT(*)>=5
)
SELECT userID
FROM CTE
GROUP BY userID
HAVING COUNT([month]) = 12
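For a quick check of the "12 distinct months" condition, here is a minimal sketch using SQLite through Python's sqlite3 module. SQLite has no YEAR() or MONTH(), so strftime stands in for them; that substitution, the ISO date strings, and the helper name `users_active_every_month` are assumptions of this example rather than part of the answer above.

```python
import sqlite3


def users_active_every_month(entries, year):
    """Return ids of users with at least one entry in each month of `year`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (user_id INTEGER, dates TEXT)")
    conn.executemany("INSERT INTO data VALUES (?, ?)", entries)
    rows = conn.execute(
        """
        SELECT user_id
        FROM data
        WHERE strftime('%Y', dates) = ?
        GROUP BY user_id
        HAVING COUNT(DISTINCT strftime('%m', dates)) = 12
        ORDER BY user_id
        """,
        (str(year),),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# User 1 has an entry in every month of 2019; user 2 misses December 2019.
entries = [(1, f"2019-{month:02d}-15") for month in range(1, 13)]
entries += [(2, f"2019-{month:02d}-01") for month in range(1, 12)]
entries += [(2, "2020-12-01")]
print(users_active_every_month(entries, 2019))
```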
It's not clear what you are asking, but your query may look like this:
CREATE TABLE Data(UserId int,Dates date)
INSERT INTO Data(UserId,Dates) VALUES(1,'2020/04/28'),(1,'2020/04/29'),(2,'2020/04/29')
;WITH CTE AS (
SELECT UserId,ROW_NUMBER() OVER(PARTITION BY MONTH(Dates),UserId ORDER BY UserId) AS rn FROM Data)
SELECT Distinct UserId FROM CTE WHERE rn >=1

Selecting the max value of two different columns

I have the following table named 'MoviesInStock'
I would like to select the latest movies from the last month.
In this case, the result should be only the movie 'The Mummy' since it is the latest one.
I tried the following query:
SELECT MovieName
FROM MovieInStock
WHERE Month = (SELECT MAX(Month) FROM MovieInStock) AND
(SELECT MovieName FROM MovieInStock WHERE Year = (SELECT MAX(Year) FROM MovieInStock))
But choosing the AND operator was not that smart. I also tried to create a temporary table using SELECT INTO # to select the max year, and then to select the max month from that temp table, but it became too complicated for me.
You are overcomplicating the problem. You can use TOP with ORDER BY.
Because you say "movies":
select top (1) with ties mis.*
from movieinstock mis
order by year desc, month desc
Another solution, although Gordon's solution is better:
with maxdt as (
-- latest (Year, Month) pair, not the independent maximum of each column
select top 1 Year as MaxYear, Month as MaxMonth
from MovieInStock
order by Year desc, Month desc
)
SELECT MovieName
FROM MovieInStock f1
inner join maxdt f2 on f1.Month = f2.MaxMonth and f1.Year = f2.MaxYear

SQL - How to extract the minimum of all the maximums

Supposing I have a table "main":
CAT YEAR
1 2010
1 2015
2 2012
2 2010
I managed to extract the maximum year per category with:
SELECT CAT, MAX(YEAR) FROM main GROUP BY CAT
And I would like to get the minimum of the maximum year values, namely 2012 (the third row).
Something like that:
SELECT MIN(SELECT MAX(YEAR) FROM main GROUP BY CAT)
Could someone help me?
Using your query as a subquery, you can select the minimum of the extracted years as follows:
select min(year) from -- Select the minimum year from
(
SELECT CAT, MAX(YEAR) as year FROM main GROUP BY CAT
) t -- Your query as a subquery; many databases require an alias here
SELECT MIN(yr)
FROM (SELECT CAT, MAX(YEAR) as yr FROM main GROUP BY CAT) t
This is done using a subquery.
A subquery is a query that is nested inside a SELECT, INSERT, UPDATE, or DELETE statement, or inside another subquery. A subquery can be used anywhere an expression is allowed.
One method uses a subquery. I prefer returning a single row:
select cat, max(year)
from main
group by cat
order by max(year) -- ascending, so the smallest maximum comes first
fetch first 1 row only;
This allows you to get the cat and the year in a single query.
Note: not all databases support the ANSI standard fetch first 1 row only. Some use limit, top or even other methods.
You can try using a SELECT inside a SELECT. It's a good way to solve problems like this.
SELECT min(yr)
FROM (SELECT cat, max(YEAR) as yr
FROM main
GROUP BY cat) t
You're already halfway there and can select the minimum value from the table you just created (by using it in the FROM part):
SELECT MIN(maxYearPerCat) FROM
(SELECT CAT, MAX(YEAR) as maxYearPerCat FROM main GROUP BY CAT) as tmpTable
SELECT MAX(YEAR) FROM main GROUP BY CAT ORDER BY 1 LIMIT 1;
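The two styles in this thread, a derived table and a sorted single row, can be compared on the question's four rows with the sketch below. It assumes SQLite through Python's sqlite3 module, so LIMIT 1 is used where other databases would use FETCH FIRST or TOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (cat INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO main VALUES (?, ?)",
    [(1, 2010), (1, 2015), (2, 2012), (2, 2010)],
)

# Derived table: per-category maxima first, then the minimum over them.
min_of_max = conn.execute("""
    SELECT MIN(yr)
    FROM (SELECT cat, MAX(year) AS yr FROM main GROUP BY cat) AS per_cat
""").fetchone()[0]

# Sorted single row: also reports which category holds that smallest maximum.
cat_and_year = conn.execute("""
    SELECT cat, MAX(year) AS yr
    FROM main
    GROUP BY cat
    ORDER BY yr
    LIMIT 1
""").fetchone()

print(min_of_max)
print(cat_and_year)
```

The per-category maxima are 2015 for category 1 and 2012 for category 2, so both queries settle on 2012.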

How to select a table based on a value in a analytical function Oracle SQL

I have a list of dates and I want to find out which one occurs the earliest in the year, I used a dense rank function to only extract the date and the month, but I can't get it to return all the values equal to 1 (there may be multiple earliest dates not just one).
SELECT
S.SG_HOSTCITY,
C.COUNTRY_OLYMPIC_CODE,
DENSE_RANK() OVER (ORDER BY to_char(S.SG_START, 'MMDD')) AS RN
FROM
SUMMERGAMES S,
COUNTRY C
WHERE
S.COUNTRY_ISOCODE = C.COUNTRY_ISOCODE
RN = 1
ORDER BY RN;
Just spits out 00933. 00000 - "SQL command not properly ended"
Can anyone help? I don't know what I'm doing wrong.
Put it into an inline view:
select SG_HOSTCITY, COUNTRY_OLYMPIC_CODE
from (SELECT S.SG_HOSTCITY,
C.COUNTRY_OLYMPIC_CODE,
DENSE_RANK() OVER(ORDER BY to_char(S.SG_START, 'MMDD')) AS RN
FROM SUMMERGAMES S
join COUNTRY C
on S.COUNTRY_ISOCODE = C.COUNTRY_ISOCODE)
WHERE RN = 1
You can't use the WHERE clause to filter on the output of an analytic function within the same query; you have to put it into a subquery. The above is the same as your current query but is free of syntax errors.
However I don't know if it will actually give you the output you're expecting. I might also try:
select *
from (SELECT S.SG_HOSTCITY,
C.COUNTRY_OLYMPIC_CODE,
DENSE_RANK() OVER( partition by TRUNC(S.SG_START, 'YYYY')
order BY TRUNC(S.SG_START) ) AS RN
FROM SUMMERGAMES S
join COUNTRY C
on S.COUNTRY_ISOCODE = C.COUNTRY_ISOCODE)
WHERE RN = 1
This will give you combinations of SG_HOSTCITY and COUNTRY_OLYMPIC_CODE falling on the first SG_START date associated with each year. If the first of the year 2002 is 1/5, for instance, and there are 5 such SG_HOSTCITY and COUNTRY_OLYMPIC_CODE values falling on that date for year 2002, this will show all 5 for that year, because it will bring back ties.
The difference is that the rank ascends and then restarts at the change in each year, not throughout all years (notice the partition).
I'm thinking the second query above is what you really want.
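The difference between the two rankings can be seen in the following sketch. It runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module, with strftime standing in for Oracle's to_char and TRUNC. The table is reduced to two columns and the city names and dates are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE summergames (sg_hostcity TEXT, sg_start TEXT)")
conn.executemany(
    "INSERT INTO summergames VALUES (?, ?)",
    [
        ("CityA", "1896-04-06"),
        ("CityB", "1900-05-14"),
        ("CityC", "1908-04-27"),
        ("CityD", "1912-04-06"),
        ("CityE", "1912-07-01"),
    ],
)

# Ranking by month and day across all years: ties for the earliest
# calendar date all receive rank 1.
earliest_in_calendar = [row[0] for row in conn.execute("""
    SELECT sg_hostcity
    FROM (SELECT sg_hostcity,
                 DENSE_RANK() OVER (ORDER BY strftime('%m%d', sg_start)) AS rn
          FROM summergames)
    WHERE rn = 1
    ORDER BY sg_hostcity
""")]

# Ranking restarted for each year: the first start date of every year.
first_per_year = [row[0] for row in conn.execute("""
    SELECT sg_hostcity
    FROM (SELECT sg_hostcity,
                 DENSE_RANK() OVER (PARTITION BY strftime('%Y', sg_start)
                                    ORDER BY sg_start) AS rn
          FROM summergames)
    WHERE rn = 1
    ORDER BY sg_hostcity
""")]

print(earliest_in_calendar)
print(first_per_year)
```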

max records with dense rank

Is there a better alternative to using max to get the max records?
I have been playing with dense rank and partition over with the below query
but I am getting undesired results and poor performance.
select Tdate = (Select max(Date)
from Industries
where Industries.id = i.id
and Industries.Date <= '22 June 2011')
from #ii_t i
Many Thanks.
The supplied query doesn't use the DENSE_RANK windowing function. Not being familiar with your data structure, I believe your query is attempting to find the largest value of Date for each Industry id, yes? Rewriting the above query to use a ranking function, I would write it as a common table expression.
;
WITH RANKED AS
(
SELECT
II.*
-- RANK would serve just as well in this scenario
, DENSE_RANK() OVER (PARTITION BY II.id ORDER BY II.Date desc) AS most_recent
FROM Industries II
WHERE
II.Date <= '22 June 2011'
)
, MOST_RECENT AS
(
-- This query restricts it to the most recent row by id
SELECT
R.*
FROM
RANKED R
WHERE
R.most_recent = 1
)
SELECT
*
FROM
MOST_RECENT MR
INNER JOIN
#ii_t i
ON i.id = MR.id
Also, to address the question of performance, you might need to look at how Industries is structured. There may not be an index on that table and if there is, it might not cover the Date (descending) and id field. To improve the efficiency of the above query, don't pull back everything in the RANKED section. I did that as I was not sure what fields you would need but obviously the less you have to pull back, the more efficient the engine can be in retrieving data.
Try this (untested) code and see if it does what you want. By the looks of it, it should return the same things and hopefully a bit faster.
select Tdate = max(Industries.Date)
from #ii_t i
left outer join Industries
on Industries.id = i.id and
Industries.Date <= '22 June 2011'
group by i.id
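A small sketch of the join-and-group version, assuming SQLite through Python's sqlite3 module. The temp table #ii_t becomes a plain table `ii_t`, the date column is renamed `obs_date`, and dates are stored as ISO strings so that string comparison matches date order; all of these are assumptions of the example, as is the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE industries (id INTEGER, obs_date TEXT)")
conn.execute("CREATE TABLE ii_t (id INTEGER)")
conn.executemany(
    "INSERT INTO industries VALUES (?, ?)",
    [
        (1, "2011-06-01"),
        (1, "2011-06-20"),
        (1, "2011-07-01"),  # after the cutoff, must be ignored
        (2, "2011-07-05"),  # id 2 has nothing on or before the cutoff
    ],
)
conn.executemany("INSERT INTO ii_t VALUES (?)", [(1,), (2,)])

# The cutoff sits in the ON clause, so ids without a qualifying row
# are kept by the LEFT JOIN and simply get NULL as their latest date.
latest_per_id = conn.execute(
    """
    SELECT i.id, MAX(ind.obs_date) AS tdate
    FROM ii_t AS i
    LEFT JOIN industries AS ind
           ON ind.id = i.id
          AND ind.obs_date <= ?
    GROUP BY i.id
    ORDER BY i.id
    """,
    ("2011-06-22",),
).fetchall()

print(latest_per_id)
```

Moving the date condition into a WHERE clause instead would drop id 2 from the result, which is the main behavioural difference to watch for when rewriting the correlated subquery as a join.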