Select a Column in SQL not in Group By - sql

I have been trying to find some info on how to select a non-aggregate column that is not contained in the Group By statement in SQL, but nothing I've found so far seems to answer my question. I have a table with three columns that I want from it. One is a create date, one is a ID that groups the records by a particular Claim ID, and the final is the PK. I want to find the record that has the max creation date in each group of claim IDs. I am selecting the MAX(creation date), and Claim ID (cpe.fmgcms_cpeclaimid), and grouping by the Claim ID. But I need the PK from these records (cpe.fmgcms_claimid), and if I try to add it to my select clause, I get an error. And I can't add it to my group by clause because then it will throw off my intended grouping. Does anyone know any workarounds for this? Here is a sample of my code:
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid
This is the result I'd like to get:
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid, cpe.fmgcms_claimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid

The columns in the result set of a select query with group by clause must be:
an expression used as one of the group by criteria , or ...
an aggregate function , or ...
a literal value
So, you can't do what you want to do in a single, simple query. The first thing to do is state your problem statement in a clear way, something like:
I want to find the individual claim row bearing the most recent
creation date within each group in my claims table
Given
create table dbo.some_claims_table
(
claim_id int not null ,
group_id int not null ,
date_created datetime not null ,
constraint some_table_PK primary key ( claim_id ) ,
constraint some_table_AK01 unique ( group_id , claim_id ) ,
constraint some_Table_AK02 unique ( group_id , date_created ) ,
)
The first thing to do is identify the most recent creation date for each group:
select group_id ,
date_created = max( date_created )
from dbo.claims_table
group by group_id
That gives you the selection criteria you need (1 row per group, with 2 columns: group_id and the highwater created date) to fullfill the 1st part of the requirement (selecting the individual row from each group. That needs to be a virtual table in your final select query:
select *
from dbo.claims_table t
join ( select group_id ,
date_created = max( date_created )
from dbo.claims_table
group by group_id
) x on x.group_id = t.group_id
and x.date_created = t.date_created
If the table is not unique by date_created within group_id (AK02), you you can get duplicate rows for a given group.

You can do this with PARTITION and RANK:
select * from
(
select MyPK, fmgcms_cpeclaimid, createdon,
Rank() over (Partition BY fmgcms_cpeclaimid order by createdon DESC) as Rank
from Filteredfmgcms_claimpaymentestimate
where createdon < 'reportstartdate'
) tmp
where Rank = 1

The direct answer is that you can't. You must select either an aggregate or something that you are grouping by.
So, you need an alternative approach.
1). Take you current query and join the base data back on it
SELECT
cpe.*
FROM
Filteredfmgcms_claimpaymentestimate cpe
INNER JOIN
(yourQuery) AS lookup
ON lookup.MaxData = cpe.createdOn
AND lookup.fmgcms_cpeclaimid = cpe.fmgcms_cpeclaimid
2). Use a CTE to do it all in one go...
WITH
sequenced_data AS
(
SELECT
*,
ROW_NUMBER() OVER (PARITION BY fmgcms_cpeclaimid ORDER BY CreatedOn DESC) AS sequence_id
FROM
Filteredfmgcms_claimpaymentestimate
WHERE
createdon < 'reportstartdate'
)
SELECT
*
FROM
sequenced_data
WHERE
sequence_id = 1
NOTE: Using ROW_NUMBER() will ensure just one record per fmgcms_cpeclaimid. Even if multiple records are tied with the exact same createdon value. If you can have ties, and want all records with the same createdon value, use RANK() instead.

You can join the table on itself to get the PK:
Select cpe1.PK, cpe2.MaxDate, cpe1.fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate cpe1
INNER JOIN
(
select MAX(createdon) As MaxDate, fmgcms_cpeclaimid
from Filteredfmgcms_claimpaymentestimate
group by fmgcms_cpeclaimid
) cpe2
on cpe1.fmgcms_cpeclaimid = cpe2.fmgcms_cpeclaimid
and cpe1.createdon = cpe2.MaxDate
where cpe1.createdon < 'reportstartdate'

Thing I like to do is to wrap addition columns in aggregate function, like max().
It works very good when you don't expect duplicate values.
Select MAX(cpe.createdon) As MaxDate, cpe.fmgcms_cpeclaimid, MAX(cpe.fmgcms_claimid) As fmgcms_claimid
from Filteredfmgcms_claimpaymentestimate cpe
where cpe.createdon < 'reportstartdate'
group by cpe.fmgcms_cpeclaimid

What you are asking, Sir, is as the answer of RedFilter.
This answer as well helps in understanding why group by is somehow a simpler version or partition over:
SQL Server: Difference between PARTITION BY and GROUP BY
since it changes the way the returned value is calculated and therefore you could (somehow) return columns group by can not return.

You can use as below,
Select X.a, X.b, Y.c from (
Select X.a as a, sum (b) as sum_b from name_table X
group by X.a)X
left join from name_table Y on Y.a = X.a
Example;
CREATE TABLE #products (
product_name VARCHAR(MAX),
code varchar(3),
list_price [numeric](8, 2) NOT NULL
);
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
INSERT INTO #products VALUES ('Dinding', 'ADE', 2000)
INSERT INTO #products VALUES ('Kaca', 'AKB', 2000)
INSERT INTO #products VALUES ('paku', 'ACE', 2000)
--SELECT * FROM #products
SELECT distinct x.code, x.SUM_PRICE, product_name FROM (SELECT code, SUM(list_price) as SUM_PRICE From #products
group by code)x
left join #products y on y.code=x.code
DROP TABLE #products

Related

SQL server 2016 Getting distinct results when only limited to a where statement

I am looking for a distinct list of the CUSTOMER_NAME field from my table. Normally I would simply do
SELECT
distinct
[CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
or
SELECT
[CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
Group by [CUSTOMER_NAME]
But I am limited in my query. Due to software restrictions, the query can only be of the form
SELECT * from
[iData3].[dbo].[N241650]
where ...
How do I get a distinct list of customer names given these restrictions? I essentially need to cram everything into the WHERE clause. I'm thinking possibly WHERE EXISTS or NOT EXISTS but I haven't used those conditions before so I'm not certain if they'd be useful.
This is not possible because... is acceptable if the disappointing answer.
You can use row_number() function :
SELECT TOP (1) WITH TIES [CUSTOMER_NAME]
FROM [iData3].[dbo].[N241650]
ORDER BY ROW_NUMBER() OVER (PARTITION BY CUSTOMER_NAME ORDER BY ?)
? indicates something identity or primary/unique column which you have.
You can group by that column to achieve the same result.
select CUSTOMER_NAME
from ...
group by CUSTOMER_NAME
order by CUSTOMER_NAME;
Another alternative is to use a stored procedure.
If you can't escape from the *, then you can't GROUP BY and if you just have a WHERE then you will need a key (unique set of columns) to be able to filter correctly or else you can't differentiate dupicates (and end up selecting more than 1 row with the same customer name).
It's a bit convoluted, but try with this. It will get you 1 row per each CUSTOMER_NAME.
SELECT
*
from
[iData3].[dbo].[N241650]
where
[N241650].KeyColumn IN
(
SELECT
Z.KeyColumn
FROM
(
SELECT
X.KeyColumn,
Ranking = ROW_NUMBER() OVER (PARTITION BY X.CUSTOMER_NAME ORDER BY X.KeyColumn ASC)
FROM
[iData3].[dbo].[N241650] AS X
WHERE
X.KeyColumn IS NOT NULL
) AS Z
WHERE
Z.Ranking = 1
)
The ORDER BY inside the OVER will determine which row you get for each CUSTOMER_NAME.
If you have multiple columns for your key, then you will have to switch the IN for an EXISTS against multiple columns (you can't do a multiple column IN in SQL Server).
SELECT
*
from
[iData3].[dbo].[N241650]
where
EXISTS (
SELECT
'key columns match'
FROM (
SELECT
X.KeyColumn1,
X.KeyColumn2,
Ranking = ROW_NUMBER() OVER (PARTITION BY X.CUSTOMER_NAME ORDER BY X.KeyColumn1 ASC)
FROM
[iData3].[dbo].[N241650] AS X
) AS Z
WHERE
Z.Ranking = 1 AND
[N241650].KeyColumn1 = Z.KeyColumn1 AND
[N241650].KeyColumn2 = Z.KeyColumn2
)
You need something unique in each row. If you have that, you can use:
SELECT CUSTOMER_NAME
FROM [iData3].[dbo].[N241650]
WHERE pk = (SELECT MIN(n2.pk)
FROM [iData3].[dbo].[N241650] n2
WHERE n2.CUSTOMER_NAME = N241650.N241650
);
pk is the unique column.

Sub query to return the most recent instance of an entity [duplicate]

Table:
UserId, Value, Date.
I want to get the UserId, Value for the max(Date) for each UserId. That is, the Value for each UserId that has the latest date. Is there a way to do this simply in SQL? (Preferably Oracle)
Update: Apologies for any ambiguity: I need to get ALL the UserIds. But for each UserId, only that row where that user has the latest date.
I see many people use subqueries or else window functions to do this, but I often do this kind of query without subqueries in the following way. It uses plain, standard SQL so it should work in any brand of RDBMS.
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON (t1.UserId = t2.UserId AND t1."Date" < t2."Date")
WHERE t2.UserId IS NULL;
In other words: fetch the row from t1 where no other row exists with the same UserId and a greater Date.
(I put the identifier "Date" in delimiters because it's an SQL reserved word.)
In case if t1."Date" = t2."Date", doubling appears. Usually tables has auto_inc(seq) key, e.g. id.
To avoid doubling can be used follows:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON t1.UserId = t2.UserId AND ((t1."Date" < t2."Date")
OR (t1."Date" = t2."Date" AND t1.id < t2.id))
WHERE t2.UserId IS NULL;
Re comment from #Farhan:
Here's a more detailed explanation:
An outer join attempts to join t1 with t2. By default, all results of t1 are returned, and if there is a match in t2, it is also returned. If there is no match in t2 for a given row of t1, then the query still returns the row of t1, and uses NULL as a placeholder for all of t2's columns. That's just how outer joins work in general.
The trick in this query is to design the join's matching condition such that t2 must match the same userid, and a greater date. The idea being if a row exists in t2 that has a greater date, then the row in t1 it's compared against can't be the greatest date for that userid. But if there is no match -- i.e. if no row exists in t2 with a greater date than the row in t1 -- we know that the row in t1 was the row with the greatest date for the given userid.
In those cases (when there's no match), the columns of t2 will be NULL -- even the columns specified in the join condition. So that's why we use WHERE t2.UserId IS NULL, because we're searching for the cases where no row was found with a greater date for the given userid.
This will retrieve all rows for which the my_date column value is equal to the maximum value of my_date for that userid. This may retrieve multiple rows for the userid where the maximum date is on multiple rows.
select userid,
my_date,
...
from
(
select userid,
my_date,
...
max(my_date) over (partition by userid) max_my_date
from users
)
where my_date = max_my_date
"Analytic functions rock"
Edit: With regard to the first comment ...
"using analytic queries and a self-join defeats the purpose of analytic queries"
There is no self-join in this code. There is instead a predicate placed on the result of the inline view that contains the analytic function -- a very different matter, and completely standard practice.
"The default window in Oracle is from the first row in the partition to the current one"
The windowing clause is only applicable in the presence of the order by clause. With no order by clause, no windowing clause is applied by default and none can be explicitly specified.
The code works.
SELECT userid, MAX(value) KEEP (DENSE_RANK FIRST ORDER BY date DESC)
FROM table
GROUP BY userid
I don't know your exact columns names, but it would be something like this:
SELECT userid, value
FROM users u1
WHERE date = (
SELECT MAX(date)
FROM users u2
WHERE u1.userid = u2.userid
)
Not being at work, I don't have Oracle to hand, but I seem to recall that Oracle allows multiple columns to be matched in an IN clause, which should at least avoid the options that use a correlated subquery, which is seldom a good idea.
Something like this, perhaps (can't remember if the column list should be parenthesised or not):
SELECT *
FROM MyTable
WHERE (User, Date) IN
( SELECT User, MAX(Date) FROM MyTable GROUP BY User)
EDIT: Just tried it for real:
SQL> create table MyTable (usr char(1), dt date);
SQL> insert into mytable values ('A','01-JAN-2009');
SQL> insert into mytable values ('B','01-JAN-2009');
SQL> insert into mytable values ('A', '31-DEC-2008');
SQL> insert into mytable values ('B', '31-DEC-2008');
SQL> select usr, dt from mytable
2 where (usr, dt) in
3 ( select usr, max(dt) from mytable group by usr)
4 /
U DT
- ---------
A 01-JAN-09
B 01-JAN-09
So it works, although some of the new-fangly stuff mentioned elsewhere may be more performant.
I know you asked for Oracle, but in SQL 2005 we now use this:
-- Single Value
;WITH ByDate
AS (
SELECT UserId, Value, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) RowNum
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE RowNum = 1
-- Multiple values where dates match
;WITH ByDate
AS (
SELECT UserId, Value, RANK() OVER (PARTITION BY UserId ORDER BY Date DESC) Rnk
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE Rnk = 1
I don't have Oracle to test it, but the most efficient solution is to use analytic queries. It should look something like this:
SELECT DISTINCT
UserId
, MaxValue
FROM (
SELECT UserId
, FIRST (Value) Over (
PARTITION BY UserId
ORDER BY Date DESC
) MaxValue
FROM SomeTable
)
I suspect that you can get rid of the outer query and put distinct on the inner, but I'm not sure. In the meantime I know this one works.
If you want to learn about analytic queries, I'd suggest reading http://www.orafaq.com/node/55 and http://www.akadia.com/services/ora_analytic_functions.html. Here is the short summary.
Under the hood analytic queries sort the whole dataset, then process it sequentially. As you process it you partition the dataset according to certain criteria, and then for each row looks at some window (defaults to the first value in the partition to the current row - that default is also the most efficient) and can compute values using a number of analytic functions (the list of which is very similar to the aggregate functions).
In this case here is what the inner query does. The whole dataset is sorted by UserId then Date DESC. Then it processes it in one pass. For each row you return the UserId and the first Date seen for that UserId (since dates are sorted DESC, that's the max date). This gives you your answer with duplicated rows. Then the outer DISTINCT squashes duplicates.
This is not a particularly spectacular example of analytic queries. For a much bigger win consider taking a table of financial receipts and calculating for each user and receipt, a running total of what they paid. Analytic queries solve that efficiently. Other solutions are less efficient. Which is why they are part of the 2003 SQL standard. (Unfortunately Postgres doesn't have them yet. Grrr...)
Wouldn't a QUALIFY clause be both simplest and best?
select userid, my_date, ...
from users
qualify rank() over (partition by userid order by my_date desc) = 1
For context, on Teradata here a decent size test of this runs in 17s with this QUALIFY version and in 23s with the 'inline view'/Aldridge solution #1.
In Oracle 12c+, you can use Top n queries along with analytic function rank to achieve this very concisely without subqueries:
select *
from your_table
order by rank() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;
The above returns all the rows with max my_date per user.
If you want only one row with max date, then replace the rank with row_number:
select *
from your_table
order by row_number() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;
With PostgreSQL 8.4 or later, you can use this:
select user_id, user_value_1, user_value_2
from (select user_id, user_value_1, user_value_2, row_number()
over (partition by user_id order by user_date desc)
from users) as r
where r.row_number=1
Just had to write a "live" example at work :)
This one supports multiple values for UserId on the same date.
Columns:
UserId, Value, Date
SELECT
DISTINCT UserId,
MAX(Date) OVER (PARTITION BY UserId ORDER BY Date DESC),
MAX(Values) OVER (PARTITION BY UserId ORDER BY Date DESC)
FROM
(
SELECT UserId, Date, SUM(Value) As Values
FROM <<table_name>>
GROUP BY UserId, Date
)
You can use FIRST_VALUE instead of MAX and look it up in the explain plan. I didn't have the time to play with it.
Of course, if searching through huge tables, it's probably better if you use FULL hints in your query.
I'm quite late to the party but the following hack will outperform both correlated subqueries and any analytics function but has one restriction: values must convert to strings. So it works for dates, numbers and other strings. The code does not look good but the execution profile is great.
select
userid,
to_number(substr(max(to_char(date,'yyyymmdd') || to_char(value)), 9)) as value,
max(date) as date
from
users
group by
userid
The reason why this code works so well is that it only needs to scan the table once. It does not require any indexes and most importantly it does not need to sort the table, which most analytics functions do. Indexes will help though if you need to filter the result for a single userid.
Use ROW_NUMBER() to assign a unique ranking on descending Date for each UserId, then filter to the first row for each UserId (i.e., ROW_NUMBER = 1).
SELECT UserId, Value, Date
FROM (SELECT UserId, Value, Date,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) rn
FROM users) u
WHERE rn = 1;
If you're using Postgres, you can use array_agg like
SELECT userid,MAX(adate),(array_agg(value ORDER BY adate DESC))[1] as value
FROM YOURTABLE
GROUP BY userid
I'm not familiar with Oracle. This is what I came up with
SELECT
userid,
MAX(adate),
SUBSTR(
(LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)),
0,
INSTR((LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)), ',')-1
) as value
FROM YOURTABLE
GROUP BY userid
Both queries return the same results as the accepted answer. See SQLFiddles:
Accepted answer
My solution with Postgres
My solution with Oracle
I think something like this. (Forgive me for any syntax mistakes; I'm used to using HQL at this point!)
EDIT: Also misread the question! Corrected the query...
SELECT UserId, Value
FROM Users AS user
WHERE Date = (
SELECT MAX(Date)
FROM Users AS maxtest
WHERE maxtest.UserId = user.UserId
)
i thing you shuold make this variant to previous query:
SELECT UserId, Value FROM Users U1 WHERE
Date = ( SELECT MAX(Date) FROM Users where UserId = U1.UserId)
Select
UserID,
Value,
Date
From
Table,
(
Select
UserID,
Max(Date) as MDate
From
Table
Group by
UserID
) as subQuery
Where
Table.UserID = subQuery.UserID and
Table.Date = subQuery.mDate
select VALUE from TABLE1 where TIME =
(select max(TIME) from TABLE1 where DATE=
(select max(DATE) from TABLE1 where CRITERIA=CRITERIA))
(T-SQL) First get all the users and their maxdate. Join with the table to find the corresponding values for the users on the maxdates.
create table users (userid int , value int , date datetime)
insert into users values (1, 1, '20010101')
insert into users values (1, 2, '20020101')
insert into users values (2, 1, '20010101')
insert into users values (2, 3, '20030101')
select T1.userid, T1.value, T1.date
from users T1,
(select max(date) as maxdate, userid from users group by userid) T2
where T1.userid= T2.userid and T1.date = T2.maxdate
results:
userid value date
----------- ----------- --------------------------
2 3 2003-01-01 00:00:00.000
1 2 2002-01-01 00:00:00.000
The answer here is Oracle only. Here's a bit more sophisticated answer in all SQL:
Who has the best overall homework result (maximum sum of homework points)?
SELECT FIRST, LAST, SUM(POINTS) AS TOTAL
FROM STUDENTS S, RESULTS R
WHERE S.SID = R.SID AND R.CAT = 'H'
GROUP BY S.SID, FIRST, LAST
HAVING SUM(POINTS) >= ALL (SELECT SUM (POINTS)
FROM RESULTS
WHERE CAT = 'H'
GROUP BY SID)
And a more difficult example, which need some explanation, for which I don't have time atm:
Give the book (ISBN and title) that is most popular in 2008, i.e., which is borrowed most often in 2008.
SELECT X.ISBN, X.title, X.loans
FROM (SELECT Book.ISBN, Book.title, count(Loan.dateTimeOut) AS loans
FROM CatalogEntry Book
LEFT JOIN BookOnShelf Copy
ON Book.bookId = Copy.bookId
LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan
ON Copy.copyId = Loan.copyId
GROUP BY Book.title) X
HAVING loans >= ALL (SELECT count(Loan.dateTimeOut) AS loans
FROM CatalogEntry Book
LEFT JOIN BookOnShelf Copy
ON Book.bookId = Copy.bookId
LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan
ON Copy.copyId = Loan.copyId
GROUP BY Book.title);
Hope this helps (anyone).. :)
Regards,
Guus
Assuming Date is unique for a given UserID, here's some TSQL:
SELECT
UserTest.UserID, UserTest.Value
FROM UserTest
INNER JOIN
(
SELECT UserID, MAX(Date) MaxDate
FROM UserTest
GROUP BY UserID
) Dates
ON UserTest.UserID = Dates.UserID
AND UserTest.Date = Dates.MaxDate
Solution for MySQL which doesn't have concepts of partition KEEP, DENSE_RANK.
select userid,
my_date,
...
from
(
select #sno:= case when #pid<>userid then 0
else #sno+1
end as serialnumber,
#pid:=userid,
my_Date,
...
from users order by userid, my_date
) a
where a.serialnumber=0
Reference: http://benincampus.blogspot.com/2013/08/select-rows-which-have-maxmin-value-in.html
select userid, value, date
from thetable t1 ,
( select t2.userid, max(t2.date) date2
from thetable t2
group by t2.userid ) t3
where t3.userid t1.userid and
t3.date2 = t1.date
IMHO this works. HTH
I think this should work?
Select
T1.UserId,
(Select Top 1 T2.Value From Table T2 Where T2.UserId = T1.UserId Order By Date Desc) As 'Value'
From
Table T1
Group By
T1.UserId
Order By
T1.UserId
First try I misread the question, following the top answer, here is a complete example with correct results:
CREATE TABLE table_name (id int, the_value varchar(2), the_date datetime);
INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'a','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'b','2/2/2002');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'c','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'d','3/3/2003');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'e','3/3/2003');
--
select id, the_value
from table_name u1
where the_date = (select max(the_date)
from table_name u2
where u1.id = u2.id)
--
id the_value
----------- ---------
2 d
2 e
1 b
(3 row(s) affected)
This will also take care of duplicates (return one row for each user_id):
SELECT *
FROM (
SELECT u.*, FIRST_VALUE(u.rowid) OVER(PARTITION BY u.user_id ORDER BY u.date DESC) AS last_rowid
FROM users u
) u2
WHERE u2.rowid = u2.last_rowid
Just tested this and it seems to work on a logging table
select ColumnNames, max(DateColumn) from log group by ColumnNames order by 1 desc
This should be as simple as:
SELECT UserId, Value
FROM Users u
WHERE Date = (SELECT MAX(Date) FROM Users WHERE UserID = u.UserID)
If (UserID, Date) is unique, i.e. no date appears twice for the same user then:
select TheTable.UserID, TheTable.Value
from TheTable inner join (select UserID, max([Date]) MaxDate
from TheTable
group by UserID) UserMaxDate
on TheTable.UserID = UserMaxDate.UserID
TheTable.[Date] = UserMaxDate.MaxDate;
select UserId,max(Date) over (partition by UserId) value from users;

Retrieve row with latest date ORACLE [duplicate]

Table:
UserId, Value, Date.
I want to get the UserId, Value for the max(Date) for each UserId. That is, the Value for each UserId that has the latest date. Is there a way to do this simply in SQL? (Preferably Oracle)
Update: Apologies for any ambiguity: I need to get ALL the UserIds. But for each UserId, only that row where that user has the latest date.
I see many people use subqueries or else window functions to do this, but I often do this kind of query without subqueries in the following way. It uses plain, standard SQL so it should work in any brand of RDBMS.
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON (t1.UserId = t2.UserId AND t1."Date" < t2."Date")
WHERE t2.UserId IS NULL;
In other words: fetch the row from t1 where no other row exists with the same UserId and a greater Date.
(I put the identifier "Date" in delimiters because it's an SQL reserved word.)
In case if t1."Date" = t2."Date", doubling appears. Usually tables has auto_inc(seq) key, e.g. id.
To avoid doubling can be used follows:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON t1.UserId = t2.UserId AND ((t1."Date" < t2."Date")
OR (t1."Date" = t2."Date" AND t1.id < t2.id))
WHERE t2.UserId IS NULL;
Re comment from #Farhan:
Here's a more detailed explanation:
An outer join attempts to join t1 with t2. By default, all results of t1 are returned, and if there is a match in t2, it is also returned. If there is no match in t2 for a given row of t1, then the query still returns the row of t1, and uses NULL as a placeholder for all of t2's columns. That's just how outer joins work in general.
The trick in this query is to design the join's matching condition such that t2 must match the same userid, and a greater date. The idea being if a row exists in t2 that has a greater date, then the row in t1 it's compared against can't be the greatest date for that userid. But if there is no match -- i.e. if no row exists in t2 with a greater date than the row in t1 -- we know that the row in t1 was the row with the greatest date for the given userid.
In those cases (when there's no match), the columns of t2 will be NULL -- even the columns specified in the join condition. So that's why we use WHERE t2.UserId IS NULL, because we're searching for the cases where no row was found with a greater date for the given userid.
This will retrieve all rows for which the my_date column value is equal to the maximum value of my_date for that userid. This may retrieve multiple rows for the userid where the maximum date is on multiple rows.
select userid,
my_date,
...
from
(
select userid,
my_date,
...
max(my_date) over (partition by userid) max_my_date
from users
)
where my_date = max_my_date
"Analytic functions rock"
Edit: With regard to the first comment ...
"using analytic queries and a self-join defeats the purpose of analytic queries"
There is no self-join in this code. There is instead a predicate placed on the result of the inline view that contains the analytic function -- a very different matter, and completely standard practice.
"The default window in Oracle is from the first row in the partition to the current one"
The windowing clause is only applicable in the presence of the order by clause. With no order by clause, no windowing clause is applied by default and none can be explicitly specified.
The code works.
SELECT userid, MAX(value) KEEP (DENSE_RANK FIRST ORDER BY date DESC)
FROM table
GROUP BY userid
I don't know your exact columns names, but it would be something like this:
SELECT userid, value
FROM users u1
WHERE date = (
SELECT MAX(date)
FROM users u2
WHERE u1.userid = u2.userid
)
Not being at work, I don't have Oracle to hand, but I seem to recall that Oracle allows multiple columns to be matched in an IN clause, which should at least avoid the options that use a correlated subquery, which is seldom a good idea.
Something like this, perhaps (can't remember if the column list should be parenthesised or not):
SELECT *
FROM MyTable
WHERE (User, Date) IN
( SELECT User, MAX(Date) FROM MyTable GROUP BY User)
EDIT: Just tried it for real:
SQL> create table MyTable (usr char(1), dt date);
SQL> insert into mytable values ('A','01-JAN-2009');
SQL> insert into mytable values ('B','01-JAN-2009');
SQL> insert into mytable values ('A', '31-DEC-2008');
SQL> insert into mytable values ('B', '31-DEC-2008');
SQL> select usr, dt from mytable
2 where (usr, dt) in
3 ( select usr, max(dt) from mytable group by usr)
4 /
U DT
- ---------
A 01-JAN-09
B 01-JAN-09
So it works, although some of the new-fangly stuff mentioned elsewhere may be more performant.
I know you asked for Oracle, but in SQL 2005 we now use this:
-- Single Value
;WITH ByDate
AS (
SELECT UserId, Value, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) RowNum
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE RowNum = 1
-- Multiple values where dates match
;WITH ByDate
AS (
SELECT UserId, Value, RANK() OVER (PARTITION BY UserId ORDER BY Date DESC) Rnk
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE Rnk = 1
I don't have Oracle to test it, but the most efficient solution is to use analytic queries. It should look something like this:
SELECT DISTINCT
UserId
, MaxValue
FROM (
SELECT UserId
, FIRST (Value) Over (
PARTITION BY UserId
ORDER BY Date DESC
) MaxValue
FROM SomeTable
)
I suspect that you can get rid of the outer query and put distinct on the inner, but I'm not sure. In the meantime I know this one works.
If you want to learn about analytic queries, I'd suggest reading http://www.orafaq.com/node/55 and http://www.akadia.com/services/ora_analytic_functions.html. Here is the short summary.
Under the hood analytic queries sort the whole dataset, then process it sequentially. As you process it you partition the dataset according to certain criteria, and then for each row looks at some window (defaults to the first value in the partition to the current row - that default is also the most efficient) and can compute values using a number of analytic functions (the list of which is very similar to the aggregate functions).
In this case here is what the inner query does. The whole dataset is sorted by UserId then Date DESC. Then it processes it in one pass. For each row you return the UserId and the first Date seen for that UserId (since dates are sorted DESC, that's the max date). This gives you your answer with duplicated rows. Then the outer DISTINCT squashes duplicates.
This is not a particularly spectacular example of analytic queries. For a much bigger win consider taking a table of financial receipts and calculating for each user and receipt, a running total of what they paid. Analytic queries solve that efficiently. Other solutions are less efficient. Which is why they are part of the 2003 SQL standard. (Unfortunately Postgres doesn't have them yet. Grrr...)
Wouldn't a QUALIFY clause be both simplest and best?
select userid, my_date, ...
from users
qualify rank() over (partition by userid order by my_date desc) = 1
For context, on Teradata here a decent size test of this runs in 17s with this QUALIFY version and in 23s with the 'inline view'/Aldridge solution #1.
In Oracle 12c+, you can use Top n queries along with analytic function rank to achieve this very concisely without subqueries:
select *
from your_table
order by rank() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;
The above returns all the rows with max my_date per user.
If you want only one row with max date, then replace the rank with row_number:
select *
from your_table
order by row_number() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;
With PostgreSQL 8.4 or later, you can use this:
select user_id, user_value_1, user_value_2
from (select user_id, user_value_1, user_value_2, row_number()
over (partition by user_id order by user_date desc)
from users) as r
where r.row_number=1
Just had to write a "live" example at work :)
This one supports multiple values for UserId on the same date.
Columns:
UserId, Value, Date
SELECT
DISTINCT UserId,
MAX(Date) OVER (PARTITION BY UserId ORDER BY Date DESC),
MAX(Values) OVER (PARTITION BY UserId ORDER BY Date DESC)
FROM
(
SELECT UserId, Date, SUM(Value) As Values
FROM <<table_name>>
GROUP BY UserId, Date
)
You can use FIRST_VALUE instead of MAX and look it up in the explain plan. I didn't have the time to play with it.
Of course, if searching through huge tables, it's probably better if you use FULL hints in your query.
I'm quite late to the party but the following hack will outperform both correlated subqueries and any analytics function but has one restriction: values must convert to strings. So it works for dates, numbers and other strings. The code does not look good but the execution profile is great.
select
userid,
to_number(substr(max(to_char(date,'yyyymmdd') || to_char(value)), 9)) as value,
max(date) as date
from
users
group by
userid
The reason why this code works so well is that it only needs to scan the table once. It does not require any indexes and most importantly it does not need to sort the table, which most analytics functions do. Indexes will help though if you need to filter the result for a single userid.
Use ROW_NUMBER() to assign a unique ranking on descending Date for each UserId, then filter to the first row for each UserId (i.e., ROW_NUMBER = 1).
SELECT UserId, Value, Date
FROM (SELECT UserId, Value, Date,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) rn
FROM users) u
WHERE rn = 1;
If you're using Postgres, you can use array_agg like
SELECT userid,MAX(adate),(array_agg(value ORDER BY adate DESC))[1] as value
FROM YOURTABLE
GROUP BY userid
I'm not familiar with Oracle. This is what I came up with
SELECT
userid,
MAX(adate),
SUBSTR(
(LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)),
0,
INSTR((LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)), ',')-1
) as value
FROM YOURTABLE
GROUP BY userid
Both queries return the same results as the accepted answer. See SQLFiddles:
Accepted answer
My solution with Postgres
My solution with Oracle
I think something like this. (Forgive me for any syntax mistakes; I'm used to using HQL at this point!)
EDIT: Also misread the question! Corrected the query...
SELECT UserId, Value
FROM Users AS user
WHERE Date = (
SELECT MAX(Date)
FROM Users AS maxtest
WHERE maxtest.UserId = user.UserId
)
i thing you shuold make this variant to previous query:
SELECT UserId, Value FROM Users U1 WHERE
Date = ( SELECT MAX(Date) FROM Users where UserId = U1.UserId)
Select
UserID,
Value,
Date
From
Table,
(
Select
UserID,
Max(Date) as MDate
From
Table
Group by
UserID
) as subQuery
Where
Table.UserID = subQuery.UserID and
Table.Date = subQuery.mDate
select VALUE from TABLE1 where TIME =
(select max(TIME) from TABLE1 where DATE=
(select max(DATE) from TABLE1 where CRITERIA=CRITERIA))
(T-SQL) First get all the users and their maxdate. Join with the table to find the corresponding values for the users on the maxdates.
create table users (userid int , value int , date datetime)
insert into users values (1, 1, '20010101')
insert into users values (1, 2, '20020101')
insert into users values (2, 1, '20010101')
insert into users values (2, 3, '20030101')
select T1.userid, T1.value, T1.date
from users T1,
(select max(date) as maxdate, userid from users group by userid) T2
where T1.userid= T2.userid and T1.date = T2.maxdate
results:
userid value date
----------- ----------- --------------------------
2 3 2003-01-01 00:00:00.000
1 2 2002-01-01 00:00:00.000
The answer here is Oracle only. Here's a bit more sophisticated answer in all SQL:
Who has the best overall homework result (maximum sum of homework points)?
SELECT FIRST, LAST, SUM(POINTS) AS TOTAL
FROM STUDENTS S, RESULTS R
WHERE S.SID = R.SID AND R.CAT = 'H'
GROUP BY S.SID, FIRST, LAST
HAVING SUM(POINTS) >= ALL (SELECT SUM (POINTS)
FROM RESULTS
WHERE CAT = 'H'
GROUP BY SID)
And a more difficult example, which need some explanation, for which I don't have time atm:
Give the book (ISBN and title) that is most popular in 2008, i.e., which is borrowed most often in 2008.
SELECT X.ISBN, X.title, X.loans
FROM (SELECT Book.ISBN, Book.title, count(Loan.dateTimeOut) AS loans
FROM CatalogEntry Book
LEFT JOIN BookOnShelf Copy
ON Book.bookId = Copy.bookId
LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan
ON Copy.copyId = Loan.copyId
GROUP BY Book.title) X
HAVING loans >= ALL (SELECT count(Loan.dateTimeOut) AS loans
FROM CatalogEntry Book
LEFT JOIN BookOnShelf Copy
ON Book.bookId = Copy.bookId
LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan
ON Copy.copyId = Loan.copyId
GROUP BY Book.title);
Hope this helps (anyone).. :)
Regards,
Guus
Assuming Date is unique for a given UserID, here's some TSQL:
SELECT
UserTest.UserID, UserTest.Value
FROM UserTest
INNER JOIN
(
SELECT UserID, MAX(Date) MaxDate
FROM UserTest
GROUP BY UserID
) Dates
ON UserTest.UserID = Dates.UserID
AND UserTest.Date = Dates.MaxDate
Solution for MySQL which doesn't have concepts of partition KEEP, DENSE_RANK.
select userid,
my_date,
...
from
(
select #sno:= case when #pid<>userid then 0
else #sno+1
end as serialnumber,
#pid:=userid,
my_Date,
...
from users order by userid, my_date
) a
where a.serialnumber=0
Reference: http://benincampus.blogspot.com/2013/08/select-rows-which-have-maxmin-value-in.html
select userid, value, date
from thetable t1 ,
( select t2.userid, max(t2.date) date2
from thetable t2
group by t2.userid ) t3
where t3.userid t1.userid and
t3.date2 = t1.date
IMHO this works. HTH
I think this should work?
Select
T1.UserId,
(Select Top 1 T2.Value From Table T2 Where T2.UserId = T1.UserId Order By Date Desc) As 'Value'
From
Table T1
Group By
T1.UserId
Order By
T1.UserId
First try I misread the question, following the top answer, here is a complete example with correct results:
CREATE TABLE table_name (id int, the_value varchar(2), the_date datetime);
INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'a','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'b','2/2/2002');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'c','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'d','3/3/2003');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'e','3/3/2003');
--
select id, the_value
from table_name u1
where the_date = (select max(the_date)
from table_name u2
where u1.id = u2.id)
--
id the_value
----------- ---------
2 d
2 e
1 b
(3 row(s) affected)
This will also take care of duplicates (return one row for each user_id):
SELECT *
FROM (
SELECT u.*, FIRST_VALUE(u.rowid) OVER(PARTITION BY u.user_id ORDER BY u.date DESC) AS last_rowid
FROM users u
) u2
WHERE u2.rowid = u2.last_rowid
Just tested this and it seems to work on a logging table
select ColumnNames, max(DateColumn) from log group by ColumnNames order by 1 desc
This should be as simple as:
SELECT UserId, Value
FROM Users u
WHERE Date = (SELECT MAX(Date) FROM Users WHERE UserID = u.UserID)
If (UserID, Date) is unique, i.e. no date appears twice for the same user then:
select TheTable.UserID, TheTable.Value
from TheTable inner join (select UserID, max([Date]) MaxDate
from TheTable
group by UserID) UserMaxDate
on TheTable.UserID = UserMaxDate.UserID
TheTable.[Date] = UserMaxDate.MaxDate;
select UserId,max(Date) over (partition by UserId) value from users;

SQL ranking over two tables

I have two tables with user rankings.
Table rankingA and rankingB.
Each table has the columns:
user_id
points
group_id
Higher the points so higher the rank of the user/group...
Now i try to get the group ranking for the question which rank has my group.
So far i have this SQL:
select sum(ra.points) as rapoints, sum(rb.points) as rbpoints from public.rankinga ra
LEFT JOIN public.rankingb rb ON ra.group_id=rb.group_id and ra.user_id=rb.user_id where
ra.group_id=200;
It returns the points from rankinga and rankinb for the group 200.
How can i get the rankings of the group? I tryd it with:
row_number() OVER (ORDER BY sum(rb.points) DESC) AS rankb
but got a wrong result.
My expected result for group_id 200 is:
rapoints,rbpoints,rarank, rbrank
420, 10, 3, same points as group_id 300 so rbrank 2 or 3
How can i get this?
Setup
CREATE TABLE rankinga
(
user_id bigint,
group_id bigint,
points integer
)
CREATE TABLE rankingb
(
user_id bigint,
group_id bigint,
points integer
)
insert into public.rankinga (user_id,group_id,points) values (1,100,120),(2,100,300), (3,100,20),(4,200,300),(5,200,120),(6,300,600);
insert into public.rankingb (user_id,group_id,points) values (1,100,5),(2,100,3),(3,100,10),(4,200,2),(5,200,8),(6,300,10);
I think you want to do this with union all, aggregation, and the window function. Joining the tables is likely to miss rows (if users are in one table but not the other) or over count (if you join on group). So this may do what you want:
select group_id, sum(rapoints) as rapoints, sum(rbpoints) as rbpoints,
sum(rapoints) + sum(rbpoints) as points,
dense_rank() over (order by sum(rapoints) + sum(rbpoints) desc) as ranking
from ((select ra.group_id, sum(ra.points) as rapoints, 0 as rbpoints
from public.rankinga ra
group by ra.group_id
) union all
(select rb.group_id, 0, sum(rb.points) as rbpoints
from public.rankingb rb
group by rb.group_id
)
) ab
group by group_id;
If you want to select just one group, then put this in a subquery (or CTE) and then select the group.
Here is a SQL Fiddle.
EDIT:
If you want just the result for one group, you still need to calculate the values for all groups. So:
select ab.*
from (<above query here>) ab
where group_id = 200;

Fetch the rows which have the Max value for a column for each distinct value of another column

Table:
UserId, Value, Date.
I want to get the UserId, Value for the max(Date) for each UserId. That is, the Value for each UserId that has the latest date. Is there a way to do this simply in SQL? (Preferably Oracle)
Update: Apologies for any ambiguity: I need to get ALL the UserIds. But for each UserId, only that row where that user has the latest date.
I see many people use subqueries or else window functions to do this, but I often do this kind of query without subqueries in the following way. It uses plain, standard SQL so it should work in any brand of RDBMS.
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON (t1.UserId = t2.UserId AND t1."Date" < t2."Date")
WHERE t2.UserId IS NULL;
In other words: fetch the row from t1 where no other row exists with the same UserId and a greater Date.
(I put the identifier "Date" in delimiters because it's an SQL reserved word.)
In case if t1."Date" = t2."Date", doubling appears. Usually tables has auto_inc(seq) key, e.g. id.
To avoid doubling can be used follows:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON t1.UserId = t2.UserId AND ((t1."Date" < t2."Date")
OR (t1."Date" = t2."Date" AND t1.id < t2.id))
WHERE t2.UserId IS NULL;
Re comment from #Farhan:
Here's a more detailed explanation:
An outer join attempts to join t1 with t2. By default, all results of t1 are returned, and if there is a match in t2, it is also returned. If there is no match in t2 for a given row of t1, then the query still returns the row of t1, and uses NULL as a placeholder for all of t2's columns. That's just how outer joins work in general.
The trick in this query is to design the join's matching condition such that t2 must match the same userid, and a greater date. The idea being if a row exists in t2 that has a greater date, then the row in t1 it's compared against can't be the greatest date for that userid. But if there is no match -- i.e. if no row exists in t2 with a greater date than the row in t1 -- we know that the row in t1 was the row with the greatest date for the given userid.
In those cases (when there's no match), the columns of t2 will be NULL -- even the columns specified in the join condition. So that's why we use WHERE t2.UserId IS NULL, because we're searching for the cases where no row was found with a greater date for the given userid.
This will retrieve all rows for which the my_date column value is equal to the maximum value of my_date for that userid. This may retrieve multiple rows for the userid where the maximum date is on multiple rows.
select userid,
my_date,
...
from
(
select userid,
my_date,
...
max(my_date) over (partition by userid) max_my_date
from users
)
where my_date = max_my_date
"Analytic functions rock"
Edit: With regard to the first comment ...
"using analytic queries and a self-join defeats the purpose of analytic queries"
There is no self-join in this code. There is instead a predicate placed on the result of the inline view that contains the analytic function -- a very different matter, and completely standard practice.
"The default window in Oracle is from the first row in the partition to the current one"
The windowing clause is only applicable in the presence of the order by clause. With no order by clause, no windowing clause is applied by default and none can be explicitly specified.
The code works.
SELECT userid, MAX(value) KEEP (DENSE_RANK FIRST ORDER BY date DESC)
FROM table
GROUP BY userid
I don't know your exact columns names, but it would be something like this:
SELECT userid, value
FROM users u1
WHERE date = (
SELECT MAX(date)
FROM users u2
WHERE u1.userid = u2.userid
)
Not being at work, I don't have Oracle to hand, but I seem to recall that Oracle allows multiple columns to be matched in an IN clause, which should at least avoid the options that use a correlated subquery, which is seldom a good idea.
Something like this, perhaps (can't remember if the column list should be parenthesised or not):
SELECT *
FROM MyTable
WHERE (User, Date) IN
( SELECT User, MAX(Date) FROM MyTable GROUP BY User)
EDIT: Just tried it for real:
SQL> create table MyTable (usr char(1), dt date);
SQL> insert into mytable values ('A','01-JAN-2009');
SQL> insert into mytable values ('B','01-JAN-2009');
SQL> insert into mytable values ('A', '31-DEC-2008');
SQL> insert into mytable values ('B', '31-DEC-2008');
SQL> select usr, dt from mytable
2 where (usr, dt) in
3 ( select usr, max(dt) from mytable group by usr)
4 /
U DT
- ---------
A 01-JAN-09
B 01-JAN-09
So it works, although some of the new-fangly stuff mentioned elsewhere may be more performant.
I know you asked for Oracle, but in SQL 2005 we now use this:
-- Single Value
;WITH ByDate
AS (
SELECT UserId, Value, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) RowNum
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE RowNum = 1
-- Multiple values where dates match
;WITH ByDate
AS (
SELECT UserId, Value, RANK() OVER (PARTITION BY UserId ORDER BY Date DESC) Rnk
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE Rnk = 1
I don't have Oracle to test it, but the most efficient solution is to use analytic queries. It should look something like this:
SELECT DISTINCT
UserId
, MaxValue
FROM (
SELECT UserId
, FIRST (Value) Over (
PARTITION BY UserId
ORDER BY Date DESC
) MaxValue
FROM SomeTable
)
I suspect that you can get rid of the outer query and put distinct on the inner, but I'm not sure. In the meantime I know this one works.
If you want to learn about analytic queries, I'd suggest reading http://www.orafaq.com/node/55 and http://www.akadia.com/services/ora_analytic_functions.html. Here is the short summary.
Under the hood analytic queries sort the whole dataset, then process it sequentially. As you process it you partition the dataset according to certain criteria, and then for each row looks at some window (defaults to the first value in the partition to the current row - that default is also the most efficient) and can compute values using a number of analytic functions (the list of which is very similar to the aggregate functions).
In this case here is what the inner query does. The whole dataset is sorted by UserId then Date DESC. Then it processes it in one pass. For each row you return the UserId and the first Date seen for that UserId (since dates are sorted DESC, that's the max date). This gives you your answer with duplicated rows. Then the outer DISTINCT squashes duplicates.
This is not a particularly spectacular example of analytic queries. For a much bigger win consider taking a table of financial receipts and calculating for each user and receipt, a running total of what they paid. Analytic queries solve that efficiently. Other solutions are less efficient. Which is why they are part of the 2003 SQL standard. (Unfortunately Postgres doesn't have them yet. Grrr...)
Wouldn't a QUALIFY clause be both simplest and best?
select userid, my_date, ...
from users
qualify rank() over (partition by userid order by my_date desc) = 1
For context, on Teradata here a decent size test of this runs in 17s with this QUALIFY version and in 23s with the 'inline view'/Aldridge solution #1.
In Oracle 12c+, you can use Top n queries along with analytic function rank to achieve this very concisely without subqueries:
select *
from your_table
order by rank() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;
The above returns all the rows with max my_date per user.
If you want only one row with max date, then replace the rank with row_number:
select *
from your_table
order by row_number() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;
With PostgreSQL 8.4 or later, you can use this:
select user_id, user_value_1, user_value_2
from (select user_id, user_value_1, user_value_2, row_number()
over (partition by user_id order by user_date desc)
from users) as r
where r.row_number=1
Just had to write a "live" example at work :)
This one supports multiple values for UserId on the same date.
Columns:
UserId, Value, Date
SELECT
DISTINCT UserId,
MAX(Date) OVER (PARTITION BY UserId ORDER BY Date DESC),
MAX(Values) OVER (PARTITION BY UserId ORDER BY Date DESC)
FROM
(
SELECT UserId, Date, SUM(Value) As Values
FROM <<table_name>>
GROUP BY UserId, Date
)
You can use FIRST_VALUE instead of MAX and look it up in the explain plan. I didn't have the time to play with it.
Of course, if searching through huge tables, it's probably better if you use FULL hints in your query.
I'm quite late to the party but the following hack will outperform both correlated subqueries and any analytics function but has one restriction: values must convert to strings. So it works for dates, numbers and other strings. The code does not look good but the execution profile is great.
select
userid,
to_number(substr(max(to_char(date,'yyyymmdd') || to_char(value)), 9)) as value,
max(date) as date
from
users
group by
userid
The reason why this code works so well is that it only needs to scan the table once. It does not require any indexes and most importantly it does not need to sort the table, which most analytics functions do. Indexes will help though if you need to filter the result for a single userid.
Use ROW_NUMBER() to assign a unique ranking on descending Date for each UserId, then filter to the first row for each UserId (i.e., ROW_NUMBER = 1).
SELECT UserId, Value, Date
FROM (SELECT UserId, Value, Date,
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) rn
FROM users) u
WHERE rn = 1;
If you're using Postgres, you can use array_agg like
SELECT userid,MAX(adate),(array_agg(value ORDER BY adate DESC))[1] as value
FROM YOURTABLE
GROUP BY userid
I'm not familiar with Oracle. This is what I came up with
SELECT
userid,
MAX(adate),
SUBSTR(
(LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)),
0,
INSTR((LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)), ',')-1
) as value
FROM YOURTABLE
GROUP BY userid
Both queries return the same results as the accepted answer. See SQLFiddles:
Accepted answer
My solution with Postgres
My solution with Oracle
I think something like this. (Forgive me for any syntax mistakes; I'm used to using HQL at this point!)
EDIT: Also misread the question! Corrected the query...
SELECT UserId, Value
FROM Users AS user
WHERE Date = (
SELECT MAX(Date)
FROM Users AS maxtest
WHERE maxtest.UserId = user.UserId
)
i thing you shuold make this variant to previous query:
SELECT UserId, Value FROM Users U1 WHERE
Date = ( SELECT MAX(Date) FROM Users where UserId = U1.UserId)
Select
UserID,
Value,
Date
From
Table,
(
Select
UserID,
Max(Date) as MDate
From
Table
Group by
UserID
) as subQuery
Where
Table.UserID = subQuery.UserID and
Table.Date = subQuery.mDate
select VALUE from TABLE1 where TIME =
(select max(TIME) from TABLE1 where DATE=
(select max(DATE) from TABLE1 where CRITERIA=CRITERIA))
(T-SQL) First get all the users and their maxdate. Join with the table to find the corresponding values for the users on the maxdates.
create table users (userid int , value int , date datetime)
insert into users values (1, 1, '20010101')
insert into users values (1, 2, '20020101')
insert into users values (2, 1, '20010101')
insert into users values (2, 3, '20030101')
select T1.userid, T1.value, T1.date
from users T1,
(select max(date) as maxdate, userid from users group by userid) T2
where T1.userid= T2.userid and T1.date = T2.maxdate
results:
userid value date
----------- ----------- --------------------------
2 3 2003-01-01 00:00:00.000
1 2 2002-01-01 00:00:00.000
The answer here is Oracle only. Here's a bit more sophisticated answer in all SQL:
Who has the best overall homework result (maximum sum of homework points)?
SELECT FIRST, LAST, SUM(POINTS) AS TOTAL
FROM STUDENTS S, RESULTS R
WHERE S.SID = R.SID AND R.CAT = 'H'
GROUP BY S.SID, FIRST, LAST
HAVING SUM(POINTS) >= ALL (SELECT SUM (POINTS)
FROM RESULTS
WHERE CAT = 'H'
GROUP BY SID)
And a more difficult example, which need some explanation, for which I don't have time atm:
Give the book (ISBN and title) that is most popular in 2008, i.e., which is borrowed most often in 2008.
SELECT X.ISBN, X.title, X.loans
FROM (SELECT Book.ISBN, Book.title, count(Loan.dateTimeOut) AS loans
FROM CatalogEntry Book
LEFT JOIN BookOnShelf Copy
ON Book.bookId = Copy.bookId
LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan
ON Copy.copyId = Loan.copyId
GROUP BY Book.title) X
HAVING loans >= ALL (SELECT count(Loan.dateTimeOut) AS loans
FROM CatalogEntry Book
LEFT JOIN BookOnShelf Copy
ON Book.bookId = Copy.bookId
LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan
ON Copy.copyId = Loan.copyId
GROUP BY Book.title);
Hope this helps (anyone).. :)
Regards,
Guus
Assuming Date is unique for a given UserID, here's some TSQL:
SELECT
UserTest.UserID, UserTest.Value
FROM UserTest
INNER JOIN
(
SELECT UserID, MAX(Date) MaxDate
FROM UserTest
GROUP BY UserID
) Dates
ON UserTest.UserID = Dates.UserID
AND UserTest.Date = Dates.MaxDate
Solution for MySQL which doesn't have concepts of partition KEEP, DENSE_RANK.
select userid,
my_date,
...
from
(
select #sno:= case when #pid<>userid then 0
else #sno+1
end as serialnumber,
#pid:=userid,
my_Date,
...
from users order by userid, my_date
) a
where a.serialnumber=0
Reference: http://benincampus.blogspot.com/2013/08/select-rows-which-have-maxmin-value-in.html
select userid, value, date
from thetable t1 ,
( select t2.userid, max(t2.date) date2
from thetable t2
group by t2.userid ) t3
where t3.userid t1.userid and
t3.date2 = t1.date
IMHO this works. HTH
I think this should work?
Select
T1.UserId,
(Select Top 1 T2.Value From Table T2 Where T2.UserId = T1.UserId Order By Date Desc) As 'Value'
From
Table T1
Group By
T1.UserId
Order By
T1.UserId
First try I misread the question, following the top answer, here is a complete example with correct results:
CREATE TABLE table_name (id int, the_value varchar(2), the_date datetime);
INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'a','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'b','2/2/2002');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'c','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'d','3/3/2003');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'e','3/3/2003');
--
select id, the_value
from table_name u1
where the_date = (select max(the_date)
from table_name u2
where u1.id = u2.id)
--
id the_value
----------- ---------
2 d
2 e
1 b
(3 row(s) affected)
This will also take care of duplicates (return one row for each user_id):
SELECT *
FROM (
SELECT u.*, FIRST_VALUE(u.rowid) OVER(PARTITION BY u.user_id ORDER BY u.date DESC) AS last_rowid
FROM users u
) u2
WHERE u2.rowid = u2.last_rowid
Just tested this and it seems to work on a logging table
select ColumnNames, max(DateColumn) from log group by ColumnNames order by 1 desc
This should be as simple as:
SELECT UserId, Value
FROM Users u
WHERE Date = (SELECT MAX(Date) FROM Users WHERE UserID = u.UserID)
If (UserID, Date) is unique, i.e. no date appears twice for the same user then:
select TheTable.UserID, TheTable.Value
from TheTable inner join (select UserID, max([Date]) MaxDate
from TheTable
group by UserID) UserMaxDate
on TheTable.UserID = UserMaxDate.UserID
TheTable.[Date] = UserMaxDate.MaxDate;
select UserId,max(Date) over (partition by UserId) value from users;