Max (SQL-Server) - sql

I have a table that looks like this:
BARCODE | PRICE | STARTDATE
007023819815 | 159000 | 2008-11-17 00:00:00.000
007023819815 | 319000 | 2009-02-01 00:00:00.000
How can I select so I can get the result like this:
BARCODE | PRICE | STARTDATE
007023819815 | 319000 | 2009-02-01 00:00:00.000
select by using max date.
Thanks in advance.

SELECT TOP 1 barcode, price, startdate
FROM TableName
ORDER BY startdate DESC
Or, if there can be more than one row:
SELECT barcode, price, startdate
FROM TableName A
WHERE startdate = (SELECT max(startdate) FROM TableName B WHERE B.barcode = A.barcode)
UPDATE
changed second query to view max values per barcode.

An elegant way to do that is using the analytic function row_number:
SELECT barcode, price, startdate
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY barcode ORDER BY startdate DESC) as rn
FROM YourTable
) subquery
WHERE rn = 1
If performance is an issue, check out some more complex options in this blog post.
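To try the row_number approach without a SQL Server instance, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (it needs version 3.25 or later for window functions), and the table name prices and the second barcode are assumptions added for illustration.

```python
import sqlite3

# Hypothetical in-memory table mirroring the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (barcode TEXT, price INTEGER, startdate TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        ("007023819815", 159000, "2008-11-17"),
        ("007023819815", 319000, "2009-02-01"),
        ("007023819816", 50000, "2009-01-05"),  # extra barcode, added for illustration
    ],
)

# Number the rows of each barcode from newest to oldest and keep number 1.
latest = conn.execute(
    """
    SELECT barcode, price, startdate
    FROM (
        SELECT barcode, price, startdate,
               ROW_NUMBER() OVER (PARTITION BY barcode ORDER BY startdate DESC) AS rn
        FROM prices
    ) AS ranked
    WHERE rn = 1
    ORDER BY barcode
    """
).fetchall()

for row in latest:
    print(row)
```

Each barcode comes back exactly once, with the price that belongs to its latest startdate.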

Related

SQL - Convert ROW_NUMBER function with multiple order by

I am trying to convert this query to a subquery without the ROW_NUMBER function:
SELECT InvestorFundId, AsOfDate, AddedOn FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY InvestorFundId ORDER BY AsOfDate DESC, AddedOn DESC) AS HistoryIndex, *
FROM [Investor.Fund.History]
WHERE DataStatusId = 1 AND AsOfDate <= (SELECT PeriodDate FROM [Fund.Period] WHERE Id = 5461)
) G WHERE HISTORYINDEX = 1
Basically this is selecting the most recent [Investor.Fund.History] within a time period and depending on the status.
So far I have this:
SELECT InvestorFundId, MAX(AsOfDate) AS MaxAsOfDate, MAX(AddedOn) AS MaxAddedOn FROM [Investor.Fund.History]
WHERE DataStatusId = 1 AND AsOfDate <= (SELECT PeriodDate FROM [Fund.Period] WHERE Id = 5461)
GROUP BY InvestorFundId
My query gives incorrect results because, when I use MAX on multiple columns, each maximum is computed independently. It does not pick the maximum according to the combined ordering of both columns, as ROW_NUMBER does.
For example, if I have a subset of data which looks like this:
| InvestorFundId | AsOfDate | AddedOn |
| 1 | 2010-10-01 00:00:00.000 | 2012-04-18 09:29:33.277 |
| 1 | 2006-11-01 00:00:00.000 | 2013-04-18 11:25:23.033 |
The ROW_NUMBER function will return the following:
| 1 | 2010-10-01 00:00:00.000 | 2012-04-18 09:29:33.277 |
Whereas my function returns this:
| 1 | 2010-10-01 00:00:00.000 | 2013-04-18 11:25:23.033 |
Which, as you can see, is not actually a row in the table.
I would like my query to return the actual row in the table that has the latest AsOfDate and, among those, the latest AddedOn.
Can anyone help?
If you have a unique id that identifies each row, then you can do:
WITH ifh as (
SELECT InvestorFundId, AsOfDate, AddedOn
FROM [Investor.Fund.History]
WHERE DataStatusId = 1 AND AsOfDate <= (SELECT PeriodDate FROM [Fund.Period] WHERE Id = 5461)
)
SELECT ifh.*
FROM ifh
WHERE ifh.? = (SELECT ?
FROM ifh ifh2
WHERE ifh2.InvestorFundId = ifh.InvestorFundId
ORDER BY AsOfDate DESC, AddedOn DESC
FETCH FIRST 1 ROW ONLY
);
The ? is for the column that uniquely identifies each row.
This is also possible to do using APPLY:
select ifh2.*
from (select distinct InvestorFundId
from ifh
) i cross apply
(select top (1) ifh2.*
from ifh ifh2
where ifh2.InvestorFundId = i.InvestorFundId
order by AsOfDate DESC, AddedOn DESC
) ifh2;
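The difference between independent MAX() calls and "the top row by two sort keys" can be checked on the two sample rows from the question. The sketch below is not the method from the answer above: it swaps in a NOT EXISTS comparison, which needs neither ROW_NUMBER nor a unique id column. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server, and the table name history is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history (InvestorFundId INTEGER, AsOfDate TEXT, AddedOn TEXT)"
)
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        (1, "2010-10-01", "2012-04-18 09:29:33"),
        (1, "2006-11-01", "2013-04-18 11:25:23"),
    ],
)

# Independent MAX() per column: the two maxima come from different rows.
mixed = conn.execute(
    "SELECT InvestorFundId, MAX(AsOfDate), MAX(AddedOn) "
    "FROM history GROUP BY InvestorFundId"
).fetchall()

# Keep a row only if no other row of the same fund sorts ahead of it
# on (AsOfDate, AddedOn). Rows tied on both columns are all returned.
real = conn.execute(
    """
    SELECT h.InvestorFundId, h.AsOfDate, h.AddedOn
    FROM history AS h
    WHERE NOT EXISTS (
        SELECT 1
        FROM history AS h2
        WHERE h2.InvestorFundId = h.InvestorFundId
          AND (h2.AsOfDate > h.AsOfDate
               OR (h2.AsOfDate = h.AsOfDate AND h2.AddedOn > h.AddedOn))
    )
    """
).fetchall()

print(mixed)
print(real)
```

The first result pairs the 2010 AsOfDate with the 2013 AddedOn, which is not a row in the table. The second returns the row that ROW_NUMBER would rank first.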

get the id based on condition in group by

I'm trying to write a SQL query that merges rows with equal dates. The merge should be based on the highest number of hours, so that in the end I get, for each date, the id of the row with the most hours. I've been trying to do this with a simple GROUP BY, but it doesn't work, because I can't just put an aggregate function on the id column: the id has to be chosen based on the hours condition.
+----+------------+-------+
| id | date       | hours |
+----+------------+-------+
| 1  | 2012-01-01 | 37    |
| 2  | 2012-01-01 | 10    |
| 3  | 2012-01-01 | 5     |
| 4  | 2012-01-02 | 37    |
+----+------------+-------+
desired result
+----+------------+-------+
| id | date       | hours |
+----+------------+-------+
| 1  | 2012-01-01 | 37    |
| 4  | 2012-01-02 | 37    |
+----+------------+-------+
If you want exactly one row -- even if there are ties -- then use row_number():
select t.*
from (select t.*, row_number() over (partition by date order by hours desc) as seqnum
from t
) t
where seqnum = 1;
Ironically, both Postgres and Oracle (the original tags) have what I would consider to be better ways of doing this, but they are quite different.
Postgres:
select distinct on (date) t.*
from t
order by date, hours desc;
Oracle:
select date, max(hours) as hours,
max(id) keep (dense_rank first order by hours desc) as id
from t
group by date;
Here's one approach using row_number:
select id, dt, hours
from (
select id, dt, hours, row_number() over (partition by dt order by hours desc) rn
from yourtable
) t
where rn = 1
You can use a correlated subquery:
select t.*
from table t
where id = (select t1.id
from table t1
where t1.date = t.date
order by t1.hours desc
limit 1);
In Oracle, you can use fetch first 1 row only in the subquery instead of the LIMIT clause.

Return one SQL row per product with price and latest date

I am not a SQL guy. I have used it in the past and rarely have an issue that can't be solved by Google. This time, however, I need to ask the community.
I have a database with a table called 'Transactions'. It has data like this:
ProdNo | Price | TransactionDate | PurchasedBy | etc.....
----------------------------------------------------------
3STRFLEX | 13.02 | 20162911 | AWC | .....
3STRFLEX | 15.02 | 20162011 | DWC | .....
3STRFLEX | 15.02 | 20160101 | AWC | .....
AFTV2 | 35.49 | 20162708 | AWC | .....
AFTV2 | 29.99 | 20160106 | DWC | .....
AFTV2 | 29.99 | 20160205 | AWC | .....
The desired output is:
ProdNo | Price | TransactionDate
-----------------------------------
3STRFLEX | 13.02 | 20162911
AFTV2 | 35.49 | 20162708
I tried to write this myself and ended up with SQL like this:
select t.ProdNo, t.TransactionDate as 'LastPurchaseDate', t.Price
from Transactions t
inner join (
select ProdNo, max(TransactionDate) as 'LastPurchaseDate'
from Transactions
WHERE Price > 0
group by ProdNo
) tm on t.ProdNo = tm.ProdNo and LastPurchaseDate = tm.LastPurchaseDate
However, on my data set this returns multiple rows per product (output cut down):
ProdNo | LastPurchaseDate | Price
3STRFLX | 20120924 | 0.000000
3STRFLX | 20120924 | 22.000000
3STRFLX | 20150623 | 0.000000
3STRFLX | 20150623 | 1.220000
3STRFLX | 20150623 | 1.222197
So to confirm: I would like 1 row per product for the latest date it was purchased regardless of the price, but I need the price in the returned data.
Thanks
You can use a CTE and the ranking function ROW_NUMBER with PARTITION BY:
WITH CTE AS
(
select t.ProdNo, t.TransactionDate as 'LastPurchaseDate', t.Price,
rn = row_number() over (partition by ProdNo order by TransactionDate desc)
from Transactions t
)
SELECT ProdNo, LastPurchaseDate, Price FROM CTE WHERE RN = 1
You're on the right track. What if you use:
select t.ProdNo, t.TransactionDate as 'LastPurchaseDate', t.Price
from Transactions t
inner join (
select ProdNo, max(TransactionDate) as 'LastPurchaseDate'
from Transactions
WHERE Price > 0
group by ProdNo
) tm on t.ProdNo = tm.ProdNo and t.TransactionDate= tm.LastPurchaseDate
Note the change in join conditions.
What happened in your query: LastPurchaseDate = tm.LastPurchaseDate. There is only one column called LastPurchaseDate, so the condition compares it with itself, which is always true. That leaves only t.ProdNo = tm.ProdNo and, since t.ProdNo is not unique, you get multiple records returned for each t.ProdNo.
One method uses the ROW_NUMBER windowed function.
My query uses a common table expression (CTE) to provide sample data. When using this technique you always need a CTE or subquery, because values generated in the SELECT clause are not available to the WHERE clause. This is a consequence of something called the Logical Processing Order: SQL Server filters the data before it generates the row numbers. CTEs and subqueries give you a second WHERE clause that is applied after the row numbers have been generated**.
-- Returning the most recent record from a transaction table.
WITH SampleDate AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY ProdNo ORDER BY TransactionDate DESC) AS Rn,
*
FROM
(
VALUES
('3STRFLEX', 13.02, '20162911', 'AWC '),
('3STRFLEX', 15.02, '20162011', 'DWC '),
('3STRFLEX', 15.02, '20160101', 'AWC '),
('AFTV2' , 35.49, '20162708', 'AWC '),
('AFTV2' , 29.99, '20160106', 'DWC '),
('AFTV2' , 29.99, '20160205', 'AWC ')
) AS x(ProdNo, Price, TransactionDate, PurchasedBy)
)
SELECT
*
FROM
SampleDate
WHERE
Rn = 1
;
** Actually, this isn't entirely true. It is called the logical order for a reason. SQL Server can and will execute your queries any way it sees fit. But however your query is physically executed, it will respect the logical order.
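The point about the logical order (WHERE runs before row numbers exist) can be seen by trying both forms. This is a minimal sketch on SQLite through Python's sqlite3 module, used as a stand-in for SQL Server. The error type and message are SQLite's own and will differ on SQL Server, and the table name transactions with a reduced set of rows is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (ProdNo TEXT, Price REAL, TransactionDate TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("3STRFLEX", 13.02, "20162911"),
        ("3STRFLEX", 15.02, "20162011"),
        ("AFTV2", 35.49, "20162708"),
        ("AFTV2", 29.99, "20160106"),
    ],
)

window = "ROW_NUMBER() OVER (PARTITION BY ProdNo ORDER BY TransactionDate DESC)"

# Attempt 1: filter on the window function directly in WHERE.
# WHERE is processed before window functions, so the engine rejects this.
try:
    conn.execute(
        f"SELECT ProdNo, Price FROM transactions WHERE {window} = 1"
    ).fetchall()
    direct_filter_error = None
except sqlite3.OperationalError as exc:
    direct_filter_error = exc

# Attempt 2: compute the row number in a subquery, then filter in the
# outer WHERE, which is processed after the inner SELECT has finished.
rows = conn.execute(
    f"""
    SELECT ProdNo, Price, TransactionDate
    FROM (
        SELECT ProdNo, Price, TransactionDate, {window} AS rn
        FROM transactions
    ) AS ranked
    WHERE rn = 1
    ORDER BY ProdNo
    """
).fetchall()

print("direct filter rejected:", direct_filter_error is not None)
print(rows)
```

The subquery plays the same role as the CTE in the answer: it gives the row numbers a place to exist before the outer filter looks at them.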
Try this, using row_number():
select * from
(
select
T.ProdNo,
T.TransactionDate as 'LastPurchaseDate',
T.Price,
row_number() over (partition by ProdNo order by TransactionDate desc) as rnk
from Transactions T
)a
where rnk = 1
Please try this:
--If one ProdNo has two same max TransactionDate, will return both two rows
select ProdNo,Price,TransactionDate from Transactions t
where not exists (select 1 from Transactions where ProdNo=t.ProdNo and TransactionDate>t.TransactionDate)
SELECT T.ProdNo, T.Price, T.TransactionDate, T.PurchasedBy
FROM #Transactions T
JOIN
(
-- latest TransactionDate per product
SELECT ProdNo, MAX(TransactionDate) TransactionDate
FROM #Transactions
GROUP BY ProdNo
) A ON A.ProdNo = T.ProdNo AND A.TransactionDate = T.TransactionDate

SQL Server : select from duplicate columns where date newest

I have inherited a SQL Server table in the (abbreviated) form of (includes sample data set):
| SID | name | Invite_Date |
|-----|-------|-------------|
| 101 | foo | 2013-01-06 |
| 102 | bar | 2013-04-04 |
| 101 | fubar | 2013-03-06 |
I need to select all SID's and the Invite_date, but if there is a duplicate SID, then just get the latest entry (by date).
So the results from the above would look like:
101 | fubar | 2013-03-06
102 | bar | 2013-04-04
Any ideas please.
N.B the Invite_date column has been declared as a nvarchar, so to get it in a date format I am using CONVERT(DATE, Invite_date)
You can use a ranking function like ROW_NUMBER or DENSE_RANK in a CTE:
WITH CTE AS
(
SELECT SID, name, Invite_Date,
rn = Row_Number() OVER (PARTITION By SID
Order By Invite_Date DESC)
FROM dbo.TableName
)
SELECT SID, name, Invite_Date
FROM CTE
WHERE RN = 1
Use Row_Number if you want exactly one row per group and Dense_Rank if you want all last Invite_Date rows for each group in case of repeating max-Invite_Dates.
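The difference between the two ranking functions only shows up when the latest date repeats within a group. Here is a minimal sketch on SQLite through Python's sqlite3 module, as a stand-in for SQL Server. The table name invites and the tied row fubar2 are assumptions added to force a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invites (SID INTEGER, name TEXT, Invite_Date TEXT)")
conn.executemany(
    "INSERT INTO invites VALUES (?, ?, ?)",
    [
        (101, "foo", "2013-01-06"),
        (101, "fubar", "2013-03-06"),
        (101, "fubar2", "2013-03-06"),  # hypothetical tie on the latest date
        (102, "bar", "2013-04-04"),
    ],
)


def latest_rows(ranking_function):
    # ranking_function is one of the two fixed names used below,
    # never user input, so interpolating it into the SQL is safe here.
    sql = f"""
        SELECT SID, name, Invite_Date
        FROM (
            SELECT SID, name, Invite_Date,
                   {ranking_function}() OVER (
                       PARTITION BY SID ORDER BY Invite_Date DESC
                   ) AS rn
            FROM invites
        ) AS ranked
        WHERE rn = 1
        ORDER BY SID, name
    """
    return conn.execute(sql).fetchall()


one_per_group = latest_rows("ROW_NUMBER")  # exactly one row per SID
all_ties = latest_rows("DENSE_RANK")       # every row sharing the latest date

print(len(one_per_group), len(all_ties))
```

With ROW_NUMBER, which of the two tied rows for SID 101 survives is not defined unless the ORDER BY gets a tie-breaker column.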
select t1.*
from your_table t1
inner join
(
select sid, max(CONVERT(DATE, Invite_date)) mdate
from your_table
group by sid
) t2 on t1.sid = t2.sid and CONVERT(DATE, t1.Invite_date) = t2.mdate
-- Note: SQL Server rejects this query, because name is neither aggregated nor listed in GROUP BY.
select
SID,name,MAX(Invite_date)
FROM
Table1
GROUP BY
SID
http://sqlfiddle.com/#!2/6b6f66/1

Trouble using ROW_NUMBER() OVER (PARTITION BY ...)

I'm using SQL Server 2008 R2. I have table called EmployeeHistory with the following structure and sample data:
EmployeeID Date DepartmentID SupervisorID
10001 20130101 001 10009
10001 20130909 001 10019
10001 20131201 002 10018
10001 20140501 002 10017
10001 20141001 001 10015
10001 20141201 001 10014
Notice that employee 10001 has moved between 2 departments and has had several supervisors over time. What I am trying to do is to list the start and end dates of this employee's time in each department, ordered by the Date field. So, the output should look like this:
EmployeeID DateStart DateEnd DepartmentID
10001 20130101 20131201 001
10001 20131201 20141001 002
10001 20141001 NULL 001
I intended to partition the data using the following query, but it failed: the department changes from 001 to 002 and then back to 001, so obviously I cannot partition by DepartmentID. I'm sure I'm overlooking the obvious. Any help? Thank you in advance.
SELECT * ,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
ORDER BY [Date]) RN FROM EmployeeHistory
I would do something like this:
;WITH x
AS (SELECT *,
Row_number()
OVER(
partition BY employeeid
ORDER BY [Date]) rn
FROM employeehistory)
SELECT *
FROM x x1
LEFT OUTER JOIN x x2
ON x1.employeeid = x2.employeeid
AND x1.rn = x2.rn + 1
Or maybe it would be x2.rn - 1. You'll have to see. In any case, you get the idea. Once you have the table joined on itself, you can filter, group, sort, etc. to get what you need.
A bit involved. Easiest would be to refer to this SQL Fiddle I created for you that produces the exact result. There are ways you can improve it for performance or other considerations, but this should hopefully at least be clearer than some alternatives.
The gist is, you get a canonical ranking of your data first, then use that to segment the data into groups, then find an end date for each group, then eliminate any intermediate rows. ROW_NUMBER() and CROSS APPLY help a lot in doing it readably.
EDIT 2019:
The SQL Fiddle does in fact seem to be broken, for some reason, but it appears to be a problem on the SQL Fiddle site. Here's a complete version, tested just now on SQL Server 2016:
CREATE TABLE Source
(
EmployeeID int,
DateStarted date,
DepartmentID int
)
INSERT INTO Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001)
SELECT *,
ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS EntryRank,
newid() as GroupKey,
CAST(NULL AS date) AS EndDate
INTO #RankedData
FROM Source
;
UPDATE #RankedData
SET GroupKey = beginDate.GroupKey
FROM #RankedData sup
CROSS APPLY
(
SELECT TOP 1 GroupKey
FROM #RankedData sub
WHERE sub.EmployeeID = sup.EmployeeID AND
sub.DepartmentID = sup.DepartmentID AND
NOT EXISTS
(
SELECT *
FROM #RankedData bot
WHERE bot.EmployeeID = sup.EmployeeID AND
bot.EntryRank BETWEEN sub.EntryRank AND sup.EntryRank AND
bot.DepartmentID <> sup.DepartmentID
)
ORDER BY DateStarted ASC
) beginDate (GroupKey);
UPDATE #RankedData
SET EndDate = nextGroup.DateStarted
FROM #RankedData sup
CROSS APPLY
(
SELECT TOP 1 DateStarted
FROM #RankedData sub
WHERE sub.EmployeeID = sup.EmployeeID AND
sub.DepartmentID <> sup.DepartmentID AND
sub.EntryRank > sup.EntryRank
ORDER BY EntryRank ASC
) nextGroup (DateStarted);
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GroupKey ORDER BY EntryRank ASC) AS GroupRank FROM #RankedData
) FinalRanking
WHERE GroupRank = 1
ORDER BY EntryRank;
DROP TABLE #RankedData
DROP TABLE Source
It looks like a common gaps-and-islands problem. The difference between two sequences of row numbers, rn1 and rn2, gives the "group" number.
Run this query CTE-by-CTE and examine intermediate results to see how it works.
Sample data
I expanded sample data from the question a little.
CREATE TABLE #Source
(
EmployeeID int,
DateStarted date,
DepartmentID int
)
INSERT INTO #Source
VALUES
(10001,'2013-01-01',001),
(10001,'2013-09-09',001),
(10001,'2013-12-01',002),
(10001,'2014-05-01',002),
(10001,'2014-10-01',001),
(10001,'2014-12-01',001),
(10005,'2013-05-01',001),
(10005,'2013-11-09',001),
(10005,'2013-12-01',002),
(10005,'2014-10-01',001),
(10005,'2016-12-01',001);
Query for SQL Server 2008
There is no LEAD function in SQL Server 2008, so I had to use self-join via OUTER APPLY to get the value of the "next" row for the DateEnd.
WITH
CTE
AS
(
SELECT
EmployeeID
,DateStarted
,DepartmentID
,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
FROM #Source
)
,CTE_Groups
AS
(
SELECT
EmployeeID
,MIN(DateStarted) AS DateStart
,DepartmentID
FROM CTE
GROUP BY
EmployeeID
,DepartmentID
,rn1 - rn2
)
SELECT
CTE_Groups.EmployeeID
,CTE_Groups.DepartmentID
,CTE_Groups.DateStart
,A.DateEnd
FROM
CTE_Groups
OUTER APPLY
(
SELECT TOP(1) G2.DateStart AS DateEnd
FROM CTE_Groups AS G2
WHERE
G2.EmployeeID = CTE_Groups.EmployeeID
AND G2.DateStart > CTE_Groups.DateStart
ORDER BY G2.DateStart
) AS A
ORDER BY
EmployeeID
,DateStart
;
Query for SQL Server 2012+
Starting with SQL Server 2012 there is a LEAD function that makes this task more efficient.
WITH
CTE
AS
(
SELECT
EmployeeID
,DateStarted
,DepartmentID
,ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY DateStarted) AS rn1
,ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID ORDER BY DateStarted) AS rn2
FROM #Source
)
,CTE_Groups
AS
(
SELECT
EmployeeID
,MIN(DateStarted) AS DateStart
,DepartmentID
FROM CTE
GROUP BY
EmployeeID
,DepartmentID
,rn1 - rn2
)
SELECT
CTE_Groups.EmployeeID
,CTE_Groups.DepartmentID
,CTE_Groups.DateStart
,LEAD(CTE_Groups.DateStart) OVER (PARTITION BY CTE_Groups.EmployeeID ORDER BY CTE_Groups.DateStart) AS DateEnd
FROM
CTE_Groups
ORDER BY
EmployeeID
,DateStart
;
Result
+------------+--------------+------------+------------+
| EmployeeID | DepartmentID | DateStart | DateEnd |
+------------+--------------+------------+------------+
| 10001 | 1 | 2013-01-01 | 2013-12-01 |
| 10001 | 2 | 2013-12-01 | 2014-10-01 |
| 10001 | 1 | 2014-10-01 | NULL |
| 10005 | 1 | 2013-05-01 | 2013-12-01 |
| 10005 | 2 | 2013-12-01 | 2014-10-01 |
| 10005 | 1 | 2014-10-01 | NULL |
+------------+--------------+------------+------------+
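To see why rn1 - rn2 identifies each run of consecutive rows in the same department, it helps to look at the intermediate values. The sketch below is a minimal reproduction on SQLite through Python's sqlite3 module, as a stand-in for SQL Server (SQLite 3.25 or later for ROW_NUMBER and LEAD). The table name source, the CTE names numbered and islands, and the restriction to employee 10001 are assumptions made for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source (EmployeeID INTEGER, DateStarted TEXT, DepartmentID INTEGER)"
)
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [
        (10001, "2013-01-01", 1),
        (10001, "2013-09-09", 1),
        (10001, "2013-12-01", 2),
        (10001, "2014-05-01", 2),
        (10001, "2014-10-01", 1),
        (10001, "2014-12-01", 1),
    ],
)

numbered_sql = """
    SELECT EmployeeID, DateStarted, DepartmentID,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID
                              ORDER BY DateStarted) AS rn1,
           ROW_NUMBER() OVER (PARTITION BY EmployeeID, DepartmentID
                              ORDER BY DateStarted) AS rn2
    FROM source
"""

# Step 1: the two row-number sequences and their difference, row by row.
steps = conn.execute(numbered_sql + " ORDER BY DateStarted").fetchall()
diffs = [rn1 - rn2 for (_, _, _, rn1, rn2) in steps]
print(diffs)

# Step 2: group on (DepartmentID, rn1 - rn2) to collapse each run into one
# row, then use LEAD to take the next run's start as this run's end.
result = conn.execute(
    f"""
    WITH numbered AS ({numbered_sql}),
    islands AS (
        SELECT EmployeeID, DepartmentID, MIN(DateStarted) AS DateStart
        FROM numbered
        GROUP BY EmployeeID, DepartmentID, rn1 - rn2
    )
    SELECT EmployeeID, DepartmentID, DateStart,
           LEAD(DateStart) OVER (PARTITION BY EmployeeID
                                 ORDER BY DateStart) AS DateEnd
    FROM islands
    ORDER BY EmployeeID, DateStart
    """
).fetchall()

for row in result:
    print(row)
```

The difference stays constant while the department does not change and shifts when the employee returns to an earlier department, so grouping on the department together with the difference separates the first stay in department 1 from the second.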