Get Distinct results of all columns based on MAX DATE of one - sql

Using SQL Server 2012
I have seen a few threads about this topic but I can't find one that involves multiple joins in the query. I can't create a VIEW on this database so the joins are needed.
The Query
SELECT
p.Price
,s.Type
,s.Symbol
, MAX(d.Date) Maxed
FROM AdventDW.dbo.FactPrices p
INNER JOIN dbo.DimSecurityMaster s
ON s.SecurityID = p.SecurityID
INNER JOIN dbo.DimDateTime d
ON
p.DateTimeKey = d.DateTimeKey
GROUP BY p.Price ,
s.Type ,
s.Symbol
ORDER BY s.Symbol
The query works but does not produce distinct results. I am using Order by to validate the results, but it is not required once I get it working. I The result set looks like this.
Price Type Symbol Maxed
10.57 bfus *bbkd 3/31/1989
10.77 bfus *bbkd 2/28/1990
100.74049 cbus 001397AA6 8/2/2005
100.8161 cbus 001397AA6 7/21/2005
The result set I want is
Price Type Symbol Maxed
10.77 bfus *bbkd 2/28/1990
100.74049 cbus 001397AA6 8/2/2005
Here were a few other StackOverflow threads I tried but couldn't get t work with my specific query
How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?
SQL Selecting distinct rows from multiple columns based on max value in one column

If you want data for the maximum date, use row_number() rather than group by:
SELECT ts.*
FROM (SELECT p.Price, s.Type, s.Symbol, d.Date,
ROW_NUMBER() OVER (PARTITION BY s.Type, s.Symbol
ORDER BY d.Date DESC
) as seqnum
FROM AdventDW.dbo.FactPrices p INNER JOIN
dbo.DimSecurityMaster s
ON s.SecurityID = p.SecurityID INNER JOIN
dbo.DimDateTime d
ON p.DateTimeKey = d.DateTimeKey
) ts
WHERE seqnum = 1
ORDER BY s.Symbol;

You should use a derived table since you really only want to group the DateTimeKey table to get the MAX date.
SELECT p.Price ,
s.Type ,
s.Symbol ,
tmp.MaxDate
FROM AdventDW.dbo.FactPrices p
INNER JOIN dbo.DimSecurityMaster s ON s.SecurityID = p.SecurityID
INNER JOIN
( SELECT MAX(d.Date) AS MaxDate ,
d.DateTimeKey
FROM dbo.DimDateTime d
GROUP BY d.DateTimeKey ) tmp ON p.DateTimeKey = tmp.DateTimeKey
ORDER BY s.Symbol;

/*
this is your initial select which is fine because this is base from your original criteria,
I cannot ignore this so i'll keep this in-tact. Instead from here i'll create a temp
*/
SELECT
p.Price
, s.Type
, s.Symbol
, MAX(d.Date) Maxed
INTO #tmpT
FROM AdventDW.dbo.FactPrices p
INNER JOIN dbo.DimSecurityMaster s
ON s.SecurityID = p.SecurityID
INNER JOIN dbo.DimDateTime d
ON p.DateTimeKey = d.DateTimeKey
GROUP BY p.Price ,
s.Type ,
s.Symbol
ORDER BY s.Symbol
SELECT innerTable.Price, innerTable.Symbol, innerTable.Type, innerTable.Maxed
FROM (
SELECT
ROW_NUMBER () OVER (PARTITION BY t1.Symbol, t1.Type, t1.Maxed ORDER BY t1.Maxed DESC) as row
, *
FROM #tmpT AS t1
) AS innerTable
WHERE row = 1
DROP TABLE #tmpT

Related

show only MAX rows

i have a table which shows me workhours of postoffices, the problem is that sometimes there are duplicates: for example i have saturday showing twice from one postoffice with the same time.
Solution is to show only 1 saturday with MAX(ID), but i can't deal with that and tha think is that i don't have to show id in select.
this is my script
SELECT h.ID,
h.POSTINDEX,
H.LONGNAME_UA,
h.SHORTNAME_UA,
pt.LONGNAME_UA,
h.parent_Id,
WORKCOMMENT,
INTERVALTYPE,
TO_CHAR(TFROM, 'HH24:MI') AS TFROM,
TO_CHAR(TTO, 'HH24:MI'),
WD.NAME_UA,
WD.NAME_EN,
WD.NAME_RU,
WD.SHORTNAME_UA,
pt.isVPZ,
lr.NAME_UA,
lr.CODE
FROM ADDR_PO_WORKSCHEDULE tt
LEFT JOIN ADDR_POSTOFFICE h
ON tt.POSTOFFICE_ID = h.ID
INNER JOIN mdm_lockReason lr
ON lr.id = H.LOCK_REASON
INNER JOIN ADDR_POSTOFFICEtype pt
ON pt.ID = H.POSTOFFICETYPE_ID
INNER JOIN ADDR_PO_WORKDAYS wd
ON wd.ID = tt.dayofweek
where tt.datestop = TO_DATE('9999-12-31','YYYY-MM-DD') AND tt.postoffice_id = 8221
HAVING MAX(tt.ID)
ORDER BY h.postIndex,
h.POSTOFFICETYPE_ID,
dayofweek,
intervaltype,
tFrom,
tto
as you can see there i've add HAVING MAX(tt.ID) but i understand that it is incorrect and don't know how to solve that. help please!
You can use row_number():
with t as (
<your query here with no `having` and with tt.id as tt_id>
)
select t.*
from (select t.*, row_number() over (order by tt_id desc) as seqnum
from t
) t
where seqnum = 1;
If the source of your duplicate rows is coming from the ADDR_PO_WORKSCHEDULE table, then you can just de-dupe it before you use it in the rest of your query:
SELECT
h.ID, h.POSTINDEX, h.LONGNAME_UA, h.SHORTNAME_UA, pt.LONGNAME_UA, h.parent_Id,
WORKCOMMENT, INTERVALTYPE, TO_CHAR(TFROM, 'HH24:MI') AS TFROM, TO_CHAR(TTO, 'HH24:MI'),
WD.NAME_UA, WD.NAME_EN, WD.NAME_RU, WD.SHORTNAME_UA, pt.isVPZ, lr.NAME_UA, lr.CODE
FROM (
SELECT src.*
FROM (
SELECT t.*,
ROW_NUMBER() OVER(
PARTITION BY tt.postoffice_id, tt.dayofweek -- Group by post office / day of week
ORDER BY <order_columns> -- Rank rows within each group
) AS RowNum
FROM ADDR_PO_WORKSCHEDULE t
) src
WHERE RowNum = 1
) tt
LEFT JOIN ADDR_POSTOFFICE h ON tt.POSTOFFICE_ID = h.ID
INNER JOIN mdm_lockReason lr ON lr.id = H.LOCK_REASON
INNER JOIN ADDR_POSTOFFICEtype pt ON pt.ID = H.POSTOFFICETYPE_ID
INNER JOIN ADDR_PO_WORKDAYS wd ON wd.ID = tt.dayofweek
WHERE tt.datestop = TO_DATE('9999-12-31','YYYY-MM-DD')
AND tt.postoffice_id = 8221
ORDER BY h.postIndex, h.POSTOFFICETYPE_ID, dayofweek, intervaltype, tFrom, tto
Just replace with the columns you want to use to decide which of the "duplicate" rows to keep. If the order doesn't matter, just remove the ORDER BY altogether.
Also, you may want to qualify all column references, since you are selecting from multiple tables.
You can use analytical function as following:
Select * from
(Select <your select column list>,
Row_number() over (partition by h.postindex, trunc(tfrom) order by tt.id desc nulls last) as rn
From <from clause>
Where <where clause>
)
Where rn = 1;
group by is not needed.

Adding a Helper SQL Index

I have the following View which seems to work quickly enough but when I look at the Execution Plan, it shows the Top N Sort in the second query taking ~90% due to it being repeated for every row in the first query.
Should I be adding an Index to the Loan table to help the ORDER BY clause?
CREATE VIEW [dbo].[ResourceItemStatus] AS
SELECT
i.ID AS ItemID,
i.ResourceID,
i.DateAdded,
i.LocationID,
i.OwnerID,
i.Barcode,
i.MissingReasonID,
i.DateRemoved,
ll.PatronID,
ll.ID AS LoanID,
ll.IssueDateTime,
ll.DueDate,
ll.ReturnDateTime,
ll.LoanTypeID,
ll.RenewalCount,
ll.DeleteSummary,
ll.ReturnStatusID,
ll.FineID,
(SELECT COUNT(*) FROM Loan WHERE Loan.ItemID = i.ID) AS LoanCount,
(SELECT COUNT(*) FROM Item WHERE Item.DateRemoved IS NULL AND Item.ResourceID = i.ResourceID) AS AvailableItemCount
FROM Item i
OUTER APPLY
(
SELECT TOP 1
l.ID,
l.ItemID,
l.PatronID,
l.IssueDateTime,
l.DueDate,
l.ReturnDateTime,
l.LoanTypeID,
l.RenewalCount,
l.DeleteSummary,
l.ReturnStatusID,
l.FineID
FROM Loan l
WHERE l.ItemID = i.ID
ORDER BY l.IssueDateTime DESC, l.ID DESC
) AS ll
Try Windowed Aggregates instead of Scalar Subqueries/Outer Apply:
SELECT
i.ID AS ItemID,
i.ResourceID,
i.DateAdded,
i.LocationID,
i.OwnerID,
i.Barcode,
i.MissingReasonID,
i.DateRemoved,
ll.PatronID,
ll.ID AS LoanID,
ll.IssueDateTime,
ll.DueDate,
ll.ReturnDateTime,
ll.LoanTypeID,
ll.RenewalCount,
ll.DeleteSummary,
ll.ReturnStatusID,
ll.FineID,
coalesce(ll.LoanCount, 0)
COUNT(case when Item.DateRemoved IS NULL then 1 end)
over (partition by ResourceID) AS AvailableItemCount
FROM Item i
LEFT JOIN
(
SELECT
l.ID,
l.ItemID,
l.PatronID,
l.IssueDateTime,
l.DueDate,
l.ReturnDateTime,
l.LoanTypeID,
l.RenewalCount,
l.DeleteSummary,
l.ReturnStatusID,
l.FineID,
COUNT(*) over (partition by ItemId) AS LoanCount,
row_number()
over (partition by ItemId
order by l.IssueDateTime DESC, l.ID DESC) as rn
FROM Loan l
) as ll
on ll.ItemID = i.ID
and ll.rn = 1

Query in SQL Server 2014 for a report (I need the last ROW of a table)

I'm using SQL Server 2014 and I have a problem with a query.
I want to have in my report, ALL the items of the order with ID_Order = 9 that have been delivered. And for the items that have been delivered at two times (Item Code = Art3 for example), I just want to have the last row, that means the last delivery of this Item, with NO repetition.
I already tried these two queries without success:
Attempt #1: DISTINCT
SELECT DISTINCT
Order.ItemCode, Delivery. Qty, Delivery.ID_Delivery,
Order.ID_Order
FROM
Delivery
INNER JOIN
Order ON Order.ID_Order = Delivery.ID_Order
WHERE
Order.ID_Order = '9'
Attempt #2: subquery
SELECT *
FROM
(SELECT
Order.ItemCode, Delivery.Qty,
FROM
Delivery
INNER JOIN
Order ON Order.ID_Order = Delivery.ID_Order
WHERE
Order.ID_Order = '9')
GROUP BY
a.ItemCode, a.Qty
Try this query --
;WITH CTE
AS (
SELECT C.ID_Order
,D.ID_Delivery
,C.ItemCode
,C.Quantity
,ROW_NUMBER() OVER (
PARTITION BY C.ItemCode ORDER BY D.ID_Delivery DESC
) AS RowNum
FROM Customer_Order C
INNER JOIN Delivery D ON C.ID_Order = D.ID_Order
AND C.ItemCode = D.ItemCode
WHERE C.ID_Order = 9
)
SELECT ID_Order
,ID_Delivery
,ItemCode
,Quantity
FROM CTE
WHERE RowNum = 1
SELECT
Order.ItemCode, Delivery. Qty, Delivery.ID_Delivery,
Order.ID_Order
FROM
Delivery
INNER JOIN
Order ON Order.ID_Order = Delivery.ID_Order
WHERE
Order.ID_Order = '9'
AND Delivery.ID_Delivery IN
(
SELECT MAX(ID_Delivery) FROM Delivery D WHERE D.ID_Order = Delivery.ID_Order GROUP BY D.ID_Order
)
I hope it will work for you.

How to display distinct values based on MAX date in report builder?

I'm quite new to SQL and I hope you can help me.
I'm trying to retrieve unique values from my table based on the latest date where specific users are selected.
This is the data:
Raw Data
And this is what I'm looking to achieve:
Desired Data
I tried to write 2 queries but unfortunately:
My 1st query would display duplicated rows for each company:
SELECT DISTINCT FilteredAppointment.regardingobjectidname ,FilteredAppointment.owneridname ,FilteredAppointment.subject ,MAX(FilteredAppointment.scheduledstart) as Date ,FilteredAppointment.location ,FilteredCcx_member.ccx_mnemonic FROM FilteredAppointment INNER JOIN FilteredAccount ON FilteredAppointment.regardingobjectid = FilteredAccount.accountid INNER JOIN FilteredCcx_member ON FilteredAccount.accountid = FilteredCcx_member.ccx_accountid WHERE FilteredAppointment.statecodename != N'Canceled' AND FilteredAppointment.owneridname IN (N'User1', N'User2', N'User3') GROUP BY FilteredAppointment.regardingobjectidname ,FilteredAppointment.owneridname ,FilteredAppointment.subject ,FilteredAppointment.scheduledstart ,FilteredAppointment.location ,FilteredCcx_member.ccx_mnemonic ORDER BY FilteredAppointment.regardingobjectidname
And my 2nd query would display one row only:
SELECT DISTINCT FilteredAppointment.regardingobjectidname ,FilteredAppointment.owneridname ,FilteredAppointment.subject ,FilteredAppointment.scheduledstart ,FilteredAppointment.location ,FilteredCcx_member.ccx_mnemonic FROM FilteredAppointment INNER JOIN FilteredAccount ON FilteredAppointment.regardingobjectid = FilteredAccount.accountid INNER JOIN FilteredCcx_member ON FilteredAccount.accountid = FilteredCcx_member.ccx_accountid WHERE FilteredAppointment.scheduledstart = (SELECT MAX(FilteredAppointment.scheduledstart) FROM FilteredAppointment WHERE FilteredAppointment.regardingobjectidname = FilteredAppointment.regardingobjectidname) AND FilteredAppointment.statecodename != N'Canceled' AND FilteredAppointment.owneridname IN (N'User1', N'User2', N'User3') GROUP BY FilteredAppointment.regardingobjectidname ,FilteredAppointment.owneridname ,FilteredAppointment.subject ,FilteredAppointment.scheduledstart ,FilteredAppointment.location ,FilteredCcx_member.ccx_mnemonic ORDER BY FilteredAppointment.regardingobjectidname
Try this:-
SELECT distinct a.date, a.company, a.companyID, a.User, a.Location, a.topic
FROM tablename a
inner join
(
Select company, companyID, User, max(date) as recent_date
from
tablename
group by company, companyID, User
) b
on a.date=b.recent_date and a.company=b.company and a.companyID=b.companyID
and a.User=b.User;
I managed to solve the issue - Thank you for the help again!
WITH apptmts AS (SELECT TOP 1 WITH TIES fa.scheduledstart,fa.location,fa.regardingobjectidname,mem.ccx_mnemonic,fa.owneridname,fa.subject FROM FilteredAppointment fa JOIN FilteredAccount acc on fa.regardingobjectid = acc.accountid JOIN FilteredCcx_member mem ON acc.accountid = mem.ccx_accountid WHERE fa.statecodename != N'Canceled' AND fa.owneridname IN (N'User1', N'User2', N'User3') ORDER BY ROW_NUMBER() OVER(PARTITION BY fa.regardingobjectidname ORDER BY fa.scheduledstart DESC) ) SELECT * FROM apptmts ORDER BY scheduledstart DESC

SQL Server 2005, SELECT DISTINCT query

How can I modify this query to return distinct Models and VideoProcessor columns ?
SELECT TOP(20) *
FROM
(SELECT
[3D_Benchmarks].Id AS BenchmarkId,
Manufacturer, Model, Slug, VideoProcessor,
FPS, CPU
FROM
[3D_Benchmarks]
JOIN
[3D_Slugs] ON [3D_Benchmarks].Id = [3D_Slugs].BenchmarkId) AS tb
ORDER BY
tb.FPS DESC;
New answer based on your comment. This looks up the twenty highest FPS Model+VideoProcessor. For each of those, it picks the row with the highest FPS.
select details.Model
, details.VideoProcessor
, details.FPS
, <add other columns here>
from (
select top 20 b.Model
, VideoProcessor
from [3D_Benchmarks] b
join [3D_Slugs] s
on b.Id = s.BenchmarkId
group by
b.Model
order by
max(b.FPS) desc
) top20
cross apply
(
select top 1 *
from [3D_Benchmarks[ b
join [3D_Slugs] s
on b.Id = s.BenchmarkId
where b.Model = top20.Model
and b.VideoProcessor = top20.VideoProcessor
order by
b.FPS desc
) details