Getting most recent distinct records - sql

Considering the following table:
User CreatedDateTime Quantity
----- ----------------- --------
Jim 2012-09-19 01:00 1
Jim 2012-09-19 02:00 5
Jim 2012-09-19 03:00 2
Bob 2012-09-19 02:00 2
Bob 2012-09-19 03:00 9
Bob 2012-09-19 05:00 1
What query would return the most recent row (as defined by CreatedDateTime) for each User, so that we could determine the associated Quantity?
i.e. the following records
User CreatedDateTime Quantity
----- ----------------- --------
Jim 2012-09-19 03:00 2
Bob 2012-09-19 05:00 1
We thought that we could simply GROUP BY User and CreatedDateTime and add HAVING CreatedDateTime = MAX(CreatedDateTime). Of course, this does not work because Quantity is not available following the GROUP BY.

Since you are using SQL Server, you can use a window function for this.
SELECT [User], CreatedDateTime, Quantity
FROM
(
SELECT [User], CreatedDateTime, Quantity,
ROW_NUMBER() OVER(PARTITION BY [User] ORDER BY CreatedDateTime DESC) as RowNum
FROM tableName
) a
WHERE a.RowNum = 1

;WITH x AS
(
SELECT [User], CreatedDateTime, Quantity,
rn = ROW_NUMBER() OVER (PARTITION BY [User] ORDER BY CreatedDateTime DESC)
FROM dbo.table_name
)
SELECT [User], CreatedDateTime, Quantity
FROM x WHERE rn = 1;
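To try the ROW_NUMBER() approach without a SQL Server instance, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. SQLite standing in for SQL Server and the table name readings are assumptions made for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; "readings" is a
# hypothetical table name. Window functions require SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE readings ("User" TEXT, CreatedDateTime TEXT, Quantity INTEGER)'
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("Jim", "2012-09-19 01:00", 1),
        ("Jim", "2012-09-19 02:00", 5),
        ("Jim", "2012-09-19 03:00", 2),
        ("Bob", "2012-09-19 02:00", 2),
        ("Bob", "2012-09-19 03:00", 9),
        ("Bob", "2012-09-19 05:00", 1),
    ],
)

# Number each user's rows from newest to oldest, then keep only row 1.
latest = conn.execute(
    """
    SELECT "User", CreatedDateTime, Quantity
    FROM (
        SELECT "User", CreatedDateTime, Quantity,
               ROW_NUMBER() OVER (PARTITION BY "User"
                                  ORDER BY CreatedDateTime DESC) AS rn
        FROM readings
    ) AS ranked
    WHERE rn = 1
    ORDER BY "User"
    """
).fetchall()

print(latest)  # [('Bob', '2012-09-19 05:00', 1), ('Jim', '2012-09-19 03:00', 2)]
```

The timestamps are stored as ISO-style text here, so ordering them as strings matches chronological order.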

If you do not have the ability to use windowing functions, then you can use a sub-query:
select t1.[user], t2.mxdate, t1.quantity
from yourtable t1
inner join
(
select [user], max(CreatedDateTime) mxdate
from yourtable
group by [user]
) t2
on t1.[user]= t2.[user]
and t1.CreatedDateTime = t2.mxdate
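One behaviour of the join-on-MAX approach is worth checking before relying on it: if two rows for the same user share the latest timestamp, both come back, whereas ROW_NUMBER() would keep only one. The sketch below illustrates this with SQLite from Python; the tied Bob row is hypothetical and not part of the question's data.

```python
import sqlite3

# Assumption: SQLite in place of SQL Server, reduced sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE yourtable ("User" TEXT, CreatedDateTime TEXT, Quantity INTEGER)'
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [
        ("Jim", "2012-09-19 01:00", 1),
        ("Jim", "2012-09-19 03:00", 2),
        ("Bob", "2012-09-19 05:00", 1),
        ("Bob", "2012-09-19 05:00", 7),  # hypothetical tie on the latest timestamp
    ],
)

# Join each row to its user's maximum timestamp; ties all survive the join.
rows = conn.execute(
    """
    SELECT t1."User", t2.mxdate, t1.Quantity
    FROM yourtable t1
    INNER JOIN (
        SELECT "User", MAX(CreatedDateTime) AS mxdate
        FROM yourtable
        GROUP BY "User"
    ) t2
      ON t1."User" = t2."User"
     AND t1.CreatedDateTime = t2.mxdate
    ORDER BY t1."User", t1.Quantity
    """
).fetchall()

print(rows)
# [('Bob', '2012-09-19 05:00', 1), ('Bob', '2012-09-19 05:00', 7), ('Jim', '2012-09-19 03:00', 2)]
```

If exactly one row per user is required, a tie-breaking column has to be added to whichever approach is used.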

SELECT DISTINCT
[User],
CreatedDateTime,
Quantity
FROM
YourTable
WHERE
CreatedDateTime =
(SELECT MAX(CreatedDateTime) FROM YourTable t WHERE t.[User] = YourTable.[User])

select * from <table_name> t where CreatedDateTime = (select max(CreatedDateTime) from <table_name> where [user] = t.[user]);

Related

SQL Server : remove duplicates and add columns

I have a table that contains duplicate records. This is what the table looks like:
ID  Date             Status        ModifiedBy
---------------------------------------------
1   1/2/2019 10:29   Assigned(0)   xyz
1   1/2/2019 12:21   Pending(1)    abc
1   1/4/2019 11:42   Completed(5)  abc
1   1/20/2019 2:45   Closed(8)     pqr
2   9/18/2018 10:05  Assigned(0)   xyz
2   9/18/2018 11:15  Pending(1)    abc
2   9/21/2018 11:15  Completed(5)  abc
2   10/7/2018 2:46   Closed(8)     pqr
What I want to do is keep the row with the minimum date value for each ID and add two additional columns, PendingStartDate and PendingEndDate.
PendingStartDate: date when ID went into pending status
PendingEndDate: date when ID went from pending status to any other status
So my final output should look like this
ID AuditDate Status ModifiedBy PendingStartDate PendingEndDate
---------------------------------------------------------------------------
1 1/2/2019 10:29 Assigned(0) xyz 1/2/2019 12:21 1/4/2019 11:42
2 9/18/2018 10:05 Assigned(0) xyz 9/18/2018 11:15 9/21/2018 11:15
Any help as to how to do this is appreciated.
Thanks
I think you want conditional aggregation:
select id, min(date) as auditdate,
max(case when seqnum = 1 then status end) as status,
max(case when seqnum = 1 then modifiedBy end) as modifiedBy,
min(case when status like 'Pending%' then date end) as pendingStartDate,
max(case when status like 'Pending%' then next_date end) as pendingEndDate
from (select t.*,
row_number() over (partition by id order by date) as seqnum,
lead(date) over (partition by id order by date) as next_date
from t
) t
group by id;
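The conditional aggregation above can be checked against the question's data with a small sketch. It uses SQLite from Python instead of SQL Server, and the table name audit_log, the column names audit_date and modified_by, and the ISO date strings are all assumptions chosen so that text ordering matches date ordering. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: hypothetical table/column names; dates rewritten as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_log (id INTEGER, audit_date TEXT, status TEXT, modified_by TEXT)"
)
conn.executemany(
    "INSERT INTO audit_log VALUES (?, ?, ?, ?)",
    [
        (1, "2019-01-02 10:29", "Assigned(0)", "xyz"),
        (1, "2019-01-02 12:21", "Pending(1)", "abc"),
        (1, "2019-01-04 11:42", "Completed(5)", "abc"),
        (1, "2019-01-20 02:45", "Closed(8)", "pqr"),
        (2, "2018-09-18 10:05", "Assigned(0)", "xyz"),
        (2, "2018-09-18 11:15", "Pending(1)", "abc"),
        (2, "2018-09-21 11:15", "Completed(5)", "abc"),
        (2, "2018-10-07 02:46", "Closed(8)", "pqr"),
    ],
)

# seqnum = 1 marks the earliest row per id; next_date is the timestamp of the
# following row, i.e. the moment the current status ended.
result = conn.execute(
    """
    SELECT id,
           MIN(audit_date) AS first_date,
           MAX(CASE WHEN seqnum = 1 THEN status END) AS first_status,
           MAX(CASE WHEN seqnum = 1 THEN modified_by END) AS first_modified_by,
           MIN(CASE WHEN status LIKE 'Pending%' THEN audit_date END) AS pending_start,
           MAX(CASE WHEN status LIKE 'Pending%' THEN next_date END) AS pending_end
    FROM (
        SELECT a.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY audit_date) AS seqnum,
               LEAD(audit_date) OVER (PARTITION BY id ORDER BY audit_date) AS next_date
        FROM audit_log a
    ) AS s
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in result:
    print(row)
```

With this data, each id yields its Assigned(0) row together with the time the Pending(1) status started and the time the next status replaced it.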
Please try this:
-- Table variables are declared with @ (a # prefix denotes a temp table created with CREATE TABLE)
Declare @Tab Table(Id int, [Date] DATETIME,[Status] Varchar(25),ModifiedBy varchar(10))
Insert into @Tab
SELECT 1,'1/2/2019 10:29','Assigned(0)','xyz' Union All
SELECT 1,'1/2/2019 11:29','Started(0)','xyz' Union All
SELECT 1,'1/2/2019 12:21','Pending(1)','abc' Union All
SELECT 1,'1/2/2019 12:21','In-Progress(1)','abc' Union All
SELECT 1,'1/4/2019 11:42','Completed(5)','abc' Union All
SELECT 1,'1/20/2019 2:45','Closed(8)','pqr' Union All
SELECT 2,'9/18/2018 10:05','Assigned(0)','xyz' Union All
SELECT 2,'9/18/2018 11:15','Pending(1)','abc' Union All
SELECT 2,'9/21/2018 11:15','Completed(5)','abc' Union All
SELECT 2,'10/7/2018 2:46','Closed(8)','pqr'
;with cte As
(
-- Keep only the three statuses of interest; the next row's date is the pending start
Select * ,lead(date) over (partition by id order by date) as pendingStartDate
from @Tab
Where Status in ('Assigned(0)','Pending(1)','Completed(5)')
)
,cte2 As
(
-- The pending row's own lead value is the date the pending status ended
Select * , lead(pendingStartDate) over (partition by id order by date) As pendingEndDate
from cte
)
Select * from cte2 where Status ='Assigned(0)'
As you mentioned in a comment, I have included a few states between Assigned, Pending and Completed.

Select most recent record if there's a duplicate

So I have been scratching my head over this one, mostly because I am on Access 2010 and most of the queries I have found on the internet use commands that do not work in Access.
id name date qty created
====================================================
1 abc 01/2016 20 06/07/2016 11:00
2 abc 02/2016 20 06/07/2016 11:00
3 abc 03/2016 20 06/07/2016 11:00
4 abc 01/2016 30 06/07/2016 13:00
I need to pull out a recordset like this:
id name date qty created
====================================================
2 abc 02/2016 20 06/07/2016 11:00
3 abc 03/2016 20 06/07/2016 11:00
4 abc 01/2016 30 06/07/2016 13:00
The created field is just a timestamp; the date field is a "due date". Basically, I need to pull out the most recent qty for each name and date. The ID is unique, so I can use it instead if that's easier.
So far I've got:
SELECT m1.date, m1.name, m1.created
FROM table AS m1 LEFT JOIN table AS m2 ON (m1.created < m2.created) AND
(m1.date = m2.date)
WHERE m2.created IS NULL;
But this one gives me only the most recent conflicting data, i.e. record no. 4 in my example. I also need the other two records. Any thoughts?
Try using NOT EXISTS() :
SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
WHERE t.date = s.date and s.created > t.created
AND t.name = s.name)
I think you are also missing a condition so I've added it:
and t.name = s.name
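As a quick check of the NOT EXISTS pattern, the sketch below runs it on the question's four rows using SQLite from Python. The table name orders, the column name due, and the ISO-formatted created values are assumptions for illustration; the query uses no window functions, which is what makes this form a candidate for engines that lack them.

```python
import sqlite3

# Assumption: hypothetical names; "created" stored as ISO text so that string
# comparison orders the timestamps correctly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, name TEXT, due TEXT, qty INTEGER, created TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "abc", "01/2016", 20, "2016-06-07 11:00"),
        (2, "abc", "02/2016", 20, "2016-06-07 11:00"),
        (3, "abc", "03/2016", 20, "2016-06-07 11:00"),
        (4, "abc", "01/2016", 30, "2016-06-07 13:00"),
    ],
)

# Keep a row only if no newer row exists for the same name and due date.
kept = conn.execute(
    """
    SELECT t.id, t.qty
    FROM orders t
    WHERE NOT EXISTS (
        SELECT 1
        FROM orders s
        WHERE s.due = t.due
          AND s.name = t.name
          AND s.created > t.created
    )
    ORDER BY t.id
    """
).fetchall()

print(kept)  # [(2, 20), (3, 20), (4, 30)]
```

Row 1 drops out because row 4 has the same name and due date with a later created value.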
You didn't tag your RDBMS. If it's SQL Server, Oracle or PostgreSQL, you can use ROW_NUMBER():
SELECT s.date, s.name, s.qty, s.created FROM (
SELECT t.*,
ROW_NUMBER() OVER(PARTITION BY t.date,t.name ORDER BY t.created DESC) as rnk
FROM YourTable t) s
WHERE s.rnk = 1
Try this:
SELECT m1.date, m1.name, m1.qty, m1.created
FROM table AS m1
INNER JOIN (
SELECT date, name, MAX(created) AS created
FROM table
GROUP BY date, name
) AS m2 ON m1.date = m2.date AND m1.name = m2.name AND m1.created = m2.created

SQL - Get oldest date while date is in where clause

Suppose I have this data
userid logdate event
0 2009-01-01 x
1 2010-01-01 x
1 2011-01-01 xy
1 2011-01-05 xz
2 2011-01-21 xx
2 2011-01-22 xx
I need to get the users who logged an event between 2011-01-01 and 2011-02-01,
along with their first logdate ever.
Expected result
userid first_logdate
1 2010-01-01
2 2011-01-21
Current solution
SELECT user_id, first_logdate
FROM (
SELECT user_id, logdate, MIN(logdate) AS first_logdate
FROM tablex
GROUP BY 1
)
WHERE logdate BETWEEN '2011-01-01' AND '2011-02-01'
If the data is large, is this query optimized?
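GROUP BY the userid and get the MIN date as their first log date. Test the date range in HAVING rather than WHERE, so that rows before the range still count toward the minimum:
SELECT userid, MIN(logdate) AS first_logdate
FROM table
GROUP BY userid
HAVING MAX(CASE WHEN logdate BETWEEN '2011-01-01' AND '2011-02-01' THEN 1 ELSE 0 END) = 1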
Use a correlated subquery with the MIN aggregate:
SELECT DISTINCT userid,
(SELECT Min(logdate)
FROM yourtable B
WHERE a.userid = b.userid) AS first_logdate
FROM yourtable A
WHERE logdate BETWEEN '2011-01-01' AND '2011-02-01'
Try:
SELECT userid, MIN(logdate) AS first_logdate
FROM table
WHERE userid IN (
SELECT userid FROM table
WHERE logdate BETWEEN '2011-01-01' AND '2011-02-01'
)
GROUP BY userid
A self join may also be used:
SELECT userid, MIN(t1.logdate) AS first_logdate
FROM table t1
JOIN table t2 USING ( userid )
WHERE t2.logdate BETWEEN '2011-01-01' AND '2011-02-01'
GROUP BY userid
and a third version using EXISTS operator
SELECT userid, MIN(logdate) AS first_logdate
FROM table t1
WHERE EXISTS (
SELECT 1 FROM table t2
WHERE t2.logdate BETWEEN '2011-01-01' AND '2011-02-01'
AND t1.userid = t2.userid
)
GROUP BY userid
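The point that decides whether a query here is correct is where the date filter sits: filtering in WHERE before aggregating limits MIN to the rows inside the range, while the EXISTS form only uses the range to pick users. A minimal sketch with SQLite from Python, using the question's rows and a hypothetical table name logs:

```python
import sqlite3

# Assumption: "logs" is a hypothetical table name; data is from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (userid INTEGER, logdate TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        (0, "2009-01-01", "x"),
        (1, "2010-01-01", "x"),
        (1, "2011-01-01", "xy"),
        (1, "2011-01-05", "xz"),
        (2, "2011-01-21", "xx"),
        (2, "2011-01-22", "xx"),
    ],
)

# Range used only to select users; MIN still sees every row of those users.
with_exists = conn.execute(
    """
    SELECT userid, MIN(logdate) AS first_logdate
    FROM logs t1
    WHERE EXISTS (
        SELECT 1 FROM logs t2
        WHERE t2.userid = t1.userid
          AND t2.logdate BETWEEN '2011-01-01' AND '2011-02-01'
    )
    GROUP BY userid
    ORDER BY userid
    """
).fetchall()

# Range applied before aggregation; MIN only sees rows inside the range.
filtered_first = conn.execute(
    """
    SELECT userid, MIN(logdate) AS first_logdate
    FROM logs
    WHERE logdate BETWEEN '2011-01-01' AND '2011-02-01'
    GROUP BY userid
    ORDER BY userid
    """
).fetchall()

print(with_exists)     # [(1, '2010-01-01'), (2, '2011-01-21')]
print(filtered_first)  # [(1, '2011-01-01'), (2, '2011-01-21')]
```

Only the first result matches the expected output in the question; for user 1 the second one reports the first log inside the range rather than the first log overall.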

Finding the interval between dates in SQL Server

I have a table with more than 5 million rows of sales transactions. I would like to find the sum of the date intervals between each customer's three most recent purchases.
Suppose my table looks like this :
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The result that I'm looking for might look like this:
CustomerID IntervalDays
A 802
B 135
I know the query needs to first retrieve the 3 most recent transactions of each customer (based on ServiceStartDate) and then calculate the interval between the StartDate and ExpiryDate of his/her transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, t1.ServiceExpiryDate, t2.ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Neither query returns the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But as you only sum two differences, this does not seem correct to me either...
Additional logic must be applied based on your rules regarding:
customer has less than 3 purchases
overlapping intervals
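To see the numbers the LEAD version produces on the question's six rows, here is a minimal sketch using SQLite from Python. SQLite has no DATEDIFF, so the day difference is computed with julianday(), which is a substitution and not the SQL Server syntax; window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; julianday() replaces DATEDIFF.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (CustomerID TEXT, ProductID TEXT, "
    "ServiceStartDate TEXT, ServiceExpiryDate TEXT)"
)
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?, ?)",
    [
        ("A", "X1", "2010-01-01", "2010-06-01"),
        ("A", "X2", "2010-08-12", "2010-12-30"),
        ("B", "X4", "2011-10-01", "2012-01-15"),
        ("B", "X3", "2012-04-01", "2012-06-01"),
        ("B", "X7", "2012-08-01", "2013-10-01"),
        ("A", "X5", "2013-01-01", "2015-06-01"),
    ],
)

# With rows ordered newest first, LEAD yields the expiry date of the purchase
# that came just before the current one; the oldest row gets NULL.
gaps = conn.execute(
    """
    WITH cte AS (
        SELECT CustomerID, ServiceStartDate,
               ROW_NUMBER() OVER (PARTITION BY CustomerID
                                  ORDER BY ServiceStartDate DESC) AS rn,
               LEAD(ServiceExpiryDate) OVER (PARTITION BY CustomerID
                                             ORDER BY ServiceStartDate DESC) AS prevEnd
        FROM tab
    )
    SELECT CustomerID,
           CAST(SUM(julianday(ServiceStartDate) - julianday(prevEnd)) AS INTEGER)
               AS IntervalDays,
           COUNT(*) AS purchases
    FROM cte
    WHERE rn <= 3
    GROUP BY CustomerID
    ORDER BY CustomerID
    """
).fetchall()

print(gaps)  # [('A', 805, 3), ('B', 138, 3)]
```

The sums are 805 and 138, each 3 more than the 802 and 135 given in the question, which is why subtracting purchases reproduces the expected figures.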
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServiceExpiryDate)) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from table t
) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t ) dt
WHERE dt.rn <= 3 -- only the three most recent purchases
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138

How do I get the highest sum per day for last X days?

This is probably an easy one, but for the life of me I can't seem to figure it out.
Here is my table:
Date User Amount
---------- ----- ------
01/01/2010 User1 2
01/01/2010 User2 2
01/01/2010 User1 4
01/01/2010 User2 1
01/02/2010 User2 2
01/02/2010 User1 2
01/02/2010 User2 4
01/02/2010 User2 1
And so on for the past several months. I need to get the following results:
Date User Amount
---------- ----- ------
01/01/2010 User1 6
01/02/2010 User2 7
Basically, the user with Max(SUM(Amount)) for each day.
I would appreciate any hints you guys can offer.
Thanks.
SELECT MAX(amt),`Date`,`User` FROM
(SELECT SUM(`Amount`) AS amt,`Date`,`User` .... GROUP BY `Date`,`User`) AS totals
GROUP BY `Date`
select t.*
from (
select Date, Max(Amount) as MaxAmount
from MyTable
group by Date
) tm
inner join MyTable t on tm.Date = t.Date and tm.MaxAmount = t.Amount
Note: this will give you both user records if there are two users with the same max amount on a given day.
I actually ended up going with the following:
WITH ranked AS
(
SELECT ROW_NUMBER() OVER (ORDER BY SUM(Amount), Date, User) as 'rank', SUM(Amount) AS Amount, User, Date FROM MyTable GROUP BY Date, User
)
SELECT Date, User, Amount
FROM ranked
WHERE rank IN ( select MAX(rank) from ranked group by Date)
ORDER BY Date DESC
Can be less verbose with the RANK ... OVER, but following is the straight-forward solution:
WITH summary_user_date
AS (SELECT Date, User, SUM(Amount) AS SumAmount
FROM MyTable
GROUP BY Date, User
)
, summary_date
AS (SELECT Date, MAX(SumAmount) AS SumAmount
FROM summary_user_date
GROUP BY Date
)
SELECT summary_user_date.*
FROM summary_user_date
INNER JOIN summary_date
ON summary_date.Date = summary_user_date.Date
AND summary_date.SumAmount = summary_user_date.SumAmount
It should be mentioned that if more than one user has the same maximum amount, all of them will be shown. If this is not desired, one should use a ROW_NUMBER-based solution with a tie-breaker (RANK would also return the ties).
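A ranking-based variant that returns exactly one user per day could look like the sketch below. It uses SQLite from Python, and the table name payments, the column names txn_date and user_name, and the choice of user name as tie-breaker are assumptions for illustration. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Assumption: hypothetical table/column names; dates rewritten as ISO text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (txn_date TEXT, user_name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [
        ("2010-01-01", "User1", 2),
        ("2010-01-01", "User2", 2),
        ("2010-01-01", "User1", 4),
        ("2010-01-01", "User2", 1),
        ("2010-01-02", "User2", 2),
        ("2010-01-02", "User1", 2),
        ("2010-01-02", "User2", 4),
        ("2010-01-02", "User2", 1),
    ],
)

# Step 1: total per (day, user). Step 2: number the users within each day by
# descending total, breaking ties by name. Step 3: keep the top row per day.
top_per_day = conn.execute(
    """
    WITH daily AS (
        SELECT txn_date, user_name, SUM(amount) AS total
        FROM payments
        GROUP BY txn_date, user_name
    ),
    ranked AS (
        SELECT txn_date, user_name, total,
               ROW_NUMBER() OVER (PARTITION BY txn_date
                                  ORDER BY total DESC, user_name) AS rn
        FROM daily
    )
    SELECT txn_date, user_name, total
    FROM ranked
    WHERE rn = 1
    ORDER BY txn_date
    """
).fetchall()

print(top_per_day)  # [('2010-01-01', 'User1', 6), ('2010-01-02', 'User2', 7)]
```

Swapping ROW_NUMBER() for RANK() and dropping the tie-breaker would bring back every user tied for the daily maximum.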
Using CTEs you could do something like:
With DailyTotals As
(
Select [Date], [User], Sum(Amount) As Total
From #Test
Group By [Date], [User]
)
Select [Date],[User],Total
From DailyTotals As DT
Where Total = (
Select Max(Total)
From DailyTotals As DT1
Where DT1.[Date] = DT.[Date]
)
Order By DT.[Date]
A non-CTE solution would be:
Select [Date],[User],Total
From (
Select [Date], [User], Sum(Amount) As Total
From #Test
Group By [Date], [User]
) As DT
Where DT.Total = (
Select Max(DT1.Total)
From (
Select [Date], [User], Sum(Amount) As Total
From #Test
Group By [Date], [User]
) As DT1
Where DT1.[Date] = DT.[Date]
)
Order By DT.[Date]