How to select the user with max count by day - sql

I have a table with three columns
UserID, Count, Date
I'd like to be able to select the userid with the highest count for each date.
I've tried a few different variations of queries with inline select statements but none have worked 100%, and I'm not too fond of having a select with three inline selects.
Is doing inline selects the only way to go without using temp tables? What's the best way to tackle this?

This solution returns multiple records for a date if there is a tie in Count, but otherwise works.
SELECT a.Date, a.UserId, a.[Count]
FROM yourTable a INNER JOIN (
SELECT MAX([Count]) as [Count], Date
FROM yourTable
GROUP BY Date
) b ON a.[Count] = b.[Count] AND a.Date = b.Date
ORDER BY a.Date

If [Date] is in fact a DATE column with no time component:
;WITH x AS
(
SELECT [Date], [Count], UserID, rn = ROW_NUMBER() OVER
(PARTITION BY [Date] ORDER BY [Count] DESC)
FROM dbo.table
)
SELECT [Date], [Count], UserID
FROM x
WHERE rn = 1
ORDER BY [Date];
If [Date] is a DATETIME column with a time component, then:
;WITH x AS
(
SELECT [Date] = DATEADD(DAY, DATEDIFF(DAY, '19000101', [Date]), '19000101'),
[Count], UserID, rn = ROW_NUMBER() OVER
(PARTITION BY DATEADD(DAY, DATEDIFF(DAY, '19000101', [Date]), '19000101')
ORDER BY [Count] DESC)
FROM dbo.table
)
SELECT [Date], [Count], UserID
FROM x
WHERE rn = 1
ORDER BY [Date];
If you want to pick a specific row in the event of a tie, you can add a tie-breaker to the ORDER BY within the over. If you want to include multiple rows in the case of ties, you can try changing ROW_NUMBER() to DENSE_RANK().
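The tie behaviour described above can be checked with a small runnable sketch. It uses Python's built-in sqlite3 module and the SQLite dialect rather than T-SQL (window functions need SQLite 3.25 or later), and it uses RANK() where the answer suggests DENSE_RANK(); both give every tied row the value 1. The table name `counts`, the column names `Cnt` and `Dt`, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (UserID INTEGER, Cnt INTEGER, Dt TEXT)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [
        (1, 5, "2024-01-01"),
        (2, 9, "2024-01-01"),
        (3, 9, "2024-01-01"),  # ties with user 2 on the same day
        (1, 4, "2024-01-02"),
        (2, 2, "2024-01-02"),
    ],
)


def top_per_day(rank_expr):
    """Return (Dt, UserID, Cnt) rows whose ranking value is 1 within each day."""
    sql = f"""
        SELECT Dt, UserID, Cnt
        FROM (SELECT Dt, UserID, Cnt, {rank_expr} AS rn FROM counts)
        WHERE rn = 1
        ORDER BY Dt, UserID
    """
    return conn.execute(sql).fetchall()


# ROW_NUMBER with a tie-breaker (lowest UserID wins): exactly one row per day.
one_per_day = top_per_day(
    "ROW_NUMBER() OVER (PARTITION BY Dt ORDER BY Cnt DESC, UserID)"
)
# RANK without a tie-breaker: tied rows all get rank 1 and are all returned.
with_ties = top_per_day("RANK() OVER (PARTITION BY Dt ORDER BY Cnt DESC)")

print(one_per_day)
print(with_ties)
```

With the tie-breaker, 2024-01-01 yields only user 2; with RANK() it yields users 2 and 3.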

SELECT x.*
FROM (
SELECT Date
FROM atable
GROUP BY Date
) t
CROSS APPLY (
SELECT TOP 1 WITH TIES
UserID, Count, Date
FROM atable
WHERE Date = t.Date
ORDER BY Count DESC
) x
If Date is datetime type and can have a non-zero time component, change the t table like this:
…
FROM (
SELECT Date = DATEADD(DAY, DATEDIFF(DAY, 0, Date), 0)
FROM atable
GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, Date), 0)
) t
…
References:
TOP (Transact-SQL)
Using APPLY

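For SQL Server 2005 and later (a window function cannot appear directly in WHERE, so rank in a derived table first):
select UserID, Count, Date
from (
select UserID, Count, Date,
Rank() over (partition by Date order by Count DESC, UserID DESC) as rnk
from tb
) ranked
where rnk = 1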

Related

Find duplicate data in last 1 hour

I am looking for a SQL script to find the data which has more than 2 entries in the last hour.
I have a table with user_id and event_time columns. I want a way to find out whether a user_id has more than one entry in the last hour.
I have tried below till now:
Create temp table to put all duplicate entries :
SELECT a.*
INTO #temp
FROM Table a
JOIN (
SELECT USERID, COUNT(*) AS Duplicates
FROM Table
GROUP BY userid
HAVING count(*) > 1
) AS b ON a.userid = b.USERID
Run self Joins to fetch records having time difference of 1 hour or less:
SELECT a.*
FROM #temp a
INNER JOIN #temp b ON a.userid = b.USERID
WHERE DATEDIFF(hour, a.EVENTTIME, b.EVENTTIME) = 1
When the first script is run, it returns 800+ rows of duplicate data. But the second script returns thousands of rows.
Can anyone help here?
CROSS APPLY can be used to get all related events for each event according to your criteria, as follows:
With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
)
Select a.ID, a.USERID, a.EVENTTIME, T.ID, T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select ID, USERID, EVENTTIME
From CTE
Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
And USERID=a.USERID And ID<>a.ID) As T
Order by a.ID, a.USERID, a.EVENTTIME, T.ID, T.USERID, T.EVENTTIME
Or you can get a list of events without binding to a specific event:
With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
)
Select T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select USERID, EVENTTIME
From CTE
Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
And USERID=a.USERID And ID<>a.ID) As T
Group by T.USERID, T.EVENTTIME
To get the events only for the last hour, you can add the appropriate filter to the WHERE clause in the CTE.
With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
Where EVENTTIME Between dateadd(minute, -60, GetDate()) And GetDate()
)
Select T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select USERID, EVENTTIME
From CTE
Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
And USERID=a.USERID
And ID<>a.ID) As T
Group by T.USERID, T.EVENTTIME
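The pairing idea above (same user, different row, at most 60 minutes apart) can be tried out with the following minimal sketch. It runs on Python's built-in sqlite3 module, so it uses a plain self-join and SQLite's strftime('%s', ...) in place of CROSS APPLY and DATEDIFF, and it compares exact seconds rather than minute boundaries. The table `events`, its `id` column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, event_time TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, event_time) VALUES (?, ?)",
    [
        (1, "2024-01-01 10:00:00"),
        (1, "2024-01-01 10:30:00"),  # within an hour of the previous row
        (1, "2024-01-01 15:00:00"),  # isolated event for user 1
        (2, "2024-01-01 10:00:00"),  # only event for user 2
    ],
)

# An event qualifies when another row of the same user lies within 3600 seconds.
# "b.id <> a.id" stops a row from matching itself; DISTINCT removes repeats
# caused by one event having several close neighbours.
close_events = conn.execute(
    """
    SELECT DISTINCT a.user_id, a.event_time
    FROM events AS a
    JOIN events AS b
      ON b.user_id = a.user_id
     AND b.id <> a.id
     AND ABS(CAST(strftime('%s', a.event_time) AS INTEGER)
           - CAST(strftime('%s', b.event_time) AS INTEGER)) <= 3600
    ORDER BY a.user_id, a.event_time
    """
).fetchall()

print(close_events)
```

Leaving out the `id` comparison and the DISTINCT is one way a self-join like the one in the question can return far more rows than the input.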
Assign a row number within each user_id group, ordered by the time difference in hours. Remember to filter to rows whose event_time falls within the last hour.
Query
;with cte as(
select [rn] = row_number() over(
partition by [user_id]
order by [user_id], datediff(hour, [event_time], getdate())
), *
from [your_table_name]
where [event_time] >= dateadd(hour, -1, getdate())
)
select * from [your_table_name] as [t1]
where exists(
select 1 from cte as [t2]
where [t1].[user_id]= [t2].[user_id]
and [t2].[rn] > 1
);

SQL - Group rows by contiguous date

I have a table:
Value Date
100 01/01/2000
110 01/05/2002
100 01/10/2003
100 01/12/2004
I want to group the data in this way
Value StartDate EndDate
100 01/01/2000 30/04/2002
110 01/05/2002 30/09/2003
100 01/10/2003 NULL --> or value like '01/01/2099'
How can I accomplish this?
Can a CTE be useful and how?
For an RDBMS that supports window functions (example for MS SQL Server):
with Test(value, dt) as(
select 100, cast('2000-01-01' as date) union all
select 110, cast('2002-05-01' as date) union all
select 100, cast('2003-10-01' as date) union all
select 100, cast('2004-12-01' as date)
)
select max(value) value, min(dt) startDate, max(end_dt) endDate
from (
select a.*, sum(brk) over(order by dt) grp
from (
select t.*,
case when value!=lag(value) over(order by dt) then 1 else 0 end brk,
DATEADD(DAY,-1,lead(dt,1,cast('2099-01-02' as date)) over(order by dt)) end_dt
from Test t
) a
) b
group by grp
order by startDate
I think the difference of row numbers is simpler in this case:
select value, min(date) as startDate,
dateadd(day, -1, lead(min(date)) over (order by min(date))) as endDate
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by value order by date) as seqnum_v
from t
) t
group by value, (seqnum - seqnum_v);
The difference of the row numbers defines the groups you want. This is a bit hard to see at first . . . if you stare at the results of the subquery, you'll see how it works.
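To make the "stare at the results of the subquery" step concrete, here is a small sketch that prints the row-number difference first and then builds the ranges. It runs on Python's built-in sqlite3 module (SQLite dialect, 3.25 or later), uses a column named `dt` instead of `date`, and does the grouping in a derived table before applying LEAD; those choices are assumptions made for the example, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (value INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (100, "2000-01-01"),
        (110, "2002-05-01"),
        (100, "2003-10-01"),
        (100, "2004-12-01"),
    ],
)

numbered_sql = """
    SELECT value, dt,
           ROW_NUMBER() OVER (ORDER BY dt) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY value ORDER BY dt) AS seqnum_v
    FROM t
"""

# Step 1: the subquery on its own. seqnum - seqnum_v stays constant within
# each run of consecutive rows that share the same value.
numbered = conn.execute(
    f"SELECT value, dt, seqnum - seqnum_v AS diff FROM ({numbered_sql}) ORDER BY dt"
).fetchall()

# Step 2: group by (value, difference), then end each range one day before
# the next range starts. The last range has no successor, so its end is NULL.
ranges = conn.execute(
    f"""
    SELECT value, start_dt,
           DATE(LEAD(start_dt) OVER (ORDER BY start_dt), '-1 day') AS end_dt
    FROM (
        SELECT value, MIN(dt) AS start_dt
        FROM ({numbered_sql})
        GROUP BY value, seqnum - seqnum_v
    )
    ORDER BY start_dt
    """
).fetchall()

print(numbered)
print(ranges)
```

The two trailing rows with value 100 share the pair (100, 1), which is why they collapse into a single range, while the first row with value 100 has the pair (100, 0) and stays separate.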

Convert a list of dates to date ranges in SQL Server

I have a query as following:
SELECT [Date] FROM [TableX] ORDER BY [Date]
The result is:
2016-06-01
2016-06-03
2016-06-10
2016-06-11
How can I get following pairs?
From To
2016-06-01 2016-06-03
2016-06-03 2016-06-10
2016-06-10 2016-06-11
If you're using SQL Server 2012 or later, you can use the LEAD function.
Accesses data from a subsequent row in the same result set without the use of a self-join in SQL Server 2016. LEAD provides access to a row at a given physical offset that follows the current row.
I think it would look like this for you:
SELECT [Date] AS [From], LEAD([Date], 1) OVER (ORDER BY [Date]) AS [To]
FROM TableX
ORDER BY [Date]
Note that on the last row, the [To] field will be NULL. If you wanted to remove that row, you could put it in an inner query:
SELECT *
FROM
(
SELECT [Date] AS [From], LEAD([Date], 1) OVER (ORDER BY [Date]) AS [To]
FROM TableX
) x
WHERE [To] IS NOT NULL
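As a quick check of the LEAD approach against the sample dates from the question, the following sketch runs the same idea on Python's built-in sqlite3 module (SQLite 3.25 or later also provides LEAD). The aliases `from_date` and `to_date` are assumptions used here in place of the bracketed [From] and [To].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableX (Date TEXT)")
conn.executemany(
    "INSERT INTO TableX VALUES (?)",
    [("2016-06-01",), ("2016-06-03",), ("2016-06-10",), ("2016-06-11",)],
)

# LEAD looks one row ahead in date order; the last row has no successor,
# so its to_date is NULL and the outer filter drops it.
pairs = conn.execute(
    """
    SELECT from_date, to_date
    FROM (
        SELECT Date AS from_date,
               LEAD(Date, 1) OVER (ORDER BY Date) AS to_date
        FROM TableX
    )
    WHERE to_date IS NOT NULL
    ORDER BY from_date
    """
).fetchall()

print(pairs)
```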
All you need to do is add a row number for each date.
Then join each row to the next row (except for the last row, which has no successor):
WITH cteDates AS
(
SELECT [Date],
ROW_NUMBER() OVER (ORDER BY (SELECT [Date])) As RowNum
FROM TableX
)
SELECT TOP(SELECT COUNT(*) - 1 FROM cteDates)
[Date] [From],
(SELECT [Date] FROM cteDates WHERE RowNum = d.RowNum + 1) [To]
FROM cteDates d
A little tricky solution for SQL 2008.
declare @tbl table(dt datetime)
insert @tbl values
('2016-06-01'),
('2016-06-03'),
('2016-06-10'),
('2016-06-11')
;with cte as (
select dt, ROW_NUMBER() over(order by dt) rn --add number
from @tbl
),
newTbl as (
select t1.dt start, t2.dt [end]
from cte t1 inner join cte t2 on t1.rn+1=t2.rn
)
select *
from newTbl
The result is what you wish.
Since there are never any gaps as you stated, you can just use DATEADD()
SELECT DISTINCT
[Date] as [FROM],
DATEADD(DAY,1,[Date]) as [TO]
FROM TableX
ORDER BY [Date] DESC

SQL Select MAX and 2nd MAX

I am running a query against MS SQL Server 2008 and am selecting an accountnumber and the max of the column mydate grouped by accountnumber:
select AccountNumber,
max(mydate)
from #SampleData
group by AccountNumber
I want to add a column to the result that contains the second highest mydate that is associated with the AccountNumber group. I know it would have to be something like:
select max(mydate)
from #SampleData
where mydate < (select max(mydate) from #SampleData)
But how do I get both the max and 2nd max in one select query?
You didn't specify your DBMS so this is ANSI SQL:
select accountnumber,
rn,
mydate
from (
select accountnumber,
mydate,
row_number() over (partition by accountnumber order by mydate desc) as rn
from #SampleData
) t
where rn <= 2;
Try this
Select AccountNumber,
MAX(Case when Rnum = 1 Then mydate END) mydate_1,
MAX(Case when Rnum = 2 Then mydate END) mydate_2
From
(
select
AccountNumber, mydate,
ROW_NUMBER() OVER (PARTITION By AccountNumber ORDER BY mydate DESC) as Rnum
from #SampleData
) V
Group By AccountNumber
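The conditional-aggregation pattern above can be exercised with a short sketch on Python's built-in sqlite3 module (SQLite dialect, 3.25 or later). The permanent table `SampleData` stands in for the temp table #SampleData, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SampleData (AccountNumber INTEGER, mydate TEXT)")
conn.executemany(
    "INSERT INTO SampleData VALUES (?, ?)",
    [
        (1, "2024-01-05"),
        (1, "2024-03-01"),
        (1, "2024-02-10"),
        (2, "2024-04-01"),  # account 2 has a single date
    ],
)

# Number each account's dates from newest to oldest, then pivot row 1 and
# row 2 into two columns with CASE inside MAX.
top_two = conn.execute(
    """
    SELECT AccountNumber,
           MAX(CASE WHEN Rnum = 1 THEN mydate END) AS mydate_1,
           MAX(CASE WHEN Rnum = 2 THEN mydate END) AS mydate_2
    FROM (
        SELECT AccountNumber, mydate,
               ROW_NUMBER() OVER (
                   PARTITION BY AccountNumber ORDER BY mydate DESC
               ) AS Rnum
        FROM SampleData
    )
    GROUP BY AccountNumber
    ORDER BY AccountNumber
    """
).fetchall()

print(top_two)
```

An account with only one row gets NULL in the second column. If the same date can appear twice for an account and the second distinct date is wanted, DENSE_RANK() in place of ROW_NUMBER() is one option.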
Something like this should select the second highest:
select AccountNumber,
max(mydate),
(select max(SD2.mydate) from #SampleData SD2 where SD2.AccountNumber=#SampleData.AccountNumber AND SD2.mydate<max(#SampleData.mydate))
from #SampleData
group by AccountNumber
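You could also use a TOP N clause combined with an ORDER BY. Note that TOP 2 limits the whole result set to two rows, so this only answers the question when the data is filtered to a single account number: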
select
TOP 2
accountnumber,
mydate,
row_number() over (partition by accountnumber order by mydate desc) as rn
from #SampleData
ORDER BY
row_number() over (partition by accountnumber order by mydate desc)

SQL Server - Select all top of the hour records

I have a large table with records created every second and want to select only those records that were created at the top of each hour for the last 2 months. So we would get 24 selected records for every day over the last 60 days.
The table structure is Dateandtime, Value1, Value2, etc
Many Thanks
You could group by the date part (cast(col1 as date)) and the hour part (datepart(hh, col1)). Then pick the minimum date for each hour, and filter on that:
select *
from YourTable yt
join (
select min(dateandtime) as dt
from YourTable
where datediff(day, dateandtime, getdate()) <= 60
group by
cast(dateandtime as date)
, datepart(hh, dateandtime)
) filter
on filter.dt = yt.dateandtime
Alternatively, you can group on a date format that only includes the date and the hour. For example, convert(varchar(13), getdate(), 120) returns 2013-05-11 18.
...
group by
convert(varchar(13), dateandtime, 120)
) filter
...
For clarity's sake, I would probably use a two-step, CTE-based approach (this works in SQL Server 2005 and newer - you didn't clearly specify which version of SQL Server you're using, so I'm just hoping you're not on an ancient version like 2000 anymore):
-- define a "base" CTE that truncates your "DateAndTime" column to the hour
-- (date plus hour, so the same hour on different days stays separate)
;WITH BaseCTE AS
(
SELECT
ID, DateAndTime,
Value1, Value2,
HourPart = DATEADD(HOUR, DATEDIFF(HOUR, 0, DateAndTime), 0)
FROM dbo.YourTable
WHERE DateAndTime >= @SomeThresholdDateHere
),
-- define a second CTE which "partitions" the data by this "HourPart",
-- and number all rows for each partition starting at 1. So each "last"
-- event for each hour is the one with the RN = 1 value
HourlyCTE AS
(
SELECT ID, DateAndTime, Value1, Value2,
RN = ROW_NUMBER() OVER(PARTITION BY HourPart ORDER BY DateAndTime DESC)
FROM BaseCTE
)
SELECT *
FROM HourlyCTE
WHERE RN=1
Also: I wasn't sure what exactly you mean by "top of the hour" - the row that's been created right at the beginning of each hour (e.g. at 04:00:00) - or rather the last row created in that hour's time span? If you mean the first one for each hour - then you'd need to change the ORDER BY DateAndTime DESC to ORDER BY DateAndTime ASC
You can use the EXISTS operator:
SELECT *
FROM dbo.tableName t
WHERE t.DateAndTime >= @YourDateCondition
AND EXISTS (
SELECT 1
FROM dbo.tableName t2
WHERE t2.Dateandtime >= DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime), 0)
AND t2.Dateandtime < DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime)+1, 0)
HAVING MAX(t2.Dateandtime) = t.Dateandtime
)
Or use the CROSS APPLY operator:
SELECT *
FROM dbo.test83 t CROSS APPLY (
SELECT 1
FROM dbo.test83 t2
WHERE t2.Dateandtime >= DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime), 0)
AND t2.Dateandtime < DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime)+1, 0)
HAVING MAX(t2.Dateandtime) = t.Dateandtime
) o(IsMatch)
WHERE t.DateAndTime >= @YourDateCondition
For improving performance use this index:
CREATE INDEX x ON dbo.test83(DateAndTime) INCLUDE(Value1, Value2)
Try:
select * from mytable
where datepart(mi, dateandtime)=0 and
datepart(ss, dateandtime)=0 and
datediff(d, dateandtime, getdate()) <=60
You can use window functions for this:
select dateandtime, val1, val2, . . .
from (select t.*,
row_number() over (partition by cast(dateandtime as date), datepart(hour, dateandtime)
order by dateandtime
) as seqnum
from t
) t
where seqnum = 1
The function row_number() assigns a sequential number to each row within the groups defined by the partition clause -- in this case each hour of each day. Within each group, it orders by the dateandtime value, so the row closest to the top of the hour gets a value of 1. The outer query just selects this one record for each group.
You may need an additional filter clause to get records in the last 60 days. Use this in the subquery:
where dateandtime >= getdate() - 60
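The partition-per-hour idea can be tested with a small sketch on Python's built-in sqlite3 module (SQLite dialect, 3.25 or later). It truncates each timestamp to "date plus hour" with strftime instead of cast(... as date) and an hour function. The table `readings`, the column `value1`, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (dateandtime TEXT, value1 INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [
        ("2024-01-01 10:00:00", 1),
        ("2024-01-01 10:00:01", 2),
        ("2024-01-01 11:00:03", 3),  # no row exactly at 11:00:00
        ("2024-01-02 10:00:00", 4),  # same hour of day, different date
    ],
)

# One partition per (date, hour); the earliest row in each gets seqnum = 1.
first_per_hour = conn.execute(
    """
    SELECT dateandtime, value1
    FROM (
        SELECT dateandtime, value1,
               ROW_NUMBER() OVER (
                   PARTITION BY strftime('%Y-%m-%d %H', dateandtime)
                   ORDER BY dateandtime
               ) AS seqnum
        FROM readings
    )
    WHERE seqnum = 1
    ORDER BY dateandtime
    """
).fetchall()

print(first_per_hour)
```

Unlike a filter on minute = 0 and second = 0, this still returns a row for an hour whose first record arrived a few seconds late (the 11:00:03 row here).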
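This helped me get the top of the hour, that is, anything that ends in ":00:00". The value has to be converted with a style that includes seconds, such as 120 (yyyy-mm-dd hh:mi:ss):
WHERE CONVERT(VARCHAR(19), dateandtime, 120) LIKE '%:00:00'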