Find duplicate data in last 1 hour - sql

I am looking for a SQL script to find the data that has more than one entry in the last hour.
I have a table with user_id and event_time. I want a way to find out whether a user_id has more than one entry in the last hour.
Here is what I have tried so far:
Create a temp table to hold all duplicate entries:
SELECT a.*
INTO #temp
FROM Table a
JOIN (
SELECT USERID, COUNT(*) AS Duplicates
FROM Table
GROUP BY userid
HAVING count(*) > 1
) AS b ON a.userid = b.USERID
Run a self-join to fetch records with a time difference of 1 hour or less:
SELECT a.*
FROM #temp a
INNER JOIN #temp b ON a.userid = b.USERID
WHERE DATEDIFF(hour, a.EVENTTIME, b.EVENTTIME) = 1
Running the first script returns around 800+ rows of duplicate data, but running the second script returns thousands of rows.
Can anyone help here?
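The jump from 800+ rows to thousands is what a self-join on user_id alone produces: every row is paired with every row of the same user, including itself, so a user with n rows yields n * n rows before the WHERE filter is applied. A minimal sketch, run in SQLite through Python's sqlite3 module as a stand-in for SQL Server, with hypothetical sample data:

```python
import sqlite3

# Hypothetical sample data: user 1 has four events, user 2 has two.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-01-01 10:00:00"),
        (1, "2024-01-01 10:10:00"),
        (1, "2024-01-01 10:20:00"),
        (1, "2024-01-01 10:30:00"),
        (2, "2024-01-01 10:00:00"),
        (2, "2024-01-01 10:50:00"),
    ],
)

base_rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# Self-join on user_id only: each row matches every row of the same user,
# itself included, so user 1 contributes 4 * 4 rows and user 2 contributes 2 * 2.
joined_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM events a
    JOIN events b ON a.user_id = b.user_id
    """
).fetchone()[0]

print(base_rows, joined_rows)  # 6 20
```

Excluding the row itself (for example with a row number, as in the answers) and selecting DISTINCT rows keeps the output to the original events.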

CROSS APPLY can be used to get all related events for each event according to your criteria, as follows:
With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
)
Select a.ID, a.USERID, a.EVENTTIME, T.ID, T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select ID, USERID, EVENTTIME
From CTE
Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
And USERID=a.USERID And ID<>a.ID) As T
Order by a.ID, a.USERID, a.EVENTTIME, T.ID, T.USERID, T.EVENTTIME
Or you can get a list of events without tying them to a specific event:
With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
)
Select T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select USERID, EVENTTIME
From CTE
Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
And USERID=a.USERID And ID<>a.ID) As T
Group by T.USERID, T.EVENTTIME
To get only the events from the last hour, add the appropriate filter to the WHERE clause in the CTE:
With CTE As (
Select USERID, EVENTTIME, Row_Number() Over (Order by USERID, EVENTTIME) As ID
From Tbl
Where EVENTTIME Between dateadd(minute, -60, GetDate()) And GetDate()
)
Select T.USERID, T.EVENTTIME
From CTE As a Cross Apply (Select USERID, EVENTTIME
From CTE
Where Abs(datediff(minute, a.EVENTTIME, EVENTTIME))<=60
And USERID=a.USERID
And ID<>a.ID) As T
Group by T.USERID, T.EVENTTIME
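SQLite has no CROSS APPLY, but because the applied subquery here is only a filter on the same CTE, an ordinary self-join on the same conditions returns the same related events. A sketch under that assumption, with hypothetical data; strftime('%s', ...) yields whole seconds so the 60-minute test is exact, and ROW_NUMBER needs SQLite 3.25 or later:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (user_id INTEGER, event_time TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        (1, "2024-01-01 10:00:00"),
        (1, "2024-01-01 10:40:00"),  # 40 minutes after the previous event
        (1, "2024-01-01 13:00:00"),  # no other event of user 1 within 60 minutes
        (2, "2024-01-01 09:00:00"),  # user 2 has a single event
    ],
)

# Number the rows, then pair each event with every *other* event of the same
# user that lies within 3600 seconds of it. DISTINCT mirrors the GROUP BY variant.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT user_id, event_time,
               ROW_NUMBER() OVER (ORDER BY user_id, event_time) AS id
        FROM tbl
    )
    SELECT DISTINCT t.user_id, t.event_time
    FROM cte AS a
    JOIN cte AS t
      ON t.user_id = a.user_id
     AND t.id <> a.id
     AND ABS(CAST(strftime('%s', t.event_time) AS INTEGER)
           - CAST(strftime('%s', a.event_time) AS INTEGER)) <= 3600
    ORDER BY t.user_id, t.event_time
    """
).fetchall()

print(rows)  # [(1, '2024-01-01 10:00:00'), (1, '2024-01-01 10:40:00')]
```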

Give a row number to each user_id group, ordered by event time. Remember to keep only the rows whose event_time falls within the last hour.
Query
;with cte as(
select [rn] = row_number() over(
partition by [user_id]
order by [event_time]
), *
from [your_table_name]
where [event_time] >= dateadd(hour, -1, getdate())
)
select * from cte as [t1]
where exists(
select 1 from cte as [t2]
where [t1].[user_id]= [t2].[user_id]
and [t2].[rn] > 1
);
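The row-number idea can be tried out in SQLite with a fixed reference time in place of getdate(), so the result is reproducible. Filtering on an explicit cutoff (here datetime(NOW, '-1 hour')) is more precise than comparing DATEDIFF(hour, ...) results, since T-SQL's DATEDIFF counts hour boundaries crossed rather than elapsed time. NOW and the sample rows are hypothetical:

```python
import sqlite3

NOW = "2024-01-01 12:00:00"  # hypothetical fixed "current time"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, event_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2024-01-01 11:15:00"),
        (1, "2024-01-01 11:45:00"),  # user 1: two events in the last hour
        (2, "2024-01-01 11:30:00"),
        (2, "2024-01-01 09:00:00"),  # user 2: this one is outside the hour
    ],
)

# Number rows per user inside the last hour; any user with rn > 1 has
# more than one entry in that window.
rows = conn.execute(
    """
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time) AS rn,
               user_id, event_time
        FROM events
        WHERE event_time >= datetime(?, '-1 hour') AND event_time <= ?
    )
    SELECT user_id, event_time
    FROM cte AS t1
    WHERE EXISTS (
        SELECT 1 FROM cte AS t2
        WHERE t2.user_id = t1.user_id AND t2.rn > 1
    )
    ORDER BY user_id, event_time
    """,
    (NOW, NOW),
).fetchall()

print(rows)  # [(1, '2024-01-01 11:15:00'), (1, '2024-01-01 11:45:00')]
```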

Related

How to get max date among others ids for current id using BigQuery?

I need to get the max date for each row over the other ids. I can do this with CROSS JOIN and JOIN, like this:
WITH t AS (
SELECT 1 AS id, rep_date FROM UNNEST(GENERATE_DATE_ARRAY('2021-09-01','2021-09-09', INTERVAL 1 DAY)) rep_date
UNION ALL
SELECT 2 AS id, rep_date FROM UNNEST(GENERATE_DATE_ARRAY('2021-08-20','2021-09-03', INTERVAL 1 DAY)) rep_date
UNION ALL
SELECT 3 AS id, rep_date FROM UNNEST(GENERATE_DATE_ARRAY('2021-08-25','2021-09-05', INTERVAL 1 DAY)) rep_date
)
SELECT id, rep_date, MAX(rep_date) OVER (PARTITION BY id) max_date, max_date_over_others FROM t
JOIN (
SELECT t.id, MAX(max_date) max_date_over_others FROM t
CROSS JOIN (
SELECT id, MAX(rep_date) max_date FROM t
GROUP BY 1
) t1
WHERE t1.id <> t.id
GROUP BY 1
) USING (id)
But it's too weird for huge tables, so I'm looking for a simpler way to do this. Any ideas?
Your version is good enough, I think. But if you want to try other options, consider the approach below. It might look more verbose at first glance, but it should be more efficient and cheaper than your cross-join version. It continues the WITH clause that defines t in the question:
, temp as (
select id,
greatest(
ifnull(max(max_date_for_id) over preceding_ids, '1970-01-01'),
ifnull(max(max_date_for_id) over following_ids, '1970-01-01')
) as max_date_for_rest_ids
from (
select id, max(rep_date) max_date_for_id
from t
group by id
)
window
preceding_ids as (order by id rows between unbounded preceding and 1 preceding),
following_ids as (order by id rows between 1 following and unbounded following)
)
select *
from t
join temp
using (id)
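To sanity-check the preceding/following window idea, here is a sketch that rebuilds the question's t in SQLite (via Python's sqlite3, as a stand-in for BigQuery) and computes the max date over the other ids. SQLite has no GREATEST(), so its two-argument scalar MAX() is used instead; the date_range helper is hypothetical and only mimics GENERATE_DATE_ARRAY. Window functions need SQLite 3.25 or later.

```python
import sqlite3
from datetime import date, timedelta

def date_range(start, end):
    # Inclusive list of ISO date strings, one per day.
    d, out = date.fromisoformat(start), []
    while d <= date.fromisoformat(end):
        out.append(d.isoformat())
        d += timedelta(days=1)
    return out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, rep_date TEXT)")
for id_, start, end in [(1, "2021-09-01", "2021-09-09"),
                        (2, "2021-08-20", "2021-09-03"),
                        (3, "2021-08-25", "2021-09-05")]:
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [(id_, d) for d in date_range(start, end)])

# Per id: the max over the ids before it and the max over the ids after it.
# The first/last id gets NULL for one side, which IFNULL replaces with 1970-01-01.
rows = conn.execute(
    """
    SELECT id,
           MAX(IFNULL(prev_max, '1970-01-01'),
               IFNULL(next_max, '1970-01-01')) AS max_date_for_rest_ids
    FROM (
        SELECT id,
               MAX(max_date_for_id) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_max,
               MAX(max_date_for_id) OVER (ORDER BY id ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS next_max
        FROM (SELECT id, MAX(rep_date) AS max_date_for_id FROM t GROUP BY id)
    )
    ORDER BY id
    """
).fetchall()

print(rows)  # [(1, '2021-09-05'), (2, '2021-09-09'), (3, '2021-09-09')]
```

The aggregation runs over one row per id, not over the full table, which is where the saving over the cross join comes from; the result is then joined back to t using id.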
Assuming your original table just has the columns id and dt, wouldn't this solve it? I'm using the fact that if an id holds the overall max dt, then the max over the other ids is the second-highest per-id max.
WITH max_dates AS
(
SELECT
id,
MAX(dt) AS max_dt
FROM
data
GROUP BY
id
),
with_top1_value AS
(
SELECT
*,
MAX(max_dt) OVER () AS max_overall_dt_1,
MIN(max_dt) OVER () AS min_overall_dt
FROM
max_dates
),
with_top2_values AS
(
SELECT
*,
MAX(CASE WHEN max_dt = max_overall_dt_1 THEN min_overall_dt ELSE max_dt END) OVER () AS max_overall_dt_2
FROM
with_top1_value
)
SELECT
*,
CASE WHEN max_dt = max_overall_dt_1 THEN max_overall_dt_2 ELSE max_overall_dt_1 END AS max_dt_of_others
FROM
with_top2_values
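Here is the second-highest trick run end to end in SQLite with hypothetical data whose per-id maxima match the question (09-09, 09-03, 09-05). It assumes the windowed aggregates use max_dt and OVER (), and the CTE chain has no trailing comma before the final SELECT. One caveat worth knowing: if two ids tie for the overall max, each of them gets the next-lower date rather than the tied max.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, dt TEXT)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [
    (1, "2021-09-01"), (1, "2021-09-09"),
    (2, "2021-08-20"), (2, "2021-09-03"),
    (3, "2021-08-25"), (3, "2021-09-05"),
])

rows = conn.execute(
    """
    WITH max_dates AS (
        SELECT id, MAX(dt) AS max_dt FROM data GROUP BY id
    ),
    with_top1_value AS (
        -- overall max and min of the per-id maxima, attached to every row
        SELECT *, MAX(max_dt) OVER () AS max_overall_dt_1,
                  MIN(max_dt) OVER () AS min_overall_dt
        FROM max_dates
    ),
    with_top2_values AS (
        -- hide the top value behind the minimum to find the runner-up
        SELECT *, MAX(CASE WHEN max_dt = max_overall_dt_1 THEN min_overall_dt
                           ELSE max_dt END) OVER () AS max_overall_dt_2
        FROM with_top1_value
    )
    SELECT id,
           CASE WHEN max_dt = max_overall_dt_1 THEN max_overall_dt_2
                ELSE max_overall_dt_1 END AS max_dt_of_others
    FROM with_top2_values
    ORDER BY id
    """
).fetchall()

print(rows)  # [(1, '2021-09-05'), (2, '2021-09-09'), (3, '2021-09-09')]
```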

Find the last one year of transactions from the max date for each group

I need a SQL Server query that finds, for each id, its volume over the last year of dates for that id.
For example, below is my data:
For each id, I need the last one year of transactions, counted back from the latest entry for that id. As you can see from the snippet, the latest date for id 1 is 7/31/2020, so I need the entries from the year before that date for that id. The highlighted row is excluded because its date is more than one year before the latest date for that id.
Similarly, for id 3, all the dates fall within one year of the latest date for that id.
I tried the query below, which gets the latest date for each id, but I am not sure how to extract all the dates for each id within one year of the latest date. I would appreciate it if someone could help.
I am using Microsoft SQL Server, so I need a query that runs there. The table is named emp and has millions of ids.
Select *
From emp as t
inner join (
Select tm.id, max(tm.date_tran) as MaxDate
From emp tm
Group by tm.id
) tm on t.id = tm.id and t.date_tran = tm.MaxDate
To exclude transactions where the difference between date_tran and the maximum date_tran for each id is greater than 1 year, you can do something like this:
;with max_cte(id, max_date) as (
Select id, max(date_tran)
From emp tm
Group by id )
Select *
From emp e
join max_cte mc on e.id=mc.id
and datediff(d, e.date_tran, mc.max_date)<=365;
Update: per the comments, added volume. Thanks, GMB :)
;with max_cte(id, date_tran, volume, max_date) as (
Select id, date_tran, volume, dateadd(year, -1, max(date_tran) over(partition by id)) max_date
From emp tm)
Select id, sum(volume) sum_volume
From max_cte mc
where mc.date_tran>max_date
group by id;
You can do this with window functions:
select id, sum(volume) total_volume
from (
select t.*, max(date_tran) over(partition by id) max_date_tran
from mytable t
) t
where date_tran > dateadd(year, -1, max_date_tran)
group by id
Alternatively, you can use a correlated subquery for filtering:
select id, sum(volume) total_volume
from mytable t
where t.date_tran > (
select dateadd(year, -1, max(t1.date_tran))
from mytable t1
where t1.id = t.id
)
group by id
The second query would take advantage of an index on (id, date_tran).
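The window-function version above can be checked in SQLite, where date(x, '-1 year') stands in for DATEADD(year, -1, x). The sample rows are hypothetical: id 1's latest date is 2020-07-31 and one of its rows is more than a year older, while all of id 3's rows fall within the year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date_tran TEXT, volume INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, "2019-07-15", 100),  # more than a year before 2020-07-31: excluded
    (1, "2019-09-01", 10),
    (1, "2020-07-31", 5),
    (3, "2020-01-10", 7),
    (3, "2020-03-01", 3),
])

# Attach each id's latest date to every row, then keep rows in the year before it.
rows = conn.execute(
    """
    SELECT id, SUM(volume) AS total_volume
    FROM (
        SELECT t.*, MAX(date_tran) OVER (PARTITION BY id) AS max_date_tran
        FROM mytable t
    ) t
    WHERE date_tran > date(max_date_tran, '-1 year')
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(rows)  # [(1, 15), (3, 10)]
```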
This should do the trick for you:
SELECT
*
FROM
emp
JOIN
(
SELECT
MAX(date_tran) max_date_tran
, Id
FROM
emp
GROUP BY
id
) emp2
ON emp2.Id = emp.Id
AND DATEADD(YEAR, -1, emp2.max_date_tran) <= emp.date_tran;
Your code is good. Just add a date-difference condition to the join so that only transactions within a year of the latest one are kept, like the following:
Select *
From emp as t
inner join ( Select id as id, max(date_tran) as maxdate
From emp tm
Group by id
) tm on t.id = tm.id and datediff(d, t.date_tran, tm.maxdate)<=365;

How to narrow down count query by a finite time frame?

I have a query where I am identifying more than 1 submission by user for a particular form:
select userid, form_id, count(*)
from table_A
group by userid, form_id
having count(userid) > 1
However, I am trying to see which users are submitting more than 1 form within a 5-second timeframe (we have a field for the submission timestamp in this table). How would I narrow this query down by that criterion?
You haven't provided many details about your schema and the other available columns, nor about how and where this information will be used.
However, if you want to do it "live", comparing the submission time against the current timestamp, it would look something like this:
SELECT userid, form_id, count(*)
FROM table_A
WHERE DATEDIFF(SECOND,YourColumnWithSubmissionTimestamp, getdate()) <= 5
GROUP BY userid, form_id
HAVING count(userid) > 1
One way is to add DATEDIFF(Second, '2017-01-01', SubmittionTimeStamp) / 5 to the GROUP BY.
This will group records by userid, form_id and a five-second interval:
select userid, form_id, count(*)
from table_A
group by userid, form_id, datediff(Second, '2017-01-01', SubmittionTimeStamp) / 5
having count(userid) > 1
Read this SO post for a more detailed explanation.
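One property of fixed five-second buckets is worth seeing concretely: two submissions only a couple of seconds apart can land in adjacent buckets and go unreported. A SQLite sketch with hypothetical rows, using integer Unix seconds from strftime('%s', ...) in place of DATEDIFF(Second, ...):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_A (userid INTEGER, form_id INTEGER, submission_ts TEXT)")
conn.executemany("INSERT INTO table_A VALUES (?, ?, ?)", [
    (1, 10, "2017-01-01 00:00:01"),
    (1, 10, "2017-01-01 00:00:03"),  # same bucket (seconds 0-4) as the row above
    (2, 10, "2017-01-01 00:00:04"),
    (2, 10, "2017-01-01 00:00:06"),  # only 2 s later, but in the next bucket (5-9)
])

# Bucket number = whole seconds since 2017-01-01, integer-divided by 5.
rows = conn.execute(
    """
    SELECT userid, form_id, COUNT(*)
    FROM table_A
    GROUP BY userid, form_id,
             (CAST(strftime('%s', submission_ts) AS INTEGER)
              - CAST(strftime('%s', '2017-01-01') AS INTEGER)) / 5
    HAVING COUNT(*) > 1
    """
).fetchall()

print(rows)  # [(1, 10, 2)]
```

User 2 is missed even though the two submissions are 2 seconds apart, so this approach fits best when approximate, bucket-aligned detection is acceptable.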
You can use lag to form groups of rows, per user and form, that are within 5 seconds of each other and then aggregate them:
select distinct userid,
form_id
from (
select t.*,
sum(val) over (
partition by t.userid, t.form_id
order by t.submission_timestamp
) as grp
from (
select t.*,
case
when datediff(ms, lag(t.submission_timestamp, 1, t.submission_timestamp) over (
partition by t.userid, t.form_id
order by t.submission_timestamp
), t.submission_timestamp) > 5000
then 1
else 0
end val
from your_table t
) t
) t
group by userid,
form_id,
grp
having count(*) > 1;
See this answer for more explanation:
Group records by consecutive dates when dates are not exactly consecutive
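The lag-based islands only make sense when both the lag and the running sum are partitioned by userid and form_id; otherwise another user's submission in between can chain two rows together. The SQLite sketch below runs the same query with and without the partition on hypothetical rows: user 1's submissions are 8 seconds apart but user 2 submits halfway between them. Gaps are measured in whole seconds via strftime('%s', ...) rather than DATEDIFF(ms, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_A (userid INTEGER, form_id INTEGER, ts TEXT)")
conn.executemany("INSERT INTO table_A VALUES (?, ?, ?)", [
    (1, 10, "2017-01-01 00:00:00"),
    (1, 10, "2017-01-01 00:00:08"),  # 8 s after user 1's previous submission
    (2, 10, "2017-01-01 00:00:04"),  # falls between user 1's two rows
    (3, 10, "2017-01-01 00:01:00"),
    (3, 10, "2017-01-01 00:01:03"),  # 3 s apart: a genuine repeat
])

def flagged(partition):
    # partition is a trusted literal from this script; val = 1 starts a new island
    # whenever the gap to the previous row exceeds 5 seconds.
    sql = f"""
        SELECT DISTINCT userid, form_id
        FROM (
            SELECT t.*, SUM(val) OVER ({partition} ORDER BY ts) AS grp
            FROM (
                SELECT t.*,
                       CASE WHEN CAST(strftime('%s', ts) AS INTEGER)
                                 - CAST(strftime('%s', LAG(ts, 1, ts) OVER ({partition} ORDER BY ts)) AS INTEGER) > 5
                            THEN 1 ELSE 0 END AS val
                FROM table_A t
            ) t
        ) t
        GROUP BY userid, form_id, grp
        HAVING COUNT(*) > 1
        ORDER BY userid, form_id
    """
    return conn.execute(sql).fetchall()

with_partition = flagged("PARTITION BY userid, form_id")
without_partition = flagged("")

print(with_partition)     # [(3, 10)]
print(without_partition)  # [(1, 10), (3, 10)]  user 1 wrongly flagged
```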
I would just use exists to get the users:
select userid, form_id
from table_A a
where exists (select 1
from table_A a2
where a2.userid = a.userid and a2.form_id = a.form_id
and a2.timestamp > a.timestamp and a2.timestamp < dateadd(second, 5, a.timestamp)
);
If you want a count, you can just add group by and count(*).
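The EXISTS approach can be checked the same way in SQLite, with datetime(x, '+5 seconds') standing in for dateadd(second, 5, x). The sketch looks for a strictly later submission of the same userid and form_id, so a row never matches itself; note that rows with identical timestamps would then not match each other, and telling them apart would need a key column the question does not show. The data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_A (userid INTEGER, form_id INTEGER, ts TEXT)")
conn.executemany("INSERT INTO table_A VALUES (?, ?, ?)", [
    (1, 10, "2017-01-01 00:00:00"),
    (1, 10, "2017-01-01 00:00:08"),  # 8 s later: outside the window
    (2, 10, "2017-01-01 00:00:04"),
    (3, 10, "2017-01-01 00:01:00"),
    (3, 10, "2017-01-01 00:01:03"),  # 3 s later: inside the window
])

rows = conn.execute(
    """
    SELECT DISTINCT userid, form_id
    FROM table_A a
    WHERE EXISTS (
        SELECT 1 FROM table_A a2
        WHERE a2.userid = a.userid AND a2.form_id = a.form_id
          AND a2.ts > a.ts                          -- strictly later, never itself
          AND a2.ts < datetime(a.ts, '+5 seconds')
    )
    ORDER BY userid, form_id
    """
).fetchall()

print(rows)  # [(3, 10)]
```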

Calculating per day in SQL

I have an SQL table like this:
Id Date Price
1 21.09.09 25
2 31.08.09 16
1 23.09.09 21
2 03.09.09 12
What I need is the min and max date for each id and the difference in days between them. That part is fairly easy. Using SQLite syntax:
SELECT id,
min(date),
max(date),
julianday(max(date)) - julianday(min(date)) as dif
from table group by id
Then the tricky part: how can I get the price per day for every day in that period? I mean something like this:
ID Date PricePerDay
1 21.09.09 25
1 22.09.09 0
1 23.09.09 21
2 31.08.09 16
2 01.09.09 0
2 02.09.09 0
2 03.09.09 12
I created a CTE with a calendar, as you mentioned, but I don't know how to get the desired result:
WITH RECURSIVE
cnt(x) AS (
SELECT 0
UNION ALL
SELECT x+1 FROM cnt
LIMIT (SELECT ((julianday('2015-12-31') - julianday('2015-01-01')) + 1)))
SELECT date(julianday('2015-01-01'), '+' || x || ' days') as date FROM cnt
P.S. If it could be in SQLite syntax, that would be awesome!
You can use a recursive CTE to calculate all the days between the min date and max date. The rest is just a left join and some logic:
with recursive cte as (
select t.id, min(date) as thedate, max(date) as maxdate
from t
group by id
union all
select cte.id, date(thedate, '+1 day') as thedate, cte.maxdate
from cte
where cte.thedate < cte.maxdate
)
select cte.id, cte.thedate,
coalesce(t.price, 0) as PricePerDay
from cte left join
t
on cte.id = t.id and cte.thedate = t.date;
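Here is the recursive calendar run in SQLite, assuming the dates are stored as YYYY-MM-DD (the question's DD.MM.YY values would need converting first, since date() does not understand that format). The output column is cte.thedate, the name the CTE actually defines; the rows are the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, date TEXT, price INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "2009-09-21", 25),
    (2, "2009-08-31", 16),
    (1, "2009-09-23", 21),
    (2, "2009-09-03", 12),
])

rows = conn.execute(
    """
    WITH RECURSIVE cte AS (
        -- anchor: first and last date per id
        SELECT t.id, MIN(date) AS thedate, MAX(date) AS maxdate
        FROM t
        GROUP BY id
        UNION ALL
        -- step: one more day until the last date is reached
        SELECT cte.id, date(thedate, '+1 day') AS thedate, cte.maxdate
        FROM cte
        WHERE cte.thedate < cte.maxdate
    )
    SELECT cte.id, cte.thedate, COALESCE(t.price, 0) AS PricePerDay
    FROM cte
    LEFT JOIN t ON cte.id = t.id AND cte.thedate = t.date
    ORDER BY cte.id, cte.thedate
    """
).fetchall()

for row in rows:
    print(row)
```

This prints one row per day, with 0 for the days that have no price, matching the table in the question.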
One method is to use a tally table to build a list of dates and join that with the table.
The date stamps in DD.MM.YY format are first converted to the YYYY-MM-DD format, so that they can actually be used as dates in the SQL.
In the final SELECT they are formatted back to DD.MM.YY.
First some test data:
create table testtable (Id int, [Date] varchar(8), Price int);
insert into testtable (Id,[Date],Price) values (1,'21.09.09',25);
insert into testtable (Id,[Date],Price) values (1,'23.09.09',21);
insert into testtable (Id,[Date],Price) values (2,'31.08.09',16);
insert into testtable (Id,[Date],Price) values (2,'03.09.09',12);
The SQL:
with Digits as (
select 0 as n
union all select 1
union all select 2
union all select 3
union all select 4
union all select 5
union all select 6
union all select 7
union all select 8
union all select 9
),
t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
Dates as (
select Id, date(MinDate,'+'||(d2.n*10+d1.n)||' days') as [Date]
from (
select Id, min([Date]) as MinDate, max([Date]) as MaxDate
from t
group by Id
) q
join Digits d1
join Digits d2
where date(MinDate,'+'||(d2.n*10+d1.n)||' days') <= MaxDate
)
select d.Id,
(substr(d.[Date],9,2)||'.'||substr(d.[Date],6,2)||'.'||substr(d.[Date],3,2)) as [Date],
coalesce(t.Price,0) as Price
from Dates d
left join t on (d.Id = t.Id and d.[Date] = t.[Date])
order by d.Id, d.[Date];
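The substr-based conversion is the part that makes date arithmetic possible, so it is worth checking on its own. A small SQLite sketch using two of the question's values: each DD.MM.YY string is turned into YYYY-MM-DD (assuming all years are 20xx, as the query does), shifted by a day with date(), and formatted back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute(
    """
    WITH raw(d) AS (VALUES ('21.09.09'), ('31.08.09')),
    conv AS (
        -- DD.MM.YY -> YYYY-MM-DD, assuming the 20xx century
        SELECT d, '20' || substr(d, 7, 2) || '-' || substr(d, 4, 2) || '-' || substr(d, 1, 2) AS iso
        FROM raw
    )
    SELECT d,
           iso,
           date(iso, '+1 day') AS next_day,
           -- YYYY-MM-DD -> DD.MM.YY
           substr(iso, 9, 2) || '.' || substr(iso, 6, 2) || '.' || substr(iso, 3, 2) AS back
    FROM conv
    ORDER BY iso
    """
).fetchall()

for row in rows:
    print(row)
```

The round trip returns the original strings, and the ISO form rolls over month ends correctly (31.08.09 is followed by 2009-09-01).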
The recursive SQL below was inspired by the excellent answer from Gordon Linoff.
A recursive CTE is probably more efficient for this anyway.
(He should get the 15 points for the accepted answer.)
The difference in this version is that the date stamps are first converted to YYYY-MM-DD.
with t as (
select Id,
('20'||substr([Date],7,2)||'-'||substr([Date],4,2)||'-'||substr([Date],1,2)) as [Date],
Price
from testtable
),
cte as (
select Id, min([Date]) as [Date], max([Date]) as MaxDate from t
group by Id
union all
select Id, date([Date], '+1 day'), MaxDate from cte
where [Date] < MaxDate
)
select cte.Id,
(substr(cte.[Date],9,2)||'.'||substr(cte.[Date],6,2)||'.'||substr(cte.[Date],3,2)) as [Date],
coalesce(t.Price, 0) as PricePerDay
from cte
left join t
on (cte.Id = t.Id and cte.[Date] = t.[Date])
order by cte.Id, cte.[Date];

How to select the user with max count by day

I have a table with three columns
UserID, Count, Date
I'd like to be able to select the userid with the highest count for each date.
I've tried a few different variations of queries with inline select statements but none have worked 100%, and I'm not too fond of having a select with three inline selects.
Is doing inline selects the only way to go without using temp tables? What's the best way to tackle this?
This solution will return multiple records if there is a tie in Count, but it should work.
SELECT a.Date, a.UserId, a.[Count]
FROM yourTable a INNER JOIN (
SELECT MAX([Count]) as [Count], Date
FROM yourTable
GROUP BY Date
) b ON a.[Count] = b.[Count] AND a.Date = b.Date
ORDER BY a.Date
If [Date] is in fact a [Date] column with no time component:
;WITH x AS
(
SELECT [Date], [Count], UserID, rn = ROW_NUMBER() OVER
(PARTITION BY [Date] ORDER BY [Count] DESC)
FROM dbo.table
)
SELECT [Date], [Count], UserID
FROM x
WHERE rn = 1
ORDER BY [Date];
If [Date] is a DATETIME column with a time component, then:
;WITH x AS
(
SELECT [Date] = DATEADD(DAY, DATEDIFF(DAY, '19000101', [Date]), '19000101'),
[Count], UserID, rn = ROW_NUMBER() OVER
(PARTITION BY DATEADD(DAY, DATEDIFF(DAY, '19000101', [Date]), '19000101')
ORDER BY [Count] DESC)
FROM dbo.table
)
SELECT [Date], [Count], UserID
FROM x
WHERE rn = 1
ORDER BY [Date];
If you want to pick a specific row in the event of a tie, you can add a tie-breaker to the ORDER BY within the OVER clause. If you want to include multiple rows in the case of ties, you can try changing ROW_NUMBER() to DENSE_RANK().
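The difference between a tie-broken ROW_NUMBER() and DENSE_RANK() shows up as soon as two users share the top count on a day. A SQLite sketch with hypothetical rows (users 1 and 2 both have Count 5 on 2024-01-01); "Count" and "Date" are quoted, like the bracketed names in the T-SQL above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE daily (UserID INTEGER, "Count" INTEGER, "Date" TEXT)')
conn.executemany("INSERT INTO daily VALUES (?, ?, ?)", [
    (1, 5, "2024-01-01"),
    (2, 5, "2024-01-01"),  # ties with user 1
    (3, 2, "2024-01-01"),
    (1, 1, "2024-01-02"),
    (3, 7, "2024-01-02"),
])

def top_per_day(rank_fn, order_by):
    # rank_fn and order_by are trusted literals from this script, not user input.
    sql = f"""
        SELECT "Date", UserID
        FROM (
            SELECT "Date", UserID,
                   {rank_fn}() OVER (PARTITION BY "Date" ORDER BY {order_by}) AS rn
            FROM daily
        )
        WHERE rn = 1
        ORDER BY "Date", UserID
    """
    return conn.execute(sql).fetchall()

# Tie-breaker on UserID: exactly one row per day.
one_per_day = top_per_day("ROW_NUMBER", '"Count" DESC, UserID')
# No tie-breaker with DENSE_RANK: every user sharing the top count is kept.
with_ties = top_per_day("DENSE_RANK", '"Count" DESC')

print(one_per_day)  # [('2024-01-01', 1), ('2024-01-02', 3)]
print(with_ties)    # [('2024-01-01', 1), ('2024-01-01', 2), ('2024-01-02', 3)]
```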
SELECT x.*
FROM (
SELECT Date
FROM atable
GROUP BY Date
) t
CROSS APPLY (
SELECT TOP 1 WITH TIES
UserID, Count, Date
FROM atable
WHERE Date = t.Date
ORDER BY Count DESC
) x
If Date is datetime type and can have a non-zero time component, change the t table like this:
…
FROM (
SELECT Date = DATEADD(DAY, DATEDIFF(DAY, 0, Date), 0)
FROM atable
GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, Date), 0)
) t
…
References:
TOP (Transact-SQL)
Using APPLY
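For SQL Server 2005 and later (a ranking function cannot appear directly in WHERE, so it is computed in a derived table first):
select UserID, Count, Date
from (
select UserID, Count, Date,
rank() over (partition by Date order by Count DESC, UserID DESC) as rnk
from tb
) ranked
where rnk = 1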