SQL to determine distinct periods of sequential days of access?


Jeff recently asked this question and got some great answers.
Jeff's problem revolved around finding the users that have had (n) consecutive days where they have logged into a system. Using a database table structure as follows:
Id UserId CreationDate
------ ------ ------------
750997 12 2009-07-07 18:42:20.723
750998 15 2009-07-07 18:42:20.927
751000 19 2009-07-07 18:42:22.283
Read the original question first for clarity and then...
I was intrigued by the problem of determining how many distinct (n)-day periods a user has.
Could one craft a speedy SQL query that could return a list of users and the number of distinct (n)-day periods they have?
EDIT: as per a comment below: if someone has 2 consecutive days, then a gap, then 4 consecutive days, then a gap, then 8 consecutive days, that would be 3 "distinct 4-day periods". The 8-day period counts as two back-to-back 4-day periods.

My answer appears to have not appeared...
I'll try again...
Rob Farley's answer to the original question has the handy benefit of including the number of consecutive days.
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
Using integer division, dividing the number of consecutive days by (n) gives the number of "distinct (n)-day periods" covered by the whole consecutive period. For n = 4:
- 2 / 4 = 0
- 4 / 4 = 1
- 8 / 4 = 2
- 9 / 4 = 2
- etc, etc
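As a quick check of the arithmetic above, here is a minimal Python sketch (not part of the original answer; the function name is made up for illustration). It sums the floor division over each run of consecutive days, using the 2, 4, 8 example from the question.

```python
def distinct_periods(run_lengths, period_length):
    """Count whole period_length-day blocks across all runs of consecutive days."""
    return sum(run // period_length for run in run_lengths)


# Runs from the question's example: 2 days, a gap, 4 days, a gap, 8 days.
total = distinct_periods([2, 4, 8], 4)
print(total)  # 3
```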
So here is my take on Rob's answer for your needs...
(I really LOVE Rob's answer, go read the explanation, it's inspired thinking!)
with
numberedrows (
UserID,
TheOffset
)
as
(
select
UserID,
row_number() over (partition by UserID order by CreationDate)
- DATEDIFF(DAY, 0, CreationDate) as TheOffset
from
tablename
),
ConsecutiveCounts(
UserID,
ConsecutiveDays
)
as
(
select
UserID,
count(*) as ConsecutiveDays
from
numberedrows
group by
UserID,
TheOffset
)
select
UserID,
SUM(ConsecutiveDays / @period_length) AS distinct_n_day_periods
from
ConsecutiveCounts
group by
UserID
The only real difference is that I take Rob's results and then run it through another GROUP BY...
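For readers who want to see why "row number minus day number" identifies a run, here is a small Python sketch of the same idea. It is an illustration only: the function name and sample dates are made up, and it de-duplicates dates first, which assumes at most one counted login per user per day (the SQL above appears to rely on the same assumption).

```python
from datetime import date
from itertools import groupby


def consecutive_runs(days):
    """Return the lengths of the runs of consecutive calendar days."""
    ordered = sorted(set(days))  # one entry per calendar day
    # Inside a run, the row number and the day number both grow by 1,
    # so their difference stays constant and can serve as a group key.
    keyed = [(i - d.toordinal(), d) for i, d in enumerate(ordered)]
    return [len(list(group)) for _, group in groupby(keyed, key=lambda pair: pair[0])]


logins = [date(2009, 7, 7), date(2009, 7, 8),
          date(2009, 7, 10), date(2009, 7, 11), date(2009, 7, 12)]
print(consecutive_runs(logins))  # [2, 3]
```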

So - I'm going to start with my query from the last question, which listed each run of consecutive days. Then I'm going to group that by userid and NumConsecutiveDays, to count how many runs of days there are for those users.
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
,
runsOfDays as
(
select min(CreationDate) as FirstDay, max(CreationDate) as LastDay, count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
)
select UserID, NumConsecutiveDays, count(*) as NumOfRuns
from runsOfDays
group by UserID, NumConsecutiveDays
;
And of course, if you want to filter this to only consider runs of a certain length, then put "where NumConsecutiveDays >= @days" in the last query.
Now, if you want to count a run of 16 days as three 5-day runs, then each run counts as NumConsecutiveDays / @runlength of these (integer division rounds down for each run). So instead of just counting how many there are of each, use SUM. You could use the query above with SUM(NumOfRuns * (NumConsecutiveDays / @runlength)), keeping the parentheses so that the division is truncated before multiplying, but if you understand the logic, then the query below is a bit easier.
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
,
runsOfDays as
(
select min(CreationDate) as FirstDay, max(CreationDate) as LastDay, count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
)
select UserID, sum(NumConsecutiveDays / @runlength) as NumOfRuns
from runsOfDays
where NumConsecutiveDays >= @runlength
group by UserID
;
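One detail worth checking when combining a count of runs with integer division is the order of operations. The Python sketch below (illustrative values only, not from the original answer) compares dividing each run first against multiplying first, for two separate 3-day runs and a run length of 4.

```python
run_length = 4
runs = [3, 3]  # two separate 3-day runs; neither contains a 4-day period

# Divide each run first, then add up: the intended result.
per_run = sum(days // run_length for days in runs)

# Same thing expressed as count * (length // run_length).
divide_then_multiply = len(runs) * (runs[0] // run_length)

# Multiplying before dividing merges the two runs into 6 days.
multiply_then_divide = (len(runs) * runs[0]) // run_length

print(per_run, divide_then_multiply, multiply_then_divide)  # 0 0 1
```

In T-SQL, * and / have the same precedence and are evaluated left to right, so an unparenthesised count * length / runlength behaves like the last line.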
Hope this helps,
Rob

This works quite nicely with the test data I have.
DECLARE @days int
SET @days = 30
SELECT DISTINCT l.UserId, (datediff(d,l.CreationDate, -- Get last date in contiguous range
(
SELECT min(a.CreationDate ) as CreationDate
FROM UserHistory a
LEFT OUTER JOIN UserHistory b
ON a.CreationDate = dateadd(day, -1, b.CreationDate ) AND
a.UserId = b.UserId
WHERE b.CreationDate IS NULL AND
a.CreationDate >= l.CreationDate AND
a.UserId = l.UserId
) )+1)/@days as cnt
INTO #cnttmp
FROM UserHistory l
LEFT OUTER JOIN UserHistory r
ON r.CreationDate = dateadd(day, -1, l.CreationDate ) AND
r.UserId = l.UserId
WHERE r.CreationDate IS NULL
ORDER BY l.UserId
SELECT UserId, sum(cnt)
FROM #cnttmp
GROUP BY UserId
HAVING sum(cnt) > 0

Related

count consecutive number of -1 in a column. count >=14

I'm trying to figure out a query to count runs where "-1" has occurred more than 14 times in a row. Can anyone help me here? I tried everything from lead to row number, but nothing is working out.
The BP is recorded every minute and I need to find the IDs whose bp_level was "-1" for more than 14 minutes.

You may try the following:
Select Distinct B.Person_ID, B.[Consecutive]
From
(
Select D.person_ID, COUNT(D.bp_level) Over (Partition By D.grp, D.person_ID Order By D.Time_) [Consecutive]
From
(
Select Time_, Person_ID, bp_level,
DATEADD(Minute, -ROW_NUMBER() Over (Partition By Person_ID Order By Time_), Time_) grp
From mytable Where bp_level = -1
) D
) B
Where B.[Consecutive] >= 14
See a demo from db<>fiddle. Using SQL Server.
DATEADD(Minute, -ROW_NUMBER() Over (Partition By Person_ID Order By Time_), Time_): to define a unique group for consecutive times per person, where (bp_level = -1).
COUNT(D.bp_level) Over (Partition By D.grp, D.person_ID Order By D.Time_): to compute a running count of the bp_level rows within each group, ordered by time.
Once a value other than -1 appears, the run splits into two groups and the counter restarts for the second group.
NOTE: this solution works only if there are no gaps between the consecutive times, the time is increased by one minute for each row/ person, otherwise, the query will not work but can be modified to cover the gaps.
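To make the grouping idea concrete, here is a rough Python sketch of the same "runs of -1" logic. It is not the answer's SQL: the function name, tuple layout, and sample data are assumptions made for illustration, and it counts rows in a run, so it also assumes one row per person per minute.

```python
from itertools import groupby


def people_with_long_runs(readings, min_run=14):
    """readings: iterable of (person_id, minute, bp_level), one row per minute."""
    result = set()
    ordered = sorted(readings)  # by person, then by minute
    for person, rows in groupby(ordered, key=lambda r: r[0]):
        # Consecutive rows with the same bp_level form one run.
        for level, run in groupby(rows, key=lambda r: r[2]):
            if level == -1 and sum(1 for _ in run) >= min_run:
                result.add(person)
    return result


# Person 1: 14 straight minutes at -1. Person 2: 13 minutes, a break, then 5 more.
sample = [(1, m, -1) for m in range(14)]
sample += [(2, m, -1) for m in range(13)] + [(2, 13, 0)]
sample += [(2, m, -1) for m in range(14, 19)]
print(people_with_long_runs(sample))  # {1}
```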

with data as (
select *,
count(case when bp_level = 1 then 1 end) over
(partition by person_id order by time) as grp
from T
)
select distinct person_id
from data
where bp_level = -1
group by person_id, grp
having count(*) > 14; /* = or >= ? */
If you want to rely on timestamps rather than a count of rows then you could use the time difference:
...
-- where 1 = 1 /* all rows */
group by person_id, grp
having datediff(minute, min(time), max(time)) > 14;
The accepted answer would have issues with scenarios where there are multiple rows with the same timestamp if there's any potential for that to happen.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=2ad6a1b515bb4091efba9b8831e5d579

Order by decimals for counts of same value

EDIT: I have edited this question to make the query simpler:
ReportTracking:
Userid, ReportId, Duration, CreatedDate
Query:
SELECT t.UserId, COUNT(DISTINCT(t.ReportId)) AS ReportsRead
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
Sample Result:
UserId ReportsRead
1 22
2 13
3 2
4 2
5 2
What I need to do is assign a number value to Reports Read. Essentially because there are 3 users who read swimming and they tie in terms of ranking (they each have 2 read only) I need to order them by who read the report last. I need to assign them all a decimal number value based on order of reading. So the person who read the report last would get .1, the person who read it first would get .3.
I'm not quite sure how to achieve this. The key part is that they do have a decimal number value that ranks them, and this decimal should be a few decimal places long as the record sets are rather long. My idea was to use DateCreated and convert it to a number value which I can subtract from a max. But since there are multiple dates (one for each report), I'm not sure how to grab the latest one and only use that date with my report count.

I'm not sure why you need to assign decimals...
Just order by ReportsRead desc, max(createdDate) (this should be most recent read for a user in the select).
Also, DISTINCT isn't a function, it's a keyword, so there is no need for the ().
SELECT t.UserId
, COUNT(DISTINCT t.ReportId) AS ReportsRead
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
ORDER BY ReportsRead DESC, MAX(t.CreatedDate)
If you need the numbers and plan on displaying them:
WITH CTE AS (
SELECT t.UserId
, COUNT(DISTINCT t.ReportId) AS ReportsRead
, row_number() over (partition by count(Distinct t.reportID) order by max(t.CreatedDate) Asc) RN
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId)
SELECT *
FROM CTE
ORDER BY ReportsRead DESC, RN

You can rank your rows within ReportsRead partition to obtain a ranking by ordering on the max(createddate). documentation: SQL Server Rank function
here is an example: http://sqlfiddle.com/#!18/1eefc/11
You may simplify the query by using CTE to reuse column aliases but the concept is:
SELECT t.UserId
, COUNT(DISTINCT( t.ReportId )) AS ReportsRead
, CAST(RANK()
OVER(
partition BY COUNT(DISTINCT( t.ReportId ))
ORDER BY MAX(t.createdDate) DESC) AS DECIMAL) / 10 ranking
FROM ReportTracking t
WHERE t.Duration >= 30
AND t.CreatedDate > DATEADD(Day, -30, GETDATE())
GROUP BY t.UserId
ORDER BY ReportsRead DESC
, ranking;
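The tie-breaking idea can be sketched in Python as follows. This is only an illustration: the function name, tuple layout, and dates are hypothetical, and it numbers rows within each group of equal counts (row-number style, so it assumes the last-read values within a tie are distinct) rather than reproducing RANK() exactly.

```python
from collections import defaultdict


def rank_ties(rows):
    """rows: list of (user_id, reports_read, last_read) tuples.

    Returns (user_id, reports_read, ranking) ordered by reports_read
    descending, then by ranking.
    """
    by_count = defaultdict(list)
    for user_id, reports_read, last_read in rows:
        by_count[reports_read].append((last_read, user_id))

    ranked = []
    for reports_read, members in by_count.items():
        members.sort(reverse=True)  # most recent reader first
        for position, (_, user_id) in enumerate(members, start=1):
            ranked.append((user_id, reports_read, position / 10))
    return sorted(ranked, key=lambda r: (-r[1], r[2]))


# Hypothetical last-read dates as ISO strings (they sort chronologically).
rows = [(1, 22, "2024-01-05"), (2, 13, "2024-01-04"),
        (3, 2, "2024-01-03"), (4, 2, "2024-01-01"), (5, 2, "2024-01-02")]
print(rank_ties(rows))
# [(1, 22, 0.1), (2, 13, 0.1), (3, 2, 0.1), (5, 2, 0.2), (4, 2, 0.3)]
```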

How to narrow down count query by a finite time frame?

I have a query where I am identifying more than 1 submission by user for a particular form:
select userid, form_id, count(*)
from table_A
group by userid, form_id
having count(userid) > 1
However, I am trying to see which users are submitting more than 1 form within a 5 second timeframe (We have a field for the submission timestamp in this table). How would I narrow this query down by that criteria?

@nikotromus
You've not provided a lot of details about your schema and other columns available, nor about what / how and where this information will be used.
However, if you want to do it "live", that is, compare the submission time against the current timestamp, it would look something like:
SELECT userid, form_id, count(*)
FROM table_A
WHERE DATEDIFF(SECOND,YourColumnWithSubmissionTimestamp, getdate()) <= 5
GROUP BY userid, form_id
HAVING count(userid) > 1

One way is to add to the group by DATEDIFF(Second, '2017-01-01', SubmittionTimeStamp) / 5.
This will group records based on the userid, form_id and a five seconds interval:
select userid, form_id, count(*)
from table_A
group by userid, form_id, datediff(Second, '2017-01-01', SubmittionTimeStamp) / 5
having count(userid) > 1
Read this SO post for a more detailed explanation.
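One property of fixed five-second buckets is worth seeing with numbers. In this Python sketch (the function name and sample offsets are made up for illustration), two submissions only 2 seconds apart can land in different buckets, while two that are 3 seconds apart share one.

```python
def bucket(seconds_since_anchor, width=5):
    """Mirror of DATEDIFF(second, anchor, ts) / 5 with integer division."""
    return seconds_since_anchor // width


same_bucket = bucket(1) == bucket(4)   # 3 seconds apart, both in bucket 0
split_bucket = bucket(4) == bucket(6)  # 2 seconds apart, buckets 0 and 1

print(same_bucket, split_bucket)  # True False
```

If pairs that straddle a bucket boundary matter, a comparison against the previous row's timestamp may be a better fit than fixed buckets.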

You can use lag to form groups of rows that are within 5 seconds of each other and then do aggregation on them:
select distinct userid,
form_id
from (
select t.*,
sum(val) over (
order by t.submission_timestamp
) as grp
from (
select t.*,
case
when datediff(ms, lag(t.submission_timestamp, 1, t.submission_timestamp) over (
order by t.submission_timestamp
), t.submission_timestamp) > 5000
then 1
else 0
end val
from your_table t
) t
) t
group by userid,
form_id,
grp
having count(*) > 1;
See this answer for more explanation:
Group records by consecutive dates when dates are not exactly consecutive
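Here is a minimal Python sketch of the lag-and-running-sum grouping, for illustration only (the function name and sample timestamps are made up). Like the query, it walks one ordered list of timestamps and starts a new group whenever the gap to the previous row exceeds 5000 ms.

```python
def group_by_gap(timestamps_ms, max_gap_ms=5000):
    """Pair each timestamp with a group number that increments after a large gap."""
    groups = []
    grp = 0
    previous = None
    for ts in sorted(timestamps_ms):
        if previous is not None and ts - previous > max_gap_ms:
            grp += 1  # same role as the running sum over the 0/1 flag
        groups.append((ts, grp))
        previous = ts
    return groups


print(group_by_gap([0, 2000, 6000, 20000]))
# [(0, 0), (2000, 0), (6000, 0), (20000, 1)]
```

Note that groups chain: 0 and 6000 end up together even though they are 6 seconds apart, because each is within 5 seconds of its neighbour.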

I would just use exists to get the users:
select userid, form_id
from table_A a
where exists (select 1
from table_A a2
where a2.userid = a.userid and a2.timestamp >= a.timestamp and a2.timestamp < dateadd(second, 5, a.timestamp)
);
If you want a count, you can just add group by and count(*).

How to get value by a range of dates?

I have a table like so
And With this code I get the 5 latest values for each domainId
;WITH grp AS
(
SELECT DomainId, [Date],Passed, DatabasePerformance,ServerPerformance,
rn = ROW_NUMBER() OVER
(PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
FROM grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
WHERE rn < 7 AND t.date != g.[Date]
ORDER BY DomainId, [Date] DESC
What I Want
Well I would like to know how many tickets were sold for each of these 5 latest rows but with the following condition:
Each of these rows come with their own date which differs.
For each date I want to check how many were sold in the last 15 minutes AND how many were sold in the last 30 minutes.
Example:
I get these 5 rows for each domainId
I want to extend the above with two columns, "soldTicketsLast15" and "soldTicketsLast30"
The date column contains all the dates I need, and for each of these dates I want to go back 15 minutes and 30 minutes and get how many tickets were sold.
Example:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -15, '2016-04-12 12:10:28.2270000')
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] >= DATEADD(minute, -30, '2016-04-12 12:10:28.2270000')
How can I accomplish this?

I'd use OUTER APPLY or CROSS APPLY.
;WITH grp AS
(
SELECT
DomainId, [Date], Passed, DatabasePerformance, ServerPerformance,
rn = ROW_NUMBER() OVER (PARTITION BY DomainId ORDER BY [Date] DESC)
FROM dbo.DomainDetailDataHistory H
)
SELECT
g.DomainId, g.[Date],g.Passed, g.ServerPerformance, g.DatabasePerformance
,A15.SoldTicketsLast15
,A30.SoldTicketsLast30
FROM
grp g
INNER JOIN #Latest T ON T.DomainId = g.DomainId
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast15
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -15, g.[Date])
) AS A15
OUTER APPLY
(
SELECT MAX(H.SoldTickets) - MIN(H.SoldTickets) AS SoldTicketsLast30
FROM DomainDetailDataHistory AS H
WHERE
H.DomainId = g.DomainId AND
H.[Date] >= DATEADD(minute, -30, g.[Date])
) AS A30
WHERE
rn < 7
AND T.[date] != g.[Date]
ORDER BY DomainId, [Date] DESC;
To make the correlated APPLY queries efficient there should be an appropriate index, like the following:
CREATE NONCLUSTERED INDEX [IX_DomainId_Date] ON [dbo].[DomainDetailDataHistory]
(
[DomainId] ASC,
[Date] ASC
)
INCLUDE ([SoldTickets])
This index may also help to make the main part of your query (grp) efficient.
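The per-row window lookup can be pictured with a short Python sketch. It is an illustration under assumptions, not the query itself: the function name and sample data are made up, SoldTickets is treated as a running total (which is what MAX minus MIN suggests), and the window is also capped at the row's own timestamp, which is an assumption about the intent.

```python
from datetime import datetime, timedelta


def sold_in_window(history, at, minutes):
    """history: list of (timestamp, cumulative_sold_tickets) for one domain."""
    start = at - timedelta(minutes=minutes)
    # Upper bound at `at` is an assumption of this sketch.
    values = [sold for ts, sold in history if start <= ts <= at]
    return max(values) - min(values) if values else None


history = [
    (datetime(2016, 4, 12, 12, 0), 100),
    (datetime(2016, 4, 12, 12, 5), 110),
    (datetime(2016, 4, 12, 12, 20), 130),
    (datetime(2016, 4, 12, 12, 30), 150),
]
at = datetime(2016, 4, 12, 12, 30)
print(sold_in_window(history, at, 15))  # 20
print(sold_in_window(history, at, 30))  # 50
```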

If I understood your question correctly, you want to get the tickets sold from one of your dates (in the Date column) going back 15 minutes and 30 minutes. Assuming that you are using your DATEADD function correctly, the following should work:
SELECT MAX(SoldTickets) FROM DomainDetailDataHistory
WHERE [Date] BETWEEN [DATE] AND DATEADD(minute, -15, '2016-04-12 12:10:28.2270000') GROUP BY [SoldTickets]
The between operator allows you to retrieve results between two date parameters. In the SQL above, we also need a group by since you are using a GROUPING function (MAX). The group by would depend on what you want to group by but I think in your case it would be SoldTickets.
The SQL above will give you the ones between the date and 15 minutes back. You could do something similar with the 30 minutes back.

SQL Aggregates OVER and PARTITION

All,
This is my first post on Stackoverflow, so go easy...
I am using SQL Server 2008.
I am fairly new to writing SQL queries, and I have a problem that I thought was pretty simple, but I've been fighting for 2 days. I have a set of data that looks like this:
UserId Duration(Seconds) Month
1 45 January
1 90 January
1 50 February
1 42 February
2 80 January
2 110 February
3 45 January
3 62 January
3 56 January
3 60 February
Now, what I want is to write a single query that gives me the average for a particular user and compares it against the average across all users for that month. So the resulting dataset after a query for user #1 would look like this:
UserId Duration(seconds) OrganizationDuration(Seconds) Month
1 67.5 63 January
1 46 65.5 February
I've been batting around different subqueries and group by scenarios and nothing ever seems to work. Lately, I've been trying OVER and PARTITION BY, but with no success there either. My latest query looks like this:
select Userid,
AVG(duration) OVER () as OrgAverage,
AVG(duration) as UserAverage,
DATENAME(mm,MONTH(StartDate)) as Month
from table.name
where YEAR(StartDate)=2014
AND userid=119
GROUP BY MONTH(StartDate), UserId
This query bombs out with a "'Duration' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause" error.
Please keep in mind I'm dealing with a very large amount of data. I think I can make it work with CASE statements, but I'm looking for a cleaner, more efficient way to write the query if possible.
Thank you!

You are joining two queries together here:
Per-User average per month
All Organisation average per month
If you are only going to return data for one user at a time then an inline select may give you joy:
SELECT AVG(a.duration) AS UserAverage,
(SELECT AVG(b.Duration) FROM tbl b WHERE MONTH(b.StartDate) = MONTH(a.StartDate)) AS OrgAverage
...
FROM tbl a
WHERE userid = 119
GROUP BY MONTH(StartDate), UserId
Note - using comparison on MONTH may be slow - you may be better off having a CTE (Common Table Expression)

missing partition clause in Average function
OVER ( Partition by MONTH(StartDate))

Please try this. It works fine for me.
WITH C1
AS
(
SELECT
AVG(Duration) AS TotalAvg,
[Month]
FROM [dbo].[Test]
GROUP BY [Month]
),
C2
AS
(
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY UserID, [Month] ORDER BY UserID) AS DetailedAvg,
[Month]
FROM [dbo].[Test]
)
SELECT C2.*, C1.TotalAvg
FROM C2 c2
INNER JOIN C1 c1 ON c1.[Month] = c2.[Month]
ORDER BY c2.UserID, c2.[Month] desc;

I was able to get it done using a self join. There's probably a better way.
Select UserId, AVG(t1.Duration) as Duration, t2.duration as OrgDur, t1.Month
from #temp t1
inner join (Select Distinct MONTH, AVG(Duration) over (partition by Month) as duration
from #temp) t2 on t2.Month = t1.Month
group by t1.Month, t1.UserId, t2.Duration
order by t1.UserId, Month desc
Here's using a CTE which is probably a better solution and definitely easier to read
With MonthlyAverage
as
(
Select MONTH, AVG(Duration) as OrgDur
from #temp
group by Month
)
Select UserId, AVG(t1.Duration) as Duration, m.OrgDur, t1.Month
from #temp t1
inner join MonthlyAverage m on m.Month = t1.Month
group by UserId, t1.Month, m.OrgDur

You can try below with less code.
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY [Month]) AS TotalAvg,
AVG(Duration) OVER(PARTITION BY UserID, [Month] ORDER BY UserID) AS DetailedAvg,
[Month]
FROM [dbo].[Test]
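As a cross-check on what these queries should return, here is a small Python sketch that computes both averages from the sample data in the question (the function name is made up for illustration). The organisation average is taken over every user's rows for the month, so it has to be computed before filtering down to one user.

```python
from collections import defaultdict

# (UserId, Duration in seconds, Month) rows from the question.
rows = [(1, 45, "January"), (1, 90, "January"), (1, 50, "February"),
        (1, 42, "February"), (2, 80, "January"), (2, 110, "February"),
        (3, 45, "January"), (3, 62, "January"), (3, 56, "January"),
        (3, 60, "February")]


def user_vs_org(rows, user_id):
    """Return {month: (user_average, organisation_average)} for one user."""
    org = defaultdict(list)
    user = defaultdict(list)
    for uid, duration, month in rows:
        org[month].append(duration)       # every user's rows
        if uid == user_id:
            user[month].append(duration)  # only the requested user's rows
    return {month: (sum(vals) / len(vals), sum(org[month]) / len(org[month]))
            for month, vals in user.items()}


print(user_vs_org(rows, 1))
# {'January': (67.5, 63.0), 'February': (46.0, 65.5)}
```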
