How to narrow down count query by a finite time frame? - sql

I have a query where I am identifying more than 1 submission by user for a particular form:
select userid, form_id, count(*)
from table_A
group by userid, form_id
having count(userid) > 1
However, I am trying to see which users are submitting more than 1 form within a 5 second timeframe (We have a field for the submission timestamp in this table). How would I narrow this query down by that criteria?

@nikotromus
You haven't provided many details about your schema and the other columns available, nor about how and where this information will be used.
However, if you want to do it "live", comparing each submission timestamp against the current timestamp, it would look something like:
SELECT userid, form_id, count(*)
FROM table_A
WHERE DATEDIFF(SECOND,YourColumnWithSubmissionTimestamp, getdate()) <= 5
GROUP BY userid, form_id
HAVING count(userid) > 1

One way is to add DATEDIFF(Second, '2017-01-01', SubmittionTimeStamp) / 5 to the group by.
This groups records by userid, form_id and a five-second interval:
select userid, form_id, count(*)
from table_A
group by userid, form_id, datediff(Second, '2017-01-01', SubmittionTimeStamp) / 5
having count(userid) > 1
Read this SO post for a more detailed explanation.
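To make the bucketing concrete, here is a minimal sketch of the same idea in Python rather than SQL. The sample rows and the names bucket and repeated_submissions are made up for illustration, and timestamps are assumed to fall on or after the anchor date. It also shows a limitation of fixed intervals: two submissions that are only 2 seconds apart are missed when they land on opposite sides of a bucket boundary.

```python
from collections import Counter
from datetime import datetime

EPOCH = datetime(2017, 1, 1)  # same arbitrary anchor date as the query

def bucket(ts, width=5):
    """Mimic DATEDIFF(second, EPOCH, ts) / width for timestamps at or after EPOCH."""
    return int((ts - EPOCH).total_seconds()) // width

def repeated_submissions(rows, width=5):
    """rows: iterable of (userid, form_id, timestamp). Keep groups with count > 1."""
    counts = Counter((u, f, bucket(ts, width)) for u, f, ts in rows)
    return {key: n for key, n in counts.items() if n > 1}

# Hypothetical sample data
rows = [
    ("alice", 1, datetime(2017, 1, 1, 0, 0, 1)),
    ("alice", 1, datetime(2017, 1, 1, 0, 0, 3)),  # same bucket as the row above
    ("bob", 1, datetime(2017, 1, 1, 0, 0, 4)),
    ("bob", 1, datetime(2017, 1, 1, 0, 0, 6)),    # 2 s later, but in the next bucket
]
result = repeated_submissions(rows)
print(result)
```

Only alice is reported here. If boundary cases like bob's matter, a gap-based or self-join approach is probably a better fit than fixed buckets.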

You can use lag to form groups of rows that are within 5 seconds of each other and then do aggregation on them:
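select distinct userid,
    form_id
from (
    select t.*,
        sum(val) over (
            partition by t.userid, t.form_id
            order by t.submission_timestamp
        ) as grp
    from (
        select t.*,
            -- val = 1 starts a new group: the previous submission by the same
            -- user for the same form is more than 5 seconds earlier
            case
                when datediff(ms, lag(t.submission_timestamp, 1, t.submission_timestamp) over (
                        partition by t.userid, t.form_id
                        order by t.submission_timestamp
                    ), t.submission_timestamp) > 5000
                    then 1
                else 0
            end val
        from your_table t
    ) t
) t
group by userid,
    form_id,
    grp
having count(*) > 1;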
See this answer for more explanation:
Group records by consecutive dates when dates are not exactly consecutive
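The grouping logic behind the lag approach can be sketched in Python as follows. This is an illustration under assumptions, not the SQL itself: the function name islands and the sample rows are hypothetical, and rows are compared only within the same user and form. A new run starts whenever the gap to the previous submission exceeds 5 seconds.

```python
from datetime import datetime, timedelta
from itertools import groupby

def islands(rows, max_gap=timedelta(seconds=5)):
    """rows: iterable of (userid, form_id, timestamp).
    Returns (userid, form_id, [timestamps]) for every run of two or more
    submissions where each one is within max_gap of the previous one."""
    found = []
    keyed = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for (user, form), group in groupby(keyed, key=lambda r: (r[0], r[1])):
        run = []
        for _, _, ts in group:
            if run and ts - run[-1] > max_gap:  # gap too large: close the current run
                if len(run) > 1:
                    found.append((user, form, run))
                run = []
            run.append(ts)
        if len(run) > 1:
            found.append((user, form, run))
    return found

# Hypothetical sample data
t0 = datetime(2017, 1, 1, 12, 0, 0)
rows = [("alice", 1, t0 + timedelta(seconds=s)) for s in (0, 4, 8, 30)]
rows += [("bob", 1, t0 + timedelta(seconds=s)) for s in (0, 10)]
result = islands(rows)
print(result)
```

Note that runs chain: alice's submissions at 0, 4 and 8 seconds form one run even though the first and last are 8 seconds apart, because each is within 5 seconds of its predecessor.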

I would just use exists to get the users:
select userid, form_id
from table_A a
where exists (select 1
from table_A a2
-- a2 must be strictly later than a, otherwise every row matches itself
where a2.userid = a.userid and a2.timestamp > a.timestamp and a2.timestamp < dateadd(second, 5, a.timestamp)
);
If you want a count, you can just add group by and count(*).

Related

count consecutive number of -1 in a column. count >=14

I'm trying to figure out a query to count "-1" values that have occurred more than 14 times in a row. Can anyone help me here? I tried everything from lead to row number, etc., but nothing is working out.
The BP is recorded every minute, and I need to find the ids whose bp_level was "-1" for more than 14 minutes.
You may try the following:
Select Distinct B.Person_ID, B.[Consecutive]
From
(
Select D.person_ID, COUNT(D.bp_level) Over (Partition By D.grp, D.person_ID Order By D.Time_) [Consecutive]
From
(
Select Time_, Person_ID, bp_level,
DATEADD(Minute, -ROW_NUMBER() Over (Partition By Person_ID Order By Time_), Time_) grp
From mytable Where bp_level = -1
) D
) B
Where B.[Consecutive] >= 14
See a demo from db<>fiddle. Using SQL Server.
DATEADD(Minute, -ROW_NUMBER() Over (Partition By Person_ID Order By Time_), Time_): defines a unique group for each run of consecutive times per person where bp_level = -1.
COUNT(D.bp_level) Over (Partition By D.grp, D.person_ID Order By D.Time_): computes a running count of the rows within each group, in order of increasing time.
Once a value other than -1 appears, the run splits into two groups and the counter restarts for the second group.
NOTE: this solution works only if there are no gaps between consecutive times, that is, the time increases by exactly one minute per row for each person. Otherwise the query will not work as written, but it can be modified to cover gaps.
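The "time minus row number" trick can be illustrated outside SQL. The sketch below is in Python, with a hypothetical function name (longest_runs) and made-up readings; like the query, it assumes exactly one reading per minute per person. Within a run of consecutive minutes, subtracting the row number from the time always yields the same value, so that value can serve as the group key.

```python
from collections import Counter
from datetime import datetime, timedelta

def longest_runs(readings, target=-1):
    """readings: iterable of (person_id, time, bp_level), one row per minute.
    Returns {person_id: longest run of consecutive minutes at target}."""
    by_person = {}
    for person, t, level in readings:
        if level == target:
            by_person.setdefault(person, []).append(t)
    best = {}
    for person, times in by_person.items():
        times.sort()
        # time minus row number is constant inside a run of consecutive minutes
        groups = Counter(t - timedelta(minutes=n) for n, t in enumerate(times, start=1))
        best[person] = max(groups.values())
    return best

# Hypothetical sample data
start = datetime(2022, 1, 1, 8, 0)
# person 1: 20 consecutive minutes at -1
readings = [(1, start + timedelta(minutes=i), -1) for i in range(20)]
# person 2: 10 minutes at -1, one normal reading, then 10 more minutes at -1
readings += [(2, start + timedelta(minutes=i), 120 if i == 10 else -1) for i in range(21)]

runs = longest_runs(readings)
flagged = sorted(p for p, n in runs.items() if n >= 14)
print(runs, flagged)
```

Person 2 has 20 readings of -1 in total but is not flagged, because the single normal reading splits them into two runs of 10.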
with data as (
select *,
count(case when bp_level = 1 then 1 end) over
(partition by person_id order by time) as grp
from T
)
select distinct person_id
from data
where bp_level = -1
group by person_id, grp
having count(*) > 14; /* = or >= ? */
If you want to rely on timestamps rather than a count of rows then you could use the time difference:
...
-- where 1 = 1 /* all rows */
group by person_id, grp
having datediff(minute, min(time), max(time)) > 14;
The accepted answer would have issues with scenarios where there are multiple rows with the same timestamp if there's any potential for that to happen.
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=2ad6a1b515bb4091efba9b8831e5d579

Select latest 30 dates for each unique ID

This is a sample data file
The data contains unique IDs with different latitudes and longitudes at multiple timestamps. I would like to select the rows for the latest 30 days of coordinates for each unique ID. Please help me with how to write the query. This data is in a Hive table.
Regards,
Akshay
According to your example above (where there are no current-year dates for id=2,3), you can number the dates for each id (ordered by date descending) using the window function ROW_NUMBER(). Then just take the latest 30 values:
--get all rows for each id where num<=30 (the latest 30 dates for each id)
select * from
(
--numbering each date for each id order by descending
select *, row_number()over(partition by ID order by DATE desc)num from Table
)X
where num<=30
If you need to get only unique dates (without consider time) for each id, then can try this query:
select * from
(
--numbering date for each id
select *, row_number()over(partition by ID order by new_date desc)num
from
(
-- remove duplicates using distinct
select distinct ID,cast(DATE as date)new_date from Table
)X
)Y
where num<=30
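As a rough illustration of what the distinct-dates query computes, here is the same "latest N distinct dates per id" logic in Python. The function name latest_n_dates and the sample rows are hypothetical, and n is set to 2 in the example only to keep the output short.

```python
from collections import defaultdict
from datetime import date

def latest_n_dates(rows, n=30):
    """rows: iterable of (id, date).
    Returns {id: the n most recent distinct dates, newest first}."""
    dates_by_id = defaultdict(set)  # the set plays the role of SELECT DISTINCT
    for row_id, d in rows:
        dates_by_id[row_id].add(d)
    # sorting descending and slicing corresponds to row_number() ... desc, num <= n
    return {row_id: sorted(ds, reverse=True)[:n] for row_id, ds in dates_by_id.items()}

# Hypothetical sample data
rows = [
    (1, date(2018, 1, 3)),
    (1, date(2018, 1, 3)),  # duplicate date for the same id
    (1, date(2018, 1, 2)),
    (1, date(2018, 1, 1)),
    (2, date(2017, 6, 1)),
]
result = latest_n_dates(rows, n=2)
print(result)
```

An id with fewer than n dates, such as id 2 here, simply returns everything it has, which matches how the row_number() filter behaves.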
In Oracle this will be:
SELECT * FROM TEST_DATE1
WHERE DATEUPDT > SYSDATE - 30;
select * from MyTable
where
[Date]>=dateadd(d, -30, getdate());
To group by ID and perform aggregation, something like this
select ID,
count(*) row_count,
max(Latitude) max_lat,
max(Longitude) max_long
from MyTable
where
[Date]>=dateadd(d, -30, getdate())
group by ID;

SQL Server: Attempting to output a count with a date

I am trying to write a statement and just a bit puzzled what is the best way to put it together. So I am doing a UNION on a number of tables and then from there I want to produce as the output a count for the UserID within that day.
So I will have numerous tables union such as:
Order ID, USERID, DATE, Task Completed.
UNION
Order ID, USERID, DATE, Task Completed
etc
Above is the layout of the tables: 4 tables with the same column names will be unioned together.
The output I want from the statement is then a count for each USERID of the rows that occurred within the last 24 hours.
So output should be:
USERID--- COUNT OUTPUT-- DATE
I was attempting a WHERE clause, but I think the output is not exactly what I am after. Can anyone point me in the right direction, and is there an alternative to the union? Maybe a join would be a better alternative. Any help would be appreciated.
I will eventually then put this into a SSRS report, so it gets updated daily.
You can try this:
select USERID, count(*) as [COUNT], cast(DATE as date) as [DATE]
from
(select USERID, DATE From SomeTable1
union all
select USERID, DATE From SomeTable2
....
) t
where DATE <= GETDATE() AND DATE >= DATEADD(hh, -24, GETDATE())
group by USERID, cast(DATE as date)
First, you should use union all rather than union. Second, you need to aggregate and use count distinct to get what you want:
So, the query you want is something like:
select count(distinct userid)
from ((select date, userid
from table1
where date >= '2015-05-26'
) union all
(select date, userid
from table2
where date >= '2015-05-26'
) union all
(select date, userid
from table3
where date >= '2015-05-26'
)
) du
Note that this hardcodes the date. In SQL Server, you would do something like:
date >= cast(getdate() - 1 as date)
And in MySQL
date >= date_sub(curdate(), interval 1 day)
EDIT:
I read the question as wanting a single day. It is easy enough to extend to all days:
select cast(date as date) as dte, count(distinct userid)
from ((select date, userid
from table1
) union all
(select date, userid
from table2
) union all
(select date, userid
from table3
)
) du
group by cast(date as date)
order by dte;
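The combination of union all and count(distinct ...) per day can be sketched in Python like this. The function name distinct_users_per_day and the two small tables are made up for illustration. The point is that every row from every table is kept first, and duplicates are only removed per day when counting users.

```python
from collections import defaultdict
from datetime import datetime
from itertools import chain

def distinct_users_per_day(*tables):
    """Each table is an iterable of (timestamp, userid).
    Returns {date: number of distinct users seen that day}."""
    users_by_day = defaultdict(set)
    for ts, userid in chain(*tables):        # chain() keeps every row, like UNION ALL
        users_by_day[ts.date()].add(userid)  # the set de-duplicates, like COUNT(DISTINCT ...)
    return {day: len(users) for day, users in sorted(users_by_day.items())}

# Hypothetical sample data
table1 = [(datetime(2015, 5, 26, 9, 0), 1), (datetime(2015, 5, 26, 10, 0), 1)]
table2 = [(datetime(2015, 5, 26, 11, 0), 2), (datetime(2015, 5, 27, 8, 0), 1)]
result = distinct_users_per_day(table1, table2)
print(result)
```

User 1 appears twice on 2015-05-26 but is counted once for that day.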
For even more readability, you could use a CTE:
;WITH cte_CTEName AS(
SELECT UserID, Date, [Task Completed] FROM Table1
UNION
SELECT UserID, Date, [Task Completed] FROM Table2
etc
)
SELECT COUNT(UserID) AS [Count] FROM cte_CTEName
WHERE Date <= GETDATE() AND Date >= DATEADD(hh, -24, GETDATE())
I think this is what you are trying to achieve...
Select
UserID,
Date,
Count(1)
from
(Select *
from table1
Union All
Select *
from table2
Union All
Select *
from table3
Union All
Select *
from table4
) a
Group by
Userid,
Date

SQL grouping user count by Mondays

Given a Users table like so:
Users: id, created_at
How can I get the # of users created grouped by day? My goal is to see the number of users created this Monday versus previous Mondays.
If created_at is of type timestamp, the simplest and fastest way is a plain cast to date:
SELECT created_at::date AS day, count(*) AS ct
FROM users
GROUP BY 1;
Since I am assuming that id cannot be NULL, count(*) is a tiny bit shorter and faster than count(id), while doing the same.
If you just want to see days since "last Monday":
SELECT created_at::date, count(*) AS ct
FROM users
WHERE created_at >= (now()::date - (EXTRACT(ISODOW FROM now())::int + 6))
GROUP BY 1
ORDER BY 1;
This is carefully drafted to use a sargable condition, so it can use a simple index on created_at if present.
Consider the manual for EXTRACT.
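The date arithmetic in the WHERE clause above can be checked with a small Python sketch. The function name previous_week_monday is hypothetical. Python's isoweekday() uses the same numbering as ISODOW (Monday = 1 through Sunday = 7), so subtracting that value plus 6 days always lands on the Monday of the previous ISO week.

```python
from datetime import date, timedelta

def previous_week_monday(today):
    """Mirror of now()::date - (EXTRACT(ISODOW FROM now())::int + 6)."""
    return today - timedelta(days=today.isoweekday() + 6)

# 2024-01-08 is a Monday, 2024-01-10 a Wednesday, 2024-01-14 a Sunday
cutoffs = [previous_week_monday(d) for d in (date(2024, 1, 8), date(2024, 1, 10), date(2024, 1, 14))]
print(cutoffs)
```

All three dates fall in the same ISO week, so they share the same cutoff: Monday 2024-01-01. The query therefore covers the previous week and the current week so far.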
SELECT COUNT(id) AS cnt, EXTRACT(DOW FROM created_at) AS dow
FROM Users
GROUP BY EXTRACT(DOW FROM created_at)
If you want to see the days, use to_char(<date>, 'Day').
So, one way to do what you want:
select date_trunc('day', created_at), count(*)
from users u
where to_char(created_at, 'Dy') = 'Mon'
group by date_trunc('day', created_at)
order by 1 desc;
Perhaps a more general way to look at it would be to summarize the results by day of the week, for this week and last week. Something like:
select to_char(created_at, 'Day'),
sum(case when created_at >= current_date - 6 then 1 else 0 end) as ThisWeek,
sum(case when date_trunc('day', created_at) between current_date - 13 and current_date - 7 then 1 else 0 end) as LastWeek
from users u
group by to_char(created_at, 'Day')
I am from a T-SQL background and I would do something like this
CREATE TABLE #users
(id int,
created_at datetime
)
INSERT INTO #users
(id, created_at)
VALUES
(
1, getdate()
)
INSERT INTO #users
(id, created_at)
VALUES
(
1, getdate()
)
INSERT INTO #users
(id, created_at)
VALUES
(
1, dateadd(DAY, 1,getdate())
)
SELECT id, created_at, count(id) FROM #users
GROUP BY id, created_at
DROP TABLE #users
You will get better results if you only group by day part and not the entire datetime value.
Coming to second part - only comparing for Mondays; you can use something like
select datename(dw,getdate())
the above will give you the name of the weekday which you can compare against a string literal 'Monday'.

SQL to determine distinct periods of sequential days of access?

Jeff recently asked this question and got some great answers.
Jeff's problem revolved around finding the users that have had (n) consecutive days where they have logged into a system. Using a database table structure as follows:
Id UserId CreationDate
------ ------ ------------
750997 12 2009-07-07 18:42:20.723
750998 15 2009-07-07 18:42:20.927
751000 19 2009-07-07 18:42:22.283
Read the original question first for clarity and then...
I was intrigued by the problem of determining how many distinct (n)-day periods a user has.
Could one craft a speedy SQL query that could return a list of users and the number of distinct (n)-day periods they have?
EDIT: as per a comment below, suppose someone has 2 consecutive days, then a gap, then 4 consecutive days, then a gap, then 8 consecutive days. That would be 3 "distinct 4 day periods". The 8 day period should count as two back-to-back 4 day periods.
My answer appears to have not appeared...
I'll try again...
Rob Farley's answer to the original question has the handy benefit of including the number of consecutive days.
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
Using integer division, simply dividing the consecutive number of days gives the number of "distinct (n)-day periods" covered by the whole consecutive period...
- 2 / 4 = 0
- 4 / 4 = 1
- 8 / 4 = 2
- 9 / 4 = 2
- etc, etc
So here is my take on Rob's answer for your needs...
(I really LOVE Rob's answer, go read the explanation, it's inspired thinking!)
with
numberedrows (
UserID,
TheOffset
)
as
(
select
UserID,
row_number() over (partition by UserID order by CreationDate)
- DATEDIFF(DAY, 0, CreationDate) as TheOffset
from
tablename
),
ConsecutiveCounts(
UserID,
ConsecutiveDays
)
as
(
select
UserID,
count(*) as ConsecutiveDays
from
numberedrows
group by
UserID,
TheOffset
)
select
UserID,
SUM(ConsecutiveDays / @period_length) AS distinct_n_day_periods
from
ConsecutiveCounts
group by
UserID
The only real difference is that I take Rob's results and then run it through another GROUP BY...
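For readers who want to see the arithmetic end to end, here is a minimal Python sketch of the same two steps: find runs of consecutive days, then integer-divide each run length by the period length and sum per user. The function name distinct_periods and the sample dates are made up. Unlike the SQL, this sketch collapses several rows on the same day into one before numbering them.

```python
from collections import Counter
from datetime import date, timedelta

def distinct_periods(rows, period_length):
    """rows: iterable of (userid, date).
    Returns {userid: number of distinct period_length-day periods}."""
    days_by_user = {}
    for user, d in rows:
        days_by_user.setdefault(user, set()).add(d)  # one entry per user per day
    totals = {}
    for user, days in days_by_user.items():
        # row number minus day number is constant within a run of consecutive days
        runs = Counter(n - d.toordinal() for n, d in enumerate(sorted(days), start=1))
        totals[user] = sum(length // period_length for length in runs.values())
    return totals

# The example from the question: runs of 2, 4 and 8 days separated by gaps
start = date(2009, 7, 1)
offsets = [0, 1] + list(range(3, 7)) + list(range(8, 16))
rows = [(12, start + timedelta(days=o)) for o in offsets]
result = distinct_periods(rows, period_length=4)
print(result)
```

The runs of 2, 4 and 8 days contribute 0, 1 and 2 periods respectively, giving the 3 "distinct 4 day periods" described in the question.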
So - I'm going to start with my query from the last question, which listed each run of consecutive days. Then I'm going to group that by userid and NumConsecutiveDays, to count how many runs of days there are for those users.
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
,
runsOfDays as
(
-- FirstDay and LastDay are aliases added here because every CTE column needs a name
select min(CreationDate) as FirstDay, max(CreationDate) as LastDay, count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
)
select UserID, NumConsecutiveDays, count(*) as NumOfRuns
from runsOfDays
group by UserID, NumConsecutiveDays
;
And of course, if you want to filter this to only consider runs of a certain length, then put "where NumConsecutiveDays >= @days" in the last query.
Now, if you want to count a run of 16 days as three 5-day runs, then each run will count as NumConsecutiveDays / @runlength of these (integer division rounds each result down). So now instead of just counting how many there are of each, use SUM instead. You could use the query above and use SUM(NumOfRuns * (NumConsecutiveDays / @runlength)), but if you understand the logic, then the query below is a bit easier.
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
,
runsOfDays as
(
-- FirstDay and LastDay are aliases added here because every CTE column needs a name
select min(CreationDate) as FirstDay, max(CreationDate) as LastDay, count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
)
select UserID, sum(NumConsecutiveDays / @runlength) as NumOfRuns
from runsOfDays
where NumConsecutiveDays >= @runlength
group by UserID
;
Hope this helps,
Rob
This works quite nicely with the test data I have.
DECLARE @days int
SET @days = 30
SELECT DISTINCT l.UserId, (datediff(d,l.CreationDate, -- Get last date in contiguous range
(
SELECT min(a.CreationDate ) as CreationDate
FROM UserHistory a
LEFT OUTER JOIN UserHistory b
ON a.CreationDate = dateadd(day, -1, b.CreationDate ) AND
a.UserId = b.UserId
WHERE b.CreationDate IS NULL AND
a.CreationDate >= l.CreationDate AND
a.UserId = l.UserId
) )+1)/@days as cnt
INTO #cnttmp
FROM UserHistory l
LEFT OUTER JOIN UserHistory r
ON r.CreationDate = dateadd(day, -1, l.CreationDate ) AND
r.UserId = l.UserId
WHERE r.CreationDate IS NULL
ORDER BY l.UserId
SELECT UserId, sum(cnt)
FROM #cnttmp
GROUP BY UserId
HAVING sum(cnt) > 0