General approach when needing to calculate values between rows

General approach when needing to calculate values between rows - sql

For the following table :
Id Timestamp AssignedTo
1 2012-01-01 User1
2 2012-01-02 User2
3 2012-01-10 User3
4 2012-01-15 User1
what would be the general approach in MS SQL Server to calculating how many days/hours/minutes the entity was assigned to a specific user?
As a developer (with basic SQL knowledge), I've always opted for Functions or Stored procedures when doing these kind of queries (these kind = queries including results based on the difference between two rows in a set).
In this example, I'd have a cursor iterating over the items, keeping the previous one in a variable, and then calculating the difference. However, I've been told that this is a serious performance concern, especially on large sets.
EDIT: Sample results
User TotalTime
User1 23d
User2 8d
User3 5d
i.e. the total time that an item was assigned to a specific user. User1's TotalTime is 23 days because it has been assigned to him since 2012-01-15 (and it's 2012-02-06 now), along with the first day it was assigned to him.

In this example I'd use a subquery within the select statement to get the end date for each row, then use this end date to get the time assigned to a specific user. The below is SQL Server syntax but the same principal can be applied whatever the RDBMS.
SELECT AssignedTo,
SUM(DATEDIFF(DAY, TimeStamp, ISNULL(EndDate, GETDATE()))) [Days]
FROM ( SELECT ID,
[TimeStamp],
AssignedTo,
( SELECT MIN(TimeStamp)
FROM [YOURTABLE] b
WHERE b.TimeStamp > a.TimeStamp
) [EndDate]
FROM [YOURTABLE] a
) a
GROUP BY AssignedTo

Related

Get open cases counts for particular user in specific date range

I'm creating a SSRS report and I want to get the open cases for particular user in specific date range like below.
I have table called User from there I'm getting user info(User1,User2,User3).
I have open cases in the table management under description table.
I have c_date column in class table.
And I have 3 parameters user, startdate and enddate
And I need to use c_date between startdate and enddate.
If User enters startdate as 2019-01-01 and enddate as 2019-31-01, then I want to display the User1 who has open count.
For 0-5days and User1 who has open count for 6-11 days and same thing for user2 also.
Expected output:
User 0-5days 6-11days
---- ------- -------
User1 2 1
User2 1 4
User3 5 0
Explanation: User 1 has 2 open cases between 0-5 days means when I enter date range consider 2019-01-01 and 2019-31-01 so I have 2 open cases between first 0-5 days(2019-01-01 and 2019-05-01) and 1 open cases between next 6-11 days(2019-06-01 and 2019-11-01) etc.
Can I get result like this?

You should probably do this in the dataset query if possible. Use CASE and DATEDIFF to group your data something like
SELECT
[User],
[AnyOtherColumns],
CASE
WHEN DATEDIFF(d, #startdate, c_date) BETWEEN 0 AND 5 THEN '0-5'
WHEN DATEDIFF(d, #startdate, c_date) BETWEEN 6 AND 11 THEN '6-11'
ELSE 'older'
END AS [Age]
FROM myTable
WHERE [User] = #user
AND c_date BETWEEN #startdate AND #enddate
(done from memory so may not be perfect)
In your report you can use [User] on your row group, [Age] as your column group and then simply count any of the columns to give you the actual count of records.
You could do the counting in SQL too but I'm not sure if you need the detail for something else.

Considering you have two columns,
My approach would be
Have 3 parameters, one for user and other for To and from date.
Now selecting these parameters, add them to your dataset query as filter
Note you can apply filter on ssrs dataset as well but I would prefer on query level so that you have data been filtered and loaded only req one.
Then you can apply summing and grouping based on user and play around with Ssrs tablix to get the desired results.
https://www.mssqltips.com/sqlservertip/3453/sql-server-reporting-services-reports-with-optional-query-parameters/
https://reportsyouneed.com/ssrs-tip-put-parameters-in-your-query-not-your-filter/

count occurrences for each week using db2

I am looking for some general advice rather than a solution. My problem is that I have a list of dates per person where due to administrative procedures, a person may have multiple records stored for this one instance, yet the date recorded is when the data was entered in as this person is passed through the paper trail. I understand this is quite difficult to explain so I'll give an example:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2000-01-01 B
1 2000-01-02 C
1 2003-04-01 A
1 2003-04-03 A
where I want to know how many valid records a person has by removing annoying audits that have recorded the date as the day the data was entered, rather than the date the person first arrives in the dataset. So for the above person I am only interested in:
Person Date Audit
------ ---- -----
1 2000-01-01 A
1 2003-04-01 A
what makes this problem difficult is that I do not have the luxury of an audit column (the audit column here is just to present how to data is collected). I merely have dates. So one way where I could crudely count real events (and remove repeat audit data) is to look at individual weeks within a persons' history and if a record(s) exists for a given week, add 1 to my counter. This way even though there are multiple records split over a few days, I am only counting the succession of dates as one record (which after all I am counting by date).
So does anyone know of any db2 functions that could help me solve this problem?

If you can live with standard weeks it's pretty simple:
select
person, year(dt), week(dt), min(dt), min(audit)
from
blah
group by
person, year(dt), week(dt)
If you need seven-day ranges starting with the first date you'd need to generate your own week numbers, a calendar of sorts, e.g. like so:
with minmax(mindt, maxdt) as ( -- date range of the "calendar"
select min(dt), max(dt)
from blah
),
cal(dt,i) as ( -- fill the range with every date, count days
select mindt, 0
from minmax
union all
select dt+1 day , i+1
from cal
where dt < (select maxdt from minmax) and i < 100000
)
select
person, year(blah.dt), wk, min(blah.dt), min(audit)
from
(select dt, int(i/7)+1 as wk from cal) t -- generate week numbers
inner join
blah
on t.dt = blah.dt
group by person, year(blah.dt), wk

Running a query over past date ranges

I have a rather interesting problem which I first thought would be straight-forward, but it turned out to be more complicated.
I have data like this:
Date User ID
2012-10-11 a
2012-10-11 b
2012-10-12 c
2012-10-12 d
2012-10-13 e
2012-10-14 b
2012-10-14 e
... ...
Each row has a Date, User ID couple which indicates that that user was active on that day. A user can appear on multiple dates and a date will have multiple users -- just like in the example. I have millions of rows like this which cover a time range of about 90 days.
Here's the question: For each day, I want to get the number of users who have not been active for the past 10 days. For instance, if the user "a" was active on 2012-05-31 and but hasn't been active on any of the days between 06-01 and 06-10, I want to count this user on 6/10. I wouldn't count him again on the following days though unless he becomes active and disappears again.
Can I do this in SQL or would I need to some kind of script to organize the data the way I want. What would be your recommendations? I use Hive.
Thank you so much!

I think you can do this in Hive-compatible SQL. Here is the idea.
For each user/date get the next date for the user.
Discard the original record if the next is less than 10 days after the current one.
Add 10 to the date
Aggregate and count
I am not sure of all the Hive functions for things like date. Here is an example of how to do it:
select date+10, count(*)
from (select t.userid, t.date,
min(case when tnext.date > t.date then tnext.date end) as nextdate
from t left outer join
t tnext
on t.userid = tnext.userid
group by t.userid, t.date
) t
where nextdate is null or nextdate - date >= 10
group by date+10;
Note that the inner subquery would be better written using:
on t.userid = tnext.userid and t2.date > t.date
However, I don't know if Hive supports such a join (it doesn't support non-equijoins and it not clear about whether one or all clauses have to be equal).

how to pick records an hour apart from one another in sql server

i have a set of login data for a user_id with a time stamp.
a user could login multiple times but we need to return records at least an hour apart from one another, starting from the min record. the deduping has to happen at a user level (there can be multiple users)
for eg.
user1 2012-03-07 14:24:30.000
user1 2012-03-07 14:34:30.000
user1 2012-03-07 15:14:30.000
user1 2012-03-07 15:20:30.000
user1 2012-03-07 15:30:30.000
user1 2012-03-08 09:20:30.000
user1 2012-03-08 09:50:30.000
user1 2012-03-08 10:30:30.000
user2 2012-03-07 15:20:30.000
i would only want to see the following records
user1 2012-03-07 14:24:30.000
user1 2012-03-07 15:30:30.000
user1 2012-03-08 09:20:30.000
user1 2012-03-08 10:30:30.000
user2 2012-03-07 15:20:30.000
========================================================================
is there any way to do this in a clean way? we could do this recursively but i was hoping there might be a way to use row_number partition by.
any help is much appreciated!!

In Sql Server 2005 or newer this CTE will return table of LoginAt datetimes removing the ones less than hour apart from already selected LoginAts.
;with SkipHour(UserID, LoginAT, rn) as (
select UserID, min(LoginAt), cast (1 as bigint)
from LogTable
group by UserID
union all
select SkipHour.UserID, LogTable.LoginAt,
row_number() over (partition by SkipHour.UserID
order by Logtable.LoginAt) rn
from SkipHour
inner join LogTable
on LogTable.UserID = SkipHour.UserID
where datediff(minute, SkipHour.LoginAt, LogTable.LoginAt) >= 60
-- Only first rows from previous generation qualify to have children
and rn = 1
)
select *
from SkipHour
where rn = 1
order by UserID, LoginAT
Crucial part is row_number(). As Sql Server does not allow neither aggregate functions nor top predicate, row_number() is the only way (IMO) to order loginAt datetimes and keep only first one.
Sql Fiddle playground is this way.
UPDATE:
Row numbers are applies for each generation individually. Extract from WITH common_table_expression (Transact-SQL):
Analytic and aggregate functions in the recursive part of the CTE are
applied to the set for the current recursion level and not to the set
for the CTE. Functions like ROW_NUMBER operate only on the subset of
data passed to them by the current recursion level and not the entire
set of data pased to the recursive part of the CTE. For more
information, see J. Using analytical functions in a recursive CTE.

How do I analyse time periods between records in SQL data without cursors?

The root problem: I have an application which has been running for several months now. Users have been reporting that it's been slowing down over time (so in May it was quicker than it is now). I need to get some evidence to support or refute this claim. I'm not interested in precise numbers (so I don't need to know that a login took 10 seconds), I'm interested in trends - that something which used to take x seconds now takes of the order of y seconds.
The data I have is an audit table which stores a single row each time the user carries out any activity - it includes a primary key, the user id, a date time stamp and an activity code:
create table AuditData (
AuditRecordID int identity(1,1) not null,
DateTimeStamp datetime not null,
DateOnly datetime null,
UserID nvarchar(10) not null,
ActivityCode int not null)
(Notes: DateOnly (datetime) is the DateTimeStamp with the time stripped off to make group by for daily analysis easier - it's effectively duplicate data to make querying faster).
Also for the sake of ease you can assume that the ID is assigned in date time order, that is 1 will always be before 2 which will always be before 3 - if this isn't true I can make it so).
ActivityCode is an integer identifying the activity which took place, for instance 1 might be user logged in, 2 might be user data returned, 3 might be search results returned and so on.
Sample data for those who like that sort of thing...:
1, 01/01/2009 12:39, 01/01/2009, P123, 1
2, 01/01/2009 12:40, 01/01/2009, P123, 2
3, 01/01/2009 12:47, 01/01/2009, P123, 3
4, 01/01/2009 13:01, 01/01/2009, P123, 3
User data is returned (Activity Code 2) immediate after login (Activity Code 1) so this can be used as a rough benchmark of how long the login takes (as I said, I'm interested in trends so as long as I'm measuring the same thing for May as July it doesn't matter so much if this isn't the whole login process - it takes in enough of it to give a rough idea).
(Note: User data can also be returned under other circumstances so it's not a one to one mapping).
So what I'm looking to do is select the average time between login (say ActivityID 1) and the first instance after that for that user on that day of user data being returned (say ActivityID 2).
I can do this by going through the table with a cursor, getting each login instance and then for that doing a select to say get the minimum user data return following it for that user on that day but that's obviously not optimal and is slow as hell.
My question is (finally) - is there a "proper" SQL way of doing this using self joins or similar without using cursors or some similar procedural approach? I can create views and whatever to my hearts content, it doesn't have to be a single select.
I can hack something together but I'd like to make the analysis I'm doing a standard product function so would like it to be right.

SELECT TheDay, AVG(TimeTaken) AvgTimeTaken
FROM (
SELECT
CONVERT(DATE, logins.DateTimeStamp) TheDay
, DATEDIFF(SS, logins.DateTimeStamp,
(SELECT TOP 1 DateTimeStamp
FROM AuditData userinfo
WHERE UserID=logins.UserID
and userinfo.ActivityCode=2
and userinfo.DateTimeStamp > logins.DateTimeStamp )
)TimeTaken
FROM AuditData logins
WHERE
logins.ActivityCode = 1
) LogInTimes
GROUP BY TheDay
This might be dead slow in real world though.

In Oracle this would be a cinch, because of analytic functions. In this case, LAG() makes it easy to find the matching pairs of activity codes 1 and 2 and also to calculate the trend. As you can see, things got worse on 2nd JAN and improved quite a bit on the 3rd (I'm working in seconds rather than minutes).
SQL> select DateOnly
2 , elapsed_time
3 , elapsed_time - lag (elapsed_time) over (order by DateOnly) as trend
4 from
5 (
6 select DateOnly
7 , avg(databack_time - prior_login_time) as elapsed_time
8 from
9 ( select DateOnly
10 , databack_time
11 , ActivityCode
12 , lag(login_time) over (order by DateOnly,UserID, AuditRecordID, ActivityCode) as prior_login_time
13 from
14 (
15 select a1.AuditRecordID
16 , a1.DateOnly
17 , a1.UserID
18 , a1.ActivityCode
19 , to_number(to_char(a1.DateTimeStamp, 'SSSSS')) as login_time
20 , 0 as databack_time
21 from AuditData a1
22 where a1.ActivityCode = 1
23 union all
24 select a2.AuditRecordID
25 , a2.DateOnly
26 , a2.UserID
27 , a2.ActivityCode
28 , 0 as login_time
29 , to_number(to_char(a2.DateTimeStamp, 'SSSSS')) as databack_time
30 from AuditData a2
31 where a2.ActivityCode = 2
32 )
33 )
34 where ActivityCode = 2
35 group by DateOnly
36 )
37 /
DATEONLY ELAPSED_TIME TREND
--------- ------------ ----------
01-JAN-09 120
02-JAN-09 600 480
03-JAN-09 150 -450
SQL>
Like I said in my comment I guess you're working in MSSQL. I don't know whether that product has any equivalent of LAG().

If the assumptions are that:
Users will perform various tasks in no mandated order, and
That the difference between any two activities reflects the time it takes for the first of those two activities to execute,
Then why not create a table with two timestamps, the first column containing the activity start time, the second column containing the next activity start time. Thus the difference between these two will always be total time of the first activity. So for the logout activity, you would just have NULL for the second column.
So it would be kind of weird and interesting, for each activity (other than logging in and logging out), the time stamp would be recorded in two different rows--once for the last activity (as the time "completed") and again in a new row (as time started). You would end up with a jacob's ladder of sorts, but finding the data you are after would be much more simple.
In fact, to get really wacky, you could have each row have the time that the user started activity A and the activity code, and the time started activity B and the time stamp (which, as mentioned above, gets put down again for the following row). This way each row will tell you the exact difference in time for any two activities.
Otherwise, you're stuck with a query that says something like
SELECT TIME_IN_SEC(row2-timestamp) - TIME_IN_SEC(row1-timestamp)
which would be pretty slow, as you have already suggested. By swallowing the redundancy, you end up just querying the difference between the two columns. You probably would have less need of knowing the user info as well, since you'd know that any row shows both activity codes, thus you can just query the average for all users on any given day and compare it to the next day (unless you are trying to find out which users are having the problem as well).

This is the faster query to find out, in one row you will have current and row before datetime value, after that you can use DATEDIFF ( datepart , startdate , enddate ). I use #DammyVariable and DamyField as i remember the is some problem if is not first #variable=Field in update statement.
SELECT *, Cast(NULL AS DateTime) LastRowDateTime, Cast(NULL As INT) DamyField INTO #T FROM AuditData
GO
CREATE CLUSTERED INDEX IX_T ON #T (AuditRecordID)
GO
DECLARE #LastRowDateTime DateTime
DECLARE #DammyVariable INT
SET #LastRowDateTime = NULL
SET #DammyVariable = 1
UPDATE #T SET
#DammyVariable = DammyField = #DammyVariable
, LastRowDateTime = #LastRowDateTime
, #LastRowDateTime = DateTimeStamp
option (maxdop 1)

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

General approach when needing to calculate values between rows - sql

Related

Get open cases counts for particular user in specific date range

count occurrences for each week using db2

Running a query over past date ranges

how to pick records an hour apart from one another in sql server

How do I analyse time periods between records in SQL data without cursors?

Categories

Resources