Group By n Minutes or Hours

I need to query a table and group records by a user-defined time period that could be any integer number of minutes or hours. One assumption we'll make is that any chosen time period starts at 12:00 AM (if that makes any sense). In other words, if the user chooses to group records by 15 minutes, we will not allow them to, say, begin grouping every 15 minutes starting at 12:07 AM. We'll automatically assume/use 12:00 AM as the starting point for grouping. The same goes for any other time period.
Do I need to create my own function for this? I'm not overly concerned about performance as I will be using other methods/limitations to try to keep performance issues at bay.
My table looks like this:
timeentry
--entryid (autonumber)
--begindatetime (datetime)
--enddatetime (datetime)
If I use a function I don't think this matters but I do plan to base my groupings on begindatetime and ignore enddatetime.
I'm using MS Access but I'd like my solution to be compatible with SQL Server and MySQL if possible. However, my primary focus for the moment is just MS Access.

Seems to me the Partition() function could be useful here.
Your code would create a SELECT statement based on the user's choices for date (I assumed you want to limit the query to begindatetime values for a single date), time units, and grouping interval.
This one would be for Jun 14, 2011 as date, minutes as time units, and 15 minutes as the interval.
SELECT
Partition(elapsed,0,1440,15) AS time_block,
q.id,
q.begindatetime
FROM
[SELECT
t.id,
t.begindatetime,
TimeValue(t.begindatetime) * 1440 AS elapsed
FROM tblHK1 AS t
WHERE
t.begindatetime>=#2011-06-14#
And t.begindatetime<#2011-06-15#
]. AS q
ORDER BY q.begindatetime;
Not sure how much you'll like this, though. Here's some sample output:
time_block    id  begindatetime
  60:   74     1  6/14/2011 1:06:05 AM
 555:  569     3  6/14/2011 9:15:00 AM
1395: 1409     4  6/14/2011 11:15:00 PM
The time_block column isn't very user friendly.
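If you want a friendlier label, one variation (an untested sketch, using the same tblHK1 sample table) is to compute each block's starting minute yourself with integer division and convert it back to a time of day with TimeSerial(). Access SQL has no comment syntax, so note here that TimeSerial(0, m, 0) happily normalizes a minute count such as 75 into 1:15:00 AM:
SELECT
TimeSerial(0, (q.elapsed \ 15) * 15, 0) AS block_start,
q.id,
q.begindatetime
FROM
[SELECT
t.id,
t.begindatetime,
Int(TimeValue(t.begindatetime) * 1440) AS elapsed
FROM tblHK1 AS t
WHERE
t.begindatetime>=#2011-06-14#
And t.begindatetime<#2011-06-15#
]. AS q
ORDER BY q.begindatetime;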

I am not quite sure what you want, but here is one idea:
SELECT DateDiff("n",CDate("00:00"),[BeginDateTime])\15 AS No15s,
(DateDiff("n",CDate("00:00"),[BeginDateTime])\15)*15 AS NoMins,
Count(Table1.BeginDateTime) AS [Count]
FROM Table1
GROUP BY DateDiff("n",CDate("00:00"),[BeginDateTime])\15,
(DateDiff("n",CDate("00:00"),[BeginDateTime])\15)*15;

Related

My basic query is taking a long time

I use MSSQL. I have a "jobs" table which has 140 columns and more than 4 million records in it. The table's columns are mostly varchar and bit.
Forty of the table's columns are connected to other tables, like "issuerid" from the "issuers" table, "fileid" from "files"...
The table's only index is on "fileid", and it is non-unique and non-clustered.
My basic query is like in the following:
select issuerid, count(id) as total,
       sum(case when X_Status=1 then 1 else 0 end) P_Count
from jobs
where 1=1 and issuerid='1001'
  and creationdate between '01/01/2019 12:00:01 AM' and '06/30/2019 11:59:59 PM'
group by issuerid
The duration of the query is 1 min 20 seconds (the PC has an SSD and 4 GB of RAM).
So I tried an index on issuerid, but it didn't help much.
I have a lot of queries on this table for my ASP page. For example, it is mostly the SUM CASE expression that changes;
sum(case when Y_Status=1 then 1 else 0 end) P_Count
Like this.
I even tried leaving only 2 columns in the table and executed this query:
select count(id) as total, sum(case when X_Status=1 then 1 else 0 end) P_Count from newjobs where 1=1
and this took around 30 seconds.
I read many topics and articles about improving query performance, but nothing worked. Does anyone have any ideas to share?
Thank you.
The following should work for your exact query:
CREATE NONCLUSTERED INDEX IX_Jobs__IssuerID_CreationDate ON dbo.Jobs (IssuerID, CreationDate)
INCLUDE (X_Status);
Since your query filters on IssuerID and CreationDate, these are the key columns; I have then added X_Status as a non-key column so that the whole query can be answered from this index alone and there is no chance of a bookmark lookup or an index scan.
As an aside, your current WHERE clause will always exclude things that happen in the first second of the first day and the last second of the last day (i.e. between 00:00:00 and 00:00:01 on 1st January, and between 06/30/2019 23:59:59 and 07/01/2019 00:00:00). This may be deliberate, but I suspect it isn't. It is usually much better, and also clearer as to your intentions, to use an open-ended date range:
WHERE CreationDate > '20190101'
AND CreationDate < '20190701'
Or, more likely:
WHERE CreationDate >= '20190101'
AND CreationDate < '20190701'
I have also switched to a culture-invariant datetime format, so that the date literal is interpreted as the same date on every machine. For more reading see:
What do BETWEEN and the devil have in common?
Bad habits to kick : mis-handling date / range queries
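Putting both suggestions together, the reworked query would look something like the sketch below. One caveat (an assumption on my part): COUNT(id) needs id to be reachable from the index, which it is if id is the clustered primary key; otherwise add it to the INCLUDE list or use COUNT(*).
SELECT issuerid,
       COUNT(id) AS total,
       SUM(CASE WHEN X_Status = 1 THEN 1 ELSE 0 END) AS P_Count
FROM jobs
WHERE issuerid = '1001'
  AND creationdate >= '20190101'  -- culture-invariant literals, open-ended range
  AND creationdate < '20190701'
GROUP BY issuerid;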

Wrong records returned when using a datetime parameter in an MS Access query

I am working with an MS Access 2007 DB.
I am trying to write a query on the datetime field. I want to get records between 14 December and 16 December, so I wrote the below query:
SELECT * FROM Expense WHERE CreatedDate > #14-Dec-15# and CreatedDate < #16-Dec-15#
(I have to use the two dates for the query.)
But it is returning records whose CreatedDate is 14 December...
What's wrong with the query?
As @vkp mentions in the comments, a date also has a time part; if it is not defined, it defaults to midnight (00:00:00). As 14-Dec-2015 6:46:56 is after 14-Dec-2015 00:00:00, it is included in the result set. You can use >= #15-Dec-15# to get around this, as it will also include records from 15-Dec-2015. The same goes for the end date.
It seems you want only records from Dec 15th regardless of the time of day stored in CreatedDate. If so, this query should give you what you want with excellent performance assuming an index on CreatedDate ...
SELECT *
FROM Expense
WHERE CreatedDate >= #2015-12-15# and CreatedDate < #2015-12-16#;
Beware of applying functions to your target field in the WHERE criterion ... such as CDATE(INT(CreatedDate)). Although logically correct, it would force a full table scan. That might not be a problem if your Expense table contains only a few rows. But for a huge table, you really should try to avoid a full table scan.
You must include the time in your thinking:
EDIT: I wrote this with the misunderstanding that you wanted to
include data rows from the 14th to the 16th of Dec (three full days).
If you'd write <#17-Dec-15#, it would include the full 16th. Otherwise you'd have to write <=#16-Dec-15 23:59:59#.
A DateTime on the 16th of December with a time part of, let's say, 12:30 is greater than #16-Dec-15#...
Just some background: in MS Access a DateTime is stored as a day number plus a fraction part for the time. 0.5 is midday, 0.25 is 6 in the morning...
Comparing DateTime values really means comparing Double values.
Just add one day to your end date and exclude this:
SELECT * FROM Expense WHERE CreatedDate >= #2015/12/14# AND CreatedDate < #2015/12/17#
Thanks a lot guys for your help...
I finally ended up with the solution given by Darren Bartrup-Cook and Gustav...
My previous query was....
SELECT * FROM Expense WHERE CreatedDate > #14-Dec-15# and CreatedDate < #16-Dec-15#
And the New working query is...
SELECT * FROM Expense WHERE CDATE(INT(CreatedDate)) > #14-Dec-15# and CDATE(INT(CreatedDate)) < #16-Dec-15#

SQL Query data issues

I have the following data:
ID     Date                 interval  interval_date            tot_activity  non_activity
22190  2011-09-27 00:00:00  1000      2011-09-27 10:00:00.000  265           15
I have another table with this data:
Date                   ID          Start                 End                   sched_non_activity  non_activity
10/3/2011 12:00:00 AM  HBLV-22267  10/3/2011 2:02:00 PM  10/3/2011 2:11:00 PM  540
Now, I would like the second table's non_activity field to hold the value from the first table. Specifically, I need to capture tot_activity - non_activity for the rows of the first table whose intervals (in 15-minute increments) fall within the Start and End of the second table.
I have the following so far:
SELECT t1.ID, t1.Date, t1.interval, t1.interval_date, t1.tot_activity, t1.non_activity,
       t1.tot_activity - t1.non_activity AS activity_diff
FROM table1 AS t1
INNER JOIN LIST AS L ON t1.ID = L.ID
INNER JOIN table2 AS t2 ON t1.Date = t2.Date AND L.ID = Right(t2.ID, 5)
WHERE t1.interval_date >= t2.Start AND t1.interval_date < t2.[End]
ORDER BY t1.ID, t1.interval_date
With this, I can already see that I will miss rows when a Start in table 2 is at 15:50, because I would need to capture the 15:45 interval.
Is there any way of doing this through queries, or should I be using variables and doing the check per interval? Any help at all would be greatly appreciated.
I think you are asking too much from a single query here.
What I would do is treat the two tables as lists ordered by timestamps and solve the problem programmatically (i.e. not with a single query).
For example, create a function that traverses the first table in 15-minute increments and finds the best match in the second table (I am guessing this is what you are trying to do). Implement your function to return the same result set as your query above, or store it in a temporary table and select from that. T-SQL is your friend :)
I'm having a tough time understanding your issue, but you might have better luck with the DATEDIFF function:
DATEDIFF(SECOND, t2.Start, t1.interval_date) >= 0 AND DATEDIFF(SECOND, t1.interval_date, t2.[End]) >= 0
I apologize if I'm not catching your drift. If I'm missing something, could you try to clarify a little bit?
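One way to address the 15:50 vs. 15:45 concern from the question (a sketch in SQL Server syntax, untested) is to floor Start down to its 15-minute boundary with the DATEADD/DATEDIFF trick before comparing:
SELECT t1.ID, t1.interval_date, t1.tot_activity - t1.non_activity AS activity_diff
FROM table1 AS t1
INNER JOIN table2 AS t2
        ON t1.Date = t2.Date
-- floor t2.Start to the 15-minute boundary at or before it (15:50 -> 15:45)
WHERE t1.interval_date >= DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, t2.Start) / 15) * 15, 0)
  AND t1.interval_date < t2.[End];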

How to get data which expires within 45 days?

Hi all,
I have one SQL table, and the fields for that table are:
id
name
expireydate
Now I want only those records which expire within 45 days or 30 days.
How can I do this with a SQL query?
I don't have much experience with SQL.
Thanks in advance.
If you are using MySQL, then try DATEDIFF.
For 45 days:
select * from `table` where DATEDIFF(now(),expireydate)<=45;
For 30 days:
select * from `table` where DATEDIFF(now(),expireydate)<=30;
In Oracle, - (the minus operator) will do the trick instead of DATEDIFF, and SYSDATE instead of NOW(). [not sure]
In SQL Server, DATEDIFF is quite different: you have to provide the unit in which the difference between the two dates should be taken.
DATEDIFF(datepart, startdate, enddate)
To get the current date, try one of these: CURRENT_TIMESTAMP, GETDATE(), or {fn NOW()}.
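So a SQL Server equivalent of the MySQL queries above would be something like this (a sketch, assuming the same column name):
-- SQL Server: the unit comes first; GETDATE() replaces NOW()
SELECT * FROM [table] WHERE DATEDIFF(DAY, expireydate, GETDATE()) <= 45;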
You can use a simple SELECT * FROM yourtable WHERE expireydate < "some formula calculating today + 45 or 30 days".
A simple comparison will work there; the tricky part is writing that last bit computing the date you want to compare against. It will depend on your environment and how you stored expireydate in the database.
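For example, the "some formula" part might look like one of these (hedged sketches; yourtable/expireydate as above):
-- SQL Server
SELECT * FROM yourtable WHERE expireydate < DATEADD(DAY, 45, GETDATE());
-- MySQL
SELECT * FROM yourtable WHERE expireydate < DATE_ADD(NOW(), INTERVAL 45 DAY);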
Try the below:
SELECT * FROM MYTABLE WHERE (expireydate in days) < ((CURRENTDATE in days) + 45)
Do not execute it directly! Depending on your database, the way of obtaining a date in days will be different. Consult your database manual, or please specify which database you are using.

How do I analyse time periods between records in SQL data without cursors?

The root problem: I have an application which has been running for several months now. Users have been reporting that it's been slowing down over time (so in May it was quicker than it is now). I need to get some evidence to support or refute this claim. I'm not interested in precise numbers (so I don't need to know that a login took 10 seconds), I'm interested in trends - that something which used to take x seconds now takes of the order of y seconds.
The data I have is an audit table which stores a single row each time the user carries out any activity - it includes a primary key, the user id, a date time stamp and an activity code:
create table AuditData (
    AuditRecordID int identity(1,1) not null,
    DateTimeStamp datetime not null,
    DateOnly datetime null,
    UserID nvarchar(10) not null,
    ActivityCode int not null)
(Note: DateOnly (datetime) is the DateTimeStamp with the time stripped off, to make grouping by day easier - it's effectively duplicate data to make querying faster.)
(Also, for the sake of ease, you can assume that the ID is assigned in datetime order - that is, 1 will always be before 2, which will always be before 3; if this isn't true I can make it so.)
ActivityCode is an integer identifying the activity which took place, for instance 1 might be user logged in, 2 might be user data returned, 3 might be search results returned and so on.
Sample data for those who like that sort of thing...:
1, 01/01/2009 12:39, 01/01/2009, P123, 1
2, 01/01/2009 12:40, 01/01/2009, P123, 2
3, 01/01/2009 12:47, 01/01/2009, P123, 3
4, 01/01/2009 13:01, 01/01/2009, P123, 3
User data is returned (Activity Code 2) immediately after login (Activity Code 1), so this can be used as a rough benchmark of how long the login takes (as I said, I'm interested in trends, so as long as I'm measuring the same thing for May as for July it doesn't matter so much if this isn't the whole login process - it takes in enough of it to give a rough idea).
(Note: User data can also be returned under other circumstances so it's not a one to one mapping).
So what I'm looking to do is select the average time between login (say ActivityCode 1) and the first instance after that, for that user on that day, of user data being returned (say ActivityCode 2).
I can do this by going through the table with a cursor, getting each login instance and then for that doing a select to say get the minimum user data return following it for that user on that day but that's obviously not optimal and is slow as hell.
My question is (finally) - is there a "proper" SQL way of doing this using self joins or similar without using cursors or some similar procedural approach? I can create views and whatever to my hearts content, it doesn't have to be a single select.
I can hack something together but I'd like to make the analysis I'm doing a standard product function so would like it to be right.
SELECT TheDay, AVG(TimeTaken) AvgTimeTaken
FROM (
    SELECT
        CONVERT(DATE, logins.DateTimeStamp) TheDay
        -- seconds from each login to that user's next "user data returned" row
        , DATEDIFF(SS, logins.DateTimeStamp,
            (SELECT TOP 1 DateTimeStamp
             FROM AuditData userinfo
             WHERE userinfo.UserID = logins.UserID
               AND userinfo.ActivityCode = 2
               AND userinfo.DateTimeStamp > logins.DateTimeStamp
             ORDER BY userinfo.DateTimeStamp)  -- make TOP 1 deterministic
          ) TimeTaken
    FROM AuditData logins
    WHERE logins.ActivityCode = 1
) LogInTimes
GROUP BY TheDay
This might be dead slow in real world though.
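If it does prove slow, an index tailored to the correlated subquery might help (an assumption, not tested against real data):
CREATE NONCLUSTERED INDEX IX_AuditData__User_Code_Time
    ON AuditData (UserID, ActivityCode, DateTimeStamp);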
In Oracle this would be a cinch, because of analytic functions. In this case, LAG() makes it easy to find the matching pairs of activity codes 1 and 2 and also to calculate the trend. As you can see, things got worse on 2nd JAN and improved quite a bit on the 3rd (I'm working in seconds rather than minutes).
select DateOnly
     , elapsed_time
     , elapsed_time - lag(elapsed_time) over (order by DateOnly) as trend
from (
    select DateOnly
         , avg(databack_time - prior_login_time) as elapsed_time
    from (
        select DateOnly
             , databack_time
             , ActivityCode
             , lag(login_time) over (order by DateOnly, UserID, AuditRecordID, ActivityCode) as prior_login_time
        from (
            select a1.AuditRecordID
                 , a1.DateOnly
                 , a1.UserID
                 , a1.ActivityCode
                 , to_number(to_char(a1.DateTimeStamp, 'SSSSS')) as login_time
                 , 0 as databack_time
            from AuditData a1
            where a1.ActivityCode = 1
            union all
            select a2.AuditRecordID
                 , a2.DateOnly
                 , a2.UserID
                 , a2.ActivityCode
                 , 0 as login_time
                 , to_number(to_char(a2.DateTimeStamp, 'SSSSS')) as databack_time
            from AuditData a2
            where a2.ActivityCode = 2
        )
    )
    where ActivityCode = 2
    group by DateOnly
);

DATEONLY   ELAPSED_TIME   TREND
---------  ------------   -----
01-JAN-09           120
02-JAN-09           600     480
03-JAN-09           150    -450
Like I said in my comment I guess you're working in MSSQL. I don't know whether that product has any equivalent of LAG().
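(For what it's worth, SQL Server did gain LAG()/LEAD() in the 2012 release. On that version or later, a rough translation might look like this sketch, which pairs each code-2 row with the row immediately before it for the same user and day, and filters out rows whose previous activity wasn't a login:)
WITH ordered AS (
    SELECT DateOnly, UserID, ActivityCode, DateTimeStamp,
           LAG(ActivityCode)  OVER (PARTITION BY UserID, DateOnly ORDER BY AuditRecordID) AS prev_code,
           LAG(DateTimeStamp) OVER (PARTITION BY UserID, DateOnly ORDER BY AuditRecordID) AS prev_ts
    FROM AuditData
)
SELECT DateOnly, AVG(DATEDIFF(SECOND, prev_ts, DateTimeStamp)) AS avg_login_seconds
FROM ordered
WHERE ActivityCode = 2 AND prev_code = 1  -- user data immediately after a login
GROUP BY DateOnly
ORDER BY DateOnly;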
If the assumptions are that:
users will perform various tasks in no mandated order, and
the difference between any two activities reflects the time it takes for the first of those two activities to execute,
then why not create a table with two timestamps: the first column containing the activity start time, the second column containing the next activity's start time. The difference between these two will then always be the total time of the first activity. So for the logout activity, you would just have NULL for the second column.
So it would be kind of weird and interesting: for each activity (other than logging in and logging out), the timestamp would be recorded in two different rows - once for the previous activity (as the time "completed") and again in a new row (as the time started). You would end up with a Jacob's ladder of sorts, but finding the data you are after would be much simpler.
In fact, to get really wacky, you could have each row hold the time the user started activity A along with its activity code, and the time they started activity B (which, as mentioned above, gets put down again for the following row). This way each row tells you the exact difference in time between any two consecutive activities - as sketched below.
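A minimal sketch of that layout (all names hypothetical):
-- every activity row carries its own start time plus the next activity's
-- start time; NULL marks the final activity (e.g. logout)
CREATE TABLE ActivityDurations (
    AuditRecordID int identity(1,1) not null,
    UserID nvarchar(10) not null,
    ActivityCode int not null,
    StartedAt datetime not null,
    NextStartedAt datetime null
);
-- each activity's duration is then simply
-- DATEDIFF(SECOND, StartedAt, NextStartedAt)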
Otherwise, you're stuck with a query that says something like
SELECT TIME_IN_SEC(row2-timestamp) - TIME_IN_SEC(row1-timestamp)
which would be pretty slow, as you have already suggested. By swallowing the redundancy, you end up just querying the difference between the two columns. You would probably also have less need to know the user info, since any row shows both activity codes; thus you can just query the average for all users on any given day and compare it to the next day (unless you are trying to find out which users are having the problem as well).
This is a faster way to find out: after the update, each row holds both its own datetime and the previous row's, so you can then use DATEDIFF(datepart, startdate, enddate). I use @DummyVariable and DummyField because, as I remember, there is a problem if the @variable = Field assignment does not come first in the UPDATE statement.
SELECT *, CAST(NULL AS datetime) AS LastRowDateTime, CAST(NULL AS int) AS DummyField
INTO #T FROM AuditData
GO
CREATE CLUSTERED INDEX IX_T ON #T (AuditRecordID)
GO
DECLARE @LastRowDateTime datetime
DECLARE @DummyVariable int
SET @LastRowDateTime = NULL
SET @DummyVariable = 1
-- "quirky update": the variables are assigned row by row in clustered-index
-- order, so each row picks up the previous row's DateTimeStamp
UPDATE #T SET
    @DummyVariable = DummyField = @DummyVariable
    , LastRowDateTime = @LastRowDateTime
    , @LastRowDateTime = DateTimeStamp
OPTION (MAXDOP 1)
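Once the update has run, the per-day averages fall out of #T with a plain GROUP BY. This sketch assumes the row preceding each code-2 row is its login, which the question says is not always true; in practice you would carry the previous ActivityCode along in the same update and filter on it too:
SELECT CONVERT(DATE, DateTimeStamp) AS TheDay,
       AVG(DATEDIFF(SECOND, LastRowDateTime, DateTimeStamp)) AS AvgTimeTaken
FROM #T
WHERE ActivityCode = 2  -- LastRowDateTime holds the previous row's timestamp
GROUP BY CONVERT(DATE, DateTimeStamp);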