Can cumulative data for time-bands be calculated using SQL? - sql

I'm wondering if either a running total or total-by-time-block for sales data can be generated using only SQL.
Let's say I have a simple table that records sales and the time they occurred.
ID | Timestamp | Amount
1 | 2014-03-04 09:00:00 | 25.00
2 | 2014-03-04 09:02:25 | 15.00
3 | 2014-03-04 09:13:00 | 5.00
4 | 2014-03-04 09:16:11 | 17.50
5 | 2014-03-04 09:28:18 | 44.50
...
I can easily calculate the total sales for a day with a query like:
SELECT sum(Amount) from Sales
WHERE Timestamp BETWEEN '2014-03-04 00:00:00' AND '2014-03-04 23:59:59'
But I'd also like to calculate the amounts sold during each (say) 15-minute period to get a result like:
08:45 | 0.00
09:00 | 45.00
09:15 | 62.00
...
and a cumulative running total for each (say) 15-minute period to produce a result like:
08:45 | 0.00
09:00 | 45.00
09:15 | 107.00
...
I can write a simple program or use a spreadsheet to achieve these two results given the raw data, but I'm wondering how to do it just using SQL. Is it possible? If so, how?
EDIT: If possible, a DB-agnostic solution would be preferred. I use SQL Server at present.

In SQL Server 2012, you can do this using the cumulative sum window function. You can also get the timeslot in a way that comes close to working in more than one database:
select timeslot,
sum(amount) as amount,
sum(sum(amount)) over (order by timeslot) as cumamount
from (select t.*,
(cast('2014-03-04 00:00:00' as datetime) +
cast( ("timestamp" - cast('2014-03-04 00:00:00' as datetime))*24*4 as int)/(24.0*4)
) as timeslot
from Sales t
) t
where Timestamp between '2014-03-04 00:00:00' and '2014-03-04 23:59:59'
group by timeslot;
The idea behind the timeslot calculation is to take the difference between timestamp and midnight of some day. This gives the number of days (with fractions) between the two dates. Then multiply this by 24 for hours and 4 for the 15-minute intervals, and it gives the number of 15 minute intervals since midnight on some date. Truncate this value by converting to an integer and add back to the original date. This is all done in a subquery, so the calculation can be repeated.
This approach will work in many databases, though there might be some nuances on the exact expression. The formatting of the datetime would be rather database specific.
The rest is just using the cumulative sum function. If you don't have this, then you can use a correlated subquery instead.
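As a minimal sketch of the same idea, the following runs the timeslot truncation and the correlated-subquery variant of the running total against an in-memory SQLite database from Python. The choice of SQLite, the epoch-seconds arithmetic, and the names `slots`, `slot_total`, and `cumamount` are assumptions made for illustration; they are not part of the answer above, and the date expressions would need translating for SQL Server.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (ID INTEGER PRIMARY KEY, Timestamp TEXT, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales (ID, Timestamp, Amount) VALUES (?, ?, ?)",
    [
        (1, "2014-03-04 09:00:00", 25.00),
        (2, "2014-03-04 09:02:25", 15.00),
        (3, "2014-03-04 09:13:00", 5.00),
        (4, "2014-03-04 09:16:11", 17.50),
        (5, "2014-03-04 09:28:18", 44.50),
    ],
)

SLOT_SECONDS = 15 * 60  # width of one timeslot

query = """
WITH slots AS (
    -- Integer division truncates each timestamp to the start of its slot.
    SELECT datetime((CAST(strftime('%s', Timestamp) AS INTEGER) / :slot) * :slot,
                    'unixepoch') AS timeslot,
           SUM(Amount) AS slot_total
    FROM Sales
    WHERE Timestamp >= '2014-03-04 00:00:00'
      AND Timestamp <  '2014-03-05 00:00:00'
    GROUP BY timeslot
)
SELECT s.timeslot,
       s.slot_total,
       -- Correlated subquery: sum of this slot and every earlier slot.
       (SELECT SUM(p.slot_total) FROM slots p WHERE p.timeslot <= s.timeslot) AS cumamount
FROM slots s
ORDER BY s.timeslot
"""

rows = conn.execute(query, {"slot": SLOT_SECONDS}).fetchall()
for row in rows:
    print(row)
```

Slots with no sales (such as 08:45 in the question) do not appear in this output; producing them would additionally need a generated list of slots to left join against.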

I don't have a solution for the first request (the total for every timeslot), but I do have one for the second (the cumulative running total for each timeslot).
As Gordon mentioned, this is much simpler with SQL Server 2012. I am providing an older approach that works on SQL Server 2005 onward.
The solution is not 100% database-agnostic, but it should be easy to translate from SQL Server to Oracle, DB2, or anything else.
Before going to the actual query, check out the functions I created, which return timeslot values for a given date range: UFN to GET TIMESLOT Values
Note that a separate function exists for each slot granularity (hour, minute, second, and so on); you can create new ones as you like.
In the sample query below I am choosing a timeslot of 11 seconds.
Check the result here: Sample Output
DECLARE @dt TABLE
(
RowID INT IDENTITY NOT NULL
,LastModified DATETIME2(2) NOT NULL
,Amount INT NOT NULL DEFAULT 0
)
INSERT INTO @dt( LastModified, Amount )
SELECT '2014-03-04 00:00:00.00', 10
UNION ALL SELECT '2014-03-04 00:00:05.00', 10
UNION ALL SELECT '2014-03-04 00:00:10.00', 10
UNION ALL SELECT '2014-03-04 00:00:15.00', 10
UNION ALL SELECT '2014-03-04 00:00:20.00', 10
UNION ALL SELECT '2014-03-04 00:00:25.00', 10
UNION ALL SELECT '2014-03-04 00:00:30.00', 10
UNION ALL SELECT '2014-03-04 00:00:35.00', 10
UNION ALL SELECT '2014-03-04 00:00:40.00', 10
UNION ALL SELECT '2014-03-04 00:00:45.00', 10
UNION ALL SELECT '2014-03-04 00:00:50.00', 10
DECLARE @DatePart sysname
,@SlotValue INT
,@MinDt DATETIME2(2)
,@MaxDt DATETIME2(2)
SET @SlotValue = 11
SELECT @MinDt=MIN(LastModified)
,@MaxDt=MAX(LastModified)
FROM @dt
;WITH AllDt(RowID,timeslot,amount)
AS
(
SELECT CAST (ROW_NUMBER() OVER (ORDER BY COALESCE(t1.TimeSlot,t2.LastModified)) AS INT) RowID
,COALESCE(t1.TimeSlot,t2.LastModified)
,ISNULL(t2.Amount,0) AS Amount
FROM dbo.ufn_utl_timeslotBySecond(@SlotValue,@MinDt,@MaxDt) t1
FULL OUTER JOIN @dt t2
ON t1.TimeSlot=t2.LastModified
)
,
RCTE1(RowID,timeslot,amount)
AS
(
SELECT RowID
,timeslot
,Amount
FROM AllDt
WHERE RowID=1
UNION ALL
SELECT dt.RowID,dt.TimeSlot,CAST(dt.Amount+t3.amount AS INT) AS amount
FROM ALLDt dt
JOIN RCTE1 t3
ON dt.RowID=t3.RowID+1
)
SELECT *
FROM RCTE1
ORDER BY TimeSlot

Related

T-sql count number of times a week on rows with date interval

If you have a table like this:
Name | Data type
UserID | INT
StartDate | DATETIME
EndDate | DATETIME
With data like this:
UserID | StartDate | EndDate
21 | 2021-01-02 00:00:00 | 2021-01-02 23:59:59
21 | 2021-01-03 00:00:00 | 2021-01-04 15:42:00
24 | 2021-01-02 00:00:00 | 2021-01-06 23:59:59
And you want to calculate the number of users represented on each day in a week, with a result like this:
Year | Week | NumberOfTimes
2021 | 1 | 8
2021 | 2 | 10
2021 | 3 | 4
Basically I want to do a SELECT like this:
SELECT YEAR(dateColumn) AS yearname, WEEK(dateColumn) AS weekname, COUNT(somecolumn)
GROUP BY YEAR(dateColumn), WEEK(dateColumn)
The problem is the start and end dates: if an interval spans several days, I want each day to be counted. Preferably, the same user should not be counted twice on the same day. There are millions of rows that are constantly being deleted and added, so speed is key.
The database is MS-SQL 2019
I would suggest a recursive CTE:
with cte as (
select userid, startdate, enddate
from t
union all
select userid, dateadd(day, 7, startdate),
enddate
from cte
where startdate < enddate and
datepart(week, startdate) <> datepart(week, enddate)
)
select year(startdate), datepart(week, startdate), count(*)
from cte
group by year(startdate), datepart(week, startdate)
option (maxrecursion 0);
The CTE expands the data by adding 7 days to each row, which yields one row for each week that the interval touches.
There is a little logic in the second part to handle the situation where the enddate ends in the same week as the last start date. The above solution assumes that the dates are all in the same year -- which seems quite reasonable given the sample data. There are other ways to prevent this problem.
You need to join each row with the relevant dates.
Create a calendar table with columns for year and week, and include the start and end date of each week. See here for an example of how to create one, and make sure you index those columns.
Then you can join on the overlap like this (Year and Week are the assumed names of the calendar columns):
SELECT
c.Year AS yearname,
c.Week AS weekname,
COUNT(*)
FROM YourTable t
JOIN CalendarWeek c ON c.StartDate <= t.EndDate AND c.EndDate >= t.StartDate
GROUP BY c.Year, c.Week
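To count each user once per day, as the question asks, the calendar can be kept at day granularity and grouped by week afterwards. The following is a hedged sketch using SQLite from Python rather than SQL Server; the table name `Visits`, the generated `days` calendar, and the use of `strftime('%W')` (Monday-based week numbers, which differ from SQL Server's default) are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Visits (UserID INTEGER, StartDate TEXT, EndDate TEXT)")
conn.executemany(
    "INSERT INTO Visits VALUES (?, ?, ?)",
    [
        (21, "2021-01-02 00:00:00", "2021-01-02 23:59:59"),
        (21, "2021-01-03 00:00:00", "2021-01-04 15:42:00"),
        (24, "2021-01-02 00:00:00", "2021-01-06 23:59:59"),
    ],
)

weekly_query = """
WITH RECURSIVE days(d) AS (
    -- Calendar of every day between the earliest start and the latest end.
    SELECT date(MIN(StartDate)) FROM Visits
    UNION ALL
    SELECT date(d, '+1 day') FROM days
    WHERE d < (SELECT date(MAX(EndDate)) FROM Visits)
),
user_days AS (
    -- One row per (user, day) the user is present; DISTINCT removes
    -- duplicates when two intervals of the same user cover the same day.
    SELECT DISTINCT v.UserID, days.d
    FROM days
    JOIN Visits v
      ON date(v.StartDate) <= days.d
     AND days.d <= date(v.EndDate)
)
SELECT strftime('%Y', d) AS year,
       strftime('%W', d) AS week,
       COUNT(*) AS number_of_times
FROM user_days
GROUP BY year, week
ORDER BY year, week
"""

weekly_counts = conn.execute(weekly_query).fetchall()
for row in weekly_counts:
    print(row)
```

With millions of rows, a persisted and indexed calendar table, as suggested above, is likely to perform better than generating the days on every query.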

create a temporary sql table using recursion as a loop to populate custom time interval

Suppose you have a table like:
id subscription_start subscription_end segment
1 2016-12-01 2017-02-01 87
2 2016-12-01 2017-01-24 87
...
And wish to generate a temporary table with months.
One way would be to encode the month date as:
with months as (
select
'2016-12-01' as 'first',
'2016-12-31' as 'last'
union
select
'2017-01-01' as 'first',
'2017-01-31' as 'last'
...
) select * from months;
So that I have an output table like:
first_day last_day
2017-01-01 2017-01-31
2017-02-01 2017-02-28
2017-03-01 2017-03-31
I would like to generate a temporary table with a custom interval (as above) without manually encoding all the dates.
Say the interval is 12 months for each year, for as many years as there are in the db.
I'd like to have a general approach to compute the months table with the same output as above.
Alternatively, one may adjust the range to a custom interval (months split a year into 12 parts, but one may want to split a period into a custom interval of days).
To start, I was thinking to use recursive query like:
with months(id, first_day, last_day, month) as (
select
id,
first_day,
last_day,
0
where
subscriptions.first_day = min(subscriptions.first_day)
union all
select
id,
first_day,
last_day,
months.month + 1
from
subscriptions
left join months on cast(
strftime('%m', datetime(subscriptions.subscription_start)) as int
) = months.month
where
months.month < 13
)
select
*
from
months
where
month = 1;
but it does not do what I'd expect: here I was attempting to select the first row from the table with the minimum date, and populate a table at interval of months, ranging from 1 to 12. For each month, I was comparing the string date field of my table (e.g. 2017-03-01 = 3 is march).
The query above does not work and also seems a bit complicated, but for the sake of learning, which alternative would you propose to create a temporary table months without manually coding the intervals?
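One possible alternative, sketched here under the assumption that the database is SQLite (the attempt above uses `strftime`), is to let a recursive CTE step forward one month at a time between the earliest `subscription_start` and the latest `subscription_end`. The sample rows are the two shown in the question; the table is created in memory from Python purely so the sketch can be run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions ("
    "id INTEGER, subscription_start TEXT, subscription_end TEXT, segment INTEGER)"
)
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?, ?)",
    [
        (1, "2016-12-01", "2017-02-01", 87),
        (2, "2016-12-01", "2017-01-24", 87),
    ],
)

months_query = """
WITH RECURSIVE months(first_day, last_day) AS (
    -- Anchor: the month containing the earliest subscription start.
    SELECT date(MIN(subscription_start), 'start of month'),
           date(MIN(subscription_start), 'start of month', '+1 month', '-1 day')
    FROM subscriptions
    UNION ALL
    -- Step: move to the next month while it starts on or before
    -- the latest subscription end.
    SELECT date(first_day, '+1 month'),
           date(first_day, '+2 months', '-1 day')
    FROM months
    WHERE date(first_day, '+1 month') <=
          (SELECT MAX(subscription_end) FROM subscriptions)
)
SELECT first_day, last_day FROM months ORDER BY first_day
"""

months = conn.execute(months_query).fetchall()
for row in months:
    print(row)
```

For a custom interval of days, the same shape should work with the `'+1 month'` modifiers replaced by, for example, `'+7 days'`.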

Grouping Timestamps based on the interval between them

I have a table in Hive (SQL) with a bunch of timestamps that need to be grouped in order to create separate sessions based on the time difference between the timestamps.
Example:
Consider the following timestamps (given in HH.MM for simplicity):
9.00
9.10
9.20
9.40
9.43
10.30
10.45
11.25
12.30
12.33
and so on..
So now, all timestamps that fall within 30 mins of the next timestamp come under the same session,
i.e. 9.00,9.10,9.20,9.40,9.43 form 1 session.
But since the difference between 9.43 and 10.30 is more than 30 mins, the time stamp 10.30 falls under a different session. Again, 10.30 and 10.45 fall under one session.
After we have created these sessions, we have to obtain the minimum timestamp for that session and the max timestamp.
I tried to subtract the current timestamp with its LEAD and place a flag if it is greater than 30 mins, but I'm having difficulty with this.
Any suggestion from you guys would be greatly appreciated. Please let me know if the question isn't clear enough.
Expected Output for this sample data:
Session_start Session_end
9.00 9.43
10.30 10.45
11.25 11.25 (same because the next time is not within 30 mins)
12.30 12.33
Hope this helps.
So it's not MySQL but Hive. I don't know Hive, but if it supports LAG, as you say, try this PostgreSQL query. You will probably have to change the time difference calculation, that's usually different from one dbms to another.
select min(thetime) as start_time, max(thetime) as end_time
from
(
select thetime, count(gap) over (order by thetime rows between unbounded preceding and current row) as groupid
from
(
select thetime, case when thetime - lag(thetime) over (order by thetime) > interval '30 minutes' then 1 end as gap
from mytable
) times
) groups
group by groupid
order by min(thetime);
The query finds gaps, then uses a running total of gap counts to build group IDs, and the rest is aggregation.
SQL fiddle: http://www.sqlfiddle.com/#!17/8bc4a/6.
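The logic of that query (flag a gap, keep a running count of gaps as the group ID, then aggregate per group) can be traced in plain Python. This is only an illustration of the technique, not Hive code; the function name `sessions` and the 30-minute default are assumptions taken from the question's wording.

```python
from datetime import datetime, timedelta


def sessions(timestamps, max_gap=timedelta(minutes=30)):
    """Return (start, end) pairs for runs of timestamps whose
    consecutive differences do not exceed max_gap."""
    groups = {}      # group id -> (min timestamp, max timestamp)
    group_id = 0     # running count of gaps seen so far
    previous = None
    for ts in sorted(timestamps):
        # A gap is flagged when the distance to the previous row is too large.
        if previous is not None and ts - previous > max_gap:
            group_id += 1
        start, _ = groups.get(group_id, (ts, ts))
        groups[group_id] = (start, ts)
        previous = ts
    return [groups[g] for g in sorted(groups)]


raw = ["9.00", "9.10", "9.20", "9.40", "9.43",
       "10.30", "10.45", "11.25", "12.30", "12.33"]
# The calendar date is arbitrary; only the time of day matters here.
parsed = [datetime.strptime("2014-01-01 " + t, "%Y-%m-%d %H.%M") for t in raw]

result = [
    (f"{start.hour}.{start.minute:02d}", f"{end.hour}.{end.minute:02d}")
    for start, end in sessions(parsed)
]
for pair in result:
    print(pair)
```

As in the SQL, a difference of exactly 30 minutes stays in the same session because the comparison is strictly greater than.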
With MySQL lacking LAG and LEAD functions, getting the previous or next record is some work already. Here is how:
select
thetime,
(select max(thetime) from mytable afore where afore.thetime < mytable.thetime) as afore_time,
(select min(thetime) from mytable after where after.thetime > mytable.thetime) as after_time
from mytable;
Based on this we can build the whole query where we are looking for gaps (i.e. the time difference to the previous or next record is more than 30 minutes = 1800 seconds).
select
startrec.thetime as start_time,
(
select min(endrec.thetime)
from
(
select
thetime,
coalesce(time_to_sec(timediff((select min(thetime) from mytable after where after.thetime > mytable.thetime), thetime)), 1801) > 1800 as gap
from mytable
) endrec
where gap
and endrec.thetime >= startrec.thetime
) as end_time
from
(
select
thetime,
coalesce(time_to_sec(timediff(thetime, (select max(thetime) from mytable afore where afore.thetime < mytable.thetime))), 1801) > 1800 as gap
from mytable
) startrec
where gap;
SQL fiddle: http://www.sqlfiddle.com/#!2/d307b/20.
Try this..
SELECT MIN(session_time_tmp) session_start, MAX(session_time_tmp) session_end FROM
(
SELECT IF((TIME_TO_SEC(TIMEDIFF(your_time_field, COALESCE(@previousValue, your_time_field))) / 60) > 30 ,
@sessionCount := @sessionCount + 1, @sessionCount ) sessCount,
( @previousValue := your_time_field ) session_time_tmp FROM
(
SELECT your_time_field, @previousValue:= NULL, @sessionCount := 1 FROM yourtable ORDER BY your_time_field
) a
) b
GROUP BY sessCount
Just replace yourtable and your_time_field
Try this:
SELECT DATE_FORMAT(MIN(STR_TO_DATE(B.column1, '%H.%i')), '%H.%i') AS Session_start,
DATE_FORMAT(MAX(STR_TO_DATE(B.column1, '%H.%i')), '%H.%i') AS Session_end
FROM tableA A
LEFT JOIN ( SELECT A.column1, diff, IF(@diff:=diff < 30, @id, @id:=@id+1) AS rnk
FROM (SELECT B.column1, TIME_TO_SEC(TIMEDIFF(STR_TO_DATE(B.column1, '%H.%i'), STR_TO_DATE(A.column1, '%H.%i'))) / 60 AS diff
FROM tableA A
INNER JOIN tableA B ON STR_TO_DATE(A.column1, '%H.%i') < STR_TO_DATE(B.column1, '%H.%i')
GROUP BY STR_TO_DATE(A.column1, '%H.%i')
) AS A, (SELECT @diff:=0, @id:= 1) AS B
) AS B ON A.column1 = B.column1
GROUP BY IFNULL(B.rnk, 1);
Check the SQL FIDDLE DEMO
OUTPUT
| SESSION_START | SESSION_END |
|---------------|-------------|
| 9.00 | 9.43 |
| 10.30 | 10.45 |
| 11.25 | 11.25 |
| 12.30 | 12.33 |

SQL -- return 0s if no group exists

I have a rollup table that sums up raw data for a given hour. It looks something like this:
stats_hours:
- obj_id : integer
- start_at : datetime
- count : integer
The obj_id points to a separate table, the start_at field contains a timestamp for the beginning of the hour of the data, and the count contains the sum of the data for that hour.
I would like to build a query that returns a set of data per day, so something like this:
Date | sum_count
2014-06-01 | 2000
2014-06-02 | 3000
2014-06-03 | 0
2014-06-04 | 5000
The query that I built does a grouping on the date column and sums up the count:
SELECT date(start_at) as date, sum(count) as sum_count
FROM stats_hours GROUP BY date;
This works fine unless I have no data for a given date, in which case it obviously leaves out the row:
Date | sum_count
2014-06-01 | 2000
2014-06-02 | 3000
2014-06-04 | 5000
Does anyone know of a good way in SQL to return a zeroed-out row in the case that there is no data for a given date group? Maybe some kind of case statement?
You need a full list of dates first, then connect that list to your available dates and group by that. Try the following:
--define start and end limits
Declare @todate datetime, @fromdate datetime
Select @fromdate='2009-03-01', @todate='2014-06-04'
;With DateSequence( Date ) as
(
Select @fromdate as Date
union all
Select dateadd(day, 1, Date)
from DateSequence
where Date < @todate
)
--select result; COALESCE turns the NULL sum of an empty day into 0
SELECT DateSequence.Date, COALESCE(SUM(Stats_Hours.Count), 0) AS Sum_Count
FROM
DateSequence
LEFT JOIN
--compare on the date part only, because Start_At holds hourly timestamps
Stats_Hours ON DateSequence.Date = CAST(Stats_Hours.Start_At AS date)
GROUP BY DateSequence.Date
option (MaxRecursion 0)
EDIT: CTE code from this post
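The same pattern (generate the full list of dates, left join the data, and replace a missing sum with 0) can be tried out against SQLite from Python. This is a sketch under assumptions: the sample rows are invented so that they add up to the totals shown in the question, and the names `date_sequence`, `from_date`, and `to_date` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE stats_hours (obj_id INTEGER, start_at TEXT, "count" INTEGER)'
)
# Hypothetical hourly rows; 2014-06-03 deliberately has no data.
conn.executemany(
    "INSERT INTO stats_hours VALUES (?, ?, ?)",
    [
        (1, "2014-06-01 10:00:00", 1500),
        (1, "2014-06-01 11:00:00", 500),
        (2, "2014-06-02 09:00:00", 3000),
        (1, "2014-06-04 23:00:00", 5000),
    ],
)

daily_query = """
WITH RECURSIVE date_sequence(day) AS (
    SELECT :from_date
    UNION ALL
    SELECT date(day, '+1 day') FROM date_sequence WHERE day < :to_date
)
SELECT ds.day,
       -- SUM over no matching rows is NULL; COALESCE turns it into 0.
       COALESCE(SUM(sh."count"), 0) AS sum_count
FROM date_sequence ds
LEFT JOIN stats_hours sh ON date(sh.start_at) = ds.day
GROUP BY ds.day
ORDER BY ds.day
"""

daily = conn.execute(
    daily_query, {"from_date": "2014-06-01", "to_date": "2014-06-04"}
).fetchall()
for row in daily:
    print(row)
```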

Select repeat occurrences within time period <x days

Suppose I have a large table (100,000+ entries) of service records or perhaps admission records. How would I find all the instances of re-occurrence within a set number of days?
The table setup could be something like this, likely with more columns.
Record ID | Customer ID | Start Date Time | Finish Date Time
1 | 123456 | 24/04/2010 16:49 | 25/04/2010 13:37
3 | 654321 | 02/05/2010 12:45 | 03/05/2010 18:48
4 | 764352 | 24/03/2010 21:36 | 29/03/2010 14:24
9 | 123456 | 28/04/2010 13:49 | 30/04/2010 09:45
10 | 836472 | 19/03/2010 19:05 | 20/03/2010 14:48
11 | 123456 | 05/05/2010 11:26 | 06/05/2010 16:23
What I am trying to do is work out a way to select the records where there is a re-occurrence of the field [Customer ID] within a certain time period (< X days), where the time period is the Start Date Time of the 2nd occurrence minus the Finish Date Time of the first occurrence.
This is what I would like the result to look like when run for, say, X = 7:
Record ID | Customer ID | Start Date Time | Finish Date Time | Re-occurrence
9 | 123456 | 28/04/2010 13:49 | 30/04/2010 09:45 | 1
11 | 123456 | 05/05/2010 11:26 | 06/05/2010 16:23 | 2
I can solve this problem with a smaller set of records in Excel but have struggled to come up with a SQL solution in MS Access. I do have some SQL queries that I have tried but I am not sure I am on the right track.
Any advice would be appreciated.
I think this is a clear expression of what you want. It's not extremely high performance but I'm not sure that you can avoid either correlated sub-query or a cartesian JOIN of the table to itself to solve this problem. It is standard SQL and should work in most any engine, although the details of the date math may differ:
SELECT * FROM YourTable YT1 WHERE EXISTS
(SELECT * FROM YourTable YT2 WHERE
YT2.CustomerID = YT1.CustomerID AND YT2.RecordID <> YT1.RecordID
AND YT2.FinishTime <= YT1.StartTime AND YT1.StartTime <= YT2.FinishTime + 7)
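As a runnable check of the EXISTS approach, here is a minimal sketch using SQLite from Python. It is not Access SQL: the `julianday` date arithmetic, the snake_case column names, and the ISO-formatted sample rows (with record 9 finishing on 30 April) are assumptions made so the example can execute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "record_id INTEGER, customer_id INTEGER, start_time TEXT, finish_time TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        (1, 123456, "2010-04-24 16:49", "2010-04-25 13:37"),
        (3, 654321, "2010-05-02 12:45", "2010-05-03 18:48"),
        (4, 764352, "2010-03-24 21:36", "2010-03-29 14:24"),
        (9, 123456, "2010-04-28 13:49", "2010-04-30 09:45"),
        (10, 836472, "2010-03-19 19:05", "2010-03-20 14:48"),
        (11, 123456, "2010-05-05 11:26", "2010-05-06 16:23"),
    ],
)

reoccurrence_query = """
SELECT r1.record_id
FROM records r1
WHERE EXISTS (
    -- An earlier record of the same customer that finished
    -- less than :days days before this record started.
    SELECT 1
    FROM records r2
    WHERE r2.customer_id = r1.customer_id
      AND r2.record_id <> r1.record_id
      AND r2.finish_time <= r1.start_time
      AND julianday(r1.start_time) - julianday(r2.finish_time) < :days
)
ORDER BY r1.record_id
"""

reoccurring = conn.execute(reoccurrence_query, {"days": 7}).fetchall()
for row in reoccurring:
    print(row)
```

This returns only the records that are re-occurrences; it does not compute the running Re-occurrence counter from the question.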
In order to accomplish this you would need to make a self join as you are comparing the entire table to itself. Assuming similar names it would look something like this:
select r1.customer_id, min(r1.start_time), max(r1.finish_time), count(1) as reoccurrences
from records r1,
records r2
where r1.record_id > r2.record_id -- this ensures you don't double count the records
and r1.customer_id = r2.customer_id
and r1.start_time - r2.finish_time <= 7
group by r1.customer_id
You wouldn't be able to easily get both the record_id and the number of occurrences, but you could go back and find it by correlating the start time to the record number with that customer_id and start_time.
This will do it:
declare @t table(Record_ID int, Customer_ID int, StartDateTime datetime, FinishDateTime datetime)
insert @t values(1 ,123456,'2010-04-24 16:49','2010-04-25 13:37')
insert @t values(3 ,654321,'2010-05-02 12:45','2010-05-03 18:48')
insert @t values(4 ,764352,'2010-03-24 21:36','2010-03-29 14:24')
insert @t values(9 ,123456,'2010-04-28 13:49','2010-04-30 09:45')
insert @t values(10,836472,'2010-03-19 19:05','2010-03-20 14:48')
insert @t values(11,123456,'2010-05-05 11:26','2010-05-06 16:23')
declare @days int
set @days = 7
;with a as (
select record_id, customer_id, startdatetime, finishdatetime,
rn = row_number() over (partition by customer_id order by startdatetime asc)
from @t),
b as (
select record_id, customer_id, startdatetime, finishdatetime, rn, 0 recurrence
from a
where rn = 1
union all
select a.record_id, a.customer_id, a.startdatetime, a.finishdatetime,
a.rn, case when a.startdatetime - @days < b.finishdatetime then recurrence + 1 else 0 end
from b join a
on b.rn = a.rn - 1 and b.customer_id = a.customer_id
)
select record_id, customer_id, startdatetime, recurrence from b
where recurrence > 0
Result:
https://data.stackexchange.com/stackoverflow/q/112808/
I just realized this should be done in Access. I am sorry; this was written for SQL Server 2005, and I don't know how to rewrite it for Access.