How to group and count by date and 8-hr interval on those dates in datetime SQL column? - sql

I have a table with a column visit_date that is a datetime object with format YYYY-MM-DD HH:MI:SS that looks like the following:
visit_date |visit_id
-------------------|-----
2010-11-01 00:02:00|92314
2010-11-01 23:05:21|23498
2010-11-01 12:42:31|12343
2010-11-02 05:13:21|79881
2010-11-02 14:35:15|22134
2010-11-02 16:12:23|12348
2010-11-03 01:22:44|12384
2010-11-03 05:23:41|12394
2010-11-03 15:13:55|99384
I would like to group by date and by 8-hr window on that date such that I have:
interval |count
-------------------|-----
2010-11-01 00:00:00|1
2010-11-01 08:00:00|2
2010-11-01 16:00:00|3
2010-11-02 00:00:00|4
2010-11-02 08:00:00|5
2010-11-02 16:00:00|6
2010-11-03 00:00:00|7
2010-11-03 08:00:00|8
2010-11-03 16:00:00|9
My original query (using only dates) was:
SELECT CAST(visit_date as DATE), count(1) as count
FROM table
GROUP BY CAST(visit_date as DATE)
ORDER BY CAST(visit_date as DATE)
But that only groups by date.
Is there a recommended way to get interval counts for each interval per day? I have seen implementations using DATEADD and DATEPART but not sure which makes the most sense in this situation.
Thanks!

Add the hour bucket to what you group and count:
SELECT
CAST(visit_date as DATE),
DATEPART(hour, visit_date)/8 as ival8h,
count(1) as count
FROM table
GROUP BY CAST(visit_date as DATE), DATEPART(hour, visit_date)/8
ORDER BY CAST(visit_date as DATE), DATEPART(hour, visit_date)/8
DATEPART(hour, visit_date) returns the hour number of the passed datetime (in MySQL the equivalent is HOUR(visit_date), with DIV for the integer division). Integer-dividing it by 8 gives the interval number, so 0 to 7 becomes 0, 8 to 15 becomes 1 and 16 to 23 becomes 2.
If you want it back as a time pegged to a round 8h boundary, multiply it by 8 again and format it as NN:00:00, or add it to the date, thus:
SELECT
DATEADD(hour, (DATEPART(hour, visit_date)/8)*8, CAST(CAST(visit_date as DATE) as DATETIME)) as quantized_date,
count(1) as count
FROM table
GROUP BY DATEADD(hour, (DATEPART(hour, visit_date)/8)*8, CAST(CAST(visit_date as DATE) as DATETIME))
ORDER BY quantized_date
This rounds the hours down to the nearest lower 8h marker and adds that to midnight. Two casts are (probably) required on the date because DATEADD won't add hours to a date, only to a datetime, but we need the cast to date to peg the time element to midnight.
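To check the rounding logic outside the database, here is a minimal Python sketch of the same idea: take the hour, integer-divide by 8, multiply by 8 again, and peg everything else to zero. The function name floor_to_8h is made up for illustration; the timestamps are the sample rows from the question.

```python
from collections import Counter
from datetime import datetime


def floor_to_8h(ts: datetime) -> datetime:
    """Round a timestamp down to 00:00, 08:00 or 16:00 of the same day."""
    return ts.replace(hour=(ts.hour // 8) * 8, minute=0, second=0, microsecond=0)


# Sample visit_date values copied from the question.
visits = [
    "2010-11-01 00:02:00", "2010-11-01 23:05:21", "2010-11-01 12:42:31",
    "2010-11-02 05:13:21", "2010-11-02 14:35:15", "2010-11-02 16:12:23",
    "2010-11-03 01:22:44", "2010-11-03 05:23:41", "2010-11-03 15:13:55",
]

# Equivalent of GROUP BY quantized_date with count(1).
counts = Counter(
    floor_to_8h(datetime.strptime(v, "%Y-%m-%d %H:%M:%S")) for v in visits
)

for bucket in sorted(counts):
    print(bucket, counts[bucket])
```

Only buckets that contain at least one visit appear in counts, which mirrors what the GROUP BY query returns.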
If you want there to be a date and a 0 count for periods where no events took place, use a numbers table or row generator and create a sequence of dates to left join your real data onto, then count the real data grouped by the fake dates
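As a rough illustration of the "fake dates" idea, the sketch below generates every 8-hour bucket between the first and last day in Python and looks up each bucket's count, defaulting to 0. In SQL the generated sequence would come from a numbers table or row generator and the lookup would be a left join; the helper names here are hypothetical, and the sketch assumes the bucket width divides 24 evenly.

```python
from collections import Counter
from datetime import datetime, timedelta


def midnight(ts):
    """Return 00:00:00 on the same day as ts."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def counts_with_zeros(timestamps, hours=8):
    """Count timestamps per bucket, keeping buckets that have no rows."""
    counts = Counter(
        ts.replace(hour=(ts.hour // hours) * hours, minute=0, second=0, microsecond=0)
        for ts in timestamps
    )
    # The generated sequence: every bucket start from the first day's
    # midnight up to (but not including) the midnight after the last day.
    current = midnight(min(timestamps))
    end = midnight(max(timestamps)) + timedelta(days=1)
    result = []
    while current < end:
        result.append((current, counts.get(current, 0)))  # "left join"
        current += timedelta(hours=hours)
    return result


# The three 2010-11-03 visits from the question; none falls after 16:00.
sample = [
    datetime(2010, 11, 3, 1, 22, 44),
    datetime(2010, 11, 3, 5, 23, 41),
    datetime(2010, 11, 3, 15, 13, 55),
]
filled = counts_with_zeros(sample)
for bucket, n in filled:
    print(bucket, n)
```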

Use a cross apply to form 4 shift boundary values, then use those in a case expression to generate the group by values:
SELECT
case
when visit_date >= s1 and visit_date < s2 then s1
when visit_date >= s2 and visit_date < s3 then s2
when visit_date >= s3 and visit_date < s4 then s3
end as shift
, count(1) as count
FROM mytable
CROSS APPLY (
select
cast(CAST(visit_date as DATE)as datetime) s1
, dateadd(hh,8,cast(CAST(visit_date as DATE)as datetime)) s2
, dateadd(hh,16,cast(CAST(visit_date as DATE)as datetime)) s3
, dateadd(hh,24,cast(CAST(visit_date as DATE)as datetime)) s4
) ca
GROUP BY
case
when visit_date >= s1 and visit_date < s2 then s1
when visit_date >= s2 and visit_date < s3 then s2
when visit_date >= s3 and visit_date < s4 then s3
end
ORDER BY shift
result:
+----+---------------------+-------+
| | shift | count |
+----+---------------------+-------+
| 1 | 01.11.2010 00:00:00 | 1 |
| 2 | 01.11.2010 08:00:00 | 1 |
| 3 | 01.11.2010 16:00:00 | 1 |
| 4 | 02.11.2010 00:00:00 | 1 |
| 5 | 02.11.2010 08:00:00 | 1 |
| 6 | 02.11.2010 16:00:00 | 1 |
| 7 | 03.11.2010 00:00:00 | 2 |
| 8 | 03.11.2010 08:00:00 | 1 |
+----+---------------------+-------+

I think the canonical way in SQL Server is to use dateadd() and datediff():
select dateadd(hour, 8 * (datediff(hour, 0, visit_date) / 8), 0) as day_hour8,
count(*)
from t
group by dateadd(hour, 8 * (datediff(hour, 0, visit_date) / 8), 0)
order by day_hour8;
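The dateadd()/datediff() pattern counts whole hours since a fixed base date, rounds that count down to a multiple of 8, and adds it back to the base. A small Python sketch of the same arithmetic follows; it assumes the base is 1900-01-01 00:00 (what the value 0 means for a SQL Server datetime), and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: datetime value 0 in SQL Server corresponds to 1900-01-01 00:00.
EPOCH = datetime(1900, 1, 1)


def floor_from_epoch(ts, hours=8):
    """Round ts down to a multiple of `hours` counted from EPOCH."""
    whole_hours = (ts - EPOCH) // timedelta(hours=1)  # like datediff(hour, 0, ts)
    return EPOCH + timedelta(hours=hours * (whole_hours // hours))


print(floor_from_epoch(datetime(2010, 11, 2, 14, 35, 15)))
```

Because the base falls on a midnight and 24 is a multiple of 8, the buckets line up with 00:00, 08:00 and 16:00 on every day.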

Related

Postgresql Where Specific Time On Date

I've table structure like this:
---------------------------
No | Data | create_time
---------------------------
1 | Data1 | 2020-04-28 00:01:30
2 | Data2 | 2020-04-28 13:04:00
3 | Data3 | 2020-04-27 01:01:30
4 | Data4 | 2020-04-27 14:04:00
How can I query with this condition: dates from 27 April until 28 April, and times from 00:00 until 12:00?
What I've tried so far:
SELECT * FROM mytable WHERE ((date(create_time) >= '2020-04-27' AND date(2020-04-27) <= '2020-04-28'
AND TO_CHAR(create_time,'HH24:MI:SS') BETWEEN '00:00:00' AND '12:00:00'))
And
SELECT * FROM mytable WHERE ((date(create_time) >= '2020-04-27' AND date(2020-04-27) <= '2020-04-28'
AND TO_CHAR(create_time,'HH24:MI:SS') =>'00:00:00' AND TO_CHAR(create_time,'HH24:MI:SS') <= '12:00:00'))
What I want to achieve is to get the data from those dates, but only where the time is between 00:00:00 and 12:00:00 (24-hour format).
But it's still not working: the date filter is correct, but the time filter is not.
Note: this answer assumes that create_time is correctly defined as timestamp.
You can combine conditions on the date and time part:
select *
from mytable
where create_time::date between date '2020-04-27' and date '2020-04-28'
and create_time::time between time '00:00' and time '12:00'
Alternatively you can use a range condition without casting the column:
select *
from mytable
where create_time >= date '2020-04-27'
and create_time < date '2020-04-29'
and create_time::time between time '00:00' and time '12:00'
That can use an index on create_time.
If you don't want to include times at precisely 12:00, you need to change the "time" condition as well:
and create_time::time >= time '00:00'
and create_time::time < time '12:00'
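To see how the two conditions combine, here is a small Python sketch that applies the same filter to the four rows from the question: a half-open range on the full timestamp plus a separate check on the time of day. The function name and the include_noon flag are hypothetical and exist only for this illustration.

```python
from datetime import date, datetime, time, timedelta

# Rows from the question: (no, data, create_time).
rows = [
    (1, "Data1", datetime(2020, 4, 28, 0, 1, 30)),
    (2, "Data2", datetime(2020, 4, 28, 13, 4, 0)),
    (3, "Data3", datetime(2020, 4, 27, 1, 1, 30)),
    (4, "Data4", datetime(2020, 4, 27, 14, 4, 0)),
]


def in_morning_window(ts, first_day, last_day, include_noon=True):
    """True if ts is on first_day..last_day and between 00:00 and 12:00."""
    lower = datetime.combine(first_day, time.min)
    upper = datetime.combine(last_day, time.min) + timedelta(days=1)
    if not (lower <= ts < upper):  # create_time >= lower AND create_time < upper
        return False
    noon = time(12, 0)
    return ts.time() <= noon if include_noon else ts.time() < noon


matches = [
    no for no, _, ts in rows
    if in_morning_window(ts, date(2020, 4, 27), date(2020, 4, 28))
]
print(matches)
```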
Try the following:
with cte as
(
select
*,
cast (create_time::timestamp as time) as hour
from times
)
select
no,
data
from cte
where date(create_time) >= '2020-04-27'
and date(create_time) <= '2020-04-28'
and hour between '00:00:00' and '12:00:00'
Output:
No | Data
---|------
1 | Data1
3 | Data3

Reporting on time information using start and end time

Is it possible to create a report that sums hours for a day grouped by an Id using a start and end time stamp?
I need to be able to split time that spans days and take part of that time and sum to the correct date group.
NOTE: The date ids are to a date dimension table.
------------------------------------------------------------------------------
TaskId | StartDateId | EndDateId | StartTime | EndTime
------------------------------------------------------------------------------
2 | 20190317 | 20190318 | 2019-03-17 16:30:00 | 2019-03-18 09:00:00
------------------------------------------------------------------------------
1 | 20190318 | 20190318 | 2019-03-18 09:00:00 | 2019-03-18 16:30:00
------------------------------------------------------------------------------
2 | 20190318 | 20190319 | 2019-03-18 16:30:00 | 2019-03-19 09:00:00
------------------------------------------------------------------------------
So based on this, the desired report output would be:
-------------------------
Date | Task | Hours
-------------------------
2019-03-17 | 2 | 7.5
-------------------------
2019-03-18 | 1 | 7.5
-------------------------
2019-03-18 | 2 | 16.5
-------------------------
...
The only working solution I have managed to implement is splitting records so that no record spans multiple days. I was hoping to find a report query solution, rather than an ETL-based solution.
I have tried to simulate your problem here: https://rextester.com/DEV45608 and I hope it helps you :) (The CTE GetDates can be replaced by your date dimension)
DECLARE @minDate DATE
DECLARE @maxDate DATE
CREATE TABLE Tasktime
(
Task_id INT,
Start_time DATETIME,
End_time DATETIME
);
INSERT INTO Tasktime VALUES
(2,'2019-03-17 16:30:00','2019-03-18 09:00:00'),
(1,'2019-03-18 09:00:00','2019-03-18 16:30:00'),
(2,'2019-03-18 16:30:00','2019-03-19 09:00:00');
SELECT @minDate = MIN(Start_time) FROM Tasktime;
SELECT @maxDate = MAX(End_time) FROM Tasktime;
;WITH GetDates AS
(
SELECT 1 AS counter, @minDate as Date
UNION ALL
SELECT counter + 1, DATEADD(day,counter,@minDate)
from GetDates
WHERE DATEADD(day, counter, @minDate) <= @maxDate
)
SELECT counter, Date INTO #tmp FROM GetDates;
SELECT
g.Date,
t.Task_id,
SUM(
CASE WHEN CAST(t.Start_time AS DATE) = CAST(t.End_time AS DATE) THEN
DATEDIFF(second, t.Start_time, t.End_time) / 3600.0
WHEN CAST(t.Start_time AS DATE) = g.Date THEN
DATEDIFF(second, t.Start_time, CAST(DATEADD(day,1,g.Date) AS DATETIME)) / 3600.0
WHEN CAST(t.End_time AS DATE) = g.Date THEN
DATEDIFF(second, CAST(g.Date AS DATETIME), t.End_time) / 3600.0
ELSE
24.0
END) AS hours_on_the_day_for_the_task
from
#tmp g
INNER JOIN
Tasktime t
ON
g.Date BETWEEN CAST(t.Start_time AS DATE) AND CAST(t.End_time AS DATE)
GROUP BY g.Date, t.Task_id
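The CASE expression above works out how much of each task falls on each calendar day. The same splitting can be sketched in plain Python by cutting every interval at each midnight it crosses; split_by_day is a hypothetical helper, and the rows are the three sample tasks from the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_by_day(start, end):
    """Yield (date, hours) pieces of [start, end), cut at every midnight."""
    cursor = start
    while cursor < end:
        next_midnight = datetime.combine(
            cursor.date() + timedelta(days=1), datetime.min.time()
        )
        piece_end = min(end, next_midnight)
        yield cursor.date(), (piece_end - cursor).total_seconds() / 3600.0
        cursor = piece_end


# (task_id, start_time, end_time) rows from the question.
tasks = [
    (2, datetime(2019, 3, 17, 16, 30), datetime(2019, 3, 18, 9, 0)),
    (1, datetime(2019, 3, 18, 9, 0), datetime(2019, 3, 18, 16, 30)),
    (2, datetime(2019, 3, 18, 16, 30), datetime(2019, 3, 19, 9, 0)),
]

# Equivalent of SUM(...) ... GROUP BY date, task.
hours = defaultdict(float)
for task_id, start, end in tasks:
    for day, piece in split_by_day(start, end):
        hours[(day, task_id)] += piece

for (day, task_id), total in sorted(hours.items()):
    print(day, task_id, total)
```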
Join the date you need to the date dimension to get the calendar date, and show that date in the report.
As for the hours, compute them when you retrieve your dataset in SQL. It is as simple as:
cast(datediff(MINUTE,'2019-03-18 16:30:00','2019-03-19 09:00:00') /60.0 as decimal(13,1)) as 'Hours'
So in your case it would be
cast(datediff(MINUTE,sometable.startdate,sometable.enddate) /60.0 as decimal(13,1)) as 'Hours'
Using DATEDIFF with HOUR returns only whole hours, and dividing the minutes by the integer 60 is integer division, which also returns a whole number. Hence the /60.0 and the cast.

SQLite: Sum of differences between two dates group by every date

I have a SQLite database with start and stop datetimes
With the following SQL query I get the difference hours between start and stop:
SELECT starttime, stoptime, cast((strftime('%s',stoptime)-strftime('%s',starttime)) AS real)/60/60 AS diffHours FROM tracktime;
I need a SQL query, which delivers the sum of multiple timestamps, grouped by every day (also whole dates between timestamps).
The result should be something like this:
2018-08-01: 12 hours
2018-08-02: 24 hours
2018-08-03: 12 hours
2018-08-04: 0 hours
2018-08-05: 1 hours
2018-08-06: 14 hours
2018-08-07: 8 hours
You can try this: use a recursive CTE to build a calendar row for every date between the start time and the end time, then calculate the hours for each row.
Schema (SQLite v3.18)
CREATE TABLE tracktime(
id int,
starttime timestamp,
stoptime timestamp
);
insert into tracktime values
(11,'2018-08-01 12:00:00','2018-08-03 12:00:00');
insert into tracktime values
(12,'2018-09-05 18:00:00','2018-09-05 19:00:00');
Query #1
WITH RECURSIVE cte AS (
select id,starttime,date(starttime,'+1 day') totime,stoptime
from tracktime
UNION ALL
SELECT id,
date(starttime,'+1 day'),
date(totime,'+1 day'),
stoptime
FROM cte
WHERE date(starttime,'+1 day') < stoptime
)
SELECT strftime('%Y-%m-%d', starttime),(strftime('%s',CASE
WHEN totime > stoptime THEN stoptime
ELSE totime
END) -strftime('%s',starttime))/3600 diffHour
FROM cte;
| strftime('%Y-%m-%d', starttime) | diffHour |
| ------------------------------- | -------- |
| 2018-08-01 | 12 |
| 2018-09-05 | 1 |
| 2018-08-02 | 24 |
| 2018-08-03 | 12 |
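The query above returns one row per interval piece rather than one row per day. A possible way to get the per-day sum the question asks for is to wrap the same recursive CTE in a GROUP BY, sketched here with Python's built-in sqlite3 module so it can be run as-is. The third INSERT row is hypothetical and only there to show two pieces landing on the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tracktime(id int, starttime timestamp, stoptime timestamp);
INSERT INTO tracktime VALUES
  (11, '2018-08-01 12:00:00', '2018-08-03 12:00:00'),
  (12, '2018-09-05 18:00:00', '2018-09-05 19:00:00'),
  -- hypothetical extra row, not part of the original answer
  (13, '2018-08-03 20:00:00', '2018-08-03 22:00:00');
""")

query = """
WITH RECURSIVE cte AS (
  SELECT id, starttime, date(starttime, '+1 day') AS totime, stoptime
  FROM tracktime
  UNION ALL
  SELECT id, date(starttime, '+1 day'), date(totime, '+1 day'), stoptime
  FROM cte
  WHERE date(starttime, '+1 day') < stoptime
)
SELECT date(starttime) AS day,
       SUM((strftime('%s', CASE WHEN totime > stoptime THEN stoptime
                                ELSE totime END)
            - strftime('%s', starttime)) / 3600.0) AS hours
FROM cte
GROUP BY day
ORDER BY day;
"""

daily = conn.execute(query).fetchall()
for day, total in daily:
    print(day, total)
```

Days on which nothing was tracked do not appear in this result; listing them with 0 hours would need a separate calendar of dates to join against.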

Summing counts based on overlapping intervals in postgres

I want to sum the column for every two minute interval (so it would be the sum of 1,2 and 2,3 and 3,4, etc...), but I'm not exactly sure how to go about doing that.
My data looks something like:
minute | source | count
2018-01-01 10:00 | a | 7
2018-01-01 10:01 | a | 5
2018-01-01 10:02 | a | 10
2018-01-01 10:00 | b | 20
2018-01-01 10:05 | a | 12
What I want
(e.g. row1+row2, row2+3, row3, row4, row5)
minute | source | count
2018-01-01 10:00 | a | 12
2018-01-01 10:01 | a | 15
2018-01-01 10:02 | a | 10
2018-01-01 10:00 | b | 20
2018-01-01 10:05 | a | 12
You can use a correlated subquery that selects the sum of the counts for the records in the interval that share the source. (I assume that matching on the source is a requirement. If not, just remove that comparison from the WHERE clause.)
SELECT "t1"."minute",
"t1"."source",
(SELECT sum("t2"."count")
FROM "elbat" "t2"
WHERE "t2"."source" = "t1"."source"
AND "t2"."minute" >= "t1"."minute"
AND "t2"."minute" <= "t1"."minute" + INTERVAL '1 MINUTE') "count"
FROM "elbat" "t1";
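For comparison, the correlated subquery can be mimicked in a few lines of Python: for each row, sum the counts of all rows with the same source whose minute lies between that row's minute and one minute later, inclusive. The function name windowed_sums is hypothetical; the data is the sample from the question.

```python
from datetime import datetime, timedelta

# (minute, source, count) rows from the question.
rows = [
    (datetime(2018, 1, 1, 10, 0), "a", 7),
    (datetime(2018, 1, 1, 10, 1), "a", 5),
    (datetime(2018, 1, 1, 10, 2), "a", 10),
    (datetime(2018, 1, 1, 10, 0), "b", 20),
    (datetime(2018, 1, 1, 10, 5), "a", 12),
]


def windowed_sums(rows, width=timedelta(minutes=1)):
    """For each row, sum counts of same-source rows in [minute, minute + width]."""
    result = []
    for minute, source, _ in rows:
        total = sum(
            count
            for other_minute, other_source, count in rows
            if other_source == source and minute <= other_minute <= minute + width
        )
        result.append((minute, source, total))
    return result


sums = windowed_sums(rows)
for minute, source, total in sums:
    print(minute, source, total)
```

Like the subquery, this compares every row with every other row, which is fine for a small table but grows quadratically with the row count.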
The post above assumes all the timestamps are to the minute. If you want to check every 2 minutes throughout the day, you can use the generate_series function. Because each interval includes both its beginning and its ending minute, a row that falls exactly on a boundary is counted in two intervals, so b will have 2 rows in the results.
For example:
select begintime,
endtime,
source,
sum(count)
from mytable
inner join (
select begintime, endtime
from (
select lag(time, 1) over (order by time) as begintime,
time as endtime
from (
select *
from generate_series('2018-01-01 00:00:00', '2018-01-02 00:00:00', interval '2 minutes') time
) q
) q2
where begintime is not null
) times on minute between begintime and endtime
group by begintime, endtime, source
order by begintime, endtime, source
You can change 'minute between begintime and endtime' to 'minute > begintime and minute <= endtime' if you don't want that overlap.

Filling Out & Filtering Irregular Time Series Data

Using Postgresql 9.4, I am trying to craft a query on time series log data that logs new values whenever the value updates (not on a schedule). The log can update anywhere from several times a minute to once a day.
I need the query to accomplish the following:
Filter too much data by just selecting the first entry for the timestamp range
Fill in sparse data by using the last reading for the log value. For example, if I am grouping the data by hour and there was an entry at 8am with a log value of 10. Then the next entry isn't until 11am with a log value of 15, I would want the query to return something like this:
Timestamp | Value
2015-07-01 08:00 | 10
2015-07-01 09:00 | 10
2015-07-01 10:00 | 10
2015-07-01 11:00 | 15
I have got a query that accomplishes the first of these goals:
with time_range as (
select hour
from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
select
date_trunc('hour', time_stamp) as log_hour,
log_val,
rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
from time_series
)
select
time_range.hour,
ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1;
But I can't figure out how to fill in the nulls where there is no value. I tried using the lag() feature of Postgresql's Window functions, but it didn't work when there were multiple nulls in a row.
Here's a SQLFiddle that demonstrates the issue:
http://sqlfiddle.com/#!15/f4d13/5/0
The columns you want are log_hour and first_value:
with time_range as (
select hour
from generate_series('2015-07-01 00:00'::timestamp, '2015-07-02 00:00'::timestamp, '1 hour') as hour
),
ranked_logs as (
select
date_trunc('hour', time_stamp) as log_hour,
log_val,
rank() over (partition by date_trunc('hour', time_stamp) order by time_stamp asc)
from time_series
),
base as (
select
time_range.hour lh,
ranked_logs.log_val
from time_range
left outer join ranked_logs on ranked_logs.log_hour = time_range.hour and ranked_logs.rank = 1)
SELECT
log_hour, log_val, value_partition, first_value(log_val) over (partition by value_partition order by log_hour)
FROM (
SELECT
date_trunc('hour', base.lh) as log_hour,
log_val,
sum(case when log_val is null then 0 else 1 end) over (order by base.lh) as value_partition
FROM base) as q
UPDATE
This is what your query returns:
Timestamp | Value
2015-07-01 01:00 | 10
2015-07-01 02:00 | null
2015-07-01 03:00 | null
2015-07-01 04:00 | 15
2015-07-01 05:00 | null
2015-07-01 06:00 | 19
2015-07-01 08:00 | 13
I want this result set to be split into groups like this:
2015-07-01 01:00 | 10
2015-07-01 02:00 | null
2015-07-01 03:00 | null

2015-07-01 04:00 | 15
2015-07-01 05:00 | null

2015-07-01 06:00 | 19

2015-07-01 08:00 | 13
and to assign to every row in a group the value of the first row of that group (done by the last select).
In this case, one way to obtain the grouping is to create a column that holds the number of non-null values counted up to the current row, and to split by this value (the sum(case) expression).
value | sum(case)
10 | 1
null | 1
null | 1
15 | 2 <-- new non-null value, increment
null | 2
19 | 3 <-- new non-null value, increment
13 | 4 <-- new non-null value, increment
Now I can partition by sum(case).
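The same trick is easy to try outside the database. The Python sketch below computes the running count of non-null values, groups rows by it, and gives every row in a group the group's first value. The variable names are made up for illustration; the input is the value column from the table above.

```python
from itertools import accumulate, groupby

# The "value" column from the example; None stands in for SQL null.
values = [10, None, None, 15, None, 19, 13]

# Running count of non-null values, like
# sum(case when log_val is null then 0 else 1 end) over (order by ...).
group_ids = list(accumulate(0 if v is None else 1 for v in values))

# first_value(...) over (partition by group id): each group starts with its
# only non-null value, so copy that value to every row in the group.
filled = []
for _, members in groupby(zip(group_ids, values), key=lambda pair: pair[0]):
    members = list(members)
    first_value = members[0][1]
    filled.extend(first_value for _ in members)

print(group_ids)
print(filled)
```

Rows that come before the first non-null value end up in group 0 and stay null, which is also what the SQL version does.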