Generate 15 minute date intervals and join matching rows - sql

What I would like to do is get the 15-minute intervals covered by a date range stored in a row and insert them into another table.
The following code generates the intervals for a single date range, which is part of my goal:
DECLARE @Table1 TABLE (ID INT IDENTITY(0,1), TIMEVALUE DATETIME, TIMEVALUE2 DATETIME);
DECLARE @start DATETIME2(7) = '2018-01-04 10:55:00';
DECLARE @end DATETIME2(7) = '2018-01-05 03:55:00';
SELECT @start = dateadd(minute, datediff(minute,0,@start) / 15 * 15, 0);
WITH CTE_DT AS
(
SELECT @start AS DT
UNION ALL
SELECT DATEADD(MINUTE,15,DT) FROM CTE_DT
WHERE DT< @end
)
INSERT INTO @Table1
SELECT DT, DATEADD(minute,14,dt) FROM CTE_DT
OPTION (MAXRECURSION 0);
SELECT * FROM @Table1
result:
ID TIMEVALUE TIMEVALUE2
0 2018-01-04 10:45:00.000 2018-01-04 10:59:00.000
1 2018-01-04 11:00:00.000 2018-01-04 11:14:00.000
2 2018-01-04 11:15:00.000 2018-01-04 11:29:00.000
3 2018-01-04 11:30:00.000 2018-01-04 11:44:00.000
4 2018-01-04 11:45:00.000 2018-01-04 11:59:00.000
5 2018-01-04 12:00:00.000 2018-01-04 12:14:00.000
6 2018-01-04 12:15:00.000 2018-01-04 12:29:00.000
7 2018-01-04 12:30:00.000 2018-01-04 12:44:00.000
8 2018-01-04 12:45:00.000 2018-01-04 12:59:00.000
..
..
What I want to accomplish is to apply the same logic to rows from a record source.
So if my SourceData is
Col1 Col2 StartDate EndDate
AA AA 2018-01-01 13:25 2018-01-02 13:00
AA BB 2018-01-02 13:25 2018-01-03 13:00
I would like a single query that uses StartDate and EndDate to produce this result:
Col1 Col2 TIMEVALUE TIMEVALUE2
AA AA 2018-01-01 13:15:00 2018-01-01 13:29:00
AA AA 2018-01-01 13:30:00 2018-01-01 13:44:00
AA AA 2018-01-01 13:45:00 2018-01-01 13:59:00
...
...
AA AA 2018-01-02 12:30:00 2018-01-02 12:44:00
AA AA 2018-01-02 12:45:00 2018-01-02 12:59:00
AA AA 2018-01-02 13:00:00 2018-01-02 13:14:00
AA BB 2018-01-02 13:15:00 2018-01-02 13:29:00
AA BB 2018-01-02 13:30:00 2018-01-02 13:44:00
AA BB 2018-01-02 13:45:00 2018-01-02 13:59:00
...
...
AA BB 2018-01-03 12:30:00 2018-01-03 12:44:00
AA BB 2018-01-03 12:45:00 2018-01-03 12:59:00
AA BB 2018-01-03 13:00:00 2018-01-03 13:14:00
I would like to avoid using a cursor if I can. I have managed to make this work with a user-defined function by passing the required columns in the SELECT statement, but I am hoping to avoid that as well.

Change the end of the first query so that instead of
SELECT * FROM @Table1
it says:
SELECT * FROM @Table1 d
INNER JOIN SourceData sd
ON NOT(d.timevalue2 < sd.startdate OR d.timevalue > sd.enddate)
Consider taking your first query that generates the dates, running it once out to the year 2030, and inserting the dates into an actual table. Keep the query around so it can be used again in ~11 years to add more rows to the calendar table.
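The join condition above is an interval-overlap test. As a minimal sketch in Python (the function name `overlaps` and its parameters are hypothetical, chosen only for illustration), the same predicate can be rewritten with De Morgan's law and checked against the sample row from the question:

```python
from datetime import datetime


def overlaps(slot_start, slot_end, range_start, range_end):
    """True when the 15-minute slot and the source date range share any instant.

    Equivalent to NOT (slot_end < range_start OR slot_start > range_end).
    """
    return slot_end >= range_start and slot_start <= range_end


# Source row AA/AA from the question: 2018-01-01 13:25 to 2018-01-02 13:00
range_start = datetime(2018, 1, 1, 13, 25)
range_end = datetime(2018, 1, 2, 13, 0)

# The 13:15-13:29 slot contains the start time, so it matches
first_slot = overlaps(datetime(2018, 1, 1, 13, 15), datetime(2018, 1, 1, 13, 29),
                      range_start, range_end)
# The 13:00-13:14 slot on the first day ends before the range starts
earlier_slot = overlaps(datetime(2018, 1, 1, 13, 0), datetime(2018, 1, 1, 13, 14),
                        range_start, range_end)
# The 13:00-13:14 slot on the second day starts exactly at the end time
last_slot = overlaps(datetime(2018, 1, 2, 13, 0), datetime(2018, 1, 2, 13, 14),
                     range_start, range_end)
```

Because both comparisons are inclusive, a slot that merely touches the start or end of the range is kept, which is what the expected output in the question shows.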

Rather than using an rCTE (which is a form of RBAR), I would use a virtual tally table to generate your dates:
--; is a statement terminator, not a "beginninator". It goes at the end, not the start.
WITH N AS(
SELECT NULL AS N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS (
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1 --10
CROSS JOIN N N2 --100
CROSS JOIN N N3 --1000
CROSS JOIN N N4 --10000
)
SELECT DATEADD(MINUTE,15*I,@Start)
FROM Tally
WHERE DATEADD(MINUTE,15*I,@Start) < @End;
If you want to generate 15 minute intervals for a combination of columns, you can then do a further CROSS JOIN. For example:
--Assume CTEs are already declared
SELECT V.Col1, V.Col2,
DATEADD(MINUTE,15*T.I,@Start)
FROM Tally T
CROSS JOIN (VALUES('AA','AA'),('AA','BB')) V(Col1, Col2) --This could be a CROSS APPLY to a DISTINCT, or similar if you wish
WHERE DATEADD(MINUTE,15*T.I,@Start) < @End;
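For comparison, here is a minimal Python sketch of the per-row expansion the question asks for. It is not a T-SQL solution, only an illustration of the logic. It assumes each source row is a (Col1, Col2, StartDate, EndDate) tuple, floors the start to a quarter hour, and keeps every slot whose start is on or before EndDate, which matches the expected output in the question. The names `floor_to_quarter_hour` and `expand_intervals` are hypothetical.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)


def floor_to_quarter_hour(ts):
    """Round a datetime down to the previous 15-minute boundary."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def expand_intervals(rows):
    """Yield one (col1, col2, slot_start, slot_end) tuple per 15-minute slot."""
    out = []
    for col1, col2, start, end in rows:
        slot = floor_to_quarter_hour(start)
        # Assumption: a slot is kept when it starts on or before the end time
        while slot <= end:
            out.append((col1, col2, slot, slot + timedelta(minutes=14)))
            slot += STEP
    return out


# The two source rows from the question
source = [
    ("AA", "AA", datetime(2018, 1, 1, 13, 25), datetime(2018, 1, 2, 13, 0)),
    ("AA", "BB", datetime(2018, 1, 2, 13, 25), datetime(2018, 1, 3, 13, 0)),
]
result = expand_intervals(source)
```

Each source row here spans 23 hours 45 minutes after flooring, so it expands to 96 slots, the first starting at 13:15 and the last at 13:00 on the following day.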

Related

pandas groupby not using for loop (how to make smart)

Suppose you have a pandas Series like this.
a = pd.Series(range(31),index = pd.date_range('2018-01-01','2018-01-31',freq='D'))
Suppose you want to build a grouped DataFrame with a MultiIndex like this:
data
date
2018-01-01 2018-01-01 0
2018-01-02 1
2018-01-03 2
2018-01-04 3
2018-01-05 4
2018-01-02 2018-01-02 1
2018-01-03 2
2018-01-04 3
2018-01-05 4
2018-01-06 5
2018-01-03 2018-01-03 2
2018-01-04 3
2018-01-05 4
2018-01-06 5
2018-01-07 6
.....
The first level of the MultiIndex holds the original datetime index. The second level holds the 5-day window starting at that date.
For example, if the first level is 2018-01-01, the second level runs from 2018-01-01 to 2018-01-05.
If the first level is 2018-01-15, the second level runs from 2018-01-15 to 2018-01-19 and the data is 14, 15, 16, 17, 18.
How can I make this DataFrame or Series without any loop?
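Use -
import numpy as np
import pandas as pd
# Repeat each date 5 times to form the first index level
first = np.repeat(a.index.values,5)
# Each date plus 0..4 days forms the second index level
second = [ i + np.timedelta64(j,'D') for i in a.index for j in range(5)]
arrays = [first, second]
print(np.shape(second))
# Look up each second-level date in a; dates past the end of a become NaN
d=pd.DataFrame({'value': a.reindex(second).values}, index=pd.MultiIndex.from_arrays(arrays, names=('date1', 'date2')))
Output (d.head(10))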
value
date1 date2
2018-01-01 2018-01-01 0.0
2018-01-02 1.0
2018-01-03 2.0
2018-01-04 3.0
2018-01-05 4.0
2018-01-02 2018-01-02 1.0
2018-01-03 2.0
2018-01-04 3.0
2018-01-05 4.0
2018-01-06 5.0
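The list comprehension in the answer is still a Python-level loop. A possible fully vectorized variant, offered as a sketch rather than the answerer's method, builds the second level by adding a tiled array of day offsets to the repeated dates. The names `window`, `offsets` and the column name `value` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# The Series from the question
a = pd.Series(range(31), index=pd.date_range('2018-01-01', '2018-01-31', freq='D'))

window = 5  # number of days in each second-level block

# First level: every original date repeated `window` times
first = np.repeat(a.index.values, window)
# Offsets 0..window-1 days, tiled once per original date
offsets = np.tile(np.arange(window), len(a)).astype('timedelta64[D]')
# Second level: each date plus its offset
second = first + offsets

# Dates beyond the end of `a` have no value and come back as NaN
d = pd.DataFrame(
    {'value': a.reindex(second).to_numpy()},
    index=pd.MultiIndex.from_arrays([first, second], names=('date1', 'date2')),
)
```

Because the lookup introduces NaN for dates after 2018-01-31, the `value` column is float, as in the output above.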

How can I delete from a table using the join and where clause? [duplicate]

This question already has answers here:
How can I delete using INNER JOIN with SQL Server?
The following is my code so far. I would like to delete from the events table the rows that the select query returns.
declare @maxsnapevent table (sita varchar(10), snap date)
insert into @maxsnapevent
select sita, max(snapshot_date) from ukrmc.dbo.strategy group by sita
--select * from @maxsnapevent
--need to delete everything that the following code gives
select events.sita, events.date, events.event from ukrmc.dbo.events events
join @maxsnapevent max on max.sita = events.sita
where date >= max.snap and events.sita != 'lcypd' and events.sita != 'lonza'
example data from the strategy table:
date sita Event Snapshot_date
2018-01-01 London Bank Holiday 2017-12-31
2018-01-02 London 2017-12-31
2018-01-03 London 2017-12-31
2018-01-04 London Concert 2017-12-31
2018-01-02 London 2018-01-01
2018-01-03 London 2018-01-01
2018-01-04 London Concert 2018-01-01
2018-01-01 Bham Bank Holiday 2017-12-31
2018-01-02 Bham 2017-12-31
2018-01-03 Bham 2017-12-31
2018-01-04 Bham 2017-12-31
2018-01-02 Bham 2018-01-01
2018-01-03 Bham Charity 2018-01-01
2018-01-04 Bham 2018-01-01
example data from the events table:
date sita Event
2018-01-01 London Bank Holiday
2018-01-02 London
2018-01-03 London
2018-01-04 London Concert
2018-01-01 Bham Bank Holiday
2018-01-02 Bham
2018-01-03 Bham
2018-01-04 Bham Concert
As you can see each snapshot has several sitas with several dates.
Have you tried the following?
delete events
from ukrmc.dbo.events events
join @maxsnapevent max on max.sita = events.sita
where date >= max.snap and events.sita != 'lcypd' and events.sita != 'lonza'

Consolidate overlapping time periods stored as start and end time

I have a DB table with two columns of interest: the first is the start (or On) time, the second the end (or Off) time, for numerous events within a given physical area.
The requirement is to identify the unique periods of time during which a vehicle was within the area, so the start of the first event to the end of the last as a continuous period. The number of on or off events in the period is not needed for the resulting table.
There are millions of rows, so a join may cause problems due to the size of the resulting table. I'm not against it but...
Data :
id timeOn timeOff
761058840 2018-01-02 07:54:28.000 2018-01-02 08:33:34.000
761058840 2018-01-02 07:54:28.000 2018-01-02 08:36:30.000
761058840 2018-01-02 08:33:45.000 2018-01-02 08:35:30.000
761058840 2018-01-02 13:11:18.000 2018-01-02 13:14:04.000
761058840 2018-01-02 13:11:18.000 2018-01-02 13:39:40.000
761058840 2018-01-02 13:22:11.000 2018-01-02 13:40:25.000
761058840 2018-01-02 15:56:18.000 2018-01-02 15:59:34.000
761058840 2018-01-02 15:56:18.000 2018-01-02 16:36:25.000
761058840 2018-01-02 16:01:34.000 2018-01-02 16:05:34.000
761058840 2018-01-02 16:33:19.000 2018-01-02 16:38:26.000
761058840 2018-01-02 21:20:25.000 2018-01-02 21:24:25.000
761058840 2018-01-02 22:20:36.000 2018-01-03 05:20:37.000
761058840 2018-01-02 22:20:36.000 2018-01-03 05:20:37.000
761058840 2018-01-03 08:31:29.000 2018-01-03 09:01:10.000
761058840 2018-01-03 08:31:59.000 2018-01-03 09:01:07.000
761058840 2018-01-03 09:01:57.000 2018-01-03 09:06:27.000
761058840 2018-01-03 14:07:27.000 2018-01-03 14:17:32.000
761058840 2018-01-03 14:09:28.000 2018-01-03 14:45:00.000
761058840 2018-01-03 14:19:32.000 2018-01-03 14:48:22.000
761058840 2018-01-03 17:30:38.000 2018-01-03 18:06:35.000
761058840 2018-01-03 17:33:54.000 2018-01-03 18:09:48.000
Given the rows above, what I'm looking for is:
761058840 2018-01-02 07:54:28.000 2018-01-02 08:36:30.000
761058840 2018-01-02 13:11:18.000 2018-01-02 13:40:25.000
761058840 2018-01-02 15:56:18.000 2018-01-02 16:38:26.000
761058840 2018-01-02 21:20:25.000 2018-01-02 21:24:25.000
761058840 2018-01-02 22:20:36.000 2018-01-03 05:20:37.000
761058840 2018-01-03 08:31:59.000 2018-01-03 09:01:07.000
761058840 2018-01-03 09:01:57.000 2018-01-03 09:06:27.000
761058840 2018-01-03 14:07:27.000 2018-01-03 14:48:22.000
761058840 2018-01-03 17:30:38.000 2018-01-03 18:09:48.000
Other solutions I have found work on dates, whereas I have multiple events within an hour. Others categorize the events into periods (binning them by hour).
There doesn't seem to be anything for continuous times.
The DB is SQL Server, so T-SQL or ANSI SQL would be ideal, but I'm prepared for a bit of translation.
(Edit for clarification: I'm trying to combine each sequence of overlapping times from timeOn to timeOff into a single row per continuous sequence.)
This is the classic problem of merging overlapping intervals. The simplest solution is to order the data by start point and group the rows, starting a new group each time there is a gap between the maximum end point over all previous rows and the start point of the current row.
The following solution is based on this idea (I used ROWS BETWEEN ... instead of LAG):
WITH t_with_change AS (
SELECT id, timeOn, timeOff,
CASE WHEN MAX(timeOff) OVER (PARTITION BY ID ORDER BY timeOn ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) >= timeOn
THEN 0 ELSE 1 END AS chg
FROM #t
), t_with_groups AS(
SELECT id, timeOn, timeOff, SUM(chg) OVER (PARTITION BY ID ORDER BY timeOn) AS grp
FROM t_with_change
)
SELECT id, grp, MIN(timeOn) AS timeOn, MAX(timeOff) AS timeOff
FROM t_with_groups
GROUP BY id, grp
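The same idea (sort by start, then open a new group whenever the current start lies beyond the running maximum end) can be sketched outside SQL. This Python version is only an illustration under stated assumptions: rows are (id, timeOn, timeOff) tuples, the function name `merge_intervals` is hypothetical, and the sample uses a subset of the rows from the question.

```python
from datetime import datetime


def merge_intervals(rows):
    """Merge overlapping (id, time_on, time_off) rows into continuous periods."""
    merged = []
    # Sorting tuples orders by id first, then by start time
    for vid, on, off in sorted(rows):
        if merged and merged[-1][0] == vid and on <= merged[-1][2]:
            # Start is within the current period: extend its end if needed
            merged[-1][2] = max(merged[-1][2], off)
        else:
            # Gap found (or new id): open a new period
            merged.append([vid, on, off])
    return [tuple(m) for m in merged]


def ts(text):
    return datetime.fromisoformat(text)


sample = [
    (761058840, ts("2018-01-02 07:54:28"), ts("2018-01-02 08:33:34")),
    (761058840, ts("2018-01-02 07:54:28"), ts("2018-01-02 08:36:30")),
    (761058840, ts("2018-01-02 08:33:45"), ts("2018-01-02 08:35:30")),
    (761058840, ts("2018-01-02 13:11:18"), ts("2018-01-02 13:14:04")),
    (761058840, ts("2018-01-02 13:11:18"), ts("2018-01-02 13:39:40")),
    (761058840, ts("2018-01-02 13:22:11"), ts("2018-01-02 13:40:25")),
    (761058840, ts("2018-01-02 21:20:25"), ts("2018-01-02 21:24:25")),
]
periods = merge_intervals(sample)
```

Like the `>=` comparison in the query, the `<=` test treats an interval that starts exactly where the previous period ends as part of that period.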

SAS - PROC SQL: two tables: each one column distinct value, left join

I have a table with distinct dates YYYYMMDD from 20000101 until 20001231 and a table with distinct time points (HH:MM:SS) from 09:30:00 until 16:00:00.
I would like to create a (left) join where every day gets repeated 391 times assigned with each time point. That looks to me like a left join, however, I do not have any id's for joining.
date time
20000101 09:30:00
20000101 09:31:00
20000101 ...
20000101 ...
20000101 15:59:00
20000101 16:00:00
20000102 09:30:00
20000102 ...
20000102 16:00:00
What would the respective code look like (if there is no explicit common primary key to join on)?
PROC SQL;
SELECT DISTINCT a.date, b.time
FROM table_1 a, table_1 b /* both columns are in the same table */
;
QUIT;
Just as background: there are days that are "shorter", with fewer than 391 observation points. However, I would like to make sure every day has 391 observation points, filled up with missing values where needed.
You need a Cartesian product, since you want to generate all combinations of date and time. To produce it, use a CROSS JOIN, which does not take a join condition.
Try the below query:
PROC SQL;
SELECT a.date, b.time
FROM table_1 a
CROSS JOIN
table_1 b
GROUP BY a.date, b.time
;
QUIT;
OR
PROC SQL;
SELECT a.date, b.time
FROM (SELECT date FROM table_1) a
CROSS JOIN
(SELECT time FROM table_1) b
GROUP BY a.date, b.time
;
QUIT;
For more info on CROSS JOIN Follow the below link:
http://support.sas.com/documentation/cdl/en/fedsqlref/67364/HTML/default/viewer.htm#p1q7agzgxs9ik5n1p7k3sdft0u9u.htm
You can do either a Left Join or a Join with the condition 1=1; this will create the Cartesian product for you:
Code:
proc sql;
create table want as
select t1.date, t2.time
from t1 left join t2 on 1=1
order by date, time;
quit;
To show all observed times (over all dates) for each date, as well as maintaining original satellite information I would use a reflexive cross join of the combinatoric columns for the basis of a reflexive left join.
Consider this sample data generator. It simulates the case of data being gathered at different intervals (every 10 or 20 minutes) on different days.
data have;
do i = 1 to 5;
date = '01-apr-2018'd + (i-1);
do j = 0 to 4;
time = '12:00't + (mod(i,2)+1) * 600 * j; * every other day sample at 10 or 20 minute interval;
x = ceil ( 25 * ranuni(123) );
OUTPUT;
end;
end;
format date yymmdd10. time time8.;
keep date time x;
run;
SQL is used to cross join the distinct dates and times, and then the original data is left joined to the cross join.
proc sql;
create table cross_as_left_basis
as
select
cross.date
, cross.time
, have.x
from
( select distinct dates.date, times.time
from have as dates
cross join have as times
) as
cross
left join
have
on
cross.date = have.date
and cross.time = have.time
;
Have is
date time x
2018-04-01 12:00:00 19
12:20:00 9
12:40:00 5
13:00:00 23
13:20:00 9
2018-04-02 12:00:00 6
12:10:00 20
12:20:00 10
12:30:00 4
12:40:00 5
2018-04-03 12:00:00 20
12:20:00 11
12:40:00 25
13:00:00 7
13:20:00 18
2018-04-04 12:00:00 14
12:10:00 14
12:20:00 22
12:30:00 4
12:40:00 22
2018-04-05 12:00:00 17
12:20:00 20
12:40:00 18
13:00:00 9
13:20:00 14
The join result is
date time x
2018-04-01 12:00:00 19
12:10:00 .
12:20:00 9
12:30:00 .
12:40:00 5
13:00:00 23
13:20:00 9
2018-04-02 12:00:00 6
12:10:00 20
12:20:00 10
12:30:00 4
12:40:00 5
13:00:00 .
13:20:00 .
2018-04-03 12:00:00 20
12:10:00 .
12:20:00 11
12:30:00 .
12:40:00 25
13:00:00 7
13:20:00 18
2018-04-04 12:00:00 14
12:10:00 14
12:20:00 22
12:30:00 4
12:40:00 22
13:00:00 .
13:20:00 .
2018-04-05 12:00:00 17
12:10:00 .
12:20:00 20
12:30:00 .
12:40:00 18
13:00:00 9
13:20:00 14
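The cross-join-then-left-join pattern can also be illustrated outside SAS. The following Python sketch assumes the observations are held in a dict keyed by (date, time); the function name `complete_grid` and the tiny sample are made up for the example, and `None` plays the role of the SAS missing value.

```python
from itertools import product


def complete_grid(observations):
    """Return one (date, time, value) row for every date/time combination.

    observations maps (date, time) -> value; combinations that were never
    observed get None, like the missing values after the left join.
    """
    dates = sorted({d for d, _ in observations})
    times = sorted({t for _, t in observations})
    # product() is the Cartesian product (cross join) of the distinct values;
    # dict.get() is the left join back onto the original data
    return [(d, t, observations.get((d, t))) for d, t in product(dates, times)]


have = {
    ("2018-04-01", "12:00:00"): 19,
    ("2018-04-01", "12:20:00"): 9,
    ("2018-04-02", "12:00:00"): 6,
    ("2018-04-02", "12:10:00"): 20,
}
grid = complete_grid(have)
```

With two distinct dates and three distinct times, `grid` has six rows, two of which carry `None`.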

PostgreSQL group by with interval but without window functions

This is a follow-up to my previous question:
PostgreSQL group by with interval
There was a very good answer, but unfortunately it does not work with PostgreSQL 8.0, which some clients still use.
So I need to find another solution that does not use window functions.
Here is what I have as a table:
id quantity price1 price2 date
1 100 1 0 2018-01-01 10:00:00
2 200 1 0 2018-01-02 10:00:00
3 50 5 0 2018-01-02 11:00:00
4 100 1 1 2018-01-03 10:00:00
5 100 1 1 2018-01-03 11:00:00
6 300 1 0 2018-01-03 12:00:00
I need to sum "quantity" grouped by "price1" and "price2", but only over consecutive rows, starting a new group each time the prices change.
So the end result should look like this:
quantity price1 price2 dateStart dateEnd
300 1 0 2018-01-01 10:00:00 2018-01-02 10:00:00
50 5 0 2018-01-02 11:00:00 2018-01-02 11:00:00
200 1 1 2018-01-03 10:00:00 2018-01-03 11:00:00
300 1 0 2018-01-03 12:00:00 2018-01-03 12:00:00
It is not efficient, but you can implement the same logic with subqueries:
select sum(quantity), price1, price2,
min(date) as dateStart, max(date) as dateend
from (select d.*,
(select count(*)
from data d2
where d2.date <= d.date
) as seqnum,
(select count(*)
from data d2
where d2.price1 = d.price1 and d2.price2 = d.price2 and d2.date <= d.date
) as seqnum_pp
from data d
) t
group by price1, price2, (seqnum - seqnum_pp)
order by dateStart
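To see why the difference of the two counts identifies each run of unchanged prices, here is a small Python sketch that mirrors the subqueries on the sample table. It is an illustration only: the function name `summarize` is hypothetical, and dates are kept as ISO-formatted strings, which compare correctly as text.

```python
from collections import defaultdict

rows = [
    # (id, quantity, price1, price2, date)
    (1, 100, 1, 0, "2018-01-01 10:00:00"),
    (2, 200, 1, 0, "2018-01-02 10:00:00"),
    (3, 50, 5, 0, "2018-01-02 11:00:00"),
    (4, 100, 1, 1, "2018-01-03 10:00:00"),
    (5, 100, 1, 1, "2018-01-03 11:00:00"),
    (6, 300, 1, 0, "2018-01-03 12:00:00"),
]


def summarize(rows):
    groups = defaultdict(list)
    for _, qty, p1, p2, date in rows:
        # seqnum: number of rows up to and including this date
        seqnum = sum(1 for r in rows if r[4] <= date)
        # seqnum_pp: the same count restricted to this row's price pair
        seqnum_pp = sum(1 for r in rows
                        if r[2] == p1 and r[3] == p2 and r[4] <= date)
        # The difference stays constant while the prices do not change
        groups[(p1, p2, seqnum - seqnum_pp)].append((qty, date))
    result = [
        (sum(q for q, _ in members), p1, p2,
         min(d for _, d in members), max(d for _, d in members))
        for (p1, p2, _), members in groups.items()
    ]
    return sorted(result, key=lambda r: r[3])


summary = summarize(rows)
```

Like the correlated subqueries, this sketch does quadratic work in the number of rows, so it shares the efficiency caveat mentioned in the answer.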