Consolidate overlapping time periods stored as start and end time - sql

I have a DB table with two columns of interest: the first is the start (or On) time and the second the end (or Off) time of numerous events within a given physical area.
The requirement is to identify the unique period of time during which a vehicle was within the area, i.e. from the start of the first event to the end of the last as one continuous period. The number of On or Off events within the period is not needed in the resulting table.
There are millions of rows, so a join may cause problems due to the size of the resulting table. I'm not against it, but...
Data:
id timeOn timeOff
761058840 2018-01-02 07:54:28.000 2018-01-02 08:33:34.000
761058840 2018-01-02 07:54:28.000 2018-01-02 08:36:30.000
761058840 2018-01-02 08:33:45.000 2018-01-02 08:35:30.000
761058840 2018-01-02 13:11:18.000 2018-01-02 13:14:04.000
761058840 2018-01-02 13:11:18.000 2018-01-02 13:39:40.000
761058840 2018-01-02 13:22:11.000 2018-01-02 13:40:25.000
761058840 2018-01-02 15:56:18.000 2018-01-02 15:59:34.000
761058840 2018-01-02 15:56:18.000 2018-01-02 16:36:25.000
761058840 2018-01-02 16:01:34.000 2018-01-02 16:05:34.000
761058840 2018-01-02 16:33:19.000 2018-01-02 16:38:26.000
761058840 2018-01-02 21:20:25.000 2018-01-02 21:24:25.000
761058840 2018-01-02 22:20:36.000 2018-01-03 05:20:37.000
761058840 2018-01-02 22:20:36.000 2018-01-03 05:20:37.000
761058840 2018-01-03 08:31:29.000 2018-01-03 09:01:10.000
761058840 2018-01-03 08:31:59.000 2018-01-03 09:01:07.000
761058840 2018-01-03 09:01:57.000 2018-01-03 09:06:27.000
761058840 2018-01-03 14:07:27.000 2018-01-03 14:17:32.000
761058840 2018-01-03 14:09:28.000 2018-01-03 14:45:00.000
761058840 2018-01-03 14:19:32.000 2018-01-03 14:48:22.000
761058840 2018-01-03 17:30:38.000 2018-01-03 18:06:35.000
761058840 2018-01-03 17:33:54.000 2018-01-03 18:09:48.000
Considering the rows in this data, what I'm looking for is:
761058840 2018-01-02 07:54:28.000 2018-01-02 08:36:30.000
761058840 2018-01-02 13:11:18.000 2018-01-02 13:40:25.000
761058840 2018-01-02 15:56:18.000 2018-01-02 16:38:26.000
761058840 2018-01-02 21:20:25.000 2018-01-02 21:24:25.000
761058840 2018-01-02 22:20:36.000 2018-01-03 05:20:37.000
761058840 2018-01-03 08:31:59.000 2018-01-03 09:01:07.000
761058840 2018-01-03 09:01:57.000 2018-01-03 09:06:27.000
761058840 2018-01-03 14:07:27.000 2018-01-03 14:48:22.000
761058840 2018-01-03 17:30:38.000 2018-01-03 18:09:48.000
Other solutions I've found work on whole dates, whereas I have multiple events within an hour; others categorise the events into fixed periods (binning them by hour).
There doesn't seem to be anything for continuous times.
The DB is SQL Server, so T-SQL or ANSI SQL would be ideal, but I'm prepared for a bit of translation.
(Edit for clarification: I'm trying to combine each sequence of overlapping timeOn-timeOff intervals into a single row per continuous period.)
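For reference, the sample data can be loaded into a temp table #t (the name used in the answer below); only the first few rows are shown here:
CREATE TABLE #t (id BIGINT, timeOn DATETIME, timeOff DATETIME);

INSERT INTO #t (id, timeOn, timeOff) VALUES
(761058840, '2018-01-02 07:54:28', '2018-01-02 08:33:34'),
(761058840, '2018-01-02 07:54:28', '2018-01-02 08:36:30'),
(761058840, '2018-01-02 08:33:45', '2018-01-02 08:35:30'),
(761058840, '2018-01-02 13:11:18', '2018-01-02 13:14:04'),
(761058840, '2018-01-02 13:11:18', '2018-01-02 13:39:40');
-- ...remaining rows from the sample above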

This is the classic problem of merging overlapping intervals. The simplest approach is to order the data by start point and group the rows, starting a new group each time there is a gap between the maximum end point over all previous rows and the start point of the current row.
The following solution is based on this idea (I used ROWS BETWEEN ... instead of LAG):
WITH t_with_change AS (
    SELECT id, timeOn, timeOff,
           CASE WHEN MAX(timeOff) OVER (PARTITION BY ID ORDER BY timeOn
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) >= timeOn
                THEN 0 ELSE 1 END AS chg
    FROM #t
), t_with_groups AS (
    SELECT id, timeOn, timeOff,
           SUM(chg) OVER (PARTITION BY ID ORDER BY timeOn) AS grp
    FROM t_with_change
)
SELECT id, grp, MIN(timeOn) AS timeOn, MAX(timeOff) AS timeOff
FROM t_with_groups
GROUP BY id, grp
DB Fiddle
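For reference, an equivalent LAG-based sketch of the same idea (untested; the running maximum goes into its own CTE first, because window functions cannot be nested):
WITH t_running AS (
    SELECT id, timeOn, timeOff,
           MAX(timeOff) OVER (PARTITION BY id ORDER BY timeOn
                              ROWS UNBOUNDED PRECEDING) AS runningOff
    FROM #t
), t_with_change AS (
    SELECT id, timeOn, timeOff,
           -- first row per id gets NULL from LAG, so it starts a new group
           CASE WHEN timeOn <= LAG(runningOff) OVER (PARTITION BY id ORDER BY timeOn)
                THEN 0 ELSE 1 END AS chg
    FROM t_running
), t_with_groups AS (
    SELECT id, timeOn, timeOff,
           SUM(chg) OVER (PARTITION BY id ORDER BY timeOn) AS grp
    FROM t_with_change
)
SELECT id, grp, MIN(timeOn) AS timeOn, MAX(timeOff) AS timeOff
FROM t_with_groups
GROUP BY id, grp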


Best way to pull repeated table data from a pcap http file (could awk handle the disruptive breaks)?

I am collecting data readings from my PV system. The web client will graph one day of data - I want to collect a whole year or two in one file for patterns etc.
So far I capture into a cap file with Wireshark and just filter down to the data I want, plus headers and a few retransmitted packets.
The data of interest is being sent to a JS app, but I want to lift out the data which repeats in each packet as date time=watts; see the sample below.
I was hoping to use AWK to parse the data into an array keyed by date and time, then print it back out to a file. This gets rid of duplicates in retransmitted packets and sorts the data. Ideally I'd also remove unwanted decimal data in the watts field too.
This sample was passed through strings to remove binary data in the cap. Can awk handle that better?
There are regular packet breaks that interrupt the fields at any place; in this sample the year 2018 is split, with the 20 at the end of one packet and the 18 at the start of the next. The inter-line text is not consistent, though there might be something more consistent in the binary file.
So the rule needs to be:
ignore until {"1":"{
parse for 4n-2n-2n space 2n:2n space real_nb comma (ignoring any other line breaks or chars)
stop collecting at }","0":"2018-01-01"} NB the end dates vary!
Here are 2 sample blocks. The first shows strings around a table block that has been shortened to just a few times from that day.
The second block is just the full table data from a day without the context.
(I added a line break for visual separation. Note the break within the 76.549995 value, which would be better rounded to 77.)
Path=/
/[CB
$e/N
{"1":"{2018-01-08 08:50=4.5, 2018-01-08 08:55=9.5, 2018-01-08 11:30=76
/[CB
$e/QM
.549995, 2018-01-08 11:35=73.9, 2018-01-08 11:40=65.93333, 2018-01-08 15:30=2.25, 2018-01-08 15:40=0.0}","0":"2018-01-08"}
/[CB
$e/Vq
XT2P
HTTP/1.1 200 OK
{"1":"{2018-01-01 08:15=9.5, 2018-01-01 08:20=22.0, 2018-01-01 08:25=29.4, 2018-01-01 08:30=30.150002, 2018-01-01 08:35=35.3, 2018-01-01 08:40=42.0, 2018-01-01 08:45=77.5, 2018-01-01 08:50=62.6, 2018-01-01 08:55=62.6, 2018-01-01 09:00=75.4, 2018-01-01 09:05=61.199997, 2018-01-01 09:10=57.85, 2018-01-01 09:15=45.7, 2018-01-01 09:20=44.266666, 2018-01-01 09:25=47.2, 2018-01-01 09:30=46.8, 2018-01-01 09:35=53.2, 2018-01-01 09:40=58.2, 2018-01-01 09:45=55.600002, 2018-01-01 09:50=56.733337, 2018-01-01 09:55=62.0, 2018-01-01 10:00=66.3, 2018-01-01 10:05=62.466663, 2018-01-01 10:10=62.699997, 2018-01-01 10:15=70.3, 2018-01-01 10:20=87.1, 2018-01-01 10:25=88.24999, 2018-01-01 10:30=102.5, 2018-01-01 10:35=95.46667, 2018-01-01 10:40=100.73334, 2018-01-01 10:45=100.700005, 2018-01-01 10:50=102.06667, 2018-01-01 10:55=116.4, 20
/[CB
X7BP
18-01-01 11:00=126.7, 2018-01-01 11:05=125.166664, 2018-01-01 11:10=128.26666, 2018-01-01 11:15=125.43333, 2018-01-01 11:20=119.666664, 2018-01-01 11:25=116.649994, 2018-01-01 11:30=94.700005, 2018-01-01 11:35=101.7, 2018-01-01 11:40=95.13333, 2018-01-01 11:45=98.76666, 2018-01-01 11:50=98.466675, 2018-01-01 11:55=92.43334, 2018-01-01 12:00=85.96667, 2018-01-01 12:05=77.833336, 2018-01-01 12:10=75.95, 2018-01-01 12:15=67.75, 2018-01-01 12:20=57.699997, 2018-01-01 12:25=74.2, 2018-01-01 12:30=87.1, 2018-01-01 12:35=77.6, 2018-01-01 12:40=74.1, 2018-01-01 12:45=63.36667, 2018-01-01 12:50=59.300003, 2018-01-01 12:55=76.9, 2018-01-01 13:00=66.6, 2018-01-01 13:05=203.4, 2018-01-01 13:10=203.45, 2018-01-01 13:15=203.45, 2018-01-01 13:20=157.3, 2018-01-01 13:25=101.333336, 2018-01-01 13:30=96.45, 2018-01-01 13:35=81.3, 2018-01-01 13:40=93.7, 2018-01-01 13:45=127.9, 2018-01-01 13:50=176.1, 2018-01-01 13:55=152.0, 2018-01-01 14:00=169.6, 2018-01-01 14:05=203.2, 2018-01-01 14:10=257.5, 2018-01-01 14:15=261.30002, 2018-01-01 14:20=261.3, 2018-01-01 14:25=218.13335, 2018-01-01 14:30=385.5, 2018-01-01 14:35=287.5, 2018-01-01 14:40=248.35002, 2018-01-01 14:45=98.2, 2018-01-01 14:50=136.2, 2018-01-01 14:55=160.0, 2018-01-01 15:00=148.1
/[CB
X7BP
, 2018-01-01 15:05=133.59999, 2018-01-01 15:10=93.3, 2018-01-01 15:15=79.25, 2018-01-01 15:20=44.300003, 2018-01-01 15:25=36.56667, 2018-01-01 15:30=43.8, 2018-01-01 15:35=39.3, 2018-01-01 15:40=39.5, 2018-01-01 15:45=33.05, 2018-01-01 15:50=28.649998, 2018-01-01 15:55=26.65, 2018-01-01 16:00=16.55, 2018-01-01 16:05=7.5, 2018-01-01 16:10=0.0}","0":"2018-01-01"}
I will have several thousand lines of this source data and 40-100k date_time data points; can keyed arrays handle that? Should I define the comma as my record separator? (I am unsure whether a comma might occur in the packet/line-break texts...)
Is there a better, easier solution?
At present I have been using a text editor to process a few sample months and test my analysis ideas but this is too slow and onerous for the full data set.
My ideal output looks like this (different sample data, from my editing):
06/11/18 11:20 799
06/11/18 11:25 744
06/11/18 11:30 720
06/11/18 11:35 681
06/11/18 11:40 543
06/11/18 11:45 350
06/11/18 11:50 274
06/11/18 11:55 230
06/11/18 12:00 286
06/11/18 12:05 435
06/11/18 12:10 544
06/11/18 12:15 899
06/11/18 12:20 1187
06/11/18 12:25 1575
06/11/18 12:30 1362
06/11/18 12:35 1423
Maybe Python is better suited but that is a bigger learning curve and lower starting knowledge point for me...
Here is my start. It gets most of the data about right BUT does not deal with a record split across 2 packets, or with the trailing }".
awk 'BEGIN{RS=","}; ($1~"^201"){if (NF=2) {split($2,X,"=");print $1,X[1], X[2]}}' sample.txt
outputs
2018-01-06 15:30 39.033333
2018-01-06 15:35 34.9
2018-01-06 15:40 24.25
2018-01-06 15 NB lost data at packet break as line not starting 201
2018-01-06 15:50 0.0
2018-01-06 15:55 0.0}" NB failed to remove trailer
2018-01-07 08:25 7.8
2018-01-07 08:30 23.7
Just noticed my text editing version reformatted dates as dd/mm/yy while the awk has preserved the input date format. Spreadsheets will read either so I don't care about that!
For the record, I just ran my awk on the binary cap file and it still seems to work the same way as on the file output by strings.
Real data, as output from strings
Mac OS X 10.11.6, build 15G22010 (Darwin 15.6.0)
Dumpcap (Wireshark) 2.6.5 (v2.6.5-0-gf766965a)
host 47.91.67.66
Mac OS X 10.11.6, build 15G22010 (Darwin 15.6.0)
.#/[CB
HTTP/1.1 200 OK
Date: Tue, 12 Nov 2019 16:15:11 GMT
Content-Type: application/json;charset=UTF-8
Content-Length: 2432
Connection: keep-alive
Accept-Charset: big5, big5-hkscs, euc-jp, euc-kr, gb18030, gb2312, gbk, ibm-thai, ibm00858, ibm01140, ibm01141, ibm01142, ibm01143, ibm01144, ibm01145, ibm01146, ibm01147, ibm01148, ibm01149, ibm037, ibm1026, ibm1047, ibm273, ibm277, ibm278, ibm280, ibm284, ibm285, ibm290, ibm297, ibm420, ibm424, ibm437, ibm500, ibm775, ibm850, ibm852, ibm855, ibm857, ibm860, ibm861, ibm862, ibm863, ibm864, ibm865, ibm866, ibm868, ibm869, ibm870, ibm871, ibm918, iso-2022-cn, iso-2022-jp, iso-2022-jp-2, iso-2022-kr, iso-8859-1, iso-8859-13, iso-8859-15, iso-8859-2, iso-8859-3, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-9, jis_x0201, jis_x0212-1990, koi8-r, koi8-u, shift_jis, tis-620, us-ascii, utf-16, utf-16be, utf-16le, utf-32, utf-32be, utf-32le, utf-8, windows-1250, windows-1251, windows-1252, windows-1253, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258, windows-31j, x-big5-hkscs-2001, x-big5-solaris, x-euc-jp-linux, x-euc-tw, x-eucjp-open, x-ibm1006, x-ibm1025, x-ibm1046, x-ibm1097, x-ibm1098, x-ibm1112, x-ibm1122, x-ibm1123, x-ibm1124, x-ibm13
/v/[CB
X7BP
64, x-ibm1381, x-ibm1383, x-ibm300, x-ibm33722, x-ibm737, x-ibm833, x-ibm834, x-ibm856, x-ibm874, x-ibm875, x-ibm921, x-ibm922, x-ibm930, x-ibm933, x-ibm935, x-ibm937, x-ibm939, x-ibm942, x-ibm942c, x-ibm943, x-ibm943c, x-ibm948, x-ibm949, x-ibm949c, x-ibm950, x-ibm964, x-ibm970, x-iscii91, x-iso-2022-cn-cns, x-iso-2022-cn-gb, x-iso-8859-11, x-jis0208, x-jisautodetect, x-johab, x-macarabic, x-maccentraleurope, x-maccroatian, x-maccyrillic, x-macdingbat, x-macgreek, x-machebrew, x-maciceland, x-macroman, x-macromania, x-macsymbol, x-macthai, x-macturkish, x-macukraine, x-ms932_0213, x-ms950-hkscs, x-ms950-hkscs-xp, x-mswin-936, x-pck, x-sjis_0213, x-utf-16le-bom, x-utf-32be-bom, x-utf-32le-bom, x-windows-50220, x-windows-50221, x-windows-874, x-windows-949, x-windows-950, x-windows-iso2022jp
Set-Cookie: SERVERID=dfd94e11c720d0a37cf8b7c8c0cc0c75|1573575311|1573575148;Path=/
/[CB
X7BP
{"1":"{2018-01-01 08:15=9.5, 2018-01-01 08:20=22.0, 2018-01-01 08:25=29.4, 2018-01-01 08:30=30.150002, 2018-01-01 08:35=35.3, 2018-01-01 08:40=42.0, 2018-01-01 08:45=77.5, 2018-01-01 08:50=62.6, 2018-01-01 08:55=62.6, 2018-01-01 09:00=75.4, 2018-01-01 09:05=61.199997, 2018-01-01 09:10=57.85, 2018-01-01 09:15=45.7, 2018-01-01 09:20=44.266666, 2018-01-01 09:25=47.2, 2018-01-01 09:30=46.8, 2018-01-01 09:35=53.2, 2018-01-01 09:40=58.2, 2018-01-01 09:45=55.600002, 2018-01-01 09:50=56.733337, 2018-01-01 09:55=62.0, 2018-01-01 10:00=66.3, 2018-01-01 10:05=62.466663, 2018-01-01 10:10=62.699997, 2018-01-01 10:15=70.3, 2018-01-01 10:20=87.1, 2018-01-01 10:25=88.24999, 2018-01-01 10:30=102.5, 2018-01-01 10:35=95.46667, 2018-01-01 10:40=100.73334, 2018-01-01 10:45=100.700005, 2018-01-01 10:50=102.06667, 2018-01-01 10:55=116.4, 20
/[CB
X7BP
18-01-01 11:00=126.7, 2018-01-01 11:05=125.166664, 2018-01-01 11:10=128.26666, 2018-01-01 11:15=125.43333, 2018-01-01 11:20=119.666664, 2018-01-01 11:25=116.649994, 2018-01-01 11:30=94.700005, 2018-01-01 11:35=101.7, 2018-01-01 11:40=95.13333, 2018-01-01 11:45=98.76666, 2018-01-01 11:50=98.466675, 2018-01-01 11:55=92.43334, 2018-01-01 12:00=85.96667, 2018-01-01 12:05=77.833336, 2018-01-01 12:10=75.95, 2018-01-01 12:15=67.75, 2018-01-01 12:20=57.699997, 2018-01-01 12:25=74.2, 2018-01-01 12:30=87.1, 2018-01-01 12:35=77.6, 2018-01-01 12:40=74.1, 2018-01-01 12:45=63.36667, 2018-01-01 12:50=59.300003, 2018-01-01 12:55=76.9, 2018-01-01 13:00=66.6, 2018-01-01 13:05=203.4, 2018-01-01 13:10=203.45, 2018-01-01 13:15=203.45, 2018-01-01 13:20=157.3, 2018-01-01 13:25=101.333336, 2018-01-01 13:30=96.45, 2018-01-01 13:35=81.3, 2018-01-01 13:40=93.7, 2018-01-01 13:45=127.9, 2018-01-01 13:50=176.1, 2018-01-01 13:55=152.0, 2018-01-01 14:00=169.6, 2018-01-01 14:05=203.2, 2018-01-01 14:10=257.5, 2018-01-01 14:15=261.30002, 2018-01-01 14:20=261.3, 2018-01-01 14:25=218.13335, 2018-01-01 14:30=385.5, 2018-01-01 14:35=287.5, 2018-01-01 14:40=248.35002, 2018-01-01 14:45=98.2, 2018-01-01 14:50=136.2, 2018-01-01 14:55=160.0, 2018-01-01 15:00=148.1
/[CB
X7BP
, 2018-01-01 15:05=133.59999, 2018-01-01 15:10=93.3, 2018-01-01 15:15=79.25, 2018-01-01 15:20=44.300003, 2018-01-01 15:25=36.56667, 2018-01-01 15:30=43.8, 2018-01-01 15:35=39.3, 2018-01-01 15:40=39.5, 2018-01-01 15:45=33.05, 2018-01-01 15:50=28.649998, 2018-01-01 15:55=26.65, 2018-01-01 16:00=16.55, 2018-01-01 16:05=7.5, 2018-01-01 16:10=0.0}","0":"2018-01-01"}
/[CB
HTTP/1.1 200 OK
Date: Tue, 12 Nov 2019 16:15:14 GMT
Content-Type: application/json;charset=UTF-8
Content-Length: 2184
Connection: keep-alive
Accept-Charset: big5, big5-hkscs, euc-jp, euc-kr, gb18030, gb2312, gbk, ibm-thai, ibm00858, ibm01140, ibm01141, ibm01142, ibm01143, ibm01144, ibm01145, ibm01146, ibm01147, ibm01148, ibm01149, ibm037, ibm1026, ibm1047, ibm273, ibm277, ibm278, ibm280, ibm284, ibm285, ibm290, ibm297, ibm420, ibm424, ibm437, ibm500, ibm775, ibm850, ibm852, ibm855, ibm857, ibm860, ibm861, ibm862, ibm863, ibm864, ibm865, ibm866, ibm868, ibm869, ibm870, ibm871, ibm918, iso-2022-cn, iso-2022-jp, iso-2022-jp-2, iso-2022-kr, iso-8859-1, iso-8859-13, iso-8859-15, iso-8859-2, iso-8859-3, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-9, jis_x0201, jis_x0212-1990, koi8-r, koi8-u, shift_jis, tis-620, us-ascii, utf-16, utf-16be, utf-16le, utf-32, utf-32be, utf-32le, utf-8, windows-1250, windows-1251, windows-1252, windows-1253, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258, windows-31j, x-big5-hkscs-2001, x-big5-solaris, x-euc-jp-linux, x-euc-tw, x-eucjp-open, x-ibm1006, x-ibm1025, x-ibm1046, x-ibm1097, x-ibm1098, x-ibm1112, x-ibm1122, x-ibm1123, x-ibm1124, x-ibm13
/q/[CB
64, x-ibm1381, x-ibm1383, x-ibm300, x-ibm33722, x-ibm737, x-ibm833, x-ibm834, x-ibm856, x-ibm874, x-ibm875, x-ibm921, x-ibm922, x-ibm930, x-ibm933, x-ibm935, x-ibm937, x-ibm939, x-ibm942, x-ibm942c, x-ibm943, x-ibm943c, x-ibm948, x-ibm949, x-ibm949c, x-ibm950, x-ibm964, x-ibm970, x-iscii91, x-iso-2022-cn-cns, x-iso-2022-cn-gb, x-iso-8859-11, x-jis0208, x-jisautodetect, x-johab, x-macarabic, x-maccentraleurope, x-maccroatian, x-maccyrillic, x-macdingbat, x-macgreek, x-machebrew, x-maciceland, x-macroman, x-macromania, x-macsymbol, x-macthai, x-macturkish, x-macukraine, x-ms932_0213, x-ms950-hkscs, x-ms950-hkscs-xp, x-mswin-936, x-pck, x-sjis_0213, x-utf-16le-bom, x-utf-32be-bom, x-utf-32le-bom, x-windows-50220, x-windows-50221, x-windows-874, x-windows-949, x-windows-950, x-windows-iso2022jp
Set-Cookie: SERVERID=dfd94e11c720d0a37cf8b7c8c0cc0c75|1573575314|1573575148;Path=/
/[CB
{"1":"{2018-01-02 08:35=0.0, 2018-01-02 08:40=6.6, 2018-01-02 08:45=6.35, 2018-01-02 08:50=7.8, 2018-01-02 08:55=6.9, 2018-01-02 09:00=12.2, 2018-01-02 09:05=18.3, 2018-01-02 09:10=25.9, 2018-01-02 09:15=26.15, 2018-01-02 09:20=40.0, 2018-01-02 09:25=36.45, 2018-01-02 09:30=36.450005, 2018-01-02 09:35=30.633333, 2018-01-02 09:40=41.4, 2018-01-02 09:45=44.1, 2018-01-02 09:50=53.9, 2018-01-02 09:55=66.2, 2018-01-02 10:00=75.6, 2018-01-02 10:05=70.1, 2018-01-02 10:10=72.05, 2018-01-02 10:15=54.0, 2018-01-02 10:20=40.55, 2018-01-02 10:25=40.549995, 2018-01-02 10:30=31.699997, 2018-01-02 10:35=33.8, 2018-01-02 10:40=47.6, 2018-01-02 10:45=40.699997, 2018-01-02 10:50=36.65, 2018-01-02 10:55=19.55, 2018-01-02 11:00=12.1, 2018-01-02 11:05=9.549999, 2018-01-02 11:10=25.9, 2018-01-02 11:15=30.0, 2018-01-02 11:20=52.3, 2018-01-0
/[CB
2 11:25=63.3, 2018-01-02 11:30=97.1, 2018-01-02 11:35=147.7, 2018-01-02 11:40=163.8, 2018-01-02 11:45=186.8, 2018-01-02 11:50=241.0, 2018-01-02 11:55=289.9, 2018-01-02 12:00=265.45, 2018-01-02 12:05=247.70001, 2018-01-02 12:10=204.5, 2018-01-02 12:15=206.59999, 2018-01-02 12:20=207.83333, 2018-01-02 12:25=201.36665, 2018-01-02 12:30=189.93333, 2018-01-02 12:35=185.30002, 2018-01-02 12:40=151.65, 2018-01-02 12:45=222.9, 2018-01-02 12:50=197.65, 2018-01-02 12:55=199.46667, 2018-01-02 13:00=254.3, 2018-01-02 13:05=337.7, 2018-01-02 13:10=296.06668, 2018-01-02 13:15=308.80002, 2018-01-02 13:20=314.9, 2018-01-02 13:25=348.0, 2018-01-02 13:30=378.6, 2018-01-02 13:35=356.06665, 2018-01-02 13:40=360.1, 2018-01-02 13:45=287.86667, 2018-01-02 13:50=262.6, 2018-01-02 13:55=265.80002, 2018-01-02 14:00=256.53333, 2018-01-02 14:05=251.90001, 2018-01-02 14:10=158.45, 2018-01-02 14:15=117.0, 2018-01-02 14:20=99.5, 2018-01-02 14:25=91.25, 2018-01-02 14:30=94.1, 2018-01-02 14:35=95.55, 2018-01-02 14:40=91.666664, 2018-01-02 14:45=87.23334, 2018-01-02 14:50=81.66667, 2018-01-02 14:55=79.166664, 2018-01-02 15:00=75.333336, 2018-01-02 15:05=72.850006, 2018-01-02 15:10=60.300003, 2018-01-02 15:15=43.75, 2018-01-02 15:20=30.0, 2018-01-02 15:25
2t/[CB
=18.2, 2018-01-02 15:30=11.0, 2018-01-02 15:35=7.0, 2018-01-02 15:40=3.3, 2018-01-02 15:45=1.55}","0":"2018-01-02"}
/[CB
X>~P
Get the full file of a month or so data here
`https://www.dropbox.com/s/3vb6g9ywlgt7isw/dayData2.txt?dl=1`
KamilCuk has provided the best solution. The first solution, with a series of commands, does the best job but is not convenient to use as it stands; it also will not operate on the binary cap file.
The combined sed command, his solution 2, does not work so well, probably because it works on one line/pattern at a time and the multiline problems are not well catered for. Maybe it could be fixed up if the read point could be backed up a line, or the residual of the last line saved and included with the next.
My own quick and lossy version (more below) is a convenient one-liner. It works on the binary cap file, which would allow it to accept a pipe from tcpdump or ngrep - also useful options.
Comparing my lossy solution: it lost about 1% of the data points, those split across IP packets, and it also let me reject the roughly 1% of readings taken at PV-system shutdown that just record 0.0 watts.
In terms of my aims of analysing trends and probabilities in power output over times and seasons (which I will do in buckets of 15 or 30 mins, also combining days within week_from_solstice, e.g. days 14-7 before and 7-14 after 21 Dec), the loss of a few readings does not matter. The removal of the zeros at day end actually improves my data analysis.
So next time I process a sample of data via IP capture I think I might use:
awk 'BEGIN{RS=","}; ($1~"^201"){if (NF==2) {split($2,X,"="); if (0+X[2] > 0) {split($1,D,"-");print D[3]"/"D[2]"/"substr(D[1],3,2),X[1], 0+X[2]}}}'
The 0+X[2] is required since some records end in 0.0}"; the arithmetic both turns this into the number 0 and discards the }".
If I do not want to reformat the date (Excel takes 2018-01-31) the command is simpler:
awk 'BEGIN{RS=","}; ($1~"^201"){if (NF==2) {split($2,X,"="); if (0+X[2] > 0) {print $1,X[1], 0+X[2]}}}'
The command also removes responses where the database has no data (comms were down or they purged it); in that case the feed sends "2017-12-25 10:10 null".
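For example (the output file name here is just a placeholder), the simpler variant can be run over the linked dayData2.txt, or over the cap file directly, and redirected to a results file:
# read the strings dump (or the cap file), keep only date/time=watts records with a non-zero reading
awk 'BEGIN{RS=","} $1~/^201/ {n=split($2,X,"="); if (n==2 && 0+X[2] > 0) print $1, X[1], 0+X[2]}' dayData2.txt > pv_readings.txt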
The following code uses GNU sed, with the input recreated as a stream via a here-doc delimited by END_OF_INPUT, and with some comments:
cat <<'END_OF_INPUT' |
Path=/
/[CB
$e/N
{"1":"{2018-01-08 08:50=4.5, 2018-01-08 08:55=9.5, 2018-01-08 11:30=76
/[CB
$e/QM
.549995, 2018-01-08 11:35=73.9, 2018-01-08 11:40=65.93333, 2018-01-08 15:30=2.25, 2018-01-08 15:40=0.0}","0":"2018-01-08"}
/[CB
$e/Vq
XT2P
HTTP/1.1 200 OK
{"1":"{2018-01-01 08:15=9.5, 2018-01-01 08:20=22.0, 2018-01-01 08:25=29.4, 2018-01-01 08:30=30.150002, 2018-01-01 08:35=35.3, 2018-01-01 08:40=42.0, 2018-01-01 08:45=77.5, 2018-01-01 08:50=62.6, 2018-01-01 08:55=62.6, 2018-01-01 09:00=75.4, 2018-01-01 09:05=61.199997, 2018-01-01 09:10=57.85, 2018-01-01 09:15=45.7, 2018-01-01 09:20=44.266666, 2018-01-01 09:25=47.2, 2018-01-01 09:30=46.8, 2018-01-01 09:35=53.2, 2018-01-01 09:40=58.2, 2018-01-01 09:45=55.600002, 2018-01-01 09:50=56.733337, 2018-01-01 09:55=62.0, 2018-01-01 10:00=66.3, 2018-01-01 10:05=62.466663, 2018-01-01 10:10=62.699997, 2018-01-01 10:15=70.3, 2018-01-01 10:20=87.1, 2018-01-01 10:25=88.24999, 2018-01-01 10:30=102.5, 2018-01-01 10:35=95.46667, 2018-01-01 10:40=100.73334, 2018-01-01 10:45=100.700005, 2018-01-01 10:50=102.06667, 2018-01-01 10:55=116.4, 20
/[CB
X7BP
18-01-01 11:00=126.7, 2018-01-01 11:05=125.166664, 2018-01-01 11:10=128.26666, 2018-01-01 11:15=125.43333, 2018-01-01 11:20=119.666664, 2018-01-01 11:25=116.649994, 2018-01-01 11:30=94.700005, 2018-01-01 11:35=101.7, 2018-01-01 11:40=95.13333, 2018-01-01 11:45=98.76666, 2018-01-01 11:50=98.466675, 2018-01-01 11:55=92.43334, 2018-01-01 12:00=85.96667, 2018-01-01 12:05=77.833336, 2018-01-01 12:10=75.95, 2018-01-01 12:15=67.75, 2018-01-01 12:20=57.699997, 2018-01-01 12:25=74.2, 2018-01-01 12:30=87.1, 2018-01-01 12:35=77.6, 2018-01-01 12:40=74.1, 2018-01-01 12:45=63.36667, 2018-01-01 12:50=59.300003, 2018-01-01 12:55=76.9, 2018-01-01 13:00=66.6, 2018-01-01 13:05=203.4, 2018-01-01 13:10=203.45, 2018-01-01 13:15=203.45, 2018-01-01 13:20=157.3, 2018-01-01 13:25=101.333336, 2018-01-01 13:30=96.45, 2018-01-01 13:35=81.3, 2018-01-01 13:40=93.7, 2018-01-01 13:45=127.9, 2018-01-01 13:50=176.1, 2018-01-01 13:55=152.0, 2018-01-01 14:00=169.6, 2018-01-01 14:05=203.2, 2018-01-01 14:10=257.5, 2018-01-01 14:15=261.30002, 2018-01-01 14:20=261.3, 2018-01-01 14:25=218.13335, 2018-01-01 14:30=385.5, 2018-01-01 14:35=287.5, 2018-01-01 14:40=248.35002, 2018-01-01 14:45=98.2, 2018-01-01 14:50=136.2, 2018-01-01 14:55=160.0, 2018-01-01 15:00=148.1
/[CB
X7BP
, 2018-01-01 15:05=133.59999, 2018-01-01 15:10=93.3, 2018-01-01 15:15=79.25, 2018-01-01 15:20=44.300003, 2018-01-01 15:25=36.56667, 2018-01-01 15:30=43.8, 2018-01-01 15:35=39.3, 2018-01-01 15:40=39.5, 2018-01-01 15:45=33.05, 2018-01-01 15:50=28.649998, 2018-01-01 15:55=26.65, 2018-01-01 16:00=16.55, 2018-01-01 16:05=7.5, 2018-01-01 16:10=0.0}","0":"2018-01-01"}
END_OF_INPUT
# preprocessing
# ignore until {"1":"{
# stop collecting at }","0":"2018-01-01"}
sed -E -n '/\{"1":"\{/,/\}","0":"[0-9]{4}-[0-9]{2}-[0-9]{2}"\}/p' |
# remove the /[CB + the next line + one newline more
sed -E '/\/\[CB/{N;d;n;}' |
# we should get nice {"1":.....} lines here
# elements are separated by comma
# so we can just be cruel
tr ',' '\n' |
# now each line will have one date
# so for each data in line
# output it in our format(TM)
sed -E -n '
/.*[0-9]{2}([0-9]{2})-([0-9]{2})-([0-9]{2}) ([0-9]{2}:[0-9]{2})=([0-9]*.[0-9]*).*/{
s!!\3/\2/\1 \4 \5!
p
}
'
Will output:
08/01/18 08:50 4.5
08/01/18 08:55 9.5
08/01/18 11:30 76
08/01/18 11:35 73.9
08/01/18 11:40 65.93333
08/01/18 15:30 2.25
08/01/18 15:40 0.0
01/01/18 08:15 9.5
01/01/18 08:20 22.0
01/01/18 08:25 29.4
01/01/18 08:30 30.150002
01/01/18 08:35 35.3
01/01/18 08:40 42.0
01/01/18 08:45 77.5
01/01/18 08:50 62.6
01/01/18 08:55 62.6
01/01/18 09:00 75.4
01/01/18 09:05 61.199997
01/01/18 09:10 57.85
01/01/18 09:15 45.7
01/01/18 09:20 44.266666
01/01/18 09:25 47.2
01/01/18 09:30 46.8
01/01/18 09:35 53.2
01/01/18 09:40 58.2
01/01/18 09:45 55.600002
01/01/18 09:50 56.733337
01/01/18 09:55 62.0
01/01/18 10:00 66.3
01/01/18 10:05 62.466663
01/01/18 10:10 62.699997
01/01/18 10:15 70.3
01/01/18 10:20 87.1
01/01/18 10:25 88.24999
01/01/18 10:30 102.5
01/01/18 10:35 95.46667
01/01/18 10:40 100.73334
01/01/18 10:45 100.700005
01/01/18 10:50 102.06667
01/01/18 10:55 116.4
01/01/18 11:05 125.166664
01/01/18 11:10 128.26666
01/01/18 11:15 125.43333
01/01/18 11:20 119.666664
01/01/18 11:25 116.649994
01/01/18 11:30 94.700005
01/01/18 11:35 101.7
01/01/18 11:40 95.13333
01/01/18 11:45 98.76666
01/01/18 11:50 98.466675
01/01/18 11:55 92.43334
01/01/18 12:00 85.96667
01/01/18 12:05 77.833336
01/01/18 12:10 75.95
01/01/18 12:15 67.75
01/01/18 12:20 57.699997
01/01/18 12:25 74.2
01/01/18 12:30 87.1
01/01/18 12:35 77.6
01/01/18 12:40 74.1
01/01/18 12:45 63.36667
01/01/18 12:50 59.300003
01/01/18 12:55 76.9
01/01/18 13:00 66.6
01/01/18 13:05 203.4
01/01/18 13:10 203.45
01/01/18 13:15 203.45
01/01/18 13:20 157.3
01/01/18 13:25 101.333336
01/01/18 13:30 96.45
01/01/18 13:35 81.3
01/01/18 13:40 93.7
01/01/18 13:45 127.9
01/01/18 13:50 176.1
01/01/18 13:55 152.0
01/01/18 14:00 169.6
01/01/18 14:05 203.2
01/01/18 14:10 257.5
01/01/18 14:15 261.30002
01/01/18 14:20 261.3
01/01/18 14:25 218.13335
01/01/18 14:30 385.5
01/01/18 14:35 287.5
01/01/18 14:40 248.35002
01/01/18 14:45 98.2
01/01/18 14:50 136.2
01/01/18 14:55 160.0
01/01/18 15:00 148.1
01/01/18 15:05 133.59999
01/01/18 15:10 93.3
01/01/18 15:15 79.25
01/01/18 15:20 44.300003
01/01/18 15:25 36.56667
01/01/18 15:30 43.8
01/01/18 15:35 39.3
01/01/18 15:40 39.5
01/01/18 15:45 33.05
01/01/18 15:50 28.649998
01/01/18 15:55 26.65
01/01/18 16:00 16.55
01/01/18 16:05 7.5
01/01/18 16:10 0.0
In one command with GNU sed that supports \n as a newline:
sed -E -n '
/\{"1":"\{/,/\}","0":"[0-9]{4}-[0-9]{2}-[0-9]{2}"\}/{
# remove the /[CB + the next line + one newline more
/\/\[CB/{N;d;n;}
: loop
/([^\n]*)[0-9]{2}([0-9]{2})-([0-9]{2})-([0-9]{2}) ([0-9]{2}:[0-9]{2})=([0-9]*.[0-9]*)([^\n]*)/{
# put the interesting string on the end of the pattern space
s!!\1\7\n\4/\3/\2 \5 \6!
# again, until nothing interesting is found
b loop
}
# remove everything in front of the newline that we did not parse
s/[^\n]*\n//
# output
p
} '

pandas groupby not using for loop (how to make smart)

Suppose you have a pandas Series like this.
a = pd.Series(range(31),index = pd.date_range('2018-01-01','2018-01-31',freq='D'))
If you want to make a grouped DataFrame with a MultiIndex like this:
data
date
2018-01-01 2018-01-01 0
2018-01-02 1
2018-01-03 2
2018-01-04 3
2018-01-05 4
2018-01-02 2018-01-02 1
2018-01-03 2
2018-01-04 3
2018-01-05 4
2018-01-06 5
2018-01-03 2018-01-03 2
2018-01-04 3
2018-01-05 4
2018-01-06 5
2018-01-07 6
.....
This shows that the first level of the MultiIndex is the original datetime index, and the second level covers the 5 days starting at that date.
For example, if the first level is 2018-01-01, the second level runs from 2018-01-01 to 2018-01-05.
If the first level is 2018-01-15, the second level runs from 2018-01-15 to 2018-01-19 and the data is 14, 15, 16, 17, 18.
How can I make this DataFrame or Series without any loop?
Use -
import numpy as np
import pandas as pd
first = np.repeat(a.index.values,5)
second = [ i + np.timedelta64(j,'D') for i in a.index for j in range(5)]
arrays = [first, second]
print(np.shape(second))
d=pd.DataFrame(index=pd.MultiIndex.from_arrays(arrays, names=('date1', 'date2')))
Output (first 10 rows):
value
date1 date2
2018-01-01 2018-01-01 0.0
2018-01-02 1.0
2018-01-03 2.0
2018-01-04 3.0
2018-01-05 4.0
2018-01-02 2018-01-02 1.0
2018-01-03 2.0
2018-01-04 3.0
2018-01-05 4.0
2018-01-06 5.0
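The snippet above builds the MultiIndex; to also fill the value column shown in the output, one possibility is the following fully vectorised sketch of my own (assumption: the values are simply the original series looked up at the second-level date, so dates past 2018-01-31 become NaN, which is why the column is float):
import numpy as np
import pandas as pd

a = pd.Series(range(31), index=pd.date_range('2018-01-01', '2018-01-31', freq='D'))

# repeat each date 5 times for level 1, and add offsets of 0..4 days for level 2
first = np.repeat(a.index.values, 5)
second = first + np.tile(np.arange(5), len(a)).astype('timedelta64[D]')
idx = pd.MultiIndex.from_arrays([first, second], names=('date1', 'date2'))

# look each second-level date back up in the original series;
# dates beyond 2018-01-31 have no data and become NaN
d = pd.DataFrame({'value': a.reindex(idx.get_level_values('date2')).values}, index=idx)
print(d.head(10))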

Generate 15 minute date intervals and join matching rows

What I would like to do is get the 15 minute intervals based on a date range in a row and insert them into another table.
The following code gets me the date range, which is part of my goal:
DECLARE @Table1 TABLE (ID INT IDENTITY(0,1), TIMEVALUE DATETIME, TIMEVALUE2 DATETIME);
DECLARE @start DATETIME2(7) = '2018-01-04 10:55:00';
DECLARE @end DATETIME2(7) = '2018-01-05 03:55:00';
SELECT @start = DATEADD(minute, DATEDIFF(minute, 0, @start) / 15 * 15, 0);
WITH CTE_DT AS
(
    SELECT @start AS DT
    UNION ALL
    SELECT DATEADD(MINUTE, 15, DT) FROM CTE_DT
    WHERE DT < @end
)
INSERT INTO @Table1
SELECT DT, DATEADD(minute, 14, DT) FROM CTE_DT
OPTION (MAXRECURSION 0);
SELECT * FROM @Table1
result:
ID TIMEVALUE TIMEVALUE2
0 2018-01-04 10:45:00.000 2018-01-04 10:59:00.000
1 2018-01-04 11:00:00.000 2018-01-04 11:14:00.000
2 2018-01-04 11:15:00.000 2018-01-04 11:29:00.000
3 2018-01-04 11:30:00.000 2018-01-04 11:44:00.000
4 2018-01-04 11:45:00.000 2018-01-04 11:59:00.000
5 2018-01-04 12:00:00.000 2018-01-04 12:14:00.000
6 2018-01-04 12:15:00.000 2018-01-04 12:29:00.000
7 2018-01-04 12:30:00.000 2018-01-04 12:44:00.000
8 2018-01-04 12:45:00.000 2018-01-04 12:59:00.000
..
..
What I want to accomplish is to apply the same logic above to a record source.
So if my SourceData is
Col1 Col2 StartDate EndDate
AA AA 2018-01-01 13:25 2018-01-02 13:00
AA BB 2018-01-02 13:25 2018-01-03 13:00
So, with just a query, somehow use the StartDate and EndDate to produce this result:
Col1 Col2 TIMEVALUE TIMEVALUE2
AA AA 2018-01-01 13:15:00 2018-01-01 13:29:00
AA AA 2018-01-01 13:30:00 2018-01-01 13:44:00
AA AA 2018-01-01 13:45:00 2018-01-01 13:59:00
...
...
AA AA 2018-01-02 12:30:00 2018-01-02 12:44:00
AA AA 2018-01-02 12:45:00 2018-01-02 12:59:00
AA AA 2018-01-02 13:00:00 2018-01-02 13:14:00
AA BB 2018-01-02 13:15:00 2018-01-02 13:29:00
AA BB 2018-01-02 13:30:00 2018-01-02 13:44:00
AA BB 2018-01-02 13:45:00 2018-01-02 13:59:00
...
...
AA BB 2018-01-03 12:30:00 2018-01-03 12:44:00
AA BB 2018-01-03 12:45:00 2018-01-03 12:59:00
AA BB 2018-01-03 13:00:00 2018-01-03 13:14:00
I would like to avoid using a cursor if I can. I have managed to make this work with a User Defined Function, passing the required columns from the select statement, but I am hoping to avoid that if I can.
Change the end of the first query so that instead of
SELECT * FROM @Table1
it says:
SELECT * FROM @Table1 d
INNER JOIN SourceData sd
    ON NOT (d.TIMEVALUE2 < sd.StartDate OR d.TIMEVALUE > sd.EndDate)
Consider taking your first query that generates the dates and just running it, for now, up to the year 2030, inserting the dates into an actual table. Keep the query around so it can be used again in ~11 years to add some more rows to the calendar table.
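A rough sketch of that suggestion (the table name CalendarIntervals and the 2030 cut-off are just placeholders):
-- materialise the 15-minute intervals once, then reuse this table in the join above
CREATE TABLE dbo.CalendarIntervals (TIMEVALUE DATETIME2(0) NOT NULL PRIMARY KEY, TIMEVALUE2 DATETIME2(0) NOT NULL);

DECLARE @start DATETIME2(7) = '2018-01-01', @end DATETIME2(7) = '2030-01-01';

WITH CTE_DT AS
(
    SELECT @start AS DT
    UNION ALL
    SELECT DATEADD(MINUTE, 15, DT) FROM CTE_DT WHERE DT < @end
)
INSERT INTO dbo.CalendarIntervals (TIMEVALUE, TIMEVALUE2)
SELECT DT, DATEADD(MINUTE, 14, DT) FROM CTE_DT
OPTION (MAXRECURSION 0);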
Rather than using a rCTE (which is a form of RBAR), I would use a virtual tally table to generate your dates:
--; is a statement terminator, not a "beginninator". It goes at the end, not the start.
WITH N AS (
    SELECT NULL AS N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) N(N)),
Tally AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS I
    FROM N N1             --10
    CROSS JOIN N N2       --100
    CROSS JOIN N N3       --1000
    CROSS JOIN N N4       --10000
)
SELECT DATEADD(MINUTE, 15 * I, @Start)
FROM Tally
WHERE DATEADD(MINUTE, 15 * I, @Start) < @End;
If you want to generate 15 minute intervals for a combination of columns, you can then do a further CROSS JOIN. For example:
--Assume the CTEs above are already declared
SELECT V.Col1, V.Col2,
       DATEADD(MINUTE, 15 * T.I, @Start)
FROM Tally T
CROSS JOIN (VALUES('AA','AA'),('AA','BB')) V(Col1, Col2) --This could be a CROSS APPLY to a DISTINCT, or similar, if you wish
WHERE DATEADD(MINUTE, 15 * T.I, @Start) < @End;
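As a hedged sketch of wiring that tally directly to the SourceData rows from the question (so each row's StartDate/EndDate drives its own 15-minute slots; the N/Tally CTEs above are assumed to be declared in the same statement):
SELECT sd.Col1, sd.Col2,
       DATEADD(MINUTE, 15 * T.I, S.DT)      AS TIMEVALUE,
       DATEADD(MINUTE, 15 * T.I + 14, S.DT) AS TIMEVALUE2
FROM SourceData sd
CROSS APPLY (SELECT DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sd.StartDate) / 15 * 15, 0) AS DT) S --round StartDate down to the nearest 15 minutes
CROSS JOIN Tally T
WHERE DATEADD(MINUTE, 15 * T.I, S.DT) <= sd.EndDate;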

How can I delete from a table using the join and where clause? [duplicate]

This question already has answers here:
How can I delete using INNER JOIN with SQL Server?
(14 answers)
Closed 4 years ago.
The following is my code so far; the SELECT query at the end returns the rows I would like to delete from the Events table.
declare #maxsnapevent table (sita varchar(10), snap date)
insert into #maxsnapevent
select sita, max(snapshot_date) from ukrmc.dbo.strategy group by sita
--select * from #maxsnapevent
--need to delete everything that the following code gives
select events.sita, events.date, events.event from ukrmc.dbo.events events
join #maxsnapevent max on max.sita = events.sita
where date >= max.snap and events.sita != 'lcypd' and events.sita != 'lonza'
example data from the strategy table:
date sita Event Snapshot_date
2018-01-01 London Bank Holiday 2017-12-31
2018-01-02 London 2017-12-31
2018-01-03 London 2017-12-31
2018-01-04 London Concert 2017-12-31
2018-01-02 London 2018-01-01
2018-01-03 London 2018-01-01
2018-01-04 London Concert 2018-01-01
2018-01-01 Bham Bank Holiday 2017-12-31
2018-01-02 Bham 2017-12-31
2018-01-03 Bham 2017-12-31
2018-01-04 Bham 2017-12-31
2018-01-02 Bham 2018-01-01
2018-01-03 Bham Charity 2018-01-01
2018-01-04 Bham 2018-01-01
example data from the events table:
date sita Event
2018-01-01 London Bank Holiday
2018-01-02 London
2018-01-03 London
2018-01-04 London Concert
2018-01-01 Bham Bank Holiday
2018-01-02 Bham
2018-01-03 Bham
2018-01-04 Bham Concert
As you can see each snapshot has several sitas with several dates.
Have you tried the following?
delete events
from ukrmc.dbo.events events
join #maxsnapevent max on max.sita = events.sita
where date >= max.snap and events.sita != 'lcypd' and events.sita != 'lonza'
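If you want to check what goes before committing, a hedged variation is to run the delete inside a transaction with an OUTPUT clause, which returns the removed rows:
begin tran;

delete events
output deleted.sita, deleted.date, deleted.event
from ukrmc.dbo.events events
join #maxsnapevent max on max.sita = events.sita
where events.date >= max.snap and events.sita != 'lcypd' and events.sita != 'lonza';

-- inspect the rows returned above, then either
-- commit tran;  -- keep the delete
rollback tran;   -- or undo it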

PostgreSQL group by with interval but without window functions

This is follow-up of my previous question:
PostgreSQL group by with interval
There was a very good answer but unfortunately it is not working with PostgreSQL 8.0 - some clients still use this old version.
So I need to find another solution, one that does not use window functions.
Here is what I have as a table:
id quantity price1 price2 date
1 100 1 0 2018-01-01 10:00:00
2 200 1 0 2018-01-02 10:00:00
3 50 5 0 2018-01-02 11:00:00
4 100 1 1 2018-01-03 10:00:00
5 100 1 1 2018-01-03 11:00:00
6 300 1 0 2018-01-03 12:00:00
I need to sum "quantity" grouped by "price1" and "price2", but only across consecutive rows, starting a new group whenever the prices change.
So the end result should look like this:
quantity price1 price2 dateStart dateEnd
300 1 0 2018-01-01 10:00:00 2018-01-02 10:00:00
50 5 0 2018-01-02 11:00:00 2018-01-02 11:00:00
200 1 1 2018-01-03 10:00:00 2018-01-03 11:00:00
300 1 0 2018-01-03 12:00:00 2018-01-03 12:00:00
It is not efficient, but you can implement the same logic with correlated subqueries: seqnum counts all earlier rows, seqnum_pp counts earlier rows with the same prices, so their difference stays constant within a consecutive run of identical prices and changes as soon as the prices do:
select sum(quantity), price1, price2,
min(date) as dateStart, max(date) as dateend
from (select d.*,
(select count(*)
from data d2
where d2.date <= d.date
) as seqnum,
(select count(*)
from data d2
where d2.price1 = d.price1 and d2.price2 = d.price2 and d2.date <= d.date
) as seqnum_pp
from data d
) t
group by price1, price2, (seqnum - seqnum_pp)
order by dateStart
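For completeness, a hedged sketch of how the same gaps-and-islands logic usually looks with window functions (PostgreSQL 8.4+, so not an option for the 8.0 clients mentioned above):
select sum(quantity) as quantity, price1, price2,
       min(date) as dateStart, max(date) as dateEnd
from (select d.*,
             -- rows in the same consecutive run of (price1, price2) get the same grp value
             row_number() over (order by date)
               - row_number() over (partition by price1, price2 order by date) as grp
      from data d
     ) t
group by price1, price2, grp
order by dateStart;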