Create an event log from an Excel file by turning columns into repeated rows - vba

I have an Excel sheet like the following:
ID Arrival Passed Berthing Date UnBerthing Date Departure Passed
1 13/05/2017 15:30 13/05/2017 16:00 31/05/2017 20:44 31/05/2017
2 15/05/2017 16:56 15/05/2017 17:15 16/05/2017 00:00 16/05/2017
3 20/05/2017 09:54 20/05/2017 10:26 20/05/2017 18:07 20/05/2017
4 24/05/2017 16:09 24/05/2017 16:35 25/05/2017 01:03 25/05/2017
5 29/05/2017 10:30 29/05/2017 10:45 29/05/2017 17:33 29/05/2017
I need this in the following format:
ID Event Time
1 Arrival 13/05/2017 15:30
1 Berth 13/05/2017 16:00
1 UnBerth 31/05/2017 20:44
1 Departure 31/05/2017 20:58
2 Arrival 15/05/2017 16:56
2 Berth 15/05/2017 17:15
2 UnBerth 16/05/2017 00:00
2 Departure 16/05/2017 00:04
etc
I've searched the web and this site (and YouTube) without finding a working answer. I tried the TRANSPOSE function and a pivot table, but couldn't get either to work.
Any help would be appreciated.
Thank you.

Assuming that your dataset is in the range A2:E6:
For getting ID:
=INDEX($A$2:$E$6,CEILING(ROWS($A$1:A1)/4,1),1)
For getting Event:
=CHOOSE(MOD(ROWS($A$1:A1)-1,4)+1,"Arrival","Berth","Unberth","Departure")
For getting Time:
=INDEX($A$2:$E$6,CEILING(ROWS($A$1:A1)/4,1),MOD(ROWS($A$1:A1)-1,4)+2)
Then copy the formulas down until they return an error.
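The index arithmetic behind those three formulas can be sketched in Python. This is only an illustration of the logic, not a replacement for the worksheet formulas: the function name `unpivot` and the sample rows are assumptions, with the times taken from the desired output in the question. Output row n reads source row CEILING(n/4) and event number MOD(n-1, 4).

```python
import math

EVENTS = ["Arrival", "Berth", "Unberth", "Departure"]

def unpivot(rows):
    """Turn [id, arrival, berth, unberth, departure] rows into (id, event, time) rows."""
    out = []
    for n in range(1, len(rows) * len(EVENTS) + 1):   # n plays the role of ROWS($A$1:A1)
        src = math.ceil(n / 4) - 1                    # CEILING(n/4, 1), shifted to zero-based
        k = (n - 1) % 4                               # MOD(n-1, 4): which of the four events
        out.append((rows[src][0], EVENTS[k], rows[src][k + 1]))
    return out

# Sample rows (hypothetical layout: one ID column followed by four time columns).
wide = [
    [1, "13/05/2017 15:30", "13/05/2017 16:00", "31/05/2017 20:44", "31/05/2017 20:58"],
    [2, "15/05/2017 16:56", "15/05/2017 17:15", "16/05/2017 00:00", "16/05/2017 00:04"],
]
long_rows = unpivot(wide)
for row in long_rows:
    print(*row)
```

Each source row produces four consecutive output rows, which is why the formulas divide the running row count by 4.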

Related

Getting wrong(?) average when calculating values in a time range

I am working with AWS Redshift / PostgreSQL. I have two tables that can be joined on interval_date (DATE data type) and interval_time_utc (VARCHAR data type) and/or on the status and price_source columns. Source A is equivalent to the Y status and Source B is equivalent to the N status. I am trying to get the average price and the sum of mw_power for a given hour for each status / price_source. An hour covers the timestamps from XX:05 to XX:00, so for 15:00 the values should come from the 14:05 to 15:00 timestamps. Even for an hour in which every row has the same status, I still need the average price for both price_source values; the sum of mw_power for the missing status would then be 0. I pass in the date and time intervals through my application code. I am seeing a different average price for the 15:00 hour than I expect, so either I am bad at math or there is a bug in my query that I can't find. The 14:00 and 16:00 hour results come back as expected.
power_table
interval_date  interval_time_utc  mw_power  status
2022-05-09     13:00              92.25     N
2022-05-09     13:05              90.75     N
2022-05-09     13:10              91.25     N
2022-05-09     13:15              92.00     N
2022-05-09     13:20              92.00     N
2022-05-09     13:25              90.00     N
2022-05-09     13:30              93.00     N
2022-05-09     13:35              91.75     N
2022-05-09     13:40              90.25     N
2022-05-09     13:45              93.00     N
2022-05-09     13:50              91.00     N
2022-05-09     13:55              94.00     N
2022-05-09     14:00              91.00     N
2022-05-09     14:05              91.00     N
2022-05-09     14:10              94.00     N
2022-05-09     14:15              92.00     N
2022-05-09     14:20              91.00     N
2022-05-09     14:25              94.00     Y
2022-05-09     14:30              92.00     Y
2022-05-09     14:35              91.75     Y
2022-05-09     14:40              92.25     Y
2022-05-09     14:45              91.00     Y
2022-05-09     14:50              92.00     Y
2022-05-09     14:55              93.00     Y
2022-05-09     15:00              90.00     Y
price_table
interval_date  interval_time_utc  price   price_source
2022-05-09     13:00              54.20   Source A
2022-05-09     13:05              54.20   Source A
2022-05-09     13:10              54.20   Source A
2022-05-09     13:00              54.20   Source B
2022-05-09     13:05              54.20   Source B
2022-05-09     13:10              54.20   Source B
2022-05-09     13:15              34.11   Source A
2022-05-09     13:20              34.11   Source A
2022-05-09     13:25              34.11   Source A
2022-05-09     13:15              39.61   Source B
2022-05-09     13:20              39.61   Source B
2022-05-09     13:25              39.61   Source B
2022-05-09     13:30              2.81    Source A
2022-05-09     13:35              2.81    Source A
2022-05-09     13:40              2.81    Source A
2022-05-09     13:30              17.13   Source B
2022-05-09     13:35              17.13   Source B
2022-05-09     13:40              17.13   Source B
2022-05-09     13:45              1.58    Source A
2022-05-09     13:50              1.58    Source A
2022-05-09     13:55              1.58    Source A
2022-05-09     13:45              15.98   Source B
2022-05-09     13:50              15.98   Source B
2022-05-09     13:55              15.98   Source B
2022-05-09     14:00              4.60    Source A
2022-05-09     14:05              4.60    Source A
2022-05-09     14:10              4.60    Source A
2022-05-09     14:00              18.09   Source B
2022-05-09     14:05              18.09   Source B
2022-05-09     14:10              18.09   Source B
2022-05-09     14:15              2.46    Source A
2022-05-09     14:20              2.46    Source A
2022-05-09     14:25              2.46    Source A
2022-05-09     14:15              16.66   Source B
2022-05-09     14:20              16.66   Source B
2022-05-09     14:25              16.66   Source B
2022-05-09     14:30              3.36    Source A
2022-05-09     14:35              3.36    Source A
2022-05-09     14:40              3.36    Source A
2022-05-09     14:30              21.52   Source B
2022-05-09     14:35              21.52   Source B
2022-05-09     14:40              21.52   Source B
2022-05-09     14:45              4.55    Source A
2022-05-09     14:50              4.55    Source A
2022-05-09     14:55              4.55    Source A
2022-05-09     14:45              16.30   Source B
2022-05-09     14:50              16.30   Source B
2022-05-09     14:55              16.30   Source B
2022-05-09     15:00              -21.87  Source A
2022-05-09     15:00              4.96    Source B
-- query that I am using to get hourly values
SELECT pricet.price_source,
       COALESCE(powert.volume, 0),
       pricet.price,
       powert.status
FROM (SELECT status,
             SUM(mw_power) volume
      FROM power_table
      WHERE (interval_date || ' ' || interval_time_utc)::timestamp BETWEEN '2022-05-09 14:05:00.0' AND '2022-05-09 15:00:00.0'
      GROUP BY status) powert
RIGHT JOIN (SELECT price_source,
                   AVG(price) price
            FROM price_table
            WHERE (interval_date || ' ' || interval_time_utc)::timestamp BETWEEN '2022-05-09 14:05:00.0' AND '2022-05-09 15:00:00.0'
            GROUP BY price_source) pricet
ON pricet.price_source = CASE WHEN powert.status = 'Y' THEN 'Source A'
                              ELSE 'Source B'
                         END;
I am looking to get an expected output of the following for the 15:00 hour:
price_source  volume  price  status
Source A      736.00  0.54   Y
Source B      368.00  17.38  N
Result that I'm getting from query:
price_source  volume  price  status
Source A      736.00  1.54   Y
Source B      368.00  17.05  N
db fiddle link of tables and query and results: https://dbfiddle.uk/?rdbms=postgres_14&fiddle=474b009c5cf5366961751a61c0f96c6c
I think you made a calculator error. I changed your fiddle to add a rolling sum and rolling average for the second part of your query. To get an average of 0.54 (Source A), your sum would need to be 12 less than the total of the values for this hour. 12 is also the count of values for the hour, so possibly a slip of subtracting 12 before dividing by 12?
For the other source (B), the total would need to be off by about 4 (an addition of 4 to the sum). Not sure how this could have happened, but ...
Anyway the fiddle is at https://dbfiddle.uk/?rdbms=postgres_14&fiddle=e65c38677f3ab92607bbff778bc0f69e
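The arithmetic can also be checked outside the database. The following is a minimal sketch in Python that copies the twelve price_table values per source for the 14:05 to 15:00 window; the variable names are illustrative only.

```python
# Prices for the 14:05-15:00 window, copied from price_table (12 rows per source).
source_a = [4.60] * 2 + [2.46] * 3 + [3.36] * 3 + [4.55] * 3 + [-21.87]
source_b = [18.09] * 2 + [16.66] * 3 + [21.52] * 3 + [16.30] * 3 + [4.96]

def mean(values):
    return sum(values) / len(values)

avg_a = mean(source_a)
avg_b = mean(source_b)

# How far the sums would have to move to produce the expected averages (0.54 and 17.38).
gap_a = sum(source_a) - 0.54 * len(source_a)
gap_b = 17.38 * len(source_b) - sum(source_b)

print(round(avg_a, 2), round(avg_b, 2))
print(round(gap_a, 2), round(gap_b, 2))
```

The averages agree with what the query returns (1.54 and 17.05), and the gaps come out close to 12 and 4, which is consistent with a slip in the manual calculation rather than a bug in the query.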

Grouping time-series by some custom datetime range?

I have a simple OHLCV time-series. I obtain it from yahoo finance.
Open High Adj Close Volume
Datetime
2021-11-27 00:00:00+00:00 53736.429688 54287.300781 54287.300781 349732864
2021-11-27 00:15:00+00:00 54321.816406 54470.097656 54470.097656 50278400
2021-11-27 00:30:00+00:00 54362.085938 54688.476562 54563.937500 125132800
2021-11-27 00:45:00+00:00 54552.707031 54552.707031 54208.027344 23285760
2021-11-27 01:00:00+00:00 54186.679688 54304.398438 54080.007812 0
2022-01-25 07:30:00+00:00 35861.457031 36036.191406 36023.011719 389357568
2022-01-25 07:45:00+00:00 36036.500000 36078.332031 36075.312500 102707200
2022-01-25 08:00:00+00:00 36069.089844 36211.867188 36152.500000 234246144
2022-01-25 08:15:00+00:00 36179.812500 36179.812500 36179.812500 125779968
2022-01-25 08:16:00+00:00 36283.058594 36283.058594 36283.058594 0
I know how to group them by frequency using resample.
df.resample("6H", origin="end")
Also I use origin="end" so that the intervals are relative to the end.
Can I somehow group the time-series by a custom-defined set of ranges? Let's say: 6 hours, 12h, 24h, 7 days, 1 month, 3m, 1y ... Those are the time periods subtracted from the last value.
The date-range would look like this:
2021-01-25 08:16:00+00:00
2021-10-25 08:16:00+00:00
2021-12-25 08:16:00+00:00
2022-01-18 08:16:00+00:00
2022-01-24 08:16:00+00:00
2022-01-24 20:16:00+00:00
2022-01-25 02:16:00+00:00
2022-01-25 08:16:00+00:00
Now I know I could do several resamples and then filter and concatenate the resulting data-frames. I was wondering, is there a simpler way to do this?
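One possible direction, sketched here in plain Python rather than pandas, is to build the bin edges once by subtracting each look-back period from the last timestamp and then assign every row to the bin it falls in. The helper names `bucket_edges` and `assign_bucket` are hypothetical, and only fixed-length periods (6h, 12h, 24h, 7 days) are covered; month and year look-backs would need calendar-aware offsets, which this sketch does not attempt.

```python
from bisect import bisect_left
from datetime import datetime, timedelta, timezone

def bucket_edges(last, lookbacks):
    """Ascending bin edges: last minus each look-back period, then last itself."""
    return sorted(last - lb for lb in lookbacks) + [last]

def assign_bucket(ts, edges):
    """Index i of the bin (edges[i], edges[i + 1]] containing ts, or None if outside."""
    pos = bisect_left(edges, ts)
    if pos == 0 or pos == len(edges):
        return None
    return pos - 1

utc = timezone.utc
last = datetime(2022, 1, 25, 8, 16, tzinfo=utc)
edges = bucket_edges(
    last,
    [timedelta(hours=6), timedelta(hours=12), timedelta(hours=24), timedelta(days=7)],
)

# A few timestamps from the series; the resulting label can serve as a grouping key.
samples = [
    datetime(2021, 11, 27, 0, 0, tzinfo=utc),
    datetime(2022, 1, 24, 21, 0, tzinfo=utc),
    datetime(2022, 1, 25, 7, 30, tzinfo=utc),
    last,
]
labels = [assign_bucket(ts, edges) for ts in samples]
print(labels)
```

With the edges computed once, a single pass over the data replaces the repeated resample, filter, and concatenate steps; the bins are left-open and right-closed so that the last timestamp itself lands in the most recent bin.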

How to group field by id and find the sum?

I have the following data
id starting_point ending_point Date
A 2525 6565 25/05/2017 13:25:00
B 5656 8989 25/01/2017 10:55:00
A 1234 5656 20/05/2017 03:20:00
A 4562 6245 01/02/2017 19:45:00
B 6496 9999 06/12/2016 21:55:00
B 1122 2211 20/03/2017 18:30:00
How can I group the data by id, in ascending order of date, and find the sum of the first starting point and the last ending point? In this case, the expected output is:
id starting_point ending_point Date Value
A 4562 6245 01/02/2017 19:45:00
A 1234 5656 20/05/2017 03:20:00
A 2525 6565 25/05/2017 13:25:00 4562 + 6565 = 11127
B 6496 9999 06/12/2016 21:55:00
B 1122 2211 20/03/2017 18:30:00 6496 + 2211 = 8707
IIUC:
In [146]: x.groupby('id').apply(lambda df: df['starting_point'].head(1).values[0]
+ df['ending_point'].tail(1).values[0])
Out[146]:
id
A 8770
B 7867
dtype: int64
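Note that the values above (8770 = 2525 + 6245 and 7867 = 5656 + 2211) follow the original row order, not date order. A minimal sketch in plain Python, assuming the dates are day-first strings as in the question, that sorts by date before taking the first starting point and the last ending point; the function name is illustrative only:

```python
from datetime import datetime

rows = [
    ("A", 2525, 6565, "25/05/2017 13:25:00"),
    ("B", 5656, 8989, "25/01/2017 10:55:00"),
    ("A", 1234, 5656, "20/05/2017 03:20:00"),
    ("A", 4562, 6245, "01/02/2017 19:45:00"),
    ("B", 6496, 9999, "06/12/2016 21:55:00"),
    ("B", 1122, 2211, "20/03/2017 18:30:00"),
]

def first_start_plus_last_end(rows):
    """Per id: starting_point of the earliest row plus ending_point of the latest row."""
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[3], "%d/%m/%Y %H:%M:%S"))
    first_start, last_end = {}, {}
    for rid, start, end, _ in ordered:
        first_start.setdefault(rid, start)  # only the earliest row per id is kept
        last_end[rid] = end                 # overwritten until the latest row per id
    return {rid: first_start[rid] + last_end[rid] for rid in first_start}

totals = first_start_plus_last_end(rows)
print(totals)
```

Parsing the dates matters here: sorting the raw dd/mm/yyyy strings would order them by day of month rather than chronologically.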

How to make SAS Log information appear in Excel file

I need to create a table that keeps track of my SAS job. It should add a new row every time I run the job, without deleting the previous records, as shown below. Each row should contain the following information:
obs Report ID User Start time End Time Duration #of Records
1 328 st150 1/2/2017 16:39 1/2/2017 16:42 02:21 1,231,531,321
2 325 st123 1/2/2017 16:40 1/2/2017 16:41 01:25 231,361,546
3 326 vt125 1/2/2017 16:41 1/2/2017 16:48 07:29 32,654,642
4 328 vt126 2/2/2017 13:22 2/2/2017 13:26 04:23 1,231,531,131
5 326 st150 2/2/2017 13:30 2/2/2017 13:35 05:11 32,653,942
6 329 st320 2/3/2017 13:32 2/3/2017 13:39 07:15 3,562,626,464
Every time the report is run, the next row with the above information needs to be added to that table. Is this doable in SAS? If so, can anyone help me understand how?
Any help will be appreciated.

How to calculate time difference of rows using Lag/Lead in DB2?

I have the following table:
id Date Hour Description Username
1 2015-05-13 10:08 SessionClosed Thierry
2 2015-05-12 23:30 SessionClosed Leao
3 2015-05-12 20:50 SessionOpened Thierry
4 2015-05-11 17:10 SessionOpened Leao
How can I calculate the difference in time of each user's session?
I'm using DB2.
The result should look like this:
id Date Hour Description Username DiffTime
1 2015-05-13 10:08 SessionClosed Thierry 13:18
2 2015-05-12 23:30 SessionClosed Leao 30:20
3 2015-05-12 20:50 SessionOpened Thierry 00:00
4 2015-05-11 17:10 SessionOpened Leao 00:00
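To make the expected DiffTime values concrete, here is a small sketch in Python (not DB2 SQL) that pairs each SessionClosed row with the same user's preceding SessionOpened row, which is the comparison a LAG partitioned by username and ordered by timestamp would perform. It assumes every close has an earlier open for that user; the function name is hypothetical. Note that the gap from 2015-05-12 20:50 to 2015-05-13 10:08 works out to 13:18.

```python
from datetime import datetime

rows = [
    (1, "2015-05-13", "10:08", "SessionClosed", "Thierry"),
    (2, "2015-05-12", "23:30", "SessionClosed", "Leao"),
    (3, "2015-05-12", "20:50", "SessionOpened", "Thierry"),
    (4, "2015-05-11", "17:10", "SessionOpened", "Leao"),
]

def diff_times(rows):
    """Map row id -> 'HH:MM' elapsed since the same user's previous SessionOpened."""
    parsed = [(datetime.strptime(f"{d} {h}", "%Y-%m-%d %H:%M"), rid, desc, user)
              for rid, d, h, desc, user in rows]
    last_opened = {}
    result = {}
    for ts, rid, desc, user in sorted(parsed, key=lambda p: p[0]):
        if desc == "SessionOpened":
            last_opened[user] = ts
            result[rid] = "00:00"
        else:
            # Assumes an open row exists for this user; a KeyError signals otherwise.
            minutes = int((ts - last_opened[user]).total_seconds() // 60)
            result[rid] = f"{minutes // 60:02d}:{minutes % 60:02d}"
    return result

diffs = diff_times(rows)
for rid in sorted(diffs):
    print(rid, diffs[rid])
```

Hours are allowed to exceed 24 in the formatted result, matching the 30:20 shown for Leao.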