Find highest and lowest bar number from resample - pandas

My DataFrame contains 30-minute OHLC data. For each day, I need to find out which bar had the highest value and which bar had the lowest value. For example:
28/05/2018 = the highest value was 1.16329 and it occurred on bar 6 for that day.
29/05/2018 = the highest value was 1.159, occurring on bar 2
I have used the following code, which resamples into daily data, but then I lose the information about which bar of the day the high and low were achieved on.
d3 = df.resample('D').agg({'Open':'first', 'High':'max', 'Low':'min', 'Last':'last'})
Date Time Open High Low Last
28/05/2018 14:30:00 1.16167 1.16252 1.1613 1.16166
28/05/2018 15:00:00 1.16166 1.16287 1.16159 1.16276
28/05/2018 15:30:00 1.16277 1.16293 1.16177 1.16212
28/05/2018 16:00:00 1.16213 1.16318 1.16198 1.16262
28/05/2018 16:30:00 1.16262 1.16298 1.16258 1.16284
28/05/2018 17:00:00 1.16285 1.16329 1.16264 1.16265
28/05/2018 17:30:00 1.16266 1.163 1.16243 1.16289
28/05/2018 18:00:00 1.16288 1.1629 1.16228 1.16269
28/05/2018 18:30:00 1.16269 1.16278 1.16264 1.16274
28/05/2018 19:00:00 1.16275 1.16277 1.1627 1.16275
28/05/2018 19:30:00 1.16276 1.16284 1.1627 1.1628
28/05/2018 20:00:00 1.16279 1.16288 1.16264 1.16278
28/05/2018 20:30:00 1.16278 1.16289 1.1626 1.16265
28/05/2018 21:00:00 1.16267 1.1627 1.16251 1.16262
29/05/2018 14:30:00 1.15793 1.15827 1.15714 1.15786
29/05/2018 15:00:00 1.15785 1.159 1.15741 1.15814
29/05/2018 15:30:00 1.15813 1.15813 1.15601 1.15647
29/05/2018 16:00:00 1.15647 1.15658 1.15451 1.15539
29/05/2018 16:30:00 1.15539 1.15601 1.15418 1.1551
29/05/2018 17:00:00 1.15508 1.15599 1.15463 1.15527
29/05/2018 17:30:00 1.15528 1.15587 1.15442 1.15465
29/05/2018 18:00:00 1.15465 1.15469 1.15196 1.15261
29/05/2018 18:30:00 1.15261 1.15441 1.15261 1.15349
29/05/2018 19:00:00 1.15348 1.15399 1.15262 1.15399
29/05/2018 19:30:00 1.154 1.15412 1.15239 1.15322
29/05/2018 20:00:00 1.15322 1.15373 1.15262 1.15367
29/05/2018 20:30:00 1.15367 1.15419 1.15351 1.15367
29/05/2018 21:00:00 1.15366 1.15438 1.15352 1.15354
29/05/2018 21:30:00 1.15355 1.15355 1.15354 1.15354
30/05/2018 14:30:00 1.16235 1.16323 1.16133 1.16161
30/05/2018 15:00:00 1.16162 1.16193 1.1602 1.16059
Any ideas on how to achieve this?

You could group by date and apply some sorting logic that keeps the Time column, such as:
highs = df.groupby(df.index).apply(lambda x: x.sort_values(by='High').iloc[-1])
lows = df.groupby(df.index).apply(lambda x: x.sort_values(by='Low').iloc[0])
Output:
# Highs
Time Open High Low Last
Date
2018-05-28 17:00:00 1.16285 1.16329 1.16264 1.16265
2018-05-29 15:00:00 1.15785 1.15900 1.15741 1.15814
2018-05-30 14:30:00 1.16235 1.16323 1.16133 1.16161
# Lows
Time Open High Low Last
Date
2018-05-28 14:30:00 1.16167 1.16252 1.16130 1.16166
2018-05-29 18:00:00 1.15465 1.15469 1.15196 1.15261
2018-05-30 15:00:00 1.16162 1.16193 1.16020 1.16059
EDIT
To join them, something like this should do it:
new_df = pd.concat([highs.Time.rename('time_of_high'), lows.Time.rename('time_of_low')], axis=1)
Output:
time_of_high time_of_low
Date
28/05/2018 17:00:00 14:30:00
29/05/2018 15:00:00 18:00:00
30/05/2018 14:30:00 15:00:00
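The question also asks for the bar number, not only the time. A minimal sketch of one way to get it, assuming the frame has a DatetimeIndex that combines Date and Time (the answer above uses a Date index with a separate Time column instead). The sample uses only the first three bars of 28/05 and 29/05 from the question, so the bar numbers differ from the full data; the names `bar`, `high_bar` and `low_bar` are made up for illustration.

```python
import pandas as pd

# First three 30-minute bars of two days, copied from the question's data
idx = pd.to_datetime([
    "2018-05-28 14:30", "2018-05-28 15:00", "2018-05-28 15:30",
    "2018-05-29 14:30", "2018-05-29 15:00", "2018-05-29 15:30",
])
df = pd.DataFrame(
    {"High": [1.16252, 1.16287, 1.16293, 1.15827, 1.15900, 1.15813],
     "Low": [1.16130, 1.16159, 1.16177, 1.15714, 1.15741, 1.15601]},
    index=idx,
)

day = df.index.normalize()                  # calendar day of each bar
df["bar"] = df.groupby(day).cumcount() + 1  # 1-based bar number within the day

# idxmax/idxmin return the timestamp of the extreme bar for each day
high_ts = df.groupby(day)["High"].idxmax()
low_ts = df.groupby(day)["Low"].idxmin()

result = pd.DataFrame({
    "high": df.loc[high_ts, "High"].to_numpy(),
    "high_bar": df.loc[high_ts, "bar"].to_numpy(),
    "low": df.loc[low_ts, "Low"].to_numpy(),
    "low_bar": df.loc[low_ts, "bar"].to_numpy(),
}, index=high_ts.index)
print(result)
```

If two bars tie for the daily high or low, idxmax/idxmin report the first one.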

Related

Pandas replace daily observations by monthly mean

Suppose I have a pandas Series of regularly spaced observations (hourly in this example):
pd_series = pd.Series(np.random.rand(26281), index = pd.date_range('2022-01-01', '2024-12-31', freq = 'H'))
pd_series
2022-01-01 00:00:00 0.933746
2022-01-01 01:00:00 0.588907
2022-01-01 02:00:00 0.229040
2022-01-01 03:00:00 0.557752
2022-01-01 04:00:00 0.798649
2024-12-30 20:00:00 0.314143
2024-12-30 21:00:00 0.670485
2024-12-30 22:00:00 0.300531
2024-12-30 23:00:00 0.075403
2024-12-31 00:00:00 0.716685
What I want is to replace every observation by the monthly average. I know that the average can be calculated as
pd_series.resample('MS').mean()
But how do I assign each monthly mean back to the respective observations?
Use Resampler.transform:
print (pd_series.resample('MS').transform('mean'))
2022-01-01 00:00:00 0.495015
2022-01-01 01:00:00 0.495015
2022-01-01 02:00:00 0.495015
2022-01-01 03:00:00 0.495015
2022-01-01 04:00:00 0.495015
2024-12-30 20:00:00 0.508646
2024-12-30 21:00:00 0.508646
2024-12-30 22:00:00 0.508646
2024-12-30 23:00:00 0.508646
2024-12-31 00:00:00 0.508646
Freq: H, Length: 26281, dtype: float64
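For comparison, the same broadcast can be sketched with a plain groupby on the calendar month instead of a resampler. This is an alternative, not the method from the answer above, and the four sample values are made up so the means are easy to check by hand.

```python
import pandas as pd

# Four hourly observations straddling a month boundary (illustrative values)
idx = pd.to_datetime([
    "2022-01-31 22:00", "2022-01-31 23:00",
    "2022-02-01 00:00", "2022-02-01 01:00",
])
s = pd.Series([1.0, 3.0, 10.0, 20.0], index=idx)

# Group by calendar month and broadcast each month's mean back to its rows
monthly = s.groupby(s.index.to_period("M")).transform("mean")
print(monthly)
```

The result keeps the original index and length, with every row replaced by the mean of its month (2.0 for January, 15.0 for February here).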

Pandas: merge two time series and get the mean values during the period when these two have overlapped time period

I got two pandas dataframes as following:
ts1
Out[50]:
soil_moisture_ids41
date_time
2007-01-07 05:00:00 0.1830
2007-01-07 06:00:00 0.1825
2007-01-07 07:00:00 0.1825
2007-01-07 08:00:00 0.1825
2007-01-07 09:00:00 0.1825
... ...
2017-10-10 20:00:00 0.0650
2017-10-10 21:00:00 0.0650
2017-10-10 22:00:00 0.0650
2017-10-10 23:00:00 0.0650
2017-10-11 00:00:00 0.0650
[94316 rows x 3 columns]
and the other one is
ts2
Out[51]:
soil_moisture_ids42
date_time
2016-07-20 00:00:00 0.147
2016-07-20 01:00:00 0.148
2016-07-20 02:00:00 0.149
2016-07-20 03:00:00 0.150
2016-07-20 04:00:00 0.152
... ...
2019-12-31 19:00:00 0.216
2019-12-31 20:00:00 0.216
2019-12-31 21:00:00 0.215
2019-12-31 22:00:00 0.215
2019-12-31 23:00:00 0.215
[30240 rows x 3 columns]
You can see that from 2007-01-07 to 2016-07-19 only ts1 has data points, and that from 2016-07-20 to 2017-10-11 the two time series overlap. Now I want to combine these two data frames. During the overlapping period, I want the mean of ts1 and ts2. During the non-overlapping periods (2007-01-07 to 2016-07-19 and 2017-10-12 to 2019-12-31), the value at each timestamp should be taken from ts1 or ts2, whichever has it. How can I do this?
Thanks!
Use concat and aggregate with mean: where a timestamp has only one value you get that value back, and where it has several you get their mean. The resulting DatetimeIndex is also sorted:
s = pd.concat([ts1, ts2]).groupby(level=0).mean()
Just store the concatenated series first and then apply the mean, i.e. merged_ts = pd.concat([ts1, ts2]) and then mean_ts = merged_ts.groupby(level=0).mean()
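A small sketch of the concat-then-groupby idea on made-up values, assuming ts1 and ts2 are Series. If they are DataFrames whose value columns have different names, as in the question, concat would keep them as separate columns, so you would first select each column as a Series (for example ts1['soil_moisture_ids41']).

```python
import pandas as pd

# Two short illustrative series that overlap on two timestamps
ts1 = pd.Series(
    [0.10, 0.20, 0.30],
    index=pd.to_datetime(["2016-07-19 23:00", "2016-07-20 00:00", "2016-07-20 01:00"]),
)
ts2 = pd.Series(
    [0.40, 0.50, 0.60],
    index=pd.to_datetime(["2016-07-20 00:00", "2016-07-20 01:00", "2016-07-20 02:00"]),
)

# Stack both series, then average all values that share a timestamp
combined = pd.concat([ts1, ts2]).groupby(level=0).mean()
print(combined)
```

Timestamps present in only one series keep their single value; the two overlapping timestamps get the average of both series.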

resample dataset with one irregular datetime

I have a DataFrame like the following. I want to check the values for every 15 minutes, but I see that there is an irregular time at 09:05:51. How can I resample the DataFrame to 15 minutes?
hour_min value
06:30:00 0.0
06:45:00 0.0
07:00:00 0.0
07:15:00 0.0
07:30:00 102.754717
07:45:00 130.599057
08:00:00 154.117925
08:15:00 189.061321
08:30:00 214.924528
08:45:00 221.382075
09:00:00 190.839623
09:05:51 428.0
09:15:00 170.973995
09:30:00 0.0
09:45:00 0.0
10:00:00 174.448113
10:15:00 174.900943
10:30:00 182.976415
10:45:00 195.783019
11:00:00 200.337292
11:14:00 80.0
11:15:00 206.280952
11:30:00 218.87886
11:45:00 238.251781
12:00:00 115.5
12:15:00 85.5
12:30:00 130.0
12:45:00 141.0
13:00:00 267.353774
13:15:00 257.061321
13:21:00 8.0
13:27:19 80.0
13:30:00 258.761905
13:45:00 254.703088
13:53:52 278.0
14:00:00 254.790476
14:15:00 247.165094
14:30:00 250.061321
14:45:00 264.014151
15:00:00 132.0
15:15:00 108.0
15:30:00 158.5
15:45:00 457.0
16:00:00 273.745283
16:15:00 273.962264
16:30:00 279.089623
16:45:00 280.264151
17:00:00 296.061321
17:15:00 296.481132
17:30:00 282.957547
17:45:00 279.816038
I have tried this line, but I get a TypeError.
res = s.resample('15T').sum()
I tried to convert the index to dates, but that does not work either.
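resample only works on a DatetimeIndex, TimedeltaIndex or PeriodIndex, so the TypeError presumably comes from an index that still holds strings or datetime.time objects. A minimal sketch of one possible fix, assuming hour_min is a column of 'HH:MM:SS' strings (call .astype(str) first if it holds datetime.time objects). The four rows are taken from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "hour_min": ["09:00:00", "09:05:51", "09:15:00", "09:30:00"],
    "value": [190.839623, 428.0, 170.973995, 0.0],
})

# Convert the time-of-day strings to timedeltas so resample has a valid index
s = df.set_index(pd.to_timedelta(df["hour_min"]))["value"]

# 15-minute bins; the irregular 09:05:51 row falls into the 09:00 bin
res = s.resample("15min").sum()
print(res)
```

Whether sum, mean or first is the right aggregation for the irregular rows depends on what the values represent.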

Why do I get different values when I extract data from netCDF files using CDO and ArcGIS for a same grid point?

details of the raw data (Mnth.nc)
netcdf Mnth {
dimensions:
time = UNLIMITED ; // (480 currently)
bnds = 2 ;
longitude = 25 ;
latitude = 33 ;
variables:
double time(time) ;
time:standard_name = "time" ;
time:long_name = "verification time generated by wgrib2 function verftime()" ;
time:bounds = "time_bnds" ;
time:units = "seconds since 1970-01-01 00:00:00.0 0:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
double time_bnds(time, bnds) ;
double longitude(longitude) ;
longitude:standard_name = "longitude" ;
longitude:long_name = "longitude" ;
longitude:units = "degrees_east" ;
longitude:axis = "X" ;
double latitude(latitude) ;
latitude:standard_name = "latitude" ;
latitude:long_name = "latitude" ;
latitude:units = "degrees_north" ;
latitude:axis = "Y" ;
float APCP_sfc(time, latitude, longitude) ;
APCP_sfc:long_name = "Total Precipitation" ;
APCP_sfc:units = "kg/m^2" ;
APCP_sfc:_FillValue = 9.999e+20f ;
APCP_sfc:missing_value = 9.999e+20f ;
APCP_sfc:cell_methods = "time: sum" ;
APCP_sfc:short_name = "APCP_surface" ;
APCP_sfc:level = "surface" ;
}
Detail information of the raw data (Mnth.nc)
File format : NetCDF4 classic
-1 : Institut Source T Steptype Levels Num Points Num Dtype : Parameter ID
1 : unknown unknown v instant 1 1 825 1 F32 : -1
Grid coordinates :
1 : lonlat : points=825 (25x33)
longitude : 87 to 89.88 by 0.12 degrees_east
latitude : 25.08 to 28.92 by 0.12 degrees_north
Vertical coordinates :
1 : surface : levels=1
Time coordinate : 480 steps
RefTime = 1970-01-01 00:00:00 Units = seconds Calendar = standard Bounds = true
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
1980-01-16 12:30:00 1980-02-15 12:30:00 1980-03-16 12:30:00 1980-04-16 00:30:00
1980-05-16 12:30:00 1980-06-16 00:30:00 1980-07-16 12:30:00 1980-08-16 12:30:00
1980-09-16 00:30:00 1980-10-16 12:30:00 1980-11-16 00:30:00 1980-12-16 12:30:00
1981-01-16 12:30:00 1981-02-15 00:30:00 1981-03-16 12:30:00 1981-04-16 00:30:00
1981-05-16 12:30:00 1981-06-16 00:30:00 1981-07-16 12:30:00 1981-08-16 12:30:00
1981-09-16 00:30:00 1981-10-16 12:30:00 1981-11-16 00:30:00 1981-12-16 12:30:00
1982-01-16 12:30:00 1982-02-15 00:30:00 1982-03-16 12:30:00 1982-04-16 00:30:00
1982-05-16 12:30:00 1982-06-16 00:30:00 1982-07-16 12:30:00 1982-08-16 12:30:00
1982-09-16 00:30:00 1982-10-16 12:30:00 1982-11-16 00:30:00 1982-12-16 12:30:00
1983-01-16 12:30:00 1983-02-15 00:30:00 1983-03-16 12:30:00 1983-04-16 00:30:00
1983-05-16 12:30:00 1983-06-16 00:30:00 1983-07-16 12:30:00 1983-08-16 12:30:00
1983-09-16 00:30:00 1983-10-16 12:30:00 1983-11-16 00:30:00 1983-12-16 12:30:00
1984-01-16 12:30:00 1984-02-15 12:30:00 1984-03-16 12:30:00 1984-04-16 00:30:00
1984-05-16 12:30:00 1984-06-16 00:30:00 1984-07-16 12:30:00 1984-08-16 12:30:00
1984-09-16 00:30:00 1984-10-16 12:30:00 1984-11-16 00:30:00 1984-12-16 12:30:00
................................................................................
............................
2016-01-16 12:30:00 2016-02-15 12:30:00 2016-03-16 12:30:00 2016-04-16 00:30:00
2016-05-16 12:30:00 2016-06-16 00:30:00 2016-07-16 12:30:00 2016-08-16 12:30:00
2016-09-16 00:30:00 2016-10-16 12:30:00 2016-11-16 00:30:00 2016-12-16 12:30:00
2017-01-16 12:30:00 2017-02-15 00:30:00 2017-03-16 12:30:00 2017-04-16 00:30:00
2017-05-16 12:30:00 2017-06-16 00:30:00 2017-07-16 12:30:00 2017-08-16 12:30:00
2017-09-16 00:30:00 2017-10-16 12:30:00 2017-11-16 00:30:00 2017-12-16 12:30:00
2018-01-16 12:30:00 2018-02-15 00:30:00 2018-03-16 12:30:00 2018-04-16 00:30:00
2018-05-16 12:30:00 2018-06-16 00:30:00 2018-07-16 12:30:00 2018-08-16 12:30:00
2018-09-16 00:30:00 2018-10-16 12:30:00 2018-11-16 00:30:00 2018-12-16 12:30:00
2019-01-16 12:30:00 2019-02-15 00:30:00 2019-03-16 12:30:00 2019-04-16 00:30:00
2019-05-16 12:30:00 2019-06-16 00:30:00 2019-07-16 12:30:00 2019-08-16 12:30:00
2019-09-16 00:30:00 2019-10-16 12:30:00 2019-11-16 00:30:00 2019-12-16 12:30:00
2020-01-16 12:30:00 2020-02-15 12:30:00 2020-03-16 12:30:00 2020-04-16 00:30:00
2020-05-16 12:30:00 2020-06-16 00:30:00 2020-07-16 12:30:00 2020-08-16 12:30:00
2020-09-16 00:30:00 2020-10-16 12:30:00 2020-11-16 00:30:00 2020-12-16 12:30:00
cdo sinfo: Processed 1 variable over 480 timesteps [0.50s 30MB].
I extracted monthly rainfall values from the Mnth.nc file for a location (lon: 88.44; lat: 27.12) using the following commands:
cdo remapnn,lon=88.44_lat=27.12 Mnth.nc Mnth1.nc
cdo outputtab,year,month,value Mnth1.nc > Mnth.csv
The output is as follows:
Year month Value
1980 1 31.74219
1980 2 54.60938
1980 3 66.94531
1980 4 149.4062
1980 5 580.7227
1980 6 690.1328
1980 7 1146.305
1980 8 535.8164
1980 9 486.4688
1980 10 119.5391
1980 11 82.10547
1980 12 13.95703
Then I extracted the rainfall values from the same data (Mnth.nc) for the same location (lon: 88.44; lat: 27.12) using the features of the multidimensional toolbox provided in ArcGIS. The result is as follows-
year month Value
1980 1 38.8125
1980 2 58.6542969
1980 3 71.7382813
1980 4 148.6367188
1980 5 564.7070313
1980 6 653.0390625
1980 7 1026.832031
1980 8 501.3164063
1980 9 458.5429688
1980 10 113.078125
1980 11 74.0976563
1980 12 24.2265625
Why am I getting different results from two different software packages for the same location and the same variable? Any help will be highly appreciated.
Thanks in advance.
The question is perhaps misleading, in that you are not "extracting" the data in either case; you are interpolating it. The method used by CDO is nearest neighbour. ArcGIS is probably using a different method, so you should expect slightly different results.
The results look very similar, so both are almost certainly working as advertised.
I think I ran into the same issue. I used CDO to extract a point and used ArcGIS for cross-checking, and found that the values were different.
Just to be sure, I recorded the extent of one particular cell and tried extracting values for different locations within that cell's boundary. CDO gave the same result each time, as expected, because it uses the nearest neighbour resampling method.
Then I tried the same with ArcGIS. Interestingly, in my case ArcGIS sometimes gave the same result within the same cell boundary and sometimes a different one. I also checked the values using 'Panoply' and realised that CDO gave accurate results, while ArcGIS was sometimes giving offset results, i.e. the values of nearby cells. This was confirmed by cross-checking with Panoply. @Robert Wilson mentioned that ArcGIS must be using a different resampling method, but the results section of the 'Netcdf to table view' tool showed that it also uses the nearest neighbour method. This is not an answer to your question, just something I found.
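To check which cell a nearest neighbour lookup should pick, the grid from the sinfo output (longitude 87 to 89.88 by 0.12, latitude 25.08 to 28.92 by 0.12) can be rebuilt with NumPy. This is only a sketch of the index arithmetic and does not reproduce what CDO or ArcGIS do internally; `nearest_index` is a made-up helper name.

```python
import numpy as np

# Cell-centre coordinates as described by `cdo sinfo` for Mnth.nc
lon = 87.0 + 0.12 * np.arange(25)    # 87.00 ... 89.88
lat = 25.08 + 0.12 * np.arange(33)   # 25.08 ... 28.92

def nearest_index(coords, target):
    """Return the index of the coordinate closest to target."""
    return int(np.abs(coords - target).argmin())

i_lon = nearest_index(lon, 88.44)
i_lat = nearest_index(lat, 27.12)
print(i_lon, i_lat)  # 0-based indices along longitude and latitude
```

For this grid the requested point (88.44, 27.12) coincides with a cell centre (0-based indices 12 and 17), so a nearest neighbour lookup should return that cell's stored value unchanged. Reading APCP_sfc at those indices in Panoply or another viewer gives a reference value to compare both tools against.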

SQL Server : get start time and end time with in the multiple night shift

How can I get the start and end time from this list? I can add the date to the time and take the min and max, but as you can see, row 3 belongs to the next day's shift yet falls under the same date because it is a night shift.
I have also added a normal day-shift employee to get the logic right.
EmployeeId ShiftDate ShiftStartTime ShiftEndTime
-----------------------------------------------------
20040 2017-11-01 21:00:00 23:00:00
20040 2017-11-01 23:00:00 00:30:00
20040 2017-11-01 00:30:00 06:00:00
20124 2017-11-01 09:00:00 16:30:00
20124 2017-11-01 16:30:00 22:00:00
20124 2017-11-01 22:00:00 22:30:00
I need it like below:
EmployeeId ShiftDate ShiftStartTime ShiftEndTime
----------------------------------------------------
20040 2017-11-01 21:00:00 06:00:00
20124 2017-11-01 09:00:00 22:30:00
In a commercial environment we solved this by attaching a FLAG to each shift. The flag indicates the 'Reporting Date' of the shift: it has a value of 1 if the 'Reporting / Administrative date' is the 'next' day, 0 for the same day, and -1 for the previous day (which we never used; it depends on your scenario).
I modified your table to show a possible SHIFTS table, which should probably also have a NAME column (Morning, Afternoon, Day, Night shift, etc.):
ReportFlag ShiftStartTime ShiftEndTime
1 21:00:00 23:00:00
1 23:00:00 00:30:00
0 00:30:00 06:00:00
0 09:00:00 16:30:00
0 16:30:00 22:00:00
1 22:00:00 22:30:00
Notice how I set the flag to 1 to say that 'this shift' is actually considered to be on the 'next' day.
You can then add the flag value (0 or 1) to the date in the DATE functions in your queries.