Pandas: merge two time series and get the mean values during the period when these two have overlapped time period - pandas

I got two pandas dataframes as following:
ts1
Out[50]:
soil_moisture_ids41
date_time
2007-01-07 05:00:00 0.1830
2007-01-07 06:00:00 0.1825
2007-01-07 07:00:00 0.1825
2007-01-07 08:00:00 0.1825
2007-01-07 09:00:00 0.1825
... ...
2017-10-10 20:00:00 0.0650
2017-10-10 21:00:00 0.0650
2017-10-10 22:00:00 0.0650
2017-10-10 23:00:00 0.0650
2017-10-11 00:00:00 0.0650
[94316 rows x 3 columns]
and the other one is
ts2
Out[51]:
soil_moisture_ids42
date_time
2016-07-20 00:00:00 0.147
2016-07-20 01:00:00 0.148
2016-07-20 02:00:00 0.149
2016-07-20 03:00:00 0.150
2016-07-20 04:00:00 0.152
... ...
2019-12-31 19:00:00 0.216
2019-12-31 20:00:00 0.216
2019-12-31 21:00:00 0.215
2019-12-31 22:00:00 0.215
2019-12-31 23:00:00 0.215
[30240 rows x 3 columns]
You could see that, from 2007-01-07 to 2016-07-19, only ts1 has the data points. And from 2016-07-20 to 2017-10-11 there are some overlapped time series. Now I want to combine these two data frames. During the overlapped period, I want to get the mean values over ts1 and ts2. During the non-overlapped period, (2007-01-07 to 2016-07-19 and 2017-10-12 to 2019-12-31), the values at each time stamp is set as the value from ts1 or ts2. So how can I do it?
Thanks!

Use concat with aggregate mean, if only one value get same ouput, if multiple get mean. Also finally DatatimeIndex is sorted:
s = pd.concat([ts1, ts2]).groupby(level=0).mean()

Just store the concatenated series first and then apply the mean. i.e. merged_ts = pd.concat([ts1, ts2]) and then mean_ts = merged_ts.group_by(level=0).mean()

Related

Pandas replace daily observations by monthly mean

Suppose, I have a pandas Series with daily observations:
pd_series = pd.Series(np.random.rand(26281), index = pd.date_range('2022-01-01', '2024-12-31', freq = 'H'))
pd_series
2022-01-01 00:00:00 0.933746
2022-01-01 01:00:00 0.588907
2022-01-01 02:00:00 0.229040
2022-01-01 03:00:00 0.557752
2022-01-01 04:00:00 0.798649
2024-12-30 20:00:00 0.314143
2024-12-30 21:00:00 0.670485
2024-12-30 22:00:00 0.300531
2024-12-30 23:00:00 0.075403
2024-12-31 00:00:00 0.716685
What I want is to replace every observation by the monthly average. I know that the average can be calculated as
pd_series.resample('MS').mean()
But how do I put the observations to the respective observations?
Use Resampler.transform:
print (pd_series.resample('MS').transform('mean'))
2022-01-01 00:00:00 0.495015
2022-01-01 01:00:00 0.495015
2022-01-01 02:00:00 0.495015
2022-01-01 03:00:00 0.495015
2022-01-01 04:00:00 0.495015
2024-12-30 20:00:00 0.508646
2024-12-30 21:00:00 0.508646
2024-12-30 22:00:00 0.508646
2024-12-30 23:00:00 0.508646
2024-12-31 00:00:00 0.508646
Freq: H, Length: 26281, dtype: float64

resample dataset with one irregular datetime

I have a dataframe like the following. I wanted to check the values for each 15minutes. But I see that there is a time at 09:05:51. How can I resample the dataframe for 15minutes?
hour_min value
06:30:00 0.0
06:45:00 0.0
07:00:00 0.0
07:15:00 0.0
07:30:00 102.754717
07:45:00 130.599057
08:00:00 154.117925
08:15:00 189.061321
08:30:00 214.924528
08:45:00 221.382075
09:00:00 190.839623
09:05:51 428.0
09:15:00 170.973995
09:30:00 0.0
09:45:00 0.0
10:00:00 174.448113
10:15:00 174.900943
10:30:00 182.976415
10:45:00 195.783019
11:00:00 200.337292
11:14:00 80.0
11:15:00 206.280952
11:30:00 218.87886
11:45:00 238.251781
12:00:00 115.5
12:15:00 85.5
12:30:00 130.0
12:45:00 141.0
13:00:00 267.353774
13:15:00 257.061321
13:21:00 8.0
13:27:19 80.0
13:30:00 258.761905
13:45:00 254.703088
13:53:52 278.0
14:00:00 254.790476
14:15:00 247.165094
14:30:00 250.061321
14:45:00 264.014151
15:00:00 132.0
15:15:00 108.0
15:30:00 158.5
15:45:00 457.0
16:00:00 273.745283
16:15:00 273.962264
16:30:00 279.089623
16:45:00 280.264151
17:00:00 296.061321
17:15:00 296.481132
17:30:00 282.957547
17:45:00 279.816038
I have tried this line, but i get a typeError.
res = s.resample('15T').sum()
I tried to make the index to date, but it does not work too.

Aggregate time from 15-minute interval to single hour in SQL

I have below table structure in SQL Server:
StartDate Start End Sales
==============================================
2020-08-25 00:00:00 00:15:00 291.4200
2020-08-25 00:15:00 00:30:00 401.1700
2020-08-25 00:30:00 00:45:00 308.3300
2020-08-25 00:45:00 01:00:00 518.3200
2020-08-25 01:00:00 01:15:00 247.3700
2020-08-25 01:15:00 01:30:00 115.4700
2020-08-25 01:30:00 01:45:00 342.3800
2020-08-25 01:45:00 02:00:00 233.0900
2020-08-25 02:00:00 02:15:00 303.3400
2020-08-25 02:15:00 02:30:00 11.9000
2020-08-25 02:30:00 02:45:00 115.2400
2020-08-25 02:45:00 03:00:00 199.5200
2020-08-25 06:00:00 06:15:00 0.0000
2020-08-25 06:15:00 06:30:00 45.2400
2020-08-25 06:30:00 06:45:00 30.4800
2020-08-25 06:45:00 07:00:00 0.0000
2020-08-25 07:00:00 07:15:00 0.0000
2020-08-25 07:15:00 07:30:00 69.2800
Is there a way to group above data into one hour interval instead of 15 minute interval?
It has to be based on start and end columns.
Thanks,
Maybe something like the following using datepart?
select startdate, DatePart(hour,start) [Hour], Sum(sales) SalesPerHour
from t
group by startdate, DatePart(hour,start)

Pandas DateTime Calculating Daily Averages

I have 2 columns of data in a pandas DF that looks like this with the "DateTime" column in format YYYY-MM-DD HH:MM:SS - this is first 24 hrs but the df is for one full year or 8784 x 2.
BAFFIN BAY DateTime
8759 8.112838 2016-01-01 00:00:00
8760 7.977169 2016-01-01 01:00:00
8761 8.420204 2016-01-01 02:00:00
8762 9.515370 2016-01-01 03:00:00
8763 9.222840 2016-01-01 04:00:00
8764 8.872423 2016-01-01 05:00:00
8765 8.776145 2016-01-01 06:00:00
8766 9.030668 2016-01-01 07:00:00
8767 8.394983 2016-01-01 08:00:00
8768 8.092915 2016-01-01 09:00:00
8769 8.946967 2016-01-01 10:00:00
8770 9.620883 2016-01-01 11:00:00
8771 9.535951 2016-01-01 12:00:00
8772 8.861761 2016-01-01 13:00:00
8773 9.077692 2016-01-01 14:00:00
8774 9.116074 2016-01-01 15:00:00
8775 8.724343 2016-01-01 16:00:00
8776 8.916940 2016-01-01 17:00:00
8777 8.920438 2016-01-01 18:00:00
8778 8.926278 2016-01-01 19:00:00
8779 8.817666 2016-01-01 20:00:00
8780 8.704014 2016-01-01 21:00:00
8781 8.496358 2016-01-01 22:00:00
8782 8.434297 2016-01-01 23:00:00
I am trying to calculate daily averages of the "BAFFIN BAY" and I've tried these approaches:
davg_df2 = df2.groupby(pd.Grouper(freq='D', key='DateTime')).mean()
davg_df2 = df2.groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
davg_df2 = df2.groupby(by=df2['DateTime'].dt.date).mean()
All of these approaches yields the same answer as shown below :
BAFFIN BAY
DateTime
2016-01-01 6.008044
However, if you do the math, the correct average for 2016-01-01 is 8.813134 Thank you kindly for your help. I'm assuming the grouping is just by day or 24hrs to make consecutive DAILY averages but the 3 approaches above clearly is looking at other data in my 8784 x 2 DF.
I just ran your df with this code and i get 8.813134:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.groupby(by=pd.Grouper(freq='D', key='DateTime')).mean()
print(df)
Output:
BAFFIN BAY
DateTime
2016-01-01 8.813134

Select data between night and day hours

My data looks like this, it is a minute based data for 2 years.
2017-04-02 00:00:00
2017-04-02 00:01:00
2017-04-02 00:02:00
2017-04-02 00:03:00
2017-04-02 00:04:00
....
2017-04-02 23:59:00
...
2019-02-01 22:54:00
2019-02-01 22:55:00
2019-02-01 22:56:00
2019-02-01 22:57:00
2019-02-01 22:58:00
2019-02-01 22:59:00
2019-02-01 23:00:00
I want to access all the data rows between the end of the workday to the beginning of the next. Example between 2018-04-02 18:00:00 2018-04-03 05:00:00 for all the days in my data frame. Please help
If you use a DatetimeIndex then you can use .between_time
import pandas as pd
df = pd.DataFrame({'date': pd.date_range('2017-04-02', freq='90min', periods=100)})
df = df.set_index('date')
df.between_time('18:00', '5:00')
#date
#2017-04-02 00:00:00
#2017-04-02 01:30:00
#2017-04-02 03:00:00
#2017-04-02 04:30:00
#2017-04-02 18:00:00
#2017-04-02 19:30:00
#2017-04-02 21:00:00
#2017-04-02 22:30:00
#....
One approach is boolean indexing based on conditions on the datetime column or index. Assuming your DataFrame is named df and it has a DatetimeIndex equal to the example data you've posted, try this:
df[(df.index.hour >= 18) | (df.index.hour <= 5)]