Interpolating datetime Index - pandas

I have a DataFrame (df) as follows, where 'date' is a datetime index (Y-M-D):
df:
            values
date
2010-01-01      10
2010-01-02      20
2010-01-03     -30
I want to create a new df with an interpolated datetime index, as follows:
                     values
date
2010-01-01 12:00:00      10
2010-01-01 17:00:00      15    # mean value betw. 2010-01-01 and 2010-01-02
2010-01-02 12:00:00      20
2010-01-02 17:00:00      -5    # mean value betw. 2010-01-02 and 2010-01-03
2010-01-03 12:00:00     -30
Can anyone help me with this?

I believe you need to add 12 hours to the index first, then reindex by the union of that index with the index shifted by 17 hours, and finally interpolate:
# shift the daily stamps to 12:00
df1 = df.set_index(df.index + pd.Timedelta(12, unit='h'))
# add the 17:00 stamps, then fill them by interpolation
idx = (df.index + pd.Timedelta(17, unit='h')).union(df1.index)
df2 = df1.reindex(idx).interpolate()
print(df2)
                     values
date
2010-01-01 12:00:00    10.0
2010-01-01 17:00:00    15.0
2010-01-02 12:00:00    20.0
2010-01-02 17:00:00    -5.0
2010-01-03 12:00:00   -30.0
2010-01-03 17:00:00   -30.0
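As an aside, the default interpolate() works positionally, so the inserted 17:00 rows are filled with the simple midpoint of their neighbours (and the trailing one is carried forward), which is the "mean value" the question asks for. If time-weighted values were wanted instead, method='time' could be passed. A minimal, self-contained sketch (the sample frame below is rebuilt from the question):
import pandas as pd

# sample data rebuilt from the question
df = pd.DataFrame({'values': [10, 20, -30]},
                  index=pd.DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03'], name='date'))

df1 = df.set_index(df.index + pd.Timedelta(12, unit='h'))
idx = (df.index + pd.Timedelta(17, unit='h')).union(df1.index)

# default: positional linear interpolation -> 15.0 and -5.0 at the 17:00 stamps
print(df1.reindex(idx).interpolate())
# alternative: weight by the actual time gap instead of by row position
print(df1.reindex(idx).interpolate(method='time'))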

Related

Pandas replace daily observations by monthly mean

Suppose I have a pandas Series with daily observations:
pd_series = pd.Series(np.random.rand(26281), index = pd.date_range('2022-01-01', '2024-12-31', freq = 'H'))
pd_series
2022-01-01 00:00:00    0.933746
2022-01-01 01:00:00    0.588907
2022-01-01 02:00:00    0.229040
2022-01-01 03:00:00    0.557752
2022-01-01 04:00:00    0.798649
                         ...
2024-12-30 20:00:00    0.314143
2024-12-30 21:00:00    0.670485
2024-12-30 22:00:00    0.300531
2024-12-30 23:00:00    0.075403
2024-12-31 00:00:00    0.716685
Freq: H, Length: 26281, dtype: float64
What I want is to replace every observation by the monthly average. I know that the average can be calculated as
pd_series.resample('MS').mean()
But how do I map those monthly averages back onto the respective observations?
Use Resampler.transform:
print(pd_series.resample('MS').transform('mean'))
2022-01-01 00:00:00    0.495015
2022-01-01 01:00:00    0.495015
2022-01-01 02:00:00    0.495015
2022-01-01 03:00:00    0.495015
2022-01-01 04:00:00    0.495015
                         ...
2024-12-30 20:00:00    0.508646
2024-12-30 21:00:00    0.508646
2024-12-30 22:00:00    0.508646
2024-12-30 23:00:00    0.508646
2024-12-31 00:00:00    0.508646
Freq: H, Length: 26281, dtype: float64
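For what it's worth, the same broadcasting can be spelled with a plain groupby on monthly periods; a small sketch under the same setup as the question (the two results should match):
import numpy as np
import pandas as pd

# hourly series as in the question
pd_series = pd.Series(np.random.rand(26281),
                      index=pd.date_range('2022-01-01', '2024-12-31', freq='H'))

# broadcast each month's mean back to every hourly observation
via_resample = pd_series.resample('MS').transform('mean')
via_groupby = pd_series.groupby(pd_series.index.to_period('M')).transform('mean')
print(np.allclose(via_resample, via_groupby))  # True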

Overlap in seconds between datetime range and a time range

I have a dataframe like this:
df11 = pd.DataFrame(
    {
        "Start_date": ["2018-01-31 12:00:00", "2018-02-28 16:00:00", "2018-02-27 22:00:00"],
        "End_date": ["2019-01-31 21:45:00", "2019-03-24 22:00:00", "2018-02-28 01:00:00"],
    }
)
            Start_date             End_date
0  2018-01-31 12:00:00  2019-01-31 21:45:00
1  2018-02-28 16:00:00  2019-03-24 22:00:00
2  2018-02-27 22:00:00  2018-02-28 01:00:00
I need to compute, in seconds, how long each row's interval overlaps with specific time-of-day periods. My expected results look like this:
            Start_date             End_date  12h-16h  16h-22h  22h-00h  00h-02h30
0  2018-01-31 12:00:00  2019-01-31 21:45:00    14400    20700        0          0
1  2018-02-28 16:00:00  2019-03-24 22:00:00        0    21600        0          0
2  2018-02-27 22:00:00  2018-02-28 01:00:00        0        0     7200       3600
I know it's completely wrong and I've tried other solutions. This is one of my attempts:
df11['12h-16h'] = np.where(
    (df11['Start_date'] < timedelta(hours=16)) & (df11['End_date'] > timedelta(hours=12)),
    np.minimum(df11['End_date'], timedelta(hours=16)) - np.maximum(df11['Start_date'], timedelta(hours=12)),
    0,
)
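No answer is quoted here, but the core building block is interval clipping: the overlap of [start, end] with a window [win_start, win_end] is max(0, min(end, win_end) - max(start, win_start)). A minimal sketch for a single clock-time window on a single day (the helper name and the hard-coded window are illustrative, not from the post; ranges spanning several days, or the cross-midnight windows, would need this applied day by day):
import pandas as pd

def overlap_seconds(start, end, win_start, win_end):
    # seconds of overlap between [start, end] and [win_start, win_end]; 0 if disjoint
    latest_start = max(start, win_start)
    earliest_end = min(end, win_end)
    return max((earliest_end - latest_start).total_seconds(), 0.0)

row_start = pd.Timestamp('2018-01-31 12:00:00')
row_end = pd.Timestamp('2019-01-31 21:45:00')

# 12h-16h window on the start date -> 4 hours = 14400 seconds
day = row_start.normalize()
print(overlap_seconds(row_start, row_end,
                      day + pd.Timedelta(hours=12), day + pd.Timedelta(hours=16)))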

Pandas DateTime Calculating Daily Averages

I have two columns of data in a pandas DataFrame. The "DateTime" column is in the format YYYY-MM-DD HH:MM:SS; the sample below is the first 24 hours, but the DataFrame covers one full year (8784 x 2).
      BAFFIN BAY             DateTime
8759    8.112838  2016-01-01 00:00:00
8760    7.977169  2016-01-01 01:00:00
8761    8.420204  2016-01-01 02:00:00
8762    9.515370  2016-01-01 03:00:00
8763    9.222840  2016-01-01 04:00:00
8764    8.872423  2016-01-01 05:00:00
8765    8.776145  2016-01-01 06:00:00
8766    9.030668  2016-01-01 07:00:00
8767    8.394983  2016-01-01 08:00:00
8768    8.092915  2016-01-01 09:00:00
8769    8.946967  2016-01-01 10:00:00
8770    9.620883  2016-01-01 11:00:00
8771    9.535951  2016-01-01 12:00:00
8772    8.861761  2016-01-01 13:00:00
8773    9.077692  2016-01-01 14:00:00
8774    9.116074  2016-01-01 15:00:00
8775    8.724343  2016-01-01 16:00:00
8776    8.916940  2016-01-01 17:00:00
8777    8.920438  2016-01-01 18:00:00
8778    8.926278  2016-01-01 19:00:00
8779    8.817666  2016-01-01 20:00:00
8780    8.704014  2016-01-01 21:00:00
8781    8.496358  2016-01-01 22:00:00
8782    8.434297  2016-01-01 23:00:00
I am trying to calculate daily averages of the "BAFFIN BAY" column and I've tried these approaches:
davg_df2 = df2.groupby(pd.Grouper(freq='D', key='DateTime')).mean()
davg_df2 = df2.groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
davg_df2 = df2.groupby(by=df2['DateTime'].dt.date).mean()
All of these approaches yield the same answer, shown below:
            BAFFIN BAY
DateTime
2016-01-01    6.008044
However, if you do the math, the correct average for 2016-01-01 is 8.813134. Thank you kindly for your help. I'm assuming the grouping is just by day (24 hrs) to produce consecutive daily averages, but the three approaches above are clearly picking up other data in my 8784 x 2 DataFrame.
I just ran your df with this code and I get 8.813134:
# make sure DateTime is a real datetime dtype before grouping
df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.groupby(by=pd.Grouper(freq='D', key='DateTime')).mean()
print(df)
Output:
            BAFFIN BAY
DateTime
2016-01-01    8.813134
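For what it's worth, an equivalent spelling is resample with on='DateTime'; a quick sketch on a tiny made-up frame (the values below are illustrative, not the Baffin Bay data):
import pandas as pd

# small illustrative frame, not the real 8784-row data
df = pd.DataFrame({
    'BAFFIN BAY': [8.0, 9.0, 10.0, 4.0],
    'DateTime': ['2016-01-01 00:00:00', '2016-01-01 12:00:00',
                 '2016-01-01 23:00:00', '2016-01-02 00:00:00'],
})
df['DateTime'] = pd.to_datetime(df['DateTime'])

# same daily means as groupby(pd.Grouper(freq='D', key='DateTime'))
print(df.resample('D', on='DateTime').mean())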

Pandas datetime comparison

I have the following dataframe:
start = ['31/12/2011 01:00','31/12/2011 01:00','31/12/2011 01:00','01/01/2013 08:00','31/12/2012 20:00']
end = ['02/01/2013 01:00','02/01/2014 01:00','02/01/2014 01:00','01/01/2013 14:00','01/01/2013 04:00']
df = pd.DataFrame({'start':start,'end':end})
df['start'] = pd.to_datetime(df['start'],format='%d/%m/%Y %H:%M')
df['end'] = pd.to_datetime(df['end'],format='%d/%m/%Y %H:%M')
print(df)
                  end               start
0 2013-01-02 01:00:00 2011-12-31 01:00:00
1 2014-01-02 01:00:00 2011-12-31 01:00:00
2 2014-01-02 01:00:00 2011-12-31 01:00:00
3 2013-01-01 14:00:00 2013-01-01 08:00:00
4 2013-01-01 04:00:00 2012-12-31 20:00:00
I am trying to compare df.end and df.start to two given dates, year_start and year_end:
year_start = pd.to_datetime(2013,format='%Y')
year_end = pd.to_datetime(2013+1,format='%Y')
print(year_start)
print(year_end)
2013-01-01 00:00:00
2014-01-01 00:00:00
But I can't get my comparison to work (the comparison is in conditions):
conditions = [(df['start'].any()< year_start) and (df['end'].any()> year_end)]
choices = [8760]
df['test'] = np.select(conditions, choices, default=0)
I also tried to define year_end and year_start as follows but it does not work either:
year_start = np.datetime64(pd.to_datetime(2013,format='%Y'))
year_end = np.datetime64(pd.to_datetime(2013+1,format='%Y'))
Any idea on how I could make it work?
Try this:
In [797]: df[(df['start'] < year_start) & (df['end'] > year_end)]
Out[797]:
                  end               start
1 2014-01-02 01:00:00 2011-12-31 01:00:00
2 2014-01-02 01:00:00 2011-12-31 01:00:00
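Since the original goal was a test column built with np.select, the same element-wise mask (comparisons in parentheses, combined with &, no .any() and no Python and) can be plugged straight into conditions; a short sketch reusing the df, year_start and year_end defined above:
import numpy as np

# element-wise boolean mask, one value per row
conditions = [(df['start'] < year_start) & (df['end'] > year_end)]
choices = [8760]

# rows 1 and 2 get 8760, everything else the default 0
df['test'] = np.select(conditions, choices, default=0)
print(df)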

Pandas Datetime conversion

I have the following dataframe:
Date = ['01-Jan','01-Jan','01-Jan','01-Jan']
Heure = ['00:00','01:00','02:00','03:00']
value =[1,2,3,4]
df = pd.DataFrame({'value':value,'Date':Date,'Hour':Heure})
print(df)
     Date   Hour  value
0  01-Jan  00:00      1
1  01-Jan  01:00      2
2  01-Jan  02:00      3
3  01-Jan  03:00      4
I am trying to create a datetime index, knowing that the file I am working with is for 2015. I have tried a lot of things but can't get it to work! I tried to convert only the day and the month, but even that does not work:
df.index = pd.to_datetime(df['Date'],format='%d-%m')
I expect the following result:
                       Date   Hour  value
2015-01-01 00:00:00  01-Jan  00:00      1
2015-01-01 01:00:00  01-Jan  01:00      2
2015-01-01 02:00:00  01-Jan  02:00      3
2015-01-01 03:00:00  01-Jan  03:00      4
Does anyone know how to do it?
Thanks,
You need to explicitly add 2015 somehow, and include the Hour column as well. I would do something like this:
df.index = pd.to_datetime(df.Date + '-2015 ' + df.Hour, format='%d-%b-%Y %H:%M')
>>> df
                       Date   Hour  value
2015-01-01 00:00:00  01-Jan  00:00      1
2015-01-01 01:00:00  01-Jan  01:00      2
2015-01-01 02:00:00  01-Jan  02:00      3
2015-01-01 03:00:00  01-Jan  03:00      4
You can replace the default year 1900 by using replace:
s = pd.to_datetime(df['Date'] + df['Hour'], format='%d-%b%H:%M').apply(lambda x: x.replace(year=2015))
s
Out[131]:
0 2015-01-01 00:00:00
1 2015-01-01 01:00:00
2 2015-01-01 02:00:00
3 2015-01-01 03:00:00
dtype: datetime64[ns]
df.index=s
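As an aside, for this particular data (where every parsed year falls back to the 1900 default), the whole Series can also be shifted at once with a DateOffset instead of calling replace per element; a small sketch:
import pandas as pd

Date = ['01-Jan', '01-Jan', '01-Jan', '01-Jan']
Heure = ['00:00', '01:00', '02:00', '03:00']
df = pd.DataFrame({'value': [1, 2, 3, 4], 'Date': Date, 'Hour': Heure})

# parsing without a year defaults to 1900; add 115 years to land on 2015
s = pd.to_datetime(df['Date'] + ' ' + df['Hour'], format='%d-%b %H:%M') + pd.DateOffset(years=115)
df.index = s
print(df)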