Pandas datetime comparison - pandas

I have the following dataframe:
start = ['31/12/2011 01:00','31/12/2011 01:00','31/12/2011 01:00','01/01/2013 08:00','31/12/2012 20:00']
end = ['02/01/2013 01:00','02/01/2014 01:00','02/01/2014 01:00','01/01/2013 14:00','01/01/2013 04:00']
df = pd.DataFrame({'start':start,'end':end})
df['start'] = pd.to_datetime(df['start'],format='%d/%m/%Y %H:%M')
df['end'] = pd.to_datetime(df['end'],format='%d/%m/%Y %H:%M')
print(df)
end start
0 2013-01-02 01:00:00 2011-12-31 01:00:00
1 2014-01-02 01:00:00 2011-12-31 01:00:00
2 2014-01-02 01:00:00 2011-12-31 01:00:00
3 2013-01-01 14:00:00 2013-01-01 08:00:00
4 2013-01-01 04:00:00 2012-12-31 20:00:00
I am trying to compare df.end and df.start to two given dates, year_start and year_end:
year_start = pd.to_datetime(2013,format='%Y')
year_end = pd.to_datetime(2013+1,format='%Y')
print(year_start)
print(year_end)
2013-01-01 00:00:00
2014-01-01 00:00:00
But I can't get my comparison to work (see the comparison in conditions):
conditions = [(df['start'].any()< year_start) and (df['end'].any()> year_end)]
choices = [8760]
df['test'] = np.select(conditions, choices, default=0)
I also tried to define year_end and year_start as follows but it does not work either:
year_start = np.datetime64(pd.to_datetime(2013,format='%Y'))
year_end = np.datetime64(pd.to_datetime(2013+1,format='%Y'))
Any idea on how I could make it work?

Compare the columns element-wise and combine the two masks with &:
In [797]: df[(df['start']< year_start) & (df['end']> year_end)]
Out[797]:
end start
1 2014-01-02 01:00:00 2011-12-31 01:00:00
2 2014-01-02 01:00:00 2011-12-31 01:00:00
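
The same element-wise mask can feed np.select to build the test column from the question. In the original attempt, .any() reduces each column to a single value instead of comparing row by row, and Python's `and` does not work element-wise on a Series. The following is a minimal sketch using the question's data; it builds year_start and year_end with pd.Timestamp, which is an assumption made here for clarity rather than the asker's exact call.

```python
import numpy as np
import pandas as pd

start = ['31/12/2011 01:00', '31/12/2011 01:00', '31/12/2011 01:00',
         '01/01/2013 08:00', '31/12/2012 20:00']
end = ['02/01/2013 01:00', '02/01/2014 01:00', '02/01/2014 01:00',
       '01/01/2013 14:00', '01/01/2013 04:00']
df = pd.DataFrame({'start': start, 'end': end})
df['start'] = pd.to_datetime(df['start'], format='%d/%m/%Y %H:%M')
df['end'] = pd.to_datetime(df['end'], format='%d/%m/%Y %H:%M')

# Year boundaries as plain Timestamps (assumed equivalent to the question's values).
year_start = pd.Timestamp(2013, 1, 1)
year_end = pd.Timestamp(2014, 1, 1)

# One boolean per row: the interval starts before the year and ends after it.
conditions = [(df['start'] < year_start) & (df['end'] > year_end)]
choices = [8760]
df['test'] = np.select(conditions, choices, default=0)
print(df)
```

Rows 1 and 2 get 8760, matching the rows returned by the filter above; all other rows get the default 0.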

Related

Pandas replace daily observations by monthly mean

Suppose I have a pandas Series with hourly observations:
pd_series = pd.Series(np.random.rand(26281), index = pd.date_range('2022-01-01', '2024-12-31', freq = 'H'))
pd_series
2022-01-01 00:00:00 0.933746
2022-01-01 01:00:00 0.588907
2022-01-01 02:00:00 0.229040
2022-01-01 03:00:00 0.557752
2022-01-01 04:00:00 0.798649
2024-12-30 20:00:00 0.314143
2024-12-30 21:00:00 0.670485
2024-12-30 22:00:00 0.300531
2024-12-30 23:00:00 0.075403
2024-12-31 00:00:00 0.716685
What I want is to replace every observation by the monthly average. I know that the average can be calculated as
pd_series.resample('MS').mean()
But how do I assign each monthly average back to its respective observations?
Use Resampler.transform:
print (pd_series.resample('MS').transform('mean'))
2022-01-01 00:00:00 0.495015
2022-01-01 01:00:00 0.495015
2022-01-01 02:00:00 0.495015
2022-01-01 03:00:00 0.495015
2022-01-01 04:00:00 0.495015
2024-12-30 20:00:00 0.508646
2024-12-30 21:00:00 0.508646
2024-12-30 22:00:00 0.508646
2024-12-30 23:00:00 0.508646
2024-12-31 00:00:00 0.508646
Freq: H, Length: 26281, dtype: float64
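
To check the idea on numbers that are easy to verify by hand, here is a small sketch on daily data. It uses groupby on the monthly period of the index, which is an alternative way to broadcast a group mean and not the Resampler call shown above; the two-month series is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: January and February 2022, values 0..58.
idx = pd.date_range('2022-01-01', '2022-02-28', freq='D')
s = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Group rows by calendar month and broadcast each month's mean to its rows.
monthly = s.groupby(s.index.to_period('M')).transform('mean')

print(monthly.iloc[[0, 30, 31, 58]])
```

Every January row becomes 15.0 (the mean of 0..30) and every February row becomes 44.5 (the mean of 31..58), while the original index and length are kept.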

Pandas DateTime Calculating Daily Averages

I have 2 columns of data in a pandas DataFrame that look like this, with the "DateTime" column in the format YYYY-MM-DD HH:MM:SS. Only the first 24 hours are shown; the full DataFrame covers one year (8784 x 2).
BAFFIN BAY DateTime
8759 8.112838 2016-01-01 00:00:00
8760 7.977169 2016-01-01 01:00:00
8761 8.420204 2016-01-01 02:00:00
8762 9.515370 2016-01-01 03:00:00
8763 9.222840 2016-01-01 04:00:00
8764 8.872423 2016-01-01 05:00:00
8765 8.776145 2016-01-01 06:00:00
8766 9.030668 2016-01-01 07:00:00
8767 8.394983 2016-01-01 08:00:00
8768 8.092915 2016-01-01 09:00:00
8769 8.946967 2016-01-01 10:00:00
8770 9.620883 2016-01-01 11:00:00
8771 9.535951 2016-01-01 12:00:00
8772 8.861761 2016-01-01 13:00:00
8773 9.077692 2016-01-01 14:00:00
8774 9.116074 2016-01-01 15:00:00
8775 8.724343 2016-01-01 16:00:00
8776 8.916940 2016-01-01 17:00:00
8777 8.920438 2016-01-01 18:00:00
8778 8.926278 2016-01-01 19:00:00
8779 8.817666 2016-01-01 20:00:00
8780 8.704014 2016-01-01 21:00:00
8781 8.496358 2016-01-01 22:00:00
8782 8.434297 2016-01-01 23:00:00
I am trying to calculate daily averages of the "BAFFIN BAY" column and I've tried these approaches:
davg_df2 = df2.groupby(pd.Grouper(freq='D', key='DateTime')).mean()
davg_df2 = df2.groupby(pd.Grouper(freq='1D', key='DateTime')).mean()
davg_df2 = df2.groupby(by=df2['DateTime'].dt.date).mean()
All of these approaches yield the same answer, as shown below:
BAFFIN BAY
DateTime
2016-01-01 6.008044
However, if you do the math, the correct average for 2016-01-01 is 8.813134. I'm assuming the grouping is by day (24 hours) to produce consecutive daily averages, but the 3 approaches above are clearly picking up other data in my 8784 x 2 DataFrame. Thank you kindly for your help.
I just ran your df with this code and I get 8.813134:
df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.groupby(by=pd.Grouper(freq='D', key='DateTime')).mean()
print(df)
Output:
BAFFIN BAY
DateTime
2016-01-01 8.813134
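
Since the same grouping gives the expected value on the posted rows, one way to look for the cause in the full DataFrame is to convert the column to datetime and count how many rows land in each day; a complete day of hourly data should contain 24. This is only a diagnostic sketch on synthetic values (two days, 0..47), with the column names taken from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df2: 48 hourly rows, timestamps stored as strings.
stamps = pd.Timestamp('2016-01-01') + pd.to_timedelta(np.arange(48), unit='h')
df2 = pd.DataFrame({
    'BAFFIN BAY': np.arange(48, dtype=float),
    'DateTime': stamps.strftime('%Y-%m-%d %H:%M:%S'),
})

# Make sure the key column is a real datetime before grouping by day.
df2['DateTime'] = pd.to_datetime(df2['DateTime'])

by_day = df2.groupby(pd.Grouper(freq='D', key='DateTime'))['BAFFIN BAY']
daily_mean = by_day.mean()
rows_per_day = by_day.count()  # more than 24 hints at extra rows on that date

print(daily_mean)
print(rows_per_day)
```

If rows_per_day shows more than 24 for 2016-01-01 in the real data, rows from elsewhere in the DataFrame share that date and are being averaged in.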

How to convert to datetime if the format of dates changes gradually through the column?

df.head():
start_date end_date
0 03.09.2013 03.09.2025
1 09.08.2019 14.05.2020
2 03.08.2015 03.08.2019
3 31.03.2014 31.03.2019
4 02.02.2015 02.02.2019
5 21.08.2019 21.08.2024
When I do df.tail():
start_date end_date
30373 2019-07-05 00:00:00 2023-07-05 00:00:00
30374 2019-06-11 00:00:00 2023-06-11 00:00:00
30375 19.01.2017 2020-02-09 00:00:00 # these 2 start dates have the same format as in head
30376 11.12.2009 2011-12-11 00:00:00
30377 2019-07-30 00:00:00 2023-07-30 00:00:00
When I do
df['start_date'] = pd.to_datetime(df['start_date'])
some dates have the month and day swapped.
The format is inconsistent throughout the column. How do I convert it properly?
Use the dayfirst=True parameter:
df['start_date'] = pd.to_datetime(df['start_date'], dayfirst=True)
Or specify the format explicitly (see http://strftime.org/ for the codes); this only works if every value in the column matches that format:
df['start_date'] = pd.to_datetime(df['start_date'], format='%d.%m.%Y')
df['start_date'] = pd.to_datetime(df['start_date'], dayfirst=True)
df['end_date'] = pd.to_datetime(df['end_date'], dayfirst=True)
print (df)
start_date end_date
0 2013-09-03 2025-09-03
1 2019-08-09 2020-05-14
2 2015-08-03 2019-08-03
3 2014-03-31 2019-03-31
4 2015-02-02 2019-02-02
5 2019-08-21 2024-08-21
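
Because the column in the question mixes two layouts (DD.MM.YYYY and YYYY-MM-DD HH:MM:SS), another option is to parse each layout separately with an explicit format and combine the results. This is a hedged sketch on three sample values taken from the question; it assumes those are the only two layouts present.

```python
import pandas as pd

# Sample values in both layouts seen in the question.
s = pd.Series(['03.09.2013', '2019-07-05 00:00:00', '19.01.2017'])

# Parse each layout on its own; values that do not match become NaT.
dotted = pd.to_datetime(s, format='%d.%m.%Y', errors='coerce')
iso = pd.to_datetime(s, format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Fill the gaps of one result with the other.
parsed = dotted.fillna(iso)
print(parsed)

# Any value left as NaT matched neither layout and needs a closer look.
print(parsed.isna().sum())
```

Explicit formats avoid any guessing about day and month order, at the cost of having to list every layout that occurs in the column.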

Pandas Datetime conversion

I have the following dataframe:
Date = ['01-Jan','01-Jan','01-Jan','01-Jan']
Heure = ['00:00','01:00','02:00','03:00']
value =[1,2,3,4]
df = pd.DataFrame({'value':value,'Date':Date,'Hour':Heure})
print(df)
Date Hour value
0 01-Jan 00:00 1
1 01-Jan 01:00 2
2 01-Jan 02:00 3
3 01-Jan 03:00 4
I am trying to create a datetime index, knowing that the file I am working with is for 2015. I have tried a lot of things but can't get it to work! I tried to convert only the day and the month, but even that does not work:
df.index = pd.to_datetime(df['Date'],format='%d-%m')
I expect the following result:
Date Hour value
2015-01-01 00:00:00 01-Jan 00:00 1
2015-01-01 01:00:00 01-Jan 01:00 2
2015-01-01 02:00:00 01-Jan 02:00 3
2015-01-01 03:00:00 01-Jan 03:00 4
Does anyone know how to do it?
Thanks,
You need to explicitly add 2015 somehow, and include the Hour column as well. I would do something like this:
df.index = pd.to_datetime(df.Date + '-2015 ' + df.Hour, format='%d-%b-%Y %H:%M')
>>> df
Date Hour value
2015-01-01 00:00:00 01-Jan 00:00 1
2015-01-01 01:00:00 01-Jan 01:00 2
2015-01-01 02:00:00 01-Jan 02:00 3
2015-01-01 03:00:00 01-Jan 03:00 4
Alternatively, parse without a year (which defaults to 1900) and then set the year with replace:
s=pd.to_datetime(df['Date']+df['Hour'],format='%d-%b%H:%M').apply(lambda x : x.replace(year=2015))
s
Out[131]:
0 2015-01-01 00:00:00
1 2015-01-01 01:00:00
2 2015-01-01 02:00:00
3 2015-01-01 03:00:00
dtype: datetime64[ns]
df.index=s

Interpolating datetime Index

I have a DataFrame (df) as follows, where 'date' is a datetime index (Y-M-D):
df :
values
date
2010-01-01 10
2010-01-02 20
2010-01-03 -30
I want to create a new df with an interpolated datetime index as follows:
values
date
2010-01-01 12:00:00 10
2010-01-01 17:00:00 15 # mean value betw. 2010-01-01 and 2010-01-02
2010-01-02 12:00:00 20
2010-01-02 17:00:00 -5 # mean value betw. 2010-01-02 and 2010-01-03
2010-01-03 12:00:00 -30
Can anyone help me on this?
I believe you need to add 12 hours to the index first, then reindex by the union of that index with the new 17:00 timestamps, and finally interpolate:
df1 = df.set_index(df.index + pd.Timedelta(12, unit='h'))
idx = (df.index + pd.Timedelta(17, unit='h')).union(df1.index)
df2 = df1.reindex(idx).interpolate()
print (df2)
values
date
2010-01-01 12:00:00 10.0
2010-01-01 17:00:00 15.0
2010-01-02 12:00:00 20.0
2010-01-02 17:00:00 -5.0
2010-01-03 12:00:00 -30.0
2010-01-03 17:00:00 -30.0
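
The output above has one more row than the expected result in the question, because the last 17:00 timestamp has no following value and is filled with the previous one. A self-contained sketch of the same steps that drops this trailing row is shown here; the variable names noon and out are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {'values': [10, 20, -30]},
    index=pd.DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03'], name='date'),
)

# Shift the original observations to 12:00.
noon = df.set_index(df.index + pd.Timedelta(hours=12))

# Add the 17:00 timestamps and sort them in with the 12:00 ones.
idx = (df.index + pd.Timedelta(hours=17)).union(noon.index)

# Default linear interpolation treats rows as equally spaced, so each
# 17:00 row becomes the mean of its two neighbours; then drop the last
# 17:00 row, which has no right-hand neighbour.
out = noon.reindex(idx).interpolate().iloc[:-1]
print(out)
```

Dropping the last row with iloc[:-1] is a choice made here to match the five rows the question asks for; keep it if the trailing fill is acceptable.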