I have a column Start_Time (dtype: object) in a DataFrame, with values like:
6:00:00
7:00:00
8:01:00
and I want to change them to:
6:30:00
7:30:00
8:30:00
Maybe you can use to_datetime, add a 30-minute Timedelta, and then extract the time with dt.time:
import pandas as pd
print(df)
a
0 6:00:00
1 7:00:00
2 8:01:00
b = pd.to_datetime(df['a']) + pd.Timedelta(minutes=30)
print(b)
0 2016-02-17 06:30:00
1 2016-02-17 07:30:00
2 2016-02-17 08:31:00
Name: a, dtype: datetime64[ns]
print(b.dt.time)
0 06:30:00
1 07:30:00
2 08:31:00
Name: a, dtype: object
print(b.dt.time.values)
[datetime.time(6, 30) datetime.time(7, 30) datetime.time(8, 31)]
But if you need to reset the minutes first (so that 8:01:00 becomes 8:30:00 rather than 8:31:00):
print(df)
a
0 6:00:00
1 7:00:00
2 8:01:00
df['a'] = pd.to_datetime(df['a'])
print(df)
a
0 2016-02-17 06:00:00
1 2016-02-17 07:00:00
2 2016-02-17 08:01:00
import datetime
# keep only year, month, day and hour, i.e. drop the minutes and seconds
df['a'] = df['a'].map(lambda x: datetime.datetime(x.year, x.month, x.day, x.hour))
print(df)
a
0 2016-02-17 06:00:00
1 2016-02-17 07:00:00
2 2016-02-17 08:00:00
b = df['a'] + pd.Timedelta(minutes=30)
print(b.dt.time)
0 06:30:00
1 07:30:00
2 08:30:00
Name: a, dtype: object
print(b.dt.time.values)
[datetime.time(6, 30) datetime.time(7, 30) datetime.time(8, 30)]
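A more vectorized variant of the same idea truncates each time to the hour with dt.floor('h') instead of rebuilding datetime objects one row at a time. This is a sketch: the column name Start_Time comes from the question, and the explicit format='%H:%M:%S' is an assumption about how the strings look (pandas then fills in a placeholder date, which dt.time discards).

```python
import pandas as pd

# Sample data shaped like the question's Start_Time column (strings, dtype object)
df = pd.DataFrame({'Start_Time': ['6:00:00', '7:00:00', '8:01:00']})

# Parse the strings; the date part is only a placeholder and is dropped at the end
ts = pd.to_datetime(df['Start_Time'], format='%H:%M:%S')

# Truncate to the start of the hour, add 30 minutes, keep only the time of day
df['Start_Time'] = (ts.dt.floor('h') + pd.Timedelta(minutes=30)).dt.time
print(df)
```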
Related
I am using the data below, which is saved in a CSV file, and trying to convert it to hourly values using linear interpolation. However, it is not working.
Code:
import pandas as pd
df = pd.read_csv('d:/Python/resampling/FairyLake.csv')
df[ 'Date' ] = pd.to_datetime(df['Date'])
df.set_index('Date').resample('M').interpolate()
print(df)
Data
Date,Discharge
1/3/2008,0.05865
1/4/2008,0.105812
1/5/2008,0.191388
1/6/2008,0.315378
1/7/2008,0.477782
1/8/2008,0.6786
1/9/2008,0.917832
1/10/2008,0.783875701
1/11/2008,0.65678957
1/12/2008,0.545651187
1/13/2008,0.44222808
1/14/2008,0.353907613
1/15/2008,0.27414753
Results
Date Discharge
0 2008-01-03 0.058650
1 2008-01-04 0.105812
2 2008-01-05 0.191388
3 2008-01-06 0.315378
4 2008-01-07 0.477782
5 2008-01-08 0.678600
6 2008-01-09 0.917832
7 2008-01-10 0.783876
8 2008-01-11 0.656790
9 2008-01-12 0.545651
10 2008-01-13 0.442228
11 2008-01-14 0.353908
12 2008-01-15 0.274148
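Two things:
The resample frequency should be hourly ('H'), not monthly ('M').
The result needs to be assigned back (df = ...), because set_index(...).resample(...).interpolate() returns a new DataFrame instead of modifying df in place: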
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').resample('H').interpolate()
df:
Discharge
Date
2008-01-03 00:00:00 0.058650
2008-01-03 01:00:00 0.060615
2008-01-03 02:00:00 0.062580
2008-01-03 03:00:00 0.064545
2008-01-03 04:00:00 0.066510
... ...
2008-01-14 20:00:00 0.287441
2008-01-14 21:00:00 0.284118
2008-01-14 22:00:00 0.280794
2008-01-14 23:00:00 0.277471
2008-01-15 00:00:00 0.274148
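For reference, here is a self-contained sketch of the fix that reads the question's data from an in-memory string instead of d:/Python/resampling/FairyLake.csv. The explicit format='%m/%d/%Y' is an assumption that the dates are month/day/year, and it uses the lowercase alias 'h', which newer pandas versions prefer over 'H'.

```python
from io import StringIO

import pandas as pd

csv_text = """Date,Discharge
1/3/2008,0.05865
1/4/2008,0.105812
1/5/2008,0.191388
1/6/2008,0.315378
1/7/2008,0.477782
1/8/2008,0.6786
1/9/2008,0.917832
1/10/2008,0.783875701
1/11/2008,0.65678957
1/12/2008,0.545651187
1/13/2008,0.44222808
1/14/2008,0.353907613
1/15/2008,0.27414753
"""

df = pd.read_csv(StringIO(csv_text))
# Dates are assumed to be month/day/year
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Upsample the daily values to hourly rows and fill the gaps linearly;
# the result is assigned back because resample(...).interpolate() returns a new frame
df = df.set_index('Date').resample('h').interpolate()
print(df.head())
```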
I only want to replace a 0 that lies between exactly two numbers (a single 0 with a nonzero value on each side) with the average of those two values.
My dataset looks like this:
time value
9:45:00 0
10:00:00 0
10:15:00 0
10:30:00 10
10:45:00 0
11:00:00 10
11:15:00 10
11:30:00 0
11:45:00 10
12:00:00 0
12:15:00 0
12:30:00 0
12:45:00 10
13:00:00 0
13:15:00 0
I want it to look like this:
time value
9:45:00 0
10:00:00 0
10:15:00 0
10:30:00 10
10:45:00 10
11:00:00 10
11:15:00 10
11:30:00 10
11:45:00 10
12:00:00 0
12:15:00 0
12:30:00 0
12:45:00 10
13:00:00 0
13:15:00 0
Here, the zeros between 11:45 and 12:45 are not filled in, because they are not a single 0 between two numbers (there are several zeros in a row).
How about this?
from io import StringIO as sio
data = sio("""
time value
9:45:00 0
10:00:00 0
10:15:00 0
10:30:00 10
10:45:00 0
11:00:00 10
11:15:00 10
11:30:00 0
11:45:00 10
12:00:00 0
12:15:00 0
12:30:00 0
12:45:00 10
13:00:00 0
13:15:00 0
""")
import pandas as pd
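df = pd.read_csv(data, sep=r'\s+')
prev, nxt = df['value'].shift(1), df['value'].shift(-1)
# flag a 0 whose neighbours on both sides exist and are nonzero
df['flag_to_fill'] = (df['value'] == 0) & prev.notna() & nxt.notna() & (prev != 0) & (nxt != 0)
# replace each flagged 0 with the mean of its two neighbours
df.loc[df['flag_to_fill'], 'value'] = 0.5 * (prev + nxt)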
df
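The same logic can be wrapped in a small reusable function that works on any Series and does not need the helper column. This is a sketch; the name fill_isolated_zeros is hypothetical. Because both neighbours must also be non-missing, a zero at either end of the series is left alone rather than being replaced with NaN.

```python
import pandas as pd


def fill_isolated_zeros(s: pd.Series) -> pd.Series:
    """Hypothetical helper: replace each 0 that has a nonzero value on both
    sides with the mean of those two values."""
    prev, nxt = s.shift(1), s.shift(-1)
    isolated = s.eq(0) & prev.notna() & nxt.notna() & prev.ne(0) & nxt.ne(0)
    # where() keeps s where the condition holds and uses the neighbour mean elsewhere
    return s.where(~isolated, (prev + nxt) / 2)


values = pd.Series([0, 0, 0, 10, 0, 10, 10, 0, 10, 0, 0, 0, 10, 0, 0])
print(fill_isolated_zeros(values).tolist())

# The leading 0 has only one neighbour, so it stays 0; the middle 0 becomes 6
print(fill_isolated_zeros(pd.Series([0, 5, 0, 7])).tolist())
```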
I have a DataFrame (df) as follows, where 'date' is a datetime index (Y-M-D):
df :
values
date
2010-01-01 10
2010-01-02 20
2010-01-03 -30
I want to create a new df with an interpolated datetime index as follows:
values
date
2010-01-01 12:00:00 10
2010-01-01 17:00:00 15 # mean of the values at 2010-01-01 and 2010-01-02
2010-01-02 12:00:00 20
2010-01-02 17:00:00 -5 # mean of the values at 2010-01-02 and 2010-01-03
2010-01-03 12:00:00 -30
Can anyone help me on this?
I believe you need to add 12 hours to the index first, then reindex by the union with new timestamps at 17:00, and finally interpolate:
df1 = df.set_index(df.index + pd.Timedelta(12, unit='h'))
idx = (df.index + pd.Timedelta(17, unit='h')).union(df1.index)
df2 = df1.reindex(idx).interpolate()
print (df2)
values
date
2010-01-01 12:00:00 10.0
2010-01-01 17:00:00 15.0
2010-01-02 12:00:00 20.0
2010-01-02 17:00:00 -5.0
2010-01-03 12:00:00 -30.0
2010-01-03 17:00:00 -30.0
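Two details of this answer are worth spelling out. The default interpolate(method='linear') ignores the timestamps and treats the rows as equally spaced, which is why each 17:00 row gets the plain mean of its neighbours, as the question asks; method='time' would instead weight by elapsed time. And the trailing 2010-01-03 17:00:00 row is only filled with the last value, so it can be trimmed. A sketch, rebuilding the question's frame:

```python
import pandas as pd

# The question's frame: daily values on a DatetimeIndex named 'date'
df = pd.DataFrame(
    {'values': [10, 20, -30]},
    index=pd.DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03'], name='date'),
)

df1 = df.set_index(df.index + pd.Timedelta(hours=12))
idx = (df.index + pd.Timedelta(hours=17)).union(df1.index)

# Positional linear interpolation: each 17:00 row gets the mean of its neighbours
df2 = df1.reindex(idx).interpolate()
# Drop rows after the last original timestamp (the trailing filled row)
df2 = df2.loc[:df1.index[-1]]
print(df2)

# Time-weighted alternative: 17:00 is 5/24 of the way between consecutive 12:00 rows
df3 = df1.reindex(idx).interpolate(method='time').loc[:df1.index[-1]]
print(df3)
```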
I have a time series. I'd like to group it into 24-hour blocks, from 8am to 7:59am the next day. I know how to group by date, but I've tried and failed to handle this 8-hour offset using TimeGroupers and DateOffsets.
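I think you can use Grouper with an 8-hour shift of the bin start. Older pandas versions spelled this base=8; since pandas 1.1 the parameter is offset='8h' (base was removed in pandas 2.0):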
print(df)
date name
0 2015-06-13 00:21:25 1
1 2015-06-14 01:00:25 2
2 2015-06-14 02:54:48 3
3 2015-06-15 14:38:15 2
4 2015-06-15 15:29:28 1
print(df.groupby(pd.Grouper(key='date', freq='24h', offset='8h')).sum(min_count=1))  # min_count=1: empty bins give NaN, not 0
name
date
2015-06-12 08:00:00 1.0
2015-06-13 08:00:00 5.0
2015-06-14 08:00:00 NaN
2015-06-15 08:00:00 3.0
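A self-contained version of this, assuming pandas 1.1 or later for the offset argument, with the sample frame rebuilt from the printed output above. In pandas 0.22 and later, sum returns 0 for an empty bin by default; passing min_count=1 keeps the NaN for the empty 2015-06-14 08:00 bin.

```python
import pandas as pd

# Sample frame reconstructed from the printed output above
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-06-13 00:21:25', '2015-06-14 01:00:25',
                            '2015-06-14 02:54:48', '2015-06-15 14:38:15',
                            '2015-06-15 15:29:28']),
    'name': [1, 2, 3, 2, 1],
})

# 24-hour bins starting at 08:00; min_count=1 turns an empty bin into NaN instead of 0
out = df.groupby(pd.Grouper(key='date', freq='24h', offset='8h'))['name'].sum(min_count=1)
print(out)
```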
As an alternative to jezrael's method, you can use a custom grouper function:
import pandas as pd
start_ts = '2016-01-01 07:59:59'
df = pd.DataFrame({'Date': pd.date_range(start_ts, freq='10min', periods=1000)})
def my_grouper(df, idx):
    ts = df.loc[idx, 'Date']
    # timestamps before 08:00 belong to the block that started the previous day
    return ts.date() if ts.hour >= 8 else ts.date() - pd.Timedelta('1day')
df.groupby(lambda x: my_grouper(df, x)).size()
Test:
In [468]: df.head()
Out[468]:
Date
0 2016-01-01 07:59:59
1 2016-01-01 08:09:59
2 2016-01-01 08:19:59
3 2016-01-01 08:29:59
4 2016-01-01 08:39:59
In [469]: df.tail()
Out[469]:
Date
995 2016-01-08 05:49:59
996 2016-01-08 05:59:59
997 2016-01-08 06:09:59
998 2016-01-08 06:19:59
999 2016-01-08 06:29:59
In [470]: df.groupby(lambda x: my_grouper(df, x)).size()
Out[470]:
2015-12-31 1
2016-01-01 144
2016-01-02 144
2016-01-03 144
2016-01-04 144
2016-01-05 144
2016-01-06 144
2016-01-07 135
dtype: int64
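A vectorized alternative to the row-by-row grouper, under the same assumption that each block runs from 08:00 to 07:59:59 the next day: subtract 8 hours from every timestamp and take the calendar date, so anything before 08:00 lands on the previous day. This sketch avoids calling a Python function for every index label.

```python
import pandas as pd

start_ts = '2016-01-01 07:59:59'
df = pd.DataFrame({'Date': pd.date_range(start_ts, freq='10min', periods=1000)})

# Shift each timestamp back 8 hours; its date is then the day its 08:00 block started
block = (df['Date'] - pd.Timedelta(hours=8)).dt.date
counts = df.groupby(block).size()
print(counts)
```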
I have a Pandas DataFrame:
Out[57]:
lastrun rate
0 2013-11-04 12:15:02 0
1 2013-11-04 13:14:50 4
2 2013-11-04 14:14:48 10
3 2013-11-04 16:14:59 16
I would like to convert that into an hourly time series and interpolate missing values (15:00) so that I end up with:
2013-11-04 12:00:00 0
2013-11-04 13:00:00 4
2013-11-04 14:00:00 10
2013-11-04 15:00:00 13
2013-11-04 16:00:00 16
How do I convert / map the dataframe data to a time series in Pandas?
Assuming your 'lastrun' has datetime objects:
In [22]: s = df.set_index('lastrun').resample('H')['rate'].mean()
In [23]: s
Out[23]:
lastrun
2013-11-04 12:00:00 0
2013-11-04 13:00:00 4
2013-11-04 14:00:00 10
2013-11-04 15:00:00 NaN
2013-11-04 16:00:00 16
Freq: H, dtype: float64
In [24]: s.interpolate()
Out[24]:
lastrun
2013-11-04 12:00:00 0
2013-11-04 13:00:00 4
2013-11-04 14:00:00 10
2013-11-04 15:00:00 13
2013-11-04 16:00:00 16
Freq: H, dtype: int64
That's if you want linear interpolation; since pandas 0.13, interpolate supports many more methods through its method parameter.
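In current pandas, resample returns a lazy Resampler object, so an aggregation such as .mean() is needed before .interpolate(). A self-contained sketch with the question's data, assuming lastrun is parsed to datetimes: each timestamp falls into the hourly bin that contains it (12:15:02 goes to 12:00), and the empty 15:00 bin is then filled linearly.

```python
import pandas as pd

df = pd.DataFrame({
    'lastrun': pd.to_datetime(['2013-11-04 12:15:02', '2013-11-04 13:14:50',
                               '2013-11-04 14:14:48', '2013-11-04 16:14:59']),
    'rate': [0, 4, 10, 16],
})

# Aggregate into hourly bins (15:00 has no data and becomes NaN), then fill linearly
s = df.set_index('lastrun').resample('h')['rate'].mean().interpolate()
print(s)
```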