How to convert object to hour and add to date? - pandas

I have the following data frame:
correction = ['2.0','-2.5','4.5','-3.0']
date = ['2015-05-19 20:45:00','2017-04-29 17:15:00','2011-05-09 10:40:00','2016-12-18 16:10:00']
I want to convert correction to hours and add it to the date. I tried the following code, but it raises an error.
df['correction'] = pd.to_timedelta(df['correction'],unit='h')
df['date'] =pd.DatetimeIndex(df['date'])
df['date'] = df['date'] + df['correction']
The error occurs when converting correction to timedelta:
ValueError: no units specified

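The correction values are strings, so pd.to_timedelta tries to parse each one as a duration string and finds no unit in it. Casting the correction column to float first works for me: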
df['correction'] = pd.to_timedelta(df['correction'].astype(float),unit='h')
df['date'] = pd.DatetimeIndex(df['date'])
df['date'] = df['date'] + df['correction']
print (df)
         correction                date
0          02:00:00 2015-05-19 22:45:00
1 -1 days +21:30:00 2017-04-29 14:45:00
2          04:30:00 2011-05-09 15:10:00
3 -1 days +21:00:00 2016-12-18 13:10:00
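The fix can also be sketched end to end. This is a minimal example that assumes the two lists from the question are loaded into a DataFrame named df. It uses pd.to_numeric in place of astype(float), which is an illustrative substitution and not part of the answer above.

```python
import pandas as pd

# Data from the question; both columns start out as strings (dtype object).
df = pd.DataFrame({
    'correction': ['2.0', '-2.5', '4.5', '-3.0'],
    'date': ['2015-05-19 20:45:00', '2017-04-29 17:15:00',
             '2011-05-09 10:40:00', '2016-12-18 16:10:00'],
})

# Turn the strings into numbers first so that unit='h' can be applied.
hours = pd.to_numeric(df['correction'])
df['correction'] = pd.to_timedelta(hours, unit='h')

# Parse the dates and shift each one by its correction.
df['date'] = pd.to_datetime(df['date']) + df['correction']
print(df)
```

pd.to_numeric also accepts errors='coerce', which may be useful if some correction values are not valid numbers.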

Related

Date object and time integer to datetime

All, I have a dataframe with a date column and an hour column, and I am trying to combine them into a single timestamp. I tried several of the available solutions, such as datetime.datetime.combine and extracting the month, day, and year to build a datetime stamp from them, but all of them led to some error.
   idOnController        date           eventTime  Energy  hour
0            5014  2018-05-31 2018-05-31 01:00:00  26.619     0
2            5014  2018-06-02 2018-06-02 02:00:00  29.251     0
3            5014  2018-06-03 2018-06-03 03:00:00  30.635     0
The datatypes are as follows:
idOnController int64
date object
eventTime datetime64[ns]
Energy float64
hour int64
dtype: object
I am looking to combine date and hour into a timestamp that looks like eventTime and then replace eventTime with that value.
You can do:
df['new_date'] = pd.to_datetime(df['date']) + df['hour'] * pd.to_timedelta('1h')
Output of df.dtypes:
idOnController int64
date object
eventTime datetime64[ns]
Energy float64
hour int64
new_date datetime64[ns]
dtype: object
If you want string timestamps, you can do:
df['new_date'] = df['new_date'].dt.strftime('%Y-%m-%d %H:%M:%S')
Another way of doing this would be (a bit more verbose though!):
df['date'] = pd.to_datetime(df['date'])
df['year'] = df.date.dt.year
df['month'] = df.date.dt.month
df['day'] = df.date.dt.day
df['date'] = pd.to_datetime(df[['year','month','day','hour']])
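A compact variant of the first approach is sketched below. The sample values are hypothetical (the hours are made non-zero so the effect is visible), and it passes the whole hour column to pd.to_timedelta with unit='h' instead of multiplying by a one-hour timedelta.

```python
import pandas as pd

# Hypothetical sample: 'date' is a string column, 'hour' an integer column.
df = pd.DataFrame({
    'date': ['2018-05-31', '2018-06-02', '2018-06-03'],
    'hour': [1, 2, 3],
})

# Convert the whole hour column to timedeltas in one call, then add it
# to the parsed dates.
df['eventTime'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')
print(df)
```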

Add random datetimes to timestamps

I have a column of timestamps that span more than 24 hours. I want to convert these so that the days can be told apart, which I've done by converting to timedelta. The result is displayed below.
My question is: can these be converted or rearranged again to provide random datetimes, e.g. dd:mm:yyyy hh:mm:ss?
import pandas as pd
df = pd.DataFrame({
    'Time': ['8:00', '18:00', '28:00'],
})
df['Time'] = [x + ':00' for x in df['Time']]
df['Time'] = pd.to_timedelta(df['Time'])
Out:
Time
0 0 days 08:00:00
1 0 days 18:00:00
2 1 days 04:00:00
Intended Output:
Time
0 1/01/1904 08:00:00 AM
1 1/01/1904 18:00:00 PM
2 2/01/1904 04:00:00 AM
The input timestamps will never span more than 2 days. Is there a package that can achieve this, or would dummy start and end dates work?
After converting Time to timedelta, just add the date part:
df.Time+pd.to_datetime('1904-01-01')
0 1904-01-01 08:00:00
1 1904-01-01 18:00:00
2 1904-01-02 04:00:00
Name: Time, dtype: datetime64[ns]
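Put together, the conversion and the date offset might look like the sketch below. The anchor date 1904-01-01 comes from the intended output in the question; the column names Datetime and Formatted and the strftime pattern are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['8:00', '18:00', '28:00']})

# Append seconds so the strings parse as HH:MM:SS durations; hours above
# 23 roll over into days.
offsets = pd.to_timedelta(df['Time'] + ':00')

# Anchor date taken from the question; any start date could be used here.
anchor = pd.Timestamp('1904-01-01')
df['Datetime'] = anchor + offsets

# Optional: render as day/month/year strings.
df['Formatted'] = df['Datetime'].dt.strftime('%d/%m/%Y %H:%M:%S')
print(df)
```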

pandas to_datetime does not accept '24' as time

The time is in YYYYMMDDHH format. The first time is 2010010101; it increases by 1 hour at a time, reaches 2010010124, and then continues with 2010010201.
date
0 2010010101
1 2010010124
2 2010010201
df['date'] = pd.to_datetime(df['date'], format ='%Y%m%d%H')
I am getting the error:
'int' object is unsliceable
If I run:
df2['date'] = pd.to_datetime(df2['date'], format ='%Y%m%d%H', errors = 'coerce')
Every row with hour '24' is labeled as NaT.
Hours run from 00 (midnight) to 23, so hour 24 in your data is 00 of the next day. One way to handle this is to define a custom to_datetime function for the date format.
df = pd.DataFrame({'date':['2010010101', '2010010124', '2010010201']})
def custom_to_datetime(date):
    # If the time is 24, set it to 0 and increment day by 1
    if date[8:10] == '24':
        return pd.to_datetime(date[:-2], format = '%Y%m%d') + pd.Timedelta(days=1)
    else:
        return pd.to_datetime(date, format = '%Y%m%d%H')
df['date'] = df['date'].apply(custom_to_datetime)
date
0 2010-01-01 01:00:00
1 2010-01-02 00:00:00
2 2010-01-02 01:00:00
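A vectorized variant of the same idea avoids calling a Python function for each row. The sketch below is not the answer's method: it assumes the column holds integers (as the 'int' error in the question suggests), and the intermediate names s, day, and hour are illustrative. It parses the date part and the hour part separately and adds them.

```python
import pandas as pd

# Integer input, as the 'int' error in the question suggests.
df = pd.DataFrame({'date': [2010010101, 2010010124, 2010010201]})

# Work on strings so the value can be sliced into date and hour parts.
s = df['date'].astype(str)

# Parse the YYYYMMDD part as a date and the HH part as an hour offset.
day = pd.to_datetime(s.str[:8], format='%Y%m%d')
hour = pd.to_timedelta(s.str[8:10].astype(int), unit='h')

# Hour 24 lands on 00:00 of the following day.
df['date'] = day + hour
print(df)
```

Because the hour is added as an offset, no special case for '24' is needed.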

How to change datetime to numeric discarding 0s at end [duplicate]

I have a pandas dataframe called 'munged_data' with two columns, 'entry_date' and 'dob', which I have converted to Timestamps using pd.to_timestamp. I am trying to calculate people's ages from the time difference between 'entry_date' and 'dob'. To do this I need the difference in days between the two columns, so that I can then do something like round(days/365.25). I cannot find a way to do this with a vectorized operation. When I do munged_data.entry_date-munged_data.dob I get the following:
internal_quote_id
2 15685977 days, 23:54:30.457856
3 11651985 days, 23:49:15.359744
4 9491988 days, 23:39:55.621376
7 11907004 days, 0:10:30.196224
9 15282164 days, 23:30:30.196224
15 15282227 days, 23:50:40.261632
However, I do not seem to be able to extract the days as an integer so that I can continue with my calculation.
Any help appreciated.
Using the pandas Timedelta type, available since v0.15.0, you can also do:
In[1]: import pandas as pd
In[2]: df = pd.DataFrame([ pd.Timestamp('20150111'),
pd.Timestamp('20150301') ], columns=['date'])
In[3]: df['today'] = pd.Timestamp('20150315')
In[4]: df
Out[4]:
date today
0 2015-01-11 2015-03-15
1 2015-03-01 2015-03-15
In[5]: (df['today'] - df['date']).dt.days
Out[5]:
0 63
1 14
dtype: int64
You need 0.11 for this (0.11rc1 is out; the final release is probably next week):
In [9]: df = DataFrame([ Timestamp('20010101'), Timestamp('20040601') ])
In [10]: df
Out[10]:
0
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [11]: df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ],columns=['age'])
In [12]: df
Out[12]:
age
0 2001-01-01 00:00:00
1 2004-06-01 00:00:00
In [13]: df['today'] = Timestamp('20130419')
In [14]: df['diff'] = df['today']-df['age']
In [16]: df['years'] = df['diff'].apply(lambda x: float(x.item().days)/365)
In [17]: df
Out[17]:
age today diff years
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00 12.304110
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00 8.887671
The odd apply at the end is needed because timedelta64[ns] scalars are not yet fully supported (in the way Timestamps are now used for datetime64[ns]); that is coming in 0.12.
Not sure if you still need it, but in pandas 0.14 I usually use the .astype('timedelta64[X]') method:
http://pandas.pydata.org/pandas-docs/stable/timeseries.html (frequency conversion)
df = pd.DataFrame([ pd.Timestamp('20010101'), pd.Timestamp('20040605') ])
df.ix[0]-df.ix[1]
Returns:
0 -1251 days
dtype: timedelta64[ns]
(df.ix[0]-df.ix[1]).astype('timedelta64[Y]')
Returns:
0 -4
dtype: float64
Hope that will help
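As a sketch of the same kind of frequency conversion without astype, a timedelta Series can be divided by a pd.Timedelta to get a float count of that unit. This assumes a reasonably recent pandas release; the variable names are illustrative, and the dates are the ones used just above.

```python
import pandas as pd

start = pd.Series([pd.Timestamp('20010101')])
end = pd.Series([pd.Timestamp('20040605')])
diff = start - end

# Dividing by a Timedelta converts the difference to a float count of that unit.
days = diff / pd.Timedelta(days=1)
weeks = diff / pd.Timedelta(weeks=1)
print(days)
print(weeks)
```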
Suppose you have a pandas Series named time_difference with dtype timedelta64[ns].
One way of extracting just the days (or any other desired attribute) is the following:
just_day = time_difference.apply(lambda x: pd.tslib.Timedelta(x).days)
This function is used because the numpy.timedelta64 object does not have a 'days' attribute.
To convert any type of data into days just use pd.Timedelta().days:
pd.Timedelta(1985, unit='Y').days
84494
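For the age calculation in the question, the .dt.days approach can be sketched as follows. The frame below is a hypothetical stand-in for munged_data built from dates that appear in the answers above, and truncating to completed years is an assumption (the question mentions round() instead).

```python
import pandas as pd

# Hypothetical stand-in for munged_data, using dates from the answers above.
munged_data = pd.DataFrame({
    'dob': pd.to_datetime(['2001-01-01', '2004-06-01']),
    'entry_date': pd.to_datetime(['2013-04-19', '2013-04-19']),
})

# Whole days between the two columns, as integers.
days = (munged_data['entry_date'] - munged_data['dob']).dt.days

# Approximate age in completed years, using 365.25 days per year.
# Truncation is an assumption; the question mentions round() instead.
munged_data['age'] = (days / 365.25).astype(int)
print(munged_data)
```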

Backfilling a pandas dataframe missed the first month

I have a pandas df of irrigation demand data that has daily values from 1900 to 2099. I resampled the df to get the monthly average, then resampled and backfilled the monthly averages at a daily frequency, so that the average daily value for each month is used as the value for every day of that month.
My problem is that the first month was not backfilled, and there is only a value for the last day of that month (1900-01-31).
Here is my code. Any suggestions on what I am doing wrong?
I2 = pd.DataFrame(IrrigDemand, columns = ['Year', 'Month', 'Day', 'IrrigArea_1', 'IrrigArea_2','IrrigArea_3','IrrigArea_4','IrrigArea_5'],dtype=float)
# set dates as index
I2.set_index('Year')
# make a column of dates in datetime format
dates = pd.to_datetime(I2[['Year', 'Month', 'Day']])
# add the column of dates to df
I2['dates'] = pd.Series(dates, index=I2.index)
# set dates as index of df
I2.set_index('dates')
# delete the three string columns replaced with datetime values
I2.drop(['Year', 'Month', 'Day'],inplace=True,axis=1)
# calculate the average daily value for each month
I2_monthly_average = I2.reset_index().set_index('dates').resample('m').mean()
I2_daily_average = I2_monthly_average.resample('d').bfill()
The problem is that resample('m') labels each month by its last day, so the first day of the first month is not in the index and has to be added manually:
# make a column of dates in datetime format and assign to index
I2.index = pd.to_datetime(I2[['Year', 'Month', 'Day']])
# delete the three string columns replaced with datetime values
I2.drop(['Year', 'Month', 'Day'],inplace=True,axis=1)
# calculate the average daily value for each month
I2_monthly_average = I2.resample('m').mean()
first_day = I2_monthly_average.index[0].replace(day = 1)
I2_monthly_average.loc[first_day] = I2_monthly_average.iloc[0]
I2_daily_average = I2_monthly_average.resample('d').bfill()
Sample:
rng = pd.date_range('2017-04-03', periods=10, freq='20D')
I2 = pd.DataFrame({'a': range(10)}, index=rng)
print (I2)
a
2017-04-03 0
2017-04-23 1
2017-05-13 2
2017-06-02 3
2017-06-22 4
2017-07-12 5
2017-08-01 6
2017-08-21 7
2017-09-10 8
2017-09-30 9
I2_monthly_average = I2.resample('m').mean()
print (I2_monthly_average)
a
2017-04-30 0.5
2017-05-31 2.0
2017-06-30 3.5
2017-07-31 5.0
2017-08-31 6.5
2017-09-30 8.5
first_day = I2_monthly_average.index[0].replace(day = 1)
I2_monthly_average.loc[first_day] = I2_monthly_average.iloc[0]
print (I2_monthly_average)
a
2017-04-30 0.5
2017-05-31 2.0
2017-06-30 3.5
2017-07-31 5.0
2017-08-31 6.5
2017-09-30 8.5
2017-04-01 0.5 <- added first day
I2_daily_average = I2_monthly_average.resample('d').bfill()
print (I2_daily_average.head())
a
2017-04-01 0.5
2017-04-02 0.5
2017-04-03 0.5
2017-04-04 0.5
2017-04-05 0.5
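If the original daily rows are still available, a different technique avoids the backfill step altogether: group the daily rows by calendar month and broadcast each month's mean onto every day with transform. This is an alternative sketch, not the method in the answer above, and the frame and column names below are hypothetical.

```python
import pandas as pd

# Hypothetical daily data standing in for the irrigation demand frame.
idx = pd.date_range('2017-01-01', '2017-03-31', freq='D')
daily = pd.DataFrame({'a': range(len(idx))}, index=idx)

# Group the rows by calendar month and broadcast each month's mean
# back onto every day of that month.
month = daily.index.to_period('M')
daily_average = daily.groupby(month)['a'].transform('mean')
print(daily_average.head())
```

The result keeps the original daily index, so every day of the first month gets a value without adding a row by hand.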