Duration to Seconds in Pandas - pandas

How can I convert a Series like this:
2016-11-09 00:07:00 0 days 00:00:15.000000000
2016-11-09 00:07:15 0 days 00:20:14.000000000
2016-11-09 00:07:30 0 days 10:00:15.000000000
into integer values like this:
2016-11-09 00:07:00 15
2016-11-09 00:07:15 1214 // 20*60+14
2016-11-09 00:07:30 36015 // 10*60*60+15

Those are Timedeltas. You should be able to use the total_seconds method; however, you need to access it via the datetime accessor dt. Assuming your series is named s:
s.dt.total_seconds()
2016-11-09 00:07:00 15.0
2016-11-09 00:07:15 1214.0
2016-11-09 00:07:30 36015.0
dtype: float64
However, if by chance those are strings, it might be better to convert them with pd.to_timedelta first:
pd.to_timedelta(s).dt.total_seconds()
2016-11-09 00:07:00 15.0
2016-11-09 00:07:15 1214.0
2016-11-09 00:07:30 36015.0
dtype: float64
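Since the question asks for integers rather than floats, the result can be cast afterwards. A minimal sketch, assuming the durations are whole seconds (so the cast loses nothing) and using a small hand-built Series in place of the original data:

```python
import pandas as pd

# Hypothetical stand-in for the Series in the question (durations as strings).
s = pd.Series(
    ["0 days 00:00:15", "0 days 00:20:14", "0 days 10:00:15"],
    index=pd.to_datetime(
        ["2016-11-09 00:07:00", "2016-11-09 00:07:15", "2016-11-09 00:07:30"]
    ),
)

# Parse to timedelta64, take total seconds (float64), then cast to integers.
seconds = pd.to_timedelta(s).dt.total_seconds().astype("int64")
print(seconds)
```

If the Series can contain missing durations (NaT), the cast to int64 will fail; the nullable "Int64" dtype may be a better fit in that case.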

Related

Pandas Convert YYMMDD & HHMM column to single DateTime Column

I'm having all kinds of trouble combining these two date columns into a single datetime column. The data looks like this:
dfn2.head(3)
Out[134]:
Plant_Name YYMMDD HHMM BestGuess(kWh)
0 BII NEE STIPA 20180101 0100 20715.0
1 BII NEE STIPA 20180101 0200 15742.0
2 BII NEE STIPA 20180101 0300 16934.0
dfn2.dtypes
Out[138]:
Plant_Name object
YYMMDD object
HHMM object
BestGuess(kWh) float64
dtype: object
I've tried several options and I'm not getting the expected result from:
dfn2['Datetime'] = (pd.to_datetime(dfn2['YYMMDD'],format='%Y%m%d').add(pd.to_timedelta(dfn2['HHMM'], 'h')))
dfn2.head(3)
Out[101]:
Plant_Name YYMMDD HHMM BestGuess(kWh) Datetime
0 BII NEE STIPA 20180101 100 20715.0 2018-01-05 04:00:00
1 BII NEE STIPA 20180101 200 15742.0 2018-01-09 08:00:00
2 BII NEE STIPA 20180101 300 16934.0 2018-01-13 12:00:00
I'm expecting the 'Datetime' column of the first 3 rows to look like:
2018-01-01 01:00:00
2018-01-01 02:00:00
2018-01-01 03:00:00
not like what the result shows above. I've also tried this lambda solution and the result looks the same:
dfn2['DateTime'] = dfn2['YYMMDD'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d')) + (pd.to_timedelta(dfn2.HHMM, unit='H'))
dfn2.head(3)
Out[103]:
Plant_Name YYMMDD HHMM BestGuess(kWh) Datetime DateTime
0 BII NEE STIPA 20180101 100 20715.0 2018-01-05 04:00:00 2018-01-05 04:00:00
1 BII NEE STIPA 20180101 200 15742.0 2018-01-09 08:00:00 2018-01-09 08:00:00
2 BII NEE STIPA 20180101 300 16934.0 2018-01-13 12:00:00 2018-01-13 12:00:00
Am I missing something? Thank you.
You can do something like this:
pd.to_datetime(df['YYMMDD'].astype(str) + ' ' + df['HHMM'].astype(str))
P.S. You can do this directly while reading the CSV file using the parameter parse_dates=[['YYMMDD', 'HHMM']] (note that this nested-list form is deprecated as of pandas 2.2).
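The unexpected results in the question come from pd.to_timedelta(dfn2['HHMM'], 'h') reading '0100' as 100 hours (4 days and 4 hours), not as 01:00. One way around this, sketched below under the assumption that both columns are zero-padded strings as shown in the question, is to concatenate them and pass an explicit format:

```python
import pandas as pd

# Hypothetical rows mirroring the first three lines of the question's frame.
dfn2 = pd.DataFrame({
    "YYMMDD": ["20180101", "20180101", "20180101"],
    "HHMM": ["0100", "0200", "0300"],
})

# Join date and time into one string and parse it with an explicit format.
dfn2["Datetime"] = pd.to_datetime(
    dfn2["YYMMDD"] + dfn2["HHMM"], format="%Y%m%d%H%M"
)
print(dfn2)
```

Values such as 2400 in HHMM would not parse with %H%M and would need separate handling.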

Groupby month a pandas' dataframe based on a date column

I have a dataframe with a few lines listed below. I want to group the dataframe by month on the column labeled Date, which spans from 1/1/1970 to 1/1/2011, and then compute some statistics for each month. The Date column is of datetime.datetime type. I used the following without success:
DataFrame.groupby(pd.Grouper(key='Date',freq='M'))
I converted the datetime.datetime values to Timestamp and tried a DatetimeIndex, but neither worked.
I got the following error: "Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'"
Date Name Time I D Pre Height
1 2011-01-01 OldFathful 9.55 01:39:00 4:15 09:41:00 130.0
2 2011-01-01 OldFathful 11.33 01:38:00 3:59 11:20:00 130.0
3 2011-01-01 OldFathful 13:00 01:27:00 4:00 13:00:00 140.0
4 2011-01-01 OldFathful 14:42 01:42:00 3:44 14:29:00 150.0
5 2011-01-01 OldFathful 16:08 01:26:00 4:00 16:02:00 140.0
Thanks in advance
EK
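The error message suggests that pandas does not see the Date column as a datetime64 column, for example because it holds strings or Python datetime.datetime objects with object dtype. A minimal sketch, assuming that is the cause and using made-up rows with Height as a stand-in statistic: convert the column with pd.to_datetime first, then group by calendar month.

```python
import pandas as pd

# Hypothetical data; only the Date and Height columns matter here.
df = pd.DataFrame({
    "Date": ["2011-01-01", "2011-01-15", "2011-02-03"],
    "Height": [130.0, 140.0, 150.0],
})

# Make sure the column really has a datetime64 dtype before grouping.
df["Date"] = pd.to_datetime(df["Date"])

# Group by calendar month and compute a per-month statistic.
monthly = df.groupby(df["Date"].dt.to_period("M"))["Height"].mean()
print(monthly)
```

Once the column is datetime64, pd.Grouper(key='Date', ...) should also accept it; to_period("M") is used here only to keep the month labels short.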

Creating values from datetime objects in certain fixed divisions

I am trying to create a new column in which, for example, the time 14:02 is saved as 14.0, whereas 14:16 becomes 14.5, i.e. half-hour units. Other resolutions, such as 15-minute units, should also be possible. This is my approach for full hours, but I need a higher resolution.
df["Time"] = df.StartDateTime.apply(lambda x: x.hour)
So long as the units evenly divide an hour you can round with that frequency and then divide by an hour.
import pandas as pd
df = pd.DataFrame({'Time': pd.timedelta_range('14:00:00', freq='4min', periods=10)})
# Round to each resolution, then express the result in fractional hours.
for freq in ['30min', '15min', '20min', '10min']:
    df[freq] = df['Time'].dt.round(freq) / pd.Timedelta('1h')
Time 30min 15min 20min 10min
0 14:00:00 14.0 14.00 14.000000 14.000000
1 14:04:00 14.0 14.00 14.000000 14.000000
2 14:08:00 14.0 14.25 14.000000 14.166667
3 14:12:00 14.0 14.25 14.333333 14.166667
4 14:16:00 14.5 14.25 14.333333 14.333333
5 14:20:00 14.5 14.25 14.333333 14.333333
6 14:24:00 14.5 14.50 14.333333 14.333333
7 14:28:00 14.5 14.50 14.333333 14.500000
8 14:32:00 14.5 14.50 14.666667 14.500000
9 14:36:00 14.5 14.50 14.666667 14.666667
If you start from a datetime64[ns] column you can isolate the time by subtracting off the normalized date. For example:
df = pd.DataFrame({'Time': pd.date_range('2010-01-01 14:00:00', freq='4min', periods=5)})
df['Time_only'] = df['Time'] - df['Time'].dt.normalize()
# Time Time_only
#0 2010-01-01 14:00:00 14:00:00
#1 2010-01-01 14:04:00 14:04:00
#2 2010-01-01 14:08:00 14:08:00
#3 2010-01-01 14:12:00 14:12:00
#4 2010-01-01 14:16:00 14:16:00
print(df.dtypes)
#Time datetime64[ns]
#Time_only timedelta64[ns]
#dtype: object
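The two steps above can be combined into one helper. This is only a sketch: the function name fractional_hours is made up for illustration, and StartDateTime is assumed to be a datetime64 column as in the question.

```python
import pandas as pd

def fractional_hours(times, freq="30min"):
    """Return the time of day in hours, rounded to the given resolution."""
    # Strip the date part, leaving a timedelta since midnight.
    time_of_day = times - times.dt.normalize()
    # Round to the requested resolution and divide by one hour.
    return time_of_day.dt.round(freq) / pd.Timedelta(hours=1)

df = pd.DataFrame({
    "StartDateTime": pd.to_datetime(["2010-01-01 14:02", "2010-01-01 14:16"]),
})
df["Time"] = fractional_hours(df["StartDateTime"])
print(df)
```

Passing freq="15min" gives quarter-hour units in the same way.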

Why won't my dataframe column in pandas convert to datetime dtype?

I want to change my dataframe column to datetime, but whenever I run the code, the dtype does not change from object to datetime64[ns]:
df['expiration'] = pd.to_datetime(df['expiration'], format='%Y-%m-%d')
df['expiration']
This shows:
57098 2017-02-01
57248 2017-01-27
57430 2017-01-27
57589 2017-01-25
57590 2017-01-25
57591 2017-01-25
57601 2017-01-27
57602 2017-01-27
57784 2017-01-25
57785 2017-01-25
57786 2017-01-25
Name: expiration, Length: 642, dtype: object
I expected the dtype to be datetime64[ns] but I'm not sure where I'm going wrong here.

Pandas - Group into 24-hour blocks, but not midnight-to-midnight

I have a time Series. I'd like to group it into 24-hour blocks, from 8am to 7:59am the next day. I know how to group by date, but I've tried and failed to handle this 8-hour offset using TimeGroupers and DateOffsets.
I think you can use Grouper with the base parameter (available in pandas before 2.0):
print(df)
date name
0 2015-06-13 00:21:25 1
1 2015-06-14 01:00:25 2
2 2015-06-14 02:54:48 3
3 2015-06-15 14:38:15 2
4 2015-06-15 15:29:28 1
print(df.groupby(pd.Grouper(key='date', freq='24h', base=8)).sum())
name
date
2015-06-12 08:00:00 1.0
2015-06-13 08:00:00 5.0
2015-06-14 08:00:00 NaN
2015-06-15 08:00:00 3.0
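On newer pandas versions, where base is no longer available, Grouper's offset argument (added in pandas 1.1) plays the same role. A sketch of the same grouping with offset, rebuilding the sample frame from this answer:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-06-13 00:21:25",
        "2015-06-14 01:00:25",
        "2015-06-14 02:54:48",
        "2015-06-15 14:38:15",
        "2015-06-15 15:29:28",
    ]),
    "name": [1, 2, 3, 2, 1],
})

# 24-hour bins whose edges fall at 08:00 instead of midnight.
blocks = df.groupby(pd.Grouper(key="date", freq="24h", offset="8h"))["name"].sum()
print(blocks)
```

With current pandas defaults the empty block starting 2015-06-14 08:00 sums to 0 rather than NaN.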
Alternatively to @jezrael's method, you can use a custom grouper function:
import pandas as pd
start_ts = '2016-01-01 07:59:59'
df = pd.DataFrame({'Date': pd.date_range(start_ts, freq='10min', periods=1000)})
def my_grouper(df, idx):
    # Timestamps before 08:00 belong to the block that started the previous day.
    ts = df.loc[idx, 'Date']  # .loc replaces .ix, which was removed in pandas 1.0
    return ts.date() if ts.hour >= 8 else (ts - pd.Timedelta('1day')).date()
df.groupby(lambda x: my_grouper(df, x)).size()
Test:
In [468]: df.head()
Out[468]:
Date
0 2016-01-01 07:59:59
1 2016-01-01 08:09:59
2 2016-01-01 08:19:59
3 2016-01-01 08:29:59
4 2016-01-01 08:39:59
In [469]: df.tail()
Out[469]:
Date
995 2016-01-08 05:49:59
996 2016-01-08 05:59:59
997 2016-01-08 06:09:59
998 2016-01-08 06:19:59
999 2016-01-08 06:29:59
In [470]: df.groupby(lambda x: my_grouper(df, x)).size()
Out[470]:
2015-12-31 1
2016-01-01 144
2016-01-02 144
2016-01-03 144
2016-01-04 144
2016-01-05 144
2016-01-06 144
2016-01-07 135
dtype: int64
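The custom grouper is called once per row, which can be slow on large frames. A vectorized sketch of the same idea, assuming the same test data as above: shift every timestamp back by 8 hours and group by the resulting calendar day, so each label is the day on which the 08:00 block started.

```python
import pandas as pd

start_ts = "2016-01-01 07:59:59"
df = pd.DataFrame({"Date": pd.date_range(start_ts, freq="10min", periods=1000)})

# Shifting by 8 hours moves the 08:00 boundary onto midnight.
block_day = (df["Date"] - pd.Timedelta(hours=8)).dt.date
counts = df.groupby(block_day).size()
print(counts)
```

This should give the same block sizes as the Test output above.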