pandas - change time object to a float?

I have a field for call length in my raw data which is listed as an object, such as 00:10:30, meaning 10 minutes and 30 seconds. How can I convert this to a number like 10.50?
I keep getting errors. If I convert the field with pd.to_datetime, then I can't do an .astype('float'). In Excel, I just multiply the timestamp by 1440 and it outputs the number value I want to work with. (Timestamp * 24 * 60)

You can use time deltas to do this more directly:
In [11]: s = pd.Series(["00:10:30"])
In [12]: s = pd.to_timedelta(s)
In [13]: s
Out[13]:
0 00:10:30
dtype: timedelta64[ns]
In [14]: s / pd.offsets.Minute(1)
Out[14]:
0 10.5
dtype: float64
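If you prefer to avoid the offsets object, the .dt.total_seconds() accessor on the timedelta Series gives the same result (a quick sketch continuing the example above):
In [15]: s.dt.total_seconds() / 60
Out[15]:
0    10.5
dtype: float64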

I would convert the string to a datetime and then use the .dt accessor to pull out the time components and generate your minutes column:
In [16]:
df = pd.DataFrame({'time':['00:10:30']})
df['time'] = pd.to_datetime(df['time'])
df['minutes'] = df['time'].dt.hour * 60 + df['time'].dt.minute + df['time'].dt.second/60
df
Out[16]:
                 time  minutes
0 2015-02-05 00:10:30     10.5

There is probably a better way of doing this, but this will work.
from datetime import datetime
my_time = datetime.strptime('00:10:30', '%H:%M:%S')
zero_time = datetime.strptime('00:00:00', '%H:%M:%S')
x = my_time - zero_time  # a timedelta
x.seconds
Out[25]: 630
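Since the question asks for minutes as a float, a small follow-up to the snippet above is to divide the seconds (x.total_seconds() would work just as well):
x.seconds / 60.0
Out[26]: 10.5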

Related

Parsing date in pandas.read_csv

I am trying to read a CSV file whose first column contains date values in this format:
"Dec 30, 2021","1.1","1.2","1.3","1"
While I can define the types for the remaining columns using the dtype= argument, I do not know how to handle the date column.
I have tried the obvious np.datetime64 without success.
Is there any way to specify a format to parse this date directly using read_csv method?
You may use parse_dates:
df = pd.read_csv('data.csv', parse_dates=['date'])
But in my experience it is a frequent source of errors; I think it is better to specify the date format and convert the date column manually. For example, in your case:
df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format = '%b %d, %Y')
Just specify a list of columns that should be converted to dates in the parse_dates= argument of pd.read_csv:
>>> df = pd.read_csv('file.csv', parse_dates=['date'])
>>> df
        date    a    b    c  d
0 2021-12-30  1.1  1.2  1.3  1
>>> df.dtypes
date    datetime64[ns]
a              float64
b              float64
c              float64
d                int64
dtype: object
Update
What if I want to further specify the format for a, b, c and d? I used a simplified example; in my file, numbers are formatted like "2,345.55", and those are read as object by read_csv, not as float64 or int64 as in your example.
from datetime import datetime

converters = {
    'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
    'Number': lambda x: float(x.replace(',', ''))
}
df = pd.read_csv('data.csv', converters=converters)
Output:
>>> df
Date Number
0 2021-12-30 2345.55
>>> df.dtypes
Date datetime64[ns]
Number float64
dtype: object
# data.csv
Date,Number
"Dec 30, 2021","2,345.55"
Old answer
If you have a particular format, you can pass a custom function to date_parser parameter:
from datetime import datetime
custom_date_parser = lambda x: datetime.strptime(x, "%b %d, %Y")
df = pd.read_csv('data.csv', parse_dates=['Date'], date_parser=custom_date_parser)
print(df)
# Output
Date A B C D
0 2021-12-30 1.1 1.2 1.3 1
Or let Pandas try to determine the format, as suggested by @richardec.
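As a side note: in recent pandas (2.0 and later, if I remember the version correctly) date_parser is deprecated in favour of a date_format argument, so the same idea can be written as this sketch:
df = pd.read_csv('data.csv', parse_dates=['Date'], date_format='%b %d, %Y')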

Add fractional number of years to date in pandas Python

I have a pandas df that includes two columns: time_in_years (float64) and date (datetime64).
import pandas as pd
df = pd.DataFrame({
    'date': ['2009-12-25', '2005-01-09', '2010-10-31'],
    'time_in_years': ['10.3434', '5.0977', '3.3426']
})
df['date'] = pd.to_datetime(df['date'])
df["time_in_years"] = df.time_in_years.astype(float)
I need to create date2 as a datetime64 column by adding the number of years to the date.
I tried the following but with no luck:
df['date_2'] = df['date'] + datetime.timedelta(years=df['time_in_years'])
I know that with fractions I will not be able to get the exact date, but I want to get as close to the correct date as possible.
Try the dateutil package:
from dateutil.relativedelta import relativedelta
First convert the fractional years to a number of days, then use a lambda function and apply it to the dataframe:
df['date_2'] = df.apply(lambda x: x['date'] + relativedelta(days = int(x['time_in_years']*365)), axis = 1)
Result:
        date  time_in_years     date_2
0 2009-12-25        10.3434 2020-04-26
1 2005-01-09         5.0977 2010-02-12
2 2010-10-31         3.3426 2014-03-04
datetime.timedelta also works fine (remember to import datetime):
import datetime
df['date_2'] = df.apply(lambda x: x['date'] + datetime.timedelta(days = int(x['time_in_years']*365)), axis = 1)
Please note that the conversion to a whole number of days is needed because relativedelta does not accept fractional (non-integer) years, so the fractional years have to be expressed in days first.
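A vectorized sketch that avoids apply is also possible with pd.to_timedelta; here I assume a year of 365.25 days, which is just as approximate as the 365 used above:
df['date_2'] = df['date'] + pd.to_timedelta(df['time_in_years'] * 365.25, unit='D')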

How to convert a pandas column to datetime data type?

I have a dataframe that just has datetime stamps of data type "object". I want to convert the whole dataframe to a datetime data type. I would also like to convert all the columns to Linux epoch nanoseconds, so I can use this dataframe in PCA.
Sample:
rng = pd.date_range('2017-04-03', periods=3).astype(str)
time_df = pd.DataFrame({'s': rng, 'a': rng})
print (time_df)
s a
0 2017-04-03 2017-04-03
1 2017-04-04 2017-04-04
2 2017-04-05 2017-04-05
Use DataFrame.apply to convert to datetimes, and then to the native epoch format by converting to a NumPy array and then to integers:
import numpy as np

f = lambda x: pd.to_datetime(x, infer_datetime_format=True).values.astype(np.int64)
#pandas 0.24+
#f = lambda x: pd.to_datetime(x, infer_datetime_format=True).to_numpy().astype(np.int64)
time_df = time_df.apply(f)
print (time_df)
s a
0 1491177600000000000 1491177600000000000
1 1491264000000000000 1491264000000000000
2 1491350400000000000 1491350400000000000
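If you would rather stay in pandas, a per-column sketch of the epoch recipe from the pandas docs (subtract the epoch, then floor-divide by one nanosecond) gives the same integers:
f = lambda col: (pd.to_datetime(col) - pd.Timestamp('1970-01-01')) // pd.Timedelta('1ns')
time_df = time_df.apply(f)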

Creating tz aware pandas timestamp objects from an integer

I have an integer that is the number of microseconds after the Unix epoch (in GMT).
How can I convert 1349863207154117 using astype to a pandas.Timestamp("2012-10-10T06:00:07.154117", tz="UTC")? The documentation on astype is not very thorough. I have tried the following.
x = 1349863207154117
dt64 = np.int64(x).astype("M8[us]")
print dt64
returns:
np.datetime64("2012-10-10T06:00:07.154117-0400")
if I only want seconds, this works:
time = pd.Timestamp(datetime.datetime.fromtimestamp(int(x / 1e6)), tz="UTC")
Pandas: Epoch timestamps covers this.
In [2]: pd.to_datetime(1349863207154117,unit='us')
Out[2]: Timestamp('2012-10-10 10:00:07.154117')
If you want this in a local timezone
In [6]: pd.to_datetime(1349863207154117,unit='us').tz_localize('US/Eastern')
Out[6]: Timestamp('2012-10-10 10:00:07.154117-0400', tz='US/Eastern')
If your time is in UTC, but you want it in another tz.
In [9]: pd.to_datetime(1349863207154117,unit='us').tz_localize('UTC').tz_convert('US/Eastern')
Out[9]: Timestamp('2012-10-10 06:00:07.154117-0400', tz='US/Eastern')
Or this
In [10]: pd.to_datetime(1349863207154117,unit='us',utc=True).tz_convert('US/Eastern')
Out[10]: Timestamp('2012-10-10 06:00:07.154117-0400', tz='US/Eastern')
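The Timestamp constructor also accepts unit and tz directly, so as a quick sketch the tz-aware value can be built in one call:
In [11]: pd.Timestamp(1349863207154117, unit='us', tz='UTC')
Out[11]: Timestamp('2012-10-10 10:00:07.154117+0000', tz='UTC')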

How to convert numpy.timedelta64 to minutes

I have a date time column in a Pandas DataFrame and I'd like to convert it to minutes or seconds.
For example: I want to convert 00:27:00 to 27 mins.
example = data['duration'][0]
example
result: numpy.timedelta64(1620000000000,'ns')
What's the best way to achieve this?
Use array.astype() to convert the type of an array safely:
>>> import numpy as np
>>> a = np.timedelta64(1620000000000,'ns')
>>> a.astype('timedelta64[m]')
numpy.timedelta64(27,'m')
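If you want a plain float number of minutes instead of a timedelta64, divide by a one-minute timedelta64; for a whole pandas column, the .dt.total_seconds() accessor does the same job (assuming data['duration'] is already timedelta64[ns]):
>>> a / np.timedelta64(1, 'm')
27.0
>>> data['duration'].dt.total_seconds() / 60  # whole column, in minutes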