How to convert a pandas column to datetime data type? - pandas

I have a dataframe that contains only timestamps, stored with data type "object". I want to convert the whole dataframe to a datetime data type. I would also like to convert all the columns to Linux epoch nanoseconds, so that I can use this dataframe in PCA.

Sample:
import numpy as np
import pandas as pd
rng = pd.date_range('2017-04-03', periods=3).astype(str)
time_df = pd.DataFrame({'s': rng, 'a': rng})
print (time_df)
s a
0 2017-04-03 2017-04-03
1 2017-04-04 2017-04-04
2 2017-04-05 2017-04-05
Use DataFrame.apply to convert each column to datetimes, then to native epoch format by converting to a NumPy array and then to integers:
# infer_datetime_format is deprecated since pandas 2.0 and can be dropped there
f = lambda x: pd.to_datetime(x, infer_datetime_format=True).values.astype(np.int64)
#pandas 0.24+
#f = lambda x: pd.to_datetime(x, infer_datetime_format=True).to_numpy().astype(np.int64)
time_df = time_df.apply(f)
print (time_df)
s a
0 1491177600000000000 1491177600000000000
1 1491264000000000000 1491264000000000000
2 1491350400000000000 1491350400000000000
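On recent pandas versions the same conversion can be sketched without infer_datetime_format. This is a minimal sketch, assuming every column holds parseable date strings; the helper name to_epoch_ns is hypothetical, and the explicit cast to datetime64[ns] is there so the integers are nanoseconds whatever resolution pandas infers when parsing.

```python
import pandas as pd


def to_epoch_ns(column):
    """Hypothetical helper: parse date strings and return epoch nanoseconds."""
    parsed = pd.to_datetime(column)
    # Force nanosecond resolution before viewing the values as integers.
    return parsed.astype("datetime64[ns]").astype("int64")


rng = pd.date_range("2017-04-03", periods=3).astype(str)
time_df = pd.DataFrame({"s": rng, "a": rng})

# Apply the helper column by column; the result is an all-integer frame.
epoch_df = time_df.apply(to_epoch_ns)
print(epoch_df)
```

The resulting frame contains only int64 columns, so it can be passed to numeric routines such as PCA.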

Related

Parsing date in pandas.read_csv

I am trying to read a CSV file which has in its first column date values specified in this format:
"Dec 30, 2021","1.1","1.2","1.3","1"
While I can define the types for the remaining columns using dtype= clause, I do not know how to handle the Date.
I have tried the obvious np.datetime64 without success.
Is there any way to specify a format to parse this date directly using read_csv method?
You may use parse_dates:
df = pd.read_csv('data.csv', parse_dates=['date'])
But in my experience it is a frequent source of errors. I think it is better to read the column as-is and convert it manually with an explicit date format. For example, in your case:
df = pd.read_csv('data.csv')
df['date'] = pd.to_datetime(df['date'], format = '%b %d, %Y')
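A self-contained version of this approach, using the sample row from the question, might look like the sketch below. It reads from an in-memory string instead of data.csv, and the column names (date, a, b, c, d) are assumptions, since the sample row has no header.

```python
import io

import pandas as pd

# The sample row from the question, held in memory instead of data.csv.
csv_text = '"Dec 30, 2021","1.1","1.2","1.3","1"\n'

# Column names are assumed; the sample has no header line.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["date", "a", "b", "c", "d"],
)

# Convert the date column explicitly, stating the expected format.
df["date"] = pd.to_datetime(df["date"], format="%b %d, %Y")
print(df.dtypes)
```

Stating the format explicitly means a value that does not match raises an error instead of being parsed silently in an unexpected way.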
Just specify a list of columns that should be converted to dates in the parse_dates= argument of pd.read_csv:
>>> df = pd.read_csv('file.csv', parse_dates=['date'])
>>> df
date a b c d
0 2021-12-30 1.1 1.2 1.3 1
>>> df.dtypes
date datetime64[ns]
a float64
b float64
c float64
d int64
Update
What if I want to further specify the format for a, b, c and d? I used a simplified example; in my file, numbers are formatted like "2,345.55", and read_csv reads those as object, not as float64 or int64 as in your example.
from datetime import datetime
converters = {
    'Date': lambda x: datetime.strptime(x, "%b %d, %Y"),
    'Number': lambda x: float(x.replace(',', ''))
}
df = pd.read_csv('data.csv', converters=converters)
Output:
>>> df
Date Number
0 2021-12-30 2345.55
>>> df.dtypes
Date datetime64[ns]
Number float64
dtype: object
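As an alternative to a converter for the numeric column, read_csv also accepts a thousands argument. The following is a sketch under the assumption that every numeric column uses "," as its thousands separator; it reads the same two-line sample from an in-memory string and converts the date column afterwards.

```python
import io

import pandas as pd

# Same content as the data.csv sample, held in memory.
csv_text = 'Date,Number\n"Dec 30, 2021","2,345.55"\n'

# thousands="," tells the parser to drop the separator in numeric fields.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")

# The date column is still text at this point; convert it explicitly.
df["Date"] = pd.to_datetime(df["Date"], format="%b %d, %Y")
print(df.dtypes)
```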
# data.csv
Date,Number
"Dec 30, 2021","2,345.55"
Old answer
If you have a particular format, you can pass a custom function to the date_parser parameter (deprecated since pandas 2.0 in favor of date_format):
from datetime import datetime
custom_date_parser = lambda x: datetime.strptime(x, "%b %d, %Y")
df = pd.read_csv('data.csv', parse_dates=['Date'], date_parser=custom_date_parser)
print(df)
# Output
Date A B C D
0 2021-12-30 1.1 1.2 1.3 1
Or let pandas try to determine the format, as suggested by @richardec.

isinstance(df, pd._libs.tslib.Timestamp) what does it do?

What does this function do exactly? Here df is a dataframe with timestamps as its index. For example, the DataFrame df below:
2018-12-13 09:00:00, -113.0
2018-12-13 10:00:00, -112.5
2018-12-13 11:00:00, -114.8
if isinstance(df, pd._libs.tslib.Timestamp):
What does this if check do?
Access the Timestamp class as pd.Timestamp and skip the middle part; it's clearer.
You are testing whether the DataFrame is a single Timestamp, which is never true. A DataFrame can contain columns of different data types, such as Timestamps, but it is not itself one.
Some examples of your case:
import numpy as np
import pandas as pd
dt_single = pd.Timestamp("2019-01-01")
dt_column = [pd.Timestamp("2019-01-01") + pd.Timedelta(days=n) for n in range(3)]
values = np.random.rand(3)
df = pd.DataFrame({"dt_column": dt_column, "values": values})
print(isinstance(df, pd.Timestamp)) # False. Type = pandas.core.frame.DataFrame
print(isinstance(df["dt_column"], pd.Timestamp)) # False. Type = pandas.core.series.Series
print(isinstance(dt_single, pd.Timestamp)) # True
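If the intent of the check is to find out whether the frame is indexed by timestamps, testing the index is probably closer to what is wanted. A minimal sketch, rebuilding the frame from the question (the column name value is an assumption):

```python
import pandas as pd

index = pd.to_datetime(
    ["2018-12-13 09:00:00", "2018-12-13 10:00:00", "2018-12-13 11:00:00"]
)
df = pd.DataFrame({"value": [-113.0, -112.5, -114.8]}, index=index)

# The frame itself is never a Timestamp.
frame_is_timestamp = isinstance(df, pd.Timestamp)
# The index as a whole is a DatetimeIndex.
index_is_datetime = isinstance(df.index, pd.DatetimeIndex)
# Each individual index label is a Timestamp.
label_is_timestamp = isinstance(df.index[0], pd.Timestamp)

print(frame_is_timestamp, index_is_datetime, label_is_timestamp)
```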

pandas datetime is shown as numbers in plot

I have a datetime variable in a pandas dataframe [1]. When I check the dtypes, it shows the right format (datetime) [2]; however, when I try to plot this variable, it is plotted as numbers and not as datetimes [3].
The most surprising part is that this variable was working fine until yesterday. I do not know what has changed today, and as the dtype looks fine, I am clueless about what else could go wrong.
I would highly appreciate your feedback.
Thank you,
[1]
df.head()
reactive_power current timeofmeasurement
0 0 0.000 2018-12-12 10:43:41
1 0 0.000 2018-12-12 10:44:32
2 0 1.147 2018-12-12 10:46:16
3 262 1.135 2018-12-12 10:47:30
4 1159 4.989 2018-12-12 10:49:47
[2]
df.dtypes
reactive_power int64
current float64
timeofmeasurement datetime64[ns]
dtype: object
[3] (plot image not preserved)
You need to make sure your datetime column has a datetime type (converting it from strings if necessary), and then set it as the index. I don't have your original code, but something along these lines:
# Convert to datetime (leaves the values unchanged if the column is already datetime)
df["timeofmeasurement"] = pd.to_datetime(df["timeofmeasurement"], format="%Y-%m-%d %H:%M:%S")
# Set the date as index
df = df.set_index("timeofmeasurement")
# Then you can plot easily
df.plot()

Pandas Index Datetime Switching Months and Days

I have a pandas df.index in the format below.
It's a string of day/month/year, so the first item is 05Sep2017 etc:
05/09/17 #05Sep2017
07/09/17 #07Sep2017
...
18/10/17 #18Oct2017
Applying
df.index = pd.to_datetime(df.index)
to the above, transforms it to:
2017-05-09 #09May2017
2017-07-09 #09Jul2017
...
2017-10-18 #18Oct2017
What seems to be happening is that the first entries are having the Day and Month switched. The last entry instead, where the day is greater than 12, is converted correctly.
I tried to switch month days by converting the index to a column and applying:
df['date'] = df.index
df['date'].apply(lambda x: dt.datetime.strftime(x, '%Y-%d-%m'))
as well as:
df['date'].apply(lambda x: dt.datetime.strftime(x, '%Y-%m-%d'))
but to no avail.
How can I convert the index to datetime, where all entries are day/month/year, please?
The default date format in pandas is YYYY-MM-DD.
df = df.set_index('date_col')
df.index = pd.to_datetime(df.index)
print (df)
val
2017-05-09 4
2017-07-09 8
2017-10-18 2
print (df.index)
DatetimeIndex(['2017-05-09', '2017-07-09', '2017-10-18'], dtype='datetime64[ns]', freq=None)
If you need a different display format, use strftime, but note that you lose the datetimes, because the result is strings:
df.index = pd.to_datetime(df.index).strftime('%Y-%d-%m')
print (df.index)
Index(['2017-09-05', '2017-09-07', '2017-18-10'], dtype='object')
df.index = pd.to_datetime(df.index).strftime('%d-%b-%Y')
print (df)
val
09-May-2017 4
09-Jul-2017 8
18-Oct-2017 2
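To get the day/month/year interpretation the question asks for while keeping a real DatetimeIndex, one option is to tell pd.to_datetime how to read the strings. The sketch below rebuilds a small frame with the index values from the question (the val column is taken from the output above) and shows both an explicit format and dayfirst=True.

```python
import pandas as pd

raw_index = ["05/09/17", "07/09/17", "18/10/17"]
df = pd.DataFrame({"val": [4, 8, 2]}, index=raw_index)

# Explicit format: day/month/two-digit year, so 05/09/17 is 5 September 2017.
df.index = pd.to_datetime(df.index, format="%d/%m/%y")
print(df.index)

# Alternative: let pandas parse, but prefer day-first for ambiguous dates.
alt_index = pd.to_datetime(raw_index, dayfirst=True)
print(alt_index)
```

The explicit format is the stricter of the two: any entry that does not match raises an error rather than being reinterpreted.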

pandas - change time object to a float?

I have a field for call length in my raw data which is stored as an object, such as 00:10:30, meaning 10 minutes and 30 seconds. How can I convert this to a number like 10.50?
I keep getting errors. If I convert the fields with pd.datetime, then I can't do an .astype('float'). In Excel, I just multiply the timestamp by 1440 and it outputs the numeric value I want to work with (Timestamp * 24 * 60).
You can use time deltas to do this more directly:
In [11]: s = pd.Series(["00:10:30"])
In [12]: s = pd.to_timedelta(s)
In [13]: s
Out[13]:
0 00:10:30
dtype: timedelta64[ns]
In [14]: s / pd.offsets.Minute(1)
Out[14]:
0 10.5
dtype: float64
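The same timedelta idea can also be written with the dt.total_seconds() accessor, which avoids dividing by an offset object. A minimal sketch with a second, made-up call length added for illustration:

```python
import pandas as pd

# "01:02:15" is an invented extra value, not from the question.
s = pd.Series(["00:10:30", "01:02:15"])

# Parse as durations, then express each one in minutes as a float.
minutes = pd.to_timedelta(s).dt.total_seconds() / 60
print(minutes)
```

Unlike the .seconds attribute of a timedelta, total_seconds() also accounts for durations longer than a day.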
I would convert the string to a datetime and then use the dt accessor to access the components of the time and generate your minutes column:
In [16]:
df = pd.DataFrame({'time':['00:10:30']})
df['time'] = pd.to_datetime(df['time'])
df['minutes'] = df['time'].dt.hour * 60 + df['time'].dt.minute + df['time'].dt.second/60
df
Out[16]:
time minutes
0 2015-02-05 00:10:30 10.5
There is probably a better way of doing this, but this will work.
from datetime import datetime
import numpy as np
my_time = datetime.strptime('00:10:30','%H:%M:%S')
zero_time = datetime.strptime('00:00:00','%H:%M:%S')
x = my_time - zero_time
x.seconds
Out[25]: 630