Python Pandas detects the wrong datetime format - pandas

After loading data from a csv file, I set the index to the "Date" column and then convert the index to datetime.
df1=pd.read_csv('Data.csv')
df1=df1.set_index('Date')
df1.index=pd.to_datetime(df1.index)
However after conversion the datetime format shows it has been misinterpreted:
original date was e.g. 01-10-2014 00:00:00
but Pandas converts it to 2014-01-10 00:00:00
How can I get Pandas to respect or recognize the original date format?
Thank you

Your date strings are being interpreted as month-first. You need to specify the correct format:
df1.index=pd.to_datetime(df1.index, format='%d-%m-%Y %H:%M:%S')
so that it doesn't interpret the first part as the month
In [128]:
pd.to_datetime('01-10-2014 00:00:00', format='%d-%m-%Y %H:%M:%S')
Out[128]:
Timestamp('2014-10-01 00:00:00')
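To tie this back to the original workflow, here is a minimal sketch of the full load-and-convert sequence. The in-memory CSV and its "Value" column are made up for illustration and stand in for Data.csv; only the "Date" column and its day-first format come from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for Data.csv; the "Value" column is invented.
csv_text = (
    "Date,Value\n"
    "01-10-2014 00:00:00,1\n"
    "02-10-2014 00:00:00,2\n"
)

df1 = pd.read_csv(io.StringIO(csv_text))
df1 = df1.set_index("Date")

# An explicit format removes any day/month guessing.
df1.index = pd.to_datetime(df1.index, format="%d-%m-%Y %H:%M:%S")

print(df1.index[0])  # 2014-10-01 00:00:00
```

With an explicit format, a string that does not match raises an error instead of being silently misread.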

Related

Conversion of datetime format

I have column name requestdatetime with data type string.
Value for requestdatetime is in format 15/Aug/2022:01:54:41 +0000
I need to convert 15/Aug/2022:01:54:41 +0000 into 'yyyy-MM-dd HH:mm:ss' format.
I have tried date_parse(requestdatetime,'%d/%b/%Y'':''HH:mm:ss'' ''+SSS') but it is not working.
You need to convert the string to a timestamp, then format the timestamp back to a string to get the expected result.
select date_format(parse_datetime('15/Aug/2022:01:54:41 +0000','dd/MMM/yyyy:HH:mm:ss Z'), '%Y/%m/%d %T')
result:
2022/08/15 01:54:41
date_parse accepts the MySQL date format. Try parse_datetime instead, which accepts the Java format (do not forget to add the part for the timezone offset, Z):
SELECT parse_datetime('15/Aug/2022:01:54:41 +0000', 'dd/MMM/yyyy:HH:mm:ss Z');
Output:
_col0
2022-08-15 01:54:41.000 UTC
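For readers who want to check the same parse-then-format round trip outside the SQL engine, a rough Python equivalent using only the standard library might look like this. It is a sketch, not part of either answer, and it assumes an English locale so that %b matches "Aug".

```python
from datetime import datetime

raw = "15/Aug/2022:01:54:41 +0000"

# Step 1: string -> timezone-aware datetime (%z consumes the "+0000" offset).
parsed = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

# Step 2: datetime -> string in the requested yyyy-MM-dd HH:mm:ss layout.
formatted = parsed.strftime("%Y-%m-%d %H:%M:%S")

print(formatted)  # 2022-08-15 01:54:41
```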

How to display day first with pd.to_datetime()?

I have a data frame with date columns in the format: day / month / year
They are in string/object format.
I want to convert them to datetime.
Sample date, 5th of January 2016: '05/01/2016'
However pd.to_datetime is confusing the day and month.
Here is what I've tried:
pd.to_datetime('05/01/2016')
Timestamp('2016-05-01 00:00:00')
This has given me Year - Month - Day
I want Day - Month - Year as in: 05-01-2016
What I have tried:
pd.to_datetime('05/01/2016',dayfirst=True)
Timestamp('2016-01-05 00:00:00')
This is correct, but it's not the format I want, which is '05-01-2016'
So I tried this:
pd.to_datetime('05/01/2016',dayfirst=True,format='%d/%m/%Y')
Timestamp('2016-01-05 00:00:00')
There's no difference.
How can I do it? How can I force it to display the datetime as '05-01-2016'
The only way I know is to change the display options:
pd.set_option("display.date_dayfirst", True)
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#available-options
but that did not work for me. Otherwise, convert the datetime to a str:
ts = pd.to_datetime('05/01/2016', format='%d/%m/%Y')
print(ts)
# Timestamp('2016-01-05 00:00:00')
ts = ts.strftime('%d-%m-%Y')
print(ts)
# '05-01-2016'
Or just replace '/' by '-':
print('05/01/2016'.replace('/', '-'))
# '05-01-2016'
You can't change the Timestamp's display format (to my knowledge), but you can convert it to a string in the desired format like so:
>>> import pandas as pd
>>> pd.to_datetime('05/01/2016', dayfirst=True, format='%d/%m/%Y').strftime('%d-%m-%Y')
'05-01-2016'
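Since the question mentions date columns rather than a single value, the same idea can be applied to a whole column with the .dt accessor. This is a minimal sketch; the column name "date" and the second sample value are assumptions.

```python
import pandas as pd

# Hypothetical frame; only '05/01/2016' comes from the question.
df = pd.DataFrame({"date": ["05/01/2016", "17/03/2016"]})

# Parse as day/month/year, so the column becomes datetime64.
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Format back to strings for display; the result is an object (str) column.
df["date_display"] = parsed.dt.strftime("%d-%m-%Y")

print(df["date_display"].tolist())  # ['05-01-2016', '17-03-2016']
```

Keeping the parsed datetime column for calculations and using the string column only for display avoids losing date arithmetic.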

pandas: to_list for timestamp stores as epoch. How to store as iso format time stamp string

I have a timestamp column named time in pandas dataframe
a sample timestamp is
2021-01-17 18:11:23+00:00
and the column data type is
time datetime64[ns, psycopg2.tz.FixedOffsetTimezone...
Now I am trying to convert the column to a list:
df['time'].values.tolist()
The sample timestamp above is converted to an epoch value and stored as
1610907083000000000
How can I tell pandas to store it as an ISO format string rather than an epoch value?
df['time'].dt.strftime('%Y-%m-%dT%H:%M:%S.%f%z').values.tolist()
You can replace '%Y-%m-%dT%H:%M:%S.%f%z' with whatever format you prefer (I used an ISO 8601 style format; see Python's strftime format codes for the specs).
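As a small self-contained sketch of the two options, assuming a timezone-aware column like the one in the question (the one-row frame here is made up from the sample value):

```python
import pandas as pd

# Hypothetical one-row frame built from the sample timestamp.
df = pd.DataFrame({"time": pd.to_datetime(["2021-01-17 18:11:23+00:00"])})

# Option 1: strftime with an explicit pattern (offset rendered as +0000).
as_strftime = df["time"].dt.strftime("%Y-%m-%dT%H:%M:%S%z").tolist()

# Option 2: Timestamp.isoformat() per element (offset rendered as +00:00).
as_isoformat = [ts.isoformat() for ts in df["time"]]

print(as_strftime)   # ['2021-01-17T18:11:23+0000']
print(as_isoformat)  # ['2021-01-17T18:11:23+00:00']
```

Both avoid .values, which drops down to the raw NumPy representation and is where the epoch integers came from.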

Convert Pandas date column from dd-mmm-yy to yyyy-mm-dd

I have a Pandas Dataframe that stores date in the format 19-Jul-18. I am trying to convert it to 2018-07-19
I tried doing pd.to_datetime(df['date']) but it didn't help.
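One possible approach, sketched here under the assumption that every value follows the dd-Mon-yy pattern with English month abbreviations, is to pass an explicit format and then, if a string is needed, format the result. The second sample value is made up.

```python
import pandas as pd

# Hypothetical frame; only '19-Jul-18' comes from the question.
df = pd.DataFrame({"date": ["19-Jul-18", "03-Jan-19"]})

# %d = day, %b = abbreviated month name, %y = two-digit year.
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%y")

# datetime64 values already display as yyyy-mm-dd; use strftime for strings.
as_text = df["date"].dt.strftime("%Y-%m-%d").tolist()

print(as_text)  # ['2018-07-19', '2019-01-03']
```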

Matplotlib Default date format?

I'm using Pandas to read a .csv file that has a 'Timestamp' date column in the format:
31/12/2016 00:00
I use the following line to convert it to a datetime64 dtype:
time = pd.to_datetime(df['Timestamp'])
The column has an entry every 15 minutes for almost a year, and I've run into a problem when I want to plot more than one month's worth.
Pandas seems to change the format from ISO to US upon reading (so YYYY:MM:DD to YYYY:DD:MM), so my plots have 30-day gaps whenever the datetime represents a new day. A plot of the first 5 days shows these gaps (image not included).
This is the raw data in the file either side of the jump:
01/01/2017 23:45
02/01/2017 00:00
If I print the values being plotted (after reading) around the 1st jump, I get:
2017-01-01 23:45:00
2017-02-01 00:00:00
So is there a way to get pandas to read the dates properly?
Thanks!
You can specify a format parameter in pd.to_datetime to tell pandas how to parse the date exactly, which I suppose is what you need:
time = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')
pd.to_datetime('02/01/2017 00:00')
#Timestamp('2017-02-01 00:00:00')
pd.to_datetime('02/01/2017 00:00', format='%d/%m/%Y %H:%M')
#Timestamp('2017-01-02 00:00:00')
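A quick way to confirm the fix on data like this is to check that consecutive rows really are 15 minutes apart after parsing. The sketch below uses the two raw values from the question plus one made-up row.

```python
import pandas as pd

# The first two values are from the question; the third is invented.
raw = pd.Series(["01/01/2017 23:45", "02/01/2017 00:00", "02/01/2017 00:15"])

time = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")

# With correct day-first parsing, every step should be exactly 15 minutes.
steps = time.diff().dropna()
all_regular = (steps == pd.Timedelta(minutes=15)).all()

print(time.iloc[1])  # 2017-01-02 00:00:00
print(all_regular)   # True
```

If the day and month were swapped, the step across midnight would show up as roughly a month instead of 15 minutes.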