I'm having trouble with the pandas to_datetime function.
When I call the function this way:
import pandas as pd
pd.to_datetime(['Wed', 'Thu', 'Mon', 'Tue', 'Fri'], format='%a')
I get this result:
DatetimeIndex(['1900-01-01', '1900-01-01', '1900-01-01', '1900-01-01',
'1900-01-01'],
dtype='datetime64[ns]', freq=None)
I don't know why pandas doesn't recognize the correct format.
I want to get a datetime object that has the right day of the week, independently of the month or year.
This is not a pandas issue; it comes from how datetime in Python parses strings.
The best documentation I can find on why you get '1900-01-01' is in Python Datetime Technical Details.
Note:
For the datetime.strptime() class method, the default value is
1900-01-01T00:00:00.000: any components not specified in the format
string will be pulled from the default value.
Basically, a weekday name on its own could refer to any Monday, Tuesday, and so on in January 1900, so it does not pin down a date. Because it is ambiguous, the result is the default value of 1900-01-01T00:00:00.000. A day of the month, by contrast, is determinate given the defaults of January 1900: using strptime with 1 and %d gives 1900-Jan-01 00:00:00.000, and 2 with %d gives 1900-Jan-02 00:00:00.000. Just Mon or Monday is not enough to determine a datetime.
This is my interpretation of the documentation and the issue you are experiencing.
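As a minimal sketch of the behaviour described above, plus one possible workaround: the snippet first shows strptime filling missing fields from 1900-01-01, then maps each weekday name onto the week that starts on Monday 1900-01-01. The names order, anchor and dates are illustrative only, the choice of that reference week is an assumption rather than anything pandas does for you, and %a is assumed to run under an English locale.

```python
from datetime import datetime
import pandas as pd

# Fields missing from the format string come from 1900-01-01T00:00:00.
print(datetime.strptime("Wed", "%a"))  # weekday alone falls back to the default date
print(datetime.strptime("3", "%d"))    # day of month is enough: 1900-01-03 00:00:00

days = ["Wed", "Thu", "Mon", "Tue", "Fri"]
order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Assumption: use the week starting 1900-01-01 (a Monday) as a reference week.
anchor = pd.Timestamp("1900-01-01")
dates = pd.DatetimeIndex(
    [anchor + pd.Timedelta(days=order.index(d)) for d in days]
)
print(dates.dayofweek.tolist())  # Monday=0 ... Sunday=6
```

Each resulting date then carries the intended weekday, although the month and year are arbitrary.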
I'm trying to understand the following T-SQL code:
select DATEADD(
MONTH,
DATEDIFF(
MONTH,
-1,
GETDATE()) -1,
-1)
What does -1 indicate when passed to DATEDIFF? According to this, it should be a date parameter.
Here, the -1 for the 2nd parameter for DATEDIFF, and the 3rd parameter for DATEADD will be implicitly converted to a datetime. You can find what the value would be with a simple expression:
SELECT CONVERT(datetime, -1);
Which gives 1899-12-31 00:00:00.000. The "old" date and time data types (datetime and smalldatetime) allow conversion from a numerical data type (such as an int or decimal). 0 represents 1900-01-01 00:00:00 and each whole integer represents one day. So -1 is 1899-12-31 00:00:00.000, 2 would be 1900-01-03 00:00:00.000 and 5.5 would be 1900-01-06 12:00:00.000 (as .5 represents 12 hours).
For the new date and time data types, this conversion does not exist.
In truth, though, the above could likely be written much more simply as:
SELECT EOMONTH(GETDATE(),-1);
Here the -1 means the end of the month 1 month prior to the date in the first parameter (in this case the current date). The second parameter for EOMONTH is optional, so EOMONTH(GETDATE()) would return the last day of the current month, and EOMONTH(GETDATE(),2) would return the last day of the month in 2 months' time (2023-03-31 at time of writing).
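The arithmetic in this answer can be checked with a rough Python sketch. This is not SQL Server's implementation: the helper names sql_number_to_datetime, datediff_month and dateadd_month are hypothetical stand-ins that only mimic the behaviour described above (numbers counted as days from 1900-01-01, month boundaries counted by DATEDIFF, and DATEADD clamping to the last valid day of the target month).

```python
import calendar
from datetime import datetime, timedelta

SQL_EPOCH = datetime(1900, 1, 1)  # the value that 0 converts to

def sql_number_to_datetime(n):
    """Mimic CONVERT(datetime, n): n is a number of days from 1900-01-01."""
    return SQL_EPOCH + timedelta(days=n)

def datediff_month(start, end):
    """Mimic DATEDIFF(MONTH, start, end): count month boundaries crossed."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def dateadd_month(months, start):
    """Mimic DATEADD(MONTH, months, start), clamping the day to the month's end."""
    total = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return start.replace(year=year, month=month0 + 1, day=min(start.day, last_day))

base = sql_number_to_datetime(-1)       # 1899-12-31 00:00:00
today = datetime(2023, 1, 15)           # stand-in for GETDATE()
result = dateadd_month(datediff_month(base, today) - 1, base)
print(base, result)
```

Because the base date is the 31st, adding whole months to it always lands on the last day of the target month, which is why the original expression behaves like EOMONTH(GETDATE(), -1).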
I have a data frame with date columns in the format: day / month / year
They are in string/object format.
I want to convert them to datetime.
Sample date, 5th of January 2016: '05/01/2016'
However pd.to_datetime is confusing the day and month.
Here is what I've tried:
pd.to_datetime('05/01/2016')
Timestamp('2016-05-01 00:00:00')
This has given me Year - Month - Day
I want Day - Month - Year as in: 05-01-2016
What I have tried:
pd.to_datetime('05/01/2016',dayfirst=True)
Timestamp('2016-01-05 00:00:00')
This is correct, but it's not the format I want, which is '05-01-2016'
So I tried this:
pd.to_datetime('05/01/2016',dayfirst=True,format='%d/%m/%Y')
Timestamp('2016-01-05 00:00:00')
There's no difference.
How can I do it? How can I force it to display the datetime as '05-01-2016'
The only way I know of is to change the display options:
pd.set_option("display.date_dayfirst", True)
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#available-options
but that does not seem to have any effect here. Otherwise, you can convert the datetime to a str:
ts = pd.to_datetime('05/01/2016', format='%d/%m/%Y')
print(ts)
# Timestamp('2016-01-05 00:00:00')
ts = ts.strftime('%d-%m-%Y')
print(ts)
# '05-01-2016'
Or just replace '/' by '-':
print('05/01/2016'.replace('/', '-'))
# '05-01-2016'
You can't change the timestamp format (to my knowledge), but you can convert it to a string in the format you want, like so:
>>> import pandas as pd
>>> pd.to_datetime('05/01/2016', dayfirst=True, format='%d/%m/%Y').strftime('%d-%m-%Y')
'05-01-2016'
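For a whole data frame column, the same idea can be sketched with the .dt accessor. The frame and the column names date and date_str below are made up for illustration. The point is that the datetime column keeps its datetime64 dtype for calculations, while the formatted column holds plain strings.

```python
import pandas as pd

# Hypothetical sample data in day/month/year form.
df = pd.DataFrame({"date": ["05/01/2016", "17/02/2016"]})

# Parse once with an explicit format; this column stays a real datetime.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Build a separate string column purely for display.
df["date_str"] = df["date"].dt.strftime("%d-%m-%Y")

print(df)
```

Keeping both columns avoids losing datetime functionality (sorting, resampling, date arithmetic) just to get a particular display format.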
I'm using Pandas to read a .csv file that has a 'Timestamp' date column in the format:
31/12/2016 00:00
I use the following line to convert it to a datetime64 dtype:
time = pd.to_datetime(df['Timestamp'])
The column has an entry for every 15 minutes for almost a year, and I've run into a problem when I want to plot more than one month's worth.
Pandas seems to change the format from ISO to US upon reading (so YYYY:MM:DD to YYYY:DD:MM), so my plots have 30-day gaps whenever the datetime rolls over to a new day. A plot of the first 5 days looked like this (image not included):
This is the raw data in the file either side of the jump:
01/01/2017 23:45
02/01/2017 00:00
If I print the values being plotted (after reading) around the 1st jump, I get:
2017-01-01 23:45:00
2017-02-01 00:00:00
So is there a way to get pandas to read the dates properly?
Thanks!
You can specify a format parameter in pd.to_datetime to tell pandas how to parse the date exactly, which I suppose is what you need:
time = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')
pd.to_datetime('02/01/2017 00:00')
#Timestamp('2017-02-01 00:00:00')
pd.to_datetime('02/01/2017 00:00', format='%d/%m/%Y %H:%M')
#Timestamp('2017-01-02 00:00:00')
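A small end-to-end sketch of this fix, using an in-memory CSV in place of the real file (the sample rows and the value column are invented for the example). After parsing with the explicit format, consecutive rows should be exactly 15 minutes apart, which is a cheap sanity check that day and month were not swapped.

```python
import io
import pandas as pd

# Stand-in for the real .csv file: rows either side of midnight.
csv_text = (
    "Timestamp,value\n"
    "01/01/2017 23:30,1\n"
    "01/01/2017 23:45,2\n"
    "02/01/2017 00:00,3\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Tell pandas the day comes first instead of letting it guess.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d/%m/%Y %H:%M")

# Every step between consecutive rows should be 15 minutes.
steps = df["Timestamp"].diff().dropna()
print(steps.unique())
```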
I am wondering if there are any efficient methods or one-liner that, given a pandas DatetimeIndex date1, return a DatetimeIndex date2 that is the first day of the next month?
For example, if date1 is '2011-09-30' then date2 is '2011-10-01'?
I have tried this one liner
df.index.to_period("M").to_timestamp('M')
But this seems only able to return the "last day of the same month". Is it possible to do some datetime arithmetic here?
You can use pd.offsets.MonthBegin()
In [261]: d = pd.to_datetime(['2011-09-30', '2012-02-28'])
In [262]: d
Out[262]: DatetimeIndex(['2011-09-30', '2012-02-28'], dtype='datetime64[ns]', freq=None)
In [263]: d + pd.offsets.MonthBegin(1)
Out[263]: DatetimeIndex(['2011-10-01', '2012-03-01'], dtype='datetime64[ns]', freq=None)
You'll find a lot of examples in the official Pandas docs
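As a sketch of an alternative that stays close to the to_period attempt in the question: shift the monthly period forward by one and convert back to a timestamp, which defaults to the start of the period. The variable names via_offset and via_period are illustrative. Note that a date already on the first of a month also moves to the first of the following month with both approaches.

```python
import pandas as pd

d = pd.to_datetime(["2011-09-30", "2012-02-28", "2012-03-01"])

# Approach from the answer: roll forward to the next month start.
via_offset = d + pd.offsets.MonthBegin(1)

# Alternative: next monthly period, then its starting timestamp.
via_period = (d.to_period("M") + 1).to_timestamp()

print(via_offset.strftime("%Y-%m-%d").tolist())
print(via_period.strftime("%Y-%m-%d").tolist())
```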
After loading data from a csv file, I set the index to the "Date" column and then convert the index to datetime.
df1=pd.read_csv('Data.csv')
df1=df1.set_index('Date')
df1.index=pd.to_datetime(df1.index)
However after conversion the datetime format shows it has been misinterpreted:
original date was e.g. 01-10-2014 00:00:00
but Pandas converts it to 2014-01-10 00:00:00
How can I get Pandas to respect or recognize the original date format?
Thank you
Your date strings were being interpreted as month first. You need to specify the correct format:
df1.index=pd.to_datetime(df1.index, format='%d-%m-%Y %H:%M:%S')
so that it doesn't interpret the first part as the month:
In [128]:
pd.to_datetime('01-10-2014 00:00:00', format='%d-%m-%Y %H:%M:%S')
Out[128]:
Timestamp('2014-10-01 00:00:00')
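A short sketch comparing the explicit format with the dayfirst=True hint that to_datetime also accepts. The two-row index below is invented for illustration. An explicit format is the stricter option, since dayfirst is only a parsing preference, but for strings like these both should agree.

```python
import pandas as pd

# Hypothetical index values in day-month-year order.
idx = pd.Index(["01-10-2014 00:00:00", "02-10-2014 00:00:00"], name="Date")

# Explicit format: fails loudly if a value does not match.
explicit = pd.to_datetime(idx, format="%d-%m-%Y %H:%M:%S")

# Hint only: asks the parser to prefer day-first for ambiguous strings.
hinted = pd.to_datetime(idx, dayfirst=True)

print(explicit.strftime("%Y-%m-%d").tolist())
print(hinted.strftime("%Y-%m-%d").tolist())
```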