How to display day first with pd.to_datetime()? - pandas

I have a data frame with date columns in the format: day / month / year
They are in string/object format.
I want to convert them to datetime.
Sample date, 5th of January 2016: '05/01/2016'
However pd.to_datetime is confusing the day and month.
Here is what I've tried:
pd.to_datetime('05/01/2016')
Timestamp('2016-05-01 00:00:00')
This has given me Year - Month - Day
I want Day - Month - Year as in: 05-01-2016
Next I tried:
pd.to_datetime('05/01/2016',dayfirst=True)
Timestamp('2016-01-05 00:00:00')
This is correct, but it's not the format I want, which is '05-01-2016'
So I tried this:
pd.to_datetime('05/01/2016',dayfirst=True,format='%d/%m/%Y')
Timestamp('2016-01-05 00:00:00')
There's no difference.
How can I do it? How can I force it to display the datetime as '05-01-2016'?

The only way I know is to change the display options:
pd.set_option("display.date_dayfirst", True)
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#available-options
but that does not seem to change the output here. Otherwise, convert the datetime to a string:
ts = pd.to_datetime('05/01/2016', format='%d/%m/%Y')
print(ts)
# 2016-01-05 00:00:00
ts = ts.strftime('%d-%m-%Y')
print(ts)
# 05-01-2016
Or just replace '/' with '-' in the original string:
print('05/01/2016'.replace('/', '-'))
# 05-01-2016

You can't change how a Timestamp is displayed (to my knowledge), but you can convert it to a string in the wanted format like so:
>>> import pandas as pd
>>> pd.to_datetime('05/01/2016', dayfirst=True, format='%d/%m/%Y').strftime('%d-%m-%Y')
'05-01-2016'
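For a whole column rather than a single value, the same idea applies: keep the parsed column as datetime and derive a separate string column only for display. This is a minimal sketch, assuming a hypothetical DataFrame `df` with a `date` column (neither name comes from the question).

```python
import pandas as pd

# Hypothetical sample data in day/month/year order.
df = pd.DataFrame({"date": ["05/01/2016", "25/12/2016"]})

# Parse with an explicit format so day and month cannot be swapped.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Derive a display-only string column; the original stays datetime64.
df["date_str"] = df["date"].dt.strftime("%d-%m-%Y")

print(df)
```

The `date_str` column holds plain strings, so sorting or date arithmetic should still be done on `date`.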

Related

How to convert days to datetime where time origin is 1-JAN-0000 00:00:00?

I have a netCDF dataset which includes the coordinates 'time' and 'depth'. The time coordinate is stored as a number of days, where the origin time is 'JAN1-0000 00:00:00'.
How can I convert those day counts to a proper datetime?
Thanks in advance!
There are 719528 days between 0000-01-01 and the Unix epoch (1970-01-01).
You can subtract those days and use to_datetime with days as the unit:
time = np.array([731957.5, 731958.5])
out = pd.to_datetime(time-719528, unit='d')
output: DatetimeIndex(['2004-01-12 12:00:00', '2004-01-13 12:00:00'], dtype='datetime64[ns]', freq=None)
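The 719528 offset can be cross-checked with the standard library. This is a rough sketch, assuming the dataset uses a proleptic Gregorian calendar in which year 0 is a leap year of 366 days; that assumption is not stated in the question and should be checked against the file's calendar attribute. The variable names are illustrative only.

```python
import datetime

import numpy as np
import pandas as pd

# Python ordinals start at 1 for 0001-01-01, so the days from 0001-01-01
# to 1970-01-01 are toordinal() - 1. Year 0 is assumed to add 366 days.
DAYS_IN_YEAR_ZERO = 366  # assumption: year 0 treated as a leap year
offset = datetime.date(1970, 1, 1).toordinal() - 1 + DAYS_IN_YEAR_ZERO

days = np.array([731957.5, 731958.5])
converted = pd.to_datetime(days - offset, unit="D")

print(offset)
print(converted)
```

If the file declares a different calendar (for example a 365-day or Julian one), this offset would not apply.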

Convert string to date in databricks SQL

I have a table with a text column in the following format:
5/30/2021 9:35:18 AM
I'm trying to convert this to date(yyyy-MM-dd) but I get null values when I use cast, to_date.
Is there any way to get the above data in the yyyy-mm-dd format?
Use Databricks Datetime Patterns. According to SparkSQL documentation on the Databricks website, you can use datetime patterns specific to Databricks to convert to and from date columns. First, you need to convert the text column to a date column like this:
to_date('5/30/2021 9:35:18 AM','M/d/y h:m:s a')
M - month, d - day of month, y - year, h - hour of day (12-hour), m - minute of hour, s - second of minute, a - AM/PM
Once the column is converted to a date, you can use the same datetime patterns to convert it back to a specific format. Use the following command to convert it to the required format:
date_format(to_date('5/30/2021 9:35:18 AM','M/d/y h:m:s a'), 'yyyy-MM-dd')
Note: Depending upon whether you're getting zero left padded days, months, hours, minutes, and seconds, you'll need to tweak the above command.

how to get last date from last year from a given date in pandas

I am looking for a way to get, from a given date (timestamp), the last date of the previous year.
For example:
date = '2021-01-31' or '2021-04-25'; I expect '2020-12-31'
import pandas as pd
report_date = '2021-01-31'
report_date_tsmp = pd.Timestamp(report_date)
thanks for solutions!
Quick and dirty:
pd.Timestamp(f'12-31-{report_date_tsmp.year - 1}')
Less dirty with offset:
report_date_tsmp - pd.offsets.YearEnd()
Output:
Timestamp('2020-12-31 00:00:00')
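To see how the two approaches behave on a few inputs, including a date that is already December 31, here is a small comparison. The helper name `last_day_of_previous_year` is made up for this sketch and is not a pandas function.

```python
import pandas as pd


def last_day_of_previous_year(ts: pd.Timestamp) -> pd.Timestamp:
    """Build December 31 of the year before ts (hypothetical helper)."""
    return pd.Timestamp(year=ts.year - 1, month=12, day=31)


for text in ["2021-01-31", "2021-04-25", "2021-12-31"]:
    ts = pd.Timestamp(text)
    via_offset = ts - pd.offsets.YearEnd()
    via_helper = last_day_of_previous_year(ts)
    print(text, via_offset.date(), via_helper.date())
```

Note that the offset version keeps any time-of-day component of the input, while the helper always returns midnight.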

How to generate a time series column from today to the next 600 days in pandas?

I'm a new pandas learner. I can generate a new column as follows:
dates = pd.date_range('2010-01-01', '2011-8-23', freq='D')
Output:
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
'2010-01-09', '2010-01-10',
...
'2011-08-14', '2011-08-15', '2011-08-16', '2011-08-17',
'2011-08-18', '2011-08-19', '2011-08-20', '2011-08-21',
'2011-08-22', '2011-08-23'],
dtype='datetime64[ns]', length=600, freq='D')
My question is: what should we do if we only know the starting date and the length of the period (600 days), but not the ending date? How should the code be modified?
And a follow-up question: how can I set the starting date to today's or yesterday's date?
Just set periods to 600 and you should get your output:
pd.date_range(start='2010-01-01', periods=5, freq='D')
Out[335]:
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05'],
dtype='datetime64[ns]', freq='D')
To get today's date:
pd.to_datetime('today')
Out[338]: Timestamp('2017-09-29 00:00:00')
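Putting both parts together, a column of 600 consecutive days starting today could be built roughly as follows. Depending on the pandas version, 'today' may carry the current time of day, so this sketch uses `normalize()` to truncate to midnight; the DataFrame and column names are placeholders.

```python
import pandas as pd

# Midnight today; subtract one day to start from yesterday instead.
today = pd.Timestamp.today().normalize()
yesterday = today - pd.Timedelta(days=1)

# 600 daily timestamps starting today (the start date is included).
dates = pd.date_range(start=today, periods=600, freq="D")

# Placeholder DataFrame holding the series as a column.
df = pd.DataFrame({"date": dates})
print(df.head())
print(df.tail())
```

Because the start date counts as the first period, the last value is 599 days after the start.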
First, import the standard-library datetime module:
import datetime
Then you can parse the start date into a datetime object and add 600 days using a timedelta:
start_date = "2010-01-01"
start_date = datetime.datetime.strptime(start_date, "%Y-%m-%d")
end_date = start_date + datetime.timedelta(days=600)
To now get the string back, we can use strftime() like:
end_date = end_date.strftime("%Y-%m-%d")
> '2011-08-24'

Python Pandas detects the wrong datetime format

After loading data from a csv file, I set the index to the "Date" column and then convert the index to datetime.
df1=pd.read_csv('Data.csv')
df1=df1.set_index('Date')
df1.index=pd.to_datetime(df1.index)
However after conversion the datetime format shows it has been misinterpreted:
original date was e.g. 01-10-2014 00:00:00
but Pandas converts it to 2014-01-10 00:00:00
How can I get Pandas to respect or recognize the original date format?
Thank you
Your date strings were being interpreted as month first. You need to specify the correct format:
df1.index=pd.to_datetime(df1.index, format='%d-%m-%Y %H:%M:%S')
so that it doesn't interpret the first part as the month
In [128]:
pd.to_datetime('01-10-2014 00:00:00', format='%d-%m-%Y %H:%M:%S')
Out[128]:
Timestamp('2014-10-01 00:00:00')
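The full load-and-convert flow from the question can be sketched end to end. Since 'Data.csv' is not available here, this example reads from an in-memory string with made-up rows; the 'Value' column is an assumption for illustration.

```python
import io

import pandas as pd

# Stand-in for Data.csv with day-first dates (hypothetical contents).
csv_text = (
    "Date,Value\n"
    "01-10-2014 00:00:00,1\n"
    "02-10-2014 00:00:00,2\n"
)

df1 = pd.read_csv(io.StringIO(csv_text))

# Convert before indexing, with an explicit day-first format.
df1["Date"] = pd.to_datetime(df1["Date"], format="%d-%m-%Y %H:%M:%S")
df1 = df1.set_index("Date")

print(df1.index)
```

With the explicit format, a string that does not match raises an error instead of being silently parsed month first.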