How to convert days to datetime where time origin is 1-JAN-0000 00:00:00? - pandas

I have a netCDF dataset with 'time' and 'depth' coordinates. The time coordinate is stored as days since the origin time 'JAN1-0000 00:00:00' (Image for the dataset is attached below).
How can I convert those days to a proper datetime format?
Thanks in advance!

There are 719528 days between 0000-01-01 and the Unix epoch (1970-01-01).
You can subtract that offset and pass the result to to_datetime with days as the unit:
import numpy as np
import pandas as pd
time = np.array([731957.5, 731958.5])
out = pd.to_datetime(time - 719528, unit='D')
Output: DatetimeIndex(['2004-01-12 12:00:00', '2004-01-13 12:00:00'], dtype='datetime64[ns]', freq=None)
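As a cross-check on the 719528 offset, here is a minimal sketch that uses only the standard library. It assumes the dataset's calendar is proleptic Gregorian with a 366-day year 0; the names `DAYS_IN_YEAR_ZERO`, `offset` and `days_to_datetime` are made up for this illustration and do not come from the dataset.

```python
from datetime import date, datetime, timedelta

# date.toordinal() counts days with 0001-01-01 as day 1, so this is the
# number of days from 0001-01-01 to 1970-01-01.
days_0001_to_epoch = date(1970, 1, 1).toordinal() - date(1, 1, 1).toordinal()

# Assumption: year 0 is a leap year (366 days), as in the proleptic
# Gregorian calendar with astronomical year numbering.
DAYS_IN_YEAR_ZERO = 366

# Days from 0000-01-01 to 1970-01-01.
offset = days_0001_to_epoch + DAYS_IN_YEAR_ZERO


def days_to_datetime(days):
    """Convert 'days since 0000-01-01' (hypothetical helper) to a datetime."""
    return datetime(1970, 1, 1) + timedelta(days=days - offset)


converted = [days_to_datetime(d) for d in (731957.5, 731958.5)]
print(offset)
print(converted)
```

If the time variable declares a different calendar (for example a 365-day one), this offset would not apply, so it is worth checking the variable's attributes first.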

Related

How to display day first with pd.to_datetime()?

I have a data frame with date columns in the format: day / month / year
They are in string/object format.
I want to convert them to datetime.
Sample date, 5th of January 2016: '05/01/2016'
However pd.to_datetime is confusing the day and month.
Here is what I've tried:
pd.to_datetime('05/01/2016')
Timestamp('2016-05-01 00:00:00')
This has given me Year - Month - Day
I want Day - Month - Year as in: 05-01-2016
What I have tried:
pd.to_datetime('05/01/2016',dayfirst=True)
Timestamp('2016-01-05 00:00:00')
This is correct, but it's not the format I want, which is '05-01-2016'
So I tried this:
pd.to_datetime('05/01/2016',dayfirst=True,format='%d/%m/%Y')
Timestamp('2016-01-05 00:00:00')
There's no difference.
How can I do it? How can I force it to display the datetime as '05-01-2016'
The only way I know is to change the display options:
pd.set_option("display.date_dayfirst", True)
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#available-options
However, that does not seem to work. Otherwise, convert the datetime to a string:
import pandas as pd
ts = pd.to_datetime('05/01/2016', format='%d/%m/%Y')
print(ts)
# 2016-01-05 00:00:00
ts = ts.strftime('%d-%m-%Y')
print(ts)
# 05-01-2016
Or just replace '/' with '-':
print('05/01/2016'.replace('/', '-'))
# 05-01-2016
You can't change the timestamp format (to my knowledge), but you can convert it to a string in the desired format like so:
>>> import pandas as pd
>>> pd.to_datetime('05/01/2016', dayfirst=True, format='%d/%m/%Y').strftime('%d-%m-%Y')
'05-01-2016'
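Since the question mentions a data frame with date columns, the same idea can be sketched for a whole column with the `.dt.strftime` accessor. The frame and the column names `date` and `date_display` are invented for this example.

```python
import pandas as pd

# Hypothetical frame with day/month/year strings, as described in the question.
df = pd.DataFrame({'date': ['05/01/2016', '17/02/2016']})

# Parse with an explicit day-first format; this column holds real datetimes.
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')

# Build a separate string column for display; the values here are plain
# strings, so keep the datetime column for sorting and arithmetic.
df['date_display'] = df['date'].dt.strftime('%d-%m-%Y')

shown = df['date_display']
print(df)
```

Keeping both columns means the display format does not get in the way of date arithmetic later on.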

Convert string to date in databricks SQL

I have a table with a text column in the following format:
5/30/2021 9:35:18 AM
I'm trying to convert this to a date (yyyy-MM-dd), but I get null values when I use cast or to_date.
Is there any way to get the above data in the yyyy-MM-dd format?
Use Databricks Datetime Patterns. According to SparkSQL documentation on the Databricks website, you can use datetime patterns specific to Databricks to convert to and from date columns. First, you need to convert the text column to a date column like this:
to_date('5/30/2021 9:35:18 AM','M/d/y h:m:s a')
M - month, d - day of month, y - year, h - hour of day (12-hour), m - minute of hour, s - second of minute, a - AM/PM
Once the column is converted to a date, you can easily use the same datetime patterns to convert it back to a specific format. Use the following command to convert it to the required format:
date_format(to_date('5/30/2021 9:35:18 AM','M/d/y h:m:s a'), 'yyyy-MM-dd')
Note: Depending upon whether you're getting zero left padded days, months, hours, minutes, and seconds, you'll need to tweak the above command.
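To sanity-check the expected result outside Databricks, the same conversion can be sketched in plain Python. This is not Spark SQL: the `strptime` format codes below are Python's rough equivalents of the pattern letters above, and the variable names are only for illustration.

```python
from datetime import datetime

raw = '5/30/2021 9:35:18 AM'

# %m/%d/%Y ~ M/d/y, %I:%M:%S ~ h:m:s (12-hour clock), %p ~ a (AM/PM).
# Python's strptime accepts non-zero-padded month, day and hour here.
parsed = datetime.strptime(raw, '%m/%d/%Y %I:%M:%S %p')

# Format only the date part as yyyy-MM-dd.
formatted = parsed.strftime('%Y-%m-%d')
print(formatted)  # 2021-05-30
```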

How to generate a time series column from today to the next 600 days in pandas?

I'm a new pandas learner. I can generate a new column as follows:
dates = pd.date_range('2010-01-01', '2011-8-23', freq='D')
Output:
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
'2010-01-09', '2010-01-10',
...
'2011-08-14', '2011-08-15', '2011-08-16', '2011-08-17',
'2011-08-18', '2011-08-19', '2011-08-20', '2011-08-21',
'2011-08-22', '2011-08-23'],
dtype='datetime64[ns]', length=600, freq='D')
My question is: what should we do if we only know the starting date and the length of the period (600 days), but not the ending date? How should the code be modified?
A follow-up question: how can I set the starting date to today's or yesterday's date?
Just set periods to 600 and you should get your output (periods=5 is used here to keep the output short):
pd.date_range(start='2010-01-01', periods=5, freq='D')
Out[335]:
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05'],
dtype='datetime64[ns]', freq='D')
To get today's date:
pd.to_datetime('today')
Out[338]: Timestamp('2017-09-29 00:00:00')
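Putting the two pieces together, a minimal sketch for a 600-day column starting today or yesterday might look like this. The variable names and the `date` column name are assumptions for the example; `normalize()` is used to drop the time of day so the range starts at midnight.

```python
import pandas as pd

# Today's date at midnight, and the day before it.
today = pd.Timestamp.today().normalize()
yesterday = today - pd.Timedelta(days=1)

# 600 daily timestamps; the start date counts as the first one.
dates_from_today = pd.date_range(start=today, periods=600, freq='D')
dates_from_yesterday = pd.date_range(start=yesterday, periods=600, freq='D')

# Use the range as a column in a data frame.
df = pd.DataFrame({'date': dates_from_today})
print(df.head())
```

Because the start date is included, the last entry falls 599 days after the start, not 600.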
First, import the standard-library datetime module:
import datetime
Then you can parse the start date into a datetime object and add 600 days using a timedelta:
start_date = "2010-01-01"
start_date = datetime.datetime.strptime(start_date, "%Y-%m-%d")
end_date = start_date + datetime.timedelta(days=600)
To get a string back, we can use strftime():
end_date = end_date.strftime("%Y-%m-%d")
> '2011-08-24'

Matplotlib Default date format?

I'm using Pandas to read a .csv file that has a 'Timestamp' date column in the format:
31/12/2016 00:00
I use the following line to convert it to a datetime64 dtype:
time = pd.to_datetime(df['Timestamp'])
The column has an entry for every 15 minutes for almost a year, and I've run into a problem when I want to plot more than one month's worth.
Pandas seems to change the format from ISO to US upon reading (so YYYY:MM:DD to YYYY:DD:MM), so my plots have 30-day gaps whenever the datetime reaches a new day. A plot of the first 5 days looks like:
This is the raw data in the file either side of the jump:
01/01/2017 23:45
02/01/2017 00:00
If I print the values being plotted (after reading) around the 1st jump, I get:
2017-01-01 23:45:00
2017-02-01 00:00:00
So is there a way to get pandas to read the dates properly?
Thanks!
You can specify a format parameter in pd.to_datetime to tell pandas how to parse the date exactly, which I suppose is what you need:
time = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')
pd.to_datetime('02/01/2017 00:00')
#Timestamp('2017-02-01 00:00:00')
pd.to_datetime('02/01/2017 00:00', format='%d/%m/%Y %H:%M')
#Timestamp('2017-01-02 00:00:00')
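One way to confirm that the parse is right before plotting is to check that consecutive timestamps are 15 minutes apart. The sketch below uses a few made-up rows in the same layout as the file; `raw`, `steps` and `all_quarter_hours` are names chosen for this example.

```python
import pandas as pd

# Sample rows around midnight, in the file's day/month/year layout.
raw = pd.Series(['01/01/2017 23:45', '02/01/2017 00:00', '02/01/2017 00:15'])

# Parse with the explicit day-first format.
time = pd.to_datetime(raw, format='%d/%m/%Y %H:%M')

# Differences between consecutive rows; the first one is NaT and is dropped.
steps = time.diff().dropna()

# True when every step is exactly 15 minutes, i.e. no month-long jumps.
all_quarter_hours = bool((steps == pd.Timedelta(minutes=15)).all())
print(all_quarter_hours)
```

A month-long jump in `steps` would point to the day and month being swapped during parsing.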

Python Pandas detects the wrong datetime format

After loading data from a csv file, I set the index to the "Date" column and then convert the index to datetime.
df1=pd.read_csv('Data.csv')
df1=df1.set_index('Date')
df1.index=pd.to_datetime(df1.index)
However after conversion the datetime format shows it has been misinterpreted:
original date was e.g. 01-10-2014 00:00:00
but Pandas converts it to 2014-01-10 00:00:00
How can I get Pandas to respect or recognize the original date format?
Thank you
Your date strings were being interpreted as month-first; you need to specify the correct format:
df1.index=pd.to_datetime(df1.index, format='%d-%m-%Y %H:%M:%S')
This way, pandas doesn't interpret the first part as the month.
In [128]:
pd.to_datetime('01-10-2014 00:00:00', format='%d-%m-%Y %H:%M:%S')
Out[128]:
Timestamp('2014-10-01 00:00:00')
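For completeness, here is a minimal end-to-end sketch of the same steps as in the question. It reads from an in-memory string instead of 'Data.csv', and the `Value` column and its contents are invented for the example.

```python
import io

import pandas as pd

# Stand-in for Data.csv with day-month-year dates (hypothetical contents).
csv_text = (
    "Date,Value\n"
    "01-10-2014 00:00:00,1\n"
    "02-10-2014 00:00:00,2\n"
)

df1 = pd.read_csv(io.StringIO(csv_text))
df1 = df1.set_index('Date')

# Explicit format: the first field is the day, the second is the month.
df1.index = pd.to_datetime(df1.index, format='%d-%m-%Y %H:%M:%S')
print(df1)
```

Both rows end up in October 2014, on the 1st and 2nd, which matches the original strings.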