I'm using Pandas to read a .csv file that has a 'Timestamp' date column in the format:
31/12/2016 00:00
I use the following line to convert it to a datetime64 dtype:
time = pd.to_datetime(df['Timestamp'])
The column has an entry every 15 minutes for almost a year, and I've run into a problem when I want to plot more than one month's worth.
Pandas seems to read the dates in US order rather than day-first (so DD/MM/YYYY is parsed as MM/DD/YYYY), which gives my plots 30-day gaps whenever the data crosses into a new day. A plot of the first 5 days looks like:
This is the raw data in the file on either side of the jump:
01/01/2017 23:45
02/01/2017 00:00
If I print the values being plotted (after reading) around the 1st jump, I get:
2017-01-01 23:45:00
2017-02-01 00:00:00
So is there a way to get pandas to read the dates properly?
Thanks!
You can pass a format parameter to pd.to_datetime to tell pandas exactly how to parse the date, which I suppose is what you need:
time = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')
pd.to_datetime('02/01/2017 00:00')
#Timestamp('2017-02-01 00:00:00')
pd.to_datetime('02/01/2017 00:00', format='%d/%m/%Y %H:%M')
#Timestamp('2017-01-02 00:00:00')
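If you'd rather not spell out a full format string, a shorter sketch is to pass dayfirst=True, which tells the parser to treat ambiguous dates as day-first:

```python
import pandas as pd

# dayfirst=True resolves an ambiguous date like 02/01/2017 as day/month/year
ts = pd.to_datetime('02/01/2017 00:00', dayfirst=True)
print(ts)  # 2017-01-02 00:00:00
```

Note that dayfirst is a parsing hint rather than a strict guarantee, so an explicit format string remains the safer choice.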
Related
I have a netCDF dataset which includes 'time' and 'depth' coordinates. The time coordinate stores data as days, where the origin is 'JAN1-0000 00:00:00' (an image of the dataset is attached below).
How can I convert those days to the correct datetime format?
Thanks in advance!
There are 719528 days between year 0 and epoch (1970-01-01).
You can subtract those days and pass the result to pd.to_datetime with days as the unit:
import numpy as np
import pandas as pd

time = np.array([731957.5, 731958.5])
out = pd.to_datetime(time - 719528, unit='D')
Output:
DatetimeIndex(['2004-01-12 12:00:00', '2004-01-13 12:00:00'], dtype='datetime64[ns]', freq=None)
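If you'd rather not hard-code 719528, a sketch of deriving it from the standard library (toordinal() counts 0001-01-01 as day 1, and year 0 is a leap year in the proleptic Gregorian calendar, so day counts from 0000-01-01 are 365 larger than ordinals):

```python
from datetime import date

# toordinal() numbers 0001-01-01 as 1; 0001-01-01 lies 366 days after
# 0000-01-01, so add 366 - 1 = 365 to count days from the year-0 origin.
epoch_offset = date(1970, 1, 1).toordinal() + 365
print(epoch_offset)  # 719528
```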
I'm having trouble with the pandas to_datetime function.
When I call the function in this way:
import pandas as pd
pd.to_datetime(['Wed', 'Thu', 'Mon', 'Tue', 'Fri'], format='%a')
I get this result:
DatetimeIndex(['1900-01-01', '1900-01-01', '1900-01-01', '1900-01-01',
'1900-01-01'],
dtype='datetime64[ns]', freq=None)
I don't know why pandas doesn't recognize the format.
I want to get a datetime object with the correct day of the week, regardless of the month or year.
This is not a pandas issue but one with datetime in Python.
The best documentation I can find for why you get '1900-01-01' is the Python Datetime Technical Details.
Note:
For the datetime.strptime() class method, the default value is
1900-01-01T00:00:00.000: any components not specified in the format
string will be pulled from the default value.
Basically, 'Mon' or 'Tue' (a day-of-the-week name) could be any Monday or Tuesday in January 1900, so the input is ambiguous and the default value of 1900-01-01T00:00:00.000 is returned. A day of the month, by contrast, is determinate given the Jan 1900 defaults: using strptime with '1' and %d yields 1900-Jan-01 00:00:00.000, and '2' with %d yields 1900-Jan-02 00:00:00.000. A bare weekday name simply does not pin down a datetime.
This is my interpretation of the documentation and the issue you are experiencing.
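A quick sketch of that difference using plain datetime.strptime:

```python
from datetime import datetime

# A day of the month is determinate given the Jan 1900 defaults:
print(datetime.strptime('2', '%d'))    # 1900-01-02 00:00:00
# A bare weekday name is ambiguous, so only the defaults survive:
print(datetime.strptime('Wed', '%a'))  # 1900-01-01 00:00:00
```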
I am working with time series data in a pandas DataFrame that doesn't have real calendar dates, just an index value marking an equal time interval between rows. I'm trying to convert it to a datetime type with a daily or weekly frequency. Is there a way to keep the values the same while changing the type (i.e. without setting an actual calendar date)?
Index,Col1,Col2
1,6.5,0.7
2,6.2,0.3
3,0.4,2.1
pd.to_datetime can create dates from time units relative to some origin. The default is the POSIX origin 1970-01-01 00:00:00 with time measured in nanoseconds.
import pandas as pd

# df holds the sample data above, indexed by the integer 'Index' column
df['date1'] = pd.to_datetime(df.index, unit='D', origin='2010-01-01')
df['date2'] = pd.to_datetime(df.index, unit='W')
Output:
# Col1 Col2 date1 date2
#Index
#1 6.5 0.7 2010-01-02 1970-01-08
#2 6.2 0.3 2010-01-03 1970-01-15
#3 0.4 2.1 2010-01-04 1970-01-22
Alternatively, you can add timedeltas to the specified start:
pd.to_datetime('2010-01-01') + pd.to_timedelta(df.index, unit='D')
or just keep them as a timedelta:
pd.to_timedelta(df.index, unit='D')
#TimedeltaIndex(['1 days', '2 days', '3 days'], dtype='timedelta64[ns]', name='Index', freq=None)
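For instance, with the sample index above and an arbitrary chosen start date of 2010-01-01 (an assumption, since the data has no real calendar anchor):

```python
import pandas as pd

idx = pd.Index([1, 2, 3], name='Index')
# Each index value becomes a whole-day offset from the chosen start date
dates = pd.to_datetime('2010-01-01') + pd.to_timedelta(idx, unit='D')
print(list(dates))  # 2010-01-02, 2010-01-03, 2010-01-04
```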
I need to move back to the beginning of the month, but if I'm already at the beginning I want to stay there. Pandas' anchored offsets with n=0 are supposed to do exactly that, but they don't produce the expected results between the anchor points when MonthBegin is subtracted.
For example for this
pd.Timestamp('2017-01-06 00:00:00') - pd.tseries.offsets.MonthBegin(n=0)
I expect this to move me back to Timestamp('2017-01-01 00:00:00'),
but instead I get Timestamp('2017-02-01 00:00:00').
What am I doing wrong? Or you think it's a bug?
I can also see that the same rule works fine for MonthEnd, so combining the two like below:
pd.Timestamp('2017-01-06 00:00:00') + pd.tseries.offsets.MonthEnd(n=0) - pd.tseries.offsets.MonthBegin(n=1)
gives the desired Timestamp('2017-01-01 00:00:00'), but my expectation was for it to work with just - pd.tseries.offsets.MonthBegin(n=0).
To jump to the month's start, use:
ts + pd.tseries.offsets.MonthEnd(n=0) - pd.tseries.offsets.MonthBegin(n=1)
Yes, it's ugly, but it jumps to the first of the month while staying put if ts is already the first.
Quick demo:
>>> import datetime as dt
>>> import pandas as pd
>>> from pandas.tseries.offsets import MonthBegin, MonthEnd
>>> pd.date_range(dt.datetime(2016,12,30), dt.datetime(2017,2,2)).to_series() \
...     + MonthEnd(n=0) - MonthBegin(n=1)
2016-12-30 2016-12-01
2016-12-31 2016-12-01
2017-01-01 2017-01-01
2017-01-02 2017-01-01
...
2017-01-31 2017-01-01
2017-02-01 2017-02-01
2017-02-02 2017-02-01
This is indeed the correct behavior; it follows the rules laid out in Anchored Offset Semantics for offsets that support the start/end of a particular frequency.
Consider the given example:
from pandas.tseries.offsets import MonthBegin
pd.Timestamp('2017-01-02 00:00:00') - MonthBegin(n=0)
Out[18]:
Timestamp('2017-02-01 00:00:00')
Note that the anchor point for the MonthBegin offset is the first of every month. Since the given timestamp is past that day, it is treated as though it were part of the next month, and rolling (whether forward or backward) comes into play only after that.
excerpt from docs
For the case when n=0, the date is not moved if on an anchor point,
otherwise it is rolled forward to the next anchor point.
To get what you're after, you need to pass n=1, which rolls the timestamp back to the correct date.
pd.Timestamp('2017-01-02 00:00:00') - MonthBegin(n=1)
Out[20]:
Timestamp('2017-01-01 00:00:00')
If the date is set exactly on the anchor point, n=0 also gives the desired result, as per the attached docs.
pd.Timestamp('2017-01-01 00:00:00') - MonthBegin(n=0)
Out[21]:
Timestamp('2017-01-01 00:00:00')
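As an aside, DateOffset objects also expose a rollback() method that performs the "move back unless already on the anchor" operation directly; a sketch with the same timestamps as above:

```python
import pandas as pd
from pandas.tseries.offsets import MonthBegin

mb = MonthBegin()
# Rolls back to the anchor (the first of the month)...
print(mb.rollback(pd.Timestamp('2017-01-06')))  # 2017-01-01 00:00:00
# ...and stays put when the date is already on the anchor:
print(mb.rollback(pd.Timestamp('2017-01-01')))  # 2017-01-01 00:00:00
```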
After loading data from a csv file, I set the index to the "Date" column and then convert the index to datetime.
df1=pd.read_csv('Data.csv')
df1=df1.set_index('Date')
df1.index=pd.to_datetime(df1.index)
However, after conversion the dates show they have been misinterpreted:
the original date was e.g. 01-10-2014 00:00:00 (1 October 2014)
but Pandas converts it to 2014-01-10 00:00:00 (10 January 2014).
How can I get Pandas to respect or recognize the original date format?
Thank you
Your date strings were being interpreted as month-first; you need to specify the correct format:
df1.index=pd.to_datetime(df1.index, format='%d-%m-%Y %H:%M:%S')
so that it doesn't interpret the first part as the month
In [128]:
pd.to_datetime('01-10-2014 00:00:00', format='%d-%m-%Y %H:%M:%S')
Out[128]:
Timestamp('2014-10-01 00:00:00')
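Alternatively, the parsing can be done at read time. A sketch, where the inline CSV stands in for your Data.csv (whose exact contents I'm assuming):

```python
import io
import pandas as pd

csv = io.StringIO("Date,Value\n01-10-2014 00:00:00,1\n02-10-2014 00:00:00,2\n")
# parse_dates=True parses the index column as dates; dayfirst=True reads
# 01-10-2014 as 1 October rather than 10 January
df1 = pd.read_csv(csv, index_col='Date', parse_dates=True, dayfirst=True)
print(df1.index[0])  # 2014-10-01 00:00:00
```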