Pandas: convert date in month to the 1st day of next month

Is there an efficient method or one-liner that, given a pandas DatetimeIndex date1, returns a DatetimeIndex date2 holding the first day of the next month?
For example, if date1 is '2011-09-30' then date2 is '2011-10-01'.
I have tried this one-liner:
df.index.to_period("M").to_timestamp('M')
But this only seems to return the last day of the same month. Is it possible to do some datetime arithmetic here?

You can use pd.offsets.MonthBegin()
In [261]: d = pd.to_datetime(['2011-09-30', '2012-02-28'])
In [262]: d
Out[262]: DatetimeIndex(['2011-09-30', '2012-02-28'], dtype='datetime64[ns]', freq=None)
In [263]: d + pd.offsets.MonthBegin(1)
Out[263]: DatetimeIndex(['2011-10-01', '2012-03-01'], dtype='datetime64[ns]', freq=None)
You'll find a lot of examples in the official Pandas docs
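As a minimal sketch of the approach above, the snippet below compares the MonthBegin offset with a period-based alternative (the period route is not from the answer, just another way to get the same result). The extra date '2011-10-01' is an illustrative assumption added to show what happens when the input is already the first of a month.

```python
import pandas as pd

# Sample dates; the third one already sits on the first of a month.
d = pd.to_datetime(["2011-09-30", "2012-02-28", "2011-10-01"])

# Offset arithmetic: moves each date forward to the next month start.
via_offset = d + pd.offsets.MonthBegin(1)

# Period arithmetic: month period + 1, then back to the period's start timestamp.
via_period = (d.to_period("M") + 1).to_timestamp()

print(via_offset)
print(via_period)
```

Both routes move a date that is already on the 1st to the 1st of the following month, which matches the "first day of the next month" requirement.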

Related

How to display day first with pd.to_datetime()?

I have a data frame with date columns in the format: day / month / year
They are in string/object format.
I want to convert them to datetime.
Sample date, 5th of January 2016: '05/01/2016'
However pd.to_datetime is confusing the day and month.
Here is what I've tried:
pd.to_datetime('05/01/2016')
Timestamp('2016-05-01 00:00:00')
This has given me Year - Month - Day
I want Day - Month - Year as in: 05-01-2016
What I have tried:
pd.to_datetime('05/01/2016',dayfirst=True)
Timestamp('2016-01-05 00:00:00')
This is correct, but it's not the format I want, which is '05-01-2016'
So I tried this:
pd.to_datetime('05/01/2016',dayfirst=True,format='%d/%m/%Y')
Timestamp('2016-01-05 00:00:00')
There's no difference.
How can I do it? How can I force it to display the datetime as '05-01-2016'?
The only way I know is to change the display options:
pd.set_option("display.date_dayfirst", True)
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#available-options
but that is not working here. Alternatively, convert the datetime to a str:
ts = pd.to_datetime('05/01/2016', format='%d/%m/%Y')
print(ts)
# Timestamp('2016-01-05 00:00:00')
ts = ts.strftime('%d-%m-%Y')
print(ts)
# '05-01-2016'
Or just replace '/' with '-':
print('05/01/2016'.replace('/', '-'))
# '05-01-2016'
You can't change the Timestamp display format (to my knowledge), but you can convert it to a string in the format you want, like so:
>>> import pandas as pd
>>> pd.to_datetime('05/01/2016', dayfirst=True, format='%d/%m/%Y').strftime('%d-%m-%Y')
'05-01-2016'
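For a whole column rather than a single value, the same idea can be sketched as follows. The DataFrame and the column names date and date_str are hypothetical, chosen only for illustration: the datetime column keeps the real dates, and a separate string column carries the day-first display form.

```python
import pandas as pd

# Hypothetical frame with day/month/year strings.
df = pd.DataFrame({"date": ["05/01/2016", "17/02/2016"]})

# Parse with an explicit format so day and month are never swapped.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Build a display column; this is a plain string, not a datetime.
df["date_str"] = df["date"].dt.strftime("%d-%m-%Y")

print(df)
```

Keeping the datetime column alongside the string column means sorting and date arithmetic still work on date, while date_str is used only for output.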

Pandas - converting datamonth yyyymm to datetime yyyy-mm-dd

I have a dataframe in pandas with some columns with dates in the following format
dates
202001
202002
I want to convert them to the following format
dates
2020-01-01
2020-02-01
Could anyone assist with converting the date format? Thanks
If you need datetimes, use to_datetime with format='%Y%m':
df['dates'] = pd.to_datetime(df['dates'], format='%Y%m')
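You may use to_datetime here:
df["dates"] = pd.to_datetime(df["dates"].astype(str) + '01', format='%Y%m%d')
Note that your current dates are year and month only, so I append 01 to each one to form the first of the month. The astype(str) call makes the concatenation work when the column holds integers; errors='ignore' is left out because it would silently return the unparsed input on failure.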
Try this:
df['dates'] = df['dates'].astype(str)
df['dates'] = pd.to_datetime(df['dates'].str[:4] + ' ' + df['dates'].str[4:])
print(df)
Output:
dates
0 2020-01-01
1 2020-02-01
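A self-contained sketch of the format='%Y%m' approach, assuming the column holds integers like 202001 (the question does not say whether they are integers or strings, so the values are cast to str first, which is harmless either way):

```python
import pandas as pd

# Assumed input: year-month values stored as integers.
df = pd.DataFrame({"dates": [202001, 202002]})

# Cast to str, then parse; a missing day defaults to the 1st of the month.
df["dates"] = pd.to_datetime(df["dates"].astype(str), format="%Y%m")

print(df)
```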

pd.to_datetime doesn't work with %a format

I'm having trouble with the pandas to_datetime function.
When I call the function in this way:
import pandas as pd
pd.to_datetime(['Wed', 'Thu', 'Mon', 'Tue', 'Fri'], format='%a')
I get this result:
DatetimeIndex(['1900-01-01', '1900-01-01', '1900-01-01', '1900-01-01',
'1900-01-01'],
dtype='datetime64[ns]', freq=None)
I don't know why pandas doesn't recognize the correct format.
I want to get a datetime object that has the right day of the week, regardless of the month or year.
This is not a pandas issue; it comes from Python's datetime module.
The best documentation I can find on why you get '1900-01-01' is the Technical Details section of the Python datetime docs.
Note:
For the datetime.strptime() class method, the default value is
1900-01-01T00:00:00.000: any components not specified in the format
string will be pulled from the default value.
Basically, 'Mon' or 'Tue' could be any Monday or Tuesday in January 1900, so a weekday name alone is ambiguous and the default value of 1900-01-01T00:00:00.000 is returned. A day of the month, by contrast, is determinate given the defaults of January 1900: strptime with '1' and %d yields 1900-01-01 00:00:00.000, and '2' with %d yields 1900-01-02 00:00:00.000. A bare weekday name such as Mon or Monday does not determine a datetime.
This is my interpretation of the documentation and the issue you are experiencing.
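To make the behaviour concrete, here is a small sketch. The first part reproduces the default with plain datetime.strptime. The second part is one possible workaround, not something the answer proposes: it pins each weekday name to a date in an arbitrarily chosen reference week (the week starting Monday 1900-01-01). The names order and anchor are hypothetical helpers.

```python
from datetime import datetime

import pandas as pd

# A weekday name alone falls back to the strptime default date.
parsed = datetime.strptime("Wed", "%a")
print(parsed)

# Workaround sketch: map each name to its position in the week and
# add that many days to a reference Monday (assumption: any week will do).
names = ["Wed", "Thu", "Mon", "Tue", "Fri"]
order = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
anchor = pd.Timestamp("1900-01-01")  # this date is a Monday
dates = anchor + pd.to_timedelta([order.index(n) for n in names], unit="D")

print(dates)
print(dates.day_name())
```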

How to generate a time series column from today to the next 600 days in pandas?

I'm a new pandas learner. I can generate a new column as follows:
dates = pd.date_range('2010-01-01', '2011-8-23', freq='D')
Output:
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05', '2010-01-06', '2010-01-07', '2010-01-08',
'2010-01-09', '2010-01-10',
...
'2011-08-14', '2011-08-15', '2011-08-16', '2011-08-17',
'2011-08-18', '2011-08-19', '2011-08-20', '2011-08-21',
'2011-08-22', '2011-08-23'],
dtype='datetime64[ns]', length=600, freq='D')
My question is: what should I do if I only know the starting date and the length of the period (600 days), but not the ending date? How should I modify the code?
And a follow-up question: how do I set the starting date to today's or yesterday's date?
Just set periods to 600 and you should get your output:
pd.date_range(start='2010-01-01', periods=5, freq='D')
Out[335]:
DatetimeIndex(['2010-01-01', '2010-01-02', '2010-01-03', '2010-01-04',
'2010-01-05'],
dtype='datetime64[ns]', freq='D')
To get today's date:
pd.to_datetime('today')
Out[338]: Timestamp('2017-09-29 00:00:00')
First, import the standard-library datetime module:
import datetime
Then you can create a datetime object and add 600 days using a timedelta:
start_date = "2010-01-01"
start_date = datetime.datetime.strptime(start_date, "%Y-%m-%d")
end_date = start_date + datetime.timedelta(days=600)
To now get the string back, we can use strftime() like:
end_date = end_date.strftime("%Y-%m-%d")
> '2011-08-24'
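Putting the two answers together, a sketch that covers both follow-up questions might look like this. The helper name next_days is hypothetical, and normalize() is used on the assumption that the current time of day should be dropped so every entry falls on midnight.

```python
import pandas as pd


def next_days(start, periods=600):
    """Return `periods` consecutive daily timestamps beginning at `start`."""
    return pd.date_range(start=start, periods=periods, freq="D")


# Today at midnight; subtract one day for yesterday.
today = pd.Timestamp.today().normalize()
from_today = next_days(today)
from_yesterday = next_days(today - pd.Timedelta(days=1))

# Fixed start date from the question, for comparison.
fixed = next_days("2010-01-01")
print(fixed[0], fixed[-1], len(fixed))
```

Note that 600 periods starting on 2010-01-01 end on 2011-08-23, one day before the start date plus 600 days, because the start date itself counts as the first period.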

Matplotlib Default date format?

I'm using Pandas to read a .csv file that has a 'Timestamp' date column in the format:
31/12/2016 00:00
I use the following line to convert it to a datetime64 dtype:
time = pd.to_datetime(df['Timestamp'])
The column has an entry for every 15 minutes for almost a year, and I've run into a problem when I want to plot more than 1 month's worth.
Pandas seems to change the format from ISO to US upon reading (so YYYY-MM-DD becomes YYYY-DD-MM), so my plots have 30-day gaps whenever the datetime represents a new day. A plot of the first 5 days looks like:
This is the raw data in the file either side of the jump:
01/01/2017 23:45
02/01/2017 00:00
If I print the values being plotted (after reading) around the 1st jump, I get:
2017-01-01 23:45:00
2017-02-01 00:00:00
So is there a way to get pandas to read the dates properly?
Thanks!
You can specify a format parameter in pd.to_datetime to tell pandas how to parse the date exactly, which I suppose is what you need:
time = pd.to_datetime(df['Timestamp'], format='%d/%m/%Y %H:%M')
pd.to_datetime('02/01/2017 00:00')
#Timestamp('2017-02-01 00:00:00')
pd.to_datetime('02/01/2017 00:00', format='%d/%m/%Y %H:%M')
#Timestamp('2017-01-02 00:00:00')
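As a quick check on the two rows from the question, the sketch below parses them with the explicit format and confirms the step between them is 15 minutes rather than a month. It also shows dayfirst=True as an alternative that the answer does not mention; the explicit format is stricter, since dayfirst is only a parsing hint.

```python
import pandas as pd

# The two raw values on either side of the jump in the question.
raw = pd.Series(["01/01/2017 23:45", "02/01/2017 00:00"])

# Explicit format: day first, then month.
parsed = pd.to_datetime(raw, format="%d/%m/%Y %H:%M")

# Alternative: hint that ambiguous dates are day-first.
parsed_dayfirst = pd.to_datetime(raw, dayfirst=True)

# Gap between consecutive rows; 15 minutes when parsed correctly.
step = parsed.diff().iloc[1]
print(parsed)
print(step)
```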