Pandas - converting yyyymm dates to datetime yyyy-mm-dd

I have a pandas DataFrame with some date columns in the following format:
dates
202001
202002
I want to convert them to the following format:
dates
2020-01-01
2020-02-01
Could anyone assist with converting the date format? Thanks

If you need datetimes, use to_datetime with format='%Y%m':
df['dates'] = pd.to_datetime(df['dates'], format='%Y%m')
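If the column holds integers (for example after read_csv), a minimal sketch that casts to strings first so '%Y%m' is matched against text. The integer sample values are an assumption for illustration; the column name dates comes from the question:

```python
import pandas as pd

# Sample data as in the question; integer yyyymm values are assumed here
df = pd.DataFrame({"dates": [202001, 202002]})

# Cast to str so '%Y%m' is matched against text like '202001';
# the missing day defaults to the 1st of the month
df["dates"] = pd.to_datetime(df["dates"].astype(str), format="%Y%m")

print(df)
print(df.dtypes)
```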

You may use to_datetime here:
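df["dates"] = pd.to_datetime(df["dates"].astype(str) + '01', format='%Y%m%d')
Note that your dates are year and month only, so I append '01' to each one to form the first day of the month. The astype(str) is needed if the column holds integers, because adding the string '01' to an integer column raises a TypeError. errors='ignore' is left out: it silently returns the original values when parsing fails, and it is deprecated since pandas 2.2.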
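To see what happens with a value that does not match the format, here is a minimal sketch using errors='coerce', which turns unparseable entries into NaT instead of raising. The sample data, including the malformed 'n/a' entry, is made up for illustration:

```python
import pandas as pd

# Illustrative data; the last value is deliberately not a yyyymm string
df = pd.DataFrame({"dates": ["202001", "202002", "n/a"]})

# Append '01' so each valid value becomes a full yyyymmdd date;
# errors='coerce' maps anything that does not match the format to NaT
df["parsed"] = pd.to_datetime(df["dates"] + "01", format="%Y%m%d", errors="coerce")

print(df)
```

Rows that end up as NaT can then be found with df["parsed"].isna() and inspected separately.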

Try this:
df['dates'] = df['dates'].astype(str)
df['dates'] = pd.to_datetime(df['dates'].str[:4] + ' ' + df['dates'].str[4:], format='%Y %m')
print(df)
Output:
dates
0 2020-01-01
1 2020-02-01
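An alternative that avoids string slicing, sketched under the assumption that the column holds integers: split yyyymm arithmetically and let pd.to_datetime assemble the dates from year, month and day columns.

```python
import pandas as pd

# Integer yyyymm values, assumed for illustration
df = pd.DataFrame({"dates": [202001, 202002]})

# Split yyyymm with integer arithmetic instead of slicing strings
parts = pd.DataFrame({
    "year": df["dates"] // 100,   # 202001 -> 2020
    "month": df["dates"] % 100,   # 202001 -> 1
    "day": 1,                     # first day of every month
})

# pd.to_datetime accepts a DataFrame with year/month/day columns
df["dates"] = pd.to_datetime(parts)
print(df)
```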

Related

How to display day first with pd.to_datetime()?

I have a DataFrame with date columns in the format day / month / year.
They are in string/object format.
I want to convert them to datetime.
Sample date, 5th of January 2016: '05/01/2016'
However, pd.to_datetime confuses the day and month.
Here is what I've tried:
pd.to_datetime('05/01/2016')
Timestamp('2016-05-01 00:00:00')
This has given me Year - Month - Day
I want Day - Month - Year as in: 05-01-2016
What I have tried:
pd.to_datetime('05/01/2016',dayfirst=True)
Timestamp('2016-01-05 00:00:00')
This is correct, but it's not the format I want, which is '05-01-2016'
So I tried this:
pd.to_datetime('05/01/2016',dayfirst=True,format='%d/%m/%Y')
Timestamp('2016-01-05 00:00:00')
There's no difference.
How can I do it? How can I force it to display the datetime as '05-01-2016'?
The only related display setting I know of is display.date_dayfirst:
pd.set_option("display.date_dayfirst", True)
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#available-options
but it does not change how a Timestamp is printed. Otherwise, convert the datetime to a string:
ts = pd.to_datetime('05/01/2016', format='%d/%m/%Y')
print(ts)
# 2016-01-05 00:00:00
ts = ts.strftime('%d-%m-%Y')
print(ts)
# 05-01-2016
Or, if you only need the text, replace '/' with '-' in the original string:
print('05/01/2016'.replace('/', '-'))
# 05-01-2016
You can't change how a Timestamp is displayed (to my knowledge), but you can convert it to a string in the wanted format like so:
>>> import pandas as pd
>>> pd.to_datetime('05/01/2016', dayfirst=True, format='%d/%m/%Y').strftime('%d-%m-%Y')
'05-01-2016'
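For a whole column rather than a single value, a minimal sketch: parse with an explicit day-first format, then use Series.dt.strftime to build display strings. The column names and sample values are illustrative. Keep the datetime column for any date arithmetic, since the formatted column holds plain strings:

```python
import pandas as pd

# Illustrative day/month/year strings
df = pd.DataFrame({"date": ["05/01/2016", "28/02/2016"]})

# Parse with an explicit day-first format; the result is datetime64
df["parsed"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Series.dt.strftime returns strings in the wanted display format
df["display"] = df["parsed"].dt.strftime("%d-%m-%Y")

print(df)
```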

pandas get days in a column from start date?

start_date = '01/01/2021' (dd/mm/yyyy)
df
dates
2021-01-01
2021-01-02
...
2021-02-01
...
2021-06-01 (end date should be current date)
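If the start date is always 1 January (01/01), the day and month are the same, so pandas' default month-first parsing (mm/dd/YYYY) still gives the right date. You can then pass the string straight to date_range, with pd.to_datetime('now') as the end date. The default frequency is freq='D' (daily), so it is omitted: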
df = pd.DataFrame({'dates':pd.date_range(start_date, pd.to_datetime('now'))})
For the general case, parse start_date explicitly as dd/mm/YYYY with the format parameter:
start_date = '01/05/2021'
df = pd.DataFrame({'dates': pd.date_range(pd.to_datetime(start_date, format='%d/%m/%Y'),
pd.to_datetime('now'))})
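The pitfall behind the general solution, in a minimal sketch. The fixed end date replaces 'now' only so the output is reproducible; that substitution is an assumption for illustration:

```python
import pandas as pd

start_date = "01/05/2021"  # dd/mm/yyyy, i.e. 1 May 2021

# Without a format, pandas reads this month-first: 5 January 2021
print(pd.to_datetime(start_date))

# With an explicit format the day comes first: 1 May 2021
start = pd.to_datetime(start_date, format="%d/%m/%Y")

# Fixed end date instead of pd.to_datetime('now') (assumption)
end = pd.Timestamp("2021-05-10")
df = pd.DataFrame({"dates": pd.date_range(start, end)})  # daily by default
print(df)
```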
If you want a DataFrame as output:
d = pd.date_range(start_date, pd.to_datetime('now'))
df = pd.DataFrame({'dates': d})

how to subtract days from current date and return date object in pandas

I want to subtract 30 days from the current date and get the date in the following format:
final_date = 2019-12-24
I am doing the following in pandas, but I get a Timestamp object back:
from datetime import datetime, timedelta
final_date = pd.to_datetime(datetime.now().date() - timedelta(30))
How can I do it in pandas?
There are several ways to do this: floor the current Timestamp to midnight with Timestamp.floor, then subtract a Timedelta or a date offset:
final_date = pd.Timestamp.now().floor('d') - pd.Timedelta(30, unit='d')
final_date = pd.to_datetime('now').floor('d') - pd.DateOffset(days=30)
final_date = pd.to_datetime('now').floor('d') - pd.offsets.Day(30)
print (final_date)
2019-12-24 00:00:00
Finally, convert the output to a Python date object:
print (final_date.date())
2019-12-24
Or to a string:
print (final_date.strftime('%Y-%m-%d'))
2019-12-24
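A reproducible version of the same steps, as a minimal sketch: the fixed "now" below is hypothetical and stands in for pd.Timestamp.now():

```python
import pandas as pd

# Hypothetical fixed "now" so the result is reproducible;
# in real code use pd.Timestamp.now()
now = pd.Timestamp("2020-01-23 15:42:10")

# floor('D') drops the time of day, then 30 days are subtracted
final_date = now.floor("D") - pd.Timedelta(days=30)

print(final_date)                       # 2019-12-24 00:00:00
print(final_date.date())                # 2019-12-24 (datetime.date)
print(final_date.strftime("%Y-%m-%d"))  # 2019-12-24 (str)
```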
Subtract a Timedelta from today's date and format the result with strftime:
final_date = (pd.Timestamp.now().date() - pd.Timedelta(days=30)).strftime('%Y-%m-%d')
#'2019-12-24'

convert dates to int in pandas

I have a date column in YYYY-MM-DD format and want to convert it to consecutive integers, where 1 = Jan 1, 2000. So the date 2000-01-31 would convert to 31, and 2020-01-31 would convert to 365*20 + 5 leap days + 31, etc.
Is this possible to do in pandas?
I looked at Pandas: convert date 'object' to int, but this solution converts to an int 8 digits long.
First subtract the Timestamp from the column, convert the timedeltas to days with Series.dt.days, and finally add 1:
df = pd.DataFrame({"Date": ["2000-01-29", "2000-01-01", "2014-03-31"]})
d = '2000-01-01'
df["new"] = pd.to_datetime(df["Date"]).sub(pd.Timestamp(d)).dt.days + 1
print( df )
Date new
0 2000-01-29 29
1 2000-01-01 1
2 2014-03-31 5204
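A quick check against the numbers in the question, as a minimal sketch using the same subtraction; the three sample dates are taken from the question's description:

```python
import pandas as pd

# Dates from the question; 2000-01-01 counts as day 1
df = pd.DataFrame({"Date": ["2000-01-01", "2000-01-31", "2020-01-31"]})
origin = pd.Timestamp("2000-01-01")

df["new"] = (pd.to_datetime(df["Date"]) - origin).dt.days + 1
print(df)

# Cross-check the last value: 20 years of 365 days, 5 leap days
# (2000, 2004, 2008, 2012, 2016) and the 31 days of January 2020
print(365 * 20 + 5 + 31)
```

Both prints agree on 7336 for 2020-01-31.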

Python Pandas detects the wrong datetime format

After loading data from a csv file, I set the index to the "Date" column and then convert the index to datetime.
df1=pd.read_csv('Data.csv')
df1=df1.set_index('Date')
df1.index=pd.to_datetime(df1.index)
However, after conversion the dates show they have been misinterpreted:
original date was e.g. 01-10-2014 00:00:00
but Pandas converts it to 2014-01-10 00:00:00
How can I get Pandas to respect or recognize the original date format?
Thank you
Your date strings are being interpreted as month-first, so you need to specify the correct format:
df1.index=pd.to_datetime(df1.index, format='%d-%m-%Y %H:%M:%S')
That way the first part is not interpreted as the month:
In [128]:
pd.to_datetime('01-10-2014 00:00:00', format='%d-%m-%Y %H:%M:%S')
Out[128]:
Timestamp('2014-10-01 00:00:00')
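The whole workflow from the question, as a minimal sketch: io.StringIO stands in for Data.csv, and its contents (including the value column) are hypothetical. Passing dayfirst=True instead of format is another option when the exact layout varies, but an explicit format is stricter:

```python
import io
import pandas as pd

# Hypothetical stand-in for Data.csv, dates in day-month-year order
csv_text = """Date,value
01-10-2014 00:00:00,1
02-10-2014 00:00:00,2
"""

df1 = pd.read_csv(io.StringIO(csv_text))
df1 = df1.set_index("Date")

# Explicit day-first format: 01-10-2014 is 1 October, not 10 January
df1.index = pd.to_datetime(df1.index, format="%d-%m-%Y %H:%M:%S")
print(df1)
```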