I have a date column of format YYYY-MM-DD and want to convert it to a consecutive int, where 1 = Jan 1, 2000. So if I have a date 2000-01-31, it will convert to 31. If I have a date 2020-01-31, it will convert to 365*20 + 5 leap days + 31 = 7336, etc.
Is this possible to do in pandas?
I looked at Pandas: convert date 'object' to int, but this solution converts to an int 8 digits long.
First subtract a Timestamp from the column, convert the resulting timedeltas to days with Series.dt.days, and finally add 1:
import pandas as pd
df = pd.DataFrame({"Date": ["2000-01-29", "2000-01-01", "2014-03-31"]})
d = '2000-01-01'  # day 1 of the count
df["new"] = pd.to_datetime(df["Date"]).sub(pd.Timestamp(d)).dt.days + 1
print(df)
         Date   new
0  2000-01-29    29
1  2000-01-01     1
2  2014-03-31  5204
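As a quick check of the arithmetic in the question, the same subtraction can be wrapped in a small helper. This is only a sketch: the name day_number and its origin parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd

def day_number(dates, origin="2000-01-01"):
    """Return 1-based day counts relative to origin (hypothetical helper)."""
    # Subtracting a Timestamp from a datetime Series yields timedeltas;
    # .dt.days extracts whole days, and +1 makes the origin itself day 1.
    return (pd.to_datetime(dates) - pd.Timestamp(origin)).dt.days + 1

s = pd.Series(["2000-01-31", "2020-01-31"])
result = day_number(s)
print(result.tolist())  # [31, 7336]
```

The second value matches 365*20 + 5 leap days (2000, 2004, 2008, 2012, 2016) + 31.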
pandas get days in a column from start date?
start_date = '01/01/2021' (dd/mm/yyyy)
df
dates
2021-01-01
2021-01-02
.
.
.
2021-02-01
.
.
.
2021-06-01 (end date should be current date)
pandas parses ambiguous date strings as mm/dd/YYYY by default. If the start date is always 1 January, the day and month are identical, so passing the string straight to date_range works. Use to_datetime('now') as the end date; the default freq='D' can be omitted:
df = pd.DataFrame({'dates':pd.date_range(start_date, pd.to_datetime('now'))})
As a general solution for a start_date in dd/mm/YYYY format, parse start_date explicitly with the format parameter:
start_date = '01/05/2021'
df = pd.DataFrame({'dates': pd.date_range(pd.to_datetime(start_date, format='%d/%m/%Y'),
pd.to_datetime('now'))})
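To see why the explicit format matters, here is a small sketch comparing the two parses of the same string. The fixed end date 2021-05-10 is an assumption chosen only to keep the output reproducible; the answer above uses the current time instead.

```python
import pandas as pd

ambiguous = "01/05/2021"

# Without a format, pandas reads this month-first: 5 January 2021.
as_month_first = pd.to_datetime(ambiguous)

# With an explicit day-first format it becomes 1 May 2021.
as_day_first = pd.to_datetime(ambiguous, format="%d/%m/%Y")

# Fixed end date (assumed for illustration) instead of 'now'.
df = pd.DataFrame({"dates": pd.date_range(as_day_first, "2021-05-10")})
print(as_month_first.date(), as_day_first.date(), len(df))
```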
If you want a DataFrame as output:
d = pd.date_range(start_date, pd.to_datetime('now'))
df = pd.DataFrame({'dates': d})
I have a dataframe in pandas with some columns with dates in the following format
dates
202001
202002
I want to convert them to the following format
dates
2020-01-01
2020-02-01
Could anyone assist with converting the date format? Thanks
If you need datetimes, use to_datetime with format='%Y%m':
df['dates'] = pd.to_datetime(df['dates'], format='%Y%m')
You may use to_datetime here:
df["dates"] = pd.to_datetime(df["dates"].astype(str) + '01', format='%Y%m%d')
Note that your current dates are year and month only, so I append 01 to each one to form the first day of that month. The astype(str) call makes the concatenation work even if the column was read as integers.
Try this:
df['dates'] = df['dates'].astype(str)
df['dates'] = pd.to_datetime(df['dates'].str[:4] + ' ' + df['dates'].str[4:])
print(df)
Output:
dates
0 2020-01-01
1 2020-02-01
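If some values might not be valid year-month strings, one option is errors='coerce', which turns anything unparseable into NaT instead of raising. A minimal sketch with made-up sample data:

```python
import pandas as pd

# Two valid year-month strings and two invalid ones (sample data is invented).
raw = pd.Series(["202001", "202002", "202013", "n/a"])

# errors="coerce" replaces values that do not match the format with NaT.
parsed = pd.to_datetime(raw, format="%Y%m", errors="coerce")

print(parsed)
print(parsed.isna().tolist())
```

Checking parsed.isna() afterwards shows which rows failed to convert.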
I made a file that had three date columns:
pd.DataFrame({'yyyymm':[199501],'yyyy':[1995],'mm':[1],'Address':['AL1'],'Number':[12]})
yyyymm yyyy mm Address Number
0 199501 1995 1 AL1 12
and saved it as a file:
df.to_csv('complete.csv')
I read in the file with:
df=pd.read_csv('complete.csv')
and my 3 date columns are read back as ints, not dates.
I tried to convert them back to dates with:
df['yyyymm']=df['yyyymm'].astype(str).dt.strftime('%Y%m')
df['yyyy']=df['yyyy'].dt.strftime('%Y')
df['mm']=df['mm'].dt.dtrftime('%m')
with the error:
AttributeError: Can only use .dt accessor with datetimelike values
Very odd, as the command I used to make the datetime column was:
df['yyyymm']=df['col2'].dt.strftime('%Y%m')
Am I missing something? How can I convert the 6-digit column back to a yyyymm datetime, the 4-digit column to a yyyy datetime, and the mm column back to a datetime?
The columns yyyymm, yyyy and mm are integers. Using .astype(str) converts them to strings, but a Series of strings has no .dt accessor; that accessor is only available on datetime-like values.
You can use pd.to_datetime(..) to convert these to datetime values:
df['yyyymm'] = pd.to_datetime(df['yyyymm'].astype(str), format='%Y%m')
Indeed, this gives us:
>>> pd.to_datetime(df['yyyymm'].astype(str), format='%Y%m')
0 1995-01-01
Name: yyyymm, dtype: datetime64[ns]
The same can be done for the yyyy and mm columns:
>>> pd.to_datetime(df['yyyy'].astype(str), format='%Y')
0 1995-01-01
Name: yyyy, dtype: datetime64[ns]
>>> pd.to_datetime(df['mm'].astype(str), format='%m')
0 1900-01-01
Name: mm, dtype: datetime64[ns]
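Because a month parsed on its own defaults to the year 1900, it may be more useful to combine the yyyy and mm columns into a single datetime. One way to sketch this relies on pd.to_datetime accepting a DataFrame with year, month and day columns; the parts frame and the new date column are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"yyyy": [1995], "mm": [1]})

# to_datetime can assemble dates from columns named year, month and day.
# The day is fixed to 1 here, which is an assumption (the data has no day).
parts = pd.DataFrame({"year": df["yyyy"], "month": df["mm"], "day": 1})
df["date"] = pd.to_datetime(parts)

print(df)
```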
I am working with time series data in a pandas df that has no real calendar date, only an index value that marks equal time intervals between observations. I'm trying to convert it into a datetime type with daily or weekly frequency. Is there a way to keep the values the same while changing the type (for example, without setting an actual calendar date)?
Index,Col1,Col2
1,6.5,0.7
2,6.2,0.3
3,0.4,2.1
pd.to_datetime can create dates when given time units relative to some origin. The default is the POSIX origin 1970-01-01 00:00:00 and time in nanoseconds.
import pandas as pd
df['date1'] = pd.to_datetime(df.index, unit='D', origin='2010-01-01')
df['date2'] = pd.to_datetime(df.index, unit='W')
Output:
# Col1 Col2 date1 date2
#Index
#1 6.5 0.7 2010-01-02 1970-01-08
#2 6.2 0.3 2010-01-03 1970-01-15
#3 0.4 2.1 2010-01-04 1970-01-22
Alternatively, you can add timedeltas to the specified start:
pd.to_datetime('2010-01-01') + pd.to_timedelta(df.index, unit='D')
or just keep them as a timedelta:
pd.to_timedelta(df.index, unit='D')
#TimedeltaIndex(['1 days', '2 days', '3 days'], dtype='timedelta64[ns]', name='Index', freq=None)
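Putting the pieces together, here is a self-contained sketch that reads the sample data from the question and builds both a daily and a weekly date column. The origins 2010-01-01 and 2010-01-04 are arbitrary choices, and subtracting 1 from the index (so that index 1 lands on the origin itself) is an assumption about what is wanted.

```python
import io
import pandas as pd

csv_text = "Index,Col1,Col2\n1,6.5,0.7\n2,6.2,0.3\n3,0.4,2.1\n"
df = pd.read_csv(io.StringIO(csv_text), index_col="Index")

# Daily: the index is read as a number of days after the origin.
df["date"] = pd.to_datetime(df.index, unit="D", origin="2010-01-01")

# Weekly: add week-sized timedeltas to a chosen start date.
# Using (index - 1) makes index 1 coincide with the start date.
df["week_start"] = pd.Timestamp("2010-01-04") + pd.to_timedelta(df.index - 1, unit="W")

print(df)
```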
How do I subtract two dateTime fields containing ISO-format values and get the result in hours?
I have tried subtracting the two fields, but the result only reflects the dates and ignores the time component:
to_number(
TRUNC(to_timestamp(T1.attribute_2,'YYYY-MM-DD"T"HH24:MI:SS.ff3"Z"'))-
TRUNC(to_timestamp(T2.attribute_2,'YYYY-MM-DD"T"HH24:MI:SS.ff3"Z"'))
)
Date 1 2019-04-26 10:00pm
Date 2 2019-04-26 8:00pm
Expected Outcome: Date1- Date 2 = 2(in hrs)
Actual Outcome: Date1- Date 2 gives 0
If you want to take the hours into consideration, then don't truncate the values! TRUNC() removes the time component.
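For hours, multiply the difference by 24. Note that in Oracle, subtracting two TIMESTAMP values yields an INTERVAL rather than a number, so cast each value to DATE first if you need a numeric result:
(CAST(to_timestamp(T1.attribute_2,'YYYY-MM-DD"T"HH24:MI:SS.ff3"Z"') AS DATE)-
CAST(to_timestamp(T2.attribute_2,'YYYY-MM-DD"T"HH24:MI:SS.ff3"Z"') AS DATE)
) * 24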
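For readers following the pandas threads above, the effect of truncation can be illustrated in Python as well. This is only an analogy under the assumption that the values look like 2019-04-26T22:00:00.000Z; the question itself concerns SQL, and normalize() plays the role of TRUNC() here.

```python
import pandas as pd

date1 = pd.Timestamp("2019-04-26T22:00:00.000Z")  # 10:00pm
date2 = pd.Timestamp("2019-04-26T20:00:00.000Z")  # 8:00pm

# Keeping the time component gives the expected difference in hours.
hours = (date1 - date2) / pd.Timedelta(hours=1)

# Resetting both values to midnight first discards the time, so the
# difference collapses to zero, as in the question.
truncated_hours = (date1.normalize() - date2.normalize()) / pd.Timedelta(hours=1)

print(hours, truncated_hours)  # 2.0 0.0
```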