Convert string date column to int column for merge in python - pandas

I have two dataframes and I have to merge them on a date column.
The column in the first dataframe is an integer (year, month and day) and the one in the second is a str (%d/%m/%Y).
How can I convert the str column so that I can join them?

Convert both columns to datetime (adjust each format string to match your data):
df1.Date = pd.to_datetime(df1.Date, format='%Y%m%d')
df2.Date = pd.to_datetime(df2.Date, format='%m/%d/%Y')
Then join or merge:
df1.merge(df2, on='Date')  # or df1.join(df2) when Date is the index
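A minimal sketch of the conversion and merge, using made-up sample data (the columns `a` and `b` and all values are assumptions for illustration, not from the question). The integer column is cast to str first so the format string is applied to the digits as written:

```python
import pandas as pd

# Hypothetical sample data: df1 stores dates as integers (YYYYMMDD),
# df2 stores them as strings (MM/DD/YYYY).
df1 = pd.DataFrame({"Date": [20210130, 20210131], "a": [1, 2]})
df2 = pd.DataFrame({"Date": ["01/30/2021", "02/01/2021"], "b": [10, 20]})

# Convert both key columns to datetime64 so the merge keys share one dtype.
df1["Date"] = pd.to_datetime(df1["Date"].astype(str), format="%Y%m%d")
df2["Date"] = pd.to_datetime(df2["Date"], format="%m/%d/%Y")

# Inner join: only dates present in both frames are kept.
merged = df1.merge(df2, on="Date")
print(merged)
```

With this sample data only 2021-01-30 appears in both frames, so the merged result has a single row.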

Related

How to change date datetype of df column entries

Within a df, the entries of a column "Date" (n entries) are of type datetime.datetime and I want to convert every entry to type datetime.date. Can anyone help here? THX!
Use to_datetime to convert the column to datetime, then take the date part with .dt.date:
df['Date'] = pd.to_datetime(df['Date']).dt.date
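As a small sketch of what this does, assuming a hypothetical column of timestamp strings (the sample values are made up), each entry ends up as a plain datetime.date object:

```python
import datetime
import pandas as pd

# Hypothetical sample column holding full timestamps.
df = pd.DataFrame({"Date": ["2021-01-30 14:05:00", "2021-02-01 09:30:00"]})

# Parse to datetime64, then keep only the calendar date of each entry.
df["Date"] = pd.to_datetime(df["Date"]).dt.date

first = df["Date"].iloc[0]
print(type(first), first)
```

Note that the resulting column holds Python date objects rather than datetime64 values, so the .dt accessor is no longer available on it.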

Extract the first 10 values of a column and create a new one [duplicate]

I am looking to convert datetime to date for a pandas datetime series.
I have listed the code below:
df = pd.DataFrame()
df = pandas.io.parsers.read_csv("TestData.csv", low_memory=False)
df['PUDATE'] = pd.Series([pd.to_datetime(date) for date in df['DATE_TIME']])
df['PUDATE2'] = datetime.datetime.date(df['PUDATE']) #Does not work
Can anyone guide me in right direction?
You can access the datetime methods of a pandas Series by using the .dt accessor (in a similar way to how you would access string methods using .str). For your case, you can extract the date of your datetime column as:
df['PUDATE'].dt.date
This is a simple way to get the day of the month from a pandas datetime column:
#create a dataframe with dates as a string
test_df = pd.DataFrame({'dob':['2001-01-01', '2002-02-02', '2003-03-03', '2004-04-04']})
#convert column to type datetime
test_df['dob']= pd.to_datetime(test_df['dob'])
# Extract day, month , year using dt accessor
test_df['DayOfMonth']=test_df['dob'].dt.day
test_df['Month']=test_df['dob'].dt.month
test_df['Year']=test_df['dob'].dt.year
I think you need to specify the format when parsing, for example:
df['PUDATE2'] = pd.to_datetime(df['DATE_TIME'], format='%Y%m%d%H%M%S').dt.date
So you just need to know which format your data uses.
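A hedged sketch of parsing with an explicit format, assuming the DATE_TIME column holds strings shaped like YYYYMMDDHHMMSS (the sample values are made up, and the real file may use a different layout):

```python
import pandas as pd

# Hypothetical timestamps written as YYYYMMDDHHMMSS strings.
df = pd.DataFrame({"DATE_TIME": ["20210130140500", "20210201093000"]})

# An explicit format avoids guessing; errors="coerce" turns entries that
# do not match the format into NaT instead of raising.
df["PUDATE"] = pd.to_datetime(
    df["DATE_TIME"], format="%Y%m%d%H%M%S", errors="coerce"
)

# Keep only the calendar date of each parsed timestamp.
df["PUDATE2"] = df["PUDATE"].dt.date
print(df)
```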

Comparison between date and integer in pandas

I have a dataset df with a date column that contains dates from 2020-01-01 to 2021-03-30. Now I have a variable like a=20210130 (which is actually a date). I need to take the rows from df whose date is <= a.
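The first idea is to convert a to a datetime and compare, then filter by boolean indexing. Pass an explicit format, because a bare integer is interpreted as nanoseconds since the epoch:
df['date'] = pd.to_datetime(df['date'])
a = 20210130
df = df[df['date'] <= pd.to_datetime(str(a), format='%Y%m%d')]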
Or convert column to integers and compare:
a = 20210130
df = df[df['date'].dt.strftime('%Y%m%d').astype(int) <= a]
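Both approaches can be checked side by side on a small made-up frame (the column `v` and the sample dates are assumptions for illustration):

```python
import pandas as pd

# Hypothetical sample data around the cutoff date.
df = pd.DataFrame(
    {"date": ["2021-01-29", "2021-01-30", "2021-01-31"], "v": [1, 2, 3]}
)
df["date"] = pd.to_datetime(df["date"])
a = 20210130

# Option 1: turn the integer into a Timestamp. Going through str with an
# explicit format matters, since a bare integer would be read as
# nanoseconds since the epoch.
cutoff = pd.to_datetime(str(a), format="%Y%m%d")
by_timestamp = df[df["date"] <= cutoff]

# Option 2: turn the column into YYYYMMDD integers and compare numbers.
by_integer = df[df["date"].dt.strftime("%Y%m%d").astype(int) <= a]

print(by_timestamp)
print(by_integer)
```

Both filters keep the rows for 2021-01-29 and 2021-01-30. Comparing datetimes avoids formatting every row as a string, so it is usually the cheaper option on large frames.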

BigQuery: select rows where column value contains string

I would like to know how to filter a table by a specific column, when this column contains a specific substring.
Here's an example of my table:
I would like to obtain those rows where the column tsBegin contains 2020-08-04, maybe with something like:
SELECT * FROM mytable
where '2020-08-04' in tsBegin
Date operations:
where date(tsBegin) = date '2020-08-04'
A column named tsBegin should not be a string column, so you just want the date.
If tsBegin is a string, I would suggest that you convert it to a timestamp.
Use date functions, and half-open intervals:
select *
from mytable
where tsbegin >= date '2020-08-04'
and tsbegin < date_add(date '2020-08-04', interval 1 day)
Although a bit lengthier to type, direct filtering against literal values is usually much faster than applying a date function to the column being filtered, as in where date(tsbegin) = date '2020-08-04'.

column with dates into datetime index in dask

I have a dask dataframe for which I want to convert a column with dates into a datetime index. However, I get a not implemented error when I try:
pd.DatetimeIndex(df_dask_dataframe['name_col'])
Is there a workaround?
I think you need dask.dataframe.DataFrame.set_index if the dtype of the column is datetime64:
df_dask_dataframe = df_dask_dataframe.set_index('name_col')