Check if a dataframe contains a date or time - pandas

I want to check whether a dataframe contains a date or datetime value in Python. Is it possible to do this?
df = {'Latitude':['19.34', '19.42', '-4.34', '35.10'],
'Date':['2019-03-13', '2016-07-08', '2018-03-08', '2014-01-17']}
and write a function that checks for a date:
def CheckDate(df):
    return True
CheckDate(df)
True

Just use in:
df = {'Latitude':['19.34', '19.42', '-4.34', '35.10'],
'Date':['2019-03-13', '2016-07-08', '2018-03-08', '2014-01-17']}
'2019-03-13' in df['Date']
True
'2019-03-30' in df['Date']
False
In a function it would look like this:
def checkDate(df, date):
    return date in df['Date']
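One caveat worth illustrating: the snippet above works because df is a plain dict whose values are lists. If the data is held in a pandas Series, the in operator tests the index labels rather than the values. The following is a minimal sketch of that difference; the name dates is only for illustration.

```python
import pandas as pd

# Illustrative Series holding the same kind of date strings as the question.
dates = pd.Series(['2019-03-13', '2016-07-08', '2018-03-08', '2014-01-17'])

# `in` on a Series looks at the index labels (0, 1, 2, 3), not the values.
in_series = '2019-03-13' in dates

# Checking against the underlying values gives the intended membership test.
in_values = '2019-03-13' in dates.values

# An index label is found by the plain `in` check.
label_found = 0 in dates

print(in_series, in_values, label_found)
```

This is why the pandas-based answers use .values or an equality comparison combined with .any().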

You can try the code below to check whether a given date exists in a dataframe:
yourdate = '2018-03-08'
print((df['Date'] == yourdate).any())
output:
True
Alternatively, you can try this:
print('2018-03-08' in df['Date'].values)
output:
True
Desired code:
import pandas as pd
def checkDate(df, date):
    if date in df['Date'].values:
        return True
    return False
df = pd.DataFrame({'Latitude':['19.34', '19.42', '-4.34', '35.10'],
                   'Date':['2019-03-13', '2016-07-08', '2018-03-08', '2014-01-17']})
print(checkDate(df, '2018-03-08'))
print(checkDate(df, '2018-03-09'))
output:
True
False
You can also write your function like this:
def checkDate(df, date):
    return date in df['Date'].values
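The answers above look up one specific date. If the goal is instead to find out whether any column holds dates at all, one possible approach is to test each column's dtype and, failing that, try to parse its values. This is only a sketch under assumptions: the helper name has_date_column and the fixed format '%Y-%m-%d' are hypothetical choices, not part of the original question.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

def has_date_column(df, fmt='%Y-%m-%d'):
    """Return True if any column is datetime-typed or fully parses with fmt."""
    for col in df.columns:
        s = df[col]
        # Columns already stored as datetimes count immediately.
        if is_datetime64_any_dtype(s):
            return True
        # Otherwise try to parse the values as text; failures become NaT.
        parsed = pd.to_datetime(s.astype(str), format=fmt, errors='coerce')
        if len(s) > 0 and parsed.notna().all():
            return True
    return False

df = pd.DataFrame({'Latitude': ['19.34', '19.42', '-4.34', '35.10'],
                   'Date': ['2019-03-13', '2016-07-08', '2018-03-08', '2014-01-17']})

print(has_date_column(df))
print(has_date_column(df[['Latitude']]))
```

Requiring every value to parse is a deliberately strict choice; a looser check could use .any() instead of .all().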

Related

KeyError: 'date' Pandas

```
if __name__ == "__main__":
    pd.options.display.float_format = '{:.4f}'.format
    temp1 = pd.read_csv('_4streams_alabama.csv.gz')
    temp1['date'] = pd.to_datetime(temp1['date'])
    def vacimpval(x):
        for date in x['date'].unique():
            if date >= '2022-06-16':
                x['vac_count'] = x['vac_count'].interpolate()
                x['vac_count'] = x['vac_count'].astype(int)
    for location in temp1['location_name'].unique():
        s = temp1.apply(vacimpval)
```
In the code above, I am trying to apply this function to every location so that I can fill in the values using the interpolate() method, but I keep getting a KeyError and I don't know why.
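Source of the error:
Since there are only two places in your code where you access 'date', and, as you said, temp1.columns contains 'date', the problem is in x['date'].
By default, temp1.apply(vacimpval) passes each column to the function as a Series, so x is a single column rather than the whole dataframe, and x['date'] looks for an index label 'date' that does not exist.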
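The diagnosis can be reproduced on a small frame. The sketch below uses made-up data in place of the CSV from the question, and the column name vac_filled is hypothetical. It first shows the KeyError raised by a column-wise apply, then one possible per-location alternative using groupby with transform. The date cutoff from the question is left out to keep the example short.

```python
import pandas as pd

# Made-up stand-in for the CSV in the question.
temp1 = pd.DataFrame({
    'location_name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': pd.to_datetime(['2022-06-15', '2022-06-16', '2022-06-17'] * 2),
    'vac_count': [1.0, None, 3.0, 10.0, None, 30.0],
})

def uses_date(x):
    # With the default axis=0, x is one column (a Series), not the dataframe.
    return x['date']

try:
    temp1.apply(uses_date)
    raised = False
except KeyError:
    raised = True

# Possible alternative: interpolate vac_count separately within each location.
temp1['vac_filled'] = (
    temp1.groupby('location_name')['vac_count']
         .transform(lambda s: s.interpolate())
)

print(raised)
print(temp1)
```

Grouping by location also removes the need for the explicit loop over location names.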

Pandas - get_loc nearest for whole column

I have a df with date and price.
Given a datetime, I would like to find the price at the nearest date.
This works for one input datetime:
import requests, xlrd, openpyxl, datetime
import pandas as pd
file = "E:/prices.csv" #two columns: Timestamp (UNIX epoch), Price (int)
df = pd.read_csv(file, index_col=None, names=["Timestamp", "Price"])
df['Timestamp'] = pd.to_datetime(df['Timestamp'],unit='s')
df = df.drop_duplicates(subset=['Timestamp'], keep='last')
df = df.set_index('Timestamp')
file = "E:/input.csv" #two columns: ID (string), Date (dd-mm-yyy hh:ss:mm)
dfinput = pd.read_csv(file, index_col=None, names=["ID", "Date"])
dfinput['Date'] = pd.to_datetime(dfinput['Date'], dayfirst=True)
exampledate = pd.to_datetime("20-3-2020 21:37", dayfirst=True)
exampleprice = df.iloc[df.index.get_loc(exampledate, method='nearest')]["Price"]
print(exampleprice) #price as output
I have another dataframe with the datetimes ("dfinput") I want to lookup prices of and save in a new column "Price".
Something like this which is obviously not working:
dfinput['Date'] = pd.to_datetime(dfinput['Date'], dayfirst=True)
dfinput['Price'] = df.iloc[df.index.get_loc(dfinput['Date'], method='nearest')]["Price"]
dfinput.to_csv('output.csv', index=False, columns=["Hash", "Date", "Price"])
Can I do this for a whole column or do I need to iterate over all rows?
I think you need merge_asof (cannot test, because no sample data). Put dfinput on the left so that every input date gets the nearest price:
df = df.sort_index()
dfinput = dfinput.sort_values('Date')
dfinput = pd.merge_asof(dfinput, df, left_on='Date', right_index=True, direction='nearest')
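Since the answer could not be tested against real data, here is a small self-contained sketch of a nearest-timestamp lookup with merge_asof. The frames prices and lookups and all their values are invented for illustration; they only mimic the shape of the two CSV files described in the question.

```python
import pandas as pd

# Invented price table indexed by timestamp (mimics prices.csv after set_index).
prices = pd.DataFrame(
    {'Price': [100, 110, 120]},
    index=pd.to_datetime(['2020-03-20 21:00', '2020-03-20 22:00', '2020-03-20 23:00']),
).rename_axis('Timestamp')

# Invented lookup table (mimics input.csv after parsing the Date column).
lookups = pd.DataFrame({
    'ID': ['a', 'b'],
    'Date': pd.to_datetime(['2020-03-20 21:37', '2020-03-20 22:40']),
})

# Both sides must be sorted on the join key before calling merge_asof.
result = pd.merge_asof(
    lookups.sort_values('Date'),
    prices.sort_index(),
    left_on='Date',
    right_index=True,
    direction='nearest',
)

print(result)
```

Each row of lookups keeps its ID and Date and gains the Price from the closest timestamp, so no explicit loop over rows is needed.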

How to merge 4 dataframes into one

I have created functions that each return a dataframe. Now I want to merge all the dataframes into one. First, I called all the functions and then used reduce and merge. It did not work as expected. The error I am getting is "cannot combine function. It should be dataframe or series". I checked the type of my df; it is a dataframe, not a function. I don't know where the error is coming from.
def func1():
    return df1
def func2():
    return df2
def func3():
    return df3
def func4():
    return df4
def alldfs():
    df_1 = func1()
    df_2 = func2()
    df_3 = func3()
    df_4 = func4()
    result = reduce(lambda df_1,d_2,df_3,df_4: pd.merge(df_1,df_2,df_3,df_4,on ="EMP_ID"),[df1,df2,df3,df4)
    print(result)
You could try something like this (assuming that EMP_ID is common across all dataframes and you want the intersection of all dataframes):
result = pd.merge(df1, df2, on='EMP_ID').merge(df3, on='EMP_ID').merge(df4, on='EMP_ID')
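If you prefer to keep reduce, note that it calls its function with exactly two arguments at a time (the running result and the next item), and pd.merge joins two frames per call. A minimal sketch with invented sample frames, under the same assumption that EMP_ID is present in every dataframe:

```python
from functools import reduce

import pandas as pd

# Invented sample frames sharing the EMP_ID key.
df1 = pd.DataFrame({'EMP_ID': [1, 2, 3], 'name': ['a', 'b', 'c']})
df2 = pd.DataFrame({'EMP_ID': [2, 3, 4], 'dept': ['x', 'y', 'z']})
df3 = pd.DataFrame({'EMP_ID': [1, 2, 3], 'salary': [10, 20, 30]})
df4 = pd.DataFrame({'EMP_ID': [2, 3, 5], 'city': ['p', 'q', 'r']})

frames = [df1, df2, df3, df4]

# The lambda takes two frames: the merged result so far and the next frame.
result = reduce(lambda left, right: pd.merge(left, right, on='EMP_ID'), frames)

print(result)
```

This is equivalent to the chained .merge calls above and extends to any number of frames in the list.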

Convert date manipulation code to a function then apply it to multiple columns

Let's say a given dataframe df contains two date columns, start_date and end_date. Both need to be manipulated with the code below:
df['date'] = df['date'].str.split('d').str[0].add('d')
df['date'] = df['date'].str.replace('Y', '-').str.replace('m', '-').str.replace('d', '')
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce').dt.date
I wonder how I can convert it into a function date_manipulate like this:
def date_manipulate(x):
    return ...
and then apply it to those two columns. Thanks for your help.
df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(date_manipulate)
Replace df['date'] with x, because DataFrame.apply passes each column to the function as a Series:
def date_manipulate(x):
    x = x.str.split('d').str[0].add('d')
    x = x.str.replace('Y', '-').str.replace('m', '-').str.replace('d', '')
    x = pd.to_datetime(x, format='%Y-%m-%d', errors='coerce').dt.date
    return x
It is also possible to simplify the code:
def date_manipulate(x):
    x = x.str.split('d').str[0].add('d')
    x = pd.to_datetime(x, format='%YY%mm%dd', errors='coerce').dt.date
    return x
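To see the simplified function in action, here is a short sketch on invented data. The question does not show the raw strings, so the input format used here (for example '2019Y03m13d', optionally followed by extra text) is an assumption inferred from the replace calls.

```python
import datetime

import pandas as pd

def date_manipulate(x):
    # Keep everything up to the first 'd', then restore the 'd' marker.
    x = x.str.split('d').str[0].add('d')
    # Parse with the literal Y, m and d markers included in the format.
    return pd.to_datetime(x, format='%YY%mm%dd', errors='coerce').dt.date

# Invented sample rows; the trailing text after 'd' is dropped by the split.
df = pd.DataFrame({
    'start_date': ['2019Y03m13d', '2016Y07m08d 00h'],
    'end_date': ['2019Y04m01d', '2016Y08m15d 12h'],
})

df[['start_date', 'end_date']] = df[['start_date', 'end_date']].apply(date_manipulate)

print(df)
```

Because errors='coerce' is set, strings that do not match the format become missing values instead of raising an exception.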

Dataframe column filter from a list of tuples

I'm trying to create a function to filter a dataframe from a list of tuples. I've created the function below, but it doesn't seem to be working.
Each tuple holds a dataframe column name, a min value and a max value to filter on.
e.g.:
eg_tuple = [('colname1', 10, 20), ('colname2', 30, 40), ('colname3', 50, 60)]
My attempted function is below:
def col_cut(df, cutoffs):
    for c in cutoffs:
        df_filter = df[ (df[c[0]] >= c[1]) & (df[c[0]] <= c[2])]
    return df_filter
Note that the function should not filter out rows where the value equals the min or max. Appreciate the help.
The problem is that each iteration filters the original df again, so only the last filter takes effect. You should filter with:
def col_cut(df, cutoffs):
    df_filter = df
    for col, mn, mx in cutoffs:
        dfcol = df_filter[col]
        df_filter = df_filter[(dfcol >= mn) & (dfcol <= mx)]
    return df_filter
Note that you can use .between(..) here:
def col_cut(df, cutoffs):
    df_filter = df
    for col, mn, mx in cutoffs:
        df_filter = df_filter[df_filter[col].between(mn, mx)]
    return df_filter
Alternatively, use np.logical_and.reduce over all masks created by a list comprehension with Series.between:
import numpy as np
def col_cut(df, cutoffs):
    mask = np.logical_and.reduce([df[col].between(min1,max1) for col,min1,max1 in cutoffs])
    return df[mask]
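A quick way to check the behaviour, including the boundary rows, is to run the mask-based version on a small invented frame. The column values below are made up; Series.between includes both endpoints by default, which matches the requirement that rows equal to the min or max are kept.

```python
import numpy as np
import pandas as pd

def col_cut(df, cutoffs):
    # Build one boolean mask per (column, min, max) tuple and AND them together.
    mask = np.logical_and.reduce(
        [df[col].between(lo, hi) for col, lo, hi in cutoffs]
    )
    return df[mask]

# Invented sample data with values on and outside the boundaries.
df = pd.DataFrame({
    'colname1': [10, 15, 25, 20],
    'colname2': [30, 45, 35, 40],
})
cutoffs = [('colname1', 10, 20), ('colname2', 30, 40)]

out = col_cut(df, cutoffs)
print(out)
```

Rows 0 and 3 sit exactly on the limits and are retained, while rows 1 and 2 each fail one of the two ranges.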