KeyError: 'date' Pandas - pandas

```
import pandas as pd

if __name__ == "__main__":
    pd.options.display.float_format = '{:.4f}'.format
    temp1 = pd.read_csv('_4streams_alabama.csv.gz')
    temp1['date'] = pd.to_datetime(temp1['date'])

    def vacimpval(x):
        for date in x['date'].unique():
            if date >= '2022-06-16':
                x['vac_count'] = x['vac_count'].interpolate()
                x['vac_count'] = x['vac_count'].astype(int)

    for location in temp1['location_name'].unique():
        s = temp1.apply(vacimpval)
```
In the code above, I am trying to apply this function to every location so that I can fill in the missing values with the interpolate() method, but I keep getting a KeyError and I don't know why.

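Source of the error:
'date' is accessed in only two places in your code. Since, as you said, temp1.columns contains 'date', temp1['date'] works, so the failing access is x['date'] inside vacimpval.
By default DataFrame.apply calls the function once per column (axis=0), so x is a single column (a Series indexed by row label), not the DataFrame. Looking up 'date' in that row index raises KeyError: 'date'.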
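A minimal sketch of one way around this, assuming the goal is to interpolate vac_count within each location and to keep the filled values only for dates on or after 2022-06-16. The small DataFrame is made-up stand-in data, not the contents of the CSV from the question, and the groupby/transform approach is a suggested alternative rather than the asker's method.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV in the question.
temp1 = pd.DataFrame({
    'location_name': ['A'] * 4 + ['B'] * 4,
    'date': pd.to_datetime(
        ['2022-06-15', '2022-06-16', '2022-06-17', '2022-06-18'] * 2),
    'vac_count': [10, 20, np.nan, 40, 1, np.nan, np.nan, 4],
})

# Reproduce the problem: apply() hands each *column* to the function,
# so x['date'] is a lookup in the row index and fails.
caught = False
try:
    temp1.apply(lambda x: x['date'])
except KeyError:
    caught = True

# Alternative: interpolate inside each location with groupby/transform.
temp1 = temp1.sort_values(['location_name', 'date'])
filled = (temp1.groupby('location_name')['vac_count']
               .transform(lambda s: s.interpolate()))

# Assumption: only rows on or after the cutoff should receive filled values.
mask = temp1['date'] >= pd.Timestamp('2022-06-16')
temp1.loc[mask, 'vac_count'] = filled[mask]

# astype(int) is only safe once no NaN values remain in the column.
if temp1['vac_count'].notna().all():
    temp1['vac_count'] = temp1['vac_count'].astype(int)
```

Grouping by location_name replaces the explicit loop over locations, and transform returns a result aligned with the original rows, so it can be assigned back directly.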

Related

Pandas Data frame column condition check based on length of the value

I have a pandas DataFrame that is created by reading an Excel file. The Excel file has a column called Serial Number. I pass each serial number to another function, which connects to an API and fetches the result set for that serial number.
My code:
import sys
import pandas as pd
def create_excel(filename):
    try:
        # Left-pad every serial number with zeros to 32 characters while reading
        data = pd.read_excel(filename, usecols=[4,18,19,20,26,27,28],converters={'Serial Number': '{:0>32}'.format})
    except Exception as e:
        sys.exit("Error reading %s: %s" % (filename, e))
    # Assign the result back instead of calling fillna(inplace=True) on a column selection
    data["Subject Organization"] = data["Subject Organization"].fillna("N/A")
    df = data[data['Subject Organization'].str.contains("Fannie",case = False)]
    #df['Serial Number'].apply(lambda x: '000'+x if len(x) == 29 else '00'+x if len(x) == 30 else '0'+x if len(x) == 31 else x)
    print(df)
    df.to_excel(r'Data.xlsx',index= False)
    output = df['Serial Number'].apply(lambda x: fetch_by_ser_no(x))
    df2 = pd.DataFrame(output)
    df2.columns = ['Output']
    df5 = pd.concat([df,df2],axis = 1)
The problem I am facing: if the result returned by fetch_by_ser_no() is blank, I want to make the serial number 34 characters long by adding two more leading zeros and then call the function again.
How can I do this without creating multiple DataFrames?
Any help is appreciated!
Thanks
You can use a conditional expression (if ... else ...) inside the lambda to test whether the call returned anything:
output = df['Serial Number'].apply(lambda x: 'ok' if fetch_by_ser_no(x) else 'badly')
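Building on that idea, here is a rough sketch of a retry with a longer padding. The fetch_by_ser_no below is a hypothetical stub standing in for the real API call, the helper name fetch_with_fallback is made up for illustration, and it assumes a "blank" result is an empty (falsy) value.

```python
import pandas as pd

def fetch_by_ser_no(serial):
    """Hypothetical stub for the API call; returns '' when nothing is found."""
    known = {'5'.zfill(32): 'record-5', '7'.zfill(34): 'record-7'}
    return known.get(serial, '')

def fetch_with_fallback(serial, widths=(32, 34)):
    """Try each zero-padded width in turn and return the first non-blank result."""
    for width in widths:
        result = fetch_by_ser_no(str(serial).zfill(width))
        if result:
            return result
    return ''

df = pd.DataFrame({'Serial Number': ['5', '7', '9']})
# Assigning a new column avoids building df2 and concatenating into df5.
df['Output'] = df['Serial Number'].apply(fetch_with_fallback)
```

Because the result is written straight into a new column, no intermediate DataFrames are needed.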

Return KDB query to a pandas dataframe

I would like to extract data from a KDB database and place into a dataframe. My query runs fine in qpad, no issues; just need to write it into my Pandas dataframe. My code:
from qpython import qconnection
# Create the connection and save the handle to a variable
q = qconnection.QConnection(host = 'wokplpaxvj003', port = 11503, username = 'pelucas', password = 'Dive2600', timeout = 3.0)
try:
    # initialize connection
    q.open()
    print(q)
    print('IPC version: %s. Is connected: %s' % (q.protocol_version, q.is_connected()))
    df = q.sendSync('{select from quote_flat where date within (2019.08.14;2019.08.14), amendment_no = (max;amendment_no)fby quote_id}')
    df.info()
finally:
    q.close()
It fails on the df.info() raising AttributeError: 'QLambda' object has no attribute 'info' so I guess the call is not successful.
It looks like you've sent only a lambda but with no instruction to execute that lambda. Two options:
Don't make it a lambda
df = q.sendSync('select from quote_flat where date within (2019.08.14;2019.08.14), amendment_no = (max;amendment_no)fby quote_id')
Execute the lambda
df = q.sendSync('{select from quote_flat where date within (2019.08.14;2019.08.14), amendment_no = (max;amendment_no)fby quote_id}[]')
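The second option can be wrapped in a small helper so that query text written as a lambda is always invoked. This is only an illustrative sketch: as_executable is a made-up name, not part of qpython, and it simply checks whether the text is wrapped in braces.

```python
def as_executable(query: str) -> str:
    """Return q text that runs when sent; a bare lambda gets a trailing [] call."""
    text = query.strip()
    if text.startswith('{') and text.endswith('}'):
        return text + '[]'
    return text

# Hypothetical table name, used only to show the transformation.
wrapped = as_executable('{select from quote_flat}')
plain = as_executable('select from quote_flat')
```

The returned string would then be passed to q.sendSync. If the goal is a DataFrame rather than qpython's own table type, qpython's documentation also describes a pandas=True option on QConnection; check that against your installed version.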

How to get dates along with the functions I perform?

My initial data frame is like this:
import pandas as pd
df = pd.DataFrame({'serialNo':['aaaa','aaaa','cccc','ffff'],
                   'Date':['2018-09-15','2018-09-16','2018-09-15','2018-09-19'],
                   'moduleLocation': ['face','head','stomach','legs'],
                   'moduleName': ['singing', 'dance','booze', 'vocals'],
                   'warning': [4402, 3747 ,5555,8754],
                   'failed':[0,3462,5161,3262]})
I have performed the following steps to clean up the data. The first converts every column to string:
all_columns = list(df)
df[all_columns] = df[all_columns].astype(str)
This is followed by the function to perform certain concatenations:
def concatenate(diagnostics, field, target):
    diagnostics.sort_values(by=['serialNo',field],inplace=True)
    diagnostics.drop_duplicates(inplace=True)
    diagnostics[target] = \
        diagnostics.groupby(['serialNo'], as_index=False)[field].transform(lambda s: ','.join(filter(None, s)))
    diagnostics.drop([field],axis=1,inplace=True)
    diagnostics.drop_duplicates(inplace=True)
    return diagnostics
module = concatenate(df[['serialNo','moduleName']], 'moduleName', 'Module')
Warn = concatenate(df[['serialNo','warning']], 'warning', 'Warn')
Err = concatenate(df[['serialNo','failed']], 'failed', 'Err')
Location = concatenate(df[['serialNo','moduleLocation']], 'moduleLocation', 'Location')
diag_final = pd.merge(module,Warn,on=['serialNo'],how='inner')
diag_final = pd.merge(diag_final,Err,on=['serialNo'],how='inner')
diag_final = pd.merge(diag_final,Location,on=['serialNo'],how='inner')
Now the problem is the Date column no longer exists in my diag_final data frame and I would like to have it. I do not want to make changes to the existing function but just make sure that I have the corresponding Dates. How should I achieve this?
There are likely to be multiple dates for each serial number. Hence, you will have to concatenate the values, similar to what you are doing for moduleLocation and moduleName:
dates = concatenate(df[['serialNo','Date']], 'Date', 'Date_cat')
diag_final = pd.merge(diag_final,dates,on=['serialNo'],how='inner')
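For comparison, the same per-serial-number concatenation can be sketched with groupby and agg alone. This is an alternative to the concatenate helper, not a change to it. The diag_final frame here is a reduced, hand-written stand-in with only a Module column.

```python
import pandas as pd

df = pd.DataFrame({
    'serialNo': ['aaaa', 'aaaa', 'cccc', 'ffff'],
    'Date': ['2018-09-15', '2018-09-16', '2018-09-15', '2018-09-19'],
})

# One comma-separated string of dates per serial number.
dates = (
    df[['serialNo', 'Date']]
    .drop_duplicates()
    .sort_values(['serialNo', 'Date'])
    .groupby('serialNo')['Date']
    .agg(','.join)
    .rename('Date_cat')
    .reset_index()
)

# Reduced stand-in for the merged result built in the question.
diag_final = pd.DataFrame({
    'serialNo': ['aaaa', 'cccc', 'ffff'],
    'Module': ['dance,singing', 'booze', 'vocals'],
})
diag_final = diag_final.merge(dates, on='serialNo', how='inner')
```

Either way, the result has one row per serialNo, so the inner merge does not multiply rows.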

Python Dash plotly update table

I have a pandas DataFrame called interesttable, which is updated over time (every few seconds). I am using Plotly Dash to display the DataFrame. Although I can display the DataFrame in Dash, I cannot get the page to update when new rows are added to it. I tried the following, but it doesn't work. Thank you for your feedback!
def generate_table():
    return html.Table(
        # Header
        [html.Tr([html.Th(col) for col in interesttable.columns])] +
        # Body
        [html.Tr([html.Td(interesttable.iloc[i][col]) for col in interesttable.columns])
         for i in range(min(len(interesttable), 50))]
    )
app = dash.Dash()
app.layout = html.Div(children=[
    html.H1(children='Interest Table'),
    dcc.Interval(id='generate_table()',interval=1*1000),
    generate_table()
])
app.callback(Output('generate_table()','children'), [Input('interesttable', 'n_intervals')])
if __name__ == '__main__':
    app.run_server(debug=True)
Unfortunately, the error I receive is:
Here is a list of the available properties in "generate_table()":
['id','interval','disabled','n_intervals','max_intervals']
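Isn't app.callback supposed to be a decorator?
Try placing it directly above the function that returns the updated content:
@app.callback(Output('generate_table()','children'), [Input('interesttable', 'n_intervals')])
The error message also points at the IDs: 'generate_table()' is the id of the dcc.Interval, which has no 'children' property, and no component has the id 'interesttable'. The Output should target a container component, and the Input should reference the Interval's id.
I also don't see n_intervals declared.
You should add it:
app.layout = html.Div(children=[html.H1(children='Interest Table'),
    dcc.Interval(id='generate_table()',interval=1000,n_intervals=0 ),generate_table()])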

Applying different functions to different columns of grouped dataframe

I am new to Pandas. I have grouped a dataframe by date and applied a function to different columns of the dataframe as shown below
def func(x):
    questionID = x['questionID'].size()
    is_true = x['is_bounty'].sum()
    is_closed = x['is_closed'].sum()
    flag = True
    return pd.Series([questionID, is_true, is_closed, flag], index=['questionID', 'is_true', 'is_closed', 'flag'])
df_grouped = df1.groupby(['date'], as_index = False)
df_grouped = df_grouped.apply(func)
But when I run this I get an error saying
questionID = x['questionID'].size()
TypeError: 'int' object is not callable.
When I do the same thing this way it doesn't give any error.
df_grouped1 = df_grouped['questionID'].size()
I don't understand where I am going wrong.
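'int' object is not callable means that size is already an integer, so you have to use it without ():
x['questionID'].size
Inside func, x['questionID'] is a Series, and Series.size is a property holding the number of elements. On a GroupBy object, size() is a method, which is why df_grouped['questionID'].size() works.
The same property-versus-method difference applies to other names as well.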
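A short sketch of the difference, using made-up data with the column names from the question. The named-aggregation call at the end is an alternative way to get the per-date summary without apply; the flag column is added afterwards as a constant.

```python
import pandas as pd

# Made-up data with the question's column names.
df1 = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'questionID': [1, 2, 3],
    'is_bounty': [True, False, True],
    'is_closed': [False, False, True],
})

# Series.size is a property: no parentheses.
n_rows = df1['questionID'].size

# GroupBy.size is a method: parentheses required.
per_group = df1.groupby('date')['questionID'].size()

# Alternative to apply(func): named aggregation, one output column per tuple.
summary = df1.groupby('date').agg(
    questionID=('questionID', 'size'),
    is_true=('is_bounty', 'sum'),
    is_closed=('is_closed', 'sum'),
)
summary['flag'] = True
```

Named aggregation keeps each column's function explicit and avoids building a Series by hand for every group.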