How to get dates along with the functions I perform?

My initial data frame is like this:
import pandas as pd
df = pd.DataFrame({'serialNo':['aaaa','aaaa','cccc','ffff'],
'Date':['2018-09-15','2018-09-16','2018-09-15','2018-09-19'],
'moduleLocation': ['face','head','stomach','legs'],
'moduleName': ['singing', 'dance','booze', 'vocals'],
'warning': [4402, 3747 ,5555,8754],
'failed':[0,3462,5161,3262]})
I have applied the following steps to clean up the data. The first converts every column to string:
all_columns = list(df)
df[all_columns] = df[all_columns].astype(str)
This is followed by the function to perform certain concatenations:
def concatenate(diagnostics, field, target):
    diagnostics.sort_values(by=['serialNo',field],inplace=True)
    diagnostics.drop_duplicates(inplace=True)
    diagnostics[target] = \
        diagnostics.groupby(['serialNo'], as_index=False)[field].transform(lambda s: ','.join(filter(None, s)))
    diagnostics.drop([field],axis=1,inplace=True)
    diagnostics.drop_duplicates(inplace=True)
    return diagnostics
module = concatenate(df[['serialNo','moduleName']], 'moduleName', 'Module')
Warn = concatenate(df[['serialNo','warning']], 'warning', 'Warn')
Err = concatenate(df[['serialNo','failed']], 'failed', 'Err')
Location = concatenate(df[['serialNo','moduleLocation']], 'moduleLocation', 'Location')
diag_final = pd.merge(module,Warn,on=['serialNo'],how='inner')
diag_final = pd.merge(diag_final,Err,on=['serialNo'],how='inner')
diag_final = pd.merge(diag_final,Location,on=['serialNo'],how='inner')
Now the problem is the Date column no longer exists in my diag_final data frame and I would like to have it. I do not want to make changes to the existing function but just make sure that I have the corresponding Dates. How should I achieve this?

There are likely to be multiple dates for each serial number. Hence, you will have to concatenate the values, similar to what you are doing for moduleLocation and moduleName.
dates = concatenate(df[['serialNo','Date']], 'Date', 'Date_cat')
diag_final = pd.merge(diag_final,dates,on=['serialNo'],how='inner')
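For comparison, here is a minimal sketch of the same idea without going through concatenate: it joins the dates per serial number with a plain groupby. It uses only the serialNo and Date columns from the question; the Date_cat name follows the answer above, and the rest is an illustration rather than the asker's actual pipeline.

```python
import pandas as pd

df = pd.DataFrame({
    'serialNo': ['aaaa', 'aaaa', 'cccc', 'ffff'],
    'Date': ['2018-09-15', '2018-09-16', '2018-09-15', '2018-09-19'],
})

# One row per serialNo, with its sorted, de-duplicated dates joined by commas.
dates = (
    df[['serialNo', 'Date']]
    .drop_duplicates()
    .sort_values(['serialNo', 'Date'])
    .groupby('serialNo')['Date']
    .agg(','.join)
    .reset_index(name='Date_cat')
)

print(dates)
```

The resulting frame has one row per serialNo, so it can be merged into diag_final on serialNo in the same way as the other pieces.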

Related

KeyError: 'date' Pandas

```
if __name__ == "__main__":
    pd.options.display.float_format = '{:.4f}'.format
    temp1 = pd.read_csv('_4streams_alabama.csv.gz')
    temp1['date'] = pd.to_datetime(temp1['date'])

    def vacimpval(x):
        for date in x['date'].unique():
            if date >= '2022-06-16':
                x['vac_count'] = x['vac_count'].interpolate()
                x['vac_count'] = x['vac_count'].astype(int)

    for location in temp1['location_name'].unique():
        s = temp1.apply(vacimpval)
```
In the code above, I am trying to apply this function to every location so that I can fill in the missing values with the interpolate() method, but I don't know why I keep getting a KeyError.

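Source of the error:
Since there are only two places in your code where you access 'date', and, as you said, temp1.columns contains 'date', the problem is in x['date']. By default, DataFrame.apply passes each column to the function as a Series, so x is a single column indexed by row labels, and x['date'] looks for a row labelled 'date' that does not exist.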
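One way to get per-location interpolation is to group by location and transform the vac_count column, instead of calling apply on the whole frame. The sketch below assumes the column names from the question (location_name, date, vac_count); the data is a made-up stand-in for the CSV, and the date cutoff from the question is left out for brevity.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the CSV in the question; the values are invented.
temp1 = pd.DataFrame({
    'location_name': ['A', 'A', 'A', 'B', 'B', 'B'],
    'date': pd.to_datetime(['2022-06-15', '2022-06-16', '2022-06-17'] * 2),
    'vac_count': [10.0, np.nan, 30.0, 1.0, np.nan, 5.0],
})

# Sort so that interpolation runs in date order within each location.
temp1 = temp1.sort_values(['location_name', 'date'])

# transform hands each location's vac_count to the lambda as its own Series,
# so gaps are filled using only values from the same location.
temp1['vac_count'] = (
    temp1.groupby('location_name')['vac_count']
    .transform(lambda s: s.interpolate())
)

print(temp1)
```

Because the function receives one group's values at a time, there is no need to loop over the unique locations by hand.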

Loop over Pandas dataframe to populate list (Python)

I have the following dataframe:
import pandas as pd
action = ['include','exclude','ignore','include', 'exclude', 'exclude','ignore']
names = ['john','michael','joshua','peter','jackson','john', 'erick']
df = pd.DataFrame(list(zip(action,names)), columns = ['action','names'])
I also have a list of starting participants like this:
participants = [['michael','jackson','jeremiah','martin','luis']]
I want to iterate over df['action']. If df['action'] == 'include', add another list to the participants list that includes all previous names and the one in df['names']. So, after the first iteration, participants list should look like this:
participants = [['michael','jackson','jeremiah','martin','luis'],['michael','jackson','jeremiah','martin','luis','john']]
I have managed to achieve this with the following code (I don't know if this part could be improved, although that is not my question):
for i, row in df.iterrows():
    if df.at[i,'action'] == 'include':
        person = [df.at[i,'names']]
        old_list = participants[-1]
        new_list = old_list + person
        participants.append(new_list)
    else:
        pass
The main problem (and my question) is: how do I accomplish the same thing but remove the name when df['action'] == 'exclude'? After the second iteration, I should have this list in participants:
participants = [['michael','jackson','jeremiah','martin','luis'],['michael','jackson','jeremiah','martin','luis','john'],['jackson','jeremiah','martin','luis','john']]

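You can just add an elif to your code. With the remove method you can remove an item by value. Be careful: your person is a list and not a string, so I access the name by index with [0]. Also note that remove changes the list in place and returns None, so copy the previous list first and append the copy:
    elif df.at[i, 'action'] == 'exclude':
        person = [df.at[i, 'names']]
        new_list = participants[-1].copy()
        new_list.remove(person[0])
        participants.append(new_list)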
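Here is a self-contained sketch of the whole loop, handling both 'include' and 'exclude' with the data from the question. It iterates with zip instead of iterrows, which is a stylistic choice and not part of the original answer, and it always works on a copy of the previous list so that earlier snapshots stay intact.

```python
import pandas as pd

action = ['include', 'exclude', 'ignore', 'include', 'exclude', 'exclude', 'ignore']
names = ['john', 'michael', 'joshua', 'peter', 'jackson', 'john', 'erick']
df = pd.DataFrame(list(zip(action, names)), columns=['action', 'names'])

participants = [['michael', 'jackson', 'jeremiah', 'martin', 'luis']]

for act, name in zip(df['action'], df['names']):
    if act == 'include':
        # Concatenation builds a new list, leaving the previous one untouched.
        participants.append(participants[-1] + [name])
    elif act == 'exclude':
        # list.remove mutates in place and returns None, so copy first.
        new_list = participants[-1].copy()
        new_list.remove(name)  # raises ValueError if the name is not present
        participants.append(new_list)
    # Any other action (for example 'ignore') adds nothing.

for snapshot in participants:
    print(snapshot)
```

If a name might be excluded without ever having been included, a list comprehension such as [n for n in participants[-1] if n != name] would avoid the ValueError.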

Set Multiple Restrictions for Rows Called to Print in Pandas

import pandas as pd
import numpy as np
#load data
#data file and py file must be in same file path
df = pd.read_csv('cbp15st.txt', delimiter = ',', encoding = 'utf-8-sig')
#define load data DataFrame columns
state = df['FIPSTATE']
industry = df['NAICS']
legal_form_of_organization = df['LFO']
suppression_flag = df['EMPFLAG']
total_establishment = df['EST']
establishment_1_4 = df['N1_4']
establishment_5_9 = df['N5_9']
establishment_10_19 = df['N10_19']
establishment_20_49 = df['N20_49']
establishment_50_99 = df['N50_99']
establishment_100_249 = df['N100_249']
establishment_250_499 = df['N250_499']
establishment_500_999 = df['N500_999']
establishment_1000_more = df['N1000']
#use df.loc to parse dataset for particular value types
print(df.loc[df['EMPFLAG']=='A'], df.loc[df['FIPSTATE']==1],
df.loc[df['NAICS']=='------'])
I am currently using df.loc to locate specific values in the df columns, but this prints the rows that match each condition separately, not only the rows that match all of them (an "or" rather than an "and").
I am trying to find a way to apply multiple restrictions at once so that I only get the rows that meet criteria x, y and z.

You can use the & operator to combine multiple filtering criteria, wrapping each condition in parentheses, something like:
df1 = df.loc[(df['EMPFLAG']=='A') & (df['FIPSTATE']==1) & (df['NAICS']=='------')]
print(df1)
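To make the difference between "and" and "or" concrete, here is a small sketch on invented data. The column names come from the question; the rows are made up and do not reflect the real cbp15st.txt file.

```python
import pandas as pd

# Invented rows that reuse the column names from the question.
df = pd.DataFrame({
    'FIPSTATE': [1, 1, 2, 1, 2],
    'NAICS': ['------', '11----', '------', '------', '11----'],
    'EMPFLAG': ['A', 'A', 'A', 'B', 'B'],
    'EST': [10, 20, 30, 40, 50],
})

# Each comparison yields a boolean Series; & keeps rows where all are True.
# The parentheses are required because & binds more tightly than ==.
mask_all = (df['EMPFLAG'] == 'A') & (df['FIPSTATE'] == 1) & (df['NAICS'] == '------')
matches_all = df.loc[mask_all]

# | keeps rows where at least one condition is True.
mask_any = (df['EMPFLAG'] == 'A') | (df['FIPSTATE'] == 1)
matches_any = df.loc[mask_any]

print(matches_all)
print(matches_any)
```

Only the first row satisfies all three conditions, while four of the five rows satisfy at least one of the two conditions in the "or" mask.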

Populating data to individual columns in pandas dataframe

I am trying to get the data from the list (list_addresses) and populate it to different columns of the dataframe (dfloc). I use the below code, not sure where I am going wrong.
Values are present in list_addresses but not getting populated to the dataframe.
Any help would be appreciated.
for index in range(len(list_addresses)):
    location = geolocator.reverse([list_addresses[index][0],list_addresses[index][1]])
    dfloc.loc[dfloc.Latitude] = list_addresses[index][0]
    dfloc.loc[dfloc.Longitude] = list_addresses[index][1]
    dfloc.loc[dfloc.Address] = location.address

So it looks like you have a list of lists or tuples of the form [(Lat1,Lon1),(Lat2,Lon2), etc...]. I like to make a list for each column, then assign the entire column at once:
lat_list = [x[0] for x in list_addresses]
lon_list = [x[1] for x in list_addresses]
address_list = []
for index in range(len(list_addresses)):
    location = geolocator.reverse([list_addresses[index][0],list_addresses[index][1]])
    address_list.append(location.address)
dfloc['Latitude'] = lat_list
dfloc['Longitude'] = lon_list
dfloc['Address'] = address_list
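The same column-at-a-time idea can be sketched without a network call. In the example below, lookup_address is a hypothetical placeholder standing in for geolocator.reverse(...).address, and the coordinates are sample values, so treat it as an illustration of the assignment pattern only.

```python
import pandas as pd

# Sample coordinates; the question's real list_addresses is not shown.
list_addresses = [(40.7128, -74.0060), (51.5074, -0.1278)]


def lookup_address(lat, lon):
    # Hypothetical placeholder for geolocator.reverse([lat, lon]).address.
    return f"address near ({lat}, {lon})"


# Build the coordinate columns in one step from the list of pairs.
dfloc = pd.DataFrame(list_addresses, columns=['Latitude', 'Longitude'])

# Assign the whole Address column at once, one entry per coordinate pair.
dfloc['Address'] = [lookup_address(lat, lon) for lat, lon in list_addresses]

print(dfloc)
```

Assigning a full list to a column requires the list length to match the number of rows in the frame, which is why the frame here is built from the same list.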

Applying different functions to different columns of grouped dataframe

I am new to Pandas. I have grouped a dataframe by date and applied a function to different columns of the dataframe as shown below
def func(x):
    questionID = x['questionID'].size()
    is_true = x['is_bounty'].sum()
    is_closed = x['is_closed'].sum()
    flag = True
    return pd.Series([questionID, is_true, is_closed, flag], index=['questionID', 'is_true', 'is_closed', 'flag'])
df_grouped = df1.groupby(['date'], as_index = False)
df_grouped = df_grouped.apply(func)
But when I run this I get an error saying
questionID = x['questionID'].size()
TypeError: 'int' object is not callable.
When I do the same thing this way it doesn't give any error.
df_grouped1 = df_grouped['questionID'].size()
I don't understand where I am going wrong.

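"'int' object is not callable" means you have to use size without ():
x['questionID'].size
On a Series, size is a property that holds an integer, so adding () tries to call that integer. On a groupby object, as in df_grouped['questionID'].size(), size is a method and needs the ().
The same distinction applies to other attributes and methods.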
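The sketch below shows both forms of size side by side on invented data, and then computes the same per-date summary with named aggregation instead of apply. The column names follow the question; the rows and the switch to agg are assumptions made for illustration.

```python
import pandas as pd

# Invented rows that reuse the column names from the question.
df1 = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-02'],
    'questionID': [1, 2, 3],
    'is_bounty': [True, False, True],
    'is_closed': [False, False, True],
})

# On a Series, size is a property: no parentheses.
n_rows = df1['questionID'].size

# On a groupby object, size is a method: parentheses required.
per_date = df1.groupby('date')['questionID'].size()

# Named aggregation produces one output column per (input column, function) pair.
summary = df1.groupby('date', as_index=False).agg(
    questionID=('questionID', 'size'),
    is_true=('is_bounty', 'sum'),
    is_closed=('is_closed', 'sum'),
)
summary['flag'] = True

print(summary)
```

Named aggregation avoids writing a custom function at all, so the property-versus-method question does not come up inside it.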
