Populating data to individual columns in pandas dataframe - pandas

I am trying to take the data from the list (list_addresses) and populate it into different columns of the dataframe (dfloc). I use the code below, but I am not sure where I am going wrong.
Values are present in list_addresses but are not getting populated into the dataframe.
Any help would be appreciated.
for index in range(len(list_addresses)):
    location = geolocator.reverse([list_addresses[index][0], list_addresses[index][1]])
    dfloc.loc[dfloc.Latitude] = list_addresses[index][0]
    dfloc.loc[dfloc.Longitude] = list_addresses[index][1]
    dfloc.loc[dfloc.Address] = location.address

So it looks like you have a list of lists or tuples of the form [(Lat1,Lon1),(Lat2,Lon2), ...]. I like to make a list for each column, then assign the entire column at once:
lat_list = [x[0] for x in list_addresses]
lon_list = [x[1] for x in list_addresses]
address_list = []
for index in range(len(list_addresses)):
    location = geolocator.reverse([list_addresses[index][0], list_addresses[index][1]])
    address_list.append(location.address)
dfloc['Latitude'] = lat_list
dfloc['Longitude'] = lon_list
dfloc['Address'] = address_list
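If every entry of list_addresses is exactly a (lat, lon) pair and dfloc already has one row per entry, the two list comprehensions can also be collapsed with zip. A minimal sketch under those assumptions, reusing the same geolocator object:
# Unpack latitudes and longitudes in one pass, then reverse-geocode each pair.
lat_list, lon_list = map(list, zip(*list_addresses))
address_list = [geolocator.reverse([lat, lon]).address for lat, lon in zip(lat_list, lon_list)]
dfloc['Latitude'] = lat_list
dfloc['Longitude'] = lon_list
dfloc['Address'] = address_list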

Related

Loop over Pandas dataframe to populate list (Python)

I have the following dataframe:
import pandas as pd
action = ['include','exclude','ignore','include', 'exclude', 'exclude','ignore']
names = ['john','michael','joshua','peter','jackson','john', 'erick']
df = pd.DataFrame(list(zip(action,names)), columns = ['action','names'])
I also have a list of starting participants like this:
participants = [['michael','jackson','jeremiah','martin','luis']]
I want to iterate over df['action']. If df['action'] == 'include', add another list to the participants list that includes all previous names and the one in df['names']. So, after the first iteration, participants list should look like this:
participants = [['michael','jackson','jeremiah','martin','luis'],['michael','jackson','jeremiah','martin','luis','john']]
I have managed to achieve this with the following code (I don't know if this part could be improved, although it is not my question):
for i, row in df.iterrows():
    if df.at[i,'action'] == 'include':
        person = [df.at[i,'names']]
        old_list = participants[-1]
        new_list = old_list + person
        participants.append(new_list)
    else:
        pass
The main problem (and my question is), how do I accomplish the same but removing the name when df['action'] == 'exclude'? So, after the second iteration, I should have this list in participants:
participants = [['michael','jackson','jeremiah','martin','luis'],['michael','jackson','jeremiah','martin','luis','john'],['jackson','jeremiah','martin','luis','john']]
You can just add an elif to your code. With the remove method you can remove an item by value. Just be careful: your person is a list and not a string, so I access it by index with [0]. Also note that remove mutates the list in place and returns None, so copy the previous list before removing from it:
elif df.at[i, 'action'] == 'exclude':
    person = [df.at[i, 'names']]
    new_list = participants[-1].copy()   # copy so the previous snapshot stays intact
    new_list.remove(person[0])           # remove by value; remove() itself returns None
    participants.append(new_list)
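Putting both branches together, the whole loop could look like this (a sketch of the approach above; rows with action 'ignore' simply fall through):
for i, row in df.iterrows():
    name = df.at[i, 'names']
    if df.at[i, 'action'] == 'include':
        participants.append(participants[-1] + [name])   # snapshot with the new name added
    elif df.at[i, 'action'] == 'exclude':
        new_list = participants[-1].copy()
        new_list.remove(name)                            # drop the name by value
        participants.append(new_list)
    # action == 'ignore' leaves the participants list unchanged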

Pandas to SUM of list items in a DataFrame

Below I have a list bu_lst, and for each name in it a list of columns that I pass to the dataframe df2 to sum the individual items. How could I achieve that in one go, so I do not repeat the same call multiple times:
bu_lst = ['FPG','IPG','DSG','STG','WFO','IT']
FPG = ['ADE','FPG AE','FPG PE','MMSIM','OrFAD','Tirtuoso DashBoard','SPB AE','SPB PE']
IPG = ['DDR','DDR_DT','Tensilica']
DSG = ['FLA','FLS','FEQoS','IFD PT','Sasus R&D','sasus PE','Toltus','Tempus','Quantus','Genus']
STG = ['ATS','HST','TIP','System Engineering']
WFO = ['AFademiF Network','FRAFT','Fhip Estimate','EduFation SerTiFes','LiFensing','Sales','SerTiFes','TFAD']
IT = ['App Development','Fumulus','InfoSeF']
My current approach:
print(df2[FPG].sum())
print(df2[IPG].sum())
print(df2[DSG].sum())
print(df2[STG].sum())
print(df2[WFO].sum())
print(df2[IT].sum())
I have just taken the relevant lines of code to show here.
You can create a dictionary of lists and then use a dictionary comprehension, assuming the lists contain column names:
d = {'bu_lst':bu_lst, 'FPG': FPG, ...}
d2 = {k: df2[v].sum() for k, v in d.items()}
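A minimal sketch of that idea, written out explicitly and assuming every name in the group lists above is a column of df2; the second comprehension collapses each group to a single total:
import pandas as pd

# Map each business-unit name to its list of columns.
d = {'FPG': FPG, 'IPG': IPG, 'DSG': DSG, 'STG': STG, 'WFO': WFO, 'IT': IT}
d2 = {k: df2[v].sum() for k, v in d.items()}                             # per-column sums for each group
totals = pd.Series({k: s.sum() for k, s in d2.items()}, name='total')    # one number per group
print(totals)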

Looping through all elements of a pandas dataframe?

I am trying to loop through a dataframe that is 4x3 with the following code:
R = []
Ratio = pd.DataFrame(R)
for column in Three_Distances:
    column_location = Three_Distances.columns.get_loc(column)
    for rows, row in Three_Distances.iterrows():
        if column == 'post_one':
            Ratio[column_location, rows] = 1/Three_Distances[column].divide(D1_theoretical)
        if column == 'post_two':
            Ratio[column_location, rows] = 1/Three_Distances[column].divide(D2_theoretical)
        if column == 'post_three':
            Ratio[column_location, rows] = 1/Three_Distances[column].divide(D3_theoretical)
My output for the variable Ratio is 4x12 instead of 4x3. Am I not properly understanding the tools or loops that I am using?
Thanks
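One way to keep the result 4x3 is to build each column in one step instead of assigning element by element. A minimal sketch, assuming Three_Distances has exactly the columns 'post_one', 'post_two' and 'post_three' and the D*_theoretical values are scalars:
import pandas as pd

# Map each column to its theoretical distance, then divide column-wise.
theoretical = {'post_one': D1_theoretical,
               'post_two': D2_theoretical,
               'post_three': D3_theoretical}
# 1 / (distance / theoretical) is the same as theoretical / distance.
Ratio = pd.DataFrame({col: t / Three_Distances[col] for col, t in theoretical.items()})
# Ratio keeps the original 4x3 shape, one ratio per cell.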

Set Multiple Restrictions for Rows Called to Print in Pandas

import pandas as pd
import numpy as np
#load data
#data file and py file must be in same file path
df = pd.read_csv('cbp15st.txt', delimiter = ',', encoding = 'utf-8-sig')
#define load data DataFrame columns
state = df['FIPSTATE']
industry = df['NAICS']
legal_form_of_organization = df['LFO']
suppression_flag = df['EMPFLAG']
total_establishment = df['EST']
establishment_1_4 = df['N1_4']
establishment_5_9 = df['N5_9']
establishment_10_19 = df['N10_19']
establishment_20_49 = df['N20_49']
establishment_50_99 = df['N50_99']
establishment_100_249 = df['N100_249']
establishment_250_499 = df['N250_499']
establishment_500_999 = df['N500_999']
establishment_1000_more = df['N1000']
#use df.loc to parse the dataset for particular value types
print(df.loc[df['EMPFLAG']=='A'], df.loc[df['FIPSTATE']==1],
      df.loc[df['NAICS']=='------'])
Currently I am using df.loc to locate specific values from the df columns, but it prints the rows that satisfy each condition separately rather than only the rows that satisfy all of them (like an OR versus an AND statement).
I am trying to find a way to place multiple restrictions on this so that I only get rows that meet criteria x, y, and z.
You can use the & operator to combine multiple filtering criteria, with each condition wrapped in parentheses, something like:
df1 = df.loc[(df['EMPFLAG']=='A') & (df['FIPSTATE']==1) & (df['NAICS']=='------')]
print(df1)
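The same filter can also be written with DataFrame.query, which some people find easier to read (equivalent to the .loc version above, assuming those column names exist in df):
df1 = df.query("EMPFLAG == 'A' and FIPSTATE == 1 and NAICS == '------'")
print(df1)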

data type with both alphabets and integers in pandas data frame

I have a data frame with the column 'Item'. When I run this query:
df = pd.DataFrame({'SKUIDs': fullFrame.Item})
bySKU = pd.DataFrame(df.groupby(['SKUIDs']).size(),columns = ['Lines'])
the results are like this:
How can I get this:
fullFrame.Item.str.strip().value_counts().to_frame('Lines')
If you have whitespace in the values of the SKUIDs column, strip it and re-aggregate:
bySKU = bySKU.reset_index()                      # SKUIDs moves from the index back to a column
bySKU['SKUIDs'] = bySKU['SKUIDs'].str.strip()
bySKU = bySKU.groupby('SKUIDs', as_index=False).agg({'Lines': 'sum'})
You can also proceed as follows:
df = pd.DataFrame({'SKUIDs': fullFrame.Item})
df['SKUIDs'] = df['SKUIDs'].str.strip()          # strip before grouping so trimmed duplicates collapse
bySKU = pd.DataFrame(df.groupby(['SKUIDs']).size(), columns=['Lines'])
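Putting it together, a minimal sketch of the value_counts approach (assuming fullFrame.Item holds the SKU strings):
bySKU = (fullFrame['Item']
         .str.strip()              # normalise whitespace so duplicate SKUs collapse
         .value_counts()           # count occurrences of each SKU
         .to_frame('Lines'))       # put the counts in a column named 'Lines'
bySKU.index.name = 'SKUIDs'
print(bySKU)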