Looping through all elements of a pandas dataframe? - pandas

I am trying to loop through a dataframe that is 4x3 with the following code:
R = []
Ratio = pd.DataFrame(R)
for column in Three_Distances:
    column_location = Three_Distances.columns.get_loc(column)
    for rows, row in Three_Distances.iterrows():
        if column == 'post_one':
            Ratio[column_location,rows] = 1/Three_Distances[column].divide(D1_theoretical)
        if column == 'post_two':
            Ratio[column_location,rows] = 1/Three_Distances[column].divide(D2_theoretical)
        if column == 'post_three':
            Ratio[column_location,rows] = 1/Three_Distances[column].divide(D3_theoretical)
My output for variable Ratio is 4x12 instead of 4x3. I am wondering if I am not properly understanding the tools or loops that I am using?
Thanks
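The 4x12 shape most likely comes from Ratio[column_location,rows] = ...: two values inside the square brackets are treated as one tuple column label, and the right-hand side is the whole 4-element column rather than a single cell, so each of the 3 x 4 loop iterations adds a new 4-row column. A minimal loop-free sketch, assuming the three theoretical distances are plain scalars (the sample numbers and the theoretical Series are made up for illustration):

```python
import pandas as pd

# Hypothetical 4x3 input; the values are invented for the example.
Three_Distances = pd.DataFrame({
    "post_one": [1.0, 2.0, 4.0, 5.0],
    "post_two": [2.0, 4.0, 8.0, 10.0],
    "post_three": [1.0, 2.0, 5.0, 10.0],
})

# One theoretical distance per column, keyed by column name (assumed scalars).
theoretical = pd.Series({"post_one": 2.0, "post_two": 4.0, "post_three": 10.0})

# Divide every column by its own theoretical value, then invert.
# axis="columns" aligns the Series labels with the column names.
Ratio = 1 / Three_Distances.divide(theoretical, axis="columns")

print(Ratio.shape)
```

Because the division is aligned on column names, the result keeps the 4x3 shape and the original column labels.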

Related

KeyError on existing columns in my pandas dataframe after transposing, with dtype='object'

I am having trouble doing basic arithmetic with columns in pandas. After transposing, the dtype of my columns is 'object'. Because of this, I get a KeyError when trying to add one column to another.
After transposing, I added the line:
dg.columns = dg.columns.astype(str)
The columns don't seem to react to it. Does anyone know how to solve this?
My full code:
dg = pd.read_csv("file.csv", encoding="latin-1", header = None)
dg = dg.T
dg.columns= dg.iloc[0]
dg = dg.reindex(dg.index.drop(0))
dg.index.name = 'Date'
dg = dg.fillna(0)
dg.drop(dg.columns.difference(['Category','revenue','Result']), 1, inplace=True)
dg.columns = dg.columns.astype(str)
print (dg.columns)
dg['revenue','Result'] = pd.to_numeric(dg['revenue','Result'], errors='coerce')
dg['cost'] = dg['revenue']* - dg['Result']
dg = dg.groupby('Category','revenue','Result','cost').agg(sum).reset_index()
print (dg[:5])
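A likely cause of the KeyError: dg['revenue','Result'] asks for a single column whose label is the tuple ('revenue', 'Result'). Selecting several columns needs a list, dg[['revenue', 'Result']], and pd.to_numeric works on one column at a time. Likewise, groupby takes its keys as one list rather than as separate arguments. A small sketch under those assumptions; the in-memory CSV stands in for file.csv with invented contents, and grouping by Category alone is a guess at the intent:

```python
import io
import pandas as pd

# Invented stand-in for file.csv: one record per column, labels in column 0.
raw = io.StringIO(
    "Date,2020-01,2020-02,2020-03\n"
    "Category,A,B,A\n"
    "revenue,10,20,30\n"
    "Result,1,2,3\n"
)

dg = pd.read_csv(raw, header=None).T      # transpose: one record per row
dg.columns = dg.iloc[0].astype(str)       # first row holds the column names
dg = dg.iloc[1:].set_index("Date")        # drop the header row, index by date
dg = dg[["Category", "revenue", "Result"]].copy()

# Convert each numeric column separately; everything is 'object' after .T
for col in ["revenue", "Result"]:
    dg[col] = pd.to_numeric(dg[col], errors="coerce")

dg["cost"] = dg["revenue"] * -dg["Result"]

# Group by the key column only and sum the numeric columns.
totals = dg.groupby("Category", as_index=False)[["revenue", "Result", "cost"]].sum()
print(totals)
```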

Pandas to SUM of list items in a DataFrame

Below I have a list, bu_lst, whose items I pass to a dataframe df2 to sum each individual group. How can I do that in one go, so that I don't repeat the call multiple times?
bu_lst = ['FPG','IPG','DSG','STG','WFO','IT']
FPG = ['ADE','FPG AE','FPG PE','MMSIM','OrFAD','Tirtuoso DashBoard','SPB AE','SPB PE']
IPG = ['DDR','DDR_DT','Tensilica']
DSG = ['FLA','FLS','FEQoS','IFD PT','Sasus R&D','sasus PE','Toltus','Tempus','Quantus','Genus']
STG = ['ATS','HST','TIP','System Engineering']
WFO = ['AFademiF Network','FRAFT','Fhip Estimate','EduFation SerTiFes','LiFensing','Sales','SerTiFes','TFAD']
IT = ['App Development','Fumulus','InfoSeF']
My current approach:
print(df2[FPG].sum())
print(df2[IPG].sum())
print(df2[DSG].sum())
print(df2[STG].sum())
print(df2[WFO].sum())
print(df2[IT].sum())
I have included only the relevant lines of the code here.
If the lists contain column names, you can create a dictionary of lists and then use a dictionary comprehension:
d = {'FPG': FPG, 'IPG': IPG, ...}
d2 = {k: df2[v].sum() for k, v in d.items()}
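As a runnable illustration of the dictionary approach, here is a minimal sketch with a made-up df2 that contains only a handful of the columns named in the question (the numbers are invented). It shows both the per-column sums and, as an assumption about what might be wanted, a single total per group:

```python
import pandas as pd

# Invented data; only a few of the question's column names are used.
df2 = pd.DataFrame({
    "DDR": [1, 2],
    "DDR_DT": [3, 4],
    "Tensilica": [5, 6],
    "ATS": [7, 8],
    "HST": [9, 10],
})

# Group name -> list of column names belonging to that group.
groups = {
    "IPG": ["DDR", "DDR_DT", "Tensilica"],
    "STG": ["ATS", "HST"],
}

# Per-column sums for each group (same as print(df2[IPG].sum()) etc.).
per_column = {name: df2[cols].sum() for name, cols in groups.items()}

# One grand total per group, collected into a Series.
per_group = pd.Series({name: df2[cols].sum().sum() for name, cols in groups.items()})

print(per_group)
```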

data type with both alphabets and integers in pandas data frame

I have data frame with the column 'Item'. When I run this query:
df = pd.DataFrame({'SKUIDs': fullFrame.Item})
bySKU = pd.DataFrame(df.groupby(['SKUIDs']).size(),columns = ['Lines'])
the results are like this:
How can I get this:
fullFrame.Item.str.strip().value_counts().to_frame('Lines')
If you have whitespace in the values of the SKUIDs column, do the following:
bySKU['SKUIDs'] = bySKU['SKUIDs'].str.strip()
bySKU = bySKU.groupby('SKUIDs', as_index=False).agg({'Lines':'sum'})
You can also proceed as follows:
df = pd.DataFrame({'SKUIDs': fullFrame.Item})
df['SKUIDs'] = df['SKUIDs'].str.strip()
bySKU = pd.DataFrame(df.groupby(['SKUIDs']).size(),columns = ['Lines'])
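To see the effect of stripping whitespace before counting, here is a small sketch of the value_counts approach. The fullFrame contents are invented, since the original data and the expected output are not shown:

```python
import pandas as pd

# Invented sample: the same SKU appears with and without stray spaces.
fullFrame = pd.DataFrame({"Item": ["A1", "A1 ", " B2", "B2", "C3"]})

# Strip whitespace first so "A1" and "A1 " are counted as one SKU,
# then count occurrences and name the count column 'Lines'.
bySKU = fullFrame["Item"].str.strip().value_counts().to_frame("Lines")

print(bySKU)
```

Without the strip step, "A1" and "A1 " would be counted as two different SKUs.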

Assigning values to dataframe columns

In the below code, the dataframe df5 is not getting populated. I am just assigning the values to dataframe's columns and I have specified the column beforehand. When I print the dataframe, it returns an empty dataframe. Not sure whether I am missing something.
Any help would be appreciated.
import math
import pandas as pd
columns = ['ClosestLat','ClosestLong']
df5 = pd.DataFrame(columns=columns)
def distance(pt1, pt2):
    return math.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)
for pt1 in df1:
    closestPoints = [pt1, df2[0]]
    for pt2 in df2:
        if distance(pt1, pt2) < distance(closestPoints[0], closestPoints[1]):
            closestPoints = [pt1, pt2]
    df5['ClosestLat'] = closestPoints[1][0]
    df5['ClosestLong'] = closestPoints[1][1]
    print ("Point: " + str(closestPoints[0]) + " is closest to " + str(closestPoints[1]))
From the look of your code, you're trying to populate df5 with a list of latitudes and longitudes. However, you're making a couple of mistakes.
The columns of a pandas dataframe are Series, each holding a sequence of values. So df5['ClosestLat'] = closestPoints[1][0] assigns a single scalar to the entire column; because df5 has no rows yet, the scalar is broadcast over zero rows and the column stays empty.
Even if the assignment did store the value, you would lose data, because each pass through the loop overwrites the whole column.
The solution: build lists of lats and longs, then insert them into the dataframe.
import math
import pandas as pd
columns = ['ClosestLat','ClosestLong']
df5 = pd.DataFrame(columns=columns)
def distance(pt1, pt2):
    # Euclidean distance between two (lat, long) points
    return math.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)
lats, lngs = [], []
for pt1 in df1:
    closestPoints = [pt1, df2[0]]
    for pt2 in df2:
        if distance(pt1, pt2) < distance(closestPoints[0], closestPoints[1]):
            closestPoints = [pt1, pt2]
    # record the closest point found for this pt1
    lats.append(closestPoints[1][0])
    lngs.append(closestPoints[1][1])
# assign each column once, after the loop
df5['ClosestLat'] = pd.Series(lats)
df5['ClosestLong'] = pd.Series(lngs)
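For comparison, a more compact sketch of the same idea: collect the closest point for every pt1, then build the dataframe in one step. It assumes the two inputs are plain lists of (lat, long) tuples (the names points_a and points_b and their values are invented) and uses math.dist, which needs Python 3.8 or later:

```python
import math
import pandas as pd

# Invented sample points standing in for df1 and df2.
points_a = [(0.0, 0.0), (5.0, 5.0)]
points_b = [(1.0, 1.0), (4.0, 6.0), (10.0, 10.0)]

# For each point in points_a, pick the point in points_b with the
# smallest Euclidean distance to it.
closest = [
    min(points_b, key=lambda pt2: math.dist(pt1, pt2))
    for pt1 in points_a
]

# Build the dataframe once from the collected (lat, long) pairs.
df5 = pd.DataFrame(closest, columns=["ClosestLat", "ClosestLong"])

print(df5)
```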

Populating data to individual columns in pandas dataframe

I am trying to get the data from the list (list_addresses) and populate it to different columns of the dataframe (dfloc). I use the below code, not sure where I am going wrong.
Values are present in list_addresses but not getting populated to the dataframe.
Any help would be appreciated.
for index in range(len(list_addresses)):
    location = geolocator.reverse([list_addresses[index][0],list_addresses[index][1]])
    dfloc.loc[dfloc.Latitude] = list_addresses[index][0]
    dfloc.loc[dfloc.Longitude] = list_addresses[index][1]
    dfloc.loc[dfloc.Address] = location.address
It looks like you have a list of lists or tuples of the form [(Lat1, Lon1), (Lat2, Lon2), ...]. I like to make a list for each column, then assign the entire column at once:
lat_list = [x[0] for x in list_addresses]
lon_list = [x[1] for x in list_addresses]
address_list = []
for index in range(len(list_addresses)):
    location = geolocator.reverse([list_addresses[index][0],list_addresses[index][1]])
    address_list.append(location.address)
dfloc['Latitude'] = lat_list
dfloc['Longitude'] = lon_list
dfloc['Address'] = address_list
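The same column-at-a-time pattern can be tried without a network connection. In the sketch below, fake_reverse is a hypothetical stand-in for geolocator.reverse(...).address, and the coordinates are invented; only the way the dataframe is filled is the point:

```python
import pandas as pd

# Invented coordinates in the assumed [(lat, lon), ...] form.
list_addresses = [(40.0, -75.0), (51.5, -0.1)]

def fake_reverse(lat, lon):
    # Hypothetical placeholder for a real reverse-geocoding call.
    return f"address near ({lat}, {lon})"

# Build the two coordinate columns directly from the list of pairs.
dfloc = pd.DataFrame(list_addresses, columns=["Latitude", "Longitude"])

# Assign the whole Address column at once from a list of the same length.
dfloc["Address"] = [fake_reverse(lat, lon) for lat, lon in list_addresses]

print(dfloc)
```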