Pandas to SUM of list items in a DataFrame - pandas

Below i have list bu_lst which I'm passing to a dataframe df2 to do the sum of each individual item in the list, How could i achieve that in one go so, i do not repeat it multiple times:
bu_lst = ['FPG','IPG','DSG','STG','WFO','IT']
FPG = ['ADE','FPG AE','FPG PE','MMSIM','OrFAD','Tirtuoso DashBoard','SPB AE','SPB PE']
IPG = ['DDR','DDR_DT','Tensilica']
DSG = ['FLA','FLS','FEQoS','IFD PT','Sasus R&D','sasus'] PE','Toltus','Tempus','Quantus','Genus']
STG = ['ATS','HST','TIP','System Engineering']
WFO = ['AFademiF Network','FRAFT','Fhip Estimate','EduFation SerTiFes','LiFensing','Sales','SerTiFes','TFAD']
IT = ['App Development','Fumulus','InfoSeF']
My current approach:
print(df2[FPG].sum())
print(df2[IPG].sum())
print(df2[DSG].sum())
print(df2[STG].sum())
print(df2[WFO].sum())
print(df2[IT].sum())
I just the took the relevant line of the code to show here.

You can create dictionary of lists and then use dictionary comprehension if in lists are columns names:
d = {'bu_lst':bu_lst, 'FPG': FPG, ...}
d2 = {k: df2[v].sum() for k, v in d.items()}

Related

Loop over Pandas dataframe to populate list (Python)

I have the following dataframe:
import pandas as pd
action = ['include','exclude','ignore','include', 'exclude', 'exclude','ignore']
names = ['john','michael','joshua','peter','jackson','john', 'erick']
df = pd.DataFrame(list(zip(action,names)), columns = ['action','names'])
I also have a list of starting participants like this:
participants = [['michael','jackson','jeremiah','martin','luis']]
I want to iterate over df['action']. If df['action'] == 'include', add another list to the participants list that includes all previous names and the one in df['names']. So, after the first iteration, participants list should look like this:
participants = [['michael','jackson','jeremiah','martin','luis'],['michael','jackson','jeremiah','martin','luis','john']]
I have managed to achieve this with the following code (I don´t know if this part could be improved, although it is not my question):
for i, row in df.iterrows():
if df.at[i,'action'] == 'include':
person = [df.at[i,'names']]
old_list = participants[-1]
new_list = old_list + person
participants.append(new_list)
else:
pass
The main problem (and my question is), how do I accomplish the same but removing the name when df['action'] == 'exclude'? So, after the second iteration, I should have this list in participants:
participants = [['michael','jackson','jeremiah','martin','luis'],['michael','jackson','jeremiah','martin','luis','john'],['jackson','jeremiah','martin','luis','john']]
You can just add a elif to your code. With the remove method you can remove a item by value. Just be careful your person is a list and not a string. I just call it by index with [0].
elif df.at[i, 'action'] == 'exclude':
person = [df.at[i, 'names']]
participants.append(participants[-1].remove(person[0]))

df.groupby('columns').apply(''.join()), join all the cells to a string

df.groupby('columns').apply(''.join()), join all the cells to a string.
This is for a junior dataprocessor. In the past, I've tried many ways.
import pandas as pd
data = {'key':['a','b','c','a','b','c','a'], 'profit':
[12,3,4,5,6,7,9],'income':['j','d','d','g','d','t','d']}
df = pd.DataFrame(data)
df = df.set_index(‘key’)
#df2 is expected result
data2 = {'a':['12j5g9d'],'b':['3d6d'],'c':['4d7t']}
df2 = pd.DataFrame(data2)
df2 = df2.set_index(‘key’)
Here's a simple solution, where we first translate the integers to strings and then concatenate profit and income, then finally we concatenate all strings under the same key:
data = {'key':['a','b','c','a','b','c','a'], 'profit':
[12,3,4,5,6,7,9],'income':['j','d','d','g','d','t','d']}
df = pd.DataFrame(data)
df['profit_income'] = df['profit'].apply(str) + df['income']
res = df.groupby('key')['profit_income'].agg(''.join)
print(res)
output:
key
a 12j5g9d
b 3d6d
c 4d7t
Name: profit_income, dtype: object
This question can be solved couple different ways:
First add an extra column by concatenating the profit and income columns.
import pandas as pd
data = {'key':['a','b','c','a','b','c','a'], 'profit':
[12,3,4,5,6,7,9],'income':['j','d','d','g','d','t','d']}
df = pd.DataFrame(data)
df = df.set_index('key')
df['profinc']=df['profit'].astype(str)+df['income']
1) Using sum
df2=df.groupby('key').profinc.sum()
2) Using apply and join
df2=df.groupby('key').profinc.apply(''.join)
Results from both of the above would be the same:
key
a 12j5g9d
b 3d6d
c 4d7t

Adding multiple dictionaries into a single Dataframe pandas

I have a set of python dictionaries that I have obtained by means of a for loop. I am trying to have these added to Pandas Dataframe.
Output for a variable called output
{'name':'Kevin','age':21}
{'name':'Steve','age':31}
{'name':'Mark','age':11}
I am trying to append each of these dictionary into a single Dataframe. I tried to perform the below but it just added the first row.
df = pd.DataFrame(output)
Could anyone advice as to where am going wrong and have all the dictionaries added to the Dataframe.
Update on the loop statement
The below code helps to read xml and convert it to a dataframe. Right now I see I am able to loop in through multiple xml files and created dictionaries for each xml file. I am trying to see how could I add each of these dictionaries to a single Dataframe:
def f(elem, result):
result[elem.tag] = elem.text
cs = elem.getchildren()
for c in cs:
result = f(c, result)
return result
result = {}
for file in allFiles:
tree = ET.parse(file)
root = tree.getroot()
result = f(root, result)
print(result)
You can append each dictionary to list and last call DataFrame constructor:
out = []
for file in allFiles:
tree = ET.parse(file)
root = tree.getroot()
result = f(root, result)
out.append(result)
df = pd.DataFrame(out)
We can add these dicts to a list:
ds = []
for ...: # your loop
ds += [d] # where d is one of the dicts
When we have the list of dicts, we can simply use pd.DataFrame on that list:
ds = [
{'name':'Kevin','age':21},
{'name':'Steve','age':31},
{'name':'Mark','age':11}
]
pd.DataFrame(ds)
Output:
name age
0 Kevin 21
1 Steve 31
2 Mark 11
Update:
And it's not a problem if different dicts have different keys, e.g.:
ds = [
{'name':'Kevin','age':21},
{'name':'Steve','age':31,'location': 'NY'},
{'name':'Mark','age':11,'favorite_food': 'pizza'}
]
pd.DataFrame(ds)
Output:
age favorite_food location name
0 21 NaN NaN Kevin
1 31 NaN NY Steve
2 11 pizza NaN Mark
Update 2:
Building up on our previous discussion in Python - Converting xml to csv using Python pandas we can do:
results = []
for file in glob.glob('*.xml'):
tree = ET.parse(file)
root = tree.getroot()
result = f(root, {})
result['filename'] = file # added filename to our results
results += [result]
pd.DataFrame(results)

Assigning values to dataframe columns

In the below code, the dataframe df5 is not getting populated. I am just assigning the values to dataframe's columns and I have specified the column beforehand. When I print the dataframe, it returns an empty dataframe. Not sure whether I am missing something.
Any help would be appreciated.
import math
import pandas as pd
columns = ['ClosestLat','ClosestLong']
df5 = pd.DataFrame(columns=columns)
def distance(pt1, pt2):
return math.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)
for pt1 in df1:
closestPoints = [pt1, df2[0]]
for pt2 in df2:
if distance(pt1, pt2) < distance(closestPoints[0], closestPoints[1]):
closestPoints = [pt1, pt2]
df5['ClosestLat'] = closestPoints[1][0]
df5['ClosestLat'] = closestPoints[1][0]
df5['ClosestLong'] = closestPoints[1][1]
print ("Point: " + str(closestPoints[0]) + " is closest to " + str(closestPoints[1]))
From the look of your code, you're trying to populate df5 with a list of latitudes and longitudes. However, you're making a couple mistakes.
The columns of pandas dataframes are Series, and hold some type of sequential data. So df5['ClosestLat'] = closestPoints[1][0] attempts to assign the entire column a single numerical value, and results in an empty column.
Even if the dataframe wasn't ignoring your attempts to assign a real number to the column, you would lose data because you are overwriting the column with each loop.
The Solution: Build a list of lats and longs, then insert into the dataframe.
import math
import pandas as pd
columns = ['ClosestLat','ClosestLong']
df5 = pd.DataFrame(columns=columns)
def distance(pt1, pt2):
return math.sqrt((pt1[0] - pt2[0])**2 + (pt1[1] - pt2[1])**2)
lats, lngs = [], []
for pt1 in df1:
closestPoints = [pt1, df2[0]]
for pt2 in df2:
if distance(pt1, pt2) < distance(closestPoints[0], closestPoints[1]):
closestPoints = [pt1, pt2]
lats.append(closestPoints[1][0])
lngs.append(closestPoints[1][1])
df['ClosestLat'] = pd.Series(lats)
df['ClosestLong'] = pd.Series(lngs)

Populating data to individual columns in pandas dataframe

I am trying to get the data from the list (list_addresses) and populate it to different columns of the dataframe (dfloc). I use the below code, not sure where I am going wrong.
Values are present in list_addresses but not getting populated to the dataframe.
Any help would be appreciated.
for index in range(len(list_addresses)):
location = geolocator.reverse([list_addresses[index][0],list_addresses[index][1]])
dfloc.loc[dfloc.Latitude] = list_addresses[index][0]
dfloc.loc[dfloc.Longitude] = list_addresses[index][1]
dfloc.loc[dfloc.Address] = location.address
So it looks like you have a list of lists or tuples with form of [(Lat1,Lon1),(Lat2,Lon2), etc...]. I like to make a list for each column, then assign the entire column at once:
lat_list = [x[0] for x in list_addresses]
lon_list = [x[1] for x in list_addresses]
address_list = []
for index in range(len(list_addresses)):
location = geolocator.reverse([list_addresses[index][0],list_addresses[index][1]])
address_list.append(location.address)
dfloc['Latitude'] = lat_list
dfloc['Longitude'] = lon_list
dfloc['Address'] = address_list