Adding Columns in loop pandas - pandas

I have 2 dataframes, each with the same column names, and in each one I want to add pairs of columns together to make new columns.
df1['C']=df1[['A','B']].sum(axis=1)
df1['D']=df1[['E','G']].sum(axis=1)
df2['C']=df2[['A','B']].sum(axis=1)
df2['D']=df2[['E','G']].sum(axis=1)
However, in reality it's more complicated than this. So can I put these in a dictionary and loop?
I'm still figuring out how to structure dictionaries for this type of problem, so any advice would be great.
Here's what I'm trying to do:
all_dfs=[df1,df2]
for df in all_dfs:
dict={Out=['C'], in=['A','B]
Out2=['D'], in2=['E','G]
}
for i in dict:
df[i]=df[['i[1....
I'm a bit lost in how to build this last bit

First, rename the dictionary, because dict is the name of a Python built-in type. Then use each output column as a key and the list of its input columns as the value, and finally loop with the items() method:
d = {'C': ['A', 'B'], 'D': ['E', 'G']}
for k, v in d.items():
    # checking key and value of dict
    print(k)
    print(v)
    df[k] = df[v].sum(axis=1)
EDIT:
It is simpler to work with a dictionary of DataFrames: use sum inside the loop and, at the end, store each result in another dictionary of DataFrames:
all_dfs = {'first': df1, 'second': df2}
out = {}
for name, df in all_dfs.items():
    d = {'C': ['A', 'B'], 'D': ['E', 'G']}
    for k, v in d.items():
        df[k] = df[v].sum(axis=1)
    # fill empty dict by name
    out[name] = df
print(out)
print(out['first'])
print(out['second'])
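For reference, here is a minimal self-contained sketch of the same loop. The sample values are made up for illustration; only the column names A, B, E, G, C and D come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'E': [5, 6], 'G': [7, 8]})
df2 = pd.DataFrame({'A': [10, 20], 'B': [30, 40], 'E': [50, 60], 'G': [70, 80]})

# Output column -> list of input columns to add together.
sums = {'C': ['A', 'B'], 'D': ['E', 'G']}

all_dfs = {'first': df1, 'second': df2}
for name, df in all_dfs.items():
    for out_col, in_cols in sums.items():
        # Row-wise sum of the input columns; this modifies df1/df2 in place.
        df[out_col] = df[in_cols].sum(axis=1)

print(all_dfs['first'])
```

Because the loop assigns into the same DataFrame objects, df1 and df2 themselves gain the new columns, so a separate output dictionary is optional.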

Related

A for loop outputs one list after each iteration. How to append each of them in its own row in a 3-column dataframe?

This seemingly simple operation at point d) still eludes me after numerous attempts to do it by myself.
The for loop I use:
a) cycles through an unknown number of Excel files,
b) selects 3 columns from each file,
c) performs some string manipulations on their headers using conditions, then
d) outputs the 1-row extraction of the headers I have achieved so far to an individual list.
After n (here 3) iterations of a), b) and c), the for loop outputs lists such as:
['Col1','Col1a','Col1b']
['Col2','Col2a','Col2b']
['Col3','Col3a','Col3b']
I am looking to append/concatenate/merge these individual lists each in its own individual row into one dataframe that I can further manipulate.
Expected final dataframe with index=True and header=None:
0, 'Col1','Col1a','Col1b'
1, 'Col2','Col2a','Col2b'
2, 'Col3','Col3a','Col3b'
I have tried many examples found in SO such as:
df = pd.DataFrame()
for lst in [list1, list2, list3]:
    df_temp = pd.DataFrame(lst)
    df = df.append(df_temp)
print(df)
Thanks for the time you take reviewing this request.
You can create the dataframe directly from a list of the lists:
pd.DataFrame([list1, list2, list3]).to_csv('file.csv', header=None, index=True)
file.csv:
0,Col1,Col1a,Col1b
1,Col2,Col2a,Col2b
2,Col3,Col3a,Col3b
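If the lists are produced one at a time inside the loop, one option is to collect them in a plain Python list and build the DataFrame once at the end. The sketch below assumes a hypothetical list named extracted standing in for the per-file header extraction described in the question.

```python
import pandas as pd

# Hypothetical stand-in for the per-file header extraction in the question.
extracted = [
    ['Col1', 'Col1a', 'Col1b'],
    ['Col2', 'Col2a', 'Col2b'],
    ['Col3', 'Col3a', 'Col3b'],
]

rows = []
for headers in extracted:
    rows.append(headers)  # the list from each iteration becomes one row

# Build the DataFrame once, after the loop has finished.
df = pd.DataFrame(rows)
print(df.shape)
```

Collecting plain lists and constructing the frame once also avoids DataFrame.append, which was removed in pandas 2.0.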

I tried to add a filter for the countries but it gives me an error [duplicate]

I have a dataframe with a lot of columns in it. Now I want to select only certain columns. I have saved all the names of the columns that I want to select into a Python list and now I want to filter my dataframe according to this list.
I've been trying to do:
df_new = df[[list]]
where list includes all the column names that I want to select.
However I get the error:
TypeError: unhashable type: 'list'
Any help on this one?
You can remove one pair of []:
df_new = df[list]
It is also better to use a name other than list, e.g. L, because list is a Python built-in:
df_new = df[L]
Your code looks like it works; I have only tried to simplify it:
L = []
for x in df.columns:
    if "_" not in x[-3:]:
        L.append(x)
print(L)
List comprehension:
print([x for x in df.columns if "_" not in x[-3:]])
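A small sketch of both ideas together, selecting by a flat list of names and building that list with a comprehension. The frame and its column names are hypothetical, chosen only to illustrate the point.

```python
import pandas as pd

# Hypothetical frame and column names, for illustration only.
df = pd.DataFrame({'country': ['FR', 'DE'], 'pop_x': [1, 2], 'gdp': [3, 4]})

cols = ['country', 'gdp']  # a flat list of column names
df_new = df[cols]          # selects both columns; df[[cols]] would nest the list

# Keep only columns with no underscore in their last three characters.
no_suffix = [c for c in df.columns if '_' not in c[-3:]]
print(df[no_suffix].columns.tolist())
```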

Split Get from Python dictionary

I have a dictionary that I can use the get method to extract values from but I need to subset these values. For example
dict_of_measures = {k: v for k, v in measures.groupby('Measure')}
And I am using get
BCS=dict_of_measures.get('BCS')
I have several values and wanted to know if I could use a for loop to extract from the dictionary and subset into multiple dataframes per measure using the get method? Is this possible?
for measure name in dict_of_measures:
get measure name()
You can use a dict comprehension:
result = []
keys_to_extract = ['key1', 'key2']
# bigdict stands for the source dictionary (dict_of_measures in the question)
new_dict = {k: bigdict[k] for k in keys_to_extract}
result.append(new_dict)  # add the dictionary to a list, which can then be converted into a pandas DataFrame
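Applied to the question's own dictionary, a sketch might look like the following. The sample data and the Value column are assumptions; only the Measure column and the BCS key appear in the question.

```python
import pandas as pd

# Hypothetical data; the 'Measure' column and the 'BCS' key come from the question.
measures = pd.DataFrame({
    'Measure': ['BCS', 'BCS', 'CDC'],
    'Value': [1, 2, 3],
})

dict_of_measures = {k: v for k, v in measures.groupby('Measure')}

# Loop over every measure instead of calling get() once per name.
for measure_name, measure_df in dict_of_measures.items():
    print(measure_name, len(measure_df))

# Or keep only a chosen subset of measures, skipping names that are absent.
wanted = ['BCS']
subset = {k: dict_of_measures[k] for k in wanted if k in dict_of_measures}
```

Iterating with items() yields each measure name together with its sub-DataFrame, so no separate get() call is needed inside the loop.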

Looping through a dictionary of dataframes and counting a column

I am wondering if anyone can help. I have a number of dataframes stored in a dictionary. I simply want to access each of these dataframes and count the values in one column, which holds 10 letters. In the first dataframe there are 5 b's and 5 a's, so the output I would expect from the count is a = 5 and b = 5. However, the count would be different for each dataframe, so I would like to store the output of these counts either in another dictionary or in separate variables.
The dictionary is called Dict and the column in all the dataframes is called letters. I have tried to do this by accessing the keys in the dictionary but cannot get it to work. A section of what I have tried is shown below.
import pandas as pd
for key in Dict:
    Count=pd.value_counts(key['letters'])
Count here would ideally change with each new count output to store into a new variable
A simplified example (the actual dataframe sizes are max 5000,63) of one of the 14 dataframes in the dictionary would be
d = {'col1': [1, 2,3,4,5,6,7,8,9,10], 'letters': ['a','a','a','b','b','a','b','a','b','b']}
df = pd.DataFrame(data=d)
The other dataframes are named df2, df3, df4 etc.
I hope that makes sense. Any help would be much appreciated.
Thanks
If you want to access both keys and values when iterating over a dictionary, you should use the items() method.
You could use another dictionary to store the results:
letter_counts = {}
for key, value in Dict.items():
    letter_counts[key] = value["letters"].value_counts()
You could also use a dictionary comprehension to do this in one line:
letter_counts = {key: value["letters"].value_counts() for key, value in Dict.items()}
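The easiest thing is probably a dictionary comprehension. Note that count() returns the number of non-null entries in the column, not per-letter counts; use value_counts() for the latter: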
d = {'col1': [1, 2,3,4,5,6,7,8,9,10], 'letters': ['a','a','a','b','b','a','b','a','b','b']}
d2 = {'col1': [1, 2,3,4,5,6,7,8,9,10,11], 'letters': ['a','a','a','b','b','a','b','a','b','b','a']}
df = pd.DataFrame(data=d)
df2 = pd.DataFrame(d2)
df_dict = {'d': df, 'd2': df2}
new_dict = {k: v['letters'].count() for k,v in df_dict.items()}
# out
{'d': 10, 'd2': 11}
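For the per-letter counts asked for in the question, a minimal sketch with value_counts() could look like this. The second frame and the dictionary name frames are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'letters': ['a', 'a', 'a', 'b', 'b', 'a', 'b', 'a', 'b', 'b']})
df2 = pd.DataFrame({'letters': ['a', 'b', 'b']})  # hypothetical second frame
frames = {'df': df, 'df2': df2}

# One value_counts() Series per DataFrame, keyed like the original dictionary.
letter_counts = {name: frame['letters'].value_counts() for name, frame in frames.items()}

print(letter_counts['df']['a'])
```

Each value in letter_counts is a Series indexed by letter, so a single count is read as letter_counts[name][letter].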

Loop to get the dataframe, and pass it to a function

I am trying to:
read files and store them in dataframes
store the names of the dataframes in a dataframe
loop to recover each dataframe, and pass it to a function as a dataframe
It doesn't work because when I retrieve the name of the dataframe, it is a str object, not a dataframe, so the calculation fails.
df_files:
   dataframe                name
0  df_bureau                bureau
1  df_previous_application  previous_application
Code:
def missing_values_table_for(df_for, name):
    mis_val_for = df_for.isnull().sum() # count null values -> error
for index, row in df_files.iterrows():
    missing_values_for = missing_values_table_for(dataframe, name)
Thanks in advance.
I believe the best approach here is to work with a dictionary of DataFrames, created by looping over the file names returned by glob:
import glob
import pandas as pd

files = glob.glob('files/*.csv')
# file path -> DataFrame read from that file
dfs = {f: pd.read_csv(f) for f in files}
for k, v in dfs.items():
    # number of null values per column
    df = v.isnull().sum()
    print(df)
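To pass each DataFrame to a function such as the one in the question, the same dictionary pattern can be used. The sketch below uses hypothetical in-memory frames instead of files so that it runs on its own; the function body is an assumption about what the original was meant to return.

```python
import pandas as pd

def missing_values_table_for(df_for, name):
    """Return the number of null values per column, labelled with the name."""
    return df_for.isnull().sum().rename(name)

# Hypothetical in-memory frames standing in for the files read from disk.
dfs = {
    'bureau': pd.DataFrame({'x': [1, None], 'y': [None, None]}),
    'previous_application': pd.DataFrame({'x': [1, 2]}),
}

# The dictionary hands over the DataFrame itself, not its name as a string.
results = {name: missing_values_table_for(df, name) for name, df in dfs.items()}
print(results['bureau'])
```

Keeping the name as the dictionary key and the DataFrame as the value avoids having to look a variable up from a string.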