How to loop over multiple pandas dataframes with a for loop - pandas

I'm trying to harmonize the column names in all my data frames so that I can concatenate them and create one table. I'm struggling to create a loop over multiple dataframes. The code does not fail, but it does not work either. Here is an example of two dataframes and a list that includes the dataframes:
df_test = pd.DataFrame({'HHLD_ID': [6, 7, 8, 9, 10],
                        'sales': [25, 50, 25, 25, 50],
                        'units': [1, 2, 1, 1, 2]})
df_test2 = pd.DataFrame({'HHLD_ID': [1, 2, 3, 4, 5],
                         'sale': [25, 50, 25, 25, 50],
                         'unit': [1, 2, 1, 1, 2]})
list_df_export = [df_test,df_test2]
Here is what I have tried...
for d in list_df_export:
    if 'sale' in d:
        d = d.rename(columns={"sale": "sales", 'unit': 'units'})
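The loop in the attempt above silently does nothing because d = d.rename(...) only rebinds the loop variable to a new DataFrame; the list still holds the original objects. A minimal sketch of the fix, assigning the renamed frame back into the list by index (the question's data, shortened to two rows):

```python
import pandas as pd

df_test = pd.DataFrame({'HHLD_ID': [6, 7], 'sales': [25, 50], 'units': [1, 2]})
df_test2 = pd.DataFrame({'HHLD_ID': [1, 2], 'sale': [25, 50], 'unit': [1, 2]})
list_df_export = [df_test, df_test2]

# Assign the renamed copy back into the list so the change sticks
for i, d in enumerate(list_df_export):
    if 'sale' in d:
        list_df_export[i] = d.rename(columns={'sale': 'sales', 'unit': 'units'})

combined = pd.concat(list_df_export, ignore_index=True)
```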
Here is what I would like df_test2 to look like...

You can use:
d = {'sale':'sales','unit':'units'}
pd.concat(i.rename(columns=d) for i in list_df_export)

Maybe the "inplace" option can help you (note that rename with inplace=True returns None, so don't assign the result back):
for d in list_df_export:
    if 'sale' in d:
        d.rename(columns={"sale": "sales", 'unit': 'units'}, inplace=True)

You can try:
df_test2.columns = df_test.columns
This will make the columns in df_test2 have the same names as df_test.
Is this what you need?

Related

replacing df.append with pd.concat when building a new dataframe from file read

...
header = pd.DataFrame()
for x in {0,7,8,9,10,11,12,13,14,15,18,19,21,23}:
    header = header.append({'col1': data1[x].split(':')[0],
                            'col2': data1[x].split(':')[1][:-1],
                            'col3': data2[x].split(':')[1][:-1],
                            'col4': data2[x] == data1[x],
                            'col5': '---'},
                           ignore_index=True)
...
I have some Jupyter Notebook code which reads two text files into data1 and data2 and, using a list of indices, picks out specific matching lines from both files into a dataframe for easy display and comparison in the notebook.
Since df.append is now deprecated in favour of pd.concat, what's the tidiest way to do this?
Is it basically to replace the inner loop code with
...
header = pd.concat(header, {all the column code from above })
...
Additional input to comment below:
Yes, sorry. For example, the next block of code does this:
for x in {4, 2, 5}:
    header = header.append({'col1': 'SOMENEWROWNAME',
                            'col2': data1[x].split(':')[1][:-1],
                            'col3': data2[x].split(':')[1][:-1],
                            'col4': data2[x] == data1[x],
                            'col5': float(data2[x].split(':')[1][:-1]) - float(data1[x].split(':')[1][:-1])},
                           ignore_index=True)
This is repeated 5 times, with different data indices in the loop and a different SOMENEWROWNAME each time.
I inherited this notebook, and I see now that it was written this way because they only wanted to compute a numerical float difference on the columns where numbers appear.
But there are several such blocks, with different lines in the data, and the first parameter SOMENEWROWNAME is a different text field from the respective lines in the data.
So I was primarily just trying to fix these append-to-concat warnings, but of course if the code can be better written, then all good!
Use list comprehension and DataFrame constructor:
data = [{'col1':data1[x].split(':')[0],
'col2':data1[x].split(':')[1][:-1],
'col3':data2[x].split(':')[1][:-1],
'col4':data2[x]==data1[x],
'col5':'---'} for x in {0,7,8,9,10,11,12,13,14,15,18,19,21,23}]
df = pd.DataFrame(data)
EDIT:
out = []
#sample
for x in {1, 7, 30}:
    out.append({'col1': 'SOMENEWROWNAME',
                'col2': data1[x].split(':')[1][:-1],
                'col3': data2[x].split(':')[1][:-1],
                'col4': data2[x] == data1[x],
                'col5': float(data2[x].split(':')[1][:-1]) - float(data1[x].split(':')[1][:-1])})
df1 = pd.DataFrame(out)
out1 = []
#sample
for x in {1, 7, 30}:
    out1.append({another dict})
df2 = pd.DataFrame(out1)
df = pd.concat([df1, df2])
Or:
final = []
for x in {4, 2, 5}:
    final.append({'col1': 'SOMENEWROWNAME',
                  'col2': data1[x].split(':')[1][:-1],
                  'col3': data2[x].split(':')[1][:-1],
                  'col4': data2[x] == data1[x],
                  'col5': float(data2[x].split(':')[1][:-1]) - float(data1[x].split(':')[1][:-1])})
for x in {4, 2, 5}:
    final.append({another dict})
df = pd.DataFrame(final)
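As a runnable sketch of the pattern in the answer above (build a list of dicts, then call the DataFrame constructor once), with a hypothetical two-line data1/data2 standing in for the question's files:

```python
import pandas as pd

# Hypothetical stand-ins for the question's two text files
data1 = ['name:alpha\n', 'speed:1.5\n']
data2 = ['name:alpha\n', 'speed:2.0\n']

rows = [{'col1': data1[x].split(':')[0],
         'col2': data1[x].split(':')[1][:-1],
         'col3': data2[x].split(':')[1][:-1],
         'col4': data2[x] == data1[x],
         'col5': '---'} for x in [0, 1]]

df = pd.DataFrame(rows)  # one constructor call instead of repeated append
```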

Dataframe - renaming multiple columns with the same name

I have a dataframe with several columns with almost the same name and a number in the end (Hora1, Hora2, ..., Hora12).
I would like to change all column names to GAx, where x is a different number (GA01.0, GA01.1, ...).
Well, we can achieve the above output in many ways. I will share one of them here.
df.columns = [col.replace('Hora', 'GA01.') for col in df.columns]
You can rename the columns by passing a list of column names:
columns = ['GA01.0', 'GA01.1']
df.columns = columns
You can try:
import re
df.columns = [re.sub('Hora', 'GA01.', x) for x in df.columns]
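A small sketch combining the two ideas, assuming the question wants the numbering to restart at zero (Hora1 becomes GA01.0); the sample column names are made up:

```python
import re
import pandas as pd

df = pd.DataFrame(columns=['Hora1', 'Hora2', 'Hora12'])

# Shift the trailing number down by one while renaming the prefix
df.columns = [re.sub(r'^Hora(\d+)$',
                     lambda m: f'GA01.{int(m.group(1)) - 1}', c)
              for c in df.columns]
```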

convert rows of dataframe to separate dataframes

I need to convert the rows of a dataframe into separate one-row dataframes. Looking for the most efficient / clean approach here.
I need to preserve the column names; it is for a machine learning model, and I basically need a list of dataframes.
My current solution:
def get_data(filename):
    dataframe = pd.read_csv(filename, sep=';')
    dataframes = []
    for i, row in dataframe.iterrows():
        dataframes.append(row.to_frame().T)
    return dataframes
This looks very inefficient; maybe there is a cleaner, shorter solution.
Use:
dataframe = pd.read_csv(filename, sep=';')
dataframes = [dataframe.iloc[[i]] for i in range(len(dataframe))]
Or:
dataframe = pd.read_csv(filename, sep=';')
dataframes = [x.to_frame().T for i,x in dataframe.T.items()]
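For reference, a self-contained version of the first approach above, with a small in-memory CSV standing in for the question's file:

```python
import pandas as pd
from io import StringIO

csv_data = StringIO("a;b\n1;2\n3;4\n")  # stand-in for the real file
dataframe = pd.read_csv(csv_data, sep=';')

# One single-row DataFrame per row; column names are preserved
dataframes = [dataframe.iloc[[i]] for i in range(len(dataframe))]
```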
Try:
df_list = []
_ = dataframe.apply(lambda x: df_list.append(x.to_frame().T),axis=1)
If I understood, what you want is something like this:
start = 0
end = dataframe.shape[0]
dataframes = dataframe.loc[start:end]

How to concat 3 dataframes with each into sequential columns

I'm trying to understand how to concat three individual dataframes (i.e. df1, df2, df3) into a new dataframe, say df4, whereby each individual dataframe gets its own columns in left-to-right order.
I've tried using concat with axis=1 to do this, but it appears not possible to automate this with a single action.
Table1_updated = pd.DataFrame(columns=['3P','2PG-3Io','3Io'])
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io])
Note that, with the exception of get_table1_2P_max_3Io, which has two columns, all the other dataframes have one column.
For example, get_table1_3P, get_table1_2P_max_3Io and get_table1_3Io each contain sample data (shown as tables in the original post), and ultimately I would like to see them combined side by side into a single table.
I believe you need to concat first and then change the order with a list of column names:
Table1_updated=pd.concat([get_table1_3P,get_table1_2P_max_3Io,get_table1_3Io], axis=1)
Table1_updated = Table1_updated[['3P','2PG-3Io','3Io']]
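A runnable sketch of that answer, with made-up one- and two-column frames in place of the question's tables:

```python
import pandas as pd

# Made-up stand-ins for the question's dataframes
get_table1_3P = pd.DataFrame({'3P': [1, 2]})
get_table1_2P_max_3Io = pd.DataFrame({'2PG-3Io': [3, 4], '2PG': [0, 0]})
get_table1_3Io = pd.DataFrame({'3Io': [5, 6]})

# Side-by-side concat, then select/reorder the wanted columns
Table1_updated = pd.concat([get_table1_3P, get_table1_2P_max_3Io, get_table1_3Io],
                           axis=1)
Table1_updated = Table1_updated[['3P', '2PG-3Io', '3Io']]
```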

Extracting value and creating new column out of it

I would like to extract a certain section of a URL residing in a column of a Pandas DataFrame and make that a new column. This
ref = df['REFERRERURL']
ref.str.findall("\\d\\d\\/(.*?)(;|\\?)",flags=re.IGNORECASE)
returns me a Series with tuples in it. How can I take out only one part of that tuple before the Series is created, so I can simply turn that into a column? Sample data for referrerurl is
http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....
In this example I am interested in creating a column that only has 'someproduct_step2' in it.
Thanks,
In [25]: df = DataFrame([['http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=....']],columns=['A'])
In [26]: df['A'].str.findall("\\d\\d\\/(.*?)(;|\\?)",flags=re.IGNORECASE).apply(lambda x: Series(x[0][0],index=['first']))
Out[26]:
first
0 someproduct_step2
In 0.11.1, here is a neat way of doing this as well:
In [34]: df.replace({ 'A' : "http:.+\d\d\/(.*?)(;|\\?).*$"}, { 'A' : r'\1'} ,regex=True)
Out[34]:
A
0 someproduct_step2
This also worked:
def extract(x):
    res = re.findall("\\d\\d\\/(.*?)(;|\\?)", x)
    if res:
        return res[0][0]

session['RU_2'] = session['REFERRERURL'].apply(extract)
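On current pandas, str.extract is a concise alternative that keeps only the first capture group, avoiding the tuple handling above (URL pattern from the question; the jsessionid value is made up):

```python
import pandas as pd

df = pd.DataFrame({'REFERRERURL': [
    'http://wap.blah.com/xxx/id/11/someproduct_step2;jsessionid=abc123']})

# A single capture group, so extract returns strings rather than tuples
df['RU_2'] = df['REFERRERURL'].str.extract(r'\d\d/(.*?)[;?]', expand=False)
```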