Concatenate single row dataframe with multiple row dataframe - pandas

I have a dataframe with large number of columns but single row as df1:
Col1 Col2 Price Qty
A B 16 5
I have another dataframe as follows, df2:
Price Qty
8 2.5
16 5
6 1.5
I want to achieve the following:
Col1 Col2 Price Qty
A B 8 2.5
A B 16 5
A B 6 1.5
Where essentially I am taking all rows of df1 and repeat it while concatenating with df2 but bring the Price and Qty columns from df2 and replace the ones present originally in df1.
I am not sure how to proceed with above.

I believe the following approach will work,
# first lets repeat the single row df1 as many times as there are rows in df2
df1 = pd.DataFrame(np.repeat(df1.values, len(df2.index), axis=0), columns=df1.columns)
# lets reset the indexes of both DataFrames just to be safe
df1.reset_index(inplace=True)
df2.reset_index(inplace=True)
# now, lets merge the two DataFrames based on the index
# after dropping the Price and Qty columns from df1
df3 = pd.merge(df1.drop(['Price', 'Qty'], axis=1), df2, left_index=True, right_index=True)
# finally, lets drop the index columns
df3.drop(['index_x', 'index_y'], inplace=True, axis=1)

Related

is There any methods to merge multiple dataframes of different templates

There are a total of 4 dataframes (df1 / df2 / df3 / df4),
Each dataframe has a different template, but they all have the same columns.
I want to merges the row of each dataframe based on the same column, but what function should I use? A 'merge' or 'join' function doesn't seem to work, and deleting the rest of the columns after grouping them into a list seems to be too messy.
I want to make attached image
This is an option, you can merge the dataframes and then drop the useless columns from the total dataframe.
df_total = pd.concat([df1, df2, df3, df4], axis=0)
df_total.drop(['Value2', 'Value3'], axis=1)
You can use reduce to get it done too.
from functools import reduce
reduce(lambda left,right: pd.merge(left, right, on=['ID','value1'], how='outer'), [df1,df2,df3,df4])[['ID','value1']]
ID value1
0 a 1
1 b 4
2 c 5
3 f 1
4 g 5
5 h 6
6 i 1

Multiplying two data frames in pandas

I have two data frames as shown below df1 and df2. I want to create a third dataframe i.e. df as shown below. What would be the appropriate way?
df1={'id':['a','b','c'],
'val':[1,2,3]}
df1=pd.DataFrame(df)
df1
id val
0 a 1
1 b 2
2 c 3
df2={'yr':['2010','2011','2012'],
'val':[4,5,6]}
df2=pd.DataFrame(df2)
df2
yr val
0 2010 4
1 2011 5
2 2012 6
df={'id':['a','b','c'],
'val':[1,2,3],
'2010':[4,8,12],
'2011':[5,10,15],
'2012':[6,12,18]}
df=pd.DataFrame(df)
df
id val 2010 2011 2012
0 a 1 4 5 6
1 b 2 8 10 12
2 c 3 12 15 18
I can basically convert df1 and df2 as 1 by n matrices and get n by n result and assign it back to the df1. But is there any easy pandas way?
TL;DR
We can do it in one line like this:
df1.join(df1.val.apply(lambda x: x * df2.set_index('yr').val))
or like this:
df1.join(df1.set_index('id') # df2.set_index('yr').T, on='id')
Done.
The long story
Let's see what's going on here.
To find the output of multiplication of each df1.val by values in df2.val we use apply:
df1['val'].apply(lambda x: x * df2.val)
The function inside will obtain df1.vals one by one and multiply each by df2.val element-wise (see broadcasting for details if needed). As far as df2.val is a pandas sequence, the output is a data frame with indexes df1.val.index and columns df2.val.index. By df2.set_index('yr') we force years to be indexes before multiplication so they will become column names in the output.
DataFrame.join is joining frames index-on-index by default. So due to identical indexes of df1 and the multiplication output, we can apply df1.join( <the output of multiplication> ) as is.
At the end we get the desired matrix with indexes df1.index and columns id, val, *df2['yr'].
The second variant with # operator is actually the same. The main difference is that we multiply 2-dimentional frames instead of series. These are the vertical and horizontal vectors, respectively. So the matrix multiplication will produce a frame with indexes df1.id and columns df2.yr and element-wise multiplication as values. At the end we connect df1 with the output on identical id column and index respectively.
This works for me:
df2 = df2.T
new_df = pd.DataFrame(np.outer(df1['val'],df2.iloc[1:]))
df = pd.concat([df1, new_df], axis=1)
df.columns = ['id', 'val', '2010', '2011', '2012']
df
The output I get:
id val 2010 2011 2012
0 a 1 4 5 6
1 b 2 8 10 12
2 c 3 12 15 18
Your question is a bit vague. But I suppose you want to do something like that:
df = pd.concat([df1, df2], axis=1)

How to pull a specific value from one dataframe into another?

I have two dataframes
How would one populate the values in bold from df1 into the column 'Value' in df2?
Use melt on df1 before merge your 2 dataframes
tmp = df1.melt('Rating', var_name='Category', value_name='Value2')
df2['Value'] = df2.merge(tmp, on=['Rating', 'Category'])['Value2']
print(df2)
# Output
Category Rating Value
0 Hospitals A++ 2.5
1 Education AA 2.1

How do I offset a dataframe with values in another dataframe?

I have two dataframes. One is the basevales (df) and the other is an offset (df2).
How do I create a third dataframe that is the first dataframe offset by matching values (the ID) in the second dataframe?
This post doesn't seem to do the offset... Update only some values in a dataframe using another dataframe
import pandas as pd
# initialize list of lists
data = [['1092', 10.02], ['18723754', 15.76], ['28635', 147.87]]
df = pd.DataFrame(data, columns = ['ID', 'Price'])
offsets = [['1092', 100.00], ['28635', 1000.00], ['88273', 10.]]
df2 = pd.DataFrame(offsets, columns = ['ID', 'Offset'])
print (df)
print (df2)
>>> print (df)
ID Price
0 1092 10.02
1 18723754 15.76 # no offset to affect it
2 28635 147.87
>>> print (df2)
ID Offset
0 1092 100.00
1 28635 1000.00
2 88273 10.00 # < no match
This is want I want to produce: The price has been offset by matching
ID Price
0 1092 110.02
1 18723754 15.76
2 28635 1147.87
I've also looked at Pandas Merging 101
I don't want to add columns to the dataframe, and I don;t want to just replace column values with values from another dataframe.
What I want is to add (sum) column values from the other dataframe to this dataframe, where the IDs match.
The closest I come is df_add=df.reindex_like(df2) + df2 but the problem is that it sums all columns - even the ID column.
Try this :
df['Price'] = pd.merge(df, df2, on=["ID"], how="left")[['Price','Offset']].sum(axis=1)

Performance issue pandas 6 mil rows

need one help.
I am trying to concatenate two data frames. 1st has 58k rows, other 100. Want to concatenate in a way that each of 58k row has 100 rows from other df. So in total 5.8 mil rows.
Performance is very poor, takes 1 hr to do 10 pct. Any suggestions for improvement?
Here is code snippet.
def myfunc(vendors3,cust_loc):
cust_loc_vend = pd.DataFrame()
cust_loc_vend.empty
for i,row in cust_loc.iterrows():
clear_output(wait=True)
a= row.to_frame().T
df= pd.concat([vendors3, a],axis=1, ignore_index=False)
#cust_loc_vend = pd.concat([cust_loc_vend, df],axis=1, ignore_index=False)
cust_loc_vend= cust_loc_vend.append(df)
print('Current progress:',np.round(i/len(cust_loc)*100,2),'%')
return cust_loc_vend
For e.g. if first DF has 5 rows and second has 100 rows
DF1 (sample 2 columns)
I want a merged DF such that each row in DF 2 has All rows from DF1-
Well all you are looking for is a join.But since there is no column column, what you can do is create a column which is similar in both the dataframes and then drop it eventually.
df['common'] = 1
df1['common'] = 1
df2 = pd.merge(df, df1, on=['common'],how='outer')
df = df.drop('tmp', axis=1)
where df and df1 are dataframes.