My problem is similar to split a dataframe into chunks of N rows problem, expect that the number of rows in each chunk will be different. I have a datafame as such:
A
B
C
1
2
0
1
2
1
1
2
2
1
2
0
1
2
1
1
2
2
1
2
3
1
2
4
1
2
0
A and B are just whatever don't pay attention. Column C though starts at 0 and increments with each row until it suddenly resets to 0. So in the dataframe included the first 3 rows are a new dataframe, then the next 5 are a second new dataframe, and this continues as my dataframe adds more and more rows.
To finish off the question,
df = [x for _, x in df.groupby(df['C'].eq(0).cumsum())]
allows me to group all the subgroups and then with this groupby I can select each subgroups as a separate dataframe.
Pretty new to this and am having trouble finding the right way to do this.
Say I have dataframe1 looking like this with column names and a bunch of numbers as data:
D L W S
1 2 3 4
4 3 2 1
1 2 3 4
and I have dataframe2 looking like this:
Name1 Name2 Name3 Name4
2 data data D
3 data data S
4 data data L
5 data data S
6 data data W
I would like a new dataframe produced with the result of multiplying each row of the second dataframe against each row of the first dataframe, where it multiplies the value of Name1 against the value in the column of dataframe1 which matches the Name4 value of dataframe2.
Is there any nice way to do this? I was trying to look at using methods like where, condition, and apply but haven't been understanding things well enough to get something working.
EDIT: Use the following code to create fake data for the DataFrames:
d1 = {'D':[1,2,3,4,5,6],'W':[2,2,2,2,2,2],'L':[6,5,4,3,2,1],'S':[1,2,3,4,5,6]}
d2 = {'col1': [3,2,7,4,5,6], 'col2':[2,2,2,2,3,4], 'col3':['data', 'data', 'data','data', 'data', 'data' ], 'col4':['D','L','D','W','S','S']}
df1 = pd.DataFrame(data = d1)
df2 = pd.DataFrame(data = d2)
EDIT AGAIN FOR MORE INFO
First I changed the data in df1 at this point so this new example will turn out better.
Okay so from those two dataframes the data frame I'd like to create would come out like this if the multiplication when through for the first four rows of df2. You can see that Col2 and Col3 are unchanged, but depending on the letter of Col4, Col1 was multiplied with the corresponding factor from df1:
d3 = { 'col1':[3,6,9,12,15,18,12,10,8,6,4,2,7,14,21,28,35,42,8,8,8,8,8,8], 'col2':[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2], 'col3':['data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data'], 'col4':['D','D','D','D','D','D','L','L','L','L','L','L','D','D','D','D','D','D','W','W','W','W','W','W']}
df3 = pd.DataFrame(data = d3)
I think I understand what you are trying to achieve. You want to multiply each row r in df2 with the corresponding column c in df1 but the elements from c are only multiplied with the first element in r the rest of the row doesn't change.
I was thinking there might be a way to join df1.transpose() and df2 but I didn't find one.
While not pretty, I think the code below solves your problem:
def stretch(row):
repeated_rows = pd.concat([row]*len(df1), axis=1, ignore_index=True).transpose()
factor = row['col1']
label = row['col4']
first_column = df1[label] * factor
repeated_rows['col1'] = first_column
return repeated_rows
pd.concat((stretch(r) for _, r in df2.iterrows()), ignore_index=True)
#resulting in
col1 col2 col3 col4
0 3 2 data D
1 6 2 data D
2 9 2 data D
3 12 2 data D
4 15 2 data D
5 18 2 data D
0 12 2 data L
1 10 2 data L
2 8 2 data L
3 6 2 data L
4 4 2 data L
5 2 2 data L
0 7 2 data D
1 14 2 data D
2 21 2 data D
3 28 2 data D
4 35 2 data D
5 42 2 data D
0 8 2 data W
1 8 2 data W
2 8 2 data W
3 8 2 data W
4 8 2 data W
5 8 2 data W
...