I have a dataframe with 3240 rows and four columns: Block, A, B, and ID. Column Block indicates the block in which the values in columns A and B appeared. There are 6 unique blocks, and they repeat in sequence (1-6) throughout the whole dataframe. The values in column A repeat in the exact order 1-10 throughout the whole dataframe (and thus within every block). The values in column B run from a-j (n = 10) and also repeat in sequences of a-j, but in random order, so they are never duplicated within a block.
So in each of the 6 blocks, the values in column A (1-10) repeat in exact order from 1-10, while the values in column B (a-j) repeat in random order.
The dataframe looks like this:
Block A B ID
1 1 a XY
1 2 b XY
1 3 c XY
1 4 d XY
1 5 e XY
1 6 f XY
1 7 g XY
1 8 h XY
1 9 i XY
1 10 j XY
....
6 1 d XY
...
6 6 j XY
....
1 1 g XX
1 2 a XX
Throughout the dataframe I would like to replace all values in column B based on the corresponding value in column A, separately for each block. The logic is to swap values in column B between A positions following the pattern 1=6, 2=7, 3=8, 4=9, 5=10 (and, conversely, 6=1, 7=2, 8=3, 9=4, 10=5).
Result would look like this:
Block A B ID
1 1 f XY
1 2 g XY
1 3 h XY
1 4 i XY
1 5 j XY
1 6 a XY
1 7 b XY
1 8 c XY
1 9 d XY
1 10 e XY
....
6 1 j XY
...
6 6 d XY
....
1 1 g XX
1 2 a XX
What would be an efficient way to do this?
You want to identify the block of 5 within each block of 10 and swap them. This is my solution (note that blk_10 already encodes the running position of each 10-row block, so it alone is enough to sort on; sorting by Block as well would shuffle rows across the dataframe, since Block values repeat):

import numpy as np

df['B'] = (df.assign(blk_5=(np.arange(len(df)) // 5 + 1) % 2,  # 1 = first half of 10, 0 = second half
                     blk_10=np.arange(len(df)) // 10)          # running 10-row block index
             .sort_values(['blk_10', 'blk_5'])
             ['B'].values)
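For reference, the same half-swap can be sketched with np.roll applied per 10-row block. The dataframe here is hypothetical toy data (two blocks), not the asker's 3240-row frame:

```python
import numpy as np
import pandas as pd

# Toy data: two 10-row blocks, A in fixed order 1-10,
# B in some order of a-j within each block.
df = pd.DataFrame({
    'Block': [1] * 10 + [2] * 10,
    'A': list(range(1, 11)) * 2,
    'B': list('abcdefghij') + list('badcfehgji'),
})

# Rolling B by 5 within every 10-row block swaps the first and second
# halves, i.e. B at A=1..5 trades places with B at A=6..10.
df['B'] = df.groupby(df.index // 10)['B'].transform(
    lambda s: np.roll(s.to_numpy(), 5))
```

For the first block this turns a b c d e f g h i j into f g h i j a b c d e, matching the expected output above.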
Pretty new to this and am having trouble finding the right way to do this.
Say I have dataframe1 looking like this with column names and a bunch of numbers as data:
D L W S
1 2 3 4
4 3 2 1
1 2 3 4
and I have dataframe2 looking like this:
Name1 Name2 Name3 Name4
2 data data D
3 data data S
4 data data L
5 data data S
6 data data W
I would like to produce a new dataframe by multiplying each row of the second dataframe against the first dataframe: for each row of dataframe2, multiply its Name1 value by every value in the dataframe1 column whose name matches that row's Name4 value.
Is there any nice way to do this? I was trying to look at methods like where and apply, but I haven't understood them well enough to get something working.
EDIT: Use the following code to create fake data for the DataFrames:
d1 = {'D':[1,2,3,4,5,6],'W':[2,2,2,2,2,2],'L':[6,5,4,3,2,1],'S':[1,2,3,4,5,6]}
d2 = {'col1': [3,2,7,4,5,6], 'col2':[2,2,2,2,3,4], 'col3':['data', 'data', 'data','data', 'data', 'data' ], 'col4':['D','L','D','W','S','S']}
df1 = pd.DataFrame(data = d1)
df2 = pd.DataFrame(data = d2)
EDIT AGAIN FOR MORE INFO
First, note that I changed the data in df1 at this point so this new example turns out better.
Okay, so from those two dataframes, the dataframe I'd like to create would come out like this if the multiplication went through for the first four rows of df2. You can see that col2 and col3 are unchanged, but depending on the letter in col4, col1 was multiplied by the corresponding factors from df1:
d3 = {'col1': [3, 6, 9, 12, 15, 18, 12, 10, 8, 6, 4, 2, 7, 14, 21, 28, 35, 42, 8, 8, 8, 8, 8, 8],
      'col2': [2] * 24,
      'col3': ['data'] * 24,
      'col4': ['D'] * 6 + ['L'] * 6 + ['D'] * 6 + ['W'] * 6}
df3 = pd.DataFrame(data = d3)
I think I understand what you are trying to achieve. You want to multiply each row r of df2 with the corresponding column c of df1, but the elements of c are only multiplied with the first element of r; the rest of the row doesn't change.
I was thinking there might be a way to join df1.transpose() and df2 but I didn't find one.
While not pretty, I think the code below solves your problem:

import pandas as pd

def stretch(row):
    # Repeat the df2 row once for every row of df1.
    repeated_rows = pd.concat([row] * len(df1), axis=1,
                              ignore_index=True).transpose()
    factor = row['col1']
    label = row['col4']
    # Multiply the matching df1 column by the row's col1 value.
    first_column = df1[label] * factor
    repeated_rows['col1'] = first_column
    return repeated_rows

pd.concat((stretch(r) for _, r in df2.iterrows()), ignore_index=True)
#resulting in (indices run 0-23 because of ignore_index=True)
    col1 col2 col3 col4
0   3    2    data D
1   6    2    data D
2   9    2    data D
3   12   2    data D
4   15   2    data D
5   18   2    data D
6   12   2    data L
7   10   2    data L
8   8    2    data L
9   6    2    data L
10  4    2    data L
11  2    2    data L
12  7    2    data D
13  14   2    data D
14  21   2    data D
15  28   2    data D
16  35   2    data D
17  42   2    data D
18  8    2    data W
19  8    2    data W
20  8    2    data W
21  8    2    data W
22  8    2    data W
23  8    2    data W
...
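A vectorized alternative is to repeat every df2 row once per df1 row and then look up all the factors in one shot with numpy fancy indexing. This is only a sketch; out is my own name for the result:

```python
import numpy as np
import pandas as pd

d1 = {'D': [1, 2, 3, 4, 5, 6], 'W': [2, 2, 2, 2, 2, 2],
      'L': [6, 5, 4, 3, 2, 1], 'S': [1, 2, 3, 4, 5, 6]}
d2 = {'col1': [3, 2, 7, 4, 5, 6], 'col2': [2, 2, 2, 2, 3, 4],
      'col3': ['data'] * 6, 'col4': ['D', 'L', 'D', 'W', 'S', 'S']}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

# Repeat every df2 row once per df1 row, keeping order.
out = df2.loc[df2.index.repeat(len(df1))].reset_index(drop=True)

# For each repeated row, pick the df1 value at (position within df1,
# column named in col4) and multiply it into col1.
row_pos = np.tile(np.arange(len(df1)), len(df2))
col_pos = df1.columns.get_indexer(out['col4'])
out['col1'] = out['col1'].to_numpy() * df1.to_numpy()[row_pos, col_pos]
```

This avoids the per-row concat and produces the same col1 values as the stretch approach above.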
I am a newbie to Python and pandas. I have a huge data set, and instead of applying a function row by row, I want to apply it to a batch of rows and write the results back to the corresponding rows.
Example:
ID Values
a 2
b 3
c 4
d 5
e 6
f 7
df['squared_values'] = df['Values'].apply(lambda row: function(row))

def function(x):
    # makes a call to an API and returns values related to x
    return response
The above applies the function row by row, which is time-consuming. I need a way to do the operation on batches of rows, for example:
batch=3
df['squared_values']= df['values'].apply(lambda batch: function(batch))
On the first pass, the values should be:
ID Values squared_values
a 2 4
b 3 9
c 4 16
d 5
e 6
f 7
and on the second pass:
ID Values squared_values
a 2 4
b 3 9
c 4 16
d 5 25
e 6 36
f 7 49
Is this operation really too slow?
df['squared_values'] = df['Values'] ** 2
You can always use iloc to select rows:

df['squared_values'].update(df['Values'].iloc[0:4] ** 2)
But I can't imagine this being quicker
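If the per-row work really is an API call rather than arithmetic, the usual trick is to slice the column into chunks and call the function once per chunk. A sketch, where function just squares its whole batch as a stand-in for a hypothetical batched API call:

```python
import pandas as pd

df = pd.DataFrame({'ID': list('abcdef'), 'Values': [2, 3, 4, 5, 6, 7]})

def function(batch):
    # Stand-in for a batched API call: takes a Series of values
    # and returns a Series of results (here, their squares).
    return batch ** 2

batch_size = 3
pieces = [function(df['Values'].iloc[i:i + batch_size])
          for i in range(0, len(df), batch_size)]
# concat preserves the original index, so results align row-for-row.
df['squared_values'] = pd.concat(pieces)
```

Each slice keeps its original index, so the concatenated result lines up with the source rows without any extra bookkeeping.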