Pandas create new column with specific row values from dict

I have a dataframe:
ID val
1 a
2 b
3 c
4 d
5 a
7 d
6 v
8 j
9 k
10 a
I have a dictionary as follows:
{'aa': 3, 'bb': 3, 'cc': 4}
In the dictionary, the numerical values indicate the number of records. The sum of the values equals the number of rows in the data frame. In this example, 3 + 3 + 4 = 10 and the data frame has 10 rows.
I am trying to split the data frame into groups of rows whose sizes match the numbers given in the dictionary, and to fill the corresponding key into a new column. The desired output is as follows:
ID val new_col
1 a aa
2 b aa
3 c aa
4 d bb
5 a bb
6 v bb
7 d cc
8 j cc
9 k cc
10 a cc
The fill order is not important as long as the record counts match the counts given in the dict. I tried to solve this by iterating through the dict, but I could not isolate the right number of rows of the data frame for each key-value pair.
I have also tried pd.cut, using the dict values as bins and the keys as labels. However, I get the error ValueError: bins must increase monotonically.
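For what it's worth, pd.cut raises that error when the bin edges are not strictly increasing, which is what would happen if the raw counts (3, 3, 4) were passed as bins. A minimal sketch, assuming the goal is to label rows by position: use the cumulative sums of the counts as edges instead. The frame below only rebuilds the example data, and the name edges is illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 7, 6, 8, 9, 10],
    'val': list('abcdadvjka'),
})
d = {'aa': 3, 'bb': 3, 'cc': 4}

# Bin edges must increase, so use cumulative counts: [0, 3, 6, 10]
edges = np.cumsum([0] + list(d.values()))

# Bin the row positions 0..9; right=False gives [0, 3), [3, 6), [6, 10)
df['new_col'] = pd.cut(np.arange(len(df)), bins=edges,
                       labels=list(d.keys()), right=False)
```

Note that this produces a categorical column; call .astype(str) on it if plain strings are preferred.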

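import numpy as np
import pandas as pd
d = {'aa': 3, 'bb': 3, 'cc': 4}
# Build one array per key (the key repeated count times), then flatten with explode
df['new_col'] = pd.Series([np.repeat(i, j) for i, j in d.items()]).explode().to_numpy()
df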
Out[64]:
ID val new_col
0 1 a aa
1 2 b aa
2 3 c aa
3 4 d bb
4 5 a bb
5 7 d bb
6 6 v cc
7 8 j cc
8 9 k cc
9 10 a cc
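The same labels can probably be produced without the intermediate Series, since np.repeat accepts a sequence of repeat counts. This is a sketch of that variant rather than the answer's own method, and it assumes the counts sum to len(df) as stated in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the question
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 7, 6, 8, 9, 10],
    'val': list('abcdadvjka'),
})
d = {'aa': 3, 'bb': 3, 'cc': 4}

# Repeat each key by its own count; the result has exactly len(df) labels
df['new_col'] = np.repeat(list(d.keys()), list(d.values()))
```

If the counts do not add up to the number of rows, the assignment raises a length-mismatch error, which doubles as a sanity check.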

Related

keep all column after sum and groupby including empty values

I have the following dataframe:
source name cost other_c other_b
a a 7 dd 33
b a 6 gg 44
c c 3 ee 55
b a 2
d b 21 qw 21
e a 16 aq
c c 10 55
I am summing cost, grouped by source and name, with:
new_df = df.groupby(['source', 'name'], as_index=False)['cost'].sum()
but it drops the remaining 6 columns in my dataframe. Is there a way to keep the rest of the columns? I'm not looking to add a new column, just to carry over the columns from the original dataframe.
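One possible approach is to aggregate every column explicitly: sum cost and take the first non-missing value of each column that should be carried over. The sketch below assumes that "first" is an acceptable rule for the extra columns, and the placement of the empty cells in the rebuilt frame is a guess, since the flattened table does not show which column each blank belongs to.

```python
import numpy as np
import pandas as pd

# Example data from the question; positions of the missing cells are assumed
df = pd.DataFrame({
    'source': list('abcbdec'),
    'name': list('aacabac'),
    'cost': [7, 6, 3, 2, 21, 16, 10],
    'other_c': ['dd', 'gg', 'ee', None, 'qw', 'aq', None],
    'other_b': [33, 44, 55, np.nan, 21, np.nan, 55],
})

# Sum cost per group; keep the first non-missing value of the other columns
new_df = df.groupby(['source', 'name'], as_index=False).agg(
    {'cost': 'sum', 'other_c': 'first', 'other_b': 'first'}
)
```

Groups where a carried-over column is entirely missing keep a missing value in the result.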

Replace value in column based on value in another column

I have a dataframe with 3240 rows and 3 columns. Column Block gives the block in which the values in columns A and B appeared. There are 6 unique blocks, repeating in sequence from 1 to 6 throughout the whole dataframe. Within each block, the values in column A run in exact order from 1 to 10. Column B holds the values a-j (n = 10) in random order, so no value is duplicated within a block.
The dataframe looks like this:
Block A B ID
1 1 a XY
1 2 b XY
1 3 c XY
1 4 d XY
1 5 e XY
1 6 f XY
1 7 g XY
1 8 h XY
1 9 i XY
1 10 j XY
....
6 1 d XY
...
6 6 j XY
....
1 1 g XX
1 2 a XX
Throughout the dataframe, I would like to replace the values in column B based on the corresponding value in column A, separately for each Block. The logic is to exchange the B values of the rows whose A values are paired as 1=6, 2=7, 3=8, 4=9, 5=10.
Result would look like this:
Block A B ID
1 1 f XY
1 2 g XY
1 3 h XY
1 4 i XY
1 5 j XY
1 6 a XY
1 7 b XY
1 8 c XY
1 9 d XY
1 10 e XY
....
6 1 j XY
...
6 6 d XY
....
1 1 g XX
1 2 a XX
What would be an efficient way to do this?

You want to identify the block of 5 within each block of 10 and swap them. This is my solution:
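df['B'] = (
    df.assign(
        # alternates 1, 0 every 5 rows, so the second half of each ten sorts first
        blk_5=(np.arange(len(df)) // 5 + 1) % 2,
        # numbers each run of 10 consecutive rows
        blk_10=np.arange(len(df)) // 10,
    )
    # note: sorting on 'Block' assumes the rows are already ordered by Block
    .sort_values(['Block', 'blk_10', 'blk_5'])['B']
    .values
)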
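The same swap can also be written purely in terms of row positions, without sorting. This is a sketch under the assumption that every run of 10 consecutive rows is one block with A going from 1 to 10 and that len(df) is a multiple of 10; the small frame and the names pos and src are illustrative only.

```python
import numpy as np
import pandas as pd

# Two consecutive groups of 10 rows, mimicking the layout in the question
df = pd.DataFrame({
    'Block': [1] * 10 + [2] * 10,
    'A': list(range(1, 11)) * 2,
    'B': list('abcdefghij') + list('jihgfedcba'),
})

pos = np.arange(len(df))
# Within each run of 10, positions 0-4 read from 5-9 and positions 5-9 from 0-4
src = (pos // 10) * 10 + (pos % 10 + 5) % 10
df['B'] = df['B'].to_numpy()[src]
```

Because nothing is sorted, this does not depend on how the Block values are ordered across the dataframe.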

Pandas, multiply part of one DF against another based on condition

Pretty new to this and am having trouble finding the right way to do this.
Say I have dataframe1 looking like this with column names and a bunch of numbers as data:
D L W S
1 2 3 4
4 3 2 1
1 2 3 4
and I have dataframe2 looking like this:
Name1 Name2 Name3 Name4
2 data data D
3 data data S
4 data data L
5 data data S
6 data data W
I would like a new dataframe produced with the result of multiplying each row of the second dataframe against each row of the first dataframe, where it multiplies the value of Name1 against the value in the column of dataframe1 which matches the Name4 value of dataframe2.
Is there any nice way to do this? I was trying to look at using methods like where, condition, and apply but haven't been understanding things well enough to get something working.
EDIT: Use the following code to create fake data for the DataFrames:
d1 = {'D':[1,2,3,4,5,6],'W':[2,2,2,2,2,2],'L':[6,5,4,3,2,1],'S':[1,2,3,4,5,6]}
d2 = {'col1': [3,2,7,4,5,6], 'col2':[2,2,2,2,3,4], 'col3':['data', 'data', 'data','data', 'data', 'data' ], 'col4':['D','L','D','W','S','S']}
df1 = pd.DataFrame(data = d1)
df2 = pd.DataFrame(data = d2)
EDIT AGAIN FOR MORE INFO
First I changed the data in df1 at this point so this new example will turn out better.
Okay, so from those two dataframes, the dataframe I'd like to create would come out like this if the multiplication went through for the first four rows of df2. You can see that col2 and col3 are unchanged, but depending on the letter in col4, col1 was multiplied by the corresponding factor from df1:
d3 = { 'col1':[3,6,9,12,15,18,12,10,8,6,4,2,7,14,21,28,35,42,8,8,8,8,8,8], 'col2':[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2], 'col3':['data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data','data'], 'col4':['D','D','D','D','D','D','L','L','L','L','L','L','D','D','D','D','D','D','W','W','W','W','W','W']}
df3 = pd.DataFrame(data = d3)

I think I understand what you are trying to achieve. You want to multiply each row r in df2 with the corresponding column c in df1, but the elements of c are only multiplied with the first element of r; the rest of the row doesn't change.
I was thinking there might be a way to join df1.transpose() and df2 but I didn't find one.
While not pretty, I think the code below solves your problem:
def stretch(row):
    # repeat this df2 row once per row of df1
    repeated_rows = pd.concat([row]*len(df1), axis=1, ignore_index=True).transpose()
    factor = row['col1']
    label = row['col4']
    # scale the df1 column named by col4 with this row's col1 value
    first_column = df1[label] * factor
    repeated_rows['col1'] = first_column
    return repeated_rows

pd.concat((stretch(r) for _, r in df2.iterrows()), ignore_index=True)
# resulting in
col1 col2 col3 col4
0 3 2 data D
1 6 2 data D
2 9 2 data D
3 12 2 data D
4 15 2 data D
5 18 2 data D
6 12 2 data L
7 10 2 data L
8 8 2 data L
9 6 2 data L
10 4 2 data L
11 2 2 data L
12 7 2 data D
13 14 2 data D
14 21 2 data D
15 28 2 data D
16 35 2 data D
17 42 2 data D
18 8 2 data W
19 8 2 data W
20 8 2 data W
21 8 2 data W
22 8 2 data W
23 8 2 data W
...
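For larger frames, a loop-free variant may be worth trying. The sketch below is an alternative to the stretch function above, not a rewrite of it: it selects the df1 columns named in col4, scales them by col1 through broadcasting, and flattens the result in df2 row order. The names factors and out are illustrative.

```python
import pandas as pd

d1 = {'D': [1, 2, 3, 4, 5, 6], 'W': [2, 2, 2, 2, 2, 2],
      'L': [6, 5, 4, 3, 2, 1], 'S': [1, 2, 3, 4, 5, 6]}
d2 = {'col1': [3, 2, 7, 4, 5, 6], 'col2': [2, 2, 2, 2, 3, 4],
      'col3': ['data'] * 6, 'col4': ['D', 'L', 'D', 'W', 'S', 'S']}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

# Column j of `factors` is the df1 column named by row j of df2['col4']
factors = df1[df2['col4'].tolist()].to_numpy()

# Repeat every df2 row len(df1) times, keeping the original row order
out = df2.loc[df2.index.repeat(len(df1))].reset_index(drop=True)

# Scale each factor column by that row's col1, then flatten one df2 row at a time
out['col1'] = (factors * df2['col1'].to_numpy()).T.ravel()
```

The first 24 rows of out should match the col1 values listed in d3 in the question.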

how can I applying multiple row data in dataframe

I am a newbie to Python and pandas. I have a huge data set, and instead of applying a function row by row, I want to apply it to a batch of rows and assign each result back to its corresponding row.
Example:
ID Values
a 2
b 3
c 4
d 5
e 6
f 7
def function(x):
    # makes a call to an API and returns the values related to x
    return response
df['squared_values'] = df['Values'].apply(function)
The above applies the function row by row, which is time-consuming.
I need a way to do batch operations on rows.
Example:
batch = 3
df['squared_values'] = df['Values'].apply(lambda batch: function(batch))
on first pass values should be
ID Values squared_values
a 2 4
b 3 9
c 4 16
d 5
e 6
f 7
on second pass
ID Values squared_values
a 2 4
b 3 9
c 4 16
d 5 25
e 6 36
f 7 49

Is this operation really too slow?
df['squared_values'] = df['Values'] ** 2
You can also restrict the update to a slice of rows:
df.loc[df.index[0:4], 'squared_values'] = df['Values'].iloc[0:4] ** 2
But I can't imagine this being quicker.
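If the real function is an API call that accepts several values at once, one option is to walk the column in slices of the batch size and collect the results. In the sketch below, process_batch is a hypothetical stand-in for such a call (it just squares its inputs), and it is assumed to return one result per input in the same order.

```python
import pandas as pd

def process_batch(values):
    # Hypothetical stand-in for one API call that handles a whole batch
    return [v ** 2 for v in values]

df = pd.DataFrame({'ID': list('abcdef'), 'Values': [2, 3, 4, 5, 6, 7]})
batch = 3

results = []
for start in range(0, len(df), batch):
    # Take the next `batch` rows by position and send them in one call
    chunk = df['Values'].iloc[start:start + batch]
    results.extend(process_batch(chunk.tolist()))

# Results come back in row order, so they line up with the original rows
df['squared_values'] = results
```

The last slice is simply shorter when len(df) is not a multiple of the batch size.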

Python: Add column to panda data frame with different column length

I have a pandas dataframe and would like to add data columns using one common column as the index. Where the new data has no entry for an index value, a 0 should be entered. The new column will have a different length. Is there a better way than using a loop? Example below.
main Dataframe:
index_column date value
1 1 A
2 2 B
3 3 C
4 4 D
add new column:
date value
2 G
3 J
Result:
index_column date value new value
1 1 A 0
2 2 B G
3 3 C J
4 4 D 0
Many thanks!
Rolf
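A left merge on the shared column is one way to avoid the loop. The sketch below assumes date is the common column and renames the incoming value column to "new value" to match the desired result; the frame names main and new are illustrative.

```python
import pandas as pd

main = pd.DataFrame({
    'index_column': [1, 2, 3, 4],
    'date': [1, 2, 3, 4],
    'value': list('ABCD'),
})
new = pd.DataFrame({'date': [2, 3], 'value': ['G', 'J']})

# Left merge keeps every row of main; dates missing from new become NaN
result = main.merge(new.rename(columns={'value': 'new value'}),
                    on='date', how='left')

# Replace the gaps with 0 (cast to object so strings and 0 can be mixed)
result['new value'] = result['new value'].astype(object).fillna(0)
```

The merge keeps the row order of main, so the result lines up with the original frame.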
