Setting a value for the first n rows of a pandas DataFrame

I would like to set a value in some column for the first n rows of a pandas DataFrame.
>>> example = pd.DataFrame({'number':range(10),'name':list('aaabbbcccc')},index=range(20,0,-2)) # nontrivial index
>>> example
    name  number
20     a       0
18     a       1
16     a       2
14     b       3
12     b       4
10     b       5
8      c       6
6      c       7
4      c       8
2      c       9
I would like to set "number" for the first, say, 5 rows to the number 19. What I really want is to set the lowest values of "number" to that value, so I just sort first.
If my index was the trivial one, I could do
example.loc[:5-1,'number'] = 19 # -1 for inclusive indexing
# or
example.ix[:5-1,'number'] = 19
But since it's not, this produces the following artifact (everything down to index label 4, i.e. all rows except the last, gets selected):
>>> example
    name  number
20     a      19
18     a      19
16     a      19
14     b      19
12     b      19
10     b      19
8      c      19
6      c      19
4      c      19
2      c       9
Using .iloc[] would be nice, except that it doesn't accept column names.
example.iloc[:5]['number'] = 19
works but gives a SettingWithCopyWarning.
My current solution is to do:
>>> example.sort_values('number',inplace=True)
>>> example.reset_index(drop=True,inplace=True)
>>> example.ix[:5-1,'number'] = 19
>>> example
   name  number
0     a      19
1     a      19
2     a      19
3     b      19
4     b      19
5     b       5
6     c       6
7     c       7
8     c       8
9     c       9
And since I have to repeat this for several columns, I have to sort and reset the index each time, which also costs me my original index (but never mind that).
Does anyone have a better solution?

I would use .iloc as .loc might yield unexpected results if certain indexes are repeated.
example.iloc[:5, example.columns.get_loc('number')] = 19
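For reference, a minimal sketch of this applied to the example above (nothing beyond the question's own data; the column is addressed by name and converted to a position via get_loc, the rows purely by position):

import pandas as pd

example = pd.DataFrame({'number': range(10), 'name': list('aaabbbcccc')},
                       index=range(20, 0, -2))

# first five rows by position, column looked up by name and converted to a position
example.iloc[:5, example.columns.get_loc('number')] = 19

# the rows with index labels 20, 18, 16, 14, 12 now hold number == 19,
# independent of what the index labels actually are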

example.loc[example.index[:5], 'number'] = 19
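Here example.index[:5] picks the first five index labels by position, and the assignment then goes through .loc, so it works for any index. Note that if those first five labels were not unique, the label-based assignment could touch more rows than intended, which is the caveat the .iloc answer avoids.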

Related

pandas dataframe and how to find an element using row and column

Is there a way to find an element in a pandas DataFrame by using the row and column values? For example, if we have a list, L = [0,3,2,3,2,4,30,7], we can use L[2] and get the value 2 in return.
Use .iloc
df = pd.DataFrame({'L':[0,3,2,3,2,4,30,7], 'M':[10,23,22,73,72,14,130,17]})
    L    M
0   0   10
1   3   23
2   2   22
3   3   73
4   2   72
5   4   14
6  30  130
7   7   17
df.iloc[2]['L']      # chained indexing: row by position, then column by name
df.iloc[2:3, 0:1]    # positional slice: returns a 1x1 DataFrame holding the value
df.iat[2, 0]         # fast scalar access by integer position
2
df.iloc[6]['M']
df.iloc[6:7, 1:2]
df.iat[6, 1]
130
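As a small side-by-side sketch of the standard scalar accessors (not specific to this data; the label-based forms work here only because the index labels happen to be 0..7):

import pandas as pd

df = pd.DataFrame({'L': [0, 3, 2, 3, 2, 4, 30, 7],
                   'M': [10, 23, 22, 73, 72, 14, 130, 17]})

df.iat[2, 0]                          # integer row and column positions       -> 2
df.iloc[2, df.columns.get_loc('L')]   # row by position, column looked up by name -> 2
df.at[2, 'L']                         # label-based scalar access              -> 2
df.loc[2, 'L']                        # label-based, slightly more general     -> 2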

pandas: get top n including the duplicates of a sorted column

I have some data like
This is a table sorted by the score column and then by the cat column:
score cat
   18   B
   18   A
   17   A
   16   B
   16   A
   15   B
   14   B
   13   A
   12   A
   10   B
    9   B
I want to get the top 5 scores including the duplicates, and also add the rank, i.e.
rank  score cat
   1     18   B
   1     18   A
   2     17   A
   3     16   B
   3     16   A
   4     15   B
   5     14   B
How can I get this using pandas?
Since the data frame is already sorted, try factorize:
df['rnk'] = df.score.factorize()[0]+1
out = df[df['rnk'] <= 5]
out
   score cat  rnk
0     18   B    1
1     18   A    1
2     17   A    2
3     16   B    3
4     16   A    3
5     15   B    4
6     14   B    5
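If the frame is not guaranteed to be pre-sorted, a dense rank gives the same grouping without relying on row order (an alternative idiom, not part of the original answer):

# equal scores share a rank; the rank increases by 1 between distinct scores
df['rnk'] = df['score'].rank(method='dense', ascending=False).astype(int)
out = df[df['rnk'] <= 5]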

How to multiply dataframe columns with dataframe column in pandas?

I want to multiply DataFrame columns with a DataFrame column.
I have two DataFrames as shown here:
A DataFrame            B DataFrame
   a  b  c  d             e
   3  4  4  4             2
   3  3  3  3             3
   3  3  3  3             4
and I want to multiply A by B.
The multiplication result should be like this:
    a   b   c   d
    6   8   8   8
    9   9   9   9
   12  12  12  12
I tried just * multiplication but got a wrong result.
Thank you in advance!
Use B.values or B.to_numpy(), which will return a NumPy array, and then you can multiply it with the DataFrame.
Ex.:
>>> A
   a  b  c  d
0  3  4  4  4
1  3  3  3  3
2  3  3  3  3
>>> B
   c
0  2
1  3
2  4
>>> A * B.values
    a   b   c   d
0   6   8   8   8
1   9   9   9   9
2  12  12  12  12
Just another variation on @Dishin's excellent answer:
You can use the pandas mul method to multiply A by B, by selecting B's single column as a Series and multiplying on the index:
A.mul(B.iloc[:, 0], axis='index')
    a   b   c   d
0   6   8   8   8
1   9   9   9   9
2  12  12  12  12
Use DataFrame.mul with a Series by selecting the e column:
df = A.mul(B['e'], axis=0)
print (df)
    a   b   c   d
0   6   8   8   8
1   9   9   9   9
2  12  12  12  12
I think you are looking for the mul function, as seen in this thread; here is the code.
df = pd.DataFrame([[3, 4, 4, 4], [3, 3, 3, 3], [3, 3, 3, 3]])
val = [2, 3, 4]
df.mul(val, axis=0)
Here are the results:
    0   1   2   3
0   6   8   8   8
1   9   9   9   9
2  12  12  12  12
Ignore the indices.
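For context on why plain A * B looks wrong: pandas aligns on both the index and the column labels, so multiplying a frame with columns a..d by a frame whose only column has a different name produces NaN everywhere instead of broadcasting. A small sketch of the failure mode, using the A and B frames from the question (B has the single column e):

import pandas as pd

A = pd.DataFrame({'a': [3, 3, 3], 'b': [4, 3, 3], 'c': [4, 3, 3], 'd': [4, 3, 3]})
B = pd.DataFrame({'e': [2, 3, 4]})

A * B                  # columns don't align -> all-NaN frame with columns a, b, c, d, e
A.mul(B['e'], axis=0)  # align on the row index instead -> the intended result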

Update dataframe column with values from another dataframe by index

I have two DataFrames.
One of them contains: item id, name, quantity and price.
Another: item id, name and quantity.
The problem is to update names and quantities in the first DataFrame, taking the information from the second DataFrame by item id. Also, the first DataFrame does not contain all item ids, so I need to take into account only those rows from the second DataFrame which are also in the first one.
DataFrame 1
In [1]: df1
Out[1]:
   id name  quantity  price
0  10    X        10     15
1  11    Y        30     20
2  12    Z        20     15
3  13    X        15     10
4  14    X        12     15
DataFrame 2
In [2]: df2
Out[2]:
   id name  quantity
0  10    A         3
1  12    B         3
2  13    C         6
I've tried to use apply to iterate through the rows and modify the column values by condition, like this:
def modify(row):
    row['name'] = df2[df2['id'] == row['id']]['name'].get_values()[0]
    row['quantity'] = df2[df2['id'] == row['id']]['quantity'].get_values()[0]

df1.apply(modify, axis=1)
But it has no effect; DataFrame 1 is still the same.
I am expecting something like this first:
In [1]: df1
Out[1]:
   id name  quantity  price
0  10    A         3     15
1  11    Y        30     20
2  12    B         3     15
3  13    C         6     10
4  14    X        12     15
After that I want to drop the rows which were not modified, to get:
In [1]: df1
Out[1]:
   id name  quantity  price
0  10    A         3     15
1  12    B         3     15
2  13    C         6     10
Using update:
df1 = df1.set_index('id')
df1.update(df2.set_index('id'))
df1 = df1.reset_index()

Out[740]:
   id name  quantity  price
0  10    A       3.0     15
1  11    Y      30.0     20
2  12    B       3.0     15
3  13    C       6.0     10
4  14    X      12.0     15
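DataFrame.update modifies matching rows in place and, as the output above shows, upcasts the updated numeric columns to float. To also get the second expected result (only the rows that were actually updated), one possible follow-up sketch is to keep just the ids present in df2:

# keep only rows whose id occurs in df2, then restore the integer dtype if desired
df1 = df1[df1['id'].isin(df2['id'])].reset_index(drop=True)
df1['quantity'] = df1['quantity'].astype(int)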
new = df1.merge(df2, on='id')
new.drop(['name_x', 'quantity_x'], inplace=True, axis=1)
new.columns = ['id', 'price', 'name', 'quantity']
Output:
   id  price name  quantity
0  10     15    A         3
1  12     15    B         3
2  13     10    C         6

Split a column by element and create new ones with pandas

Goal: I want to split a single column by its elements (not the string cells) and, from that split, create new columns, where each element becomes the title of a new column and the values from the other column fill the respective new column.
Is there a way of doing that with pandas? Thanks in advance.
Example:
[IN]:
A   1
A   2
A   6
A  99
B   7
B   8
B  19
B  18
[OUT]:
    A   B
    1   7
    2   8
    6  19
   99  18
Just an alternative if the input data has 2 columns:
print(df)
  col1  col2
0    A     1
1    A     2
2    A     6
3    A    99
4    B     7
5    B     8
6    B    19
7    B    18

df1 = pd.DataFrame(df.groupby('col1')['col2'].apply(list).to_dict())
print(df1)
    A   B
0   1   7
1   2   8
2   6  19
3  99  18
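Note that building a DataFrame from a dict of plain lists like this assumes every group has the same number of rows; with unequal group sizes the constructor raises a length-mismatch error, whereas the unstack approach in the next answer simply fills the gaps with NaN.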
Use Series.str.split with GroupBy.cumcount for counter, then reshape by DataFrame.set_index with Series.unstack:
print (df)
     col
0    A 1
1    A 2
2    A 6
3   A 99
4    B 7
5    B 8
6   B 19
7   B 18

df1 = df['col'].str.split(expand=True)
g = df1.groupby(0).cumcount()
df2 = df1.set_index([0, g])[1].unstack(0).rename_axis(None, axis=1)
print (df2)
    A   B
0   1   7
1   2   8
2   6  19
3  99  18
If the input data has 2 columns:
print (df)
  col1  col2
0    A     1
1    A     2
2    A     6
3    A    99
4    B     7
5    B     8
6    B    19
7    B    18

g = df.groupby('col1').cumcount()
df2 = df.set_index(['col1', g])['col2'].unstack(0).rename_axis(None, axis=1)
print (df2)
    A   B
0   1   7
1   2   8
2   6  19
3  99  18
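As one more variation (a common pandas idiom rather than part of the answer above), the same per-group counter can feed DataFrame.pivot directly:

# 'idx' is just a temporary counter column used as the new row index
out = (df.assign(idx=df.groupby('col1').cumcount())
         .pivot(index='idx', columns='col1', values='col2')
         .rename_axis(None, axis=0)
         .rename_axis(None, axis=1))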