Append dataframe in specific row - pandas

I have dataframe in the following format
a b label
1 5 A
2 6 A
3 7 A
4 8 B
1 5 B
2 6 B
5 6 C
3 2 C
I want append with new dataframe
a b label
3 4 A
The result become this
a b label
1 5 A
2 6 A
3 7 A
4 8 B
1 5 B
2 6 B
5 6 C
3 2 C
3 4 A <-- New Data
My question is how order new data become this every append new data
a b label
1 5 A
2 6 A
3 7 A
3 4 A <-- New Data
4 8 B
1 5 B
2 6 B
5 6 C
3 2 C
This is my code
import pandas as pd
df1 = pd.DataFrame({"a":[1, 2, 3, 4, 1, 2,5,3],
"b":[5, 6, 7, 8, 5, 6,6,2],
"label":['A','A','A','B','B','B','C','C']})
new_data = pd.DataFrame({"a":[3],
"b":[4],
"label":['A']})
df1 = df1.append(new_data,ignore_index = True)

You can simply sort it on the label column after the data frame append
import numpy as np
import pandas as pd
df1 = pd.DataFrame({"a":[1, 2, 3, 4, 1, 2,5,3],
"b":[5, 6, 7, 8, 5, 6,6,2],
"label":['A','A','A','B','B','B','C','C']})
new_data = pd.DataFrame({"a":[3],
"b":[4],
"label":['A']})
df1 = df1.append(new_data,ignore_index = True).sort_values(by='label')
Result :
a b label
1 5 A
2 6 A
3 7 A
3 4 A <-- new data here
4 8 B
1 5 B
2 6 B
5 6 C
3 2 C

Related

Passing Tuple to a function via apply

I am trying to run below function which takes two points..
point A=(2,3)
point B=(4,5
def Somefunc(pointA, point B):
x= pointA[0] + pointB[1]
return x
Now, when in try to create a separate column based on this fucntion, it is throwing me errors like cannot convert the series to <class 'float'>, so I tried this
df['T']=df.apply(Somefunc((df['A'].apply(lambda x: float(x)),df['B'].apply(lambda x: float(x))),\
(df['C'].apply(lambda x: float(x)),df['D'].apply(lambda x: float(x)))),axis=0))
Sample dataframe below;
A B C D
1 2 3 5
2 4 7 8
4 7 9 0
Any help will be appreciated.
This is the best guess I can make as to what you're trying to do:
df['T']=df.apply(lambda row: [(row['A'],row['B']),(row['C'],row['D'])],axis=1)
Edit: to apply your function;
df['T'] = df.apply(lambda row: SomeFunc((row['A'],row['B']),(row['C'],row['D'])),axis=1)
that being said, the same result can be achieved much quicker and idiomatically like so:
>>> df
A B C D
0 2 7 3 3
1 3 1 5 7
2 2 0 6 2
3 3 9 5 9
4 0 2 3 7
>>> df['T']=df.apply(tuple,axis=1)
>>> df
A B C D T
0 2 7 3 3 (2, 7, 3, 3)
1 3 1 5 7 (3, 1, 5, 7)
2 2 0 6 2 (2, 0, 6, 2)
3 3 9 5 9 (3, 9, 5, 9)
4 0 2 3 7 (0, 2, 3, 7)

How to multiply dataframe columns with dataframe column in pandas?

I want to multiply hdataframe columns with dataframe column.
I have two dataframews as shown here:
A dataframe, B dataframe
a b c d e
3 4 4 4 2
3 3 3 3 3
3 3 3 3 4
and I want to make multiplication A and B.
Multiplication result should be like this:
a b c d
6 8 8 8
9 9 9 9
12 12 12 12
I tried just * multiplication but got a wrong result.
Thank you in advance!
Use B.values or B.to_numpy() which will return numpy array and then you can multiply with DataFrame
Ex.:
>>> A
a b c d
0 3 4 4 4
1 3 3 3 3
2 3 3 3 3
>>> B
c
0 2
1 3
2 4
>>> A * B.values
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
Just another variation on #Dishin's excellent answer:
U can use pandas mul method to multiply A by B, by setting B as a series and multiplying on the index:
A.mul(B.iloc[:,0],axis='index')
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
Use DataFrame.mul with Series by selecting e column:
df = A.mul(B['e'], axis=0)
print (df)
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
I think you are looking for the mul function, as seen on this thread here, here is the code.
df = pd.DataFrame([[3, 4, 4, 4],[3, 3, 3, 3],[3, 3, 3, 3]])
val = [2,3,4]
df.mul(val, axis = 0)
Here are the results:
0 1 2 3
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
Ignore the indices.

Renaming column of one dataframe by extracting from combination of series and dataframe column names

In the line below, I am renaming the columns of pnlsummary dataframe from the column names of three series (totalheldmw, totalcost and totalsellprofit) and one dataframe (totalheldprofit).
The difficulty I have is to iterate over the column names of the dataframe. I have manually assigned the names as you can see below. I would suppose there is an efficient way of iterating over the column names of the dataframe. Please advice.
pnlsummary.columns =
[totalheldmw.name[0],totalcost.name[0],totalsellprofit.name[0],
totalheldprofit.columns[0],totalheldprofit.columns[1],
totalheldprofit.columns[2],totalheldprofit.columns[3]]
I think you need create list by constants and then add columns names converted to list:
pnlsummary.columns = [totalheldmw.name[0],totalcost.name[0],totalsellprofit.name[0]] +
totalheldprofit.columns[0:3].astype(str).tolist()
Sample:
df = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
df.columns = ['a','s','d'] + df.columns[0:3].tolist()
print (df)
a s d A B C
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b

Separate aggregated data in different rows [duplicate]

This question already has answers here:
How can I replicate rows of a Pandas DataFrame?
(10 answers)
Closed 11 months ago.
I want to replicate rows in a Pandas Dataframe. Each row should be repeated n times, where n is a field of each row.
import pandas as pd
what_i_have = pd.DataFrame(data={
'id': ['A', 'B', 'C'],
'n' : [ 1, 2, 3],
'v' : [ 10, 13, 8]
})
what_i_want = pd.DataFrame(data={
'id': ['A', 'B', 'B', 'C', 'C', 'C'],
'v' : [ 10, 13, 13, 8, 8, 8]
})
Is this possible?
You can use Index.repeat to get repeated index values based on the column then select from the DataFrame:
df2 = df.loc[df.index.repeat(df.n)]
id n v
0 A 1 10
1 B 2 13
1 B 2 13
2 C 3 8
2 C 3 8
2 C 3 8
Or you could use np.repeat to get the repeated indices and then use that to index into the frame:
df2 = df.loc[np.repeat(df.index.values, df.n)]
id n v
0 A 1 10
1 B 2 13
1 B 2 13
2 C 3 8
2 C 3 8
2 C 3 8
After which there's only a bit of cleaning up to do:
df2 = df2.drop("n", axis=1).reset_index(drop=True)
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
Note that if you might have duplicate indices to worry about, you could use .iloc instead:
df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
which uses the positions, and not the index labels.
You could use set_index and repeat
In [1057]: df.set_index(['id'])['v'].repeat(df['n']).reset_index()
Out[1057]:
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
Details
In [1058]: df
Out[1058]:
id n v
0 A 1 10
1 B 2 13
2 C 3 8
It's something like the uncount in tidyr:
https://tidyr.tidyverse.org/reference/uncount.html
I wrote a package (https://github.com/pwwang/datar) that implements this API:
from datar import f
from datar.tibble import tribble
from datar.tidyr import uncount
what_i_have = tribble(
f.id, f.n, f.v,
'A', 1, 10,
'B', 2, 13,
'C', 3, 8
)
what_i_have >> uncount(f.n)
Output:
id v
0 A 10
1 B 13
1 B 13
2 C 8
2 C 8
2 C 8
Not the best solution, but I want to share this: you could also use pandas.reindex() and .repeat():
df.reindex(df.index.repeat(df.n)).drop('n', axis=1)
Output:
id v
0 A 10
1 B 13
1 B 13
2 C 8
2 C 8
2 C 8
You can further append .reset_index(drop=True) to reset the .index.

How to prepend pandas data frames

How can I prepend a dataframe to another dataframe? Consider dataframe A:
b c d
2 3 4
6 7 8
and dataFrame B:
a
1
5
I want to prepend A to B to get:
a b c d
1 2 3 4
5 6 7 8
2 methods:
In [1]: df1 = DataFrame(randint(0,10,size=(12)).reshape(4,3),columns=list('bcd'))
In [2]: df1
Out[2]:
b c d
0 5 9 5
1 8 4 0
2 8 4 5
3 4 9 2
In [3]: df2 = DataFrame(randint(0,10,size=(4)).reshape(4,1),columns=list('a'))
In [4]: df2
Out[4]:
a
0 4
1 9
2 2
3 0
Concating (returns a new frame)
In [6]: pd.concat([df2,df1],axis=1)
Out[6]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Insert, puts a series into an existing frame
In [8]: df1.insert(0,'a',df2['a'])
In [9]: df1
Out[9]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Achieved by doing
A[B.columns]=B