Append dataframe in specific row

I have a dataframe in the following format:
a b label
1 5 A
2 6 A
3 7 A
4 8 B
1 5 B
2 6 B
5 6 C
3 2 C
I want to append a new dataframe to it:
a b label
3 4 A
The result becomes this:
a b label
1 5 A
2 6 A
3 7 A
4 8 B
1 5 B
2 6 B
5 6 C
3 2 C
3 4 A <-- New Data
My question is: how can I make the new row land at the end of its label group, like this, every time I append new data?
a b label
1 5 A
2 6 A
3 7 A
3 4 A <-- New Data
4 8 B
1 5 B
2 6 B
5 6 C
3 2 C
This is my code:
import pandas as pd
df1 = pd.DataFrame({"a":[1, 2, 3, 4, 1, 2,5,3],
"b":[5, 6, 7, 8, 5, 6,6,2],
"label":['A','A','A','B','B','B','C','C']})
new_data = pd.DataFrame({"a":[3],
"b":[4],
"label":['A']})
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df1 = pd.concat([df1, new_data], ignore_index=True)

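You can sort on the label column after appending. Use a stable sort so that rows keep their existing order within each label (the default quicksort does not guarantee this). DataFrame.append was removed in pandas 2.0, so pd.concat is used here:
import pandas as pd
df1 = pd.DataFrame({"a":[1, 2, 3, 4, 1, 2,5,3],
"b":[5, 6, 7, 8, 5, 6,6,2],
"label":['A','A','A','B','B','B','C','C']})
new_data = pd.DataFrame({"a":[3],
"b":[4],
"label":['A']})
# append the new row, then regroup by label with a stable sort
df1 = pd.concat([df1, new_data], ignore_index=True).sort_values(by='label', kind='mergesort')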
Result :
a b label
1 5 A
2 6 A
3 7 A
3 4 A <-- new data here
4 8 B
1 5 B
2 6 B
5 6 C
3 2 C
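Since the question asks for this to happen on every append, the concatenate-then-sort step can be wrapped in a small function. The following is a minimal sketch, not a definitive implementation: `append_into_group` is a hypothetical helper name, and it assumes the labels should appear in sorted order (A, B, C), as they do in the sample data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "a": [1, 2, 3, 4, 1, 2, 5, 3],
    "b": [5, 6, 7, 8, 5, 6, 6, 2],
    "label": ["A", "A", "A", "B", "B", "B", "C", "C"],
})
new_data = pd.DataFrame({"a": [3], "b": [4], "label": ["A"]})


def append_into_group(df, new_rows, key="label"):
    """Hypothetical helper: append new_rows, then regroup the rows by `key`."""
    combined = pd.concat([df, new_rows], ignore_index=True)
    # mergesort is stable, so rows keep their relative order inside each
    # group and the appended rows end up last within their label.
    return combined.sort_values(by=key, kind="mergesort", ignore_index=True)


result = append_into_group(df1, new_data)
print(result)
```

If the labels must keep an order that is not alphabetical, one option is to convert the column to an ordered categorical before sorting.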

Related

Passing Tuple to a function via apply

I am trying to run the function below, which takes two points:
pointA = (2, 3)
pointB = (4, 5)
def Somefunc(pointA, pointB):
    x = pointA[0] + pointB[1]
    return x
Now, when I try to create a separate column based on this function, it throws errors like "cannot convert the series to <class 'float'>", so I tried this:
df['T']=df.apply(Somefunc((df['A'].apply(lambda x: float(x)),df['B'].apply(lambda x: float(x))),\
(df['C'].apply(lambda x: float(x)),df['D'].apply(lambda x: float(x)))),axis=0))
Sample dataframe below:
A B C D
1 2 3 5
2 4 7 8
4 7 9 0
Any help will be appreciated.

This is the best guess I can make as to what you're trying to do:
df['T']=df.apply(lambda row: [(row['A'],row['B']),(row['C'],row['D'])],axis=1)
Edit: to apply your function (named Somefunc in the question):
df['T'] = df.apply(lambda row: Somefunc((row['A'],row['B']),(row['C'],row['D'])),axis=1)
That being said, if all you need is each row's values as a single tuple, that can be done more quickly and idiomatically like so:
>>> df
A B C D
0 2 7 3 3
1 3 1 5 7
2 2 0 6 2
3 3 9 5 9
4 0 2 3 7
>>> df['T']=df.apply(tuple,axis=1)
>>> df
A B C D T
0 2 7 3 3 (2, 7, 3, 3)
1 3 1 5 7 (3, 1, 5, 7)
2 2 0 6 2 (2, 0, 6, 2)
3 3 9 5 9 (3, 9, 5, 9)
4 0 2 3 7 (0, 2, 3, 7)
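To tie the row-wise call back to the question's sample data, here is a small self-contained sketch. The function is renamed `some_func` for illustration but performs the same arithmetic as the question's Somefunc; the vectorized line is an assumption that only holds for this particular function, which reduces to column A plus column D.

```python
import pandas as pd


def some_func(point_a, point_b):
    # Same arithmetic as Somefunc in the question: first coordinate of
    # point A plus second coordinate of point B.
    return point_a[0] + point_b[1]


df = pd.DataFrame({"A": [1, 2, 4], "B": [2, 4, 7], "C": [3, 7, 9], "D": [5, 8, 0]})

# axis=1 passes one row at a time, so the tuples hold scalars, not Series.
df["T"] = df.apply(
    lambda row: some_func((row["A"], row["B"]), (row["C"], row["D"])), axis=1
)

# Vectorized equivalent for this specific function: A + D, with no apply.
df["T_vec"] = df["A"] + df["D"]
print(df)
```

The original error comes from passing whole columns (Series) into a function that expects scalars; applying along axis=1 avoids that.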

How to multiply dataframe columns with dataframe column in pandas?

I want to multiply the columns of one dataframe by a column of another dataframe.
I have two dataframes, shown side by side here: A has columns a to d and B has the single column e.
a b c d e
3 4 4 4 2
3 3 3 3 3
3 3 3 3 4
I want to multiply each column of A by B's column.
Multiplication result should be like this:
a b c d
6 8 8 8
9 9 9 9
12 12 12 12
I tried just * multiplication but got a wrong result.
Thank you in advance!

Use B.values or B.to_numpy(), which return a NumPy array that you can then multiply with the DataFrame.
Example:
>>> A
a b c d
0 3 4 4 4
1 3 3 3 3
2 3 3 3 3
>>> B
c
0 2
1 3
2 4
>>> A * B.values
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
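The reason plain `*` gives a wrong result is label alignment. The sketch below rebuilds the question's data (with B's column named e, as in the question) and compares three variants; the variable names `aligned`, `by_array`, and `by_series` are only illustrative.

```python
import pandas as pd

A = pd.DataFrame({"a": [3, 3, 3], "b": [4, 3, 3], "c": [4, 3, 3], "d": [4, 3, 3]})
B = pd.DataFrame({"e": [2, 3, 4]})

# DataFrame * DataFrame aligns on column labels. A and B share none,
# so every cell of the result is NaN.
aligned = A * B

# A (3, 1) NumPy array carries no labels and broadcasts across A's columns.
by_array = A * B.to_numpy()

# Same numbers with labels kept: multiply by a Series, matched on the row index.
by_series = A.mul(B["e"], axis=0)

print(by_array)
```

The array version ignores the index entirely, so it relies on A and B having their rows in the same order; the Series version matches rows by index label instead.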

Just another variation on @Dishin's excellent answer:
You can use the pandas mul method to multiply A by B, by selecting B's column as a Series and multiplying along the index:
A.mul(B.iloc[:,0],axis='index')
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12

Use DataFrame.mul with a Series by selecting the e column:
df = A.mul(B['e'], axis=0)
print (df)
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12

I think you are looking for the mul function. Here is the code:
df = pd.DataFrame([[3, 4, 4, 4],[3, 3, 3, 3],[3, 3, 3, 3]])
val = [2,3,4]
df.mul(val, axis = 0)
Here are the results:
0 1 2 3
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
Ignore the indices.

Renaming column of one dataframe by extracting from combination of series and dataframe column names

In the line below, I am renaming the columns of pnlsummary dataframe from the column names of three series (totalheldmw, totalcost and totalsellprofit) and one dataframe (totalheldprofit).
The difficulty I have is iterating over the column names of the dataframe. I have assigned the names manually, as you can see below. I suppose there is a more efficient way of iterating over the column names of the dataframe. Please advise.
pnlsummary.columns = [
totalheldmw.name[0],totalcost.name[0],totalsellprofit.name[0],
totalheldprofit.columns[0],totalheldprofit.columns[1],
totalheldprofit.columns[2],totalheldprofit.columns[3]]

I think you need to create a list from the constants and then add the column names converted to a list (the slice 0:4 covers columns 0 to 3, as in your code):
pnlsummary.columns = ([totalheldmw.name[0],totalcost.name[0],totalsellprofit.name[0]] +
totalheldprofit.columns[0:4].astype(str).tolist())
Sample:
df = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
df.columns = ['a','s','d'] + df.columns[0:3].tolist()
print (df)
a s d A B C
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
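Applied to the objects named in the question, the same idea could look like the sketch below. All data here is hypothetical, and the Series are assumed to have plain string names, so `.name` is used directly rather than `.name[0]` as in the question; adjust if your names are tuples.

```python
import pandas as pd

# Hypothetical stand-ins for the objects named in the question.
totalheldmw = pd.Series([1.0, 2.0], name="heldmw")
totalcost = pd.Series([3.0, 4.0], name="cost")
totalsellprofit = pd.Series([5.0, 6.0], name="sellprofit")
totalheldprofit = pd.DataFrame({"p1": [1, 2], "p2": [3, 4], "p3": [5, 6], "p4": [7, 8]})

# A summary frame that starts with placeholder column labels 0..6.
pnlsummary = pd.DataFrame([[0] * 7, [1] * 7])

# Collect the Series names in a loop, then add every DataFrame column name.
series_names = [s.name for s in (totalheldmw, totalcost, totalsellprofit)]
pnlsummary.columns = series_names + totalheldprofit.columns.tolist()

print(pnlsummary.columns.tolist())
```

Using `totalheldprofit.columns.tolist()` without a slice picks up every column, so the code does not need to change when the dataframe gains or loses columns.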

Separate aggregated data in different rows [duplicate]

I want to replicate rows in a Pandas Dataframe. Each row should be repeated n times, where n is a field of each row.
import pandas as pd
what_i_have = pd.DataFrame(data={
'id': ['A', 'B', 'C'],
'n' : [ 1, 2, 3],
'v' : [ 10, 13, 8]
})
what_i_want = pd.DataFrame(data={
'id': ['A', 'B', 'B', 'C', 'C', 'C'],
'v' : [ 10, 13, 13, 8, 8, 8]
})
Is this possible?

You can use Index.repeat to get repeated index values based on the column then select from the DataFrame:
df2 = df.loc[df.index.repeat(df.n)]
id n v
0 A 1 10
1 B 2 13
1 B 2 13
2 C 3 8
2 C 3 8
2 C 3 8
Or you could use np.repeat to get the repeated indices and then use that to index into the frame:
df2 = df.loc[np.repeat(df.index.values, df.n)]
id n v
0 A 1 10
1 B 2 13
1 B 2 13
2 C 3 8
2 C 3 8
2 C 3 8
After which there's only a bit of cleaning up to do:
df2 = df2.drop("n", axis=1).reset_index(drop=True)
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
Note that if you might have duplicate indices to worry about, you could use .iloc instead:
df.iloc[np.repeat(np.arange(len(df)), df["n"])].drop("n", axis=1).reset_index(drop=True)
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
which uses the positions, and not the index labels.
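Putting the positional variant together with the question's own data gives a self-contained check. This is a sketch only: `uncount` is a hypothetical helper name borrowed from tidyr's terminology, not a pandas function.

```python
import numpy as np
import pandas as pd

what_i_have = pd.DataFrame({
    "id": ["A", "B", "C"],
    "n": [1, 2, 3],
    "v": [10, 13, 8],
})


def uncount(df, count_col):
    """Hypothetical helper: repeat each row df[count_col] times, then drop that column."""
    # Repeat row positions (not index labels), so duplicate labels are harmless.
    positions = np.repeat(np.arange(len(df)), df[count_col].to_numpy())
    return df.iloc[positions].drop(columns=count_col).reset_index(drop=True)


result = uncount(what_i_have, "n")
print(result)
```

Rows with a count of 0 simply disappear from the result, which is usually the desired behaviour for this kind of expansion.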

You could use set_index and repeat
In [1057]: df.set_index(['id'])['v'].repeat(df['n']).reset_index()
Out[1057]:
id v
0 A 10
1 B 13
2 B 13
3 C 8
4 C 8
5 C 8
Details
In [1058]: df
Out[1058]:
id n v
0 A 1 10
1 B 2 13
2 C 3 8

It's something like the uncount in tidyr:
https://tidyr.tidyverse.org/reference/uncount.html
I wrote a package (https://github.com/pwwang/datar) that implements this API:
from datar import f
from datar.tibble import tribble
from datar.tidyr import uncount
what_i_have = tribble(
f.id, f.n, f.v,
'A', 1, 10,
'B', 2, 13,
'C', 3, 8
)
what_i_have >> uncount(f.n)
Output:
id v
0 A 10
1 B 13
1 B 13
2 C 8
2 C 8
2 C 8

Not the best solution, but I want to share this: you could also use DataFrame.reindex() together with Index.repeat():
df.reindex(df.index.repeat(df.n)).drop('n', axis=1)
Output:
id v
0 A 10
1 B 13
1 B 13
2 C 8
2 C 8
2 C 8
You can further append .reset_index(drop=True) to reset the .index.

How to prepend pandas data frames

How can I prepend a dataframe to another dataframe? Consider dataframe A:
b c d
2 3 4
6 7 8
and dataFrame B:
a
1
5
I want to prepend B to A (so that column a comes first) to get:
a b c d
1 2 3 4
5 6 7 8

Two methods:
In [1]: df1 = DataFrame(randint(0,10,size=(12)).reshape(4,3),columns=list('bcd'))
In [2]: df1
Out[2]:
b c d
0 5 9 5
1 8 4 0
2 8 4 5
3 4 9 2
In [3]: df2 = DataFrame(randint(0,10,size=(4)).reshape(4,1),columns=list('a'))
In [4]: df2
Out[4]:
a
0 4
1 9
2 2
3 0
Concatenating (returns a new frame):
In [6]: pd.concat([df2,df1],axis=1)
Out[6]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Inserting (puts a Series into an existing frame, modifying it in place):
In [8]: df1.insert(0,'a',df2['a'])
In [9]: df1
Out[9]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
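The session above uses random data, so here is a short sketch of both methods on the exact frames from the question. The names `combined` and `A_with_a` are illustrative; both approaches assume A and B share the same row index.

```python
import pandas as pd

A = pd.DataFrame({"b": [2, 6], "c": [3, 7], "d": [4, 8]})
B = pd.DataFrame({"a": [1, 5]})

# Option 1: build a new frame with B's columns first.
combined = pd.concat([B, A], axis=1)

# Option 2: insert the column at position 0. insert() mutates the frame
# it is called on, so work on a copy to leave A untouched.
A_with_a = A.copy()
A_with_a.insert(0, "a", B["a"])

print(combined)
```

If the two frames have different indexes, both methods align on index labels and can introduce NaN values, so resetting the index first may be needed.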

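This can also be achieved by assigning B's columns into A. Note that the new column is added at the end, so reorder the columns afterwards to put it first:
A[B.columns]=B
A = A[B.columns.tolist() + A.columns.drop(B.columns).tolist()]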
