How to prepend pandas data frames - pandas

How can I prepend a dataframe to another dataframe? Consider dataframe A:
b c d
2 3 4
6 7 8
and dataFrame B:
a
1
5
I want to prepend A to B to get:
a b c d
1 2 3 4
5 6 7 8

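Two methods (assuming import pandas as pd, from pandas import DataFrame, and from numpy.random import randint):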
In [1]: df1 = DataFrame(randint(0,10,size=(12)).reshape(4,3),columns=list('bcd'))
In [2]: df1
Out[2]:
b c d
0 5 9 5
1 8 4 0
2 8 4 5
3 4 9 2
In [3]: df2 = DataFrame(randint(0,10,size=(4)).reshape(4,1),columns=list('a'))
In [4]: df2
Out[4]:
a
0 4
1 9
2 2
3 0
Concatenating (returns a new frame)
In [6]: pd.concat([df2,df1],axis=1)
Out[6]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Insert puts a Series into an existing frame (modifying it in place)
In [8]: df1.insert(0,'a',df2['a'])
In [9]: df1
Out[9]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2

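This can also be done with column assignment, although the new columns are then added at the end of A rather than at the front:
A[B.columns] = B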
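The three approaches above can be compared side by side. The following is a minimal sketch that uses the data from the question; the variable names (prepended, inserted, assigned, reordered) are illustrative and not part of the original answers.

```python
import pandas as pd

# Data from the question: A holds columns b, c, d; B holds column a.
A = pd.DataFrame({"b": [2, 6], "c": [3, 7], "d": [4, 8]})
B = pd.DataFrame({"a": [1, 5]})

# Option 1: concat returns a new frame with B's columns first.
prepended = pd.concat([B, A], axis=1)

# Option 2: insert modifies the frame in place, so work on a copy to keep A intact.
inserted = A.copy()
inserted.insert(0, "a", B["a"])

# Option 3: column assignment adds the new column at the end,
# so the columns have to be reordered afterwards.
assigned = A.copy()
assigned[B.columns] = B
reordered = assigned[list(B.columns) + list(A.columns)]

print(prepended)
```

All three rely on A and B sharing the same row index; if the indexes differ, pandas aligns on the labels and fills the gaps with NaN.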

Related

Maximum of calculated pandas column and 0

I have a very simple problem (I guess), but I can't find the right syntax for it.
I have the following DataFrame:
A B C
0 7 12 2
1 5 4 4
2 4 8 2
3 9 2 3
I need to create a new column D equal, for each row, to max(0, A-B+C).
I tried np.maximum(df.A-df.B+df.C,0), but the result doesn't match: it gives me the maximum value of the calculated column for every row (= 10 in the example).
In the end, I would like to obtain the DataFrame below:
A B C D
0 7 12 2 0
1 5 4 4 5
2 4 8 2 0
3 9 2 3 10
Any help appreciated
Thanks
Let us try
df['D'] = df.eval('A-B+C').clip(lower=0)
Out[256]:
0 0
1 5
2 0
3 10
dtype: int64
You can use np.where:
s = df["A"]-df["B"]+df["C"]
df["D"] = np.where(s>0, s, 0) #or s.where(s>0, 0)
print (df)
A B C D
0 7 12 2 0
1 5 4 4 5
2 4 8 2 0
3 9 2 3 10
To do this in one line, you can use apply to apply the max function to each row separately.
In [19]: df['D'] = df.apply(lambda s: max(s['A'] - s['B'] + s['C'], 0), axis=1)
In [20]: df
Out[20]:
A B C D
0 7 12 2 0
1 5 4 4 5
2 4 8 2 0
3 9 2 3 10
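The vectorized answers above can be checked against each other. This is a small sketch built on the question's data; the names s, via_maximum and via_where are illustrative. It also shows the reduction s.max(), which returns the single value 10 and so matches the symptom described in the question (this is only a guess at what went wrong there).

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({"A": [7, 5, 4, 9], "B": [12, 4, 8, 2], "C": [2, 4, 2, 3]})

# Row-wise value of A - B + C: -3, 5, -2, 10
s = df["A"] - df["B"] + df["C"]

# Three equivalent element-wise ways to floor the values at 0.
df["D"] = s.clip(lower=0)
via_maximum = np.maximum(s, 0)   # element-wise maximum of s and 0
via_where = s.where(s > 0, 0)    # keep s where positive, otherwise 0

# A reduction, by contrast, collapses the whole column to one number.
overall_max = s.max()

print(df)
```

The element-wise versions avoid the per-row Python function call that apply with axis=1 makes, which usually matters on large frames.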

How to multiply dataframe columns with dataframe column in pandas?

I want to multiply the columns of one dataframe by a column of another dataframe.
I have two dataframes as shown here:
A dataframe (columns a to d) and B dataframe (column e), side by side:
a b c d | e
3 4 4 4 | 2
3 3 3 3 | 3
3 3 3 3 | 4
and I want to multiply A by B.
Multiplication result should be like this:
a b c d
6 8 8 8
9 9 9 9
12 12 12 12
I tried just * multiplication but got a wrong result.
Thank you in advance!
Use B.values or B.to_numpy(), which returns a NumPy array that you can then multiply with the DataFrame.
Example:
>>> A
a b c d
0 3 4 4 4
1 3 3 3 3
2 3 3 3 3
>>> B
c
0 2
1 3
2 4
>>> A * B.values
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
Just another variation on @Dishin's excellent answer:
You can use the pandas mul method to multiply A by B, by taking B as a Series and multiplying along the index:
A.mul(B.iloc[:,0],axis='index')
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
Use DataFrame.mul with Series by selecting e column:
df = A.mul(B['e'], axis=0)
print (df)
a b c d
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
I think you are looking for the mul function. Here is the code:
df = pd.DataFrame([[3, 4, 4, 4],[3, 3, 3, 3],[3, 3, 3, 3]])
val = [2,3,4]
df.mul(val, axis = 0)
Here are the results:
0 1 2 3
0 6 8 8 8
1 9 9 9 9
2 12 12 12 12
Ignore the indices.
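To see why the plain * in the question gave a wrong result, it helps to run both variants on the question's data. The sketch below is illustrative (the names naive, result and same are not from the original answers): multiplying two DataFrames aligns on column labels as well as on the index, and A and B share no column labels.

```python
import pandas as pd

# Data from the question.
A = pd.DataFrame({"a": [3, 3, 3], "b": [4, 3, 3], "c": [4, 3, 3], "d": [4, 3, 3]})
B = pd.DataFrame({"e": [2, 3, 4]})

# DataFrame * DataFrame aligns on index AND columns. No column label is
# shared, so every cell of the result is NaN.
naive = A * B

# Multiplying by a Series along axis=0 matches on the row index instead.
result = A.mul(B["e"], axis=0)

# Dropping the labels entirely: the (3, 1) array is broadcast across columns.
same = A * B.to_numpy()

print(result)
```

The mul variant still aligns on the row index, so it is the safer choice when the two frames may be ordered differently; the to_numpy variant pairs rows purely by position.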

Split a column by element and create new ones with pandas

Goal: I want to split a single column by its distinct elements (not by splitting the strings in the cells) and create new columns from that division, where each element becomes the title of a new column and the matching values from another column fill that column.
Is there a way of doing that with pandas? Thanks in advance.
Example:
[IN]:
A 1
A 2
A 6
A 99
B 7
B 8
B 19
B 18
[OUT]:
A B
1 7
2 8
6 19
99 18
Just an alternative, if the input data has two columns:
print(df)
col1 col2
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
df1=pd.DataFrame(df.groupby('col1')['col2'].apply(list).to_dict())
print(df1)
A B
0 1 7
1 2 8
2 6 19
3 99 18
Use Series.str.split with GroupBy.cumcount for counter, then reshape by DataFrame.set_index with Series.unstack:
print (df)
col
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
df1 = df['col'].str.split(expand=True)
g = df1.groupby(0).cumcount()
df2 = df1.set_index([0, g])[1].unstack(0).rename_axis(None, axis=1)
print (df2)
A B
0 1 7
1 2 8
2 6 19
3 99 18
If the input data has two columns:
print (df)
col1 col2
0 A 1
1 A 2
2 A 6
3 A 99
4 B 7
5 B 8
6 B 19
7 B 18
g = df.groupby('col1').cumcount()
df2 = df.set_index(['col1', g])['col2'].unstack(0).rename_axis(None, axis=1)
print (df2)
A B
0 1 7
1 2 8
2 6 19
3 99 18
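The same cumcount idea can also be written with DataFrame.pivot. This is a sketch of an alternative spelling, not the method from the answers above; the helper column name pos is hypothetical.

```python
import pandas as pd

# Two-column input from the example.
df = pd.DataFrame({
    "col1": list("AAAABBBB"),
    "col2": [1, 2, 6, 99, 7, 8, 19, 18],
})

# Number the rows within each group: 0..3 for A, then 0..3 again for B.
df["pos"] = df.groupby("col1").cumcount()

# One row per position, one column per distinct value of col1.
wide = df.pivot(index="pos", columns="col1", values="col2")

# Remove the axis names so the output matches the desired layout.
wide = wide.rename_axis(index=None, columns=None)

print(wide)
```

If the groups have different lengths, the shorter columns are padded with NaN (and become float), whereas building a DataFrame from a dict of unequal-length lists raises an error.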

Pandas count values inside dataframe

I have a dataframe that looks like this:
A B C
1 1 8 3
2 5 4 3
3 5 8 1
and I want to count how often each value occurs, to get a df like this:
total
1 2
3 2
4 1
5 2
8 2
Is this possible with pandas?
With np.unique -
In [332]: df
Out[332]:
A B C
1 1 8 3
2 5 4 3
3 5 8 1
In [333]: ids, c = np.unique(df.values.ravel(), return_counts=True)
In [334]: pd.DataFrame({'total':c}, index=ids)
Out[334]:
total
1 2
3 2
4 1
5 2
8 2
With pandas-series -
In [357]: pd.Series(np.ravel(df)).value_counts().sort_index()
Out[357]:
1 2
3 2
4 1
5 2
8 2
dtype: int64
You can also use stack() and groupby()
df = pd.DataFrame({'A':[1,8,3],'B':[5,4,3],'C':[5,8,1]})
print(df)
A B C
0 1 5 5
1 8 4 8
2 3 3 1
df1 = df.stack().reset_index(1)
df1.groupby(0).count()
level_1
0
1 2
3 2
4 1
5 2
8 2
Another alternative is to use stack followed by value_counts, then convert the result to a frame and finally sort the index:
count_df = df.stack().value_counts().to_frame('total').sort_index()
count_df
Result:
total
1 2
3 2
4 1
5 2
8 2
Using np.unique(..., return_counts=True) and np.column_stack():
pd.DataFrame(np.column_stack(np.unique(df, return_counts=True)))
returns:
0 1
0 1 2
1 3 2
2 4 1
3 5 2
4 8 2
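For reference, here is a self-contained version of the stack and value_counts approach, run on the question's exact frame (including its 1-based index). The variable names counts and total are illustrative.

```python
import pandas as pd

# Data from the question, with its original index 1..3.
df = pd.DataFrame(
    {"A": [1, 5, 5], "B": [8, 4, 8], "C": [3, 3, 1]},
    index=[1, 2, 3],
)

# stack() flattens all cells into one Series; value_counts() tallies them.
counts = df.stack().value_counts().sort_index()

# Turn the Series into a one-column frame named "total".
total = counts.to_frame("total")

print(total)
```

Note that stack drops NaN cells, so missing values are not counted with this approach.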

Subsetting index from Pandas DataFrame

I have a DataFrame with columns [A, B, C, D, E, F, G, H].
An index has been made with columns [D, G, H]:
>>> print(dgh_columns)
Index(['D', 'G', 'H'], dtype='object')
How can I retrieve the original DataFrame without the columns D, G, H ?
Is there an index subset operation?
Ideally, this would be:
df[df.index - dgh_columns]
But this doesn't seem to work.
I think you can use Index.difference:
df[df.columns.difference(dgh_columns)]
Sample:
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[7,8,9],
'F':[1,3,5],
'G':[5,3,6],
'H':[7,4,3]})
print (df)
A B C D E F G H
0 1 4 7 1 7 1 5 7
1 2 5 8 3 8 3 3 4
2 3 6 9 5 9 5 6 3
dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[df.columns.difference(dgh_columns)])
A B C E F
0 1 4 7 7 1
1 2 5 8 8 3
2 3 6 9 9 5
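NumPy solution with numpy.setxor1d or numpy.setdiff1d (setxor1d is a symmetric difference, so it gives the same result only when every name in dgh_columns is present in df.columns):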
dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[np.setxor1d(df.columns, dgh_columns)])
A B C E F
0 1 4 7 7 1
1 2 5 8 8 3
2 3 6 9 9 5
dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[np.setdiff1d(df.columns, dgh_columns)])
A B C E F
0 1 4 7 7 1
1 2 5 8 8 3
2 3 6 9 9 5
Use drop:
df.drop(list('DGH'), axis=1)
Sample (axis is passed by keyword, since pandas 2.0 no longer accepts it positionally):
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[7,8,9],
'F':[1,3,5],
'G':[5,3,6],
'H':[7,4,3]})
df.drop(list('DGH'), axis=1)
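One practical difference between these answers is column order. The sketch below uses a hypothetical frame whose columns are deliberately in reverse alphabetical order (an assumption made for illustration, not the question's data) to show that Index.difference returns a sorted result, while drop and a boolean mask keep the original order.

```python
import pandas as pd

# Hypothetical frame with columns in reverse order: H, G, F, E, D, C, B, A.
df = pd.DataFrame({name: [1, 2, 3] for name in "HGFEDCBA"})
dgh_columns = pd.Index(["D", "G", "H"])

# drop keeps the remaining columns in their original order.
dropped = df.drop(columns=dgh_columns)

# A boolean mask over the columns also preserves the original order.
masked = df.loc[:, ~df.columns.isin(dgh_columns)]

# Index.difference sorts its result by default.
by_difference = df[df.columns.difference(dgh_columns)]

print(list(dropped.columns))        # ['F', 'E', 'C', 'B', 'A']
print(list(by_difference.columns))  # ['A', 'B', 'C', 'E', 'F']
```

With the question's alphabetically ordered columns all three give the same frame, so the distinction only matters when the original column order must be preserved.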