Subsetting index from Pandas DataFrame

I have a DataFrame with columns [A, B, C, D, E, F, G, H].
An index has been made with columns [D, G, H]:
>>> print(dgh_columns)
Index(['D', 'G', 'H'], dtype='object')
How can I retrieve the original DataFrame without the columns D, G, H ?
Is there an index subset operation?
Ideally, this would be:
df[df.index - dgh_columns]
But this doesn't seem to work

I think you can use Index.difference:
df[df.columns.difference(dgh_columns)]
Sample:
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[7,8,9],
'F':[1,3,5],
'G':[5,3,6],
'H':[7,4,3]})
print (df)
A B C D E F G H
0 1 4 7 1 7 1 5 7
1 2 5 8 3 8 3 3 4
2 3 6 9 5 9 5 6 3
dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[df.columns.difference(dgh_columns)])
A B C E F
0 1 4 7 7 1
1 2 5 8 8 3
2 3 6 9 9 5
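NumPy solution with numpy.setdiff1d, or with numpy.setxor1d (a symmetric difference, so it only gives the same result when every name in dgh_columns is also a column of df):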
dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[np.setxor1d(df.columns, dgh_columns)])
A B C E F
0 1 4 7 7 1
1 2 5 8 8 3
2 3 6 9 9 5
dgh_columns = pd.Index(['D', 'G', 'H'])
print (df[np.setdiff1d(df.columns, dgh_columns)])
A B C E F
0 1 4 7 7 1
1 2 5 8 8 3
2 3 6 9 9 5
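One detail worth checking with the approaches above: Index.difference (like the NumPy set functions) returns the remaining labels sorted, which can reorder the columns. The sketch below uses a small hypothetical frame, not the one from the question, to contrast that with a boolean mask built from Index.isin, which keeps the original column order.

```python
import pandas as pd

# Hypothetical frame whose columns are deliberately not in alphabetical order.
df = pd.DataFrame({'C': [7, 8], 'A': [1, 2], 'D': [1, 3], 'B': [4, 5]})
dgh_columns = pd.Index(['D', 'G', 'H'])  # 'G' and 'H' are absent from df on purpose

# Index.difference returns the remaining labels in sorted order.
sorted_cols = df[df.columns.difference(dgh_columns)]

# A boolean mask keeps the remaining columns in their original order.
kept_order = df.loc[:, ~df.columns.isin(dgh_columns)]

print(list(sorted_cols.columns))
print(list(kept_order.columns))
```

Both variants tolerate names in dgh_columns that do not exist in df.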

Use drop:
df.drop(list('DGH'), axis=1)
df = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[7,8,9],
'F':[1,3,5],
'G':[5,3,6],
'H':[7,4,3]})
# pass axis by keyword; the positional form was removed in pandas 2.0
df.drop(list('DGH'), axis=1)
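As a small variation on the drop answer, the existing Index can be passed directly through the columns keyword. The sketch below assumes a smaller hypothetical frame in which one of the names is missing, to show how errors='ignore' changes the behaviour.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'D': [1, 3, 5], 'E': [7, 8, 9], 'G': [5, 3, 6]})
dgh_columns = pd.Index(['D', 'G', 'H'])  # 'H' is missing from this smaller frame

# By default drop raises KeyError when a label is not found.
try:
    df.drop(columns=dgh_columns)
    raised = False
except KeyError:
    raised = True

# errors='ignore' skips labels that are not present instead of raising.
trimmed = df.drop(columns=dgh_columns, errors='ignore')
print(list(trimmed.columns))
```

drop returns a new frame, so df itself is left unchanged.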

Related

Split and concatenate dataframe

I have a dataframe that looks like this:
>>>df = pd.DataFrame({
'id': [i for i in range(5)],
'1': ['a', 'b', 'c', 'd', 'e'],
'2': ['f', 'g', 'h', 'i', 'g']
})
>>>df
id 1 2
0 0 a f
1 1 b g
2 2 c h
3 3 d i
4 4 e g
I want to convert it to the following dataframe:
>>>df_concatenated
id val
1 0 a
1 1 b
1 2 c
1 3 d
1 4 e
2 0 f
2 1 g
2 2 h
2 3 i
2 4 g
One way is to use pd.melt:
pd.melt(df, id_vars=['id'], value_vars=['1','2']).set_index('variable',append=True)
The other is to split the frame with the .iloc accessor and concatenate the pieces. It is longer, but it works:
res1=df.iloc[:,[0,2]]
res1.columns=['id','val']
res=df.iloc[:,:2]
res.columns=['id','val']
res2=pd.concat([res1,res])
res2
variable id value
0 1 0 a
1 1 1 b
2 1 2 c
3 1 3 d
4 1 4 e
5 2 0 f
6 2 1 g
7 2 2 h
8 2 3 i
9 2 4 g
You can try this:
df = df.rename({"1":"val"},axis=1)
df_temp = df[["id","2"]]
df_temp = df_temp.rename({"2":"val"},axis=1)
df.drop("2",axis=1,inplace=True)
out_df = pd.concat([df,df_temp],axis=0).reset_index(drop=True)
print(out_df)
output:
id val
0 0 a
1 1 b
2 2 c
3 3 d
4 4 e
5 0 f
6 1 g
7 2 h
8 3 i
9 4 g
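For comparison, here is a minimal sketch of the pd.melt route that aims at the exact layout asked for in the question, with the old column name as the index and the columns id and val. The name variable for the index level is just the pd.melt default, not something the question requires.

```python
import pandas as pd

df = pd.DataFrame({
    'id': list(range(5)),
    '1': ['a', 'b', 'c', 'd', 'e'],
    '2': ['f', 'g', 'h', 'i', 'g'],
})

# melt stacks columns '1' and '2' into one 'val' column and records
# the source column name in 'variable', which then becomes the index.
df_concatenated = (
    df.melt(id_vars='id', var_name='variable', value_name='val')
      .set_index('variable')
)
print(df_concatenated)
```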

How to append a DataFrame to a multiindex DataFrame?

Suppose that I have the DataFrames
In [1]: a=pd.DataFrame([[1,2],[3,4],[5,6],[7,8]],
...: index=pd.MultiIndex.from_product([('A','B'),('d','e')]))
In [2]: a
Out[2]:
     0  1
A d  1  2
  e  3  4
B d  5  6
  e  7  8
In [3]: b=pd.DataFrame([[9,10],[11,12]],index=('d','e'))
In [4]: b
Out[4]:
0 1
d 9 10
e 11 12
and I want to append b to a, with the subindex C, producing the
DataFrame
      0   1
A d   1   2
  e   3   4
B d   5   6
  e   7   8
C d   9  10
  e  11  12
I tried
In [5]: a.loc['C'] = b
but got
TypeError: 'int' object is not iterable
How do I do it?
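Assign a new column to b, then use set_index and swaplevel to build a matching two-level index before concatenating with a (DataFrame.append was removed in pandas 2.0, so pd.concat is used here):
pd.concat([a, b.assign(k='C').set_index('k',append=True).swaplevel(0,1)])
Out[33]:
      0   1
A d   1   2
  e   3   4
B d   5   6
  e   7   8
C d   9  10
  e  11  12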
First update b's index to match the same levels as a, then concat:
b.index = pd.MultiIndex.from_arrays([('C','C'), ('d','e')])
pd.concat([a,b])
If you want to do it step by step:
df2 = pd.concat([a,b], ignore_index=True)
df2['i0'] = a.index.get_level_values(0).tolist() + ['C']*len(b)
# the second level comes from level 1 of a, followed by the index of b
df2['i1'] = a.index.get_level_values(1).tolist() + b.index.tolist()
df2.set_index(['i0', 'i1'])
Outputs
        0   1
i0 i1
A  d    1   2
   e    3   4
B  d    5   6
   e    7   8
C  d    9  10
   e   11  12
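Another way to give b its outer level, sketched here as an alternative to the answers above, is to let pd.concat add the key itself: passing a dict (or the keys argument) puts the dict key on a new outermost index level. The variable names labelled and result are only illustrative.

```python
import pandas as pd

a = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                 index=pd.MultiIndex.from_product([('A', 'B'), ('d', 'e')]))
b = pd.DataFrame([[9, 10], [11, 12]], index=('d', 'e'))

# Concatenating a dict uses its keys as a new outer index level,
# so b gets the index ('C', 'd'), ('C', 'e').
labelled = pd.concat({'C': b})

# Both frames now have a two-level index and can be stacked directly.
result = pd.concat([a, labelled])
print(result)
```

This leaves b itself untouched, unlike reassigning b.index.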

Element wise multiplication of each row

I have two DataFrame objects, and I want to multiply each row of the first element-wise by the values in the second:
df_prob_wc.shape # (3505, 13)
df_prob_c.shape # (13, 1)
I thought I could do it with DataFrame.apply()
df_prob_wc.apply(lambda x: x.multiply(df_prob_c), axis=1)
which gives me:
TypeError: ("'int' object is not iterable", 'occurred at index $')
or with
df_prob_wc.apply(lambda x: x * df_prob_c, axis=1)
which gives me:
TypeError: 'int' object is not iterable
But it's not working.
However, I can do this:
df_prob_wc.apply(lambda x: x * np.asarray([1,2,3,4,5,6,7,8,9,10,11,12,13]), axis=1)
What am I doing wrong here?
It seems you need to multiply by a Series, created from df_prob_c with iloc:
df_prob_wc = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df_prob_wc)
A B C D E F
0 1 4 7 1 5 7
1 2 5 8 3 3 4
2 3 6 9 5 6 3
# a single column of shape (6, 1), like df_prob_c in the question
df_prob_c = pd.DataFrame([4,5,6,1,2,3])
# label the rows with the column names of df_prob_wc so the data aligns
df_prob_c.index = df_prob_wc.columns
print (df_prob_c)
0
A 4
B 5
C 6
D 1
E 2
F 3
print (df_prob_wc.shape)
(3, 6)
print (df_prob_c.shape)
(6, 1)
print (df_prob_c.iloc[:,0])
A 4
B 5
C 6
D 1
E 2
F 3
Name: 0, dtype: int64
print (df_prob_wc.mul(df_prob_c.iloc[:,0], axis=1))
A B C D E F
0 4 20 42 1 10 21
1 8 25 48 3 6 12
2 12 30 54 5 12 9
Another solution is to multiply by a NumPy array; [:,0] selects the single column as a 1-D array:
print (df_prob_wc.mul(df_prob_c.values[:,0], axis=1))
A B C D E F
0 4 20 42 1 10 21
1 8 25 48 3 6 12
2 12 30 54 5 12 9
And another solution with DataFrame.squeeze:
print (df_prob_wc.mul(df_prob_c.squeeze(), axis=1))
A B C D E F
0 4 20 42 1 10 21
1 8 25 48 3 6 12
2 12 30 54 5 12 9
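The Series and NumPy variants above differ in one way that is easy to miss: a Series is matched to the columns by label, while an array is applied by position. The following sketch uses small hypothetical frames (wc and c stand in for df_prob_wc and df_prob_c) with deliberately shuffled labels to make the difference visible.

```python
import pandas as pd

wc = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
# Single-column frame of shape (3, 1) whose row labels are in a different order.
c = pd.DataFrame({0: [10, 100, 1000]}, index=['C', 'A', 'B'])

weights = c.iloc[:, 0]  # Series indexed by 'C', 'A', 'B'

# Series: each column of wc is multiplied by the weight with the same label.
by_label = wc.mul(weights, axis=1)

# NumPy array: weights are applied in positional order, labels are ignored.
by_position = wc * weights.to_numpy()

print(by_label)
print(by_position)
```

When the labels already line up, as in the answer above, both forms give the same numbers.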

Group by with a pandas dataframe using different aggregation for different columns

I have a pandas dataframe df with columns [a, b, c, d, e, f]. I want to perform a group by on df. I can best describe what it's supposed to do in SQL:
SELECT a, b, min(c), min(d), max(e), sum(f)
FROM df
GROUP BY a, b
How do I do this group by using pandas on my dataframe df?
consider df:
a b c d e f
1 1 2 5 9 3
1 1 3 3 4 5
2 2 4 7 4 4
2 2 5 3 8 8
I expect the result to be:
a b c d e f
1 1 2 3 9 8
2 2 4 3 8 12
Use agg:
df = pd.DataFrame(
dict(
a=list('aaaabbbb'),
b=list('ccddccdd'),
c=np.arange(8),
d=np.arange(8),
e=np.arange(8),
f=np.arange(8),
)
)
funcs = dict(c='min', d='min', e='max', f='sum')
df.groupby(['a', 'b']).agg(funcs).reset_index()
a b c e f d
0 a c 0 1 1 0
1 a d 2 3 5 2
2 b c 4 5 9 4
3 b d 6 7 13 6
with your data
a b c e f d
0 1 1 2 9 8 3
1 2 2 4 8 12 3
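The same aggregation can also be written with named aggregation, which fixes the order of the output columns explicitly. This is a sketch on the four-row frame from the question, assuming a pandas version that supports the keyword form of agg (0.25 or later).

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 2, 2],
    'b': [1, 1, 2, 2],
    'c': [2, 3, 4, 5],
    'd': [5, 3, 7, 3],
    'e': [9, 4, 4, 8],
    'f': [3, 5, 4, 8],
})

# Each keyword is an output column: (input column, aggregation function).
out = df.groupby(['a', 'b'], as_index=False).agg(
    c=('c', 'min'),
    d=('d', 'min'),
    e=('e', 'max'),
    f=('f', 'sum'),
)
print(out)
```

as_index=False keeps a and b as ordinary columns, which mirrors the SELECT list in the SQL statement.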

How to prepend pandas data frames

How can I prepend a dataframe to another dataframe? Consider dataframe A:
b c d
2 3 4
6 7 8
and dataFrame B:
a
1
5
I want to prepend A to B to get:
a b c d
1 2 3 4
5 6 7 8
Two methods (the session assumes from pandas import DataFrame and from numpy.random import randint):
In [1]: df1 = DataFrame(randint(0,10,size=(12)).reshape(4,3),columns=list('bcd'))
In [2]: df1
Out[2]:
b c d
0 5 9 5
1 8 4 0
2 8 4 5
3 4 9 2
In [3]: df2 = DataFrame(randint(0,10,size=(4)).reshape(4,1),columns=list('a'))
In [4]: df2
Out[4]:
a
0 4
1 9
2 2
3 0
Concatenating (returns a new frame):
In [6]: pd.concat([df2,df1],axis=1)
Out[6]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
Insert puts a Series into an existing frame, modifying it in place:
In [8]: df1.insert(0,'a',df2['a'])
In [9]: df1
Out[9]:
a b c d
0 4 5 9 5
1 9 8 4 0
2 2 8 4 5
3 0 4 9 2
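It can also be done by plain assignment, although the new columns are then added on the right of A rather than in front, so the columns need reordering afterwards if a must come first:
A[B.columns]=B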
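One caveat applies to all of the approaches above: pandas matches rows by index label, not by position. The sketch below uses the two small frames from the question, but gives B a different index on purpose (an assumption made only for illustration) to show what happens and how reset_index(drop=True) avoids it.

```python
import pandas as pd

A = pd.DataFrame({'b': [2, 6], 'c': [3, 7], 'd': [4, 8]})
B = pd.DataFrame({'a': [1, 5]}, index=[10, 11])  # deliberately different index

# Rows are aligned on index labels, so mismatched labels produce NaN-padded rows.
misaligned = pd.concat([B, A], axis=1)

# Resetting both indexes first makes the concatenation purely positional.
aligned = pd.concat(
    [B.reset_index(drop=True), A.reset_index(drop=True)],
    axis=1,
)
print(misaligned)
print(aligned)
```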