column names to column, pandas - pandas

What is the opposite of pivot in Pandas?
For example, I have:
a b c
1 1 2
2 2 3
3 1 2
What I want
a newcol newcol2
1 b 1
1 c 2
2 b 2
2 c 3
3 b 1
3 c 2

Use pd.melt (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html):
import pandas as pd
df = pd.DataFrame({'a':[1,2,3],'b':[1,2,1],'c':[2,3,2]})
pd.melt(df,id_vars=['a'])
Out[8]:
a variable value
0 1 b 1
1 2 b 2
2 3 b 1
3 1 c 2
4 2 c 3
5 3 c 2
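The melt output above uses the default column names and lists all b rows before the c rows. A minimal sketch of how to get closer to the layout asked for in the question, assuming the names newcol and newcol2 from the question and that sorting by a is acceptable:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 1], 'c': [2, 3, 2]})

# var_name / value_name replace the default 'variable' / 'value' column names.
out = pd.melt(df, id_vars=['a'], var_name='newcol', value_name='newcol2')

# melt stacks column b first, then c; sorting interleaves them per value of a.
out = out.sort_values(['a', 'newcol'], ignore_index=True)
print(out)
```

sort_values with ignore_index=True assumes pandas 1.0 or newer; on older versions, reset_index(drop=True) after the sort should give the same result.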

Related

How to create a rolling unique count by group using pandas

I have a dataframe like the following:
group value
1 a
1 a
1 b
1 b
1 b
1 b
1 c
2 d
2 d
2 d
2 d
2 e
I want to create a column with how many unique values there have been so far for the group. Like below:
group value group_value_id
1 a 1
1 a 1
1 b 2
1 b 2
1 b 2
1 b 2
1 c 3
2 d 1
2 d 1
2 d 1
2 d 1
2 e 2
Use custom lambda function with GroupBy.transform and factorize:
df['group_value_id']=df.groupby('group')['value'].transform(lambda x:pd.factorize(x)[0]) + 1
print (df)
group value group_value_id
0 1 a 1
1 1 a 1
2 1 b 2
3 1 b 2
4 1 b 2
5 1 b 2
6 1 c 3
7 2 d 1
8 2 d 1
9 2 d 1
10 2 d 1
11 2 e 2
A custom function is needed because rank does not work on the string column:
df['group_value_id'] = df.groupby('group')['value'].rank('dense')
print (df)
DataError: No numeric types to aggregate
This can also be solved with category codes:
df['group_val_id'] = (df.groupby('group')['value']
                        .transform(lambda x: x.astype('category').cat.codes + 1))
df
group value group_val_id
0 1 a 1
1 1 a 1
2 1 b 2
3 1 b 2
4 1 b 2
5 1 b 2
6 1 c 3
7 2 d 1
8 2 d 1
9 2 d 1
10 2 d 1
11 2 e 2
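The two approaches agree on the sample data because the values happen to appear in sorted order within each group. A small sketch with hypothetical data (one group where 'b' comes before 'a') suggests how they differ otherwise: factorize numbers values by first appearance, while category codes follow the sorted categories.

```python
import pandas as pd

# Hypothetical data: within the group, 'b' appears before 'a'.
df = pd.DataFrame({'group': [1, 1, 1], 'value': ['b', 'a', 'b']})
g = df.groupby('group')['value']

# factorize numbers values in order of first appearance.
df['by_appearance'] = g.transform(lambda x: pd.factorize(x)[0]) + 1
# category codes number values in the sorted order of the categories.
df['by_sorted'] = g.transform(lambda x: x.astype('category').cat.codes) + 1
print(df)
```

If "unique values so far" is meant literally, the factorize variant is probably the one to use.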

Replace values of duplicated rows with first record in pandas?

Input
df
id label
a 1
b 2
a 3
a 4
b 2
b 3
c 1
c 2
d 2
d 3
Expected
df
id label
a 1
b 2
a 1
a 1
b 2
b 2
c 1
c 1
d 2
d 2
For id a the label becomes 1 and for id b it becomes 2, because 1 and 2 are the first records for a and b.
What I tried:
I referred to this post, but still could not solve it.
Use transform with 'first':
df['lb2']=df.groupby('id').label.transform('first')
df
Out[87]:
id label lb2
0 a 1 1
1 b 2 2
2 a 3 1
3 a 4 1
4 b 2 2
5 b 3 2
6 c 1 1
7 c 2 1
8 d 2 2
9 d 3 2
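The answer writes the result to a new column lb2. A minimal sketch of overwriting label directly, which should match the Expected table in the question (the frame is rebuilt here from the question's input):

```python
import pandas as pd

df = pd.DataFrame({'id': list('abaabbccdd'),
                   'label': [1, 2, 3, 4, 2, 3, 1, 2, 2, 3]})

# 'first' broadcasts the first label of each id back to every row of that id.
df['label'] = df.groupby('id')['label'].transform('first')
print(df)
```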

How to remove one specific duplicate named column in columns of a dataframe?

I have a sample dataframe df with columns as:
a b c a a b b c c
0 2 2 1 2 2 1 1 2 2
1 2 2 2 2 2 1 2 1 2
. . .
. . .
I want to remove only the duplicate columns named 'a' and keep the others as they are.
The expected output is:
a b c b b c c
0 2 2 1 1 1 2 2
1 2 2 2 1 2 1 2
Here is a general solution to drop any duplicates of a column, no matter where these columns are in the dataframe and what the content of these columns is.
First we get all column positions for the given column name and drop the first occurrence from that list. Then we "subtract" these positions from all positions and return the remaining columns:
to_drop = 'a'
# positions of every column named to_drop, except the first one
dup = [i for i, v in enumerate(df.columns) if v == to_drop][1:]
# keep all other positions, in their original order
df = df.iloc[:, [i for i in range(len(df.columns)) if i not in dup]]
Result:
a b c b b c c
0 2 2 1 1 1 2 2
1 2 2 2 1 2 1 2
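The same idea can be sketched with a boolean mask instead of position lists. This is an alternative, not the answer's method; it relies on Index.duplicated(), which marks every repeat of a name after its first occurrence. The sample frame is rebuilt from the question.

```python
import pandas as pd

# Sample frame from the question, with repeated column names.
df = pd.DataFrame([[2, 2, 1, 2, 2, 1, 1, 2, 2],
                   [2, 2, 2, 2, 2, 1, 2, 1, 2]],
                  columns=list('abcaabbcc'))

to_drop = 'a'
# True for each repeat of a name (the first occurrence is False), restricted to 'a'.
mask = (df.columns == to_drop) & df.columns.duplicated()
df = df.loc[:, ~mask]
print(df)
```

Like the position-based version, this looks only at column names and ignores column contents.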
Another option:
df = df.T.reset_index().drop_duplicates().set_index('index').T
df.columns.name = None
Explanation:
Since the columns named a hold only duplicate values, we can simply transpose and reset the index:
df.T.reset_index()
index 0 1
0 a 2 2
1 b 2 2
2 c 1 2
3 b 1 1
4 b 1 2
5 c 2 1
6 c 2 2
Apply drop_duplicates to the frame above and only the duplicates are removed. This also works when more than one column name has duplicated values.
Output
a b c b b c c
0 2 2 1 1 1 2 2
1 2 2 2 1 2 1 2

How to do intersection match between 2 DataFrames in Pandas?

Assume there are 2 DataFrames A and B like the following:
A:
a A
b B
c C
B:
1 2
3 4
How can I produce a DataFrame C like
a A 1 2
a A 3 4
b B 1 2
b B 3 4
c C 1 2
c C 3 4
Is there some function in Pandas that can do this operation?
First, all values have to be unique in each DataFrame.
I think you need product:
from itertools import product
A = pd.DataFrame({'a':list('abc')})
B = pd.DataFrame({'a':[1,2]})
C = pd.DataFrame(list(product(A['a'], B['a'])))
print (C)
0 1
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
Pure pandas solutions with MultiIndex.from_product:
mux = pd.MultiIndex.from_product([A['a'], B['a']])
C = pd.DataFrame(mux.values.tolist())
print (C)
0 1
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
C = mux.to_frame().reset_index(drop=True)
print (C)
0 1
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
Solution with a cross join: merge on a helper column that assign fills with the same scalar:
df = pd.merge(A.assign(tmp=1), B.assign(tmp=1), on='tmp').drop(columns='tmp')
df.columns = ['a','b']
print (df)
a b
0 a 1
1 a 2
2 b 1
3 b 2
4 c 1
5 c 2
EDIT:
A = pd.DataFrame({'a':list('abc'), 'b':list('ABC')})
B = pd.DataFrame({'a':[1,3], 'c':[2,4]})
print (A)
a b
0 a A
1 b B
2 c C
print (B)
a c
0 1 2
1 3 4
C = pd.merge(A.assign(tmp=1), B.assign(tmp=1), on='tmp').drop(columns='tmp')
C.columns = list('abcd')
print (C)
a b c d
0 a A 1 2
1 a A 3 4
2 b B 1 2
3 b B 3 4
4 c C 1 2
5 c C 3 4
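Newer pandas versions have a built-in cross join, so the helper tmp column may not be needed. A minimal sketch, assuming pandas 1.2 or newer (where merge accepts how='cross') and the same A and B as in the EDIT:

```python
import pandas as pd

A = pd.DataFrame({'a': list('abc'), 'b': list('ABC')})
B = pd.DataFrame({'a': [1, 3], 'c': [2, 4]})

# how='cross' pairs every row of A with every row of B (no key column needed).
C = A.merge(B, how='cross')
# Both frames have a column 'a', so merge adds suffixes; rename as in the EDIT.
C.columns = list('abcd')
print(C)
```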

Compare two columns of two dataframes in pandas

I have 2 data frames like :
df_out:
a b c d
1 1 2 1
2 1 2 3
3 1 3 5
df_fin:
a e f g
1 0 2 1
2 5 2 3
3 1 3 5
5 2 4 6
7 3 2 5
I want to get result as :
a b c d a e f g
1 1 2 1 1 0 2 1
2 1 2 3 2 5 2 3
3 1 3 5 3 1 3 5
In other words, I have two different data frames that share one column (a). I want to compare these two columns (df_fin.a and df_out.a), select the rows from df_fin that have the same value in column a, and create a new dataframe that has the selected rows from df_fin plus the columns from df_out.
I think you need merge with left join:
df = pd.merge(df_out, df_fin, on='a', how='left')
print (df)
a b c d e f g
0 1 1 2 1 0 2 1
1 2 1 2 3 5 2 3
2 3 1 3 5 1 3 5
EDIT:
df1 = df_fin[df_fin['a'].isin(df_out['a'])]
df2 = df_out.join(df1.set_index('a'), on='a')
print (df2)
a b c d e f g
0 1 1 2 1 0 2 1
1 2 1 2 3 5 2 3
2 3 1 3 5 1 3 5
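Both variants above keep every row of df_out; if df_out had a value of a that is missing from df_fin, that row would get NaN in e, f and g. If only values present in both frames are wanted, an inner join is a possible alternative. A minimal sketch with the frames rebuilt from the question:

```python
import pandas as pd

df_out = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 1, 1],
                       'c': [2, 2, 3], 'd': [1, 3, 5]})
df_fin = pd.DataFrame({'a': [1, 2, 3, 5, 7], 'e': [0, 5, 1, 2, 3],
                       'f': [2, 2, 3, 4, 2], 'g': [1, 3, 5, 6, 5]})

# how='inner' keeps only the values of a that occur in both frames.
df = pd.merge(df_out, df_fin, on='a', how='inner')
print(df)
```

On this sample data the inner and left joins give the same three rows, since every a in df_out also appears in df_fin.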