Change a Pandas DataFrame with Integer Index - pandas

I have converted a Python dict to a pandas DataFrame:

d = {
    u'erterreherh': {
        u'account': u'rgrgrgrg',
        u'data': u'192.168.1.1',
    },
    u'hkghkghkghk': {
        u'account': u'uououopuopuop',
        u'data': u'192.168.1.170',
    },
}
df = pd.DataFrame.from_dict(d, orient='index')
account data
aa bbss
zz sssss
vv sss
"account" is index here. I want to dataframe like below, how can I do this?
account data
0 aa bbss
1 zz sssss
2 vv sss

You need rename_axis to change the index name, followed by reset_index:
d = {
    u'erterreherh': {
        u'account': u'rgrgrgrg',
        u'data': u'192.168.1.1'
    },
    u'hkghkghkghk': {
        u'account': u'uououopuopuop',
        u'data': u'192.168.1.170'
    }
}
df = pd.DataFrame.from_dict(d, orient='index')
df = df.rename_axis('account1').reset_index()
print(df)

      account1           data        account
0  erterreherh    192.168.1.1       rgrgrgrg
1  hkghkghkghk  192.168.1.170  uououopuopuop
If you need to overwrite the column account with values from the index:

df = df.assign(account=df.index).reset_index(drop=True)
print(df)

            data      account
0    192.168.1.1  erterreherh
1  192.168.1.170  hkghkghkghk
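If the account values already present in the column are the ones you want and the dict-key index can simply be discarded, a single reset_index(drop=True) is enough. A minimal sketch using the sample dict from the question:

```python
import pandas as pd

d = {
    u'erterreherh': {u'account': u'rgrgrgrg', u'data': u'192.168.1.1'},
    u'hkghkghkghk': {u'account': u'uououopuopuop', u'data': u'192.168.1.170'},
}

df = pd.DataFrame.from_dict(d, orient='index')
# drop=True throws the dict-key index away and installs a 0..n-1 RangeIndex,
# leaving the existing 'account' and 'data' columns untouched
df = df.reset_index(drop=True)
print(df)
```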

df.reset_index() is indeed working for me.

df

          data
account
aa        bbss
zz        sssss
vv        sss

df = df.reset_index()

  account   data
0      aa   bbss
1      zz  sssss
2      vv    sss

Related

How to update a pandas column

Given the following dataframe
col1 col2
1 ('A->B', 'B->C')
2 ('A->D', 'D->C', 'C->F')
3 ('A->K', 'K->M', 'M->P')
...
I want to convert this to the following format
col1 col2
1 'A-B-C'
2 'A-D-C-F'
3 'A-K-M-P'
...
Each sequence shows an arc within a path. Hence, the sequence is like (a,b), (b,c), (c,d) ...
def merge_values(val):
    val = [x.split('->') for x in val]
    out = []
    for char in val:
        out.append(char[0])
    out.append(val[-1][1])
    return '-'.join(out)

df['col2'] = df['col2'].apply(merge_values)
print(df)
Output:

   col1     col2
0     1    A-B-C
1     2  A-D-C-F
2     3  A-K-M-P
Given

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [
        ('A->B', 'B->C'),
        ('A->D', 'D->C', 'C->F'),
        ('A->K', 'K->M', 'M->P'),
    ],
})
You can do:
def combine(t, old_sep='->', new_sep='-'):
    if not t:
        return ''
    if isinstance(t, str):
        t = [t]
    tokens = [x.partition(old_sep)[0] for x in t]
    # append (not +=) so multi-character node names stay intact
    tokens.append(t[-1].partition(old_sep)[-1])
    return new_sep.join(tokens)

df['col2'] = df['col2'].apply(combine)
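The same arc-merging idea can be sanity-checked with a small self-contained script (merge_arcs is a hypothetical helper name, not from the question):

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3],
    'col2': [
        ('A->B', 'B->C'),
        ('A->D', 'D->C', 'C->F'),
        ('A->K', 'K->M', 'M->P'),
    ],
})

def merge_arcs(arcs, old_sep='->', new_sep='-'):
    # take the head node of every arc, then the tail node of the last arc
    nodes = [a.split(old_sep)[0] for a in arcs]
    nodes.append(arcs[-1].split(old_sep)[1])
    return new_sep.join(nodes)

df['col2'] = df['col2'].apply(merge_arcs)
print(df['col2'].tolist())  # ['A-B-C', 'A-D-C-F', 'A-K-M-P']
```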

python pandas divide dataframe in method chain

I want to divide a dataframe by a number:
df = df/10
Is there a way to do this in a method chain?
# idea:
df = df.filter(['a','b']).query("a>100").assign(**divide by 10)
We can use DataFrame.div here:

df = df[['a','b']].query("a>100").div(10)

      a    b
0  40.0  0.7
1  50.0  0.8
5  70.0  0.3
Use DataFrame.pipe with a lambda to apply a function to the whole DataFrame:

df = pd.DataFrame({
    'a': [400, 500, 40, 50, 5, 700],
    'b': [7, 8, 9, 4, 2, 3],
    'c': [1, 3, 5, 7, 1, 0],
    'd': [5, 3, 6, 9, 2, 4]
})
df = df.filter(['a','b']).query("a>100").pipe(lambda x: x / 10)
print(df)

      a    b
0  40.0  0.7
1  50.0  0.8
5  70.0  0.3
If you use apply instead, each column is divided separately:

df = df.filter(['a','b']).query("a>100").apply(lambda x: x / 10)

You can see the difference by printing:

df1 = df.filter(['a','b']).query("a>100").pipe(lambda x: print(x))

     a  b
0  400  7
1  500  8
5  700  3

df2 = df.filter(['a','b']).query("a>100").apply(lambda x: print(x))

0    400
1    500
5    700
Name: a, dtype: int64
0    7
1    8
5    3
Name: b, dtype: int64
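For elementwise division, all three spellings (div, pipe, apply) produce identical values; they differ only in what the callable receives. A small check on the sample data above:

```python
import pandas as pd

df = pd.DataFrame({
    'a': [400, 500, 40, 50, 5, 700],
    'b': [7, 8, 9, 4, 2, 3],
})

subset = df.filter(['a', 'b']).query("a>100")
via_div = subset.div(10)                    # vectorised division
via_pipe = subset.pipe(lambda x: x / 10)    # callable sees the whole DataFrame
via_apply = subset.apply(lambda x: x / 10)  # callable sees one Series per column

print(via_div.equals(via_pipe) and via_div.equals(via_apply))
```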

Pandas add a summary column that counts values that are not empty strings

I have a table that looks like this:

   A       B     C
1  foo
2  foobar  blah
3

I want to count up the non-empty columns from A, B and C to get a summary column like this:

   A       B     C  sum
1  foo               1
2  foobar  blah      2
3                    0
Here is how I'm trying to do it:

import pandas as pd

df = pd.DataFrame({
    'A': ["foo", "foobar", ""],
    'B': ["", "blah", ""],
    'C': ["", "", ""]
})
print(df)

df['sum'] = df[['A', 'B', 'C']].notnull().sum(axis=1)
df['sum'] = (df[['A', 'B', 'C']] != "").sum(axis=1)
These last two lines are different ways to get what I want but they aren't working. Any suggestions?
df['sum'] = (df[['A', 'B', 'C']] != "").sum(axis=1)
Worked. Thanks for the assistance.
This one-liner worked for me :) (note that it needs numpy imported as np):

df["sum"] = df.replace("", np.nan).T.count().reset_index().iloc[:, 1]

How to replace pd.NamedAgg to a code compliant with pandas 0.24.2?

Hello, I am obliged to downgrade my pandas version to 0.24.2.
As a result, pd.NamedAgg is no longer recognized.

import pandas as pd
import numpy as np

agg_cols = ['A', 'B', 'C']
agg_df = df.groupby(agg_cols).agg(
    max_foo=pd.NamedAgg(column='Foo', aggfunc=np.max),
    min_foo=pd.NamedAgg(column='Foo', aggfunc=np.min)
).reset_index()

Can you please help me change my code to make it compliant with version 0.24.2?
Thank you very much.
Sample:

agg_df = df.groupby(agg_cols)['Foo'].agg(
    [('max_foo', np.max), ('min_foo', np.min)]
).reset_index()

df = pd.DataFrame({
    'A': list('a') * 6,
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7] * 6,
    'Foo': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb')
})
agg_cols = ['A', 'B', 'C']

agg_df = df.groupby(agg_cols).agg(
    max_foo=pd.NamedAgg(column='Foo', aggfunc=np.max),
    min_foo=pd.NamedAgg(column='Foo', aggfunc=np.min)
).reset_index()
print(agg_df)

   A  B  C  max_foo  min_foo
0  a  4  7        5        0
1  a  5  7        7        1
Because there is only one column, Foo, to process, select it after the groupby and pass tuples of new column names and aggregate functions:

agg_df = df.groupby(agg_cols)['Foo'].agg(
    [('max_foo', np.max), ('min_foo', np.min)]
).reset_index()
print(agg_df)

   A  B  C  max_foo  min_foo
0  a  4  7        5        0
1  a  5  7        7        1
Another idea is to pass a dictionary of lists of aggregate functions:

agg_df = df.groupby(agg_cols).agg({'Foo': ['max', 'min']})
agg_df.columns = [f'{b}_{a.lower()}' for a, b in agg_df.columns]
agg_df = agg_df.reset_index()
print(agg_df)

   A  B  C  max_foo  min_foo
0  a  4  7        5        0
1  a  5  7        7        1
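As a cross-check, both 0.24-compatible spellings should agree with each other on the sample data (the flattened names are lower-cased here so they match the tuple version):

```python
import pandas as pd

df = pd.DataFrame({
    'A': list('a') * 6,
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7] * 6,
    'Foo': [1, 3, 5, 7, 1, 0],
})
agg_cols = ['A', 'B', 'C']

# tuple syntax on a single selected column
via_tuples = df.groupby(agg_cols)['Foo'].agg(
    [('max_foo', 'max'), ('min_foo', 'min')]
).reset_index()

# dict-of-lists syntax, flattening the MultiIndex columns afterwards
via_dict = df.groupby(agg_cols).agg({'Foo': ['max', 'min']})
via_dict.columns = ['{}_{}'.format(b, a.lower()) for a, b in via_dict.columns]
via_dict = via_dict.reset_index()

print(via_tuples.equals(via_dict))
```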

Append new columns to a pandas dataframe in a groupby object

I would like to add columns to a pandas DataFrame in a groupby object.

# create the dataframe
idx = ['a', 'b', 'c'] * 10
df = pd.DataFrame({
    'f1': np.random.randn(30),
    'f2': np.random.randn(30),
    'f3': np.random.randn(30),
    'f4': np.random.randn(30),
    'f5': np.random.randn(30)},
    index=idx)

colnum = [1, 2, 3, 4, 5]
newcol = ['a' + str(s) for s in colnum]

# group by the index
df1 = df.groupby(df.index)
I am trying to loop over each group in the groupby object and add new columns to the current dataframe in the group:

for group in df1:
    tmp = group[1]
    for s in range(len(tmp.columns)):
        print(s)
        tmp.loc[:, newcol[s]] = tmp[[tmp.columns[s]]] * colnum[s]
    group[1] = tmp

I'm unable to add the new dataframe to the group object; I get:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

TypeError: 'tuple' object does not support item assignment

Is there a way to replace the dataframe in the groupby object with a new dataframe?
Based on your code (note: df.mul([1, 2, 3, 4, 5]) also works for your example output):

grouplist = []
for _, group in df1:
    tmp = group.copy()  # copy so we modify a real frame, not a view
    for s in range(len(tmp.columns)):
        tmp.loc[:, newcol[s]] = tmp[tmp.columns[s]] * colnum[s]
    grouplist.append(tmp)

grouplist[1]
Out[217]:
         f1        f2        f3        f4        f5        a1        a2  \
b -0.262064 -1.148832 -1.835077 -0.244675 -0.215145 -0.262064 -2.297664
b -1.595659 -0.448111 -0.908683 -0.157839  0.208497 -1.595659 -0.896222
b  0.373039 -0.557571  1.154175 -0.172326  1.236915  0.373039 -1.115142
b -1.485564  1.508292  0.420220 -0.380387 -0.725848 -1.485564  3.016584
b -0.760250 -0.380997 -0.774745 -0.853975  0.041411 -0.760250 -0.761994
b  0.600410  1.822984 -0.310327 -0.281853  0.458621  0.600410  3.645968
b -0.707724  1.706709 -0.208969 -1.696045 -1.644065 -0.707724  3.413417
b -0.892057  1.225944 -1.027265 -1.519110 -0.861458 -0.892057  2.451888
b -0.454419 -1.989300  2.241945 -1.071738 -0.905364 -0.454419 -3.978601
b  1.171569 -0.827023 -0.404192 -1.495059  0.500045  1.171569 -1.654046

         a3        a4        a5
b -5.505230 -0.978700 -1.075727
b -2.726048 -0.631355  1.042483
b  3.462526 -0.689306  6.184576
b  1.260661 -1.521547 -3.629239
b -2.324236 -3.415901  0.207056
b -0.930980 -1.127412  2.293105
b -0.626908 -6.784181 -8.220324
b -3.081796 -6.076439 -4.307289
b  6.725834 -4.286954 -4.526821
b -1.212577 -5.980235  2.500226
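The loop above can be collapsed into a vectorised form, as the df.mul hint suggests: scale every column by its weight in one shot and append the results as a1..a5. A sketch with a fixed seed for reproducibility:

```python
import numpy as np
import pandas as pd

idx = ['a', 'b', 'c'] * 10
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((30, 5)),
                  columns=['f1', 'f2', 'f3', 'f4', 'f5'],
                  index=idx)
colnum = [1, 2, 3, 4, 5]

# mul aligns the list against the columns, scaling f1..f5 by 1..5
scaled = df.mul(colnum)
# append the scaled copies as a1..a5 in a single assign call
out = df.assign(**{'a' + c[1:]: scaled[c] for c in df.columns})
print(out.columns.tolist())
```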