Keep the DataFrame index name after appending, a list of Series? - pandas

I want to keep the name of the index of this DataFrame after appending a list of Series, as it is kept after appending them one at a time, but:
df = pd.DataFrame([[1,2],[3,4]],index = pd.Index(['a','b'],name='keepthisname'))
0 1
keepthisname
a 1 2
b 3 4
serc = pd.Series([5,6],name='c')
0 5
1 6
Name: c, dtype: int64
dfc = df.append(serc) # one at a time works
0 1
keepthisname
a 1 2
b 3 4
c 5 6
serd = pd.Series([7,8],name='d') # as further evidenced with this...
dfc.append(serd)
0 1
keepthisname
a 1 2
b 3 4
c 5 6
d 7 8
df.append([serc,serd]) # but this wipes out the name of the index
0 1
a 1 2
b 3 4
c 5 6
d 7 8

Related

Allotting unique identifier to a group of groups in pandas dataframe

Given a frame like this
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,4,6,3,7,3,2,11,13,10,1,5],'B':[1,1,1,2,2,2,2,3,3,3,3,3,4,4],
'C':[1,1,1,1,1,1,1,2,2,2,2,2,3,3]})
I want to allot a unique identifier to multiple groups in column B. For example, going from top for every two groups allot a unique identifier as shown in red boxes in below image. The end result would look like below:
Currently I am doing like below but it seems to be over kill. It's taking too much time to update even 70,000 rows:
b_unique_cnt = df['B'].nunique()
the_list = list(range(1, b_unique_cnt+1))
slice_size = 2
list_of_slices = zip(*(iter(the_list),) * slice_size)
counter = 1
df['D'] = -1
for i in list_of_slices:
df.loc[df['B'].isin(i), 'D'] = counter
counter = counter + 1
df.head(15)
You could do
df['new'] = df.B.factorize()[0]//2+1
#(df.groupby(['B'], sort=False).ngroup()//2).add(1)
df
Out[153]:
A B C new
0 1 1 1 1
1 2 1 1 1
2 3 1 1 1
3 4 2 1 1
4 6 2 1 1
5 3 2 1 1
6 7 2 1 1
7 3 3 2 2
8 2 3 2 2
9 11 3 2 2
10 13 3 2 2
11 10 3 2 2
12 1 4 3 2
13 5 4 3 2

Maximum of calculated pandas column and 0

I have a very simple problem (I guess) but don't find the right syntax to do it :
The following Dataframe :
A B C
0 7 12 2
1 5 4 4
2 4 8 2
3 9 2 3
I need to create a new column D equal for each row to max (0 ; A-B+C)
I tried a np.maximum(df.A-df.B+df.C,0) but it doesn't match and give me the maximum value of the calculated column for each row (= 10 in the example).
Finally, I would like to obtain the DF below :
A B C D
0 7 12 2 0
1 5 4 4 5
2 4 8 2 0
3 9 2 3 10
Any help appreciated
Thanks
Let us try
df['D'] = df.eval('A-B+C').clip(lower=0)
Out[256]:
0 0
1 5
2 0
3 10
dtype: int64
You can use np.where:
s = df["A"]-df["B"]+df["C"]
df["D"] = np.where(s>0, s, 0) #or s.where(s>0, 0)
print (df)
A B C D
0 7 12 2 0
1 5 4 4 5
2 4 8 2 0
3 9 2 3 10
To do this in one line you can use apply to apply the maximum function to each row seperately.
In [19]: df['D'] = df.apply(lambda s: max(s['A'] - s['B'] + s['C'], 0), axis=1)
In [20]: df
Out[20]:
A B C D
0 0 0 0 0
1 5 4 4 5
2 0 0 0 0
3 9 2 3 10

How to concatenate a dictionary of pandas DataFrames into a signle DataFrame?

I have three DataFrames containing each a single row
dfA = pd.DataFrame( {'A':[3], 'B':[2], 'C':[1], 'D':[0]} )
dfB = pd.DataFrame( {'A':[9], 'B':[3], 'C':[5], 'D':[1]} )
dfC = pd.DataFrame( {'A':[3], 'B':[4], 'C':[7], 'D':[8]} )
for instance dfA is
A B C D
0 3 2 1 0
I organize them in a dictionary:
data = {'row_1': dfA, 'row_2': dfB, 'row_3': dfC}
I want to concatenate them into a single DataFrame
ans = pd.concat(data)
which returns
A B C D
row_1 0 3 2 1 0
row_2 0 9 3 5 1
row_3 0 3 4 7 8
whereas I want to obtain this
A B C D
row_1 3 2 1 0
row_2 9 3 5 1
row_3 3 4 7 8
That is to say I want to "drop" an index column.
How do I do this?
Use DataFrame.reset_index with second level and parameter drop=True:
df = ans.reset_index(level=1, drop=True)
print (df)
A B C D
row_1 3 2 1 0
row_2 9 3 5 1
row_3 3 4 7 8
You can reset index:
pd.concat(data).reset_index(level=-1,drop=True)
Output:
A B C D
row_1 3 2 1 0
row_2 9 3 5 1
row_3 3 4 7 8

Pandas change each group into a single row

I have a dataframe like the follows.
>>> data
target user data
0 A 1 0
1 A 1 0
2 A 1 1
3 A 2 0
4 A 2 1
5 B 1 1
6 B 1 1
7 B 1 0
8 B 2 0
9 B 2 0
10 B 2 1
You can see that each user may contribute multiple claims about a target. I want to only store each user's most frequent data for each target. For example, for the dataframe shown above, I want the result like follows.
>>> result
target user data
0 A 1 0
1 A 2 0
2 B 1 1
3 B 2 0
How to do this? And, can I do this using groupby? (my real dataframe is not sorted)
Thanks!
Using groupby with count create the helper key , then we using idxmax
df['helperkey']=df.groupby(['target','user','data']).data.transform('count')
df.groupby(['target','user']).helperkey.idxmax()
Out[10]:
target user
A 1 0
2 3
B 1 5
2 8
Name: helperkey, dtype: int64
df.loc[df.groupby(['target','user']).helperkey.idxmax()]
Out[11]:
target user data helperkey
0 A 1 0 2
3 A 2 0 1
5 B 1 1 2
8 B 2 0 2

Renaming column of one dataframe by extracting from combination of series and dataframe column names

In the line below, I am renaming the columns of pnlsummary dataframe from the column names of three series (totalheldmw, totalcost and totalsellprofit) and one dataframe (totalheldprofit).
The difficulty I have is to iterate over the column names of the dataframe. I have manually assigned the names as you can see below. I would suppose there is an efficient way of iterating over the column names of the dataframe. Please advice.
pnlsummary.columns =
[totalheldmw.name[0],totalcost.name[0],totalsellprofit.name[0],
totalheldprofit.columns[0],totalheldprofit.columns[1],
totalheldprofit.columns[2],totalheldprofit.columns[3]]
I think you need create list by constants and then add columns names converted to list:
pnlsummary.columns = [totalheldmw.name[0],totalcost.name[0],totalsellprofit.name[0]] +
totalheldprofit.columns[0:3].astype(str).tolist()
Sample:
df = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
df.columns = ['a','s','d'] + df.columns[0:3].tolist()
print (df)
a s d A B C
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b