With to_frame(), the column name is not printed on the same row as the index name
This is my code. It groups by "型号" (type), sums "重量" (weight), and excludes rows that already have a value in the "是否已经发送" (already sent) column.
import pandas as pd
import numpy as np
import sys
import os
script_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(script_dir ) # change to the path that you already know
ClientName = sys.argv[1]
except :
df = pd.read_excel("Summary.xlsm")
df = df[df['客户'].str.contains(ClientName)][pd.isnull(df[u"是否已经发送"])].groupby([ u'型号'])[u'重量'].sum()
print('[CQ:face,id=21] ' + '*' * 10 + u'以下是' + ClientName + u'未发送的重量' + '*' * 10 + '[CQ:face,id=21]')
The output is:
[CQ:face,id=21] **********以下是KATUN未发送的重量**********[CQ:face,id=
型号 (****the column name is missing here*****)
HG-R2075 2040
HG220 680
Name: 重量, dtype: int64
Why is the column name missing?
This is the output I want. How can I get it?
型号 重量
HG-R2075 2040
HG220 680
Name: 重量, dtype: int64

The result df of your groupby operation is actually a Series, not a DataFrame. That is why it is printed in a different format.
print(df.to_frame()) should do the trick.
EDIT: Actually, in such a DataFrame the index name and the column name are not printed on the same row. To get cleaner output, use reset_index to get two proper columns:
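A minimal sketch of the three shapes involved. The rows below are made up for illustration (only the column names and the two totals come from the question), and the variable names totals, as_frame and flat are hypothetical:

```python
import pandas as pd

# Made-up rows; only the column names and totals mirror the question.
df = pd.DataFrame({
    "型号": ["HG-R2075", "HG220", "HG-R2075", "HG220"],
    "重量": [1020, 340, 1020, 340],
})

# groupby(...)[col].sum() returns a Series whose index is 型号.
totals = df.groupby("型号")["重量"].sum()

# to_frame() gives a one-column DataFrame; 型号 is still the index,
# so its name is printed on a separate header row.
as_frame = totals.to_frame()

# reset_index() moves 型号 into a regular column, so both names
# share a single header row.
flat = totals.reset_index()

print(flat.to_string(index=False))
```

Passing index=False to to_string only hides the 0, 1, ... row labels; it is optional.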

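First, combine both conditions into a single boolean mask with & instead of chaining two boolean indexing steps.
If you need a two-column DataFrame, pass as_index=False to groupby or call Series.reset_index on the result: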
mask = df['客户'].str.contains(ClientName) & df[u"是否已经发送"].isnull()
df = df[mask].groupby([ u'型号'], as_index=False)[u'重量'].sum()
df = df[mask].groupby([ u'型号'])[u'重量'].sum().reset_index()
For a one-column DataFrame, use Series.to_frame; the group keys then stay in the index:
df = df[mask].groupby([ u'型号'])[u'重量'].sum().to_frame()
N = 10
df = pd.DataFrame({'客户':np.random.choice(list('abc'), size=N),
u"是否已经发送":np.random.choice([np.nan,0], size=N),
u'型号':np.random.randint(2, size=N),
u'重量':np.random.randint(10, size=N)})
print (df)
型号 客户 是否已经发送 重量
0 0 a 0.0 4
1 0 a 0.0 0
2 1 b NaN 8
3 1 b NaN 5
4 1 c 0.0 6
5 1 a NaN 3
6 1 a NaN 3
7 1 b 0.0 4
8 0 a NaN 2
9 1 c NaN 8
ClientName = 'a'
mask = df['客户'].str.contains(ClientName) & df[u"是否已经发送"].isnull()
df1 = df[mask].groupby([ u'型号'], as_index=False)[u'重量'].sum()
型号 重量
0 0 2
1 1 6
df1 = df[mask].groupby([ u'型号'])[u'重量'].sum().reset_index()
型号 重量
0 0 2
1 1 6
df2 = df[mask].groupby([ u'型号'])[u'重量'].sum().to_frame()
print (df2)
0 2
1 6


df.diff(): how to compare each row of A with the previous row of B

Given two columns A and B, how can I compare each value in A with the value of B in the previous row?
A B diff
0 0.904560 0.208318 0
1 0.679290 0.747496 0
2 0.069841 0.165834 0
3 0.045818 0.907888 0
4 0.485712 0.593785 0
5 0.771665 0.800182 0
6 0.485041 0.024829 0
7 0.897172 0.584406 0
8 0.561953 0.626699 0
9 0.412803 0.900643 0
You can use Pandas' shift function to create a 'lagged' version of column B. Then it's a simple difference between columns.
from io import StringIO
import pandas as pd
raw = '''
df = pd.read_csv(StringIO(raw))
df['B_lag'] = df.B.shift(1)
df['diff'] = df.A - df.B_lag
The output looks like:
A B B_lag diff
0 0.904560 0.208318 NaN NaN
1 0.679290 0.747496 0.208318 0.470972
2 0.069841 0.165834 0.747496 -0.677655
3 0.045818 0.907888 0.165834 -0.120016
4 0.485712 0.593785 0.907888 -0.422176
5 0.771665 0.800182 0.593785 0.177880
6 0.485041 0.024829 0.800182 -0.315141
7 0.897172 0.584406 0.024829 0.872343
8 0.561953 0.626699 0.584406 -0.022453
9 0.412803 0.900643 0.626699 -0.213896
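A self-contained sketch of the same idea, assuming only the first four rows of the question's data and skipping the helper column. The first row has no previous B, so its diff is NaN:

```python
import pandas as pd

# First four rows of the question's table.
df = pd.DataFrame({
    "A": [0.904560, 0.679290, 0.069841, 0.045818],
    "B": [0.208318, 0.747496, 0.165834, 0.907888],
})

# shift(1) moves B down by one row, so row i of A is compared
# with row i - 1 of B.
df["diff"] = df["A"] - df["B"].shift(1)

print(df)
```

If NaN in the first row is not wanted, shift also accepts a fill_value argument.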

Replace cell values in df based on complex condition

Hello friends,
I would like to iterate through all the numeric columns in the df in a generic way.
For each unique df["Type"] group in each numeric column, replace all values greater than the column mean + 2 standard deviations with NaN.
df = pd.DataFrame(data=d)
Sample df:
PRODUCT Test1 Test2 Type
A 7 99 Y
B 1 10 X
C 2 13 X
A 5 12 Y
B 1 11 Y
C 90 87 X
Expected output:
PRODUCT Test1 Test2 Type
A 7 nan Y
B 1 10 X
C 2 13 X
A 5 12 Y
B 1 11 Y
C nan nan X
Logically, it can go like this:
test_cols = ['Test1', 'Test2']
# calculate mean and std with groupby
groups = df.groupby('Type')
test_mean = groups[test_cols].transform('mean')
test_std = groups[test_cols].transform('std')
# threshold
thresh = test_mean + 2 * test_std
# thresholding
df[test_cols] = np.where(df[test_cols]>thresh, np.nan, df[test_cols])
However, from your sample data set, thresh is:
Test1 Test2
0 10.443434 141.707912
1 133.195890 123.898159
2 133.195890 123.898159
3 10.443434 141.707912
4 10.443434 141.707912
5 133.195890 123.898159
So, it wouldn't change anything.
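A runnable check of that conclusion on the question's sample data. With only three rows per Type, no value can clear the threshold: for n samples, a value lies at most (n - 1) / sqrt(n) sample standard deviations from the mean, which is about 1.15 for n = 3. The variable name exceeds is introduced here for illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "PRODUCT": ["A", "B", "C", "A", "B", "C"],
    "Test1": [7, 1, 2, 5, 1, 90],
    "Test2": [99, 10, 13, 12, 11, 87],
    "Type": ["Y", "X", "X", "Y", "Y", "X"],
})
test_cols = ["Test1", "Test2"]

# Per-Type mean and (sample) standard deviation, aligned to the rows.
groups = df.groupby("Type")
thresh = (groups[test_cols].transform("mean")
          + 2 * groups[test_cols].transform("std"))

# True wherever a value is above its group's threshold.
exceeds = df[test_cols] > thresh

print(thresh)
print(exceeds.to_numpy().any())
```

With more rows per group, or a lower multiplier than 2, the same mask can be passed to DataFrame.mask to blank out the outliers.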
You can get this through a groupby and transform:
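import pandas as pd
import numpy as np
# sample data from the question
df = pd.DataFrame()
df['Product'] = ['A', 'B', 'C', 'A', 'B', 'C']
df['Test1'] = [7, 1, 2, 5, 1, 90]
df['Test2'] = [99, 10, 13, 12, 11, 87]
df['Type'] = ['Y', 'X', 'X', 'Y', 'Y', 'X']
df = df.set_index('Product')
def nan_out_values(type_df):
    # work on a float copy so NaN can be stored and the input is not mutated
    type_df = type_df.astype(float)
    type_df[type_df > type_df.mean() + 2*type_df.std()] = np.nan
    return type_df
df[['Test1', 'Test2']] = df.groupby('Type').transform(nan_out_values)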

How to add a new row to pandas dataframe with non-unique multi-index

df = pd.DataFrame(np.arange(4*3).reshape(4,3), index=[['a','a','b','b'],[1,2,1,2]], columns=list('xyz'))
where df looks like:
Now I add a new row by:
Then df becomes:
Now I want to do the same but with a different df that has non-unique multi-index:
df = pd.DataFrame(np.arange(4*3).reshape(4,3), index=[['a','a','b','b'],[1,1,2,2]], columns=list('xyz'))
which looks like:
and call
The result is "Exception: cannot handle a non-unique multi-index!"
How could I achieve the goal?
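Use concat with a helper DataFrame (DataFrame.append did the same job, but it was deprecated in pandas 1.4 and removed in 2.0):
df1 = pd.DataFrame([[0,0,0]],
                   columns=df.columns,
                   index=pd.MultiIndex.from_arrays([['new'], ['']]))
# df2 = df.append(df1)  # pandas < 2.0 only
df2 = pd.concat([df, df1])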
print (df2)
x y z
a 1 0 1 2
1 3 4 5
b 2 6 7 8
2 9 10 11
new 0 0 0
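The same approach as a self-contained sketch, assuming the non-unique frame from the question. The helper frame reuses df.columns so the new row lines up under x, y and z; the empty string for the second index level is only a placeholder label:

```python
import numpy as np
import pandas as pd

# Frame with a non-unique MultiIndex, as in the question.
df = pd.DataFrame(
    np.arange(4 * 3).reshape(4, 3),
    index=[['a', 'a', 'b', 'b'], [1, 1, 2, 2]],
    columns=list('xyz'),
)

# One-row helper frame with a matching two-level index.
df1 = pd.DataFrame(
    [[0, 0, 0]],
    columns=df.columns,
    index=pd.MultiIndex.from_arrays([['new'], ['']]),
)

# concat stacks the rows and does not require unique index labels.
df2 = pd.concat([df, df1])

print(df2)
```

Because concat returns a new object, df itself is left unchanged.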