Pivoting data without column - pandas

Starting from an imported df from excel like that:
Code
Material
Text
QTY
A1
X222
Model3
1
A2
4027721
Gruoup1
1
A2
4647273
Gruoup1.1
4
A1
573828
Gruoup1.2
1
I want to create a new pivot table like that:
Code
Qty
A1
2
A2
5
I tried with the following command but they do not work:
df.pivot(index='Code', columns='',values='Qty')
df_pivot = df ("Code").Qty([sum, max])

You don't need pivot but groupby:
out = df.groupby('Code', as_index=False)['QTY'].sum()
# Or
out = df.groupby('Code')['QTY'].agg(['sum', 'max']).reset_index()
Output:
>>> out
Code sum max
0 A1 2 1
1 A2 5 4
The equivalent code with pivot_table:
out = (df.pivot_table('QTY', 'Code', aggfunc=['sum', 'max'])
.droplevel(1, axis=1).reset_index())

Related

replacing first row of selected column from each group with 0

Existing df :
Id status value
A1 clear 23
A1 in-process 50
A1 done 20
B1 start 2
B1 end 30
Expected df :
Id status value
A1 clear 0
A1 in-process 50
A1 done 20
B1 start 0
B1 end 30
looking to replace first value of each group with 0
Use Series.duplicated for duplicated values, set first duplicate by inverse mask by ~ with DataFrame.loc:
df.loc[~df['Id'].duplicated(), 'value'] = 0
print (df)
Id status value
0 A1 clear 0
1 A1 in-process 50
2 A1 done 20
3 B1 start 0
4 B1 end 30
One approach could be as follows:
Compare the values for each row in df.Id with the next row, combining Series.shift with Series.ne. This will return a boolean Series with True for each first row of a new Id value.
Next, use df.loc to select only rows with True for column value and assign 0.
df.loc[df.Id.ne(df.Id.shift()), 'value'] = 0
print(df)
Id status value
0 A1 clear 0
1 A1 in-process 50
2 A1 done 20
3 B1 start 0
4 B1 end 30
N.B. this approach assumes that the "groups" in Id are sorted (as they seem to be, indeed). If this is not the case, you could use df.sort_values('Id', inplace=True) first, but if that is necessary, the answer by #jezrael will be faster, surely.
df1.mask(~df1.Id.duplicated(),0)

append lists of different length to dataframe pandas

Consider I have multiple lists
A = ['acc_num=1', 'A1', 'A2']
B = ['acc_num=2', 'B1', 'B2', 'B3','B4']
C = ['acc_num=3', 'C1']
How to I put them in dataframe to export to excel as:
acc_num _1 _2 _3 _4
_1 1 A1 A2
_2 2 B1 B2 B3 B4
_3 3 C1
Hi here is a solution for you in 3 basic steps:
Create a DataFrame just by passing a list of your lists
Manipulate the acc_num column and remove the starting string "acc_num=" this is done with a string method on the vectorized column (but that goes maybe to far for now)
Rename the Column Header / Names as you wish by passing a dictionary {} to the df.rename
The Code:
# Create a Dataframe from your lists
df = pd.DataFrame([A,B,C])
# Change Column 0 and remove initial string
df[0] = df[0].str.replace('acc_num=','')
# Change the name of Column 0
df.rename(columns={0:"acc_num"},inplace=True)
Final result:
Out[26]:
acc_num 1 2 3 4
0 1 A1 A2 None None
1 2 B1 B2 B3 B4
2 3 C1 None None None

How to groupby in pandas where i have column values starting with similar letters

Suppose i have a column with values(not column name) L1 xyy, L2 yyy, L3 abc, now i want to group L1, L2 and L3 as L(or any other name also would do).
Similarly i have other values like A1 xxx, A2 xxx, to be grouped form A and so on for other alphabets.
How do i achieve this in pandas?
I have L1, A1 and so on all in same column, and not different columns.
Use indexing by str[0] for return first letter of column and then aggregate some function, e.g. sum:
df = pd.DataFrame({'col':['L1 xyy','L2 yyy','L3 abc','A1 xxx','A2 xxx'],
'val':[2,3,5,1,2]})
print (df)
col val
0 L1 xyy 2
1 L2 yyy 3
2 L3 abc 5
3 A1 xxx 1
4 A2 xxx 2
df1 = df.groupby(df['col'].str[0])['val'].sum().reset_index(name='new')
print (df1)
col new
0 A 3
1 L 10
If need new column by first value:
df['new'] = df['col'].str[0]
print (df)
col val new
0 L1 xyy 2 L
1 L2 yyy 3 L
2 L3 abc 5 L
3 A1 xxx 1 A
4 A2 xxx 2 A

Split a dataframe into multiple dataframes

I have a dataframe; I split it using groupby. I understand this splits the dataframes into multiple dataframes. How can I get back those individual dataframes , based on the groups and name them accordingly? So if said df.groupby(['A','B'])
and A has values A1, and B has values B1-B4, I want to get back those 4 dataframes callefdf_A1B1..df_A1B1, df_A1B2...df_A1B4?
This can be done by locals but not recommend
variables = locals()
for i,j in df.groupby(['A','B']):
variables["df_{0[0]}{0[1]}".format(i)] = j
df_01
Out[332]:
A B C
0 0 1 a-1524112-124
Using dict is the right way
{"df_{0[0]}{0[1]}".format(i) : j for i,j in df.groupby(['A','B'])}
Offering an alternate solution, using pandas.DataFrame.xs and some exec magic -
df = pd.DataFrame({'A': ['a1', 'a2']*4,
'B': ['b1', 'b2', 'b3', 'b4']*2,
'val': [i for i in range(8)]
})
df
# A B val
# 0 a1 b1 0
# 1 a2 b2 1
# 2 a1 b3 2
# 3 a2 b4 3
# 4 a1 b1 4
# 5 a2 b2 5
# 6 a1 b3 6
# 7 a2 b4 7
for i in df.set_index(['A', 'B']).index.unique().tolist():
exec("df_{}{}".format(i[0], i[1]) + " = df.set_index(['A','B']).xs(i)")
df_a1b1
# val
# A B
# a1 b1 0
# b1 4

pandas dataframe group by and agg

I am new to ipython and I am trying to do something with dataframe grouping . I have a dataframe like below
df_test = pd.DataFrame({"A": range(4), "B": ["B1", "B2", "B1", "B2"], "C": ["C1", "C1", np.nan, "C2"]})
df_test
A B C
0 0 B1 C1
1 1 B2 C1
2 2 B1 NaN
3 3 B2 C2
I would like to achieve following things:
1) group by B but creating multilevel column instead of grouped to rows with B1 and B2 as index, B1 and B2 are basically count
2) column A and C are agg function applied with something like {'C':['count'],'A':['sum']}
B
A B1 B2 C
0 6 2 2 3
how ? Thanks
You are doing separate actions to each column. You can hack this by aggregating A and C and then taking the value counts of B separately and then combine the data back together.
ac = df_test.agg({'A':'sum', 'C':'count'})
b = df_test['B'].value_counts()
pd.concat([ac, b]).sort_index().to_frame().T
A B1 B2 C
0 6 2 2 3