How to get pandas crosstab margins value? [duplicate] - pandas

This question already has answers here:
Panda .loc or .iloc to select the columns from a dataset
(2 answers)
How are iloc and loc different?
(6 answers)
Closed 4 years ago.
I have a crosstab DataFrame that looks like this:
Batch     1   2   3   4  All
Fruits
Orange    2   3   4   5   14
Mango     3   2   1   7   13
Grape     2   2   2   2    8
Apple     5   5   8   9   27
All      13  14  18  27   62
The 'All' column and row are generated by the margins parameter of pandas crosstab. How can I get the 'All' row values for each column, i.e. 13, 14, 18 and 27?
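Because the margins row is an ordinary row labelled 'All' (the default `margins_name`), it can be selected by label with `.loc` or by position with `.iloc`, which is what the linked duplicates cover. A minimal sketch, assuming small hypothetical raw data in place of the asker's (so the totals differ from the table above):

```python
import pandas as pd

# Hypothetical raw data: one row per (fruit, batch) observation
raw = pd.DataFrame({
    "Fruits": ["Orange", "Orange", "Mango", "Apple", "Apple", "Apple"],
    "Batch": [1, 2, 1, 2, 2, 3],
})

ct = pd.crosstab(raw["Fruits"], raw["Batch"], margins=True)

# Select the margins row by its label, then drop the grand total
# in the 'All' column to keep only the per-batch totals.
col_totals = ct.loc["All"].drop("All")
print(col_totals)

# Positional equivalent: last row, every column except the last
col_totals_pos = ct.iloc[-1, :-1]
print(col_totals_pos)
```

The label-based version keeps working if the row order changes; the positional one relies on pandas appending the margins row and column last.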

Related

Pandas: Create a new column that alternate between values in two other columns [duplicate]

This question already has answers here:
Pandas Melt Function
(2 answers)
Closed 1 year ago.
How can I transform a Dataframe with columns S (start), E (end), V (value)
S E V
1 2 3
2 5 11
5 11 5
And transform it to:
T V
1 3
2 3
2 11
5 11
5 5
11 5
?
This is so that we can plot the data in such a way that the value V (y-axis) stays constant throughout each interval.
Edit:
Some are suggesting this is the same as a "how do I use melt()?" question. However, the order of the result is important.
Or with set_index/stack:
df = df.set_index('V').stack().reset_index(-1, drop=True).reset_index(name='T')
OUTPUT:
    V   T
0   3   1
1   3   2
2  11   2
3  11   5
4   5   5
5   5  11
Try with melt (note that the result lists all S values before all E values, so the rows are not interleaved):
df.melt('V')
Out[39]:
    V variable  value
0   3        S      1
1  11        S      2
2   5        S      5
3   3        E      2
4  11        E      5
5   5        E     11
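If you want to stay with melt but still need the interleaved S, E, S, E order the question asks for, one option is to carry the original row number through melt and stable-sort on it afterwards. A sketch using the question's sample data; the `index` column is simply what `reset_index()` produces for an unnamed index:

```python
import pandas as pd

df = pd.DataFrame({"S": [1, 2, 5], "E": [2, 5, 11], "V": [3, 11, 5]})

# melt emits all S rows first, then all E rows; keep the original
# row number so the rows can be re-interleaved afterwards.
long = df.reset_index().melt(
    id_vars=["index", "V"], value_vars=["S", "E"], value_name="T"
)

# A stable sort on the original row number keeps S before E within
# each row, because melt produced every S row before any E row.
out = (
    long.sort_values("index", kind="stable")
        .loc[:, ["T", "V"]]
        .reset_index(drop=True)
)
print(out)
```

The stability of the sort is what preserves the S-before-E order; a non-stable sort (the default quicksort) gives no such guarantee for tied keys.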

How to split pandas data frame by repeating rows? [duplicate]

This question already has answers here:
Split pandas dataframe based on groupby
(4 answers)
Closed 1 year ago.
I have the df like this:
1 a 12
2 a 3
3 b 45
4 b 34
5 b 23
and I need to split it to two df like this:
1 a 12
2 a 3
and
3 b 45
4 b 34
5 b 23
Does anyone know a reasonably quick way to do this?
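Try a dict comprehension over groupby, where 'col' stands for the column holding the repeated values:
d = {key: group for key, group in df.groupby('col')}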
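The question's frame has no header, so the column names `id`, `col` and `value` below are assumptions. This sketch builds the sample frame and splits it with the groupby dict comprehension from the answer:

```python
import pandas as pd

# Column names are hypothetical; the question shows no header row.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "col": ["a", "a", "b", "b", "b"],
    "value": [12, 3, 45, 34, 23],
})

# groupby yields (key, sub-DataFrame) pairs; collect them into a dict
frames = {key: group for key, group in df.groupby("col")}

print(frames["a"])
print(frames["b"])
```

groupby collects every row with the same key, even if that key reappears later in the frame. If each consecutive run should become its own frame instead, group by a run id such as `(df["col"] != df["col"].shift()).cumsum()`.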

Merge all columns into the first column, appending each after the last row

I have tabular data like this:
1 4 7
2 5 8
3 6 9
I would like the data to look like this:
1
2
3
4
5
6
7
8
9
Does anyone know how to do this with pandas? (Or, since I don't know what this procedure is called, the keyword to search for?)
Thank you in advance!
You can use NumPy reshaping in column-major order (order='F') and the pandas DataFrame constructor:
pd.DataFrame(df.to_numpy().reshape(-1, 1, order='F'))
Output:
0
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
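As for the keyword: pandas documents `melt` as "unpivoting" a DataFrame from wide to long format, which with no id columns stacks each column's values below the previous column's. A sketch comparing it with the reshape answer, assuming default integer column labels 0, 1, 2 since the question shows no header:

```python
import pandas as pd

# Default integer column labels 0, 1, 2 are assumed
df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]])

# Option 1: NumPy reshape in column-major (Fortran) order, as in the answer
by_reshape = pd.DataFrame(df.to_numpy().reshape(-1, 1, order="F"))

# Option 2: melt (unpivot) with no id columns; keep only the values
by_melt = df.melt(value_name="value")[["value"]]

print(by_reshape)
print(by_melt)
```

melt also produces a `variable` column recording which original column each value came from, which is handy if you need to trace values back; it is dropped here to match the requested single-column output.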

Only sum rows with specific column value [duplicate]

This question already has answers here:
How do I sum values in a column that match a given condition using pandas?
(3 answers)
Closed 3 years ago.
I have a data frame that looks like this:
Index  Measure  Tom  Harry  Mary
0      A         10      5     9
1      B          4      4     8
2      A         11      5     7
3      B          2      3     6
4      A          8      5     5
5      B          4      7     5
6      A         10      5     4
7      B          5      5     3
I basically need to sum the values for each person over the rows where Measure = A. So for Tom it would be 39, Harry 20 and Mary 25.
Thanks in advance!
I figured it out! I used:
pd.pivot_table(df, index=['Measure'], aggfunc='sum')
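A more direct alternative, and the approach the linked duplicate is about, is to filter with a boolean mask and then sum each person's column. The sketch below rebuilds the question's table (with the shown Index as the default index) and checks that both routes give the same totals for Measure A:

```python
import pandas as pd

df = pd.DataFrame({
    "Measure": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "Tom": [10, 4, 11, 2, 8, 4, 10, 5],
    "Harry": [5, 4, 5, 3, 5, 7, 5, 5],
    "Mary": [9, 8, 7, 6, 5, 5, 4, 3],
})

# Keep only the Measure == "A" rows, then sum each person's column
a_totals = df.loc[df["Measure"] == "A", ["Tom", "Harry", "Mary"]].sum()
print(a_totals)  # Tom 39, Harry 20, Mary 25

# The pivot_table route computes totals for every Measure; pick row "A"
pivot = pd.pivot_table(df, index="Measure", aggfunc="sum")
print(pivot.loc["A"])
```

The mask version only computes what was asked for, while pivot_table is convenient when you also want the B totals.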

How to apply a function to multiple columns in Pandas [duplicate]

This question already has answers here:
Selecting multiple columns in a Pandas dataframe
(22 answers)
Closed 4 years ago.
I have a bunch of columns which require cleaning in Pandas. I've written a function which does that cleaning, but I'm not sure how to apply the same function to many columns. Here is what I'm trying:
df["Passengers", "Revenue", "Cost"].apply(convert_dash_comma_into_float)
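But I'm getting a KeyError.
Use double brackets [[]] as @chrisz points out: df["Passengers", "Revenue", "Cost"] looks up a single column whose label is the tuple ("Passengers", "Revenue", "Cost"), hence the KeyError, whereas df[["Passengers", "Revenue", "Cost"]] selects the three columns.
Here is an MCVE: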
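import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(30).reshape(10, -1), columns=['A', 'B', 'C'])
def f(x):
    # Replace even numbers with 0 (mask sets values where the condition is True)
    return x.mask(x % 2 == 0, 0)
# Apply the function only to columns B and C, selected with double brackets
df[['B', 'C']] = df[['B', 'C']].apply(f)
print(df)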
Output:
    A   B   C
0   0   1   0
1   3   0   5
2   6   7   0
3   9   0  11
4  12  13   0
5  15   0  17
6  18  19   0
7  21   0  23
8  24  25   0
9  27   0  29
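Applied to the question's own columns, the same pattern might look like the sketch below. The body of `convert_dash_comma_into_float` was never posted, so the version here is a hypothetical stand-in that assumes the cleaning means turning a lone "-" into 0 and stripping thousands separators; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's (unposted) cleaning function:
# strip "," thousands separators, map a lone "-" to 0, convert to float.
def convert_dash_comma_into_float(col: pd.Series) -> pd.Series:
    cleaned = col.astype(str).str.replace(",", "", regex=False).replace("-", "0")
    return cleaned.astype(float)

# Made-up sample data using the column names from the question
df = pd.DataFrame({
    "Passengers": ["1,200", "-", "350"],
    "Revenue": ["10,500", "2,000", "-"],
    "Cost": ["-", "900", "1,100"],
})

cols = ["Passengers", "Revenue", "Cost"]
# Double brackets select a list of columns; apply runs the function per column
df[cols] = df[cols].apply(convert_dash_comma_into_float)
print(df)
```

Assigning back through `df[cols] = ...` replaces the three columns, so their dtype changes from object (strings) to float.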