Only sum rows with specific column value [duplicate] - pandas

I have a data frame that looks like this:
Index Measure Tom Harry Mary
0 A 10 5 9
1 B 4 4 8
2 A 11 5 7
3 B 2 3 6
4 A 8 5 5
5 B 4 7 5
6 A 10 5 4
7 B 5 5 3
I basically need it to sum the values for each person for the rows where Measure = A. So for Tom, it would be 39, Harry would be 20 & Mary would be 25.
Thanks in advance!

I figured it out!
I used pd.pivot_table(df, index=['Measure'], aggfunc=np.sum)
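If only the totals for Measure == 'A' are needed, a boolean mask is another option. The following is a minimal sketch that rebuilds the sample frame from the question; the variable names people, sums_for_a and sums_by_measure are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Measure": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "Tom":   [10, 4, 11, 2, 8, 4, 10, 5],
    "Harry": [5, 4, 5, 3, 5, 7, 5, 5],
    "Mary":  [9, 8, 7, 6, 5, 5, 4, 3],
})

people = ["Tom", "Harry", "Mary"]

# Keep only the rows where Measure is "A", then sum each person's column.
sums_for_a = df.loc[df["Measure"] == "A", people].sum()
print(sums_for_a)

# To get the totals for every Measure value at once, group instead.
sums_by_measure = df.groupby("Measure")[people].sum()
print(sums_by_measure)
```

The groupby variant should give the same per-Measure table as the pivot_table call, without needing numpy.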

Related

Pandas Groupby Problems with Calculating Column-Wise Quantiles with "quantile"

I need to compute quantiles for a large DataFrame row-wise, that is, across several columns for each row (each "month" in my case). Calling quantile directly on a DataFrame works with the axis keyword, but applying quantile to a groupby is rejected with an error:
TypeError: quantile() got an unexpected keyword argument 'axis'
Here is a case where quantile works, with data like this:
Num Num Num Quantile 0.5
5 6 4 5
4 1 2 2
3 9 7 7
7 2 8 7
5 5 4 5
But if I add more columns and use a groupby statement to find the same quantile(0.5, axis=1), I get the error shown above. Please help, and thank you. My actual data looks like this:
site month Num Num Num Quantile 0.5
0 A 8 5 6 4 5
1 A 9 4 1 2 2
2 A 10 3 9 7 7
3 A 11 7 2 8 7
4 A 12 5 5 4 5
5 B 8 3 7 5 5
6 B 9 6 9 0 6
7 B 10 4 1 3 3
8 B 11 8 3 0 3
9 B 12 5 6 8 6
The confusion arises from the fact that pd.DataFrame.quantile and DataFrameGroupBy.quantile are not the same function. The first has an axis parameter; the second does not. Hence the error.
When you think about it, it is perfectly logical that the second function does not have this option. Suppose we do:
groups = df.groupby('site')
for group in groups:
    print(group[1])
site month Num Num.1 Num.2
0 A 8 5 6 4
1 A 9 4 1 2
2 A 10 3 9 7
3 A 11 7 2 8
4 A 12 5 5 4
site month Num Num.1 Num.2
5 B 8 3 7 5
6 B 9 6 9 0
7 B 10 4 1 3
8 B 11 8 3 0
9 B 12 5 6 8
Now ask yourself which axis could generate a quantile that is meaningfully related to A | B. The answer surely is column-wise. I could get a quantile of Num for A, or of Num.1. E.g.:
print(groups.quantile())
month Num Num.1 Num.2
site
A 10.0 5.0 5.0 4.0
B 10.0 5.0 6.0 3.0
It wouldn't make sense to say, let's get the quantile row-wise for A at row 0 (and pretend that this has anything to do with A as a grouped value as distinct from B). Indeed, you don't need a groupby for that at all.
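To make both cases concrete, here is a small sketch using the data from the question, with the columns already renamed to Num, Num.1 and Num.2. The row-wise quantile is computed on the plain DataFrame, with no groupby involved, and the per-site quantile is computed column-wise on the groups. The names num_cols and per_site are only illustrative.

```python
import pandas as pd

# Data from the question, with de-duplicated column names.
df = pd.DataFrame({
    "site": list("AAAAABBBBB"),
    "month": [8, 9, 10, 11, 12] * 2,
    "Num": [5, 4, 3, 7, 5, 3, 6, 4, 8, 5],
    "Num.1": [6, 1, 9, 2, 5, 7, 9, 1, 3, 6],
    "Num.2": [4, 2, 7, 8, 4, 5, 0, 3, 0, 8],
})

num_cols = ["Num", "Num.1", "Num.2"]

# Row-wise median: DataFrame.quantile accepts axis=1, no groupby needed.
df["Quantile 0.5"] = df[num_cols].quantile(0.5, axis=1)
print(df)

# Column-wise median per site: DataFrameGroupBy.quantile has no axis argument.
per_site = df.groupby("site")[num_cols].quantile(0.5)
print(per_site)
```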
Sidenote: you will have noticed that your columns Num, Num, Num have turned into Num, Num.1, Num.2 in my examples. This conversion takes place automatically when you read from the clipboard (pd.read_clipboard). In general, having multiple columns with duplicate names is very bad practice and can cause all sorts of problems with various operations. So I strongly advise you to rename them.
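As a rough illustration of why duplicate names are awkward, and of one way to rename them, consider the sketch below. The Num_0, Num_1, Num_2 naming scheme is just an assumption for the example; any unique names will do.

```python
import pandas as pd

# A frame with three columns that all share the name "Num".
dup = pd.DataFrame([[5, 6, 4], [4, 1, 2]], columns=["Num", "Num", "Num"])

# Selecting "Num" no longer returns a single column (a Series);
# it returns a DataFrame containing all three columns.
selected = dup["Num"]
print(type(selected))

# Assign unique names by position.
renamed = dup.copy()
renamed.columns = [f"Num_{i}" for i in range(renamed.shape[1])]
print(renamed.columns.tolist())
```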

Pandas: Create a new column that alternate between values in two other columns [duplicate]

How can I transform a Dataframe with columns S (start), E (end), V (value)
S E V
1 2 3
2 5 11
5 11 5
And transform it to:
T V
1 3
2 3
2 11
5 11
5 5
11 5
?
This is so that we can plot the data in such a way that the value V (y-axis) stays the same throughout each interval.
Edit:
Some are suggesting this is the same as a "how do I use melt()?" question. However the order of the result is important.
With set_index/stack, which keeps the S and E values of each row next to each other:
df = df.set_index('V').stack().reset_index(-1, drop=True).reset_index(name='T')
OUTPUT:
V T
0 3 1
1 3 2
2 11 2
3 11 5
4 5 5
5 5 11
Try with melt (note that the output lists all S rows first and then all E rows, rather than interleaving them):
df.melt('V')
Out[39]:
V variable value
0 3 S 1
1 11 S 2
2 5 S 5
3 3 E 2
4 11 E 5
5 5 E 11
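Since the question says the order matters, the melt result can be put back into row order. This is a sketch, assuming pandas 1.1 or later (for the ignore_index parameter of melt): keeping the original index and sorting on it with a stable sort places each row's S value directly before its E value.

```python
import pandas as pd

df = pd.DataFrame({"S": [1, 2, 5], "E": [2, 5, 11], "V": [3, 11, 5]})

result = (
    df.melt("V", value_name="T", ignore_index=False)  # keep original row labels
      .sort_index(kind="mergesort")                   # stable: S stays before E
      .reset_index(drop=True)[["T", "V"]]
)
print(result)
```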

merge all columns in the first column after the last row

I have a tabular data like this one.
1 4 7
2 5 8
3 6 9
I would like data that look like this
1
2
3
4
5
6
7
8
9
Does anyone know how to do this with pandas? (Or what this operation is called, so I can search for it? I don't know the proper name for the procedure.)
Thank you in advance!
You can use numpy reshaping and pandas DataFrame constructor:
pd.DataFrame(df.values.reshape(-1,1, order='F'))
Output:
0
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
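A pandas-only variant is also possible. The sketch below assumes the frame has default integer column labels; calling unstack on a DataFrame with a plain (non-MultiIndex) index returns a Series that walks the data column by column, which is the order asked for.

```python
import pandas as pd

df = pd.DataFrame([[1, 4, 7], [2, 5, 8], [3, 6, 9]])

# Numpy approach from the answer: column-major ("Fortran") order reshape.
via_numpy = pd.DataFrame(df.values.reshape(-1, 1, order="F"))

# Pandas-only alternative: unstack goes through the columns one at a time.
via_unstack = df.unstack().reset_index(drop=True).to_frame()

print(via_unstack)
```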

Select rows that fulfill conditions

I need to write a query that, for each company name, returns only the first 2 rows, then moves on to the next company and returns its first 2 rows. Let's say that df looks like the one below:
x y y name
1 2 3 ammazon
4 5 6 ammazon
7 8 9 ammazon
9 8 7 google
6 5 4 google
3 2 1 google
So the result should look like this:
x y y name
1 2 3 ammazon
4 5 6 ammazon
9 8 7 google
6 5 4 google
I tried to use an SQL query but couldn't write a correct one. Could you help? Or perhaps a "for loop" would be a better solution... anything works.
Thanks, all!
Use groupby with head:
df.groupby('name').head(2)
x y y name
0 1 2 3 ammazon
1 4 5 6 ammazon
3 9 8 7 google
4 6 5 4 google
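For a self-contained version, here is a sketch that rebuilds the sample frame. The duplicate y columns are renamed to y1 and y2 here, which is my own choice to avoid duplicate labels and not part of the question.

```python
import pandas as pd

df = pd.DataFrame({
    "x":  [1, 4, 7, 9, 6, 3],
    "y1": [2, 5, 8, 8, 5, 2],
    "y2": [3, 6, 9, 7, 4, 1],
    "name": ["ammazon"] * 3 + ["google"] * 3,
})

# head(2) on a groupby keeps the first two rows of every group,
# preserving the original row order and index labels.
first_two = df.groupby("name").head(2)
print(first_two)
```

Because head keeps the original index, the result still shows labels 0, 1, 3 and 4; append .reset_index(drop=True) if a fresh index is preferred.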

How to get pandas crosstab margins value? [duplicate]

I have a crosstab DataFrame that looks like the one below:
Batch 1 2 3 4 All
Fruits
Orange 2 3 4 5 14
Mango 3 2 1 7 13
Grape 2 2 2 2 8
Apple 5 5 8 9 27
All 13 14 18 27 62
The 'All' column and row are generated by the margins parameter of pandas crosstab. My question is: how can I get the 'All' data by column, which is 13, 14, 18 and 27?
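One possible approach, sketched under the assumption that the crosstab is an ordinary DataFrame whose index and columns both contain the label 'All': select the margins row by label with .loc. The frame below is typed in verbatim from the table in the question rather than produced by pd.crosstab, and the batch labels are assumed to be integers.

```python
import pandas as pd

# Table copied from the question; batch labels 1-4 assumed to be ints.
ct = pd.DataFrame(
    {
        1: [2, 3, 2, 5, 13],
        2: [3, 2, 2, 5, 14],
        3: [4, 1, 2, 8, 18],
        4: [5, 7, 2, 9, 27],
        "All": [14, 13, 8, 27, 62],
    },
    index=pd.Index(["Orange", "Mango", "Grape", "Apple", "All"], name="Fruits"),
)
ct.columns.name = "Batch"

# The margins row, minus the grand total in the 'All' column.
totals_by_batch = ct.loc["All"].drop("All")
print(totals_by_batch)

# The margins column, minus the grand total in the 'All' row.
totals_by_fruit = ct["All"].drop("All")
print(totals_by_fruit)
```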