Pandas Pivot table get max with column name - pandas

I have the following pivot table
I want to get the max value from each row, but also, I need to get the column it came from.
So far I know how to get the max value of every row using this:
dff['State'] = stateRace.max(axis=1)
dff
I get this:
which is returning the correct max value but not the column it came from.

You are at a disadvantage in getting help because you supplied images and the question is not clear. Happy to help further if the answer below doesn't work.
stateRace = stateRace.assign(
    max_value=stateRace.select_dtypes(exclude='object').max(axis=1),
    max_column=stateRace.select_dtypes(exclude='object').idxmax(axis=1),
)
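For illustration, here is a minimal, self-contained sketch of the same idea. The table contents, the state labels and the column names A, B and C are made up, since the original pivot table was only posted as an image.

```python
import pandas as pd

# Hypothetical pivot table: one row per state, one numeric column per category.
stateRace = pd.DataFrame(
    {"A": [10, 3, 7], "B": [4, 9, 7], "C": [1, 2, 8]},
    index=["TX", "CA", "NY"],
)

# Work only on the numeric columns so text columns do not break max/idxmax.
numeric = stateRace.select_dtypes(include="number")

result = stateRace.assign(
    max_value=numeric.max(axis=1),      # largest value in each row
    max_column=numeric.idxmax(axis=1),  # label of the column holding that value
)
print(result)
```

If a row has a tie, idxmax returns the first column that holds the maximum.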

Related

Forward fill in spark SQL based on column value condition

Can someone please help me forward fill values in a CASE statement based on another column's value in Spark SQL?
I am trying to detect outliers in the dataset. So far I have identified them as values that lie far from the mean of the dataset, measured in standard deviations.
The problem is that wherever an outlier falls, I have to fill a new column with the last valid/authentic value.
For example: after 1 in the first column, I want to append 556 in the third column, and for 3 in the first column, I want to append 561 in the third column.
So far, I have identified the outliers, and I am guessing I can use the lag function to go back 1 row based on the value. But I also know this is not a good approach. For example, if I get 10 outliers in a sequence, I will have to write 10 CASE statements for that.
If anyone has a better or more efficient approach, please help.
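One possible way to avoid chained CASE/lag logic is to blank out the outlier values and then forward fill. The sketch below shows the idea in pandas rather than Spark SQL. The data, the column names and the is_outlier flag are all hypothetical, and the flag is assumed to have been computed already.

```python
import pandas as pd

# Hypothetical data: "value" is the measurement, "is_outlier" is the flag
# the asker says has already been computed.
df = pd.DataFrame(
    {
        "id": [0, 1, 2, 3, 4, 5],
        "value": [556.0, 9000.0, 561.0, -4000.0, 8000.0, 563.0],
        "is_outlier": [False, True, False, True, True, False],
    }
)

# Replace outliers with NaN, then carry the last valid value forward.
# This handles any number of consecutive outliers without extra cases.
df["last_valid"] = df["value"].where(~df["is_outlier"]).ffill()
print(df)
```

In Spark SQL the same pattern may be expressible with a window function that ignores nulls (for example the last aggregate with its ignore-nulls argument over an ordered window), but that is an untested suggestion here.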

Looping/Iterating a table query in Bigquery

I am using BigQuery and I don't know how to loop over a table in a database. For example, let's suppose we have schema_A.tableA with the following information:
Table A
Originally, TableA.columnA holds the information for the rest of the row, and columnE is the calculation of the other three columns. What I am looking for is a way to iterate over the rows so that each row takes the result of columnE from the previous row (LAG(columnE)) and uses it in its own calculation: the second row uses the result of the first row, the third row uses the result of columnE from the second row, and so on.
The desired output is like this :
For example, in the second row columnA uses 500 because the result of the previous row is 500. In the third row it is 300 because that was the result of columnE in the second row, and so on. I don't know how looping works in BigQuery, and I would really appreciate your help.
Please help!!!
So far I have read some threads, but none of them show how to set a variable from a query; all of them are loops starting from 0. https://towardsdatascience.com/loops-in-bigquery-db137e128d2d

Trying to get the max value from an INDEX MATCH query

I have tried each of the following formulas to get the highest number when I have a duplicate record. They both give me what appears to be the same output, but I know of at least one ID# where the response for both is "6" when I am expecting "7".
F2 = ID# to look for
PSStatus!$A = Hour; data ranges from 1 to 7; column is formatted as a number.
PSStatus!$F = ID#s
=INDEX(QUERY(PSStatus!$A$2:$A,,), MATCH(MAX($F2), (QUERY(PSStatus!$F$2:$F,,)),0))
=MAX(INDEX(QUERY(PSStatus!$A$2:$A,,), MATCH($F2, (QUERY(PSStatus!$F$2:$F,,)),0)))
I'll assume that your data table looks like the following one. Please, forgive me if I am mistaken.
If my assumption is correct, you can use the formula =MAX(FILTER({DATA TABLE RANGE}, {ID COLUMN FROM DATA TABLE}={ID})). That formula will first use FILTER to pick only the requested ID and then MAX will pick the highest one. In my example above, the formula should be =MAX(FILTER(Sheet1!$A$2:$B$26, Sheet1!$A$2:$A$26=A2)) for the first row. This is the end result:
Please, ask me anything if you need further help.
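To see why a MATCH-based lookup can return 6 when 7 is expected, here is a small Python sketch of the two behaviours. The hours and ID values are made up. The first function mimics an exact-match lookup that stops at the first matching row; the second mimics filtering all matching rows and then taking the maximum.

```python
# Hypothetical data: parallel lists standing in for the Hour and ID# columns.
hours = [3, 6, 7, 2]
ids = ["A1", "B2", "B2", "A1"]


def first_match_hour(target):
    """Like INDEX/MATCH with an exact match: uses only the first matching row."""
    return hours[ids.index(target)]


def max_hour(target):
    """Like MAX(FILTER(...)): keeps every matching row, then takes the max."""
    return max(h for h, i in zip(hours, ids) if i == target)


print(first_match_hour("B2"))  # 6, the first row for B2
print(max_hour("B2"))          # 7, the highest hour among all B2 rows
```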

How can I summarize different columns to make totals by row?

In the picture below you can see my statement. Something is definitely wrong with it because it returns a NULL value, but I don't know what. I want to create a TOTAL column summing WOSE, WO, SSSE and SS per row. Could someone help me with that?
This happens because of NULL values in the columns: adding NULL to any number yields NULL. Use the following instead:
SUM(COALESCE(WOSE,0) + COALESCE(WO,0) + COALESCE(SSSE,0) + COALESCE(SS,0))
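The effect can be reproduced with a small sketch using Python's built-in sqlite3 module. The table name t and the sample rows are made up, and the per-row expression is shown without the surrounding SUM, which is only needed when the query aggregates rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (WOSE INTEGER, WO INTEGER, SSSE INTEGER, SS INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(1, 2, 3, 4), (5, None, 1, None)],  # second row contains NULLs
)

rows = conn.execute(
    """
    SELECT WOSE + WO + SSSE + SS AS naive_total,
           COALESCE(WOSE, 0) + COALESCE(WO, 0)
             + COALESCE(SSSE, 0) + COALESCE(SS, 0) AS total
    FROM t
    ORDER BY rowid
    """
).fetchall()
conn.close()

print(rows)  # [(10, 10), (None, 6)]
```

The naive sum is NULL for the row with missing values, while the COALESCE version treats each NULL as 0.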

Percent of Group, not total

It seems like there are a lot of answers out there, but I can't relate them to my specific issue. I want to get the breakdown of yes/no within each specific Group, not the percent of yes for the entire population of data.
I have tried the following expressions in the "What I'm Getting" % of Total cell:
=FormatPercent(Count(Fields!SessionID.Value)/Count((Fields!SessionID.Value), "Tablix1"),)
=FormatPercent(Count(Fields!Value.Value)/Count((Fields!SessionID.Value), "Value"),)
It should just be a case of changing the Scope in your expression to make sure the denominator is the total for the group, not the entire Dataset or Tablix, i.e. something like:
=Count(Fields!SessionID.Value) / Count(Fields!SessionID.Value, "MyGroup")
Where MyGroup is the name of the group, i.e. something like:
If this is still not clear, your best option would be to add a few sample rows, and your desired result for these, to the question so we can replicate your exact issue.
Edit after more info added
Thanks for adding more details. I have created a Dataset based on your example:
And I've created a table based on this:
The group is based on the Group field:
The Group % expression is:
=Fields!YesNoCount.Value / Sum(Fields!YesNoCount.Value, "MyGroup")
This is taking the YesNoCount value of each row and comparing it to the total YesNoCount value in that particular group (i.e. the MyGroup scope).
Note that I'm using Sum here, not Count as in your example expression - that seems to be the appropriate aggregate for your data and the required value.
Results look OK to me:
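The difference between the two denominators can also be sketched outside of Reporting Services. The pandas example below uses made-up rows with the same field names (Group, Value, YesNoCount) and compares a per-group percentage with a percentage of the whole dataset.

```python
import pandas as pd

# Hypothetical rows mirroring the Group / Value / YesNoCount fields.
df = pd.DataFrame(
    {
        "Group": ["G1", "G1", "G2", "G2"],
        "Value": ["Yes", "No", "Yes", "No"],
        "YesNoCount": [3, 1, 2, 6],
    }
)

# Denominator scoped to the group, analogous to Sum(..., "MyGroup").
group_total = df.groupby("Group")["YesNoCount"].transform("sum")
df["GroupPct"] = df["YesNoCount"] / group_total

# Denominator scoped to the whole dataset, which is what the asker was getting.
df["TotalPct"] = df["YesNoCount"] / df["YesNoCount"].sum()
print(df)
```

Within each group the GroupPct values add up to 1, whereas TotalPct adds up to 1 only across the entire table.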