How to change rows in pandas based on an attribute of the other rows

I have a dataframe with two columns: A (a continuous variable) and B (discrete, 1 or 0). The df is initially sorted by A.
I need to reorder the dataframe so that each set of X rows contains Y rows with 1 in column B and (X-Y) rows with 0 in column B (when possible!). Within each set, A should be in descending order. X and Y are input by the user.
Example:
X=4, Y=3
Rows 0-11 are ok, since the sets (0-3), (4-7) and (8-11) each have 3 rows with 1 in column B and only one row with 0, AND variable A is descending. However, rows 12-15 are not ok, since there are two rows with 1 in column B and two with 0. Row 17 would replace row 15 to make this set valid. It is not a problem if the last rows have 0 in column B, since by then there are no rows with value 1 left.
The code should be general enough to run on dataframes with different number of rows.
Any ideas?
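One possible approach, as a minimal sketch rather than a definitive answer: split the rows by B, sort each part by A in descending order, and fill each block of X rows greedily with up to Y ones and the rest zeros. The function name reorder_in_blocks and the sample data are hypothetical, and the sketch assumes 1 <= X, 0 <= Y <= X, and that A only needs to be descending within each block.

```python
import pandas as pd

def reorder_in_blocks(df, x, y):
    """Greedy sketch: each block of x rows gets up to y rows with B == 1."""
    ones = df[df["B"] == 1].sort_values("A", ascending=False)
    zeros = df[df["B"] == 0].sort_values("A", ascending=False)
    blocks = []
    i = j = 0  # positions of the next unused row in ones / zeros
    while i < len(ones) or j < len(zeros):
        n_ones = min(y, len(ones) - i)
        n_zeros = min(x - n_ones, len(zeros) - j)
        # If the zeros ran out, top the block up with extra ones.
        n_ones = min(x - n_zeros, len(ones) - i)
        block = pd.concat([ones.iloc[i:i + n_ones], zeros.iloc[j:j + n_zeros]])
        # Inside a block, A must be in descending order.
        blocks.append(block.sort_values("A", ascending=False))
        i += n_ones
        j += n_zeros
    if not blocks:  # empty input: nothing to reorder
        return df.copy()
    return pd.concat(blocks).reset_index(drop=True)

# Hypothetical sample data, already sorted by A in descending order.
df = pd.DataFrame({"A": [8, 7, 6, 5, 4, 3, 2, 1],
                   "B": [1, 1, 0, 0, 1, 0, 1, 1]})
result = reorder_in_blocks(df, x=4, y=3)
print(result)
```

With X=4 and Y=3, the first block gets three ones and one zero. The second block only gets two ones because no more are available, which matches the "when possible" condition.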

Related

Dropping semi-duplicated rows in pandas according to specific column value

I have a dataframe whose rows are duplicated except for one column value. I want to drop the row with a value of "None" if the id is the same (not all the rows are duplicated).
a b
1 1 None
2 1 7
3 2 2
4 3 4
I need to drop the first row, where the value of a is duplicated (1) and the value of b is None.
You can use duplicated and also search for None. That will return the row you want to drop, so use ~ to get the inverse dataframe (so everything but the row you want to drop) to return the expected result. EDIT: Passing keep=False will return all duplicates, so order doesn't matter.
df[~((df['b'].isnull()) & (df.duplicated('a', keep=False)))]  # if None is a null value
OR
df[~((df['b'] == 'None') & (df.duplicated('a', keep=False)))]  # if 'None' is a string
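As a quick check, here is a self-contained sketch of the null-value variant on the sample data. The fifth row (a=5, b=None) is not in the original question; it is an assumption added to show that a None in a non-duplicated row is kept.

```python
import pandas as pd

# Sample data from the question, plus one assumed extra row (a=5, b=None).
df = pd.DataFrame(
    {"a": [1, 1, 2, 3, 5], "b": [None, 7, 2, 4, None]},
    index=[1, 2, 3, 4, 5],
)

# Rows to drop: b is null AND the value of a occurs more than once.
to_drop = df["b"].isna() & df.duplicated("a", keep=False)
result = df[~to_drop]
print(result)
```

Only the row with index 1 is dropped. The row with a=5 stays, because its value of a is not duplicated.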

Get column index label based on values

I have the following:
C1 C2 C3
0 0 0 1
1 0 0 1
2 0 0 1
And I would like to get the label of the column that contains the 1s, so the result should be "C3".
I know how to do this by transposing the dataframe and then getting the index values, but this is not ideal for the data in my dataframes, and I wonder whether there is a more efficient solution?
I save the result in a list because more than one column could contain values equal to 1. You can use DataFrame.loc.
If all values in the column must be 1, then you can use:
df.loc[:,df.eq(1).all()].columns.tolist()
Output:
['C3']
If this isn't necessary, then use:
df.loc[:,df.eq(1).any()].columns.tolist()
or, as suggested by @piRSquared, you can select directly from df.columns:
[*df.columns[df.eq(1).all()]]
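To make the difference between all() and any() concrete, here is a small sketch. The second frame, df_mixed, is a hypothetical variation of the question's data with a single 1 placed in C1.

```python
import pandas as pd

df = pd.DataFrame({"C1": [0, 0, 0], "C2": [0, 0, 0], "C3": [1, 1, 1]})

# Columns where every value equals 1.
all_ones = df.loc[:, df.eq(1).all()].columns.tolist()

# Hypothetical variation: C1 now contains a single 1.
df_mixed = df.copy()
df_mixed.loc[0, "C1"] = 1

# all() still selects only C3; any() also selects C1.
mixed_all = df_mixed.loc[:, df_mixed.eq(1).all()].columns.tolist()
mixed_any = df_mixed.loc[:, df_mixed.eq(1).any()].columns.tolist()

print(all_ones, mixed_all, mixed_any)
```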

How can I compare two sets of data having two columns in excel? Picture below will elaborate

Below are two sets of data. Each has two columns. I want the matching data to appear next to each other.
This is a manual solution with formulas and sorting.
Imagine the following data in columns A to E:
Enter the following formulas into columns G to K
Column G: =IFERROR(IF(VLOOKUP(D:D,A:B,2,FALSE)=E:E,1,2),3)
Column H: =IF(G:G<3,D:D,"")
Column I: =IFERROR(VLOOKUP(H:H,A:B,2,FALSE),"")
Column J: =D:D
Column K: =IFERROR(VLOOKUP(J:J,D:E,2,FALSE),"")
Column G (the "sort by" column) now shows:
1 if part and quantity matched
2 if only part matched
3 if nothing matched
So if you now select the data in A3:K10 and sort by column G (the "sort by" column), the rows are grouped by match type.
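If you want to check the classification logic outside Excel, the same 1/2/3 scheme can be sketched in pandas. This is only an illustration of the idea behind column G, not a translation of the workbook: the frames left and right, their column names, and the part/quantity values are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two data sets (columns A:B and D:E).
left = pd.DataFrame({"part": ["P1", "P2", "P3"], "qty": [10, 20, 30]})
right = pd.DataFrame({"part": ["P3", "P9", "P1"], "qty": [30, 5, 99]})

# Like VLOOKUP with FALSE: exact match, first occurrence wins.
lookup = left.drop_duplicates("part").set_index("part")["qty"]
left_qty = right["part"].map(lookup)  # NaN where the part is not found

right["sort_by"] = 3                                     # nothing matched
right.loc[left_qty.notna(), "sort_by"] = 2               # only part matched
right.loc[left_qty == right["qty"], "sort_by"] = 1       # part and quantity matched

result = right.sort_values("sort_by", kind="stable")
print(result)
```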

Business Objects CountIf by cell reference

So I have a column with this data
1
1
1
2
3
4
5
5
5
How can I do a count-if where the value at any given location in the above table is equal to a cell I select? I.e. doing Count([NUMBER]) Where([NUMBER] = Coordinates(0,0)) would return 3, because there are 3 rows whose value equals the value in position 0, which is one.
It's basically like in Excel, where you can do COUNTIF(A:A, 1) and it gives you the total number of rows where the value in A:A is 1. Is this possible to do in Business Objects Web Intelligence?
Functions in WebI operate on rows, so you have to think about it a little differently.
If your intent is to create a cell outside of the report block and display the count of specific values, you can use Count() with Where():
=Count([NUMBER];All) Where ([NUMBER] = "1")
In a freestanding cell, the above will produce a value of "3" for your sample data.
If you want to put the result in the same block and have it count up the occurrences of values on that row, for example:
NUMBER NUMBER Total
1 3
1 3
1 3
2 1
3 1
4 1
5 3
5 3
5 3
it gets a little more complicated. You have to have at least one other dimension in the query to reference. It can be anything, but you have to be counting something in conjunction with the NUMBER dimension. So, the following would work, assuming there's another dimension in the query named [Duh]:
=Count([NUMBER];All) ForAll([Duh])
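For comparison only, here is the expected output of both variants reproduced in pandas. This is not WebI syntax, just a sketch to check the numbers against the sample data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"NUMBER": [1, 1, 1, 2, 3, 4, 5, 5, 5]})

# Freestanding-cell case: how many rows have NUMBER equal to 1.
count_of_ones = int((df["NUMBER"] == 1).sum())

# In-block case: for each row, how often that row's value occurs.
df["NUMBER Total"] = df.groupby("NUMBER")["NUMBER"].transform("count")

print(count_of_ones)
print(df)
```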

How to get second MIN value or cross column MIN value in PowerPivot?

I have an interesting table and I can't figure out how to get the 2nd minimum value, or some similar operation. Here is an example of my table:
Column1 Column2 Column3
A A 0
A C 11
A D 7
B X 11
B B 0
A E 5
B Y 17
A F 4
I need to find the minimum value in Column3 for each A or B (from Column1). Rows where Column1 = Column2 (A=A or B=B) should not be included in this MIN calculation. However, the value found for A should be shown on the A=A row, and the min value for B should be shown on the B=B row.
I also tried these calculations:
IF([Column1]<>[Column2],CALCULATE(MIN([Column3]),ALL(myTable),myTable[Column2]=EARLIER(myTable[Column2])),0) --> returns the same value as Column3 for each row.
IF([Column1]=[Column2],CALCULATE(MIN([Column3]),ALL(myTable),myTable[Column2]=EARLIER(myTable[Column2])),0) --> correctly returns the min value of Column3 for each A=A or B=B row. But those rows hold 0 in Column3; if I change it to 1, the calculation returns 1. What I need is the min value of the other rows.
IF([Column1]=[Column2],CALCULATE(MAX([Column3]),ALL(myTable),myTable[Column1]=EARLIER(myTable[Column1])),0) --> this calculation works like a charm for the MAX value, because the highest values are always in the other rows.
P.S.: The default value in Column3 for A=A or B=B rows is always 0.
I'm stuck at this point =/ Thank you.
Use (in a calculated column):
=CALCULATE(MIN([Column3]),FILTER(myTable,myTable[Column1]=EARLIER(myTable[Column1])),myTable[bColumn])
where myTable[bColumn] is the calculated column
=myTable[Column1]<>myTable[Column2]
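To sanity-check the expected numbers, the same logic (minimum of Column3 per Column1, ignoring rows where Column1 equals Column2) can be sketched in pandas on the sample table. This is not DAX, only a cross-check; the column name MinOther is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": ["A", "A", "A", "B", "B", "A", "B", "A"],
    "Column2": ["A", "C", "D", "X", "B", "E", "Y", "F"],
    "Column3": [0, 11, 7, 11, 0, 5, 17, 4],
})

# Equivalent of bColumn: True where Column1 differs from Column2.
not_self = df["Column1"] != df["Column2"]

# Minimum of Column3 per Column1, over the non-self rows only.
group_min = df[not_self].groupby("Column1")["Column3"].min()

# Show the group minimum on every row, including the A=A and B=B rows.
df["MinOther"] = df["Column1"].map(group_min)
print(df)
```

For the sample data this gives 4 for every A row and 11 for every B row, so the A=A row shows 4 and the B=B row shows 11.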
If you want the formula in a measure instead of a calculated column, use:
=CALCULATE(MIN([Column3]),FILTER(ALL(myTable),COUNTROWS(FILTER(myTable,myTable[Column1]=EARLIER(myTable[Column1])))),myTable[bColumn])
HTH!