How to condition on odd or even across rows in dataframe - dataframe

I am looking to create a new variable based on the values of other variables in the same observation.
Column C should be "keep" if the values in columns A and B differ in parity (one odd, one even), and "remove" if they are both even or both odd.
I've typed out by hand, in column C, what I want to achieve programmatically:
b<-data.frame(matrix(c(1, 3, 4, 5, 2, 8), nrow=3, ncol=2))
c<-data.frame(c("remove","keep","remove"))
x<-cbind(b,c)
names(x)<-c("A","B","C")
x
Ideally, I'd just like the result to retain only the "Keeps"
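The parity rule described above can be sketched as follows. This is a minimal illustration in pandas rather than R; the names mixed and kept are assumptions introduced only for this example.

```python
import pandas as pd

# Same values as the R example: A = 1, 3, 4 and B = 5, 2, 8
x = pd.DataFrame({"A": [1, 3, 4], "B": [5, 2, 8]})

# True where A and B have different parity (one odd, one even)
mixed = (x["A"] % 2) != (x["B"] % 2)

# Label every row, then retain only the "keep" rows
x["C"] = mixed.map({True: "keep", False: "remove"})
kept = x[mixed]
```

The same comparison carries over to R; something along the lines of x[x$A %% 2 != x$B %% 2, ] should select the same rows.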

Related

Looping through a jagged array while fixing some dimensions in VB.NET

I have a 7-dimensional jagged array that is essentially just a collection of decimal numbers. I need to go through the array and add up all the decimals that have certain values in certain dimensions. For example:
(A)(B)(..)(..)(..)(..)(..)
Where .. is the entire size of the dimension. For the above case I can simply use a bunch of nested For loops, because I know that A and B are at the start of the array. But how can I deal with this if the dimensions in which A and B are located are randomised? E.g.
(..)(A)(..)(..)(B)(..)(..)
Or
(..)(..)(..)(..)(..)(A)(B)
Or
(..)(..)(A)(..)(..)(..)(B)
Etc.
I thought about having a Select Case for the locations of A and B, but this leads to hundreds (if not thousands) of lines of repeated code, and it feels like bad practice.
Any suggestions?
Edit #1
This is difficult to explain, so I'm going to use a much simpler example. Instead of 7 dimensions, let's say it's 2 dimensions (each with a length of 4). And instead of A and B, let's say it's just A. I wish to add the following elements:
(A)(0)
(A)(1)
(A)(2)
(A)(3)
(0)(A)
(1)(A)
(2)(A)
(3)(A)
As you can see, this is every element where A is in either of the dimensions (A is a real number, in this case either 0, 1, 2, or 3). Now in my case both A and B need to be in one of the dimensions, with the requirement that A is always before B. But since there are 7 dimensions, there are so many possible locations of A and B that writing code for each scenario is not ideal (also, I'd like to extend it to C, D, etc.).
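One way to avoid a separate code path per placement is to pass the fixed dimensions as data and recurse over the rest. The sketch below is in Python rather than VB.NET, treats nested lists as the jagged array, and uses two hypothetical helpers (sum_fixed and sum_over_placements) that are not part of the question.

```python
from itertools import combinations

def sum_fixed(arr, fixed, depth=0):
    """Sum all leaves of a nested list, pinning the dimensions in `fixed`.

    `fixed` maps a dimension number (0-based depth) to the index to pin there.
    """
    if not isinstance(arr, list):
        return arr  # reached a number
    if depth in fixed:
        idx = fixed[depth]
        if idx >= len(arr):
            return 0  # jagged: this branch is too short to contain the index
        return sum_fixed(arr[idx], fixed, depth + 1)
    # Free dimension: walk every element
    return sum(sum_fixed(sub, fixed, depth + 1) for sub in arr)

def sum_over_placements(arr, values, ndim):
    """Add sum_fixed over every placement of `values` that preserves their order."""
    total = 0
    # combinations() yields increasing dimension numbers, so A stays before B
    for dims in combinations(range(ndim), len(values)):
        total += sum_fixed(arr, dict(zip(dims, values)))
    return total

# The 2-dimensional example from Edit #1, with A = 1
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
first_dim = sum_fixed(grid, {0: 1})    # (A)(0..3) -> 4 + 5 + 6 + 7 = 22
second_dim = sum_fixed(grid, {1: 1})   # (0..3)(A) -> 1 + 5 + 9 + 13 = 28
either = sum_over_placements(grid, (1,), ndim=2)  # 22 + 28 = 50
```

Note that the element (A)(A) is counted once per placement, which matches the list in Edit #1. Extending to C, D, etc. only means passing a longer tuple of values.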

How to find the mean of a column based on its group in pandas

I'm using pandas with a data set that has a column class with values 1, 2 and 3, and a column age with a variety of values.
Now I want to find the average/mean of the age depending on which class each row belongs to, i.e. class 1, 2 or 3. The data set has 900 rows and 9 columns. How can I do it?
One possible solution is:
df.loc[df['class'] == 1, 'age'].mean()
Here df['class'] == 1 selects the rows of class 1; you can substitute any column and any value you want to filter on.
Hope this answers your question.
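To get the mean for every class at once rather than one class at a time, groupby is the usual route. A minimal sketch, assuming a small made-up frame with the same column names as in the question:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question
df = pd.DataFrame({
    "class": [1, 1, 2, 2, 3, 3],
    "age": [20, 30, 40, 50, 60, 80],
})

# One mean per class, indexed by the class value
mean_age = df.groupby("class")["age"].mean()

# The single-class filter from the answer above gives the same number for class 1
class_1_mean = df.loc[df["class"] == 1, "age"].mean()
```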

recoding multiple variables in the same way

I am looking for the shortest way to recode many variables in the same way.
For example, I have a data frame where columns a, b, c are the names of survey items and rows are observations.
d <- data.frame(a=c(1,2,3), b=c(1,3,2), c=c(1,2,1))
I want to change the values of all observations for selected columns. For instance, value 1 in columns "a" and "c" should be replaced with the string "low", and values 2 and 3 in these columns should be replaced with "high".
I do this often with many columns, so I am looking for a function that can do it in a very simple way, like this:
recode2(data=d, columns=c(a,c), "1=low, 2,3=high")
The recode function from the car package is almost OK, but if I have 10 columns to recode I have to write it out 10 times, which is not as efficient as I want.
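The shape of such a helper can be sketched as a single mapping applied to a list of columns. This is an illustration in pandas rather than R, and the function name recode is hypothetical, mirroring the recode2 call imagined above.

```python
import pandas as pd

d = pd.DataFrame({"a": [1, 2, 3], "b": [1, 3, 2], "c": [1, 2, 1]})

def recode(data, columns, mapping):
    """Return a copy of `data` with `mapping` applied to each listed column."""
    out = data.copy()
    for col in columns:
        # Values missing from `mapping` would become NaN
        out[col] = out[col].map(mapping)
    return out

# 1 -> "low", 2 and 3 -> "high", for columns a and c only
recoded = recode(d, ["a", "c"], {1: "low", 2: "high", 3: "high"})
```

In R, the same idea is to loop or lapply over the selected column names and apply one recoding rule to each.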

Mark accumulated values on a QlikView column if condition is fulfilled

I have a table in QlikView with 2 columns:
A B
a 10
b 45
c 30
d 15
Based on this table, I have a formula with full accumulation defined as:
SUM(B)/SUM(TOTAL B)
As a result,
A B D
b 45 45/100=0.45
c 30 75/100=0.75
d 15 90/100=0.90
a 10 100/100=1
My question is: how do I mark in colour the values in column A that have a value <= 0.8 in column D?
The challenge is that D is defined with full accumulation, but if I reference D in a formula, it doesn't consider the full accumulation!
I tried defining a formula E=if(D>0.8,'Y','N'), but unfortunately this formula doesn't take the visible (accumulated) value of D; it takes D without accumulation. If this had worked, I would have tried to hide (not disable) E and reference it from the Text colour option of the dimension column. Any ideas? Thanks.
You can't get an expression column's value from within a dimension or its properties, because the expression columns rely on the dimensions provided. It would create an endless loop. Your options are:
Apply your background colour to the expression columns, not the dimensions. This would actually make more sense, as the accumulated values would have the colour, not the dimension.
When loading this specific table, have QlikView create a new column that contains the accumulated values of B. This would mean, however, that the order of your chart table would need to be fixed for the accumulations to make any sense.
Use aggregation to create a temporary table and accumulate the values using RangeSum(). Note that this will only accumulate properly if the table is sorted in ascending order of column A:
=IF(Aggr(RangeSum(Above(Sum(B),0,10)),A)/100>0.8,
rgb(0,0,0),
rgb(255,0,0)
)
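To sanity-check which rows should end up coloured, the accumulation can be reproduced outside QlikView. The following is a pandas sketch, not QlikView script; the column name within_80 is an assumption made for this example.

```python
import pandas as pd

# The table from the question
t = pd.DataFrame({"A": ["a", "b", "c", "d"], "B": [10, 45, 30, 15]})

# Sort by B descending, as in the accumulated result shown in the question
t = t.sort_values("B", ascending=False).reset_index(drop=True)

# Full accumulation: running total of B divided by the grand total
t["D"] = t["B"].cumsum() / t["B"].sum()

# Rows whose accumulated share is at most 0.8 are the ones to colour
t["within_80"] = t["D"] <= 0.8
```

With the sample data this flags rows b and c, whose accumulated shares are 0.45 and 0.75.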

Dataframe non-null values differ from value_counts() values

There is an inconsistency with dataframes that I can't explain. In the following, I'm not looking for a workaround (I already found one) but for an explanation of what is going on under the hood and how it explains the output.
One of my colleagues, whom I talked into using Python and pandas, has a dataframe "data" with 12,000 rows.
"data" has a column "length" that contains numbers from 0 to 20. She wants to divide the dataframe into groups by length range: 0 to 9 in group 1, 10 to 14 in group 2, 15 and more in group 3. Her solution was to add another column, "group", and fill it with the appropriate values. She wrote the following code:
data['group'] = np.nan
mask = data['length'] < 10;
data['group'][mask] = 1;
mask2 = (data['length'] > 9) & (data['length'] < 15);
data['group'][mask2] = 2;
mask3 = data['length'] > 14;
data['group'][mask3] = 3;
This code is not good, of course. The reason is that you don't know at run time whether data['group'][mask3], for example, will be a view, and thus actually change the dataframe, or a copy, leaving the dataframe unchanged. It took me quite some time to explain this to her, since she argued, correctly, that she is doing an assignment, not a selection, so the operation should always return a view.
But that was not the strange part. The part that even I couldn't understand is this:
After performing this set of operations, we verified that the assignment took place in two different ways:
1. By typing data in the console and examining the dataframe summary. It told us we had a few thousand null values. The number of null values was the same as the size of mask3, so we assumed the last assignment was made on a copy and not on a view.
2. By typing data.group.value_counts(). That returned 3 values: 1, 2 and 3 (surprise). We then typed data.group.value_counts().sum() and it summed up to 12,000!
So by method 2, the group column contained no null values and had all the values we wanted it to have. But by method 1, it didn't!
Can anyone explain this?
See docs here.
You don't want to set values this way, for exactly the reason you pointed out: since you don't know whether it's a view, you don't know that you are actually changing the data. pandas 0.13 will raise/warn that you are attempting to do this, but it is easiest/best to just access it like:
data.loc[mask3,'group'] = 3
which guarantees an in-place setitem.
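Applied to all three masks, the .loc form looks like the sketch below. The 21-row frame is a stand-in for the 12,000-row one, and null_count and counts are names added here for checking the result.

```python
import numpy as np
import pandas as pd

# Stand-in for the real data: one row per length value 0..20
data = pd.DataFrame({"length": np.arange(21)})
data["group"] = np.nan

# A single .loc call per range sets values on the frame itself,
# so there is no intermediate object that might be a copy
data.loc[data["length"] < 10, "group"] = 1
data.loc[(data["length"] > 9) & (data["length"] < 15), "group"] = 2
data.loc[data["length"] > 14, "group"] = 3

# Both checks from the question should now agree
null_count = int(data["group"].isna().sum())
counts = data["group"].value_counts()
```

Here the null count is zero and the value counts sum to the number of rows, so the two verification methods no longer disagree.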