system missings when computing variable

I'm trying to compute a new variable using 3 other variables. If all 3 conditions are positive, the new variable gives 1. My problem: if just 1 or 2 of these conditions are present, I get a value 0 when it needs to be a system missing.

In order to get SPSS to calculate a value only when you have values in all three variables you can use this:
if nmiss(E’sept_preTx, Eope’gem_preTx, TRpieksnelheid_t1)=0
DD_new=(E’sept_preTx < 10) & (Eope’gem_preTx >= 15) & (TRpieksnelheid_t1 > 2.8).
nmiss counts the missing values, so the original calculation is carried out only if there are none.
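For readers who want the same guard outside SPSS, here is a minimal Python sketch of the idea: return a missing value unless all three inputs are present, and only then evaluate the three conditions. The function name `dd_new`, the parameter names, and the use of `None` as a stand-in for a system-missing value are assumptions made for illustration; the thresholds are the ones from the syntax above.

```python
from typing import Optional


def dd_new(e_sept: Optional[float],
           e_gem: Optional[float],
           tr_peak: Optional[float]) -> Optional[int]:
    """Return 1/0 only when all three inputs are present, else None.

    None plays the role of a system-missing value (an assumption of
    this sketch, not SPSS behaviour).
    """
    # Equivalent of nmiss(...) = 0: bail out if anything is missing.
    if e_sept is None or e_gem is None or tr_peak is None:
        return None
    # All three values are present, so the comparison is well defined.
    return int(e_sept < 10 and e_gem >= 15 and tr_peak > 2.8)
```

With this guard, a case with one missing input yields `None` rather than 0, which is the behaviour the question asks for.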

Related

lpsolve - how to define variable to accept values only from two intervals

I need to set up all my variables to accept only values from two intervals, e.g. for var C1:
C1 = 0 or
100 <= C1 <= 1000
and so on for each variable.
I can specify either the first (=0) or the second ([100;1000]) constraint for every variable, but I need the lpsolve model to be able to assign either 0 or any value from the given interval.
Any advice appreciated.

SQL Aggregating Mixed Data

I have a table that has 3 columns
STA (which is equivalent to X)
BL (which is equivalent to y)
Ultimate_Load (which has positive and negative values)
I want to aggregate column 3 by the maximum of absolute values, but display the actual (negative or positive) value.
your help is appreciated

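I think this would work. You get the maximum absolute value of ultimate_load and then change its sign depending on the sign of MAX(ultimate_load)+MIN(ultimate_load). If that sum is negative, the value with the largest magnitude is the (negative) minimum; otherwise it is the maximum.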
SELECT STA,BL, MAX(ABS(ULTIMATE_LOAD)) * case sign(MAX(ULTIMATE_LOAD)+min(ULTIMATE_LOAD)) when 0 then 1 when -1 then -1 else 1 end as maxvalue
FROM TAB
GROUP BY STA, BL
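The sign trick in the query can be checked outside the database with a small Python sketch. It applies the same rule to one group's values: if max + min is negative, the minimum has the largest magnitude, otherwise the maximum does. The function name `signed_max_abs` is hypothetical, and ties (for example -5 and 5) resolve to the positive value, as in the CASE expression above.

```python
from typing import Sequence


def signed_max_abs(values: Sequence[float]) -> float:
    """Return the element with the largest absolute value, sign preserved."""
    if not values:
        raise ValueError("values must not be empty")
    hi, lo = max(values), min(values)
    # hi + lo < 0 means |lo| > |hi|, so the negative extreme wins.
    return lo if hi + lo < 0 else hi
```

Applied to the values of one (STA, BL) group, this should agree with the `maxvalue` column the query returns.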

Using CONTAINS with variables sql

Ok so I am trying to reference one variable with another in SQL.
X = a,b,c,d (X is a string variable with a list of things in it)
Y = b (Y is a string variable that may or may not have a value that appears in X)
I tried this:
Case when Y in (X) then 1 else 0 end as aa
But it doesn't work, since it looks for exact matches between X and Y.
I also tried this:
where contains(X,#Y)
but I can't create Y globally, since it is a variable that changes in each row of the table (X also changes).
A solution in SAS would also be useful.
Thanks

Maybe LIKE will help:
select
*
from
t
where
X like ('%'+Y+'%')
or
select
case when (X like ('%'+Y+'%')) then 1 else 0 end
from
t

In SAS I would use the INDEX function, either in a data step or proc sql. This returns the position within the string in which it finds the character(s), or zero if there is no match. Therefore a test if the value returned is greater than zero will result in a binary 1:0 output. You need to use the compress function with the variable containing the search characters as SAS pads the value with blanks.
Data step solution :
aa=index(x,compress(y))>0;
Proc Sql solution :
index(x,compress(y))>0 as aa
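Both the LIKE and the INDEX approaches test for a substring, so a value such as "b" would also match inside an item like "ab". The following Python sketch contrasts a substring test with a test against the individual comma-separated items. The function names are hypothetical, removing blanks from y is meant to mirror the compress() step, and treating an empty y as "no match" is an assumption of this sketch.

```python
def contains_substring(x: str, y: str) -> int:
    """1 if y (blanks removed) occurs anywhere inside x, else 0."""
    needle = y.replace(" ", "")
    # An empty needle is treated as "no match" (assumption of this sketch).
    return int(bool(needle) and needle in x)


def contains_item(x: str, y: str) -> int:
    """1 if y equals one of the comma-separated items of x, else 0."""
    items = [item.strip() for item in x.split(",")]
    return int(y.strip() in items)
```

If the items in X can share characters, the item-based test is the safer of the two; for single, distinct characters both give the same answer.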

Dataframe non-null values differ from value_counts() values

There is an inconsistency with dataframes that I can't explain. In the following, I'm not looking for a workaround (I already found one) but for an explanation of what is going on under the hood and how it explains the output.
One of my colleagues, whom I talked into using Python and pandas, has a dataframe "data" with 12,000 rows.
"data" has a column "length" that contains numbers from 0 to 20. She wants to divide the dataframe into groups by length range: 0 to 9 in group 1, 10 to 14 in group 2, 15 and more in group 3. Her solution was to add another column, "group", and fill it with the appropriate values. She wrote the following code:
data['group'] = np.nan
mask = data['length'] < 10;
data['group'][mask] = 1;
mask2 = (data['length'] > 9) & (data['length'] < 15);
data['group'][mask2] = 2;
mask3 = data['length'] > 14;
data['group'][mask3] = 3;
This code is not good, of course. The reason is that you don't know at run time whether data['group'][mask3], for example, will be a view, and thus actually change the dataframe, or a copy, in which case the dataframe remains unchanged. It took me quite some time to explain this to her, since she argued, correctly, that she is doing an assignment, not a selection, so the operation should always return a view.
But that was not the strange part. The part that even I couldn't understand is this:
After performing this set of operations, we verified that the assignment took place in two different ways:
1. By typing data in the console and examining the dataframe summary. It told us we had a few thousand null values. The number of null values was the same as the size of mask3, so we assumed the last assignment was made on a copy and not on a view.
2. By typing data.group.value_counts(). That returned 3 values: 1, 2 and 3 (surprise). We then typed data.group.value_counts().sum() and it summed up to 12,000!
So by method 2, the group column contained no null values and all the values we wanted it to have. But by method 1, it didn't!
Can anyone explain this?

see docs here.
You don't want to set values this way, for exactly the reason you pointed out: since you don't know whether it's a view, you don't know whether you are actually changing the data. pandas 0.13 will raise or warn when you attempt this, but it is easiest and best to assign with .loc:
data.loc[mask3,'group'] = 3
which guarantees an in-place setitem.
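As a rough illustration of the .loc form applied to all three groups, here is a small self-contained sketch. The seven-row frame is a hypothetical stand-in for the 12,000-row dataframe in the question; the column names and thresholds follow the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data.
data = pd.DataFrame({"length": [0, 5, 9, 10, 14, 15, 20]})

data["group"] = np.nan
# Each assignment names the row mask and the column in a single .loc call,
# so it always writes into `data` itself rather than into a possible copy.
data.loc[data["length"] < 10, "group"] = 1
data.loc[(data["length"] > 9) & (data["length"] < 15), "group"] = 2
data.loc[data["length"] > 14, "group"] = 3

# Two independent checks that every row received a group.
null_count = int(data["group"].isna().sum())
counts = data["group"].value_counts().sort_index()
```

With assignments made this way, the null count and the sum of `value_counts()` should agree with each other, which is a quick consistency check to run on the real data.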

Looping through variables in spss

I'm looking for a way to loop through variables (e.g. week01 to week52) and count the number of times the value changes across them. For example:
week01 to week18 may be coded as 1
week19 to week40 may be coded as 4
and week41 to week52 may be coded as 3
That would be 2 transitions within the data.
How could I go about writing code that can find this information? I'm rather new to this, and some help to get me going in the right direction would be much appreciated.

You can use the DO REPEAT command to loop through variable lists. Below is an example that pairs each week with the week before it, compares the two, and increments a count variable whenever they differ.
data list fixed / observation (A1).
begin data
1
2
3
4
5
end data.
*making random data.
vector week(52).
do repeat week = week1 to week52.
compute week = RND(RV.UNIFORM(0.5,4.4)).
end repeat.
execute.
*initialize count to zero.
compute count = 0.
do repeat week_after = week2 to week52 / week_before = week1 to week51.
if week_after <> week_before count = count + 1.
end repeat.
execute.
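The before/after pairing used in the DO REPEAT loop can also be sketched in Python for a single case. The function name `count_transitions` is hypothetical, and the sketch assumes the weekly values are complete; missing weeks are not handled.

```python
from typing import Sequence


def count_transitions(weeks: Sequence[int]) -> int:
    """Count how often a value differs from the value one position earlier."""
    # zip pairs week1 with week2, week2 with week3, and so on,
    # like the week_before / week_after stand-ins in DO REPEAT.
    return sum(1 for before, after in zip(weeks, weeks[1:]) if before != after)


# The example from the question: 18 weeks of 1, 22 weeks of 4, 12 weeks of 3.
example = [1] * 18 + [4] * 22 + [3] * 12
transitions = count_transitions(example)
```

For the example from the question this gives 2 transitions.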
