SAS: Checking Whether a Third Variable Is Between the Values of Two Other Variables

I have been dealing with this issue that I thought was trivial, but for some reason nothing I have tried has worked so far.
I have a dataset
obs A B C
1 2 6 7
2 3 1 5
3 8 5 9
. . . .
For each observation, I want to compare the value in column A to the values in columns B and C, and assign a value of 1 to a variable called within when A falls between them. My goal is to select only the observations whose A value is between their B and C values. I have tried everything, but nothing seems to be working.
Thank you.

Here's how to do it in a data step. Let me know if that works for you.
data new;
set old;
if B < A < C then D = 1;
else delete;
run;

SQL dealing every bit without run query repeatedly

I have a column that uses bits to record the status of every mission. The index of each bit is the mission number, and 1/0 indicates whether that mission was successful; the bits are logically independent even though they are stored together.
For instance, 1010 (stored as a decimal number) means a user finished the 2nd and 4th missions successfully, and the table looks like:
uid status
a 1100
b 1111
c 1001
d 0100
e 0011
Now I need to calculate, for every mission, how many users passed it. E.g. for mission 1 it's 0+1+1+0+1 = 3, while for mission 2 it's 0+1+0+0+1 = 2.
I can use the formula FLOOR(status%POWER(10,n)/POWER(10,n-1)) to get the bit of every mission for every user, but this means I need to run my query n times, and the status is now 64 bits long...
Is there any elegant way to do this in one query? Any help is appreciated....
The obvious approach is to normalise your data:
uid mission status
a 1 0
a 2 0
a 3 1
a 4 1
b 1 1
b 2 1
b 3 1
b 4 1
c 1 1
c 2 0
c 3 0
c 4 1
d 1 0
d 2 0
d 3 1
d 4 0
e 1 1
e 2 1
e 3 0
e 4 0
Alternatively, you can store a bitwise integer (or just do what you're currently doing) and process the data in your application code (e.g. a bit of PHP)...
uid status
a 12
b 15
c 9
d 4
e 3
<?php
$input = 15; // value comes from a query
$missions = array(1,2,3,4); // not really necessary in this particular instance
for( $i=0; $i<4; $i++ ) {
$intbit = pow(2,$i);
if( $input & $intbit ) {
echo $missions[$i] . ' ';
}
}
?>
Outputs '1 2 3 4'
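The same application-side idea can also produce the per-mission totals the question asks for. Below is a minimal Python sketch, assuming the status values have already been fetched as the bitwise integers from the table above; the function name `mission_counts` and the hard-coded dictionary are hypothetical stand-ins for a real query result.

```python
def mission_counts(statuses, n_missions):
    """Return a list where entry i is the number of users whose bit i is set."""
    counts = [0] * n_missions
    for status in statuses:
        for i in range(n_missions):
            # (status >> i) & 1 extracts bit i, counting from the least significant bit
            counts[i] += (status >> i) & 1
    return counts


# Bitwise integers from the table above (hard-coded here instead of queried)
statuses = {"a": 12, "b": 15, "c": 9, "d": 4, "e": 3}

counts = mission_counts(statuses.values(), 4)
for mission, count in enumerate(counts, start=1):
    print(f"mission {mission}: {count}")
```

For the five sample rows this gives 3, 2, 3 and 3 users for missions 1 to 4. A single pass over the rows is enough, whatever the number of bits.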
Just convert the value to a string, remove the '0's, and calculate the length. Assuming that the value really is a decimal:
select length(replace(cast(status as char), '0', '')) as num_missions
from t;
Here is a db<>fiddle using MySQL. Note that the conversion to a string might look a little different in Hive, but the idea is the same.
If it is stored as an integer, you can use the bin() function to convert an integer to a string. This is supported in both Hive and MySQL (the original tags on the question).
Bit fiddling in databases is usually a bad idea and suggests a poor data model. Your data should have one row per user and mission. Attempts at optimizing by stuffing things into bits may work sometimes in some programming languages, but rarely in SQL.

Calculating the difference between values based on their date

I have a dataframe that looks like this, where the "Date" is set as the index
A B C D E
Date
1999-01-01 1 2 3 4 5
1999-01-02 1 2 3 4 5
1999-01-03 1 2 3 4 5
1999-01-04 1 2 3 4 5
I'm trying to compare the percent difference between two pairs of dates. I think I can do the first bit:
start_1 = "1999-01-02"
end_1 = "1999-01-03"
start_2 = "1999-01-03"
end_2 = "1999-01-04"
Obs_1 = df.loc[end_1] / df.loc[start_1] -1
Obs_2 = df.loc[end_2] / df.loc[start_2] -1
The output I get from - eg Obs_1 looks like this:
A 0.011197
B 0.007933
C 0.012850
D 0.016678
E 0.007330
dtype: float64
I'm looking to build some correlations between Obs_1 and Obs_2. I think I need to create a new dataframe with the labels A-E as one column (or as the index), and then the data series from Obs_1 and Obs_2 as adjacent columns.
But I'm struggling! I can't 'see' what Obs_1 and Obs_2 'are' - have I created a list? A series? How can I tell? What would be the best way of combining the two into a single dataframe...say df_1.
I'm sure the answer is staring me in the face but I'm going mental trying to figure it out...and because I'm not quite sure what Obs_1 and Obs_2 'are', it's hard to search the SO archive to help me.
Thanks in advance
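One way to inspect the two objects and combine them is sketched below. This is only an illustration under stated assumptions: the numbers are made up (the question's real data is not shown), and the index is assumed to hold the dates as plain strings. Selecting a single row with `.loc` returns a pandas Series, which `type()` confirms, and `pd.concat` with `axis=1` places two Series side by side.

```python
import pandas as pd

# Made-up values for illustration only; the question's real data is not shown.
# Assumption: the "Date" index holds the dates as plain strings.
df = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0, 6.0],
        "B": [2.0, 2.0, 4.0, 5.0],
        "C": [4.0, 5.0, 5.0, 10.0],
        "D": [1.0, 4.0, 2.0, 3.0],
        "E": [5.0, 5.0, 10.0, 8.0],
    },
    index=["1999-01-01", "1999-01-02", "1999-01-03", "1999-01-04"],
)
df.index.name = "Date"

Obs_1 = df.loc["1999-01-03"] / df.loc["1999-01-02"] - 1
Obs_2 = df.loc["1999-01-04"] / df.loc["1999-01-03"] - 1

# A single-row .loc lookup returns a Series indexed by the column labels
print(type(Obs_1))

# axis=1 puts the Series side by side, aligned on the labels A..E;
# keys supplies the column names of the resulting DataFrame
df_1 = pd.concat([Obs_1, Obs_2], axis=1, keys=["Obs_1", "Obs_2"])
print(df_1)

# Pearson correlation between the two columns
print(df_1["Obs_1"].corr(df_1["Obs_2"]))
```

The labels A to E end up as the index of `df_1`, with `Obs_1` and `Obs_2` as adjacent columns, which is the layout described in the question.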

SAS INPUT COLUMN

I have a problem in SAS: how can I read several columns into a single column (put everything in a single variable)?
For example, I have 3 columns, but I would like to put these 3 columns into only one column.
like this:
1 2 3
1 3 1
3 4 4
output:
1
1
3
2
3
4
3
1
4
I'm assuming you're reading from a file, so use the trailing @@ to keep reading variables past the end of the line:
data want;
input a @@;
cards;
1 2 3
1 3 1
3 4 4
;
run;
If the dataset is not big, just split it into several small data sets with one variable each, then rename all the variables to one name and concatenate them vertically using a simple set statement. I am sure there are more elegant solutions than this one; if your data set is big, let me know and I will write the actual code needed to do this efficiently.

Counting rows in multiple variables using SAS

I have a question in creating a count variable using SAS.
Q R
----
1 a
1 a
1 b
1 b
1 b
2 a
3 a
3 c
4 c
4 c
4 c
I need to create a variable S that counts the rows that have the same combination of Q and R. The following will be the output.
Q R S
-------------------
1 a 1
1 a 2
1 b 1
1 b 2
1 b 3*
2 a 1
3 a 1
3 c 1
4 c 1
4 c 2
4 c 3
I tried using following program:
data two;
set one;
S + 1;
by Q R;
if first.Q and first.R then S = 1;
run;
But, this did not run correctly. For example, * will come out as 1 instead of 3. I would appreciate any tips on how to make this counting variable work correctly.
Very close: your if statement should test only first.R (or change the and to or, but that isn't efficient). I usually prefer to have the increment after setting S to 1.
data two;
set one;
by Q R;
*Retain S; *implicitly retained by using the +1 notation;
if first.R then S = 1;
else S+1;
run;
Reese's example is certainly sufficient in this case, but given this was a simple question with a mostly uninteresting answer, I'll instead present a very small variant simply from a programming style standpoint.
data two;
set one;
by Q R;
if first.R then s=0;
s+1;
run;
This will likely function exactly the same as Reese's code and the original question's code once the first.Q is removed. However, it has two slight differences.
First off, I like to group if first. variable-resetting code (that isn't otherwise dependent on location) in one place, as early as possible in the code (after by statements, where, "early" subsetting if, array, format, length, and retain). This is useful from an organizational standpoint because this code (usually) is something that roughly parallels what SAS does in between data step iterations, but for BY groups - so it's nice to have it at the start of the data step.
Second, I like initializing to zero. That avoids the need for the else conditional, which makes the code clearer; you're not doing anything different on the first row per BY group other than the re-initialization, so it makes sense to not have the increment be conditional.
These are both helpful in a more complex version of this data step; obviously in this particular step it doesn't matter much, but in a larger data step it can be helpful to organize your code more effectively. Imagine this data step:
data two;
set one;
by Q R;
retain Z;
format Z $12.;
array CS[15];
if first.R then do; *re-init block;
S=0;
Z=' ';
end;
S+1; *increment counter;
do _t = 1 to dim(CS);
Z = cats(Z,ifc(CS[_t]>0,put(CS[_t],8.),''));
end;
keep Q R S Z;
run;
That data step is nicely organized, and has a logical flow, even though almost every step could be moved anywhere else in the data step. Grouping the initializations together makes it more readable, particularly if you always group things that way.
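The reset-then-increment pattern is not specific to SAS. As a rough illustration only, here is the same logic in Python, assuming the rows are already sorted by Q and R as the BY statement requires; the function name `running_count` is hypothetical.

```python
def running_count(rows):
    """Yield (q, r, s), where s numbers the rows within each run of equal (q, r)."""
    previous_key = None
    s = 0
    for q, r in rows:
        if (q, r) != previous_key:  # plays the role of first.R in the SAS step
            s = 0                   # re-initialise at the start of each group
            previous_key = (q, r)
        s += 1                      # unconditional increment, like S+1
        yield q, r, s


# Input rows from the question, already sorted by Q and R
rows = [
    (1, "a"), (1, "a"), (1, "b"), (1, "b"), (1, "b"),
    (2, "a"), (3, "a"), (3, "c"), (4, "c"), (4, "c"), (4, "c"),
]

result = list(running_count(rows))
for q, r, s in result:
    print(q, r, s)
```

Because the counter is reset to zero rather than one, the increment needs no else branch, which is the point made above about initializing to zero.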

Check if value is already in a query field to change the value of another

I'll clarify this: I have a query result with the twist that the two PKs (A and B) can be the same across records while field C is not.
Example:
A B C D
> 14 20 1 null
> 14 20 2 1
> 15 20 2 0
As you can see, D field has a null and a 0.
What I have to do is to change D's null value to 1 whenever A fields are the same, and there's more than 1 record with those, not touching the 0's in D.
I tried initially with NVLs and DECODEs, like this:
DECODE(migr.A,NULL,(NVL(C,1)),D) AS D
but I'm not getting all the records, only the D-1's.
I really don't want to relate to an extra table/step for validation, as my query result can be easily over 1 million records, but if that's the best, I'm ok.
Many thanks.