I really appreciate you all, especially when I run into problems manipulating data with SAS.
I have a data set like the following:
ID key score
10002817 200207826243 0
10002817 200207826271 0
10002817 200208532180 0
10002976 200301583978 0
10003685 200302311690 0
10006588 200401613047 0
10006588 200502882618 0
10009377 201007510866 1
10009377 201111777969 0
10011044 200801328219 2
10011044 200803290654 3
10011044 200803290728 1
10011044 200803290905 1
10011044 200803291161 0
Some IDs are repeated in the data and others are not.
For each ID, I want the maximum difference in score (the largest score minus the smallest score).
That is, output like the following:
ID key score diff_score
10002817 200207826243 0 0
10002817 200207826271 0 0
10002817 200208532180 0 0
10002976 200301583978 0 0
10003685 200302311690 0 0
10006588 200401613047 0 0
10006588 200502882618 0 0
10009377 201007510866 1 1
10009377 201111777969 0 1
10011044 200801328219 2 3
10011044 200803290654 3 3
10011044 200803290728 1 3
10011044 200803290905 1 3
10011044 200803291161 0 3
How can I do this in SAS?
Any help would be appreciated.
Thank you all.
You can do this using proc sql:
proc sql;
create table want as
select ID, key, score, max(score)-min(score) as diff_score
from have
group by ID;
quit;
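One advantage of PROC SQL is that your data doesn't need to be sorted for this to work. Because the query selects columns that are not in the GROUP BY, SAS remerges the summary statistics back onto every row and writes a NOTE about this to the log; that is expected here.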
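To make the "summarise per ID, then remerge onto every row" logic explicit, here is a minimal Python sketch of the same computation using the sample rows from the question. It is only an illustration of what the PROC SQL query does, not SAS code; the function name add_diff_score is hypothetical.

```python
# Sample rows from the question: (ID, key, score)
rows = [
    (10002817, 200207826243, 0),
    (10002817, 200207826271, 0),
    (10002817, 200208532180, 0),
    (10002976, 200301583978, 0),
    (10003685, 200302311690, 0),
    (10006588, 200401613047, 0),
    (10006588, 200502882618, 0),
    (10009377, 201007510866, 1),
    (10009377, 201111777969, 0),
    (10011044, 200801328219, 2),
    (10011044, 200803290654, 3),
    (10011044, 200803290728, 1),
    (10011044, 200803290905, 1),
    (10011044, 200803291161, 0),
]


def add_diff_score(rows):
    """Append diff_score = max(score) - min(score) within each ID to every row."""
    lowest, highest = {}, {}
    # First pass: per-ID summary, like GROUP BY ID with MIN/MAX.
    for id_, _key, score in rows:
        lowest[id_] = min(score, lowest.get(id_, score))
        highest[id_] = max(score, highest.get(id_, score))
    # Second pass: attach the summary back onto each original row (the "remerge").
    return [(id_, key, score, highest[id_] - lowest[id_]) for id_, key, score in rows]


for row in add_diff_score(rows):
    print(*row)
```

Like the PROC SQL version, this does not require the rows to be sorted by ID, because the per-ID summaries are kept in dictionaries keyed by ID.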
I have this database:
ID LABEL
1 A
1 B
2 B
3 c
I'm trying to do one-hot encoding, which I was able to do. However, I also need to collapse the duplicated IDs into one row each. Currently my one-hot encoded data looks like this:
ID A B C
1 1 0 0
1 0 1 0
2 0 1 0
3 0 0 1
and I need the final database to look like this:
ID A B C
1 1 1 0
2 0 1 0
3 0 0 1
This is my code (dummyVars comes from the caret package):
library(caret)
dummy <- dummyVars('~ .', data = data_to_be_encoded)
encoded_data <- data.frame(predict(dummy, newdata = data_to_be_encoded))
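Collapsing duplicate IDs amounts to taking the per-ID maximum (a logical OR) of each dummy column. The sketch below shows that idea in Python on the sample data. It assumes the lowercase "c" in the sample is meant to be the same category as "C", and one_hot_by_id is a hypothetical helper name. In R, the analogous step would probably be something like aggregate(. ~ ID, data = encoded_data, FUN = max), though that line has not been tested against your data.

```python
# Sample data from the question: (ID, LABEL).
# Assumption: the lowercase "c" is meant to be the same category as "C".
pairs = [(1, "A"), (1, "B"), (2, "B"), (3, "c")]


def one_hot_by_id(pairs):
    """One-hot encode LABEL and collapse duplicate IDs into a single row."""
    labels = sorted({label.upper() for _, label in pairs})
    encoded = {}
    for id_, label in pairs:
        # One all-zero row per ID, created the first time the ID is seen.
        row = encoded.setdefault(id_, dict.fromkeys(labels, 0))
        # Setting the column to 1 acts as the per-ID max (OR) over duplicates.
        row[label.upper()] = 1
    return labels, encoded


labels, encoded = one_hot_by_id(pairs)
print("ID", *labels)
for id_, row in encoded.items():
    print(id_, *(row[label] for label in labels))
```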
I have a table whose data needs to be unpivoted to get aggregated counts.
Source table:
primary_id sys_1 sys_2 sys3_ sy5 sys100
newa889 0 1 0 1 0
den7899 1 1 1 1 0
geo8988 1 1 1 1 0
atla8766 0 1 0 1 1
chic7898 0 1 0 0 1
Desired output:
sys_name count(primary_key) flag_0_or_1
sys_1 129999 0
sys_1 544545 1
sys_2 23333 0
sys_2 23322323 1
sys3_ 332233 0
sys3_ 323232 1
sy5 32332 0
sy5 32323 1
I'm looking to transpose (unpivot) the data and get the counts of 0s and 1s for each sys_ column.
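The operation being asked for is an unpivot (one row per primary_id and sys column) followed by a count grouped by column name and flag value. A minimal Python sketch of that logic on the five sample rows follows; unpivot_counts is a hypothetical name, and the counts it produces come from the sample only (the numbers in the desired output above are clearly from the full table). In SQL, a common way to express the same thing without a dedicated UNPIVOT operator is one SELECT per column combined with UNION ALL, then GROUP BY sys_name and the flag.

```python
from collections import Counter

# Sample rows from the question.
rows = [
    {"primary_id": "newa889", "sys_1": 0, "sys_2": 1, "sys3_": 0, "sy5": 1, "sys100": 0},
    {"primary_id": "den7899", "sys_1": 1, "sys_2": 1, "sys3_": 1, "sy5": 1, "sys100": 0},
    {"primary_id": "geo8988", "sys_1": 1, "sys_2": 1, "sys3_": 1, "sy5": 1, "sys100": 0},
    {"primary_id": "atla8766", "sys_1": 0, "sys_2": 1, "sys3_": 0, "sy5": 1, "sys100": 1},
    {"primary_id": "chic7898", "sys_1": 0, "sys_2": 1, "sys3_": 0, "sy5": 0, "sys100": 1},
]


def unpivot_counts(rows, id_col="primary_id"):
    """Return (sys_name, flag, count) for every non-ID column and flag in (0, 1)."""
    columns = [col for col in rows[0] if col != id_col]
    counts = Counter()
    for row in rows:
        # Unpivot: each (column, value) pair becomes one long-format record.
        for col in columns:
            counts[(col, row[col])] += 1
    # Emit a 0 and a 1 row per column; a Counter returns 0 for unseen keys.
    return [(col, flag, counts[(col, flag)]) for col in columns for flag in (0, 1)]


for sys_name, flag, count in unpivot_counts(rows):
    print(sys_name, count, flag)
```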
I have an incomplete graph (e.g. one with 5 vertices), and its adjacency matrix "a" is available. I want to define a set that contains all edges and excludes every other pair of vertices; that is, a pair of vertices belongs to the set of edges if and only if the corresponding element of matrix "a" is positive.
The last line of the following code does not work:
sets i "Set of vertices" /1*5/ ;
alias(i,j);
set a(i,j) "Adjacency matrix" ;
Table a(i,j)
1 2 3 4 5
1 0 1 0 1 1
2 1 0 1 0 0
3 0 1 0 0 0
4 1 0 0 0 1
5 1 0 0 1 0;
Set edges(i,j);
edges(i,j) = a(i,j)$(a(i,j)>0);
If you want the set of edges, define the set and assign it with a dollar condition on a, like this:
sets i "Set of vertices" /1*5/ ;
alias(i,j);
set a(i,j) "Adjacency matrix" ;
Table a(i,j)
1 2 3 4 5
1 0 1 0 1 1
2 1 0 1 0 0
3 0 1 0 0 0
4 1 0 0 0 1
5 1 0 0 1 0;
Set edges(i,j);
edges(i,j) $ a(i,j) =yes;
You can simplify your last line to
edges(i,j) = a(i,j);
This automatically acts as if you had written something like $(a<>0). However, since you already declared the symbol a as a set and not as a parameter, I think you do not actually have to do anything: a itself is already what you are looking for. Just do
display a;
and look at the result in the lst file.
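For readers less familiar with GAMS dollar conditions, the intended result is simply the set of ordered pairs (i, j) whose matrix entry is positive. The Python sketch below computes that set for the matrix in the question; the comprehension's filter plays the role of the $ condition. The function name edge_set is hypothetical, and vertices are numbered 1..5 as in the GAMS set i.

```python
# Adjacency matrix from the question; row/column k corresponds to vertex k + 1.
a = [
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
]


def edge_set(adj):
    """Return the ordered pairs (i, j), 1-based, with adj[i][j] > 0."""
    return {
        (i + 1, j + 1)
        for i, row in enumerate(adj)
        for j, value in enumerate(row)
        if value > 0  # same role as the $ condition on a(i,j)
    }


print(sorted(edge_set(a)))
```

Because the matrix is symmetric, each undirected edge appears twice, once as (i, j) and once as (j, i), just as it would in the GAMS set edges(i,j).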
I'm starting with a table like this:
code new_code_flag
abc123 0
xyz456 0
wer098 1
jio234 0
bcx190 0
eiw157 0
nzi123 0
epj676 0
ere654 0
yru493 1
ale674 0
I want to grab the 2 records before and the 2 records after each record where "new_code_flag" = 1. I want my output to look like this:
code new_code_flag
abc123 0
xyz456 0
wer098 1
jio234 0
bcx190 0
epj676 0
ere654 0
yru493 1
ale674 0
How can I do this in SQL or SAS?
SQL tables represent unordered sets. Hence, in SQL you need a column that specifies the ordering. Assuming you have one, you can do something like the following (replace ? with that column):
with t as (
select t.*, row_number() over (order by ?) as seqnum
from tbl t
)
select t.*
from t
where exists (select 1
from t t2
where t2.new_code_flag = 1 and
t.seqnum between t2.seqnum - 2 and t2.seqnum + 2
);
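The EXISTS query keeps a row when some flagged row lies within two positions of it. The same rule in Python, on the sample data in its listed order, looks like the sketch below; rows_near_flags is a hypothetical name, and the list order stands in for the ordering column that SQL needs.

```python
# Rows from the question, in their listed order: (code, new_code_flag).
rows = [
    ("abc123", 0), ("xyz456", 0), ("wer098", 1), ("jio234", 0),
    ("bcx190", 0), ("eiw157", 0), ("nzi123", 0), ("epj676", 0),
    ("ere654", 0), ("yru493", 1), ("ale674", 0),
]


def rows_near_flags(rows, before=2, after=2):
    """Keep row k if some flagged row f satisfies f - before <= k <= f + after."""
    flagged = [f for f, (_, flag) in enumerate(rows) if flag == 1]
    return [
        row for k, row in enumerate(rows)
        if any(f - before <= k <= f + after for f in flagged)
    ]


for code, flag in rows_near_flags(rows):
    print(code, flag)
```

Overlapping windows are handled naturally: a row is kept once even if it is near two flagged rows, which matches the EXISTS semantics.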
You could create two lag and two lead copies of the flag variable and then test if any of the 5 variables are 1 (true).
data have;
input code $ flag ;
cards;
abc123 0
xyz456 0
wer098 1
jio234 0
bcx190 0
eiw157 0
nzi123 0
epj676 0
ere654 0
yru493 1
ale674 0
;
data want ;
set have ;
set have(keep=flag rename=(flag=lead1_flag) firstobs=2) have(drop=_all_ obs=1);
set have(keep=flag rename=(flag=lead2_flag) firstobs=3) have(drop=_all_ obs=2);
lag1_flag=lag1(flag);
lag2_flag=lag2(flag);
if lag1_flag or lag2_flag or flag or lead1_flag or lead2_flag ;
run;
Results:
Obs code flag lead1_flag lead2_flag lag1_flag lag2_flag
1 abc123 0 0 1 . .
2 xyz456 0 1 0 0 .
3 wer098 1 0 0 0 0
4 jio234 0 0 0 1 0
5 bcx190 0 0 0 0 1
6 epj676 0 0 1 0 0
7 ere654 0 1 0 0 0
8 yru493 1 0 . 0 0
9 ale674 0 . . 1 0
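Alternatively, merge the data set with a copy of itself that starts two observations later, so that _flag holds the flag from two rows ahead, and use a counter to output the rows around each flag. One edge case: if the second observation is flagged, the first observation is not output, because the first observation's _flag comes from the third observation.
data want(drop=_: i);
  /* One-to-one merge: _flag is the flag from two observations ahead */
  merge have have(keep=flag firstobs=3 rename=(flag=_flag));
  /* (Re)start the counter two rows before a flagged row, or at the flagged row */
  if flag or _flag then i=1;
  /* The sum statement i+1 retains i, so up to 3 rows are output per (re)start */
  if 0<i<=3 then do;
    output;
    i+1;
  end;
  else delete;
run;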
I have a table with direction indicators, and based on them I need to derive a new column that tells whether the row is In or Out:
ORG_IN ORG_OUT DEST_IN DEST_OUT Direction
0 0 0 0 NULL
0 0 0 1 Out
0 0 1 0 In
0 1 0 0 Out
0 1 0 1 Out
0 1 1 0 NULL
1 0 0 0 In
1 0 0 1 NULL
1 0 1 0 In
This is the query where I derived the direction:
http://sqlfiddle.com/#!4/a9f82/1
Will it cover all possible combinations in the future? Right now I only see the combinations above. Is there a better way to write the SQL?
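You can compare the sum of the IN indicators with the sum of the OUT indicators in a single CASE expression:
select t.*, case ORG_IN + DEST_IN - ORG_OUT - DEST_OUT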
when 2 then 'In'
when 1 then 'In'
when 0 then null
when -1 then 'Out'
when -2 then 'Out'
end as Direction
from tablename t
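Since ORG_IN + DEST_IN - ORG_OUT - DEST_OUT can only range from -2 to 2, the CASE expression is really checking the sign of (ins minus outs). The Python sketch below expresses that sign rule and checks it against every row of the table in the question; direction is a hypothetical function name and None stands in for SQL NULL.

```python
def direction(org_in, org_out, dest_in, dest_out):
    """Sign of (ins - outs): positive -> 'In', negative -> 'Out', zero -> None."""
    score = org_in + dest_in - org_out - dest_out
    if score > 0:
        return "In"
    if score < 0:
        return "Out"
    return None  # corresponds to the NULL branch of the CASE


# (ORG_IN, ORG_OUT, DEST_IN, DEST_OUT) -> expected Direction, from the question.
expected = {
    (0, 0, 0, 0): None,
    (0, 0, 0, 1): "Out",
    (0, 0, 1, 0): "In",
    (0, 1, 0, 0): "Out",
    (0, 1, 0, 1): "Out",
    (0, 1, 1, 0): None,
    (1, 0, 0, 0): "In",
    (1, 0, 0, 1): None,
    (1, 0, 1, 0): "In",
}

for combo, want in expected.items():
    print(combo, direction(*combo), want)
```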
I can't think of any other valid combinations. However, I'd recommend a check constraint that ensures no invalid combinations are entered:
check (ORG_IN + ORG_OUT < 2 and DEST_IN + DEST_OUT < 2)
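One way to confirm that the table already lists every valid case is to enumerate all 16 possible 0/1 combinations of the four indicators and keep those that pass the check constraint. The Python sketch below does that; valid_combinations is a hypothetical helper, and it assumes each indicator column only ever holds 0 or 1.

```python
from itertools import product


def valid_combinations():
    """All (ORG_IN, ORG_OUT, DEST_IN, DEST_OUT) 0/1 tuples allowed by the CHECK constraint."""
    return [
        combo
        for combo in product((0, 1), repeat=4)
        if combo[0] + combo[1] < 2 and combo[2] + combo[3] < 2
    ]


for combo in valid_combinations():
    print(*combo)
```

Each side (origin and destination) has three allowed states (neither, IN only, OUT only), so the constraint admits 3 × 3 = 9 combinations, which is exactly the number of rows in the table above.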