Simplifying the variable input in SAS - variables

I have 90 variables in the data, I want to do the following in SAS.
Here is my SAS code:
data test;
length id class sex $ 30;
input id $ 1 class $ 4-6 sex $ 8 survial $ 10;
cards;
1 3rd F Y
2 2nd F Y
3 2nd F N
4 1st M N
5 3rd F N
6 2nd M Y
;
run;
data items2;
set test;
length tid 8;
length item $8;
tid = _n_;
item = class;
output;
item = sex;
output;
item = survial;
output;
keep tid item;
run;
What if I have 90 variables to input the data like this? There should be a very long list. I want to simplify it.

You could use an ARRAY or alternately a PROC TRANSPOSE.
The following is untested, because you haven't provided an exxample of your input dataset.
DATA ITEMS;
ARRAY VARS {*} VAR1-VAR90;
SET REPLACE;
DO I = LBOUND(VARS) TO HBOUUND(VARS);
ITEM = VARS{I};
OUTPUT;
END;
RUN;
OR
PROC TRANSPOSE DATA = TEST OUT = WANT;
BY ID;
VAR CLASS -- SURVIAL;
RUN;
In the future it would be best is you could supply your input and desired output.

I don't seem to be able to add another comment to the above answer, as such I am adding one here.
You need to extend the VAR statement to include all variables that you want transposed.
CLASS -- SURVIAL means all variables between CLASS and SURVIVAL inclusive.
Post your code and the error so that I can help you better.

Related

How do i assign a value to a new variable, using another dataset which contains one value in SAS

I have a dataframe
ID value1
1 12
2 345
3 342
i have a second dataframe
value2
3823
how do I get the following result?
ID value1 value2
1 12 3823
2 345 3823
3 342 3823
any joins I have done have given me
ID value1 value2
1 12 .
2 345 .
3 342 .
. . 3823
No need for joins or helper variables:
data have;
do i = 1 to 3;
output;
end;
run;
data lookup;
j = 1;
run;
data want;
set have;
if _n_ = 1 then set lookup;
run;
Without the if _n_ = 1, the data step stops after one iteration when it tries to read a second row from the lookup dataset and finds that there are no rows remaining.
N.B. this requires that the have dataset doesn't already contain a variable with the same name as the variable(s) attached from the lookup dataset.
By far the easiest way to do this is to utilize PROC SQL and defining the condition 1=1, which is always true for each comparison:
data first;
input ID value1 ##;
cards;
1 12 2 345 3 342
run;
data second;
input value2 ;
cards;
3823
run;
proc sql;
create table wanted as
select * from first
left join second
on 1 =1
;quit;
Edit: As far as I know, there isn't direct way to merge datasets by each row, but you can do the following trick:
Add variable Help:
data second_trick;
set second;
help=1;
run;
data first_trick;
set first;
help=1;
run;
Then we just perform the merge by the static variable:
data wanted_trick;
merge first_trick(in=a) second_trick;
by help;
if a; /*Left join, just to be sure.*/
run;
now this only works if you want to add single static value. Don't try to use it your Second set has more rows.
For more on Merges and joins see: https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/249-30.pdf

SAS : Output from highest variable from dataset

I want to assign a new variable from existing highest n variable.
So if we have a table that has increasing number of columns -
data have;
input uid $ var1 $ var2 $ var3 $;
datalines;
1111 1 0 1
2222 1 0 0
3333 0 0 0
4444 1 1 1
5555 0 0 0
6666 1 1 1
;
I want derive the variable var3 as final_code.
data want;
set have;
final_code = max(of var1-var3);
run;
Above doesn't make sense here as I want only var3 column to remain. Similarly, if var4 is there, I wish to have var4 only.
Does somebody want to help me here ?
If I understand you right, you don't want max of the values but the value from the highest-numbered-variable.
Lots of ways to do this, which way depends on how the variables are named. Here's the easiest, if they're actually named as you say.
data want;
set have;
array var[*] var:;
final_code = var[dim(var)];
run;
Here we make an array out of var: and then choose the last element in the array using dim (to say the size of the array).
I think this is what you are looking for is:
%let n=3
data want;
set have;
var&n = max(of var1-var&n);
drop var1-var%eval(&n-1);
run;
The macro variable &n holds the value of n. This acts as a substitution during the compilation phase of the code.
The DROP statement tells the data step to drop those variable.
The %eval() macro function performs integer math on macro values. So we are dropping 1 through N-1.

Dropping variable based on sum of values in it using SAS

I wish to drop the columns in a SAS dataset which has a sum less than a particular value. Consider the case below.
Column_A Pred_1 Pred_2 Pred_3 Pred_4 Pred_5
A 1 1 0 1 0
A 0 1 0 1 0
A 0 1 0 1 0
A 0 1 0 1 1
A 0 1 0 0 1
Let us assume that our threshold is 4, so I wish to drop predictors having sum of active observations less than 4, so the output would look like
Column_A Pred_2 Pred_4
A 1 1
A 1 1
A 1 1
A 1 1
A 1 0
Currently I am using a very inefficient method of using multiple transposes to drop the predictors. There are multiple datasets with records > 30,000 so transpose approach is taking time. Would appreciate if anyone has a more efficient solution!
Thanks!
Seems like you could do:
Run PROC MEANS or similar proc to get the sums
Create a macro variable that contains all variables in the dataset with sum < threshhold
Drop those variables
Then no TRANSPOSE or whatever, just regular plain old summarization and drops. Note you should use ODS OUTPUT not the OUT= in PROC MEANS, or else you will have to PROC TRANSPOSE the normal PROC MEANS OUT= dataset.
An example using a trivial dataset:
data have;
array x[20];
do _n_ = 1 to 20;
do _i = 1 to dim(x);
x[_i] = rand('Uniform') < 0.2;
end;
output;
end;
run;
ods output summary=have_sums; *how we get our output;
ods html select none; *stop it from going to results window;
proc means data=have stackodsoutput sum; *stackodsoutput is 9.3+ I believe;
var x1-x20;
run;
ods html select all; *reenable normal html output;
%let threshhold=4; *your threshhold value;
proc sql;
select variable
into :droplist_threshhold separated by ' '
from have_sums
where sum lt &threshhold; *checking if sum is over threshhold;
quit;
data want;
set have;
drop &droplist_threshhold.; *and now, drop them!;
run;
Just use PROC SUMMARY to get the sums. You can then use a data step to generate the list of variable names to drop.
%let threshhold=4;
%let varlist= pred_1 - pred_5;
proc summary data=have ;
var &varlist ;
output out=sum sum= ;
run;
data _null_;
set sum ;
array x &varlist ;
length droplist $500 ;
do i=1 to dim(x);
if x(i) < &threshhold then droplist=catx(' ',droplist,vname(x(i)));
end;
call symputx('droplist',droplist);
run;
You can then use the macro variable to generate a DROP statement or a DROP= dataset option.
drop &droplist;

SAS: Changing multiple variable names

This is my current issue:
I have 53 variable headers in a SAS data set that need to be changed, for example:
Current_Week_0 TS | Current_Week_1 TS | Current_Week_2 TS -- etc.
I need it to change such that Current_Week_# TS = Current_Week_# -- dropping the TS
Is there a way to automate this such as looping it like:
i = 0,53
Current_week_i TS = Current_Week_i ?
I just don't understand the proper syntax.
Edit: Thank you for editing my formats Sergiu, appreciate it! :)
Edit:
I used the following code, but I get the following error:
Missing numeric suffix on a numbered variable list (TS-Current_Week_53)
DATA True_Start_8;
SET True_Start_7;
ARRAY oldnames (53) Current_Week_1 TS-Current_Week_53 TS;
ARRAY newnames (53) Current_Week_1-Current_Week_53;
DO i = 1 TO 53;
newnames(i) = oldnames(i) ;
END;
RUN;
#Joe EDIT
Here's what the data looks like before and after the "denorm" / transpose
BEFORE
Product ID CurrentWeek Market TS
X 75av2kz Current_Week_0 Z 1
Y 7sav2kz Current_Week_0 Z 1
X 752v2kz Current_Week_1 Z 1
Y 255v2kz Current_Week_1 Z 1
Product ID Market Current_Week_0_TS Current_Week_1_TS
X 75av2kz Z 1 0
Y 7sav2kz Z 1 1
X 752v2kz Z 1 1
Y 255v2kz Z 1 0
This isn't too hard. I assume these are variable labels.
proc sql;
select cats('%relabel_nots(',name,')') into :relabellist separated by ' '
from dictionary.columns
where libname='WORK' and memname='True_Start_7'
and name like '%TS'; *you may need to upper case the dataset name (memname) depending on your OS;
quit;
%macro relabel_nots(name);
label &name.= substr(vlabel(&name.),1,length(vlabel(&name.))-3);
%mend relabel_nots;
data want;
set True_Start_7;
&relabellist.;
run;
Basically the PROC SQL grabs the different names that qualify for the relabelling, and generates a large macro variable with all of the rename macro calls. The relabel_nots macro generates the new labels. You may need to change the logic behind the WHERE in the PROC SQL if the variable names don't also contain the TS.
Another option is to do this in the transpose. Your example data either doesn't match the example desired output, or there is something in logic not explained, but this does the simple transpose; if there is a logical reason that the current_week_0/1 are different in yours than in the below, explain why.
data have;
format currentWeek $20.;
input Product $ ID $ CurrentWeek $ Market $ TS;
datalines;
X 75av2kz Current_Week_0 Z 1
Y 7sav2kz Current_Week_0 Z 1
X 752v2kz Current_Week_1 Z 1
Y 255v2kz Current_Week_1 Z 1
;;;;
run;
proc sort data=have;
by market id product;
run;
proc transpose data=have out=want;
by market id product ;
id currentWeek;
var TS;
run;

SAS dynamic variable names from other variables

I am trying to create SAS variable names based on data contained within other variables. For example, I could start with
Obs Var1 Var2
1 abc X
2 def X
3 ghi Y
4 jkl X
and I would like to end up with
Obs Var1 Var2 X Y
1 abc X abc
2 def X def
3 ghi Y ghi
4 jkl X jkl
I do have one way of doing this but it requires somewhat ugly macros to first create the variables needed (using a length statement) and then creating a whole series of numbered macro variables (1 per observation) that are later called inside a data step loop. It works but is complicated and I don't think will scale well to the real data, which contain multiple variables for creation per row, and a few thousand rows.
I've also tried something with arrays - saving variables names in a macro var, using it to generate an array statement, and trying to keep track of which array index is needed for each new variable, but it is also complicated.
What would really help would be something analogous to
vvaluex(var2)=var1
except vvaluex can't be on the left-hand side of an equals. Any thoughts or ideas?
PROC TRANSPOSE is a handy way to do the example in the question.
data have;
input Obs Var1 $ Var2 $;
datalines;
1 abc X
2 def X
3 ghi Y
4 jkl X
;;;;
run;
proc transpose data=have out=want;
by obs;
id var2;
var var1;
copy var1 var2;
run;
Another option is probably similar to what you've tried before, using arrays and VNAME:
proc sql;
select var2 into :var2list separated by ' ' from have;
quit;
data want;
set have;
array newvars $ &var2list;
do _t = 1 to dim(newvars);
if vname(newvars[_t]) = Var2 then do;
newvars[_t] = var1;
leave;
end;
end;
run;
PROC TRANSPOSE should be faster and is probably more flexible, but this might work better for some purposes.