How to assign the count of columns to another variable in SAS? - sql

%Let abc = count( no of variables in data set )

The following code assigns the number of columns in the dataset 'have' to the macro variable abc.
data _null_;
if 0 then
do;
set have (obs=0);
end;
array chars _character_;
array nums _numeric_;
ncharvar = dim(chars);
nnumvar = dim(nums);
nvar = ncharvar + nnumvar;
call symputx('abc',nvar); /* symputx avoids the numeric-to-character conversion note and strips leading blanks */
run;
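For comparison, here is a minimal sketch that reads the variable count straight from the dataset header without a data step. It assumes the dataset is WORK.HAVE and can be opened; the macro variables dsid and rc are illustrative helper names only.

```sas
/* Open the dataset, read its NVARS attribute, then close it again. */
%let dsid = %sysfunc(open(have));            /* 0 means the open failed */
%let abc  = %sysfunc(attrn(&dsid, nvars));   /* number of variables     */
%let rc   = %sysfunc(close(&dsid));
%put NOTE: have contains &abc variables.;
```

This reads only metadata, so it should cost about the same regardless of how many rows the dataset has.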

Related

SAS - How to separate a string with variable substring into multiple columns

I have a dataset containing a variable X, made up of multiple numbers separated by commas. The number of items differs among rows. I created a word count (Num_of_X). Now I would like to see the numbers in different columns.
Here the example:
X            Num_of_X  Var1  Var2  Var3  Var4  ...  Varn
3,10,165     3         3     10    165
1            1         1
15,100       2         15    100
10,52,63,90  4         10    52    63    90
I tried this way:
%let max_num_X=max(num_of_x);
data have;
set have;
length var1-var&max_num_X $10.;
array Var(&max_num_X) $;
do i=1 to &max_num_X;
Var[i]=scan(X,i,',');
output;
end;
run;
Could you help me?
Thank you
Do something like this:
data have;
input X :$20.;
datalines;
3,10,165
1
15,100
10,52,63,90
;
data long;
set have;
n = _N_;
do i = 1 to countw(X, ',');
xx = scan(X, i, ',');
output;
end;
run;
proc transpose data = long out = want(drop=_:) prefix=Var;
by n;
id i;
var xx;
run;
You could use a macro to find the maximum number of items and create the variables:
%macro create_vars();
%local i;
proc sql noprint; select max(countw(X, ',')) into :max_num_X from have; quit;
data have; set have;
%do i = 1 %to &max_num_X.; Var&i. = scan(X,&i.,','); %end;
run;
%mend;
%create_vars();
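The same idea can also be written without a macro loop, which is close to what the question attempted. This is a sketch only: it assumes the input dataset is named have, that no item is longer than 10 characters, and that the TRIMMED option of PROC SQL (SAS 9.3 or later) is available. The output name want is illustrative.

```sas
/* Find the largest number of comma-separated items in X. */
proc sql noprint;
  select max(countw(X, ',')) into :max_num_X trimmed from have;
quit;

data want;
  set have;
  array Var[&max_num_X] $10;        /* creates Var1-Var<max> as character */
  do i = 1 to countw(X, ',');
    Var[i] = scan(X, i, ',');       /* i-th item of the list              */
  end;
  drop i;
run;
```

Unlike the attempt in the question, there is no OUTPUT inside the loop, so each input row yields exactly one output row.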

Assigning the same value to multiple SAS variables

I have 100 different variables ndc1-ndc100. I need to assign the same value to all of them, something like this:
data prj.rx_comm_crosstab;
length
ndc1-ndc100 $20
;
retain ndc1-ndc100;
retain cnter 0;
set rx_cost_by_drug;
by yrmo subs_id mbrtype;
if first.mbrtype then do;
ndc1-ndc100 =' ';
cnter=0;
end;
....some other code
run;
The line ndc1-ndc100 = ' ' doesn't work. Is there a way to do it? I wanna avoid having to set each of the 100 variables to the same value individually.
You can use an array as shown below.
data class;
length
ndc1-ndc10 $20 ;
set sashelp.class;
array nd(*) $ ndc1-ndc10 ;
if age = 13 then do;
do i=1 to dim(nd);
nd{i}="Hello";
end;
end;
run;
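If the goal is only to blank out the whole range, as in the question, CALL MISSING with an OF variable list may be simpler than a loop. A minimal sketch, reusing the sashelp.class example above (the condition age = 13 is just a stand-in for first.mbrtype):

```sas
data class;
  length ndc1-ndc10 $20;
  retain ndc1-ndc10;
  set sashelp.class;
  /* Reset every variable in the range to missing in one call. */
  if age = 13 then call missing(of ndc1-ndc10);
run;
```

CALL MISSING sets character variables to blank and numeric variables to missing, so the same call works for either type.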

How to pass variable sequence (list) to SAS Macro

I have variables named _200701, _200702,... till _201612, each containing specific numeric data for that month. From these, I want to subtract a specific amount (variable cap_inc) if a condition is met:
%MACRO DeleteExc(var);
DATA Working.Test;
SET Working.Test;
IF &var. GE cap_inc THEN &var. = SUM(&var., - cap_inc);
ELSE &var. = &var.;
RUN;
%MEND;
Code is working if I put only one month as a parameter (eg _200909)... But I want to put there sequence from these variables. I have tried combinations like "OF _200701 -- _201612" or "OF _20:", but nothing has worked.
I have also another macro, using parmbuff parameter, working in the "for each loop" way, where I can put more variables separated by comma, for instance
%DeleteExc(_200701, _200702, _200703)
But I still can't pass all variables in some convenient, easy to follow way. (I don't want to type all parameters as there are 120 of them).
Is there any way to do this?
Thank you!
First thing is that if you want to pass a list into a macro then DO NOT delimit the list using a comma. It will just make calling the macro a large pain. You will either need to use macro quoting to hide the commas, or override SAS's parameter processing by using the /parmbuff option and adding logic to process the &syspbuff macro variable yourself. Instead, use some other character that is not used in the values as the delimiter, like | or ^ for example. For a list of variable names use spaces as the delimiter.
%DeleteExc(varlist=_200701 _200702 _200703)
Then you can use the macro variable anywhere SAS expects a list of variables.
array in &varlist ;
total = sum(of &varlist);
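If the macro needs to act on each name separately, it can walk the space-delimited list with %SCAN. The sketch below is one possible shape, not the asker's actual macro: it assumes the macro is called inside a data step and takes a keyword parameter named varlist.

```sas
%macro DeleteExc(varlist=);
  %local i var;
  /* Loop over each space-delimited name in the list. */
  %do i = 1 %to %sysfunc(countw(&varlist, %str( )));
    %let var = %scan(&varlist, &i, %str( ));
    if &var >= cap_inc then &var = &var - cap_inc;
  %end;
%mend;

data Working.Test;
  set Working.Test;
  %DeleteExc(varlist=_200701 _200702 _200703)
run;
```

The macro only generates IF statements, so all variables are processed in a single pass over the data.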
Now since your list is really a list of MONTHS then give your macro the start and end month and let it generate the list for you.
%macro DeleteExc(start,end);
%local i var ;
%do i=0 %to %sysfunc(intck(month,&start,&end)) ;
%let var=_%sysfunc(intnx(month,&start,&i,b),yymmn6);
IF .Z < cap_inc < &var. THEN &var. = &var - cap_inc;
%end;
%mend;
DATA Working.Test;
SET Working.Test;
%DeleteExc("01JAN2007"d,"01DEC2016"d);
RUN;
Here are a few options - perhaps there's one you haven't tried?
data example;
array months{*} _200701-_200712 _200801-_200812 (24*1);
array underscores{*} _:;
_randomvar = 100;
s1 = sum(of _200701-_200812); /*Generates lots of notes about uninitialised variables but gives correct result*/
s2 = sum(of _200701--_200812); /*Works only if there are no rogue columns in between month columns*/
s3 = sum(of months{*}); /* Requires array definition*/
s4 = sum(of _:); /*Sum any variables with _ prefix - potentially including undesired variables*/
put (s1-s4)(=);
run;
The double dash (--) variable name range list can be used to specify the variables in an array. A simple iterative DO LOOP lets you perform the desired operation on each variable.
data want;
set have;
array month_named_variables _200701 -- _201612;
do _index = 1 to dim(month_named_variables); drop _index;
IF month_named_variables(_index) GE cap_inc THEN
month_named_variables(_index) = SUM(month_named_variables(_index), - cap_inc);
ELSE
month_named_variables(_index) = month_named_variables(_index);
end;
run;
If the data set has extra variables within the name range you can still use an array and non-macro code:
data want;
set have;
array nums _numeric_;
do _index = 1 to dim(nums); drop _index;
_vname = vname(nums(_index)); drop _vname;
if _vname ne: '_'
or not (2007 <= input(substr(_vname,2,4), ??4.) <= 2016)
or not (01 <= input(substr(_vname,6,2), ??2.) <= 12)
or not length(_vname) = 7
then continue;
IF nums(_index) GE cap_inc THEN
nums(_index) = SUM(nums(_index), - cap_inc);
ELSE
nums(_index) = nums(_index);
end;
run;
If you really need to use a specific list of variables and want to work within a macro, I would recommend passing the FROM and TO values corresponding to the variable names and looping over that range according to the naming convention:
%macro want(data=, yyyymm_from=, yyyymm_to=, guard=1000, debug=0);
%local LOWER UPPER YEARMON INDEX NVARS;
%let LOWER = %sysfunc(inputn(&yyyymm_from,yymmn6.));
%let UPPER = %sysfunc(inputn(&yyyymm_to,yymmn6.));
%let INDEX = 1;
%do YEARMON = &LOWER %to &UPPER;
%let yyyymm = %sysfunc(putn(&YEARMON, yymmn6.));
%local ymvar&INDEX;
%let ymvar&INDEX = _&yyyymm; %* NAMING CONVENTION;
%if &debug %then %put NOTE: YMVAR&INDEX=%superq(YMVAR&INDEX);
%if &INDEX > &GUARD %then %do;
%put ERROR: Exceeded guard limit of &GUARD variables;
%return;
%end;
%let NVARS = &INDEX;
%let YEARMON = %sysfunc(INTNX(MONTH,&yearmon,1)); %* NAMING CONVENTION;
%let YEARMON = %eval(&YEARMON-1); %* back off by one for implicit macro do loop increment of +1;
%let INDEX = %eval(&INDEX+1);
%end;
%do INDEX = 1 %to &NVARS;
%put NOTE: &=INDEX YMVAR&INDEX=&&YMVAR&INDEX;
%end;
%mend;
%want (data=have, yyyymm_from=200701, yyyymm_to=201612)
If my understanding is correct, you want to loop over months, which depends on the variables in the data. You could set a start date and an end date, then loop between them.
%macro month_loop(start,end);
%local date var;
%let start=%sysfunc(inputn(&start,yymmn6.));
%let end=%sysfunc(inputn(&end,yymmn6.));
%let date=&start;
data want;
set have;
%* generate one IF statement per month, first and last month included;
%do %while (&date <= &end);
%let var=_%sysfunc(putn(&date,yymmn6.));
IF &var. GE cap_inc THEN &var. = SUM(&var., - cap_inc);
%let date=%sysfunc(intnx(month,&date,1));
%end;
run;
%mend;
%month_loop(200701,201612)

Dropping variable based on sum of values in it using SAS

I wish to drop the columns in a SAS dataset that have a sum less than a particular value. Consider the case below.
Column_A  Pred_1  Pred_2  Pred_3  Pred_4  Pred_5
A         1       1       0       1       0
A         0       1       0       1       0
A         0       1       0       1       0
A         0       1       0       1       1
A         0       1       0       0       1
Let us assume that our threshold is 4, so I wish to drop predictors having sum of active observations less than 4, so the output would look like
Column_A  Pred_2  Pred_4
A         1       1
A         1       1
A         1       1
A         1       1
A         1       0
Currently I am using a very inefficient method of using multiple transposes to drop the predictors. There are multiple datasets with records > 30,000 so transpose approach is taking time. Would appreciate if anyone has a more efficient solution!
Thanks!
Seems like you could do:
Run PROC MEANS or similar proc to get the sums
Create a macro variable that contains all variables in the dataset with sum < threshold
Drop those variables
Then no TRANSPOSE or whatever, just regular plain old summarization and drops. Note you should use ODS OUTPUT not the OUT= in PROC MEANS, or else you will have to PROC TRANSPOSE the normal PROC MEANS OUT= dataset.
An example using a trivial dataset:
data have;
array x[20];
do _n_ = 1 to 20;
do _i = 1 to dim(x);
x[_i] = rand('Uniform') < 0.2;
end;
output;
end;
run;
ods output summary=have_sums; *how we get our output;
ods html select none; *stop it from going to results window;
proc means data=have stackodsoutput sum; *stackodsoutput is 9.3+ I believe;
var x1-x20;
run;
ods html select all; *reenable normal html output;
%let threshhold=4; *your threshhold value;
proc sql noprint;
select variable
into :droplist_threshhold separated by ' '
from have_sums
where sum lt &threshhold; *select variables whose sum is below the threshhold;
quit;
data want;
set have;
drop &droplist_threshhold.; *and now, drop them!;
run;
Just use PROC SUMMARY to get the sums. You can then use a data step to generate the list of variable names to drop.
%let threshhold=4;
%let varlist= pred_1 - pred_5;
proc summary data=have ;
var &varlist ;
output out=sum sum= ;
run;
data _null_;
set sum ;
array x &varlist ;
length droplist $500 ;
do i=1 to dim(x);
if x(i) < &threshhold then droplist=catx(' ',droplist,vname(x(i)));
end;
call symputx('droplist',droplist);
run;
You can then use the macro variable to generate a DROP statement or a DROP= dataset option.
drop &droplist;
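As a sketch of the DROP= dataset option form, assuming the droplist macro variable built above is not empty (that is, at least one variable fell below the threshold) and using want as an illustrative output name:

```sas
data want;
  /* Drop the low-sum variables as the data are read in. */
  set have(drop=&droplist);
run;
```

Applying DROP= on the input dataset means the unwanted variables never enter the data step at all.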

Simplifying the variable input in SAS

I have 90 variables in the data, I want to do the following in SAS.
Here is my SAS code:
data test;
length id class sex $ 30;
input id $ 1 class $ 4-6 sex $ 8 survial $ 10;
cards;
1 3rd F Y
2 2nd F Y
3 2nd F N
4 1st M N
5 3rd F N
6 2nd M Y
;
run;
data items2;
set test;
length tid 8;
length item $8;
tid = _n_;
item = class;
output;
item = sex;
output;
item = survial;
output;
keep tid item;
run;
What if I have 90 variables to input the data like this? There should be a very long list. I want to simplify it.
You could use an ARRAY or alternately a PROC TRANSPOSE.
The following is untested, because you haven't provided an example of your input dataset.
DATA ITEMS;
SET REPLACE;
ARRAY VARS {*} VAR1-VAR90; /* declared after SET so the array inherits the types of the existing variables */
DO I = LBOUND(VARS) TO HBOUND(VARS);
ITEM = VARS{I};
OUTPUT;
END;
RUN;
OR
PROC TRANSPOSE DATA = TEST OUT = WANT;
BY ID;
VAR CLASS -- SURVIAL;
RUN;
In the future it would be best if you could supply your input and desired output.
I don't seem to be able to add another comment to the above answer, as such I am adding one here.
You need to extend the VAR statement to include all variables that you want transposed.
CLASS -- SURVIAL means all variables from CLASS to SURVIAL inclusive, in the order they are stored in the dataset.
Post your code and the error so that I can help you better.