How to randomly select variables in SAS?

I can find all sorts of information on how to randomly select observations in SAS which is a fairly easy task. This is not what I need though. I need to randomly select variables. What I want to do specifically is randomly choose 20 variables from my list of 159 variables and do this 50 times. I want to ensure diversity too. I have been spending about two days on this and am having no luck.

I'm glad that you asked this question, because I just developed a solution for that! Let's break down exactly what needs to be done, step-by-step.
Step 0: What do we need to do?
We need a way to take all of our variables and randomly select 20 of them while keeping them within the bounds of the SAS language rules.
We'll require:
1. All variables in the dataset
2. A way to re-sort them randomly
3. A limit of 20 variables
4. A way to loop this 50 times
Let's start with 1.
Step 1: Getting all the variables
sashelp.vcolumn provides a list of all variables within a dataset. Let's select them all.
proc sql noprint;
create table all_vars as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
;
quit;
This gets us a list of all variables within our dataset. Now, we need to sort them randomly.
Step 2: Making them random
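SAS provides the rand function, which lets you draw from many common distributions. You can use call streaminit(seedhere) before the first rand call to set a specific seed and get reproducible results. Note that, as far as I know, the seed applies only to the step in which call streaminit runs, so check whether the separate data _null_ step below really makes the PROC SQL ordering repeatable on your installation.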
We'll simply modify our original SQL statement and order the dataset with the rand() function.
data _null_;
call streaminit(1234);
run;
proc sql noprint;
create table all_vars as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform');
quit;
Now we've got all of our variables in a random order: each name gets an independent uniform random sort key, so every ordering is equally likely.
Step 3: Limit to 20 variables
You can do this a few ways. One way is the obs= dataset option in a later step or procedure; another is the outobs= proc sql option. Personally, I like the obs= dataset option since it doesn't generate a warning in the log and can be used in other procedures, but outobs= keeps everything in one step, so that is what the code below uses.
data _null_;
call streaminit(1234);
run;
proc sql noprint outobs=20;
create table all_vars as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform');
quit;
Step 4: Loop it 50 times
We'll use SAS Macro Language to do this part. We can create 50 individual datasets this way, or switch the code up slightly and read them into macro variables.
%macro selectVars(loop=50, seed=1234);
data _null_;
call streaminit(&seed);
run;
%do i = 1 %to &loop;
proc sql noprint outobs=20;
create table all_vars&i as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform')
;
quit;
%end;
%mend;
%selectVars;
Or, option 2:
%macro selectVars(loop=50, seed=1234);
data _null_;
call streaminit(&seed);
run;
%do i = 1 %to &loop;
proc sql noprint outobs=20;
select name
into :varlist separated by ' '
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform')
;
quit;
%end;
%mend;
%selectVars;
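The 2nd option creates a macro variable called &varlist that holds the 20 randomly chosen variable names separated by spaces. Because it is created inside the macro, it is local to the macro and is overwritten on every pass through the loop, so whatever uses it (a modeling proc, for example) has to go inside the %do loop. This can be convenient for various modeling procs, and is preferable since it does not create a separate dataset each time.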
Hope this helps!
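One caveat on the seeding: as far as I know, call streaminit only governs rand calls made in the same step. Below is a minimal sketch that keeps the seed and the random key together in a single DATA step. The library WORK, the dataset HAVE, the macro name pickVars, and the output names pick1 to pick50 are assumptions for illustration, not part of the answer above.

```sas
/* Sketch: draw &loop random sets of &n variable names from WORK.HAVE.
   Library, dataset, macro, and output names are placeholders. */
%macro pickVars(loop=50, n=20, seed=1234);
  %local i;

  /* Variable names from the dictionary view (LIBNAME/MEMNAME are upper case). */
  proc sql noprint;
    create table all_vars as
    select name
    from sashelp.vcolumn
    where libname = 'WORK' and memname = 'HAVE';
  quit;

  %do i = 1 %to &loop;
    /* Seed and random key live in the same step; a different seed per
       replicate gives a different but repeatable ordering. */
    data keyed;
      if _n_ = 1 then call streaminit(%eval(&seed + &i));
      set all_vars;
      key = rand('uniform');
    run;

    proc sort data=keyed;
      by key;
    run;

    /* Keep the first &n names after shuffling. */
    data pick&i;
      set keyed(obs=&n keep=name);
    run;
  %end;
%mend pickVars;

%pickVars()
```

Each pick dataset then holds one random set of 20 names. Independent draws do not guarantee that the 50 sets are all different, so if diversity matters it is worth comparing them afterwards.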

You will need to treat your metadata as data and use SURVEYSELECT to select observations from it. You could then put the selected names into macro variables, but you did not mention the exact output you want.
data v;
array rvars[159];
run;
proc transpose data=v(obs=0) out=vars name=name;
var rvars:;
run;
proc surveyselect reps=4 sampsize=20 data=vars out=selection;
run;
proc transpose data=selection out=lists(drop=_:);
by replicate;
var name;
run;
proc print;
run;
data _null_;
set lists;
by replicate;
call symputx(cats('VLIST',_n_),catx(' ',of col:));
run;
%put _global_;
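The example above builds 159 dummy names and draws 4 replicates. A sketch of the same idea against real metadata, with 50 replicates and a fixed seed, might look like this. WORK.HAVE is a placeholder for the dataset that holds the 159 variables, and the VLIST prefix simply mirrors the code above.

```sas
/* Sketch: sample 20 variable names, 50 times, from real metadata.
   WORK.HAVE is a placeholder dataset name. */
proc contents data=work.have out=vars(keep=name) noprint;
run;

/* Simple random sampling without replacement, repeated 50 times. */
proc surveyselect data=vars out=selection
                  method=srs sampsize=20 reps=50 seed=1234 noprint;
run;

/* Build one space-separated list per replicate: VLIST1 ... VLIST50. */
data _null_;
  length list $32767;
  retain list;
  set selection;
  by replicate;
  if first.replicate then list = '';
  list = catx(' ', list, name);
  if last.replicate then call symputx(cats('VLIST', replicate), list);
run;

%put &=VLIST1;
```

If the variable names are not valid V7 names, wrapping name in nliteral() before concatenating would probably be needed.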

Related

Get Data from SAS Macro (list of values) to SAS table (column)

I am trying to create a SAS table from Macro variable using PROC SQL:
I have a list of value saved in a macro variable :
%let l=1,2,3;
I want to create a SAS table with a column containing the values of the macro variable :
1
2
3
Thank you very much for your help.
Sincerely,
Abdeljalil
You should make some effort to solve this yourself.
Put the values into a string, parse the string and output the values you would like.
%let l=1,2,3;
data want;
str = "&l";
do i=1 to countw(str,',');
value = input(scan(str,i,","),best.);
output;
end;
/*drop other variables if you want*/
drop str i;
run;
Something like this?
%let age=%str(12,13,15);
proc sql;
select * from sashelp.class where age in (&age);
quit;
You have a data set that contains a list of names and you want to place these names into a macro variable for later use. That will work as long as the macro variable does not go beyond the 64K limit.
If the value hits this limit, then you can use macro processing to retrieve the names from the data set. Since a macro definition does not have the 64K restriction, it can be used to create the list for you.
In the sample code below, we have a list of names that we want to use on an INPUT statement along with a given informat. This sample demonstrates how to create the list without having to use a macro variable.
data one;
input name $;
datalines;
abc
def
ghi
;
run;
%macro test;
%let dsid=%sysfunc(open(one));
%let cnt=%sysfunc(attrn(&dsid,nobs));
%do i=1 %to &cnt;
%let rc=%sysfunc(fetchobs(&dsid,&i));
%cmpres(%sysfunc(getvarc(&dsid,%sysfunc(varnum(&dsid,name))))) $4.
%end;
%let rc=%sysfunc(close(&dsid));
%mend test;
/** Using %PUT to see outcome **/
/** %test could be used on an INPUT statement **/
%put %test;
source: http://support.sas.com/kb/39/605.html

Using macro variables/language in PROC SQL

I use PROC SQL for Oracle database queries (I'm not a db person though, so I can't be more specific than that), and we often apply formats from a library that is automatically loaded. I was wondering if there's a faster way to program these types of queries, for example let's say I have a variable called prim_disease_cd in a view, and I want to pull that out, apply the format (which has the same name) and also call it prim_disease_cd. Right now I would do
put(a.prim_disease_cd, prim_disease_cd.) as prim_disease_cd
Is there a way I can shorten this using macro language? I have been unsuccessful so far, but we do this often and it seems quite inefficient. Essentially I want a macro that takes in a view/dataset a and a variable X and applies "put (a.X, X.) as X"
Additionally, if there's any way I can implement something like this for dates too, that would be great, i.e. to replace
datepart(a.(var_name)) as (var_name) format mmddyy10.
Thanks for any help you can provide.
You could create simple macros to do those two things. Macros that emit just a portion of a statement like that are often referred to as macro functions or function style macros. Make sure not to emit any semi-colons. For example you might make these two macros.
%macro decode(alias);
%local varname ;
%let varname=%scan(&alias,-1,.);
put(&alias,&varname..) as &varname
%mend;
%macro datepart(alias);
%local varname ;
%let varname=%scan(&alias,-1,.);
datepart(&alias) as &varname format yymmdd10.
%mend;
Then your SQL query might look like:
create table want as
select a.patid
, %decode(a.prim_disease_cd)
, %datepart(a.onset_date)
from oralib.diagnosis a
;
You might find that using these makes your SAS code much harder to maintain. It might be easier to find a way to automate the generation of the text in your editor instead, or to run a program that generates the text from the metadata and then copy and paste it into your program.
PS Don't use MDY (or DMY) format for dates. It will just confuse your European (or American) friends.
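To see what the two macros generate, here is a small self-contained sketch that can be run without Oracle. The format values, the diagnosis dataset, and its contents are made up for illustration; only the two macros come from the answer above.

```sas
/* Function-style macros from the answer, repeated so the sketch stands alone. */
%macro decode(alias);
  %local varname;
  %let varname=%scan(&alias,-1,.);
  put(&alias,&varname..) as &varname
%mend;
%macro datepart(alias);
  %local varname;
  %let varname=%scan(&alias,-1,.);
  datepart(&alias) as &varname format yymmdd10.
%mend;

/* Hypothetical format and data standing in for the Oracle view. */
proc format;
  value prim_disease_cd 1='Asthma' 2='Diabetes';
run;

data diagnosis;
  patid = 1;
  prim_disease_cd = 2;
  onset_date = '15MAR2020:08:30:00'dt;  /* datetime, as Oracle dates usually arrive */
run;

options mprint;  /* write the generated SQL text to the log */
proc sql;
  create table want as
  select a.patid
       , %decode(a.prim_disease_cd)
       , %datepart(a.onset_date)
  from diagnosis a;
quit;
```

With mprint on, the log shows the expanded put(...) and datepart(...) expressions, which makes it easier to check that no stray semicolon was emitted.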
If you ever need to use the <concept>_cd code values in a future query against the Oracle data, I would say create a new variable such as <concept>_value or simply <concept>.
If the coded data in the Oracle query is named consistently, such as only <concept>_cd, you can have a macro examine the pulled data and create a SAS view that applies the mapping from code to value via SAS format. Since you are pulling the coded values from Oracle, there is likely one or more lookup tables in Oracle that map the code to the value, and possibly your SAS formats are built from that data.
In your use case, transforming code to value is, in essence, performing left joins against the supposed lookup table or tables. I would presume you are performing the code mapping so that it is easier to perform subset selections.
If you are only reporting the data, you may only need to apply the format to the code variable itself. Here is a sample macro that post-processes a query result and performs code-to-value mappings according to the naming convention <concept>_cd:
data code_lookups;
length id 8 fmt $31 desc $50 ;
input id fmt desc; /* plain list input: each value is a single word separated by one blank */
datalines;
1 country_cd US
2 country_cd Canada
10 color_cd Green
11 color_cd Blue
12 color_cd Red
20 footwear_cd Shoes
21 footwear_cd Socks
22 footwear_cd Laces
run;
proc format cntlin=code_lookups(rename=(fmt=fmtname id=start desc=label));
run;
data have(label="Some result from Oracle with unmapped codes");
input item_id country_cd color_cd footwear_cd;
datalines;
1 1 11 22
2 2 11 21
3 1 12 22
3 1 10 20
run;
%macro auto_codemap (data=, out=, out_struct=view, map_func=new_var);
%local dsid i l p q varname;
%let dsid = %sysfunc(open(&data));
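%if &map_func ne format_only and &map_func ne new_var %then %do;
%put ERROR: &=map_func unknown.;
/* close the dataset and stop instead of generating SQL */
%let dsid = %sysfunc(close(&dsid));
%return;
%end;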
proc sql;
create &out_struct &out as
select
%do i = 1 %to %sysfunc(attrn(&dsid,nvar));
%if &i > 1 %then %str(,);
%let varname = %sysfunc(varname(&dsid,&i));
&varname
%let l = %length(&varname);
%if &l > 3 %then %do;
%let p = %eval(&l-3);
%let q = %eval(&l-2);
%if %substr(%upcase(&varname),&q) = _CD %then %do;
%if &map_func = format_only %then %do;
format=%str(&varname).
%end;
%else %if &map_func = new_var %then %do;
, put(&varname, %str(&varname).) as %substr(&varname,1,&p)
%end;
%end;
%end;
%end;
from &data
;
quit;
%let dsid = %sysfunc(close(&dsid));
%mend;
options mprint;
%auto_codemap (data=have, out=want)
proc print data=want;
run;
%auto_codemap (data=have, out=want2, map_func=format_only)
proc print data=want2;
run;

Pass a dynamic array to iterate over proc sql

In this question
Simple iteration through array with proc sql in SAS
%macro doit(list);
proc sql noprint;
%let n=%sysfunc(countw(&list));
%do i=1 %to &n;
%let val = %scan(&list,&i);
create table somlib._&val as
select * from somlib.somtable
where item=&val;
%end;
quit;
%mend;
%doit(100 101 102);
I want to pass a list through macro doit which we can extract from a dataset.
For eg.: list contains the distinct values of variable 'age' present in dataset 'agegroups'.
data agegroups;
input age;
datalines;
1
2
4
5
8
18
16
19
23
;
run;
I looked at the %ARRAY macro for this, but it didn't help me (http://www2.sas.com/proceedings/sugi31/040-31.pdf).
Any help will be highly appreciated. Thanks !
As stated in the comments, BY group processing might be a better option.
However, you can use PROC SQL to create your list:
proc sql noprint;
select distinct age
into :ageList separated by ' '
from agegroups;
quit;
%put Age List: &ageList;
%doit(&ageList);
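For comparison, the BY-group route mentioned above avoids the macro loop entirely. This is only a sketch: the score column and PROC MEANS are hypothetical stand-ins for whatever data and analysis are actually needed per age.

```sas
/* Sketch: one pass with BY-group processing instead of one table per value.
   The score column is hypothetical, added so there is something to summarize. */
data agegroups;
  input age score;
  datalines;
1 10
2 12
4 9
1 14
2 11
;
run;

/* BY processing needs the data sorted (or indexed) by the BY variable. */
proc sort data=agegroups out=sorted;
  by age;
run;

/* The procedure runs once and reports each age separately. */
proc means data=sorted n mean;
  by age;
  var score;
run;
```

This keeps everything in one dataset, which is usually easier to manage than a library full of per-value tables.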

SAS - renaming variables

I am trying to change the names of variables in my table/dataset. I went through several websites and this discussion forum, but I didn't manage to find any code that would work properly in my case (I am a newcomer to SAS).
My dataset contains 103 columns and I would like to rename the first 100 columns. The name of the first column is CFT(1), CFT(2) of the second column,..., CFT(100) of the 100th column. New variables can be called for example CFT_n(1),...,CFT_n(100).
The code I was using is following:
data vystup_m200_b;
set vystup_m200_a;
rename 'cft(1)'n - 'cft(100)'n='cft(1)_n'n - 'cft(100)_n'n;
run;
But I obtain an error stating:
Alphabetic prefixes for enumerated variables (cft(1)-cft(100)) are different.
Thank you for any suggestion what I am doing wrong.
Even with validvarname=any, a numbered variable list requires the number to be the last part of each name. You "could" use the features of PROC TRANSPOSE to flip-flop the data to rename the variables. This is only advisable if the data are rather small.
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n ( 1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
proc transpose data=ren out=ren2;
run;
proc transpose data=ren2 out=renamed(drop=_name_) suffix=_N;
id _name_;
run;
If your variables are sequentially named, a simple macro will suffice:
option validvarname = any;
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n ( 1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
%macro rename_loop;
%local i;
%do i = 1 %to 3;
"cft(&i)"n = "cft(&i)_n"n
%end;
%mend rename_loop;
proc datasets lib = work nolist nowarn nodetails;
modify ren;
rename %rename_loop;
run;
quit;
This should work more or less instantaneously, regardless of the size of the dataset, as it only needs to update the metadata.
Renaming is fastest. I would look for a more general solution that doesn't require knowing the variable names, how many there are, or whether they need name literals.
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n (1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
proc print;
run;
proc transpose data=ren(obs=0) out=ren2;
run;
proc sql noprint;
select catx('=',nliteral(_name_),nliteral(cats(_name_,'_n')))
into :renamelist separated by ' '
from ren2;
quit;
run;
%put NOTE: &=renamelist;
proc datasets nolist;
modify ren;
rename &renamelist;
run;
contents data=ren varnum short;
quit;
Another solution, which is renaming variables after upload:
proc import datafile="\\folder\RUN_00.xlsx"
dbms=xlsx out=run_00 replace;
run;
data rename;
length ren $32767;
set run_00(obs= 1);
keep ren delka;
array cfte{*} CFT:;
do i=1 to dim(cfte);
ren=strip(ren)||" 'cft("||strip(i)||")'n='cft_"||strip(i)||"_00'n";
delka=length(ren);
end;
call symputx("renam",ren);
run;
proc datasets library=work;
modify run_00;
rename &renam;
run;

SAS If then else Statement with Do loop

I need to get a percentage for 75 values in 75 columns individually. And I want to use a do loop so I don't have to hard code it 75 times. There are some conditions so there will be a where statement.
I am not getting the do loop correctly but I am using the below to get a percentage
case when (SUM(t1.sam)) >0 then
((SUM(t1.sam))/(SUM(t1.sam_Threshold)))*100
else 0
end
I tried the below and it's a bit better:
data test;
i_1=4;
i_2=8;
i_3=4;
i_4=8;
V_ANN_V_INSP=24;
run;
%macro loop();
%let numcols=4;
proc sql;
create table test3 as
select V_ANN_V_INSP,
%do i=1 %to &numcols;
(i_&i/V_ANN_V_INSP)*100 as i_&i._perc
%if &i<&numcols %then %do;,
%end;
%end;
from test;
quit;
%mend;
%loop();
CASE WHEN is a SQL statement, not a data step statement, so you can't use a DO loop there. Depending on what you're doing exactly, there are a lot of possible solutions here. Posting additional code would help to get a more precise answer, but I can give you a few suggestions.
First, take it into a data step. Then you can use a do loop.
data want;
set have;
array nums sam1-sam75;
array denoms threshold1-threshold75;
array pct[75];
do _t = 1 to dim(nums);
pct[_t]=nums[_t]/denoms[_t];
end;
run;
Second, if you need to do this in SQL for some reason, you can write out the SQL code either in a macro or in a data step in a pre-processing step.
%macro do_sql_st;
%do _t = 1 %to 75;
/* separate the generated columns with commas */
%if &_t > 1 %then %str(,);
case when (SUM(t1.sam&_t.)) >0 then
((SUM(t1.sam&_t.))/(SUM(t1.sam_Threshold&_t.)))*100
else 0
end
as pct&_t.
%end;
%mend do_sql_st;
proc sql;
select %do_sql_st from t1 where ... ;
quit;
These are not terribly flexible; unless you have very specifically named variables, they won't work as is. You're more likely to want to do some sort of data step preprocessing I suspect, but that's very hard to explain without more detail as to how the variables are named (ie, if there is a relationship between them).
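Applied to the small test dataset from the question, the data step approach could be sketched as follows. The array names and the loop index k are arbitrary choices, and the zero check on the denominator is an assumption about what the conditions should be.

```sas
/* Test data from the question. */
data test;
  i_1=4;
  i_2=8;
  i_3=4;
  i_4=8;
  V_ANN_V_INSP=24;
run;

/* Sketch: percentage of V_ANN_V_INSP for each i_ column, without macro code. */
data test3;
  set test;
  array nums[4] i_1-i_4;
  array perc[4] i_1_perc i_2_perc i_3_perc i_4_perc;
  do k = 1 to dim(nums);
    /* guard against a zero or missing denominator */
    if V_ANN_V_INSP > 0 then perc[k] = nums[k] / V_ANN_V_INSP * 100;
    else perc[k] = 0;
  end;
  drop k;
run;

proc print data=test3;
run;
```

For 75 columns only the array bounds and the list of output names change; the loop body stays the same.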