I would like to dynamically create macros to query a transactional data set. I have a table that has a set of parameters (parameter_data) and transaction data (txs). For each row in my parameter data I want to create a macro that can be called to query the data.
Parameter data:
data parameter_data;
input macro_name $ parameter_name $ parameter_value $;
datalines;
A Person_ID 1
B TX_ID 2
;
Transactional Data:
data txns;
input Person_ID $ TX_ID $ TX_Amount $;
datalines;
John Sales 1123
Mary Acctng 34
John Sales 23
Mary Sales 2134
;
Here I try to create a macro that should create macros dynamically according to the parameter data. The 'inner macros' are the macros that are created from the parameter data.
%macro outerMacro;
/*loop through each row in the parameter table to get the detail of the macro we want to create*/
%DO ROW = 1 %To 2;
data _NULL_;
set parameter_data;
if _N_ = ROW then do;
call symputx('parameter_name',parameter_name);
call symputx('parameter_value',parameter_value);
end;
run;
/*define inner macro parameters*/
%let macroName = myMacro; /*set the name of the macro we want to create*/
%let innerMacroStart = macro ¯oName.; /*set the macro name to start the macro definition*/
%let innerMacroEnd = mend ¯oName;
%&&innerMacroStart.; /*start the inner macro*/
/*body of the macro*/
data output;
set txns;
&¶meter_name = &¶meter_value;
/*so here effectively for the first row in the parameter table we are filtering where person_id = John*/
run;
%&&innerMacroEnd.; /*end the inner macro*/
%mend outerMacro;
%&&outerMacroName.;
It seems that SAS is unable to parse the lines %innerMacroStart. Any help is much appreciated.
Thanks!
If the goal is just to subset data then it might be better to generate macro variables instead of actual macros. Try something like this instead.
data _null_;
set parameter_data ;
call symputx(macro_name,catx(' ','where also'
,parameter_name,'=',quote(trim(parameter_value)),';'));
run;
Then just use the generated where statement(s) when you need them by expanding the macro variable. Like this:
data output ;
set txns;
&a
run;
If you really want to generate a macro definition then you probably want to just use a data step to write the code to a file and then %include the file to compile the macros. That will be much easier to debug than macro logic.
Let's fix your parameter file to better match your test data. Person_ID and TX_ID are character variables in your transaction dataset. You will probably need to add logic or change the parameter file to allow it to handle testing of both numeric and character variables. For now I just made it generate code that assumes that PARAMETER_NAME refers to a character variable so that PARAMETER_VALUE will need to have quotes added to make it a string literal.
data parameter_data;
input macro_name :$32. parameter_name :$32. parameter_value $:200.;
datalines;
A Person_ID John
B TX_ID Acctng
;
data txns;
input Person_ID $ TX_ID $ TX_Amount $;
datalines;
John Sales 1123
Mary Acctng 34
John Sales 23
Mary Sales 2134
;
Now let's run a data step to generate the code for all of your macros. I added logic to use AND if there were multiple "parameters" defined for each macro.
filename code temp;
data _null_;
set parameter_data ;
by macro_name ;
file code ;
if first.macro_name then put
'%macro ' macro_name ';'
/ 'data output;'
/ ' set txns;'
/ ' where ' #
;
else put ' and ' # ;
put parameter_name '=' parameter_value :$quote. # ;
if last.macro_name then put
';'
/ 'run;'
/ '%mend ' macro_name ';'
;
run;
Now just use %include to compile the macros.
%include code / source2 ;
NOTE: %INCLUDE (level 1) file CODE is file C:\...\#LN00048.
432 +%macro A ;
433 +data output;
434 + set txns;
435 + where Person_ID ="John" ;
436 +run;
437 +%mend A ;
438 +%macro B ;
439 +data output;
440 + set txns;
441 + where TX_ID ="Acctng" ;
442 +run;
443 +%mend B ;
NOTE: %INCLUDE (level 1) ending.
Now you can use your macros.
445 options mprint;
446 %a ;
MPRINT(A): data output;
MPRINT(A): set txns;
MPRINT(A): where Person_ID ="John" ;
MPRINT(A): run;
NOTE: There were 2 observations read from the data set WORK.TXNS.
WHERE Person_ID='John';
NOTE: The data set WORK.OUTPUT has 2 observations and 3 variables.
447 %b ;
MPRINT(B): data output;
MPRINT(B): set txns;
MPRINT(B): where TX_ID ="Acctng" ;
MPRINT(B): run;
NOTE: There were 1 observations read from the data set WORK.TXNS.
WHERE TX_ID='Acctng';
NOTE: The data set WORK.OUTPUT has 1 observations and 3 variables.
I have placed a comment before each block of code, but essentially it is:
Parameter set up.
Macro generation.
%include.
Call any desired macro.
I have assumed no more than 999 parameter observations - this is controlled by seq.
You can examine file "inner_macro.sas" to see the macro definitions.
NB. When you try it, make sure to use your own path in place of <your-path> (occurs twice):
/* set up parameters */
data parameters;
infile datalines dlm=',';
input var : $8.
operator : $8.
value : $8.
;
datalines;
name,eq,"John"
age,gt,12
weight,eq,0
;
/* read parameters and generate a macro definition for each obs, written to a file */
data _null_;
file '<your-path>/inner_macro.sas';
set parameters;
seq = put(_n_,z3.);
put '%macro inner_' seq ';';
put ' where ' var operator value ';';
put '%mend inner_' seq ';';
put;
run;
/* %include (submits code in file) all of the macro definitions */
%include '<your-path>/inner_macro.sas';
options mprint;
/* invoke the macro with the required data sets */
data class1;
set sashelp.class;
%inner_001;
run;
data class2;
set sashelp.class;
%inner_002;
run;
data class3;
set sashelp.class;
%inner_003;
run;
Related
Dummy data:
MEMNAME _var1 var2 var3 var4
XY XYSTART_1 XYSTATT_2 XYSTAET_3 XYSTAWT_4
I want to create a macro variable that will have data as TEST_XYSTART, TEST_XYSTATT, TEST_XYSTAET, TEST_TAWT.... how can I do this in datastep without using call symput because I want to use this macro variable in the same datastep (call symput will not create macro variable until I end the datastep).
I tried as below (not working), please tell me what is the correct way of write the step.
case = "TEST_"|| strip(reverse(substr(strip(reverse(testcase(i))),3)));
%let var = case; (with/without quotes not getting the desired result).
abc= strip(reverse(substr(strip(reverse(testcase(i))),3)));
%let test = TEST_;
%let var = &test.abc;
I am getting correct data with this statement: strip(reverse(substr(strip(reverse(testcase(i))),3)))
just not able to concatenate this value with TEST_ and assign it to the macro variable in a datastep.
Appreciate your help!
It makes no sense to locate a %LET statement in the middle of a data step. The macro processor evaluates the macro statements first and then passes the resulting code onto SAS to evaluate. So if you had this code:
data want;
set have;
%let var=abc ;
run;
It is the same as if you placed the %LET statements before the DATA statement.
%let var=abc ;
data want;
set have;
run;
If you want to reference a variable dynamically in a data step then use either an index into an array.
data want;
set have;
array test test_1 - test_3;
new_var = test[ testnum ] ;
run;
Or use the VvalueX() function which will return the value of a variable whose name is a character expression.
data want;
set have;
new_var = vvaluex( cats('test_',testnum) );
run;
I have 100 different variables ndc1-ndc100. I need to assign the same value to all of them, something like this:
data prj.rx_comm_crosstab;
length
ndc1-ndc100 $20
;
retain ndc1-ndc100;
retain cnter 0;
set rx_cost_by_drug;
by yrmo subs_id mbrtype;
if first.mbrtype then do;
ndc1-ndc100 =' ';
cnter=0;
end;
....some other code
run;
The line ndc1-ndc100 = ' ' doesn't work. Is there a way to do it? I wanna avoid having to set each of the 100 variables to the same value individually.
you can use an array as shown below.
data class;
length
ndc1-ndc10 $20 ;
set sashelp.class;
array nd(*) $ ndc1-ndc10 ;
if age = 13 then do;
do i=1 to dim(nd);
nd{i}="Hello";
end;
end;
run;
I am giving the below command to check the number of rows in SAS data set but it's outputting the 60 records of dataset however the dataset have 247 records.
Is there is any other way to do in unix command?
UNIX command:
awk 'END {print NR}' /home/user/check.sas7bdat
You need to write a SAS program to output the number of observations for you. The structure of the sas7bdat file is complicated.
data _null_;
file stdout;
set "&sysparm" nobs=nobs;
put "NOBS:" nobs;
stop;
run;
I named this "test.sas"
This reads in the data set specified in a passed system parameter and outputs to STDOUT the number of observations.
I created a test data set in my home directory like:
libname d "~/";
data d.test;
do i=1 to 1000;
output;
end;
run;
From the command line run
<path to sas>/sas test.sas -sysparm ~/test.sas7bdat
I get NOBS:1000 back.
What about just doing it in a SAS datastep? You can fetch the number of rows with the NOBS statement.
/* Test dataset */
data have;
a = 1;output;
a = 2;output;
a = 3;output;
run;
data _null_;
set have NOBS = size;
call symput("size",strip(size));
run;
%put NOTE: Number of records: &size.;
NOTE: Number of records: 3
I've got a .dat file with numbers that I need imported into a SAS dataset. However, there's plenty of information that I do not need, and I only want specific lines of data (e.g. every 6th line starting from line 1000, until I have 100 observations). I also require a unique identifier based on what is displayed on the first line.
So for example, the .dat file contains this:
DATANOTREQUIRED
DATANOTREQUIRED
DATANOTREQUIRED
UPDATE AAA_1111111_Q_BBBBBB_0_1_#
123.4,
123.5,
124.0,
124.1
DATANOTREQUIRED
DATANOTREQUIRED
DATANOTREQUIRED
UPDATE AAA_1111111__Q_BBBBBB_0_2_#
125.1,
126.0,
127.1,
130.0
What I want the eventual SAS dataset to look like is this
Identifier | Value
X.1. | 124.1
X.2. | 130.0
I'm using the infile in SAS and using input to point to line 1000 but I'm stuck and cannot get the SAS dataset I want. (Updated code based on contributors below)
data work.test;
infile '\\filepath\mydatasource.dat' dsd firstobs=1042 truncover;
input #8 ID :$40.
#4 Value1 :8.;
run;
but what I'm seeing now is that the header lines are appearing fine, but the first observation has a . and instead the first data value is appearing for the 2nd header line.
ID | Value1
UPDATE AAA_1111111_Q_BBBBBB_0_1_# | .
UPDATE AAA_1111111__Q_BBBBBB_0_2_# | 124.1
Here's an example assuming that you have the same number of rows between each header row:
data want;
if _n_ > 2 then stop; /*Stop after we've output 2 rows */
infile cards firstobs=6; /*Skip the first 5 lines in the file*/
input #1 #8 ID :$32.
#5 myvar :8.;
cards;
UPDATE AAA_1111111_Q_BBBBBB_0_1_#
123.4,
123.5,
124.0,
124.1
UPDATE AAA_1111111__Q_BBBBBB_0_2_#
125.1,
126.0,
127.1,
130.0
UPDATE AAA_1111111_Q_BBBBBB_0_3_#
123.4,
123.5,
124.0,
124.1
UPDATE AAA_1111111__Q_BBBBBB_0_4_#
125.1,
126.0,
127.1,
130.0
;
run;
Use the FIRSTOBS= option to skip the beginning of the file.
If there are always 5 records per block you could just read them individually.
data want;
infile rawdata dsd firstobs=1000 truncover;
input id :$40. (4*value) (/) ;
run;
Or you could do something like this that should allow for a variable number of values per id and just keep the last one.
data want;
infile rawdata dsd firstobs=1000 end=eof;
input # ;
length id $32 value 8 ;
retain id value;
if _infile_ =: 'UPDATE' then do;
if _n_ > 1 then output;
id = scan(_infile_,-1,' ');
end;
else input value;
if eof and _n_ > 1 then output;
run;
I wish to drop the columns in a SAS dataset which has a sum less than a particular value. Consider the case below.
Column_A Pred_1 Pred_2 Pred_3 Pred_4 Pred_5
A 1 1 0 1 0
A 0 1 0 1 0
A 0 1 0 1 0
A 0 1 0 1 1
A 0 1 0 0 1
Let us assume that our threshold is 4, so I wish to drop predictors having sum of active observations less than 4, so the output would look like
Column_A Pred_2 Pred_4
A 1 1
A 1 1
A 1 1
A 1 1
A 1 0
Currently I am using a very inefficient method of using multiple transposes to drop the predictors. There are multiple datasets with records > 30,000 so transpose approach is taking time. Would appreciate if anyone has a more efficient solution!
Thanks!
Seems like you could do:
Run PROC MEANS or similar proc to get the sums
Create a macro variable that contains all variables in the dataset with sum < threshhold
Drop those variables
Then no TRANSPOSE or whatever, just regular plain old summarization and drops. Note you should use ODS OUTPUT not the OUT= in PROC MEANS, or else you will have to PROC TRANSPOSE the normal PROC MEANS OUT= dataset.
An example using a trivial dataset:
data have;
array x[20];
do _n_ = 1 to 20;
do _i = 1 to dim(x);
x[_i] = rand('Uniform') < 0.2;
end;
output;
end;
run;
ods output summary=have_sums; *how we get our output;
ods html select none; *stop it from going to results window;
proc means data=have stackodsoutput sum; *stackodsoutput is 9.3+ I believe;
var x1-x20;
run;
ods html select all; *reenable normal html output;
%let threshhold=4; *your threshhold value;
proc sql;
select variable
into :droplist_threshhold separated by ' '
from have_sums
where sum lt &threshhold; *checking if sum is over threshhold;
quit;
data want;
set have;
drop &droplist_threshhold.; *and now, drop them!;
run;
Just use PROC SUMMARY to get the sums. You can then use a data step to generate the list of variable names to drop.
%let threshhold=4;
%let varlist= pred_1 - pred_5;
proc summary data=have ;
var &varlist ;
output out=sum sum= ;
run;
data _null_;
set sum ;
array x &varlist ;
length droplist $500 ;
do i=1 to dim(x);
if x(i) < &threshhold then droplist=catx(' ',droplist,vname(x(i)));
end;
call symputx('droplist',droplist);
run;
You can then use the macro variable to generate a DROP statement or a DROP= dataset option.
drop &droplist;