Dummy data:
MEMNAME _var1 var2 var3 var4
XY XYSTART_1 XYSTATT_2 XYSTAET_3 XYSTAWT_4
I want to create a macro variable that will hold values like TEST_XYSTART, TEST_XYSTATT, TEST_XYSTAET, TEST_XYSTAWT, and so on. How can I do this in a DATA step without using CALL SYMPUT? I want to use this macro variable in the same DATA step, and CALL SYMPUT does not create the macro variable until the DATA step ends.
I tried the following (not working). Please tell me the correct way to write the step.
case = "TEST_"|| strip(reverse(substr(strip(reverse(testcase(i))),3)));
%let var = case; /* with or without quotes, not getting the desired result */
abc= strip(reverse(substr(strip(reverse(testcase(i))),3)));
%let test = TEST_;
%let var = &test.abc;
I get the correct value with this expression: strip(reverse(substr(strip(reverse(testcase(i))),3)))
I just can't concatenate this value with TEST_ and assign it to the macro variable within a DATA step.
Appreciate your help!
It makes no sense to put a %LET statement in the middle of a DATA step. The macro processor evaluates the macro statements first and then passes the resulting code on to SAS to compile and run. So if you had this code:
data want;
set have;
%let var=abc ;
run;
It is the same as if you had placed the %LET statement before the DATA statement:
%let var=abc ;
data want;
set have;
run;
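As a small sketch of that timing (DATA _NULL_ and the macro variable names VAR and FROMSTEP are just for illustration): the %LET is executed while the step is still being read, so VAR receives the literal text x, whereas CALL SYMPUTX runs during execution, so FROMSTEP holds the data value and can be used in code after the RUN statement.

```sas
data _null_;
  x = 'abc';
  %let var = x;                 /* handled by the macro processor before the step is compiled */
  call symputx('fromstep', x);  /* executes at run time, after compilation */
run;

%put VAR=&var FROMSTEP=&fromstep;   /* log shows VAR=x FROMSTEP=abc */
```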
If you want to reference a variable dynamically in a DATA step, then either use an index into an array:
data want;
set have;
array test test_1 - test_3;
new_var = test[ testnum ] ;
run;
Or use the VVALUEX() function, which returns the formatted value (as a character string) of the variable whose name is given by a character expression:
data want;
set have;
new_var = vvaluex( cats('test_',testnum) );
run;
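Applied to the question, a sketch might look like this. It assumes, hypothetically, that HAVE stores the names in character variables VAR1-VAR4 (as in the dummy data) and that numeric variables such as TEST_XYSTART exist in the same data set. The TEST_ name is built in an ordinary character variable, CASE, instead of a macro variable, and then handed to VVALUEX.

```sas
/* Hypothetical data: the names in VAR1-VAR4 plus the TEST_ variables they refer to */
data have;
  length memname $2 var1-var4 $12;
  input memname var1-var4 test_xystart test_xystatt test_xystaet test_xystawt;
  datalines;
XY XYSTART_1 XYSTATT_2 XYSTAET_3 XYSTAWT_4 10 20 30 40
;
run;

data want;
  set have;
  array testcase var1-var4;
  length case $32 value $40;
  do i = 1 to dim(testcase);
    /* same expression as in the question: drop the trailing "_n", then prefix TEST_ */
    case  = cats('TEST_', reverse(substr(strip(reverse(testcase(i))), 3)));
    /* formatted value of the variable whose name is stored in CASE */
    value = left(vvaluex(strip(case)));
    output;
  end;
  keep case value;
run;
```

Because CASE is a normal DATA step variable, its value can change on every iteration of the loop, which a macro variable set inside the same step cannot do.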
Related
I have variables named _200701, _200702, ... up to _201612, each containing specific numeric data for that month. From these, I want to subtract a specific amount (variable cap_inc) if a condition is met:
%MACRO DeleteExc(var);
DATA Working.Test;
SET Working.Test;
IF &var. GE cap_inc THEN &var. = SUM(&var., - cap_inc);
ELSE &var. = &var.;
RUN;
%MEND;
The code works if I pass only one month as the parameter (e.g. _200909), but I want to pass a sequence of these variables. I have tried combinations like "OF _200701 -- _201612" or "OF _20:", but nothing has worked.
I also have another macro that uses the PARMBUFF option and works like a "for each" loop, where I can pass several variables separated by commas, for instance
%DeleteExc(_200701, _200702, _200703)
But I still can't pass all the variables in a convenient, easy-to-follow way (I don't want to type all the parameters, as there are 120 of them).
Is there any way to do this?
Thank you!
First, if you want to pass a list into a macro, then DO NOT delimit the list with commas. It will just make calling the macro a large pain: you will either need to use macro quoting to hide the commas, or override SAS's parameter processing with the /PARMBUFF option and add logic to process the &SYSPBUFF macro variable yourself. Use some other character that does not appear in the values as the delimiter, like | or ^ for example. For a list of variable names, use spaces as the delimiter.
%DeleteExc(varlist=_200701 _200702 _200703)
Then you can use the macro variable anywhere SAS expects a list of variables.
array in &varlist ;
total = sum(of &varlist);
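For completeness, here is one way such a macro could be written, as a sketch: it keeps the question's WORKING.TEST data set, the CAP_INC variable and the subtraction rule, takes the space-delimited list through a keyword parameter VARLIST=, and uses a made-up array name M and index _I.

```sas
%macro DeleteExc(varlist=);
  data Working.Test;
    set Working.Test;
    array m &varlist;          /* a space-delimited list drops straight into ARRAY */
    do _i = 1 to dim(m);
      /* SUM() leaves the month unchanged if CAP_INC is missing */
      if m[_i] ge cap_inc then m[_i] = sum(m[_i], -cap_inc);
    end;
    drop _i;
  run;
%mend DeleteExc;
```

It would be called as %DeleteExc(varlist=_200701 _200702 _200703). Since the list lands in an ARRAY statement, a name range such as varlist=_200701--_201612 should also work, provided the month variables are stored next to each other.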
Now, since your list is really a list of MONTHS, give your macro the start and end month and let it generate the list for you.
%macro DeleteExc(start,end);
%local i var ;
%do i=0 %to %sysfunc(intck(month,&start,&end)) ;
%let var=_%sysfunc(intnx(month,&start,&i,b),yymmn6.);
IF .Z < cap_inc <= &var. THEN &var. = &var. - cap_inc;
%end;
%mend;
DATA Working.Test;
SET Working.Test;
%DeleteExc("01JAN2007"d,"01DEC2016"d);
RUN;
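A variant of the same idea, sketched here with a hypothetical macro name MONTH_VARS, generates only the space-delimited list of names from YYYYMM start and end values, so the result can go anywhere a variable list is expected, such as an ARRAY statement:

```sas
%macro month_vars(from, to);
  %local d last list;
  %let d    = %sysfunc(inputn(&from, yymmn6.));   /* first day of the start month */
  %let last = %sysfunc(inputn(&to, yymmn6.));     /* first day of the end month */
  %do %while (&d <= &last);
    %let list = &list _%sysfunc(putn(&d, yymmn6.));
    %let d = %sysfunc(intnx(month, &d, 1));       /* first day of the next month */
  %end;
  &list
%mend month_vars;

%put %month_vars(200701, 200703);   /* writes the names _200701 _200702 _200703 to the log */
```

Inside a DATA step this could be used as array m %month_vars(200701, 201612); followed by a DO loop over the array.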
Here are a few options; perhaps there's one you haven't tried.
data example;
array months{*} _200701-_200712 _200801-_200812 (24*1);
array underscores{*} _:;
_randomvar = 100;
s1 = sum(of _200701-_200812); /*Generates lots of notes about uninitialised variables but gives correct result*/
s2 = sum(of _200701--_200812); /*Works only if there are no rogue columns in between month columns*/
s3 = sum(of months{*}); /* Requires array definition*/
s4 = sum(of _:); /*Sum any variables with _ prefix - potentially including undesired variables*/
put (s1-s4)(=);
run;
The double-dash (--) variable name range list can be used to specify the variables in an array. A simple iterative DO loop then lets you perform the desired operation on each variable.
data want;
set have;
array month_named_variables _200701 -- _201612;
do _index = 1 to dim(month_named_variables); drop _index;
IF month_named_variables(_index) GE cap_inc THEN
month_named_variables(_index) = SUM(month_named_variables(_index), - cap_inc);
ELSE
month_named_variables(_index) = month_named_variables(_index);
end;
run;
If the data set has extra variables within the name range you can still use an array and non-macro code:
data want;
set have;
array nums _numeric_;
do _index = 1 to dim(nums); drop _index;
_vname = vname(nums(_index)); drop _vname;
if _vname ne: '_'
or not (2007 <= input(substr(_vname,2,4), ??4.) <= 2016)
or not (01 <= input(substr(_vname,6,2), ??2.) <= 12)
or length(_vname) ne 7
then continue;
IF nums(_index) GE cap_inc THEN
nums(_index) = SUM(nums(_index), - cap_inc);
ELSE
nums(_index) = nums(_index);
end;
run;
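To check which names that filter lets through, a small sketch can run the same tests on a few sample names (the names are made up for illustration) and flag the ones the loop would process. Note that the length test is written as LENGTH(_VNAME) NE 7: in SAS, NOT binds more tightly than =, so NOT LENGTH(_VNAME) = 7 would compare 0 with 7 and never be true.

```sas
data name_check;
  length _vname $32;
  do _vname = '_200701', '_201612', '_200713', '_2007011', '_randomvar', 'cap_inc';
    /* negation of the CONTINUE condition: 1 = processed, 0 = skipped */
    processed = not (_vname ne: '_'
                     or not (2007 <= input(substr(_vname,2,4), ??4.) <= 2016)
                     or not (01 <= input(substr(_vname,6,2), ??2.) <= 12)
                     or length(_vname) ne 7);
    output;
  end;
run;
```

Only _200701 and _201612 should come out with PROCESSED=1; _200713 fails the month test, _2007011 the length test, and the other two the prefix or year tests.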
If you really need to use a specific list of variables and want to work within a macro, I would recommend passing the FROM and TO values that correspond to the variable names and looping over that range according to the naming convention:
%macro want(data=, yyyymm_from=, yyyymm_to=, guard=1000, debug=0);
%local LOWER UPPER YEARMON YYYYMM INDEX NVARS;
%let LOWER = %sysfunc(inputn(&yyyymm_from,yymmn6.));
%let UPPER = %sysfunc(inputn(&yyyymm_to,yymmn6.));
%let INDEX = 1;
%do YEARMON = &LOWER %to &UPPER;
%let yyyymm = %sysfunc(putn(&YEARMON, yymmn6.));
%local ymvar&INDEX;
%let ymvar&INDEX = _&yyyymm; %* NAMING CONVENTION;
%if &debug %then %put NOTE: YMVAR&INDEX=%superq(YMVAR&INDEX);
%if &INDEX > &GUARD %then %do;
%put ERROR: Exceeded guard limit of &GUARD variables;
%return;
%end;
%let NVARS = &INDEX;
%let YEARMON = %sysfunc(INTNX(MONTH,&yearmon,1)); %* NAMING CONVENTION;
%let YEARMON = %eval(&YEARMON-1); %* back off by one for implicit macro do loop increment of +1;
%let INDEX = %eval(&INDEX+1);
%end;
%do INDEX = 1 %to &NVARS;
%put NOTE: &=INDEX YMVAR&INDEX=&&YMVAR&INDEX;
%end;
%mend;
%want (data=have, yyyymm_from=200701, yyyymm_to=201612)
If my understanding is correct, you want to loop over months, which are encoded in the variable names in your data. You could set a start and end month, then loop over them.
%macro month_loop(start,end);
%local date stop var;
%let date=%sysfunc(inputn(&start,yymmn6.));
%let stop=%sysfunc(inputn(&end,yymmn6.));
/* One DATA step for all months: a separate DATA WANT; SET HAVE; per month would keep only the last month's change */
data want;
set have;
%do %while (&date <= &stop);
%let var=_%sysfunc(putn(&date,yymmn6.));
IF &var. GE cap_inc THEN &var. = SUM(&var., - cap_inc);
/* Move to the next month only after processing the current one, so the start month is included */
%let date=%sysfunc(intnx(month,&date,1));
%end;
run;
%mend;
%month_loop(200701,201612)
The data I have has millions of rows and is rather sparse, with anywhere between 3 and 10 variables needing to be processed. My end result needs to be a single row containing the first non-missing value for each column. Take the following test data:
** test data **;
data test;
length ID $5 AID 8 TYPE $5;
input ID $ AID TYPE $;
datalines;
A . .
. 123 .
C . XYZ
;
run;
The end result should look like this:
ID AID TYPE
A 123 XYZ
Using macro lists and loops, I can brute-force this result with multiple MERGE statements where the variable is non-missing and obs=1, but this is not efficient when the data are very large (in real use I'd loop over these variables rather than write out multiple merge statements):
** works but takes too long on big data **;
data one_row;
merge
test(keep=ID where=(ID ne "") obs=1) /* character */
test(keep=AID where=(AID ne .) obs=1) /* numeric */
test(keep=TYPE where=(TYPE ne "") obs=1); /* character */
run;
The COALESCE function seems very promising, but I believe I need it in combination with an array and OUTPUT to build this single-row result. The function also differs by variable type (COALESCE for numeric, COALESCEC for character), whereas this does not matter in PROC SQL. I get an error using an array, since the variables in the array list are not all the same type.
Exactly what is most efficient will largely depend on the characteristics of your data; in particular, whether the first nonmissing value for the last variable to be filled is usually relatively "early" in the dataset, or whether you will usually have to trawl through the entire dataset to get to it.
I assume your dataset is not indexed (as that would simplify things greatly).
One option is the standard DATA step. This isn't necessarily fast, but it's probably not much slower than most other options, given that you're going to have to read most or all of the rows no matter what you do. It has the nice advantage that it can stop as soon as every column has a value.
data want;
if 0 then set test; *defines characteristics;
set test(rename=(id=_id aid=_aid type=_type)) end=eof;
id=coalescec(id,_id);
aid=coalesce(aid,_aid);
type=coalescec(type,_type);
if cmiss(of id aid type)=0 then do;
output;
stop;
end;
else if eof then output;
drop _:;
run;
You could generate all of that from macro variables built from dictionary.columns, or you might even use temporary arrays, though I think that gets too messy.
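A sketch of the dictionary.columns version, assuming the question's TEST data set is in WORK, that no variable name already starts with an underscore (because of DROP _:), and that the names are short enough to take a _ prefix. The macro variable names RENAMES, ASSIGNS and VARS are made up for illustration. The SQL step writes one COALESCE or COALESCEC assignment per variable depending on its type, which also sidesteps the mixed-type array problem from the question.

```sas
/* Build the RENAME list, one typed assignment per variable, and the plain name list */
proc sql noprint;
  select cats(name, '=_', name),
         cats(name, '=', case when type = 'num' then 'coalesce' else 'coalescec' end,
              '(', name, ',_', name, ');'),
         name
    into :renames separated by ' ',
         :assigns separated by ' ',
         :vars    separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'TEST'
    order by varnum;
quit;

data want;
  if 0 then set test;                      /* defines characteristics; values are retained */
  set test(rename=(&renames)) end=eof;
  &assigns
  if cmiss(of &vars) = 0 then do;          /* every column filled: stop reading */
    output;
    stop;
  end;
  else if eof then output;
  drop _:;
run;
```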
Another option is the self-update, except that it needs two changes. First, you need a BY variable to match on (unlike MERGE, which can run without a BY statement). Second, it gives you the last nonmissing value, not the first, so you'd have to reverse-sort the dataset.
But assuming you added a variable x to the dataset, with any value (it doesn't matter which, as long as it is constant for every row), it is this simple:
data want;
update test(obs=0) test;
by x;
run;
So that has the huge advantage of simple code, in exchange for some cost in time (reverse-sorting and adding a new variable).
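Spelled out, assuming the question's TEST data set; the constant BY variable X and the row counter _ROW are illustrative additions. Sorting on the original row number in descending order makes the first row the last one applied, and since UPDATE by default does not overwrite existing values with missing transaction values, each variable ends up holding its first nonmissing value.

```sas
data test_x;
  set test;
  x = 1;          /* constant BY variable; any value works */
  _row = _n_;     /* remember the original row order */
run;

proc sort data=test_x;
  by descending _row;   /* reverse order: the original first row is applied last */
run;

data want;
  update test_x(obs=0) test_x;   /* empty master, every row as a transaction */
  by x;
  drop x _row;
run;
```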
If your dataset is very sparse, a transpose might be a good compromise. It doesn't require knowing the variable names, since you can process them with arrays.
data test_t;
set test;
array numvars _numeric_;
array charvars _character_;
do _i = 1 to dim(numvars);
if not missing(numvars[_i]) then do;
varname = vname(numvars[_i]);
numvalue= numvars[_i];
output;
end;
end;
do _i = 1 to dim(charvars);
if not missing(charvars[_i]) then do;
varname = vname(charvars[_i]);
charvalue= charvars[_i];
output;
end;
end;
keep numvalue charvalue varname;
run;
proc sort data=test_t;
by varname;
run;
data want;
set test_t;
by varname;
if first.varname;
run;
Then you PROC TRANSPOSE this to get the desired WANT (or maybe this works for you as is). It does lose the formats etc. on the values, so take that into account. Your character value length probably also needs to be set to something appropriately long, and then set back afterwards (you can use an IF 0 THEN SET to fix it).
A similar hash approach works roughly the same way; it has the advantage that it can stop much sooner, and it doesn't require re-sorting.
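A sketch of that final transpose, assuming the WANT data set produced by the step above (one row per variable name, with NUMVALUE or CHARVALUE filled in). Because numeric and character values live in separate columns, it transposes each group separately and then combines the two one-row results; WANT_NUM, WANT_CHAR and ONE_ROW are made-up names.

```sas
/* Numeric values: one column per variable name */
proc transpose data=want(where=(not missing(numvalue))) out=want_num(drop=_name_);
  id varname;
  var numvalue;
run;

/* Character values: one column per variable name */
proc transpose data=want(where=(not missing(charvalue))) out=want_char(drop=_name_);
  id varname;
  var charvalue;
run;

/* Both results have a single row, so a one-to-one MERGE (no BY) combines them */
data one_row;
  merge want_num want_char;
run;
```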
data test_h;
set test end=eof;
array numvars _numeric_;
array charvars _character_;
length varname $32 numvalue 8 charvalue $1024; *or longest charvalue length;
if _n_=1 then do;
declare hash h(ordered:'a');
h.defineKey('varname');
h.defineData('varname','numvalue','charvalue');
h.defineDone();
end;
do _i = 1 to dim(numvars);
if not missing(numvars[_i]) then do;
varname = vname(numvars[_i]);
rc = h.find();
if rc ne 0 then do;
numvalue= numvars[_i];
rc=h.add();
end;
end;
end;
do _i = 1 to dim(charvars);
if not missing(charvars[_i]) then do;
varname = vname(charvars[_i]);
rc = h.find();
if rc ne 0 then do;
charvalue= charvars[_i];
rc=h.add();
end;
end;
end;
if eof or h.num_items = dim(numvars) + dim(charvars) then do;
rc = h.output(dataset:'want');
stop; /* every variable has been found, or the input is exhausted */
end;
run;
There are lots of other solutions; which one is most efficient depends on your data.
I was wondering if there is a way to read SAS macro variables from Excel workbook/sheet/cell references?
The macro variables are listed in column A of an Excel spreadsheet, like this:
%let var_1 = 1;
%let var_2 = 2;
%let var_3 = 3;
%let var_4 = 4;
%let var_5 = 5;
%let var_6 = 6;
Then in the SAS editor:
(A DATA step or PROC SQL that will read the SAS macro variables from the Excel file.)
Data testSet;
testVar_1 = &let var_1.;
testVar_2 = &let var_2.;
testVar_3 = &let var_3.;
testVar_4 = &let var_4.;
testVar_5 = &let var_5.;
testVar_6 = &let var_6.;
run;
Does anyone know if there is a way to make this work?
Your second DATA step doesn't quite make sense, at least to me.
If you can change your data structure, this may work more easily, assuming what you're trying to do is create macro variables.
Structure in excel
MVAR_NAME Value
var_1 1
var_2 2
var_3 3
Then in SAS, import the Excel file however you normally would (let's assume the result is called HAVE) and create the macro variables:
data _null_;
set have;
call symputx(mvar_name, value);
run;
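A sketch of the whole round trip, with a DATALINES step standing in for the imported sheet (the MVAR_NAME and VALUE column names follow the structure above; the data values are illustrative). It uses CALL SYMPUTX, which strips leading and trailing blanks and converts the numeric VALUE without a conversion note. Once that step has finished, the macro variables can be referenced in any later step.

```sas
/* Stand-in for the imported sheet; in practice HAVE would come from PROC IMPORT */
data have;
  length mvar_name $32;
  input mvar_name $ value;
  datalines;
var_1 1
var_2 2
var_3 3
;
run;

data _null_;
  set have;
  call symputx(mvar_name, value);   /* one macro variable per row */
run;

/* The macro variables exist now, so this step can use them */
data testSet;
  testVar_1 = &var_1;
  testVar_2 = &var_2;
  testVar_3 = &var_3;
run;
```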
I'm trying to import data from 15 different txt files into SAS. I want to feed the different file names into an array and then use the array elements inside a macro to bring all the data into the WORK library. The following did not work; any help is much appreciated!!
%macro DATAIMP;
array filenames(3) visit visit_event department
%do i =1 %to 3 %by 1
proc import
datafile="C:\Users\AR\Documents\data\&filename(i).txt"
OUT= &filenames(i)_1
dbms=dlm replace;
delimiter=";";
getnames=yes;
run;
end;
%mend DATAIMP;
%DATAIMP;
run;
ARRAY is a DATA step statement; you cannot use it in macro code like that.
What you can do is create a data set containing all your file names and create macro variables from it:
data file;
input filename $50.;
datalines;
visit
visit_event
department
;
run;
%macro DATAIMP;
%local I NF;
data _NULL_; /* creates local macro variables FILENAME1, FILENAME2, ... and NF */
set file end=fine;
call symputx(cats("FILENAME",_N_),filename,'L'); /* SYMPUTX trims the blanks padding the $50 FILENAME values */
if fine then call symputx("NF",_N_,'L');
run;
%DO I=1 %TO &NF;
proc import
datafile="C:\Users\AR\Documents\data\&&FILENAME&I...txt"
OUT= &&FILENAME&I.._1
dbms=dlm replace;
delimiter=";";
getnames=yes;
run;
%END;
%mend DATAIMP;
%DATAIMP;
Remember that && resolves to & on each pass, and that a . marks the end of a macro variable name (and is consumed in the process), so each resolution needs its own dot.
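A quick way to see the passes, as a sketch with the values set by hand (FILENAME1=visit, I=1) instead of from the data set; adding OPTIONS SYMBOLGEN; would show each step in the log.

```sas
%let filename1 = visit;
%let i = 1;

/* Pass 1: && -> &, FILENAME stays, &I. -> 1      gives  &FILENAME1..txt */
/* Pass 2: &FILENAME1. -> visit, one dot remains  gives  visit.txt       */
%put &&FILENAME&I...txt;
%put &&FILENAME&I.._1;   /* same idea: &FILENAME1._1 -> visit_1 */
```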
So I am trying to break up a large dataset (70,000 obs with 1,790 variables) based on a specific variable grouping. Excel or CSV is the ideal format to export to, but there is a limit on the number of variables (260 or something). Any ideas how I can do this in SAS (or otherwise in R / SQL)?
I know the macro works; I have used it before. The error message says the limit on variables has been reached.
There is certainly a limit on creating an Excel file, but not a CSV file. Here is an example using a dummy SAS data set:
data a;
array x(*) x1-x1790;
do j=1 to 5;
do i=1 to dim(x);
x(i) = ranuni(0);
end;
output;
end;
run;
proc export data=a
outfile="c:\temp\tempfile.csv"
dbms=CSV
replace;
run;
And here is the relevant log:
NOTE: The file 'c:\temp\tempfile.csv' is:
Filename=c:\temp\tempfile.csv,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=23Jan2013:15:27:13,
Create Time=23Jan2013:15:27:13
NOTE: 6 records were written to the file 'c:\temp\tempfile.csv'.
The minimum record length was 9636.
The maximum record length was 23087.
NOTE: There were 5 observations read from the data set WORK.A.
NOTE: DATA statement used (Total process time):
real time 0.26 seconds
cpu time 0.09 seconds
5 records created in c:\temp\tempfile.csv from A.
NOTE: "c:\temp\tempfile.csv" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
real time 2.04 seconds
cpu time 0.26 seconds
Note the first row contains column headers.
UPDATE: If you have a recent version of SAS (9.3 TS1M1 or later) you can create an Office 2010 Excel spreadsheet, which has a maximum of 1,048,576 rows by 16,384 columns. In that case, you would use DBMS=XLSX.
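For that case, a sketch of the same export written as XLSX, reusing data set A and the c:\temp folder from the CSV example; as far as I know, DBMS=XLSX also requires the SAS/ACCESS Interface to PC Files license.

```sas
/* Requires SAS 9.3 TS1M1 or later; 1,790 columns fit within the XLSX column limit */
proc export data=a
  outfile="c:\temp\tempfile.xlsx"
  dbms=xlsx
  replace;
run;
```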
Bob's answer is good if you are okay with XLSX or CSV. If you do want to make a .xls Excel file (255-column limit), or don't have 9.3 TS1M1, it's fairly easy to do that. Exactly how depends on how you want to specify the columns that go into each file.
Say you want each block of up to 255 columns (250 at a time in the calls below) in a separate sheet, with the rows split at the midpoint (records 1-35000 in one sheet and 35001-end in another, for each set of variables). You would do something like this:
options mprint symbolgen;
data test;
array xs x1-x1700;
do id = 1 to 70000;
do _t = 1 to dim(xs);
xs[_t]=ranuni(7);
end;
output;
end;
run;
%macro export_file(varstart=,varend=,varnumstart=0,varnumend=0,recstart=1,recend=0,keeplist=,dset=, libname=WORK, outfile=,sheet="sheet1");
%if &varnumstart ne 0 %then %do;
proc sql noprint;
select name into :varstart from dictionary.columns
where libname=upcase("&libname.") and memname=upcase("&dset.") and varnum=&varnumstart.;
select name into :varend from dictionary.columns
where libname=upcase("&libname.") and memname=upcase("&dset.") and varnum=&varnumend.;
quit;
%end;
%if &varstart=%str() or &varend=%str() %then %do;
%put "ERROR: MISSING PARAMETERS. PLEASE CHECK YOUR MACRO CALL AND RERUN. MUST HAVE VARSTART AND VAREND OR VARNUMSTART AND VARNUMEND.";
%abort;
%end;
data _for_Export/view=_for_export;
set &libname..&dset;
keep &varstart.--&varend.
%if &keeplist ne %str() %then %do;
&keeplist
%end;
;
if _N_ ge &recstart.;
%if &recend ne 0 %then %do;
if _N_ le &recend.;
%end;
run;
proc export data=_for_export file=&outfile. dbms=excel replace;
sheet=&sheet.;
run;
proc datasets nolist noprint lib=work;
delete _for_export/memtype=view;
quit;
%mend export_file;
%export_file(varnumstart=1,varnumend=250, keeplist=id,recstart=1,recend=35000,dset=test,outfile="c:\temp\test.xls",sheet="sheet1");
%export_file(varnumstart=1,varnumend=250, keeplist=id,recstart=35001,recend=99999,dset=test,outfile="c:\temp\test.xls",sheet="sheet2");
%export_file(varnumstart=251,varnumend=500, keeplist=id,recstart=1,recend=35000,dset=test,outfile="c:\temp\test.xls",sheet="sheet3");
%export_file(varnumstart=251,varnumend=500, keeplist=id,recstart=35001,recend=99999,dset=test,outfile="c:\temp\test.xls",sheet="sheet4");
Mine fails when I try to export sheet4; I'm not sure whether there's some limit on the total size of an .xls file, but you can easily modify this to create separate files instead. This wouldn't work if you needed specific, nonconsecutive variable names in each file, but you could fairly easily modify the SQL code that pulls from dictionary.columns to instead pull from a table you create that holds the variable names you want in each file.
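A sketch of that control-table variation, assuming the TEST data set created above (ID plus X1-X1700) and Windows SAS with DBMS=EXCEL available, as in the macro. The VARMAP table, its FILE_ID and NAME columns, the EXPORT_GROUP macro and the output paths are all hypothetical.

```sas
/* Hypothetical control table: which variables go into which output file */
data varmap;
  length name $32;
  input file_id name $;
  datalines;
1 x1
1 x2
2 x251
2 x1700
;
run;

%macro export_group(file_id=, outfile=);
  %local vars;
  /* space-delimited list of the variables assigned to this file */
  proc sql noprint;
    select name into :vars separated by ' '
      from varmap
      where file_id = &file_id;
  quit;

  proc export data=test(keep=id &vars) outfile=&outfile dbms=excel replace;
  run;
%mend export_group;

%export_group(file_id=1, outfile="c:\temp\group1.xls")
%export_group(file_id=2, outfile="c:\temp\group2.xls")
```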