Using a macro for a formula in PROC SQL in SAS

I need some help with macros in SAS. I want to sum variables (for example, v_1 to v_7) to aggregate them, grouping by year. There are plenty of them, so I want to use a macro. However, it doesn't work (I get only v_1). I would really appreciate your help.
%macro my_macro();
%local i;
%do i = 1 %to 7;
proc sql;
create table my_table as select
year,
sum(v_&i.) as v_&i.
from my_table
group by year
;
quit;
%end;
%mend;
/* I don't know how to run this macro - is this OK? */
data run_macro;
set my_table;
%my_macro();
run;

The macro processor just generates SAS code and then passes it on to SAS to run. You are calling a macro that generates a complete SAS step in the middle of your DATA step, so you are trying to run this code:
data run_macro;
set my_table;
proc sql;
create table my_table as select
year,
sum(v_1) as v_1
from my_table
group by year
;
quit;
proc sql;
create table my_table as select
year,
sum(v_2) as v_2
from my_table
group by year
;
quit;
...
So first you make a copy of MY_TABLE as RUN_MACRO. Then you overwrite MY_TABLE with a collapsed version of MY_TABLE that has just two variables and only one observation per year. Then you try to collapse it again, but you are referencing a variable named V_2 that no longer exists.
If you simply move the %DO loop inside the generation of the SQL statement, it should work. Also, don't overwrite your input dataset. Here is a version of the macro that will create a new dataset named MY_NEW_TABLE with 8 variables from the existing dataset named MY_TABLE.
%macro my_macro();
%local i;
proc sql;
create table my_NEW_table as
select year
%do i = 1 %to 7;
, sum(v_&i.) as v_&i.
%end;
from my_table
group by year
;
quit;
%mend;
%my_macro;
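The same pattern can be wrapped in a macro with parameters so that the input, output, grouping variable, prefix and number of variables are not hard-coded. This is a minimal sketch; the macro name SUM_BY, its parameters and the small sample MY_TABLE are hypothetical:

```sas
/* Hypothetical sample data: two years, three v_ variables */
data my_table;
  input year v_1 v_2 v_3;
  datalines;
2020 1 2 3
2020 4 5 6
2021 7 8 9
;
run;

/* Hypothetical parameterized version: the %DO loop generates one SUM() per variable */
%macro sum_by(data=, out=, by=year, prefix=v_, n=7);
  %local i;
  proc sql;
    create table &out as
    select &by
    %do i = 1 %to &n;
      , sum(&prefix&i) as &prefix&i
    %end;
    from &data
    group by &by
    ;
  quit;
%mend sum_by;

%sum_by(data=my_table, out=my_new_table, n=3);
```

Because the loop only emits the `, sum(v_i) as v_i` fragments, a single CREATE TABLE statement is generated and the input dataset is read once.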
Note: if this is all you are doing, just use PROC SUMMARY. With regular SAS code instead of SQL code you can use variable lists like v_1-v_7, so there is no need for code generation.
proc summary nway data=my_table ;
class year ;
var v_1 - v_7;
output out=my_NEW_table sum=;
run;
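PROC SUMMARY also writes the automatic variables _TYPE_ and _FREQ_ to the output dataset. A small self-contained sketch (the sample data is hypothetical) that drops them so the result has just YEAR and the seven sums:

```sas
/* Hypothetical sample data with the full v_1-v_7 range */
data my_table;
  input year v_1-v_7;
  datalines;
2020 1 1 1 1 1 1 1
2020 2 2 2 2 2 2 2
2021 3 3 3 3 3 3 3
;
run;

proc summary nway data=my_table;
  class year;
  var v_1-v_7;
  /* SUM= with no names keeps the original variable names;
     DROP= removes the automatic _TYPE_ and _FREQ_ columns */
  output out=my_new_table (drop=_type_ _freq_) sum=;
run;
```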

Related

Changing FROM statement with a variable

I am trying to change the name of the table I am getting my data from, like this:
COREPOUT.KUNDE_REA_UDL_202112 --> COREPOUT.KUNDE_REA_UDL_202203
I create my variable like this:
PROC SQL NOPRINT;
SELECT DISTINCT
PERIOKVT_PREV_BANKSL_I_YYMMN6
INTO :PERIOKVT_PREV_BANKSL_I_YYMMN6
FROM Datostamp_PREV_Kvartal;
This is the code I want to use the variable for.
%_eg_conditional_dropds(WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
SELECT t1.Z_ORDINATE,
(input(t1.cpr_se,w.)) AS KundeNum
FROM COREPOUT.KUNDE_REA_UDL_202203 t1;
QUIT;
I have tried things like:
FROM string("COREPOUT.KUNDE_REA_UDL_",PERIOKVT_PREV_BANKSL_I_YYMMN6," t1";
I hope you can point me in the right direction.
Use & to reference and resolve macro variables into strings (e.g. &PERIOKVT_PREV_BANKSL_I_YYMMN6).
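proc sql noprint;
select distinct PERIOKVT_PREV_BANKSL_I_YYMMN6
into :PERIOKVT_PREV_BANKSL_I_YYMMN6 trimmed /*TRIMMED removes the leading blanks of a numeric value*/
from Datostamp_PREV_Kvartal
;
quit;
proc sql;
create table WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
select t1.Z_ORDINATE,
(input(t1.cpr_se,32.)) AS KundeNum /*W. is not a standard informat; 32. reads the digits as a number*/
from COREPOUT.KUNDE_REA_UDL_&PERIOKVT_PREV_BANKSL_I_YYMMN6 t1 /*the macro variable holds only the YYYYMM suffix*/
;
quit;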
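A self-contained sketch of the same idea, using hypothetical WORK stand-ins for COREPOUT.KUNDE_REA_UDL_202203 and Datostamp_PREV_Kvartal (the short macro variable name YYMM is also only for illustration). It assumes the stamp variable holds just the YYMMN6 part, e.g. 202203, so the library and table prefix stay in the code and only the suffix comes from the macro variable:

```sas
/* Hypothetical stand-in for COREPOUT.KUNDE_REA_UDL_202203 */
data work.kunde_rea_udl_202203;
  input Z_ORDINATE cpr_se $;
  datalines;
1 123
2 456
;
run;

/* Hypothetical stand-in for the date-stamp table */
data work.datostamp_prev_kvartal;
  PERIOKVT_PREV_BANKSL_I_YYMMN6 = 202203;
run;

proc sql noprint;
  select distinct PERIOKVT_PREV_BANKSL_I_YYMMN6
    into :yymm trimmed   /* without TRIMMED the number is stored with leading blanks */
    from work.datostamp_prev_kvartal;
quit;

%put NOTE: reading work.kunde_rea_udl_&yymm;

proc sql;
  create table work.query_result as
  select t1.Z_ORDINATE,
         input(t1.cpr_se, 32.) as KundeNum
  from work.kunde_rea_udl_&yymm t1;   /* resolves to work.kunde_rea_udl_202203 */
quit;
```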
You can use CALL SYMPUTX() to move values from a dataset into a macro variable.
data _null_;
set Datostamp_PREV_Kvartal;
call symputx('dataset_name',cats('COREPOUT.KUNDE_REA_UDL_',PERIOKVT_PREV_BANKSL_I_YYMMN6));
stop;
run;
Then use the value of the macro variable to insert the dataset name into the code at the appropriate place. So your posted SQL is equivalent to this simple data step.
data QUERY_FOR_KUNDE_REA_UDL_20_0000;
set &dataset_name. ;
KundeNum = input(cpr_se,32.);
keep Z_ORDINATE KundeNum;
run;
Note: I did not see any definition of a user-defined informat named W in your posted code, so I replaced it with the normal numeric informat, since it looked like you were trying to convert a character value into a number.
The solution I ended up with was inspired by Stu Sztukowski's response:
I used a DATA step to concatenate the strings and then created a macro variable from the result.
data Concat_var;
str_PERIOKVT_PREV_YYMMN6 = CAT("COREPOUT.KUNDE_REA_UDL_",&PERIOKVT_PREV_BANKSL_I_YYMMN6," t1");
run;
PROC SQL NOPRINT;
SELECT DISTINCT
str_PERIOKVT_PREV_YYMMN6
INTO :str_PERIOKVT_PREV_YYMMN6
FROM Concat_var;
Then I used the macro variable in the FROM clause:
%_eg_conditional_dropds(WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
SELECT t1.Z_ORDINATE,
(input(t1.cpr_se,w.)) AS KundeNum
FROM &str_PERIOKVT_PREV_YYMMN6;
QUIT;
I hope this helps someone else in the future.

SAS union distinct records from datasets with similar names

I have about 100 large datasets, and from each one I want to extract the distinct IDs and stack them vertically. The datasets are unsorted and named data_01, data_02, data_03, ..., data_100.
Since the datasets are all very large, setting them together without first reducing their size is not feasible; the join made no progress after hours of running. I therefore believe the datasets need to be reduced before stacking, and I'm here to seek some help.
I tried to create a macro that selects distinct IDs and sums a numeric variable, cnt, by ID before vertically joining all datasets with a PROC SQL UNION. The macro is not working properly:
/*Get dataset names*/
proc sql noprint;
select memname into :mylist separated by ' '
from dictionary.tables where libname= "mylib" and upcase(memname) like "DATA_%"
;
quit;
%put &mylist;
/*create union statements*/
%global nextdata;
%let nextdata =;
%macro combinedata(mylist);
data _null_;
datanum = countw("&mylist");
call symput('Dataset', put(datanum, 10.));
run;
%do i = 1 %to &Dataset ;
data _null_;
temp = scan("&mylist", &i);
call symput("Dataname", strip(put(temp,$12.)));
run;
%put &Dataname;
%put &Dataset;
%if (&i=&Dataset) %then %do;
%let nextdata = &nextdata.
select id, sum(cnt)
from mylib.&&Dataname
group by id;
%end;
%else %do;
%let nextdata = &nextdata.
select id, sum(cnt)
from mylib.&&Dataname union
group by id;
%end;
%put nextdata = &nextdata;
%end;
%mend combinedata;
%combinedata(&mylist);
/*execute from proc sql*/
proc sql;
create table combined as (&nextdata);
quit;
I have also attempted to use proc summary, but there was not enough memory to run the following code:
data vneed / view=vneed;
set data_: (keep=id cnt);
run;
proc summary data=vneed nway;
class id;
var cnt;
output out=want (drop=_type_) sum=sumcnt;
run;
Appreciate any help!
If the number of distinct ID values is reasonable, you should be able to use a hash object.
data _null_ ;
if _n_=1 then do;
dcl hash H (ordered: "A") ;
h.definekey ("ID") ;
h.definedata ("ID", "SUMCNT") ;
h.definedone () ;
end;
set data_: (keep=id cnt) end=eof;
if h.find() then sumcnt=.;
sumcnt+cnt ;
h.replace() ;
if eof then h.output (dataset: "WANT") ;
run ;
If there are too many ID values for the summary data to fit into a HASH object, you could adapt this code to stop at some reasonable number of distinct ID values, write the current summary to an actual SAS dataset to avoid overloading memory, and then generate the final counts by re-aggregating the intermediate datasets. But at that point you should just use my other answer and let PROC SQL create the intermediate summary datasets instead.
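The adaptation described above (flush the hash to disk when it gets large, then re-aggregate) could look roughly like this. It is a sketch, not tested at scale: the sample datasets, the MAX_IDS threshold and the PART_n / PARTS names are all hypothetical, and the flush uses the hash object's NUM_ITEMS attribute and CLEAR method:

```sas
/* Hypothetical small inputs standing in for data_01 ... data_100 */
data data_01;
  input id cnt;
  datalines;
1 1
2 2
1 3
;
run;

data data_02;
  input id cnt;
  datalines;
3 4
2 5
;
run;

%let max_ids = 2;  /* hypothetical threshold; size it to the memory you have */

data _null_;
  if _n_ = 1 then do;
    dcl hash h (ordered: "A");
    h.definekey("ID");
    h.definedata("ID", "SUMCNT");
    h.definedone();
  end;
  length outname $32;
  set data_: (keep=id cnt) end=eof;
  if h.find() then sumcnt = .;   /* ID not in the hash yet: start a new partial sum */
  sumcnt + cnt;
  h.replace();
  /* Write the partial sums to PART_n and empty the hash when it is full or at the end */
  if h.num_items >= &max_ids or eof then do;
    part + 1;
    outname = cats("part_", part);
    h.output(dataset: outname);
    h.clear();
  end;
run;

/* Re-aggregate the partial sums, which are much smaller than the raw data */
data parts / view=parts;
  set part_: ;
run;

proc sql;
  create table want as
  select id, sum(sumcnt) as sumcnt
  from parts
  group by id
  order by id;
quit;
```

The trade-off is that the same ID can appear in several PART_n datasets, which is why the final step sums again by ID.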
Summarize the data as you go instead of trying to generate one massive query. Then re-aggregate the aggregates.
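/*%DO loops only work inside a macro definition*/
%global dataset;
%macro sum_parts;
%local i dataname;
%let dataset = %sysfunc(countw(&mylist,%str( )));
proc sql ;
%do i = 1 %to &dataset ;
%let dataname=mylib.%scan(&mylist,&i,%str( ));
create table sum&i as
select id,sum(cnt) as cnt
from &dataname
group by id
order by id
;
%end;
quit;
%mend sum_parts;
%sum_parts;
data want ;
sumcnt=0; /*the sum statement retains SUMCNT, so reset it for each ID*/
do until(last.id);
set sum1 - sum&dataset ;
by id;
sumcnt+cnt;
end;
drop cnt;
run;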

Dynamize range of SAS PROC SQL SELECT INTO macro creation

I want to put each of several observations into its own macro variable. I would do this using select into :obs1 - :obs4; however, since the number of observations can differ, I would like to make the range dynamic. My code looks like this:
proc sql;
create table segments as select distinct substr(name,1,6) as segment from dictionary.columns
where libname = 'WORK' and memname = 'ALL_CCFS' and name ne 'MONTH';
run;
proc sql noprint;
select count(*) into: count from segments;
run;
proc sql noprint;
select segment into :segment_1 - :segment_&count. from dictionary.columns;
run;
However, this doesn't seem to work... any suggestions? Thank you!
Leave the last value empty and SAS will create as many macro variables as needed
Set it to an absurdly large number and SAS will only create the ones required
Use a data step, where you can increment the number dynamically (not shown).
proc sql noprint;
select segment into :segment_1 -
from segments;
quit;
proc sql noprint;
select segment into :segment_1 - :segment_999
from segments;
quit;
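Two further options, sketched with a hypothetical ALL_CCFS table. First, the original `:segment_&count.` approach most likely fails because `select count(*) into :count` stores the number with leading blanks, so the upper bound does not resolve to a valid name; adding TRIMMED avoids that. Second, the data step option mentioned above: CALL SYMPUTX can number the macro variables with _N_ (the SEG_n and N_SEG names are illustrative only):

```sas
/* Hypothetical table whose column prefixes define the segments */
data work.all_ccfs;
  MONTH = 1; ABCDEF_1 = 1; ABCDEF_2 = 2; GHIJKL_1 = 3;
run;

proc sql noprint;
  create table segments as
  select distinct substr(name, 1, 6) as segment
  from dictionary.columns
  where libname = 'WORK' and memname = 'ALL_CCFS' and name ne 'MONTH';
  /* TRIMMED keeps the count from being stored with leading blanks */
  select count(*) into :count trimmed from segments;
quit;

/* With a trimmed count the upper bound resolves to a valid name such as :segment_2 */
proc sql noprint;
  select segment into :segment_1 - :segment_&count from segments;
quit;
%put &=count &=segment_1 &=segment_2;

/* Data step alternative: one global macro variable per row, numbered with _N_ */
data _null_;
  set segments end=last;
  call symputx(cats('seg_', _n_), segment, 'G');
  if last then call symputx('n_seg', _n_, 'G');
run;
%put &=n_seg &=seg_1 &=seg_2;
```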

check whether proc append was successful

I have some code which appends yesterday's data to [large dataset], using proc append. After doing so it changes the value of the variable "latest_date" in another dataset to yesterday's date, thus showing the maximum date value in [large dataset] without a time-consuming data step or proc sql.
How can I check, within the same program in which proc append is used, whether proc append was successful (no errors)? My goal is to change the "latest_date" variable in this secondary dataset only if the append is successful.
Try the automatic macro variable &SYSCC.
data test;
do i=1 to 10;
output;
end;
run;
data t1;
i=11;
run;
data t2;
XXX=12;
run;
proc append base=test data=t1;
run;
%put &syscc;
proc append base=test data=t2;
run;
%put &syscc;
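Because SYSCC keeps the highest condition code seen so far, earlier errors can make it non-zero before PROC APPEND even runs. One way around that is to reset it to 0 immediately before the append and only update the date table if it is still 0 afterwards. A minimal sketch; the macro APPEND_AND_STAMP and the LARGE, TO_APPEND and STATUS datasets are hypothetical stand-ins:

```sas
/* Hypothetical stand-ins for [large dataset], yesterday's data and the date table */
data work.large;     do i = 1 to 10; output; end; run;
data work.to_append; i = 11; run;
data work.status;    latest_date = '01JAN2020'd; format latest_date date9.; run;

%macro append_and_stamp(base=, data=, stamp=);
  %let syscc = 0;   /* clear condition codes left by earlier steps */
  proc append base=&base data=&data;
  run;
  %if &syscc = 0 %then %do;
    /* only reached when the append raised no warning or error */
    proc sql;
      update &stamp set latest_date = today() - 1;
    quit;
  %end;
  %else %put NOTE: append failed (SYSCC=&syscc), &stamp not updated.;
%mend append_and_stamp;

%append_and_stamp(base=work.large, data=work.to_append, stamp=work.status);
```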
I'm using the %get_table_size macro, which I found here. My steps are:
run %get_table_size(large_table, size_preappend)
Create dataset called to_append
run %get_table_size(to_append, append_size)
run proc append
run %get_table_size(large_table, size_postappend)
Check if &size_postappend = &size_preappend + &append_size
Using &syscc isn't exactly what I wanted, because it doesn't check specifically for an error in proc append. It could be thrown off by earlier errors.
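The %get_table_size macro itself is not shown here, so the following is only a comparable sketch of the same row-count check. It uses a hypothetical function-style macro %NOBS built on the OPEN, ATTRN (NLOBS) and CLOSE functions, and hypothetical LARGE and TO_APPEND datasets:

```sas
/* Hypothetical helper: returns the number of logical observations, or -1 if the open fails */
%macro nobs(ds);
  %local dsid n rc;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid %then %do;
    %let n  = %sysfunc(attrn(&dsid, nlobs));
    %let rc = %sysfunc(close(&dsid));
  %end;
  %else %let n = -1;
  &n
%mend nobs;

/* Hypothetical stand-ins for the large table and yesterday's data */
data work.large;     do i = 1 to 10; output; end; run;
data work.to_append; do i = 11 to 12; output; end; run;

%let size_pre    = %nobs(work.large);
%let append_size = %nobs(work.to_append);

proc append base=work.large data=work.to_append;
run;

%let size_post = %nobs(work.large);

/* 1 when the row counts add up, 0 otherwise */
%let append_ok = %eval(&size_post = &size_pre + &append_size);
%put &=size_pre &=append_size &=size_post &=append_ok;
```

&APPEND_OK can then drive the update of the "latest_date" table, and unlike &SYSCC it is not affected by errors in earlier steps.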
You can do this by counting how many records are in the table before and after appending. This works with any SAS table or database table.
It is best practice to always have a control table for your process that logs the run time and the number of records read.
Code:
/*Create input data*/
data work.t1;
input row ;
datalines;
1
2
;;
run;
data work.t2;
input row ;
datalines;
3
;;
run;
/*Create the control table. Run this bit only once, otherwise you delete its contents every time*/
data work.cntrl;
length load_dt 8. source 8. delta 8. total 8. ;
format load_dt datetime21.;
run;
proc sql; delete from work.cntrl; quit;
/*Count Records before append*/
proc sql noprint ; select count(*) into: count_t1 from work.t1; quit;
proc sql noprint; select count(*) into: count_t2 from work.t2; quit;
/*Append data*/
proc append base=work.t1 data=work.t2 ; run;
/*Count Records after append*/
proc sql noprint ; select count(*) into: count_final from work.t1; quit;
/*Insert counts and timestamp into the Control Table*/
proc sql noprint; insert into work.cntrl
values(%sysfunc(datetime()), &count_t1. , &count_t2., &count_final.) ;
quit;
Output: the control table is updated.

Using a SAS macro variable to select all values using the IN operator in PROC SQL

In a SAS script I have a macro variable that is later used in an SQL IN condition in a PROC SQL step.
%let my_list = (1,2,3);
proc sql;
select *
from my_table
where var1 in &my_list.
;
quit;
This works fine, but I need some flexibility and also want to be able to select ALL lines without changing the SQL code itself, but just the macro variable.
Is there a trick to specify the macro variable so that it selects ALL lines while still using the IN operator (avoiding a subquery solution that fills all possible values into the macro variable)?
You could change your code to:
%let where_clause = var1 in (1,2,3);
proc sql;
select *
from my_table
where &where_clause
;
quit;
And change the macro variable to %let where_clause = 1=1; in order to select all lines.
%let where_clause = 1=1;
proc sql;
select *
from my_table
where &where_clause
;
quit;
OR, if you are adamant about keeping your code unchanged, you could simply change the macro variable as follows in order for your where clause to always be true:
%let my_list = (1) or 1=1;
proc sql;
select *
from my_table
where var1 in &my_list
;
quit;
(dirty but gets the job done)
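A middle ground between the two answers is a small function-style macro that returns either `var in (list)` or `1=1` when the list is empty, so the WHERE clause text stays the same and only the argument changes. This is a sketch; the macro name IN_FILTER and the sample data are hypothetical, and commas in the list are protected with %STR:

```sas
/* Hypothetical sample data */
data my_table;
  do var1 = 1 to 5; output; end;
run;

/* Hypothetical helper: "var in (list)", or 1=1 (always true) when the list is empty */
%macro in_filter(var, list);
  %if %length(&list) = 0 %then 1=1;
  %else %unquote(&var in (&list));
%mend in_filter;

proc sql noprint;
  select count(*) into :n_some trimmed from my_table
    where %in_filter(var1, %str(1,2,3));
  select count(*) into :n_all trimmed from my_table
    where %in_filter(var1, );
quit;
%put &=n_some &=n_all;
```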