Dynamize range of SAS PROC SQL SELECT INTO macro creation

I want to put each observation into its own macro variable. I would do this with select into :obs1 - :obs4; however, because the number of observations can differ, I would like to make the range dynamic. My code looks like this:
proc sql;
create table segments as select distinct substr(name,1,6) as segment from dictionary.columns
where libname = 'WORK' and memname = 'ALL_CCFS' and name ne 'MONTH';
quit;
proc sql noprint;
select count(*) into :count from segments;
quit;
proc sql noprint;
select segment into :segment_1 - :segment_&count. from segments;
quit;
However, this doesn't seem to work... any suggestions? Thank you!

Leave the upper bound of the range blank and SAS will create as many macro variables as it needs.
Set the upper bound to an absurdly large number and SAS will only use what's required.
Use a data step to create the macro variables, where you can increment the numeric suffix yourself (not shown).
/* Option 1: open-ended range */
proc sql noprint;
select segment into :segment_1 -
from segments;
quit;
/* Option 2: oversized upper bound */
proc sql noprint;
select segment into :segment_1 - :segment_999
from segments;
quit;
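Two sketches for what the answer leaves out, assuming the WORK.SEGMENTS table from the question. First, one reason the original code fails: select count(*) into :count stores the number right-aligned with leading blanks, so :segment_&count. resolves to something like :segment_       4. The TRIMMED option (SAS 9.3 and later) removes those blanks. Second, the data step option: CALL SYMPUTX builds each macro variable name from _N_. The name N_SEGMENTS is made up for this example.

```sas
/* Option A: keep the original approach, but add TRIMMED so that &count */
/* holds "4" instead of a blank-padded value such as "       4".        */
proc sql noprint;
  select count(*) into :count trimmed from segments;
  select segment into :segment_1 - :segment_&count. from segments;
quit;

/* Option B: let a DATA step number the macro variables itself.         */
/* N_SEGMENTS is a hypothetical name for the total count.               */
data _null_;
  set segments end=eof;
  call symputx(cats('segment_', _n_), segment);
  if eof then call symputx('n_segments', _n_);
run;

%put &=count &=n_segments &=segment_1;
```

Option B is also the pattern used in the related question below, where one macro variable is created per ID.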

Related

How to create multiple macro variables using a loop in SAS Enterprise Guide?

I constantly use SAS datasets in SAS EG to create macro variables that I then use in queries from SAS EG to my internal servers. A macro variable is limited to 65,534 characters. When I need to pass 100k ids that are 9 to 15 digits long, the number of macro variables required really adds up. Is there a way to create a large number of macro variables with a loop instead of doing it manually?
For instance, the manual way to create these macro variables would be something like the following:
proc sql; create table alerts as select distinct review_id format best12. from q4_21_alerts order by review_id; quit;
proc sql; create table alerts1 as select review_id, monotonic() as number from alerts order by number; quit;
proc sql; select distinct review_id into :alert_ids1 separated by ',' from alerts1 where number between 1 and 7000; quit;
proc sql; select distinct review_id into :alert_ids2 separated by ',' from alerts1 where number between 7001 and 14000; quit;
proc sql; select distinct review_id into :alert_ids3 separated by ',' from alerts1 where number between 14001 and 21000; quit;
proc sql; select distinct review_id into :alert_ids4 separated by ',' from alerts1 where number between 21001 and 28000; quit;
...
proc sql; select distinct review_id into :alert_ids21 separated by ',' from alerts1 where number between 140001 and 147000; quit;
I am hoping to find a way to do something like the following:
N = 145417     # total number of review_ids that need to be stored in SAS macro variables
L = 8          # length (number of digits) of each review_id
L = L + 1      # add 1 for the comma separator in the macro variable
stop = N*L
i = 1
while(i<=stop){
  some code to create all 21 macro variables
}
and then be left with macro variables alert_ids1, alert_ids2, ..., alert_ids21 that together contain all 145,417 ids I need to use in a query against my internal servers.
Any help would be appreciated!
I've searched Google and SAS Communities, but I only have code that does this process manually.
I am not sure what your final query looks like, but I would advise building a SQL query that filters directly to the IDs you want, e.g.:
proc sql;
create table want as
select *
from have
where id in(select id from id_table)
;
quit;
But if you need a comma-separated list spread across macro variables that each stay within the 65,534-character limit, the safest way is to create one ID per macro variable. You can do this very easily with a data step.
data _null_;
set alerts1;
call symputx(cats('alert_id', _N_), review_id);
call symputx('n_ids', _N_);
run;
This will create the macro variables:
alert_id1
alert_id2
alert_id3
...
Now you need to create a loop that makes these all comma-separated.
%macro id_loop;
%do i = 1 %to &n_ids;
&&alert_id&i %if(&i < &n_ids) %then %do;,%end;
%end;
%mend;
Note that the code layout is a bit unusual in order to keep the generated output formatted correctly. Now run this macro and you'll see a comma-separated list of every alert ID:
%put %id_loop;
id1, id2, id3, ...
You can put this in a query, such as where alert_id in (%id_loop). Keep in mind that doing this will load up the symbol table with a ton of macro variables. This is not the recommended way to query, but it is one way to achieve what you asked.
Use a data step instead of SQL to create the macro variables.
You can even create a second macro variable that references all of the generated macro variables.
For example, say you have determined that you can always fit 1000 values into a single variable (the limit for a data step character variable is 32,767 bytes, versus the 64K limit of a macro variable). Then you could use a data step like this:
data _null_;
length string list $32767 ;
retain list;   /* LIST must keep its value across DATA step iterations */
group+1;
do i=1 to 1000 until(eof);
set alerts end=eof;
string=catx(',',string,review_id);
end;
call symputx(cats('alert_id',group),string);
list = catx(',',list,cats('&alert_id',group));
if eof then call symputx('alerts',list);
run;
Now you can use that single macro variable ALERTS that consists of the string
&alert_id1,&alert_id2,....
in your SQL code:
where review_id in (&alerts)
This filters on all of the values in the ALERTS dataset, even if the total string is longer than 64K. Since you put 1000 values into each macro variable, and you can fit about 3000 references to those macro variables into the ALERTS macro variable, you could store up to 3 million values.
Of course you might hit a limit on what the SQL processor can handle.
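For the loop the question literally asks for, here is a sketch that runs the question's PROC SQL step once per chunk of rows, assuming the ALERTS1 table (with its NUMBER column) from the question. The macro name MAKE_ID_LISTS, its DATA= and CHUNK= parameters, and N_ALERT_LISTS are hypothetical. %SYSEVALF(..., ceil) rounds the number of chunks up, so 145,417 rows with CHUNK=7000 give 21 macro variables. Choose CHUNK so that CHUNK * (digits + 1) stays under 65,534; with 15-digit ids that is at most about 4,000 values per macro variable.

```sas
/* Hypothetical macro: one comma-separated list per CHUNK rows of DATA= */
%macro make_id_lists(data=alerts1, chunk=7000);
  %local n_rows n_groups g;
  proc sql noprint;
    select count(*) into :n_rows trimmed from &data;
  quit;
  /* Round up so the last, partial chunk gets its own macro variable */
  %let n_groups = %sysevalf(&n_rows / &chunk, ceil);
  %do g = 1 %to &n_groups;
    %global alert_ids&g;
    proc sql noprint;
      /* PUT(...,20.) avoids the E-notation BEST12. produces for long ids; */
      /* SEPARATED BY trims the blanks around each value                   */
      select put(review_id, 20.) into :alert_ids&g separated by ','
      from &data
      where number between %eval((&g - 1) * &chunk + 1) and %eval(&g * &chunk);
    quit;
  %end;
  %global n_alert_lists;
  %let n_alert_lists = &n_groups;
%mend make_id_lists;

%make_id_lists(chunk=7000)
%put &=n_alert_lists;
```

The same symbol-table caveat applies: each chunk still becomes its own global macro variable.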

Changing FROM statement with a variable

I am trying to change the name of the table I am getting my data from, like this:
COREPOUT.KUNDE_REA_UDL_202112 --> COREPOUT.KUNDE_REA_UDL_202203
I create my variable like this:
PROC SQL NOPRINT;
SELECT DISTINCT
PERIOKVT_PREV_BANKSL_I_YYMMN6
INTO :PERIOKVT_PREV_BANKSL_I_YYMMN6
FROM Datostamp_PREV_Kvartal;
QUIT;
This is the code I want to use the variable for.
%_eg_conditional_dropds(WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
SELECT t1.Z_ORDINATE,
(input(t1.cpr_se,w.)) AS KundeNum
FROM COREPOUT.KUNDE_REA_UDL_202203 t1;
QUIT;
I have tried things like:
FROM string("COREPOUT.KUNDE_REA_UDL_",PERIOKVT_PREV_BANKSL_I_YYMMN6," t1";
I hope you can point me in the right direction.
Use & to reference a macro variable so that its value is resolved into the code (e.g. &PERIOKVT_PREV_BANKSL_I_YYMMN6).
proc sql noprint;
select distinct PERIOKVT_PREV_BANKSL_I_YYMMN6
into :PERIOKVT_PREV_BANKSL_I_YYMMN6 trimmed
from Datostamp_PREV_Kvartal
;
quit;
proc sql;
create table WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
select t1.Z_ORDINATE,
(input(t1.cpr_se,32.)) AS KundeNum   /* 32. instead of the undefined informat w. */
from COREPOUT.KUNDE_REA_UDL_&PERIOKVT_PREV_BANKSL_I_YYMMN6 t1
;
quit;
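The macro reference is replaced by its value exactly where it appears, so the prefix and the period are simply spliced together. A quick way to confirm the table name before running the query is to build it once and check it in the log. The value 202203 and the macro variable name DSN are hypothetical; EXIST returns 1 if the table exists and 0 otherwise.

```sas
/* Hypothetical value; in the question it comes from Datostamp_PREV_Kvartal */
%let PERIOKVT_PREV_BANKSL_I_YYMMN6 = 202203;

/* The reference resolves directly after the COREPOUT.KUNDE_REA_UDL_ prefix */
%let dsn = COREPOUT.KUNDE_REA_UDL_&PERIOKVT_PREV_BANKSL_I_YYMMN6;

/* EXIST returns 1 if the table exists, 0 otherwise */
%put NOTE: dsn=&dsn exists=%sysfunc(exist(&dsn));
```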
You can use CALL SYMPUTX() to move values from a dataset into a macro variable.
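data _null_;
set Datostamp_PREV_Kvartal;
/* VVALUE returns the formatted value (e.g. 202203); add the library and prefix */
call symputx('dataset_name',cats('COREPOUT.KUNDE_REA_UDL_',vvalue(PERIOKVT_PREV_BANKSL_I_YYMMN6)));
stop;
run;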
Then use the value of the macro variable to insert the dataset name into the code at the appropriate place. So your posted SQL is equivalent to this simple data step.
data QUERY_FOR_KUNDE_REA_UDL_20_0000;
set &dataset_name. ;
KundeNum = input(cpr_se,32.);
keep Z_ORDINATE KundeNum;
run;
Note: I did not see a definition of a user-defined informat named W in your posted code, so I replaced it with the normal numeric informat, since it looked like you were trying to convert a character value into a number.
The solution I ended up with was inspired by @Stu Sztukowski's response:
I used a data step to concatenate the value into a full table reference and then created a macro variable from it.
data Concat_var;
str_PERIOKVT_PREV_YYMMN6 = CAT("COREPOUT.KUNDE_REA_UDL_",&PERIOKVT_PREV_BANKSL_I_YYMMN6," t1");
run;
PROC SQL NOPRINT;
SELECT DISTINCT
str_PERIOKVT_PREV_YYMMN6
INTO :str_PERIOKVT_PREV_YYMMN6
FROM Concat_var;
QUIT;
Then I used the macro variable in the FROM clause:
%_eg_conditional_dropds(WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
SELECT t1.Z_ORDINATE,
(input(t1.cpr_se,w.)) AS KundeNum
FROM &str_PERIOKVT_PREV_YYMMN6;
QUIT;
I hope this helps someone else in the future.

Using macro for formula proc sql in SAS

I need some help with macros in SAS. I want to sum variables (for example, from v_1 to v_7) to aggregate them, grouping by year. There are plenty of them, so I want to use a macro. However, it doesn't work (I only get v_1). I would really appreciate your help.
%macro my_macro();
%local i;
%do i = 1 %to 7;
proc sql;
create table my_table as select
year,
sum(v_&i.) as v_&i.
from my_table
group by year
;
quit;
%end;
%mend;
/* I don't know how to run this macro - is it ok? */
data run_macro;
set my_table;
%my_macro();
run;
The macro processor just generates SAS code and then passes it on to SAS to run. You are calling a macro that generates complete SAS steps in the middle of your DATA step, so you are effectively trying to run this code:
data run_macro;
set my_table;
proc sql;
create table my_table as select
year,
sum(v_1) as v_1
from my_table
group by year
;
quit;
proc sql;
create table my_table as select
year,
sum(v_2) as v_2
from my_table
group by year
;
quit;
...
So first you make a copy of MY_TABLE as RUN_MACRO. Then you overwrite MY_TABLE with a collapsed version that has just two variables and only one observation per year. Then you try to collapse it again, but this time you reference a variable named V_2 that no longer exists.
If you simply move the %DO loop inside the generated SQL statement, it should work. Also, don't overwrite your input dataset. Here is a version of the macro that creates a new dataset named MY_NEW_TABLE with 8 variables (YEAR plus V_1 to V_7) from the existing dataset named MY_TABLE.
%macro my_macro();
%local i;
proc sql;
create table my_NEW_table as
select year
%do i = 1 %to 7;
, sum(v_&i.) as v_&i.
%end;
from my_table
group by year
;
quit;
%mend;
%my_macro;
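Because the macro processor only generates code, it helps to look at what it generated. A sketch of a parameterized version of the macro above, with the MPRINT option turned on so the generated PROC SQL step is written to the log. The macro name SUM_BY_YEAR and its DATA=, OUT= and N= parameters are hypothetical.

```sas
/* Write the code generated by macros to the log */
options mprint;

/* Hypothetical parameters: DATA= input, OUT= output, N= number of V_ columns */
%macro sum_by_year(data=my_table, out=my_new_table, n=7);
  %local i;
  proc sql;
    create table &out as
    select year
    %do i = 1 %to &n;
      , sum(v_&i.) as v_&i.
    %end;
    from &data
    group by year;
  quit;
%mend sum_by_year;

%sum_by_year(n=7)
```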
Note: if this is all you are doing, just use PROC SUMMARY. With regular SAS code (instead of SQL code) you can use variable lists like v_1-v_7, so there is no need for code generation.
proc summary nway data=my_table ;
class year ;
var v_1 - v_7;
output out=my_NEW_table sum=;
run;

check whether proc append was successful

I have some code which appends yesterday's data to [large dataset], using proc append. After doing so it changes the value of the variable "latest_date" in another dataset to yesterday's date, thus showing the maximum date value in [large dataset] without a time-consuming data step or proc sql.
How can I check, within the same program in which proc append is used, whether proc append was successful (no errors)? My goal is to change the "latest_date" variable in this secondary dataset only if the append is successful.
Try the automatic macro variable &SYSCC.
data test;
do i=1 to 10;
output;
end;
run;
data t1;
i=11;
run;
data t2;
XXX=12;
run;
proc append base=test data=t1;
run;
%put &syscc;
proc append base=test data=t2;
run;
%put &syscc;
I'm using the %get_table_size macro, which I found here. My steps are
run %get_table_size(large_table, size_preappend)
Create dataset called to_append
run %get_table_size(to_append, append_size)
run proc append
run %get_table_size(large_table, size_postappend)
Check if &size_postappend = &size_preappend + &append_size
Using &syscc isn't exactly what I wanted, because it doesn't check specifically for an error in proc append. It could be thrown off by earlier errors.
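If earlier errors make &SYSCC unreliable, the automatic macro variable &SYSERR may fit better: SAS resets it at each step boundary, so right after PROC APPEND's RUN it describes only that step (0 means no errors or warnings). A minimal sketch, assuming the TEST and T1 datasets from the answer above; the macro APPEND_IF_OK and the WORK.LATEST_DATE dataset are hypothetical stand-ins for the real secondary dataset.

```sas
%macro append_if_ok(base=, data=);
  proc append base=&base data=&data;
  run;
  /* SYSERR now describes the PROC APPEND step only; 0 = no errors or warnings */
  %if &syserr = 0 %then %do;
    /* Hypothetical stand-in for the secondary dataset holding LATEST_DATE */
    data work.latest_date;
      latest_date = today() - 1;
      format latest_date date9.;
    run;
  %end;
  %else %put WARNING: PROC APPEND ended with SYSERR=&syserr, LATEST_DATE not updated.;
%mend append_if_ok;

%append_if_ok(base=test, data=t1)
```

Calling it with data=t2 instead should take the %else branch, because T2's variable XXX does not exist in TEST.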
You can do this by counting how many records are in the table before and after appending. This works with any SAS table or database table.
Best practice is to always have a control table for your process that logs the run time and the number of records read.
Code:
/*Create input data*/
data work.t1;
input row ;
datalines;
1
2
;;
run;
data work.t2;
input row ;
datalines;
3
;;
run;
/*Create the control table. Run this bit only once; otherwise you delete its contents every time*/
data work.cntrl;
length load_dt source delta total 8;
format load_dt datetime21.;
run;
proc sql; delete from work.cntrl; quit;
/*Count Records before append*/
proc sql noprint ; select count(*) into: count_t1 from work.t1; quit;
proc sql noprint; select count(*) into: count_t2 from work.t2; quit;
/*Append data*/
proc append base=work.t1 data=work.t2 ; run;
/*Count Records after append*/
proc sql noprint ; select count(*) into: count_final from work.t1; quit;
/*Insert counts and timestamp into the control table*/
proc sql noprint; insert into work.cntrl
values(%sysfunc(datetime()), &count_t1. , &count_t2., &count_final.) ;
quit;
Output: Control table is updated
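To finish the check described in the question (post-append size equals pre-append size plus appended size), the counts captured above can be compared in a small macro before anything else is updated. A sketch; the macro name CHECK_APPEND is hypothetical. %EVAL tolerates the leading blanks that INTO stores for numeric counts.

```sas
%macro check_append;
  /* Compare the row counts captured before and after PROC APPEND */
  %if %eval(&count_t1 + &count_t2) = %eval(&count_final) %then
    %put NOTE: Append verified, WORK.T1 has %left(&count_final) rows.;
  %else
    %put ERROR: Row count mismatch after PROC APPEND.;
%mend check_append;

%check_append
```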

Update Oracle table from SAS dataset

How do I update an Oracle table in SAS from a SAS dataset?
Here's the scenario:
Through a libname, I load an Oracle table into a SAS dataset.
I do some data processing, during which I UPDATE some values, INSERT some new observations and DELETE some observations in the dataset.
I need to update the original Oracle table with the dataset I modified in the previous step: when the keys of the Oracle table and the dataset match, the values should be updated; when a key is missing from the Oracle table, the row should be inserted; and when a key is in the Oracle table but has been deleted from the dataset, it should be deleted from the Oracle table.
NOTE: I can not create a new table in Oracle. I need to make the "updating" on the original table.
I was trying to do it in two steps using MERGE INTO and DELETE, but there's no MERGE INTO in PROC SQL.
I would really appreciate any help.
EDIT: I was also thinking about just truncating the Oracle table and inserting the rows (about 4,000-5,000 rows per run), but it seems there's no built-in TRUNCATE statement in PROC SQL.
Please try one of the methods below.
Method 1:
PROC SQL;
insert into <User_Defined_Oracle_table>
select <variables>
from <SAS_Table>;
QUIT;
The step above loads the SAS data into a user-defined Oracle table that resides in the same database and schema as the table to be updated, so Oracle can see both tables in the pass-through step below.
PROC SQL;
connect to oracle (user= password= path=);
execute(
UPDATE <Oracle_table> a SET (<Columns to be updated, separated by commas>) = (SELECT <Columns to update, separated by commas>
FROM <User_Defined_Oracle_table> b
WHERE a.<VARIABLE>=b.<VARIABLE>)
WHERE exists (select * from <User_Defined_Oracle_table> b
WHERE a.<VARIABLE>=b.<VARIABLE> ))
by oracle;
QUIT;
PROC SQL;
connect to oracle (user= password= path=);
execute (truncate table <User_Defined_Oracle_table>) by oracle;
QUIT;
This is one of the more efficient ways to update the Oracle table.
Please refer to Update Oracle using SAS for more information.
Method 2:
LIBNAME Sample oracle user= password= path= schema= ;
PROC SQL;
UPDATE Sample.<Table_Name> as a SET <Variable_Name> = (SELECT <Variable>
FROM <Sas_table> as b
WHERE a.<Key_Variable>=b.<Key_Variable>)
WHERE exists
(select * from <Sas_table> as b
WHERE a.<Key_Variable>=b.<Key_Variable>);
QUIT;
This method has the longest processing time of the three methods.
Also,
Method 3:
%MACRO update_oracle (SAS_Table,Oracle_Table);
%local i Count_Obs;
Proc sql noprint;
select count(*) into :Count_Obs trimmed from &SAS_Table; Quit;
%do i = 1 %to &Count_Obs;
/* read row &i of the SAS table into macro variables */
Proc sql noprint;
select <variables to update, separated by commas> into :<macros, separated by commas>
from &SAS_Table (firstobs=&i obs=&i); Quit;
PROC SQL;
UPDATE &Oracle_Table
SET <Oracle_Variable_to_Update>=<Variable_macro_created_above>
WHERE <Oracle_Key_Variable>=<Key_macro_created_above>;
QUIT;
%end;
%MEND update_oracle;
%update_oracle(<SAS_Table>,<Oracle_Table>);
The macro parameters SAS_Table and Oracle_Table are the SAS dataset that contains the records to update and the Oracle table whose records are to be updated, respectively.
Method 3 uses less processing time than Method 2, but it is not as efficient as Method 1.
PROC SQL does support UPDATE and INSERT. Also, check whether SAS lets you run other SQL operations in the style of PL/SQL's EXECUTE IMMEDIATE, where you construct the SQL statement as a string and then send it to Oracle to execute.
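PROC SQL itself has no TRUNCATE statement, but the truncate-and-reload idea from the question's EDIT can be sent to Oracle through the pass-through facility, the same way Method 1 truncates its staging table. A minimal sketch; the ORALIB libref, TARGET_TBL, WORK.MODIFIED and the ORA_USER, ORA_PW, ORA_PATH and ORA_SCHEMA macro variables are hypothetical. Keep in mind that TRUNCATE is DDL in Oracle and commits immediately, so if the reload fails the table stays empty.

```sas
/* Hypothetical connection values; replace with your own */
libname oralib oracle user=&ora_user password=&ora_pw
        path=&ora_path schema=&ora_schema;

/* 1. Empty the target table inside Oracle (pass-through TRUNCATE) */
proc sql;
  connect to oracle (user=&ora_user password=&ora_pw path=&ora_path);
  execute (truncate table &ora_schema..target_tbl) by oracle;
  disconnect from oracle;
quit;

/* 2. Reload it from the modified SAS dataset */
proc append base=oralib.target_tbl data=work.modified;
run;
```

This avoids creating a new Oracle table, which the question rules out; it does require that the SAS dataset's columns match the Oracle table's columns.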