Conditional merge in SAS PROC SQL

In my original code, I want to merge several data sets:
%let theft=1;
proc sql;
create table NewTable as
Select *
From
dtm_sitecom.FactSitecAllCriteria SITEC
left join dtm_sitecom.FactClaimLastVersion FactClaimLastVersion
on FactClaimLastVersion.ClaimId = Sitec.ClaimId
/*Condition 1 here*/
/*If &Theft=1 then*/
left join dtm_sitecom.DimClaimState DimClaimState
on DimClaimState.ClaimStateKey = FactClaimLastVersion.ClaimStateKey
/*Condition 2 here*/
/*else if &Theft = 0 then*/
left JOIN dtm_sitecom.DimGeoRisk dgr
ON dgr.GeoRiskKey = SITEC.GeoRiskKey;
quit;
In the upgraded version, I want to add some conditions (/*Condition 1 here*/ and /*Condition 2 here*/ in the code): if the macro variable Theft is 1, join the table dtm_sitecom.DimClaimState; otherwise, join the other table (dtm_sitecom.DimGeoRisk). I tried the code with the comment markers around the IF lines removed, but it failed with a syntax error. Is there any SAS syntax that lets me do this?

If the condition is based on a variable in the data, you can include it in the ON clause:
data class_attach_m;
set sashelp.class;
attach_var='M';
run;
data class_attach_f;
set sashelp.class;
attach_var='F';
run;
proc sql;
create table class as
select class.*, coalescec(class_attach_m.attach_var,class_attach_f.attach_var) as attach_var from sashelp.class
left join class_attach_m
on class.name = class_attach_m.name
and class.sex='M'
left join class_attach_f
on class.name = class_attach_f.name
and class.sex='F'
;
quit;
COALESCEC (or COALESCE for numeric values) returns its first non-missing argument, which combines the two brought-in fields into one.
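A minimal illustration of the two functions on hypothetical values (the variable names here are made up for the sketch): COALESCEC returns its first non-blank character argument, and COALESCE its first non-missing numeric argument.

```sas
/* Hypothetical values: one side of the join matched, the other did not */
data coalesce_demo;
  length from_m from_f picked $1;
  from_m = ' ';                          /* no match in CLASS_ATTACH_M */
  from_f = 'F';                          /* match in CLASS_ATTACH_F */
  picked = coalescec(from_m, from_f);    /* first non-blank value: 'F' */

  num_m = .;
  num_f = 2;
  picked_num = coalesce(num_m, num_f);   /* first non-missing value: 2 */
run;
```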
If you are using macro variables to control which join runs, though, you need to run this inside a macro (SAS 9.4M5 and later also accept simple %IF-%THEN-%DO blocks in open code); in a macro context, you can use %IF to control what is executed.
For example, reusing the previous example (I use a macro parameter rather than %LET because that is the cleaner approach, although you can still set a macro variable with %LET earlier and pass it in the macro call):
%macro attach_which(attach_m=M);
proc sql;
create table class as
select class.*, attach_var from sashelp.class
%if &attach_m=M %then %do;
left join class_attach_m
on class.name = class_attach_m.name
%end;
%else %do;
left join class_attach_f
on class.name = class_attach_f.name
%end;
;
quit;
%mend attach_which;
%attach_which(attach_m=M);
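Applied to the query from the question, the same pattern might look like this. It is a sketch: the macro name NEWTABLE and its THEFT parameter are made up for illustration, while the table and column names are taken from the question.

```sas
/* Hypothetical macro NEWTABLE: THEFT=1 joins DimClaimState, any other value joins DimGeoRisk */
%macro newtable(theft=1);
proc sql;
create table NewTable as
select *
from dtm_sitecom.FactSitecAllCriteria SITEC
left join dtm_sitecom.FactClaimLastVersion FactClaimLastVersion
  on FactClaimLastVersion.ClaimId = SITEC.ClaimId
%if &theft=1 %then %do;
/* Condition 1 */
left join dtm_sitecom.DimClaimState DimClaimState
  on DimClaimState.ClaimStateKey = FactClaimLastVersion.ClaimStateKey
%end;
%else %do;
/* Condition 2 */
left join dtm_sitecom.DimGeoRisk dgr
  on dgr.GeoRiskKey = SITEC.GeoRiskKey
%end;
;
quit;
%mend newtable;
```

Call it with %newtable(theft=1); or %newtable(theft=0);. Because SELECT * pulls columns from every joined table, SAS warns about duplicate column names such as ClaimId; listing only the columns you need avoids that.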

Related

Using macro for formula proc sql in SAS

I need some help with macros in SAS. I want to sum variables (for example, v_1 to v_7) to aggregate them, grouping by year. There are many of them, so I want to use a macro. However, it doesn't work (I get only v_1). I would really appreciate your help.
%macro my_macro();
%local i;
%do i = 1 %to 7;
proc sql;
create table my_table as select
year,
sum(v_&i.) as v_&i.
from my_table
group by year
;
quit;
%end;
%mend;
/* I don't know how to run this macro - is it ok? */
data run_macro;
set my_table;
%my_macro();
run;
The macro processor just generates SAS code and then passes it on to SAS to run. You are calling a macro that generates complete SAS steps in the middle of your DATA step, so you are trying to run this code:
data run_macro;
set my_table;
proc sql;
create table my_table as select
year,
sum(v_1) as v_1
from my_table
group by year
;
quit;
proc sql;
create table my_table as select
year,
sum(v_2) as v_2
from my_table
group by year
;
quit;
...
So first you make a copy of MY_TABLE as RUN_MACRO. Then you overwrite MY_TABLE with a collapsed version of MY_TABLE that has just two variables and only one observation per year. Then you try to collapse it again, but this time you reference a variable named V_2 that no longer exists.
If you simply move the %DO loop inside the generation of the SQL statement, it should work. Also, don't overwrite your input dataset. Here is a version of the macro that creates a new dataset named MY_NEW_TABLE with 8 variables from the existing dataset named MY_TABLE.
%macro my_macro();
%local i;
proc sql;
create table my_NEW_table as
select year
%do i = 1 %to 7;
, sum(v_&i.) as v_&i.
%end;
from my_table
group by year
;
quit;
%mend;
%my_macro;
Note that if this is all you are doing, you can just use PROC SUMMARY. Regular SAS code, unlike SQL, accepts variable lists such as v_1-v_7, so there is no need for code generation.
proc summary nway data=my_table ;
class year ;
var v_1 - v_7;
output out=my_NEW_table sum=;
run;
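To try the PROC SUMMARY approach without real data, here is a sketch that first builds a small hypothetical MY_TABLE (two years, three rows each). PROC SUMMARY also writes the automatic _TYPE_ and _FREQ_ variables to the output dataset, so they are dropped here to get the same columns as the SQL version.

```sas
/* Hypothetical test data: years 2020 and 2021, three rows each, v_j = j */
data my_table;
  array v{7} v_1-v_7;
  do year = 2020, 2021;
    do rep = 1 to 3;
      do j = 1 to 7;
        v{j} = j;
      end;
      output;
    end;
  end;
  drop rep j;
run;

/* One row per year; SUM= with no names keeps the original variable names */
proc summary nway data=my_table;
  class year;
  var v_1-v_7;
  output out=my_new_table(drop=_type_ _freq_) sum=;
run;
```

Each year then has v_1 = 3 and v_7 = 21.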

SAS union distinct records from datasets with similar names

I have about 100 large datasets, and from each one I want to extract the distinct IDs and stack the results vertically. The datasets are unsorted and named data_01, data_02, data_03, ..., data_100.
Because the datasets are all very large, stacking them without first reducing them is not feasible; the join made no progress after hours of running. I therefore believe each dataset needs to be reduced before stacking, and I'm here to seek some help.
I tried to create a macro that selects the distinct IDs and sums a numeric variable, cnt, by ID before stacking all datasets with a PROC SQL UNION. The macro is not working properly:
/*Get dataset names*/
proc sql noprint;
select memname into :mylist separated by ' '
from dictionary.tables where libname= "mylib" and upcase(memname) like "DATA_%"
;
quit;
%put &mylist;
/*create union statements*/
%global nextdata;
%let nextdata =;
%macro combinedata(mylist);
data _null_;
datanum = countw("&mylist");
call symput('Dataset', put(datanum, 10.));
run;
%do i = 1 %to &Dataset ;
data _null_;
temp = scan("&mylist", &i);
call symput("Dataname", strip(put(temp,$12.)));
run;
%put &Dataname;
%put &Dataset;
%if (&i=&Dataset) %then %do;
%let nextdata = &nextdata.
select id, sum(cnt)
from mylib.&&Dataname
group by id;
%end;
%else %do;
%let nextdata = &nextdata.
select id, sum(cnt)
from mylib.&&Dataname union
group by id;
%end;
%put nextdata = &nextdata;
%end;
%mend combinedata;
%combinedata(&mylist);
/*execute from proc sql*/
proc sql;
create table combined as (&nextdata);
quit;
I have also attempted to use proc summary, but there was not enough memory to run the following code:
data vneed / view=vneed;
set data_: (keep=id cnt);
run;
proc summary data=vneed nway;
class id;
var cnt;
output out=want (drop=_type_) sum=sumcnt;
run;
Appreciate any help!
If the number of values of ID is reasonable you should be able to use a hash object.
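data _null_ ;
  /* On the first iteration, create a hash table keyed by ID that holds the running total */
  if _n_=1 then do;
    dcl hash H (ordered: "A") ;
    h.definekey ("ID") ;
    h.definedata ("ID", "SUMCNT") ;
    h.definedone () ;
  end;
  /* Read all DATA_ datasets in sequence, keeping only the needed columns */
  set data_: (keep=id cnt) end=eof;
  /* FIND() returns 0 and loads SUMCNT when the ID is already stored; otherwise start from missing */
  if h.find() then sumcnt=.;
  sumcnt+cnt ;
  /* Store the updated total for this ID */
  h.replace() ;
  /* After the last observation, write the table (ordered by ID) to WANT */
  if eof then h.output (dataset: "WANT") ;
run ;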
If the number of ID values is too large to fit the summary data into a HASH object you could adapt this code to stop at some reasonable number of distinct ID values to avoid memory overload and write the current summary to an actual SAS dataset and then generate the final counts by re-aggregating the intermediate datasets. But at that point you should just use my other answer and let PROC SQL create the intermediate summary datasets instead.
Summarize the data as you go instead of trying to generate one massive query. Then re-aggregate the aggregates.
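/* %DO only works inside a macro, so the steps are wrapped in one (named SUMDATA here) */
%macro sumdata(mylist);
%local i n dataname;
%let n = %sysfunc(countw(&mylist,%str( )));
proc sql ;
%do i = 1 %to &n ;
  %let dataname=mylib.%scan(&mylist,&i,%str( ));
  /* One small summary table per input dataset, sorted by ID */
  create table sum&i as
    select id,sum(cnt) as cnt
    from &dataname
    group by id
    order by id
  ;
%end;
quit;
/* Interleave the sorted summaries and total CNT within each ID */
data want ;
  do until(last.id);
    set sum1 - sum&n ;
    by id;
    sumcnt=sum(sumcnt,cnt);
  end;
  drop cnt;
run;
%mend sumdata;
%sumdata(&mylist);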
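The re-aggregation step can be checked on its own with two tiny hypothetical summary tables, SUM1 and SUM2, each already sorted by ID. The sketch accumulates with the SUM function rather than a SUM statement such as sumcnt+cnt, because a SUM statement retains its value from one BY group to the next in this DO UNTIL(LAST.ID) pattern.

```sas
/* Hypothetical per-dataset summaries, each sorted by ID */
data sum1;
  input id cnt;
  datalines;
1 7
2 3
;
data sum2;
  input id cnt;
  datalines;
2 4
3 1
;

/* One DATA step iteration per ID; SUMCNT is not retained, so it restarts at missing */
data want;
  do until(last.id);
    set sum1-sum2;
    by id;
    sumcnt = sum(sumcnt, cnt);
  end;
  drop cnt;
run;
```

WANT then holds one row per ID: 1 → 7, 2 → 7, 3 → 1.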

ORA-00928: missing SELECT keyword, SQL statement was not passed to the DBMS, SAS will do the processing

I am so frustrated with this piece of code. I'm trying to pass in values using SYSPBUFF, which I do all the time. However, I want to pass in multiple values, and this UNION code is giving me trouble. I'm going from Oracle to SAS, which I assume is causing the problem, but I'd like to understand why. Previously, I had the source tables in temp space (SAS) and didn't get this error. But when I had to create the tables in MYDB (Oracle) for a specific reason, I started getting a large log with "failure to pass through" errors.
Interestingly, the code actually works and does what I want, but I get a pop-up saying the log is too large and will open externally. Then it opens a HUGE text file full of errors, basically saying that the code couldn't be passed to Oracle through implicit pass-through. I wasn't trying to use pass-through for this piece of code. So, again, it works and I ultimately get what I want, but the log issue is driving me bonkers.
%macro ALLPROVTYPE() / parmbuff;
%do ii = 1 %to %sysfunc(countw(%bquote(&syspbuff.)));
%let FT=%scan(%bquote(&SYSPBUFF),&ii);
CREATE TABLE MYSASLIB.ALLST_PROV_&FT._NULL AS
SELECT "AK" AS STATE,*
FROM MYDB.AK_PROV_&FT
%macro JNSTS() / parmbuff;
%do i = 1 %to %sysfunc(countw(%bquote(&syspbuff.)));
%let ST=%scan(%bquote(&SYSPBUFF),&i);
UNION CORR
SELECT "&ST" AS STATE,*
FROM MYDB.&ST._PROV_&FT
%end;
%mend JNSTS;
%JNSTS(&&PROVALL&FT);
;
%end;
%mend ALLPROVTYPE;
PROC SQL;
%ALLPROVTYPE(&PROVNUMS);
QUIT;
ACCESS ENGINE: ERROR: ORACLE prepare error: ORA-00928: missing SELECT keyword. SQL statement: DEBUG: DBMS engine returned an error - NO Implicit Passthru.
DEBUG: Error during prepare of:
As I understand this query, you are creating multiple tables, and each table is created from a query built out of multiple SELECT statements joined with UNION CORR. Essentially something like:
create table <something> as
(select <something> as state, * from <something> union corr
select <something> as state, * from <something> union corr
select <something> as state, * from <something>);
Is this correct?
If yes, your macro code has some problematic nesting (JNSTS is defined inside ALLPROVTYPE). Try the following code (though I wasn't able to fully verify it, since I don't have the inputs to the macros):
/* Since this needs to be passed between the two macros */
%global FT;
%macro ALLPROVTYPE() / parmbuff;
%do ii = 1 %to %sysfunc(countw(%bquote(&syspbuff.)));
%let FT=%scan(%bquote(&SYSPBUFF),&ii);
CREATE TABLE MYSASLIB.ALLST_PROV_&FT._NULL AS (
%JNSTS(&&PROVALL&FT)
);
%end;
%mend;
%macro JNSTS() / parmbuff;
%do jj = 1 %to %sysfunc(countw(%bquote(&syspbuff.)));
%let ST=%scan(%bquote(&SYSPBUFF),&jj);
SELECT "&ST" AS STATE,* FROM MYDB.&ST._PROV_&FT
%if &jj NE %sysfunc(countw(%bquote(&syspbuff.))) %then
%do;
UNION CORR
%end;
%end;
%mend;
PROC SQL;
%ALLPROVTYPE(&PROVNUMS);
QUIT;
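On the "why" part of the question, a hedged explanation: with implicit pass-through, SAS first tries to hand the whole query to Oracle, and UNION CORR is SAS SQL syntax that Oracle does not support, so Oracle rejects it and SAS falls back to processing the query itself. That is why the result is still correct. If the goal is only to stop SAS from attempting pass-through for this step, the PROC SQL option NOIPASSTHRU disables implicit pass-through. The DEBUG lines in the log suggest the SASTRACE option is turned on in the session; that is an assumption here, so the SASTRACE line is only needed if it is.

```sas
/* Assumption: the ACCESS ENGINE / DEBUG lines come from SASTRACE being on */
options sastrace=off;

/* NOIPASSTHRU: do not try to pass this query to Oracle; SAS processes it directly */
proc sql noipassthru;
%ALLPROVTYPE(&PROVNUMS);
quit;
```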

Using a SAS macro variable to select all values using the IN operator in PROC SQL

In a SAS script I have a macro variable which is later used in an SQL in statement in a PROC SQL step.
%let my_list = (1,2,3);
proc sql;
select *
from my_table
where var1 in &my_list.
;
quit;
This works fine, but I need some flexibility: I also want to be able to select ALL rows by changing only the macro variable, not the SQL code itself.
Is there a trick to specify the macro variable so that it selects ALL rows while still using the IN operator (avoiding a subquery solution that fills all possible values into the macro variable)?
You could change your code to
%let where_clause = var1 in (1,2,3);
proc sql;
select *
from my_table
where &where_clause
;
quit;
Then, to select all rows, change the macro variable to 1=1:
%let where_clause = 1=1;
proc sql;
select *
from my_table
where &where_clause
;
quit;
Or, if you want to keep your code unchanged, you can change the macro variable so that the WHERE clause is always true:
%let my_list = (1) or 1=1;
proc sql;
select *
from my_table
where var1 in &my_list
;
quit;
(dirty but gets the job done)
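A self-contained check of the second trick against SASHELP.CLASS, with the AGE column standing in for var1 (the macro variables N_SOME and N_ALL are made up for the sketch):

```sas
/* Filter to ages 11 and 12 */
%let my_list = (11,12);
proc sql noprint;
  select count(*) into :n_some trimmed
  from sashelp.class
  where age in &my_list;
quit;

/* Same query text; the IN condition is now OR-ed with an always-true test */
%let my_list = (11) or 1=1;
proc sql noprint;
  select count(*) into :n_all trimmed
  from sashelp.class
  where age in &my_list;
quit;

%put Filtered: &n_some rows, unfiltered: &n_all rows;
```

SASHELP.CLASS has 19 students, 7 of whom are aged 11 or 12, so the log shows 7 and 19.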

SAS Dynamic SQL join

I want to join 6 tables, which all have different variables, to one table that has the same columns as all 6 other tables. Can I do it without inspecting these tables to see which columns they have? I have a macro variable (a list) with the column names, but I cannot think of a good way to join these tables using it.
The list is created by this macro:
%macro getvars(dsn);
%global vlist;
proc sql noprint;
select name into :vlist separated by ' '
from dictionary.columns
where memname=upcase("&dsn");
quit;
%mend getvars;
I want to join the tables like this:
proc sql;
create table new_table as select * from table1 as l
left join table2 as r on l.age=r.age and l.type=r.type;
quit;
but not so manually :)
For example, table1 has the columns name, age, coef1 and sex, and table2 has name, region and coef2. The third table, which I want to join them to, has name, age, sex, region, coef and many other columns. I want to write a program that doesn't know which table has which columns, but joins them so that the third table still has all of its columns plus coef1 and coef2.
This isn't an answer I'd normally recommend, as it can lead to unwanted results if you're not careful, but it could work for you in this instance. I'm proposing a natural join, which automatically joins on all columns with matching names, so you don't need to specify an ON clause. Here's example code:
proc sql;
create table want as select
*
from
a
natural left join
b
natural left join
c
;
quit;
As I say, be very careful to check the results.
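Because a natural join silently uses every shared column name, it can help to list those names before running it. A sketch, assuming two hypothetical WORK tables A and B built from SASHELP.CLASS; DICTIONARY.COLUMNS stores library and member names in uppercase, and SAS column names are case-insensitive, hence the UPCASE.

```sas
/* Hypothetical inputs: A and B share only the NAME column */
data a; set sashelp.class(keep=name sex age); run;
data b; set sashelp.class(keep=name height); run;

/* Column names present in both tables = the columns a NATURAL JOIN would match on */
proc sql;
  create table shared_cols as
  select x.name
  from dictionary.columns as x
  inner join dictionary.columns as y
    on upcase(x.name) = upcase(y.name)
  where x.libname = 'WORK' and x.memname = 'A'
    and y.libname = 'WORK' and y.memname = 'B';
quit;
```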
Here's one method...
First, use DICTIONARY.COLUMNS to find the variables each table shares with the 'master' table. Then dynamically generate the join criteria for the tables with common variables, and finally join them all together on those criteria.
%MACRO COMMONJOIN(DSN,DSNLIST) ;
%LOCAL DSNC D I ; /* keep loop and helper variables out of the global table */
%LET DSNC = %SYSFUNC(countw(&DSNLIST,%STR( ))) ; /* # of additional tables */
/* Create a list of variables from primary DSN, with flags where variable exists in DSNLIST datasets */
proc sql ;
create table commonvars as
select a.name %DO I = 1 %TO &DSNC ;
%LET D = %SYSFUNC(scan(&DSNLIST,&I,%STR( ))) ;
, d&I..V&I label="&D"
%END ;
from dictionary.columns a
%DO I = 1 %TO &DSNC ;
/* Iterate over list of dataset names */
%LET D = %SYSFUNC(scan(&DSNLIST,&I,%STR( ))) ;
left join
(select name, 1 as V&I
from dictionary.columns
where libname = scan(upcase("&D"),1,'.')
and memname = scan(upcase("&D"),2,'.'))
as d&I on a.name = d&I..name
%END ;
where libname = scan(upcase("&DSN"),1,'.')
and memname = scan(upcase("&DSN"),2,'.')
;
quit ;
/* Create join criteria between master & each secondary table */
%DO I = 1 %TO &DSNC ;
%LET JOIN&I = ;
proc sql noprint ; /* only build the macro variable; don't print the list */
select catx(' = ',cats('a.',name),cats("V&I..",name)) into :JOIN&I separated by ' and '
from commonvars
where V&I = 1 ;
quit ;
%END ;
/* Join */
proc sql ;
create table masterjoin as
select a.*
%DO I = 1 %TO &DSNC ;
%IF "&&JOIN&I" ne "" %THEN %DO ;
, V&I..*
%END ;
%END ;
from &DSN as a
%DO I = 1 %TO &DSNC ;
%IF "&&JOIN&I" ne "" %THEN %DO ;
%LET D = %SYSFUNC(scan(&DSNLIST,&I,%STR( ))) ;
left join &D as V&I on &&JOIN&I
%END ;
%END ;
;
quit ;
%MEND ;
%COMMONJOIN(work.master,work.table1 work.table2 work.table3) ;
If you are open to using a data step instead of proc sql you may be in luck.
/* pre-sorting is required for SAS merge */
proc sort data=master; by key1 key2; run;
proc sort data=table1; by key1 key2; run;
proc sort data=table2; by key1 key2; run;
proc sort data=table3; by key1 key2; run;
data want;
merge master (in=_inMaster) table1 table2 table3;
by key1 key2;
/* for a Left Join, keep all rows from Master */
if _inMaster;
run;
The only gotcha I can think of is a common variable name among the non-key fields. If more than one table has variable x, the right-most table's value of x overwrites the previous ones. SAS notes this in the log only when the MSGLEVEL=I system option is set.
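A small sketch of that overwrite behavior on two hypothetical datasets, LEFT and RIGHT, that both carry a non-key variable X. With OPTIONS MSGLEVEL=I, SAS writes an INFO note about the overwrite to the log.

```sas
options msglevel=i;   /* ask SAS for INFO notes, including overwritten merge variables */

/* Hypothetical inputs, already sorted by KEY */
data left;
  input key x;
  datalines;
1 10
2 20
;
data right;
  input key x;
  datalines;
1 99
;

data merged;
  merge left(in=_inLeft) right;
  by key;
  if _inLeft;   /* left join: keep every row from LEFT */
run;

options msglevel=n;   /* restore the default message level */
```

For KEY=1 the value of X comes from RIGHT (99); for KEY=2, where RIGHT has no row, X keeps LEFT's value (20).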