Conditional Insert using SAS Proc SQL

I am trying to add records from one smaller table into a very large table if the primary key value for the rows in the smaller table is not in the larger one:
data test;
Length B C $4;
infile datalines delimiter=',';
input a b $ c $;
datalines;
1000,Test,File
2000,Test,File
3000,Test,File
;
data test2;
Length B C $4;
infile datalines delimiter=',';
input a b $ c $;
datalines;
1000,Test,File
4000,Test,File
;
proc sql;
insert into test
select * from test2
where a not in (select a from test2);
quit;
This, however, inserts no records into the table Test. Can anyone tell me what I am doing wrong? The end result should be that the row where a = 4000 is added to the table Test.
EDIT:
Using where a not in (select a from test) was what I originally tried and it generated the following error:
WARNING: This DELETE/INSERT statement recursively references the target table. A consequence of this is a possible data integrity problem.
ERROR: You cannot reopen WORK.TEST.DATA for update access with member-level control because WORK.TEST.DATA is in use by you in resource
environment SQL.
ERROR: PROC SQL could not undo this statement if an ERROR were to happen as it could not obtain exclusive access to the data set. This
statement will not execute as the SQL option UNDO_POLICY=REQUIRED is in effect.
224 quit;
Thanks

You can do the process in two steps: first create the table of records to insert, then insert them.
proc sql ;
create table to_add as
select * from test2
where a not in (select a from test)
;
insert into test select * from to_add ;
quit;
Or you could just change the setting for the UNDO_POLICY option and SAS will let you reference TEST while updating TEST.
proc sql undo_policy=none;
insert into test
select * from test2
where a not in (select a from test)
;
quit;
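If the NOT IN subquery turns out to be slow against a very large table, the two-step approach can also be written as a left join that keeps only the unmatched rows. This is a minimal sketch using the test and test2 tables from the question; whether it is actually faster depends on your data and indexes, so treat that as an assumption to check.

```sas
proc sql;
  /* Step 1: keep only the test2 rows whose key has no match in test */
  create table to_add as
  select t2.*
  from test2 as t2
  left join test as t
    on t2.a = t.a
  where t.a is missing;

  /* Step 2: TEST is no longer read and written in the same statement */
  insert into test
  select * from to_add;
quit;
```

Because the insert reads from to_add rather than from test, the default UNDO_POLICY=REQUIRED setting does not need to be changed.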

Related

check whether proc append was successful

I have some code which appends yesterday's data to [large dataset], using proc append. After doing so it changes the value of the variable "latest_date" in another dataset to yesterday's date, thus showing the maximum date value in [large dataset] without a time-consuming data step or proc sql.
How can I check, within the same program in which proc append is used, whether proc append was successful (no errors)? My goal is to change the "latest_date" variable in this secondary dataset only if the append is successful.
Try the automatic macro variable &SYSCC.
data test;
do i=1 to 10;
output;
end;
run;
data t1;
i=11;
run;
data t2;
XXX=12;
run;
proc append base=test data=t1;
run;
%put &syscc;
proc append base=test data=t2;
run;
%put &syscc;
I'm using the %get_table_size macro, which I found here. My steps are
run %get_table_size(large_table, size_preappend)
Create dataset called to_append
run %get_table_size(to_append, append_size)
run proc append
run %get_table_size(large_table, size_postappend)
Check if &size_postappend = &size_preappend + &append_size
Using &syscc isn't exactly what I wanted, because it doesn't check specifically for an error in proc append. It could be thrown off by earlier errors.
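One way to keep earlier errors from affecting the check is to reset &SYSCC just before the append and inspect it right after. The sketch below assumes this is acceptable in your job; the macro name append_checked and the macro variable append_ok are hypothetical names made up for illustration.

```sas
/* Hypothetical helper: the macro name and APPEND_OK are invented for this sketch */
%macro append_checked(base=, data=);
  %local saved_cc;
  %global append_ok;

  /* Remember any earlier condition code, then start from a clean state */
  %let saved_cc = &syscc;
  %let syscc = 0;

  proc append base=&base data=&data;
  run;

  /* 0 = clean, 4 = warning; anything higher is treated as an error here */
  %let append_ok = %eval(&syscc <= 4);

  /* Restore the worst condition code seen so far in the session */
  %let syscc = %sysfunc(max(&saved_cc, &syscc));
%mend append_checked;

%append_checked(base=test, data=t1);
%put append_ok=&append_ok;
```

The step that updates latest_date can then be run only when &append_ok is 1.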
You can do this by counting how many records are in the table pre and post appending. This would work with any sas table or database.
The best practice is to always have a control table for your process to log run time and number of records read.
Code:
/*Create input data*/
data work.t1;
input row ;
datalines;
1
2
;;
run;
data work.t2;
input row ;
datalines;
3
;;
run;
/*Create Control table, Run this bit only once, otherwise you delete the table everytime*/
data work.cntrl;
length load_dt 8. source 8. delta 8. total 8. ;
format load_dt datetime21.;
run;
proc sql; delete * from work.cntrl; quit;
/*Count Records before append*/
proc sql noprint ; select count(*) into: count_t1 from work.t1; quit;
proc sql noprint; select count(*) into: count_t2 from work.t2; quit;
/*Append data*/
proc append base=work.t1 data=work.t2 ; run;
/*Count Records after append*/
proc sql noprint ; select count(*) into: count_final from work.t1; quit;
/*Insert counts and timestamp into the Control Table*/
proc sql noprint; insert into work.cntrl
/*values(input(datetime(),datetime21.), input(&count_t1.,8.) , input(&count_t2.,8.) , input(&count_final.,8.)) ; */
values(%sysfunc(datetime()), &count_t1. , &count_t2., &count_final.) ;
quit;
Output: Control table is updated

Update Oracle table from SAS dataset

How do I update an Oracle table in SAS from a SAS dataset?
Here's the scenario:
Through a libname I load an Oracle table into a SAS dataset.
Do some data processing during which I UPDATE some values, INSERT some new observations and DELETE some observations in the dataset.
I need to update the original Oracle table with the dataset I've modified in the previous step - so when there's a match between the keys of the oracle table and the dataset, then the values will be updated, when there's a missing key in the oracle table, then it will be inserted, and when there's a key which is in the Oracle table but already deleted from the dataset, then it will be deleted from the Oracle table.
NOTE: I can not create a new table in Oracle. I need to make the "updating" on the original table.
I was trying to do it in two steps using MERGE INTO and DELETE, but there's no MERGE INTO in PROC SQL.
I would really appreciate any help.
EDIT: I was also thinking about just truncating the oracle table and inserting the rows (talking about 4-5000 rows per procedure run), but seems like there's no built in truncate statement in PROC SQL.
Please try using the below,
Method 1:
PROC SQL;
insert into <User_Defined_Oracle_table>
select variables
from <SAS_Tables>;
QUIT;
The above inserts rows into an existing table that resides in the same database and schema.
PROC SQL;
connect to oracle (user= password=);
execute(
UPDATE <Oracle_table> a SET <Column to be updated> = (SELECT <Columns to update separated by commas>
FROM <SAS_table> b
WHERE a.<VARIABLE>=b.<VARIABLE>)
WHERE exists (select * from <SAS_table> b
WHERE a.<VARIABLE>=b.<VARIABLE> ))
by oracle;
QUIT;
PROC SQL;
connect to oracle
(user= password=);
execute (truncate table <SAS_table>) by
oracle;
QUIT;
This is one of the more efficient ways to update the Oracle table.
Please refer to Update Oracle using SAS for more information.
Method 2:
LIBNAME Sample oracle user= password= path= schema= ;
PROC SQL;
UPDATE Sample.<Table_Name> as a SET <Variable_Name> = (SELECT <Variables>
FROM <Sas_table> as b
WHERE <A.Variable_Name>=<B.Variable_Name>)
WHERE exists
(select * from <Sas_table> as b
WHERE <A.Variable_Name>=<B.Variable_Name>);
QUIT;
This method takes the longest processing time of the three methods.
Also,
Method 3:
%MACRO update_oracle (SAS_Table,Oracle_Table);
Proc sql ;
select count(*) into: Count_Obs from &SAS_Table ; Quit;
%do i = 1 %to &Count_Obs;
/* read the values from row &i of the SAS table into macro variables */
Proc sql;
select <variables to update separated by commas> into: <macros> from &SAS_Table(firstobs=&i obs=&i) ; Quit;
PROC SQL;
UPDATE &Oracle_Table as a
SET <Oracle_Variable_to_Update>=<Variable_macro_created_above>
WHERE <A.Variable_Name>=<B.Variable_Name>;
QUIT;
%end;
%MEND update_oracle;
%update_oracle(<SAS_Table>,<Oracle_Table>);
The macro variables SAS_Table and Oracle_Table represent the SAS Dataset that contains the records to update and records to be updated in oracle, respectively.
Method 3 uses less processing time than method 2 but is not as efficient as method 1.
Surely there are UPDATE and INSERT methods in proc SQL. Also, check if SAS will allow you to do other SQL operations "execute immediate" (such as PL/SQL will allow) where you can construct the SQL statement as a string, then send it to Oracle to execute.

SAS update multiple records for a by group

I have a master A and transaction set B. I am trying to update records in A with the records in B by variable C.
DATA TEST;
UPDATE A B;
BY C;
RUN;
The issue is, I have got some duplicate records in my master set and I still want to update them all. But what I get is a warning
There was more than one record for the specified BY group
And only the first record out of those duplicates gets updated.
Is there any way to tell SAS to update all of them?
Or is there any other, completely different way?
Any help appreciated.
If you create an index on the ID variable used for your update, you can do this using a modify statement. This should be much quicker than using an update statement as it avoids creating a temporary copy of the master table - however, if the data step is interrupted there is a risk of data corruption. The syntax is a bit clunky but it can potentially be macro-ised if necessary.
data master;
input ID1 ID2 VAR1 VAR2;
cards;
1 1 2 3
1 2 3 4
2 1 5 6
;
run;
data transaction;
input ID1 VAR1 VAR2;
cards;
1 7 8
;
run;
proc datasets lib =work nolist nodetails;
modify master;
index create ID1;
quit;
data master;
set transaction(rename = (VAR1 = t_VAR1 VAR2 = t_VAR2));
do until(eof);
modify master key = ID1 end = eof;
if _IORC_ then _ERROR_ = 0;
else do;
VAR1 = t_VAR1;
VAR2 = t_VAR2;
replace;
end;
end;
drop t_VAR1 t_VAR2;
run;
If you really want to apply transactions then expand your transaction file to have all possible values of the key variables C,D for the values of C it does contain.
proc sql ;
create table transactions as
select a.D,b.*
from A right join B
on a.C = b.C
order by b.C,a.D
;
quit;
Then do the update.
data want ;
update A transactions ;
by c d;
run;
If you try to use MERGE then you will get in trouble when the extra variables exist in both tables. SAS will only change the values of the first record for each value of C. You could program around this by renaming the variables in the B dataset. You could then explicitly code whether you want the action to be like a MERGE or an UPDATE. So if your extra variable is named E then you could code like this:
data want;
merge a b(in=inb rename=(e=new_e)) ;
by c ;
updated_e = coalesce(new_e,e);
if inb then merged_e = new_e ;
else merged_e = e;
run;
So if you want the effect of merge (so a missing value of E in the transaction makes it missing in the result) then use the formula like in MERGED_E. If you want the effect of update then use the formula like in UPDATED_E. If you have more than one extra variable then rename them also and add extra assignment statements to handle them.
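A PROC SQL UPDATE with a correlated subquery is another way to change every matching row in the master, duplicates included. This is a sketch with made-up sample data, assuming the transaction table B has at most one row per value of C and that the extra variable is named E as above. Like MERGE, it overwrites E even when the transaction value is missing.

```sas
/* Made-up sample data: the master has two rows for C=1 */
data A;
  input C E;
  datalines;
1 10
1 20
2 30
;
run;

data B;
  input C E;
  datalines;
1 99
;
run;

proc sql;
  /* Every row of A whose C appears in B takes E from the matching B row */
  update A
    set E = (select B.E from B where B.C = A.C)
    where C in (select C from B);
quit;
```

To get UPDATE-style behaviour instead, where a missing transaction value leaves the master value alone, the subquery could select coalesce(B.E, A.E).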

Execute Macro inside SQL statement

The Situation:
I have a table mytable with two columns: tablename and tablefield:
|-----------|------------|
| tablename | tablefield |
|-----------|------------|
| table1 | id |
| table2 | date |
| table3 | etc |
|-----------|------------|
My core objective is to run a Select for each of these tablenames, showing the MAX() value of its corresponding tablefield.
Proc SQL;
Select MAX(id) From table1;
Select MAX(date) From table2;
Select MAX(etc) From table3;
Quit;
PS: The solution has to pull the data from the table, so that if the table's values change, the generated queries change with them.
What I have tried:
Of my attempts, this is the most sophisticated and, I believe, the closest to a solution:
proc sql;
create table table_associations (
memname varchar(255), dt_name varchar(255)
);
Insert Into table_associations
values ("table1", "id")
values ("table2", "date")
values ("table3", "etc");
quit;
%Macro Max(field, table);
Select MAX(&field.) From &table.;
%mend;
proc sql;
Select table, field, (%Max(field,table))
From LIB.table_associations
quit;
By creating the macro, my intent is clear, but for this example I need to solve 2 problems:
Execute a macro inside an SQL Statement; And
Make the macro understand its String value parameter as an SQL command.
In a data step, you can use call execute to do what you're describing.
%Macro Max(field, table);
proc sql;
Select MAX(&field.) From &table.;
quit;
%mend;
data _null_;
set table_associations;
call execute('%MAX('||strip(dt_name)||','||strip(memname)||')');
run;
Macros are not necessary here as you can just generate the code using put statements in the data step:
filename gencode temp;
data _null_;
set table_associations end=eof;
file gencode;
if _n_=1 then put 'proc sql;';
put 'select max(' dt_name ') from ' memname ';';
if eof then put 'quit;';
run;
%include gencode / source2;
filename gencode clear;
The code is written to a temporary file named 'gencode'. You could make this file permanent if you want. _n_=1 and end=eof are used to print the statements before and after the queries. Finally, %include gencode runs the code and the source2 option prints the code to the log.

Delete specific rows in Oracle Database using SAS table

I have an Oracle table with 1M rows in it. I have a subset of the Oracle table in SAS with 3000 rows in it. I want to delete these 3000 rows from the Oracle table.
Oracle Table columns are
Col1 Col2 Col3 timestamp
SAS Table columns are:
Col1 Col2 Col3
The only additional column that the Oracle table has is a timestamp. This is the code that I am using currently, but it's taking a lot of time.
libname ora oracle user='xxx' password='ppp' path = abcd;
PROC SQL;
DELETE from ora.oracle_table a
where exists (select * from sas_table b where a.col1=B.col1 AND a.col2=B.col2 AND A.col3=B.col3 );
QUIT;
Please advise as to how to make it faster and more efficient.
Thank You !
One option is to push your SAS table up to Oracle, then use oracle-side commands to perform the delete. I'm not sure exactly how SAS will translate the above code to DBMS-specific code, but it might be pushing a lot of data over the network depending on how it's able to optimize the query; in particular, if it has to perform the join locally instead of on the database, that's going to be very expensive. Further, Oracle can probably do the delete faster using entirely native operations.
IE:
libname ora ... ;
data ora.gtt_tableb; *or create a temporary or GT table in Oracle and insert into it via proc sql;
set sas_tableb;
run;
proc sql;
connect to oracle (... );
execute (
delete from ...
) by oracle;
quit;
That may offer significant performance improvements over using the LIBNAME connection.
Further improvements may be possible if you take full advantage of an index on your PKs, if you don't already have that.
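@Joe has a good answer. Another way would be to do something like this. This MIGHT allow the libname engine to pass all the work to Oracle instead of retrieving rows back to SAS (which is where your time is going).
I created some test data to demonstrate. Note that the first delete (three separate IN conditions) matches the EXISTS version only when the rows to delete form a full cross-product of the i, j and k values, as they do in this test data.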
data test1 test2;
do i=1 to 10;
do j=1 to 10;
do k=1 to 10;
output;
end;
end;
end;
run;
data todel;
do i=1 to 3;
do j=1 to 3;
do k=1 to 3;
output;
end;
end;
end;
run;
proc sql noprint;
delete from test1 as a
where a.i in (select distinct i from todel)
and a.j in (select distinct j from todel)
and a.k in (select distinct k from todel);
quit;
proc sql noprint;
delete from test2 as a
where exists (select * from todel as b where a.i=b.i and a.j=b.j and a.k=b.k);
quit;
thank you guys. Joe I used your suggestion and wrote this code.
/*---create a temp table in oracle---*/
libname ora oracle user='xxx' password='ppp' path = abcd;
proc append base=ora.TEMP_TABLE data=SAS.sas_TABLE;
run;
/*-----delete the rows using the temp table--------*/
proc sql;
connect to oracle(......);
execute (delete from ORA.ORACLE_TABLE a
where exists (select * from ora.TEMP_TABLE b where a.col1=B.col1 AND a.col2=B.col2 AND A.col3=B.col3)
) by oracle;
quit;
Thank you so much guys ! I appreciate your feedback.