Delete specific rows in Oracle Database using SAS table

I have an Oracle table with 1M rows in it. I have a subset of the Oracle table in SAS with 3000 rows. I want to delete these 3000 rows from the Oracle table.
Oracle Table columns are
Col1 Col2 Col3 timestamp
SAS Table columns are:
Col1 Col2 Col3
The only additional column that the Oracle table has is a timestamp. This is the code that I am using currently, but it's taking a lot of time:
libname ora oracle user='xxx' password='ppp' path = abcd;
PROC SQL;
DELETE from ora.oracle_table a
where exists (select * from sas_table b where a.col1=B.col1 AND a.col2=B.col2 AND A.col3=B.col3 );
QUIT;
Please advise as to how to make it faster and more efficient.
Thank You !

One option is to push your SAS table up to Oracle, then use oracle-side commands to perform the delete. I'm not sure exactly how SAS will translate the above code to DBMS-specific code, but it might be pushing a lot of data over the network depending on how it's able to optimize the query; in particular, if it has to perform the join locally instead of on the database, that's going to be very expensive. Further, Oracle can probably do the delete faster using entirely native operations.
For example:
libname ora ... ;
data ora.gtt_tableb; *or create a temporary or global temporary table (GTT) in Oracle and insert into it via proc sql;
set sas_tableb;
run;
proc sql;
connect to oracle (... );
execute (
delete from ...
) by connection to oracle;
quit;
That may offer significant performance improvements over using the LIBNAME connection.
Further improvements may be possible if you take full advantage of an index on your PKs, if you don't already have that.
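Checking for such an index, and creating one if it is missing, could look roughly like the sketch below. It reuses the placeholder credentials from the question's LIBNAME; the index name ORACLE_TABLE_K1 is hypothetical, and USER_IND_COLUMNS only lists indexes in the connecting user's own schema. If an index on exactly these columns already exists, Oracle rejects the CREATE INDEX (ORA-01408), which is harmless here.

```sas
proc sql;
  connect to oracle (user='xxx' password='ppp' path=abcd);

  /* List existing indexes on ORACLE_TABLE from Oracle's data dictionary */
  select * from connection to oracle (
    select index_name, column_name, column_position
    from user_ind_columns
    where table_name = 'ORACLE_TABLE'
    order by index_name, column_position
  );

  /* Composite index on the join columns (hypothetical index name);  */
  /* only worthwhile if the listing above shows no index on them     */
  execute (
    create index oracle_table_k1 on oracle_table (col1, col2, col3)
  ) by oracle;

  disconnect from oracle;
quit;
```

The index is on the 1M-row table rather than on the 3,000 keys, because the large table is the one Oracle has to search for each key.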

@Joe has a good answer. Another way would be to do something like this. This MIGHT allow the libname engine to pass all the work to Oracle instead of retrieving rows back to SAS (which is where your time is going).
Here is some test data to demonstrate:
data test1 test2;
do i=1 to 10;
do j=1 to 10;
do k=1 to 10;
output;
end;
end;
end;
run;
data todel;
do i=1 to 3;
do j=1 to 3;
do k=1 to 3;
output;
end;
end;
end;
run;
proc sql noprint;
delete from test1 as a
where a.i in (select distinct i from todel)
and a.j in (select distinct j from todel)
and a.k in (select distinct k from todel);
quit;
proc sql noprint;
delete from test2 as a
where exists (select * from todel as b where a.i=b.i and a.j=b.j and a.k=b.k);
quit;
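One caveat about the IN version: it tests each column on its own, so it matches every combination of the listed values, not only the combinations that appear in TODEL. With the full 3 x 3 x 3 grid above, both queries delete the same 27 rows, but with arbitrary keys the IN version can delete extra rows. A small check, using the hypothetical tables TODEL2, TEST3 and TEST4:

```sas
/* Only two specific key combinations should be deleted */
data todel2;
  input i j k;
  datalines;
1 1 1
2 2 2
;
run;

/* 2 x 2 x 2 = 8 rows, written to two identical tables */
data test3 test4;
  do i = 1 to 2;
    do j = 1 to 2;
      do k = 1 to 2;
        output;
      end;
    end;
  end;
run;

proc sql;
  /* Column-by-column IN: every row has i, j and k in (1, 2), so all 8 go */
  delete from test3
  where i in (select i from todel2)
    and j in (select j from todel2)
    and k in (select k from todel2);

  /* Row-wise EXISTS: only (1,1,1) and (2,2,2) are deleted, 6 rows remain */
  delete from test4 as a
  where exists (select * from todel2 as b
                where a.i=b.i and a.j=b.j and a.k=b.k);
quit;
```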

Thank you, guys. Joe, I used your suggestion and wrote this code:
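/*---copy the SAS rows into a staging table in Oracle (PROC APPEND creates it if needed)---*/
libname ora oracle user='xxx' password='ppp' path = abcd;
proc append base=ora.TEMP_TABLE data=SAS.sas_TABLE;
run;
/*-----delete the rows using the staging table--------*/
/* Pass-through SQL runs inside Oracle, which does not know the SAS libref ORA, */
/* so the tables are named as Oracle sees them (add a schema prefix if needed). */
proc sql;
connect to oracle(......);
execute (delete from ORACLE_TABLE a
where exists (select * from TEMP_TABLE b where a.col1=B.col1 AND a.col2=B.col2 AND A.col3=B.col3)
) by oracle;
quit;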
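Because PROC APPEND leaves TEMP_TABLE behind as a permanent Oracle table, running the job a second time would add the new keys to the old ones. A minimal follow-up sketch, assuming the ORA libref from the code above is still assigned: print the return code and message of the last pass-through statement (PROC SQL sets the SQLXRC and SQLXMSG macro variables), then drop the staging table through the libref.

```sas
/* Return code and message from the last pass-through statement */
%put &=sqlxrc &=sqlxmsg;

/* Remove the staging table so each run starts from an empty TEMP_TABLE */
proc sql;
  drop table ora.TEMP_TABLE;
quit;
```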
Thank you so much, guys! I appreciate your feedback.

Related

Changing FROM statement with a variable

I am trying to change the name of the table I am getting my data from
Like this:
COREPOUT.KUNDE_REA_UDL_202112 --> COREPOUT.KUNDE_REA_UDL_202203
I create my variable like this:
PROC SQL NOPRINT;
SELECT DISTINCT
PERIOKVT_PREV_BANKSL_I_YYMMN6
INTO :PERIOKVT_PREV_BANKSL_I_YYMMN6
FROM Datostamp_PREV_Kvartal;
This is the code I want to use the variable for.
%_eg_conditional_dropds(WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
SELECT t1.Z_ORDINATE,
(input(t1.cpr_se,w.)) AS KundeNum
FROM COREPOUT.KUNDE_REA_UDL_202203 t1;
QUIT;
I have tried things like:
FROM string("COREPOUT.KUNDE_REA_UDL_",PERIOKVT_PREV_BANKSL_I_YYMMN6," t1";
I hope you can point me in the right direction.
Use & to reference and resolve macro variables into strings (e.g. &PERIOKVT_PREV_BANKSL_I_YYMMN6).
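proc sql noprint;
select distinct PERIOKVT_PREV_BANKSL_I_YYMMN6
into :PERIOKVT_PREV_BANKSL_I_YYMMN6 trimmed /* TRIMMED drops leading/trailing blanks */
from Datostamp_PREV_Kvartal
;
quit;
proc sql;
create table WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
select t1.Z_ORDINATE,
(input(t1.cpr_se,32.)) AS KundeNum
/* the macro variable holds only the period, so append it to the fixed part of the name */
from COREPOUT.KUNDE_REA_UDL_&PERIOKVT_PREV_BANKSL_I_YYMMN6 t1
;
quit;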
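Because the macro processor substitutes the text before the SQL is compiled, the period can be glued directly onto the fixed part of the table name. A self-contained sketch, using a hypothetical %LET value and a WORK copy of SASHELP.CLASS as a stand-in for the COREPOUT table:

```sas
/* Hypothetical period value; in the question it comes from Datostamp_PREV_Kvartal */
%let period = 202203;

/* Stand-in for COREPOUT.KUNDE_REA_UDL_202203 so the sketch runs without COREPOUT */
data work.kunde_rea_udl_202203;
  set sashelp.class;
run;

/* The log shows the resolved name, a quick way to debug the reference */
%put NOTE: Reading WORK.KUNDE_REA_UDL_&period;

proc sql;
  create table work.want as
  select t1.name, t1.age
  from work.kunde_rea_udl_&period t1;
quit;
```

When more text follows the reference directly (for example a suffix such as _OLD), end the reference with a period, as in &period._old, so the macro processor knows where the name stops.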
You can use CALL SYMPUTX() to move values from a dataset into a macro variable.
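data _null_;
set Datostamp_PREV_Kvartal;
/* build the full two-level name from the period suffix held in the variable */
call symputx('dataset_name',cats('COREPOUT.KUNDE_REA_UDL_',PERIOKVT_PREV_BANKSL_I_YYMMN6));
stop;
run;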
Then use the value of the macro variable to insert the dataset name into the code at the appropriate place. So your posted SQL is equivalent to this simple data step.
data QUERY_FOR_KUNDE_REA_UDL_20_0000;
set &dataset_name. ;
KundeNum = input(cpr_se,32.);
keep Z_ORDINATE KundeNum;
run;
Note: I did not see a definition of a user-defined informat named W in your posted code, so I replaced it with the standard numeric informat, since it looked like you were trying to convert a character value into a number.
The solution I ended up with was inspired by @Stu Sztukowski's response:
I used a data step to concatenate the table name and then stored it in a macro variable.
data Concat_var;
str_PERIOKVT_PREV_YYMMN6 = CAT("COREPOUT.KUNDE_REA_UDL_",&PERIOKVT_PREV_BANKSL_I_YYMMN6," t1");
run;
PROC SQL NOPRINT;
SELECT DISTINCT
str_PERIOKVT_PREV_YYMMN6
INTO :str_PERIOKVT_PREV_YYMMN6
FROM Concat_var;
QUIT;
Then I used the variable in the FROM statement:
%_eg_conditional_dropds(WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
SELECT t1.Z_ORDINATE,
(input(t1.cpr_se,w.)) AS KundeNum
FROM &str_PERIOKVT_PREV_YYMMN6;
QUIT;
I hope this helps someone else in the future.

How to create a table using `Proc SQL` without selecting from existing tables

What is the SAS equivalent of "select from dual"? I want to create a table using PROC SQL without selecting from existing tables. Basically, I want something like:
PROC SQL;
CREATE TABLE tmptable AS
SELECT 1 AS myvar FROM dual;
QUIT;
This does not work. What choices do I have?
I don't think there is anything like SELECT ... FROM DUAL in SAS.
But you could try whether this helps:
proc sql inobs=1; /* limit the read to a single observation */
create table tmptable as
select 1 as myvar
from sashelp.class /* or any other existing table */
;
quit;
The INOBS=1 option makes sure only one row is read from SASHELP.CLASS, so the result has exactly one row.
As you state, SAS Proc SQL does not have a premade DUAL table.
You can use CREATE and INSERT statements instead.
Example
proc sql;
create table want (x num);
insert into want values (1);
insert into want
values(2)
values(3)
;
quit;
Or create your own DUAL table first (for example, when migrating SQL code into SAS PROC SQL):
proc sql;
create table dual (dummy char(1)); insert into dual values ('X');
CREATE TABLE tmptable AS
SELECT 1 AS myvar FROM dual;
quit;
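Outside PROC SQL, a DATA step with no SET statement and no explicit OUTPUT writes exactly one observation when it reaches the end of the step. That gives a short way to build either the DUAL helper or the target table itself; this is an alternative to PROC SQL, not a PROC SQL feature:

```sas
/* One-row DUAL helper built with a DATA step */
data work.dual;
  length dummy $1;
  dummy = 'X';
run;

/* The same idea builds the target table without PROC SQL at all */
data work.tmptable;
  myvar = 1;
run;
```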

Mean Imputation with SQL

PROC SQL;
UPDATE GUEST
SET
STAY_DURATION = ( CASE WHEN STAY_DURATION EQ . THEN MEAN(STAY_DURATION )
ELSE STAY_DURATION END AS STAY_DURATION FORMAT 8.0 END);
RUN;
I would like to insert the average straight into the dataset without first creating a new table and then updating the main dataset. Well, I did this, but I want to use a nested CASE statement in the UPDATE query for multiple variables.
You can use a subquery for the calculation:
PROC SQL;
UPDATE GUEST
SET STAY_DURATION = (SELECT AVG(STAY_DURATION) FROM GUEST)
WHERE STAY_DURATION IS NULL;
QUIT;
Alternatively, you can use two PROC SQL steps:
PROC SQL;
CREATE TABLE AVG_GUEST AS
SELECT AVG(STAY_DURATION) as AVG_SD FROM GUEST;
QUIT;
PROC SQL;
UPDATE GUEST
SET STAY_DURATION = (SELECT AVG_SD FROM AVG_GUEST)
WHERE STAY_DURATION IS NULL;
QUIT;
It is normally not a good idea to overwrite your input data. Make a new dataset with your modifications to the data. You can use PROC STDIZE to replace missing values with the mean of the variable.
proc stdize data=guest out=want reponly missing=mean;
var stay_duration;
run;
In SQL
proc sql;
create table WANT as
select *
, coalesce(stay_duration,mean(stay_duration)) as stay_duration_imputed
from guest
;
quit;
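Since the original goal was several variables, the COALESCE approach extends by adding one expression per column. A self-contained sketch with made-up data in a hypothetical GUEST_DEMO table; ROOM_RATE is a hypothetical second column. SAS remerges each column mean back onto every row and writes a NOTE about remerging to the log.

```sas
/* Made-up data; ROOM_RATE is a hypothetical second column */
data guest_demo;
  input stay_duration room_rate;
  datalines;
3 100
. 120
5 .
. 80
;
run;

proc sql;
  create table want_demo as
  select *
       , coalesce(stay_duration, mean(stay_duration)) as stay_duration_imputed
       , coalesce(room_rate, mean(room_rate)) as room_rate_imputed
  from guest_demo
  ;
quit;
```

MEAN ignores missing values, so the imputed STAY_DURATION is the mean of 3 and 5 (4) and the imputed ROOM_RATE is the mean of 100, 120 and 80 (100). PROC STDIZE handles several variables the same way if they are all listed on the VAR statement.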

Update Oracle table from SAS dataset

How do I update an Oracle table in SAS from a SAS dataset?
Here's the scenario:
Through a libname I load an Oracle table into a SAS dataset.
I do some data processing during which I UPDATE some values, INSERT some new observations and DELETE some observations in the dataset.
I need to update the original Oracle table with the dataset I've modified in the previous step - so when there's a match between the keys of the oracle table and the dataset, then the values will be updated, when there's a missing key in the oracle table, then it will be inserted, and when there's a key which is in the Oracle table but already deleted from the dataset, then it will be deleted from the Oracle table.
NOTE: I can not create a new table in Oracle. I need to make the "updating" on the original table.
I was trying to do it in two step using MERGE INTO and DELETE, but there's no MERGE INTO in PROC SQL.
I would really appreciate any help.
EDIT: I was also thinking about just truncating the Oracle table and inserting the rows (about 4,000-5,000 rows per procedure run), but it seems there's no built-in TRUNCATE statement in PROC SQL.
Please try one of the methods below.
Method 1:
PROC SQL;
insert into <User_Defined_Oracle_table>
select variables
from <SAS_Tables>;
QUIT;
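The above inserts the SAS rows into a staging table in Oracle (in the same database and schema as the target table), so the following pass-through steps can work entirely inside Oracle.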
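PROC SQL;
connect to oracle (user= password=);
/* pass-through SQL runs in Oracle, so it must reference the Oracle staging table, not the SAS table */
execute(
UPDATE <Oracle_table> a SET (<Columns to be updated>) = (SELECT <Columns to update separated by commas>
FROM <User_Defined_Oracle_table> b
WHERE a.<VARIABLE>=b.<VARIABLE>)
WHERE exists (select * from <User_Defined_Oracle_table> b
WHERE a.<VARIABLE>=b.<VARIABLE> ))
by oracle;
QUIT;
PROC SQL;
connect to oracle (user= password=);
/* empty the staging table, ready for the next run */
execute (truncate table <User_Defined_Oracle_table>) by oracle;
QUIT;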
This is one of the efficient ways to update the oracle table.
Please refer to Update Oracle using SAS for more information.
Method 2:
LIBNAME Sample oracle user= password= path= schema= ;
PROC SQL;
UPDATE Sample.<Table_Name> as a SET <Variable_Name> = (SELECT <Variable>
FROM <Sas_table> as b
WHERE <A.Variable_Name>=<B.Variable_Name>)
WHERE exists
(select * from <Sas_table> as b
WHERE <A.Variable_Name>=<B.Variable_Name>);
QUIT;
This method has the longest processing time of the three.
Also,
Method 3:
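%MACRO update_oracle (SAS_Table, Oracle_Table);
  %local i Count_Obs;
  Proc sql noprint;
    select count(*) into :Count_Obs trimmed from &SAS_Table;
  Quit;
  %do i = 1 %to &Count_Obs;
    /* read the i-th row of the SAS table into macro variables */
    data _null_;
      set &SAS_Table (firstobs=&i obs=&i);
      call symputx('key_value', <Key_Variable>);
      call symputx('new_value', <Variable_to_Update>);
    run;
    /* numeric values assumed; quote &new_value and &key_value for character columns */
    PROC SQL;
      UPDATE &Oracle_Table
      SET <Oracle_Variable_to_Update> = &new_value
      WHERE <Oracle_Key_Variable> = &key_value;
    QUIT;
  %end;
%MEND update_oracle;
%update_oracle(<SAS_Table>, <Oracle_Table>);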
The macro parameters SAS_Table and Oracle_Table name the SAS dataset that contains the new values and the Oracle table whose records are to be updated, respectively.
Method 3 uses less processing time than Method 2 but is not as efficient as Method 1.
There are UPDATE and INSERT statements in PROC SQL. In addition, SAS lets you send other SQL operations straight to Oracle, much like EXECUTE IMMEDIATE in PL/SQL: the pass-through EXECUTE statement (execute (...) by oracle) hands Oracle any statement it supports, including MERGE and TRUNCATE.
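PROC SQL has no MERGE INTO of its own, but the question's two-step plan (MERGE, then DELETE) can still run inside Oracle through pass-through. The sketch below assumes the modified SAS rows have already been loaded into an Oracle staging table (as in Method 1, or a global temporary table if creating tables is not allowed). TARGET_TABLE, STAGING_TABLE, KEY_COL, VAL_COL and the connection values are hypothetical placeholders.

```sas
/* Placeholder connection values: replace with your own */
proc sql;
  connect to oracle (user=myuser password=mypass path=mydb);

  /* Upsert: update rows whose key matches, insert rows with new keys */
  execute (
    merge into target_table t
    using staging_table s
    on (t.key_col = s.key_col)
    when matched then
      update set t.val_col = s.val_col
    when not matched then
      insert (key_col, val_col) values (s.key_col, s.val_col)
  ) by oracle;

  /* Remove keys that were deleted from the SAS dataset */
  execute (
    delete from target_table t
    where not exists (select 1 from staging_table s where s.key_col = t.key_col)
  ) by oracle;

  execute (commit) by oracle;
  disconnect from oracle;
quit;
```

The separate DELETE is needed because MERGE only sees rows that exist in the source; a key missing from the staging table never reaches the MERGE clauses.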

Conditional Insert using SAS Proc SQL

I am trying to add records from one smaller table into a very large table if the primary key value for the rows in the smaller table is not in the larger one:
data test;
Length B C $4;
infile datalines delimiter=',';
input a b $ c $;
datalines;
1000,Test,File
2000,Test,File
3000,Test,File
;
data test2;
Length B C $4;
infile datalines delimiter=',';
input a b $ c $;
datalines;
1000,Test,File
4000,Test,File
;
proc sql;
insert into test
select * from test2
where a not in (select a from test2);
quit;
This, however, inserts no records into the table Test. Can anyone tell me what I am doing wrong? The end result should be that the row where a = 4000 is added to the table Test.
EDIT:
Using where a not in (select a from test) was what I originally tried and it generated the following error:
WARNING: This DELETE/INSERT statement recursively references the target table. A consequence of this is a possible data integrity problem.
ERROR: You cannot reopen WORK.TEST.DATA for update access with member-level control because WORK.TEST.DATA is in use by you in resource
environment SQL.
ERROR: PROC SQL could not undo this statement if an ERROR were to happen as it could not obtain exclusive access to the data set. This
statement will not execute as the SQL option UNDO_POLICY=REQUIRED is in effect.
224 quit;
Thanks
You can either do the process in two steps: first create the table of records to insert, and then insert them.
proc sql ;
create table to_add as
select * from test2
where a not in (select a from test)
;
insert into test select * from to_add ;
quit;
Or you could just change the setting for the UNDO_POLICY option and SAS will let you reference TEST while updating TEST.
proc sql undo_policy=none;
insert into test
select * from test2
where a not in (select a from test)
;
quit;
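A third option, not taken from the answers above, avoids referencing TEST while it is being updated: a DATA step hash lookup selects the TEST2 rows whose key A is absent from TEST, and PROC APPEND adds them. This sketch uses the TEST and TEST2 tables from the question and assumes the key column of the large table fits in memory (one hash entry per row of TEST); it needs no sorting.

```sas
/* Load only the key column of TEST into a hash object */
data to_add;
  if _n_ = 1 then do;
    declare hash old(dataset: 'test(keep=a)');
    old.defineKey('a');
    old.defineDone();
  end;
  set test2;
  /* CHECK() returns 0 when the key is found; keep only new keys */
  if old.check() ne 0;
run;

/* Add the new rows to the end of TEST */
proc append base=test data=to_add;
run;
```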