Mean Imputation with SQL

PROC SQL;
UPDATE GUEST
SET
STAY_DURATION = ( CASE WHEN STAY_DURATION EQ . THEN MEAN(STAY_DURATION )
ELSE STAY_DURATION END AS STAY_DURATION FORMAT 8.0 END);
RUN;
I would like to insert the average directly into the dataset, without first creating a new table and then updating the main dataset. I have done it that way, but I want to use a nested CASE expression in the UPDATE query for multiple variables.

You can use a subquery for the calculation:
PROC SQL;
UPDATE GUEST
SET STAY_DURATION = (SELECT AVG(STAY_DURATION) FROM GUEST)
WHERE STAY_DURATION IS NULL;
QUIT;
If you want to just use PROC SQL, you can use two steps:
PROC SQL;
CREATE TABLE AVG_GUEST AS
SELECT AVG(STAY_DURATION) as AVG_SD FROM GUEST;
QUIT;
PROC SQL;
UPDATE GUEST
SET STAY_DURATION = (SELECT AVG_SD FROM AVG_GUEST)
WHERE STAY_DURATION IS NULL;
QUIT;
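The question also asks how to impute several variables in one UPDATE using CASE expressions. Below is a minimal sketch under stated assumptions: NIGHTS_BOOKED is a hypothetical second numeric column, and the sample rows are made up. The means are computed once into macro variables, with FORMAT=BEST32. because INTO otherwise formats numbers with BEST8. and can lose precision. A single UPDATE then uses one CASE expression per column.

```sas
/* Made-up sample data; NIGHTS_BOOKED is a hypothetical extra column */
data guest;
  input guest_id stay_duration nights_booked;
  datalines;
1 2 1
2 . 3
3 4 .
4 6 5
;
run;

proc sql noprint;
  /* Compute every mean once, before any row is modified.
     FORMAT=BEST32. keeps full precision in the macro variables. */
  select avg(stay_duration) format=best32.,
         avg(nights_booked) format=best32.
    into :avg_sd trimmed, :avg_nb trimmed
    from guest;

  /* One UPDATE statement, one CASE expression per imputed column */
  update guest
    set stay_duration = case when stay_duration is missing then &avg_sd
                             else stay_duration end,
        nights_booked = case when nights_booked is missing then &avg_nb
                             else nights_booked end;
quit;
```

COALESCE(stay_duration, &avg_sd) would do the same job as each CASE expression. The CASE form is shown because that is what the question asked about.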

It is normally not a good idea to overwrite your input data. Make a new dataset with your modifications to the data. You can use PROC STDIZE to replace missing values with the mean of the variable.
proc stdize data=guest out=want reponly missing=mean;
var stay_duration;
run;
In SQL
proc sql;
create table WANT as
select *
, coalesce(stay_duration,mean(stay_duration)) as stay_duration_imputed
from guest
;
quit;

Related

Changing FROM statement with a variable

I am trying to change the name of the table I am getting my data from, like this:
COREPOUT.KUNDE_REA_UDL_202112 --> COREPOUT.KUNDE_REA_UDL_202203
I create my variable like this:
PROC SQL NOPRINT;
SELECT DISTINCT
PERIOKVT_PREV_BANKSL_I_YYMMN6
INTO :PERIOKVT_PREV_BANKSL_I_YYMMN6
FROM Datostamp_PREV_Kvartal;
This is the code I want to use the variable for.
%_eg_conditional_dropds(WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
SELECT t1.Z_ORDINATE,
(input(t1.cpr_se,w.)) AS KundeNum
FROM COREPOUT.KUNDE_REA_UDL_202203 t1;
QUIT;
I have tried things like:
FROM string("COREPOUT.KUNDE_REA_UDL_",PERIOKVT_PREV_BANKSL_I_YYMMN6," t1";
I hope you can point me in the right direction.

Use & to reference and resolve macro variables into strings (e.g. &PERIOKVT_PREV_BANKSL_I_YYMMN6).
proc sql noprint;
select distinct PERIOKVT_PREV_BANKSL_I_YYMMN6
into :PERIOKVT_PREV_BANKSL_I_YYMMN6 trimmed
from Datostamp_PREV_Kvartal
;
quit;
proc sql;
create table WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
select t1.Z_ORDINATE,
(input(t1.cpr_se,32.)) AS KundeNum
/* The macro variable holds only the period, so append it to the fixed prefix */
from COREPOUT.KUNDE_REA_UDL_&PERIOKVT_PREV_BANKSL_I_YYMMN6 t1
;
quit;

You can use CALL SYMPUTX() to move values from a dataset into a macro variable.
data _null_;
set Datostamp_PREV_Kvartal;
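/* Build the full dataset name. VVALUE returns the formatted value, so this
   works whether the period column is character or a date with a YYMMN6. format */
call symputx('dataset_name',cats('COREPOUT.KUNDE_REA_UDL_',vvalue(PERIOKVT_PREV_BANKSL_I_YYMMN6)));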
stop;
run;
Then use the value of the macro variable to insert the dataset name into the code at the appropriate place. So your posted SQL is equivalent to this simple data step.
data QUERY_FOR_KUNDE_REA_UDL_20_0000;
set &dataset_name. ;
KundeNum = input(cpr_se,32.);
keep Z_ORDINATE KundeNum;
run;
Note: I did not see a definition of a user-defined informat named W in your posted code, so I replaced it with the normal numeric informat, since it looked like you were trying to convert a character value into a number.

The solution I ended up with was inspired by @Stu Sztukowski's response:
I used a DATA step to concatenate the full table name and then stored it in a macro variable.
data Concat_var;
str_PERIOKVT_PREV_YYMMN6 = CAT("COREPOUT.KUNDE_REA_UDL_",&PERIOKVT_PREV_BANKSL_I_YYMMN6," t1");
run;
PROC SQL NOPRINT;
SELECT DISTINCT
str_PERIOKVT_PREV_YYMMN6
INTO :str_PERIOKVT_PREV_YYMMN6
FROM Concat_var;
QUIT;
Then I used the variable in the FROM statement:
%_eg_conditional_dropds(WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000);
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 AS
SELECT t1.Z_ORDINATE,
(input(t1.cpr_se,w.)) AS KundeNum
FROM &str_PERIOKVT_PREV_YYMMN6;
QUIT;
I hope this helps someone else in the future.
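The intermediate Concat_var dataset is not strictly necessary, because the full table name can be built directly inside the INTO clause. The sketch below makes several assumptions: WORK stands in for the COREPOUT library, the period is stored as character text such as '202203', and the sample rows are made up. TRIMMED removes stray blanks from the macro value, and the alias t1 stays in the query instead of inside the macro variable.

```sas
/* Hypothetical stand-ins for the real tables */
data Datostamp_PREV_Kvartal;
  PERIOKVT_PREV_BANKSL_I_YYMMN6 = '202203';
run;

data WORK.KUNDE_REA_UDL_202203;
  Z_ORDINATE = 1; cpr_se = '0101801234'; output;
  Z_ORDINATE = 2; cpr_se = '0202902345'; output;
run;

proc sql noprint;
  /* Concatenate prefix and period in one step; TRIMMED strips blanks */
  select distinct cats('WORK.KUNDE_REA_UDL_', PERIOKVT_PREV_BANKSL_I_YYMMN6)
    into :src_table trimmed
    from Datostamp_PREV_Kvartal;
quit;

%put NOTE: Reading from &src_table;

proc sql;
  create table WORK.QUERY_FOR_KUNDE_REA_UDL_20_0000 as
  select t1.Z_ORDINATE,
         input(t1.cpr_se, 32.) as KundeNum
  from &src_table t1;
quit;
```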

Dynamize range of SAS PROC SQL SELECT INTO macro creation

I want to put each of several observations into its own macro variable. I would normally do this with select into :obs1 - :obs4; however, since the number of observations can vary, I would like to make the range dynamic. My code looks like this:
proc sql;
create table segments as select distinct substr(name,1,6) as segment from dictionary.columns
where libname = 'WORK' and memname = 'ALL_CCFS' and name ne 'MONTH';
run;
proc sql noprint;
select count(*) into: count from segments;
run;
proc sql noprint;
select segment into :segment_1 - :segment_&count. from dictionary.columns;
run;
However, this doesn't seem to work... any suggestions? Thank you!

Leave the last value empty/blank and SAS will create as many macro variables as needed.
Set it to an absurdly large number and SAS will only use as many as required.
Use a data step to create them, where you can dynamically increment your number (not shown).
proc sql noprint;
select segment into :segment_1 -
from segments;
quit;
proc sql noprint;
select segment into :segment_1 - :segment_999
from segments;
quit;
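The original :segment_&count. range most likely fails because COUNT(*) is stored with leading blanks. Numeric values placed into a macro variable by INTO are formatted with BEST8., so &count. resolves to something like "       3" and breaks the macro variable name. Below is a minimal sketch with a made-up SEGMENTS table showing that TRIMMED fixes the dynamic range. It also includes a hypothetical helper macro, list_segments, that loops over the generated variables.

```sas
/* Made-up sample table standing in for SEGMENTS */
data segments;
  input segment $6.;
  datalines;
SEG_AA
SEG_BB
SEG_CC
;
run;

proc sql noprint;
  /* TRIMMED removes the leading blanks from the count */
  select count(*) into :count trimmed from segments;
  select segment into :segment_1 - :segment_&count. from segments;
quit;

/* Hypothetical helper: print each generated macro variable */
%macro list_segments;
  %local i;
  %do i = 1 %to &count;
    %put NOTE: segment_&i = &&segment_&i;
  %end;
%mend list_segments;
%list_segments
```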

Proc Sql Select Into Is Creating a Temporary Variable that I can't Call

I am trying to use proc sql select into to create a variable that I then try to call later. This variable is the average price (BlockPrice).
proc sql;
create table Block_Price_Calc as
select mean(Price) Into : BlockPrice
from Data1
Where As_Of_Date >= '31MAR2015'd and As_Of_Date < '07APR2015'd;
quit;
%put &BlockPrice;
proc sql;
create table Want as
select *,
(&BlockPrice) as Block
from Data2;
quit;
The variable BlockPrice is not being recognized and it seems like it is being stored as a temporary variable. Any thoughts?

An INTO clause cannot be used in a CREATE TABLE statement.
proc sql;
select mean(Price) Into : BlockPrice
from Data1
Where As_Of_Date >= '31MAR2015'd and As_Of_Date < '07APR2015'd;
quit;
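Putting both steps together, here is a minimal end-to-end sketch with made-up Data1 and Data2 rows. The plain SELECT ... INTO creates the macro variable. FORMAT=BEST32. avoids the BEST8. rounding that INTO otherwise applies to numbers, and TRIMMED drops surrounding blanks. The second query can then reference &BlockPrice.

```sas
/* Made-up sample data */
data Data1;
  input As_Of_Date :date9. Price;
  format As_Of_Date date9.;
  datalines;
30MAR2015 10
31MAR2015 20
02APR2015 30
07APR2015 40
;
run;

data Data2;
  input id;
  datalines;
1
2
;
run;

proc sql noprint;
  /* INTO works in a plain SELECT, not in CREATE TABLE ... AS SELECT */
  select mean(Price) format=best32. into :BlockPrice trimmed
    from Data1
    where As_Of_Date >= '31MAR2015'd and As_Of_Date < '07APR2015'd;
quit;

%put NOTE: BlockPrice=&BlockPrice;

proc sql;
  create table Want as
  select *, &BlockPrice as Block
  from Data2;
quit;
```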

Update Oracle table from SAS dataset

How do I update an Oracle table in SAS from a SAS dataset?
Here's the scenario:
Through a LIBNAME, I load an Oracle table into a SAS dataset.
I do some data processing, during which I UPDATE some values, INSERT some new observations and DELETE some observations in the dataset.
I need to update the original Oracle table with the dataset I modified in the previous step. When the keys of the Oracle table and the dataset match, the values should be updated. When a key is missing from the Oracle table, the row should be inserted. When a key is in the Oracle table but has already been deleted from the dataset, it should be deleted from the Oracle table.
NOTE: I cannot create a new table in Oracle. I need to do the "updating" on the original table.
I was trying to do it in two steps using MERGE INTO and DELETE, but there's no MERGE INTO in PROC SQL.
I would really appreciate any help.
EDIT: I was also thinking about just truncating the Oracle table and inserting the rows (about 4,000 to 5,000 rows per procedure run), but there seems to be no built-in TRUNCATE statement in PROC SQL.

Please try the following:
Method 1:
PROC SQL;
insert into <User_Defined_Oracle_table>
select variables
from <SAS_Tables>;
QUIT;
The step above loads the SAS data into an existing Oracle table in the same database and schema, so Oracle itself can join against it in the pass-through UPDATE.
PROC SQL;
connect to oracle (user= password= path=);
execute(
UPDATE <Oracle_table> a
SET (<Columns to be updated>) = (SELECT <Columns to update separated by commas>
FROM <User_Defined_Oracle_table> b
WHERE a.<VARIABLE>=b.<VARIABLE>)
WHERE exists (select * from <User_Defined_Oracle_table> b
WHERE a.<VARIABLE>=b.<VARIABLE> ))
by oracle;
QUIT;
PROC SQL;
connect to oracle (user= password= path=);
execute (truncate table <User_Defined_Oracle_table>) by oracle;
QUIT;
This is one of the more efficient ways to update the Oracle table.
Please refer to Update Oracle using SAS for more information.
Method 2:
LIBNAME Sample oracle user= password= path= schema= ;
PROC SQL;
UPDATE Sample.<Table_Name> as a
SET <Variable_to_Update> = (SELECT b.<Variable_to_Update>
FROM <Sas_table> as b
WHERE a.<Key_Variable>=b.<Key_Variable>)
WHERE exists
(select * from <Sas_table> as b
WHERE a.<Key_Variable>=b.<Key_Variable>);
QUIT;
This method has the longest processing time of the three.
Also,
Method 3:
%MACRO update_oracle(SAS_Table, Oracle_Table);
%local i Count_Obs key_value new_value;
Proc sql noprint;
select count(*) into :Count_Obs trimmed from &SAS_Table;
Quit;
%do i = 1 %to &Count_Obs;
/* Read the key and the new value from row &i of the SAS table */
data _null_;
set &SAS_Table (firstobs=&i obs=&i);
call symputx('key_value', <Key_Variable>);
call symputx('new_value', <Variable_to_Update>);
run;
/* Quote &new_value and &key_value if the columns are character */
PROC SQL;
UPDATE &Oracle_Table
SET <Oracle_Variable_to_Update>=&new_value
WHERE <Oracle_Key_Variable>=&key_value;
QUIT;
%end;
%MEND update_oracle;
%update_oracle(<SAS_Table>, <Oracle_Table>);
The macro parameters SAS_Table and Oracle_Table name the SAS dataset that contains the new values and the Oracle table whose records are to be updated, respectively.
Method 3 uses less processing time than Method 2 but is not as efficient as Method 1.

There certainly are UPDATE and INSERT statements in PROC SQL. Also, check whether SAS will let you run other SQL operations such as "execute immediate" (as PL/SQL allows), where you construct the SQL statement as a string and then send it to Oracle to execute.
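The truncate-and-reload idea from the question's EDIT can be expressed with explicit pass-through, since TRUNCATE is Oracle syntax that PROC SQL will forward through EXECUTE. Below is a minimal sketch, assuming hypothetical connection values, a libref ORA, a target table MYSCHEMA.ORA_TGT and a processed SAS dataset WORK.MODIFIED whose columns match the target. Note that Oracle's TRUNCATE is DDL: it commits implicitly and cannot be rolled back, so a failed reload leaves the table empty.

```sas
/* Hypothetical connection values and names: replace with your own */
libname ora oracle user=myuser password=mypass path=mydb schema=myschema;

proc sql;
  connect to oracle (user=myuser password=mypass path=mydb);
  /* Sent to Oracle as-is; cannot be rolled back */
  execute (truncate table myschema.ora_tgt) by oracle;
  disconnect from oracle;
quit;

/* Reload every row of the modified SAS dataset into the emptied table */
proc append base=ora.ora_tgt data=work.modified;
run;
```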

SQL Accessing SAS Variable in Query

I'm attempting to use SASDOS in my statement below, but SAS cannot find it. My understanding is that I have to use some form of derived table to access this new column. Is this correct? If so, could someone please explain how to do that?
proc sql;
create table TEST as
select
DQBBDA AS 'Sbm Date'n,
case when 'Sbm Date'n > 999999
then input('1' || substr(put('Sbm Date'n,z8.),3), z7.)
end as SASDOS format=z7.
from
DB2SCHEMA.ORIGIN
where
SASDOS = 1130314;
quit;

As sasfrog commented, you need to add the CALCULATED keyword to refer to a new column in SAS SQL and you should refer to the native DB2 column in your query. For example:
proc sql;
create table TEST as
select DQBBDA AS 'Sbm Date'n
, case when DQBBDA > 999999
then input('1' || substr(put(DQBBDA,z8.),3), z7.)
end as SASDOS format=z7.
from DB2SCHEMA.ORIGIN
WHERE CALCULATED SASDOS = 1130314;
quit;
However, you really should rethink what you are doing and figure out how to write a WHERE clause that uses only columns from DB2; otherwise the entire table must be pulled back to SAS (a likely poor solution). Cases like this are probably better solved using a pass-thru query (where you can execute native SQL directly in DB2).
UPDATE: Here is another (tested) example using a SAS data set rather than a table from a LIBNAME reference. Notice I'm also correcting a syntax error with the input function (the last parameter should be 7. not z7.).
data ORIGIN;
DQBBDA = 11130314; output;
DQBBDA = 22130314; output;
run;
options validvarname=any;
proc sql;
create table TEST as
select DQBBDA AS 'Sbm Date'n
, case when DQBBDA > 999999
then input('1' || substr(put(DQBBDA,z8.),3), 7.)
end as SASDOS format=z7.
from ORIGIN
WHERE CALCULATED SASDOS = 1130314;
quit;
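Following the advice to write the WHERE clause against the source column only, the same filter can be restated in terms of DQBBDA. SASDOS = 1130314 holds exactly when DQBBDA > 999999 and its last six digits are 130314 (assuming values of at most 8 digits, which is what the Z8. format handles). The sketch below uses made-up rows in a local ORIGIN dataset. Whether SAS actually passes MOD down to DB2 through implicit pass-through is not guaranteed, so check the generated SQL (for example with the SASTRACE= system option).

```sas
/* Made-up sample rows; the third one should be filtered out */
data ORIGIN;
  DQBBDA = 11130314; output;
  DQBBDA = 22130314; output;
  DQBBDA = 11130315; output;
run;

options validvarname=any;
proc sql;
  create table TEST2 as
  select DQBBDA as 'Sbm Date'n
       , case when DQBBDA > 999999
              then input('1' || substr(put(DQBBDA, z8.), 3), 7.)
         end as SASDOS format=z7.
  from ORIGIN
  /* Equivalent to CALCULATED SASDOS = 1130314 for values of up to
     8 digits, but expressed only on the source column */
  where DQBBDA > 999999 and mod(DQBBDA, 1000000) = 130314;
quit;
```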

I think that rather than WHERE CALCULATED SASDOS, you should use HAVING SASDOS.