SAS PROC SQL NOT CONTAINS multiple values in one statement

In PROC SQL, I need to select all rows where a column called "NAME" does not contain multiple values "abc", "cde" and "fbv" regardless of what comes before or after these values. So I did it like this:
SELECT * FROM A WHERE
NAME NOT CONTAINS "abc"
AND
NAME NOT CONTAINS "cde"
AND
NAME NOT CONTAINS "fbv";
which works just fine, but I imagine it would be a headache if we had a hundred conditions. So my question is: can we accomplish this in a single statement in PROC SQL?
I tried using this:
SELECT * FROM A WHERE
NOT CONTAINS(NAME, '"abc" AND "cde" AND "fbv"');
but this doesn't work in PROC SQL, I am getting the following error:
ERROR: Function CONTAINS could not be located.
I don't want to use LIKE.

You could use regular expressions, I suppose.
data a;
input name $;
datalines;
xyabcde
xyzxyz
xycdeyz
xyzxyzxyz
fbvxyz
;;;;
run;
proc sql;
SELECT * FROM A WHERE
NAME NOT CONTAINS "abc"
AND
NAME NOT CONTAINS "cde"
AND
NAME NOT CONTAINS "fbv";
SELECT * FROM A WHERE
NOT (PRXMATCH('~ABC|CDE|FBV~i',NAME));
quit;
You can't use CONTAINS that way, though.
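If the list of substrings is long, the alternation in the regular expression could be generated from a lookup table instead of typed by hand. The following is only a sketch: the table work.words and the macro variable pattern are names assumed for illustration, and it presumes the words contain no regular-expression metacharacters and that the joined pattern fits within the macro variable length limit.

```sas
/* Hypothetical lookup table of substrings to exclude */
data work.words;
  length word $50;
  input word $;
  datalines;
abc
cde
fbv
;
run;

proc sql noprint;
  /* Join the words with | to form the alternation, e.g. abc|cde|fbv */
  select strip(word) into :pattern separated by '|'
  from work.words;

  /* Turn printed output back on for the query itself */
  reset print;

  /* Keep rows where none of the words occurs (case-insensitive) */
  select * from a
  where not prxmatch("~&pattern~i", name);
quit;
```

Double quotes are needed around the pattern so that &pattern is resolved before PRXMATCH sees it.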

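You can use NOT IN, but only if you want to exclude rows where NAME equals one of the values exactly; it does not test for substrings: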
SELECT * FROM A WHERE
NAME NOT IN ('abc','cde','fbv');

If the list of items is too long to write out in code, you can store the words in a table (work.words below) and iterate over it to check for occurrences:
data work.values;
input name $;
datalines;
xyabcde
xyzxyz
xycdeyz
xyzxyzxyz
fbvxyz
;
run;
data work.words;
length word $50;
input word $;
datalines;
abc
cde
fbv
;
run;
data output;
set values;
/* build a hash of words */
length word $50;
if _n_ = 1 then do;
/* this runs once only */
call missing(word);
declare hash words (dataset: 'work.words');
words.defineKey('word');
words.defineData('word');
words.defineDone();
/* create the iterator once, alongside the hash */
declare hiter iter('words');
end;
/* iterate hash of words */
rc = iter.first();
found = 0;
do while (rc=0);
if index(name, trim(word)) gt 0 then do; /* check if word present using INDEX function */
found= 1;
rc = 1;
end;
else rc = iter.next();
end;
if found = 0 then output; /* output only if no word found in name */
drop word rc found;
run;
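Since the question asks about PROC SQL, the same lookup-table idea can probably also be written as a correlated NOT EXISTS subquery. This is a sketch rather than a tuned solution: it reuses work.values and work.words from above, the output table name work.output_sql is made up here, and every name is compared against every word, so it may be slow on large tables.

```sas
proc sql;
  create table work.output_sql as
  select v.*
  from work.values as v
  where not exists (
    /* any word that occurs somewhere inside the name */
    select 1
    from work.words as w
    where find(v.name, strip(w.word)) > 0
  );
quit;
```

FIND is case-sensitive by default; passing 'i' as a third argument should make the comparison case-insensitive.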

Related

Building an SQL query in Base SAS for 81 variables to report count of non null values

I have a data template with 81 exposure elements/variables and approx. 9 million rows for loans generated by a bank, e.g. customer number, reporting date, account number, customer type etc.
I need to conduct data validation to report
variable missing or not
No of non missing values populated for each available variable
the data type of the values populated under each variable
Individually for each variable I'm using the query
proc sql;
select COUNT(variable) from library.table where not missing(variable);
quit;
How can I extend the above query to all 81 variables ?
I already have the attributes using
proc sql;
create table test as select * from dictionary.columns where libname="XXX" and memname="tablename";
quit;
But if the above could be incorporated into one holistic query that generates an output I can export to Excel, that would be great.
Thanks
In SAS there is usually a PROC for this kind of task. Here it would be PROC FREQ, e.g. this example.
If you want an output dataset, you can adapt the linked solution like this:
proc format;
value $missfmt ' ' = 'Missing' other = 'Not Missing';
value missfmt . = 'Missing' other = 'Not Missing';
run;
/* Capture the output. */
ods output OneWayFreqs = want;
/* Count missing and not missing. */
proc freq data=have;
format _char_ $missfmt. _numeric_ missfmt.;
tables _all_ / missing nocum nopercent;
run;
data want;
set want;
array f {*} f_:;
/* Extract column name. */
do i = 1 to dim(f);
if not missing(f[i]) then
column = substr(vname(f[i]), 3);
end;
/* Extract column type. */
type = vtypex(column);
/* Get value, i. e. missing or not missing. */
value = cats(of f_:);
run;
proc sort data=want;
by column type value;
run;
/* Transpose the missing and not missing rows into two columns. */
proc transpose data=want out=want(drop=_:);
by column type;
id value;
var frequency;
run;
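If you would rather stay in PROC SQL, as the question asks, the column list from dictionary.columns can be used to generate one COUNT() expression per variable. The sketch below makes several assumptions: MYLIB and LOANS are placeholder names for your library and table, the variable names are ordinary SAS names, and the generated text fits in one macro variable (it should for 81 columns).

```sas
/* MYLIB and LOANS are placeholders; dictionary tables store them in upper case */
proc sql noprint;
  /* Build "count(var1) as var1, count(var2) as var2, ..." for every column */
  select 'count(' || strip(name) || ') as ' || strip(name)
    into :count_list separated by ', '
  from dictionary.columns
  where libname = 'MYLIB' and memname = 'LOANS';

  /* One output row; each column holds the non-missing count of that variable */
  create table work.nonmissing_counts as
  select &count_list
  from mylib.loans;
quit;
```

COUNT(variable) in PROC SQL counts non-missing values for both character and numeric columns, so no WHERE clause is needed. The type of each variable is already available in the TYPE column of dictionary.columns.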

Rename column headers - either after a key database in sas - or after values from first row

I need to rename the column headers of my variables so they match what I have in my key list. I attached a picture below to describe what I have and what I need.
My Data
I don't necessarily need actual code, just an idea of how to make it happen. :)
Thank you so much folks, and so sorry about the changes, I have never posted a question before.
If you have a table like
NEW1 NEW2 NEW3
OLDX OLDY OLDZ
And you want to use it to generate rename statement like
rename oldx=new1 oldy=new2 oldz=new3 ;
Then an easy way to do it is to use PROC TRANSPOSE to convert it into a separate row for each name pair.
proc transpose data=have out=names ;
var _all_;
run;
Which will get you a table like
_NAME_ COL1
NEW1 OLDX
NEW2 OLDY
NEW3 OLDZ
Then you can use PROC SQL to quickly generate a macro variable with the pairs.
proc sql noprint;
select catx('=',col1,_name_) into :rename separated by ' '
from names;
quit;
data new ;
set old;
rename &rename ;
run;
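If the goal is only to rename the variables and not to create a copy, the same &rename macro variable could be passed to PROC DATASETS, which changes the names in place without reading the data. A minimal sketch, assuming the data set is WORK.OLD and &rename was built by the PROC SQL step above:

```sas
proc datasets library=work nolist;
  /* modify the header of WORK.OLD only; no rows are rewritten */
  modify old;
  rename &rename;
quit;
```

This tends to matter mostly for large tables, where rewriting every row just to change column names is wasteful.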
If the list of names is too long to put into a single macro variable then just use a data step to generate the rename statement to a text file and use %INCLUDE to run it where you want.
filename code temp;
data _null_;
set names end=eof;
file code ;
if _n_=1 then put 'rename' ;
put col1 '=' _name_ ;
if eof then put ';';
run;
data new ;
set old;
%include code ;
run;
EDIT
You could probably do the last step directly from the data set and skip the proc transpose.
filename code temp;
data _null_;
set have ;
array _X _character_ ;
file code ;
put 'rename ' @ ;
do i=1 to dim(_X);
oldname = _x(i);
newname = vname(_x(i));
put oldname '=' newname @;
end;
put / ';' ;
stop;
run;
You can use column aliases to change what's displayed in the results header row.
SELECT A AS 'NewA',
B AS 'OtherB',
C AS 'diffC'
FROM <<Table>>
If you want 'NewA OtherB diffC' as a row in the results, you could do this:
SELECT 'NewA' AS 'A',
'OtherB' AS 'B',
'diffC' AS 'C'
UNION
SELECT A,
B,
C
FROM <<Table>>

simple SAS select into

I want to use "select into" to create a list of all IDs in SAS.
/* my state table try01 */
data try01;
input id state $;
cards;
1108 va
1102 dc
1101 md
1105 on
;
run;
/* select into */
proc sql noprint;
select id into: x from try01;
quit;
%put &x;
My question is why the log shows that macro x is only one value (1108) instead
of a list (1108,1102,1101,1105) ? So confused... thanks a lot.
If you want SQL to put multiple values into the macro variable then you need to include the SEPARATED BY clause.
select id into :x separated by ' ' from try01;
You could then use this list in, for example an IN operator call.
proc print data=have ;
where id in (&x);
run;
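The separator can be any string. If you want the comma-separated form shown in the question, a sketch along the same lines (reusing the try01 table from the question) would be:

```sas
proc sql noprint;
  /* without SEPARATED BY only the first row's value is stored */
  select id into :x separated by ','
  from try01;
quit;

/* write the resulting list to the log */
%put &x;
```

The IN operator in SAS accepts either spaces or commas between values, so both forms of the list can be used in a WHERE clause.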

SAS proc sql inside %macro

Firstly I have the following table:
data dataset;
input id $ value;
datalines;
A 1
A 2
A 3
A 4
B 2
B 3
B 4
B 5
C 2
C 4
C 6
C 8
;
run;
I would like to write a macro so that the user can subset the data by giving the id value. I do proc sql inside the macro as follows:
%macro sqlgrp(id=,);
proc sql;
create table output_&id. as
select *
from dataset
where id = '&id.'
;
quit;
%mend;
%sqlgrp(id=A); /*select id=A only*/
I am able to generate the output_A table in the WORK library, however it has zero (0) observations.
Why is this not working?
You need to use double quotes around macro variable references inside string literals; SAS does not resolve macro variables inside single quotes.
Current Code
%macro sqlgrp(id=,);
proc sql;
create table output_&id. as
select *
from dataset
where id = '&id.'
;
quit;
%mend;
%sqlgrp(id=A); /*select id=A only*/
Looks for values of id that are literally '&id.'. You can test this by creating this dataset:
data dataset;
input id $ value;
datalines;
&id. 2
A 2
;
run;
Now, use %let to set the value of the macro variable id:
%let id=A;
Run a quick test of the functionality difference between single and double quotes. Notice the titles also contain single and double quotes, so we can see exactly what has happened in the output:
proc sql;
title 'Single Quotes - where id=&id.';
select *
from dataset
where id='&id.';
title "Double Quotes - where id=&id.";
select *
from dataset
where id="&id.";
title;
quit;
Correct Code
%macro sqlgrp(id=,);
proc sql;
create table output_&id. as
select *
from dataset
where id = "&id."
;
quit;
%mend;
%sqlgrp(id=A); /*select id=A only*/
The double quotes allow the macro variable &id to resolve to 'A', which will return results based on your input.
Just a simple rewrite of the previous answer which passes 'in' and 'out' through a signature of the macros
%macro sqlgrp(in=, id=, out=);
proc sql noprint;
create table &out. as select * from &in. where id = "&id.";
quit;
%mend sqlgrp;

Select character variables that have all missing values

I have a SAS dataset with around 3,000 variables, and I would like to get rid of the character variables for which all values are missing. I know how to do this for numeric variables-- I'm wondering specifically about the character variables. I need to do the work using base SAS, but that could include proc SQL, which is why I've tagged this one 'SQL' also.
Thank you!
Edit:
Background info: This is a tall dataset, with survey data from 7 waves of interviews. Some, but not all, of the survey items (variables) were repeated across waves. I'm trying to create a list of items that were actually used in each wave by pulling all the records for that wave, getting rid of all the columns that have nothing but SAS's default missing values, and then running proc contents.
I created a macro that will check for empty character columns and either remove them from the original or create a new data set with the empty columns removed. It takes two optional arguments: The name of the data set (default is the most recently created data set), and a suffix to name the new copy (set suffix to nothing to edit the original).
It uses proc freq with the levels option and a custom format to determine the empty character columns. proc sql is then used to create a list of the columns to be removed and store them in a macro variable.
Here is the macro:
%macro delemptycol(ds=_last_, suffix=_noempty);
option nonotes;
proc format;
value $charmiss
' '= ' '
other='1';
run;
%if "&ds"="_last_" %then %let ds=&syslast.;
ods select nlevels;
ods output nlevels=nlev;
proc freq data=&ds.(keep=_character_) levels ;
format _character_ $charmiss.;
run;
ods output close;
/* create macro var with list of cols to remove */
%local emptycols;
proc sql noprint;
select tablevar into: emptycols separated by ' '
from nlev
where NNonMissLevels=0;
quit;
%if &emptycols.= %then %do;
%put DELEMPTYCOL: No empty character columns were found in data set &ds.;
%end;
%else %do;
%put DELEMPTYCOL: The following empty character columns were found in data set &ds. : &emptycols.;
%put DELEMPTYCOL: Data set &ds.&suffix created with empty columns removed;
data &ds.&suffix. ;
set &ds(drop=&emptycols);
run;
%end;
options notes;
%mend;
Example usage:
/* create some fake data: Here char5 will be empty */
data chardata(drop= j randnum);
length char1-char5 $8.;
array chars(5) char1-char5;
do i=1 to 100;
call missing(of char:);
randnum=floor(10*ranuni(i));
do j=2 to 5;
if (j-1)<randnum<=(j+1) then chars(j-1)="FOO";
end;
output;
end;
run;
%delemptycol(); /* uses default _last_ for the data and "_noempty" as the suffix */
%delemptycol(ds=chardata, suffix=); /* removes the empty columns from the original */
There's probably a simpler way but this is what I came up with.
Cheers
Rob
EDIT: Note that this works for both character and numeric variables.
**
** TEST DATASET
*;
data x;
col1 = "a"; col2 = ""; col3 = "c"; output;
col1 = "" ; col2 = ""; col3 = "c"; output;
col1 = "a"; col2 = ""; col3 = "" ; output;
run;
**
** GET A LIST OF VARIABLE NAMES
*;
proc sql noprint;
select name into :varlist separated by " "
from sashelp.vcolumn
where upcase(libname) eq "WORK"
and upcase(memname) eq "X";
quit;
%put &varlist;
**
** USE A MACRO TO CREATE A DATASTEP. FOR EACH COLUMN
** THE DATASTEP WILL CREATE A NEW COLUMN WITH THE SAME NAME
** BUT PREFIXED WITH "DELETE_". IF THERE IS AT LEAST 1
** NON-MISSING VALUE FOR THE COLUMN THEN THE "DELETE" COLUMN
** WILL FINISH WITH A VALUE OF 0, ELSE 1. WE WILL ONLY
** KEEP THE COLUMNS CALLED "DELETE_" AND OUTPUT ONLY A SINGLE
** OBSERVATION TO THE FINAL DATASET.
*;
%macro find_unused_cols(iDs=);
%local cnt;
data vars_to_delete;
set &iDs end=eof;
%let cnt = 1;
%let varname = %scan(&varlist, &cnt);
%do %while ("&varname" ne "");
retain delete_&varname;
delete_&varname = min(delete_&varname, missing(&varname));
drop &varname;
%let cnt = %eval(&cnt + 1);
%let varname = %scan(&varlist, &cnt);
%end;
if eof then do;
output;
end;
run;
%mend;
%find_unused_cols(iDs=x);
**
** GET A LIST OF VARIABLE NAMES FROM THE NEW DATASET
** THAT WE WANT TO DELETE AND STORE TO A MACRO VAR.
*;
proc transpose data=vars_to_delete out=vars_to_delete;
run;
proc sql noprint;
select substr(_name_,8) into :vars_to_delete separated by " "
from vars_to_delete
where col1;
quit;
%put &vars_to_delete;
**
** CREATE A NEW DATASET CONTAINING JUST THOSE VARS
** THAT WE WANT TO KEEP
*;
data new_x;
set x;
drop &vars_to_delete;
run;
Rob and cmjohns, thank you SO MUCH for your help. Based on your solutions and an idea I had over the weekend, here is what I came up with:
%macro removeEmptyCols(origDset, outDset);
* get the number of obs in the original dset;
%let dsid = %sysfunc(open(&origDset));
%let origN = %sysfunc(attrn(&dsid, nlobs));
%let rc = %sysfunc(close(&dsid));
proc transpose data= &origDset out= transpDset;
var _all_;
run;
data transpDset;
set transpDset;
* proc transpose converted all old vars to character,
so the . from old numeric vars no longer means 'missing';
array oldVar_ _character_;
do over oldVar_;
if strip(oldVar_) = "." then oldVar_ = "";
end;
* each row from the old dset is now a column with varname starting with 'col';
numMiss = cmiss(of col:);
numCols = &origN;
run;
proc sql noprint;
select _NAME_ into: varsToKeep separated by ' '
from transpDset
where numMiss < numCols;
quit;
data &outDset;
set &origDset (keep = &varsToKeep);
run;
%mend removeEmptyCols;
I will try all 3 ways and report back on which one is fastest...
P.S. added 23 Dec 2010 for future reference: SGF Paper 048-2010: Dropping Automatically Variables with Only Missing Values
This is a very simple method that works for all variables:
proc freq data=class nlevels ;
ods output nlevels=levels(where=(nmisslevels>0 and nnonmisslevels=0));
run;
proc sql noprint;
select TABLEVAR into :_MISSINGVARS separated by ' ' from levels;
quit;
data want;
set class (drop=&_MISSINGVARS);
run;