Using Perl Regular Expressions in SAS Proc SQL statements

I was attempting to use regular expressions in a SAS SQL statement and couldn't get them working. It runs, but it doesn't return the matched expression (column xx is always blank). Not sure if this is something I'm doing wrong or if SAS doesn't let you do this.
proc sql noprint;
create table xx as
select *,
prxposn(prxparse("/a/i"), 0, name) as xx
from sashelp.class
;
quit;
Thanks
Rob
EDIT: I know I could just do the PROC SQL and then do the regex in a datastep - I can get that working fine, I just want to know if it's possible to do it all in the PROC sql.

I don't think the documentation is particularly clear on the matter, but "the PRXPOSN function uses the results of PRXMATCH, PRXSUBSTR, PRXCHANGE, or PRXNEXT to return a capture buffer" and so you have to call one of those functions first, using the regular expression ID you generate via PRXPARSE, prior to calling PRXPOSN.
The following SAS code works for me on 9.1.3. Your intention is not entirely clear to me, but I'm assuming you want to capture the suffix starting from the first "a", and so I modified your regular expression accordingly:
proc sql;
create table xx as
select *,
prxparse("/a\w*/i") as re,
ifc(
prxmatch(calculated re, name),
prxposn(calculated re, 0, name),
" "
) as xx
from sashelp.class;
quit;
One downside to this approach (besides its decided lack of elegance) is that it adds an additional variable (re) to the output data set. The following sources were helpful to me in tracking down the behavior of PRXPOSN:
http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
http://communities.sas.com/thread/30443
http://groups.google.com/group/comp.soft-sys.sas/browse_thread/thread/15ec39268d497990/d2eaf9c4512ee0b5?lnk=gst&q=prxposn&pli=1
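If the extra re column is unwanted, one possible workaround (a sketch, not tested on 9.1.3) is to put a drop= data set option on the output table. The option is applied when the table is written, so re should still be available to the calculated references inside the query:

```sas
proc sql;
  /* drop= acts on the output table only, so re can still be used in the query */
  create table xx(drop=re) as
  select *,
         prxparse("/a\w*/i") as re,
         ifc(prxmatch(calculated re, name),
             prxposn(calculated re, 0, name),
             " ") as xx
  from sashelp.class;
quit;
```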

This may be different than what you want, but this will populate xx with the location of the first a in name:
proc sql noprint;
create table xx as
select *,
prxmatch('/a/i', name) as xx
from sashelp.class
;
quit;

Related

How to use RPAD in SAS?

I am trying to right-pad a variable with zeros in a SQL step in SAS. I am using the RPAD function, which belongs to Oracle, but it does not work in SAS.
It would be very helpful if someone could support me.
In SAS there is not a direct equivalent; you'd do something like
proc sql;
select name, cats(name,'00000')
from sashelp.class;
quit;
Or, you can use repeat, which returns its argument followed by n further copies of it (so repeat('0',4) yields five zeros, the same as above).
proc sql;
select name, cats(name,repeat('0',4))
from sashelp.class;
quit;
Finally, all SAS character variables are de facto right-padded with spaces up to their declared length, so if the variable is '### ' you could do
proc sql;
select name, translate(name,'0',' ')
from sashelp.class;
quit;
This assumes you want to fill the whole right side with 0's. Note that translate also replaces any blanks embedded in the value, not just the trailing ones.
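To mimic RPAD with a fixed target width, one option is to append more zeros than could ever be needed and then cut the result to that width. This is only a sketch: the width of 10 and the column name padded are assumptions chosen for illustration.

```sas
proc sql;
  /* cats() strips trailing blanks before appending the zeros;   */
  /* repeat('0', 9) yields ten zeros; substr() keeps 10 chars.   */
  select name,
         substr(cats(name, repeat('0', 9)), 1, 10) as padded length=10
  from sashelp.class;
quit;
```

With this, 'Alfred' becomes 'Alfred0000'. Values longer than the target width are truncated, which is also how Oracle's RPAD behaves.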

Proc SQL with a macro in where statement to select on ID's

I am currently trying to write a small SAS macro that does the following:
The macro reads the input ID values from an input table: "input_table".
The ID values are used to query an oracle database for the variable "TARGET".
The macro is shown below.
Whenever I run the macro, the filtering on ID does not seem to work and the proc sql returns an empty table. I cannot get my head around what might be going wrong; all help is welcome!
My current solution is using an inner join, which does the job. However, the SQL solution is strongly preferred for efficiency reasons.
QUESTION: Why is the Proc SQL not selecting records based on the list "id_list"?
%macro query_DB_from_table(input_table = , output_table = );
/* PART 1: Get IDs from the input table */
%local id_list;
proc sql noprint;
select ID into: id_list separated by "' , '"
from &input_table;
quit;
/* PART 2: Query the Oracle Database */
proc sql noprint;
create table &output_table as
select ID, TARGET
from ORACLE_DB
where ID in (%str(')%bquote(&id_list)%str('))
order by ID;
quit;
%mend query_DB_from_table;
The QUOTE function allows a second argument that specifies the quoting character.
The value being quoted should also be TRIM'd.
Change your SQL to directly populate the macro variable value as a complete comma separated list of single quoted values.
Example:
proc sql noprint;
select
quote(trim(name),"'") into :idlist separated by ','
from
sashelp.class
;
quit;
%put &=idlist;
---------- log ----------
IDLIST='Alfred','Alice','Barbara','Carol','Henry','James','Jane',
'Janet','Jeffrey','John','Joyce','Judy','Louise','Mary','Philip','Robert',
'Ronald','Thomas','William'
The query where clause would then be simpler (shown here with the macro variable name from the question, id_list):
…
WHERE ID in ( &id_list )
…
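Putting the two pieces together, the macro from the question might look like the sketch below. It assumes ID is a character column in both tables and keeps the question's names (input_table, output_table, ORACLE_DB, TARGET) unchanged.

```sas
%macro query_DB_from_table(input_table = , output_table = );
  %local id_list;

  /* PART 1: build a comma-separated list of single-quoted, trimmed IDs */
  proc sql noprint;
    select quote(trim(ID), "'") into :id_list separated by ','
    from &input_table;
  quit;

  /* PART 2: query the Oracle database with the ready-made list */
  proc sql noprint;
    create table &output_table as
    select ID, TARGET
    from ORACLE_DB
    where ID in (&id_list)
    order by ID;
  quit;
%mend query_DB_from_table;
```

Two caveats: a macro variable is limited to 65,534 characters, so a very long ID list will not fit, and an empty input table leaves id_list empty, which makes the in ( ) clause invalid.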

Use SAS Proc SQL to Find Rows Where All Columns Are Not Missing?

I would like to use SAS Proc SQL to find the rows in a table in which every column has a non-missing value. Is there a way to do this without having to list all of the column names? This did not work, but it might give you an idea of what my intended output is.
Proc SQL;
Select *
from work.table
Where * is not null;
Run;
I would also like to limit the results to one observation if possible. Thanks.
Nontrivial in SQL since you cannot get all variables in one item without using the macro language. In the datastep, this is trivial.
data class;
set sashelp.class;
if mod(_n_,3)=1 then call missing(age);
run;
data want;
set class;
if cmiss(of _all_)=0;
run;
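cmiss counts the missing values among its arguments, character or numeric, so cmiss(of _all_)=0 keeps only the rows in which nothing is missing. To limit the result to one row, stop the data step after the first row that passes the test (output; stop;). An obs=1 option on the set statement would instead limit the input to the first row, whether or not that row is complete.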
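A minimal sketch of a data step that returns only the first complete row, reusing the class data set built above (the output name want_one is an assumption):

```sas
data want_one;
  set class;
  /* keep the first row with no missing values, then end the step */
  if cmiss(of _all_) = 0 then do;
    output;
    stop;
  end;
run;
```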
In SQL, you have to be explicit. * is not a general-purpose macro that expands out the columns. It is a syntactic element that happens to be used in select * and count(*).
So, something like this:
Proc SQL;
Select *
from work.table
Where col1 is not null and col2 is not null and col3 is not null /* . . . */;
Quit;
Using SQL and dictionary tables:
proc sql noprint;
select cats('not missing(', name, ')')
into :expression separated by " and "
from dictionary.columns
where libname = "SASHELP" and memname = "CLASS";
quit;
proc sql outobs=1;
select *
from sashelp.class
where &expression.;
quit;

SAS SQL: WHERE LIKE 'list of words'

I've a dataset that has some comments that would exclude subjects. I want to make a mini dataset to collect these subjects.
I'm trying to use SAS SQL for this so I tried to do this:
PROC SQL;
CREATE TABLE EXCLUDE as
SELECT *
FROM data_set
WHERE UPCASE(COMMENT) like '%(INELIGIBLE | REFUSED)%';
QUIT;
I also tried
PROC SQL;
CREATE TABLE exclude as
SELECT *
FROM Data_set
WHERE UPCASE(COMMENT) like ('%INELIGIBLE%'|'%REFUSED%')
;
QUIT;
I keep getting an error that says 'LIKE OPERATOR Requires character operands'
How can I make this a proper syntax query?
Thanks
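You could do it via a like-join against a list of the terms to exclude. Note that, as written, the left join combined with where missing(b.word) returns the rows that match none of the terms; to collect the matching rows instead, use an inner join (with select distinct, since a comment can match more than one term):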
data words ;
input word $char16. ;
datalines ;
INELIGIBLE
REFUSED
;
run ;
proc sql ;
create table exclude as
select a.*
from data_set a
left join
words b on upcase(a.comment) like cats('%',b.word,'%')
where missing(b.word) ;
quit ;
You can use perl regular expressions to do this, if you're working with a string that already is formed. (If not, you're better off just writing the separate syntax, PRXs are slow.)
Equivalent code here, one written out, one with a PRX using a single string:
proc sql;
select *
from sashelp.class
where not (name like 'A%' or name like 'B%');
quit;
proc sql;
select *
from sashelp.class
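/* [AB] is a character class; inside brackets a | would be a literal pipe. */
/* The i flag makes this case-insensitive, which LIKE is not.              */
where not (prxmatch('~^[AB]~io',name));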
quit;
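Applied to the data in the question, the same idea might look like the sketch below. It assumes the data_set table and comment column from the question; the i flag takes the place of UPCASE.

```sas
proc sql;
  create table exclude as
  select *
  from data_set
  /* prxmatch returns the position of the first match, or 0 if there is none */
  where prxmatch('/INELIGIBLE|REFUSED/i', comment) > 0;
quit;
```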
SQL does not have full regular expressions support. In SAS, you could use prxmatch(). But, you can also do this in SQL:
PROC SQL;
CREATE TABLE EXCLUDE as
SELECT *
FROM data_set
WHERE UPCASE(COMMENT) like '%INELIGIBLE%' OR
UPCASE(COMMENT) like '%REFUSED%';
QUIT;
Note: this will not use an index on comment.
Here is another solution, using contains, where the search terms come from a dataset (which can, for example, be read from an external file). I like this for its portability.
Proc sql noprint;
select 'Upcase(Comment) contains '''||strip(Upcase(term))||''''
into :strings separated by ' or '
from exclusion_terms
order by 1;
create table Excluded as
select *
from Data_set
where &strings;
Quit;
Here, the first section creates the macro variable string from the dataset of exclusion terms, which is then used to create the Excluded dataset.
The hard-coded version of the search terms using Contains:
Proc Sql;
create table Excluded as
select *
from Data_set
where Upcase(Comment) contains 'INELIGIBLE'
   or Upcase(Comment) contains 'REFUSED';
Quit;

Sorting Table Variables by Prefix/Starting Letter

This is for a SAS table, so SQL commands would work, as well.
I have a table with 300 variables; they have 5 different prefixes, which I would like to sort them by. I want them in a particular order (mtr prefix before date prefix), but alphabetical would be acceptable.
I was thinking SQL would have something along the lines of:
Select mtr*, date* from Table
or
Select mtr%, date% from Table
As gbn says, you'll need to get the column names and dynamically build some sql (or data step code).
Here's a solution that retrieves the column names from an automatic SAS view that holds metadata about your session, ordered alphabetically, into a single macro variable which you can then use later in your code:
proc sql noprint;
select name into :orderedVarNames separated by ','
from sashelp.vcolumn
where libname='WORK' and memname='YOUR_TABLE_NAME'
order by name
;
quit;
(Obviously you'll need to replace the quoted values with the correct libname and table name for your table; both are stored in upper case in this view, so write them in upper case.) Then you can use this macro variable in another step, like this:
proc sql;
select &orderedVarNames
from YOUR_TABLE_NAME
;
quit;
Here, "&orderedVarNames" is resolved to the list of column names. You can check what is in the variable by putting it out to the log thus: %put &orderedVarNames;
There are other ways to do what you're thinking of, but this is probably the quickest and will work for any table. If you were going to use this technique for a variable list in a data step, change the separator to separated by ' '.
Once you've got the hang of this, you could then tailor the solution to get the exact order you want by generating more than one macro variable and filtering what you're retrieving from sashelp.vcolumn. Something like this:
proc sql noprint;
select name into :orderedMTRvars separated by ','
from sashelp.vcolumn
where libname='WORK' and memname='MYTABLE' and upcase(substr(name,1,3))='MTR'
order by name
;
select name into :orderedDATEvars separated by ','
from sashelp.vcolumn
where libname='WORK' and memname='MYTABLE' and upcase(substr(name,1,4))='DATE'
order by name
;
quit;
proc sql;
select &orderedMTRVars, &orderedDATEVars
from MYTABLE
;
quit;
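With five prefixes, a single query that ranks the prefixes may be easier to maintain than one macro variable per prefix. The following is only a sketch: MYTABLE and the MTR and DATE prefixes come from the example above, the macro variable name orderedVars is made up, and the remaining prefixes would need their own when branches. PROC SQL may write a note to the log about ordering by an item that is not in the select list.

```sas
proc sql noprint;
  select name into :orderedVars separated by ','
  from dictionary.columns
  where libname='WORK' and memname='MYTABLE'
  /* rank the prefixes first, then sort alphabetically within each prefix */
  order by case
             when upcase(name) like 'MTR%'  then 1
             when upcase(name) like 'DATE%' then 2
             else 3
           end,
           name;
quit;

proc sql;
  select &orderedVars
  from MYTABLE;
quit;
```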