I'm trying to create a set of flags based off of a column of character strings in a data set. The string has thousands of unique values, but I want to create flags for only a small subset (say 10). I'd like to use a SAS macro variable to do this. I've tried many different approaches, none of which have worked. Here is the code that seems simplest and most logical to me, although it's still not working:
%let Px1='12345';
PROC SQL;
CREATE TABLE CLAIM1 AS
SELECT
b.MEMBERID
, b.ENROL_MN
, CASE WHEN (a.PROCEDURE = &Px1.) THEN 1 ELSE 0 END AS CPT_+&Px1.
, a.DX1
, a.DX2
, a.DX3
, a.DX4
FROM ENROLLMENT as b
left join CLAIMS as a
on a.MEMBERID = b.MEMBERID;
QUIT;
Obviously there is only one flag in this code, but once I figure it out the idea is that I would add additional macro variables and flags. Here is the error message I get:
8048 , CASE WHEN (PROCEDURE= &Px1.) THEN 1 ELSE 0 END AS CPT_+&Px1.
-
78
ERROR 78-322: Expecting a ','.
It seems that the cause of the problem is related to combining the string CPT_ with the macro variable. As I mentioned, I've tried several approaches to addressing this, but none have worked.
Thanks in advance for your help.
Something like this normally requires dynamic SQL (although I am not sure how well that works with SAS; I believe it may depend on how you have established the connection to the database).
Proc sql;
DECLARE @px1 varchar(20) = '12345'
,@sql nvarchar(max) =
'SELECT b.MEMBERID
, b.ENROL_MN
, CASE WHEN (a.PROCEDURE = ''' + @px1 + ''') THEN 1 ELSE 0
END AS CPT_' + @px1 + '
, a.DX1
, a.DX2
, a.DX3
, a.DX4
FROM ENROLLMENT as b
left join CLAIMS as a
on a.MEMBERID = b.MEMBERID'
EXEC sp_executesql @sql;
QUIT;
Your issue here is the quotes in the macro variable.
%let Px1='12345';
So now SAS is seeing this:
... THEN 1 ELSE 0 END AS CPT_+'12345'
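That is not legal syntax: a column name cannot contain a + or a quoted string. Remove the quotes from the macro variable.
%let Px1 = 12345;
Then add the quotes back only where a string literal is needed, and drop the +.
CASE WHEN a.procedure = "&px1." THEN 1 ELSE 0 END AS CPT_&px1.
Note the double quotes (" not '): macro variables resolve inside double quotes but not inside single quotes.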
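If more flags are needed, one possibility is to let a small macro loop write the CASE expressions. This is only a sketch: the macro name make_flags and the second code 4567 are made up for illustration, and it assumes every code is made of characters that are valid in a SAS variable name.

```sas
/* Hypothetical helper: emits one ", CASE ... AS CPT_<code>" per code */
%macro make_flags(codes);
  %local i code;
  %do i = 1 %to %sysfunc(countw(&codes, %str( )));
    %let code = %scan(&codes, &i, %str( ));
    , CASE WHEN a.PROCEDURE = "&code" THEN 1 ELSE 0 END AS CPT_&code
  %end;
%mend make_flags;

proc sql;
  create table claim1 as
  select b.memberid
       , b.enrol_mn
       %make_flags(12345 4567)   /* space-separated list of codes */
       , a.dx1
       , a.dx2
       , a.dx3
       , a.dx4
  from enrollment as b
  left join claims as a
    on a.memberid = b.memberid;
quit;
```

Each generated expression starts with its own comma, so the macro call sits between two columns of the select list without extra punctuation.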
If you have a list it might help to put the list into a table. Then you can use SAS code to generate the code to make the flag variables instead of macro code.
Say a table with PX code variable.
data pxlist;
input px $10. ;
cards;
12345
4567
;
You could then use a PROC SQL query to generate the code for the flag variables and store it in a macro variable.
proc sql noprint;
select catx(' ','PROCEDURE=',quote(trim(px)),'as',cats('CPT_',px))
into :flags separated by ','
from pxlist
;
%put &=flags;
quit;
The generated code looks like this:
PROCEDURE= "12345" as CPT_12345,PROCEDURE= "4567" as CPT_4567
So if we make some dummy data.
data enrollment ;
length memberid $8 enrol_mn $6 ;
input memberid enrol_mn;
cards;
1 201612
;
data claims;
length memberid $8 procedure $10 dx1-dx4 $10 ;
input memberid--dx4 ;
cards;
1 12345 1 2 . . .
1 345 1 2 3 . .
;
We can then combine the two tables and create the flag variables.
proc sql noprint;
create table want as
select *,&flags
from ENROLLMENT
natural join CLAIMS
;
quit;
Results
memberid  procedure  dx1  dx2  dx3  dx4  enrol_mn  CPT_12345  CPT_4567
1         12345      1    2              201612    1          0
1         345        1    2    3         201612    0          0
I have around 80 columns named diag1 to diag80. I am wondering how I can pick just 30 of them and apply a CASE expression in PROC SQL. The following code produces an error because it doesn't understand the range.
proc sql;
create table data_WANT as
select *,
case
when **diag1:diag30** in ('F00','G30','F01','F02','F03','F051') then 1
else 0
end as p_nervoussystem
from data_HAVE;
quit;
Thank you, any help is appreciated!
You have two problems with that attempted syntax. The first is that variable lists are not supported by PROC SQL (since they are not part of SQL syntax). The second is that there is no simple syntax to search N variables for a list of M strings.
You will need a loop of some kind. It will be much easier in SAS DATA step code than in SQL.
For example, you could make an array to reference your 30 variables and then loop over it, checking whether each variable has a value in the list of values. You can stop checking once one is found.
data want;
set have;
array vars diag1-diag30;
p_nervoussystem=0;
do index=1 to dim(vars) while (not p_nervoussystem);
p_nervoussystem = vars[index] in ('F00','G30','F01','F02','F03','F051');
end;
run;
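If the flag really has to be computed inside PROC SQL, a macro can write out the 30 comparisons for you. A minimal sketch, assuming the table and column names from the question; the macro name any_in and its parameters are invented for this example.

```sas
/* Hypothetical helper: expands to
   diag1 in (...) or diag2 in (...) or ... or diagN in (...) */
%macro any_in(prefix=, n=, values=);
  %local i;
  %do i = 1 %to &n;
    %if &i > 1 %then or;
    &prefix&i in (%unquote(&values))
  %end;
%mend any_in;

proc sql;
  create table data_want as
  select *,
         case
           when %any_in(prefix=diag, n=30,
                        values=%str('F00','G30','F01','F02','F03','F051'))
           then 1
           else 0
         end as p_nervoussystem
  from data_have;
quit;
```

The %str() call keeps the commas in the value list from being read as parameter separators, and %unquote() removes that masking again before the text reaches PROC SQL.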
The inverse pattern to @Tom's answer is to loop over the nervous system diagnostic codes and search for each one:
- via INDEXW over a concatenation of the observed diagnoses
- via WHICHC over an array of the observed diagnoses
data have;
infile datalines missover;
length id 8;
array dx(30) $5;
input id (dx1-dx30) (30*:$5.);
datalines;
1 A00 B00 A12
2 F00 Z12 T45
3 A01 A02 B12 F00
4 Q12
5 Q13
6 T14
7 F44 F45 F46
8 . . . . . . . . . . . . . . G30
;
data want;
length p_nervoussystem p_ns 4;
set have;
array dx dx:;
array ns(6) $5 _temporary_ ('F00','G30','F01','F02','F03','F051');
dx_catx = catx(' ', of dx(*));* drop dx_catx; * way 1;
do _n_ = 1 to dim(ns) until(p_nervoussystem);
p_nervoussystem = 0 < indexw(dx_catx, trim(ns(_n_))); * way 1;
p_ns = 0 < whichc(ns(_n_), of dx(*)); * way 2;
end;
run;
If the data lives in SQL Server, try querying sys.tables and sys.columns and filter on the column names you need.
SELECT * FROM sys.tables INNER JOIN sys.columns ON columns.object_id = tables.object_id
I am examining data quality and am trying to see how many rows are populated properly. The field is of type Character with length 10 and should contain one letter followed by nine digits.
Ex.
A123456789
B123531490
C319861045
I have tried using the PRXMATCH function, but I am unsure whether I am using the proper syntax. I have also tried PROC SQL with WHERE ... NOT LIKE "[A-Z][0-9][0-9]..." and so on. My feeling is that this should not be difficult to do; does anyone have a solution?
Best regards
You can construct a REGEX to make that test. Or just build the test using normal SAS functions.
data want ;
set have ;
flag1 = prxmatch('/^[A-Z][0-9]{9}$/',trim(name));
test1 = 'A' <= substr(name,1,1) <= 'Z' ; /* compare only the first character, so values starting with Z also pass */
test2 = not notdigit(trim(substr(name,2))) ;
test3 = length(name)=10;
flag2 = test1 and test2 and test3 ;
run;
Results:
Obs  name           flag1  test1  test2  test3  flag2
1    A123456789590  0      1      1      0      0
2    B123531490ABC  0      1      0      0      0
3    C3198610       0      1      1      0      0
4    A123456789     1      1      1      1      1
5    B123531490     1      1      1      1      1
6    C319861045     1      1      1      1      1
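Since the question asks how many rows are populated properly, the same pattern can also be counted directly in PROC SQL. A small sketch, assuming the HAVE table and the NAME variable used above; the output column names n_valid and n_total are arbitrary.

```sas
proc sql;
  /* The comparison prxmatch(...) > 0 evaluates to 1 or 0,
     so summing it counts the rows that match the pattern */
  select sum(prxmatch('/^[A-Z][0-9]{9}$/', trim(name)) > 0) as n_valid,
         count(*) as n_total
  from have;
quit;
```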
You can use:
^[a-zA-Z][0-9]{9}$
The built-in SAS functions NOTALPHA and NOTDIGIT can perform validation testing.
invalid_flag = notalpha(substr(s,1,1)) or notdigit(s,2) ;
You can select invalid records directly with a WHERE statement or the WHERE= data set option. Note that the logical operator is OR (or |); in SAS, || is the concatenation operator.
data invalid;
set raw;
where notalpha(substr(s,1,1)) or notdigit(s,2) ; * statement;
run;
data invalid;
set raw (where=(notalpha(substr(s,1,1)) or notdigit(s,2))); * data set option;
run;
There are several functions in the NOT* and ANY* families and they can offer faster performance than the general purpose regular expression functions in the PRX* family.
You can use PRXPARSE and PRXMATCH as shown below.
data have;
input name $20.;
datalines;
A123456789590
B123531490ABC
C3198610
A123456789
B123531490
C319861045
;
data want;
set have;
if _n_=1 then do;
retain re;
re = prxparse('/^[a-zA-Z][0-9]{9}$/');
end;
if prxmatch(re,trim(name)) gt 0 then Flag ='Y';
else Flag ='N';
drop re;
run;
If you want only the records that match the criteria, use a subsetting IF instead:
data want;
set have;
if _n_=1 then do;
retain re;
re = prxparse('/^[a-zA-Z][0-9]{9}$/');
end;
if prxmatch(re,trim(name));
drop re;
run;
I want to use "select into" to create a list of all IDs in SAS.
/* my state table try01 */
data try01;
input id state $;
cards;
1108 va
1102 dc
1101 md
1105 on
;
run;
/* select into */
proc sql noprint;
select id into: x from try01;
quit;
%put &x;
My question is why the log shows that macro variable x holds only one value (1108) instead of a list (1108,1102,1101,1105)? So confused... thanks a lot.
If you want SQL to put multiple values into the macro variable then you need to include the SEPARATED BY clause.
select id into :x separated by ' ' from try01;
You could then use this list with, for example, the IN operator.
proc print data=have ;
where id in (&x);
run;
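To get the comma-separated list the question was expecting, only the delimiter changes. A short sketch using the try01 table from the question; SQLOBS is the automatic macro variable that PROC SQL sets to the number of rows the last query processed.

```sas
proc sql noprint;
  /* Without SEPARATED BY, only the first row's value is stored */
  select id into :x separated by ','
  from try01;
quit;

/* Write the list and the row count to the log */
%put &=x &=sqlobs;
```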
Firstly I have the following table:
data dataset;
input id $ value;
datalines;
A 1
A 2
A 3
A 4
B 2
B 3
B 4
B 5
C 2
C 4
C 6
C 8
;
run;
I would like to write a macro so that the user can subset the data by giving the id value. I do proc sql inside the macro as follows:
%macro sqlgrp(id=,);
proc sql;
create table output_&id. as
select *
from dataset
where id = '&id.'
;
quit;
%mend;
%sqlgrp(id=A); /*select id=A only*/
I am able to generate the output_A table in the WORK library, however it has zero (0) observations.
Why is this not working?
Macro variables are only resolved inside double quotes, so you need double quotes around &id. here.
Current Code
%macro sqlgrp(id=,);
proc sql;
create table output_&id. as
select *
from dataset
where id = '&id.'
;
quit;
%mend;
%sqlgrp(id=A); /*select id=A only*/
This looks for values of id that are literally '&id.', because macro variables are not resolved inside single quotes. You can test this by creating this dataset:
data dataset;
input id $ value;
datalines;
&id. 2
A 2
;
run;
Now, use %let to set the value of the macro variable id:
%let id=A;
Run a quick test of the functionality difference between single and double quotes. Notice the titles also contain single and double quotes, so we can see exactly what has happened in the output:
proc sql;
title 'Single Quotes - where id=&id.';
select *
from dataset
where id='&id.';
title "Double Quotes - where id=&id.";
select *
from dataset
where id="&id.";
title;
quit;
Correct Code
%macro sqlgrp(id=,);
proc sql;
create table output_&id. as
select *
from dataset
where id = "&id."
;
quit;
%mend;
%sqlgrp(id=A); /*select id=A only*/
The double quotes allow the macro variable &id to resolve, so the condition becomes id = "A" and returns the rows that match your input.
Here is a simple rewrite of the previous answer that also passes the input and output tables as macro parameters:
%macro sqlgrp(in=, id=, out=);
proc sql noprint;
create table &out. as select * from &in. where id = "&id.";
quit;
%mend sqlgrp;
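A usage sketch for the three-parameter version, assuming the DATASET table from the question already exists in WORK; the output table names are arbitrary.

```sas
/* One call per id value; each call creates its own subset table */
%sqlgrp(in=dataset, id=A, out=output_A);
%sqlgrp(in=dataset, id=B, out=output_B);
```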
I have this macro. The aim is to take the variable names stored in the table dicofr and put each of them into a macro variable using CALL SYMPUT.
However, something is not working correctly, because &nvarname is not recognized as a variable name.
This is the content of dico&&pays&l
varname descr
var12 aza
var55 ghj
var74 mcy
This is the content of dico&&pays&l..1
varname
var12
var55
var74
Below is my code
%macro testmac;
%let pays1=FR ;
%do l=1 %to 1 ;
data dico&&pays&l..1 ; set dico&&pays&l (keep=varname);
call symput("nvarname",trim(left(_n_))) ;
run ;
data a&&pays&l;
set a&&pays&l;
nouv_date=mdy(substr(date,6,2),01,substr(date,1,4));
format nouv_date monyy5.;
run;
proc sql;
create table toto
(nouv_date date , nomvar varchar (12));
quit;
proc sql;
insert into toto SELECT max(nouv_date),"&nvarname" as nouv_date as varname FROM a&&pays&l WHERE (&nvarname ne .);
%end;
%mend;
%testmac;
A subsidiary question: is it possible to store the varname and the date related to that varname in a macro variable? My manager told me about this, but I have never done it before.
Thanks in advance.
Edited:
I have this table
date col1 col2 col3 ... colx
1999M12 . . . .
1999M11 . 2 . .
1999M10 1 3 . 3
1999M9 0.2 3 2 1
For each column, I'm trying to find the maximum date at which the value in that column is not missing.
For col1, it would be 1999M10. For col2, it would be 1999M11 etc ...
Based on your update, I think the following code does what you want. If you don't mind sorting your input dataset first, you can get all the values you're looking for with a single data step - no macros required!
data have;
length date $7;
input date col1 col2 col3;
format date2 monyy5.;
date2 = mdy(input(substr(date,6,2),2.),1,input(substr(date,1,4),4.)); /* explicit character-to-numeric conversion */
datalines;
1999M12 . . .
1999M11 . 2 .
1999M10 1 3 .
1999M09 0.2 3 2
;
run;
/*Required for the following data step to work*/
/*Doing it this way allows us to potentially skip reading most of the input data set*/
proc sort data = have;
by descending date2;
run;
data want(keep = max_date:);
array max_dates{*} max_date1-max_date3;
array cols{*} col1-col3;
format max_date: monyy5.;
do until(eof); /*Begin DOW loop*/
set have end = eof;
/*Check to see if we've found the max date for each col yet.*/
/*Save the date for that col if applicable*/
j = 0;
do i = 1 to dim(cols);
if missing(max_dates[i]) and not(missing(cols[i])) then max_dates[i] = date2;
j + missing(max_dates[i]);
end;
/*Use j to count how many cols we still need dates for.*/
/* If we've got a full set, we can skip reading the rest of the data set*/
if j = 0 or eof then do; /* also output at end of file, in case a column is entirely missing */
output;
stop;
end;
end; /*End DOW loop*/
run;
EDIT: if you want to output the names alongside the max date for each, that can be done with a slight modification:
data want(keep = col_name max_date);
array max_dates{*} max_date1-max_date3;
array cols{*} col1-col3;
format max_date monyy5.;
do until(eof); /*Begin DOW loop*/
set have end = eof;
/*Check to see if we've found the max date for each col yet.*/
/*If not then save date from current row for that col*/
j = 0;
do i = 1 to dim(cols);
if missing(max_dates[i]) and not(missing(cols[i])) then max_dates[i] = date2;
j + missing(max_dates[i]);
end;
/*Use j to count how many cols we still need dates for.*/
/* If we've got a full set, we can skip reading the rest of the data set*/
if j = 0 or eof then do;
do i = 1 to dim(cols);
col_name = vname(cols[i]);
max_date = max_dates[i];
output;
end;
stop;
end;
end; /*End DOW loop*/
run;
It looks to me like you're trying to use macros to generate INSERT INTO statements to populate your table. It's possible to do this without using macros at all, which is the approach I'd recommend.
You could use a DATA step to write the INSERT INTO statements out to a file. Then, after the DATA step, use a %INCLUDE statement to run the file.
This will be easier to write/maintain/debug and will also perform better.
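A minimal sketch of that idea, under some assumptions: the table names DICOFR1 and AFR are what dico&&pays&l..1 and a&&pays&l resolve to for pays1=FR, the TOTO table has already been created as in the question, and the fileref GEN is arbitrary.

```sas
filename gen temp;   /* temporary file that will hold the generated statements */

data _null_;
  set dicofr1;       /* assumed: one row per variable name, in VARNAME */
  file gen;
  /* +(-1) backs up over the blank that list output writes after VARNAME */
  put 'insert into toto (nouv_date, nomvar) '
      'select max(nouv_date), "' varname +(-1) '" '
      'from afr where ' varname 'is not missing;';
run;

proc sql;
  %include gen / source2;   /* SOURCE2 echoes the generated code to the log */
quit;
```

For the first row of the dictionary table this writes a statement of the form insert into toto (nouv_date, nomvar) select max(nouv_date), "var12" from afr where var12 is not missing; so each variable name gets one row in TOTO holding its latest non-missing date.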