I am having trouble with the syntax when trying to reference a macro variable.
I have a subset of ID numbers and a dataset with a quantitative variable xxx associated by IDnum:
data IDnumlist;
input IDnum;
cards;
123
456
789
;
run;
data info;
input IDnum xxx;
cards;
123 2
123 5
456 3
789 1
789 4
555 9
;
run;
I want to summarize the data in the info dataset, but not for IDnum=555, since that is not in my subset. So my data set would look like this:
IDnum xxx_count xxx_sum
123 2 7
456 1 3
789 2 5
Here is my attempt so far:
proc sql noprint;
select count(*)
into :NObs
from IDnumlist;
select IDnum
into :IDnum1-:IDnum%left(&NObs)
from IDnumlist;
quit;
proc sql;
create table want as
select IDnum,
count(xxx) as xxx_count,
sum(xxx) as xxx_sum
from info
where IDnum in (&IDnum1-IDnum%left(&NObs))
group by 1;
run;
What am I doing wrong?
Why are you using macro variables for this? This is what a join is for, or a subquery, or who knows how many other better ways to do this.
proc sql;
create table want as
select info.idnum, count(xxx) as xxx_count, sum(xxx) as xxx_sum
from info inner join idnumlist
on info.idnum=idnumlist.idnum
group by info.idnum;
quit;
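The subquery route mentioned above can be sketched like this. It assumes the same info and IDnumlist datasets from the question; the output name want_subquery is just a placeholder so it does not overwrite want.

```sas
proc sql;
  /* Keep only rows of info whose IDnum appears in IDnumlist, */
  /* then summarize per IDnum. want_subquery is a made-up name. */
  create table want_subquery as
  select IDnum,
         count(xxx) as xxx_count,
         sum(xxx)   as xxx_sum
  from info
  where IDnum in (select IDnum from IDnumlist)
  group by IDnum;
quit;
```

Unlike the join, this version cannot multiply rows if an IDnum happens to appear more than once in IDnumlist.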
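The specific problem in your code is in the WHERE clause: SELECT ... INTO accepts a range of macro variables, but once they exist you can't reference them as a 'macro variable list' the way you can with data step variable lists. You could in theory list them individually, but it is better to do the select into differently.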
proc sql noprint;
select IDnum
into :IDnumlist separated by ','
from IDnumlist;
quit;
Then all of the values are in &idnumlist. and can be used directly with the in operator:
where idnum in (&idnumlist.)
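Put together, the two steps might look like the sketch below. It assumes IDnum is numeric, as in the question's data; for a character ID the values would need to be quoted before being joined into the list.

```sas
/* Step 1: collect all IDs into a single comma-separated macro variable. */
proc sql noprint;
  select IDnum
    into :IDnumlist separated by ','
    from IDnumlist;
quit;

/* Step 2: use the list directly in the IN operator. */
proc sql;
  create table want as
  select IDnum,
         count(xxx) as xxx_count,
         sum(xxx)   as xxx_sum
  from info
  where IDnum in (&IDnumlist.)
  group by IDnum;
quit;
```

With the sample data, &IDnumlist. resolves to 123,456,789, so IDnum 555 is filtered out before the summary.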
In the data below I would like proc sql to select the minimum date for subject 123 as the missing date.
data visit;
input subject $1-3 dtc $4-24 ;
cards;
123 2014-01-15T00:00
123
123 2014-01-17T00:00:00
124 2014-01-15T00:00:00
124 2014-01-15T00:00:00
124 2014-01-17T00:00:00
;
run;
proc sql;
create table want as
select distinct subject, min(dtc) as mindt format = date9.
from visit
where subject ne ''
group by subject;
quit;
MIN() will discard missing values from the aggregate computation. Thus, you need to test separately if there are any missing values.
Example:
Use a CASE expression to calculate the MIN you want.
data have;
input subject $1-3 dtc $5-27 ;
cards;
123 2014-01-15T00:00
123 .
123 2014-01-17T00:00:00
124 2014-01-15T00:00:00
124 2014-01-15T00:00:00
124 2014-01-17T00:00:00
;
proc sql ;
create table want as
select
subject
, case when nmiss(dtc) then '' else min(dtc) end as mindtc
, input (calculated mindtc, ? yymmdd10.) as mindt format=date9.
from have
where subject ne ''
group by subject
;
quit;
Here is an alternative solution in SAS:
First, create an index or sort your data by subject and dtc.
proc sort data=have out=have_sorted;
by subject dtc;
run;
Then you can apply a data step with by grouping and use the first.[column] to get the minimum for each subject including missing values:
data minima;
set have_sorted;
by subject dtc;
if first.subject;
run;
I have a table with an account number and several attributes.
acct | attr1 | attr2 | attr3...
The issue is that there are duplicate account numbers in the list with different attributes. To make matters worse, when there are two account number entries, those entries may have entirely different attributes.
I have a sorting scheme to use to somewhat solve the issue, but after I sort the table, I only need the first occurrence of each account number. I am attempting to do this in sas using Proc SQL.
Any ideas?
I don't think this is possible with PROC SQL; however, it is straightforward with DATA step logic.
After the data is sorted, use first. (pronounced first-dot) logic to pick the first occurrence:
First sort the data, using your desired scheme.
proc sort data=have out=intermediate_table;
by acct <other variables>;
run;
Then just use first.acct:
data want;
set intermediate_table;
by acct <other variables>;
if first.acct then output;
run;
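If you would rather avoid the data step, a second PROC SORT with NODUPKEY is another option. This is only a sketch: it assumes a dataset named have, and attr1 is a stand-in for whatever variables make up your sorting scheme.

```sas
/* Step 1: apply the sorting scheme (attr1 is a hypothetical sort key). */
proc sort data=have out=intermediate_table;
  by acct attr1;
run;

/* Step 2: keep one row per acct. With the default EQUALS behavior,  */
/* rows within an acct keep their incoming order, so the row that    */
/* survives is the first one produced by step 1.                     */
proc sort data=intermediate_table out=want nodupkey;
  by acct;
run;
```

The first.acct data step above is more explicit about which row is kept, so prefer it if there is any doubt.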
PROC SORT is the easiest way to do this. You can also use the undocumented monotonic() function to do this in PROC SQL, as shown below:
data have;
input acct attr1 $ attr2 $ attr3 $;
datalines;
100 a b c
100 b d e
100 c e f
101 a b c
102 h i j
102 h k l
;
proc sql;
create table want(drop =rn) as
select * from
(select b.*,monotonic() as rn
from have b)
group by acct
having rn =min(rn);
quit;
Or use _n_ in a data step (creating a view is a good option, as Richard suggested in the comments), followed by GROUP BY, as shown below.
data have_view/view=have_view;
set have;
rn=_n_;
run;
proc sql;
create table want as
select acct, attr1 , attr2 , attr3
from have_view b
group by acct
having rn =min(rn);
quit;
I'm trying to organize a dataset in a specific way with a list of variables that changes. The issue I'm having is that I don't always know the actual number of variables I'm going to have in my dataset. I've done this previously with either a PROC SQL statement or a RETAIN statement after the data statement where the list of variables was static.
My data looks like this:
APPNUM DATE REASON1 REASON2 REASON3 REASON4 NAME1 NAME2 NAME3 NAME4
123 1/1/2017 X Y Z A Jon Mary Tom Suzie
I want it to look like this:
APPNUM DATE REASON1 NAME1 REASON2 NAME2 etc
123 1/1/2017 X Jon Y Mary etc
This would be easy with sql or a retain statement. However, I am using loops, etc to pull these variables together, and the number of variables presented is dependent upon my input data. Some days there may be 20 instances of REASON/NAME and others there may be 1 of each.
I tried the code below to pull a list of variable names and order them: APPNUM and DATE first, then the rest by the LAST digit of the variable name (i.e. 1,1,2,2,3,3). I was unsuccessful. The list is stored without errors, but when I resolve &VARLIST. the names are not ordered as expected. Has anyone ever tried and accomplished this?
PROC SQL;
SELECT NAME INTO :VARLIST SEPARATED BY ','
FROM DICTIONARY.COLUMNS
WHERE LIBNAME = 'WORK'
AND MEMNAME = 'SFINAL'
ORDER BY NAME, SUBSTR(NAME,LENGTH(NAME)-1);
QUIT;
The above code would order something like this:
APPNUM, DATE, NAME1...2...3..., REASON1...2...3...
and not:
APPNUM, DATE, NAME1, REASON1, NAME2, REASON2....
Two problems.
First, the sort keys in your ORDER BY are in the wrong order: the numeric suffix has to come first, then the name.
Second, your SUBSTR() call is not correct. You have an arbitrary-length number at the end, and you don't know how many characters it will be. Your best bet is to read that number string, convert it to a number, and then order by that.
data test;
array name[20];
array reason[20];
format appnum best. date date9.;
run;
proc sql noprint;
SELECT NAME INTO :VARLIST SEPARATED BY ','
FROM DICTIONARY.COLUMNS
WHERE LIBNAME = 'WORK'
AND MEMNAME = 'TEST'
and (upcase(NAME) like 'NAME%' or upcase(NAME) like 'REASON%')
ORDER BY input(compress(name, , 'kd'),best.), NAME ; /* 'kd' keeps only the digits, regardless of the case of the name */
quit;
%put &varlist;
proc sql noprint;
create table test2 as
select APPNUM, DATE, &varlist
from test;
quit;
I have two datasets A & B. I want to join them against two fields: ID and End of Month date. This is defined as EOMDate in dataset A and BalDate in dataset B. How do I join them so that ID and the dates match with each other?
Tom's comment works. Here are a few worked samples:
/*Create some input data for the samples...*/
data first;
input id_a id_b data $;
cards;
1 1 A
2 2 B
3 33 C
4 4 D
55 5 E
;
run;
data second;
input id_a id_b data2 $;
cards;
1 1 AA
2 2 BB
3 3 CC
4 4 DD
5 5 EE
;
run;
/*The proc sql way. We create table 'combo' as result. */
/*You can add more conditions than one. */
proc sql noprint;
create table combo as
select * from first join second
on first.id_a=second.id_a and first.Id_b=second.id_b;
quit;
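For the datasets in the question, where the date column has a different name on each side, the same pattern might look like the sketch below. The sample rows and the Amount and Balance columns are made up for illustration; only ID, EOMDate and BalDate come from the question.

```sas
/* Hypothetical sample data: Amount and Balance are invented columns. */
data A;
  input ID EOMDate :date9. Amount;
  format EOMDate date9.;
  cards;
1 31JAN2017 100
1 28FEB2017 110
2 31JAN2017 200
;
run;

data B;
  input ID BalDate :date9. Balance;
  format BalDate date9.;
  cards;
1 31JAN2017 50
2 31JAN2017 75
2 28FEB2017 80
;
run;

/* Join on ID and on the two differently named date columns. */
proc sql;
  create table joined as
  select a.ID, a.EOMDate, a.Amount, b.Balance
  from A as a inner join B as b
    on a.ID = b.ID and a.EOMDate = b.BalDate;
quit;
```

Listing the columns explicitly instead of using select * avoids the duplicate-column warning for ID. If you use the data step merge instead, the date column would first need a common name, for example with a rename= dataset option on B.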
I've noticed that proc sql is quite slow when working with large sets.
Here is a way to do the same with a data step merge.
First you need to sort the data.
/*A way to accomplish this with datasets.*/
proc sort data=first; by id_a id_b; run;
proc sort data=second; by id_a id_b; run;
data Combo_sas;
merge first(in=a) second(in=b);
by id_a id_b;
if a and b;
run;
I have a dataset as follows:
data dataset;
input name $ mob5 mob1 mob3 x;
datalines;
a 1 3 5 7
b 2 4 6 8
c 3 5 7 9
d 5 7 9 2
;
run;
I would like to select the field name plus every field whose name contains mob. Both the names and the number of the mob columns are unknown in advance. I don't know how to use RETAIN here, because I don't know how many columns have names containing mob.
proc sql;
create table table1 as
select *
from dataset(keep=name mob:);
quit;
My desired output will be
name mob1 mob3 mob5
a 3 5 1
b 4 6 2
c 5 7 3
d 7 9 5
You can use the dictionary tables for this (assuming your source dataset is called 'dataset' and resides in the work library, make changes to the WHERE clause if not, but make sure you use upper-case for the values):
PROC SQL;
SELECT name INTO :mob_cols SEPARATED BY ','
FROM dictionary.columns
WHERE libname = 'WORK' and memname = 'DATASET'
AND upcase(name) LIKE 'MOB%'
ORDER BY name;
QUIT;
This code loads all of the 'mob' columns into a macro variable, ordered by name and separated by comma.
Then you can use this macro variable in the SELECT clause of your PROC SQL:
PROC SQL;
CREATE TABLE table1 AS
SELECT name,
&mob_cols.
FROM dataset;
QUIT;