I have dataset like follows;
data dataset;
input name $ mob5 mob1 mob3 x;
datalines;
a 1 3 5 7
b 2 4 6 8
c 3 5 7 9
d 5 7 9 2
;
run;
I would like to select the fields name and those with mob (UNKNOW columns name and number of columns contain mob). i dunno how to use retain i do not know how many of columns with columns name contains mob.
proc sql;
create table table1 as
select *
from dataset(keep=name mob:)
quit;
My desired output will be
name mob1 mob3 mob5
a 3 5 1
b 4 6 2
c 5 7 3
d 7 9 5
You can use the dictionary tables for this (assuming your source dataset is called 'dataset' and resides in the work library, make changes to the WHERE clause if not, but make sure you use upper-case for the values):
PROC SQL;
SELECT name INTO: mob_cols SEPARATED BY ','
FROM dictionary.columns
WHERE libname = 'WORK' and memname = 'DATASET'
AND upcase(name) LIKE 'MOB%'
ORDER BY name;
QUIT;
This code loads all of the 'mob' columns into a macro variable, ordered by name and separated by comma.
Then you can use this macro variable in the SELECT clause of your PROC SQL:
PROC SQL;
CREATE TABLE table1 AS
SELECT name,
&mob_cols.
FROM dataset;
QUIT;
Related
My data is as follows:
ID
1
2
3
3
4
5
6
6
I want to create a column that indicates the uniqueness of a value in the ID column as such:
ID COUNT
1 1
2 1
3 1
3 0
4 1
5 1
6 1
6 0
I'd like to do this without creating a temporary table, via a subquery or something. Any assistance would be much appreciated.
One option would be to use by functionality in the data step:
data have;
input ID;
datalines;
1
2
3
3
4
5
6
6
;run;
data want;
set have;
by ID;
if first.ID then count = 1;
else count = 0;
run;
That type of logic is not really amenable to SQL since the order of observations is not really insured. In a more modern version of SQL you could use windowing functions (like ROW_NUMBER() with PARTITION BY) to impose an record count.
If you really wanted to try to do it just in PROC SQL you might need to resort to using the undocumented MONOTONIC() function. But even then to defeat the optimizer eliminating the duplicate rows you might need to make a temporary table with the row counter first.
data have;
input ID ##;
datalines;
1 2 3 3 4 5 6 6
;
proc sql ;
create table _temp_ as select id,monotonic() as row from have;
create table want as
select a.id
, b.row=min(b.row) as FLAG
from have a,_temp_ b
where a.id=b.id
group by a.id
order by 1,2
;
quit;
I have two datasets A & B. I want to join them against two fields: ID and End of Month date. This is defined as EOMDate in dataset A and BalDate in dataset B. How do I join them so that ID and the dates match with each other?
Tom's comment works. Here are a few worked samples:
/*Create some input data for the samples...*/
data first;
input id_a id_b data $;
cards;
1 1 A
2 2 B
3 33 C
4 4 D
55 5 E
;
run;
data second;
input id_a id_b data2 $;
cards;
1 1 AA
2 2 BB
3 3 CC
4 4 DD
5 5 EE
;
run;
/*The proc sql way. We create table 'combo' as result. */
/*You can add more conditions than one. */
proc sql noprint;
create table combo as
select * from first join second
on first.id_a=second.id_a and first.Id_b=second.id_b;
quit;
I've noticed that proc sql is quite slow when working with large sets.
This is a way to make the same with data statements.
First you need to sort the data.
/*A way to accomplish this with datasets.*/
proc sort data=first; by id_a id_b; run;
proc sort data=second; by id_a id_b; run;
data Combo_sas;
merge first(in=a) second(in=b);
by id_a id_b;
if a and b;
run;
I am having trouble with the syntax when trying to reference a macro variable.
I have a subset of ID numbers and a dataset with a quantitative variable xxx associated by IDnum:
data IDnumlist;
input IDnum;
cards;
123
456
789
;
run;
data info;
input IDnum xxx;
cards;
123 2
123 5
456 3
789 1
789 4
555 9
;
run;
I want to summarize the data in the info dataset, but not for IDnum=555, since that is not in my subset. So my data set would look like this:
IDnum xxx_count xxx_sum
123 2 7
456 1 3
789 2 5
Here is my attempt so far:
proc sql noprint;
select count(*)
into :NObs
from IDnumlist;
select IDnum
into :IDnum1-:IDnum%left(&NObs)
from IDnumlist;
quit;
proc sql;
create table want as
select IDnum,
count(xxx) as xxx_count,
sum(xxx) as xxx_sum
from info
where IDnum in (&IDnum1-IDnum%left(&NObs))
group by 1;
run;
What am I doing wrong?
Why are you using macro variables for this? This is what a join is for, or a subquery, or who knows how many other better ways to do this.
proc sql;
create table want as
select info.idnum, count(xxx) as xxx_count, sum(xxx) as xxx_sum
from info inner join idnumlist
on info.idnum=idnumlist.idnum
group by info.idnum;
quit;
The specific problem in your above code is that you can't use 'macro variable lists' like you can data step lists. You could in theory list them individually, but better would be to do the select into differently.
proc sql noprint;
select IDnum
into :IDnumlist separated by ','
from IDnumlist;
quit;
Then all of the values are in &idnumlist. and can be used directly with the in operator:
where idnum in (&idnumlist.)
I am trying to find a way to select a frequency count for rows of a subgroup with no distinct identifiers (well, I guess the distinct identifier is a combination of statuses). Consider the sample data:
data have;
input Series $ Game Name $ Points;
datalines;
A 1 LeBron 2
A 1 LeBron 3
A 1 LeBron 2
A 1 LeBron 2
A 2 LeBron 2
A 2 LeBron 2
A 2 LeBron 3
A 3 LeBron 2
;
run;
Each row here is a shot LeBron took in a game within a series. I want The series/game summary, with a count for number of shots. Like this:
Series Game Name Freq Sum 2pt 3pt
A 1 LeBron 4 9 3 1
A 2 LeBron 3 7 2 1
A 3 LeBron 1 2 1 0
I have to use Proc SQL here, rather then proc means because I am pulling the data in from multiple tables. Also, I will have several thousand "Series" and many more "Games" and "Names" so please keep answer general Here is what I have:
proc sql;
create table want as
select Series,
Game,
Name,
sum(points) as totalpoints
from have
group by 1,2,3;
run;
Thanks.
Pyll
No particular reason you couldn't use PROC MEANS pulling from multiple tables - you can always create a view (either in SQL or in the data step). But anyway,
proc sql;
create table want as
select Series,
Game,
Name,
sum(points) as totalpoints,
count(points) as numbershotsmade
from have
group by 1,2,3;
run;
You can also use the n function which does the same thing.
count(points) will count the non-null points values; count(1) will count the total number of rows even if points is null.
I would like to use proc sql in sas to identify if a case or record is missing some information. I have two datasets. One is a record of an entire data collection, that shows what forms have been collected during a visit. The second is a specification of what forms should be collected during a visit. I have tried many options including data steps and sql code using not in to no avail...
Example data is below
***** dataset crf is a listing of all forms that have been filled out at each visit ;
***** cid is an identifier for a study center ;
***** pid is an identifier for a participant ;
data crf;
input visit cid pid form ;
cards;
1 10 101 10
1 10 101 11
1 10 101 12
1 10 102 10
1 10 102 11
2 10 101 11
2 10 101 13
2 10 102 11
2 10 102 12
2 10 102 13
;
run;
***** dataset crfrule is a listing of all forms that should be filled out at each visit ;
***** so, visit 1 needs to have forms 10, 11, and 12 filled out ;
***** likewise, visit 2 needs to have forms 11 - 14 filled out ;
data crfrule;
input visit form ;
cards;
1 10
1 11
1 12
2 11
2 12
2 13
2 14
;
run;
***** We can see from the two tables that participant 101 has a complete set of records for visit 1 ;
***** However, participant 102 is missing form 12 for visit 1 ;
***** For visit 2, 101 is missing forms 12 and 14, whereas 102 is missing form 14 ;
***** I want to be able to know which forms were **NOT** filled out by each person at each visit (i.e., which forms are missing for each visit) ;
***** extracting unique cases from crf ;
proc sql;
create table visit_rec as
select distinct cid, pid, visit
from crf;
quit;
***** building the list of expected forms by visit number ;
proc sql;
create table expected as
select x.*,
y.*
from visit_rec as x right join crfrule as y
on x.visit = y.visit
order by visit, cid, pid, form;
quit;
***** so now I have a list of which forms that **SHOULD** have been filled out by each person ;
***** now, I just need to know if they were filled out or not... ;
The strategy I have been trying is to merge expected back onto the crf table with some indicator of which forms are missing for each visit.
Optimally, I would like to produce a table that would have: visit, cid, pid, missing_form
Any guidance is greatly appreciated.
EXCEPT will do what you want. I don't necessarily know that this is the most efficient solution in general (and if you're doing this in SAS, it's almost certainly not), but given what you've done so far, this does work:
create table want as
select cid,pid,visit,form from expected
except select cid,pid,visit,form from crf
;
Just be careful with EXCEPT - it's very picky (note that select * doesn't work, as your tables are in different orders).
I suggest a nested query, which alternatively can be done in two steps. What about this one:
proc sql;
create table temp as
select distinct c.*
, (d.visit is null and d.form is null and d.pid is null) as missing_form
from (
select distinct a.pid, b.* from
crf a, crfrule b
) c
left join crf d
on c.pid = d.pid
and c.form = d.form
and c.visit = d.visit
order by c.pid, c.visit, c.form
;
quit;
It gives you a list with all possible (i.e. expected) combinations of pid, form, visit and a boolean indicating whether it was present or not.
You could use a left join and use the where clause to filter out the records with missing records in the right table.
select
e.*
from
expected e left join
crf c on
e.visit = c.visit and
e.cid = c.cid and
e.pid = c.pid and
e.form = c.form
where c.visit is missing
;