Sorry, not sure how to best word this so I'll just give an example.
1 VA b x 10
2 VA g y 5
3 VA b x 6
4 VA s y 7
5 VA s x 8
6 PA b y 1
7 PA s x 4
8 PA g y 5
9 PA s x 6
10 PA b y 9
I would like to summarize the above data like the following:
      x_b x_s x_g y_b y_s y_g
VA     16   8   0   0   7   5
PA      0  10   0  10   0   5
where I have a row for each state and combinations of the two groups (group of x, y and group of b,s,g) across the top and summarize the values for all groupings like that.
What is the best way to do this in SQL?
Thanks!
You can do this using conditional aggregation:
proc sql;
select state,
sum(case when col3 = 'b' and col4 = 'x' then col5 else 0 end) as x_b,
sum(case when col3 = 's' and col4 = 'x' then col5 else 0 end) as x_s,
sum(case when col3 = 'g' and col4 = 'x' then col5 else 0 end) as x_g,
sum(case when col3 = 'b' and col4 = 'y' then col5 else 0 end) as y_b,
sum(case when col3 = 's' and col4 = 'y' then col5 else 0 end) as y_s,
sum(case when col3 = 'g' and col4 = 'y' then col5 else 0 end) as y_g
from t
group by state;
quit;
Don't do this in SQL. You're in SAS, use the tools you have: here PROC TABULATE is the best tool.
data have;
input obs state $ var1 $ var2 $ val;
datalines;
1 VA b x 10
2 VA g y 5
3 VA b x 6
4 VA s y 7
5 VA s x 8
6 PA b y 1
7 PA s x 4
8 PA g y 5
9 PA s x 6
10 PA b y 9
;;;;
run;
proc tabulate data=have;
class state var1 var2;
var val;
tables state, var1=' '*var2=' '*val=' '*sum=' '/printmiss misstext='0';
run;
If you want a dataset and not a printed table, that's easy enough to do. Just make a dataset from TABULATE, then make a few minor changes and transpose it.
proc tabulate data=have out=want_first;
class state var1 var2;
var val;
tables state, var1=' '*var2=' '*val=' '*sum=' '/printmiss misstext='0';
run;
data want_pret;
set want_first;
var_name = catx('_',var2,var1);
value = coalesce(val_sum,0);
keep state var_name value;
run;
proc transpose data=want_pret out=want;
by state;
id var_name;
var value;
run;
Notice that none of this requires hardcoding the values for any of the variables - no matter what you put in var1/var2, this will always give you the right result.
If you want to make a cross-tab, use PROC FREQ. Use the WEIGHT statement to pass in the existing counts and the SPARSE option to get the zeros into the output.
proc freq data=have ;
tables state*var2*var1 / noprint out=counts sparse ;
weight val ;
run;
You can then turn the result into your horizontal format by using PROC TRANSPOSE.
proc transpose data=counts delimiter=_
out=want(drop= _name_ _label_)
;
by state ;
id var2 var1 ;
var count ;
run;
I want to fill new_col, on every row of an id, with the value that column 'ind' has on the row where months = 1:
idnum1 months ind new_col
1 1 X X
1 2 X X
1 3 Y X
1 4 Y X
1 5 X X
2 1 Y Y
2 2 Y Y
2 3 X Y
2 4 X Y
2 5 X Y
The query below only assigns the value on the rows where months = 1, but I want it on every row of new_col for each id:
create table tmp as
select t1.*,
case when months = 1 then ind end as new_col
from table t1;
I am trying to do this in SAS using proc sql.
Ideally you would use RETAIN within a data step:
data want;
set have;
retain new_col;
if months=1 then new_col = ind;
run;
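SQL isn't as good at this as a data step.
But if every ID has a months = 1 row, a summary function works, because PROC SQL remerges the group-level result onto every row of the group. If an ID has no such row, its new_col comes out missing, and you really do need the data step approach.
proc sql;
create table want as
select *, max(case when months = 1 then ind end) as new_col
from have
group by idnum1;
quit;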
EDIT: If you want to retain the first value per ID instead, test FIRST.idnum1 rather than months = 1.
data want;
set have;
by idnum1;
retain new_col;
if first.idnum1 then new_col = ind;
run;
A robust PROC SQL statement that also handles repeated first-month rows, choosing the lowest ind among them to distribute to the group:
data have;
input idnum1 months ind $ new_col $;
datalines;
1 1 X X
1 2 X X
1 3 Y X
1 4 Y X
1 5 X X
2 1 Y Y
2 2 Y Y
2 3 X Y
2 4 X Y
2 5 X Y
3 1 Z .
3 1 Y .
3 1 X .
3 2 A .
;
run;
proc sql;
create table want as
select
have.idnum1, months, ind, new_col, lowest_first_ind
from
have
join
( select idnum1, min(ind) as lowest_first_ind from
(
select idnum1, ind
from have
group by idnum1
having months = min(months)
)
group by idnum1
) value_seeker
on
have.idnum1 = value_seeker.idnum1
;
quit;
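If the query runs in a database that supports window functions (PROC SQL itself does not implement OVER, so this would need explicit pass-through), you can use one:
select t1.*,
max(case when months = 1 then ind end) over (partition by idnum1) as new_col
from t1;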
If there is only one MONTH=1 observation per BY group then just use a simple join.
create table WANT as
select t1.*,t2.ind as new_col
from table t1
left join (select idnum1,ind from table where months=1) t2
on t1.idnum1 = t2.idnum1
;
I have a data set containing an unbalanced panel of observations, where I want to forward and backward fill missing and/or "wrong" observations of ticker with the latest non-missing string.
id time ticker_have ticker_want
------------------------------
1 1 ABCDE YYYYY
1 2 . YYYYY
1 3 . YYYYY
1 4 YYYYY YYYYY
1 5 . YYYYY
------------------------------
2 4 . ZZZZZ
2 5 ZZZZZ ZZZZZ
2 6 . ZZZZZ
------------------------------
3 1 . .
------------------------------
4 2 OOOOO OOOOO
4 3 OOOOO OOOOO
4 4 OOOOO OOOOO
Basically, if the observation already has a ticker, but this ticker is not the same as the latest non-empty ticker, we replace this ticker using the latest ticker.
So far, I have managed to fill missing observations forward using this code
proc sql;
create table have as select * from old_have order by id, time desc;
quit;
data want;
drop temp;
set have;
by id;
/* RETAIN the new variable*/
retain temp; length temp $ 5;
/* Reset TEMP when the BY-Group changes */
if first.id then temp=' ';
/* Assign TEMP when TICKER is non-missing */
if ticker ne ' ' then temp=ticker;
/* When TICKER is missing, assign the retained value of TEMP into TICKER */
else if ticker=' ' then ticker=temp;
run;
Now I am stuck figuring out the cases where I can't access the non-missing value using last.ticker or first.ticker ...
How would one do this using DATA or PROC SQL or any other SAS commands?
You can do this several ways, but proc sql with some nested sub-queries is one solution.
(Read it from inside out, #1 then 2 then 3. You could build each subquery into a dataset first if it helps)
proc sql ;
create table want as
/* #3 - match last ticker on id */
select a.id, a.time, a.ticker_have, b.ticker_want
from have a
left join
/* #2 - id and last ticker */
(select x.id, x.ticker_have as ticker_want
from have x
inner join
/* #1 - max time with a ticker per id */
(select id, max(time) as mt
from have
where not missing(ticker_have)
group by id) as y on x.id = y.id and x.time = y.mt) as b on a.id = b.id
;
quit ;
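The same "latest non-missing ticker per id" rule can also be sketched in a single DATA step with two passes over each BY group. This is only a sketch under a few assumptions: the dataset is called have, it is already sorted by id and time, it holds ticker_have but no ticker_want column, and tickers are at most 5 characters long.

```sas
data want;
  length ticker_want $ 5;
  /* Pass 1: read the whole id group and remember its latest non-missing ticker */
  do until (last.id);
    set have;
    by id;
    if not missing(ticker_have) then ticker_want = ticker_have;
  end;
  /* Pass 2: re-read the same id group and write every row with that ticker */
  do until (last.id);
    set have;
    by id;
    output;
  end;
run;
```

Because ticker_want is not retained, it starts out blank for every id, so an id with no ticker at all (id 3 in the example) keeps a missing ticker_want.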
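Consider using a data step to retrieve the last ticker by time for each id, then joining it to the main table. Then use a CASE expression to assign that last ticker only where the existing ticker is missing. Note that this keeps an existing ticker such as ABCDE unchanged, as the output below shows.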
data LastTicker;
set Tickers (where=(ticker_have ~=""));
by id;
first = first.id;
last = last.id;
if last = 1;
run;
proc sql;
create table Tickers_Want as
select t.id, t.time, t.ticker_have,
case when t.ticker_have = ""
then l.ticker_have
else t.ticker_have
end as tickerwant
from Tickers t
left join LastTicker l
on t.id = l.id
order by t.id, t.time;
quit;
Data
data Tickers;
length ticker_have $ 5;
input id time ticker_have $;
datalines;
1 1 ABCDE
1 2 .
1 3 .
1 4 YYYYY
1 5 .
2 4 .
2 5 ZZZZZ
2 6 .
3 1 .
4 2 OOOOO
4 3 OOOOO
4 4 OOOOO
;
Output
Obs id time ticker_have tickerwant
1 1 1 ABCDE ABCDE
2 1 2 YYYYY
3 1 3 YYYYY
4 1 4 YYYYY YYYYY
5 1 5 YYYYY
6 2 4 ZZZZZ
7 2 5 ZZZZZ ZZZZZ
8 2 6 ZZZZZ
9 3 1
10 4 2 OOOOO OOOOO
11 4 3 OOOOO OOOOO
12 4 4 OOOOO OOOOO
I have a table that will have a variable number of columns based on my initial input. Is there a function to sum all the numeric columns of this table without specifying the name of each column?
Right now I have each column name hard coded in a proc sql command.
CREATE TABLE &new_table_name AS
(SELECT SUM(CASE WHEN col1 = &state THEN 1 ELSE 0 END) AS month_01,
SUM(CASE WHEN col2 = &state THEN 1 ELSE 0 END) AS month_02,
SUM(CASE WHEN col3 = &state THEN 1 ELSE 0 END) AS month_03,
SUM(CASE WHEN col4 = &state THEN 1 ELSE 0 END) AS month_04,
SUM(CASE WHEN col5 = &state THEN 1 ELSE 0 END) AS month_05
);
Sample input would be like this:
name m1 m2 m3 m4
aa 1 7 7 1
ab 2 4 2
ac 1 1
ad 1 3 1 1
ae 2 1 3
Then the sample output would be
name m1 m2 m3 m4
7 16 13 2
You are looking for PROC MEANS. Or really any summarization proc.
data have;
infile datalines missover;
input name $ m1 m2 m3 m4;
datalines;
aa 1 7 7 1
ab 2 4 2
ac 1 1
ad 1 3 1 1
ae 2 1 3
;;;;
run;
proc means data=have;
output out=want sum=;
run;
And the class statement would let you group by state or whatever. WHERE also works fine in PROC MEANS to filter.
Leaving the var statement off calls for all numeric variables, or you can put in a var statement to limit, such as
var m1-m4;
as Reeza notes in comments.
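For reference, here is a minimal sketch of the same idea with the numeric variables requested explicitly through the _NUMERIC_ variable list and the printed report suppressed. It assumes the have dataset built above; dropping _TYPE_ and _FREQ_ is just a tidy-up choice so that only the sums remain.

```sas
proc means data=have noprint;
  /* _numeric_ stands for every numeric variable, however many there are */
  var _numeric_;
  /* sum= with no names keeps the original variable names for the sums */
  output out=want(drop=_type_ _freq_) sum=;
run;
```

On the sample data this should leave one row in want holding the column totals from the question's sample output.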
Suppose we've got the following dataset:
DATE VAR1 VAR2
1 A 1
2 A 1
3 B 1
4 C 2
5 D 3
6 E 4
7 F 5
8 B 6
9 B 7
10 D 1
Each record belongs to a person. The problem is that a single person can have more than one record, with different values.
To identify a person: records that share the same VAR1 belong to the same person, and so do records that share the same VAR2.
My objective is to create a new variable IDPERSON which uniquely identifies the person for each record. In my example, there are only 4 different people:
DATE VAR1 VAR2 IDPERSON
1 A 1 1
2 A 1 1
3 B 1 1
4 C 2 2
5 D 3 1
6 E 4 3
7 F 5 4
8 B 6 1
9 B 7 1
10 D 1 1
How could I achieve this by using SQL or SAS?
%macro grouper(
inData /*Input dataset*/,
outData /*output dataset*/,
id1 /*First identification variable (must be numeric)*/,
id2 /*Second identification variable*/,
idOut /*Name of variable to contain group ID*/,
maxN = 5 /*Max number of iterations in case of failure*/);
/* Assign an ID to each distinct connected graph in a network */
/* Create first guess for group ID */
data _g_temp;
set &inData.;
&idOut. = &id1.;
run;
/* Loop, improve group ID each time*/
%let i = 1;
%do %while (&i. <= &maxN.);
%put Loop number &i.;
%let i = %eval(&i. + 1);
proc sql noprint;
/* Find the lowest group ID for each group of first variable */
create table _g_map1 as
select
min(&idOut.) as &idOut.,
&id1.
from _g_temp
group by &id1.;
/* Find the lowest group ID for each group of second variable */
create table _g_map2 as
select
min(&idOut.) as &idOut.,
&id2.
from _g_temp
group by &id2.;
/* Find the lowest group ID from both grouping variables */
create table _g_new as
select
a.&id1.,
a.&id2.,
coalesce(min(b.&idOut., c.&idOut.), a.&idOut.) as &idOut.,
a.&idOut. as &idOut._old
from _g_temp as a
full outer join _g_map1 as b
on a.&id1. = b.&id1.
full outer join _g_map2 as c
on a.&id2. = c.&id2.;
/* Put results into temporary dataset ready for next iteration */
create table _g_temp as
select *
from _g_new;
/* Check if the iteration provided any improvement */
select
min(
case when &idOut._old = &idOut. then 1
else 0
end) into :stopFlag
from _g_temp;
quit;
/* End loop if ID unchanged over last iteration */
%if &stopFlag. %then %let i = %eval(&maxN. + 1);
%end;
/* Output lookup table */
proc sql;
create table &outData. as
select
&id1.,
min(&idOut.) as &idOut.
from _g_temp
group by &id1.;
quit;
/* Clean up */
proc datasets nolist;
delete _g_:;
quit;
%mend grouper;
DATA baseData;
INPUT VAR1 VAR2 $;
CARDS;
1 A
1 A
1 B
2 C
3 D
4 E
5 F
6 B
7 B
1 D
1 X
7 G
6 Y
6 D
6 I
8 D
9 Z
9 X
;
RUN;
%grouper(
baseData,
outData,
VAR1,
VAR2,
groupID);
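The macro writes a lookup table (one row per value of the first ID variable), not the original records. To attach the group ID to each record, a join along these lines should do it. This is a sketch assuming the call above, so the lookup is outData with columns VAR1 and groupID; the table name baseData_grouped is made up for the example.

```sas
proc sql;
  /* Attach the group ID found by %grouper to every original record */
  create table baseData_grouped as
  select a.*, b.groupID
  from baseData as a
  left join outData as b
    on a.VAR1 = b.VAR1;
quit;
```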
Do you think this will work?
It's written in SAS, but it uses SQL statements.
DATA TEMP3;
INPUT VAR1 VAR2 $ DATE;
CARDS;
1 A 1
1 A 2
1 B 3
2 C 4
3 D 5
4 E 6
5 F 7
6 B 8
7 B 9
1 D 10
;
RUN;
PROC SQL;
CREATE TABLE WORK.TEMP4 AS SELECT DISTINCT VAR2, VAR1 FROM WORK.TEMP3 ORDER BY VAR2, VAR1;
CREATE TABLE WORK.TEMP5 AS SELECT DISTINCT VAR1, VAR2 FROM WORK.TEMP3 ORDER BY VAR1, VAR2;
CREATE TABLE WORK.TEMP6 AS SELECT TEMP4.VAR2, TEMP4.VAR1, TEMP5.VAR2 AS VAR22 FROM WORK.TEMP4 INNER JOIN WORK.TEMP5 ON (TEMP4.VAR1=TEMP5.VAR1);
CREATE TABLE WORK.TEMP7 AS SELECT TEMP6.*, TEMP5.VAR1 AS VAR12 FROM WORK.TEMP6 INNER JOIN WORK.TEMP5 ON (TEMP6.VAR2=TEMP5.VAR2);
CREATE TABLE WORK.TEMP8 AS SELECT DISTINCT VAR22, VAR12 FROM WORK.TEMP7 ORDER BY VAR22, VAR12;
CREATE TABLE WORK.TEMP9 AS SELECT VAR22, MAX(VAR12) AS VAR12 FROM WORK.TEMP8 GROUP BY VAR22;
CREATE TABLE WORK.TEMP10 AS SELECT TEMP8.* FROM WORK.TEMP8 INNER JOIN WORK.TEMP9 ON (TEMP8.VAR22=TEMP9.VAR22 AND TEMP8.VAR12=TEMP9.VAR12);
CREATE TABLE WORK.TEMP11 AS SELECT TEMP3.*, TEMP10.VAR12 AS IDPERSONA FROM WORK.TEMP3 LEFT JOIN WORK.TEMP10 ON (TEMP3.VAR2=TEMP10.VAR22);
QUIT;
I've broken down this problem into a few steps, which works for the data you've supplied. There's probably a way to reduce the number of steps, at the expense of readability. Let me know if this works for your real data.
/* create input dataset */
data have;
input DATE VAR1 $ VAR2;
datalines;
1 A 1
2 A 1
3 B 1
4 C 2
5 D 3
6 E 4
7 F 5
8 B 6
9 B 7
10 D 1
;
run;
/* calculate min VAR2 per VAR1 */
proc summary data=have nway idmin;
class var1;
output out=minvar2 (drop=_:) min(var2)=temp_var;
run;
/* add in min VAR2 data */
proc sql;
create table temp1 as select
a.*,
b.temp_var
from have as a
inner join
minvar2 as b
on a.var1 = b.var1
order by b.temp_var;
quit;
/* create idperson variable */
data want;
set temp1;
by temp_var;
if first.temp_var then idperson+1;
drop temp_var;
run;
/* sort back to original order */
proc sort data=want;
by date var1;
run;
Keith:
Your solution does not work properly. Take a look at the following dataset:
DATA TEMP3;
INPUT VAR2 VAR1 $ DATE;
DUMMY=1;
CARDS;
1 A 1
1 A 2
1 B 3
2 C 4
3 D 5
4 E 6
5 F 7
6 B 8
7 B 9
1 D 10
1 X 11
7 G 14
6 Y 15
6 D 16
6 I 18
8 D 20
9 Z 21
9 X 22
;
RUN;
Your program's result is:
VAR2 VAR1 DATE DUMMY idperson
1 A 1 1 1
1 A 2 1 1
1 B 3 1 1
2 C 4 1 2
3 D 5 1 1
4 E 6 1 3
5 F 7 1 4
6 B 8 1 1
7 B 9 1 1
1 D 10 1 1
1 X 11 1 1
7 G 14 1 6
6 Y 15 1 5
6 D 16 1 1
6 I 18 1 5
8 D 20 1 1
9 Z 21 1 7
9 X 22 1 1
This is not correct, since the VAR2=6 records end up with two different IDs.
This is what I've done. The whole program (not posted here) is more complex (and not so elegant), since it deals with missing data in VAR1 and VAR2.
PROC SQL;
CREATE TABLE WORK.TEMP4 AS SELECT DISTINCT VAR1, VAR2 FROM WORK.TEMP3 WHERE DUMMY=1 AND VAR2^=. ORDER BY VAR1, VAR2;
CREATE TABLE WORK.TEMP5 AS SELECT DISTINCT VAR2, VAR1 FROM WORK.TEMP3 WHERE DUMMY=1 AND VAR2^=. ORDER BY VAR2, VAR1;
CREATE TABLE WORK.TEMP6 AS SELECT TEMP4.*, TEMP5.VAR1 AS CIP2 FROM WORK.TEMP4 INNER JOIN WORK.TEMP5 ON (TEMP4.VAR2=TEMP5.VAR2);
CREATE TABLE WORK.TEMP7 AS SELECT TEMP6.*, TEMP4.VAR2 AS IDHH2 FROM WORK.TEMP6 INNER JOIN WORK.TEMP4 ON (TEMP6.VAR1=TEMP4.VAR1);
CREATE TABLE WORK.TEMP8 AS SELECT DISTINCT IDHH2, CIP2 FROM WORK.TEMP7;
CREATE TABLE WORK.TEMP9 AS SELECT TEMP7.*, TEMP8.CIP2 AS CIP3 FROM WORK.TEMP7 INNER JOIN WORK.TEMP8 ON (TEMP7.IDHH2=TEMP8.IDHH2);
CREATE TABLE WORK.TEMP10 AS SELECT TEMP9.*, TEMP8.IDHH2 AS IDHH3 FROM WORK.TEMP9 INNER JOIN WORK.TEMP8 ON (TEMP9.CIP3=TEMP8.CIP2);
CREATE TABLE WORK.TEMP11 AS SELECT DISTINCT VAR1, IDHH3 AS VAR2 FROM WORK.TEMP10 ORDER BY VAR1, IDHH3;
CREATE TABLE WORK.TEMP12 AS SELECT VAR1, MAX(VAR2) AS VAR2 FROM WORK.TEMP11 GROUP BY VAR1;
CREATE TABLE WORK.TEMP13 AS SELECT TEMP11.* FROM WORK.TEMP11 INNER JOIN WORK.TEMP12 ON (TEMP11.VAR1=TEMP12.VAR1 AND TEMP11.VAR2=TEMP12.VAR2);
CREATE TABLE WORK.TEMP14 AS SELECT TEMP3.*, TEMP13.VAR2 AS IDPERSONA FROM WORK.TEMP3 LEFT JOIN WORK.TEMP13 ON (TEMP3.VAR1=TEMP13.VAR1);
CREATE TABLE WORK.TEMP15 AS SELECT DISTINCT VAR2, IDPERSONA FROM WORK.TEMP14 WHERE VAR2^=. AND IDPERSONA^=.;
CREATE TABLE WORK.TEMP16 AS SELECT TEMP14.*, TEMP15.IDPERSONA AS IDPERSONA2 FROM WORK.TEMP14 LEFT JOIN WORK.TEMP15 ON (TEMP14.VAR2=TEMP15.VAR2) ORDER BY DATE;
QUIT;
DATA TEMP16;
SET TEMP16;
IF IDPERSONA=. THEN IDPERSONA=IDPERSONA2;
DROP IDPERSONA2;
RUN;
And the right results:
VAR2 VAR1 DATE DUMMY IDPERSONA
1 A 1 1 9
1 A 2 1 9
1 B 3 1 9
2 C 4 1 2
3 D 5 1 9
4 E 6 1 4
5 F 7 1 5
6 B 8 1 9
7 B 9 1 9
1 D 10 1 9
1 X 11 1 9
7 G 14 1 9
6 Y 15 1 9
6 D 16 1 9
6 I 18 1 9
8 D 20 1 9
9 Z 21 1 9
9 X 22 1 9
I forgot to post my final solution. It is a SAS macro; I've made another one for 3 variables.
%MACRO GROUPER2(INDATA,OUTDATA,ID1,ID2,IDOUT,IDN=_N_,MAXN=5);
%PUT ****************************************************************;
%PUT ****************************************************************;
%PUT **** GROUPER MACRO;
%PUT **** PARAMETERS:;
%PUT **** INPUT DATA: &INDATA.;
%PUT **** OUTPUT DATA: &OUTDATA.;
%PUT **** FIRST VARIABLE: &ID1.;
%PUT **** SECOND VARIABLE: &ID2.;
%PUT **** OUTPUT GROUPING VARIABLE: &IDOUT.;
%IF (&IDN.=_N_) %THEN %PUT **** STARTING NUMBER VARIABLE: AUTONUMBER;
%ELSE %PUT **** STARTING NUMBER VARIABLE: &IDN.;
%PUT **** MAX ITERATIONS: &MAXN.;
%PUT ****************************************************************;
%PUT ****************************************************************;
/* CREATE FIRST GUESS FOR GROUP ID */
DATA _G_TEMP1 _G_TEMP2;
SET &INDATA.;
&IDOUT.=&IDN.;
IF &IDOUT.=. THEN OUTPUT _G_TEMP2;
ELSE OUTPUT _G_TEMP1;
RUN;
PROC SQL NOPRINT;
SELECT MAX(&IDOUT.) INTO :MAXIDOUT FROM _G_TEMP1;
QUIT;
DATA _G_TEMP2;
SET _G_TEMP2;
&IDOUT.=_N_+&MAXIDOUT.;
RUN;
DATA _G_TEMP;
SET _G_TEMP1 _G_TEMP2;
RUN;
PROC SQL;
UPDATE _G_TEMP SET &IDOUT.=. WHERE &ID1. IS NULL AND &ID2. IS NULL;
QUIT;
/* LOOP, IMPROVE GROUP ID EACH TIME*/
%LET I = 1;
%DO %WHILE (&I. <= &MAXN.);
%PUT LOOP NUMBER &I.;
%LET I = %EVAL(&I. + 1);
PROC SQL NOPRINT;
/* FIND THE LOWEST GROUP ID FOR EACH GROUP OF FIRST VARIABLE */
CREATE TABLE _G_MAP1 AS SELECT MIN(&IDOUT.) AS &IDOUT., &ID1. FROM _G_TEMP WHERE &ID1. IS NOT NULL GROUP BY &ID1.;
/* FIND THE LOWEST GROUP ID FOR EACH GROUP OF SECOND VARIABLE */
CREATE TABLE _G_MAP2 AS SELECT MIN(&IDOUT.) AS &IDOUT., &ID2. FROM _G_TEMP WHERE &ID2. IS NOT NULL GROUP BY &ID2.;
/* FIND THE LOWEST GROUP ID FROM BOTH GROUPING VARIABLES */
CREATE TABLE _G_NEW AS SELECT A.&ID1., A.&ID2., COALESCE(MIN(B.&IDOUT., C.&IDOUT.), A.&IDOUT.) AS &IDOUT.,
A.&IDOUT. AS &IDOUT._OLD FROM _G_TEMP AS A FULL OUTER JOIN _G_MAP1 AS B ON A.&ID1. = B.&ID1.
FULL OUTER JOIN _G_MAP2 AS C ON A.&ID2. = C.&ID2.;
/* PUT RESULTS INTO TEMPORARY DATASET READY FOR NEXT ITERATION */
CREATE TABLE _G_TEMP AS SELECT * FROM _G_NEW ORDER BY &ID1., &ID2.;
/* CHECK IF THE ITERATION PROVIDED ANY IMPROVEMENT */
SELECT MIN(CASE WHEN &IDOUT._OLD = &IDOUT. THEN 1 ELSE 0 END) INTO :STOPFLAG FROM _G_TEMP;
%PUT NO IMPROVEMENT? &STOPFLAG.;
QUIT;
/* END LOOP IF ID UNCHANGED OVER LAST ITERATION */
%LET ITERATIONS=%EVAL(&I. - 1);
%IF &STOPFLAG. %THEN %LET I = %EVAL(&MAXN. + 1);
%END;
%PUT ****************************************************************;
%PUT ****************************************************************;
%IF &STOPFLAG. %THEN %PUT **** LOOPING ENDED BY NO-IMPROVEMENT CRITERIA. OUTPUT FULLY GROUPED.;
%ELSE %PUT **** WARNING: LOOPING ENDED BY REACHING THE MAXIMUM NUMBER OF ITERATIONS. OUTPUT NOT FULLY GROUPED.;
%PUT **** NUMBER OF ITERATIONS: &ITERATIONS. (MAX: &MAXN.);
%PUT ****************************************************************;
%PUT ****************************************************************;
DATA &OUTDATA.;
SET _G_TEMP;
DROP &IDOUT._OLD;
RUN;
/* OUTPUT LOOKUP TABLE */
PROC SQL;
CREATE TABLE &OUTDATA._1 AS SELECT &ID1., MIN(&IDOUT.) AS &IDOUT. FROM _G_TEMP WHERE &ID1. IS NOT NULL GROUP BY &ID1. ORDER BY &ID1.;
CREATE TABLE &OUTDATA._2 AS SELECT &ID2., MIN(&IDOUT.) AS &IDOUT. FROM _G_TEMP WHERE &ID2. IS NOT NULL GROUP BY &ID2. ORDER BY &ID2.;
QUIT;
/* CLEAN UP */
PROC DATASETS NOLIST;
DELETE _G_:;
QUIT;
%MEND GROUPER2;
I have to get rid of a subject if it satisfies a condition.
DATA:
Name Value1
A 60
A 30
B 70
B 30
C 60
C 50
D 70
D 40
What I want is that if any value is 30 for a name, none of that name's lines should appear in the output.
Desired output is
Name Value1
C 60
C 50
D 70
D 40
I have written the following code in proc sql:
proc sql;
create table ck1 as
select * from ip where name in
(select distinct name from ip where value = 30)
order by name, subject, folderseq;
quit;
Change your SQL to be:
proc sql;
create table ck1 as
select * from ip where name not in
(select distinct name from ip where value = 30)
order by name, subject, folderseq;
quit;
Data step method:
data have;
input Name $ Value1;
datalines;
A 60
A 30
B 70
B 30
C 60
C 50
D 70
D 40
;;;;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
/* flag the group, but keep reading to last.name so both SET statements stay in step */
if value1=30 then value1_30=1;
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if value1_30 ne 1 then output;
end;
run;
An alternative that is slightly faster in some cases skips the second pass over a group whenever value1_30 is 1. This helps most when most names contain a 30, so you are only keeping a small number of records. It records the first and last observation numbers of the group and re-reads them with POINT= only when the group is kept.
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
counter+1;
if first.name then firstcounter=counter;
if last.name then lastcounter=counter;
if value1=30 then value1_30=1;
end;
if value1_30 ne 1 then
do _n_ = firstcounter to lastcounter ;
set have point=_n_;
output;
end;
run;
Another SQL option...
proc sql number;
select
a.name,
a.value1,
case
when value1 = 30 then 1
else 0
end as flag,
sum(calculated flag) as countflagpername
from have a
group by a.name
having countflagpername = 0
;quit;
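A more compact variant of the same idea, offered as a sketch rather than a tested drop-in: in PROC SQL a comparison such as value1 = 30 evaluates to 1 or 0, so it can be summed directly per name, and PROC SQL remerges the group result onto the detail rows. It assumes the have dataset from the data step answer above.

```sas
proc sql;
  create table want as
  select *
  from have
  group by name
  /* keep only names where no row has value1 = 30 */
  having sum(value1 = 30) = 0;
quit;
```

Expect a note in the log about remerging summary statistics with the original data; that is the intended behaviour here.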