identifying the rows with maximum continuous values - sql

I have two columns in a table. The second column holds 1 or 0 depending on a predefined condition. Can someone help me with the logic to identify the longest continuous run of 1s? For example, in the table below the longest run is between rows 7 and 18. Just the logic to identify this would be enough.
Thanks

Create the intervals.
data intervals ;
set have ;
by B NOTSORTED ;
if first.b then start=A ;
retain start ;
if last.b then do;
end = A ;
duration = end - start + 1 ;
output;
end;
drop A ;
run;
Then find the interval with the maximum duration. Perhaps you want the first occurrence of the maximum duration?
proc sort data=intervals out=want ;
by descending duration start;
run;
data want ;
set want (obs=1);
where B=1;
run;

something like this
data have;
input A B;
datalines;
1 0
2 0
3 1
4 1
5 1
6 0
7 0
8 0
9 1
10 0
11 1
12 1
13 1
14 1
15 1
16 1
17 0
18 0
19 0
20 1
21 0
;
proc sort data=have;
by A;
run;
data want;
set have;
if B=1 then count + 1;
if B = 0 then count = 0;
run;
proc sql;
select max(count) as max_value from want;
quit;
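The counter approach gives the length of the longest run but not where it sits. Here is a minimal sketch that also records the first and last A of that run, assuming HAVE is sorted by A and B only holds 0 or 1. The names run_start, run_len, best_start, best_end and best_len are made up for this example.

```sas
/* Assumption: HAVE is sorted by A and B is always 0 or 1. */
data longest_run (keep=best_start best_end best_len);
  set have end=last_row;
  /* All five helper variables start at 0 and carry over between rows. */
  retain run_start run_len best_start best_end best_len 0;
  if B = 1 then do;
    if run_len = 0 then run_start = A;   /* a new run begins on this row */
    run_len = run_len + 1;
    if run_len > best_len then do;       /* strict > keeps the first of tied runs */
      best_len   = run_len;
      best_start = run_start;
      best_end   = A;
    end;
  end;
  else run_len = 0;                      /* a 0 breaks the current run */
  if last_row then output;               /* write a single summary row */
run;
```

With the sample data above this should give best_start=11, best_end=16 and best_len=6.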

Related

how to Sort negative and positive data in SAS

I have the data below in variable NUM:
-3 1 0 -1 3 2 -2 5 -5 -6 4 6 -4
I want NUM in the sort order below:
0 -1 1 -2 2 -3 3 -4 4 -5 5 -6 6
How can we sort negative and positive values together? Please help.
data have;
input NUM @@;
cards;
-3 1 0 -1 3 2 -2 5 -5 -6 4 6 -4
;
run;
Sort by abs(num), num if you want the negative values to appear before the positive within the same absolute value as in the requested data.
data have;
input NUM @@;
cards;
-3 1 0 -1 3 2 -2 5 -5 -6 4 6 -4
;
run;
proc sql;
create table want as
select * from have
order by abs(num), num
;
quit;
Make a new variable with the absolute value and include it in the sort.
data want;
set have;
absolute=abs(num);
run;
proc sort data=want;
by absolute num;
run;

How do I add a key to a row based on its "group"?

I have a data set like this:
a 10
a 13
a 14
b 15
b 44
c 64
c 32
d 12
I want to write a PROC SQL statement or DATA step that will yield this:
a 10 1
a 13 1
a 14 1
b 15 2
b 44 2
c 64 3
c 32 3
d 12 4
How do?
DATA TEST;
INPUT id $ value ;
DATALINES;
a 10
a 13
a 14
b 15
b 44
c 64
c 32
d 12
;
RUN;
Sort your data if needed:
proc sort data=test;
by id;
run;
Then:
data want;
set test;
retain key;
by id;
if _n_ = 1 then key = 0;
if first.id then key = key + 1;
run;
The retain statement keeps the value of key from one iteration to the next.
Then, whenever a new id appears, we add 1 to key.
Alternatively, as Keith pointed out, you could use this simplified data step to do the job:
data want;
set test;
by id;
if first.id then key + 1;
run;
I'll leave both versions here for reference because I think the first one is easier to understand, and the last one from Keith's comments is a lot cleaner.
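If the rows for each id are already next to each other but not in sorted order, the sort can probably be skipped. This is a small sketch of that variant, assuming the grouping in TEST is contiguous. The only change from the data step above is the NOTSORTED option on the BY statement.

```sas
/* Assumption: rows with the same id are adjacent in TEST, in any order. */
data want;
  set test;
  by id notsorted;            /* first.id fires whenever id changes */
  if first.id then key + 1;   /* sum statement: key starts at 0 and is retained */
run;
```

Note that with NOTSORTED an id that shows up again later in the data starts a new group and gets a new key.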

Create a variable based on sum of two variables (one lag)

I have a data set like the one below, where the amount has dropped off, but the adjustment remains. For each row amount should be the sum of the previous amount and the adjustment. So, amount for observation 5 is 134 (124+10).
I have an answer which gets me the next value, but I need some sort of recursion to get me the rest of the way there. What am I missing? Thanks.
data have;
input amount adjust;
cards;
100 0
101 1
121 20
124 3
. 10
. 4
. 3
. 0
. 1
;
run;
data attempt;
set have;
x=lag1(amount);
if amount=. then amount=adjust+x;
run;
data want;
input amount adjust;
cards;
100 0
101 1
121 20
124 3
134 10
138 4
141 3
141 0
142 1
;
run;
EDIT:
Also trying something like this now, still not quite what I want.
%macro doodoo;
%do i = 1 %to 5;
data have;
set have;
/* if _n_=i+4 then*/
amount=lag1(amount)+adjust;
run;
%end;
%mend;
%doodoo;
No need for LAG(); use RETAIN instead.
data want ;
set have ;
retain previous ;
if amount = . then amount=sum(previous,adjust);
previous=amount ;
run;

SAS: proc freq list view, creating dummy

is there any way to create dummy variables for the list view generated from SAS: proc freq?
e.g.
this is my proc freq output :
x y z N %
0 0 0 10 2.8
0 0 1 20 5.6
0 1 0 30 8.3
0 1 1 40 11.1
1 0 0 50 13.9
1 0 1 60 16.7
1 1 0 70 19.4
1 1 1 80 22.2
Can I create (easily in proc freq) dummy variables that take 1/0 values for each level of the output (that is, 8 dummy variables) or, alternatively, a single variable that takes incremental values 1, 2, 3, ... for each level of output?
Thanks in advance !!
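Here's one way you can do it with a single variable, assuming you just have combinations of variables with values of only 0 or 1. The three digits are read as a binary number, so singlevar runs from 0 to 7 (add 1 if you want 1 to 8):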
data yourdata;
do i = 1 to 100;
x = round(ranuni(1));
y = round(ranuni(2));
z = round(ranuni(3));
t = 1;
output;
end;
run;
proc summary nway data = yourdata;
class x y z;
var t;
output out = summary_ds n=;
run;
data summary_ds;
set summary_ds;
singlevar = input(cats(x,y,z),binary3.);
run;
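For the other half of the question, the 8 dummy variables can be derived from singlevar afterwards. This is a sketch, assuming summary_ds was built as above; the names summary_dummies, d0-d7 and k are made up for illustration.

```sas
/* Assumption: summary_ds has singlevar with values 0 to 7. */
data summary_dummies;
  set summary_ds;
  array d[0:7] d0-d7;        /* explicit bounds so the index matches singlevar */
  do k = 0 to 7;
    d[k] = (singlevar = k);  /* a comparison evaluates to 1 or 0 */
  end;
  drop k;
run;
```

Each row ends up with exactly one of d0-d7 set to 1.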

tracking customer retention on weekly basis

I have start and end weeks for a given customer and I need to make panel data for the weeks they are subscribed. I have manipulated the data into a form that is easy to convert, but when I transpose I do not get the weeks between start and end filled in. Hopefully an example will shed some light on my request. Weeks start at 0 and end at 61, so I forced any week above 61 to be 61, again for simplicity. Populate with a 1 if they are still subscribed and a blank if not.
ID Start_week End_week
1 6 61
2 0 46
3 45 61
what I would like
ID week0 week1 ... week6 ... week45 week46 week47 ... week61
1 . . ... 1 ... 1 1 1 ... 1
2 1 1 ... 1 ... 1 1 0 ... 0
3 0 0 ... 0 ... 1 1 1 ... 1
I see two ways to do it.
I would go for an array approach, since it will probably be the fastest (single data step) and is not that complex:
data RESULT (drop=start_week end_week week);
set YOUR_DATA;
array week_array{62} week0-week61;
do week=0 to 61;
/* BETWEEN only works in WHERE/SQL; a DATA step IF needs a chained comparison */
if start_week <= week <= end_week then week_array[week+1]=1;
else week_array[week+1]=0;
end;
run;
Alternatively, you can prepare a table for the transpose to work on by creating one record per week per id:
data BEFORE_TRANSPOSE (drop=start_week end_week);
set YOUR_DATA;
do week=0 to 61;
if start_week <= week <= end_week then subscribed=1;
else subscribed=0;
output;
end;
run;
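The transpose itself is not shown above. Here is a sketch of that step, assuming BEFORE_TRANSPOSE is in ID order (it will be if YOUR_DATA was) and using PANEL as a made-up output name:

```sas
/* Assumption: BEFORE_TRANSPOSE is sorted by ID, one row per ID and week. */
proc transpose data=BEFORE_TRANSPOSE out=PANEL (drop=_name_) prefix=week;
  by ID;            /* one output row per customer */
  id week;          /* week values 0 to 61 become week0 to week61 */
  var subscribed;   /* the 1/0 flag fills the cells */
run;
```

Because every ID has a row for each week from 0 to 61, the columns should come out in order from week0 to week61.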
Use an array to create the variables. The one gotcha is that SAS arrays are 1-indexed by default.
data input;
input ID Start_week End_week;
datalines;
1 6 61
2 0 46
3 45 61
;
data output;
array week[62] week0-week61;
set input;
do i=1 to 62;
if i > start_week and i<= (end_week+1) then
week[i] = 1;
else
week[i] = 0;
end;
drop i;
run;
I have no working syntax, but here is a guideline.
First make a table, either with a CTE or as a physical table, with the numbers 0 to 61 as rows. Then join this table with the subscription table. Something like:
FROM sub
INNER JOIN CTE
ON CTE.week BETWEEN sub.Start_week AND sub.End_week
Now you will have a row for every week a customer is subscribed. Transpose that and you will have the in between weeks also filled in.
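In SAS this guideline could be sketched as follows. As far as I know PROC SQL has no CTE support, so the weeks table is built physically. It is assumed that the subscription data sits in a table called sub with columns ID, Start_week and End_week. The names weeks, sub_weeks and panel are made up for this example.

```sas
/* One row per week number, 0 to 61. */
data weeks;
  do week = 0 to 61;
    output;
  end;
run;

/* Assumption: SUB holds ID, Start_week and End_week. */
proc sql;
  create table sub_weeks as
  select s.ID, w.week, 1 as subscribed
  from sub as s
  inner join weeks as w
    on w.week between s.Start_week and s.End_week
  order by s.ID, w.week;
quit;

/* Turn the long table into one row per ID. */
proc transpose data=sub_weeks out=panel (drop=_name_) prefix=week;
  by ID;
  id week;
  var subscribed;
run;
```

Weeks outside a customer's range have no row in sub_weeks, so they come out as missing rather than 0. The week columns are likely to appear in the order the week values are first encountered, not necessarily from week0 to week61.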