SAS - How to separate a string with variable substring into multiple columns


I have a dataset containing a variable X, made up of multiple numbers separated by commas. The number of items differs among rows. I created a word count (Num_of_X). Now I would like to see the numbers in separate columns.
Here is an example:
X            Num_of_X  Var1  Var2  Var3  Var4  ...  Varn
3,10,165     3         3     10    165
1            1         1
15,100       2         15    100
10,52,63,90  4         10    52    63    90
I tried this:
%let max_num_X=max(num_of_x);
data have;
set have;
length var1-var&max_num_X $10.;
array Var(&max_num_X) $;
do i=1 to &max_num_X;
Var[i]=scan(X,i,',');
output;
end;
run;
Could you help me?
Thank you

Do something like this:
data have;
input X :$20.;
datalines;
3,10,165
1
15,100
10,52,63,90
;
data long;
set have;
n = _N_;
do i = 1 to countw(X, ',');
xx = scan(X, i, ',');
output;
end;
run;
proc transpose data = long out = want(drop=_:) prefix=Var;
by n;
id i;
var xx;
run;

You could use a macro to find the maximum number of items and create the variables:
%macro create_vars();
proc sql noprint; select max(countw(X, ',')) into :max_num_X from have; quit;
data have; set have;
%do i = 1 %to &max_num_X.; Var&i. = scan(X,&i.,','); %end;
run;
%mend;
%create_vars();
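A macro-free sketch of the same idea: compute the maximum item count once with PROC SQL, then use it to size an array. It reuses the sample HAVE data from the first answer; the array name V, the NUM_OF_X column and converting each item to a number with INPUT are illustrative choices, not part of either answer (for text values, drop the INPUT call and declare the array as `$ 10`).

```sas
data have;
  input X :$20.;
  datalines;
3,10,165
1
15,100
10,52,63,90
;

/* Largest number of comma-separated items in any row */
proc sql noprint;
  select max(countw(X, ',')) into :max_num_X trimmed
  from have;
quit;

data want;
  set have;
  /* Numeric Var1-VarN; the array is named V to avoid clashing with the VAR function */
  array v[&max_num_X] Var1-Var&max_num_X;
  Num_of_X = countw(X, ',');
  do i = 1 to Num_of_X;
    v[i] = input(scan(X, i, ','), best32.);
  end;
  drop i;
run;
```

Rows with fewer items leave the trailing VarN columns missing; row 2, for example, only fills Var1.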

Related

Fill the nulls of a column with the mean ratio of two columns, multiplied by one column, minus the previous column (SAS)

I need to fill the nulls of a column with the mean of the ratio of two columns, multiplied by one column, minus the previous column. An example would be:
A  B_01  B_02  ...  B_60
5  .     .
5  2     3
7  3     1.2
9  3     0.3
4  .     .
I would like the missing value in column B_01 to be (2/5 + 3/7 + 3/9) / 3 * the corresponding value of A.
For column B_02 it should be (3/5 + 1.2/7 + 0.3/9) / 3 * the corresponding value of A, minus the new value in B_01.
I have thought about doing it like this, but I have 60 columns, and the only way I can think of is to write this 60 times:
proc sql;
create table new as
select *
, sum(B_01/A)/sum(case when B_01 is missing then . else 1 end)*A as new_B_01
, sum(B_02/A)/sum(case when B_02 is missing then . else 1 end)*A - B_01 as new_B_02
from table_one
;
quit;
Thanks
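Worked through for the five sample rows, reading the decimal values as 1.2 and 0.3 (as both answers do), the rule described in the question gives these values for the two rows where B_01 and B_02 are missing (exact fractions, then rounded):

```latex
\begin{aligned}
\bar r_{01} &= \tfrac{1}{3}\left(\tfrac{2}{5} + \tfrac{3}{7} + \tfrac{3}{9}\right) = \tfrac{122}{315} \approx 0.3873,\\
\bar r_{02} &= \tfrac{1}{3}\left(\tfrac{3}{5} + \tfrac{1.2}{7} + \tfrac{0.3}{9}\right) = \tfrac{169}{630} \approx 0.2683,\\
\text{row 1 } (A = 5):\quad B_{01} &= 5\,\bar r_{01} = \tfrac{122}{63} \approx 1.9365, \qquad
B_{02} = 5\,\bar r_{02} - B_{01} = -\tfrac{25}{42} \approx -0.5952,\\
\text{row 5 } (A = 4):\quad B_{01} &= 4\,\bar r_{01} = \tfrac{488}{315} \approx 1.5492, \qquad
B_{02} = 4\,\bar r_{02} - B_{01} = -\tfrac{10}{21} \approx -0.4762.
\end{aligned}
```

These numbers are a convenient check for any of the solutions below.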

This may be what you want.
data test;
input A B_01-B_02;
cards;
5 . .
5 2 3
7 3 1.2
9 3 0.3
4 . .
;;;;
data test2;
set test;
array B_ B_01-B_02;
array M_[2];
do i = 1 to dim(m_);
if not missing(b_[i]) then m_[i]= divide(b_[i],a);
end;
drop i;
run;
proc print;
run;
proc stdize reponly missing=mean data=test2 out=mean;
var m_:;
run;
proc print;
run;
data mean2;
set mean;
array B_ B_01-B_02;
array M_[2];
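do i = 1 to dim(m_);
/* fill a gap with mean(B/A) * A, minus the (already filled) previous column */
if missing(b_[i]) then do;
b_[i] = m_[i]*a;
if i > 1 then b_[i] = b_[i] - b_[i-1];
end;
end;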
drop i m_:;
run;
proc print;
run;

Try this.
data test;
input A B_01-B_02;
cards;
5 . .
5 2 3
7 3 1.2
9 3 0.3
4 . .
;;;;
run;
data mean;
set test end=eof;
array b(*) b_:;
array sumb(2) _temporary_;
array cntb(2) _temporary_;
array mean_b_(2);
* cumulative sums and counts;
do i = 1 to dim(b);
sumb(i) = sum(sumb(i), b(i)/a);
if not missing(b(i)) then cntb(i) + 1;
end;
* mean;
if eof then do;
do i = 1 to dim(b);
mean_b_(i) = sumb(i)/cntb(i);
end;
output;
end;
keep mean_:;
run;
data test2;
if _n_ = 1 then do;
set mean;
array mean_b_(*) mean_b:;
end;
set test;
array b_(*) b_:;
array new_b_(2);
do i = 1 to dim(b_);
if i = 1 then new_b_(i) = coalesce(b_(i), mean_b_(i) * a);
else new_b_(i) = coalesce(b_(i), mean_b_(i) * a - new_b_(i-1));
end;
run;
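The second answer sizes its arrays with a literal 2. A sketch of the same calculation in a single DATA step that reads TEST twice is shown below: two SET statements on the same dataset each keep their own read pointer, so the first loop accumulates the sums and counts and the second loop fills the gaps. The macro variable N and the output name WANT are assumptions for illustration; with the real data you would set N to 60 and have B_01-B_60 in the input.

```sas
%let n = 2;   /* hypothetical: number of B_ columns, e.g. 60 for B_01-B_60 */

data test;
  input A B_01-B_02;
  datalines;
5 . .
5 2 3
7 3 1.2
9 3 0.3
4 . .
;

data want;
  array sumr(&n) _temporary_;   /* running sum of B/A per column */
  array cnt(&n)  _temporary_;   /* number of non-missing B per column */

  /* Pass 1: accumulate sums and counts over all rows */
  do until (eof1);
    set test end=eof1;
    array b_(*) b_:;
    do i = 1 to dim(b_);
      if not missing(b_(i)) then do;
        sumr(i) = sum(sumr(i), b_(i) / a);
        cnt(i)  = sum(cnt(i), 1);
      end;
    end;
  end;

  /* Pass 2: re-read every row and fill the gaps */
  do until (eof2);
    set test end=eof2;
    do i = 1 to dim(b_);
      if missing(b_(i)) then do;
        b_(i) = sumr(i) / cnt(i) * a;
        if i > 1 then b_(i) = b_(i) - b_(i-1);   /* subtract the (new) previous column */
      end;
    end;
    output;
  end;
  stop;
  drop i;
run;
```

Only the `%let` line depends on the number of columns; the arrays over B_ pick up the variables through the `b_:` prefix list.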

How do I assign a value to a new variable using another dataset that contains one value in SAS?

I have a dataset:
ID value1
1 12
2 345
3 342
I have a second dataset:
value2
3823
How do I get the following result?
ID value1 value2
1 12 3823
2 345 3823
3 342 3823
Any join I have tried gives me:
ID value1 value2
1 12 .
2 345 .
3 342 .
. . 3823

No need for joins or helper variables:
data have;
do i = 1 to 3;
output;
end;
run;
data lookup;
j = 1;
run;
data want;
set have;
if _n_ = 1 then set lookup;
run;
Without the if _n_ = 1, the data step stops after one iteration when it tries to read a second row from the lookup dataset and finds that there are no rows remaining.
N.B. this requires that the have dataset doesn't already contain a variable with the same name as the variable(s) attached from the lookup dataset.

By far the easiest way to do this is to use PROC SQL with the join condition 1=1, which is true for every pair of rows:
data first;
input ID value1 @@;
cards;
1 12 2 345 3 342
;
run;
data second;
input value2 ;
cards;
3823
;
run;
proc sql;
create table wanted as
select * from first
left join second
on 1 = 1
;
quit;
Edit: As far as I know, there isn't a direct way to merge a one-row dataset onto every row, but you can use the following trick.
Add a constant variable Help to both datasets:
data second_trick;
set second;
help=1;
run;
data first_trick;
set first;
help=1;
run;
Then we just perform the merge by the static variable:
data wanted_trick;
merge first_trick(in=a) second_trick;
by help;
if a; /*Left join, just to be sure.*/
run;
Note that this only works if you want to add a single static value. Don't use it if your second dataset has more rows.
For more on Merges and joins see: https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/249-30.pdf
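Another option, not taken from the answers above, is to read the single value into a macro variable and assign it as a constant. A minimal sketch, assuming SECOND has exactly one row; the output name WANTED_MACRO is hypothetical.

```sas
data first;
  input ID value1 @@;
  datalines;
1 12 2 345 3 342
;

data second;
  input value2;
  datalines;
3823
;

/* Store the single value as text in a macro variable */
proc sql noprint;
  select value2 into :value2 trimmed
  from second;
quit;

/* Resolve it as a numeric constant on every row */
data wanted_macro;
  set first;
  value2 = &value2;
run;
```

The INTO clause converts numbers to text with the BEST8. format, which is fine for 3823; for values with more significant digits, selecting `put(value2, best32.)` instead keeps the precision.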

Dropping variables based on the sum of their values using SAS

I wish to drop the columns in a SAS dataset whose sum is less than a particular value. Consider the case below.
Column_A Pred_1 Pred_2 Pred_3 Pred_4 Pred_5
A 1 1 0 1 0
A 0 1 0 1 0
A 0 1 0 1 0
A 0 1 0 1 1
A 0 1 0 0 1
Let us assume that our threshold is 4, so I want to drop predictors whose sum of active observations is less than 4. The output would look like this:
Column_A Pred_2 Pred_4
A 1 1
A 1 1
A 1 1
A 1 1
A 1 0
Currently I am using a very inefficient method: multiple transposes to drop the predictors. There are multiple datasets with more than 30,000 records, so the transpose approach takes a long time. I would appreciate it if anyone has a more efficient solution!
Thanks!

It seems like you could do this:
Run PROC MEANS or a similar procedure to get the sums
Create a macro variable that contains all variables in the dataset with sum < threshold
Drop those variables
Then there is no TRANSPOSE, just plain summarization and a drop. Note that you should use ODS OUTPUT rather than OUT= in PROC MEANS; otherwise you will have to PROC TRANSPOSE the wide dataset that the normal PROC MEANS OUT= produces.
An example using a trivial dataset:
data have;
array x[20];
do _n_ = 1 to 20;
do _i = 1 to dim(x);
x[_i] = rand('Uniform') < 0.2;
end;
output;
end;
run;
ods output summary=have_sums; *how we get our output;
ods html select none; *stop it from going to results window;
proc means data=have stackodsoutput sum; *stackodsoutput is 9.3+ I believe;
var x1-x20;
run;
ods html select all; *reenable normal html output;
%let threshold=4; *your threshold value;
proc sql;
select variable
into :droplist_threshold separated by ' '
from have_sums
where sum lt &threshold; *keep only variables whose sum is below the threshold;
quit;
data want;
set have;
drop &droplist_threshold.; *and now, drop them!;
run;

Just use PROC SUMMARY to get the sums. You can then use a data step to generate the list of variable names to drop.
%let threshold=4;
%let varlist= pred_1 - pred_5;
proc summary data=have ;
var &varlist ;
output out=sum sum= ;
run;
data _null_;
set sum ;
array x &varlist ;
length droplist $500 ;
do i=1 to dim(x);
if x(i) < &threshold then droplist=catx(' ',droplist,vname(x(i)));
end;
call symputx('droplist',droplist);
run;
You can then use the macro variable to generate a DROP statement or a DROP= dataset option.
drop &droplist;
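To check the PROC SUMMARY approach end to end, here is a sketch run on the question's own sample rows, finishing with the DROP= dataset option; the macro variable name THRESHOLD and the dataset names are illustrative. With a threshold of 4 the column sums are Pred_1 = 1, Pred_2 = 5, Pred_3 = 0, Pred_4 = 4 and Pred_5 = 2, so Pred_1, Pred_3 and Pred_5 should be dropped, which matches the desired output.

```sas
data have;
  input Column_A $ Pred_1-Pred_5;
  datalines;
A 1 1 0 1 0
A 0 1 0 1 0
A 0 1 0 1 0
A 0 1 0 1 1
A 0 1 0 0 1
;

%let threshold = 4;   /* illustrative cut-off */

/* One row holding the column sums, under the original variable names */
proc summary data=have;
  var pred_1-pred_5;
  output out=sum sum=;
run;

/* Collect the names of columns whose sum is below the threshold */
data _null_;
  set sum;
  array x pred_1-pred_5;
  length droplist $500;
  do i = 1 to dim(x);
    if x(i) < &threshold then droplist = catx(' ', droplist, vname(x(i)));
  end;
  call symputx('droplist', droplist);
run;

/* Apply the list as a DROP= dataset option */
data want;
  set have(drop=&droplist);
run;
```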

How to assign the count of columns to another variable in SAS?

%Let abc = count( no of variables in data set )

The following code assigns the number of columns in the dataset 'have' to the macro variable abc.
data _null_;
if 0 then
do;
set have (obs=0);
end;
array chars _character_;
array nums _numeric_;
ncharvar = dim(chars);
nnumvar = dim(nums);
nvar = ncharvar + nnumvar;
call symputx('abc',nvar); /* SYMPUTX stores the number without the leading blanks SYMPUT would keep */
run;
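An alternative that avoids the DATA step is to count the rows of the DICTIONARY.COLUMNS metadata table, which has one row per variable, in PROC SQL. This sketch assumes HAVE lives in the WORK library; the three-variable HAVE built here is only for illustration.

```sas
data have;
  length name $ 8;
  name = 'a';
  x = 1;
  y = 2;
run;

/* LIBNAME and MEMNAME are stored in upper case in the dictionary tables */
proc sql noprint;
  select count(*) into :abc trimmed
  from dictionary.columns
  where libname = 'WORK' and memname = 'HAVE';
quit;

%put NOTE: HAVE has &abc variables.;
```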

Simplifying the variable input in SAS

I have 90 variables in the data, and I want to do the following in SAS.
Here is my SAS code:
data test;
length id class sex $ 30;
input id $ 1 class $ 4-6 sex $ 8 survial $ 10;
cards;
1 3rd F Y
2 2nd F Y
3 2nd F N
4 1st M N
5 3rd F N
6 2nd M Y
;
run;
data items2;
set test;
length tid 8;
length item $8;
tid = _n_;
item = class;
output;
item = sex;
output;
item = survial;
output;
keep tid item;
run;
What if I have 90 variables to handle like this? That would need a very long list of statements. I want to simplify it.

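You could use an ARRAY or, alternatively, PROC TRANSPOSE.
The following is untested, because you haven't provided an example of your input dataset.
DATA ITEMS;
SET TEST; /* replace TEST with the name of your input dataset */
ARRAY VARS {*} VAR1-VAR90; /* after SET, so the array refers to the existing variables */
TID = _N_;
DO I = LBOUND(VARS) TO HBOUND(VARS);
ITEM = VARS{I};
OUTPUT;
END;
KEEP TID ITEM;
RUN;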
OR
PROC TRANSPOSE DATA = TEST OUT = WANT;
BY ID;
VAR CLASS -- SURVIAL;
RUN;
In the future, it would be best if you could supply your input and desired output.

I can't add another comment to the answer above, so I am adding one here.
You need to extend the VAR statement to include all the variables that you want transposed.
CLASS -- SURVIAL means all variables from CLASS through SURVIAL, inclusive, in the order they are stored in the dataset.
Post your code and the error so that I can help you better.
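Applied to the question's TEST data, an ARRAY defined over the same positional range (CLASS -- SURVIAL) avoids listing the variables one by one. A sketch, assuming every variable in the range is character, since one array cannot mix character and numeric variables; list input replaces the question's column input here because column positions depend on exact spacing in the data lines.

```sas
data test;
  length id class sex $ 30;
  input id $ class $ sex $ survial $;
  datalines;
1 3rd F Y
2 2nd F Y
3 2nd F N
4 1st M N
5 3rd F N
6 2nd M Y
;

data items2;
  set test;
  /* Positional range: every variable from CLASS through SURVIAL in dataset order */
  array vars {*} class -- survial;
  length item $ 30;
  tid = _n_;
  do i = 1 to dim(vars);
    item = vars{i};
    output;
  end;
  keep tid item;
run;
```

Adding more variables only changes the range in the ARRAY statement, not the loop.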
