How do I "conditionally" count missing values per variable in PROC SQL?

I'm measuring expenses in different categories. I have two types of variables: categorical variables, which state whether the respondent has had expenses in the category (such as "Exkl_UtgUtl_Flyg"), and numerical variables (such as UtgUtl_FlygSSEK_Pers), which give the amount spent by each respondent in that category.
I want to create a table which tells me if there are missing values in my numerical variables for categories where expenses have been reported (so missing values of "UtgUtl_FlygSSEK_Pers" where the variable "Exkl_UtgUtl_Flyg" equals 1, in an example with only one variable).
This works in a simple SQL query, so something like:
PROC SQL;
SELECT nmiss(UtgUtl_FlygSSEK_Pers)
FROM IBIS3_5
WHERE Exkl_UtgUtl_Flyg=1;
quit;
But I don't want to navigate between 20 different datasets to find my missing values; I want them all in the same table. I figure this should be possible if I write a subquery in the SELECT clause for each variable, so something like:
PROC SQL;
SELECT (SELECT nmiss(UtgUtl_FlygSSEK_Pers)
FROM IBIS3_5
WHERE Exkl_UtgUtl_Flyg=1) as nmiss_variable_1
FROM IBIS3_5;
quit;
This last query does not seem to work, however. It does not return a single value, but one value for each row in the dataset.
How do I make this work?

I suspect you want to generate a single value.
Either the total number of mismatches:
select sum(missing(UtgUtl_FlygSSEK_Pers) and Exkl_UtgUtl_Flyg=1) as nmiss
from ibis3_5
;
Or perhaps just a binary 1/0 flag of whether or not there are any mismatches.
select max(missing(UtgUtl_FlygSSEK_Pers) and Exkl_UtgUtl_Flyg=1) as any_miss
from ibis3_5
;
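To get several variables into the same one-row table, the same expression can be repeated once per flag/amount pair. This is only a minimal sketch: the dataset ibis_demo and the variable names Exkl_A, Amount_A, Exkl_B and Amount_B are made up for illustration and stand in for the real pairs such as Exkl_UtgUtl_Flyg / UtgUtl_FlygSSEK_Pers.

```sas
/* Hypothetical sample data: a 1/0 flag and an amount for two categories */
data ibis_demo;
  input Exkl_A Amount_A Exkl_B Amount_B;
  datalines;
1 100 1 .
1 . 0 .
0 . 1 50
1 . 1 75
;
run;

proc sql;
  /* One output row, one column per category:                 */
  /* counts rows where the flag is 1 but the amount is missing */
  create table miss_report as
  select sum(missing(Amount_A) and Exkl_A = 1) as nmiss_A,
         sum(missing(Amount_B) and Exkl_B = 1) as nmiss_B
  from ibis_demo;
quit;
```

With this sample data, nmiss_A is 2 and nmiss_B is 1. Missing amounts on rows where the flag is 0 are not counted.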

This may be a good use case for PROC FREQ instead, especially if you have multiple values.
Not all of this is necessary, but it produces a missing-value report. It depends on exactly how you define missing, of course.
*create sample data to work with;
data class;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' PCT_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;

Related

Building an SQL query in Base SAS for 81 variables to report count of non null values

I have a data template with 81 exposure elements/ variables and approx 9 million rows for loans generated by a bank for e.g. customer number , reporting date, account number , customer type etc.
I need to conduct data validation to report
variable missing or not
No of non missing values populated for each available variable
the data type of the values populated under each variable
Individually for each variable I'm using the query
proc sql;
select COUNT(variable) from library.table where not missing(variable);
quit;
How can I extend the above query to all 81 variables ?
I already have the attributes using
proc sql;
create table test as select * from dictionary.columns where libname="XXX" and memname="tablename";
quit;
But if the above could be incorporated in one holistic query that could generate an output which I can potentially export as an excel , that would be great
Thanks
In SAS there are usually PROCs for this kind of task. For this one, it would be PROC FREQ, e.g. this example.
If you want an output dataset you can adapt the linked solution and do
proc format;
value $missfmt ' ' = 'Missing' other = 'Not Missing';
value missfmt . = 'Missing' other = 'Not Missing';
run;
/* Capture the output. */
ods output OneWayFreqs = want;
/* Count missing and not missing. */
proc freq data=have;
format _char_ $missfmt. _numeric_ missfmt.;
tables _all_ / missing nocum nopercent;
run;
data want;
set want;
array f {*} f_:;
/* Extract column name. */
do i = 1 to dim(f);
if not missing(f[i]) then
column = substr(vname(f[i]), 3);
end;
/* Extract column type. */
type = vtypex(column);
/* Get value, i. e. missing or not missing. */
value = cats(of f_:);
run;
proc sort data=want;
by column type value;
run;
/* Transpose the missing and not missing rows into two columns. */
proc transpose data=want out=want(drop=_:);
by column type;
id value;
var frequency;
run;
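If a single PROC SQL query is still preferred, as the question asks, one possible sketch is to let dictionary.columns generate the COUNT() expressions. SASHELP.CLASS stands in for the real table here, and the n_ prefix, the macro variable count_list and the output table nonmissing_counts are my own assumptions. Variable names longer than 30 characters, or names that need name literals, would need extra handling.

```sas
proc sql noprint;
  /* Build "count( Name ) as n_Name, count( Sex ) as n_Sex, ..." */
  select catx(' ', 'count(', name, ') as', cats('n_', name))
    into :count_list separated by ', '
  from dictionary.columns
  where libname = 'SASHELP' and memname = 'CLASS';

  /* COUNT(column) counts only non-missing values, so no WHERE is needed */
  create table nonmissing_counts as
  select &count_list
  from sashelp.class;
quit;
```

The result is one row with one column per variable, which can then be transposed or exported.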

Get Data from SAS Macro (list of values) to SAS table (column)

I am trying to create a SAS table from a macro variable using PROC SQL.
I have a list of values saved in a macro variable:
%let l=1,2,3;
I want to create a SAS table with a column containing the values of the macro variable :
1
2
3
Thank you very much for your help.
Sincerely,
Abdeljalil
You should make some effort to solve this yourself.
Put the values into a string, parse the string and output the values you would like.
%let l=1,2,3;
data want;
str = "&l";
do i=1 to countw(str,',');
value = input(scan(str,i,","),best.);
output;
end;
/*drop other variables if you want*/
drop str i;
run;
Something like this?
%let age=%str(12,13,15);
proc sql;
select * from sashelp.class where age in (&age);
quit;
You have a data set that contains a list of names and you want to place these names into a macro variable for later use. That will work as long as the macro variable does not go beyond the 64K limit.
If the value hits this limit, then you can use macro processing to retrieve the names from the data set. Since a macro definition does not have the 64K restriction, it can be used to create the list for you.
In the sample code below, we have a list of names that we want to use on an INPUT statement along with a given informat. This sample demonstrates how to create the list without having to use a macro variable.
data one;
input name $;
datalines;
abc
def
ghi
;
run;
%macro test;
%let dsid=%sysfunc(open(one));
%let cnt=%sysfunc(attrn(&dsid,nobs));
%do i=1 %to &cnt;
%let rc=%sysfunc(fetchobs(&dsid,&i));
%cmpres(%sysfunc(getvarc(&dsid,%sysfunc(varnum(&dsid,name))))) $4.
%end;
%let rc=%sysfunc(close(&dsid));
%mend test;
/** Using %PUT to see outcome **/
/** %test could be used on an INPUT statement **/
%put %test;
source: http://support.sas.com/kb/39/605.html

How to randomly select variables in SAS?

I can find all sorts of information on how to randomly select observations in SAS which is a fairly easy task. This is not what I need though. I need to randomly select variables. What I want to do specifically is randomly choose 20 variables from my list of 159 variables and do this 50 times. I want to ensure diversity too. I have been spending about two days on this and am having no luck.
I'm glad that you asked this question, because I just developed a solution for that! Let's break down exactly what needs to be done, step-by-step.
Step 0: What do we need to do?
We need a way to take all of our variables and randomly select 20 of them while keeping them within the bounds of the SAS language rules.
We'll require:
All variables in the dataset
A way to re-sort them randomly
A limit of 20 variables
A way to loop this 50 times
Let's start with 1.
Step 1: Getting all the variables
sashelp.vcolumn provides a list of all variables within a dataset. Let's select them all.
proc sql noprint;
create table all_vars as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
;
quit;
This gets us a list of all variables within our dataset. Now, we need to sort them randomly.
Step 2: Making them random
SAS provides the rand function that allows you to pull from any distribution that you'd like. You can use call streaminit(seedhere) prior to the rand function to set a specific seed, creating reproducible results.
We'll simply modify our original SQL statement and order the dataset with the rand() function.
data _null_;
call streaminit(1234);
run;
proc sql noprint;
create table all_vars as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform');
quit;
Now we've got all of our variables in a random order, distributed evenly by the uniform distribution.
Step 3: Limit to 20 variables
You can do this a few ways. One way is the obs= dataset option in separate procedures, another is the outobs= proc sql option. Personally, I like the obs= dataset option since it doesn't generate a warning in the log, and can be used in other procedures.
data _null_;
call streaminit(1234);
run;
proc sql noprint outobs=20;
create table all_vars as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform');
quit;
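For comparison, here is a minimal sketch of the obs= dataset option route mentioned above. It also keeps CALL STREAMINIT and RAND in the same DATA step, since I am not certain that a seed set in a separate DATA step carries over to RAND inside PROC SQL. SASHELP.CLASS stands in for the real table and 3 stands in for 20; the dataset names keyed and picked and the variable sort_key are made up for illustration.

```sas
/* Attach a random sort key to each variable name */
data keyed;
  set sashelp.vcolumn(keep=libname memname name);
  where libname = 'SASHELP' and memname = 'CLASS';
  if _n_ = 1 then call streaminit(1234);  /* seed for RAND in this step */
  sort_key = rand('uniform');
run;

/* Put the names in random order */
proc sort data=keyed;
  by sort_key;
run;

/* obs= keeps only the first 3 names and writes no warning to the log */
data picked;
  set keyed(obs=3 keep=name);
run;
```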
Step 4: Loop it 50 times
We'll use SAS Macro Language to do this part. We can create 50 individual datasets this way, or switch the code up slightly and read them into macro variables.
%macro selectVars(loop=50, seed=1234);
data _null_;
call streaminit(&seed);
run;
%do i = 1 %to &loop;
proc sql noprint outobs=20;
create table all_vars&i as
select name
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform')
;
quit;
%end;
%mend;
%selectVars;
Or, option 2:
%macro selectVars(loop=50, seed=1234);
data _null_;
call streaminit(&seed);
run;
%do i = 1 %to &loop;
proc sql noprint outobs=20;
select name
into :varlist separated by ' '
from sashelp.vcolumn
where libname = 'LIBRARYHERE' AND memname = 'HAVE'
order by rand('uniform')
;
quit;
%end;
%mend;
%selectVars;
The 2nd option will create a local macro variable called &varlist that will have the random 20 variables separated by spaces. This can be convenient for various modeling procs, and is preferable since it does not create a separate dataset each time.
Hope this helps!
You will need to treat your meta data as data and use SURVEYSELECT to select observations. Then perhaps put these names into macro variables but you did not mention the exact output you want.
data v;
array rvars[159];
run;
proc transpose data=v(obs=0) out=vars name=name;
var rvars:;
run;
proc surveyselect reps=4 sampsize=20 data=vars out=selection;
run;
proc transpose data=selection out=lists(drop=_:);
by replicate;
var name;
run;
proc print;
run;
data _null_;
set lists;
by replicate;
call symputx(cats('VLIST',_n_),catx(' ',of col:));
run;
%put _global_;

SAS sql select variable as change name to a date in MonYY7. format

I am not sure if it is possible at all, but in case someone knows the answer. I need to select variables and rename them to dates in MonYY7. format. My understanding is that SAS stores dates as numbers, and it is the formats which represent them in the former way. However, would it be possible to somehow rename the variable's name itself according to the format?
Here is the code I have written:
%macro try;
%let month_count_back = 12;
%let today = %sysfunc(today());
%let sysmonth = %sysfunc(month("&sysdate"d));
proc sql;
create table try as
select *,
%do i = -&sysmonth. %to -&month_count_back.-&sysmonth.+1 %by -1;
max(month(FP_NDT) = month(intnx('month',&today.,&i.))) as mn%eval(&month_count_back.+&sysmonth.+&i.)
%if &i. = -&month_count_back.-&sysmonth.+1 %then %goto leave_month;
,
%leave_month:
%end;
from work.test
group by var;
quit;
run;
%mend try;
%try;
run;
It returns dummy indicators for each month value of the 'var' variable for the previous year (the intention here is to know which values are null and which are not). However, I would like each dummy variable created be named according to the month and the year it refers to. For example, m12 should be DEC2015, m11 - NOV2015 etc... As a corollary if month_count_back is equal to, say, 36 then m36 should be DEC2015, but M12 should be DEC2013 and M1 should be JAN2013 etc...
Maybe there is way to rename it later in a data step? I have tried to loop through it, but could not control for the changing month_count_back value...
Would appreciate any suggestions, thanks!
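One possible building block, not a full answer: %SYSFUNC accepts a format as its second argument, so a month offset can be turned directly into text such as DEC2015, which is a valid variable name and could replace the mn%eval(...) alias. The macro name month_label, its parameters and the macro variable name_prev are made up for illustration, and the anchor date is fixed here instead of using today().

```sas
/* Returns the MONYY7. text for the month that is &offset months from &anchor */
%macro month_label(anchor, offset);
  %sysfunc(intnx(month, &anchor, &offset), monyy7.)
%mend month_label;

/* One month before January 2016 */
%let name_prev = %month_label("15JAN2016"d, -1);
%put &=name_prev;
```

Here name_prev resolves to DEC2015. Inside the %do loop of the original macro, the same call with &i. as the offset could be placed after the "as" keyword.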

SAS - renaming variables

I am trying to change the names of variables in my table/dataset. I went through several websites and this discussion forum, but I didn't manage to find any code that would work properly in my case (I am a newcomer to SAS).
My dataset contains 103 columns and I would like to rename the first 100 columns. The name of the first column is CFT(1), CFT(2) of the second column,..., CFT(100) of the 100th column. New variables can be called for example CFT_n(1),...,CFT_n(100).
The code I was using is following:
data vystup_m200_b;
set vystup_m200_a;
rename 'cft(1)'n - 'cft(100)'n='cft(1)_n'n - 'cft(100)_n'n;
run;
But I obtain an error stating:
Alphabetic prefixes for enumerated variables (cft(1)-cft(100)) are different.
Thank you for any suggestion what I am doing wrong.
Even with validvarname=any, the names in a numbered variable list have to have the number as the last part of the name. You "could" use the features of PROC TRANSPOSE to flip-flop the data to rename the variables. This is only advisable if the data are rather small.
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n ( 1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
proc transpose data=ren out=ren2;
run;
proc transpose data=ren2 out=renamed(drop=_name_) suffix=_N;
id _name_;
run;
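To illustrate the rule from this answer that the number has to come last: when the names do end in the number, the numbered-list form of RENAME works as the question intended. This is a small sketch with made-up names (demo, demo2, cft1-cft3, cft_n1-cft_n3), not the asker's actual columns.

```sas
/* The array statement creates cft1, cft2, cft3 with initial values */
data demo;
  array cft[3] (1 2 3);
run;

/* Both lists end in the number, so the enumerated rename is accepted */
data demo2;
  set demo;
  rename cft1-cft3 = cft_n1-cft_n3;
run;
```

Names such as 'cft(1)'n end in a parenthesis rather than a digit, which is why the list form fails for them.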
If your variables are sequentially named, a simple macro will suffice:
option validvarname = any;
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n ( 1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
%macro rename_loop;
%local i;
%do i = 1 %to 3;
"cft(&i)"n = "cft(&i)_n"n
%end;
%mend rename_loop;
proc datasets lib = work nolist nowarn nodetails;
modify ren;
rename %rename_loop;
run;
quit;
This should work more or less instantaneously, regardless of the size of the dataset, as it only needs to update the metadata.
Renaming is fastest. I would look to a more general solution that doesn't require knowing anything like the name or how many or if you need name literals.
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n (1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
proc print;
run;
proc transpose data=ren(obs=0) out=ren2;
run;
proc sql noprint;
select catx('=',nliteral(_name_),nliteral(cats(_name_,'_n')))
into :renamelist separated by ' '
from ren2;
quit;
run;
%put NOTE: &=renamelist;
proc datasets nolist;
modify ren;
rename &renamelist;
run;
contents data=ren varnum short;
quit;
Another solution is to rename the variables after import:
proc import datafile="\\folder\RUN_00.xlsx"
dbms=xlsx out=run_00 replace;
run;
data rename;
length ren $32767;
set run_00(obs= 1);
keep ren delka;
array cfte{*} CFT:;
do i=1 to dim(cfte);
ren=strip(ren)||" 'cft("||strip(i)||")'n='cft_"||strip(i)||"_00'n";
delka=length(ren);
end;
call symputx("renam",ren);
run;
proc datasets library=work;
modify run_00;
rename &renam;
run;