Format the summarised variables from proc summary - formatting

I'm using a Proc Summary, as I want to utilise a multilabel format. I've been going round and round trying to apply a format to my summarised outputs, but can't see how to get this without incurring warnings.
Proc Summary Data = Source CompleteTypes Missing NoPrint NWay;
Class Brand / MLF;
Var Id Total;
Output Out = Results
N(ID) = Volume
Sum(Total) = Grand_Total;
Run;
I want to format my Volume as Comma23. and the Grand_Total as Comma23.2. If I put a format statement after the outputs it warns me that the variables don't exist, but the dataset does have the format applied.
I would have thought that formatting a summarised variable would be a common action, but I can't find a way to apply it without getting the warnings. Is there something I'm missing?
Many thanks

Another approach is to use proc template to apply the format. The format is carried over into the newly created data set through ods output. Run ods trace on in front of the proc step and check the log to find (1) the name of the template to alter and (2) the name of the object to output into a data set. In your case, you want to alter the Base.Summary template and output the Summary object. The same technique works with other procedures: for instance, a one-way table from proc freq uses the template Base.Freq.OneWayList.
/* Create Test Data */
data test (drop = num);
do num = 1 to 100;
x = ceil(rand('NORMAL', 100, 10));
output;
end;
run;
/* Check log with ODS Trace On to find template to alter and object to output */
ods trace on;
proc summary data = test sum n mean print;
var x;
run;
ods trace off;
/* Alter the Base.Summary template */
ods path reset;
ods path (PREPEND) WORK.TEMPLATE(UPDATE);
proc template;
edit Base.Summary;
edit N;
label = 'Count';
header = varlabel;
format = Comma10.;
end;
edit Mean;
label = 'Average';
header = varlabel;
format = Comma10.;
end;
edit Sum;
label = "Sum";
header = varlabel;
format = Comma10.;
end;
end;
run;
/* Output Results (formatted) from the Proc */
ods output summary = results;
proc summary data = test sum n mean print stackodsoutput;
var x;
run;

Some statistics, such as SUM, inherit the format of the analysis variable. The N statistic does not inherit it, but you can still format the new variable with the colon (:) trick shown in the example, and no warning is produced.
proc summary data=sashelp.class;
class sex;
output out=test n(age)=Nage sum(weight)=sum_weight;
format nage: comma12. weight comma12.3;
run;
proc contents varnum;
run;
proc print;
run;

Use proc datasets to apply the format to your output dataset after proc summary has created it:
proc datasets lib = work;
modify results;
format Volume comma23. Grand_total comma23.2;
run;
quit;
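For reference, here is a minimal end-to-end sketch of this approach together with a multilabel format. It assumes sashelp.class as stand-in data, and the agegrp format with its ranges is made up purely for illustration; only the final proc datasets step corresponds to the answer above.

```sas
/* Hypothetical multilabel format: overlapping ranges are allowed */
proc format;
  value agegrp (multilabel)
    11-13 = '11 to 13'
    14-16 = '14 to 16'
    11-16 = 'All ages';
run;

/* sashelp.class stands in for the Source data set */
proc summary data=sashelp.class completetypes missing nway;
  class age / mlf;
  format age agegrp.;   /* age exists in the input, so no warning */
  var height weight;
  output out=results
    n(height)   = Volume
    sum(weight) = Grand_Total;
run;

/* Attach the formats afterwards; only the metadata is updated */
proc datasets lib=work nolist;
  modify results;
  format Volume comma23. Grand_Total comma23.2;
run;
quit;

/* Confirm that the formats are stored on the output variables */
proc contents data=results varnum;
run;
```

Because the format statement runs against a data set where Volume and Grand_Total already exist, the log should stay free of the "variable not found" warnings.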

Related

How do I "conditionally" count missing values per variable in PROC SQL?

I'm measuring expenses in different categories, and I have two types of variables: categorical variables, which state whether the respondent has had expenses in the category (such as "Exkl_UtgUtl_Flyg"), and numerical variables (such as UtgUtl_FlygSSEK_Pers), which give the amount spent by each respondent in that category.
I want to create a table which tells me if there are missing values in my numerical variables for categories where expenses have been reported (so missing values of "UtgUtl_FlygSSEK_Pers" where the variable "Exkl_UtgUtl_Flyg" equals 1, in an example with only one variable).
This works in a simple SQL query, so something like:
PROC SQL;
SELECT nmiss(UtgUtl_FlygSSEK_Pers)
FROM IBIS3_5
WHERE Exkl_UtgUtl_Flyg=1;
quit;
But I don't want to navigate between 20 different datasets to find my missing values; I want them all in the same table. I figure this should be possible if I write a subquery in the SELECT clause for each variable, so something like:
PROC SQL;
SELECT (SELECT nmiss(UtgUtl_FlygSSEK_Pers)
FROM IBIS3_5
WHERE Exkl_UtgUtl_Flyg=1) as nmiss_variable_1
FROM IBIS3_5;
quit;
This last query does not seem to work, however. It does not return a single value, but one value for each row in the dataset.
How do I make this work?
I suspect you want to generate a single value.
Either the total number of mis-matches.
select sum(missing(UtgUtl_FlygSSEK_Pers) and Exkl_UtgUtl_Flyg=1) as nmiss
from ibis3_5
;
Or perhaps just a binary 1/0 flag of whether or not there are any mismatches.
select max(missing(UtgUtl_FlygSSEK_Pers) and Exkl_UtgUtl_Flyg=1) as any_miss
from ibis3_5
;
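To get all the counts in the same table, the same expression can be repeated once per variable pair in a single SELECT. The sketch below uses a small made-up data set; the names expenses, flag_air, amt_air, flag_rail and amt_rail are hypothetical stand-ins for the real flag and amount variables.

```sas
/* Hypothetical demo data: one flag/amount pair per expense category */
data expenses;
  input flag_air amt_air flag_rail amt_rail;
  datalines;
1 100 1 .
1 . 0 .
0 . 1 50
1 . 1 70
;
run;

/* A boolean expression evaluates to 1 or 0, so SUM counts the rows */
/* where the flag is set but the amount is missing                  */
proc sql;
  select sum(missing(amt_air)  and flag_air  = 1) as nmiss_air,
         sum(missing(amt_rail) and flag_rail = 1) as nmiss_rail
  from expenses;
quit;
/* With the data above: nmiss_air = 2, nmiss_rail = 1 */
```

The query returns one row with one column per category, so no subqueries are needed.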
This may be a good use of proc freq instead, especially if you have several variables to check.
Not all of the code below is necessary, but it produces a complete missing-value report. It depends on exactly how you define missing, of course.
*create sample data to work with;
data class;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
*set input data set name;
%let INPUT_DSN = class;
%let OUTPUT_DSN = want;
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
proc sort data=long;
by variable;
run;
*make it a wide data set for presentation, with values as N (Percent);
proc transpose data=long out=wide_presentation (drop=_name_);
by variable;
id variable_value;
var presentation;
run;
*transpose only N;
proc transpose data=long out=wide_N prefix=N_;
by variable;
id variable_value;
var frequency;
run;
*transpose only percents;
proc transpose data=long out=wide_PCT prefix=PCT_;
by variable;
id variable_value;
var percent;
run;
*final output file;
data &Output_DSN.;
merge wide_N wide_PCT wide_presentation;
by variable;
drop _name_;
label N_Missing='# Missing' N_Not_Missing='# Not Missing'
PCT_Missing='% Missing' PCT_Not_Missing='% Not Missing' Missing='Missing'
Not_missing='Not Missing';
run;
title "Missing Report of &INPUT_DSN.";
proc print data=&output_dsn. noobs label;
run;

Building an SQL query in Base SAS for 81 variables to report count of non null values

I have a data template with 81 exposure elements (variables) and approximately 9 million rows of loans generated by a bank, e.g. customer number, reporting date, account number, customer type, etc.
I need to conduct data validation to report:
whether each variable is missing or not
the number of non-missing values populated for each available variable
the data type of the values populated under each variable
For each variable individually, I'm using the query
proc sql;
select COUNT(variable) from library.table where not missing(variable);
quit;
How can I extend the above query to all 81 variables ?
I already have the attributes using
proc sql;
create table test as select * from dictionary.columns where libname="XXX" and memname="tablename";
quit;
But if the above could be incorporated into one holistic query that generates an output I can export to Excel, that would be great.
Thanks
In SAS there are usually PROCs for this kind of task. Here it would be PROC FREQ, e.g. this example.
If you want an output dataset you can adapt the linked solution and do
proc format;
value $missfmt ' ' = 'Missing' other = 'Not Missing';
value missfmt . = 'Missing' other = 'Not Missing';
run;
/* Capture the output. */
ods output OneWayFreqs = want;
/* Count missing and not missing. */
proc freq data=have;
format _char_ $missfmt. _numeric_ missfmt.;
tables _all_ / missing nocum nopercent;
run;
data want;
set want;
array f {*} f_:;
/* Extract column name. */
do i = 1 to dim(f);
if not missing(f[i]) then
column = substr(vname(f[i]), 3);
end;
/* Extract column type. */
type = vtypex(column);
/* Get value, i. e. missing or not missing. */
value = cats(of f_:);
run;
proc sort data=want;
by column type value;
run;
/* Transpose the missing and not missing rows into two columns. */
proc transpose data=want out=want(drop=_:);
by column type;
id value;
var frequency;
run;
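If a single SQL query is preferred, the attributes in dictionary.columns can be used to generate it. This is only a sketch under assumptions: sashelp.class stands in for the real table, and the n_ prefix and the data set names nonmissing and report are made up for illustration. Prefixed names longer than 32 characters would need a different naming scheme.

```sas
proc sql noprint;
  /* Build "count(var) as n_var" for every column of the table. */
  /* COUNT(var) counts only the non-missing values of var.      */
  select catx(' ', 'count(', nliteral(name), ') as',
              nliteral(cats('n_', name)))
    into :countlist separated by ', '
  from dictionary.columns
  where libname = 'SASHELP' and memname = 'CLASS';

  /* One pass over the data, one output column per variable */
  create table nonmissing as
  select &countlist
  from sashelp.class;
quit;

/* Flip to one row per variable for easier reading or export */
proc transpose data=nonmissing
               out=report(rename=(col1=n_nonmissing));
run;

proc print data=report;
run;
```

The libname and memname values in dictionary.columns are stored in uppercase, which is why the WHERE clause uses uppercase literals.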

Proc Sql Output

Hello, I am new to SAS. I wrote this SQL code, and now I need to redirect the output to /tmp/output.txt.
proc sql;
select (COUNT(IDCUENTACLIENTE)) AS COUNT_of_IDCUENTACLIENTE from S1.CUENTACLIENTE where segmentonivel1 = 'Altas Recientes'
and segmentonivel2 = 'Masivo'
GROUP BY SEGMENTONIVEL1,SEGMENTONIVEL2;
quit;
I tried adding
data _null_;
FILE "/tmp/MyFile.txt";
run;
but it does not create the file.
Can someone help me?
I have a suggestion.
First, create a data set from the query. I have doubts about the GROUP BY in your code: does it run without errors?
Second, export the data set to a text file as shown below:
proc sql;
create table work.temp as
select SEGMENTONIVEL1,SEGMENTONIVEL2, (COUNT(IDCUENTACLIENTE)) AS
COUNT_of_IDCUENTACLIENTE from S1.CUENTACLIENTE
where segmentonivel1 = 'Altas Recientes'
and segmentonivel2 = 'Masivo'
GROUP BY SEGMENTONIVEL1,SEGMENTONIVEL2;
quit;
/* code to create TXT file */
data _null_;
FILE "/tmp/MyFile.txt";
set work.temp;
put
SEGMENTONIVEL1
SEGMENTONIVEL2
COUNT_of_IDCUENTACLIENTE;
run;
If you prefer to define the file with a filename statement instead of writing the path in the data step:
proc sql;
create table tableName as
select (COUNT(IDCUENTACLIENTE)) AS COUNT_of_IDCUENTACLIENTE from
S1.CUENTACLIENTE where segmentonivel1 = 'Altas Recientes'
and segmentonivel2 = 'Masivo'
GROUP BY SEGMENTONIVEL1,SEGMENTONIVEL2;
quit;
filename x "c:\temp\teszt.txt";
data _null_;
file x;
set work.tableName;
put COUNT_of_IDCUENTACLIENTE;
run;
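As a further option, once the query result is stored in a data set, proc export can write it out as a delimited text file. The sketch below assumes sashelp.class as stand-in data and writes into the WORK directory so that it runs anywhere; the table name counts and the output path are hypothetical and would be replaced by the real query and /tmp/output.txt.

```sas
/* Stand-in query: store the grouped counts in a data set */
proc sql;
  create table work.counts as
  select sex, count(name) as n_students
  from sashelp.class
  group by sex;
quit;

/* Write the data set as a tab-delimited text file with a header row */
proc export data=work.counts
  outfile="%sysfunc(pathname(work))/output.txt"
  dbms=tab
  replace;
run;
```

Unlike the put statement approach, this writes the column names as the first line of the file.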

SAS - renaming variables

I am trying to change the names of variables in my table/dataset. I went through several websites and this discussion forum, but I didn't manage to find any code that would work properly in my case (I am a newcomer to SAS).
My dataset contains 103 columns and I would like to rename the first 100 columns. The name of the first column is CFT(1), CFT(2) of the second column,..., CFT(100) of the 100th column. New variables can be called for example CFT_n(1),...,CFT_n(100).
The code I was using is following:
data vystup_m200_b;
set vystup_m200_a;
rename 'cft(1)'n - 'cft(100)'n='cft(1)_n'n - 'cft(100)_n'n;
run;
But I obtain an error stating:
Alphabetic prefixes for enumerated variables (cft(1)-cft(100)) are different.
Thank you for any suggestion what I am doing wrong.
Even with validvarname=any, every name in a numbered variable list has to end with the number. You could use the features of PROC TRANSPOSE to flip-flop the data and rename the variables that way. This is only advisable if the data are rather small.
options validvarname=any;
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n ( 1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
proc transpose data=ren out=ren2;
run;
proc transpose data=ren2 out=renamed(drop=_name_) suffix=_N;
id _name_;
run;
If your variables are sequentially named, a simple macro will suffice:
option validvarname = any;
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n ( 1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
%macro rename_loop;
%local i;
%do i = 1 %to 3;
"cft(&i)"n = "cft(&i)_n"n
%end;
%mend rename_loop;
proc datasets lib = work nolist nowarn nodetails;
modify ren;
rename %rename_loop;
run;
quit;
This should work more or less instantaneously, regardless of the size of the dataset, as it only needs to update the metadata.
Renaming is fastest. I would look for a more general solution that doesn't require knowing the variable names, how many there are, or whether name literals are needed.
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n (1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
proc print;
run;
proc transpose data=ren(obs=0) out=ren2;
run;
proc sql noprint;
select catx('=',nliteral(_name_),nliteral(cats(_name_,'_n')))
into :renamelist separated by ' '
from ren2;
quit;
%put NOTE: &=renamelist;
proc datasets nolist;
modify ren;
rename &renamelist;
run;
contents data=ren varnum short;
quit;
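When only some of the columns should be renamed, as in the question where 100 of 103 columns start with cft(, the rename list can be restricted with a WHERE clause on dictionary.columns. This is a sketch on made-up data; the extra variable other is hypothetical and only shows that unrelated columns are left alone.

```sas
options validvarname=any;

/* Demo data: three cft(n) columns plus one unrelated column */
data ren;
  array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n (1 2 3);
  other = 0;
run;

/* Build old=new pairs only for names that start with CFT( */
proc sql noprint;
  select catx('=', nliteral(name), nliteral(cats(name, '_n')))
    into :renamelist separated by ' '
  from dictionary.columns
  where libname = 'WORK' and memname = 'REN'
    and upcase(name) like 'CFT(%';
quit;

%put NOTE: &=renamelist;

/* Rename in place; only the data set header is rewritten */
proc datasets lib=work nolist;
  modify ren;
  rename &renamelist;
run;
quit;
```

The pattern is written in single quotes so that the percent sign is treated as an SQL wildcard and not as a macro trigger.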
Another solution is to rename the variables after importing the data:
proc import datafile="\\folder\RUN_00.xlsx"
dbms=xlsx out=run_00 replace;
run;
data rename;
length ren $32767;
set run_00(obs= 1);
keep ren delka;
array cfte{*} CFT:;
do i=1 to dim(cfte);
ren=strip(ren)||" 'cft("||strip(i)||")'n='cft_"||strip(i)||"_00'n";
delka=length(ren);
end;
call symputx("renam",ren);
run;
proc datasets library=work;
modify run_00;
rename &renam;
run;
quit;

Select character variables that have all missing values

I have a SAS dataset with around 3,000 variables, and I would like to get rid of the character variables for which all values are missing. I know how to do this for numeric variables-- I'm wondering specifically about the character variables. I need to do the work using base SAS, but that could include proc SQL, which is why I've tagged this one 'SQL' also.
Thank you!
Edit:
Background info: This is a tall dataset, with survey data from 7 waves of interviews. Some, but not all, of the survey items (variables) were repeated across waves. I'm trying to create a list of items that were actually used in each wave by pulling all the records for that wave, getting rid of all the columns that have nothing but SAS's default missing values, and then running proc contents.
I created a macro that will check for empty character columns and either remove them from the original or create a new data set with the empty columns removed. It takes two optional arguments: The name of the data set (default is the most recently created data set), and a suffix to name the new copy (set suffix to nothing to edit the original).
It uses proc freq with the levels option and a custom format to determine the empty character columns. proc sql is then used to create a list of the columns to be removed and store them in a macro variable.
Here is the macro:
%macro delemptycol(ds=_last_, suffix=_noempty);
option nonotes;
proc format;
value $charmiss
' '= ' '
other='1';
run;
%if "&ds"="_last_" %then %let ds=&syslast.;
ods select nlevels;
ods output nlevels=nlev;
proc freq data=&ds.(keep=_character_) levels ;
format _character_ $charmiss.;
run;
ods output close;
/* create macro var with list of cols to remove */
%local emptycols;
proc sql noprint;
select tablevar into: emptycols separated by ' '
from nlev
where NNonMissLevels=0;
quit;
%if &emptycols.= %then %do;
%put DELEMPTYCOL: No empty character columns were found in data set &ds.;
%end;
%else %do;
%put DELEMPTYCOL: The following empty character columns were found in data set &ds. : &emptycols.;
%put DELEMPTYCOL: Data set &ds.&suffix created with empty columns removed;
data &ds.&suffix. ;
set &ds(drop=&emptycols);
run;
%end;
options notes;
%mend;
Example usage:
/* create some fake data: Here char5 will be empty */
data chardata(drop= j randnum);
length char1-char5 $8.;
array chars(5) char1-char5;
do i=1 to 100;
call missing(of char:);
randnum=floor(10*ranuni(i));
do j=2 to 5;
if (j-1)<randnum<=(j+1) then chars(j-1)="FOO";
end;
output;
end;
run;
%delemptycol(); /* uses default _last_ for the data and "_noempty" as the suffix */
%delemptycol(ds=chardata, suffix=); /* removes the empty columns from the original */
There's probably a simpler way but this is what I came up with.
Cheers
Rob
EDIT: Note that this works for both character and numeric variables.
**
** TEST DATASET
*;
data x;
col1 = "a"; col2 = ""; col3 = "c"; output;
col1 = "" ; col2 = ""; col3 = "c"; output;
col1 = "a"; col2 = ""; col3 = "" ; output;
run;
**
** GET A LIST OF VARIABLE NAMES
*;
proc sql noprint;
select name into :varlist separated by " "
from sashelp.vcolumn
where upcase(libname) eq "WORK"
and upcase(memname) eq "X";
quit;
%put &varlist;
**
** USE A MACRO TO CREATE A DATASTEP. FOR EACH COLUMN THE
** THE DATASTEP WILL CREATE A NEW COLUMN WITH THE SAME NAME
** BUT PREFIXED WITH "DELETE_". IF THERE IS AT LEAST 1
** NON-MISSING VALUE FOR THE COLUMN THEN THE "DELETE" COLUMN
** WILL FINISH WITH A VALUE OF 0, ELSE 1. WE WILL ONLY
** KEEP THE COLUMNS CALLED "DELETE_" AND OUTPUT ONLY A SINGLE
** OBSERVATION TO THE FINAL DATASET.
*;
%macro find_unused_cols(iDs=);
%local cnt;
data vars_to_delete;
set &iDs end=eof;
%let cnt = 1;
%let varname = %scan(&varlist, &cnt);
%do %while ("&varname" ne "");
retain delete_&varname;
delete_&varname = min(delete_&varname, missing(&varname));
drop &varname;
%let cnt = %eval(&cnt + 1);
%let varname = %scan(&varlist, &cnt);
%end;
if eof then do;
output;
end;
run;
%mend;
%find_unused_cols(iDs=x);
**
** GET A LIST OF VARIABLE NAMES FROM THE NEW DATASET
** THAT WE WANT TO DELETE AND STORE TO A MACRO VAR.
*;
proc transpose data=vars_to_delete out=vars_to_delete;
run;
proc sql noprint;
select substr(_name_,8) into :vars_to_delete separated by " "
from vars_to_delete
where col1;
quit;
%put &vars_to_delete;
**
** CREATE A NEW DATASET CONTAINING JUST THOSE VARS
** THAT WE WANT TO KEEP
*;
data new_x;
set x;
drop &vars_to_delete;
run;
Rob and cmjohns, thank you SO MUCH for your help. Based on your solutions and an idea I had over the weekend, here is what I came up with:
%macro removeEmptyCols(origDset, outDset);
* get the number of obs in the original dset;
%let dsid = %sysfunc(open(&origDset));
%let origN = %sysfunc(attrn(&dsid, nlobs));
%let rc = %sysfunc(close(&dsid));
proc transpose data= &origDset out= transpDset;
var _all_;
run;
data transpDset;
set transpDset;
* proc transpose converted all old vars to character,
so the . from old numeric vars no longer means 'missing';
array oldVar_ _character_;
do over oldVar_;
if strip(oldVar_) = "." then oldVar_ = "";
end;
* each row from the old dset is now a column with varname starting with 'col';
numMiss = cmiss(of col:);
numCols = &origN;
run;
proc sql noprint;
select _NAME_ into: varsToKeep separated by ' '
from transpDset
where numMiss < numCols;
quit;
data &outDset;
set &origDset (keep = &varsToKeep);
run;
%mend removeEmptyCols;
I will try all 3 ways and report back on which one is fastest...
P.S. added 23 Dec 2010 for future reference: SGF Paper 048-2010: Dropping Automatically Variables with Only Missing Values
This is a very simple method that works for all variables, character and numeric:
proc freq data=class nlevels ;
ods output nlevels=levels(where=(nmisslevels>0 and nnonmisslevels=0));
run;
proc sql noprint;
select TABLEVAR into :_MISSINGVARS separated by ' ' from levels;
quit;
data want;
set class (drop=&_MISSINGVARS);
run;