Rename column headers - either after a key database in sas - or after values from first row - sql

I need to rename the column headers of my variables so they match what I have in my key list. I attached a picture below to describe what I have and what I need.
My Data
I don't necesarily need actual code, just an idea of how to make it happen. :)
Thank you so much folks, and so sorry about the changes, I have never posted a question before.

If you have a table like
NEW1 NEW2 NEW3
OLDX OLDY OLDZ
And you want to use it to generate rename statement like
rename oldx=new1 oldy=new2 oldz=new3 ;
Then an easy way to do it is to use PROC TRANSPOSE to convert it into a separate row for each name pair.
proc transpose data=have out=names ;
var _all_;
run;
Which will get you a table like
_NAME_ COL1
NEW1 OLDX
NEW2 OLDY
NEW3 OLDZ
Then you can either use PROC SQL to quickly generate a macro variable with the pairs.
proc sql noprint;
select catx('=',col1,_name_) into :rename separated by ' '
from names;
quit;
data new ;
set old;
rename &rename ;
run;
If the list of names is too long to put into a single macro variable then just use a data step to generate the rename statement to a text file and use %INCLUDE to run it where you want.
filename code temp;
data _null_;
set names end=eof;
file code ;
if _n_=1 then put 'rename' ;
put col1 '=' _name_ ;
if eof then put ';';
run;
data new ;
set old;
%include code ;
run;
EDIT
You could probably do the last step directly from the data set and skip the proc transpose.
filename code temp;
data _null_;
set have ;
array _X _character_ ;
file code ;
put 'rename ' # ;
do i=1 to dim(_X);
oldname = _x(i);
newname = vname(_x(i));
put oldname '=' newname #;
end;
put / ';' ;
stop;
run;

You can use column aliases to change what's displayed in the results header row.
SELECT A AS 'NewA',
B AS 'OtherB',
C AS 'diffC'
FROM <<Table>>
If you want 'NewA OtherB diffC' as a row in the results, you could do this:
SELECT 'NewA' AS 'A',
'OtherB' AS 'B',
'diffC' AS 'C'
UNION
SELECT A,
B,
C
FROM <<Table>>

Related

Delete prefix from multiple variables

I have a dataset "Dairy1" with variables labeled as '240, 241, 242 ..." but the actual name is '_240, _241, _242 ...'.
How can I delete the prefix "_" from the name of all these variables? I tried to use the following code in SAS 9.4 but it doesn't work.
**proc sql noprint;
select cats(name,'=',scan(name, 1, '_'))
into :suffixlist
separated by ' '
from dictionary.columns
where libname = 'WORK' and memname = 'Dairy1' and '240' = scan(name, 2, '_');
quit;
%put &suffixlist.;
data want;
set Dairy1;
rename &suffixlist.;
run;**
It shows me the following:
WARNING: Apparent symbolic reference SUFFIXLIST not resolved.
ERROR 22-322: Syntax error, expecting one of the following: un nombre, ;.
Thanks in advance
I did similar solution the other way around, which added suffix to variables.
What this does, is to create single string, which contains all the variable rename-commands.
Note that this is a bit stupid as it will remove all characters. Just modify the logic to suit your needs in data sjm_tmp2.
%macro remove_str(Dataset, str);
proc contents noprint
data=work.&dataset out=sjm_tmp(keep=NAME);
run;
data sjm_tmp2;
set sjm_tmp;
help= tranwrd(NAME, "&str.", '');
foobar=cats(name, '=',help);
run;
proc sql noprint;
select foobar into :sjm_list separated by ' ' from sjm_tmp2;
quit;
proc datasets library = work nolist;
modify &dataset;
rename &sjm_list;
quit;
proc datasets library=work noprint;
delete sjm_tmp sjm_tmp2 ;
run;
%mend;
Tested with code:
data test;
_a=1;
_b=1;
_c=1;
_d=1;
_e=1;
_f=1;
run;
%remove_str(test, _);

Get Data from SAS Macro (list of values) to SAS table (column)

I am trying to create a SAS table from Macro variable using PROC SQL:
I have a list of value saved in a macro variable :
%let l=1,2,3;
I want to create a SAS table with a column containing the values of the macro variable :
1
2
3
Thank you very much for your help.
Sincerely,
Abdeljalil
you should so some effort to solving this yourself.
Put the values into a string, parse the string and output the values you would like.
%let l=1,2,3;
data want;
str = "&l";
do i=1 to countw(str,',');
value = input(scan(str,i,","),best.);
output;
end;
/*drop other variables if you want*/
drop str i;
run;
Something like this?
%let age=%str(12,13,15);
proc sql;
select * from sashelp.class where age in (&age);
quit;
You have a data set that contains a list of names and you want to place these names into a macro variable for later use. That will work as long as the macro variable does not go beyond the 64K limit.
If the value hits this limit, then you can use macro processing to retrieve the names from the data set. Since a macro definition does not have the 64K restriction, it can be used to create the list for you.
In the sample code on the Full Code tab, we have a list of names that we want to use on an INPUT statement along with a given informat. This sample demonstrates how to create the list without having to use a macro variable.
data one;
input name $;
datalines;
abc
def
ghi
;
run;
%macro test;
%let dsid=%sysfunc(open(one));
%let cnt=%sysfunc(attrn(&dsid,nobs));
%do i=1 %to &cnt;
%let rc=%sysfunc(fetchobs(&dsid,&i));
%cmpres(%sysfunc(getvarc(&dsid,%sysfunc(varnum(&dsid,name))))) $4.
%end;
%let rc=%sysfunc(close(&dsid));
%mend test;
/** Using %PUT to see outcome **/
/** %test could be used on an INPUT statement **/
%put %test;
source: http://support.sas.com/kb/39/605.html

SAS - renaming variables

I am trying to change the names of variables in my table/dataset. I went through several websites and this discussion forum, but I didnĀ“t manage to find any code that would work properly in my case (i am a newcomer to SAS).
My dataset contains 103 columns and I would like to rename the first 100 columns. The name of the first column is CFT(1), CFT(2) of the second column,..., CFT(100) of the 100th column. New variables can be called for example CFT_n(1),...,CFT_n(100).
The code I was using is following:
data vystup_m200_b;
set vystup_m200_a;
rename 'cft(1)'n - 'cft(100)'n='cft(1)_n'n - 'cft(100)_n'n;
run;
But I obtain an error stating:
Aplhabetic prefixes for enumerated variables (cft(1)-cft(100)) are different.
Thank you for any suggestion what I am doing wrong.
Even with validvarname=any the numeric suffix on a numbered variable list have to have the number as the last part of the name. You "could" use the features of PROC TRANSPOSE to flip-flop the data to rename the variables. This is only advisable if the data are rather small.
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n ( 1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
proc transpose data=ren out=ren2;
run;
proc transpose data=ren2 out=renamed(drop=_name_) suffix=_N;
id _name_;
run;
If your variables are sequentially named, a simple macro will suffice:
option validvarname = any;
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n ( 1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
%macro rename_loop;
%local i;
%do i = 1 %to 3;
"cft(&i)"n = "cft(&i)_n"n
%end;
%mend rename_loop;
proc datasets lib = work nolist nowarn nodetails;
modify ren;
rename %rename_loop;
run;
quit;
This should work more or less instantaneously, regardless of the size of the dataset, as it only needs to update the metadata.
Renaming is fastest. I would look to a more general solution that doesn't require knowing anything like the name or how many or if you need name literals.
data ren;
array _a[*] 'cft(1)'n 'cft(2)'n 'cft(3)'n (1 2 3);
do i = 1 to 10;
output;
end;
drop i;
run;
proc print;
run;
proc transpose data=ren(obs=0) out=ren2;
run;
proc sql noprint;
select catx('=',nliteral(_name_),nliteral(cats(_name_,'_n')))
into :renamelist separated by ' '
from ren2;
quit;
run;
%put NOTE: &=renamelist;
proc datasets nolist;
modify ren;
rename &renamelist;
run;
contents data=ren varnum short;
quit;
Another solution, which is renaming variables after upload:
proc import datafile="\\folder\RUN_00.xlsx"
dbms=xlsx out=run_00 replace;
run;
data rename;
length ren $32767;
set run_00(obs= 1);
keep ren delka;
array cfte{*} CFT:;
do i=1 to dim(cfte);
ren=strip(ren)||" 'cft("||strip(i)||")'n='cft_"||strip(i)||"_00'n";
delka=length(ren);
end;
call symputx("renam",ren);
run;
proc datasets library=work;
modify run_00;
rename &renam;
run;

macro into a table or a macro variable with sas

I'm having this macro. The aim is to take the name of variables from the table dicofr and put the rows inside into variable name using a symput.
However , something is not working correctly because that variable, &nvarname, is not seen as a variable.
This is the content of dico&&pays&l
varname descr
var12 aza
var55 ghj
var74 mcy
This is the content of dico&&pays&l..1
varname
var12
var55
var74
Below is my code
%macro testmac;
%let pays1=FR ;
%do l=1 %to 1 ;
data dico&&pays&l..1 ; set dico&&pays&l (keep=varname);
call symput("nvarname",trim(left(_n_))) ;
run ;
data a&&pays&l;
set a&&pays&l;
nouv_date=mdy(substr(date,6,2),01,substr(date,1,4));
format nouv_date monyy5.;
run;
proc sql;
create table toto
(nouv_date date , nomvar varchar (12));
quit;
proc sql;
insert into toto SELECT max(nouv_date),"&nvarname" as nouv_date as varname FROM a&&pays&l WHERE (&nvarname ne .);
%end;
%mend;
%testmac;
A subsidiary question. Is it possible to have the varname and the date related to that varname into a macro variable? My man-a told me about this but I have never done that before.
Thanks in advance.
Edited:
I have this table
date col1 col2 col3 ... colx
1999M12 . . . .
1999M11 . 2 . .
1999M10 1 3 . 3
1999M9 0.2 3 2 1
I'm trying to do know the name of the column with the maximum date , knowing the value inside of the column is different than a missing value.
For col1, it would be 1999M10. For col2, it would be 1999M11 etc ...
Based on your update, I think the following code does what you want. If you don't mind sorting your input dataset first, you can get all the values you're looking for with a single data step - no macros required!
data have;
length date $7;
input date col1 col2 col3;
format date2 monyy5.;
date2 = mdy(substr(date,6,2),1,substr(date,1,4));
datalines;
1999M12 . . .
1999M11 . 2 .
1999M10 1 3 .
1999M09 0.2 3 2
;
run;
/*Required for the following data step to work*/
/*Doing it this way allows us to potentially skip reading most of the input data set*/
proc sort data = have;
by descending date2;
run;
data want(keep = max_date:);
array max_dates{*} max_date1-max_date3;
array cols{*} col1-col3;
format max_date: monyy5.;
do until(eof); /*Begin DOW loop*/
set have end = eof;
/*Check to see if we've found the max date for each col yet.*/
/*Save the date for that col if applicable*/
j = 0;
do i = 1 to dim(cols);
if missing(max_dates[i]) and not(missing(cols[i])) then max_dates[i] = date2;
j + missing(max_dates[i]);
end;
/*Use j to count how many cols we still need dates for.*/
/* If we've got a full set, we can skip reading the rest of the data set*/
if j = 0 then do;
output;
stop;
end;
end; /*End DOW loop*/
run;
EDIT: if you want to output the names alongside the max date for each, that can be done with a slight modification:
data want(keep = col_name max_date);
array max_dates{*} max_date1-max_date3;
array cols{*} col1-col3;
format max_date monyy5.;
do until(eof); /*Begin DOW loop*/
set have end = eof;
/*Check to see if we've found the max date for each col yet.*/
/*If not then save date from current row for that col*/
j = 0;
do i = 1 to dim(cols);
if missing(max_dates[i]) and not(missing(cols[i])) then max_dates[i] = date2;
j + missing(max_dates[i]);
end;
/*Use j to count how many cols we still need dates for.*/
/* If we've got a full set, we can skip reading the rest of the data set*/
if j = 0 or eof then do;
do i = 1 to dim(cols);
col_name = vname(cols[i]);
max_date = max_dates[i];
output;
end;
stop;
end;
end; /*End DOW loop*/
run;
It looks to me that you're trying to use macros to generate INSERT INTO statements to populate your table. It's possible to do this without using macros at all which is the approach I'd recommend.
You could use a datastep statement to write out the INSERT INTO statements to a file. Then following the datastep, use a %include statement to run the file.
This will be easier to write/maintain/debug and will also perform better.

Select character variables that have all missing values

I have a SAS dataset with around 3,000 variables, and I would like to get rid of the character variables for which all values are missing. I know how to do this for numeric variables-- I'm wondering specifically about the character variables. I need to do the work using base SAS, but that could include proc SQL, which is why I've tagged this one 'SQL' also.
Thank you!
Edit:
Background info: This is a tall dataset, with survey data from 7 waves of interviews. Some, but not all, of the survey items (variables) were repeated across waves. I'm trying to create a list of items that were actually used in each wave by pulling all the records for that wave, getting rid of all the columns that have nothing but SAS's default missing values, and then running proc contents.
I created a macro that will check for empty character columns and either remove them from the original or create a new data set with the empty columns removed. It takes two optional arguments: The name of the data set (default is the most recently created data set), and a suffix to name the new copy (set suffix to nothing to edit the original).
It uses proc freq with the levels option and a custom format to determine the empty character columns. proc sql is then used to create a list of the columns to be removed and store them in a macro variable.
Here is the macro:
%macro delemptycol(ds=_last_, suffix=_noempty);
option nonotes;
proc format;
value $charmiss
' '= ' '
other='1';
run;
%if "&ds"="_last_" %then %let ds=&syslast.;
ods select nlevels;
ods output nlevels=nlev;
proc freq data=&ds.(keep=_character_) levels ;
format _character_ $charmiss.;
run;
ods output close;
/* create macro var with list of cols to remove */
%local emptycols;
proc sql noprint;
select tablevar into: emptycols separated by ' '
from nlev
where NNonMissLevels=0;
quit;
%if &emptycols.= %then %do;
%put DELEMPTYCOL: No empty character columns were found in data set &ds.;
%end;
%else %do;
%put DELEMPTYCOL: The following empty character columns were found in data set &ds. : &emptycols.;
%put DELEMPTYCOL: Data set &ds.&suffix created with empty columns removed;
data &ds.&suffix. ;
set &ds(drop=&emptycols);
run;
%end;
options notes;
%mend;
Examples usage:
/* create some fake data: Here char5 will be empty */
data chardata(drop= j randnum);
length char1-char5 $8.;
array chars(5) char1-char5;
do i=1 to 100;
call missing(of char:);
randnum=floor(10*ranuni(i));
do j=2 to 5;
if (j-1)<randnum<=(j+1) then chars(j-1)="FOO";
end;
output;
end;
run;
%delemptycol(); /* uses default _last_ for the data and "_noempty" as the suffix */
%delemptycol(ds=chardata, suffix=); /* removes the empty columns from the original */
There's probably a simpler way but this is what I came up with.
Cheers
Rob
EDIT: Note that this works for both character and numeric variables.
**
** TEST DATASET
*;
data x;
col1 = "a"; col2 = ""; col3 = "c"; output;
col1 = "" ; col2 = ""; col3 = "c"; output;
col1 = "a"; col2 = ""; col3 = "" ; output;
run;
**
** GET A LIST OF VARIABLE NAMES
*;
proc sql noprint;
select name into :varlist separated by " "
from sashelp.vcolumn
where upcase(libname) eq "WORK"
and upcase(memname) eq "X";
quit;
%put &varlist;
**
** USE A MACRO TO CREATE A DATASTEP. FOR EACH COLUMN THE
** THE DATASTEP WILL CREATE A NEW COLUMN WITH THE SAME NAME
** BUT PREFIXED WITH "DELETE_". IF THERE IS AT LEAST 1
** NON-MISSING VALUE FOR THE COLUMN THEN THE "DELETE" COLUMN
** WILL FINISH WITH A VALUE OF 0, ELSE 1. WE WILL ONLY
** KEEP THE COLUMNS CALLED "DELETE_" AND OUTPUT ONLY A SINGLE
** OBSERVATION TO THE FINAL DATASET.
*;
%macro find_unused_cols(iDs=);
%local cnt;
data vars_to_delete;
set &iDs end=eof;
%let cnt = 1;
%let varname = %scan(&varlist, &cnt);
%do %while ("&varname" ne "");
retain delete_&varname;
delete_&varname = min(delete_&varname, missing(&varname));
drop &varname;
%let cnt = %eval(&cnt + 1);
%let varname = %scan(&varlist, &cnt);
%end;
if eof then do;
output;
end;
run;
%mend;
%find_unused_cols(iDs=x);
**
** GET A LIST OF VARIABLE NAMES FROM THE NEW DATASET
** THAT WE WANT TO DELETE AND STORE TO A MACRO VAR.
*;
proc transpose data=vars_to_delete out=vars_to_delete;
run;
proc sql noprint;
select substr(_name_,8) into :vars_to_delete separated by " "
from vars_to_delete
where col1;
quit;
%put &vars_to_delete;
**
** CREATE A NEW DATASET CONTAINING JUST THOSE VARS
** THAT WE WANT TO KEEP
*;
data new_x;
set x;
drop &vars_to_delete;
run;
Rob and cmjohns, thank you SO MUCH for your help. Based on your solutions and an idea I had over the weekend, here is what I came up with:
%macro removeEmptyCols(origDset, outDset);
* get the number of obs in the original dset;
%let dsid = %sysfunc(open(&origDset));
%let origN = %sysfunc(attrn(&dsid, nlobs));
%let rc = %sysfunc(close(&dsid));
proc transpose data= &origDset out= transpDset;
var _all_;
run;
data transpDset;
set transpDset;
* proc transpose converted all old vars to character,
so the . from old numeric vars no longer means 'missing';
array oldVar_ _character_;
do over oldVar_;
if strip(oldVar_) = "." then oldVar_ = "";
end;
* each row from the old dset is now a column with varname starting with 'col';
numMiss = cmiss(of col:);
numCols = &origN;
run;
proc sql noprint;
select _NAME_ into: varsToKeep separated by ' '
from transpDset
where numMiss < numCols;
quit;
data &outDset;
set &origDset (keep = &varsToKeep);
run;
%mend removeEmptyCols;
I will try all 3 ways and report back on which one is fastest...
P.S. added 23 Dec 2010 for future reference: SGF Paper 048-2010: Dropping Automatically Variables with Only Missing Values
This is very simple method useful for all variables
proc freq data=class nlevels ;
ods output nlevels=levels(where=(nmisslevels>0 and nnonmisslevels=0));
run;
proc sql noprint;
select TABLEVAR into :_MISSINGVARS separated by ' ' from levels;
quit;
data want;
set class (keep=&_MISSINGVARS);
run;