Would the following code work?
I'm dropping duplicated columns from my tables, and I feel a bit confused after thinking about it. My code seems to work, but I'm concerned about unseen mistakes.
proc sql;
create table toto
as select min(nomvar) as nomvar,count(intitule) as compte
from dicoat
group by intitule
having count(intitule) > 1;
data work.toto;
set toto;
do while(cpte>=1);
proc sql;
delete from dicoat where nomvar in (select nomvar from toto);
insert into toto
select min(nomvar) as nomvar,count(intitule) as compte from dicoat
group by intitule
having count(intitule) > 1;
end;
run;
data _null_;
file tempf;
set toto end=lastobs;
if _n_=1 then put "data aat;set aat (drop=";
put var /;
if lastobs then put ");run;";
run;
%inc tempf;
filename tempf clear;
After some thought and some questioning (OK, lots of questioning), one of my acquaintances helped me out with this:
proc sort data=dicoat;
by intitule;
run;
data _null_;
set dicoat end=last;
length dropvar $1000;
retain dropvar;
by intitule;
if not first.intitule then dropvar = catx(' ',dropvar,nomvar);
if last then call symputx('dropvar',dropvar);
run;
data aat;
set aat(drop=&DROPVAR.);
run;
It should do the trick of removing the duplicate columns. And no, PROC SQL does not work within a DATA step: a PROC statement is a step boundary, so it simply ends the DATA step.
Best.
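To check the approach end to end, here is a small self-contained sketch. The contents of DICOAT and AAT are hypothetical sample data (NOMVAR holds a column name of AAT, INTITULE its label). It also swaps CALL SYMPUTX plus a separate step for CALL EXECUTE, so the DROP step is only generated when at least one duplicate was found; with no duplicates, &DROPVAR would be empty and the DROP= list would not be valid.

```sas
/* Hypothetical sample data: nomvar = column of AAT, intitule = its label */
data dicoat;
  length nomvar $32 intitule $40;
  input nomvar $ intitule $;
  datalines;
v1 Age
v2 Age
v3 Sex
v4 Sex
;
run;

data aat;
  v1 = 30; v2 = 30; v3 = 1; v4 = 1;
run;

proc sort data=dicoat;
  by intitule;
run;

data _null_;
  set dicoat end=last;
  by intitule;
  length dropvar $1000;
  retain dropvar;
  /* every column after the first one sharing a label is a duplicate */
  if not first.intitule then dropvar = catx(' ', dropvar, nomvar);
  /* only queue the DROP step when there is something to drop */
  if last and not missing(dropvar) then
    call execute(cats('data aat; set aat(drop=', dropvar, '); run;'));
run;
```

The queued step runs right after the DATA _NULL_ step finishes, so AAT ends up with only V1 and V3.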
I need some help with macros in SAS. I want to sum variables (for example, v_1 to v_7) to aggregate them, grouping by year. There are plenty of them, so I want to use a macro. However, it doesn't work (I get only v_1). I would really appreciate your help.
%macro my_macro();
%local i;
%do i = 1 %to 7;
proc sql;
create table my_table as select
year,
sum(v_&i.) as v_&i.
from my_table
group by year
;
quit;
%end;
%mend;
/* I don't know how to run this macro - is it ok? */
data run_macro;
set my_table;
%my_macro();
run;
The macro processor just generates SAS code and then passes it on to SAS to run. You are calling a macro that generates complete SAS steps in the middle of your DATA step, so you are trying to run this code:
data run_macro;
set my_table;
proc sql;
create table my_table as select
year,
sum(v_1) as v_1
from my_table
group by year
;
quit;
proc sql;
create table my_table as select
year,
sum(v_2) as v_2
from my_table
group by year
;
quit;
...
So first you make a copy of MY_TABLE as RUN_MACRO. Then you overwrite MY_TABLE with a collapsed version of MY_TABLE that has just two variables and only one observation per year. Then you try to collapse it again, but you are referencing a variable named V_2 that no longer exists.
If you simply move the %DO loop inside the generated SQL statement, it should work. Also, don't overwrite your input dataset. Here is a version of the macro that will create a new dataset named MY_NEW_TABLE with 8 variables from the existing dataset named MY_TABLE.
%macro my_macro();
%local i;
proc sql;
create table my_NEW_table as
select year
%do i = 1 %to 7;
, sum(v_&i.) as v_&i.
%end;
from my_table
group by year
;
quit;
%mend;
%my_macro;
Note: if this is all you are doing, just use PROC SUMMARY. With regular SAS code, unlike SQL code, you can use variable lists like v_1-v_7, so there is no need for code generation.
proc summary nway data=my_table ;
class year ;
var v_1 - v_7;
output out=my_NEW_table sum=;
run;
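A runnable sketch of the PROC SUMMARY route, assuming a small hypothetical MY_TABLE with three rows per year. PROC SUMMARY also writes the automatic _TYPE_ and _FREQ_ variables; they are dropped here so the output has just YEAR and V_1-V_7.

```sas
/* Hypothetical sample data: two years, three rows each, v_i = i */
data my_table;
  array v[7] v_1-v_7;
  do year = 2020 to 2021;
    do row = 1 to 3;
      do i = 1 to 7;
        v[i] = i;
      end;
      output;
    end;
  end;
  drop row i;
run;

proc summary nway data=my_table;
  class year;
  var v_1-v_7;
  /* drop the automatic _TYPE_ and _FREQ_ variables */
  output out=my_new_table (drop=_type_ _freq_) sum=;
run;
```

Each year should get V_1 = 3 and V_7 = 21.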
I have about 100 large datasets, and from each one I want to extract the distinct IDs so I can stack them vertically. The datasets are unsorted and named data_01, data_02, data_03, ..., data_100.
Because the datasets are all very large, setting them together without first reducing their size is not feasible; the join made no progress after hours of running. So I believe the datasets need to be reduced before stacking, and I'm here to seek some help.
I tried to create a macro that selects distinct IDs and sums a numeric variable, cnt, by ID before stacking all the datasets with a PROC SQL UNION. The macro is not working properly:
/*Get dataset names*/
proc sql noprint;
select memname into :mylist separated by ' '
from dictionary.tables where libname= "mylib" and upcase(memname) like "DATA_%"
;
quit;
%put &mylist;
/*create union statements*/
%global nextdata;
%let nextdata =;
%macro combinedata(mylist);
data _null_;
datanum = countw("&mylist");
call symput('Dataset', put(datanum, 10.));
run;
%do i = 1 %to &Dataset ;
data _null_;
temp = scan("&mylist", &i);
call symput("Dataname", strip(put(temp,$12.)));
run;
%put &Dataname;
%put &Dataset;
%if (&i=&Dataset) %then %do;
%let nextdata = &nextdata.
select id, sum(cnt)
from mylib.&&Dataname
group by id;
%end;
%else %do;
%let nextdata = &nextdata.
select id, sum(cnt)
from mylib.&&Dataname union
group by id;
%end;
%put nextdata = &nextdata;
%end;
%mend combinedata;
%combinedata(&mylist);
/*execute from proc sql*/
proc sql;
create table combined as (&nextdata);
quit;
I have also attempted to use proc summary, but there was not enough memory to run the following code:
data vneed / view=vneed;
set data_: (keep=id cnt);
run;
proc summary data=vneed nway;
class id;
var cnt;
output out=want (drop=_type_) sum=sumcnt;
run;
Appreciate any help!
If the number of distinct ID values is reasonable, you should be able to use a hash object.
data _null_ ;
if _n_=1 then do;
dcl hash H (ordered: "A") ;
h.definekey ("ID") ;
h.definedata ("ID", "SUMCNT") ;
h.definedone () ;
end;
set data_: (keep=id cnt) end=eof;
if h.find() then sumcnt=.;
sumcnt+cnt ;
h.replace() ;
if eof then h.output (dataset: "WANT") ;
run ;
If the number of ID values is too large to fit the summary data into a hash object, you could adapt this code to stop at some reasonable number of distinct ID values (to avoid overloading memory), write the current summary to an actual SAS dataset, and then generate the final counts by re-aggregating the intermediate datasets. But at that point you should just use my other answer and let PROC SQL create the intermediate summary datasets instead.
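A sketch of that adaptation, under stated assumptions: DATA_01 and DATA_02 are tiny hypothetical stand-ins for the real data, the cap MAXKEYS (set absurdly low here so the flushing is visible) would need tuning to your memory, and the PART1, PART2, ... names for the intermediate datasets are made up. Because the hash is ordered, each flush writes an ID-sorted partial summary, so the final pass can interleave them with a BY statement and add up the partial sums.

```sas
/* Hypothetical stand-ins for the real data_01 - data_100 */
data data_01;
  input id cnt;
  datalines;
1 5
2 3
3 1
;
run;
data data_02;
  input id cnt;
  datalines;
2 4
4 2
1 1
;
run;

%let maxkeys = 2;  /* hypothetical cap on distinct IDs held in memory */

data _null_;
  if _n_ = 1 then do;
    dcl hash h(ordered: "A");
    h.definekey("id");
    h.definedata("id", "sumcnt");
    h.definedone();
  end;
  set data_: (keep=id cnt) end=eof;
  length outname $32;
  if h.find() then sumcnt = .;
  sumcnt + cnt;
  h.replace();
  /* flush the partial summary when the hash is "full", and at end of input */
  if h.num_items >= &maxkeys or eof then do;
    chunk + 1;
    outname = cats("part", chunk);
    h.output(dataset: outname);
    h.clear();
  end;
run;

/* re-aggregate: TOTAL is not retained, so it restarts for each ID */
data want;
  do until (last.id);
    set part: ;
    by id;
    total = sum(total, sumcnt);
  end;
  keep id total;
run;
```

With this sample data, WANT should hold ID 1 to 4 with totals 6, 7, 1 and 2.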
Summarize the data as you go instead of trying to generate one massive query. Then re-aggregate the aggregates.
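%let dataset = %sysfunc(countw(&mylist, %str( )));
%macro sum_each;
%local i dataname;
proc sql ;
%do i = 1 %to &Dataset ;
%let dataname=mylib.%scan(&mylist,&i,%str( ));
create table sum&i as
select id,sum(cnt) as cnt
from &dataname
group by id
order by id
;
%end;
quit;
%mend sum_each;
%sum_each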
data want ;
sumcnt = 0; /* reset for each ID: the sum statement below retains SUMCNT */
do until(last.id);
set sum1 - sum&dataset ;
by id;
sumcnt+cnt;
end;
drop cnt;
run;
Is it possible to create an array of column variables within SQL to perform an operation like the following (please excuse the syntax):
array(Col1,Col2,Col3);
update tempTable
for(i=1;i<3;i++){
set array[i] =
case missing(array[i])
then 0*1
else
array[i]*1
end
};
Note: I am using a PROC SQL step in SAS.
Desired function:
Perform the operation in the for loop above on multiple columns of a table, without writing a separate set statement for each column.
It is possible to do what you are looking for with a SAS macro.
If this is a local SAS table, it is easier to just update it with a DATA step.
data have;
set have;
array v[3] col1 col2 col3;
do i=1 to 3;
v[i] = sum(v[i],0);
end;
drop i;
run;
The SUM() function adds its arguments but ignores missing values, adding only the non-missing ones. So sum(col,0) gives 0 when the column is missing and the column's value when it is not.
SAS macros write SAS code for you: they are pre-compiler scripts that generate SAS code.
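A quick check of that behavior on a hypothetical one-column table. It also shows COALESCE(x,0), which returns the first non-missing argument; because COALESCE is standard SQL, it can also be used directly in a PROC SQL UPDATE.

```sas
/* Hypothetical one-column table with a missing value */
data demo;
  input x;
  datalines;
1.5
.
-2
;
run;

data demo2;
  set demo;
  via_sum      = sum(x, 0);      /* missing arguments are ignored, so . -> 0 */
  via_coalesce = coalesce(x, 0); /* first non-missing argument */
run;

/* the COALESCE form works unchanged inside PROC SQL */
proc sql;
  update demo set x = coalesce(x, 0);
quit;
```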
You want code that looks like
update table
set col1 = ... ,
col2 = ... ,
... ,
colN = ...
;
Here is a script. It generates a test table, defines the macro, and then calls the macro on the table. It uses the sum() function from my other answer.
data have;
array col[3];
do r=1 to 100;
do i=1 to 3;
if ranuni(123)> .8 then
col[i] = .;
else
col[i] = rannor(123);
end;
output;
end;
drop i r;
run;
%macro sql_zero_if_missing(data, cols);
%local n i col;
%let n=%sysfunc(countw(&cols));
proc sql noprint;
update &data
set
%do i=1 %to &n;
%let col=%scan(&cols,&i);
&col = sum(&col,0)
%if &i ^= &n %then , ;
%end;
;
quit;
%mend;
options mprint;
%sql_zero_if_missing(have, col1 col2 col3);
The MPRINT option will let you see the SAS code that was generated. Here is the log:
MPRINT(SQL_ZERO_IF_MISSING): proc sql noprint;
MPRINT(SQL_ZERO_IF_MISSING): update have set col1 = sum(col1,0) ,
col2 = sum(col2,0) , col3 = sum(col3,0) ;
NOTE: 100 rows were updated in WORK.HAVE.
MPRINT(SQL_ZERO_IF_MISSING): quit;
I have some code which appends yesterday's data to [large dataset], using proc append. After doing so it changes the value of the variable "latest_date" in another dataset to yesterday's date, thus showing the maximum date value in [large dataset] without a time-consuming data step or proc sql.
How can I check, within the same program in which proc append is used, whether proc append was successful (no errors)? My goal is to change the "latest_date" variable in this secondary dataset only if the append is successful.
Try the automatic macro variable &SYSCC.
data test;
do i=1 to 10;
output;
end;
run;
data t1;
i=11;
run;
data t2;
XXX=12;
run;
proc append base=test data=t1;
run;
%put &syscc;
proc append base=test data=t2;
run;
%put &syscc;
I'm using the %get_table_size macro, which I found here. My steps are:
run %get_table_size(large_table, size_preappend)
Create a dataset called to_append
run %get_table_size(to_append, append_size)
run proc append
run %get_table_size(large_table, size_postappend)
Check if &size_postappend = &size_preappend + &append_size
Using &syscc isn't exactly what I wanted, because it doesn't check specifically for an error in proc append. It could be thrown off by earlier errors.
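One way to scope the check to PROC APPEND alone is the automatic macro variable &SYSERR, which SAS resets at the end of every step, rather than &SYSCC, which carries over from earlier steps (alternatively, you could reset it with %let syscc=0; just before the append). A minimal sketch, assuming hypothetical datasets LARGE_TABLE, TO_APPEND and CONTROL in WORK and a hypothetical macro name APPEND_AND_STAMP; it treats anything other than SYSERR=0 (including warnings) as a failure, which you may want to relax.

```sas
/* Hypothetical stand-ins for [large dataset], yesterday's data,
   and the secondary dataset holding latest_date */
data large_table;
  format dt date9.;
  do dt = '01JAN2024'd to '03JAN2024'd;
    output;
  end;
run;

data to_append;
  format dt date9.;
  dt = '04JAN2024'd;
run;

data control;
  format latest_date date9.;
  latest_date = '03JAN2024'd;
run;

%macro append_and_stamp(base=, data=);
  proc append base=&base data=&data;
  run;
  /* SYSERR is reset at each step boundary, so here it reflects only
     the PROC APPEND step above: 0 = no errors and no warnings */
  %if &syserr = 0 %then %do;
    data control;
      set control;
      latest_date = today() - 1;
    run;
  %end;
  %else %put WARNING: append failed (SYSERR=&syserr), latest_date not changed.;
%mend append_and_stamp;

%append_and_stamp(base=large_table, data=to_append)
```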
You can do this by counting how many records are in the table before and after appending. This works with any SAS table or database table.
It is best practice to always have a control table for your process that logs the run time and the number of records read.
Code:
/*Create input data*/
data work.t1;
input row ;
datalines;
1
2
;;
run;
data work.t2;
input row ;
datalines;
3
;;
run;
/*Create Control table, Run this bit only once, otherwise you delete the table everytime*/
data work.cntrl;
length load_dt 8. source 8. delta 8. total 8. ;
format load_dt datetime21.;
run;
proc sql; delete from work.cntrl; quit;
/*Count Records before append*/
proc sql noprint ; select count(*) into: count_t1 from work.t1; quit;
proc sql noprint; select count(*) into: count_t2 from work.t2; quit;
/*Append data*/
proc append base=work.t1 data=work.t2 ; run;
/*Count Records after append*/
proc sql noprint ; select count(*) into: count_final from work.t1; quit;
/*Insert counts and timestamp into the Control Table*/
proc sql noprint; insert into work.cntrl
/*values(input(datetime(),datetime21.), input(&count_t1.,8.) , input(&count_t2.,8.) , input(&count_final.,8.)) ; */
values(%sysfunc(datetime()), &count_t1. , &count_t2., &count_final.) ;
quit;
Output: the control table is updated.
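The code above records the counts but does not compare them. A sketch of the comparison the question asks for (post = pre + appended), assuming a hypothetical helper %NOBS that reads the logical observation count with the OPEN, ATTRN(...,NLOBS) and CLOSE functions instead of PROC SQL COUNT(*), a hypothetical wrapper %APPEND_CHECKED, and the same T1/T2 sample values as above. NLOBS is not available for views, so this assumes real SAS tables.

```sas
/* Hypothetical helper: number of logical observations in a SAS table */
%macro nobs(ds);
  %local dsid n rc;
  %let dsid = %sysfunc(open(&ds));
  %let n = %sysfunc(attrn(&dsid, nlobs));
  %let rc = %sysfunc(close(&dsid));
  &n
%mend nobs;

data t1; do row = 1 to 2; output; end; run;
data t2; row = 3; run;

%macro append_checked(base=, data=);
  %local before delta after;
  %let before = %nobs(&base);
  %let delta = %nobs(&data);
  proc append base=&base data=&data;
  run;
  %let after = %nobs(&base);
  /* the append is accepted only if the row counts add up */
  %if &after = %eval(&before + &delta) %then %put NOTE: append OK, &after rows.;
  %else %put ERROR: expected %eval(&before + &delta) rows but found &after..;
%mend append_checked;

%append_checked(base=work.t1, data=work.t2)
```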
%let rows = "";
%macro test;
proc sql noprint;
select count(ID)
into: sqlRows
from mytbl;
quit;
%do i = 1 %to &sqlRows; * loop from 1 to sqlRows;
proc sql noprint;
select ID
into: ColumnID
from mytbl(firstobs= &i);
quit;
%if &rows eq "" %then %do
%let rows = "<tr><td>&ColumnID</td></tr>";
%end;
%if &rows ne "" %then %do
%let rows = "&rows<tr><td>&ColumnID</td></tr>";
%end;
%end;*End loop;
%mend;
%test;
%put &rows;
Hi, I want to put all the values of the ID column of mytbl into a variable.
I've created a variable named rows and assigned it an empty value. Then, using a loop, I read the values of mytbl one by one and save each one in the ColumnID variable. If rows is empty, I just add the tr and td tags around the ColumnID value; if rows is not empty, I append to it. But it only gives me the last record of my table.
Let's say mytbl has the values 1, 2 and 3 in the ID column.
rows variable should have data as
<tr><td>1</td></tr><tr><td>2</td></tr><tr><td>3</td></tr>
but it's only showing me the data of the last row as
<tr><td>3</td></tr>
You've got a few different problems, starting with some missing semicolons. More importantly, your code is more complex than it needs to be. You can get what you want with one PROC SQL step using SELECT INTO:, you don't need a separate PROC SQL step for each record. Play around with:
data have;
do ID=1 to 3;
output;
end;
run;
proc sql noprint;
select cats('<tr><td>',ID,'</td></tr>')
into :Rows
separated by ""
from have;
quit;
%put &rows;
I think you're severely misunderstanding what macro variables are, as opposed to regular variables, in SAS. You don't say exactly what you're going to eventually do with this, but nonetheless.
First off, macro variables don't take quotation marks; if they contain them, they're treated just as regular characters. So:
%let var = "";
%let var = "&var.123";
%put &=var.;
will return
"""123"
since it doesn't really know much about the quotation marks (it is somewhat aware of them, but it doesn't treat them the way a normal SAS variable does).
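To make the point about quotes concrete, here is the accumulate-a-string idea done the macro way, with no quotes at all: an empty %LET gives a null value, and each %LET appends plain text. This is only an illustration in open code with hard-coded values; for a whole table, the SELECT INTO: ... SEPARATED BY approach is still the better tool.

```sas
%let rows =;                            /* null value, no quotes needed */
%let rows = &rows.<tr><td>1</td></tr>;
%let rows = &rows.<tr><td>2</td></tr>;
%put &=rows;  /* log: ROWS=<tr><td>1</td></tr><tr><td>2</td></tr> */
```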
Second, as Quentin correctly points out, why on earth are you using SQL to go a row at a time? That's basically the opposite of what you'd use SQL for. SQL is great for doing something to the whole dataset at once and absolutely horrible at one row at a time; that's what the DATA step is for.
If you actually want a SAS variable, or you want to process things a row at a time, you should just use the data step:
data want;
set mytbl end=eof;
length rows $32767;
retain rows; *no need to initialize to missing, that is the default;
rows = cats(rows,"<tr><td>",ID,"</td></tr>");
if eof then output;
run;
You'd usually do that if you were going to use CALL EXECUTE: for example, if you planned to write this to an HTML page (in a stored process, for example) with some wrapper code that you wanted to execute, using if _n_=1 for the start and if eof for the end.
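If the end goal is an HTML table, the same start/middle/end pattern can also write the file directly with FILE and PUT; note this is a swapped-in alternative to CALL EXECUTE, not what the answer describes. A sketch, assuming a temporary fileref HTMLOUT and a hypothetical three-row MYTBL:

```sas
/* Hypothetical three-row table */
data mytbl;
  do id = 1 to 3;
    output;
  end;
run;

filename htmlout temp;  /* hypothetical output location */

data _null_;
  file htmlout;
  set mytbl end=eof;
  if _n_ = 1 then put '<table>';       /* wrapper start */
  put '<tr><td>' id +(-1) '</td></tr>'; /* +(-1) removes the blank after ID */
  if eof then put '</table>';          /* wrapper end */
run;
```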