I have a piece of SAS code that predicts consumer behavior. So far I have run 50 samples with 50 logistic regressions by hand, but I'd like to automate this process.
The steps are as follows:
Create a table with all clients having value "1"
Create a table with all clients having value "0"
(the code below starts here) Start a loop that:
Gets a sample of 3000 people from the clients having value "1"
Gets a sample of 3000 people from the clients having value "0"
Joins those two tables
Runs a logistic regression, which should output the ROC value and the Maximum Likelihood Estimates
Collects all the probabilities in the same file, MODELE_RESULTS
You'll find a piece of the code below. Can you advise me on how to loop this logistic regression 50 times, please? So far I can't make it work... I'm a beginner in SQL.
%macro RunReg (DSName, NumVars) ;
%do i=1 %to &NumVars
/* Create a 3000 people sample called TOP_1*/
PROC SURVEYSELECT DATA= TOP_1
OUT= ALEA_1
METHOD=SRS
N=3000;
QUIT;
/* Create a 3000 people sample called TOP_0*/
PROC SURVEYSELECT DATA= TOP_0
OUT= ALEA_0
METHOD=SRS
N=3000;
QUIT;
/*Append both tables */
PROC SQL;
CREATE TABLE BOTH_SAMPLES As
SELECT * FROM TOP_1
OUTER UNION CORR
SELECT * FROM TOP_0;
QUIT;
/* Logistic regression*/
DATA WORK.&DSName noprint
Outset=PE(rename(x&i=Value));
Model Y = x&I;
SET WORK.APPEND_TABLE(IN=__ORIG) WORK.BASE_PREDICT_2;
__FLAG=__ORIG;
__DEP=TOP_CREDIT_HABITAT_2017;
if not __FLAG then TOP_CREDIT_HABITAT_2017=.;
RUN;
PROC SQL;
CREATE VIEW WORK.SORTTempTableSorted AS
SELECT *
FROM WORK.TMP0TempTableAddtnlPredictData
;
QUIT;
TITLE;
TITLE1 "Logistic regression results";
FOOTNOTE;
FOOTNOTE1 "Generated by the SAS System (&_SASSERVERNAME, &SYSSCPL) on %TRIM(%QSYSFUNC(DATE(), NLDATE20.)) at %TRIM(%SYSFUNC(TIME(), TIMEAMPM12.))";
PROC LOGISTIC DATA=WORK.SORTTempTableSorted
PLOTS(ONLY)=NONE
;
CLASS age_classe (PARAM=EFFECT) Flag_bq_principale (PARAM=EFFECT) flag_univers_detenus (PARAM=EFFECT) csp_1 (PARAM=EFFECT) SGMT_FIDELITE (PARAM=EFFECT) situ_fam_1 (PARAM=EFFECT);
MODEL TOP_CREDIT_HABITAT_2017 (Event = '1')=top_situ_particuliere top_chgt_csp_6M top_produit_monetaire_bloque top_CREDIT top_chgt_contrat_travail_6M top_credit_CONSO top_credit_HABITAT top_produit_monetaire_dispo top_VM_autres top_Sicav top_produit_epargne_logement top_Predica top_ferm_prod_6M top_ouv_prod_6M top_produit_Assurance top_produit_Cartes top_produit_Credit "moy_surface_financière_6M"n moy_surf_financiere_ecart_6M moy_encours_dav_6M moy_encours_dav_ecart_6M moy_monetaire_dispo_6M moy_monetaire_dispo_ecart_6M moy_emprunts_6M moy_emprunts_ecarts_6M moy_sicav_6M moy_sicav_ecart_6M moy_vm_autres_6M moy_vm_autres_ecart_6M moy_predica_6M moy_predica_ecart_6M moy_bgpi_6M moy_bgpi_ecart_6M moy_epargne_logement_6M moy_epargne_logement_ecart_6M "moy.an_mt_flux_cred_norme_B2"n "moy.an_mt_op_cred_ep_a_terme"n "moy.an_mt_op_debit_ep_a_terme"n "moy.an_mt_ope_credit_depot"n "moy.an_mt_ope_credit_ep_a_vue"n "moy.an_mt_ope_debit_depot"n "moy.an_mt_ope_debit_ep_a_vue"n "moy.an_mt_pmts_carte_etr"n "moy.an_mt_remise_chq"n "moy.an_mt_paie_carte"n "moy.an_mt_paie_chq"n "moy.an_nb_paie_carte"n "moy.an_nb_paie_chq"n "moy.an_mt_ret_carte_Aut_bq"n "moy.an_mt_ret_carte_CRCA"n "moy.an_mt_ret_carte_etr"n "moy.an_nb_flux_cred_normeB2"n "moy.an_nb_ope_credit_ep_a_terme"n "moy.an_nb_ope_debit_ep_a_terme"n "moy.an_nb_ope_credit_depot"n "moy.an_nb_ope_credit_ep_a_vue"n "moy.an_nb_ope_debit_depot"n "moy.an_nb_ope_debit_ep_a_vue"n "moy.an_nb_pmts_carte_etr"n "moy.an_nb_remise_chq"n "moy.an_nb_ret_carte_Aut_bq"n "moy.an_nb_ret_carte_CRCA"n "moy.an_nb_ret_carte_etr"n "moy.an_nb_ret_carte"n "moy.an_mt_factu_ttc"n "moy.an_mt_reduc_ttc"n "moy.an_mt_rist_ttc"n "moy.an_mt_mvt_domicilie_mktg"n "moy.an_nb_mvt_M_domicilie_mktg"n top_produit_Epargne top_ouverture_reclam age_classe Flag_bq_principale flag_univers_detenus csp_1 SGMT_FIDELITE situ_fam_1 /
SELECTION=STEPWISE
SLE=0.05
SLS=0.05
INCLUDE=0
LINK=LOGIT
OUTROC=_PROB_
ALPHA=95
EXPEST
PARMLABEL
CORRB
NOPRINT
;
OUTPUT OUT=WORK.PREDLogRegPredictions(LABEL="Logistic regression statistics and predictions for WORK.APPEND_TABLE" WHERE=(NOT __FLAG))
PREDPROBS=INDIVIDUAL;
RUN;
QUIT;
%end;
%mend;
DATA WORK.PREDLogRegPredictions;
set WORK.PREDLogRegPredictions;
TOP_CREDIT_HABITAT_2017=__DEP;
_FROM_=__DEP;
DROP __DEP;
DROP __FLAG;
RUN ;
QUIT ;
Thank you in advance
If you're trying to do a bootstrap algorithm or something similar, the seminal paper on the topic is David Cassell's "Don't Be Loopy" from the 2007 SAS Global Forum. In broad strokes, it contrasts the "old" way of doing this (a loop in which you draw a new sample and then perform the analysis, 50 times over) with the new way, where you use PROC SURVEYSELECT with the REP= option.
From the paper, the example:
proc surveyselect data=YourData out=outboot
seed=30459584
method=urs samprate=1 outhits
rep=1000;
run;
This generates a dataset with a Replicate variable, which you can then use as a by variable in most analyses. This then performs the analysis separately for each value of the variable, which is presumably what you want. You can use the various options on proc surveyselect to get the samples you want (sample size/rate, method of sampling, etc.)
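For the question's setup (3000 clients with value "1" and 3000 with value "0", repeated 50 times), a minimal sketch of that approach could look like the following. It assumes a single table all_clients holding both groups and uses only two of the question's predictors for brevity; the names all_clients, samples, pe, roc and p_event are illustrative and not from the original code:
proc sort data=all_clients;
by TOP_CREDIT_HABITAT_2017; /* STRATA expects input sorted by the strata variable */
run;
proc surveyselect data=all_clients out=samples
seed=30459584
method=srs n=3000 /* 3000 clients per stratum */
reps=50; /* 50 independent samples */
strata TOP_CREDIT_HABITAT_2017;
run;
proc sort data=samples;
by Replicate; /* make sure BY processing sees the replicates in order */
run;
ods output ParameterEstimates=pe Association=roc;
proc logistic data=samples;
by Replicate;
model TOP_CREDIT_HABITAT_2017(event='1') = top_CREDIT moy_emprunts_6M;
output out=MODELE_RESULTS predicted=p_event;
run;
The pe table should then hold the maximum likelihood estimates for each replicate, and the c statistic in roc is the area under the ROC curve.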
If you're just trying to split your dataset into chunks, either to run a smaller analysis (perhaps because the big one takes too long) or to make test and validation subsamples, and you don't care how nicely random things are, you can just add a variable in a DATA step like so:
data for_regression;
set your_data;
sample_group = mod(_n_,50);
run;
proc sort data=for_regression;
by sample_group;
run;
You then have 50 groups. If they don't seem random enough, you can sort by something random first, but PROC SURVEYSELECT is usually the better tool for that sort of thing.
I have an example table:
data data;
length code $30;
input code$;
datalines;
PPPES
PPPEW
pppESS
saf
xwq3
er32
ddES
ea9ESS
asesEo
ewlEa
;
run;
and I want to filter for rows that end in ES, ESS, or EW. I tried the following but it didn't work:
proc sql;
create table revised as
select *
from data
where code like ("%ES", "%ESS", "%EW")
quit;
Is there a way to filter if a variable ends in a possible list of string values?
This is my desired output:
data data1;
length code $30;
input code$;
datalines;
PPPES
PPPEW
pppESS
ddES
ea9ESS
;
run;
No.
Explicitly test for each string:
where code like '%ES' or code like '%ESS' or code like '%EW'
In a data step you could use either of these:
if left(reverse(code)) in: ('SE','SSE','WE');
where left(reverse(code)) in: ('SE','SSE','WE');
PROC SQL does not support the truncated comparisons specified by the : modifier. But you could use the WHERE= dataset option
from data(where=(left(reverse(code)) in: ('SE','SSE','WE')))
Using "or" and single quotation marks (inside double quotes, the macro processor would try to resolve %ES and the like as macro calls):
data data;
length code $30;
input code$;
datalines;
PPPES
PPPEW
pppESS
saf
xwq3
er32
ddES
ea9ESS
asesEo
ewlEa
;
run;
proc sql;
create table revised as
select *
from data
where code like ('%ES') or code like ('%ESS') or code like ('%EW');
quit;
In some scenarios you may want to cross join your search terms (as data) with your data, or do an existential test on your data.
data endings;
length target $10;
input target $char10.;
datalines;
ES
ESS
EW
;
data have;
length code $30;
input code $char30.;
datalines;
PPPES
PPPEW
pppESS
saf
xwq3
er32
ddES
ea9ESS
asesEo
ewlEa
;
run;
* cross join;
proc sql;
create table want as
select distinct code
from have
cross join endings
having code like '%'||target
;
quit;
* existential test;
proc sql;
create table want as
select distinct code from have
where exists (
select * from endings
where code like '%'||target
);
quit;
You might also need to deal with case insensitive searches by uppercasing the data values.
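As a minimal sketch of that last point, assuming the have table from above, UPCASE can be applied to the column before the comparison so that a value such as pppes would also match. The table name want_nocase is only illustrative:
proc sql;
create table want_nocase as
select *
from have
where upcase(code) like '%ES'
or upcase(code) like '%ESS'
or upcase(code) like '%EW';
quit;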
I need to create a pie chart showing gender distribution for a state pulled through a macro variable. The code for the sql table looks like this:
proc sql;
create table pie_data as
select distinct mean(Men2017/(Men2017 + Women2017)) format=comma10.2 as Men2017, mean(Women2017/(Men2017 + Women2017)) format=comma10.2 as Women2017
from project.county_data
where State= &StateValue;
quit;
The output of the table looks like this:
Men2017 Women2017
0.49 0.51
I don't know how to do it through proc gchart since Men2017 and Women2017 are technically separate variables, and I can't manually input the data since it needs to be dynamic. Any suggestions as to how I should approach this? I'm new to SAS.
Use PROC TRANSPOSE to transpose the data
Rough idea, may need to re-name variables:
data pie_data;
input Men2017 Women2017;
cards;
0.49 0.51
;;;;
proc transpose data=pie_data out=pie_data_long;
var MEN2017 WOMEN2017;
run;
proc sgpie data=pie_data_long;
pie _name_ / response=col1;
run;
PROC TRANSPOSE can re-arrange your data so that the values appear in 1 variable. Then you can easily do the pie chart. Add the following after your PROC SQL step.
proc transpose data= pie_data out=pie_transpose name=gender;
var men2017 women2017;
run;
title 'Men and women, 2017';
proc gchart data=pie_transpose;
pie gender / sumvar=col1 noheading;
run;
title;
Is it possible to make a macro of this form work?
%macro tableMath(input1,input2);
%local result;
proc sql; ---some code here using inputs--- quit;
proc sql; ---more code here--- quit;
proc sql;
select something into: result
quit;
&result
%mend;
I want to run some fairly complicated logic on each observation of a dataset, and in any other language I've used before the way to do this would be to encapsulate it in a function that returns a result each time it's called--I'm not sure how to do this logic in SAS however.
EDIT: input1 and input2 would be columns of a dataset and result would be used to create a new column in some other macro in another part of the program. I don't need a specific code solution I just literally don't get how you're supposed to do traditional function logic where you need a return value in SAS...
As Richard wrote, function-style macros emit SAS code. The general rule of developing function-style macros is that they contain only macro language statements. Any SAS code they contain will be emitted. Historically, this made it difficult/annoying to write a function-style macro that would process data like you would with a DATA step. Luckily, SAS has added a function, DOSUBL, which makes it easier to write function-style macros that execute SAS code in a "side session" and emit a result. See Rick Langston's paper.
Here is an example of a function-style macro that uses DOSUBL to count the number of records in a table and emits the count. (This is a very inefficient way to get a record count; it is just an example of doing something in SQL.)
%macro SQLcount(table);
%local rc emit;
%let rc=%sysfunc(dosubl(%nrstr(
proc sql noprint;
select count(*) into :emit trimmed
from &table
quit;
)));
&emit
%mend ;
It can be used like:
proc sql ;
select name
,%SQLcount(sashelp.shoes) as ShoeCount /*emits 395*/
from sashelp.class
;
quit ;
When the above step runs, it will return 19 rows of names from sashelp.class, and the value of ShoeCount will be 395 on every row. Note that the macro SQLcount only executed once. While the PROC SQL step is being compiled/interpreted the call to SQLcount is seen and the macro is executed and emits 395. The step becomes:
proc sql ;
select name
,395 as ShoeCount /*emits 395*/
from sashelp.class
;
quit ;
DOSUBL uses a "side session" to execute code, which allows you to execute a PROC SQL step in the side session while the main session is interpreting a PROC SQL step.
I can't tell from your question if that sort of use case is what you want. It's possible you want a function-style macro where you could pass values to it from a table, and have the macro execute on each value and return something. Suppose you had a table which was a list of table names, and wanted to use SQL to get the count of records in each table:
data mytables ;
input table $20. ;
cards ;
sashelp.shoes
sashelp.class
sashelp.prdsale
;
run ;
You can do that by using the resolve() function to build macro calls from data, delaying the execution of the macro until the SELECT statement executes:
proc sql ;
select table
,resolve('%SQLcount('||table||')') as count
from mytables
;
quit ;
With that, SQLcount will be called three times, and will return the number of records in each dataset.
table count
---------------------------
sashelp.shoes 395
sashelp.class 19
sashelp.prdsale 1440
The macro call is not seen when the PROC SQL step is interpreted, because it is hidden by the single quotes. The resolve function then calls the macro when the SELECT statement executes, passing the value of table as a parameter value, and the macro emits the record count. This is similar to a CALL EXECUTE approach for using data to drive macro calls.
You state you want to:
run some fairly complicated logic on each observation of a dataset
To do that you should use the SAS language instead of the macro processor or PROC SQL. You can use a data step. Or for even more complicated logic you should look at PROC DS2.
Sounds like you may want to create an FCMP function using proc fcmp. This is basically a way to create your own SAS functions that can be used within proc sql and data steps. For example:
/******************************************************************************
** PROGRAM: COMMON.FCMP_DIV.SAS
**
** DESCRIPTION: PERFORMS A MATHEMATICAL DIVISION BUT WILL RETURN NULL IF THE
** NUMERATOR OR DENOMINATOR IS MISSING (OR IF THE DIVISOR IS 0).
**
******************************************************************************/
proc fcmp outlib=common.funcs.funcs;
function div(numerator, denominator);
if numerator eq . or denominator in (0,.) then do;
return(.);
end;
else do;
return(numerator / denominator);
end;
endsub;
run;
Example Usage (example is data step but works equally well within SQL):
data x;
x1 = div(1,0);
x2 = div(1,.);
x3 = div(1,1);
x4 = div(0,0);
x5 = div(0,.);
x6 = div(0,1);
x7 = div(.,0);
x8 = div(.,.);
x9 = div(.,1);
put _all_;
run;
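Before a function compiled this way can be called, SAS has to be told where to find it through the CMPLIB= system option. A minimal sketch, assuming the common libref from the example above is already assigned (the query against sashelp.class and the column name weight_per_height are only an illustration):
options cmplib=common.funcs;
proc sql;
select name, div(weight, height) as weight_per_height
from sashelp.class;
quit;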
Macro functions do not return values. A macro function can 'emit' source code that is
one or more steps,
a snippet that can be incorporated in a statement,
one or more statements that are part of a step,
etc.
For your case of wanting to 'do' things in SQL, you could write SQL views that are then
opened with %sysfunc(open()) and
processed with
%sysfunc(set()) and
%sysfunc(getvarn()) and
%sysfunc(getvarc()).
Not all SQL functionality can be utilized by this technique: the select something into :result would have to become a view containing the select something, and the macro would use getvarc to read the result.
Access done in the open/set/get manner does not cause a step boundary to occur, so the macro processing can proceed with its logic and eventually emit source code for snippet-level consumption. (The consumer is the SAS executor that processes macro code and implicitly compiles and runs SAS steps.)
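A rough sketch of that idea, under a few assumptions: the macro name firstValue and the view work.shoe_count are made up for illustration, the column read is numeric, and the row is read with FETCH rather than SET. The view is created ahead of time, and the macro itself contains only macro statements, so it can be called in the middle of another statement:
proc sql;
create view work.shoe_count as
select count(*) as n from sashelp.shoes;
quit;
%macro firstValue(view, var);
%local dsid rc value;
%let dsid = %sysfunc(open(&view, i)); /* open the view for input */
%if &dsid > 0 %then %do;
%let rc = %sysfunc(fetch(&dsid)); /* read the first row */
%if &rc = 0 %then
%let value = %sysfunc(getvarn(&dsid, %sysfunc(varnum(&dsid, &var))));
%let rc = %sysfunc(close(&dsid)); /* always release the view */
%end;
&value
%mend firstValue;
%put Row count: %firstValue(work.shoe_count, n);
For a character column, getvarc would take the place of getvarn.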
I have a macro where I am currently passing in 6 table names and 6 columns. However, the number of columns and tables will not always be constant.
Is there a way to have a variable number of parameters? I am familiar with the concept in python with **kwargs.
Also, is there a way to parameterize the proc sql statement to only take as many col and table inputs as provided? Or do a try catch of some sort in SAS to check if the variables exist before running the sql statement?
Here is my macro I'm trying to parameterize.
%macro Generate_TP_tbl(new_tbl_name, trans_col, tbl_1, tbl_2, tbl_3, tbl_4,
tbl_5, tbl_6, col_1, col_2, col_3, col_4, col_5, col_6);
proc sql;
CREATE TABLE &new_tbl_name AS
SELECT a1._NAME_, a1.&trans_col as &col_1, a2.&trans_col as &col_2,
a3.&trans_col as &col_3, a4.&trans_col as &col_4, a5.&trans_col as &col_5,
a6.&trans_col as &col_6
FROM &tbl_1 as a1, &tbl_2 as a2, &tbl_3 as a3, &tbl_4 as a4, &tbl_5 as a5,
&tbl_6 as a6
WHERE a1._NAME_ = a2._NAME_ = a3._NAME_ = a4._NAME_ = a5._NAME_ = a6._NAME_;
run;
%mend Generate_TP_table;
An even more generic way of doing this is as follows:
%macro mymacro /parmbuff;
%put &SYSPBUFF;
%mend;
You can then call %mymacro with any parameters you like and parse them all out from the &SYSPBUFF automatic macro variable.
This would probably need more work than Reeza's solution would, but I thought I'd post this anyway for completeness, as it's occasionally useful.
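A minimal sketch of such parsing, assuming the arguments are simple comma-separated names without embedded parentheses or commas (the macro name listArgs is hypothetical). &SYSPBUFF includes the surrounding parentheses, so they are treated as delimiters along with the comma and the blank:
%macro listArgs / parmbuff;
%local i arg;
%let i = 1;
%let arg = %scan(&syspbuff, &i, %str(%(, %)));
%do %while(%length(&arg) > 0);
%put Argument &i: &arg;
%let i = %eval(&i + 1);
%let arg = %scan(&syspbuff, &i, %str(%(, %)));
%end;
%mend listArgs;
%listArgs(tbl_a, tbl_b, tbl_c)
Each pass of the loop writes one argument to the log; a real macro would generate code from &arg instead.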
Pass them in as a single parameter and have the macro parse them out later.
%macro (parameters = , table_list = tb1 tb2 tb3 ... tb6, col_list=col1 col2 ... col6, other_parms= ... );
I would recommend building the rest of your code using a do loop with the number of parameters. The documentation here has a somewhat bad example of how to extract each element of a list:
http://support.sas.com/documentation/cdl/en/mcrolref/67912/HTML/default/viewer.htm#p1n2i0ewaj1zian1ria5579z1zjh.htm
The SQL is ugly...I wonder if a data step would be easier since you're merging on a single variable? Then it really becomes a rename from each table as in the example above in many respects.
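As a sketch of how the list approach and the data step idea could fit together, assuming keyword parameters table_list and col_list with the same number of space-separated entries, and every input table already sorted by _NAME_ (these parameter names are illustrative). Note that MERGE keeps every _NAME_ found in any table, which differs from the inner join in the original SQL:
%macro Generate_TP_tbl(new_tbl_name=, trans_col=, table_list=, col_list=);
%local i n;
%let n = %sysfunc(countw(&table_list, %str( )));
data &new_tbl_name;
merge
%do i = 1 %to &n;
/* i-th table, keeping the key and renaming the transposed column */
%scan(&table_list, &i, %str( ))
(keep=_NAME_ &trans_col
rename=(&trans_col=%scan(&col_list, &i, %str( ))))
%end;
;
by _NAME_;
run;
%mend Generate_TP_tbl;
A call would then list however many tables and columns are needed, for example table_list=t1 t2 t3 with col_list=c1 c2 c3.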
Are there any statements or functions capable of getting the names of variables?
Preferably putting them into a column of another data set, a text field or a macro variable.
E.g.
- Data set 1
Name age sex
Jk 14 F
FH 34 M
Expected data set
Var_name_of_dataset1
Name
age
sex
PS: I know of one statement, select into, which does something related.
It can read the values of a column into a field with customized separators, so I hope there are similar ways of reading column names into a field or a column.
Thanks
PROC CONTENTS would be the quickest way to get that information in a dataset. Column names can be found in the column NAME.
proc contents data=sashelp.class out=contents noprint;
run;
You can also use a DATA step with arrays and the VNAME function, e.g.
data colnames ;
set sashelp.class (obs=1) ;
array n{*} _NUMERIC_ ;
array c{*} _CHARACTER_ ;
do i = 1 to dim(n) ;
vname = vname(n{i}) ;
output ;
end ;
do i = 1 to dim(c) ;
vname = vname(c{i}) ;
output ;
end ;
run ;
%macro getvars(dsn);
%global vlist;
proc sql;
select name into :vlist separated by ' '
from dictionary.columns
where memname=upcase("&dsn");
quit;
%mend;
This creates a macro variable called &vlist that will contain the names of all the variables in your dataset, separated by a space. If you want commas between the variable names, all you have to do is change the 'separated by' value from ' ' to ', '. The use of the upcase function in the WHERE clause avoids problems with someone passing the dataset name in the wrong case. The global statement is needed since the macro variable created will not necessarily be available outside the macro without defining it as global.
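One caveat: filtering on memname alone picks up columns from every library that has a member with that name. A small variation, sketched here under the assumption that the library and member are passed separately (the macro name getvars2 is made up), also filters on libname and keeps the variables in dataset order:
%macro getvars2(lib, mem);
%global vlist;
proc sql noprint;
select name into :vlist separated by ' '
from dictionary.columns
where libname=upcase("&lib") and memname=upcase("&mem")
order by varnum;
quit;
%mend getvars2;
%getvars2(sashelp, class)
%put &=vlist;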
Slightly changed from SAS help and documentation.
%macro names(dsid);
%let dsid=%sysfunc(open(&dsid, i));
%let num=%sysfunc(attrn(&dsid,nvars));
%let varlist=;
%do i=1 %to &num ;
%let varlist=&varlist %sysfunc(varname(&dsid, &i));
%end;
%let rc = %sysfunc(close(&dsid)); /*edit by Moody_Mudskipper: omitting this line will lock the dataset */
%put varlist=&varlist;
%mend names;
%names(sasuser.class) ;
This preserves the case and the order of the variables, even when numeric and character variables are mixed.
I'm not sure Rawfocus's assertion that reading dictionary tables queries all libraries is true. Had the example used sashelp.vcolumn instead, it would be true: that approach is very slow and does access all the allocated libraries. (You can prove this with the SAS RTRACE system option.)
I am of the opinion that a SQL query against dictionary.columns is the fastest of the methods outlined here. Obviously the macro-ised code would work without the macro, but the point of the macro here is, I think, to serve as a utility: put the code into your favourite macro library and you never need to think about it again.