I am using the command below to check the number of rows in a SAS data set, but it outputs 60 records even though the data set has 247 records.
Is there any other way to do this with a UNIX command?
UNIX command:
awk 'END {print NR}' /home/user/check.sas7bdat
You need to write a SAS program to output the number of observations for you. The sas7bdat format is a complicated binary structure, so counting lines with awk will not give you the record count.
data _null_;
file stdout;
set "&sysparm" nobs=nobs;
put "NOBS:" nobs;
stop;
run;
I named this file "test.sas".
It reads the data set specified in a passed system parameter (SYSPARM) and writes the number of observations to STDOUT.
I created a test data set in my home directory like:
libname d "~/";
data d.test;
do i=1 to 1000;
output;
end;
run;
From the command line run
<path to sas>/sas test.sas -sysparm ~/test.sas7bdat
I get NOBS:1000 back.
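If you would rather not read the data set at all, you can also pull the observation count from the DICTIONARY tables. A minimal PROC SQL sketch, assuming the same library and data set as the example above (the macro variable name NOBS is my own choice):
libname d "~/";
proc sql noprint;
  /* NLOBS is the logical (non-deleted) observation count */
  select nlobs into :nobs trimmed
  from dictionary.tables
  where libname = 'D' and memname = 'TEST';
quit;
%put NOBS: &nobs;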
What about just doing it in a SAS data step? You can fetch the number of rows with the NOBS= option on the SET statement.
/* Test dataset */
data have;
a = 1;output;
a = 2;output;
a = 3;output;
run;
data _null_;
set have NOBS = size;
call symput("size",strip(size));
run;
%put NOTE: Number of records: &size.;
NOTE: Number of records: 3
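Because the NOBS= value is assigned when the data step is compiled, you do not even have to read any observations. A minimal sketch of the same idea, assuming the HAVE data set from above:
data _null_;
  if 0 then set have nobs=size; /* SET never executes, but SIZE is still assigned at compile time */
  call symputx('size', size);
  stop;                         /* prevent the step from looping */
run;
%put NOTE: Number of records: &size.;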
Dummy data:
MEMNAME _var1 var2 var3 var4
XY XYSTART_1 XYSTATT_2 XYSTAET_3 XYSTAWT_4
I want to create a macro variable that will have values like TEST_XYSTART, TEST_XYSTATT, TEST_XYSTAET, TEST_XYSTAWT, and so on. How can I do this in a data step without using CALL SYMPUT? I want to use this macro variable in the same data step, and CALL SYMPUT will not make it available until the data step ends.
I tried the code below (it is not working); please tell me the correct way to write this step.
case = "TEST_"|| strip(reverse(substr(strip(reverse(testcase(i))),3)));
%let var = case; (with or without quotes, I am not getting the desired result).
abc= strip(reverse(substr(strip(reverse(testcase(i))),3)));
%let test = TEST_;
%let var = &test.abc;
I am getting the correct data with this statement: strip(reverse(substr(strip(reverse(testcase(i))),3)))
I am just not able to concatenate this value with TEST_ and assign it to a macro variable inside the data step.
Appreciate your help!
It makes no sense to place a %LET statement in the middle of a data step. The macro processor evaluates the macro statements first and then passes the resulting code on to SAS to evaluate. So if you had this code:
data want;
set have;
%let var=abc ;
run;
It is the same as if you had placed the %LET statement before the DATA statement.
%let var=abc ;
data want;
set have;
run;
If you want to reference a variable dynamically in a data step, use either an index into an array:
data want;
set have;
array test test_1 - test_3;
new_var = test[ testnum ] ;
run;
Or the VVALUEX() function, which returns (as a character string) the value of the variable whose name is given by a character expression:
data want;
set have;
new_var = vvaluex( cats('test_',testnum) );
run;
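Here is a small self-contained sketch that combines both approaches; the data set and variable names (HAVE, TESTNUM, TEST_1-TEST_3) are made up for illustration. Note that VVALUEX() returns a character value, so INPUT() is used to get back to a number:
data have;
  input testnum test_1 test_2 test_3;
datalines;
1 10 20 30
3 40 50 60
;
data want;
  set have;
  array test test_1 - test_3;
  by_array  = test[ testnum ];                             /* index into the array */
  by_vvalue = input(vvaluex(cats('test_', testnum)), 32.); /* look up by constructed name */
run;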
I have a binary data set with no delimiters and no fixed length records. I know each record contains 22 bytes of data then an unknown number of 23 byte blocks, up to 50 blocks. The problem is that it's only reading 1 line of 32767 bytes for a total of 728 obs. I'm expecting 2.7MM output obs. How can I make this read the input file to the end? I've already tried adding an "OBS=" option and "lrecl=" option to the infile line. Adding the "end=" option had no effect on the result.
DATA INFILE.MYDATA (drop= i);
INFILE "&Path./UGLYDATA" end=eof;
INPUT
MY_KEY s370fPD9.
...
OCCURS s370fPD2.
@
;
ARRAY MyData{50} MyData1-MyData50;
...
ARRAY Filler{50} $ Filler1-Filler50;
DO I = 1 TO min(50,OCCURS);
INPUT
MyData{I} s370fPD4.
...
Filler{I} $ebcdic10.
@@
;
End;
RUN;
Relevant Log:
NOTE: 1 record was read from the infile "UGLYDATA".
The minimum record length was 32767.
The maximum record length was 32767.
One or more lines were truncated.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set INFILE.MYDATA has 728 observations and 356 variables.
NOTE: Compressing data set INFILE.MYDATA decreased size by 47.06 percent.
Compressed is 9 pages; un-compressed would require 17 pages.
NOTE: DATA statement used (Total process time):
real time 2.69 seconds
user cpu time 0.02 seconds
system cpu time 0.11 seconds
memory 1890.40k
OS Memory 10408.00k
Timestamp 12/07/2021 05:17:34 PM
Step Count 1 Switch Count 0
Page Faults 3
Page Reclaims 1028
Page Swaps 0
Voluntary Context Switches 272
Involuntary Context Switches 1226
Block Input Operations 309648
Block Output Operations 2312
Sounds like the file does not consist of lines of text. So try using RECFM=N on your INFILE statement so that SAS will not look for a LINEFEED character (or a CARRIAGE RETURN and LINEFEED combination) to mark the end of each line.
INFILE "&Path./UGLYDATA" recfm=n ;
If you are unsure what the file contains, just run a simple data step to look at the first few hundred bytes and figure it out. If any of the bytes in a "line" are not printable characters, the LIST statement will include the hex codes for those bytes under the lines when it writes to the SAS log.
data _null_;
INFILE "&Path./UGLYDATA" recfm-=f lrecl=100 obs=10 ;
input;
list;
run;
Per @Tom, indeed RECFM=N.
Example:
Create and read back a binary file.
filename foo '%temp%/foo.bin' recfm=n;
data _null_;
file foo;
call streaminit(2021);
filler = repeat('*', 10);
do recnum = 1001 to 1010;
put recnum s370fPD9. @;
put filler $char11. @;
occurs = rand('integer',1,26);
put occurs s370fPD2. @;
do z = 0 to occurs-1;
record = repeat(byte(rank('A')+z), 22);
put record $ebcdic23.;
end;
putlog 'NOTE: ' recnum= occurs=;
end;
stop;
run;
data want;
infile foo;
* read master;
input recnum s370fPD9. filler $char11. occurs s370fPD2.;
* read details;
do index = 1 to occurs;
input content $ebcdic23.;
output;
end;
run;
dm 'vt want';
I would like to dynamically create macros to query a transactional data set. I have a table that has a set of parameters (parameter_data) and transaction data (txs). For each row in my parameter data I want to create a macro that can be called to query the data.
Parameter data:
data parameter_data;
input macro_name $ parameter_name $ parameter_value $;
datalines;
A Person_ID 1
B TX_ID 2
;
Transactional Data:
data txns;
input Person_ID $ TX_ID $ TX_Amount $;
datalines;
John Sales 1123
Mary Acctng 34
John Sales 23
Mary Sales 2134
;
Here I try to create a macro that should create macros dynamically according to the parameter data. The 'inner macros' are the macros that are created from the parameter data.
%macro outerMacro;
/*loop through each row in the parameter table to get the detail of the macro we want to create*/
%DO ROW = 1 %To 2;
data _NULL_;
set parameter_data;
if _N_ = ROW then do;
call symputx('parameter_name',parameter_name);
call symputx('parameter_value',parameter_value);
end;
run;
/*define inner macro parameters*/
%let macroName = myMacro; /*set the name of the macro we want to create*/
%let innerMacroStart = macro &macroName.; /*set the macro name to start the macro definition*/
%let innerMacroEnd = mend &macroName;
%&&innerMacroStart.; /*start the inner macro*/
/*body of the macro*/
data output;
set txns;
&¶meter_name = &¶meter_value;
/*so here effectively for the first row in the parameter table we are filtering where person_id = John*/
run;
%&&innerMacroEnd.; /*end the inner macro*/
%mend outerMacro;
%&&outerMacroName.;
It seems that SAS is unable to parse the lines %innerMacroStart. Any help is much appreciated.
Thanks!
If the goal is just to subset data then it might be better to generate macro variables instead of actual macros. Try something like this instead.
data _null_;
set parameter_data ;
call symputx(macro_name,catx(' ','where also'
,parameter_name,'=',quote(trim(parameter_value)),';'));
run;
Then just use the generated where statement(s) when you need them by expanding the macro variable. Like this:
data output ;
set txns;
&a
run;
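With the parameter data in your question, the macro variable A should end up holding a complete WHERE statement. A quick way to inspect it (the value in the comment is what the CATX() call above should produce):
%put NOTE: A resolves to: &a;
/* expected: where also Person_ID = "1" ; */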
If you really want to generate a macro definition then you probably want to just use a data step to write the code to a file and then %include the file to compile the macros. That will be much easier to debug than macro logic.
Let's fix your parameter file to better match your test data. Person_ID and TX_ID are character variables in your transaction dataset. You will probably need to add logic or change the parameter file to allow it to handle testing of both numeric and character variables. For now I just made it generate code that assumes that PARAMETER_NAME refers to a character variable so that PARAMETER_VALUE will need to have quotes added to make it a string literal.
data parameter_data;
input macro_name :$32. parameter_name :$32. parameter_value :$200.;
datalines;
A Person_ID John
B TX_ID Acctng
;
data txns;
input Person_ID $ TX_ID $ TX_Amount $;
datalines;
John Sales 1123
Mary Acctng 34
John Sales 23
Mary Sales 2134
;
Now let's run a data step to generate the code for all of your macros. I added logic to use AND if there were multiple "parameters" defined for each macro.
filename code temp;
data _null_;
set parameter_data ;
by macro_name ;
file code ;
if first.macro_name then put
'%macro ' macro_name ';'
/ 'data output;'
/ ' set txns;'
/ ' where ' @
;
else put ' and ' @ ;
put parameter_name '=' parameter_value :$quote. @ ;
if last.macro_name then put
';'
/ 'run;'
/ '%mend ' macro_name ';'
;
run;
Now just use %include to compile the macros.
%include code / source2 ;
NOTE: %INCLUDE (level 1) file CODE is file C:\...\#LN00048.
432 +%macro A ;
433 +data output;
434 + set txns;
435 + where Person_ID ="John" ;
436 +run;
437 +%mend A ;
438 +%macro B ;
439 +data output;
440 + set txns;
441 + where TX_ID ="Acctng" ;
442 +run;
443 +%mend B ;
NOTE: %INCLUDE (level 1) ending.
Now you can use your macros.
445 options mprint;
446 %a ;
MPRINT(A): data output;
MPRINT(A): set txns;
MPRINT(A): where Person_ID ="John" ;
MPRINT(A): run;
NOTE: There were 2 observations read from the data set WORK.TXNS.
WHERE Person_ID='John';
NOTE: The data set WORK.OUTPUT has 2 observations and 3 variables.
447 %b ;
MPRINT(B): data output;
MPRINT(B): set txns;
MPRINT(B): where TX_ID ="Acctng" ;
MPRINT(B): run;
NOTE: There were 1 observations read from the data set WORK.TXNS.
WHERE TX_ID='Acctng';
NOTE: The data set WORK.OUTPUT has 1 observations and 3 variables.
I have placed a comment before each block of code, but essentially it is:
Parameter set up.
Macro generation.
%include.
Call any desired macro.
I have assumed no more than 999 parameter observations - this is controlled by seq.
You can examine file "inner_macro.sas" to see the macro definitions.
NB. When you try it, make sure to use your own path in place of <your-path> (occurs twice):
/* set up parameters */
data parameters;
infile datalines dlm=',';
input var : $8.
operator : $8.
value : $8.
;
datalines;
name,eq,"John"
age,gt,12
weight,eq,0
;
/* read parameters and generate a macro definition for each obs, written to a file */
data _null_;
file '<your-path>/inner_macro.sas';
set parameters;
seq = put(_n_,z3.);
put '%macro inner_' seq ';';
put ' where ' var operator value ';';
put '%mend inner_' seq ';';
put;
run;
/* %include (submits code in file) all of the macro definitions */
%include '<your-path>/inner_macro.sas';
options mprint;
/* invoke the macro with the required data sets */
data class1;
set sashelp.class;
%inner_001;
run;
data class2;
set sashelp.class;
%inner_002;
run;
data class3;
set sashelp.class;
%inner_003;
run;
I have a gym-membership data set that starts with an ID, followed by 119 in-time columns and 119 out-time columns. The in-time and out-time values have the form ##:##:##, and I am trying to read the variables in the simplest way possible. Rather than writing [ID in1 $ in2 $ inX $ out1 $ out2 $ outX $], is there a way to easily read hundreds of columns with a simple line of code?
Just use variable lists. Let's assume your data file is comma delimited.
data want ;
infile 'myfile.csv' dsd truncover ;
input id (in1-in119 out1-out119) (:time8.) ;
format in1-in119 out1-out119 time8.;
run;
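If you just want to try the INPUT syntax without the external file, the same variable-list pattern works against in-line data. A scaled-down sketch with made-up values and only two in/out pairs:
data demo;
  infile datalines dsd truncover;
  input id (in1-in2 out1-out2) (:time8.);
  format in1-in2 out1-out2 time8.;
datalines;
1001,07:01:00,09:30:00,17:05:00,18:45:00
1002,06:55:12,08:10:00,,
;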
"proc import" can be an alternative solution.
It defines data type automatically.
The statement looks like the following:
proc import
datafile = "myfile.csv"
out = work.destination_table
dbms = csv replace
;
run;
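One caveat: PROC IMPORT guesses each column's type from the first rows it scans, so columns that start out blank or numeric-looking can be typed incorrectly. The GUESSINGROWS statement raises the number of rows scanned; a minimal sketch (in SAS 9.4 you can also write GUESSINGROWS=MAX):
proc import
    datafile = "myfile.csv"
    out = work.destination_table
    dbms = csv
    replace;
    guessingrows = 32767; /* scan more rows before deciding each column's type */
run;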
So I am trying to break up a large dataset (70,000 obs with 1,790 variables) based on a specific variable grouping. Excel or CSV is the ideal format to export to, but there is a limit on the number of variables (256 columns for an .xls file). Any ideas how I can do this in SAS (or R / SQL otherwise)?
I know the macro works; I have used it before. The error message says the limit on variables has been reached.
There is certainly a limit on creating an Excel file, but not a CSV file. Here is an example using a dummy SAS data set:
data a;
array x(*) x1-x1790;
do j=1 to 5;
do i=1 to dim(x);
x(i) = ranuni(0);
end;
output;
end;
run;
proc export data=a
outfile="c:\temp\tempfile.csv"
dbms=CSV
replace;
run;
And here is the relevant log:
NOTE: The file 'c:\temp\tempfile.csv' is:
Filename=c:\temp\tempfile.csv,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=23Jan2013:15:27:13,
Create Time=23Jan2013:15:27:13
NOTE: 6 records were written to the file 'c:\temp\tempfile.csv'.
The minimum record length was 9636.
The maximum record length was 23087.
NOTE: There were 5 observations read from the data set WORK.A.
NOTE: DATA statement used (Total process time):
real time 0.26 seconds
cpu time 0.09 seconds
5 records created in c:\temp\tempfile.csv from A.
NOTE: "c:\temp\tempfile.csv" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
real time 2.04 seconds
cpu time 0.26 seconds
Note the first row contains column headers.
UPDATE: If you have a recent version of SAS (9.3 TS1M1 or later) you can create an Office 2010 Excel spreadsheet, which has a maximum of 1,048,576 rows by 16,384 columns. In that case, you would use DBMS=XLSX.
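For completeness, a minimal sketch of that variant, reusing the dummy data set A created above and changing only the output file and the DBMS keyword:
proc export data=a
    outfile="c:\temp\tempfile.xlsx"
    dbms=xlsx
    replace;
run;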
Bob's answer is good if you are okay with XLSX or a CSV. If you do want to make a .xls Excel file (256-column limit), or don't have 9.3 TS1M1, it's fairly easy to do that. How exactly depends on how you want to specify the columns that go into each file.
Say you just want each 255 columns into a separate file, and two files split at the midpoint (35000 records into file A, 35001-end into file B, per set of variables). You would do something like this:
options mprint symbolgen;
data test;
array xs x1-x1700;
do id = 1 to 70000;
do _t = 1 to dim(xs);
xs[_t]=ranuni(7);
end;
output;
end;
run;
%macro export_file(varstart=,varend=,varnumstart=0,varnumend=0,recstart=1,recend=0,keeplist=,dset=, libname=WORK, outfile=,sheet="sheet1");
%if &varnumstart ne 0 %then %do;
proc sql noprint;
select name into :varstart from dictionary.columns
where libname=upcase("&libname.") and memname=upcase("&dset.") and varnum=&varnumstart.;
select name into :varend from dictionary.columns
where libname=upcase("&libname.") and memname=upcase("&dset.") and varnum=&varnumend.;
quit;
%end;
%if &varstart=%str() or &varend=%str() %then %do;
%put "ERROR: MISSING PARAMETERS. PLEASE CHECK YOUR MACRO CALL AND RERUN. MUST HAVE VARSTART AND VAREND OR VARNUMSTART AND VARNUMEND.";
%abort;
%end;
data _for_Export/view=_for_export;
set &libname..&dset;
keep &varstart.--&varend.
%if &keeplist ne %str() %then %do;
&keeplist
%end;
;
if _N_ ge &recstart.;
%if &recend ne 0 %then %do;
if _N_ le &recend.;
%end;
run;
proc export data=_for_export file=&outfile. dbms=excel replace;
sheet=&sheet.;
run;
proc datasets nolist noprint lib=work;
delete _for_export/memtype=view;
quit;
%mend export_file;
%export_file(varnumstart=1,varnumend=250, keeplist=id,recstart=1,recend=35000,dset=test,outfile="c:\temp\test.xls",sheet="sheet1");
%export_file(varnumstart=1,varnumend=250, keeplist=id,recstart=35001,recend=99999,dset=test,outfile="c:\temp\test.xls",sheet="sheet2");
%export_file(varnumstart=251,varnumend=500, keeplist=id,recstart=1,recend=35000,dset=test,outfile="c:\temp\test.xls",sheet="sheet3");
%export_file(varnumstart=251,varnumend=500, keeplist=id,recstart=35001,recend=99999,dset=test,outfile="c:\temp\test.xls",sheet="sheet4");
Mine fails when I try to export sheet4; I'm not sure whether there's some limit on the total size of an .xls file, but you can easily modify this to create separate files. This wouldn't work if you needed to specify nonconsecutive variable names for each separate file, but you could fairly easily modify the SQL code that pulls from dictionary.columns to instead pull from a table you create that holds the variable names you want in each file.
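A hedged sketch of that last idea, shown outside the macro for simplicity: VARLIST, FILENO, and NAME are hypothetical (a data set you would build that lists which variables go into which file), and the selected names are collected into a macro variable used on the KEEP statement:
proc sql noprint;
  select name into :keepvars separated by ' '
  from varlist
  where fileno = 1; /* variables destined for the first file */
quit;
data _for_export / view=_for_export;
  set test;
  keep id &keepvars;
run;
proc export data=_for_export file="c:\temp\test1.xls" dbms=excel replace;
  sheet="sheet1";
run;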