SAS - Reading a File Backwards? - file-io

I need SAS to read many large log files, which are set up to have the most recent activities at the bottom. All I need is the most recent time a particular activity occurred, and I was wondering if it's possible for SAS to skip parsing the (long) beginning parts of the file.
I looked online and found how to read a dataset backwards, but that would require SAS to parse everything in the .log file into a dataset first. Is it possible to read the file directly, starting from the very end, so that I can stop the data step as soon as I find the most recent activity of a particular type?
I also read up on INFILE and its FIRSTOBS= option, but I have no way of knowing how long these log files are until they are parsed, right? That sounds like a catch-22 to me. So is what I'm describing doable?

I'd probably use a FILENAME statement with the PIPE device to run an operating system command such as tail -r (BSD/macOS) or tac (GNU coreutils), which presents the file to SAS in reverse order. That way SAS reads the file normally, and you don't have to worry about how long the file is.
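For readers curious what tac-style reverse reading involves, here is a minimal Python sketch (not SAS) of the same idea: seek to the end of the file and read fixed-size blocks backwards, yielding complete lines from last to first, so a search can stop at the first hit without touching the beginning of the file. The function names reverse_lines and last_match, the block size and the UTF-8 decoding are assumptions for illustration.

```python
import os

def reverse_lines(path, block_size=8192):
    """Yield the lines of a text file from last to first by reading
    fixed-size blocks backwards from the end of the file."""
    with open(path, "rb") as fh:
        size = fh.seek(0, os.SEEK_END)   # seek returns the new absolute position
        pos = size
        tail = b""                       # possibly incomplete line carried to the next block
        at_eof = True
        while pos > 0:
            step = min(block_size, pos)
            pos -= step
            fh.seek(pos)
            lines = (fh.read(step) + tail).split(b"\n")
            tail = lines.pop(0)          # may be cut off; completed by the next (earlier) block
            if at_eof and lines and lines[-1] == b"":
                lines.pop()              # file ends with a newline: no empty final line
            at_eof = False
            for line in reversed(lines):
                yield line.rstrip(b"\r").decode("utf-8", errors="replace")
        if size:
            yield tail.rstrip(b"\r").decode("utf-8", errors="replace")

def last_match(path, prefix):
    """Return the most recent line starting with prefix, stopping at the first hit."""
    for line in reverse_lines(path):
        if line.startswith(prefix):
            return line
    return None
```

Only the blocks between the end of the file and the most recent match are read, which is the behaviour the question asks for; the pipe to tac gives SAS the same ordering without any custom code.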

If you mean parsing a SAS log file, I am not sure that reading it backwards is worth the trouble in practice. For instance, the following code writes and then reads a log file of more than 10,000 lines, and it runs in less than a tenth of a second on my PC. How big are your log files, and how many are there? Also, as shown below, you don't have to "parse" everything on every line: you can read just part of each line, and if it is not what you are looking for, simply move on to the next line.
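%let pwd = %sysfunc(pathname(WORK));
%put pwd=&pwd;
x cd &pwd; /* make WORK the current directory so test.log is written there */
/* test file: a log of more than 10,000 lines */
data _null_;
file "test.log";
do i = 1 to 1e4;
r = ranuni(0);
put r binary64.;
if r < 0.001 then put "NOTE: not me!";
end;
put "NOTE: find me!";
do until (r<0.1);
r = ranuni(0);
put r binary64.;
end;
stop;
run;
/* find the last line that starts with
NOTE: and get the rest of the line. */
data _null_;
length msg $80;
retain msg noteline;
infile "test.log" lrecl=80 eof=eof truncover;
input head $char5. @; /* read only columns 1-5 and hold the line */
if head = "NOTE:" then do;
input @6 msg $char80.; /* read the rest of a NOTE: line */
noteline = _n_; /* one line is read per iteration, so _n_ is the line number */
end;
else input; /* release the line without reading the rest of it */
return;
eof:
put "last note was on line: " noteline;
put "and msg was: " msg $80.;
run;
/* on log (the line number varies from run to run because the data are random):
last note was on line: nnnnn
and msg was: find me!
*/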
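The same forward-scan idea can be sketched in Python for comparison: each line is tested only against a short prefix, and the most recent match is remembered together with its line number, which is what the SAS step keeps in msg and noteline. The function name last_note and the default prefix "NOTE:" are illustrative assumptions.

```python
def last_note(path, prefix="NOTE:"):
    """Scan the file once from the top; only the start of each line is
    compared, and the latest match and its 1-based line number are kept."""
    last_line, last_msg = None, None
    with open(path, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if line.startswith(prefix):
                last_line, last_msg = lineno, line[len(prefix):].strip()
    return last_line, last_msg
```

Because non-matching lines are skipped after a cheap prefix test, the cost is dominated by reading the file, which is why a simple forward pass is often fast enough.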

Related

Grep contents from a CSV/text file using an AutoHotkey (AHK) script

Can anyone help me write an AHK script for the requirement below?
Requirement:
I have a CSV/TXT file in my Windows environment which contains 20,000+ records in the format below.
When I run the script, it should prompt an InputBox to enter an instance name.
Example: if I enter Instance4, it should display ServerName4 in a MsgBox.
Sample Format:
ServerName1,ServerIP,Instance1,Type
ServerName2,ServerIP,Instance2,Type
ServerName3,ServerIP,Instance3,Type
ServerName4,ServerIP,Instance4,Type
ServerName5,ServerIP,Instance5,Type
.
.
.
Also, as the CSV/TXT file contains a large number of records, please consider the best way to avoid delays in fetching the results.
Please post your code, or at least show what you've already done.
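You can use a parsing loop with CSV as the delimiter, and make a variable for each 'Instance' whose value is that of the current row's 'ServerName'.
The steps are to first FileRead the data from the file, then split it into rows with one Loop, Parse and split each row into columns with a second, CSV-mode Loop, Parse, like so:
FileRead, data, servers.csv  ; hypothetical file name
Loop, Parse, data, `n, `r    ; outer loop: one row per iteration
{
    row := A_LoopField
    Loop, Parse, row, CSV      ; inner loop: one column per iteration
    {
        ; A_LoopField = current value, A_Index = current column number
        if (A_Index = 1)
            server := A_LoopField
        else if (A_Index = 3 && A_LoopField != "")
            %A_LoopField% := server  ; variable named after column 3 gets column 1
    }
}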
After that, you can repeatedly show an InputBox (for example with a Goto or a Loop) and then print the needed variable with the MsgBox command, using a double dereference, like so:
InputBox, input, Lookup, Enter an instance name:
MsgBox % %input%
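The speed concern in the question comes down to reading the file once and then answering each lookup from an in-memory index instead of rescanning 20,000 rows. A minimal Python sketch of that idea (not AHK) follows; it assumes the column layout of the sample format, with the instance in column 3 and the server name in column 1, and the function name load_index is hypothetical.

```python
import csv

def load_index(path, key_col=2, value_col=0):
    """Read the CSV once and map the instance column to the server-name column.
    Column positions follow the sample format in the question (0-based here)."""
    index = {}
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh):
            if len(row) > max(key_col, value_col):   # skip blank or short rows
                index[row[key_col].strip()] = row[value_col].strip()
    return index
```

After loading, index.get("Instance4") returns "ServerName4" for the sample data, and each further lookup is a dictionary access. The dynamic-variable approach in AHK plays the same role as this dictionary.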

How to run the same syntax on multiple SPSS files

I have 24 SPSS files in .sav format in a single folder. All these files have the same structure, and I want to run the same syntax on all of them. Is it possible to write code in SPSS for this?
You can use the user-submitted extension command SPSSINC PROCESS FILES to do this, or write your own macro. First, let's create some very simple fake data to work with.
*Uncomment the FILE HANDLE below and point it at your own folder.
*FILE HANDLE save /NAME = "Your Handle Here!".
*Creating some fake data.
DATA LIST FREE / X Y.
BEGIN DATA
1 2
3 4
END DATA.
DATASET NAME Test.
SAVE OUTFILE = "save\X1.sav".
SAVE OUTFILE = "save\X2.sav".
SAVE OUTFILE = "save\X3.sav".
EXECUTE.
*Creating a syntax file to call.
DO IF $casenum = 1.
PRINT OUTFILE = "save\TestProcess_SHOWN.sps" /"FREQ X Y.".
END IF.
EXECUTE.
Now we can use the SPSSINC PROCESS FILES command to specify the sav files in the folder and apply the TestProcess_SHOWN.sps syntax to each of those files.
*Now example calling the syntax.
SPSSINC PROCESS FILES INPUTDATA="save\X*.sav"
SYNTAX="save\TestProcess_SHOWN.sps"
OUTPUTDATADIR="save" CONTINUEONERROR=YES
VIEWERFILE= "save\Results.spv" CLOSEDATA=NO
MACRONAME="!JOB"
/MACRODEFS ITEMS.
Another (less advanced) way is to use the INSERT command. To do so, repeatedly GET each .sav file, run the syntax with INSERT, and save the file. Probably something like this:
get file='file1.sav'.
insert file='syntax.sps'.
save outfile='file1_v2.sav'.
dataset close all.
get file='file2.sav'.
insert file='syntax.sps'.
save outfile='file2_v2.sav'.
* ...and so on for the remaining files.
Good luck!
If the syntax you need to run is completely independent of the files, then you can either use INSERT FILE = 'Syntax.sps' or put the code in a macro, e.g.:
Define !Syntax ()
* Put your syntax here.
!EndDefine.
You can then run either of these 'manually':
get file = 'file1.sav'.
insert file='syntax.sps'.
save outfile ='file1_v2.sav'.
Or
get file = 'file1.sav'.
!Syntax.
save outfile ='file1_v2.sav'.
Or, if the files follow a reasonably strict naming structure, you can embed either of the above in a simple bit of Python:
Begin Program.
import spss
# Assumes the files are named file1.sav ... file24.sav
for i in range(1, 24 + 1):
    syntax = "get file = 'file" + str(i) + ".sav'.\n"
    syntax += "insert file='syntax.sps'.\n"
    syntax += "save outfile ='file" + str(i) + "_v2.sav'.\n"
    print(syntax)
    spss.Submit(syntax)
End Program.
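If the file names do not follow a strict pattern, the list of jobs can be built from whatever .sav files are in the folder instead of a numeric range. The Python sketch below only builds the command strings, so it runs outside SPSS; the function name build_jobs, the "_v2" suffix and the choice to skip files that already carry that suffix are assumptions. Inside a BEGIN PROGRAM block each string would then be passed to spss.Submit.

```python
import glob
import os

def build_jobs(folder, syntax_file="syntax.sps", suffix="_v2"):
    """Return one block of SPSS commands per .sav file in folder.
    Each block opens the file, INSERTs the shared syntax and saves a copy."""
    jobs = []
    for path in sorted(glob.glob(os.path.join(folder, "*.sav"))):
        root, ext = os.path.splitext(path)
        if root.endswith(suffix):        # skip output files from an earlier run
            continue
        jobs.append(
            "get file='%s'.\n"
            "insert file='%s'.\n"
            "save outfile='%s'.\n"
            "dataset close all.\n" % (path, syntax_file, root + suffix + ext)
        )
    return jobs
```

Skipping already-suffixed files keeps a second run from processing its own output.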

Multiple OS-COMMAND calls from procedure are conflicting

I have a procedure that writes a file, emails it using mail_files, and then uses an OS-DELETE statement to delete the file after it is sent. It seems that either the call to the external procedure that calls mail_files, or the OS command itself, is asynchronous. The OS is AIX 6 and the Progress version is 10.2B. Here's an example:
Here is the main procedure:
DEFINE STREAM outStr.
OUTPUT STREAM outStr TO foo.txt.
FOR EACH customer NO-LOCK:
EXPORT STREAM outStr customer.
END.
OUTPUT STREAM outStr CLOSE. /*EDIT: The problem occurs even if it's closed*/
RUN sendmail.p.
OS-DELETE foo.txt.
Here is sendmail.p:
DEFINE STREAM stMail.
OUTPUT STREAM stMail THROUGH VALUE(
"mail_files -f foo@bar.com -t me@here.com -s~"subject~" -b~"foo.txt~"").
PUT STREAM stMail "Email body".
OUTPUT STREAM stMail CLOSE.
In testing it on my own, I can't replicate the error. Is Progress trying to "optimize" something here? Is there a clean way to make it do what I want without hard-coding a pause?
EDIT:
The stream is being closed before the email attempt, but the error still occurs. No partial file is sent.
The error I get is from mail_files because it can't find the file. I've checked, and no other processes are scheduled to run which would access the file.
No such file or directory
/usr/local/bin/mail_files[268]: foo.txt: cannot open
DEFINE STREAM outStr.
OUTPUT STREAM outStr TO foo.txt.
FOR EACH customer NO-LOCK:
EXPORT STREAM outStr customer.
END.
/* Don't forget to close the stream */
OUTPUT STREAM outStr CLOSE.
RUN sendmail.p.
OS-DELETE foo.txt.
This looks like a pathing issue to me.
In your OUTPUT STREAM statement you never specify the path the file will be written to, so it ends up in the current working directory of whatever application this is running under. That directory is not necessarily the same path that mail_files is reading from (which appears to be /usr/local/bin).
I would suggest updating your code as follows:
OUTPUT STREAM outStr TO /usr/tmp/foo.txt.
and
OUTPUT STREAM stMail THROUGH VALUE(
"mail_files -f foo@bar.com -t me@here.com -s~"subject~" -b~"/usr/tmp/foo.txt~"").
...or you could just try updating this line to point at /usr/local/bin (although /usr/local/bin doesn't really strike me as an appropriate directory for temporary files):
OUTPUT STREAM outStr TO /usr/local/bin/foo.txt.
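The pathing point can be demonstrated in a few lines: a relative name such as foo.txt is resolved against the working directory of whichever process opens it, while an absolute path works from anywhere. This Python sketch (not ABL) starts a separate process in a chosen directory and asks whether it can see a file; the helper name child_sees is hypothetical.

```python
import os
import subprocess
import sys

def child_sees(filename, cwd):
    """Run a separate Python process with working directory cwd and
    report whether that process finds filename."""
    code = "import os, sys; print(os.path.exists(sys.argv[1]))"
    result = subprocess.run([sys.executable, "-c", code, filename],
                            cwd=cwd, capture_output=True, text=True, check=True)
    return result.stdout.strip() == "True"
```

With foo.txt written in one directory, child_sees("foo.txt", other_dir) is False while child_sees on the absolute path is True, which is why writing to and mailing from a fully qualified path such as /usr/tmp/foo.txt removes the ambiguity.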
If I understood correctly, Progress deletes your file before mail_files has used it.
If so, you can write each file under a unique name and have a cron job delete all files older than a certain age.
For example:
DEFINE VARIABLE wlc-Identifiant AS CHARACTER NO-UNDO.
DEFINE VARIABLE wlc-file-txt AS CHARACTER NO-UNDO.
wlc-Identifiant = STRING(YEAR(TODAY), "9999") + STRING(MONTH(TODAY), "99") + STRING(DAY(TODAY), "99") + REPLACE(STRING(TIME, "HH:MM:SS"), ":", "").
wlc-file-txt = wlc-Identifiant + "foo.txt".
DEFINE STREAM outStr.
OUTPUT STREAM outStr TO VALUE (wlc-file-txt).
FOR EACH customer NO-LOCK:
EXPORT STREAM outStr customer.
END.
OUTPUT STREAM outStr CLOSE.
RUN sendmail.p (INPUT wlc-file-txt). /* add the file in parameter */
/*OS-DELETE foo.txt.*/ /* deletion is left to a cron job */
In sendmail.p:
DEFINE INPUT PARAMETER wlpic-file-txt AS CHARACTER NO-UNDO.
DEFINE STREAM stMail.
OUTPUT STREAM stMail THROUGH VALUE(
"mail_files -f foo@bar.com -t me@here.com -s~"subject~" -b~"" + wlpic-file-txt + "~"").
PUT STREAM stMail "Email body".
OUTPUT STREAM stMail CLOSE.
Then have cron delete the old files, for example those created before today (this is just an example).
I hope it will help you. :)
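Both halves of this answer, a timestamped file name and a periodic cleanup, can be sketched in Python to show what the ABL code and the cron job are expected to do. The name format mirrors wlc-Identifiant (YYYYMMDDHHMMSS followed by foo.txt); the function names, the suffix filter and the age threshold are assumptions, and a real cron job would more likely call find or a small script like this one.

```python
import os
import time
from datetime import datetime

def unique_name(base="foo.txt", now=None):
    """Prefix base with a YYYYMMDDHHMMSS timestamp, like wlc-Identifiant."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d%H%M%S") + base

def purge_older_than(folder, max_age_seconds, suffix="foo.txt", now=None):
    """Delete files in folder ending with suffix whose modification time is
    more than max_age_seconds in the past; return the names removed."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name.endswith(suffix) and os.path.isfile(path):
            if now - os.path.getmtime(path) > max_age_seconds:
                os.remove(path)
                removed.append(name)
    return removed
```

Since the timestamp has one-second resolution, two runs within the same second would still collide; adding a process ID or counter to the name would avoid that.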

How does this NAWK script work to show the ports being used by a process on Solaris?

I am trying to understand how the following command works (from here):
pfiles /proc/* 2>&- |
nawk 'END {
if (f) print p
}
/^[0-9]/ {
if (f) print p, RS
p = $0
f = 0
}
/INET / {
sub(/.*INET/,"")
p = p ? p RS $0 : $0
f = 1
}'
This command works well on Solaris 10 (SunOS 5.10) and shows all the ports opened by processes.
I understand that pfiles /proc/* displays a lot of output about all processes by querying the /proc filesystem. From the man page:
pfiles Report fstat(2) and fcntl(2) information
for all open files in each process. In
addition, a path to the file is reported
if the information is available from
/proc/pid/path. This is not necessarily
the same name used to open the file. See
proc(4) for more information.
The output from pfiles is then processed by nawk ('New Awk').
Questions
Could you please explain how nawk processes the output of pfiles in this command? It would be most helpful to know what the variables f and p, and $0, mean.
In the first line, what does redirecting standard error to &- mean? Does it mean the standard error stream is being closed?
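I had to read that script once or twice to make sure I got it straight in my head. It's a little confusing because the END block appears first, even though it runs only once, after all of the input has been read.
$0 is the entire current line.
The /^[0-9]/ block matches the first line of each process's output, which starts with the process id. If the previous process had any sockets (f is 1), it prints that process's collected output p followed by a blank line; then it starts a new collection with the current line (p = $0) and resets the sentinel variable f to 0.
The /INET / block matches the socket lines. sub(/.*INET/,"") strips everything up to and including INET, leaving the address and port, and the result is appended to p on a new line. The sentinel f is set to 1 so that we know this process has at least one socket and should be printed.
Since no further process line follows the last process, the END block prints its collected output, again only if f is set.
BTW, RS is the record separator, a newline by default.
Running the script on just one process might make it a little easier to get your head around it.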
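Sorry, I forgot to answer your other question about the redirection.
2>&-
closes the standard error of pfiles, so its error messages (for example, for processes you are not allowed to inspect) are discarded instead of cluttering the terminal. It is the pipe (|) that sends the standard output of pfiles to nawk, so nawk reads its standard input rather than a file.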
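To make the control flow of the nawk program concrete, here is a line-by-line Python transcription of it: p accumulates the process header plus the stripped INET lines, f records whether the current process has any sockets, a new process header flushes the previous one, and the final check plays the role of the END block. The function name and the sample lines in the usage note are made up for illustration and are not real pfiles output.

```python
import re

def ports_by_process(lines):
    """Mimic the nawk program: return one block per process that has INET lines."""
    out = []
    p = ""       # header line of the current process plus collected INET parts
    f = False    # True once the current process has at least one INET line
    for line in lines:
        if re.match(r"[0-9]", line):         # /^[0-9]/ : a new process header
            if f:
                out.append(p)                # flush the previous process
            p, f = line, False
        if re.search(r"INET ", line):        # /INET / : a socket line
            rest = re.sub(r".*INET", "", line, count=1)   # sub(/.*INET/,"")
            p = p + "\n" + rest if p else rest             # p = p ? p RS $0 : $0
            f = True
    if f:                                    # END block: flush the last process
        out.append(p)
    return out
```

For example, given the lines "100:\t/usr/bin/a", "  sockname: AF_INET 0.0.0.0  port: 22" and "200:\t/usr/bin/b", only the block for process 100 is returned, with the address and port on its second line; process 200 has no socket lines and is dropped.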

SAS: Using the Put statement to create dynamic code

I'd like to create dynamic code using the PUT statement. According to this document from SUGI 29 (http://www2.sas.com/proceedings/sugi29/175-29.pdf),
put
"data XXXXX; "
/ 'infile "&datadir/&compid&filetype" missover ls=' tbla_fle
';' / 'input'
;
is equivalent to running
data onecomp ;
infile
"&datadir/&compid&filetype"
missover ls = 268 ;
input
However, when I try something similar to their example, the code in the PUT statement doesn't run and is written to the SAS log instead:
data _NULL_;
put // "data put_test;" / "b=2;" / "run;";
run;
In Output Log:
data put_test;
b=2;
run;
I've checked the SAS documentation, and it seems that PUT is only used to "Write lines to the SAS log, to the SAS output window, or to an external location that is specified in the most recent FILE statement." Nowhere does it say that it can be used to create dynamically generated code.
I know that I must be missing something, but I'm not sure what. I'm using SAS Enterprise Guide 4.1.
Thank you!
The idea is to use PUT to write your generated code to a file, and then %include that file into your SAS session to run it. What you're missing is a FILE statement and the %include statement.
data _null_;
file 'temp.sas'; /* redirects PUT output to a file instead of the SAS log */
put "data put_test;" / "b=2;" / "run;";
run;
%include 'temp.sas'; /* submits the generated code, which creates put_test */
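The same two-step pattern, write generated source to a file and then execute that file, exists in other languages too. In this Python sketch (not SAS), writing the lines plays the role of FILE plus PUT and runpy.run_path plays the role of %include; the function name generate_and_run is hypothetical.

```python
import os
import runpy
import tempfile

def generate_and_run(lines):
    """Write generated source lines to a temporary file, then execute it.
    Writing corresponds to FILE + PUT; running corresponds to %include."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write("\n".join(lines) + "\n")
        return runpy.run_path(path)    # the executed file's global variables
    finally:
        os.remove(path)
```

Keeping generation and execution as separate steps also makes it easy to open the generated file and check it before running it, which is handy when debugging generated SAS code as well.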