SAS: read in a PDF file

I am looking for ways to read in a PDF file with SAS. Apparently this is not basic functionality, and there is very little to be found on the internet. (It does not help that searching Google with "PDF" in the query also returns links to PDF documents about unrelated things.)
The only things I can find are people looking for ways to import data from a PDF into datasets. For me, that is not even necessary. I would like to read the contents of the PDF file into one big character variable. If possible, it would be even better to read in the file's binary data.
Is this possible with SAS, and if so, how? (I got it to work in Access VBA, but can't find any similar way in SAS.)
(In the end, the purpose is to convert the contents to base64 and put that base64 string into an XML document.)

You probably will not be able to read the entire file into one character variable, since the maximum length of a SAS character variable is 32,767 bytes (just under 32 KB). A simple way to read in one line at a time, though, is something like the following:
%let pdfFileName = Test.pdf;
%let lineSize = 2000;
data base;
/* LENGTH sets the storage size of the variable */
length text_line $&lineSize;
/* TRUNCOVER keeps short lines from pulling in the next record */
infile "&pdfFileName" lrecl=&lineSize truncover;
/* $CHARw. reads the whole line, including embedded blanks */
input text_line $char&lineSize..;
run;
This requires that you have a general idea of the maximum record length ahead of time, but you could write additional code to determine the maximum record size before reading in the file. In this example each line of text is read into one character variable named text_line. From there, you could use a RETAIN statement or a double trailing at sign (@@) on the INPUT statement to process multiple lines at a time. Because a PDF is largely binary, line boundaries are not very meaningful; the INFILE option RECFM=F together with a fixed LRECL should let you read the file in fixed-size blocks instead. The SAS website has plenty of documentation on how to read and process text from various types of input files.
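Since the stated end goal is a base64 string, the size limit can be sidestepped by encoding the file piece by piece. The sketch below is in Python rather than SAS and only illustrates the chunking arithmetic: if every chunk except the last has a length that is a multiple of 3 bytes, the encoded pieces contain no padding and can simply be concatenated. The function name base64_chunks, the 24,573-byte default chunk size (which encodes to 32,764 characters, under the 32,767 limit), and the throwaway demo file are all assumptions made for illustration.

```python
import base64
import os
import tempfile

def base64_chunks(path, chunk_size=24573):
    """Yield base64 text for successive binary chunks of the file.

    chunk_size must be a multiple of 3 so that only the final piece can
    end in '=' padding and the pieces can be joined end to end.
    """
    if chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a multiple of 3")
    with open(path, "rb") as handle:          # binary mode: no text decoding
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield base64.b64encode(chunk).decode("ascii")

# Demonstration on a throwaway file standing in for the PDF (hypothetical data).
payload = b"%PDF-1.4\n" + bytes(range(256)) * 10
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(payload)
    demo_path = tmp.name

encoded = "".join(base64_chunks(demo_path, chunk_size=300))
os.remove(demo_path)
```

Each yielded piece could then be written to the XML document in turn, so the full string never has to sit in a single variable.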

Related

How to write results in to NSArray and save it as csv file using objective-c

I'm trying to store my results in an NSArray and save them as a CSV file using Objective-C, but I can't seem to find any relevant solution. Please find the sample code below:
int a=5,b=10;
int c=b-a;
double d=4.5,e=3.0;
double h=d-e;
NSLog(@"host_port:%f", c);
NSLog(@"host_size:%d", h;
I would like to store my values c and h in an array and write that to a CSV file. Any advice on this would be helpful.
Thanks in advance.
When you ask a question on SO you need to show effort (code you've tried, details of what you've read); if you don't, you'll get down votes and close votes (you have one of each as I write this). The code you have included has nothing to do with CSV or arrays, and is not even valid code (the formats are wrong).
That said, let's see if we can give you something to get you going.
A CSV file is just plain text. You don't need any packages to write one; the standard I/O routines will do the job. You also do not need to store all the values in an array and then output the array, or build up a string version of the whole CSV file and output that. You can output items as they are generated if you wish, and it may be more efficient to do so. Your code fragment has only two values; maybe you intend it to be the core of a loop. Given those values, we assume you want this CSV file:
host_port,host_size
5,1.5
Your values have basic types, int and double; they are not Objective-C object types. Given this, you can use the standard C I/O operations to produce your file.
First you may need to obtain the destination file name from the user. Assuming this is a GUI app, look up NSSavePanel for this. That will give you an NSURL from which you can obtain the file path as an NSString, and you can convert that into a C string using NSString methods.
Now you can enter the C I/O world. To find the documentation on the following functions, open the Terminal and use the man command, e.g. man fopen.
To create the file and open it for writing, use fopen(), passing it the C string pathname you obtained above and the mode "w".
To write the headers and each row of data use fprintf(). This takes a format string just like NSLog(), but you must remember to explicitly include the line breaks by using \n in the format.
When you've finished close the file with fclose().
Now go read the documentation and write your CSV file!
HTH
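The open, write, close sequence described above can be sketched as follows. This is Python rather than C or Objective-C, so open(), write() and the with block stand in for fopen(), fprintf() and fclose(); the function name write_csv and the temporary file path are made up for the example.

```python
import os
import tempfile

def write_csv(path, rows):
    """Write a header line, then one CSV line per (host_port, host_size) pair."""
    # newline="" stops Python from translating "\n" on Windows.
    with open(path, "w", newline="") as out:   # plays the role of fopen(path, "w")
        out.write("host_port,host_size\n")     # header row with an explicit line break
        for host_port, host_size in rows:
            # Same idea as fprintf(fp, "%d,%.1f\n", c, h): int first, double second.
            out.write("%d,%.1f\n" % (host_port, host_size))
    # Leaving the with block closes the file, like fclose().

# The two values from the question.
a, b = 5, 10
d, e = 4.5, 3.0
csv_path = os.path.join(tempfile.mkdtemp(), "results.csv")
write_csv(csv_path, [(b - a, d - e)])

with open(csv_path, newline="") as check:
    csv_text = check.read()
```

Rows are written as they are produced, so nothing needs to be collected in an array first.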

Fortran: How to skip many lines of data file efficiently

I have a formatted data file which is typically billions of lines long, with several lines of headers of variable length. The data file takes the form:
# header 1
# header 2
# headers are of variable length.
# data begins from next line.
1.23 4.56 7.89 0.12
2.34 5.67 8.90 1.23
:
:
# billions of lines of data, each row the same length, same format.
-- end of file --
I would like to extract a portion of data from this file, and my current code looks like:
do j=1,jmax !Suppose I want to extract jmax lines of data from the file.
  ![algorithm to determine number of lines to skip, "N(j)"]
  !This determines the number of lines to skip from the previous file
  !position, when the data was read on the (j-1)th iteration.
  !Skip N(j)-1 lines to go to the next data line to read off:
  do i=1,N(j)-1
    read(unit=unit,fmt='(A)')
  end do
  !Now read off the line of data I want:
  read(unit=unit,fmt='(data_format)') data1,data2 !etc.
  !Data is stored in some arrays.
end do
The problem is, N(j) can be anywhere between 1 and several billion, so it takes some time to run the code.
My question is, is there a more efficient way of skipping millions of lines of data? The only way I can think of, while sticking to Fortran, is to open the file with direct access and jump to the desired line upon opening the file.
As you suggest, direct access seems like the best option. But that requires all records to have the same length, which your headers violate. Also, why use formatted output? With a file of this length, it's hard to imagine a person reading it. If you use unformatted I/O, the file will be smaller and the I/O will be faster. Perhaps create two files: one with the headers (metadata) in human-readable form, and the other with the data in native form. Native (binary) representation means a loss of portability, which is something to consider if you want to move the files to different computer architectures or keep them usable for decades. Otherwise it's probably worth the convenience. Another option would be a more sophisticated file format that combines metadata and data, such as HDF5 or FITS, but for communication between two programs of one person, that's probably excessive.
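To make the fixed-length-record idea concrete: once the end of the headers is known, row k starts at byte offset data_start + k * record_length, so it can be reached with a single seek instead of reading every earlier line. The sketch below uses Python rather than Fortran, purely to show that arithmetic. The helper names find_data_start and read_record, the two-column row format, and the demo file are assumptions for illustration, and the approach only works if every data row really has the same byte length, line ending included.

```python
import os
import tempfile

def find_data_start(handle):
    """Return the byte offset of the first line that does not start with '#'."""
    handle.seek(0)
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line.startswith(b"#"):
            return position

def read_record(handle, data_start, record_length, index):
    """Jump straight to data row `index` (0-based) and return its bytes."""
    handle.seek(data_start + index * record_length)
    return handle.read(record_length)

# Demo file (hypothetical): variable-length headers, then rows of identical width.
header = b"# header 1\n# a longer second header\n"
rows = [("%10.2f%10.2f\n" % (i, 2 * i)).encode("ascii") for i in range(1000)]
demo_path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(demo_path, "wb") as out:
    out.write(header + b"".join(rows))

with open(demo_path, "rb") as data:           # binary mode keeps offsets exact
    start = find_data_start(data)             # headers are scanned only once
    record_length = len(rows[0])              # 21 bytes including the newline
    row_500 = read_record(data, start, record_length, 500)
```

The headers still have to be read once to find where the data begins, which is the point above about the headers breaking the fixed-length requirement.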

idl strange symbols in file

I've written several IDL programs to analyse some data. To keep it simple, the programs read in some time-varying data and calculate the Fourier spectrum. This spectrum is written to a file using this code:
openw,3,filename
printf,3,[transpose(freq),transpose(power)],format='(e,e)'
close,3
The file is then read by another program using this code:
rdfloat,filename,freq,power,/double
The rdfloat procedure can be found here: http://idlastro.gsfc.nasa.gov/
The error I get when trying to read a file is: "Input conversion error. Unit: 101"
When I delve into the file being read, I notice several types of unrecognised characters. I don't know if these are a result of writing to the file or something else related to the number of files being created (over 300 files).
These symbols/characters are in the place of a single number:
< dle> < dc1> < dc2> < dc3> < dc4> < can> < nak> < em> < soh> < syn>
Example of what appears in the file being read. Note that these are not consecutive lines:
7.7346< dle>18165493007e+01 8.4796811549010105e+00
7.7354408697119453e+01 1.04459538071< dc2>1749e+01
7.7360701595839< can>28e+01 3.0447318983094189e+00
Whenever I run the procedures that write the files, there is always at least one file that has some or all of these characters. The file/s that contains these characters is always different.
Can anyone explain what these symbols are and what I might be doing to create them as well as how to ensure they are not written to file?
I see two things that may be causing a problem. But first, I want to suggest a few tips.
When you open a file, it is useful to use the /GET_LUN keyword because it allows IDL to find and use a logical unit number (LUN) that is available (e.g., in case you left LUN 3 open somewhere else). When you print formatted data, you should specify the total width and number of decimal places. It will make things easier because then you need not worry about changing spacings between numbers in a file.
So I would change your first set of code to the following (or some variant of the following):
OPENW,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
PRINTF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine
So the first potential issue I see is that if your variable power was calculated using something like FFT.pro, then it will be a complex float or complex double, depending on the input and keywords used. Each element then carries a real and an imaginary part rather than a single number, which does not match a two-column format.
The second potential issue may be an incomplete format statement. You did not tell PRINTF the field width, the number of decimal places, or how many values to expect per line, so it has to fall back on defaults, and that may result in the characters you show. The symbols you list are the names of ASCII control characters (DLE, DC1, CAN and so on) as displayed by your viewer, so how they appear may also depend on the software you are using to look at the files (e.g., I would not recommend using Word to open text files; use a text editor).
Side Note: You can open and read the file you just wrote in a similar fashion to what I showed above, but changed to the following:
n = FILE_LINES(filename[0])
freq = DBLARR(n)
power = DBLARR(n)
OPENR,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
READF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine

SAS - Generate Variable File Name Correctly

I'm trying to generate a variable file name.
ods pdf file = "D:\FileDirectory\&&mFileNameVariable&I .pdf" notoc;
This generates a variable file name but adds a space before the extension (eg. FileName .pdf; I need FileName.pdf).
I read that you could do something like this:
ods pdf file = "D:\FileDirectory\&&mFileNameVariable&I..pdf" notoc;
That should add the dot for the extension; however, when I try it, the macro variable doesn't resolve and I get the literal text instead (e.g. &&mFileNameVariable&I.pdf).
I'm assuming it's because my string ends with "&I".
Another solution I thought of, although it seems like an unnecessary workaround, is to trim(FilePathAndName) and/or concatenate the values separately with cats(of FilePathAndName FileExtension).
Any insight or feedback is much appreciated, thank you in advance for your time and help.
Cheers!
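Since you are doing two passes through the macro resolution process, you need an extra period between the filename and the extension: three in total. Each pass consumes one period as a macro variable name delimiter, and the third is the literal separator. With the values below, the first pass resolves &&mFileNameVariable&l...pdf to &mFileNameVariable1..pdf, and the second pass resolves that to myfile.pdf.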
e.g.
%let mFileNameVariable1=myfile;
%let l=1;
ods pdf file="C:\Temp\&&mFileNameVariable&l...pdf" notoc; /*note 3 periods!!*/
In the log:
NOTE: Writing ODS PDF output to DISK destination "C:\Temp\myfile.pdf", printer "PDF".

How can I add and remove bytes from the start of a file?

I'm trying to open an existing file and save some bytes at the start of it, so that I can read them back later.
How can I do that? The "&" operator isn't working for this type of data.
I'm using Encoding.UTF8.GetBytes("text") to convert the info to bytes and then add them.
Help, please.
You cannot add to or remove from the beginning of a file. It just doesn’t work. Instead, you need to read the whole file, and then write a new file with the modified data. (You can, however, replace individual bytes or chunks of bytes in a file without needing to touch the whole file.)
Secondly, you wrote:
I'm using Encoding.UTF8.GetBytes("text") to convert info to bytes and then add them.
You’re doing something wrong. Apparently you’ve read text data from the file and are now trying to convert it to bytes. This is the wrong way of doing it. Do not read text from the file, read the bytes directly (e.g. via My.Computer.FileSystem.ReadAllBytes). Raw byte data and text (i.e. String) are two fundamentally different concepts, do not confuse them. Do not convert needlessly to and fro.
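The read-everything-then-rewrite approach can be sketched like this. It is written in Python rather than VB.NET, so it shows the pattern and not the .NET calls; the function names prepend_bytes and remove_leading_bytes, the ".tmp" suffix, and the demo file are assumptions for illustration. Note that the file is handled as raw bytes throughout, and only the new marker text is encoded.

```python
import os
import tempfile

def prepend_bytes(path, prefix):
    """Rewrite the file so that `prefix` comes before its old contents."""
    with open(path, "rb") as src:
        original = src.read()                 # raw bytes, never decoded to text
    temp_path = path + ".tmp"
    with open(temp_path, "wb") as dst:
        dst.write(prefix)
        dst.write(original)
    os.replace(temp_path, path)               # swap the new file into place

def remove_leading_bytes(path, count):
    """Strip `count` bytes from the start of the file and return them."""
    with open(path, "rb") as src:
        removed = src.read(count)
        remainder = src.read()
    temp_path = path + ".tmp"
    with open(temp_path, "wb") as dst:
        dst.write(remainder)
    os.replace(temp_path, path)
    return removed

# Demo on a throwaway binary file (hypothetical contents).
demo_path = os.path.join(tempfile.mkdtemp(), "payload.bin")
with open(demo_path, "wb") as out:
    out.write(b"\x00\x01binary body")

tag = "text".encode("utf-8")                  # only the marker is text-encoded
prepend_bytes(demo_path, tag)
with open(demo_path, "rb") as check:
    with_prefix = check.read()
recovered = remove_leading_bytes(demo_path, len(tag))
with open(demo_path, "rb") as check:
    restored = check.read()
```

Reading the whole file into memory is fine for small files; for large ones the same idea applies, copying in blocks instead.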