idl strange symbols in file - file-io

I've written several IDL programs to analyse some data. To keep it simple the programs read in some time varying data and calculate the fourier spectrum. This spectrum is written to file using this code:
openw,3,filename
printf,3,[transpose(freq),transpose(power)],format='(e,e)'
close,3
The file is then read by another program using this code:
rdfloat,filename,freq,power,/double
The rdfloat procedure can be found here: http://idlastro.gsfc.nasa.gov/
The error i get when trying to read the a file is: "Input conversion error. Unit: 101"
When i delve in to the file being read, i notice several types of unrecognised characters. I dont know if these are a result of the writing to the file or some thing else related to the number of files being created (over 300 files)
These symbols/characters are in the place of a single number:
< dle> < dc1> < dc2> < dc3> < dc4> < can> < nak> < em> < soh> < syn>
Example of what appears in the file being read, Note they are not consecutive lines.
7.7346< dle>18165493007e+01 8.4796811549010105e+00
7.7354408697119453e+01 1.04459538071< dc2>1749e+01
7.7360701595839< can>28e+01 3.0447318983094189e+00
Whenever I run the procedures that write the files, there is always at least one file that has some or all of these characters. The file/s that contains these characters is always different.
Can anyone explain what these symbols are and what I might be doing to create them as well as how to ensure they are not written to file?

I see two things that may be causing a problem. But first, I want to suggest a few tips.
When you open a file, it is useful to use the /GET_LUN keyword because it allows IDL to find and use a logical unit number (LUN) that is available (e.g., in case you left LUN 3 open somewhere else). When you print formatted data, you should specify the total width and number of decimal places. It will make things easier because then you need not worry about changing spacings between numbers in a file.
So I would change your first set of code to the following (or some variant of the following):
OPENW,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
PRINTF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine
So the first potential issue I see is that if your variable power was calculated using something like FFT.pro, then it will be a complex float or complex double, depending on the input and keywords used.
The second potential issue may be due to an incorrect format statement. You did not tell PRINTF how many columns or rows to expect. It might not know how to handle the input properly, so it guesses and may result in those characters you show. Those characters may be spacing characters due to the vague format statement or the software you are using to look at the files (e.g., I would not recommend using Word to open text files, use a text editor).
Side Note: You can open and read the file you just wrote in a similar fashion to what I showed above, but changed to the following:
n = FILE_LINES(filename[0])
freq = DBLARR(n)
power = DBLARR(n)
OPENR,gunit,filename[0],/GET_LUN,ERROR=err
FOR j=0L, N_ELEMENTS(freq) - 1L DO BEGIN
READF,gunit,freq[j],power[j],FORMAT='(2e20.12)'
ENDFOR
FREE_LUN,gunit ;; this is better than using the CLOSE routine

Related

Do grammar subparse on a file

Let's say you only want to parse the start of a large file using Perl 6 grammar. In order to avoid reading the whole file into a string, and then call subparse on the string. Is it possible to do a subparse when reading the file?
I could not find any subparsefile() method in the Grammar class, so I guess this is difficult to implement. But it should be possible in theory, see for example How do I search a file for a multiline pattern without reading the whole file into memory?
Currently you can't. Parsing anything at the moment requires the entire string to exist in memory.
Having said that, if you know the maximum number of lines your pattern may expand over, you could do something like:
my $max = 3; # maximum number of lines
for "textfile".IO.lines(:!chomp).rotor( $max => -$max + 1 ) -> #lines {
#lines.join.subparse( $grammar)
# and whatever you would like to do
}
It wouldn't be the fastest way of doing it, but it would not have to read the whole file in memory.

Fortran: How to skip many lines of data file efficiently

I have a formatted data file which is typically billions of lines long, with several lines of headers of variable length. The data file takes the form:
# header 1
# header 2
# headers are of variable length.
# data begins from next line.
1.23 4.56 7.89 0.12
2.34 5.67 8.90 1.23
:
:
# billions of lines of data, each row the same length, same format.
-- end of file --
I would like to extract a portion of data from this file, and my current code looks like:
<pre>
do j=1,jmax !Suppose I want to extract jmax lines of data from the file.
[algorithm to determine number of lines to skip, "N(j)"]
!This determines the number of lines to skip from the previous file
!position, when the data was read on j-1th iteration.
!Skip N-1 lines to go to the next data line to read off:
do i=1,N-1
read(unit=unit,fmt='(A)')
end do
!Now read off the line of data I want:
read(unit=unit,fmt='(data_format)'),data1,data2,etc.
!Data is stored in some arrays.
end do
</pre>
The problem is, N(j) can be anywhere between 1 and several billion, so it takes some time to run the code.
My question is, is there a more efficient way of skipping millions of lines of data? The only way I can think of, while sticking to Fortran, is to open the file with direct access and jump to the desired line upon opening the file.
As you suggest, direct access seems like the best option. But that requires the records to all have the same length, which your headers violate. Also, why used formatted output? With a file of this length, its hard to imagine a person reading the file. If you use unformatted IO, the file will be both smaller and IO will be faster. Perhaps create two files, one with the headers (metadata) in human reader form, and the other with the data in native form. Native / binary representation means a loss of portability, which is something to consider if you want to move the files to different computer architectures or have them be useable for decades. Otherwise it's probably worth the convenience. Other options would be to use a more sophisticated file format that combines metadata and data, such as HDF5 or FITS, but for communication between two programs of one person, that's probably excessive.

Fortran runtime error: Bad integer for item 0 in list input?

How do I fix the Fortran runtime error: Bad integer for item 0 in list input?
Below is the Fortran program which generates a runtime error.
CHARACTER CNFILE*(*)
REAL BOX
INTEGER CNUNIT
PARAMETER ( CNUNIT = 10 )
INTEGER NN
OPEN ( UNIT = CNUNIT, FILE = CNFILE, STATUS = 'OLD' )
READ ( CNUNIT,* ) NN, BOX
The error message received from gdb is :
At line 688 of file MCNPT.f (unit = 10, file = 'LATTICE-256.txt')
Fortran runtime error: Bad integer for item 0 in list input
[Inferior 1 (process 3052) exited with code 02]
(gdb)
I am not sure what options must be specified for READ() to read to numbers from the text file. Does it matter if the two numbers on the same line are specified as either an integer or a real in the text file?
Below is the gdb execution of the program using a break point at the open call
Breakpoint 1, readcn (
cnfile=<error reading variable: Cannot access memory at address 0x7fffffffdff0>,
box=-3.37898272e+33, _cnfile=30) at MCNPT.f:686
Since you did not specify form="unformatted" on the open statement, the unit / file is opened for formatted IO. This is appropriate for a human-readable text file. ("unformatted" would be used for a non-human readable file in computer-native format, sometimes called "binary".) Therefore you should provide a format on the read, or use list-directed read, i.e., read(unit, *). To advise on a particular format we would have to know the layout of the numbers in the file. A possible read with format is: read (CNUINT, '(I4, 2X, F6.2)' ) NN, BOX
P.S. I'm answering the question in your question and not the title, which seems unrelated.
EDIT: now that you are show the text data file, a list-directed read looks easier. That is because the data doesn't line up in columns. It seems that the file has two integers on the first line, then three real numbers on each of the following lines. Most likely you need a different read for the first line. Is the code sample that you are showing us trying to read the first line, or one of the later lines? If the first line, it would seem plausible to read into two integer variables. If a later line, into two or three real variables. Two if you wish to skip the third data item on the line.
EDIT 2: the question has been substantially altered several times, which is very confusing. The first line of the text file that was shown in one version of the question contained integers, with later lines having reals. Since the listed-directed read is reading into an integer and a floating variable, it will have problems if you attempt to use it on the later lines that have two real values.

'Bad repeat count' while inputting a file, FORTRAN

I am trying to read a file into my code.
there are 2 subroutines, one which writes a file and the other which reads it.
the writing part was:
write(*,*)'entered refile, shall make file'
ileunitA=int(presentstep)
write(fname,1012)ileunitA
1012 format('DATA_',i6.6,'.dat')
write(fnam,1112)index
1112 format('pp',i3.3)
open(UNIT=ileunitA,FILE=fname)
!variables from module global
write(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
write(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
write(ileunita,*) scon,smomu,smomv,smomw
...
The reading part was as follows(in another subroutine):
ileunita=25;
open(unit=ILEUNITA,file='DATA_010500.dat')
!variables from module global
read(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
read(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
read(ileunita,*) scon,smomu,smomv,smomw
...
When I run the code, it shows the following error:
At line 3682 of file bub2.f90 (unit = 25, file = 'DATA_000001.dat')
Fortran runtime error: Bad repeat count in item 1 of list input
Can anyone help me figure out what could be the problem? And what is 'repeat count'. What is a 'bad' repeat count? Thanks
Guessing a little (you could show the text in the problematic line in your question...), but you are using list directed input (and output) with the * as the second specifier in the read (and write) statements. List directed input allows multiple fields that have the same value to be represented using the syntax r*c, where r is a numeric repeat count and c is the value to be repeated.
If any of your output items generate a field that contains a * then that could be confusing the processing of input.
(It is permissible (though rare) for a processor to represent multiple output fields that have the same value using a repeat count, for example WRITE (unit,*) 23, 23, 23, 23 could result in an input file that contains the text 4*23.)
List directed input also has some other features, such as the handling of delimiter characters, the / character causing input processing to terminate and the possibility and handling of null values. Some of these features may surprise those not familiar with the rules (which are inspired by typical short cuts taken when input was submitted via punched cards), which why it is often better to avoid list directed input and output and use an explicit format instead.
If any of your data fields are of type character you should consider using a non-default DELIM mode to avoid any special characters within the character variable value from confusing the input processing.

SAS : read in PDF file

I am looking for ways to read in a PDF file with SAS. Apparently this is not basic functionality and there is very little to be found on the internet. (Let alone that google is not easy with PDF in you search giving you also links to PDF documents that go about other things.)
The only things that can be found, are people looking for ways to import data into datasets from a PDF. For me, that is not even necesarry. I would like to be able to read the contents of the PDF file in one big character variable. If possible, it would even be better to be able to read in the file's binary data.
Is this possible with SAS and how? (I got it to work in Access VBA, but can't find any similar ways in SAS.)
(In the end, the purpose is to convert this to base64 and put that base64-string into an XML document.)
You probably will not be able to read the entire file into one character variable since the maximum size of a character variable is around 33 KB. A simple way to read in one line at a time, though, is something like the following:
%let pdfFileName = Test.pdf;
%let lineSize = 2000;
data base;
format text_line $&lineSize..;
infile "&pdfFileName" lrecl=&lineSize;
input text_line $;
run;
This requires that you have a general idea of the maximum record length ahead of time, but you could write additional code to determine the maximum record size prior to reading in the file. In this example each line of text is read into one character variable named "text_line." From there, you could use a RETAIN statement or double trailers (##) in the INPUT line to process multiple lines at a time. The SAS web-site has plenty of documentation on how to read and process text from various types of input files.