Trying to figure out the structure of an input file from the code that reads it

I'm troubleshooting some code written in Fortran, but I'm not yet familiar with the language. I have the following code (an excerpt), and basically I have to 'reverse-engineer' the structure of the input file.
OPEN(10, FILE=TRIM(FILENAME)//'.txt', STATUS='OLD')  ! FILENAME is a hypothetical stand-in: TRIM needs a character argument, not the integer 10
READ(10, *) ! header
READ(10, *) NB_TEMP
ALLOCATE(TEMP(abs(NB_TEMP)))
IF (NB_TEMP <0) THEN
  NB_TEMP = ABS(NB_TEMP)
  READ(10, *) TEMP_0, TEMP_D
  TEMP(1) = TEMP_0
  DO I=2, NB_TEMP
    TEMP(I) = TEMP(I-1) + TEMP_D
  ENDDO
ELSEIF (NB_TEMP>0) THEN
  READ(10,*) TEMP(:)
ENDIF
READ(10, *) NB_PRS
ALLOCATE(PRS(ABS(NB_PRS)))
IF (NB_PRS<0) THEN
  NB_PRS = ABS(NB_PRS)
  READ(10, *) PRS_0, PRS_D
  PRS(1) = PRS_0*PI/180.
  DO I=2, NB_PRS
    PRS(I) = PRS(I-1) + PRS_D*PI/180.
  ENDDO
ELSEIF(NB_PRS>0) THEN
  READ(10,*) PRS(:)
  DO I=1, NB_PRS
    PRS(I) = PRS(I)*PI/180.
  ENDDO
ENDIF
So, I know the .txt file is opened first. Then, the first value I read is NB_TEMP. I don't understand what is happening at the second READ command: does the program read the same values again but store them differently, as NB_PRS? I don't have an error or anything; I'm simply trying to understand what this code does line by line and what the structure of the input text file would look like.
Thanks in advance!

Each READ statement starts reading at a new line (record) of the file. Let us examine them one by one:
READ(10, *) ! header
Ignore the first line
READ(10, *) NB_TEMP
Read an integer value into NB_TEMP.
READ(10, *) TEMP_0, TEMP_D
Read two real values. This happens only if NB_TEMP is negative; the two values are the starting value and the step of an arithmetic sequence stored in TEMP(:). This follows from TEMP(I) = TEMP(I-1) + TEMP_D.
READ(10,*) TEMP(:)
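Read NB_TEMP real values and store them in the TEMP(:) array. This happens if NB_TEMP is positive. The values are normally on one line; with list-directed input the READ continues onto the following lines if the current one holds fewer than NB_TEMP values.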
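The same applies to PRS, where the program branches depending on whether the integer NB_PRS is positive or negative. The only difference is that every PRS value (or the start and the step) is multiplied by PI/180, so the file gives PRS in degrees and the program stores it in radians.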
Here are two valid inputs, the way I interpret this code:
! TEST INPUT #1, DEFINED TEMP and PRS
6
32.0000 40.0000 60.0000 90.0000 120.0000 130.0000
9
0.00000 5.0000 15.0000 20.0000 25.0000 30.0000 45.0000 55.0000 60.0000
! TEST INPUT #2, SEQUENCE TEMP and DEFINED PRS
-6
30.0000 10.0000
9
0.00000 5.0000 15.0000 20.0000 25.0000 30.0000 45.0000 55.0000 60.0000
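To check this interpretation, here is a rough Python sketch (Python only for brevity; read_axis and parse_input are hypothetical names, not part of the original program) that reads a file laid out this way and fills TEMP and PRS the way the Fortran snippet does. It assumes each READ's values sit on a single line, as in the test inputs above, and it ignores Fortran precision details and the exact value of PI used by the original code.

```python
import math


def read_axis(lines, scale=1.0):
    """Mimic one NB_xxx block of the Fortran snippet.

    The count line is read first: a negative count means a start/step pair
    follows, a positive count means that many explicit values follow.
    Each value (or the start and the step) is multiplied by `scale`.
    """
    count = int(next(lines).split()[0])
    n = abs(count)
    if count < 0:
        start, step = (float(tok) * scale for tok in next(lines).split()[:2])
        values = [start]
        for _ in range(n - 1):
            values.append(values[-1] + step)  # X(I) = X(I-1) + step
        return values
    if count > 0:
        return [float(tok) * scale for tok in next(lines).split()[:n]]
    return []  # count == 0: the Fortran code reads no values at all


def parse_input(text):
    """Return (temp, prs) as the Fortran snippet would fill TEMP and PRS."""
    lines = iter(text.splitlines())
    next(lines)  # READ(10, *) ! header -> the first line is skipped
    temp = read_axis(lines)
    prs = read_axis(lines, scale=math.pi / 180.0)  # degrees -> radians
    return temp, prs


example = """! TEST INPUT #2, SEQUENCE TEMP and DEFINED PRS
-6
30.0000 10.0000
9
0.00000 5.0000 15.0000 20.0000 25.0000 30.0000 45.0000 55.0000 60.0000
"""
temp, prs = parse_input(example)
print(temp)  # [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
```

With test input #2, the line "-6" followed by "30.0000 10.0000" expands to six temperatures starting at 30 in steps of 10.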


How do you detect blank lines in Fortran?

Given an input that looks like the following:
123
456
789

42
23
1337

3117
I want to iterate over this file in chunks separated by blank lines, in Fortran (any version is fine). For example, let's say I wanted to take the average of each chunk (e.g. mean(123, 456, 789), then mean(42, 23, 1337), then mean(3117)).
I've tried iterating through the file normally (e.g. with READ), reading each line as a string, converting it to an integer and doing whatever math I want on each chunk. The trouble is that Fortran "helpfully" ignores the blank lines in my text file, so when I try to compare against the empty string to detect a blank line, the comparison never evaluates to .TRUE.
I feel like I'm missing something basic here; this is typical functionality in every other modern language, so I'd be surprised if Fortran didn't have it somehow.
If you're using so-called "list-directed" input (format = '*'), Fortran applies special handling to spaces, commas, and blank lines; in particular, blank lines are skipped while it looks for the next value.
One related feature is the BLANK= specifier of the READ statement:
read(iunit,'(i10)',blank="ZERO",err=1,end=2) array
You can set:
blank="ZERO" treats nonleading blanks in a numeric field as zeros;
blank="NULL" (the default) ignores nonleading blanks.
In both modes, a numeric field that is entirely blank (such as a blank line read with '(i10)') is read as zero.
If all your input values are nonzero, you could use blank="ZERO" and then use the locations of the zero values to split your data.
EDIT: as @vladimir-f has correctly pointed out, you have blanks not only between the chunks (the blank lines) but also after the numbers in most lines; blank="ZERO" turns those trailing blanks into extra zero digits, so this strategy will not work.
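To see why trailing blanks break the blank="ZERO" idea, here is a rough Python emulation of how a single I10 field is interpreted in each blank mode. It is an illustration under simplifying assumptions (one field per record, short records padded with blanks as with the default PAD='YES', no error handling), not a model of any particular compiler; read_i10 is a hypothetical name.

```python
def read_i10(record, blank="NULL"):
    """Roughly emulate reading one integer with the '(i10)' format.

    Assumptions: a single field per record, records shorter than 10
    characters are padded with blanks, no error handling.
    """
    field = record[:10].ljust(10)  # the I10 field, blank-padded
    digits = field.lstrip()        # leading blanks are never significant
    if digits == "":
        return 0                   # an all-blank field reads as zero
    if blank == "ZERO":
        digits = digits.replace(" ", "0")  # nonleading blanks become zeros
    else:
        digits = digits.replace(" ", "")   # BLANK='NULL': they are ignored
    return int(digits)


print(read_i10("123"))                # 123
print(read_i10("123", blank="ZERO"))  # 1230000000
print(read_i10(""))                   # 0
```

A left-justified "123" followed by padding is read as 123 in the default mode but as 1230000000 with blank="ZERO", while a blank line is 0 in both modes.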
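You can instead load everything into an array and process it afterwards. With the '(i10)' format and the default blank mode, each blank line is read as 0, which marks where one chunk ends: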
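program array_with_blanks
  implicit none
  integer :: ierr, num, iunit
  integer, allocatable :: array(:)
  open(newunit=iunit, file='stackoverflow', form='formatted', status='old', iostat=ierr)
  if (ierr /= 0) stop 'cannot open input file'
  allocate(array(0))
  do
    ! with '(i10)', a blank line is read as 0
    read(iunit, '(i10)', iostat=ierr) num
    if (is_iostat_end(ierr)) then
      exit
    else if (ierr /= 0) then
      stop 'read error'
    else
      array = [array, num]   ! grow the array (reallocation on assignment)
    end if
  end do
  close(iunit)
  print *, array
end program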
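The answer leaves the "process it afterwards" step open. As a language-neutral illustration, written in Python with a hypothetical helper name chunk_means, the post-processing could look like this, assuming, as above, that genuine data values are never zero:

```python
def chunk_means(values):
    """Split a flat list at zeros (blank lines read as 0) and return the
    mean of each chunk. Assumes real data values are never zero."""
    means = []
    chunk = []
    for v in list(values) + [0]:  # trailing sentinel closes the last chunk
        if v == 0:
            if chunk:             # several blank lines in a row: skip empties
                means.append(sum(chunk) / len(chunk))
            chunk = []
        else:
            chunk.append(v)
    return means


print(chunk_means([123, 456, 789, 0, 42, 23, 1337, 0, 3117]))
```

The same loop translates directly to Fortran: keep a running sum and count, and emit the mean whenever a zero (or the end of the array) is reached.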
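Just read each line into a character variable (but note Francescalus's comment on the format), then read the value from that variable as an internal file. Unlike list-directed input, a read with the "( a )" format does not skip blank lines, and line == "" is true for an all-blank line because Fortran pads the shorter operand with blanks when comparing strings.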
program stuff
  implicit none
  integer io, n, value, sum
  character (len=1000) line
  n = 0
  sum = 0
  io = 0
  open( 42, file="stuff.txt" )
  do while( io == 0 )
    read( 42, "( a )", iostat = io ) line
    if ( io /= 0 .or. line == "" ) then
      if ( n > 0 ) print *, ( sum + 0.0 ) / n
      n = 0
      sum = 0
    else
      read( line, * ) value
      n = n + 1
      sum = sum + value
    end if
  end do
  close( 42 )
end program stuff
Output:
456.000000
467.333344
3117.00000

How to detect a NaN in list-directed input in Fortran

I read real valued data from an input file with list directed input. What is the best way to detect a NaN via iostat=iostatus?
As an example, the input for a large program requires me to read a file like:
1 NaN 1.0
2 nan 2.0
where there are occasionally "NaN"s instead of real valued numbers.
With READ(ird, *, iostat=iostatus) nr, r1, r2
no error is flagged in iostatus by gfortran 7.4.0 on Ubuntu!
Apparently the input is simply read as a quiet NaN instead of a signalling NaN.
The gfortran compiler options do not seem to provide an option to turn this into a signalling NaN.
The fortran standard at
https://j3-fortran.org/doc/year/18/18-007r1.pdf
does not specify the treatment of a NaN in list-directed input.
Section 13.7.2.3.2 covers formatted input, and it specifically states that a NaN on input is treated as a quiet NaN.
I found documentation for the NAG compiler that states explicitly that a NaN on input is never flagged as a signalling NaN.
The philosophy behind these decisions in the standard beats me. I would have expected this to be an error.
If the line is changed to:
read(ird, *, err=10, iostat=iostatus) nr, r1, r2
10 continue
the iostatus changes to +5001! But I do not really want to resort to the ancient err= clause.
I am aware that I could read the line into a character string, search it for "NaN" and proceed accordingly, or, as in the example, check ieee_is_nan(). That seems unnecessarily tedious, however, given the iostat= clause.
So, is there a compiler option that I missed?
PROGRAM main
USE ieee_arithmetic
IMPLICIT NONE
INTEGER, PARAMETER :: IRD = 12
INTEGER :: iostatus
REAL :: r1 = -1.0, r2 = -1.0
INTEGER :: nr = 0
CHARACTER(LEN=1024) :: line = ' '
OPEN(UNIT=IRD,FILE='data.data', STATUS='old')
READ(IRD, *, IOSTAT=iostatus, IOMSG=line) nr, r1, r2
WRITE(*,*) iostatus, nr, r1, r2, ieee_is_nan(r1), line(1:LEN_TRIM(line))
r1 = -1.0
r2 = -1.0
READ(IRD, ERR=10, IOSTAT=iostatus, IOMSG=line) nr, r1, r2
10 CONTINUE
WRITE(*,*) iostatus, nr, r1, r2, ieee_is_nan(r1), line(1:LEN_TRIM(line))
CLOSE(UNIT=IRD)
END PROGRAM main
The output of my program is (insignificant digits and blanks shortened):
0 1 NaN 1.00 T
5001 1 -1.00 -1.00 F
Missing format for FORMATTED data transfer
At the first read iostatus is set to 0, while on the second read, whose READ statement has no * format (as the message says), it is set to 5001!
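The same situation arises outside Fortran: Python's float(), for example, also accepts "NaN" and "nan" without raising an error, so the only option there is the check-after-read approach mentioned in the question. A minimal Python sketch of that check (parse_record is a hypothetical name, not part of the original program):

```python
import math


def parse_record(line):
    """Parse one 'nr r1 r2' record and report whether a real field is NaN.

    float() accepts "NaN" and "nan" without raising, so the NaN has to be
    detected after the conversion, much like ieee_is_nan() in Fortran.
    """
    fields = line.split()
    nr = int(fields[0])
    reals = [float(tok) for tok in fields[1:3]]
    has_nan = any(math.isnan(x) for x in reals)
    return nr, reals, has_nan


for rec in ["1 NaN 1.0", "2 2.5 2.0"]:
    print(parse_record(rec))
```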

Octave strread can't return parsed results to an array (?)

In Octave, I am reading very large text files from disk and parsing them. The function textread() does just what I want except for the way it is implemented. Looking at the source, textread.m pulls the entire text file into memory before attempting to parse lines. If the text file is large, it fills all my free RAM (16 GB) with text and then starts saving back to disk (virtual memory), before parsing. If I wait long enough, textread() will complete, but it takes almost forever.
Notice that after parsing into a matrix of floating point values, the same data fit into memory quite easily. So I'm using textread() in an intermediate zone, where there is enough memory for the floats, but not enough memory for the same data as text.
All of that is preparation for my question, which is about strread(). The data in my text files looks like this:
0.0647148 -2.0072535 0.5644875 8.6954257
0.1294296 -8.4689583 0.6567095 144.3090450
0.1941444 -9.2658037 -1.0228742 173.8027785
0.2588593 -6.5483359 -1.5767574 90.7337329
0.3235741 -0.7646807 -0.5320896 1.7357120
... and so on. There are no header lines or comments in the file.
I wrote a function that reads the file line by line; notice the two ways I'm attempting to use strread() to parse a line of data.
function dest = readPowerSpectrumFile(filename, dest)
  % read enough lines to fill destination array
  [rows, cols] = size(dest);
  fid = fopen(filename, 'r');
  for line = 1 : rows
    lstr = fgetl(fid);
    % this line works, but is very brittle
    [dest(line, 1), dest(line, 2), dest(line, 3), dest(line, 4)] = strread(lstr, "%f %f %f %f");
    % This line doesn't work. Or anything similar I can think of.
    % dest(line, 1:4) = strread(lstr, "%f %f %f %f");
  endfor
  fclose(fid);
endfunction
Is there an elegant way of having strread return parsed values to an array? Otherwise I'll have to write a new function any time I change the number of columns.
Thanks
The format you describe is a plain matrix of floating-point values. In this case you can simply use load:
d = load ("yourfile");
This is much faster than the other functions. You can look at the implementation it uses in libinterp/corefcn/ls-mat-ascii.cc: read_mat_ascii_data
If you feed fprintf more values than its format specification consumes, it reapplies the format until all of them are used up:
>> fprintf("%d %d \n", 1:6)
1 2
3 4
5 6
It appears this also works with strread: if the format contains only one conversion but the current line holds several values, strread keeps applying it and returns all the values in a column vector. All we need to do is assign that vector to the correct row of dest:
function dest = readPowerSpectrumFile(filename, dest)
  % read enough lines to fill destination array
  [rows, cols] = size(dest);
  fid = fopen(filename, 'r');
  for line = 1 : rows
    lstr = fgetl(fid);
    % read all values from current line into column vector
    % and store values into row of dest
    dest(line,:) = strread(lstr, "%f");
    % this will also work since values are assumed to be numeric by default:
    % dest(line,:) = strread(lstr);
  endfor
  fclose(fid);
endfunction
Output:
readPowerSpectrumFile(filename, zeros(5,4))
ans =
6.4715e-02 -2.0073e+00 5.6449e-01 8.6954e+00
1.2943e-01 -8.4690e+00 6.5671e-01 1.4431e+02
1.9414e-01 -9.2658e+00 -1.0229e+00 1.7380e+02
2.5886e-01 -6.5483e+00 -1.5768e+00 9.0734e+01
3.2357e-01 -7.6468e-01 -5.3209e-01 1.7357e+00
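The idea behind strread(lstr, "%f") (convert one line at a time and let the line determine the number of columns) carries over to other languages. A rough Python equivalent, for illustration only (read_rows is a hypothetical name):

```python
from itertools import islice


def read_rows(lines, rows):
    """Parse up to `rows` lines of whitespace-separated numbers.

    `lines` can be an open file: iterating over it yields one line at a
    time, so only the current line's text is held in memory. The number of
    columns comes from each line, so no per-column format is needed.
    """
    return [[float(tok) for tok in line.split()] for line in islice(lines, rows)]


sample = [
    "0.0647148 -2.0072535 0.5644875 8.6954257\n",
    "0.1294296 -8.4689583 0.6567095 144.3090450\n",
]
print(read_rows(sample, 2))
```

As with the Octave version, changing the number of columns in the file requires no change to the function.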

awk n-gram extraction not correct

I'm currently working on an awk script which extracts all n-grams from an input file.
When I run my awk script on a file, it prints every n-gram (sorted) with its number of occurrences next to it.
When testing on an input file, it prints the n-grams in the correct order; only the numbers of occurrences are incorrect.
For extracting n-grams I have the following code:
$1=$1
line=tolower($0)
split(line,chars,"")
begin_len=0
for (i in chars){
  ngram=""
  for (ind=0;ind<n;ind++){
    ngram=ngram""chars[i+ind]
  }
  if(begin_len == 0){
    begin_len=length(ngram)
  }
  if(length(ngram) == begin_len){
    counter+=1
    freq_tabel[ngram]+=1
  }
}
(sort function not included)
I was wondering if there is something wrong in the code, or whether there are some aspects I have overlooked.
The output I should have is the following:
35383
1580 n
1323 en
1081 e
940 de
839 v
780 er
716 d
713 an
615 t
Instead, I get the following output:
34845
1561 n
1302 en
1067 e
930 de
827 v
772 er
711 d
703 an
609 t
As you can see, the n-grams are correct but the numbers of occurrences are not.
INPUT FILE: http://cl.ly/202j3r0B1342
Not an answer but may help you (assuming n=2).
Did you happen to convert the original file (which seems to be UTF-8) to latin-1? I got two sets of figures:
==> sorted.latin1_in_utf8_locale <==
1566 n
1308 en
1072 e
929 de
836 v
==> sorted.utf8_in_utf8_locale <==
1579 n
1320 en
1080 e
940 de
838 v
With latin-1 input the figures are closer to yours; with UTF-8 input they are closer to the expected ones.
However, neither matches. Scratching my head.
BTW, I am not sorting the n-grams in the script but printing them in a form suitable for piping to sort -rn. This should not cause a difference, I guess.
for (ngram in freq_tabel)
  printf "%7i %s\n", freq_tabel[ngram], ngram
I'm in your class, so here are a couple of hints:
Copy the exact input file (clone it from GitHub; don't do a raw copy).
Re-read the assignment: you're supposed to remove the leading and trailing spaces and replace every run of tabs/spaces with a single space.
Also, what's the point of the $1 = $1 on top?
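For comparison, here is a small Python sketch of the same count with the normalization from the hints applied (char_ngrams is a hypothetical name). It visits character positions in order; in awk, by contrast, the traversal order of for (i in chars) is unspecified, so an index-based loop such as for (i = 1; i <= length(line) - n + 1; i++) is the safer equivalent. The sketch is not guaranteed to reproduce the expected figures, which also depend on the input encoding discussed above.

```python
from collections import Counter


def char_ngrams(lines, n=2):
    """Count character n-grams line by line.

    Each line is lowercased, leading/trailing whitespace is removed and runs
    of spaces/tabs are collapsed to one space. Positions are visited in
    order, so every counted n-gram has exactly n characters.
    """
    counts = Counter()
    for line in lines:
        text = " ".join(line.split()).lower()
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


counts = char_ngrams(["  Ab \t ab\n", "b"])
for gram, freq in counts.most_common():
    print(f"{freq:7d} {gram}")
```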

Objective C CFSwapInt32

I want to create a simple app that swaps the bytes of 2- and 4-byte hex codes.
So it should turn 1234 into 3412. I googled and found out that I have to use the byte-order functions CFSwapInt32 and CFSwapInt16.
Here is what I already got:
NSString *byteOrder = [NSString stringWithFormat:@"%d",CFSwapInt32(12345678)];
NSLog(byteOrder);
But instead of the correct swapped bytes I get: 1315027968 as the number of the NSLog.
Can someone help me or tell me what I did wrong? :) I just want to swap the bytes so they are in reversed order:
1234 -->3412
12 34 -->34 12
12345678 -->78563412
12 34 56 78 --> 78 56 34 12
Thank you
Try
NSString *byteOrder = [NSString stringWithFormat:@"%x",CFSwapInt32(0x12345678)];
%x will output a value as hexadecimal.
Starting a number with 0x will interpret it as a hexadecimal value.
Your original number is 12345678 which, in hex, is 0x00BC614E
The output you get in the log is 1315027968 which, in hex, is 0x4E61BC00
So everything is working correctly.
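The arithmetic can be checked outside Objective-C. This Python sketch (swap16 and swap32 are illustrative names) reverses the bytes the way CFSwapInt16 and CFSwapInt32 do and prints the results in decimal and in hex:

```python
def swap16(x):
    """Reverse the byte order of a 16-bit value (what CFSwapInt16 computes)."""
    return ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)


def swap32(x):
    """Reverse the byte order of a 32-bit value (what CFSwapInt32 computes)."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "big"), "little")


print(swap32(12345678))                 # 1315027968, same as the NSLog output
print(format(swap32(12345678), "x"))    # 4e61bc00
print(format(swap32(0x12345678), "x"))  # 78563412
print(format(swap16(0x1234), "x"))      # 3412
```

The decimal result matches the number in the question; only printing it in hex makes the reversed bytes visible.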
You can try doing the same in hex if you prefer:
NSString *byteOrder = [NSString stringWithFormat:@"%x",CFSwapInt32(0x00BC614E)];
NSLog(byteOrder);
should log 4e61bc00 (%x prints lowercase hex digits without a 0x prefix), while
NSString *byteOrder = [NSString stringWithFormat:@"%x",CFSwapInt32(0x12345678)];
NSLog(byteOrder);
should log 78563412