Recursive expression whose only base case is an exception [Context: Reading from files in OCaml] - file-io

Edit: Disregard this question! See comments below.
I want an OCaml expression which is passed a file (as an "in_channel"), then reads the file line by line, doing some processing, to the end, then returns the result of the processing.
I wrote this test:
let rec sampler_string file string_so_far =
try
let line = input_line file in
let first_two_letters = String.sub line 0 2 in
sampler_string file (string_so_far ^ first_two_letters)
with End_of_file -> string_so_far;;
let a = sampler_string (open_in Sys.argv.(1)) "";;
(Here the "doing some processing" is adding the first two characters of each line to a running tally, and the idea is that at the end a string containing the first two characters of every line should be returned.)
This doesn't work: OCaml thinks that "sampler_string" produces something of type unit, rather than of type string. (Difficulties then occur later when I try to use the result as a string.) I think this problem is because the only base case happens in an exception (the End_of_file).
So, a specific question and a general question:
Is there a way to fix this code, by explicitly telling OCaml to expect that the result of sampler_string should be a string?
Is there some standard, better syntax for a routine which reads a file line by line to the end, and returns the result of line-by-line processing?

As Damien Pollet says, your sampler_string function compiles fine (and runs correctly) on my machine as well, ocaml v3.12.0. However, I'll answer your questions:
You can specify types on your functions/values using the : operator. For example, here's your function with its types annotated. Note that the return type goes at the very end of the function declaration, just before the = sign.
let rec sampler_string (file : in_channel) (string_so_far : string) : string = ...
I do not know if there's a better way of reading a file, line-by-line. It certainly is a pain to be forced to deal with an end-of-file via exception. Here's a blog post on the subject, though the function presented there is of reading a file into a list of lines. Another mailing list version.
A couple of nitpicks:
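You don't need to use ;; to separate function/value definitions in a source file; the compiler knows that each top-level let starts a new definition. (;; is only needed in the interactive toplevel or before a bare top-level expression.)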
You should close your file channels (close_in) when you are done with them.
String.sub will raise Invalid_argument if your file has a line with fewer than 2 characters.
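Putting the type annotations and the nitpicks above together, a minimal sketch could look like the following. It assumes OCaml 4.08 or later (for Fun.protect and the match ... with exception form); prefix and sample_file are hypothetical helper names that do not appear in the question.

```ocaml
(* First [n] characters of [s], or all of [s] when it is shorter.
   Hypothetical helper: avoids the Invalid_argument from String.sub. *)
let prefix (n : int) (s : string) : string =
  String.sub s 0 (min n (String.length s))

(* Concatenate the first two characters of every line of [file].
   The value case of [match ... with exception] is in tail position. *)
let rec sampler_string (file : in_channel) (string_so_far : string) : string =
  match input_line file with
  | line -> sampler_string file (string_so_far ^ prefix 2 line)
  | exception End_of_file -> string_so_far

(* Hypothetical wrapper: open [path], run the sampler, and close the
   channel whether or not an exception escapes. *)
let sample_file (path : string) : string =
  let file = open_in path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr file)
    (fun () -> sampler_string file "")
```

With this shape, a call such as sample_file Sys.argv.(1) replaces the original last line, and short or empty lines simply contribute fewer characters.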

A major point of style is avoiding recursive calls inside an exception handler. Such calls are not in tail position, so you will blow the stack with a sufficiently large file. Use this pattern instead:
let rec sampler_string file string_so_far =
match try Some (input_line file) with End_of_file -> None with
| Some line ->
let first_two_letters = String.sub line 0 2 in
sampler_string file (string_so_far ^ first_two_letters)
| None -> string_so_far
Of course a better functional strategy is to abstract away the recursive schema:
let rec fold_left_lines f e inch =
match try Some (input_line inch) with End_of_file -> None with
| Some line -> fold_left_lines f (f e line) inch
| None -> e
since "doing things with the lines of a file" is a generally useful operation in and of itself (counting lines, counting words, finding the longest line, parsing, etc. are all particular instances of this schema). Then your function is:
let sampler_string file string_so_far =
fold_left_lines (fun string_so_far line ->
let first_two_letters = String.sub line 0 2 in
string_so_far ^ first_two_letters)
string_so_far file
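As a rough illustration of how the same schema covers the other tasks mentioned above, here is a sketch that reuses fold_left_lines to count lines and to find the longest line. The names count_lines and longest_line are made up for this example.

```ocaml
(* Same fold as above, repeated so the example is self-contained. *)
let rec fold_left_lines f e inch =
  match try Some (input_line inch) with End_of_file -> None with
  | Some line -> fold_left_lines f (f e line) inch
  | None -> e

(* Count the lines remaining on the channel. *)
let count_lines inch = fold_left_lines (fun n _ -> n + 1) 0 inch

(* Return the longest remaining line (the first one wins on ties),
   or "" if the channel has no lines left. *)
let longest_line inch =
  fold_left_lines
    (fun best line ->
      if String.length line > String.length best then line else best)
    "" inch
```

Each fold consumes the channel, so the file has to be reopened (or rewound with seek_in) before running a second one.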

As Matias pointed out, it's first important to move the recursive call outside the try/with expression so it can be tail-call optimized.
However, there is a semi-standard solution for this: use Batteries Included. Batteries provides an abstraction, Enums, of the concept of iterating over something. Its IO infrastructure then provides the BatIO.lines_of function, which returns an enumeration of the lines of a file. So your whole function can become this:
fold (fun s line -> s ^ String.sub line 0 2) "" (BatIO.lines_of file)
The enum will automatically close the file when it is exhausted or garbage collected.
The code can be made more efficient (avoiding the repeated concatenation) with a buffer:
let buf = Buffer.create 2048 in
let () = iter (fun line -> Buffer.add_string buf (String.sub line 0 2))
(BatIO.lines_of file) in
Buffer.contents buf
Basically: Batteries can save you a lot of time and effort in code like this.
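If pulling in Batteries is not an option, the buffer idea also works with the standard library alone. The following is only a sketch under that assumption (OCaml 4.02 or later for match ... with exception); unlike the Batteries enum, it does not close the channel for you, and it truncates rather than fails on lines shorter than two characters.

```ocaml
(* Collect the first two characters of every line of [file] into a
   Buffer, avoiding the repeated string concatenation. *)
let sampler_string (file : in_channel) : string =
  let buf = Buffer.create 2048 in
  let rec loop () =
    match input_line file with
    | line ->
      (* Take at most two characters so short lines do not raise. *)
      Buffer.add_string buf (String.sub line 0 (min 2 (String.length line)));
      loop ()
    | exception End_of_file -> Buffer.contents buf
  in
  loop ()
```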

Related

How do I read a line of input from a user, until an EOF is hit, in GNU Prolog?

I have been reading the GNU Prolog documentation to figure out how to read a line of input until an end_of_file atom is reached. Here is my pseudocode for writing such a goal:
read_until_end(Chars, Out):
if peek_char unifies with end_of_file, Out = Chars
otherwise, get the current character, add it to a buffer, and keep reading
I implemented that like this:
read_until_end(Chars, Out) :-
peek_char(end_of_file) -> Out = Chars;
peek_char(C) -> read_until_end([C | Chars], Out).
prompt(Line) :-
write('> '),
read_until_end([], Line).
Here's what happens in the REPL:
| ?- prompt(Line).
> test
Fatal Error: global stack overflow (size: 32768 Kb, reached: 32765 Kb, environment variable used: GLOBALSZ)
If I print out C for the second branch of read_until_end, I can see that peek_char always gives me the same character, 'b'. I think that I need a way to progress some type of input character index or something like that, but I can't find a way to do so in the documentation. If I knew a way, I would probably have to use recursion to progress such a pointer, since I can't have any mutable state, but aside from that, I do not know what to do. Does anyone have any advice?
You are using peek_char/1 to get the next character, but that predicate does not consume the character from the stream (it just "peeks" at it). Your code therefore recurses forever on the same character until the global stack overflows.
You should use get_char/1 to read and consume the character from the stream, and reverse/2 the list of collected chars:
read_until_end(Chars, Out) :-
get_char(Char),
(
Char = end_of_file -> reverse(Chars, Out)
;
read_until_end([Char | Chars], Out)
).
To avoid the need to reverse the list you may slightly modify your procedures to build the list in order (without using an accumulator):
read_until_end(Output) :-
get_char(Char),
(
Char = end_of_file -> Output=[]
;
(
Output=[Char|NOutput],
read_until_end(NOutput)
)
).
prompt(Line) :-
write('> '),
read_until_end(Line).

Do grammar subparse on a file

Let's say you only want to parse the start of a large file using a Perl 6 grammar. To avoid reading the whole file into a string and then calling subparse on that string, is it possible to run subparse while reading the file?
I could not find any subparsefile() method in the Grammar class, so I guess this is difficult to implement. But it should be possible in theory, see for example How do I search a file for a multiline pattern without reading the whole file into memory?
Currently you can't. Parsing anything at the moment requires the entire string to exist in memory.
Having said that, if you know the maximum number of lines your pattern may expand over, you could do something like:
my $max = 3; # maximum number of lines
for "textfile".IO.lines(:!chomp).rotor( $max => -$max + 1 ) -> @lines {
$grammar.subparse( @lines.join )
# and whatever you would like to do
}
It wouldn't be the fastest way of doing it, but it would not have to read the whole file in memory.

SWI-Prolog predicate for reading in lines from input file

I'm trying to write a predicate to accept a line from an input file. Every time it's used, it should give the next line, until it reaches the end of the file, at which point it should return false. Something like this:
database :-
see('blah.txt'),
loop,
seen.
loop :-
accept_line(Line),
write('I found a line.\n'),
loop.
accept_line([Char | Rest]) :-
get0(Char),
C =\= "\n",
!,
accept_line(Rest).
accept_line([]).
Obviously this doesn't work. It works for the first line of the input file and then loops endlessly. I can see that I need to have some line like "C =\= -1" in there somewhere to check for the end of the file, but I can't see where it'd go.
So an example input and output could be...
INPUT
this is
an example
OUTPUT
I found a line.
I found a line.
Or am I doing this completely wrong? Maybe there's a built in rule that does this simply?
In SWI-Prolog, the most elegant way to do this is to first use a DCG to describe what a "line" means, and then use library(pio) to apply the DCG to a file.
An important advantage of this is that you can then easily apply the same DCG also on queries on the toplevel with phrase/2 and do not need to create a file to test the predicate.
There is a DCG tutorial that explains this approach, and you can easily adapt it to your use case.
For example:
:- use_module(library(pio)).
:- set_prolog_flag(double_quotes, codes).
lines --> call(eos), !.
lines --> line, { writeln('I found a line.') }, lines.
line --> ( "\n" ; call(eos) ), !.
line --> [_], line.
eos([], []).
Example usage:
?- phrase_from_file(lines, 'blah.txt').
I found a line.
I found a line.
true.
Example usage, using the same DCG to parse directly from character codes without using a file:
?- phrase(lines, "test1\ntest2").
I found a line.
I found a line.
true.
This approach can be very easily extended to parse more complex file contents as well.
If you want to read into code lists, see library(readutil), in particular read_line_to_codes/2 which does exactly what you need.
You can of course use the character I/O primitives, but at least use the ISO predicates. "Edinburgh-style" I/O is deprecated, at least for SWI-Prolog. Then:
get_line(L) :-
get_code(C),
get_line_1(C, L).
get_line_1(-1, []) :- !. % EOF
get_line_1(0'\n, []) :- !. % EOL
get_line_1(C, [C|Cs]) :-
get_code(C1),
get_line_1(C1, Cs).
This is of course a lot of unnecessary code; just use read_line_to_codes/2 and the other predicates in library(readutil).
Since strings were introduced to Prolog, there are some new nifty ways of reading. For example, to read all input and split it to lines, you can do:
read_string(user_input, _, S),
split_string(S, "\n", "", Lines)
See the examples in read_string/5 for reading linewise.
PS. Drop the see and seen etc. Instead:
setup_call_cleanup(open(Filename, read, In),
read_string(In, N, S), % or whatever reading you need to do
close(In))

Fortran runtime error: Bad integer for item 0 in list input?

How do I fix the Fortran runtime error: Bad integer for item 0 in list input?
Below is the Fortran program which generates a runtime error.
CHARACTER CNFILE*(*)
REAL BOX
INTEGER CNUNIT
PARAMETER ( CNUNIT = 10 )
INTEGER NN
OPEN ( UNIT = CNUNIT, FILE = CNFILE, STATUS = 'OLD' )
READ ( CNUNIT,* ) NN, BOX
The error message received from gdb is :
At line 688 of file MCNPT.f (unit = 10, file = 'LATTICE-256.txt')
Fortran runtime error: Bad integer for item 0 in list input
[Inferior 1 (process 3052) exited with code 02]
(gdb)
I am not sure what options must be specified for READ() to read two numbers from the text file. Does it matter whether the two numbers on the same line are written as an integer or a real in the text file?
Below is the gdb execution of the program using a break point at the open call
Breakpoint 1, readcn (
cnfile=<error reading variable: Cannot access memory at address 0x7fffffffdff0>,
box=-3.37898272e+33, _cnfile=30) at MCNPT.f:686
Since you did not specify form="unformatted" on the open statement, the unit / file is opened for formatted IO. This is appropriate for a human-readable text file. ("unformatted" would be used for a non-human readable file in computer-native format, sometimes called "binary".) Therefore you should provide a format on the read, or use list-directed read, i.e., read(unit, *). To advise on a particular format we would have to know the layout of the numbers in the file. A possible read with format is: read (CNUNIT, '(I4, 2X, F6.2)' ) NN, BOX
P.S. I'm answering the question in your question and not the title, which seems unrelated.
EDIT: now that you have shown the text data file, a list-directed read looks easier. That is because the data doesn't line up in columns. It seems that the file has two integers on the first line, then three real numbers on each of the following lines. Most likely you need a different read for the first line. Is the code sample that you are showing us trying to read the first line, or one of the later lines? If the first line, it would seem plausible to read into two integer variables. If a later line, into two or three real variables: two if you wish to skip the third data item on the line.
EDIT 2: the question has been substantially altered several times, which is very confusing. The first line of the text file that was shown in one version of the question contained integers, with later lines having reals. Since the list-directed read is reading into an integer and a real variable, it will fail with "Bad integer" if you use it on the later lines, whose first value is a real.

Reading comment lines correctly in an input file using Fortran 90

It is my understanding that Fortran, when reading data from a file, will skip lines starting with an asterisk (*), assuming that they are comments. However, I can't get this behavior in a very simple program I created. This is my simple Fortran program:
1 program test
2
3 integer dat1
4
5 open(unit=1,file="file.inp")
6
7 read(1,*) dat1
8
9
10 end program test
This is "file.inp":
1 *Hello
2 1
I built my simple program with
gfortran -g -o test test.f90
When I run, I get the error:
At line 7 of file test.f90 (unit = 1, file = 'file.inp')
Fortran runtime error: Bad integer for item 1 in list input
When I run the input file with the comment line deleted, i.e.:
1 1
The code runs fine. So it seems to be a problem with Fortran correctly interpreting that comment line. It must be something exceedingly simple I'm missing here, but I can't turn up anything on google.
Fortran doesn't automatically skip comment lines in input files. You can do this easily enough by first reading the line into a string, then checking the first character for your comment symbol (or searching the whole string for that symbol), and, if the line is not a comment, doing an "internal read" of the string to obtain the numeric value.
Something like:
use, intrinsic :: iso_fortran_env
character (len=200) :: line
integer :: dat1, RetCode
read_loop: do
read (1, '(A)', iostat=RetCode) line
if ( RetCode == iostat_end) exit read_loop
if ( RetCode /= 0 ) then
... read error
exit read_loop
end if
if ( index (line, "*") /= 0 ) cycle read_loop
read (line, *) dat1
end do read_loop
Fortran does not ignore anything by default, unless you are using namelists and in that case comments start with an exclamation mark.
I found the use of the backspace statement to be a lot more intuitive than the proposed solutions. The following subroutine skips the line when a comment character, "#" is encountered at the beginning of the line.
subroutine skip_comments(fileUnit)
integer, intent(in) :: fileUnit
character(len=1) :: firstChar
firstChar = '#'
do while (firstChar .eq. '#')
read(fileUnit, '(A)') firstChar
enddo
backspace(fileUnit)
end subroutine skip_comments
This subroutine may be used in programs before the read statement like so:
open(unit=10, file=filename)
call skip_comments(10)
read(10, *) a, b, c
call skip_comments(10)
read(10, *) d, e
close(10)
Limitations for the above implementation:
It will not work if the comment is placed between the values of a variable spanning multiple lines, say an array.
It is very inefficient for large input files since the entire file is re-read from the beginning till the previous character when the backspace statement is encountered.
Can only be used for sequential access files, i.e. typical ASCII text files. Files opened with the direct or append access types will not work.
However, I find it a perfect fit for short files used for providing user-parameters.