How do I read a line of input from a user, until an EOF is hit, in GNU Prolog? - input

I have been reading the GNU Prolog documentation to figure out how to read a line of input until an end_of_file atom is reached. Here is my pseudocode for writing such a goal:
read_until_end(Chars, Out):
if peek_char unifies with end_of_file, Out = Chars
otherwise, get the current character, add it to a buffer, and keep reading
I implemented that like this:
read_until_end(Chars, Out) :-
peek_char(end_of_file) -> Out = Chars;
peek_char(C) -> read_until_end([C | Chars], Out).
prompt(Line) :-
write('> '),
read_until_end([], Line).
Here's what happens in the REPL:
| ?- prompt(Line).
> test
Fatal Error: global stack overflow (size: 32768 Kb, reached: 32765 Kb, environment variable used: GLOBALSZ)
If I print out C for the second branch of read_until_end, I can see that peek_char always gives me the same character, 'b'. I think that I need a way to progress some type of input character index or something like that, but I can't find a way to do so in the documentation. If I knew a way, I would probably have to use recursion to progress such a pointer, since I can't have any mutable state, but aside from that, I do not know what to do. Does anyone have any advice?

You are using peek_char/1 to get the next character, but that predicate does not consume the character from the stream (it just "peeks" at it). Every recursive call therefore sees the same character, so the recursion never terminates and eventually overflows the global stack.
You should use get_char/1 to read and consume the character from the stream, and reverse/2 the list of collected chars:
read_until_end(Chars, Out) :-
    get_char(Char),
    (   Char = end_of_file
    ->  reverse(Chars, Out)
    ;   read_until_end([Char | Chars], Out)
    ).
To avoid the need to reverse the list you may slightly modify your procedures to build the list in order (without using an accumulator):
read_until_end(Output) :-
    get_char(Char),
    (   Char = end_of_file
    ->  Output = []
    ;   Output = [Char|NOutput],
        read_until_end(NOutput)
    ).
prompt(Line) :-
    write('> '),
    read_until_end(Line).
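The peek-versus-consume distinction described above is not specific to Prolog. As a rough illustration only (Python rather than GNU Prolog, with a hypothetical helper named read_all), the sketch below shows that peeking twice returns the same byte, while reading advances the stream until end of file:

```python
import io

def read_all(stream):
    """Collect every byte until end of file, consuming as we go."""
    collected = []
    while True:
        byte = stream.read(1)      # read() consumes; b"" signals end of file
        if byte == b"":
            return b"".join(collected)
        collected.append(byte)

stream = io.BufferedReader(io.BytesIO(b"test\n"))
first_peek = stream.peek(1)[:1]    # peek() may return more than asked for, so slice
second_peek = stream.peek(1)[:1]   # still the same byte: nothing was consumed
rest = read_all(stream)            # the peeked byte is still there to be read
print(first_peek, second_peek, rest)
```

A loop built only on the peek call would never reach the end-of-file branch, which is the same failure mode as the original read_until_end/2.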

Related

Null char returning from reading a file in Common Lisp

I’m reading files and storing them as a string using this function:
(defun file-to-str (path)
(with-open-file (stream path) :external-format 'utf-8
(let ((data (make-string (file-length stream))))
(read-sequence data stream)
data)))
If the file has only ASCII characters, I get the content of the files as expected; but if there are characters beyond 127, I get a null character (^#), at the end of the string, for each such character beyond 127. So, after $ echo "~a^?" > ~/teste I get
CL-USER> (file-to-string "~/teste")
"~a^?
"
; but after echo "aaa§§§" > ~/teste , the REPL gives me
CL-USER> (file-to-string "~/teste")
"aaa§§§
^#^#^#"
and so forth. How can I fix this? I’m using SBCL 1.4.0 in an utf-8 locale.
First of all, your keyword argument :external-format is misplaced and has no effect. It should be inside the parentheses with stream and path. However, this makes no difference to the end result, as UTF-8 is the default encoding here.
The problem is that in UTF-8, different characters take a different number of bytes to encode. ASCII characters all encode into single bytes, but other characters take 2-4 bytes. Your string is allocated with one element per byte of the input file, not one per character. The elements that are never filled keep their initial value; make-string initializes them as ^#.
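The byte-versus-character mismatch can be checked with a small sketch. This is Python rather than Common Lisp and is meant only as an illustration of the counting argument, using the file content from the question:

```python
text = "aaa§§§\n"                 # the file content from the question
encoded = text.encode("utf-8")    # what is actually stored on disk

char_count = len(text)            # 7 characters
byte_count = len(encoded)         # 10 bytes: each § takes two bytes in UTF-8
unused = byte_count - char_count  # 3 buffer slots that are never filled
print(char_count, byte_count, unused)
```

The three unused slots correspond to the three null characters seen at the end of the string in the question.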
The read-sequence function returns the index of the first element it did not modify. You are currently discarding this value, but you should use it to trim the buffer once you know how many elements were filled:
(defun file-to-str (path)
  (with-open-file (stream path :external-format :utf-8)
    (let* ((data (make-string (file-length stream)))
           (used (read-sequence data stream)))
      (subseq data 0 used))))
This is safe, as the length of the file in bytes is always greater than or equal to the number of UTF-8 characters encoded in it. However, it is not terribly efficient, as it allocates an unnecessarily large buffer and then copies the whole output into a new string to return it.
While this is fine for a learning experiment, for real-world use cases I recommend the Alexandria utility library that has a ready-made function for this:
* (ql:quickload "alexandria")
To load "alexandria":
Load 1 ASDF system:
alexandria
; Loading "alexandria"
* (alexandria:read-file-into-string "~/teste")
"aaa§§§
"
*

How to read elements from a line in VHDL?

I'm trying to use VHDL to read from a file that can have different formats. I know you're supposed to use the following two lines of code to read a line at a time, then read individual elements in that line.
readline(file, aline);
read(aline, element);
However my question is what will read(aline, element) return into element? What will it return if the line is empty? What will it return if I've used it let's say 5 times and my line only has 4 characters?
The reason I want to know is that if I am reading a file with an arbitrary number of spaces between valid data, how do I parse this valid data?
The file contains ASCII characters separated by arbitrary amounts of white space (any number of spaces, tabs, or new lines). If the line starts with a # that line is a comment and should be ignored.
Outside of these comments, the first part of the file contains characters that are only letters or numbers in combinations of variable size. In other words this:
123 ABC 12ABB3
However, the majority of the file (after a certain number of read words) will be purely numbers of arbitrary length, separated by an arbitrary amount of white space. In other words, the second part of the file is this:
255 0 2245 625 430
2222 33 111111
and I must be able to parse these numbers (and interpret them as such) individually.
As mentioned in the comments, all the read procedures in std.textio and ieee.std_logic_textio skip over leading spaces apart from the character and string versions (because a space is as much a character as any other).
You can test whether a line variable (the buffer) is empty like this:
if L'length > 0 then
where L is your line variable. There is also a set of overloaded read procedures with an extra status output:
procedure read (L : inout LINE;
VALUE: out <type> ;
GOOD : out BOOLEAN);
The extra output, GOOD, is true if the read was successful and false if it wasn't. The advantage of these is that if the read is unsuccessful, the simulation does not stop (as it does with the regular procedures). Also, with the versions in std.textio, if the read is unsuccessful, the read is non-destructive (i.e. whatever you were trying to read remains in the buffer). This is not the case with the versions in ieee.std_logic_textio, however.
If you really do not know what format you are trying to read, you could read the entire line into a string, like this:
variable S : string(1 to <some big number>);
...
readline(F, L);
assert L'length < S'length; -- make sure S is big enough
S := (others => ' '); -- make sure that the previous line is overwritten
if L'length > 0 then
  read(L, S(1 to L'length));
end if;
The line L is now in the string S. You can then write some code to parse it. You may find the type attribute 'value useful. This converts a string to some type, eg
variable I : integer;
...
I := integer'value(S(12 to 14));
would set integer I to the value contained in elements 12 to 14 of string S.
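The parsing task from the question (ignore comment lines, split on arbitrary white space, then treat the later tokens as numbers) can be sketched independently of VHDL. The following is Python, not VHDL, and only an illustration of the logic; the function name parse_tokens and the header_count parameter (how many leading tokens are alphanumeric words) are hypothetical:

```python
def parse_tokens(lines, header_count):
    """Split lines into tokens; the first header_count stay strings, the rest become ints."""
    tokens = []
    for line in lines:
        if line.startswith("#"):      # comment line: ignore it entirely
            continue
        tokens.extend(line.split())   # split() with no argument skips runs of white space
    header = tokens[:header_count]
    numbers = [int(tok) for tok in tokens[header_count:]]
    return header, numbers

sample = [
    "# a comment",
    "123 ABC   12ABB3",
    "",
    "255 0 2245\t625 430",
    "2222 33 111111",
]
header, numbers = parse_tokens(sample, 3)
print(header, numbers)
```

In VHDL the same shape would fall out of reading each line into a string and walking it token by token, using 'value for the numeric part as described above.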
Another approach, as suggested by user1155120 below, is to peek at the values in the buffer, eg
if L'length > 0 then -- check that the L isn't empty, otherwise the next line blows up
if L.all(1) = '#' then
-- the first character of the line is a '#' so the line must be a comment

SWI-Prolog predicate for reading in lines from input file

I'm trying to write a predicate to accept a line from an input file. Every time it's used, it should give the next line, until it reaches the end of the file, at which point it should return false. Something like this:
database :-
see('blah.txt'),
loop,
seen.
loop :-
accept_line(Line),
write('I found a line.\n'),
loop.
accept_line([Char | Rest]) :-
get0(Char),
C =\= "\n",
!,
accept_line(Rest).
accept_line([]).
Obviously this doesn't work. It works for the first line of the input file and then loops endlessly. I can see that I need to have some line like "C =\= -1" in there somewhere to check for the end of the file, but I can't see where it'd go.
So an example input and output could be...
INPUT
this is
an example
OUTPUT
I found a line.
I found a line.
Or am I doing this completely wrong? Maybe there's a built in rule that does this simply?
In SWI-Prolog, the most elegant way to do this is to first use a DCG to describe what a "line" means, and then use library(pio) to apply the DCG to a file.
An important advantage of this is that you can then easily apply the same DCG also on queries on the toplevel with phrase/2 and do not need to create a file to test the predicate.
There is a DCG tutorial that explains this approach, and you can easily adapt it to your use case.
For example:
:- use_module(library(pio)).
:- set_prolog_flag(double_quotes, codes).
lines --> call(eos), !.
lines --> line, { writeln('I found a line.') }, lines.
line --> ( "\n" ; call(eos) ), !.
line --> [_], line.
eos([], []).
Example usage:
?- phrase_from_file(lines, 'blah.txt').
I found a line.
I found a line.
true.
Example usage, using the same DCG to parse directly from character codes without using a file:
?- phrase(lines, "test1\ntest2").
I found a line.
I found a line.
true.
This approach can be very easily extended to parse more complex file contents as well.
If you want to read into code lists, see library(readutil), in particular read_line_to_codes/2 which does exactly what you need.
You can of course use the character I/O primitives, but at least use the ISO predicates. "Edinburgh-style" I/O is deprecated, at least for SWI-Prolog. Then:
get_line(L) :-
    get_code(C),
    get_line_1(C, L).
get_line_1(-1, []) :- !. % EOF
get_line_1(0'\n, []) :- !. % EOL
get_line_1(C, [C|Cs]) :-
    get_code(C1),
    get_line_1(C1, Cs).
This is of course a lot of unnecessary code; just use read_line_to_codes/2 and the other predicates in library(readutil).
Since strings were introduced to Prolog, there are some new nifty ways of reading. For example, to read all input and split it to lines, you can do:
read_string(user_input, _, S),
split_string(S, "\n", "", Lines)
See the examples in read_string/5 for reading linewise.
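One thing to watch for when splitting the whole input on newlines: input that ends with a newline generally yields an extra empty element at the end. The sketch below shows the effect in Python, purely as an illustration; it is an assumption worth checking that split_string/4 with these arguments behaves the same way on your input:

```python
text = "this is\nan example\n"   # input that ends with a newline

naive = text.split("\n")         # keeps an empty string after the last newline
lines = text.splitlines()        # treats the final newline as a terminator instead
print(naive)
print(lines)
```

If the trailing empty element matters, it has to be dropped after splitting or avoided by reading linewise.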
PS. Drop the see and seen etc. Instead:
setup_call_cleanup(open(Filename, read, In),
read_string(In, N, S), % or whatever reading you need to do
close(In))

Reading a character string of unknown length

I have been tasked with writing a Fortran 95 program that will read character input from a file, and then (to start with) simply spit it back out again.
The tricky part is that these lines of input are of varying length (no maximum length given) and there can be any number of lines within the file.
I've used
do
read( 1, *, iostat = IO ) DNA ! reads to EOF -- GOOD!!
if ( IO < 0 ) exit ! if EOF is reached, exit do
I = I + 1
NumRec = I ! used later for total no. of records
allocate( Seq(I) )
Seq(I) = DNA
print*, I, Seq(I)
X = Len_Trim( Seq(I) ) ! length of individual sequence
print*, 'Sequence size: ', X
print*
end do
However, my initial statements list
character(100), dimension(:), allocatable :: Seq
character(100) DNA
and the appropriate integers etc.
I guess what I'm asking is if there is any way to NOT list the size of the character strings in the first instance. Say I've got a string of DNA that is 200+ characters, and then another that is only 25, is there a way that the program can just read what there is and not need to include all the additional blanks? Can this be done without needing to use len_trim, since it can't be referenced in the declaration statements?
To progressively read a record in Fortran 95, use non-advancing input. For example:
CHARACTER(10) :: buffer
INTEGER :: size
READ (unit, "(A)", ADVANCE='NO', SIZE=size, EOR=10, END=20) buffer
will read up to 10 characters worth (the length of buffer) each time it is called. The file position will only advance to the next record (the next line) once the entire record has been read by a series of one or more non-advancing reads.
Barring an end of file condition, the size variable will be defined with the actual number of characters read into buffer each time the read statement is executed.
The EOR and END specifiers are used to control execution flow (execution will jump to the appropriately labelled statement) when end of record or end of file conditions occur, respectively. You can also use an IOSTAT specifier to detect these conditions, but the particular negative values to use for the two conditions are processor dependent.
You can sum size within a particular record to work out the length of that particular record.
Wrap such a non-advancing read in a loop that detects the end of file and end of record conditions and you have the incremental reading part.
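As a rough, language-neutral illustration of that loop, here is a Python sketch. It is not Fortran and the name read_record is hypothetical; readline(chunk_size) stands in for one non-advancing read into a fixed-size buffer, a trailing newline plays the role of the end of record condition, and an empty result plays the role of end of file:

```python
import io

def read_record(stream, chunk_size=10):
    """Return the next record without its newline, or None at end of file."""
    pieces = []
    while True:
        chunk = stream.readline(chunk_size)  # at most chunk_size characters
        if chunk == "":                      # end of file
            return "".join(pieces) if pieces else None
        if chunk.endswith("\n"):             # end of record reached
            pieces.append(chunk[:-1])
            return "".join(pieces)
        pieces.append(chunk)                 # buffer was filled; keep reading

stream = io.StringIO("ACGTACGTACGTACGT\nAC\n\nTT")
records = []
while True:
    record = read_record(stream)
    if record is None:
        break
    records.append(record)
print([len(r) for r in records])
```

Summing the chunk lengths for one record gives the record length, which is the quantity the Fortran approach needs before it can declare a character variable of the right size.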
In Fortran 95, the length specification for a local character variable must be a specification expression - essentially an expression that can be safely evaluated prior to the first executable statement of the scope that contains the variable's declaration. Constants represent the simplest case, but a specification expression in a procedure can involve dummy arguments of that procedure, amongst other things.
Reading in an entire record of arbitrary length is then a multi-stage process:
Determine the length of the current record by using a series of incremental reads. These incremental reads for a particular record finish when the end of record condition occurs, at which time the file position will have moved to the next record.
Backspace the file back to the record of interest.
Call a procedure, passing the length of the current record as an argument. Inside that procedure have a character variable whose length is given by the dummy argument.
Inside that called procedure, read the current record into that character variable using normal advancing input.
Carry out further processing on that character variable!
Note that each record ends up being read twice - once to determine its length, the second to actually read the data into the correctly "lengthed" character variable.
Alternative approaches exist that use allocatable (or automatic) character arrays of length one. The overall strategy is the same. Look at the code of the Get procedures in the common ISO_VARYING_STRING implementation for an example.
Fortran 2003 introduces deferred length character variables, which can have their length specified by an arbitrary expression in an allocate statement or, for allocatable variables, by the length of the right hand side in an assignment statement. This (in conjunction with other "allocatable" enhancements) allows the progressive read that determines the record length to also build the character variable that holds the contents of the record. Your supervisor needs to bring his Fortran environment up to date.
Here's a function for Fortran 2003 that sets an allocatable string (InLine) to exactly the length of the input line (optionally trimmed), and returns .false. at end of file:
function ReadLine(aunit, InLine, trimmed) result(OK)
  integer, intent(IN) :: aunit
  character(LEN=:), allocatable, optional :: InLine
  logical, intent(in), optional :: trimmed
  integer, parameter :: line_buf_len= 1024*4
  character(LEN=line_buf_len) :: InS
  logical :: OK, set
  integer status, size
  OK = .false.
  set = .true.
  do
    read (aunit,'(a)',advance='NO',iostat=status, size=size) InS
    OK = .not. IS_IOSTAT_END(status)
    if (.not. OK) return
    if (present(InLine)) then
      if (set) then
        InLine = InS(1:size)
        set=.false.
      else
        InLine = InLine // InS(1:size)
      end if
    end if
    if (IS_IOSTAT_EOR(status)) exit
  end do
  if (present(trimmed) .and. present(InLine)) then
    if (trimmed) InLine = trim(adjustl(InLine))
  end if
end function ReadLine
For example to do something with all lines in a file with unit "aunit" do
character(LEN=:), allocatable :: InLine
do while (ReadLine(aunit, InLine))
[.. something with InLine]
end do
I have used the following. Let me know if it is better or worse than yours.
!::::::::::::::::::::: SUBROUTINE OR FUNCTION :::::::::::::::::::::::::::::::::::::::
!__________________ SUBROUTINE lineread(filno,cargout,ios) __________________________
subroutine lineread(filno,cargout,ios)
Use reallocate,ErrorMsg,SumStr1,ChCount
! this subroutine reads
! 1. following row in a file except a blank line or the line begins with a !#*
! 2. the part of the string until first !#*-sign is found or to end of string
!
! input Arguments:
! filno (integer) input file number
!
! output Arguments:
! cargout (character) output character string, converted so that all unnecessary spaces/tabs/control characters are removed.
implicit none
integer,intent(in)::filno
character*(*),intent(out)::cargout
integer,intent(out)::ios
integer::nlen=0,i,ip,ich,isp,nsp,size
character*11,parameter::sep='=,;()[]{}*~'
character::ch,temp*100
character,pointer::crad(:)
nullify(crad)
cargout=''; nlen=0; isp=0; nsp=0; ich=-1; ios=0
Do While(ios/=-1) !The eof() isn't standard Fortran.
READ(filno,"(A)",ADVANCE='NO',SIZE=size,iostat=ios,ERR=9,END=9)ch ! start reading file
! read(filno,*,iostat=ios,err=9)ch;
if(size>0.and.ios>=0)then
ich=iachar(ch)
else
READ(filno,"(A)",ADVANCE='no',SIZE=size,iostat=ios,EOR=9); if(nlen>0)exit
end if
if(ich<=32)then ! tab(9) or space(32) character
if(nlen>0)then
if(isp==2)then
isp=0;
else
isp=1;
end if
end if; cycle;
elseif(ich==33.or.ich==35.or.ich==38)then !if char is comment !# or continue sign &
READ(filno,"(A)",ADVANCE='yes',SIZE=size,iostat=ios,EOR=9)ch; if(nlen>0.and.ich/=38)exit;
else
ip=scan(ch,sep);
if(isp==1.and.ip==0)then; nlen=nlen+1; crad=>reallocate(crad,nlen); nsp=nsp+1; endif
nlen=nlen+1; crad=>reallocate(crad,nlen); crad(nlen)=ch;
isp=0; if(ip==1)isp=2;
end if
end do
9 if(size*ios>0)call ErrorMsg('Met error in reading file in [lineread]',-1)
! ios<0: Indicating an end-of-file or end-of-record condition occurred.
if(nlen==0)return
!write(6,'(a,l)')SumStr1(crad),eof(filno)
!do i=1,nlen-1; write(6,'(a,$)')crad(i:i); end do; if(nlen>0)write(6,'(a)')crad(i:i)
cargout=SumStr1(crad)
nsp=nsp+1; i=ChCount(SumStr1(crad),' ',',')+1;
if(len(cargout)<nlen)then
call ErrorMsg(SumStr1(crad)// " is too long!",-1)
!elseif(i/=nsp.and.nlen>=0)then
! call ErrorMsg(SumStr1(crad)// " has unrecognizable data number!",-1)
end if
end subroutine lineread
I'm using Fortran 90 to do this:
X = Len_Trim( Seq(I) ) ! length of individual sequence
write(*,'(a<X>)') Seq(I)(1:X)
You can simply declare Seq to be a large character string and then trim it as you write it out. I don't know how kosher this solution is, but it certainly works for my purpose. I know that some compilers do not support "variable format expressions", but there are various workarounds to do the same thing almost as simply.
GNU Fortran variable expression workaround.

Recursive expression whose only base case is an exception [Context: Reading from files in OCaml]

Edit: Disregard this question! See comments below.
I want an OCaml expression which is passed a file (as an "in_channel"), then reads the file line by line, doing some processing, to the end, then returns the result of the processing.
I wrote this test:
let rec sampler_string file string_so_far =
try
let line = input_line file in
let first_two_letters = String.sub line 0 2 in
sampler_string file (string_so_far ^ first_two_letters)
with End_of_file -> string_so_far;;
let a = sampler_string (open_in Sys.argv.(1)) "";;
(Here the "doing some processing" is adding the first two characters of each line to a running tally, and the idea is that at the end a string containing the first two characters of every line should be returned.)
This doesn't work: OCaml thinks that "sampler_string" produces something of type unit, rather than of type string. (Difficulties then occur later when I try to use the result as a string.) I think this problem is because the only base case happens in an exception (the End_of_file).
So, a specific question and a general question:
Is there a way to fix this code, by explicitly telling OCaml to expect that the result of sampler_string should be a string?
Is there some standard, better syntax for a routine which reads a file line by line to the end, and returns the result of line-by-line processing?
As Damien Pollet says, your sampler_string function compiles fine (and runs correctly) on my machine as well, ocaml v3.12.0. However, I'll answer your questions:
You can specify types on your functions/values using the : operator. For example, here's your function with its types annotated. You'll notice that the return type is put at the very end of the function declaration.
let rec sampler_string (file : in_channel) (string_so_far : string) : string = ...
I do not know if there's a better way of reading a file, line-by-line. It certainly is a pain to be forced to deal with an end-of-file via exception. Here's a blog post on the subject, though the function presented there is of reading a file into a list of lines. Another mailing list version.
A couple of nitpicks:
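You don't need ;; to separate function/value definitions in a source file; the compiler can tell where one top-level definition ends because the next one begins with let.
You should close your input channel (close_in) when you are done with it.
String.sub will raise an exception (Invalid_argument) if your file has a line with fewer than 2 characters.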
A major point of style is avoiding recursive calls inside the body of a try ... with expression. Such calls are not in tail position (the handler must stay installed until they return), so you will blow the stack with a sufficiently large file. Use this pattern instead:
let rec sampler_string file string_so_far =
  match try Some (input_line file) with End_of_file -> None with
  | Some line ->
    let first_two_letters = String.sub line 0 2 in
    sampler_string file (string_so_far ^ first_two_letters)
  | None -> string_so_far
Of course a better functional strategy is to abstract away the recursive schema:
let rec fold_left_lines f e inch =
  match try Some (input_line inch) with End_of_file -> None with
  | Some line -> fold_left_lines f (f e line) inch
  | None -> e
since "doing things with the lines of a file" is a generally useful operation in and of itself (counting lines, counting words, finding the longest line, parsing, etc. are all particular instances of this schema). Then your function is:
let sampler_string file string_so_far =
  fold_left_lines (fun string_so_far line ->
      let first_two_letters = String.sub line 0 2 in
      string_so_far ^ first_two_letters)
    string_so_far file
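The same "fold over the lines of a file" schema exists in most languages. As a hedged cross-language illustration (Python, not OCaml; the name fold_lines is hypothetical and an in-memory stream stands in for the file), it might look like this:

```python
import io
from functools import reduce

def fold_lines(f, initial, stream):
    """Combine the lines of stream left to right with f, starting from initial."""
    stripped = (line.rstrip("\n") for line in stream)  # lazily drop line terminators
    return reduce(f, stripped, initial)

sample = io.StringIO("hello\nworld\nab\n")
result = fold_lines(lambda acc, line: acc + line[:2], "", sample)
print(result)
```

Unlike String.sub, the slice line[:2] does not raise on lines shorter than two characters, so this sketch sidesteps the short-line problem rather than reproducing it.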
As Matias pointed out, it's first important to move the recursive call outside the try/with expression so it can be tail-call optimized.
However, there is a semi-standard solution for this: use Batteries Included. Batteries provides an abstraction, Enums, of the concept of iterating over something. Its IO infrastructure then provides the BatIO.lines_of function, which returns an enumeration of the lines of a file. So your whole function can become this:
fold (fun s line -> s ^ String.sub line 0 2) "" (BatIO.lines_of file)
The enum will automatically close the file when it is exhausted or garbage collected.
The code can be made more efficient (avoiding the repeated concatenation) with a buffer:
let buf = Buffer.create 2048 in
let () = iter (fun line -> Buffer.add_string buf (String.sub line 0 2))
    (BatIO.lines_of file) in
Buffer.contents buf
Basically: Batteries can save you a lot of time and effort in code like this.