Reading file line by line in Lua

Reading file line by line in Lua - file-io

As per Lua documentation, file:read("*l") reads next line skipping end of line.
Note:- "*l": reads the next line skipping the end of line, returning nil on end of file. This is the default format
Is this documentation right? Because file:read("*l") reads the current line,instead of next line or my understanding is wrong? Pretty confusing...

Lua manages files using the same model of the underlying C implementation (this model is used also by other programming languages and it is fairly common). If you are not familiar with this way of looking at files, the terminology could be unclear, indeed.
In this model a file is represented as a stream of bytes having a so called current position. The current position is a sort of conceptual pointer to the first byte in the file that will be read or written by the next I/O operation. When you open a file for reading, a new stream is set-up so that its current position is the beginning of the file, i.e. the current position "points" to the first byte in the file.
In Lua you manage streams through so-called file handles, which are a sort of intermediaries for the underlying streams. Any operation you perform using the handle is carried over to the corresponding stream.
Lua io.open opens a file, associates a C stream with it and returns a file handle that represents that stream:
local file_handle = io.open( "myfile.txt" ) -- file opened for reading
Therefore, if you perform any operation that reads some bytes (usually interpreted as characters, if you work with text files) those are read from the stream and for each byte read the current position of the stream advances by one, pointing each time to the next byte to be read.
Lua documentation implies this model. Thus when it says next line, it means that the input operation will read all characters in the stream starting from the current position until an end-of-line character is found.
Note that if you look at text files as a sequence of lines you could be misled, since you could think of a "current line" and a "next line". That would be an higher level model compared to the C model. There is no "current line" in C. In C text files are nothing more than a sequence of bytes where some special characters (end-of-line characters) undergo some special treatment (which is mostly implementation-dependent) and are used by some C standard functions as line terminators, i.e. as marks to detect when stop reading characters.
Another source of confusion for newbies or people coming from higher level languages is that in C, for an historical accident, bytes are handled as characters (the basic data type to handle single bytes is char, which is the smallest numeric type in C!). Therefore for people with a C background it is natural to think of bytes as characters and vice versa.
Although Lua is a much higher level language than C, its close relationship with C (it was designed to be easily interfaced with C code) makes it inherit part of this C "bytes-as-characters" approach. In fact, for example, Lua strings can hold arbitrary bytes and can be used to process raw binary data.

Like Lorenso said above, read starts at the current file position and reads from that position some portion of the file. How much of the file it reads depends on read instruction. For reference, in Lua 5.3:
"*all" : reads to the end of the file
"*line" : reads from the current position to the end of the line.
The end of the line is marked by a special character usually denoted
LfCr (Line feed, carriage return )
"*number" : reads a number, that is, it will read up to the end of what
it recognizes in the text as a number, stopping at, for example, a
comma ",".
num : reads a string with up to num characters
Here's an example that reads a file with a list of numbers into an array (a table), then returns the array. (Just change the "*number" to "*line" and it would read a file line by line):
function read_array(file)
local arr = {}
local handle = assert( io.open(file,"r") )
local value = handle:read("*number")
while value do
table.insert( arr, value )
value = handle:read("*number")
end
handle:close()
return arr
end

Related

How to read a binary file with TCL

So I have a function I'm using to read data from a file. It works fine if the file is plain text, but when I try to read a binary file, like a png, it returns a different text (diff confirms that). I opened a hex editor to see what was wrong and found out it is putting some c2 bytes along with the file (I don't know if the position is random or if there are other bytes except this c2 one).
This is my function. I just want it to read and save to a variable.
proc read_file {path} {
set channel [open $path r]
fconfigure $channel -translation binary
set return_string "[read $channel]"
close $channel
return "$return_string"
}
To actually print, I'm doing this:
puts -nonewline [read_file file.png]

When you open a file, it defaults to being in text mode . In text mode (which is really a combination of options) the IO layer translates characters from whatever encoding they are in into Tcl's internal encoding, and does the reverse operation on output. The default encoding scheme is platform specific, but in your case it sounds like it is UTF-8. (Tcl uses a complex internal system of encodings; it doesn't expose those to the outside world.)
By contrast, when you put the channel into binary mode, the bytes on the outside are directly mapped to characters in the range 0-255 (and vice versa on output). You get a perfect copy, provided you put both input and output channels in binary mode. (There are other optimisations for binary mode, but they don't matter here.)
When you only put one of the channels in binary mode, you get what looks like corruption. It isn't random though. In particular, when the input is binary but the output is UTF-8, input bytes in the range 128-255 get converted into multiple output bytes, where the first of those bytes is in the sort of range you observed. There are other combinations that mess things up; the whole range of problems is collectively known as mojibake.
tl;dr Don't mix up binary and text data unless you're very careful. The results of getting it wrong are "surprising".

Directly read a particular line in fortran

I want to read a particular line of a file, e.g. the 3rd line of the input.dat file. My present code :
Program Read_a_line
Implicit None
Integer:: i
Real*8:: x,y
open (10, file='input.dat', status='old')
do i=1,3
read (10,*) x, y
end do
print*,'x=',x,' y=',y
End Program Read_a_line
However, the code reads all the data till it reaches the 3rd line. Can we just read the 3rd line? Can we read several particular lines, eg. the 2nd and 4th lines only.
Online available examples do a similar trick. I was wondering if there exists a direct way in modern fortran version.
I'm a bit curious!

if you have fixed size records you can seek to the correct point
see also Can I move the file pointer to a particular (byte) location in a formatted file?

Fortran runtime error: Bad integer for item 0 in list input?

How do I fix the Fortran runtime error: Bad integer for item 0 in list input?
Below is the Fortran program which generates a runtime error.
CHARACTER CNFILE*(*)
REAL BOX
INTEGER CNUNIT
PARAMETER ( CNUNIT = 10 )
INTEGER NN
OPEN ( UNIT = CNUNIT, FILE = CNFILE, STATUS = 'OLD' )
READ ( CNUNIT,* ) NN, BOX
The error message received from gdb is :
At line 688 of file MCNPT.f (unit = 10, file = 'LATTICE-256.txt')
Fortran runtime error: Bad integer for item 0 in list input
[Inferior 1 (process 3052) exited with code 02]
(gdb)
I am not sure what options must be specified for READ() to read to numbers from the text file. Does it matter if the two numbers on the same line are specified as either an integer or a real in the text file?
Below is the gdb execution of the program using a break point at the open call
Breakpoint 1, readcn (
cnfile=<error reading variable: Cannot access memory at address 0x7fffffffdff0>,
box=-3.37898272e+33, _cnfile=30) at MCNPT.f:686

Since you did not specify form="unformatted" on the open statement, the unit / file is opened for formatted IO. This is appropriate for a human-readable text file. ("unformatted" would be used for a non-human readable file in computer-native format, sometimes called "binary".) Therefore you should provide a format on the read, or use list-directed read, i.e., read(unit, *). To advise on a particular format we would have to know the layout of the numbers in the file. A possible read with format is: read (CNUINT, '(I4, 2X, F6.2)' ) NN, BOX
P.S. I'm answering the question in your question and not the title, which seems unrelated.
EDIT: now that you are show the text data file, a list-directed read looks easier. That is because the data doesn't line up in columns. It seems that the file has two integers on the first line, then three real numbers on each of the following lines. Most likely you need a different read for the first line. Is the code sample that you are showing us trying to read the first line, or one of the later lines? If the first line, it would seem plausible to read into two integer variables. If a later line, into two or three real variables. Two if you wish to skip the third data item on the line.
EDIT 2: the question has been substantially altered several times, which is very confusing. The first line of the text file that was shown in one version of the question contained integers, with later lines having reals. Since the listed-directed read is reading into an integer and a floating variable, it will have problems if you attempt to use it on the later lines that have two real values.

'Bad repeat count' while inputting a file, FORTRAN

I am trying to read a file into my code.
there are 2 subroutines, one which writes a file and the other which reads it.
the writing part was:
write(*,*)'entered refile, shall make file'
ileunitA=int(presentstep)
write(fname,1012)ileunitA
1012 format('DATA_',i6.6,'.dat')
write(fnam,1112)index
1112 format('pp',i3.3)
open(UNIT=ileunitA,FILE=fname)
!variables from module global
write(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
write(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
write(ileunita,*) scon,smomu,smomv,smomw
...
The reading part was as follows(in another subroutine):
ileunita=25;
open(unit=ILEUNITA,file='DATA_010500.dat')
!variables from module global
read(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
read(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
read(ileunita,*) scon,smomu,smomv,smomw
...
When I run the code, it shows the following error:
At line 3682 of file bub2.f90 (unit = 25, file = 'DATA_000001.dat')
Fortran runtime error: Bad repeat count in item 1 of list input
Can anyone help me figure out what could be the problem? And what is 'repeat count'. What is a 'bad' repeat count? Thanks

Guessing a little (you could show the text in the problematic line in your question...), but you are using list directed input (and output) with the * as the second specifier in the read (and write) statements. List directed input allows multiple fields that have the same value to be represented using the syntax r*c, where r is a numeric repeat count and c is the value to be repeated.
If any of your output items generate a field that contains a * then that could be confusing the processing of input.
(It is permissible (though rare) for a processor to represent multiple output fields that have the same value using a repeat count, for example WRITE (unit,*) 23, 23, 23, 23 could result in an input file that contains the text 4*23.)
List directed input also has some other features, such as the handling of delimiter characters, the / character causing input processing to terminate and the possibility and handling of null values. Some of these features may surprise those not familiar with the rules (which are inspired by typical short cuts taken when input was submitted via punched cards), which why it is often better to avoid list directed input and output and use an explicit format instead.
If any of your data fields are of type character you should consider using a non-default DELIM mode to avoid any special characters within the character variable value from confusing the input processing.

Can Fortran read bytes directly from a binary file?

I have a binary file that I would like to read with Fortran. The problem is that it was not written by Fortran, so it doesn't have the record length indicators. So the usual unformatted Fortran read won't work.
I had a thought that I could be sneaky and read the file as a formatted file, byte-by-byte (or 4 bytes by 4 bytes, really) into a character array and then convert the contents of the characters into integers and floats via the transfer function or the dreaded equivalence statement. But this doesn't work: I try to read 4 bytes at a time and, according to the POS output from the inquire statement, the read skips over like 6000 bytes or so, and the character array gets loaded with junk.
So that's a no go. Is there some detail in this approach I am forgetting? Or is there just a fundamentally different and better way to do this in Fortran? (BTW, I also tried reading into an integer*1 array and a byte array. Even though these codes would compile, when it came to the read statement, the code crashed.)

Yes.
Fortran 2003 introduced stream access into the language. Prior to this most processors supported something equivalent as an extension, perhaps called "binary" or similar.
Unformatted stream access imposes no record structure on the file. As an example, to read data from the file that corresponds to a single int in the companion C processor (if any) for a particular Fortran processor:
USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_INT
INTEGER, PARAMETER :: unit = 10
CHARACTER(*), PARAMETER :: filename = 'name of your file'
INTEGER(C_INT) :: data
!***
OPEN(unit, filename, ACCESS='STREAM', FORM='UNFORMATTED')
READ (unit) data
CLOSE(unit)
PRINT "('data was ',I0)", data
You may still have issues with endianess and data type size, but those aspects are language independent.
If you are writing to a language standard prior to Fortran 2003 then unformatted direct access reading into a suitable integer variable may work - it is Fortran processor specific but works for many of the current processors.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas