Red has no open function like Rebol?

I want to read the last 10 lines of a big text file without loading the whole file into memory.
I wanted to try to use open as explained here for Rebol: In Rebol, what is the idiomatic way to read a text file line by line?
But Red doesn't seem to have an open function?

You can try read/lines/seek/part %yourfile offset blocksize
There is no exact recipe, though: you have to test and adapt your offset and blocksize, as in the sketch below.
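An untested sketch of that approach (refinement support varies across Red versions, which is why you have to experiment; this version uses read/seek/part on the raw text and splits manually, and the 4096-byte blocksize is a guess that must be large enough to span the last 10 lines):

file: %yourfile.txt
blocksize: 4096                           ; guess: must cover at least 10 lines
offset: max 0 (size? file) - blocksize    ; start reading near the end of the file
chunk: read/seek/part file offset blocksize
lines: split chunk newline
probe skip tail lines -10                 ; positioned at the last 10 lines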

Red doesn't have an open function yet; full I/O support is planned for 0.7.0. So you have to either wait or use OS calls directly.

Related

How to do an incremental read of binary files

TL;DR: can I do an incremental read of binary files with Red or Rebol?
I would like to use Red to process some large (13MB to 2GB) structured binary files (Kurzweil synthesizer files). I've used other languages (C, Go, Tcl, Ruby, Dart) to walk through these files, and now I'd like to do the same with Red or Rebol.
Is there a way to incrementally read binary files, byte by byte? All I see is read/binary which seems to slurp the entire file at once (or a part of a file).
I'll need to jump around a little bit, too (either peek at the next byte, or skip to the end of a section, or skip past variable length strings to the start of data).
(Yes, I could make some helpers that tracked the position and used read/part/seek.)
I would like to make a call to the low level OS read/seek if that is possible - something new to learn.
This is on macOS, but a portable solution would be great.
Thanks!
PS: "open/read %abc" gives an error "*** Script Error: open does not allow file! for its port argument", even though the help message say the port argument is "port [port! file! url! block!]"
Rebol has ports for that, and they are planned for the 0.7.0 release of Red. So current I/O is very basic and buffer-only, and open is a preliminary stub.
"I would like to make a call to the low level OS read/seek if that is possible - something new to learn."
You can leverage the Rebol or Red/System FFI as a learning exercise.
Here is how you would do it in Rebol:
>> file: open/direct/binary %file.dat
>> until [none? probe copy/part file 20]
>> close file
#{732F7072696E74657253657474696E6773312E62}
#{696E504B01022D00140006000800000021006149}
#{0910890100001103000010000000000000000000}
...
#{000000006A290000646F6350726F70732F617070}
#{2E786D6C504B0506000000000D000D0068030000}
#{292C00000000}
none
first file (or pick file 1) will return the next byte value (an integer!).
This even works with text files: with open/lines/direct, copy/part file 20 will return 20 lines, and pick file 1 or first file returns the next line. For example:
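An untested sketch of that text-mode variant in Rebol (%file.txt is a placeholder):

>> file: open/lines/direct %file.txt
>> line: first file           ; next line, as a string!
>> batch: copy/part file 20   ; next 20 lines, as a block!
>> close file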
Soon this will be available in Red, too.

TypeError: 'str' does not support the buffer interface in Python

I'm getting this error in a Python script:
TypeError: 'str' does not support the buffer interface
The line generating the error is:
username = cred_file.readlines()[0].split(';')[0]
I'm a python beginner, any help is appreciated.
You're running a Python 2 script with Python 3. When reading from a binary stream, Python 3 returns bytes, no longer str.
You have 3 choices:
1. Run it with Python 2. Do that only if you don't have the rights or time to adapt the script; it is not recommended, as Python 3 is becoming the norm.
2. Change your code to insert a decode call (it will continue to work in Python 2):
username = cred_file.readlines()[0].decode().split(';')[0]
When a file is opened in binary mode, readlines returns a list of bytes, not str. You have to decode the bytes into a str before applying str methods.
3. Open the file with "r" instead of "rb". readlines then returns a list of str and your code works unchanged. This can sometimes be problematic on Windows because of the need to preserve carriage return (\r) characters, so look out for side effects in your code.
Note: cred_file.readlines()[0] is a questionable construction: you read all the lines of the file, then drop every line but the first. That is wasteful in both I/O and CPU terms.
Prefer cred_file.readline(), which reads just the first line.
If you need all the lines for further processing, store the result of readlines() in a list.
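Putting choices 2 and 3 together with the readline advice, a minimal sketch (the file name 'credentials.txt' and the ';' separator are illustrative, taken from the question):

# Binary mode: readline() returns bytes, so decode before using str methods.
with open('credentials.txt', 'rb') as cred_file:
    username = cred_file.readline().decode().split(';')[0]

# Text mode: readline() returns str directly.
with open('credentials.txt', 'r') as cred_file:
    username = cred_file.readline().split(';')[0]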

Trying to read in a .gda file to IDL

I am trying to read a .gda file into IDL for plotting purposes. I am not familiar with the format; my research indicates it is an unformatted binary data file. Anyway, here is what I am doing:
pro omidi_contour
  openr, 1, 'data.gda'
  a = fltarr(128, 128, 128)
  readu, 1, a
  close, 1
end
However, when I look at the variable definitions in the left panel of IDL, it indicates that a is 'undefined'. When I try to print:
print, a[0,0,0]
I get:
Variable is undefined: A
How can I solve this?
It turned out that there was nothing wrong with my program: it was reading the right values from the file. However, IDL "forgot" the values of the variables once the program was done, because variables inside a procedure are local to it. The solution is to not run this as a procedure, i.e. remove the following lines:
pro omidi_contour
end
This makes the code run as if each line had been typed at the IDL prompt, and this time IDL remembers the values.
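Alternatively, you can keep the procedure and inspect (or return) the data before it finishes, since procedure variables vanish when it returns. A sketch:

pro omidi_contour
  openr, lun, 'data.gda', /get_lun   ; let IDL pick a free logical unit
  a = fltarr(128, 128, 128)
  readu, lun, a
  free_lun, lun
  print, a[0, 0, 0]                  ; inspect 'a' while it is still in scope
end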

Reading MANY files at once in Fortran

I have 500,000 files which I need to read in Fortran, and each file has ~14,000 entries in it (each entry is only about 100 characters long). I need to process one line of every file at a time. For example, I need to process line 1 of all 500,000 files before moving on to line 2 of the files, and so forth.
I cannot open them all at once (I tried making an array of file pointers and opening them all) because there will be too many files open at once. Instead, I would like to do something as follows:
do iline = 1, Nlines
  do ifile = 1, Nfiles
    ! open the file
    ! read a line
    ! close the file
  enddo
enddo
The hope is that this would allow me to read one line at a time (from each file) and then move on to the next line (in each file). Unfortunately, each time I open a file it starts me off at line 1 again. Is there any way to close a file and then reopen it where you left off previously?
Thanks
Unfortunately, this is not possible in standard Fortran. Even if you specify
position="ASIS"
the actual position will be unspecified for a unit that was not previously connected, and will in fact be the beginning of the file on most systems.
That means you have to call
read(u,*)
enough times to get to the right place in the file, as sketched below.
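A sketch of that brute-force bookkeeping (nskip is the number of lines already consumed from this file; u, fname and line are placeholders):

open(newunit=u, file=fname)
do i = 1, nskip
   read(u, *)           ! read and discard one record
end do
read(u, '(a)') line     ! the next unprocessed line
close(u)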
You could also use stream access. The file would again be opened at its beginning, but you can use
read(u,*,pos=n) number
where n is the position saved while the file was previously open. You can get that position from
inquire(unit=u, pos=n)
You would open the file with access="STREAM".
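Here is a self-contained (untested) sketch of that bookkeeping; the file names, counts and line length are placeholders:

program resume_read
  implicit none
  integer, parameter :: Nfiles = 3, Nlines = 5
  integer :: u, ifile, iline, pos(Nfiles)
  character(len=100) :: line
  character(len=32)  :: fname

  pos = 1                                  ! stream positions are 1-based
  do iline = 1, Nlines
     do ifile = 1, Nfiles
        write(fname, '(a,i0,a)') 'file', ifile, '.txt'
        open(newunit=u, file=fname, access='stream', form='formatted')
        read(u, '(a)', pos=pos(ifile)) line
        inquire(unit=u, pos=pos(ifile))    ! where to resume next time
        close(u)
        ! ... process line ...
     end do
  end do
end program resume_read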
Also, 500,000 open files is indeed too many. There are ways to inquire about the system limits and to control them, but your compiler may have limits of its own: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/
Another solution: couldn't you store the content of the files in memory? A couple of gigabytes is fine on today's machines, but it may not be enough for your data.
You can try using fseek and ftell in something like the following.
! positions(:) holds the saved byte offset of each file, initialized to 0
do iline = 1, Nlines
   do ifile = 1, Nfiles
      open(newunit=u, file=fnames(ifile))
      call fseek(u, positions(ifile), 0)   ! 0 = offset from start (GNU extension)
      read(u, '(a)') line
      positions(ifile) = ftell(u)          ! current offset (GNU extension)
      close(u)
   end do
end do
The (untested) idea is to store each file's offset in an array and to position the cursor at that place upon reopening the file. Then, once a line has been read, ftell retrieves the current position, which is saved for the next round. If all entries have the same length, you can do without the array and just store one value.
If the files have fixed, i.e. constant, record lengths, you could use direct access and read a specific record "directly". A big "if", however.
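Under that assumption, a sketch (reclen is the fixed record length; u, fname and iline are placeholders as before):

open(newunit=u, file=fname, access='direct', form='formatted', recl=reclen)
read(u, '(a)', rec=iline) line   ! fetch record number iline directly
close(u)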
The overhead of all the file opening and closing will be a big performance bottleneck.
You should read as much as you can on each open operation, given whatever memory you have. In pseudocode:
loop until done:
    loop over all files:
        open
        fseek                       ! as in damien's answer
        read N lines into array     ! e.g. N = 100
        save ftell value for file
        close
    end file loop
    loop over N output files:
        open
        write array data
        close
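A rough Fortran sketch of that batching scheme, using standard stream access instead of the fseek/ftell extensions (untested; CHUNK is a tuning knob, and Nfiles and the fnames(:) array are assumed to be defined as in the earlier examples):

integer, parameter :: CHUNK = 100
character(len=100) :: buf(CHUNK)
integer :: pos(Nfiles), u, ifile, i

pos = 1
do ifile = 1, Nfiles
   open(newunit=u, file=fnames(ifile), access='stream', form='formatted')
   read(u, '(a)', pos=pos(ifile)) buf(1)   ! seek once, then read sequentially
   do i = 2, CHUNK
      read(u, '(a)') buf(i)
   end do
   inquire(unit=u, pos=pos(ifile))         ! remember where this file stopped
   close(u)
   ! ... append buf(1:CHUNK) to the per-line output files ...
end do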

Exporting Mathematica Print[] Output to a .txt file

I have a large Mathematica notebook that uses Print[] commands periodically to output runtime messages. This is the only output (aside from exported files) that the notebook generates. Is there any way to automate exporting this output to a .txt file without having to rewrite the Print[] commands?
According to the documentation, Print outputs to the $Output channel, which is a list of streams. So, at the beginning of the notebook:
strm = OpenWrite["output.log"];
AppendTo[ $Output, strm ];
and at the end of the notebook
Close[strm];
Note: if execution is interrupted before the stream is closed, you'll have to close it manually. Also, the code above overwrites prior data in "output.log", so you may wish to use OpenAppend instead.
Edit: to guarantee that Close will be called even if execution is aborted, consider using one of the techniques outlined here.
You want the PutAppend command.
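A sketch ("output.log" matches the file name used in the previous answer; PutAppend writes each expression in input form on its own line, so strings keep their quotes):

PutAppend["Starting the run...", "output.log"]
result = 2 + 2;
PutAppend[{"result", result}, "output.log"]

Unlike the $Output approach, this means touching each logging site, but it needs no stream management.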