Writing/reading a file in binary mode in Clisp - file-io

I'm writing a program that's supposed to read from a file, do some stuff with the content and write to an output file, preserving the original line endings. If the file has CRLF endings, the output file should have them too. My problem is writing the line ending, specifically with the CLISP implementation (it works with GCL). When I try to write a linefeed character (LF), the file ends up having CRLF endings. I'm guessing this has something to do with CLISP's implementation.
I need a way to write the file in binary mode like in other languages. The standard I/O functions in the specification only take an optional stream name and the content to be written.
You can reproduce that behaviour with something like this:
(with-open-file (out-file "test.dat" :direction :output)
  (let ((ending #\linefeed))
    (princ "First Line" out-file)
    (write-char ending out-file)
    (princ "Second Line" out-file)
    (write-char ending out-file)
    (princ "Second Line" out-file)))
I need a solution that works on Windows.

You need to specify the :EXTERNAL-FORMAT argument, giving the line terminator mode:
(with-open-file (out-file "test.dat" :direction :output :external-format :unix)
  ...)
The external format defaults to :dos on Windows because that is the standard on Microsoft systems.
Note that you do not want binary mode if you are actually writing text. In Common Lisp (as opposed to C and Emacs Lisp), there is a very clear separation between binary I/O (reading and writing bytes) and text I/O (reading and writing characters), just as a number is not a character and vice versa, even though characters have an integer code.
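The idea that the line terminator belongs to the text stream's configuration, not to the characters you write, is not specific to Lisp. As a rough illustration only (this is Python, not CLISP, and the helper names `write_lines` and `raw_bytes` and the file names are made up for the example), the same "\n" characters end up as different bytes depending on the `newline` argument given when the file is opened:

```python
import os
import tempfile


def write_lines(path, newline):
    # In text mode, `newline` decides how each "\n" written is stored on disk.
    with open(path, "w", newline=newline) as f:
        f.write("First Line\nSecond Line\n")


def raw_bytes(path):
    # Binary mode returns the bytes exactly as stored, with no translation.
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as d:
    unix_path = os.path.join(d, "unix.txt")
    dos_path = os.path.join(d, "dos.txt")
    write_lines(unix_path, "\n")    # LF endings, comparable to :unix
    write_lines(dos_path, "\r\n")   # CRLF endings, comparable to :dos
    unix_bytes = raw_bytes(unix_path)
    dos_bytes = raw_bytes(dos_path)

print(unix_bytes)
print(dos_bytes)
```

In both cases the program writes the same characters; only the stream's configuration changes what lands in the file.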

Related

How to read a binary file with TCL

So I have a function I'm using to read data from a file. It works fine if the file is plain text, but when I try to read a binary file, like a png, it returns a different text (diff confirms that). I opened a hex editor to see what was wrong and found out it is putting some c2 bytes along with the file (I don't know if the position is random or if there are other bytes except this c2 one).
This is my function. I just want it to read and save to a variable.
proc read_file {path} {
    set channel [open $path r]
    fconfigure $channel -translation binary
    set return_string [read $channel]
    close $channel
    return $return_string
}
To actually print, I'm doing this:
puts -nonewline [read_file file.png]
When you open a file, it defaults to being in text mode. In text mode (which is really a combination of options) the I/O layer translates characters from whatever encoding they are in into Tcl's internal encoding, and does the reverse operation on output. The default encoding scheme is platform specific, but in your case it sounds like it is UTF-8. (Tcl uses a complex internal system of encodings; it doesn't expose those to the outside world.)
By contrast, when you put the channel into binary mode, the bytes on the outside are directly mapped to characters in the range 0-255 (and vice versa on output). You get a perfect copy, provided you put both input and output channels in binary mode. (There are other optimisations for binary mode, but they don't matter here.)
When you only put one of the channels in binary mode, you get what looks like corruption. It isn't random though. In particular, when the input is binary but the output is UTF-8, input bytes in the range 128-255 get converted into multiple output bytes, where the first of those bytes is in the sort of range you observed. There are other combinations that mess things up; the whole range of problems is collectively known as mojibake.
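The origin of the stray c2 bytes can be reproduced outside Tcl. The following is a small Python sketch of the byte arithmetic only, not of Tcl's channel implementation; the variable names are made up, and Latin-1 stands in for "bytes mapped directly to characters 0-255":

```python
# First four bytes of the PNG signature; 0x89 lies outside the ASCII range.
original = bytes([0x89, 0x50, 0x4E, 0x47])

# Binary-mode input: each byte becomes the character with the same code point.
as_chars = original.decode("latin-1")

# UTF-8 output: every character in the range 128-255 becomes two bytes.
corrupted = as_chars.encode("utf-8")

# Binary-mode output: characters map straight back to single bytes.
intact = as_chars.encode("latin-1")

print(original.hex(" "))   # 89 50 4e 47
print(corrupted.hex(" "))  # c2 89 50 4e 47
print(intact.hex(" "))     # 89 50 4e 47
```

Characters 128-191 pick up a leading c2 byte and characters 192-255 a leading c3 byte, which is why the damage looks regular instead of random.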
tl;dr Don't mix up binary and text data unless you're very careful. The results of getting it wrong are "surprising".

Can't open Outfile error (NSIS 3.0.5) when using a specific strftime Format for !define /date "NOW"

I have following definition: !define /date NOW "%Y-%b-%d_%H-%M-%S"
When I create the outfile like this: OutFile "..\my_app_name_Setup-x64_${NOW}_Build_${__COUNTER__}.exe"
it compiles successfully. However, when I change the format to: !define /date NOW "%Y-%b-%d_%H:%M:%S" (effectively replacing the hyphens with colons for hours, minutes and seconds), it no longer compiles (compile error "Can't open Outfile" at the very end).
Two (2) questions:
What causes this error?
How can I use my preferred strftime format (with colons for HH:MM:SS)?
Maybe Anders can shine a light on this ;)
Regarding 2.: as per https://nsis.sourceforge.io/mediawiki/index.php?title=Reference/!define&oldid=24774 my format ("%Y-%b-%d_%H:%M:%S") is correct. (It might be deprecated?)
Colons are not legal in filenames on Windows. See Naming Conventions: reserved characters.
If you are compiling on POSIX it is theoretically legal but rather pointless since you cannot execute the installer without renaming it.
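If you want to check a generated name before handing it to the compiler, a quick test against the Windows reserved character set is enough. This is a minimal Python sketch, not anything NSIS provides; `WINDOWS_RESERVED`, `reserved_in` and the fixed timestamp are made up for the example, and `%m` is used in place of `%b` so the result does not depend on the locale:

```python
from datetime import datetime

# Characters that Windows does not allow in a file name component.
WINDOWS_RESERVED = set('<>:"/\\|?*')


def reserved_in(name):
    """Return the reserved characters found in a single file name (not a path)."""
    return sorted(set(name) & WINDOWS_RESERVED)


stamp = datetime(2020, 1, 2, 3, 4, 5)  # fixed timestamp so the output is stable
with_colons = stamp.strftime("%Y-%m-%d_%H:%M:%S")
with_hyphens = stamp.strftime("%Y-%m-%d_%H-%M-%S")

print(with_colons, reserved_in(with_colons))    # 2020-01-02_03:04:05 [':']
print(with_hyphens, reserved_in(with_hyphens))  # 2020-01-02_03-04-05 []
```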

SHA256 generation different for file and content of this file

I use online SHA256 converters to calculate a hash for a given file. There, I have seen an effect I don't understand.
For testing purposes, I wanted to calculate the hash for a very simple file. I named it "test.txt", and its only content is the string "abc", followed by a new line (I just pressed enter).
Now, when I put "abc" and newline into a SHA256 generator, I get the hash
edeaaff3f1774ad2888673770c6d64097e391bc362d7d6fb34982ddf0efd18cb
But when I put the complete file into the same generator, I get the hash
552bab6864c7a7b69a502ed1854b9245c0e1a30f008aaa0b281da62585fdb025
Where does the difference come from? I used this generator (in fact, I tried several ones, and they always yield the same result):
https://emn178.github.io/online-tools/sha256_checksum.html
Note that this difference does not arise without newlines. If the file just contains the string "abc", the hash is
ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
for the file as well as just for the content.
As noted in my comment, the difference is caused by how newline characters are represented across different operating systems (see details here):
On UNIX and UNIX-like systems, newlines are represented by a line feed character (\n).
On DOS and Windows systems, newlines are represented by a carriage return followed by a line feed character (\r\n).
Compare the following two commands and their output, corresponding to the SHA256 values in your question:
echo -en "abc\n" | sha256sum
edeaaff3f1774ad2888673770c6d64097e391bc362d7d6fb34982ddf0efd18cb
echo -en "abc\r\n" | sha256sum
552bab6864c7a7b69a502ed1854b9245c0e1a30f008aaa0b281da62585fdb025
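If sha256sum is not available, the same comparison can be made with Python's standard hashlib module. This is only a sketch (the helper name `sha256_hex` is made up); the printed digests can be compared against the values quoted in the question:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    # Hash the exact bytes given; no newline translation happens here.
    return hashlib.sha256(data).hexdigest()


no_newline = sha256_hex(b"abc")
lf = sha256_hex(b"abc\n")        # UNIX-style line ending
crlf = sha256_hex(b"abc\r\n")    # Windows-style line ending

print("abc      ", no_newline)
print("abc + LF ", lf)
print("abc+CRLF ", crlf)
```

A single extra \r byte is enough to produce a completely different digest.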
The issue you are having could come from the way the newline is encoded.
On Windows a newline is written as \r\n, and on Linux as \n.
These two characters have different decimal values (\r is 13 and \n is 10).
More info you can find here:
https://en.wikipedia.org/wiki/Newline
https://en.wikipedia.org/wiki/List_of_Unicode_characters
I faced the same issue. Providing the data in hex mode helped me understand the actual behavior.
The data needs to be canonicalized before the SHA calculation, which eliminates such issues. Canonicalization must be performed both on the generation side and on the verification side.
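One possible canonicalization, assuming the data is text and that converting every CRLF to LF is acceptable for your use case, is sketched below in Python. The function name `canonical_sha256` is made up for the example, and other canonical forms are equally valid as long as both sides agree on one:

```python
import hashlib


def canonical_sha256(data: bytes) -> str:
    # Canonical form chosen here: every CRLF pair becomes a single LF.
    normalized = data.replace(b"\r\n", b"\n")
    return hashlib.sha256(normalized).hexdigest()


from_windows = canonical_sha256(b"abc\r\n")
from_unix = canonical_sha256(b"abc\n")

print(from_windows == from_unix)  # True
```

This must not be applied to binary files, where a 0d 0a byte pair can be legitimate data.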

End-of-line conversion during Input/Output for text files

How to write strings (&str and String) containing newlines to text files?
In C you can switch between writing text as is and converting '\n' to the proper end-of-line sequence for the OS via the fopen flags "wb" and "w". For example, on Windows '\n' is converted to "\r\n" during text-mode I/O.
How can I achieve this with Rust? I cannot find corresponding API in std::fs::File.
There is no such API in the standard library (there might be a crate for this, though). The simplest way to write lines to a file is with the writeln! macro and it only uses \n for newlines.
It was probably considered (by the Rust developers) not useful enough because I'm pretty sure that nowadays \r\n is used only for Microsoft Notepad compatibility.
There once was an issue related to write not using CRLF on Windows, but it was concluded that:
the raw io::File will likely not handle it by default but would instead require a wrapper
(note: since Rust 1.0 it is no longer io::File, but fs::File)

Line terminator getting added to plain text file

I'm a little confused by some behavior I'm seeing with text files on my Mac. When I open a new file in vim and type in a single character (let's say the letter "t") into the file with no carriage return and hit save and then do a hex dump on the file (using vim's :r !xxd command), I see the following:
00000000: 740a t.
There is still a line feed (0a) in the file. And when I look at the file's properties, it has two bytes, not one. How did it get in there if I didn't type it?
OK, so it turns out vim automatically adds a newline character at the end of the last line to comply with the POSIX standard, under which every line must end with a newline. You can turn this off with :set noeol in vim; in recent versions, where the 'fixeol' option is on by default, you also need :set nofixeol (or :set binary).