Write string to file in lisp - file-io

I ran some code like this
(defun writeFile (name content)
  (with-open-file (stream name
                          :direction :output
                          :if-exists :overwrite
                          :if-does-not-exist :create)
    (format stream content)))
(writeFile "C:\Users\Peter\test.txt" "Test...")
but then I checked my C:\Users\Peter directory and it did not contain a file named test.txt. What am I doing wrong?

\ is an escape character in strings in Common Lisp.
(length "\\") is 1.
(length "\U") is 1.
"\U" is "U".
"C:\Users\Peter\test.txt" is "C:UsersPetertest.txt".
So you are writing a file called "C:UsersPetertest.txt".
Possible solutions:
escape the backslash with a backslash
I'm not sure if that works: use a forward slash
use one of the PATHNAME functions to construct the pathname
Advanced: use a logical pathname
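As a rough illustration of the first option, here is a minimal sketch. The name write-file, the switch to write-string, and the choice of :supersede are my assumptions, not part of the original code; write-string is used so that a ~ in the content is not interpreted as a format directive.

```lisp
;; Sketch only: WRITE-FILE is a hypothetical helper name.
(defun write-file (name content)
  "Write the string CONTENT to the file NAME, replacing any existing file."
  (with-open-file (stream name
                          :direction :output
                          :if-exists :supersede      ; assumption: replace the old file
                          :if-does-not-exist :create)
    ;; WRITE-STRING writes CONTENT literally; (format stream content)
    ;; would treat CONTENT as a format control string.
    (write-string content stream)))

;; Each backslash in the path is escaped with another backslash:
;; (write-file "C:\\Users\\Peter\\test.txt" "Test...")
```

With the escaped form, the string really contains the three backslashes, which can be checked with length at the REPL.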

Null char returning from reading a file in Common Lisp

I’m reading files and storing them as a string using this function:
(defun file-to-str (path)
  (with-open-file (stream path) :external-format 'utf-8
    (let ((data (make-string (file-length stream))))
      (read-sequence data stream)
      data)))
If the file has only ASCII characters, I get the content of the file as expected; but if there are characters beyond 127, I get a null character (^@) at the end of the string for each such character. So, after $ echo "~a^?" > ~/teste I get
CL-USER> (file-to-string "~/teste")
"~a^?
"
but after echo "aaa§§§" > ~/teste, the REPL gives me
CL-USER> (file-to-string "~/teste")
"aaa§§§
^@^@^@"
and so forth. How can I fix this? I’m using SBCL 1.4.0 in an utf-8 locale.
First of all, your keyword argument :external-format is misplaced and has no effect. It should be inside the parentheses with stream and path. However, this does not change the end result, as UTF-8 is the default encoding.
The problem here is that in UTF-8 encoding, it takes a different number of bytes to encode different characters. ASCII characters all encode into single bytes, but other characters take 2-4 bytes. You are now allocating, in your string, one element for every byte of the input file, not for every character in it. The unused elements end up unchanged; make-string initializes them as ^@.
The (read-sequence) function returns the index of the first element not changed by the function. You are currently just discarding this information, but you should use it to resize your buffer after you know how many elements have been used:
(defun file-to-str (path)
  (with-open-file (stream path :external-format :utf-8)
    (let* ((data (make-string (file-length stream)))
           (used (read-sequence data stream)))
      (subseq data 0 used))))
This is safe, as the length of the file in bytes is always greater than or equal to the number of UTF-8 characters encoded in it. However, it is not terribly efficient, as it allocates an unnecessarily large buffer and finally copies the whole output into a new string to return the data.
While this is fine for a learning experiment, for real-world use cases I recommend the Alexandria utility library that has a ready-made function for this:
* (ql:quickload "alexandria")
To load "alexandria":
Load 1 ASDF system:
alexandria
; Loading "alexandria"
* (alexandria:read-file-into-string "~/teste")
"aaa§§§
"
*
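If pulling in a library is not an option, one way to avoid the oversized buffer is to read in fixed-size chunks and collect them in a string output stream. This is a sketch under stated assumptions: the name file-to-str-chunked and the chunk-size parameter are made up for illustration, and the implementation is assumed to accept :utf-8 as an external format (SBCL does).

```lisp
;; Sketch only: FILE-TO-STR-CHUNKED and CHUNK-SIZE are hypothetical names.
(defun file-to-str-chunked (path &key (chunk-size 4096))
  "Return the contents of the UTF-8 file PATH as a string."
  (with-open-file (stream path :external-format :utf-8)
    (with-output-to-string (out)
      (let ((buffer (make-string chunk-size)))
        ;; READ-SEQUENCE returns the index of the first element it did not
        ;; fill, so only the first USED characters of BUFFER are valid.
        (loop for used = (read-sequence buffer stream)
              while (plusp used)
              do (write-string buffer out :end used))))))
```

Because only the filled part of each chunk is written out, no padding characters can end up in the result, whatever the ratio of bytes to characters in the file.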

Is there a way to change delimiters of documentation string in Common Lisp?

I sometimes put examples of function calls and their output in the documentation string of a function definition.
(defun js-[] (&rest args)
"Javascript array literal statement.
(js-[] 1 2 3)
> \"[1, 2, 3]\"
"
(format nil "[~{~A~^, ~}]" (mapcar #'js-expr args)))
But sometimes the output of the function is a string. So I have to escape the double quotes in the example output. This becomes tedious very quickly.
Is there a way to change the docstring delimiter from double quotes to something else so I don't have to keep escaping them?
Please note that sometimes it's worse than just escaping once:
(defun js-~ (str)
"Javascript string statement. This is needed so that double quotes are inserted.
(js-~ \"string\")
> \"\\\"string\\\"\"
"
(format nil "\"~A\"" str))
Here there is an additional problem. Reading the docstring is difficult.
TL;DR
Yes, you can, no, you do not want to do it.
No, CL has just one syntax for strings
The only way to represent a string in Common Lisp is to use
Double-Quote ".
Yes, you can modify the reader so that something else denotes a string
E.g., suppose you want a string to be started and stopped by, say, @.
(This is an ordinary character rarely used in symbol names,
in contrast to % and $ which are often used in implementation-internal symbols.)
Copy the reader properties of " to @:
(multiple-value-bind (function non-terminating-p)
    (get-macro-character #\")
  (set-macro-character #\@ function non-terminating-p))
Now:
(read-from-string "@123@")
==> "123" ; 5
(read-from-string @"123"@)
==> "123" ; 5
Do not forget to restore the input syntax to standard Common Lisp syntax:
(setq *readtable* (copy-readtable nil))
See Reader.
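To keep such a change from leaking into the rest of the session, the readtable can be rebound around a single read. The following is a minimal sketch, not the approach above verbatim: it installs its own reader function instead of reusing the one for ", the names read-delimited-string and read-from-string-with-delimiter are hypothetical, @ is assumed as the delimiter, and no escape processing is done inside the string.

```lisp
;; Sketch only: both function names are hypothetical.
(defun read-delimited-string (stream char)
  "Reader macro function: collect characters up to the next CHAR."
  (with-output-to-string (out)
    (loop for c = (read-char stream t nil t)
          until (char= c char)
          do (write-char c out))))

(defun read-from-string-with-delimiter (text &optional (delimiter #\@))
  "Read one object from TEXT with DELIMITER acting as a string delimiter."
  ;; The modified readtable is a fresh copy of the standard one and is only
  ;; bound inside this LET, so the global *READTABLE* is left untouched.
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character delimiter #'read-delimited-string)
    (read-from-string text)))
```

For example, (read-from-string-with-delimiter "@say \"hi\"@") reads a string whose contents include the double quotes without any escaping in the delimited text.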
You might be able to modify the printer
The standard does not require the printing of standard objects
(such as a string) to be user-modifiable.
You can try defining a print-object method:
(defmethod print-object ((o string) (d stream))
...)
however,
implementing this correctly is not easy
this is non-conforming code (defining a method for a standardized generic function which is applicable when all of the arguments are direct instances of standardized classes)
thus many implementations will signal errors on this code,
even if you disable package locks &c, the implementation is free to ignore your method.
No, you do not want to do that
The code exists for people to read it.
Changing Lisp syntax will make it harder for others to read your code.
It will also confuse various tools you use (editor &c).
CL has many warts, but this is not one of them ;-)
PS. See also documentation and describe, as well as comment syntax Sharpsign Vertical-Bar and Semicolon.
You could make a reader macro that slurps in a multi line string like this:
(defun hash-slash-reader (stream slash arg)
  (declare (ignore slash arg))
  (loop :with s := (make-string-output-stream)
        :for c := (read-char stream)
        :if (and (eql #\/ c) (eql #\# (peek-char nil stream)))
          :do (read-char stream) (return (get-output-stream-string s))
        :if (eql #\Newline c)
          :do (peek-char t stream)
        :do (princ c s)))
(set-dispatch-macro-character #\# #\/ #'hash-slash-reader)
Now you can do:
(defun js-~ (str)
#/ --------------------------
Javascript string statement.
This is needed so that double quotes are inserted.
(js-~ "string")
> "\"string\""
-------------------------- /#
(format nil "\"~A\"" str))
The documentation string will be added just as if you'd written it with double quotes. This is effectively the same as changing the delimiter for strings! In fact, it is an additional way to delimit strings,
which is why you can use it (not recommended though) in regular Lisp code, and not just for documentation purposes.
Using / as the sub-character of the dispatch macro helps keep it conceptually close to the multiline comment, but avoids being ignored by the reader altogether.
Another idea. Write your docstrings as usual, without examples.
(defun js-~ (str)
  "Javascript string statement."
  ...)
Define tests. That can be as simple as:
(defparameter *tests*
  '(((js-~ "string") . "\"string\"")
    ...))
Use that list to perform tests:
(loop for (form . expected) in *tests*
      for (fn . args) = form
      for actual = (apply fn args)
      do (assert (equalp actual expected)))
... and to update the documentation. Be careful, this appends to the existing documentation string, so don't run it twice.
(loop for (form . expected) in *tests*
      for (fn . args) = form
      do (setf (documentation fn 'function)
               (format nil
                       "~a~%~% ~S~% => ~S"
                       (documentation fn 'function)
                       form
                       expected)))
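The "don't run it twice" caveat can be worked around by remembering each function's docstring as it was before any examples were appended. This is only a sketch: *base-docstrings*, base-docstring and update-docstrings are hypothetical names, and it assumes the implementation keeps docstrings and lets you set them with (setf documentation).

```lisp
;; Sketch only: all three names below are hypothetical.
(defvar *base-docstrings* (make-hash-table :test #'eq)
  "Maps a function name to its docstring before examples were appended.")

(defun base-docstring (fn)
  "Return the cached original docstring of FN, caching it on first use."
  (multiple-value-bind (doc found) (gethash fn *base-docstrings*)
    (if found
        doc
        (setf (gethash fn *base-docstrings*)
              (or (documentation fn 'function) "")))))

(defun update-docstrings (tests)
  "Rebuild docstrings from TESTS, a list of (form . expected) pairs."
  ;; First reset every tested function to its original docstring...
  (loop for (form . nil) in tests
        for fn = (first form)
        do (setf (documentation fn 'function) (base-docstring fn)))
  ;; ...then append one example per test entry.
  (loop for (form . expected) in tests
        for fn = (first form)
        do (setf (documentation fn 'function)
                 (format nil "~a~%~% ~S~% => ~S"
                         (documentation fn 'function) form expected))))
```

Calling (update-docstrings *tests*) repeatedly then yields the same docstrings each time, at the cost of keeping a small cache around.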
You can (ab)use cl-interpol. Although the purpose of the library is to enable string interpolation, it also allows custom string delimiters, if you don't mind prepending the string with #?, e.g.
CL-USER> (cl-interpol:enable-interpol-syntax)
; No values
CL-USER> #?'foo'
"foo"
CL-USER> #?/foo/
"foo"
CL-USER> #?{foo}
"foo"
CL-USER>
so after enabling the interpol reader macro you could write
(defun js-[] (&rest args)
#?'Javascript array literal statement.
(js-[] 1 2 3)
> "[1, 2, 3]"
'
  (format nil "[~{~A~^, ~}]" (mapcar #'js-expr args)))

CSH: How to tokenize a string

I'm making a CSH script where I am looping through the file names in a directory:
foreach i ($INPUTDIR/*)
$i
end
i ends up being something like this:
/dir1/dir2/dir3/dir4/fileNameHead_middle_2016080924
My question is, using CSH, how can I tokenize each path, first splitting on the forward slashes, then on the underscores, collecting only the last token?
The basename utility deletes any prefix ending with the last slash / character present in string (after first stripping trailing slashes), and a suffix, if given. On my system there's also a gbasename which is part of GNU coreutils which does essentially the same thing with a few more options.
basename is part of POSIX, so it's safe to use everywhere.

Strip filename (shortest) extension by CMake (get filename removing the last extension)

get_filename_component can be used to remove/extract the longest extension.
EXT = File name longest extension (.b.c from d/a.b.c)
NAME_WE = File name without directory or longest extension
I have a file with a dot in its name, so I need the shortest extension:
set(MYFILE "a.b.c.d")
get_filename_component(MYFILE_WITHOUT_EXT ${MYFILE} NAME_WE)
message(STATUS "${MYFILE_WITHOUT_EXT}")
reports
-- a
but I want
-- a.b.c
What is the preferred way to find the file name without the shortest extension?
I would do:
string(REGEX REPLACE "\\.[^.]*$" "" MYFILE_WITHOUT_EXT ${MYFILE})
The regular expression matches a dot (\\., see next paragraph), followed by any number of characters that are not dots ([^.]*) until the end of the string ($), and then replaces the match with an empty string "".
The metacharacter dot (which in a regular expression normally means "match any character") needs to be escaped with a \ to be interpreted as a literal dot. However, in CMake string literals (as in C string literals), \ is itself a special character and needs to be escaped as well. That is why you end up with the odd-looking sequence \\. in the pattern.
Note that (almost all) metacharacters do not need to be escaped within a Character Class: therefore we have [^.] and not [^\\.].
Finally, note that this expression is safe also if there's no dot in the filename analyzed (the output corresponds to the input string in that case).
As of CMake 3.14 it is possible to do this with get_filename_component directly.
NAME_WLE: File name without directory or last extension
set(INPUT_FILE a.b.c.d)
get_filename_component(OUTPUT_FILE_WE ${INPUT_FILE} NAME_WE)
get_filename_component(OUTPUT_FILE_WLE ${INPUT_FILE} NAME_WLE)
OUTPUT_FILE_WE would be set to a, and OUTPUT_FILE_WLE would be set to a.b.c.
I'd solve this with a simple regex:
string(REGEX MATCH "^(.*)\\.[^.]*$" dummy ${MYFILE})
set(MYFILE_WITHOUT_EXT ${CMAKE_MATCH_1})
In CMake 3.20 and greater, the cmake_path command now provides an elegant solution:
cmake_path(GET <path-var> STEM [LAST_ONLY] <out-var>)
where STEM refers to the portion of the filename before the extension. The cmake_path command supersedes the get_filename_component command.
So, in your example a.b.c.d, the following code grabs the stem of the filename:
cmake_path(GET MYFILE STEM MYFILE_WITHOUT_EXT)
message(STATUS ${MYFILE_WITHOUT_EXT})
which yields:
a
With LAST_ONLY, only the last extension is treated as the file extension, so the stem keeps the earlier dots:
cmake_path(GET MYFILE STEM LAST_ONLY MYFILE_WITHOUT_EXT)
message(STATUS ${MYFILE_WITHOUT_EXT})
which yields:
a.b.c

Writing/reading a file in binary mode in Clisp

I'm writing a program that's supposed to read from a file, do some stuff with the content and write to an output file, preserving the original line endings. If the file has CRLF endings, the output file should also have them. My problem is in writing the line ending, especially with the CLISP implementation (it works with GCL). When I try to write a linefeed character (LF), the file ends up having CRLF endings. I'm guessing this has something to do with CLISP's implementation.
I need a way to write the file in binary mode like in other languages. The standard I/O functions in the specification only take an optional stream name and the content to be written.
You can reproduce that behaviour with something like this:
(with-open-file (out-file "test.dat" :direction :output)
  (setf ending #\linefeed)
  (princ "First Line" out-file)
  (write-char ending out-file)
  (princ "Second Line" out-file)
  (write-char ending out-file)
  (princ "Second Line" out-file))
I need a solution that works on Windows.
You need to specify the :EXTERNAL-FORMAT argument, mentioning the line terminator mode:
(with-open-file (out-file "test.dat" :direction :output :external-format :unix)
...)
The external format defaults to :dos on Windows because that is the standard on Microsoft systems.
Note that you do not want binary mode if you are actually writing text. In Common Lisp (as opposed to C and Emacs Lisp), there is a very clear separation between binary I/O (reading and writing bytes) and text I/O (reading and writing characters), just like a number is not a character and vice versa, even though characters have an integer code.
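For cases where bytes really are what you want, the binary side of that separation looks roughly like this. It is a sketch under stated assumptions: write-octets and read-octets are hypothetical helper names, and the stream is opened with an element type of (unsigned-byte 8) so that no line-ending or character-encoding translation applies.

```lisp
;; Sketch only: WRITE-OCTETS and READ-OCTETS are hypothetical names.
(defun write-octets (path octets)
  "Write the sequence of bytes OCTETS to PATH, replacing any existing file."
  (with-open-file (out path :direction :output
                            :element-type '(unsigned-byte 8)
                            :if-exists :supersede)
    (write-sequence octets out)))

(defun read-octets (path)
  "Return the contents of PATH as a vector of bytes."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    ;; For a byte stream, FILE-LENGTH is measured in bytes, so the
    ;; vector is exactly as long as the file.
    (let ((octets (make-array (file-length in)
                              :element-type '(unsigned-byte 8))))
      (read-sequence octets in)
      octets)))
```

On such a stream, write-char and princ are not applicable; a byte such as 10 (LF) is written exactly as given.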