How to read/write objects to a file? - file-io

I would like to write an object (a simple collection) to a file. I've been looking around and found this question and this question. I also went through a lot of sites with broken links etc, but I don't seem to be able to find a way to write to a file in smalltalk. I tried this (and other things, but they come down to the same):
out := 'newFile' asFileName writeStream.
d associationsDo: [ :assoc | out
nextPutAll: assoc key asString;
nextPut: $, ;
nextPutAll: assoc value asString; cr. ]
out close.
as suggested in the linked questions, but it does not seem to be doing anything. It does not throw errors, but I don't find any files either.
The only thing I want to do is persist my object (binary or textual does not really matter), so how could I do this?
Thanks in advance

What you are doing is creating a write stream on a string. That actually works, but the data ends up in a string object in memory; no file is written.
This works in both Squeak and Pharo (and probably other dialects):
FileStream
    forceNewFileNamed: 'filename.ext'
    do: [ :stream |
        d associationsDo: [ :assoc |
            stream
                ascii; "data is text, not binary"
                nextPutAll: assoc key asString;
                nextPut: $, ;
                nextPutAll: assoc value asString;
                cr ] ].
In Pharo you could write:
'filename.ext' asFileReference writeStreamDo: [ :stream |
... ].
Note however that there are better ways to store structured data in files, e.g. STON (Smalltalk Object Notation, the Smalltalk counterpart of JSON) or XML. If you want to persist objects, then you might want to check out Fuel, StOMP (probably no longer supported) or any of the other object serializers.
Lastly there's also ImageSegment, a VM based object serializer (no extra packages needed), but you'll probably need some help with that.
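As a rough illustration of the STON option mentioned above, here is a minimal round trip through a string. It assumes a Pharo image, where STON is bundled (in Squeak it would have to be installed first); the dictionary contents and variable names are made up for the example.

```smalltalk
"Sketch only: assumes the STON package is loaded (it ships with Pharo)."
| d text copy |
d := Dictionary new.
d at: 'green' put: 'vert'.
d at: 'blue' put: 'bleu'.

"Serialize to a STON string, then parse it back into a new Dictionary."
text := STON toString: d.
copy := STON fromString: text.
```

To write to a file instead of a string, STON put: d onStream: aStream and STON fromStream: aStream should work the same way on a file stream, for example inside the writeStreamDo: block shown earlier.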

The traditional Smalltalk serialization format uses the storeOn: and readFrom: methods. E.g.
d1 := {'a'->1. 'b'->2. 'c'->'3'} as: Dictionary.
"store"
FileStream forceNewFileNamed: 'mydict.st' do: [:out | d1 storeOn: out].
"read"
d2 := FileStream oldFileNamed: 'mydict.st' do: [:in | Object readFrom: in].
This is a textual format and gets inefficient for larger data sets. Also, it cannot store cyclical references. For that, check out the more advanced serialization options as listed in the other answers.

Solution:
| d out |
d := Dictionary new.
d at: 'green' put: 'vert'.
d at: 'blue' put: 'bleu'.
d at: 'red' put: 'rouge'.
d at: 'white' put: 'blanc'.
out := FileStream fileNamed: 'dict-out.txt'.
d associationsDo: [ :assoc | out
nextPutAll: assoc key asString;
nextPut: $, ;
nextPutAll: assoc value asString; cr.].
out close.
See also:
http://wiki.squeak.org/squeak/1583
http://wiki.squeak.org/squeak/6338

It seems that you are using a syntax for an extension, but not the base.
At least in Pharo, 'newFile' asFileName is a string, and #writeStream provides you a stream on the same string, not on a file.
Try with FileStream newFileNamed: 'newFile' or something like that.
And most of all: when something strange happens, inspect. Inspect partial evaluations and check all your assumptions. Or better yet, debug and see where the code is going.
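To make the "stream on a string" point concrete, the following sketch builds a write stream over an in-memory String and inspects what it holds. It uses WriteStream on: directly, so it does not depend on what asFileName returns in a given dialect; the sample key and value are invented.

```smalltalk
"Sketch: a WriteStream over a String collects characters in memory only."
| out result |
out := WriteStream on: String new.
out
    nextPutAll: 'green';
    nextPut: $, ;
    nextPutAll: 'vert'.

"The data lives in the stream's contents; nothing has touched the disk."
result := out contents.
```

Inspecting result (or out itself) shows the text that was expected to land in a file, which is the kind of check the advice above is pointing at.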

The equivalent in Fuel would be
FLSerializer serialize: d toFileNamed: 'filename.ext'.
And
d := FLMaterializer materializeFromFileNamed: 'filename.ext'

Related

Stream assignment in Pharo

I have a problem here.
I have a data variable of type an OrderedCollection.
this variable shows me this when I do a DoIt: an OrderedCollection ('3c7lwmdva2b8jbt39ls4pz3sl' '3c7lwmbf36tamw1m45riirdze' 8 February 1994).
Now I would like this:
object:=String streamContents:
[:stream|
stream
nextPutAll: 'data:= ';cr;
print:data asArray.]
But when I run, it shows me this:
data := an Array End of statement list encountered -> ('3c7lwmdva2b8jbt39ls4pz3sl' '3c7lwmbf36tamw1m45riirdze' 8 February 1994).
What I wanted to get is this:
data := #('3c7lwmdva2b8jbt39ls4pz3sl' '3c7lwmbf36tamw1m45riirdze' 8 February 1994).
How to do please?
result := String streamContents: [:stream |
stream nextPutAll: 'data := #('.
data
do: [:string | stream nextPut: $'; nextPutAll: string asString; nextPut: $']
separatedBy: [stream space].
stream nextPut: $)]
Since my answer has been downvoted, I'll explain the solution.
1. What's in data? The question says that data prints as (original formatting, sorry about that)
an OrderedCollection ('3c7lwmdva2b8jbt39ls4pz3sl' '3c7lwmbf36tamw1m45riirdze' 8 February 1994).
which indicates that data is an OrderedCollection with two strings and a Date.
2. What is the OP trying to compute? It is not clear. The use of String streamContents: seems to indicate that the OP is trying to produce a String, more precisely an assignment sentence where data is assigned the OrderedCollection converted to an Array.
3. Solution to 2. Assuming my guess in 2 is right, my code above produces such a sentence.
4. What other interpretation can we give to this unclear question?
Well, maybe the OP is just looking for a method that would convert the OrderedCollection into an Array. In this case, the answer would have been simply
object := data asArray.
However, given a previous post, where the same OP was trying to do some metaprogramming, the actual intention remains unclear.

Kaitai (KSY) - optional attribute

I'm trying to describe SSH protocol in Kaitai language (.ksy file).
At the beginning, there is a protocol version exchange in the following format:
SSH-protoversion-softwareversion SP comments CR LF
where SP comments is optional. AFAIK, there is no way of describing an attribute as fully optional, only via an if condition. Does anybody know how to describe this relation in Kaitai, so that the parser also accepts this format: SSH-protoversion-softwareversion CR LF?
Thanks
Kaitai Struct is not designed to be what you would call a grammar in the traditional sense (i.e. something mapping to a regular language, context-free grammar, BNF, or something similar). Traditional grammars have a notion of "this element is optional" or "this element can be repeated multiple times", but KS works the other way around: it does not even attempt to solve the ambiguity problem, but rather builds on the fact that all binary formats are designed to be unambiguous.
So, whenever you encounter something like an "optional element" or a "repeated element" without any further context, please pause and consider whether Kaitai Struct is the right tool for the task, and whether it is really a binary format you're trying to parse. For example, parsing something like JSON, XML or YAML might be theoretically possible with KS, but the result will not be of much use.
That said, in this particular case it's perfectly possible to use Kaitai Struct; you just need to think about how a real-life binary parser would handle this. From my understanding, a real-life parser will read the whole line up to the CR byte, and then do a second pass to interpret the contents of that line. You can model that in KS using something like this:
seq:
  - id: line
    terminator: 0xd # CR
    type: version_line
    # ^^^ this creates a substream with all bytes up to CR byte
  - id: reserved_lf
    contents: [0xa]
types:
  version_line:
    seq:
      - id: magic
        contents: 'SSH-'
      - id: proto_version
        type: str
        terminator: 0x2d # '-'
      - id: software_version
        type: str
        terminator: 0x20 # ' '
        eos-error: false
        # ^^^ if we don't find that space and will just hit end of stream, that's fine
      - id: comments
        type: str
        size-eos: true
        # ^^^ if we still have some data in the stream, that's all comment
If you want to get null instead of empty string for comments when they're not included, just add extra if: not _io.eof for the comments attribute.

What's the inverse of block: load text in rebol / red

Let's say I have some rebol / red code. If I load the source text, I get a block, but how can get back the source text from block ? I tried form block but it doesn't give back the source text.
text: {
Red [Title: "Red Pretty Printer"]
out: none ; output text
spaced: off ; add extra bracket spacing
indent: "" ; holds indentation tabs
emit-line: func [] [append out newline]
emit-space: func [pos] [
    append out either newline = last out [indent] [
        pick [#" " ""] found? any [
            spaced
            not any [find "[(" last out find ")]" first pos]
        ]
    ]
]
emit: func [from to] [emit-space from append out copy/part from to]
clean-script: func [
    "Returns new script text with standard spacing."
    script "Original Script text"
    /spacey "Optional spaces near brackets and parens"
    /local str new
] [
    spaced: found? spacey
    clear indent
    out: append clear copy script newline
    parse script blk-rule: [
        some [
            str:
            newline (emit-line) |
            #";" [thru newline | to end] new: (emit str new) |
            [#"[" | #"("] (emit str 1 append indent tab) blk-rule |
            [#"]" | #")"] (remove indent emit str 1) break |
            skip (set [value new] load/next str emit str new) :new
        ]
    ]
    remove out ; remove first char
]
print clean-script read %clean-script.r
}
block: load text
LOAD is a higher-level operation with complex behaviors, e.g. it can take a FILE!, a STRING!, or a BLOCK!. Because it does a lot of different things, it's hard to speak of its exact complement as an operation. (For instance, there is SAVE which might appear to be the "inverse" of when you LOAD from a FILE!)
But your example is specifically dealing with a STRING!:
If I load the source text, I get a block, but how can get back the source text from block ?
As a general point, and very relevant matter: you can't "get back" source text.
In your example above, your source text contained comments, and after LOAD they will be gone. Also, a very limited amount of whitespace information is preserved, in the form of the NEW-LINE flag that each value carries. Yet what specific indentation style you used--or whether you used tabs or spaces--is not preserved.
On a more subtle note, small amounts of notational distinction are lost. STRING! literals which are loaded will lose knowledge of whether you wrote them "with quotes" or {with curly braces}...neither Rebol nor Red preserve that bit. (And even if they did, that wouldn't answer the question of what to do after mutations, or with new strings.) There are variations of DATE! input formats, and it doesn't remember which specific one you used. Etc.
But when it comes to talking about code round-tripping as text, the formatting is minor compared to what happens with binding. Consider that you can build structures like:
>> o1: make object! [a: 1]
>> o2: make object! [a: 2]
>> o3: make object! [a: 3]
>> b: compose [(in o1 'a) (in o2 'a) (in o3 'a)]
== [a a a]
>> reduce b
== [1 2 3]
>> mold b
== "[a a a]"
You cannot simply serialize b to a string as "[a a a]" and have enough information to get equivalent source. Red obscures the impacts of this a bit more than in Rebol--since even operations like to block! on STRING! and system/lexer/transcode appear to do binding into the user context. But it's a problem you will face on anything but the most trivial examples.
There are some binary formats for Rebol2 and Red that attempt to address this. For instance in "RedBin" a WORD! saves its context (and index into that context). But then you have to think about how much of your loaded environment you want dragged into the file to preserve context. So it's certainly opening a can of worms.
This isn't to say that the ability to MOLD things out isn't helpful. But there's no free lunch...so Rebol and Red programs wind up having to think about serialization as much as anyone else. If you're thinking of doing processing on any source code--for the reasons of comment preservation if nothing else--then PARSE should probably be the first thing you reach for.

How to identify binary and text files using Smalltalk

I want to verify that a given file in a path is of type text file, i.e. not binary, i.e. readable by a human. I guess reading first characters and check each character with :
isAlphaNumeric
isSpecial
isSeparator
isOctetCharacter ???
but joining all those testing methods with and: [ ... and: [ ... and: [ ] ] ] seems not to be very smalltalkish. Any suggestion for a more elegant way?
(There is a Python version here How to identify binary and text files using Python? which could be useful but syntax and implementation looks like C.)
There are only heuristics; you can never be really certain...
For ASCII, the following may do:
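| isPlausibleAscii numChecked isPossiblyText |
"printable ASCII is 32..126 (127 is DEL); separators cover tab, CR, LF etc."
isPlausibleAscii :=
    [:char |
        (char codePoint between: 32 and: 126)
            or: [ char isSeparator ] ].
"only look at the first 1024 characters"
numChecked := text size min: 1024.
isPossiblyText := text from: 1 to: numChecked conform: isPlausibleAscii.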
For unicode (UTF8 ?) things become more difficult; you could then try to convert. If there is a conversion error, assume binary.
PS: if you don't have from:to:conform:, replace by (copyFrom:to:) conform:
PPS: if you don't have conform: , try allSatisfy:
All text contains more space than you'd expect to see in a binary file, and some encodings (UTF16/32) will contain lots of 0's for common languages.
A smalltalky solution would be to hide the gory details in a method on Standard/MultiByte-FileStream; #isProbablyText would probably be a good name.
It would essentially do the following:
- Store the current state if you intend to use it later, and reset to the start (set a Latin1 converter if you use a MultiByteStream).
- Iterate over the next N characters (where N is an appropriate number).
- Encounter a non-printable ASCII char? It's probably binary, so return false. (There is no dedicated selector for this; use a map, implement a new method on Character, or something similar.)
- Increase two counters where appropriate: one for space characters and another for zero characters.
- If the loop finishes, return whether either counter has reached a statistically significant share of the characters read.
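The steps above could be sketched roughly as follows. This is not an existing method: it is written as a block over a ByteArray sample rather than as a stream method, the 1024-byte sample size and the 5% threshold are arbitrary assumptions, and zero bytes are counted rather than rejected so that UTF-16/32 text still has a chance of passing. It assumes a Pharo image (first:, count: and anySatisfy: on collections).

```smalltalk
"Hypothetical sketch of an isProbablyText heuristic over raw bytes."
| isProbablyText textResult binaryResult |
isProbablyText := [ :bytes |
    | sample spaces zeros suspicious |
    "Look only at the first 1024 bytes (arbitrary sample size)."
    sample := bytes first: (bytes size min: 1024).
    spaces := sample count: [ :b | b = 32 ].
    zeros := sample count: [ :b | b = 0 ].
    "Control bytes other than NUL, tab, LF, FF and CR suggest binary data."
    suspicious := sample anySatisfy: [ :b |
        (b < 32 and: [ (#(0 9 10 12 13) includes: b) not ])
            or: [ b = 127 ] ].
    "Plausible text: no suspicious bytes, and spaces or zeros
     make up at least 5% of the sample (arbitrary threshold)."
    suspicious not and: [
        sample notEmpty and: [ (spaces max: zeros) * 20 >= sample size ] ] ].

textResult := isProbablyText value: 'hello wide world' asByteArray.
binaryResult := isProbablyText value: #[1 2 3 255].
```

For a real file, the sample would come from a binary read stream on that file; wrapping the block body in a method, as suggested above, keeps the thresholds in one place.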
TLDR; Use a method to hide the gory details, otherwise it's pretty much the same.

new line in squeak

I want to do something like this: Transcript show: '\n'. How?
Use the following:
Transcript cr
You can use it after a value via a cascade:
Transcript show: 123; cr
From my (long) experience, missing character escapes are one of the few things that are missing in Smalltalk. For streaming, solutions using cr, tab etc. are ok.
However, if you need a particular control character in a string, this may be ugly and hard to read (using "streamContents:", or "withCRs" to add a newLine). Alternatively, you may want to use one of the (non-standard) string expansion mechanisms. For example, in VisualWorks or Smalltalk/X, you can write (if I remember correctly):
'someString with newline<n>and<t>tabs' expandMacros
or even with printf-like slicing of other object's printStrings:
'anotherString<n><t>with newlines<n>and<t>tabs and<p>' expandMacrosWith:(Float pi)
I guess, there is something similar in Squeak and V'Age as well.
But, be aware: these expansions are done at execution time. So you may encounter a penalty when heavily using them on many strings.
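For comparison, the streamContents: and withCRs approaches mentioned above might look like this in Squeak or Pharo; the strings are placeholders, and withCRs is assumed to behave as it does in those dialects, replacing each backslash with a carriage return.

```smalltalk
"Two ways to embed a line break in a string without escape sequences."
| viaStream viaBackslash |

"Build the string on a stream and emit the line break with cr."
viaStream := String streamContents: [ :s |
    s nextPutAll: 'first line'; cr; nextPutAll: 'second line' ].

"withCRs answers a copy with every backslash replaced by Character cr."
viaBackslash := 'first line\second line' withCRs.
```

Both produce a string containing Character cr between the two parts, which can then be passed to Transcript show:.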
The character itself can be reached as Character cr. So, you could also do this:
Transcript show: 'Bla!' , Character cr asString.
But of course,
Transcript show: 'Bla!' ; cr.
is way more elegant.
What I do as a convenience is add a method "line" to the String class:
line
^self, String lf
Then you can just say obj showSomething: 'Hello world!' line.
Or call it newline, endl, lf, etc...