Explain a piece of Smalltalk code? - smalltalk

I cannot understand this piece of Smalltalk code:
[(line := self upTo: Character cr) size = 0] whileTrue.
Can anybody help explain it?

One easy thing to do, if you have the image where the code came from, is run a debugger on it and step through.
If you came across the code out of context, like a mailing list post, then you could browse implementers of one of the messages and see what it does. For example, #size and #whileTrue are pretty standard, so we'll skip those for now, but #upTo: sounds interesting. It reminds me of the stream methods, and browsing implementers of it confirms that (in Pharo 1.1.1) ReadStream defines it. There is no method comment, but OmniBrowser shows a little arrow next to the method name indicating that it is also defined in a superclass. If we check the immediate superclass, PositionableStream, there is a good method comment explaining what the method does: it reads from the stream until it reaches the object given as the argument.
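To make that method comment concrete, here is a small Python model of what upTo: does on a character stream. It is a sketch under assumptions: up_to is a hypothetical helper (not part of any Smalltalk or Python library), io.StringIO stands in for a ReadStream, and "\r" stands in for Character cr.

```python
import io


def up_to(stream, delimiter):
    """Hypothetical model of Smalltalk's upTo: on a character stream.

    Answers the characters from the current position up to, but not
    including, the next `delimiter`; the delimiter itself is consumed.
    If the delimiter never occurs, answers the rest of the stream
    (an empty string if the stream is already at its end).
    """
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delimiter:  # end of stream, or delimiter found
            return "".join(chars)
        chars.append(ch)


stream = io.StringIO("first\rsecond")
print(repr(up_to(stream, "\r")))  # 'first'
print(repr(up_to(stream, "\r")))  # 'second'
print(repr(up_to(stream, "\r")))  # '' (already at end)
```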
Now, if we parse the code logically, it seems that it:
reads a line from the stream (i.e. up to a cr)
if the line is empty (size = 0), evaluates the block again
if it is not, stops, leaving that line in the variable line
So the code skips all empty lines and leaves the first non-empty one in line. To confirm, we could pass it a stream on a multi-line string and run it like so:
line := nil.
paragraph := '
this is a line of text.
this is another line
line number three' readStream.
[(line := paragraph upTo: Character cr) size = 0] whileTrue.
line. "Returns 'this is a line of text.'"
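The same experiment, transcribed into Python as a sketch. up_to and first_non_empty_line are hypothetical helpers, io.StringIO stands in for the Smalltalk stream, and "\r" stands in for Character cr; the while True loop mirrors how whileTrue evaluates the block before looking at its value.

```python
import io

CR = "\r"  # stands in for Character cr (value 13)


def up_to(stream, delimiter):
    # Hypothetical model of upTo:: read up to (not including) delimiter, consume it.
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delimiter:
            return "".join(chars)
        chars.append(ch)


def first_non_empty_line(stream):
    # [(line := stream upTo: Character cr) size = 0] whileTrue.
    # Like the Smalltalk original, this never terminates if only empty lines remain.
    while True:
        line = up_to(stream, CR)  # the assignment inside the block
        if len(line) != 0:        # the block answered false: stop looping
            return line


paragraph = io.StringIO(CR + "this is a line of text." + CR
                        + "this is another line" + CR + "line number three")
print(first_non_empty_line(paragraph))  # this is a line of text.
```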

Is this more readable?
while(!strlen(line=gets(self)))
The above expression has a flaw: on end-of-file or any other error, gets returns NULL, so line == NULL and strlen(line) is undefined.
So does the Smalltalk expression: once the end of the stream is reached, upTo: answers an empty collection every time, and you get an infinite loop, unless you have a special stream that raises an Error at end of stream. Try:
String new readStream upTo: Character cr
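One way around the end-of-stream problem is to test for the end before each read, which is what a Smalltalk stream's atEnd is for. A Python sketch of that guarded loop, assuming hypothetical helpers up_to and at_end, with io.StringIO standing in for the stream; it answers None instead of looping forever.

```python
import io

CR = "\r"  # stands in for Character cr


def up_to(stream, delimiter):
    # Hypothetical model of upTo:: read up to (not including) delimiter, consume it.
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delimiter:
            return "".join(chars)
        chars.append(ch)


def at_end(stream):
    # Hypothetical stand-in for atEnd: peek one character, then rewind.
    pos = stream.tell()
    exhausted = stream.read(1) == ""
    stream.seek(pos)
    return exhausted


def first_non_empty_line_or_none(stream):
    # Skip empty lines, but stop at end of stream instead of spinning forever.
    while not at_end(stream):
        line = up_to(stream, CR)
        if line:
            return line
    return None  # only empty lines (or nothing at all) were left


print(first_non_empty_line_or_none(io.StringIO("")))             # None
print(first_non_empty_line_or_none(io.StringIO(CR + CR + "ok")))  # ok
```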

The precedence rules of Smalltalk are:
first: unary messages
second: binary messages
third: keyword messages
within each level: left to right
This order can be changed by using parentheses, i.e. ( ) brackets. The expression within a pair of parentheses is evaluated first.
Where parentheses are nested, the innermost pair is evaluated first, then evaluation works outwards towards the outer pair, and finally the rest of the expression outside the parentheses is evaluated.
Because of the strong left-to-right tendency, I often find it useful to read the expression from right to left.
So for [(line := self upTo: Character cr) size = 0] whileTrue.
Approaching it from the end back to beginning gives us the following interpretation.
. Ends the statement. Roughly equivalent to ; in C or Java.
whileTrue What's immediately to the left of it? ], the closing bracket of a block object.
So whileTrue is a unary message being sent to the block [ ... ]
i.e. keep doing this block, while the block evaluates to true
A block returns the result of the last expression evaluated in the block.
The last expression in the block ends in size = 0: a comparison, and a binary message (= sent with the argument 0).
size is generally a unary message sent to a receiver. So we're checking the size of something, to see if it is 0. If the something has a size of 0, keep going.
What is it we are checking the size of? The expression immediately to the left of the message name. To the left of size is
(line := self upTo: Character cr)
That's what we want to know the size of.
So, time to put this expression under the knife.
(line := self upTo: Character cr) is an assignment. line is going to have the result of
self upTo: Character cr assigned to it.
What's at the right-hand end of that expression? cr. It's a unary message, so it has the highest precedence. What does it get sent to, i.e. what is the receiver of the cr message?
Immediately to its left is Character. So send the Character class the message cr This evaluates to an instance of class Character with the value 13 - i.e. a carriage return character.
So now we're down to self upTo: aCarriageReturn
If self (the object receiving the upTo: aCarriageReturn message) does not understand the message upTo:, it will raise an exception.
So if this is code from a working system, we can infer that self has to be an object that understands upTo:. At this point, I am often tempted to search for the message name to see which classes have upTo: in the list of messages they know and understand (i.e. their message protocol).
(Such a search points to the stream classes; in Pharo, for example, upTo: is defined in ReadStream and PositionableStream.)
So self appears to be a stream over character data that (potentially) contains many carriage returns.
upTo: answers the next part of that data, as far as (but not including) the next carriage return.
If that part is empty, i.e. the next character was itself a carriage return, keep going and do it all again.
So it seems we're dealing with a concatenation of multiple cr-terminated strings, processing each one in turn until we find one that's not empty (apart from its carriage return), and assigning it to line.
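Reading right to left is a way to understand the expression; actual evaluation runs in the opposite order, as the precedence rules dictate. A Python sketch that spells out each step, assuming a hypothetical up_to helper as a model of upTo: and io.StringIO as the stream (whose first line is empty here):

```python
import io


def up_to(stream, delimiter):
    # Hypothetical model of upTo:: read up to (not including) delimiter, consume it.
    chars = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delimiter:
            return "".join(chars)
        chars.append(ch)


stream = io.StringIO("\rhello\r")

# (line := stream upTo: Character cr) size = 0, one message at a time:
cr = "\r"                  # 1. unary:   Character cr (inside the parentheses)
line = up_to(stream, cr)   # 2. keyword: stream upTo: cr, then line := ...
size = len(line)           # 3. unary:   (...) size
is_empty = size == 0       # 4. binary:  size = 0
print(repr(line), is_empty)  # '' True
```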

One thing about Smalltalk that I'm personally not a huge fan of is that, while message passing is used consistently to do nearly everything, it can sometimes be difficult to determine what message is being sent to what receiver. This is because Smalltalk doesn't put any delimiters around message sends (unlike Objective-C, which wraps each send in square brackets) and instead lets you chain message sends following a set of precedence rules: unary messages are evaluated first, then binary messages, then keyword messages; messages at the same level are evaluated from left to right, and parentheses override this order. Of course, using temporary variables or even just parentheses to make the order of the messages explicit can reduce the number of situations where you have to think about this order of operations. Here is the above code split over multiple lines, using a temp variable and parentheses to make the message ordering explicit. I think this is a bit clearer about the intent of the code:
[line := self upTo: (Character cr).
 (line size) = 0] whileTrue.
So basically, line is the string made of the characters read from the stream self up to (but not including) the next carriage return character (Character cr).
Then we check line's size in characters and compare it with 0. Because both statements are inside a block (square brackets), we can send the block whileTrue, which evaluates it again, reading a new line each time, for as long as it answers true, i.e. until it answers false. So, yeah, whileTrue really would be clearer if it were called doWhileTrue or something like that.
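Since a block is just an object that answers a value when evaluated, whileTrue can be modelled as a loop over a zero-argument function. A Python sketch, assuming a hypothetical while_true helper and a list iterator standing in for the stream; the assignment lives inside the block, so every evaluation reads a new line.

```python
def while_true(block):
    """Hypothetical model of Smalltalk's unary whileTrue.

    `block` is a zero-argument callable standing in for a Smalltalk block;
    its return value plays the role of the block's last expression.
    The block is evaluated again and again for as long as it answers True.
    """
    while block():
        pass


lines = iter(["", "", "first real line", "second line"])
state = {}


def block():
    state["line"] = next(lines)      # line := self upTo: Character cr
    return len(state["line"]) == 0   # last expression: line size = 0


while_true(block)
print(state["line"])  # first real line
```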
Hope that helps.

Related

ftell/fseek fail when near end of file

I am reading a text file (which happens to be a PDS member, FB 80, i.e. fixed-blocked with 80-byte records):
hFile = fopen(filename,"r");
and have reached the point in the file where only an empty line is left. I save the position:
FilePos = ftell(hFile);
I then read the last line, which contains only a '\n' character, and try to return to the saved position:
fseek(hFile, FilePos, SEEK_SET);
This fails with:
errno=(27) EDC5027I The position specified to fseek() was invalid.
The position specified to fseek() was returned by ftell() a few lines earlier. It has the value 841 in the specific error case I have seen. Checking through the debugger, this is also the value returned by ftell a few lines earlier. It has not been corrupted.
The same code works at other positions in the file, and only fails at the point where there is a single empty line left to read when the position is remembered.
My understanding of how ftell/fseek should work is succinctly captured by another answer on SO.
The value returned from ftell on a text stream has no predictable relationship to the number of characters you have read so far. The only thing you can rely on is that you can use it subsequently as the offset argument to fseek or fseeko to move back to the same file position.
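Python's text files follow the same contract as the quoted answer: TextIOBase.tell() returns an opaque number that is only meaningful when passed back to seek(). As a point of comparison only, here is the remember/read/rewind pattern from the question on an ordinary temporary file; it says nothing about z/OS record-oriented data sets.

```python
import os
import tempfile

# Write a small text file whose last line is empty (just "\n").
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("line one\nline two\n\n")

with open(path, "r") as f:
    f.readline()
    f.readline()
    pos = f.tell()        # opaque cookie for text files: only pass it back to seek()
    last = f.readline()   # the final line, containing only "\n"
    f.seek(pos)           # rewind to the remembered position
    again = f.readline()

os.remove(path)
print(repr(last), repr(again))  # '\n' '\n'
```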
It would seem that I cannot rely on the one thing I should be able to rely on.
My question is: why does fseek fail in this way?
As z/OS has some file formats that are unique, you might find the answer in this Knowledge Center article.
Given that you are processing a PDS member, I would suspect that this is record-level I/O, which is handled differently from the stream I/O that is more common in distributed implementations.
I do not know why fseek fails in this way, but if your common usage pattern is to use ftell to get the position and then fseek to go to that position, I strongly suggest using fgetpos and fsetpos instead for data set I/O. Not only will you avoid the problem you are seeing, but they also perform better for certain data set characteristics.

String replacement with .subst in a for loop

I'd like to make a string substitution in a for block using a named capture. I expected to get the numbers 1, 2, 3 as output. But it is Nil for the first run, and then 1 and 2 for the 2nd and 3rd runs. How do I use .subst correctly in the loop construct? I see the same behavior when using a map construct instead of the for loop. It does work as expected if I replace with a fixed string value.
for <a1 b2 c3> -> $var {
say $var;
say $var.subst(/.$<nr>=(\d)/, $<nr>); #.subst(/.$<nr>=(\d)/, 'X'); #OK
}
#`[
This is Rakudo version 2019.11 built on MoarVM version 2019.11
Output:
a1
Use of Nil in string context
in block at test3.pl6 line 3
b2
1
c3
2
]
TL;DR Defer evaluation of $<nr> until after evaluation of the regex. @JoKing++ suggests one way. Another is to just wrap the replacement in braces ({$<nr>}).
What happens when your original code calls subst
Before Raku attempts to call the subst routine, it puts together a list of arguments to pass to it.
There are two values. The first is a regex. It does not run. The second value is $<nr>. It evaluates to Nil because, at the start of a program, the current match object variable is bound to something that claims its value is Nil and any attempt to access the value of a key within it -- $<nr> -- also returns Nil. So things have already gone wrong at this point, before subst ever runs.
Once Raku has assembled this list of arguments, it attempts to call subst. It succeeds, and subst runs.
To get the next match, subst runs the regex. This updates the current match object variable $/. But it's too late to make any difference to the substitution value that has already been passed to subst.
With match in hand, subst next looks at the substitution argument. It finds it's Nil and acts accordingly.
For the second call of subst, $<nr> has taken on the value from the first call of subst. And so on.
Two ways to defer evaluation of $<nr>
@JoKing suggests considering S///. This construct evaluates the regex (between the first pair of /s) first, then the replacement (between the last pair of /s). (The same principle applies if you use other valid S syntaxes like S[...] = ....)
If you use subst with the replacement wrapped in braces ({$<nr>}), then, as explained in the previous section, Raku puts together the argument list before calling it. It finds a regex (which it does not run) and a closure (which it does not run either). It then attempts to call subst with those arguments and succeeds in doing so.
Next, subst starts running. It has received code for both the match (a regex) and the substitution (a closure).
It runs the regex as the matching operation. If the regex returns a match then subst runs the closure and uses the value it returns as the substitution.
Thus, because we switched from passing $<nr> as a naked value, which meant it got frozen into Nil, to passing it wrapped in a closure, which deferred its evaluation until $/ had been set to a match with a populated <nr> entry, we solved the problem.
Note that this only works because whoever designed/implemented subst was smart/nice enough to allow both the match and substitution arguments to be forms of Code (a regex for the match and an ordinary closure for the substitution) if a user wants that. It then runs the match first and only then runs the substitution closure, if it has been passed one, using the result of that call as the final substitution. Similarly, S/// works because it has been designed to evaluate the replacement only after it has evaluated the regex.
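Python's re.sub offers the same choice, which may help as an analogy (this is Python, not Raku): a replacement string is fixed before matching starts, whereas a callable repl is invoked with each match object after the match is made. In the sketch below, eager and deferred are hypothetical helpers; eager deliberately reproduces the stale-$/ effect by hand, feeding each call the capture from the previous match.

```python
import re

PATTERN = re.compile(r".(?P<nr>\d)")


def eager(items):
    # Mimics passing $<nr> as a plain value: the replacement is computed
    # from the *previous* match before the new match is made.
    out, last = [], None
    for s in items:
        replacement = last.group("nr") if last else ""  # stale (or empty) value
        out.append(PATTERN.sub(replacement, s))
        last = PATTERN.search(s)
    return out


def deferred(items):
    # Mimics passing {$<nr>}: a callable evaluated only after each match.
    return [PATTERN.sub(lambda m: m.group("nr"), s) for s in items]


print(eager(["a1", "b2", "c3"]))     # ['', '1', '2']
print(deferred(["a1", "b2", "c3"]))  # ['1', '2', '3']
```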

Rules for barewords

Barewords can be used at the left hand side of Pair declarations (this is not documented yet, I'm addressing this issue right now, but I want to get everything right). However, I have not found what is and what's not going to be considered a bareword key anywhere.
This seems to work
say (foo'bar-baz => 3); # OUTPUT: «foo'bar-baz => 3␤»
This does not
say (foo-3 => 3); # OUTPUT: «(exit code 1) ===SORRY!=== Error while compiling /tmp/jorTNuKH9V␤Undeclared routine:␤ foo used at line 1␤␤»
So it apparently follows the same syntax as the ordinary identifiers. Is that correct? Am I missing something here?
There are no barewords in Perl 6 in the sense that they exist in Perl 5, and the term isn't used in Perl 6 at all.
There are two cases that we might call a "bare identifier":
An identifier immediately followed by zero or more horizontal whitespace characters (\h*), followed by the characters =>. This takes the identifier on the left as a pair key, and the term parsed after the => as a pair value. This is an entirely syntactic decision; the existence of, for example, a sub or type with that identifier will not have any influence.
An identifier followed by whitespace (or some other statement separator or terminator). If there is already a type of that name, then it is compiled into a reference to the type object. Otherwise, it will always be taken as a sub call. If no sub declaration of that name exists yet, it will be considered a call to a post-declared sub, and an error produced at CHECK-time if a sub with that name isn't later declared.
These two cases are only related in the sense that they are both cases of terms in the Perl 6 grammar, and that they both look to parse an identifier, which follow the standard rules linked in the question. Which wins is determined by Longest Token Matching semantics; the restriction that there may only be horizontal whitespace between the identifier and => exists to make sure that the identifier, whitespace, and => will together be counted as the declarative prefix, and so case 1 will always win over case 2.

Parse streaming JSON in Objective C

I am using JSON-RPC over TCP. The problem is that I could not find any JSON parser capable of parsing multiple concatenated JSON objects correctly, and it would be relatively hard to split them, since no delimiter is used.
Does anyone know how I could handle, e.g., this:
{"foo":false, "bar": true, "baz": "cool"}{"ba
Somehow I need to split it so I end up just with the first, complete JSON object. The remaining string needs to stay in buffer until I have enough data to parse it properly.
XBMC's JSON-RPC doc does give a hint:
As such, your client needs to be able to deal with this, eg. by counting and matching curly braces ({}).
Update: As Jody Hagins pointed out, beware of curly braces inside JSON strings when using this approach.
Another possible and probably much better solution would be using a streaming JSON parser like yajl (or its Objective-C wrapper yajl-objc). You can feed the parser with data until it says the current object is done and then restart parsing.
@ePirat, if someone just concatenates multiple JSON dictionaries without delimiters, they should be shot.
For parsing: NSJSONSerialization parses NSData, which could come in any encoding. Fortunately, if you have multiple JSON dictionaries concatenated, they are quite easy to take apart. All you need to do is look at the bytes and check for the characters \ " { and }.
If you find a { then increase the counter for "open brackets".
If you find a } then decrease the counter for "open brackets". If the counter is at zero, you've found the end of a dictionary.
If you find a ", then repeatedly look at the next character. If the next character is a " then skip it and go to the normal processing (you've found the end of a string). If the next character is a \ then skip that character and the following character. If the next character is anything else, skip it.
If you reach the end of the data, then your JSON data is incomplete. You could remember which state you were in (count of open brackets, whether you are parsing a string, and if parsing a string whether you just encountered a backslash character) and continue right where you left off.
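The counting rules above can be sketched in Python (the question is about Objective-C, so treat this only as an illustration of the algorithm). ConcatenatedJSONSplitter is a hypothetical class; it assumes UTF-8 input in which every top-level value is an object, and keeps its state between calls so a chunk may end anywhere, even inside a string or an escape sequence. Because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, scanning byte by byte for the ASCII characters \ " { } is safe.

```python
import json


class ConcatenatedJSONSplitter:
    """Hypothetical splitter for concatenated JSON objects (UTF-8 bytes)."""

    def __init__(self):
        self.buffer = bytearray()
        self.start = 0          # offset where the current object began
        self.pos = 0            # next byte to examine
        self.depth = 0          # count of open braces
        self.in_string = False
        self.escaped = False    # previous byte was a backslash inside a string

    def feed(self, data):
        """Add bytes; return a list of complete top-level objects (as bytes)."""
        self.buffer.extend(data)
        complete = []
        while self.pos < len(self.buffer):
            byte = self.buffer[self.pos]
            self.pos += 1
            if self.in_string:
                if self.escaped:
                    self.escaped = False          # skip the escaped character
                elif byte == ord("\\"):
                    self.escaped = True
                elif byte == ord('"'):
                    self.in_string = False        # end of the string
            elif byte == ord('"'):
                self.in_string = True
            elif byte == ord("{"):
                if self.depth == 0:
                    self.start = self.pos - 1     # a new top-level object begins
                self.depth += 1
            elif byte == ord("}"):
                self.depth -= 1
                if self.depth == 0:               # end of a top-level object
                    complete.append(bytes(self.buffer[self.start:self.pos]))
        # Discard bytes already handed out; keep an unfinished object buffered.
        keep_from = self.start if self.depth > 0 else self.pos
        del self.buffer[:keep_from]
        self.pos -= keep_from
        self.start = 0
        return complete


splitter = ConcatenatedJSONSplitter()
for chunk in (b'{"foo":false, "bar": true, "baz": "cool"}{"ba',
              b'z": "a } and an escaped \\" quote"}'):
    for raw in splitter.feed(chunk):
        print(json.loads(raw))
```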
No need to convert the NSData to a string until you've separated it into dictionaries. If you suspect that you might be given UTF-16 or UTF-32, check whether bytes 0, 1, 2 or 1, 2, 3 are zero (UTF-32), then check whether bytes 0 and 2 or 1 and 3 are zero (UTF-16). But in that case, if the server sends non-standard JSON in UTF-16 or UTF-32, change "the person responsible should be shot" to "the person responsible must be shot".
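The zero-byte test just described can be sketched as follows, again in Python for illustration. guess_json_encoding is a hypothetical helper; it assumes the data starts with at least two ASCII characters (as concatenated JSON objects do, e.g. {"), ignores byte order marks, and falls back to UTF-8.

```python
def guess_json_encoding(data: bytes) -> str:
    """Hypothetical helper: guess the encoding from zero bytes in the first four bytes."""
    if len(data) < 4:
        return "utf-8"
    b0, b1, b2, b3 = data[:4]
    if b0 == 0 and b1 == 0 and b2 == 0:
        return "utf-32-be"   # 00 00 00 7B ...
    if b1 == 0 and b2 == 0 and b3 == 0:
        return "utf-32-le"   # 7B 00 00 00 ...
    if b0 == 0 and b2 == 0:
        return "utf-16-be"   # 00 7B 00 22 ...
    if b1 == 0 and b3 == 0:
        return "utf-16-le"   # 7B 00 22 00 ...
    return "utf-8"


print(guess_json_encoding('{"a": 1}'.encode("utf-16-le")))  # utf-16-le
```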

Test of a regex (substring) occurs anywhere in any of the items in an array of strings

My spec wants to test whether a certain substring occurs within any entry of an array of strings.
p @banner.errors.messages[:base] #=> ["Specify a leader text or an image, not both"]
All my spec really wants to know, is whether or not the "not both" string occurs in any of the array-items in there.
@banner.errors.messages[:base].should include(/not both/)
fails, because "not both" is not included in ["Specify a leader text or an image, not both"]
Note: When I test against the literal string (should include("Specify...both")), things work. But that seems dirty to me. Such user-facing texts are not critical for the test to pass, and such texts will change: every time the error message is changed, I will need to update my tests.
Maybe like this?
@banner.errors.messages[:base].join.should match(/not both/)
But note that there is an edge case where the match might span two or more array items, e.g. one item ending with "not " and the next one starting with "both".
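The edge case can be avoided by testing each item on its own rather than the joined string. A Python analogue (not RSpec) of both checks, with messages and parts as made-up sample data:

```python
import re

messages = ["Specify a leader text or an image, not both"]

# Check each message separately, so a match can never span two messages.
print(any(re.search(r"not both", m) for m in messages))  # True

# Joining without a separator can create a false positive across items:
parts = ["... or not ", "both sides ..."]
print(bool(re.search(r"not both", "".join(parts))))  # True (spans two items)
print(any(re.search(r"not both", m) for m in parts))  # False
```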