PostScript mark token - operators

In PostScript if you have
[4 5 6]
you have the following tokens:
mark integer integer integer mark
The stack goes like this:
| mark |
| mark | integer |
| mark | integer | integer |
| mark | integer | integer | integer |
| array |
Now my question:
Is the ]-mark operator a literal object or an executable object?
Am I correct that the [-mark is a literal object (just data) and that the ]-mark is an executable object (because you always need to create an array when you see this ]-mark operator) ?
PostScript Language Reference Manual section 3.3.2 gives me:
The [ and ] operators, when executed, produce a literal array object with the en-closed objects as elements. Likewise, << and >> (LanguageLevel 2) produce a
literal dictionary object.
That is not clear for me if both [ ] operators are executable or only the ] operator.

Summary.
All of these special tokens, [, ], <<, >>, come out of the scanner as executable names. [ and << are defined to yield a marktype object (so they are not operators per se, but they are executable names defined in systemdict where all the operators live). ] and >> are defined as procedures or operators which are executed just like any other procedure or operator. These use the counttomark operator to find the opening bracket. But all of these tokens are treated specially by the scanner, which recognizes them without surrounding whitespace since they are part of its delimiter set.
Details.
It all depends on when you look at it. Let's trace through what the interpreter does with these tokens. I'm going to illustrate this with a string, but it works just the same with a file.
So if you have an input string
([4 5 6]) cvx exec
cvx makes a literal object executable. The program stream is a file object also labeled executable. exec pushes an object on the Execution Stack, where it is encountered by the interpreter on the next iteration of the inner interpreter processing loop. When executing the program stream, the executable file object is topmost on the Execution Stack.
The interpreter uses token to call the scanner. The scanner skips initial whitespace, then reads all non-whitespace characters up to the next delimiter, then attempts to interpret the string as a number, and failing that it becomes an executable name. The brackets are part of the set of delimiters, and so are termed 'self-delimiting'. So the scanner reads the one bracket character, stops reading because it's a delimiter, discovers it cannot be a number, so it yields an executable name.
Top of Exec Stack | Operand Stack
(4 5 6]) [ |
Next, the interpreter loop executes anything executable (unless it's an array). Executing a token means loading it from the dictionary, and then executing the definition if it's executable. [ is defined as a -mark- object, same as the name mark is defined. It's not technically an operator or a procedure, it's just a definition. Automatic loading happens because the name comes out of the scanner with the executable flag set.
(4 5 6]) | -mark-
The scanner then yields 4, 5, and 6 which are numbers and get pushed straight to the operand stack. 6 is delimited by the ] which is pushed back on the stream.
(]) | -mark- 4 5 6
The interpreter doesn't execute the numbers since they are not executable, but it would be just the same if it did. The action for executing a number is simply to push it on the stack.
Then, finally the scanner encounters the right bracket ]. And that's where the magic happens. Self-delimited, it doesn't need to be followed by any whitespace. The scanner yields the executable name ] and the interpreter executes it by loading and it finds ...
{ counttomark array astore exch pop }
Or maybe an actual operator that does this. But, yeah. counttomark yields the number of elements. array creates an array of that size. astore fills an array with elements from the stack. And exch pop to discard that pesky mark once and for all.
For dictionaries, << is exactly the same as [. It drops a mark. Then you line up some key-value pairs, and >> is procedure that does something to effect of ...
{ counttomark dup dict begin 2 idiv { def } repeat pop currentdict end }
Make a dictionary. Define all the pairs. Pop the mark. Yield the dictionary. This version of the procedure tries to create a fast dictionary by making it double-sized. Move the 2 idiv to before dup to make a small dictionary.
So, to get philosophical, counttomark is the operator you're using. And it requires a special object-type that isn't used for anything else, the marktype object, -mark-. The rest is just syntactical sugar to let you access this stack-counting ability to create linear data-structures.
Appendix
Here's a procedure that models the interpreter loop reading from currentfile.
{currentfile token not {exit} if dup type /arraytype ne {exec} if }loop
exec is responsible for loading (and further executing) any executable names. You can see from this that token really is the name of the scanner; and that procedures (arrays) directly encountered by the interpreter loop are not executed (type /arraytype ne {exec} if).
Using token on strings lets you do really cool stuff, however. For example, you can dynamically construct procedure bodies with substituted names. This is very much like a lisp macro.
/makeadder { % n . { n add }
1 dict begin
/n exch def
({//n add}) token % () {n add} true
pop exch pop % {n add}
end
} def
token reads the entire procedure from the string, substituting the immediately-evaluated name //n with its currently defined value. Notice here that the scanner reads an executable array all at once, effectively executing [ ... ] cvx internally before returning (In certain interpreters, like my own xpost, this allows you to bypass the stack-size limits to build an array, because the array is built in separate memory. But Level 2 garbage collection makes this largely irrelevant).
There is also the bind operator which modifies a procedure by replacing operator names with the operator objects themselves. These tricks help you to factor-out name lookups in speed-critical procedures (like inner loops).

Both [ and ] are executable tokens. [ produces a mark object, ] creates an array of objects to the last mark

Related

How do I read a line of input from a user, until an EOF is hit, in GNU Prolog?

I have been reading the GNU Prolog documentation to figure out how to read a line of input until an end_of_file atom is reached. Here is my pseudocode for writing such a goal:
read_until_end(Chars, Out):
if peek_char unifies with end_of_file, Out = Chars
otherwise, get the current character, add it to a buffer, and keep reading
I implemented that like this:
read_until_end(Chars, Out) :-
peek_char(end_of_file) -> Out = Chars;
peek_char(C) -> read_until_end([C | Chars], Out).
prompt(Line) :-
write('> '),
read_until_end([], Line).
Here's what happens in the REPL:
| ?- prompt(Line).
> test
Fatal Error: global stack overflow (size: 32768 Kb, reached: 32765 Kb, environment variable used: GLOBALSZ)
If I print out C for the second branch of read_until_end, I can see that peek_char always gives me the same character, 'b'. I think that I need a way to progress some type of input character index or something like that, but I can't find a way to do so in the documentation. If I knew a way, I would probably have to use recursion to progress such a pointer, since I can't have any mutable state, but aside from that, I do not know what to do. Does anyone have any advice?
You are using peek_char/1 to get the next character, but that predicate does not consume the character from the stream (it just "peeks" the stream). Therefore an infinite recursion undergoes in your code that ends with a global stack overflow.
You should use get_char/1 to read and consume the character from the stream, and reverse/2 the list of collected chars:
read_until_end(Chars, Out) :-
get_char(Char),
(
Char = end_of_file -> reverse(Chars, Out)
;
read_until_end([Char | Chars], Out)
).
To avoid the need to reverse the list you may slightly modify your procedures to build the list in order (without using an accumulator):
read_until_end(Output) :-
get_char(Char),
(
Char = end_of_file -> Output=[]
;
(
Output=[Char|NOutput],
read_until_end(NOutput)
)
).
prompt(Line) :-
write('> '),
read_until_end(Line).

What's the inverse of block: load text in rebol / red

Let's say I have some rebol / red code. If I load the source text, I get a block, but how can get back the source text from block ? I tried form block but it doesn't give back the source text.
text: {
Red [Title: "Red Pretty Printer"]
out: none ; output text
spaced: off ; add extra bracket spacing
indent: "" ; holds indentation tabs
emit-line: func [] [append out newline]
emit-space: func [pos] [
append out either newline = last out [indent] [
pick [#" " ""] found? any [
spaced
not any [find "[(" last out find ")]" first pos]
]
]
]
emit: func [from to] [emit-space from append out copy/part from to]
clean-script: func [
"Returns new script text with standard spacing."
script "Original Script text"
/spacey "Optional spaces near brackets and parens"
/local str new
] [
spaced: found? spacey
clear indent
out: append clear copy script newline
parse script blk-rule: [
some [
str:
newline (emit-line) |
#";" [thru newline | to end] new: (emit str new) |
[#"[" | #"("] (emit str 1 append indent tab) blk-rule |
[#"]" | #")"] (remove indent emit str 1) break |
skip (set [value new] load/next str emit str new) :new
]
]
remove out ; remove first char
]
print clean-script read %clean-script.r
}
block: load text
LOAD is a higher-level operation with complex behaviors, e.g. it can take a FILE!, a STRING!, or a BLOCK!. Because it does a lot of different things, it's hard to speak of its exact complement as an operation. (For instance, there is SAVE which might appear to be the "inverse" of when you LOAD from a FILE!)
But your example is specifically dealing with a STRING!:
If I load the source text, I get a block, but how can get back the source text from block ?
As a general point, and very relevant matter: you can't "get back" source text.
In your example above, your source text contained comments, and after LOAD they will be gone. Also, a very limited amount of whitespace information is preserved, in the form of the NEW-LINE flag that each value carries. Yet what specific indentation style you used--or whether you used tabs or spaces--is not preserved.
On a more subtle note, small amounts of notational distinction are lost. STRING! literals which are loaded will lose knowledge of whether you wrote them "with quotes" or {with curly braces}...neither Rebol nor Red preserve that bit. (And even if they did, that wouldn't answer the question of what to do after mutations, or with new strings.) There are variations of DATE! input formats, and it doesn't remember which specific one you used. Etc.
But when it comes to talking about code round-tripping as text, the formatting is minor compared to what happens with binding. Consider that you can build structures like:
>> o1: make object! [a: 1]
>> o2: make object! [a: 2]
>> o3: make object! [a: 3]
>> b: compose [(in o1 'a) (in o2 'a) (in o3 'a)]
== [a a a]
>> reduce b
[1 2 3]
>> mold b
"[a a a]"
You cannot simply serialize b to a string as "[a a a]" and have enough information to get equivalent source. Red obscures the impacts of this a bit more than in Rebol--since even operations like to block! on STRING! and system/lexer/transcode appear to do binding into the user context. But it's a problem you will face on anything but the most trivial examples.
There are some binary formats for Rebol2 and Red that attempt to address this. For instance in "RedBin" a WORD! saves its context (and index into that context). But then you have to think about how much of your loaded environment you want dragged into the file to preserve context. So it's certainly opening a can of worms.
This isn't to say that the ability to MOLD things out isn't helpful. But there's no free lunch...so Rebol and Red programs wind up having to think about serialization as much as anyone else. If you're thinking of doing processing on any source code--for the reasons of comment preservation if nothing else--then PARSE should probably be the first thing you reach for.

How to read elements from a line in VHDL?

I'm trying to use VHDL to read from a file that can have different formats. I know you're supposed to use the following two lines of code to read a line at a time, the read individual elements in that line.
readline(file, aline);
read(aline, element);
However my question is what will read(aline, element) return into element? What will it return if the line is empty? What will it return if I've used it let's say 5 times and my line only has 4 characters?
The reason I want to know is that if I am reading a file with an arbitrary number of spaces between valid data, how do I parse this valid data?
The file contains ASCII characters separated by arbitrary amounts of white space (any number of spaces, tabs, or new lines). If the line starts with a # that line is a comment and should be ignored.
Outside of these comments, the first part of the file contains characters that are only letters or numbers in combinations of variable size. In other words this:
123 ABC 12ABB3
However, the majority of the file (after a certain number of read words) will be purely numbers of arbitrary length, separated by an arbitrary amount of white space. In other words, the second part of the file is this:
255 0 2245 625 430
2222 33 111111
and I must be able to parse these numbers (and interpret them as such) individually.
As mentioned in the comments, all the read procedures in std.textio and ieee.std_logic_textio skip over leading spaces apart from the character and string versions (because a space is as much a character as any other).
You can test whether a line variable (the buffer) is empty like this:
if L'length > 0 then
where L is your line variable. There is also a set of overloaded read procedures with an extra status output:
procedure read (L : inout LINE;
VALUE: out <type> ;
GOOD : out BOOLEAN);
The extra output - GOOD - is true if the read was successful and false if it wasn't. The advantage of these if that the read is unsuccessful, the simulation does not stop (as it does with the regular procedures). Also, with the versions in std.textio, if the read is unsuccessful, the read is non-destructive (ie whatever you were trying to read remains in the buffer). This is not the case with the versions in ieee.std_logic_textio, however.
If you really do not know what format you are trying to read, you could read the entire line into a string, like this:
variable S : string(1 to <some big number>);
...
readline(F, L);
assert L'length < S'length; -- make sure S is big enough
S := (others => ' '); -- make sure that the previous line is overwritten
if L'length > 0 then
read(L, S(1 to L'length);
end if;
The line L is now in the string S. You can then write some code to parse it. You may find the type attribute 'value useful. This converts a string to some type, eg
variable I : integer;
...
I := integer'value(S(12 to 14));
would set integer I to the value contained in elements 12 to 14 of string S.
Another approach, as suggested by user1155120 below, is to peek at the values in the buffer, eg
if L'length > 0 then -- check that the L isn't empty, otherwise the next line blows up
if L.all(1) = '#' then
-- the first character of the line is a '#' so the line must be a comment

Variable substitution within braces in Tcl

Correct me wherever I am wrong.
When we use the variables inside braces, the value won't be replaced during evaluation and simply passed on as an argument to the procedure/command. (Yes, some exception are there like expr {$x+$y}).
Consider the following scenarios,
Scenario 1
% set a 10
10
% if {$a==10} {puts "value is $a"}
value is 10
% if "$a==10" "puts \"value is $a\""
value is 10
Scenario 2
% proc x {} {
set c 10
uplevel {set val $c}
}
%
% proc y {} {
set c 10
uplevel "set val $c"
}
% x
can't read "c": no such variable
% y
10
% set val
10
%
In both of the scenarios, we can see that the variable substitution is performed on the body of the if loop (i.e. {puts "value is $a"}), whereas in the uplevel, it is not (i.e. {set val $c}), based on the current context.
I can see it as if like they might have access it via upvar kind of stuffs may be. But, why it has to be different among places ? Behind the scene, why it has to be designed in such this way ? Or is it just a conventional way how Tcl works?
Tcl always works exactly the same way with exactly one level of interpretation, though there are some cases where there is a second level because a command specifically requests it. The way it works is that stuff inside braces is never interpolated or checked for word boundaries (provided those braces start at the start of a “word”), stuff in double quotes is interpolated but not parsed for word boundaries (provided they start a word), and otherwise both interpolation and word boundary scanning are done (with the results of interpolation not scanned).
But some commands send the resulting word through again. For example:
eval {
puts "this is an example with your path: $env(PATH)"
}
The rule applies to the outer eval, but that concatenates its arguments and then sends the results into Tcl again. if does something similar with its body script except there's no concatenation, and instead there's conditional execution. proc also does the same, except it delays running the code until you call the procedure. The expr command is like eval, except that sends the script into the expression evaluation engine, which is really a separate little language. The if command also uses the expression engine (as do while and for). The expression language understands $var (and […]) as well.
So what happens if you do this?
set x [expr $x + $y]
Well, first we parse the first word out, set, then x, then with the third word we start a command substitution, which recursively enters the parser until the matching ] is found. With the inner expr, we first parse expr, then $x (reading the x variable), then +, then $y. Now the expr command is invoked with three arguments; it concatenates the values with spaces between them and sends the result of the concatenation into the expression engine. If you had x previously containing $ab and y containing [kaboom], the expression to evaluate will be actually:
$ab + [kaboom]
which will probably give you an error about a non-existing variable or command. On the other hand, if you did expr {$x + $y} with the braces, you'll get an addition applied to the contents of the two variables (still an error in this case, because neither looks like a number).
You're recommended to brace your expressions because then the expression that you write is the expression that will be evaluated. Otherwise, you can get all sorts of “unexpected” behaviours. Here's a mild example:
set x {12 + 34}
puts [expr $x]
set y {56 + 78}
puts [expr $y]
puts [expr $x * $y]
Remember, Tcl always works the same way. No special cases. Anything that looks like a special cases is just a command that implements a little language (often by calling recursively back into Tcl or the expression engine).
In addition to Donal Fellows's answer:
In scenario 2, in x the command uplevel {set val $c} is invoked, and fails because there is no such variable at the caller's level.
In y, the equivalent of uplevel {set val 10} is invoked (because the value of c is substituted when the command is interpreted). This script can be evaluated at the caller's level since it doesn't depend on any variables there. Instead, it creates the variable val at that level.
It has been designed this way because it gives the programmer more choices. If we want to avoid evaluation when a command is prepared for execution (knowing that the command we invoke may still evaluate our variables as it executes), we brace our arguments. If we want evaluation to happend during command preparation, we use double quotes (or no form of quoting).
Now try this:
% set c 30
30
% x
30
% y
10
If there is such a variable at the caller's level, x is a useful command for setting the variable val to the value of c, while y is a useful command for setting the variable val to the value encapsulated inside y.

How to pass functions as parameters in REBOL

I have attempted to pass a function as a parameter in the REBOL programming language, but I haven't figured out the correct syntax yet:
doSomething: func [a b] [
a b
a b
]
doSomething print "hello" {This should pass print as the first argument and "hello" as the second argument.}
This produces an error, since the print function is being called instead of being passed:
hello
*** ERROR
** Script error: doSomething does not allow unset! for its a argument
** Where: try do either either either -apply-
** Near: try load/all join %/users/try-REBOL/data/ system/script/args...
Is it possible to pass the print function as a parameter instead of calling the print function?
I've found the solution: I only need to add : before the name of the function that is being passed as a parameter.
Here, the :print function is being passed as a parameter instead of being invoked with "hello" as its argument:
doSomething: func [a b] [
a b
a b
]
doSomething :print "hello" {This should pass print as the first argument and "hello" as the second argument.}
You have discovered that by the nature of the system, when the interpreter comes across a WORD! symbol type which has been bound to a function, it will invoke the function by default. The default interpreter seeing a GET-WORD! symbol type, on the other hand, suppresses invocation and just returns the value the word is bound to.
The evaluator logic is actually rather straightforward for how it reacts when it sees a certain symbol type. Another way of suppressing invocation is the single quote, which will give you a LIT-WORD! symbol... but these become evaluated as the corresponding WORD! when it sees them:
>> some-word: 'print
>> type? some-word
== word!
In fact, the behavior of a GET-WORD! when the evaluator sees it is equivalent to using the GET function with a WORD!
doSomething: func [a b] [
a b
a b
]
doSomething get 'print "hello" {Message}
The interpreter sees the LIT-WORD! 'print and evaluates that into the WORD! for print, which is then passed to GET, which gives you a FUNCTION! back.
Simplicity of the interpreter logic is why you get things like:
>> a: b: c: 10 print [a b c]
10 10 10
Due to the nature of how it handles a SET-WORD! symbol followed by complete expressions. That yields also the following code printing out 20:
if 10 < a: 20 [
print a
]
Other languages achieve such features with specialized constructs (like multiple initialization, etc.) But Rebol's logic is simpler.
Just wanted to elaborate a bit to help explain what you were looking at. My answer to this other question might provide some more insight into the edge cases, historical and future: "When I use error? and try, err need a value"