display working out of order in Chez Scheme - chez-scheme

I'm using chez 9.5.4 on a Mac.
The following code:
;; demo.ss
(map display (list "this " "is " "weird "))
does this:
$ chez --script demo.ss
weird this is
Why the accidental Yoda?
How do I prevent this?
It works as expected in Chicken Scheme.

As answered by u/bjoli on reddit:
Your code is relying on unspecified behaviour: The order of map is unspecified.
You want for-each.
Like Racket and Guile, Chez has no stack overflow, so it can implement map as a right fold and skip the (reverse ...) at the end. That is faster, except in some continuation-heavy code.
Schemes without the expanding-stack optimization, such as Chicken, all do a left fold.

The short answer is that this is just what map does.
According to the r7rs-small specification, on page 51 of https://small.r7rs.org/attachment/r7rs.pdf :
The dynamic order in which proc is applied to the elements of the lists is unspecified.
That's because map is intended for transforming lists by applying a pure function to each of their elements. The only effect of map should be its result list.
As divs1210 quotes u/bjoli in pointing out, Scheme also defines a procedure that does the thing you want. In fact, for-each is described on the very same page of the r7rs-small pdf! It says:
The arguments to for-each are like the arguments to map, but for-each calls proc for its side effects rather than for its values. Unlike map, for-each is guaranteed to call proc on the elements of the lists in order from the first element(s) to the last, and the value returned by for-each is unspecified.
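To make the difference in call order concrete, here is a minimal sketch in JavaScript, chosen only for illustration. The names mapLeftFold, mapRightFold and traceOrder are hypothetical, and neither function is the actual implementation used by Chez or Chicken; the sketch also does not reproduce the exact order Chez printed above. It only shows that two correct map strategies can return the same list while applying the procedure in different orders:

```javascript
// Hypothetical illustration: two mapping strategies that return the same
// result but apply `proc` in different orders.

// Left fold: apply proc front to back, accumulating results as we go.
function mapLeftFold(proc, list) {
  const out = [];
  for (const x of list) {
    out.push(proc(x));
  }
  return out;
}

// Right fold: recurse to the end of the list first, then apply proc on the
// way back, so the last element is processed first.
function mapRightFold(proc, list) {
  if (list.length === 0) return [];
  const [head, ...tail] = list;
  const rest = mapRightFold(proc, tail); // the tail is handled before the head
  return [proc(head), ...rest];
}

// Record the order in which proc is called by a given map implementation.
function traceOrder(mapImpl, list) {
  const calls = [];
  const result = mapImpl((x) => {
    calls.push(x); // the side effect, standing in for `display`
    return x.toUpperCase();
  }, list);
  return { calls, result };
}

const words = ["this ", "is ", "weird "];
const left = traceOrder(mapLeftFold, words);
const right = traceOrder(mapRightFold, words);
console.log(left.calls.join(""));  // call order of the left fold
console.log(right.calls.join("")); // call order of the right fold
```

Because the returned lists are identical, code that only uses the result of map cannot tell the two apart; only code that relies on side effects, like the display call in the question, can.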

Related

Where is contains( Junction) defined?

This code works:
(3,6...66).contains( 9|21 ).say # OUTPUT: «any(True, True)␤»
And returns a Junction. It's also tested, but not documented.
The problem is I can't find its implementation anywhere. The Str code, which is also called from Cool, never returns a Junction (it does not take a Junction, either). There are no other contains methods in the source.
Since it's autothreaded, it's probably specially defined somewhere. I have no idea where, though. Any help?
TL;DR Junction autothreading is handled by a single central mechanism. I have a go at explaining it below.
(The body of your question starts with you falling into a trap, one I think you documented a year or two back. It seems pretty irrelevant to what you're really asking but I cover that too.)
How junctions get handled
Where is contains( Junction) defined? ... The problem is I can't find [the Junctional] implementation anywhere. ... Since it's autothreaded, it's probably specially defined somewhere.
Yes. There's a generic mechanism that automatically applies autothreading to all P6 routines (methods, operators etc.) that don't have signatures that explicitly control what happens with Junction arguments.
Only a tiny handful of built in routines have these explicit Junction handling signatures -- print is perhaps the most notable. The same is true of user defined routines.
.contains does not have any special handling. So it is handled automatically by the generic mechanism.
Perhaps the section The magic of Junctions of my answer to an earlier SO Filtering elements matching two regexes will be helpful as a high level description of the low level details that follow below. Just substitute your 9|21 for the foo & bar in that SO, and your .contains for the grep, and it hopefully makes sense.
Spelunking the code
I'll focus on methods. Other routines are handled in a similar fashion.
method AUTOTHREAD does the work for full P6 methods.
This is set up in this code, which sets up handling for both nqp and full P6 code.
The above linked P6 setup code in turn calls setup_junction_fallback.
When a method call occurs in a user's program, it involves calling find_method (modulo cache hits as explained in the comment above that code; note that the use of the word "fallback" in that comment is about a cache miss, which is technically unrelated to the other fallback mechanisms evident in the code we're spelunking through).
The bit of code near the end of this find_method handles (non-cache-miss) fallbacks.
Which arrives at find_method_fallback which starts off with the actual junction handling stuff.
A trap
This code works:
(3,6...66).contains( 9|21 ).say # OUTPUT: «any(True, True)␤»
It "works" to the degree this does too:
(3,6...66).contains( 2 | '9 1' ).say # OUTPUT: «any(True, True)␤»
See Lists become strings, so beware .contains() and/or discussion of the underlying issues such as pmichaud's comment.
Routines like print, put, infix ~, and .contains are string routines. That means they coerce their arguments to Str. By default the .Str coercion of a listy value is its elements separated by spaces:
put 3,6...18; # 3 6 9 12 15 18
put (3,6...18).contains: '9 1'; # True
It's also tested
Presumably you mean the two tests with a *.contains argument passed to classify:
my $m := @l.classify: *.contains: any 'a'..'f';
my $s := classify *.contains( any 'a'..'f'), @l;
Routines like classify are list routines. While some list routines do a single operation on their list argument/invocant, eg push, most of them, including classify, iterate over their list doing something with/to each element within the list.
Given a sequence invocant/argument, classify will iterate it and pass each element to the test, in this case a *.contains.
The latter will then coerce individual elements to Str. This is a fundamental difference compared to your example which coerces a sequence to Str in one go.

Is it possible to preserve variable names when writing and reading a term programmatically?

I'm trying to write an SWI-Prolog predicate that applies numbervars/3 to a term's anonymous variables but preserves the user-supplied names of its non-anonymous variables. I eventually plan on adding some kind of hook to term_expansion (or something like that).
Example of desired output:
?- TestList=[X,Y,Z,_,_].
> TestList=[X,Y,Z,A,B].
This answer to the question Converting Terms to Atoms preserving variable names in YAP prolog shows how to use read_term to obtain as atoms the names of the variables used in a term. This list (in the form [X='X',Y='Y',...]) does not contain the anonymous variables, unlike the variable list obtained by term_variables, making isolation of the anonymous variables fairly straightforward.
However, the usefulness of this great feature is somewhat limited if it can only be applied to terms read directly from the terminal. I noticed that all of the examples in the answer involve direct user input of the term. Is it possible to get (as atoms) the variable names for terms that are not obtained through direct user input? That is, is there some way to 'write' a term (preserving variable names) to some invisible stream and then 'read' it as if it were input from the terminal?
Alternatively... Perhaps this is more of a LaTeX-ish line of thinking, but is there some way to "wrap" variables inside single quotes (thereby atom-ifying them) before Prolog expands/tries to unify them as variables, with the end result that they're treated as atoms that start with uppercase letters rather than as variables?
You can use the ISO core standard variable_names/1 read and write option. Here is some example code, that replaces anonymous variables in a variable name mapping:
% replace_anon(+Map, +Map, -Map)
replace_anon([_=V|M], S, ['_'=V|N]) :-
    member(_=W, S), W==V, !,
    replace_anon(M, S, N).
replace_anon([A=V|M], S, [A=V|N]) :-
    replace_anon(M, S, N).
replace_anon([], _, []).
variable_names/1 is ISO core standard. It was always a read option. It then became a write option as well. See also: https://www.complang.tuwien.ac.at/ulrich/iso-prolog/WDCor3
Here is an example run:
Welcome to SWI-Prolog (threaded, 64 bits, version 7.7.25)
?- read_term(X,[variable_names(M),singletons(S)]),
replace_anon(M,S,N),
write_term(X,[variable_names(N)]).
|: p(X,Y,X).
p(X,_,X)
Using the old numbervars/3 is not recommended, since it's not compatible with attributed variables. You cannot use it, for example, in the presence of CLP(FD).
Is it possible to get (as atoms) the variable names for terms that are not obtained through direct user input?
If you want to get variable names from source files, you should read them from there.
The easiest way to do so is to use term expansion.
Solution:
read_term_from_atom(+Atom, -Term, +Options)
Use read_term/3 to read the next term from Atom.
Atom is either an atom or a string object.
It is not required for Atom to end with a full-stop.
Use Atom as input to read_term/2 using the option variable_names and return the read term in Term and the variable bindings in variable_names(Bindings).
Bindings is a list of Name = Var couples, thus providing access to the actual variable names. See also read_term/2.
If Atom has no valid syntax, a syntax_error exception is raised.
write_term(Term) :-
    numbervars(Term, 0, End),
    write_canonical(Term), nl.

Why does programming documentation have square brackets and commas in weird places?

Why in various programming documentation for functions do they have square brackets around parameters, but they are ordered such that the later parameters seem to be subsets of the first? Or if the brackets in that language delineate arrays it's as if the second parameter is supposed to be inside of the array of the first, but often the parameters are not even supposed to be arrays, and also they have commas in weird places.
I've seen this style all over the place and tried to find some place where it is written down why they do this. Maybe someone just arbitrarily decided on that and other programmers thought, "oh that looks cool, I'll try that in writing my own documentation.."
Or maybe there is some big book of rules for how to make programming docs? If so I'd like to know about it.
Here is an example: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/slice
If you go to the link, in the blue box near the top of the page, right below the h2 heading "Syntax", it says this: arr.slice([begin[, end]]), meaning that the first parameter is begin and the next parameter is end for the slice method. When you first see something like this, it looks like the brackets and commas are randomly placed, but they do it all over the place and in the same way. There must be some method to the madness!
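Brackets around a parameter name indicate that it is an optional parameter. E.g. you can call the slice method without an end parameter. This is just a general rule of language syntax documentation: square brackets indicate optional words/tokens. The nesting in [begin[, end]] says that end may only be given when begin is given, and the comma sits inside the inner brackets because it is only written when end is present.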
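As a quick illustration, the notation arr.slice([begin[, end]]) from the linked page covers the three call shapes below. This is only a small sketch; the array contents and variable names are made up for the example:

```javascript
// The documented syntax arr.slice([begin[, end]]) allows three call shapes.
const arr = ["a", "b", "c", "d", "e"];

const all = arr.slice();            // neither begin nor end: copies the whole array
const fromOne = arr.slice(1);       // begin only: from index 1 to the end
const oneToThree = arr.slice(1, 3); // begin and end: index 1 up to, not including, 3

console.log(all, fromOne, oneToThree);
```

There is no way to pass end without begin, which is exactly what the nested brackets express.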

SWI-Prolog: make a set of variable names with rbtrees or other means

I have a term from which I want to get the set of its variable names.
Eg. input: my_m(aa,b,B,C,max(D,C),D)
output: [B,C,D] (no need to be ordered as order of appearance in input)
(It would be called like set_variable_name(Input,Output).)
I can simply get [B,C,D,C,D] from the input, but don't know how to implement set (only one appearance in output). I've tried something like storing in rbtrees but that failed, because of
only_one([],T,T) :- !.
only_one([X|XS],B,C) :- rb_in(X,X,B), !, only_one(XS,B,C).
only_one([X|XS],B,C) :- rb_insert(B,X,X,U), only_one(XS,U,C).
It returns a tree with only one node and unifications like B=C, C=D.... I think I get why: X is unified while querying rb_in(..).
So, how do I store each variable name only once? Or is that a fundamentally wrong idea because we are using logic programming? If you want to know why I need this, it's because we are asked to implement the A* algorithm in Prolog and this is one part of building the search space.
You can use sort/2, which also removes duplicates.

TSearch2 - dots explosion

Following conversion
SELECT to_tsvector('english', 'Google.com');
returns this:
'google.com':1
Why doesn't the TSearch2 engine return something like this?
'google':2, 'com':1
Or how can I make the engine return the exploded string as I wrote above?
I just need "Google.com" to be findable by "google".
Unfortunately, there is no quick and easy solution.
Denis is correct in that the parser is recognizing it as a hostname, which is why it doesn't break it up.
There are 3 other things you can do, off the top of my head.
1. You can disable the host parsing in the database. See the Postgres documentation for details. E.g. something like:
ALTER TEXT SEARCH CONFIGURATION your_parser_config
DROP MAPPING FOR url, url_path
2. You can write your own custom dictionary.
3. You can pre-parse your data before it's inserted into the database in some manner (maybe splitting all domains before going into the database).
I had a similar issue to you last year and opted for solution (2), above.
My solution was to write a custom dictionary that splits words up on non-word characters. A custom dictionary is a lot easier & quicker to write than a new parser. You still have to write C tho :)
The dictionary I wrote would return something like 'www.facebook.com':4, 'com':3, 'facebook':2, 'www':1 for the 'www.facebook.com' domain (we had a unique-ish scenario, hence the 4 results instead of 3).
The trouble with a custom dictionary is that you will no longer get stemming (ie: www.books.com will come out as www, books and com). I believe there is some work (which may have been completed) to allow chaining of dictionaries which would solve this problem.
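For the third option (pre-parsing the data before it goes into the database), a rough sketch of the idea could look like the following. It is written in JavaScript purely as an illustration of application-side preprocessing; expandTokens is a hypothetical helper, not part of PostgreSQL, and the choice to keep the original token alongside its pieces is an assumption:

```javascript
// Hypothetical helper: keep each original token and append its pieces,
// split on runs of non-word characters, so that "Google.com" also
// contributes "Google" and "com" to the text that gets indexed.
function expandTokens(text) {
  return text
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .flatMap((token) => {
      const parts = token.split(/\W+/).filter((p) => p.length > 0);
      // Only add the pieces when splitting actually produced something new.
      return parts.length > 1 ? [token, ...parts] : [token];
    })
    .join(" ");
}

console.log(expandTokens("Visit Google.com today"));
```

The expanded string would then be what you pass to to_tsvector instead of the raw text. Since the pieces go through the normal text search configuration, they would still be stemmed, unlike with the custom dictionary approach.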
First off in case you're not aware, tsearch2 is deprecated in favor of the built-in functionality:
http://www.postgresql.org/docs/9/static/textsearch.html
As for your actual question, google.com gets recognized as a host by the parser:
http://www.postgresql.org/docs/9.0/static/textsearch-parsers.html
If you don't want this to occur, you'll need to pre-process your text accordingly (or use a custom parser).