How to make SBCL optimize away possible call to FDEFINITION? - optimization

Apologies: I don't have sufficient knowledge to rework this as an easy-to-understand code snippet.
I've been using the SBCL compiler notes as hints about what might be improved, but I'm well out of my depth with this —
; compiling (DEFUN EXECUTE-PARALLEL ...)
; file: /home/dunham/8000-benchmarksgame/bench/spectralnorm/spectralnorm.sbcl-8.sbcl
; in: DEFUN EXECUTE-PARALLEL
; (FUNCALL FUNCTION START END)
; --> SB-C::%FUNCALL THE
; ==>
; (SB-KERNEL:%COERCE-CALLABLE-FOR-CALL FUNCTION)
;
; note: unable to
; optimize away possible call to FDEFINITION at runtime
; because:
; FUNCTION is not known to be a function
—
#+sb-thread
(defun execute-parallel (start end function)
  (declare (type int31 start end))
  (let* ((num-threads 4))
    (loop with step = (truncate (- end start) num-threads)
          for index from start below end by step
          collecting (let ((start index)
                           (end (min end (+ index step))))
                       (sb-thread:make-thread
                        (lambda () (funcall function start end))))
            into threads
          finally (mapcar #'sb-thread:join-thread threads))))
#-sb-thread
(defun execute-parallel (start end function)
  (funcall function start end))
(The program is here. Measurements for similar programs are here.)
Is it practical to make SBCL "optimize away possible call to FDEFINITION" or is that compiler note an explanation rather than an opportunity?

The reason for the possible call to fdefinition is that the compiler doesn't know that function is actually a function: it might be the name of one; in general it may be a function designator rather than a function. To keep the compiler quiet, tell it that it is a function with a suitable type declaration, namely (declare (type function function)).
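A minimal sketch of how that looks in the threaded execute-parallel from the question (only the second declaration is new; the rest of the body is unchanged and elided here):

#+sb-thread
(defun execute-parallel (start end function)
  ;; Declaring FUNCTION to be of type FUNCTION lets the compiler skip the
  ;; designator coercion, and hence the possible FDEFINITION call.
  (declare (type int31 start end)
           (type function function))
  ...)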
Rainer is right: there is ε chance that this is ever going to be a performance problem, given you're starting a new thread. In particular it is fairly likely that adding a declaration will make no difference at all:
without a declaration, the call to funcall will get compiled as something like 'check the type of the object: if it is a function, call it; if it is not, call fdefinition on it and call the result';
with a declaration, the overall function looks like 'check that the object is a function, signalling an error if not ... call the function'.
In both cases, if the object is a function, there is one type check and one call: the type check is just in a different place. In the first case, the code will still work if the object is merely the name of a function, while with the type check it won't.
And in both of these cases this is code where you are calling make-thread: if that is anything like as fast as a function call, even via fdefinition, I would be really impressed by the threading system! Almost certainly the performance of this function is entirely dominated by the overhead of making threads.

In real code, avoid optimizations like that - unless really needed
Is it practical to make SBCL "optimize away possible call to FDEFINITION" or is that compiler note an explanation rather than an opportunity?
Generally it does not matter, especially since most Lisp code should not be compiled with optimization qualities of (speed 3) (safety 0) (space 0): depending on the implementation and the program, that may open the software up to runtime errors and crashes. Funcalling things unchecked (without safety), when they are not functions or symbols naming functions, might be dangerous enough to crash the program.
For a specific benchmark one might check via timings if a type declaration and a specialized fdefinition compilation brings any advantage.
a type declaration
A type declaration to make clear that a variable named fn is referencing an object of type function would be:
(declare (type function fn))
in the specific benchmark program FDEFINITION won't be called anyway
In the example you have provided, fdefinition will not be called anyway.
(setf foo (lambda (x) x)) ; foo references a function object
(funcall foo 3)
funcall is probably implemented by something like this:
(etypecase f
  ((or cons symbol) (funcall (fdefinition f) ...))
  (function ...))
Since your code passes a function object, there is never the need to call fdefinition.
The optimization benefit then will be that the runtime type dispatch can be removed, along with the dead code for the cons or symbol case.

You ask a question about removing an fdefinition call, but your question relies on the premise that the SBCL notes are a good way to drive optimisations and improvements. The notes are a good way to spot obvious issues and places where type declarations can help. They do not tell you what actually makes your program slow. The correct way to improve the performance of a program is to 1. think whether there is a faster algorithm, and 2. measure its performance and work out what is slow.
A single fdefinition call will only matter if it happens in a tight loop (ie it is not single but very plural)
In this case it happens to start a thread. If you are starting threads in a tight loop then your performance problem comes from starting threads in a tight loop. Don’t do that.
If you aren’t starting threads in a tight loop (looking at your code, it appears you are not), there are bigger fish to fry. Why waste time on an fdefinition that maybe gets called 4 times per call to execute-parallel when you can optimise the inner function instead?

Related

If CLOS had a compile-time dispatch, what would happen to this code snippet?

I am reading the Wikipedia article about CLOS.
It says that:
This dispatch mechanism works at runtime. Adding or removing methods thus may lead to changed effective methods (even when the generic function is called with the same arguments) at runtime. Changing the method combination also may lead to different effective methods.
Then, I inserted:
; declare the common argument structure prototype
(defgeneric f (x y))
; define an implementation for (f integer t), where t matches all types
(defmethod f ((x integer) y) 1)
Using SBCL and SLIME, I compiled the regions with code and had the following result:
CL-USER> (f 1 2)
1
Then, I added to the definition:
; define an implementation for (f integer real)
(defmethod f ((x integer) (y real)) 2)
Again, I repeated the process compiling the new region and using the REPL to eval:
CL-USER> (f 1 2.0)
2
First question, if CLOS had the opposite behavior of run-time dispatch (compile-time dispatch, I suppose), what would the result be?
Second question, I decided to comment out the second method, leaving just the generic function and the first written method. Then, I re-compiled the region with Emacs.
When calling the function f in the REPL with (f 1 2) I thought I would get 1 since the second method is out. Instead, I got 2.
CL-USER> (f 1 2.0)
2
Why did this happen?
The only way I can get back to (f 1 2) returning 1 is re-starting the Slime REPL and compiling the region (with the second method being commented out). Third question, is there a better way to have this result without having to re-start the REPL?
Thanks!
These two are actually the same question. The answer is: you are modifying a system while it is running.
If CLOS objects weren't re-definable at run-time, this would simply not work, or you'd not be allowed to do that. Try such re-definitions with basic structs (i.e. the things you get when using defstruct), and you will often run into pretty severe warnings or even errors when the change is not compatible. Of course, structs have other limitations, too, e.g. only single dispatch, so that it's not so easy to make an exactly analogous example. But try to remove a slot from a defstruct.
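For instance (a hedged sketch; the standard leaves incompatible struct redefinition undefined, so the exact behaviour is implementation-dependent):

(defstruct point x y)   ; original definition
(defstruct point x)     ; redefinition with the Y slot removed
;; SBCL, for example, stops with a continuable error about the
;; incompatible redefinition of the structure class.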
Just commenting out some source code doesn't change the fact that you evaluated (compiled and loaded) it before. You are manipulating a running system, and the source code is just that: source. If you want to remove a method from the running system, you can use remove-method (see also How to remove a defmethod for a struct). Most Lisp IDEs have ways to do that interactively, e.g. in SLIME using the SLIME inspector.
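As a minimal sketch (using the generic function f and the methods from the question), the second method can be removed from the running image like this, after which (f 1 2) dispatches to the (integer t) method again:

;; remove the method specialized on (integer real)
(remove-method #'f
               (find-method #'f '() (list (find-class 'integer)
                                          (find-class 'real))))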

using user defined function inside forall loop

I have a PDE to solve. For optimisation, I am using forall loops, and inside the loop the variables change using a user-defined function in the following way.
forall(i=2:n-1, j=2:n-1, w(i,j).gt.wmax/1000)
  k3(i,j) = w(i,j) + h*k(x(i),t)
end forall
Here, k(x,t) is an external function I defined earlier.
The error is:
Reference to impure function ‘k’ at (1) inside a FORALL block
I am using gfortran. What is the solution if I need a user-defined function inside a forall loop? Is it possible at all inside a forall, or do I need to do something else that would also optimise? If something else is possible, kindly explain that too.
The problem is that you are referencing an impure function called k inside the FORALL block. To get this to work, when you write your function you must make it a pure one, and have an interface in scope at the point where you call it in the loop - pure is an assertion that the function will not (amongst other things) change its arguments, which, if it were to occur, could make parallel processing of the FORALL construct give incorrect answers. If you had given a complete, minimal program showing your problem I would have shown you the changes you need to make, but as you haven't, well, I can't.
But really this is by the by. DON'T use FORALL. Almost certainly your program won't run any faster than using a simple do loop, and quite possibly slower. Forall seemed like a good idea at the time, but for a variety of reasons it hasn't really worked out - I note in the latest edition of "Modern Fortran Explained" by Metcalf, Reid and Cohen, the classic book on Fortran, they mark it as obsolescent. Instead I would look into the more modern Do Concurrent, or, probably best, learn how to parallelise your loop with OpenMP.
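As a rough sketch of both suggestions (module name, array sizes, initial values and the body of k are invented here, so adapt them to the real program): make k pure, give it an explicit interface by putting it in a module, and use do concurrent instead of forall.

module pde_funcs
  implicit none
contains
  ! PURE asserts that k does not modify its arguments or any global state,
  ! which is what forall and do concurrent require of referenced procedures.
  pure function k(x, t) result(val)
    real, intent(in) :: x, t
    real :: val
    val = x * t                        ! placeholder body only
  end function k
end module pde_funcs

program demo
  use pde_funcs
  implicit none
  integer, parameter :: n = 10
  real :: w(n,n), k3(n,n), x(n), t, h, wmax
  integer :: i, j
  w = 1.0; k3 = 0.0; x = 2.0
  t = 0.5; h = 0.01; wmax = 1.0
  do concurrent (i = 2:n-1, j = 2:n-1, w(i,j) > wmax/1000)
    k3(i,j) = w(i,j) + h * k(x(i), t)
  end do
end program demo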

How to explicitly use inherited variables in Fortran?

I have a question regarding best practices of model/variable usage:
Let's assume I have a module containing a few variable/parameter definitions and some subroutines that use these variables.
I do not need to explicitly use these variables in the subroutines since they are inherited from the parent module - but would it be better practice to do so?
Example:
module test
  implicit none
  integer, parameter :: a = 1
  real :: x
contains
  subroutine idk(y,z)
    real, intent(in) :: y
    real, intent(out) :: z
    if(a .eq. 1) then
      z = x*y + 5.
    else
      z = x*y - 5.
    end if
  end subroutine idk
end module test
The above example should work just fine but would it be better to add
use test, only: a,x
to the declaration part of subroutine idk?
In my reasoning, there are two main points here:
1) Pro: Explicitly adding this line lets me easily see which variables are actually needed in the subroutine.
In many cases, the module contains quite a number of variables but only a few are needed in each subroutine. So for reasons of better comprehensibility, it would be beneficial to add this line.
BUT
2) Contra: In quite a few cases, one needs a lot of the variables/parameters declared above (sometimes numbering more than 100 parameters). Explicitly using these at the beginning of the subroutine just unnecessarily clutters the code, reducing the readability of the code.
Point 1 matters mostly if only a few variables need to be included, whereas point 2 is only important if many variables need to be included. But I think it would be silly to do one thing for few variables and another for many - once you have picked a convention, you should stick to it IMHO...
Is there a best practice regarding this?
Addition:
Alternatively, one could declare the subroutine as
subroutine idk(b,w,y,z)
and then call it as idk(a,x,y,z).
On the one hand, this would give me greater flexibility if I later decide that I want to use idk with other variables.
On the other hand, it also increases the risk of mistakes if I change something later (say, I realize I don't need parameter a as a condition but parameter c. In the first case, I simply switch out a -> c in the subroutine. But in the latter case, I need to change every call to idk(c,...). If there are a lot of these calls, this is prone to mistakes.)
I would really appreciate your input! Thank you!
There is absolutely no reason to use the module currently being defined. It is illegal. It may happen to compile if the module was compiled before and the compiler can find the .mod file, but other than that it is wrong.
You should expect an error such as
ifort -c assoc.f90
assoc.f90(10): error #6928: The module-name on a USE statement in a program unit cannot be the name of any encompassing scoping unit. [TEST]
use test
------^
The module subroutine gets the variables from the host module through host association and the use statement is for use association. These are two different things and should not be mixed.
If you want to avoid global variables, pass them as arguments. This is a general advice. What is best depends on each case and the programmer and cannot be answered generally.
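For instance, a short sketch of the argument-passing variant from the question's addition (the call at the end is only illustrative):

module test
  implicit none
contains
  subroutine idk(a, x, y, z)
    integer, intent(in)  :: a
    real,    intent(in)  :: x, y
    real,    intent(out) :: z
    if (a == 1) then
      z = x*y + 5.
    else
      z = x*y - 5.
    end if
  end subroutine idk
end module test

! elsewhere: call idk(1, some_x, some_y, some_z)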

How can I attach a type tag to a closure in Scheme?

How can I attach an arbitrary tag to a closure in Scheme?
Here are a couple things I'd like to use this for:
(1) To mark closures that provide an interface to produce a string for what they represent, like what #kud0h asked for here. A general ->string procedure could include code something like this:
(display (if (stringable? x)
             (x 'string)
             x)
         str-port)
(2) More generally, to determine if a closure is an "object" that obeys the rules of a general object interface, or maybe to tell the class of an object (something like what #KPatnode was asking about here).
I can't query a procedure to see if it supports a certain interface by calling it, because if it doesn't support a known interface, calling the procedure will produce unpredictable results, most likely a run-time error.
Chez Scheme has putprop and getprop procedures that allow you to add keys and values to symbols. However, closures can be anonymous, or bound to different symbols, so I'd prefer to attach a calling-convention tag to the closure itself, not a symbol that it's bound to.
The only idea I have right now is to maintain a global hash table of all "stringable" or "object" closures in the system. That seems a little clunky. Is there a simpler, more elegant, or more efficient way?
Racket has applicable structures: you can give a structure type an apply hook to be called if an instance is used as a function.
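For example, a rough Racket sketch (the struct and field names are invented for illustration):

#lang racket

;; prop:procedure makes instances callable; the PROC field supplies the
;; behaviour, while TAG carries whatever metadata you want to attach.
(struct tagged-proc (proc tag)
  #:property prop:procedure (struct-field-index proc))

(define sq (tagged-proc (lambda (x) (* x x)) 'stringable))
(sq 3)               ; => 9, applied like an ordinary procedure
(tagged-proc-tag sq) ; => stringable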
If you want a more portable solution, you can use a hash table to associate your data with certain procedures. Unless your Scheme provides weak hashtables, though, keep in mind that the hashtable will prevent the procedures from being garbage-collected.
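A minimal sketch of that idea with an R6RS-style eq hashtable (as in Chez Scheme); the helper names are invented, and as noted a non-weak table keeps the tagged procedures alive:

(define *closure-tags* (make-eq-hashtable))

(define (tag-closure! proc tag)
  (hashtable-set! *closure-tags* proc tag))

(define (closure-tag proc)
  (hashtable-ref *closure-tags* proc #f))

(define greeter (lambda () "hello"))
(tag-closure! greeter 'stringable)
(closure-tag greeter)   ; => stringable
(closure-tag car)       ; => #f, untagged procedures are unaffected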
I think you might, instead of tagging procedures per se, want to look at Racket's object system, which has a concept of interfaces. It sounds quite similar to what you're after.
You could go extreme and redefine lambda syntax. Something like this (but untested by me):
(define *properties* '()) ;; example only
(define-syntax lambda
  (let-syntax ((sys-lambda
                (syntax-rules ()
                  ((_ args body ...)
                   (lambda args body ...)))))
    (syntax-rules ()
      ((_ args body ...)
       (let ((func (sys-lambda args body ...)))
         (set! *properties*
               (cons (cons func '(NO-PROPERTIES))
                     *properties*))
         func)))))

Use of recursion in Scala when run in the JVM

From searching elsewhere on this site and the web, tail call optimization is not supported by the JVM. Does that therefore mean that tail recursive Scala code such as the following, which may run on very large input lists, should not be written if it is to run on the JVM?
// Get the nth element in a list
def nth[T](n : Int, list : List[T]) : T = list match {
  case Nil => throw new IllegalArgumentException
  case _ if n == 0 => throw new IllegalArgumentException
  case _ :: tail if n == 1 => list.head
  case _ :: tail => nth(n - 1, tail)
}
Martin Odersky's Scala by Example contains the following paragraph, which seems to suggest that there are circumstances or other environments where recursion is appropriate:
In principle, tail calls can always re-use the stack frame of the calling function. However, some run-time environments (such as the Java VM) lack the primitives to make stack frame re-use for tail calls efficient. A production quality Scala implementation is therefore only required to re-use the stack frame of a directly tail-recursive function whose last action is a call to itself. Other tail calls might be optimized also, but one should not rely on this across implementations.
Can anyone explain what the middle two sentences of this paragraph mean?
Thank you!
Since direct tail recursion is equivalent to a while loop, your example will run efficiently on the JVM, because the Scala compiler can compile this to a loop under the hood, simply using a jump. General TCO, however, is not supported on the JVM, although the tailcall() method is available, which supports tail calls via trampolines.
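For reference, a small sketch of the trampoline approach via scala.util.control.TailCalls (isEven/isOdd are the usual illustrative pair, not code from the question):

import scala.util.control.TailCalls._

// Each call returns a TailRec value instead of growing the JVM stack;
// .result runs the trampoline in constant stack space.
def isEven(n: Int): TailRec[Boolean] =
  if (n == 0) done(true) else tailcall(isOdd(n - 1))

def isOdd(n: Int): TailRec[Boolean] =
  if (n == 0) done(false) else tailcall(isEven(n - 1))

isEven(100000).result  // => true, without a StackOverflowError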
To ensure that the compiler can correctly optimize a tail-recursive function to a loop, you can use the scala.annotation.tailrec annotation, which will cause a compiler error if the compiler cannot make the desired optimization:
import scala.annotation.tailrec
@tailrec def nth[T](n : Int, list : List[T]) : Option[T] = list match {
  case Nil => None
  case _ if n == 0 => None
  case _ :: tail if n == 1 => list.headOption
  case _ :: tail => nth(n - 1, tail)
}
(screw IllegalArgumentException!)
In principle, tail calls can always re-use the stack frame of the calling function. However, some runtime environments (such as the Java VM) lack the primitives to make stack frame re-use for tail calls efficient. A production quality Scala implementation is therefore only required to re-use the stack frame of a directly tail-recursive function whose last action is a call to itself. Other tail calls might be optimized also, but one should not rely on this across implementations.
Can anyone explain what the middle two sentences of this paragraph mean?
Tail recursion is a special case of a tail call. Direct tail recursion is a special case of tail recursion. Only direct tail recursion is guaranteed to be optimized. Others may be optimized, too, but that's basically just a compiler optimization. As a language feature, Scala only guarantees direct tail recursion elimination.
So, what's the difference?
Well, a tail call is simply the last call in a subroutine:
def a = {
  b
  c
}
In this case, the call to c is a tail call, the call to b is not.
Tail recursion is when a tail call calls a subroutine that was already called before:
def a = {
  b
}
def b = {
  a
}
This is tail recursion: a calls b (a tail call), which in turn calls a again. (In contrast to the direct tail recursion described below, this is sometimes called indirect tail recursion.)
However, none of the two examples will get optimized by Scala. Or, more precisely: a Scala implementation is allowed to optimize them, but it is not required to do so. This is in contrast to, e.g. Scheme, where the language specification guarantees that all of the above cases will take O(1) stack space.
The Scala Language Specification only guarantees that direct tail recursion is optimized, i.e. when a subroutine directly calls itself with no other calls in between:
def a = {
  b
  a
}
In this case, the call to a is a tail call (because it is the last call in the subroutine), it is tail recursion (because it calls itself again) and most importantly it is direct tail recursion, because a directly calls itself without going through another call first.
Note that there are many subtle things that may lead to a method not being directly tail-recursive. For example, if a is overloaded, then the recursion may actually go through different overloads, and thus would no longer be direct.
In practice, this means two things:
you cannot perform an Extract Method Refactoring on a tail-recursive method, at least not including the tail call, because this would turn a directly tail-recursive method (which will get optimized) into an indirectly tail-recursive method (which will not get optimized).
You can only use direct tail recursion. Recursive-descent parsers or state machines, which can be expressed very elegantly using indirect tail recursion, are out.
The main reason for this is that when your underlying execution engine lacks powerful control flow manipulation features such as GOTO, continuations, first-class mutable stack or proper tail calls, then you need to either implement your own stack on top of it, use trampolines, make a global CPS transform or something similarly nasty, in order to provide generalized proper tail calls. All of these have either severe impact on performance or interoperability with other code on the same platform.
Or, as Rich Hickey, the creator of Clojure, said when he was facing the same problem: "Performance, Java interop, tail calls. Pick two." Both Clojure and Scala chose to compromise on tail calls and provide only tail recursion and not full tail calls.
To cut a long story short: yes, the specific example you posted will be optimized, since it is direct tail recursion. You can test this by putting an @tailrec annotation on the method. The annotation does not change whether or not the method gets optimized; it does, however, guarantee that you will get a compile error if the method can not be optimized.
Due to the above-mentioned subtleties, it is generally a good idea to put an @tailrec annotation on methods that you need to be optimized, both in order to get a compile error, and as a hint to other developers maintaining your code.
The Scala compiler will attempt to optimize tail calls by "flattening" them into a loop that won't cause a continually expanding stack.
Of course, your code has to be optimizable for it to do so. If, however, you use the annotation @tailrec (scala.annotation.tailrec) before your method, the compiler will REQUIRE the method to be optimizable or fail to compile.
Martin's remark is saying that only directly self-recursive calls are candidates (other criteria being met) for the TCO optimization. Indirect, mutually recursive method pairs (or larger sets of recursive methods) cannot be so optimized.
Note that there are JVMs that support tail call optimization (IIRC, IBM's J9 does), it's just not a requirement in the JLS, and Oracle's implementation doesn't do it.