Optimization of Function Calls in Haskell - optimization

Not sure what exactly to google for this question, so I'll post it directly to SO:
Variables in Haskell are immutable
Pure functions should result in same values for same arguments
From these two points it's possible to deduce that if you call somePureFunc somevar1 somevar2 in your code twice, it only makes sense to compute the value during the first call. The resulting value can be stored in some sort of a giant hash table (or something like that) and looked up during subsequent calls to the function. I have two questions:
Does GHC actually do this kind of optimization?
If it does, what is the behaviour in the case when it's actually cheaper to repeat the computation than to look up the results?
Thanks.

GHC doesn't do automatic memoization. See the GHC FAQ on Common Subexpression Elimination (not exactly the same thing, but my guess is that the reasoning is the same) and the answer to this question.
If you want to do memoization yourself, then have a look at Data.MemoCombinators.
Another way of looking at memoization is to use laziness to take advantage of memoization. For example, you can define a list in terms of itself. The definition below is an infinite list of all the Fibonacci numbers (taken from the Haskell Wiki)
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Because the list is realized lazily it's similar to having precomputed (memoized) previous values. e.g. fibs !! 10 will create the first ten elements such that fibs 11 is much faster.

Saving every function call result (cf. hash consing) is valid but can be a giant space leak and in general also slows your program down a lot. It often costs more to check if you have something in the table than to actually compute it.

Related

Time Complexity comparision of memoized recursion and table method in Dynamic programming

Does every code of Dynamic Programming have the same time complexity in a table method or memorized recursion method?
A Solution with an appropriate example would be appreciated.
Time complexity- Yes (if you ignore the function calls/returns in Memoization)
Space complexity- No. Tabulation can save space by overwriting previously calculated but no longer needed values.
As mentioned in the "Optimality" section of this answer- https://stackoverflow.com/a/6165124/7145074
Either approach may not be time-optimal if the order you happen (or try to) visit subproblems is not optimal, specifically if there is more than one way to calculate a subproblem (normally caching would resolve this, but it's theoretically possible that caching might not in some exotic cases). Memoization will usually add on your time-complexity to your space-complexity (e.g. with tabulation you have more liberty to throw away calculations, like using tabulation with Fib lets you use O(1) space, but memoization with Fib uses O(N) stack space).
Further reading- https://www.geeksforgeeks.org/tabulation-vs-memoization/

Optimization of Lisp function calls

If my code is calling a function, and one of the function's arguments will vary based on a certain condition, is it more efficient to have the conditional statement as an argument of the function, or to call the function multiple times in the conditional statement.
Example:
(if condition (+ 4 3) (+ 5 3))
(+ (if condition 4 5) 3)
Obiously this is just an example: in the real scenario the numbers would be replaced by long, complex expressions, full of variables. The if might instead be a long cond statement.
Which would be more efficient in terms of speed, space etc?
Don't
What you care about is not performance (in this case the difference will be trivial) but code readability.
Remember,
"... a computer language is not just a way of getting a computer to
perform operations, but rather ... it is a novel formal medium for
expressing ideas about methodology"
Abelson/Sussman "Structure and
Interpretation of Computer Programs".
You are writing code primarily for others (and you yourself next year!) to read. The fact that the computer can execute it is a welcome fringe benefit.
(I am exaggerating, of course, but much less than you think).
Okay...
Now that you skipped the harangue (if you claim you did not, close your eyes and tell me which specific language I mention above), let me try to answer your question.
If you profiled your program and found that this place is the bottleneck, you should first make sure that you are using the right algorithm.
E.g., using a linearithmic sort (merge/heap) instead of quadratic (bubble/insertion) sort will make much bigger difference than micro-optimizations like you are contemplating.
Then you should disassemble both versions of your code; the shorter version is (ceteris paribus) likely to be marginally faster.
Finally, you can waste a couple of hours of machine time repeatedly running both versions on the same output on an otherwise idle box to discover that there is no statistically significant difference between the two approaches.
I agree with everything in sds's answer (except using a trick question -_-), but I think it might be nice to add an example. The code you've given doesn't have enough context to be transparent. Why 5? Why 4? Why 3? When should each be used? Should there always be only two options? The code you've got now is sort of like:
(defun compute-cost (fixed-cost transaction-type)
(+ fixed-cost
(if (eq transaction-type 'discount) ; hardcoded magic numbers
3 ; and conditions are brittle
4)))
Remember, if you need these magic numbers (3 and 4) here, you might need them elsewhere. If you ever have to change them, you'll have to hope you don't miss any cases. It's not fun. Instead, you might do something like this:
(defun compute-cost (fixed-cost transaction-type)
(+ fixed-cost
(variable-cost transaction-type)))
(defun variable-cost (transaction-type)
(case transaction-type
((employee) 2) ; oh, an extra case we'd forgotten about!
((discount) 3)
(t 4)))
Now there's an extra function call, it's true, but computation of the magic addend is pulled out into its own component, and can be reused by anything that needs it, and can be updated without changing any other code.

Why is argsort called argsort?

So, it's an indirect sort that returns the indices that would sort an array. Why is it "argsort" (which makes some sense given that it takes an argument -- the type of sort to use) but not "indirect_sort" or something like that? Or get_sort_indexer?
Googling for argsort gives numpy as first result, so it might indeed be a bit uncommon name. This might be something historic, apparently it already existed in 1997 in Numeric, a predecessor to Numpy.
I would guess they came up with that name as a parallel to argmax, which seems to be standard mathematical function that is implemented in many languages (numpy, mathematica, ...), even though the analogy is not perfect.

Memory efficiency in If statements

I'm thinking more about how much system memory my programs will use nowadays. I'm currently doing A level Computing at college and I know that in most programs the difference will be negligible but I'm wondering if the following actually makes any difference, in any language.
Say I wanted to output "True" or "False" depending on whether a condition is true. Personally, I prefer to do something like this:
Dim result As String
If condition Then
Result = "True"
Else
Result = "False"
EndIf
Console.WriteLine(result)
However, I'm wondering if the following would consume less memory, etc.:
If condition Then
Console.WriteLine("True")
Else
Console.WriteLine("False")
EndIf
Obviously this is a very much simplified example and in most of my cases there is much more to be outputted, and I realise that in most commercial programs these kind of statements are rare, but hopefully you get the principle.
I'm focusing on VB.NET here because that is the language used for the course, but really I would be interested to know how this differs in different programming languages.
The main issue making if's fast or slow is predictability.
Modern CPU's (anything after 2000) use a mechanism called branch prediction.
Read the above link first, then read on below...
Which is faster?
The if statement constitutes a branch, because the CPU needs to decide whether to follow or skip the if part.
If it guesses the branch correctly the jump will execute in 0 or 1 cycle (1 nanosecond on a 1Ghz computer).
If it does not guess the branch correctly the jump will take 50 cycles (give or take) (1/200th of a microsecord).
Therefore to even feel these differences as a human, you'd need to execute the if statement many millions of times.
The two statements above are likely to execute in exactly the same amount of time, because:
assigning a value to a variable takes negligible time; on average less than a single cpu cycle on a multiscalar CPU*.
calling a function with a constant parameter requires the use of an invisible temporary variable; so in all likelihood code A compiles to almost the exact same object code as code B.
*) All current CPU's are multiscalar.
Which consumes less memory
As stated above, both versions need to put the boolean into a variable.
Version A uses an explicit one, declared by you; version B uses an implicit one declared by the compiler.
However version A is guaranteed to only have one call to the function WriteLine.
Whilst version B may (or may not) have two calls to the function WriteLine.
If the optimizer in the compiler is good, code B will be transformed into code A, if it's not it will remain with the redundant calls.
How bad is the waste
The call takes about 10 bytes for the assignment of the string (Unicode 2 bytes per char).
But so does the other version, so that's the same.
That leaves 5 bytes for a call. Plus maybe a few extra bytes to set up a stackframe.
So lets say due to your totally horrible coding you have now wasted 10 bytes.
Not much to worry about.
From a maintainability point of view
Computer code is written for humans, not machines.
So from that point of view code A is clearly superior.
Imagine not choosing between 2 options -true or false- but 20.
You only call the function once.
If you decide to change the WriteLine for another function you only have to change it in one place, not two or 20.
How to speed this up?
With 2 values it's pretty much impossible, but if you had 20 values you could use a lookup table.
Obviously that optimization is not worth it unless code gets executed many times.
If you need to know the precise amount of memory the instructions are going to take, you can use ildasm on your code, and see for yourself. However, the amount of memory consumed by your code is much less relevant today, when the memory is so cheap and abundant, and compilers are smart enough to see common patterns and reduce the amount of code that they generate.
A much greater concern is readability of your code: if a complex chain of conditions always leads to printing a conditionally set result, your first code block expresses this idea in a cleaner way than the second one does. Everything else being equal, you should prefer whatever form of code that you find the most readable, and let the compiler worry about optimization.
P.S. It goes without saying that Console.WriteLine(condition) would produce the same result, but that is of course not the point of your question.

Best Scala collection type for vectorized numerical computing

Looking for the proper data type (such as IndexedSeq[Double]) to use when designing a domain-specific numerical computing library. For this question, I'm limiting scope to working with 1-Dimensional arrays of Double. The library will define a number functions that are typically applied for each element in the 1D array.
Considerations:
Prefer immutable data types, such as Vector or IndexedSeq
Want to minimize data conversions
Reasonably efficient in space and time
Friendly for other people using the library
Elegant and clean API
Should I use something higher up the collections hierarchy, such as Seq?
Or is it better to just define the single-element functions and leave the mapping/iterating to the end user?
This seems less efficient (since some computations could be done once per set of calls), but at at the same time a more flexible API, since it would work with any type of collection.
Any recommendations?
If your computations are to do anything remotely computationally intensive, use Array, either raw or wrapped in your own classes. You can provide a collection-compatible wrapper, but make that an explicit wrapper for interoperability only. Everything other than Array is generic and thus boxed and thus comparatively slow and bulky.
If you do not use Array, people will be forced to abandon whatever things you have and just use Array instead when performance matters. Maybe that's okay; maybe you want the computations to be there for convenience not efficiency. In that case, I suggest using IndexedSeq for the interface, assuming that you want to let people know that indexing is not outrageously slow (e.g. is not List), and use Vector under the hood. You will use about 4x more memory than Array[Double], and be 3-10x slower for most low-effort operations (e.g. multiplication).
For example, this:
val u = v.map(1.0 / _) // v is Vector[Double]
is about three times slower than this:
val u = new Array[Double](v.length)
var j = 0
while (j<u.length) {
u(j) = 1.0/v(j) // v is Array[Double]
j += 1
}
If you use the map method on Array, it's just as slow as the Vector[Double] way; operations on Array are generic and hence boxed. (And that's where the majority of the penalty comes from.)
I am using Vectors all the time when I deal with numerical values, since it provides very efficient random access as well as append/prepend.
Also notice that, the current default collection for immutable indexed sequences is Vector, so that if you write some code like for (i <- 0 until n) yield {...}, it returns IndexedSeq[...] but the runtime type is Vector. So, it may be a good idea to always use Vectors, since some binary operators that take two sequences as input may benefit from the fact that the two arguments are of the same implementation type. (Not really the case now, but some one has pointed out that vector concatenation could be in log(N) time, as opposed to the current linear time due to the fact that the second parameter is simply treated as a general sequence.)
Nevertheless, I believe that Seq[Double] should already provide most of the function interfaces you need. And since mapping results from Range does not yield Vector directly, I usually put Seq[Double] as the argument type as my input, so that it has some generality. I would expect that efficiency is optimized in the underlying implementation.
Hope that helps.