Why does Metal compile this recursive function?

Why does Metal compile this recursive function? - gpu

The Metal spec says that recursive functions are not allowed. However this compiles fine:
int b(int c) {
if (c == 1)
return b(c++);
else if (c == 2)
return b(c + 2);
else
return c;
}
Why is that? What is the definition of recursive that Metal is using? It refers to section 5.2.2 of the C++14 spec, which also does not give any definition of "recursive" thus I'd expect the above to be a standard example of recursion.
Even if I do this, it still compiles:
int b(int c) {
return b(c + 2);
}
What gives?!

The function can compile fine like any C++ function, no problem. It's when you call the function from a kernel that you will get a compile error about recursive function use.

Related

Rust Range.contains failed to be inlined/optimized

I was running my code through Clippy and it suggested changing the following:
const SPECIAL_VALUE: u8 = 0; // May change eventually.
pub fn version1(value: u8) -> bool {
(value >= 1 && value <= 9) || value == SPECIAL_VALUE
}
Into
pub fn version2(value: u8) -> bool {
(1..=9).contains(&value) || value == SPECIAL_VALUE
}
Since it is more readable. Unfortunately the resulting assembly output is twice as long, even with optimization level 3. Manually inlining it (2-nestings down), gives almost the same code as version1 and is as efficient.
pub fn manually_inlined(value: u8) -> bool {
(1 <= value && value <= 9) || value == SPECIAL_VALUE
}
If I remove the || value == SPECIAL_VALUE they all resolve with the same (though with 1 more instruction added to decrement the parameter value before a compare). Also if I change SPECIAL_VALUE to something not adjacent to the range they all resolve to same assembly code as version2, which is the reason why I kept it 0 unless I eventually have to change it.
I have a link to Godbolt with the code here: https://rust.godbolt.org/z/bMYzfcYob
Why is the compiler failing to properly inline/optimize version2? Is it an "optimization bug"? Or am I misunderstanding some semantics of Rust, maybe something with the borrowing prevents the optimization, but can't the compiler assume no mutation of value due to the aliasing and referencing rules?
Trying to do the same in C++ suggest, yields the worse option in both cases (https://godbolt.org/z/zahfz65W3)
Edit: Changing the compiler for my C++ version to GCC makes it optimized in both cases.

This was indeed a missed optimization opportunity that has now been corrected in LLVM. https://github.com/rust-lang/rust/issues/90609#issuecomment-1046037263 .

Does Go support operator type variables?

This is might not be such a good question, since I don't know of any compiled language that supports this feature, but since Go is constantly surprising me, I'll ask it anyway:
For my own practice, I am writing a little calculator program in Go. I'm wondering if there is a way I can declare and assign a variable of type "Operator", such that I could, for example, write:
var o Operator
o = +
var o1 Operator
o1 = /
and write function like this
func DoOperation(a,b int,o Operator) int{
return a o b
}
(No, I am not asking about operator overloading.)
Offhand, I don't know of any compiled language that supports such a thing (I'm not an expert in this). I did look at the docs under operators and found nothing. Can Go surprise me again?
Edit: The accepted answer states that Haskell supports this,

No, Go operators are not functions and hence no valid right-hand expressions. They work in a generic way e.g. the plus-operator works on all numeric types and infix-notation a la haskell is not supported either.
You would have to write your own "soft"-generic addition function using reflection.
One compiled language that covers all of your requirements is Haskell.

You can't do exactly what you say, but you can use functions instead. You have to write functions for each operator, but that's relatively little code.
type BinaryOperator func(a, b int) int
func OpAdd(a, b int) int { return a + b }
func OpSub(a, b int) int { return a - b }
func ApplyBinaryOperator(a, b int, op BinaryOperator) int {
return op(a, b)
}

Coming from an oop background I started doing this :
package main
import "fmt"
type MyInt int64
func (i * MyInt) Add(n MyInt) * MyInt {
*i += n
return i
}
func (i MyInt) String() string {
v := int64(i)
return fmt.Sprintf("0x%x (%d)", v, v)
}
func main() {
x := MyInt(10)
x.Add(10).Add(20).Add(30)
fmt.Println("x = ", x)
}

Gimpel's PC Lint Value Tracking

I'm a newbie to this site, so if I mess up any question-asking etiquette here I apologize in advance... Thanks!
This is extremely simplified example code, but I think it shows what I'm talking about: I have a C++ method that makes a call into another method to test a value...
char m_array[MAX]; // class member, MAX is a #define
foo(unsigned int n)
{
if (validNumber(n)) //test n
{
// do stuff
m_array[n-1] = 0;
}
}
where: validNumber(unsigned int val) { return ((val > 0) && (val <= MAX)); }
The irritation I'm having is that PC Lint's Value Tracking seems to ignore the validNumber() call and gives a warning 661 possible access of out-of-bounds pointer (1 beyond end of data) by operator '['
However if I do it like this, Lint is happy:
if ((n > 0) && (n <= MAX)) //test n
...
So, does Lint's Value Tracking just not work if the test is a method call?
Thanks again,
HF

I'd guess that validNumber is defined after foo, but in any case, PC Lint normally makes one pass over the code, and in such cases it doesn't see validNumber as a check for the boundaries for n.
You could try the option -passes(2) or even 3, and see what Lint makes out of it. I think (but didn't try) that Lint would then correctly note that the value for n is within the correct bounds.

What compilers can detect pure mathematical functions and optimize them (without telling you so)?

I have seen that GCC is not able to detect pure mathematical functions and it needs you to provide the attribute "const" to indicate that.
What compilers can detect pure mathematical functions and optimize them (without telling you so)?

To do so is inherently risky in languages that have pointers and lack global compilation & analysis. So, if a an operation is declared non-const, the compiler must assume it could have side-effects.
Example:
//getx.cpp
int GetX(int input)
{
int* pData = (int*) input;
*pData = 50;
return 0;
}
// gety.cpp
int GetY(int input)
{
return GetX(input + 4);
}
// main.cpp
int main()
{
int arg[] { 0, 4 };
return GetY((int)arg);
}
The compiler while compiling GetY can't tell that GetX treats its argument as a pointer and dereferences and modifies data in a non-functional, side-effect-prone manner. That information is only available during linking so you'd have to re-invent the concept of linking to include a lot of code generation and analysis to support such a feature.

It's not really (afaik) the compiler that does this, but when writing C# in Visual Studio when using the plugin ReSharper, you can get compile time hints that indicate that it is possible to declare something as const. On the other hand, that doesn't go under the category "without telling you so", so it might not be what you're looking for...

It seems that gcc now does: doing "gcc -O2 -S" on the following code, and reading the assembly, the call to foo() from within test() is identified as pure and moved outside of the loop:
#include <stdio.h>
double __attribute__((noinline)) foo(double x)
{
x = x + 1;
x = x * x;
if (x > 20)
x -= 1;
x -= x * x;
return x;
}
void test(int iters, double x)
{
int i;
for (i = 0; i < iters; ++i) {
printf("%g\n", foo(x));
}
}
This is Fedora 22, gcc 5.1.1, x86_64. I haven't tried, but with -flto, I would expect this to work across compilation units.
Also, it is worth noting that today gcc has the command line options -Wsuggest-attribute=pure and -Wsuggest-attribute=const.

How do I write a generic memoize function?

I'm writing a function to find triangle numbers and the natural way to write it is recursively:
function triangle (x)
if x == 0 then return 0 end
return x+triangle(x-1)
end
But attempting to calculate the first 100,000 triangle numbers fails with a stack overflow after a while. This is an ideal function to memoize, but I want a solution that will memoize any function I pass to it.

Mathematica has a particularly slick way to do memoization, relying on the fact that hashes and function calls use the same syntax:
triangle[0] = 0;
triangle[x_] := triangle[x] = x + triangle[x-1]
That's it. It works because the rules for pattern-matching function calls are such that it always uses a more specific definition before a more general definition.
Of course, as has been pointed out, this example has a closed-form solution: triangle[x_] := x*(x+1)/2. Fibonacci numbers are the classic example of how adding memoization gives a drastic speedup:
fib[0] = 1;
fib[1] = 1;
fib[n_] := fib[n] = fib[n-1] + fib[n-2]
Although that too has a closed-form equivalent, albeit messier: http://mathworld.wolfram.com/FibonacciNumber.html
I disagree with the person who suggested this was inappropriate for memoization because you could "just use a loop". The point of memoization is that any repeat function calls are O(1) time. That's a lot better than O(n). In fact, you could even concoct a scenario where the memoized implementation has better performance than the closed-form implementation!

You're also asking the wrong question for your original problem ;)
This is a better way for that case:
triangle(n) = n * (n - 1) / 2
Furthermore, supposing the formula didn't have such a neat solution, memoisation would still be a poor approach here. You'd be better off just writing a simple loop in this case. See this answer for a fuller discussion.

I bet something like this should work with variable argument lists in Lua:
local function varg_tostring(...)
local s = select(1, ...)
for n = 2, select('#', ...) do
s = s..","..select(n,...)
end
return s
end
local function memoize(f)
local cache = {}
return function (...)
local al = varg_tostring(...)
if cache[al] then
return cache[al]
else
local y = f(...)
cache[al] = y
return y
end
end
end
You could probably also do something clever with a metatables with __tostring so that the argument list could just be converted with a tostring(). Oh the possibilities.

In C# 3.0 - for recursive functions, you can do something like:
public static class Helpers
{
public static Func<A, R> Memoize<A, R>(this Func<A, Func<A,R>, R> f)
{
var map = new Dictionary<A, R>();
Func<A, R> self = null;
self = (a) =>
{
R value;
if (map.TryGetValue(a, out value))
return value;
value = f(a, self);
map.Add(a, value);
return value;
};
return self;
}
}
Then you can create a memoized Fibonacci function like this:
var memoized_fib = Helpers.Memoize<int, int>((n,fib) => n > 1 ? fib(n - 1) + fib(n - 2) : n);
Console.WriteLine(memoized_fib(40));

In Scala (untested):
def memoize[A, B](f: (A)=>B) = {
var cache = Map[A, B]()
{ x: A =>
if (cache contains x) cache(x) else {
val back = f(x)
cache += (x -> back)
back
}
}
}
Note that this only works for functions of arity 1, but with currying you could make it work. The more subtle problem is that memoize(f) != memoize(f) for any function f. One very sneaky way to fix this would be something like the following:
val correctMem = memoize(memoize _)
I don't think that this will compile, but it does illustrate the idea.

Update: Commenters have pointed out that memoization is a good way to optimize recursion. Admittedly, I hadn't considered this before, since I generally work in a language (C#) where generalized memoization isn't so trivial to build. Take the post below with that grain of salt in mind.
I think Luke likely has the most appropriate solution to this problem, but memoization is not generally the solution to any issue of stack overflow.
Stack overflow usually is caused by recursion going deeper than the platform can handle. Languages sometimes support "tail recursion", which re-uses the context of the current call, rather than creating a new context for the recursive call. But a lot of mainstream languages/platforms don't support this. C# has no inherent support for tail-recursion, for example. The 64-bit version of the .NET JITter can apply it as an optimization at the IL level, which is all but useless if you need to support 32-bit platforms.
If your language doesn't support tail recursion, your best option for avoiding stack overflows is either to convert to an explicit loop (much less elegant, but sometimes necessary), or find a non-iterative algorithm such as Luke provided for this problem.

function memoize (f)
local cache = {}
return function (x)
if cache[x] then
return cache[x]
else
local y = f(x)
cache[x] = y
return y
end
end
end
triangle = memoize(triangle);
Note that to avoid a stack overflow, triangle would still need to be seeded.

Here's something that works without converting the arguments to strings.
The only caveat is that it can't handle a nil argument. But the accepted solution can't distinguish the value nil from the string "nil", so that's probably OK.
local function m(f)
local t = { }
local function mf(x, ...) -- memoized f
assert(x ~= nil, 'nil passed to memoized function')
if select('#', ...) > 0 then
t[x] = t[x] or m(function(...) return f(x, ...) end)
return t[x](...)
else
t[x] = t[x] or f(x)
assert(t[x] ~= nil, 'memoized function returns nil')
return t[x]
end
end
return mf
end

I've been inspired by this question to implement (yet another) flexible memoize function in Lua.
https://github.com/kikito/memoize.lua
Main advantages:
Accepts a variable number of arguments
Doesn't use tostring; instead, it organizes the cache in a tree structure, using the parameters to traverse it.
Works just fine with functions that return multiple values.
Pasting the code here as reference:
local globalCache = {}
local function getFromCache(cache, args)
local node = cache
for i=1, #args do
if not node.children then return {} end
node = node.children[args[i]]
if not node then return {} end
end
return node.results
end
local function insertInCache(cache, args, results)
local arg
local node = cache
for i=1, #args do
arg = args[i]
node.children = node.children or {}
node.children[arg] = node.children[arg] or {}
node = node.children[arg]
end
node.results = results
end
-- public function
local function memoize(f)
globalCache[f] = { results = {} }
return function (...)
local results = getFromCache( globalCache[f], {...} )
if #results == 0 then
results = { f(...) }
insertInCache(globalCache[f], {...}, results)
end
return unpack(results)
end
end
return memoize

Here is a generic C# 3.0 implementation, if it could help :
public static class Memoization
{
public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> function)
{
var cache = new Dictionary<T, TResult>();
var nullCache = default(TResult);
var isNullCacheSet = false;
return parameter =>
{
TResult value;
if (parameter == null && isNullCacheSet)
{
return nullCache;
}
if (parameter == null)
{
nullCache = function(parameter);
isNullCacheSet = true;
return nullCache;
}
if (cache.TryGetValue(parameter, out value))
{
return value;
}
value = function(parameter);
cache.Add(parameter, value);
return value;
};
}
}
(Quoted from a french blog article)

In the vein of posting memoization in different languages, i'd like to respond to #onebyone.livejournal.com with a non-language-changing C++ example.
First, a memoizer for single arg functions:
template <class Result, class Arg, class ResultStore = std::map<Arg, Result> >
class memoizer1{
public:
template <class F>
const Result& operator()(F f, const Arg& a){
typename ResultStore::const_iterator it = memo_.find(a);
if(it == memo_.end()) {
it = memo_.insert(make_pair(a, f(a))).first;
}
return it->second;
}
private:
ResultStore memo_;
};
Just create an instance of the memoizer, feed it your function and argument. Just make sure not to share the same memo between two different functions (but you can share it between different implementations of the same function).
Next, a driver functon, and an implementation. only the driver function need be public
int fib(int); // driver
int fib_(int); // implementation
Implemented:
int fib_(int n){
++total_ops;
if(n == 0 || n == 1)
return 1;
else
return fib(n-1) + fib(n-2);
}
And the driver, to memoize
int fib(int n) {
static memoizer1<int,int> memo;
return memo(fib_, n);
}
Permalink showing output on codepad.org. Number of calls is measured to verify correctness. (insert unit test here...)
This only memoizes one input functions. Generalizing for multiple args or varying arguments left as an exercise for the reader.

In Perl generic memoization is easy to get. The Memoize module is part of the perl core and is highly reliable, flexible, and easy-to-use.
The example from it's manpage:
# This is the documentation for Memoize 1.01
use Memoize;
memoize('slow_function');
slow_function(arguments); # Is faster than it was before
You can add, remove, and customize memoization of functions at run time! You can provide callbacks for custom memento computation.
Memoize.pm even has facilities for making the memento cache persistent, so it does not need to be re-filled on each invocation of your program!
Here's the documentation: http://perldoc.perl.org/5.8.8/Memoize.html

Extending the idea, it's also possible to memoize functions with two input parameters:
function memoize2 (f)
local cache = {}
return function (x, y)
if cache[x..','..y] then
return cache[x..','..y]
else
local z = f(x,y)
cache[x..','..y] = z
return z
end
end
end
Notice that parameter order matters in the caching algorithm, so if parameter order doesn't matter in the functions to be memoized the odds of getting a cache hit would be increased by sorting the parameters before checking the cache.
But it's important to note that some functions can't be profitably memoized. I wrote memoize2 to see if the recursive Euclidean algorithm for finding the greatest common divisor could be sped up.
function gcd (a, b)
if b == 0 then return a end
return gcd(b, a%b)
end
As it turns out, gcd doesn't respond well to memoization. The calculation it does is far less expensive than the caching algorithm. Ever for large numbers, it terminates fairly quickly. After a while, the cache grows very large. This algorithm is probably as fast as it can be.

Recursion isn't necessary. The nth triangle number is n(n-1)/2, so...
public int triangle(final int n){
return n * (n - 1) / 2;
}

Please don't recurse this. Either use the x*(x+1)/2 formula or simply iterate the values and memoize as you go.
int[] memo = new int[n+1];
int sum = 0;
for(int i = 0; i <= n; ++i)
{
sum+=i;
memo[i] = sum;
}
return memo[n];

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Why does Metal compile this recursive function? - gpu

The function can compile fine like any C++ function, no problem. It's when you call the function from a kernel that you will get a compile error about recursive function use.

Related

Rust Range.contains failed to be inlined/optimized

Does Go support operator type variables?

Gimpel's PC Lint Value Tracking

What compilers can detect pure mathematical functions and optimize them (without telling you so)?

How do I write a generic memoize function?

Categories

Resources