Is it possible to get 3-6x speedup from the following simple class?
I am trying to make a class that pretends to be an inline function but the parenthesis/subsref operator overloading doesn't go fast enough for me.
I created the class CTestOp to replace the inline function f = @(x) A*x by letting subsref take a vector and multiply it by the class property A.
Benchmarks indicate that for small size A and x (say, m=5) it takes 4-7x as long to use the inline function as to just write A*x and it takes 4-7x as long to use the class as to use the inline function:
Elapsed time is 0.327328 seconds for the class
Elapsed time is 0.053322 seconds for the inline function.
Elapsed time is 0.011704 seconds for just writing A*x.
I have made a series of improvements to get here, but there are problems. I can see substantial gains, for instance, by not asking for this.A, but that defeats the whole purpose. I would have liked to use an abstract class that allows us to write various operation functions. Making the class abstract didn't add much time at all, but making the actual function call did.
Any ideas?
The class is:
classdef CTestOp < handle
properties
A = [];
end
methods
function this = CTestOp(A)
this.A = A;
end
function result = operation(this, x)
result = this.A*x;
end
function result = subsref(this, S)
% switch S.type
% case '()'
% result = this.operation(S.subs{1}); % Killed because this was really slow
% result = operation(this, S.subs{1}); % I wanted this, but it was too slow
result = this.A*S.subs{1};
% otherwise
% result = builtin('subsref', this, S);
% end
end
end
end
While the test code is:
m = 5;
A = randn(m,m);
x = randn(m,1);
f = @(x) A*x;
myOp = CTestOp(A);
nc = 10000;
% Try with the class:
tic
for ind = 1:nc
r_abs = myOp(x);
end
toc
% Try with the inline function:
tic
for ind = 1:nc
r_fp = f(x);
end
toc
% Try just inline. so fast!
tic
for ind = 1:nc
r_inline = A*x;
end
toc
If you want to write fast code in MATLAB, the trick has always been to vectorize the code.
The same holds for using Matlab OO. Though I am unable to test it at the moment I am quite confident that you can reduce the overhead by performing one big operation rather than many small ones.
In your specific example, you can run the benchmark again and see if my statement actually holds by changing these two lines:
m = 500; % Work with one big matrix rather than many tiny ones
nc = 99; % Just some number that should give you reasonable run times
I have
var x: Int
var invert: Boolean
and I need the value of the expression
if (invert) -x else x
Is there any more succinct way to write that expression in Kotlin?
There is no shorter expression using only the stdlib to my knowledge.
This is pretty clear, though. Using custom functions to make it shorter is possible, but it would only obscure the meaning IMO.
It's hard to tell which approach might be best without seeing more of the code, but one option is an extension function. For example:
fun Int.negateIf(condition: Boolean) = if (condition) -this else this
(I'm using the term ‘negate’ here, as that's less ambiguous: when dealing with numbers, I think ‘inverse’ more often refers to a multiplicative inverse, i.e. reciprocal.)
You could then use:
x.negateIf(invert)
I think that makes the meaning very clear, and saves a few characters. (The saving is greater if x is a long name or an expression, of course.)
If invert didn't change (e.g. if it were a val), another option would be to derive a multiplier from it, e.g.:
val multiplier = if (invert) -1 else 1
Then you could simply multiply by that:
x * multiplier
That's even shorter, though a little less clear; if you did that, it would be worth adding a comment to explain it.
(BTW, whichever approach you use, there's an extremely rare corner case here: no positive Int has the same magnitude as Int.MIN_VALUE (-2147483648), so you can't negate that one value. Either way, you'll get that same number back. There's no easy way around that, but it's worth being aware of.)
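The corner case comes from 32-bit two's-complement integers rather than from Kotlin itself, so it can be reproduced in other languages. Below is a minimal C++ sketch of it. The name negateIf mirrors the extension function above, while checkCornerCase is a hypothetical helper added only for illustration. Unlike Kotlin, C++ treats signed overflow as undefined, so the sketch negates in unsigned arithmetic; converting the result back to a signed value is only guaranteed to wrap from C++20 onward, although mainstream compilers behave that way with earlier standards too.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Negates x when condition is true, wrapping like Kotlin's Int does.
std::int32_t negateIf(std::int32_t x, bool condition) {
    if (!condition) return x;
    // Negate in unsigned arithmetic, where wraparound is well defined,
    // then convert back (modular since C++20, assumed to wrap before that).
    return static_cast<std::int32_t>(0u - static_cast<std::uint32_t>(x));
}

// Hypothetical helper: checks the ordinary case and the corner case.
void checkCornerCase() {
    const std::int32_t minValue = std::numeric_limits<std::int32_t>::min();
    assert(negateIf(5, true) == -5);
    // The minimum value has no positive counterpart, so it maps to itself.
    assert(negateIf(minValue, true) == minValue);
}
```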
You could create a local extension function.
Local functions are helpful when you want to reduce some repetitive code and you want to access a local variable (in this case, the invert boolean).
Local functions are particularly useful when paired with extension functions, as extension functions only have one 'receiver' - so it would be difficult, or repetitive, to access invert if invert() wasn't a local function.
fun main() {
val x = 1
val y = 2
val z = 3
var invert = false
// this local function can still access 'invert'
fun Int.invert(): Int = if (invert) -this else this
println("invert = false -> (${x.invert()}, ${y.invert()}, ${z.invert()})")
invert = true
println("invert = true -> (${x.invert()}, ${y.invert()}, ${z.invert()})")
}
invert = false -> (1, 2, 3)
invert = true -> (-1, -2, -3)
Hi, I read in a book that calling a subroutine is considered to be a constant-time operation, even if the subroutine itself does not execute in constant time but depends on the input size.
So if I have the following piece of code:
void func(int m){
int n = 10;
subrout(m);//function which complexity depends on m
subrout2(n);//function which complexity depends on n
}
I suppose I can consider func() to be a constant-time function, i.e. O(1)?
And what if I have this:
void func(){
int n = 10;
Type object;
object.member_method(n);/*member function which time complexity depends upon n*/
}
Can I still consider func() a constant-time function?
Is there some case in which this rule fails?
Thanks!
No, you cannot consider func(int m) to have a constant time complexity. Its time complexity is O(T1(m) + T2(10)), where T1 and T2 are functions describing the time complexity of subrout and subrout2, respectively.
In the second case, the time complexity, technically, is constant.
As a general comment, the point of specifying time complexity with asymptotic notation is to describe how the number of operations increases as a function of input size.
What the book probably meant to say is that the time complexity of the calling function, T_func, is T_call + T_callee. Here T_call is the time taken to pass parameters and set up the environment for the callee, and T_callee is the time spent inside the subroutine. The book is saying that it is safe to assume T_call is constant, while no such assumption is made regarding T_callee.
To clarify, assume we have a function func that calls one subroutine callee.
func(s){
callee(s);
}
then T_func(s) = T_call + T_callee(s). If size(s) = n and T_callee = O(f(n)) then it is safe to say that T_func = O(f(n)).
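The decomposition can be made concrete by counting steps instead of measuring time. This is a minimal C++ sketch under stated assumptions: the global steps counter, the linear-time callee, and the helper stepsFor are all hypothetical and exist only to illustrate T_func(s) = T_call + T_callee(s), with the call overhead counted as a single step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical step counter standing in for "time".
static std::size_t steps = 0;

// callee does work proportional to its input: T_callee(n) = n steps.
long callee(const std::vector<int>& s) {
    long sum = 0;
    for (int v : s) {
        sum += v;
        ++steps;
    }
    return sum;
}

// func pays a fixed call cost (counted as 1 step) plus callee's work.
long func(const std::vector<int>& s) {
    ++steps;           // T_call: constant, independent of s.size()
    return callee(s);  // T_callee(n): grows with s.size()
}

// Returns the number of steps func takes on an input of size n.
std::size_t stepsFor(std::size_t n) {
    assert(n <= 1000000);  // keep the illustration small
    steps = 0;
    func(std::vector<int>(n, 1));
    return steps;
}
```

With these definitions stepsFor(10) is 11 and stepsFor(1000) is 1001: the constant call cost stays at one step while the total grows with n, so func is O(n) here and not O(1).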
I've created an interpreter for a simple language. It is AST-based (to be more exact, an irregular heterogeneous AST), with visitors executing and evaluating nodes. However, I've noticed that it is extremely slow compared to "real" interpreters. For testing I ran this code:
i = 3
j = 3
has = false
while i < 10000
j = 3
has = false
while j <= i / 2
if i % j == 0 then
has = true
end
j = j+2
end
if has == false then
puts i
end
i = i+2
end
I ran it in both Ruby and my interpreter (it just finds primes naively). Ruby finished in under 0.63 seconds, while my interpreter took over 15 seconds.
I am developing the interpreter in C++ with Visual Studio, so I used the profiler to see what takes the most time: the evaluation methods.
50% of the execution time was to call the abstract evaluation method, which then casts the passed expression and calls the proper eval method. Something like this:
Value * eval (Exp * exp)
{
switch (exp->type)
{
case EXP_ADDITION:
eval ((AdditionExp*) exp);
break;
...
}
}
I could put the eval methods into the Exp nodes themselves, but I want to keep the nodes clean (Terence Parr said something about reusability in his book).
Also, at evaluation I always reconstruct the Value object, which stores the result of the evaluated expression. Actually, Value is abstract, and it has derived value classes for different types (that's why I work with pointers, to avoid object slicing when returning). I think this could be another reason for the slowness.
How could I make my interpreter as optimized as possible? Should I create bytecodes out of the AST and then interpret bytecodes instead? (As far as I know, they could be much faster)
Here is the source if it helps understanding my problem: src
Note: I haven't done any error handling yet, so an illegal statement or an error will simply freeze the program. (Also sorry for the stupid "error messages" :))
The syntax is pretty simple, the currently executed file is in OTZ1core/testfiles/test.txt (which is the prime finder).
I appreciate any help I can get; I'm really a beginner at compilers and interpreters.
One possibility for a speed-up would be to use a function table instead of the switch with dynamic retyping. Your call to the typed-eval is going through at least one, and possibly several, levels of indirection. If you distinguish the typed functions instead by name and give them identical signatures, then pointers to the various functions can be packed into an array and indexed by the type member.
Value *(*evaltab[])(Exp *) = { // the order of the functions must match
    Exp_Add,                   // the order of the type values
//...
};
Then the whole switch becomes:
evaltab[exp->type](exp);
1 indirection, 1 function call. Fast.
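A fuller version of the table idea might look like the following. This is a minimal C++ sketch, not the asker's actual code: the node layout, the EXP_NUMBER type, the handler names, and the use of double in place of the Value hierarchy are all assumptions made to keep the example self-contained.

```cpp
#include <cassert>

// Hypothetical node types; the values double as indices into the table.
enum ExpType { EXP_NUMBER = 0, EXP_ADDITION = 1, EXP_TYPE_COUNT };

// Hypothetical node layout, flattened into one struct for brevity.
struct Exp {
    ExpType type;
    double number;     // used by EXP_NUMBER
    const Exp* left;   // used by EXP_ADDITION
    const Exp* right;  // used by EXP_ADDITION
};

double eval(const Exp* exp);  // dispatcher, defined below

// All handlers share one signature so they fit into one array.
double evalNumber(const Exp* exp) { return exp->number; }
double evalAddition(const Exp* exp) { return eval(exp->left) + eval(exp->right); }

using EvalFn = double (*)(const Exp*);

// The order of the entries must match the order of the ExpType values.
const EvalFn evalTable[EXP_TYPE_COUNT] = { evalNumber, evalAddition };

// One table lookup and one indirect call replace the switch.
double eval(const Exp* exp) {
    assert(exp != nullptr && exp->type < EXP_TYPE_COUNT);
    return evalTable[exp->type](exp);
}
```

Evaluating an EXP_ADDITION node whose children are the numbers 2 and 3 yields 5. Keeping the handlers outside the node types also matches the stated wish to keep the nodes clean.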
So, I am programming a simple Mandelbrot renderer.
My inner loop (which is executed up to ~100,000,000 times each time I draw on screen) looks like this:
Complex position = {re,im};
Complex z = {0.0, 0.0};
uint32_t it = 0;
for (; it < maxIterations; it++)
{
//Square z
double old_re = z.re;
z.re = z.re*z.re - z.im*z.im;
z.im = 2*old_re*z.im;
//Add c
z.re = z.re+position.re;
z.im = z.im+position.im;
//Exit condition (mod(z) > 5)
if (sqrt(z.re*z.re + z.im*z.im) > 5.0f)
break;
}
//Color in the pixel according to value of 'it'
Just some very simple calculations. This takes between 0.5 and a couple of seconds, depending on the zoom and so on, but I need it to be much faster to enable (almost) smooth scrolling.
My question is: What is my best bet to achieve the maximum possible calculation speed?
OpenCL to use the GPU? Coding it in assembly? Dividing the image into small pieces and dispatching the calculation of each piece to another thread? A combination of those?
Any help is appreciated!
I have written a Mandelbrot set renderer several times... and here are the things that you should keep in mind...
The things that take the longest are the points that never escape and use up all the iterations.
a. so you can build a region in the middle out of a few rectangles and check against that first.
any starting point inside the main cardioid or the period-2 bulb will never escape, so rectangles inscribed in those two regions are safe to skip.
you can cache points (20 or 30) in a rolling buffer; if a point you just calculated is already in the buffer, you have a cycle and the point will never escape.
You can use a more general test that doesn't require a square root: if either part is less than -2 or more than 2, the point will race out of control and can be considered escaped.
But you can also break this up, because each point is its own thing, so you can make a separate thread or GCD dispatch or whatever for each row or quadrant. It is a very easy problem to divide up and run in parallel.
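Three of these ideas can be combined in a short C++ sketch: skipping points known to lie inside the set, the escape test without a square root, and splitting the image into rows handled by separate threads. Several parts are assumptions rather than anything from the question: the closed-form cardioid and period-2 bulb tests stand in for hand-picked rectangles, the viewport is fixed at [-2.5, 1] x [-1.25, 1.25], and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// True if c = (re, im) lies in the main cardioid or the period-2 bulb,
// two regions whose points never escape.
bool inMainCardioidOrBulb(double re, double im) {
    double xm = re - 0.25;
    double q = xm * xm + im * im;
    bool inCardioid = q * (q + xm) <= 0.25 * im * im;
    bool inBulb = (re + 1.0) * (re + 1.0) + im * im <= 0.0625;
    return inCardioid || inBulb;
}

// Returns the iteration at which the orbit escapes, or maxIterations.
std::uint32_t iterate(double cre, double cim, std::uint32_t maxIterations) {
    if (inMainCardioidOrBulb(cre, cim)) return maxIterations;  // skip the loop
    double zre = 0.0, zim = 0.0;
    std::uint32_t it = 0;
    for (; it < maxIterations; ++it) {
        double newRe = zre * zre - zim * zim + cre;
        zim = 2.0 * zre * zim + cim;
        zre = newRe;
        // No square root: either part beyond +-2 means the orbit escapes.
        if (std::fabs(zre) > 2.0 || std::fabs(zim) > 2.0) break;
    }
    return it;
}

// Renders the image, with thread t handling rows t, t + threadCount, ...
std::vector<std::uint32_t> render(std::size_t width, std::size_t height,
                                  std::uint32_t maxIterations,
                                  std::size_t threadCount) {
    assert(width > 0 && height > 0 && threadCount > 0);
    std::vector<std::uint32_t> image(width * height);
    auto renderStripe = [&](std::size_t firstRow) {
        for (std::size_t y = firstRow; y < height; y += threadCount) {
            double im = -1.25 + 2.5 * y / height;
            for (std::size_t x = 0; x < width; ++x) {
                double re = -2.5 + 3.5 * x / width;
                image[y * width + x] = iterate(re, im, maxIterations);
            }
        }
    };
    std::vector<std::thread> helpers;
    for (std::size_t t = 1; t < threadCount; ++t)
        helpers.emplace_back(renderStripe, t);
    renderStripe(0);  // the calling thread takes the first stripe
    for (auto& helper : helpers) helper.join();
    return image;
}
```

Interleaving rows, instead of giving each thread one contiguous block, tends to balance the load, since the expensive interior points are clustered in the middle of the image. Each thread writes to disjoint rows, so no locking is needed.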
In addition to the comments by @Grady Player, you could start just by optimising your code:
//Add c
z.re += position.re;
z.im += position.im;
//Exit condition (mod(z) > 5)
if (z.re*z.re + z.im*z.im > 25.0f)
break;
The compiler may optimise the first, but the second will certainly help.
Also, why are you coding your own complex type rather than using complex.h?
Below is an example that doesn't work in Matlab because obj.yo is used as the for loop's index. You can just convert this to the equivalent while loop and it works fine, so why won't Matlab let this code run?
classdef iter_test
properties
yo = 1;
end
methods
function obj = iter_test
end
function run(obj)
for obj.yo = 1:10
disp('yo');
end
end
end
end
Foreword: You shouldn't expect too much from MATLAB's OOP capabilities. Even though things have gotten better with MATLAB > 2008a, OOP support in MATLAB is very poor compared to a real programming language.
From my experience, MathWorks is trying to protect the user as much as possible from making mistakes. This sometimes also means that they restrict the possibilities.
Looking at your example I believe that exactly the same is happening.
Possible Answer: Since Matlab doesn't have any explicit typing (variables / parameters are getting typed on the fly), your code might run into problems. Imagine:
$ a = iter_test()
% a.yo is set to 1
% let's overwrite 'yo'
$ a.yo = struct('somefield', [], 'second_field', []);
% a.yo is now a struct
The following code will therefore fail:
$ for a.yo
disp('hey');
end
I bet that if MATLAB supported typing of parameters / variables, your code would work just fine. However, since you can assign a completely different data type to a parameter / variable after initialization, the compiler doesn't allow you to do what you want to do because you might run into trouble.
From help
"properties are like fields of a struct object."
Hence, you can read from and write to a property, but you cannot use it as a variable like you are trying to do. When you write
for obj.yo = 1:10
disp('yo');
end
then obj.yo is being used as a variable, not a field name.
Compare this to actual struct usage to make it clearer:
EDU>> s = struct('id',10)
for s.id=1:10
disp('hi')
end
s =
id: 10
??? for s.id=1:10
|
Error: Unexpected MATLAB operator.
However, one can 'set' the struct field to a new value:
EDU>> s.id=4
s =
id: 4
Compare the above error to what you got:
??? Error using ==> iter_test
Error: File: iter_test.m Line: 9 Column: 20
Unexpected MATLAB operator.
Therefore, I do not think what you are trying to do is possible.
The error
??? Error: File: iter_test.m Line: 9 Column: 20
Unexpected MATLAB operator.
means that the MATLAB parser doesn't understand it. I'll leave it to you to decide whether it's a bug or deliberate. Raise it with TMW Technical Support.
EDIT: This also occurs for all other kinds of subscripting:
The following all fail to parse:
a = [0 1];
for a(1) = 1:10, end
a = {0 1};
for a{1} = 1:10, end
a = struct('a', 0, 'b', 0);
for a.a = 1:10, end
It's an issue with the MATLAB parser. Raise it with Mathworks.