I'm writing a compiler using LLVM as backend and have a lot of reference counting. When I borrow an object, I increment the object's reference counter. When I release an object, I decrement the reference counter, and free the object if it goes to zero.
However, if I only generate a small piece of code, like this one:
++obj->ref;
global_variable_A = obj->a;
if (--obj->ref == 0)
    free_object(obj);
LLVM optimizes this to (in IR, but this is the equivalent code in C):
global_variable_A = obj->a;
if (obj->ref == 0)
    free_object(obj);
But since I know that the reference counter is always positive before the first statement, it could be optimized down to just
global_variable_A = obj->a;
My question: is there any way to tell the LLVM optimizer that a register or some memory, at the time it is read, is known to contain non-zero data?
A related question: being able to tell the optimizer that a pointer is non-null would also be great.
You could write a custom FunctionPass that replaces the variable with a known-true (non-zero) value; the dead check should then be optimised away by DCE or SimplifyCFG.
http://llvm.org/docs/WritingAnLLVMPass.html
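A bare-bones skeleton of such a pass, following the boilerplate from the linked docs (the pass name is made up, and how you recognise your ref-count loads is up to your code generator, e.g. via metadata you attach when emitting them):

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"

using namespace llvm;

namespace {
struct AssumeRefCountNonZero : public FunctionPass {
  static char ID;
  AssumeRefCountNonZero() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    bool Changed = false;
    // Walk the instructions, find the loads of your ref-count field,
    // and fold the "== 0" comparisons that follow them to false
    // (a ref count known to be positive can never compare equal to zero).
    return Changed;
  }
};
}

char AssumeRefCountNonZero::ID = 0;
static RegisterPass<AssumeRefCountNonZero>
    X("assume-refcount-nonzero", "Assume reference counts are non-zero");

Once the check is folded to a constant, SimplifyCFG removes the dead free_object branch and DCE cleans up the rest.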
I'm going through a Fortran code, and one bit has me a little puzzled.
There is a subroutine, say
SUBROUTINE SSUB(X,...)
REAL*8 X(0:N1,1:N2,0:N3-1),...
...
RETURN
END
Which is called in another subroutine by:
CALL SSUB(W(0,1,0,1),...)
where W is a 'working array'. It appears that a specific value from W is passed to X; however, X is dimensioned as an array. What's going on?
This is a not-uncommon idiom for getting the subroutine to work on a (rectangular, N-dimensional) subset of the original array.
All parameters in Fortran (at least before Fortran 90) are passed by reference, so the actual array argument is resolved as a location in memory. Choose a location inside the space allocated for the whole array, and the subroutine manipulates only part of the array.
Biggest issue: you have to be aware of how the array is laid out in memory and how Fortran's array indexing scheme works. Fortran uses column-major array ordering, which is the opposite convention from C. Consider an array that is 5x5 in size (and index both dimensions from 0 to make the comparison with C easier). In both languages element (0,0) is the first element in memory. In C the next element in memory is [0][1], but in Fortran it is (1,0). This affects which indexes you drop when choosing a subspace: if the original array is A(i,j,k,l), and the subroutine works on a three-dimensional subspace (as in your example), in C it works on Aprime[i=constant][j][k][l], but in Fortran it works on Aprime(i,j,k,l=constant).
The other risk is wrap-around. The dimensions of the (sub)array in the subroutine have to match those in the calling routine, or strange, strange things will happen (think about it). So if A is declared of size (0:4,0:5,0:6,0:7), and we call with element A(0,1,0,1), the receiving routine is free to start the index of each dimension wherever it likes, but must make the sizes (4,5,6) or else; but that means that the last element in the j direction actually wraps around! The thing to do about this is to not use the last element. Making sure that happens is the programmer's job, and is a pain in the butt. Take care. Lots of care.
In Fortran, variables are passed by address.
So W(0,1,0,1) is both a value and an address, so basically you pass the subarray starting at W(0,1,0,1).
This is called "sequence association". In this case, what appears to be a scalar, an element of an array (the actual argument in the caller), is associated with an array (implicitly with its first element), the dummy argument in the subroutine. Thereafter the elements of the arrays are associated by storage order, known as "sequence". This was done in Fortran 77 and earlier for various reasons, here apparently for a workspace array -- perhaps the programmer was doing their own memory management. This is retained in Fortran >=90 for backwards compatibility, but IMO it doesn't belong in new code.
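For concreteness, here is roughly what the association looks like; the names and bounds below are made up rather than taken from the original code:

C     The caller passes an interior element of W; the callee treats the
C     storage starting there as the whole of its dummy array X.
      SUBROUTINE CALLER
      REAL*8 W(0:4,1:3,0:2,1:2)
C     ... fill W ...
      CALL SSUB(W(0,1,0,2))
      RETURN
      END

      SUBROUTINE SSUB(X)
      REAL*8 X(0:4,1:3,0:2)
C     X(0,1,0) occupies the same storage as W(0,1,0,2) in the caller, and
C     X as a whole overlays the l=2 "slab" of W, because Fortran lays
C     arrays out in column-major order.
      RETURN
      END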
Say I have a TensorFlow variable to track the mean of a value. mean can be updated with the following graph snippet:
mean.assign((step * mean.read() + value) / (step + 1))
Unfortunately, those operations are not atomic, so if two different portions of the graph try to update the same mean variable one of the updates may be lost.
If instead I were tracking sum, I could just do
sum.assign_add(value, use_locking=True)
and everything would be great. Unfortunately, in other cases a more complicated update to mean (or std or etc.) may be required, and it may be impossible to use tf.assign_add.
Question: Is there any way to make the first code snippet atomic?
Unfortunately I believe the answer is no, since (1) I don't remember any such mechanism and (2) one of our reasons for making optimizers C++ ops was to get atomic behavior. My main source of hope is XLA, but I do not know whether this kind of atomicity can be guaranteed there.
The underlying problem of the example is that there are two operations - read and subsequent assign - that together need to be executed atomically.
At the beginning of 2018, the TensorFlow team added the CriticalSection class to the codebase. However, it only works with resource variables (as pointed out in Geoffrey's comments). Hence, value in the example below needs to be created as:
value = tf.get_variable(..., use_resource=True, ...)
Although I did not test this, according to the class' documentation the atomic update issue should then be solvable as follows:
def update_mean(step, value):
  old_value = mean.read_value()
  with tf.control_dependencies([old_value]):
    return mean.assign((step * old_value + value) / (step + 1))

cs = tf.CriticalSection()
mean_update = cs.execute(update_mean, step, value)
session.run(mean_update)
Essentially, it provides a lock from the beginning of execute() till its end, i.e. covering the whole assignment operation including read and assign.
I've created an interpreter for a simple language. It is AST-based (to be more exact, an irregular heterogeneous AST) with visitors executing and evaluating nodes. However, I've noticed that it is extremely slow compared to "real" interpreters. For testing I ran this code:
i = 3
j = 3
has = false
while i < 10000
  j = 3
  has = false
  while j <= i / 2
    if i % j == 0 then
      has = true
    end
    j = j+2
  end
  if has == false then
    puts i
  end
  i = i+2
end
In both Ruby and my interpreter (just finding primes primitively). Ruby finished in under 0.63 seconds, while my interpreter took over 15 seconds.
I'm developing the interpreter in C++ in Visual Studio, so I used the profiler to see what takes the most time: the evaluation methods.
50% of the execution time was spent calling the abstract evaluation method, which then casts the passed expression and calls the proper eval method. Something like this:
Value * eval (Exp * exp)
{
    switch (exp->type)
    {
    case EXP_ADDITION:
        return eval ((AdditionExp*) exp);
    ...
    }
}
I could put the eval methods into the Exp nodes themselves, but I want to keep the nodes clean (Terence Parr said something about reusability in his book).
Also, at every evaluation I reconstruct the Value object, which stores the result of the evaluated expression. Value is actually abstract, and it has derived value classes for the different types (that's why I work with pointers: to avoid object slicing when returning). I think this could be another reason for the slowness.
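Roughly, the hierarchy looks like this (simplified, and these aren't the real names):

struct Value                       // abstract base for every runtime value
{
    virtual ~Value () {}
};

struct IntValue : Value
{
    int val;
    IntValue (int v) : val (v) {}
};

struct BoolValue : Value
{
    bool val;
    BoolValue (bool v) : val (v) {}
};

// every evaluation heap-allocates a fresh result, e.g.
//   return new IntValue (left + right);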
How could I make my interpreter as optimized as possible? Should I create bytecode from the AST and then interpret the bytecode instead? (As far as I know, that could be much faster.)
Here is the source if it helps understanding my problem: src
Note: I haven't done any error handling yet, so an illegal statement or an error will simply freeze the program. (Also sorry for the stupid "error messages" :))
The syntax is pretty simple, the currently executed file is in OTZ1core/testfiles/test.txt (which is the prime finder).
I appreciate any help I can get; I'm really a beginner at compilers and interpreters.
One possibility for a speed-up would be to use a function table instead of the switch with dynamic retyping. Your call to the typed-eval is going through at least one, and possibly several, levels of indirection. If you distinguish the typed functions instead by name and give them identical signatures, then pointers to the various functions can be packed into an array and indexed by the type member.
Value * (*evaltab[]) (Exp *) = {  // the order of the functions must match
    Exp_Add,                      // the order of the type values
    //...
};
Then the whole switch becomes:
return evaltab[exp->type](exp);
1 indirection, 1 function call. Fast.
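Putting it together, a rough sketch of the whole pattern (the function and enum names here are made up to match the idea, not taken from your code):

// Each typed evaluator shares the signature Value *(Exp *) and casts internally.
Value * eval_add (Exp * exp)
{
    AdditionExp * add = (AdditionExp *) exp;
    // ... evaluate add's operands, combine them ...
    return NULL; // placeholder
}

Value * eval_number (Exp * exp)
{
    // ... wrap the literal in a Value ...
    return NULL; // placeholder
}

// One entry per EXP_* constant, in the same order as the enum.
Value * (*evaltab[]) (Exp *) = {
    eval_add,     // EXP_ADDITION
    eval_number,  // EXP_NUMBER
    //...
};

Value * eval (Exp * exp)
{
    return evaltab[exp->type](exp);
}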
I'm wrestling with the concept of code "order of execution" and so far my research has come up short. I'm not sure if I'm phrasing it incorrectly; there may be a more appropriate term for the concept. I'd appreciate it if someone could shed some light on my various stumbling blocks below.
I understand that if you call one method after another:
[self generateGrid1];
[self generateGrid2];
Both methods are run, but generateGrid2 doesn't necessarily wait for generateGrid1 to finish. But what if I need it to? Say generateGrid1 does some complex calculations (that take an unknown amount of time) and populates an array that generateGrid2 uses for its calculations? This needs to be done every time an event is fired, it's not just a one-time initialization.
I need a way to call methods sequentially, but have some methods wait for others. I've looked into callbacks, but the concept is always married to delegates in all the examples I've seen.
I'm also not sure how to determine when I can't reasonably expect a line of code to have finished executing in time for its result to be used. For example:
float myVar = [self complexFloatCalculation];
if (myVar <= 10.0f) {} else {}
How do I determine whether something will take long enough that I need to implement checks for "is this other thing done before I start my thing"? Just trial and error?
Or what if I'm passing a method as a parameter of another method? Does it wait for the arguments to be evaluated before executing the method?
[self getNameForValue:[self getIntValue]];
I understand that if you call one method after another:
[self generateGrid1];
[self generateGrid2];
Both methods are run, but generateGrid2 doesn't necessarily wait for generateGrid1 to finish. But what if I need it to?
False. generateGrid1 will run, and then generateGrid2 will run. This sequential execution is the very basis of procedural languages.
Technically, the compiler is allowed to rearrange statements, but only if the end result would be provably indistinguishable from the original. For example, look at the following code:
int x = 3;
int y = 4;
x = x + 6;
y = y - 1;
int z = x + y;
printf("z is %d", z);
It really doesn't matter whether the x+6 or the y-1 line happens first; the code as written does not make use of either of the intermediate values other than to calculate z, and that can happen in either order. So if the compiler can for some reason generate more efficient code by rearranging those lines, it is allowed to do so.
You'd never be able to see the effects of such rearranging, though, because as soon as you try to use one of those intermediate values (say, to log it), the compiler will recognize that the value is being used, and get rid of the optimization that would break your logging.
So really, the compiler is not required to execute your code in the order provided; it is only required to generate code that is functionally identical to the code you provided. This means that you actually can see the effects of these kinds of optimizations if you attach a debugger to a program that was compiled with optimizations in place. This leads to all sorts of confusing things, because the source code the debugger is tracking does not necessarily match up line-for-line with the compiled code the compiler generated. This is why optimizations are almost always turned off for debug builds of a program.
Anyway, the point is that the compiler can only do these sorts of tricks when it can prove that there will be no effect. Objective-C method calls are dynamically bound, meaning that the compiler has absolutely no guarantee about what will actually happen at runtime when that method is called. Since the compiler can't make any guarantees about what will happen, the compiler will never reorder Objective-C method calls. But again, this just falls back to the same principle I stated earlier: the compiler may change order of execution, but only if it is completely imperceptible to the user.
In other words, don't worry about it. Your code will always run top-to-bottom, each statement waiting for the one before it to complete.
In general, most method calls that you see in the style you described are synchronous, that means they'll have the effect you desire, running in the order the statements were coded, where the second call will only run after the first call finishes and returns.
Also, when a method takes parameters, its parameters are evaluated before the method is called.
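For example, a quick sketch using the two methods from your question (their return types are my assumption):

- (NSInteger)getIntValue {
    NSLog(@"getIntValue runs first");
    return 42;
}

- (NSString *)getNameForValue:(NSInteger)value {
    NSLog(@"getNameForValue runs second, with value %ld", (long)value);
    return [NSString stringWithFormat:@"value-%ld", (long)value];
}

// Elsewhere:
// [self getNameForValue:[self getIntValue]];
// logs "getIntValue runs first", then "getNameForValue runs second, with value 42".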
This seems like a simple question to ask but, is it generally a good idea to re-use variables in any scripting language?
I'm particularly interested in the reasons why it is or isn't good practice, as I can't decide whether I should or shouldn't be doing it.
For example, $i is one of the most common variables used in a loop to iterate through things.
E.g. (in PHP):
//This script will iterate each entry in $array
//and output it with a comma if it isn't the last item.
$i=0; //Recycling $i
foreach ($array as $v) {
    $i++;
    if ($i < count($array)) {
        echo $v.', ';
    } else {
        echo $v;
    }
}
Say I had several loops in my script; would it be better to re-use $i, or to just use another variable such as $a, and for any other loops go from $b to $z?
Obviously to re-use $i, I'd have to set $i=0; or null before the loop, or else it would give some very weird behaviour later on in the script. Which makes me wonder if reusing it is worth the hassle.
Would the following take up more space than just using another variable?
$i=0; //Recycling $i
Of course this is the simplest example of use, and I would like to know: if the script were far more complicated, would it cause more trouble than it's worth?
I know that re-using a variable could save a minuscule amount of memory but when those variables stack up, it gets important, doesn't it?
Thank you for any straight answers in advance and if this question is too vague, I apologize and would like to know that too- so that I know I should ask questions such as these elsewhere.
I think you have "string" confused with "variable". A string is a collection of ASCII or Unicode characters, depending on the programming language. A variable is a named location in memory. So you are talking about recycling a variable, not a string.
Unfortunately, the way that variables are dealt with is highly language and compiler specific. You will get much better information if you give us more details.
Most languages have the concept of scope, with variable space being allocated at declaration and deallocated when the scope ends. If you keep a variable in its proper scope, you won't have to worry about wasting memory, as variables will be deallocated when no longer needed. Again, the specifics of scope are language dependent.
Generally, it is a bad idea to reuse variable names like that, as when (not if) you forget to reset the variable to some initial value within the same scope, your program will break at runtime.
You should be initializing $i to 0 before entering your foreach loop, regardless of whether you used it earlier in the function, and regardless of the language you're using. It's quite simply a language-neutral best practice.
Many languages help you enforce this by allowing you to bind the counting variable to the scope of the for loop. PHP's foreach provides a similar mechanism, provided your array is 0-based and contains only numeric keys:
foreach ($array as $i => $v) {
    echo $v;
    if ($i + 1 < count($array))
        echo ', ';
}
Also, in regards to your statement:
I know that re-using a variable could save a minuscule amount of memory but when those variables stack up, it gets important, doesn't it?
No, it really doesn't. Variables will be garbage-collected when they fall out of scope. It's virtually impossible (without programmatically generating code) to use so many counting/temporary variables through the course of a single function that the memory usage is at all significant. Reusing a variable instead of declaring a new variable will have no measurable impact on your program's performance, and virtually no impact on its memory footprint.
Resetting your variable ($i=0) is not going to take more space.
When you declare a variable (say of type integer), a predefined amount of memory is allocated for that integer regardless of what its value is. By changing it back to zero, you only modify the value.
If you declare a new variable, memory will have to be allocated for it and typically it will also be initialized to zero, so resetting your existing variable probably isn't taking any extra CPU time either.
Reusing an index variable for mutually exclusive loops is a good idea; it makes your code easier to understand. If you're declaring a lot of extra variables, other people reading your code may wonder if "i" is still being used for something even after your loop has completed.
Reuse it! :)
Personally I don't have any problem with reusing index/counter variables - although in some languages, e.g. .NET, using a variable in a for loop specifically scopes the variable to that loop, so it's cleaned up afterwards no matter what.
You shouldn't ever re-use variables which contain any significant data (this shouldn't be an issue, as all your variables should have descriptive names).
I once had to try to clean up some excel macro code written by a student - all the variable names were Greek gods and the usage swapped repeatedly throughout the script. In the end, I just re-wrote it.
You should also be aware that depending on the type of variable and the language you're using, re-allocating contents can be as expensive as creating a new variable. E.g. again in .NET, strings are immutable, so this...
Dim MyString as String = "Something"
MyString = "SomethingElse"
Will actually allocate a new area of memory to store the second string and change the MyString pointer to point at the new location.
Incidentally, this is why the following is really inefficient (at least in .NET):
Dim SomeString = SomeVariable & " " & SomeOtherVariable
This allocates memory for SomeVariable, then more memory for SomeVariable & " ", then yet again for SomeVariable & " " & SomeOtherVariable - meaning SomeVariable is actually written to memory 3 times.
In summary: very simple variables for loops I'd re-use. Anything else, I'd just use a new variable - especially since memory allocation is usually only a tiny fraction of the time/processing of most applications.
Remember: Premature optimization is the root of all evil
This depends on the problem you are solving.
When using PHP (and other scripting languages) you should consider unset() on sizeable variables such as arrays or long strings. You shouldn't bother to re-use or unset small variables (integer/boolean/etc). It just isn't worth it when using a (slow) scripting language.
If you are developing a performance critical application in a low (lower) level language such as C then variable re-use is more important.