How do programmers solve the dilemma of reusing old variables instead of creating new ones? - optimization

For example:
... some code
int sizeOfSomeObject = someObject.length();
... some code, sizeOfSomeObject is not needed anymore
Now I need another int variable for a different purpose (for example, a position in some object), and I have a dilemma: create a new variable, or reuse sizeOfSomeObject for this. In the first case I keep readability but lose performance. In the second case it is the other way around. What do programmers usually do in this situation?

In the first case I will keep readability, but lose performance. In the second case - on the contrary.
So did you benchmark it? I suspect no, you didn't. Most modern compilers do a lot of aggressive analysis during register allocation, so if the optimizer sees that one variable is no longer used and a new variable of the same type is introduced, it will typically just assign both to the same memory location or processor register. No need to worry about performance penalties.
And anyway, don't do premature optimization (which this is). In 90% of the cases, readability is more important than "performance".
All in all, go ahead and create a new variable with an appropriate, different, descriptive name. And just for fun, compile this version and the version in which you used the same variable name, and look at the generated assembly (or bytecode, or...) - and find out that they're identical.
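To try the comparison suggested above, here is a minimal C++ sketch of the two styles. The function names and the arithmetic are made up for illustration, and the search assumes the character 'o' is present in the input.

```cpp
#include <string>

// Version A: one int is reused for two unrelated purposes.
int reuse_variable(const std::string& s) {
    int n = static_cast<int>(s.length());   // first meaning: a size
    int total = n * 2;
    n = static_cast<int>(s.find('o'));      // second meaning: a position
    return total + n;
}

// Version B: a separate, descriptively named variable per purpose.
int separate_variables(const std::string& s) {
    const int size = static_cast<int>(s.length());
    const int total = size * 2;
    const int position = static_cast<int>(s.find('o'));
    return total + position;
}
```

Compiling this with something like g++ -O2 -S and comparing the two function bodies in the generated assembly is a quick way to check what your own compiler does; I would expect them to match, but that is worth verifying rather than assuming.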

I would use different named variables for different things.
In terms of something like this, I don't think just one variable would cause a massive performance hit. In most languages you have the option to clear variables from memory in some way when they are no longer in use, so I would recommend doing that so that the code means something to you or others when read at a later date.

In C++, you can use blocks for objects to be destroyed as soon as they are not needed anymore:
void some_function () {
{
MyClass c;
// ... here we use c ...
}
// now c has been destroyed
{
MyClass d;
// ... here we use d ...
}
// now d has been destroyed
}
In your example (with int variables), there is no reason to worry about performance. The worst thing that could probably happen is memory for two variables being used instead of one, but (i) that's negligible and (ii) ints will probably live in a CPU register anyway. If you really worry, use the block approach for your int example.
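If you want to see the block behaviour for yourself, a small sketch along these lines records when each object is constructed and destroyed. Tracer is a hypothetical helper written just for this illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: logs its own construction and destruction.
struct Tracer {
    std::vector<std::string>& log;
    std::string name;
    Tracer(std::vector<std::string>& l, std::string n)
        : log(l), name(std::move(n)) {
        log.push_back("ctor " + name);
    }
    ~Tracer() { log.push_back("dtor " + name); }
};

std::vector<std::string> run_scoped_blocks() {
    std::vector<std::string> log;
    {
        Tracer c(log, "c");
        // ... here we use c ...
    }   // c is destroyed at this closing brace
    {
        Tracer d(log, "d");
        // ... here we use d ...
    }   // d is destroyed at this closing brace
    return log;
}
```

The returned log shows c being destroyed before d is even constructed, which is the point of the extra braces.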

It depends how often such an int would be initialized. If it's not in some hugely nested for loop, most (all) programmers will go for the first. Besides, most modern programming languages have a garbage collector, which cleans up left over objects.

A decent compiler will optimize out your second variable, so that shouldn't be an issue.
That said, there are situations where variable reuse makes sense. E.g., you might have some variable that holds a generic output populated from call to some external API. According to the context and parameters passed to the API you'll process the data differently but it's probably better (more readable etc.) to reuse the same data variable.
For example, something like this:
void* data = getSomeData(params);
//process data
//change params
data = getSomeData(params);
//process data
//change params
data = getSomeData(params);

Related

How does gcc push local variables on to the stack?

void f()
{
int a[1];
int b;
int c;
int d[1];
}
I have found that these local variables, for this example, are not pushed on to the stack in order. b and c are pushed in the order of their declaration, but, a and d are grouped together. So the compiler is allocating arrays differently from any other built in type or object.
Is this a C/C++ requirement or gcc implementation detail?
The C standard says nothing about the order in which local variables are allocated. It doesn't even use the word "stack". It only requires that local variables have a lifetime that begins on entry to the nearest enclosing block (basically when execution reaches the {) and ends on exit from that block (reaching the }), and that each object has a unique address. It does acknowledge that two unrelated variables might happen to be adjacent in memory (for obscure technical reasons involving pointer arithmetic), but doesn't say when this might happen.
The order in which variables are allocated is entirely up to the whim of the compiler, and you should not write code that depends on any particular ordering. A compiler might lay out local variables in the order in which they're declared, or alphabetically by name, or it might group some variables together if that happens to result in faster code.
If you need variables to be allocated in a particular order, you can wrap them in an array or a structure.
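As a sketch of the structure approach, the four locals from the question can be wrapped in a struct (the name Locals is made up here). Members of such a struct are laid out in declaration order, although the compiler may still insert padding between them.

```cpp
#include <cstddef>

// Unlike independent locals, struct members keep their declaration order
// in memory (padding may still appear between them).
struct Locals {
    int a[1];
    int b;
    int c;
    int d[1];
};

// Checks that each member starts at a higher offset than the previous one.
bool members_are_in_declaration_order() {
    return offsetof(Locals, a) < offsetof(Locals, b) &&
           offsetof(Locals, b) < offsetof(Locals, c) &&
           offsetof(Locals, c) < offsetof(Locals, d);
}
```

A function would then declare a single Locals variable instead of four separate ones.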
(If you were to look at the generated machine code, you'd most likely find that the variables are not "pushed onto the stack" one by one. Instead, the compiler will probably generate a single instruction to adjust the stack pointer by a certain number of bytes, effectively allocating a single chunk of memory to hold all the local variables for the function or block. Code that accesses a given variable will then use its offset within the stack frame.)
And since your function doesn't do anything with its local variables, the compiler might just not bother allocating space for them at all, particularly if you request optimization with -O3 or something similar.
The compiler can order the local variables however it wants. It may even choose to either not allocate them at all (for example, if they're not used, or are optimized away through propagation/ciscizing/keeping in register/etc) or allocate the same stack location for multiple locals that have disjoint live ranges.
There is no common implementation detail to outline how a particular compiler does it, as it may change at any time.
Typically, compilers will try to group similar sized variables (and/or alignments) together to minimize wasted space through "gaps", but there are so many other factors involved.
structs and arrays have slightly different requirements, but that's beyond the scope of this question I believe.

Is it bad practice to use a function in an if statement?

I have a function in another class file that gets information about the battery. In a form I have the following code:
If BatteryClass.getStatus_Battery_Charging = True Then
It appears Visual Studio accepts this. However, would it be better if I used the following code, which also works:
dim val as boolean = BatteryClass.getStatus_Battery_Charging
If val = True Then
Is there a difference between these two methods?
What you're asking in general is which approach is idiomatic.
The technical rule is not to invoke a method multiple times when its result can be preserved in a locally scoped variable, unless you're specifically checking a volatile value for change. That's not what you're asking, but it's important to understand that the result of a call you would otherwise repeat should typically be bound to a variable.
That being said, it's better to produce fewer lines of code from a maintenance perspective, as long as doing so improves the readability of your code. If you do have to declare a locally scoped variable to hold the return value of a method, make sure to give the variable a meaningful name.
Prefer this [idiomatic VB.NET] one liner:
If BatteryClass.getStatus_Battery_Charging Then
over this:
Dim isBatteryCharging As Boolean = BatteryClass.getStatus_Battery_Charging
If isBatteryCharging Then
Another point you should concern yourself with is methods which, when invoked, create a side effect that affects the state of your program. In most circumstances it is undesirable to have a side-effecting method invoked multiple times; when possible, rewrite such methods so that they do not cause any side effects. To limit the number of times a side effect occurs, apply the same local variable rule instead of invoking the method multiple times.
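The side-effect point can be sketched like this. The example is in C++ rather than VB.NET, and Battery with its query counter is a hypothetical stand-in for the real battery class, not its actual API.

```cpp
// Hypothetical stand-in for a method with a side effect:
// every call bumps a counter.
struct Battery {
    int queries = 0;
    bool charging = true;
    bool is_charging() { ++queries; return charging; }
};

// Invokes the method twice, so the side effect happens twice.
int describe_without_local(Battery& b) {
    int score = 0;
    if (b.is_charging()) score += 1;
    if (b.is_charging()) score += 10;
    return score;
}

// Invokes the method once and reuses the stored result.
int describe_with_local(Battery& b) {
    const bool is_charging = b.is_charging();
    int score = 0;
    if (is_charging) score += 1;
    if (is_charging) score += 10;
    return score;
}
```

Both functions return the same value, but only the second keeps the number of invocations at one. When the value is needed only once, as in the question, the local buys nothing.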
No real difference.
The second is better if you need the value again of course. It's also marginally easier to debug when you have a value stored in a variable.
Personally I tend to use the first because I'm an old school C programmer at heart!

Most appropriate data structure for dynamic languages field access

I'm implementing a dynamic language that will compile to C#, and it's implementing its own reflection API (.NET's is too slow, and the DLR is limited only to more recent and resourceful implementations).
For this, I've implemented a simple .GetField(string f) and .SetField(string f, object val) interface. Until recently, the implementation just switches over all possible field string values and makes the corresponding action.
Also, this dynamic language has the possibility to define anonymous objects. For those anonymous objects, at first, I had implemented a simple hash algorithm.
By now, I am looking for ways to optimize the dynamic parts of the language, and I have come across the fact that a hash algorithm for anonymous objects would be overkill. This is because the objects are usually small. I'd say the objects contain 2 or 3 fields, normally. Very rarely, they would contain more than 15 fields. It would take more time to actually hash the string and perform the lookup than if I would test for equality between them all. (This is not tested, just theoretical).
The first thing I did was to -- at compile-time -- create a red-black tree for each anonymous object declaration and have it laid onto an array so that the object can look for it in a very optimized way.
I am still divided, though, if that's the best way to do this. I could go for a perfect hashing function. Even more radically, I'm thinking about dropping the need for strings and actually work with a struct of 2 longs.
Those two longs will be encoded to support 10 chars (A-Za-z0-9_) each, which is mostly a good prediction of the size of the fields. For fields larger than this, a special function (slower) receiving a string will also be provided.
The result will be that strings will be inlined (not references), and their comparisons will be as cheap as a long comparison.
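A rough sketch of this encoding, in C++ rather than C# and for a single 64-bit value only: the 63 allowed characters plus one "empty" code fit in 6 bits, so 10 characters take 60 bits. The function names and the exact code assignment are my own assumptions, not the asker's implementation, and the code needs C++17.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Maps one character of [A-Za-z0-9_] to a 6-bit code in 1..63.
// Returns 0 for any other character (assumes an ASCII-compatible charset).
constexpr std::uint64_t char_code(char ch) {
    if (ch >= 'A' && ch <= 'Z') return 1 + (ch - 'A');    // 1..26
    if (ch >= 'a' && ch <= 'z') return 27 + (ch - 'a');   // 27..52
    if (ch >= '0' && ch <= '9') return 53 + (ch - '0');   // 53..62
    if (ch == '_') return 63;
    return 0;
}

// Packs up to 10 characters into the low 60 bits of a 64-bit integer.
// Returns nullopt for names that are too long or contain other characters;
// those would go through the slower string-based path.
std::optional<std::uint64_t> pack_name(std::string_view name) {
    if (name.size() > 10) return std::nullopt;
    std::uint64_t packed = 0;
    for (char ch : name) {
        const std::uint64_t code = char_code(ch);
        if (code == 0) return std::nullopt;
        packed = (packed << 6) | code;
    }
    return packed;
}
```

Because no valid character maps to 0, two different names never pack to the same value, so comparing field names becomes a single integer comparison. A second 64-bit value would extend this to 20 characters.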
Anyway, it's a little hard to find good information about this kind of optimization, since it is normally thought about at the VM level, not in an implementation that compiles to a static language.
Does anyone have any thoughts or tips about the best data structure to handle dynamic calls?
Edit:
For now, I'm really going with the string as long representation and a linear binary tree lookup.
I don't know if this is helpful, but I'll chuck it out in case;
If this is compiling to C#, do you know the complete list of fields at compile time? So as an idea, if your code reads
// dynamic
myObject.foo = "some value";
myObject.bar = 32;
then during the parse, your symbol table can build an int for each field name;
// parsing code
symbols[0] == "foo"
symbols[1] == "bar"
then generate code using arrays or lists;
// generated c#
runtimeObject[0] = "some value"; // assign myobject.foo
runtimeObject[1] = 32; // assign myobject.bar
and build up reflection as a separate array;
runtimeObject.FieldNames[0] == "foo"; // Dictionary<int, string>
runtimeObject.FieldIds["foo"] == 0; // Dictionary<string, int>
As I say, thrown out in the hope it'll be useful. No idea if it will!
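The symbol table idea above could look roughly like this. It is sketched in C++ rather than C#, and SymbolTable and its method names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Parse-time table: assigns each distinct field name a small integer slot.
class SymbolTable {
public:
    // Returns the existing slot for a name, or allocates the next free one.
    std::size_t id_for(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const std::size_t id = names_.size();
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }
    // Reverse lookup, for the reflection side.
    const std::string& name_of(std::size_t id) const { return names_.at(id); }
    std::size_t size() const { return names_.size(); }

private:
    std::vector<std::string> names_;                     // id -> name
    std::unordered_map<std::string, std::size_t> ids_;   // name -> id
};
```

The generated code would then index a plain array with the slot number, and only the reflection path would touch the name maps.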
Since you are likely to be using the same field and method names repeatedly, something like string interning would work well to quickly generate keys for your hash tables. It would also make string equality comparisons constant-time.
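For what interning buys you, here is a minimal sketch in C++ (not .NET's own string interning): every distinct string gets one canonical pointer, so equality becomes a pointer comparison. Interner is a hypothetical class for illustration.

```cpp
#include <string>
#include <unordered_set>

// Hands out one canonical pointer per distinct string value.
class Interner {
public:
    const std::string* intern(const std::string& s) {
        // insert() returns the existing element if the string is already
        // pooled. Pointers to unordered_set elements stay valid on rehash.
        return &*pool_.insert(s).first;
    }

private:
    std::unordered_set<std::string> pool_;
};
```

Interning itself still hashes the string once, so the saving comes from doing it at compile or load time and comparing pointers afterwards.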
For such a small data set (expected upper bound of 15) I think almost any hashing will be more expensive than a tree or even a list lookup, but that is really dependent on your hashing algorithm.
If you want to use a dictionary/hash then you'll need to make sure the objects you use for the key return a hash code quickly (perhaps a single constant hash code that's built once). If you can prevent collisions inside of an object (sounds pretty doable) then you'll gain the speed and scalability (well for any realistic object/class size) of a hash table.
Something that comes to mind is Ruby's symbols and message passing. I believe Ruby's symbols act as a constant that is just a memory reference. So comparison is constant-time, they are very lightweight, and you can use symbols like variables (I'm a little hazy on this and don't have a Ruby interpreter on this machine). Ruby's method "calling" really turns into message passing. Something like: obj.func(arg) turns into obj.send(:func, arg) (":func" is the symbol). I would imagine that the symbol makes looking up the message handler (as I'll call it) inside the object pretty efficient, since its hash code most likely doesn't need to be calculated as it does for most objects.
Perhaps something similar could be done in .NET.

const vs enum in D

Check out this quote from here, towards the bottom of the page. (I believe the quoted comment about consts applies to invariants as well.)
Enumerations differ from consts in that they do not consume any space
in the final outputted object/library/executable, whereas consts do.
So apparently value1 will bloat the executable, while value2 is treated as a literal and doesn't appear in the object file.
const int value1 = 0xBAD;
enum int value2 = 42;
Back in C++ I always assumed this was for legacy reasons, and old compilers that couldn't optimize away constants. But if this is still true in D, there must be a deeper reason behind this. Anyone know why?
Just like in C++, an enum in D seems to be a "conserved integer literal" (edit: amazing, D2 even supports floats and strings). Its enumerators have no location. They are just immaterial as values without identity.
Using enum this way is new in D2. It defines a new name, but one that is not an lvalue (so you cannot take its address). A declaration such as
enum int a = 10; // new in D2
is like
enum : int { a = 10 }
if I can trust my poor D knowledge. So, a in here is not an lvalue (no location and you can't take its address). A const, however, has an address. If you have a global (not sure whether this is the right D terminology) const variable, the compiler usually can't optimize it away, because it doesn't know what modules can access that variable or could take its address. So it has to allocate storage for it.
I think if you have a local const, the compiler can still optimize it away just as in C++, because the compiler knows by looking at its scope whether or not anyone is interested in its address or whether everyone just takes its value.
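The lvalue distinction can be sketched with the C++ analogue (this is C++, not D, and the helper function names are made up): an enumerator is a pure value, while a const object with external linkage has storage whose address can be taken.

```cpp
// An enumerator is just a value; it has no storage of its own.
enum : int { value2 = 42 };

// A namespace-scope const with external linkage is a real object.
extern const int value1;
const int value1 = 0xBAD;

// Taking the address requires value1 to exist somewhere in memory.
const int* address_of_value1() { return &value1; }

// const int* p = &value2;   // does not compile: an enumerator is not an lvalue

int sum_of_values() { return value1 + value2; }
```

Whether the storage for value1 survives in the final binary still depends on the compiler and linker; the sketch only shows why the enumerator can never need any.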
Your actual question; why enum/const is the same in D as in C++; seems to be unanswered. Sadly there exists no good reason for this choice whatsoever. I believe that this was just an unintentional side effect in C++ that became a de facto pattern. In D the same pattern was needed, and Walter Bright decided that it should be done as in C++ such that those coming from that place would recognize what to do ... In fact, before this rather IMHO silly decision, the keyword manifest was used instead of enum for this usecase.
I think a good compiler/linker should still remove the constant. It's just that with the enum, it's actually guaranteed in the spec. The difference is primarily a matter of semantics. (Also keep in mind that 2.0 isn't complete yet)
The real purpose of enum being expanded syntactically to support single manifest constants, from what I understand, is that Don Clugston, a D template guru, was doing some crazy stuff with templates. He kept running into long build times, ridiculous compiler memory usage, etc. because the compiler kept creating internal data structures for const variables. One key thing about const/immutable variables compared to enums is that const/immutable variables are lvalues and can have their address taken. This means there is some extra overhead for the compiler. This usually doesn't matter, but when you're executing really complicated compile-time metaprograms, even if const variables are optimized away, this is still significant overhead at compile time.
It sounds like the enum value will be used "inline" in expressions, whereas the const will actually take storage and any expression referencing it will load the value from that storage.
This sounds similar to the difference between const and readonly in C#. The former is a compile-time constant and the latter is a run-time constant. This definitely affects versioning of assemblies (since assemblies referencing a const receive a copy of the value at compile time and would not see a change if the referenced assembly was rebuilt with a different value).

Is it good practice to create once-used variables?

A colleague of mine refactored this code:
private void btnGeneral_Click(object sender, RoutedEventArgs e)
{
Button button = (Button)e.OriginalSource;
Type type = this.GetType();
Assembly assembly = type.Assembly;
string userControlFullName = String.Format("{0}.{1}", type.Namespace, button.Name);
UserControl userControl = (UserControl)assembly.CreateInstance(userControlFullName);
}
to this code:
private void btnGeneral_Click(object sender, RoutedEventArgs e)
{
Button button = (Button)e.OriginalSource;
Type type = this.GetType();
Assembly assembly = type.Assembly;
UserControl userControl = (UserControl)assembly.CreateInstance(String.Format("{0}.{1}", type.Namespace, button.Name));
}
saying that you don't need to create a variable if it is only going to be used once.
My response was that making once-used variables is good practice since it:
functions as and reduces comments (it is clear what "userControlFullName" is)
makes code easier to read, i.e. more of your code "reads like English"
avoids super-long statements by replacing parts of them with clear variable names
easier to debug since you can mouse over the variable name, and in the cases of e.g. PHP programming without debuggers, it is easier to echo out these variable names to get their values
The arguments against this approach ("more lines of code", "unnecessary variables") are arguments to make life easier for the compiler, but they bring no significant speed or resource savings.
Can anyone think of any situations in which one should not create once-used variable names?
I'm with your opinion in this case. Readability is key. I'm sure that the compiler produces the same executable in both cases, with the compilers as intelligent as they are today.
But I wouldn't claim "always use once-used variables" either. Example:
String name = "John";
person.setName(name);
is unnecessary, because
person.setName("John");
reads equally well - if not even better. But, of course, not all cases are as clear cut. "Readability" is a subjective term, after all.
All your reasons seem valid to me.
There are occasions where you effectively have to avoid using intermediate variables, where you need a single expression (e.g. for member variable initialization in Java/C#) but introducing an extra variable for clarity is absolutely fine where it's applicable. Obviously don't do it for every argument to every method, but in moderation it can help a lot.
The debugging argument is particularly strong - it's also really nice to be able to step over the lines which "prepare" the arguments to a method, and step straight into the method itself, having seen the arguments easily in the debugger.
Your colleague doesn't seem to be consistent.
The consistent solution looks like this:
private void btnGeneral_Click(object sender, RoutedEventArgs e)
{
UserControl userControl = (UserControl)this.GetType().Assembly.CreateInstance(String.Format("{0}.{1}", this.GetType().Namespace, ((Button)e.OriginalSource).Name));
}
I'm completely with you on this one.
I especially use this if a method takes a lot of booleans, ie
public void OpenDocument(string filename, bool asReadonly, bool copyLocal, bool somethingElse)
To me this is a lot more readable:
bool asReadonly = true;
bool copyLocal = false;
bool somethingElse = true;
OpenDocument("somefile.txt", asReadonly, copyLocal, somethingElse);
..than:
OpenDocument("somefile.txt", true, false, true);
Since the programming languages I use generally do not tell me what was null in an exception stacktrace I generally try to use variables so that no more than one item per line can be null. I actually find this to be the most significant limiter of how many statements I want to put on a single line.
If you get a NullPointerException in this statement from your production logs, you're really in trouble:
getCustomer().getOrders().iterator().next().getItems().iterator().next().getProduct().getName()
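A sketch of the one-item-per-line alternative, written in C++ with raw pointers standing in for nullable references. The types and field names are hypothetical and only loosely mirror the chain above.

```cpp
#include <string>

// Hypothetical object graph; any pointer may be null.
struct Product  { std::string name; };
struct Item     { const Product* product = nullptr; };
struct Order    { const Item* first_item = nullptr; };
struct Customer { const Order* first_order = nullptr; };

// One dereference per line: a failure identifies exactly which link was missing.
std::string product_name_or_reason(const Customer* customer) {
    if (customer == nullptr) return "no customer";
    const Order* order = customer->first_order;
    if (order == nullptr) return "no order";
    const Item* item = order->first_item;
    if (item == nullptr) return "no item";
    const Product* product = item->product;
    if (product == nullptr) return "no product";
    return product->name;
}
```

With each step on its own line, a stack trace line number (or here, the returned reason) is enough to tell which object was missing.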
Although I agree with your thoughts, adding an extra variable can introduce an extra concept in the method and this concept may not always be relevant to the overall goal of the method. So excessive adding of variables can also increase method complexity and reduce legibility. Note the usage of excessive here.
I guess in some cases where it could have an effect on performance. In particular in this example:
for (int i1 = 0; i1 < BIG_NR; i1++)
{
for (int i2 = 0; i2 < BIG_NR; i2++)
{
for (int i3 = 0; i3 < BIG_NR; i3++)
{
for (int i4 = 0; i4 < BIG_NR; i4++)
{
int amount = a + b;
someVar[i1][i2][i3][i4] = amount;
}
}
}
}
... the extra assignment might have too big an impact on performance.
But in general, your arguments are 100% correct.
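To check whether the extra assignment matters on a given compiler, the two variants can be reduced to something like the following sketch (a single loop instead of four, with made-up function names). I would expect an optimizing compiler to treat them the same, but that is an assumption to measure, not a guarantee.

```cpp
#include <cstddef>
#include <vector>

// Variant with the once-used temporary inside the loop.
std::vector<int> fill_with_temporary(int a, int b, std::size_t n) {
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        int amount = a + b;   // loop-invariant value held in a temporary
        out[i] = amount;
    }
    return out;
}

// Variant that assigns the expression directly.
std::vector<int> fill_directly(int a, int b, std::size_t n) {
    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a + b;
    }
    return out;
}
```

Timing both, or comparing their generated assembly, settles the question for your toolchain better than guessing does.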
Both versions are exactly the same. Of course, yours is more readable, maintainable and debuggable, but if saving memory was your colleague's point, his code does NOT consume any less memory.
I think it's a judgement call based on how tidy you want your code to be. I also think that both you and your colleague are correct.
In this instance I would side with your colleague based on the code you presented (for performance reasons); however, as I said before, it does depend on the context in which it will be used, and I think your position is perfectly acceptable.
I would point out that creating variables for once used parameters can be pointless, unless they are const variables or things that you need to use in many places.
I would argue that declaring once-used variables could possibly create more confusion when you are debugging if there are lots and lots of them, but one here and there is probably fine.
Creating a new variable means one more concept for the reader to keep track of. (Consider the extreme case: int b=a;c=b;) If a method is so complex - in such need of breaking up - that the extra concept is a price worth paying, then you ought to go the whole hog and split it into two methods. This way you get both the meaningful name and the smaller size. (Smaller for your original method: if it's like you say, then people won't typically need to read the auxiliary method.)
That's a generalisation, particularly in a language with a lot of boilerplate for adding new methods, but you're not going to disagree with the generalisation often enough to make it worth leaving out of your style guide.
I'm completely with your colleague in principle, but not in this case.
The problem with throwaway variables is that they introduce state, and more state means the code is harder to understand since you don't know what effects it could have on the program's flow. Pure functional programming avoids mutable variables altogether, exactly for this reason. So the fewer variables there are, the better. Eliminating variables is good.
However, in this particular case, where the method ends right after the variable's only use, the disadvantages of having it are minimal, and the advantages you mention are probably more substantial.
Trying hard to come up with an argument against introducing new variables I'd say that when you read the code (for the first time, at least), you don't know if the variable is being used more than once. So immediately you will let your eyes scan down through the code to see if it is used in more places. The longer the function the more will you have to look to see if you can find it.
That's the best argument against that I can come up with! :-)
That's how I used to code. Nowadays I try to minimize intermediate variables. The use of intermediate variables is perfectly fine if they're immutable.
I'm in agreement with the majority here, code readability is key.
It's a rare line count crusader that actually writes highly readable, highly maintainable code.
Additionally, it all gets compiled to MSIL anyway and the compiler will optimise a lot for you.
In the following example, the compiler will optimise the code anyway:
List<string> someStrings = new List<string>();
for (int i = 0; i < 1000; i++)
{
string localString = string.Format("prefix{0}", i);
someStrings.Add(localString);
}
Rather than:
List<string> someStrings = new List<string>();
string localString = string.Empty;
for (int i = 0; i < 1000; i++)
{
localString = string.Format("prefix{0}", i);
someStrings.Add(localString);
}
So there's really no performance reason not to go with it in many cases.
Agreed
"makes code easier to read, i.e. more of your code 'reads like English'"
I think this is the most important factor, as there is no difference in performance or functionality in most modern managed languages.
After all we all know code is harder to read than it is to write.
Karl