Are long variable names a waste of memory?

If I have a variable int d; // some comment,
will that be better than int daysElapsedSinceBlahBlahBlahBlah with no comment?
The longer name is more readable, but will it waste memory?

You marked this as language-agnostic, but the example corresponds to the C-language family. In C-like languages the name of a variable doesn't waste memory; it's just a label for the compiler. In the resulting binary code it is replaced by a memory address.
In general, there is no benefit to storing the names of variables inside the resulting binary; the only uses I can think of are extreme debugging, reverse engineering, or some weird form of reflection. None of these are normal use cases.

It depends entirely on the language and its implementation, but I can give you some examples based on a few languages that I know.
In C and C++, variable names are symbolic identifiers for the programmer and compiler. The compiler replaces them with memory addresses, CPU registers, or otherwise inlines their accesses to eliminate them entirely. The names do not appear in the generated code. They can appear in generated debug information, but you can omit that for release builds of a program when you don't need to do interactive step-debugging any more.
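A minimal C sketch of this (the function names and the compile commands in the comment are just illustrative): both functions compile to the same kind of code no matter how long the parameter name is, because the name exists only for the compiler's front end.
int elapsed_short(int d) {
    return d + 1;
}

int elapsed_long(int daysElapsedSinceLastLogin) {
    return daysElapsedSinceLastLogin + 1;
}

/* Typical ways to check (commands are illustrative):
     cc -O2 -S names.c      -- the parameter names appear nowhere in the assembly
     cc -O2 -g -c names.c   -- with -g, the names land only in the debug info */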
In Java, the compiler eliminates function-local variable names and rewrites the code to use stack-relative offsets. Field names (i.e., variables at class level) do persist in the compiled bytecode. This is required mainly because of the way classes are compiled individually and can be linked dynamically at run time, so the compiler cannot optimize the state of the entire program at once. The other reason field names are kept is because they are available through reflection. At run time, the virtual machine can generate native code which largely just uses memory addresses and native CPU registers in the manner of C & C++. Field names are kept in memory anyway, for reflection, and so that any additional loaded classes can be linked in. Mid-stage whole-program optimizers & obfuscators like ProGuard can make all symbolic names much shorter though.
In languages with an eval function, such as JavaScript and PHP, even local variable names have to be kept around, in theory, as they can all be accessed by name via run-time strings. A good interpreter can optimize this to use fast memory addresses in cases when it can prove that a particular variable isn't accessed by name.
In true line-by-line interpreters, like very old implementations of BASIC, variable names have to be kept around always, because the interpreter operates directly from the source code. It forgets about each line when it moves on to the next, so it has no way to track variables except by their names in the source. Since the full source code had to be kept in memory at run time, and it was often limited to 64 kB or less, variable names really did matter! Code from this era often used (and re-used) cryptically short names because of that (but also for other reasons, such as the way BASIC code was sometimes printed in magazines to be typed in, and these platforms did not have particularly nice keyboards or editors.)
Unless you're programming for an interpreter from the 1980s or earlier, identifier names are so cheap in any case that you should not worry about it. Choose names that are long enough to be understandable and short enough to be quick to read. Let the compiler worry about the rest, or run an optimizer or minifier on the code after it's written.

Variable names never take up memory. At least, not nearly enough to even start worrying about it. While some language implementations will store the variable names somewhere (sometimes the language even requires this), the space they take up is absolutely tiny compared to everything else that's flying around. Just use whatever is best by other metrics (readability, conventions, etc.).

Related

Is it better to use a boolean variable to replace an if condition for readability or not?

I am in the second year of my bachelor's degree in information technology. Last year, one of my courses taught me to write clean code so that other programmers have an easier time working with it. I learned a lot about writing clean code from a video ("Clean Code") on Pluralsight (a paid learning website that my school uses). There was an example in it about assigning if conditions to boolean variables and using them to enhance readability. In my course today, my teacher told me this is very bad code because it decreases performance (in bigger programs) due to the extra tests being executed. I am now wondering whether I should continue using boolean variables for readability or avoid them for performance. I will illustrate with an example (I am using Python code for this):
Example with a boolean variable:
Let's say we need to check whether somebody is legally old enough to drink alcohol. We get the person's age, and we know the legal drinking age is 21.
is_old_enough = persons_age >= legal_drinking_age
if is_old_enough:
    do_something()
My teacher told me today that this would be very bad for performance, since two tests are performed: first, persons_age >= legal_drinking_age is evaluated, and then the if performs another test on is_old_enough.
My teacher told me that I should just put the condition in the if, but in the video they said that code should be read like natural language to make it clear for other programmers. I was wondering now which would be the better coding practice.
Example with the condition in the if:
if persons_age >= legal_drinking_age:
    do_something()
In this example, only one test is performed: whether persons_age >= legal_drinking_age. According to my teacher, this is better code.
Thank you in advance!
yours faithfully
Jonas
I was wondering now which would be the better coding practice.
The really safe answer is: it depends.
I hate to use this answer, but you wouldn't be asking unless you had genuine doubt. (:
IMHO:
If the code will see long-term use, where maintainability is important, then clearly readable code is preferred.
If program speed is crucial, then any code that uses fewer resources (smaller data sizes/types, fewer loops to achieve the same thing, optimized task sequencing, more CPU work per clock cycle, fewer data re-loading cycles) is better. (Example keyword: space-for-time trade-off.)
If minimizing memory usage is crucial, then any code that uses less storage and memory to complete its operation (which may take more CPU cycles/loops for the same task) is better. (Example: small devices with limited data storage/RAM.)
If you are in a race, then you may want to write the shortest code possible (even if it may take slightly more CPU time later). Example: a hackathon.
If you are programming to teach a team of students/friends something, then readable code plus plenty of comments is definitely preferred.
If it were me, I'd stick to something as close to assembly language as possible (for maximum control over the bit manipulation) for backend development, and something as close as possible to Mathematica-like code (less code, maximum output, never mind how much CPU/memory it needs) for frontend development. ( :
So, if it's you, you may have your own requirements/preferences. From the point of view of users/outsiders/customers, it is just a working or non-working program. Your definition of a good program may differ from others', but that shouldn't stop us from being flexible in coding style and method.
Happy exploring. Hope it helps, in any way possible.
Performance
Performance is one of the least interesting concerns for this question, and I say this as one working in very performance-critical areas like image processing and raytracing who believes in effective micro-optimizations (but my ideas of effective micro-optimization would be things like improving memory access patterns and memory layouts for cache efficiency, not eliminating temporary variables out of fear that your compiler or interpreter might allocate additional registers and/or utilize additional instructions).
The reason it's not so interesting is that, as pointed out in the comments, any decent optimizing compiler is going to treat the two versions you wrote as equivalent by the time it finishes optimizing the intermediate representation and produces the final machine code through instruction selection and register allocation. And if you aren't using a decent optimizing compiler, then this sort of microscopic efficiency is probably the last thing you should be worrying about anyway.
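The question's example is Python, but the claim is easiest to check in a compiled language. A minimal C sketch (the function names are made up): with a reasonable compiler at -O2, both versions typically produce identical machine code, because the named boolean is optimized away.
#include <stdbool.h>

/* Version with a named boolean, mirroring the "readable" style. */
bool can_drink_named(int persons_age, int legal_drinking_age) {
    bool is_old_enough = persons_age >= legal_drinking_age;
    if (is_old_enough) {
        return true;
    }
    return false;
}

/* Version with the condition written directly in the if. */
bool can_drink_inline(int persons_age, int legal_drinking_age) {
    if (persons_age >= legal_drinking_age) {
        return true;
    }
    return false;
}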
Variable Scopes
Performance aside, the only concern I'd have with this convention (and I think it's generally a good one to apply liberally) is in languages that don't have a concept of a named constant to distinguish it from a variable.
In those cases, the more variables you introduce to a meaty function, the more intellectual overhead it can have as the number of variables with a relatively wide scope increases, and that can translate to practical burdens in maintenance and debugging in extreme cases. If you imagine a case like this:
some_variable = ...
...
some_other_variable = ...
...
yet_another_variable = ...
... (300 more lines of code in the function)
... in some function, and you're trying to debug it, then those variables, combined with the monstrous size of the function, start to multiply the difficulty of figuring out what went wrong. That's a practical concern I've encountered when debugging codebases spanning millions of lines of code written by all sorts of people (including those no longer on the team), where it's not much fun to look at the locals watch window in a debugger and see two pages' worth of variables in some monstrous function that appears to be doing something incorrectly (or in one of the functions it calls).
But that's only an issue when it's combined with questionable programming practices like writing functions that span hundreds or thousands of lines of code. In those cases, it will often improve everything just to focus on writing reasonably sized functions that perform one clear logical operation and have no more than one side effect (ideally none, if the function can be written as a pure function). If you design your functions reasonably, then I wouldn't worry about this at all; favor whatever is readable and easiest to comprehend at a glance, and maybe even what is most writable and "pliable" (to make future changes to the function easier if you anticipate the need).
A Pragmatic View on Variable Scopes
So I think a lot of programming concepts can be understood to some degree just by understanding the need to narrow variable scopes. People say avoid global variables like the plague. We can go into issues with how that shared state can interfere with multithreading and how it makes programs difficult to change and debug, but you can understand a lot of the problems just through the desire to narrow variable scopes. If you have a codebase which spans a hundred thousand lines of code, then a global variable is going to have the scope of a hundred thousand lines of code for both access and modification, and crudely speaking, a hundred thousand ways to go wrong.
At the same time, that pragmatic sort of view will find it pointless for a one-shot program of only 100 lines, with no future need for extension, to avoid global variables like the plague, since a global there only has 100 lines' worth of scope, so to speak. Meanwhile, even someone who avoids globals like the plague in all contexts might still write a class with member variables (including some superfluous ones for "convenience") whose implementation spans 8,000 lines of code, at which point those variables have a much wider scope than even the global variable in the former example. That realization could drive someone to design smaller classes and/or reduce the number of superfluous member variables included in the state management of the class (which can also translate to simplified multithreading and all the similar benefits of avoiding global variables in a non-trivial codebase).
And finally it'll tend to tempt you to write smaller functions as well, since a variable towards the top of some function spanning 500 lines of code is going to also have a fairly wide scope. So anyway, my only concern when you do this is to not let the scope of those temporary, local variables get too wide. And if they do, then the general answer is not necessarily to avoid those variables but to narrow their scope.
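A minimal C sketch of what "narrow their scope" can mean in practice (the variable names are made up): the temporary lives only inside a block, so it stops cluttering the rest of the function and the debugger's locals view.
#include <stdio.h>

int main(void) {
    double total = 0.0;

    /* The intermediate value exists only inside this block. */
    {
        double subtotal = 1.5 + 2.5;
        total += subtotal;
    }

    /* 'subtotal' is out of scope here; only 'total' remains visible. */
    printf("total = %f\n", total);
    return 0;
}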

Reassign an interface or allow GC to do its work on temporary variables

I'm very new to Go and am currently porting a PHP program.
I understand that Go is not a dynamically-typed language and I like that about it. It seems very structured and easy to keep track of everything.
But I've been coming across situations that seem to be a little ... ugly. Is there a better way of performing this sort of process:
plyr := builder.matchDetails.plyr[i]
plyrDetails := strings.Split(plyr, ",")
details := map[string]interface{}{
    "position": plyrDetails[0], "id": plyrDetails[1],
    "xStart": plyrDetails[2], "zStart": plyrDetails[3],
}
EDIT:
Is there a better way to achieve a map containing the strings from plyr than to create two additional variables, to be destroyed straight afterwards? Or is this the correct way?
tl;dr:
If possible, choose a different format and let a library do the string parsing/generation for you
Use structs rather than maps for anything you use a few times, for more compiler checks
The common way of using encoding/json accomplishes both of those.
Meanwhile, don't sweat perf too much because you'll probably vastly improve the old app's speed regardless; there's no indication speed of parsing or GC is a problem yet; and the syntactical differences you mentioned in the first rev. of the post don't necessarily actually relate to GC.
So, I understand you may be porting piece-for-piece, and that may limit what you can change now.
But if/when you can change things, a really clean solution would be to use the encoding/json package and a struct: the json package will parse input/generate output from structs without any manual string manipulation on your part, and using a struct gives you compile-time checking rather than only the runtime checking you get with a map. Lots of Go apps (and others) use JSON to expose their services.
An intermediate step could be to introduce struct types for any internal structure you use at least a few times, rather than maps, so even without updating the parsing, at least the internals of the app get the benefits of compile-time checking. structs are also what things like the gorm object/relational mapper expect to deal with. They happen to use less memory than maps, and be quicker (and more concise syntactically) to access, but those aren't even necessarily the most important considerations here.
On the performance of what you have now, and particularly whether different syntax would make it faster: don't sweat that, for a bunch of reasons: the port's likely to be faster than the PHP was whatever you do; we don't yet have any indication that parsing or GC is actually slow or your bottleneck; and the syntactical differences you talked about in the first revision of your question may not relate to GC much or at all. More/fewer var names in your code may not correspond to more/fewer heap allocations, 'cause often Go can allocate on the stack, briefly discussed under 'escape analysis' in Dave Cheney's Gocon Tokyo slides. And as peterSO said, we seem to be looking at allocations of smallish references, not, say, copying all of the string bytes from the request each time.
Go is NOT PHP. Write Go programs in Go. Write PHP programs in PHP.
Interface values are represented as a two-word pair giving a pointer to information about the type stored in the interface and a pointer to the associated data. (Go Data Structures: Interfaces)
Reusing Go interface variables to "increase performance" makes no sense.

Is it bad coding style to overwrite variable names?

When writing quick, get-er-done scripts, I often overwrite variable names once I don't need them anymore. Is this bad? It saves memory, in the same way recycling a soda can saves the planet.
How does it save memory to overwrite a variable, as opposed to disposing of it properly and declaring a new one? Do you have any experience with the mechanics of assembly? I believe the only thing you save by doing it your way is typing. Code readability suffers when you don't choose unique variable names as you get further down the script.
Take, for example, a book that reuses page 1 all the way through what would otherwise be a 300-page book. How easy would it be to recall your position?
In reference to your comparison of reusing variables to recycling: variables are regions of memory that, upon destruction, dissipate completely as though they never existed, whereas plastics and paper will always leave some sort of footprint, even if only in the form of a physical shift, such as a solid to a gas.
I would recommend declaring only as many variables as needed to get the job done, but as many as it takes to keep the code readable, unless there is some alternative motivation.
I look at variables like plates at the buffet: once you're done, get a new one.
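To the point about assembly mechanics: in a compiled language, reusing a name doesn't even save storage, because the compiler already reuses registers and stack slots for variables whose lifetimes don't overlap. A minimal C sketch (the values are made up), assuming an optimizing build:
#include <stdio.h>

int main(void) {
    int width = 4, height = 3;

    /* 'area' is dead after the first printf, so an optimizing compiler is free
       to keep 'perimeter' in the same register or stack slot anyway; giving it
       a distinct name costs nothing at run time. */
    int area = width * height;
    printf("area: %d\n", area);

    int perimeter = 2 * (width + height);
    printf("perimeter: %d\n", perimeter);
    return 0;
}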

Determining most register hungry part of kernel

When I get a kernel that uses too many registers, there are basically 3 things I can do:
leave the kernel as it is, which results in low occupancy
set the compiler to use a lower number of registers, spilling them to memory, which causes worse performance
rewrite the kernel
For option 3, I'd like to know which part of the kernel needs the most registers. Is there any tool or technique that would allow me to identify this part? Reading through the PTX code (I develop on NVidia) is not helpful: the registers have various high numbers, and to be honest, the best I can do is identify which part of the assembly code maps to which part of the C code.
Just commenting out some code is not much of a way to go. For example, I noticed that if I just put the code into a loop, the number of registers rises dramatically, not only by one for the loop control variable. I personally suspect the NVidia compiler of imperfect variable liveness analysis, but of course I cannot do much about that :-)
If you're running on NVidia hardware, you can pass the -cl-nv-verbose compile option to clBuildProgram, then call clGetProgramInfo with CL_PROGRAM_BINARIES to get human-readable text about the compile. In there it will say the number of registers it uses. Note that NVidia caches compiles and only produces that register info when the kernel source actually changes, so you may want to inject some superfluous change into the source code to force it to do the full analysis.
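A rough sketch of the host-side calls involved (error checking omitted, a single device assumed, and exactly where the register counts show up may vary by driver version):
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

/* Build with NVidia's verbose flag, then dump the "binary" (PTX-like text). */
void dump_nv_build_output(cl_program program, cl_device_id device) {
    clBuildProgram(program, 1, &device, "-cl-nv-verbose", NULL, NULL);

    size_t size = 0;
    clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, NULL);

    unsigned char *binary = malloc(size);
    unsigned char *binaries[1] = { binary };
    clGetProgramInfo(program, CL_PROGRAM_BINARIES, sizeof(binaries), binaries, NULL);

    fwrite(binary, 1, size, stdout);   /* search the output for register usage */
    free(binary);
}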
If you're running on AMD hardware, just set the environment variable GPU_DUMP_DEVICE_KERNEL=1. It will produce a text file of the IL during the compile. Not sure it explicitly says the number of registers used, but it's what is equivalent to the NVidia technique above.
When looking at that output (at least on nvidia), you'll find that it seems to use an infinite number of registers (if you go by the register numbers). In reality, it does a flow analysis and actually reuses registers in a way that is not at all obvious when looking at the IL.
This is a tough question in any language, and there's probably not one correct answer. But here are some things to think about:
Look for the code in the "deepest" scope you can find, keeping in mind that most functions are probably inlined by your OpenCL compiler. Count the variables used in this scope, and walk up the containing scopes. In each containing scope, count variables that are used both before and after the inner scope. These are potentially live while the inner scope executes. This process could help you account for the live registers at a particular part of the program.
Try to take variables that "span" deep scopes and make them not span the scope if possible. For example, if you have something like this:
int i = func1();
int j = func2(); // perhaps lots of live registers here
int k = func3(i,j);
you could try to reorder the first two lines if func2 has lots of live registers. That would remove i from the set of live registers while func2 is running. This is a trivial pattern, of course, but hopefully it's illustrative.
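A sketch of that reordering, using the same hypothetical func1/func2/func3 and assuming func2 does not depend on i:
int j = func2(); // i is no longer live across this call
int i = func1();
int k = func3(i, j);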
Think about getting rid of variables that just keep around the results of simple computations. You might be able to recompute these when you need them. For example, if you have something like int i = get_local_id(0) you might be able to just use get_local_id(0) wherever you would have used i.
Think about getting rid of variables that are keeping around values stored in memory.
Without good tools for this kind of thing, it ends up being more art than science. But hopefully some of this is helpful.

Should primitive datatypes be capitalized?

If you were to invent a new language, do you think primitive datatypes should be capitalized, like Int, Float, Double, String to be consistent with standard class naming conventions? Why or why not?
By "primitive" I don't mean that they can't be (or behave like) objects. I guess I should have said "basic" datatypes.
If I were to invent a new language, it wouldn't have primitive data types, just wrapper objects. I've done enough wrapper-to-primitive-to-wrapper conversions in Java to last me the rest of my life.
As for capitalization? I'd go with case-sensitive first letter capitalized, partly because it's a convention that's ingrained in my brain, and partly to convey the fact that hey, these are objects too.
Case insensitivity leads to some crazy internationalization stuff; think umlauts, tildes, etc. It makes the compiler harder to write and gives the programmer freedoms that don't result in better code. Seriously, you think there are enough arguments over where to put braces in C... just watch.
As far as primitives looking like classes... only if you can subclass primitives. Don't assume everyone capitalizes class names; the C++ standard libraries do not.
Personally, I'd like a language that has, for example, two integer types:
int: Whatever integer type is fastest on the platform, and
int(bits): An integer with the given number of bits.
You can typedef whatever you need from that. Then maybe I could get a fixed(w,f) type (number of bits to left and right of decimal, respectively) and a float(m,e). And uint and ufixed for unsigned. (Anyone who wants an unsigned float can beg.) And standardize how bit fields are packed into structures. If the compiler can't handle a particular number of bits, it should say so and abort.
Why, yes, I program embedded systems and got sick of int and long changing size every couple years, how could you tell? ^_-
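For what it's worth, C99's <stdint.h> already covers the integer half of this wish list reasonably well (the fixed(w,f) and float(m,e) types above remain wishful thinking):
#include <stdint.h>

int_fast32_t counter;   /* "whatever integer type is fastest", at least 32 bits */
int16_t      sample;    /* exactly 16 bits, or the typedef simply doesn't exist */
uint8_t      flags;     /* unsigned, exactly 8 bits */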
(Warning: MASSIVE post. If you want my final answer to this question, skip to the bottom section, where I answer it. If you do, and you think I'm spouting a load of bull, please read the rest before trying to argue with my "bull.")
If I were to make a programming language, here are a few caveats:
The type system would be more or less Perl 6 (but I totally came up with the idea first :P) - dynamically and weakly typed, with a stronger (I'm thinking Haskellian) type system that can be imposed on top of it.
There would be a minimal number of language keywords. Everything else would be reassignable first-class objects (types, functions, so on).
It will be a very high level language, like Perl / Python / Ruby / Haskell / Lisp / whatever is fashionable today. It will probably be interpreted, but I won't rule out compilation.
If any of those (rather important) design decisions don't apply to your ideal language (and they may very well not), then my following (apparently controversial) decision won't work for you. If you're not me, it may not work for you either. I think it fits my language, because it's my language. You should think about your language and how you want your language to be so that you, like Dennis Ritchie or Guido van Rossum or Larry Wall, can grow up to make bad design decisions and defend them in retrospect with good arguments.
Now then, I would still maintain that, in my language, identifiers would be case insensitive, and this would include variables, functions (which would be variables), types (which would also be variables, both built-in/primitive (which would be subclass-able) and user-defined), you name it.
To address issues as they come:
Naming consistency is the best argument I've seen, but I disagree. First off, allowing two different types called int and Int is ridiculous. The fact that Java has int and Integer is almost as ridiculous as the fact that neither of them allows arbitrary-precision arithmetic. (Disclaimer: I've become a big fan of the word "ridiculous" lately.)
Normally I would be a fan of allowing people to shoot themselves in the foot with things like two different objects called int and Int if they want to, but here it's an issue of laziness, and of the old multiple-word-variable-name argument.
My personal take on the issue of underscore_case vs. MixedCase vs. camelCase is that they're all ugly and less readable, and if at all possible you should only use a single word. In an ideal world, all code would be stored in your source control in an agreed-upon format (the style that most of the team uses), and the team's dissenters would have hooks in their VCS to convert all checked-out code from that style to their style, and vice versa for checking back in, but we don't live in that world.
It bothers me for some reason when I have to continually write MixedCaseVariableOrClassNames a lot more than it bothers me to write underscore_separated_variable_or_class_names. Even TimeOfDay and time_of_day might be the same identifier because they're conceptually the same thing, but I'm a bit hesitant to make that leap, if only because it's an unusual rule (internal underscores are removed in variable names). On one hand, it could end the debate between the two styles, but on the other hand it could just annoy people.
So my final decision is based on two parts, which are both highly subjective:
If I make a name others must use that's likely to be exported to another namespace, I'll probably name it as simply and clearly as I can. I usually won't use many words, and I'll use as much lowercase as I can get away with. sizedint doesn't strike me as much better or worse than sized_int or SizedInt (which, as far as examples of camelCase go, looks particularly bad because of the dI IMHO), so I'd go with that. If you like camelCase (and many people do), you can use it. If you like underscores, you're out of luck, but if you really need to you can write sized_int = sizedint and go on with life.
If someone else wrote it, and wanted to use sized_int, I can live with that. If they wrote it and used SizedInt, I don't have to stick with their annoying-to-type camelCase and, in my code, can freely write it as sizedint.
Saying that consistency helps us remember what things mean is silly. Do you speak english or English? Both, because they're the same word, and you recognize them as the same word. I think e.e. cummings was on to something, and we probably shouldn't have different cases at all, but I can't exactly rewrite most human and computer languages out there on a whim. All I can do is say, "Why are you making such a fuss about case when it says the same thing either way?" and implement this attitude in my own language.
The throwaway-variables-in-functions argument (i.e. Person person = /* something */) is a pretty good one, but I disagree that people would do Person thePerson (or Person aPerson). I personally tend to just do Person p anyway.
I'm not much fond of capitalizing type names (or much of anything) in the first place, and if it's enough of a throwaway variable to declare it undescriptively as Person person, then you won't lose much information with Person p. And anyone who says "non-descriptive one-letter variable names are bad" shouldn't be using non-descriptive many-letter variable names either, like Person person.
Variables should follow sane scoping rules (like C and Perl, unlike Python - flame war starts here guys!), so conflicts in simple names used locally (like p) should never arise.
As to making the implementation barf if you use two variables with the same names differing only in case, that's a good idea, but no. If someone makes library X that defines the type XMLparser and someone else makes library Y that defines the type XMLParser, and I want to write an abstraction layer that provides the same interface for many XML parsers including the two types, I'm pretty boned. Even with namespaces, this still becomes prohibitively annoying to pull off.
Internationalization issues have been brought up. Distinguishing between capital and lowercase umlautted U's will be no easier in my interpreter/compiler (probably the former) than in my source code.
If a language has a string type (i.e. the language isn't C) and the string type supports Unicode (i.e. the language isn't Ruby - it's only a joke, don't crucify me), then the language already provides a way to convert Unicode strings to and from lowercase, like Perl's lc() function (sometimes) and Python's unicode.lower() method. This function must be built into the language somewhere and can handle Unicode.
Calling this function during an interpreter's compile-time rather than its runtime is simple. For a compiler it's only marginally harder, because you'll still have to implement this kind of functionality anyway, so including it in the compiler is no harder than including it in the runtime library. If you're writing the compiler in the language itself (and you should be), and the functionality is built into the language, you'll have no problems.
To answer your question, no. I don't think we should be capitalizing anything, period. It's annoying to type (to me) and allowing case differences creates (or allows) unnecessary confusion between capitalized and lowercased things, or camelCased and under_scored things, or other sets of semantically-distinct-but-conceptually-identical things. If the distinction is entirely semantic, let's not bother with it at all.