Occasionally I will store the state of some system as an integer. I often find myself using small values for these states (say 1-10), since the system is relatively simple.
In general, what's the best declaration for a variable which stores small positive integers - where best is defined as fastest read/write time & smallest memory consumption? Small is here defined as 1-10, although a complete list of integer storing methods and their ranges would be useful.
Originally I used Integer as on the face of it, it uses less memory. But I have since learned that that is not the case, as it is silently converted to Long
I then used Long for the above reason, and in the knowledge that it uses less memory than Double
I have since discovered Byte and switched to that, since it stores smaller integers (0-255 or 256, I never remember which), and I guess uses less memory from it's minute name. But I don't really trust VBA and wonder if there's any internal type conversions done here too.
Boolean I thought was only 0 or 1, but I've read that any non-zero number is converted to True, does this mean it can also store numbers?
Originally I used Integer as on the face of it, it uses less memory. But I have since learned that that is not the case, as it is silently converted to Long
That's right there is no advantage in using Integer over Long because of that conversion, but Integer might be necessary when communicating with old 16 bit APIs.
Also read "Why Use Integer Instead of Long?"
I then used Long for the above reason, and in the knowledge that it uses less memory than Double
You would not decide between Long or Double because one uses less memory. You decide between them because …
you need floating point numbers (Double)
or you don't accept floating point numbers. (Long)
Deciding on memory usage in this specific case is just a very bad idea because these types are fundamentally different.
I have since discovered Byte and switched to that, since it stores smaller integers (0-255 or 256, I never remember which), and I guess uses less memory from it's minute name. But I don't really trust VBA and wonder if there's any internal type conversions done here too.
I don't see any case where you use Office/Excel and run into any memory issues by using Long instead of Byte to iterate from 1 to 10. If you need to limit it to 255 (some old APIs, whatever) then you might use Byte. If there is no need for that I would use Long just to be flexible and not run into any coding issues because you need to remember which counters are only Byte and which are Long.
E.g. If I use i for iterating I would expect Long. I see no advantage in using Byte for that case.
Stay as simple as possible. Don't do strange things one would not expect only because you can. Avoiding future coding issues is worth more than one (or three) byte of memory usage. Sometimes it is worthier to write good human readable and maintainable code than faster code especially if you can't notice the differences (which you really can't in this case). Bad readable code always results in errors or vulnerabilities sooner or later.
Boolean I thought was only 0 or 1, but I've read that any non-zero number is converted to True, does this mean it can also store numbers?
No that's wrong. Boolean is -1 for True and 0 for False. But note that if you cast e.g. a Long into Boolean which is not 0 then it will automatically cast and result in True.
But Boolean in VBA is clearly defined as:
0 = False
-1 = True
The smallest chunk of memory that can be addressed is a byte (8 bits).
I cannot guarantee that VBA Bytes are stored as bytes in all cases, but using this type you are on the safest side.
By the way, the largest byte value is 11111111b, i.e 255d. The value 256d is 100000000b which requires 9 bits.
Also note that using Bytes every possible time might be unproductive as it can have a cost in terms of running time, if numerical conversions are required, while the spared memory space may be insignificant.
Except for very special applications, this kind of micro-optimization is of no use.
Related
I have been working with kotlin for little over 2 years now.
Looking over what I learned in these 2 years, I noticed that I have been using(num.toDouble()).toLong() for kotlin.math functions a bit too much. For example, Math.sqrt(num.toDouble()).toLong(). Two of my projects have extension function sumByLong() inside util created by team, because kotlin libs only have sumBy:Int and sumByDouble:Double and a lot of work in the project, uses Long.
In short, Mathematical operations using Long is more common than using Double or Float, yet Long has a very small footprint in kotlin standard library. And since kotlin.math is different than java.lang.Math, mixed usage is not a recommended practice.
Going over docs of kotlin.math, all functions except for abs, min, max, only have implementation for Float and Double only.
Can someone Explain like I am 5 the possible reasoning behind this. Something real, not silly stuff like devs were lazy, or more code means more work, which is all I could find in search engine results.
--
Update: Some Clarification
1. I can understand that in most cases, return types will contain floating point numbers. I am also talking about parameters lacking long counterpart. Maybe using Math.sqrt wasn't the best example, something like math.log, math.cos, etc would be better example, where floating return type us expected, but parameters doesn't even support Int
2. When I said "Long is more common than using Double", I was not talking about public at large, but was looking over my past two years working with kotlin. I am sorry if my phrasing wasn't clear.
Disclaimer: this answer may be a little opinionated, but I believe it is according to general consensus and best practices of using maths in computer science.
Mathematics for integers and for real numbers (floats) are really two much different math "sub-worlds". They're pretty separate, they have different uses and we usually don't mix them.
If we work on some physics, we do real-world simulations, we operate on units like temperature or speed, we use doubles. If we have identifiers (bank account number), we count something (number of bank accounts) or we operate on a discrete values with 100% precision (bank account value) we always use integers and never doubles.
Operations like sinus, square root or logarithm make perfect sense for physics, but not really for bank account values. They very often produce either very small or very large numbers that can't be safely represented as integers. They operate on approximations and don't really provide 100% precise results. They are continuous by nature while integers are discrete.
What is the point of using integers with sqrt() or log() if they almost always return a floating point result? What is the point of passing an integer to sin() if e.g. there are only 2 distinct angles smaller than square angle that can be represented as an integer: 0 and 1? Using integers with these functions is unnatural and impractical.
I can't think of a case where we have to often convert between longs and doubles. Usually, we operate either on longs or on doubles top to bottom and we don't convert between them too often. By converting we lose advantages of these specific "math sub-worlds", we sum their disadvantages. Maybe you should just keep using doubles in your application and don't convert to/from longs? Why do you use longs?
BTW, you mentioned that you can't/shouldn't use java.lang.Math in the Kotlin application. Well, if you look into java.lang.Math you will notice that... it supports only doubles :-)
In the case of ceil, it returns a Double because a Double has a bigger range of values than Long. Consider, for example:
ceil(Long.MAX_VALUE.toDouble() * 1000)
What would you expect it to return if it returned a Long? For further discussion, see Why does Math.ceil return a double?
In the case of log and trigonometric functions, the use cases requiring Long parameters are rare and the requirements varied. For example, should it round up, down, or to the nearest integral value? These are decisions that should be made for your particular project, and therefore can't be made in the stdlib.
In your project, you can simply define your required functions in a single, small source file, making your project's choice of rounding method, and then use it everywhere instead of converting at each call site, e.g.:
fun cos(n: Long): Long = cos(x.toDouble()).roundToLong()
I'm learning 8051, and find it's hard to understand byte addressable and bit addressable.
A type of hardware architecture that supports unique access to individual bytes of data.
For example, let us assume a number 0x1234 (0001001000110100). When storing the numbers on a system which is byte addressable, the first byte of the data (00010010) gets a unique address to the second byte (00110100), i.e each byte aligned in the memory will be uniquely addressable. You could manipulate the content only in chunks of 8bits.
However in case of micro-controller registers were data is stored, if you could manipulate its content bit by bit it’s called bit addressable.
They are not really using the terms right, byte addressable is what we are used to an address represents a unique byte in memory or the memory space. Bit addressable would mean that each bit in the memory space has a unique address, which is not the case. they are just showing you how to make some macros/variables that can access individual bits, is not an 8051 thing, but a generic programming thing and specifically implemented in C using variable types or keywords (or just macros) for their compiler.
What they are telling you is they have this sbit declaration which unless it is just a macro is clearly not a C standard declaration. But you can do the same things without. it is just bit manipulation that they are doing for you. Normally to set bit 5 you would do something like
variable |= (1<<5);
to clear bit 5
variable&=~(1<<5);
and you can certainly make macros from that to make it more generic. What they have done for this compiler is allow you to declare a variable that is a single bit in some other variable and then that bit sized variable you can set to a one or zero.
Why does the Java API use int, when short or even byte would be sufficient?
Example: The DAY_OF_WEEK field in class Calendar uses int.
If the difference is too minimal, then why do those datatypes (short, int) exist at all?
Some of the reasons have already been pointed out. For example, the fact that "...(Almost) All operations on byte, short will promote these primitives to int". However, the obvious next question would be: WHY are these types promoted to int?
So to go one level deeper: The answer may simply be related to the Java Virtual Machine Instruction Set. As summarized in the Table in the Java Virtual Machine Specification, all integral arithmetic operations, like adding, dividing and others, are only available for the type int and the type long, and not for the smaller types.
(An aside: The smaller types (byte and short) are basically only intended for arrays. An array like new byte[1000] will take 1000 bytes, and an array like new int[1000] will take 4000 bytes)
Now, of course, one could say that "...the obvious next question would be: WHY are these instructions only offered for int (and long)?".
One reason is mentioned in the JVM Spec mentioned above:
If each typed instruction supported all of the Java Virtual Machine's run-time data types, there would be more instructions than could be represented in a byte
Additionally, the Java Virtual Machine can be considered as an abstraction of a real processor. And introducing dedicated Arithmetic Logic Unit for smaller types would not be worth the effort: It would need additional transistors, but it still could only execute one addition in one clock cycle. The dominant architecture when the JVM was designed was 32bits, just right for a 32bit int. (The operations that involve a 64bit long value are implemented as a special case).
(Note: The last paragraph is a bit oversimplified, considering possible vectorization etc., but should give the basic idea without diving too deep into processor design topics)
EDIT: A short addendum, focussing on the example from the question, but in an more general sense: One could also ask whether it would not be beneficial to store fields using the smaller types. For example, one might think that memory could be saved by storing Calendar.DAY_OF_WEEK as a byte. But here, the Java Class File Format comes into play: All the Fields in a Class File occupy at least one "slot", which has the size of one int (32 bits). (The "wide" fields, double and long, occupy two slots). So explicitly declaring a field as short or byte would not save any memory either.
(Almost) All operations on byte, short will promote them to int, for example, you cannot write:
short x = 1;
short y = 2;
short z = x + y; //error
Arithmetics are easier and straightforward when using int, no need to cast.
In terms of space, it makes a very little difference. byte and short would complicate things, I don't think this micro optimization worth it since we are talking about a fixed amount of variables.
byte is relevant and useful when you program for embedded devices or dealing with files/networks. Also these primitives are limited, what if the calculations might exceed their limits in the future? Try to think about an extension for Calendar class that might evolve bigger numbers.
Also note that in a 64-bit processors, locals will be saved in registers and won't use any resources, so using int, short and other primitives won't make any difference at all. Moreover, many Java implementations align variables* (and objects).
* byte and short occupy the same space as int if they are local variables, class variables or even instance variables. Why? Because in (most) computer systems, variables addresses are aligned, so for example if you use a single byte, you'll actually end up with two bytes - one for the variable itself and another for the padding.
On the other hand, in arrays, byte take 1 byte, short take 2 bytes and int take four bytes, because in arrays only the start and maybe the end of it has to be aligned. This will make a difference in case you want to use, for example, System.arraycopy(), then you'll really note a performance difference.
Because arithmetic operations are easier when using integers compared to shorts. Assume that the constants were indeed modeled by short values. Then you would have to use the API in this manner:
short month = Calendar.JUNE;
month = month + (short) 1; // is july
Notice the explicit casting. Short values are implicitly promoted to int values when they are used in arithmetic operations. (On the operand stack, shorts are even expressed as ints.) This would be quite cumbersome to use which is why int values are often preferred for constants.
Compared to that, the gain in storage efficiency is minimal because there only exists a fixed number of such constants. We are talking about 40 constants. Changing their storage from int to short would safe you 40 * 16 bit = 80 byte. See this answer for further reference.
The design complexity of a virtual machine is a function of how many kinds of operations it can perform. It's easier to having four implementations of an instruction like "multiply"--one each for 32-bit integer, 64-bit integer, 32-bit floating-point, and 64-bit floating-point--than to have, in addition to the above, versions for the smaller numerical types as well. A more interesting design question is why there should be four types, rather than fewer (performing all integer computations with 64-bit integers and/or doing all floating-point computations with 64-bit floating-point values). The reason for using 32-bit integers is that Java was expected to run on many platforms where 32-bit types could be acted upon just as quickly as 16-bit or 8-bit types, but operations on 64-bit types would be noticeably slower. Even on platforms where 16-bit types would be faster to work with, the extra cost of working with 32-bit quantities would be offset by the simplicity afforded by only having 32-bit types.
As for performing floating-point computations on 32-bit values, the advantages are a bit less clear. There are some platforms where a computation like float a=b+c+d; could be performed most quickly by converting all operands to a higher-precision type, adding them, and then converting the result back to a 32-bit floating-point number for storage. There are other platforms where it would be more efficient to perform all computations using 32-bit floating-point values. The creators of Java decided that all platforms should be required to do things the same way, and that they should favor the hardware platforms for which 32-bit floating-point computations are faster than longer ones, even though this severely degraded PC both the speed and precision of floating-point math on a typical PC, as well as on many machines without floating-point units. Note, btw, that depending upon the values of b, c, and d, using higher-precision intermediate computations when computing expressions like the aforementioned float a=b+c+d; will sometimes yield results which are significantly more accurate than would be achieved of all intermediate operands were computed at float precision, but will sometimes yield a value which is a tiny bit less accurate. In any case, Sun decided everything should be done the same way, and they opted for using minimal-precision float values.
Note that the primary advantages of smaller data types become apparent when large numbers of them are stored together in an array; even if there were no advantage to having individual variables of types smaller than 64-bits, it's worthwhile to have arrays which can store smaller values more compactly; having a local variable be a byte rather than an long saves seven bytes; having an array of 1,000,000 numbers hold each number as a byte rather than a long waves 7,000,000 bytes. Since each array type only needs to support a few operations (most notably read one item, store one item, copy a range of items within an array, or copy a range of items from one array to another), the added complexity of having more array types is not as severe as the complexity of having more types of directly-usable discrete numerical values.
If you used the philosophy where integral constants are stored in the smallest type that they fit in, then Java would have a serious problem: whenever programmers write code using integral constants, they have to pay careful attention to their code to check if the type of the constants matter, and if so look up the type in the documentation and/or do whatever type conversions are needed.
So now that we've outlined a serious problem, what benefits could you hope to achieve with that philosophy? I would be unsurprised if the only runtime-observable effect of that change would be what type you get when you look the constant up via reflection. (and, of course, whatever errors are introduced by lazy/unwitting programmers not correctly accounting for the types of the constants)
Weighing the pros and the cons is very easy: it's a bad philosophy.
Actually, there'd be a small advantage. If you have a
class MyTimeAndDayOfWeek {
byte dayOfWeek;
byte hour;
byte minute;
byte second;
}
then on a typical JVM it needs as much space as a class containing a single int. The memory consumption gets rounded to a next multiple of 8 or 16 bytes (IIRC, that's configurable), so the cases when there are real saving are rather rare.
This class would be slightly easier to use if the corresponding Calendar methods returned a byte. But there are no such Calendar methods, only get(int) which must returns an int because of other fields. Each operation on smaller types promotes to int, so you need a lot of casting.
Most probably, you'll either give up and switch to an int or write setters like
void setDayOfWeek(int dayOfWeek) {
this.dayOfWeek = checkedCastToByte(dayOfWeek);
}
Then the type of DAY_OF_WEEK doesn't matter, anyway.
Using variables smaller than the bus size of the CPU means more cycles are necessary. For example when updating a single byte in memory, a 64-bit CPU needs to read a whole 64-bit word, modify only the changed part, then write back the result.
Also, using a smaller data type requires overhead when the variable is stored in a register, since the behavior of the smaller data type to be accounted for explicitly. Since the whole register is used anyways, there is nothing to be gained by using a smaller data type for method parameters and local variables.
Nevertheless, these data types might be useful for representing data structures that require specific widths, such as network packets, or for saving space in large arrays, sacrificing speed.
I've got an algorithm using a single (positive integer) number as an input to produce an output. And I've got the reverse function which should do the exact opposite, going back from the output to the same integer number. This should be a unique one-to-one reversible mapping.
I've tested this for some integers, but I want to be 100% sure that it works for all of them, up to a known limit.
The problem is that if I just test every integer, it takes an unreasonably long time to run. If I use 64-bit integers, that's a lot of numbers to check if I want to check them all. On the other hand, if I only test every 10th or 100th number, I'm not going to be 100% sure at the end. There might be some awkward weird constellation in one of the 90% or 99% which I didn't test.
Are there any general ways to identify edge cases so that just those "interesting" or "risky" numbers are checked? Or should I just pick numbers at random? Or test in increasing increments?
Or to put the question another way, how can I approach this so that I gain 100% confidence that every case will be properly handled?
The approach for this is generally checking every step of the computation for potential flaws. Concerning integer math, that is overflows, underflows and rounding errors from division, basically that the mathematical result can't be represented accurately. In addition, all operations derived from this suffer similar problems.
The process of auditing then looks at single steps in turn. For example, if you want to allocate memory for N integers, you need N times the size of an integer in bytes and this multiplication can overflow. You now determine those values where the multiplication overflows and create according tests that exercise these. Note that for the example of allocating memory, proper handling typically means that the function does not allocate memory but fail.
The principle behind this is that you determine the ranges for every operation where the outcome is somehow different (like e.g. where it overflows) and then make sure via tests that both variants work. This reduces the number of tests from all possible input values to just those where you expect a significant difference.
I'm thinking more about how much system memory my programs will use nowadays. I'm currently doing A level Computing at college and I know that in most programs the difference will be negligible but I'm wondering if the following actually makes any difference, in any language.
Say I wanted to output "True" or "False" depending on whether a condition is true. Personally, I prefer to do something like this:
Dim result As String
If condition Then
Result = "True"
Else
Result = "False"
EndIf
Console.WriteLine(result)
However, I'm wondering if the following would consume less memory, etc.:
If condition Then
Console.WriteLine("True")
Else
Console.WriteLine("False")
EndIf
Obviously this is a very much simplified example and in most of my cases there is much more to be outputted, and I realise that in most commercial programs these kind of statements are rare, but hopefully you get the principle.
I'm focusing on VB.NET here because that is the language used for the course, but really I would be interested to know how this differs in different programming languages.
The main issue making if's fast or slow is predictability.
Modern CPU's (anything after 2000) use a mechanism called branch prediction.
Read the above link first, then read on below...
Which is faster?
The if statement constitutes a branch, because the CPU needs to decide whether to follow or skip the if part.
If it guesses the branch correctly the jump will execute in 0 or 1 cycle (1 nanosecond on a 1Ghz computer).
If it does not guess the branch correctly the jump will take 50 cycles (give or take) (1/200th of a microsecord).
Therefore to even feel these differences as a human, you'd need to execute the if statement many millions of times.
The two statements above are likely to execute in exactly the same amount of time, because:
assigning a value to a variable takes negligible time; on average less than a single cpu cycle on a multiscalar CPU*.
calling a function with a constant parameter requires the use of an invisible temporary variable; so in all likelihood code A compiles to almost the exact same object code as code B.
*) All current CPU's are multiscalar.
Which consumes less memory
As stated above, both versions need to put the boolean into a variable.
Version A uses an explicit one, declared by you; version B uses an implicit one declared by the compiler.
However version A is guaranteed to only have one call to the function WriteLine.
Whilst version B may (or may not) have two calls to the function WriteLine.
If the optimizer in the compiler is good, code B will be transformed into code A, if it's not it will remain with the redundant calls.
How bad is the waste
The call takes about 10 bytes for the assignment of the string (Unicode 2 bytes per char).
But so does the other version, so that's the same.
That leaves 5 bytes for a call. Plus maybe a few extra bytes to set up a stackframe.
So lets say due to your totally horrible coding you have now wasted 10 bytes.
Not much to worry about.
From a maintainability point of view
Computer code is written for humans, not machines.
So from that point of view code A is clearly superior.
Imagine not choosing between 2 options -true or false- but 20.
You only call the function once.
If you decide to change the WriteLine for another function you only have to change it in one place, not two or 20.
How to speed this up?
With 2 values it's pretty much impossible, but if you had 20 values you could use a lookup table.
Obviously that optimization is not worth it unless code gets executed many times.
If you need to know the precise amount of memory the instructions are going to take, you can use ildasm on your code, and see for yourself. However, the amount of memory consumed by your code is much less relevant today, when the memory is so cheap and abundant, and compilers are smart enough to see common patterns and reduce the amount of code that they generate.
A much greater concern is readability of your code: if a complex chain of conditions always leads to printing a conditionally set result, your first code block expresses this idea in a cleaner way than the second one does. Everything else being equal, you should prefer whatever form of code that you find the most readable, and let the compiler worry about optimization.
P.S. It goes without saying that Console.WriteLine(condition) would produce the same result, but that is of course not the point of your question.