I have a calculation that goes something like this:
Price = value * randomNumberBetween(decimalValueA, decimalValueB)
I was originally generating this using floats/doubles. However, after reading a bit more about Objective-C, I found it mentioned numerous times that you should use NSDecimalNumber when calculating currency.
The issue I have is that I use this 'price' variable in comparisons and things, for example:
if ((deposit / price) < 0.2)
    return price * 0.05;
Using NSDecimalNumber makes this a lot more difficult. As far as I'm aware, I should be converting any magic numbers (in this case 0.2 and 0.05) to NSDecimalNumber so that I can compare them and use functions such as NSDecimalMultiply.
Also, if I have a function that is something like:
return (minRandomPercentage + ((maxRandomPercentage - minRandomPercentage) * randomNumber));
it ends up becoming this ridiculous string of nested function calls like:
return [minRandomPercentage decimalNumberByAdding:[[maxRandomPercentage decimalNumberBySubtracting:minRandomPercentage] decimalNumberByMultiplyingBy:random]];
Is this seriously how Objective-C deals with decimals? Can anyone give me any clues on how to make this a lot less arduous? I can live with the nested function calls if I could do comparisons with the result and not have to be casting every magic number I have.
If you can't afford to deal with the rounding errors that can occur with the standard base-2 floating point types, you'll have to use NSDecimal or NSDecimalNumber. NSDecimal is a C struct, and Foundation provides a C interface for dealing with it. It provides functions NSDecimalAdd, NSDecimalMultiply, etc.
From the Number and Value Programming Guide: You might consider the C interface if you don’t need to treat decimal numbers as objects—that is, if you don’t need to store them in an object-oriented collection like an instance of NSArray or NSDictionary. You might also consider the C interface if you need maximum efficiency. The C interface is faster and uses less memory than the NSDecimalNumber class.
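For illustration, here is a minimal sketch of what that C interface looks like in practice (this would live inside a method; the variable names are invented for the example, while NSDecimalAdd, NSDecimalMultiply, NSDecimalCompare, NSDecimalString, and NSRoundPlain are the Foundation names):

// Build NSDecimal structs from string literals so no base-2 rounding creeps in.
NSDecimal price = [[NSDecimalNumber decimalNumberWithString:@"19.99"] decimalValue];
NSDecimal rate  = [[NSDecimalNumber decimalNumberWithString:@"0.05"] decimalValue];

// Each operation writes its result into a struct you supply.
NSDecimal fee, total;
NSDecimalMultiply(&fee, &price, &rate, NSRoundPlain);   // fee   = price * rate
NSDecimalAdd(&total, &price, &fee, NSRoundPlain);       // total = price + fee

// Comparison returns an NSComparisonResult, just like compare: does.
if (NSDecimalCompare(&total, &price) == NSOrderedDescending) {
    NSLog(@"total is %@", NSDecimalString(&total, nil));
}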
If you're writing object-oriented code, and you're not interacting with massive data sets, it might be best to stick with NSDecimalNumber. If you profile your code and find that using NSDecimalNumber is causing a high memory overhead, then you may need to consider alternatives.
If rounding errors are not a concern, you can also use native C scalars. See: How to add two NSNumber objects?
NSNumber and NSDecimalNumber are used as object wrappers when you need to pass a number to a method or store numbers in a collection. Since NSArray, NSSet, NSDictionary, etc. only allow you to store objects of type 'id', you can't store ints, floats, etc. natively.
If you're dealing with large data sets and can afford rounding errors, you can use ints, floats, doubles, etc. raw. Then when you have your result and you need to store it or pass it to another object, you can wrap it up in an NSNumber accordingly.
If you do have a need to store large collections of numbers, it's much more efficient to use C arrays than to initialize and store lots of NSNumber objects.
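As a small, hedged illustration of both points (the names and values here are invented for the example):

// Box a scalar so it can go into a collection, then unbox it again.
double price = 12.5;
NSArray *prices = @[@(price), @3.99];      // NSNumber wrappers
double first = [prices[0] doubleValue];    // back to a raw C scalar

// For a large data set, a plain C array stores values without one object per element.
double samples[100000];
samples[0] = first;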
Seriously, this is how you do base 10 arithmetic in iOS. As you're probably aware, many numbers that have exact representations in base 10 don't have exact representations in base 2, and that can lead to unacceptable rounding errors when working with base 10 systems like currency or metric measurements.
Values represented by NSDecimalNumber are objects, unlike built-in numeric types like int, float, and double. It seems odd at first to use methods for arithmetic operations, but it makes more sense when you start thinking about the values as objects.
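To make that concrete for the original question, here is one hedged way to write the deposit/price check with NSDecimalNumber throughout (a fragment of a method returning an NSDecimalNumber; deposit and price come from the question, and the constants are built once from strings so nothing needs to be "cast" at each use):

// Create the magic numbers once, from strings, so they stay exact.
NSDecimalNumber *twentyPercent = [NSDecimalNumber decimalNumberWithString:@"0.2"];
NSDecimalNumber *fivePercent   = [NSDecimalNumber decimalNumberWithString:@"0.05"];

// Equivalent of: if ((deposit / price) < 0.2) return price * 0.05;
NSDecimalNumber *ratio = [deposit decimalNumberByDividingBy:price];
if ([ratio compare:twentyPercent] == NSOrderedAscending) {
    return [price decimalNumberByMultiplyingBy:fivePercent];
}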
I have been working with Kotlin for a little over 2 years now.
Looking over what I learned in these 2 years, I noticed that I have been using (num.toDouble()).toLong() for kotlin.math functions a bit too much. For example, Math.sqrt(num.toDouble()).toLong(). Two of my projects have an extension function sumByLong() inside a util file created by the team, because the Kotlin standard library only has sumBy: Int and sumByDouble: Double, and a lot of the work in the project uses Long.
In short, mathematical operations using Long are more common than ones using Double or Float, yet Long has a very small footprint in the Kotlin standard library. And since kotlin.math is different from java.lang.Math, mixing the two is not a recommended practice.
Going over the docs of kotlin.math, all functions except for abs, min, and max only have implementations for Float and Double.
Can someone explain, like I am 5, the possible reasoning behind this? Something real, not silly stuff like devs were lazy, or more code means more work, which is all I could find in search engine results.
--
Update: Some Clarification
1. I can understand that in most cases, return types will contain floating point numbers. I am also talking about parameters lacking a Long counterpart. Maybe using Math.sqrt wasn't the best example; something like math.log, math.cos, etc. would be a better example, where a floating point return type is expected, but the parameters don't even support Int.
2. When I said "Long is more common than using Double", I was not talking about the public at large, but looking over my past two years working with Kotlin. I am sorry if my phrasing wasn't clear.
Disclaimer: this answer may be a little opinionated, but I believe it is according to general consensus and best practices of using maths in computer science.
Mathematics for integers and for real numbers (floats) are really two very different math "sub-worlds". They're pretty separate, they have different uses, and we usually don't mix them.
If we work on some physics or do real-world simulations, we operate on units like temperature or speed, and we use doubles. If we have identifiers (a bank account number), we count something (the number of bank accounts), or we operate on discrete values with 100% precision (a bank account balance), we always use integers and never doubles.
Operations like sine, square root, or logarithm make perfect sense for physics, but not really for bank account values. They very often produce either very small or very large numbers that can't be safely represented as integers. They operate on approximations and don't really provide 100% precise results. They are continuous by nature, while integers are discrete.
What is the point of using integers with sqrt() or log() if they almost always return a floating point result? What is the point of passing an integer to sin() if, for example, there are only 2 distinct angles smaller than a right angle that can be represented as an integer number of radians: 0 and 1? Using integers with these functions is unnatural and impractical.
I can't think of a case where we have to convert between longs and doubles often. Usually, we operate either on longs or on doubles top to bottom, and we don't convert between them too often. By converting we lose the advantages of these specific "math sub-worlds" and we sum their disadvantages. Maybe you should just keep using doubles in your application and not convert to/from longs? Why do you use longs?
BTW, you mentioned that you can't/shouldn't use java.lang.Math in a Kotlin application. Well, if you look into java.lang.Math you will notice that... it supports only doubles :-)
In the case of ceil, it returns a Double because a Double has a bigger range of values than Long. Consider, for example:
ceil(Long.MAX_VALUE.toDouble() * 1000)
What would you expect it to return if it returned a Long? For further discussion, see Why does Math.ceil return a double?
In the case of log and trigonometric functions, the use cases requiring Long parameters are rare and the requirements varied. For example, should it round up, down, or to the nearest integral value? These are decisions that should be made for your particular project, and therefore can't be made in the stdlib.
In your project, you can simply define your required functions in a single, small source file, making your project's choice of rounding method, and then use it everywhere instead of converting at each call site, e.g.:
fun cos(n: Long): Long = cos(n.toDouble()).roundToLong()
I was told by a developer that working with integers can be frustrating in Objective-C, so he prefers to work directly with strings for IDs returned in JSON messages in order to avoid frequent conversions between integers and strings. In other words, he wants these IDs returned as strings by the API even if they are natively integers on the server. He also said that you cannot use integers in things like dictionaries/maps in Objective-C, so again strings are better (and he suggested that we should just use '1' instead of 1 as primary keys in DB tables so that the data types on both sides coincide).
I know little about Objective-C, but I find it hard to believe that such basic stuff can be this annoying in the language. From what I can gather, there are different types of 'numbers' in Objective-C:
int, float, double, long, and short (C primitives)
NSInteger, NSUInteger, CGFloat (Objective-C primitives)
EDIT: NSNumber is not a primitive, but a wrapper object type that stores and retrieves different primitive values. Thanks @rmaddy
The latter types seem to standardize APIs across different underlying architectures so that the developers don't need to care about things like 32-bit vs. 64-bit integers on different hardware.
Back to the complaint by my friend, for dealing with integer IDs returned from a web API, which data type should be used for storing them and using them in data structures such as a dictionary? Are said conversions inevitable?
Your friend is out of date by a few years. It used to be as bad as they make out (though still I wouldn't agree to putting everything into strings because of it).
That has all been fixed with the modern Objective-C syntax. This is a dictionary with some numbers in it:
NSUInteger answer = 42;
NSDictionary *dict = @{@"answerCount": @1,
                       @"value": @(answer)};
Sure, it's not the nicest syntax one could have chosen, but it's perfectly workable.
Also, there are a number of libraries for taking data from the wire, parsing the JSON, and exploding the result into model objects. This way you never really deal with raw JSON dictionaries. I'd strongly recommend investigating those options before you commit to a wire format.
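As a hedged sketch of what that looks like for integer IDs coming out of parsed JSON (NSJSONSerialization hands numeric values back as NSNumber objects; the dictionary contents below are invented stand-ins for a parsed response):

// Stand-in for a dictionary produced by NSJSONSerialization.
NSDictionary *json = @{@"id": @12345, @"name": @"example"};

NSInteger modelID = [json[@"id"] integerValue];   // unbox for arithmetic or comparisons
NSNumber *key = @(modelID);                       // box again to use as a dictionary key

NSMutableDictionary *modelsByID = [NSMutableDictionary dictionary];
modelsByID[key] = json;                           // integer-valued NSNumbers work fine as keys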
Why does the Java API use int, when short or even byte would be sufficient?
Example: The DAY_OF_WEEK field in class Calendar uses int.
If the difference is too minimal, then why do those datatypes (short, int) exist at all?
Some of the reasons have already been pointed out. For example, the fact that "...(Almost) All operations on byte, short will promote these primitives to int". However, the obvious next question would be: WHY are these types promoted to int?
So to go one level deeper: The answer may simply be related to the Java Virtual Machine Instruction Set. As summarized in the Table in the Java Virtual Machine Specification, all integral arithmetic operations, like adding, dividing and others, are only available for the type int and the type long, and not for the smaller types.
(An aside: The smaller types (byte and short) are basically only intended for arrays. An array like new byte[1000] will take 1000 bytes, and an array like new int[1000] will take 4000 bytes)
Now, of course, one could say that "...the obvious next question would be: WHY are these instructions only offered for int (and long)?".
One reason is mentioned in the JVM Spec mentioned above:
If each typed instruction supported all of the Java Virtual Machine's run-time data types, there would be more instructions than could be represented in a byte
Additionally, the Java Virtual Machine can be considered as an abstraction of a real processor. And introducing a dedicated Arithmetic Logic Unit for smaller types would not be worth the effort: it would need additional transistors, but it still could only execute one addition in one clock cycle. The dominant architecture when the JVM was designed was 32 bits, just right for a 32-bit int. (The operations that involve a 64-bit long value are implemented as a special case.)
(Note: The last paragraph is a bit oversimplified, considering possible vectorization etc., but should give the basic idea without diving too deep into processor design topics)
EDIT: A short addendum, focusing on the example from the question, but in a more general sense: One could also ask whether it would not be beneficial to store fields using the smaller types. For example, one might think that memory could be saved by storing Calendar.DAY_OF_WEEK as a byte. But here, the Java Class File Format comes into play: All the fields in a class file occupy at least one "slot", which has the size of one int (32 bits). (The "wide" fields, double and long, occupy two slots.) So explicitly declaring a field as short or byte would not save any memory either.
(Almost) All operations on byte, short will promote them to int, for example, you cannot write:
short x = 1;
short y = 2;
short z = x + y; //error
Arithmetic is easier and more straightforward when using int; there is no need to cast.
In terms of space, it makes very little difference. byte and short would complicate things, and I don't think this micro-optimization is worth it, since we are talking about a fixed number of variables.
byte is relevant and useful when you program for embedded devices or deal with files/networks. Also, these primitives are limited; what if the calculations exceed their limits in the future? Try to think about an extension of the Calendar class that might evolve to bigger numbers.
Also note that on a 64-bit processor, locals will be saved in registers and won't use any resources, so using int, short, and other primitives won't make any difference at all. Moreover, many Java implementations align variables* (and objects).
* byte and short occupy the same space as int if they are local variables, class variables or even instance variables. Why? Because in (most) computer systems, variable addresses are aligned, so for example if you use a single byte, you'll actually end up with two bytes: one for the variable itself and another for the padding.
On the other hand, in arrays, a byte takes 1 byte, a short takes 2 bytes and an int takes 4 bytes, because in arrays only the start and maybe the end have to be aligned. This will make a difference if you want to use, for example, System.arraycopy(); then you'll really notice a performance difference.
Because arithmetic operations are easier when using integers compared to shorts. Assume that the constants were indeed modeled by short values. Then you would have to use the API in this manner:
short month = Calendar.JUNE;
month = (short) (month + 1); // is July
Notice the explicit casting. Short values are implicitly promoted to int values when they are used in arithmetic operations, so the result of month + 1 has to be cast back down. (On the operand stack, shorts are even expressed as ints.) This would be quite cumbersome to use, which is why int values are often preferred for constants.
Compared to that, the gain in storage efficiency is minimal because there only exists a fixed number of such constants. We are talking about 40 constants. Changing their storage from int to short would save you 40 * 16 bits = 80 bytes. See this answer for further reference.
The design complexity of a virtual machine is a function of how many kinds of operations it can perform. It's easier to have four implementations of an instruction like "multiply"--one each for 32-bit integer, 64-bit integer, 32-bit floating-point, and 64-bit floating-point--than to have, in addition to the above, versions for the smaller numerical types as well. A more interesting design question is why there should be four types, rather than fewer (performing all integer computations with 64-bit integers and/or doing all floating-point computations with 64-bit floating-point values). The reason for using 32-bit integers is that Java was expected to run on many platforms where 32-bit types could be acted upon just as quickly as 16-bit or 8-bit types, but operations on 64-bit types would be noticeably slower. Even on platforms where 16-bit types would be faster to work with, the extra cost of working with 32-bit quantities would be offset by the simplicity afforded by only having 32-bit types.
As for performing floating-point computations on 32-bit values, the advantages are a bit less clear. There are some platforms where a computation like float a=b+c+d; could be performed most quickly by converting all operands to a higher-precision type, adding them, and then converting the result back to a 32-bit floating-point number for storage. There are other platforms where it would be more efficient to perform all computations using 32-bit floating-point values. The creators of Java decided that all platforms should be required to do things the same way, and that they should favor the hardware platforms for which 32-bit floating-point computations are faster than longer ones, even though this severely degraded both the speed and precision of floating-point math on a typical PC, as well as on many machines without floating-point units. Note, btw, that depending upon the values of b, c, and d, using higher-precision intermediate computations when computing expressions like the aforementioned float a=b+c+d; will sometimes yield results which are significantly more accurate than would be achieved if all intermediate operands were computed at float precision, but will sometimes yield a value which is a tiny bit less accurate. In any case, Sun decided everything should be done the same way, and they opted for using minimal-precision float values.
Note that the primary advantages of smaller data types become apparent when large numbers of them are stored together in an array; even if there were no advantage to having individual variables of types smaller than 64 bits, it's worthwhile to have arrays which can store smaller values more compactly; having a local variable be a byte rather than a long saves seven bytes; having an array of 1,000,000 numbers hold each number as a byte rather than a long saves 7,000,000 bytes. Since each array type only needs to support a few operations (most notably read one item, store one item, copy a range of items within an array, or copy a range of items from one array to another), the added complexity of having more array types is not as severe as the complexity of having more types of directly-usable discrete numerical values.
If you used the philosophy where integral constants are stored in the smallest type that they fit in, then Java would have a serious problem: whenever programmers write code using integral constants, they have to pay careful attention to their code to check if the type of the constants matters, and if so, look up the type in the documentation and/or do whatever type conversions are needed.
So now that we've outlined a serious problem, what benefits could you hope to achieve with that philosophy? I would be unsurprised if the only runtime-observable effect of that change would be what type you get when you look the constant up via reflection. (and, of course, whatever errors are introduced by lazy/unwitting programmers not correctly accounting for the types of the constants)
Weighing the pros and the cons is very easy: it's a bad philosophy.
Actually, there'd be a small advantage. If you have a
class MyTimeAndDayOfWeek {
byte dayOfWeek;
byte hour;
byte minute;
byte second;
}
then on a typical JVM it needs as much space as a class containing a single int. The memory consumption gets rounded up to the next multiple of 8 or 16 bytes (IIRC, that's configurable), so the cases where there are real savings are rather rare.
This class would be slightly easier to use if the corresponding Calendar methods returned a byte. But there are no such Calendar methods, only get(int), which must return an int because of the other fields. Each operation on smaller types promotes to int, so you need a lot of casting.
Most probably, you'll either give up and switch to an int or write setters like
void setDayOfWeek(int dayOfWeek) {
this.dayOfWeek = checkedCastToByte(dayOfWeek);
}
Then the type of DAY_OF_WEEK doesn't matter, anyway.
Using variables smaller than the bus size of the CPU means more cycles are necessary. For example when updating a single byte in memory, a 64-bit CPU needs to read a whole 64-bit word, modify only the changed part, then write back the result.
Also, using a smaller data type incurs overhead when the variable is stored in a register, since the behavior of the smaller data type has to be accounted for explicitly. Since the whole register is used anyway, there is nothing to be gained by using a smaller data type for method parameters and local variables.
Nevertheless, these data types might be useful for representing data structures that require specific widths, such as network packets, or for saving space in large arrays, sacrificing speed.
Looking for the proper data type (such as IndexedSeq[Double]) to use when designing a domain-specific numerical computing library. For this question, I'm limiting the scope to working with 1-dimensional arrays of Double. The library will define a number of functions that are typically applied to each element in the 1D array.
Considerations:
Prefer immutable data types, such as Vector or IndexedSeq
Want to minimize data conversions
Reasonably efficient in space and time
Friendly for other people using the library
Elegant and clean API
Should I use something higher up the collections hierarchy, such as Seq?
Or is it better to just define the single-element functions and leave the mapping/iterating to the end user?
This seems less efficient (since some computations could be done once per set of calls), but at the same time a more flexible API, since it would work with any type of collection.
Any recommendations?
If your computations are to do anything remotely computationally intensive, use Array, either raw or wrapped in your own classes. You can provide a collection-compatible wrapper, but make that an explicit wrapper for interoperability only. Everything other than Array is generic and thus boxed and thus comparatively slow and bulky.
If you do not use Array, people will be forced to abandon whatever things you have and just use Array instead when performance matters. Maybe that's okay; maybe you want the computations to be there for convenience not efficiency. In that case, I suggest using IndexedSeq for the interface, assuming that you want to let people know that indexing is not outrageously slow (e.g. is not List), and use Vector under the hood. You will use about 4x more memory than Array[Double], and be 3-10x slower for most low-effort operations (e.g. multiplication).
For example, this:
val u = v.map(1.0 / _) // v is Vector[Double]
is about three times slower than this:
val u = new Array[Double](v.length)
var j = 0
while (j<u.length) {
u(j) = 1.0/v(j) // v is Array[Double]
j += 1
}
If you use the map method on Array, it's just as slow as the Vector[Double] way; operations on Array are generic and hence boxed. (And that's where the majority of the penalty comes from.)
I use Vectors all the time when I deal with numerical values, since Vector provides very efficient random access as well as append/prepend.
Also notice that the current default collection for immutable indexed sequences is Vector, so if you write some code like for (i <- 0 until n) yield {...}, it returns IndexedSeq[...] but the runtime type is Vector. So it may be a good idea to always use Vectors, since some binary operators that take two sequences as input may benefit from the fact that the two arguments are of the same implementation type. (Not really the case now, but someone has pointed out that vector concatenation could be done in log(N) time, as opposed to the current linear time, due to the fact that the second parameter is simply treated as a general sequence.)
Nevertheless, I believe that Seq[Double] should already provide most of the function interfaces you need. And since mapping results from a Range does not yield a Vector directly, I usually put Seq[Double] as the argument type for my input, so that it has some generality. I would expect that efficiency is optimized in the underlying implementation.
Hope that helps.
Let's say I'm writing a "DayData" class, containing the ivars
NSString *symbol; //such as "AAPL"
NSString *currency; //such as "USD"
NSDate *day;
double open;
double high;
double low;
double close;
The last four ivars are the open,high,low,close prices of that stock for that day.
Say I'm using this class as the fundamental building-block class behind intensive Monte Carlo simulations over many decades, i.e. thousands of days, of historical data. This means I'd have to access these ivars thousands, if not millions, if not billions of times in a short period of time to make the simulations as fast as possible.
Question: Should I stick to double, or should I still use NSDecimalNumber? How fast is NSDecimalNumber, really? Has anyone here tested NSDecimalNumber for intensive scientific applications?
Faster than NSDecimalNumber would be NSDecimal, which isn't an Obj-C object, so doesn't incur the overhead of objc_msgSend, but still has the advantages of decimal math. Here are the functions to work with NSDecimals.
As jtbandes has said, you should use NSDecimal if you want speed; NSDecimalNumber is an Obj-C object wrapper around NSDecimal. NSDecimal is a struct used to represent decimal numbers, which allows you to do calculations that don't suffer from binary-to-decimal rounding and representation errors.
I would go with double, since it's a simulation, where the last ounce of accuracy probably won't matter anyway (presumably your simulation includes approximation, so minuscule errors won't have too much of an effect). You need to beware of the pitfalls of floating point calculations: certain types of operations can lead to larger errors, especially if the magnitudes of two floating point numbers are very different, or if you subtract one number from another that is very close to it. This page on Wikipedia covers a few of the pitfalls.
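For a quick, hedged illustration of the kind of surprises plain doubles can produce (the values are arbitrary):

// Representation error: 0.1 and 0.2 have no exact base-2 form.
NSLog(@"%d", 0.1 + 0.2 == 0.3);        // prints 0 (false)

// Magnitude mismatch: the 1.0 is lost entirely when added to 1.0e16.
double big = 1.0e16;
NSLog(@"%f", (big + 1.0) - big);       // prints 0.000000, not 1.000000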
If you are dealing with dollars and cents (but always whole cents), nothing is faster or more efficient than a regular int counting the cents, and dividing by 100 to get dollars.
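A tiny, hedged sketch of that cents-as-integers approach (the values and names are illustrative):

NSInteger priceInCents = 1999;              // $19.99 stored exactly as 1999 cents
NSInteger totalInCents = priceInCents * 3;  // 5997, no rounding error possible

NSLog(@"$%ld.%02ld", (long)(totalInCents / 100),   // dollars: 59
                     (long)(totalInCents % 100));  // cents:   97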