SQL data type for physical science measurements - double or decimal

Someone mentioned in the "decimal vs double! - Which one should I use and when?" post that it's best to use double for physical science computations. Does this apply to measurements such as flash point, viscosity, weight and volume? Can someone explain further?

Decimal = exact. That is, I have exactly £1.23 in my pocket, and applying VAT to it, say, is a known and accepted rounding issue.
When you measure something, you can never be exact. If something is 123 centimeters long then strictly speaking it's somewhere between 122.5 and 123.4999... centimeters long.
You are dealing with two different kinds of quantities.
So, use double for measurements like these.
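To make the money half of that concrete, here is a minimal T-SQL sketch (the 20% VAT rate and the column precisions are my own assumptions for illustration); decimal arithmetic with one agreed rounding step stays exact to the penny:

DECLARE @price decimal(10,2) = 1.23;
DECLARE @vatRate decimal(5,4) = 0.2000;
-- round the VAT once, at the agreed precision, then add it back: 1.23 + 0.25 = 1.48 exactly
SELECT @price + ROUND(@price * @vatRate, 2) AS GrossPrice;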

The question you link to is about the data types in C#, not SQL, and the answers reflect that.
Decimal is a data type created for dealing with currency, making sure calculations balance out (no lost cents, or fractions thereof, for example).
Physical scientific computations rarely deal with money values, and a double's precision comfortably exceeds that of the measurements themselves, so double is the appropriate choice here.

Generally you would always use doubles for recording physical measurements unless you have a good reason to do otherwise. There is potentially a minuscule loss of accuracy when using any floating point number, since binary is unable to perfectly represent certain decimal numbers (0.1 being the most obvious example), but the inaccuracy in the double representation is going to be many orders of magnitude smaller than the error in the measurements you take.
Decimal is used where it's very important that numbers are represented exactly, so typically only when dealing with money (yes, it would seem we care more about money than science!)
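As a quick illustration of the 0.1 point, a hedged T-SQL sketch: adding float 0.1 ten times does not compare equal to 1.0, while the decimal version does (the exact float digits depend on how you display them, but the comparison result should hold):

DECLARE @f float = 0, @d decimal(10,2) = 0, @i int = 0;
WHILE @i < 10
BEGIN
    SET @f = @f + 0.1;  -- each addition carries a tiny binary representation error
    SET @d = @d + 0.1;  -- decimal stores 0.1 exactly
    SET @i = @i + 1;
END
SELECT CASE WHEN @f = 1.0 THEN 'float equal' ELSE 'float NOT equal' END AS FloatCheck,
       CASE WHEN @d = 1.0 THEN 'decimal equal' ELSE 'decimal NOT equal' END AS DecimalCheck;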

Knowing which database this is for would make things easier...
Oracle only has NUMBER, which, if you omit the two optional parameters (precision and scale), behaves as a float. Using both parameters gives you DECIMAL, and specifying only the precision gives you INTEGER. Can't remember how to get REAL...
MySQL Numeric data type info: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
SQL Server Numeric data type info: http://msdn.microsoft.com/en-us/library/aa258271(SQL.80).aspx
I haven't dealt with float and real much, but I've heard they aren't great for mathematical computations. I've used DECIMAL for varying precision, not just for monetary values.
What data type to use depends on the data, and how you intend to use that data.
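As a rough sketch of how that choice can look in a table definition (SQL Server syntax; the table and column names are invented for illustration, echoing the original question's measurements plus a money column):

CREATE TABLE Measurement (
    MeasurementId int IDENTITY(1,1) PRIMARY KEY,
    FlashPointC   float          NULL,  -- physical measurement: approximate float is fine
    ViscosityCp   float          NULL,
    WeightGrams   float          NULL,
    VolumeLitres  float          NULL,
    UnitCostGBP   decimal(19,4)  NULL   -- money: exact fixed-point decimal
);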

Related

When to use decimals or doubles

Quick Aside: I'm going to use the word "Float" to refer to both a .Net float and a SQL float with only 7 significant digits. I will use the word "Double" to refer to a .Net double and a SQL float with 15 significant digits. I also realize that this is very similar to some other posts regarding decimals/doubles, but the answers on those posts are really inconsistent, and I really want some recommendations for my specific circumstance...
I am part of a team that is rewriting an old application. The original app used floats (7 digits). This of course caused issues since the app conducted a lot of calculations and rounding errors accumulated very quickly. At some point, many of these floats were changed to decimals. Later, the floats (7) in the database all became doubles (15). After that we had several more errors with calculations involving doubles, and they too were changed to decimals.
Today about 1/3 of all of our floating point numbers in the database are decimals, the rest are doubles. My team wants to "standardize" all of our floating-point numbers in the database (and the new .Net code) to use either exclusively decimals or doubles except in cases where the other MUST be used. The majority of the team is set on using decimals; I'm the only person on my team advocating using doubles instead of decimals. Here's why...
Most of the numbers in the database are still doubles (though much of the application code still uses floats), and it would be a lot more effort to change all of the floats/doubles to decimals
For our app, none of the fields stored are "exact" decimal quantities. None of them are monetary quantities, and most represent some sort of "natural" measurement (e.g. mass, length, volume, etc.), so a double's 16 significant digits are already way more precise than even our initial measurements.
Many tables have measurements stored in two columns: 1 for the value; 1 for the unit of measure. This can lead to a HUGE difference in scale between the values in a single column. For example, one column can store a value in terms of pCi/g or Ci/m3 (1 Ci = 1000000000000 pCi). Since all the values in a single decimal column must have the same scale (that is, a fixed number of digits both before and after the decimal point), I'm concerned that we will have overflow and rounding issues.
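As a hedged T-SQL sketch of that concern (the decimal precision and the example values are invented): a fixed-scale decimal column sized for the small per-gram readings cannot hold the same quantity expressed in picocuries, while a float absorbs the range difference:

DECLARE @doseRate decimal(18,9);
SET @doseRate = 0.000000123;            -- a small pCi/g reading: fits
-- SET @doseRate = 1000000000000;       -- 1 Ci expressed in pCi: arithmetic overflow error
DECLARE @doseRateFloat float = 1000000000000;  -- a float happily stores both magnitudes
SELECT @doseRateFloat;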
My teammates argue that:
Doubles are not as accurate nor as precise as decimals due to their inability to exactly represent 1/10 and that they only have 16 significant digits.
Even though we are not tracking money, the app is an inventory system that keeps track of material (mostly gram quantities) and it needs to be "as accurate as possible".
Even after the floats were changed to doubles, we continued to have bad results from calculations that used doubles. Changing these columns (and the application code) to decimals caused these calculations to produce the expected results.
It is my strong belief that the original issues were caused by floats only having 7 significant digits, and that simple arithmetic (e.g. 10001 * 10001) caused the data to quickly use up the few significant digits that it had. I do not believe this had anything to do with how binary floating-point numbers can only approximate decimal values, and I believe that using doubles would have fixed this issue.
I believe that the issue with doubles arose because doubles were used alongside decimals in calculations where values were converted back and forth between data types. Many of these calculations would also round between intermediate steps in the calculation!
I'm trying to convince my team not to make everything under the sun into a decimal. Most values in the database don't have more than 5 or 6 significant digits anyway. Unfortunately, I am out-ranked by other members of my team that see things rather differently.
So, my question then is...
Am I worrying over nothing? Is there any real harm done by using almost exclusively decimals instead of doubles in an application with nearly 200 database tables, hundreds of transactions, and a rewrite schedule of 5 to 6 years?
Is using decimals actually solving an issue that doubles could not? From my research, both decimals and doubles are susceptible to rounding errors involving arbitrary fractions (adding 1/3, for example), and the only way to account for this is to treat any value within a certain tolerance as "equal" when comparing doubles and/or decimals.
If it is more appropriate to use doubles, what arguments (other than the ones I have already made) could convince my team not to change everything to decimals?
Use decimal when you need perfect accuracy as a base-10 number (financial data, grades)
Use double or float when you are storing naturally imprecise data (measurements, temperature), want much faster mathematical operations, and can sacrifice a minute amount of imprecision.
Since you seem to be storing only various measurements (which carry some imprecision anyway), float would be the logical choice (or double if you need more than 7 significant digits of precision).
Is using decimals actually solving an issue that doubles could not?
Not really - The data is only going to be as accurate as the measurements used to generate the data. Can you really say that a measured quantity is 123.4567 grams? Does the equipment used to measure it have that level of precision?
To deal with "rounding errors" I would argue that you can't really say whether a measurement of 1234.5 grams is exactly halfway - it could just as easily be 1234.49 grams, which would round down anyways.
What you need to decide is "what level of precision is acceptable" and always round to that precision as a last step. Don't round your data or intermediate calculations.
If it is more appropriate to use doubles, what arguments (other than the ones I have already made) could convince my team not to change everything to decimals?
Other than the time spent switching, the only thing you're really sacrificing is speed. The only way to know how much speed is to try it both ways and measure the difference.
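To illustrate the "round only as a last step" advice above, a small T-SQL sketch (the three-decimal reporting precision and the numbers are assumptions):

DECLARE @massG float = 123.456789, @factor float = 0.30103;
-- keep full float precision through the intermediate arithmetic...
DECLARE @result float = @massG * @factor;
-- ...and round once, at the agreed precision, only when reporting the value
SELECT ROUND(@result, 3) AS ReportedValue;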
You'd better try your best not to lose precision. Perhaps a mistake of my own will show you what can go wrong.
I did some careless arithmetic, and it returned something very weird:
given 0.60, it returns 5
int get_index(double value) {
    // map a value in [0, 1] to a bucket index
    if (value < 0 || value > 1.00)
        return -1;
    // 0.60 / 0.10 evaluates to 5.999999... in double, and the implicit
    // conversion to int truncates it to 5
    return value / 0.10;
}
and I fixed it:
int get_index(double value) {
    if (value < 0 || value > 1.00)
        return -1;
    // scaling both operands first happens to make both products round to exact
    // whole numbers, so 0.60 now maps to 6; explicitly rounding the quotient
    // would be a more robust fix
    return (value * 100000000) / (0.10 * 100000000);
}

SQL Server Strange Ceiling() behavior

Can anyone explain the following results in SQL Server? I'm stumped.
declare @mynum float = 8.31
select ceiling(@mynum * 100)
Results in 831
declare @mynum float = 8.21
select ceiling(@mynum * 100)
Results in 822
I've tested a whole range of numbers (in SQL Server 2012). Some increase while others stay the same. I'm at a loss understanding why ceiling is treating some of them differently. Changing from a float to a decimal(18,5) seems to fix the problem but I'm wary there may be other repercussions I'm missing from doing so. Any explanations would help.
I think this is called float precision. You can find it in almost all programming languages and in databases too. It happens because the data is stored with only limited binary precision, so what you set as 8.31 is in fact probably not exactly 8.31 but something extremely close to it, for example 8.3100000000000005, and when you multiply it and apply CEILING a different value can appear.
On the SQL Server documentation page you can read:
Approximate-number data types for use with floating point numeric data. Floating point data is approximate; therefore, not all values in the data type range can be represented exactly.
The same problem exists in other database systems. For example, on the MySQL website you can read:
Floating-point numbers sometimes cause confusion because they are approximate and not stored as exact values. A floating-point value as written in an SQL statement may not be the same as the value represented internally. Attempts to treat floating-point values as exact in comparisons may lead to problems. They are also subject to platform or implementation dependencies. The FLOAT and DOUBLE data types are subject to these issues. For DECIMAL columns, MySQL performs operations with a precision of 65 decimal digits, which should solve most common inaccuracy problems.
Floating point numbers are not 100% accurate. As Marcin Nabiałek wrote, the 8.31 you see is probably represented by something else, something like 8.310000000000001. See here for some interesting reading about the accuracy problem of floating point.
The solution is not to use floating point data types unless you really have to. You should rather use the DECIMAL or MONEY data types.
If you really have to use a floating point data type, then you can add or subtract a small value (the accuracy threshold or epsilon) before every floor, ceiling or comparison operation to get the precision you want. If you have a lot of floating point operations, it might be worth it to code your own floating point comparison functions.
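For the CEILING example above, two possible workarounds sketched in T-SQL (the decimal(18,5) cast mirrors what the questioner already tried; the epsilon value is an arbitrary choice and is only safe if genuine values never sit within epsilon of a boundary):

DECLARE @mynum float = 8.21;

-- 1) convert to an exact type before the final operation
SELECT CEILING(CAST(@mynum AS decimal(18,5)) * 100);  -- 821

-- 2) or subtract a small epsilon before CEILING if you must stay with float
DECLARE @eps float = 0.000001;
SELECT CEILING(@mynum * 100 - @eps);                  -- 821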

Objective C Multiplication of floats gives unexpected results

I'm literally just doing a multiplication of two floats. How come these statements produce different results? Should I even be using floats?
500,000.00 * 0.001660 = 830
How come these statements produce different results ?
Because floating-point arithmetic is not exact, and apparently you were not printing the multiplier precisely enough (i.e. with a sufficient number of decimal digits). It wasn't 0.00166 but something that merely looked like 0.00166 once rounded.
Should I even be using floats ?
No. For money, use integers and treat them as fixed-point rational numbers. (They still aren't exact, but significantly better and less error-prone.)
You didn't show how you initialized periodicInterest, and presumably you think you set it to 0.00166, but in fact the error in your output is large enough that you must not have explicitly initialized it as periodicInterest = 0.00166. It must be closer to 0.00165975, and the difference between 0.00166 and 0.00165975 is definitely large enough not to just be a single floating-point rounding error.
Assuming you are working with monetary quantities, you should use NSDecimalNumber or NSDecimal.
One non-obvious benefit of using NSDecimalNumber is that it works with NSNumberFormatter, so you can let Apple take care of formatting currencies for all sorts of foreign locales.
UPDATE
In response to the comments:
“periodicInterest is clearly not a monetary quantity” and “decimal is no more free of error when dividing by 12 than binary is” - for inexact quantities, I can think of two concerns:
One concern is using sufficient precision to give accurate results. NSDecimalNumber is a floating-point number with 38 digits of precision and an exponent in the range -128…127. This is more than twice the number of decimal digits an IEEE 'double' can store. The exponent range is less than that of a double, but that's unlikely to matter in financial computing. So NSDecimalNumbers can definitely result in smaller error than floats or doubles, even though none of them can store 1/12 exactly.
The other concern is matching the results computed by some other system, like your bank or your broker or the NYSE. In that case, you need to figure out how that other system is storing numbers and computing with them. If the other system is using a decimal format (which is likely in the financial sector), then NSDecimalNumber will probably be useful.
“Wouldn't it be more efficient to use primitive types to do floating point arithmetic, specially thousands in real time.” Arithmetic on primitive types is far faster than arithmetic on NSDecimalNumbers. I haven't measured it, but a factor of 100 would not surprise me.
You have to strike a balance between your requirements. If decimal accuracy is paramount (as it often is in financial programming), you must sacrifice performance for accuracy. If decimal accuracy is not so important, you can consider carefully using a primitive type, but you should be aware of the accuracy you're sacrificing. Even then, the precision of a float is so limited (usually only 7 significant decimal digits) that you should probably be using double (at least 15, usually 16 significant decimal digits).
If you need to perform millions of arithmetic operations per second with true decimal accuracy, you might be able to do it using doubles, if you are an IEEE 754 expert capable of analyzing your code to figure out where errors are introduced and how to eliminate them. Few people have this level of expertise. (I don't claim to.) You must also understand how your compiler turns your Objective-C code into machine instructions.
Anyway, perhaps you are just writing a casual app to compute a rough estimate of net present value or future value. In that case, using double would probably suffice, but using NSDecimalNumber would probably also be sufficiently fast. Without knowing more about the app you're writing, I can't give you more specific advice.

SQL: What do you use to store a ratio (percentage) in a database?

Should I use decimal or float to store a ratio in a database? Particularly in SQL2005.
That depends on what your need for accuracy is. If you can tolerate the typical errors that come from the IEEE way of storing floating point numbers, use a float; otherwise use a decimal if you need an exact representation (and the same goes for any non-integer numbers you will use in calculations involving the percentage).
It depends what your ratio is, really.
For interest rates, for instance, in banking, we always use decimal, with a precision and scale determined by the business. If an interest rate is to be calculated and the result is beyond the precision of the column, then there is always a business rule which defines how the result is to be rounded or truncated to fit into the column.
If your ratio is, for example, the ratio between an object's diameter and its circumference, I would probably use a float.
Depends on your application. If you're using floats to calculate everything to do with this ratio already, it would probably make sense to store it as a float.
But otherwise, floats are in general a bit of a database smell. Database admins don't like them because the mounting inaccuracies mean that floating point numbers are inherently inconsistent. And that falls foul of the 'C'onsistency in our beloved ACID.
Decimals and scaled ints at least behave predictably.
Depends how exact you want it to be. If you only need 1 or 2 digits, and you know the maximum ratio is going to be under maxint/1000, I'd think about storing the ratio multiplied by 100 in an int. If you need exact numbers, you might even want to store the numerator and denominator as separate ints. Otherwise, stick to floats or doubles.
I never use floats in a database, maybe it's an old habit that technology has addressed, I'm not 100% sure.
Either ints, scaled ints or decimals. There are times when a rounding error seems insignificant but it could fail matches on certain values or introduce cumulative errors.
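A quick sketch of the storage options discussed in the answers above (SQL Server syntax; the precisions and column names are invented, and a real table would of course pick just one of these):

CREATE TABLE RatioExample (
    RatioDecimal  decimal(9,4) NULL,  -- exact, fixed number of decimal places
    RatioTimes100 int          NULL,  -- scaled integer: the ratio multiplied by 100
    Numerator     int          NULL,  -- exact rational: store both parts separately
    Denominator   int          NULL,
    RatioFloat    float        NULL   -- approximate, if that is acceptable
);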

Storing money in a decimal column - what precision and scale?

I'm using a decimal column to store money values on a database, and today I was wondering what precision and scale to use.
Since supposedly char columns of a fixed width are more efficient, I was thinking the same could be true for decimal columns. Is it?
And what precision and scale should I use? I was thinking precision 24/8. Is that overkill, not enough or ok?
This is what I've decided to do:
Store the conversion rates (when applicable) in the transaction table itself, as a float
Store the currency in the account table
The transaction amount will be a DECIMAL(19,4)
All calculations using a conversion rate will be handled by my application so I keep control of rounding issues
I don't think a float for the conversion rate is an issue, since it's mostly for reference, and I'll be casting it to a decimal anyway.
Thank you all for your valuable input.
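A minimal DDL sketch of those decisions (the table and column names are my own; TransactionEntry is used because TRANSACTION is a reserved word):

CREATE TABLE Account (
    AccountId    int PRIMARY KEY,
    CurrencyCode char(3) NOT NULL           -- e.g. 'GBP', 'USD'
);
CREATE TABLE TransactionEntry (
    TransactionId  int PRIMARY KEY,
    AccountId      int NOT NULL REFERENCES Account(AccountId),
    Amount         decimal(19,4) NOT NULL,  -- the transaction amount
    ConversionRate float NULL               -- reference only; cast to decimal before use
);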
If you are looking for a one-size-fits-all, I'd suggest DECIMAL(19, 4) is a popular choice (a quick Google bears this out). I think this originates from the old VBA/Access/Jet Currency data type, being the first fixed point decimal type in the language; Decimal only came in 'version 1.0' style (i.e. not fully implemented) in VB6/VBA6/Jet 4.0.
The rule of thumb for storage of fixed point decimal values is to store at least one more decimal place than you actually require to allow for rounding. One of the reasons for mapping the old Currency type in the front end to DECIMAL(19, 4) type in the back end was that Currency exhibited bankers' rounding by nature, whereas DECIMAL(p, s) rounded by truncation.
An extra decimal place in storage for DECIMAL allows a custom rounding algorithm to be implemented rather than taking the vendor's default (and bankers' rounding is alarming, to say the least, for a designer expecting all values ending in .5 to round away from zero).
Yes, DECIMAL(24, 8) sounds like overkill to me. Most currencies are quoted to four or five decimal places. I know of situations where a decimal scale of 8 (or more) is required but this is where a 'normal' monetary amount (say four decimal places) has been pro rata'd, implying the decimal precision should be reduced accordingly (also consider a floating point type in such circumstances). And no one has that much money nowadays to require a decimal precision of 24 :)
However, rather than a one-size-fits-all approach, some research may be in order. Ask your designer or domain expert about accounting rules which may be applicable: GAAP, EU, etc. I vaguely recall some EU intra-state transfers with explicit rules for rounding to five decimal places, therefore using DECIMAL(p, 6) for storage. Accountants generally seem to favour four decimal places.
PS Avoid SQL Server's MONEY data type because it has serious issues with accuracy when rounding, among other considerations such as portability etc. See Aaron Bertrand's blog.
Microsoft and language designers chose banker's rounding because hardware designers chose it [citation?]. It is enshrined in the Institute of Electrical and Electronics Engineers (IEEE) standards, for example. And hardware designers chose it because mathematicians prefer it. See Wikipedia; to paraphrase: The 1906 edition of Probability and Theory of Errors called this 'the computer's rule' ("computers" meaning humans who perform computations).
We recently implemented a system that needs to handle values in multiple currencies and convert between them, and figured out a few things the hard way.
NEVER USE FLOATING POINT NUMBERS FOR MONEY
Floating point arithmetic introduces inaccuracies that may not be noticed until they've screwed something up. All values should be stored as either integers or fixed-decimal types, and if you choose to use a fixed-decimal type then make sure you understand exactly what that type does under the hood (ie, does it internally use an integer or floating point type).
When you do need to do calculations or conversions:
Convert values to floating point
Calculate new value
Round the number and convert it back to an integer
When converting a floating point number back to an integer in step 3, don't just cast it - use a math function to round it first. This will usually be round, though in special cases it could be floor or ceil. Know the difference and choose carefully.
Store the type of a number alongside the value
This may not be as important for you if you're only handling one currency, but it was important for us in handling multiple currencies. We used the 3-character code for a currency, such as USD, GBP, JPY, EUR, etc.
Depending on the situation, it may also be helpful to store:
Whether the number is before or after tax (and what the tax rate was)
Whether the number is the result of a conversion (and what it was converted from)
Know the accuracy bounds of the numbers you're dealing with
For real values, you want to be as precise as the smallest unit of the currency. This means you have no values smaller than a cent, a penny, a yen, a fen, etc. Don't store values with higher accuracy than that for no reason.
Internally, you may choose to deal with smaller values, in which case that's a different type of currency value. Make sure your code knows which is which and doesn't get them mixed up. Avoid using floating point values even here.
Putting all of this together, we decided on the following approach. In running code, currencies are stored using an integer for the smallest unit.
class Currency {
    String code;       // e.g. "USD"
    int value;         // e.g. 2500, the amount in the smallest unit
    boolean converted;
}
class Price {
    Currency grossValue;
    Currency netValue;
    Tax taxRate;
}
In the database, the values are stored as a string in the following format:
USD:2500
That stores the value of $25.00. We were able to do that only because the code that deals with currencies doesn't need to be within the database layer itself, so all values can be converted into memory first. Other situations will no doubt lend themselves to other solutions.
And in case I didn't make it clear earlier, don't use float!
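For reference, a hedged sketch of that storage format as a table (the names are invented; splitting the currency code and the integer minor units into two columns would satisfy the same rules equally well):

CREATE TABLE price_record (
    id          INT PRIMARY KEY,
    gross_value VARCHAR(20) NOT NULL,  -- e.g. 'USD:2500' = $25.00 in smallest units
    net_value   VARCHAR(20) NOT NULL
);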
When handling money in MySQL, use DECIMAL(13,2) if you know the precision of your money values or use DOUBLE if you just want a quick good-enough approximate value.
So if your application needs to handle money values up to a trillion dollars (or euros or pounds), then this should work:
DECIMAL(13, 2)
Or, if you need to comply with GAAP then use:
DECIMAL(13, 4)
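A minimal MySQL sketch of that suggestion (the table and column names are illustrative):

CREATE TABLE payment (
    id     INT AUTO_INCREMENT PRIMARY KEY,
    amount DECIMAL(13,2) NOT NULL  -- handles amounts up to 99,999,999,999.99
);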
The money datatype on SQL Server has four digits after the decimal.
From SQL Server 2000 Books Online:
Monetary data represents positive or negative amounts of money. In Microsoft® SQL Server™ 2000, monetary data is stored using the money and smallmoney data types. Monetary data can be stored to an accuracy of four decimal places. Use the money data type to store values in the range from -922,337,203,685,477.5808 through +922,337,203,685,477.5807 (requires 8 bytes to store a value). Use the smallmoney data type to store values in the range from -214,748.3648 through 214,748.3647 (requires 4 bytes to store a value). If a greater number of decimal places are required, use the decimal data type instead.
4 decimal places would give you the accuracy to store the world's smallest currency sub-units. You can take it down further if you need micropayment (nanopayment?!) accuracy.
I too prefer DECIMAL to DBMS-specific money types, you're safer keeping that kind of logic in the application IMO. Another approach along the same lines is simply to use a [long] integer, with formatting into ¤unit.subunit for human readability (¤ = currency symbol) done at the application level.
If you were using IBM Informix Dynamic Server, you would have a MONEY type which is a minor variant on the DECIMAL or NUMERIC type. It is always a fixed-point type (whereas DECIMAL can be a floating point type). You can specify a scale from 1 to 32, and a precision from 0 to 32 (defaulting to a scale of 16 and a precision of 2). So, depending on what you need to store, you might use DECIMAL(16,2) - still big enough to hold the US Federal Deficit, to the nearest cent - or you might use a smaller range, or more decimal places.
Sometimes you will need to go to less than a cent, and there are international currencies that use very large denominations. For example, you might charge your customers 0.088 cents per transaction. In my Oracle database the columns are defined as NUMBER(20,4).
If you're going to be doing any sort of arithmetic operations in the DB (multiplying out billing rates and so on), you'll probably want a lot more precision than people here are suggesting, for the same reasons that you'd never want to use anything less than a double-precision floating point value in application code.
I would think that, for the most part, your or your client's requirements should dictate what precision and scale to use. For example, for the e-commerce website I am working on that deals with money in GBP only, I have been required to keep it to Decimal(6, 2).
A late answer here, but I've used
DECIMAL(13,2)
which I'm right in thinking should allow up to 99,999,999,999.99.