SQL Server Rounding Issue

I'm using SQL Server 2005, and I'm using the ROUND T-SQL function to round a decimal column value, but the rounded value seems to be incorrect.
PRINT ROUND(1890.124854, 2) => 1890.120000
As shown, the ROUND function is returning 1890.12, whereas it should be 1890.13. Has anyone encountered this, and what is the correct way of rounding so that I get the expected value 1890.13?
Thanks.

ROUND() is working as it was intended to. You specified to round to 2 decimal places, and that's what you got.
Returns a numeric value, rounded to the specified length or precision.
Rounding means that a digit of 5 or above rounds the result up to the nearest value at that precision, and a digit less than 5 rounds it down.
so,
PRINT ROUND(1890.125000, 2)
produces 1890.130000
Whereas
PRINT ROUND(1890.124999, 2)
produces 1890.120000
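If it helps to experiment outside SQL Server, here is a minimal Python sketch that mimics the half-up behaviour shown above using the decimal module (a rough analogue for positive values, not T-SQL itself):

from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, places: int) -> Decimal:
    # Round half up to the requested number of decimal places,
    # mirroring the PRINT ROUND examples above.
    quantum = Decimal(1).scaleb(-places)   # e.g. Decimal('0.01') for places=2
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

print(round_half_up("1890.124854", 2))  # 1890.12 (third decimal is 4, so it rounds down)
print(round_half_up("1890.125000", 2))  # 1890.13
print(round_half_up("1890.124999", 2))  # 1890.12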

Your rounding issue is related to the rounding algorithm used by SQL Server. I believe SQL Server uses the "Round to Even" (sometimes known as Banker's Rounding) algorithm.
In Banker's Rounding, a digit gets rounded down if the digit immediately to the right of it is less than five, or rounded up if the digit immediately to the right of it is greater than five.
If the digit immediately to the right of it is exactly five, then the digit to the left of the five is rounded to the nearest even number.
In your example of 1890.124854, as the rounding begins at the right-most digit and works to the left, the 8 causes the 4 to the left of it to get rounded up to 5. That five has an even number (2) to the left of it, so, since it is already even, the 2 is left alone. Thus, rounding to two decimal places should yield 1890.12.
However, if your example was instead 1890.134854, then as the rounding works from right to left, the 8 rounds the 4 up to 5 and then the 3 next to the 5 gets rounded up to the next even number which is 4. The result of rounding to two decimal places should then yield 1890.14.
The theory is that 1890.125 is neither closer to 1890.12 nor to 1890.13; it is exactly in between. Therefore, always rounding the digit to the left of a 5 upward would give an undesired upward bias that can skew calculations toward an artificially high result. This upward bias becomes more exaggerated in complex calculations, or in those involving multiple iterations, where a five as the least significant digit may be encountered numerous times. In general calculations, however, the number to the left of a 5 is statistically just as likely to be odd as even. Because of this, rounding to the even number keeps the calculation statistically close to the true mean of the rounded numbers.
These days, almost everything uses this "Round to Even" algorithm. Many years ago, I used to develop in a programming language that didn't; it used the more "traditional" rounding where everything to the left of a 5 got rounded up, regardless of being odd or even. We ran into the biasing problem I mentioned above.
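For comparison, here is a minimal Python sketch of round-half-to-even, which the decimal module implements directly (this illustrates the algorithm described above, not what T-SQL ROUND itself does):

from decimal import Decimal, ROUND_HALF_EVEN

def bankers_round(value: str, places: int) -> Decimal:
    # Round half to even ("banker's rounding"): an exact trailing 5 rounds
    # toward the nearest even digit instead of always rounding up.
    return Decimal(value).quantize(Decimal(1).scaleb(-places), rounding=ROUND_HALF_EVEN)

print(bankers_round("1890.125", 2))  # 1890.12 (the 2 is already even)
print(bankers_round("1890.135", 2))  # 1890.14 (the 3 rounds up to the even digit 4)
print(round(0.5), round(1.5), round(2.5))  # 0 2 2 -- Python's built-in round() is also half-to-even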

Related

When to use decimals or doubles

Quick Aside: I'm going to use the word "Float" to refer to both a .Net float and a SQL float with only 7 significant digits. I will use the word "Double" to refer to a .Net double and a SQL float with 15 significant digits. I also realize that this is very similar to some other posts regarding decimals/doubles, but the answers on those posts are really inconsistent, and I really want some recommendations for my specific circumstance...
I am part of a team that is rewriting an old application. The original app used floats (7 digits). This of course caused issues since the app conducted a lot of calculations and rounding errors accumulated very quickly. At some point, many of these floats were changed to decimals. Later, the floats (7) in the database all became doubles (15). After that we had several more errors with calculations involving doubles, and they too were changed to decimals.
Today about 1/3 of all of our floating point numbers in the database are decimals, the rest are doubles. My team wants to "standardize" all of our floating-point numbers in the database (and the new .Net code) to use either exclusively decimals or doubles except in cases where the other MUST be used. The majority of the team is set on using decimals; I'm the only person on my team advocating using doubles instead of decimals. Here's why...
Most of the numbers in the database are still doubles (though much of the application code still uses floats), and it would be a lot more effort to change all of the floats/doubles to decimals
For our app, none of the fields stored are "exact" decimal quantities. None of them are monetary quantities, and most represent some sort of "natural" measurement (e.g. mass, length, volume, etc.), so a double's 16 significant digits are already way more precise than even our initial measurements.
Many tables have measurements stored in two columns: one for the value and one for the unit of measure. This can lead to a HUGE difference in scale between the values in a single column. For example, one column can store a value in terms of pCi/g or Ci/m3 (1 Ci = 1000000000000 pCi). Since all the values in a single decimal column must have the same scale (that is, an allocated number of digits both before and after the decimal point), I'm concerned that we will have overflow and rounding issues.
My teammates argue that:
Doubles are neither as accurate nor as precise as decimals, due to their inability to exactly represent 1/10 and because they only have 16 significant digits.
Even though we are not tracking money, the app is an inventory system that keeps track of material (mostly gram quantities) and it needs to be "as accurate as possible".
Even after the floats were changed to doubles, we continued to have bad results from calculations that used doubles. Changing these columns (and the application code) to decimals caused these calculations to produce the expected results.
It is my strong belief that the original issues were caused by floats having only 7 significant digits, and that simple arithmetic (e.g. 10001 * 10001) caused the data to quickly use up the few significant digits that it had. I do not believe this had anything to do with how binary floating-point numbers can only approximate decimal values, and I believe that using doubles would have fixed this issue.
I believe that the issue with doubles arose because doubles were used alongside decimals in calculations where values were converted back and forth between data types. Many of these calculations would also round between intermediate steps of the calculation!
I'm trying to convince my team not to make everything under the sun into a decimal. Most values in the database don't have more than 5 or 6 significant digits anyway. Unfortunately, I am out-ranked by other members of my team that see things rather differently.
So, my question then is...
Am I worrying over nothing? Is there any real harm done by using almost exclusively decimals instead of doubles in an application with nearly 200 database tables, hundreds of transactions, and a rewrite schedule of 5 to 6 years?
Is using decimals actually solving an issue that doubles could not? From my research, both decimals and doubles are susceptible to rounding errors involving arbitrary fractions (adding 1/3, for example), and the only way to account for this is to treat any value within a certain tolerance as "equal" when comparing doubles and/or decimals.
If it is more appropriate to use doubles, what arguments (other than those I have already made) could convince my team not to change everything to decimals?
Use decimal when you need perfect accuracy as a base-10 number (financial data, grades)
Use double or float when you are storing naturally imprecise data (measurements, temperature), want much faster mathematical operations, and can tolerate a minute amount of imprecision.
Since you seem to be only storing various measurements (which have some precision anyways), float would be the logical choice (or double if you need more than 7 digits of precision).
Is using decimals actually solving an issue that doubles could not?
Not really - The data is only going to be as accurate as the measurements used to generate the data. Can you really say that a measured quantity is 123.4567 grams? Does the equipment used to measure it have that level of precision?
To deal with "rounding errors" I would argue that you can't really say whether a measurement of 1234.5 grams is exactly halfway - it could just as easily be 1234.49 grams, which would round down anyways.
What you need to decide is "what level of precision is acceptable" and always round to that precision as a last step. Don't round your data or intermediate calculations.
If it is more appropriate to use doubles, what arguments could I make (other than what I have already made) could convince my team to not change everything to decimals?
Other than the time spent switching, the only thing you're really sacrificing is speed. The only way to know how much speed is to try it both ways and measure the difference.
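As a small illustration of "round only as the last step" and tolerance-based comparison, here is a plain Python sketch (the variable names and sample values are made up for the example):

import math

raw_mass_g = 1234.499973   # hypothetical measured value in grams
REPORT_DECIMALS = 1        # the precision we decided is acceptable

def report(value: float) -> float:
    # Round only when producing the final, reported value.
    return round(value, REPORT_DECIMALS)

total = raw_mass_g * 3 / 3          # intermediate math stays unrounded
print(report(total))                # 1234.5

# Compare floating-point values with a tolerance instead of ==
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True
print(0.1 + 0.2 == 0.3)                            # False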
You'd better try your best not to lose precision. Perhaps my own mistake will help convince you about choosing double.
I wrote some incorrect arithmetic, and it returned something very strange: given 0.60, it returns 5.
// Map a value in [0, 1] to a bucket index.
int get_index(double value) {
    if (value < 0 || value > 1.00)
        return -1;
    // Bug: 0.60 / 0.10 evaluates to 5.999... in binary floating point,
    // and the implicit conversion to int truncates that to 5.
    return value / 0.10;
}
and I fixed it:
int get_index(double value) {
    if (value < 0 || value > 1.00)
        return -1;
    // Scaling both operands first happens to round the products to exact
    // integers (60000000.0 and 10000000.0 for value = 0.60), so the
    // division now yields 6 as intended.
    return (value * 100000000) / (0.10 * 100000000);
}
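For what it's worth, the same truncation is easy to reproduce in Python:

value = 0.60
print(value / 0.10)              # 5.999999999999999
print(int(value / 0.10))         # 5 -- the truncation behind the "weird" result above
print(int(round(value / 0.10)))  # 6 -- rounding before truncating avoids it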

Handling variable DECIMAL data in SQL

I have a scheduled job that pulls data from our legacy system every month. The data can sometimes swell and shrink, which causes havoc for DECIMAL precision.
I just found this job failed because DECIMAL(5,3) was too restrictive. I changed it to DECIMAL(6,3) and life is back on track.
Is there any way to evaluate this shifting data so it doesn't break on the DECIMAL()?
Thanks,
-Allen
Is there any way to evaluate this shifting data so it doesn't break on the DECIMAL()
Find the maximum value your data can have and set the column size appropriately.
Decimal columns have two size factors: precision and scale. Set the scale to as many decimal places as you need (3 in your case), and set the precision so the column can hold the largest possible number you can have.
A DECIMAL(5,3) has three digits after the decimal point (scale 3) out of 5 total digits (precision 5), so it can store numbers up to 99.999. If your data can be 100 or larger, use a bigger precision.
If your data is scientific in nature (e.g. temperature readings) and you don't care about exact equality, only about showing trends, relative values, etc., then you might use real instead. It takes less space than a DECIMAL(5,3) (4 bytes vs. 5), has 7 digits of precision (vs. 5) and a range of -3.4E38 to 3.4E38 (vs. -99.999 to 99.999).
DECIMAL is more suited for financial data or other data where exact equality is important (i.e. rounding errors are bad)
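One way to "evaluate this shifting data" before it hits the table is to scan the incoming values and compute the precision and scale they actually need; a rough Python sketch (the function name and sample values are made up for illustration):

from decimal import Decimal

def required_precision_scale(values):
    # Return the (precision, scale) a DECIMAL column would need to hold these values.
    max_int_digits = 0
    max_frac_digits = 0
    for v in values:
        sign, digits, exponent = Decimal(str(v)).as_tuple()
        frac_digits = max(0, -exponent)               # digits after the decimal point
        int_digits = max(1, len(digits) + exponent)   # digits before the decimal point
        max_int_digits = max(max_int_digits, int_digits)
        max_frac_digits = max(max_frac_digits, frac_digits)
    return max_int_digits + max_frac_digits, max_frac_digits

print(required_precision_scale([12.345, 99.999, 123.4]))  # (6, 3) -> DECIMAL(6,3)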

MATLAB dealing with approximation-- singles to doubles

I am pulling financial data into Matlab from SQL, where it is unfortunately stored as a 'Real' (which is an approximate data-type).
For example, a value got loaded into SQL as "96.194" which is the correct value (this could have any number of decimals 1-5). I know in SQL it is stored as something like 96.19400024 because it is an approximation, but SQL Server somehow knows to display it as 96.194.
When I pull it into matlab, it gets pulled in as 96.194, which is what I want. Unfortunately, it turns out it's not actually 96.194, as demonstrated:
>>price
price =
96.194
>> price==96.194
ans =
0
>> class(price)
ans =
single
>> double(price)
ans =
96.1940002441406
So my question is: is there a way to convert a single to a double exactly as it appears as a single (i.e. truncate all the decimals which are the approximation)? Note: I cannot just round it because I don't know how many decimals it's supposed to have.
The vpa function lets you specify a number of significant (nonzero) digits that is different from the current digits setting. For example:
vpa(price, num_of_digits_required)
or in your case:
vpa(double(price),7)
(6 or 8 significant digits will yield the same result)
Edit
To use vpa you'll need the Symbolic Math Toolbox, there are alternatives found on the web, such as this FEX file.
Single precision floating point values have only about 7 digits of precision (23 bit fractional component, log10(2^24) ≈ 7.225 decimal digits) so you could round off all but the 7 most significant digits.
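Outside MATLAB, the same keep-only-about-7-significant-digits idea can be sketched in Python with NumPy (an illustration of the rounding approach, not of vpa; the helper name is made up):

import math
import numpy as np

def single_to_display_double(x: np.float32, sig_digits: int = 7) -> float:
    # Round the promoted double to the number of significant digits a
    # single can actually carry, discarding the approximation noise.
    d = float(x)
    if d == 0.0:
        return 0.0
    decimals = sig_digits - int(math.floor(math.log10(abs(d)))) - 1
    return round(d, decimals)

price = np.float32(96.194)
print(float(price))                               # 96.19400024414062
print(single_to_display_double(price))            # 96.194
print(single_to_display_double(price) == 96.194)  # True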

Print a number in decimal

Well, it is a low-level question.
Suppose I store a number (of course the computer stores the number in binary format).
How can I print it in decimal format? In a high-level program it is obvious: just print it and the library does it for you.
But what about a very low-level situation where I don't have this library?
I can only tell what 'character' to output. How do I convert the number into decimal characters?
I hope you understand my question. Thank You.
There are two ways of printing decimals: one for CPUs with division/remainder instructions (modern CPUs are like that) and one for CPUs where division is relatively slow (the 8-bit CPUs of 20+ years ago).
The first method is simple: integer-divide the number by ten and store the sequence of remainders in an array. Once you have divided the number all the way down to zero, print the remainders starting from the back, adding the ASCII code of zero ('0') to each remainder.
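A minimal Python sketch of that divide-and-collect-remainders method (assuming a non-negative integer):

def to_decimal_string(n: int) -> str:
    # Divide by ten repeatedly, collect the remainders, then emit them in reverse.
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, 10)
        digits.append(chr(ord('0') + remainder))  # remainder plus the ASCII code of '0'
    return "".join(reversed(digits))

print(to_decimal_string(1234))  # "1234"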
The second method relies on a lookup table of powers of ten. You define an array of numbers like this:
int pow10[] = {10000, 1000, 100, 10, 1};
Then you start with the largest power, and see if you can subtract it from the number at hand. If you can, keep subtracting it, and keep the count. Once you cannot subtract it without going negative, print the count plus the ASCII code of zero, and move on to the next smaller power of ten.
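The same idea in Python, counting repeated subtractions against the power-of-ten table instead of using a divide instruction:

POW10 = [10000, 1000, 100, 10, 1]

def to_decimal_string_no_divide(n: int) -> str:
    # For each power of ten, count how many times it can be subtracted.
    out = []
    for p in POW10:
        count = 0
        while n >= p:
            n -= p
            count += 1
        out.append(chr(ord('0') + count))
    return "".join(out).lstrip("0") or "0"  # drop leading zeros, keep a lone "0"

print(to_decimal_string_no_divide(1234))  # "1234"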
If integer, divide by ten, get both the result and the remainder. Repeat the process on the result until zero. The remainders will give you decimal digits from right to left. Add 48 for ASCII representation.
Basically, you want to transform a number (stored in some arbitrary internal representation) into its decimal representation. You can do this with a few simple mathematical operations. Let's assume that we have a positive number, say 1234.
number mod 10 gives you a value between 0 and 9 (4 in our example), which you can map to a character¹. This is the rightmost digit.
Divide by 10, discarding the remainder (an operation commonly called "integer division"): 1234 → 123.
number mod 10 now yields 3, the second-to-rightmost digit.
continue until number is zero.
Footnotes:
¹ This can be done with a simple switch statement with 10 cases. Of course, if your character set has the characters 0..9 in consecutive order (like ASCII), '0' + number suffices.
It doesn't matter what the number system is: decimal, binary, octal. Say I have the decimal value 123 on a decimal computer; I would still need to convert that value to three characters to display it. Let's assume ASCII format. By looking at an ASCII table we know the answer we are looking for: 0x31, 0x32, 0x33.
If you divide 123 by 10 using integer math you get 12. Multiply 12 * 10 and you get 120; the difference is 3, your least significant digit. We go back to the 12 and divide that by 10, giving 1. 1 times 10 is 10, and 12 - 10 is 2, our next digit. We take the 1 that is left over, divide it by 10, and get zero, so we know we are done. The digits we found, in order, are 3, 2, 1. Reverse the order: 1, 2, 3. Add or OR 0x30 to each to convert them from integers to ASCII.
Change that to use a variable instead of 123, and use any numbering system you like, so long as you allow for enough digits to do this kind of work.
You can go the other way too: divide by 100...000, whatever the largest decimal you can store or intend to find, and work your way down. In this case the first non-zero result comes with the divide by 100, giving 1; save the 1. 1 times 100 = 100, and 123 - 100 = 23. Now divide by 10: this gives 2; save the 2. 2 times 10 is 20, and 23 - 20 = 3. When you get to the divide by 1 you are done; save that value as your ones digit.
Here is another example: given a number of seconds to convert to, say, hours, minutes and seconds, you can divide by 60 and save the result as a; subtract the original number minus (a * 60), giving your remainder, which is the seconds; save that. Now take a and divide it by 60, saving that as b; this is your number of hours. Subtract a - (b * 60); this remainder is the minutes; save that. Done: hours, minutes, seconds. You can then divide the hours by 24 to get days if you want, and then divide the days by 7 if you want weeks.
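That seconds-to-hours/minutes/seconds walk-through takes only a few lines of Python:

def hms(total_seconds: int):
    # Each divmod step produces the next larger unit and the remainder.
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds

print(hms(3725))  # (1, 2, 5) -> 1 hour, 2 minutes, 5 seconds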
A comment about divide instructions was brought up. Divides are very expensive, and most processors do not have one. Expensive in that a single-clock divide costs you gates and power; if you do the divide over many clocks you might as well just do a software divide and save the gates. It is the same reason most processors don't have an FPU: gates and power (gates mean larger chips, more expensive chips, lower yield, etc.). It is not a case of modern versus old or 64-bit versus 8-bit or anything like that; it is an engineering and business trade-off. The 8088/86 has a divide with a remainder, for example (it also has a BCD add). The gates/size, if used, might be better served elsewhere than for a single instruction. Multiply falls into that category too; not as bad, but it can be. If operand sizes are not done right you can make either instruction (family) not very useful to a programmer. Which brings up another point: I can't find the link right now, but a way to avoid divides when converting a number to a string of decimal digits is to multiply by .1 using fixed point. I also can't find the quote about real programmers not needing floating point, related to keeping track of the decimal point yourself; it's the slide rule versus calculator thing. I believe the link to the article on dividing by 10 using a multiply is somewhere on Stack Overflow.

Why do I see -0.000000000000001 in an Access query?

I have this SQL:
SELECT Sum(Field1), Sum(Field2), Sum(Field1)+Sum(Field2)
FROM Table
GROUP BY DateField
HAVING Sum(Field1)+Sum(Field2)<>0;
The problem is that sometimes the sum of Field1 and Field2 is a value like 9.5 - 10.3, and the result is -0.800000000000001. Could anybody explain why this happens and how to solve it?
The problem is that sometimes the sum of Field1 and Field2 is a value like 9.5 - 10.3 and the result is -0.800000000000001. Could anybody explain why this happens and how to solve it?
Why this happens
The float and double types store numbers in base 2, not in base 10. Sometimes, a number can be exactly represented in a finite number of bits.
9.5 → 1001.1
And sometimes it can't.
10.3 → 1010.0 1001 1001 1001 1001 1001 1001 1001 1001...
In the latter case, the number will get rounded to the closest value that can be represented as a double:
1010.0100110011001100110011001100110011001100110011010 base 2
= 10.300000000000000710542735760100185871124267578125 base 10
When the subtraction is done in binary, you get:
-0.11001100110011001100110011001100110011001100110100000
= -0.800000000000000710542735760100185871124267578125
Output routines will usually hide most of the "noise" digits.
Python 3.1 rounds it to -0.8000000000000007
SQLite 3.6 rounds it to -0.800000000000001.
printf %g rounds it to -0.8.
Note that, even on systems that display the value as -0.8, it's not the same as the best double approximation of -0.8, which is:
- 0.11001100110011001100110011001100110011001100110011010
= -0.8000000000000000444089209850062616169452667236328125
So, in any programming language using double, the expression 9.5 - 10.3 == -0.8 will be false.
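A quick interactive Python session shows the same double arithmetic at work:

>>> 9.5 - 10.3
-0.8000000000000007
>>> 9.5 - 10.3 == -0.8
False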
The decimal non-solution
With questions like these, the most common answer is "use decimal arithmetic". This does indeed get better output in this particular example. Using Python's decimal.Decimal class:
>>> Decimal('9.5') - Decimal('10.3')
Decimal('-0.8')
However, you'll still have to deal with
>>> Decimal(1) / 3 * 3
Decimal('0.9999999999999999999999999999')
>>> Decimal(2).sqrt() ** 2
Decimal('1.999999999999999999999999999')
These may be more familiar rounding errors than the ones binary numbers have, but that doesn't make them less important.
In fact, binary fractions are more accurate than decimal fractions with the same number of bits, because of a combination of:
The hidden bit unique to base 2, and
The suboptimal radix economy of decimal.
It's also much faster (on PCs) because it has dedicated hardware.
There is nothing special about base ten. It's just an arbitrary choice based on the number of fingers we have.
It would be just as accurate to say that a newborn baby weighs 0x7.5 lb (in more familiar terms, 7 lb 5 oz) as to say that it weighs 7.3 lb. (Yes, there's a 0.2 oz difference between the two, but it's within tolerance.) In general, decimal provides no advantage in representing physical measurements.
Money is different
Unlike physical quantities which are measured to a certain level of precision, money is counted and thus an exact quantity. The quirk is that it's counted in multiples of 0.01 instead of multiples of 1 like most other discrete quantities.
If your "10.3" really means $10.30, then you should use a decimal number type to represent the value exactly.
(Unless you're working with historical stock prices from the days when they were in 1/16ths of a dollar, in which case binary is adequate anyway ;-) )
Otherwise, it's just a display issue.
You got an answer correct to 15 significant digits. That's correct for all practical purposes. If you just want to hide the "noise", use the SQL ROUND function.
I'm certain it is because the float data type (aka Double or Single in MS Access) is inexact. It is not like decimal which is a simple value scaled by a power of 10. If I'm remembering correctly, float values can have different denominators which means that they don't always convert back to base 10 exactly.
The cure is to change Field1 and Field2 from float/single/double to decimal or currency. If you give examples of the smallest and largest values you need to store, including the smallest and largest fractions needed such as 0.0001 or 0.9999, we can possibly advise you better.
Be aware that versions of Access before 2007 can have problems with ORDER BY on decimal values. Please read the comments on this post for some more perspective on this. In many cases, this would not be an issue for people, but in other cases it might be.
In general, float should be used for values that can end up being extremely small or large (smaller or larger than a decimal can hold). You need to understand that float maintains more accurate scale at the cost of some precision. That is, a decimal will overflow or underflow where a float can just keep on going. But the float only has a limited number of significant digits, whereas a decimal's digits are all significant.
If you can't change the column types, then in the meantime you can work around the problem by rounding your final calculation. Don't round until the very last possible moment.
Update
A criticism of my recommendation to use decimal has been leveled, not the point about unexpected ORDER BY results, but that float is overall more accurate with the same number of bits.
No contest to this fact. However, I think it is more common for people to be working with values that are in fact counted or are expected to be expressed in base ten. I see questions over and over in forums about what's wrong with their floating-point data types, and I don't see these same questions about decimal. That means to me that people should start off with decimal, and when they're ready for the leap to how and when to use float they can study up on it and start using it when they're competent.
In the meantime, while it may be a tad frustrating to have people always recommending decimal when you know it's not as accurate, don't let yourself get divorced from the real world where having more familiar rounding errors at the expense of very slightly reduced accuracy is of value.
Let me point out to my detractors that the example
Decimal(2).sqrt() ** 2 yielding 1.999999999999999999999999999
is, in what should be familiar words, "correct to 27 significant digits" which is "correct for all practical purposes."
So if we have two ways of doing what is practically speaking the same thing, and both of them can represent numbers very precisely out to a ludicrous number of significant digits, and both require rounding, but one of them has markedly more familiar rounding errors than the other, I can't accept that recommending the more familiar one is in any way bad. What is a beginner to make of a system where a calculation like 9.5 - 10.3 doesn't come back as exactly -0.8? He's going to get confused, and be stopped in his work while he tries to fathom it. Then he'll go ask for help on a message board, and get told the pat answer "use decimal". Then he'll be just fine for five more years, until he has grown enough to get curious one day and finally studies and really grasps what float is doing and becomes able to use it properly.
That said, in the final analysis I have to say that slamming me for recommending decimal seems just a little bit off in outer space.
Last, I would like to point out that the following statement is not strictly true, since it overgeneralizes:
The float and double types store numbers in base 2, not in base 10.
To be accurate, most modern systems store floating-point data types with a base of 2. But not all! Some use or have used base 10. For all I know, there are systems which use base 3 which is closer to e and thus has a more optimal radix economy than base 2 representations (as if that really mattered to 99.999% of all computer users). Additionally, saying "float and double types" could be a little misleading, since double IS float, but float isn't double. Float is short for floating-point, but Single and Double are float(ing point) subtypes which connote the total precision available. There are also the Single-Extended and Double-Extended floating point data types.
It is probably an effect of floating point number implementations. Sometimes numbers cannot be exactly represented, and sometimes the result of operations is slightly off what we may expect for the same reason.
The fix would be to use a rounding function on the values to cut off the extraneous digits. Like this (I've simply rounded to 4 decimal places, but of course you should use whatever precision is appropriate for your data):
SELECT Sum(Field1), Sum(Field2), Round(Sum(Field1)+Sum(Field2), 4)
FROM Table
GROUP BY DateField
HAVING Round(Sum(Field1)+Sum(Field2), 4)<>0;