Handling variable DECIMAL data in SQL

I have a scheduled job that pulls data from our legacy system every month. The data can sometimes swell and shrink, which causes havoc for DECIMAL precision.
I just found that this job failed because DECIMAL(5,3) was too restrictive. I changed it to DECIMAL(6,3) and life is back on track.
Is there any way to evaluate this shifting data so it doesn't break on the DECIMAL()?
Thanks,
-Allen

Is there any way to evaluate this shifting data so it doesn't break on the DECIMAL()
Find the maximum value your data can have and set the column size appropriately.
Decimal columns have two size factors: precision and scale. Set the scale to as many decimal places as you need (3 in your case), and set the precision based on the largest possible number you have to store.
A DECIMAL(5,3) has 5 total digits (precision), 3 of which sit after the decimal point (scale), so it can store numbers up to 99.999. If your data can be 100 or larger, use a bigger precision.
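For example (a minimal sketch, assuming a hypothetical staging table named LegacyStaging with an Amount column), you could profile each incoming batch before the load and widen the column ahead of time while keeping the 3 decimal places:

-- Largest magnitude in the incoming data; DECIMAL(5,3) tops out at 99.999.
SELECT MAX(ABS(Amount)) AS max_magnitude
FROM LegacyStaging;

-- Keep the scale of 3 but raise the precision to make room for larger values.
ALTER TABLE LegacyStaging
    ALTER COLUMN Amount DECIMAL(9,3);   -- now up to 999,999.999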
If your data is scientific in nature (e.g. temperature readings) and you don't care about exact equality, only about showing trends, relative values, etc., then you might use real instead. It takes less space than a DECIMAL(5,3) (4 bytes vs. 5), has 7 digits of precision (vs. 5), and a range of -3.4E38 to 3.4E38 (vs. -99.999 to 99.999).
DECIMAL is better suited for financial data or other data where exact equality is important (i.e. where rounding errors are unacceptable).

Related

Why when converting SQL Real to Numeric does the scale slightly increase?

I'm storing a value (0.15) as a Real datatype in a Quantity field in SQL.
Just playing around, when I cast as numeric, there are some very slight changes to scale.
I'm unsure why this occurs, and why these particular numbers?
select CAST(Quantity AS numeric(18,18)) -- Quantity being 0.15
returns
0.150000005960464480
Real and float are approximate numerics, not exact ones. If you need exact ones, use DECIMAL.
The benefit of the approximate ones is that they allow storing very large numbers using fewer storage bytes.
https://learn.microsoft.com/en-us/sql/t-sql/data-types/float-and-real-transact-sql?view=sql-server-2017
PS: NUMERIC and DECIMAL are synonyms.
PS2: See Eric Postpischil's clarifying comment below:
"Float and real represent a number as a significand multiplied by a power of two. decimal represents a number as a significand multiplied by a power of ten. Both means of representation are incapable of representing all real numbers, and both means of representation are subject to rounding errors. As I wrote, dividing 1 by 3 in a decimal format will have a rounding error"

Is there a difference between length of a decimal?

I have a column whose values have 11 total digits (0.4335687532, -0.4216776567); however, I created the datatype in SQL as decimal(18,15), which ultimately adds five 0s at the end of the decimal. The reason I did this was that new data could have more digits, but there is no indication that it will. I am wondering if adding these extra decimal places will ultimately impact performance or anything else I might be missing.
The precision determines the length of the decimal value, not the scale. So, decimal(18, 5) and decimal(18, 15) typically occupy the same number of bytes on a data page.
The amount added no doubt depends on the database you are using -- each database is free to store decimals in its own format. But typically, the format would use one byte for every two decimal places and maybe another byte or two of overhead. So, the extra 5 zeros would typically require 2-3 more bytes of storage.
Such overhead is usually pretty minor when determining the behavior of a database. After all, the unit of reading and writing data is measured in data pages -- and these are usually measured in kilobytes to megabytes.
Depending on the system (hardware and software), arithmetic operations on slightly longer numerics might take slightly more time.
Overall, though, it is hard to think of a scenario where performance would be noticeably affected.
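If you want to see this on SQL Server specifically, DATALENGTH reports how many bytes a value occupies; with the same precision of 18, both scales land in the same storage band (a quick check, not part of the original answer):

DECLARE @narrow decimal(18,5)  = 0.43357;
DECLARE @wide   decimal(18,15) = 0.433568753200000;

-- Both report 9 bytes: precision 10-19 is stored in 9 bytes regardless of scale.
SELECT DATALENGTH(@narrow) AS bytes_18_5,
       DATALENGTH(@wide)   AS bytes_18_15;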

When to use decimals or doubles

Quick Aside: I'm going to use the word "Float" to refer to both a .Net float and a SQL float with only 7 significant digits. I will use the word "Double" to refer to a .Net double and a SQL float with 15 significant digits. I also realize that this is very similar to some other posts regarding decimals/doubles, but the answers on those posts are really inconsistent, and I really want some recommendations for my specific circumstance...
I am part of a team that is rewriting an old application. The original app used floats (7 digits). This of course caused issues since the app conducted a lot of calculations and rounding errors accumulated very quickly. At some point, many of these floats were changed to decimals. Later, the floats (7) in the database all became doubles (15). After that we had several more errors with calculations involving doubles, and they too were changed to decimals.
Today about 1/3 of all of our floating point numbers in the database are decimals, the rest are doubles. My team wants to "standardize" all of our floating-point numbers in the database (and the new .Net code) to use either exclusively decimals or doubles except in cases where the other MUST be used. The majority of the team is set on using decimals; I'm the only person on my team advocating using doubles instead of decimals. Here's why...
Most of the numbers in the database are still doubles (though much of the application code still uses floats), and it would be a lot more effort to change all of the floats/doubles to decimals
For our app, none of the fields stored are "exact" decimal quantities. None of them are monetary quantities, and most represent some sort of "natural" measurement (e.g. mass, length, volume, etc.), so a double's 16 significant digits are already way more precise than even our initial measurements.
Many tables have measurements stored in two columns: 1 for the value; 1 for the unit of measure. This can lead to a HUGE difference in scale between the values in a single column. For example, one column can store a value in terms of pCi/g or Ci/m3 (1 Ci = 1000000000000 pCi). Since all the values in a single decimal column must have the same scale (that is, an allocated number of digits both before and after the decimal point), I'm concerned that we will have overflow and rounding issues.
My teammates argue that:
Doubles are not as accurate nor as precise as decimals, due to their inability to exactly represent 1/10 and the fact that they only have 16 significant digits.
Even though we are not tracking money, the app is an inventory system that keeps track of material (mostly gram quantities) and it needs to be "as accurate as possible".
Even after the floats were changed to doubles, we continued to have bad results from calculations that used doubles. Changing these columns (and the application code) to decimals caused these calculations to produce the expected results.
It is my strong belief that the original issues were caused by floats only having 7 significant digits, and that simple arithmetic (e.g. 10001 * 10001) caused the data to quickly use up the few significant digits it had. I do not believe this had anything to do with how binary floating-point numbers can only approximate decimal values, and I believe that using doubles would have fixed this issue.
I believe that the issue with doubles arose because doubles were used alongside decimals in calculations where values were converted back and forth between data types. Many of these calculations would round between intermediary steps in the calculation!
I'm trying to convince my team not to make everything under the sun into a decimal. Most values in the database don't have more than 5 or 6 significant digits anyway. Unfortunately, I am out-ranked by other members of my team that see things rather differently.
So, my question then is...
Am I worrying over nothing? Is there any real harm done by using almost exclusively decimals instead of doubles in an application with nearly 200 database tables, hundreds of transactions, and a rewrite schedule of 5 to 6 years?
Is using decimals actually solving an issue that doubles could not? From my research, both decimals and doubles are susceptible to rounding errors involving arbitrary fractions (adding 1/3, for example), and the only way to account for this is to consider any value within a certain tolerance as being "equal" when comparing doubles and/or decimals.
If it is more appropriate to use doubles, what arguments could I make (other than what I have already made) to convince my team not to change everything to decimals?
Use decimal when you need perfect accuracy as a base-10 number (financial data, grades)
Use double or float when you are storing naturally imprecise data (measurements, temperature), want much faster mathematical operations, and can tolerate a minute amount of imprecision.
Since you seem to be only storing various measurements (which have some precision anyways), float would be the logical choice (or double if you need more than 7 digits of precision).
Is using decimals actually solving an issue that doubles could not?
Not really - The data is only going to be as accurate as the measurements used to generate the data. Can you really say that a measured quantity is 123.4567 grams? Does the equipment used to measure it have that level of precision?
To deal with "rounding errors" I would argue that you can't really say whether a measurement of 1234.5 grams is exactly halfway - it could just as easily be 1234.49 grams, which would round down anyways.
What you need to decide is "what level of precision is acceptable" and always round to that precision as a last step. Don't round your data or intermediate calculations.
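As a sketch of that last point (hypothetical values and names, not from the original answer): carry full precision through the math and apply ROUND once, when the result is reported.

DECLARE @mass_g float = 1234.4567;
DECLARE @factor float = 0.3333333;

-- Round once, as the last step, to the precision you have decided is acceptable.
SELECT ROUND(@mass_g * @factor, 2) AS reported_value;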
If it is more appropriate to use doubles, what arguments could I make (other than what I have already made) to convince my team not to change everything to decimals?
Other than the time spent switching, the only thing you're really sacrificing is speed. The only way to know how much speed is to try it both ways and measure the difference.
Try your best not to lose precision. Perhaps my own mistake will help convince you about choosing double.
I wrote some arithmetic that returns something very weird:
given 0.60, it returns 5
int get_index(double value) {
    if (value < 0 || value > 1.00)
        return -1;
    // 0.60 / 0.10 evaluates to just under 6 (5.999...) in binary floating point,
    // and the implicit conversion to int truncates it to 5.
    return value / 0.10;
}
and I fixed it:
int get_index(double value) {
    if (value < 0 || value > 1.00)
        return -1;
    // After scaling, both products round to exact integers (60000000 and 10000000),
    // so the quotient is exactly 6 for this input.
    return (value * 100000000) / (0.10 * 100000000);
}
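For what it's worth, the same truncation shows up in T-SQL if you let float do the division and then convert to int (an illustrative query, not from the original post):

-- 0.6 / 0.1 in binary floating point is just below 6, and CAST to int truncates.
SELECT CAST(CAST(0.6 AS float) / CAST(0.1 AS float) AS int) AS bucket;  -- returns 5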

Proper Data Type in SQL Server to store Scientific Notation value? (Ex. 10^3)

Say I have test result values for a lab procedure that come in as 10^3. What would be the best way to store this in SQL Server? I would think since this is numerical data it would be improper to just store it as string text and then program around calculating the data value from the string.
If you want to use your data in numeric calculations, it is probably best to represent your data using one of SQL Server's native numeric data types. Since you show scientific notation, it is likely you will want to use either REAL or FLOAT.
Real is basically 7 decimal digits of precision and float has 15 digits of precision (at least this is how they are normally used). You can actually specify reduced precision for FLOAT, but in practice most people just use REAL in that case. REAL takes 4 bytes of storage, and FLOAT requires 8 bytes.
The other numeric types are for fixed decimal point arithmetic.
Numbers in scientific notation like this have three pieces of information:
The significand
The precision of the significand
The exponent of 10
Presuming we want to keep all this information as exact as possible, it may be best to store these in three non-floating point columns (floating-point values are inexact):
DECIMAL significand
INT precision (# of decimal places)
INT exponent
The downside to the approach of separating these parts out, of course, is that you'll have to put the values back together when doing calculations -- but by doing that you'll know the correct number of significant figures for the result. Storing these three parts will also take up 25 bytes per value (17 for the DECIMAL, and 4 each for the two INTs), which may be a concern if you're storing a very large quantity of values.
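A rough sketch of that three-part layout (the table and column names are made up; a DECIMAL in the 29-38 precision band is what gives the 17-byte figure above):

CREATE TABLE LabReading (
    Significand decimal(38,19) NOT NULL,  -- the measured digits
    SigDigits   int            NOT NULL,  -- how many of those digits are significant
    Exponent    int            NOT NULL   -- power of ten
);

-- Recombine when a calculation needs the numeric value (this goes through float,
-- so the recombined value is approximate):
SELECT Significand * POWER(CAST(10 AS float), Exponent) AS approx_value
FROM LabReading;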
Update per explanatory comments:
Given that your goal is to store an exponent from 1-8, you really only need to store the exponent, since you know the base is always 10. Therefore, if your value is always going to be a whole number, you can just use a single INT column; if it will have decimal places, you can use a FLOAT or REAL per Gary Walker, or use a DECIMAL to store a precise decimal to a specified number of places.
If you specify a DECIMAL, you can provide two arguments in the column type; the first is the total number of digits to be stored, while the second is the number of digits to the right of the decimal point. So if your values are going to be accurate to the tenths place, you might create a column of DECIMAL(2,1). SQL Server MSDN documentation: DECIMAL and NUMERIC types
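Putting that together (a sketch with hypothetical table names, assuming the values are either plain powers of ten or decimals accurate to the tenths place):

-- Option 1: the value is always 10^n, so only the exponent needs to be stored.
CREATE TABLE TestResultExponent (
    ResultId int IDENTITY(1,1) PRIMARY KEY,
    Exponent int NOT NULL                  -- e.g. 3 means 10^3
);

-- Option 2: the value has decimal places and is accurate to the tenths place.
CREATE TABLE TestResultDecimal (
    ResultId    int IDENTITY(1,1) PRIMARY KEY,
    ResultValue decimal(2,1) NOT NULL      -- two digits total, one after the decimal point
);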

Sql Server 2005 Data Types

What is the difference between real, float, decimal, and money? And most importantly, when would I use them? As I understand it, real and float are approximate types, meaning they don't store the exact value. Why would you ever want this?
Thanks
real and float numeric types are useful to handle a very wide range of values as is encountered with physical dimensions or mathematical results.
The loss of precision they incur, for example when adding values that are not in the same range (say 0.00002468 + 1.23E9, i.e. 1,230,000,000), is typically acceptable for practical uses. This is a small price to pay for the relatively compact storage requirements of these floating-point types.
The decimal and money types do not cover such a broad range (yet they cover ranges that are beyond most typical accounting applications), and do not exhibit any of this lossy behavior with rounding and such.
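To make that loss of precision concrete (a quick illustration, not part of the original answer): adding a very small real to a very large one simply drops the small term, while the same sum in decimal keeps it.

DECLARE @tiny real = 0.00002468;
DECLARE @huge real = 1230000000;          -- 1.23E9

SELECT @huge + @tiny AS sum_real;         -- 1.23E+09: the small addend vanishes at real's ~7 significant digits

DECLARE @tiny_d decimal(20,8) = 0.00002468;
DECLARE @huge_d decimal(20,8) = 1230000000;

SELECT @huge_d + @tiny_d AS sum_decimal;  -- 1230000000.00002468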
See the MS SQL documentation for detailed information. The following table gives indicative precision, range, and storage requirements for the various types.

Type         Max value                  Precision (*)    Storage
money        +/- 922,000,000,000,000    4                8 bytes
smallmoney   +/- 214,748                4                4 bytes
decimal      varies (as defined)        varies           5 to 17 bytes
real         +/- 3.4 * 10^38            7 digits         4 bytes
float(53)    +/- 1.7 * 10^308           15 digits        8 bytes

(float can also be declared as float(24), which behaves just like real)
(*) Precision: for the "exact" types, this is the number of digits after the decimal point; for the "lossy" reals and floats, this is the number of significant digits.
Money is an exact data type, in the sense that every representable value between its upper and lower bounds is stored exactly (to four decimal places). You would generally use it when you want to store monetary values and don't want to lose precision to the rounding errors caused by IEEE 754. Decimal is similarly an exact data type that isn't lossy up to a certain number of decimal places (which you can specify). Real is equivalent to float(24).
To be clear, precision loss can still occur when using division, but all other basic mathematical operations do not incur precision loss for Money and decimal types.
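Division is the easiest place to see the difference (illustrative only): money keeps just four decimal places, so one third of a unit loses the rest, while decimal keeps the wider scale the division rules produce.

DECLARE @m money         = 1.00;
DECLARE @d decimal(19,4) = 1.00;

SELECT @m / 3 AS money_third,     -- 0.3333
       @d / 3 AS decimal_third;   -- 0.333333333333333 (result scale widened by SQL Server's division rules)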
See here for an explanation of the various Transact SQL data types.