Is there a difference in the length of a decimal? - SQL

I have a column that holds decimal values of 11 total digits (e.g. 0.4335687532, -0.4216776567); however, I created the datatype in SQL as decimal(18,15), which ultimately adds five 0s at the end of each value. The reason I did this is that new data could have more digits, but there is no indication that it will. I am wondering whether adding these extra decimal places will ultimately impact performance or anything else I might be missing.

The precision determines the storage size of the decimal value, not the scale. So, decimal(18, 5) and decimal(18, 15) typically occupy the same number of bytes on a data page.
The amount added no doubt depends on the database you are using -- each database is free to store decimals in its own format. But typically, the format would use one byte for every two decimal places and maybe another byte or two of overhead. So, the extra 5 zeros would typically require 2-3 more bytes of storage.
Such overhead is usually pretty minor when determining the behavior of a database. After all, the unit of reading and writing data is measured in data pages -- and these are usually measured in kilobytes to megabytes.
Depending on the system (hardware and software), arithmetic operations on slightly longer numerics might take slightly more time.
Overall, though, it is hard to think of a scenario where performance would be noticeably affected.
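For example, in SQL Server (other databases will differ), you can check the stored size directly; both declarations below report the same number of bytes because they share a precision of 18:
-- DATALENGTH reports the bytes used to store each value
SELECT DATALENGTH(CAST(0.4335687532 AS DECIMAL(18,5)))  AS bytes_with_scale_5,   -- 9
       DATALENGTH(CAST(0.4335687532 AS DECIMAL(18,15))) AS bytes_with_scale_15;  -- 9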

Related

When to use decimals or doubles

Quick Aside: I'm going to use the word "Float" to refer to both a .Net float and a SQL float with only 7 significant digits. I will use the word "Double" to refer to a .Net double and a SQL float with 15 significant digits. I also realize that this is very similar to some other posts regarding decimals/doubles, but the answers on those posts are really inconsistent, and I really want some recommendations for my specific circumstance...
I am part of a team that is rewriting an old application. The original app used floats (7 digits). This of course caused issues since the app conducted a lot of calculations and rounding errors accumulated very quickly. At some point, many of these floats were changed to decimals. Later, the floats (7) in the database all became doubles (15). After that we had several more errors with calculations involving doubles, and they too were changed to decimals.
Today about 1/3 of all of our floating point numbers in the database are decimals, the rest are doubles. My team wants to "standardize" all of our floating-point numbers in the database (and the new .Net code) to use either exclusively decimals or doubles except in cases where the other MUST be used. The majority of the team is set on using decimals; I'm the only person on my team advocating using doubles instead of decimals. Here's why...
Most of the numbers in the database are still doubles (though much of the application code still uses floats), and it would be a lot more effort to change all of the floats/doubles to decimals.
For our app, none of the fields stored are "exact" decimal quantities. None of them are monetary quantities, and most represent some sort of "natural" measurement (e.g. mass, length, volume, etc.), so a double's 16 significant digits are already way more precise than even our initial measurements.
Many tables have measurements stored in two columns: one for the value, one for the unit of measure. This can lead to a HUGE difference in scale between the values in a single column. For example, one column can store a value in terms of pCi/g or Ci/m3 (1 Ci = 1000000000000 pCi). Since all the values in a single decimal column must have the same scale (that is, an allocated number of digits both before and after the decimal point), I'm concerned that we will have overflow and rounding issues.
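To sketch the concern (SQL Server syntax, made-up column scale): a decimal sized for small pCi/g readings simply cannot hold the same kind of quantity once it arrives in Ci and gets converted to pCi.
-- DECIMAL(18,10) tops out at 99,999,999.9999999999
DECLARE @reading DECIMAL(18,10) = 0.0000001234;  -- fine for a small pCi/g value
SET @reading = 2.5 * 1000000000000;              -- 2.5 Ci expressed in pCi: arithmetic overflow error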
My teammates argue that:
Doubles are neither as accurate nor as precise as decimals, due to their inability to exactly represent 1/10 and because they only have 16 significant digits.
Even though we are not tracking money, the app is an inventory system that keeps track of material (mostly gram quantities) and it needs to be "as accurate as possible".
Even after the floats were changed to doubles, we continued to have bad results from calculations that used doubles. Changing these columns (and the application code) to decimals caused these calculations to produce the expected results.
It is my strong belief that the original issues were caused by floats having only 7 significant digits, and that simple arithmetic (e.g. 10001 * 10001) caused the data to quickly use up the few significant digits it had. I do not believe this had anything to do with how binary floating-point numbers can only approximate decimal values, and I believe that using doubles would have fixed this issue.
I believe that the issue with doubles arose because doubles were used alongside decimals in calculations where values were converted back and forth between the data types. Many of these calculations would round between intermediate steps in the calculation!
I'm trying to convince my team not to make everything under the sun into a decimal. Most values in the database don't have more than 5 or 6 significant digits anyway. Unfortunately, I am out-ranked by other members of my team that see things rather differently.
So, my question then is...
Am I worrying over nothing? Is there any real harm done by using almost exclusively decimals instead of doubles in an application with nearly 200 database tables, hundreds of transactions, and a rewrite schedule of 5 to 6 years?
Is using decimals actually solving an issue that doubles could not? From my research, both decimals and doubles are susceptible to rounding errors involving arbitrary fractions (adding 1/3, for example), and the only way to account for this is to treat any value within a certain tolerance as being "equal" when comparing doubles and/or decimals.
If it is more appropriate to use doubles, what arguments (other than those I have already made) could convince my team not to change everything to decimals?
Use decimal when you need perfect accuracy as a base-10 number (financial data, grades)
Use double or float when you are storing naturally imprecise data (measurements, temperature), want much faster mathematical operations, and can accept a minute amount of imprecision.
Since you seem to be only storing various measurements (which have some precision anyways), float would be the logical choice (or double if you need more than 7 digits of precision).
Is using decimals actually solving an issue that doubles could not?
Not really - The data is only going to be as accurate as the measurements used to generate the data. Can you really say that a measured quantity is 123.4567 grams? Does the equipment used to measure it have that level of precision?
To deal with "rounding errors" I would argue that you can't really say whether a measurement of 1234.5 grams is exactly halfway - it could just as easily be 1234.49 grams, which would round down anyways.
What you need to decide is "what level of precision is acceptable" and always round to that precision as a last step. Don't round your data or intermediate calculations.
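A quick illustration of rounding only as the last step (SQL Server syntax assumed, made-up numbers): rounding each intermediate result changes the final answer.
DECLARE @rate DECIMAL(10,6) = 0.123456;
SELECT ROUND(ROUND(100 * @rate, 2) * 3, 2) AS rounded_each_step,  -- 37.050000
       ROUND(100 * @rate * 3, 2)           AS rounded_at_end;     -- 37.040000 (the more accurate result)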
If it is more appropriate to use doubles, what arguments (other than those I have already made) could convince my team not to change everything to decimals?
Other than the time spent switching, the only thing you're really sacrificing is speed. The only way to know how much speed is to try it both ways and measure the difference.
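On the equality point from the question: whichever type you choose, don't test computed floating-point values for exact equality; compare within a tolerance. A minimal sketch (SQL Server syntax assumed):
DECLARE @a FLOAT = 0.1, @b FLOAT = 0.2;
SELECT CASE WHEN @a + @b = 0.3             THEN 'equal' ELSE 'not equal' END AS exact_compare,      -- 'not equal'
       CASE WHEN ABS(@a + @b - 0.3) < 1E-9 THEN 'equal' ELSE 'not equal' END AS tolerance_compare;  -- 'equal'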
You'd better try your best not to lose precision. I guess my own mistake may help convince you about choosing double.
I did some arithmetic wrong, and it returned something very weird:
given 0.60, it returns 5
int get_index(double value) {
    if (value < 0 || value > 1.00)
        return -1;
    // 0.60 / 0.10 evaluates to 5.999999999999999... because neither 0.60 nor 0.10
    // is exactly representable in binary; the implicit conversion to int truncates it to 5.
    return value / 0.10;
}
and I fixed it:
int get_index(double value) {
    if (value < 0 || value > 1.00)
        return -1;
    // Scaling both operands first makes them round to exact values (60000000.0 and
    // 10000000.0 for an input of 0.60), so the division yields 6.0 and truncation gives 6.
    return (value * 100000000) / (0.10 * 100000000);
}

Handling variable DECIMAL data in SQL

I have a scheduled job that pulls data from our legacy system every month. The data can sometimes swell and shrink. This causes havoc for DECIMAL precision.
I just found this job failed because DECIMAL(5,3) was too restrictive. I changed it to DECIMAL(6,3) and life is back on track.
Is there any way to evaluate this shifting data so it doesn't break on the DECIMAL()?
Thanks,
-Allen
Is there any way to evaluate this shifting data so it doesn't break on the DECIMAL()
Find the maximum value your data can have and set the column size appropriately.
Decimal columns have two size factors: precision and scale. Set the scale to as many decimal places as you need (3 in your case), and set the precision based on the largest possible number you can have.
A DECIMAL(5,3) has three digits after the decimal point (the scale) and 5 total digits (the precision), so it can store numbers up to 99.999. If your data can be 100 or larger, use a bigger precision.
If your data is scientific in nature (e.g. temperature readings) and you don't care about exact equality (only showing trends, relative values, etc.), then you might use real instead. It takes less space than a DECIMAL(5,3) (4 bytes vs. 5), has 7 digits of precision (vs. 5), and a range of -3.4E38 to 3.4E38 (vs. -99.999 to 99.999).
DECIMAL is more suited for financial data or other data where exact equality is important (i.e. rounding errors are bad)
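A sketch of that approach (hypothetical table and column names; the ALTER syntax shown is SQL Server's): profile the incoming data first, then widen the column with some headroom rather than waiting for the next failure.
-- How big does the column actually need to be?
SELECT MAX(ABS(amount)) AS max_abs_value
FROM   legacy_import;               -- hypothetical staging table

-- Widen with headroom; the scale stays at 3 decimal places
ALTER TABLE legacy_import
ALTER COLUMN amount DECIMAL(9,3);   -- now handles values up to 999,999.999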

Is varchar(128) better than varchar(100)

Quick question. Does it matter, from the point of view of storing data, whether I use round decimal field limits or powers of two (say 16, 32, 64 instead of 10, 20, 50)?
I ask because I wonder whether this has anything to do with clusters on the HDD.
Thanks!
VARCHAR(128) is better than VARCHAR(100) if you need to store strings longer than 100 bytes.
Otherwise, there is very little to choose between them; you should choose the one that better fits the maximum length of the data you might need to store. You won't be able to measure the performance difference between them. All else apart, the DBMS probably only stores the data you send, so if your average string is, say, 16 bytes, it will only use 16 (or, more likely, 17 - allowing 1 byte for storing the length) bytes on disk. The bigger size might affect the calculation of how many rows can fit on a page - detrimentally. So choosing the smallest size that is adequate makes sense - waste not, want not.
So, in summary, there is precious little difference between the two in terms of performance or disk usage, and aligning to convenient binary boundaries doesn't really make a difference.
If it were a C program, I'd spend some time thinking about that too. But with a database, I'd leave it to the DB engine.
Database developers have spent a lot of time thinking about the best memory layout, so just tell the database what you need and it will (usually) store the data in the way that suits the engine best.
If you want to align your data, you'll need exact knowledge of the internal data organization: How is the string stored? One, two, or four bytes to store the length? Is it stored as a plain byte sequence or encoded in UTF-8, UTF-16, or UTF-32? Does the DB need extra bytes to identify NULL or > MAXINT values? Maybe the string is stored as a NUL-terminated byte sequence - then one more byte is needed internally.
Also, with VARCHAR it is not necessarily true that the DB will always allocate 100 (or 128) bytes for your string. Maybe it just stores a pointer to where the space for the actual data is.
So I'd strongly suggest using VARCHAR(100) if that is your requirement. If the DB decides to align it somehow, there's room for extra internal data too.
The other way around: let's assume you use VARCHAR(128) and everything comes together: the DB allocates 128 bytes for your data, needs 2 more bytes to store the actual string length (130 bytes), and then aligns the data to the next (let's say 32-byte) boundary: the space actually needed on disk is now 160 bytes 8-}
Yes but it's not that simple. Sometimes 128 can be better than 100 and sometimes, it's the other way around.
So what is going on? varchar only allocates space as necessary, so if you store 'hello world' in a varchar(100) it will take exactly the same amount of space as in a varchar(128).
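You can see this directly (SQL Server shown; most databases behave the same way): the stored length is the string's actual length, not the declared maximum.
SELECT DATALENGTH(CAST('hello world' AS VARCHAR(100))) AS bytes_in_varchar_100,  -- 11
       DATALENGTH(CAST('hello world' AS VARCHAR(128))) AS bytes_in_varchar_128;  -- 11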
The question is: If you fill up the rows, will you hit a "block" limit/boundary or not?
Databases store their data in blocks. These have a fixed size, for example 512 bytes (this value can be configured for some databases). So the question is: How many blocks does the DB have to read to fetch each row? Rows that span several blocks will need more I/O, so this will slow you down.
But again: This doesn't depend on the theoretical maximum size of the columns but on a) how many columns you have (each column needs a little bit of space even when it's empty or null), b) how many fixed width columns you have (number/decimal, char), and finally c) how much data you have in variable columns.

SQL data type for physical science measurements - double or decimal

Someone mentioned in the post "decimal vs double! - Which one should I use and when?" that it's best to use double for physical science computations. Does this apply to measurements such as flash point, viscosity, weight, and volume? Can someone explain further?
Decimal = exact. That is, I have £1.23 in my pocket and applying VAT, say, is a known and accepted rounding issue.
When you measure something, you can never be exact. If something is 123 centimeters long then strictly speaking it's somewhere between 122.5 and 123.49999 centimeters long....
You are dealing with 2 different kinds of quantities.
So, use double for this.
The question you link to is about the data types in C#, not SQL, and the answers reflect that.
Decimal is a datatype used and created for dealing with currency, making sure calculations balance out (no lost cents or fractions of, for example).
Physical scientific computations rarely deal with money values, hence, you should use double whenever you need the accuracy.
Generally you would always use doubles for recording physical measurements unless you have a good reason to do otherwise. There is potentially a minuscule loss of accuracy when using any floating-point number, since binary is unable to perfectly represent certain decimal numbers (0.1 being the most obvious example), but the inaccuracy in the double representation is going to be many orders of magnitude smaller than the error in the measurements you take.
Decimal is used where it's very important that numbers are represented exactly so typically only when dealing with money (yes, it would seem we care more about money than science!)
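To see the (tiny) binary-representation error being discussed, here is a minimal sketch (SQL Server syntax assumed); the float result is off from 0.3 by far less than any instrument's measurement error.
SELECT CAST(0.1 AS FLOAT) * 3         AS float_result,    -- 0.30000000000000004 (may display as 0.3 in some clients)
       CAST(0.1 AS DECIMAL(10,2)) * 3 AS decimal_result;  -- 0.30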
Knowing which database this is for would make things easier...
Oracle only has NUMBER which, if you omit the two optional parameters precision and scale, behaves like a float. Using both parameters gives you a DECIMAL, and specifying only the precision gives you an INTEGER. Can't remember how to get a REAL...
MySQL Numeric data type info: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
SQL Server Numeric data type info: http://msdn.microsoft.com/en-us/library/aa258271(SQL.80).aspx
I haven't dealt with float and real much, but I've heard they aren't great for mathematical computations. I've used DECIMAL for varying precision, not just for monetary values.
What data type to use depends on the data, and how you intend to use that data.

Storing money in a decimal column - what precision and scale?

I'm using a decimal column to store money values on a database, and today I was wondering what precision and scale to use.
Since supposedly char columns of a fixed width are more efficient, I was thinking the same could be true for decimal columns. Is it?
And what precision and scale should I use? I was thinking precision 24/8. Is that overkill, not enough or ok?
This is what I've decided to do:
Store the conversion rates (when applicable) in the transaction table itself, as a float
Store the currency in the account table
The transaction amount will be a DECIMAL(19,4)
All calculations using a conversion rate will be handled by my application so I keep control of rounding issues
I don't think a float for the conversion rate is an issue, since it's mostly for reference, and I'll be casting it to a decimal anyway.
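As a sketch (table and column names are made up; SQL Server-ish syntax), the layout described above looks roughly like this:
CREATE TABLE account (
    account_id INT PRIMARY KEY,
    currency   CHAR(3) NOT NULL           -- ISO 4217 code, e.g. 'USD'
);

CREATE TABLE account_transaction (
    transaction_id  INT PRIMARY KEY,
    account_id      INT NOT NULL REFERENCES account (account_id),
    amount          DECIMAL(19,4) NOT NULL,
    conversion_rate FLOAT NULL             -- for reference only; cast to decimal before calculating
);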
Thank you all for your valuable input.
If you are looking for a one-size-fits-all, I'd suggest DECIMAL(19, 4) is a popular choice (a quick Google bears this out). I think this originates from the old VBA/Access/Jet Currency data type, being the first fixed point decimal type in the language; Decimal only came in 'version 1.0' style (i.e. not fully implemented) in VB6/VBA6/Jet 4.0.
The rule of thumb for storage of fixed point decimal values is to store at least one more decimal place than you actually require to allow for rounding. One of the reasons for mapping the old Currency type in the front end to DECIMAL(19, 4) type in the back end was that Currency exhibited bankers' rounding by nature, whereas DECIMAL(p, s) rounded by truncation.
An extra decimal place in storage for DECIMAL allows a custom rounding algorithm to be implemented rather than taking the vendor's default (and bankers' rounding is alarming, to say the least, for a designer expecting all values ending in .5 to round away from zero).
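A minimal sketch of that idea (SQL Server syntax assumed): store one more place than you report, then apply whatever rounding rule the business requires as an explicit step.
DECLARE @stored DECIMAL(19,5) = 12.34565;          -- one extra place beyond the 4 you report
SELECT ROUND(@stored, 4)    AS rounded_half_away,  -- 12.34570 (SQL Server's ROUND: half away from zero)
       ROUND(@stored, 4, 1) AS truncated;          -- 12.34560 (a non-zero third argument truncates)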
Yes, DECIMAL(24, 8) sounds like overkill to me. Most currencies are quoted to four or five decimal places. I know of situations where a decimal scale of 8 (or more) is required but this is where a 'normal' monetary amount (say four decimal places) has been pro rata'd, implying the decimal precision should be reduced accordingly (also consider a floating point type in such circumstances). And no one has that much money nowadays to require a decimal precision of 24 :)
However, rather than a one-size-fits-all approach, some research may be in order. Ask your designer or domain expert about accounting rules which may be applicable: GAAP, EU, etc. I vaguely recall some EU intra-state transfers with explicit rules for rounding to five decimal places, therefore using DECIMAL(p, 6) for storage. Accountants generally seem to favour four decimal places.
PS Avoid SQL Server's MONEY data type because it has serious issues with accuracy when rounding, among other considerations such as portability etc. See Aaron Bertrand's blog.
Microsoft and language designers chose banker's rounding because hardware designers chose it [citation?]. It is enshrined in the Institute of Electrical and Electronics Engineers (IEEE) standards, for example. And hardware designers chose it because mathematicians prefer it. See Wikipedia; to paraphrase: The 1906 edition of Probability and Theory of Errors called this 'the computer's rule' ("computers" meaning humans who perform computations).
We recently implemented a system that needs to handle values in multiple currencies and convert between them, and figured out a few things the hard way.
NEVER USE FLOATING POINT NUMBERS FOR MONEY
Floating point arithmetic introduces inaccuracies that may not be noticed until they've screwed something up. All values should be stored as either integers or fixed-decimal types, and if you choose to use a fixed-decimal type then make sure you understand exactly what that type does under the hood (i.e., does it internally use an integer or a floating-point type).
When you do need to do calculations or conversions:
Convert values to floating point
Calculate new value
Round the number and convert it back to an integer
When converting a floating point number back to an integer in step 3, don't just cast it - use a math function to round it first. This will usually be round, though in special cases it could be floor or ceil. Know the difference and choose carefully.
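For example (SQL Server syntax assumed; the same caveat applies in most languages), casting a float straight to an integer truncates, which is rarely what you want:
DECLARE @calculated FLOAT = 24.999999999999996;       -- a float result that "should" be 25
SELECT CAST(@calculated AS INT)           AS just_cast, -- 24: the cast truncates
       CAST(ROUND(@calculated, 0) AS INT) AS rounded;   -- 25: round first, then convert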
Store the type of a number alongside the value
This may not be as important for you if you're only handling one currency, but it was important for us in handling multiple currencies. We used the 3-character code for a currency, such as USD, GBP, JPY, EUR, etc.
Depending on the situation, it may also be helpful to store:
Whether the number is before or after tax (and what the tax rate was)
Whether the number is the result of a conversion (and what it was converted from)
Know the accuracy bounds of the numbers you're dealing with
For real values, you want to be as precise as the smallest unit of the currency. This means you have no values smaller than a cent, a penny, a yen, a fen, etc. Don't store values with higher accuracy than that for no reason.
Internally, you may choose to deal with smaller values, in which case that's a different type of currency value. Make sure your code knows which is which and doesn't get them mixed up. Avoid using floating point values even here.
Putting all those rules together, we decided on the following approach. In running code, currency amounts are stored using an integer for the smallest unit.
class Currency {
    String code;        // e.g. "USD"
    int value;          // amount in the currency's smallest unit, e.g. 2500 = $25.00
    boolean converted;  // true if this value is the result of a currency conversion
}

class Price {
    Currency grossValue;
    Currency netValue;
    Tax taxRate;
}
In the database, the values are stored as a string in the following format:
USD:2500
That stores the value of $25.00. We were able to do that only because the code that deals with currencies doesn't need to be within the database layer itself, so all values can be converted into memory first. Other situations will no doubt lend themselves to other solutions.
And in case I didn't make it clear earlier, don't use float!
When handling money in MySQL, use DECIMAL(13,2) if you know the precision of your money values or use DOUBLE if you just want a quick good-enough approximate value.
So if your application needs to handle money values up to about a hundred billion dollars (or euros or pounds), then this should work:
DECIMAL(13, 2)
Or, if you need to comply with GAAP then use:
DECIMAL(13, 4)
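For instance, the first option as a column definition (MySQL syntax, hypothetical table name):
CREATE TABLE invoice (
    invoice_id INT PRIMARY KEY,
    total      DECIMAL(13,2) NOT NULL   -- up to 99,999,999,999.99
);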
The money datatype on SQL Server has four digits after the decimal.
From SQL Server 2000 Books Online:
Monetary data represents positive or negative amounts of money. In Microsoft® SQL Server™ 2000, monetary data is stored using the money and smallmoney data types. Monetary data can be stored to an accuracy of four decimal places. Use the money data type to store values in the range from -922,337,203,685,477.5808 through +922,337,203,685,477.5807 (requires 8 bytes to store a value). Use the smallmoney data type to store values in the range from -214,748.3648 through 214,748.3647 (requires 4 bytes to store a value). If a greater number of decimal places are required, use the decimal data type instead.
4 decimal places would give you the accuracy to store the world's smallest currency sub-units. You can take it down further if you need micropayment (nanopayment?!) accuracy.
I too prefer DECIMAL to DBMS-specific money types; you're safer keeping that kind of logic in the application, IMO. Another approach along the same lines is simply to use a [long] integer, with formatting into ¤unit.subunit for human readability (¤ = currency symbol) done at the application level.
If you were using IBM Informix Dynamic Server, you would have a MONEY type, which is a minor variant on the DECIMAL or NUMERIC type. It is always a fixed-point type (whereas DECIMAL can be a floating-point type). You can specify a precision from 1 to 32 and a scale from 0 to 32 (defaulting to a precision of 16 and a scale of 2). So, depending on what you need to store, you might use DECIMAL(16,2) - still big enough to hold the US Federal Deficit, to the nearest cent - or you might use a smaller range, or more decimal places.
Sometimes you will need to go to less than a cent, and there are international currencies that use very large denominations. For example, you might charge your customers 0.088 cents per transaction. In my Oracle database the columns are defined as NUMBER(20,4).
If you're going to be doing any sort of arithmetic operations in the DB (multiplying out billing rates and so on), you'll probably want a lot more precision than people here are suggesting, for the same reasons that you'd never want to use anything less than a double-precision floating point value in application code.
I would think that, for the most part, your or your client's requirements should dictate what precision and scale to use. For example, for the e-commerce website I am working on that deals with money in GBP only, I have been required to keep it to DECIMAL(6, 2).
A late answer here, but I've used
DECIMAL(13,2)
which, if I'm thinking about it right, should allow values up to 99,999,999,999.99.