Database storage of longitude/latitude values in SQL Server: decimal(2, ?) - sql

In the table definition I saw:
Latitude -> varchar(50)
Longitude -> nvarchar(50)
Immediately, obviously, I queried the thinking behind this - being positively sure these values are in fact numerical by nature. Long story short: I postulated that these will be numerical, decimal in fact, and we would discard the 'thinking-in-strings' philosophy.
Now for the horns of my dilemma, I just went ahead and typed:
Latitude -> decimal(2, 4)
But hold on a second, 4 ain't right, right? Right. So I thought I'd up the threshold before realising (in a split second might I add) that 6 or 8 might not not cut it either. So, first things first...
Am I right in insisting we even go about it this way? And if so...
To what precision ought these values be stored to ensure we can persist the entire value which is to be inserted? For example, is there anything predefined by specification?
I don't just want to use something like Latitude -> decimal(2, 16) simply for it to be just as flawed as decimal(2, 2) in principle. And a similar question arises for Longitude specifically but I'm assuming the answer to one will suffice for the other, i.e decimal(3, answer).
We are using MSSQL Server 2005.
It seems I am educating myself with SQL Server by manual experience and therefore rendering parts of this question irrelevant: I can only use decimal(x, max(x)) not decimal(x, y) anyway! Will leave the question as is for input.

Decimal(2, 4) means 2 total digits of precision and 4 behind the decimal. SQL Server won't let you do that, but I think it would means you can store values from -0.0099 to 0.0099.
I'd recommend decimal(9, 6). This will store with an accuracy down to about 1/6th of an inch at the equator. Using 9 or less as the precision requires 5 bytes of storage, using 10-19 requires 9 bytes.

The maximum precision of decimals in sql server is currently 38. The scale can be up to 38. At it's max, a decimal will take up 17 bites, where as a varchar takes up whatever the length is plus 2. So if you went with a varchar(38), at it's max you're taking up 40 bits of data. The flip side is that a varchar is not as limited in size as a decimal is. So what you really need to do is figure out how many decimal points you're going to allow and then figure out your data type for it.
Source Info

Related

Decimal(19,4) or Decimal(19.2) - which should I use?

This sounds like a silly question, but I've noticed that in a lot of table designs for e-commerce related projects I almost always see decimal(19, 4) being used for currency.
Why the 4 on scale? Why not 2?
Perhaps I'm missing a potential calculation issue down the road?
First off - you are receiving some incorrect advice from other answers. Obseve the following (64-bit OS on 64-bit architecture):
declare #op1 decimal(18,2) = 0.01
,#op2 decimal(18,2) = 0.01;
select result = #op1 * #op2;
result
---------.---------.---------.---------
0.0001
(1 row(s) affected)
Note the number of underscores underneath the title - 39 in all. (I changed every tenth to a period to aid counting.) That is precisely enough for 38 digits (the maximum allowable, and the default on a 64 bit CPU) plus a decimal point on display. Although both operands were declared as decimal(18,2) the calculation was performed, and reported, in decimal(38,4) datatype. (I am running SQL 2012 on a 64 bit machine - some details may vary based on machine architecture and OS.)
Therefore, it is clear that no precision is being lost. On the contrary, only overflow can occur, not precision loss. This is a direct consequence of all calculations on decimal operands being performed as integer arithmetic. You will occasionally see artifacts of this in intelli-sense when the type of intermediate fields of decimal type are reported as being int instead.
Consider the example above. The two operands are both of type decimal(18,2) and are stored as being integers of value 1, with a scale of 2. When multiplied the product is still 1, but the scale is evaluated by adding the scales, to create a result of integer value 1 and scale 4, which is a value of 0.0001 and of type decimal(18,4), stored as an integer with value 1 and scale 4.
Read that last paragraph again.
Rinse and repeat once more.
In practice, on a 64 bit machine and OS, this is actually stored and carried forward as being of type *decimal (38,4) because the calculations are being done on a CPU where the extra bits are free.
To return to your question - All major currencies of the world (that I am aware of) only require 2 decimal places, but there are a handful where 4 are required, and there are financial transactions such as currency transactions and bond sales where 4 decimal places are mandated by law. When devising the money datatype Microsoft appears to have opted for the maximum scale that might be required rather than the normal scale required. Given how few transactions, and corporations, actually require precision greater than 19 digits this seems eminently sensible.
If you have:
A high expectation of only dealing with major currencies (which at the current time only require 2 digits of scale); and
No expectation of dealing with transactions that are mandated by law to require 4 digits of scale
then you would be safe to use type decimal with scale 2 (such as decimal(19,2) or decimal(18,2) or decimal(38,2)) instead of money. This will ease some of your conversions and, given the assumptions above, have no cost. A typical case where these assumptions are met is in a GL or Subledger accounting system tracking transactions to the penny. However, a stock- or bond-trading system would not meet these assumptions because 4 digits of scale are mandated by law in those case.
A way to distinguish the two cases is whether transactions are reported in cents or percents, which only require 2 digits of scale, or in basis points which require 4 digits of scale.
If you are at all unsure as to which case applies to your programming circumstance, consult your Controller or Director of Finance as to the legal and GAAP requirements for your application. (S)he will be able to give you definitive advice.
In SQL the 19 is amount of integers, the 4 is amount of decimals.
If you only have 2 decimals and you store maybe a result of some calculations, which results in more than 2 decimals, theres "no way" to store those additional decimals.
Some currencies operates with more than 2 decimals.
Use the data type decimal, not money.
Things like gas prices would use the extra "scale" positions. You've seen gas at $1.959 per gallon, right?
When use decimal it's up to you how you want to use according to your business requirements.
But when you will use Money data type in sql by default it stores with 4 decimal places.
although the OP's question is about the scale, let's lament on why 19 is a popular precision for decimal on SQL server.
according to this document, this is how much storage a decimal uses:
Precision
Storage bytes
1 - 9
5
10-19
9
20-28
13
29-38
17
so 1 precision uses as much space as 9, and 10 uses as much as 19.
in a real world scenario 9 can easily be too little for money, especially if you opt for a scale of 4, leaving you between -99999.9999 and 99999.9999.
but 19 is plenty for any imaginable cases, that's why SQL Server's money data type uses that.
one can use 28 or 38 to prevent errors at conversions in case some erroneous data hides in the database.

precision gains where data move from one table to another in sql server

There are three tables in our sql server 2008
transact_orders
transact_shipments
transact_child_orders.
Three of them have a common column carrying_cost. Data type is same in all the three tables.It is float with NUMERIC_PRECISION 53 and NUMERIC_PRECISION_RADIX 2.
In table 1 - transact_orders this column has value 5.1 for three rows. convert(decimal(20,15), carrying_cost) returns 5.100000..... here.
Table 2 - transact_shipments three rows are fetching carrying_cost from those three rows in transact_orders.
convert(decimal(20,15), carrying_cost) returns 5.100000..... here also.
Table 3 - transact_child_orders is summing up those three carrying costs from transact_shipments. And the value shown there is 15.3 when I run a normal select.
But convert(decimal(20,15), carrying_cost) returns 15.299999999999999 in this stable. And its showing that precision gained value in ui also. Though ui is only fetching the value, not doing any conversion. In the java code the variable which is fetching the value from the db is defined as double.
The code in step 3, to sum up the three carrying_costs is simple ::
...sum(isnull(transact_shipments.carrying_costs,0)) sum_carrying_costs,...
Any idea why this change occurs in the third step ? Any help will be appreciated. Please let me know if any more information is needed.
Rather than post a bunch of comments, I'll write an answer.
Floats are not suitable for precise values where you can't accept rounding errors - For example, finance.
Floats can scale from very small numbers, to very high numbers. But they don't do that without losing a degree of accuracy. You can look the details up on line, there is a host of good work out there for you to read.
But, simplistically, it's because they're true binary numbers - some decimal numbers just can't be represented as a binary value with 100% accuracy. (Just like 1/3 can't be represented with 100% accuracy in decimal.)
I'm not sure what is causing your performance issue with the DECIMAL data type, often it's because there is some implicit conversion going on. (You've got a float somewhere, or decimals with different definitions, etc.)
But regardless of the cause; nothing is faster than integer arithmetic. So, store your values are integers? £1.10 could be stored as 110p. Or, if you know you'll get some fractions of a pence for some reason, 11000dp (deci-pennies).
You do then need to consider the biggest value you will ever reach, and whether INT or BIGINT is more appropriate.
Also, when working with integers, be careful of divisions. If you divide £10 between 3 people, where does the last 1p need to go? £3.33 for two people and £3.34 for one person? £0.01 eaten by the bank? But, invariably, it should not get lost to the digital elves.
And, obviously, when presenting the number to a user, you then need to manipulate it back to £ rather than dp; but you need to do that often anyway, to get £10k or £10M, etc.
Whatever you do, and if you don't want rounding errors due to floating point values, don't use FLOAT.
(There is ALOT written on line about how to use floats, and more importantly, how not to. It's a big topic; just don't fall into the trap of "it's so accurate, it's amazing, it can do anything" - I can't count the number of time people have screwed up data using that unfortunately common but naive assumption.)

Why see -0,000000000000001 in access query?

I have an sql:
SELECT Sum(Field1), Sum(Field2), Sum(Field1)+Sum(Field2)
FROM Table
GROUP BY DateField
HAVING Sum(Field1)+Sum(Field2)<>0;
Problem is sometimes Sum of field1 and field2 is value like: 9.5-10.3 and the result is -0,800000000000001. Could anybody explain why this happens and how to solve it?
Problem is sometimes Sum of field1 and
field2 is value like: 9.5-10.3 and the
result is -0.800000000000001. Could
anybody explain why this happens and
how to solve it?
Why this happens
The float and double types store numbers in base 2, not in base 10. Sometimes, a number can be exactly represented in a finite number of bits.
9.5 → 1001.1
And sometimes it can't.
10.3 → 1010.0 1001 1001 1001 1001 1001 1001 1001 1001...
In the latter case, the number will get rounded to the closest value that can be represented as a double:
1010.0100110011001100110011001100110011001100110011010 base 2
= 10.300000000000000710542735760100185871124267578125 base 10
When the subtraction is done in binary, you get:
-0.11001100110011001100110011001100110011001100110100000
= -0.800000000000000710542735760100185871124267578125
Output routines will usually hide most of the "noise" digits.
Python 3.1 rounds it to -0.8000000000000007
SQLite 3.6 rounds it to -0.800000000000001.
printf %g rounds it to -0.8.
Note that, even on systems that display the value as -0.8, it's not the same as the best double approximation of -0.8, which is:
- 0.11001100110011001100110011001100110011001100110011010
= -0.8000000000000000444089209850062616169452667236328125
So, in any programming language using double, the expression 9.5 - 10.3 == -0.8 will be false.
The decimal non-solution
With questions like these, the most common answer is "use decimal arithmetic". This does indeed get better output in this particular example. Using Python's decimal.Decimal class:
>>> Decimal('9.5') - Decimal('10.3')
Decimal('-0.8')
However, you'll still have to deal with
>>> Decimal(1) / 3 * 3
Decimal('0.9999999999999999999999999999')
>>> Decimal(2).sqrt() ** 2
Decimal('1.999999999999999999999999999')
These may be more familiar rounding errors than the ones binary numbers have, but that doesn't make them less important.
In fact, binary fractions are more accurate than decimal fractions with the same number of bits, because of a combination of:
The hidden bit unique to base 2, and
The suboptimal radix economy of decimal.
It's also much faster (on PCs) because it has dedicated hardware.
There is nothing special about base ten. It's just an arbitrary choice based on the number of fingers we have.
It would be just as accurate to say that a newborn baby weighs 0x7.5 lb (in more familiar terms, 7 lb 5 oz) as to say that it weighs 7.3 lb. (Yes, there's a 0.2 oz difference between the two, but it's within tolerance.) In general, decimal provides no advantage in representing physical measurements.
Money is different
Unlike physical quantities which are measured to a certain level of precision, money is counted and thus an exact quantity. The quirk is that it's counted in multiples of 0.01 instead of multiples of 1 like most other discrete quantities.
If your "10.3" really means $10.30, then you should use a decimal number type to represent the value exactly.
(Unless you're working with historical stock prices from the days when they were in 1/16ths of a dollar, in which case binary is adequate anyway ;-) )
Otherwise, it's just a display issue.
You got an answer correct to 15 significant digits. That's correct for all practical purposes. If you just want to hide the "noise", use the SQL ROUND function.
I'm certain it is because the float data type (aka Double or Single in MS Access) is inexact. It is not like decimal which is a simple value scaled by a power of 10. If I'm remembering correctly, float values can have different denominators which means that they don't always convert back to base 10 exactly.
The cure is to change Field1 and Field2 from float/single/double to decimal or currency. If you give examples of the smallest and largest values you need to store, including the smallest and largest fractions needed such as 0.0001 or 0.9999, we can possibly advise you better.
Be aware that versions of Access before 2007 can have problems with ORDER BY on decimal values. Please read the comments on this post for some more perspective on this. In many cases, this would not be an issue for people, but in other cases it might be.
In general, float should be used for values that can end up being extremely small or large (smaller or larger than a decimal can hold). You need to understand that float maintains more accurate scale at the cost of some precision. That is, a decimal will overflow or underflow where a float can just keep on going. But the float only has a limited number of significant digits, whereas a decimal's digits are all significant.
If you can't change the column types, then in the meantime you can work around the problem by rounding your final calculation. Don't round until the very last possible moment.
Update
A criticism of my recommendation to use decimal has been leveled, not the point about unexpected ORDER BY results, but that float is overall more accurate with the same number of bits.
No contest to this fact. However, I think it is more common for people to be working with values that are in fact counted or are expected to be expressed in base ten. I see questions over and over in forums about what's wrong with their floating-point data types, and I don't see these same questions about decimal. That means to me that people should start off with decimal, and when they're ready for the leap to how and when to use float they can study up on it and start using it when they're competent.
In the meantime, while it may be a tad frustrating to have people always recommending decimal when you know it's not as accurate, don't let yourself get divorced from the real world where having more familiar rounding errors at the expense of very slightly reduced accuracy is of value.
Let me point out to my detractors that the example
Decimal(1) / 3 * 3 yielding 1.999999999999999999999999999
is, in what should be familiar words, "correct to 27 significant digits" which is "correct for all practical purposes."
So if we have two ways of doing what is practically speaking the same thing, and both of them can represent numbers very precisely out to a ludicrous number of significant digits, and both require rounding but one of them has markedly more familiar rounding errors than the other, I can't accept that recommending the more familiar one is in any way bad. What is a beginner to make of a system that can perform a - a and not get 0 as an answer? He's going to get confusion, and be stopped in his work while he tries to fathom it. Then he'll go ask for help on a message board, and get told the pat answer "use decimal". Then he'll be just fine for five more years, until he has grown enough to get curious one day and finally studies and really grasps what float is doing and becomes able to use it properly.
That said, in the final analysis I have to say that slamming me for recommending decimal seems just a little bit off in outer space.
Last, I would like to point out that the following statement is not strictly true, since it overgeneralizes:
The float and double types store numbers in base 2, not in base 10.
To be accurate, most modern systems store floating-point data types with a base of 2. But not all! Some use or have used base 10. For all I know, there are systems which use base 3 which is closer to e and thus has a more optimal radix economy than base 2 representations (as if that really mattered to 99.999% of all computer users). Additionally, saying "float and double types" could be a little misleading, since double IS float, but float isn't double. Float is short for floating-point, but Single and Double are float(ing point) subtypes which connote the total precision available. There are also the Single-Extended and Double-Extended floating point data types.
It is probably an effect of floating point number implementations. Sometimes numbers cannot be exactly represented, and sometimes the result of operations is slightly off what we may expect for the same reason.
The fix would be to use a rounding function on the values to cut off the extraneous digits. Like this (I've simply rounded to 4 significant digits after the decimal, but of course you should use whatever precision is appropriate for your data):
SELECT Sum(Field1), Sum(Field2), Round(Sum(Field1)+Sum(Field2), 4)
FROM Table
GROUP BY DateField
HAVING Round(Sum(Field1)+Sum(Field2), 4)<>0;

Is there a way to change SQL 2005 server's maximum decimal precision?

Help says:
By default, the maximum precision
returns 38.
Examples:
SELECT ##MAX_PRECISION
Of course it is. That should mean, you can somehow change it, right? But I can't find an option for it. Is there some hidden crypted registry key or something?
The problem is that not all applications support precision > 24 and treat such values as text O_o But aggregate functions always return max precision if they not forced to something else.
For example, i need only 15 digits in all queries that return decimals, and don't want to manually CAST every SUM/MIN/MAX operator to decimal(10, 5)...
The MAX_PRECISION simply reflects the maximum internal size of your SQL-Server's representation of floating point numbers. Thus you cannot change it. It's like a parameter telling you that you have 4 GB of memory installed. There is no registry hack to change that amount :-)
However you can specify less than this value in the column datatype or, as you pointed out, you can convert the results.

Is there any reason for numeric rather than int in T-SQL?

Why would someone use numeric(12, 0) datatype for a simple integer ID column? If you have a reason why this is better than int or bigint I would like to hear it.
We are not doing any math on this column, it is simply an ID used for foreign key linking.
I am compiling a list of programming errors and performance issues about a product, and I want to be sure they didn't do this for some logical reason. If you follow this link:
http://msdn.microsoft.com/en-us/library/ms187746.aspx
... you can see that the numeric(12, 0) uses 9 bytes of storage and being limited to 12 digits, theres a total of 2 trillion numbers if you include negatives. WHY would a person use this when they could use a bigint and get 10 million times as many numbers with one byte less storage. Furthermore, since this is being used as a product ID, the 4 billion numbers of a standard int would have been more than enough.
So before I grab the torches and pitch forks - tell me what they are going to say in their defense?
And no, I'm not making a huge deal out of nothing, there are hundreds of issues like this in the software, and it's all causing a huge performance problem and using too much space in the database. And we paid over a million bucks for this crap... so I take it kinda seriously.
Perhaps they're used to working with Oracle?
All numeric types including ints are normalized to a standard single representation among all platforms.
There are many reasons to use numeric - for example - financial data and other stuffs which need to be accurate to certain decimal places. However for the example you cited above, a simple int would have done.
Perhaps sloppy programmers working who didn't know how to to design a database ?
Before you take things too seriously, what is the data storage requirement for each row or set of rows for this item?
Your observation is correct, but you probably don't want to present it too strongly if you're reducing storage from 5000 bytes to 4090 bytes, for example.
You don't want to blow your credibility by bringing this up and having them point out that any measurable savings are negligible. ("Of course, many of our lesser-experienced staff also make the same mistake.")
Can you fill in these blanks?
with the data type change, we use
____ bytes of disk space instead of ____
____ ms per query instead of ____
____ network bandwidth instead of ____
____ network latency instead of ____
That's the kind of thing which will give you credibility.
How old is this application that you are looking into?
Previous to SQL Server 2000 there was no bigint. Maybe its just something that has made it from release to release for many years without being changed or the database schema was copied from an application that was this old?!?
In your example I can't think of any logical reason why you wouldn't use INT. I know there are probably reasons for other uses of numeric, but not in this instance.
According to: http://doc.ddart.net/mssql/sql70/da-db_1.htm
decimal
Fixed precision and scale numeric data from -10^38 -1 through 10^38 -1.
numeric
A synonym for decimal.
int
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).
It is impossible to know if there is a reason for them using decimal, since we have no code to look at though.
In some databases, using a decimal(10,0) creates a packed field which takes up less space. I know there are many tables around my work that use that. They probably had the same kind of thought here, but you have gone to the documentation and proven that to be incorrect. More than likely, I would say it will boil down to a case of "that's the way we have always done it, because someone one time said it was better".
It is possible they spend a LOT of time in MS Access and see 'Number' often and just figured, its a number, why not use numeric?
Based on your findings, it doesn't sound like they are the optimization experts, and just didn't know. I'm wondering if they used schema generation tools and just relied on them too much.
I wonder how efficient an index on a decimal value (even if 0 scale is set) for a primary key compares to a pure integer value.
Like Mark H. said, other than the indexing factor, this particular scenario likely isn't growing the database THAT much, but if you're looking for ammo, I think you did find some to belittle them with.
In your citation, the decimal shows precision of 1-9 as using 5 bytes. Your column apparently has 12,0 - using 4 bytes of storage - same as integer.
Moreover, INT, datatype can go to a power of 31:
-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)
While decimal is much larger to 38:
- 10^38 +1 through 10^38 - 1
So the software creator was actually providing more while using the same amount of storage space.
Now, with the basics out of the way, the software creator actually limited themselves to just 12 numbers or 123,456,789,012 (just an example for place holders not a maximum number). If they used INT they could not scale this column - it would go up to the full 31 digits. Perhaps there is a business reason to limit this column and associated columns to 12 digits.
An INT is an INT, while a DECIMAL is scalar.
Hope this helps.
PS:
The whole number argument is:
A) Whole numbers are 0..infinity
B) Counting (Natural) numbers are 1..infinity
C) Integers are infinity (negative) .. infinity (positive)
D) I would not cite WikiANYTHING for anything. Come on, use a real source! May as well be http://MyPersonalMathCite.com