Using a TSQL update command against a SQLServer database, how can I update a column of type FLOAT with the smallest possible double value?
The smallest possible double value in hex notation being 3ff0 0000 0000 0001
(http://en.wikipedia.org/wiki/Double_precision)
Whatever it is you need this for, I suggest you consider alternatives that don't require assumptions about SQL Server's FLOAT type. Unfortunately, SQL Server is rather flaky about IEEE 754 compliance. For example, see these newsgroup threads. Also note that SQL Server's behavior in this regard has changed between versions and may well change again without warning. Without giving all the gory details, the smallest value you can assign directly to a FLOAT is not necessarily the smallest value a FLOAT can contain (without complaining). Some of SQL Server's flakiness also revolves around what IEEE calls "denormalized" floating point numbers, which you have to take into account if you want "smallest" to have a precise meaning.
Sorry not to answer your question, but I don't think much good can come from answers that help you head further down the rocky path you're on.
Don't. Use decimal to avoid losing the value, or binary(8) if you want to store it in hex, as per the article.
Seriously: depending on the CPU, the wind direction and the moon phase, it's highly likely you won't get the same value back when you manage to convert it.
And what do you mean by "smallest"? 3ff0 0000 0000 0001 = 1.0000000000000002
And, is it hex in the client but you want float in the db?
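To check what a given bit pattern means, here is a minimal sketch in Python, whose float is an IEEE 754 double on common platforms. The helper name hex_to_double is made up for illustration, and this says nothing about how SQL Server itself handles these values.

```python
import struct
import sys


def hex_to_double(hex_digits: str) -> float:
    """Interpret 16 hex digits as a big-endian IEEE 754 double."""
    raw = bytes.fromhex(hex_digits.replace(" ", ""))
    return struct.unpack(">d", raw)[0]


# The pattern from the question: the next double after 1.0.
just_above_one = hex_to_double("3ff0 0000 0000 0001")

# Smallest positive denormalized (subnormal) double.
smallest_subnormal = hex_to_double("0000 0000 0000 0001")

# Smallest positive normalized double.
smallest_normal = hex_to_double("0010 0000 0000 0000")

print(just_above_one)
print(smallest_subnormal)
print(smallest_normal)
```

So "smallest" could mean the smallest step above 1.0, the smallest positive normalized value, or the smallest positive denormalized value. These are three very different numbers.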
I expected to find the answer to this question fairly quickly, but surprisingly, don't seem to see it anywhere.
I'm guessing that a comparison to a binary constant in an SQL query would be faster than a comparison to a decimal number, as the binary constant is probably a direct lookup while decimal numbers need to be converted, but is the performance difference measurable?
In other words, is the first query better than the second one? If so, how much better?
select *
from Cats
where Cats_Id = 0x0000000000000086
select *
from Cats
where Cats_Id = 134
There is absolutely no difference: 0x0000000000000086 is an integer with a decimal value of 134. It's just written in base 16 (hexadecimal).
The two queries are exactly identical and will get exactly the same execution plan.
The one difference is when the column you are comparing to is binary(n) or varbinary(n). There the hex constant represents a sequence of octets.
Your premise is based on a misunderstanding:
the binary constant is probably a direct lookup while decimal numbers need to be converted
The SQL you enter consists of characters in some text encoding; for simplicity, let's assume ASCII.
In the computer's memory, it's composed of a long string of binary states we normally write as 0 and 1, but could equally write as _ and |.
The binary data for the string 134 in ASCII looks something like __||___|__||__||__||_|__. The binary data for the string 0x0086 looks something like __||_____||||_____||______||______|||_____||_||_
When actually working with the data, e.g. comparing numbers, the computer will use a different representation altogether. The number "one hundred and thirty-four" will look something more like |____||_.
So whichever representation you use, there is a conversion going on.
Nonetheless, one conversion might be more efficient, by some incidental detail of its implementation, but by such a tiny margin that it would be almost impossible to measure amongst the noise of the system you're testing.
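The bit patterns above can be reproduced with a short Python sketch. The helper names text_bits and number_bits are made up for illustration. It only shows that both spellings are text that must be parsed into the same number, not how any particular database engine does the parsing.

```python
def to_bars(ones_and_zeros: str) -> str:
    """Render a string of 0s and 1s using _ and | as in the answer above."""
    return ones_and_zeros.replace("0", "_").replace("1", "|")


def text_bits(text: str) -> str:
    """Bits of the ASCII characters that make up the SQL literal."""
    return to_bars("".join(format(byte, "08b") for byte in text.encode("ascii")))


def number_bits(value: int) -> str:
    """Bits of the number itself, once the literal has been parsed."""
    return to_bars(format(value, "08b"))


print(text_bits("134"))
print(text_bits("0x0086"))

# Both literals have to be converted, and both yield the same number.
from_decimal = int("134")
from_hex = int("0x0086", 16)
print(number_bits(from_decimal), number_bits(from_hex))
```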
The answer is yes. In some cases, the query with the hexadecimal value is substantially better.
I hired a consultant DBA to help us with our system and after reviewing one of our queries, which was running a bit slow, he showed me that changing the value to hexadecimal improved it substantially (by about 95%).
He then showed me that, although I had an index on the field I was searching (binary foreign key), the index wasn't used when executing the query with the decimal value.
If anyone can provide a more detailed answer about the different cases in which these queries are identical or not, performance wise, I would appreciate that.
Can anyone explain the following results in SQL Server? I'm stumped.
declare @mynum float = 8.31
select ceiling(@mynum*100)
Results in 831
declare @mynum float = 8.21
select ceiling(@mynum*100)
Results in 822
I've tested a whole range of numbers (in SQL Server 2012). Some increase while others stay the same. I'm at a loss understanding why ceiling is treating some of them differently. Changing from a float to a decimal(18,5) seems to fix the problem but I'm wary there may be other repercussions I'm missing from doing so. Any explanations would help.
I think this comes down to floating-point precision. You will find it in almost all programming languages and in databases too. Values are stored only to a finite binary precision, so what you write as 8.21 is actually stored as a value very slightly different from 8.21 (roughly 8.2100000000000009). Multiplying that by 100 can give a result slightly above 821, and ceiling then rounds it up to 822.
The SQL Server documentation says:
Approximate-number data types for use with floating point numeric data. Floating point data is approximate; therefore, not all values in the data type range can be represented exactly.
The same problem exists in other database systems. For example, the MySQL documentation says:
Floating-point numbers sometimes cause confusion because they are approximate and not stored as exact values. A floating-point value as written in an SQL statement may not be the same as the value represented internally. Attempts to treat floating-point values as exact in comparisons may lead to problems. They are also subject to platform or implementation dependencies. The FLOAT and DOUBLE data types are subject to these issues. For DECIMAL columns, MySQL performs operations with a precision of 65 decimal digits, which should solve most common inaccuracy problems.
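The behaviour in the question can be reproduced outside SQL Server. This is a small Python sketch, assuming Python's float is the same IEEE 754 double that FLOAT uses by default. The function names are made up for illustration, and Decimal stands in for a decimal(18,5) column.

```python
import math
from decimal import Decimal


def ceil_times_100_float(literal: str) -> int:
    """Multiply as a binary double, then take the ceiling."""
    return math.ceil(float(literal) * 100)


def ceil_times_100_decimal(literal: str) -> int:
    """Multiply as an exact base-10 value, then take the ceiling."""
    return math.ceil(Decimal(literal) * 100)


for literal in ("8.31", "8.21"):
    product = float(literal) * 100
    print(literal, repr(product),
          ceil_times_100_float(literal),
          ceil_times_100_decimal(literal))
```

For 8.21 the double product lands just above 821, so the ceiling is 822. The decimal product is exactly 821.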
Floating-point numbers are not 100% accurate. As Marcin Nabiałek wrote, the 8.31 you see is probably represented by something else, something like 8.310000000001. See here for some interesting reading about the accuracy problem of floating point.
Solution is not to use floating point data types unless you really have to. You should rather use DECIMAL or MONEY data types.
If you really have to use a floating-point data type, then you can add or subtract a small value (the accuracy threshold, or epsilon) before every floor, ceiling or comparison operation to get the precision you want. If you have a lot of floating-point operations, it might be worth writing your own floating-point comparison functions.
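A minimal sketch of the epsilon approach, in Python. The tolerance 1e-9 and the function names are assumptions chosen for illustration; the right threshold depends on the magnitude and precision of your own data.

```python
import math

# Hypothetical tolerance; pick one suited to the scale of your values.
EPSILON = 1e-9


def ceil_with_tolerance(value: float, epsilon: float = EPSILON) -> int:
    """Ceiling that ignores an overshoot smaller than epsilon."""
    return math.ceil(value - epsilon)


def nearly_equal(a: float, b: float, epsilon: float = EPSILON) -> bool:
    """Comparison that treats values within epsilon of each other as equal."""
    return abs(a - b) <= epsilon


print(math.ceil(8.21 * 100), ceil_with_tolerance(8.21 * 100))
print(0.1 + 0.2 == 0.3, nearly_equal(0.1 + 0.2, 0.3))
```

The trade-off is that a value that really is within epsilon above a whole number is now rounded down, so this only works when the tolerance is far below the precision you care about.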
I am attempting to store a float in my SQLite3 database using Java. When I go to store the number 1.2 in the database, it is actually stored as 1.199999998, and the same occurs for every even number (1.4, 1.6, etc.).
This makes it really difficult to delete rows, because I delete a row according to its version column (whose type is float). So this line won't work:
"DELETE FROM tbl WHERE version=1.2"
That's because there is no 1.2, only 1.19999998. How can I make sure that when I store a float in my SQLite3 DB, it is the exact number I input?
Don't use a float if you need precise accuracy. Try a decimal instead.
Remember that the 1.2 you put in your source code, or that the user entered into a textbox and that ultimately ended up in the database, is actually stored as a binary value (usually in a format known as IEEE 754). To understand why this is a problem, try converting 1.2 (1 1/5) to binary by hand (binary .1 is 1/2, .01 is 1/4) and see what you end up with:
1.001100110011001100110011001100110011
You can save time by using this converter (ignore the last "1" that breaks the cycle at the site; it's there because the converter had to round the last digit).
As you can see, it's a repeating pattern. This goes on pretty much forever. It would be like trying to represent 1/3 as a decimal. To get around this problem, most programming languages have a decimal type (as opposed to float or double) that keeps a base 10 representation. However, calculations done using this type are orders of magnitude slower, and so it's typically reserved for financial transactions and the like.
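The by-hand conversion can be sketched in Python with exact fractions. The function name binary_expansion is made up for illustration and it handles non-negative values only.

```python
from fractions import Fraction


def binary_expansion(value: Fraction, fractional_bits: int) -> str:
    """Write a non-negative fraction in binary, truncated after N bits."""
    whole = value.numerator // value.denominator
    remainder = value - whole
    digits = []
    for _ in range(fractional_bits):
        # Doubling shifts the next binary digit in front of the point.
        remainder *= 2
        bit = int(remainder >= 1)
        digits.append(str(bit))
        remainder -= bit
    return f"{whole:b}." + "".join(digits)


print(binary_expansion(Fraction(12, 10), 36))
print(binary_expansion(Fraction(1, 2), 4))
```

The remainder cycles through 2/5, 4/5, 3/5, 1/5 and never reaches zero, which is why the 0011 group repeats forever.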
This is the very nature of floating point numbers. They are not exact.
I'd suggest you either use an integer, or text field to store a version.
You should never rely on the accuracy of a float or a double. A float should never be used for keys in a data base or to represent money.
You should probably use decimal in this case.
Floats are not an accurate data type. They are designed to be fast, have a large range of values, and have a small memory footprint.
They are usually implemented using the IEEE standard
http://en.wikipedia.org/wiki/IEEE_754-2008
As Joel Coehoorn has pointed out, 1.2 is the recurring fraction 1.0011 0011 0011... in binary and can't be exactly represented in a finite number of bits.
The closest you can get with an IEEE 754 float is 1.2000000476837158203125. The closest you can get with a double is 1.1999999999999999555910790149937383830547332763671875. I don't know where you're getting 1.199999998 from.
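These two values can be checked with a short Python sketch. Python's float is a double, so the 32-bit case is simulated by a round trip through struct. The helper name nearest_float32 is made up for illustration.

```python
import struct
from decimal import Decimal


def nearest_float32(value: float) -> float:
    """Round a double to the nearest 32-bit float, returned as a double."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Decimal(float) shows the exact value held by the binary number.
as_float32 = Decimal(nearest_float32(1.2))
as_double = Decimal(1.2)

print(as_float32)
print(as_double)
```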
Floating-point was designed for representing approximate quantities: Physical measurements (a swimming pool is never exactly 1.2 meters deep), or irrational-valued functions like sqrt, log, or sin. If you need a value accurate to 15 significant digits, it works fine. If you truly need an exact value, not so much.
For a version number, a more appropriate representation would be a pair of integers: One for the major version and one for the minor version. This would also correctly handle the sequence 1.0, 1.1, ..., 1.9, 1.10, 1.11, which would sort incorrectly in a REAL column.
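A minimal sketch of the two-integer layout, using Python's built-in sqlite3 module and an in-memory database. The table name tbl comes from the question; the column names major and minor are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl ("
    " major INTEGER NOT NULL,"
    " minor INTEGER NOT NULL,"
    " PRIMARY KEY (major, minor))"
)

versions = [(1, 10), (1, 2), (1, 0), (1, 9), (1, 11), (1, 1)]
conn.executemany("INSERT INTO tbl (major, minor) VALUES (?, ?)", versions)

# Integer comparison is exact, so the delete finds its row.
deleted = conn.execute(
    "DELETE FROM tbl WHERE major = ? AND minor = ?", (1, 2)
).rowcount

# Sorting by the integer pair puts 1.10 after 1.9.
remaining = conn.execute(
    "SELECT major, minor FROM tbl ORDER BY major, minor"
).fetchall()

print(deleted)
print([f"{major}.{minor}" for major, minor in remaining])
```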
Someone mentioned in this decimal vs double! - Which one should I use and when? post that it's best to use double for physical science computations. Does this apply to measurements such as flash point, viscosity, weight and volume? Can someone explain further?
Decimal = exact. That is, I have £1.23 in my pocket and applying VAT, say, is a known and accepted rounding issue.
When you measure something, you can never be exact. If something is 123 centimeters long then strictly speaking it's somewhere between 122.5 and 123.49999 centimeters long....
You are dealing with 2 different kinds of quantities.
So, use double for this.
The question you link to is about the data types in C#, not SQL, and the answers reflect that.
Decimal is a datatype used and created for dealing with currency, making sure calculations balance out (no lost cents or fractions of, for example).
Physical scientific computations rarely deal with money values, hence, you should use double whenever you need the accuracy.
Generally you would always use doubles for recording physical measurements unless you have a good reason to do otherwise. There is potentially a minuscule loss of accuracy when using any floating-point number, since binary is unable to perfectly represent certain decimal numbers (0.1 being the most obvious example), but the inaccuracy in the double representation is going to be many orders of magnitude smaller than the error in the measurements you take.
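To put rough numbers on "many orders of magnitude", here is a Python sketch using exact fractions. The measurement (123 cm, known to within 0.5 cm) is borrowed from the earlier answer as a hypothetical; the function name is made up for illustration.

```python
from fractions import Fraction


def relative_representation_error(decimal_text: str) -> Fraction:
    """Exact relative error introduced by storing a decimal literal as a double."""
    exact = Fraction(decimal_text)
    stored = Fraction(float(decimal_text))
    return abs(stored - exact) / exact


# Hypothetical measurement: 123 cm, known to within half a centimetre.
measurement_relative_error = Fraction(1, 2) / 123

double_relative_error = relative_representation_error("0.1")

print(float(double_relative_error))
print(float(measurement_relative_error))
print(float(measurement_relative_error / double_relative_error))
```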
Decimal is used where it's very important that numbers are represented exactly so typically only when dealing with money (yes, it would seem we care more about money than science!)
Knowing which database this is for would make things easier...
Oracle only has NUMBER, which if you omit the two optional parameters precision and scale - is a float. Using both parameters is DECIMAL, and only specifying the precision is INTEGER. Can't remember how to get REAL...
MySQL Numeric data type info: http://dev.mysql.com/doc/refman/5.0/en/numeric-types.html
SQL Server Numeric data type info: http://msdn.microsoft.com/en-us/library/aa258271(SQL.80).aspx
I haven't dealt with float and real much, but I've heard they aren't great for mathematical computations. I've used DECIMAL for varying precision, not just for monetary values.
What data type to use depends on the data, and how you intend to use that data.
Why would someone use numeric(12, 0) datatype for a simple integer ID column? If you have a reason why this is better than int or bigint I would like to hear it.
We are not doing any math on this column, it is simply an ID used for foreign key linking.
I am compiling a list of programming errors and performance issues about a product, and I want to be sure they didn't do this for some logical reason. If you follow this link:
http://msdn.microsoft.com/en-us/library/ms187746.aspx
... you can see that numeric(12, 0) uses 9 bytes of storage and, being limited to 12 digits, there's a total of 2 trillion numbers if you include negatives. WHY would a person use this when they could use a bigint and get 10 million times as many numbers with one byte less storage? Furthermore, since this is being used as a product ID, the 4 billion numbers of a standard int would have been more than enough.
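The arithmetic behind those figures, as a small Python sketch. The storage sizes are the per-precision values from the SQL Server decimal/numeric documentation linked above; the function name is made up for illustration.

```python
def numeric_storage_bytes(precision: int) -> int:
    """Storage size of a SQL Server decimal/numeric for a given precision."""
    if not 1 <= precision <= 38:
        raise ValueError("precision must be between 1 and 38")
    if precision <= 9:
        return 5
    if precision <= 19:
        return 9
    if precision <= 28:
        return 13
    return 17


# numeric(12, 0) holds -(10^12 - 1) .. 10^12 - 1, including zero.
numeric_12_values = 2 * (10**12 - 1) + 1
int_values = 2**32       # int: 4 bytes
bigint_values = 2**64    # bigint: 8 bytes

print(numeric_storage_bytes(12), "bytes for numeric(12, 0)")
print(numeric_12_values, "values in numeric(12, 0)")
print(int_values, "values in int")
print(bigint_values // numeric_12_values, "times as many values in bigint")
```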
So before I grab the torches and pitch forks - tell me what they are going to say in their defense?
And no, I'm not making a huge deal out of nothing, there are hundreds of issues like this in the software, and it's all causing a huge performance problem and using too much space in the database. And we paid over a million bucks for this crap... so I take it kinda seriously.
Perhaps they're used to working with Oracle?
All numeric types including ints are normalized to a standard single representation among all platforms.
There are many reasons to use numeric - for example - financial data and other stuffs which need to be accurate to certain decimal places. However for the example you cited above, a simple int would have done.
Perhaps sloppy programmers who didn't know how to design a database?
Before you take things too seriously, what is the data storage requirement for each row or set of rows for this item?
Your observation is correct, but you probably don't want to present it too strongly if you're reducing storage from 5000 bytes to 4090 bytes, for example.
You don't want to blow your credibility by bringing this up and having them point out that any measurable savings are negligible. ("Of course, many of our lesser-experienced staff also make the same mistake.")
Can you fill in these blanks?
with the data type change, we use
____ bytes of disk space instead of ____
____ ms per query instead of ____
____ network bandwidth instead of ____
____ network latency instead of ____
That's the kind of thing which will give you credibility.
How old is this application that you are looking into?
Before SQL Server 2000 there was no bigint. Maybe it's just something that has made it from release to release for many years without being changed, or the database schema was copied from an application that was this old?
In your example I can't think of any logical reason why you wouldn't use INT. I know there are probably reasons for other uses of numeric, but not in this instance.
According to: http://doc.ddart.net/mssql/sql70/da-db_1.htm
decimal
Fixed precision and scale numeric data from -10^38 +1 through 10^38 -1.
numeric
A synonym for decimal.
int
Integer (whole number) data from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647).
It is impossible to know if there is a reason for them using decimal, since we have no code to look at though.
In some databases, using a decimal(10,0) creates a packed field which takes up less space. I know there are many tables around my work that use that. They probably had the same kind of thought here, but you have gone to the documentation and proven that to be incorrect. More than likely, I would say it will boil down to a case of "that's the way we have always done it, because someone one time said it was better".
It is possible they spend a LOT of time in MS Access, see 'Number' often, and just figured: it's a number, why not use numeric?
Based on your findings, it doesn't sound like they are the optimization experts, and just didn't know. I'm wondering if they used schema generation tools and just relied on them too much.
I wonder how efficient an index on a decimal value (even if 0 scale is set) for a primary key compares to a pure integer value.
Like Mark H. said, other than the indexing factor, this particular scenario likely isn't growing the database THAT much, but if you're looking for ammo, I think you did find some to belittle them with.
In your citation, decimal with a precision of 1-9 uses 5 bytes and a precision of 10-19 uses 9 bytes. Your column is numeric(12,0), so it uses 9 bytes of storage, compared with 4 bytes for int and 8 bytes for bigint.
Moreover, the INT datatype is bounded by a power of 2, namely 31:
-2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)
While decimal can go much larger, up to a power of 10 of 38:
- 10^38 +1 through 10^38 - 1
So the decimal type as such can provide far more range than INT, at the cost of more storage space.
Now, with the basics out of the way, the software creator actually limited themselves to just 12 digits, e.g. 123,456,789,012 (just an example for place holders, not a maximum number). If they used INT they could not set the size of this column; it would always allow the full 31-bit range. Perhaps there is a business reason to limit this column and associated columns to 12 digits.
An INT has a fixed range, while a DECIMAL has a configurable precision and scale.
Hope this helps.
PS:
The whole number argument is:
A) Whole numbers are 0..infinity
B) Counting (Natural) numbers are 1..infinity
C) Integers are infinity (negative) .. infinity (positive)
D) I would not cite WikiANYTHING for anything. Come on, use a real source! May as well be http://MyPersonalMathCite.com