SQL - Which data type represents percentages well?

In SQL I am looking at decimal and float. Float says it is an approximation. I need to store percentages. They don't have to be very large or small. Some examples are
60.2
40
Which data type should I use?

decimal(x,y)
x is the total number of digits you want to be able to represent
y is the total number of digits after the decimal point that you want to be able to represent
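To make the precision/scale behavior concrete, here is a small Python sketch using the standard decimal module. The helper name and the (5,2) sizing are just assumptions chosen to fit the example values; a real DECIMAL(5,2) column behaves similarly.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_decimal_5_2(value):
    """Quantize a value to scale 2, roughly as a DECIMAL(5,2) column would."""
    d = Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # precision 5 with scale 2 leaves at most 3 digits before the decimal point
    if abs(d) >= Decimal("1000"):
        raise ValueError("value overflows DECIMAL(5,2)")
    return d

print(to_decimal_5_2(60.2))  # 60.20 -- stored exactly, unlike a float
print(to_decimal_5_2(40))    # 40.00
```

Percentages like 60.2 are exact in base 10, so a DECIMAL column stores them without the approximation error a float would introduce.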

Related

SQL Server - float vs varchar

In SQL Server, I have decimal data to be stored in a table (which is never used for joins or filtering). This decimal data is variable: 80% of the time it has single-digit values (1, 4, 5) and the remaining 20% have 16-digit decimals (0.8999999761581421, 3.0999999046325684).
I am wondering if I can save any storage space by going with varchar instead of float, or if I should stick with float since this is numeric data?
Here's an interesting observation:
Start with the mathematical value 0.9
Convert that to a binary number. For the same reason that 1/3 cannot be expressed in a finite number of digits in base 10, the number 0.9 cannot be expressed in a finite number of digits in base 2. The exact mathematical value is:
0.1 1100 1100 1100 1100 1100 1100 1100 ... with the "1100" repeating forever.
Let's store this value in an IEEE-754 single-precision floating-point value (in SQL Server, this is the REAL type). To do that we have to round the significand to 24 significant bits (1 implicit + 23 stored). The result, with the trailing zero bit omitted, is:
0.1 1100 1100 1100 1100 1100 11
Convert this to its exact decimal equivalent, and you get:
0.89999997615814208984375
Round that to 16 places after the decimal point. You get:
0.8999999761581421
which is precisely the value you show in your example.
If you do the same thing to 3.1, you get 3.0999999046325684
Is it possible that all your inputs are simply numbers with one digit after the decimal point, which have been stored as a floating-point value, and then converted back into decimal?
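You can reproduce this round trip outside SQL Server. Here is a Python sketch that rounds a value to single precision the way a REAL column would on storage (the helper name is my own):

```python
import struct

def to_single(x):
    """Round a Python float (a double) to IEEE-754 single precision
    by packing it into 4 bytes and unpacking it again."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_single(0.9))  # 0.8999999761581421
print(to_single(3.1))  # 3.0999999046325684
```

Both outputs match the 16-digit values from the question exactly, which supports the theory that the inputs were one-decimal-digit numbers stored as single-precision floats.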
Always use the most appropriate datatype! Since this is clearly numerical data, use a numerical type. That lets you sum the values, order by them, and so on. They are numbers, so treat and store them as such!
If you need to support fractions, you could use FLOAT or REAL, but those are notorious for rounding errors. DECIMAL(p,s) avoids those pitfalls: it is stable and precise, not prone to rounding errors. So that would be my logical choice.
See the official MS docs for DECIMAL for details on how to define p (precision: the total number of digits overall) and s (scale: the number of digits after the decimal point).
And by the way: those values are stored in fewer bytes than a varchar column large enough to hold them would require!
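A quick Python illustration of the kind of rounding error DECIMAL avoids (Python's decimal module performs the same exact base-10 arithmetic that a DECIMAL column does):

```python
from decimal import Decimal

# Binary floats accumulate representation error:
print(sum([0.1] * 10) == 1.0)                        # False

# Exact base-10 arithmetic, as DECIMAL provides, does not:
print(sum([Decimal("0.1")] * 10) == Decimal("1.0"))  # True
```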

Non-standard number data types in Oracle

Besides the "usual" NUMBER data types, where precision is greater than scale, there are many "non-standard" ones where scale is greater than precision, or where scale is negative.
For example
NUMBER(2,5) means that there are 5 digits in the fractional part, the first 3 of which are obligatory zeros.
NUMBER(2,-6): here the scale is -6, which means the value is rounded to millions, and the precision is 2, so only 2 significant digits can be stored.
Can somebody provide examples of using such data types in practice?
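Not a real-world use case, but a hypothetical Python sketch may make the semantics clearer. The function below is my own rough approximation of NUMBER(p,s), not Oracle's implementation: round to the declared scale, then reject values with too many significant digits.

```python
from decimal import Decimal, ROUND_HALF_UP

def oracle_number(value, precision, scale):
    """Rough sketch of Oracle NUMBER(p,s) semantics: round the value to
    `scale` decimal places (a negative scale rounds left of the point),
    then reject it if more than `precision` significant digits remain."""
    d = Decimal(str(value)).quantize(Decimal(1).scaleb(-scale),
                                     rounding=ROUND_HALF_UP)
    significant = d.adjusted() + scale + 1
    if significant > precision:
        raise ValueError("value exceeds declared precision")
    return d

print(oracle_number("0.00012", 2, 5))    # 0.00012 -- fits NUMBER(2,5)
print(oracle_number("14800000", 2, -6))  # rounds to millions: 15000000
```

With NUMBER(2,5), a value like 0.001234 would be rejected after rounding because it needs 3 significant digits, which matches the "3 obligatory zeros" description above.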

Storing big numbers in BigQuery

BigQuery documentation says that the NUMERIC data type accepts up to 38 digits of precision, yet I couldn't store this as a number when creating a table:
92540646618304498066684287400725037055
According to the documentation, the NUMERIC data type is set to 38 digits of precision and 9 digits of scale.
Precision is the total number of digits and scale is the number of digits after the decimal point, so the largest integer you can store will have 38 - 9 = 29 digits (to the left of the decimal point).
The official range of the data type is -99999999999999999999999999999.999999999 to 99999999999999999999999999999.999999999
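A quick Python check of the arithmetic, comparing the 38-digit value from the question against the documented NUMERIC maximum:

```python
from decimal import Decimal

n = 92540646618304498066684287400725037055
print(len(str(n)))  # 38 -- within the total precision on paper

# But NUMERIC reserves 9 of its 38 digits for the fractional part,
# so the integer part can hold at most 29 digits:
numeric_max = Decimal("99999999999999999999999999999.999999999")
print(Decimal(n) <= numeric_max)  # False -- a 38-digit integer does not fit
```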

SQL SUM - I don't want to lose precision when summing floating point data

I know that when you work with money it's better (if not imperative) to use the Decimal data type, especially when you work with large amounts of money :). But I want to store the prices of my products as less memory-demanding float numbers, because they don't really need such precision. Now, when I want to calculate the whole income from the products sold, it can become a very large number, and it must have great precision too. I want to know what the result would be if I did this summation with the SUM keyword in a SQL query. I guess it will be stored in a Double variable, and this will surely lose some precision. How can I force it to do the calculation with Decimal numbers? Perhaps someone who knows the internals of SQL engines can answer my question. It's worth mentioning that I use the Access database engine, but any general answer would be appreciated too. This might be an example of the query I would use:
SELECT SUM(Price * Qty) FROM Invoices
or
SELECT SUM(Amount) FROM Invoices
Amount and Price are stored as float(Single) data type and Qty as int32.
If you want to do the calculation as a double, then cast one of the values to that type:
SELECT SUM(cast(Price as double) * Qty)
FROM Invoices;
SELECT SUM(cast(Amount as double))
FROM Invoices;
Note that naming is not consistent among databases. For instance, Oracle's "binary_float" is stored in 5 bytes (based on the IEEE 4-byte float) and "binary_double" in 9 bytes (based on the IEEE 8-byte double). Meanwhile, "float" is 8 bytes in SQL Server but 4 bytes in MySQL. SQL Server and Postgres use "real" for the 4-byte version; MySQL and Postgres use "double" for the 8-byte version.
EDIT:
After writing this, I saw the reference to Access in the question (this should really be a tag). In Access, you would use cdbl() instead of cast():
SELECT SUM(cdbl(Price) * Qty)
FROM Invoices;
SELECT SUM(cdbl(Amount))
FROM Invoices;
If the choice is between a float and a 4-byte (unsigned) int (both requiring the same amount of storage in memory), there are pros and cons:
The float cannot accurately handle cents, assuming a price has the format $$$$$.cc: hundredths are not precisely representable in either the single- or double-precision floating-point format, so this introduces rounding errors, which are usually unacceptable in money-related applications.
The int, assuming you express the price in cents, allows precise values in the range -2^31 to 2^31-1 (about 2 * 10^9 cents) for signed values and 0 to 2^32-1 (about 4 * 10^9) for unsigned. The downside is that it may feel "unnatural" to use cents instead of dollars and cents, but this is mostly a problem in the developer's mind: the actual "problems", if you wish to call them that, arise when printing the values in dollars and cents, which requires slightly more complex formatting. That is a very small price for how much the rest of the application can be simplified.
Later, when summing or performing other calculations, the integer cent and quantity values are first converted to double-precision floating-point. The double-precision format can express integer values (assuming integer cents) precisely in the range -(2^53-1) to 2^53-1, which probably (you need to check) covers your needs. Keep in mind, though, that you will still have integer cents in the double, which need to be converted to dollars and cents.
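A minimal Python sketch of that integer-cents scheme (the names are made up for illustration):

```python
# Hypothetical sketch of the integer-cents approach:
def format_cents(cents):
    """Format an integer number of cents as a dollars-and-cents string."""
    sign = "-" if cents < 0 else ""
    dollars, c = divmod(abs(cents), 100)
    return f"{sign}{dollars}.{c:02d}"

price_cents = 1999          # $19.99, held exactly as an int
qty = 3
total = price_cents * qty   # exact integer arithmetic, no rounding error
print(format_cents(total))  # 59.97
```

All intermediate arithmetic stays exact; the "slightly more complex formatting" mentioned above is confined to a single display function.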
EDIT_______________________
"6-7 decimal digits of precision" is most easily explained by the range of integers representable in the single-precision format. Since the SP significand is 24 bits long (1 implicit + 23 explicit), every integer from 1 to 2^24-1, i.e. 1 to 16777215, is exactly representable. That range covers all six-digit (up to 999999) and seven-digit (up to 9999999) integers but not all eight-digit ones, hence "6-7 decimal digits." The double-precision format features a 53-bit significand (1 + 52), which results in an integer range of 1 to 2^53-1.
The real SP precision is "24 sequential binary digits of precision."
If you can make do with cents in 50 unit increments your range in SP will be 2^-1 to 2^23-2^-1 or 0.5 to 8388607.5
If you can make do with cents in 25 unit increments your range in SP will be 2^-2 to 2^22-2^-2 or 0.25 to 4194303.75.
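The 2^24 limit is easy to demonstrate in Python by rounding doubles through single precision with struct (the helper name is my own):

```python
import struct

def to_single(x):
    """Round a double to IEEE-754 single precision via a pack/unpack round trip."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Every integer up to 2^24 - 1 = 16777215 survives the round trip...
print(to_single(16777215.0) == 16777215.0)  # True

# ...but past 2^24, consecutive integers start to collapse:
print(to_single(16777217.0) == 16777216.0)  # True
```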
Actually, as @Phylogenesis said in the first comment, when I think about it, we don't sell enough items to overflow the precision of a double value, just as items are not expensive enough to overflow the precision of a float value. As I guessed, I tested and found that if you run the simple SELECT SUM(Amount) FROM Invoices query, the result will be a double value. But following what @Gordon Linoff suggested, the safest approach for obsessive-compulsive people is to use a cast to Decimal or Currency (in Access). So the query in Access syntax would be:
SELECT SUM(CCur(Price) * Qty)
FROM Invoices;
SELECT SUM(CCur(Amount))
FROM Invoices;
The CCur function converts Single (C# float) values to Currency (C# decimal). It's good to know that the conversion to Double is not necessary, because the engine does it itself. So the easier approach, which is also safe, is to just run the simple query.

Database field definitions

I need to do a data migration from a database, and I'm not too familiar with databases, so I would like some clarification. I have some documentation that could apply to either an Oracle or a SQL Server database, and it has a column defined as NUMBER(10,5). I would like to know what this means. I think it means that the number has 10 digits with 5 after the decimal point, but I would like clarification. Also, would this be different between SQL Server and Oracle?
The first number is precision the second number is scale. The equivalent in SQL Server can be as Decimal / Numeric and you could define it like so:
DECLARE @MyDec decimal(18,2)
The 18 is the maximum total number of decimal digits that can be stored (for instance, in 123.45 the precision is 5 and the scale is 2). The 2 is the scale: it specifies the maximum number of digits stored to the right of the decimal point.
See this article
Just remember: the more precision, the more storage bytes. So keep it at a minimum if possible.
p (precision)
Specifies the maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point. The precision must be a value from 1 through the maximum precision. The maximum precision is 38. The default precision is 18.
s (scale)
Specifies the maximum number of decimal digits that can be stored to the right of the decimal point. Scale must be a value from 0 through p. Scale can be specified only if precision is specified. The default scale is 0; therefore, 0 <= s <= p. Maximum storage sizes vary, based on the precision.
Finally, it is worth mentioning that in Oracle you can define a scale greater than the precision; for instance, NUMBER(3,10) is valid in Oracle. SQL Server, on the other hand, requires that precision >= scale. So if you defined NUMBER(3,10) in Oracle, it would map into SQL Server as NUMERIC(10,10).
Defining a column in Oracle as NUMBER(10,5) means that the column value can have up to five digits after the decimal point and up to ten digits in total, which leaves at most five digits to the left of the decimal point. If you insert a value that has no fractional part, the maximum the column will support is therefore five digits. For example, these values will be supported by a column defined as NUMBER(10,5):
12345
12345.67890
It made validation a pain.
MySQL and SQL Server don't support the NUMBER data type - to support decimals, you're looking at using DECIMAL (or FLOAT?). I haven't looked at PostgreSQL, but I would figure it to be similar to Oracle.
In Oracle, a column defined as NUMBER(4,5) requires a zero for the first digit after the decimal point and rounds all values past the fifth digit after the decimal point.
From the Oracle documentation
NUMBER(p,s)
where: p is the precision, or the
total number of digits. Oracle
guarantees the portability of numbers
with precision ranging from 1 to 38. s
is the scale, or the number of digits
to the right of the decimal point. The
scale can range from -84 to 127.
Here are some examples:
Actual data .000127 stored in NUMBER(4,5) becomes .00013
Actual data 7456123.89 stored in NUMBER(7,-2) becomes 7456100
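Python's decimal module can reproduce both examples with quantize. Half-up rounding is shown here as an assumption about Oracle's behavior:

```python
from decimal import Decimal, ROUND_HALF_UP

# Scale 5: round to five digits after the decimal point.
print(Decimal(".000127").quantize(Decimal("1E-5"), rounding=ROUND_HALF_UP))
# 0.00013

# Scale -2: round to hundreds (two places left of the decimal point).
print(Decimal("7456123.89").quantize(Decimal("1E+2"), rounding=ROUND_HALF_UP))
# 7.4561E+6, i.e. 7456100
```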
Edited
JonH mentions something noteworthy:
Oracle allows scale > precision, and SQL Server will map that so that if s > p, then p becomes s. That is, NUMBER(3,4) in Oracle becomes NUMERIC(4,4) in SQL Server.