What is the best way to store a percent value in SQL Server? - sql

I want to store a value that represents a percent in SQL Server; what data type should be the preferred one?

You should use decimal(p,s) in 99.9% of cases.
Percent is only a presentation concept: 10% is still 0.1.
Simply choose the precision and scale for the highest expected value and the desired number of decimal places when the percentages are expressed as real numbers. You can have p = s for values < 100% and simply decide based on the number of decimal places.
However, if you do need to store 100% (i.e. 1), then you'll need p = s+1.
That then allows values up to 9.xxxxxx (i.e. 9xx.xxxx%), so I'd add a check constraint to cap the column at 1 if that's all I need.
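For example, here is a minimal sketch of that approach (the table, column, and constraint names are just illustrative):
CREATE TABLE dbo.Discounts
( DiscountPct decimal(7,6) NOT NULL  -- one digit before the point, six after
  CONSTRAINT CK_Discounts_DiscountPct CHECK (DiscountPct >= 0 AND DiscountPct <= 1)
);
INSERT INTO dbo.Discounts (DiscountPct) VALUES (0.4525);  -- 45.25% stored as a real number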

decimal(p, s) and numeric(p, s)
p (precision):
The maximum total number of decimal digits that will be stored (both to the left and to the right of the decimal point)
s (scale):
The number of decimal digits that will be stored to the right of the decimal point (-> s defines the number of decimal places)
0 <= s <= p.
p ... total number of digits
s ... number of digits to the right of the decimal point
p-s ... number of digits to the left of the decimal point
Example:
CREATE TABLE dbo.MyTable
( MyDecimalColumn decimal(5,2)
,MyNumericColumn numeric(10,5)
);
INSERT INTO dbo.MyTable VALUES (123, 12345.12);
SELECT MyDecimalColumn, MyNumericColumn FROM dbo.MyTable;
Result:
MyDecimalColumn: 123.00 (p=5, s=2)
MyNumericColumn: 12345.12000 (p=10, s=5)
link: msdn.microsoft.com

I agree, DECIMAL is where you should store this type of number. But to make the decision easier, store it as a proportion of 1, not as a percentage of 100. That way you can store exactly the number of decimal places you need regardless of the "whole" number. So if you want 6 decimal places of the percentage, use DECIMAL(9, 8), and for 23.346435% you store 0.23346435. Changing it back to 23.346435% is a display problem, not a storage problem, and most presentation languages / report writers etc. are capable of changing the display for you.
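A small sketch of that convention (the names here are made up for illustration); converting back for display is then just a multiplication, whether you do it in SQL or in the presentation layer:
CREATE TABLE dbo.Rates
( Rate decimal(9,8) NOT NULL  -- stored as a fraction of 1
);
INSERT INTO dbo.Rates (Rate) VALUES (0.23346435);
SELECT Rate * 100 AS RatePercent FROM dbo.Rates;  -- 23.34643500, i.e. 23.346435%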

I think decimal(p, s) should be used, where s determines how many decimal places of the percentage you keep.
You never need more than one digit to the left of the decimal point, since each such digit represents one hundred percent, so p = s+1 is enough and lets you store anything up to (just under) 1000%.
(SQL doesn't allow p to be smaller than s in any case.)
Examples:
28.2656579879% should be decimal(13, 12) and should be stored as 0.282656579879
128.2656579879% should be decimal(13, 12) and should be stored as 1.282656579879
28% should be stored in decimal(3,2) as 0.28
128% should be stored in decimal(3,2) as 1.28
Note: if you know that you're never going to reach 100% (i.e. your value will always be less than 100%), then use decimal(s, s); if it can reach or exceed 100%, use decimal(s+1, s).
And so on

The datatype of the column should be decimal.

Related

How to query column with letters on SQL?

I'm new to this.
I have a column (chocolate_weight) on the table (Chocolate) which has g at the end of every number, so 30g, 25g, 10g etc.
I want to remove the letter at the end and then query it to show any that weigh greater than 35.
So far I have done
Select *
From Chocolate
Where chocolate_weight IN
(SELECT
REPLACE(chocolate_weight,'g','') From Chocolate) > 35
It is coming back with 0 , even though there are many that weigh more than 35.
Any help is appreciated
Thanks
If 'g' is always the suffix then your current query is along the right lines, but you don't need the IN; you can do the replace in the WHERE clause:
SELECT *
FROM Chocolate
WHERE CAST(REPLACE(chocolate_weight,'g','') AS DECIMAL(10, 2)) > 35;
N.B. This works in both the tagged DBMS SQL-Server and MySQL
This will fail (although only silently in MySQL) if anything contains units other than grams, though. So what I would strongly suggest, if it is not too late, is that you fix your design: store the weight as a numeric type and lose the 'g' completely if you only ever store grams. If you use multiple different units, you may wish to standardise so that everything is stored as grams, or alternatively store the two things in separate columns, a decimal/int column for the numeric value and a separate column for the unit, e.g.
Weight   Unit
10       g
150      g
1000     lb
The issue you will have here, though, is that you will have to start doing conversions in your queries to ensure you get all results. It is easier to do the conversion once when the data is saved and use a standard measure for all records.
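As a rough sketch of that redesign (the new column name is made up, and this assumes every existing value really is a number followed by 'g'):
-- New numeric column holding the weight in grams
ALTER TABLE Chocolate ADD chocolate_weight_g DECIMAL(10, 2);
-- One-off conversion from the old text column
UPDATE Chocolate
SET chocolate_weight_g = CAST(REPLACE(chocolate_weight, 'g', '') AS DECIMAL(10, 2));
-- Queries then become plain numeric comparisons
SELECT * FROM Chocolate WHERE chocolate_weight_g > 35;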

SQL Server floating point numbers

I was wondering if there is a way to show the values of columns of type floating point numbers in two decimal places in SQL Server 2008 via settings? For instance, let say I have a table called orders with several columns. I want to be able to do the following:
SELECT * FROM orders
I expect to see any values in columns of type float to display with decimal notation; for instance, a value of 4 should display as 4.0 or 4.00.
Thanks
You may use the CONVERT function with NUMERIC(x, 2) for numeric values
(where x is at least 3, better more, up to 38):
SELECT CONVERT(NUMERIC(10, 2), 4 ) as "Dcm Nr(2)";
Dcm Nr(2)
---------
4,00
SELECT CONVERT(NUMERIC(10, 1), 4 ) as "Dcm Nr(1)";
Dcm Nr(1)
---------
4,0
The simplest thing that works for me is doing a cast, for example:
SELECT CAST(your_column AS DECIMAL(10,2)) FROM [your table];
The short answer to your question is "No".
SQL Server isn't really in the business of data presentation. We all do a lot of backbends from time to time to force things into a presentable state, and the other answers provided so far can help you on a column by column basis.
But the sort of "set it and forget it" thing you're looking for is better handled in a front end application.
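Since there is no server-wide setting for this, the per-column route looks something like the sketch below (the column names are invented for illustration):
SELECT
    order_id,
    CONVERT(NUMERIC(10, 2), unit_price) AS unit_price,
    CONVERT(NUMERIC(10, 2), freight_cost) AS freight_cost
FROM orders;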

Redshift numeric precision truncating

I have encountered a situation, in how Redshift handles division of SUMs, that I can't explain.
Here is an example table:
create table public.datatype_test(
a numeric(19,6),
b numeric(19,6));
insert into public.datatype_test values(222222.2222, 333333.3333);
insert into public.datatype_test values(444444.4444, 666666.6666);
Now I try to run this query:
select sum(a)/sum(b) from public.datatype_test;
I get the result 0.6666 (4 decimals). It is not related to the tool's display; it really returns only 4 decimal places, and it doesn't matter how big or small the numbers in the table are. In my case 4 decimals is not precise enough.
The same holds true if I use AVG instead of SUM.
If I use MAX instead of SUM, I get : 0.6666666666666666666 (19 decimals).
It also returns the correct result (0.6666666666666667) when no physical table is used:
with t as (
select 222222.2222::numeric(19,6) as a, 333333.3333::numeric(19,6) as b union all
select 444444.4444::numeric(19,6) as a, 666666.6666::numeric(19,6) as b
)
select sum(a)/sum(b) as d from t;
I have looked into the Redshift documentation about SUM and Computations with Numeric Values, but I still don't get a result that matches the documentation.
Using float datatype for table columns is not an option as I need to store precise currency amounts and 15 significant digits is not enough.
Using cast on SUM aggregation also gives 0.6666666666666666666 (19 decimals).
select sum(a)::numeric(19,6)/sum(b) from public.datatype_test;
But it looks wrong; I can't force BI tools to do this workaround, and everyone else who uses this data shouldn't need this kind of workaround either.
I have tried the same test in PostgreSQL 10, and it works as it should, returning a sufficient number of decimals for the division.
Is there anything I can do with database setup to avoid casting in SQL Query?
Any advice or guidance is highly appreciated.
Redshift version:
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.4081
Using dc2.8xlarge nodes
I have run into similar issues, and although I don't have a solution that doesn't require a workaround, I can at least explain it.
The precision/scale of the result of division is defined by the rules in the "computations with numeric values" document.
A consequence of those rules is that a decimal(19,6) divided by another decimal(19,6) will return decimal(38,19).
What's happening to you, though, is that MAX returns the same precision/scale as the underlying column, but SUM returns decimal(38,*) no matter what.
(This is probably a safety precaution to prevent overflow on sums of "big data"). If you divide decimal(38,6) by another, you get decimal(38,4).
AWS support will probably not consider this a defect -- there is no SQL standard for how to treat decimal precision in division, and given that this is documented behavior, it's probably a deliberate decision.
The only way to address this is to typecast the numerator, or multiply it by something like sum(a) * cast(1 as decimal(10,9)) which is portable SQL and will force more decimal places in the numerator and thus the result.
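Applied to the example table from the question, that workaround would look like this (decimal(10,9) is just one reasonable choice of precision/scale for the factor):
select sum(a) * cast(1 as decimal(10,9)) / sum(b) as d
from public.datatype_test;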
As a convenience I made a calculator in JSFiddle with the rules so you can play around with different options:
// p1, s1: precision and scale of the numerator; p2, s2: precision and scale of the denominator
scale = Math.max(4, s1 + p2 - s2 + 1)
precision = p1 - s1 + s2 + scale
if (precision > 38) {
    // Redshift caps precision at 38 and gives back as much scale as still fits (minimum 4)
    scale = Math.max((38 + scale - precision), 4)
    precision = 38
}
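Worked through for the case in the question: SUM promotes both sides to decimal(38,6), so scale = max(4, 6 + 38 - 6 + 1) = 39 and precision = 38 - 6 + 6 + 39 = 77; since 77 > 38, the scale collapses to max(38 + 39 - 77, 4) = 4 and the precision to 38, which gives exactly the decimal(38,4), and hence the 0.6666, that the query returns.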

SQL Long Decimal Value Comparison

So I have two identical values that are results of sum functions with the exact same length (no rounding is being done). (Update, data type is float)
Value_1 = 29.9539194336501
Value_2 = 29.9539194336501
The issue I'm having is when I do an IF statement for Value_1 = Value_2, it comes up as FALSE.
Value_1:
SELECT SUM([INVN_DOL])/SUM([AVG_DLY_SLS_LST_35_DYS]) as DSO
FROM TABLE A
Value_2:
SELECT SUM ([Total_Inventory_Val]) / SUM ([Daily_Independent_Demand])
FROM TABLE B
Any idea why they may not be exactly equal and what I can do to get a TRUE value since they do match?
Thanks in advance
The issue you are having here is that you are using a calculated value that is held in a float, which by design is slightly imprecise at higher levels of precision; that is why you are getting your mismatch.
Use data types like decimal with a defined precision and scale to hold your values and calculation results and you should get consistent results.
You can make use of ROUND to limit the decimal places, or try ABS (checking that the absolute difference is below a small threshold) and see if that works out.
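A minimal sketch of both ideas, assuming the two results have already been captured in float variables (the rounding level and tolerance are arbitrary choices):
DECLARE @Value_1 float = 29.9539194336501;
DECLARE @Value_2 float = 29.9539194336501;
-- Option 1: round both sides before comparing
IF ROUND(@Value_1, 10) = ROUND(@Value_2, 10)
    PRINT 'equal after rounding';
-- Option 2: treat them as equal when the difference is negligible
IF ABS(@Value_1 - @Value_2) < 1e-9
    PRINT 'equal within tolerance';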

How does the Average function work in relational databases?

I'm trying to find the geometric average of values from a table with millions of rows. For those that don't know, to find the geometric average you multiply all the values together and then take the Nth root, where N is the number of rows.
You probably already see the problem: the running product will quickly exceed the system maximum. I found a great solution that uses the natural log.
http://timothychenallen.blogspot.com/2006/03/sql-calculating-geometric-mean-geomean.html
However, that got me wondering: wouldn't the same problem apply to the arithmetic mean? If you have N records and N is very large, the running sum can also exceed the system maximum.
So how do RDBMSs calculate averages during queries?
I don't know an exact implementation for arithmetic mean in an RDBMS, nor did you specify one in your original question. But the RDBMS does not need to sum a million rows in a column in order to obtain the arithmetic mean. Consider the following summation:
sum = (x1 + x2 + x3 + ... + x1000000)
Then the mean can be written as
mean = sum / N = (x1 + x2 + x3 + ... + x1000000) / N, for N = 1,000,000
But this expression can be broken up into pieces like this:
mean = [(x1 + x2 + x3) / N ] + [(x4 + x5 + x6) / N] + ...
In other words, the RDBMS can simply scan down the million rows in a column and find the mean section by section, without running the risk of an overflow. And since each number in the column is presumably within range for the type storing it, there is no chance of the mean value itself overflowing.
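As a tiny concrete example, the mean of the numbers 1 through 6 can be accumulated that way: (1 + 2 + 3) / 6 + (4 + 5 + 6) / 6 = 1 + 2.5 = 3.5, the same as (1 + 2 + ... + 6) / 6 = 21 / 6, but no partial term ever has to hold the full sum.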
Most databases don't support a product() function the way they support an average.
However, you can do what you want with logs. The product (simplified) is like:
select exp(sum(ln(x))) as product
The average would be:
select power(exp(sum(ln(x))), 1.0 / count(*)) as geoaverage
or
select EXP(AVG(LN(x))) as geoaverage
The LN() function might be LOG() on some platforms...
These are schematic. The functions for exp(), ln(), and power() vary depending on the database. Plus, if you have to take zero or negative numbers into account, the logic is more complicated.
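As one possible way to handle the zero case (an assumption about the desired behaviour, not something the databases do for you), you could exclude non-positive values from the aggregate, e.g. for a table t with a numeric column x:
-- NULLs produced by the CASE are ignored by AVG, so zeros and negatives are skipped
select exp(avg(ln(case when x > 0 then x end))) as geoaverage
from t;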
Very easy to check. For example, in SQL Server 2008:
DECLARE @T TABLE(i int);
INSERT INTO @T(i) VALUES
(2147483647),
(2147483647);
SELECT AVG(i) FROM @T;
result
(2 row(s) affected)
Msg 8115, Level 16, State 2, Line 7
Arithmetic overflow error converting expression to data type int.
There is no magic. The column type is int, the server adds the values together using an internal variable of the same int type, and the intermediate result exceeds the range for int.
You can run a similar check for any other DBMS that you use. Different engines may behave differently, but I would expect all of them to stick to the original type of the column. For example, averaging the two int values 100 and 101 may result in 100 or 101 (still int), but never 100.5.
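That truncation is easy to see with a quick, self-contained check in SQL Server:
-- Integer AVG is computed as int: (100 + 101) / 2 = 100
SELECT AVG(i) AS int_avg FROM (VALUES (100), (101)) AS t(i);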
For SQL Server this behavior is documented. I would expect something similar for all other engines:
AVG () computes the average of a set of values by dividing the sum of
those values by the count of nonnull values. If the sum exceeds the
maximum value for the data type of the return value an error will be
returned.
So, you have to be careful when calculating simple average as well, not just product.
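The usual way around it in SQL Server (a suggestion, not something from the quote above) is to widen the type before aggregating:
-- Reusing the @T table variable from the example above (same batch):
-- averaging over bigint avoids the int overflow
SELECT AVG(CAST(i AS bigint)) FROM @T;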
Here is an extract from the SQL-92 standard:
6) Let DT be the data type of the < value expression >.
9) If SUM or AVG is specified, then:
a) DT shall not be character string, bit string, or datetime.
b) If SUM is specified and DT is exact numeric with scale S, then the
data type of the result is exact numeric with implementation-defined
precision and scale S.
c) If AVG is specified and DT is exact numeric, then the data type of
the result is exact numeric with implementation-defined precision not
less than the precision of DT and implementation-defined scale not
less than the scale of DT.
d) If DT is approximate numeric, then the data type of the result is
approximate numeric with implementation-defined precision not less
than the precision of DT.
e) If DT is interval, then the data type of the result is interval
with the same precision as DT.
So, a DBMS can convert int to a larger type when calculating AVG, but it has to be an exact numeric type, not floating-point. In any case, depending on the values, you can still get an arithmetic overflow.
Some DBMS — specifically, the Informix DBMS — convert from an INT type to a floating point type to do the calculation:
SQL[2148]: create table t(i int);
SQL[2149]: insert into t values(214748347);
SQL[2150]: insert into t values(214748347);
SQL[2151]: insert into t values(214748347);
SQL[2152]: select avg(i) from t;
214748347.0
SQL[2153]: types on;
SQL[2154]: select i from t;
INTEGER
214748347
214748347
214748347
SQL[2155]: select avg(i) from t;
DECIMAL(32)
214748347.0
SQL[2156]:
Similarly with other types. This can still end with an overflow under some circumstances; you then get a runtime error. However, it is rather seldom that you exceed the precision — it typically takes a very large number of rows for the sum to exceed the limits, even if you're counting the US deficit over the next century in atto-Zimbabwean dollars circa 2009.