How does SQL Server determine the number of significant digits when rounding - sql

I have two database fields that require significant precision - one column is decimal(36,12) and another is decimal(36,24).
When multiplying them I was expecting that precision will be preserved, however that seems not to be the case - I actually get less precision. For example when running following SQL:
SELECT 3.035 * 0.333333333333333333333333 AS Normal
SELECT CAST(3.035 AS DECIMAL(36,12)) * CAST(0.333333333333333333333333 AS DECIMAL(36,24)) AS Cast12
SELECT CAST(3.035 AS DECIMAL(36,24)) * CAST(0.333333333333333333333333 AS DECIMAL(36,24)) AS Cast24
I get following results:
Normal : 1.011666666666666666666665655
Cast12 : 1.011667
Cast24 : 1.0116666666667
I guess this is because multiplication is operation that usually yields "bigger" number. I obviously can find a way around this (Cast24 is "good enough"), but I want to understand how exactly does SQL Server determines when to round and what digits to keep?

The combination of the following articles should answer your question.
http://cuisinecode.blogspot.com/2014/04/losing-precision-after-multiplication.html
http://technet.microsoft.com/en-us/library/ms190476.aspx

Related

Real number comparison for trigram similarity

I am implementing trigram similarity for word matching in column comum1. similarity() returns real. I have converted 0.01 to real and rounded to 2 decimal digits. Though there are rank values greater than 0.01, I get no results on screen. If I remove the WHERE condition, lots of results are available. Kindly guide me how to overcome this issue.
SELECT *,ROUND(similarity(comum1,"Search_word"),2) AS rank
FROM schema.table
WHERE rank >= round(0.01::real,2)
I have also converted both numbers to numeric and compared, but that also didn't work:
SELECT *,ROUND(similarity(comum1,"Search_word")::NUMERIC,2) AS rank
FROM schema.table
WHERE rank >= round(0.01::NUMERIC,2)
LIMIT 50;
The WHERE clause can only reference input column names, coming from the underlying table(s). rank in your example is the column alias for a result - an output column name.
So your statement is illegal and should return with an error message - unless you have another column named rank in schema.table, in which case you shot yourself in the foot. I would think twice before introducing such a naming collision, while I am not completely firm with SQL syntax.
And round() with a second parameter is not defined for real, you would need to cast to numeric like you tried. Another reason your first query is illegal.
Also, the double-quotes around "Search_word" are highly suspicious. If that's supposed to be a string literal, you need single quotes: 'Search_word'.
This should work:
SELECT *, round(similarity(comum1,'Search_word')::numeric,2) AS rank
FROM schema.table
WHERE similarity(comum1, 'Search_word') > 0.01;
But it's still pretty useless as it fails to make use of trigram indexes. Do this instead:
SET pg_trgm.similarity_threshold = 0.01; -- set once
SELECT *
FROM schema.table
WHERE comum1 % 'Search_word';
See:
Finding similar strings with PostgreSQL quickly
That said, a similarity of 0.01 is almost no similarity. Typically, you need a much higher threshold.

Float precision in Bigquery [duplicate]

We don't have decimal data type in BigQuery now. So I have to use float
But
In Bigquery float division
0.029*50/100=0.014500000000000002
Although
0.021*50/100=0.0105
To round the value up
I have to use round(floatvalue*10000)/10000.
Is this the right way to deal with decimal data type now in BigQuery?
Depends on your coding preferences - for example you can just use simple ROUND(floatvalue, 4)
Depends on how exactly you need to round - up or down - you can respectively adjust expression
For example ROUND(floatvalue + 0.00005, 4)
See all rounding functions for BigQuery Standard SQL at below link
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#rounding-functions
Note that this question deserves a different answer now.
The premise of the question is "We don't have decimal data type in BigQuery now."
But now we do: You can use NUMERIC:
SELECT CAST('0.029' AS NUMERIC)*50/100
# 0.0145
Just make your column is NUMERIC instead of FLOAT64, and you'll get the desired results.
Rounding up in most SQL dialects is not a built-in function unless you're fortunate enough to be rounding up to an integer. In this case, CEIL is a quick and reliable solution.
In the case of rounding decimals up, we can also leverage CEIL, albeit with a couple of additional steps.
The procedure:
Multiply your value to move the last decimal to the tenths position. Ex. 18.234 becomes 1823.4 by multiplying by 100. (n * 100)
Use CEIL() to round up to the nearest integer. In our example, CEIL(n) = 1824.
Divide this result by the same figure used in step 1. In our example, n / 100 = 18.24.
Simplifying these steps leaves us with the below logic:
SELECT CEIL(value * 100) / 100 as rounded_up;
The same logic can be used to round down using the FLOOR function as such:
FLOOR(value * 100) / 100 AS rounded_down;
Thanks to #Mureinik for this answer.

How to use bigquery round up results to 4 digits after decimal point?

We don't have decimal data type in BigQuery now. So I have to use float
But
In Bigquery float division
0.029*50/100=0.014500000000000002
Although
0.021*50/100=0.0105
To round the value up
I have to use round(floatvalue*10000)/10000.
Is this the right way to deal with decimal data type now in BigQuery?
Depends on your coding preferences - for example you can just use simple ROUND(floatvalue, 4)
Depends on how exactly you need to round - up or down - you can respectively adjust expression
For example ROUND(floatvalue + 0.00005, 4)
See all rounding functions for BigQuery Standard SQL at below link
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#rounding-functions
Note that this question deserves a different answer now.
The premise of the question is "We don't have decimal data type in BigQuery now."
But now we do: You can use NUMERIC:
SELECT CAST('0.029' AS NUMERIC)*50/100
# 0.0145
Just make your column is NUMERIC instead of FLOAT64, and you'll get the desired results.
Rounding up in most SQL dialects is not a built-in function unless you're fortunate enough to be rounding up to an integer. In this case, CEIL is a quick and reliable solution.
In the case of rounding decimals up, we can also leverage CEIL, albeit with a couple of additional steps.
The procedure:
Multiply your value to move the last decimal to the tenths position. Ex. 18.234 becomes 1823.4 by multiplying by 100. (n * 100)
Use CEIL() to round up to the nearest integer. In our example, CEIL(n) = 1824.
Divide this result by the same figure used in step 1. In our example, n / 100 = 18.24.
Simplifying these steps leaves us with the below logic:
SELECT CEIL(value * 100) / 100 as rounded_up;
The same logic can be used to round down using the FLOOR function as such:
FLOOR(value * 100) / 100 AS rounded_down;
Thanks to #Mureinik for this answer.

Why is SQL Server changing operation order and boxing the way it does?

Four simple SELECT statements:
SELECT 33883.50 * -1;
SELECT 33883.50 / -1.05;
SELECT 33883.50 * -1 / 1.05;
SELECT (33883.50 * -1) / 1.05;
But the results are not as I would expect:
-33883.50
-32270.000000
-32269.96773000
-32270.000000
That third result is the one that seems questionable. I can see what is happening, first SQL Server evaluates this:
SELECT -1 / 1.05;
Getting an answer of:
-0.952380
Then it takes that answer and uses it to perform this calculation:
SELECT 33883.50 * -0.952380;
To get the (wrong) answer of:
-32269.96773000
But why is it doing this?
In your example
33883.50 * -1 / 1.05
is evaluated as
33883.50 * (-1 / 1.05)
instead of
(33883.50 * -1) / 1.05
which results in a loss in precision.
I played a bit with it. I used SQL Sentry Plan Explorer to see the details of how SQL Server evaluates expressions. For example,
2 * 3 * -4 * 5 * 6
is evaluated as
((2)*(3)) * ( -((4)*(5))*(6))
I'd explain it like this. In T-SQL unary minus is made to be the same priority as subtraction, which is lower than multiplication. Yes,
When two operators in an expression have the same operator precedence
level, they are evaluated left to right based on their position in the
expression.
, but here we have an expression that mixes operators with different priorities and parser follows these priorities to the letter. Multiplication has to go first, so it evaluates 4 * 5 * 6 at first and then applies unary minus to the result.
Normally (say in C++) unary minus has higher priority (like bitwise NOT) and such expressions are parsed and evaluated as expected. They should have made unary minus/plus same highest priority as bitwise NOT in T-SQL, but they didn't and this is the result. So, it is not a bug, but a bad design decision. It is even documented, though quite obscurely.
When you refer to Oracle - that the same example works differently in Oracle than in SQL Server:
Oracle may have different rules for operator precedence than SQL Server. All it takes is to make unary minus highest priority as it should.
Oracle may have different rules for determining result precision and scale when evaluating expressions with decimal type.
Oracle may have different rules for rounding intermediate results. SQL Server "uses rounding when converting a number to a decimal or numeric value with a lower precision and scale".
Oracle may be using completely different types for these kind of expressions, not decimal. In SQL Server "a constant with a decimal point is automatically converted into a numeric data value, using the minimum precision and scale necessary. For example, the constant 12.345 is converted into a numeric value with a precision of 5 and a scale of 3."
Even definition of decimal may be different in Oracle. Even in SQL Server "the default maximum precision of numeric and decimal data types is 38. In earlier versions of SQL Server, the default maximum is 28."
Do you know BODMAS rule. The answer is correct its not because of Sql Server, Its a basic mathematics.
First comes Division then comes the Subtraction, So always Division will happen before Subtraction
If you want to get correct answer then use proper parenthesis
SELECT (33883.50 * -1) / 1.05;
T-SQL has an rule for operator precedence which it follows. You can read about it on the link https://msdn.microsoft.com/en-us/library/ms190276.aspx.
It seems to be a precedence rule concerning unary operators. I have tried the following queries
SELECT 33883.50 * cast(-1 as int) / 1.05;
SELECT 33883.50 * (-1 * 1) / 1.05;
and it returns the right answer. The best thing to do is to use parentheses on expressions you want to occur first, and test thoroughly.

Query speed and expressions with constant value

Is
SELECT * FROM Table WHERE ID=2*2
slower than
SELECT * FROM Table WHERE ID=4
I mean: is 2*2 calculated once or is it calculated when checking each record?
You may ask why write 2*2 and not just simply 4?
Well actually in my real case I have some numbers which represents characters building some text (somebody had a "great" idea of saving array of numeric data as string).
So my query would be like:
SELECT * FROM Table where value1=(CHAR(20)+CHAR(8)+CHAR(32))
I could calculate this string on my side but there could be possible problems with encoding, passing characters like /n or /r in a query etc.
Will this (CHAR(20)+CHAR(8)+CHAR(32)) slow my query if I use it instead of plain string text like '#!_'
it will be the same performance with a constant expression like 2*2. However if you start comparing ID/2 = 2. It will slow down the query because it has to calculate for all rows.