Dealing with rounding errors in SAS SQL

I'm working on a large data application and have run into a rounding error that I can't seem to fix. The code is basically like this:
proc sql;
select round(amount,.01) - round(amount * (percentage/100),.01) as amount
from data;
quit;
I've tried various fixes, but each one seems to introduce new rounding errors in the other direction. For the row that produces the error, amount = 56.45 and percentage = 10; I get 50.80 but expect 50.81. I cannot accept the rounding error because a separate process reverses these transactions without a rounding error, and in the end the reversals plus the amounts computed here must sum to zero.
Code I've tried:
select round((((100-percentage)/100)*amount), .01)
select round(amount,.01) - round(amount * (percentage/100),.001) as amount
the second of which fixes the issue but creates three rounding errors in the other direction.
Any help is very appreciated.
Thank you.

Without knowing your datatypes, I can't say for certain, but here are some changes that should help resolve your issue:
Make sure you are working with decimal data types, not floats.
Round after you finish the math. You are rounding each step of your calculation in two of your code snippets, which is likely to produce incorrect results (see the sketch below).
Be very careful with your order of operations and parentheses. For example, written without parentheses, 100-percentage/100 evaluates to 100 - 10/100 = 100 - 0.1 = 99.9, which is probably not what you want. Also double-check that every open parenthesis has a matching close parenthesis.
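For example, a minimal sketch of the second point, assuming amount and percentage are ordinary numeric columns: compute the whole expression first, then apply a single round at the end.
proc sql;
  /* round once, after the full calculation, instead of per term */
  select round(amount * ((100 - percentage) / 100), .01) as amount
  from data;
quit;
If the reversing process rounds the deduction itself, you may instead need to mirror it exactly: compute round(amount * percentage / 100, .01) once and subtract that same rounded value, so the two processes cancel to zero by construction.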

How to get around a GEOS error when doing st_union?

I have a big layer with lines, and a view that needs to calculate the total length of these lines without double-counting their overlaps.
A working query that does half the job (it does not account for the overlap, so it overestimates the total length):
select name, sum(st_length(t.geom)) from mytable t where st_isvalid(t.geom) group by name
The intended query that returns SQL Error [XX000]: ERROR: GEOSUnaryUnion: TopologyException: found non-noded intersection between LINESTRING (446659 422287, 446661 422289) and LINESTRING (446659 422288, 446660 422288) at 446659.27944086661 422288.0015405959
select name,st_length(st_union(t.geom)) from mytable t where st_isvalid(t.geom) group by name
The thing is that the latter works fine for the first 200 rows; it's only when I try to export the entire view that I get the error.
Would there be a way to use the preferred query first, and if it returns an error on a row use the other one? Something like:
case when st_length(st_union(t.geom)) = error then sum(st_length(t.geom))
else st_length(st_union(t.geom)) end
Make sure your geometries are valid before the union by wrapping them in ST_MakeValid(). You can also query their individual validity with select id, ST_IsValid(t.geom) from mytable t; to filter out or correct the affected ones. This helps in cases where one of your geometries is itself invalid, but it will still leave cases where the invalidity only appears after combining multiple valid geometries.
See if ST_UnaryUnion(ST_Collect(ST_MakeValid(t.geom))) changes anything. It will try to dissolve and node the component linestrings.
When really desperate, you can make a PL/pgSQL wrapper around both of your functions and switch to the backup one in the exception block (see the sketch at the end of this answer).
At the expense of some precision, and with the benefit of somewhat higher performance, you could try snapping the geometries to a grid, ST_Union(ST_SnapToGrid(t.geom, 1e-7)), gradually increasing the grid size to 1e-6, 1e-5. Some geometries may not actually intersect but lie so close together that PostGIS cannot tell them apart at the precision it operates at. You can also apply this only to your problematic geometries, if you can pinpoint them.
As reminded by @dr_jts, PostGIS 3.1.0 includes a new overlay engine, so if your select postgis_full_version(); shows anything below that and GEOS 3.9.0, it might be worth upgrading. The upcoming PostGIS 3.2.0 with GEOS 3.10.1 should also provide some improvement in validity checks.
Here's a related thread.
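A minimal sketch of that PL/pgSQL fallback, assuming PostGIS line geometries; the function name safe_union_length is made up for illustration:
create or replace function safe_union_length(geoms geometry[])
returns double precision as $$
begin
    -- preferred path: dissolve overlaps first, then measure once
    return ST_Length(ST_Union(geoms));
exception when others then
    -- backup path: plain sum of lengths (overestimates where lines overlap)
    return (select sum(ST_Length(g)) from unnest(geoms) as g);
end;
$$ language plpgsql;

-- usage:
select name, safe_union_length(array_agg(t.geom))
from mytable t
where ST_IsValid(t.geom)
group by name;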

Round SQL String Data to correct decimal place, then return string data without floating point errors

We are trying to implement a reporting system using software that queries our SQL database. Due to a variety of circumstances, we have a need to round data within the SQL queries. Our goal is to avoid floating point errors, unwanted trailing zeros, and complexity of nested functions (if possible).
The incoming data is always type nvarchar(...) and needs to remain in a string format, which is causing problems for us. Here is an example of what I mean (tested using w3schools.com):
SELECT
STR(235.415, 10, 2) AS StringValue1,
STR('235.415', 10, 2) AS StringValue2,
STR(ROUND(235.415, 2),10,2) AS RoundValue1,
STR(ROUND('235.415', 2),10,2) AS RoundValue2,
STR(CAST('235.415' As NUMERIC(8,2)),10,2) As CastValue1
And, the result:
I know that the issue is a conversion to a floating point data type when handling strings. I think the last option, i.e. casting to numeric, is the answer to my issue. However, I can't tell whether this output is correct because the CAST guarantees there will not be an error, or because I got lucky in this specific instance.
Is there any type of SQL round function (or combination of functions) that takes string input, outputs string data, and doesn't involve floating point arithmetic? -- Thanks in advance!
NUMERIC/DECIMAL and MONEY don't use floating point arithmetic. They are in fact integers with a fixed decimal point.
Be aware that if you have large sums, or do calculations on these values, your rounding error can get pretty big pretty fast. So it is advisable to take a moment to think about which precision you store each value with, and when you want to round.
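Applied to the original question, a minimal T-SQL sketch, assuming the strings always parse as numbers:
SELECT CONVERT(varchar(32), CAST('235.415' AS NUMERIC(18,2))) AS RoundedString
-- '235.42': the string is parsed as an exact decimal and the cast itself
-- rounds to two places, so no floating point representation is ever involved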

Why does SQL Server rounds to 9.999999999999999e+004?

I am trying to figure out why SQL Server is returning 9.999999999999999e+004 when it's supposed to return 1.000000000000000e+005 from the following sql statement:
select Convert(
varchar(32),
round(cast('123456' as Float), -5),
2
)
Even more interesting is that the following statement correctly returns: 1.0000000e+005
select Convert(varchar(32),
round(cast('123456' as Float), -5),
1
)
Any help would be greatly appreciated.
My best guess is that the internal computation for round() is something to the effect of:
round(123456 / 100000.0, 0) * 100000.0
The fractional part produced by the division is off by the lowest order bit, as floating point arithmetic is wont to do.
(The above will not reproduce the problem because the computation is between integers and decimals. There are no floating point values.)
Note that you don't need the quotes around '123456' to cause the problem. However, because numbers with a decimal point are interpreted as decimals, rather than floats, it does not happen with convert(varchar(32), 123456.0, 2).
The difference between formats "1" and "2" is interesting. I would put this up to the vagaries of floating point arithmetic as well.
I am guessing that you can figure out pretty easy work-arounds.
And, as I allude to in a comment, this is a bit weird. Floating point representations can exactly represent 123,456 as well as 100,000. The problem must be in an intermediate value.
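One possible work-around along those lines (a sketch, not a tested fix): do the rounding in exact NUMERIC first, and only convert to float at the end for the scientific-notation formatting.
select Convert(varchar(32),
       Cast(round(Cast('123456' as numeric(18,0)), -5) as Float),
       2)
-- round() on an exact NUMERIC yields exactly 100000, and converting 100000
-- to float is exact, so style 2 prints 1.000000000000000e+005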
Floats cannot represent every rational number exactly, because only a fixed number of bits is available for the whole value. 10^5 itself is exactly representable in a 32-bit or 64-bit float, but an intermediate value in the rounding computation (such as 1.23456) is not, and 9.999999999999999e+004 is where the resulting error lands.
It's not a bug, more an implementation limitation.
For more info: Wikipedia: Floating Point > Representable Numbers

SQL Server Strange Ceiling() behavior

Can anyone explain the following results in SQL Server? I'm stumped.
declare @mynum float = 8.31
select ceiling(@mynum * 100)
Results in 831
declare @mynum float = 8.21
select ceiling(@mynum * 100)
Results in 822
I've tested a whole range of numbers (in SQL Server 2012). Some increase while others stay the same. I'm at a loss understanding why ceiling is treating some of them differently. Changing from a float to a decimal(18,5) seems to fix the problem but I'm wary there may be other repercussions I'm missing from doing so. Any explanations would help.
I think this is called float precision. You will find it in almost all programming languages, and in databases too. It happens because data is stored with only limited precision: what you set as 8.31 is probably not exactly 8.31 but something extremely close, like 8.3099999..., and when you multiply and ceil it, a different value can appear.
In the SQL Server documentation you can read:
Approximate-number data types for use with floating point numeric data. Floating point data is approximate; therefore, not all values in the data type range can be represented exactly.
The same problem exists in other database systems. For example, on the MySQL website you can read:
Floating-point numbers sometimes cause confusion because they are approximate and not stored as exact values. A floating-point value as written in an SQL statement may not be the same as the value represented internally. Attempts to treat floating-point values as exact in comparisons may lead to problems. They are also subject to platform or implementation dependencies. The FLOAT and DOUBLE data types are subject to these issues. For DECIMAL columns, MySQL performs operations with a precision of 65 decimal digits, which should solve most common inaccuracy problems.
Floating point is not 100% accurate. As Marcin Nabiałek wrote, the 8.31 you see is probably represented by something else, something extremely close to, but not exactly, 8.31. See here for some interesting reading about the accuracy problems of floating point.
The solution is not to use floating point data types unless you really have to. Use DECIMAL or MONEY data types instead.
If you really have to use a floating point data type, you can add or subtract a small value (the accuracy threshold, or epsilon) before every floor, ceiling or comparison operation to get the precision you want. If you have a lot of floating point operations, it might be worth coding your own floating point comparison functions.
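A minimal sketch of that epsilon approach, assuming a tolerance of 1e-9 suits your data (you would need to tune it):
declare @mynum float = 8.21
declare @epsilon float = 1e-9
select ceiling(@mynum * 100 - @epsilon)
-- 821: the epsilon absorbs the tiny excess in the float product
-- (roughly 821.0000000000001) before ceiling is applied
Note the trade-off: any value that legitimately exceeds an integer boundary by less than the epsilon gets pulled down too, so pick an epsilon smaller than any genuine difference in your data.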

Precision gains when data moves from one table to another in SQL Server

There are three tables in our SQL Server 2008 database:
transact_orders
transact_shipments
transact_child_orders
All three have a common column, carrying_cost. The data type is the same in all three tables: float, with NUMERIC_PRECISION 53 and NUMERIC_PRECISION_RADIX 2.
In table 1, transact_orders, this column has the value 5.1 for three rows; convert(decimal(20,15), carrying_cost) returns 5.100000... here.
In table 2, transact_shipments, three rows fetch carrying_cost from those three rows in transact_orders; convert(decimal(20,15), carrying_cost) returns 5.100000... here also.
Table 3, transact_child_orders, sums up those three carrying costs from transact_shipments. The value shown there is 15.3 when I run a normal select, but convert(decimal(20,15), carrying_cost) returns 15.299999999999999 in this table, and the UI shows that extra-precision value as well, even though the UI only fetches the value and does no conversion. In the Java code, the variable that receives the value from the DB is defined as double.
The code in step 3, to sum up the three carrying costs, is simply:
...sum(isnull(transact_shipments.carrying_costs,0)) sum_carrying_costs,...
Any idea why this change occurs in the third step? Any help will be appreciated. Please let me know if any more information is needed.
Rather than post a bunch of comments, I'll write an answer.
Floats are not suitable for precise values where you can't accept rounding errors - For example, finance.
Floats can scale from very small numbers to very high numbers, but they don't do that without losing a degree of accuracy. You can look the details up online; there is a host of good work out there for you to read.
But, simplistically, it's because they're true binary numbers - some decimal numbers just can't be represented as a binary value with 100% accuracy. (Just like 1/3 can't be represented with 100% accuracy in decimal.)
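As a hedged repro of the drift from the question (my assumption about what the sum does, not your actual schema): three copies of the float value 5.1, summed and inspected at high precision.
declare @c float = 5.1
select convert(decimal(20,15), @c) as one_value,         -- 5.100000000000000
       convert(decimal(20,15), @c + @c + @c) as summed   -- 15.299999999999999
-- 5.1 has no exact binary representation; each stored value is a hair off
-- 5.1, and the sum exposes the accumulated error at 15 decimal places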
I'm not sure what is causing your performance issue with the DECIMAL data type; often it's because there is some implicit conversion going on (you've got a float somewhere, or decimals with different definitions, etc.).
But regardless of the cause, nothing is faster than integer arithmetic. So, store your values as integers: £1.10 could be stored as 110p. Or, if you know you'll get fractions of a penny for some reason, as 1100dp (deci-pennies).
You do then need to consider the biggest value you will ever reach, and whether INT or BIGINT is more appropriate.
Also, when working with integers, be careful with division. If you divide £10 between 3 people, where does the last 1p go? £3.33 for two people and £3.34 for one person? £0.01 eaten by the bank? Invariably, though, it should not get lost to the digital elves (see the sketch below).
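A minimal sketch of that split with integer pennies (the names are made up):
declare @total_pence int = 1000  -- £10.00
declare @people int = 3
-- integer division gives the base share; modulo isolates the leftover penny,
-- which must be assigned to someone explicitly so the shares sum to 1000p
select @total_pence / @people as base_share_pence,  -- 333
       @total_pence % @people as remainder_pence    -- 1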
And, obviously, when presenting the number to a user, you then need to manipulate it back to £ rather than dp; but you need to do that often anyway, to get £10k or £10M, etc.
Whatever you do, and if you don't want rounding errors due to floating point values, don't use FLOAT.
(There is a lot written online about how to use floats and, more importantly, how not to. It's a big topic; just don't fall into the trap of "it's so accurate, it's amazing, it can do anything" - I can't count the number of times people have screwed up data using that unfortunately common but naive assumption.)