Treatment of error values in the SQL standard - sql

I have a question about the SQL standard which I'm hoping a SQL language lawyer can help with.
Certain expressions just don't work. 62 / 0, for example. The SQL standard specifies quite a few ways in which expressions can go wrong in similar ways. Lots of languages deal with these expressions using special exceptional flow control, or bottom psuedo-values.
I have a table, t, with (only) two columns, x and y each of type int. I suspect it isn't relevant, but for definiteness let's say that (x,y) is the primary key of t. This table contains (only) the following values:
x y
7 2
3 0
4 1
26 5
31 0
9 3
What behavior is required by the SQL standard for SELECT expressions operating on this table which may involve division(s) by zero? Alternatively, if no one behavior is required, what behaviors are permitted?
For example, what behavior is required for the following select statements?
The easy one:
SELECT x, y, x / y AS quot
FROM t
A harder one:
SELECT x, y, x / y AS quot
FROM t
WHERE y != 0
An even harder one:
SELECT x, y, x / y AS quot
FROM t
WHERE x % 2 = 0
Would an implementation (say, one that failed to realize on a more complex version of this query that the restriction could be moved inside the extension) be permitted to produce a division by zero error in response to this query, because, say it attempted to divide 3 by 0 as part of the extension before performing the restriction and realizing that 3 % 2 = 1? This could become important if, for example, the extension was over a small table but the result--when joined with a large table and restricted on the basis of data in the large table--ended up restricting away all of the rows which would have required division by zero.
If t had millions of rows, and this last query were performed by a table scan, would an implementation be permitted to return the first several million results before discovering a division by zero near the end when encountering one even value of x with a zero value of y? Would it be required to buffer?
There are even worse cases, ponder this one, which depending on the semantics can ruin boolean short-circuiting or require four-valued boolean logic in restrictions:
SELECT x, y
FROM t
WHERE ((x / y) >= 2) AND ((x % 2) = 0)
If the table is large, this short-circuiting problem can get really crazy. Imagine the table had a million rows, one of which had a 0 divisor. What would the standard say is the semantics of:
SELECT CASE
WHEN EXISTS
(
SELECT x, y, x / y AS quot
FROM t
)
THEN 1
ELSE 0
END AS what_is_my_value
It seems like this value should probably be an error since it depends on the emptiness or non-emptiness of a result which is an error, but adopting those semantics would seem to prohibit the optimizer for short-circuiting the table scan here. Does this existence query require proving the existence of one non-bottoming row, or also the non-existence of a bottoming row?
I'd appreciate guidance here, because I can't seem to find the relevant part(s) of the specification.

All implementations of SQL that I've worked with treat a division by 0 as an immediate NaN or #INF. The division is supposed to be handled by the front end, not by the implementation itself. The query should not bottom out, but the result set needs to return NaN in this case. Therefore, it's returned at the same time as the result set, and no special warning or message is brought up to the user.
At any rate, to properly deal with this, use the following query:
select
x, y,
case y
when 0 then null
else x / y
end as quot
from
t
To answer your last question, this statement:
SELECT x, y, x / y AS quot
FROM t
Would return this:
x y quot
7 2 3.5
3 0 NaN
4 1 4
26 5 5.2
31 0 NaN
9 3 3
So, your exists would find all the rows in t, regardless of what their quotient was.
Additionally, I was reading over your question again and realized I hadn't discussed where clauses (for shame!). The where clause, or predicate, should always be applied before the columns are calculated.
Think about this query:
select x, y, x/y as quot from t where x%2 = 0
If we had a record (3,0), it applies the where condition, and checks if 3 % 2 = 0. It does not, so it doesn't include that record in the column calculations, and leaves it right where it is.

Related

Combining many sort ranks into one master sort rank

Say I have some sorted result from a SQL query that looks like:
x y z
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 2 0
0 2 1
Where x, y and z are sort ranks. These sort ranks are always greater than 0, and smaller than 500mil.
Is there a way to combine the values from x, y and z into one "master" sort rank? Sorting the dataset using this "master" sort rank should result in the same ordering.
I'm thinking I can do something with bit shifting but I am not sure...
Assuming that every value in each of the three columns in between 1 and 500 million, you could use the following formula to generate a unique rank:
1000000
z + (500 x 10^6)*y + (500 x 10^6)*(500 x 10^6)*x
To generate this rank you could use the following query:
SELECT
x, y, z,
z + (500 * 1000000)*y + (500 * 1000000)*(500 * 1000000)*x AS master_rank
FROM yourTable;
The reason this works can be seen by examining say the z and y columns. The largest value from z is 500 million, which is guaranteed to be smaller than the smallest value in y, which is 1 billion. This logic applies to the whole formula. This approach is similar to using a bit mask, on a larger scale.
Note that I assume that your version of SQL can tolerate numbers this large. If it doesn't, then you might want to consider another approach here, possibly just ordering as #Gordon mentioned in his answer. Besides this, having 1 bil x 1 bil records would make for a very large table and would have other problems.
Do you mean something like this?
order by x * 10000 + y * 100 + z
(You would adjust the numbers for the width you need.)
I'm not sure why you would want to do that instead of:
order by x, y, z
If you do combine into a single value, be careful about integer overflow.

Prolog: how to optimize this code(Solving 123456789=100 puzzle)

So there was a puzzle:
This equation is incomplete: 1 2 3 4 5 6 7 8 9 = 100. One way to make
it accurate is by adding seven plus and minus signs, like so: 1 + 2 +
3 – 4 + 5 + 6 + 78 + 9 = 100.
How can you do it using only 3 plus or minus signs?
I'm quite new to Prolog, solved the puzzle, but i wonder how to optimize it
makeInt(S,F,FinInt):-
getInt(S,F,0,FinInt).
getInt(Start, Finish, Acc, FinInt):-
0 =< Finish - Start,
NewAcc is Acc*10 + Start,
NewStart is Start +1,
getInt(NewStart, Finish, NewAcc, FinInt).
getInt(Start, Finish, A, A):-
0 > Finish - Start.
itCounts(X,Y,Z,Q):-
member(XLastDigit,[1,2,3,4,5,6]),
FromY is XLastDigit+1,
numlist(FromY, 7, ListYLastDigit),
member(YLastDigit, ListYLastDigit),
FromZ is YLastDigit+1,
numlist(FromZ, 8, ListZLastDigit),
member(ZLastDigit,ListZLastDigit),
FromQ is ZLastDigit+1,
member(YSign,[-1,1]),
member(ZSign,[-1,1]),
member(QSign,[-1,1]),
0 is XLastDigit + YSign*YLastDigit + ZSign*ZLastDigit + QSign*9,
makeInt(1, XLastDigit, FirstNumber),
makeInt(FromY, YLastDigit, SecondNumber),
makeInt(FromZ, ZLastDigit, ThirdNumber),
makeInt(FromQ, 9, FourthNumber),
X is FirstNumber,
Y is YSign*SecondNumber,
Z is ZSign*ThirdNumber,
Q is QSign*FourthNumber,
100 =:= X + Y + Z + Q.
Not sure this stands for an optimization. The code is just shorter:
sum_123456789_eq_100_with_3_sum_or_sub(L) :-
append([G1,G2,G3,G4], [0'1,0'2,0'3,0'4,0'5,0'6,0'7,0'8,0'9]),
maplist([X]>>(length(X,N), N>0), [G1,G2,G3,G4]),
maplist([G,F]>>(member(Op, [0'+,0'-]),F=[Op|G]), [G2,G3,G4], [F2,F3,F4]),
append([G1,F2,F3,F4], L),
read_term_from_codes(L, T, []),
100 is T.
It took me a while, but I got what your code is doing. It's something like this:
itCounts(X,Y,Z,Q) :- % generate X, Y, Z, and Q s.t. X+Y+Z+Q=100, etc.
generate X as a list of digits
do the same for Y, Z, and Q
pick the signs for Y, Z, and Q
convert all those lists of digits into numbers
verify that, with the signs, they add to 100.
The inefficiency here is that the testing is all done at the last minute. You can improve the efficiency if you can throw out some possible solutions as soon as you pick one of your numbers, that is, testing earlier.
itCounts(X,Y,Z,Q) :- % generate X, Y, Z, and Q s.t. X+Y+Z+Q=100, etc.
generate X as a list of digits, and convert it to a number
if it's so big or small the rest can't possibly bring the sum back to 100, fail
generate Y as a list of digits, convert to number, and pick it sign
if it's so big or so small the rest can't possibly bring the sum to 100, fail
do the same for Z
do the same for Q
Your function is running pretty fast already, even if I search all possible solutions. It only picks 6 X's; 42 Y's; 224 Z's; and 15 Q's. I don't think optimizing will be worth your while.
But if you really wanted to: I tested this by putting a testing function immediately after selecting an X. It reduced the 6 X's to 3 (all before finding the solution); 42 Y's to 30; 224 Z's to 184; and 15 Q's to 11. I believe we could reduce it further by testing immediately after a Y is picked, to see whether X YSign Y is already so large or small there can be no solution.
In PROLOG programs that are more computationally intensive, putting parts of the 'test' earlier in 'generate and test' algorithms can help a lot.

How to choose a range for a loop based upon the answers of a previous loop?

I'm sorry the title is so confusingly worded, but it's hard to condense this problem down to a few words.
I'm trying to find the minimum value of a specific equation. At first I'm looping through the equation, which for our purposes here can be something like y = .245x^3-.67x^2+5x+12. I want to design a loop where the "steps" through the loop get smaller and smaller.
For example, the first time it loops through, it uses a step of 1. I will get about 30 values. What I need help on is how do I Use the three smallest values I receive from this first loop?
Here's an example of the values I might get from the first loop: (I should note this isn't supposed to be actual code at all. It's just a brief description of what's happening)
loop from x = 1 to 8 with step 1
results:
x = 1 -> y = 30
x = 2 -> y = 28
x = 3 -> y = 25
x = 4 -> y = 21
x = 5 -> y = 18
x = 6 -> y = 22
x = 7 -> y = 27
x = 8 -> y = 33
I want something that can detect the lowest three values and create a loop. From theses results, the values of x that get the smallest three results for y are x = 4, 5, and 6.
So my "guess" at this point would be x = 5. To get a better "guess" I'd like a loop that now does:
loop from x = 4 to x = 6 with step .5
I could keep this pattern going until I get an absurdly accurate guess for the minimum value of x.
Does anybody know of a way I can do this? I know the values I'm going to get are going to be able to be modeled by a parabola opening up, so this format will definitely work. I was thinking that the values could be put into a column. It wouldn't be hard to make something that returns the smallest value for y in that column, and the corresponding x-value.
If I'm being too vague, just let me know, and I can answer any questions you might have.
nice question. Here's at least a start for what I think you should do for this:
Sub findMin()
Dim lowest As Integer
Dim middle As Integer
Dim highest As Integer
lowest = 999
middle = 999
hightest = 999
Dim i As Integer
i = 1
Do While i < 9
If (retVal(i) < retVal(lowest)) Then
highest = middle
middle = lowest
lowest = i
Else
If (retVal(i) < retVal(middle)) Then
highest = middle
middle = i
Else
If (retVal(i) < retVal(highest)) Then
highest = i
End If
End If
End If
i = i + 1
Loop
End Sub
Function retVal(num As Integer) As Double
retVal = 0.245 * Math.Sqr(num) * num - 0.67 * Math.Sqr(num) + 5 * num + 12
End Function
What I've done here is set three Integers as your three Min values: lowest, middle, and highest. You loop through the values you're plugging into the formula (here, the retVal function) and comparing the return value of retVal (hence the name) to the values of retVal(lowest), retVal(middle), and retVal(highest), replacing them as necessary. I'm just beginning with VBA so what I've done likely isn't very elegant, but it does at least identify the Integers that result in the lowest values of the function. You may have to play around with the values of lowest, middle, and highest a bit to make it work. I know this isn't EXACTLY what you're looking for, but it's something along the lines of what I think you should do.
There is no trivial way to approach this unless the problem domain is narrowed.
The example polynomial given in fact has no minimum, which is readily determined by observing y'>0 (hence, y is always increasing WRT x).
Given the wide interpretation of
[an] equation, which for our purposes here can be something like y =
.245x^3-.67x^2+5x+12
many conditions need to be checked, even assuming the domain is limited to polynomials.
The polynomial order is significant, and the order determines what conditions are necessary to check for how many solutions are possible, or whether any solution is possible at all.
Without taking this complexity into account, an iterative approach could yield an incorrect solution due to underflow error, or an unfortunate choice of iteration steps or bounds.
I'm not trying to be hard here, I think your idea is neat. In practice it is more complicated than you think.

Approaches to converting a table of possibilities into logical statements

I'm not sure how to express this problem, so my apologies if it's already been addressed.
I have business rules summarized as a table of outputs given two inputs. For each of five possible value on one axis, and each of five values on another axis, there is a single output. There are ten distinct possibilities in these 25 cells, so it's not the case that each input pair has a unique output.
I have encoded these rules in TSQL with nested CASE statements, but it's hard to debug and modify. In C# I might use an array literal. I'm wondering if there's an academic topic which relates to converting logical rules to matrices and vice versa.
As an example, one could translate this trivial matrix:
A B C
-- -- -- --
X 1 1 0
Y 0 1 0
...into rules like so:
if B OR (A and X) then 1 else 0
...or, in verbose SQL:
CASE WHEN FieldABC = 'B' THEN 1
WHEN FieldABX = 'A' AND FieldXY = 'X' THEN 1
ELSE 0
I'm looking for a good approach for larger matrices, especially one I can use in SQL (MS SQL 2K8, if it matters). Any suggestions? Is there a term for this type of translation, with which I should search?
Sounds like a lookup into a 5x5 grid of data. The inputs on axis and the output in each cell:
Y=1 Y=2 Y=3 Y=4 Y=5
x=1 A A D B A
x=2 B A A B B
x=3 C B B B B
x=4 C C C D D
x=5 C C C C C
You can store this in a table of x,y,outvalue triplets and then just do a look up on that table.
SELECT OUTVALUE FROM BUSINESS_RULES WHERE X = #X and Y = #Y;

Weird Objective-C Mod Behavior for Negative Numbers

So I thought that negative numbers, when mod'ed should be put into positive space... I cant get this to happen in objective-c
I expect this:
-1 % 3 = 2
0 % 3 = 0
1 % 3 = 1
2 % 3 = 2
But get this
-1 % 3 = -1
0 % 3 = 0
1 % 3 = 1
2 % 3 = 2
Why is this and is there a workaround?
result = n % 3;
if( result < 0 ) result += 3;
Don't perform extra mod operations as suggested in the other answers. They are very expensive and unnecessary.
In C and Objective-C, the division and modulus operators perform truncation towards zero. a / b is floor(a / b) if a / b > 0, otherwise it is ceiling(a / b) if a / b < 0. It is always the case that a == (a / b) * b + (a % b), unless of course b is 0. As a consequence, positive % positive == positive, positive % negative == positive, negative % positive == negative, and negative % negative == negative (you can work out the logic for all 4 cases, although it's a little tricky).
If n has a limited range, then you can get the result you want simply by adding a known constant multiple of 3 that is greater that the absolute value of the minimum.
For example, if n is limited to -1000..2000, then you can use the expression:
result = (n+1002) % 3;
Make sure the maximum plus your constant will not overflow when summed.
We have a problem of language:
math-er-says: i take this number plus that number mod other-number
code-er-hears: I add two numbers and then devide the result by other-number
code-er-says: what about negative numbers?
math-er-says: WHAT? fields mod other-number don't have a concept of negative numbers?
code-er-says: field what? ...
the math person in this conversations is talking about doing math in a circular number line. If you subtract off the bottom you wrap around to the top.
the code person is talking about an operator that calculates remainder.
In this case you want the mathematician's mod operator and have the remainder function at your disposal. you can convert the remainder operator into the mathematician's mod operator by checking to see if you fell of the bottom each time you do subtraction.
If this will be the behavior, and you know that it will be, then for m % n = r, just use r = n + r. If you're unsure of what will happen here, use then r = r % n.
Edit: To sum up, use r = ( n + ( m % n ) ) % n
I would have expected a positive number, as well, but I found this, from ISO/IEC 14882:2003 : Programming languages -- C++, 5.6.4 (found in the Wikipedia article on the modulus operation):
The binary % operator yields the remainder from the division of the first expression by the second. .... If both operands are nonnegative then the remainder is nonnegative; if not, the sign of the remainder is implementation-defined
JavaScript does this, too. I've been caught by it a couple times. Think of it as a reflection around zero rather than a continuation.
Why: because that is the way the mod operator is specified in the C-standard (Remember that Objective-C is an extension of C). It confuses most people I know (like me) because it is surprising and you have to remember it.
As to a workaround: I would use uncleo's.
UncleO's answer is probably more robust, but if you want to do it on a single line, and you're certain the negative value will not be more negative than a single iteration of the mod (for example if you're only ever subtracting at most the mod value at any time) you can simplify it to a single expression:
int result = (n + 3) % 3;
Since you're doing the mod anyway, adding 3 to the initial value has no effect unless n is negative (but not less than -3) in which case it causes result to be the expected positive modulus.
There are two choices for the remainder, and the sign depends on the language. ANSI C chooses the sign of the dividend. I would suspect this is why you see Objective-C doing so also. See the wikipedia entry as well.
Not only java script, almost all the languages shows the wrong answer'
what coneybeare said is correct, when we have mode'd we have to get remainder
Remainder is nothing but which remains after division and it should be a positive integer....
If you check the number line you can understand that
I also face the same issue in VB and and it made me to forcefully add extra check like
if the result is a negative we have to add the divisor to the result
Instead of a%b
Use: a-b*floor((float)a/(float)b)
You're expecting remainder and are using modulo. In math they are the same thing, in C they are different. GNU-C has Rem() and Mod(), objective-c only has mod() so you will have to use the code above to simulate rem function (which is the same as mod in the math world, but not in the programming world [for most languages at least])
Also note you could define an easy to use macro for this.
#define rem(a,b) ((int)(a-b*floor((float)a/(float)b)))
Then you could just use rem(-1,3) in your code and it should work fine.