I'm not sure how to express this problem, so my apologies if it's already been addressed.
I have business rules summarized as a table of outputs given two inputs. For each of five possible value on one axis, and each of five values on another axis, there is a single output. There are ten distinct possibilities in these 25 cells, so it's not the case that each input pair has a unique output.
I have encoded these rules in TSQL with nested CASE statements, but it's hard to debug and modify. In C# I might use an array literal. I'm wondering if there's an academic topic which relates to converting logical rules to matrices and vice versa.
As an example, one could translate this trivial matrix:
A B C
-- -- -- --
X 1 1 0
Y 0 1 0
...into rules like so:
if B OR (A and X) then 1 else 0
...or, in verbose SQL:
CASE WHEN FieldABC = 'B' THEN 1
WHEN FieldABX = 'A' AND FieldXY = 'X' THEN 1
ELSE 0
I'm looking for a good approach for larger matrices, especially one I can use in SQL (MS SQL 2K8, if it matters). Any suggestions? Is there a term for this type of translation, with which I should search?
Sounds like a lookup into a 5x5 grid of data. The inputs on axis and the output in each cell:
Y=1 Y=2 Y=3 Y=4 Y=5
x=1 A A D B A
x=2 B A A B B
x=3 C B B B B
x=4 C C C D D
x=5 C C C C C
You can store this in a table of x,y,outvalue triplets and then just do a look up on that table.
SELECT OUTVALUE FROM BUSINESS_RULES WHERE X = #X and Y = #Y;
Related
Say I have some sorted result from a SQL query that looks like:
x y z
0 0 0
0 0 1
0 0 2
0 1 0
0 1 1
0 2 0
0 2 1
Where x, y and z are sort ranks. These sort ranks are always greater than 0, and smaller than 500mil.
Is there a way to combine the values from x, y and z into one "master" sort rank? Sorting the dataset using this "master" sort rank should result in the same ordering.
I'm thinking I can do something with bit shifting but I am not sure...
Assuming that every value in each of the three columns in between 1 and 500 million, you could use the following formula to generate a unique rank:
1000000
z + (500 x 10^6)*y + (500 x 10^6)*(500 x 10^6)*x
To generate this rank you could use the following query:
SELECT
x, y, z,
z + (500 * 1000000)*y + (500 * 1000000)*(500 * 1000000)*x AS master_rank
FROM yourTable;
The reason this works can be seen by examining say the z and y columns. The largest value from z is 500 million, which is guaranteed to be smaller than the smallest value in y, which is 1 billion. This logic applies to the whole formula. This approach is similar to using a bit mask, on a larger scale.
Note that I assume that your version of SQL can tolerate numbers this large. If it doesn't, then you might want to consider another approach here, possibly just ordering as #Gordon mentioned in his answer. Besides this, having 1 bil x 1 bil records would make for a very large table and would have other problems.
Do you mean something like this?
order by x * 10000 + y * 100 + z
(You would adjust the numbers for the width you need.)
I'm not sure why you would want to do that instead of:
order by x, y, z
If you do combine into a single value, be careful about integer overflow.
I'm looking at some exercices that reguards FSAs and my teacher is giving and odd solution for the language
I would have solved that language with a Not-Deterministic TM cause you have to remember n for a, b and c.
This is the given solution
Is this solution incorrect?
The example above does work and therefore is correct.
It is clear that for n = 1 the FSA posted above does work.
Now let's condiser the case n > 1, we have to have the same amount of a, b and c between every possible string composed of {a,b,c}. This string can be seen as the same string with n = 1 and the other n - 1 repetitions can be put in the {a,b,c}* group of strings, hence this FSA can accept that language correctly.
I've some variables, Lets say a, b, c, d. All belongs to a fixed interval [0, e]
Now i've some relations between them like
a > b
a > c
b > d
Something like this; I want to make a function which print all the possible cases for this.
Example:
a b c d
a c b d
a b d c
a c b d
In essence, what you have is a directed acyclic graph.
A relatively simple approach is to store, for each variable, a set of the variables that must precede them. (In your example, this storage would map b to {a}, c to {a}, and d to {b}.) You can then write a recursive function that generates all valid tails consisting of a subset of these variables (in your case, for example, the subset {c,d} produces two valid tails: [c,d] and [d,c]). This recursive function examines each variable in the subset and determines whether its prerequisites are already met. (For example, since b maps to {a}, any subset including both a and b cannot produce a tail that begins with b.) If so, then it can recursively call itself on the subset excluding that variable.
There are some optimizations you can then perform, if desired. For example, you can use dynamic programming to avoid repeatedly re-computing the set of valid tails for the same subset.
Hi all I have an exam coming up and am not getting much help from the lecturer on two questions on the practice exam. She has provided the answer but has not responded to my questions about the answer, I'm hoping someone here would be able to explain why the answer is the way it is.
Consider the following two tables R and S with their instances:
R S
A B C D E
a x y x y
a z w z w
b x k
b m j
c x y
f g h
a) πA(R[natural join]B=D S)
the answer being (a,b,c), why isn't it (a,a,b,c)? does a projection make it distinct?
b) π A(R[natural join] B<>D S)
the answer being (a,b,c,f), why is a an answer? b=d both times when values are x and z, so why is this being printed out?
a)In Relation Algebra, the projection operator provides duplicate elimination. In SQL this is not the default operation, but it is for relational algebra. Here is my source. At the moment, I can't recall why it does duplicate elimination, but this was my professor for databases and he is very knowledgeable. (I think it's because Relation Algebra uses set-logic and sets do not have duplicates.)
b)The joining of 2 tables creates a CROSS PRODUCT between the 2 tables. You have 6 rows and 2 rows. So the cross product is 6x2 = 12 rows. For row 1 of table R, you have a x y. This will be paired with x y AND z w resulting in [a x y x y] and [a x y z w]. The second pairing is valid for this relational algebra statement. Columns B and D do not match x != z.
a) πA(R[natural join]B=D S)
the answer being (a,b,c), why isn't it (a,a,b,c)? does a projection make it distinct?
In relational algebra, duplicate tuples are not permitted; that a main difference between sql (where distinct is needed) and relational algebra
b) π A(R[natural join] B<>D S)
the answer being (a,b,c,f), why is a an answer? b=d both times when values are x and z, so why is this being printed out?
Natural join operation returns the set of all combinations of tuples in R and S, so in this case returns also tuples (a x y z w) and (a z w x y); thus a has to be in the resulting projection.
[natural join] B=D
This is not a natural join because "natural join" is a join that joins relations exclusively over attributes of the same name. The construct you describe might in some places be labeled/termed an "equijoin" or so, but it is certainly noy a "natural join".
[natural join] B<>D
This is not a natural join because "natural join" is a join that joins together tuples of the argument relations if and only if the attribute values are equal.
You are being hopelessly mistaught and miseducated. Reference material : "an introduction to database systems", C.J.Date. It won't do you any good for your exams, but if you seek a later career in database technology it might be worthwhile to remember this.
But to answer your actual questions (in line with preceding answers) :
a) The attribute value 'a' cannot appear twice in the result of a projection, because a projection produces a relation, and a relation is defined to be a set, and sets cannot contain duplicates.
b) The [non-] natural join contains both the tuples (axyzw) and (azwxy). "First" tuple from R with "second" tuple from S, and other way round. The projection includes the result (a).
I have a question about the SQL standard which I'm hoping a SQL language lawyer can help with.
Certain expressions just don't work. 62 / 0, for example. The SQL standard specifies quite a few ways in which expressions can go wrong in similar ways. Lots of languages deal with these expressions using special exceptional flow control, or bottom psuedo-values.
I have a table, t, with (only) two columns, x and y each of type int. I suspect it isn't relevant, but for definiteness let's say that (x,y) is the primary key of t. This table contains (only) the following values:
x y
7 2
3 0
4 1
26 5
31 0
9 3
What behavior is required by the SQL standard for SELECT expressions operating on this table which may involve division(s) by zero? Alternatively, if no one behavior is required, what behaviors are permitted?
For example, what behavior is required for the following select statements?
The easy one:
SELECT x, y, x / y AS quot
FROM t
A harder one:
SELECT x, y, x / y AS quot
FROM t
WHERE y != 0
An even harder one:
SELECT x, y, x / y AS quot
FROM t
WHERE x % 2 = 0
Would an implementation (say, one that failed to realize on a more complex version of this query that the restriction could be moved inside the extension) be permitted to produce a division by zero error in response to this query, because, say it attempted to divide 3 by 0 as part of the extension before performing the restriction and realizing that 3 % 2 = 1? This could become important if, for example, the extension was over a small table but the result--when joined with a large table and restricted on the basis of data in the large table--ended up restricting away all of the rows which would have required division by zero.
If t had millions of rows, and this last query were performed by a table scan, would an implementation be permitted to return the first several million results before discovering a division by zero near the end when encountering one even value of x with a zero value of y? Would it be required to buffer?
There are even worse cases, ponder this one, which depending on the semantics can ruin boolean short-circuiting or require four-valued boolean logic in restrictions:
SELECT x, y
FROM t
WHERE ((x / y) >= 2) AND ((x % 2) = 0)
If the table is large, this short-circuiting problem can get really crazy. Imagine the table had a million rows, one of which had a 0 divisor. What would the standard say is the semantics of:
SELECT CASE
WHEN EXISTS
(
SELECT x, y, x / y AS quot
FROM t
)
THEN 1
ELSE 0
END AS what_is_my_value
It seems like this value should probably be an error since it depends on the emptiness or non-emptiness of a result which is an error, but adopting those semantics would seem to prohibit the optimizer for short-circuiting the table scan here. Does this existence query require proving the existence of one non-bottoming row, or also the non-existence of a bottoming row?
I'd appreciate guidance here, because I can't seem to find the relevant part(s) of the specification.
All implementations of SQL that I've worked with treat a division by 0 as an immediate NaN or #INF. The division is supposed to be handled by the front end, not by the implementation itself. The query should not bottom out, but the result set needs to return NaN in this case. Therefore, it's returned at the same time as the result set, and no special warning or message is brought up to the user.
At any rate, to properly deal with this, use the following query:
select
x, y,
case y
when 0 then null
else x / y
end as quot
from
t
To answer your last question, this statement:
SELECT x, y, x / y AS quot
FROM t
Would return this:
x y quot
7 2 3.5
3 0 NaN
4 1 4
26 5 5.2
31 0 NaN
9 3 3
So, your exists would find all the rows in t, regardless of what their quotient was.
Additionally, I was reading over your question again and realized I hadn't discussed where clauses (for shame!). The where clause, or predicate, should always be applied before the columns are calculated.
Think about this query:
select x, y, x/y as quot from t where x%2 = 0
If we had a record (3,0), it applies the where condition, and checks if 3 % 2 = 0. It does not, so it doesn't include that record in the column calculations, and leaves it right where it is.