BCNF Conditions not clear - sql

I read the following, where R is the relation schema, X is a set of attributes and A is an attribute in R. Let F be the set of FDs. For R to be in BCNF, for every X -> A in F, one of the following must hold:
1) A is a member of X (the FD is trivial)
2) X is a superkey
In 2), why does X have to be a superkey? Shouldn't the condition be that X is a candidate key? As I understand it, for BCNF, in every nontrivial dependency a key determines some attribute.
What will go wrong if I replace 2) with X is a candidate key?

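Here is a simple example: suppose a relation R(X, A, B) with X as candidate key.
If you have the FD X -> A, you also trivially have the FD (X, B) -> A. However, (X, B) is not a candidate key (it is not minimal), although it is a superkey. If condition 2) required a candidate key, this FD would count as a BCNF violation even though nothing is wrong with R.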
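The difference between the two notions can be checked mechanically with an attribute closure. The following is a minimal sketch, assuming the relation R(X, A, B) from the example and the single FD X -> A, B (which is what makes X a key). The function names `closure`, `is_superkey` and `is_candidate_key` are made up for this illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """A superkey determines every attribute of the schema."""
    return closure(attrs, fds) >= schema

def is_candidate_key(attrs, schema, fds):
    """A candidate key is a superkey with no proper subset that is also a superkey."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return not any(
        is_superkey(set(sub), schema, fds)
        for size in range(len(attrs))
        for sub in combinations(attrs, size)
    )

schema = {"X", "A", "B"}
fds = [(frozenset({"X"}), frozenset({"A", "B"}))]  # assumed: X is the key of R

print(is_superkey({"X", "B"}, schema, fds))       # True
print(is_candidate_key({"X", "B"}, schema, fds))  # False
print(is_candidate_key({"X"}, schema, fds))       # True
```

So (X, B) -> A passes the superkey test but would fail a candidate-key test, which is why the definition is phrased with superkeys.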


What is eq_rect and where is it defined in Coq?

From what I have read, eq_rect and equality seem deeply interlinked. Weirdly, I'm not able to find a definition for it in the manual.
Where does it come from, and what does it state?
If you use Locate eq_rect you will find that eq_rect is located in Coq.Init.Logic, but if you look in that file there is no eq_rect in it. So, what's going on?
When you define an inductive type, Coq in many cases automatically generates 3 induction principles for you, appending _rect, _rec, _ind to the name of the type.
To understand what eq_rect means you need its type,
Check eq_rect.
here we go:
eq_rect
: forall (A : Type) (x : A) (P : A -> Type),
P x -> forall y : A, x = y -> P y
and you need to understand the notion of Leibniz's equality:
Leibniz characterized the notion of equality as follows:
Given any x and y, x = y if and only if, given any predicate P, P(x) if and only if P(y).
In this law, "P(x) if and only if P(y)" can be weakened to "P(x) if P(y)"; the modified law is equivalent to the original, since a statement that applies to "any x and y" applies just as well to "any y and x".
Speaking less formally, the above quotation says that if x and y are equal, their "behavior" for every predicate is the same.
To see more clearly that Leibniz's equality directly corresponds to eq_rect we can rearrange the order of parameters of eq_rect into the following equivalent formulation:
eq_rect_reorder
: forall (A : Type) (P : A -> Type) (x y : A),
x = y -> P x -> P y
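As a side note, the same substitution principle can be stated in Lean 4. This is only an analogue for comparison, not Coq code: the name `subst_sketch` is made up for this example, and the predicate is restricted to `Prop`, whereas eq_rect works for `P : A -> Type`.

```lean
-- Leibniz-style substitution: from x = y and a proof of P x, obtain P y.
-- `subst_sketch` is a hypothetical name chosen for this illustration.
theorem subst_sketch {α : Type} (P : α → Prop) (x y : α)
    (h : x = y) (px : P x) : P y :=
  h ▸ px
```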

Form of the secret key in CP-ABE

Could someone tell me what the output of the KeyGen algorithm in CP-ABE looks like? KeyGen(MK, S). The key generation algorithm will take as input a set of attributes S and output a key that identifies with that set. The algorithm first chooses a random r ∈ Z_p, and then random r_j ∈ Z_p for each attribute j ∈ S. Then it computes the key as
SK = (D = g^{(α+r)/β}, ∀j ∈ S : D_j = g^r · H(j)^{r_j}, D'_j = g^{r_j})
When D_j and D'_j have to be calculated, do we need to compute them for the entire set of attributes S OR for each individual attribute in S?
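Reading the "∀j ∈ S" in the formula literally, the key holds one D and then one (D_j, D'_j) pair per attribute. The sketch below only illustrates that structure. It is a toy, not a secure or real CP-ABE implementation: it uses a tiny subgroup of the integers modulo 23 instead of a pairing-friendly group, and `hash_to_group`, `keygen` and all parameter values are assumptions made up for the example.

```python
import hashlib
import secrets

# Toy parameters, NOT secure: the order-11 subgroup of Z_23^* generated by g = 2.
# A real CP-ABE scheme works in a pairing-friendly group instead.
MODULUS, ORDER, G = 23, 11, 2

def hash_to_group(attribute):
    """Hypothetical stand-in for H(j): map an attribute name to a subgroup element."""
    digest = int.from_bytes(hashlib.sha256(attribute.encode()).digest(), "big")
    return pow(G, digest % ORDER, MODULUS)

def keygen(alpha, beta, attributes):
    """Return D and a dict holding one (D_j, D'_j) pair per attribute j."""
    r = secrets.randbelow(ORDER)                        # one r for the whole key
    exponent = (alpha + r) * pow(beta, -1, ORDER) % ORDER
    d = pow(G, exponent, MODULUS)                       # D = g^((alpha + r)/beta)
    per_attribute = {}
    for j in attributes:
        r_j = secrets.randbelow(ORDER)                  # fresh r_j for each attribute j
        d_j = pow(G, r, MODULUS) * pow(hash_to_group(j), r_j, MODULUS) % MODULUS
        d_j_prime = pow(G, r_j, MODULUS)                # D'_j = g^(r_j)
        per_attribute[j] = (d_j, d_j_prime)
    return d, per_attribute

d, components = keygen(alpha=3, beta=4, attributes={"student", "cs-dept"})
print(sorted(components))  # ['cs-dept', 'student']
```

Under this reading the same r is shared by D and every D_j, while each attribute gets its own r_j.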

AMPL: Modeling vehicles to depart "every n hours"

I want to model that departures from a node can only take place in an "every n hours" manner. I've started to model this using two variables: starttime[i,j,k] shows when vehicle k departed from i with j as destination, and x[i,j,k] is a binary variable with value 1 if vehicle k drove from i to j, and 0 otherwise. The model is:
maximize maxdrive: sum{i in V, j in V, k in K} traveltime[i,j]*x[i,j,k];
subject to TimeConstraint {k in K}:
sum{i in V, j in V} (traveltime[i,j]+servicetime[i])*x [i,j,k] <= 1440;
subject to StartTime{i in V,j in V, k in K}:
starttime[i,j,k] + traveltime[i,j] - 9000 * (1 - x[i,j,k]) <= starttime[j,i,k];
subject to yvar{i in V, j in V}:
sum{k in K} x[i,j,k] <= maxVisits[i,j];
subject to Constraint1{i in V, j in V, k in K, g in V, h in K}:
starttime[i,j,k] + TimeInterval[i]*x[i,j,k] <= starttime[i,g,h];
The constraint in question is "Constraint1", where i is the origin node, j the destination node, and k the vehicle. The index g is used to show that the later departure can be to any destination node. TimeInterval corresponds to the intended interval, i.e. if TimeInterval at i is 2 hours, the starttime of the next vehicle to depart from i must be at least 2 hours after the previous departure. The origins correspond to specific products (only available from said origin node), whereas I do not want the vehicles to be bound to a specific origin node - they should be able to jump between nodes to utilize backhauling etc. In other words, I want this constraint to restrict the origin nodes rather than the vehicles themselves.
The objective function to "maximize the traveltime" may seem strange, but the objective function is rather obsolete really. If the constraints are met, the solution is adequate. To maximize traveltime is merely an attempt to "force" the x variables to become 1.
The question is: how can I do this? With this formulation, all x[i,j,k] variables disappear from the answer. Without this constraint, some of the binary variables x become 1 and the others 0, and the solution meets the maxVisits requirement. With the constraint, all x variables become 0 and all starttimes become 0 as well. MINTO (the solver) doesn't state that the problem is infeasible either.
Also, how can I separate the vehicles so that the program recognizes that it is a comparison between all departures? I would rather not include time dimensions, as that would add many more variables.
EDIT: After trying a new model using a non-linear solver I've seen some strange results. Specifically, I'm using the limit 1440 (minutes) as an upper bound on how long a vehicle can operate each day. Using the model below, the solution is 0 for every variable, but the starttime for all combinations of i,j,k is 720 (half of 1440). Does anyone have any clue as to what is causing this solution? How did this constraint remove the link whereby a starttime higher than 0 requires x to be 1?
subject to StartTimeSelf{i in V, j in V, k in K, g in K, h in V}:
starttime[i,j,k]*x[i,j,k] + TimeInterval[i]*x[i,j,k] + y[i,k] <= starttime[i,h,g]*x[i,j,k];
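One possible reading of why every x ends up 0 under Constraint1: the constraint is generated for every combination of (j, k) and (g, h), including the case g = j and h = k, where it reduces to starttime[i,j,k] + TimeInterval[i]*x[i,j,k] <= starttime[i,j,k]. The sketch below checks that logic by brute force in Python rather than AMPL, for two departures from a single origin. The interval of 120 minutes and the coarse time grid are assumptions chosen only for the illustration.

```python
from itertools import product

def constraint1_holds(start, x, interval):
    """Constraint1 at one origin: every ordered pair of departures (d, e), including d == e."""
    n = len(start)
    return all(
        start[d] + interval * x[d] <= start[e]
        for d in range(n)
        for e in range(n)
    )

interval = 120                   # assumed TimeInterval in minutes
times = range(0, 1441, 240)      # coarse grid of candidate start times
feasible = [
    (start, x)
    for start in product(times, repeat=2)
    for x in product((0, 1), repeat=2)
    if constraint1_holds(start, x, interval)
]
print({x for _, x in feasible})  # {(0, 0)}
```

In this small instance the only feasible assignments have every x equal to 0 and all start times equal, which matches the behaviour described above. A formulation along these lines would probably need to skip the pair (g, h) = (j, k) and enforce only one ordering of each pair of departures.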

relational algebra natural join

Hi all I have an exam coming up and am not getting much help from the lecturer on two questions on the practice exam. She has provided the answer but has not responded to my questions about the answer, I'm hoping someone here would be able to explain why the answer is the way it is.
Consider the following two tables R and S with their instances:
R               S
A  B  C         D  E
a  x  y         x  y
a  z  w         z  w
b  x  k
b  m  j
c  x  y
f  g  h
a) πA(R[natural join]B=D S)
the answer being (a,b,c), why isn't it (a,a,b,c)? does a projection make it distinct?
b) π A(R[natural join] B<>D S)
the answer being (a,b,c,f), why is a an answer? b=d both times when values are x and z, so why is this being printed out?
a) In relational algebra, the projection operator eliminates duplicates. In SQL this is not the default behavior, but it is in relational algebra. Here is my source. At the moment, I can't recall why it does duplicate elimination, but this was my professor for databases and he is very knowledgeable. (I think it's because relational algebra uses set logic and sets do not have duplicates.)
b) Joining 2 tables starts from a CROSS PRODUCT between the 2 tables. You have 6 rows and 2 rows, so the cross product is 6x2 = 12 rows. For row 1 of table R, you have a x y. This will be paired with x y AND z w, resulting in [a x y x y] and [a x y z w]. The second pairing satisfies the condition in this relational algebra statement, because columns B and D do not match (x != z).
a) πA(R[natural join]B=D S)
the answer being (a,b,c), why isn't it (a,a,b,c)? does a projection make it distinct?
In relational algebra, duplicate tuples are not permitted; that is a main difference between SQL (where DISTINCT is needed) and relational algebra.
b) π A(R[natural join] B<>D S)
the answer being (a,b,c,f), why is a an answer? b=d both times when values are x and z, so why is this being printed out?
The join operation returns the set of all combinations of tuples in R and S that satisfy the condition, so in this case it also returns the tuples (a x y z w) and (a z w x y); thus a has to be in the resulting projection.
[natural join] B=D
This is not a natural join because "natural join" is a join that joins relations exclusively over attributes of the same name. The construct you describe might in some places be labeled/termed an "equijoin" or so, but it is certainly not a "natural join".
[natural join] B<>D
This is not a natural join because "natural join" is a join that joins together tuples of the argument relations if and only if the attribute values are equal.
You are being hopelessly mistaught and miseducated. Reference material : "an introduction to database systems", C.J.Date. It won't do you any good for your exams, but if you seek a later career in database technology it might be worthwhile to remember this.
But to answer your actual questions (in line with preceding answers) :
a) The attribute value 'a' cannot appear twice in the result of a projection, because a projection produces a relation, and a relation is defined to be a set, and sets cannot contain duplicates.
b) The [non-] natural join contains both the tuples (axyzw) and (azwxy). "First" tuple from R with "second" tuple from S, and other way round. The projection includes the result (a).
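Both answers can be reproduced with plain Python sets, which mirror the set semantics of relational algebra. This is a small sketch using the instances of R and S from the question; the helper names `theta_join` and `project_a` are made up for the illustration.

```python
# Relation instances from the question: R(A, B, C) and S(D, E).
R = {("a", "x", "y"), ("a", "z", "w"), ("b", "x", "k"),
     ("b", "m", "j"), ("c", "x", "y"), ("f", "g", "h")}
S = {("x", "y"), ("z", "w")}

def theta_join(r, s, predicate):
    """Keep every combination of an R tuple and an S tuple that satisfies the predicate."""
    return {rt + st for rt in r for st in s if predicate(rt, st)}

def project_a(rows):
    """Project onto attribute A; a set cannot hold duplicates."""
    return {row[0] for row in rows}

eq = project_a(theta_join(R, S, lambda rt, st: rt[1] == st[0]))  # B = D
ne = project_a(theta_join(R, S, lambda rt, st: rt[1] != st[0]))  # B <> D
print(sorted(eq))  # ['a', 'b', 'c']
print(sorted(ne))  # ['a', 'b', 'c', 'f']
```

The value a survives in the second result because (a x y) pairs with (z w), and (a z w) pairs with (x y).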

database index: why pairing

I have a table with multiple indexes, several of which duplicate the same columns:
Index 1 columns: X, B, C, D
Index 2 columns: Y, B, C, D
Index 3 columns: Z, B, C, D
I'm not very knowledgeable on indexing in practice, so I'm wondering if somebody can explain why X, Y and Z were paired with these same columns. B is an effective date. C is a semi-unique key ID for this table for a specific effective date B. D is a sequence that identifies the priority of this record for the identifier C.
Why not just create 6 indexes, one for each X, Y, Z, B, C, D?
I want to add an index to another column T, but in some contexts I'll only be querying on T alone while in others I will also be specifying the B, C and D columns... so should I create just one index like above or should I create one for T and one for (T, B, C, D)?
I've not had as much luck as expected when googling for comprehensive coverage of indexing. Are there any resources where I can get a thorough explanation and lots of examples of B-tree indexing?
The rule with indexing is that an index can be used to filter on any list of columns that constitute a prefix of the columns used for that index.
In other words, we can use Index 1 when we filter on X and B, or X, B and C, or just X, or all four.
However, we cannot use the index to filter "in the middle". This is because indexes work not entirely unlike concatenating the values of those columns for each row, and sorting the result. If we know what the thing we're looking for begins with, we can figure out where in the index to look - just like when doing binary search.
That's why a single index is no good: if we need to filter on B, C, D, and one of X, Y and Z, we need three indexes; X, Y is no good as an index for just filtering on Y, because the prefix of the values we're looking for - the X - is not known.
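The prefix rule can be imitated with a sorted list of tuples and a binary search. This is a rough sketch of the idea only, not of how any particular database implements its B-trees; the sample rows and the names `seek_prefix` and `scan_for_b` are made up for the illustration.

```python
from bisect import bisect_left

# Hypothetical rows as (X, B, C, D) tuples; sorting them mimics an index on (X, B, C, D).
rows = [(1, 20, 5, 1), (1, 10, 7, 1), (2, 10, 5, 1), (2, 20, 7, 2), (3, 10, 5, 1)]
index = sorted(rows)

def seek_prefix(index, prefix):
    """Binary-search to the first entry with this prefix, then read entries while it matches."""
    i = bisect_left(index, prefix)
    matches = []
    while i < len(index) and index[i][:len(prefix)] == prefix:
        matches.append(index[i])
        i += 1
    return matches

def scan_for_b(index, b):
    """Filtering on B alone cannot seek: every entry has to be inspected."""
    return [entry for entry in index if entry[1] == b]

print(seek_prefix(index, (1,)))     # [(1, 10, 7, 1), (1, 20, 5, 1)]
print(seek_prefix(index, (2, 20)))  # [(2, 20, 7, 2)]
print(scan_for_b(index, 10))        # [(1, 10, 7, 1), (2, 10, 5, 1), (3, 10, 5, 1)]
```

Entries with B = 10 are scattered across the sorted list, so knowing only B gives the search no starting point.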
As Daniel mentioned, a covering index is a possible explanation for repeating B, C, and D: even if D is never filtered on, it may be the case that we need exactly the columns which you see in your indexes, and we can then just read the columns from the index instead of just using the index to locate the row.
One reason for having B, C and D in those indexes might be to have a covering index for frequently used queries. You will have a covering index when the index itself contains all the required data fields for a particular query.
A covering index can dramatically speed up data retrieval, since only the index pages, not the data pages, will be used to retrieve the data.
Below is an example query where index 1 would be a covering index:
SELECT B, C, D FROM table WHERE X = '10'
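Whether a query is covered can be checked in the query plan. The sketch below uses SQLite through Python's `sqlite3` module purely as a convenient stand-in, since the original database is not named; the table name `t`, the extra `payload` column and the index name `index1` are assumptions for the illustration, and other databases report this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the columns from the question plus one column outside the index.
conn.execute("CREATE TABLE t (X, Y, Z, B, C, D, payload)")
conn.execute("CREATE INDEX index1 ON t (X, B, C, D)")

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Only indexed columns are read, so the index alone can answer the query.
covered = plan("SELECT B, C, D FROM t WHERE X = '10'")
# payload is not in the index, so the table rows must be visited as well.
not_covered = plan("SELECT B, C, D, payload FROM t WHERE X = '10'")
print(covered)
print(not_covered)
```

In SQLite the first plan mentions a covering index, while the second uses the same index only to locate the rows.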
You should create it in (T, B, C, D).
Let's say you have two fields with an index in a table: A and B. When you create a separate index on each one of the columns, and have a query such as:
SELECT * FROM table WHERE A = 10 AND B = 20
What happens is either:
1) The DB creates two intermediate result-sets, one with rows where A = 10, and another one with rows where B = 20. It then has to merge these two result-sets into one (and also check for duplicate rows).
2) The DB creates one result-set with rows where A = 10. It then has to go manually through all of the rows in this intermediate result-set and check in each one whether B = 20.
However when you know that index B depends on index A, and your query uses A before B, you can create one index for both of the columns: (A, B)
What this means is that the DB will now first find all rows where A = 10, but because B is part of the same index, it can use the same index information to filter the result-set down to rows where B is also 20. It doesn't have to make two intermediate result-sets and merge them, or use only one of the indexes and do a manual scan for the other.
There might be other ways that the DB deals with these situations as well, it largely depends on an implementation.
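As one concrete data point, SQLite reports in its query plan that a composite index is used for both conditions at once. This is a minimal sketch using Python's `sqlite3` module; the table name `t`, the `payload` column and the index name `idx_a_b` are assumptions for the example, and other databases may plan the query differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_a_b ON t (A, B)")  # one index over both columns

# The fourth column of EXPLAIN QUERY PLAN output is a human-readable description.
details = [
    row[3]
    for row in conn.execute("EXPLAIN QUERY PLAN SELECT * FROM t WHERE A = 10 AND B = 20")
]
print(details)
```

The plan names idx_a_b with both A and B as search terms, so no second intermediate result-set is needed.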
The indexes in the form (X, B, C, D) can be used to optimize queries like:
... WHERE X rel sthg (possibly ORDER BY B, C, D)
... WHERE X = sthg AND B rel sthg (possibly ORDER BY C, D)
... WHERE X = sthg AND B = sthg AND C rel sthg (possibly ORDER BY D)
etc., where rel is an arbitrary relational operator (<, >, =, <=, >=) and sthg is a value or expression. The last two in particular, and the sorting variants, wouldn't be optimized by the "single-column indexes" variant.
OTOH, it cannot optimize a query
... WHERE B = sthg
because it starts in the middle of the index; here, the single column index would work.
For a resource where you can get a thorough explanation and lots of examples regarding indexes on Oracle (and any other Oracle-related issue), you should visit and bookmark askTom.