I want to store tuples {a implies c, not c,...}. How do I reference the operators? Is there a specific vocabulary I should use?
Suppose I have three objects A, B, C with relationships one A to many B and one A to many C. This naturally implies the existence of a many B to many C relationship, but the implication is clearly not recognized by the computer.
The questions are:
(i) How can this many2many be defined so that it respects the links as given through the already existing relationships?
(ii) Are there any special means of displaying said relationship in the form-view for each of objects B and C?
(iii) Is it possible that this is inherently the meaning of a many2many relationship and that I should just browse through the plethora of non-existent examples in the documentation?
You should be able to define a related fields.Many2many that uses relationships from B to C. See: Related Fields Documentation
For example:
Model_A:
    b_ids = fields.One2many(comodel_name='B',
                            inverse_name='a_id')
    c_ids = fields.One2many(comodel_name='C',
                            inverse_name='a_id')
Model_B:
    a_id = fields.Many2one(comodel_name='A')
    c_ids = fields.Many2many(comodel_name='C',
                             related='a_id.c_ids')
Model_C:
    a_id = fields.Many2one(comodel_name='A')
    b_ids = fields.Many2many(comodel_name='B',
                             related='a_id.b_ids')
Once you've defined the related fields, all the normal Many2many interactions will work (views, ORM, etc). You can add store=True to the field definition to store the relation in its own database table for easier searching and queries.
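To make the semantics of the related path concrete, here is a minimal plain-Python sketch of what following a_id and then reading c_ids amounts to. It does not use the Odoo ORM at all; the classes A, B and C and their constructors are stand-ins invented purely for illustration.

```python
class A:
    def __init__(self, name):
        self.name = name
        self.b_ids = []  # one A to many B
        self.c_ids = []  # one A to many C


class B:
    def __init__(self, name, a):
        self.name = name
        self.a_id = a
        a.b_ids.append(self)

    @property
    def c_ids(self):
        # Mirrors related='a_id.c_ids': follow a_id, then read its c_ids.
        return self.a_id.c_ids


class C:
    def __init__(self, name, a):
        self.name = name
        self.a_id = a
        a.c_ids.append(self)

    @property
    def b_ids(self):
        # Mirrors related='a_id.b_ids'.
        return self.a_id.b_ids


a1, a2 = A("a1"), A("a2")
b1, b2 = B("b1", a1), B("b2", a1)
c1 = C("c1", a1)
c2 = C("c2", a2)

print([c.name for c in b1.c_ids])  # ['c1']
print([b.name for b in c1.b_ids])  # ['b1', 'b2']
print(c2.b_ids)                    # []
```

The B-to-C link is never stored separately in this sketch; it is always derived through the shared A, which is the behaviour the related fields above are meant to give.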
I have 3 tables: Persons, Variables, Person_Data.
Person_Data table has numerical data on various variables for different persons. Columns are: variable_value, person_id (foreign key to Persons) and variable_id (fk to Variables).
Some of the variables are related to each other (for example: Income, Family size and Per-capita-income). I want to create a Variable_Relationship table to store this type of information and perform data sanity checks. One of the columns in the table would be Dependant_Variable_Id (the LHS of the relationship).
The issue is that the number of RHS variables is not fixed and neither is the mathematical expression.
Is there a way to implement this nicely?
Right now I am thinking about a relationship_definition text column along with another table that has Relationship_Id and RHS_VariableId columns.
In my opinion there is no way to manage this in SQL alone, since SQL has no way to dynamically interpret formulas stored as column values.
Depending on the language you use to access the database, you should develop an expression parser (or pick one of the many open source libraries that provide one) and use it to parse the expressions read from the column that stores the formula, evaluate them and perform the sanity checks.
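As a rough illustration of the parser approach, here is a minimal sketch in Python that evaluates a stored formula with the standard ast module instead of eval(). The function names evaluate and is_consistent, the formula text and the variable names are all assumptions made up for this example, not part of the schema in the question.

```python
import ast
import operator

# Arithmetic operators this sketch allows; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(formula, values):
    """Evaluate an arithmetic formula over named variables without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Name):
            return values[node.id]  # KeyError if an RHS variable is missing
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression element: " + type(node).__name__)

    return walk(ast.parse(formula, mode="eval"))


def is_consistent(dependent_value, formula, values, tolerance=1e-6):
    """Sanity check: does the stored LHS value match the evaluated RHS?"""
    return abs(dependent_value - evaluate(formula, values)) <= tolerance


# Hypothetical data for one person: the RHS variables of the relationship.
person = {"income": 60000.0, "family_size": 4}
print(is_consistent(15000.0, "income / family_size", person))  # True
print(is_consistent(20000.0, "income / family_size", person))  # False
```

The names appearing in the formula are exactly the RHS variables, so a table of Relationship_Id and RHS_VariableId rows could in principle be derived from the formula text rather than maintained by hand.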
At 9:34 in this video the speaker says that all 3 functional dependencies are in Boyce Codd Normal Form. I don't believe it because clearly GPA can't determine the SSN, sName, address and all other attributes in the student table. Either I'm confused about the definition of Boyce Codd Normal Form or what a super key is? Does it only have to be able to uniquely identify certain attributes, not all attributes in the schema? For example GPA does determine priority (which is on the right side of the functional dependency) but not everything else.
For example, if I had the relation R(A,B,C,D) and the FDs A->B, would we say A is a superkey for B? But I thought a super key is for the whole table? To add to my confusion, I know for BCNF it can be a (primary) key, but you can only have one primary key for the table. Ugh my brain hurts.
"... the speaker says that all 3 functional dependencies are in Boyce Codd Normal Form."
To be in BC normal form is a property that can be had by RELATIONS (relation variables, more specifically, or relation schemas, if that term suits you better), not by functional dependencies. If you find someone talking so sloppily of normalization theory, leave and move on to more accurate explanations.
Whether or not a relation variable is indeed in BC normal form, depends on which functional dependencies are supposed to hold in it. That is why it is utter nonsense to say that functional dependencies are or are not in BC normal form.
"I don't believe it because clearly GPA can't determine the SSN, sName, address and all other attributes in the student table. Either I'm confused about the definition of Boyce Codd Normal Form or what a super key is? Does it only have to be able to uniquely identify certain attributes, not all attributes in the schema?"
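A candidate key is a set of attributes of the relation schema (there may be more than one such set) that is guaranteed to have unique combinations of attribute values in whatever relation values could validly appear in the relation variable in the database, and that is irreducible: no attribute can be dropped from it without losing that guarantee. A superkey is any set of attributes that includes a candidate key.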
In your (A,B,C,D) example, if A->B is the only FD that holds, then the only candidate key is {A,C,D}.
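That claim can be checked mechanically. The following is a small brute-force sketch in Python, assuming FDs are represented as (left side, right side) pairs of sets; the helper names closure and candidate_keys are made up for this illustration, and the search is exponential, so it only suits tiny schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Irreducible attribute sets whose closure is the whole schema."""
    keys = []
    # Try small sets first so every key found is irreducible.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is only a superkey
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


schema = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"})]
print([sorted(key) for key in candidate_keys(schema, fds)])  # [['A', 'C', 'D']]
```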
"For example if I had the relation R(A,B,C,D) and the FDs A->B would we say A is a superkey for B"
It is sloppy and confusing to talk of A as being the "key" for B in such a case. People who pretend to be teaching others ought to know this, and people who don't, ought not engage in any teaching until they do know this. It would be better to talk of A as the "determinant" for B in such contexts. The term "key" in the context of relational database design has a very well-defined and precise meaning, and using the same term for other meanings merely confuses people. As evidenced by your question.
"but I thought a super key is for the whole table?"
Yes, you thought right.
Back to your (A,B,C,D) example. If we were to split that design into (A,B) and (A,C,D), then we would have a relation schema -the (A,B) one- of which we can say that "{A} is a key" in that schema.
That is actually precisely what the FD A->B means: if you take the relation value that would appear in the database under the (A,B,C,D) schema and project it over the attributes {A,B}, then you should get a relation in which no A value appears twice (if one did, that A value would correspond to more than one distinct B value, meaning that A could not possibly be a determinant for B after all).
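The projection test just described can be sketched in a few lines of Python. This assumes a relation value is given as a list of dicts; the function names project and fd_holds and the sample rows are invented for the illustration. Note that it only tests one relation value, whereas an FD is a constraint on every value the relation variable may ever hold.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, dropping duplicate tuples."""
    return {tuple(row[a] for a in attrs) for row in rows}


def fd_holds(rows, lhs, rhs):
    """lhs -> rhs holds in rows iff no lhs value appears twice in the projection."""
    projected = project(rows, lhs + rhs)
    lhs_values = [t[:len(lhs)] for t in projected]
    return len(lhs_values) == len(set(lhs_values))


rows = [
    {"A": 1, "B": "x", "C": 10, "D": 0},
    {"A": 1, "B": "x", "C": 20, "D": 0},
    {"A": 2, "B": "y", "C": 10, "D": 1},
]
print(fd_holds(rows, ["A"], ["B"]))  # True: each A value has exactly one B value
print(fd_holds(rows, ["A"], ["C"]))  # False: A = 1 appears with C = 10 and C = 20
```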
"To add to my confusion I know for BCNF it can be a (primary) key but ..."
Now you are being sloppy yourself. What does "it" refer to?
I came across this term called database closure.
I tried to look for it and what exactly it means but I have not found any simple explanation.
Can someone please explain what the concept of closure is, and specifically what a database closure is, whether it is good or bad, and how it can be used or avoided?
There also seems to be a general closure term: http://en.wikipedia.org/wiki/Closure_%28computer_science%29 which relates to binding variables to a function. Is a database closure related to this?
Thanks!
Closure is actually a relatively simple concept. When designing databases we want to know that our database tables have as little redundancy as possible. This means making sure that we have as few relationships between sets (or tables) as possible.
An example:
If we have two sets X and Y (which you can think of as two tables called X and Y) and they have a relationship with each other as so:
X -> Y (Read this as Y is dependent on X)
And we have another set Z which is dependent on Y:
Y -> Z (also read as Y determines Z)
To find the closure of X we collect everything that can be reached from X by following these relationships. X determines Y and Y determines Z, so the closure of X is {X, Y, Z}: in this case all we need is X.
So now, when we design our database we know that we only have to have a relationship from X, and Z and Y can actually be derived from X. We can therefore make sure there are no extra relationships in our database which cause redundancy.
If you want to read more, closure is a part of a topic called normalisation.
Closure is mentioned in database theory / set theory discussions -- as in, Dr. Codd / design & normalization kind of stuff. It has to do with finding the minimally representational elements of sets (i.e., without redundancy, etc.). I tried reading-up on it a long time ago, but my eyes went crossed, and I got a really bad headache.
If you want to read a decent summary of closure, here is one: http://www.cs.sfu.ca/CC/354/jpei/slides/ClosureDecomposition.pdf
All operations are performed on entire relations and result in an entire relation, a concept known as closure. It is one of the characteristics of relational database systems.
The closure is essentially the full set of attributes that can be determined from a set of known attributes, for a given database, using its functional dependencies.
Formal math definition:
Given a set of functional dependencies F and a set of attributes X, the closure of X is defined to be the set of all attributes A such that X -> A follows from F.
Algorithm definition:
Closure(X, F)
1 INITIALIZE V:= X
2 WHILE there is a Y -> Z in F such that:
- Y is contained in V and
- Z is not contained in V
3 DO add Z to V
4 RETURN V
It can be shown that the two definitions coincide.
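The algorithm above translates almost line for line into Python. This is a minimal sketch, assuming each FD is given as a (Y, Z) pair of sets; the example dependencies X -> Y and Y -> Z are chosen only for illustration.

```python
def closure(x, fds):
    """Attribute closure of x under fds, a list of (lhs, rhs) pairs of sets."""
    v = set(x)                                 # 1: INITIALIZE V := X
    while True:
        for lhs, rhs in fds:                   # 2: look for Y -> Z in F with
            if lhs <= v and not rhs <= v:      #    Y inside V and Z not inside V
                v |= rhs                       # 3: add Z to V
                break
        else:
            return v                           # 4: no such FD is left, RETURN V


fds = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]
print(sorted(closure({"X"}, fds)))  # ['X', 'Y', 'Z']
print(sorted(closure({"Y"}, fds)))  # ['Y', 'Z']
print(sorted(closure({"Z"}, fds)))  # ['Z']
```

The loop terminates because V only ever grows and there are finitely many attributes.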
A database closure might refer to the closure of all of the database attributes. According to the definitions above, this closure would be the set of all attributes of the database itself.
The closure (computer science) term that you linked to is not related to closure in databases but the mathematical closure is.
For a better understanding of functional dependencies and a simple example for closure in databases I suggest reading this.
If we are referring to Closure in the Functional Dependency sense (relating to database design),
The closure of a set F of functional dependencies is the set of all functional dependencies logically implied by F.
The minimal representation of a set of FDs is referred to as the canonical cover: an irreducible set of FDs that has the same closure.
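The closure of F is usually far too large to enumerate, but membership can be tested through attribute closure: X -> Y is implied by F exactly when Y is contained in the attribute closure of X. The sketch below assumes FDs are (left side, right side) pairs of sets; the names implies and redundant are invented for the illustration, and the redundancy check is only one step towards a canonical cover, not a full algorithm for it.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of fds."""
    return set(rhs) <= closure(lhs, fds)


def redundant(fds):
    """FDs that are each already implied by all the other FDs."""
    return [fd for i, fd in enumerate(fds)
            if implies(fds[:i] + fds[i + 1:], *fd)]


fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(implies(fds, {"A"}, {"C"}))  # True
print(implies(fds, {"C"}, {"A"}))  # False
print(redundant(fds) == [({"A"}, {"C"})])  # True: A -> C follows from the other two
```

When reducing a set of FDs this way, redundant FDs should be dropped one at a time and the check repeated, because two FDs can each be implied by the rest while depending on one another.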
I've been trying to encode a relational algebra in Scala (which to my knowledge has one of the most advanced type systems) and just can't seem to find a way to get where I want.
As I'm not that experienced with the academic field of programming language design I don't really know what feature to look for.
So what language features would be needed, and what language has those features, to implement a statically verified relational algebra?
Some of the requirements:
A Tuple is a function mapping names from a statically defined set of valid names for the tuple in question to values of the type specified by the name. Let's call this name-type set the domain.
A Relation is a Set of Tuples with the same domain such that the range of any tuple is unique in the Set.
So far the model can easily be expressed in Scala simply by
trait Tuple
trait Relation[T <: Tuple] extends Set[T]
The vals, vars and defs in Tuple are the name-type set defined above. But there shouldn't be two defs in Tuple with the same name. Also vars and impure defs should probably be restricted too.
Now for the tricky part:
A join of two relations is a relation where the domain of the tuples is the union of the domains of the operands' tuples, such that only tuples having the same ranges for the intersection of their domains are kept.
def join[T1 <: Tuple, T2 <: Tuple](r1: Relation[T1], r2: Relation[T2]): Relation[T1 with T2]
should do the trick.
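For reference, the runtime behaviour that this signature is meant to type can be sketched dynamically. The following Python version assumes a tuple is a dict from names to values and a relation is a list of such dicts; it performs no static checking at all, and the sample relations are made up for the illustration.

```python
def join(r1, r2):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for t1 in r1:
        for t2 in r2:
            shared = t1.keys() & t2.keys()  # intersection of the two domains
            if all(t1[name] == t2[name] for name in shared):
                merged = {**t1, **t2}       # domain is the union of both domains
                if merged not in result:    # a relation holds no duplicate tuples
                    result.append(merged)
    return result


employees = [{"name": "Ann", "dept": 1}, {"name": "Bob", "dept": 2}]
departments = [{"dept": 1, "title": "R&D"}]
print(join(employees, departments))
# [{'name': 'Ann', 'dept': 1, 'title': 'R&D'}]
```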
A projection of a Relation is a Relation where the domain of the tuples is a subset of the operands tuples domain.
def project[T <: Tuple, T2 >: T <: Tuple](r: Relation[T], ?1): Relation[T2]
This is where I'm not sure if it's even possible to find a solution. What do you think? What language features are needed to define project?
Implied above, of course, is that the API has to be usable. Layers and layers of boilerplate are not acceptable.
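To pin down what project has to do, independently of how it is typed, here is a dynamic sketch in Python. It assumes a tuple is a dict from names to values and a relation is a list of such dicts; the subset requirement on the domain is enforced only at runtime here, which is exactly the check the question wants the compiler to perform.

```python
def project(relation, names):
    """Project a relation (list of dicts) onto the given attribute names."""
    result = []
    for t in relation:
        missing = set(names) - set(t)
        if missing:
            # Runtime stand-in for the static "subset of the domain" requirement.
            raise KeyError("not in the tuple's domain: " + ", ".join(sorted(missing)))
        projected = {name: t[name] for name in names}
        if projected not in result:  # a relation holds no duplicate tuples
            result.append(projected)
    return result


people = [
    {"name": "Ann", "dept": 1, "age": 30},
    {"name": "Bob", "dept": 1, "age": 41},
]
print(project(people, ["dept"]))          # [{'dept': 1}]
print(project(people, ["name", "dept"]))
# [{'name': 'Ann', 'dept': 1}, {'name': 'Bob', 'dept': 1}]
```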
What you're asking for is to be able to structurally define a type as the difference of two other types (the original relation and the projection definition). I honestly can't think of any language which would allow you to do that. Types can be structurally cumulative (A with B) since A with B is a structural sub-type of both A and B. However, if you think about it, a type operation A less B would actually be a supertype of A, rather than a sub-type. You're asking for an arbitrary, contravariant typing relation on naturally covariant types. It hasn't even been proven that sort of thing is sound with nominal existential types, much less structural declaration-point types.
I've worked on this sort of modeling before, and the route I took was to constrain projections to one of three domains: P == T, P == {F} where F in T, P == {$_1} where $_1 anonymous. The first is where the projection is equivalent to the input type, meaning it is a no-op (SELECT *). The second is saying that the projection is a single field contained within the input type. The third is the tricky one. It is saying that you are allowing the declaration of some anonymous type $_1 which has no static relationship to the input type. Presumably it will consist of fields which delegate to the input type, but we can't enforce that. This is roughly the strategy that LINQ takes.
Sorry I couldn't be more helpful. I wish it were possible to do what you're asking, it would open up a lot of very neat possibilities.
I think I have settled on just using the normal facilities for mapping collections for the project part. The client just specifies a function [T<:Tuple](t:T) => P
With some Java trickery to get to the class of P I should be able to use reflection to implement the query logic.
For the join I'll probably use DynamicProxy to implement the mapping function.
As a bonus I might be able to get the API to be usable with Scala's special for-syntax.