I have been working with OO for several years now, and sometimes I see a class that has an attribute (usually a list or table) holding references to itself and to other objects of the same type.
Is there a name for this pattern?
What are the usual cases for using it, and why?
Thanks in advance.
It is a very common case. For instance, a table that contains a foreign key to itself:
Employee
----------
ID
MANAGER_ID (reference to employee)
Your JPA/Hibernate entities will hold references to themselves when you are your own boss :-)
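A minimal Python sketch of the same idea on the object side, assuming a hypothetical Employee class (the names manager, reports and add_report are illustrative and not taken from any JPA/Hibernate mapping). A link like this is often described as a self-referencing or reflexive association:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through the links
class Employee:
    """An employee whose attributes reference objects of the same class."""

    name: str
    manager: Optional[Employee] = None                      # like MANAGER_ID
    reports: List[Employee] = field(default_factory=list)   # inverse side

    def add_report(self, other: Employee) -> None:
        """Register `other` as a direct report and point it back at self."""
        other.manager = self
        self.reports.append(other)

    def chain_of_command(self) -> List[str]:
        """Follow the manager references upward and collect the names."""
        names = []
        current = self.manager
        while current is not None:
            names.append(current.name)
            current = current.manager
        return names
```

Typical uses are hierarchies and graphs: org charts, category trees, linked lists, or parent/child menu items.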
I know how to convert an entity set, a relationship, etc. into the relational model, but what I wonder is what we should do when an entire diagram is given. How do we convert it? Do we create a separate table for each relationship and for each entity set? For example, if we are given the following ER diagram:
My solution to this is like the following:
-- this part includes the purchaser relationship and the Policies entity set
CREATE TABLE Policies (
policyid INTEGER,
cost REAL,
ssn CHAR(11) NOT NULL,
PRIMARY KEY (policyid),
FOREIGN KEY (ssn) REFERENCES Employees
ON DELETE CASCADE);
-- this part includes the Dependents weak entity set and the beneficiary relationship
CREATE TABLE Dependents (
pname CHAR(20),
age INTEGER,
policyid INTEGER,
PRIMARY KEY (pname, policyid),
FOREIGN KEY (policyid) REFERENCES Policies
ON DELETE CASCADE);
-- this part includes the Employees entity set
CREATE TABLE Employees (
ssn CHAR(11),
name CHAR(20),
lot INTEGER,
PRIMARY KEY (ssn));
My questions are:
1) Is my conversion correct?
2) What are the steps for converting a complete diagram into the relational model?
Here are the steps that I follow. Are they correct?
- I first look at whether there are any weak entities or key constraints. If there
is one of them, I create a single table for the entity set and the related
relationship. (Dependents with beneficiary, and Policies with purchaser in my case.)
- I create a separate table for each entity set that has no participation
or key constraints. (Employees in my case.)
- If there are relationships with no constraints, I create a separate table for them.
- So, in conclusion, every relationship and entity set in the diagram is included
in a table.
If my steps are not correct or there is something I am missing, could you please write out the steps for the conversion? Also, what do we do if there is only a participation constraint on a relationship, but no key constraint? Do we again create a single table for the related entity set and relationship?
I appreciate any help; I am new to databases and trying to learn this conversion.
Thank you
Hi @bigO, I think it is safe to say that your conversion is correct and that the steps you have followed are sound. However, from an implementation point of view there may be room for improvement. What you have implemented is more of a logical model than a physical model.
It is common practice to add a surrogate instance identifier to a physical table; this is a general requirement for most persistence engines and, as pointed out by @Pieter Geerkens, aids database efficiency. The value of the instance id, for example EmployeeId (INT), would be generated automatically by the database on insert. This would also help with the issue that @Pieter Geerkens has pointed out with the SSN. Add the id as the first column of all your tables; I follow a convention of tablenameId. Make your current primary keys into secondary keys (the natural keys).
Adding the Ids then makes it necessary to implement a DependentPolicy intersection table
DependentPolicyId, (PK)
PolicyId,
DependentId
You may then need to consider what the natural key of the Dependent table is.
I notice that you have age as an attribute. You should consider whether this is the age at the time the policy is created or the actual age of the dependent, in which case you should be using date of birth.
Other ornamentations you could consider are creation and modified dates.
I also generally favor using the singular for a table name, i.e. Employee, not Employees.
Welcome to the world of data modeling and design.
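The suggestions above can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than the asker's DBMS, and the column names (EmployeeId, DateOfBirth, and so on) and the UNIQUE constraint on the intersection table are my own choices, not something taken from the original diagram.

```python
import sqlite3

# Surrogate ids as primary keys; the former natural key (Ssn) becomes UNIQUE.
SCHEMA = """
CREATE TABLE Employee (
    EmployeeId INTEGER PRIMARY KEY,           -- generated by the database
    Ssn        CHAR(11) NOT NULL UNIQUE,      -- natural key kept as secondary key
    Name       CHAR(20),
    Lot        INTEGER
);
CREATE TABLE Policy (
    PolicyId   INTEGER PRIMARY KEY,
    Cost       REAL,
    EmployeeId INTEGER NOT NULL
               REFERENCES Employee(EmployeeId) ON DELETE CASCADE
);
CREATE TABLE Dependent (
    DependentId INTEGER PRIMARY KEY,
    PName       CHAR(20) NOT NULL,
    DateOfBirth TEXT                          -- stored instead of age
);
CREATE TABLE DependentPolicy (                -- intersection table
    DependentPolicyId INTEGER PRIMARY KEY,
    PolicyId    INTEGER NOT NULL
                REFERENCES Policy(PolicyId) ON DELETE CASCADE,
    DependentId INTEGER NOT NULL
                REFERENCES Dependent(DependentId) ON DELETE CASCADE,
    UNIQUE (PolicyId, DependentId)
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn
```

With this layout, deleting an Employee row removes its Policy rows and the matching DependentPolicy links, while the Dependent rows themselves stay in place.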
I have read some UML class diagrams, and I see some UML designs done this way:
For example, there are two classes: class Student and class Transcript. Every student has a transcript, and every transcript belongs to a student. So class Student and class Transcript depend on each other. Is this a well-designed pair of classes?
If not, how can we fix this? And if it is OK, how can I implement this relation well?
Thanks :)
I wouldn't say it's a very good design: apparently, you want student.transcript.student == student and transcript.student.transcript == transcript to be true at any given moment, right? But what about Students without Transcripts? Transcripts without Students? If those are prohibited, you may end up in a funny situation: you'll have to create the corresponding Student and Transcript at the same time!
Well, in the database world, this is usually modeled with three tables (which may or may not correspond directly to actual physical tables):
TABLE Student ( studentId ID PRIMARY KEY, ... )
TABLE Transcript ( transcriptId ID PRIMARY KEY, ... )
TABLE StudentTranscriptLink (
studentId ID NOT NULL UNIQUE REFERENCES Student(studentId),
transcriptId ID NOT NULL UNIQUE REFERENCES Transcript(transcriptId),
PRIMARY KEY ( studentId, transcriptId )
)
The UNIQUE constraints ensure that if you take Student, get its Transcript, and get that Transcript's Student, you return to the original Student you started with; the same is true with Transcript->Student->Transcript navigation.
In the OOP world you probably would have some sort of StudentTranscriptDispatcher with List<Pair<Student, Transcript>> inside, and methods for turning Student into Transcript and back.
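A rough Python sketch of such a dispatcher, under the assumption that Student and Transcript are plain classes with no reference to each other. Instead of a list of pairs it keeps two dictionaries, one per direction, which plays the same role as the two UNIQUE constraints; all names here are hypothetical.

```python
class Student:
    def __init__(self, name):
        self.name = name


class Transcript:
    def __init__(self, grades):
        self.grades = grades


class StudentTranscriptDispatcher:
    """Holds a strict one-to-one pairing between students and transcripts."""

    def __init__(self):
        self._by_student = {}      # Student -> Transcript
        self._by_transcript = {}   # Transcript -> Student

    def link(self, student, transcript):
        """Pair the two objects; each may take part in at most one pair."""
        if student in self._by_student or transcript in self._by_transcript:
            raise ValueError("student or transcript is already linked")
        self._by_student[student] = transcript
        self._by_transcript[transcript] = student

    def transcript_of(self, student):
        return self._by_student[student]

    def student_of(self, transcript):
        return self._by_transcript[transcript]
```

Neither class knows about the other, so the mutual dependency moves out of the domain classes and into the dispatcher.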
However, such bi-directional relationships are... unusual. It's basically having one large object sliced in two halves -- but still keeping those halves tied together very tightly. Why would you do this? It doesn't remove any complexity, on the contrary: it introduces new, artificial complexity which wasn't there before.
What you should probably think about here is navigability. The relationship sounds entirely appropriate, but do you ever have to navigate from Transcript to Student? If not, your relationship is not bi-directional and Transcript is no longer dependent on Student. Do you always locate a Student first and then a Transcript?
To be honest, I wouldn't worry about it too much. If it works for you - go with it!
I have a few tables that share only a few navigation properties and an ID.
I think Table per Concrete Type (TPC) inheritance would be interesting here (?)
It looks something like this :
Contact (Base, Abstract, not mapped)
- ContactID
- navigation properties to other tables (email, phone, ..)
Person : Contact (mapped to table Person with various properties + ContactID)
- various properties
Company : Contact (mapped to table Company with various properties + ContactID)
- various properties
Now for this to work, the primary key (contactID) should be unique across all tables.
2 options then:
- GUIDs (not a fan)
- an additional DB table generating identities (with just a ContactID field, deriving tables have FK), this would not be mapped in EF.
Is this setup doable?
Also, what will happen in the ObjectContext? What kind of temporary key does EF generate before calling SaveChanges? Will it be unique across objects?
Thanks for any thoughts.
mike.
We use a similar construction with the following db design:
ContactEntity
    ID
ContactPossibility
    ID
    Position
    ContactTypeID
    ContactEntityID
Address
    ID (=PK and FK to ContactPossibility.ID)
    Street
    etc.
Telephone
    ID (=PK and FK to ContactPossibility.ID)
    Number
    etc.
Person
    ID (=PK and FK to ContactEntity.ID)
    FirstName
    etc.
Company
    ID (=PK and FK to ContactEntity.ID)
    Name
    etc.
In the entity model this results in two abstract classes, ContactEntity (CE) and ContactPossibility (CP), and multiple derived classes (Address=CP, Email=CP, Person=CE, Company=CE). The abstract and derived classes (rows in the db ;) share the same unique identifier, because the ID field in a derived class is a foreign key to the primary key of the abstract class. We use GUIDs for this, because our software is required to function properly off-line (not connected to the main database) and we have to deal smoothly with synchronisation issues. Also, what's the problem with GUIDs?
Entity Framework supports this db / class design very well, and we are very pleased with it.
Is this setup doable?
Also, what will happen in the ObjectContext?
What kind of temporary key does EF generate before calling SaveChanges?
Will it be unique across objects?
The proposed setup is very doable!
The ObjectContext behaves fine and will insert into, update and delete from the right tables for derived classes without effort. Temporary keys? You don't need them if you use the pattern of an ID for derived classes that is both the primary key and a foreign key to the abstract class. And with GUIDs you can be pretty sure that it's unique across objects.
Furthermore, the foreign key from CP to CE will provide every CE (Person, Company, User, etc.) with a trackable collection of ContactPossibilities, which is really cool and handy.
Hope this helps...
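To make the shared-identifier idea concrete, here is a small sketch outside of Entity Framework, assuming SQLite via Python's sqlite3 module and GUIDs created by the client with uuid.uuid4(). The helper names add_person and add_company are hypothetical; only the table and column names follow the design above.

```python
import sqlite3
import uuid

# The derived tables reuse the base table's ID as both PK and FK.
SCHEMA = """
CREATE TABLE ContactEntity (
    ID TEXT NOT NULL PRIMARY KEY
);
CREATE TABLE Person (
    ID        TEXT NOT NULL PRIMARY KEY REFERENCES ContactEntity(ID),
    FirstName TEXT NOT NULL
);
CREATE TABLE Company (
    ID   TEXT NOT NULL PRIMARY KEY REFERENCES ContactEntity(ID),
    Name TEXT NOT NULL
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def _add(conn, child_sql, value):
    """Insert the base row and the derived row under one client-made GUID."""
    new_id = str(uuid.uuid4())  # known before anything is saved
    with conn:  # one transaction: both rows or neither
        conn.execute("INSERT INTO ContactEntity (ID) VALUES (?)", (new_id,))
        conn.execute(child_sql, (new_id, value))
    return new_id


def add_person(conn, first_name):
    return _add(conn, "INSERT INTO Person (ID, FirstName) VALUES (?, ?)", first_name)


def add_company(conn, name):
    return _add(conn, "INSERT INTO Company (ID, Name) VALUES (?, ?)", name)
```

Because the identifier exists before the rows are written, no database-generated temporary key is involved, which is also what makes the off-line scenario workable.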
(not enough space in the comments section)
I've been running some tests.
The thing is you're OK as long as you ONLY specify the subtype you're querying for (ex. 'Address' in your case).
But if you query for the base type (even if you don't need the subtypes info), ex. only ContactPossibility.ID, the generated SQL will UNION all subtype tables.
So querying your 'trackable' collection of ContactPossibilities can create a performance problem.
I tried to work around this by unmapping the base entity and splitting the inherited entities into their own table + the common table, basically transforming the TPT into TPC: this worked fine from a conceptual perspective (after a lot of edmx editing). Until I realized this was stupid... :) Indeed, in that case you will always need to union all underlying tables to query for the common data...
(Though I'm not sure in the case described at the end of this post, didn't pursue to test it)
So I guess, since mostly I will need to query for a specific type (person, company, address, phone,..), it's gonna be OK for now and hoping MS will come with a fix in EF4.5.
So I'll have to be careful when querying, another interesting example :
Let's say you want to select a person and then query for his address, something like (tried to follow your naming) :
var person = from b in context.ContactEntities.OfType<Person>()
where b.FirstName.StartsWith("X")
select b;
var address = from a in context.ContactPossibilities.OfType<Address>()
where a.ContactEntity == person.FirstOrDefault() // compares whole entities
select a;
This will produce a union of all the tables of the Contact-derived entities, and performance issues: the generated SQL takes the ContactPossibility table and joins it to Address on ContactPossibilityID, then joins a union of all Contact-derived tables joined with the base Contact table, before finally joining a filtered Person table.
However, consider the following alternative :
var person = from b in context.ContactEntities.OfType<Person>()
where b.FirstName.StartsWith("X")
select b;
var address = from a in context.ContactPossibilities.OfType<Address>()
where a.ContactID == person.FirstOrDefault().ID // compares IDs only
select a;
This will work fine: the generated SQL takes the ContactPossibility table and joins it to Address on ContactPossibilityID, and then joins the filtered Person table.
Mike.
I've got two entities, one called Site and the other called Assignment. A Site may or may not have an associated Assignment. An Assignment is only ever associated with one Site. In terms of C#, Site has a property of type Assignment which could hold a null reference.
I have got two tables by the same names in the database. The Assignment table's PK is also its FK back to the Site table (rather than Site having a nullable FK pointing to Assignment). The SQL (with fields omitted for brevity) is as follows
CREATE TABLE Site(
SiteId INT NOT NULL CONSTRAINT PK_Site PRIMARY KEY)
CREATE TABLE Assignment(
AssignmentId INT NOT NULL CONSTRAINT PK_Assignment PRIMARY KEY,
CONSTRAINT FK_Assignment_Site FOREIGN KEY (AssignmentId) REFERENCES Site (SiteId))
I'm using Fluent NHibernate's auto persistence model, which I think I will have to add an override to in order to get this to work. My question is, how do I map this relationship? Is my schema even correct for this scenario? I can change the schema if needs be.
You need to read these:
http://ayende.com/Blog/archive/2009/04/19/nhibernate-mapping-ltone-to-onegt.aspx
http://gnschenker.blogspot.com/2007/06/one-to-one-mapping-and-lazy-loading.html
https://www.hibernate.org/162.html
It's not possible to lazy-load one-to-ones unless they are not-nullable, or you map them as a many-to-one with one item in it.
Let's say I have a table that represents a super class, students. And then I have N tables that represent subclasses of that object (athletes, musicians, etc). How can I express a constraint such that a student must be modeled in one (not more, not less) subclass?
Clarifications regarding comments:
This is being maintained manually, not through an ORM package.
The project this relates to sits atop SQL Server (but it would be nice to see a generic solution)
This may not have been the best example. There are a couple scenarios we can consider regarding subclassing, and I just happened to invent this student/athlete example.
A) In true object-oriented fashion, it's possible that the superclass can exist by itself and need not be modeled in any subclasses.
B) In real life, any object or student can have multiple roles.
C) The particular scenario I was trying to illustrate was requiring that every object be implemented in exactly one subclass. Think of the superclass as an abstract implementation, or just commonalities factored out of otherwise disparate object classes/instances.
Thanks to all for your input, especially Bill.
Each Student record will have a SubClass column (assume for the sake of argument it's a CHAR(1)). {A = Athlete, M=musician...}
Now create your Athlete and Musician tables. They should also have a SubClass column, but there should be a check constraint hard-coding the value for the type of table they represent. For example, you should put a default of 'A' and a CHECK constraint of 'A' for the SubClass column on the Athlete table.
Link your Musician and Athlete tables to the Student table using a COMPOSITE foreign key of StudentID AND Subclass. And you're done! Go enjoy a nice cup of coffee.
CREATE TABLE Student (
StudentID INT NOT NULL IDENTITY PRIMARY KEY,
SubClass CHAR(1) NOT NULL,
Name VARCHAR(200) NOT NULL,
CONSTRAINT UQ_Student UNIQUE (StudentID, SubClass)
);
CREATE TABLE Athlete (
StudentID INT NOT NULL PRIMARY KEY,
SubClass CHAR(1) NOT NULL,
Sport VARCHAR(200) NOT NULL,
CONSTRAINT CHK_Jock CHECK (SubClass = 'A'),
CONSTRAINT FK_Student_Athlete FOREIGN KEY (StudentID, SubClass) REFERENCES Student(StudentID, SubClass)
);
CREATE TABLE Musician (
StudentID INT NOT NULL PRIMARY KEY,
SubClass CHAR(1) NOT NULL,
Instrument VARCHAR(200) NOT NULL,
CONSTRAINT CHK_Band_Nerd CHECK (SubClass = 'M'),
CONSTRAINT FK_Student_Musician FOREIGN KEY (StudentID, SubClass) REFERENCES Student(StudentID, SubClass)
);
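To see the composite foreign key and the CHECK constraint working together, here is a small runnable sketch. It assumes SQLite through Python's sqlite3 module instead of SQL Server, so IDENTITY is replaced by SQLite's INTEGER PRIMARY KEY, and the accepts helper is a hypothetical convenience for the demonstration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,
    SubClass  CHAR(1) NOT NULL,
    Name      VARCHAR(200) NOT NULL,
    UNIQUE (StudentID, SubClass)       -- target of the composite foreign keys
);
CREATE TABLE Athlete (
    StudentID INTEGER NOT NULL PRIMARY KEY,
    SubClass  CHAR(1) NOT NULL DEFAULT 'A' CHECK (SubClass = 'A'),
    Sport     VARCHAR(200) NOT NULL,
    FOREIGN KEY (StudentID, SubClass) REFERENCES Student(StudentID, SubClass)
);
CREATE TABLE Musician (
    StudentID  INTEGER NOT NULL PRIMARY KEY,
    SubClass   CHAR(1) NOT NULL DEFAULT 'M' CHECK (SubClass = 'M'),
    Instrument VARCHAR(200) NOT NULL,
    FOREIGN KEY (StudentID, SubClass) REFERENCES Student(StudentID, SubClass)
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def accepts(conn, sql, params=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A student stored with SubClass 'A' can get an Athlete row, but a Musician row for the same student fails either the foreign key (if it claims 'M') or the CHECK (if it claims 'A'). Note that this guarantees at most one subclass row, not that one exists.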
Here are a couple of possibilities. One is a CHECK in each table that the student_id does not appear in any of the other sister subtype tables. This is probably expensive and every time you need a new subtype, you need to modify the constraint in all the existing tables.
CREATE TABLE athletes (
student_id INT NOT NULL PRIMARY KEY,
FOREIGN KEY (student_id) REFERENCES students(student_id),
CHECK (student_id NOT IN (SELECT student_id FROM musicians
UNION SELECT student_id FROM slackers
UNION ...))
);
edit: @JackPDouglas correctly points out that the above form of CHECK constraint is not supported by Microsoft SQL Server. Nor, in fact, is it valid per the SQL-99 standard to reference another table (see http://kb.askmonty.org/v/constraint_type-check-constraint).
SQL-99 defines a metadata object for multi-table constraints. This is called an ASSERTION; however, I don't know any RDBMS that implements assertions.
Probably a better way is to make the primary key in the students table a compound primary key, in which the second column denotes a subtype. Then restrict that column in each child table to a single value corresponding to the subtype represented by the table. edit: no need to make the PK a compound key in child tables.
CREATE TABLE athletes (
student_id INT NOT NULL PRIMARY KEY,
student_type CHAR(4) NOT NULL CHECK (student_type = 'ATHL'),
FOREIGN KEY (student_id, student_type) REFERENCES students(student_id, student_type)
);
Of course student_type could just as easily be an integer, I'm just showing it as a char for illustration purposes.
If you don't have support for CHECK constraints (e.g. MySQL before 8.0.16, which parses them but does not enforce them), then you can do something similar in a trigger.
I read your followup about making sure a row exists in some subclass table for every row in the superclass table. I don't think there's a practical way to do this with SQL metadata and constraints. The only option I can suggest to meet this requirement is to use Single-Table Inheritance. Otherwise you need to rely on application code to enforce it.
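As an illustration of the Single-Table Inheritance option, the sketch below folds the subclass columns into the superclass table, so a row cannot exist without exactly one subtype's data. It assumes SQLite via Python's sqlite3 module; the column names and the accepts helper are hypothetical.

```python
import sqlite3

# One table for all students; the CHECK ties SubClass to the columns it requires.
SCHEMA = """
CREATE TABLE Student (
    StudentID  INTEGER PRIMARY KEY,
    Name       VARCHAR(200) NOT NULL,
    SubClass   CHAR(1) NOT NULL,
    Sport      VARCHAR(200),       -- athletes only
    Instrument VARCHAR(200),       -- musicians only
    CHECK (
        (SubClass = 'A' AND Sport IS NOT NULL AND Instrument IS NULL)
     OR (SubClass = 'M' AND Instrument IS NOT NULL AND Sport IS NULL)
    )
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def accepts(conn, name, subclass, sport=None, instrument=None):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO Student (Name, SubClass, Sport, Instrument) "
            "VALUES (?, ?, ?, ?)",
            (name, subclass, sport, instrument),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

The trade-off is the usual one for this layout: the subtype-specific columns have to be nullable, and every new subtype means altering the table and its CHECK.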
edit: JackPDouglas also suggests using a design based on Class Table Inheritance. See his example or my examples of the similar technique here or here or here.
If you are interested in data modeling, in addition to object modeling, I suggest you look up "relational modeling generalization specialization" on the web.
There used to be some good resources out there that explained this kind of pattern quite well.
I hope those resources are still there.
Here's a simplified view of what I hope you'll find.
Before you begin designing a database, it's useful to come up with a conceptual data model that connects the values stored in the database back to the subject matter. Making a conceptual data model is really data analysis, not database design. Sometimes it's difficult to keep analysis and design separate.
One way of modeling data at the conceptual level is the Entity-Relationship (ER) model. There are well known patterns for modeling the specialization-generalization situation. Converting those ER patterns to SQL tables (called logical design) is pretty straightforward, although you do have to make some design choices.
The case you gave of a student having possibly several roles like musician probably doesn't illustrate the case you are interested in, if I read you right. You seem to be interested in the case where the subclasses are mutually exclusive. Perhaps the case where a vehicle might be an auto, a truck, or a motorcycle might be easier to discuss.
One difference you are likely to encounter is that the general table for the superclass doesn't really need the type code column. The type of a single superclass instance can be derived by the presence or absence of foreign keys in the various subclass tables. Whether it's smarter to include or omit the type code depends on how you intend to use the data.
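Deriving the type without a type code column can be sketched with outer joins. This is only an illustration, assuming SQLite via Python's sqlite3 module and the vehicle example with hypothetical table and column names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Vehicle (
    VehicleID INTEGER PRIMARY KEY,
    Plate     TEXT NOT NULL
);
CREATE TABLE Auto (
    VehicleID INTEGER PRIMARY KEY REFERENCES Vehicle(VehicleID)
);
CREATE TABLE Truck (
    VehicleID INTEGER PRIMARY KEY REFERENCES Vehicle(VehicleID)
);
"""

# The kind is computed from which subclass table holds a matching row.
CLASSIFY = """
SELECT v.VehicleID,
       CASE WHEN a.VehicleID IS NOT NULL THEN 'auto'
            WHEN t.VehicleID IS NOT NULL THEN 'truck'
            ELSE 'unclassified'
       END AS kind
FROM Vehicle v
LEFT JOIN Auto  a ON a.VehicleID = v.VehicleID
LEFT JOIN Truck t ON t.VehicleID = v.VehicleID
ORDER BY v.VehicleID
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def classify(conn):
    """Return (VehicleID, kind) pairs derived from the subclass tables."""
    return conn.execute(CLASSIFY).fetchall()
```

Without the type code, though, nothing in the schema stops the same vehicle from appearing in both Auto and Truck, which is one reason to keep the column when the subclasses must be mutually exclusive.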
Interesting problem. Of course the FK constraints are there for the subtables, so there has to be a student for those.
The main problem is trying to check as it is inserted. The student has to be inserted first so that you don't violate an FK constraint in a subtable, so a trigger that does a check wouldn't work.
You could write an app that checks now and then if you are really concerned about this. I think the biggest fear, though, would be deletions. Someone could delete a subtable entry but not the student. You could have triggers to check when items are deleted from the subtables, since that is probably the biggest problem.
I have a db with a table-per-subclass hierarchy like this as well. I use Hibernate and it's mapped properly, so it deletes everything automatically. If doing this by 'hand', I would make sure to always delete the parent with proper cascades hehe :)
Thanks, Bill. You got me thinking...
The superclass table has a subclass code column. Each of the subclass tables has a foreign key constraint, as well as one that dictates that the id exist with a subset of the superclass table (where code = athlete).
The only missing part here is that it's possible to model a superclass without a subclass. Even if you make the code column mandatory, it could just be an empty join. That can be fixed by adding a constraint that the superclass's ids exist in a union of the ids in the subclass tables. Insertion gets a little hairy with these two constraints if constraints are enforced in the middle of transactions. That or just don't worry about unsubclassed objects.
Edit: Bleh, such a good sounding idea... But impeded by the fact that subqueries that refer to other tables aren't supported. At least not in SQL Server.
That can be fixed by adding a constraint that the superclass's ids exist in a union of the ids in the subclass tables.
Depending on how much intelligence you want to put into your schema (and how much MS SQL Server lets you put there), you wouldn't actually need to do a union of the subclass tables, since you know that, if the id exists in any subclass table, it must exist in the same subclass as the one identified by the subclass code column.
I would add a Check Constraint possibly.
Create the ForeignKeys as Nullable.
Add a Check to make sure they aren't both null and to make sure they aren't both set.
CONSTRAINT [CK_HasOneForeignKey] CHECK ((FK_First IS NOT NULL OR FK_Second IS NOT NULL) AND NOT (FK_First IS NOT NULL AND FK_Second IS NOT NULL))
I am not sure but I believe this would allow you to set only one key at a time.
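A quick way to check the behaviour is to try it. The sketch below assumes SQLite via Python's sqlite3 module and hypothetical tables FirstParent, SecondParent and Child. The test is written with IS NULL / IS NOT NULL, because in SQL a comparison such as FK_First != NULL never evaluates to true.

```python
import sqlite3

SCHEMA = """
CREATE TABLE FirstParent  (ID INTEGER PRIMARY KEY);
CREATE TABLE SecondParent (ID INTEGER PRIMARY KEY);
CREATE TABLE Child (
    ChildID   INTEGER PRIMARY KEY,
    FK_First  INTEGER REFERENCES FirstParent(ID),    -- nullable
    FK_Second INTEGER REFERENCES SecondParent(ID),   -- nullable
    CONSTRAINT CK_HasOneForeignKey CHECK (
        (FK_First IS NOT NULL AND FK_Second IS NULL)
     OR (FK_First IS NULL AND FK_Second IS NOT NULL)
    )
);
INSERT INTO FirstParent (ID) VALUES (1);
INSERT INTO SecondParent (ID) VALUES (1);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn


def accepts(conn, fk_first, fk_second):
    """Return True if a Child row with these keys is stored, else False."""
    try:
        conn.execute(
            "INSERT INTO Child (FK_First, FK_Second) VALUES (?, ?)",
            (fk_first, fk_second),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

Rows with exactly one key set are stored; rows with neither or both are rejected by the CHECK.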