Understanding abstract class OOD -> relational schema [duplicate] - oop

I have the below UML class diagram with Abstract Class, and sub-Classes that extends from it. and i want to make an ER diagram using this class diagram.
My question is how can i represent the Abstract class in ER diagram ? as a Table ? or should i just ignore it ?
Thank you.

There are basically three choices to translate generalization into a database model
1. One table per concrete class
Create tables Admin, Teacher and Student. Each of these table contain columns for all of the attributes and relations of User
Pro
All fields of a concrete subclass are in the same table, so no join needed to get all Student data
Easy data validation constraints (such as mandatory fields for Student)
Con
All fields of User are duplicated in each subclass table
Foreign keys to User have to be split into three FK fields. One for Admin, one for Teacher and one for Student.
2. On table for whole generalization set
In this case you just have one table call User that contains all fields of User + all fields of all sub-classes of User
Pro
All fields are in the same table, so no join needed to get all User data
No splitting of FK's to User
Con
There are a bunch of fields that are never used. All fields specific for Student and Teacher are never filled in for Admins and vice versa
Data validation such as mandatory fields for a concrete class such as Student become rather complex as it is no longer a simple Not Null constraint.
3. One table per concrete class, and one for the superclass
In this case you create tables for each of the concrete sub-classes and you create a table for the class User. Each of the concrete sub-class tables has a mandatory FK to User
Pro
Most normalized schema: No repeated fields for the attributes of user, and no unused fields.
No splitting of FK's to User
Easy data validation constraints (such as mandatory fields for Student)
Con
You have to query two tables if you want all data of a Student
Complex validation rules to make sure each User record has exactly one Admin, Teacher or Student record.
Which one of these options you choose depends on a number of things such as the number of sub-classes, the number of attributes in either subclass or superclass, the number of FK's to the superclass, and probably a few other things I didn't think about.

Related

Entity Framework: One to Many relationship

I have a design problem with regards to Entity Framework model relationship
I have this model in the edmx
Business Rule:
A Participant can have multiple Roles so I create a relationship table ParticipantRoles that has 1-to-Many relationship on the Participant and the Role table
The Problem:
In order to get the Participant's Role value, I have to drill down through Participant->ParticipantRole->Role (see JSON output below)
The Question:
In EF, how to design the table relationship to bypass the ParticipantsRole table. I want to access the Role in something like this Particant.Role and not Participant.ParticipantsRole.Role
You say A Participant can have multiple Roles. And of course, a Role can have multiple Participants. So basically this is a many-to-many association.
Entity Framework will only map pure many-to-many associations (without connecting class) when the junction table only has two foreign keys. In your case, if the table ParticipantsRole only would have had a primary key consisting of ParticipantId and RoleId at the time of generating the model the class ParticipantsRole would not have been created. You would have had Participant.Roles and Role.Participants as navigation properties.
However, the model has been generated with ParticipantsRole and you want to get rid of it. (Or not, I'll get back to that).
This is what you can do:
Remove ParticipantRoles from the class diagram.
Modify the database table ParticipantRoles so it only has the two FK columns, that both form the primary key.
Update the model from the database and select ParticipantsRole in the Add tab.
This should give you a model with a pure many-to-many association.
However, think twice before you do this. M2m associations have a way of evolving into 1-m-1association (as you've got now). The reason is that sooner or later the need is felt to record data about the association, so the junction table must have more fields and stops being a pure junction table. In your case I can imagine that one day participant's roles must have a fixed order, or one marked as default. It can be a major overhaul to change a m2m association into 1-m-1 in a production environment. - Something to consider...

Setup Many-to-Many tables that share a common type

I'm preparing a legacy Microsoft SQL Server database so that I can interface with in through an ORM such as Entity Framework, and my question revolves around handling the setup of some of my many-to-many associations that share a common type. Specifically, should a common type be shared among master types or should each master type have its own linked table?
For example, here is a simple example I concocted that shows how the tables of interest are currently setup:
Notice that of there are two types, Teachers and Students, and both can contain zero, one, or many PhoneNumbers. The two tables, Teachers and Students, actually share an association table (PeoplePhoneNumbers). The field FKID is either a TeacherId or a StudentId.
The way I think it ought to be setup is like this:
This way, both the Teachers table and the Students table get its own PhoneNumbers table.
My gut tells me the second way is the proper way. Is this true? What about even if the PhoneNumbers tables contains several fields? My object oriented programmer brain is telling me that it would be wrong to have several identical tables, each containing a dozen or so fields if the only difference between these tables is which master table they are linked to? For example:
Here we have two tables that contain the same information, yet the only difference is that one table is addresses for Teachers and the other is for Students. These feels redundant to me and that they should really be one table -- but then I lose the ability for the database to constrain them (right?) and also make it messier for myself when I try to apply an ORM to this.
Should this type of common type be merged or should it stay separated for each master type?
Update
The answers below have directed me to the following solution, which is based on subclassing tables in the database. One of my original problems was that I had a common table shared among multiple other tables because that entity type was common to both the other tables. The proper way to handle that is to subclass the shared tables and essentially descend them from a common parent AND link the common data type to this new parent. Here's an example (keep in mind my actual database has nothing to do with Teachers and Students, so this example is highly manufactured but the concepts are valid):
Since Teachers and Students both required PhoneNumbers, the solution is to create a superclass, Party, and FK PhoneNumbers to the Party table. Also note that you can still FK tables that only have to do with Teachers or only have to do with Students. In this example I also subclassed Students and PartTimeStudents one more level down and descended them from Learners.
Where this solution is very satisfactory is when I implement it in an ORM, such as Entity Framework.
The queries are easy. I can query all Teachers AND Students with a particular phone number:
var partiesWithPhoneNumber = from p in dbContext.Parties
where p.PhoneNumbers.Where(x => x.PhoneNumber1.Contains(phoneNumber)).Any()
select p;
And it's just as easy to do a similar query but only for PhoneNumbers belonging to only Teachers:
var teachersWithPhoneNumber = from t in dbContext.Teachers
where t.Party.PhoneNumbers.Where(x => x.PhoneNumber1.Contains(phoneNumber)).Any()
select t;
Teacher and Student are both subclasses of a more general concept (a Person). If you create a Person table that contains the general data that is shared for all people in your database and then create Student and Teacher tables that link to Person and contain any additional details you will find that you have an appropriate point to link in any other tables.
If there is data that is common for all people (such as zero to many phone numbers) then you can link to the Person table. When you have data that is only appropriate for a Student you link it to the Student ID. You gain the additional advantage that Student Instructors are simply a Person with both a Student and Teacher record.
Some ORMs support the concept of subclass tables directly. LLBLGen does so in the way I describe so you can make your data access code work with higher level concepts (Teacher and Student) and the Person table will be managed on your behalf in the low level data access code.
Edit
Some commentary on the current diagram (which may not be relevant in the source domain this was translated from, so a pinch of salt is advised).
Party, Teachers and Learners looks good. Salaries looks good if you add start and end dates for the rate so you can track salary history. Also keep in mind it may make sense to use PartyID (instead of TeacherID) if you end up with multiple entites that have a Salary.
PartyPhoneNumbers looks like you might be able to hang the phone number off of that directly. This would depend on if you expect to change the phone number for multiple people (n:m) at once or if a phone number is owned by each Party independently. (I would expect the latter because you might have a student who is a (real world) child of a teacher and thus they share a phone number. I wouldn't want an update to the student's phone number to impact the teacher, so the join table seems odd here.)
Learners to PaymentHistories seems right, but the Students vs PartTimeStudents difference seems artificial. (It seems like PartTimeStudents is more AttendenceDays which in turn would be a result of a LearnerClasses join).
I think you should look into the supertype/subtype pattern. Add a Party or Person table that has one row for every teacher or student. Then, use the PartyID in the Teacher and Student tables as both the PK and FK back to Party (but name them TeacherID and StudentID). This establishes a "one-to-zero-or-one" relationship between the supertype table and each of the subtype tables.
Note that if you have identity columns in the subtype tables they will need to be removed. When creating those entities going forward you will first have to insert to the supertype and then use that row's ID in either subtype.
To maintain consistency you will also have to renumber one of your subtype tables so its IDs do not conflict with the other's. You can use SET IDENTITY_INSERT ON to create the missing supertype rows after that.
The beauty of all this is that when you have a table that must allow only one type such as Student you can FK to that table, but when you need an FK that can be either--as with your Address table--you FK to the Party table instead.
A final point is to move all the common columns into the supertype table and put only columns in the subtypes that must be different between them.
Your single Phone table now is easily linked to PartyID as well.
For a much more detailed explanation, please see this answer to a similar question.
The problem that you have is an example of a "one-of" relationship. A person is a teacher or a student (or possibly both).
I think the existing structure captures this information best.
The person has a phone number. Then, some people are teachers and some are students. The additional information about each entity is stored in either the teacher or student table. Common information, such as name, is in the phone table.
Splitting the phone numbers into two separate tables is rather confusing. After all, a phone number does not know whether it is for a student or a teacher. In addition, you don't have space for other phone numbers, such as for administrative staff. You also have a challenge for students who may sometimes teach or help teach a class.
Reading your question, it looks like you are asking for a common database schema to your situation. I've seen several in the past, some easier to work with than others.
One option is having a Student_Address table and a Teacher_Address table that both use the same Address table. This way if you have entity specific fields to store, you have that capability. But this can be slightly (although not significantly) harder to query against.
Another option is how you suggested above -- I would probably just add a primary key on the table. However you'd want to add a PersonTypeId field to that table (PersonTypeId which links to a PersonType table). This way you'd know which entity was with each record.
I would not suggest having two PhoneNumber tables. I think you'll find it much easier to maintain with all in the same table. I prefer keeping same entities together, meaning Students are a single entity, Teachers are a single entity, and PhoneNumbers are the same thing.
Good luck.

Table Per Subclass Vs Table Per concrete class in hibernate?

In most of the web application, i have seen one base class consisting common properties and number of subclasses extending base class . So my question here is which strategy we should go for among Table Per Subclass Vs Table Per concrete class. I personally feel we should go for table per subclass because in future if we want to introduce the common column we can do it at one place but in case of concrete class we have to do it in multiple tables. Right?
But yes if we want to fetch all deatils from all child tables i think Table per concrete class will be helpful Because we have to simply union the records from all tables but in case of Table per Sub class along with union we have to introduce the join with parent table which will be extra costlier .Right?
You might be interested in Section 2.12 "Inheritance Mapping Strategies" of the JPA 2.0 specification, as it sums up all possbible inheritance types as well as their advantages and drawbacks. Let me pull out just the most interesting fragments:
2.12.1 Single Table per Class Hierarchy Strategy
This mapping strategy provides good support for polymorphic
relationships between entities and for queries that range over the
class hierarchy. It has the drawback, however, that it requires that
the columns that correspond to state specific to the subclasses be
nullable.
2.12.3 Table per Concrete Class Strategy
This strategy has the following drawbacks:
- It provides poor support for polymorphic relationships.
- It typically requires that SQL UNION queries (or a separate SQL query per subclass) be issued for queries that are intended to range over the class hierarchy.
2.12.2 Joined Subclass Strategy
It has the drawback that it requires that one or more join operations
be performed to instantiate instances of a subclass. In deep class
hierarchies, this may lead to unacceptable performance. Queries that
range over the class hierarchy likewise require joins.
Also, if you're planning to be JPA-compatible, remember that the JPA-provider doesn't have to support TABLE_PER_CLASS strategy type.
I personally feel we should go for table per subclass because in
future if we want to introduce the common column we can do it at one
place but in case of concrete class we have to do it in multiple
tables.
True, but JOINED strategy also provides you the same feature and allows to specify common properties in one table.
Hope that helps!
The Object-Relational Impedance Mismatch
In Object Model, while creating object we may require to use inheritance i.e. Generalization as follows:
In Relational Model, the above Generalization(not association i.e. one-to-one or many-to-many) can achieve in Hibernate ORM with the following three inheritance mapping strategies:
Table Per Class i.e. for Hierarchy only one table
Table Per Concrete class i.e. One table for each concrete class not for super class
Table Per Subclass i.e. One table fore each class
In this strategy, we can map the whole hierarchy into single table, here we use one more discriminator column i.e. TYPE.
In this strategy, tables are created as per class but related by foreign key. So there are no duplicate columns.
In this strategy, tables are created as per class but related by foreign key. So there are no duplicate columns.
image source

“Relation” versus “relationship” in RDBMS/SQL?

Coming from question “Relation” versus “relationship”
What are definitions of "relation" vs. "relationship" in RDBMS (or database theory)?
Update:
I was somewhat perplexed by comment to my question:
"relation is a synonym for table, and
thus has a very precise meaning in
terms of the schema stored in the
computer"
Update2:
Had I answered incorrectly that question , in terms of RDBMS, having written that relation is one-side direction singular connection-dependence-link,
i.e. from one table to another while relationship implies (not necessarily explicitly) more than one link connection in one direction (from one table to another)?
A RELATION is a subset of the cartesian product of a set of domains (http://mathworld.wolfram.com/Relation.html). In everyday terms a relation (or more specifically a relation variable) is the data structure that most people refer to as a table (although tables in SQL do not necessarily qualify as relations).
Relations are the basis of the relational database model.
Relationships are something different. A relationship is a semantic "association among things".
Relation is a mathematical term referring to a concept from set theory. Basically, in RDBMS world, the "relational" aspect is that data is organized into tables which reflect the fact that each row (tuple) is related to all the others. They are all the same type of info.
But then, your have ER (Entity Relationship) which is a modeling methodology in which you identify objects and their relationships in the real world. Then each object is modelled as a table, and each relationship is modelled as a table that contains only foreign keys.
For instance, if you have 3 entities: Teacher, Student, Class; then you might also create a couple of tables to record these 2 relationships: TaughtBy and StudyingIn. The TaughtBy table would have a record with a Teacher ID and a Class ID to record that this class is taught by this teacher. And the StudyingIn table would have a Student ID and a Class ID to reflect that the student is taking this class.
That way, each student can be in many Classes, and each Teacher can be in many classes without needing to have a field which contains a list of class ids in any records. SQL cannot deal with field containing a list of things.
A relation is a table with columns and rows.
and
relationship is association between relations/tables
for example employee table has relation in branch its called relationship between employee table and branch table

ORM question - JPA

I'm reading Pro JPA 2. The book talks begins by talking about ORM in the first few pages.
It talks about mapping a single Java class named Employee with the following instance variables - id,name,startDate, salary.
It then goes on to the issue of how this class can be represented in a relational database and suggests the following scheme.
table A: emp
id - primary key
startDate
table B: emp_sal
id - primary key in this table, which is also a foreign key referencing the 'id' column in table A.
It thus seems to suggest that persisting an Employee instance to the database would require operations on two(multiple) tables.
Should the Employee class have an instance variable 'salary' in the first place?
I think it should possibly belong to a separate class (Class salary maybe?) representing salary and thus the example doesn't seem very intuitive.
What am I missing here?
First, the author explains that there are multiples ways to represent a class in a database: sometimes the mapping of a class to a table is straightforward, sometimes you don't have a direct correspondence between attributes and columns, sometimes a single class is represented by multiples tables:
In scenario (C), the EMP table has
been split so that the salary
information is stored in a separate
EMP_SAL table. This allows the
database administrator to restrict
SELECT access on salary information to
those users who genuinely require it.
With such a mapping, even a single
store operation for the Employee class
now requires inserts or updates to two
different tables.
So even storing the data from a single class in a database can be a challenging exercise.
Then, he describes how relationships are different. At the object level model, you traverse objects via their relations. At the relational model level, you use foreign keys and joins (sometimes via a join table that doesn't even exist at the object model level).
Inheritance is another "problem" and can be "simulated" in various ways at the relational model level: you can map an entire hierarchy into a single table, you can map each concrete class to its own table, you can map each class to its own table.
In other words, there is no direct and unique correspondence between an object model and a relational model. Both rely on different paradigms and the fit is not perfect. The difference between both is known as the impedance mismatch, which is something ORM have to deal with (allowing the mapping between an object model and the many possible representations in a relation model). And this is what the whole section you're reading is about. This is also what you missed :)