Imagine we have two types of requests, an InvoiceRequest and a QuoteRequest. How would you structure the object model (classes) and the database model? Which of the following two designs makes more sense?
InvoiceRequest:
- id
- amount
- discount
- date
- invoiceSpecificFieldHere
QuoteRequest:
- id
- amount
- discount
- date
- quoteSpecificFieldHere.
Or does this one make more sense?
RequestData:
- amount
- discount
- date
InvoiceRequest:
- id
- requestData: <RequestData>
- invoiceSpecificProperty
QuoteRequest:
- id
- requestData: <RequestData>
- quoteSpecificProperty.
I'm deliberately not showing a third option that uses inheritance.
The question behind this question is the following: if we go with design 2, we reduce redundancy, but something about it doesn't feel right. I think discount should be at the same level as quoteSpecificProperty, and putting it inside the requestData object doesn't model this correctly.
My impression is that you are mixing concepts from object-oriented modeling and relational data modeling, because your second solution is not correct from a relational data modeling point of view.
Since I do not know your exact needs in terms of implementing the model, I'll try to propose a solution for different situations.
If you want to use a pure Object-Oriented Model, implemented with an object-oriented language, you should obviously define a superclass Request, with two subclasses InvoiceRequest and QuoteRequest, both of them with the specific properties.
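A minimal sketch of that hierarchy, written in Python purely for illustration (the question names no language). The `net_amount` method and the reading of `discount` as an absolute amount are assumptions added to show shared behaviour; they are not part of the original model.

```python
import datetime as dt
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Request:
    """Fields shared by every kind of request."""
    id: int
    amount: Decimal
    discount: Decimal
    date: dt.date

    def net_amount(self) -> Decimal:
        # Hypothetical shared behaviour: assumes discount is an absolute amount.
        return self.amount - self.discount


@dataclass
class InvoiceRequest(Request):
    invoice_specific_property: str = ""


@dataclass
class QuoteRequest(Request):
    quote_specific_property: str = ""


quote = QuoteRequest(1, Decimal("100"), Decimal("15"),
                     dt.date(2024, 1, 31), "valid 30 days")
```

With inheritance, `discount` sits at the same level as `quote_specific_property` on a `QuoteRequest`, while still being declared only once.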
If you want to implement your situation in a pure relational model, with a relational database, you should define three tables:
Requests:
- id (Primary Key)
- amount
- discount
- date
InvoiceRequests:
- id (Primary Key) (Foreign Key for Requests)
- invoiceSpecificProperty
QuoteRequests:
- id (Primary Key) (Foreign Key for Requests)
- quoteSpecificProperty.
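The three tables above could be created roughly as follows. This is a sketch using Python's built-in `sqlite3` module with an in-memory database; the column types and the sample rows are assumptions, since the original only lists column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE Requests (
    id       INTEGER PRIMARY KEY,
    amount   NUMERIC NOT NULL,
    discount NUMERIC NOT NULL,
    date     TEXT    NOT NULL
);
CREATE TABLE InvoiceRequests (
    id INTEGER PRIMARY KEY REFERENCES Requests(id),
    invoiceSpecificProperty TEXT
);
CREATE TABLE QuoteRequests (
    id INTEGER PRIMARY KEY REFERENCES Requests(id),
    quoteSpecificProperty TEXT
);
""")

# A quote is one row in Requests plus one row in QuoteRequests with the same id.
conn.execute("INSERT INTO Requests VALUES (1, 100, 15, '2024-01-31')")
conn.execute("INSERT INTO QuoteRequests VALUES (1, 'valid 30 days')")

row = conn.execute("""
    SELECT r.id, r.amount, r.discount, q.quoteSpecificProperty
    FROM Requests r JOIN QuoteRequests q ON q.id = r.id
""").fetchone()
```

Because each subtype table's primary key is also a foreign key to Requests, a subtype row cannot exist without its common row.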
Finally, if you want to use an Object-Relational Mapping, you should design a superclass Request, with two subclasses InvoiceRequest and QuoteRequest, both of them with the specific properties, and then you can map it onto a relational database with a model like the previous one.
Of course, there is another possibility in relational modeling: a single table Requests with all the attributes, including the quote-specific and invoice-specific ones, as well as an attribute that distinguishes which kind of request each row is.
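A sketch of the single-table variant, again with Python's `sqlite3`. The discriminator column name `kind`, its two allowed values, and the sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Requests (
    id       INTEGER PRIMARY KEY,
    kind     TEXT    NOT NULL CHECK (kind IN ('invoice', 'quote')),
    amount   NUMERIC NOT NULL,
    discount NUMERIC NOT NULL,
    date     TEXT    NOT NULL,
    invoiceSpecificProperty TEXT,  -- NULL for quotes
    quoteSpecificProperty   TEXT   -- NULL for invoices
)
""")
conn.execute(
    "INSERT INTO Requests VALUES (1, 'invoice', 100, 15, '2024-01-31', 'PO-7', NULL)")
conn.execute(
    "INSERT INTO Requests VALUES (2, 'quote', 250, 0, '2024-02-01', NULL, 'valid 30 days')")

# The discriminator selects one kind of request without any join.
quotes = conn.execute(
    "SELECT id, quoteSpecificProperty FROM Requests WHERE kind = 'quote'"
).fetchall()
```

The trade-off is visible in the schema: no joins are needed, but every subtype-specific column has to be nullable.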
The second one makes a lot more sense, because when you design your objects and their fields, you are making an abstraction of the real world: how things look in it and how they behave. What you are dealing with here is called normalization:
Database Normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.
Those relationships do not always match reality perfectly, but you must abstract from the real world and treat the data according to how its pieces relate to each other.
I will share with you some information I collected this week.
Maybe the SOLID principles would help you to decide that.
SOLID = (Single responsibility principle, Open/closed principle,
Liskov substitution principle, Interface segregation principle,
Dependency inversion principle).
Alright, that's much more than property abstraction. Let's see some examples:
S
According to Wikipedia, the Single responsibility principle means:
One class shall have only one reason that justifies changing its
implementation;
Classes shall have few dependencies on other classes;
Classes shall be abstract from the particular layer they are running.
O
When you define a class or a unit, keep in mind:
They shall be open for extension;
But closed for modification.
About modification: think of a bug fix, where you are obliged to modify the code. With the second model, a change to the common fields is much easier.
First model
InvoiceRequest:
- id
- amount
- discount
- date
- invoiceSpecificFieldHere
QuoteRequest:
- id
- amount
- discount
- date
- quoteSpecificFieldHere.
Second model-Common fields
QuoteRequest:
- id
- requestData: <RequestData>
- quoteSpecificProperty.
L
According to Barbara Liskov's substitution principle, if TChild is a subtype of TParent, then objects of type TParent may be replaced with objects of type TChild without altering any of the desirable properties of that program (correctness, task performed, etc.).
I mean the objects of TParent, that is, the instances of TParent, not the TParent class itself.
That is an interesting topic to think about when you want to implement this example using interfaces. The remaining two principles follow:
I
Interface segregation principle
D
Dependency Inversion Principle
Another form of decoupling is to invert the dependency between high and low level of a software design:
- High-level modules should not depend on low-level modules. Both
should depend on abstractions;
- Abstractions should not depend upon details. Details should depend
upon abstractions.
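A small sketch of dependency inversion in Python. All names here (`RequestStore`, `InMemoryRequestStore`, `Billing`) are hypothetical and only illustrate the two rules above: the high-level `Billing` class depends on an abstraction, and the low-level store implements it.

```python
from typing import Protocol


class RequestStore(Protocol):
    """The abstraction both sides depend on."""

    def save(self, request_id: int, amount: float) -> None: ...

    def total(self) -> float: ...


class InMemoryRequestStore:
    """Low-level detail: one possible implementation of the abstraction."""

    def __init__(self) -> None:
        self._rows = {}  # request_id -> amount

    def save(self, request_id: int, amount: float) -> None:
        self._rows[request_id] = amount

    def total(self) -> float:
        return sum(self._rows.values())


class Billing:
    """High-level module: knows RequestStore, not any concrete store."""

    def __init__(self, store: RequestStore) -> None:
        self._store = store

    def register(self, request_id: int, amount: float) -> float:
        self._store.save(request_id, amount)
        return self._store.total()


billing = Billing(InMemoryRequestStore())
```

Swapping in a database-backed store would then require no change to `Billing`, as long as the new class offers the same two methods.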
To know more about SOLID principle, read http://blog.synopse.info/post/2011/11/27/SOLID-design-principles
In summary, watch for three symptoms of a poorly designed object model:
Rigidity – Hard to change something because every change affects too
many other parts of the system;
Fragility – When you make a change, unexpected parts of the system
break;
Immobility – Hard to reuse in another application because it cannot
be disentangled from the current application.
Special thanks to A. Bouchez; source: http://blog.synopse.info/post/2011/11/27/SOLID-design-principles
Invoice and quote are two completely different things even though they look similar. It's better to keep them separate, because changes to one might produce unwanted side effects in the other.
Using Core Data, I have two entities that have many-to-many relationships. So:
Class A <<---->> Class B
Both relationships are set up as 'ordered' so I can track their order in a UITableView. That works fine, no problem.
I am about to try and implement iCloud with this Core Data model, and find out that iCloud doesn't support ordered relationships, so I need to reimplement the ordering somehow.
I've done this with another entity that has a one-to-many relationship with no problem: I add an 'order' attribute to the entity and store its order information there. But with a many-to-many relationship I need an unknown number of order attributes.
I can think of two solutions, neither of which seems ideal to me, so maybe I'm missing something:
Option 1. I add an intermediary entity. This entity has a one-to-many relationship with both entities like so:
Class A <<--> Class C <-->> Class B
That means I can have the single order attribute in this helper entity.
Option 2. Instead of an order attribute that stores a single order number, I store a dictionary that I can store as many order numbers as I need, probably with the corresponding object (ID?) as the key and the order number as the value.
I'm not necessarily looking for any code so any thoughts or suggestions would be appreciated.
I think your option 1, employing a "join table" with an order attribute is the most feasible solution for this problem. Indeed, this has been done many times in the past. This is exactly the case for which you would use a join table in Core Data although the framework already gives you many-to-many relationships: if you want to store information about the relationship itself, which is precisely your case. Often these are timestamps, in your case it is a sequence number.
You state: "...solutions, neither of which seem ideal to me". To me, the above seems indeed "ideal". I have used this scheme repeatedly with great performance and maintainability.
The only problem (though it is the same as with a to-one relationship) is that when inserting an item out of sequence you have to update many entities to get the order right. That seems cumbersome and could potentially harm performance. In practice, however, it is quite manageable and performs rather well.
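To make the cost of an out-of-sequence insert concrete, here is a language-neutral sketch in Python. It is not Core Data code: `Link` stands in for the intermediary entity (Class C), and `insert_at` and `ordered_bs` are hypothetical helpers. It tracks the order of B objects within one A; ordering in the other direction would need a second order attribute on the same entity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Link:
    """Join entity: one A-B pair plus its position within A's list."""
    a_id: str
    b_id: str
    order: int


def insert_at(links: List[Link], a_id: str, b_id: str, position: int) -> None:
    """Insert b_id at `position` in a_id's ordering, shifting later links."""
    for link in links:
        if link.a_id == a_id and link.order >= position:
            link.order += 1  # every later link of this A must be renumbered
    links.append(Link(a_id, b_id, position))


def ordered_bs(links: List[Link], a_id: str) -> List[str]:
    """Return the B ids linked to a_id, sorted by their order attribute."""
    mine = [link for link in links if link.a_id == a_id]
    return [link.b_id for link in sorted(mine, key=lambda link: link.order)]


links = [Link("a1", "b1", 0), Link("a1", "b2", 1), Link("a2", "b1", 0)]
insert_at(links, "a1", "b3", 1)
```

Only the links belonging to the affected A object are touched, which is why the renumbering tends to stay manageable in practice.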
NB: As for arrays or dictionaries to be stored with the entity to keep track of ordering information: this is possible via so-called "transformable" attributes, but the overhead is daunting. These attributes have to be serialized and deserialized, and in order to retrieve one sequence number you have to get all of them. Hardly an attractive design choice.
For more than 10 years, before ordered relationships existed, everyone used a "helper" entity. So that is what you should do.
Additional note 1: This is no "helper" entity. It is an entity that models a fact in your model. In my books I always had the same example:
You have a group entity with members. Every member can belong to many groups. The "helper" entity is nothing else than membership.
Additional note 2: It is hard to synchronize such an ordered relationship. This is why it is not done automatically. However, you have to do it. Since CD and synchronizing is no fun, CD and synchronizing a model with ordered relationship is less than no fun.
This is a bit of a complex one, and even trying to think it over is somewhat confusing.
Basically I'm having to design a series of tables that will house information about many different pieces of electrical equipment. The arrangement of this equipment is quite complex, and can vary fairly drastically.
The different types of equipment are as follows:
RDC - Remote Distribution Center
EBD - Electrical Bus Duct
UPB - Upright Panel Board
PDU - Power Distribution Unit
Now the way these units work together is slightly confusing as well.
PDU - Powers RDC's, EBD's, and UPB's. They are often redundant, and have a secondary
unit that powers the same equipment in the event of a power failure.
Can also contain breakers and power equipment directly.
RDC - Powers nearly all the equipment on the data center floors, are usually redundant.
They have two units side by side, being powered by a PDU. In the event of a
failure, the second RDC is activated and resumes operations.
EBD - Nearly identical to the RDC, being phased out, but still needs to be tracked in a
similar fashion.
UPB - Similar to an RDC, however, they are not redundant.
Now what I'm trying to do is figure out the simplest method of tracking this crazy relationship between all the different items.
I need to track the redundant sources for all possible hardware, but also what powers each unit. This can be quite complex because if two PDUs power a set of two RDCs, we need to be able to track exactly what goes where.
Any idea on exactly where to start?
EDIT Here is a visual representation of what I'm after. The objects that are touching are redundant, and must be documented as such. Also, the different hardware that is connected to each device must be cataloged.
Set up one table for equipment, one table for power supplies, then a third table that matches a piece of equipment with its power supply.
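A rough sketch of that three-table layout using Python's `sqlite3`. The table and column names, the `is_backup` flag used to record redundancy, and the sample units are all assumptions; the original answer only names the three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE equipment (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL              -- e.g. 'RDC', 'EBD', 'UPB'
);
CREATE TABLE power_supplies (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE equipment_power (
    equipment_id    INTEGER NOT NULL REFERENCES equipment(id),
    power_supply_id INTEGER NOT NULL REFERENCES power_supplies(id),
    is_backup       INTEGER NOT NULL DEFAULT 0,  -- 1 = redundant source
    PRIMARY KEY (equipment_id, power_supply_id)
);
INSERT INTO equipment VALUES (1, 'RDC-1A', 'RDC'), (2, 'RDC-1B', 'RDC');
INSERT INTO power_supplies VALUES (1, 'PDU-1'), (2, 'PDU-2');
INSERT INTO equipment_power VALUES (1, 1, 0), (1, 2, 1), (2, 1, 0), (2, 2, 1);
""")

# Which supplies feed RDC-1A, primary first?
sources = conn.execute("""
    SELECT p.name, ep.is_backup
    FROM equipment_power ep
    JOIN power_supplies p ON p.id = ep.power_supply_id
    JOIN equipment e      ON e.id = ep.equipment_id
    WHERE e.name = 'RDC-1A'
    ORDER BY ep.is_backup
""").fetchall()
```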
This sounds like a job for an entity-relationship model.
But, in the interest of answering your question, here's how I would set it up. I believe I understand the relationships between entities. My shorthand follows this pattern: Table [TableName] ([columns]). I tried to name them so they make the relationships obvious.
Table RDC (id)
Table EBD (id)
Table PDU (id)
Table UPB (id, PduId) // many-to-one relationship between UPBs and PDUs
Table PDU_RDC (PduId, RdcId) // represents many-to-many relationship between PDUs and RDCs
Table PDU_EBD (PduId, EbdId) // represents many-to-many relationship between PDUs and EBDs
Good luck!
Instead of focusing on "entities" focus on basic facts. Each gives a table or view.
Some of the basic facts just involve entities; others are about (ids of) entities:
RDC(id) // id identifies a remote distribution center
powers(pid,rid) // PDU pid powers RDC rid
backup(rid1,rid2) // RDC rid1 is backed up by RDC rid2
active(rid) // RDC is active
Until you supply the statements you want to make or use, we can only answer with guesses or principles; give us statements and business rules and we can suggest alternatives and rearrangements.
When you get AND between two statements you already have, the table with that statement is expressible as a JOIN of the two statements' tables.
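As an illustration of the AND/JOIN correspondence, here is a sketch with Python's `sqlite3` that stores two of the facts above as tables and joins them. The sample rows and the choice of composite keys are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- powers(pid, rid): PDU pid powers RDC rid
CREATE TABLE powers (pid TEXT, rid TEXT, PRIMARY KEY (pid, rid));
-- backup(rid1, rid2): RDC rid1 is backed up by RDC rid2
CREATE TABLE backup (rid1 TEXT, rid2 TEXT, PRIMARY KEY (rid1, rid2));
INSERT INTO powers VALUES ('P1', 'R1'), ('P2', 'R2');
INSERT INTO backup VALUES ('R1', 'R2');
""")

# "PDU pid powers RDC rid1 AND RDC rid1 is backed up by RDC rid2"
rows = conn.execute("""
    SELECT powers.pid, backup.rid1, backup.rid2
    FROM powers JOIN backup ON powers.rid = backup.rid1
""").fetchall()
```

Each result row is one true instance of the combined statement; P2 does not appear because R2 has no recorded backup.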
You can introduce notions like hardware type but the tables/statements for that way will involve simpler statements (for which you may have defined tables). The former tables/statements are joins of the latter, and the latter are projections of the former. This means you can write views of either way in terms of the other. Neither is more complex; you have fewer things with more parts or more simpler things. Queries involving given statement will be simpler--but using the appropriate view neither is more complex. However, each way has corresponding versions of constraints and SQL might make certain constraints hard to express declaratively. Investigate join performance later as a non-premature optimization.
When a column is a function of a set of columns there is an FD from the set to the column. A column set forms a key when all other columns are functions of it but of no subset. FDs and keys are kinds of constraint.
There will be certain constraints that a projection of a source table is always a subset of a projection of a target table (maybe the same one). That's an IND (inclusion dependency). Informally it means something(c1,...) IMPLIES otherthing(c1,...). Formally, EXISTS x1,... t1(c1,...,x1,...) IMPLIES EXISTS y1,... t2(c1,...,y1,...). If the target projection's columns form a key in its table, there's also an FK. SQL FK [sic] declarations actually declare INDs.
There will be other constraints.
Supplying whatever-to-whateverness for a table is just one property about it. Not being 0-or-more-to-0-or-more means a corresponding FD or IND holds. People talk about "a" "1-to-n" "relationship" between entity types or tables but that's just a sloppy, unclear expression of some constraint. Make sure you know exactly which table(s) and constraint(s) that means.
Read about ORM2 (or NIAM or FCO-IM) because it is based on relational principles (although could be moreso).
I have a question about the modelling of classes and the underlying database design.
Simply put, the situation is as follows: at the moment we have Positions and Accounts objects and tables and the relationship between them is that a Position 'has an' Account (an Account can have multiple Positions). This is simple aggregation and is handled in the DB by the Position table holding an Account ID as a foreign key.
We now need to extend this 'downwards' with Trades and Portfolios. One or more Trades make up a Position (but a Trade is not a Position in itself) and one or more Portfolios make up an Account (but a Portfolio is not an Account in itself). Trades are associated with Portfolios just like Positions are associated with Accounts ('has a'). Note that it is still possible to have a Position without Trades and an Account without Portfolios (i.e. it is not mandatory to have all the existing objects broken down in subcomponents).
My first idea was to go simply for the following (the first two classes already exist):
class Account;
class Position {
Account account;
}
class Portfolio {
Account account;
}
class Trade {
Position position;
Portfolio portfolio;
}
I think the (potential) problem is clear: starting from Trade, you might end up in different Accounts depending on whether you take the Position route or the Portfolio route. Of course this is never supposed to happen, and the code that creates and stores the objects should never be able to create such an inconsistency. I wonder, though, whether the fact that it is theoretically possible to have an inconsistent database implies a flawed design?
Looking forward to your feedback.
The design is not flawed just because there are two ways to get from class A to class D, one way over B and one over C. Such "squares" appear often in OOP class models, sometimes not so obviously, especially if more classes lie in the paths. But as Dan mentioned, the business semantics always determine whether such a square must commute or not (in the mathematical sense).
Personally I draw a = sign inside such a square in the UML diagram to indicate that it must commute. I also note the precise formula in a UML comment; in my example it would be
For every object a of class A: a.B.D = a.C.D
If such a predicate holds, then you have basically two options:
Trust all programmers to not break the rule in any code, since it is very well documented
Implement some error handling (like Dan and algirdas mentioned) or, if you don't want to have such code in your model, create a Checker controller, which checks all conditions in a given model instance.
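The second option could look roughly like this in Python, using the classes from the question. The function name `inconsistent_trades` is hypothetical; it simply checks the predicate above for every Trade by comparing the Account reached through each route.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str


@dataclass
class Position:
    account: Account


@dataclass
class Portfolio:
    account: Account


@dataclass
class Trade:
    position: Position
    portfolio: Portfolio


def inconsistent_trades(trades):
    """Return the trades whose two routes lead to different Account objects."""
    return [t for t in trades
            if t.position.account is not t.portfolio.account]


main, other = Account("main"), Account("other")
good = Trade(Position(main), Portfolio(main))
bad = Trade(Position(main), Portfolio(other))
```

Running such a checker over a loaded model instance keeps the rule out of the domain classes themselves.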
I have a Rails app in which I have a group of models (let's call them Events) that have some fields in common (date, title, user_id), but then I need some "subtypes". A SalesEvent might have an article_id and an amount. An InterviewEvent might have a comments field. And so on.
I know 3 business requirements I need to meet:
in some occasions I'll want to frame the Events as a whole (i.e. "get all the Events for this user, and sort them chronologically, grouped in months")
in other occasions I will need only the "subtypes" ("get all the articles sold by this user").
the number of subtypes can be moderately high (still TBD, but we estimate around 20, depending on user feedback)
I'm pondering about how to structure the tables to support this model. I came out with 5 possible ways to model this, but each one has its own drawbacks.
Option A: Separate tables - sales_events and interview_events. This would make 2) very simple, and 3) feasible, but 1) would be very cumbersome to implement.
Option B: Single table inheritance. This would solve 1) and 2) more or less easily, but has the issue of requiring more and more nullable fields, which doesn't play well with 3).
Option C: Using hstore - Since we're using Postgres in production, we could use hstore - we would have a "data" field governed by a "type" string field. This would solve 1), 2) and 3), but ties us to postgresql, and we would implement a key business object in a technology we are not very familiar with. I'd rather avoid that if possible.
Option D: events table with polymorphic link to ***_event_data. We would basically have an events table with a type and event_data_id, and then we would have sale_event_data, interview_event_data, etc. This satisfies 1) and 3) well, but 2) is a bit weaker than in other approaches, since there will be lots of joins involved in linking the events with their data.
Option E: Sale has_one :event. This does the same as Option D, except that the "link to the other" is on the "data" part. It also solves 1) and 3), and also involves some joins in 2), but it seems a bit more "clean"; there are no polymorphic associations here, just "regular" sql ones.
Right now I'm inclined to use Option E. But I'd like to know if anyone sees an obvious disadvantage on it, or a greater benefit in one of the other options, or a better option that I didn't think of.
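To see what Option E means at the table level, here is a schema sketch using Python's `sqlite3` rather than Rails migrations. The table names, column types, and sample rows are assumptions; the two queries correspond to requirements 1) and 2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title   TEXT    NOT NULL,
    date    TEXT    NOT NULL
);
-- Each subtype row points at exactly one event (Sale has_one :event).
CREATE TABLE sales (
    id         INTEGER PRIMARY KEY,
    event_id   INTEGER NOT NULL UNIQUE REFERENCES events(id),
    article_id INTEGER NOT NULL,
    amount     NUMERIC NOT NULL
);
CREATE TABLE interviews (
    id       INTEGER PRIMARY KEY,
    event_id INTEGER NOT NULL UNIQUE REFERENCES events(id),
    comments TEXT
);
INSERT INTO events VALUES (1, 7, 'Sold a lamp', '2012-03-02'),
                          (2, 7, 'Interview',   '2012-02-10');
INSERT INTO sales VALUES (1, 1, 42, 20);
INSERT INTO interviews VALUES (1, 2, 'Went well');
""")

# Requirement 1: all events for a user, chronologically, with no joins.
timeline = conn.execute(
    "SELECT title FROM events WHERE user_id = 7 ORDER BY date").fetchall()

# Requirement 2: articles sold by a user needs one join per subtype.
sold = conn.execute("""
    SELECT s.article_id FROM sales s
    JOIN events e ON e.id = s.event_id
    WHERE e.user_id = 7
""").fetchall()
```

Adding a new subtype means adding one table with an `event_id` column, without touching `events`.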
I have used almost all your suggested options. While I would eliminate options A, B and D for the following reasons, I can't talk about C because I don't know hstore and don't use Postgres:
Option A: Separate tables, as you said, would be very difficult to maintain. Each time you would want to change the structure of events, you'd have to do it on all the sub_events tables.
Option B: Single table inheritance. I have used it a lot and dropped it. I felt there was a big mismatch between what you see in the database and what your models look like. There are also lots of nil fields.
Option D: events table with polymorphic link to *_event_data. Polymorphic tables are not meant for that purpose. They are a way to have different type fields in a model so you could reference it without specifying the type explicitly.
Option E seems OK, but where should the foreign key be stored? It is hard to tell, and it may lead to situations that are difficult to maintain.
Personally, I would go with the code I want to write, what would make using it and reading it later easier. I like things when they are more specific. And I would simply change the way I name my models so that it satisfies my needs. You have to be creative!
I would rather write something like that:
conference.event_information.users OR
sales_event.settings.title OR
interview.shared_information.comments OR
event.interview_details.starting_at
With all those examples, I'd use classic has_many and belongs_to relationships.
I think that the whole concept of data types and inheritance can put you in situations where it does not solve problems or make things clearer. Sometimes you just need to see things a little differently.
I hope it helps.
Rails doesn't support Multiple Table Inheritance by default, but it turns out it's possible to model it pretty closely.
See this article:
http://mediumexposure.com/multiple-table-inheritance-active-record/
Basically, it uses a module to "modify" Option D. I'm still pondering about Wawa Loo's answer, but this one is also worth considering.
EDIT: more on multiple-table inheritance: a gem called "citier" http://peterhamilton.github.com/citier/index.html
EDIT2: I ended up using multiple_table_inheritance:
https://github.com/mhuggins/multiple_table_inheritance
But I'm not very satisfied with the results. This is probably one of those places where having the business data tightly coupled with the persistence policies (as ActiveRecord does) doesn't help very much. It does the job sufficiently well, but it is not perfect: notably, instance methods can be "inherited", but class methods cannot, so things like scopes have to be repeated or mixed in separately on each subclass.
My model looks like this:
InsurancePolicy
VehicleInsurancePolicy
AbcInsurancePolicy
DefInsurancePolicy
HomeInsurancePolicy
GhiInsurancePolicy
PqrInsurancePolicy
SomeOtherInsurancePolicy
... etc
where InsurancePolicy is an abstract class that is the base class for all concrete implementations of insurance policies. AbcInsurancePolicy, DefInsurancePolicy, etc. are implementations that correspond to specific insurance products. Sometimes I define other abstract classes for subgroups of policies with a subset of common fields (like VehicleInsurancePolicy).
I mapped these classes using a "Table per subclass, using a discriminator" strategy. The InsurancePolicy table contains about 60 fields, and each joined table adds from 10 to 30 fields. I used this strategy because:
I have a lot of subclasses with a lot of fields. A table-per-class-hierarchy strategy would end up as a single table with a lot of null columns.
I want to be able to extend the application by adding other subclasses without changing the schema of InsurancePolicy table.
The InsurancePolicy is used often as a many-to-one relationship in other entities like Payment, Document etc.
NHibernate generates a lot of left-outer-joins when querying for InsurancePolicy because it doesn't know the type. This is very inefficient as I have a lot of tables to join. The problem becomes even worse when lazy-loading many-to-one properties containing an InsurancePolicy because it is used quite a lot in my model. The concrete implementations are used rarely, only in edit/details scenarios where it is specified the actual type and only the needed tables are joined.
Then I used a combination of discriminator + join. Thus the InsurancePolicy table contains the information about the type. Unfortunately a "join" mapping doesn't support lazy-loading. I tried setting fetch="select", however this generates N+1 selects when querying for multiple insurance policies.
// select from 1 table, "join" class must be lazy-loaded on access
Session.Get<InsurancePolicy>(5)
// select includes a join, since we explicitly specified a concrete type
Session.Get<SomeConcreteInsurancePolicy>(5)
So my questions are:
Is there a way to extend NHibernate to make it work like above?
Is there another way of mapping these large / complex class hierarchies?
Based on this:
The concrete implementations are used rarely, only in edit/details scenarios
I recommend that you break up InsurancePolicy in two:
InsurancePolicy, containing only the properties from the current base class
PolicyDetails, an abstract base class for the hierarchy.
There's a one-to-one relationship between those two classes.
The beauty of this is that you don't have to change anything else (except a minor change in the policy edit views, to point them to the new relationship)
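The shape of that split can be sketched in Python. This is not NHibernate mapping code: the loader callback stands in for whatever lazy one-to-one association the ORM provides, and all field names other than the two class names are hypothetical.

```python
class PolicyDetails:
    """Abstract root of the product-specific hierarchy."""


class AbcPolicyDetails(PolicyDetails):
    def __init__(self, abc_specific_field):
        self.abc_specific_field = abc_specific_field


class InsurancePolicy:
    """Holds only the common fields; details are fetched on first access."""

    def __init__(self, policy_id, holder, load_details):
        self.id = policy_id
        self.holder = holder
        self._load_details = load_details  # stand-in for the ORM's lazy proxy
        self._details = None

    @property
    def details(self):
        if self._details is None:
            self._details = self._load_details(self.id)
        return self._details


calls = []


def fake_loader(policy_id):
    # Records each "query" so the laziness is observable.
    calls.append(policy_id)
    return AbcPolicyDetails("abc-only value")


policy = InsurancePolicy(5, "J. Doe", fake_loader)
```

Listing or referencing policies touches only `InsurancePolicy`; the subtype tables are read when an edit or details view asks for `policy.details`.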