Should I have a separate class for each database table? - oop

I assumed this was good practice, for reasons such as it helps to clearly represent relationships between classes. However, according to this website it's not:
http://www.tonymarston.net/php-mysql/good-bad-oop.html#a5
So, is having a class for each database table good OO practice or not?

Briefly reading the article in the introduction these points are said to 'complaints' against another article the author had written. I believe the author is actually saying that a class per table is a good approach and explains why, defending it against remarks such as "This code leads to a problematic dependency between the DB and the class".
Having a table per class is certainly a good place to start from, however like any pattern it is not necessarily going to be something that'll be required in all situations. An example being a junction table where perhaps just one class can handle interactions for numerous different junction tables. Likewise you may have tables with a one to one relationship that fits the program better as a single class.

Related

Embeddable vs one to many

I have seen an article in Dzone regarding Post and Post Details (two different entities) and the relations between them. There the post and its details are in different tables. But as I see it, Post Detail is an embeddable part because it cannot be used without the "parent" Post. So what is the logic to separate it in another table?
Please give me a more clear explanation when to use which one?
Embeddable classes represent the state of their parent classes. So to take your example, a StackOverflow POST has an ID which is invariant and used in an unbreakable URL for sharing e.g. http://stackoverflow.com/q/44017535/146325. There are a series of other attributes (state, votes, etc) which are scalar properties. When the post gets edited we have various versions of the text (which are kept and visible to people with sufficient rep). Those are your POST DETAILS.
"what is the logic to separate it in another table?"
Because keeping different things in separate tables is what relational databases do. The standard way of representing this data model is a parent table POST and child table POST_DETAIL with a defined relationship enforced through a foreign key.
Embeddable is a concept from object-oriented programming. Oracle does support object-relational constructs in the database. So it would be possible to define a POST_DETAIL Type and create a POST Table which has a column declared as a nested table of that Type. However, that would be a bad design for two reasons:
The SQL for working with nested tables is clunky. For instance, to get the POST and the latest version of its text would require unnesting the collection of details every time we need to display it. Computationally not much different from joining to a child table and filtering on latest version flag, but harder to optimise.
Children can have children themselves. In the case of Posts, Tags are details because they can vary due to editing. But if you embed TAG in POST_DETAIL embedded in POST how easy would it be to find all the Posts with an [oracle] tag?
This is the difference between Object-Oriented design and relational design.
OO is strongly hierarchical: everything is belongs to something and the way to get the detail is through the parent. This approach works well when dealing with single instances of things, and so is appropriate for UI design.
Relational prioritises commonality: everything of the same type is grouped together with links to other things. This approach is suited for dealing with sets of things, and so is appropriate for data management tasks (do you want to find all the employees who work in BERLIN or whose job is ENGINEER or who are managed by ELLIOTT?)
"give me a more clear explanation when to use which one"
Always store the data relationally in separate tables. Build APIs using OO patterns when it makes sense to do so.

Refactor schema with multiple many-to-many relationships to semantically different but structurally similar entities?

I have a schema with a number of many to many relationships and what I'm seeing is a alot of similar data structure spread out among tables with different names. The intuition I have is that there is a more efficient/desirable way to achieve the same result but I'm not sure what alternative approaches fall into reasonable design/best practices.
Note: Counries, TrafficTypes and People - as they exist now - could all be represented by Id and Name columns, but in the future may have additional fields. Maybe what I'm after is some kind of technique akin to inheritance?
Here's the diagram of what I've got:
Don't lump together things which you think are similar; they may diverge later when you need to store more information about each entity.
Is there a problem with the number of tables you have in your database?
You are probably thinking about the problem from an object oriented design position and thinking you can use some sort of "parent" table to represent the common parts - databases don't work that way.
If you are not careful you will end up with a MUCK or OTLT table.
Off the bat, I would keep seperate enties/objects/(cars vs animals) in seperate tables.
The chances of such enties overlapping properties are slim at best.
Thing is, once those entites start to evolve in your system, you will find that a single table will have hundreds of columns with singles populated per entity.
I don't see how inheritance applies to your case. But if you are interested in inheritance, as it applies to SQL tables or relations, here are some things to look up:
At the level of ER modeling look up "ER model specialization". This is the way the extended ER model diagrams "is A" relationships.
At the level of table design, look up "Class Table Inheritance" and "Shared primary key" for a couple of techniques that, used together, sort of mimic what inheritance does for you in an OOP. You might also want to look up "single table inheritance" for an alternative that's simpler, but can be more wasteful.
Don't worry. You're doing it right.
In a highly normalized schema, you're going to have tons of tables that are nearly identical.
As the others have said, what you're starting to consider (a generic table for multiple things) is a very bad idea and has many drawbacks.
The biggest drawback is that your relationships are made useless by it, in terms of maintaining data integrity. There's nothing stopping someone from assigning a CountryCode where a TrafficTypeId should be, and so on.
Another drawback would be that having one larger table will likely perform worse than many smaller, specialized tables; due to extra, unnecessary blocking.
Your may still want to implement some type of inheritance concept, but that'll be best done in whatever code accesses the database.

How do I structure my database so that two tables that constitute the same "element" link to another?

I read up on database structuring and normalization and decided to remodel the database behind my learning thingie to reduce redundancy.
I have different types of entries that can be learned. Gap texts/cloze tests (one text, many gaps) and simple known-unknown (one question, one answer) types.
Now I'm in a bit of a pickle:
gaps need exactly the same columns in the user table as question-answer types
but they need less columns than question-answer types (all that info is in the clozetests table)
I'm wishing for a "magic" foreign key that can point both to the gap and the terms table. Of course their ids would overlap though. I don't like having both a term_id and gap_id in the user_terms, that seems unelegant (but is the most elegant I can come up with after googling for a while, not knowing what name this pickle goes by).
I don't want a user_gaps analogue to user_terms, because then I'd be in the same pickle when it comes to the table user_terms_answers.
I put up this cardboard cutout collage of my schema. I didn't remove the stuff that isn't relevant for this question, but I can do that if anyone's confusion can be remedied like that. I think it looks super tidy already. Tidier than my mental concept of this at least.
Did I say any help would be greatly appreciated? Answerers might find themselves adulated for their wisdom.
Background story if you care, it's not really relevant to the question.
Before remodeling I had them all in one table (because I added the gap texts in a hurry), so that the gap texts were "normal" items without answers, while the gaps where items without questions. The application linked them together.
Edit
I added an answer after SO coughed up some helpful posts. I'm not yet 100% satisfied. I try to write views for common queries to this set up now and again I feel like I'll have to pull application logic for something that is database turf.
As mentioned in the comment, it is hard to answer without knowing the whole story. So, here is a story and a model to match. See if you can adapt this to you example.
School of (foreign) languages offers exams for several levels of language proficiency. The school maintains many pre-made tests for each level of each language (LangLevelTestNo).
Each test contains several (many) questions. Each question can be simple or of the close-text-type. Correct answers are stored for each simple question. Correct terms are stored for each gap of each close-text question.
Student can take an exam for a language level and is presented with one of the pre-made tests. For each student exam, the exam form is maintained which stores students answers for each question of the exam. Like a question, an answer may be of a simple of of a close-text-type.
After editing my question some Stackoverflow started relating the right questions to me.
I knew this was a common problem, but I really couldn't find it, just couldn't come up with the right search terms, I guess.
The following threads address similar problems and I'll try to apply that logic to my own design. They all propose adding a higher-level description for (in my case terms and gaps) like items. That makes sense and reflects the logic behind my application.
Relation Database Design
Foreign Key on multiple columns in one of several tables
Foreign Key refering to primary key across multiple tables
And this good person illustrates how to retrieve the data once it's broken up across tables. He also clues me to the keyword class table inheritance, so now I know what to google.
I'll post back with my edited schema once I've applied this. It does seem more elegant like this.
Edited schema

Model diagram doesn't seem right. How else can I relate the objects?

I have a entity diagram from some analysis that I'd like to have someone look over. For some reason the System object just doesn't seem right to me. Is there a better way to relate the objects?
Its basically a user authentication/management system in its infancy.
http://www.dumpt.com/img/viewer.php?file=zlh8ltbtho4mutbbb3yk.gif
Cheers,
Mike
User and Company should have a common base class (they both have names and mail addresses), then you can link the System to this base class. That's a common pattern for business modeling, look for example, into chapter one of Martin Fowler's book "Analysis Patterns".
EDIT: Or, if you think this makes more sense, you use System as the base class itself, put the EMail adress there (and perhaps give System a better name like LegalPerson, CorporateBody or something like that).
Considering the password has a 1-to-1 relationship with the User, and is not keyed to any other tables, I'd suggest saving yourself an inner join and just making it another column in the property table. Otherwise, looks pretty good.
It's hard to evaluate the "rightness" of something without some metrics of comparison. The easiest metrics for class designs are queries.
Think up as many of the queries that you will eventually want to ask of this data. Write them down and see how the design supports them. If you're unhappy, try another design and see how the queries look then.

How can an object-oriented programmer get his/her head around database-driven programming?

I have been programming in C# and Java for a little over a year and have a decent grasp of object oriented programming, but my new side project requires a database-driven model. I'm using C# and Linq which seems to be a very powerful tool but I'm having trouble with designing a database around my object oriented approach.
My two main question are:
How do I deal with inheritance in my database?
Let's say I'm building a staff rostering application and I have an abstract class, Event. From Event I derive abstract classes ShiftEvent and StaffEvent. I then have concrete classes Shift (derived from ShiftEvent) and StaffTimeOff (derived from StaffEvent). There are other derived classes, but for the sake of argument these are enough.
Should I have a separate table for ShiftEvents and StaffEvents? Maybe I should have separate tables for each concrete class? Both of these approaches seem like they would give me problems when interacting with the database. Another approach could be to have one Event table, and this table would have nullable columns for every type of data in any of my concrete classes. All of these approaches feel like they could impede extensibility down the road. More than likely there is a third approach that I have not considered.
My second question:
How do I deal with collections and one-to-many relationships in an object oriented way?
Let's say I have a Products class and a Categories class. Each instance of Categories would contain one or more products, but the products themselves should have no knowledge of categories. If I want to implement this in a database, then each product would need a category ID which maps to the categories table. But this introduces more coupling than I would prefer from an OO point of view. The products shouldn't even know that the categories exist, much less have a data field containing a category ID! Is there a better way?
Linq to SQL using a table per class solution:
http://blogs.microsoft.co.il/blogs/bursteg/archive/2007/10/01/linq-to-sql-inheritance.aspx
Other solutions (such as my favorite, LLBLGen) allow other models. Personally, I like the single table solution with a discriminator column, but that is probably because we often query across the inheritance hierarchy and thus see it as the normal query, whereas querying a specific type only requires a "where" change.
All said and done, I personally feel that mapping OO into tables is putting the cart before the horse. There have been continual claims that the impedance mismatch between OO and relations has been solved... and there have been plenty of OO specific databases. None of them have unseated the powerful simplicity of the relation.
Instead, I tend to design the database with the application in mind, map those tables to entities and build from there. Some find this as a loss of OO in the design process, but in my mind the data layer shouldn't be talking high enough into your application to be affecting the design of the higher order systems, just because you used a relational model for storage.
I had the opposite problem: how to get my head around OO after years of database design. Come to that, a decade earlier I had the problem of getting my head around SQL after years of "structured" flat-file programming. There are jsut enough similarities betwwen class and data entity decomposition to mislead you into thinking that they're equivalent. They aren't.
I tend to agree with the view that once you're committed to a relational database for storage then you should design a normalised model and compromise your object model where unavoidable. This is because you're more constrained by the DBMS than you are with your own code - building a compromised data model is more likley to cause you pain.
That said, in the examples given, you have choices: if ShiftEvent and StaffEvent are mostly similar in terms of attributes and are often processed together as Events, then I'd be inclined to implement a single Events table with a type column. Single-table views can be an effective way to separate out the sub-classes and on most db platforms are updatable. If the classes are more different in terms of attributes, then a table for each might be more appropriate. I don't think I like the three-table idea:"has one or none" relationships are seldom necessary in relational design. Anyway, you can always create an Event view as the union of the two tables.
As to Product and Category, if one Category can have many Products, but not vice versa, then the normal relational way to represent this is for the product to contain a category id. Yes, it's coupling, but it's only data coupling, and it's not a mortal sin. The column should probably be indexed, so that it's efficient to retrieve all products for a category. If you're really horrified by the notion then pretend it's a many-to-many relationship and use a separate ProductCategorisation table. It's not that big a deal, although it implies a potential relationship that doesn't really exist and might mislead somone coming to the app in future.
In my opinion, these paradigms (the Relational Model and OOP) apply to different domains, making it difficult (and pointless) to try to create a mapping between them.
The Relational Model is about representing facts (such as "A is a person"), i.e. intangible things that have the property of being "unique". It doesn't make sense to talk about several "instances" of the same fact - there is just the fact.
Object Oriented Programming is a programming paradigm detailing a way to construct computer programs to fulfill certain criteria (re-use, polymorphism, information hiding...). An object is typically a metaphor for some tangible thing - a car, an engine, a manager or a person etc. Tangible things are not facts - there may be two distinct objects with identical state without them being the same object (hence the difference between equals and == in Java, for example).
Spring and similar tools provide access to relational data programmatically, so that the facts can be represented by objects in the program. This does not mean that OOP and the Relational Model are the same, or should be confused with eachother. Use the Realational Model to design databases (collections of facts) and OOP to design computer programs.
TL;DR version (Object-Relational impedance mismatch distilled):
Facts = the recipe on your fridge.
Objects = the content of your fridge.
Frameworks such as
Hibernate http://www.hibernate.org/
JPA http://java.sun.com/developer/technicalArticles/J2EE/jpa/
can help you to smoothly solve this problem of inheritance. e.g. http://www.java-tips.org/java-ee-tips/enterprise-java-beans/inheritance-and-the-java-persistenc.html
I also got to understand database design, SQL, and particularly the data centered world view before tackling the object oriented approach. The object-relational-impedance-mismatch still baffles me.
The closest thing I've found to getting a handle on it is this: looking at objects not from an object oriented progamming perspective, or even from an object oriented design perspective but from an object oriented analysis perspective. The best book on OOA that I got was written in the early 90s by Peter Coad.
On the database side, the best model to compare with OOA is not the relational model of data, but the Entity-Relationship (ER) model. An ER model is not really relational, and it doesn't specify the logical design. Many relational apologists think that is ER's weakness, but it is actually its strength. ER is best used not for database design but for requirements analysis of a database, otherwise known as data analysis.
ER data analysis and OOA are surprisingly compatible with each other. ER, in turn is fairly compatible with relational data modeling and hence to SQL database design. OOA is, of course, compatible with OOD and hence to OOP.
This may seem like the long way around. But if you keep things abstract enough, you won't waste too much time on the analysis models, and you'll find it surprisingly easy to overcome the impedance mismatch.
The biggest thing to get over in terms of learning database design is this: data linkages like the foreign key to primary key linkage you objected to in your question are not horrible at all. They are the essence of tying related data together.
There is a phenomenon in pre database and pre object oriented systems called the ripple effect. The ripple effect is where a seemingly trivial change to a large system ends up causing consequent required changes all over the entire system.
OOP contains the ripple effect primarily through encapsulation and information hiding.
Relational data modeling overcomes the ripple effect primarily through physical data independence and logical data independence.
On the surface, these two seem like fundamentally contradictory modes of thinking. Eventually, you'll learn how to use both of them to good advantage.
My guess off the top of my head:
On the topic of inheritance I would suggest having 3 tables: Event, ShiftEvent and StaffEvent. Event has the common data elements kind of like how it was originally defined.
The last one can go the other way, I think. You could have a table with category ID and product ID with no other columns where for a given category ID this returns the products but the product may not need to get the category as part of how it describes itself.
The big question: how can you get your head around it? It just takes practice. You try implementing a database design, run into problems with your design, you refactor and remember for next time what worked and what didn't.
To answer your specific questions... this is a little bit of opinion thrown in, as in "how I would do it", not taking into account performance needs and such. I always start fully normalized and go from there based on real-world testing:
Table Event
EventID
Title
StartDateTime
EndDateTime
Table ShiftEvent
ShiftEventID
EventID
ShiftSpecificProperty1
...
Table Product
ProductID
Name
Table Category
CategoryID
Name
Table CategoryProduct
CategoryID
ProductID
Also reiterating what Pierre said - an ORM tool like Hibernate makes dealing with the friction between relational structures and OO structures much nicer.
There are several possibilities in order to map an inheritance tree to a relational model.
NHibernate for instance supports the 'table per class hierarchy', table per subclass and table per concrete class strategies:
http://www.hibernate.org/hib_docs/nhibernate/html/inheritance.html
For your second question:
You can create a 1:n relation in your DB, where the Products table has offcourse a foreign key to the Categories table.
However, this does not mean that your Product Class needs to have a reference to the Category instance to which it belongs to.
You can create a Category class, which contains a set or list of products, and you can create a product class, which has no notion of the Category to which it belongs.
Again, you can easy do this using (N)Hibernate;
http://www.hibernate.org/hib_docs/reference/en/html/collections.html
Sounds like you are discovering the Object-Relational Impedance Mismatch.
The products shouldn't even know that
the categories exist, much less have a
data field containing a category ID!
I disagree here, I would think that instead of supplying a category id you let your orm do it for you. Then in code you would have something like (borrowing from NHib's and Castle's ActiveRecord):
class Category
[HasMany]
IList<Product> Products {get;set;}
...
class Product
[BelongsTo]
Category ParentCategory {get;set;}
Then if you wanted to see what category the product you are in you'd just do something simple like:
Product.ParentCategory
I think you can setup the orm's differently, but either way for the inheritence question, I ask...why do you care? Either go about it with objects and forget about the database or do it a different way. Might seem silly, but unless you really really can't have a bunch of tables, or don't want a single table for some reason, why would you care about the database? For instance, I have the same setup with a few inheriting objects, and I just go about my business. I haven't looked at the actual database yet as it doesn't concern me. The underlying SQL is what is concerning me, and the correct data coming back.
If you have to care about the database then you're going to need to either modify your objects or come up with a custom way of doing things.
I guess a bit of pragmatism would be good here. Mappings between objects and tables always have a bit of strangeness here and there. Here's what I do:
I use Ibatis to talk to my database (Java to Oracle). Whenever I have an inheretance structure where I want a subclass to be stored in the database, I use a "discriminator". This is a trick where you have one table for all the Classes (Types), and have all fields which you could possibly want to store. There is one extra column in the table, containing a string which is used by Ibatis to see which type of object it needs to return.
It looks funny in the database, and sometimes can get you into trouble with relations to fields which are not in all Classes, but 80% of the time this is a good solution.
Regarding your relation between category and product, I would add a categoryId column to the product, because that would make life really easy, both SQL wise and Mapping wise. If you're really stuck on doing the "theoretically correct thing", you can consider an extra table which has only 2 colums, connecting the Categories and their products. It will work, but generally this construction is only used when you need many-to-many relations.
Try to keep it as simple as possible. Having a "academic solution" is nice, but generally means a bit of overkill and is harder to refactor because it is too abstract (like hiding the relations between Category and Product).
I hope this helps.