What are the principles behind, and benefits of, the "party model"?

What are the principles behind, and benefits of, the "party model"? - sql

The "party model" is a "pattern" for relational database design. At least part of it involves finding commonality between many entities, such as Customer, Employee, Partner, etc., and factoring that into some more "abstract" database tables.
I'd like to find out your thoughts on the following:
What are the core principles and motivating forces behind the party model?
What does it prescribe you do to your data model? (My bit above is pretty high level and quite possibly incorrect in some ways. I've been on a project that used it, but I was working with a separate team focused on other issues).
What has your experience led you to feel about it? Did you use it, and if so, would you do so again? What were the pros and cons?
Did the party model limit your choice of ORMs? For example, did you have to eliminate certain ORMs because they didn't allow for enough of an "abstraction layer" between your domain objects and your physical data model?
I'm sure every response won't address every one of those questions ... but anything touching on one or more of them is going to help me make some decisions I'm facing.
Thanks.

What are the core principles and motivating forces behind the party
model?
To the extent that I've used it, it's mostly about code reuse and flexibility. We've used it before in the guest / user / admin model and it certainly proves its value when you need to move a user from one group to another. Extend this to having organizations and companies represented with users under them, and it's really providing a form of abstraction that isn't particularly inherent in SQL.
What does it prescribe you do to your data model? (My bit above is
pretty high level and quite possibly
incorrect in some ways. I've been on a
project that used it, but I was
working with a separate team focused
on other issues).
You're pretty correct in your bit above, though it needs some more detail. You can imagine a situation where an entity in the database (call it a Party) contracts out to another Party, which may in turn subcontract work out. A party might be an Employee, a Contractor, or a Company, all subclasses of Party. From my understanding, you would have a Party table and then more specific tables for each subclass, which could then be further subclassed (Party -> Person -> Contractor).
What has your experience led you to feel about it? Did you use it, and if
so, would you do so again? What were
the pros and cons?
It has its benefits if you need flexibly to add new types to your system and create relationships between types that you didn't expect at the beginning and architect in (users moving to a new level, companies hiring other companies, etc). It also gives you the benefit of running a single query and retrieving data for multiple types of parties (Companies,Employees,Contractors). On the flip side, you're adding additional layers of abstraction to get to the data you actually need and are increasing load (or at least the number of joins) on the database when you're querying for a specific type. If your abstraction goes too far, you'll likely need to run multiple queries to retrieve the data as the complexity would start to become detrimental to readability and database load.
Did the party model limit your choice of ORMs? For example, did you
have to eliminate certain ORMs because
they didn't allow for enough of an
"abstraction layer" between your
domain objects and your physical data
model?
This is an area that I'm admittedly a bit weak in, but I've found that using views and mirrored abstraction in the application layer haven't made this too much of a problem. The real problem for me has always been a "where is piece of data X living" when I want to read the data source directly (it's not always intuitive for new developers on the system either).

The idea behind the party models (aka entity schema) is to define a database that leverages some of the scalability benefits of schema-free databases. The party model does that by defining its entities as party type records, as opposed to one table per entity. The result is an extremely normalized database with very few tables and very little knowledge about the semantic meaning of the data it stores. All that knowledge is pushed to the data access in code. Database upgrades using the party model are minimal to none, since the schema never changes. It’s essentially a glorified key-value pair data model structure with some fancy names and a couple of extra attributes.
Pros:
Kick-ass horizontal scalability. Once your 5-6 tables are defined in your entity model, you can go to the beach and sip margaritas. You can virtually scale this database out as much as you want with minimum efforts.
The database supports any data structure you throw at it. You can also change data structures and party/entities definitions on the fly without affecting your application. This is very very powerful.
You can model any arbitrary data entity by adding records, not changing the schema. Meaning you can say goodbye to schema migration scripts.
This is programmers’ paradise, since the code they write will define the actual entities they use in code, and there are no mappings from Objects to Tables or anything like that. You can think of the Party table as the base object of your framework of choice (System.Object for .NET)
Cons:
Party/Entity models never play well with ORMs, so forget about using EF or NHibernate to get semantically meaningful entities out of your entity database.
Lots of joins. Performance tuning challenges. This ‘con’ is relative to the practices you use to define your entities, but is safe to say that you’ll be doing a lot more of those mind-bending queries that will bring you nightmares at night.
Harder to consume. Developers and DB pros unfamiliar with your business will have a harder time to get used to the entities exposed by these models. Since everything is abstract, there no diagram or visualization you can build on top of your database to explain what is stored to someone else.
Heavy data access models or business rules engines will be needed. Basically you have to do the work of understanding what the heck you want out of your database at some point, and your database model is not going to help you this time around.
If you are considering a party or entity schema in a relational database, you should probably take a look at other solutions like a NoSql data store, BigTable or KV Stores. There are some great products out there with massive deployments and traction such as MongoDB, DynamoDB, and Cassandra that pioneered this movement.

This is a vast topic, I would recommend reading The Data Model Resource Book Volume 3 - Universal Patterns for Data Modeling by Len Silverston and Paul Agnew.
I've just received my copy and it's pretty good - It provides you with an overlook for many approaches to data modeling, including hybrid contextual role patterns and so on. It has detailed PROs and CONs for every approach.
There is a pletheora of ways to model party relationships and roles all with their benefits and disadvantages. The question that was accepted as an answer covers just one instance of a 'party model'.
For instance, in many approaches, notions like "Employee", "Project Manager" etc. are roles that a party can play within a certain context. I will try to give you a better breakdown once I get home.

When I was part of a team implementing these ideas in the early 1980's, it did not limit our choice of ORM's because those hadn't been invented yet.
I'd fall back on those ideas any time, as that particular project was one of the most convincing proofs-of-concept I have ever seen of a "revolutionary" idea (which it certainly was at the time).
It forces you to nothing. And it doesn't stop you from anything (from any mistake, I mean). The one defining your own information model is you.
All parties have lots of properties in common. The fact that they have a name and such (we called those "signaletics"). The fact that they have principal/primary locations called "addresses". The fact that they all are involved, in some sense, in the business' contracts.

as a simple talk from my understanding: Party modeling gives the flexibility and needs more effort (like T-sql join and ...) to be implemented.
I also wanna point that, "using Party modeling (serialization/generalization) gives you the ability to have FK-Relation to other tables". for example: think of different types of users (admin, user, ...) which generalized into User table, and you can have UserID in your Authorization table.

I'm not sure, but the party model sounds like a particular case of the generalization-specialization pattern. A search on "generalization specialization relational modeling" finds some interesting articles.

Related

Is it better to use entity-arrtibute-value model over storing various different product in single description text column? [duplicate]

It is safe to say that the EAV/CR database model is bad. That said,
Question: What database model, technique, or pattern should be used to deal with "classes" of attributes describing e-commerce products which can be changed at run time?
In a good E-commerce database, you will store classes of options (like TV resolution then have a resolution for each TV, but the next product may not be a TV and not have "TV resolution"). How do you store them, search efficiently, and allow your users to setup product types with variable fields describing their products? If the search engine finds that customers typically search for TVs based on console depth, you could add console depth to your fields, then add a single depth for each tv product type at run time.
There is a nice common feature among good e-commerce apps where they show a set of products, then have "drill down" side menus where you can see "TV Resolution" as a header, and the top five most common TV Resolutions for the found set. You click one and it only shows TVs of that resolution, allowing you to further drill down by selecting other categories on the side menu. These options would be the dynamic product attributes added at run time.
Further discussion:
So long story short, are there any links out on the Internet or model descriptions that could "academically" fix the following setup? I thank Noel Kennedy for suggesting a category table, but the need may be greater than that. I describe it a different way below, trying to highlight the significance. I may need a viewpoint correction to solve the problem, or I may need to go deeper in to the EAV/CR.
Love the positive response to the EAV/CR model. My fellow developers all say what Jeffrey Kemp touched on below: "new entities must be modeled and designed by a professional" (taken out of context, read his response below). The problem is:
entities add and remove attributes weekly (search keywords dictate future attributes)
new entities arrive weekly (products are assembled from parts)
old entities go away weekly (archived, less popular, seasonal)
The customer wants to add attributes to the products for two reasons:
department / keyword search / comparison chart between like products
consumer product configuration before checkout
The attributes must have significance, not just a keyword search. If they want to compare all cakes that have a "whipped cream frosting", they can click cakes, click birthday theme, click whipped cream frosting, then check all cakes that are interesting knowing they all have whipped cream frosting. This is not specific to cakes, just an example.

There's a few general pros and cons I can think of, there are situations where one is better than the other:
Option 1, EAV Model:
Pro: less time to design and develop a simple application
Pro: new entities easy to add (might even
be added by users?)
Pro: "generic" interface components
Con: complex code required to validate simple data types
Con: much more complex SQL for simple
reports
Con: complex reports can become almost
impossible
Con: poor performance for large data sets
Option 2, Modelling each entity separately:
Con: more time required to gather
requirements and design
Con: new entities must be modelled and
designed by a professional
Con: custom interface components for each
entity
Pro: data type constraints and validation simple to implement
Pro: SQL is easy to write, easy to
understand and debug
Pro: even the most complex reports are relatively simple
Pro: best performance for large data sets
Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)
Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
Con: new entities must be modelled and designed by a professional
Pro: new attributes might be easily added later on
Con: complex code required to validate simple data types (for the custom attributes)
Con: custom interface components still required, but generic interface components may be possible for the custom attributes
Con: SQL becomes complex as soon as any custom attribute is included in a report
Con: good performance generally, unless you start need to search by or report by the custom attributes
* I'm not sure if Option 3 would necessarily save any time in the design phase.
Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.

It is safe to say that the EAV/CR database model is bad.
No, it's not. It's just that they're an inefficient usage of relational databases. A purely key/value store works great with this model.
Now, to your real question: How to store various attributes and keep them searchable?
Just use EAV. In your case it would be a single extra table. index it on both attribute name and value, most RDBMs would use prefix-compression to on the attribute name repetitions, making it really fast and compact.
EAV/CR gets ugly when you use it to replace 'real' fields. As with every tool, overusing it is 'bad', and gives it a bad image.

// At this point, I'd like to take a moment to speak to you about the Magento/Adobe PSD format.
// Magento/PSD is not a good ecommerce platform/format. Magento/PSD is not even a bad ecommerce platform/format. Calling it such would be an
// insult to other bad ecommerce platform/formats, such as Zencart or OsCommerce. No, Magento/PSD is an abysmal ecommerce platform/format. Having
// worked on this code for several weeks now, my hate for Magento/PSD has grown to a raging fire
// that burns with the fierce passion of a million suns.
http://code.google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107
The internal models are wacky at best, like someone put the schema into a boggle game, sealed that and put it in a paint shacker...
Real world: I'm working on a midware fulfilment app and here are one the queries to get address information.
CREATE OR REPLACE VIEW sales_flat_addresses AS
SELECT sales_order_entity.parent_id AS order_id,
sales_order_entity.entity_id,
CONCAT(CONCAT(UCASE(MID(sales_order_entity_varchar.value,1,1)),MID(sales_order_entity_varchar.value,2)), "Address") as type,
GROUP_CONCAT(
CONCAT( eav_attribute.attribute_code," ::::: ", sales_order_entity_varchar.value )
ORDER BY sales_order_entity_varchar.value DESC
SEPARATOR '!!!!!'
) as data
FROM sales_order_entity
INNER JOIN sales_order_entity_varchar ON sales_order_entity_varchar.entity_id = sales_order_entity.entity_id
INNER JOIN eav_attribute ON eav_attribute.attribute_id = sales_order_entity_varchar.attribute_id
AND sales_order_entity.entity_type_id =12
GROUP BY sales_order_entity.entity_id
ORDER BY eav_attribute.attribute_code = 'address_type'
Exacts address information for an order, lazily
--
Summary: Only use Magento if:
You are being given large sacks of money
You must
Enjoy pain

I'm surprised nobody mentioned NoSQL databases.
I've never practiced NoSQL in a production context (just tested MongoDB and was impressed) but the whole point of NoSQL is being able to save items with varying attributes in the same "document".

Where performance is not a major requirement, as in an ETL type of application, EAV has another distinct advantage: differential saves.
I've implemented a number of applications where an over-arching requirement was the ability to see the history of a domain object from its first "version" to it's current state. If that domain object has a large number of attributes, that means each change requires a new row be inserted into it's corresponding table (not an update because the history would be lost, but an insert). Let's say this domain object is a Person, and I have 500k Persons to track with an average of 100+ changes over the Persons life-cycle to various attributes. Couple that with the fact that rare is the application that has only 1 major domain object and you'll quickly surmize that the size of the database would quickly grow out of control.
An easy solution is to save only the differential changes to the major domain objects rather than repeatedly saving redundant information.
All models change over time to reflect new business needs. Period. Using EAV is but one of the tools in our box to use; but it should never be automatically classified as "bad".

I'm struggling with the same issue. It may be interesting for you to check out the following discussion on two existing ecommerce solutions: Magento (EAV) and Joomla (regular relational structure):
https://forum.virtuemart.net/index.php?topic=58686.0
It seems, that Magento's EAV performance is a real showstopper.
That's why I'm leaning towards a normalized structure. To overcome the lack of flexibility I'm thinking about adding some separate data dictionary in the future (XML or separate DB tables) that could be edited, and based on that, application code for displaying and comparing product categories with new attributes set would be generated, together with SQL scripts.
Such architecture seems to be the sweetspot in this case - flexible and performant at the same time.
The problem could be frequent use of ALTER TABLE in live environment. I'm using Postgres, so its MVCC and transactional DDL will hopefully ease the pain.

I still vote for modeling at the lowest-meaningful atomic-level for EAV. Let standards, technologies and applications that gear toward certain user community to decide content models, repetition needs of attributes, grains, etc.

If it's just about the product catalog attributes and hence validation requirements for those attributes are rather limited, the only real downside to EAV is query performance and even that is only a problem when your query deals with multiple "things" (products) with attributes, the performance for the query "give me all attributes for the product with id 234" while not optimal is still plenty fast.
One solution is to use the SQL database / EAV model only for the admin / edit side of the product catalog and have some process that denormalizes the products into something that makes it searchable. Since you already have attributes and hence it's rather likely that you want faceting, this something could be Solr or ElasticSearch. This approach avoids basically all downsides to the EAV model and the added complexity is limited to serializing a complete product to JSON on update.

EAV has many drawbacks:
Performance degradation over time
Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.
The SQL queries are very complex and difficult to write.
Data Integrity problems.
You can't define foreign keys for all the fields needed.
You have to define and maintain your own metadata.

I have a slightly different problem: instead of many attributes with sparse values (which is possibly a good reason to use EAV), I want to store something more like a spreadsheet. The columns in the sheet can change, but within a sheet all cells will contain data (not sparse).
I made a small set of tests to benchmark two designs: one using EAV, and the other using a Postgres ARRAY to store cell data.
EAV
Array
Both schemas have indexes on appropriate columns, and the indexes are used by the planner.
It turned out the array-based schema was an order of magnitude faster for both inserts and queries. From quick tests, it seemed that both scaled linearly. The tests aren't very thorough, though. Suggestions and forks welcome - they're under an MIT licence.

Using an ORM with a database that has no defined relationships?

Consider a database(MSSQL 2005) that consists of 100+ tables which have primary keys defined to a certain degree. There are 'relationships' between tables, however these are not enforced with foreign key constraints.
Consider the following simplified example of typical types of tables I am dealing with. The are clear relations between the User and City and Province tables. However, they key issues is the inconsistent data types in the tables and naming conventions.
User:
UserRowId [int] PK
Name [varchar(50)]
CityId [smallint]
ProvinceRowId [bigint]
City:
CityRowId [bigint] PK
CityDescription [varchar(100)]
Province:
ProvinceId [int] PK
ProvinceDesc [varchar(50)]
I am considering a rewrite of the application (in ASP.net MVC) that uses this data source as is similar in design to MVC storefront. However I am going through a proof of concept phase and this is one of the stumbling blocks I have come across.
What are my options in terms of ORM choice that can be easily used and why?
Should I even be considering an ORM? (The reason I ask this is that most explanations and tutorials all work with relatively cleanly designed existing databases, or newly created ones when compared to mine. I am thus having a very hard time trying to find a way forward with this problem)
There is a huge amount of existing SQL queries, would a datamappper(eg IBatis.net) be more suitable since we could easily modify them to work and reuse the investment already made?
I have found this question on SO which indicates to me that an ORM can be used - however I get the impression that this a question of mapping?
Note: at the moment, the object model is not clearly defined as it was non-existent. The existing system pretty much did almost everything in SQL or consisted of overly complicated, and numerous queries to complete functionality. I am pretty much a noob and have zero experience around ORMs and MVC - so this an awesome learning curve I am on.

I agree with Ben.
I was in this situation with a LAMP stack. An old dirty, bady coded website needed bringing up to scratch. It was literally the worst database I have seen, coupled with line after line of blind SQL execution.
Job? Get rid of all that SQL very quickly and replace it with an abstraction. Which ORM? I found that using an existing ORM to fit over a bad database (most databases really) retrospectively is bad news. I think this is a problem with ORMs, they move database/storage concerns closer to the application ... not further away.
My Solution: A reflective ORM that used only the existing database state to work out what was going on. All selects, inserts, updates and what-not used views/stored proceedures to mask the cruddy database. It is powered by a linq-esque API just rewrite the grim SQL with. Boiled around 100klocs SQL statements down to less than 2klocs.
pros: I can gradually port the database to a better structure behind the views and proceedures. IMHO this is how all databases should be organised, taking full advantage of the abstraction that SPs and views provide. I never want to see a single SQL statement (or an ORM masquerading as SQL) directly against a table.
That's my story. An overengineered way to slot a nice abstraction above an existing and crap database, without rewriting the database first, and without crowbaring an ORM into the mix making things much more complex.
a hack, no doubt, but it works so well I am using it in projects where I can design the database from scratch anyway ;)

The amount of work involved in trying to keep the existing schema and then crowbaring it into a much more structured orm pattern would probably be large and complex. If you are rewriting the whole system and retiring the old one then i would devise my data model create a new db and set of classes,maybe using linq2sql, then write a data migration script to move the data from the old schema to the new one. That way your complex fiddly code is all in the migration and you don't have to deal with maintining and managing a complex mapping between a structured class model and a badly designed db.

We've just faced this problem with an awful schema design (randomly has primary keys, no foreign keys at all, badly designed tables - just a mess).
We had the luxury of technology choice, and went MVC2 front end (irrelevant to your question), and had 2 devs split off - one try to model using NHibernate, the other using Entity Framework 4.
I hasten to add that we had a strong idea of what we wanted from our domain model, and modelled that first (not wanting to be constrained by the database), so our 'User' object from a schema point of view actually spanned 5 tables, we encapsulated a lot of the business logic so that the domain model wasn't aneamic, and once we were happy with our User object, we started the process of trying to plugin the ORM.
I can say without hesitation in both cases (NH and EF4) the compromises we had to make on our model in order to shoe-horn the implementation in was phenomenal. I'll give you the examples from EF4 as that's the one I was most closely involved in, others may be able to relate these to other ORMs.
private setters
Nope - not on your life with EF4. Your properties must be public. There are workarounds (for example, creating wrappers around properties that were coming in from your DB)
enums
Again, no - there was a wrapper concept and a 'mapping' to try to get a lookup int out of the DB into the models enum types.
outcomes
We persevered for a while with both approaches to get to a point where we'd completed the mapping of a user, and the outcome was that we had to compromise our domain model in too many ways.
where did we go after that?
Linq to SQL with our own mapping layer. And we've never looked back - absolutely fantastic - we've written the mapping layer ourselves once that takes the Dto object down at the Dal layer and maps it (as we specify it) into our Domain model.
Good luck with any investigation of ORMs, I'd certainly re-investigate them if I had a decent schema to base them off, but as it stood, with an awful schema, it was easier to roll our own.
Cheers,
Terry

Entity Attribute Value Database vs. strict Relational Model Ecommerce

There's a few general pros and cons I can think of, there are situations where one is better than the other:
Option 1, EAV Model:
Pro: less time to design and develop a simple application
Pro: new entities easy to add (might even
be added by users?)
Pro: "generic" interface components
Con: complex code required to validate simple data types
Con: much more complex SQL for simple
reports
Con: complex reports can become almost
impossible
Con: poor performance for large data sets
Option 2, Modelling each entity separately:
Con: more time required to gather
requirements and design
Con: new entities must be modelled and
designed by a professional
Con: custom interface components for each
entity
Pro: data type constraints and validation simple to implement
Pro: SQL is easy to write, easy to
understand and debug
Pro: even the most complex reports are relatively simple
Pro: best performance for large data sets
Option 3, Combination (model entities "properly", but add "extensions" for custom attributes for some/all entities)
Pro/Con: more time required to gather requirements and design than option 1 but perhaps not as much as option 2 *
Con: new entities must be modelled and designed by a professional
Pro: new attributes might be easily added later on
Con: complex code required to validate simple data types (for the custom attributes)
Con: custom interface components still required, but generic interface components may be possible for the custom attributes
Con: SQL becomes complex as soon as any custom attribute is included in a report
Con: good performance generally, unless you start need to search by or report by the custom attributes
* I'm not sure if Option 3 would necessarily save any time in the design phase.
Personally I would lean toward option 2, and avoid EAV wherever possible. However, for some scenarios the users need the flexibility that comes with EAV; but this comes with a great cost.

It is safe to say that the EAV/CR database model is bad.
No, it's not. It's just that they're an inefficient usage of relational databases. A purely key/value store works great with this model.
Now, to your real question: How to store various attributes and keep them searchable?
Just use EAV. In your case it would be a single extra table. index it on both attribute name and value, most RDBMs would use prefix-compression to on the attribute name repetitions, making it really fast and compact.
EAV/CR gets ugly when you use it to replace 'real' fields. As with every tool, overusing it is 'bad', and gives it a bad image.

// At this point, I'd like to take a moment to speak to you about the Magento/Adobe PSD format.
// Magento/PSD is not a good ecommerce platform/format. Magento/PSD is not even a bad ecommerce platform/format. Calling it such would be an
// insult to other bad ecommerce platform/formats, such as Zencart or OsCommerce. No, Magento/PSD is an abysmal ecommerce platform/format. Having
// worked on this code for several weeks now, my hate for Magento/PSD has grown to a raging fire
// that burns with the fierce passion of a million suns.
http://code.google.com/p/xee/source/browse/trunk/XeePhotoshopLoader.m?spec=svn28&r=11#107
The internal models are wacky at best, like someone put the schema into a boggle game, sealed that and put it in a paint shacker...
Real world: I'm working on a midware fulfilment app and here are one the queries to get address information.
CREATE OR REPLACE VIEW sales_flat_addresses AS
SELECT sales_order_entity.parent_id AS order_id,
sales_order_entity.entity_id,
CONCAT(CONCAT(UCASE(MID(sales_order_entity_varchar.value,1,1)),MID(sales_order_entity_varchar.value,2)), "Address") as type,
GROUP_CONCAT(
CONCAT( eav_attribute.attribute_code," ::::: ", sales_order_entity_varchar.value )
ORDER BY sales_order_entity_varchar.value DESC
SEPARATOR '!!!!!'
) as data
FROM sales_order_entity
INNER JOIN sales_order_entity_varchar ON sales_order_entity_varchar.entity_id = sales_order_entity.entity_id
INNER JOIN eav_attribute ON eav_attribute.attribute_id = sales_order_entity_varchar.attribute_id
AND sales_order_entity.entity_type_id =12
GROUP BY sales_order_entity.entity_id
ORDER BY eav_attribute.attribute_code = 'address_type'
Exacts address information for an order, lazily
--
Summary: Only use Magento if:
You are being given large sacks of money
You must
Enjoy pain

I'm surprised nobody mentioned NoSQL databases.
I've never practiced NoSQL in a production context (just tested MongoDB and was impressed) but the whole point of NoSQL is being able to save items with varying attributes in the same "document".

Where performance is not a major requirement, as in an ETL type of application, EAV has another distinct advantage: differential saves.
I've implemented a number of applications where an over-arching requirement was the ability to see the history of a domain object from its first "version" to it's current state. If that domain object has a large number of attributes, that means each change requires a new row be inserted into it's corresponding table (not an update because the history would be lost, but an insert). Let's say this domain object is a Person, and I have 500k Persons to track with an average of 100+ changes over the Persons life-cycle to various attributes. Couple that with the fact that rare is the application that has only 1 major domain object and you'll quickly surmize that the size of the database would quickly grow out of control.
An easy solution is to save only the differential changes to the major domain objects rather than repeatedly saving redundant information.
All models change over time to reflect new business needs. Period. Using EAV is but one of the tools in our box to use; but it should never be automatically classified as "bad".

I'm struggling with the same issue. It may be interesting for you to check out the following discussion on two existing ecommerce solutions: Magento (EAV) and Joomla (regular relational structure):
https://forum.virtuemart.net/index.php?topic=58686.0
It seems, that Magento's EAV performance is a real showstopper.
That's why I'm leaning towards a normalized structure. To overcome the lack of flexibility I'm thinking about adding some separate data dictionary in the future (XML or separate DB tables) that could be edited, and based on that, application code for displaying and comparing product categories with new attributes set would be generated, together with SQL scripts.
Such architecture seems to be the sweetspot in this case - flexible and performant at the same time.
The problem could be frequent use of ALTER TABLE in live environment. I'm using Postgres, so its MVCC and transactional DDL will hopefully ease the pain.

I still vote for modeling at the lowest-meaningful atomic-level for EAV. Let standards, technologies and applications that gear toward certain user community to decide content models, repetition needs of attributes, grains, etc.

If it's just about the product catalog attributes and hence validation requirements for those attributes are rather limited, the only real downside to EAV is query performance and even that is only a problem when your query deals with multiple "things" (products) with attributes, the performance for the query "give me all attributes for the product with id 234" while not optimal is still plenty fast.
One solution is to use the SQL database / EAV model only for the admin / edit side of the product catalog and have some process that denormalizes the products into something that makes it searchable. Since you already have attributes and hence it's rather likely that you want faceting, this something could be Solr or ElasticSearch. This approach avoids basically all downsides to the EAV model and the added complexity is limited to serializing a complete product to JSON on update.

EAV has many drawbacks:
Performance degradation over time
Once the amount of data in the application grows beyond a certain size, the retrieval and manipulation of that data is likely to become less and less efficient.
The SQL queries are very complex and difficult to write.
Data Integrity problems.
You can't define foreign keys for all the fields needed.
You have to define and maintain your own metadata.

I have a slightly different problem: instead of many attributes with sparse values (which is possibly a good reason to use EAV), I want to store something more like a spreadsheet. The columns in the sheet can change, but within a sheet all cells will contain data (not sparse).
I made a small set of tests to benchmark two designs: one using EAV, and the other using a Postgres ARRAY to store cell data.
EAV
Array
Both schemas have indexes on appropriate columns, and the indexes are used by the planner.
It turned out the array-based schema was an order of magnitude faster for both inserts and queries. From quick tests, it seemed that both scaled linearly. The tests aren't very thorough, though. Suggestions and forks welcome - they're under an MIT licence.

How can an object-oriented programmer get his/her head around database-driven programming?

I have been programming in C# and Java for a little over a year and have a decent grasp of object oriented programming, but my new side project requires a database-driven model. I'm using C# and Linq which seems to be a very powerful tool but I'm having trouble with designing a database around my object oriented approach.
My two main question are:
How do I deal with inheritance in my database?
Let's say I'm building a staff rostering application and I have an abstract class, Event. From Event I derive abstract classes ShiftEvent and StaffEvent. I then have concrete classes Shift (derived from ShiftEvent) and StaffTimeOff (derived from StaffEvent). There are other derived classes, but for the sake of argument these are enough.
Should I have a separate table for ShiftEvents and StaffEvents? Maybe I should have separate tables for each concrete class? Both of these approaches seem like they would give me problems when interacting with the database. Another approach could be to have one Event table, and this table would have nullable columns for every type of data in any of my concrete classes. All of these approaches feel like they could impede extensibility down the road. More than likely there is a third approach that I have not considered.
My second question:
How do I deal with collections and one-to-many relationships in an object oriented way?
Let's say I have a Products class and a Categories class. Each instance of Categories would contain one or more products, but the products themselves should have no knowledge of categories. If I want to implement this in a database, then each product would need a category ID which maps to the categories table. But this introduces more coupling than I would prefer from an OO point of view. The products shouldn't even know that the categories exist, much less have a data field containing a category ID! Is there a better way?

Linq to SQL using a table per class solution:
http://blogs.microsoft.co.il/blogs/bursteg/archive/2007/10/01/linq-to-sql-inheritance.aspx
Other solutions (such as my favorite, LLBLGen) allow other models. Personally, I like the single table solution with a discriminator column, but that is probably because we often query across the inheritance hierarchy and thus see it as the normal query, whereas querying a specific type only requires a "where" change.
All said and done, I personally feel that mapping OO into tables is putting the cart before the horse. There have been continual claims that the impedance mismatch between OO and relations has been solved... and there have been plenty of OO specific databases. None of them have unseated the powerful simplicity of the relation.
Instead, I tend to design the database with the application in mind, map those tables to entities and build from there. Some find this as a loss of OO in the design process, but in my mind the data layer shouldn't be talking high enough into your application to be affecting the design of the higher order systems, just because you used a relational model for storage.

I had the opposite problem: how to get my head around OO after years of database design. Come to that, a decade earlier I had the problem of getting my head around SQL after years of "structured" flat-file programming. There are jsut enough similarities betwwen class and data entity decomposition to mislead you into thinking that they're equivalent. They aren't.
I tend to agree with the view that once you're committed to a relational database for storage then you should design a normalised model and compromise your object model where unavoidable. This is because you're more constrained by the DBMS than you are with your own code - building a compromised data model is more likley to cause you pain.
That said, in the examples given, you have choices: if ShiftEvent and StaffEvent are mostly similar in terms of attributes and are often processed together as Events, then I'd be inclined to implement a single Events table with a type column. Single-table views can be an effective way to separate out the sub-classes and on most db platforms are updatable. If the classes are more different in terms of attributes, then a table for each might be more appropriate. I don't think I like the three-table idea:"has one or none" relationships are seldom necessary in relational design. Anyway, you can always create an Event view as the union of the two tables.
As to Product and Category, if one Category can have many Products, but not vice versa, then the normal relational way to represent this is for the product to contain a category id. Yes, it's coupling, but it's only data coupling, and it's not a mortal sin. The column should probably be indexed, so that it's efficient to retrieve all products for a category. If you're really horrified by the notion then pretend it's a many-to-many relationship and use a separate ProductCategorisation table. It's not that big a deal, although it implies a potential relationship that doesn't really exist and might mislead somone coming to the app in future.

In my opinion, these paradigms (the Relational Model and OOP) apply to different domains, making it difficult (and pointless) to try to create a mapping between them.
The Relational Model is about representing facts (such as "A is a person"), i.e. intangible things that have the property of being "unique". It doesn't make sense to talk about several "instances" of the same fact - there is just the fact.
Object Oriented Programming is a programming paradigm detailing a way to construct computer programs to fulfill certain criteria (re-use, polymorphism, information hiding...). An object is typically a metaphor for some tangible thing - a car, an engine, a manager or a person etc. Tangible things are not facts - there may be two distinct objects with identical state without them being the same object (hence the difference between equals and == in Java, for example).
Spring and similar tools provide access to relational data programmatically, so that the facts can be represented by objects in the program. This does not mean that OOP and the Relational Model are the same, or should be confused with eachother. Use the Realational Model to design databases (collections of facts) and OOP to design computer programs.
TL;DR version (Object-Relational impedance mismatch distilled):
Facts = the recipe on your fridge.
Objects = the content of your fridge.

Frameworks such as
Hibernate http://www.hibernate.org/
JPA http://java.sun.com/developer/technicalArticles/J2EE/jpa/
can help you to smoothly solve this problem of inheritance. e.g. http://www.java-tips.org/java-ee-tips/enterprise-java-beans/inheritance-and-the-java-persistenc.html

I also got to understand database design, SQL, and particularly the data centered world view before tackling the object oriented approach. The object-relational-impedance-mismatch still baffles me.
The closest thing I've found to getting a handle on it is this: looking at objects not from an object oriented progamming perspective, or even from an object oriented design perspective but from an object oriented analysis perspective. The best book on OOA that I got was written in the early 90s by Peter Coad.
On the database side, the best model to compare with OOA is not the relational model of data, but the Entity-Relationship (ER) model. An ER model is not really relational, and it doesn't specify the logical design. Many relational apologists think that is ER's weakness, but it is actually its strength. ER is best used not for database design but for requirements analysis of a database, otherwise known as data analysis.
ER data analysis and OOA are surprisingly compatible with each other. ER, in turn is fairly compatible with relational data modeling and hence to SQL database design. OOA is, of course, compatible with OOD and hence to OOP.
This may seem like the long way around. But if you keep things abstract enough, you won't waste too much time on the analysis models, and you'll find it surprisingly easy to overcome the impedance mismatch.
The biggest thing to get over in terms of learning database design is this: data linkages like the foreign key to primary key linkage you objected to in your question are not horrible at all. They are the essence of tying related data together.
There is a phenomenon in pre database and pre object oriented systems called the ripple effect. The ripple effect is where a seemingly trivial change to a large system ends up causing consequent required changes all over the entire system.
OOP contains the ripple effect primarily through encapsulation and information hiding.
Relational data modeling overcomes the ripple effect primarily through physical data independence and logical data independence.
On the surface, these two seem like fundamentally contradictory modes of thinking. Eventually, you'll learn how to use both of them to good advantage.

My guess off the top of my head:
On the topic of inheritance I would suggest having 3 tables: Event, ShiftEvent and StaffEvent. Event has the common data elements kind of like how it was originally defined.
The last one can go the other way, I think. You could have a table with category ID and product ID with no other columns where for a given category ID this returns the products but the product may not need to get the category as part of how it describes itself.

The big question: how can you get your head around it? It just takes practice. You try implementing a database design, run into problems with your design, you refactor and remember for next time what worked and what didn't.
To answer your specific questions... this is a little bit of opinion thrown in, as in "how I would do it", not taking into account performance needs and such. I always start fully normalized and go from there based on real-world testing:
Table Event
EventID
Title
StartDateTime
EndDateTime
Table ShiftEvent
ShiftEventID
EventID
ShiftSpecificProperty1
...
Table Product
ProductID
Name
Table Category
CategoryID
Name
Table CategoryProduct
CategoryID
ProductID
Also reiterating what Pierre said - an ORM tool like Hibernate makes dealing with the friction between relational structures and OO structures much nicer.

There are several possibilities in order to map an inheritance tree to a relational model.
NHibernate for instance supports the 'table per class hierarchy', table per subclass and table per concrete class strategies:
http://www.hibernate.org/hib_docs/nhibernate/html/inheritance.html
For your second question:
You can create a 1:n relation in your DB, where the Products table has offcourse a foreign key to the Categories table.
However, this does not mean that your Product Class needs to have a reference to the Category instance to which it belongs to.
You can create a Category class, which contains a set or list of products, and you can create a product class, which has no notion of the Category to which it belongs.
Again, you can easy do this using (N)Hibernate;
http://www.hibernate.org/hib_docs/reference/en/html/collections.html

Sounds like you are discovering the Object-Relational Impedance Mismatch.

The products shouldn't even know that
the categories exist, much less have a
data field containing a category ID!
I disagree here, I would think that instead of supplying a category id you let your orm do it for you. Then in code you would have something like (borrowing from NHib's and Castle's ActiveRecord):
class Category
[HasMany]
IList<Product> Products {get;set;}
...
class Product
[BelongsTo]
Category ParentCategory {get;set;}
Then if you wanted to see what category the product you are in you'd just do something simple like:
Product.ParentCategory
I think you can setup the orm's differently, but either way for the inheritence question, I ask...why do you care? Either go about it with objects and forget about the database or do it a different way. Might seem silly, but unless you really really can't have a bunch of tables, or don't want a single table for some reason, why would you care about the database? For instance, I have the same setup with a few inheriting objects, and I just go about my business. I haven't looked at the actual database yet as it doesn't concern me. The underlying SQL is what is concerning me, and the correct data coming back.
If you have to care about the database then you're going to need to either modify your objects or come up with a custom way of doing things.

I guess a bit of pragmatism would be good here. Mappings between objects and tables always have a bit of strangeness here and there. Here's what I do:
I use Ibatis to talk to my database (Java to Oracle). Whenever I have an inheretance structure where I want a subclass to be stored in the database, I use a "discriminator". This is a trick where you have one table for all the Classes (Types), and have all fields which you could possibly want to store. There is one extra column in the table, containing a string which is used by Ibatis to see which type of object it needs to return.
It looks funny in the database, and sometimes can get you into trouble with relations to fields which are not in all Classes, but 80% of the time this is a good solution.
Regarding your relation between category and product, I would add a categoryId column to the product, because that would make life really easy, both SQL wise and Mapping wise. If you're really stuck on doing the "theoretically correct thing", you can consider an extra table which has only 2 colums, connecting the Categories and their products. It will work, but generally this construction is only used when you need many-to-many relations.
Try to keep it as simple as possible. Having a "academic solution" is nice, but generally means a bit of overkill and is harder to refactor because it is too abstract (like hiding the relations between Category and Product).
I hope this helps.

What are the advantages of using an ORM? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
As a web developer looking to move from hand-coded PHP sites to framework-based sites, I have seen a lot of discussion about the advantages of one ORM over another. It seems to be useful for projects of a certain (?) size, and even more important for enterprise-level applications.
What does it give me as a developer? How will my code differ from the individual SELECT statements that I use now? How will it help with DB access and security? How does it find out about the DB schema and user credentials?
Edit: #duffymo pointed out what should have been obvious to me: ORM is only useful for OOP code. My code is not OO, so I haven't run into the problems that ORM solves.

I'd say that if you aren't dealing with objects there's little point in using an ORM.
If your relational tables/columns map 1:1 with objects/attributes, there's not much point in using an ORM.
If your objects don't have any 1:1, 1:m or m:n relationships with other objects, there's not much point in using an ORM.
If you have complex, hand-tuned SQL, there's not much point in using an ORM.
If you've decided that your database will have stored procedures as its interface, there's not much point in using an ORM.
If you have a complex legacy schema that can't be refactored, there's not much point in using an ORM.
So here's the converse:
If you have a solid object model, with relationships between objects that are 1:1, 1:m, and m:n, don't have stored procedures, and like the dynamic SQL that an ORM solution will give you, by all means use an ORM.
Decisions like these are always a choice. Choose, implement, measure, evaluate.

ORMs are being hyped for being the solution to Data Access problems. Personally, after having used them in an Enterprise Project, they are far from being the solution for Enterprise Application Development. Maybe they work in small projects. Here are the problems we have experienced with them specifically nHibernate:
Configuration: ORM technologies require configuration files to map table schemas into object structures. In large enterprise systems the configuration grows very quickly and becomes extremely difficult to create and manage. Maintaining the configuration also gets tedious and unmaintainable as business requirements and models constantly change and evolve in an agile environment.
Custom Queries: The ability to map custom queries that do not fit into any defined object is either not supported or not recommended by the framework providers. Developers are forced to find work-arounds by writing adhoc objects and queries, or writing custom code to get the data they need. They may have to use Stored Procedures on a regular basis for anything more complex than a simple Select.
Proprietery binding: These frameworks require the use of proprietary libraries and proprietary object query languages that are not standardized in the computer science industry. These proprietary libraries and query languages bind the application to the specific implementation of the provider with little or no flexibility to change if required and no interoperability to collaborate with each other.
Object Query Languages: New query languages called Object Query Languages are provided to perform queries on the object model. They automatically generate SQL queries against the databse and the user is abstracted from the process. To Object Oriented developers this may seem like a benefit since they feel the problem of writing SQL is solved. The problem in practicality is that these query languages cannot support some of the intermediate to advanced SQL constructs required by most real world applications. They also prevent developers from tweaking the SQL queries if necessary.
Performance: The ORM layers use reflection and introspection to instantiate and populate the objects with data from the database. These are costly operations in terms of processing and add to the performance degradation of the mapping operations. The Object Queries that are translated to produce unoptimized queries without the option of tuning them causing significant performance losses and overloading of the database management systems. Performance tuning the SQL is almost impossible since the frameworks provide little flexiblity over controlling the SQL that gets autogenerated.
Tight coupling: This approach creates a tight dependancy between model objects and database schemas. Developers don't want a one-to-one correlation between database fields and class fields. Changing the database schema has rippling affects in the object model and mapping configuration and vice versa.
Caches: This approach also requires the use of object caches and contexts that are necessary to maintian and track the state of the object and reduce database roundtrips for the cached data. These caches if not maintained and synchrnonized in a multi-tiered implementation can have significant ramifications in terms of data-accuracy and concurrency. Often third party caches or external caches have to be plugged in to solve this problem, adding extensive burden to the data-access layer.
For more information on our analysis you can read:
http://www.orasissoftware.com/driver.aspx?topic=whitepaper

At a very high level: ORMs help to reduce the Object-Relational impedance mismatch. They allow you to store and retrieve full live objects from a relational database without doing a lot of parsing/serialization yourself.
What does it give me as a developer?
For starters it helps you stay DRY. Either you schema or you model classes are authoritative and the other is automatically generated which reduces the number of bugs and amount of boiler plate code.
It helps with marshaling. ORMs generally handle marshaling the values of individual columns into the appropriate types so that you don't have to parse/serialize them yourself. Furthermore, it allows you to retrieve fully formed object from the DB rather than simply row objects that you have to wrap your self.
How will my code differ from the individual SELECT statements that I use now?
Since your queries will return objects rather then just rows, you will be able to access related objects using attribute access rather than creating a new query. You are generally able to write SQL directly when you need to, but for most operations (CRUD) the ORM will make the code for interacting with persistent objects simpler.
How will it help with DB access and security?
Generally speaking, ORMs have their own API for building queries (eg. attribute access) and so are less vulnerable to SQL injection attacks; however, they often allow you to inject your own SQL into the generated queries so that you can do strange things if you need to. Such injected SQL you are responsible for sanitizing yourself, but, if you stay away from using such features then the ORM should take care of sanitizing user data automatically.
How does it find out about the DB schema and user credentials?
Many ORMs come with tools that will inspect a schema and build up a set of model classes that allow you to interact with the objects in the database. [Database] user credentials are generally stored in a settings file.

If you write your data access layer by hand, you are essentially writing your own feature poor ORM.
Oren Eini has a nice blog which sums up what essential features you may need in your DAL/ORM and why it writing your own becomes a bad idea after time:
http://ayende.com/Blog/archive/2006/05/12/25ReasonsNotToWriteYourOwnObjectRelationalMapper.aspx
EDIT: The OP has commented in other answers that his code base isn't very object oriented. Dealing with object mapping is only one facet of ORMs. The Active Record pattern is a good example of how ORMs are still useful in scenarios where objects map 1:1 to tables.

Top Benefits:
Database Abstraction
API-centric design mentality
High Level == Less to worry about at the fundamental level (its been thought of for you)
I have to say, working with an ORM is really the evolution of database-driven applications. You worry less about the boilerplate SQL you always write, and more on how the interfaces can work together to make a very straightforward system.
I love not having to worry about INNER JOIN and SELECT COUNT(*). I just work in my high level abstraction, and I've taken care of database abstraction at the same time.
Having said that, I never have really run into an issue where I needed to run the same code on more than one database system at a time realistically. However, that's not to say that case doesn't exist, its a very real problem for some developers.

I can't speak for other ORM's, just Hibernate (for Java).
Hibernate gives me the following:
Automatically updates schema for tables on production system at run-time. Sometimes you still have to update some things manually yourself.
Automatically creates foreign keys which keeps you from writing bad code that is creating orphaned data.
Implements connection pooling. Multiple connection pooling providers are available.
Caches data for faster access. Multiple caching providers are available. This also allows you to cluster together many servers to help you scale.
Makes database access more transparent so that you can easily port your application to another database.
Make queries easier to write. The following query that would normally require you to write 'join' three times can be written like this:
"from Invoice i where i.customer.address.city = ?" this retrieves all invoices with a specific city
a list of Invoice objects are returned. I can then call invoice.getCustomer().getCompanyName(); if the data is not already in the cache the database is queried automatically in the background
You can reverse-engineer a database to create the hibernate schema (haven't tried this myself) or you can create the schema from scratch.
There is of course a learning curve as with any new technology but I think it's well worth it.
When needed you can still drop down to the lower SQL level to write an optimized query.

Most databases used are relational databases which does not directly translate to objects. What an Object-Relational Mapper does is take the data, create a shell around it with utility functions for updating, removing, inserting, and other operations that can be performed. So instead of thinking of it as an array of rows, you now have a list of objets that you can manipulate as you would any other and simply call obj.Save() when you're done.
I suggest you take a look at some of the ORM's that are in use, a favourite of mine is the ORM used in the python framework, django. The idea is that you write a definition of how your data looks in the database and the ORM takes care of validation, checks and any mechanics that need to run before the data is inserted.

What does it give me as a developer?
Saves you time, since you don't have to code the db access portion.
How will my code differ from the individual SELECT statements that I use now?
You will use either attributes or xml files to define the class mapping to the database tables.
How will it help with DB access and security?
Most frameworks try to adhere to db best practices where applicable, such as parametrized SQL and such. Because the implementation detail is coded in the framework, you don't have to worry about it. For this reason, however, it's also important to understand the framework you're using, and be aware of any design flaws or bugs that may open unexpected holes.
How does it find out about the DB schema and user credentials?
You provide the connection string as always. The framework providers (e.g. SQL, Oracle, MySQL specific classes) provide the implementation that queries the db schema, processes the class mappings, and renders / executes the db access code as necessary.

Personally I've not had a great experience with using ORM technology to date. I'm currently working for a company that uses nHibernate and I really can't get on with it. Give me a stored proc and DAL any day! More code sure ... but also more control and code that's easier to debug - from my experience using an early version of nHibernate it has to be added.

Using an ORM will remove dependencies from your code on a particular SQL dialect. Instead of directly interacting with the database you'll be interacting with an abstraction layer that provides insulation between your code and the database implementation. Additionally, ORMs typically provide protection from SQL injection by constructing parameterized queries. Granted you could do this yourself, but it's nice to have the framework guarantee.
ORMs work in one of two ways: some discover the schema from an existing database -- the LINQToSQL designer does this --, others require you to map your class onto a table. In both cases, once the schema has been mapped, the ORM may be able to create (recreate) your database structure for you. DB permissions probably still need to be applied by hand or via custom SQL.
Typically, the credentials supplied programatically via the API or using a configuration file -- or both, defaults coming from a configuration file, but able to be override in code.

While I agree with the accepted answer almost completely, I think it can be amended with lightweight alternatives in mind.
If you have complex, hand-tuned SQL
If your objects don't have any 1:1, 1:m or m:n relationships with other objects
If you have a complex legacy schema that can't be refactored
...then you might benefit from a lightweight ORM where SQL is is not
obscured or abstracted to the point where it is easier to write your
own database integration.
These are a few of the many reasons why the developer team at my company decided that we needed to make a more flexible abstraction to reside on top of the JDBC.
There are many open source alternatives around that accomplish similar things, and jORM is our proposed solution.
I would recommend to evaluate a few of the strongest candidates before choosing a lightweight ORM. They are slightly different in their approach to abstract databases, but might look similar from a top down view.
jORM
ActiveJDBC
ORMLite

my concern with ORM frameworks is probably the very thing that makes it attractive to lots of developers.
nameley that it obviates the need to 'care' about what's going on at the DB level. Most of the problems that we see during the day to day running of our apps are related to database problems. I worry slightly about a world that is 100% ORM that people won't know about what queries are hitting the database, or if they do, they are unsure about how to change them or optimize them.
{I realize this may be a contraversial answer :) }

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas