Database modelling or database design: Which comes first? - nhibernate

I would like to know which is the common practice for a domain implementation: designing the business objects that need persistence first, or designing the database schema first, generating it from an entity-relationship diagram (and afterwards the ORM POCOs*)?
I am going to start a solution, but I would like to know which is the most preferable "pattern".
(*powered by NHibernate)

Depends on whether you're an object or relational modeler. Preference is dictated by what you know best.
I'm an object person, so I'd say model the problem in objects and then get the relational schema from that.
I think there are lots of issues around data that aren't addressed by objects (e.g., indexing, primary and foreign keys, normalization) that say you still have some work to do when you're finished.
But any relational person will argue that they're primary and should be in the driver's seat.
I doubt that there will be a definitive answer to this one. I don't believe there should be. There's an object-relational impedance mismatch that's real. Objects are instance-centric; relational models are set-based. Both need careful consideration.
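As a rough illustration of the object-first route described above, here is a minimal sketch in Python with the standard-library sqlite3 module (the thread itself is about NHibernate, so treat this purely as an analogy). The `Order` and `OrderLine` classes, the table names, and the index name are all made up for the example. It shows the work that is left once the objects exist: keys, constraints and an index all have to be decided on the relational side.

```python
import sqlite3
from dataclasses import dataclass, field


# Object model first: these hypothetical classes say nothing about keys or indexes.
@dataclass
class OrderLine:
    product: str
    quantity: int


@dataclass
class Order:
    customer_name: str
    lines: list = field(default_factory=list)


# The relational concerns the objects do not express: primary keys,
# a foreign key, a CHECK constraint and an index on the join column.
SCHEMA = """
CREATE TABLE orders (
    id            INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE order_lines (
    id       INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product  TEXT NOT NULL,
    quantity INTEGER NOT NULL CHECK (quantity > 0)
);
CREATE INDEX idx_order_lines_order_id ON order_lines(order_id);
"""


def save_order(conn, order):
    """Persist one Order and its lines; return the generated order id."""
    cur = conn.execute(
        "INSERT INTO orders (customer_name) VALUES (?)", (order.customer_name,)
    )
    order_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO order_lines (order_id, product, quantity) VALUES (?, ?, ?)",
        [(order_id, line.product, line.quantity) for line in order.lines],
    )
    return order_id


def load_order(conn, order_id):
    """Rebuild the Order object graph from the two tables."""
    (name,) = conn.execute(
        "SELECT customer_name FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT product, quantity FROM order_lines WHERE order_id = ? ORDER BY id",
        (order_id,),
    ).fetchall()
    return Order(name, [OrderLine(product, qty) for product, qty in rows])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
original = Order("Ada", [OrderLine("widget", 2), OrderLine("gadget", 1)])
order_id = save_order(conn, original)
restored = load_order(conn, order_id)
print(restored == original)
```

An ORM automates `save_order` and `load_order`, but the decisions encoded in `SCHEMA` still have to be made by someone.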

Both are common practice, and it comes down to the preference of each implementer. As duffymo suggested, you should go with the one you know best.
However, you should also take into consideration what your regular patterns for working with data are. Having something that's nicely modeled in either one but very costly in terms of performance is not a good choice. The balance is somewhere in the middle.
I personally tend to pay more attention to the database side of things, mainly because databases are the ones that are harder to scale. Keeping this in mind when designing a database helps. You don't necessarily have to make the initial design follow strict scaling rules, but having it in mind might keep you from making design decisions that amount to shooting yourself in the foot later on, when the need for scaling arises.

Related

What business folks have to understand about database design

I have a business team asking me to set up a meeting to explain database design considerations to them. Since they do not know much about RDBMSs, I'm thinking of explaining the things below:
What an RDBMS is
What a table is, what constraints are, and why we need them
What a transaction is and what the ACID properties are
Things to consider before/while developing a database
   a. Decide how much detail you need and how much you may need in future
   b. Identify fields with unique values
   c. Select the appropriate data types for your fields
   d. Normalization and index design
Also, most of the time this team has its data coming in from flat files, which we need to load into the DB and present in the format they need. Can anybody suggest what more I could explain, or a better way to explain it? Their data is kind of all over the place. I just want to put more emphasis on thinking it through, because we couldn't set up a stable process to do the import. Any suggestions for me are welcome as well :)
Appreciate your help!
You haven't said what your audience expects to take away from your presentation. So I'll have to guess, based on my dealings with business people in the past. Your mileage may vary.
Business people typically don't care about the skills and knowledge you put into doing a good job with database design, even when they say they do. They want to understand database design in terms of costs and benefits. That is how business people think.
So if you must cover some technical topic like indexing, do so from a cost benefit point of view. There is a cost to adding an index to a table, and there is a benefit to adding an index to a table. Figuring out in advance whether the benefit is worth the cost is the really tricky part, and they will be interested in this.
On a larger scale, data is a business asset. There is a cost to managing that asset well, and there is a benefit to managing that asset well. If you can connect your talk to these two concepts, they will be interested.
If they are really good business people, they will have a good understanding of the subject matter that the database covers, provided it's a part of the enterprise data that affects their business. If you have a good ER model of the data in the database, this model will connect every value in every table to an attribute, and every attribute will describe some aspect of the subject matter. This is a very different use of an ER model than just using it as a preliminary to creating a relational model.
Technical people tend to think of ER modeling as "relational modeling light". It's really much deeper than that. It's an analytical handle on the question "what does the data really mean?" And this is a handle on "what is the data really worth?". And this is where the technical world meets the business world.
How about starting from the basics of CRUD operations, then moving on to normalization: give scenarios that show the need for normalization and the concept of keys in an RDBMS. Then you can talk about ER modeling.
Considering the fact that you are presenting to business folks, I think there would be 2 approaches best suited to your needs.
a) WHEN YOU HAVE LESS TIME:
Only cover topics which need minimal or no prior knowledge. Cover RDBMS & things to consider.
Keep it simple and easy to understand. Tell them how your solution works and why it is an effective one.
Cover only topics which are relevant and make it layman friendly. Provide them the pros & cons of your DB design. Connect it to business needs.
In all cases, provide contextual examples which they may relate to with ease.
b) WHEN YOU HAVE MORE TIME
You may cover topics in detail as suggested in the previous comments. (#SQL_Underworld & #Ramya)

What do I need to know about databases in order to create a quality Django app?

I'm trying to optimize my site and found this nice little Django doc:
Database Access Optimization, which suggests profiling followed by indexing and the selection of proper fields as the starting point for database optimization.
Normally, the django docs explain things pretty well, even things that more experienced programmers might consider "obvious". Not so in this case. After no explanation of indexing, the doc goes on to say:
We will assume you have done the obvious things above.
Uhhh. Wait! What the heck is indexing?
Obviously I can figure out what indexing is via google, my question is: what is it that I need to know as far as database stuff goes in order to create a scalable website? What should I be aware of about the Django framework specifically? What other "obvious" things ought I know? Where can I learn them?
I'm looking to get pointed in a direction here. I don't need to learn anything and everything about SQL, I just want to be informed enough to build my app the right way.
Thanks in advance!
I encourage you to read all that the other answers suggest and whatever else you can find on the subject, because it's all good information to know and will make you a better programmer.
That said, one of the nice things about Django and other similar frameworks is that for the most part you don't have to know what's going on behind the scenes in the DB. Django adds indexes automatically for fields that need them. The encouragement to add more is based on the use cases of your app. If you continually query based on one particular field, you should ensure that that field is indexed. It might be already (if it's a foreign key, primary key, etc.), but other random fields typically aren't.
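To make "ensure that field is indexed" a bit more concrete, here is a small sketch using Python's built-in sqlite3 module rather than Django itself, so it runs without a project. The `users` table, its columns and the index name are invented for the example. It asks SQLite for its query plan before and after adding an index on the column being filtered. In a Django model the rough equivalent would be setting `db_index=True` on the field, which makes Django emit a similar `CREATE INDEX` in its migrations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The textual description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


LOOKUP = "SELECT id, email, name FROM users WHERE email = ?"
ARGS = ("user500@example.com",)

# Without an index on email, SQLite has to scan the whole table.
plan_before = query_plan(conn, LOOKUP, ARGS)

# Add the index on the column the application keeps filtering on.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, LOOKUP, ARGS)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions, but the second plan should mention `idx_users_email`. The index is not free: every insert and update on `users` now has to maintain it as well.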
There are also various optimizations that are database client-specific. Django can't do much here because its goal is to remain database independent. So, if you're using PostgreSQL, MySQL, whatever, read about optimizations and best practices concerning those particular clients.
Database design (http://en.wikipedia.org/wiki/Database_design) and database normalization (http://en.wikipedia.org/wiki/Database_normalization), both covered on Wikipedia, are two very important concepts, in addition to indexing.
In addition to these, having a basic understanding of your database of choice is necessary. Being able to add users, set permissions, and create a database are key things that you should know.
Learning how to backup your data is also a crucial thing.
The list keeps getting longer: one should also be aware of the DB relationships that Django handles for you (OneToOne, ManyToMany, ManyToOne). https://docs.djangoproject.com/en/dev/topics/db/models/
The performance impact of JOINs shouldn't be ignored. Accessing model properties in Django is so easy, but understanding that some foreign key relationships can have a huge performance impact is something to consider too.
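One common way a foreign key hurts is the "N+1 queries" pattern: you load a list of rows and then touch a related object on each one, which quietly issues one extra query per row. The sketch below reproduces that with plain sqlite3 so the query count is visible; the `authors`/`books` tables and both function names are made up for the illustration. In Django the usual remedy is `select_related()` on the queryset, which fetches the related rows with a JOIN in the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE books (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(id)
);
""")
conn.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?)",
    [(i, f"Author {i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO books (title, author_id) VALUES (?, ?)",
    [(f"Book {i}", (i % 5) + 1) for i in range(20)],
)


def titles_with_authors_lazy(conn):
    """One query for the books, then one more query per book (N+1)."""
    queries = 1
    books = conn.execute("SELECT title, author_id FROM books ORDER BY id").fetchall()
    result = []
    for title, author_id in books:
        (name,) = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()
        queries += 1
        result.append((title, name))
    return result, queries


def titles_with_authors_joined(conn):
    """The same data fetched with a single JOIN."""
    rows = conn.execute(
        "SELECT b.title, a.name FROM books b "
        "JOIN authors a ON a.id = b.author_id ORDER BY b.id"
    ).fetchall()
    return rows, 1


lazy_rows, lazy_queries = titles_with_authors_lazy(conn)
joined_rows, joined_queries = titles_with_authors_joined(conn)
print(lazy_queries, joined_queries)
```

Both functions return the same rows; the difference is only in how many round trips to the database it took to get them.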
Once you have a basic understanding of these things you should be at a pretty good starting point for creating a non-trivial django app!
Wikipedia has a nice article about database indexes. They are similar(ish) to an index in a book, i.e. they let you (the computer) find things faster because you just look at the index (probably a very bad example :-)
As for performance, there are many things you can do. Presumably, because it is a very detailed subject in itself and is particular to each RDBMS, it would be distracting / irrelevant for them (Django) to go into great detail. The best thing is really to google performance tips for your particular RDBMS. There are some general tips, such as indexing, limiting queries to return only the required data etc.
I think one of the main things is a good design: sticking as much as possible to Normal Form and, in general, actually taking your database into consideration before programming your models etc (which clearly you seem to be doing). Naming conventions are also a big plus, remembering explicit is better than implicit :-)
To summarise:
Learn/understand the fundamentals such as the relational model
Decide on a naming convention
Design your database perhaps using an ERM tool
Prefer surrogate IDs
Use the correct data type of minimum possible size
Use indexes appropriately and don't over-index
Avoid unnecessary/over querying
Prioritise security and stability over raw performance
Once you have a database up and running, 'tune' it by analysing/profiling settings, queries, design etc
Backup and archive regularly - cron
Hang out here :-)
If required advance into replication (master/slave - django supports this quite well too)
Consider upgrading your hardware
Don't get too hung up about it

What are the limitations of ORM in general?

I know that there are a lot of ORM fans out there, but how do you deal with a database with more than 300 tables, where some of the tables have more than 100 fields?
Most of the sample applications that I have seen only use a few fields. Is it prudent to use ORM at such a large scale? I think that ORM is redundant (why create another layer when in reality databases do not get changed easily?).
For me it makes sense to use ORM for small applications that might get moved from one database to another, or for applications that can be run on multiple platforms.
Otherwise it seems useless or simply another headache.
Any ideas?
I have used ORM in some projects (Hibernate) and not in others. ORM limitations are the same as for all abstractions: you give up some flexibility and you must invest in learning the specifics of the implementation. However, you typically gain coding efficiency, reduce duplication, centralize configuration, and get other improvements that are specific to the implementation. Note that database portability is not always without effort, and certainly not if you use vendor-specific features.
You don't mention whether your project already has a data access implementation. If you're starting from scratch then the size of the database should not concern you too much as ORM should actually save you more on a bigger database in terms of efficiency and reducing duplication. However if you're contemplating replacing an existing data access implementation and you don't foresee the database changing much then your efforts will almost certainly outweigh the benefits.
BTW, I suspect sample applications use small databases because they're less effort to create and make the examples easier for users to understand, not because the developers think that their ORM solution is only appropriate for small databases.
The great added value of the ORM is that the business logic developers can focus on interaction with objects rather than database tables.
For example, sometimes your business object might be quite complex or use multiple database tables (e.g. @SecondaryTable in JPA 2.0). You don't need to know how the entity is represented in the database in order to do your job.
And what about relations? As a developer, I don't need to know if the relation is realised as a join table, foreign key or whatever. I just need to set appropriate object-oriented associations and the ORM will do the rest of the work for me.
I've seen quite large projects (> 50 developers) that worked fine with an ORM, even though at that time the tools weren't as good and mature as they are now.
You might want to see this thread: Is ORM fit for complex projects?

Database EAV Pros/Cons and Alternatives

I have been looking for a database solution to allow user defined fields and values (allowing an unlimited number). At first glance, EAV seemed like the right fit, but after some reading I am not sure anymore.
What are the pros and cons of EAV?
Is there an alternative database method to allow user defined attributes/fields and values?
This is not to be considered an exhaustive answer, but just a few points on the topic.
Since the question is also tagged with the [sql] tag, let me say that, in general, relational databases aren't particularly suitable for storing data using the EAV model. You can still design an EAV model in SQL, but you will have to sacrifice many of the advantages that a relational database would give you. Not only will you be unable to enforce referential integrity, use SQL data types for values, and enforce mandatory attributes, but even very basic queries can become difficult to write. In fact, to overcome this limitation, several EAV solutions rely on data duplication instead of joining with related tables, which, as you can imagine, has plenty of drawbacks.
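A small sketch may make these trade-offs easier to see. It uses Python's sqlite3 module with a single hypothetical `entity_attribute_value` table; the attribute names and sample rows are invented. Note that every value ends up stored as text, and that a question as simple as "names of the items cheaper than 100" already needs a self-join plus a cast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE entity_attribute_value (
    entity_id INTEGER NOT NULL,
    attribute TEXT    NOT NULL,
    value     TEXT,              -- every value is stored as text
    PRIMARY KEY (entity_id, attribute)
)
""")
conn.executemany(
    "INSERT INTO entity_attribute_value (entity_id, attribute, value) VALUES (?, ?, ?)",
    [
        (1, "name", "Laptop"),
        (1, "price", "999.50"),
        (1, "color", "grey"),
        (2, "name", "Mouse"),
        (2, "price", "19.99"),
    ],
)


def load_entity(conn, entity_id):
    """Collect all attribute/value pairs of one entity into a dict."""
    return dict(
        conn.execute(
            "SELECT attribute, value FROM entity_attribute_value WHERE entity_id = ?",
            (entity_id,),
        )
    )


# Filtering on one attribute while returning another needs one self-join
# per attribute involved, plus a cast because the price is stored as text.
cheap_names = conn.execute("""
SELECT n.value
FROM entity_attribute_value AS p
JOIN entity_attribute_value AS n
  ON n.entity_id = p.entity_id AND n.attribute = 'name'
WHERE p.attribute = 'price' AND CAST(p.value AS REAL) < 100
""").fetchall()

print(load_entity(conn, 1))
print(cheap_names)
```

Nothing in this schema stops someone from inserting a price of 'abc' or leaving out the name altogether, which is the loss of type checking and mandatory attributes mentioned above.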
If you really require a schemaless design, "allowing an unlimited number of attributes", your best bet is probably to use a NoSQL solution. Even though the weaknesses of EAV relative to relational databases also apply to NoSQL alternatives, you will be offered additional features that are difficult to achieve with conventional SQL databases. For example, NoSQL datastores can usually be scaled much more easily than relational databases, simply because they were designed to solve some sort of scalability problem, and they intentionally dropped features that make scaling difficult.
Many cloud computing platforms (such as those offered by Amazon, Google and Microsoft) are featuring datastores based on the EAV model, where an arbitrary number of attributes can be associated with a given entity. If you are considering deploying your application to the cloud, you may consider this both as a business advantage, as well as a technical one, because the strong competition between the big vendors is pushing the value-to-cost ratios to very high levels, by continually pushing up on the features and pushing down the financial and implementation costs.
Have a look at Postgres hstore: http://www.postgresql.org/docs/9.0/static/hstore.html
This will do exactly what you want without most of the disadvantages.
The Streams Platform proposes an alternative based on Streams (actually, that's the Domain Model), Fields and Assignments entities.
Is there an alternative database method to allow user defined attributes/fields and values?
One alternative is to change the database schema based on user input: for example when the user wants a new field, then add a corresponding column to the database.
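A minimal sketch of that idea, using Python's sqlite3 module: because column names cannot be passed as bound parameters, the user-supplied name has to be validated before it goes into the DDL. The `add_user_field` helper, the `contacts` table, and the whitelist of types are assumptions made for this example, not part of any library.

```python
import re
import sqlite3

# Only allow a small set of column types and conservative identifiers,
# because identifiers cannot be bound as query parameters.
ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}
IDENTIFIER = re.compile(r"[a-z][a-z0-9_]{0,30}")


def add_user_field(conn, table, column, sql_type):
    """Add a user-defined column to a table after validating the names."""
    if not IDENTIFIER.fullmatch(table) or not IDENTIFIER.fullmatch(column):
        raise ValueError("invalid table or column name")
    if sql_type not in ALLOWED_TYPES:
        raise ValueError("unsupported column type")
    conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}" {sql_type}')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO contacts (name) VALUES (?)", ("Ada",))

# The user asks for a new "nickname" field: the schema grows by one column.
add_user_field(conn, "contacts", "nickname", "TEXT")
conn.execute("UPDATE contacts SET nickname = ? WHERE name = ?", ("Countess", "Ada"))

columns = [row[1] for row in conn.execute("PRAGMA table_info(contacts)")]
nickname = conn.execute(
    "SELECT nickname FROM contacts WHERE name = ?", ("Ada",)
).fetchone()[0]
print(columns, nickname)
```

The new column gets a real SQL type and can be indexed like any other, which is what EAV gives up. The cost is that the application needs permission to run DDL, and existing rows simply hold NULL in the new column.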

What are the principles behind, and benefits of, the "party model"?

The "party model" is a "pattern" for relational database design. At least part of it involves finding commonality between many entities, such as Customer, Employee, Partner, etc., and factoring that into some more "abstract" database tables.
I'd like to find out your thoughts on the following:
What are the core principles and motivating forces behind the party model?
What does it prescribe you do to your data model? (My bit above is pretty high level and quite possibly incorrect in some ways. I've been on a project that used it, but I was working with a separate team focused on other issues).
What has your experience led you to feel about it? Did you use it, and if so, would you do so again? What were the pros and cons?
Did the party model limit your choice of ORMs? For example, did you have to eliminate certain ORMs because they didn't allow for enough of an "abstraction layer" between your domain objects and your physical data model?
I'm sure every response won't address every one of those questions ... but anything touching on one or more of them is going to help me make some decisions I'm facing.
Thanks.
What are the core principles and motivating forces behind the party model?
To the extent that I've used it, it's mostly about code reuse and flexibility. We've used it before in the guest / user / admin model and it certainly proves its value when you need to move a user from one group to another. Extend this to having organizations and companies represented with users under them, and it's really providing a form of abstraction that isn't particularly inherent in SQL.
What does it prescribe you do to your data model? (My bit above is pretty high level and quite possibly incorrect in some ways. I've been on a project that used it, but I was working with a separate team focused on other issues).
You're pretty correct in your bit above, though it needs some more detail. You can imagine a situation where an entity in the database (call it a Party) contracts out to another Party, which may in turn subcontract work out. A party might be an Employee, a Contractor, or a Company, all subclasses of Party. From my understanding, you would have a Party table and then more specific tables for each subclass, which could then be further subclassed (Party -> Person -> Contractor).
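Here is a rough sketch of that layout (one `party` table plus a table per subtype), written with Python's sqlite3 module. All table and column names are invented for the illustration, and only two subtypes are shown. The `contract` table points at `party` on both sides, so any kind of party can contract with any other kind without extra tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE party (
    id           INTEGER PRIMARY KEY,
    party_type   TEXT NOT NULL CHECK (party_type IN ('person', 'company')),
    display_name TEXT NOT NULL
);
-- One table per subtype, sharing the primary key of the party row.
CREATE TABLE person (
    party_id   INTEGER PRIMARY KEY REFERENCES party(id),
    birth_year INTEGER
);
CREATE TABLE company (
    party_id        INTEGER PRIMARY KEY REFERENCES party(id),
    registration_no TEXT
);
-- Relationships reference party, so they work for every subtype.
CREATE TABLE contract (
    id          INTEGER PRIMARY KEY,
    client_id   INTEGER NOT NULL REFERENCES party(id),
    supplier_id INTEGER NOT NULL REFERENCES party(id)
);
""")


def add_party(conn, party_type, display_name):
    """Insert the shared party row and return its id."""
    cur = conn.execute(
        "INSERT INTO party (party_type, display_name) VALUES (?, ?)",
        (party_type, display_name),
    )
    return cur.lastrowid


acme = add_party(conn, "company", "Acme Ltd")
conn.execute(
    "INSERT INTO company (party_id, registration_no) VALUES (?, ?)", (acme, "REG-001")
)
bob = add_party(conn, "person", "Bob")
conn.execute("INSERT INTO person (party_id, birth_year) VALUES (?, ?)", (bob, 1980))

# A company contracts out to a person; both sides are just parties.
conn.execute("INSERT INTO contract (client_id, supplier_id) VALUES (?, ?)", (acme, bob))

# One query returns every kind of party.
all_parties = conn.execute(
    "SELECT display_name, party_type FROM party ORDER BY id"
).fetchall()

# Resolving a contract's names takes two joins back to party.
contracts = conn.execute("""
SELECT client.display_name, supplier.display_name
FROM contract
JOIN party AS client   ON client.id = contract.client_id
JOIN party AS supplier ON supplier.id = contract.supplier_id
""").fetchall()

print(all_parties)
print(contracts)
```

Getting at subtype-specific data (say, Bob's birth year) takes a further join from `party` to `person`, which is where the extra joins discussed in this thread come from.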
What has your experience led you to feel about it? Did you use it, and if so, would you do so again? What were the pros and cons?
It has its benefits if you need the flexibility to add new types to your system and to create relationships between types that you didn't expect and architect in at the beginning (users moving to a new level, companies hiring other companies, etc). It also gives you the benefit of running a single query and retrieving data for multiple types of parties (Companies, Employees, Contractors). On the flip side, you're adding additional layers of abstraction to get to the data you actually need, and you're increasing load (or at least the number of joins) on the database when you're querying for a specific type. If your abstraction goes too far, you'll likely need to run multiple queries to retrieve the data, as the complexity would start to become detrimental to readability and database load.
Did the party model limit your choice of ORMs? For example, did you have to eliminate certain ORMs because they didn't allow for enough of an "abstraction layer" between your domain objects and your physical data model?
This is an area that I'm admittedly a bit weak in, but I've found that with views and a mirrored abstraction in the application layer this hasn't been too much of a problem. The real problem for me has always been "where is piece of data X living?" when I want to read the data source directly (it's not always intuitive for new developers on the system either).
The idea behind the party models (aka entity schema) is to define a database that leverages some of the scalability benefits of schema-free databases. The party model does that by defining its entities as party type records, as opposed to one table per entity. The result is an extremely normalized database with very few tables and very little knowledge about the semantic meaning of the data it stores. All that knowledge is pushed to the data access in code. Database upgrades using the party model are minimal to none, since the schema never changes. It’s essentially a glorified key-value pair data model structure with some fancy names and a couple of extra attributes.
Pros:
Kick-ass horizontal scalability. Once your 5-6 tables are defined in your entity model, you can go to the beach and sip margaritas. You can virtually scale this database out as much as you want with minimum efforts.
The database supports any data structure you throw at it. You can also change data structures and party/entities definitions on the fly without affecting your application. This is very very powerful.
You can model any arbitrary data entity by adding records, not changing the schema. Meaning you can say goodbye to schema migration scripts.
This is programmers’ paradise, since the code they write will define the actual entities they use in code, and there are no mappings from Objects to Tables or anything like that. You can think of the Party table as the base object of your framework of choice (System.Object for .NET)
Cons:
Party/Entity models never play well with ORMs, so forget about using EF or NHibernate to get semantically meaningful entities out of your entity database.
Lots of joins. Performance tuning challenges. This 'con' is relative to the practices you use to define your entities, but it is safe to say that you'll be doing a lot more of those mind-bending queries that will give you nightmares.
Harder to consume. Developers and DB pros unfamiliar with your business will have a harder time getting used to the entities exposed by these models. Since everything is abstract, there is no diagram or visualization you can build on top of your database to explain what is stored to someone else.
Heavy data access models or business rules engines will be needed. Basically you have to do the work of understanding what the heck you want out of your database at some point, and your database model is not going to help you this time around.
If you are considering a party or entity schema in a relational database, you should probably take a look at other solutions such as a NoSQL data store, BigTable or key-value stores. There are some great products out there with massive deployments and traction, such as MongoDB, DynamoDB, and Cassandra, that pioneered this movement.
This is a vast topic, I would recommend reading The Data Model Resource Book Volume 3 - Universal Patterns for Data Modeling by Len Silverston and Paul Agnew.
I've just received my copy and it's pretty good. It gives you an overview of many approaches to data modeling, including hybrid contextual role patterns and so on. It has detailed pros and cons for every approach.
There is a plethora of ways to model party relationships and roles, all with their benefits and disadvantages. The answer that was accepted covers just one instance of a 'party model'.
For instance, in many approaches, notions like "Employee", "Project Manager" etc. are roles that a party can play within a certain context. I will try to give you a better breakdown once I get home.
When I was part of a team implementing these ideas in the early 1980s, it did not limit our choice of ORMs because those hadn't been invented yet.
I'd fall back on those ideas any time, as that particular project was one of the most convincing proofs-of-concept I have ever seen of a "revolutionary" idea (which it certainly was at the time).
It forces you to nothing. And it doesn't stop you from anything (from any mistake, I mean). The one defining your own information model is you.
All parties have lots of properties in common. The fact that they have a name and such (we called those "signaletics"). The fact that they have principal/primary locations called "addresses". The fact that they all are involved, in some sense, in the business' contracts.
Put simply, from my understanding: party modeling gives you flexibility but needs more effort to implement (T-SQL joins and so on).
I also want to point out that using party modeling (specialization/generalization) gives you the ability to have FK relations to other tables. For example, think of different types of users (admin, user, ...) which are generalized into a User table: you can then have a UserID column in your Authorization table.
I'm not sure, but the party model sounds like a particular case of the generalization-specialization pattern. A search on "generalization specialization relational modeling" finds some interesting articles.