I develop a lot of MVC applications on my own and I am looking for good resources related to a particular problem.
In a lot of my applications, I need to give the end-user CRUD abilities on a lot of entities. But, often times I'll find that these entities are parents of un-deletable children records. Take the following model as an example:
Here, I want to give the user the ability to delete employees (maybe because they've left the company, etc.). However, these employees are listed on projects. To change the creator_employee_id on projects would make the data obviously incorrect, and to cascade delete the projects where that employee is listed is clearly a no-no.
So, I've handled this in the past by adding a flag:
So, when a user goes to delete an employee, I check if deleting that employee violates any foreign key constraints. If it doesn't, I delete the employee. If it does, I set the employee to "active = 0".
However, this entire process of checking, iffing, deleting or deactivating is really complicated and cumbersome to compensate for on the code-level. Because an employee might not just be tied to projects, but to other entities as well, you have to write a check for each entity, determine if it makes sense to "delete" or "deactivate," etc. Then you have to give the user a way to view/reactivate inactive records.
My question is: Are there any other approaches to handling this specific kind of problem? I've tried looking around the web, but this problem is difficult to find the wording for. Perhaps even just the right words will help point me in the right direction. I hope this problem makes sense.
I think that the only clean alternative to deleting entities is to manage their history, possibly with an active flag as in your example (or better with lifecycle timestamps). However, you have to treat all entities in a uniform manner and not delete some of them (those that don't have dependent entities), but not others (having dependent entities).
My approach would be to set the active flag to false in all cases when an employee leaves. Keep that code simple and don't try to use branching logic to handle different scenarios based on the existence of related projects.
If you really set on removing old employees when they don't have any projects related to them, then setup a maintenance task that runs monthly, weekly, or whatever, that cleans out any inactive employees with no related records.
Related
I'm new to database design and have some uncertainties about how best to model this particular case. I'd appreciate any suggestions for this fairly simple scenario.
When a production task begins, two people are involved at all times. One is in charge of the production, and a second is tasked with quality assurance. For any task in the database, it must be possible to identify these two people. They'll both exist in a Person table and have IDs, so I just want the best way to relate them to the production task. The following rules exist:
Either person may be swapped out for a different person at any time.
Each task always involves both people (Neither of these are null).
There are never any other people involved in the task that we want to record.
Each person may be involved in multiple tasks, or none at all.
If we had a whole host of relationships between the task and the people, I'd create some sort of convoluted relationship structure describing their relationship (As producer, quality assurance person, overseer, etc.), but here I feel as though it's sensible to just stick the IDs of the two people in the Task table, in separate columns for Production Person and Quality Assurance Person. Is this bad for some reason that I can't see?
What has really prompted my question is that I'm trying to design exactly that in DBDesigner 4, which I'm new to, and it just doesn't like it - When I try to set up a second non-identifying relationship between Task and Person, it won't give me a second field. It also won't seem to let me rename the fields in Task that refer to the persons, so it'd be impossible to differentiate between the two anyway. Since no-one else seems to share this problem, I've began to wonder whether it's a good idea at all. Is it standard to introduce additional tables as soon as there are two or more links between two entities? What would that look like if I wanted to enforce the above rules? I can't see how I'd ensure that an n:m table always has entries for both people working on the task.
If you are confident your requirements will stay this rigid forever, then just create two NOT NULL FKs:
This declaratively enforces that exactly two people are associated to the task at all times, which would not be readily achievable with just the junction table (as you already noted).
OTOH, if you anticipate your requirements might change at some point in the future, then the added flexibility of junction table might be more important than the completely declarative enforcement of your business rules.
I'm not familiar with DBDesigner, and therefore with your particular problem, but in ER modeling in general, multiple relationships with the same entity are distinguished by their "rolenames" which determine the names of migrated attributes (see the section on "Rolenames" in the chapter 3 of the ERwin Methods Guide). Try locating something along those lines in the UI of your tool.
If you want to know the current state and not who held the role previously #Branko Dimitrijevic's solution will work.
But if the statement 'Either person may be swapped out for a different person at any time' implies you need to know who previously held that role consider a 3 table design
Task; TaskID, <other details>
Assignee; TaskID, PeopleID, role, start_date, end_date
People; PeopleID, <other details>
Then in the assignee table you need constraints to ensure that for each TaskID, Role combination the dates are reasonable e.g. dates don't overlap or have gaps. That you have only 1 of each role active for each task at a time. To manage this would probably require code either in triggers or the application.
I need store some review flags that relate to some entities. Each review flag can only related to a single entity property group. For example table Parents has a ParentsStatus flag and table Children has a set of ChildrenStatus flags.
In the current design proposal I have three tables:
ReviewTypes: stores the flags and the properties they relate to.
ReviewPositions: stores the values the flags can have.
Reviews: stores the transaction data, the actual reviews. It is like UsersToFlags: Flags in a database rows, best practices.
The problem is I am getting push back that there is no need to have the Reviews table and it would be better to just store this actual review data on each entity. For example add an extra column to Parents to hold ParentsStatus. They feel it is a simpler solution and separating the data out is just “overkill” for out scenario.
I don’t like this idea as this means that every time we want to add a new review flag we need to update the core entity table to hold that flag.
Space is not a problem.
Do people have any strong opinions?
Edit:
This comment applies to the three answers. The consensus is the relational approach is best but I think I need to read up a little more on the EAV model as from some very basic reading Best beginner resources for understanding the EAV database model? and its related links it does not appear to be super straightforward and I don't want to dig myself a hole. Thanks to wildplasser. I'll loop back once I read up a bit more.
Oh yes. Their idea is simpler, until you want to enhance it. Given the scheme they are proposing what if two reviews were need per entity. What if you wanted to attach other things such as notes/annotations. Once they find out how much of an inflatable dartboard their idea is, what do you have to move to a more useful one? Not to mention you need some way of identifying status fields, with fragile rubbish like Column name ends with "_Status", or you have to hard code them somewhere.
Doing it properly is not that much more work, it's not more complex, in fact in many ways it's simpler and it will cope with the invetible changes at far less cost.
normalization is always preferable to premature optimization.
One reason why I like the reviews table separate is that you can hold changes you may not want to display yet (as it hasn't been reviewed and approved) and still maintain the old dat until the new is approved. I don't know if your situation requires that.
To make future programming simpler for when you want to display the changes, you can write a view that shows the old and new data.
I'd like to hear some opinions or discussion on a matter of database design. Me and my colleagues are developing a complex application in finance industry that is being installed in several countries.
Our contractors wanted us to keep a single application for all the countries so we naturally face the difficulties with different workflows in every one of them and try to make the application adjustable to satisfy various needs.
The issue I've encountered today was a request from the head of the IT department from the contractors side that we keep the database model in terms of tables and columns they consist of.
For examlpe, we got a table with different risks and we needed to add a flag column IsSomething (BIT NOT NULL ...). It fully qualifies to exists within the risk table according to the third normal form, no transitive dependency to the key, a non key value ...
BUT, the guy said that he wants to keep the tables as they are so we had to make a new table "riskinfo" and link the data 1:1 to the new column.
What is your opinion ?
We add columns to our tables that are referenced by a variety of apps all the time.
So long as the applications specifically reference the columns they want to use and you make sure the new fields are either nullable or have a sensible default defined so it doesn't interfere with inserts I don't see any real problem.
That said, if an app does a select * then proceeds to reference the columns by index rather than name you could produce issues in existing code. Personally I have confidence that nothing referencing our database does this because of our coding conventions (That and I suspect the code review process would lynch someone who tried it :P), but if you're not certain then there is at least some small risk to such a change.
In your actual scenario I'd go back to the contractor and give your reasons you don't think the change will cause any problems and ask the rationale behind their choice. Maybe they have some application-specific wisdom behind their suggestion, maybe just paranoia from dealing with other companies that change the database structure in ways that aren't backwards-compatible, or maybe it's just a policy at their company that got rubber-stamped long ago and nobody's challenged. Till you ask you never know.
This question is indeed subjective like what Binary Worrier commented. I do not have an answer nor any suggestion. Just sharing my 2 cents.
Do you know the rationale for those decisions? Sometimes good designs are compromised for the sake of not breaking currently working applications or simply for the fact that too much has been done based on the previous one. It could also be many other non-technical reasons.
Very often, the programming community is unreasonably concerned about the ripple effect that results from redefining tables. Usually, this is a result of failure to understand data independence, and failure to guard the data independence of their operations on the data. Occasionally, the original database designer is at fault.
Most object oriented programmers understand encapsulation better than I do. But these same experts typically don't understand squat about data independence. And anyone who has learned how to operate on an SQL database, but never learned the concept of data independence is dangerously ignorant. The superficial aspects of data independence can be learned in about five minutes. But to really learn it takes time and effort.
Other responders have mentioned queries that use "select *". A select with a wildcard is more data dependent than the same select that lists the names of all the columns in the table. This is just one example among dozens.
The thing is, both data independence and encapsulation pursue the same goal: containing the unintended consequences of a change in the model.
Here's how to keep your IT chief happy. Define a new table with a new name that contains all the columns from the old table, and also all the additional columns that are now necessary. Create a view, with the same name as the old table, that contains precisely the same columns, and in the same order, that the old table had. Typically, this view will show all the rows in the old table, and the old PK will still guarantee uniqueness.
Once in a while, this will fail to meet all of the IT chief's needs. And if the IT chief is really saying "I don't understand databases; so don't change anything" then you are up the creek until the IT chief changes or gets changed.
The lead developer on a project I'm involved in says it's bad practice to rely on cascades to delete related rows.
I don't see how this is bad, but I would like to know your thoughts on if/why it is.
I'll preface this by saying that I rarely delete rows period. Generally most data you want to keep. You simply mark it as deleted so it won't be shown to users (ie to them it appears deleted). Of course it depends on the data and for some things (eg shopping cart contents) actually deleting the records when the user empties his or her cart is fine.
I can only assume that the issue here is you may unintentionally delete records you don't actually want to delete. Referential integrity should prevent this however. So I can't really see a reason against this other than the case for being explicit.
I would say that you follow the principle of least surprise.
Cascading deletes should not cause unexpected loss of data. If a delete requires related records to be deleted, and the user needs to know that those records are going to go away, then cascading deletes should not be used. Instead, the user should be required to explicitly delete the related records, or be provided a notification.
On the other hand, if the table relates to another table that is temporary in nature, or that contains records that will never be needed once the parent entity is gone, then cascading deletes may be OK.
That said, I prefer to state my intentions explicitly by deleting the related records in code, rather than relying on cascading deletes. In fact, I've never actually used a cascading delete to implicitly delete related records. Also, I use soft deletion a lot, as described by cletus.
I never use cascading deletes. Why? Because it is too easy to make a mistake. Much safer to require client applications to explicitly delete (and meet the conditions for deletion, such as deleting FK referred records.)
In fact, deletions per se can be avoided by marking records as deleted or moving into archival/history tables.
In the case of marking records as deleted, it depends on the relative proportion of marked as deleted data, since SELECTs will have to filter on 'isDeleted = false' an index will only be used if less than 10% (approximately, depending on the RDBMS) of records are marked as deleted.
Which of these 2 scenarios would you prefer:
Developer comes to you, says "Hey, this delete won't work". You both look into it and find that he was accidently trying to delete entire table contents. You both have a laugh, and go back to what you were doing.
Developer comes to you, and sheepishly asks "Do we have backups?"
There's another great reason to not use cascading UPDATES or DELETES: they hold a serializable lock. Holding a serializable lock can kill performance.
Another huge reason to avoid cascading deletes is performance. They seem like a good idea until you need to delete 10,000 records from the main table which in turn have millions of records in child tables. Given the size of this delete, it is likely to completely lock down all of the table for hours maybe even days. Why would you ever risk this? For the convenience of spending ten minutes less time writing the extra delete statements for one record deletes?
Further, the error you get when you try to delete a record that has a child record is often a good thing. It tells you that you don't want to delete this record becasue there is data that you need that you would lose if you did so. Cascade delete would just go ahead and delete the child records resulting in loss of information about orders for instance if you deleted a customer who had orders in the past. This sort of thing can thoroughly mess up your financial records.
I was likewise told that cascading deletes were bad practice... and thus never used them until I came across a client who used them. I really didn't know why I was not supposed to use them but thought they were very convenient in not having to code out deleting all the FK records as well.
Thus I decided to research why they were so "bad" and from what I've found so far their doesn't to appear to be anything problematic about them. In fact the only good argument I've seen so far is what HLGLEM stated above about performance. But as I am usually not deleting this number of records I think in most cases using them should be fine. I would like to hear of any other arguments others may have against using them to make sure I've considered all options.
I'd add that ON DELETE CASCADE makes it difficult to maintain a copy of the data in a data warehouse using binlog replication which is how most commercial ETL tools work. Explicit deletion from each table maintains a full log record and is much easier on the data team :)
I actually agree with most of the answers here, YET not all scenarios are the same, and it depends on the situation at hand and what would be the entropy of that decision, for example:
If you have a deletion command for an entity that has multiple many/belong relationships with a large number of entities, each time you would call that deletion process you would also need to remember to delete all the corresponding FKs from each relational pivot that A has corrosponding relationships with.
Whereas via a cascade on delete, you write that once as part of your schema and it will ONLY delete those corresponding FKs and cleanup the pivots from relations that are no longer necessary, imagine 24 relations for an entity + other entities that would also have large number of relations on top of that, again, it really depends on your setup and what YOU feel comfortable with. In anycase just for FYIs, in an Illuminate migration schema file, you would write it as such:
$table->dropForeign(['permission_id']);
$table->foreign('permission_id')
->references('id')
->on('permission')
->onDelete('cascade');
Hopefully, this fictitious example will illustrate my problem:
Suppose you are writing a system which tracks complaints for a software product, as well as many other attributes about the product. In this case the SoftwareProduct is our aggregate root and Complaints are entities that only can exist as a child of the product. In other words, if the software product is removed from the system, so shall the complaints.
In the system, there is a dashboard like web page which displays many different aspects of a single SoftwareProduct. One section in the dashboard, displays a list of Complaints in a grid like fashion, showing only some very high level information for each complaint. When an admin type user chooses one of these complaints, they are directed to an edit screen which allows them to edit the detail of a single Complaint.
The question is: what is the best way for the edit screen to retrieve the single Complaint, so that it can be displayed for editing purposes? Keep in mind we have already established the SoftwareProduct as an aggregate root, therefore direct access to a Complaint should not be allowed. Also, the system is using NHibernate, so eager loading is an option, but my understanding is that even if a single Complaint is eager loaded via the SoftwareProduct, as soon as the Complaints collection is accessed the rest of the collection is loaded. So, how do you get the single Complaint through the SoftwareProduct without incurring the overhead of loading the entire Complaints collection?
This gets a bit into semantics and religiosity, but within the context of editing a complaint, the complaint is the root object. When you are editing a complaint, the parent object (software product) is unimportant. It is obviously an entity with a unique identity. Therefore you would would have a service/repository devoted to saving the updated complaint, etc.
Also, i think you're being a bit too negative. Complaints? How about "Comments"? Or "ConstructiveCriticisms"?
#Josh,
I don't agree with what you are saying even though I have noticed some people design their "Web" applications this way just for the sake of performance, and not based on the domain model itself.
I'm not a DDD expert either, but I'm sure you have read the traditional Order and OrderItem example. All DDD books say OrderItem belongs to the Order aggregate with Order being the aggregate root.
Based on what you are saying, OrderItem doesn't belong to Order aggregate anymore since the user may want to directly edit an OrderItem with Order being unimportant (just like editing a Complaing with its parents Software Product being unimportant). And you know if this approach is followed, none of the Order invariants could be enforced, which are extremely important when it comes to e-commerce systems.
Anyone has any better approaches?
Mosh
To answer your question:
Aggregates are used for the purpose of consistency. For example, if adding/modifying/deleting a child object from a parent (aggregate root) causes an invariant to break, then you need an aggregate there.
However, in your problem, I believe SoftwareProduct and Compliant belong to two separate aggregates, each being the root of their own aggregates. You don't need to load the SoftwareProject and all N Complaints assigned to it, just to add a new Complaint. To me, it doesn't seem that you have any business rules to be evaluated when adding a new Complaint.
So, in summary, create 2 different Repositories: SoftwareProductRepository and ComplaintRepository.
Also, when you delete a SoftwareProduct, you can use database relationships to cascade deletes and remove the associated Complaints. This should be done in your database for the purpose of data integrity. You don't need to control that in your Domain Model, unless you had other invariants apart from deleting linked objects.
Hope this helps,
Mosh
I am using NH for another business context but similar entity relationships like yours. I do not understand why do you say:
Keep in mind we have already
established the SoftwareProduct as an
aggregate root, therefore direct
access to a Complaint should not be
allowed
I have this in mine, Article and Publisher entities, if Publisher cease to exist, so do all the dependent Artcle entities. I allow myself to have direct access to the Article collections of each Publisher and individual entities. In the DB/Mapping of the Article class, Publisher is one of the members and cannot accept Null.
Care to elaborate the difference between yours and mine?
Sorry this is not a direct answer but too long to be added as a comment.
I agree with Mosh. Each ones of these two entities has its own aggregate root. Let me to explain it in the real life. suppose that a company has developed a software. There are some bug in this software, that made you annoy. you are going to go to the company and aware them from this problem. this company gives you a form to be filled by you.
This form has a field - section - indicates to the software name and description. additionally, it has some parts for your complaint. Is this form the same as the software manual? No. It is a form related to the software. It is not the software. Does this form has any ID? yes. It has. In other words, you can call the company in the next day and ask the operator about your letter of complaint. It is obvious that the operator will ask you about the Id.
This evidence shows that this form has its own entity and it could not be confused with the software itself. Any relation between two different entity does not mean one of them belongs to the other.