Database design: Multiple non-identifying relationships between two tables - sql

I'm new to database design and have some uncertainties about how best to model this particular case. I'd appreciate any suggestions for this fairly simple scenario.
When a production task begins, two people are involved at all times. One is in charge of the production, and a second is tasked with quality assurance. For any task in the database, it must be possible to identify these two people. They'll both exist in a Person table and have IDs, so I just want the best way to relate them to the production task. The following rules exist:
Either person may be swapped out for a different person at any time.
Each task always involves both people (Neither of these are null).
There are never any other people involved in the task that we want to record.
Each person may be involved in multiple tasks, or none at all.
If we had a whole host of relationships between the task and the people, I'd create some sort of convoluted relationship structure describing their relationship (As producer, quality assurance person, overseer, etc.), but here I feel as though it's sensible to just stick the IDs of the two people in the Task table, in separate columns for Production Person and Quality Assurance Person. Is this bad for some reason that I can't see?
What has really prompted my question is that I'm trying to design exactly that in DBDesigner 4, which I'm new to, and it just doesn't like it - When I try to set up a second non-identifying relationship between Task and Person, it won't give me a second field. It also won't seem to let me rename the fields in Task that refer to the persons, so it'd be impossible to differentiate between the two anyway. Since no-one else seems to share this problem, I've began to wonder whether it's a good idea at all. Is it standard to introduce additional tables as soon as there are two or more links between two entities? What would that look like if I wanted to enforce the above rules? I can't see how I'd ensure that an n:m table always has entries for both people working on the task.

If you are confident your requirements will stay this rigid forever, then just create two NOT NULL FKs:
This declaratively enforces that exactly two people are associated to the task at all times, which would not be readily achievable with just the junction table (as you already noted).
OTOH, if you anticipate your requirements might change at some point in the future, then the added flexibility of junction table might be more important than the completely declarative enforcement of your business rules.
I'm not familiar with DBDesigner, and therefore with your particular problem, but in ER modeling in general, multiple relationships with the same entity are distinguished by their "rolenames" which determine the names of migrated attributes (see the section on "Rolenames" in the chapter 3 of the ERwin Methods Guide). Try locating something along those lines in the UI of your tool.

If you want to know the current state and not who held the role previously #Branko Dimitrijevic's solution will work.
But if the statement 'Either person may be swapped out for a different person at any time' implies you need to know who previously held that role consider a 3 table design
Task; TaskID, <other details>
Assignee; TaskID, PeopleID, role, start_date, end_date
People; PeopleID, <other details>
Then in the assignee table you need constraints to ensure that for each TaskID, Role combination the dates are reasonable e.g. dates don't overlap or have gaps. That you have only 1 of each role active for each task at a time. To manage this would probably require code either in triggers or the application.

Related

need help answering question about sql data integrity

The question is:
For your final designed database, find a scenario in which a relatively prominent business data
integrity can not be ensured by your current primary keys and foreign keys, nor by adding directly
more of such keys or check clauses in the created tables.
In other words, the data integrity ensured by
the keys within the database may not be enough to ensure all the data integrity within the business
context.
Write a SQL statement that will determine if such a problem exists or not, and where, for any
given state of the database.
I am not too sure what this question is asking or how to approach it.
Need help writing a sql code for this question.
I think the question is asking you to define some business logic that cannot be encoded in the database. However, it then wants you to find conflicts that could occur because the business logic is not encoded in the database. This second part seems to be in conflict with the first, but not necessarily.
An example based on your previous assignment would be if a coach is suddenly sick and there are too few additional coaches to cover the booked clients, or some coaches are not qualified to replace the sick coach, or had previous conflicts with certain clients and therefore can't be assigned to those clients. Therefore, some training bookings must be cancelled.
The decision on which are best to cancel may be difficult or impossible to code in SQL, but you can use SQL to verify that all of the sick coach's slots have either been filled by others or cancelled after the external business logic is applied.
EDIT: I think the above scenario fits the question's requirement that you can't find the conflicts (such as clients that don't like certain coaches) in your existing foreign key relationships, but you can verify that the external logic is consistent with the final requirements (all slots accounted for).
Perhaps a better example is the traveling salesman problem: It is difficult to code the least cost routing in SQL, but it's easy to verify that all cities have been visited.
The scenarios where every row can have variable number of attributes and each attribute can have variable number of datatypes, is generally modeled using EAV datamodel. EAV wikipedia.
Here, attribute can have variety of values and so, we cannot enforce check constraint always. In some scenarios, if attributes finite list is not available, we cannot have foreign key for attributeID.
This datamodel is popular in the medical history datamodel, where every patient can have different kinds of symptoms.
May be this can be an example for a scenario, where data integrity cannot be completely enforced.

Is there a database design pattern name for reducing duplicate join table data?

I have two tables with a join table to allow a many-to-many relationship.
It's a very familiar design pattern. It indicates which Branches each Member has access to.
As the number of members and branches increases I end up with a lot of data in the join table that is duplicated across members. Members tend to have access to the same groups of Branches as other Members.
So I'm looking at normalizing my data by creating a MemberProfile table that is effectively immutable. And rather than creating MemberBranch records for every Member I check for a matching MemberProfile, use if it already exists, or create one if it doesn't:
The idea being if I have a million Members with only a hundred access profiles this will save me a lot of space in my database.
I'm happy that it all works and that the development effort is worth is.
My question is "Is this a standard database design pattern, and if so, what is it called?"
EDIT: It's been pointed out that this is compressing the data not normalizing it. Which is the intent behind the design.
Unless your many:many table is always the join of particular other base tables, one is not normalizing. You aren't normalizing here. Normalization does not introduce new column names. It just rearranges the current ones among different base tables.
You are just compressing/encoding your data. There is not necessarily any benefit in this, since now some queries and updates will be slower although your database is smaller. (You have reported that it is worth it in your case.)
I understand you'd like to put a label on that precise transformation, but unfortunately, there aren't many books that discuss database design or refactoring patterns. One of the few is Martin Fowler's Refactoring Databases, which you may know for his work on analysis patterns (he also has a great blog, worth following!). In that book, Martin presents a bunch of refactoring patterns that can be applied to databases and has put a name on common database transformations, including the one you have presented, which he called Split Table.
Split Table. Vertically split (e.g. by columns) an existing table into one or more tables.
A catalog of the database refactorings presented in that book are available here.
Hi I don't know about a pattern name but I've used the same principle before.
To keep this performing well, introduce a checksum to memberProfile based upon the branches for the profile, this way a lookup for an existing profile is plain easy and fast.
But do remember that the checksum is not necessarily unique, in case of collisions you will still have to check the branches, but only for the profiles sharing the same checksum.
Cleanup can be a scheduled task is is nothing more then deleting the profiles without users.

MVC design: Deleting an entity with undeletable children

I develop a lot of MVC applications on my own and I am looking for good resources related to a particular problem.
In a lot of my applications, I need to give the end-user CRUD abilities on a lot of entities. But, often times I'll find that these entities are parents of un-deletable children records. Take the following model as an example:
Here, I want to give the user the ability to delete employees (maybe because they've left the company, etc.). However, these employees are listed on projects. To change the creator_employee_id on projects would make the data obviously incorrect, and to cascade delete the projects where that employee is listed is clearly a no-no.
So, I've handled this in the past by adding a flag:
So, when a user goes to delete an employee, I check if deleting that employee violates any foreign key constraints. If it doesn't, I delete the employee. If it does, I set the employee to "active = 0".
However, this entire process of checking, iffing, deleting or deactivating is really complicated and cumbersome to compensate for on the code-level. Because an employee might not just be tied to projects, but to other entities as well, you have to write a check for each entity, determine if it makes sense to "delete" or "deactivate," etc. Then you have to give the user a way to view/reactivate inactive records.
My question is: Are there any other approaches to handling this specific kind of problem? I've tried looking around the web, but this problem is difficult to find the wording for. Perhaps even just the right words will help point me in the right direction. I hope this problem makes sense.
I think that the only clean alternative to deleting entities is to manage their history, possibly with an active flag as in your example (or better with lifecycle timestamps). However, you have to treat all entities in a uniform manner and not delete some of them (those that don't have dependent entities), but not others (having dependent entities).
My approach would be to set the active flag to false in all cases when an employee leaves. Keep that code simple and don't try to use branching logic to handle different scenarios based on the existence of related projects.
If you really set on removing old employees when they don't have any projects related to them, then setup a maintenance task that runs monthly, weekly, or whatever, that cleans out any inactive employees with no related records.

It is OK to use STI in this situation?

I have a table named people
Each person can be a client, a manager, an accountant, or any combination of the three.
Also each of them have special table columns, besides the ones in the people table.
What I'm doing now is using a person_id in each of the tables... but I think it would be much simpler to just used the same table, and a different model for each one, so I can manage them separately.
Should I do that?
You don't have a nice inheritance hierarchy so I don't think STI applies. For example, how would you represent a person that was both a manager and an accountant in terms of (single) inheritance?
If a person could only have one of the three roles then maybe STI would make sense; but even then implementing roles using inheritance should be setting off warning bells in your head, you should know that one person will end up needing multiple roles sooner or later (and it will probably become a critical necessity immediately after delivery).

T-SQL database design and tables

I'd like to hear some opinions or discussion on a matter of database design. Me and my colleagues are developing a complex application in finance industry that is being installed in several countries.
Our contractors wanted us to keep a single application for all the countries so we naturally face the difficulties with different workflows in every one of them and try to make the application adjustable to satisfy various needs.
The issue I've encountered today was a request from the head of the IT department from the contractors side that we keep the database model in terms of tables and columns they consist of.
For examlpe, we got a table with different risks and we needed to add a flag column IsSomething (BIT NOT NULL ...). It fully qualifies to exists within the risk table according to the third normal form, no transitive dependency to the key, a non key value ...
BUT, the guy said that he wants to keep the tables as they are so we had to make a new table "riskinfo" and link the data 1:1 to the new column.
What is your opinion ?
We add columns to our tables that are referenced by a variety of apps all the time.
So long as the applications specifically reference the columns they want to use and you make sure the new fields are either nullable or have a sensible default defined so it doesn't interfere with inserts I don't see any real problem.
That said, if an app does a select * then proceeds to reference the columns by index rather than name you could produce issues in existing code. Personally I have confidence that nothing referencing our database does this because of our coding conventions (That and I suspect the code review process would lynch someone who tried it :P), but if you're not certain then there is at least some small risk to such a change.
In your actual scenario I'd go back to the contractor and give your reasons you don't think the change will cause any problems and ask the rationale behind their choice. Maybe they have some application-specific wisdom behind their suggestion, maybe just paranoia from dealing with other companies that change the database structure in ways that aren't backwards-compatible, or maybe it's just a policy at their company that got rubber-stamped long ago and nobody's challenged. Till you ask you never know.
This question is indeed subjective like what Binary Worrier commented. I do not have an answer nor any suggestion. Just sharing my 2 cents.
Do you know the rationale for those decisions? Sometimes good designs are compromised for the sake of not breaking currently working applications or simply for the fact that too much has been done based on the previous one. It could also be many other non-technical reasons.
Very often, the programming community is unreasonably concerned about the ripple effect that results from redefining tables. Usually, this is a result of failure to understand data independence, and failure to guard the data independence of their operations on the data. Occasionally, the original database designer is at fault.
Most object oriented programmers understand encapsulation better than I do. But these same experts typically don't understand squat about data independence. And anyone who has learned how to operate on an SQL database, but never learned the concept of data independence is dangerously ignorant. The superficial aspects of data independence can be learned in about five minutes. But to really learn it takes time and effort.
Other responders have mentioned queries that use "select *". A select with a wildcard is more data dependent than the same select that lists the names of all the columns in the table. This is just one example among dozens.
The thing is, both data independence and encapsulation pursue the same goal: containing the unintended consequences of a change in the model.
Here's how to keep your IT chief happy. Define a new table with a new name that contains all the columns from the old table, and also all the additional columns that are now necessary. Create a view, with the same name as the old table, that contains precisely the same columns, and in the same order, that the old table had. Typically, this view will show all the rows in the old table, and the old PK will still guarantee uniqueness.
Once in a while, this will fail to meet all of the IT chief's needs. And if the IT chief is really saying "I don't understand databases; so don't change anything" then you are up the creek until the IT chief changes or gets changed.