need help answering question about sql data integrity

need help answering question about sql data integrity - sql

The question is:
For your final designed database, find a scenario in which a relatively prominent business data
integrity can not be ensured by your current primary keys and foreign keys, nor by adding directly
more of such keys or check clauses in the created tables.
In other words, the data integrity ensured by
the keys within the database may not be enough to ensure all the data integrity within the business
context.
Write a SQL statement that will determine if such a problem exists or not, and where, for any
given state of the database.
I am not too sure what this question is asking or how to approach it.
Need help writing a sql code for this question.

I think the question is asking you to define some business logic that cannot be encoded in the database. However, it then wants you to find conflicts that could occur because the business logic is not encoded in the database. This second part seems to be in conflict with the first, but not necessarily.
An example based on your previous assignment would be if a coach is suddenly sick and there are too few additional coaches to cover the booked clients, or some coaches are not qualified to replace the sick coach, or had previous conflicts with certain clients and therefore can't be assigned to those clients. Therefore, some training bookings must be cancelled.
The decision on which are best to cancel may be difficult or impossible to code in SQL, but you can use SQL to verify that all of the sick coach's slots have either been filled by others or cancelled after the external business logic is applied.
EDIT: I think the above scenario fits the question's requirement that you can't find the conflicts (such as clients that don't like certain coaches) in your existing foreign key relationships, but you can verify that the external logic is consistent with the final requirements (all slots accounted for).
Perhaps a better example is the traveling salesman problem: It is difficult to code the least cost routing in SQL, but it's easy to verify that all cities have been visited.

The scenarios where every row can have variable number of attributes and each attribute can have variable number of datatypes, is generally modeled using EAV datamodel. EAV wikipedia.
Here, attribute can have variety of values and so, we cannot enforce check constraint always. In some scenarios, if attributes finite list is not available, we cannot have foreign key for attributeID.
This datamodel is popular in the medical history datamodel, where every patient can have different kinds of symptoms.
May be this can be an example for a scenario, where data integrity cannot be completely enforced.

Related

Why does CQRS seem to prevent unique constraints on write side?

Everywhere I see a post about ensuring uniqueness in a CQRS architecture the most obvious solution for me which is to add a unique index on the write side is never mentioned, without any explanation.
Instead I read the read model must be queried for that, and concurrency issues should be tackled by a saga compensating the action. Seems overly complex when you can just reject the command on unique index violation, so why is that?

Why does CQRS seem to prevent unique constraints on write side?
It doesn't
What it does do is recognize that that maintaining an invariant on a distributed set is a nightmare.
the most obvious solution for me which is to add a unique index on the write side is never mentioned, without any explanation
That's right. If you don't have a distributed set -- if all of the elements of the set are stored together -- then maintaining the invariant is straight forward.
But what does it mean to have a unique index constraint that spans two databases?
To express the idea in more modern terms, the guiding assumption is that the business logic should be scale agnostic. If two write models are really independent of one another, then we ought to be able to store them separately.
If there is a constraint that needs to be satisfied that depends on data from two different write models, then those write models aren't really independent.
Greg Young raised a really good question
What is the business impact of having a failure?
That's the sort of thing we are supposed to be thinking about in domain driven design, after all.
Why would event sourcing prevent me to put an index on unique fields??
Same answer really: it doesn't, so long as your unique constraint and your events are stored together.
If you have an RDMBS with a table that represents the elements of your set, and tables that store your events, you can update the two tables together within a single transaction, and roll back the whole mess if your constraint is violated.
But take that same idea, and put the set in a different database than the events? Now you have two distinct transactions to coordinate. Good luck with that.

Database design: Multiple non-identifying relationships between two tables

I'm new to database design and have some uncertainties about how best to model this particular case. I'd appreciate any suggestions for this fairly simple scenario.
When a production task begins, two people are involved at all times. One is in charge of the production, and a second is tasked with quality assurance. For any task in the database, it must be possible to identify these two people. They'll both exist in a Person table and have IDs, so I just want the best way to relate them to the production task. The following rules exist:
Either person may be swapped out for a different person at any time.
Each task always involves both people (Neither of these are null).
There are never any other people involved in the task that we want to record.
Each person may be involved in multiple tasks, or none at all.
If we had a whole host of relationships between the task and the people, I'd create some sort of convoluted relationship structure describing their relationship (As producer, quality assurance person, overseer, etc.), but here I feel as though it's sensible to just stick the IDs of the two people in the Task table, in separate columns for Production Person and Quality Assurance Person. Is this bad for some reason that I can't see?
What has really prompted my question is that I'm trying to design exactly that in DBDesigner 4, which I'm new to, and it just doesn't like it - When I try to set up a second non-identifying relationship between Task and Person, it won't give me a second field. It also won't seem to let me rename the fields in Task that refer to the persons, so it'd be impossible to differentiate between the two anyway. Since no-one else seems to share this problem, I've began to wonder whether it's a good idea at all. Is it standard to introduce additional tables as soon as there are two or more links between two entities? What would that look like if I wanted to enforce the above rules? I can't see how I'd ensure that an n:m table always has entries for both people working on the task.

If you are confident your requirements will stay this rigid forever, then just create two NOT NULL FKs:
This declaratively enforces that exactly two people are associated to the task at all times, which would not be readily achievable with just the junction table (as you already noted).
OTOH, if you anticipate your requirements might change at some point in the future, then the added flexibility of junction table might be more important than the completely declarative enforcement of your business rules.
I'm not familiar with DBDesigner, and therefore with your particular problem, but in ER modeling in general, multiple relationships with the same entity are distinguished by their "rolenames" which determine the names of migrated attributes (see the section on "Rolenames" in the chapter 3 of the ERwin Methods Guide). Try locating something along those lines in the UI of your tool.

If you want to know the current state and not who held the role previously #Branko Dimitrijevic's solution will work.
But if the statement 'Either person may be swapped out for a different person at any time' implies you need to know who previously held that role consider a 3 table design
Task; TaskID, <other details>
Assignee; TaskID, PeopleID, role, start_date, end_date
People; PeopleID, <other details>
Then in the assignee table you need constraints to ensure that for each TaskID, Role combination the dates are reasonable e.g. dates don't overlap or have gaps. That you have only 1 of each role active for each task at a time. To manage this would probably require code either in triggers or the application.

How many foreign keys is too many?

After running across this article:
http://diovo.com/2008/08/are-foreign-keys-really-necessary-in-a-database-design/
It seems like a good idea to use foreign keys when designing a database. But when are you using too many?
For example, suppose I have a main table used to store a list of machinery part information that other programs make reference to with the following columns:
ID
Name
Colour
Price
Measurement Units
Category
etc...
Should I be making tables containing a list of all possible colours, units and categories and then setting these as foreign keys to the corresponding columns in my machine part info table? At what point would the benefit of using foreign keys out weight the fact that I'm making all these extra tables and relationships?

Any attribute for which you want to be able to state, with certainty, that there are only known, valid values present in the database should be protected with a foreign key. Otherwise, you can only hope to catch invalid values in your application code and whatever interfaces are created in the future.
It is NOT a bad thing to have more tables and relationships. The only issue -- and it usually is not one -- has to do with the overhead of maintaining the indexes that are used in enforcing those relationships. Until you experience performance issues you should create a foreign key relationship for every column that "should" have one (because the values need to be validated against a list).
The performance considerations would have to be pretty dire before I would be willing to sacrifice correctness for performance.

Every Design is a compromise of competing goals, so there are very few simple answers (except the wrong ones).
I would certainly put discrete measures such as name, color, category, measure unit, etc.. in their own key tables. Variable measures (cost, number of units ,etc..) not so much, unless you have units in standard size packages (i.e. only 1, 6, 12, etc..)

The simplest way to design a database is to start with the requirements. In one classical methodology, the requirements are summarized in an ER (Entity-Relationship) model. In this model, relationships between entities are not invented, they are discovered. If they lie within the scope of the information the database is supposed to cover, then they are part of the model. Period.
From there, when you turn to database design, you already know what relationships you need. You have a few decisions to make about the structure of your tables, but almost all the foreign keys that reference a primary key are a direct consequence of the requirements.
Of course, if you are at liberty to change the requirements as you go through the design process, then you can do anything you want.

Dimensional modelling covers all points of your question well. Having too many foreign key relationships can make query performance suffer. Kimball's Group Reader is a great introduction to Dimensional Design and how to translate customer requirements to a schema.
http://en.wikipedia.org/wiki/Dimensional_modeling
The main question to ask is 'how constrained does the data need to be?' Concerning the color of machine parts, I'd assume, it would be in everyone's best interest not to have burnt ciena and camomille as color options. So, a look up table for these would be best.

Database Design Without Foreign Keys

After having worked at various employers I've noticed a trend of "bad" database design with some of these companies - primarily the exclusion of Foreign Keys Constraints. It has always bugged me that these transactional systems didn't have FK's, which would've promoted referential integrity.
Are there any scenarios, in transactional systems, whereby the omission of FK's would be beneficial?
Has anyone else experienced this, if so what was the outcome?
What should one do if they're presented with this scenario and their asked to maintain/enhance the system?

I cannot think of any scenario where, if two columns have a dependency, they should not have a FK constraint set up between them. Removing referential integrity may certainly speed up database operations but there's a pretty high cost to pay for that.
I have experienced such systems and the usual outcome is corrupted data, in the sense that records exists that shouldn't exist (or vice versa). These are the sort of systems where people believe they're okay because the application takes care of it, not caring that:
Every application has to take care of it, rather than one DB server.
It only takes one bug, or malignant app, to screw it up for everyone.
It is the responsibility of the database to protect itself! That is one of its best features.
As to what you should do, I simply put forward the possible things that can go wrong and how using FKs will prevent that (often with a cost/benefit analysis "skewed" toward my viewpoint, if necessary). Then let the company decide - it is their database, after all.

There is a school of thought that a well-written application does not need referential integrity. If the application does things right, the thinking goes, there's no need for constraints.
Such thinking is akin to not doing defensive programming because if you write the code correctly, you won't have bugs. While true, it simply won't happen. Not using appropriate constraints is asking for data corruption.
As for what you should do, you should encourage the company to add constraints at every opportunity. You don't want to push it to the point of getting in trouble or making a bad name for yourself, but as long as the environment is appropriate, keep pushing for it. Everyone's life will be better in the long run.

Personally, I have no problem with a database not having explicit declarations for foreign keys. But, it depends on how the database is being used.
Most of the databases that I work with are relatively static data derived from one or more transactional systems. I am not particularly concerned with rogue updates affecting the database, so an explicit definition of a foreign key relationship is not particularly important.
One thing that I do have is very consistent naming. Basically, every table has a first column called ID, which is exactly how the column is refered to in other tables (or, sometimes with a prefix, when there are multiple relationships between two entities). I also try to insist that every column in such a database has a unique name that describes the attribute (so "CustomerStartDate" is different from "ProductStartDate").
If I were dealing with data that had more "cooks in the pot", then I would want to be more explicit about the foreign key relationships. And, I then I am more willing to have the overhead of foreign key definitions.
This overhead arises in many places. When creating a new table, I may want to use use "create table as" or "select into" and not worry about the particulars of constraints. When running update or insert queries, I may not want the database overhead of checking things that I know are ok. However, I must emphasize that consistent naming greatly increases my confidence that things are ok.
Clearly, my perspective is not one of a DBA but of a practitioner. However, invalid relationships between tables are something I -- or the rest of my team -- almost never has to deal with.

As long as there's a single point of entry into the database it ultimately doesn't matter which "layer" is maintaining referential integrity. Using the "built-in layer" of foreign key constraints seems to make the most sense, but if you have a rock solid service layer responsible for the same thing then it has freedom to break the rules if necessary.
Personally I use foreign key constraints and engineer my apps so they don't have to break the rules. Relational data with guaranteed referential integrity is just easier to work with.
The performance gained is probably equivalent to the performance lost from having to maintain integrity outside of the db.

In an OLTP database, the only reason I can think of is if you care about performance more than data integrity. Enforcing a FK when row is inserted to the child table requires an index seek on the parent table and I can imagine there may be extreme situations where even this relatively quick index seek is too much. For example, some kind of very intensive logging where you can live with incorrect log entries and the application doing the writing is simple and unlikely to have bugs.
That being said, if you can live with corrupt data, you can probably live without a database in the first place.

Defensive Programming withot foreign keys works if you primarily use stored procedures and every application uses those stored procedures, instead of writing their own queries. Then you can control it quite easily and more flexible than the standard foreign keys.
One situation I can think of off the top of my head where foreign key constraints are not readily usable is a permissions module where permissions can be applied per user or per group, determined by a Boolean. So some of the records in the permissions table have a user id and others have a group id. If you still wanted foreign key constraints, you would have to have two different fields for the same mutally exclusive information and allow them to be null. Meaning adding another constraint saying that one is allowed to be null but they can't both be null, as well as a combination of 3 fields must be unique instead of a combination of 2 fields (user/group id and permission id). And the alternative is two separate tables containing the same data, meaning maintaining both tables separately.
But perhaps in that scenario, it's best to separate the data. Anything where you need the same field to connect to different tables based on other data in that record, you cannot use foreign field constraints, and it becomes best to keep the constraints in the stored procedures and views instead.

T-SQL database design and tables

I'd like to hear some opinions or discussion on a matter of database design. Me and my colleagues are developing a complex application in finance industry that is being installed in several countries.
Our contractors wanted us to keep a single application for all the countries so we naturally face the difficulties with different workflows in every one of them and try to make the application adjustable to satisfy various needs.
The issue I've encountered today was a request from the head of the IT department from the contractors side that we keep the database model in terms of tables and columns they consist of.
For examlpe, we got a table with different risks and we needed to add a flag column IsSomething (BIT NOT NULL ...). It fully qualifies to exists within the risk table according to the third normal form, no transitive dependency to the key, a non key value ...
BUT, the guy said that he wants to keep the tables as they are so we had to make a new table "riskinfo" and link the data 1:1 to the new column.
What is your opinion ?

We add columns to our tables that are referenced by a variety of apps all the time.
So long as the applications specifically reference the columns they want to use and you make sure the new fields are either nullable or have a sensible default defined so it doesn't interfere with inserts I don't see any real problem.
That said, if an app does a select * then proceeds to reference the columns by index rather than name you could produce issues in existing code. Personally I have confidence that nothing referencing our database does this because of our coding conventions (That and I suspect the code review process would lynch someone who tried it :P), but if you're not certain then there is at least some small risk to such a change.
In your actual scenario I'd go back to the contractor and give your reasons you don't think the change will cause any problems and ask the rationale behind their choice. Maybe they have some application-specific wisdom behind their suggestion, maybe just paranoia from dealing with other companies that change the database structure in ways that aren't backwards-compatible, or maybe it's just a policy at their company that got rubber-stamped long ago and nobody's challenged. Till you ask you never know.

This question is indeed subjective like what Binary Worrier commented. I do not have an answer nor any suggestion. Just sharing my 2 cents.
Do you know the rationale for those decisions? Sometimes good designs are compromised for the sake of not breaking currently working applications or simply for the fact that too much has been done based on the previous one. It could also be many other non-technical reasons.

Very often, the programming community is unreasonably concerned about the ripple effect that results from redefining tables. Usually, this is a result of failure to understand data independence, and failure to guard the data independence of their operations on the data. Occasionally, the original database designer is at fault.
Most object oriented programmers understand encapsulation better than I do. But these same experts typically don't understand squat about data independence. And anyone who has learned how to operate on an SQL database, but never learned the concept of data independence is dangerously ignorant. The superficial aspects of data independence can be learned in about five minutes. But to really learn it takes time and effort.
Other responders have mentioned queries that use "select *". A select with a wildcard is more data dependent than the same select that lists the names of all the columns in the table. This is just one example among dozens.
The thing is, both data independence and encapsulation pursue the same goal: containing the unintended consequences of a change in the model.
Here's how to keep your IT chief happy. Define a new table with a new name that contains all the columns from the old table, and also all the additional columns that are now necessary. Create a view, with the same name as the old table, that contains precisely the same columns, and in the same order, that the old table had. Typically, this view will show all the rows in the old table, and the old PK will still guarantee uniqueness.
Once in a while, this will fail to meet all of the IT chief's needs. And if the IT chief is really saying "I don't understand databases; so don't change anything" then you are up the creek until the IT chief changes or gets changed.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas