Which is the best way to link contextually related sql tables using keys? - sql

Let's say you have a dbEngine and a schema, let's say the tables are A, B and C. B has a foreign key reference to A, and C has a foreign key reference to B. Now let's say you have to add tables into this schema, all the way until Z, with the above foreign key pattern.
Is it not more efficient to use a single table, let's call it Keys, to store a reference to contextually related tables, where each key will bring back every non null entry in every table, as opposed to using a foreign key in each associated table.

Beware of reinventing SQL Server. Keys are not, in and of themselves, efficient. They don't bring back anything. They restrict updates that would violate referential integrity. Your other table, Keys, would not do that. It fails on that basis alone.
all the way until Z
I have designed many thousands of tables and written many thousands of queries. Eight tables in a query is a lot. I doubt I ever saw 16. They tend to cluster around a few, not be "strung out" in a chain. I don't say that 26 is unheard of, but yours is an extreme example.
Your table does exist, though, after a fashion, among the system tables. (If that link will ages out, search for sys.tables on msdn.microsoft.com.) The DBMS already has long had a need to, um, relate tables. It maintains information about their actual and permitted relationships, probably in system memory using the most efficient internal structure its engineers have been able to devise.
For a particular query, it's sometimes possible to "outsmart the system" and redo in SQL space something that's implemented more generally by the server itself. Usually that's a mistake, though. The tables are only the tip of an iceberg of features supplied by the server that are infeasible to implement in SQL. As a rule, it's best to use the system as intended, and leave DBMS engineering to DBMS engineers.

Related

<select> for an entity with composite keys - strategy needed

So say I have database table tours (PK tour_id) holding region independent information and tours_regional_details (PK tour_id, region_id) holding region specific information.
Let's say I want to populate select control with entities from tours_regional_details table (my real scenarios are bit different, just imagine this for the sake of simplicity).
So, how would you tackle this? My guts says concatenate PKs into delimited strings, like "pk1|pk2" or "pk1,pk2" and use that as value of select control. While it works, feels dirty and possibly needs additional validation steps before splitting the string again, which again feels dirty.
I don't want to start a composite vs single pk holy war, but may this be a bad database design decision on my part? I always believed identifying relationships and composite keys are there for a reason, but I feel tempted to alter my tables and just stuff them with auto incremental IDs and unique constraints. I'm just not sure what kind of a fresh hell will that introduce.
I am a little bit flabbergasted that I encounter this for the first time now after so many years.
EDIT: Yes, there is a table regions (PK region_id) but is mostly irrelevant for the topic. While in some scenarios two select boxes would make sense, let's say here they don't, let's say I want only one select box and want to select from:
Dummy tour (Region 1)
Dummy tour (Region 2)
Another dummy tour (region 3)
...
Composite primary keys aren't bad database design. In an ideal world, our programming languages and UI libraries would support tuples and relations as first-class values, so you'd be able to assign a pair of values as the value of an option in your dropdown control. However, since they generally only support scalar variables, we're stuck trying to encode or reduce our identifiers.
You can certainly add surrogate keys / autoincrement columns (and unique constraints on the natural keys where available) to every table. It's a very common pattern, most databases I've seen have at least some tables set up like this. You may be able to keep existing composite foreign keys as is, or you may want/need to change them to reference the surrogate primary keys instead.
The risk with using surrogate keys for foreign keys is that your access paths in the database become fixed. For example, let's assume tours_regional_details had a primary key tours_regional_detail_id that's referenced by a foreign key in another table. Queries against this other table would always need to join with tours_regional_details to obtain the tour_id or region_id. Natural keys allow more flexible access paths since identifiers are reused throughout the database. This becomes significant in deep hierarchies of dependent concepts. These are exactly the scenarios where opponents of composite keys complain about the "explosion" of keys, and I can at least agree that it becomes cumbersome to remember and type out joins on numerous columns when writing queries.
You could duplicate the natural key columns into the referencing tables, but storing redundant information requires additional effort to maintain consistency. I often see this done for performance or convenience reasons where surrogate keys were used as foreign keys, since it allows querying a table without having to do all the joins to dereference the surrogate identifiers. In these cases, it might've been better to reference the natural key instead.
If I'm allowed to return to my ideal world, perhaps DBMSs could allow naming and storing joins.
In practice, surrogate keys help balance the complexity we have to deal with. Use them, but don't worship them.

Designing Tables Sql Server

Good Morning,
in the design of a database, I have a table (TabA's call it) that could have relationships with four other tables. In the sense that this table can be connected both with the first of four, and with the second, and the third to the fourth, but could not have links with them; or it could have one (with any of the tables), or two links (always with two of any of them), and so on.
The table TabA I added four fields that refer to the four tables which could be "null" when they do not have any connection.
Wondering is this the kind of optimal design (say the four fields in the TabA) or you can make a better design for this type of situation?
Many thanks for your reply.
dave
In answer to the question and clarification in your comment, the answer is that your design can't be improved in terms of the number of foreign key columns. Having a specific foreign key column for every potential foreign key relationship is a best practice design.
However, the schema design itself seems questionable. I don't have enough information to tell whether the "Distributori_[N]_Livello" tables are a truly hierarchical structure or not. If it is, it is often possible to use a self-referential table for hierarchical structures rather than a set of N tables, as the diagram you linked seems to use. If you are able to refactor your design in such a way, it might be possible to reduce the number of foreign key columns required.
Whether this is possible or not is not for me to say given the data provided.

SQL one to one relationship vs. single table

Consider a data structure such as the below where the user has a small number of fixed settings.
User
[Id] INT IDENTITY NOT NULL,
[Name] NVARCHAR(MAX) NOT NULL,
[Email] VNARCHAR(2034) NOT NULL
UserSettings
[SettingA],
[SettingB],
[SettingC]
Is it considered correct to move the user's settings into a separate table, thereby creating a one-to-one relationship with the users table? Does this offer any real advantage over storing it in the same row as the user (the obvious disadvantage being performance).
You would normally split tables into two or more 1:1 related tables when the table gets very wide (i.e. has many columns). It is hard for programmers to have to deal with tables with too many columns. For big companies such tables can easily have more than 100 columns.
So imagine a product table. There is a selling price and maybe another price which was used for calculation and estimation only. Wouldn't it be good to have two tables, one for the real values and one for the planning phase? So a programmer would never confuse the two prices. Or take logistic settings for the product. You want to insert into the products table, but with all these logistic attributes in it, do you need to set some of these? If it were two tables, you would insert into the product table, and another programmer responsible for logistics data would care about the logistic table. No more confusion.
Another thing with many-column tables is that a full table scan is of course slower for a table with 150 columns than for a table with just half of this or less.
A last point is access rights. With separate tables you can grant different rights on the product's main table and the product's logistic table.
So all in all, it is rather rare to see 1:1 relations, but they can give a clearer view on data and even help with performance issues and data access.
EDIT: I'm taking Mike Sherrill's advice and (hopefully) clarify the thing about normalization.
Normalization is mainly about avoiding redundancy and relateded lack of consistence. The decision whether to hold data in only one table or more 1:1 related tables has nothing to do with this. You can decide to split a user table in one table for personal information like first and last name and another for his school, graduation and job. Both tables would stay in the normal form as the original table, because there is no data more or less redundant than before. The only column used twice would be the user id, but this is not redundant, because it is needed in both tables to identify a record.
So asking "Is it considered correct to normalize the settings into a separate table?" is not a valid question, because you don't normalize anything by putting data into a 1:1 related separate table.
Creating a new table with 1-1 relationships is not a reasonable solution. You might need to do it sometimes, but there would typically be no reason to have two tables where the user id is the primary key.
On the other hand, splitting the settings into a separate table with one row per user/setting combination might be a very good idea. This would be a three-table solution. One for users, one for all possible settings, and one for the junction table between them.
The junction table can be quite useful. For instance, it might contain the effective and end dates of the setting.
However, this assumes that the settings are "similar" to each other, in a SQL sense. If the settings are different such as:
Preferred location as latitude/longitude
Preferred time of day to receive an email
Flag to be excluded from certain contacts
Then you have a data-type problem when storing them in a table. So, the answer is "it depends". A lot of the answer depends on what the settings look like, how they will be used, and the type of constraints on them.
You're all wrong :) Just kidding.
On a very high load, high volume, heavily updated system splitting a table by 1:1 helps optimize I/O.
For example, this way you can place heavily read columns onto separate physical hard-drives to speed-up parallel reads (the 1-1 tables have to be in different "filegroups" for this). Or you can optimize table-level locks. Etc. Etc.
But this type of optimization usually does not happen until you have millions of rows and huge read/write concurrency
Splitting tables into distinct tables with 1:1 relationships between them is usually not practiced, because :
If the relationship is really 1:1, then integrity enforcement boils down to "inserts being done in all concerned tables, or none at all". Achieving this on the server side requires systems that support deferred constraint checking, and AFAIK that's a feature of the rather high-end systems. So in many cases the 1:1 enforcement is pushed over to the application side, and that approach has its own obvious downsides.
A case when splitting tables is nonetheless advisable, is when there are security perspectives, i.e. when not all columns can be updated by one user. But note that by definition, in such cases the relationship between the tables can never be strictly 1:1 .
(I also suggest you read carefully the discussion between Thorsten/Mike. You used the word 'normalization' but normalization has very little to do with your scenario - except if you were considering 6NF, which I think is rather unlikely.)
It makes more sense that your settings are not only in a separate table, but also use a on-to-many relationship between the ID and Settings. This way, you could potentially have a as many (or as few) setting as required.
UserSettings
[Settings_ID]
[User_ID]
[Settings]
In fact, one could make the same argument for the [Email] field.

Why is a primary-foreign key relation required when we can join without it?

If we can get data from two tables without having primary and foreign key relation, then why we need this rule? Can you please explain me clearly, with suitable example?
It's a test database, don't mind the bad structure.
Tables' structure:
**
table - 'test1'
columns - id,lname,fname,dob
no primary and foreign key and also not unique(without any constraints)
**
**table - 'test2'
columns- id,native_city
again, no relations and no constraints**
I can still join these tables with same columns 'id',
so if there's no primary-foreign key, then what is the use of that?
The main reason for primary and foreign keys is to enforce data consistency.
A primary key enforces the consistency of uniqueness of values over one or more columns. If an ID column has a primary key then it is impossible to have two rows with the same ID value. Without that primary key, many rows could have the same ID value and you wouldn't be able to distinguish between them based on the ID value alone.
A foreign key enforces the consistency of data that points elsewhere. It ensures that the data which is pointed to actually exists. In a typical parent-child relationship, a foreign key ensures that every child always points at a parent and that the parent actually exists. Without the foreign key you could have "orphaned" children that point at a parent that doesn't exist.
You need two columns of the same type, one on each table, to JOIN on. Whether they're primary and foreign keys or not doesn't matter.
You don't need a FK, you can join arbitrary columns.
But having a foreign key ensures that the join will actually succeed in finding something.
Foreign key give you certain guarantees that would be extremely difficult and error prone to implement otherwise.
For example, if you don't have a foreign key, you might insert a detail record in the system and just after you checked that the matching master record is present somebody else deletes it. So in order to prevent this you need to lock the master table, when ever you modify the detail table (and vice versa). If you don't need/want that guarantee, screw the FKs.
Depending on your RDBMS a foreign key also might improve performance of select (but also degrades performance of updates, inserts and deletes)
I know its late to post, but I use the site for my own reference and so I wanted to put an answer here for myself to reference in the future too. I hope you (and others) find it helpful.
Lets pretend a bunch of super Einstein experts designed our database. Our super perfect database has 3 tables, and the following relationships defined between them:
TblA 1:M TblB
TblB 1:M TblC
Notice there is no relationship between TblA and TblC
In most scenarios such a simple database is easy to navigate but in commercial databases it is usually impossible to be able to tell at the design stage all the possible uses and combination of uses for data, tables, and even whole databases, especially as systems get built upon and other systems get integrated or switched around or out. This simple fact has spawned a whole industry built on top of databases called Business Intelligence. But I digress...
In the above case, the structure is so simple to understand that its easy to see you can join from TblA, through to B, and through to C and vice versa to get at what you need. It also very vaguely highlights some of the problems with doing it. Now expand this simple chain to 10 or 20 or 50 relationships long. Now all of a sudden you start to envision a need for exactly your scenario. In simple terms, a join from A to C or vice versa or A to F or B to Z or whatever as our system grows.
There are many ways this can indeed be done. The one mentioned above being the most popular, that is driving through all the links. The major problem is that its very slow. And gets progressively slower the more tables you add to the chain, the more those tables grow, and the further you want to go through it.
Solution 1: Look for a common link. It must be there if you taught of a reason to join A to C. If it is not obvious, create a relationship and then join on it. i.e. To join A through B through C there must be some commonality or your join would either produce zero results or a massive number or results (Cartesian product). If you know this commonality, simply add the needed columns to A and C and link them directly.
The rule for relationships is that they simply must have a reason to exist. Nothing more. If you can find a good reason to link from A to C then do it. But you must ensure your reason is not redundant (i.e. its already handled in some other way).
Now a word of warning. There are some pitfalls. But I don't do a good job of explaining them so I will refer you to my source instead of talking about it here. But remember, this is getting into some heavy stuff, so this video about fan and chasm traps is really only a starting point. You can join without relationships. But I advise watching this video first as this goes beyond what most people learn in college and well into the territory of the BI and SAP guys. These guys, while they can program, their day job is to specialise in exactly this kind of thing. How to get massive amounts of data to talk to each other and make sense.
This video is one of the better videos I have come across on the subject. And it's worth looking over some of his other videos. I learned a lot from him.
A primary key is not required. A foreign key is not required either. You can construct a query joining two tables on any column you wish as long as the datatypes either match or are converted to match. No relationship needs to explicitly exist.
To do this you use an outer join:
select tablea.code, tablea.name, tableb.location from tablea left outer join
tableb on tablea.code = tableb.code
join with out relation
SQL join

SQL Referencial Integrity Between a Column and (One of Many Possible) Tables

This is more of a curiosity at the moment, but let's picture an environment where I bill on a staunch nickle&dime basis. I have many operations that my system does and they're all billable. All these operations are recorded across various tables (these tables need to be separate because they record very different kinds of information). I also want to micro manage my accounts receivables. (Forgive me if you find inconsistencies here, as this example is not a real situation)
Is there a somewhat standard way of substituting a foreign key with something that can verify that the identifier in column X on my billing table is an existing identifier within one of many operations record tables?
One idea is that when journalizing account activity, I could reference the operation's identifier as well as the operation (specifically, the table that it's in) and use a CHECK constraint. This is probably the best way to go so that my journal is not ambiguous.
Are there other ways to solve this problem, de-facto or proprietary?
Do non-relational databases solve this problem?
EDIT:
To rephrase my initial question,
Is there a somewhat standard way of substituting a foreign key with something that can verify that the identifier in column X on my billing table is an existing identifier within one of many (but not necessarily all) operations record tables?
No, there's no way to achieve this with a single foreign key column.
You can do basically one of two things:
in your table which potentially references any of the other x tables, have x foreign key reference fields (ideally: ID's of type INT), only one of which will ever be non-NULL at any given time. Each FK reference key references exactly one of your other data tables
or:
have one "child" table per master table with a proper and enforced reference, and pull together the data from those n child tables into a view (instead of a table) for your reporting / billing.
Or just totally forget about referential integrity - which I would definitely not recommend!
you can Implementing Table inheritance
see article
http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server
An alternative is to enforce complex referential integrity rules via a trigger. However,and not knowing exactly what your design is, usually when these types of questions are asked it is to work around a bad design. Look at the design first and see if you can change it to make this something that can be handled through FKs, they are much more managable than doing this sort of thing through triggers.
If you do go the trigger route, don't forget to enforce updates as well as inserts and make sure your trigger will work properly with a set-based multi-row insert and update.
A design alternative is to havea amaster table that is parent to all your tables with the differnt details and use the FK against that.