separate columns for separate id fields? - sql

Say we have tables A, B, and C, and we want table Z to contain a column TYPE that tells us which of A, B, or C the record in Z is associated with.
Is it better to have a separate column for each table, like A_ID, B_ID, and C_ID, in order to use indexing?
Or is there some reason why a generic TYPE_ID column might be better performance-wise?

With a type_id plus an fk_id, an index on type_id won't be good: with only three types, each value matches roughly a third of the rows, which is far too unselective for the index to be of any use. You would always index on fk_id instead (the column that links to A, B, or C), which at worst needs to break ties among three rows (if the same id value is used by all three types).
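Storage-wise, if the index does not store NULL keys (Oracle B-tree indexes, for example, skip rows whose indexed columns are all NULL, and SQL Server can do the same with a filtered index on IS NOT NULL), the total number of entries stored in the indexes will be similar whether you use a single column (fk_id) or several (a_id, b_id, c_id).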
If you are coming in from a known fk_id (from A, B, or C), then a unique index on (fk_id, type_id), in that order, can quickly identify the required record.
For simplicity and brevity, then, two columns seem better than three here.
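A minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for whatever database you run. The table name z, the type codes 'A'/'B'/'C', and the index name ix_z_fk_type are hypothetical, and the UNIQUE assumes each A, B, or C row is linked to at most one Z row. It checks that a lookup by (fk_id, type_id) goes through the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE z (
        id      INTEGER PRIMARY KEY,
        type_id TEXT    NOT NULL CHECK (type_id IN ('A', 'B', 'C')),
        fk_id   INTEGER NOT NULL   -- id of the row in A, B or C
    );
    -- fk_id leads because it is the selective column; type_id breaks ties.
    CREATE UNIQUE INDEX ix_z_fk_type ON z (fk_id, type_id);
""")

# The same fk_id may appear once per type, but never twice for one type.
conn.executemany(
    "INSERT INTO z (type_id, fk_id) VALUES (?, ?)",
    [("A", 1), ("B", 1), ("C", 1), ("A", 2)],
)

LOOKUP = "SELECT id FROM z WHERE fk_id = ? AND type_id = ?"

def find_z(type_id, fk_id):
    """Return the id of the Z row linked to (type_id, fk_id), or None."""
    row = conn.execute(LOOKUP, (fk_id, type_id)).fetchone()
    return row[0] if row else None

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (1, "B")))

print(find_z("B", 1))  # 2
print(plan)            # expect a SEARCH step that names ix_z_fk_type
```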

This is sometimes a schema code smell.
If you are considering putting this in a single column in Z, does that mean only one of A, B, or C can apply to a given Z row?
Before deciding, I'd really need to know more about the entity and the usage pattern. Is access coming from a known A, B, or C, or is the supplemental info driven from the Z side? If it is driven from the Z side, do you want to get all the A, B, and C columns and then use them selectively in the application, or just Zs with As or Zs with Bs, i.e. do you usually know the subtype? Also, if A, B, and C are each 1-1 with Z, do they have enough columns to merit separating them out of Z's row (i.e. you could put their columns in Z and leave them NULL when they don't apply)?
Just for completeness, another possibility, which gives you more referential integrity (with a single column, you can't have an FK to one of three tables), is to have tables Z_A, Z_B, Z_C:
With schemas:
Z_A:
  Z_ID REFERENCES Z(ID)
  A_ID REFERENCES A(ID)
Z_B:
  Z_ID REFERENCES Z(ID)
  B_ID REFERENCES B(ID)
Z_C:
  Z_ID REFERENCES Z(ID)
  C_ID REFERENCES C(ID)
With both ID columns unique in each table, this constrains everything pretty nicely, except that nothing declarative stops a Z row from appearing in more than one of the tables; you need a trigger for that (you cannot put a unique constraint on an indexed view over a UNION ALL in SQL Server).
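The Z_A/Z_B/Z_C layout, including the trigger mentioned above, can be sketched like this. It uses SQLite through Python's sqlite3 module rather than SQL Server, so the trigger syntax is SQLite's; the lowercase table names and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    CREATE TABLE c (id INTEGER PRIMARY KEY);
    CREATE TABLE z (id INTEGER PRIMARY KEY);

    -- One link table per parent: both ids unique, both real foreign keys.
    CREATE TABLE z_a (z_id INTEGER NOT NULL UNIQUE REFERENCES z(id),
                      a_id INTEGER NOT NULL UNIQUE REFERENCES a(id));
    CREATE TABLE z_b (z_id INTEGER NOT NULL UNIQUE REFERENCES z(id),
                      b_id INTEGER NOT NULL UNIQUE REFERENCES b(id));
    CREATE TABLE z_c (z_id INTEGER NOT NULL UNIQUE REFERENCES z(id),
                      c_id INTEGER NOT NULL UNIQUE REFERENCES c(id));

    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (1);
    INSERT INTO c VALUES (1);
    INSERT INTO z VALUES (1), (2);
""")

# The declarative constraints can't stop one z row appearing in two link
# tables, so each link table gets a BEFORE INSERT trigger checking the others.
# (UPDATEs would need similar triggers; not shown.)
LINKS = ["z_a", "z_b", "z_c"]
for t in LINKS:
    others = " OR ".join(
        f"EXISTS (SELECT 1 FROM {o} WHERE z_id = NEW.z_id)" for o in LINKS if o != t
    )
    conn.execute(f"""
        CREATE TRIGGER {t}_single_parent BEFORE INSERT ON {t}
        WHEN {others}
        BEGIN
            SELECT RAISE(ABORT, 'z row is already linked to another parent');
        END
    """)

def link(table, z_id, parent_id):
    """Try to insert a link row; return True if the database accepted it."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (z_id, parent_id))
        return True
    except sqlite3.DatabaseError:  # UNIQUE, FK or trigger violation
        return False

print(link("z_a", 1, 1))   # True
print(link("z_b", 1, 1))   # False: z 1 already belongs to an A (trigger)
print(link("z_b", 2, 1))   # True
print(link("z_c", 99, 1))  # False: there is no z row 99 (foreign key)
```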
While it seems to multiply the number of tables, these can usually be wrapped up into views.
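A sketch of wrapping the link tables in one view, again in SQLite via Python's sqlite3. The view name z_parent and its columns are hypothetical, and foreign keys are left out to keep it short. The view gives back the (type, parent id) shape of the single-column design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE z_a (z_id INTEGER NOT NULL UNIQUE, a_id INTEGER NOT NULL UNIQUE);
    CREATE TABLE z_b (z_id INTEGER NOT NULL UNIQUE, b_id INTEGER NOT NULL UNIQUE);
    CREATE TABLE z_c (z_id INTEGER NOT NULL UNIQUE, c_id INTEGER NOT NULL UNIQUE);

    -- One view presents the three link tables as a single (type, id) list.
    CREATE VIEW z_parent AS
        SELECT z_id, 'A' AS parent_type, a_id AS parent_id FROM z_a
        UNION ALL
        SELECT z_id, 'B', b_id FROM z_b
        UNION ALL
        SELECT z_id, 'C', c_id FROM z_c;

    INSERT INTO z_a VALUES (1, 10);
    INSERT INTO z_b VALUES (2, 10);
    INSERT INTO z_c VALUES (3, 30);
""")

rows = conn.execute(
    "SELECT z_id, parent_type, parent_id FROM z_parent ORDER BY z_id"
).fetchall()
print(rows)  # [(1, 'A', 10), (2, 'B', 10), (3, 'C', 30)]
```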

Related

Must a natural join be on a shared primary key?

Suppose I perform A natural join B, where:
A's schema is: (aID), where (aID) is a primary key.
B's schema is: (aID,bID), where (aID, bID) is a composite primary key.
Would performing the natural join work in this case? Or is it necessary for A to have both aID and bID for this to work?
NATURAL JOIN returns rows with one copy of each column name common to the input tables and one copy of each column name unique to an input table. It returns all the rows that can be made by combining one row from each input table where the two rows agree on every common column. That holds regardless of how many common column names there are, including zero. When there are no common column names, it is a kind of CROSS JOIN, aka CARTESIAN PRODUCT. When all the column names are common, it is a kind of INTERSECTION. All this is regardless of PKs, UNIQUE, FKs, and other constraints.
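A small SQLite sketch (through Python's sqlite3) of those three cases: one common column, none, and all. The tables r, s, t, u and their data are made up for illustration; the helper returns the output column names and the row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (x INTEGER, y INTEGER);
    CREATE TABLE s (y INTEGER, z INTEGER);  -- shares only y with r
    CREATE TABLE t (w INTEGER);             -- shares nothing with r
    CREATE TABLE u (x INTEGER, y INTEGER);  -- shares every column with r

    INSERT INTO r VALUES (1, 10), (2, 20);
    INSERT INTO s VALUES (10, 100), (10, 101), (30, 300);
    INSERT INTO t VALUES (7), (8), (9);
    INSERT INTO u VALUES (2, 20), (3, 30);
""")

def natural(left, right):
    """Return (sorted output column names, row count) of left NATURAL JOIN right."""
    cur = conn.execute(f"SELECT * FROM {left} NATURAL JOIN {right}")
    rows = cur.fetchall()
    return sorted(d[0] for d in cur.description), len(rows)

print(natural("r", "s"))  # (['x', 'y', 'z'], 2)  matched on y; y appears once
print(natural("r", "t"))  # (['w', 'x', 'y'], 6)  no common columns: cross join
print(natural("r", "u"))  # (['x', 'y'], 1)       all columns common: intersection
```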
NATURAL JOIN is important as a relational algebra operator. In SQL it can be used in a certain style of relational programming that is in a certain sense simpler than usual.
For a true relational result you would SELECT DISTINCT, since SQL tables can hold duplicate rows and relations cannot. Also, relations have no special NULL value, whereas SQL joins do not treat a NULL as equal to a NULL; so if we treat NULL relationally as just another value, SQL will sometimes not return the true relational result. (That happens when both arguments have a NULL in each of some shared columns and the same non-NULL value in each of the other shared columns.)
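To see both points in SQLite (via Python's sqlite3), here is a made-up pair of tables where p contains a duplicate row and both tables hold a NULL in the shared column v. The last query uses SQLite's IS operator, a NULL-safe equality, to mimic treating NULL as an ordinary value; that comparison is an illustration, not something NATURAL JOIN does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p (k INTEGER, v TEXT);              -- no key, so duplicates allowed
    CREATE TABLE q (k INTEGER, v TEXT, extra TEXT);
    INSERT INTO p VALUES (1, 'a'), (1, 'a'), (2, NULL);
    INSERT INTO q VALUES (1, 'a', 'x'), (2, NULL, 'y');
""")

def fetch(sql):
    return conn.execute(sql).fetchall()

plain    = fetch("SELECT k, v, extra FROM p NATURAL JOIN q ORDER BY k")
distinct = fetch("SELECT DISTINCT k, v, extra FROM p NATURAL JOIN q ORDER BY k")
# SQLite's IS is a NULL-safe equality (standard SQL: IS NOT DISTINCT FROM),
# so here NULL matches NULL, as it would if NULL were just another value.
null_safe = fetch("""
    SELECT DISTINCT p.k, p.v, q.extra FROM p JOIN q
    ON p.k IS q.k AND p.v IS q.v ORDER BY p.k
""")

print(plain)      # [(1, 'a', 'x'), (1, 'a', 'x')]  duplicate kept, (2, NULL) lost
print(distinct)   # [(1, 'a', 'x')]
print(null_safe)  # [(1, 'a', 'x'), (2, None, 'y')]
```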
A "natural" join uses the names of columns to match between tables. It uses any matching names, regardless of key definitions.
Hence,
select . . .
from a natural join b
will join on aID, because that is the only column name the two tables share.
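For the schemas in the question, a quick SQLite check (via Python's sqlite3; the sample rows are made up) shows the natural join working and giving the same result as an explicit join on aID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (aID INTEGER PRIMARY KEY);
    CREATE TABLE b (aID INTEGER NOT NULL REFERENCES a(aID),
                    bID INTEGER NOT NULL,
                    PRIMARY KEY (aID, bID));
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 100), (1, 101);
""")

natural = conn.execute(
    "SELECT aID, bID FROM a NATURAL JOIN b ORDER BY bID").fetchall()
explicit = conn.execute(
    "SELECT a.aID, b.bID FROM a JOIN b ON b.aID = a.aID ORDER BY b.bID").fetchall()

print(natural)              # [(1, 100), (1, 101)]
print(natural == explicit)  # True: the only shared column name is aID
```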
In my opinion, natural join is an abomination. For one thing, it ignores explicitly declared foreign key relationships. These are the "natural join" keys, regardless of their names.
Second, the join keys are not clear in the SELECT statement. This makes debugging the query much more difficult.
Third, I cannot think of any other SQL construct where adding or removing a column in a table can take a working query and change the number of rows in its result set.
Further, I often have common columns on my tables: CreatedAt, CreatedOn, CreatedBy. Just the existence of these columns precludes using natural joins.
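A made-up example of the last two points in SQLite (via Python's sqlite3): adding a CreatedBy column to both tables silently changes what the natural join matches on, while an explicit ON clause is unaffected. The orders/order_lines tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders      (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_lines (order_id INTEGER, sku TEXT);
    INSERT INTO orders      VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO order_lines VALUES (1, 'X'), (1, 'Y'), (2, 'Z');
""")

NATURAL  = "SELECT COUNT(*) FROM orders NATURAL JOIN order_lines"
EXPLICIT = ("SELECT COUNT(*) FROM orders AS o "
            "JOIN order_lines AS l ON l.order_id = o.order_id")

def count(sql):
    return conn.execute(sql).fetchone()[0]

before = count(NATURAL)  # joined on order_id only

# Add the same audit column to both tables; existing rows get NULL.
conn.executescript("""
    ALTER TABLE orders      ADD COLUMN CreatedBy TEXT;
    ALTER TABLE order_lines ADD COLUMN CreatedBy TEXT;
""")

after = count(NATURAL)       # now also joined on CreatedBy, and NULL = NULL is not true
unchanged = count(EXPLICIT)  # the explicit ON clause is unaffected
print(before, after, unchanged)  # 3 0 3
```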

How to block an insert into a table if a reference between 2 other tables exist?

Complete newb on DBs here... I'm trying to make a simple program to help an orphanage in my area. The question: suppose I have tables A and B, each referencing table C through a foreign key constraint. Is there a way to block an insert into table A that references an element X in table C if there is already an element Y in table B referencing X?
Table A: empty
Table B: Y(references X)
Table C: X
Attempt to insert Z(references X) in table A
Table A: Z(references X) <<--- Blocked
Table B: Y(references X)
Table C: X
Sorry for the newb question, it's my first try with a DB... Although I have done some research, I don't even know the proper terms to type into the search bar for this situation.
I might be wrong, but to me it looks like you are trying to solve the wrong problem...
It sounds like tables A and B are subtypes of something common, like student and teacher: both are persons.
I'll stick to the student/teacher example here, but it works the same for any other entities that can be treated alike.
Depending on the relationships you want, a person's status as a student or a teacher can either be a discriminator field in the person table that tells you which subtype the person is, or it can be part of another, concrete relationship, for example if a person can be a student in one case and at the same time a teacher in another. (Then the discriminator field moves to the table that describes the relationship, so you get one discriminator per concrete relation.)
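A minimal sketch of the first option, a discriminator field on the person table, in SQLite via Python's sqlite3. The person table, its columns, and the 'student'/'teacher' values are hypothetical; a CHECK constraint keeps the discriminator to known subtypes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Supertype table: one row per person; kind says which subtype it is.
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        kind TEXT NOT NULL CHECK (kind IN ('student', 'teacher'))
    );
    INSERT INTO person (name, kind) VALUES ('Ana', 'student'), ('Ben', 'teacher');
""")

def add_person(name, kind):
    """Insert a person; return False if the CHECK on kind rejects it."""
    try:
        conn.execute("INSERT INTO person (name, kind) VALUES (?, ?)", (name, kind))
        return True
    except sqlite3.IntegrityError:
        return False

students = [r[0] for r in conn.execute("SELECT name FROM person WHERE kind = 'student'")]
print(students)                       # ['Ana']
print(add_person("Cleo", "janitor"))  # False: not a known subtype
```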
In your abstract world with A, B, and C it could look like this:
A: [A1], [A2]
B: [B1], [B2]
C: [C1]
R: [R1; C1; A1; NULL; A]
where the columns of R are:
PK
C reference
A reference
B reference
discriminator
plus a constraint on R making the C reference column unique.
The UNIQUE constraint is what prevents the unwanted additional reference you described: it will block the insert into R. (That's not exactly what you asked for, since you can still have an entry in A, but it prevents a second reference to a C entity that is already referenced, which will most likely be enough.)
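The R table above could be declared roughly like this in SQLite (via Python's sqlite3), using the example's A1/B1/C1 ids. Column names such as c_ref are hypothetical, and the CHECK tying the discriminator to the filled-in reference is an extra assumption beyond the answer; the UNIQUE on c_ref is the part that does the blocking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE a (id TEXT PRIMARY KEY);
    CREATE TABLE b (id TEXT PRIMARY KEY);
    CREATE TABLE c (id TEXT PRIMARY KEY);
    CREATE TABLE r (
        id            TEXT PRIMARY KEY,
        c_ref         TEXT NOT NULL UNIQUE REFERENCES c(id),  -- one R per C
        a_ref         TEXT REFERENCES a(id),
        b_ref         TEXT REFERENCES b(id),
        discriminator TEXT NOT NULL,
        -- Extra assumption: the discriminator must agree with the filled-in reference.
        CHECK ((discriminator = 'A' AND a_ref IS NOT NULL AND b_ref IS NULL) OR
               (discriminator = 'B' AND b_ref IS NOT NULL AND a_ref IS NULL))
    );
    INSERT INTO a VALUES ('A1'), ('A2');
    INSERT INTO b VALUES ('B1'), ('B2');
    INSERT INTO c VALUES ('C1');
    INSERT INTO r VALUES ('R1', 'C1', 'A1', NULL, 'A');
""")

def add_r(row):
    """Insert a row into R; return False if a constraint blocks it."""
    try:
        conn.execute("INSERT INTO r VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# C1 is already referenced via A1, so a B referencing C1 is blocked by UNIQUE(c_ref).
print(add_r(("R2", "C1", None, "B1", "B")))  # False
```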

Multiple tables need to have one-many relationship with common table

Suppose I have a SQL database with tables A, B, C, and Z. All three tables A, B, and C have a one-to-many relationship with table Z. I've thought of the solution below, but I'd like suggestions or a better solution for this problem, if there is one.
Solution:
Create one more table named 'LinkZ', which has a one-to-many relationship with table 'Z'.
Add a foreign key field referencing 'LinkZ' to tables A, B, and C.
That way, if more tables that need a relationship with 'Z' are added to the database in the future, we just need to add a foreign key to 'LinkZ' in each new table, since 'LinkZ' already has the one-to-many relationship.
Please suggest a solution, if one exists, that satisfies my needs.
Edit:
@Gordon, "If all the tables have a "1-many" relationship with Z, then you don't need a link table at all. You need a link table when your relationships are many-to-many." That is the kind of relationship I need. But if I follow the traditional way, I need to add AID, BID, and CID to Z to get 1-many with Z, and then the 'Z' table contains many columns that are only foreign keys to other tables. (In the future other tables may need the same relationship, so I want to avoid piling foreign-key-only columns onto Z.) That's why I created the dummy table 'LinkZ', which gives tables A, B, and C the illusion of having a 1-many relationship with Z directly.
This is too long for a comment.
If all the tables have a "1-many" relationship with Z, then you don't need a link table at all. You need a link table when your relationships are many-to-many.
It is unclear which way the linkage is going, but you either want ZID in each of A, B, and C, or you want AID, BID, CID in Z.
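Assuming the relationship runs the way the edit describes (one A, B, or C has many Z rows), a minimal SQLite sketch via Python's sqlite3 of the AID/BID/CID-in-Z option could look like this; the column and index names are hypothetical. It also shows that a future parent table costs Z one nullable column. If the linkage runs the other way, the same idea applies with a z_id column in each of A, B, and C.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY);
    CREATE TABLE c (id INTEGER PRIMARY KEY);
    -- One A (or B, or C) has many Z: Z carries one nullable FK per parent.
    CREATE TABLE z (
        id   INTEGER PRIMARY KEY,
        a_id INTEGER REFERENCES a(id),
        b_id INTEGER REFERENCES b(id),
        c_id INTEGER REFERENCES c(id)
    );
    CREATE INDEX ix_z_a_id ON z (a_id);
    CREATE INDEX ix_z_b_id ON z (b_id);
    CREATE INDEX ix_z_c_id ON z (c_id);

    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (1);
    INSERT INTO z (id, a_id) VALUES (10, 1), (11, 1);
    INSERT INTO z (id, b_id) VALUES (12, 1);
""")

# A future parent table D costs one nullable column (plus an index) on Z.
conn.executescript("""
    CREATE TABLE d (id INTEGER PRIMARY KEY);
    ALTER TABLE z ADD COLUMN d_id INTEGER REFERENCES d(id);
    CREATE INDEX ix_z_d_id ON z (d_id);
""")

children_of_a1 = [r[0] for r in conn.execute("SELECT id FROM z WHERE a_id = 1 ORDER BY id")]
print(children_of_a1)  # [10, 11]
```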

SQL One-to-one relationship join

I have 2 tables, one of which is an extension of the other, so it is currently a simple one-to-one relationship (this is likely to become one-to-many in the future). I need to join from one table to the other to pull a value out of a column in the extension.
So table A contains basic details, including an id, and table B uses a FK reference to the id column in table A. I need to pull out column X from table B.
To add complexity, sometimes there won't be a matching entry in table B, and in that case it needs to return NULL. Also, the value of X itself could be NULL.
I know I can use a left outer join but is there a more efficient way to perform the join?
A left outer join is the way. To make it as efficient as possible, make sure you index the FK column in table B; with that index, each row from A needs only an index lookup into B instead of a scan of B.
You don't need to index the primary key in table A for this query (and most databases already index primary keys anyway).
The MySQL syntax for creating the index:
CREATE INDEX `fast_lookups` ON `table_b` (`col_name`);
You can name it whatever you like; I picked "fast_lookups".
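A SQLite stand-in for that answer (via Python's sqlite3, not MySQL), reusing the index name fast_lookups; the table names table_a/table_b and the column a_id are hypothetical. It shows that the LEFT JOIN returns NULL both when there is no B row and when X is NULL, and that the plan looks B up through the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE table_b (id   INTEGER PRIMARY KEY,
                          a_id INTEGER REFERENCES table_a(id),
                          x    TEXT);
    -- Index the FK column on the B side; A's primary key is already indexed.
    CREATE INDEX fast_lookups ON table_b (a_id);

    INSERT INTO table_a VALUES (1, 'has x'), (2, 'x is NULL'), (3, 'no B row');
    INSERT INTO table_b (a_id, x) VALUES (1, 'hello'), (2, NULL);
""")

QUERY = """
    SELECT a.id, b.id AS b_id, b.x
    FROM table_a AS a
    LEFT JOIN table_b AS b ON b.a_id = a.id
    ORDER BY a.id
"""
rows = conn.execute(QUERY).fetchall()
plan = " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY))

# b_id tells a missing B row (None) apart from a B row whose x is NULL.
print(rows)  # [(1, 1, 'hello'), (2, 2, None), (3, None, None)]
print(plan)  # expect a SEARCH on b USING INDEX fast_lookups
```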

Altering candidate key in SQL Server 2005

I have a rather interesting task ahead of me, and I want to make sure I am thinking it through correctly.
I have a table Part whose part_id is used as part of the candidate keys of several other tables (A, B, C). I need to drop Part and use Product instead. A, B, and C need to have their part_id column (of type bigint) replaced with a new product_id column (of type int). I need to use the part_id column of each table to determine which product_id to use for each row.
Here is what I think I need to do (thoughts?):
create the product_id column in each of the tables (A, B, C)
set the product_id of each row for each table to the appropriate value
drop any constraints/fk/pk I have for the part_id column in A, B, C
drop the part_id from those tables completely
recreate the constraints/fk/pk I dropped earlier, with product_id in them instead
drop the Part table completely
Can anyone see any potential issues that I may be neglecting?
Thanks!
Additional info: tables A, B, and C each have LOTS of data, so if there is a more performant way, I am all ears.
You should be able to create the new constraints on product_id before you drop the constraints on part_id, and before you drop the part_id column. And you should do it in that order: create the new constraints first.
If things go wrong with the product_id changes, your database is still usable if the part_id constraints are still in place.
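A scaled-down sketch of that order of operations in SQLite via Python's sqlite3, for one table only. It assumes, hypothetically, that Part already holds the product_id each part maps to and that A's candidate key is (part_id, seq); SQL Server's DDL for dropping the old constraints and column differs and is not shown. Creating the new unique key before dropping anything also surfaces a real risk early: if several parts map to the same product, the new key fails while the old one is still protecting the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY);
    -- Hypothetical: Part already records which product each part becomes.
    CREATE TABLE part (part_id    INTEGER PRIMARY KEY,
                       product_id INTEGER NOT NULL REFERENCES product(product_id));
    -- Table A with a candidate key that includes part_id.
    CREATE TABLE a (part_id INTEGER NOT NULL REFERENCES part(part_id),
                    seq     INTEGER NOT NULL,
                    UNIQUE (part_id, seq));

    INSERT INTO product VALUES (1), (2);
    INSERT INTO part VALUES (100, 1), (200, 2);
    INSERT INTO a VALUES (100, 1), (100, 2), (200, 1);
""")

# 1. Add the new column next to the old one.
conn.execute("ALTER TABLE a ADD COLUMN product_id INTEGER REFERENCES product(product_id)")

# 2. Fill it from the part -> product mapping.
conn.execute("""
    UPDATE a SET product_id =
        (SELECT p.product_id FROM part AS p WHERE p.part_id = a.part_id)
""")

# 3. Stop here if any row could not be mapped.
unmapped = conn.execute("SELECT COUNT(*) FROM a WHERE product_id IS NULL").fetchone()[0]
if unmapped:
    raise RuntimeError(f"{unmapped} rows in a have no product_id")

# 4. Create the new candidate key while the part_id key still protects the data.
#    This raises IntegrityError if two parts map to the same product and would
#    produce duplicate (product_id, seq) pairs.
conn.execute("CREATE UNIQUE INDEX ux_a_product_seq ON a (product_id, seq)")
conn.commit()

# Only after this succeeds would the old part_id constraints, the column and
# the Part table be dropped (not shown).
```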