is it necessary to have foreign key for simple tables - sql

have a table called RoundTable
It has the following columns
RoundName
RoundDescription
RoundType
RoundLogo
Now the RoundType will be having values like "Team", "Individual", "Quiz"
is it necessary to have a one more table called "RoundTypes" with columns
TypeID
RoundType
and remove the RoundType from the rounds table and have a column "TypeID" which has a foreign key to this RoundType table?
Some say that if you have the RoundType in same table it is like hard-coding as there will be lot of round types in future.
is it like if there are going to be only 2-3 round types, i need not have foreign key??

Is it necessary? Obviously not. SQL works fine either way. In a properly defined database, you would do one of two things for RoundType:
Have a lookup table
Have a constraint that checks that values are within an agreed upon set (and I would put enums into this category)
If you have a lookup table, I would advocate having an auto-incremented id (called RoundTypeId) for it. Remember, that in a larger database, such a table would often have more than two columns:
CreatedAt -- when it was created
CreatedBy -- who created it
CreatedOn -- where it was created (important for distributed systems)
Long name
In a more advanced system, you might also need to internationalize the system -- that is, make it work for multiple languages. Then you would be looking up the actual string value in other tables.

is it like if there are going to be only 2-3 round types, i need not
have foreign key??
Usually it's just the opposite: If you have a different value for most of the records (like in a "lastName" column) you won't use a lookup table.
If, however, you know that you will have a limited set of allowed/possible values, a lookup table referenced via a foreign key is probably the better solution.
Maybe read up on "database normalization", starting perhaps # Wikipedia.

Actually you need to have separate table if you have following association between entities,
One to many
Many to many
because of virtue of these association simple DBMS becomes **R**DBMS ( Relation .)
Now ask simple question,
Whether my single record in round table have multiple roundTypes?
If so.. Make a new table and have foreign key in ROUNDTable.
Otherwise no.

yeah I think you should normalize it. Because if you will not do so then definitely you have to enter the round types (value) again and again for each record which is not good practice at all in case if you have large data. so i will suggest you to make another table
however later on you can make a view to get the desired result as fallow
create view vw_anyname
as
select RoundName, RoundDescription , RoundLogo, RoundType from roundtable join tblroundtype
on roundtable.TypeID = tblroundtype .typeid
select * from vw_anyname

Related

SQL - What is best to do when multiple tables have the same columns

I have different tables in my scheme with different columns, but I want to store data of when was the table modified or when was the data stored, so I added some columns to specify that.
I realized that I had to add the same "modification_date" and "modification_time" columns to all my tables, so I thought about making a new table called DATA_INFO so I won't need to do so, but every table has a different PRIMARY KEY and I don't know which one to add as FOREIGN KEY to the DATA_INFO table.
I don't know if I have to maybe add all of them or is there another way to do what I need.
It's better to have the same "modification_datetime" column in all tables, rather than trying to keep that data in a central table.
That's what we have done at every shop I've worked in.
I want to emphasize that a separate table is not reasonable for this purpose. The lack of an obvious foreign key is a hint.
Unlike Tab Allerman, tables that I create are much less likely to be updated, so I have three additional columns on most tables:
CreatedBy -- the user who created the row
CreatedAt -- when the row was creatd
CreatedOn -- the system where the table was created
The most important point is that this information can -- in many databases -- be implemented using default values rather than triggers. That is a big advantage of working within a single row. The fewer triggers, the better.

SQL Newbie: Use a foreign key for a lookup table being used like an enum() field?

Say I have field Ice_Cream.flavor, with the current choices in lookup table Flavor.flavor.
I use Flavor.flavor like an enum() list, storing the value, not the record ID, in Ice_Cream.flavor. If Flavor.flavor changes, I don't want to update Ice_Cream:flavor. I want it to stay as created.
Should I set up Ice_Cream.Flavor as a foreign key, so I can see the source of the values in my ER diagram, or not?
If you want Ice_Cream.flavor to stay as created even if there is no matching record in Flavor (which is what your question sounds like) then you cannot create a FOREIGN KEY relationship, it will not allow that condition to occur in your database.
Furthermore, if you're storing the actual text Flavor.Flavor string in Ice_Cream.Flavor, there's no particular reason to have a separate RecordID column in Flavor.
IMHO, you do not need a FK here except if you have additional informations about a flavor in the Flavor table beside the name in the column flavor. It is the case because you do not keep an ID, you keep the name AND you want to keep the old value.
I also supposed that you do not want to keep old flavors in the Flavor table or elsewhere except in the Ice_Cream table.
Last but not least, a FK would require that any flavor stored in Ice_Cream.flavor exists in the Flavor table. It is not the case if I understand correctly your question.

Normalization Help

I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most, 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.
After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?
You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:
You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.
It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.
If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.
However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.
Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.
However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.
If you want normalization break it out.
I think of these types of data in DBs as the equivalent of enums in C,C++,C#. Mostly you put them in the table as documentation.
I often have an ID, Name, Description, and auditing columns for them (eg modified by, modified date, create date, create by, active.) The description field is rarely used.
Example (some might say there are more than just 2)
Gender
ID Name Audit Columns...
1 Male
2 Female
Then in your contacts you would have a GenderID column which would link to this one.
Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.
If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.
Share and enjoy.
Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.
A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.
Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.

Multiple foreign keys to a single column

I'm defining a database for a customer/ order system where there are two highly distinct types of customers. Because they are so different having a single customer table would be very ugly (it'd be full of null columns as they are pointless for one type).
Their orders though are in the same format. Is it possible to have a CustomerId column in my Order table which has a foreign key to both the Customer Types? I have set it up in SQL server and it's given me no problems creating the relationships, but I'm yet to try inserting any data.
Also, I'm planning on using nHibernate as the ORM, could there be any problems introduced by doing the relationships like this?
No, you can't have a single field as a foreign key to two different tables. How would you tell where to look for the key?
You would at least need a field that tells what kind of user it is, or two separate foreign keys.
You could also put the information that is common for all users in one table and have separate tables for the information that is specific for the user types, so that you have a single table with user id as primary key.
A foreign key can only reference a single primary key, so no. However, you could use a bridge table:
CustomerA <---- CustomerA_Orders ----> Order
CustomerB <---- CustomerB_Orders ----> Order
So Order doesn't even have a foreign key; whether this is desirable, though...
I inherited a SQL Server database where this was done (a single column used in four foreign key relationships with four unrelated tables), so yes, it's possible. My predecessor is gone, though, so I can't ask why he thought it was a good idea.
He used a GUID column ("uniqueidentifier" type) to avoid the ambiguity problem, and he turned off constraint checking on the foreign keys, since it's guaranteed that only one will match. But I can think of lots of reasons that you shouldn't, and I haven't thought of any reasons you should.
Yours does sound like the classical "specialization" problem, typically solved by creating a parent table with the shared customer data, then two child tables that contain the data unique to each class of customer. Your foreign key would then be against the parent customer table, and your determination of which type of customer would be based on which child table had a matching entry.
You can create a foreign key referencing multiple tables. This feature is to allow vertical partioining of your table and still maintain referential integrity. In your case however, this is not applicable.
Your best bet would be to have a CustomerType table with possible columns - CustomerTypeID, CustomerID, where CustomerID is the PK and then refernce your OrderID table to CustomerID.
Raj
I know this is a very old question; however if other people are finding this question through the googles, and you don't mind adding some columns to your table, a technique I've used (using the original question as a hypothetical problem to solve) is:
Add a [CustomerType] column. The purpose of storing a value here is to indicate which table holds the PK for your (assumed) [CustomerId] FK column. Optional - addition of a check constraint (to ensure CustomerType is in CustomerA or CustomerB) will help you sleep better at night.
Add a computed column for each [CustomerType], eg:
[CustomerTypeAId] as case when [CustomerType] = 'CustomerA' then [CustomerId] end persisted
[CustomerTypeBId] as case when [CustomerType] = 'CustomerB' then [CustomerId] end persisted
Add your foreign keys to the calculated (and persisted) columns.
Caveat: I'm primarily in a MSSQL environment; so I don't know how well this translates to other DBMS (ie: Postgres, ORACLE, etc).
As noted, if the key is, say, 12345, how would you know which table to look it up in? You could, I suppose, do something to insure that the key values for the two tables never overlapped, but this is too ugly and painful to contemplate. You could have a second field that says which customer type it is. But if you're going to have two fields, why not have one field for customer type 1 id and another for customer type 2 id.
Without knowing more about your app, my first thought is that you really should have a general customer table with the data that is common to both, and then have two additional tables with the data specific to each customer type. I would think that there must be a lot of data common to the two -- basic stuff like name and address and customer number at the least -- and repeating columns across tables sucks big time. The additional tables could then refer back to the base table. As there is then a single key for the base table, the issue of foreign keys having to know which table to refer to evaporates.
Two distinct types of customer is a classic case of types and subtypes or, if you prefer, classes and subclasses. Here is an answer from another question.
Essentially, the class-table-inheritance technique is like Arnand's answer. The use of the shared-primary-key technique is what allows you to get around the problems created by two types of foreign key in one column. The foreign key will be customer-id. That will identify one row in the customer table, and also one row in the appropriate kind of customer type table, as the case may be.
Create a "customer" table include all the columns that have same data for both types of customer.
Than create table "customer_a" and "customer_b"
Use "customer_id" from "consumer" table as foreign key in "customer_a" and "customer_b"
customer
|
---------------------------------
| |
cusomter_a customer_b

Decision between storing lookup table id's or pure data

I find this comes up a lot, and I'm not sure the best way to approach it.
The question I have is how to make the decision between using foreign keys to lookup tables, or using lookup table values directly in the tables requesting it, avoiding the lookup table relationship completely.
Points to keep in mind:
With the second method you would
need to do mass updates to all
records referencing the data if it
is changed in the lookup table.
This is focused more
towards tables that have a lot of
the column's referencing many lookup
tables.Therefore lots of foreign
keys means a lot of
joins every time you query the
table.
This data would be coming from drop
down lists which would be pulled
from the lookup tables. In order to match up data when reloading, the values need to be in the existing list (related to the first point).
Is there a best practice here, or any key points to consider?
You can use a lookup table with a VARCHAR primary key, and your main data table uses a FOREIGN KEY on its column, with cascading updates.
CREATE TABLE ColorLookup (
color VARCHAR(20) PRIMARY KEY
);
CREATE TABLE ItemsWithColors (
...other columns...,
color VARCHAR(20),
FOREIGN KEY (color) REFERENCES ColorLookup(color)
ON UPDATE CASCADE ON DELETE SET NULL
);
This solution has the following advantages:
You can query the color names in the main data table without requiring a join to the lookup table.
Nevertheless, color names are constrained to the set of colors in the lookup table.
You can get a list of unique colors names (even if none are currently in use in the main data) by querying the lookup table.
If you change a color in the lookup table, the change automatically cascades to all referencing rows in the main data table.
It's surprising to me that so many other people on this thread seem to have mistaken ideas of what "normalization" is. Using a surrogate keys (the ubiquitous "id") has nothing to do with normalization!
Re comment from #MacGruber:
Yes, the size is a factor. In InnoDB for example, every secondary index stores the primary key value of the row(s) where a given index value occurs. So the more secondary indexes you have, the greater the overhead for using a "bulky" data type for the primary key.
Also this affects foreign keys; the foreign key column must be the same data type as the primary key it references. You might have a small lookup table so you think the primary key size in a 50-row table doesn't matter. But that lookup table might be referenced by millions or billions of rows in other tables!
There's no right answer for all cases. Any answer can be correct for different cases. You just learn about the tradeoffs, and try to make an informed decision on a case by case basis.
In cases of simple atomic values, I tend to disagree with the common wisdom on this one, mainly on the complexity front. Consider a table containing hats. You can do the "denormalized" way:
CREATE TABLE Hat (
hat_id INT NOT NULL PRIMARY KEY,
brand VARCHAR(255) NOT NULL,
size INT NOT NULL,
color VARCHAR(30) NOT NULL /* color is a string, like "Red", "Blue" */
)
Or you can normalize it more by making a "color" table:
CREATE TABLE Color (
color_id INT NOT NULL PRIMARY KEY,
color_name VARCHAR(30) NOT NULL
)
CREATE TABLE Hat (
hat_id INT NOT NULL PRIMARY KEY,
brand VARCHAR(255) NOT NULL,
size INT NOT NULL,
color_id INT NOT NULL REFERENCES Color(color_id)
)
The end result of the latter is that you've added some complexity - instead of:
SELECT * FROM Hat
You now have to say:
SELECT * FROM Hat H INNER JOIN Color C ON H.color_id = C.color_id
Is that extra join a huge deal? No - in fact, that's the foundation of the relational design model - normalizing allows you to prevent possible inconsistencies in the data. But every situation like this adds a little bit of complexity, and unless there's a good reason, it's worth asking why you're doing it. I consider possible "good reasons" to include:
Are there other attributes that "hang off of" this attribute? Are you capturing, say, both "color name" and "hex value", such that hex value is always dependent on color name? If so, then you definitely want a separate color table, to prevent situations where one row has ("Red", "#FF0000") and another has ("Red", "#FF3333"). Multiple correlated attributes are the #1 signal that an entity should be normalized.
Will the set of possible values change frequently? Using a normalized lookup table will make future changes to the elements of the set easier, because you're just updating a single row. If it's infrequent, though, don't balk at statements that have to update lots of rows in the main table instead; databases are quite good at that. Do some speed tests if you're not sure.
Will the set of possible values be directly administered by the users? I.e. is there a screen where they can add / remove / reorder the elements in the list? If so, a separate table is a must, obviously.
Will the list of distinct values power some UI element? E.g. is "color" a droplist in the UI? Then you'll be better off having it in its own table, rather than doing a SELECT DISTINCT on the table every time you need to show the droplist.
If none of those apply, I'd be hard pressed to find another (good) reason to normalize. If you just want to make sure that the value is one of a certain (small) set of legal values, you're better off using a CONSTRAINT that says the value must be in a specific list; keeps things simple, and you can always "upgrade" to a separate table later if the need arises.
One thing no one has considered is that you would not join to the lookup table if the data in it can change over time and the records joined to are historical. The example is a parts table and an order table. The vendors may drop parts or change part numbers, but the orders table should alawys have exactly what was ordered at the time it was ordered. Therefore, it should lookup the data to do the record insert but should never join to the lookup table to get information about an existing order. Instead the part number and description and price, etc. should be stored in the orders table. This is espceially critical so that price changes do not propagate through historical data and make your financial records inaccurate. In this case, you would also want to avoid using any kind of cascading update as well.
rauhr.myopenid.com wrote:
The way we decided to solve this problem is with 4th normal form.
...
That is not 4th normal form. That is a common mistake called One True Lookup:
http://www.dbazine.com/ofinterest/oi-articles/celko22
4th normal form is :
http://en.wikipedia.org/wiki/Fourth_normal_form
Normalization is pretty universally regarded as part of best practices in databases, and normalization says yeah, you push the data out and refer to it by key.
Since no one else has addressed your second point: When queries become long and difficult to read and write due to all those joins, a view will usually resolve that.
You can even make it a rule to always program against the views, having the view get the lookups.
This makes it possible to optimize the view and make your code resistant to changes in the tables.
In oracle, you could even convert the view into a materialized view if you ever need to.