Use of null values in related tables with foreign key constraints - sql

I have the following tables:
Categories
CategoryID (int) Primary Key
CategoryName (varchar)
Items
ItemID (int) Primary Key
CategoryID (int)
ItemName (varchar)
There is a foreign key constraint on Items.CategoryID. There is a chance that when a new item is created that there will be no category assigned.
Is it better to set Items.CategoryID to allow nulls and deal with the nulls in my code OR better to not allow nulls, set the default CategoryID to 1, and create a dummy record in the Categories table called "Uncategorized" and then deal with that dummy category in my code?

The logically correct way would be for the CategoryID column to be NULL when there is no Category for the item.
If you get trapped by any of the gotchas associated with using NULL, that is most likely a sign that the design hasn't taken into account the fact that an item may have no category. Fix the design. The NULL will ensure you stick to solving the correct problem.

It depends:
If your items really have no category, then I would allow NULLs, as that is what you have: no CategoryId.
If you want to list all categories, you do not want to display the dummy row, so you would have to ignore that.
If you want to display all items and show the categories, you'd better be aware that there are items without category, so you would use a LEFT JOIN in that case.
If possible, change your application to select a category before actually saving your item.
If you want to treat that Uncategorized category just like the other categories (list it with the other categories, count the items assigned to it, select it in lists/dropdowns), then it should get its own category row, and Items.CategoryID should be NOT NULL.
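A minimal sketch of the nullable-column option, run through Python's built-in sqlite3 module so it is executable (SQLite stands in for whatever database the question is using; the sample rows are made up). It shows why the LEFT JOIN matters: an inner join silently drops the item that has no category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Categories (
    CategoryID   INTEGER PRIMARY KEY,
    CategoryName VARCHAR(50) NOT NULL
);
CREATE TABLE Items (
    ItemID     INTEGER PRIMARY KEY,
    CategoryID INTEGER NULL REFERENCES Categories(CategoryID),  -- NULL = no category
    ItemName   VARCHAR(50) NOT NULL
);
INSERT INTO Categories VALUES (1, 'Tools');
INSERT INTO Items VALUES (1, 1, 'Hammer'), (2, NULL, 'Mystery box');
""")

# Inner join: the uncategorized item disappears from the result.
inner_rows = conn.execute("""
    SELECT i.ItemName, c.CategoryName
    FROM Items i JOIN Categories c ON c.CategoryID = i.CategoryID
    ORDER BY i.ItemID
""").fetchall()

# Left join: every item is listed, with NULL (None) for a missing category.
left_rows = conn.execute("""
    SELECT i.ItemName, c.CategoryName
    FROM Items i LEFT JOIN Categories c ON c.CategoryID = i.CategoryID
    ORDER BY i.ItemID
""").fetchall()
```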

Ideally you'd want to force a category choice before allowing an item to be created. If an item will have no category at any point in the future then you'll need to create a category specifically to deal with that. I personally wouldn't call it "Uncategorized" though as this implies that a user can just chase it up later - which they will forget to do with alarming regularity!
Go for logical consistency or you'll end up in a mess. If that means creating a "Miscellaneous" category then do that and make sure that (a) Users know when to use it and (b) It is reported on regularly to make sure items are categorised correctly.

For simple lookup tables of this type it is almost always better to disallow NULLs and have the unknown value in your lookup table.
Why?
Because the ANSI NULL specifications are inconsistent and very complex. Dealing with NULLs greatly increases the likelihood of coding defects and takes a lot more code.
Because few developers really understand how NULLs work in all scenarios
Because it simplifies your model and queries nicely. You can join things together nicely with inner joins from either direction with very simple sql.
However, a few cautions:
You may want more than one "dummy" value: one for "unknown" and another for "not assigned". Of course, NULL bundles both into a single value, so you're going above & beyond the minimal standard if you do this.
You will sometimes end up with additional non-key attributes that must either be nullable or carry 'n/a' type values for the dummy rows. For heavily denormalized lookup tables (like warehousing dimensions) you'll probably want NULLs allowed for these columns, because 'n/a' doesn't work well for timestamps, amounts, etc.
If you apply this technique to more than just simple lookup tables it will dramatically complicate your design. Don't do that.
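A rough sketch of the dummy-row approach described above, again using Python's sqlite3 with SQLite as a stand-in. The choice of CategoryID 1 for "Uncategorized" comes from the question; the sample rows are assumptions. Every item joins with a plain inner join, but listing "real" categories has to filter the dummy row out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Categories (
    CategoryID   INTEGER PRIMARY KEY,
    CategoryName VARCHAR(50) NOT NULL
);
CREATE TABLE Items (
    ItemID     INTEGER PRIMARY KEY,
    -- NOT NULL with a default pointing at the dummy row
    CategoryID INTEGER NOT NULL DEFAULT 1 REFERENCES Categories(CategoryID),
    ItemName   VARCHAR(50) NOT NULL
);
INSERT INTO Categories VALUES (1, 'Uncategorized'), (2, 'Tools');
INSERT INTO Items (ItemID, CategoryID, ItemName) VALUES (1, 2, 'Hammer');
INSERT INTO Items (ItemID, ItemName) VALUES (2, 'Mystery box');  -- falls back to 1
""")

# A plain inner join now returns every item.
items_with_category = conn.execute("""
    SELECT i.ItemName, c.CategoryName
    FROM Items i JOIN Categories c ON c.CategoryID = i.CategoryID
    ORDER BY i.ItemID
""").fetchall()

# The price: code that lists selectable categories must skip the dummy row.
real_categories = conn.execute(
    "SELECT CategoryName FROM Categories WHERE CategoryID <> 1"
).fetchall()
```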

SQL NULLs are tricky, so I think you're better off with a sentinel category.

In this case I believe it's really a matter of personal preference. Either way you'll have to deal with the uncategorized items in your code.

I do not believe that either of the alternatives is very good.
If you choose the NULL approach, you will have problems with the gotchas involved in working with NULLs. If you choose not to allow NULLs, you will need to handle the case where deleting a category cascades to its items.
IMO the best design is to have three tables.
Categories
ID
Name
Items
ID
Name
Categories2Items
CategoryID
ItemID
This eliminates the need for NULL (and the gotchas involved), and it allows both uncategorized items and items that belong to several categories. This design is also in Boyce-Codd normal form, which is always a good thing.
en.wikipedia.org/wiki/BCNF
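The three-table design could look roughly like this, sketched with Python's sqlite3 (SQLite as a stand-in; the composite primary key on Categories2Items and the sample rows are assumptions, since the answer only names the columns). An uncategorized item is simply one with no row in the junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Name VARCHAR(50) NOT NULL);
CREATE TABLE Items      (ID INTEGER PRIMARY KEY, Name VARCHAR(50) NOT NULL);
CREATE TABLE Categories2Items (
    CategoryID INTEGER NOT NULL REFERENCES Categories(ID),
    ItemID     INTEGER NOT NULL REFERENCES Items(ID),
    PRIMARY KEY (CategoryID, ItemID)   -- assumed: one link per pair
);
INSERT INTO Categories VALUES (1, 'Tools'), (2, 'Gifts');
INSERT INTO Items VALUES (1, 'Hammer'), (2, 'Mystery box');
INSERT INTO Categories2Items VALUES (1, 1), (2, 1);  -- Hammer is in two categories
""")

# Items with no junction row are the uncategorized ones.
uncategorized = conn.execute("""
    SELECT i.Name FROM Items i
    WHERE NOT EXISTS (SELECT 1 FROM Categories2Items ci WHERE ci.ItemID = i.ID)
""").fetchall()

# One item can carry several categories.
hammer_categories = conn.execute("""
    SELECT c.Name
    FROM Categories2Items ci JOIN Categories c ON c.ID = ci.CategoryID
    WHERE ci.ItemID = 1
    ORDER BY c.ID
""").fetchall()
```

Note that listing every item alongside its categories still needs an outer join, so the NULLs reappear in query results even though none are stored.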

Related

Modelling many-to-many relation between more than two tables

I'm modelling a tier-list database using PostgreSQL. This is how it works:
A user can create a new Tier List;
A user can add as many tiers as he wants to the list;
A user can add as many items as he can. Initially, the items are added to an "unranked" section (not assigned to any tier), then the user can rank them as he wants.
Modeling details:
A tier necessarily belongs to a tier_list;
An item can be in multiple tier_lists and in multiple tiers as well;
An item added to a tier_list has not necessarily been added to one of the tiers.
For modelling the relations between item-tier and item-tier_list, I thought about two scenarios:
Creating a junction with a composite PFK key of item and tier_list with a nullable tier FK. The records with no tier value would be the unranked ones, while the ones with an assigned tier would be the ranked;
Creating two M-N relations: one between item and tier, storing ranked items, and another between item and tier_list, storing unranked items.
I feel like the first option would be easier to deal with when having to persist things like moving a product between tiers (or even unranking it), while the second looks more compliant to SQL standards. Am I missing something?
First proposed solution model:
Second proposed solution model:
You can create a joint key using 3 different fields.
First of all, why use smallint and not int? I'm not fluent in Postgres, but it's usually better to have the biggest integer possible as the primary key (things can grow faster than you expect).
Second, I strongly suggest putting ID_ before and not after the name of the field used for lookup. It makes it easier to read.
As how to build your tables:
Item
ID PK
Title
Descriptions
I see no problems here. I'd just change the name to tblProducts, for easier reading.
Tier_List
ID PK
Description
Works fine too. Again, I'd look for a better name. I'd call this one tblTiers or tblLeagues instead. Using similar names can cause trouble in 2-3 years when you have to add things and you're not sure what's what. Better to use distinctive names for the tables.
Tier (suggesting tblTiers or tblRankings)
ID PK
Tier_List_ID PK FK
Title
Description
Here I see a HUGE problem. From experience, I don't really understand why you create a composite key here with ID and Tier_List_ID. Do you need to reuse the same ID for different tiers? If that ID has a meaning, take it out of the PK, absolutely! PKs must be simple counters that will NEVER be changed. I have seen people use an ID with a meaning for the end user. It was a total disaster! I can't even begin to describe the quantity of garbage data that DB contained.
I suppose, because you were talking about ranking, that the ID there is a Rank, a level or something like that.
The table should become
ID PK uuid
Tier_List_ID FK
Rank smallint
Title
Description
There's another reason why I had you do this: when you have a composite PK, certain RDBMSs require you to use the same composite key in the lookup tables, and that can become messy fast!
Now, the lookup table:
tier_list_item (tblRankingLookup?)
ID_Product FK PK
ID_Tier_List FK PK
ID_Tier FK PK
You don't need anything else to make it work smoothly! At least, that's how I'd envision it.
Instead I'd add an ID_User (because I'm not sure if all users can see all tiers and all rankings, or they can see only theirs).
Addendum: if you need to have unique combinations of different elements, I'm pretty sure you can create a combined index and mark it as "unique" (don't remember the correct syntax, not sure it is the same in Postgres).
For example, if you want each rank to appear only once per Tier_List_ID in the Tier table, you can create an index on Tier_List_ID and Rank and mark it unique. This way, two tiers in the same tier_list cannot have the same value for the field Rank (Rank can still be null).
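A small sketch of that unique composite index, run through Python's sqlite3. SQLite is a stand-in here; PostgreSQL accepts the same CREATE UNIQUE INDEX statement and, by default, also treats NULLs as distinct. Integer IDs are used instead of the suggested uuid to keep the example short, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE tier_list (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE tier (
    id           INTEGER PRIMARY KEY,
    tier_list_id INTEGER NOT NULL REFERENCES tier_list(id),
    rank         SMALLINT,            -- nullable on purpose
    title        TEXT NOT NULL
);
-- A rank may appear only once per tier list.
CREATE UNIQUE INDEX tier_list_rank_uq ON tier (tier_list_id, rank);

INSERT INTO tier_list VALUES (1, 'Best snacks');
INSERT INTO tier VALUES (1, 1, 1, 'S'), (2, 1, 2, 'A');
-- Several NULL ranks in the same list are accepted.
INSERT INTO tier VALUES (3, 1, NULL, 'Draft A'), (4, 1, NULL, 'Draft B');
""")

try:
    # Same tier list, same rank as tier 1: the unique index rejects it.
    conn.execute("INSERT INTO tier VALUES (5, 1, 1, 'Duplicate S')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

tier_count = conn.execute("SELECT COUNT(*) FROM tier").fetchone()[0]
```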

Relational Database Design: Conditionals

I'm designing a relational database that I plan to implement with SQL. I have a use case that I'm working on and seem to be having a bit of trouble thinking through the solution. The design is for an e-commerce order system.
Use Case:
The ORDER_DETAILS table contains a deliveryMethod attribute. I then have a SHIPPING_DETAILS table that contains address information and a PICKUP_DETAILS table that contains location, date, and time information for an in-person pickup. When a user places an order, they have the option to have their order shipped to their address or to pick up their order in person. My current thought is to have a shippingId foreign key and pickupId foreign key in the ORDER_DETAILS table. Then, basically run a conditional check on the deliveryMethod attribute and retrieve data from the appropriate table depending on the value of that attribute (either "shipping" or "pickup"). With this thought, however, I would be allowing for null values to be present in the ORDER_DETAILS for either the shippingId or the pickupId attributes. From my understanding, null values are viewed negatively in relational designs. So I'm looking for some feedback on this design. Is this okay? Am I overthinking the nulls? Is there a more efficient way to design this particular schema?
If I understand your problem correctly,
1. The cardinality of the relationship of ORDER to SHIPPING is 1 ---> (0, 1)
2. The cardinality of the relationship of ORDER to PICKUP is 1 ---> (0, 1)
3. An ORDER MUST have either a SHIPPING or a PICKUP, but not both.
To enforce the constraint (#3) you could define a functional constraint in the database. That gets into interesting stuff.
Anyway, like you say, you could make columns in ORDER that are FKs to the SHIPPING or PICKUP tables, but both of those are nullable. I don't think null FKs are evil or anything, but they do get messy especially if you had a whole bunch of delivery methods and not just two.
If you don't like the nulls, you could have separate association tables: (1) ORDER_DELIVERY, which has just an order_id and a delivery_id, each an FK to the respective table, and (2) ORDER_PICKUP, also a two-column table. In each case the primary key would be order_id. Now there are no nulls: the orders with delivery are in the ORDER_DELIVERY table and the orders with pickup are in ORDER_PICKUP.
Of course there's a tradeoff, as maintaining the constraint that there be exactly one delivery method is now a consistency check across tables.
Another idea is to make the delivery and pickup details be JSON fields. Here you are doing more work on the application side, enforcing constraints programmatically, but you won't have nulls.
I wish I could say that there was a slam-dunk go-to design pattern here, but I don't see one. Personally with only two types of delivery methods, I would not shy from having nulls (as I'm not a purist). But I do love it when the database does the work, so....
(Oh, the answer to the question "are you over thinking things?" is no, this thinking is really good!)
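For the two-nullable-FK variant, one possible way to let the database enforce "either shipping or pickup, but not both" is a CHECK constraint. This is a sketch under assumptions, not necessarily what the answer had in mind by a functional constraint: the snake_case table and column names are illustrative, and SQLite (via Python's sqlite3) stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE shipping_details (id INTEGER PRIMARY KEY, address TEXT NOT NULL);
CREATE TABLE pickup_details (
    id INTEGER PRIMARY KEY, location TEXT NOT NULL, pickup_at TEXT NOT NULL
);
CREATE TABLE order_details (
    id          INTEGER PRIMARY KEY,
    shipping_id INTEGER REFERENCES shipping_details(id),
    pickup_id   INTEGER REFERENCES pickup_details(id),
    -- Exactly one of the two FKs must be set.
    CHECK ((shipping_id IS NULL) <> (pickup_id IS NULL))
);
INSERT INTO shipping_details VALUES (1, '1 Main St');
INSERT INTO pickup_details VALUES (1, 'Downtown store', '2024-01-05 10:00');
""")

def try_insert(order_id, shipping_id, pickup_id):
    """Return True if the order row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO order_details VALUES (?, ?, ?)",
            (order_id, shipping_id, pickup_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(1, 1, None),     # shipping only
    try_insert(2, None, 1),     # pickup only
    try_insert(3, None, None),  # neither
    try_insert(4, 1, 1),        # both
]
```

With this constraint in place, the deliveryMethod value can be derived from which FK is set instead of being stored separately.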

database design, items and orders tables

I was just after some input on database design. I have two tables, Orders and Items.
The Items table is going to be a list of items that can be used on multiple orders; each item has an id.
The way I thought to do it at the moment was to put an array of comma separated ids in the order, one for each item in the order.
Does that sound like the best way?
Also, I'm using LINQ to Entity Framework and I don't think I'd be able to create a relationship between the tables, but I don't think one is needed anyway, is there, since the items are not unique to an order?
Thanks for any advice
The way I thought to do it at the moment, was in the order to put an array of comma separated ids for each item in the order. Does that sound like the best way?
Absolutely not - It will be MUCH more difficult in SQL to determine which orders contain a particular item, enumerate the items (to get a total, for example), and to add/remove items from an order.
A much better way would be to create an OrderItem table, which has a foreign key back to Order and Item and any other attributes relating to the item in that order - quantity, discount, comments, etc.
As far as EF goes, it will probably create a third entity (OrderItem) that will "link" the two tables. If you don't add any extra properties (which you probably should) then EF will probably create it as a many-to-many relationship between the Order and Item entities.
As far as I have understood from your question (it is not very clear), every Order can have multiple Items and every Item can be used in multiple orders. If this is what you want, you have a many to many relationship, that must be resolved using an intersection entity. This intersection entity has 2 foreign keys, one for item and one for order. Using it, you can identify what items are in a certain order and what orders need a certain item.
As my explanation is very short and very sloppy, I will recommend you the following references:
http://sd271.k12.id.us/lchs/faculty/bkeylon/Oracle/database_design/section5/dd_s05_l03.pdf
Resolve many to many relationship
Also, your proposed design is very bad, as it breaks first normal form: no attribute can have multiple values. You should try to build databases at least in third normal form.
Regarding the database design, you would usually create a third table - ORDER_ITEMS - linking the two tables, containing columns (foreign keys) for order id and item id. You might also want to include a column for quantity.
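A minimal sketch of the linking table the answers describe, using Python's sqlite3 (SQLite as a stand-in). The snake_case table names, the price_cents column, and the sample rows are assumptions added for illustration. It shows the two queries that a comma-separated list makes hard: finding the orders that contain an item, and totalling an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE items  (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                     price_cents INTEGER NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE order_items (
    order_id INTEGER NOT NULL REFERENCES orders(id),
    item_id  INTEGER NOT NULL REFERENCES items(id),
    quantity INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, item_id)
);
INSERT INTO items VALUES (1, 'Widget', 250), (2, 'Gadget', 1000);
INSERT INTO orders VALUES (1), (2);
INSERT INTO order_items VALUES (1, 1, 4), (1, 2, 1), (2, 1, 2);
""")

# Which orders contain item 1?
orders_with_widget = [row[0] for row in conn.execute(
    "SELECT order_id FROM order_items WHERE item_id = ? ORDER BY order_id", (1,)
)]

# Total for order 1: 4 * 250 + 1 * 1000 cents.
order_1_total = conn.execute("""
    SELECT SUM(oi.quantity * i.price_cents)
    FROM order_items oi JOIN items i ON i.id = oi.item_id
    WHERE oi.order_id = 1
""").fetchone()[0]
```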

Site-wide comments with different type of pages and special requirements

I am interested in designing the database (well, I'm only concerned about one table really) for a site with the following requirements:
There is an items page, which lists items. items.xyz?id=t displays the item with ID t. I need the IDs of the items to be consecutive. The first item has ID 1, the second ID 2 and so on. Each item page has comments on that item.
There are other pages, such as objects, where objects.xyz?id=t displays the object with ID t. The IDs here need not necessarily be consecutive (and they can overlap with item IDs, but it's ok if you suggest something that forces them not to overlap). These also have comments.
My question is how to design the Comments table? If I have an EntityID in it that represents the page the comment should be displayed on (be it an item page or an object page), then should I make it so that the ItemID never overlaps the ObjectID by making all ObjectID start from, say, 10^9 and using a GUID table? (The ItemIDs increase very slowly). Is this acceptable practice?
Right now I'm doing it by having a bunch of nullable boolean fields in each comment: IsItem, IsObjectType1, IsObjectType2, ..., which allows me to know where each comment should be displayed. This isn't so bad since I only have a few objects, but it seems like an ugly hack.
What is the best way to go about this?
I see three solutions (assuming it is impossible or undesired to put Pages and Objects in one table). Either:
Tell the comment which it belongs to by giving it two columns: PageId and ObjectId.
That way you can also give these columns foreign keys to the respective tables and add proper indexes.
Introduce a table 'Entity' that has a unique id, a PageId and an ObjectId. Either column is optional, of course, but exactly one of them must be filled, not zero or both.
This way, you move all the potential garbage of having separate entities to this table, not polluting the Comments table, which should contain just comments. You isolate the mess.
The comments table can then contain only a single EntityId that refers to the Id in the Entity table.
The big advantage to this approach is twofold:
a) You can link other things to the same table too, without much hassle.
b) You can add other kinds of Entities and they will automatically support Comments and other things you might add, as mentioned in a).
Create a link table between Comments and Items and another table between Comments and Objects. Items and Objects are completely unrelated, and you don't have to pollute the Comments table with a lot of NULL values in multiple columns. When you create a comment, you decide if it links to an Item or an Object by inserting a link in either ItemComments or ObjectComments. Reading comments for an item or object is a matter of two simple joins.
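The link-table option (ItemComments and ObjectComments) might be sketched like this with Python's sqlite3, SQLite standing in for the real database. The column names and sample rows are assumptions. Making CommentID the primary key of each link table is also an assumption; it keeps a comment from being attached to two items, though nothing here stops the same comment from appearing in both link tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Items    (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Objects  (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Comments (ID INTEGER PRIMARY KEY, Body TEXT NOT NULL);
CREATE TABLE ItemComments (
    CommentID INTEGER PRIMARY KEY REFERENCES Comments(ID),
    ItemID    INTEGER NOT NULL REFERENCES Items(ID)
);
CREATE TABLE ObjectComments (
    CommentID INTEGER PRIMARY KEY REFERENCES Comments(ID),
    ObjectID  INTEGER NOT NULL REFERENCES Objects(ID)
);
-- Item 1 and object 1 share an ID value without any conflict.
INSERT INTO Items VALUES (1, 'First item');
INSERT INTO Objects VALUES (1, 'Some object');
INSERT INTO Comments VALUES (1, 'Nice item'), (2, 'Nice object'),
                            (3, 'Still a nice item');
INSERT INTO ItemComments VALUES (1, 1), (3, 1);
INSERT INTO ObjectComments VALUES (2, 1);
""")

item_comments = [row[0] for row in conn.execute("""
    SELECT c.Body
    FROM Comments c JOIN ItemComments ic ON ic.CommentID = c.ID
    WHERE ic.ItemID = ? ORDER BY c.ID
""", (1,))]

object_comments = [row[0] for row in conn.execute("""
    SELECT c.Body
    FROM Comments c JOIN ObjectComments oc ON oc.CommentID = c.ID
    WHERE oc.ObjectID = ? ORDER BY c.ID
""", (1,))]
```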

Implementing many-to-many with one "primary" value

I have many products that can each be in many categories.
products: id, ...
products_categories: product_id, category_id
categories: id, ...
Now I want to have many products, each with one master category, and 0 or more secondary categories. I can think of two ways to model this in SQL.
Add an is_primary column to products_categories
OR
Add a primary_category_id column to products
What is the best way to implement this in pure SQL and/or ActiveRecord? I'm using PostgreSQL, for what it's worth.
I would go with the first option unless I had a good reason for choosing option 2 (like the cost of an extra join when getting the primary category).
Reason: you probably need to add the primary category to the products_categories table anyway (in order to use it in a uniform and simple way in queries like getting all categories for a product).
Option 1 avoids duplicating the primary category, so it is simpler.
I would go with option (1). The reason is that, since your products can belong to more than one category, the relationship attribute (that it's a 'primary' category) belongs in the table that defines the relationship.
I would even go further and suggest that instead of labeling the field 'is_primary', you should have the field labeled as 'association_type'. And instead of just adding a bit field, make it an integer field, and have all the association types defined. In your case today, there are only two association types - secondary and primary. The advantage is that this design is much more scalable. If tomorrow, you are asked to define a 'primary', a 'secondary' and all other tertiary categories, this design will be able to handle it, instead of having to add another field to designate the 'secondary' field.
It really depends on the exact details of what you're trying to accomplish. Here are some of the things to consider while deciding what's best for you. Other answers already tackled the first case, so I'm going to focus on the second one.
If you have primary_category_id:
It seems cleaner to have one field in product that tells which category is the primary one, than to have a field in every product_category which has 1 in one row and 0 in every other row, although the suggestion by M.R. to use association_type sounds clean too - but what's the chance you're going to have "tertiary" categories?
It's slightly easier to get to the primary category
It's easy to ensure every product always has a primary category (just make the field NOT NULL)
It automatically enforces that a product may only have one primary category
Should you also insert the primary category into products_categories?
Neither option is enforced.
If you don't, it's awkward to query all the categories
If you do, it's still easy to query, but without additional work, nothing guarantees the primary category is also inserted in the other table
If you use the is_primary method, you should somehow ensure that every product always has exactly one primary category.
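One way to get part of that guarantee for the is_primary method is a partial unique index, which both PostgreSQL and SQLite support. The sketch below is an assumption on my part rather than something proposed in the answers, and it only enforces "at most one" primary category per product, not "at least one". It runs on SQLite through Python's sqlite3, with is_primary stored as 0/1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products_categories (
    product_id  INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    is_primary  INTEGER NOT NULL DEFAULT 0 CHECK (is_primary IN (0, 1)),
    PRIMARY KEY (product_id, category_id)
);
-- Only rows flagged as primary are indexed, so each product
-- can have at most one of them.
CREATE UNIQUE INDEX one_primary_per_product
    ON products_categories (product_id)
    WHERE is_primary = 1;
""")

def try_link(product_id, category_id, is_primary):
    """Return True if the link is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO products_categories VALUES (?, ?, ?)",
            (product_id, category_id, is_primary),
        )
        return True
    except sqlite3.IntegrityError:
        return False

first_primary = try_link(1, 10, 1)    # accepted
secondary = try_link(1, 11, 0)        # accepted, any number of these
second_primary = try_link(1, 12, 1)   # rejected by the partial index
other_product = try_link(2, 10, 1)    # accepted, different product
```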
What are each way's pros and cons?
Option 1. I can be sure that the primary category for a product is indeed one of its categories. But there may be a problem of ensuring that a product has no more than one primary category.
Option 2. This lets me make sure that a product has only one primary category. But then I don't seem to have a way to make sure that it's one of this same product's categories.
So, I would probably go for a third option, using a table Products_PrimaryCategories:
Products_PrimaryCategories: product_id, category_id
It seems the same as products_categories, but has some additional properties:
product_id has an associated unique index, making sure you can only have one primary category for each product;
(product_id, category_id) is a foreign key referencing products_categories (product_id, category_id) ensuring that a product's primary category is one of its categories (which implies that (product_id, category_id) should be products_categories's primary key).
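The third option can be sketched as follows, using Python's sqlite3 with foreign key enforcement switched on (SQLite is a stand-in; the same DDL shape should carry over to PostgreSQL). The sample IDs are made up. The primary key on product_id plays the role of the unique index, and the composite foreign key ties the primary category to an existing products_categories row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE products   (id INTEGER PRIMARY KEY);
CREATE TABLE categories (id INTEGER PRIMARY KEY);
CREATE TABLE products_categories (
    product_id  INTEGER NOT NULL REFERENCES products(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    PRIMARY KEY (product_id, category_id)
);
CREATE TABLE products_primary_categories (
    product_id  INTEGER PRIMARY KEY,   -- at most one primary per product
    category_id INTEGER NOT NULL,
    FOREIGN KEY (product_id, category_id)
        REFERENCES products_categories (product_id, category_id)
);
INSERT INTO products VALUES (1);
INSERT INTO categories VALUES (10), (11), (12);
INSERT INTO products_categories VALUES (1, 10), (1, 11);
""")

def try_set_primary(product_id, category_id):
    """Return True if the primary-category row is accepted, False otherwise."""
    try:
        conn.execute(
            "INSERT INTO products_primary_categories VALUES (?, ?)",
            (product_id, category_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

not_a_member = try_set_primary(1, 12)    # rejected: 12 is not one of product 1's categories
valid_primary = try_set_primary(1, 10)   # accepted
second_primary = try_set_primary(1, 11)  # rejected: product 1 already has a primary
```

As with the other options, nothing here forces every product to have a primary category; it only guarantees that an assigned one is unique and valid.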