Database Design: Common Foreign Keys to a Single Table - oop

I am still new to database design and I'm having a bit of difficulty with it.
I have a product table with main category and subcategory attributes, and I am having difficulty identifying a good design for their database tables and class diagram.
For example, an Orange's main_category would be "Fruit" and its subcategory would be "pulpy fruit", while Broccoli's main_category would be "Vegetable" and its subcategory would be "Fibrous Vegetables". So I believe each subcategory belongs to a main_category: depending on the main_category selected, a list of subcategories can be chosen.
My current design is like this:
=======================      =======================      =====================
| PRODUCT             |      | SUBCATEGORY         |      | MAIN CATEGORY     |
=======================      =======================      =====================
| PK id               |      | PK id               |      | PK id             |
| name                | ---> | name                | ---> | name              |
| FK main_category_id |      | FK main_category_id |      =====================
| FK subcategory_id   |      =======================                ^
======================= --------------------------------------------|
But I'm having doubts with my current design and would like to ask for your opinions whether this design is acceptable or is there a better way of designing this.
Also, in terms of OOP, what type of relationship would this be?

Since a subcategory is also a category, you can keep them all in a single table. You could use a design like this:
Product(id, name)
Category(id, name)
ProductMaincategoryAssignments(productId, categoryId)
ProductSubcategoryAssignments(productId, categoryId)
Here productId and categoryId are foreign keys referencing the corresponding id columns in the Product and Category tables.
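To make the four-table layout above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column types, the sample rows, and the choice of productId as primary key of each assignment table (so a product gets at most one main category and one subcategory) are assumptions for illustration, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Product  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);

-- Assumption: productId as primary key allows one main category per product.
CREATE TABLE ProductMaincategoryAssignments (
    productId  INTEGER PRIMARY KEY REFERENCES Product(id),
    categoryId INTEGER NOT NULL REFERENCES Category(id)
);
-- Assumption: likewise, one subcategory per product.
CREATE TABLE ProductSubcategoryAssignments (
    productId  INTEGER PRIMARY KEY REFERENCES Product(id),
    categoryId INTEGER NOT NULL REFERENCES Category(id)
);

INSERT INTO Product  VALUES (1, 'Orange'), (2, 'Broccoli');
INSERT INTO Category VALUES (1, 'Fruit'), (2, 'Pulpy Fruit'),
                            (3, 'Vegetable'), (4, 'Fibrous Vegetables');
INSERT INTO ProductMaincategoryAssignments VALUES (1, 1), (2, 3);
INSERT INTO ProductSubcategoryAssignments  VALUES (1, 2), (2, 4);
""")

# Resolve each product to its main category and subcategory names.
rows = conn.execute("""
    SELECT p.name, mc.name, sc.name
    FROM Product p
    JOIN ProductMaincategoryAssignments ma ON ma.productId = p.id
    JOIN Category mc ON mc.id = ma.categoryId
    JOIN ProductSubcategoryAssignments sa ON sa.productId = p.id
    JOIN Category sc ON sc.id = sa.categoryId
    ORDER BY p.id
""").fetchall()
print(rows)
```

Note that nothing in this sketch stops a product from pairing "Fruit" with "Fibrous Vegetables"; if a subcategory must belong to the chosen main category, that rule needs an extra constraint or a parent column on Category.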

Related

Is it better to use a separate table to store a list of values, or include the value directly in the current table?

I have a jobs table that stores information such as title, department, and salary. I want the user to be able to create a job using a form that has fields for this information, as well as a field for the job category. A category would be something like retail or IT, for example.
I don't have any issues with the actual coding itself, but rather with the best way to design the database to store this information. So my question is this: should I create a separate table, categories, that stores each job category along with an ID, so that the tables would look something like this
categories               jobs
+----+---------------+   +----+---------------+-------------+--------+-------------+
| id | category      |   | id | title         | department  | salary | category_id |
+----+---------------+   +----+---------------+-------------+--------+-------------+
| 1  | Retail        |   | 1  | Retail        | department1 | 10000  | 2           |
+----+---------------+   +----+---------------+-------------+--------+-------------+
| 2  | IT            |   | 2  | IT            | department2 | 12000  | 1           |
+----+---------------+   +----+---------------+-------------+--------+-------------+
where category_id is a foreign key linking to the categories table,
or should I do something like this, where all the information is stored in a single table:
jobs
+----+---------------+-------------+--------+-------------+
| id | title | department | salary | category |
+----+---------------+-------------+--------+-------------+
| 1 | Retail | department1 | 10000 | IT |
+----+---------------+-------------+--------+-------------+
| 2 | IT | department2 | 12000 | Retail |
+----+---------------+-------------+--------+-------------+
Which is the better option? They both seem to achieve the same result, but what are the pros and cons of doing it either way, and which way would be the more preferred way of doing it?
In general, you want to store "entities" in separate tables. In this case, category is a separate entity from jobs.
Why do you want to do this?
There is only one row per category, so you don't have to worry about duplication -- and errors.
There may be additional information that you want to store, such as the creation date, abbreviation, who created it, and so on.
Properly declared foreign key constraints ensure that only valid categories are stored.
Categories may be shared across different tables, and a separate reference table ensures that the values are consistent.
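As a rough illustration of the points above, the following sketch builds the two-table version in an in-memory SQLite database through Python's sqlite3 module and shows the foreign key rejecting an unknown category. The job titles and the id 99 are made-up sample values, and the PRAGMA line is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
CREATE TABLE categories (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL UNIQUE          -- one row per category, no duplicates
);
CREATE TABLE jobs (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    department  TEXT,
    salary      INTEGER,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
INSERT INTO categories VALUES (1, 'Retail'), (2, 'IT');
""")

# A job pointing at an existing category is accepted.
conn.execute("INSERT INTO jobs VALUES (1, 'Cashier', 'department1', 10000, 1)")

# A job pointing at a category that does not exist is rejected.
try:
    conn.execute("INSERT INTO jobs VALUES (2, 'Developer', 'department2', 12000, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Renaming a category touches exactly one row, however many jobs use it.
conn.execute("UPDATE categories SET category = 'Information Technology' WHERE id = 2")

job_count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
print(rejected, job_count)
```

With the single-table variant, the same typo protection would need a CHECK constraint or application code, and a rename would have to update every matching job row.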

Foreign key for many to many relationship in sql

I have an SQL table called Listing that represents houses that have been rented. The table has a primary key id and another field called amenities, listing the things each house has to offer. The amenities of each house are separated from each other by commas, for example: TV, Internet, Bathroom.
I used the following commands to create a table called Amenity with all the unique different amenities offered and a SERIAL number for each amenity.
CREATE TABLE Amenity AS(
SELECT DISTINCT regexp_split_to_table(amenities,',') FROM Listing
);
ALTER TABLE Amenity
RENAME regexp_split_to_table to amenity_name;
ALTER TABLE Amenity ADD COLUMN amenity_id SERIAL;
ALTER TABLE Amenity ADD PRIMARY KEY(amenity_id);
My problem is that I need to connect these two tables with a foreign key and I don't know how since the relationship between them is a many to many relationship. I have checked other questions regarding foreign keys in many to many relations but could not find anything similar. If there exists something similar please explain the way it is similar to my question.
You must create another table (a junction table) that holds the many-to-many relationship between houses and amenities: each of its rows pairs one house with one amenity.
So your 3 tables would look like this:
Table HOUSE
+----------+------------+
| house_id | house_name |
+----------+------------+
| 1 | Uncle Bob |
+----------+------------+
| 2 | Mom Sara |
+----------+------------+
Table AMENITIES
+------------+--------------+
| amenity_id | amenity_name |
+------------+--------------+
| 1 | TV |
+------------+--------------+
| 2 | Internet |
+------------+--------------+
| 3 | Kitchen |
+------------+--------------+
Table HOUSE_AMENITIES
+----------+------------+
| house_id | amenity_id |
+----------+------------+
| 1 | 1 |
+----------+------------+
| 2 | 1 |
+----------+------------+
| 2 | 2 |
+----------+------------+
| 2 | 3 |
+----------+------------+
So the house Uncle Bob has only TV while the house Mom Sara has TV, Internet and a fully equipped kitchen.
Remember - you should never use the same column to store multiple values (separated with comma). In all such cases you have to use another table, converting the multiple comma-separated values into distinct rows inside this detail table and referencing the primary key of the master table.
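The conversion described above could be sketched as follows, using Python's sqlite3 module rather than the PostgreSQL functions from the question. The table names listing_amenity and amenity, and the sample rows, are assumptions for illustration; the splitting is done in Python, and each name is trimmed so that "TV" and " TV" do not end up as two different amenities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE listing (id INTEGER PRIMARY KEY, amenities TEXT);
CREATE TABLE amenity (
    amenity_id   INTEGER PRIMARY KEY,
    amenity_name TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (listing, amenity) pair.
CREATE TABLE listing_amenity (
    listing_id INTEGER NOT NULL REFERENCES listing(id),
    amenity_id INTEGER NOT NULL REFERENCES amenity(amenity_id),
    PRIMARY KEY (listing_id, amenity_id)
);
INSERT INTO listing VALUES (1, 'TV'), (2, 'TV, Internet, Kitchen');
""")

listings = conn.execute("SELECT id, amenities FROM listing ORDER BY id").fetchall()
for listing_id, amenities in listings:
    for raw in amenities.split(","):
        name = raw.strip()          # drop the space that follows each comma
        if not name:
            continue
        # Create the amenity once; later occurrences are ignored.
        conn.execute("INSERT OR IGNORE INTO amenity (amenity_name) VALUES (?)", (name,))
        (amenity_id,) = conn.execute(
            "SELECT amenity_id FROM amenity WHERE amenity_name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO listing_amenity VALUES (?, ?)",
            (listing_id, amenity_id),
        )

pairs = conn.execute("""
    SELECT la.listing_id, a.amenity_name
    FROM listing_amenity la
    JOIN amenity a ON a.amenity_id = la.amenity_id
    ORDER BY la.listing_id, a.amenity_id
""").fetchall()
print(pairs)
```

Once the junction table is populated, the comma-separated amenities column can be dropped from the listing table.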

CONSTRAINT by checking value in other table

I have items, categories and an items_categories table like this:
table items
+----+-------+
| id | title |
+----+-------+
table categories
+----+------+------+
| id | name | main |
+----+------+------+
table items_categories
+---------+-------------+
| item_id | category_id |
+---------+-------------+
I need to make a constraint so that an item can have ONLY ONE main category, but otherwise, any amount of main = false categories.
My initial thought was to set UNIQUE(item_id, category_id) but that is just limiting an item to one category.
Then I thought about having another column on the items_categories table called main which is an exact duplicate of the main column referenced by the category_id column, like this:
table items_categories
+---------+-------------+------+
| item_id | category_id | main |
+---------+-------------+------+
UNIQUE(item_id, category_id, main)
But that is not 100% normalized data and I would want to avoid that if possible.
What you need is a partial unique index. It assumes the main column is present on items_categories, as in your second layout:
CREATE UNIQUE INDEX main_category_unique ON items_categories (item_id)
WHERE main = TRUE;
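Here is a small sketch of how such a partial unique index behaves, run against SQLite (which supports partial indexes since version 3.8.0) through Python's sqlite3 module. The integer 1/0 encoding of main and the sample ids are assumptions; in PostgreSQL the column would be a boolean.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items_categories (
    item_id     INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    main        INTEGER NOT NULL DEFAULT 0,   -- 1 = main category, 0 = secondary
    PRIMARY KEY (item_id, category_id)
);
-- Only rows with main = 1 enter the index, so uniqueness of item_id
-- is enforced among main rows and ignored for all others.
CREATE UNIQUE INDEX main_category_unique
    ON items_categories (item_id)
    WHERE main = 1;

-- One main category plus any number of secondary ones is fine.
INSERT INTO items_categories VALUES (1, 10, 1), (1, 11, 0), (1, 12, 0);
""")

# A second main category for the same item violates the partial index.
try:
    conn.execute("INSERT INTO items_categories VALUES (1, 13, 1)")
    second_main_rejected = False
except sqlite3.IntegrityError:
    second_main_rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM items_categories WHERE item_id = 1"
).fetchone()[0]
print(second_main_rejected, row_count)
```

The index guarantees at most one main category per item; it does not guarantee that every item has one.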

Database design, for optimized access

I'll jump straight into the problem. Currently I have a table like this:
id | model | CategoryId | etc...
My new requirement is to support multiple categories. I have two possible solutions in mind, and I would like to know what problems each of these designs might create. I know that a product can have at most 6 categories, and I can't create a linker table to link product to category.
In the first design, I would simply add CategoryIdN columns:
id | model | CategoryId1 | CategoryId2 | CategoryId3 | CategoryId4 | etc...
But this would make queries hideous.
My second approach is simply to add one product row per category:
id | model | CategoryId | etc...
1 | ABC | 1 | etc...
2 | ABC | 2 | etc...
3 | ABC | 3 | etc...
I think queries would be cleaner but not necessarily simpler.
Another aspect is that I am looking at the performance of the queries and it looks like the first approach would be better.
I hope this is clear enough.
Thanks for any suggestions.
The third option is a many-to-many table to link a model to a category:
MODEL_CATEGORIES
model_id (primary key, foreign key to MODEL table)
category_id (primary key, foreign key to CATEGORY table)
Your example data would resemble:
model_id category_id
----------------------
1 1
1 2
1 3
This means there's no need for a category_id column in the MODEL table.
I think you're after a many-to-many relationship here.
Basically, you have a model_to_categories table that matches model ids against category ids.
Can you make the CategoryID values bit-field "flags"?
If so, you could keep your performance high by using the single CategoryID field that you have now, keep your queries simple by not adding a bunch of new columns, and would have up to 32 categories for each product over time (assuming that CategoryID is an int).
So, your CategoryID values would be:
id | model | CategoryId | etc...
1 | ABC | 1 | etc...
2 | ABC | 2 | etc...
3 | ABC | 4 | etc...
3 | ABC | 8 | etc...
You would store the total of all of the CategoryIDs in the CategoryID column (for backward compatibility) and then would have to test the value of the field to find out if a specific category were "set". Incidentally, you can do that directly in your query as well.
It's not "best practice", which would really require a brand-new table and a lot of joining, but if you are looking for a way to shoe-horn something in there that will work, bit-field flags will do the trick.
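A rough sketch of the bit-field idea, again using Python's sqlite3 module: each category gets a power-of-two value, the CategoryId column stores the sum (bitwise OR) of a product's categories, and a bitwise AND in the WHERE clause tests for one of them. The flag names, the table name product, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical category flags: one bit per category.
CAT_A, CAT_B, CAT_C, CAT_D = 1, 2, 4, 8

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, model TEXT, CategoryId INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [
        (1, "ABC", CAT_A | CAT_C),          # stored as 5
        (2, "DEF", CAT_B),                  # stored as 2
        (3, "GHI", CAT_A | CAT_B | CAT_D),  # stored as 11
    ],
)

def models_in(flag):
    """Return the models whose CategoryId has the given bit set."""
    rows = conn.execute(
        "SELECT model FROM product WHERE (CategoryId & ?) != 0 ORDER BY id",
        (flag,),
    )
    return [r[0] for r in rows]

in_a_before = models_in(CAT_A)

# Adding a category to a product sets one more bit with bitwise OR.
conn.execute("UPDATE product SET CategoryId = CategoryId | ? WHERE id = 2", (CAT_A,))
in_a_after = models_in(CAT_A)
print(in_a_before, in_a_after)
```

One cost to keep in mind: a predicate such as (CategoryId & ?) != 0 generally cannot use an ordinary index on the column, so filtering by category tends to scan the table.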

Ways to implement tags - pros and cons of each

Using SO as an example, what is the most sensible way to manage tags if you anticipate they will change often?
Way 1: Seriously denormalized (comma delimited)
table posts
+--------+-----------------+
| postId | tags |
+--------+-----------------+
| 1 | c++,search,code |
Here tags are comma delimited.
Pros: Tags are retrieved at once with a single select query. Updating a post's tags is simple and cheap.
Cons: Extra parsing on tag retrieval, difficult to count how many posts use which tags.
(alternatively, if limited to something like 5 tags)
table posts
+--------+-------+-------+-------+-------+-------+
| postId | tag_1 | tag_2 | tag_3 | tag_4 | tag_5 |
+--------+-------+-------+-------+-------+-------+
| 1 | c++ |search | code | | |
Way 2: "Slightly normalized" (separate table, no intersection)
table posts
+--------+-------------------+
| postId | title |
+--------+-------------------+
| 1 | How do u tag? |
table taggings
+--------+---------+
| postId | tagName |
+--------+---------+
| 1 | C++ |
| 1 | search |
Pros: Easy to see tag counts (SELECT count(*) FROM taggings WHERE tagName = 'C++').
Cons: tagName will likely be repeated many, many times.
Way 3: The cool kid's (normalized with intersection table)
table posts
+--------+---------------------------------------+
| postId | title |
+--------+---------------------------------------+
| 1 | Why is a raven like a writing desk? |
table tags
+--------+---------+
| tagId | tagName |
+--------+---------+
| 1 | C++ |
| 2 | search |
| 3 | foofle |
table taggings
+--------+---------+
| postId | tagId |
+--------+---------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
Pros:
No repeating tag names.
More girls will like you.
Cons: More expensive to change tags than way #1.
These solutions are called mysqlicious, scuttle and toxi.
This article compares benefits and drawbacks of each.
I would argue that there is a fourth solution which is a variation on your third solution:
Create Table Posts
(
id ...
, title ...
)
Create Table Tags
(
name varchar(30) not null primary key
, ...
)
Create Table PostTags
(
PostId ...
, TagName varchar(30) not null
, Constraint FK_PostTags_Posts
Foreign Key ( PostId )
References Posts( Id )
, Constraint FK_PostTags_Tags
Foreign Key ( TagName )
References Tags( Name )
On Update Cascade
On Delete Cascade
)
Notice that I'm using the tag name as the primary key of the Tags table. In this way, you can filter on certain tags without the extra join to the Tags table itself. In addition, if you change a tag name, it will update the names in the PostTags table. If changing a tag name is a rare occurrence, then this shouldn't be a problem. If changing a tag name is a common occurrence, then I would go with your third solution where you use a surrogate key to reference the tag.
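The behaviour described above can be tried out with a short sketch in Python's sqlite3 module. The elided column definitions from the DDL are filled in with assumed types purely so the example runs, and the PRAGMA is needed because SQLite does not enforce foreign keys (or run cascades) by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
CREATE TABLE Posts (id INTEGER PRIMARY KEY, title TEXT);          -- assumed columns
CREATE TABLE Tags  (name VARCHAR(30) NOT NULL PRIMARY KEY);
CREATE TABLE PostTags (
    PostId  INTEGER NOT NULL REFERENCES Posts(id),
    TagName VARCHAR(30) NOT NULL
        REFERENCES Tags(name) ON UPDATE CASCADE ON DELETE CASCADE,
    PRIMARY KEY (PostId, TagName)
);
INSERT INTO Posts VALUES (1, 'How do u tag?'),
                         (2, 'Why is a raven like a writing desk?');
INSERT INTO Tags VALUES ('c++'), ('search');
INSERT INTO PostTags VALUES (1, 'c++'), (1, 'search'), (2, 'c++');
""")

def posts_tagged(tag):
    # Filtering by tag needs no join to Tags: the name is stored in PostTags.
    rows = conn.execute(
        "SELECT PostId FROM PostTags WHERE TagName = ? ORDER BY PostId", (tag,)
    )
    return [r[0] for r in rows]

before = posts_tagged("c++")

# Renaming the tag in Tags cascades into every PostTags row that uses it.
conn.execute("UPDATE Tags SET name = 'cpp' WHERE name = 'c++'")
after_new = posts_tagged("cpp")
after_old = posts_tagged("c++")
print(before, after_new, after_old)
```

The cascade is what makes a rename expensive here: one UPDATE on Tags rewrites as many PostTags rows as there are posts carrying that tag.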
I personally favour solution #3.
I don't agree that solution #1 is easier to maintain.
Think of the situation where you have to change the name of a tag.
Solution #1:
UPDATE posts SET tags = REPLACE(tags, 'oldname', 'newname') WHERE tags LIKE '%oldname%'
Solution #3:
UPDATE tags SET tagName = 'newname' WHERE tagName = 'oldname'
The first one is way heavier: it has to scan and rewrite every matching post, and it also changes any other tag that merely contains oldname as a substring.
Also you have to deal with the commas when deleting tags (OK, it's easily done, but it is still more difficult than just deleting one line in the taggings table).
As for solution #2, it is neither fish nor fowl.
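To compare the two rename statements side by side, here is a small sketch with Python's sqlite3 module. The table name posts_csv and the sample tags are made up so that both layouts can live in one database; the tags "java" and "javascript" are chosen deliberately to show what a substring-based REPLACE does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Solution #1: comma-delimited tags (hypothetical table name).
CREATE TABLE posts_csv (postId INTEGER PRIMARY KEY, tags TEXT);
INSERT INTO posts_csv VALUES (1, 'java,search'), (2, 'javascript,code');

-- Solution #3: tags table plus intersection table.
CREATE TABLE tags (tagId INTEGER PRIMARY KEY, tagName TEXT NOT NULL UNIQUE);
CREATE TABLE taggings (
    postId INTEGER NOT NULL,
    tagId  INTEGER NOT NULL REFERENCES tags(tagId),
    PRIMARY KEY (postId, tagId)
);
INSERT INTO tags VALUES (1, 'java'), (2, 'search'), (3, 'javascript'), (4, 'code');
INSERT INTO taggings VALUES (1, 1), (1, 2), (2, 3), (2, 4);
""")

# Rename 'java' to 'jvm' in the comma-delimited layout.
conn.execute(
    "UPDATE posts_csv SET tags = REPLACE(tags, 'java', 'jvm') WHERE tags LIKE '%java%'"
)
csv_tags = [r[0] for r in conn.execute("SELECT tags FROM posts_csv ORDER BY postId")]

# Rename 'java' to 'jvm' in the normalized layout: one row changes.
conn.execute("UPDATE tags SET tagName = 'jvm' WHERE tagName = 'java'")
post2_tags = [
    r[0]
    for r in conn.execute("""
        SELECT t.tagName FROM taggings tg
        JOIN tags t ON t.tagId = tg.tagId
        WHERE tg.postId = 2
        ORDER BY t.tagId
    """)
]
print(csv_tags, post2_tags)
```

In the comma-delimited layout, post 2 ends up tagged "jvmscript"; in the normalized layout its tags are untouched.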
I think that SO uses solution #1. I'd go with either #1 or #3.
One thing to consider is whether you have several kinds of things that can be tagged (e.g. adding tags to both posts and products). This may affect the database design.
I had the same doubt, and I adopted the third solution for my website. I know there is another way of dealing with this problem of variable-length tuples, which consists of storing columns as rows: the information identifying the tuple is repeated redundantly on every row, and each varying value gets a row of its own.
+--------+-------+-------------------------------------+
| postId | label | value |
+--------+-------+-------------------------------------+
| 1 | tag |C++ |
+--------+-------+-------------------------------------+
| 1 | tag |search |
+--------+-------+-------------------------------------+
| 1 | tag |code |
+--------+-------+-------------------------------------+
| 1 | title | Why is a raven like a writing desk? |
+--------+-------+-------------------------------------+
This is really bad but sometimes it's the only feasible solution, and it's very far from the relational approach.