Structuring Categories in SQL Server 2008 R2 - sql

Hello, I looked at a few similar posts, but none match what I need to accomplish. I am trying to come up with a structure for categories using SQL Server 2008 R2.
I want to make categories for, let's say, Clothing, Electronics, Furniture, Tools, and so on.
I am looking at a three-field category table to start with (categoryID (PK), categoryname, parentID), which from what I am finding is standard practice and can go several layers deep without having to restructure.
That structure is fine for, let's say, (electronics-cd players-cd changer), (electronics-lighting-studio lighting) or (clothing-womens-skirts), (clothing-womens-pants), and perhaps one level deeper. The problem lies beyond that.
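A minimal sketch of the three-field category table described above, and of walking it with a recursive CTE. It uses Python's built-in sqlite3 module as a stand-in so it can be run anywhere; that is an assumption for illustration, not the SQL Server setup from the question. SQL Server 2008 R2 also supports recursive CTEs, but writes them with plain WITH (no RECURSIVE keyword) and uses different column types. The sample rows are made up from the examples in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL,
    parent_id     INTEGER REFERENCES category(category_id)  -- NULL for a top-level category
);
INSERT INTO category VALUES
    (1, 'Electronics', NULL),
    (2, 'CD Players', 1),
    (3, 'CD Changer', 2),
    (4, 'Lighting', 1),
    (5, 'Studio Lighting', 4),
    (6, 'Clothing', NULL),
    (7, 'Womens', 6),
    (8, 'Skirts', 7);
""")

# Recursive CTE: start at one category and walk down through its children,
# however many levels deep the tree happens to be.
subtree = conn.execute("""
WITH RECURSIVE tree(category_id, category_name, depth) AS (
    SELECT category_id, category_name, 0
    FROM category
    WHERE category_id = ?
    UNION ALL
    SELECT c.category_id, c.category_name, t.depth + 1
    FROM category AS c
    JOIN tree AS t ON c.parent_id = t.category_id
)
SELECT category_name, depth FROM tree ORDER BY depth, category_name
""", (1,)).fetchall()

for name, depth in subtree:
    print("  " * depth + name)
```

Adding a deeper level is just another row with the right parent_id; the table itself does not change.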
What do I do for brands? I was planning to have a brand table (brandID (PK), Brand)
and then a Category_Brand table (categoryID, BrandID) to link brands to categories, for when I want to use a cascading dropdown list that populates from the database.
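The brand and Category_Brand tables described here could look roughly like the sketch below. Again this is sqlite3 standing in for SQL Server, and the brand names and the brands_for_category helper are hypothetical, added only to show the query a cascading dropdown would run once a category has been picked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE brand (
    brand_id INTEGER PRIMARY KEY,
    brand    TEXT NOT NULL UNIQUE
);
-- Link table: one row per (category, brand) pair, so a brand can appear
-- under many categories and a category can list many brands.
CREATE TABLE category_brand (
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    brand_id    INTEGER NOT NULL REFERENCES brand(brand_id),
    PRIMARY KEY (category_id, brand_id)
);
INSERT INTO category VALUES (1, 'Electronics'), (2, 'Clothing');
INSERT INTO brand VALUES (1, 'BrandA'), (2, 'BrandB'), (3, 'BrandC');
INSERT INTO category_brand VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

def brands_for_category(category_id):
    """Return the brand names to show in the second dropdown."""
    rows = conn.execute(
        """
        SELECT b.brand
        FROM brand AS b
        JOIN category_brand AS cb ON cb.brand_id = b.brand_id
        WHERE cb.category_id = ?
        ORDER BY b.brand
        """,
        (category_id,),
    )
    return [row[0] for row in rows]

print(brands_for_category(1))
print(brands_for_category(2))
```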
What do I do for deeper attributes, where the attributes apply to the item itself but depend on the category? Color, pattern, material and size can apply to clothing, but not to electronics or tools. Also, men's clothing has different sizing than women's clothing.
Or take furniture, where I want to store dresser dimensions and color, or beds, where I want to store the bed size (king, queen, twin) and the type (spring, air, foam, water).
What I need is to connect the item-specific attributes to each item based on which category the item belongs to. On another forum it was suggested that I just add all the misc. attributes to the item table and leave the ones I don't use null. That doesn't make sense to me; it seems there should be different sub-attribute tables with fields related to the categories they represent. I am thinking that clothing size, for example, would have a lookup table where each size has a (sizeid), plus a link table for a many-to-many relationship to connect the size with the (itemid). There would need to be a few different size tables, because men's sizes and women's sizes are different, or I could put them all in one table with the (categoryid) as a sort of parent foreign key. Dimensions for another item (length, width, height) would be stored in their own table with the (itemid) as the foreign key?
Or is it a good idea to store the (sizeid) or (dimensionid) right into the item table?
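To make the options concrete, here is a sketch of one of the layouts the question describes: a single size lookup table scoped by category, a nullable size_id stored directly on the item, and dimensions in their own table keyed by item_id. It is not presented as the recommended answer, only as something to compare against. It uses sqlite3 as a stand-in for SQL Server, and all table names, column names and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
-- One size table for every category; category_id scopes each size,
-- so men's and women's sizes can coexist without separate tables.
CREATE TABLE size (
    size_id     INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    label       TEXT NOT NULL,
    UNIQUE (category_id, label)
);
CREATE TABLE item (
    item_id     INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    item_name   TEXT NOT NULL,
    size_id     INTEGER REFERENCES size(size_id)  -- NULL when size does not apply
);
-- Dimensions live in their own table, at most one row per item.
CREATE TABLE item_dimension (
    item_id INTEGER PRIMARY KEY REFERENCES item(item_id),
    length  REAL NOT NULL,
    width   REAL NOT NULL,
    height  REAL NOT NULL
);
INSERT INTO category VALUES (1, 'Mens Clothing'), (2, 'Womens Clothing'), (3, 'Furniture');
INSERT INTO size VALUES (1, 1, '32x34'), (2, 1, '34x34'), (3, 2, '8'), (4, 2, '10');
INSERT INTO item VALUES (1, 2, 'Skirt', 3), (2, 3, 'Dresser', NULL);
INSERT INTO item_dimension VALUES (2, 60.0, 18.0, 34.0);
""")

# Sizes offered for one category (here: Womens Clothing).
womens_sizes = [row[0] for row in conn.execute(
    "SELECT label FROM size WHERE category_id = ? ORDER BY size_id", (2,))]

# Every item with whichever optional attributes it has.
items = conn.execute("""
SELECT i.item_name, s.label, d.length
FROM item AS i
LEFT JOIN size AS s ON s.size_id = i.size_id
LEFT JOIN item_dimension AS d ON d.item_id = i.item_id
ORDER BY i.item_id
""").fetchall()

print(womens_sizes)
print(items)
```

Note that nothing in this sketch stops an item from pointing at a size that belongs to a different category; that check would need an extra constraint or application logic.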
This seemed simple when I started, but the more I look at it, the more confused I get about the correct way to structure this. I want it to perform well, as this may become a high-volume application. But doesn't everyone wish that?

Try to understand normalization first. Here is a good article for you.

Related

Why add an ID field to product categories in an online shopping database?

I have just started learning about relational databases. When I studied the databases of online shopping websites, I found that many examples create a category table and add an ID field alongside the category name. I don't understand why they create a category table and use the category ID as a foreign key in the products table. What will happen if I remove the category table and add the category name directly to the products table?
In a lot of cases you want a website menu showing your categories. This menu lets people view your categories (Men Clothing, Women Clothing, Kids, Accessories), and once they click one they see the relevant products.
If you put the category name on the product, it is hard to update your menu content, because you need to loop over the product table and group by category. It is also harder to rename a category, because the name could appear in lots of product records.
Whereas if you have a category table, you only need to maintain that table (view what is in it, and update a record when you want your menu to change).
For long-term maintenance, a category table is preferable.
I have also come across a case where I wanted an empty category to show in the website menu (a menu item that contains no products), which is not possible without a category table.
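The menu point can be sketched as follows, using sqlite3 as a stand-in database. The table names, column names and products are hypothetical; the category names are the ones mentioned above. Because the menu is read from the category table with a LEFT JOIN, a category with no products still shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(category_id)
);
INSERT INTO category VALUES
    (1, 'Men Clothing'), (2, 'Women Clothing'), (3, 'Kids'), (4, 'Accessories');
INSERT INTO product VALUES
    (1, 'Shirt', 1), (2, 'Jeans', 1), (3, 'Dress', 2), (4, 'Romper', 3);
""")

# The menu comes from the category table, so 'Accessories' is listed
# even though no product references it yet.
menu = conn.execute("""
SELECT c.name, COUNT(p.product_id) AS product_count
FROM category AS c
LEFT JOIN product AS p ON p.category_id = c.category_id
GROUP BY c.category_id, c.name
ORDER BY c.category_id
""").fetchall()

for name, product_count in menu:
    print(f"{name} ({product_count})")
```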
By inserting just the category name you may complete your POC, but you need to understand what normalization is and why it is needed.
First normal form (1NF) : An entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF) : An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF) : An entity type is in 3NF when it is in 2NF and when all of its non-key attributes are directly (not transitively) dependent on the primary key.
Source
What will happen if I remove the category table and add the category name directly to the products table?
Suppose you store the category with each product, and one day your boss tells you that you misspelled a category name. Which one?
"Theater," he says. Or did he say "theatre"? Which is correct? You check and find that "theater" and "theatre" are used about evenly among the products that have either one.
So which spelling did your boss mean is the mistake, and which one is correct?
If you store the correct spelling in one place, in its own categories table, then you can be sure. You can correct it, and all the products that reference it will implicitly get the correction.
That's an argument for normalization, but keep in mind using an integer id is only a convention. It has nothing to do with normalization. You can use a string as a primary key of a table, and therefore you can use a string as a foreign key in a table that references it.
It's okay to use a non-integer for key columns. As long as there is one instance that stores the canonical value, it satisfies the goal of normalization -- that is to reduce data anomalies.
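A small sketch of the string-key variant, with sqlite3 standing in for the real database. Table and column names and the sample products are made up, and picking 'Theater' as the corrected spelling is arbitrary. The ON UPDATE CASCADE clause is one way to get the "correct it once" behaviour when the key itself is the name; SQL Server supports the same clause on foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
-- The category name itself is the primary key: no integer id involved.
CREATE TABLE category (
    name TEXT PRIMARY KEY
);
CREATE TABLE product (
    product_id    INTEGER PRIMARY KEY,
    product_name  TEXT NOT NULL,
    category_name TEXT NOT NULL
        REFERENCES category(name) ON UPDATE CASCADE
);
INSERT INTO category VALUES ('Theatre');
INSERT INTO product VALUES
    (1, 'Opera glasses', 'Theatre'),
    (2, 'Stage makeup', 'Theatre');
""")

# A second spelling cannot sneak in: it is not in the category table.
try:
    conn.execute("INSERT INTO product VALUES (3, 'Playbill', 'Theater')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Correct the one canonical value; referencing rows follow automatically.
conn.execute("UPDATE category SET name = 'Theater' WHERE name = 'Theatre'")
names = [row[0] for row in conn.execute(
    "SELECT DISTINCT category_name FROM product")]

print(rejected, names)
```

The trade-off against an integer id is that every rename touches the referencing rows, whereas with a surrogate key only the category row changes.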

Creating a relationship between two tables without using a Primary Key

I'm looking to create a relationship in SSMS between the Menu table and the WinePrice table. I've attempted to do this via a link table (MenuContents). However I can't figure out the relationship between MenuContentsId in the MenuContents table (as it won't be unique) and MenuContentsId in the Menu table. I've left my other tables out of the picture to keep things clearer.
Menu:              WinePrice:          MenuContents:
MenuId (PK)        WinePriceId (PK)    MenuContentsId
PubId              WineId              WinePriceId
MenuContentsId     Size
MenuName           Price
The idea is that a menu can contain many variations of the same wine (based on its price and size), each identified by WinePriceId, which relates to a specific wine in another table not shown. I can't make MenuContentsId a PK because many MenuContentsIds will have many WinePriceIds.
To me it looks like you want one menu to have many wines. Many wines can exist in many menus.
The link table is correct; all you need to do is make the PK of MenuContents a composite of both fields. That way, any MenuContents::MenuId value may appear many times as long as MenuContents::WinePriceId is different.
And it means that the same WinePriceId can appear against multiple MenuIds.
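One reading of this answer, sketched with sqlite3 in place of SQL Server: the link table carries MenuId and WinePriceId, and the two together form the primary key. Replacing MenuContentsId with MenuId in the link table is my interpretation of the answer, not something the question's schema shows, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE Menu (
    MenuId   INTEGER PRIMARY KEY,
    PubId    INTEGER NOT NULL,
    MenuName TEXT NOT NULL
);
CREATE TABLE WinePrice (
    WinePriceId INTEGER PRIMARY KEY,
    WineId      INTEGER NOT NULL,
    Size        TEXT NOT NULL,
    Price       REAL NOT NULL
);
-- Neither column is unique on its own; the pair is.
CREATE TABLE MenuContents (
    MenuId      INTEGER NOT NULL REFERENCES Menu(MenuId),
    WinePriceId INTEGER NOT NULL REFERENCES WinePrice(WinePriceId),
    PRIMARY KEY (MenuId, WinePriceId)
);
INSERT INTO Menu VALUES (1, 100, 'Bar menu'), (2, 100, 'Restaurant menu');
INSERT INTO WinePrice VALUES (10, 1, '175ml', 5.5), (11, 1, 'Bottle', 20.0);
INSERT INTO MenuContents VALUES (1, 10), (1, 11), (2, 10);
""")

# The same wine price cannot be listed twice on the same menu.
try:
    conn.execute("INSERT INTO MenuContents VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Everything on menu 1, and every menu that lists wine price 10.
menu_1 = conn.execute("""
SELECT wp.Size, wp.Price
FROM MenuContents AS mc
JOIN WinePrice AS wp ON wp.WinePriceId = mc.WinePriceId
WHERE mc.MenuId = 1
ORDER BY wp.Price
""").fetchall()
menus_with_10 = [row[0] for row in conn.execute(
    "SELECT MenuId FROM MenuContents WHERE WinePriceId = 10 ORDER BY MenuId")]

print(duplicate_rejected, menu_1, menus_with_10)
```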

How to reduce this SQL table size?

I am trying to upgrade an existing database for an online job-finding and hiring website. The main goal is to make the people table more browsable by adding categories, features, a better tag system, and subcategories.
Here is the problem: each category has its own subcategories and features. For example, when a user is browsing people in the teaching category, the user might want to find out whether they teach privately (home-school), so I add a bit column for that feature. But not all categories need a home-schooling feature (for example computer, engineering, medicine and so on), which means every row with a category other than teaching will have a useless NULL in it that takes 1 byte. That might not sound like a lot, but I might end up with tons of such useless NULLs in each row, wasting space.
I also can't create a table for each category, since the people table has relations with other tables like users, comments, images, etc.
What do you suggest I do?
One option is to change the column data type from varchar to nvarchar. The benefit is that the column is declared with a maximum size but stores only the size of the value actually assigned. For more detail, refer to this link:
http://www.c-sharpcorner.com/UploadFile/cda5ba/difference-between-char-nchar-varchar-and-nvarchar-data-ty/

Database Design related to Categories and Subcategories

I have researched this to no end. I am not the only person who has asked this question... but I would like your thoughts regarding the best practice.
I'm trying to design a Database that will track financial transactions. For the sake of simplicity, each transaction can only have one Category, and each category can only have one Sub-category.
I have a self-referencing table, like this:
Table: Categories
ID, int, primary key
parentID, int, foreign key
description, text
Long story short, you end up with data like this:
1 Auto [null]
2 Bills [null]
3 Healthcare [null]
4 Maintenance 1
5 Gasoline 1
6 Cell Phone 2
7 Rent 2
8 Prescriptions 3
9 Dentist 3
So far, so good. Here is my problem:
I don't know the proper way I'm supposed to relate this all back to my Transactions table. 'Transactions' has a column for 'Category' and 'Subcategory'. Transaction.ID would be the PK, and Categories.ID would be the FK.
With Transactions related to Categories in the manner specified above, that means any value from Categories could be written to Category or Subcategory...
Is it my responsibility as the programmer to control access to the table via a form? In other words, is my only option 'programmatically controlling' what goes into the Category and Subcategory columns?
Remember, each Category can only have one Subcategory. The selected Category should only allow that Category's children...
Am I making sense?
GOOD: Auto -- Maintenance
BAD: Healthcare -- Gasoline
The case you pose is a subset of the more general problem of encoding hierarchical data, tree structures, in relational tables. This case has been studied in great detail ever since relational databases first made the scene in the late 1970s.
In bookkeeping systems in particular, the idea of subcategories and categories comes up, every single time. Larger scale industrial systems tend to have a four level system, with overall account type (Expenses), Category (Transportation), Subcategory (Automotive), and sub-subcategory (Gasoline).
Your research might be more productive if you used the following search terms: "Tree structure in relational design". That search yielded the following Wikipedia summary:
http://en.wikipedia.org/wiki/Hierarchical_database_model
You can find lots of related questions and answers here in SO. Search under "nested sets" or "adjacency lists" for a couple of techniques.
Your problem is going to be to simplify the answers you will find down to the case where there are only two levels: category and subcategory.
I think whatever design you choose will want to make the following rule explicit: subcategory determines category. You will, IMO, want the DBMS to enforce this rule so that no transaction ends up with a subcategory that is inconsistent with its category.
So your categorizations are not orthogonal and independent (such as gender and city), but rather hierarchical (such as State and County).
In order to enforce hierarchical categorization, use a single categorization table with an ID column as primary key, referenced as a foreign key in the data table, and two descriptive fields, Category and Subcategory.
To facilitate data entry you might supply a Category combo box which filters the available subcategories. However, the actual foreign key reference is supplied by the selection made in the Subcategory combo box, which must list both fields, Category and Subcategory. It would be usual to concatenate these two fields with a delimiter such as a dash (-) or pipe (|).
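A rough sketch of the single categorization table, using sqlite3 as a stand-in and the sample categories from the question. The Categorization and CategorizationID names and the Amount column are hypothetical. Since a transaction stores only one key, and only valid (Category, Subcategory) pairs exist as rows, a pairing like Healthcare with Gasoline cannot be recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
-- One row per valid (Category, Subcategory) pair.
CREATE TABLE Categorization (
    ID          INTEGER PRIMARY KEY,
    Category    TEXT NOT NULL,
    Subcategory TEXT NOT NULL,
    UNIQUE (Category, Subcategory)
);
-- The transaction references the pair through a single foreign key.
CREATE TABLE Transactions (
    ID               INTEGER PRIMARY KEY,
    Amount           REAL NOT NULL,
    CategorizationID INTEGER NOT NULL REFERENCES Categorization(ID)
);
INSERT INTO Categorization VALUES
    (1, 'Auto', 'Maintenance'),
    (2, 'Auto', 'Gasoline'),
    (3, 'Bills', 'Cell Phone'),
    (4, 'Bills', 'Rent'),
    (5, 'Healthcare', 'Prescriptions'),
    (6, 'Healthcare', 'Dentist');
""")

def subcategories_for(category):
    """Rows for the Subcategory combo box once a Category is chosen."""
    rows = conn.execute(
        "SELECT ID, Category || ' - ' || Subcategory "
        "FROM Categorization WHERE Category = ? ORDER BY Subcategory",
        (category,),
    )
    return rows.fetchall()

auto_choices = subcategories_for("Auto")

# A key that does not correspond to a valid pair is refused by the database.
try:
    conn.execute("INSERT INTO Transactions VALUES (1, 42.50, 99)")
    invalid_rejected = False
except sqlite3.IntegrityError:
    invalid_rejected = True

conn.execute("INSERT INTO Transactions VALUES (1, 42.50, 2)")  # Auto - Gasoline
print(auto_choices, invalid_rejected)
```

In SQL Server the string concatenation would be written with + rather than ||.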

Products database design for product lines, categories, manufacturers, related software, product attributes, etc

I am redeveloping the front end and database for a medium size products database so that it can support categories/subcategories, product lines, manufacturers, supported software and product attributes. Right now there is only a products table. There will be pages for products by line, by category/subcategory, by manufacturer, by supported software (optional). Each page will have additional filtering based on the other classifications.
Categories/Subcategories (multi level)
Products and product lines can be assigned to multiple category trees. Up to 5 levels deep should be supported.
Product lines (single level)
Groups of products. Product can only be in single product line.
Manufacturers (single level)
Products and product lines can be assigned to single manufacturer.
Supported software (single level)
Certain products only work with one or more softwares, so a product/line can be assigned to none, one or more softwares.
Attributes (type / options - could be treated so each type is a category and items are children)
Products and product lines can be assigned attributes (eg - color > red / blue / green). Attributes should be able to be assigned to one or more categories.
Since all these items are basically types of subcategories, do I put them all together in a master table OR split them into separate tables for each one?
Master table idea:
ClassificationTypes (product line, category/sub, manufacturer, software, attribute would all be types)
-TypeID
-Name
Classifications
-ClassID
-TypeID
-ParentClassID
-Name
ClassificationsProductsAssociations
-ProductID
-ClassID
I would still need at least one more table to link types together (eg - to link attributes to a category) and a way to link product lines to various types.
If I go with a table for each type it can get messy quick and I will still need a way to link everything together.
Multiple table setup:
Categories
-CategoryID
-Name
-ParentCategoryID
CategoriesAssociations
-CategoryID
-ProductID
-ProductLineID ?
Attributes
-AttributeID
-Name
-ParentAttributeID (use this as the parent would be "color" and child would be "red")
AttributesAssociations
-AttributeID
-ProductID
-CategoryID (do I also need to link the category to the parent attribute?)
CompatibleSoftware
-SoftwareID
-Name
CompatibleSoftwareAssociations
-SoftwareID
-ProductID
-ProductLineID ?
Manufacturers
-ManufacturerID
-Name
ProductLines
-ProductLineID
-ManufacturerID
-Name
Products
-ProductID
-ProductLineID
-ManufacturerID
-Name
Other option for associations is to have a single associations table to link the tables above:
Master Associations
-ProductID
-ProductLineID
-ManufacturerID
-CategoryID
-SoftwareID
-AttributeID
What is the best solution?
Go for multiple tables; in my opinion it makes the design more obvious and more extensible. While the single-table approach may fit your solution now, further changes may be more difficult.
I agree with Paddy. It makes your life easier in the future and you are much more flexible. You might want to put in stock control and other stuff. To link everything together, use the integer IDs (parent/child) of the tables.
I think multiple tables is the way to go, but to really know, do this: Flesh out the design for both ways and then take a sample of 5-10 products.
Populate the tables in both designs for the 5-10 products.
Now start writing the queries for both ways. You will start to see which is easier to write (the single table I bet), and you might find cases that only work in one design (the multi-table I bet.)
When you are done you have not lost the work -- you can use the table schema to move forward and some of your queries will already be written.
If you get to a query that does not make sense, seems too complicated, or the like, you can post it here and get feedback -- having real code always gets better comments.
Just wanted to post my decision and since I was not satisfied with any of the answers provided, I have elected to answer my own question.
I ended up setting up a single set of tables:
Classification Types (eg - product lines, categories, manufacturers, etc)
Classifications (supporting parent/child adjacency list, nested sets, and materialized path all at once in order to take advantage of strengths of each. I have a SQL CTE that can populate all the fields in one go when the data changes)
Classifications Relations (with ability to relate products to classifications, relate classifications to other classifications and also relate classifications to other types)
I will admit that the solution is not 100% normalized, but this setup gives me ultimate flexibility to expand by creating new types and is very powerful and easy to query.
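The actual CTE was not posted, so the following is only a guess at the general shape, run against sqlite3 rather than SQL Server. It uses the Classifications columns from the question plus hypothetical Path and Depth columns, and fills the materialized path from the adjacency list in one pass. Nested-set numbering is left out to keep the sketch short, and the sample classifications are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ClassificationTypes (
    TypeID INTEGER PRIMARY KEY,
    Name   TEXT NOT NULL
);
CREATE TABLE Classifications (
    ClassID       INTEGER PRIMARY KEY,
    TypeID        INTEGER NOT NULL REFERENCES ClassificationTypes(TypeID),
    ParentClassID INTEGER REFERENCES Classifications(ClassID),
    Name          TEXT NOT NULL,
    Path          TEXT,     -- materialized path, e.g. '/1/2/3/' (hypothetical column)
    Depth         INTEGER   -- 0 for a root (hypothetical column)
);
INSERT INTO ClassificationTypes VALUES (1, 'category'), (2, 'manufacturer');
INSERT INTO Classifications (ClassID, TypeID, ParentClassID, Name) VALUES
    (1, 1, NULL, 'Audio'),
    (2, 1, 1,    'Speakers'),
    (3, 1, 2,    'Subwoofers'),
    (4, 2, NULL, 'Acme');
""")

# Walk the adjacency list from the roots down, building each row's path.
computed = conn.execute("""
WITH RECURSIVE walk(ClassID, Path, Depth) AS (
    SELECT ClassID, '/' || ClassID || '/', 0
    FROM Classifications
    WHERE ParentClassID IS NULL
    UNION ALL
    SELECT c.ClassID, w.Path || c.ClassID || '/', w.Depth + 1
    FROM Classifications AS c
    JOIN walk AS w ON c.ParentClassID = w.ClassID
)
SELECT Path, Depth, ClassID FROM walk
""").fetchall()

# Store the derived values back on the rows they describe.
conn.executemany(
    "UPDATE Classifications SET Path = ?, Depth = ? WHERE ClassID = ?", computed)

# With the path stored, all descendants of 'Audio' come from a prefix match.
descendants = [row[0] for row in conn.execute(
    "SELECT Name FROM Classifications "
    "WHERE Path LIKE ? AND ClassID <> ? ORDER BY Depth",
    ("/1/%", 1))]

print(descendants)
```

The stored Path and Depth are derived data, which is the "not 100% normalized" part: they have to be recomputed whenever a ParentClassID changes.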