Database design - Category-Object relation - sql

I have to store objects with different kinds of properties in a database. I also want to group those objects into categories, and an object can belong to multiple categories.
Example objects:
Book (ID, Title, Author, Description)
Movie (ID, Title, Author, Genre)
Song (ID, Title, Singer, Year)
So far I have tried two methods:
Method 1
Create a Category table with fields ( ID, CategoryName).
Create a general table and try to match table fields with object properties.
Join categories to objects through a helper table
CategoryObject (CategoryID, ObjectID).
This seems to be bad practice because the general table ends up with many columns holding null values. Also, according to the first point in this article, it is bad practice to store different objects in the same table.
Method 2
Create a Category table with fields ( ID, CategoryName).
Create a table for each kind of object.
Create a helper table to link the category table with the name of the object table:
CategoryObject (CategoryID, Object_Table_Name)
This method lets us find the name of the table to join when we want the objects for a given category.
But it complicates querying, because we need one query to get the tables for a certain category and another query to get the object records.
This is even harder because I am currently using Entity Framework Code First to create the database.
Is it better to get the data with ADO.NET instead of Entity Framework for easier data retrieval?
Can you suggest any other method that makes these questions simpler to answer:
Which objects belong to category X?
Which categories does object Y belong to?

Create three tables, and call them movies, books, and songs. In this way, you will not need to look up the name of the table. When you need the books, you'll "select * from books".
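One possible way to answer both questions from the post is to keep a row per object in a shared parent table and link it to categories through a junction table. The sketch below is only an illustration under assumed names: the tables object, category, category_object and book, their columns, and the sample rows are all hypothetical and not taken from the question. It uses Python's built-in sqlite3 module so it can be run as is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- One row per object of any kind; type-specific columns live in child tables.
CREATE TABLE object (id INTEGER PRIMARY KEY, object_type TEXT NOT NULL, title TEXT NOT NULL);
CREATE TABLE book (object_id INTEGER PRIMARY KEY REFERENCES object(id),
                   author TEXT, description TEXT);
-- Junction table: an object may appear in many categories.
CREATE TABLE category_object (
    category_id INTEGER NOT NULL REFERENCES category(id),
    object_id   INTEGER NOT NULL REFERENCES object(id),
    PRIMARY KEY (category_id, object_id)
);
INSERT INTO category VALUES (1, 'Classics'), (2, 'Favourites');
INSERT INTO object VALUES (10, 'book', 'Dune'), (11, 'movie', 'Alien');
INSERT INTO book VALUES (10, 'Frank Herbert', 'Science fiction novel');
INSERT INTO category_object VALUES (1, 10), (2, 10), (2, 11);
""")

def objects_in_category(name):
    """Titles of all objects that belong to the named category."""
    rows = conn.execute(
        """SELECT o.title FROM object o
           JOIN category_object co ON co.object_id = o.id
           JOIN category c ON c.id = co.category_id
           WHERE c.name = ? ORDER BY o.title""", (name,))
    return [r[0] for r in rows]

def categories_of_object(object_id):
    """Names of all categories that contain the given object."""
    rows = conn.execute(
        """SELECT c.name FROM category c
           JOIN category_object co ON co.category_id = c.id
           WHERE co.object_id = ? ORDER BY c.name""", (object_id,))
    return [r[0] for r in rows]
```

With this layout, both lookups are a single query each, and neither needs to know the name of a type-specific table.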

Related

I am making a SQL database of categories and subcategories. What is the best way to link these tables?

The database has a table called "categories" with columns CATEGORY_ID(primary key) and CATEGORY_NAME.
I have subcategories for each category.
Which of the methods below is best for accessing the data?
Method 1: The "CATEGORY_ID" column in the "categories" table is a FOREIGN KEY in the "subcategories" table.
Method 2: Maintaining a separate table for each category representing the subcategories.
I prefer to use the same table for categories and subcategories, like this:
Table Categories
[CATEGORY_ID, CATEGORY_NAME, PARENT_CATEGORY_ID]
This is useful in case you don't know how many subcategories there will be.
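A single Categories table with a PARENT_CATEGORY_ID column can be walked with a recursive query. The following is a rough sketch, assuming SQLite (through Python's sqlite3 module) and made-up sample rows; the function name descendants is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (
    category_id        INTEGER PRIMARY KEY,
    category_name      TEXT NOT NULL,
    parent_category_id INTEGER REFERENCES categories(category_id)  -- NULL = top level
);
INSERT INTO categories VALUES
    (1, 'Electronics', NULL),
    (2, 'Phones', 1),
    (3, 'Smartphones', 2),
    (4, 'Clothing', NULL);
""")

def descendants(category_id):
    """Names of every subcategory below the given category, at any depth."""
    rows = conn.execute("""
        WITH RECURSIVE subtree(category_id, category_name) AS (
            -- Start with the direct children of the requested category.
            SELECT category_id, category_name FROM categories
            WHERE parent_category_id = ?
            UNION ALL
            -- Then keep adding children of rows already found.
            SELECT c.category_id, c.category_name
            FROM categories c JOIN subtree s ON c.parent_category_id = s.category_id
        )
        SELECT category_name FROM subtree ORDER BY category_id
    """, (category_id,))
    return [r[0] for r in rows]
```

Recursive common table expressions are also available in recent versions of MySQL, PostgreSQL and SQL Server, although the exact syntax differs slightly between them.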
This scenario is just an example: we have a product table that stores all product records, and likewise a customer table that stores customer records. The daily sales table records which customer purchased which product, so the sales table must be linked to both the product table and the customer table.
The query to link the tables is as follows:
SELECT product_name, customer.name, date_of_sale
FROM sales, product, customer
WHERE product.product_id = sales.product_id
AND customer.customer_id = sales.customer_id
LIMIT 0, 30
It is better to go with Method 1 since it is more scalable.
Let me elaborate on this. If we go with Method 1, we need to maintain only two tables, Categories and Subcategories. If new categories or subcategories are added in the future, we can handle them directly in these two tables.
If we consider the same situation with Method 2, we need to create new tables every time, which may become a maintenance overhead.
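To make the Method 1 argument concrete, here is a small sketch of two tables linked by a foreign key, where a new subcategory is just a new row. It assumes SQLite through Python's sqlite3 module; the column names subcategory_id and subcategory_name, the sample data and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT NOT NULL);
CREATE TABLE subcategories (
    subcategory_id   INTEGER PRIMARY KEY,
    subcategory_name TEXT NOT NULL,
    category_id      INTEGER NOT NULL REFERENCES categories(category_id)
);
INSERT INTO categories VALUES (1, 'Books'), (2, 'Music');
INSERT INTO subcategories (subcategory_name, category_id)
VALUES ('Fiction', 1), ('Biography', 1), ('Jazz', 2);
""")

# A new subcategory is just another row; no new table is needed.
conn.execute(
    "INSERT INTO subcategories (subcategory_name, category_id) VALUES ('Poetry', 1)")

def subcategories_of(category_name):
    """Subcategory names for one category, found with a single join."""
    rows = conn.execute(
        """SELECT s.subcategory_name FROM subcategories s
           JOIN categories c ON c.category_id = s.category_id
           WHERE c.category_name = ? ORDER BY s.subcategory_name""", (category_name,))
    return [r[0] for r in rows]

def insert_is_rejected(category_id):
    """Return True if the foreign key blocks a subcategory for a missing category."""
    try:
        conn.execute(
            "INSERT INTO subcategories (subcategory_name, category_id) VALUES ('Orphan', ?)",
            (category_id,))
        return False
    except sqlite3.IntegrityError:
        return True
```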
Let me be a bit more direct. You explain in a comment that Method 2 is a separate table for each category. If so, then Method 2 -- in general -- is just wrong.
There are two methods for storing this type of information. One is a Categories table with a (single) Subcategories table. The Subcategories table would have CategoryId, a foreign key reference back to Categories. This is the normalized data model.
The second method is to store everything in one table. Each row would be a category/subcategory combination. Information about a given category would be duplicated across multiple rows, so this is not a normalized approach. However, this is a typical approach when doing dimensional modeling for decision support systems.
If the subcategories are just names of things, there is a third approach, which would be to store a list of the subcategories within each Category row. The list would not be a delimited string. It would be JSON, a nested table, XML, array, or similar collection data type supported by the database you are using. I am mentioning this as a possibility, but not recommending it.

How to use SQL Server views with distinct clause to Link to a detail table?

This may be a totally standard problem, but I have a table with duplicate values across the records, i.e. People and HairColour. What I need to do is create another table that contains all the distinct HairColour values in the group of Person records.
i.e.
Name      HairColour
--------------------
Sam       Ginger
Julie     Brown
Peter     Brown
Caroline  Blond
Andrew    Blond
My Person feature view needs to list out the distinct HairColours:
HairColour Ginger
HairColour Brown
HairColour Blond
Against each of these Person feature rows I record the Recommended Products.
It is a bit weird from a relational perspective, but there are reasons. I could build up the Person Feature view as I add Person records, using say an INSTEAD OF INSERT trigger on the view, but it gets messy. An alternative is to define Person Feature as a view based on a SELECT DISTINCT of the Person table and then link Recommended Products to it. But I have no primary key on the Person Feature view, since it is a SELECT DISTINCT view. I will not be updating this view. One would also need to think about how to deal with the Person Recommendation records when a Person Feature record disappears, since it is not based on a physical table.
Any thoughts on this please?
Edit
I have a table of People with duplicate values for HairColour across a number of records, e.g., more than one person has blond hair. I need to create a table or view that represents a distinct list of "HairColour" records as above. Against each of these "HairColour" records I need link another table called Product Recommendation. The main issue to start with is creating this distinct list of records. Should it be a table or could it be a View based on a SELECT DISTINCT query?
So Person >- HairColour (distinct Table or Distinct View) -< Product Recommendation.
If HairColour needs to be a table then I need to make sure it has the correct records in it every time a Person record is added. Obviously using a view would do this automatically, but I am unsure whether you can hang another table off a view.
If I understand correctly, you need a table with a primary key that lists the distinct hair colors that are found in a different table.
CREATE TABLE Haircolour(
ID INT IDENTITY(1,1) NOT NULL,
Colour VARCHAR(50) NULL,
CONSTRAINT [PK_Haircolour] PRIMARY KEY CLUSTERED (ID ASC))
Then insert your records. If this is querying a table called "Person" it will look like this:
INSERT INTO Haircolour (Colour) SELECT DISTINCT HairColour FROM Person
Does this do what you are looking for?
UPDATE:
Your most recent Edit shows that you are looking for a many-to-many relationship between the Person and ProductRecommendation tables, with the HairColour table functioning as a cross reference table.
As ErikE points out, this is a good opportunity to normalize your data.
Create the HairColour table as described above.
Populate it from whatever source you like, for example the insert statement above.
Modify both the Person and the ProductRecommendation tables to include a HairColourID field, which is an integer foreign key that points to the PK field of the HairColour table.
Update Person.HairColourID to point to the color mentioned in the Person.HairColour column.
Drop the Person.HairColour column.
This involves giving up the ability to put free form new color names into the Person table. Any new colors must now be added to the HairColour table; those are the only colors that are available.
The foreign key constraint enforces the list of available colors. This is a good thing. Referential integrity keeps your data clean and prevents a lot of unexpected errors.
You can now confidently build your ProductRecommendation table on a data structure that will carry some weight.
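The migration steps listed above can be sketched roughly as follows. This is an illustration in SQLite (through Python's sqlite3 module), not the SQL Server syntax used in the answer; the sample rows come from the question, and the final "drop the old column" step is left out on the assumption that the SQLite version in use may not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (Name TEXT PRIMARY KEY, HairColour TEXT);
INSERT INTO Person VALUES ('Sam', 'Ginger'), ('Julie', 'Brown'), ('Peter', 'Brown'),
                          ('Caroline', 'Blond'), ('Andrew', 'Blond');

-- Steps 1 and 2: create the lookup table and fill it with the distinct colours.
CREATE TABLE HairColour (ID INTEGER PRIMARY KEY AUTOINCREMENT, Colour TEXT NOT NULL UNIQUE);
INSERT INTO HairColour (Colour) SELECT DISTINCT HairColour FROM Person;

-- Step 3: add the foreign key column to Person.
ALTER TABLE Person ADD COLUMN HairColourID INTEGER REFERENCES HairColour(ID);

-- Step 4: point each person at the matching lookup row.
UPDATE Person
SET HairColourID = (SELECT ID FROM HairColour WHERE Colour = Person.HairColour);

-- Step 5 (not run here): drop Person.HairColour once the IDs are verified,
-- e.g. ALTER TABLE Person DROP COLUMN HairColour in SQL Server.
""")

# Count people per colour through the new foreign key.
people_by_colour = conn.execute("""
    SELECT h.Colour, COUNT(*) FROM Person p
    JOIN HairColour h ON h.ID = p.HairColourID
    GROUP BY h.Colour ORDER BY h.Colour
""").fetchall()

# Every person should have been matched to a lookup row.
unmatched = conn.execute(
    "SELECT COUNT(*) FROM Person WHERE HairColourID IS NULL").fetchone()[0]
```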
Are you simply looking for a View of distinct hair colors?
CREATE VIEW YourViewName AS
SELECT DISTINCT HairColour
FROM YourTableName
You can query this view like a table:
SELECT 'HairColour: ' + HairColour
FROM YourViewName
If you are trying to create a new (temp) table, the syntax would look like:
SELECT Name, HairColour
INTO #Temp
FROM YourTableName
GROUP BY Name, HairColour
Here the GROUP BY is doing the same work that a DISTINCT keyword would do in the select list. This will create a temp table with unique combinations of "Name" and "HairColour".
You need to clear up a few things in your post (or in your mind) first:
1) What are the objectives? Forget about tables and views and whatever. Phrase your objectives as an ordinary person would. For example, from what I could gather from your post:
"My objective is to have a list of recommended products based on each person's hair colour."
2) Once you have that, check what data you have. I assume you have a "Persons" table, with the columns "Name" and "HairColour". You check your data and ask yourself: "Do I need any more data to reach my objective?" Based on your post I say yes: you also need a "matching" between hair colours and product ids. This must be provided, or programmed by you. There is no automatic method of saying, for example, "brown means products X, Y, Z".
3) After you have all the needed data, you can ask: Can I perform a query that will return a close approximation of my objective?
See for example this fiddle:
http://sqlfiddle.com/#!2/fda0d6/1
I have also defined your "Select distinct" view, but I fail to see where it will be used. Your objectives (as defined in your post) do not make this clear. If you provide a thorough list in Recommended_Products_HairColour you do not need a distinct view. The JOIN operation takes care of your "missing colors" (namely "Green" in my example)
4) When you have the query, you can follow up with: Do I need it in a different format? Is this a job for the query or the application? etc. But that's a different question I think.
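The idea of joining Persons against a hand-maintained Recommended_Products_HairColour table might look something like the sketch below. The contents of the linked fiddle are not reproduced here; the column names, product names and sample rows are assumptions, and the example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (Name TEXT, HairColour TEXT);
INSERT INTO Persons VALUES ('Sam', 'Ginger'), ('Julie', 'Brown'), ('Caroline', 'Blond');

-- The matching between colours and products must be supplied by hand.
CREATE TABLE Recommended_Products_HairColour (HairColour TEXT, ProductName TEXT);
INSERT INTO Recommended_Products_HairColour VALUES
    ('Ginger', 'Copper shampoo'),
    ('Brown', 'Chestnut conditioner'),
    ('Green', 'Colour remover');   -- no person has green hair
""")

# The inner join keeps only colours present on both sides, so no
# separate SELECT DISTINCT view is required.
recommendations = conn.execute("""
    SELECT p.Name, r.ProductName
    FROM Persons p
    JOIN Recommended_Products_HairColour r ON r.HairColour = p.HairColour
    ORDER BY p.Name
""").fetchall()
```

In this sample, the 'Green' row never appears in the result because nobody has that colour, and Caroline is absent because no product is listed for 'Blond'.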

How to write a SELECT statement when I don't know in advance what columns I want?

My application has several tables: a master OBJECT table, and several
tables for storing specific kinds of objects: CAT, SHOE and BOOK.
Here's an idea of what the table columns look like:
object
object_id (primary key)
object_type (string)
cat
cat_id (primary key)
object_id (foreign key)
name (string)
color (string)
shoe
shoe_id (primary key)
object_id (foreign key)
model (string)
size (string)
book
book_id (primary key)
object_id (foreign key)
title (string)
author (string)
From the user's point of view, each specific object is primarily identified
by its name, which is a different column for each table. For the CAT table
it's name, for the SHOE table it's model, and for the BOOK table it's
title.
Let's say I'm handed an object_id without knowing in advance what kind of
object it represents -- a cat, a shoe or a book. How do I write a
SELECT statement to get this information?
Obviously it would look a little like this:
SELECT object_type,name FROM object WHERE object_id = 12345;
But how do I get the right contents in the "name" column?
It seems like you're describing a scenario where the user's view on the data (objects have names, I don't care what type they are) is different from the model you're using to store the data.
If that is the case, and assuming you have some control over the database objects, I'd probably create a VIEW, allowing you to coalesce similar data for each type of object.
Example on SQL Server:
CREATE VIEW object_names AS
SELECT object_id, name FROM cat
UNION ALL
SELECT object_id, model AS name FROM shoe
UNION ALL
SELECT object_id, title AS name FROM book
GO
You can then SELECT name FROM object_names WHERE object_id = 12345, without concerning yourself with the underlying column names.
Your only real solutions basically boil down to the same thing: writing explicit statements for each specific table and unioning them into a single result set.
You can either do this in a view (giving you a dynamic database object that you can query) or as part of the query (whether it's straight SQL or a stored procedure). You don't mention which database you're using, but the basic query is something like this:
select object_id, name from cat where object_id = 12345 union all
select object_id, model from shoe where object_id = 12345 union all
select object_id, title from book where object_id = 12345
For SQL Server, the syntax for creating the view would be:
create view object_view as
select 'cat' as type, object_id, name from cat union all
select 'shoe', object_id, model from shoe union all
select 'book', object_id, title from book
And you could query like:
select type, name from object_view where object_id = 12345
However, what you have is a basic table inheritance pattern, but it's implemented improperly since:
The primary key of child tables (cat, shoe, book) should also be a foreign key to the parent table (object). You should not have a different key for this, unless two cat records can represent the same object (in which case this is not inheritance at all)
Common elements, such as a name, should be represented at the highest level of the hierarchy as appropriate (in this case in object, since all of the objects have the concept of a "name").
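A corrected version of the inheritance pattern, following the two points above, might look like this sketch. It assumes SQLite through Python's sqlite3 module; the sample rows and the lookup helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The shared name lives in the parent table.
CREATE TABLE object (
    object_id   INTEGER PRIMARY KEY,
    object_type TEXT NOT NULL,
    name        TEXT NOT NULL
);
-- Each child reuses object_id as both its primary key and its foreign key.
CREATE TABLE cat  (object_id INTEGER PRIMARY KEY REFERENCES object(object_id), color TEXT);
CREATE TABLE shoe (object_id INTEGER PRIMARY KEY REFERENCES object(object_id), size TEXT);
CREATE TABLE book (object_id INTEGER PRIMARY KEY REFERENCES object(object_id), author TEXT);

INSERT INTO object VALUES (12345, 'shoe', 'Runner X'), (12346, 'cat', 'Tom');
INSERT INTO shoe VALUES (12345, '42');
INSERT INTO cat VALUES (12346, 'grey');
""")

def lookup(object_id):
    """Return (object_type, name) without touching any child table."""
    return conn.execute(
        "SELECT object_type, name FROM object WHERE object_id = ?", (object_id,)
    ).fetchone()
```

With the name stored in the parent table, the query from the question works unchanged.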
do a join
http://www.tizag.com/sqlTutorial/sqljoin.php
You can't. Why not name them the same thing, or pull that name back to the OBJECT table? You can very easily create a column called Name and put it in the OBJECT table. This would still be normalized.
It seems like you have a few options. You could use a config file or a 'schema' table. You could rename your columns so that the name of the column is always the same. You could have the class in your code know its table. You could make your architecture a little less generic, and allow the data access layer to understand the data it's accessing.
Which to choose? What problem are you solving? What problem were you solving, whose solution created this problem?
There's really no way to do this without first SELECTing to find out the kind, then SELECTing a second time to get the actual data. If you only have a few different kinds of objects, you could do it with a single SELECT and a bunch of LEFT JOINs to join all the tables at once, but that doesn't scale well if you've got lots of joiner tables.
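For a small number of object kinds, the single SELECT with LEFT JOINs mentioned above could be sketched as follows, using the table layout from the question. COALESCE picks whichever name column is not NULL. The sample rows and the display_name helper are hypothetical, and the example runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object (object_id INTEGER PRIMARY KEY, object_type TEXT);
CREATE TABLE cat  (cat_id INTEGER PRIMARY KEY,
                   object_id INTEGER REFERENCES object(object_id), name TEXT, color TEXT);
CREATE TABLE shoe (shoe_id INTEGER PRIMARY KEY,
                   object_id INTEGER REFERENCES object(object_id), model TEXT, size TEXT);
CREATE TABLE book (book_id INTEGER PRIMARY KEY,
                   object_id INTEGER REFERENCES object(object_id), title TEXT, author TEXT);
INSERT INTO object VALUES (1, 'cat'), (2, 'shoe'), (3, 'book');
INSERT INTO cat  VALUES (1, 1, 'Tom', 'grey');
INSERT INTO shoe VALUES (1, 2, 'Runner X', '42');
INSERT INTO book VALUES (1, 3, 'Dune', 'Frank Herbert');
""")

def display_name(object_id):
    """Return (object_type, name) using one query over all child tables."""
    return conn.execute("""
        SELECT o.object_type, COALESCE(c.name, s.model, b.title) AS name
        FROM object o
        LEFT JOIN cat  c ON c.object_id = o.object_id
        LEFT JOIN shoe s ON s.object_id = o.object_id
        LEFT JOIN book b ON b.object_id = o.object_id
        WHERE o.object_id = ?
    """, (object_id,)).fetchone()
```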
But just thinking outside the box a bit, does the "identifier" that users see have to correspond exactly to the primary key in the table? Could you encode the "kind" of the object in the identifier itself? So for example, if object_id 12345 is a shoe you could "encode" this as "S12345" from the user's point of view. A book would be "B4567" and a cat "C2578". Then in your code, just separate out the first letter and use that to decide which table to join on, and the remaining numbers are your primary key.
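Decoding such a prefixed identifier in application code could be as small as the sketch below. The mapping and the function name are hypothetical; only the letters S, B and C come from the examples in the text.

```python
# Hypothetical prefix-to-table mapping; the letters follow the examples in the text.
TABLE_BY_PREFIX = {"C": "cat", "S": "shoe", "B": "book"}

def decode_identifier(public_id):
    """Split an identifier such as 'S12345' into (table name, numeric key)."""
    prefix, digits = public_id[:1].upper(), public_id[1:]
    if prefix not in TABLE_BY_PREFIX or not digits.isdigit():
        raise ValueError(f"unrecognised identifier: {public_id!r}")
    return TABLE_BY_PREFIX[prefix], int(digits)
```

Because the table name comes from a fixed mapping and never from user input directly, it can be placed into a query string while the numeric key is still passed as a bound parameter.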
If you cannot alter the original table due to dependencies, you could probably create a view of the table with uniform column name. More information on how to create views can be found here.
There is a table you can look at that tells you the properties of all the tables (and columns) in your db.
In Postgres this is information_schema.columns, and SQL Server provides the same INFORMATION_SCHEMA views. You could query this and work out what you required, then construct a query based on that info...
EDIT: Sorry re-reading the question, I don't think that is what you require. - I've solved a similar problem before by having a surrogate key table - one table with all the id's in and a type id, then a serial/identity column that contains the primary key for that table - this is the id you should use... then you can create a view which looks up the other information based on the type id in that table.
The 'entity ref' table would have columns 'entityref' (PK), 'id', 'type id' etc... (that is assuming you can't restructure to use inheritance)

How to maintain subcategory in MYSQL?

I am having categories as following,
Fun
    Jokes
    Comedy
Action
    Movies
    TV Shows
Now, one video can have multiple categories or subcategories. Let's say VideoId 23 is present in the categories Fun, Fun->Comedy and Action->TV Shows, but not in the Action category itself. I am not sure how I should maintain these categories in the database. Should I create a single column "CategoryId AS VARCHAR" in Videos and store the category IDs as comma-separated values (1,3,4)? But then how would I fetch the records when someone is browsing the category Jokes?
Or should I create another table with videoId and categoryid? In that case, if a video is present in 3 different categories, 3 rows will be added to that new table.
Please suggest a way to maintain categories for a particular record in the table.
Thanks
Your categories table could have a column called parentID that references another entry in the categories table. It would be a foreign key to itself. NULL would represent a top-level category. Anything other than NULL would mean "I am a child category of this category". You could still assign a video to any category: top-level, child, or somewhere in between.
Also, use auto-increment NOT NULL integers for your primary keys, not varchar. It's a performance consideration.
To answer your comment:
3 tables: Videos, Categories, and Video_Category
Video_Category would have VideoID and CategoryID columns. The primary key would be a combination of the two columns (a compound primary key)
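As a rough sketch of the three-table layout, the example below uses the category tree and VideoId 23 from the question, plus a second made-up video. It runs on SQLite through Python's sqlite3 module instead of MySQL; the column names Name, ParentID and Title and the videos_in helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (
    CategoryID INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    ParentID   INTEGER REFERENCES Categories(CategoryID)  -- NULL = top-level category
);
CREATE TABLE Videos (VideoID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Video_Category (
    VideoID    INTEGER NOT NULL REFERENCES Videos(VideoID),
    CategoryID INTEGER NOT NULL REFERENCES Categories(CategoryID),
    PRIMARY KEY (VideoID, CategoryID)   -- compound primary key
);
INSERT INTO Categories VALUES
    (1, 'Fun', NULL), (2, 'Jokes', 1), (3, 'Comedy', 1),
    (4, 'Action', NULL), (5, 'Movies', 4), (6, 'TV Shows', 4);
INSERT INTO Videos VALUES (23, 'Sample video'), (24, 'Another video');
-- Video 23 is in Fun, Fun->Comedy and Action->TV Shows, but not in Action itself.
INSERT INTO Video_Category VALUES (23, 1), (23, 3), (23, 6), (24, 2);
""")

def videos_in(category_name):
    """IDs of the videos assigned directly to the named category."""
    rows = conn.execute("""
        SELECT v.VideoID FROM Videos v
        JOIN Video_Category vc ON vc.VideoID = v.VideoID
        JOIN Categories c ON c.CategoryID = vc.CategoryID
        WHERE c.Name = ? ORDER BY v.VideoID
    """, (category_name,))
    return [r[0] for r in rows]
```

Browsing a category is then a plain join, with no parsing of comma-separated values.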
You have two choices: a parentID column (preferably INT) that refers to the parent, or an extra table of categoryID - parentID pairs.
The latter may provide a better logical separation and allows you to have multiple categories.
I suggest creating another table that holds videoId and categoryid. You can then use a SQL query as follows:
select a.*, GROUP_CONCAT(b.category_id) as category_ids
from table_video a
left join table_video_category b on a.video_id=b.video_id
group by a.video_id

How to display multiple values in a MySQL database?

I was wondering how you can display multiple values in a database. For example, let's say a user fills out a form that asks which types of food they like, such as cookies, candy, apples, bread and so on.
How can I store them in the MySQL database under the same field called food?
What will the structure of the field food look like?
You may want to read the excellent Wikipedia article on database normalization.
You don't want to store multiple values in a single field. You want to do something like this:
form_responses
id
[whatever other fields your form has]
foods_liked
form_response_id
food_name
Where form_responses is the table containing things that are singular (like a person's name or address, or something where there aren't multiple values). foods_liked.form_response_id is a reference to the form_responses table, so the foods liked by the person who has response number six will have a value of six for the form_response_id field in foods_liked. You'll have one row in that table for each food liked by the person.
Edit: Others have suggested a three-table structure, which is certainly better if you are limiting your users to selecting foods from a predefined list. The three-table structure may be better in the case that you are allowing them the ability to enter their own foods, though if you go that route you'll want to be careful to normalize your input (trim whitespace, fix capitalization, etc.) so you don't end up with duplicate entries in that table.
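The input clean-up mentioned above (trimming whitespace, fixing capitalization) could be done before inserting, along the lines of this sketch. The function names and the choice of lower-casing are assumptions for illustration only.

```python
def normalize_food(raw):
    """Collapse whitespace and case so 'Cookies ' and ' cookies' compare equal."""
    return " ".join(raw.split()).lower()

def unique_foods(entries):
    """Normalize user-entered foods and drop blanks and duplicates, keeping order."""
    seen, result = set(), []
    for entry in entries:
        food = normalize_food(entry)
        if food and food not in seen:
            seen.add(food)
            result.append(food)
    return result
```

For example, unique_foods([" Cookies", "cookies ", "Apple  pie"]) yields two entries, so only one row for cookies would be stored.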
Normally, we do NOT do it like this. Try to use a relation table.
Table 1: tbl_food
ID primary key, auto increment
FNAME varchar
Table 2: tbl_user
ID primary key, auto increment
USER varchar
Table 3: tbl_userfood
RID auto increment
USERID int
FOODID int
Use a similar format to store your data, instead of fitting a chunk of data into a single field.
Querying these tables is also easier than parsing a chunk of data.
Use normalization.
More specifically, create a table called users. Create another called foods. Then link the two tables together with a many-to-many table called users_to_foods that holds foreign keys referencing both.
One way to do it would be to serialize the food data in your programming language, and then store it in the food field. This would then allow you to query the database, get the serialized food data, and convert it back into a native data structure (probably an array in this case) in your programming language.
The problem with this approach is that you will be storing a lot of the same data over and over, e.g. if a lot of people like cookies, the string "cookies" will be stored over and over. Another problem is searching for everyone who likes one particular food. To do that, you would have to select the food data for each record, unserialize it, and see if the selected food is contained within. This is very inefficient.
Instead you'll want to create 3 tables: a users table, a foods table, and a join table. The users and foods tables will contain one record for each user and food respectively. The join table will have two fields: user_id and food_id. For every food a user chooses as a favorite, it adds a record to the join table of the user's ID and the food ID.
As an example, to pull all the users who like a particular food with id FOOD_ID, your query would be:
SELECT users.id, users.name
FROM users, join_table
WHERE join_table.food_id = FOOD_ID
AND join_table.user_id = users.id;
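Put together, the three tables and the query above might be exercised like this. The sketch runs on SQLite through Python's sqlite3 module instead of MySQL, uses an explicit JOIN with a bound parameter in place of the FOOD_ID placeholder, and the sample users, foods and helper name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE foods (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- One row per (user, food) pair; the compound key prevents duplicates.
CREATE TABLE join_table (
    user_id INTEGER NOT NULL REFERENCES users(id),
    food_id INTEGER NOT NULL REFERENCES foods(id),
    PRIMARY KEY (user_id, food_id)
);
INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO foods VALUES (1, 'cookies'), (2, 'apples');
INSERT INTO join_table VALUES (1, 1), (1, 2), (2, 1);
""")

def users_who_like(food_id):
    """Return (id, name) for every user who has chosen the given food."""
    rows = conn.execute("""
        SELECT users.id, users.name
        FROM users
        JOIN join_table ON join_table.user_id = users.id
        WHERE join_table.food_id = ?
        ORDER BY users.id
    """, (food_id,))
    return rows.fetchall()
```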