Suppose that I have a store-website where user can leave comments about any product.
Suppose that I have tables(entities) in my website database: let it be 'Shoes', 'Hats' and 'Skates'.
I don't want to create separate "comments" table for every entity (like 'shoes_comments', 'hats_comments', 'skates_comments').
My idea is to somehow store all the comments in one big table.
One way to do this, that I thought of, is to create a table:
table (comments):
ID (int, Primary Key),
comment (text),
Product_id (int),
isSkates (boolean),
isShoes (boolean),
isHats (boolean)
and like flag for every entity that could have comments.
Then when I want to get comments for some product the SELECT query would look like:
SELECT comment
FROM comments, ___SOMETABLE___
WHERE ____SOMEFLAG____ = TRUE
AND ___SOMETABLE___.ID = comments.Product_id
Is this an efficient way to implement database for needed functionality?
What other ways i can do this?>
Sorry, this feels odd.
Do you indeed have one separate table for each product type? Don't they have common fields (e.g. name, description, price, product image, etc.)?
My recommendation as for tables: product for common fields, comments with foreign key to product but no hasX columns, hat with only the fields that are specific to the hat product line. The primary key in hat is either the product PK or an individual unique value (then you'd need an extra field for the foreign key to product).
I would recommend you to make one table for the comments and use a foreign key of other tables in the comments table.
The "normalized" way to do this is to add one more entity (say, "Product") that groups all characteristics common to shoes, hats and skates (including comments)
+-- 0..1 [Shoe]
|
[Product] 1 --+-- 0..1 [Hat]
1 |
| +-- 0..1 [Skate]
*
[Comment]
Besides performance considerations, the drawback here is that there is nothing in the data model preventing a row in Product to be referenced both by a row in Shoe and one in Hat.
There are other alternatives too (each with perks & flaws) - you might want to read something about "jpa inheritance strategies" - you'll find java-specific articles that discuss your same issue (just ignore the java babbling and read the rest)
Personally, I often end up using a single table for all entities in a hierarchy (shoes, hats and skates in our case) and sacrificing constraints on the altar of performance and simplicity (eg: not null in a field that is mandatory for shoes but not for hats and skates).
Related
Data fields
I am designing a database table structure. Say that we need to record employee profiles from different companies. We have the following fields:
+---------+--------------+-----+--------+-----+
| Company | EmployeeName | Age | Gender | Tel |
+---------+--------------+-----+--------+-----+
It's possible that two employees from different company may have the same name (and assume that no 2 employee has the same name in the same company). In this case a composite primary key (Company, EmployeeName) would be necessary in my opinion.
Search
Now I need to get all information by using only one of the 2 attributes in the primary key. For example,
I want to search all employees' profile of Company A:
SELECT EmployeeName, Age, Gender, Tel FROM table WHERE Company = 'Company A'
And I can also search all employees from different company named Donald:
SELECT Company, Age, Gender, Tel FROM table WHERE EmployeeName = 'Donald'
Strategy
In order to implement this requirement, my strategy would be storing all data in a single table, which is easy to read and understandable. However I noticed that it may take a long time to search as the query may need to iterate through all rows. I would like to retrieve these information as quick as possible. Would there be a better strategy for this?
First, your rows should have a unique identifier for each row -- identity/auto-increment/serial, depending on the database. Second, you might reconsider names being unique. Why can't two people at the same company have the same name?
In any case, you have a primary key on, say, (company, name). For the opposite search you simply want another index on (name, company):
create index idx_profiles_name_company on profiles(name, company);
A note explaining Gordon's suggestion for an identity on each row. This is supplemental to his answer above.
In theory there is nothing wrong with a primary key that crosses columns and in a db like PostgreSQL I like to have identity values as secondary keys (i.e. not null unique) and specify natural primary keys. Of course on MS SQL Server or MySQL/InnoDB that would be a recipe for problems. I would also not say "all" but rather "almost all" since there are times when breaking this rule is good.
Regardless, having an identity row simplifies a couple of things and it provides an abstraction around keys in case you get things wrong. Composite keys provide a couple issues that end up eating time (and possibly resulting in downtime) later. These include:
Joins on composite keys are often more expensive than those on simple values, and
Adding or changing a natural primary key which crosses columns is far harder when joins are involved
So depending on your db you should either specify a unique secondary key or make your natural primary key separate (which you should do depends on storage and implementation specifics).
I am having a bit of trouble when modelling a relational database to an inventory managament system. For now, it only has 3 simple tables:
Product
ID | Name | Price
Receivings
ID | Date | Quantity | Product_ID (FK)
Sales
ID | Date | Quantity | Product_ID (FK)
As Receivings and Sales are identical, I was considering a different approach:
Product
ID | Name | Price
Receivings_Sales (the name doesn't matter)
ID | Date | Quantity | Type | Product_ID (FK)
The column type would identify if it was receiving or sale.
Can anyone help me choose the best option, pointing out the advantages and disadvantages of either approach?
The first one seems reasonable because I am thinking in a ORM way.
Thanks!
Personally I prefer the first option, that is, separate tables for Sales and Receiving.
The two biggest disadvantage in option number 2 or merging two tables into one are:
1) Inflexibility
2) Unnecessary filtering when use
First on inflexibility. If your requirements expanded (or you just simply overlooked it) then you will have to break up your schema or you will end up with unnormalized tables. For example let's say your sales would now include the Sales Clerk/Person that did the sales transaction so obviously it has nothing to do with 'Receiving'. And what if you do Retail or Wholesale sales how would you accommodate that in your merged tables? How about discounts or promos? Now, I am identifying the obvious here. Now, let's go to Receiving. What if we want to tie up our receiving to our Purchase Order? Obviously, purchase order details like P.O. Number, P.O. Date, Supplier Name etc would not be under Sales but obviously related more to Receiving.
Secondly, on unnecessary filtering when use. If you have merged tables and you want only to use the Sales (or Receving) portion of the table then you have to filter out the Receiving portion either by your back-end or your front-end program. Whereas if it a separate table you have just to deal with one table at a time.
Additionally, you mentioned ORM, the first option would best fit to that endeavour because obviously an object or entity for that matter should be distinct from other entity/object.
If the tables really are and always will be identical (and I have my doubts), then name the unified table something more generic, like "InventoryTransaction", and then use negative numbers for one of the transaction types: probably sales, since that would correctly mark your inventory in terms of keeping track of stock on hand.
The fact that headings are the same is irrelevant. Seeking to use a single table because headings are the same is misconceived.
-- person [source] loves person [target]
LOVES(source,target)
-- person [source] hates person [target]
HATES(source,target)
Every base table has a corresponding predicate aka fill-in-the-[named-]blanks statement describing the application situation. A base table holds the rows that make a true statement.
Every query expression combines base table names via JOIN, UNION, SELECT, EXCEPT, WHERE condition, etc and has a corresponding predicate that combines base table predicates via (respectively) AND, OR, EXISTS, AND NOT, AND condition, etc. A query result holds the rows that make a true statement.
Such a set of predicate-satisfying rows is a relation. There is no other reason to put rows in a table.
(The other answers here address, as they must, proposals for and consequences of the predicate that your one table could have. But if you didn't propose the table because of its predicate, why did you propose it at all? The answer is, since not for the predicate, for no good reason.)
For me the most understandable description of going about 1NF so far I found is ‘A primary key is a column (or group of columns) that uniquely identifies each row. ‘ on www.phlonx.com
I understand that redundancy means per key there shouldn’t be more than 1 value to each row. More than 1 value would then be ‘redundant’. Right?
Still I manage to screw up 1 NF a lot of times.
I posted a question for my online pizzashop http://foo.com
pizzashop
here
where I was confused about something in the second normal form only to notice I started off wrong in 1 NF.
Right now I’m thinking that I need 3 keys in 1NF in order to uniquely identify each row.
In this case, I’m finding that order_id, pizza_id, and topping_id will do that for me. So that’s 3 columns. Because if you want to know which particular pizza is which you need to know what order_id it has what type of pizza (pizza_id) and what topping is on there. If you know that, you can look up all the rest.
Yet, from an answer to previous question this seems to be wrong, because topping_id goes to a different table which I don’t understand.
Here’s the list of columns:
Order_id
Order_date
Customer_id
Customer_name
Phone
Promotion
Blacklist Y or N
Customer_address
ZIP_code
City
E_mail
Pizza_id
Pizza_name
Size
Pizza_price
Amount
Topping_id
Topping_name
Topping_prijs
Availabitly
Delivery_id
Delivery_zone
Deliveryguy_id
Deliveryguy_name
Delivery Y or N
Edit: I marked the id's for the first concatenated key in bold. They are only a list of columns, unnormalized. They're not 1 table or 3 tables or anything
use Object Role Modelling (say with NORMA) to capture your information about the design, press the button and it spits out SQL.
This will be easier than having you going back and forth between 1NF, 2NF etc. An ORM design is guaranteed to be in 5NF.
Some notes:
you can have composite keys
surrogate keys may be added after both conceptual and logical design: you have added them up front which is bad. They are added because of the RDBMS performance, not at design time
have you read several sources on 1NF?
start with plain english and some facts. Which is what ORM does with verbalisation.
So:
A Customer has many pizzas (zero to n)
A pizza has many toppings (zero to n)
A customer has an address
A pizza has a base
...
I'd use some more tables for this, to remove duplication for customers, orders, toppings and pizze:
Table: Customer
Customer_id
Customer_name
Customer_name
Phone
Promotion
Blacklist Y or N
Customer_address
ZIP_code
City
E_mail
Table: Order
Order_id
Order_date
Customer_id
Delivery_zone
Deliveryguy_id
Deliveryguy_name
Delivery Y or N
Table: Order_Details
Order_ID (FK on Order)
Pizza_ID (FK on Pizza)
Amount
Table: Pizza
Pizza_id
Pizza_name
Size
Pizza_price
Table: Topping
Topping_id
Topping_name
Topping_prijs
Availabitly
Table: Pizza_Topping
Pizza_ID
Topping_ID
Pizza_topping and Order_details are so-called interselection tables ("helper" tables for modelling a m:n relationship between two tables).
Now suppose we have just one pizza, some toppings and our customer Billy Smith orders 2 quattro stagione pizze - our tables will contain this content:
Pizza(Pizza_ID, Pizza_name, Pizza_price)
1 Quattro stagioni 12€
Topping(Topping_id, topping_name, topping_price)
1 Mozzarrella 0,50€
2 Prosciutto 0,70€
3 Salami 0,50€
Pizza_Topping(Pizza_ID, Topping_ID)
1 1
1 3
(here, a quattro stagioni pizza contains only Mozzarrella and Salami).
Order(order_ID, Customer_name - rest omitted)
1 Billy Smith
Order_Details(order_id, Pizza_id, amount)
1 1 2
I've removed delivery ID, since for me, there is no distinction between an Order and a delivery - or do you support partial deliveries?
On 1NF, from wikipedia, quoting Date:
According to Date's definition of 1NF,
a table is in 1NF if and only if it is
"isomorphic to some relation", which
means, specifically, that it satisfies
the following five conditions:
There's no top-to-bottom ordering to the rows.
There's no left-to-right ordering to the columns.
There are no duplicate rows.
Every row-and-column intersection contains exactly one
value from the applicable domain (and
nothing else).
All columns are regular [i.e. rows have no hidden components such as
row IDs, object IDs, or hidden
timestamps].
—Chris Date, "What First Normal Form Really Means", pp. 127–8[4]
First two are guaranteed in any modern RDBMS.
Duplicate rows are possible in modern RDBMS - however, only if you don't have primary keys (or other unique constraints).
The fourth one is the hardest one (and depends on the semantics of your model) - for example your field Customer_address might be breaking 1NF. Might be, because if you make a contract with yourself (and any potential user of the system) that you will always look at the address as a whole and will not want to separate street name, street number and or floor, you could still claim that 1NF is not broken.
It would be more proper to break the customer address, but there are complexities there with which you would then need to address and which might bring no benefit (provided that you will never have to look a the sub-atomic part of the address line).
The fifth one is broken by some modern RDBMs, however the real importance is that your model nor system should depend on hidden elements, which is normally true - even if your RDBMS uses OIDs internally for certain operations, unless you start to use them for non-administrative, non-maintenance tasks, you can consider it not breaking the 1NF.
The strengths of relational databases come from separating information into different tables. One useful way of looking at tables is first to identify as entity tables those concepts which are relatively permanent (in your case, probably Pizza, Customer, Topping, Deliveryguy). Then you think about the relations between them (in your case, Order, Delivery ). The relational tables link together the entity tables by having foreign keys pointing to the relevant entities: an Order has foreign keys to Customer, Pizza, Topping); a Delivery has foreign keys to Deliveryguy and Order. And, yes, relations can link relations, not just entities.
Only in such a context can you achieve anything like normalization. Tossing a bunch of attributes into one singular table does not make your database relational in any meaningful sense.
I am wondering is it more useful and practical (size of DB) to create multiple tables in sql with two columns (one column containing foreign key and one column containing random data) or merge it and create one table containing multiple columns. I am asking this because in my scenario one product holding primary key could have sufficient/applicable data for only one column while other columns would be empty.
example a. one table
productID productname weight no_of_pages
1 book 130 500
2 watch 50 null
3 ring null null
example b. three tables
productID productname
1 book
2 watch
3 ring
productID weight
1 130
2 50
productID no_of_pages
1 500
The multi-table approach is more "normal" (in database terms) because it avoids columns that commonly store NULLs. It's also something of a pain in programming terms because you have to JOIN a bunch of tables to get your original entity back.
I suggest adopting a middle way. Weight seems to be a property of most products, if not all (indeed, a ring has a weight even if small and you'll probably want to know it for shipping purposes), so I'd leave that in the Products table. But number of pages applies only to a book, as do a slew of other unmentioned properties (author, ISBN, etc). In this example, I'd use a Products table and a Books table. The books table would extend the Products table in a fashion similar to class inheritance in object oriented program.
All book-specific properties go into the Books table, and you join only Products and Books to get a complete description of a book.
I think this all depends on how the tables will be used. Maybe your examples are oversimplifying things too much but it seems to me that the first option should be good enough.
You'd really use the second example if you're going to be doing extremely CPU intensive stuff with the first table and will only need the second and third tables when more information about a product is needed.
If you're going to need the information in the second and third tables most times you query the table, then there's no reason to join over every time and you should just keep it in one table.
I would suggest example a, in case there is a defined set of attributes for product, and an example c if you need variable number of attributes (new attributes keep coming every now and then) -
example c
productID productName
1 book
2 watch
3 ring
attrID productID attrType attrValue
1 1 weight 130
2 1 no_of_pages 500
3 2 weight 50
The table structure you have shown in example b is not normalized - there will be separate id columns required in second and third tables, since productId will be an fk and not a pk.
It depends on how many rows you are expecting on your PRODUCTS table. I would say that it would not make sense to normalize your tables to 3N in this case because product name, weight, and no_of_pages each describe the products. If you had repeating data such as manufacturers, it would make more sense to normalize your tables at that point.
Without knowing the background (data model), there is no way to tell which variant is more "correct". both are fine in certain scenarios.
You want three tables, full stop. That's best because there's no chance of watches winding up with pages (no pun intended) and some books without. If you normalize, the server works for you. If you don't, you do the work instead, just not as well. Up to you.
I am asking this because in my scenario one product holding primary key could have sufficient/applicable data for only one column while other columns would be empty.
That's always true of nullable columns. Here's the rule: a nullable column has an optional relationship to the key. A nullable column can always be, and usually should be, in a separate table where it can be non-null.
If I were to have an online shopping website that sold apples and monitors and these were stored in different tables because the distinguishing property of apples is colour and that of monitors is resolution how would I add these both to an invoice table whilst still retaining referential integrity and not unioning these tables?
Invoices(InvoiceId)
|
InvoiceItems(ItemId, ProductId)
|
Products(ProductId)
| |
Apples(AppleId, ProductId, Colour) Monitors(MonitorId, ProductId, Resolution)
In the first place, I would store them in a single Products table, not in two different tables.
In the second place, (unless each invoice was for only one product) I would not add them to a single invoice table - instead, I would set up an Invoice_Products table, to link between the tables.
I suggest you look into Database Normalisation.
A question for your data model is You need a reference scheme will you use to identify products? Maybe SKU ?
Then identify each apple as a product by assigning an SKU. Likewise for monitors. Then use the SKU in the invoice item. Something like this:
product {sku}
key {sku};
invoice_item {invoice_id, sku}
key {invoice_id, sku} ;
apple {color, sku}
key {color}
key {sku};
monitor {size, sku}
key {size}
key {sku};
with appropriate constrains... in particular, the union of apple {sku} and monitor {sku} == product {sku}.
So Invoice table has a ProductID FK, and a ProductID can be either an AppleID (PK color) or MonitorID (PK resolution)?
If so, you can introduce a ProductTypeID with values like 0=apple, 1=monitor, or a isProductTypeApple boolean if there's only ever going to be 2 product types, and include that in the ProductID table PK.
You also need to include the ProductTypeID field in the Apple table and Monitor table PK.
I like name-value tables for these...It might be easier to redesign so it goes 'Product' and then 'product details'...product details holds the product id, the detail type and then the value. This would allow you to hold apples and monitors in the same table regardless of identifying attribute (and leave it open for other product to be added later on).
Similiar approach can be taken in the invoice table...have a 'product_type' column that tells you which table to look into (apple or monitor) and then a 'product_id' that references whatever ID column is in the apple/monitor table. Querying on a setup like this is a bit difficult and may force you to use dynamic sql...I'd only take this route if you have no control over doing the redesign above (and other answers posted here refer to)
First solution is preferential I would think...change the design on this db to the name value pair with the products and you'll save headaches writing your queries later.