SQL schema design: two tables vs adding columns to the same table - sql

This question is about design decision, hence might be a bit opinionated:
Imagine you are designing database for a car dealer where they ONLY auction cars. Some cars are for display only, and some cars are to be sold in auction.
I have a Car entity with 10 attributes: ID, Model, Mode, YearMade, IsDisplayOnly....
Now, I want to add selling price and selling notes to those cars that are for sale (i.e. IsDisplayOnly = false)
I image that there are two ways this can be done:
Add Price and PriceNotes columns into the Car table, knowing that they are always null for IsDisplayOnly = true cars, and those that haven't been sold at auction yet.
Add a new table SaleInfo with 3 columns: CarID, Price, PriceNotes where CarID is the PK and also FK pointing to the ID column in the Car table.
Which option would align most with the best schema design practice? Why?

You should have one car for cars and the attributes of cars. You should have a separate table for the cars for auction.
Why? These are different entities. Your problem definition suggests an auction table. That auction table should have a foreign key references to the cars that are available for auction. A separate table ensures that that foreign key reference is valid.
There are some other reasons that are not apparent in your simplified example. Notes and prices might change over time, so they should be going into a history table. Display cars have other attributes, like the period of time when they are on display and how they are ultimately disposed of. This suggests that they too have particular attributes.

My advice would be to use three tables:
-The first to store all the makes and models of the cars. As well as their costs(eg Honda something or other selling for X amount of money)
-The second to store the details of the individual vehicles, containing a foreign key to the primary key of one of the Make/Model stored in the first table, as well as individual details such as the color, VIN no. etc. As well as whether they can be sold or not.
-The third table would contain the details of each individual purchases, linked to the table containing each individual vehicle, this would be linked to the table containing the details of each individual vehicle, with each purchase connected to a single instance. On the table of vehicles.
The advantages for this layout is that you are actually going to end up using less storage space in the long run, as instead of having the same three fields (The make, model and year) repeating for every vehicle, you will only have a single field to represent that data instead of multiple redundant fields. Another advantage will be searching, as if you are searching for details of individual vehicles of the same brand/type, you will be able to search using only one field, the key linked to the table containing the make and model. This would drastically decrease search times and improve the effectiveness of the system overall.

Related

Why need to add ID field to products categories of of online shopping database?

I am just started to learn about relational database. When I studied the database of online shopping websites, I found that many examples create a category table and added ID field to the category name. I don't know why they need to create a category table and use category ID as a foreign key to relate products table. What will happen if I remove the category table and add the category name directly to the products table?
What I think is a lot of cases that you want a website menu showing your categories. This menu allows people to view your categories (Men Clothing, Women Clothing, Kids, Accessories) and once they click it they can see the products relevant to them.
If you put the category name to the product, it is very hard for you to update your menu content as you need to loop, group the category in the product table. Also, it is harder to update the category name in product table as a category name could be in lots of product records,
Whereas if you have a category table, you just need to maintain the category table (view what you have in the category table and update DB record if you want your menu change).
In long term maintenance, category table is desired.
In a case I have come over that I would like an empty category which just to show in the website menu (a menu item which contains no product) which is not possible if I do not have a category table.
By inserting just the category name you may complete your POC but you need to understand what is Normalization and why it is needed.
First normal form (1NF) : An entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF) : An entity type is in 2NF when it is in 1NF and when all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF) : An entity type is in 3NF when it is in 2NF and when all of its attributes are directly dependent on the primary key.
Source
What will happen if I remove the category table and add the category name directly to the products table?
Suppose you store the category with each product, and one day your boss tells you that you misspelled a category name. Which one?
"Theater" he says. Or did he say "theatre?" Which is correct? You check and find about "theater" and "theatre" are used close to evenly among the products that have either one.
So which spelling did your boss mean is the mistake, and which one is correct?
If you store the correct spelling in one place, in its own categories table, then you can be sure. You can correct it, and all the products that reference it will implicitly get the correction.
That's an argument for normalization, but keep in mind using an integer id is only a convention. It has nothing to do with normalization. You can use a string as a primary key of a table, and therefore you can use a string as a foreign key in a table that references it.
It's okay to use a non-integer for key columns. As long as there is one instance that stores the canonical value, it satisfies the goal of normalization -- that is to reduce data anomalies.

Data Warehouse Design/Modeling (based on Figure in Data Mining textbook)

I found a schema in Google Images (see below) that can illustrate a problem I having in my data warehouse design:
My design is different, but this is the simplest figure I could find to convey my question, which is given the figure, I'm wondering how could the schema accommodate the following scenario: if a product had a unique number assigned to it by the SalesOrg (salesOrg_product_number)...For example, a salesOrg sells food items and assigns all food items of the same kind the same unique salesOrg_product_number. A different salesOrg would have a different salesOrg_product_number for that type of product.
I'm inclined to place the salesOrg_product_number attribute in the Product dimension table, but part of me thinks it should be in the salesOrg dimension table instead. I'm wondering which one of these is correct way in a data warehouse (not relational db) design to maintain the star schema?
In a perfect world the Primary Keys of a dimension table should be just surrogate key, without any meaning for the business. Table IDs should be invisible for the final users, but business code should be of course available.
A possible solution would be to have a product table with a structure like:
Product_id
Product_desc
Product_SO1_number
Product_SO2_number
...
Of course this will require to show the correct field to the correct Sales Organization. Depending on your reporting tool this can be more or less difficult. For example if you write your query manually you need just to put the right column in your select.
Another possibility would be to have a product/sales_org table, a table which combine the Product and the Sales_Org one:
Product_Sales_Org_id
Product_id
Sales_Org_id
Product_SO_number
...
This table will be child of the two dimension table and on the fact table you will have Product_Sales_Org_id column. Depending on Product and Sales Organization the Product_SO_number will return the correct number per SO.
If you want to have this in a star schema structure you can put Product/Sales_Org/Product_Sales_Org together in only one table like:
Product_Sales_Org_id
Product_id
Sales_Org_id
Product_desc
Sales_Org_desc
Product_SO_number
...
Sincerely I would go for the second solution, keep the Product and the Sales_Org tables separated, because they are two different business entities and implement the relationship table in the middle.
I hope this helps.

Should I store the parent and child in a row or only the child

Let's say I'm building a car rentals application, and at a point I have to show
all available cars.
In the DB, I have the following tables:
Brand
Model
Cars
In Model, there is a FK to Brand, let's say Brand_id
So my question is:
Cars table should have Brand and Model columns? Or only a Model column given that I could get the Brand from the Model column?
Basic normal form requires that the car table NOT include a column which is dependent on another (non PK) column in the car table. Since brand is dependent on model, it should not be in the car table.
Put simply, you're storing an awful lot of redundant information if you repeat brand for every car. You can access that info easily by JOINing the model table into a query. And what happens if you have some "car" rows that claim a model belongs to one brand while others claim it belongs to another? That's not only wrong, in the real world it's impossible and therefore it should not be possible in your database.
The cars table should have a key into model only. You'd never have a Ford Corvette or a Chevrolet Mustang, nor would you have the same car made my more than one manufacturer.
If you need the brand as well, you'd join all three tables, on model.id = cars.model and model.brand = brand.id.
It should have both FK references in my opinion. Dependant on the relationships of these tables you'd also want to avoid a possible fan trap.
For example
A --- (1:N) --- B --- (1:N) --- C
Another example can be for instance:
A is any dimension (for instance DATES)
B is a table with shipping information and data is on a package level
C is a detailed table with an info on a package items level
Make sense?
It depends I think. If you query car brand from your cars table A LOT, and I mean A LOT, (That means you want to know the brand of the car without knowing the model therefore you join car and brand tables without using model) you may consider adding brandid to your cars table so you can join Cars and Brand tables without using Model table.
If you have so many entries in your cars table (Again, I mean Soo Many), (and I doubt it) you may consider reducing the table size by deleting brandid and just use Model table to access to brands.
But that's not the only case. If you store brandid in your cars table, you wouldn't comply basic database normalization rules.
What I have mentioned affects performance when you have 10,000+ rows in tables. So I'm pretty sure you're ok in performance in both approaches.
I'd define just a modelid FK in my cars table since there's no need to have brandid when you have modelid and you're not worried about the performance.

Should I denormalize Loans, Purchases and Sales tables into one table?

Based on the information I have provided below, can you give me your opinion on whether its a good idea to denormalize separate tables into one table which holds different types of contracts?.. What are the pro's/con's?.. Has anyone attempted this before?.. Banking systems use a CIF (Customer Information File) [master] where customers may have different types of accounts, CD's, mortgages, etc. and use transaction codes[types] but do they store them in one table?
I have separate tables for Loans, Purchases & Sales transactions. Rows from each of these tables are joined to their corresponding customer in the following manner:
customer.pk_id SERIAL = loan.fk_id INTEGER;
= purchase.fk_id INTEGER;
= sale.fk_id INTEGER;
Since there are so many common properties among these tables, which revolves around the same merchandise being: pawned, bought and sold, I experimented by consolidating them into one table called "Contracts" and added the following column:
Contracts.Type char(1) {L=Loan, P=Purchase, S=Sale}
Scenario:
A customer initially pawns merchandise, makes a couple of interest payments, then decides he wants to sell the merchandise to the pawnshop, who then places merchandise in Inventory and eventually sells it to another customer.
I designed a generic table where for example:
Contracts.Capital DECIMAL(7,2)
in a loan contract it holds the pawn principal amount, in a purchase holds the purchase price, in a sale holds sale price.
Is this design a good idea or should I keep them separate?
Your table second design is better, and, is "normalised".
You first design was the denormalised one!
You are basiclly following a database design/modelling pattern known as "subtype/supertype"
for dealing with things like transactions where there is a lot of common data and some data specific to each tranasaction type.
There are two accepted ways to model this. If the the variable data is minimal you hold everthing in a single table with the transaction type specfic attributes held in "nullable" columns. (Which is essentially your case and you have done the correct thing!).
The other case is where the "uncommon" data varies wildly depending on transaction type in which case you have a table which holds all the "common" attributes, and a table for each type which holds the "uncommon" attributes for that "type".
However while "LOAN", "PURCHASE" and "SALE" are valid as transactions. I think Inventory is a different entity and should have a table on its own. Essentially "LOAN" wll add to inventory transaction, "PURCHASE" wll change of inventory status to Saleable and "SALE" wll remove the item from inventory (or change status to sold). Once an item is added to inventory only its status should change (its still a widget, violin or whatever).
I would argue that it's not denormalized. I see no repeating groups; all attributes depending on a unique primary key. Sounds like a good design to me.
Is there a question here? Are you just looking for confirmation that it's acceptable?
Hypothetically, what would you do if the overwhelming consensus was that you should revert back to separate tables for each type? Would you ignore the evidence from your own experience (better performance, simpler programming) and follow the advice of Stackoverflow.com?

Relational database tables

I'm currently working on an ASP.Net MVC project for a software engineering class. My goal is to create a small online game rental system. I currently have a 3 tables, Movies, Games and Registrants; and I'm using LINQ-to-SQL to define each of these tables as classes for my model. So far I've created models for Movies and Games, what I would like to do when creating the Registrant model is create a relationship between Registrants and Movies and Games. What I've tried so far is to define a foreign key between the ID (the primary key in the Registrant table) and a registrantID field in both the Movies and Games. What I realized is that if I were to remove an instance of a registrant, it will delete the associated movie and/or game from the other tables. What I'm thinking of doing is creating two separate models defining rentedGames and rentedMovies and creating a relationship between those and the Games and Movies table in order to try and model a registrant renting/returning/buying movies or games from the store.
In Summary:
What I have so far:
3 tables: Registrants, Movies and
Games.
LINQ-to-SQL models for my
inventory of movies and games.
What I'm trying to setup:
A model for a registrant renting/returning a movie and/or game, when a game is rented/returned, a flag is placed next to the item in the inventory to indicate its status.
Question:
Will adding separate tables to model
a rented movie/game prevent items
defined in my inventory models from
being deleted?? i.e. when a customer returns a rented movie, the rentedMovie instance is deleted, but not the movie is is referring to in the movie inventory.
Is there such a thing as a related
table having a status flag set on the
related entry, as opposed to the
entry being deleted, whenever the
associated entry in the other table
is modified?? i.e. when a customer returns a rented movie, the rentedMovie instance sets a flag in the movie it refers to that it's available for rent, the rentedMovie instance is then deleted.
I'd go about this a bit differently. First, is there a real reason to treat a Movie and a Game as separate entities? Why not have a RentableItem that can be either a movie, a game, a game machine, a Blue-Ray player, or whatever? You'd key it by an item_id field, and it would have the expected metadata (title, type, genre, rental_class, and so on).
Then you need to model the fact that a Registrant rents one or more RentableItems. This can be done with a Rental table, whose rows each connect one rented RentableItem with a particular Registrant (that is, the Rental is keyed by a rental_id and it has a foreign key to RentableItem.item_id and a foreign key to Registrant.registrant_id. The Rental would also have the due date, a "returned" flag, the price of the rental, etc.
Then you know a RentableItem is not in the store if there is a Rental record whose item_id is the same as the RentableItem's and whose "returned" flag is false. You never have to modify the RentableItem table itself, just the Rental table.
You're right to create separate tables for rentedGames and rentedMovies, since this model now allows for more than one movie or game of the same type being rented at the same time, which is surely more realistic than having only one instance of a particular movie or game.
This will prevent the deletion of the parent record, when the link record (rentedMovie, say) is deleted. But this deletion of the parent movie should not be happening anyway if you've set up your relationship to not 'cascade delete', and you allowed the registrantID field in the original Movies or Games tables to be nullable.
To answer your second question (which I realise assumes only one movie/game for any particlar title): the way this is normally done, if you're using link tables, which is what you want to do, is simply to delete the rentedMovie/Game record. The absence of a link record for any Movie or Game is all your code needs to determine in order to know that that movie or game is now rentable (again).
I know you're doing this for a class / practice, so this may not be relevant, but consider that having the rental history for things is often very useful. Because of this, you may not want to delete the rented records, but instead just mark the item as returned.
Consider:
TABLE RentalTransaction:
RentalTransactionID integer PK NOT NULL
CustomerID integer FK NOT NULL
RentedOn datetime NOT NULL
DueDate datetime NOT NULL
<..any other fields you may need..>
TABLE RentalItems:
RentedID integer PK NOT NULL
RentalTransactionID integer FK NOT NULL
RentedItemID integer FK NOT NULL
RentedQty integer NOT NULL
RentalRetuned datetime NULL
You can see if any individual item is out or not by if it's RentalReturned field is null or not. If it is nonull, then you know the item is back, and now you can aggregate rental data to see how often it goes out, what the average length of rental is, etc, etc. You would have to build in some checks to make sure you weren't renting more copies of an item than you actually have and other such things, but I think this is overall a more flexible start to a schema. It may also be overly complicated for what you're doing, but I wanted to at least bring the idea up.
Do you really want to delete the rentedMovie instance? How will you report on how many movies a person has rented etc?
I'd suggest rethinking your model slightly. You need somewhere to store people data, somewhere to store item data and somewhere to store people/item data as a first step.
Ignore the difference between movies and games for now - that becomes a process of normalisation once you've defined your underlying structure.
As a simple starting point you should have:
Persons 1..1 ---- 1..* Hires 0..* ---- 1..1 Items
where the Hires table is a linking table between the two others with a combined key made up of personID, ItemID and a time-stamp of some description (to allow re-renting of the same movie).
You can then look at having a separate table for item types etc.
First thing to consider is that a movie is actually two entities, title and media. Title is "Lord of the Rings", while media is a DVD you take home. One title can have many media (copies), while one media has one title. Rental table has a row for each media-rental, this table gets a new row each time a bar code is scanned on rental, while DateReturned is populated upon return. Status field in the Media table tracks the in/out status for each disc/game.
If you feel that you need to track which movies were rented together to a customer, you may find that by DateRented (datetime) or add a ReceiptNumber or ShoppingBasketID to the Rental table.