Which one is the better normalization? - sql

I'm not good at normalizing databases, and I have a scenario that confuses me.
We need a database for storage software with the usual abilities, like:
storing customers and goods
storing purchase invoices
storing sales invoices
getting the current amount of goods in stock.
I have 2 options in mind:
1. First Solution
We have 4 tables like this:
goods:
ID: PK; ->unique id for each ware
Name ; ->this is clear enough ;)
customers:
ID: PK;
Name;
invoices:
ID: PK;
ID_Customer: FK;
Date;
invoices_items:
ID_Invoice: FK;
ID_Ware: FK;
Qty: quantity of the ware that was bought or sold. For purchases the number is positive; for sales it is negative.
2. Second Solution
We have 3 tables like this:
goods and customers are like the first
invoices:
ID: it's not a PK;
ID_Customer: FK;
ID_goods: FK;
Date;
Qty
The real difference between the first and the second is in invoices.
So my questions are:
Which one is better?
If there is a third option that is better than my solutions, please advise me.
Please tell me some ways to improve my normalization skills.
Finally, sorry for my bad English ;)

You should definitely go for the first... with both invoices and invoices_items. It is more normalized. The invoice date and other invoice data you may want to add in the future (number, status, date_delivered or other such stuff) should reside in its own table.
If you opt for the second solution you will have complex maintenance issues. If you want to change the invoice date, you will have to do so on every row that belongs to that invoice. And you will never be sure that all of those rows carry one single date. If data can go wrong... it will. To avoid this, try to keep all the data in the correct table, where it resides logically. Do not repeat it on multiple rows just to save yourself the creation of one table.

Your first solution is better. In your second solution, you're effectively duplicating the date and customer for each item in your invoice, which is both more prone to error (What if the dates are different on items with the same ID?) and less storage-efficient.
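As a rough illustration of the first solution, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names come from the question; the sample rows and the composite primary key on invoices_items are assumptions made for the example. It shows that an invoice date lives in exactly one row and that stock on hand falls out of summing the signed Qty values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goods     (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE customers (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE invoices (
    ID          INTEGER PRIMARY KEY,
    ID_Customer INTEGER NOT NULL REFERENCES customers(ID),
    Date        TEXT NOT NULL             -- stored once per invoice
);
CREATE TABLE invoices_items (
    ID_Invoice INTEGER NOT NULL REFERENCES invoices(ID),
    ID_Ware    INTEGER NOT NULL REFERENCES goods(ID),
    Qty        INTEGER NOT NULL,          -- positive = bought, negative = sold
    PRIMARY KEY (ID_Invoice, ID_Ware)     -- assumption: one line per ware per invoice
);
-- Sample data, invented for the example
INSERT INTO goods VALUES (1, 'bolt'), (2, 'nut');
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO invoices VALUES (10, 1, '2024-01-05'), (11, 1, '2024-01-09');
INSERT INTO invoices_items VALUES (10, 1, 100), (10, 2, 50), (11, 1, -30);
""")

# Correcting an invoice date touches exactly one row, however many items it has.
changed = conn.execute(
    "UPDATE invoices SET Date = '2024-01-06' WHERE ID = 10").rowcount

# Current stock per ware is the sum of the signed quantities.
stock = dict(conn.execute("""
    SELECT g.Name, COALESCE(SUM(i.Qty), 0)
    FROM goods g
    LEFT JOIN invoices_items i ON i.ID_Ware = g.ID
    GROUP BY g.ID, g.Name
"""))
print(changed, stock)
```

With the second solution the same date correction would have to update one row per item, and nothing in the schema would stop those rows from disagreeing.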

Related

Turn two database tables into one?

I am having a bit of trouble modelling a relational database for an inventory management system. For now, it only has 3 simple tables:
Product
ID | Name | Price
Receivings
ID | Date | Quantity | Product_ID (FK)
Sales
ID | Date | Quantity | Product_ID (FK)
As Receivings and Sales are identical, I was considering a different approach:
Product
ID | Name | Price
Receivings_Sales (the name doesn't matter)
ID | Date | Quantity | Type | Product_ID (FK)
The column type would identify if it was receiving or sale.
Can anyone help me choose the best option, pointing out the advantages and disadvantages of either approach?
The first one seems reasonable because I am thinking in an ORM way.
Thanks!
Personally I prefer the first option, that is, separate tables for Sales and Receiving.
The two biggest disadvantages of option 2, merging the two tables into one, are:
1) Inflexibility
2) Unnecessary filtering on use
First, inflexibility. If your requirements expand (or you simply overlooked something), you will have to break up your schema or end up with unnormalized tables. For example, say your sales now need to record the sales clerk who handled the transaction; that obviously has nothing to do with receiving. What if you do both retail and wholesale sales: how would you accommodate that in the merged table? How about discounts or promos? Those are just the obvious cases. Now consider receiving. What if we want to tie receiving to our purchase orders? Purchase order details like P.O. number, P.O. date, supplier name, etc. do not belong under sales; they relate to receiving.
Second, unnecessary filtering on use. If you have a merged table and want to use only the sales (or receiving) portion, you have to filter out the other portion in either your back-end or your front-end program. With separate tables, you only have to deal with one table at a time.
Additionally, you mentioned ORM. The first option fits that best, because each object or entity should be distinct from the others.
If the tables really are and always will be identical (and I have my doubts), then name the unified table something more generic, like "InventoryTransaction", and then use negative numbers for one of the transaction types: probably sales, since that would correctly mark your inventory in terms of keeping track of stock on hand.
The fact that headings are the same is irrelevant. Seeking to use a single table because headings are the same is misconceived.
-- person [source] loves person [target]
LOVES(source,target)
-- person [source] hates person [target]
HATES(source,target)
Every base table has a corresponding predicate aka fill-in-the-[named-]blanks statement describing the application situation. A base table holds the rows that make a true statement.
Every query expression combines base table names via JOIN, UNION, SELECT, EXCEPT, WHERE condition, etc and has a corresponding predicate that combines base table predicates via (respectively) AND, OR, EXISTS, AND NOT, AND condition, etc. A query result holds the rows that make a true statement.
Such a set of predicate-satisfying rows is a relation. There is no other reason to put rows in a table.
(The other answers here address, as they must, proposals for and consequences of the predicate that your one table could have. But if you didn't propose the table because of its predicate, why did you propose it at all? The answer is, since not for the predicate, for no good reason.)
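To make the correspondence between query operators and predicate connectives concrete, here is a small sketch using Python's sqlite3 module. The LOVES and HATES tables are the ones from the answer; the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LOVES (source TEXT, target TEXT, PRIMARY KEY (source, target));
CREATE TABLE HATES (source TEXT, target TEXT, PRIMARY KEY (source, target));
-- Invented sample rows
INSERT INTO LOVES VALUES ('ann', 'bob'), ('bob', 'cat');
INSERT INTO HATES VALUES ('ann', 'bob'), ('cat', 'ann');
""")

# UNION <-> OR: person [source] loves person [target] OR hates person [target]
loves_or_hates = conn.execute(
    "SELECT source, target FROM LOVES "
    "UNION SELECT source, target FROM HATES "
    "ORDER BY source, target").fetchall()

# JOIN <-> AND: person [source] loves person [target] AND hates person [target]
loves_and_hates = conn.execute(
    "SELECT l.source, l.target FROM LOVES l "
    "JOIN HATES h ON h.source = l.source AND h.target = l.target").fetchall()

# EXCEPT <-> AND NOT: person [source] loves person [target] AND does NOT hate them
loves_not_hates = conn.execute(
    "SELECT source, target FROM LOVES "
    "EXCEPT SELECT source, target FROM HATES").fetchall()

print(loves_or_hates, loves_and_hates, loves_not_hates)
```

Both tables have identical headings, yet each result set only makes sense because the two base tables carry different predicates.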

SQL Normalization with multiple "measures" tables

I'm currently trying to redesign a Point of Sale database to make it more normalized, which will help tremendously with managing the data, etc. I'm a little bit unsure about the best design practices, based on the data I have to deal with. First of all there are basically two sets of measures, which share common keys. There is inventory data, units and dollars, and then point of sale data, units and dollars. Each of these is at the customer, store, item and date level.
What I've done (mostly in theory at this point) is to create separate tables for
Item level information
Item_ID,
Customer_ID
itemnumber
(and a few other item specific information).
Stores
Store_ID,
Customer_ID,
Store Number,
(and essentially address information)
Customer
Customer_ID,
Customer Number
(other customer specific information like name).
So in addition to those "support" tables, I have the
Main Inventory Data
Store_ID
Item_ID
I also have a POS Data table, with the exact same IDs.
Basically my questions are:
Should I include the Customer ID in the POS Data and Inventory Data tables, even though it is already part of both the Stores and Items tables?
My second question: if I do add the Customer ID and then join all of these tables together,
would I join the Customer ID from all of the tables (POS Data, Stores and Items, or Inventory Data, Stores and Items) to the Customers table, or
would just joining from the POS Data table be sufficient?
Let me give a few additional details, regarding the data. As an example, we have two Customers, CustomerA and CustomerB. CustomerA has several stores whose store numbers are 1000,1025, 1036 and 1037. CustomerB also has several stores, whose store numbers are 1025, 1030, and 1037. Store numbers 1025 and 1037 happen to be the same between customers, but the stores themselves are unique and completely different.
CustomerA's Store Number 1000 sells three of our items (this is a wholesale perspective), which are Items ABC, DEF and EFG. CustomerA's Store Number 1025 also sells three of our items, which are ABC, HIJ and XYZ.
Each of these items carries two important pieces of data with regard to its relationship to its specific customer and store number: Point of Sale data and Inventory Data. Point of Sale data would be in the form of PosUnits, which is the quantity of an item that was sold, and PosDollars, which is the total dollars of the item that were sold in that store (essentially the number of units times the price it was sold for). The Inventory Data would be in InventoryUnits, which is the quantity of an item that is in stock at a store. [One thing to note: I separated inventory and POS data into separate tables because we don't always receive both pieces of data from every customer. Also, inventory and POS data are generally analyzed separately.]
So, back to my example, CustomerA's Store Number 1000, item ABC may have sold 100 units, which is $1245.00. CustomerA's Store Number 1025, may have sold only 10 units of the same item for $124.50.
Now if we go back to CustomerB, it just so happens this Customer also has an item named ABC that it sells in many of their stores. CustomerA's item ABC is a completely different product from CustomerB's item ABC. It's purely coincidental that they named them the same thing.
Let me add this last point of clarification, which I probably should have stated earlier. My perspective is as a wholesaler. When I say item, I'm speaking of the customer's item number, not the wholesaler's item number. There is a cross reference involved in getting to the wholesaler's item, and the customer may have more than one of their item numbers that reference the same wholesaler item number. I don't think it's necessary to delve into that, though.
Question #1: As part of the normalization rules, you should avoid including redundant data in any table unless performance issues require denormalization. There are thousands of articles that explain why redundancy should be avoided.
As for Question #2: the rule is to pick only the columns that you need in your queries. If you need the Customer_ID, pick it from wherever it is cheapest for the database.
Allow me to raise one more question:
Why do you have Customer_ID repeated in Stores and in the item-level table when you can join them through the Main Inventory Data? This is another redundancy.
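Here is a minimal sketch of the design without Customer_ID in the measures table, using Python's sqlite3 module. The table and column names are adapted from the question; the sale_date column, the UNIQUE constraints and the sample rows are assumptions for illustration. Customer_ID stays on stores and items in this sketch because, per the question, store numbers and item numbers are only unique within a customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE stores (
    store_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customers(customer_id),
    store_number INTEGER NOT NULL,
    UNIQUE (customer_id, store_number)    -- store 1025 may exist once per customer
);
CREATE TABLE items (
    item_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    item_number TEXT NOT NULL,
    UNIQUE (customer_id, item_number)     -- item ABC may exist once per customer
);
CREATE TABLE pos_data (                   -- no customer_id column here
    store_id    INTEGER NOT NULL REFERENCES stores(store_id),
    item_id     INTEGER NOT NULL REFERENCES items(item_id),
    sale_date   TEXT NOT NULL,            -- assumed date-level key
    pos_units   INTEGER NOT NULL,
    pos_dollars NUMERIC NOT NULL,
    PRIMARY KEY (store_id, item_id, sale_date)
);
INSERT INTO customers VALUES (1, 'CustomerA'), (2, 'CustomerB');
INSERT INTO stores VALUES (1, 1, 1000), (2, 1, 1025), (3, 2, 1025);
INSERT INTO items VALUES (1, 1, 'ABC'), (2, 2, 'ABC');
INSERT INTO pos_data VALUES (1, 1, '2024-01-01', 100, 1245.00),
                            (2, 1, '2024-01-01', 10, 124.50);
""")

# The customer is reached through the store; pos_data never stores it directly.
rows = conn.execute("""
    SELECT c.name, s.store_number, i.item_number, p.pos_units
    FROM pos_data p
    JOIN stores s    ON s.store_id = p.store_id
    JOIN items i     ON i.item_id = p.item_id
    JOIN customers c ON c.customer_id = s.customer_id
    ORDER BY s.store_number
""").fetchall()
print(rows)
```

One trade-off to be aware of: nothing in this sketch stops a pos_data row from pairing CustomerA's store with CustomerB's item. Enforcing that in the schema would need customer_id carried into pos_data as part of composite foreign keys, which is the redundancy the question asks about.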

Improvement on database schema

I'm creating a small pet shop database for a project
The database needs to have a list of products by supplier that can be grouped by pet type or product category.
Each in-store sale and customer order can have multiple products per order and an employee attached to it. A customer order must have a customer, and an employee must have a position.
http://imgur.com/2Mi7EIU
Here are some random thoughts
I often separate addresses from the thing that has an address. You could make 1-many relationships between Employee, Customer and Supplier to an address table. That would allow you to have different types of addresses per entity, and to change addresses without touching the original table.
If it is possible for prices to change for an item, you would need to account for that somehow. Ideas there are create a pricing table, or to capture the price on the sales item table.
I don't like the way you handle the sales item table. Using different foreign keys depending on the type of the transaction is not quite correct. One alternative would be to replace the SalesItem SaleID and OrderId with the SalesRecordId. Another, better option would be to merge the fields from InStoreSale, SalesRecord, and CustomerOrders into a single table and add an indicator column showing which type of transaction it was.
You should probably be consistent about singular versus plural table names. For example, CustomerOrders vs. CustomerOrder.
Putting PositionPay on the EmployeePosition table seems off too. Employees in the same position can typically have different pay.
Is the PetType structured with enough complexity? Can't you have items that apply to more than one pet type? For example, a fishtank can be used for fish or lizards? If so, you will need a many-to-many join table there.
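The many-to-many join table mentioned in the last point could look roughly like this. The schema image is not reproduced here, so the table and column names below (Product, PetType, ProductPetType) are hypothetical stand-ins, and the rows are invented. The sketch uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical names; adapt them to the real schema.
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE PetType (PetTypeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ProductPetType (             -- one row per (product, pet type) pair
    ProductID INTEGER NOT NULL REFERENCES Product(ProductID),
    PetTypeID INTEGER NOT NULL REFERENCES PetType(PetTypeID),
    PRIMARY KEY (ProductID, PetTypeID)
);
INSERT INTO Product VALUES (1, 'Fish tank'), (2, 'Dog leash');
INSERT INTO PetType VALUES (1, 'Fish'), (2, 'Lizard'), (3, 'Dog');
INSERT INTO ProductPetType VALUES (1, 1), (1, 2), (2, 3);
""")

def products_for(pet_type_name):
    """Return the names of all products linked to the given pet type."""
    return [name for (name,) in conn.execute("""
        SELECT p.Name
        FROM Product p
        JOIN ProductPetType pp ON pp.ProductID = p.ProductID
        JOIN PetType t         ON t.PetTypeID = pp.PetTypeID
        WHERE t.Name = ?
        ORDER BY p.Name
    """, (pet_type_name,))]

fish_products = products_for("Fish")
lizard_products = products_for("Lizard")
print(fish_products, lizard_products)
```

The fish tank shows up under both pet types without being stored twice in Product.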
Hope this helps!

Many items in single order, SQL table design

I have one order table. When a user comes to the store, they can purchase more than one item. What is the best practice for recording this? Should each ordered item be one row in the order table, or should everything be combined into one row?
Let's say Customer A purchases 3 items from the store. Will there be 3 rows in the order table, or 1 row with all the details, with the items separated by a delimiter?
My designed structure is like this:
tblOrder
OrderID (Primary Key)
UserID
TotalPrice
tblOrderItem
OrderItemID (Primary Key)
OrderID (Referencing tblOrder)
Quantity
ItemID (Referencing tblItem)
TotalPrice
Is this correct ?
I suggest you keep 3 rows, because that will be easier to maintain later.
Updated:
#SLim, the structure now looks right for the situation. You can proceed with it.
Without question, each ordered item should have its own row in the database table.
Otherwise, for example, if you wanted to find all orders where a particular item was purchased -- how would you do it? It's much easier if there's a single item per row.
More than that, this is the approach taken by larger e-commerce sites. If you're interested in learning to develop software professionally you must learn the best practices.
Good luck!
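The "find all orders where a particular item was purchased" case can be sketched as follows with Python's sqlite3 module. Table and column names follow the question; the tblItem.Name column and the sample rows are assumptions, and TotalPrice on tblOrder is left out here on the assumption that it can be derived from the item rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblItem  (ItemID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE tblOrder (OrderID INTEGER PRIMARY KEY, UserID INTEGER NOT NULL);
CREATE TABLE tblOrderItem (               -- one row per item in an order
    OrderItemID INTEGER PRIMARY KEY,
    OrderID     INTEGER NOT NULL REFERENCES tblOrder(OrderID),
    ItemID      INTEGER NOT NULL REFERENCES tblItem(ItemID),
    Quantity    INTEGER NOT NULL,
    TotalPrice  NUMERIC NOT NULL
);
-- Invented sample data: order 100 has three items, order 101 has one
INSERT INTO tblItem VALUES (1, 'pen'), (2, 'ink'), (3, 'pad');
INSERT INTO tblOrder VALUES (100, 7), (101, 8);
INSERT INTO tblOrderItem VALUES
    (1, 100, 1, 2, 3.00), (2, 100, 2, 1, 4.50), (3, 100, 3, 1, 2.50),
    (4, 101, 2, 3, 13.50);
""")

# All orders that contain item 2: a plain equality filter, no string parsing.
orders_with_ink = [oid for (oid,) in conn.execute(
    "SELECT DISTINCT OrderID FROM tblOrderItem WHERE ItemID = ? ORDER BY OrderID",
    (2,))]

# The order total can be derived from the item rows when needed.
order_totals = dict(conn.execute(
    "SELECT OrderID, SUM(TotalPrice) FROM tblOrderItem GROUP BY OrderID"))
print(orders_with_ink, order_totals)
```

With a delimited list packed into one row, the first query would need pattern matching on text and could not use an index on ItemID.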

Database structure for storing historical data

Preface:
I was thinking the other day about a new database structure for a new application and realized that we needed a way to store historical data in an efficient way. I was wanting someone else to take a look and see if there are any problems with this structure. I realize that this method of storing data may very well have been invented before (I am almost certain it has) but I have no idea if it has a name and some google searches that I tried didn't yield anything.
Problem:
Let's say you have a table for orders, and orders are related to a customer table for the customer that placed the order. In a normal database structure you might expect something like this:
orders
------
orderID
customerID
customers
---------
customerID
address
address2
city
state
zip
Pretty straightforward: the orders table has a foreign key, customerID, which is the primary key of the customers table. But if we were to go and run a report over the orders table, we are going to join the customers table to the orders table, which will bring back the current record for that customer ID. What if, when the order was placed, the customer's address was different and it has subsequently been changed? Now our order no longer reflects the customer's address at the time the order was placed. Basically, by changing the customer record, we just changed all history for that customer.
Now there are several ways around this, one of which would be to copy the record when an order was created. What I have come up with though is what I think would be an easier way to do this that is perhaps a little more elegant, and has the added bonus of logging anytime a change is made.
What if I did a structure like this instead:
orders
------
orderID
customerID
customerHistoryID
customers
---------
customerID
customerHistoryID
customerHistory
--------
customerHistoryID
customerID
address
address2
city
state
zip
updatedBy
updatedOn
Please forgive the formatting, but I think you can see the idea. Basically, any time a customer is inserted or updated, a new customerHistory row is written with the next customerHistoryID, and the customers table is updated with that latest customerHistoryID. The order table now points not only to the customerID (which allows you to see all revisions of the customer record), but also to the customerHistoryID, which points to a specific revision of the record. Now the order reflects the state of the data at the time the order was created.
By adding an updatedby and updatedon column to the customerHistory table, you can also see an "audit log" of the data, so you could see who made the changes and when.
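A minimal sketch of this revision-pointer idea, using Python's sqlite3 module. The table and column names are the ones listed above (trimmed to address and city for brevity); the helper functions save_customer and place_order and the sample values are hypothetical, and the sketch assumes the application is the only writer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customerHistory (
    customerHistoryID INTEGER PRIMARY KEY,    -- grows with every revision
    customerID        INTEGER NOT NULL,
    address TEXT, city TEXT,
    updatedBy TEXT, updatedOn TEXT
);
CREATE TABLE customers (
    customerID        INTEGER PRIMARY KEY,
    customerHistoryID INTEGER NOT NULL REFERENCES customerHistory(customerHistoryID)
);
CREATE TABLE orders (
    orderID           INTEGER PRIMARY KEY,
    customerID        INTEGER NOT NULL REFERENCES customers(customerID),
    customerHistoryID INTEGER NOT NULL REFERENCES customerHistory(customerHistoryID)
);
""")

def save_customer(customer_id, address, city, user, when):
    """Append a new revision and point the customers row at it."""
    rev = conn.execute(
        "INSERT INTO customerHistory (customerID, address, city, updatedBy, updatedOn) "
        "VALUES (?, ?, ?, ?, ?)", (customer_id, address, city, user, when)).lastrowid
    updated = conn.execute(
        "UPDATE customers SET customerHistoryID = ? WHERE customerID = ?",
        (rev, customer_id))
    if updated.rowcount == 0:                 # first revision: create the customer
        conn.execute("INSERT INTO customers VALUES (?, ?)", (customer_id, rev))
    return rev

def place_order(order_id, customer_id):
    """Pin the order to whatever revision is current right now."""
    conn.execute(
        "INSERT INTO orders SELECT ?, customerID, customerHistoryID "
        "FROM customers WHERE customerID = ?", (order_id, customer_id))

save_customer(1, "1 Old Rd", "Springfield", "alice", "2024-01-01")
place_order(500, 1)
save_customer(1, "9 New Ave", "Shelbyville", "bob", "2024-03-01")

address_on_order = conn.execute(
    "SELECT h.address FROM orders o JOIN customerHistory h "
    "ON h.customerHistoryID = o.customerHistoryID WHERE o.orderID = 500").fetchone()[0]
current_address = conn.execute(
    "SELECT h.address FROM customers c JOIN customerHistory h "
    "ON h.customerHistoryID = c.customerHistoryID WHERE c.customerID = 1").fetchone()[0]
print(address_on_order, current_address)
```

The order keeps pointing at the revision that was current when it was placed, while the customers row moves on to the newest one.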
One potential downside could be deletes, but I am not really worried about that for this need as nothing should ever be deleted. But even still, the same effect could be achieved by using an activeFlag or something like it depending on the domain of the data.
My thought is that all tables would use this structure. Anytime historical data is being retrieved, it would be joined against the history table using the customerHistoryID to show the state of data for that particular order.
Retrieving a list of current customers is easy: it just takes a join from the customers table to the customerHistory table on customerHistoryID.
Can anyone see any problems with this approach, either from a design standpoint or for performance reasons? Remember, no matter what I do, I need to make sure that the historical data is preserved so that subsequent updates to records do not change history. Is there a better way? Is this a known idea that has a name, or any documentation on it?
Thanks for any help.
Update:
This is a very simple example of what I am really going to have. My real application will have "orders" with several foreign keys to other tables. Origin/destination location information, customer information, facility information, user information, etc. It has been suggested a couple of times that I could copy the information into the order record at that point, and I have seen it done this way many times, but this would result in a record with hundreds of columns, which really isn't feasible in this case.
When I've encountered such problems, one alternative is to make the order the history table. It functions the same, but it's a little easier to follow.
orders
------
orderID
customerID
address
City
state
zip
customers
---------
customerID
address
City
state
zip
EDIT: if the number of columns gets too high for your liking, you can separate it out however you like.
If you do go with the other option and use history tables, you should consider using bitemporal data, since you may have to deal with the possibility that historical data needs to be corrected. For example, a customer changes his current address from A to B, but you also have to correct the address on an existing order that is currently being fulfilled.
Also if you are using MS SQL Server you might want to consider using indexed views. That will allow you to trade a small incremental insert/update perf decrease for a large select perf increase. If you're not using MS SQL server you can replicate this using triggers and tables.
When you are designing your data structures, be very careful to store the correct relationships, not something that is similar to the correct relationships. If the address for an order needs to be maintained, then that is because the address is part of the order, not the customer. Also, unit prices are part of the order, not the product, etc.
Try an arrangement like this:
Customer
--------
CustomerId (PK)
Name
AddressId (FK)
PhoneNumber
Email
Order
-----
OrderId (PK)
CustomerId (FK)
ShippingAddressId (FK)
BillingAddressId (FK)
TotalAmount
Address
-------
AddressId (PK)
AddressLine1
AddressLine2
City
Region
Country
PostalCode
OrderLineItem
-------------
OrderId (PK) (FK)
OrderItemSequence (PK)
ProductId (FK)
UnitPrice
Quantity
Product
-------
ProductId (PK)
Price
etc.
If you truly need to store history for something, like tracking changes to an order over time, then you should do that with a log or audit table, not with your transaction tables.
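The point that the unit price belongs to the order line rather than the product can be sketched like this with Python's sqlite3 module. Only Product and OrderLineItem from the arrangement above are modelled; the add_line helper is hypothetical, and storing prices as integer cents is an assumption for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductId INTEGER PRIMARY KEY,
    Price     INTEGER NOT NULL            -- current price, in cents (assumption)
);
CREATE TABLE OrderLineItem (
    OrderId           INTEGER NOT NULL,
    OrderItemSequence INTEGER NOT NULL,
    ProductId         INTEGER NOT NULL REFERENCES Product(ProductId),
    UnitPrice         INTEGER NOT NULL,   -- copied from Product.Price at order time
    Quantity          INTEGER NOT NULL,
    PRIMARY KEY (OrderId, OrderItemSequence)
);
INSERT INTO Product VALUES (1, 1245);
""")

def add_line(order_id, seq, product_id, quantity):
    """Create an order line, capturing the product's price as of now."""
    conn.execute(
        "INSERT INTO OrderLineItem "
        "SELECT ?, ?, ProductId, Price, ? FROM Product WHERE ProductId = ?",
        (order_id, seq, quantity, product_id))

add_line(1, 1, 1, 10)
conn.execute("UPDATE Product SET Price = 1500 WHERE ProductId = 1")  # later price change
add_line(2, 1, 1, 10)

# Order 1 keeps the price it was sold at; only order 2 sees the new price.
totals = dict(conn.execute(
    "SELECT OrderId, SUM(UnitPrice * Quantity) FROM OrderLineItem GROUP BY OrderId"))
print(totals)
```

The same copy-at-creation step would apply to ShippingAddressId and BillingAddressId if address rows are treated as immutable once an order references them.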
Normally orders simply store the information as it is at the time of the order. This is especially true of things like part numbers, part names and prices, as well as customer address and name. Then you don't have to join to five or six tables to get the information that can be stored in one. This is not denormalization, as you actually need to have the information as it existed at the time of the order. I think having this information in the order and order detail (which stores the individual items ordered) tables is also less risky in terms of accidental changes to the data.
Your order table would not have hundreds of columns. You would have an order table and an order detail table due to one-to-many relationships. The order table would include order no., customer id (so you can search for everything this customer has ever ordered even if the name changed), customer name, customer address (note you don't need city, state, zip etc.; put the address in one field), order date and possibly a few other fields that relate directly to the order at a top level. Then you have an order detail table that has order number, detail_id, part number, part description (this can be a consolidation of a bunch of fields like size, color etc., or you can separate out the most common), no. of items, unit type, price per unit, taxes, total price, ship date, status. You put one entry in for each item ordered.
If you are genuinely interested in such problems, I can only suggest you take a serious look at "Temporal Data and the Relational Model".
Warning1 : there is no SQL in there and almost anything you think you know about the relational model will be claimed a falsehood. With good reason.
Warning2 : you are expected to think, and think hard.
Warning3 : the book is about what the solution for this particular family of problems ought to look like, but as the introduction says, it is not about any technology available today.
That said, the book is genuine enlightenment. At the very least, it helps to make it clear that the solution for such problems will not be found in SQL as it stands today, or in ORMs as those stand today, for that matter.
What you want is called a data warehouse. Since data warehouses are OLAP and not OLTP, it is recommended to have as many columns as you need in order to achieve your goals. In your case the orders table in the data warehouse will have 11 fields, giving a 'snapshot' of orders as they come in, regardless of later updates to user accounts.
Wiley - The Data Warehouse Toolkit, Second Edition
It's a good start.
Our payroll system uses effective dates in many tables. The ADDRESSES table is keyed on EMPLID and EFFDT. This allows us to track every time an employee's address changes. You could use the same logic to track historical addresses for customers. Your queries would simply need to include a clause that compares the order date to the customer address date that was in effect at the time of the order. For example
select o.orderID, c.customerID, c.address, c.city, c.state, c.zip
from orders o
join customers c on c.customerID = o.customerID
-- keep only the customer row that was in effect on the order date
where c.effdt = (
    select max(c1.effdt)
    from customers c1
    where c1.customerID = c.customerID
      and c1.effdt <= o.orderdt
)
The objective is to select the most recent row in customers having an effective date that is on or before the date of the order. This same strategy could be used to keep historical information on product prices.
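For anyone who wants to try the effective-date lookup, here is a small runnable sketch with Python's sqlite3 module. The column names (effdt, orderdt) follow the query above; the trimmed-down table definitions, the composite key on customers and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customerID INTEGER NOT NULL,
    effdt      TEXT NOT NULL,             -- ISO dates sort correctly as text
    address    TEXT NOT NULL,
    PRIMARY KEY (customerID, effdt)       -- one row per customer per effective date
);
CREATE TABLE orders (
    orderID    INTEGER PRIMARY KEY,
    customerID INTEGER NOT NULL,
    orderdt    TEXT NOT NULL
);
-- Invented data: the customer moved on 2024-03-01
INSERT INTO customers VALUES (1, '2023-01-01', '1 Old Rd'),
                             (1, '2024-03-01', '9 New Ave');
INSERT INTO orders VALUES (500, 1, '2023-06-15'), (501, 1, '2024-03-01');
""")

# For each order, pick the customer row with the latest effdt on or before orderdt.
rows = conn.execute("""
    SELECT o.orderID, c.address
    FROM orders o
    JOIN customers c ON c.customerID = o.customerID
    WHERE c.effdt = (SELECT MAX(c1.effdt)
                     FROM customers c1
                     WHERE c1.customerID = c.customerID
                       AND c1.effdt <= o.orderdt)
    ORDER BY o.orderID
""").fetchall()
print(rows)
```

An order placed on the very day of the change picks up the new address, because the comparison is "on or before".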
I myself like to keep it simple. I would use two tables: a customer table and a customer history table. If you have the key (e.g. CustomerID) in the history table, there is no reason to make a joining table; a select on that key will give you all records.
You also don't have audit information (e.g. date modified, who modified etc) in the history table as you show it; I expect you want this.
So mine would look something like this:
CustomerTable (this contains current customer information)
CustomerID (distinct non null)
...all customer information fields
CustomerHistoryTable
CustomerID (not distinct non null)
...all customer information fields
DateOfChange
WhoChanged
The DateOfChange field is the date on which the customer record was changed from the values in this history row to newer values (those in a more recent history row, or in the CustomerTable).
Your orders table just needs a CustomerID; if you need to find the customer information at the time of the order, it is a simple select.
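One way the two-table approach and its "simple select" might look, sketched with Python's sqlite3 module. The table and column names follow the answer, with a single Address column standing in for "all customer information fields"; the helpers change_address and address_as_of and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerTable (              -- current values only
    CustomerID INTEGER PRIMARY KEY,
    Address    TEXT NOT NULL
);
CREATE TABLE CustomerHistoryTable (       -- superseded values
    CustomerID   INTEGER NOT NULL,
    Address      TEXT NOT NULL,
    DateOfChange TEXT NOT NULL,           -- when these values stopped being current
    WhoChanged   TEXT
);
INSERT INTO CustomerTable VALUES (1, '1 Old Rd');
""")

def change_address(customer_id, new_address, who, when):
    """Archive the outgoing values, then overwrite the current row."""
    conn.execute(
        "INSERT INTO CustomerHistoryTable "
        "SELECT CustomerID, Address, ?, ? FROM CustomerTable WHERE CustomerID = ?",
        (when, who, customer_id))
    conn.execute("UPDATE CustomerTable SET Address = ? WHERE CustomerID = ?",
                 (new_address, customer_id))

def address_as_of(customer_id, date):
    """Values in effect on `date`: the first history row replaced after that
    date, or the current row if nothing was changed afterwards."""
    return conn.execute("""
        SELECT COALESCE(
            (SELECT Address FROM CustomerHistoryTable
             WHERE CustomerID = :id AND DateOfChange > :d
             ORDER BY DateOfChange LIMIT 1),
            (SELECT Address FROM CustomerTable WHERE CustomerID = :id))
    """, {"id": customer_id, "d": date}).fetchone()[0]

change_address(1, "9 New Ave", "bob", "2024-03-01")
before = address_as_of(1, "2023-06-15")
after = address_as_of(1, "2024-06-01")
print(before, after)
```

Passing an order's date to address_as_of gives the address that applied when the order was placed, without storing any history pointer on the order itself.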