SQL Normalization with multiple "measures" tables - sql

I'm currently trying to redesign a Point of Sale database to make it more normalized, which will help tremendously with managing the data, etc. I'm a little bit unsure about the best design practices, based on the data I have to deal with. First of all there are basically two sets of measures, which share common keys. There is inventory data, units and dollars, and then point of sale data, units and dollars. Each of these is a customer, store, item and date level.
What I've done (mostly in theory at this point) is to create separates table for
Item level information
Item_ID,
Customer_ID
itemnumber
(and a few other item specific information).
Stores
Store_ID,
Customer_ID,
Store Number,
(and essentially address information)
Customer
Customer_ID,
Customer Number
(other customer specific information like name).
So in addition to those "support" tables, I have the
Main Inventory Data
Store_ID
Item_ID
I also have POS Data table, with the exact same ID's.
Basically my questions are:
should I include the Customer ID in the Pos Data and Inventory Data tables, even though they are a part of both the stores and items tables?
My second question would be, if I do add the customer ID, if I would join all of these tables together,
would I join the customer ID from all of the tables (Pos Data, Stores and Items OR Inventory Data, Stores and Items) to the customers table or
would just joining from the Pos Data table be sufficient.
Let me give a few additional details, regarding the data. As an example, we have two Customers, CustomerA and CustomerB. CustomerA has several stores whose store numbers are 1000,1025, 1036 and 1037. CustomerB also has several stores, whose store numbers are 1025, 1030, and 1037. Store numbers 1025 and 1037 happen to be the same between customers, but the stores themselves are unique and completely different.
CustomerA's Store Number 1000 sells three of our items (this is a wholesale perspective), which are Items ABC, DEF and EFG. CustomerA's Store Number 1025 also sells three of our items, which are ABC, HIJ and XYZ.
Each of these items contains two import pieces of data, in regards to its relationship to its specific customer and store number, Point of Sale data and Inventory Data. Point of Sale data would be in the form of PosUnits, which would be the quantity of an item that were sold, and PosDollars, which would be the total Dollars of the item that were sold in that store (essentially the number of units times the price it was sold for). The Inventory Data would be in InventoryUnits, which is the quantity of an item that is in stock at a store. [one thing to note, I separated inventory and pos data into separate tables, because we don't always receive both pieces of data from every customer. Also inventory and POS data are generally analyzed separately].
So, back to my example, CustomerA's Store Number 1000, item ABC may have sold 100 units, which is $1245.00. CustomerA's Store Number 1025, may have sold only 10 units of the same item for $124.50.
Now if we go back to CustomerB, it just so happens this Customer also has an item named ABC that it sells in many of their stores. CustomerA's item ABC is a completely different product from CustomerB's item ABC. It's purely coincidental that they named them the same thing.
Let me add this last point of clarification, which I probably should have stated earlier. My perspective is as a wholesaler. When I say item, I'm speaking of the customers item number, not the wholesalers item number. There is a cross reference involved in getting to the wholesalers item and the customer may have more than one of their item number the reference the same wholesaler item number. I don't think it' necessary to delve into that, though.

Question #1: As part of the normalization rules, you should avoid to include redundant data in any table unless there are performance issues that require de-normalization. there are thousands of articles that will explain why avoiding redundancy.
As for Question #2: in the rules are only pick the columns that you need in your queries, if you need the Customer_ID pick it from where is cheaper for the database
Allow me to raise one more question
why do you have repeated Customer_ID in Stores and Item_level when you can join them thought the Main Inventory Data. this is another redundancy.

Related

Remove duplicates from fact table to calculate measure correctly

I'm very new to data warehousing and dimensional modelling. For a uni project I started out with a database that I need to turn into a data warehouse for analyses. To end up with a clean star schema, I had to denormalize a few tables into 1 fact table. The downside to this is the amount of redundancy.
Below is a part of the data from the fact table:
A voyage consists of multiple shipments, and a shipment can consist of multiple different items. In this example, containers 1-2000 of shipment 1 contain item 3, and containers 2001-5000 contain item 1. The total amount of containers for this shipment is 5000, obviously. However, for data analysis purposes, I need to calculate a measure for the total amount of containers per voyage. This presents a problem with the current fact table, because I have a record for each different item. So for voyage 1, the actual total amount should be 9200, but because of the duplication I'll end up with 19400, leading to an incorrect measure.
I need to find a way to get rid of the duplicates in my calculation, but I can't find a way to do so. Any help would be much appreciated.
What you'll need to do is group by your shipments (CTE, inner query, temp table, etc) to get the number of containers per shipment, then group by your voyages to get the number of containers per voyage.
Here's an example with an inner query:
SELECT voyage_id, SUM(num_ship_containers) AS num_voyage_containers
FROM (
SELECT voyage_id, shipment_id, MAX(container_end) AS num_ship_containers
FROM ShippingWarehouse
GROUP BY voyage_id, shipment_id
) AS ship_data
GROUP BY voyage_id;
voyage_id
num_voyage_containers
1
9200
Try it out!

Can i Create one table for purchases orders and sales orders?

I'm creating a Motorcycle store system, i wondering if i need to create one table that contains orders, the sales orders and the purchases orders with a column that will be called OrderType that determine if its purchase order or sale order
the same with Payments, money that i pay to supplier and money that customer pay to me, should be in table that called payment and column that determine if its outgoing payment or income
is that ok? or i need to create other tables
I would consider against it... Purchases are from a vendor YOU get the products from to build/fix/manufacture something.
Sales orders would have a customer you are selling to and thus would be foreign keys to different tables... unless your customer table has your vendors too and that has some column to identify difference as Vendor vs Customer.
Additionally, as you expand your design development and queries, purchasing history, etc., it may be more beneficial to have them separate.
You can create a single table. Whether this is good design or unfortunate design depends on how you use the data. How many times do you ever want to query these two datasets as if they were one dataset? How many times to you query them separately?

I am making a SQL database of categories and subcategories. What is the best way to link these tables?

The database has a table called "categories" with columns CATEGORY_ID(primary key) and CATEGORY_NAME.
I have subcategories for each category.
For better accessing which is the best method from the below methods.
Method 1: The "CATEGORY_ID" column in the "categories" table is a FOREIGN KEY in the "subcategories " table.
Method 2: Maintaining a separate table for each category representing the subcategories.
I prefer to use same table for category and sub category
like
Table Categories
[CATEGORY_ID, CATEGORY_NAME, PARENT_CATEGORY_ID]
In case you don't know how many sub categories are there.
This scenario is just an example, the scenario is as follows: We have a product table where all the records of products are stored. Same way we will have customer table where records of customers are stored. The daily sales keep the record of all the sales. This sales table will keep record of which product who has purchased. So linking is to be done from Sales table to product table and customer table.
The query to link the two tables is as follows:SELECT product_name, customer.name, date_of_sale FROM sales, product, customer WHERE product.product_id = sales.product_id and customer.customer_id >= sales.customer_id LIMIT 0, 30
It is better to go with Method 1 since it is more scalable.
Let me elaborate on this. If we go with method 1, we need to maintain 2 tables only that is Categories and Subcategories. In future if we have new categories or subcategories we can directly deal with this 2 tables.
If we consider same situation with Method2 then we need to create new tables every time, this may become maintenance overhead.
Let me be a bit more direct. You explain in a comment that Method 2 is a separate table for each category. If so, then Method 2 -- in general -- is just wrong.
There are two methods for storing this type of information. One is a Categories table with a (single) Subcategories table. The Subcategories table would have CategoryId, a foreign key reference back to Categories. This is the normalized data model.
The second method is to store everything in one table. Each row would be a category/subcategory combination. Information about a given category would be duplicated across multiple rows, so this is not a normalized approach. However, this is a typical approach when doing dimensional modeling for decision support systems.
If the subcategories are just names of things, there is a third approach, which would be to store a list of the subcategories within each Category row. The list would not be a delimited string. It would be JSON, a nested table, XML, array, or similar collection data type supported by the database you are using. I am mentioning this as a possibility, but not recommending it.

Improvement on database schema

I'm creating a small pet shop database for a project
The database needs to have a list of products by supplier that can be grouped by pet type or product category.
Each in store sale and customer order can have multiple products per order and an employee attached to them the customer order must be have a customer and employee must have a position,
http://imgur.com/2Mi7EIU
Here are some random thoughts
I often separate addresses from the thing that has an address. You could make 1-many relationships between Employee, Customer and Supplier to an address table. That would allow you to have different types of addresses per entity, and to change addresses without touching the original table.
If it is possible for prices to change for an item, you would need to account for that somehow. Ideas there are create a pricing table, or to capture the price on the sales item table.
I don't like the way you handle the sales item table. the different foreign keys based on the type of the transaction is not quite correct. An alternative would be to replace SalesItem SaleID and OrderId with the SalesRecordId... another better option would be to just merge the fields from InStoreSale, SalesRecord, and CustomerOrders into a single table and slap an indicator on the table to indicate which type of transaction it was.
You would probably try to be consistent with plurality on your tables. For example, CustomerOrders vs. CustomerOrder.
Putting PositionPay on the EmployeePosition table seems off to... Employees in the same position typically can have different pay.
Is the PetType structured with enough complexity? Can't you have items that apply to more than one pet type? For example, a fishtank can be used for fish or lizards? If so, you will need a many-to-many join table there.
Hope this helps!

How Should I Design TAX Table?

I have a db with many tables. Some tables have unit price column, no tax, gst and such columns. What should i do now? Should i create GST table, HST table and PST table separately. In other words, what is the standard schema of designing the tax table?
You only need one table for tax percentages:
CDN_TAXES
tax_id (primary key)
tax_name (GST, HST, PST)
tax_percentage
Because HST covers both PST and GST, I'd model this as three columns in the PRODUCT/etc table and use a CHECK constraint to enforce that either the HST column be associated, or at least one of PST/GST is associated to the product.
The amount of tax per sale, should be stored in a SALES table. Tax tables in Canada are updated twice a year, and the percentage can change. Which means if you want to reprint a receipt that is accurate, you have to capture the tax amounts at the time of sale.
I think it depends. For storing the type of tax and a rate, sure, you can normalize it in a table - and even capture it changing over time with a start and end date (although the actual tax charged on a sale would always need to be stored).
However, typically you will need more tax configuration. In the US, for instance, we have some things which are never taxed (milk), tax free days when nothing (with some exceptions like cars) is taxed temporarily, and special days when certain things are not taxed (hurricane preparation items like batteries and generators).
So your product table would certainly not hold a tax amount, but may hold some tax flags. Other tables may link the product to a jurisdiction which has taxes. If you have a multi-store company, some jurisdictions may tax things differently, yet you may have a common product master file and pricing might be the same across stores, but taxes would vary.
I'd say depends on the amount of columns. If you're only talking about 3 columns then i'd add them into the table and leave them empty if not used.
If however you want to expand this further and further, create a single table to hold the value, description etc.
Then maybe have a lookup table called unitGST and unitHST etc. Each of these tables contains the unit id and the id to your single table holding the value.