Data Warehouse Design/Modeling (based on Figure in Data Mining textbook)

I found a schema in Google Images (see below) that can illustrate a problem I'm having in my data warehouse design:
My design is different, but this is the simplest figure I could find to convey my question. Given the figure, how could the schema accommodate the following scenario: a product has a unique number assigned to it by the SalesOrg (salesOrg_product_number)? For example, a salesOrg sells food items and assigns all food items of the same kind the same unique salesOrg_product_number. A different salesOrg would have a different salesOrg_product_number for that type of product.
I'm inclined to place the salesOrg_product_number attribute in the Product dimension table, but part of me thinks it should be in the salesOrg dimension table instead. Which of these is the correct way, in a data warehouse (not relational db) design, to maintain the star schema?

In a perfect world, the primary key of a dimension table should be just a surrogate key, with no business meaning. Table IDs should be invisible to end users, but business codes should of course be available.
A possible solution would be to have a product table with a structure like:
Product_id
Product_desc
Product_SO1_number
Product_SO2_number
...
Of course, this requires showing the correct field to the correct Sales Organization. Depending on your reporting tool, this can be more or less difficult. For example, if you write your queries manually, you just need to put the right column in your SELECT.
Another possibility would be a product/sales_org table, which combines the Product and Sales_Org tables:
Product_Sales_Org_id
Product_id
Sales_Org_id
Product_SO_number
...
This table will be a child of the two dimension tables, and the fact table will have a Product_Sales_Org_id column. Depending on the Product and the Sales Organization, Product_SO_number will return the correct number per SO.
If you want to keep this in a star schema structure, you can put Product, Sales_Org, and Product_Sales_Org together in a single table, like:
Product_Sales_Org_id
Product_id
Sales_Org_id
Product_desc
Sales_Org_desc
Product_SO_number
...
Honestly, I would go for the second solution: keep the Product and Sales_Org tables separate, because they are two different business entities, and implement the relationship table between them.
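The second solution can be sketched roughly as follows, using Python's built-in sqlite3 module so the example runs as-is. This is only an illustration under assumptions: the sales_fact table, its quantity column, the UNIQUE constraint, and all sample values are hypothetical and not part of the original design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,   -- surrogate key, no business meaning
    product_desc TEXT NOT NULL
);
CREATE TABLE sales_org (
    sales_org_id   INTEGER PRIMARY KEY,
    sales_org_desc TEXT NOT NULL
);
-- Relationship table: one row per (product, sales org) pair.
CREATE TABLE product_sales_org (
    product_sales_org_id INTEGER PRIMARY KEY,
    product_id        INTEGER NOT NULL REFERENCES product(product_id),
    sales_org_id      INTEGER NOT NULL REFERENCES sales_org(sales_org_id),
    product_so_number TEXT NOT NULL,     -- the business code each SO assigns
    UNIQUE (product_id, sales_org_id)    -- assumption: one number per pair
);
-- Hypothetical fact table carrying only the relationship key.
CREATE TABLE sales_fact (
    product_sales_org_id INTEGER NOT NULL
        REFERENCES product_sales_org(product_sales_org_id),
    quantity INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO product VALUES (1, 'Apples')")
conn.executemany("INSERT INTO sales_org VALUES (?, ?)", [(1, "SO1"), (2, "SO2")])
conn.executemany(
    "INSERT INTO product_sales_org VALUES (?, ?, ?, ?)",
    [(10, 1, 1, "A-100"), (11, 1, 2, "FRUIT-7")],
)
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [(10, 5), (11, 3)])

# Each fact row resolves to the number used by its own sales organization.
rows = conn.execute("""
    SELECT s.sales_org_desc, p.product_desc, pso.product_so_number, f.quantity
    FROM sales_fact f
    JOIN product_sales_org pso
      ON pso.product_sales_org_id = f.product_sales_org_id
    JOIN product p   ON p.product_id = pso.product_id
    JOIN sales_org s ON s.sales_org_id = pso.sales_org_id
    ORDER BY s.sales_org_desc
""").fetchall()
print(rows)
```

The same product appears under a different product_so_number for each sales organization, without adding one column per organization to the product table.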
I hope this helps.

Related

SQL schema design: two tables vs adding columns to the same table

This question is about a design decision, so it might be a bit opinionated:
Imagine you are designing a database for a car dealer where they ONLY auction cars. Some cars are for display only, and some cars are to be sold at auction.
I have a Car entity with 10 attributes: ID, Model, Mode, YearMade, IsDisplayOnly....
Now, I want to add a selling price and selling notes to those cars that are for sale (i.e. IsDisplayOnly = false).
I imagine that there are two ways this can be done:
Add Price and PriceNotes columns to the Car table, knowing that they are always null for IsDisplayOnly = true cars and for those that haven't been sold at auction yet.
Add a new table SaleInfo with 3 columns: CarID, Price, PriceNotes, where CarID is the PK and also an FK pointing to the ID column in the Car table.
Which option aligns best with good schema design practice? Why?

You should have one table for cars and the attributes of cars. You should have a separate table for the cars for auction.
Why? These are different entities. Your problem definition suggests an auction table. That auction table should have a foreign key reference to the cars that are available for auction. A separate table ensures that the foreign key reference is valid.
There are some other reasons that are not apparent in your simplified example. Notes and prices might change over time, so they should go into a history table. Display cars have other attributes, like the period of time when they are on display and how they are ultimately disposed of. This suggests that they too have attributes of their own.
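A minimal sketch of the separate auction table, using Python's sqlite3 module. The table name auction_car, the reduced column set, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the FK

conn.executescript("""
CREATE TABLE car (
    id        INTEGER PRIMARY KEY,
    model     TEXT NOT NULL,
    year_made INTEGER
);
-- Only cars that are up for auction get a row here.
CREATE TABLE auction_car (
    car_id      INTEGER PRIMARY KEY REFERENCES car(id),
    price       REAL,
    price_notes TEXT
);
""")

conn.execute("INSERT INTO car VALUES (1, 'Roadster', 1967)")  # display only
conn.execute("INSERT INTO car VALUES (2, 'Coupe', 1972)")     # for auction
conn.execute("INSERT INTO auction_car VALUES (2, 15000.0, 'Reserve met')")

# The foreign key rejects auction rows that point at a non-existent car.
try:
    conn.execute("INSERT INTO auction_car VALUES (99, 1.0, 'no such car')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Display-only cars are simply the cars without an auction row.
display_only = conn.execute("""
    SELECT c.id
    FROM car c
    LEFT JOIN auction_car a ON a.car_id = c.id
    WHERE a.car_id IS NULL
""").fetchall()
print(orphan_rejected, display_only)
```

With this layout the car table has no price columns that are null for display-only cars.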

My advice would be to use three tables:
-The first to store all the makes and models of the cars, as well as their costs (e.g. Honda something or other selling for X amount of money).
-The second to store the details of the individual vehicles, containing a foreign key to the primary key of one of the makes/models stored in the first table, as well as individual details such as the color, VIN no. etc., and whether they can be sold or not.
-The third to store the details of each individual purchase, linked to the table of individual vehicles, with each purchase connected to a single vehicle.
The advantage of this layout is that you will end up using less storage space in the long run: instead of repeating the same three fields (make, model and year) for every vehicle, you have a single field that represents that data. Another advantage is searching: if you are looking for individual vehicles of the same brand/type, you can search on one field, the key linked to the table containing the make and model. This would decrease search times and improve the effectiveness of the system overall.
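The three-table layout could look something like the sketch below (Python with sqlite3). All table names, column names, and sample rows are hypothetical; the answer only describes the tables in prose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
-- 1) One row per make/model/year combination.
CREATE TABLE make_model (
    id         INTEGER PRIMARY KEY,
    make       TEXT NOT NULL,
    model      TEXT NOT NULL,
    year_made  INTEGER NOT NULL,
    list_price REAL
);
-- 2) One row per physical vehicle.
CREATE TABLE vehicle (
    id              INTEGER PRIMARY KEY,
    make_model_id   INTEGER NOT NULL REFERENCES make_model(id),
    vin             TEXT NOT NULL UNIQUE,
    color           TEXT,
    is_display_only INTEGER NOT NULL DEFAULT 0
);
-- 3) One row per purchase, tied to exactly one vehicle.
CREATE TABLE purchase (
    id         INTEGER PRIMARY KEY,
    vehicle_id INTEGER NOT NULL UNIQUE REFERENCES vehicle(id),
    price      REAL NOT NULL
);
""")

conn.execute("INSERT INTO make_model VALUES (1, 'Honda', 'Civic', 2015, 9000.0)")
conn.executemany(
    "INSERT INTO vehicle VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "VIN001", "red", 0), (2, 1, "VIN002", "blue", 1)],
)
conn.execute("INSERT INTO purchase VALUES (1, 1, 8500.0)")

# Vehicles of one make/model are found through the single key column.
same_model = conn.execute(
    "SELECT vin FROM vehicle WHERE make_model_id = ? ORDER BY vin", (1,)
).fetchall()
print(same_model)
```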

Product Table Linking Different Types

I have a problem: I am designing a database which will store different products, and each product may have different details.
As an example, it will need to store books with multiple authors and software with different types of descriptions.
This is my current design:
Product_table
|ID|TYPE|COMPANY|
|1|1|1|
attr_table
|ID|NAME|
|1|ISBN10|
|2|ISBN13|
|3|Title|
|4|Author|
details_table
|ID|attr_id|value|
|1|3|Book of adventures|
Connector_table
|id|pro_id|detail_id|
|1|1|1|
So the product table would only store the main product id, the company it belongs to and the type of product it is.
Then I would have the attribute table, which lists each attribute a product could have; this will make it easier to add new types of products.
The details table will then hold all the values, such as different authors, titles, ISBN10s etc.
And then the connector table would connect the product table and the details table.
My main worry is that the details table will get very large and will be storing lots of different data types.
What I would like is to split up all of the different types into tables, such as an ISBN table and an author table.
If this is the case, how could I link these tables up to the attr_table?
Any help would be greatly appreciated.

Don't bother. You do not say what database you are using, but any reasonable database will be able to handle the details table. Databases are designed to handle big tables efficiently.
If it is really big, you might want to consider partitioning the table by some sort of theme.
Otherwise, just be sure that you have an index on the id in the table and probably on the attr_id as well. The structure should work fine.
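As a rough illustration of the indexing advice, the sketch below builds the attr_table and details_table from the question in SQLite and adds an index on attr_id. The index name idx_details_attr is an assumption; the sample rows come from the question. In SQLite the id column is already indexed because it is the INTEGER PRIMARY KEY; other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE attr_table (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE details_table (
    id      INTEGER PRIMARY KEY,
    attr_id INTEGER NOT NULL REFERENCES attr_table(id),
    value   TEXT NOT NULL
);
-- Secondary index so lookups by attribute do not scan the whole table.
CREATE INDEX idx_details_attr ON details_table(attr_id);
""")

conn.executemany(
    "INSERT INTO attr_table VALUES (?, ?)",
    [(1, "ISBN10"), (2, "ISBN13"), (3, "Title"), (4, "Author")],
)
conn.execute("INSERT INTO details_table VALUES (1, 3, 'Book of adventures')")

titles = conn.execute(
    "SELECT value FROM details_table WHERE attr_id = ?", (3,)
).fetchall()

# Ask SQLite how it would run the lookup; the last column holds the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT value FROM details_table WHERE attr_id = ?", (3,)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(titles)
print(plan_text)
```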

SQL Single Column Table

I have a model where a Shipment can have many Products through a product_shipments lookup table.
The Shipment model also has a relationship where a shipment can have one to many bill_of_lading numbers.
Is it proper to create a new table for just one bill_of_lading field? In most cases a shipment will have only one bill_of_lading number, but often a shipment will contain 2 or more.
There are no other attributes for a bill_of_lading that need to be tracked other than just the number.
What is the proper way to handle this case?
Normalization rules would suggest pulling this out into its own table, correct?

Yes, I would suggest doing the same thing you did with Shipment and Products, i.e. creating a new table. It is a proper way of doing it, and it makes it easier to query later on and to alter your table structure should you need to.

Well, it wouldn't have one column, it would have two: a shipment ID and a bill_of_lading number. It could be correct to have a table with just those two fields. It could also be correct to create a table with bill_of_lading, shipment, and product if you can explicitly divide the products into individual bills of lading. It really depends on the relationship between bill_of_lading and product.
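The two-column version might look like this in SQLite (via Python's sqlite3). The table name shipment_bill_of_lading, the composite primary key, and the sample numbers are assumptions for the sake of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE shipment (
    id INTEGER PRIMARY KEY
);
-- Two columns: which shipment, and which bill_of_lading number.
CREATE TABLE shipment_bill_of_lading (
    shipment_id    INTEGER NOT NULL REFERENCES shipment(id),
    bill_of_lading TEXT NOT NULL,
    PRIMARY KEY (shipment_id, bill_of_lading)
);
""")

conn.executemany("INSERT INTO shipment VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO shipment_bill_of_lading VALUES (?, ?)",
    [(1, "BOL-001"), (2, "BOL-002"), (2, "BOL-003")],
)

# Shipment 1 has a single bill of lading, shipment 2 has two.
counts = conn.execute("""
    SELECT shipment_id, COUNT(*)
    FROM shipment_bill_of_lading
    GROUP BY shipment_id
    ORDER BY shipment_id
""").fetchall()
print(counts)
```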

Normalize SQL database

I'm creating a database for a project and I'm a little confused about how normalization applies to my schema. Every time a loan is approved for a customer, they have 2 options, a check or an EFT, so I want to know whether the loan was paid out by check or EFT.
These are my 3 tables:
Loans
id_loan (PK)
product
amount
status
Checks
id_check (PK)
id_customer
amount
EFT
id_eft (PK)
id_customer
amount
Then I created a 4th table to establish a relationship between loans and money disposal.
Disposal
id_payment (PK)
id_loan (FK loans)
id_disposal (FK checks or EFT)
disposal_type
In this table I store whether the loan is related to a check or an EFT, disposal_type field is a varchar with two possible values "check" or "EFT". id_disposal field acts as a foreign key for two tables.
The problem is that I think my database isn't normalized with this structure, am I right? What would be the best way to solve this?

You need something like the attached. Note that the customer_loans table is kind of extraneous and overkill, but if there's any columns that relate to the customer and the loan, and not the customer's loan payments, that's where it would go.

In the object world, you'd use inheritance for this. There would be a base type Disposal which CheckDisposal and EftDisposal would derive from. Modern O/RMs support several techniques for mapping this to a relational structure.
TablePerHierarchy puts all of the records into a single table with a discriminator column to identify what type a specific record holds and maps to. The advantage is that it requires fewer joins to get a record. Disadvantage is that it requires app logic to enforce data integrity.
TablePerType maps records into different tables with a fk relationship back to the base table. Of course this requires more joins (especially for deep or wide hierarchies) but data integrity can be enforced in the DB.
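Applied to the loan question, a table-per-type layout might look roughly like the sketch below (Python with sqlite3). The columns check_number and account_number are hypothetical, and moving amount into the base disposal table is an assumption, not something stated in the question or the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE loan (
    id_loan INTEGER PRIMARY KEY,
    product TEXT,
    amount  REAL,
    status  TEXT
);
-- Base table: what every disposal has in common.
CREATE TABLE disposal (
    id_disposal INTEGER PRIMARY KEY,
    id_loan     INTEGER NOT NULL REFERENCES loan(id_loan),
    amount      REAL NOT NULL
);
-- One table per subtype, sharing the base table's key.
CREATE TABLE check_disposal (
    id_disposal  INTEGER PRIMARY KEY REFERENCES disposal(id_disposal),
    check_number TEXT NOT NULL       -- hypothetical check-only attribute
);
CREATE TABLE eft_disposal (
    id_disposal    INTEGER PRIMARY KEY REFERENCES disposal(id_disposal),
    account_number TEXT NOT NULL     -- hypothetical EFT-only attribute
);
""")

conn.executemany(
    "INSERT INTO loan VALUES (?, ?, ?, ?)",
    [(1, "personal", 1000.0, "approved"), (2, "auto", 5000.0, "approved")],
)
conn.executemany(
    "INSERT INTO disposal VALUES (?, ?, ?)", [(1, 1, 1000.0), (2, 2, 5000.0)]
)
conn.execute("INSERT INTO check_disposal VALUES (1, 'CHK-42')")
conn.execute("INSERT INTO eft_disposal VALUES (2, 'ACC-9001')")

# The disposal type is derived from which subtype table holds the row,
# so every foreign key points at exactly one table.
types = conn.execute("""
    SELECT d.id_loan,
           CASE WHEN c.id_disposal IS NOT NULL THEN 'check'
                WHEN e.id_disposal IS NOT NULL THEN 'EFT'
           END AS disposal_type
    FROM disposal d
    LEFT JOIN check_disposal c ON c.id_disposal = d.id_disposal
    LEFT JOIN eft_disposal  e ON e.id_disposal = d.id_disposal
    ORDER BY d.id_loan
""").fetchall()
print(types)
```

Note that this sketch does not by itself stop one disposal from having rows in both subtype tables; that would need an extra constraint or application logic.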

Doubt regarding a database design

I have a doubt regarding a database design. Suppose a finance/stock software:
in the software, the user will be able to create orders,
those orders may contain company products or third-party products.
typical product table:
PRIMARY KEY INT productId
KEY INT productcatId
KEY INT supplierId
VARCHAR(20) name
TEXT description
...
but I also need some more details for the company products, like:
INT instock
DATETIME laststockupdate
...
The question is, how should I store the data?
I'm thinking of these options:
1 -
Have both company and third-party products in a single table,
some columns will not be used by third-party products,
company products are identified by a supplier id
2 -
Have the company products and third-party products in separate tables
3 - [new, thanks RibaldEddie]
Have a single product table,
company products have additional info in a separate table
Thanks in advance!

You didn't mention anything about needing to store separate bits of Vendor information, just that a type of product has extra information. So, you could have one products table and an InHouseProductDetails table that has a productId foreign key back to the products table that stores the company specific information. Then when you run your queries you can join the products table to the details table.
The benefit is that you don't have to have NULLable columns in the products table, so your data is safer from corruption and you don't have to store the products themselves in two separate tables.
Oooo go with 3! 3 is the best!

To be honest, I think the choice of #1 or #2 is completely dependent upon some other factors (I can only think of 2 at the moment):
How much data is expected (affecting speed of queries)
Is scalability going to be a concern anywhere in the near future (I'd guess within 5 years)
If you did go with a single table for all inventory and later decided to split it, you can. You suggested a supplier identifier of some sort. List suppliers in a table (your company included) with keys to your inventory. Then it really won't matter.
As far as UNION goes, it's been a while since I've written raw SQL, so I'm not sure if UNION is the correct syntax. However, I do know that you can pull data from multiple tables. Actually just found this: Retrieving Data from Multiple Tables with Sql Joins

I agree with RibaldEddie. Just one thing to add: put a unique constraint on that foreign key in your InHouseProductDetails table. That'll enforce a one-to-one relationship between the two tables, so you don't accidentally end up with two InHouseProductDetails records for one product (maybe from some data load gone awry or something).
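A small sketch of option 3 with the unique constraint, using Python's sqlite3. The detailsId column, the trimmed-down products table, and the sample values are assumptions; productId, instock, laststockupdate, and InHouseProductDetails come from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE products (
    productId INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Extra columns that only apply to the company's own products.
CREATE TABLE InHouseProductDetails (
    detailsId       INTEGER PRIMARY KEY,
    productId       INTEGER NOT NULL UNIQUE REFERENCES products(productId),
    instock         INTEGER NOT NULL,
    laststockupdate TEXT
);
""")

conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(1, "Widget"), (2, "Third-party gadget")],
)
conn.execute(
    "INSERT INTO InHouseProductDetails (productId, instock, laststockupdate) "
    "VALUES (1, 12, '2024-01-01')"
)

# UNIQUE on the foreign key blocks a second details row for the same product.
try:
    conn.execute(
        "INSERT INTO InHouseProductDetails (productId, instock, laststockupdate) "
        "VALUES (1, 5, '2024-01-02')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Third-party products simply have no details row.
stock = conn.execute("""
    SELECT p.name, d.instock
    FROM products p
    LEFT JOIN InHouseProductDetails d ON d.productId = p.productId
    ORDER BY p.productId
""").fetchall()
print(duplicate_rejected, stock)
```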
Constraints are like defensive driving; they help prevent the unexpected...

I would advise using point #1. What happens when another supplier comes along? It's also easier to extend a single product table/product class.

Take into account the testing of your application as well. Having all data in one table raises the possible requirement of testing both the 3rd Party & Company elements of your app for any change to either.
If you're happy that your unit tests would cover this, it's not so much of a worry... if you're relying on a human tester, then it becomes more of an issue when sizing the impact of changes.
Personally I'd go for the one products table with common details and separate tables for the 3rd party & Company specifics.

Use one table for products with a foreign key to the Vendor table; include your own company in the Vendor table.
The Stock table can then be used to store information about stock levels for any product, not just yours.
Note that you need the Stock table anyway; this just makes the DB model more company-agnostic, so if you ever need to store stock level information about third-party products, no DB change is required.
