Normalize SQL database

Normalize SQL database - sql

I'm creating a database for a project and I'm a little confused about how normalization applies to my schema. Everytime a loan is aproved for a customer, they have 2 options a check or an EFT, so I want to know wheter the loan was a check or EFT.
This are my 3 tables:
Loans
id_loan (PK)
product
amount
status
Checks
id_check (PK)
id_customer
amount
EFT
id_eft (PK)
id_customer
amount
Then I created a 4th table to establish a relationship between loans and money disposal.
Disposal
id_payment (PK)
id_loan (FK loans)
id_disposal (FK checks or EFT)
disposal_type
In this table I store whether the loan is related to a check or an EFT, disposal_type field is a varchar with two possible values "check" or "EFT". id_disposal field acts as a foreign key for two tables.
The problem is that I think my database isn't normalized with this structure, am I right? What would be the best way to solve this?

You need something like the attached. Note that the customer_loans table is kind of extraneous and overkill, but if there's any columns that relate to the customer and the loan, and not the customer's loan payments, that's where it would go.

In the object world, you'd use inheritance for this. There would be a base type Disposal which CheckDisposal and EftDisposal would derive from. Modern O/RMs support several techniques for mapping this to a relational structure.
TablePerHierarchy puts all of the records into a single table with a discriminator column to identify what type a specific record holds and maps to. The advantage is that it requires fewer joins to get a record. Disadvantage is that it requires app logic to enforce data integrity.
TablePerType maps records into different tables with a fk relationship back to the base table. Of course this requires more joins (especially for deep or wide hierarchies) but data integrity can be enforced in the DB.

Related

Determining if an entity is weak or not

I'm creating a relational database of a store and its stock of products.
In the brief, it says "products can be returned under agreed terms e.g. expiry date or manufacturers error", based on this I created a weak entity "Terms" with product_ID as the foreign key and errors & expiry as two attributes.
My logic was that the terms only exist if the product exists, therefore it is a weak attribute as every product has terms, but you wouldn't have terms not associated with a product.
Looking at it though, the "Terms" table would basically be Product ID (1) ---> Errors (No) ---> Expiry (01/01/23), and now I'm starting to think those two attributes should be attributes of the product table and not a separate entity, mainly because "Terms" doesn't have a partial/discriminator key that could be used as a composite primary.
Does anyone have any thoughts about which way is correct?

I think this answer really comes down to the trade-offs in terms of performance.
To make sure I understand your question correctly - you basically have two tables:
The main product table
A "lookup" table that just has Product_ID (FK), Errors, and Expiry as the columns
If this is the case, you have two options:
Just add Errors and Expiry as columns to the primary product table
Keep the two tables separated as you have them, and just JOIN that data when needed.
Option 1 has the benefit of keeping all the data in one table, assuming that "Expiry" and "Errors" are unique to the product_ID; if they're not, you may end up duplicating data, and it's better to keep these fields in your separate table to have a 1:Many relationship. The other drawback would be that if your main Product table is beefy, you've slowed down the query even further by adding these columns.
Option 2 can circumvent the two shortcomings of Option 1 - by keeping this data separate, your Product table is much lighter, and if you have a 1:many relationship, you don't duplicate data (saving you more memory overall!). The drawback with Option 2 is that your EDR gets a bit more complicated - you have one more table to keep track of.
Based on these, I recommend keeping your separate "lookup" table - the benefits of separating this data out will help you in the long run - but ultimately you'll need to weight the pros and cons since I don't know the extent of your project.

Relational Database Design: Conditionals

I'm designing a relational database that I plan to implement with SQL. I have a use case that I'm working on and seem to be having a bit of trouble thinking through the solution. The design is for an e-commerce order system.
Use Case:
The ORDER_DETAILS table contains a deliveryMethod attribute. I then have a SHIPPING_DETAILS table that contains address information and a PICKUP_DETAILS table that contains location, date, and time information for an in-person pickup. When a user places an order, they have the option to have their order shipped to their address or to pick up their order in person. My current thought is to have a shippingId foreign key and pickupId foreign key in the ORDER_DETAILS table. Then, basically run a conditional check on the deliveryMethod attribute and retrieve data from the appropriate table depending on the value of that attribute (either "shipping" or "pickup"). With this thought, however, I would be allowing for null values to be present in the ORDER_DETAILS for either the shippingId or the pickupId attributes. From my understanding, null values are viewed negatively in relational designs. So I'm looking for some feedback on this design. Is this okay? Am I overthinking the nulls? Is there a more efficient way to design this particular schema?

If I understand your problem correctly,
The cardinality of the relationship of ORDER to SHIPPING is 1 ---> (0, 1)
The cardinality of the relationship of ORDER to PICKUP is 1 ---> (0, 1)
An ORDER MUST have either a SHIPPING or a PICKUP, but not both.
To enforce the constraint (#3) you could define a functional constraint in the database. That gets into interesting stuff.
Anyway, like you say, you could make columns in ORDER that are FKs to the SHIPPING or PICKUP tables, but both of those are nullable. I don't think null FKs are evil or anything, but they do get messy especially if you had a whole bunch of delivery methods and not just two.
If you don't like the nulls, you could have separate association tables: (1) ORDER_DELIVERY that has just an order_id and an delivery_id, each are FKs to the respective tables, and (2) ORDER_PICKUP, also a two column table. In each case the primary key would be order_id. Now there are no nulls: the orders with delivery are in the ORDER_DELIVERY table and the orders with pickup are in ORDER_PICKUP.
Of course there's a tradeoff, as maintaining the constraint that there be exactly one and only one delivery method is not a consistency check across tables.
Another idea is to make the delivery and pickup details be JSON fields. Here you are doing more work on the application side, enforcing constraints programmatically, but you won't have nulls.
I wish I could say that there was a slam-dunk go-to design pattern here, but I don't see one. Personally with only two types of delivery methods, I would not shy from having nulls (as I'm not a purist). But I do love it when the database does the work, so....
(Oh, the answer to the question "are you over thinking things?" is no, this thinking is really good!)

Data Warehouse Design/Modeling (based on Figure in Data Mining textbook)

I found a schema in Google Images (see below) that can illustrate a problem I having in my data warehouse design:
My design is different, but this is the simplest figure I could find to convey my question, which is given the figure, I'm wondering how could the schema accommodate the following scenario: if a product had a unique number assigned to it by the SalesOrg (salesOrg_product_number)...For example, a salesOrg sells food items and assigns all food items of the same kind the same unique salesOrg_product_number. A different salesOrg would have a different salesOrg_product_number for that type of product.
I'm inclined to place the salesOrg_product_number attribute in the Product dimension table, but part of me thinks it should be in the salesOrg dimension table instead. I'm wondering which one of these is correct way in a data warehouse (not relational db) design to maintain the star schema?

In a perfect world the Primary Keys of a dimension table should be just surrogate key, without any meaning for the business. Table IDs should be invisible for the final users, but business code should be of course available.
A possible solution would be to have a product table with a structure like:
Product_id
Product_desc
Product_SO1_number
Product_SO2_number
...
Of course this will require to show the correct field to the correct Sales Organization. Depending on your reporting tool this can be more or less difficult. For example if you write your query manually you need just to put the right column in your select.
Another possibility would be to have a product/sales_org table, a table which combine the Product and the Sales_Org one:
Product_Sales_Org_id
Product_id
Sales_Org_id
Product_SO_number
...
This table will be child of the two dimension table and on the fact table you will have Product_Sales_Org_id column. Depending on Product and Sales Organization the Product_SO_number will return the correct number per SO.
If you want to have this in a star schema structure you can put Product/Sales_Org/Product_Sales_Org together in only one table like:
Product_Sales_Org_id
Product_id
Sales_Org_id
Product_desc
Sales_Org_desc
Product_SO_number
...
Sincerely I would go for the second solution, keep the Product and the Sales_Org tables separated, because they are two different business entities and implement the relationship table in the middle.
I hope this helps.

SSAS - Dimension and Fact tables historical data - Mapping fact table with dimension table

i have designed places related warehouse tables - DimPlaces, FactPlaces, DimGeography. It is straightforward design if you see. All the locations is in DimPlaces (Addrline1, Addrline2,placename,etc) and geography hierarchy is in DimGeography (City, State, Country, PostCode). FactPlaces is table which has got foriegn keys to DimPlaces and DimGeography.
I would like to maintain historical data as there are chances that places names or their properties might change and at the same time if the location of a place changes then geographic hierarchy key changes.
I have found design pattern -
Another useful design pattern is to add the durable account key to the fact table in addition to the dimension’s surrogate key. This joins back to the current rows in the dimension to make it easier to report all of history by the current dimension attributes.
Could you please suggest is this OK to follow this solution? If yes, do i need to use KEY of type UNIQUEIDENTIFIER for a unique value?
Another question on this - I have employees data (DimEmployee and FactEmployee). Each employee is associated with the places where he works. How to connect These EMPLOYEE TABLES with the PLACES TABLES. Do I need to connect FACTEMPLOYEE WITH FACTPLACES?

I think in the first instance, they're referring to business keys? So if your dimension table has two rows, surrogate key 1 & 2, but they both refer to the same thing, so both have AccountId/ProductId/WhateverId of 1, then you will have some fact table rows with surrogate key 1 and business key 1, and later ones with surrogate key 2 and business key 1.
Uniqueidentifiers are very wide, try and avoid using them on fact tables and for joins if possible.
For your last question - That's really more a reporting thing. Do you need to do that? Is that what people need to see, do they need to slice by that? You could consider a referenced dimension - Where the places table links to the fact tables via a placeId on the employees dimension. Or, you could have a factemployees table with start and stop dates. It depends on what you need to achieve.

Doubt regarding a database design

I have a doubt regarding a database design, suppose a finance/stock software
in the software, the user will be able to create orders,
those orders may contain company products or third-party products
typical product table:
PRIMARY KEY INT productId
KEY INT productcatId
KEY INT supplierId
VARCHAR(20) name
TEXT description
...
but i also need some more details in the company products like:
INT instock
DATETIME laststockupdate
...
The question is, how should i store the data?
I'm thinking in 2 options:
1 -
Have both company and third-party, products in a single table,
some columns will not be used by third-party products
identify the company products are identified by a supplier id
2 -
Have the company products and third-party in separated tables
3 - [new, thanks RibaldEddie]
Have a single product table,
company products have additional info in a separated table
Thanks in advance!

You didn't mention anything about needing to store separate bits of Vendor information, just that a type of product has extra information. So, you could have one products table and an InHouseProductDetails table that has a productId foreign key back to the products table that stores the company specific information. Then when you run your queries you can join the products table to the details table.
The benefit is that you don't have to have NULLable columns in the products table, so your data is safer from corruption and you don't have to store the products themselves in two separate tables.
Oooo go with 3! 3 is the best!

To be honest, I think the choice of #1 or #2 are completely dependent upon some other factors (I can only thing of 2 at the moment):
How much data is expected (affecting speed of queries)
Is scalability going to be a concern anywhere in the near future (I'd guess within 5 years)
If you did go with a single table for all inventory, then later decided to split them, you can. You suggested a supplier identifier of some sort. List suppliers in a table (your company included) with keys to your inventory. Then it really won't matter.
As far as UNION goes, it's been a while since I've written raw Sql - so I'm not sure if UNION is the correct syntax. However, I do know that you can pull data from multiple tables. Actually just found this: Retrieving Data from Multiple Tables with Sql Joins

I agree with RibaldEddie. Just one thing to add: put a unique constraint on that foreign key in your InHouseProductDetails table. That'll enforce that it's a one-to-one relationship between the two tables, so you don't accidently end up with two InHouseProductDetails records for one product (maybe from some dataload gone awry or something)
Constraints are like defensive driving; they help prevent the unexpected...

I would advice on using point #1. What happens when another supplier comes along? It's also more easy to extend on one product table/produst class.

Take into account the testing of your application also. Having all data in one table raises the possible requirement of testing both the 3rd Party & Company elements of your app for any change to either.
If you're happy that your Unit test would cover this off its not so much of a worry... if you're relying on a human tester then it becomes more of an issue when sizing the impact of changes.
Personally I'd go for the one products table with common details and separate tables for the 3rd party & Company specifics.

one table for products with a foreign key to the Vendor table; include your own company in the Vendor table
the Stock table can then be used to store information about stock levels for any product, not just yours
Note that you need the Stock table anyway, this just make the DB model more company-agnostic - so if you ever need to store stock level information about third-party products, there's no DB change required

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas