Entity relation (normal forms)

Entity relation (normal forms) - sql

Let's assume we have table USERS like:
..ID..username
user1
user2
user3
Users can have Bills (user -> have_many -> bill relation).
Table BILLS like:
..ID..user_id
1
2
2
Also we have Products so every Product can be associated to ONLY ONE bill (product -> has_one -> bill relation).
Table PRODUCTS like:
..ID..bill_id
2
3
1
So, as you can see, our user can have lot of products (through bills).
My question:
Would it be correct DUE TO Database normalization to add second foreign key to PRODUCTS table named user_id to quickly select all user's Products from PRODUCTS table, or it's not correct and I should use JOIN statement to select all User's Products?
P.S. Sorry for dirty tables drawing )

I would rather go with the normalized view (where you DO NOT have a user_id in the products table).
The only time I would ever consider this, as a last option, is if the performance REALY requires it.

It is usually better to normalize data so as not to duplicate information and retain data consistency.
However there are exceptions required in real life systems, often for performance reasons when dealing with huge volumes of data.

Related

Cognos Analytics join multiple tables

I am working in Cognos Analytics 11.1.7.
I have a data module with three tables. Table 1 and 2 contain all the transactions we do, where table 1 contains the remitter's part of the transaction and table 2 contains the beneficiary's part of the transaction. I.e. one transaction is divided into two tables. Table 3 holds account numbers.
I want to create a report that shows the remitter's customer ID and account number. However, in some cases, the the remitter's account is missing. These customer ID's have unique customer ID's (Y instead of X). In those cases, I want the beneficiary's customer ID and account number. Consider the following three tables
Table 1: REMITTER
CUSTOMER_ID
ORDER_NR
1X
123
2Y
456
1X
789
Table 2: BENEFICIARY
CUSTOMER_ID
ORDER_NR
4X
123
6X
456
6X
789
Table 3: ACCOUNTS
CUSTOMER_ID
ACCOUNT_NR
1X
1111
2Y
3X
3333
4X
4444
5X
5555
6X
6666
What I want is basically the following report:
REPORT OF ALL TRANSACTIONS TODAY
CUSTOMER_ID
ACCOUNT_NR
ORDER_NR
1
1111
123
6
6666
456
1
1111
789
I have solved the CUSTOMER_ID column with a switch case:
CASE
WHEN REMITTER.CUSTOMER_ID CONTAINS 'Y'
THEN BENEFICIARY.CUSTOMER_ID
ELSE REMITTER.CUSTOMER_ID
END
Now here's the problem, I can't create a join (relationship) between the column created above and the ACCOUNTS table since my own column lies directly "under" the data module (on the same level as the tables in the index list to the left). However, if I create a column "under" REMITTER table, I can't use the case calculation from above. Cognos gives med the following error:
The expression is not valid.
XQE-MSR-0008 In module "STACKOVERFLOW", the following query
subjects are not joined: "REMITTER", "BENEFICIARY".
I have tried to circumvent the error by creating all kinds of joins between REMITTER and BENEFICIARY on ORDER_NR but Cognos keeps giving me this error.
I have also tried to make a "triangle" of joins, where REMITTER and BENEFICIARY are joined on ORDER_NR, REMITTER and ACCOUNTS are joined on CUSTOMER_ID and BENEFICIARY and ACCOUNTS are joined on CUSTOMER_ID. This doesn't work. However, when I delete either the REMITTER/ACCOUNT or BENEFICIARY/ACCOUNTS join, it works with the table I keep joined.
I am slowly losing my sanity here. Thanks!

What is the nature of the relationships between these entities?
That is a question which you should ask for everything in your model.
The pattern of that relationship drives the relationship between the objects in the model, which in turn drives what decisions you need to make in your modelling.
For example, is this a Bridge table situation? If so, you need to be aware of it so you can model appropriately.
In the end it falls back on Kimball:
Identify the facts
Identify the dimensions
I am assuming that the cardinality is beneficiary to remitter to account or
remitter to beneficiary to account.
Put beneficiary and remitter into a view in the module, create a relationship between it and account, and delete the relationship between the middle table and account (so that the SQL will use the relationship which you created ).
I think putting the calculation into the table which is in the middle would also do the trick.
I can not say that I can map between your described 'triangle' of joined tables and a business purpose so I could not use that information to understand the entity relationship. Such a pattern of relationship is specifically identified as one to be identified and, as part of the Cognos proven practices, corrected. Because I can not identify if there truly is a business purpose to have such a triangle or not, I can not, and will not, describe the appropriate modelling actions as they are dependent on the business purpose of the relationships between the entities, which takes us back to St. Ralph.

Use queries.
In a report, create a query to resolve the remitter/beneficiary problem, then join that to another query that gets the accounts data.
If this functionality must be canned -- because many unknown report developers will use it to produce many unknown reports -- you can still do the same thing, but in a data set.
As C'est Moi alludes to, you can also do this in the data module by joining Remitter and Beneficiary into a table that computes the Customer_Id.

I am making a SQL database of categories and subcategories. What is the best way to link these tables?

The database has a table called "categories" with columns CATEGORY_ID(primary key) and CATEGORY_NAME.
I have subcategories for each category.
For better accessing which is the best method from the below methods.
Method 1: The "CATEGORY_ID" column in the "categories" table is a FOREIGN KEY in the "subcategories " table.
Method 2: Maintaining a separate table for each category representing the subcategories.

I prefer to use same table for category and sub category
like
Table Categories
[CATEGORY_ID, CATEGORY_NAME, PARENT_CATEGORY_ID]
In case you don't know how many sub categories are there.

This scenario is just an example, the scenario is as follows: We have a product table where all the records of products are stored. Same way we will have customer table where records of customers are stored. The daily sales keep the record of all the sales. This sales table will keep record of which product who has purchased. So linking is to be done from Sales table to product table and customer table.
The query to link the two tables is as follows:SELECT product_name, customer.name, date_of_sale FROM sales, product, customer WHERE product.product_id = sales.product_id and customer.customer_id >= sales.customer_id LIMIT 0, 30

It is better to go with Method 1 since it is more scalable.
Let me elaborate on this. If we go with method 1, we need to maintain 2 tables only that is Categories and Subcategories. In future if we have new categories or subcategories we can directly deal with this 2 tables.
If we consider same situation with Method2 then we need to create new tables every time, this may become maintenance overhead.

Let me be a bit more direct. You explain in a comment that Method 2 is a separate table for each category. If so, then Method 2 -- in general -- is just wrong.
There are two methods for storing this type of information. One is a Categories table with a (single) Subcategories table. The Subcategories table would have CategoryId, a foreign key reference back to Categories. This is the normalized data model.
The second method is to store everything in one table. Each row would be a category/subcategory combination. Information about a given category would be duplicated across multiple rows, so this is not a normalized approach. However, this is a typical approach when doing dimensional modeling for decision support systems.
If the subcategories are just names of things, there is a third approach, which would be to store a list of the subcategories within each Category row. The list would not be a delimited string. It would be JSON, a nested table, XML, array, or similar collection data type supported by the database you are using. I am mentioning this as a possibility, but not recommending it.

Database table design for a customized scenario

I am trying to design a database that will store one (or more) meal plans for different people. There are 2 different scenarios in this question.
First Scenario:
(Please refer to image)
There are 4 tables: PERSON, MEALPLAN, MEALPLAN_FOOD, and FOOD. The PERSON table stores basic information for each individual owning a meal plan. The MEALPLAN table will keep track of each meal plan (diet). The FOOD table is table of different foods (e.g. egg, spinach, sweet potato, oatmeal, etc.) and stores the protein/carb/fat/calorie information for each 'quantity unit' (e.g. 1 cup) for each food item. The MEALPLAN_FOOD table serves as the lookup/associative table for MEALPLAN and FOOD.
I believe that I have these tables set up correctly. Each person has one (or more) meal plans. Each meal plan consists of one (or more) foods/qty. My question comes in the next scenario.
Second Scenario:
(Please refer to image)
The first scenario is limited in the fact that it only stores a list of food items per meal plan. In the second scenario, we would like to incorporate additional tables so that the meal plans can be broken down and stored by meal. Each person will one (or more) meal plans with between 1 and 4 meals per meal plan (meal plans may consists of only 2 or 3 meals). In order to accomplish this, the first scenario's MEALPLAN_FOOD lookup/associative table was replaced with 2 lookup/associative tables (MEALPLAN_M<#> and M<#>_FOOD) and a meal table (M1 for meal 1, M2, M3, and M4).
Question: Assuming that the design of the first scenario is correct, is the design of the second scenario correct in accomplishing the additional functionality? Is this design correct in additional lookup tables in order to allow for the storage of food items per meal per meal plan per person? Or, is there a better way/design for accomplishing this task?
Thank you!

Generally not a good idea to have a table per entity (i.e. a table per mealplan/meal of the day). Instead, alter your original design - add another column on the MEALPLAN table that foreign keys to a new table called MEALOFDAY. That way, you can add or remove meals of the day (allow a user to have 6 or 10 meals per day), and you can allow a meal plan to be shared between multiple meals if you add another many to many table for MEALPLAN_MEALOFDAY.
Your new table structure for MEALPLAN and MEALOFDAY migh tlook like:
MEALPLAN
========
mealplan_id (pk)
date
notes
person_id (fk)
mealofday_id (fk)
MEALOFDAY
=========
mealofday_id (pk)
meal_number
meal_description
Alternatively, get rid of that mealofday_id on MEALPLAN and create another table like so:
MEALPLAN_MEALOFDAY
=========
mealofday_id (fk/pk)
mealplan_id (fk/pk)

Fact Table Recommendation

I have a data mart which only needs to capture a serial number of a product, the date of the activity, and where the activity took place (which account).
There are five possible activities. The issue I have is this. Two of the activities take place at a warehouse level. The remaining three take place at the account-level (WH does not apply). Ultimately however every warehouse rolls up to a master account.
So if I had one fact table, I would essentially need two FK and you would have to traverse the fact table to build the WH > Account hierarchy which seems hard to maintain. I'd like one dimension table.
Or is it then recommended I split this into two fact tables, even though the only different characteristic of either table is whether the activity took place at the warehouse or not.
The goal of the reporting will be at the account level, but having the WH information may be useful at some point. And I need to check for duplicates, etc which is why I was leaning towards the first, but don't know how to appropriately handle the hierarchies.
Single Fact Table Design
Item: 1
Account: 14
Warehouse:2
ActivityType:3
Date: 20130204
SerialNumber:123456
Count:1
Dual Fact Table Design
Table 1
Item: 1
Warehouse:2
ActivityType:3
Date: 20130204
SerialNumber:123456
Count:1
Table 2
Item: 1
Account:2
ActivityType:3
Date: 20130204
SerialNumber:123456
Count:1

Ive interpreted you situation as:
ALL activities require an account
Some activities involve a
warehouse.
The selection of warehouse implies an account. the
accounts mentioned in the two point above are of the same type (there
is only 1 account dimension table)
In which case you should be OK with the single FACT table design:
[ACTIVITY_FACT]
SK (Optional, i find unique surrogate PKs useful)
ITEM_SK (Link to your ITEM_DIM table)
ACCOUNT_SK (Link to your ACCOUNT_DIM table)
WAREHOUSE_SK (Link to your WAREHOUSE_DIM table, -1 for no warehouse activities)
ACTIVITY_TYPE_SK (Link to your ACTIVITY_TYPE_DIM table)
ACTIVITY_DATE_SK (Link to your DATE_DIM table)
ITEM_SERIAL_NUMBER
ITEM_COUNT
Have a record in your WAREHOUSE dimension for NONE or NOT APPLICABLE and allocate it a nice obvious special condition SK value of -1 or -9 or whatever your shop is using for such things.
For activity records that reference a warehouse, put the appropriate warehouse sk AND the account sk that belong to that warehouse.
For activities that do not involve a warehouse, populate the warehouse sk with the NONE / NOT APPLICABLE warehouse dimension record and the appropriate Account SK.
Now your fact table can be joined to your Account and Warehouse dimension tables without having to worry about outer join or null condition handling. This should allow you and your users to play about with warehouse dimension data as required and your not having to faff about with managing two tables that contain essentially the same date.

A possibility is to define the hierarchy in a single dimension table. Guessing at what you’re dealing with, I came up with the following.
Outline of dimension table:
TABLE: Account
Account_ID <surrogate key>
Account <Account name, identifier>
Warehouse (Warehouse name, identifier)
Sample data:
Account_ID Account Warehouse
1 A n/a
2 B n/a
3 C n/a
4 W wh1
5 W wh2
6 Z wh3
7 Z n/a
Account_ID is just a surrogate key, having no intrinsic meaning or value
Account lists the accounts. Here, I shows five, A, B, C, W and Z. Select distinct to get the list of accounts; join to a fact table by Account_ID where Account = “W” gets all data for that account (for however many warehouses, if applicable).
Warehouse lists all warehouses and the account they are associated with; here, “W” is the account for two separate warehouses (wh1, wh2); Z is associated with warehouse wh3, but could also be used by a fact table with “no” warehouse. Join to a fact table by Account_ID where Warehouse = “wh1” gets all data for that warehouse.
Using this, with Account_ID in a fact table you could drill down for all entries for any given Account or for a specific warehouse (or for no warehouse, if there is value in that).
There are lots of variations and permutations possible with this kind of approach.

Improvement on database schema

I'm creating a small pet shop database for a project
The database needs to have a list of products by supplier that can be grouped by pet type or product category.
Each in store sale and customer order can have multiple products per order and an employee attached to them the customer order must be have a customer and employee must have a position,
http://imgur.com/2Mi7EIU

Here are some random thoughts
I often separate addresses from the thing that has an address. You could make 1-many relationships between Employee, Customer and Supplier to an address table. That would allow you to have different types of addresses per entity, and to change addresses without touching the original table.
If it is possible for prices to change for an item, you would need to account for that somehow. Ideas there are create a pricing table, or to capture the price on the sales item table.
I don't like the way you handle the sales item table. the different foreign keys based on the type of the transaction is not quite correct. An alternative would be to replace SalesItem SaleID and OrderId with the SalesRecordId... another better option would be to just merge the fields from InStoreSale, SalesRecord, and CustomerOrders into a single table and slap an indicator on the table to indicate which type of transaction it was.
You would probably try to be consistent with plurality on your tables. For example, CustomerOrders vs. CustomerOrder.
Putting PositionPay on the EmployeePosition table seems off to... Employees in the same position typically can have different pay.
Is the PetType structured with enough complexity? Can't you have items that apply to more than one pet type? For example, a fishtank can be used for fish or lizards? If so, you will need a many-to-many join table there.
Hope this helps!

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas