Avoiding nullable FK - Database Design

I am trying to figure out how best to represent my data in my database (SQL Server), preferably with no nullable FKs.
I have a site where a user has to buy a membership (monthly subscription), and can also buy ads and credits to do more on the site.
So I was thinking something like having these tables
Company
-normal company columns
Plan
- Limitations (JSON containing all the limits of what the membership allows, e.g. can do x searches)
- Name (e.g. Membership lv 1)
- Price
- Qty
- Unit (Monthly, Credit, etc)
PlanTypes
- Type (membership, addon, ad)
CompanyPlans (junction table)
- PlanId
- CompanyId
- Limitations (JSON) - this would store things like when their membership expires or how many credits they have left. If they extend their membership or buy credits, the row would be updated, so business rules would ensure one row per plan per company. This would not really work for ads, though, as a company can buy more than one ad.
Ads
- normal ad columns
- Start
- End
So the problem I run into is that the Plan table is the table that keeps track of their membership subscription and credits, which I think is fine.
However, when it gets to ads it gets weird, as right now I was planning to put the relationship on Company. So now all of a sudden, for ads, the check of whether they are still active is done in the Ads table, whereas everything else is in the CompanyPlan table.
To make it worse, should the start and end date of the ad still be duplicated in the CompanyPlan table for consistency?
I really don't want to break up the Plan table into something like a Subscription table, an Addon table and an Ad table, as I am planning to add order history and order history line tables that link to the product bought. I don't want three different relationships to the order history line table, with two of the FKs always null, as I think that is bad as well.
Another option I was considering, though again I am not sure whether it is bad practice, is to put the Ad relationship on the CompanyPlan junction table and keep the start/end date in the Company Table and other info in the Ad table.
Example Data
Company
Id Name
1 My Company
PlanTypes
Id Type
1 Membership
2 Addon
3 Ad
Plans
Id PlanTypeId Limitations Name Price Value Unit
1 1 {Searches: 100} Plan 1 $30 1 Month
2 2 {} Extra Searches $10 100 Credits
3 3 {} Ad1 $100 1 Month
CompanyPlan
Id PlanId Limitations CompanyId
1 1 {Start: "2018-01-01", End: "2018-02-01"} 1
2 2 {ExtraSearches: 100} 1
3 3 {AdStart: "2018-01-01", AdEnd: "2018-02-01"} 1
Ads
Id CompanyId Start End
1 1 2018-01-01 2018-02-01
Order History
Id CompanyId
1 1
Order Lines
Id OrderHistoryId PlanId
1 1 1
2 1 2
3 1 3

To start with, as The Impaler already mentioned in the comments, nullable FKs aren't a bad thing if used correctly. I would design your tables as follows:
Company
-Id
-Name
Plan
-Id
-Name
-Details
Addon
-Id
-Value
DefaultPlanAddon
-Id
-Plan_Id
-Addon_Id
Subscription
-Id
-Company_Id
-Plan_Id
-Start
-End
SubscriptionAddons
-Id
-Subscription_Id
-Addon_Id
Advertisement
-Id
-Subscription_Id
-Content
-Start
-End
Most of the tables are pretty straightforward, I guess.
Company
All data to identify the company/customer.
Plan
Basic details about a plan.
Addon
Here it gets a bit more interesting. All addons are saved in this table. As I understand it, addons are usable with all plans. If this is not the case, you could add something like a PlanAddons table that records which plans can have which addons. Value describes what the addon provides, for example 100 extra ads. As far as I know, you can use a simple string for it, since an addon is basically an Id plus a description of what the addon is.
Subscription
I would make a separate Subscription table. You can also save the start and end date of each subscription there. With this you also have a purchase history and don't need an OrderHistory table. If the start date does not equal the purchase date, you can easily add a PurchaseDate column.
SubscriptionAddon
This table includes all addons for a specific subscription, since one subscription can have many addons.
DefaultPlanAddon
This table stores the default addons for each plan. This information is only used when a company buys a subscription: your logic looks up the plan's default addons and puts them, together with any other wanted addons, into the SubscriptionAddon table.
Advertisement
All ads are stored here, each with a start and end date. As with subscriptions, you can use this table for the order history as well.
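As a rough illustration of the DefaultPlanAddon step described above, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table and column names follow the answer; the sample rows, the buy_subscription helper and its parameters are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Company (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Plan (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Details TEXT);
CREATE TABLE Addon (Id INTEGER PRIMARY KEY, Value TEXT NOT NULL);
CREATE TABLE DefaultPlanAddon (
    Id INTEGER PRIMARY KEY,
    Plan_Id INTEGER NOT NULL REFERENCES Plan(Id),
    Addon_Id INTEGER NOT NULL REFERENCES Addon(Id)
);
CREATE TABLE Subscription (
    Id INTEGER PRIMARY KEY,
    Company_Id INTEGER NOT NULL REFERENCES Company(Id),
    Plan_Id INTEGER NOT NULL REFERENCES Plan(Id),
    "Start" TEXT NOT NULL,
    "End" TEXT NOT NULL
);
CREATE TABLE SubscriptionAddons (
    Id INTEGER PRIMARY KEY,
    Subscription_Id INTEGER NOT NULL REFERENCES Subscription(Id),
    Addon_Id INTEGER NOT NULL REFERENCES Addon(Id)
);
-- Hypothetical sample rows, not taken from the question.
INSERT INTO Company VALUES (1, 'My Company');
INSERT INTO Plan VALUES (1, 'Plan 1', NULL);
INSERT INTO Addon VALUES (1, '100 searches'), (2, '100 extra searches');
INSERT INTO DefaultPlanAddon VALUES (1, 1, 1);
""")


def buy_subscription(conn, company_id, plan_id, start, end, extra_addon_ids=()):
    """Hypothetical helper: create a subscription and attach its addons."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            'INSERT INTO Subscription (Company_Id, Plan_Id, "Start", "End") '
            "VALUES (?, ?, ?, ?)",
            (company_id, plan_id, start, end),
        )
        sub_id = cur.lastrowid
        # Copy the plan's default addons onto the new subscription.
        conn.execute(
            "INSERT INTO SubscriptionAddons (Subscription_Id, Addon_Id) "
            "SELECT ?, Addon_Id FROM DefaultPlanAddon WHERE Plan_Id = ?",
            (sub_id, plan_id),
        )
        # Add any extra addons the company bought on top.
        conn.executemany(
            "INSERT INTO SubscriptionAddons (Subscription_Id, Addon_Id) VALUES (?, ?)",
            [(sub_id, addon_id) for addon_id in extra_addon_ids],
        )
    return sub_id


sub_id = buy_subscription(conn, 1, 1, "2018-01-01", "2018-02-01", extra_addon_ids=[2])
addon_ids = [
    row[0]
    for row in conn.execute(
        "SELECT Addon_Id FROM SubscriptionAddons WHERE Subscription_Id = ? ORDER BY Addon_Id",
        (sub_id,),
    )
]
```

Every FK column here is NOT NULL, which is the property the question was after; whether the copy step lives in application code or in a stored procedure is a design choice the answer leaves open.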

After reading your case, this is what I think:
I think the advertisement table (ads) should store the begin/end date of the ad.
The relationship between company and ads is 1:n. So ads has a foreign key pointing to company.
The company table should not store the begin/end date of the ad, since the same company could buy more ads in the future. When it does, the new ad gets a begin/end date, without affecting the old ad, that may be obsolete at this point.
This should keep your model clean, without (dangerous) redundancy.
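A small sketch of that 1:n layout, written with Python's sqlite3 module for convenience. The second ad, the active_ads helper and the choice to treat End as exclusive are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Ads (
    Id INTEGER PRIMARY KEY,
    CompanyId INTEGER NOT NULL REFERENCES Company(Id),  -- 1:n, never NULL
    "Start" TEXT NOT NULL,
    "End" TEXT NOT NULL
);
INSERT INTO Company VALUES (1, 'My Company');
INSERT INTO Ads VALUES (1, 1, '2018-01-01', '2018-02-01');
-- Hypothetical second purchase by the same company.
INSERT INTO Ads VALUES (2, 1, '2018-03-01', '2018-04-01');
""")


def active_ads(conn, company_id, on_date):
    """Return the Ids of the company's ads running on on_date (ISO date string)."""
    rows = conn.execute(
        'SELECT Id FROM Ads WHERE CompanyId = ? AND "Start" <= ? AND ? < "End" ORDER BY Id',
        (company_id, on_date, on_date),
    ).fetchall()
    return [row[0] for row in rows]


january = active_ads(conn, 1, "2018-01-15")
february = active_ads(conn, 1, "2018-02-15")
march = active_ads(conn, 1, "2018-03-15")
```

Because each ad row carries its own dates, buying the second ad does not touch the first one, and nothing on the Company row has to change.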

Related

Cognos Analytics join multiple tables

I am working in Cognos Analytics 11.1.7.
I have a data module with three tables. Table 1 and 2 contain all the transactions we do, where table 1 contains the remitter's part of the transaction and table 2 contains the beneficiary's part of the transaction. I.e. one transaction is divided into two tables. Table 3 holds account numbers.
I want to create a report that shows the remitter's customer ID and account number. However, in some cases the remitter's account is missing. These customers have distinctive customer IDs (with Y instead of X). In those cases, I want the beneficiary's customer ID and account number. Consider the following three tables:
Table 1: REMITTER
CUSTOMER_ID | ORDER_NR
----------------------
1X | 123
2Y | 456
1X | 789
Table 2: BENEFICIARY
CUSTOMER_ID | ORDER_NR
----------------------
4X | 123
6X | 456
6X | 789
Table 3: ACCOUNTS
CUSTOMER_ID | ACCOUNT_NR
------------------------
1X | 1111
2Y |
3X | 3333
4X | 4444
5X | 5555
6X | 6666
What I want is basically the following report:
REPORT OF ALL TRANSACTIONS TODAY
CUSTOMER_ID | ACCOUNT_NR | ORDER_NR
-----------------------------------
1 | 1111 | 123
6 | 6666 | 456
1 | 1111 | 789
I have solved the CUSTOMER_ID column with a switch case:
CASE
WHEN REMITTER.CUSTOMER_ID CONTAINS 'Y'
THEN BENEFICIARY.CUSTOMER_ID
ELSE REMITTER.CUSTOMER_ID
END
Now here's the problem: I can't create a join (relationship) between the column created above and the ACCOUNTS table, since my own column lies directly "under" the data module (on the same level as the tables in the index list to the left). However, if I create a column "under" the REMITTER table, I can't use the case calculation from above. Cognos gives me the following error:
The expression is not valid.
XQE-MSR-0008 In module "STACKOVERFLOW", the following query
subjects are not joined: "REMITTER", "BENEFICIARY".
I have tried to circumvent the error by creating all kinds of joins between REMITTER and BENEFICIARY on ORDER_NR but Cognos keeps giving me this error.
I have also tried to make a "triangle" of joins, where REMITTER and BENEFICIARY are joined on ORDER_NR, REMITTER and ACCOUNTS are joined on CUSTOMER_ID and BENEFICIARY and ACCOUNTS are joined on CUSTOMER_ID. This doesn't work. However, when I delete either the REMITTER/ACCOUNT or BENEFICIARY/ACCOUNTS join, it works with the table I keep joined.
I am slowly losing my sanity here. Thanks!

What is the nature of the relationships between these entities?
That is a question which you should ask for everything in your model.
The pattern of that relationship drives the relationship between the objects in the model, which in turn drives what decisions you need to make in your modelling.
For example, is this a Bridge table situation? If so, you need to be aware of it so you can model appropriately.
In the end it falls back on Kimball:
Identify the facts
Identify the dimensions
I am assuming that the cardinality is beneficiary to remitter to account, or remitter to beneficiary to account.
Put beneficiary and remitter into a view in the module, create a relationship between it and account, and delete the relationship between the middle table and account (so that the SQL will use the relationship which you created ).
I think putting the calculation into the table which is in the middle would also do the trick.
I can not say that I can map between your described 'triangle' of joined tables and a business purpose so I could not use that information to understand the entity relationship. Such a pattern of relationship is specifically identified as one to be identified and, as part of the Cognos proven practices, corrected. Because I can not identify if there truly is a business purpose to have such a triangle or not, I can not, and will not, describe the appropriate modelling actions as they are dependent on the business purpose of the relationships between the entities, which takes us back to St. Ralph.

Use queries.
In a report, create a query to resolve the remitter/beneficiary problem, then join that to another query that gets the accounts data.
If this functionality must be canned -- because many unknown report developers will use it to produce many unknown reports -- you can still do the same thing, but in a data set.
As C'est Moi alludes to, you can also do this in the data module by joining Remitter and Beneficiary into a table that computes the Customer_Id.
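The logic behind both suggestions (first resolve remitter vs. beneficiary per order, then join that result to accounts) can be sketched in plain SQL. The following uses Python's sqlite3 module with the sample data from the question; it is not Cognos data module or report syntax, LIKE stands in for the CONTAINS used in the question, and the TRANSACTION_PARTY name is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE REMITTER (CUSTOMER_ID TEXT, ORDER_NR INTEGER);
CREATE TABLE BENEFICIARY (CUSTOMER_ID TEXT, ORDER_NR INTEGER);
CREATE TABLE ACCOUNTS (CUSTOMER_ID TEXT, ACCOUNT_NR INTEGER);
INSERT INTO REMITTER VALUES ('1X', 123), ('2Y', 456), ('1X', 789);
INSERT INTO BENEFICIARY VALUES ('4X', 123), ('6X', 456), ('6X', 789);
INSERT INTO ACCOUNTS VALUES
    ('1X', 1111), ('2Y', NULL), ('3X', 3333),
    ('4X', 4444), ('5X', 5555), ('6X', 6666);
""")

# Step 1 (the inner query): one row per order with the customer to report on.
# Step 2 (the outer query): join that single customer column to ACCOUNTS.
rows = conn.execute("""
WITH TRANSACTION_PARTY AS (
    SELECT r.ORDER_NR,
           CASE WHEN r.CUSTOMER_ID LIKE '%Y%'
                THEN b.CUSTOMER_ID
                ELSE r.CUSTOMER_ID
           END AS CUSTOMER_ID
    FROM REMITTER r
    JOIN BENEFICIARY b ON b.ORDER_NR = r.ORDER_NR
)
SELECT t.CUSTOMER_ID, a.ACCOUNT_NR, t.ORDER_NR
FROM TRANSACTION_PARTY t
JOIN ACCOUNTS a ON a.CUSTOMER_ID = t.CUSTOMER_ID
ORDER BY t.ORDER_NR
""").fetchall()
```

Since ACCOUNTS is joined only once, to the computed column, the "triangle" of joins from the question never arises.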

Does it follow best-practice DB design to mix staff and customer details in 1 table?

I have a table called Users which currently holds data on both Customers and Staff. It has their names, emails, passwords, etc. It also has a field called TypeOfUserID which holds a value saying what type of user they are, e.g. Customer or Staff.
Would it be better to have two separate tables: Customers and Staff?
It seems like duplication because the fields are the same for both types of user. The only field I can get rid of is the TypeOfUserID column.
However, having them both in one table called Users means that in my front-end application I have to keep adding a clause to check what type of user they are. If for any reason I need to allow a different type of user access e.g. External Supplier then I have to manage the addition of TypeOfUserID in multiple places in the WHERE clauses.

Short Answer:
It depends. If your current needs are met, and you don't foresee this model needing to be changed for a long time / it would be easy to change if you had to, stick with it.
Longer answer:
If staff members are just a special case of user, I don't see any reason you'd want to change anything about the database structure. Yes, for staff-specific stuff you'd need to be sure the person was staff, but I don't really see any way around that- you always have to know they're staff, first.
If, however, you want finer-grained permissions than binary (a person can belong to the 'staff' group but that doesn't necessarily say whether or not they're in the users' group, for instance), you might want to change the database.
The easiest way to do that, of course, would be to have a unique ID associated with each user, and use that key to look up their group permissions in a different table.
Something like:
uid | group
------------
1 | users
1 | staff
2 | users
3 | staff
4 | users
5 | admin
Although you may or may not want an actual string for each group; most likely you'd want another level of indirection by having a 'groups' table. So, that table above would be a 'group_membership' table, and it could look more like:
uid | gid
------------
1 | 1
1 | 2
2 | 1
3 | 2
4 | 1
5 | 3
To go along with it, you'd have the 'groups' table, which would be:
gid | group
-------------
1 | users
2 | staff
3 | admin
But, again, that's only if you're imagining a larger number of roles and you want more flexibility. If you only ever plan on having 'users' and 'staff' and staff are just highly privileged users, all of that extra stuff would be a waste of your time.
However, if you want really fine grained permissions, with maximum flexibility, you can use the above to make them happen via a 'permissions' table:
gid | can_create_user | can_fire_people | can_ban_user
-------------------------------------------------------
1 | false | false | false
2 | true | false | true
3 | true | true | true
Some Example Code
Here's a working PostgreSQL example of getting permissions can_create_user and can_fire_people for a user with uid 1:
SELECT bool_or(can_create_user) AS can_create_user,
bool_or(can_fire_people) AS can_fire_people
FROM permissions
WHERE gid IN (SELECT gid FROM group_membership WHERE uid = 1);
Which would return:
can_create_user | can_fire_people
----------------------------------
true | false
because user 1 is in groups 1 and 2, and group 2 has the can_create_user permission, but neither group has the can_fire_people permission.
(I know you're using SQL Server, but I only have access to a PostgreSQL server at the moment. Sorry about that. The difference should be minor, though.)
Notes
You'll want to make sure that uid and gid are primary keys in the users and groups table, and that there are foreign key constraints for those values in every other table which uses them; you don't want nonexistent groups to have permissions, or nonexistent users to be accidentally added to groups.
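To make the key constraints concrete, here is a minimal sketch using Python's sqlite3 module, loaded with the sample tables from this answer. It is neither SQL Server nor PostgreSQL: SQLite has no bool_or, so MAX over 0/1 integers stands in for it, and the user_permissions helper is a name invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY);
CREATE TABLE "groups" (gid INTEGER PRIMARY KEY, "group" TEXT NOT NULL UNIQUE);
CREATE TABLE group_membership (
    uid INTEGER NOT NULL REFERENCES users(uid),
    gid INTEGER NOT NULL REFERENCES "groups"(gid),
    PRIMARY KEY (uid, gid)
);
CREATE TABLE permissions (
    gid INTEGER PRIMARY KEY REFERENCES "groups"(gid),
    can_create_user INTEGER NOT NULL,
    can_fire_people INTEGER NOT NULL,
    can_ban_user INTEGER NOT NULL
);
INSERT INTO users VALUES (1), (2), (3), (4), (5);
INSERT INTO "groups" VALUES (1, 'users'), (2, 'staff'), (3, 'admin');
INSERT INTO group_membership VALUES (1, 1), (1, 2), (2, 1), (3, 2), (4, 1), (5, 3);
INSERT INTO permissions VALUES (1, 0, 0, 0), (2, 1, 0, 1), (3, 1, 1, 1);
""")


def user_permissions(conn, uid):
    """A permission is granted if any of the user's groups grants it."""
    row = conn.execute(
        "SELECT MAX(can_create_user), MAX(can_fire_people) FROM permissions "
        "WHERE gid IN (SELECT gid FROM group_membership WHERE uid = ?)",
        (uid,),
    ).fetchone()
    return {"can_create_user": bool(row[0]), "can_fire_people": bool(row[1])}


perms_user_1 = user_permissions(conn, 1)

# The FK constraint rejects membership in a group that does not exist.
try:
    conn.execute("INSERT INTO group_membership VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The table and column named group are quoted because GROUP is an SQL keyword; in a real schema a different column name would avoid the quoting.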
Alternatively
A graph database solves this problem pretty elegantly; you'd simply create edges linking users to groups, and edges linking groups to permissions. If you want to work with a technology that's currently sexy / buzzword compliant, might want to give that a try, depending on how enormous of a change that'd be.
Further information
The phrase you'll want to google is "access control". You'll probably want to implement access control lists (as outlined above) or something similar. Since this is primarily a security-related topic, you might also want to ask this question on sec.se, or at least look around there for related answers.

Even though they look similar, they logically belong to different areas. You will never need a union between those tables. But as your application develops, you will need to add more and more specific fields to these tables, and they will become more different than similar.

You could have a separate table for staff holding only the id from the user table as a foreign key. If you do that, any functionality related only to staff members can query the staff table, joining to the user table. This solution also gives you flexibility for future extension, as any data related only to staff members (for example, the department they work in) can be placed in the staff table.
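A minimal sketch of that layout, using Python's sqlite3 module. The column names, the Department column and the sample people are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (
    UserId INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    Email TEXT NOT NULL
);
-- Staff holds only the key of the matching Users row plus staff-only data.
CREATE TABLE Staff (
    UserId INTEGER PRIMARY KEY REFERENCES Users(UserId),
    Department TEXT
);
INSERT INTO Users VALUES (1, 'Alice', 'alice@example.com'), (2, 'Bob', 'bob@example.com');
INSERT INTO Staff VALUES (2, 'Support');
""")

# Staff-only functionality starts from Staff and joins to Users for shared fields.
staff_rows = conn.execute(
    "SELECT u.Name, s.Department FROM Staff s JOIN Users u ON u.UserId = s.UserId"
).fetchall()
```

Shared fields such as name and email stay in one place, and no staff-only column ever has to be NULL for a customer.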

Fact Table Recommendation

I have a data mart which only needs to capture a serial number of a product, the date of the activity, and where the activity took place (which account).
There are five possible activities. The issue I have is this. Two of the activities take place at a warehouse level. The remaining three take place at the account-level (WH does not apply). Ultimately however every warehouse rolls up to a master account.
So if I had one fact table, I would essentially need two FK and you would have to traverse the fact table to build the WH > Account hierarchy which seems hard to maintain. I'd like one dimension table.
Or is it then recommended I split this into two fact tables, even though the only different characteristic of either table is whether the activity took place at the warehouse or not.
The goal of the reporting will be at the account level, but having the WH information may be useful at some point. And I need to check for duplicates, etc which is why I was leaning towards the first, but don't know how to appropriately handle the hierarchies.
Single Fact Table Design
Item: 1
Account: 14
Warehouse:2
ActivityType:3
Date: 20130204
SerialNumber:123456
Count:1
Dual Fact Table Design
Table 1
Item: 1
Warehouse:2
ActivityType:3
Date: 20130204
SerialNumber:123456
Count:1
Table 2
Item: 1
Account:2
ActivityType:3
Date: 20130204
SerialNumber:123456
Count:1

I've interpreted your situation as:
ALL activities require an account.
Some activities involve a warehouse.
The selection of a warehouse implies an account.
The accounts mentioned in the points above are of the same type (there is only 1 account dimension table).
In which case you should be OK with the single FACT table design:
[ACTIVITY_FACT]
SK (Optional, I find unique surrogate PKs useful)
ITEM_SK (Link to your ITEM_DIM table)
ACCOUNT_SK (Link to your ACCOUNT_DIM table)
WAREHOUSE_SK (Link to your WAREHOUSE_DIM table, -1 for no warehouse activities)
ACTIVITY_TYPE_SK (Link to your ACTIVITY_TYPE_DIM table)
ACTIVITY_DATE_SK (Link to your DATE_DIM table)
ITEM_SERIAL_NUMBER
ITEM_COUNT
Have a record in your WAREHOUSE dimension for NONE or NOT APPLICABLE and allocate it a nice obvious special condition SK value of -1 or -9 or whatever your shop is using for such things.
For activity records that reference a warehouse, put the appropriate warehouse sk AND the account sk that belong to that warehouse.
For activities that do not involve a warehouse, populate the warehouse sk with the NONE / NOT APPLICABLE warehouse dimension record and the appropriate Account SK.
Now your fact table can be joined to your Account and Warehouse dimension tables without having to worry about outer joins or null handling. This should allow you and your users to play about with warehouse dimension data as required, and you're not having to faff about with managing two tables that contain essentially the same data.
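The effect of the NOT APPLICABLE row can be shown with a cut-down version of the fact table, sketched here with Python's sqlite3 module. The item, activity type and date dimensions are left out for brevity, and the dimension names and second fact row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ACCOUNT_DIM (ACCOUNT_SK INTEGER PRIMARY KEY, ACCOUNT_NAME TEXT NOT NULL);
CREATE TABLE WAREHOUSE_DIM (WAREHOUSE_SK INTEGER PRIMARY KEY, WAREHOUSE_NAME TEXT NOT NULL);
CREATE TABLE ACTIVITY_FACT (
    SK INTEGER PRIMARY KEY,
    ACCOUNT_SK INTEGER NOT NULL REFERENCES ACCOUNT_DIM(ACCOUNT_SK),
    WAREHOUSE_SK INTEGER NOT NULL REFERENCES WAREHOUSE_DIM(WAREHOUSE_SK),
    ACTIVITY_TYPE_SK INTEGER NOT NULL,
    ACTIVITY_DATE_SK INTEGER NOT NULL,
    ITEM_SERIAL_NUMBER TEXT NOT NULL,
    ITEM_COUNT INTEGER NOT NULL
);
INSERT INTO ACCOUNT_DIM VALUES (14, 'Account 14');
-- The special -1 row stands for activities with no warehouse.
INSERT INTO WAREHOUSE_DIM VALUES (-1, 'NOT APPLICABLE'), (2, 'Warehouse 2');
INSERT INTO ACTIVITY_FACT VALUES
    (1, 14, 2, 3, 20130204, '123456', 1),   -- warehouse-level activity
    (2, 14, -1, 4, 20130205, '123456', 1);  -- account-level activity
""")

# Plain inner joins keep both rows: no outer join or NULL handling is needed.
fact_rows = conn.execute("""
SELECT f.SK, a.ACCOUNT_NAME, w.WAREHOUSE_NAME
FROM ACTIVITY_FACT f
JOIN ACCOUNT_DIM a ON a.ACCOUNT_SK = f.ACCOUNT_SK
JOIN WAREHOUSE_DIM w ON w.WAREHOUSE_SK = f.WAREHOUSE_SK
ORDER BY f.SK
""").fetchall()
```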

A possibility is to define the hierarchy in a single dimension table. Guessing at what you’re dealing with, I came up with the following.
Outline of dimension table:
TABLE: Account
Account_ID <surrogate key>
Account <Account name, identifier>
Warehouse (Warehouse name, identifier)
Sample data:
Account_ID Account Warehouse
1 A n/a
2 B n/a
3 C n/a
4 W wh1
5 W wh2
6 Z wh3
7 Z n/a
Account_ID is just a surrogate key, having no intrinsic meaning or value
Account lists the accounts. Here, I show five: A, B, C, W and Z. Select distinct to get the list of accounts; joining to a fact table by Account_ID where Account = “W” gets all data for that account (for however many warehouses, if applicable).
Warehouse lists all warehouses and the account they are associated with; here, “W” is the account for two separate warehouses (wh1, wh2); Z is associated with warehouse wh3, but could also be used by a fact table with “no” warehouse. Join to a fact table by Account_ID where Warehouse = “wh1” gets all data for that warehouse.
Using this, with Account_ID in a fact table you could drill down for all entries for any given Account or for a specific warehouse (or for no warehouse, if there is value in that).
There are lots of variations and permutations possible with this kind of approach.
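As one such variation, the sample dimension above can be loaded into SQLite via Python's sqlite3 module and queried at either level. The Activity fact table, its serial numbers and the two helper functions are hypothetical and only there to have something to join to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Account (
    Account_ID INTEGER PRIMARY KEY,
    Account TEXT NOT NULL,
    Warehouse TEXT NOT NULL
);
INSERT INTO Account VALUES
    (1, 'A', 'n/a'), (2, 'B', 'n/a'), (3, 'C', 'n/a'),
    (4, 'W', 'wh1'), (5, 'W', 'wh2'), (6, 'Z', 'wh3'), (7, 'Z', 'n/a');
-- Hypothetical fact table with a single FK into the dimension.
CREATE TABLE Activity (
    Account_ID INTEGER NOT NULL REFERENCES Account(Account_ID),
    SerialNumber TEXT NOT NULL
);
INSERT INTO Activity VALUES (4, 'S1'), (5, 'S2'), (6, 'S3'), (7, 'S4'), (1, 'S5');
""")


def serials_for_account(conn, account):
    """All activity for an account, across every warehouse it owns."""
    rows = conn.execute(
        "SELECT f.SerialNumber FROM Activity f "
        "JOIN Account d ON d.Account_ID = f.Account_ID "
        "WHERE d.Account = ? ORDER BY f.SerialNumber",
        (account,),
    ).fetchall()
    return [row[0] for row in rows]


def serials_for_warehouse(conn, warehouse):
    """All activity for one specific warehouse."""
    rows = conn.execute(
        "SELECT f.SerialNumber FROM Activity f "
        "JOIN Account d ON d.Account_ID = f.Account_ID "
        "WHERE d.Warehouse = ? ORDER BY f.SerialNumber",
        (warehouse,),
    ).fetchall()
    return [row[0] for row in rows]


account_w = serials_for_account(conn, "W")
account_z = serials_for_account(conn, "Z")
warehouse_wh1 = serials_for_warehouse(conn, "wh1")
```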

Entity relation (normal forms)

Let's assume we have table USERS like:
..ID..username
user1
user2
user3
Users can have Bills (user -> have_many -> bill relation).
Table BILLS like:
..ID..user_id
1
2
2
Also we have Products so every Product can be associated to ONLY ONE bill (product -> has_one -> bill relation).
Table PRODUCTS like:
..ID..bill_id
2
3
1
So, as you can see, our user can have lot of products (through bills).
My question:
Would it be correct, with respect to database normalization, to add a second foreign key named user_id to the PRODUCTS table so that all of a user's Products can be selected quickly from the PRODUCTS table? Or is that not correct, and should I use a JOIN statement to select all of a user's Products?
P.S. Sorry for dirty tables drawing )

I would rather go with the normalized view (where you DO NOT have a user_id in the products table).
The only time I would ever consider this, as a last option, is if the performance REALLY requires it.
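For reference, the normalized lookup is a two-step join, sketched here with Python's sqlite3 module. The question's tables lost their ID columns in formatting, so the IDs 1 to 3 in row order are an assumption, as is the products_of_user helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE USERS (ID INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE BILLS (ID INTEGER PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES USERS(ID));
CREATE TABLE PRODUCTS (ID INTEGER PRIMARY KEY, bill_id INTEGER NOT NULL REFERENCES BILLS(ID));
-- IDs 1..3 are assumed; the other values come from the question.
INSERT INTO USERS VALUES (1, 'user1'), (2, 'user2'), (3, 'user3');
INSERT INTO BILLS VALUES (1, 1), (2, 2), (3, 2);
INSERT INTO PRODUCTS VALUES (1, 2), (2, 3), (3, 1);
""")


def products_of_user(conn, user_id):
    """Reach a user's products through BILLS; PRODUCTS has no user_id column."""
    rows = conn.execute(
        "SELECT p.ID FROM PRODUCTS p "
        "JOIN BILLS b ON b.ID = p.bill_id "
        "WHERE b.user_id = ? ORDER BY p.ID",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


user2_products = products_of_user(conn, 2)
```

With the owner stored only on BILLS, moving a bill to another user cannot leave PRODUCTS pointing at the old one.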

It is usually better to normalize data so as not to duplicate information and retain data consistency.
However there are exceptions required in real life systems, often for performance reasons when dealing with huge volumes of data.

SQL Table Design - Identity Columns

SQL Server 2008 Database Question.
I have 2 tables, for argument's sake called Customers and Users, where a single Customer can have 1 to n Users. The Customers table generates a CustomerId which is a seeded identity with a +1 increment on it. What I'm after in the Users table is a compound key comprising the CustomerId and a sequence number such that in all cases, the first user has a sequence of 1 and subsequent users are added at x+1.
So the table looks like this...
CustomerId (PK, FK)
UserId (PK)
Name
...and if, for example, Customer 485 had three users, the data would look like...
CustomerId | UserId | Name
----------
485 | 1 | John
485 | 2 | Mark
485 | 3 | Luke
I appreciate that I can manually add the 1,2,3,...,n entry for UserId however I would like to get this to happen automatically on row insert in SQL, so that in the example shown I could effectively insert rows with the CustomerId and the Name with SQL Server protecting the Identity etc. Is there a way to do this through the database design itself - when I set UserId as an identity it runs 1 to infinity across all customers which isn't what I am looking for - have I got a setting wrong somewhere, or is this not an option?
Hope that makes sense - thanks for your help

I can think of no automatic way to do this without implementing a custom stored procedure that inserts the rows and increments the Id appropriately, although others with more knowledge may have a better idea.
However, this smells to me of naturalising a surrogate key - which is not always a good idea.
More info here:
http://www.agiledata.org/essays/keys.html

That's not really an option with a regular identity column, but you could set up an insert trigger to auto populate the user id though.
The naive way to do this would be to have the trigger select the max user id from the users table for the customer id on the inserted record, then add one to that. However, you'll run into concurrency problems there if more than one person is creating a user record at the same time.
A better solution would be to have a NextUserID column on the customers table. In your trigger you would:
Start a transaction.
Increment the NextUserID for the customer (locking the row).
Select the updated next user id.
Use that for the new User record.
Commit the transaction.
This should ensure that simultaneous additions of users don't result in the same user id being used more than once.
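The five steps can be sketched as follows, using Python's sqlite3 module instead of a SQL Server trigger. The add_user function is a made-up name, NextUserID is assumed to start at 0, and BEGIN IMMEDIATE (which in SQLite takes a database-wide write lock) stands in for the row lock described above.

```python
import sqlite3

# isolation_level=None: the module opens no implicit transactions,
# so BEGIN / COMMIT / ROLLBACK below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE Customers (
    CustomerId INTEGER PRIMARY KEY,
    NextUserID INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Users (
    CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId),
    UserId INTEGER NOT NULL,
    Name TEXT NOT NULL,
    PRIMARY KEY (CustomerId, UserId)
);
INSERT INTO Customers (CustomerId) VALUES (485), (486);
""")


def add_user(conn, customer_id, name):
    """Insert a user with the next per-customer sequence number and return it."""
    conn.execute("BEGIN IMMEDIATE")  # 1. start a transaction and take the write lock
    try:
        cur = conn.execute(          # 2. increment the counter for this customer
            "UPDATE Customers SET NextUserID = NextUserID + 1 WHERE CustomerId = ?",
            (customer_id,),
        )
        if cur.rowcount != 1:
            raise ValueError(f"unknown customer {customer_id}")
        (user_id,) = conn.execute(   # 3. read the updated value
            "SELECT NextUserID FROM Customers WHERE CustomerId = ?",
            (customer_id,),
        ).fetchone()
        conn.execute(                # 4. use it for the new user row
            "INSERT INTO Users (CustomerId, UserId, Name) VALUES (?, ?, ?)",
            (customer_id, user_id, name),
        )
        conn.execute("COMMIT")       # 5. commit
        return user_id
    except Exception:
        conn.execute("ROLLBACK")
        raise


ids_485 = [add_user(conn, 485, name) for name in ("John", "Mark", "Luke")]
first_id_486 = add_user(conn, 486, "Anna")
```

Numbers handed out this way are never reused after a delete, which may or may not be what is wanted.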
All that said, I would recommend that you just don't do it. It's more trouble than it's worth and just smells like a bad idea to begin with.

So you want a generated user_id field that increments within the confines of a customer_id.
I can't think of one database where that concept exists.
You could implement it with a trigger. But my question is: WHY?
Surrogate keys are supposed to not have any kind of meaning. Why would you try to make a key that, simultaneously, is the surrogate and implies order?
My suggestions:
Create a date_created field, defaulting to getDate(). That will allow you to know the order (time based) in which each user_id was created.
Create an ordinal field - which can be updated by a trigger, to support that order.
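As a variation on the second suggestion, the ordinal can also be derived at query time from date_created instead of being stored and maintained by a trigger. The sketch below does that with Python's sqlite3 module; it is a swapped-in technique, not what the answer literally describes, and the table layout, names and dates are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,   -- meaningless surrogate key
    customer_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    date_created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Explicit dates keep the example deterministic.
INSERT INTO users (customer_id, name, date_created) VALUES
    (485, 'John', '2020-01-01'),
    (999, 'Other', '2020-01-02'),
    (485, 'Mark', '2020-01-03'),
    (485, 'Luke', '2020-01-04');
""")


def users_with_ordinal(conn, customer_id):
    """Number a customer's users 1..n by creation date (user_id breaks ties)."""
    return conn.execute(
        """
        SELECT u.name,
               (SELECT COUNT(*)
                FROM users u2
                WHERE u2.customer_id = u.customer_id
                  AND (u2.date_created < u.date_created
                       OR (u2.date_created = u.date_created
                           AND u2.user_id <= u.user_id))) AS ordinal
        FROM users u
        WHERE u.customer_id = ?
        ORDER BY ordinal
        """,
        (customer_id,),
    ).fetchall()


ordered_485 = users_with_ordinal(conn, 485)
```

Unlike a stored sequence number, a derived ordinal shifts when an earlier user is deleted, so it suits display order rather than a stable key.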
Hope that helps.
