How should this database subtype relationship be modelled?


I am revising a legacy multi-tenant application where the shopping cart function stores multiple vendors and multiple clients in the same database. Some clients of one vendor may be clients of a different vendor. Some vendors might actually be clients of another vendor.
I currently have a table for the supertype 'party' with primary key party_ID, a table for the subtype 'company' with primary key company_ID (references party_ID) and a table for the role of 'vendor' with primary key vendor_ID (references company_ID). I also have a junction table, 'client', with a composite primary key of vendor_ID and party_ID.
My question is how should the 'order' table reference the vendor and client tables? My first thought is that the table should have a composite primary key of vendor_ID, client_ID and order_ID (order_ID could be auto-increment across the table or sequential per vendor_ID + client_ID)
but this seemed a bit fishy as there were three attributes making up the key...
Does anyone have any insight into this topic? Most 'shopping carts' only deal with a single vendor, so the order table simply lists client_ID as a foreign key.
Thanks!

"My question is how should the order table reference the vendor and client tables? My first thought is that the table should have a composite primary key of 'vendor_ID', 'client_ID' and 'order_ID' but this seemed a bit fishy as there were three keys..."
A composite primary key doesn't mean three keys. It means one key consisting of three columns.
But that's not the real issue.
An order is an accounting record; it must not change over time. Storing the ID numbers is risky unless you've built temporal tables, and I doubt you've done that. If a vendor changes its name today, its name no longer matches the name on earlier orders. You must not let that happen with accounting records.
Unless you mean something unusual by "order", I'd expect Order_id to be its primary key. There might be other constraints; there might even be other key constraints to prevent duplicate orders that differ only by Order_id. But I'd still expect Order_id to be the primary key of a table of orders.
If vendors and clients are subtypes, I'd expect any (high risk) id numbers you store to reference the id numbers in the subtype tables. In your case, you seem to have an additional table that identifies the clients of vendors; it contains the columns {vendor_id, client_id}. The foreign key references for that table should be obvious.
Your table of orders should have one foreign key reference to that table, not one foreign key to vendors and another foreign key to clients. So in the table of orders, foreign key (vendor_id, client_id) references vendor_clients (vendor_id, client_id). The table of vendor clients will need either a primary key constraint or a unique constraint on {vendor_id, client_id}.
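Here is a minimal sketch of that single composite reference. It uses Python's built-in sqlite3 module so it runs without a database server. The schema is simplified from the question (the company subtype table is left out), and the names vendor_clients and orders are assumptions. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE party  (party_id  INTEGER PRIMARY KEY);
CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY REFERENCES party (party_id));
-- Junction table: which parties are clients of which vendors.
CREATE TABLE vendor_clients (
    vendor_id INTEGER NOT NULL REFERENCES vendor (vendor_id),
    client_id INTEGER NOT NULL REFERENCES party (party_id),
    PRIMARY KEY (vendor_id, client_id)
);
-- One composite foreign key to the pair, not two separate foreign keys.
CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    vendor_id INTEGER NOT NULL,
    client_id INTEGER NOT NULL,
    FOREIGN KEY (vendor_id, client_id)
        REFERENCES vendor_clients (vendor_id, client_id)
);
""")
conn.executemany("INSERT INTO party VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO vendor VALUES (1)")
conn.execute("INSERT INTO vendor_clients VALUES (1, 2)")  # party 2 is a client of vendor 1

conn.execute("INSERT INTO orders (vendor_id, client_id) VALUES (1, 2)")  # accepted

try:
    # Party 3 is not a client of vendor 1, so this violates the composite FK.
    conn.execute("INSERT INTO orders (vendor_id, client_id) VALUES (1, 3)")
    non_client_rejected = False
except sqlite3.IntegrityError:
    non_client_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(non_client_rejected, order_count)  # True 1
```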
But you shouldn't do that for accounting unless you're using temporal tables. Instead, you should probably store both the id numbers and the text.
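One way to keep both, sketched in sqlite3. This assumes the vendor's name is the text worth freezing; the vendor_name column and the sample names are hypothetical. The id is kept for joins, while the name is copied into the order when it is placed and never updated afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendor (vendor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- The order keeps the id for joins AND a copy of the name as it was at order time.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    vendor_id   INTEGER NOT NULL REFERENCES vendor (vendor_id),
    vendor_name TEXT    NOT NULL  -- snapshot; never updated afterwards
);
""")
conn.execute("INSERT INTO vendor VALUES (1, 'Acme Ltd')")
# Copy the current name into the order when it is placed.
conn.execute("""
    INSERT INTO orders (vendor_id, vendor_name)
    SELECT vendor_id, name FROM vendor WHERE vendor_id = 1
""")
conn.execute("UPDATE vendor SET name = 'Acme Holdings' WHERE vendor_id = 1")  # later rename

current_name, name_on_order = conn.execute("""
    SELECT v.name, o.vendor_name
    FROM orders o JOIN vendor v ON v.vendor_id = o.vendor_id
""").fetchone()
print(current_name, "|", name_on_order)  # Acme Holdings | Acme Ltd
```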

I would start with something like this. I do admit that I still do not quite understand the difference between company, vendor, and client in your question. As Catcall mentioned, in this model you are not allowed to delete parties (people, organizations); accounting records should be frozen, usually by capturing the current customer/supplier info in the order table.

For your primary key, you'll want just order_id.
Really, the composite (and unique) key I would use would be [vendor_id, client_id, occurredAt] (where occurredAt is a timestamp), assuming a given vendor/client pair can place at most one order per millisecond. However, this is something of a wide key, and some systems don't handle wide keys well. You'll still need these columns, probably indexed.
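A sketch of that combination in sqlite3. Here order_id stays the primary key, and a UNIQUE constraint enforces the wider [vendor_id, client_id, occurredAt] key; the constraint also provides the index. Storing occurred_at as ISO-8601 text with milliseconds is an assumption made for SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,            -- narrow surrogate primary key
    vendor_id   INTEGER NOT NULL,
    client_id   INTEGER NOT NULL,
    occurred_at TEXT    NOT NULL,               -- ISO-8601 timestamp with milliseconds
    UNIQUE (vendor_id, client_id, occurred_at)  -- the wider key, still enforced and indexed
)
""")
ins = "INSERT INTO orders (vendor_id, client_id, occurred_at) VALUES (?, ?, ?)"
conn.execute(ins, (1, 2, "2024-01-01T10:00:00.000"))
conn.execute(ins, (1, 2, "2024-01-01T10:00:00.001"))  # one millisecond later: allowed

try:
    conn.execute(ins, (1, 2, "2024-01-01T10:00:00.001"))  # same pair, same instant
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```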

Related

Postgresql: Primary key for table with one column

Sometimes, there are certain tables in an application with only one column in each of them. Data of records within the respective columns are unique. Examples are: a table for country names, a table for product names (up to 60 characters long, say), a table for company codes (3 characters long and determined by the user), a table for address types (say, billing, delivery), etc.
For tables like these, as the records are unique and not null, the only column can be used as the primary key, technically speaking.
So my question is, is it good enough to use that column as the primary key for the table? Or, is it still desirable to add another column (country_id, product_id, company_id, addresstype_id) as the primary key for the table? Why?
Thanks in advance for any advice.

There is always a debate between using surrogate keys and composite keys as the primary key. Using a composite primary key always introduces some complexity to your database design, and therefore to your application.
Suppose you have another table that needs a direct relationship with your resulting table (the billing table). In the composite key scenario, the related table needs all 4 columns in order to reference the billing table. On the other hand, if you use a surrogate key, you have a single identity column (simplicity), and you can create a unique constraint on (country_id, product_id, company_id, addresstype_id).
But it is hard to say that one approach is better than the other, because they both have pros and cons.
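A rough sqlite3 sketch of the surrogate-key option. The billing table gets a single-column key, and the four natural columns still get a UNIQUE constraint. A related table (the hypothetical billing_note) references billing through one column instead of four.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Surrogate key plus a UNIQUE constraint on the four natural-key columns.
CREATE TABLE billing (
    billing_id     INTEGER PRIMARY KEY,
    country_id     INTEGER NOT NULL,
    product_id     INTEGER NOT NULL,
    company_id     INTEGER NOT NULL,
    addresstype_id INTEGER NOT NULL,
    UNIQUE (country_id, product_id, company_id, addresstype_id)
);
-- A related table needs only one column to reference billing.
CREATE TABLE billing_note (
    note_id    INTEGER PRIMARY KEY,
    billing_id INTEGER NOT NULL REFERENCES billing (billing_id),
    note       TEXT
);
""")
ins = ("INSERT INTO billing (country_id, product_id, company_id, addresstype_id) "
       "VALUES (1, 1, 1, 1)")
conn.execute(ins)
conn.execute("INSERT INTO billing_note (billing_id, note) VALUES (1, 'first invoice')")

try:
    conn.execute(ins)  # same four natural-key values again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```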
You can check This for more information

composite primary key with practical approach

I just need to understand the concept behind the composite primary key. I have googled it and understood that it is a combination of more than one column of a table. But my question is: what is the practical use of this key? When should I use this concept? Can you show me a practical usage of this key in Excel or SQL Server?
It may be a weird question for any SQL expert. I apologize if anybody feels it is an idiotic question; please forgive me.

A typical use-case for a composite primary key is a junction/association table. Consider orders and products. One order could have many products. One product could be in many orders. The orderProducts table could be defined as:
create table orderProducts (
    orderId int not null references orders(orderId),
    productId int not null references products(productId),
    quantity int
    -- . . . other columns
);
It makes sense to declare (orderId, productId) as a composite primary key. This would impose the constraint that any given order has any given product only once.
That said, I would normally use a synthetic key (orderProductId) and simply declare the combination as unique.
The benefit of a composite primary key is that it enforces uniqueness (which could also be done with a uniqueness constraint). It also wastes no space on an additional key column.
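A small sqlite3 sketch of the composite key on the orderProducts table above. Stub orders and products tables are added so the references resolve. The sketch shows which inserts the key accepts and which it rejects:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE orders   (orderId   INTEGER PRIMARY KEY);
CREATE TABLE products (productId INTEGER PRIMARY KEY);
CREATE TABLE orderProducts (
    orderId   INTEGER NOT NULL REFERENCES orders (orderId),
    productId INTEGER NOT NULL REFERENCES products (productId),
    quantity  INTEGER,
    PRIMARY KEY (orderId, productId)  -- each product at most once per order
);
INSERT INTO orders   VALUES (1), (2);
INSERT INTO products VALUES (10), (20);
""")
ins = "INSERT INTO orderProducts VALUES (?, ?, ?)"
conn.execute(ins, (1, 10, 3))
conn.execute(ins, (1, 20, 1))  # same order, different product: allowed
conn.execute(ins, (2, 10, 5))  # same product, different order: allowed

try:
    conn.execute(ins, (1, 10, 2))  # product 10 again on order 1
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT COUNT(*) FROM orderProducts").fetchone()[0]
print(duplicate_rejected, rows)  # True 3
```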
There are downsides to composite primary keys as compared to identity keys:
Identity keys keep track of the order of inserts.
Identity keys are typically only 4 bytes.
Foreign key references consist of only one column.
By default, SQL Server clusters on primary keys. This imposes an ordering and can result in fragmentation (although that is doubtful for this example).

Let's say I have a table of cars. It includes the model and make of the cars. I do not want to insert the same exact car into my table, but there are cars that will have the same make and cars that will have the same model (assume both Ford and Toyota make a car called the 'BlergWagon').
I could enforce uniqueness of make/model with a composite key that includes both values. A unique key on just make would not allow me to add more than 1 Toyota and a unique key on just model would not allow me to enter more than 1 BlergWagon.
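The make/model case as a sqlite3 sketch. The sample rows beyond the two BlergWagons are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE cars (
    make  TEXT NOT NULL,
    model TEXT NOT NULL,
    PRIMARY KEY (make, model)  -- the pair must be unique, not each column alone
)
""")
ins = "INSERT INTO cars VALUES (?, ?)"
for make, model in [("Ford", "BlergWagon"), ("Toyota", "BlergWagon"), ("Toyota", "Corolla")]:
    conn.execute(ins, (make, model))  # a repeated make or a repeated model is fine

try:
    conn.execute(ins, ("Ford", "BlergWagon"))  # the exact same car again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```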
Another example would be grades, terms, years, students, and classes. I could enforce uniqueness on the combination of student, class, semester and year, so that my table cannot contain two duplicate records for the same student in the same class in the same semester of the same year.
Another part of your post is about the primary key, which I'll assume means you are talking about a clustered index. A clustered index determines the order in which the table's rows are stored. So you could put the clustered index on an identity column to order the table, and add a unique nonclustered index to enforce uniqueness on your other columns.

I can't find any primary key in some of my relations

Alright, so I read somewhere:
Every table should have a primary key
But some of my tables don't seem to behave!
I'd also like to know whether the relations I'm using are fine or whether I need to decompose them further; I'm open to suggestions.
The relations are
Dealers(DealerId(PK),DealerName)
Order(DealerId(FK),OrderDate,TotalBill)
Sales(DealerId(FK),ItemType,OrderDate,Quantity,Price)
P.S. I can't make a table named Items(ItemCode,Type,Price) because the price varies between dealers. All the constraints I needed (i.e. NOT NULL and CHECK) are already in place; I just didn't mention them.
1. Are the relations decomposed well?
2. Should I care about setting primary keys in the tables that don't have it already?
Helpful responses appreciated.

In your case, you should add an auto increment integer field to Order and Sales and set that to be the primary key.
In relational database theory, you can sometimes identify a subset of the fields to use as a primary key, as long as those columns are non-null and unique. However, (1) the Order table cannot use DealerId and OrderDate as its primary key, because a dealer could place two orders on the same date, possibly even for the same amount, in which case no subset of the fields is unique; and (2) even when familiar data can uniquely identify a row, an auto-increment integer can be a good key.
I also think that you want a foreign key from Sales to Order. You are probably using DealerId and OrderDate for joins, but this will not work correctly if a dealer makes two orders on the same date.
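A sketch of that suggestion in sqlite3. The table is called Orders here because ORDER is a reserved word. SaleId, the 'bolt'/'nut' items, and the sample values are hypothetical. Two orders from the same dealer on the same date stay distinct because Sales references OrderId rather than (DealerId, OrderDate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Dealers (DealerId INTEGER PRIMARY KEY, DealerName TEXT NOT NULL);
CREATE TABLE Orders (
    OrderId   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    DealerId  INTEGER NOT NULL REFERENCES Dealers (DealerId),
    OrderDate TEXT    NOT NULL,
    TotalBill REAL
);
CREATE TABLE Sales (
    SaleId   INTEGER PRIMARY KEY AUTOINCREMENT,
    OrderId  INTEGER NOT NULL REFERENCES Orders (OrderId),  -- join on the order, not the date
    ItemType TEXT    NOT NULL,
    Quantity INTEGER NOT NULL,
    Price    REAL    NOT NULL
);
INSERT INTO Dealers VALUES (1, 'Dealer One');
-- Two orders by the same dealer on the same date, even for the same amount.
INSERT INTO Orders (DealerId, OrderDate, TotalBill) VALUES (1, '2024-03-01', 50.0);
INSERT INTO Orders (DealerId, OrderDate, TotalBill) VALUES (1, '2024-03-01', 50.0);
INSERT INTO Sales (OrderId, ItemType, Quantity, Price) VALUES (1, 'bolt', 10, 5.0);
INSERT INTO Sales (OrderId, ItemType, Quantity, Price) VALUES (2, 'nut', 25, 2.0);
""")
# Each sale joins to exactly one order, so the two same-day orders stay separate.
lines_per_order = conn.execute("""
    SELECT o.OrderId, COUNT(s.SaleId)
    FROM Orders o LEFT JOIN Sales s ON s.OrderId = o.OrderId
    GROUP BY o.OrderId ORDER BY o.OrderId
""").fetchall()
print(lines_per_order)  # [(1, 1), (2, 1)]
```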
Finally, take advice like
Every table should have a primary key
with a grain of salt. Linking tables used for many-to-many relationships can work perfectly fine without a primary key. A primary key can still be an improvement, since it makes deleting individual records easier. If you don't put a primary key on a linking table, I would still recommend a unique index on all the fields, and that unique combination can then serve as the primary key.

Do you really need a separate Sales table?
Dealers(DealerId(PK),DealerName)
Order(OrderId(PK), DealerId(FK),OrderDate, ItemType, Quantity,Price)
Also,
TotalBill (can be calculated) = Quantity * Price
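A sketch of computing TotalBill instead of storing it, using this answer's one-item-per-order layout in sqlite3. The table is again named Orders to avoid the reserved word, and the OrderTotals view and sample row are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dealers (DealerId INTEGER PRIMARY KEY, DealerName TEXT);
CREATE TABLE Orders (
    OrderId   INTEGER PRIMARY KEY,
    DealerId  INTEGER REFERENCES Dealers (DealerId),
    OrderDate TEXT,
    ItemType  TEXT,
    Quantity  INTEGER,
    Price     REAL
);
-- TotalBill is derived on read instead of being stored and kept in sync.
CREATE VIEW OrderTotals AS
    SELECT OrderId, Quantity * Price AS TotalBill FROM Orders;
INSERT INTO Dealers VALUES (1, 'Dealer One');
INSERT INTO Orders VALUES (1, 1, '2024-03-01', 'bolt', 10, 5.0);
""")
total = conn.execute("SELECT TotalBill FROM OrderTotals WHERE OrderId = 1").fetchone()[0]
print(total)  # 50.0
```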

For question 1, you should answer this question:
Can a sale be made without an order?
If yes, your DealerId(FK) in Sales is alright, assuming that a sale will only exist if a dealer made it.
If no, you should put an OrderId(FK) in Sales instead of DealerId(FK). A sale belongs to an order, and that order belongs to a dealer, so you already have the relation from dealers to sales.
For question 2: yes, you should have primary keys on your tables, because that is how you select, update and delete a specific row in your database. Remember that a primary key is not always an auto-increment column.
And about the Items table: if the price varies between dealers, then you have an M-to-N relationship between Dealers and Items, which means you could have an intermediate table like this example:
DealerItemPrices(DealerId(FK), ItemId(FK), Price)
These two foreign keys should together form a unique composite key; that way, a given dealer can't have two different prices for the same item.
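A sqlite3 sketch of that intermediate table, assuming an Items table keyed by a hypothetical ItemId. The composite primary key lets two dealers price the same item differently, but it rejects a second price from the same dealer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Dealers (DealerId INTEGER PRIMARY KEY);
CREATE TABLE Items   (ItemId   INTEGER PRIMARY KEY, ItemType TEXT NOT NULL);
CREATE TABLE DealerItemPrices (
    DealerId INTEGER NOT NULL REFERENCES Dealers (DealerId),
    ItemId   INTEGER NOT NULL REFERENCES Items (ItemId),
    Price    REAL    NOT NULL,
    PRIMARY KEY (DealerId, ItemId)  -- one price per dealer per item
);
INSERT INTO Dealers VALUES (1), (2);
INSERT INTO Items   VALUES (100, 'bolt');
INSERT INTO DealerItemPrices VALUES (1, 100, 5.0);
INSERT INTO DealerItemPrices VALUES (2, 100, 4.5);  -- another dealer, another price
""")
try:
    conn.execute("INSERT INTO DealerItemPrices VALUES (1, 100, 6.0)")  # second price, same dealer
    second_price_rejected = False
except sqlite3.IntegrityError:
    second_price_rejected = True

prices = conn.execute(
    "SELECT DealerId, Price FROM DealerItemPrices WHERE ItemId = 100 ORDER BY DealerId"
).fetchall()
print(second_price_rejected, prices)  # True [(1, 5.0), (2, 4.5)]
```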
Hope it helps!

MS SQL creating many-to-many relation with a junction table

I'm using Microsoft SQL Server Management Studio. While creating a junction table, should I create an ID column for it, and if so, should I also make it the primary key and identity column? Or should I just keep the 2 columns for the tables I'm joining in the many-to-many relation?
For example if this would be the many-to many tables:
MOVIE
Movie_ID
Name
etc...
CATEGORY
Category_ID
Name
etc...
Should I make the junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
Movie_Category_Junction_ID
[and make the Movie_Category_Junction_ID my Primary Key and use it as the Identity Column] ?
Or:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
[and just leave it at that, with no primary key or identity column]?

I would use the second junction table:
MOVIE_CATEGORY_JUNCTION
Movie_ID
Category_ID
The primary key would be the combination of both columns. You would also have a foreign key from each column to the Movie and Category table.
The junction table would look similar to this:
create table movie_category_junction
(
movie_id int,
category_id int,
CONSTRAINT movie_cat_pk PRIMARY KEY (movie_id, category_id),
CONSTRAINT FK_movie
FOREIGN KEY (movie_id) REFERENCES movie (movie_id),
CONSTRAINT FK_category
FOREIGN KEY (category_id) REFERENCES category (category_id)
);
Using these two fields as the PRIMARY KEY will prevent duplicate movie/category combinations from being added to the table.
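The same table in sqlite3, where INT becomes INTEGER and foreign keys must be switched on explicitly. The sketch shows the composite key rejecting a repeated pair and the foreign key rejecting an unknown category. The movie and category rows are sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE movie    (movie_id    INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE category (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movie_category_junction (
    movie_id    INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    CONSTRAINT movie_cat_pk PRIMARY KEY (movie_id, category_id),
    CONSTRAINT FK_movie    FOREIGN KEY (movie_id)    REFERENCES movie (movie_id),
    CONSTRAINT FK_category FOREIGN KEY (category_id) REFERENCES category (category_id)
);
INSERT INTO movie    VALUES (1, 'Alien');
INSERT INTO category VALUES (1, 'Horror'), (2, 'Sci-Fi');
INSERT INTO movie_category_junction VALUES (1, 1), (1, 2);
""")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate = rejected("INSERT INTO movie_category_junction VALUES (1, 1)")  # same pair again
unknown = rejected("INSERT INTO movie_category_junction VALUES (1, 99)")   # no category 99
print(duplicate, unknown)  # True True
```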

There are different schools of thought on this. One school prefers including a primary key and naming the linking table something more significant than just the two tables it is linking. The reasoning is that although the table may start out seeming like just a linking table, it may become its own table with significant data.
An example is a many-to-many between magazines and subscribers. Really that link is a subscription with its own attributes, like expiration date, payment status, etc.
However, I think sometimes a linking table is just a linking table. The many-to-many relationship with categories is a good example of this.
So in this case, a separate one-field primary key is not necessary. You could have an auto-assigned key, which wouldn't hurt anything and would make deleting specific records easier. It might be good as a general practice: if the table later develops into a significant table with its own data (as with subscriptions), it will already have an auto-assigned primary key.
You can put a unique index on the two fields to avoid duplicates. This will even prevent duplicates if you have a separate auto-assign key. You could use both fields as your primary key (which is also a unique index).
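A sketch of the alternative described here, in sqlite3: an auto-assigned movie_category_id primary key plus a unique index on the two fields. The table and index names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie_category (
    movie_category_id INTEGER PRIMARY KEY,  -- auto-assigned surrogate key
    movie_id          INTEGER NOT NULL,
    category_id       INTEGER NOT NULL
);
-- The unique index still blocks duplicate pairs despite the surrogate key.
CREATE UNIQUE INDEX ux_movie_category ON movie_category (movie_id, category_id);
""")
ins = "INSERT INTO movie_category (movie_id, category_id) VALUES (?, ?)"
conn.execute(ins, (1, 1))
conn.execute(ins, (1, 2))

try:
    conn.execute(ins, (1, 1))  # duplicate pair
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Deleting one specific link needs only the single-column key.
conn.execute("DELETE FROM movie_category WHERE movie_category_id = 2")
remaining = conn.execute("SELECT movie_id, category_id FROM movie_category").fetchall()
print(duplicate_rejected, remaining)  # True [(1, 1)]
```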
So one school of thought sticks with integer auto-assigned primary keys and avoids compound primary keys. This is not the only way to do it, and maybe not the best, but it won't lead you into a problem you really regret.
But, for something like what you are doing, you will probably be fine with just the two fields. I'd still recommend either making the two fields a compound primary key, or at least putting a unique index on the two fields.

I would go with the second junction table, but make those two fields the primary key. That will prevent duplicate entries.

How to properly index a table two other tables have a one-to-many relationship to?

Imagine I have three tables, called "customers", "companies" and "phone_numbers". Both customers and companies can have multiple phone numbers. What would be the best way to index phone_numbers? Have both customer_id and company_id and keep one of them null? What if there are more than two tables with a one-to-many relationship with phone_numbers?

Your business rules might only state one-to-many, but in reality the relationship between people & companies and phone numbers can be many-to-many. One person can have many phone numbers (home, cell, etc.), and a phone number can relate to many people (myself, my significant other, etc.). Likewise, a company number and my business number can be the same; you just use an extension to reach me directly.
Indexing the foreign keys would be a good idea, but beware of premature optimization. Depending on setup, I'd consider a unique constraint on the phone number column but I would not have the phone number column itself as a primary key.

I would go with identity columns in the customer and company tables, then in the phone number table do as you said and keep one null and the other populated. I do something similar to this and it works out fine as long as you validate data so that it doesn't go in with both values being null. For a more elegant solution you could have two columns: one that is an id, and another that is a type identifier. Say 1 for customers and 2 for companies, that way you don't have to worry about null data or a lot of extra columns.
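A sqlite3 sketch of the nullable-column approach. Instead of validating in application code, it uses a CHECK constraint (an addition, not part of the answer) that requires exactly one of customer_id and company_id to be set. Each column also gets an ordinary index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
CREATE TABLE companies (company_id  INTEGER PRIMARY KEY);
CREATE TABLE phone_numbers (
    phone_id    INTEGER PRIMARY KEY,
    number      TEXT NOT NULL,
    customer_id INTEGER REFERENCES customers (customer_id),
    company_id  INTEGER REFERENCES companies (company_id),
    -- exactly one owner: not both NULL, not both set
    CHECK ((customer_id IS NULL) <> (company_id IS NULL))
);
CREATE INDEX ix_phone_customer ON phone_numbers (customer_id);
CREATE INDEX ix_phone_company  ON phone_numbers (company_id);
INSERT INTO customers VALUES (1);
INSERT INTO companies VALUES (1);
""")
ins = "INSERT INTO phone_numbers (number, customer_id, company_id) VALUES (?, ?, ?)"
conn.execute(ins, ("555-0100", 1, None))  # customer phone
conn.execute(ins, ("555-0199", None, 1))  # company phone

results = []
for owners in [(None, None), (1, 1)]:  # no owner, then two owners
    try:
        conn.execute(ins, ("555-0123", *owners))
        results.append(False)
    except sqlite3.IntegrityError:
        results.append(True)
print(results)  # [True, True]
```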

I'd add two columns to the phone_numbers table. The first would be a type code that tells you which table to associate with (say, 1 = customers and 2 = companies). The second would hold the key of the matching row in that table.
This way you can add as many phone number sources as you want.
If a particular person or company has more than one phone number, there would be multiple rows in the phone_numbers table.
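A sketch of the type-code approach in sqlite3, with the hypothetical codes 1 = customers and 2 = companies. There is a trade-off: owner_id cannot be declared as a foreign key, because the table it points to depends on owner_type. As a result, the database no longer enforces that the referenced row exists.

```python
import sqlite3

CUSTOMER, COMPANY = 1, 2  # hypothetical type codes, as in the answer

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE companies (company_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE phone_numbers (
    phone_id   INTEGER PRIMARY KEY,
    owner_type INTEGER NOT NULL CHECK (owner_type IN (1, 2)),
    owner_id   INTEGER NOT NULL,  -- cannot be a declared FOREIGN KEY: the target table varies
    number     TEXT    NOT NULL
);
CREATE INDEX ix_phone_owner ON phone_numbers (owner_type, owner_id);
""")
conn.execute("INSERT INTO customers VALUES (7, 'Pat')")
conn.execute("INSERT INTO companies VALUES (7, 'Widget Co')")
conn.executemany(
    "INSERT INTO phone_numbers (owner_type, owner_id, number) VALUES (?, ?, ?)",
    [(CUSTOMER, 7, "555-0101"), (CUSTOMER, 7, "555-0102"), (COMPANY, 7, "555-0200")],
)

# Both tables have an id 7, so every lookup must also filter on owner_type.
customer_phones = [r[0] for r in conn.execute(
    "SELECT number FROM phone_numbers WHERE owner_type = ? AND owner_id = ? ORDER BY number",
    (CUSTOMER, 7),
)]
print(customer_phones)  # ['555-0101', '555-0102']
```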

The closest thing I have to a pattern is the following -- any two entities with a many-to-many relationship require an associative entity (a cross-reference table) between them, like so (surrogate keys assumed):
CREATE TABLE CUSTOMER_XREF_PHONE
( CUSTOMER_ID NUMBER NOT NULL,
PHONE_NUMBER_ID NUMBER NOT NULL,
CONSTRAINT CUSTOMER_XREF_PHONE_PK
PRIMARY KEY (CUSTOMER_ID, PHONE_NUMBER_ID),
CONSTRAINT CUSTOMER_XREF_PHONE_UK
UNIQUE (PHONE_NUMBER_ID, CUSTOMER_ID),
CONSTRAINT CUSTOMER_XREF_PHONE_FK01
FOREIGN KEY (CUSTOMER_ID)
REFERENCES CUSTOMER (CUSTOMER_ID) ON DELETE CASCADE,
CONSTRAINT CUSTOMER_XREF_PHONE_FK02
FOREIGN KEY (PHONE_NUMBER_ID)
REFERENCES PHONE_NUMBERS (PHONE_NUMBER_ID) ON DELETE CASCADE
);
Such an implementation pattern can:
Be fully protected by database-level referential integrity constraints
Support bi-directional access (sometimes you need to see who else has that phone number)
Be self-cleaning if your database supports ON DELETE CASCADE
Be extended through the use of a "relationship type" attribute to map multiple independent relationships between the entities, such as:
customer has a home telephone number
customer has a daytime telephone number
customer has a fax telephone number
customer has a mobile telephone number
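A sqlite3 adaptation of the cross-reference pattern, with the relationship-type extension folded into the primary key. The relationship_type values and sample rows are assumptions. An index on (phone_number_id, customer_id) stands in for the reverse-direction unique constraint. The sketch shows the bi-directional lookup and ON DELETE CASCADE cleaning up after a deleted customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE customer      (customer_id     INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE phone_numbers (phone_number_id INTEGER PRIMARY KEY, number TEXT);
CREATE TABLE customer_xref_phone (
    customer_id       INTEGER NOT NULL
        REFERENCES customer (customer_id) ON DELETE CASCADE,
    phone_number_id   INTEGER NOT NULL
        REFERENCES phone_numbers (phone_number_id) ON DELETE CASCADE,
    relationship_type TEXT NOT NULL
        CHECK (relationship_type IN ('home', 'daytime', 'fax', 'mobile')),
    PRIMARY KEY (customer_id, phone_number_id, relationship_type)
);
-- Reverse-direction index for "who else has this number?"
CREATE INDEX ix_xref_phone ON customer_xref_phone (phone_number_id, customer_id);
INSERT INTO customer      VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO phone_numbers VALUES (10, '555-0110');
INSERT INTO customer_xref_phone VALUES (1, 10, 'home'), (2, 10, 'home'), (1, 10, 'daytime');
""")

sharers = [r[0] for r in conn.execute(
    "SELECT DISTINCT customer_id FROM customer_xref_phone "
    "WHERE phone_number_id = 10 ORDER BY customer_id"
)]
conn.execute("DELETE FROM customer WHERE customer_id = 1")  # cascades to Ann's xref rows
left = conn.execute("SELECT customer_id, relationship_type FROM customer_xref_phone").fetchall()
print(sharers, left)  # [1, 2] [(2, 'home')]
```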
