Best primary key for table with changing contents - sql

I have a 'loan detail' table in SQL Server.
Much like an order/order detail scenario, the detail table has a relationship with a parent 'loan' table, and needs it's own unique primary key because it contains multiple rows per LoanId.
However, as it is financial data, it can change on a regular basis — quarterly/6 monthly/annually, rather than daily or hourly.
I'm trying to establish the primary key for the loan detail table.
Each loan, may have several loan rates, so I have 2 options in my head at present:
PK1
PK: LoanId smallint
PK: AnnualRate decimal(9,4)
In this scenario, the annual rate will be used as part of the primary key (since it will be unique for the loan.
The AnnualRate column is not connected with any other tables but it will change from time to time.
PK2
PK: LoanId smallint
PK: RateId tinyint <-- a surrogate column not used anywhere else
Other columns
AnnualRate decimal(9,4)
... etc.
In this scenario the primary key will not change if a rate changes but rates could still be added or removed by the lender. In short, it will change must less often.
As an inexperienced SQL guy, I'm looking for advice, as any mistake at this point is likely to be difficult to put right further down the line.

You should have a unique identity column as the primary key. I would expect something along these lines:
create table LoanDetail (
LoanDetailId int identity(1, 1) primary key,
LoanId int references Loans(LoanId),
Rate decimal(9, 4),
eff_date date not null,
end_date date
);
eff_date and end_date represent the period of time when the rate is effective.

The real problem here is the distinction between a primary key, a natural key, and a surrogate key.
A natural key is a column or group of columns that exist within your data that make the row of data unique. A natural key may or may not exist.
A surrogate key is a column or group of columns that you add to your data to make the row of data unique. You don't 'find' a surrogate key; you built it.
A primary key uniquely identifies a row of data. It may be a natural key or it may be a surrogate key, but in either case it is immutable. Otherwise, you stand to lose referential integrity.
From the sounds of things in your loan details table, the only column that's currently immutable is the LoanId. That's fine, but it means that if you're going to have a primary key on your table, it'll have to be surrogate key, and Gordon has laid out the DDL for setting that up in his answer.

In my mind the primary key should be LoanId and DateRateSet which would be a timestamp that signals when a change in the Rate of a LoanId was made. That combination would make AnnualRate uniquely identified.
For example:
LoanId DateRateSet AnnualRate
1 2017-01-01 01:04:20 0.05
1 2017-02-01 01:20:20 0.03
1 2017-08-01 05:04:20 0.05
1 2017-09-01 01:20:24 0.02
They could be repeated within the same LoanId, but then you will be able to specify a Timestamp for each one.

Related

Can I use identity for primary key in more than one table in the same ER model

As it is said in the title, my question is can I use int identity(1,1) for primary key in more than one table in the same ER model? I found on Internet that Primary Key need to have unique value and row, for example if I set int identity (1,1) for table:
CREATE TABLE dbo.Persons
(
Personid int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
GO
and the other table
CREATE TABLE dbo.Job
(
jobID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
nameJob NVARCHAR(25) NOT NULL,
Personid int FOREIGN KEY REFERENCES dbo.Persons(Personid)
);
Wouldn't Personid and jobID have the same value and because of that cause an error?
Constraints in general are defined and have a scope of one table (object) in the database. The only exception is the FOREIGN KEY which usually has a REFERENCE to another table.
The PRIMARY KEY (or any UNIQUE key) sets a constraint only on the table it is defined on and is not affecting or is not affected by other constraints on other tables.
The PRIMARY KEY defines a column or a set of columns which can be used to uniquely identify one record in one table (and none of the columns can hold NULL, UNIQUE on the other hand allows NULLs and how it is treated might differ in different database engines).
So yes, you might have the same value for PersonID and JobID, but their meaning is different. (And to select the one unique record, you will need to tell SQL Server in which table and in which column of that table you are looking for it, this is the table list and the WHERE or JOIN conditions in the query).
The query SELECT * FROM dbo.Job WHERE JobID = 1; and SELECT * FROM dbo.Person WHERE PersonID = 1; have a different meaning even when the value you are searching for is the same.
You will define the IDENTITY on the table (the table can have only one IDENTITY column). You don't need to have an IDENTITY definition on a column to have the value 1 in it, the IDENTITY just gives you an easy way to generate unique values per table.
You can share sequences across tables by using a SEQUENCE, but that will not prevent you to manually insert the same values into multiple tables.
In short, the value stored in the column is just a value, the table name, the column name and the business rules and roles will give it a meaning.
To the notion "every table needs to have a PRIMARY KEY and IDENTITY, I would like to add, that in most cases there are multiple (independent) keys in the table. Usually every entity has something what you can call business key, which is in loose terms the key what the business (humans) use to identify something. This key has very similar, but usually the same characteristics as a PRIMARY KEY with IDENTITY.
This can be a product's barcode, or the employee's ID card number, or something what is generated in another system (say HR) or a code which is assigned to a customer or partner.
These business keys are useful for humans, but not always useful for computers, but they could serve as PRIMARY KEY.
In databases we (the developers, architects) like simplicity and a business key can be very complex (in computer terms), can consist of multiple columns, and can also cause performance issues (comparing a strings is not the same as comparing numbers, comparing multiple columns is less efficient than comparing one column), but the worst, it might change over time. To resolve this, we tend to create our own technical key which then can be used by computers more easily and we have more control over it, so we use things like IDENTITYs and GUIDs and whatnot.

Creating tables - integrity constraints

Create the following tables:
Customer
KNr (primary key)
Name (at most 15 characters)
City (at most 10 characters)
Country (at most 10 characters)
Balance (Type FLOAT)
Discount (Type FLOAT)
Products
PNr (greater than 1 and primary key)
Descr (not NULL, at most 10 characters and unique)
Weight (Type FLOAT)
Think about the integrity constraints for the columns Price, StorageLocation and Stock.
Orders
OrdNr (Type INTEGER, greater than 0 and primary key)
Mon (Type INTEGER, not NULL and between 1 and 12)
Day (Type INTEGER, not NULL and between 1 and 31)
PNr (Foreign Key)
KNr (Foreign Key)
The attributes Month, Day, Pnr and Knr must together be unique. Think about the integrity constraints for the columns Quantity, Sum and Status.
I have done the following :
For 1 :
CREATE TABLE Customer
(
KNr PRIMARY KEY,
Name CHAR(15),
City CHAR(10)
Country CHAR(10)
Balance FLOAT
Discount FLOAT
);
Is that correct?
For 2 :
CREATE TABLE Products
(
PNr PRIMARY KEY CHECK (PNr > 1) ,
Descr NOT NULL CHAR(10) UNIQUE.
Weight FLOAT
Price FLOAT CHECK (Price > 0) // Is checking if it is positive an integrity constraint?
StorageLocation CHAR(15) // What integrity constraint do we use here? If it is not Null for example?
Stock INTEGER // What integrity constraint do we use here? If it is not negative for example?
);
Is that correct?
For 3 :
CREATE TABLE Orders
(
BestNr INTEGER PRIMARY KEY CHECK (BestNr > 0) ,
Mon INTEGER NOT NULL CHECK(Mon >= 1 and Mon <=12)
Day INTEGER NOT NULL CHECK(Day >= 1 and Day <=31)
FOREIGN KEY (PNr) REFERENCES Customer (PNr),
FOREIGN KEY (KNr) REFERENCES Products (KNr)
Quantity INTEGER CHECK(Quantity >0) // It is the ordered quantity, or not? What integrity constraints can we consider?
Sum FLOAT // Is this the sum of invoices? Or what is this meant? What integrity constraints can we consider?
Status CHAR(20) // It is meant if is paid, delivered, etc? So this contains words, right? What integrity constraints can we consider?
UNIQUE (Mon, Day, Pnr, Knr)
);
Do we write that as in the last line that the attributes Month, Day, Pnr and Knr must together be unique ?
You are actually pretty close if viewed as logical model defining requirements. From a physical model however, the syntax is considerable off.
I will not do each table but just Orders, and I will slice and dice along the way, leaving some things you need to correct and some suggestions for your considerations.
First off If you want comment on your ddl you can do so, but they begin with -- instead of //. A better approach just use Comment On where they become part of the permanent record.
BestNr:
As a column name nothing wrong but is it clear what BestNr refers to, and what makes it better than any other number. Perhaps a better name would be Ord_nr. (But the is of course just an opinion). Declaring it as Primary comes with 2 automatic constraints: Not Null and Unique. Check constraint again there is nothing wrong. However a better process would be just tell the DBMS to generate identity column (see Create table ... generated ...).
Mon and Day:
Technically nothing wrong. However there is a data integrity hole as it still permits invalid date. The date Feb 30 would pass both your constraints. But it is still an invalid date. Other months have the same issue, day = 31 for a month with only 30 days passes the constraints but remains invalid. To ensure only valid dates just define a date column. This also eliminates the need for the check constraint. The month and date can be extracted when needed.
FOREIGN KEY (PNr) REFERENCES Customer (PNr): FOREIGN KEY (KNr) REFERENCES Products (KNr):
Your reference is backwards. PNr refers to Product, KNr to customer. However you must define them as columns then generate the FK. While nothing is wrong with these as columns names, are the descriptive of what they refer to. PNr perhaps, but not so KNr (unless Customer is always referred to as K...) Perhaps better prod_nr and cust_nr. (but perhaps no product reference at all - later).
Sum:
This column can easily be derived when needed, and will be difficult to keep current (what happens when another item is added to the Order, or Updated, or Deleted). Further this is a very poor choice for a column name as it is a SQL Standard reserved word (not by all RDBMS however, Postgres being one). Drop the column and derive it when needed.
Status:
You would want to constrain this to a set of predefined values. Either a CHECK constraint, an ENUM or a lookup (reference) table.
Normalization:
Consider normalizing a bit further. An order typically will contain multiple items (lines). These can/should be extracted into another table; call it Order_Lines and move PNr and Quantity into it.
Taking all the above into consideration arrive at:
-- method to constrain status
create type order_status as enum ('pending', 'picked', 'shipped', 'delivered', 'billed', 'paid', 'back ordered', 'on hold', 'canceled' ); -- or others
create table orders ( ord_nr integer generated always as identity primary key
, ord_dt date
, cust_nr integer references customers (cust_nr)
, status order_status -- questionable: Can it be derived?
, constraint one_per_cust_per_day unique (cust_nr, ord_dt) -- combine multiple orders for customer into 1 per day. ??
);
create table order_lines ( ord_ln_nr integer generated always as identity primary key -- optional
, ord_nr integer not null references orders(ord_nr)
, prod_nr integer not null references products(prod_nr)
, quantity integer not null check (quantity>0)
, price float -- Note1
, status order_status
, constraint one_ln_per_ord_prod unique ( ord_nr, prod_nr)
);
Note1: Normally do not copy columns from referenced tables. You normally avoid this as it creates duplicate data, just get the value through the reference. However, price tends to be a volatile column. If a price change occurs, we should not automatically apply that to existing orders. For this reason the Price from the Product will be copied when order is placed.

composite primary key with practical approach

I just need to understand the concept behind the composite primary key. I have googled about it, understood that it is a combination of more than one column of a table.But my questions is, what is the practical approach of this key over any data? when i should use this concept? can you show me any practical usage of this key on excel or SQL server?
It may be a weird type of question for any sql expert. I apologize for this kind of idiotic question. If anybody feels it is an idiot question, please forgive me.
A typical use-case for a composite primary key is a junction/association table. Consider orders and products. One order could have many products. One product could be in many orders. The orderProducts table could be defined as:
create table orderProducts (
orderId int not null references orders(orderId),
productId int not null references products(productId),
quantity int,
. . .
);
It makes sense to declare (orderId, productId) as a composite primary key. This would impose the constraint that any given order has any given product only once.
That said, I would normally use a synthetic key (orderProductId) and simply declare the combination as unique.
The benefit of a composite primary key as that it enforces the uniques (which could also be done with a uniqueness constraint). It also wastes no space that would be needed for an additional key.
There are downsides to composite primary keys as compared to identity keys:
Identity keys keep track of the order of inserts.
Identity keys are typically only 4 bytes.
Foreign key references consist of only one column.
By default, SQL Server clusters on primary keys. This imposes an ordering and can result in fragmentation (although that is doubtful for this example).
Let's say I have a table of cars. It includes the model and make of the cars. I do not want to insert the same exact car into my table, but there are cars that will have the same make and cars that will have the same model (assume both Ford and Toyota make a car called the 'BlergWagon').
I could enforce uniqueness of make/model with a composite key that includes both values. A unique key on just make would not allow me to add more than 1 Toyota and a unique key on just model would not allow me to enter more than 1 BlergWagon.
Another example would be grades, terms, years, students, and classes. I could enforce uniqueness for a student in a class and a specific semester in a specific year so that my table does not have 2 dupe records that show the same class in the same semester in the same year with the same student.
Another part of your post is about primary key, which I'll assume means you are talking about a clustered index. Clustered index enforces order of the table. So you could throw this onto an identity column to order the table and add a unique, nonclustered index to enforce uniqueness on your other columns.

Normalizing a table with duplicate rows and many-to-many relationships

I am designing the database for an accounting system, currently working on the Expenses table.
According to IRS rules, whenever you update a row in any accounting table, you need to cancel out the existing row by negating its values, and create a new row with the modified information, like so:
Set the row's Status flag to "Modified"
Create an identical copy of this row, with all Money fields negated, so that the sum of the two rows is 0
Create a 3rd row, identical to the first one, with the modified data
Each expense has an identity field called ID for internal identification purposes, and an ExpenseID field, which identifies the transaction to the users. The two cannot be the same, because
ExpenseID can be repeated twice if the transaction was modified and its row was duplicated.
ExpenseIDs MUST be consecutive and NEVER have gaps, while identity fields can skip numbers if a transaction is rolled back and the identity is not reseeded.
In general, I believe the primary key should have no business meaning whatsoever.
My problem is that there are other tables used to link these expenses Many-To-Many to other objects in our system. E.g.: each expense can be linked to documents, folders, users, etc.
So it looks something like this:
create table Expenses (
ID int not null identity(1,1),
ExpenseID int not null,
Amount Money not null,
Status tinyint not null,
[...]
)
create table Expenses_Users (
ExpenseID int not null,
UserID int not null
)
alter table Expenses_Users add constraint FK_Expenses_Users_Expenses
foreign key (ExpenseID) references Expenses (ID)
alter table Expenses_Users add constraint FK_Expenses_Users_Users
foreign key (UserID) references Users (ID)
Now, because of the IRS guidelines, I have to duplicate not only rows in the Expenses table, but also in Expenses_Users, and any other table that links Expenses to other tables.
I have two ideas on how to solve this:
Option One: Normalize Expenses like this:
create table Expenses (
ID int not null identity(1,1),
ExpenseID int not null,
Status tinyint not null,
[...]
)
create table ExpensesNormalized (
ExpenseID int not null,
Amount Money not null
)
alter table ExpensesNormalized add constraint FK_ExpensesNormalized_Expenses
foreign key (ExpenseID) references Expenses(ExpenseID)
This means I'll only have to link external tables to Expenses, not ExpensesNormalized. Also, when updating an expense, I'll only duplicate and negate the data in ExpensesNormalized, which means I'll have far less redundant data in the Expenses table.
However, I'll have to use a JOIN clause every single time I SELECT from Expenses. I fear a performance hit because of this.
Option Two: Use the same tables I use now, but have the field Expenses_Users.ExpenseID point to the field Expenses.ExpenseID. This means that I won't have to duplicate any external objects because they'll point to ExpenseID, which may occur several times.
However, this will not be a real foreign key because SQL Server does not allow foreign keys to non-unique fields, so I'll have to implement foreign key logic in a trigger.
I'm having a hard time deciding between these two options. Any feedback would be appreciated.

I can't find any primary key in some of my relations

Alright so I read from somewhere
Every table should have a primary key
But some of my tables don't seem to behave!
I'd also like to know whether the relations as I'm using are fine or I need to dissolve them further, I'm open to suggestions.
The relations are
Dealers(DealerId(PK),DealerName)
Order(DealerId(FK),OrderDate,TotalBill)
Sales(DealerId(FK),ItemType,OrderDate,Quantity,Price)
P.S. I can't make a table named Items(ItemCode,Type,Price) Because the price is variable for different dealers. And all the constraints i.e not null + check that I needed are dealt with already just didn't mention.
1. Are the relations dissolved well?
2. Should I care about setting primary keys in the tables that don't have it already?
Helpful responses appreciated.
In your case, you should add an auto increment integer field to Order and Sales and set that to be the primary key.
In Relational Database Theory, you can sometimes identify a sub-set of the fields to use as a primary key, as long as those columns are non-null and unique. However, (1) the order table cannot have a primary key from DealerID and OrderDate because a dealer could make two orders on the same date. Maybe even for the same amount, which would mean that no sub-set of fields is unique, and (2) even when familiar data can uniquely identify the data, an auto-increment integer can be a good key.
I also think that you want a foreign key from Sales to Order. You are probably using DealerId and OrderDate for joins, but this will not work correctly if a dealer makes two orders on the same date.
Finally, take advice like
Every table should have a primary key
with a grain of salt. Linking tables used for many-to-many relationships can work perfectly fine without a primary key, although a primary key can be an improvement, since it will make deleting records easier, and if you don't have a primary key on a linking table, I would still recommend a unique index on all the fields, in which case that can be the primary key.
Do you really need separate Sales Table ?
Dealers(DealerId(PK),DealerName)
Order(OrderId(PK), DealerId(FK),OrderDate, ItemType, Quantity,Price)
Also,
TotalBill (can be calculated) = Quantity * Price
About the question 1 you should answer this question:
A sale can be made without an order?
If yes, your DealerId(FK) in Sales is alright, assuming that a sale will only exist if a dealer made it.
If no, you should put an OrderId(FK) in Sales, instead of DealerId(FK). If a sale belongs to an order, this order belongs do a dealer, so you already have the relation from dealers to sales.
About the question 2, you should have primary keys on your tables, because this is the way you have to select, update and delete some specific item on your database. Remembering that a primary key is not always an auto increment column.
And about the Items table, if the price is variable to different dealers, so you have an M to N relationship between Dealers and Items, which means you could have an intermediate table like this example:
DealerItemPrices(DealerId(FK), ItemId(FK), Price)
And these two Foreign Keys should be Unique Composite Keys, in this way a Dealer Y can't have two distinct prices to the same item.
Hope it helps!