Creating tables - integrity constraints - sql

Create the following tables:
Customer
KNr (primary key)
Name (at most 15 characters)
City (at most 10 characters)
Country (at most 10 characters)
Balance (Type FLOAT)
Discount (Type FLOAT)
Products
PNr (greater than 1 and primary key)
Descr (not NULL, at most 10 characters and unique)
Weight (Type FLOAT)
Think about the integrity constraints for the columns Price, StorageLocation and Stock.
Orders
OrdNr (Type INTEGER, greater than 0 and primary key)
Mon (Type INTEGER, not NULL and between 1 and 12)
Day (Type INTEGER, not NULL and between 1 and 31)
PNr (Foreign Key)
KNr (Foreign Key)
The attributes Mon, Day, PNr and KNr must together be unique. Think about the integrity constraints for the columns Quantity, Sum and Status.
I have done the following:
For 1:
CREATE TABLE Customer
(
KNr PRIMARY KEY,
Name CHAR(15),
City CHAR(10)
Country CHAR(10)
Balance FLOAT
Discount FLOAT
);
Is that correct?
For 2:
CREATE TABLE Products
(
PNr PRIMARY KEY CHECK (PNr > 1) ,
Descr NOT NULL CHAR(10) UNIQUE.
Weight FLOAT
Price FLOAT CHECK (Price > 0) // Is checking if it is positive an integrity constraint?
StorageLocation CHAR(15) // What integrity constraint do we use here? If it is not Null for example?
Stock INTEGER // What integrity constraint do we use here? If it is not negative for example?
);
Is that correct?
For 3:
CREATE TABLE Orders
(
BestNr INTEGER PRIMARY KEY CHECK (BestNr > 0) ,
Mon INTEGER NOT NULL CHECK(Mon >= 1 and Mon <=12)
Day INTEGER NOT NULL CHECK(Day >= 1 and Day <=31)
FOREIGN KEY (PNr) REFERENCES Customer (PNr),
FOREIGN KEY (KNr) REFERENCES Products (KNr)
Quantity INTEGER CHECK(Quantity >0) // It is the ordered quantity, or not? What integrity constraints can we consider?
Sum FLOAT // Is this the sum of invoices? Or what is this meant? What integrity constraints can we consider?
Status CHAR(20) // It is meant if is paid, delivered, etc? So this contains words, right? What integrity constraints can we consider?
UNIQUE (Mon, Day, Pnr, Knr)
);
Is the last line the right way to express that the attributes Mon, Day, PNr and KNr must together be unique?

You are actually pretty close if this is viewed as a logical model defining requirements. As a physical model, however, the syntax is considerably off.
I will not go through every table, just Orders, and I will slice and dice along the way, leaving some things you need to correct and some suggestions for your consideration.
First off, if you want comments in your DDL you can have them, but they begin with -- instead of //. A better approach is to use COMMENT ON, so that the comments become part of the permanent record.
BestNr:
As a column name there is nothing wrong with it, but is it clear what BestNr refers to, and what makes it better than any other number? Perhaps a better name would be ord_nr (but that is of course just an opinion). Declaring it as PRIMARY KEY comes with two automatic constraints: NOT NULL and UNIQUE. The CHECK constraint is also not wrong. However, a better approach would be to just tell the DBMS to generate an identity column (see CREATE TABLE ... GENERATED ... AS IDENTITY).
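A minimal sketch of letting the database generate the key, using Python's built-in sqlite3 module as a stand-in DBMS (an assumption; the question does not name one). SQLite has no GENERATED ... AS IDENTITY clause; its closest equivalent is an INTEGER PRIMARY KEY column, which SQLite fills in automatically when no value is supplied:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is SQLite's auto-assigned key; the CHECK is kept from the question.
conn.execute("CREATE TABLE orders (ord_nr INTEGER PRIMARY KEY CHECK (ord_nr > 0))")

# No ord_nr supplied: the database generates the numbers itself.
conn.execute("INSERT INTO orders DEFAULT VALUES")
conn.execute("INSERT INTO orders DEFAULT VALUES")
generated = [r[0] for r in conn.execute("SELECT ord_nr FROM orders ORDER BY ord_nr")]

def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_rejected = rejected("INSERT INTO orders (ord_nr) VALUES (1)")  # primary key implies unique
zero_rejected = rejected("INSERT INTO orders (ord_nr) VALUES (0)")       # CHECK (ord_nr > 0)
print(generated, duplicate_rejected, zero_rejected)
```

In PostgreSQL the equivalent would be the `generated always as identity` column used in the final DDL further down.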
Mon and Day:
Technically nothing is wrong. However, there is a data-integrity hole, as this still permits invalid dates: Feb 30 would pass both of your constraints, yet it is not a valid date. Other months have the same issue: day = 31 for a month with only 30 days passes the constraints but remains invalid. To ensure only valid dates, just define a DATE column. This also eliminates the need for the check constraints. The month and day can be extracted when needed.
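To make the hole concrete, here is a small Python sketch (not SQL) that mirrors the two CHECK constraints and compares them with a real calendar check via `datetime.date`, which rejects impossible dates much as a DATE column would. The year 2023 is an arbitrary assumption; note that without a year, the Mon/Day pair cannot even decide whether Feb 29 is valid.

```python
from datetime import date

def passes_range_checks(mon: int, day: int) -> bool:
    """Mirror of CHECK (Mon >= 1 AND Mon <= 12) and CHECK (Day >= 1 AND Day <= 31)."""
    return 1 <= mon <= 12 and 1 <= day <= 31

def is_real_date(year: int, mon: int, day: int) -> bool:
    """True only if the calendar actually has this day."""
    try:
        date(year, mon, day)
        return True
    except ValueError:
        return False

for mon, day in [(2, 30), (4, 31), (2, 29), (3, 15)]:
    print(f"{mon:>2}/{day:>2}  range checks: {passes_range_checks(mon, day)}  "
          f"real date in 2023: {is_real_date(2023, mon, day)}")
```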
FOREIGN KEY (PNr) REFERENCES Customer (PNr): FOREIGN KEY (KNr) REFERENCES Products (KNr):
Your references are backwards: PNr refers to Products, KNr to Customer. Moreover, you must first define them as columns and then declare the FKs. While nothing is wrong with these as column names, are they descriptive of what they refer to? PNr perhaps, but not so KNr (unless Customer is always referred to as K...). Perhaps prod_nr and cust_nr would be better (but perhaps there should be no product reference here at all; see Normalization below).
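A sketch of the corrected pattern (declare the columns, then point each FK at the right parent), again using SQLite through Python's sqlite3 as an assumed stand-in. The column names cust_nr and prod_nr follow the suggestion above; the sample rows are made up. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE customers (cust_nr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products  (prod_nr INTEGER PRIMARY KEY CHECK (prod_nr > 1),
                            descr   TEXT NOT NULL UNIQUE);
    CREATE TABLE orders (
        ord_nr  INTEGER PRIMARY KEY,
        cust_nr INTEGER NOT NULL,   -- declare the columns first ...
        prod_nr INTEGER NOT NULL,
        FOREIGN KEY (cust_nr) REFERENCES customers (cust_nr),  -- ... then the FKs
        FOREIGN KEY (prod_nr) REFERENCES products (prod_nr)
    );
    INSERT INTO customers VALUES (1, 'Alice');
    INSERT INTO products  VALUES (10, 'Widget');
""")

def insert_order(ord_nr, cust_nr, prod_nr):
    """Return True if the order was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (ord_nr, cust_nr, prod_nr))
        return True
    except sqlite3.IntegrityError:
        return False

ok = insert_order(1, 1, 10)             # existing customer and product
bad_customer = insert_order(2, 99, 10)  # there is no customer 99
bad_product = insert_order(3, 1, 99)    # there is no product 99
print(ok, bad_customer, bad_product)
```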
Sum:
This column can easily be derived when needed, and it will be difficult to keep current (what happens when another item is added to the order, or an item is updated or deleted?). Further, this is a very poor choice for a column name, as SUM is a reserved word in the SQL standard (though not in every RDBMS; Postgres is one that does not reserve it). Drop the column and derive the value when needed.
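A sketch of deriving the total on demand, assuming an order_lines table like the one proposed further down (SQLite via Python's sqlite3 as a stand-in; the sample rows and the order_totals helper are hypothetical). Because nothing is stored, adding a line changes the total automatically:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_lines (
        ord_nr   INTEGER NOT NULL,
        prod_nr  INTEGER NOT NULL,
        quantity INTEGER NOT NULL CHECK (quantity > 0),
        price    REAL    NOT NULL,
        UNIQUE (ord_nr, prod_nr)
    );
    INSERT INTO order_lines VALUES (1, 10, 2, 5.0), (1, 11, 1, 7.5), (2, 10, 4, 5.0);
""")

def order_totals(conn):
    """Compute each order's total on demand; there is no stored Sum column to keep current."""
    rows = conn.execute("""
        SELECT ord_nr, SUM(quantity * price) AS order_total  -- alias avoids the reserved word
        FROM order_lines
        GROUP BY ord_nr
    """).fetchall()
    return dict(rows)

before = order_totals(conn)
conn.execute("INSERT INTO order_lines VALUES (2, 11, 2, 7.5)")  # add a line to order 2
after = order_totals(conn)
print(before, after)
```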
Status:
You would want to constrain this to a set of predefined values, using a CHECK constraint, an ENUM, or a lookup (reference) table.
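Two of those options sketched side by side in SQLite (via Python's sqlite3, an assumed stand-in). SQLite has no ENUM type; PostgreSQL's `create type ... as enum` appears in the final DDL below. The status values and table names here are illustrative only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the lookup-table variant
conn.executescript("""
    -- Variant 1: CHECK constraint with a fixed list
    CREATE TABLE orders_check (
        ord_nr INTEGER PRIMARY KEY,
        status TEXT NOT NULL
               CHECK (status IN ('pending', 'shipped', 'delivered', 'paid', 'canceled'))
    );
    -- Variant 2: lookup (reference) table plus a foreign key
    CREATE TABLE order_status (status TEXT PRIMARY KEY);
    INSERT INTO order_status VALUES ('pending'), ('shipped'), ('delivered'), ('paid'), ('canceled');
    CREATE TABLE orders_lookup (
        ord_nr INTEGER PRIMARY KEY,
        status TEXT NOT NULL REFERENCES order_status (status)
    );
""")

def accepts(table, status):
    """True if the table accepts a new order with this status."""
    try:
        conn.execute(f"INSERT INTO {table} (status) VALUES (?)", (status,))
        return True
    except sqlite3.IntegrityError:
        return False

results = {t: (accepts(t, "shipped"), accepts(t, "teleported"))
           for t in ("orders_check", "orders_lookup")}
print(results)
```

The lookup table has one practical advantage: adding a new status is an INSERT rather than a schema change.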
Normalization:
Consider normalizing a bit further. An order will typically contain multiple items (lines). These can/should be extracted into another table; call it Order_Lines and move PNr and Quantity into it.
Taking all of the above into consideration, we arrive at:
-- method to constrain status
create type order_status as enum ('pending', 'picked', 'shipped', 'delivered', 'billed', 'paid', 'back ordered', 'on hold', 'canceled' ); -- or others
create table orders ( ord_nr integer generated always as identity primary key
, ord_dt date not null
, cust_nr integer references customers (cust_nr)
, status order_status -- questionable: Can it be derived?
, constraint one_per_cust_per_day unique (cust_nr, ord_dt) -- combine multiple orders for customer into 1 per day. ??
);
create table order_lines ( ord_ln_nr integer generated always as identity primary key -- optional
, ord_nr integer not null references orders(ord_nr)
, prod_nr integer not null references products(prod_nr)
, quantity integer not null check (quantity>0)
, price float -- Note1
, status order_status
, constraint one_ln_per_ord_prod unique ( ord_nr, prod_nr)
);
Note1: Normally you do not copy columns from referenced tables, as it creates duplicate data; just get the value through the reference. However, price tends to be a volatile column: if a price changes, the change should not automatically apply to existing orders. For this reason, the price from the product is copied into the order line when the order is placed.
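A minimal sketch of copying the price at order time, using SQLite via Python's sqlite3 as a stand-in (the orders table is left out for brevity; add_line is a hypothetical helper). A later price change leaves the existing order line untouched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE products (prod_nr INTEGER PRIMARY KEY, price REAL NOT NULL);
    CREATE TABLE order_lines (
        ord_nr   INTEGER NOT NULL,
        prod_nr  INTEGER NOT NULL REFERENCES products (prod_nr),
        quantity INTEGER NOT NULL CHECK (quantity > 0),
        price    REAL    NOT NULL,     -- snapshot of products.price when ordered
        UNIQUE (ord_nr, prod_nr)
    );
    INSERT INTO products VALUES (10, 4.99);
""")

def add_line(conn, ord_nr, prod_nr, quantity):
    """Insert an order line, copying the product's current price into it."""
    conn.execute("""
        INSERT INTO order_lines (ord_nr, prod_nr, quantity, price)
        SELECT ?, prod_nr, ?, price FROM products WHERE prod_nr = ?
    """, (ord_nr, quantity, prod_nr))

add_line(conn, 1, 10, 3)
conn.execute("UPDATE products SET price = 5.49 WHERE prod_nr = 10")  # later price change
line_price = conn.execute("SELECT price FROM order_lines WHERE ord_nr = 1").fetchone()[0]
current_price = conn.execute("SELECT price FROM products WHERE prod_nr = 10").fetchone()[0]
print(line_price, current_price)
```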

Related

Restrict the number of entries in a relation based on conditions across several relations

I am using PostgreSQL and am trying to restrict the number of concurrent loans that a student can have. To do this, I have created a CTE that selects all unreturned loans grouped by StudentID, and counts the number of unreturned loans for each StudentID. Then, I am attempting to create a check constraint that uses that CTE to restrict the number of concurrent loans that a student can have to 7 at most.
The below code does not work because it is syntactically invalid, but hopefully it can communicate what I am trying to achieve. Does anyone know how I could implement my desired restriction on loans?
CREATE TABLE loan (
id SERIAL PRIMARY KEY,
copy_id INTEGER REFERENCES media_copies (copy_id),
account_id INT REFERENCES account (id),
loan_date DATE NOT NULL,
expiry_date DATE NOT NULL,
return_date DATE,
WITH currentStudentLoans (student_id, current_loans) AS
(
SELECT account_id, COUNT(*)
FROM loan
WHERE account_id IN (SELECT id FROM student)
AND return_date IS NULL
GROUP BY account_id
)
CONSTRAINT max_student_concurrent_loans CHECK(
(SELECT current_loans FROM currentStudentLoans) BETWEEN 0 AND 7
)
);
For additional (and optional) context, I include an ER diagram of my database schema.
You cannot do this using an in-line CTE like this. You have several choices.
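The first is a user-defined function (UDF) plus a check constraint: the logic in the CTE is put in a UDF, and a check constraint calls it to validate the data. Be aware, though, that the PostgreSQL documentation states it does not support CHECK constraints that reference table data other than the row being checked; such a constraint may appear to work in simple tests, but it cannot guarantee the condition stays true as other rows change.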
The second is a trigger to do the check on this table. However, that is tricky because the counts are on the same table.
The third is storing the total number in another table (probably account) and keeping it up to date for inserts, updates, and deletes on this table. Keeping that value up to date requires triggers on loan. You can then put the check constraint on account.
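The second option (a trigger on the same table) can be sketched in SQLite via Python's sqlite3, as a stand-in for PostgreSQL, where you would write a PL/pgSQL trigger function instead. Assumptions: every account is treated as a student (the `account_id IN (SELECT id FROM student)` filter is omitted), and updates that set return_date back to NULL are not covered. In a multi-user database such as PostgreSQL, two concurrent transactions could both pass the count, so locking or SERIALIZABLE isolation may also be needed.

```python
import sqlite3

MAX_OPEN_LOANS = 7  # limit from the question

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE loan (
        id          INTEGER PRIMARY KEY,
        account_id  INTEGER NOT NULL,
        loan_date   TEXT NOT NULL,
        return_date TEXT               -- NULL while the item is still out
    );
    -- Abort the insert if the account already has the maximum number of open loans.
    CREATE TRIGGER max_concurrent_loans
    BEFORE INSERT ON loan
    WHEN NEW.return_date IS NULL
     AND (SELECT COUNT(*) FROM loan
          WHERE account_id = NEW.account_id AND return_date IS NULL) >= {MAX_OPEN_LOANS}
    BEGIN
        SELECT RAISE(ABORT, 'too many concurrent loans');
    END;
""")

def borrow(account_id):
    """Try to open a new loan; return False if the trigger refused it."""
    try:
        conn.execute("INSERT INTO loan (account_id, loan_date) VALUES (?, '2024-01-01')",
                     (account_id,))
        return True
    except sqlite3.DatabaseError:  # RAISE(ABORT) surfaces as a constraint error
        return False

first_eight = [borrow(1) for _ in range(8)]   # 7 accepted, the 8th refused
other_account = borrow(2)                     # the limit is per account
conn.execute("UPDATE loan SET return_date = '2024-01-10' WHERE id = 1")
after_return = borrow(1)                      # returning one frees a slot
print(first_eight, other_account, after_return)
```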
I'm not sure which solution fits best in your overall schema. The first is closest to what you are doing now. The third "publishes" the count, so it is a bit clearer what is going on.
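The third option ("publishing" the count on account) sketched in SQLite via Python's sqlite3, as a stand-in for PostgreSQL. The column name current_loans and the trigger names are hypothetical, and the sketch deliberately ignores edge cases such as un-returning a loan or moving it to another account. When the 8th open loan pushes the counter past 7, the CHECK on account fails inside the trigger and the whole INSERT is undone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE account (
        id            INTEGER PRIMARY KEY,
        current_loans INTEGER NOT NULL DEFAULT 0
                      CHECK (current_loans BETWEEN 0 AND 7)   -- the published count
    );
    CREATE TABLE loan (
        id          INTEGER PRIMARY KEY,
        account_id  INTEGER NOT NULL REFERENCES account (id),
        return_date TEXT
    );
    -- Keep the count current on insert, return, and delete.
    CREATE TRIGGER loan_opened AFTER INSERT ON loan
    WHEN NEW.return_date IS NULL
    BEGIN
        UPDATE account SET current_loans = current_loans + 1 WHERE id = NEW.account_id;
    END;
    CREATE TRIGGER loan_returned AFTER UPDATE OF return_date ON loan
    WHEN OLD.return_date IS NULL AND NEW.return_date IS NOT NULL
    BEGIN
        UPDATE account SET current_loans = current_loans - 1 WHERE id = NEW.account_id;
    END;
    CREATE TRIGGER loan_deleted AFTER DELETE ON loan
    WHEN OLD.return_date IS NULL
    BEGIN
        UPDATE account SET current_loans = current_loans - 1 WHERE id = OLD.account_id;
    END;
    INSERT INTO account (id) VALUES (1);
""")

def borrow(account_id):
    """Open a loan; return False if the CHECK on account rejected it."""
    try:
        conn.execute("INSERT INTO loan (account_id) VALUES (?)", (account_id,))
        return True
    except sqlite3.IntegrityError:
        return False

def open_count(account_id):
    return conn.execute("SELECT current_loans FROM account WHERE id = ?",
                        (account_id,)).fetchone()[0]

accepted = [borrow(1) for _ in range(8)]
count_at_limit = open_count(1)
conn.execute("UPDATE loan SET return_date = '2024-01-10' WHERE id = 1")
count_after_return = open_count(1)
print(accepted, count_at_limit, count_after_return)
```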

SQL/ORACLE- FOREIGN KEY using two columns from other tables

I'm trying to create a table in SQL*Plus that consults two columns from another table. For example,
If table A looks something like this:
CREATE TABLE Customers
(Customer_ID int NOT NULL PRIMARY KEY,
NAME Varchar(30) NOT NULL,
PHONE Varchar(12) NOT NULL,
OUTSTANDING_FEES Varchar(10) NULL);
And if my table B looks something like this:
CREATE TABLE Customer_Fees
(Fee_ID int NOT NULL PRIMARY KEY,
FEE_TYPE Varchar(20) NOT NULL,
AMOUNT Varchar(10) NOT NULL,
CUSTOMER_ID int NOT NULL);
I want to populate the OUTSTANDING_FEES in table A with the AMOUNT in table B, where the CUSTOMER_ID matches among the tables. For my purposes I can assume that any single Customer_ID in table B will only appear once in the table.
I've tried creating both tables, with the table A OUTSTANDING_FEES field being null and then making it a FOREIGN KEY that references table B's AMOUNT field, but it's not working since I need to make sure it also cross references the CUSTOMER_ID fields in both tables.
Thanks if you can help!
You cannot create a foreign key that references a column which is not the primary key (or a unique key) of the other table.
You'll have to create the FK on FEE_ID:
CREATE TABLE Customers (
Customer_ID int NOT NULL PRIMARY KEY,
NAME Varchar(30) NOT NULL,
PHONE Varchar(12) NOT NULL,
OUTSTANDING_FEES int REFERENCES Customer_Fees(FEE_ID) -- create Customer_Fees first
);
You can then get AMOUNT in your SELECT clause:
Select A.Customer_ID,B.AMOUNT from Customers A Join Customer_Fees B
on A.OUTSTANDING_FEES = B.FEE_ID
There are several things wrong with the data model posed in the question.
Defining OUTSTANDING_FEES and AMOUNT as VARCHAR columns is bad data modelling, as both are surely intended to be numeric (monetary) values. Good practice is always to use the most appropriate datatype for the attribute we're modelling.
Building a foreign key between OUTSTANDING_FEES and AMOUNT is wrong because they are not unique identifiers. The amount of money owed by one customer can be the same as the amount owed by any other (or even all) customers: at the start of term, all students owe the same amount of tuition fees. So a foreign key which "references the CUSTOMER_ID fields in both tables" is all that is needed.
The data model doesn't provide any attribute which allows us to distinguish between fees which have been paid and fees which haven't.
The questioner states that "I can assume that any single Customer_ID in table B will only appear once in the table" but in real life we would expect Customers to have multiple fee records, unpaid and paid. Why not model that? Otherwise if there is truly a 1:1 relationship between Customer and Fee then there is no need for two tables.
So, here is an improved model. It uses a proper datatype for monetary values; it enforces the foreign key between the two tables using CUSTOMER_ID; consequently, it supports a one-to-many relationship between Customer and Fee; finally, it tracks paid and unpaid fees.
create table customers
( customer_id integer not null constraint cust_pk primary key
, name varchar2(30) not null
, phone varchar2(12) not null
)
/
create table customer_fees
( fee_id integer not null constraint fees_pk primary key
, fee_type varchar2(20) not null
, amount number not null
, invoice_date date not null
, paid_date date null
, customer_id integer not null constraint fees_cust_fk references customers
)
/
Ah, but what about OUTSTANDING_FEES? Well, that information is derivable from the data in the two tables. There are many ways of writing this query; this is just one:
select cust.customer_id
, cust.name
, cust.phone
, fees.outstanding_fees
from customers cust
left outer join
( select fees.customer_id
, sum(case when fees.paid_date is null then fees.amount
else 0 end) as outstanding_fees
from customer_fees fees
group by fees.customer_id ) fees on fees.customer_id = cust.customer_id
/
Generally it is better to calculate aggregated values on demand rather than re-calculate them in every transaction. It scales better, certainly with OLTP volumes of data; the physics of a data warehouse is different, but I don't think that's what we're dealing with in this case.
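The same model and query, run end to end in SQLite via Python's sqlite3 as a stand-in for Oracle (NUMBER becomes NUMERIC, DATE becomes ISO text; the sample customers and fees are made up). One small addition: COALESCE turns the NULL that the outer join yields for a customer with no fees into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE customer_fees (
        fee_id       INTEGER PRIMARY KEY,
        amount       NUMERIC NOT NULL,
        invoice_date TEXT NOT NULL,
        paid_date    TEXT,                   -- NULL means not yet paid
        customer_id  INTEGER NOT NULL REFERENCES customers (customer_id)
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO customer_fees VALUES
        (1, 100, '2024-01-01', '2024-01-15', 1),  -- paid
        (2,  40, '2024-02-01', NULL,         1),  -- unpaid
        (3,  25, '2024-02-01', NULL,         2);  -- unpaid
""")

rows = conn.execute("""
    SELECT cust.customer_id,
           COALESCE(fees.outstanding_fees, 0) AS outstanding_fees
    FROM customers cust
    LEFT OUTER JOIN (
        SELECT customer_id,
               SUM(CASE WHEN paid_date IS NULL THEN amount ELSE 0 END) AS outstanding_fees
        FROM customer_fees
        GROUP BY customer_id
    ) fees ON fees.customer_id = cust.customer_id
    ORDER BY cust.customer_id
""").fetchall()
outstanding = dict(rows)
print(outstanding)
```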

Inserting values into the table

I'm trying to insert the values into the tables that I created.
These are the values that I'm trying to insert:
INSERT INTO DDR_Rental (customer_ID, rental_date, rent_fee, film_title, start_date, expiry_date, rating)
VALUES (12345, '12-Mar-19', '4.99', 'Peppermint', '12-Mar-19', '22-Mar-19', 4);
These are the datatypes and the constraints:
CREATE TABLE DDR_Rental
(customer_ID NUMBER(5),
rental_date DATE,
rent_fee NUMBER(3,2) CONSTRAINT SYS_RENTAL_FEE_NN NOT NULL,
film_title VARCHAR2(20),
start_date DATE,
expiry_date DATE,
rating NUMBER(5),
CONSTRAINT SYS_RENTAL_PK PRIMARY KEY ((customer_ID), (rental_date), (film_title)),
CONSTRAINT SYS_RENTAL_CUS_ID_FK1 FOREIGN KEY (customer_ID) REFERENCES
DDR_CUSTOMER(CUSTOMER_ID),
CONSTRAINT SYS_RENTAL_FILM_TITLE_FK2 FOREIGN KEY (film_title) REFERENCES
DDR_MOVIE_TITLE(FILM_TITLE),
CONSTRAINT SYS_RENTAL_EXP_DATE_CK CHECK (expiry_date >= start_date),
CONSTRAINT SYS_RENTAL_START_DATE_CK CHECK (start_date >= rental_date),
CONSTRAINT SYS_RENTAL_RATING_CK CHECK (REGEXP_LIKE(rating,('[12345]'))));
The error says: unique constraint (CPRG250.SYS_RENTAL_PK) violated
It seems like you are trying to add a duplicate rental event: the same film, by the same customer, on the exact same day. That can happen if your business logic allows a customer to rent a movie, return it on the same day and rent it again.
Depending on your business rules, you have two ways to deal with this situation:
1. Your business model doesn't allow that. Then this is a duplicate record and you shouldn't insert it again; the error is perfectly fine, because it prevents a duplicate of an event that happened only once.
2. Your business model allows that. In this case, rental_date must capture the time as well as the date, so that you know when each rental event actually happened. In Oracle, a DATE column already stores the time down to the second, but a literal such as '12-Mar-19' leaves the time at midnight, so two rentals on the same day collide. Insert the actual time instead, for example SYSDATE or TO_DATE('12-Mar-19 14:30', 'DD-Mon-YY HH24:MI'), or switch to a TIMESTAMP column if you need fractional seconds. Check the values already stored in your table afterwards: existing rows will show a time of 00:00:00, because no time was specified when they were inserted.
In addition to option 1, you could wrap your insert in an exception handler (in PL/SQL, DUP_VAL_ON_INDEX) that returns a clear message when this happens, saying that the movie has already been rented.
Moreover, since you have no table for the individual copies in your inventory, this can lead to mistakes, because you may own more than one copy of a movie. I suggest that you create separate film and film_copy tables, so that you can identify which copy of a film has been rented and still rent out another copy.
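A small demonstration of why the time matters for this primary key, using SQLite via Python's sqlite3 as a stand-in (SQLite has no DATE type, so ISO-8601 text plays that role; the table names rental_by_day and rental_by_time are hypothetical). With date-only values the second same-day rental collides; with a time component both fit:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def make_table(name):
    """Same composite primary key as SYS_RENTAL_PK, reduced to the key columns."""
    conn.execute(f"""
        CREATE TABLE {name} (
            customer_id INTEGER,
            rental_date TEXT,     -- ISO-8601 text, with or without a time part
            film_title  TEXT,
            PRIMARY KEY (customer_id, rental_date, film_title)
        )
    """)

def rent(table, when):
    """Return True if the rental was stored, False on a key violation."""
    try:
        conn.execute(f"INSERT INTO {table} VALUES (12345, ?, 'Peppermint')", (when,))
        return True
    except sqlite3.IntegrityError:
        return False

make_table("rental_by_day")
make_table("rental_by_time")
day_only = [rent("rental_by_day", "2019-03-12"), rent("rental_by_day", "2019-03-12")]
with_time = [rent("rental_by_time", "2019-03-12 10:15:00"),
             rent("rental_by_time", "2019-03-12 18:40:00")]
print(day_only, with_time)
```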
Your table has a unique constraint (the primary key), and it already contains a record with the customer_id, rental_date and film_title that you are trying to insert.
Try this query and you will see that such a record already exists:
select * from DDR_Rental
where customer_id = 12345
and rental_date = TO_DATE('12-Mar-19', 'DD-Mon-YY')
and film_title = 'Peppermint';

Best primary key for table with changing contents

I have a 'loan detail' table in SQL Server.
Much like an order/order detail scenario, the detail table has a relationship with a parent 'loan' table, and needs its own unique primary key because it contains multiple rows per LoanId.
However, as it is financial data, it can change on a regular basis (quarterly, six-monthly or annually) rather than daily or hourly.
I'm trying to establish the primary key for the loan detail table.
Each loan may have several loan rates, so I have two options in mind at present:
PK1
PK: LoanId smallint
PK: AnnualRate decimal(9,4)
In this scenario, the annual rate will be used as part of the primary key (since it will be unique for the loan).
The AnnualRate column is not connected with any other tables but it will change from time to time.
PK2
PK: LoanId smallint
PK: RateId tinyint <-- a surrogate column not used anywhere else
Other columns
AnnualRate decimal(9,4)
... etc.
In this scenario the primary key will not change if a rate changes, but rates could still be added or removed by the lender. In short, it will change much less often.
As an inexperienced SQL guy, I'm looking for advice, as any mistake at this point is likely to be difficult to put right further down the line.
You should have a unique identity column as the primary key. I would expect something along these lines:
create table LoanDetail (
LoanDetailId int identity(1, 1) primary key,
LoanId int references Loans(LoanId),
Rate decimal(9, 4),
eff_date date not null,
end_date date
);
eff_date and end_date represent the period of time when the rate is effective.
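A sketch of how such an effective-dated table is typically queried, using SQLite via Python's sqlite3 as a stand-in for SQL Server (the identity column becomes an INTEGER PRIMARY KEY). Assumptions: end_date is exclusive (the first day the rate no longer applies), NULL means "still current", and rate_on is a hypothetical helper. Note that the same rate (0.05) can recur for one loan, which a key containing AnnualRate would forbid:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan_detail (
        loan_detail_id INTEGER PRIMARY KEY,   -- surrogate key
        loan_id        INTEGER NOT NULL,
        rate           NUMERIC NOT NULL,
        eff_date       TEXT NOT NULL,         -- first day the rate applies
        end_date       TEXT                   -- first day it no longer applies; NULL = current
    );
    INSERT INTO loan_detail (loan_id, rate, eff_date, end_date) VALUES
        (1, 0.05, '2017-01-01', '2017-02-01'),
        (1, 0.03, '2017-02-01', '2017-08-01'),
        (1, 0.05, '2017-08-01', NULL);
""")

def rate_on(conn, loan_id, day):
    """Return the rate in effect for loan_id on day (ISO date string), or None."""
    row = conn.execute("""
        SELECT rate FROM loan_detail
        WHERE loan_id = ? AND eff_date <= ? AND (end_date IS NULL OR ? < end_date)
    """, (loan_id, day, day)).fetchone()
    return row[0] if row else None

for day in ("2016-12-31", "2017-01-15", "2017-02-01", "2018-06-30"):
    print(day, rate_on(conn, 1, day))
```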
The real problem here is the distinction between a primary key, a natural key, and a surrogate key.
A natural key is a column or group of columns that exist within your data that make the row of data unique. A natural key may or may not exist.
A surrogate key is a column or group of columns that you add to your data to make the row of data unique. You don't 'find' a surrogate key; you build it.
A primary key uniquely identifies a row of data. It may be a natural key or a surrogate key, but in either case it should be immutable; otherwise, you stand to lose referential integrity.
From the sounds of things in your loan details table, the only column that's currently immutable is the LoanId. That's fine, but it means that if you're going to have a primary key on your table, it'll have to be surrogate key, and Gordon has laid out the DDL for setting that up in his answer.
In my view, the primary key should be LoanId plus DateRateSet, a timestamp recording when the rate of a LoanId was changed. That combination uniquely identifies each AnnualRate entry.
For example:
LoanId  DateRateSet          AnnualRate
1       2017-01-01 01:04:20  0.05
1       2017-02-01 01:20:20  0.03
1       2017-08-01 05:04:20  0.05
1       2017-09-01 01:20:24  0.02
The same rate can appear more than once for a LoanId (0.05 above), but each occurrence is distinguished by its timestamp.

Normalizing a table with duplicate rows and many-to-many relationships

I am designing the database for an accounting system, currently working on the Expenses table.
According to IRS rules, whenever you update a row in any accounting table, you need to cancel out the existing row by negating its values, and create a new row with the modified information, like so:
Set the row's Status flag to "Modified"
Create an identical copy of this row, with all Money fields negated, so that the sum of the two rows is 0
Create a 3rd row, identical to the first one, with the modified data
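The three steps above can be sketched as one atomic operation, using SQLite via Python's sqlite3 as a stand-in for SQL Server. Assumptions: text status codes ('Active', 'Modified') replace the tinyint Status, the new row's status is a hypothetical 'Active', and modify_expense is a hypothetical helper. The net amount for the ExpenseID ends up equal to the modified value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE expenses (
        id         INTEGER PRIMARY KEY,   -- internal identity
        expense_id INTEGER NOT NULL,      -- user-facing; repeats after a modification
        amount     NUMERIC NOT NULL,
        status     TEXT NOT NULL          -- hypothetical text codes
    )
""")
conn.execute("INSERT INTO expenses (expense_id, amount, status) VALUES (1, 120, 'Active')")
conn.commit()

def modify_expense(conn, row_id, new_amount):
    """Flag, negate, and re-insert in a single transaction."""
    with conn:  # commits on success, rolls back on any error
        # Step 1: flag the existing row.
        conn.execute("UPDATE expenses SET status = 'Modified' WHERE id = ?", (row_id,))
        # Step 2: identical copy with the money field negated.
        conn.execute("""
            INSERT INTO expenses (expense_id, amount, status)
            SELECT expense_id, -amount, status FROM expenses WHERE id = ?
        """, (row_id,))
        # Step 3: a new row carrying the modified data.
        conn.execute("""
            INSERT INTO expenses (expense_id, amount, status)
            SELECT expense_id, ?, 'Active' FROM expenses WHERE id = ?
        """, (new_amount, row_id))

modify_expense(conn, 1, 95.5)
rows = conn.execute("SELECT id, expense_id, amount, status FROM expenses ORDER BY id").fetchall()
net = conn.execute("SELECT SUM(amount) FROM expenses WHERE expense_id = 1").fetchone()[0]
print(rows, net)
```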
Each expense has an identity field called ID for internal identification purposes, and an ExpenseID field, which identifies the transaction to the users. The two cannot be the same, because
ExpenseID can be repeated twice if the transaction was modified and its row was duplicated.
ExpenseIDs MUST be consecutive and NEVER have gaps, while identity fields can skip numbers if a transaction is rolled back and the identity is not reseeded.
In general, I believe the primary key should have no business meaning whatsoever.
My problem is that there are other tables used to link these expenses Many-To-Many to other objects in our system. E.g.: each expense can be linked to documents, folders, users, etc.
So it looks something like this:
create table Expenses (
ID int not null identity(1,1),
ExpenseID int not null,
Amount Money not null,
Status tinyint not null,
[...]
)
create table Expenses_Users (
ExpenseID int not null,
UserID int not null
)
alter table Expenses_Users add constraint FK_Expenses_Users_Expenses
foreign key (ExpenseID) references Expenses (ID)
alter table Expenses_Users add constraint FK_Expenses_Users_Users
foreign key (UserID) references Users (ID)
Now, because of the IRS guidelines, I have to duplicate not only rows in the Expenses table, but also in Expenses_Users, and any other table that links Expenses to other tables.
I have two ideas on how to solve this:
Option One: Normalize Expenses like this:
create table Expenses (
ID int not null identity(1,1),
ExpenseID int not null,
Status tinyint not null,
[...]
)
create table ExpensesNormalized (
ExpenseID int not null,
Amount Money not null
)
alter table ExpensesNormalized add constraint FK_ExpensesNormalized_Expenses
foreign key (ExpenseID) references Expenses(ExpenseID)
This means I'll only have to link external tables to Expenses, not ExpensesNormalized. Also, when updating an expense, I'll only duplicate and negate the data in ExpensesNormalized, which means I'll have far less redundant data in the Expenses table.
However, I'll have to use a JOIN clause every single time I SELECT from Expenses. I fear a performance hit because of this.
Option Two: Use the same tables I use now, but have the field Expenses_Users.ExpenseID point to the field Expenses.ExpenseID. This means that I won't have to duplicate any external objects because they'll point to ExpenseID, which may occur several times.
However, this will not be a real foreign key because SQL Server does not allow foreign keys to non-unique fields, so I'll have to implement foreign key logic in a trigger.
I'm having a hard time deciding between these two options. Any feedback would be appreciated.
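For reference, Option Two's trigger-based "foreign key" can be sketched as below, using SQLite via Python's sqlite3 as a stand-in (SQL Server trigger syntax differs). A real FK is enforced on both sides, so the sketch checks the child insert and guards deletes of the last parent row for an ExpenseID; deletes appear here only to exercise that parent-side check, and updates of ExpenseID are not covered. The user ID 42 and the trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expenses (
        id         INTEGER PRIMARY KEY,
        expense_id INTEGER NOT NULL,   -- not unique: repeated by the reversal rows
        amount     NUMERIC NOT NULL
    );
    CREATE TABLE expenses_users (
        expense_id INTEGER NOT NULL,
        user_id    INTEGER NOT NULL
    );
    -- Child side: a link must point at an ExpenseID that exists.
    CREATE TRIGGER expenses_users_parent_exists
    BEFORE INSERT ON expenses_users
    WHEN NOT EXISTS (SELECT 1 FROM expenses WHERE expense_id = NEW.expense_id)
    BEGIN
        SELECT RAISE(ABORT, 'ExpenseID does not exist');
    END;
    -- Parent side: refuse to delete the last row of an ExpenseID that is still linked.
    CREATE TRIGGER expenses_keep_referenced
    BEFORE DELETE ON expenses
    WHEN (SELECT COUNT(*) FROM expenses WHERE expense_id = OLD.expense_id) = 1
     AND EXISTS (SELECT 1 FROM expenses_users WHERE expense_id = OLD.expense_id)
    BEGIN
        SELECT RAISE(ABORT, 'ExpenseID is still referenced');
    END;
    INSERT INTO expenses (expense_id, amount) VALUES (1, 120), (1, -120), (1, 95.5);
""")

def attempt(sql):
    """Run a statement; return False if a trigger (or constraint) aborted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False

link_ok = attempt("INSERT INTO expenses_users VALUES (1, 42)")
link_bad = attempt("INSERT INTO expenses_users VALUES (2, 42)")   # no ExpenseID 2
delete_3 = attempt("DELETE FROM expenses WHERE id = 3")           # two rows remain
delete_2 = attempt("DELETE FROM expenses WHERE id = 2")           # one row remains
delete_last = attempt("DELETE FROM expenses WHERE id = 1")        # still linked: refused
print(link_ok, link_bad, delete_3, delete_2, delete_last)
```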