How can I prevent date overlaps in SQL?

I have the following table structure for a hire table:
hireId int primary key
carId int not null foreign key
onHireDate datetime not null
offHireDate datetime not null
I am attempting to program a multi-user system that does not allow the on-hire and off-hire periods for cars to overlap. I need to be able to add hires in non-sequential order, and I also need to allow editing of hires.
Any way to constrain the tables or use triggers etc to prevent overlaps? I am using entity framework so I would want to insert to the table as normal and then if it fails throw some catchable exception etc.

Consider this query:
SELECT *
FROM Hire AS H1, Hire AS H2
WHERE H1.carId = H2.carId
AND H1.hireId < H2.hireId
AND
CASE
WHEN H1.onHireDate > H2.onHireDate THEN H1.onHireDate
ELSE H2.onHireDate END
<
CASE
WHEN H1.offHireDate > H2.offHireDate THEN H2.offHireDate
ELSE H1.offHireDate END
If all rows satisfy your business rule, this query will return the empty set (assuming a closed-open representation of periods, i.e. where the end date is the earliest time granule that is not considered part of the period).
Because SQL Server does not support subqueries within CHECK constraints, put the same logic in a trigger (but not an INSTEAD OF trigger, unless you can provide logic to resolve overlaps).
Alternative query using Fowler:
SELECT *
FROM Hire AS H1, Hire AS H2
WHERE H1.carId = H2.carId
AND H1.hireId < H2.hireId
AND H1.onHireDate < H2.offHireDate
AND H2.onHireDate < H1.offHireDate;
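If you go the trigger route, the Fowler-style query can be wrapped in an AFTER trigger that rolls back offending statements. This is only a sketch, assuming SQL Server and the table/column names from the question; the raised error surfaces as a catchable SqlException in Entity Framework:

```sql
-- Sketch: reject any INSERT/UPDATE that creates overlapping hire
-- periods for the same car (half-open intervals, as above).
CREATE TRIGGER trg_Hire_PreventOverlap
ON Hire
AFTER INSERT, UPDATE
AS
BEGIN
    IF EXISTS (
        SELECT 1
        FROM Hire AS H1
        JOIN Hire AS H2
          ON  H1.carId = H2.carId
          AND H1.hireId < H2.hireId
          AND H1.onHireDate < H2.offHireDate
          AND H2.onHireDate < H1.offHireDate
    )
    BEGIN
        ROLLBACK TRANSACTION;
        RAISERROR('Hire periods may not overlap for the same car.', 16, 1);
    END
END;
```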

CREATE TRIGGER tri_check_date_overlap ON your_table
INSTEAD OF INSERT
AS
BEGIN
IF @@ROWCOUNT = 0
RETURN
-- check whether any row in 'INSERTED' overlaps an existing hire for the same car
-- (note: this does not compare the new rows against each other in a multi-row insert)
IF EXISTS(
SELECT 1 FROM your_table AS T
JOIN INSERTED AS I
ON I.carId = T.carId
AND I.onHireDate < T.offHireDate
AND T.onHireDate < I.offHireDate
)
BEGIN
RAISERROR('Hire period overlaps an existing hire.', 16, 1)
END
ELSE
BEGIN
-- no overlap: perform the insert (assumes hireId is an IDENTITY column)
INSERT INTO your_table (carId, onHireDate, offHireDate)
SELECT carId, onHireDate, offHireDate FROM INSERTED
END
END
GO

I am now using domain driven design techniques. This keeps me from worrying about database triggers etc. and keeps all the logic within the .Net code. Answer is here so that people can see an alternative.
I treat the parent of the collection of periods as an aggregate root, and the root has a timestamp. An aggregate root is a consistency boundary for transactions, distribution and concurrency (see Evans - what I've learned since the blue book). The timestamp is used to check the last time any change was confirmed to the database.
I then have methods to add the hire periods to the parent. I test on overlap when adding the movement to the aggregate root. e.g. AddHire(start,end) - will validate that this creates no overlap on the in memory domain object.
If there is no overlap, I save the changes (via my repository) and check that the database timestamp is still the same as it was at the start of the process. Assuming the timestamp is the same as when I retrieved the entity, the changes are persisted and the database updates the timestamp.
If someone else tries to save changes when the aggregate root is being worked on then either I will commit first or they will. If I commit first the timestamps will not match and the overlap check will re-run to make sure that they haven't created an overlap in the intervening time.

Related

SQL, limit the amount of times something can be added

I have made a library management system using Postgresql and I would like to limit the number of books a student/employee is able to borrow. If someone wants to add a new tuple where a student/employee has borrowed a book, and that particular user has already borrowed for example 7 books, the table won't accept another addition.
In my view, you either need to handle this from the business-logic perspective, i.e. before the insert, retrieve the data for that specific student and then take action,
or
from a rule-based perspective:
do not wait for additional rows to be inserted by the application, but constantly watch the table's count; upon reaching the limit, the database notifies the app instead.
You can call/trigger a stored procedure based on the number of books taken by a specific user; if count_num_books > 7, then the app would handle it.
Also take a look at ON CONFLICT, as described in the PostgreSQL documentation:
http://www.postgresqltutorial.com/postgresql-upsert/
You can create a stored procedure with insert on conflict and take action accordingly.
INSERT INTO table_name(column_list) VALUES(value_list)
ON CONFLICT target action;
In general, SQL does not make this easy. The typical solution is something like this:
Keep a table with one row per book borrowed and student.
Keep a count of outstanding books in the students table.
Maintain this count using triggers.
Add a check constraint on the count.
Postgres does have more convenient methods. One method is to store the list of borrowed books as an array or in a JSON structure. Alas, this is not a relational format. And, it doesn't allow the declaration of foreign key constraints.
That said, it does allow a simple check constraint on the books_borrowed column -- using cardinality(), for instance. But it doesn't make it easy to validate that there are no duplicates in the array, and INSERTs, UPDATEs, and DELETEs become more complicated.
For your particular problem, I would recommend the first approach.
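A minimal sketch of that first approach in PostgreSQL -- note that the students/loans tables and column names here are assumptions, not from the question:

```sql
-- Counter column with a CHECK constraint caps outstanding loans at 7.
ALTER TABLE students
    ADD COLUMN books_out integer NOT NULL DEFAULT 0
        CHECK (books_out BETWEEN 0 AND 7);

-- Trigger function keeps the counter in sync with the loans table.
CREATE OR REPLACE FUNCTION bump_books_out()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE students SET books_out = books_out + 1
         WHERE student_id = NEW.student_id;
        RETURN NEW;
    ELSE  -- DELETE: book returned
        UPDATE students SET books_out = books_out - 1
         WHERE student_id = OLD.student_id;
        RETURN OLD;
    END IF;
END;
$$;

CREATE TRIGGER trg_bump_books_out
AFTER INSERT OR DELETE ON loans
FOR EACH ROW EXECUTE FUNCTION bump_books_out();
```

An eighth loan then fails with a check-constraint violation that the application can catch.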
As mentioned, the best place for this check is the APPLICATION. But otherwise, perhaps the easiest method is to do nothing at all - i.e. don't try keeping a running total of active checkouts. Since Postgres has no issue with a trigger selecting from the table that fired it, just derive the number of outstanding books on the fly. The following assumes the existence of a checkouts table:
create table checkouts
( checkout_id serial
, student_employee_id integer not null
, book_id integer not null
, out_date date not null
, return_date date default null
) ;
Then create an Insert row trigger on this table and call the following:
create or replace function limit_checkouts()
returns trigger
language plpgsql
as $$
declare
checkout_count integer;
begin
select count(*)
into checkout_count
from checkouts c
where c.student_employee_id = new.student_employee_id
and c.return_date is null;
if checkout_count >= 7
then
raise exception 'Checkout limit exceeded';
end if;
return new;
end;
$$;
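The row trigger that calls this function would then look something like the following (a sketch; the trigger name is an assumption, and EXECUTE FUNCTION requires PostgreSQL 11+ -- use EXECUTE PROCEDURE on older versions):

```sql
-- Fire the check before each new checkout row is inserted.
CREATE TRIGGER trg_limit_checkouts
BEFORE INSERT ON checkouts
FOR EACH ROW
EXECUTE FUNCTION limit_checkouts();
```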

SQL Interview: Prevent overlapping date range

Say there is an appointment_booking table for a list of managers (or HRs) with startDatetime and endDatetime. How does one design the table so that it rejects a new entry that overlaps an existing appointment for the same manager?
If
Manager: A
has a appointment from 2016-01-01 11:00 to 2016-01-01 14:00 with Employee-1
then if Employee-2 (or some other employee) tries to book an appointment from 2016-01-01 13:00 to 16:00, it shouldn't be allowed.
Note: It is about designing the table, so triggers/procedures isn't encouraged.
Instead of inserting ranges, you could insert slices of time. You could make the slices as wide as you want, but pretend you can book a manager for 30 minutes at a time. To book from 11:30 to 12:00, you'd insert a row with the time value at 11:30. To book from 11:30 to 12:30, you'd insert two rows, one at 11:30, the other at 12:00. Then you can just use a primary key constraint or unique constraint to prevent over booking.
create table appointment_booking (
manager char not null,
startSlice DateTime,
visiting_employee varchar2(255),
primary key (manager, startSlice)
)
I know this doesn't exactly fit your premise of the table with a start and end time, but if you have control over the table structure, this would work.
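For example, booking manager A from 11:30 to 12:30 means inserting two slices; anyone else trying to take one of those slices then hits the primary key. A sketch with illustrative values:

```sql
-- Employee-1 books 11:30-12:30: two 30-minute slices.
INSERT INTO appointment_booking (manager, startSlice, visiting_employee)
VALUES ('A', TIMESTAMP '2016-01-01 11:30:00', 'Employee-1');
INSERT INTO appointment_booking (manager, startSlice, visiting_employee)
VALUES ('A', TIMESTAMP '2016-01-01 12:00:00', 'Employee-1');

-- Employee-2 tries 12:00-12:30 with the same manager:
-- fails with a primary key violation on (manager, startSlice).
INSERT INTO appointment_booking (manager, startSlice, visiting_employee)
VALUES ('A', TIMESTAMP '2016-01-01 12:00:00', 'Employee-2');
```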
CHECK CONSTRAINT + FUNCTION (this is as close as I can get to a DDL answer)
You could create a scalar function -- "SCHEDULE_OPENING_EXISTS()" that takes begin, end, employeeID as inputs, and outputs true or false.
Then you could create a check constraint on the table
ALTER TABLE appointment_booking
WITH CHECK ADD CONSTRAINT OPENING_EXISTS
CHECK (SCHEDULE_OPENING_EXISTS([begin], [end], employeeID) = 'True')
TRIGGERS:
I try to avoid triggers where I can. They're not evil per se -- but they do add a new layer of complexity to your application. If you can't avoid it, you'll need an INSTEAD OF INSERT, and also an INSTEAD OF UPDATE (presumably). Technet Reference Here: https://technet.microsoft.com/en-us/library/ms179288%28v=sql.105%29.aspx
Keep in mind, if you reject an insert/update attempt, whether or how you need to communicate that back to the user.
STORED PROCEDURES / USER INTERFACE:
Would a Stored Procedure work for your situation? Sample scenario:
User Interface -- user needs to see the schedule of the person(s) they're scheduling an appointment with.
From the UI -- attempt an insert/update using a stored proc. Have it re-check (last-minute) the opening (return a failure if the opening no longer exists), and then conditionally insert/update if an opening still exists (return a success message).
If the proc returns a failure to the UI, handle that in the UI by re-querying the visible schedule of all parties, accompanied by an error message.
I think these types of questions are interesting because any time you are designing a database, it is important to know the requirements of the application that will be interacting with your database.
That being said, as long as the application can reference multiple tables, I think Chris Steele's answer is a great start that I will build upon...
I would want 2 tables. The first table divides a day into parts (slices), depending on the business needs of the organization. Each slice would be the primary key of this table. I personally would choose 15-minute slices, which equates to 96 day-parts. Each day-part in this table would have a "block start" and a "block end" time that would be referenced by the scheduling application once a user has selected an actual start time and an actual end time for the meeting. The application would need to apply logic such as two "OR" operators between 3 "AND" conditions in order to see whether a particular blockID should be inserted into your Appointments table:
actual start >= block start AND actual start < block end
actual end > block start AND actual end < block end
actual start < block start AND actual end > block end
This slightly varies from Chris Steele's answer in that it uses two tables. The actual time stamps can still be inserted into your applications table, but logic is only applied to them when comparing against the TimeBlocks table. In my Appointments table, I prefer breaking dates into constituent parts for cross-platform analysis (our organization uses multiple RDBMS as well as SAS for analytics):
CREATE TABLE TimeBlocks (
blockID Number(X) NOT NULL,
blockStart DateTime NOT NULL,
blockEnd DateTime NOT NULL,
primary key (blockID)
);
CREATE TABLE Appointments (
mgrID INT NOT NULL,
yr INT NOT NULL,
mnth INT NOT NULL,
day INT NOT NULL,
blockID INT NOT NULL,
ApptStart DateTime NOT NULL,
ApptEnd DateTime NOT NULL,
empID INT NOT NULL,
primary key (mgrID, yr, mnth, day, blockID),
CONSTRAINT timecheck
check (ApptStart < ApptEnd)
);

How should I reliably mark the most recent row in SQL Server table?

The existing design for this program is that all changes are written to a changelog table with a timestamp. In order to obtain the current state of an item's attribute we JOIN onto the changelog table and take the row having the most recent timestamp.
This is a messy way to keep track of current values, but we cannot readily change this changelog setup at this time.
I intend to slightly modify the behavior by adding an "IsMostRecent" bit to the changelog table. This would allow me to simply pull the row having that bit set, as opposed to the MAX() aggregation or recursive seek.
What strategy would you employ to make sure that bit is always appropriately set? Or is there some alternative you suggest which doesn't affect the current use of the logging table?
Currently I am considering a trigger approach which, on INSERT, turns the bit off for all other rows and then turns it on for the most recent row.
I've done this before by having a "MostRecentRecorded" table which simply holds the most recently inserted record (Id and entity ID), updated from a trigger.
Having an extra column for this isn't right - and can get you into problems with transactions and reading existing entries.
In the first version of this it was a simple case of
BEGIN TRANSACTION
INSERT INTO simlog (entityid, logmessage)
VALUES (11, 'test');
UPDATE simlogmostrecent
SET lastid = @@IDENTITY
WHERE simlogentityid = 11
COMMIT
Ensuring that the MostRecent table had an entry for each record in SimLog can be done in the query but ISTR we did it during the creation of the entity that the SimLog referred to (the above is my recollection of the first version - I don't have the code to hand).
However, the simple version caused problems with multiple writers, as it could cause a deadlock or transaction failure; so it was moved into a trigger.
Edit: Started this answer before Richard Harrison answered, promise :)
I would suggest another table with the structure similar to below:
VersionID  TableName  UniqueVal  LatestPrimaryKey
1          Orders     209        12548
2          Orders     210        12549
3          Orders     211        12605
4          Orders     212        10694
VersionID -- being the tables key
TableName -- just in case you want to roll out to multiple tables
UniqueVal -- is whatever groups multiple rows into a single item with history (eg Order Number or some other value)
LatestPrimaryKey -- is the identity key of the latest row you want to use.
Then you can simply JOIN to this table to return only the latest rows.
If you already have a trigger inserting rows into the changelog table this could be adapted:
INSERT INTO [MyChangelogTable]
(Primary, RowUpdateTime)
VALUES (@PrimaryKey, GETDATE())
-- Add onto it:
UPDATE [LatestRowTable]
SET [LatestPrimaryKey] = @PrimaryKey
WHERE [TableName] = 'Orders'
AND [UniqueVal] = @OrderNo
Alternatively it could be done as a merge to capture inserts as well.
One thing that comes to mind is to create a view to do all the messy MAX() queries, etc. behind the scenes. Then you should be able to query against the view. This way would not have to change your current setup, just move all the messiness to one place.
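A sketch of that view idea in SQL Server (the changelog table and column names here are assumptions): ROW_NUMBER() picks the latest row per entity, and callers query the view instead of repeating the MAX() logic:

```sql
-- Hide the "latest row per entity" logic behind a view.
CREATE VIEW CurrentChangelog
AS
SELECT EntityId, LogMessage, RowUpdateTime
FROM (
    SELECT EntityId, LogMessage, RowUpdateTime,
           ROW_NUMBER() OVER (PARTITION BY EntityId
                              ORDER BY RowUpdateTime DESC) AS rn
    FROM Changelog
) AS ranked
WHERE rn = 1;
```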

Ensure max min columns don't overlap

Let's say I have the following Categories table:
Category  MinValue  MaxValue
A         1         2
B         3         9
C         10        0
Above I'm using 0 to indicate no maximum. These values will be configurable by end users. They will be able to add and remove categories, and modify the max and min values. Is there any sort of a constraint I can place on the table to ensure that no two ranges overlap?
This table will be modified using a web application so I could pre-validate changes to the table using Javascript so even an algorithm to prevent duplicates might suffice.
Maybe I'm missing the obvious here, but I don't think this is easy in Oracle.
I've seen solutions using a materialized view that:
- contains the overlaps from the Categories table
- is refreshed on commit
- has a check constraint that it contain no rows. This can be achieved by having a "rownum" column in the materialized view and a check constraint that this column's value is always 0.
The check constraint on the materialized view will then be violated on commit if a user enters any overlapping data.
You'll need to write your front end to allow for exceptions to be raised by Oracle on commit and to present an appropriate message to the user.
Now in the latest version of Postgresql for example, this is very easy with exclusion constraints.
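For reference, a PostgreSQL sketch of such an exclusion constraint (available since 9.0; table and column names are illustrative, and "no maximum" is stored as NULL rather than 0 so that the range is treated as unbounded above):

```sql
CREATE TABLE categories (
    category  text PRIMARY KEY,
    min_value numeric NOT NULL,
    max_value numeric,  -- NULL means no maximum
    -- Reject any two rows whose [min, max) ranges overlap.
    EXCLUDE USING gist (numrange(min_value, max_value, '[)') WITH &&)
);
```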
I don't think that you can do it with a constraint, but you should be able to create a before insert/update trigger and use raise_application_error to abort the insert if it violates the conditions.
Something like the following in the trigger body (PL/SQL has no "if exists", so count the conflicting rows instead):
declare v_cnt integer;
begin
select count(*) into v_cnt from yourtable
where :new.minvalue < maxvalue and :new.maxvalue > minvalue;
if v_cnt > 0 then
raise_application_error(-20001, 'ranges may not overlap');
end if;
end;

Handling Revisions within Oracle

I have a table say:
CREATE TABLE "DataNode" (
"ID" NUMBER(7,0),
"TYPE" NUMBER(7,0),
"NAME" VARCHAR2(100),
"STATUS" NUMBER(7,0),
"REVISION" NUMBER(4,0),
"MODIFIEDAT" DATE
);
CREATE TABLE "DataNode_Revisions" (
"ID" NUMBER(7,0),
"NODEID" NUMBER(7,0),
"TYPE" NUMBER(7,0),
"NAME" VARCHAR2(100),
"STATUS" NUMBER(7,0),
"REVISION" NUMBER(4,0),
"MODIFIEDAT" DATE
) COMPRESS;
So I have these two tables. I do all my reads from "DataNode" and when a change occurs I write out the current entry to "DataNode_Revisions" and then modify my existing "DataNode" record. Makes sense?
Is this the best way to go about it? I can already tell I am going to run into problems when the schema changes. I am not seeing a better alternative, but if there is one please let me know! I assume keeping this all in one table would result in massive performance losses, would it not? I mean, I would be more than quadrupling the number of records, and there are already quite a few. I think Drupal stores node revisions like this, and I am curious how they do not suffer performance problems from it.
"DataNode" is constantly being read by a lot of users. However, very few writes ever occur. "DataNode_Revisions" is only read from on occasion. I am just worried about maintaining so many tables. "DataNode" is one of ~25 tables very similar to this one.
Whether there will be any performance implications from storing the old rows in the DataNode table depends on how the DataNode rows are accessed. If the reads are all single-row lookups for the current row, the number of rows in the table is relatively immaterial-- it's not going to take any more work to find the current row for a particular ID than it would to get the row for that ID from the current DataNode table (I'm assuming here that ID is the key for the table). On the other hand, if you have a number of queries that are doing table scans of the DataNode table, then quadrupling the number of rows will increase the time required to run those queries.
If you want to go down the path of putting the historical rows in the DataNode table, you would probably want to add an EXPIRATION_DATE column that is NULL for the current row and populated for the expired rows. You could then create a function-based index based on the EXPIRATION_DATE that would have data for only the current rows, i.e.
CREATE INDEX idx_current_ids
ON DataNode( (CASE WHEN expiration_date IS NULL THEN id ELSE null END) );
which would be used in a query like
SELECT *
FROM DataNode
WHERE (CASE WHEN expiration_date IS NULL THEN id ELSE null END) = <<some id>>
Obviously, you'd probably want to create a view that has this condition rather than rewriting it every time you need the current row, i.e.
CREATE VIEW CurrentDataNode
AS
SELECT (CASE WHEN expiration_date IS NULL THEN id ELSE null END) id,
type,
name,
status
FROM DataNode;
SELECT *
FROM CurrentDataNode
WHERE id = <<some value>>
I usually use triggers to do the writing to the 'Revisions' table. Yes, schema changes force you to update the mirror table and trigger/archive function.
I think you will regret keeping all your history as well as the current revision in a single table, so I think you've got the right idea.
If you want to try to come up with a generic solution that doesn't require a mirror table for every one of your transactional tables you might consider having just a single revisions table where you convert records to XML and store that in a clob... not very useful if you have to access it often or quickly, but good if you're really just wanting to archive everything.
It's going to depend on the application. If you're on 11g, you might want to look at the new Flashback Data Archive. I'm just starting to look at it to keep history on all our financial and other critical data.
You have a few options. What is the business requirement that forces you to keep track of data changes?
if you only need to keep changes for some "short" period of time, you could read the data from UNDO using flashback query.. select * from table as of timestamp (bla);
if you need to retain this information long term, take a look at a feature called Oracle Total Recall. It does the same as Flashback Query, but retains the changes indefinitely.
if you need something simpler, don't have the app insert the "old" version of the rows. Use a trigger that populates the data.
if the system is extremely busy, you can decouple the two tables by having an intermediary table that you use as a "queue"