SQL Interview: Prevent overlapping date range - sql

Say there is an appointment_booking table for a list of managers (or HRs) with startDatetime and endDatetime. How would one design the table so that it rejects a new entry that overlaps an existing appointment for the same manager?
For example, if Manager A has an appointment from 2016-01-01 11:00 to 2016-01-01 14:00 with Employee-1, then an attempt by Employee-2 (or any other employee) to book an appointment from 2016-01-01 13:00 to 16:00 should be rejected.
Note: This is about designing the table, so triggers/procedures are discouraged.

Instead of inserting ranges, you could insert slices of time. You could make the slices as wide as you want, but pretend you can book a manager for 30 minutes at a time. To book from 11:30 to 12:00, you'd insert a row with the time value at 11:30. To book from 11:30 to 12:30, you'd insert two rows, one at 11:30, the other at 12:00. Then you can just use a primary key constraint or unique constraint to prevent overbooking.
create table appointment_booking (
manager char not null,
startSlice DateTime,
visiting_employee varchar2(255),
primary key (manager, startSlice)
)
I know this doesn't exactly fit your premise of the table with a start and end time, but if you have control over the table structure, this would work.
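To make the slice idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the Oracle-style DDL above. The 30-minute slice width, the helper names open_db and book, and the assumption that bookings start on a slice boundary are all mine, not part of the original answer.

```python
import sqlite3
from datetime import datetime, timedelta

SLICE = timedelta(minutes=30)  # assumed slice width


def open_db():
    """Create an in-memory table keyed on (manager, startSlice)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE appointment_booking (
            manager TEXT NOT NULL,
            startSlice TEXT NOT NULL,
            visiting_employee TEXT,
            PRIMARY KEY (manager, startSlice)
        )""")
    return conn


def book(conn, manager, employee, start, end):
    """Insert one row per slice in [start, end); all slices or none."""
    rows = []
    t = start
    while t < end:
        rows.append((manager, t.isoformat(), employee))
        t += SLICE
    try:
        with conn:  # commits on success, rolls back if any insert fails
            conn.executemany(
                "INSERT INTO appointment_booking VALUES (?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:  # primary key clash = slice already taken
        return False


if __name__ == "__main__":
    conn = open_db()
    day = datetime(2016, 1, 1)
    print(book(conn, "A", "Employee-1", day.replace(hour=11), day.replace(hour=14)))
    print(book(conn, "A", "Employee-2", day.replace(hour=13), day.replace(hour=16)))
```

The second booking is rejected by the primary key alone, and the surrounding transaction makes sure a partially overlapping request leaves no stray slices behind.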

CHECK CONSTRAINT + FUNCTION (this is as close as I can get to a DDL answer)
You could create a scalar function -- "SCHEDULE_OPENING_EXISTS()" that takes begin, end, employeeID as inputs, and outputs true or false.
Then you could create a check constraint on the table
CREATE TABLE...
WITH CHECK ADD CONSTRAINT OPENING_EXISTS
CHECK (SCHEDULE_OPENING_EXISTS(begin, end, employeeID) = 'True')
TRIGGERS:
I try to avoid triggers where I can. They're not evil per se -- but they do add a new layer of complexity to your application. If you can't avoid it, you'll need an INSTEAD OF INSERT, and also an INSTEAD OF UPDATE (presumably). Technet Reference Here: https://technet.microsoft.com/en-us/library/ms179288%28v=sql.105%29.aspx
Keep in mind that if you reject an insert/update attempt, you need to decide whether and how to communicate that back to the user.
STORED PROCEDURES / USER INTERFACE:
Would a Stored Procedure work for your situation? Sample scenario:
User Interface -- user needs to see the schedule of the person(s) they're scheduling an appointment with.
From the UI -- attempt an insert/update using a stored proc. Have it re-check (last-minute) the opening (return a failure if the opening no longer exists), and then conditionally insert/update if an opening still exists (return a success message).
If the proc returns a failure to the UI, handle that in the UI by re-querying the visible schedule of all parties, accompanied by an error message.
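As a rough sketch of that "re-check, then conditionally insert" flow, the following uses Python's sqlite3 module in place of a real stored procedure. The function name try_book, the column names, and the use of BEGIN IMMEDIATE as the locking step are my assumptions; timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3


def open_db():
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("""
        CREATE TABLE appointment_booking (
            manager TEXT NOT NULL,
            employee TEXT NOT NULL,
            startDatetime TEXT NOT NULL,
            endDatetime TEXT NOT NULL,
            CHECK (startDatetime < endDatetime)
        )""")
    return conn


def try_book(conn, manager, employee, start, end):
    """Return True if the slot was still open and got booked, else False."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before re-checking
    try:
        clash = conn.execute(
            """SELECT 1 FROM appointment_booking
               WHERE manager = ? AND startDatetime < ? AND endDatetime > ?
               LIMIT 1""",
            (manager, end, start)).fetchone()
        if clash:
            conn.execute("ROLLBACK")
            return False  # the UI should re-query the schedule and show an error
        conn.execute(
            "INSERT INTO appointment_booking VALUES (?, ?, ?, ?)",
            (manager, employee, start, end))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Because the check and the insert happen inside one write transaction, two sessions cannot both pass the check for the same opening.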

I think these types of questions are interesting because any time you are designing a database, it is important to know the requirements of the application that will be interacting with your database.
That being said, as long as the application can reference multiple tables, I think Chris Steele's answer is a great start that I will build upon...
I would want 2 tables. The first table divides a day into parts (slices), depending on the business needs of the organization. Each slice would be the primary key of this table. I personally would choose 15-minute slices, which equates to 96 day-parts. Each day-part in this table would have a "block start" and a "block end" time that would be referenced by the scheduling application when a user has selected an actual start time and an actual end time for the meeting. The application would need to apply logic such as two "OR" operators between 3 "AND" statements in order to see if a particular blockID will be inserted into your Appointments table:
(actual start >= block start AND actual start < block end)
OR (actual end > block start AND actual end <= block end)
OR (actual start < block start AND actual end > block end)
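If blocks and appointments are both treated as half-open intervals (start included, end excluded), the three cases can also be folded into a single overlap test: the appointment starts before the block ends and ends after the block starts. Here is a small sketch of how an application might compute the block IDs for one appointment. The function name blocks_for, the 0..95 numbering and the same-day restriction are assumptions of mine.

```python
from datetime import datetime, timedelta

BLOCK = timedelta(minutes=15)  # 96 blocks per day


def blocks_for(appt_start, appt_end):
    """Return the IDs (0..95) of the 15-minute blocks an appointment touches.

    Assumes the appointment starts and ends on the same calendar day.
    """
    midnight = appt_start.replace(hour=0, minute=0, second=0, microsecond=0)
    ids = []
    for block_id in range(96):
        block_start = midnight + block_id * BLOCK
        block_end = block_start + BLOCK
        # Half-open overlap test: the two intervals share some instant
        if appt_start < block_end and appt_end > block_start:
            ids.append(block_id)
    return ids


if __name__ == "__main__":
    print(blocks_for(datetime(2016, 1, 1, 11, 5), datetime(2016, 1, 1, 11, 40)))
```

An appointment that ends exactly on a block boundary does not claim the following block, so back-to-back meetings remain possible.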
This slightly varies from Chris Steele's answer in that it uses two tables. The actual time stamps can still be inserted into your applications table, but logic is only applied to them when comparing against the TimeBlocks table. In my Appointments table, I prefer breaking dates into constituent parts for cross-platform analysis (our organization uses multiple RDBMS as well as SAS for analytics):
CREATE TABLE TimeBlocks (
blockID Number(X) NOT NULL,
blockStart DateTime NOT NULL,
blockEnd DateTime NOT NULL,
primary key (blockID)
);
CREATE TABLE Appointments (
mgrID INT NOT NULL,
yr INT NOT NULL,
mnth INT NOT NULL,
day INT NOT NULL,
blockID INT NOT NULL,
ApptStart DateTime NOT NULL,
ApptEnd DateTime NOT NULL,
empID INT NOT NULL,
primary key (mgrID, yr, mnth, day, blockID),
CONSTRAINT timecheck
check (ApptStart < ApptEnd)
);

Related

Date ranges unique constraint in Database

I have a table "holidays" which represents people's holidays. It contains a FK to a person table, a from date column and a to date column. I want to add a constraint so that no person can have a holiday that overlaps another of their own holidays. So if Billy has a skiing holiday from 15th Jan to 20th Jan, he can't have another vacation on 18th Jan, but it's fine for him to have one on 21st Jan.
Is this possible to do at database level via a constraint?
DB2 or Oracle would suffice.
Thanks
In DB2 you could use temporal tables and time travel queries - check out the documentation.
Using business time with business-period temporal tables lets you define an index which enforces that periods do not overlap:
CREATE UNIQUE INDEX I_vacation ON vacation (person, BUSINESS_TIME WITHOUT OVERLAPS)
Not directly. Constraints (at least in Oracle, I can't speak for other databases) work on one row at a time, they don't look at other rows - EXCEPT the UNIQUE constraint which looks across rows.
So - two solutions. One is, instead of storing ranges, to store one row per holiday DAY. (By the way, I believe what you call "holiday" is called "vacation", at least in America; "holiday" is reserved for common holidays, the same for all people, such as New Year or Christmas, etc.) In this arrangement, add a UNIQUE constraint on (person_id, vacation_day). Then re-work your input and reporting apps to translate from ranges to individual days, and respectively from individual days back to ranges.
The other solution, if you must store ranges, is to create a materialized view with refresh on commit (preferably fast refresh if the conditions permit), which shows person_id and vacation_day, one row per day - and put a UNIQUE constraint on the materialized view.
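For the one-row-per-day arrangement, the translation between ranges and days that the input and reporting apps need could look roughly like this. The function names and the inclusive-range convention are my own choices for illustration, and the year in the usage lines is arbitrary.

```python
from datetime import date, timedelta


def range_to_days(first, last):
    """Expand an inclusive date range into its individual days."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def days_to_ranges(days):
    """Collapse individual days back into sorted, inclusive (first, last) ranges."""
    ranges = []
    for d in sorted(set(days)):
        if ranges and d - ranges[-1][1] == timedelta(days=1):
            ranges[-1] = (ranges[-1][0], d)  # extends the current range
        else:
            ranges.append((d, d))  # starts a new range
    return ranges


if __name__ == "__main__":
    ski = range_to_days(date(2016, 1, 15), date(2016, 1, 20))
    print(len(ski))
    print(days_to_ranges(ski + [date(2016, 1, 22)]))
```

Each element of range_to_days would become one row, and the UNIQUE constraint on (person_id, vacation_day) then rejects any second vacation that shares a day. Note that two vacations on adjacent days merge into one range when read back, so the original boundaries are lost unless they are stored separately.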
You can create a stored procedure which takes the start date and end date of the current row as parameters. The procedure returns 1 if the table already contains a conflicting range and 0 otherwise. Then you create a check constraint that requires the result of this procedure to be 0.

Database Table Design Issues

I am new to DB Design and I've recently inherited the responsibility of adding some new attributes to an existing design.
Below is a sample of the current table in question:
Submission Table:
ID (int)
Subject (text)
Processed (bit)
SubmissionDate (datetime)
Submitted (bit)
...
The new requirements are:
A Submission can be marked as valid or invalid
A Reason must be provided when a Submission is marked as invalid. (So a submission may have an InvalidReason)
Submissions can be associated with one another such that: Multiple valid Submissions can be set as "replacements" for an invalid Submission.
So I've currently taken the easy solution and simply added new attributes directly to the Submission Table so that it looks like this:
NEW Submission Table:
ID (int)
Subject (text)
Processed (bit)
SubmissionDate (datetime)
Submitted (bit)
...
IsValid (bit)
InvalidReason (text)
ReplacedSubmissionID (int)
Everything works fine this way, but it just seems a little strange:
Having InvalidReason as a column that will be NULL for majority of submissions.
Having ReplacedSubmissionID as a column that will be NULL for majority of submissions.
If I understand normalization right, InvalidReason might be transitively dependent on the IsValid bit.
It just seems like somehow some of these attributes should be extracted to a separate table, but I don't see how to create that design with these requirements.
Is this single table design okay? Anyone have better alternative ideas?
Whether or not you should have a single table design really depends on
1) How you will be querying the data
2) How much data would end up being potentially NULL in the resulting table.
In your case it's probably OK, but again it depends on #1. If you will be querying separately to get information on invalid submissions, you may want to create a separate table that references the Id of invalid submissions and the reason:
New table: InvalidSubmissionInfo
Id (int) (of invalid submissions; will have FK constraint on Submission table)
InvalidReason (string)
Additionally if you will be querying for replaced submissions separately you may want to have a table just for those:
New table: ReplacementSubmissions
Id (int) (of the replacement submissions; will have FK constraint on Submission table)
ReplacedSubmissionId (int) (of what got replaced; will have FK constraint on submission table)
To get the rest of the info you will still have to join with the Submissions table.
All this to say, you do not need to separate this out into multiple tables. Having a NULL value only takes up 1 bit of memory, which isn't bad. And if you need to query and return an entire Submission record each time, it makes more sense to keep this info in one table.
Single table design looks good to me and it should work in your case.
If you do not like NULLS, you can give default value of an empty string and ReplacedSubmissionID to 0. Default values are always preferable in database design.
Having an empty string or default value will make your data look cleaner.
Please remember that if you add default values, you might need to change queries to get proper results.
For example:
Getting submissions which have not been replaced:
Select * from tblSubmission where ReplacedSubmissionID = 0
Don't fear joins. Looking for ways to place everything in a single table is at best a complete waste of time, at worst results in a convoluted, unmaintainable mess.
You are correct about InvalidReason and IsValid. However, you missed SubmissionDate and Submitted.
Whenever modeling an entity that will be processed in some way and go through consecutive state changes, these states really should be placed in a separate table. Any information concerning the state change -- date, reason for the change, authorization, etc. -- will have a functional dependency on the state rather than the entity as a whole, therefore an attempt to make the state information part of the entity tuple will fail at 2NF.
The problem this causes is shown in your very question. You already incorporated Submitted and SubmissionDate into the tuple. Now you have another state you want to add. If you had normalized the submission data, you could have simply added another state and gone on.
create table StateDefs(
ID int auto_generated primary key,
Name varchar( 16 ) not null, -- 'Submitted', 'Processed', 'Rejected', etc.
... -- any other data concerning states
);
create table Submissions(
ID int auto_generated primary key,
Subject varchar( 128 ) not null,
... -- other data
);
create table SubmissionStates(
SubID int not null references Submissions( ID ),
State int not null references StateDefs( ID ),
When date not null,
Description varchar( 128 )
);
This shows that a state consists of a date and an open text field to place any other information. That may suit your needs. If different states require different data, you may have to (gasp) create other state tables. Whatever your needs require.
You could insert the first state of a submission into the table and update that record at state changes. But you lose the history of state changes and that is useful information. So each state change would call for a new record each time. Reading the history of a submission would then be easy. Reading the current state would be more difficult.
But not too difficult:
select ss.*
from SubmissionStates ss
where ss.SubID = :SubID
and ss.When =(
select Max( When )
from SubmissionStates
where SubID = ss.SubID
and When <= Today() );
This finds the current row, that is, the row with the most recent date. To find the state that was in effect on a particular date, change Today() to something like :AsOf and place the date of interest in that variable. Storing the current date in that variable returns the current state so you can use the same query to find current or past data.
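Here is a small runnable version of that "current or as-of" lookup, sketched with Python's sqlite3 module. To keep it short I store the state name directly instead of a foreign key to StateDefs, and I rename the When column to ChangedOn because WHEN is a keyword in SQLite. The sample rows and the function name state_as_of are made up.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE SubmissionStates (
            SubID INTEGER NOT NULL,
            State TEXT NOT NULL,      -- simplified: name instead of StateDefs FK
            ChangedOn TEXT NOT NULL,  -- renamed from "When"
            Description TEXT
        );
        INSERT INTO SubmissionStates VALUES
            (1, 'Submitted', '2015-01-05', NULL),
            (1, 'Processed', '2015-01-09', NULL),
            (1, 'Rejected',  '2015-02-01', 'Missing signature');
    """)
    return conn


def state_as_of(conn, sub_id, as_of):
    """Return the state in effect on the given ISO date, or None if none yet."""
    row = conn.execute(
        """SELECT ss.State
           FROM SubmissionStates ss
           WHERE ss.SubID = :sub
             AND ss.ChangedOn = (
                 SELECT MAX(ChangedOn)
                 FROM SubmissionStates
                 WHERE SubID = ss.SubID
                   AND ChangedOn <= :as_of)""",
        {"sub": sub_id, "as_of": as_of}).fetchone()
    return row[0] if row else None
```

Passing today's date gives the current state; passing an earlier date gives the state that was in effect then, and a date before the first state change yields no row at all.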

How to create a smart key, which would reset its auto increment value at the end of the month

I am currently working on a project for the management of oil distribution, and I need the receipts of every bill to get stored in a database. I am thinking of building a smart key for the receipts which will contain the first 2 letters of the city, the gas station id, the auto increment number, first letter of the month and the last 2 digits of the year. So it will be somewhat like this:
"AA-3-0001-J15". What I am wondering is how to make the AI number go back to 0001 when the month changes. Any suggestions?
To answer the direct question - how to make the number restart at 1 at the beginning of the month.
Since it is not a simple IDENTITY column, you'll have to implement this functionality yourself.
To generate such complex value you'll have to write a user-defined function or a stored procedure. Each time you need a new value of your key to insert a new row in the table you'll call this function or execute this stored procedure.
Inside the function/stored procedure you have to make sure that it works correctly when two different sessions are trying to insert the row at the same time. One possible way to do it is to use sp_getapplock.
You didn't clarify whether the "auto increment" number is the single sequence across all cities and gas stations, or whether each city and gas station has its own sequence of numbers. Let's assume that we want to have a single sequence of numbers for all cities and gas stations within the same month. When month changes, the sequence restarts.
The procedure should be able to answer the following question when you run it: Is the row that I'm trying to insert the first row of the current month? If the generated value is the first for the current month, then the counter should be reset to 1.
One method to answer this question is to have a helper table, which would have one row for each month. One column - date, second column - last number of the sequence. Once you have such helper table your stored procedure would check: what is the current month? what is the last number generated for this month? If such number exists in the helper table, increment it in the helper table and use it to compose the key. If such number doesn't exist in the helper table, insert 1 into it and use it to compose the key.
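The helper-table logic can be sketched like this with Python's sqlite3 module. This is not SQL Server code: BEGIN IMMEDIATE stands in for sp_getapplock as the serialization step, and the table name receipt_counter and function name next_receipt_number are hypothetical.

```python
import sqlite3
from datetime import date


def open_db():
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("""
        CREATE TABLE receipt_counter (
            month TEXT PRIMARY KEY,       -- e.g. '2015-01'
            last_number INTEGER NOT NULL
        )""")
    return conn


def next_receipt_number(conn, today):
    """Return the next sequence number for today's month, starting at 1."""
    month = today.strftime("%Y-%m")
    conn.execute("BEGIN IMMEDIATE")  # only one session may proceed at a time
    try:
        row = conn.execute(
            "SELECT last_number FROM receipt_counter WHERE month = ?",
            (month,)).fetchone()
        if row is None:
            number = 1  # first receipt of this month
            conn.execute(
                "INSERT INTO receipt_counter VALUES (?, ?)", (month, number))
        else:
            number = row[0] + 1
            conn.execute(
                "UPDATE receipt_counter SET last_number = ? WHERE month = ?",
                (number, month))
        conn.execute("COMMIT")
        return number
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

No explicit reset job is needed: the first request in a new month simply finds no row for that month and starts again at 1.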
Finally, I would not recommend to make this composite value as a primary key of the table. It is very unlikely that user requirement says "make the primary key of your table like this". It is up to you how you handle it internally, as long as accountant can see this magic set of letters and numbers next to the transaction in his report and user interface. Accountant doesn't know what a "primary key" is, but you do. And you know how to join few tables of cities, gas stations, etc. together to get the information you need from a normalized database.
Oh, by the way, sooner or later you will have more than 9999 transactions per month.
Do you want to store all that in one column? That sounds to me like a composite key over four columns...
Which could look like the following:
CREATE TABLE receipts (
CityCode VARCHAR2(2),
GasStationId NUMERIC,
AutoKey NUMERIC,
MonthCode VARCHAR2(2),
PRIMARY KEY (CityCode, GasStationId, AutoKey, MonthCode)
);
Which DBMS are you using? (MySQL, MSSQL, PostgreSQL, ...?)
If it's MySQL you could have a batch-job which runs on the month's first which executes:
ALTER TABLE tablename AUTO_INCREMENT = 1
But that logic would be on application layer instead of DB-layer...
In such cases, it is best to use a User-Defined function to generate this key and then store it. Like :
Create Function MyKeyGenerator(
@city varchar(250) = '',
@gas_station_id varchar(250) = '')
AS
/*Do stuff here
*/
My guess is that you may need another small table that keeps the last generated auto-number for the month, and you may need to update it for the first record generated during the month. For the next records during the month, you fetch the number from there and increment it by 1. You can also use a stored procedure that returns an integer as a return code, just for the auto-number part, and then do the rest in a function.
By the way, note that using the first letter of the month has pitfalls, because two months can have the same first letter. Maybe try the two-digit numeric month or the first three letters of the month name.
If you do not insist that the AI number be a true identity column, you can keep it in another table as a regular non-identity integer, and then run a SQL Server Agent job that calls a stored procedure to do the incrementing.
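To illustrate the month-letter pitfall: January, June and July would all map to "J". A formatting helper along the following lines, with a two-digit month instead of the letter, avoids that. The function name format_receipt_key and the exact layout are only an assumption based on the "AA-3-0001-J15" example in the question.

```python
from datetime import date


def format_receipt_key(city, station_id, seq, on):
    """Build a display key like 'AA-3-0001-0115' (two-digit month and year)."""
    return f"{city[:2].upper()}-{station_id}-{seq:04d}-{on:%m%y}"


if __name__ == "__main__":
    print(format_receipt_key("Aachen", 3, 1, date(2015, 1, 20)))
    print(format_receipt_key("Aachen", 3, 1, date(2015, 6, 20)))
    print(format_receipt_key("Aachen", 3, 1, date(2015, 7, 20)))
```

The zero padding is a minimum width only, so a sequence number above 9999 widens the key instead of being truncated.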

Multiple Wildcard Counts in Same Query

One of my job functions is being responsible for mining and marketing on a large newsletter subscription database. Each one of my newsletters has four columns (newsletter_status, newsletter_datejoined, newsletter_dateunsub, and newsletter_unsubmid).
In addition to these columns, I also have a master unsub column that our customer service dept. can update to accommodate irate subscribers who wish to be removed from all our mailings, and another column called emailaddress_status that gets updated if a hard bounce (or a set number of soft bounces) occurs.
When I pull a count for current valid subscribers for one list I use the following syntax:
select count (*) from subscriber_db
WHERE (emailaddress_status = 'VALID' OR emailaddress_status IS NULL)
AND newsletter_status = 'Y'
and unsub = 'N' and newsletter_datejoined >= '2013-01-01';
What I'd like to have is one query that looks for all columns with %_status, with the aforementioned criteria ordered by current count size.
I'd like for it to look like this:
etc.
I've searched around the web for months looking for something similar, but other than running them in a terminal and exporting the results I've not been able to successfully get them all in one query.
I'm running PostgreSQL 9.2.3.
A proper test case would be each aggregate total matching the counts I get when running the individual queries.
Here's my obfuscated table definition for ordinal placement, column_type, char_limit, and is_nullable.
Your schema is absolutely horrifying:
24 ***_status text YES
25 ***_status text YES
26 ***_status text YES
27 ***_status text YES
28 ***_status text YES
29 ***_status text YES
where I presume the masked *** is something like the name of a publication/newsletter/etc.
You need to read about data normalization or you're going to have a problem that keeps on growing until you hit PostgreSQL's row-size limit.
Since each item of interest is in a different column the only way to solve this with your existing schema is to write dynamic SQL using PL/PgSQL's EXECUTE format(...) USING .... You might consider this as an interim option only, but it's a bit like using a pile driver to jam the square peg into the round hole because a hammer wasn't big enough.
There are no column name wildcards in SQL, like *_status or %_status. Columns are a fixed component of the row, with different types and meanings. Whenever you find yourself wishing for something like this it's a sign that your design needs to be re-thought.
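For a feel of what the dynamic-SQL interim option involves, here is a sketch that uses Python and SQLite in place of PL/pgSQL's EXECUTE format(...): it reads the column list from the catalog, then builds one count query per *_status column with the identifier quoted. The food_* and travel_* columns, the sample rows and the helper names are invented for the example, and I assume each newsletter's join date lives in a matching *_datejoined column.

```python
import sqlite3


def demo_db():
    """Tiny stand-in for subscriber_db with two made-up newsletters."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE subscriber_db (
            emailaddress_status TEXT, unsub TEXT,
            food_status TEXT, food_datejoined TEXT,
            travel_status TEXT, travel_datejoined TEXT)""")
    conn.executemany("INSERT INTO subscriber_db VALUES (?, ?, ?, ?, ?, ?)", [
        ("VALID",   "N", "Y", "2013-05-01", "Y", "2014-01-01"),
        (None,      "N", "Y", "2013-06-01", "N", "2013-06-01"),
        ("INVALID", "N", "Y", "2013-06-01", "Y", "2013-06-01"),
        ("VALID",   "Y", "Y", "2013-06-01", "Y", "2013-06-01"),
        ("VALID",   "N", "Y", "2013-03-01", "Y", "2013-02-01"),
    ])
    return conn


def quote_ident(name):
    """Quote an identifier so a column name cannot inject SQL."""
    return '"' + name.replace('"', '""') + '"'


def status_counts(conn, since="2013-01-01"):
    """Count valid subscribers per *_status column, largest count first."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(subscriber_db)")]
    results = []
    for col in columns:
        if not col.endswith("_status") or col == "emailaddress_status":
            continue
        joined = col[:-len("_status")] + "_datejoined"
        sql = (
            "SELECT COUNT(*) FROM subscriber_db "
            "WHERE (emailaddress_status = 'VALID' OR emailaddress_status IS NULL) "
            f"AND {quote_ident(col)} = 'Y' AND unsub = 'N' "
            f"AND {quote_ident(joined)} >= ?"
        )
        results.append((col, conn.execute(sql, (since,)).fetchone()[0]))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

Values still go through bind parameters; only the column names are spliced into the statement, which is why they are quoted first.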
I'm not going to write an example since (a) this is an email marketing company and (b) the "obfuscated" schema is completely unusable for any kind of testing without lots of manual work re-writing it. (In future, please provide CREATE TABLE and INSERT statements for your dummy data, or better yet, a http://sqlfiddle.com/). You'll find lots of examples of dynamic SQL in PL/PgSQL - and warnings about how to avoid the resulting SQL injection risks by proper use of format - with a quick search of Stack Overflow. I've written a bunch in the past.
Please, for your sanity and the sanity of whoever else needs to work on this system, normalize your schema.
You can create a view over the normalized tables to present the old structure, giving you time to adapt your applications. With a bit more work you can even define an INSTEAD OF view trigger (newer Pg versions) or a DO INSTEAD RULE (older Pg versions) to make the view updateable and insertable, so your app can't even tell that anything has changed - though this comes at a performance cost so it's better to adapt the app if possible.
Start with something like this:
CREATE TABLE subscriber (
id serial primary key,
email_address text not null,
-- please read http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
-- for why I merged "fname" and "lname" into one field:
realname text,
-- Store birth month/year as a "date" with a "CHECK" constraint forcing it to be the 1st day
-- of the month. Much easier to work with.
birthmonth date,
CONSTRAINT birthmonth_must_be_day_1 CHECK ( extract(day from birthmonth) = 1),
postcode text,
-- Congratulations! You made "gender" a "text" field to start with, you avoided
-- one of the most common mistakes in schema design, the boolean/binary gender
-- field!
gender text,
-- What's MSO? Should have a COMMENT ON...
mso text,
source text,
-- Maintain these with a trigger. If you want modified to update when any child record
-- changes you can do that with triggers on subscription and reducedfreq_subscription.
created_on timestamp not null default current_timestamp,
last_modified timestamp not null,
-- Use the native PostgreSQL UUID type, after running CREATE EXTENSION "uuid-ossp";
uuid uuid not null,
uuid2 uuid not null,
brand text,
-- etc etc
);
CREATE TABLE reducedfreq_subscription (
id serial primary key,
subscriber_id integer not null references subscriber(id),
-- Suspect this was just a boolean stored as text in your schema, in which case
-- delete it.
reducedfreqsub text,
reducedfreqpref text,
-- plural, might be a comma list? Should be in sub-table ("join table")
-- if so, but without sample data can only guess.
reducedfreqtopics text,
-- date can be NOT NULL since the row won't exist unless they joined
reducedfreq_datejoined date not null,
reducedfreq_dateunsub date
);
CREATE TABLE subscription (
id serial primary key,
subscriber_id integer not null references subscriber(id),
sub_name text not null,
status text not null,
datejoined date not null,
dateunsub date
);
CREATE TABLE subscriber_activity (
last_click timestamptz,
last_open timestamptz,
last_hardbounce timestamptz,
last_softbounce timestamptz,
last_successful_mailing timestamptz
);
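With one row per subscription, the count-per-newsletter report from the question becomes an ordinary GROUP BY. The sketch below runs it on SQLite through Python's sqlite3 module. The email_status and unsub columns on subscriber are my assumption about where those flags would end up (the schema above does not place them), and the sample rows are invented.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE subscriber (
            id INTEGER PRIMARY KEY,
            email_address TEXT NOT NULL,
            email_status TEXT,              -- assumed home of emailaddress_status
            unsub TEXT NOT NULL DEFAULT 'N' -- assumed home of the master unsub flag
        );
        CREATE TABLE subscription (
            id INTEGER PRIMARY KEY,
            subscriber_id INTEGER NOT NULL REFERENCES subscriber(id),
            sub_name TEXT NOT NULL,
            status TEXT NOT NULL,
            datejoined TEXT NOT NULL,
            dateunsub TEXT
        );
        INSERT INTO subscriber (id, email_address, email_status, unsub) VALUES
            (1, 'a@example.com', 'VALID', 'N'),
            (2, 'b@example.com', NULL,    'N'),
            (3, 'c@example.com', 'VALID', 'Y');
        INSERT INTO subscription (subscriber_id, sub_name, status, datejoined) VALUES
            (1, 'food',   'Y', '2013-02-01'),
            (2, 'food',   'Y', '2013-03-01'),
            (3, 'food',   'Y', '2013-03-01'),
            (1, 'travel', 'Y', '2013-04-01'),
            (2, 'travel', 'N', '2013-04-01'),
            (2, 'homes',  'Y', '2012-06-01');
    """)
    return conn


def subscriber_counts(conn, since="2013-01-01"):
    """Valid current subscribers per newsletter, largest list first."""
    return conn.execute(
        """SELECT s.sub_name, COUNT(*) AS subscribers
           FROM subscription s
           JOIN subscriber p ON p.id = s.subscriber_id
           WHERE (p.email_status = 'VALID' OR p.email_status IS NULL)
             AND p.unsub = 'N'
             AND s.status = 'Y'
             AND s.datejoined >= ?
           GROUP BY s.sub_name
           ORDER BY subscribers DESC""",
        (since,)).fetchall()
```

Adding a new newsletter then means inserting rows, not adding four more columns, and the report picks it up without any change.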
To call it merely "horrifying" shows a great deal of tact and kindness on your part. Thank You. :) I inherited this schema only recently (which was originally created by the folks at StrongMail).
I have a full relational DB re-arch project on my roadmap this year - the sample normalization is very much in line with what I'd been working on. Very interesting insight on realname, I hadn't really thought about that. I suppose the only reason StrongMail had it broken out was for first name email personalization.
MSO is multiple systems operator (cable company). We're a large lifestyle media company, and the newsletters we produce are on food, travel, homes and gardening.
I'm creating a Fiddle for this - I'm new here so going forward I'll be more mindful of what you guys need to be able to help. Thank you!

How can I prevent date overlaps in SQL?

I have the following table structure for a hire table:
hireId int primary key
carId int not null foreign key
onHireDate datetime not null
offHireDate datetime not null
I am attempting to program a multi-user system that does not allow onhire and offhire period for cars to overlap. I need to be able to add hires in a non sequential order. Also need to allow editing of hires.
Any way to constrain the tables or use triggers etc to prevent overlaps? I am using entity framework so I would want to insert to the table as normal and then if it fails throw some catchable exception etc.
Consider this query:
SELECT *
FROM Hire AS H1, Hire AS H2
WHERE H1.carId = H2.carId
AND H1.hireId < H2.hireId
AND
CASE
WHEN H1.onHireDate > H2.onHireDate THEN H1.onHireDate
ELSE H2.onHireDate END
<
CASE
WHEN H1.offHireDate > H2.offHireDate THEN H2.offHireDate
ELSE H1.offHireDate END
If all rows meet your business rule then this query will return the empty set (assuming closed-open representation of periods, i.e. where the end date is the earliest time granule that is not considered within the period).
Because SQL Server does not support subqueries within CHECK constraints, put the same logic in a trigger (but not an INSTEAD OF trigger, unless you can provide logic to resolve overlaps).
Alternative query using Fowler:
SELECT *
FROM Hire AS H1, Hire AS H2
WHERE H1.carId = H2.carId
AND H1.hireId < H2.hireId
AND H1.onHireDate < H2.offHireDate
AND H2.onHireDate < H1.offHireDate;
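To show the same overlap predicate enforced by a trigger, with the failure surfacing as a catchable exception, here is a sketch using SQLite through Python's sqlite3 module. SQLite's trigger syntax differs from SQL Server's, so treat this as an illustration of the logic only; the trigger names and helper functions are hypothetical, and dates are stored as ISO-8601 text.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Hire (
    hireId      INTEGER PRIMARY KEY,
    carId       INTEGER NOT NULL,
    onHireDate  TEXT NOT NULL,
    offHireDate TEXT NOT NULL,
    CHECK (onHireDate < offHireDate)
);
CREATE TRIGGER hire_no_overlap_insert BEFORE INSERT ON Hire
BEGIN
    SELECT RAISE(ABORT, 'hire period overlaps an existing hire')
    WHERE EXISTS (
        SELECT 1 FROM Hire AS h
        WHERE h.carId = NEW.carId
          AND h.onHireDate < NEW.offHireDate
          AND NEW.onHireDate < h.offHireDate);
END;
CREATE TRIGGER hire_no_overlap_update BEFORE UPDATE ON Hire
BEGIN
    SELECT RAISE(ABORT, 'hire period overlaps an existing hire')
    WHERE EXISTS (
        SELECT 1 FROM Hire AS h
        WHERE h.carId = NEW.carId
          AND h.hireId <> OLD.hireId   -- a hire never clashes with itself
          AND h.onHireDate < NEW.offHireDate
          AND NEW.onHireDate < h.offHireDate);
END;
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_hire(conn, car_id, on_hire, off_hire):
    """Insert as normal; report False if the database rejects the row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Hire (carId, onHireDate, offHireDate) VALUES (?, ?, ?)",
                (car_id, on_hire, off_hire))
        return True
    except sqlite3.DatabaseError:
        return False


def edit_hire(conn, hire_id, on_hire, off_hire):
    """Change the period of an existing hire, subject to the same rule."""
    try:
        with conn:
            conn.execute(
                "UPDATE Hire SET onHireDate = ?, offHireDate = ? WHERE hireId = ?",
                (on_hire, off_hire, hire_id))
        return True
    except sqlite3.DatabaseError:
        return False
```

Hires can be added in any order, and with the closed-open convention a hire that starts exactly when another ends is accepted.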
CREATE TRIGGER tri_check_date_overlap ON your_table
INSTEAD OF INSERT
AS
BEGIN
IF @@ROWCOUNT = 0
RETURN
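-- check the rows in 'INSERTED' for overlaps with existing hires of the same car
IF EXISTS(
SELECT t.hireId FROM your_table AS t
JOIN INSERTED AS i ON i.carId = t.carId
WHERE i.onHireDate < t.offHireDate
AND t.onHireDate < i.offHireDate
)
BEGIN
-- exception? or do nothing?
RAISERROR('Hire period overlaps an existing hire.', 16, 1)
END
ELSE
BEGIN
-- no overlap: perform the insert that this INSTEAD OF trigger intercepted
-- (assumes hireId is an identity column)
INSERT INTO your_table (carId, onHireDate, offHireDate)
SELECT carId, onHireDate, offHireDate FROM INSERTED
END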
END
GO
I am now using domain driven design techniques. This keeps me from worrying about database triggers etc. and keeps all the logic within the .Net code. Answer is here so that people can see an alternative.
I treat the parent of the collection of periods as an aggregate root, and the root has a timestamp. An aggregate root is a consistency boundary for transactions, distribution and concurrency (see Evans - what I've learned since the blue book). The timestamp is used to check the last time any change was confirmed to the database.
I then have methods to add the hire periods to the parent. I test for overlap when adding the movement to the aggregate root, e.g. AddHire(start, end) will validate that this creates no overlap on the in-memory domain object.
If there is no overlap, I save the changes (via my repository) and check that the database timestamp is still the same as at the start of the process. Assuming the timestamp is the same as it was when I retrieved the entity, the changes are persisted and the database updates the timestamp.
If someone else tries to save changes while the aggregate root is being worked on, then either I will commit first or they will. Whoever commits second will find that the timestamps do not match, and the overlap check is re-run against the fresh data to make sure no overlap was created in the intervening time.
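A stripped-down sketch of that flow, in Python rather than .NET and with an in-memory dictionary standing in for the database: the class and method names are mine, and an integer version counter plays the role of the timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime


class OverlapError(Exception):
    """The new hire period overlaps an existing one."""


class ConcurrencyError(Exception):
    """Someone else saved the aggregate since it was loaded."""


@dataclass
class Car:
    """Aggregate root owning its collection of hire periods."""
    car_id: int
    version: int = 0                      # stand-in for the timestamp
    hires: list = field(default_factory=list)

    def add_hire(self, start, end):
        if start >= end:
            raise ValueError("start must be before end")
        for s, e in self.hires:
            if start < e and s < end:     # closed-open overlap test
                raise OverlapError((s, e))
        self.hires.append((start, end))


class CarRepository:
    """In-memory stand-in for the database-backed repository."""

    def __init__(self):
        self._store = {}                  # car_id -> (version, hires)

    def get(self, car_id):
        version, hires = self._store.get(car_id, (0, ()))
        return Car(car_id, version, list(hires))

    def save(self, car):
        current_version, _ = self._store.get(car.car_id, (0, ()))
        if current_version != car.version:
            raise ConcurrencyError(car.car_id)
        car.version += 1
        self._store[car.car_id] = (car.version, tuple(car.hires))
```

A caller that gets ConcurrencyError reloads the aggregate and calls add_hire again, at which point the overlap check sees the other user's changes. In a real database the version comparison and the write would have to happen atomically, for example as a single UPDATE ... WHERE version = ... statement.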