SQL Scheduling - Overbooked Report - sql

I need a way to view which resources (in this case rooms/beds) are overbooked. Here's my table structure (sorry about the Hungarian notation):
tblRoom
--RoomID
tblBooking
--BookingID
--BeginDate
--EndDate
--AssignedRoomID (foreign key)
I don't have any non-working SQL to post here because I really don't know where to start. I'm using MS Access, but I'm looking for a database-agnostic solution if possible. It's OK to have to change some of the keywords to match the dialect of a given SQL engine, but I'd like to avoid other features that are proprietary or available in only one RDBMS.
I realize that it's best to avoid overbooking from the beginning but that's not the point of this question.
In case it's helpful, I posted a related question a couple of days ago about how to find resources that are not yet booked for a given date range. You can see that question here.
Edit1:
In reply to the answer below, I've modified your SQL slightly to make it work in Access and to detect conflicts more accurately. If I'm not mistaken, the solution posted below lets some conflicts go unnoticed, and it also reports a conflict when one booking's EndDate and another booking's BeginDate fall on the same day, which is allowable and should not show as a conflict. Am I understanding this correctly, or am I missing something?
SELECT
*
FROM
tblBooking AS booking
INNER JOIN
tblBooking AS conflict
ON [conflict].AssignedRoomID = [booking].AssignedRoomID
AND (([conflict].BeginDate >= DateAdd("d", -1, [booking].BeginDate) AND [conflict].BeginDate < [booking].EndDate)
OR ([conflict].EndDate > [booking].BeginDate AND [conflict].EndDate < [booking].EndDate))
AND [conflict].BookingID <> [booking].BookingID
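For the rule described in Edit1 (a stay may begin on the same day another ends), one common way to express "overlapping" is the two-sided test with strict inequalities instead of a DateAdd offset. The sketch below uses Python's built-in sqlite3 module and made-up rows as a portable stand-in for Access. The SQL itself should carry over to Access with little change, but that has not been tested there.

```python
import sqlite3

# Hypothetical sample data mirroring tblBooking; dates are stored as ISO-8601
# text so that string comparison matches date order in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tblBooking (
    BookingID INTEGER PRIMARY KEY,
    BeginDate TEXT NOT NULL,
    EndDate TEXT NOT NULL,
    AssignedRoomID INTEGER NOT NULL)""")
conn.executemany(
    "INSERT INTO tblBooking VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-01", "2024-01-05", 10),  # checks out on the 5th
        (2, "2024-01-05", "2024-01-08", 10),  # checks in on the 5th: allowed
        (3, "2023-12-30", "2024-01-10", 10),  # wholly contains 1 and 2
        (4, "2024-01-02", "2024-01-04", 20),  # different room
    ],
)

# Two stays overlap when each begins strictly before the other ends.
# The strict < lets one booking's EndDate equal another's BeginDate.
# conflict.BookingID > booking.BookingID reports each pair only once.
conflicts = conn.execute("""
    SELECT booking.BookingID, conflict.BookingID
    FROM tblBooking AS booking
    INNER JOIN tblBooking AS conflict
        ON conflict.AssignedRoomID = booking.AssignedRoomID
       AND conflict.BeginDate < booking.EndDate
       AND conflict.EndDate > booking.BeginDate
       AND conflict.BookingID > booking.BookingID
    ORDER BY 1, 2
""").fetchall()
print(conflicts)  # [(1, 3), (2, 3)]
```

Bookings 1 and 2 share only the changeover day, so they are not reported. Booking 3 spans both of them and is caught even though it starts earlier.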

So, what you're looking for is any record in tblBooking for which there is another record with the same AssignedRoomID for an overlapping period?
A naive solution would be...
SELECT
*
FROM
tblBooking [booking]
INNER JOIN
tblBooking [conflict]
ON [conflict].AssignedRoomID = [booking].AssignedRoomID
AND [conflict].BeginDate <= [booking].EndDate
AND [conflict].EndDate >= [booking].BeginDate
AND [conflict].BookingID <> [booking].BookingID
The last condition stops a booking from being its own conflict. It can also be changed to AND [conflict].BookingID > [booking].BookingID so that each conflict is reported only once. (If A conflicts with B, you get A,B but not B,A.)
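A quick way to see the difference between the two forms of the last condition is to run both against a few rows. This is a sketch using Python's sqlite3 module with hypothetical data. Note that the inclusive <= / >= test also flags bookings that only touch on the same day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblBooking (BookingID INTEGER PRIMARY KEY, "
             "BeginDate TEXT, EndDate TEXT, AssignedRoomID INTEGER)")
conn.executemany("INSERT INTO tblBooking VALUES (?, ?, ?, ?)", [
    (1, "2024-01-01", "2024-01-05", 10),
    (2, "2024-01-05", "2024-01-08", 10),  # touches booking 1 on the 5th
    (3, "2024-01-20", "2024-01-22", 10),  # no overlap
])

def conflicts(id_test):
    # id_test is one of two fixed operators: "<>" (every ordered pair)
    # or ">" (each pair once); it is never taken from user input.
    sql = f"""
        SELECT booking.BookingID, conflict.BookingID
        FROM tblBooking AS booking
        INNER JOIN tblBooking AS conflict
            ON conflict.AssignedRoomID = booking.AssignedRoomID
           AND conflict.BeginDate <= booking.EndDate
           AND conflict.EndDate >= booking.BeginDate
           AND conflict.BookingID {id_test} booking.BookingID
        ORDER BY 1, 2"""
    return conn.execute(sql).fetchall()

both_ways = conflicts("<>")
once = conflicts(">")
print(both_ways)  # [(1, 2), (2, 1)]
print(once)       # [(1, 2)]
```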
EDIT
The issue with the above solution is that it does not scale very well. When searching for a conflict, the query first finds every booking for that room that begins before the booking's EndDate, and only then filters on EndDate. After a few years, that first search (hopefully using an index) will return many, many records.
One optimisation is to have a maximum booking length, and only look that many days back in time for a conflict...
INNER JOIN
tblBooking [conflict]
ON [conflict].AssignedRoomID = [booking].AssignedRoomID
AND [conflict].BeginDate <= [booking].EndDate
AND [conflict].BeginDate >= [booking].BeginDate - 7 -- Or however long the max booking length is
AND [conflict].EndDate >= [booking].BeginDate
AND [conflict].BookingID <> [booking].BookingID
Because [conflict].BeginDate is now bounded by both a >= and a <=, an index search can quickly return a reasonably limited number of records.
Bookings longer than the maximum booking length can be entered into the database as multiple bookings. That's where the art of optimisation comes in: it's often all about trade-offs and compromises :)
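The sketch below checks that the bounded search returns the same conflicts as the naive one, provided no stay is longer than the assumed maximum. It uses Python's sqlite3 module. `date(x, '-7 days')` is SQLite's spelling of the `[booking].BeginDate - 7` arithmetic, and the index name is hypothetical.

```python
import sqlite3

MAX_STAY_DAYS = 7  # assumed maximum booking length, as in the answer above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblBooking (BookingID INTEGER PRIMARY KEY, "
             "BeginDate TEXT, EndDate TEXT, AssignedRoomID INTEGER)")
# Composite index so the BeginDate range can be an index seek within a room.
conn.execute("CREATE INDEX ix_room_begin ON tblBooking (AssignedRoomID, BeginDate)")
conn.executemany("INSERT INTO tblBooking VALUES (?, ?, ?, ?)", [
    (1, "2024-01-01", "2024-01-07", 10),
    (2, "2024-01-06", "2024-01-09", 10),
    (3, "2024-03-01", "2024-03-03", 10),
    (4, "2024-03-03", "2024-03-05", 10),
])

BASE = """
    SELECT booking.BookingID, conflict.BookingID
    FROM tblBooking AS booking
    INNER JOIN tblBooking AS conflict
        ON conflict.AssignedRoomID = booking.AssignedRoomID
       AND conflict.BeginDate <= booking.EndDate
       AND conflict.EndDate >= booking.BeginDate
       AND conflict.BookingID <> booking.BookingID
"""
naive = conn.execute(BASE + " ORDER BY 1, 2").fetchall()

# Extra lower bound appended to the ON clause: if no stay exceeds
# MAX_STAY_DAYS, a conflict cannot start more than that many days earlier.
windowed = conn.execute(
    BASE + " AND conflict.BeginDate >= date(booking.BeginDate, ?) ORDER BY 1, 2",
    (f"-{MAX_STAY_DAYS} days",),
).fetchall()
print(windowed == naive)  # True
```

If a single stay longer than MAX_STAY_DAYS were inserted, the windowed query could miss its conflicts. That is why such stays would need to be split into several bookings.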
EDIT
Another option, giving different details, would be to join the bookings against a calendar table. (Having, for example, one record per day.)
SELECT
[room].RoomID,
[calendar].Date,
COUNT(*) AS [total_bookings],
MIN([booking].BookingID) AS [min_booking_id],
MAX([booking].BookingID) AS [max_booking_id]
FROM
[calendar]
CROSS JOIN
tblRoom [room]
INNER JOIN
tblBooking [booking]
ON [booking].AssignedRoomID = [room].RoomID
AND [booking].BeginDate <= [calendar].Date
AND [booking].EndDate >= [calendar].Date
GROUP BY
[room].RoomID,
[calendar].Date
HAVING
COUNT(*) > 1
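The calendar-table approach can be tried end to end with a small sketch in Python's sqlite3 module. The calendar table and its January 2024 contents are hypothetical. The query keeps the answer's inclusive comparisons, so a same-day turnover shows up as an overbooked day.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblRoom (RoomID INTEGER PRIMARY KEY);
    CREATE TABLE tblBooking (BookingID INTEGER PRIMARY KEY, BeginDate TEXT,
                             EndDate TEXT, AssignedRoomID INTEGER);
    CREATE TABLE calendar (Date TEXT PRIMARY KEY);
    INSERT INTO tblRoom VALUES (10), (20);
    INSERT INTO tblBooking VALUES
        (1, '2024-01-01', '2024-01-03', 10),
        (2, '2024-01-03', '2024-01-04', 10),
        (3, '2024-01-10', '2024-01-12', 20);
""")

# Fill the hypothetical calendar table with one ISO date per day of January 2024.
first_day = date(2024, 1, 1)
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [((first_day + timedelta(days=n)).isoformat(),) for n in range(31)])

overbooked = conn.execute("""
    SELECT room.RoomID, calendar.Date, COUNT(*) AS total_bookings,
           MIN(booking.BookingID) AS min_booking_id,
           MAX(booking.BookingID) AS max_booking_id
    FROM calendar
    CROSS JOIN tblRoom AS room
    INNER JOIN tblBooking AS booking
        ON booking.AssignedRoomID = room.RoomID
       AND booking.BeginDate <= calendar.Date
       AND booking.EndDate >= calendar.Date
    GROUP BY room.RoomID, calendar.Date
    HAVING COUNT(*) > 1
    ORDER BY room.RoomID, calendar.Date
""").fetchall()
print(overbooked)  # [(10, '2024-01-03', 2, 1, 2)]
```

Suppose a booking occupies the nights from BeginDate up to, but not including, EndDate. In that case, changing the last condition to `booking.EndDate > calendar.Date` would stop same-day turnovers from being counted.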

Related

Duplicate Results in Oracle SQL Plus

When I run the script below, I keep getting duplicate results, even when using distinct.
select distinct
a.SDT, a.fNo, b.IDType, b.pNo, b.pfName, b.plName, b.PDoB, b.Street, b.City, c.Phone
from Scheduled_Flight a, Passenger b, pass_Phone c
where fNo = '0000021'
and
a.SDT = '08-sep-2017 17:30';
I am new to SQL, and any help in solving this issue would be much appreciated.
"I keep getting duplicate results, even when using distinct"
You are not getting duplicates in your result set. Rather, you have a Cartesian product: every combination of ONE flight, THREE passengers and THREE phone numbers. Each record in the set is unique, so DISTINCT doesn't have any effect.
The problem is that you have no join conditions in your FROM clause. There should be a column on PASSENGER that is a foreign key to SCHEDULED_FLIGHT, and a column on PASS_PHONE that is a foreign key to PASSENGER.
It is easy to fix: you just need to join the tables. Assuming your data model is consistent, your query should look like this (and you don't need DISTINCT):
select a.SDT, a.fNo, b.IDType, b.pNo, b.pfName, b.plName, b.PDoB, b.Street, b.City,c.Phone
from Scheduled_Flight a
join Passenger b on b.fNo = a.fNo
join pass_Phone c on c.pNo = b.pNo
where a.fNo = '0000021'
and a.SDT = '08-sep-2017 17:30';
However, I notice that in your version of the query you didn't prefix fNo. That makes me think you don't have a column of that name on passenger (otherwise the query would have failed on ORA-00918: column ambiguously defined). So, either the foreign key columns are named differently or you haven't got them.
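The row counts involved can be reproduced with a small sketch in Python's sqlite3 module, using made-up rows and trimmed columns. The Passenger.fNo and pass_Phone.pNo foreign-key columns are the same assumption the answer makes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Scheduled_Flight (fNo TEXT, SDT TEXT);
    CREATE TABLE Passenger (pNo INTEGER, fNo TEXT, pfName TEXT);
    CREATE TABLE pass_Phone (pNo INTEGER, Phone TEXT);
    INSERT INTO Scheduled_Flight VALUES ('0000021', '2017-09-08 17:30');
    INSERT INTO Passenger VALUES (1, '0000021', 'Ann'), (2, '0000021', 'Ben'),
                                 (3, '0000021', 'Cy');
    INSERT INTO pass_Phone VALUES (1, '555-0101'), (2, '555-0102'), (3, '555-0103');
""")

# No join conditions: the flight row is paired with every passenger and
# every phone (1 x 3 x 3 rows), and each combination is distinct.
product = conn.execute("""
    SELECT DISTINCT a.fNo, b.pNo, b.pfName, c.Phone
    FROM Scheduled_Flight a, Passenger b, pass_Phone c
    WHERE a.fNo = '0000021'
""").fetchall()

# Explicit joins on the assumed foreign-key columns: one row per passenger.
joined = conn.execute("""
    SELECT a.fNo, b.pNo, b.pfName, c.Phone
    FROM Scheduled_Flight a
    JOIN Passenger b ON b.fNo = a.fNo
    JOIN pass_Phone c ON c.pNo = b.pNo
    WHERE a.fNo = '0000021'
""").fetchall()

print(len(product), len(joined))  # 9 3
```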
"Is it possible to specify only the date without the time?"
Yep. Use an ANSI date literal e.g. date '2017-09-08'
"Is it possible to specify only the date without the time to still produce results from the database?"
That depends on how the data is stored. Oracle dates always have a time element. If no time is specified (or the time element is truncated), the time defaults to midnight. This often catches beginners out, for instance because the pseudo-column sysdate returns the current date and time, not just the current date.
So, if you know the dates are stored in your table without a time element you can do this:
where a.sdt = date '2017-09-08'
But if you don't know that, you can truncate ...
where trunc(a.sdt) = date '2017-09-08'
or test for a range
where a.sdt >= date '2017-09-08'
and a.sdt < date '2017-09-09'
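The three options behave differently when times are present. The sketch below uses Python's sqlite3 module with text timestamps. SQLite's date() plays the role of Oracle's TRUNC() here, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scheduled_Flight (fNo TEXT, SDT TEXT)")
conn.executemany("INSERT INTO Scheduled_Flight VALUES (?, ?)", [
    ("0000021", "2017-09-08 17:30:00"),
    ("0000022", "2017-09-08 00:00:00"),
    ("0000023", "2017-09-09 06:15:00"),
])

def flights(where, *params):
    # where is one of the fixed conditions below; values are bound as parameters.
    sql = f"SELECT fNo FROM Scheduled_Flight WHERE {where} ORDER BY fNo"
    return [row[0] for row in conn.execute(sql, params)]

# Equality with a bare date only matches rows whose time is midnight.
exact = flights("SDT = ?", "2017-09-08 00:00:00")
# Truncate, then compare: correct, but wrapping the column in a function
# usually prevents a plain index on SDT from being used.
truncated = flights("date(SDT) = ?", "2017-09-08")
# Half-open range: the same rows, and the bare column stays index-friendly.
ranged = flights("SDT >= ? AND SDT < ?", "2017-09-08", "2017-09-09")
print(exact, truncated, ranged)
```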
"How come the following code is still producing duplicate results?
select distinct r.sNo, r.tCode, s.fNo, s.SDT
from Airplane r, Scheduled_Flight s
where SDT >= SYSDATE -1;
The airplane attribute cannot have the s.SDT attribute."
Without seeing the output I can't be sure but I would bet that this query does not produce duplicate records either. What you have is a product combining all your AIRPLANE records with all your FLIGHT records matching the sdt filter.
This is another data modelling problem. Of course aeroplanes don't have a flight time: one aeroplane makes many flights. But it makes perfect sense for a flight to be assigned to a plane. In fact that's crucial to ensuring that you don't have more flights than you have planes to fly them, and that one plane isn't planned to take off from London for Madrid at a time when it's planned to be half-way to Hong Kong.
You really should use the ANSI-92 join syntax, as I showed in my answer to your previously posted code. Explicit joins not only make the query easier to understand but also prevent mistakes like this one. The fact that you apparently don't have any candidate columns for the join condition immediately highlights the flaw in the data model.
select distinct r.sNo, r.tCode, s.fNo, s.SDT
from Airplane r
INNER JOIN Scheduled_Flight s ON ????
where SDT >= SYSDATE -1;
I don't see any duplicated rows. If you compare every column of each row, each row is unique. You are getting multiple records because you are doing a Cartesian product, but the rows are still distinct from each other.

SQL query seems to work for 'AND T1.email_address_ IN (subquery)', but returns 0 rows for 'AND T1.email_address_ NOT IN (subquery)'

Good morning. I'm working in Responsys Interact, an Oracle-based SaaS product for email campaign management. I'm creating a query to filter a target list for an email campaign aimed at a specific subset of our master email contact list. Here's the query I created a few weeks ago that appears to work:
/*
Table Symbolic Name
CONTACTS_LIST $A$
Engaged $B$
TRANSACTIONS_RAW $C$
TRANSACTION_LINES_RAW $D$
-- A Responsys Filter (Engaged) will return only an RIID_, nothing else, according to John # Responsys....so,....let's join on that to contact list...
*/
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$A$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
;
This seems to work, in that if I run the query without the sub-query, we get 720K rows; and if I add back the 'AND NOT IN...' subquery, we get about 700K rows, which appears correct based on what my user knows about her data. What I'm (supposedly) doing with the NOT IN subquery is filtering out any email addresses where the customer has purchased certain items from us in the last 60 days.
So, now I need to add in another constraint. We still don't want customers who made certain purchases in the last 60 days as above, but now also we want to exclude customers who have purchased another particular item, but now within the last 12 months. So, I thought I would add another subquery, as shown below. Now, this has introduced several problems:
Performance - the query, which took a couple minutes to run before, now takes quite a few more minutes to run - in fact it seems to time out....
So, I wondered if there's an issue having two subqueries, but before I went to think about alternatives to this, I decided to test my new subquery by temporarily deleting the first subquery, so that I had just one subquery similar to above, but with the new item = 11 and within the last 12 months logic. And so with this, the query finally returned after a few minutes now, but with zero rows.
Trying to figure out why, I tried simply changing the AND NOT IN (subquery) to AND IN (subquery), and that worked, in that it returned a few thousand rows, as expected.
So why would the same SQL "work" with AND IN (subquery), while the exact same SQL changed only to AND NOT IN (subquery) returns zero rows? I would expect it to return my 700-something thousand rows, less the couple of thousand returned by the subquery.
Also, what is the best i.e. most performant way to accomplish what I'm trying to do, which is filter by some purchases made within one date range, AND by some other purchases made within a different date range?
Here's the modified version:
SELECT
DISTINCT $A$.EMAIL_ADDRESS_,
$A$.RIID_,
$A$.FIRST_NAME,
$A$.LAST_NAME,
$A$.EMAIL_PERMISSION_STATUS_
FROM
$A$
JOIN $B$ ON $B$.RIID_ = $A$.RIID_
LEFT JOIN $C$ ON $C$.EMAIL_ADDRESS_ = $A$.EMAIL_ADDRESS_
LEFT JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$A$.EMAIL_DOMAIN_ NOT IN ('none.com', 'noemail.com', 'mailinator.com', 'nomail.com') AND
/* don't include hp customers */
$A$.HP_PLAN_START_DATE IS NULL AND
$A$.EMAIL_ADDRESS_ NOT IN
(
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
/* Get only purchase transactions for certain item_id's/SKU's */
($D$.ITEM_FAMILY_ID IN (3,4,5,8,14,15) OR $D$.ITEM_ID IN (704,769,1893,2808,3013) ) AND
/* .... within last 60 days (i.e. 2 months) */
$C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -2)
)
AND
$A$.EMAIL_ADDRESS_ NOT IN
(
/* get purchase transactions for another type of item within last year */
SELECT
$C$.EMAIL_ADDRESS_
FROM
$C$
JOIN $D$ ON $D$.TRANSACTION_ID = $C$.TRANSACTION_ID
WHERE
$D$.ITEM_FAMILY_ID = 11 AND $C$.TRANDATE > ADD_MONTHS(CURRENT_TIMESTAMP, -12)
)
;
Thanks for any ideas/insights. I may be missing or mis-remembering some basic SQL concept here - if so please help me out! Also, Responsys Interact runs on top of Oracle - it's an Oracle product - but I don't know off hand what version/flavor. Thanks!
Looks like my problem with the new subquery was due to poor performance due to lack of indexes. Thanks to Alex Poole's comments, I looked in Responsys and there is a facility to get an 'explain' type analysis, and it was throwing warnings, and suggesting I build some indexes. Found the way to do that on the data sources, went back to the explain, and it said, "The query should run without placing an unnecessary burden on the system". And while it still ran for quite a few minutes, it did finally come back with close to the expected number of rows.
Now, I'm on to tackle the other half of the issue, which is to now incorporate this second sub-query in addition to the first, original subquery....
OK, after further testing and analysis, and after refining my Stack Overflow search criteria, I found the answer to the main part of my question (IN vs. NOT IN) here: SQL "select where not in subquery" returns no results
Using Responsys's explain-like feature and adding some indexes helped my performance. While doing that, though, I had also added a little extra SQL to my subquery's WHERE clause. When I removed it, even with the indexes built, I was back to zero rows returned.
It turned out that at least one of the transaction rows for the item family ID I was interested in for this additional subquery had a NULL email address. As explained in the link above, NOT IN can never be true once the subquery returns a NULL. Comparing an address with NULL yields UNKNOWN, so x NOT IN (..., NULL) is either FALSE or UNKNOWN, and the WHERE clause keeps neither. Hence zero rows. IN behaves differently: even with NULLs present, a single positive match makes it TRUE, so you still get rows with IN but not with NOT IN.
I hadn't realized that some of our transaction data might have NULL email addresses. Now I know, so I just added an IS NOT NULL check on the email address to the WHERE clause, and now all's good.
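The NULL behaviour can be reproduced in a few lines. This sketch uses Python's sqlite3 module with simplified stand-in tables: contacts for $A$, and transactions for $C$ already joined to $D$. The sketch also shows NOT EXISTS, a common alternative to NOT IN that NULLs in the subquery do not affect.

```python
import sqlite3

# Hypothetical stand-ins: contacts for $A$, transactions for $C$ joined to $D$.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (EMAIL_ADDRESS_ TEXT);
    CREATE TABLE transactions (EMAIL_ADDRESS_ TEXT, ITEM_FAMILY_ID INTEGER);
    INSERT INTO contacts VALUES ('a@example.com'), ('b@example.com'), ('c@example.com');
    INSERT INTO transactions VALUES ('a@example.com', 11), (NULL, 11);
""")

def emails(sql):
    return [row[0] for row in conn.execute(sql)]

# 'b@example.com' NOT IN ('a@example.com', NULL) is UNKNOWN, so no row survives.
not_in = emails("""
    SELECT EMAIL_ADDRESS_ FROM contacts
    WHERE EMAIL_ADDRESS_ NOT IN
        (SELECT EMAIL_ADDRESS_ FROM transactions WHERE ITEM_FAMILY_ID = 11)
    ORDER BY 1""")

# The fix described above: keep NULLs out of the subquery.
not_in_filtered = emails("""
    SELECT EMAIL_ADDRESS_ FROM contacts
    WHERE EMAIL_ADDRESS_ NOT IN
        (SELECT EMAIL_ADDRESS_ FROM transactions
         WHERE ITEM_FAMILY_ID = 11 AND EMAIL_ADDRESS_ IS NOT NULL)
    ORDER BY 1""")

# NOT EXISTS only asks whether a matching row exists, so NULLs cannot block it.
not_exists = emails("""
    SELECT c.EMAIL_ADDRESS_ FROM contacts c
    WHERE NOT EXISTS (SELECT 1 FROM transactions t
                      WHERE t.ITEM_FAMILY_ID = 11
                        AND t.EMAIL_ADDRESS_ = c.EMAIL_ADDRESS_)
    ORDER BY 1""")

print(not_in)           # []
print(not_in_filtered)  # ['b@example.com', 'c@example.com']
print(not_exists)       # ['b@example.com', 'c@example.com']
```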

Splitting one table based on criteria and comparing

I'm not quite sure of the best way to phrase this particular query, so I hope the title is adequate. I will attempt to describe what I need to understand how to do. Just to clarify, this is for Oracle SQL.
We have a table called assessments. There are different kinds of assessments within this table, however, some assessments should follow others in a logical order and within set time frames. The problems come in when a client has multiple assessments of the same type, as we have to use a fairly inefficient array formula in excel to identify which 'full' assessment corresponds with the 'initial' assessment.
I have an earlier query that was resolved on this site (Returning relevant date from multiple tables including additional table info), which I believe includes a lot of the logic required, particularly for identifying a corresponding event that occurred within a specified timeframe. However, whilst that query pulls data from 3 separate tables (assessments, events, responsibilities), I now need a query that generates a similar outcome but pulls from 1 main table plus a 2nd table that returns worker information. I thought the most logical way would be to create a query that looks at the assessment table for one type of assessment, and then joins to the assessment table again (possibly a temporary table?) for the assessment type that would follow the initial one.
For example:
Table 1 (Assessments):
Client | ID | Assessment Type | Start      | End
P1     | 1  | Initial         | 01/01/2012 | 05/01/2012
Table 2 (Assessments temp?):
Client | ID | Assessment Type | Start      | End
P1     | 2  | Full            | 12/01/2012 |
Table 3:
ID | Worker | Team
1  | Bob    | Team1
2  | Lyn    | Team2
Result:
Client | ID | Initial Start | Initial End | Initial Worker | Full Start | Full End
P1     | 1  | 01/01/2012    | 05/01/2012  | Bob            | 12/01/2012 |
So table 1 and table 2 draw from the same table, except that each brings back different assessments. Ideally, there'd be a check to make sure that the 'full' assessment started within X days of the end of the 'initial' assessment (similar to the 'likely' check in the previous query mentioned earlier). If this can be achieved, it's probably worth mentioning that I'd also be interested in expanding this to look at multiple assessment types, as over the cycle a client could be expected to have 4 or 5 different types of assessment. Any pointers would be appreciated. I've already had a great deal of help from this community, which is very valuable.
Edit:
Edited to include solution following MB's advice.
Select
*
From(
Select
I.ASM_SUBJECT_ID as PNo,
I.ASM_ID As IAID,
I.ASM_QSA_ID as IAType,
I.ASM_START_DATE as IAStart,
I.ASM_END_DATE as IAEnd,
nvl(olm_bo.get_ref_desc(I.ASM_OUTCOME,'ASM_OUTCOME'),'') as IAOutcome,
C.ASM_ID as CAID,
C.ASM_QSA_ID as CAType,
C.ASM_START_DATE as CAStart,
C.ASM_END_DATE as CAEnd,
nvl(olm_bo.get_ref_desc(C.ASM_OUTCOME,'ASM_OUTCOME'),'') as CAOutcome,
ROUND(C.ASM_START_DATE -I.ASM_START_DATE,0) as "Likely",
row_number() over(PARTITION BY I.ASM_ID
ORDER BY
abs(I.ASM_START_DATE - C.ASM_START_DATE))as "Row Number"
FROM
O_ASSESSMENTS I
left join O_ASSESSMENTS C
on I.ASM_SUBJECT_ID = C.ASM_SUBJECT_ID
and C.ASM_QSA_ID IN ('AA523','AA1326') and
ROUND(C.ASM_START_DATE - I.ASM_START_DATE,0) >= -2
AND
ROUND(C.ASM_START_DATE - I.ASM_START_DATE,0) <= 25
and C.ASM_OUTCOME <>'ABANDON'
Where I.ASM_QSA_ID IN ('AA501','AA1323')
AND I.ASM_OUTCOME <> 'ABANDON'
AND
I.ASM_END_DATE >= '01-04-2011') WHERE "Row Number" = 1
You can access the same table multiple times in a given query in SQL, simply by using table aliases. So one way of doing this would be:
select i.client,
i.id initial_id,
i.start initial_start,
i.end initial_end,
w.worker initial_worker,
f.id full_id,
f.start full_start,
f.end full_end
from assessments i
join workers w on i.id = w.id
left join assessments f
on i.client = f.client and
f.assessment_type = 'Full' and
f.start between i.end and i.end + X
/* replace X with appropriate number of days */
where i.assessment_type = 'Initial'
Note: column names such as end (that are reserved words in Oracle SQL) should normally be double-quoted, but from the previous question it looks as though these are simplified versions of the actual column names.
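Here is a runnable sketch of the alias-based self-join, using Python's sqlite3 module. The columns are renamed start_date/end_date to sidestep the reserved word. The dates from the example are read as day/month and stored as ISO text, and X is assumed to be 14 days. julianday() stands in for Oracle's direct date arithmetic (i.end + X).

```python
import sqlite3

X_DAYS = 14  # hypothetical window: a Full must start within 14 days of the Initial ending

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assessments (client TEXT, id INTEGER, assessment_type TEXT,
                              start_date TEXT, end_date TEXT);
    CREATE TABLE workers (id INTEGER, worker TEXT, team TEXT);
    INSERT INTO assessments VALUES
        ('P1', 1, 'Initial', '2012-01-01', '2012-01-05'),
        ('P1', 2, 'Full',    '2012-01-12', NULL),
        ('P1', 3, 'Full',    '2012-06-01', NULL);  -- too late to pair with id 1
    INSERT INTO workers VALUES (1, 'Bob', 'Team1'), (2, 'Lyn', 'Team2');
""")

rows = conn.execute("""
    SELECT i.client, i.id, i.start_date, i.end_date, w.worker,
           f.id, f.start_date, f.end_date
    FROM assessments i
    JOIN workers w ON i.id = w.id
    LEFT JOIN assessments f
        ON i.client = f.client
       AND f.assessment_type = 'Full'
       -- julianday turns ISO dates into day numbers for the window check
       AND julianday(f.start_date) BETWEEN julianday(i.end_date)
                                       AND julianday(i.end_date) + ?
    WHERE i.assessment_type = 'Initial'
""", (X_DAYS,)).fetchall()
print(rows)  # [('P1', 1, '2012-01-01', '2012-01-05', 'Bob', 2, '2012-01-12', None)]
```

Because of the LEFT JOIN, an Initial with no Full inside the window still appears, with NULLs in the Full columns. If several Fulls fall inside the window, each produces its own row. The ROW_NUMBER() filter in the asker's edited solution is one way to keep just the closest one.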
From your post, I assume that you're using Oracle here (as I see "Oracle" in the question).
In terms of "temp" tables, views come to mind. An Oracle view can present different slices of a table, which sounds like what you're looking for with the different kinds of assessments.
Don Burleson is a good source for anything Oracle related and he gives some tips on Oracle Views at http://www.dba-oracle.com/concepts/views.htm

SQL - Getting the max effective date less than a date in another table

I'm currently working on a conversion script to transfer a bunch of old data out of a SQL Server 2000 database and into a SQL Server 2008 one. One of the things I'm trying to accomplish during this conversion is to eliminate all of the composite keys and replace them with a "proper" primary key. Obviously, when I transfer the data I need to inject the foreign key values into the new table structures.
I'm currently stuck with one data set, though, and I can't seem to get my head around it in a set-based fashion. The two tables I'm working with are called Charge and Law. They have a 1:1 relationship and "link" on three columns. The first two are an equality link on the LawSource and LawStatute columns, but the third column is causing me problems: the ChargeDate column should link to the LawDate column where LawDate <= ChargeDate.
My current query is returning more than one row (in some cases) for a given Charge because the Law may have more than one LawDate that is less than or equal to the ChargeDate.
Here's what I currently have:
select LawId
from Law a
join Charge b on b.LawSource = a.LawSource
and b.LawStatute = a.LawStatute
and b.ChargeDate >= a.LawDate
Is there any way I can rewrite this to get the most recent entry in the Law table whose LawDate is the same as (or earlier than) the ChargeDate?
This would be easier in SQL Server 2008 with ranking functions such as ROW_NUMBER() OVER (PARTITION BY ...), so it should be easier for you in the future.
The usual caveats of "I don't have your schema, so this isn't tested" apply, but I think it should do what you need.
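select
d.LawSource,
d.LawStatute,
d.ChargeDate,
l.LawID
from
law l
join (
-- one row per (LawSource, LawStatute, ChargeDate): the latest LawDate on or before the charge
select
a.LawSource,
a.LawStatute,
b.ChargeDate,
max(a.LawDate) LawDate
from
Law a
join Charge b on b.LawSource = a.LawSource
and b.LawStatute = a.LawStatute
and b.ChargeDate >= a.LawDate
group by
a.LawSource, a.LawStatute, b.ChargeDate
) d on l.LawSource = d.LawSource and l.LawStatute = d.LawStatute and l.LawDate = d.LawDate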
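Another set-based option avoids both window functions and APPLY. It uses correlated subqueries, which SQL Server 2000 also supports. The sketch below runs it with Python's sqlite3 module. ChargeId is a hypothetical surrogate key, and the sample rows are made up. The sketch assumes (LawSource, LawStatute, LawDate) is unique in Law. Otherwise, SQL Server raises an error when the scalar subquery returns more than one value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Law (LawId INTEGER PRIMARY KEY, LawSource TEXT,
                      LawStatute TEXT, LawDate TEXT);
    CREATE TABLE Charge (ChargeId INTEGER PRIMARY KEY, LawSource TEXT,
                         LawStatute TEXT, ChargeDate TEXT);
    INSERT INTO Law VALUES
        (1, 'SRC1', 'STAT1', '1990-01-01'),
        (2, 'SRC1', 'STAT1', '2000-01-01'),
        (3, 'SRC1', 'STAT1', '2005-01-01');
    INSERT INTO Charge VALUES
        (100, 'SRC1', 'STAT1', '1995-06-30'),
        (101, 'SRC1', 'STAT1', '2000-01-01'),
        (102, 'SRC1', 'STAT1', '2008-03-15');
""")

# For each charge: find the latest LawDate on or before the ChargeDate,
# then return the LawId carrying that date.
rows = conn.execute("""
    SELECT c.ChargeId,
           (SELECT l.LawId
              FROM Law l
             WHERE l.LawSource = c.LawSource
               AND l.LawStatute = c.LawStatute
               AND l.LawDate = (SELECT MAX(l2.LawDate)
                                  FROM Law l2
                                 WHERE l2.LawSource = c.LawSource
                                   AND l2.LawStatute = c.LawStatute
                                   AND l2.LawDate <= c.ChargeDate)) AS LawId
    FROM Charge c
    ORDER BY c.ChargeId
""").fetchall()
print(rows)  # [(100, 1), (101, 2), (102, 3)]
```

A charge with no earlier Law row gets a NULL LawId rather than disappearing, which makes unmatched data easy to spot during the conversion.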
If performance is not an issue, cross apply provides a very readable way:
select *
from Charge c
cross apply
(
select top 1 *
from Law
where LawSource = c.LawSource
and LawStatute = c.LawStatute
and LawDate <= c.ChargeDate
order by
LawDate desc
) l
For each Charge row, this looks up the row in the Law table with the latest LawDate on or before the ChargeDate.
To include rows from Charge without a matching Law, change cross apply to outer apply.

SQL not yielding expected results

I have three tables related to this particular query:
Lawson_Employees: LawsonID (pk), LastName, FirstName, AccCode (numeric)
Lawson_DeptInfo: AccCode (pk), AccCode2 (don't ask, HR set up), DisplayName
tblExpirationDates: EmpID (pk), ACLS (date), EP (date), CPR (date), CPR_Imported (date), PALS (date), Note
The goal is to get the data I need to report on all those who have already expired in one or more certification, or are going to expire in the next 90 days.
Some important notes:
This is being run as part of a VBScript, so the 90-day date is calculated when the script runs. I'm using 2010-08-31 as a placeholder since it's the result at the time this question is being posted.
All cards expire at the end of the month. (which is why the above date is for the end of August and not 90 days on the dot)
A valid EP card supersedes ACLS certification, but only the latter is required of some employees. (wasn't going to worry about it until I got this question answered, but if I can get the help I'll take it)
The CPR column contains the expiration date for the last class they took with us. (NULL if they didn't take any classes with us)
The CPR_Imported column contains the expiration date for the last class they took somewhere else. (NULL if they didn't take it elsewhere, and bravo for following policy)
The distinction between CPR classes is important for other reports. For purposes of this report, all we really care about is which one is the most current - or at least is currently current.
If I have to, I'll ignore ACLS and PALS for the time being as it is non-compliance with CPR training that is the big issue at the moment. (not that the others won't be, but they weren't mentioned in the last meeting...)
Here's the query I have so far, which is giving me good data:
SELECT
iEmp.LawsonID, iEmp.LastName, iEmp.FirstName,
dept.AccCode2, dept.DisplayName,
Exp.ACLS, Exp.EP, Exp.CPR, Exp.CPR_Imported, Exp.PALS, Exp.Note
FROM (Lawson_Employees AS iEmp
LEFT JOIN Lawson_DeptInfo AS dept ON dept.AccCode = iEmp.AccCode)
LEFT JOIN tblExpirationDates AS Exp ON iEmp.LawsonID = Exp.EmpID
WHERE iEmp.CurrentEmp = 1
AND ((Exp.ACLS <= #2010-08-31#
AND Exp.ACLS IS NOT NULL)
OR (Exp.CPR <= #2010-08-31#
AND Exp.CPR_Imported <= #2010-08-31#)
OR (Exp.PALS <= #2010-08-31#
AND Exp.PALS IS NOT NULL))
ORDER BY dept.AccCode2, iEmp.LastName, iEmp.FirstName;
After perusing the result set, I think I'm missing some expiration dates that should be in the result set. Am I missing something? This is the sucky part of being the only developer in the department... no one to ask for a little help.
I think the problem is here:
OR (Exp.CPR <= #2010-08-31#
AND Exp.CPR_Imported <= #2010-08-31#)
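Null is not less than, greater than, or equal to anything. Any comparison with Null yields Null, which the WHERE clause treats as not true, so those rows are silently dropped.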
If you need the people who do not have a valid CPR, you will need to include nulls.
It may be easiest to use Nz:
OR (Nz(Exp.CPR,#2010-08-31#) <= #2010-08-31#
AND Nz(Exp.CPR_Imported,#2010-08-31#) <= #2010-08-31#)
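The sketch below shows the effect on a few made-up employees, using Python's sqlite3 module. COALESCE is the standard-SQL counterpart of Access's Nz(), and the table is trimmed to the CPR columns.

```python
import sqlite3

CUTOFF = "2010-08-31"  # placeholder date from the question

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblExpirationDates (EmpID INTEGER PRIMARY KEY, "
             "CPR TEXT, CPR_Imported TEXT)")
conn.executemany("INSERT INTO tblExpirationDates VALUES (?, ?, ?)", [
    (1, "2010-07-31", None),          # in-house card expiring, no outside card
    (2, None, "2011-05-31"),          # valid outside card only
    (3, None, None),                  # no CPR on record at all
    (4, "2010-06-30", "2010-05-31"),  # both expired
])

def emp_ids(condition):
    # condition is one of the fixed predicates below; :cutoff is bound safely.
    sql = f"SELECT EmpID FROM tblExpirationDates WHERE {condition} ORDER BY EmpID"
    return [row[0] for row in conn.execute(sql, {"cutoff": CUTOFF})]

# Original test: any NULL makes the comparison unknown, so 1 and 3 are missed.
plain = emp_ids("CPR <= :cutoff AND CPR_Imported <= :cutoff")
# COALESCE plays the role of Nz(): a missing date counts as expired.
with_nulls = emp_ids("COALESCE(CPR, :cutoff) <= :cutoff "
                     "AND COALESCE(CPR_Imported, :cutoff) <= :cutoff")
print(plain, with_nulls)  # [4] [1, 3, 4]
```

Employee 2 stays out of both results because the outside card is still current, which matches "whichever one is the most current" in the question.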