"Between" Statement in Left Join Returns null - sql

OK, there's GOT to be an answer to this... likely a straightforward solution for one of the many experts here at Stack Overflow! :-)
Background:
I am trying to create a restaurant sales report by shift. I have 2 tables... one with RECEIPT details (restaurant, date, time, receipt#, amount, server, etc.) and another with SHIFT details (restaurant, shift start time, shift end time & shift name). There are Shift Numbers in both tables, but unfortunately the Shift Numbers in the Receipts table are not always accurate, so I am having to come up with a workaround based on which bucket of ShiftStart & ShiftEnd times the ReceiptTime falls into. This code for the join almost works:
LEFT OUTER JOIN: Receipts.Time BETWEEN ShiftDetails.ShiftStartTime AND ShiftDetails.ShiftEndTime
BUT... my challenge is that the Dinner ShiftEndTime is currently 02:00, which (of course) is less than the Dinner ShiftStartTime of 17:31, so any ReceiptTimes "between" these hours returns a null value. I would also like to keep the query dynamic by avoiding "hard coding" the Start/End times, instead using those in the Shift table, as they are likely to be revised in the future.
FWIW, my data source is SQL Server with Power BI as a front-end, so alternatively I could scrap the SQL code and perform magic with PowerQuery or DAX...
Many thanks in advance!
Danny

Here is one way of handling this logic:
Receipts r LEFT OUTER JOIN
ShiftDetails sd
ON (sd.ShiftStartTime <= sd.ShiftEndTime AND r.Time BETWEEN sd.ShiftStartTime AND sd.ShiftEndTime) OR
   (sd.ShiftStartTime > sd.ShiftEndTime AND r.Time NOT BETWEEN sd.ShiftEndTime AND sd.ShiftStartTime)
You might decide that you really don't want BETWEEN. Receipts that are exactly on the boundary between shifts will return both of them -- because BETWEEN includes the end points of the range.
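If that matters, a half-open version of the same join avoids double-matching on the boundaries. A sketch only, using the same tables and columns as above:
Receipts r LEFT OUTER JOIN
ShiftDetails sd
ON (sd.ShiftStartTime <= sd.ShiftEndTime
    AND r.Time >= sd.ShiftStartTime AND r.Time < sd.ShiftEndTime) OR
   (sd.ShiftStartTime > sd.ShiftEndTime
    AND (r.Time >= sd.ShiftStartTime OR r.Time < sd.ShiftEndTime)) -- wrapping shift: after start OR before end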

You are missing a function that says: if the shift end time is in the early morning (before 5:00 AM here), then return the DateTime for the next day's date.
So it would be used in a join like:
WHERE ShiftEndDateTime <= dbo.EndShiftDateTime(@ShiftDate, @ShiftEndTime)
This bumps it to the next day if the hour is less than 5:
CREATE FUNCTION dbo.EndShiftDateTime(@ShiftDate Date, @ShiftTime Time)
RETURNS DateTime
AS
BEGIN
    IF (DATEPART(HOUR, @ShiftTime) < 5)
        SET @ShiftDate = dateadd(day, 1, @ShiftDate)
    RETURN cast(@ShiftDate as datetime) + cast(@ShiftTime as datetime)
END
I don't like hard coding an hour like this, but it is unlikely to ever change. You could make the logic to be more flexible (if endtime < starttime) but I've seen that that just leads to hard to find bugs.
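For example, it might be used to bucket receipts like this. A sketch only: ReceiptDate, ReceiptTime, and ShiftDate are assumed column names, since the question says date and time are stored separately:
SELECT r.*
FROM Receipts r
LEFT OUTER JOIN ShiftDetails sd
    ON cast(r.ReceiptDate as datetime) + cast(r.ReceiptTime as datetime)
           >= cast(sd.ShiftDate as datetime) + cast(sd.ShiftStartTime as datetime)
   AND cast(r.ReceiptDate as datetime) + cast(r.ReceiptTime as datetime)
           < dbo.EndShiftDateTime(sd.ShiftDate, sd.ShiftEndTime) -- end rolls past midnight when needed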


Finding the Closest Unbooked Dates Using SQL

Scenario
A user selects a date. Based on the selection I check whether the date & time is booked or not (No issues here).
If a date & time is booked, I need to show them n alternative dates, based on their date and time parameters, and those proposed alternative dates have to be as close as possible to their chosen date. The list of alternative dates should start from the date the query is run on (my backend handles this).
My Progress So Far
SELECT alternative_date
FROM GENERATE_SERIES(
TIMESTAMP '2022-08-20 05:00:00',
date_trunc('month', TIMESTAMP '2022-08-20 07:00:00') + INTERVAL '1 month - 1 day',
INTERVAL '1 day'
) AS G(alternative_date)
WHERE NOT EXISTS(
SELECT * FROM events T
WHERE T.bookDate::DATE = G.alternative_date::DATE
)
The code above uses the GENERATE_SERIES(...) function in PostgreSQL. It searches for all dates starting from 2022-08-20 up to the end of August, and specifically returns the dates which do not exist in the bookDate column (meaning they have not yet been booked).
Problems I Need Help With
When searching for alternative dates, I'm providing 3 important things:
The user's preferred booking date, so I can suggest other dates close to it that he can choose from. How would I go about doing this? It's the part where I'm facing the most trouble.
The user's start and end times, so when providing a list of alternative dates I can tell him: hey, there's free space between 06:00 and 07:00 on 2022-08-22, for instance. I'm also facing some issues here; a push in the right direction would be great!
I want to add another WHERE but it fails; the current WHERE is a NOT EXISTS, so it looks for all dates not equaling what is given. My other WHERE basically means: WHERE the place is open for booking or not.
To get the closest free dates, you can ORDER the result BY the "distance" of each alternative date to the user's preferred date - the shortest intervals will come first:
ORDER BY alternative_date - TIMESTAMP '2022-08-20 05:00:00'
If you want to recommend time slots smaller than whole dates (hour ranges), you need to switch the whole thing from dates to hours, i.e. change the generate_series step from 1 day to 1 hour (or whatever your smallest bookable unit is) and exclude invalid hours (nighttime, I assume) in WHERE. From there, it is pretty much the same as with dates.
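A sketch of that hour-level version - the 6-22 working hours, the end of the search window, and bookings landing exactly on the hour are all assumptions for illustration:
SELECT alternative_slot
FROM GENERATE_SERIES(
    TIMESTAMP '2022-08-20 05:00:00',
    TIMESTAMP '2022-08-31 23:00:00',
    INTERVAL '1 hour'
) AS G(alternative_slot)
WHERE EXTRACT(HOUR FROM G.alternative_slot) BETWEEN 6 AND 22  -- exclude nighttime
  AND NOT EXISTS (
      SELECT 1 FROM events T
      WHERE date_trunc('hour', T.bookDate) = G.alternative_slot  -- assumes hourly slots
  )
ORDER BY G.alternative_slot - TIMESTAMP '2022-08-20 05:00:00'   -- closest slots first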
As for "second where", there can be only one WHERE, but it can be composed from multiple conditions - you can add more conditions using AND operator (and it can also be sub-query if needed):
WHERE NOT EXISTS(
SELECT * FROM events T
WHERE T.bookDate::DATE = G.alternative_date::DATE
) AND NOT EXISTS (
SELECT 1 FROM events WHERE "roomId" = '13b46460-162d-4d32-94c0-e27dd9246c79'
)
(warning: this second sub-query is probably dangerous in the real world, since the room will presumably be used more than once, so you need to add some time condition to the sub-query to check against the candidate date)
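For example, a sketch scoping the room check to the candidate date (keeping the hypothetical roomId value from above):
AND NOT EXISTS (
    SELECT 1 FROM events E
    WHERE E."roomId" = '13b46460-162d-4d32-94c0-e27dd9246c79'
      AND E.bookDate::DATE = G.alternative_date::DATE  -- only that day's bookings conflict
)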

Update records and set values

I have a table called Transaction which contains some columns: [TransactionID, Type(credit or debit), Amount, Cashout, CreditPaid, EndTime]
Customers can get lots of credit, and these transactions are stored in the Transaction table. If a customer pays an amount at the end of the month which covers some or all of the credit transactions, I want those transactions to be updated to reflect the payment.
For example, a customer pays in 300. If the transaction 'Amount' is 300 and 'Type' is credit, then the 'CreditPaid' amount should be 300 (this is a simple update statement), but...
If there are two transactions, say one of 300 and another of 400, both credit, and the monthly payment amount is 600, then the oldest transaction (300) should be paid in full, and the remaining 300 applied to the next transaction, leaving 100 outstanding.
Any ideas how to do this?
TrID Buyin Type Cashout CustID StartTime EndTime AddedBy CreditPaid
72 200 Credit 0 132 2013-05-21 NULL NULL NULL
73 300 Credit 0 132 2013-05-22 NULL NULL NULL
75 400 Credit 0 132 2013-05-23 NULL NULL NULL
Desired Results after customer pays 600
TrID Buyin Type Cashout CustID StartTime EndTime AddedBy CreditPaid
72 200 Credit 0 132 2013-05-21 2013-05-24 NULL 200
73 300 Credit 0 132 2013-05-22 2013-05-24 NULL 300
75 400 Credit 0 132 2013-05-23 NULL NULL 100
Here's a SQL 2008 version:
CREATE PROCEDURE dbo.PaymentApply
   @CustID int,
   @Amount decimal(11, 2),
   @AsOfDate datetime
AS
WITH Totals AS (
   SELECT
      T.*,
      RunningTotal =
         Coalesce(
            (SELECT Sum(S.Buyin - Coalesce(S.CreditPaid, 0))
             FROM dbo.Trans S
             WHERE
                T.CustID = S.CustID
                AND S.Type = 'Credit'
                AND S.Buyin > Coalesce(S.CreditPaid, 0) -- prior rows with outstanding credit
                AND (
                   T.StartTime > S.StartTime
                   OR (
                      T.StartTime = S.StartTime
                      AND T.TrID > S.TrID
                   )
                )
            ),
            0)
   FROM
      dbo.Trans T
   WHERE
      CustID = @CustID
      AND T.Type = 'Credit'
      AND T.Buyin > Coalesce(T.CreditPaid, 0) -- outstanding credit only
)
UPDATE T
SET
   T.EndTime = P.EndTime,
   T.CreditPaid = Coalesce(T.CreditPaid, 0) + P.CreditPaid
FROM
   Totals T
   CROSS APPLY (
      SELECT TOP 1
         V.*
      FROM
         (VALUES
            (T.Buyin - Coalesce(T.CreditPaid, 0), @AsOfDate),
            (@Amount - RunningTotal, NULL)
         ) V (CreditPaid, EndTime)
      ORDER BY
         V.CreditPaid,
         V.EndTime DESC
   ) P
WHERE
   T.RunningTotal < @Amount
   AND @Amount > 0;
See a Live Demo at SQL Fiddle
Or, for anyone using SQL 2012, you can replace the contents of the CTE with a better-performing and simpler query using the new windowing functions:
SELECT
   *,
   RunningTotal =
      Sum(Buyin - Coalesce(CreditPaid, 0)) OVER(
         ORDER BY StartTime
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      ) - (Buyin - Coalesce(CreditPaid, 0)) -- subtract the current row's own credit
FROM dbo.Trans
WHERE
   CustID = @CustID
   AND Type = 'Credit'
   AND Buyin - Coalesce(CreditPaid, 0) > 0
See a Live Demo at SQL Fiddle
Here's how they work:
We calculate the running total for all the prior rows where the CreditPaid amount is less than the Buyin amount. Note this does NOT include the current row.
From this we can determine what portion of the payment will apply to each row and which rows will be involved in the payment. If the sum of all the credits for all the prior rows is equal to or higher than the payment, then this row will NOT be included, thus T.RunningTotal < @Amount. That's because all the prior rows will fully consume the payment by this point, so we can stop applying it.
For each row where we will apply a payment, we want to pay as much as possible, but we have to pay attention to the last row, where we may not be paying the full amount (as is the case with the third credit in the example). So we'll be paying one of two amounts: the full credit amount (with more rows to receive payments) or only the portion left over, which could be less than the full credit for that row (and this is the last row). We accomplish this by taking the lesser of either 1) the full remaining Buyin - CreditPaid amount, or 2) what's left of the full amount, @Amount - RunningTotalOfPriorRows. I could have done this as a CASE expression, but I like this min-of-two-values pattern (the TOP 1 over a VALUES list), especially because we would have needed two CASE expressions to also determine whether to update the EndTime column (per your requirements).
The SQL 2012 version simply calculates the same thing as the 2008 version: the sum of Buyin - CreditPaid for all the prior rows, using a windowing function instead of a correlated subquery.
Finally, we perform the update to all rows where the RunningTotal is less than the amount to be applied (since if it were equal to the amount, there would be no payment left for the current row).
Now, there are some larger considerations that you should think about.
Some of your scheme I like--I am not convinced that, as some commenters have said, you should ignore the individual transactions. I think that handling the individual transactions can be very important. It's much like how hospitals have one medical record number for each patient (MRN) but open a new account / file / visit each time the patient has a service performed. Each account is treated separately, and this is for many reasons, including--and this is where it seems important for you, too--the need for the customer to understand what exactly is comprising the total. It can be shocking to see the total all added up, but when this is broken out into individual transactions on individual dates, this makes a lot more sense to people and they can begin to understand exactly how they spent more money than they remembered at the time. "You owe me 600 bucks" can be harder to face than "your transactions for $100, $300, and $200 are still unpaid". :)
So, on to some big considerations here.
If you go with the theory that a transactional or balance-based account starts at 0 as a sort of "anchor", and to find the current balance you simply have to add up all the transactions: well, this does indeed satisfy relational theory, but in practice it is completely unworkable because it does not provide a fast, accurate way to get the current balance. It is imperative to have the current balance saved as a discrete value. If you were a bank, how would you know how much money you had, without adding up perhaps dozens of years of transaction history each time? Instead, it may be better to think of the current balance as the "anchor" (instead of 0) and think of the transactions as going backward in time. Additionally, there is no harm in recording periodic balances. Banks do this by closing out periods into statements, with a defined balance as of each statement closing date. There is no need to go all the way back to zero, since you don't care too much about the balance at the old, unanchored end of the history. You know that eventually every account started at 0. Big deal!
Given these thoughts, it is important for you to have a table where the customer's total account balance is simply stated. You also need a place to record his payments, refunds, cancellations, and so on. This should be separate from the accounts (in your case, transactions) themselves, because there is not a one-to-one correspondence between payment transactions and credit transactions. Already in your current scheme you have partially paid transactions with no date recorded--this is a huge gap in the system that will come back to bite you. What if a customer paid $10 a day toward a $200 credit for 20 days? 19 of those payments would show no date paid.
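A minimal sketch of such a separate payment table (names and types are illustrative, not prescriptive):
CREATE TABLE dbo.Payment (
    PaymentID int IDENTITY(1, 1) PRIMARY KEY,
    CustID int NOT NULL,            -- who paid
    Amount decimal(11, 2) NOT NULL, -- how much was received
    ReceivedDate datetime NOT NULL  -- when it was received
);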
What I recommend, then, is that you create a stored procedure (SP) that applies payments to totals first, and then create another one that will "rewrite" the payments into the transactions in an on-demand way. Think about what a credit card company has to do if they "re-rate" your account. Perhaps they acted on incorrect information and increased your interest rate on a certain date. (This actually happened to me. I proved to them that the collections activity they were responding to was not my fault--it had been retracted by the original company after I showed them that one of their staff had mistakenly changed my mailing address, and I had never received a bill to be able to pay. So they had to be able to re-run all the purchase/debit/interest rate calculations on my account retroactively, to recalculate everything after the original change date based on the correct interest rate.) Think about this a bit and you will see that it is quite possible to operate this way, as long as you design your system properly. Your SP is given a date range or set of transactions within which it is allowed to work, and then "rewrites" history as if the old history had never existed.
But, you don't actually want to destroy history, so this is further complicated by the fact that at one point in time, your best knowledge of the customer's account balance for a time period was a different amount than your current best knowledge of their account balance for that time period--both are true data and need to be kept.
Let's say you discover that your system occasionally doubled up Credit transactions mistakenly. When you fix the customer data, you need to be able to see the fact that they had the problem, even though they don't have it now. This is done by using additional date columns EffectiveDate and ExpirationDate--or whatever you want to call them. Then, these need to be part of the clustered index, and used on every query to always get the current values. I highly recommend using 9999-12-31 instead of NULL as your ExpirationDate value for current rows--this will have a huge positive impact on performance when querying for current data. I also recommend putting the ExpirationDate as the first column in the clustered index (or at least, before the EffectiveDate column), since history will always potentially have many more records than the future, so it will be more selective than EffectiveDate being first (think a bit: all past knowledge will have EffectiveDate <= GetDate() but only current or future data will have ExpirationDate > GetDate()). To drive the point home: you don't delete. You expire old rows by setting a column to the date the knowledge became obsolete, and you insert new rows representing the new knowledge, with a column showing the date you learned this information and having an indefinitely-open "to the future" value in the other date column.
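A sketch of that "expire and re-insert" pattern, using row 73 from the sample data (note that in such a versioned design TrID alone can no longer be the primary key; the date columns must be part of it):
DECLARE @Now datetime = GetDate();

UPDATE dbo.Trans  -- expire the row we now know to be wrong
SET ExpirationDate = @Now
WHERE TrID = 73 AND ExpirationDate = '9999-12-31';

INSERT dbo.Trans (TrID, CustID, Buyin, Type, StartTime, EffectiveDate, ExpirationDate)
VALUES (73, 132, 300, 'Credit', '2013-05-22', @Now, '9999-12-31');  -- corrected version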
And finally a couple of single points:
The CreditPaid column should be NOT NULL with a default of 0. I had to throw in a bunch of Coalesces to deal with the NULLs.
You need to handle overpayments somehow. Either by preventing them, or by storing the overpaid portion value and applying it later. You could OUTPUT the results of the UPDATE statement into a table, then select the Sum from this and make the SP return any unused payment value. There are many ways to handle this. If you build the "re-rate" SP as I suggested, then this won't be too much of a problem, as you can rerun it after receiving new transactions (then immediately (re)apply all payments for any open periods).
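For instance, a sketch of that OUTPUT approach inside the procedure above (@Amount is the procedure's existing parameter; the commented lines would be added to the UPDATE statement shown earlier):
DECLARE @Applied TABLE (AmountApplied decimal(11, 2));

-- added to the UPDATE in dbo.PaymentApply:
--    OUTPUT Inserted.CreditPaid - Coalesce(Deleted.CreditPaid, 0)
--    INTO @Applied (AmountApplied)

SELECT @Amount - Coalesce(Sum(AmountApplied), 0) AS Overpayment
FROM @Applied;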
At this point I can't go on too much more, but I hope that these thoughts help you. Your design is a good start, but it needs some work to get it to the point where it will function well as an enterprise-quality system.
UPDATE
I corrected a glitch in the 2008 version (adding the conditions from the outer query to the subquery).
And here's my last edit (all: please do not edit this answer again or it will be converted to community wiki).
If you do go with a scheme where rows are marked with the dates they are understood to be true (EffectiveDate and ExpirationDate), you can make coding in your system a little easier by creating inline table functions that select only the active rows from the table WHERE EffectiveDate <= GetDate() AND GetDate() < ExpirationDate. Pay careful attention to the comparison operators you're using (e.g., <= vs <), and use date ranges that are inclusive at the start and exclusive at the end. If you aren't sure what that means, please do look these terms up and understand them before proceeding. You want to be able to change the resolution of your date data type in the future, without breaking any of your queries. If you use an inclusive end date, this will not be possible. There are many posts online talking about how to properly query for dates in SQL.
Something like this:
CREATE FUNCTION dbo.TransCurrent()
RETURNS TABLE
AS
RETURN (
    SELECT *
    FROM dbo.Trans
    WHERE
        EffectiveDate <= GetDate()
        AND GetDate() < ExpirationDate -- make the clustered index have this column first!
);
Do NOT confuse this with a multi-statement table-valued function. That will NOT perform well. This function type here will perform well because it can be inlined into the query, where basically the engine takes the logical intent of what the function is doing and dispenses with the function call entirely. Using any other kind of function will defeat this, and your performance will go into the pot as your table grows in size.
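Usage is then a drop-in replacement for the table itself (a sketch, reusing the sample CustID from the question):
SELECT TrID, CustID, Buyin, CreditPaid
FROM dbo.TransCurrent()  -- only rows that are current knowledge
WHERE CustID = 132;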

Loop through SQL records to get total meter reading over time period if meter has been swapped

I have a table used to enter weekly Hobbs hour meter readings for vehicles. I need to be able to add up the total hours used for each vehicle over a period of time. The problem I am having is that sometimes the meters fail and get swapped out, and the replacement meters are not programmable, so the reading has to start again at zero hours. This means I will need to loop through each meter reading, determine whether a meter has been swapped, and account for it in my code. I'm not big on cursors, but I may have to use one in a stored procedure unless someone has a better solution I should try. I'm trying to think of a way a CTE might help, but nothing yet.
The table structure is:
ReadingDate Datetime
EquipmentId Int
MeterReading decimal(18,2)
I'm planning on creating an accumulation variable, looping through the rows in a cursor, and adding the reading difference into the variable.
So as I loop through, if I see a meter reading that is less than the previous reading, I must have had a meter swap-out and will have to account for it. I'm thinking that if I were to look ahead at the next reading and it is lower than the current one, I would skip the summing action once and then resume adding the difference to the accumulation variable. Any better way to do this? Thanks.
This will work in SQL Server:
SELECT t1b.EquipmentId,
       t1a.ReadingDate AS StartDate,
       t1b.ReadingDate AS EndDate,
       CASE
           WHEN t1b.MeterReading >= t1a.MeterReading THEN t1b.MeterReading - t1a.MeterReading
           ELSE t1b.MeterReading -- reading dropped, so the meter was swapped
       END AS Hours
FROM MeterReadings AS t1a
JOIN (SELECT t11.EquipmentId,
             t11.ReadingDate AS Date1,
             (SELECT MIN(t13.ReadingDate)
              FROM MeterReadings AS t13
              WHERE t13.EquipmentId = t11.EquipmentId
                AND t13.ReadingDate > t11.ReadingDate) AS NextReadingDate
      FROM MeterReadings AS t11) AS rd ON t1a.EquipmentId = rd.EquipmentId
                                      AND t1a.ReadingDate = rd.Date1
JOIN MeterReadings AS t1b ON t1b.EquipmentId = t1a.EquipmentId
                         AND t1b.ReadingDate = rd.NextReadingDate
ORDER BY t1a.EquipmentId, t1a.ReadingDate
SQL Fiddle I was playing with
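For what it's worth, on SQL Server 2012 or later the same calculation can be written with a window function instead of the correlated subqueries. A sketch against the table structure from the question (a swap is detected when a reading drops):
SELECT EquipmentId,
       SUM(CASE
               WHEN MeterReading >= PrevReading THEN MeterReading - PrevReading
               ELSE MeterReading -- meter was swapped; the new meter counts from zero
           END) AS TotalHours
FROM (SELECT EquipmentId,
             ReadingDate,
             MeterReading,
             LAG(MeterReading) OVER (PARTITION BY EquipmentId
                                     ORDER BY ReadingDate) AS PrevReading
      FROM MeterReadings) AS r
WHERE PrevReading IS NOT NULL -- the first reading has no interval to sum
GROUP BY EquipmentId;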

SQL finding overlapping of times past midnight (across 2 days)

I know there are lots of these types of questions, but I didn't see one that was similar enough to my criteria, so I'd like to ask for your help please. The fields I have are just start and end, which are of time types. I cannot involve any specific dates in this. If the time ranges don't go past midnight (across days), I'd just compare two tuples as such:
end1 > start2 AND start1 < end2
(end points touching are not considered overlapped here.)
But when I involve a time range that passes (or ends at) midnight, this obviously doesn't work. For example, given:
start | end
--------+--------
06:00PM | 01:00AM
03:00PM | 09:00PM
Without involving dates, how can I achieve this, please? My assumption is: if end is less than start, then we're involving 2 days.
I'm trying to do this in plain standard SQL, so just simple and concise logic in the WHERE clause.
Thank you everyone!
Added:
Also, how would I test if one time range completely envelops another? Thanks again!
If your SQL supports time differences:
(end1 - start1) > (start2 - start1) AND (end2 - start2) > (start1 - start2)
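If interval arithmetic isn't available, the wrap-around cases can also be written out explicitly in plain SQL. A sketch (column names are placeholders; a range with end < start is taken to cross midnight, and touching end points still don't count as overlap):
WHERE CASE
          WHEN start1 <= end1 AND start2 <= end2  -- neither range wraps
              THEN CASE WHEN end1 > start2 AND start1 < end2 THEN 1 ELSE 0 END
          WHEN start1 > end1 AND start2 > end2    -- both wrap: both contain midnight
              THEN 1
          WHEN start1 > end1                      -- only range 1 wraps: test its two pieces
              THEN CASE WHEN end1 > start2 OR start1 < end2 THEN 1 ELSE 0 END
          ELSE                                    -- only range 2 wraps
              CASE WHEN end2 > start1 OR start2 < end1 THEN 1 ELSE 0 END
      END = 1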
Unfortunately, "plain" SQL will be too general to use against an actual database. The reason is that the various database products have different levels of support for calculating the duration between two times. For example, in SQL Server 2008, it would be substantially simpler to convert the time values to DateTime and then do the comparison since many comparison operators are not supported on the Time data type.
Select ...
From (
    Select Cast(T.Start1 As DateTime) As Start1
         , Case
               When Cast(T.Start1 As DateTime) > Cast(T.End1 As DateTime) Then DateAdd(d, 1, Cast(T.End1 As DateTime))
               Else Cast(T.End1 As DateTime)
           End As End1
    From ...
) As T1 -- T2 below is a second derived table, normalized the same way
Where T1.End1 > T2.Start2 And T1.Start1 < T2.End2
Use start time and duration (in minutes or whatever unit is appropriate).
The program Transtar had this problem. The time data was not associated with any date, nor was it stored in a datetime field. The program was initially designed to issue transit itineraries from about 4 AM to midnight, which worked fine as long as the transit wasn't around the clock. I built a function which did a sliding test for the times, so that if you asked for 5 AM it would look at times from 1 AM to 12:59 AM. I wrote it in FORTRAN, but the algorithm would be the same regardless of language.

Is SQL DATEDIFF(year, ..., ...) an Expensive Computation?

I'm trying to optimize some horrendously complicated SQL queries because they take too long to finish.
In my queries, I have dynamically created SQL statements with lots of the same functions, so I created a temporary table where each function is only called once instead of many, many times - this cut my execution time by 3/4.
So my question is, can I expect to see much of a difference if say, 1,000 datediff computations are narrowed to 100?
EDIT:
The query looks like this :
SELECT DISTINCT M.MID, M.RE
FROM #TEMP INNER JOIN M ON #TEMP.MID = M.MID
WHERE ( #TEMP.Property1 = 1 )
  AND DATEDIFF( year, M.DOB, @date2 ) >= 15 AND DATEDIFF( year, M.DOB, @date2 ) <= 17
where these are being generated dynamically as strings (put together in bits and pieces) and then executed, so that various parameters can be changed on each iteration - mainly the last lines, containing all sorts of DATEDIFF conditions.
There are about 420 queries like this where these DATEDIFFs are calculated like so. I know that I can pull them all into a temp table easily (1,000 DATEDIFFs becomes 50) - but is it worth it? Will it make any difference in seconds? I'm hoping for an improvement better than tenths of a second.
It depends on exactly what you are doing to be honest as to the extent of the performance hit.
For example, if you are using DATEDIFF (or indeed any other function) on a column within a WHERE clause, then this will be a cause of poorer performance, as it will prevent an index on that column from being used.
e.g. basic example, finding all records in 2009
WHERE DATEDIFF(yyyy, DateColumn, '2009-01-01') = 0
would not make good use of an index on DateColumn. Whereas a better solution, providing optimal index usage would be:
WHERE DateColumn >= '2009-01-01' AND DateColumn < '2010-01-01'
I recently blogged about the difference this makes (with performance stats/execution plan comparisons), if you're interested.
That would be costlier than, say, returning DATEDIFF as a column in the resultset.
I would start by identifying the individual queries that are taking the most time. Check the execution plans to see where the problem lies and tune from there.
Edit:
Based on the example query you've given, here's an approach you could try to remove the use of DATEDIFF within the WHERE clause. It's a basic example to find everyone who was 10 years old on a given date - I think the maths is right, but you get the idea anyway! I gave it a quick test, and it seems fine. It should be easy enough to adapt to your scenario. If you want to find people between (e.g.) 15 and 17 years old on a given date, that's also possible with this approach.
-- Assuming @Date2 is set to the date at which you want to calculate someone's age
DECLARE @AgeAtDate INTEGER
SET @AgeAtDate = 10

DECLARE @BornFrom DATETIME
DECLARE @BornUntil DATETIME
SELECT @BornFrom = DATEADD(yyyy, -(@AgeAtDate + 1), @Date2)
SELECT @BornUntil = DATEADD(yyyy, -@AgeAtDate, @Date2)

SELECT DOB
FROM YourTable
WHERE DOB > @BornFrom AND DOB <= @BornUntil
An important note to add: for ages calculated from DOB, this approach is more accurate. Your current implementation only takes the year of birth into account, not the actual day (e.g. someone born on 1st Dec 2009 would show as being 1 year old on 1st Jan 2010, when they are not 1 until 1st Dec 2010).
Hope this helps.
DATEDIFF is quite efficient compared to other methods of handling datetime values, like strings (see this SO answer).
In this case, it sounds like you're going over and over the same data, which is likely more expensive than using a temp table. For example, statistics will be generated for the temp table.
One thing you might be able to do to improve performance is to put an index on the temp table on MID.
Check your execution plan to see if it helps (may depend on the number of rows in the temp table).
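For example (a sketch; #TEMP's full column list isn't shown in the question):
CREATE INDEX IX_TEMP_MID ON #TEMP (MID); -- supports the join on #TEMP.MID = M.MID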