Time between dates (more advanced than just DATEDIFF) - sql

I have a table that contains Guest_ID and Trip_Date. I have been tasked with trying to find out for each Guest_ID how many times they have had over 365 days between trips. I know that for the time between the dates I can use datediff formula but I am unsure of how to get the dates plugged in properly. I think if I can get help with this part I can do the rest.
For each time this happened I need to report back Guest_ID, Prior_Last_Trip, New_Trip, days between. This data goes back for over a decade so it is possible for a Guest to have multiple periods of over a year between visits.
I was thinking of just loading a table with that data that can be queried later. That way once I figure out how to make this work the first time I can setup a stored procedure or trigger to check for new occurrences of this and populate the table.
I was not sure where to begin on this code. I was thinking recursion might be the answer, but I do not know recursion, just that it exists.
This table is quite large. Around 1.5 million unique Guest_ID's with over 30 million trips.
I am using SQL Server 2012. If there is anything else I can add to help this let me know. I will edit and update this as I have ideas on how to make this work myself.
Edit 1: Sample Data and Desired Results
Guest_ID Trip_Date
1 1/1/2013
1 2/5/2013
1 12/5/2013
1 1/1/2015
1 6/5/2015
1 8/1/2017
1 10/2/2017
1 1/6/2018
1 6/7/2018
1 7/1/2018
1 7/5/2018
2 1/1/2018
2 2/6/2018
2 4/2/2018
2 7/3/2018
3 1/1/2014
3 6/5/2014
3 9/4/2014
Guest_ID Prior_Last_Trip New_Trip DaysBetween
1 12/5/2013 1/1/2015 392
1 6/5/2015 8/1/2017 788
So you can see that Guest 1 had 2 different times where they did not have a trip for over a year, and those two instances are recorded in the results. Guest 2 never had a gap of over a year and therefore has no records in the results. Guest 3 has not had a trip in over a year, but without a return trip they do not currently qualify for the result set. Should Guest 3 ever make another trip, they would then be added to the result set.
Edit 2: Working Query
Thanks to @Code4ml I got this working. Here is the complete query.
Select
Guest_ID, CurrentTrip, DaysBetween, Lasttrip
From (
Select
Guest_ID
,Lag(Trip_Date,1) Over(Partition by Guest_ID Order by Trip_Date) as LastTrip
,Trip_Date as CurrentTrip
,DATEDIFF(d,Lag(Trip_Date,1) Over(Partition by Guest_ID Order by Trip_Date),Trip_Date) as DaysBetween
From UCS
) as A
Where DaysBetween > 365

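You can use the SQL LAG function to access the previous trip date, like below. Note that the window must be ordered by trip_date ascending; with a descending order, LAG would return the next trip instead of the previous one.
SELECT guest_id, trip_date,
LAG (trip_date,1) OVER (PARTITION BY guest_id ORDER BY trip_date) AS prev_trip_date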
FROM tripsTable
Now you can use this as a subquery to calculate the number of days between trips and filter the data as required.
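As a minimal sketch of that subquery approach (assuming the table is named tripsTable with columns guest_id and trip_date as above; the index name is hypothetical), the previous date can be computed once in a CTE and then reused for both the day count and the filter. On a table of this size, an index on (guest_id, trip_date) would likely help the window function avoid a large sort, though that should be checked against the actual execution plan.

```sql
-- Hypothetical supporting index; the name IX_trips_guest_date is an assumption.
CREATE INDEX IX_trips_guest_date ON tripsTable (guest_id, trip_date);

-- Compute the previous trip once per row, then keep only gaps longer than 365 days.
WITH trips AS (
    SELECT guest_id,
           trip_date,
           LAG(trip_date) OVER (PARTITION BY guest_id ORDER BY trip_date) AS prev_trip_date
    FROM tripsTable
)
SELECT guest_id,
       prev_trip_date AS Prior_Last_Trip,
       trip_date      AS New_Trip,
       DATEDIFF(day, prev_trip_date, trip_date) AS DaysBetween
FROM trips
WHERE DATEDIFF(day, prev_trip_date, trip_date) > 365;
```

The first trip of each guest has a NULL prev_trip_date, so its DATEDIFF is NULL and the row drops out of the filter without any extra condition.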

Related

How to check if dates overlap on different lines in SQL Server?

I have a database with electricity meter readings. Sometimes people get a new meter and then their original meter gets an end date and the new meter gets a start date and the end date remains NULL. This can happen multiple times in a year and I want to know if there are no gaps in measurement. In other words, I need to figure out if end date 1 is the same as start date 2 and so on.
Sample data:
cust_id meter_id start_date end_date
--------------------------------------------------
a 1 2017-01-01 2017-05-02
a 2 2017-05-02 Null
b 3 2017-01-01 2017-06-01
b 4 2017-06-05 Null
This is what the data looks like and the result I am looking for is that for customer a the end date of meter 1 is equal to the start date of meter 2. For customer b however, there are 4 days between the end date of meter 3 and the start date of meter 4. That is something I want to flag.
I found customers for whom this can happen up to 8 times in the period I am researching. I tried something with nested queries and very complex cases but even I lost my way around it, so I was wondering if someone here has an idea of how to get to the answer a little smarter.
You can get the offending rows using lag():
select r.*
from (select r.*,
lag(end_date) over (partition by cust_id order by start_date) as prev_end_date,
row_number() over (partition by cust_id order by start_date) as seqnum
from readings r
) r
where prev_end_date <> start_date or (prev_end_date is null and seqnum > 1);
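To try the idea against the sample data from the question, here is a small self-contained sketch. The table variable @readings and the gap_days column are assumptions made for illustration; the window is partitioned by customer only, so that each meter row can see the end date of the customer's previous meter.

```sql
-- Sample data from the question, loaded into a table variable (hypothetical name).
DECLARE @readings TABLE (
    cust_id    CHAR(1),
    meter_id   INT,
    start_date DATE,
    end_date   DATE NULL
);

INSERT INTO @readings (cust_id, meter_id, start_date, end_date) VALUES
    ('a', 1, '20170101', '20170502'),
    ('a', 2, '20170502', NULL),
    ('b', 3, '20170101', '20170601'),
    ('b', 4, '20170605', NULL);

-- Flag every meter whose start date differs from the previous meter's end date.
SELECT r.cust_id,
       r.meter_id,
       r.prev_end_date,
       r.start_date,
       DATEDIFF(day, r.prev_end_date, r.start_date) AS gap_days
FROM (SELECT t.*,
             LAG(t.end_date) OVER (PARTITION BY t.cust_id ORDER BY t.start_date) AS prev_end_date
      FROM @readings t
     ) r
WHERE r.prev_end_date <> r.start_date;
-- Returns one row: customer b, meter 4, gap_days = 4
```

A negative gap_days would indicate overlapping meters rather than a gap, which may also be worth flagging.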
Guessing there is now a better way to pull this off using LEAD and LAG, but I wrote an article in SQL 2008R2 called T-SQL: Identify bad dates in a time series where you can modify the big cte in the middle of the article to handle your definition of a bad date.
Good luck. There's too much detail in the article to post in a single SO question, otherwise I'd do that here.

SQL Find latest record only if COMPLETE field is 0

I have a table with multiple records submitted by a user. In each record is a field called COMPLETE to indicate if a record is fully completed or not.
I need a way to get the latest record of the user where COMPLETE is 0, for each set of records where LOCATION and DATE are the same and no additional record exists where COMPLETE is 1. In each record there are additional fields such as Type, AMOUNT, Total, etc. These can be different, even though the USER, LOCATION, and DATE are the same.
There is a SUB_DATE field and ID field that denote the day the submission was made and auto incremented ID number. Here is the table:
ID NAME LOCATION DATE COMPLETE SUB_DATE TYPE1 AMOUNT1 TYPE2 AMOUNT2 TOTAL
1 user1 loc1 2017-09-15 1 2017-09-10 Food 12.25 Hotel 65.54 77.79
2 user1 loc1 2017-09-15 0 2017-09-11 Food 12.25 NULL 0 12.25
3 user1 loc2 2017-08-13 0 2017-09-05 Flight 140 Food 5 145.00
4 user1 loc2 2017-08-13 0 2017-09-10 Flight 140 NULL 0 140
5 user1 loc3 2017-07-14 0 2017-07-15 Taxi 25 NULL 0 25
6 user1 loc3 2017-08-25 1 2017-08-26 Food 45 NULL 0 45
The result I would like to retrieve is ID 4, because its SUB_DATE is later than that of ID 3, which has the same Name, Location, and Date information, and there is no record in that group with a COMPLETE value of 1.
I would also like to retrieve ID 5, since it is the latest record for the User, Location, Date, and Complete is 0.
I would also appreciate it if you could explain your answer to help me understand what is happening in the solution.
Not sure if I fully understood but try this
SELECT *
FROM (
SELECT *,
MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) AS CompleteForNameLocationAndDate,
MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
FROM your_table t
) a
WHERE CompleteForNameLocationAndDate = 0 AND
SUB_DATE = LastSubDate
So what we have done here:
First, if you run just the inner query in Management Studio, you will see what that does:
The first max function will partition the data in the table by each unique Name,Location,Date set.
In the case of your data, ID 1 & 2 are the first partition, 3&4 are the second partition, 5 is the 3rd partition and 6 is the 4th partition.
So for each of these partitions it will get the max value in the COMPLETE column. Therefore any partition with a 1 as its max value has been completed.
Note also the convert function. This is because COMPLETE is of datatype BIT (1 or 0) and the max function does not work with that datatype. We therefore convert to INT. If your COMPLETE column is type INT, you can take the convert out.
The second max function partitions by unique Name, Location and Date again, but this time we get the max SUB_DATE, which gives us the date of the latest record for each Name, Location, Date set.
So we take that query and put it in a derived table, which for simplicity we call a. We need to do this because SQL Server doesn't allow windowed functions in the WHERE clause of queries. A windowed function is one that makes use of the OVER keyword, as we have done. In an ideal world, SQL would let us do
SELECT *,
MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) AS CompleteForNameLocationAndDate,
MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
FROM your_table t
WHERE MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) = 0 AND
SUB_DATE = MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE)
But it doesn't allow it so we have to use the derived table.
So then we basically SELECT everything from our derived table Where
CompleteForNameLocationAndDate = 0
Which are Name,Location, Date partitions which do not have a record marked as complete.
Then we filter further asking for only the latest record for each partition
SUB_DATE = LastSubDate
Hope that makes sense, not sure what level of detail you need?
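For anyone who wants to run the explanation above end to end, here is a small sketch using a trimmed copy of the question's sample rows. The table variable @submissions and the shortened column alias AnyComplete are assumptions for illustration; only the columns the query actually needs are included.

```sql
-- Trimmed sample data from the question (TYPE/AMOUNT columns omitted).
DECLARE @submissions TABLE (
    ID       INT,
    NAME     VARCHAR(20),
    LOCATION VARCHAR(20),
    [DATE]   DATE,
    COMPLETE BIT,
    SUB_DATE DATE
);

INSERT INTO @submissions (ID, NAME, LOCATION, [DATE], COMPLETE, SUB_DATE) VALUES
    (1, 'user1', 'loc1', '20170915', 1, '20170910'),
    (2, 'user1', 'loc1', '20170915', 0, '20170911'),
    (3, 'user1', 'loc2', '20170813', 0, '20170905'),
    (4, 'user1', 'loc2', '20170813', 0, '20170910'),
    (5, 'user1', 'loc3', '20170714', 0, '20170715'),
    (6, 'user1', 'loc3', '20170825', 1, '20170826');

SELECT ID, NAME, LOCATION, [DATE], SUB_DATE
FROM (
    SELECT s.*,
           -- 1 if any record in the Name/Location/Date group is complete, else 0
           MAX(CONVERT(INT, COMPLETE)) OVER (PARTITION BY NAME, LOCATION, [DATE]) AS AnyComplete,
           -- latest submission date within the same group
           MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, [DATE]) AS LastSubDate
    FROM @submissions s
) a
WHERE AnyComplete = 0
  AND SUB_DATE = LastSubDate
ORDER BY ID;
-- Returns IDs 4 and 5
```

If two records in a group could share the same SUB_DATE, both would be returned; ordering by ID with ROW_NUMBER would be one way to break such ties.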
As an aside, I would look at restructuring your tables (unless of course you have simplified to better explain this problem) as follows:
(Assuming the table in your examples is called Booking)
tblBooking
BookingID
PersonID
LocationID
Date
Complete
SubDate
tblPerson
PersonID
PersonName
tblLocation
LocationID
LocationName
tblType
TypeID
TypeName
tblBookingType
BookingTypeID
BookingID
TypeID
Amount
This way if you ever want to add Type3 or Type4 to your booking information, you don't need to alter your table layout

How can I return rows which match on some columns and fulfil a DateTime comparison between two other columns using SQL?

I have a table which contains rows for jobs, example below, where 01/01/1980 is used rather than null in the ClosedDate column for jobs which are not finished:
JobNumber JobCategory CustomerID CreatedDate ClosedDate
1 Small 1 01/01/2016 03/01/2016
2 Small 2 03/01/2016 07/01/2016
3 Large 2 06/01/2016 07/01/2016
4 Medium 1 08/01/2016 10/01/2016
5 Small 3 10/01/2016 01/01/1980
6 Medium 3 15/01/2016 01/01/1980
7 Large 2 16/01/2016 17/01/2016
8 Large 2 19/01/2016 20/01/2016
9 Small 1 19/01/2016 01/01/1980
10 Medium 2 19/01/2016 01/01/1980
I need to return a list of any jobs where the same customer has had a job of the same category created within 3 days of the previous job being closed.
So, I would want to return:
7 Large 2 16/01/2016 17/01/2016
8 Large 2 19/01/2016 20/01/2016
because Customer 2 had a Large job closed on 17/01/2016 and another Large job opened on 19/01/2016, which is within 3 days.
In order to do this, I assume I need to compare each record in the table with each subsequent record, looking for a match on JobCategory and comparing CreatedDate with ClosedDate between rows.
Can anyone advise my best option for this using SQL? I'm using SQL Server 2012.
The first thing that you should do is get rid of "magic dates" in your system. If the job hasn't been closed yet then the ClosedDate is not known. SQL has a value for exactly that - NULL. That prevents anyone in the future from having to know the magic date of 1/1/1980 or from that having to be hard-coded throughout your system.
Next, you don't have to compare each row with each one after it. Define what you're looking for and find matches that meet those qualifications. You didn't specify which type of SQL you're using (you should tag your question with Oracle or MySQL or SQL Server), so the query below is written for SQL Server. Your version might have different date functions.
SELECT
J1.JobNumber,
J1.JobCategory,
J1.CustomerID,
J1.CreatedDate,
J1.ClosedDate,
J2.JobNumber,
J2.CreatedDate,
J2.ClosedDate
FROM
Jobs J1
INNER JOIN Jobs J2 ON
J2.CustomerID = J1.CustomerID AND
J2.JobCategory = J1.JobCategory AND
DATEDIFF(DAY, J1.ClosedDate, J2.CreatedDate) BETWEEN 0 AND 3 AND
J2.JobNumber <> J1.JobNumber
This will return the jobs in a single row instead of two rows. If that's a problem then the query could be altered slightly to do so. This can also be done a little more easily with windowed functions, but again, since you didn't specify your SQL vendor I didn't want to use those.
Since you're using SQL Server, you should be able to use windowed functions like so:
;WITH CTE_JobsWithDates AS -- Probably a poor name for the CTE
(
SELECT
JobNumber,
JobCategory,
CustomerID,
CreatedDate,
ClosedDate,
LEAD(CreatedDate, 1) OVER (PARTITION BY JobCategory, CustomerID ORDER BY CreatedDate) AS NextCreatedDate,
LAG(ClosedDate, 1) OVER (PARTITION BY JobCategory, CustomerID ORDER BY CreatedDate) AS PreviousClosedDate
FROM
Jobs
)
SELECT
JobNumber,
JobCategory,
CustomerID,
CreatedDate,
ClosedDate
FROM
CTE_JobsWithDates
WHERE
DATEDIFF(DAY, ClosedDate, NextCreatedDate) BETWEEN 0 AND 3 OR
DATEDIFF(DAY, PreviousClosedDate, CreatedDate) BETWEEN 0 AND 3
That was off the cuff, so please test and let me know if anything isn't quite right.
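To check the windowed approach against the question's sample rows, here is a self-contained sketch. The table variable @Jobs and the CTE name are assumptions; open jobs are stored with a NULL ClosedDate rather than the 01/01/1980 placeholder, following the advice above.

```sql
-- Sample data from the question; open jobs use NULL instead of the magic date.
DECLARE @Jobs TABLE (
    JobNumber   INT,
    JobCategory VARCHAR(10),
    CustomerID  INT,
    CreatedDate DATE,
    ClosedDate  DATE NULL
);

INSERT INTO @Jobs (JobNumber, JobCategory, CustomerID, CreatedDate, ClosedDate) VALUES
    (1,  'Small',  1, '20160101', '20160103'),
    (2,  'Small',  2, '20160103', '20160107'),
    (3,  'Large',  2, '20160106', '20160107'),
    (4,  'Medium', 1, '20160108', '20160110'),
    (5,  'Small',  3, '20160110', NULL),
    (6,  'Medium', 3, '20160115', NULL),
    (7,  'Large',  2, '20160116', '20160117'),
    (8,  'Large',  2, '20160119', '20160120'),
    (9,  'Small',  1, '20160119', NULL),
    (10, 'Medium', 2, '20160119', NULL);

WITH JobsWithNeighbours AS (
    SELECT JobNumber, JobCategory, CustomerID, CreatedDate, ClosedDate,
           LEAD(CreatedDate) OVER (PARTITION BY JobCategory, CustomerID ORDER BY CreatedDate) AS NextCreatedDate,
           LAG(ClosedDate)   OVER (PARTITION BY JobCategory, CustomerID ORDER BY CreatedDate) AS PreviousClosedDate
    FROM @Jobs
)
SELECT JobNumber, JobCategory, CustomerID, CreatedDate, ClosedDate
FROM JobsWithNeighbours
WHERE DATEDIFF(DAY, ClosedDate, NextCreatedDate) BETWEEN 0 AND 3          -- this job was followed quickly
   OR DATEDIFF(DAY, PreviousClosedDate, CreatedDate) BETWEEN 0 AND 3      -- this job quickly followed another
ORDER BY JobNumber;
-- Returns jobs 7 and 8
```

Because DATEDIFF yields NULL when either date is NULL, open jobs and first or last jobs in a group drop out of the filter on their own.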
Try:
SELECT a.*
FROM
Job AS a
JOIN
Job AS b ON
a.CustomerID = b.CustomerID AND a.JobCategory = b.JobCategory
WHERE
a.JobNumber != b.JobNumber
AND (
b.CreatedDate - a.ClosedDate BETWEEN 0 AND 3
OR
a.CreatedDate - b.ClosedDate BETWEEN 0 AND 3)

Join to Calendar Table - 5 Business Days

So this is somewhat of a common question on here, but I haven't found an answer that really suits my specific needs. I have 2 tables. One has a list of ProjectClosedDates. The other is a calendar table that runs through about 2025 and has one column indicating whether the row date is a weekend day and another indicating whether it is a holiday.
My end goal is to find out based on the ProjectClosedDate, what date is 5 business days post that date. My idea was that I was going to use the Calendar table and join it to itself so I could then insert a column into the calendar table that was 5 Business days away from the row-date. Then I was going to join the Project table to that table based on ProjectClosedDate = RowDate.
If I was just going to check the actual business-date table for one record, I could use this:
SELECT actual_date from
(
SELECT actual_date, ROW_NUMBER() OVER(ORDER BY actual_date) AS Row
FROM DateTable
WHERE is_holiday= 0 and actual_date > '2013-12-01'
) X
WHERE row = 65
from here:
sql working days holidays
However, this is just one date and I need a column of dates based off of each row. Any thoughts of what the best way to do this would be? I'm using SQL-Server Management Studio.
Completely untested and not thought through:
If the concept of "business days" is common and important in your system, you could add a column "Business Day Sequence" to your table. The column would be a simple unique sequence, incremented by one for every business day and null for every day not counting as a business day.
The data would look something like this:
Date BDAY_SEQ
========== ========
2014-03-03 1
2014-03-04 2
2014-03-05 3
2014-03-06 4
2014-03-07 5
2014-03-08
2014-03-09
2014-03-10 6
Now it's a simple task to find the Nth business day from any date.
You simply do a self join with the calendar table, adding the offset in the join condition.
select a.actual_date
,b.actual_date as nth_business_day
from DateTable a
join DateTable b on(
b.bday_seq = a.bday_seq + 5
);
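One possible way to fill and use such a column is sketched below. It is untested against the real schema and rests on several assumptions: that DateTable already has a nullable INT column bday_seq, that the weekend flag is called is_weekend, and that the project table is called Projects with a ProjectClosedDate column. Only is_holiday and actual_date come from the question itself.

```sql
-- Number the business days in date order; weekends and holidays keep bday_seq = NULL.
-- Updating through the CTE writes the row number back to the underlying table.
WITH business_days AS (
    SELECT bday_seq,
           ROW_NUMBER() OVER (ORDER BY actual_date) AS seq
    FROM DateTable
    WHERE is_weekend = 0      -- assumed column name
      AND is_holiday = 0
)
UPDATE business_days
SET bday_seq = seq;

-- For each project, find the date 5 business days after the close date.
-- MAX(bday_seq) over dates up to the close date gives the sequence number of the
-- last business day on or before it, so closes on weekends or holidays still resolve.
SELECT p.ProjectClosedDate,
       target.actual_date AS five_business_days_later
FROM Projects p               -- assumed table name
CROSS APPLY (SELECT MAX(c.bday_seq) AS seq
             FROM DateTable c
             WHERE c.actual_date <= p.ProjectClosedDate) base
JOIN DateTable target
  ON target.bday_seq = base.seq + 5;
```

The plain self join on bday_seq returns nothing for a close date that falls on a non-business day, because its bday_seq is NULL; the CROSS APPLY step is one way around that.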

Can I use SQL to plot actual dates based on schedule information?

If I have a table containing schedule information that implies particular dates, is there a SQL statement that can be written to convert that information into actual rows, using some sort of CROSS JOIN, perhaps?
Consider a payment schedule table with these columns:
StartDate - the date the schedule begins (1st payment is due on this date)
Term - the length in months of the schedule
Frequency - the number of months between recurrences
PaymentAmt - the payment amount :-)
SchedID StartDate Term Frequency PaymentAmt
-------------------------------------------------
1 05-Jan-2003 48 12 1000.00
2 20-Dec-2008 42 6 25.00
Is there a single SQL statement to allow me to go from the above to the following?
Running
SchedID Payment Due Expected
Num Date Total
--------------------------------------
1 1 05-Jan-2003 1000.00
1 2 05-Jan-2004 2000.00
1 3 05-Jan-2005 3000.00
1 4 05-Jan-2006 4000.00
1 5 05-Jan-2007 5000.00
2 1 20-Dec-2008 25.00
2 2 20-Jun-2009 50.00
2 3 20-Dec-2009 75.00
2 4 20-Jun-2010 100.00
2 5 20-Dec-2010 125.00
2 6 20-Jun-2011 150.00
2 7 20-Dec-2011 175.00
I'm using MS SQL Server 2005 (no hope for an upgrade soon) and I can already do this using a table variable and while loop, but it seemed like some sort of CROSS JOIN would apply but I don't know how that might work.
Your thoughts are appreciated.
EDIT: I'm actually using SQL Server 2005 though I initially said 2000. We aren't quite as backwards as I thought. Sorry.
I cannot test the code right now, so take it with a pinch of salt, but I think that something looking more or less like the following should answer the question:
with q(SchedId, PaymentNum, DueDate, RunningExpectedTotal) as
(select SchedId,
1 as PaymentNum,
StartDate as DueDate,
PaymentAmt as RunningExpectedTotal
from PaymentScheduleTable
union all
select q.SchedId,
1 + q.PaymentNum as PaymentNum,
DATEADD(month, s.Frequency, q.DueDate) as DueDate,
q.RunningExpectedTotal + s.PaymentAmt as RunningExpectedTotal
from q
inner join PaymentScheduleTable s
on s.SchedId = q.SchedId
where q.PaymentNum <= s.Term / s.Frequency)
select *
from q
order by SchedId, PaymentNum
Try using a table of integers (or better this: http://www.sql-server-helper.com/functions/integer-table.aspx) and a little date math, e.g. start + int * freq
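A rough sketch of the integer-table idea follows, assuming the schedule table is named PaymentScheduleTable as in the recursive answer above. The inline Numbers CTE stands in for a real table of integers and simply borrows rows from sys.all_objects to count with; whether a payment falls due exactly at the end of the term (<= versus <) is also an assumption to adjust.

```sql
-- Numbers 0, 1, 2, ... generated inline; a permanent integer table would work the same way.
WITH Numbers AS (
    SELECT TOP (1000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS INT) AS n
    FROM sys.all_objects
)
SELECT s.SchedID,
       n.n + 1                                         AS PaymentNum,
       DATEADD(month, n.n * s.Frequency, s.StartDate)  AS DueDate,              -- start + int * freq
       (n.n + 1) * s.PaymentAmt                        AS RunningExpectedTotal  -- constant amount, so a product suffices
FROM PaymentScheduleTable s
JOIN Numbers n
  ON n.n * s.Frequency <= s.Term    -- keep payments that fall within the term
ORDER BY s.SchedID, PaymentNum;
```

Because every due date is computed directly from StartDate, month-end dates do not drift the way they can when months are added cumulatively.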
I've used table-valued functions to achieve a similar result. Basically the same as using a table variable I know, but I remember being really pleased with the design.
The usage ends up reading very well, in my opinion:
/* assumes @startdate and @enddate schedule limits */
SELECT
p.paymentid,
ps.paymentnum,
ps.duedate,
ps.ret
FROM
payment p
CROSS APPLY dbo.FUNC_get_payment_schedule(p.paymentid, @startdate, @enddate) ps
ORDER BY p.paymentid, ps.paymentnum
A typical solution is to use a Calendar table. You can expand it to fit your own needs, but it would look something like:
CREATE TABLE Calendar
(
calendar_date DATETIME NOT NULL,
is_holiday BIT NOT NULL DEFAULT(0),
CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (calendar_date)
)
In addition to is_holiday, you can add other columns that are relevant for you. You can write a script to populate the table for the next 10 or 100 or 1000 years and you should be all set. It makes queries like the one you're trying to write much simpler and can give you additional functionality.