SQL Server 2008 : how to select sum of all sessions where difference between two consecutive sessions is less than 10 minutes - sql

I have a table which stores chat messages for users. Every message is logged in this table. I have to calculate chat duration for a particular user.
Since there is a possibility that user is chatting at x time and after x+10 times he leaves chatting. After X+20 time, again user starts chatting. So the time period between x+10 and x+20 should not be accounted.
Table structure and sample data is as depicted. Different color represent two chat sessions for same user. As we can see that between 663 and 662 there is a difference of more than 1 hour, so such sessions should be excluded from the resultset. Final result should be 2.33 minutes.
declare #messagetime1 as datetime
declare #messagetime2 as datetime
select #messagetime1=messagetime from tbl_chatMessages where ID=662
select #messagetime2=messagetime from tbl_chatMessages where ID=659
print datediff(second,#messagetime2,#messagetime1)
Result --- 97 seconds
declare #messagetime3 as datetime
declare #messagetime4 as datetime
select #messagetime3=messagetime from tbl_chatMessages where ID=668
select #messagetime4=messagetime from tbl_chatMessages where ID=663
print datediff(second,#messagetime4,#messagetime3)
Result -- 43 seconds
Please suggest a solution to calculate duration of chat. This is one of the logic I could think of, in case any one of you has a better idea. Please share with a solution

first need to calculate the gap between adjacent messages, if the gap of more than 600 seconds, so the time between these messages 0
SELECT SUM(o.duration) / 60.00 AS duration
FROM dbo.tbl_chatMessages t1
OUTER APPLY (
SELECT TOP 1
CASE WHEN DATEDIFF(second, t2.messageTime, t1.messageTime) > 600
THEN 0
ELSE DATEDIFF(second, t2.messageTime, t1.messageTime) END
FROM dbo.tbl_chatMessages t2
WHERE t1.messageTime > t2.messageTime
ORDER BY t2.messageTime DESC
) o(duration)
See demo on SQLFiddle

Try something like this:
WITH DATA
AS (SELECT t1.*,
CASE
WHEN
Isnull(Datediff(MI, t2.MESSAGETIME, t1.MESSAGETIME), 11) > 10
THEN 0
ELSE 1
END first_ident
FROM TABLE1 t1
LEFT JOIN TABLE1 t2
ON t1.ID = t2.ID + 1),
CTE
AS (SELECT ID,
MESSAGETIME,
ID gid,
0 AS tot_time
FROM DATA
WHERE FIRST_IDENT = 0
UNION ALL
SELECT t1.ID,
t1.MESSAGETIME,
t2.GID,
t2.TOT_TIME
+ Datediff(MI, t2.MESSAGETIME, t1.MESSAGETIME)
FROM DATA t1
INNER JOIN CTE t2
ON t1.ID = t2.ID + 1
AND t1.FIRST_IDENT = 1)
SELECT GID,
Max(TOT_TIME) Tot_time
FROM CTE
GROUP BY GID
I set up a working example on SQL Fiddle. Take a look and let me know if you have any questions.

Here is the reasoning behind my solution. First, identify each chat that starts a chatting period. You can do this with a flag that identifies a chat that is more than 10 minutes from the previous chat.
Then, take this flag and do a cumulative sum. This sum actually serves as a grouping identifier for the chat periods. Finally, aggregate the results to get the info for each chat period.
with cmflag as (
select cm.*,
(case when datediff(min, prevmessagetime, messagetime) > 10
then 0
else 1
end) as ChatPeriodStartFlag
from (select cm.*,
(select top 1 messagetime
from tbl_chatMessages cm2
where cm2.senderId = cm.senderId or
cm2.RecipientId = cm.senderId
) as prevmessagetme
from tbl_chatMessages cm
) cm
),
cmcum as (
select cm.*,
(select sum(ChatPeriodStartFlag)
from cmflag cmf
where cm2.senderId = cm.senderId or
cm2.RecipientId = cm.senderId and
cmf.messagetime <= cm.messagetime
) as ChatPeriodGroup
from tbl_chatMessages cm
)
select cm.SenderId, ChatPeriodGroup, min(messageTime) as mint, max(messageTime) as maxT
from cmcum
group by cm.SenderId, ChatPeriodGroup;
One challenge that I may not fully understand is how you are matching between senders and recipients. All the rows in your sample data have the same pair. This is looking at the "user" from the SenderId perspective, but takes into account that in a chat period, the user could be either the sender or recipient.

You could use this query (here):
DECLARE #Results TABLE(
RowNum INT NOT NULL,
senderID INT NOT NULL DEFAULT(80),
recipientID INT NOT NULL DEFAULT(79),
PRIMARY KEY(RowNum,senderID,recipientID),
messageTime DATETIME NOT NULL
);
INSERT INTO #Results(RowNum,senderID,recipientID,messageTime)
SELECT ROW_NUMBER() OVER(PARTITION BY senderID,recipientID ORDER BY messageTime, ID) AS RowNum,
c.senderID,c.recipientID,c.messageTime
FROM dbo.tbl_chatMessages c;
WITH RecursiveCTE
AS(
SELECT crt.RowNum,crt.senderID,crt.recipientID,
crt.messageTime,
1 AS SessionID
FROM #Results crt
WHERE crt.RowNum=1
UNION ALL
SELECT crt.RowNum,crt.senderID,crt.recipientID,
crt.messageTime,
CASE
WHEN DATEDIFF(MINUTE,prev.messageTime,crt.messageTime) <= 10 THEN prev.SessionID
ELSE prev.SessionID+1
END
FROM #Results crt INNER JOIN RecursiveCTE prev ON crt.RowNum=prev.RowNum+1
AND crt.senderID=prev.senderID
AND crt.recipientID=prev.recipientID
)
SELECT *,
STUFF(CONVERT(VARCHAR(8), DATEADD(SECOND,x.SessionDuration,0), 114), 1,3,'') AS SessionDuration_mmss,
SUM(x.SessionDuration) OVER() AS SessionDuration_Overall,
STUFF(CONVERT(VARCHAR(8), DATEADD(SECOND,SUM(x.SessionDuration) OVER(),0), 114), 1,3,'') AS SessionDuration_Overall_mmss
FROM(
SELECT r.senderID,r.recipientID,r.SessionID,
DATEDIFF(SECOND, MIN(r.messageTime),MAX(r.messageTime)) AS SessionDuration
FROM RecursiveCTE r
GROUP BY r.senderID,r.recipientID,r.SessionID
) x
OPTION(MAXRECURSION 0);
Results:
senderID recipientID SessionID SessionDuration SessionDuration_mmss SessionDuration_Overall SessionDuration_Overall_mmss
-------- ----------- ----------- --------------- -------------------- ----------------------- ----------------------------
80 79 1 97 01:37 140 02:20
80 79 2 43 00:43 140 02:20

I’d focus on slight modifications in table structure and updating the chat server application code (if possible of course).
Can you have the chat server to generate new chat ID every time there is a delay between messages that is longer than X minutes? If yes then calculating chat duration will become very easy.

Related

Update record for the last week

I'm building a report that needs to show how many users were upgraded from account status 1 to account status 2 each hour for the last week (and delete hours where the upgrades = 0). My table has an updated date, however it isn't certain that the account status is the item being updated (it could be contact information etc).
The basic table config that I'm working with is below. There are other columns but they aren't needed for my query.
account_id, account_status, updated_date.
My initial idea was to first filter and look at the data for the current week, then find if they were at account_status = 1 and later account_status = 2.
What's the best way to tackle this?
This is the kind of thing that you would use a SELF JOIN for. It's tough to say exactly how to do this without getting any kind of example data, but hopefully you can build off of this at least. There are a lot of tutorials on how to write a successful self join, so I'd refer to those if you're having difficulties.
select a.account_id
from tableName a, tableName b
where a.account_id= b.account_id
and
(a.DateModified > 'YYYY-MM-DD' and a.account_status = 1)
and
(b.DateModified < 'YYYY-MM-DD' and b.account_status= 2)
Maybe you could try to rank all the updates older than an update, with a status of 2 for an account by the timestamp descending. Check if such an entry with status 1 and rank 1 exists, to know that the respective younger update did change the status from 1 to 2.
SELECT *
FROM elbat t1
WHERE t1.account_status = 2
AND EXISTS (SELECT *
FROM (SELECT rank() OVER (ORDER BY t2.updated_date DESC) r,
t2.account_status
FROM elbat t2
WHERE t2.account_id = t1.account_id
AND t2.updated_date <= t1.updated_date) x
WHERE x.account_status = 1
AND x.r = 1);
Then, to get the hours you, could create a table variable and fill it with the hours worth a week (unless you already have a suitable calender/time table). Then INNER JOIN that table (variable) to the result from above. Since it's an INNER JOIN hours where no status update exists won't be in the result.
DECLARE #current_time datetime = getdate();
DECLARE #current_hour datetime = dateadd(hour,
datepart(hour,
#current_time),
convert(datetime,
convert(date,
#current_time)));
DECLARE #hours
TABLE (hour datetime);
DECLARE #interval_size integer = 7 * 24;
WHILE #interval_size > 0
BEGIN
INSERT INTO #hours
(hour)
VALUES (dateadd(hour,
-1 * #interval_size,
#current_hour));
SET #interval_size = #interval_size - 1;
END;
SELECT *
FROM #hours h
INNER JOIN (SELECT *
FROM elbat t1
WHERE t1.account_status = 2
AND EXISTS (SELECT *
FROM (SELECT rank() OVER (ORDER BY t2.updated_date DESC) r,
t2.account_status
FROM elbat t2
WHERE t2.account_id = t1.account_id
AND t2.updated_date <= t1.updated_date) x
WHERE x.account_status = 1
AND x.r = 1)) y
ON convert(date,
y.updated_date) = h.convert(date,
h.hour)
AND datepart(hour,
y.updated_date) = datepart(hour,
h.hour);
If you use this often and/or performance is important, you might consider to introduce persistent, computed and indexed columns for the convert(...) and datepart(...) expressions and use them in the query instead. Indexing the calender/time table and the columns used in the subqueries is also worth a consideration.
(Disclaimer: Since you didn't provide DDL of the table nor any sample data this is totally untested.)

SQL Query Help - Negative reporting

Perhaps somebody can help with Ideas or a Solution. A User asked me for a negative report. We have a table with tickets each ticket has a ticket number which would be easy to select but the user wants a list of missing tickets between the first and last ticket in the system.
E.g. Select TicketNr from Ticket order by TicketNr
Result
1,
2,
4,
7,
11
But we actually want the result 3,5,6,8,9,10
CREATE TABLE [dbo].[Ticket](
[pknTicketId] [int] IDENTITY(1,1) NOT NULL,
[TicketNr] [int] NULL
) ON [PRIMARY]
GO
SQL Server 2016 - TSQL
Any ideas ?
So a bit more information is need all solution thus far works on small table. Our production database has over 4 million tickets. Hence why we need to find the missing ones.
First get the minimum and maximum, then generate all posible ticket numbers and finally select the ones that are missing.
;WITH FirstAndLast AS
(
SELECT
MinTicketNr = MIN(T.TicketNr),
MaxTicketNr = MAX(T.TicketNr)
FROM
Ticket AS T
),
AllTickets AS
(
SELECT
TicketNr = MinTicketNr,
MaxTicketNr = T.MaxTicketNr
FROM
FirstAndLast AS T
UNION ALL
SELECT
TicketNr = A.TicketNr + 1,
MaxTicketNr = A.MaxTicketNr
FROM
AllTickets AS A
WHERE
A.TicketNr + 1 <= A.MaxTicketNr
)
SELECT
A.TicketNr
FROM
AllTickets AS A
WHERE
NOT EXISTS (
SELECT
'missing ticket'
FROM
Ticket AS T
WHERE
A.TicketNr = T.TicketNr)
ORDER BY
A.TicketNr
OPTION
(MAXRECURSION 32000)
If you can accept the results in a different format, the following will do what you want:
select TicketNr + 1 as first_missing,
next_TicketNr - 1 as last_missing,
(next_TicketNr - TicketNr - 1) as num_missing
from (select t.*, lead(TicketNr) over (order by TicketNr) as next_TicketNr
from Ticket t
) t
where next_TicketNr <> TicketNr + 1;
This shows each sequence of missing ticket numbers on a single row, rather than a separate row for each of them.
If you do use a recursive CTE, I would recommend doing it only for the missing tickets:
with cte as (
select (TicketNr + 1) as missing_TicketNr
from (select t.*, lead(TicketNr) over (order by TicketNr) as next_ticketNr
from tickets t
) t
where next_TicketNr <> TicketNr + 1
union all
select missing_TicketNr + 1
from cte
where not exists (select 1 from tickets t2 where t2.TicketNr = cte.missing_TicketNr + 1)
)
select *
from cte;
This version starts with the list of missing ticket numbers. It then adds a new one, as the numbers are not found.
One method is to use recursive cte to find the missing ticket numbers :
with missing as (
select min(TicketNr) as mnt, max(TicketNr) as mxt
from ticket t
union all
select mnt+1, mxt
from missing m
where mnt < mxt
)
select m.*
from missing m
where not exists (select 1 from tickets t where t.TicketNr = m.mnt);
This should do the trick: SQL Fiddle
declare #ticketsTable table (ticketNo int not null)
insert #ticketsTable (ticketNo) values (1),(2),(4),(7),(11)
;with cte1(ticketNo, isMissing, sequenceNo) AS
(
select ticketNo
, 0
, row_number() over (order by ticketNo)
from #ticketsTable
)
, cte2(ticketNo, isMissing, sequenceNo) AS
(
select ticketNo, isMissing, sequenceNo
from cte1
union all
select a.ticketNo + 1
, 1
, a.sequenceNo
from cte2 a
inner join cte1 b
on b.sequenceNo = a.sequenceNo + 1
and b.ticketNo != a.ticketNo + 1
)
select *
from cte2
where isMissing = 1
order by ticketNo
It works by collecting all of the existing tickets, marking them as existing, and assigning each a consecutive number giving their order in the original list.
We can then see the gaps in the list by finding any spots where the consecutive order number shows the next record, but the ticket numbers are not consecutive.
Finally, we recursively fill in the gaps; working from the start of a gap and adding new records until that gap's consecutive numbers no longer has a gap between the related ticket numbers.
I think this one give you easiest solution
with cte as(
select max(TicketNr) maxnum,min(TicketNr) minnum from Ticket )
select a.number FROM master..spt_values a,cte
WHERE Type = 'P' and number < cte.maxnum and number > cte.minno
except
select TicketNr FROM Ticket
So After looking at all the solutions
I went with creating a temp table with a full range of number from Starting to Ending ticket and then select from the Temp table where the ticket number not in the ticket table.
The reason being I kept running in MAXRECURSION problems.

sql lowest running balance in a group

I've been trying for days to solve this problem to no solution.
I want to get the lowest running balance in a group.
Here is a sample data
The running balance is imaginary and is not part of the table.
the running balance is also computed dynamically.
the problem is I want to get the lowest running balance in a Specific month (January)
so the output should be 150 for memberid 10001 and 175 for memberid 10002 as highlighted in the image.
my desired out put should be
memberid | balance
10001 | 150
10002 | 175
Is that possible using sql query only?
PS. Using c# to compute lowest running balance is very slow since I have more than 600,000 records in my table.
I've updated the question.
The answer provided by Mihir Shah gave me the idea how solve my problem.
His answer takes to much time to process making it as slow as my computation on my c# program because his code loops on every record.
Here is my answer to get the minimum lowest value in a specific group (specific month) with a running value or running total without sacrificing a lot of performance.
with IniMonth1 as
(
select a.memberid, a.iniDeposit, a.iniWithdrawal,
(cast(a.iniDeposit as decimal(10,2)) - cast(a.iniWithdrawal as decimal(10,2))) as RunningTotal
from
(
select b.memberid, sum(b.depositamt) as iniDeposit, sum(b.withdrawalamt) as iniWithdrawal
from savings b
where trdate < '01/01/2016'
group by b.memberid
) a /*gets all the memberid, sum of deposit amount and withdrawal amt from the beginning of the savings before the specific month */
where cast(a.iniDeposit as decimal(10,2)) - cast(a.iniWithdrawal as decimal(10,2)) > 0 /*filters zero savings */
)
,DetailMonth1 as
(
select a.memberid, a.depositamt,a.withdrawalamt,
(cast(a.depositamt as decimal(10,2)) - cast(a.withdrawalamt as decimal(10,2))) as totalBal,
Row_Number() Over(Partition By a.memberid Order By a.trdate Asc) RowID
from savings a
where
a.trdate >= '01/01/2016'
and
a.trdate <= '01/31/2016'
and (a.depositamt<>0 or a.withdrawalamt<>0)
) /* gets all record within the specific month and gives a no of row as an id for the running value in the next procedure*/
,ComputedDetailMonth1 as
(
select a.memberid, min(a.runningbalance) as MinRunningBal
from
(
select a.rowid, a.memberid, a.totalbal,
(
sum(b.totalbal) +
(case
when c.runningtotal is null then 0
else c.runningtotal
end)
)as runningbalance , c.runningtotal as oldbalance
from DetailMonth1 a
inner join DetailMonth1 b
on b.rowid<=a.rowid
and a.memberid=b.memberid
left join IniMonth1 c
on a.memberid=c.memberid
group by a.rowid,a.memberid,a.totalbal,c.runningtotal
) a
group by a.memberid
) /* the loop is only for the records of the specific month only making it much faster */
/* this gets the running balance of specific month ONLY and ADD the sum total of IniMonth1 using join to get the running balance from the beginning of savings to the specific month */
/* I then get the minimum of the output using the min function*/
, OldBalanceWithNoNewSavingsMonth1 as
(
select a.memberid,a.RunningTotal
from
IniMonth1 a
left join
DetailMonth1 b
on a.memberid = b.memberid
where b.totalbal is null
)/*this gets all the savings that is not zero and has no transaction in the specific month making and this will become the default value as the lowest value if the member has no transaction in the specific month. */
,finalComputedMonth1 as
(
select a.memberid,a.runningTotal as MinRunTotal from OldBalanceWithNoNewSavingsMonth1 a
union
select b.memberid,b.MinRunningBal from ComputedDetailMonth1 b
)/*the record with minimum running total with clients that has a transaction in the specific month Unions with the members with no current transaction in the specific month*/
select * from finalComputedMonth1 order by memberid /* display the final output */
I have more than 600k savings record on my savings table
Surprisingly the performance of this code is very efficient.
It takes almost 2hr using my c# program to manually compute every record of all the members.
This code makes only 2 secs and at most 9 secs just to compute everything.
i Just display to c# for another 2secs.
The output of this code was tested and compared with my computation using my c# program.
May be below one is help you
Set Nocount On;
Declare #CashFlow Table
(
savingsid Varchar(50)
,memberid Int
,trdate Date
,deposit Decimal(18,2)
,withdrawal Decimal(18,2)
)
Insert Into #CashFlow(savingsid,memberid,trdate,deposit,withdrawal) Values
('10001-0002',10001,'01/01/2015',1000,0)
,('10001-0003',10001,'01/07/2015',25,0)
,('10001-0004',10001,'01/13/2015',25,0)
,('10001-0005',10001,'01/19/2015',0,900)
,('10001-0006',10001,'01/25/2015',25,0)
,('10001-0007',10001,'01/31/2015',25,0)
,('10001-0008',10001,'02/06/2015',25,0)
,('10001-0009',10001,'02/12/2015',25,0)
,('10001-0010',10001,'02/18/2015',0,200)
,('10002-0001',10002,'01/01/2015',500,0)
,('10002-0002',10002,'01/07/2015',25,0)
,('10002-0003',10002,'01/13/2015',0,200)
,('10002-0004',10002,'01/19/2015',25,0)
,('10002-0005',10002,'01/25/2015',25,0)
,('10002-0006',10002,'01/31/2015',0,200)
,('10002-0007',10002,'02/06/2015',25,0)
,('10002-0008',10002,'02/12/2015',25,0)
,('10002-0009',10002,'02/12/2015',0,200)
;With TrialBalance As
(
Select Row_Number() Over(Partition By cf.memberid Order By cf.trdate Asc) RowNum
,cf.memberid
,cf.deposit
,cf.withdrawal
,cf.trdate
From #CashFlow As cf
)
,RunningBalance As
(
Select tb.RowNum
,tb.memberid
,tb.deposit
,tb.withdrawal
,tb.trdate
From TrialBalance As tb
Where tb.RowNum = 1
Union All
Select tb.RowNum
,rb.memberid
,Cast((rb.deposit + tb.deposit - tb.withdrawal) As Decimal(18,2))
,rb.withdrawal
,tb.trdate
From TrialBalance As tb
Join RunningBalance As rb On tb.RowNum = (rb.Rownum + 1) And tb.memberid = rb.memberid
)
Select rb.memberid
,Min(rb.deposit) As runningBalance
From RunningBalance As rb
Where Year(rb.trdate) = 2015
And Month(rb.trdate) = 1
Group By rb.memberid

Find Segment with Longest Stay Per Booking

We have a number of bookings and one of the requirements is that we display the Final Destination for a booking based on its segments. Our business has defined the Final Destination as that in which we have the longest stay. And Origin being the first departure point.
Please note this is not the segments with the Longest Travel time i.e. Datediff(minute, DepartDate, ArrivalDate) This is requesting the one with the Longest gap between segments.
This is a simplified version of the tables:
Create Table Segments
(
BookingID int,
SegNum int,
DepartureCity varchar(100),
DepartDate datetime,
ArrivalCity varchar(100),
ArrivalDate datetime
);
Create Table Bookings
(
BookingID int identity(1,1),
Locator varchar(10)
);
Insert into Segments values (1,2,'BRU','2010-03-06 10:40','FIH','2010-03-06 20:20:00')
Insert into Segments values (1,4,'FIH','2010-03-13 21:50:00','BRU', '2010-03-14 07:25:00')
Insert into Segments values (2,2,'BOD','2010-02-10 06:50:00','AMS','2010-02-10 08:50:00')
Insert into Segments values (2,3,'AMS','2010-02-10 10:40:00','EBB','2010-02-10 20:40:00')
Insert into Segments values (2,4,'EBB','2010-02-28 22:55:00','AMS','2010-03-01 05:35:00')
Insert into Segments values (2,5,'AMS','2010-03-01 10:25:00','BOD','2010-03-01 12:15:00')
insert into Segments values (3,2,'BRU','2010-03-09 12:10:00','IAD','2010-03-09 14:46:00')
Insert into Segments Values (3,3,'IAD','2010-03-13 17:57:00','BRU','2010-03-14 07:15:00')
insert into segments values (4,2,'BRU','2010-07-27','ADD','2010-07-28')
insert into segments values (4,4,'ADD','2010-07-28','LUN','2010-07-28')
insert into segments values (4,5,'LUN','2010-08-23','ADD','2010-08-23')
insert into segments values (4,6,'ADD','2010-08-23','BRU','2010-08-24')
Insert into Bookings values('5MVL7J')
Insert into Bookings values ('Y2IMXQ')
insert into bookings values ('YCBL5C')
Insert into bookings values ('X7THJ6')
I have created a SQL Fiddle with real data here:
SQL Fiddle Example
I have tried to do the following, however this doesn't appear to be correct.
SELECT Locator, fd.*
FROM Bookings ob
OUTER APPLY
(
SELECT Top 1 DepartureCity, ArrivalCity
from
(
SELECT DISTINCT
seg.segnum ,
seg.DepartureCity ,
seg.DepartDate ,
seg.ArrivalCity ,
seg.ArrivalDate,
(SELECT
DISTINCT
DATEDIFF(MINUTE , seg.ArrivalDate , s2.DepartDate)
FROM Segments s2
WHERE s2.BookingID = seg.BookingID AND s2.segnum = seg.segnum + 1) 'LengthOfStay'
FROM Bookings b(NOLOCK)
INNER JOIN Segments seg (NOLOCK) ON seg.bookingid = b.bookingid
WHERE b.Locator = ob.locator
) a
Order by a.lengthofstay desc
)
FD
The results I expect are:
Locator Origin Destination
5MVL7J BRU FIH
Y2IMXQ BOD EBB
YCBL5C BRU IAD
X7THJ6 BRU LUN
I get the feeling that a CTE would be the best approach, however my attempts do this so far failed miserably. Any help would be greatly appreciated.
I have managed to get the following query working but it only works for one at a time due to the top one, but I'm not sure how to tweak it:
WITH CTE AS
(
SELECT distinct s.DepartureCity, s.DepartDate, s.ArrivalCity, s.ArrivalDate, b.Locator , ROW_NUMBER() OVER (PARTITION BY b.Locator ORDER BY SegNum ASC) RN
FROM Segments s
JOIN bookings b ON s.bookingid = b.BookingID
)
SELECT C.Locator, c.DepartureCity, a.ArrivalCity
FROM
(
SELECT TOP 1 C.Locator, c.ArrivalCity, c1.DepartureCity, DATEDIFF(MINUTE,c.ArrivalDate, c1.DepartDate) 'ddiff'
FROM CTE c
JOIN cte c1 ON c1.Locator = C.Locator AND c1.rn = c.rn + 1
ORDER BY ddiff DESC
) a
JOIN CTE c ON C.Locator = a.Locator
WHERE c.rn = 1
You can try something like this:
;WITH CTE_Start AS
(
--Ordering of segments to eliminate gaps
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY SegNum) RN
FROM dbo.Segments
)
, RCTE_Stay AS
(
--recursive CTE to calculate stay between segments
SELECT *, 0 AS Stay FROM CTE_Start s WHERE RN = 1
UNION ALL
SELECT sNext.*, DATEDIFF(Mi, s.ArrivalDate, sNext.DepartDate)
FROM CTE_Start sNext
INNER JOIN RCTE_Stay s ON s.RN + 1 = sNext.RN AND s.BookingID = sNext.BookingID
)
, CTE_Final AS
(
--Search for max(stay) for each bookingID
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY Stay DESC) AS RN_Stay
FROM RCTE_Stay
)
--join Start and Final on RN=1 to find origin and departure
SELECT b.Locator, s.DepartureCity AS Origin, f.DepartureCity AS Destination
FROM CTE_Final f
INNER JOIN CTE_Start s ON f.BookingID = s.BookingID
INNER JOIN dbo.Bookings b ON b.BookingID = f.BookingID
WHERE s.RN = 1 AND f.RN_Stay = 1
SQLFiddle DEMO
You can use the OUTER APPLY + TOP operators to find the next values SegNum. After finding the gap between segments are used MIN/MAX aggregate functions with OVER clause as conditions in the CASE expression
;WITH cte AS
(
SELECT seg.BookingID,
CASE WHEN MIN(seg.segNum) OVER(PARTITION BY seg.BookingID) = seg.segNum
THEN seg.DepartureCity END AS Origin,
CASE WHEN MAX(DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)) OVER(PARTITION BY seg.BookingID)
= DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)
THEN o.DepartureCity END AS Destination
FROM Segments seg (NOLOCK)
OUTER APPLY (
SELECT TOP 1 seg2.DepartDate, seg2.DepartureCity
FROM Segments seg2
WHERE seg.BookingID = seg2.BookingID
AND seg.SegNum < seg2.SegNum
ORDER BY seg2.SegNum ASC
) o
)
SELECT b.Locator, MAX(c.Origin) AS Origin, MAX(c.Destination) AS Destination
FROM cte c JOIN Bookings b ON c.BookingID = b.BookingID
GROUP BY b.Locator
See demo on SQLFiddle
The statement below:
;WITH DataSource AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY BookingID ORDER BY DATEDIFF(SS,DepartDate,ArrivalDate) DESC) AS Row
,Segments.BookingID
,Segments.SegNum
,Segments.DepartureCity
,Segments.DepartDate
,Segments.ArrivalCity
,Segments.ArrivalDate
,DATEDIFF(SS,DepartDate,ArrivalDate) AS DiffInSeconds
FROM Segments
)
SELECT *
FROM DataSource DS
INNER JOIN Bookings B
ON DS.[BookingID] = B.[BookingID]
Will give the following output:
So, adding the following clause to the above statement:
WHERE Row = 1
will give you what you need.
Few important things:
As you can see from the screenshot below, there are two records with same difference in second. If you want to show both of them (or all of them if there are), instead ROW_NUMBER function use RANK function.
The return type of DATEDIFF is INT. So, there is limitation for seconds max deference value. It is as follows:
If the return value is out of range for int (-2,147,483,648 to
+2,147,483,647), an error is returned. For millisecond, the maximum difference between startdate and enddate is 24 days, 20 hours, 31
minutes and 23.647 seconds. For second, the maximum difference is 68
years.

Find the longest sequence of a value in a table

This is an SQL Question, I think it is difficult one - I'm not sure it is possible to achieve in a simple SQL sentence or a stored procedure:
I want to find the number of the longest sequence of the same (known) number in a column in a table:
example:
TABLE:
DATE SALEDITEMS
1/1/09 4
1/2/09 3
1/3/09 3
1/4/09 4
1/5/09 3
calling the sp/sentence for 4 will give 1 calling the sp/sentecne for 3 will give 2
as there was 2 times in a row number 3.
I'm running SQL server 2008.
UPDATE: I generated a million rows of random data, and abandoned the recursive CTE solution, as its query plan didn't make good use of indexes in the optimizer.
But the non-recursive solution I originaly posted turned out to work great, as long as there was an additional non-clustered index on (SALEDITEMS, [DATE]). This makes sense, since the query needs to filter in both directions (both by date and by SALEDITEMS). With this additional index, queries on a million rows return in under 2 seconds on my (not very beefy) desktop mathine. Without this index, the query was dog-slow.
BTW, this is a great example of how SQL Server's cost-based query optimization totally breaks down in some cases. The recursive CTE solution has a cost (on my PC) of 42 and takes at least several minutes to finish. The non-recursive solution has a cost of 15,446 (!!!) and completes in 1.5 seconds. Moral of the story: when comparing SQL Server query plans, don't assume that cost necessarily correlates to query performance!
Anyway, here's the solution I'd recommend (the same non-recursive CTE I posted earlier) :
DECLARE #SALEDITEMS INT = 3;
WITH SalesNoMatch ([DATE], SALEDITEMS, NoMatchDate)
AS
(
SELECT [DATE], SALEDITEMS,
(SELECT MIN([DATE]) FROM Sales s2 WHERE s2.SALEDITEMS <> #SALEDITEMS
AND s2.[DATE] > s1.[DATE]) as NoMatchDate
FROM Sales s1
)
, SalesMatchCount ([DATE], ConsecutiveCount) AS
(
SELECT [DATE], 1+(SELECT COUNT(1) FROM Sales s2 WHERE s2.[DATE] > s1.[DATE] AND s2.[DATE] < NoMatchDate)
FROM SalesNoMatch s1
WHERE s1.SALEDITEMS = #SALEDITEMS
)
SELECT MAX(ConsecutiveCount)
FROM SalesMatchCount;
Here's the DDL I used to test this, including indexes you'll need:
CREATE TABLE [Sales](
[DATE] date NOT NULL,
[SALEDITEMS] int NOT NULL
);
CREATE UNIQUE CLUSTERED INDEX IX_Sales ON Sales ([DATE]);
CREATE UNIQUE NONCLUSTERED INDEX IX_Sales2 ON Sales (SALEDITEMS, [DATE]);
And here's how I created my test data-- 1,000,001 rows with ascending dates with SALEDITEMS randomly set between 1 and 10.
INSERT INTO Sales ([DATE], SALEDITEMS)
VALUES ('1/1/09', 5)
DECLARE #i int = 0;
WHILE (#i < 1000000)
BEGIN
INSERT INTO Sales ([DATE], SALEDITEMS)
SELECT DATEADD (d, 1, (SELECT MAX ([DATE]) FROM Sales)), ABS(CHECKSUM(NEWID())) % 10 + 1
SET #i = #i + 1;
END
Here's the recursive-CTE solution that I abandoned:
DECLARE #SALEDITEMS INT = 3;
-- recursive CTE solution (remember to set MAXRECURSION!)
WITH SalesRowNum ([DATE], SALEDITEMS, RowNum)
AS
(
SELECT [DATE], SALEDITEMS, ROW_NUMBER() OVER (ORDER BY s1.[DATE]) as RowNum
FROM Sales s1
)
, SalesCTE (RowNum, [DATE], ConsecutiveCount)
AS
(
SELECT s1.RowNum, s1.[DATE], 1 AS ConsecutiveCount
FROM SalesRowNum s1
WHERE SALEDITEMS = #SALEDITEMS
UNION ALL
SELECT s1.RowNum, s1.[DATE], ConsecutiveCount + 1 AS ConsecutiveCount
FROM SalesRowNum s1
INNER JOIN SalesCTE s2 ON s1.RowNum = s2.RowNum + 1
WHERE SALEDITEMS = #SALEDITEMS
)
SELECT MAX(ConsecutiveCount)
FROM SalesCTE;
Untested, because you did not provide DDL and sample data:
DECLARE #SALEDITEMS INT;
SET #SALEDITEMS=3;
SELECT MAX(cnt) FROM(
SELECT COUNT(*) FROM YourTable JOIN (
SELECT y1.[Date] AS d1, y2.[Date] AS d2
FROM YourTable AS y1 JOIN YourTable AS y2
ON y1.SALEDITEMS=#SALEDITEMS AND y2.SALEDITEMS=#SALEDITEMS
AND NOT EXISTS(SELECT 1 FROM YourTable AS y
WHERE y.SALEDITEMS<>#SALEDITEMS
AND y1.[Date] < y.[Date] AND y.[Date] < y2.[Date])
) AS t
WHERE [Date] BETWEEN t.d1 AND t.d2
) AS t;