SQL find next unique date and subaccount per account

This query has a few requirements. The basic idea is that for each account, pull the next admit_date and corresponding discharge_date after the subaccount of interest. If there is no next admit_date that is unique, indicate "No Readmit."
I realize pictures are not encouraged on StackOverflow, but I feel a visual aid is helpful. The accounts of interest are AAA, BBB, CCC and DDD and the subaccounts of interest are 121, 214, 315, 414 and 416. Note that CCC has no next unique admit_date (it would be "No Readmit"), that DDD has two subaccounts of interest, each with a next unique admit_date, and that the subaccounts are not necessarily in numerical order (e.g. BBB begins at 221 and ends at 216). So transforming this:
[image: source table]
To this:
[image: desired result]
Here is the setup code:
CREATE TABLE random_table
(
account VarChar(50),
subaccount VarChar(50),
admit_date DATETIME,
discharge_date DATETIME
);
-- Date literals are quoted: an unquoted 6/20/2021 is evaluated as integer division.
-- Subaccounts are quoted to match the VarChar column.
INSERT INTO random_table
VALUES
('AAA','111','20210620','20210625'),
('AAA','121','20210620','20210625'),
('AAA','131','20210701','20210703'),
('AAA','141','20210802','20210805'),
('BBB','216','20210401','20210403'),
('BBB','213','20210401','20210403'),
('BBB','221','20210401','20210403'),
('BBB','215','20210401','20210403'),
('BBB','216','20210405','20210410'),
('CCC','313','20201101','20201105'),
('CCC','314','20201115','20201117'),
('CCC','315','20201223','20201224'),
('CCC','316','20201223','20201224'),
('DDD','414','20210701','20210703'),
('DDD','412','20210706','20210707'),
('DDD','416','20210801','20210805'),
('DDD','417','20210810','20210815');
To solve this, I've been trying to use a combination of row_number() to mark the first new instance of each admit_date (partitioned by the account), as well as CTEs to select those relevant rows. But I'm obviously not there yet. Any suggestions? Here's what I have:
select
cte2.*
,case when cte2.subaccount in (111,121,131,141,216,213,221,215,216,313,314,315,316,414,412,416,417
) then lead(cte2.admit_date) over (order by cte2.account, cte2.row_nums)
else null
end second_admit
from (
select
cte.*
,row_number() over (partition by cte.account order by cte.row_num) row_nums
from (
select distinct
hsp.subaccount
,row_number() over (partition by pat.account, hsp.admit_date order by pat.account) row_num
,case when row_number() over (partition by pat.account,hsp.admit_date order by pat.account) =1 then 'New Admit' else null end new_admit
,convert(varchar,hsp.admit_date,101) adm_date
,convert(varchar,hsp.discharge_date,101) disch_date
,pat.account
from hsp_account hsp
left join patient pat on hsp.pat_id=pat.pat_id
where pat.account in ('AAA','BBB','CCC','DDD')
) cte
where cte.new_admit = 'New Admit'
) cte2

Hope this is what you're looking for:
with AI as
(
select * from
(values ('AAA'), ('BBB'), ('CCC'), ('DDD')) A(account)
),
SI as
(
select * from
(values (121), (214), (221), (315), (414), (416)) A(subaccount)
),
T as
(
select * from random_table
where account in (select * from AI)
),
N as
(
select
T1.account,
T1.subaccount,
T1.admit_date,
T1.discharge_date,
T2.subaccount next_subaccount,
T2.admit_date next_admit_date,
T2.discharge_date next_discharge_date,
row_number()
over(
partition by T1.account, T1.subaccount
order by T2.admit_date) group_id
from
T T1
left join
T T2
on
T1.account = T2.account and
T1.admit_date < T2.admit_date
where
T1.subaccount in (select * from SI)
)
select
account, subaccount, admit_date,
next_subaccount, next_admit_date, next_discharge_date
from N
where
N.group_id = 1
Please note that I print NULL instead of 'No Readmit'.
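If the literal 'No Readmit' text is wanted, the NULLs can be replaced at the very end. The following is a minimal sketch rather than a drop-in replacement. It assumes the random_table from the question, loaded with properly quoted date literals. It swaps the self-join plus row_number() for OUTER APPLY with TOP (1), and it formats the dates as strings (style 101) so that both arguments of COALESCE have the same type. The alias n and the output column names are my own choices.

```sql
-- Sketch: next later admission per subaccount of interest, labelled 'No Readmit' when none exists.
-- Assumes random_table(account, subaccount, admit_date, discharge_date) as defined in the question.
SELECT
    t.account,
    t.subaccount,
    t.admit_date,
    COALESCE(n.subaccount, 'No Readmit')                                AS next_subaccount,
    COALESCE(CONVERT(varchar(10), n.admit_date, 101), 'No Readmit')     AS next_admit_date,
    COALESCE(CONVERT(varchar(10), n.discharge_date, 101), 'No Readmit') AS next_discharge_date
FROM random_table AS t
OUTER APPLY (
    -- earliest row of the same account with a strictly later admit_date;
    -- subaccount is only a tie-breaker so the result is deterministic
    SELECT TOP (1) t2.subaccount, t2.admit_date, t2.discharge_date
    FROM random_table AS t2
    WHERE t2.account = t.account
      AND t2.admit_date > t.admit_date
    ORDER BY t2.admit_date, t2.subaccount
) AS n
WHERE t.account IN ('AAA', 'BBB', 'CCC', 'DDD')
  AND t.subaccount IN ('121', '214', '221', '315', '414', '416');
```

Because the comparison is strictly greater than, rows that share the current admit_date are skipped, which is how "next unique" is interpreted here.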

Related

How to generate ten million test/sample records for SQL Server Address Table (AddressID, FirstLine, SecondLine, State, Country) quickly?

Database: SQL Server 2019
Table name:
Address
Columns:
AddressID Varchar(20)
FirstLine Varchar(400)
SecondLine Varchar(400)
State Varchar (100)
Country Varchar (50)
Sample Data:
AddrID0001 | Some Random Street | Some Random Apt | NH | US
AddrID0002 | ueiwoqtyr uiyweqry qow iuyuiwqye | ewquyrtweq Apt 4| CA | US
AddrID0003 | rtyewqr yuwqtert oiyqewiru | ewquyrtweq utyewqr | NC| US
etc.
If these random placeholders can be replaced by actual names, that will be great but not necessary.

Create a seed table (called tbl below) with a few records, for example 10. Then fill the destination table with the following code, writing the desired number of repetitions after the GO command:
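-- The column list is required because only four of the five columns are supplied.
-- AddressID is left NULL here (this assumes the column is nullable).
INSERT INTO Address (FirstLine, SecondLine, State, Country)
VALUES (
(SELECT TOP 1 FirstLine FROM tbl ORDER BY NEWID()) ,
(SELECT TOP 1 SecondLine FROM tbl ORDER BY NEWID()) ,
(SELECT TOP 1 State FROM tbl ORDER BY NEWID()) ,
(SELECT TOP 1 Country FROM tbl ORDER BY NEWID())
)
GO 10000000
With this GO 10000000 command, the INSERT batch is executed 10,000,000 times.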

WITH cte as (
select CAST(0 AS INT) as x
union all
select x+1 from cte where x<9)
INSERT INTO Address
SELECT
CONCAT('AddrID0', cte1.x, cte2.x, cte3.x, cte4.x, cte5.x, cte6.x, cte7.x) as AddressID,
REPLACE(REPLACE(CONVERT(varchar(255), NEWID()),'-',' '),'Q',' ') as FirstLine,
REPLACE(REPLACE(CONVERT(varchar(255), NEWID()),'-',' '),'Q',' ') as SecondLine,
'NH',
'US'
FROM cte as cte1
CROSS JOIN cte as cte2
CROSS JOIN cte as cte3
CROSS JOIN cte as cte4
CROSS JOIN cte as cte5
CROSS JOIN cte as cte6
CROSS JOIN cte as cte7
-- ORDER BY 1
;
output of this query:
select * from (
(select * from (select top(3) * from Address ORDER by AddressID asc) x1 )
union all
(select * from (select top(3) * from Address order by AddressID desc) x2)
) y;
will be something like this (the values differ because NEWID() is random):
AddressID | FirstLine | SecondLine | State | Country
AddrID00000000 | 21348E1C D239 4DC5 AA28 3D609919C9C4 | AD9C16F1 9B9D 49F9 A249 7893C7EBEDBC | NH | US
AddrID00000001 | D9BDAF89 9147 4F8E BBCB 0345411B7DD8 | 79C50237 2181 45A2 8551 2A574DD71145 | NH | US
AddrID00000002 | 83818A03 A341 4ED7 AEB8 001CC4162276 | 9301BDF4 F456 484F BE8C DCDD7EB44060 | NH | US
AddrID09999999 | D22C6EBB DC33 4572 96F4 8C24C350BEB0 | 21C946B6 0ECB 46A7 99A0 569B43C3275C | NH | US
AddrID09999998 | A52E42DD 7BBA 4440 89D3 825176ED0159 | CCBD542F 28E9 449D 8A4C 9AA5ED4E9923 | NH | US
AddrID09999997 | 02F640F6 786E 43D4 95F6 F80A4AA10CA3 | 88AB6376 65F7 4B5B 9C9C 78D67822B1CD | NH | US
It's up to you to fill State with something random.
"quickly" is a relative word, on my system it took about 2 minutes.
The REPLACE in FirstLine and SecondLine is just a simple attempt to create random-length words, and is open for improvement when you have more imagination than I currently have... 😉

SQL - select last and previous different to last

The problem: a simplified membership table containing membership id, starting date for each membership and membership level description:
CREATE TABLE cover
(
[membership_id] int,
[cover_from_date] date,
[description] varchar(57)
);
INSERT INTO cover ([membership_id], [cover_from_date], [description])
VALUES (1, '1/1/2011', 'AA'),
(1, '1/2/2011', 'BB'),
(1, '1/3/2011', 'CC'),
(1, '1/4/2011', 'CC');
The task: to list the current membership and the immediate previous membership different to the current one. So from the above table I would like to see something like:
1, 1/4/2011, CC, 1/2/2011, BB
The attempted solution: I have managed to come up with a solution but it takes an enormous time to run on a large database and I'm sure there are better ways of resolving this problem. My no-doubt over complicated query is as follows:
with cte as
(
select
cover.membership_id, cover.cover_from_date,
cover.description,
row_number() over (partition by cover.membership_id order by cover.cover_from_date desc) AS version_no
from
cover
)
select
cte.membership_id,
cover_now.cover_from_date, cover_now.description,
cover_prev.cover_from_date, cover_prev.description
from
cte
left outer join
cte cover_now on cte.membership_id = cover_now.membership_id
and cover_now.version_no = 1
left outer join
cte cover_prev on cte.membership_id = cover_prev.membership_id
and cover_prev.version_no = (select min(x.version_no)
from cte x
where x.version_no >= 2
and x.membership_id = cover_now.membership_id
and x.description <> cover_now.description)
group by
cte.membership_id, cover_now.cover_from_date, cover_now.description,
cover_prev.cover_from_date, cover_prev.description
The entire fiddle is located here. Any tips on how to optimise the query would be appreciated.

First create an index on membership_id and cover_from_date in descending order. It will be heavily used by this query.
create index cover_by_date on cover (membership_id asc, cover_from_date desc)
Then:
select
membership.membership_id,
membership.cover_from_date,
membership.description,
previous_membership.cover_from_date,
previous_membership.description
from
(
select membership_id, description, cover_from_date, row_number() over (partition by membership_id order by cover_from_date desc) as rank
from cover
) as membership
left join (
select previous.membership_id, previous.description, previous.cover_from_date, row_number() over (partition by previous.membership_id order by previous.cover_from_date desc) as rank
from cover
join cover as previous on
cover.membership_id = previous.membership_id and
cover.description <> previous.description and
cover.cover_from_date > previous.cover_from_date
) as previous_membership on
previous_membership.membership_id = membership.membership_id and
previous_membership.rank = 1
where
membership.rank = 1
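As an alternative sketch (not the answer's method), the lookup of the previous, different membership can be written with OUTER APPLY and TOP (1), which only probes backwards from the latest row of each membership. It assumes the cover table from the question; the CTE name latest and the aliases l, c and p are made up for this example. The index suggested above should help this form as well.

```sql
-- Sketch: latest cover per membership plus the most recent earlier cover
-- whose description differs from the latest one.
WITH latest AS (
    SELECT membership_id, cover_from_date, description,
           ROW_NUMBER() OVER (PARTITION BY membership_id
                              ORDER BY cover_from_date DESC) AS rn
    FROM cover
)
SELECT l.membership_id,
       l.cover_from_date,
       l.description,
       p.cover_from_date AS prev_cover_from_date,
       p.description     AS prev_description
FROM latest AS l
OUTER APPLY (
    -- newest earlier row with a different description; NULLs if there is none
    SELECT TOP (1) c.cover_from_date, c.description
    FROM cover AS c
    WHERE c.membership_id = l.membership_id
      AND c.cover_from_date < l.cover_from_date
      AND c.description <> l.description
    ORDER BY c.cover_from_date DESC
) AS p
WHERE l.rn = 1;
```

For the sample rows this pairs the latest CC row with the BB row, matching the result requested in the question.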

Why my SELECT statement returned 2 rows when SELECT MAX(ID)?

I want to select only the row with max(ID). When I select max(ID) directly with this SELECT, it works correctly and returns one row with the maximum ID:
select max(id)
FROM [LAB_CULTURE_RESULT]
where order_id = 1900001265
and testid = 1100
But when I use another SELECT statement with more columns, it does not work and returns all 4 rows, not only the max(id) row.
This is my SELECT statement. How can I use max(id) and return only one row?
SELECT MAX([ID])
,[SAMPLE_ID]
,[ORDER_ID]
,[TESTID]
,[SAMPLE_STATUS]
,[EXAMINED_BY]
,[EXAMINED_DATE]
,[APPROVED_BY]
,[APPROVED_DATE]
,[RESULT_NOTE]
,[MACHINE_ID]
,[DEPTID]
,[PATIENT_NO]
,[CUSTID]
,[REQ_FORM_NO]
,[PC_FILENO]
,[CULTURE_REPORT]
,[SAMPLE]
,[PUS_CELLS]
,[RED_CELLS]
,[YEAST_CELLS]
,[CLUE_CELLS]
,[RESULT_POSITIVE]
,[AMIKACIN]
,[AZTREONAM]
,[AMOXIXILLIN]
,[AMPICILLIN]
,[AMOXICLAV]
,[AZITHROMYCIN]
,[CEFIXIME]
,[CEFACLOR]
,[CEPHRADINE]
,[CEFTAZIDIME]
,[CEFUROXIME]
,[CEFOTAXIME]
,[CLINDAMYCIN]
,[CIPROFLOXACIN]
,[CLARITHROMYCIN]
,[CEFADROXIL]
,[CEFTRIAXONE]
,[TEICOPLANIN]
,[CEFEPIME]
,[CEFOXITIN]
,[GENTAMICIN]
,[LEVOFLOXACIN]
,[NORFLOXACIN]
,[OXACILLIN]
,[CARBENICILLIN]
,[PIPERACILLIN]
,[PEFLOXACIN]
,[TETRACYCLIN]
,[PENICILLIN]
,[VANCOMYCIN]
,[VIABLE_COLONY_COUNT]
,[UPDATED_BY]
,[UPDATED_DATE]
FROM [LAB_CULTURE_RESULT]
where order_id = 1900001265
and testid = 1100
group by [SAMPLE_ID]
,[ORDER_ID]
,[TESTID]
,[SAMPLE_STATUS]
,[EXAMINED_BY]
,[EXAMINED_DATE]
,[APPROVED_BY]
,[APPROVED_DATE]
,[RESULT_NOTE]
,[MACHINE_ID]
,[DEPTID]
,[PATIENT_NO]
,[CUSTID]
,[REQ_FORM_NO]
,[PC_FILENO]
,[CULTURE_REPORT]
,[SAMPLE]
,[PUS_CELLS]
,[RED_CELLS]
,[YEAST_CELLS]
,[CLUE_CELLS]
,[RESULT_POSITIVE]
,[AMIKACIN]
,[AZTREONAM]
,[AMOXIXILLIN]
,[AMPICILLIN]
,[AMOXICLAV]
,[AZITHROMYCIN]
,[CEFIXIME]
,[CEFACLOR]
,[CEPHRADINE]
,[CEFTAZIDIME]
,[CEFUROXIME]
,[CEFOTAXIME]
,[CLINDAMYCIN]
,[CIPROFLOXACIN]
,[CLARITHROMYCIN]
,[CEFADROXIL]
,[CEFTRIAXONE]
,[TEICOPLANIN]
,[CEFEPIME]
,[CEFOXITIN]
,[GENTAMICIN]
,[LEVOFLOXACIN]
,[NORFLOXACIN]
,[OXACILLIN]
,[CARBENICILLIN]
,[PIPERACILLIN]
,[PEFLOXACIN]
,[TETRACYCLIN]
,[PENICILLIN]
,[VANCOMYCIN]
,[VIABLE_COLONY_COUNT]
,[UPDATED_BY]
,[UPDATED_DATE]

Because what you're doing isn't the "right" answer. Here are a couple of alternatives:
--Using a CTE:
WITH CTE AS(
SELECT [ID]
,[SAMPLE_ID]
,[ORDER_ID]
,[TESTID]
,[SAMPLE_STATUS]
,[EXAMINED_BY]
,[EXAMINED_DATE]
,[APPROVED_BY]
,[APPROVED_DATE]
,[RESULT_NOTE]
,[MACHINE_ID]
,[DEPTID]
,[PATIENT_NO]
,[CUSTID]
,[REQ_FORM_NO]
,[PC_FILENO]
,[CULTURE_REPORT]
,[SAMPLE]
,[PUS_CELLS]
,[RED_CELLS]
,[YEAST_CELLS]
,[CLUE_CELLS]
,[RESULT_POSITIVE]
,[AMIKACIN]
,[AZTREONAM]
,[AMOXIXILLIN]
,[AMPICILLIN]
,[AMOXICLAV]
,[AZITHROMYCIN]
,[CEFIXIME]
,[CEFACLOR]
,[CEPHRADINE]
,[CEFTAZIDIME]
,[CEFUROXIME]
,[CEFOTAXIME]
,[CLINDAMYCIN]
,[CIPROFLOXACIN]
,[CLARITHROMYCIN]
,[CEFADROXIL]
,[CEFTRIAXONE]
,[TEICOPLANIN]
,[CEFEPIME]
,[CEFOXITIN]
,[GENTAMICIN]
,[LEVOFLOXACIN]
,[NORFLOXACIN]
,[OXACILLIN]
,[CARBENICILLIN]
,[PIPERACILLIN]
,[PEFLOXACIN]
,[TETRACYCLIN]
,[PENICILLIN]
,[VANCOMYCIN]
,[VIABLE_COLONY_COUNT]
,[UPDATED_BY]
,[UPDATED_DATE],
ROW_NUMBER() OVER (ORDER BY ID DESC) AS RN
FROM [LAB_CULTURE_RESULT]
WHERE order_id = 1900001265
AND testid = 1100)
SELECT *
FROM CTE
WHERE RN = 1;
--Using TOP 1 WITH TIES:
SELECT TOP 1 WITH TIES
[ID]
,[AZITHROMYCIN]
,[CEFIXIME]
,[CEFACLOR]
,[CEPHRADINE]
,[CEFTAZIDIME]
,[CEFUROXIME]
,[CEFOTAXIME]
,[CLINDAMYCIN]
,[CIPROFLOXACIN]
,[CLARITHROMYCIN]
,[CEFADROXIL]
,[CEFTRIAXONE]
,[TEICOPLANIN]
,[CEFEPIME]
,[CEFOXITIN]
,[GENTAMICIN]
,[LEVOFLOXACIN]
,[NORFLOXACIN]
,[OXACILLIN]
,[CARBENICILLIN]
,[PIPERACILLIN]
,[PEFLOXACIN]
,[TETRACYCLIN]
,[PENICILLIN]
,[VANCOMYCIN]
,[VIABLE_COLONY_COUNT]
,[UPDATED_BY]
,[UPDATED_DATE]
FROM [LAB_CULTURE_RESULT]
WHERE order_id = 1900001265
AND testid = 1100
ORDER BY ROW_NUMBER() OVER (ORDER BY ID DESC);
Note that if you are expecting more than 1 row (maybe the "last" row per order_id) you'll need to add a PARTITION BY clause to the OVER for the ROW_NUMBER() function.

It works exactly as it should.
It returns as many rows as there are distinct combinations of values in the GROUP BY fields.
Take some time to read the documentation on GROUP BY.

Your query returns the max ID for every unique combination of the other columns, so it’s behaving as expected.
If you only want the row whose ID is the max:
select
...
from mytable
where ID = (select max(ID) from mytable)
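To make the difference concrete, here is a small self-contained sketch. The table variable @r and its three rows are invented for illustration and are not the asker's data.

```sql
-- Hypothetical sample data: three rows for the same order, two distinct notes.
DECLARE @r TABLE (id int, order_id int, note varchar(10));
INSERT INTO @r (id, order_id, note)
VALUES (1, 10, 'a'), (2, 10, 'b'), (3, 10, 'b');

-- GROUP BY on the other columns: one row per distinct (order_id, note),
-- so this returns two rows: (1, 10, 'a') and (3, 10, 'b').
SELECT MAX(id) AS max_id, order_id, note
FROM @r
GROUP BY order_id, note;

-- Filtering on the maximum id: exactly one row, (3, 10, 'b').
SELECT id, order_id, note
FROM @r
WHERE id = (SELECT MAX(id) FROM @r WHERE order_id = 10);
```

The more columns with differing values appear in the GROUP BY, the more groups there are, which is why the wide query in the question returns every row.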

select * from [LAB_CULTURE_RESULT] where id = ( select MAX( id ) from [LAB_CULTURE_RESULT])
How can I select the row with the highest ID in MySQL?

If you want to get the last Id (primary key) inserted in any table then it is better to use IDENT_CURRENT to get the last inserted Id of any table.
GO
SELECT IDENT_CURRENT('Table1')
Go

You should do it like this:
select * from [LAB_CULTURE_RESULT] where id=(select max(id) from [LAB_CULTURE_RESULT] where order_id = 1900001265
and testid = 1100)

Trying to simplify a SQL query without UNION

I'm very bad at explaining, so let me try to lay out my issue. I have a table that resembles the following:
Source    Value    User
========  =======  ======
old1      1        Phil
new       2        Phil
old2      3        Phil
new       4        Phil
old1      1        Mike
old2      2        Mike
new       1        Jeff
new       2        Jeff
What I need to do is create a query that gets values for users based on the source and the value. It should follow this rule:
For every user, get the highest value. However, disregard the 'new'
source if either 'old1' or 'old2' exists for that user.
So based on those rules, my query should return the following from this table:
Value    User
=======  ======
3        Phil
2        Mike
2        Jeff
I've come up with a query that does close to what is asked:
SELECT MAX([Value]), [User]
FROM
(
SELECT CASE [Source]
WHEN 'old1' THEN 1
WHEN 'old2' THEN 1
WHEN 'new' THEN 2
END AS [SourcePriority],
[Value],
[User]
FROM #UserValues
) MainPriority
WHERE [SourcePriority] = 1
GROUP BY [User]
UNION
SELECT MAX([Value]), [User]
FROM
(
SELECT CASE [Source]
WHEN 'old1' THEN 1
WHEN 'old2' THEN 1
WHEN 'new' THEN 2
END AS [SourcePriority],
[Value],
[User]
FROM #UserValues
) SecondaryPriority
WHERE [SourcePriority] = 2
GROUP BY [User]
However this returns the following results:
Value    User
=======  ======
3        Phil
4        Phil
2        Mike
2        Jeff
Obviously that extra value for Phil=4 is not desired. How should I attempt to fix this query? I also understand that this is a pretty convoluted solution and that it can probably be more easily solved by proper use of aggregates, however I'm not too familiar with aggregates yet which resulted in me resorting to a union. Essentially I'm looking for help creating the cleanest-looking solution possible.
Here is the SQL code if anyone wanted to populate the table themselves to give it a try:
CREATE TABLE #UserValues
(
[Source] VARCHAR(10),
[Value] INT,
[User] VARCHAR(10)
)
INSERT INTO #UserValues VALUES
('old1', 1, 'Phil'),
('new', 2, 'Phil'),
('old2', 3, 'Phil'),
('new', 4, 'Phil'),
('old1', 1, 'Mike'),
('old2', 2, 'Mike'),
('new', 1, 'Jeff'),
('new', 2, 'Jeff')

You can solve it fairly easily without resorting to window functions. In this case, you need the maximum value where ((not new) OR (there isn't an old1 or old2 entry)).
Here's a query that works correctly with your sample data:
SELECT
MAX(U1.[Value]) as 'Value'
,U1.[User]
FROM
#UserValues U1
WHERE
U1.[Source] <> 'new'
OR NOT EXISTS (SELECT * FROM #UserValues U2 WHERE U2.[User] = U1.[User] AND U2.[Source] IN ('old1','old2'))
GROUP BY U1.[User]
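The same rule can also be sketched with conditional aggregation, reading the table once with no subquery. This is an alternative formulation, not part of the answer above, and it assumes that 'old1', 'old2' and 'new' are the only values of [Source].

```sql
-- Sketch: prefer the highest 'old' value; fall back to the overall maximum
-- only when the user has no old1/old2 rows (the CASE then yields NULL).
SELECT
    COALESCE(
        MAX(CASE WHEN [Source] IN ('old1', 'old2') THEN [Value] END),
        MAX([Value])
    ) AS [Value],
    [User]
FROM #UserValues
GROUP BY [User];
-- With the sample data: Phil 3, Mike 2, Jeff 2.
```

SQL Server may print a warning that NULL values were eliminated by an aggregate. That is expected here, because the CASE expression deliberately yields NULL for 'new' rows.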

You can order by priority with row_number():
select top (1) with ties uv.*
from #UserValues uv
order by row_number() over (partition by [user]
order by (case when source = 'old2' then 1 when source = 'old1' then 2 else 3 end), value desc
);
However, if the source is limited to these three values and old1 and old2 have equal priority, you can also do:
. . .
order by row_number() over (partition by [user]
order by (case when source = 'new' then 2 else 1 end), value desc
)

with raw_data
as (
select count(case when a.source in('old1','old2') then 1 end) over(partition by a.[user]) as cnt_old
,a.*
from #UserValues a
)
,curated_data
as(select *
,row_number() over(partition by rd.[user] order by rd.value desc) as rnk2
from raw_data rd
where not (rd.source = 'new' and rd.cnt_old > 0)
)
select *
from curated_data
where rnk2=1
I am doing the following:
raw_data -> I count, per user, the records whose source is old1 or old2 (cnt_old).
curated_data -> I eliminate every 'new' record for users with cnt_old > 0, then rank (rnk2) the remaining records by value, highest first.
Finally, I select the highest available value from curated_data (i.e. rnk2=1).

I think you should consider setting up an XREF table that defines the priority of each source, in case the prioritization becomes more complicated in the future. I do it with a temp table:
CREATE TABLE #SourcePriority
(
[Source] VARCHAR(10),
[SourcePriority] INT
)
INSERT INTO #SourcePriority VALUES
('old1', 1),
('old2', 1),
('new', 2)
You might also create a view to look up the SourcePriority for the original table. Here I do it with a CTE, together with a possible implementation of how to look up the highest value within the top priority:
;WITH CTE as (
SELECT s.[SourcePriority], u.[Value], u.[User]
FROM #UserValues as u
INNER JOIN #SourcePriority as s on u.[Source] = s.[Source]
)
SELECT MAX (v.[Value]) as [Value], v.[User]
FROM (
SELECT MIN ([SourcePriority]) as [TopPriority], [User]
FROM cte
GROUP BY [User]
) as s
INNER JOIN cte as v
ON s.[User] = v.[User] and s.[TopPriority] = v.[SourcePriority]
GROUP BY v.[User]

I think you want:
select top (1) with ties uv.*
from (select uv.*,
sum(case when source in ('old1', 'old2') then 1 else 0 end) over (partition by [user]) as cnt_old
from #UserValues uv
) uv
where cnt_old = 0 or source <> 'new'
order by row_number() over (partition by [user] order by value desc);

Find Segment with Longest Stay Per Booking

We have a number of bookings, and one of the requirements is that we display the Final Destination for a booking based on its segments. Our business has defined the Final Destination as the one in which we have the longest stay, and the Origin as the first departure point.
Please note this is not the segment with the longest travel time, i.e. DATEDIFF(minute, DepartDate, ArrivalDate). It is the one followed by the longest gap between segments.
This is a simplified version of the tables:
Create Table Segments
(
BookingID int,
SegNum int,
DepartureCity varchar(100),
DepartDate datetime,
ArrivalCity varchar(100),
ArrivalDate datetime
);
Create Table Bookings
(
BookingID int identity(1,1),
Locator varchar(10)
);
Insert into Segments values (1,2,'BRU','2010-03-06 10:40','FIH','2010-03-06 20:20:00')
Insert into Segments values (1,4,'FIH','2010-03-13 21:50:00','BRU', '2010-03-14 07:25:00')
Insert into Segments values (2,2,'BOD','2010-02-10 06:50:00','AMS','2010-02-10 08:50:00')
Insert into Segments values (2,3,'AMS','2010-02-10 10:40:00','EBB','2010-02-10 20:40:00')
Insert into Segments values (2,4,'EBB','2010-02-28 22:55:00','AMS','2010-03-01 05:35:00')
Insert into Segments values (2,5,'AMS','2010-03-01 10:25:00','BOD','2010-03-01 12:15:00')
insert into Segments values (3,2,'BRU','2010-03-09 12:10:00','IAD','2010-03-09 14:46:00')
Insert into Segments Values (3,3,'IAD','2010-03-13 17:57:00','BRU','2010-03-14 07:15:00')
insert into segments values (4,2,'BRU','2010-07-27','ADD','2010-07-28')
insert into segments values (4,4,'ADD','2010-07-28','LUN','2010-07-28')
insert into segments values (4,5,'LUN','2010-08-23','ADD','2010-08-23')
insert into segments values (4,6,'ADD','2010-08-23','BRU','2010-08-24')
Insert into Bookings values('5MVL7J')
Insert into Bookings values ('Y2IMXQ')
insert into bookings values ('YCBL5C')
Insert into bookings values ('X7THJ6')
I have created a SQL Fiddle with real data here:
SQL Fiddle Example
I have tried to do the following, however this doesn't appear to be correct.
SELECT Locator, fd.*
FROM Bookings ob
OUTER APPLY
(
SELECT Top 1 DepartureCity, ArrivalCity
from
(
SELECT DISTINCT
seg.segnum ,
seg.DepartureCity ,
seg.DepartDate ,
seg.ArrivalCity ,
seg.ArrivalDate,
(SELECT
DISTINCT
DATEDIFF(MINUTE , seg.ArrivalDate , s2.DepartDate)
FROM Segments s2
WHERE s2.BookingID = seg.BookingID AND s2.segnum = seg.segnum + 1) 'LengthOfStay'
FROM Bookings b(NOLOCK)
INNER JOIN Segments seg (NOLOCK) ON seg.bookingid = b.bookingid
WHERE b.Locator = ob.locator
) a
Order by a.lengthofstay desc
)
FD
The results I expect are:
Locator Origin Destination
5MVL7J BRU FIH
Y2IMXQ BOD EBB
YCBL5C BRU IAD
X7THJ6 BRU LUN
I get the feeling that a CTE would be the best approach, however my attempts to do this have so far failed miserably. Any help would be greatly appreciated.
I have managed to get the following query working but it only works for one at a time due to the top one, but I'm not sure how to tweak it:
WITH CTE AS
(
SELECT distinct s.DepartureCity, s.DepartDate, s.ArrivalCity, s.ArrivalDate, b.Locator , ROW_NUMBER() OVER (PARTITION BY b.Locator ORDER BY SegNum ASC) RN
FROM Segments s
JOIN bookings b ON s.bookingid = b.BookingID
)
SELECT C.Locator, c.DepartureCity, a.ArrivalCity
FROM
(
SELECT TOP 1 C.Locator, c.ArrivalCity, c1.DepartureCity, DATEDIFF(MINUTE,c.ArrivalDate, c1.DepartDate) 'ddiff'
FROM CTE c
JOIN cte c1 ON c1.Locator = C.Locator AND c1.rn = c.rn + 1
ORDER BY ddiff DESC
) a
JOIN CTE c ON C.Locator = a.Locator
WHERE c.rn = 1

You can try something like this:
;WITH CTE_Start AS
(
--Ordering of segments to eliminate gaps
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY SegNum) RN
FROM dbo.Segments
)
, RCTE_Stay AS
(
--recursive CTE to calculate stay between segments
SELECT *, 0 AS Stay FROM CTE_Start s WHERE RN = 1
UNION ALL
SELECT sNext.*, DATEDIFF(Mi, s.ArrivalDate, sNext.DepartDate)
FROM CTE_Start sNext
INNER JOIN RCTE_Stay s ON s.RN + 1 = sNext.RN AND s.BookingID = sNext.BookingID
)
, CTE_Final AS
(
--Search for max(stay) for each bookingID
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY Stay DESC) AS RN_Stay
FROM RCTE_Stay
)
--join Start and Final on RN=1 to find origin and departure
SELECT b.Locator, s.DepartureCity AS Origin, f.DepartureCity AS Destination
FROM CTE_Final f
INNER JOIN CTE_Start s ON f.BookingID = s.BookingID
INNER JOIN dbo.Bookings b ON b.BookingID = f.BookingID
WHERE s.RN = 1 AND f.RN_Stay = 1
SQLFiddle DEMO
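On SQL Server 2012 or later, the stay after each segment can also be computed with LEAD instead of a recursive CTE. The following is a sketch under that version assumption, using the Segments and Bookings tables from the question. The names stays, ranked, StayMinutes, rn_first and rn_stay are made up for this example.

```sql
-- Sketch: stay = gap between arriving on one segment and departing on the next one.
WITH stays AS (
    SELECT s.BookingID, s.SegNum, s.DepartureCity, s.ArrivalCity,
           DATEDIFF(MINUTE, s.ArrivalDate,
                    LEAD(s.DepartDate) OVER (PARTITION BY s.BookingID
                                             ORDER BY s.SegNum)) AS StayMinutes,
           ROW_NUMBER() OVER (PARTITION BY s.BookingID ORDER BY s.SegNum) AS rn_first
    FROM Segments AS s
),
ranked AS (
    -- The last segment has no next departure, so its StayMinutes is NULL;
    -- SQL Server sorts NULLs last under ORDER BY ... DESC.
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY BookingID
                              ORDER BY StayMinutes DESC) AS rn_stay
    FROM stays
)
SELECT b.Locator,
       o.DepartureCity AS Origin,       -- departure city of the first segment
       d.ArrivalCity   AS Destination   -- arrival city before the longest stay
FROM Bookings AS b
JOIN ranked AS o ON o.BookingID = b.BookingID AND o.rn_first = 1
JOIN ranked AS d ON d.BookingID = b.BookingID AND d.rn_stay = 1;
```

For the sample rows in the question this should reproduce the expected origins and destinations. Ties between equally long stays are broken arbitrarily, so add a second ORDER BY key if that matters.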

You can use OUTER APPLY with TOP to find the segment with the next SegNum. Once the gap between segments is known, MIN/MAX aggregate functions with an OVER clause are used as the conditions in the CASE expressions:
;WITH cte AS
(
SELECT seg.BookingID,
CASE WHEN MIN(seg.segNum) OVER(PARTITION BY seg.BookingID) = seg.segNum
THEN seg.DepartureCity END AS Origin,
CASE WHEN MAX(DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)) OVER(PARTITION BY seg.BookingID)
= DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)
THEN o.DepartureCity END AS Destination
FROM Segments seg (NOLOCK)
OUTER APPLY (
SELECT TOP 1 seg2.DepartDate, seg2.DepartureCity
FROM Segments seg2
WHERE seg.BookingID = seg2.BookingID
AND seg.SegNum < seg2.SegNum
ORDER BY seg2.SegNum ASC
) o
)
SELECT b.Locator, MAX(c.Origin) AS Origin, MAX(c.Destination) AS Destination
FROM cte c JOIN Bookings b ON c.BookingID = b.BookingID
GROUP BY b.Locator
See demo on SQLFiddle

The statement below:
;WITH DataSource AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY BookingID ORDER BY DATEDIFF(SS,DepartDate,ArrivalDate) DESC) AS Row
,Segments.BookingID
,Segments.SegNum
,Segments.DepartureCity
,Segments.DepartDate
,Segments.ArrivalCity
,Segments.ArrivalDate
,DATEDIFF(SS,DepartDate,ArrivalDate) AS DiffInSeconds
FROM Segments
)
SELECT *
FROM DataSource DS
INNER JOIN Bookings B
ON DS.[BookingID] = B.[BookingID]
Will give the following output:
So, adding the following clause to the above statement:
WHERE Row = 1
will give you what you need.
A few important things:
In the output there are two records with the same difference in seconds. If you want to show both of them (or all of them, if there are more), use the RANK function instead of ROW_NUMBER.
The return type of DATEDIFF is INT, so there is a limit on the maximum difference in seconds. The documentation states:
If the return value is out of range for int (-2,147,483,648 to +2,147,483,647), an error is returned. For millisecond, the maximum difference between startdate and enddate is 24 days, 20 hours, 31 minutes and 23.647 seconds. For second, the maximum difference is 68 years.
