Find start and end dates when one field changes - sql

I have this data in a table
FIELD_A FIELD_B FIELD_D
249052903 10/15/2011 N
249052903 11/15/2011 P ------------- VALUE CHANGED
249052903 12/15/2011 P
249052903 1/15/2012 N ------------- VALUE CHANGED
249052903 2/15/2012 N
249052903 3/15/2012 N
249052903 4/15/2012 N
249052903 5/15/2012 N
249052903 6/15/2012 N
249052903 7/15/2012 N
249052903 8/15/2012 N
249052903 9/15/2012 N
When ever the value in FIELD_D changes it forms a group and I need the min and max dates in that group. The query shoud return
FIELD_A GROUP_START GROUP_END
249052903 10/15/2011 10/15/2011
249052903 11/15/2011 12/15/2011
249052903 1/15/2012 9/15/2012
The examples that I have seen so far have the data in Field_D being unique. Here the data can repeat as shown, First it is "N" then it changes to "P" and then back to "N".
Any help will be appreciated
Thanks

You can use analytic functions - LAG, LEAD, and COUNT() OVER to your advantage, if they are supported by your SQL implementation. SQL Fiddle here.
WITH EndsMarked AS (
SELECT
FIELD_A,
FIELD_B,
CASE WHEN FIELD_D = LAG(FIELD_D,1) OVER (ORDER BY FIELD_B)
THEN 0 ELSE 1 END AS IS_START,
CASE WHEN FIELD_D = LEAD(FIELD_D,1) OVER (ORDER BY FIELD_B)
THEN 0 ELSE 1 END AS IS_END
FROM T
), GroupsNumbered AS (
SELECT
FIELD_A,
FIELD_B,
IS_START,
IS_END,
COUNT(CASE WHEN IS_START = 1 THEN 1 END)
OVER (ORDER BY FIELD_B) AS GroupNum
FROM EndsMarked
WHERE IS_START=1 OR IS_END=1
)
SELECT
FIELD_A,
MIN(FIELD_B) AS GROUP_START,
MAX(FIELD_B) AS GROUP_END
FROM GroupsNumbered
GROUP BY FIELD_A, GroupNum;

This is fairly easy to express in SQL using subqueries:
select Field_A, Field_D, min(Field_B) as Group_Start, max(Field_B) as Group_End
from (select t.*,
(select min(field_B)
from t t2
where t2.field_A = t.field_A and
t2.field_B > t.field_B and
t2.Field_D <> t.field_D
) as TheGroup
from t
) t
group by Field_A, Field_D, TheGroup
This is assigning a group identifier using a correlated subquery. The identifier is the first value of Field_B where Field_D changes.
You don't mention the database you are using, so this uses standard SQL.

Don't use SQL for this problem because it is not possible to do it in SQL with a single table scan since it requires comparison between records. It would need a full table scan plus at least a join with itself. It is trivial to implement a solution in a imperative language and it only requires a single table scan.
Edit: a stored procedure would be best.

I modified the answers a bit where you have multiple Field_A's. This should always work :-)
WITH EndsMarked
AS
(
SELECT
[Field_A]
,[Field_B]
,CASE
WHEN LAG([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) IS NULL
AND ROW_NUMBER() OVER (PARTITION BY [Field_A] ORDER BY [Field_B]) = 1
THEN 1
WHEN LAG([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) > 0
<> LAG([Field_D],0) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) > 0
THEN 1
ELSE 0
END AS IS_START
,CASE
WHEN LEAD([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B]) IS NULL
AND ROW_NUMBER() OVER (PARTITION BY [Field_A] ORDER BY [Field_B] DESC) = 1
THEN 1
WHEN LEAD([Field_D],0) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B])
<> LEAD([Field_D],1) OVER (PARTITION BY [Field_A] ORDER BY [Field_A],[Field_B])
THEN 1
ELSE 0
END AS IS_END
FROM
(
SELECT
[Field_A]
,[Field_B]
,[Field_D]
,[Aantal Facturen]
FROM [T]
) F
)
,GroupsNumbered
AS
(
SELECT
[Field_A]
,[Field_B]
,IS_START
,IS_END
,COUNT(CASE
WHEN IS_START = 1
THEN 1
END) OVER (ORDER BY [Field_A]
,[Field_B]) AS GroupNum
FROM EndsMarked
WHERE IS_START = 1
OR IS_END = 1
)
SELECT
[Field_A]
,MIN([Field_B]) AS GROUP_START
,MAX([Field_B]) AS GROUP_END
FROM GroupsNumbered
GROUP BY [Field_A], GroupNum

Related

SQL to return 1 or 0 depending on values in a column's audit trail

If I were to have a table such as the one below:
id_
last_updated_by
1
robot
1
human
1
robot
2
robot
3
robot
3
human
Using SQL, how could I group by the ID and create a new column to indicate whether a human has ever updated the record like this:
id_
last_updated_by
updated_by_human
1
robot
1
2
robot
0
3
robot
1
UPDATE
I'm currently doing the following, though I'm not sure how efficient this is. Selecting the latest record and then merging it with my calculated column via a sub-select.
SELECT MAIN.TRANSACTION_ID,
MAIN.CREATED_DATE
MAIN.CREATED_BY_USER_ID,
MAIN.OWNER_USER_ID,
STP.TOUCHED_BY_HUMAN
FROM (
SELECT TRANSACTION_ID,
CREATED_DATE
CREATED_BY_USER_ID_
OWNER_USER_ID_
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by End_Dt desc) = 1
) MAIN
LEFT JOIN (
SELECT TRANSACTION_ID,
CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1 END AS CREATED_BY_HUMAN,
CASE
WHEN OWNER_USER_ID IN ('ROBOT', 'MACHINE') OR
OWNER_USER_ID LIKE 'N%' OR
OWNER_USER_ID IS NULL
THEN 0
ELSE 1 END AS OWNED_BY_HUMAN,
CASE
WHEN CREATED_BY_HUMAN = 0 AND
OWNED_BY_HUMAN = 0
THEN 0
ELSE 1 END AS TOUCHED_BY_HUMAN_
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by TOUCHED_BY_HUMAN_ desc) = 1
) STP
ON MAIN.TRANSACTION_ID = STP.TRANSACTION_ID
If I'm following your problem, then something like this should work.
SELECT
t.*
,CASE WHEN a.id IS NOT NULL THEN 1 ELSE 0 END AS updated_by_human
FROM table t
LEFT JOIN (SELECT DISTINCT id FROM table WHERE last_updated_by = 'human') a ON t.id = a.id
That takes care of the updated_by_human field, but if you also need to reduce the records in table (only keeping a subset) then you need more information to do that.
Exists clauses are usually not that performant but if your data isn't big this should work.
select id_,
IF (EXISTS (SELECT 1 FROM table_name t2 WHERE t2.last_updated_by = 'human' and t2.id_ = t1.id_), 1, 0) AS updated_by_human
from table_name t1;
here is another way
SELECT *
FROM table_name t1
GROUP BY ti.id_
HAVING COUNT(*) > 0
AND MAX(CASE t1.last_updated_by WHEN 'human' THEN 1 ELSE 0 END) = 1;
Since you didn't specified which column is used to determine this record is the newest record added by a given id, I assume that there will be a column to track the insert/modify timestamp (which is pretty standard table design), let's put it is last_updated_timestamp (if you don't have any, then I still insist you to have one as an auditing trail without timestamp does not make sense)
Given your table name is updating_trail
SELECT updating_trail.*, last_update_trail.modified_by_human
FROM updating_trail
INNER JOIN (
-- determine the id_, the lastest modified_timestamp, and a flag check to determine if there is any record with last_update_by is 'human' -> if yes then give 1
SELECT updating_trail.id_, MAX(last_update_timestamp) AS most_recent_update_ts, MAX(CASE WHEN updating_trail.last_updated_by = 'human' THEN 1 ELSE 0 END) AS modified_by_human
FROM updating_trail
GROUP BY updating_trail.id_
) last_update_trail
ON updating_trail.id_ = last_update_trail.id_ AND updating_trail.last_update_timestamp = last_update_trail.most_recent_update_ts;
Give
id_
last_updated_by
last_update_timestamp
modified_by_human
1
robot
2021-10-19T20:00:00.000Z
1
2
robot
2021-10-19T17:00:00.000Z
0
3
robot
2021-10-19T16:00:00.000Z
1
Check out this sample db fiddle I created for you
This is a 1:1 translation of your query to conditional aggregation:
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID,
Max(CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1
END) Over (PARTITION BY TRANSACTION_ID) AS CREATED_BY_HUMAN
FROM Table_Name
WHERE CREATED_DATE >= Cast('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= Cast('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY Row_Number() Over (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) = 1

Aggregating values in SQL

Im trying to aggregate output values of a customer's bankruptcy in terms of yes (Y), no (N) or no data (N/D) using a window function in a subquery below. For example, if there arise edge cases when in one record the Customer is classified as not bankrupt (N) but in another record on the same CDate it is also classified as no data (N/D), I should get a final aggregated output value as N/D, but it gives me N instead, because of what I've done here by partitioning the customer records over IsBankrupt ascending (asc). The logic behind it which is supposed to be implemented:
Y and Y = Y;
Y and N = Y;
N and N = N;
Y and N/D = Y;
N and N/D = N/D
with sample as (
select date('2020-12-32') as CDate, 123 as CustomerID, 'N/D' as IsBankrupt
union all
select date('2020-12-32') as CDate, 123 as CustomerID, 'N' as IsBankrupt)
select CDate, CustomerID, IsBankrupt, case when CustomerID = 123 then 'N/D' end as ExpectedResult
from
(
select CDate, CustomerID, IsBankrupt,
row_number() over (partition by CustomerID, CDate order by IsBankrupt asc) as flag
from sample
) from subsample
where flag = 1
output:
CDate
CustomerID
IsBankrupt
ExpectedOutput
2020-12-31
123
N
N/D
All the other cases of the previously mentioned logic work. So Question is - how could i update my row_number() over partition by clause so that the logic doesnt break down?
I would suggest aggregation:
select cdate, customerid,
(case when sum(case when IsBankrupt = 'Y' then 1 else 0 end) > 0
then 'Y'
when sum(case when IsBankrupt = 'N/D' then 1 else 0 end) > 0
then 'N/D'
else 'N'
end) as new_flag
from t
group by cdate, customerid;
If you don't like the nested case expressions, you can actually do this based on the ordering of the values:
select cdate, customerid,
max(IsBankrupt) as new_flag
from t
group by cdate, customerid;

SQL Server 2008 equivalent for FETCH OFFSET with WHERE clause

I have a program with which my users can look up all the data traffic that happend the last 7 days. I use a stored procedure to get me that data - 250 records at a time (the user can page through that). The problem was, that the users get a lot of timeouts when they wanted to see that data.
Here is the stored procedure before I tried to optimize ist.
#MaxRecCount INT,
#PageOffset INT,
#IncludeData BIT
SELECT [Client], [Schema], [Version], [Records], [Fetched], [Receipted], [ProvidedAt], [FetchedAt], [ReceiptedAt],[PacketIds], [Record] FROM (
SELECT TOP(#MaxRecCount) MAX(bai_ExportPendingArchive.[UserName]) AS Client,
MAX(bai_ExportPendingArchive.Category) AS [Schema],
MAX(bai_ExportPendingArchive.ContractVersion) AS [Version],
COUNT(*) AS [Records],
SUM (CASE WHEN bai_ExportPendingAckArchive.ExportPendingId IS NULL THEN 0 ELSE 1 END) as [Fetched],
SUM (CASE WHEN bai_ExportPendingAckArchive.Receipted IS NULL THEN 0 ELSE 1 END) as [Receipted],
MAX(bai_ExportArchive.Inserted) AS [ProvidedAt],
MAX(CASE WHEN bai_ExportPendingAckArchive.ExportPendingId IS NULL THEN NULL ELSE bai_ExportPendingAckArchive.Inserted END) AS [FetchedAt],
MAX(CASE WHEN bai_ExportPendingAckArchive.Receipted IS NULL THEN NULL ELSE bai_ExportPendingAckArchive.Receipted END) AS [ReceiptedAt],
bai_ExportArchive.PacketIds AS [PacketIds],
NULL AS [Record],
ROW_NUMBER() Over (Order By MAX(bai_ExportArchive.Inserted) desc) as [RowNumber]
FROM bai_ExportArchive
INNER JOIN bai_ExportPendingArchive ON bai_ExportArchive.Id = bai_ExportPendingArchive.ExportId
LEFT OUTER JOIN bai_ExportPendingAckArchive ON bai_ExportPendingAckArchive.ExportPendingId = bai_ExportPendingArchive.Id
GROUP BY bai_ExportPendingArchive.[UserName], bai_ExportArchive.PacketIds, bai_ExportPendingArchive.Category
) AS InnerTable WHERE RowNumber > (#PageOffset * #MaxRecCount) and RowNumber <= (#PageOffset * #MaxRecCount + #MaxRecCount)
ORDER BY RowNumber
#MaxRecCount, #PageOffset and #IncludeData are parameter which came from my C#-method.
This version needed about 1:35min to get me the data I wanted. To make the stored procedure faster I insered a WHERE clause to filter for the Inserted col (also I made an Index on this column) and to use OFFSET FETCH:
The stored procedure after the optimization:
#MaxRecCount INT,
#PageOffset INT,
#IncludeData BIT
Declare #pageStart int
Declare #pageEnd int
SET #pageStart = #PageOffset * #MaxRecCount
SET #pageEnd = #pageStart + #MaxRecCount + 50
IF #IncludeData = 0
BEGIN
SELECT [Client], [Schema], [Version], [Records], [Fetched], [Receipted], [ProvidedAt], [FetchedAt], [ReceiptedAt],[PacketIds], [Record] FROM (
SELECT TOP(#MaxRecCount) bai_ExportPendingArchive.[UserName] AS Client,
bai_ExportPendingArchive.Category AS [Schema],
MAX(bai_ExportPendingArchive.ContractVersion) AS [Version],
COUNT(*) AS [Records],
SUM (CASE WHEN bai_ExportPendingAckArchive.ExportPendingId IS NULL THEN 0 ELSE 1 END) as [Fetched],
SUM (CASE WHEN bai_ExportPendingAckArchive.Receipted IS NULL THEN 0 ELSE 1 END) as [Receipted],
MAX(bai_ExportArchive.Inserted) AS [ProvidedAt],
MAX(CASE WHEN bai_ExportPendingAckArchive.ExportPendingId IS NULL THEN NULL ELSE bai_ExportPendingAckArchive.Inserted END) AS [FetchedAt],
MAX(CASE WHEN bai_ExportPendingAckArchive.Receipted IS NULL THEN NULL ELSE bai_ExportPendingAckArchive.Receipted END) AS [ReceiptedAt],
bai_ExportArchive.PacketIds AS [PacketIds],
NULL AS [Record],
ROW_NUMBER() Over (Order By MAX(bai_ExportArchive.Inserted) desc) as [RowNumber]
FROM bai_ExportArchive
INNER JOIN bai_ExportPendingArchive ON bai_ExportArchive.Id = bai_ExportPendingArchive.ExportId
LEFT OUTER JOIN bai_ExportPendingAckArchive ON bai_ExportPendingAckArchive.ExportPendingId = bai_ExportPendingArchive.Id
Where bai_ExportArchive.Inserted <= (Select bai_ExportArchive.Inserted from bai_ExportArchive Order by bai_ExportArchive.Inserted DESC Offset #pageStart ROWS FETCH NEXT 1 ROWS Only)
And bai_ExportArchive.Inserted > (Select bai_ExportArchive.Inserted from bai_ExportArchive Order by bai_ExportArchive.Inserted DESC Offset #pageEnd ROWS FETCH NEXT 1 ROWS Only)
GROUP BY bai_ExportPendingArchive.[UserName], bai_ExportArchive.PacketIds, bai_ExportPendingArchive.Category
) AS InnerTable
ORDER BY RowNumber
This version gives me the data in about 2s. The only problem is, I work on Microsoft SQL Server 2014 BUT my Users use SQL Server 2008+. The Problem now is, that the OFFSET FETCH dosn't work in Server 2008. And now I'm clueless how I can optimize my stored procedure that it is fast and work on SQl Server 2008.
I'm thankful for any help :)
Try this method to handle the pagination in SQL Server 2005/2008.
First use a CTE for your select query with a ROW_NUMBER() column to identify the record number/count. After that you can select a range of records from this CTE using your PAGE_NUMBER and PAGE_COUNT. Example is below
DECLARE #P_PAGE_NUM INT = 0
,#P_PAGE_SIZE INT = 20
;WITH CTE
AS
( /*SELECT ROW_NUMBER() OVER (ORDER BY COL_to_SORT DESC) AS [ROW_NO]
,...
WHERE ....
*/ -- You can replace your select query here, but column [ROW_NO] should be there in your select list.
--ie ROW_NUMBER() OVER (ORDER BY put_column-to-sort-here DESC) AS [ROW_NO]
)
SELECT *
--,( SELECT COUNT(*) FROM CTE) AS [TOTAL_ROW_COUNT]
FROM CTE
WHERE (
ISNULL(#P_PAGE_NUM,0) = 0 OR
[ROW_NO] BETWEEN ( #P_PAGE_NUM - 1) * #P_PAGE_SIZE + 1
AND #P_PAGE_NUM * #P_PAGE_SIZE
)
ORDER BY [ROW_NO]

How to assign the correct ROW_NUMBER to the result

I have this SQL:
SELECT
--CASE WHEN EA.[EntityAttentionTypeId] IS NULL THEN 1000+(ROW_NUMBER() OVER (ORDER BY EA.[EntityAttentionTypeId] )) ELSE ROW_NUMBER() OVER (ORDER BY EA.[EntityAttentionTypeId] ) END AS C_NO,
ROW_NUMBER() OVER (ORDER BY EA.[EntityAttentionTypeId]) AS C_NO,
EA.[EntityAttentionTypeId],
C.PreName AS C_TITLE,
C.FirstName AS C_NAME,
C.LastName AS C_SURNAME,
C.Email AS C_EMAIL,
CASE WHEN EA.[EntityAttentionTypeId] = 1 THEN 'MAIN CONTACT' ELSE '' END AS C_COMMENTS,
CASE WHEN EA.[EntityAttentionTypeId] = 1 THEN 'v' ELSE ' ' END AS C_INV,
CASE WHEN EA.[EntityAttentionTypeId] = 1 THEN 'v' ELSE ' ' END AS C_UPDATES,
CASE WHEN EA.[EntityAttentionTypeId] = 1 THEN 'v' ELSE ' ' END AS C_DOWN
FROM
[CMSTIME].[dbo].[Contacts] C
LEFT OUTER JOIN
[CMSTIME].[dbo].[Entities] E ON E.EntityId = C.EntityId
FULL OUTER JOIN
[CMSTIME].[dbo].[EntityAttentions] EA ON EA.[PeopleId]= C.[ContactId]
WHERE
C.ContactId IN (SELECT Entity2People.EmployeeId
FROM Entity2People
WHERE Entity2People.EntityId = 307)
ORDER BY
C_NO
I get this result:
How can I change my SQL so that I get rows with nulls EntityAttentionTypeId at the bottom and the C_NO still to read 1,2,3,4...etc
I've tried
CASE WHEN EA.[EntityAttentionTypeId] IS NULL THEN 1000+(ROW_NUMBER() OVER (ORDER BY EA.[EntityAttentionTypeId] )) ELSE ROW_NUMBER() OVER (ORDER BY EA.[EntityAttentionTypeId] ) END AS C_NO,
but i get something like:
Thank you all in advance
I think you want ORDER BY -EntityAttentionTypeId DESC within the ROW_NUMBER() function
Since ASC will put NULL first, then order by non null values, if you order by DESC it will put NULL last. This of course would reverse the required order of Non Null values, so if you use the - operator you get back to the correct order, while still keeping NULL at the end.
e.g.
SELECT ROW_NUMBER() OVER (ORDER BY -EntityAttentionTypeId DESC) AS C_NO,
EntityAttentionTypeId
FROM (VALUES (NULL), (1), (2), (3)) t (EntityAttentionTypeId)
ORDER BY C_NO;
Gives:
C_NO EntityAttentionTypeId
-----------------------------
1 1
2 2
3 3
4 NULL
Hmmm, I think this resulting query should redefine C_NO using both ROW_NUMBER() and a CASE:
SELECT (ROW_NUMBER() OVER (ORDER BY EA.[EntityAttentionTypeId]) +
(CASE WHEN EA.EntityAttentionTypeId] IS NULL THEN 1000 ELSE 0 END)
) AS C_NO,
. . .
ORDER BY C_NO;

SQL Optimize - From History table get value from two different dates

Not sure where to start... But basically I have a report table, an account table, and an account history table. The account history table will have zero or more records, where each record is the state of the account cancelled flag after it changed.
There is other stuff going on, but basically i am looking to return the account detail data, with the state of account cancelled bit on the start date and enddate as different columns.
What is the best way to do this?
I have the following working query below
(Idea) Should I do seperate joins on history table, 1 for each date?
I guess I could do it in three separate queries ( Get Begin Snapshot, End Snapshot, Normal Report query with a join to each snapshot)
something else?
Expected output:
AccountID, OtherData, StartDateCancelled, EndDateCancelled
Test Tables:
DECLARE #Report TABLE (ReportID INT, StartDate DATETIME, EndDate DATETIME)
DECLARE #ReportAccountDetail TABLE( ReportID INT, Accountid INT, Cancelled BIT )
DECLARE #AccountHistory TABLE( AccountID INT, ModifiedDate DATETIME, Cancelled BIT )
INSERT INTO #Report
SELECT 1,'1/1/2011', '2/1/2011'
--
INSERT INTO #ReportAccountDetail
SELECT 1 AS ReportID, 1 AS AccountID, 0 AS Cancelled
UNION
SELECT 1,2,0
UNION
SELECT 1,3,1
UNION
SELECT 1,4,1
--
INSERT INTO #AccountHistory
SELECT 2 AS CustomerID, '1/2/2010' AS ModifiedDate, 1 AS Cancelled
UNION--
SELECT 3, '2/1/2011', 1
UNION--
SELECT 4, '1/1/2010', 1
UNION
SELECT 4, '2/1/2010', 0
UNION
SELECT 4, '2/1/2011', 1
Current Query:
SELECT Accountid, OtherData,
MAX(CASE WHEN BeginRank = 1 THEN CASE WHEN BeginHistoryExists = 1 THEN HistoryCancelled ELSE DefaultCancel END ELSE NULL END ) AS StartDateCancelled,
MAX(CASE WHEN EndRank = 1 THEN CASE WHEN EndHistoryExists = 1 THEN HistoryCancelled ELSE DefaultCancel END ELSE NULL END ) AS EndDateCancelled
FROM
(
SELECT c.Accountid,
'OtherData' AS OtherData,
--lots of other data
ROW_NUMBER() OVER (PARTITION BY c.AccountID ORDER BY
CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate desc) AS BeginRank,
CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END AS BeginHistoryExists,
ROW_NUMBER() OVER ( PARTITION BY c.AccountID ORDER BY
CASE WHEN ch.ModifiedDate <= Report.EndDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate desc) AS EndRank,
CASE WHEN ch.ModifiedDate <= Report.EndDate THEN 1 ELSE 0 END AS EndHistoryExists,
CAST( ch.Cancelled AS INT) AS HistoryCancelled,
0 AS DefaultCancel
FROM
#Report AS Report
INNER JOIN #ReportAccountDetail AS C ON Report.ReportID = C.ReportID
--Others joins related for data to return
LEFT JOIN #AccountHistory AS CH ON CH.AccountID = C.AccountID
WHERE Report.ReportID = 1
) AS x
GROUP BY AccountID, OtherData
Welcome input on writing stack overflow questions. Thanks!
ROW_NUMBER() often suprises me and out-performs my expectations. In this case, however, I'd be tempted to just use correlated sub-queries. At least, I'd test them against the alternatives.
Note: I would also use real tables, with real indexes, and a realistic volume of fake data. (If it's worth posting this question, I'm assuming that it's worth testing this realistically.)
SELECT
[Report].ReportID,
[Account].AccountID,
[Account].OtherData,
ISNULL((SELECT TOP 1 Cancelled FROM AccountHistory WHERE AccountID = [Account].AccountID AND ModifiedDate <= [Report].StartDate ORDER BY ModifiedDate DESC), 0) AS StartDateCancelled,
ISNULL((SELECT TOP 1 Cancelled FROM AccountHistory WHERE AccountID = [Account].AccountID AND ModifiedDate <= [Report].EndDate ORDER BY ModifiedDate DESC), 0) AS EndDateCancelled
FROM
Report AS [Report]
LEFT JOIN
ReportAccountDetail AS [Account]
ON [Account].ReportID = [Report].ReportID
ORDER BY
[Report].ReportID,
[Account].AccountID
Note: For whatever reason, I've found that TOP 1 and ORDER BY is faster than MAX().
In terms of your suggested answer, I'd modify it slightly to just use ISNULL instead of trying to make the Exists columns work.
I'd also join on the "other data" after all of the working out, rather than inside the inner-most query, so as to avoid having to group by all the "other data".
WITH
HistoricData AS
(
SELECT
Report.ReportID,
c.Accountid,
c.OtherData,
ROW_NUMBER() OVER (PARTITION BY c.ReportID, c.AccountID ORDER BY CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate DESC) AS BeginRank,
ROW_NUMBER() OVER (PARTITION BY c.ReportID, c.AccountID ORDER BY ch.ModifiedDate DESC) AS EndRank,
CH.Cancelled
FROM
#Report AS Report
INNER JOIN
#ReportAccountDetail AS C
ON Report.ReportID = C.ReportID
LEFT JOIN
#AccountHistory AS CH
ON CH.AccountID = C.AccountID
AND CH.ModifiedDate <= Report.EndDate
)
,
FlattenedData AS
(
SELECT
ReportID,
Accountid,
OtherData,
ISNULL(MAX(CASE WHEN BeginRank = 1 THEN Cancelled END), 0) AS StartDateCancelled,
ISNULL(MAX(CASE WHEN EndRank = 1 THEN Cancelled END), 0) AS EndDateCancelled
FROM
[HistoricData]
GROUP BY
ReportID,
AccountID,
OtherData
)
SELECT
*
FROM
[FlattenedData]
LEFT JOIN
[OtherData]
ON Whatever = YouLike
WHERE
[FlattenedData].ReportID = 1
And a final possible version...
WITH
ReportStartHistory AS
(
SELECT
*
FROM
(
SELECT
[Report].ReportID,
ROW_NUMBER() OVER (PARTITION BY [Report].ReportID, [History].AccountID ORDER BY [History].ModifiedDate) AS SequenceID,
[History].*
FROM
Report AS [Report]
INNER JOIN
AccountHistory AS [History]
ON [History].ModifiedDate <= [Report].StartDate
)
AS [data]
WHERE
SequenceID = 1
)
,
ReportEndHistory AS
(
SELECT
*
FROM
(
SELECT
[Report].ReportID,
ROW_NUMBER() OVER (PARTITION BY [Report].ReportID, [History].AccountID ORDER BY [History].ModifiedDate) AS SequenceID,
[History].*
FROM
Report AS [Report]
INNER JOIN
AccountHistory AS [History]
ON [History].ModifiedDate <= [Report].EndDate
)
AS [data]
WHERE
SequenceID = 1
)
SELECT
[Report].ReportID,
[Account].*,
ISNULL([ReportStartHistory].Cancelled, 0) AS StartDateCancelled,
ISNULL([ReportEndHistory].Cancelled, 0) AS EndDateCancelled
FROM
Report AS [Report]
INNER JOIN
Account AS [Account]
LEFT JOIN
[ReportStartHistory]
ON [ReportStartHistory].ReportID = [Report].ReportID
AND [ReportStartHistory].AccountID = [Account].AccountID
LEFT JOIN
[ReportEndHistory]
ON [ReportEndHistory].ReportID = [Report].ReportID
AND [ReportEndHistory].AccountID = [Account].AccountID