SQL self join to get count difference between records

Pardon the title as I could not think of a good title for my problem.
I have a table as below
L_DATE      GRP  Counts
20.01.2023  A    100
21.01.2023  A    150
22.01.2023  B    200
20.01.2023  C    500
21.01.2023  C    800
22.01.2023  C    1200
The desired output is like this
GRP  Current Count  Last Count  Diff1  Last2Last Count  Diff2
A    0              150         -150   100              -100
B    200            0           200    0                200
C    1200           800         400    500              700
where,
Current Count is the count of latest date - 22.01.2023
Last Count is the count of previous date - 21.01.2023
Last2Last Count is the count of last to last date - 20.01.2023
Diff1 is the difference between Current Count and Last Count
Diff2 is the difference between Current Count and Last2Last Count
0 appears where there is no data for that date. For example, A does not have any record for the latest date 22.01.2023, so its 'Current Count' is 0. Similarly, B does not have any record for 21.01.2023 or 20.01.2023, so its 'Last Count' and 'Last2Last Count' are 0.
I have tried all sorts of joins but cannot achieve the desired results. Below is my latest code, which gives me result of C and B but not A.
select distinct
T1.GRP,
T1.Counts as "Current Count",
ifnull(T2.Counts,0) as "Last Count",
T1.Counts - T2.Counts as "Diff1",
ifnull(T3.Counts,0) as "Last2Last Count",
T1.Counts - T3.Counts as "Diff2"
from tbl T1
left join tbl T2 on (T2.L_DATE = '21.01.2023' and T2.GRP = T1.GRP)
left join tbl T3 on (T3.L_DATE = '20.01.2023' and T3.GRP = T1.GRP)
where T1.L_DATE = ('22.01.2023')
I tried to achieve it via GROUP_BY but did not succeed. Any help or guidance is appreciated.

Generate test data
CREATE TABLE TEST (L_DATE DATE, GRP VARCHAR(1), COUNTS INTEGER);
INSERT INTO TEST VALUES ('20.1.2023', 'A', 100);
INSERT INTO TEST VALUES ('21.1.2023', 'A', 150);
INSERT INTO TEST VALUES ('22.1.2023', 'B', 200);
INSERT INTO TEST VALUES ('20.1.2023', 'C', 500);
INSERT INTO TEST VALUES ('21.1.2023', 'C', 800);
INSERT INTO TEST VALUES ('22.1.2023', 'C', 1200);
Next you need to "fill the empty lines". For dates you may want to use SERIES_GENERATE instead, if not all dates are present in the data.
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT el.L_DATE, el.GRP, ifnull(t.COUNTS, 0) AS COUNTS
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
As you proposed, two self-joins based on this intermediate result would do the job. However, I would prefer to use window function LAG instead.
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT
el.L_DATE,
el.GRP,
ifnull(t.COUNTS, 0) AS COUNTS,
LAG(ifnull(t.COUNTS,0), 1) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LASTCOUNT,
LAG(ifnull(t.COUNTS,0), 2) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LAST2LASTCOUNT
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
Note that this gives you the desired result also for historical dates. You can add a WHERE condition for the current date. Also you can additionally calculate the differences:
WITH expected_lines AS (
SELECT DISTINCT a.L_DATE, b.GRP
FROM TEST a, TEST b
)
SELECT L_DATE, GRP, COUNTS, LASTCOUNT, COUNTS-LASTCOUNT DIFF1, LAST2LASTCOUNT, COUNTS-LAST2LASTCOUNT DIFF2
FROM
(
SELECT
el.L_DATE,
el.GRP,
ifnull(t.COUNTS, 0) AS COUNTS,
LAG(ifnull(t.COUNTS,0), 1) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LASTCOUNT,
LAG(ifnull(t.COUNTS,0), 2) OVER (PARTITION BY el.GRP ORDER BY el.L_DATE) AS LAST2LASTCOUNT
FROM expected_lines el
LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
)
WHERE L_DATE = '22.1.2023'
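To check the approach above against the sample data, here is a minimal sketch that runs the same idea (cross join to fill the missing lines, then LAG) in Python with an in-memory SQLite database. This is an assumption made for the sake of a runnable example: the original answer targets a different database, the dates are stored here as ISO text ('2023-01-22') so that ORDER BY sorts them correctly, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TEST (L_DATE TEXT, GRP TEXT, COUNTS INTEGER);
INSERT INTO TEST VALUES
  ('2023-01-20', 'A', 100), ('2023-01-21', 'A', 150),
  ('2023-01-22', 'B', 200),
  ('2023-01-20', 'C', 500), ('2023-01-21', 'C', 800), ('2023-01-22', 'C', 1200);
""")

query = """
WITH expected_lines AS (          -- every date combined with every group
  SELECT DISTINCT a.L_DATE, b.GRP FROM TEST a, TEST b
),
filled AS (                       -- missing (date, group) pairs become 0
  SELECT el.L_DATE, el.GRP, ifnull(t.COUNTS, 0) AS COUNTS
  FROM expected_lines el
  LEFT JOIN TEST t ON el.L_DATE = t.L_DATE AND el.GRP = t.GRP
),
lagged AS (                       -- previous and second-previous count per group
  SELECT L_DATE, GRP, COUNTS,
         LAG(COUNTS, 1) OVER (PARTITION BY GRP ORDER BY L_DATE) AS LASTCOUNT,
         LAG(COUNTS, 2) OVER (PARTITION BY GRP ORDER BY L_DATE) AS LAST2LASTCOUNT
  FROM filled
)
SELECT GRP, COUNTS, LASTCOUNT, COUNTS - LASTCOUNT,
       LAST2LASTCOUNT, COUNTS - LAST2LASTCOUNT
FROM lagged
WHERE L_DATE = '2023-01-22'
ORDER BY GRP
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('A', 0, 150, -150, 100, -100)
# ('B', 200, 0, 200, 0, 200)
# ('C', 1200, 800, 400, 500, 700)
```

The filter on the latest date has to be applied after LAG has been computed, which is why it sits in the outermost SELECT.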

Related

SQL Server - How to fill in missing column values

I have set of records at day level with 2 columns:
Invoice_date
Invoice_amount
For few records, value of invoice_amount is missing.
I need to fill invoice_amount values where it is NULL using this logic:
Look for next available invoice_amount (in dates later than the blank value record date)
For records with invoice_amount still blank (invoice_amount not present for future dates), look for most previous invoice_amount (in dates before the blank value date)
Note: We have consecutive multiple days where invoice_amount is blank in the dataset:

Use OUTER APPLY to find the next and previous non-null Invoice_Amount:
update p
set Invoice_Amount = coalesce(nx.Invoice_Amount, pr.Invoice_Amount)
from Problem p
outer apply -- Next non null value
(
select top 1 *
from Problem x
where x.Invoice_Amount is not null
and x.Invoice_Date > p.Invoice_Date
order by Invoice_Date
) nx
outer apply -- Prev non null value
(
select top 1 *
from Problem x
where x.Invoice_Amount is not null
and x.Invoice_Date < p.Invoice_Date
order by Invoice_Date desc
) pr
where p.Invoice_Amount is null
This updates your table in place. If you need a SELECT query instead, it can easily be modified into one.
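The same rule (take the next non-null amount, otherwise fall back to the previous one) can be sketched outside the database. The function name and the sample list below are made up for illustration, and the input is assumed to be already sorted by invoice date.

```python
from typing import List, Optional


def fill_next_then_previous(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries with the next non-null value, else the previous one.

    `values` must already be ordered by invoice date.
    """
    n = len(values)
    # next_val[i] holds the nearest original non-null value after position i.
    next_val: List[Optional[float]] = [None] * n
    nxt = None
    for i in range(n - 1, -1, -1):
        next_val[i] = nxt
        if values[i] is not None:
            nxt = values[i]

    result = []
    prev = None  # nearest original non-null value before the current position
    for i, v in enumerate(values):
        if v is not None:
            result.append(v)
            prev = v
        else:
            result.append(next_val[i] if next_val[i] is not None else prev)
    return result


amounts = [None, 10.0, None, None, 25.0, None]
filled = fill_next_then_previous(amounts)
print(filled)  # [10.0, 10.0, 25.0, 25.0, 25.0, 25.0]
```

Runs of several consecutive blanks are handled, because each blank looks up the nearest original value rather than its direct neighbour.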

Not efficient but seems to work. Try:
update test set invoice_amount =
coalesce ((select top 1 next.invoice_amount from test next
where next.invoiceDate > test.invoiceDate and next.invoice_amount is not null
order by next.invoiceDate),
(select top 1 prev.invoice_amount from test prev
where prev.invoiceDate < test.invoiceDate and prev.invoice_amount is not null
order by prev.invoiceDate desc))
where invoice_amount is null;

As per the given example, you could use a window function with a self join:
update t set t.amount = tt.NewAmount
from table t
inner join (
select Dates, coalesce(min(amount) over (order by dates desc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW),
min(amount) over (order by dates asc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) NewAmount
from table t
) tt on tt.dates = t.dates
where t.amount is null

Replace or updates null values with the sum of amount spend for customers

At first, I have a table looks like as below:
cust_id/Bill_amt/Brand/BrandA/BrandB/Total_value
100/350/A/NULL /NULL/NULL
100/250/A/NULL/NULL/NULL
100/100/B/NULL /NULL/NULL
300/200/B/NULL /NULL/NULL
I would like to replace the NULL values with the amount spent by the same customer. As you can see from the table above, cust_id 100 appears several times because this customer purchased both brand A and brand B on different dates. I need your help to sum up everything for that customer and write the sums into each of its rows. Once that is done, you will notice that the 3 rows for this customer carry the same totals (duplication), as shown below:
cust_id/Bill_amt/Brand/BrandA/BrandB/Total_value
100/350/A/600/100/700
100/250/A/600/100/700
100/100/B/600/100/700
300/200/B/0/200/200
For example, cust_id 100 spent $600 (350+250) on brand A and only $100 on brand B (see the 3rd row for cust_id 100), so the total value is $700 (600+100).
I hope this explanation is clear enough.
After the table has been updated as shown above, we will remove the duplicates ourselves.
Please provide an SQL query that replaces or updates the NULL values with the sum of bill_amt, as we have more than 200,000 records to process.
Thank you very much for taking the time to reply.

Why do you need the BrandA, BrandB and Total_value columns? They are not required.
;WITH cte
AS (SELECT cust_id,
brand,
Sum(bill_amt) bill_amt
FROM #t
GROUP BY rollup( cust_id, brand ))
UPDATE #t
SET branda = COALESCE(a.bill_amt, 0),
brandb = COALESCE(c.bill_amt, 0),
total_value = COALESCE(d.bill_amt, 0)
FROM #t b
LEFT JOIN cte a
ON a.cust_id = b.cust_id
AND a.brand = 'A'
LEFT JOIN cte c
ON b.cust_id = c.cust_id
AND c.brand = 'B'
LEFT JOIN cte d
ON b.cust_id = d.cust_id
AND d.brand IS NULL
SELECT *
FROM #t
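For comparison, here is a minimal Python sketch of the same aggregation on the four sample rows: sum bill_amt per (cust_id, brand), then write the sums back into every row of that customer. The dictionary layout is an assumption chosen for illustration; it is not part of the original answer.

```python
from collections import defaultdict

rows = [
    {"cust_id": 100, "bill_amt": 350, "brand": "A"},
    {"cust_id": 100, "bill_amt": 250, "brand": "A"},
    {"cust_id": 100, "bill_amt": 100, "brand": "B"},
    {"cust_id": 300, "bill_amt": 200, "brand": "B"},
]

# Step 1: total spend per (customer, brand).
totals = defaultdict(int)
for r in rows:
    totals[(r["cust_id"], r["brand"])] += r["bill_amt"]

# Step 2: copy the per-brand sums into every row of the customer;
# a brand the customer never bought counts as 0.
for r in rows:
    brand_a = totals.get((r["cust_id"], "A"), 0)
    brand_b = totals.get((r["cust_id"], "B"), 0)
    r["BrandA"], r["BrandB"], r["Total_value"] = brand_a, brand_b, brand_a + brand_b

for r in rows:
    print(r["cust_id"], r["bill_amt"], r["brand"], r["BrandA"], r["BrandB"], r["Total_value"])
# 100 350 A 600 100 700
# 100 250 A 600 100 700
# 100 100 B 600 100 700
# 300 200 B 0 200 200
```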

Find Segment with Longest Stay Per Booking

We have a number of bookings and one of the requirements is that we display the Final Destination for a booking based on its segments. Our business has defined the Final Destination as that in which we have the longest stay. And Origin being the first departure point.
Please note this is not the segments with the Longest Travel time i.e. Datediff(minute, DepartDate, ArrivalDate) This is requesting the one with the Longest gap between segments.
This is a simplified version of the tables:
Create Table Segments
(
BookingID int,
SegNum int,
DepartureCity varchar(100),
DepartDate datetime,
ArrivalCity varchar(100),
ArrivalDate datetime
);
Create Table Bookings
(
BookingID int identity(1,1),
Locator varchar(10)
);
Insert into Segments values (1,2,'BRU','2010-03-06 10:40','FIH','2010-03-06 20:20:00')
Insert into Segments values (1,4,'FIH','2010-03-13 21:50:00','BRU', '2010-03-14 07:25:00')
Insert into Segments values (2,2,'BOD','2010-02-10 06:50:00','AMS','2010-02-10 08:50:00')
Insert into Segments values (2,3,'AMS','2010-02-10 10:40:00','EBB','2010-02-10 20:40:00')
Insert into Segments values (2,4,'EBB','2010-02-28 22:55:00','AMS','2010-03-01 05:35:00')
Insert into Segments values (2,5,'AMS','2010-03-01 10:25:00','BOD','2010-03-01 12:15:00')
insert into Segments values (3,2,'BRU','2010-03-09 12:10:00','IAD','2010-03-09 14:46:00')
Insert into Segments Values (3,3,'IAD','2010-03-13 17:57:00','BRU','2010-03-14 07:15:00')
insert into segments values (4,2,'BRU','2010-07-27','ADD','2010-07-28')
insert into segments values (4,4,'ADD','2010-07-28','LUN','2010-07-28')
insert into segments values (4,5,'LUN','2010-08-23','ADD','2010-08-23')
insert into segments values (4,6,'ADD','2010-08-23','BRU','2010-08-24')
Insert into Bookings values('5MVL7J')
Insert into Bookings values ('Y2IMXQ')
insert into bookings values ('YCBL5C')
Insert into bookings values ('X7THJ6')
I have created a SQL Fiddle with real data here:
SQL Fiddle Example
I have tried to do the following, however this doesn't appear to be correct.
SELECT Locator, fd.*
FROM Bookings ob
OUTER APPLY
(
SELECT Top 1 DepartureCity, ArrivalCity
from
(
SELECT DISTINCT
seg.segnum ,
seg.DepartureCity ,
seg.DepartDate ,
seg.ArrivalCity ,
seg.ArrivalDate,
(SELECT
DISTINCT
DATEDIFF(MINUTE , seg.ArrivalDate , s2.DepartDate)
FROM Segments s2
WHERE s2.BookingID = seg.BookingID AND s2.segnum = seg.segnum + 1) 'LengthOfStay'
FROM Bookings b(NOLOCK)
INNER JOIN Segments seg (NOLOCK) ON seg.bookingid = b.bookingid
WHERE b.Locator = ob.locator
) a
Order by a.lengthofstay desc
)
FD
The results I expect are:
Locator Origin Destination
5MVL7J BRU FIH
Y2IMXQ BOD EBB
YCBL5C BRU IAD
X7THJ6 BRU LUN
I get the feeling that a CTE would be the best approach, however my attempts do this so far failed miserably. Any help would be greatly appreciated.
I have managed to get the following query working but it only works for one at a time due to the top one, but I'm not sure how to tweak it:
WITH CTE AS
(
SELECT distinct s.DepartureCity, s.DepartDate, s.ArrivalCity, s.ArrivalDate, b.Locator , ROW_NUMBER() OVER (PARTITION BY b.Locator ORDER BY SegNum ASC) RN
FROM Segments s
JOIN bookings b ON s.bookingid = b.BookingID
)
SELECT C.Locator, c.DepartureCity, a.ArrivalCity
FROM
(
SELECT TOP 1 C.Locator, c.ArrivalCity, c1.DepartureCity, DATEDIFF(MINUTE,c.ArrivalDate, c1.DepartDate) 'ddiff'
FROM CTE c
JOIN cte c1 ON c1.Locator = C.Locator AND c1.rn = c.rn + 1
ORDER BY ddiff DESC
) a
JOIN CTE c ON C.Locator = a.Locator
WHERE c.rn = 1

You can try something like this:
;WITH CTE_Start AS
(
--Ordering of segments to eliminate gaps
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY SegNum) RN
FROM dbo.Segments
)
, RCTE_Stay AS
(
--recursive CTE to calculate stay between segments
SELECT *, 0 AS Stay FROM CTE_Start s WHERE RN = 1
UNION ALL
SELECT sNext.*, DATEDIFF(Mi, s.ArrivalDate, sNext.DepartDate)
FROM CTE_Start sNext
INNER JOIN RCTE_Stay s ON s.RN + 1 = sNext.RN AND s.BookingID = sNext.BookingID
)
, CTE_Final AS
(
--Search for max(stay) for each bookingID
SELECT *, ROW_NUMBER() OVER (PARTITION BY BookingID ORDER BY Stay DESC) AS RN_Stay
FROM RCTE_Stay
)
--join Start and Final on RN=1 to find origin and destination
SELECT b.Locator, s.DepartureCity AS Origin, f.DepartureCity AS Destination
FROM CTE_Final f
INNER JOIN CTE_Start s ON f.BookingID = s.BookingID
INNER JOIN dbo.Bookings b ON b.BookingID = f.BookingID
WHERE s.RN = 1 AND f.RN_Stay = 1
SQLFiddle DEMO

You can use OUTER APPLY with TOP to find the next SegNum for each segment. Once the gap between segments is known, MIN/MAX aggregate functions with an OVER clause serve as the conditions in the CASE expressions.
;WITH cte AS
(
SELECT seg.BookingID,
CASE WHEN MIN(seg.segNum) OVER(PARTITION BY seg.BookingID) = seg.segNum
THEN seg.DepartureCity END AS Origin,
CASE WHEN MAX(DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)) OVER(PARTITION BY seg.BookingID)
= DATEDIFF(MINUTE, seg.ArrivalDate, o.DepartDate)
THEN o.DepartureCity END AS Destination
FROM Segments seg (NOLOCK)
OUTER APPLY (
SELECT TOP 1 seg2.DepartDate, seg2.DepartureCity
FROM Segments seg2
WHERE seg.BookingID = seg2.BookingID
AND seg.SegNum < seg2.SegNum
ORDER BY seg2.SegNum ASC
) o
)
SELECT b.Locator, MAX(c.Origin) AS Origin, MAX(c.Destination) AS Destination
FROM cte c JOIN Bookings b ON c.BookingID = b.BookingID
GROUP BY b.Locator
See demo on SQLFiddle
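To make the "longest gap between segments" rule easy to verify, here is a small Python sketch using the sample rows from the question. The helper names (Segment, seg, origin_and_destination) are made up for this example, and the locator mapping assumes the identity column numbered the bookings 1 to 4 in insert order.

```python
from collections import defaultdict, namedtuple
from datetime import datetime

Segment = namedtuple("Segment", "booking_id seg_num dep_city dep_date arr_city arr_date")


def seg(booking_id, seg_num, dep_city, dep_date, arr_city, arr_date):
    fmt = "%Y-%m-%d %H:%M"
    return Segment(booking_id, seg_num, dep_city, datetime.strptime(dep_date, fmt),
                   arr_city, datetime.strptime(arr_date, fmt))


segments = [
    seg(1, 2, "BRU", "2010-03-06 10:40", "FIH", "2010-03-06 20:20"),
    seg(1, 4, "FIH", "2010-03-13 21:50", "BRU", "2010-03-14 07:25"),
    seg(2, 2, "BOD", "2010-02-10 06:50", "AMS", "2010-02-10 08:50"),
    seg(2, 3, "AMS", "2010-02-10 10:40", "EBB", "2010-02-10 20:40"),
    seg(2, 4, "EBB", "2010-02-28 22:55", "AMS", "2010-03-01 05:35"),
    seg(2, 5, "AMS", "2010-03-01 10:25", "BOD", "2010-03-01 12:15"),
    seg(3, 2, "BRU", "2010-03-09 12:10", "IAD", "2010-03-09 14:46"),
    seg(3, 3, "IAD", "2010-03-13 17:57", "BRU", "2010-03-14 07:15"),
    seg(4, 2, "BRU", "2010-07-27 00:00", "ADD", "2010-07-28 00:00"),
    seg(4, 4, "ADD", "2010-07-28 00:00", "LUN", "2010-07-28 00:00"),
    seg(4, 5, "LUN", "2010-08-23 00:00", "ADD", "2010-08-23 00:00"),
    seg(4, 6, "ADD", "2010-08-23 00:00", "BRU", "2010-08-24 00:00"),
]
locators = {1: "5MVL7J", 2: "Y2IMXQ", 3: "YCBL5C", 4: "X7THJ6"}


def origin_and_destination(segs):
    """Origin = first departure city; destination = city with the longest stay."""
    ordered = sorted(segs, key=lambda s: s.seg_num)
    pairs = list(zip(ordered, ordered[1:]))  # (segment, following segment)
    if not pairs:  # single-segment booking: no stay to measure
        return ordered[0].dep_city, ordered[0].arr_city
    # Stay = next departure minus this arrival; keep the segment before the longest stay.
    before_stay, _ = max(pairs, key=lambda p: p[1].dep_date - p[0].arr_date)
    return ordered[0].dep_city, before_stay.arr_city


by_booking = defaultdict(list)
for s in segments:
    by_booking[s.booking_id].append(s)

result = {locators[b]: origin_and_destination(segs) for b, segs in by_booking.items()}
for locator, (origin, destination) in result.items():
    print(locator, origin, destination)
```

Pairing each segment with the following one by position, not by SegNum + 1, is what copes with the gaps in SegNum (2, 4, ...) that break the asker's first attempt.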

The statement below:
;WITH DataSource AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY BookingID ORDER BY DATEDIFF(SS,DepartDate,ArrivalDate) DESC) AS Row
,Segments.BookingID
,Segments.SegNum
,Segments.DepartureCity
,Segments.DepartDate
,Segments.ArrivalCity
,Segments.ArrivalDate
,DATEDIFF(SS,DepartDate,ArrivalDate) AS DiffInSeconds
FROM Segments
)
SELECT *
FROM DataSource DS
INNER JOIN Bookings B
ON DS.[BookingID] = B.[BookingID]
Will give the following output:
So, adding the following clause to the above statement:
WHERE Row = 1
will give you what you need.
A few important things:
As you can see from the screenshot below, there are two records with the same difference in seconds. If you want to show both of them (or all of them, if there are more), use the RANK function instead of ROW_NUMBER.
The return type of DATEDIFF is INT, so there is a limit on the maximum difference in seconds. It is as follows:
If the return value is out of range for int (-2,147,483,648 to +2,147,483,647), an error is returned. For millisecond, the maximum difference between startdate and enddate is 24 days, 20 hours, 31 minutes and 23.647 seconds. For second, the maximum difference is 68 years.

SQL Server: Group similar sales together

I'm trying to do some reporting in SQL Server.
Here's the basic table setup:
Order (ID, DateCreated, Status)
Product(ID, Name, Price)
Order_Product_Mapping(OrderID, ProductID, Quantity, Price, DateOrdered)
Here I want to create a report that groups products with a similar amount of sales over a time period, like this:
Sales over 1 month:
Coca, Pepsi, Tiger: $20000 average(coca:$21000, pepsi: $19000, tiger: $20000)
Bread, Meat: $10000 avg (bread:$11000, meat: $9000)
(Note that the text in parentheses is just for clarification; it is not needed in the report.)
The user defines how much sales may vary and still be considered similar. For example, sales that differ by less than 5% are considered similar and should be grouped together. The time period is also user-defined.
I can calculate total sales over a period but have no idea how to group them by how much the sales vary. I'm using SQL Server 2012.
Any help is appreciated.
Sorry, my English is not very good :)
UPDATE: *I figured out what I actually need ;)*
For a known array of numbers like: 1,2,3,50,52,100,102,105
I need to group them into groups that have at least 3 numbers, where the difference between any two items in a group is smaller than 10.
For the above array, output should be:
[1,2,3]
[100,102,105]
=> the algorithm takes 3 params: the array, the minimum number of items to form a group, and the maximum difference between 2 items.
How can I implement this in C#?

By the way, if you just want c#:
var maxDifference = 10;
var minItems = 3;
// I just assume your list is not ordered, so order it first
var array = (new List<int> {3, 2, 50, 1, 51, 100, 105, 102}).OrderBy(a => a);
var result = new List<List<int>>();
var group = new List<int>();
var lastNum = array.First();
var totalDiff = 0;
foreach (var n in array)
{
totalDiff += n - lastNum;
// if distance of current number and first number in current group
// is less than the threshold, add into current group
if (totalDiff <= maxDifference)
{
group.Add(n);
lastNum = n;
continue;
}
// if current group has 3 items or more, add to final result
if (group.Count >= minItems)
result.Add(group);
// start new group
group = new List<int>() { n };
lastNum = n;
totalDiff = 0;
}
// forgot the last group...
if (group.Count >= minItems)
result.Add(group);
The key here is that the array needs to be ordered, so that you do not need to jump around or store values to calculate distances.
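The same greedy pass can be sketched in Python. Note one deliberate difference, which is an assumption on my part: this version uses a strict "smaller than" comparison, as the question words it, while the C# above uses `<=`. The function name is made up for illustration.

```python
def group_close_numbers(numbers, min_items, max_difference):
    """Greedy grouping of sorted numbers.

    A number joins the current group while its distance to the group's
    first (smallest) item is strictly smaller than max_difference.
    Groups with fewer than min_items members are dropped.
    """
    groups = []
    current = []
    for n in sorted(numbers):
        if current and n - current[0] >= max_difference:
            # n is too far from the start of the group: close it.
            if len(current) >= min_items:
                groups.append(current)
            current = []
        current.append(n)
    # Do not forget the last group.
    if len(current) >= min_items:
        groups.append(current)
    return groups


result = group_close_numbers([3, 2, 50, 1, 52, 100, 105, 102], min_items=3, max_difference=10)
print(result)  # [[1, 2, 3], [100, 102, 105]]
```

Because the list is sorted, comparing against the first item of the group is enough to bound the difference between any two items in it.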

I can't believe I did it~~~
-- this threshold is the key in this query
-- it means that
-- if the difference between two values are less than the threshold
-- these two values are belong to one group
-- in your case, I think it is 200
DECLARE @th int
SET @th = 200
-- very simple, calculate total price for a time range
;WITH totals AS (
SELECT p.name AS col, sum(op.price * op.quantity) AS val
FROM order_product_mapping op
JOIN [order] o ON o.id = op.orderid
JOIN product p ON p.id = op.productid
WHERE dateordered > '2013-03-01' AND dateordered < '2013-04-01'
GROUP BY p.name
),
-- give a row number for each row
cte_rn AS (
SELECT col, val, row_number() OVER (ORDER BY val DESC) rn
FROM totals
),
-- show starts now,
-- firstly, we make each row know the row before it
cte_last_rn AS (
SELECT col, val, rn, CASE WHEN rn = 1 THEN 1 ELSE rn - 1 END lrn
FROM cte_rn
),
-- then we join current to the row before it, and calculate
-- the difference between the total price of current row and that of previous row
-- if the difference is more than the threshold we make it '1', otherwise '0'
cte_range AS (
SELECT
c1.col, c1.val,
CASE
WHEN c2.val - c1.val <= @th THEN 0
ELSE 1
END AS range,
c1.rn -- the current row's own number (c2.rn would be the previous row's)
FROM cte_last_rn c1
JOIN cte_rn c2 ON c1.lrn = c2.rn
),
-- even tricker here,
-- now, we join last cte to itself, and for each row
-- sum all the values (0, 1 that calculated previously) of rows before current row
cte_rank AS (
SELECT c1.col, c1.val, sum(c2.range) rank
FROM cte_range c1
JOIN cte_range c2 ON c1.rn >= c2.rn
GROUP BY c1.col, c1.val
)
-- now we have properly grouped theres total prices, and we can group on it's rank
SELECT
avg(c1.val) AVG,
(
SELECT c2.col + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val desc
FOR xml path('')
) product,
(
SELECT cast(c2.val AS nvarchar(MAX)) + ', ' AS 'data()'
FROM cte_rank c2
WHERE c2.rank = c1.rank
ORDER BY c2.val desc
FOR xml path('')
) price
FROM cte_rank c1
GROUP BY c1.rank
HAVING count(1) > 2
The result will look like:
AVG PRODUCT PRICE
28 A, B, C 30, 29, 27
12 D, E, F 15, 12, 10
3 G, H, I 4, 3, 2
for understanding how I did concatenate, please read this:
Concatenate many rows into a single text string?
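The core trick in the query above (flag each row whose gap to the previous row exceeds the threshold, then take a running sum of the flags as the group id) may be easier to follow in a few lines of Python. The product names and totals below are the ones from the sample result; the threshold of 5 and the function name are assumptions picked so that the sample splits the same way.

```python
from itertools import accumulate, groupby


def group_by_gap(totals, threshold, min_items=3):
    """Group products whose neighbouring totals differ by at most `threshold`."""
    # Rank products by total, highest first (like row_number() ... ORDER BY val DESC).
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    # Flag 1 when the drop from the previous row exceeds the threshold, else 0.
    flags = [0] + [1 if prev[1] - cur[1] > threshold else 0
                   for prev, cur in zip(ranked, ranked[1:])]
    # A running sum of the flags gives every row its group number.
    ranks = list(accumulate(flags))
    groups = []
    for _, members in groupby(zip(ranks, ranked), key=lambda x: x[0]):
        items = [item for _, item in members]
        if len(items) >= min_items:  # same filter as HAVING count(1) > 2
            names = [name for name, _ in items]
            values = [value for _, value in items]
            groups.append((sum(values) / len(values), names, values))
    return groups


totals = {"A": 30, "B": 29, "C": 27, "D": 15, "E": 12, "F": 10, "G": 4, "H": 3, "I": 2}
groups = group_by_gap(totals, threshold=5)
for avg, names, values in groups:
    print(round(avg, 2), names, values)
```

Only neighbouring rows are compared, so the first and last member of a long group can differ by more than the threshold; that is a property of this technique, in the SQL as well as here.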

This query should produce what you expect; it displays product sales for every month for which you have orders:
SELECT CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) As Month ,
Product.Name ,
AVG( OP.Quantity * OP.Price ) As Turnover
FROM Order_Product_Mapping OP
INNER JOIN Product ON Product.ID = OP.ProductID
GROUP BY CONVERT(CHAR(4), OP.DateOrdered, 100) + CONVERT(CHAR(4), OP.DateOrdered, 120) ,
Product.Name
Not tested, but if you provide sample data I could work on it

Looks like I made things more complicated than they should be.
Here is what should solve the problem:
-Run a query to get sales for each product.
-Run k-means or a similar clustering algorithm.

SQL if breaking number pattern, mark record?

I have the following query:
SELECT AccountNumber, RptPeriod
FROM dbo.Report
ORDER BY AccountNumber, RptPeriod
I get the following results:
123 200801
123 200802
123 200803
234 200801
344 200801
344 200803
I need to mark the records where the RptPeriod doesn't run consecutively for the account. For example, 344 200803 would have an X next to it, since it goes from 200801 to 200803.
This is for about 19321 rows, and I want it on a per-company basis: between different companies I don't care what the numbers are, I just want to see where the same company has breaks in the number pattern.
Any Ideas??
Thanks!

OK, this is kind of ugly (double join + anti-join) but it gets the work done, AND is pure portable SQL:
SELECT *
FROM dbo.Report R1
, dbo.Report R2
WHERE R1.AccountNumber = R2.AccountNumber
AND R2.RptPeriod - R1.RptPeriod > 1
-- subsequent NOT EXISTS ensures that R1,R2 rows found are "next to each other",
-- e.g. no row exists between them in the ordering above
AND NOT EXISTS
(SELECT 1 FROM dbo.Report R3
WHERE R1.AccountNumber = R3.AccountNumber
AND R2.AccountNumber = R3.AccountNumber
AND R1.RptPeriod < R3.RptPeriod
AND R3.RptPeriod < R2.RptPeriod
)

Something like this should do it:
-- cte lists all items by AccountNumber and RptPeriod, assigning an ascending integer
-- to each RptPeriod and restarting at 1 for each new AccountNumber
;WITH cte (AccountNumber, RptPeriod, Ranking)
as (select
AccountNumber
,RptPeriod
,row_number() over (partition by AccountNumber order by AccountNumber, RptPeriod) Ranking
from dbo.Report)
-- and then we join each row with each preceding row based on that "Ranking" number
select
This.AccountNumber
,This.RptPeriod
,case
when Prior.RptPeriod is null then '' -- Catches the first row in a set
when Prior.RptPeriod = This.RptPeriod - 1 then '' -- Preceding row's RptPeriod is one less than this row's RptPeriod
else 'x' -- Preceding row's RptPeriod is not exactly one less than this row's RptPeriod
end UhOh
from cte This
left outer join cte Prior
on Prior.AccountNumber = This.AccountNumber
and Prior.Ranking = This.Ranking - 1
(Edited to add comments)

WITH T
AS (SELECT *,
/*Each island of contiguous data will have
a unique AccountNumber,Grp combination*/
RptPeriod - ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) Grp,
/*RowNumber will be used to identify first record
per company, this should not be given an 'X'. */
ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) AS RN
FROM Report)
SELECT AccountNumber,
RptPeriod,
/*Check whether first in group but not first over all*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY AccountNumber, Grp
ORDER BY RptPeriod) = 1
AND RN > 1 THEN 'X'
END AS Flag
FROM T

SELECT DISTINCT r.*
FROM report r
LEFT JOIN report r2
ON r2.accountnumber = r.accountnumber
AND r2.rptperiod = r.rptperiod + 1 -- assumes an integer period; adjust for year rollover or real dates
JOIN report r3
ON r3.accountnumber = r.accountnumber
AND r3.rptperiod > r.rptperiod
WHERE r2.rptperiod IS NULL
I'm not sure of SQL Server's date logic syntax, but hopefully you get the idea. The result is every record r that has no immediately following rptPeriod (r2 is NULL) but does have at least one greater rptPeriod (r3), that is, the record just before each gap. The query isn't super straightforward, I guess, but if you have an index on the two columns, it will probably be the most efficient way to get your data.

Basically, you number rows within every account, then, using the row numbers, compare the RptPeriod values for the neighbouring rows.
It is assumed here that RptPeriod is the year and month encoded, for which case the year transition check has been added.
;WITH Report_sorted AS (
SELECT
AccountNumber,
RptPeriod,
rownum = ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY RptPeriod)
FROM dbo.Report
)
SELECT
r1.AccountNumber,
r1.RptPeriod,
-- crossing a year boundary adds 100 - 12 = 88 to the raw YYYYMM difference
CASE ISNULL(CASE WHEN r1.RptPeriod / 100 > r2.RptPeriod / 100 THEN -88 ELSE 0 END
+ r1.RptPeriod - r2.RptPeriod, 1)
WHEN 1 THEN ''
ELSE 'X'
END AS Chk
FROM Report_sorted r1
LEFT JOIN Report_sorted r2
ON r1.AccountNumber = r2.AccountNumber AND r1.rownum = r2.rownum + 1
It could be complicated further with an additional check for gaps spanning a year and more, if you need that.
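As a cross-check of the neighbour comparison, here is a minimal Python sketch that converts each YYYYMM value to a running month number, so the year transition needs no special case. The function names are made up, and the account 500 rows are an extra sample added here to show a December to January step; they are not in the question.

```python
def month_index(period):
    """Turn an integer YYYYMM into a running month count, e.g. 200901 -> 2009*12 + 1."""
    return (period // 100) * 12 + period % 100


def mark_gaps(rows):
    """rows: iterable of (account_number, rpt_period) with integer YYYYMM periods.

    Returns (account, period, flag) sorted by account and period; flag is 'X'
    when the period does not directly follow the account's previous period.
    """
    result = []
    previous = {}  # account -> last period seen
    for account, period in sorted(rows):
        prev = previous.get(account)
        is_gap = prev is not None and month_index(period) - month_index(prev) != 1
        result.append((account, period, "X" if is_gap else ""))
        previous[account] = period
    return result


rows = [
    (123, 200801), (123, 200802), (123, 200803),
    (234, 200801),
    (344, 200801), (344, 200803),
    (500, 200812), (500, 200901),  # hypothetical account spanning a year change
]
marked = mark_gaps(rows)
for row in marked:
    print(row)
```

With the sample data only (344, 200803) is marked; the step from 200812 to 200901 is treated as consecutive.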
