SQL Gap Overlap DateRanges - sql

I have a query that identifies gaps and overlaps of date ranges in sql server 2008 r2. Each unique data set has 12 records. What I would like to do is to adjust or add to the code that identifies the gaps and overlaps and update the records to be sequential.
--gaps and overlaps tbl_volumes
with s as
(
select esiid,Read_Start,Read_End ,row_number() over(partition by esiid order by Read_Start) rn
from tbl_Volumes
where Status=0
group by esiid,Read_Start,Read_End)
select a.esiid, a.Read_Start, a.Read_End, b.Read_Start as nextstartdate,datediff(d,a.Read_End, b.Read_Start) as gap
into #go
from s a
join s b on b.esiid = a.esiid and b.rn = a.rn + 1
where datediff(d, a.Read_End, b.Read_Start) not in (0,1)
order by a.esiid
Here is the bad record set that I would like to see sequential:
e Read_Start Read_End Source
10032789402145965 2011-01-21 2011-02-22 867_03_1563303
10032789402145965 2011-02-22 2011-03-21 867_03_1665865
10032789402145965 2011-03-26 2011-04-20 867_03_1782993

Well, you could just assign a new Read_end to each record based on the next value. The calculation for the new start can be done like this:
select t.*,
(select top 1 Read_Start
from t t2
where t2.e = t.e and t2.Read_Start > t.Read_Start
order by t2.Read_Start
) as New_Read_End
from t
Do you actually want to update the value or just see what it should be?

Related

Select latest and 2nd latest date rows per user

I have the following query to select rows where the LAST_UPDATE_DATE field is getting records that have a date value greater than or equal to the last 7 days, which works great.
SELECT 'NEW ROW' AS 'ROW_TYPE', A.EMPLID, B.FIRST_NAME, B.LAST_NAME,
A.BANK_CD, A.ACCOUNT_NUM, ACCOUNT_TYPE, PRIORITY, A.LAST_UPDATE_DATE
FROM PS_DIRECT_DEPOSIT D
INNER JOIN PS_DIR_DEP_DISTRIB A ON A.EMPLID = D.EMPLID AND A.EFFDT = D.EFFDT
INNER JOIN PS_EMPLOYEES B ON B.EMPLID = A.EMPLID
WHERE
B.EMPL_STATUS NOT IN ('T','R','D')
AND ((A.DEPOSIT_TYPE = 'P' AND A.AMOUNT_PCT = 100)
OR A.PRIORITY = 999
OR A.DEPOSIT_TYPE = 'B')
AND A.EFFDT = (SELECT MAX(A1.EFFDT)
FROM PS_DIR_DEP_DISTRIB A1
WHERE A1.EMPLID = A.EMPLID
AND A1.EFFDT <= GETDATE())
AND D.EFF_STATUS = 'A'
AND D.EFFDT = (SELECT MAX(D1.EFFDT)
FROM PS_DIRECT_DEPOSIT D1
WHERE D1.EMPLID = D.EMPLID
AND D1.EFFDT <= GETDATE())
AND A.LAST_UPDATE_DATE >= GETDATE() - 7
What I would like to add onto this is to also add the previous (2nd MAX) row per EMPLID, so that I can output the 'old' row (that was prior to the last update the latest row meeting above criteria), along with the new row that I already am outputting in the query.
ROW_TYPE EMPLID FIRST_NAME LAST_NAME BANK_CD ACCOUNT_NUM ACCOUNT_TYPE PRIORITY LAST_UPDATE_DATE
NEW ROW 12345 JOHN SMITH 123548999 45234879 C 999 2019-03-06 00:00:00.000
OLD ROW 12345 JOHN SMITH 214080046 92178616 C 999 2018-10-24 00:00:00.000
NEW ROW 56399 CHARLES MASTER 785816167 84314314 C 999 2019-03-07 00:00:00.000
OLD ROW 56399 CHARLES MASTER 345761227 547352 C 999 2017-05-16 00:00:00.000
So the EMPLID would be ordered by NEW ROW, followed by OLD ROW as shown above. In this example the 'NEW ROW' is getting the record that is within the past 7 days, as indicated by the LAST_UPDATE_DATE.
I would like to get feedback on how to modify the query so I can also get the 'old' row (which is the max row that is less than the 'NEW' row retrieved above).
It was a slow day for crime in Gotham, so I gave this a whirl. Might work.
This is unlikely to work right out of the box, though, but it should get you started.
Your LAST_UPDATE_DATE column is on the table PS_DIR_DEP_DISTRIB, so we'll start there. First, you want to identify all of the records that were updated in the last 7 days because those are the only ones you're interested in. Throughout this, I'm assuming, and I'm probably wrong, that the natural key for the table consists of EMPLID, BANK_CD, and ACCOUNT_NUM. You'll want to sub in the actual natural key for those columns in a few places. That said, the date limiter looks something like this:
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND
limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
Now we'll use that as a correlated sub-query in a WHERE EXISTS clause that we'll correlate back to the base table to limit ourselves to records with natural key values that were updated in the last week. I altered the SELECT list to just SELECT 1, which is typical verbiage for a correlated sub, since it stops looking for a match when it finds one (1), and doesn't actually return any values at all.
Additionally, since we're filtering this record set anyway, I moved all the other WHERE clause filters for this table into this (soon to be) sub-query.
Finally, in the SELECT portion, I added a DENSE_RANK to force order the records. We' use the DENSE_RANK value later to filter off only the first (N) records of interest.
So that leaves us with this:
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
--,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
,PRIORITY
,EFFDT
,LAST_UPDATE_DATE
,DEPOSIT_TYPE
,AMOUNT_PCT
,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
EMPLID
ORDER BY
LAST_UPDATE_DATE DESC
) AS RowNum
FROM
PS_DIR_DEP_DISTRIB AS sdist
WHERE
EXISTS
(
-- Get the set of records that were last updated in the last 7 days.
-- Correlate to the outer query so it only returns records related to this subset.
-- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
-- Something like this, using the actual natural key columns in the WHERE
SELECT
1
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
--The first two define the date range.
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
AND
--And these are the correlations to the outer query.
limit.EMPLID = sdist.EMPLID
AND limit.BANK_CD = sdist.BANK_CD
AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
)
AND
(
dist.DEPOSIT_TYPE = 'P'
AND dist.AMOUNT_PCT = 100
)
OR dist.PRIORITY = 999
OR dist.DEPOSIT_TYPE = 'B'
Replace the original INNER JOIN to PS_DIR_DEP_DISTRIB with that query. In the SELECT list, the first hard-coded value is now dependent on the RowNum value, so that's a CASE expression now. In the WHERE clause, the dates are all driven by the subquery, so they're gone, several were folded into the subquery, and we're adding WHERE dist.RowNum <= 2 to bring back the top 2 records.
(I also replaced all the table aliases so I could keep track of what I was looking at.)
SELECT
CASE dist.RowNum
WHEN 1 THEN 'NEW ROW'
ELSE 'OLD ROW'
END AS ROW_TYPE
,dist.EMPLID
,emp.FIRST_NAME
,emp.LAST_NAME
,dist.BANK_CD
,dist.ACCOUNT_NUM
,ACCOUNT_TYPE
,dist.PRIORITY
,dist.LAST_UPDATE_DATE
FROM
PS_DIRECT_DEPOSIT AS dd
INNER JOIN
(
SELECT
EMPLID
,BANK_CD
,ACCOUNT_NUM
--,ACCOUNT_TYPE --Might belong here. Can't tell without table alias in original SELECT
,PRIORITY
,EFFDT
,LAST_UPDATE_DATE
,DEPOSIT_TYPE
,AMOUNT_PCT
,DENSE_RANK() OVER (PARTITION BY --Add actual natural key columns here...
EMPLID
ORDER BY
LAST_UPDATE_DATE DESC
) AS RowNum
FROM
PS_DIR_DEP_DISTRIB AS sdist
WHERE
EXISTS
(
-- Get the set of records that were last updated in the last 7 days.
-- Correlate to the outer query so it only returns records related to this subset.
-- This uses a correlated subquery. A JOIN will work, too. Try both, pick the faster one.
-- Something like this, using the actual natural key columns in the WHERE
SELECT
1
FROM
PS_DIR_DEP_DISTRIB AS limit
WHERE
--The first two define the date range.
limit.LAST_UPDATE_DATE >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
AND limit.LAST_UPDATE_DATE <= CAST(GETDATE() AS DATE)
AND
--And these are the correlations to the outer query.
limit.EMPLID = sdist.EMPLID
AND limit.BANK_CD = sdist.BANK_CD
AND limit.ACCOUNT_NUM = sdist.ACCOUNT_NUM
)
AND
(
dist.DEPOSIT_TYPE = 'P'
AND dist.AMOUNT_PCT = 100
)
OR dist.PRIORITY = 999
OR dist.DEPOSIT_TYPE = 'B'
) AS dist
ON
dist.EMPLID = dd.EMPLID
AND dist.EFFDT = dd.EFFDT
INNER JOIN
PS_EMPLOYEES AS emp
ON
emp.EMPLID = dist.EMPLID
WHERE
dist.RowNum <= 2
AND
emp.EMPL_STATUS NOT IN ('T', 'R', 'D')
AND
dd.EFF_STATUS = 'A';

SQL Server - How to fill in missing column values

I have set of records at day level with 2 columns:
Invoice_date
Invoice_amount
For few records, value of invoice_amount is missing.
I need to fill invoice_amount values where it is NULL using this logic:
Look for next available invoice_amount (in dates later than the blank value record date)
For records with invoice_amount still blank (invoice_amount not present for future dates), look for most previous invoice_amount (in dates before the blank value date)
Note: We have consecutive multiple days where invoice_amount is blank in the dataset:
use CROSS APPLY to find next and previous not null Invoice Amount
update p
set Invoice_Amount = coalesce(nx.Invoice_Amount, pr.Invoice_Amount)
from Problem p
outer apply -- Next non null value
(
select top 1 *
from Problem x
where x.Invoice_Amount is not null
and x.Invoice_Date > p.Invoice_Date
order by Invoice_Date
) nx
outer apply -- Prev non null value
(
select top 1 *
from Problem x
where x.Invoice_Amount is not null
and x.Invoice_Date < p.Invoice_Date
order by Invoice_Date desc
) pr
where p.Invoice_Amount is null
this updates back your table. If you need a select query, it can be modify to it easily
Not efficient but seems to work. Try:
update test set invoice_amount =
coalesce ((select top 1 next.invoice_amount from test next
where next.invoiceDate > test.invoiceDate and next.invoice_amount is not null
order by next.invoiceDate),
(select top 1 prev.invoice_amount from test prev
where prev.invoiceDate < test.invoiceDate and prev.invoice_amount is not null
order by prev.invoiceDate desc))
where invoice_amount is null;
As per given example you could use window function with self join
update t set t.amount = tt.NewAmount
from table t
inner join (
select Dates, coalesce(min(amount) over (order by dates desc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW),
min(amount) over (order by dates asc ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) NewAmount
from table t
) tt on tt.dates = t.dates
where t.amount is null

How to calculate the date difference between rows with special first row rule

I have multiple applications each with unique application ID's, Each application has logs which are stored in a table. I want to know how to calculate the date difference
eg:
AppID Start Loggedon change
A1 08/07/2010 08/09/2010 Xchange
A1 08/07/2010 08/20/2010 Ychange
A1 08/07/2010 08/30/2010 Zchange
A2 07/07/2010 07/13/2010 Ychange
A3 09/07/2010 09/09/2010 Xchange
So I want the values
Difference
2 (Difference between the application start date and 1st loggedon date)
11 (difference between 08/20 and 08/09, the prior row because AppID stayed the same)
10 (difference between 08/30 and 08/20, the prior row because AppID stayed the same)
6 (Difference between the application start date and 1st loggedon date)
2 (Difference between the application start date and 1st loggedon date)
Hope I am clear. How can I achieve this, I tried Ranking and Row_Number. But I might be wrong somewhere. I am using SQL Server and cannot use LAG()
#Hogan has the correct approach, but for some reason, I can't get it to work completely. Here is a slight variation that seems to produce the correct result -- however, please accept #Hogan's answer as he beat me to it :)
WITH cte AS (
SELECT AppId, Start, LoggedOn,
ROW_NUMBER() OVER (ORDER BY AppId) rn
FROM Apps
)
SELECT c.AppId,
CASE
WHEN c2.RN IS NULL
THEN DATEDIFF(day,c.start,c.Loggedon)
ELSE
DATEDIFF(day,c2.Loggedon,c.Loggedon)
END as TimeWanted
FROM cte c
LEFT JOIN cte c2 on c.AppId = c2.AppId
AND c.rn = c2.rn + 1
And here is the Fiddle.
Good luck.
Ok this will work. Enjoy.
Now tested! Thanks sgeddes -- http://sqlfiddle.com/#!3/e4d28/19
WITH tableNumbered AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY AppID ORDER BY AppID, Start , Loggedon ) AS row
FROM Apps
)
SELECT t.*,
CASE WHEN t2.row IS NULL THEN DATEDIFF(day,t.Start,t.LoggedOn)
ELSE DATEDIFF(day,t2.LoggedOn,t.Loggedon)
END as diff
FROM tableNumbered t
LEFT JOIN tableNumbered t2 ON t.AppID = t2.AppID AND t2.row+1 = t.row
I still think you should do it in the UI.

How do I refer to a record in sql that immediately precedes another record in a group?

I have a weird update query to write.
Here's the table
PK-ID (int) --- FK-ID (int) --- Value (int)
In my data set, if I group by FK-ID and order by PK-ID, suppose this is an example of one group:
5 --- 10 --- 23
7 --- 10 --- 49
8 --- 10 --- 81
Due to a bug in some old software, records 7 and 8 have incorrect values. The correct value for 7 is (49-23) = 26 and the correct value for 8 is (81-49) = 32. Record 5 is correct.
I need to update each record to subtract the value of the record immediately preceding it when it is grouped by FK-ID and ordered by PK-ID. If there is no preceding record I do not need to change the value.
Is there a way to write a general sql update query to accomplish this? How would I (conditionally) retrieve the value of the preceding record in the group? I'm using SQL server 2008.
Thanks!
with ordered as (
select *, rn = row_number() over (partition by fk_id order by pk_id)
from tbl
)
update cur
set value = cur.value - prior.value
from ordered cur
join ordered prior on prior.fk_id = cur.fk_id
and prior.rn = cur.rn-1;
This is what I believe to be the correct answer, using a similar idea to the previous one. The toupdate subquery calculates the values, based on the rules in the question (update records with the same foreign key and consecutive primary keys). It does assume that the ids are nuemric values of some sort.
with toupdate as (
select t.pkid, t.value - tprev.value as newval
from t join
t tprev
on t.pkid = tprev.pkid+1 and t.fkid = tprev.fkid
)
update t
set value = newvalue
from toupdate
where t.pkid = toupdate.pkid
update t set value = value -
isnull((select top 1 value
from t t2
where t2.FKID=t.FKID
and t2.PKID<t.PKID
order by PKID desc),0);
Here is a SQLFiddle demo
I hope it should return what you want(sorry, I cannot try it the moment); you just need to incorporate it with UPDATE
WITH cte1 AS
(SELECT pk_id, fk_id, value, ROW_NUMBER() OVER (PARTITION BY fk_id ORDER BY pk_id DESC)
as num
FROM your_table
)
SELECT a.*,
--CASE
-- WHEN b.pk_id IS NOT NULL THEN a.value-b.value
-- ELSE 0 END
a.value-b.value as valid_number
FROM cte1 a
--LEFT JOIN cte1 b ON (b.fk_id = a.fk_id AND b.num = a.num-1)
INNER JOIN cte1 b ON (b.fk_id = a.fk_id AND b.num = a.num-1)

SQL if breaking number pattern, mark record?

I have the following query:
SELECT AccountNumber, RptPeriod
FROM dbo.Report
ORDER BY AccountNumber, RptPeriod.
I get the following results:
123 200801
123 200802
123 200803
234 200801
344 200801
344 200803
I need to mark the record where the rptperiod doesnt flow concurrently for the account. For example 344 200803 would have an X next to it since it goes from 200801 to 200803.
This is for about 19321 rows and I want it on a company basis so between different companies I dont care what the numbers are, I just want the same company to show where there is breaks in the number pattern.
Any Ideas??
Thanks!
OK, this is kind of ugly (double join + anti-join) but it gets the work done, AND is pure portable SQL:
SELECT *
FROM dbo.Report R1
, dbo.Report R2
WHERE R1.AccountNumber = R2.AccountNumber
AND R2.RptPeriod - R1.RptPeriod > 1
-- subsequent NOT EXISTS ensures that R1,R2 rows found are "next to each other",
-- e.g. no row exists between them in the ordering above
AND NOT EXISTS
(SELECT 1 FROM dbo.Report R3
WHERE R1.AccountNumber = R3.AccountNumber
AND R2.AccountNumber = R3.AccountNumber
AND R1.RptPeriod < R3.RptPeriod
AND R3.RptPeriod < R2.RptPeriod
)
Something like this should do it:
-- cte lists all items by AccountNumber and RptPeriod, assigning an ascending integer
-- to each RptPeriod and restarting at 1 for each new AccountNumber
;WITH cte (AccountNumber, RptPeriod, Ranking)
as (select
AccountNumber
,RptPeriod
,row_number() over (partition by AccountNumber order by AccountNumber, RptPeriod) Ranking
from dbo.Report)
-- and then we join each row with each preceding row based on that "Ranking" number
select
This.AccountNumber
,This.RptPeriod
,case
when Prior.RptPeriod is null then '' -- Catches the first row in a set
when Prior.RptPeriod = This.RptPeriod - 1 then '' -- Preceding row's RptPeriod is one less that This row's RptPeriod
else 'x' -- -- Preceding row's RptPeriod is not less that This row's RptPeriod
end UhOh
from cte This
left outer join cte Prior
on Prior.AccountNumber = This.AccountNumber
and Prior.Ranking = This.Ranking - 1
(Edited to add comments)
WITH T
AS (SELECT *,
/*Each island of contiguous data will have
a unique AccountNumber,Grp combination*/
RptPeriod - ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) Grp,
/*RowNumber will be used to identify first record
per company, this should not be given an 'X'. */
ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) AS RN
FROM Report)
SELECT AccountNumber,
RptPeriod,
/*Check whether first in group but not first over all*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY AccountNumber, Grp
ORDER BY RptPeriod) = 1
AND RN > 1 THEN 'X'
END AS Flag
FROM T
SELECT *
FROM report r
LEFT JOIN report r2
ON r.accountnumber = r.accountnumber
AND {r2.rptperiod is one day after r.rptPeriod}
JOIN report r3
ON r3.accountNumber = r.accountNumber
AND r3.rptperiod > r1.rptPeriod
WHERE r2.rptPeriod IS NULL
AND r3 IS NOT NULL
I'm not sure of sql servers date logic syntax, but hopefully you get the idea. r will be all the records where the next rptPeriod is NULL (r2) and there exists at least one greater rptPeriod (r3). The query isn't super straight forward I guess, but if you have an index on the two columns, it'll probably be the most efficent way to get your data.
Basically, you number rows within every account, then, using the row numbers, compare the RptPeriod values for the neighbouring rows.
It is assumed here that RptPeriod is the year and month encoded, for which case the year transition check has been added.
;WITH Report_sorted AS (
SELECT
AccountNumber,
RptPeriod,
rownum = ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY RptPeriod)
FROM dbo.Report
)
SELECT
AccountNumber,
RptPeriod,
CASE ISNULL(CASE WHEN r1.RptPeriod / 100 < r2.RptPeriod / 100 THEN 12 ELSE 0 END
+ r1.RptPeriod - r2.RptPeriod, 1) AS Chk
WHEN 1 THEN ''
ELSE 'X'
END
FROM Report_sorted r1
LEFT JOIN Report_sorted r2
ON r1.AccountNumber = r2.AccountNumber AND r1.rownum = r2.rownum + 1
It could be complicated further with an additional check for gaps spanning a year and more, if you need that.