Overlapping Spans - sql

I am trying to write a query that adjusts date ranges so that they do not overlap higher-ranked spans. It should take data that looks like this:
Row  Rank  Begin Date  End Date
1    B     3/24/13     11/1/13
2    A     10/30/13    4/9/15
3    B     3/26/15     12/31/15
and turn it into this:
Row  Rank  Begin Date  End Date
1    B     3/24/13     10/29/13
2    A     10/30/13    4/9/15
3    B     4/10/15     12/31/15
To explain further, row 2 is ranked higher (A > B), so the dates in rows 1 and 3 have to change around the dates in row 2 to avoid overlapping it.
I am using SQL Server 2008 R2.

You can use the following query:
;WITH CTE AS (
    SELECT Row, Rank, BeginDate, EndDate,
           ROW_NUMBER() OVER (ORDER BY BeginDate) AS rn
    FROM mytable
), ToUpdate AS (
    SELECT c1.Row, c1.Rank, c1.BeginDate, c1.EndDate,
           c2.Rank AS pRank, c2.EndDate AS pEndDate,
           c3.Rank AS nRank, c3.BeginDate AS nBeginDate
    FROM CTE AS c1
    LEFT JOIN CTE AS c2 ON c1.rn = c2.rn + 1
    LEFT JOIN CTE AS c3 ON c1.rn = c3.rn - 1
    WHERE c1.Rank = 'B'
)
UPDATE ToUpdate
SET BeginDate = CASE
                    WHEN pEndDate IS NULL
                        THEN BeginDate
                    WHEN (pEndDate >= BeginDate) AND (pRank = 'A')
                        THEN DATEADD(d, 1, pEndDate)
                    ELSE BeginDate
                END,
    EndDate = CASE
                  WHEN nBeginDate IS NULL
                      THEN EndDate
                  WHEN (nBeginDate <= EndDate) AND (nRank = 'A')
                      THEN DATEADD(d, -1, nBeginDate)
                  ELSE EndDate
              END
A CTE is first constructed to assign consecutive, ascending numbers to every record of your table, ordered by BeginDate. The ROW_NUMBER() window function is used for this purpose.
Using this CTE as a basis, we construct ToUpdate. This second CTE contains the date values of the current record as well as those of the previous and next records.
This LEFT JOIN:
LEFT JOIN CTE AS c2 ON c1.rn = c2.rn + 1
joins each record to the previous record, whereas this one:
LEFT JOIN CTE AS c3 ON c1.rn = c3.rn - 1
joins each record to the next record.
Using CASE expressions we can now easily identify overlaps and, where there is one, perform an update.
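The neighbour-trimming logic described above can also be sketched outside SQL. The following is a minimal Python illustration, not the answer's implementation: the function name trim_lower_ranked and the dictionary keys are assumptions made for the example, and the data is the sample from the question.

```python
from datetime import date, timedelta

def trim_lower_ranked(spans):
    """Trim rank-'B' spans so they no longer overlap an adjacent rank-'A' span.

    spans: list of dicts with keys row, rank, begin, end (hypothetical layout).
    Neighbours are taken in BeginDate order, as ROW_NUMBER() does in the CTE.
    """
    ordered = sorted(spans, key=lambda s: s["begin"])
    one_day = timedelta(days=1)
    result = []
    for i, cur in enumerate(ordered):
        begin, end = cur["begin"], cur["end"]
        if cur["rank"] == "B":
            prev = ordered[i - 1] if i > 0 else None
            nxt = ordered[i + 1] if i + 1 < len(ordered) else None
            # Previous span outranks us and reaches into our start: move start.
            if prev and prev["rank"] == "A" and prev["end"] >= begin:
                begin = prev["end"] + one_day
            # Next span outranks us and starts before our end: pull end back.
            if nxt and nxt["rank"] == "A" and nxt["begin"] <= end:
                end = nxt["begin"] - one_day
        result.append({**cur, "begin": begin, "end": end})
    return result

spans = [
    {"row": 1, "rank": "B", "begin": date(2013, 3, 24), "end": date(2013, 11, 1)},
    {"row": 2, "rank": "A", "begin": date(2013, 10, 30), "end": date(2015, 4, 9)},
    {"row": 3, "rank": "B", "begin": date(2015, 3, 26), "end": date(2015, 12, 31)},
]
fixed = trim_lower_ranked(spans)
```

Like the query, this only looks at the immediately adjacent spans, so a 'B' span that overlaps a non-adjacent 'A' span would be left unchanged.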

Please use the query below to update the table.
UPDATE table_name
SET End_Date = DATEADD(day, -1, (SELECT Begin_Date FROM table_name WHERE row = 2))
WHERE row = 1;
You need to change the row numbers every time you run the query. Let me know if this works for you.
I suggest first creating a view:
CREATE VIEW tempview AS
SELECT row, begin_date FROM table_name
WHERE row > 1;
Then use this query to update all the rows rather than just the first one:
UPDATE table_name
SET End_Date = DATEADD(day, -1, (SELECT v.begin_date FROM tempview v WHERE v.row = table_name.row + 1))
WHERE row < (SELECT MAX(row) FROM table_name);
Hope this works.

Related

How to select a single row for each unique ID

SQL novice here learning on the job, still a greenhorn. I have a problem I don't know how to overcome. Using IBM Netezza and Aginity Workbench.
My current output will try to return one row per case number based on when a task was created. It will only keep the row with the newest task. This gets me about 85% of the way there. The issue is that sometimes multiple tasks have a create day of the same day.
I would like to incorporate Task Followup Date to keep only the newest row if there are multiple rows with the same Case Number. I posted an example of what my current code outputs and what I would like it to output.
Current code
SELECT
A.PS_CASE_ID AS Case_Number
,D.CASE_TASK_TYPE_NM AS Task
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT AS Task_Followup_Date
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON (A.CASE_ID = C.CASE_ID)
INNER JOIN VW_CASE_TASK_TYPE D ON (C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID)
INNER JOIN ADMIN.VW_RSN_CTGY B ON (A.RSN_CTGY_ID = B.RSN_CTGY_ID)
WHERE
(A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT')
AND CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01'
AND B.RSN_CTGY_NM = 'Chargeback Initiation'
AND CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE)) from VW_CASE_TASK C2 WHERE C2.CASE_ID = C.CASE_ID)
GROUP BY
A.PS_CASE_ID
,D.CASE_TASK_TYPE_NM
,C.TASK_CRTE_TMS
,C.TASK_FLWUP_DT
Current output
Desired output
You could use ROW_NUMBER here:
WITH cte AS (
SELECT DISTINCT A.PS_CASE_ID AS Case_Number, D.CASE_TASK_TYPE_NM AS Task,
C.TASK_CRTE_TMS, C.TASK_FLWUP_DT AS Task_Followup_Date,
ROW_NUMBER() OVER (PARTITION BY A.PS_CASE_ID ORDER BY C.TASK_FLWUP_DT DESC) rn
FROM VW_CC_CASE A
INNER JOIN VW_CASE_TASK C ON A.CASE_ID = C.CASE_ID
INNER JOIN VW_CASE_TASK_TYPE D ON C.CASE_TASK_TYPE_ID = D.CASE_TASK_TYPE_ID
INNER JOIN ADMIN.VW_RSN_CTGY B ON A.RSN_CTGY_ID = B.RSN_CTGY_ID
WHERE (A.PS_Z_SPSR_ID LIKE '%EFT' OR A.PS_Z_SPSR_ID LIKE '%CRDT') AND
CAST(A.CASE_CRTE_TMS AS DATE) >= '2020-01-01' AND
B.RSN_CTGY_NM = 'Chargeback Initiation' AND
CAST(C.TASK_CRTE_TMS AS DATE) = (SELECT MAX(CAST(C2.TASK_CRTE_TMS AS DATE))
FROM VW_CASE_TASK C2
WHERE C2.CASE_ID = C.CASE_ID)
)
SELECT
Case_Number,
Task,
TASK_CRTE_TMS,
Task_Followup_Date
FROM cte
WHERE rn = 1;
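To make the effect of PARTITION BY ... ORDER BY ... DESC with rn = 1 concrete, here is a small Python sketch of the same "keep the newest row per case" idea. The function name, the dictionary keys, and the sample rows are hypothetical and only serve the illustration.

```python
from datetime import date

def newest_per_case(rows):
    """Keep, for each case number, the row with the latest follow-up date.

    Mirrors ROW_NUMBER() OVER (PARTITION BY case ORDER BY followup DESC) = 1.
    On a tie the first row seen wins; in SQL the winner of a tie is arbitrary
    unless more columns are added to the ORDER BY.
    """
    best = {}
    for row in rows:
        key = row["case_number"]
        if key not in best or row["followup"] > best[key]["followup"]:
            best[key] = row
    return list(best.values())

# Hypothetical sample: case 101 has two tasks created on the same day.
rows = [
    {"case_number": 101, "task": "Review", "followup": date(2020, 3, 1)},
    {"case_number": 101, "task": "Escalate", "followup": date(2020, 3, 5)},
    {"case_number": 102, "task": "Review", "followup": date(2020, 2, 10)},
]
kept = newest_per_case(rows)
```

If follow-up dates can tie as well, a further tiebreaker (for example the task creation timestamp) would be needed to make the result deterministic.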
One method uses window functions:
with cte as (
< your query here >
)
select x.*
from (select cte.*,
row_number() over (partition by case_number, Task_Followup_Date
order by TASK_CRTE_TMS asc
) as seqnum
from cte
) x
where seqnum = 1;

Select 1st and 2nd Record before record X

SQL Server 2008-12
I have table:
InteractionKey char(18)
dEventTime datetime
SeqNo int
cEventData1
There will be multiple entries per InteractionKey. dEventTime is only precise to the second, and SeqNo is incremented if two entries occur in the same second.
What I need to do is select the first and second record BEFORE the record where
cEventData1 = 'Disconnect'
The final product will give me a count of occurrences grouped by cEventData1.
I am currently using a cursor (will update with cursor source momentarily). I would like to use a CTE, but I really struggle with understanding them.
Any ideas would be appreciated!
Update with Data Sample
INTERACTIONKEY      dEventTime               SeqNo  cEventData1
100186322420130722  2013-07-22 11:50:49.000  1      EnterPassword
100186322420130722  2013-07-22 11:50:49.000  2      CheckPassword
100186322420130722  2013-07-22 11:50:49.000  3      Attendant Disconnect
Ideally, the result of the query would look like this (note that the Action column can simply be the literal 'Attendant Disconnect'):
cEventData1    Action                Count
CheckPassword  Attendant Disconnect  1
Here is the query I ended up going with, based on the answer below:
SELECT DISTINCT t1.InteractionKey,
DisconnectTime = t1.dEventTime,
PreviousEventTime = t2.dEventTime,
PreviousEvent = t2.cEventData1,
t2.SeqNo
FROM IVRHistory t1
OUTER APPLY
( SELECT TOP 1 t2.dEventTime, t2.SeqNo, t2.cEventData1
FROM IVRHistory t2
WHERE t1.InteractionKey = t2.InteractionKey
AND t1.dEventTime >= t2.dEventTime
AND t1.SeqNo > t2.SeqNo
AND t2.cEventData1 <> 'Attendant Disconnect'
ORDER BY t2.dEventTime DESC, t2.SeqNo DESC
) t2
WHERE t1.cEventData1 = 'Attendant Disconnect'
I would approach this using APPLY:
SELECT t1.InteractionKey,
DisconnectTime = t1.dEventTime,
PreviousEventTime = t2.dEventTime,
PreviousEvent = t2.cEventData1,
t2.SeqNo
FROM T t1
OUTER APPLY
( SELECT TOP 2 t2.dEventTime, t2.SeqNo, t2.cEventData1
FROM T t2
WHERE t1.InteractionKey = t2.InteractionKey
AND t1.dEventTime > t2.dEventTime
ORDER BY t2.dEventTime DESC
) t2
WHERE t1.cEventData1 = 'Disconnect';
This will give you the two records immediately preceding the disconnect event. If there are duplicate times and you need more than two records, you can use TOP 2 WITH TIES.
Without your sample input and output I am guessing a bit, but from what you have said your final aggregate would be:
SELECT t2.cEventData1,
Occurrences = COUNT(*)
FROM T t1
OUTER APPLY
( SELECT TOP 2 t2.dEventTime, t2.SeqNo, t2.cEventData1
FROM T t2
WHERE t1.InteractionKey = t2.InteractionKey
AND t1.dEventTime > t2.dEventTime
ORDER BY t2.dEventTime DESC
) t2
WHERE t1.cEventData1 = 'Disconnect'
GROUP BY t2.cEventData1;
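As a cross-check of the "take the N rows before the marker, then count by event" idea, here is a Python sketch. It is an illustration under assumptions, not a translation of the query: it orders events by time and then by SeqNo (as the question describes), and the function name and tuple layout are made up for the example.

```python
from collections import Counter

def events_before_marker(events, marker, n=2):
    """Count the event types seen in the n rows just before each marker row.

    events: list of (interaction_key, event_time, seq_no, event_data) tuples.
    Rows are ordered per interaction by (event_time, seq_no).
    """
    by_key = {}
    for ev in sorted(events, key=lambda e: (e[0], e[1], e[2])):
        by_key.setdefault(ev[0], []).append(ev)
    preceding = []
    for rows in by_key.values():
        for i, ev in enumerate(rows):
            if ev[3] == marker:
                # Up to n rows immediately before the marker row.
                preceding.extend(rows[max(0, i - n):i])
    return Counter(ev[3] for ev in preceding)

# Sample rows from the question (ISO timestamps sort correctly as strings).
events = [
    ("100186322420130722", "2013-07-22 11:50:49", 1, "EnterPassword"),
    ("100186322420130722", "2013-07-22 11:50:49", 2, "CheckPassword"),
    ("100186322420130722", "2013-07-22 11:50:49", 3, "Attendant Disconnect"),
]
counts = events_before_marker(events, "Attendant Disconnect")
```

Because all three sample rows share the same second, ordering by time alone would not find any preceding row here; including SeqNo in the ordering is what makes the sample work.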

Query which gives list of dates between two date ranges

I am sorry for this, but my previous question was not properly framed, so I am creating another post.
My question is similar to the following question:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:14582643282111
I need to write an inner query which will give a list of the dates between two date ranges to the outer query.
My inner query returns the following 2 rows:
SELECT request.REQ_DATE, request.DUE_DATE FROM myTable where id = 100
REQ_DATE DUE_DATE
3/19/2013 3/21/2013
3/8/2013 3/8/2013
So I need an inner query which will return the following dates to the outer query:
3/19/2013
3/20/2013
3/21/2013
3/8/2013
The answer in the above post has the start date and end date hard-coded, whereas in my case they come from another table. So I am trying to write a query like this, which does not work:
Select * from outerTable where my_date in
(
select to_date(r.REQ_DATE) + rownum -1 from all_objects,
(
SELECT REQ_DATE, DUE_DATE
FROM myTable where id = 100
) r
where rownum <= to_date(r.DUE_DATE,'dd-mon-yyyy')-to_date(r.REQ_DATE,'dd-mon-yyyy')+1;
)
with
T_from_to as (
select
trunc(REQ_DATE) as d_from,
trunc(DUE_DATE) as d_to
FROM myTable
where id = 100
),
T_seq as (
select level-1 as delta
from dual
connect by level-1 <= (select max(d_to-d_from) from T_from_to)
)
select distinct d_from + delta
from T_from_to, T_seq
where d_from + delta <= d_to
order by 1
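The query above generates offsets 0..max(d_to - d_from) and keeps each d_from + delta that does not pass d_to. The same expansion can be sketched in Python; the function name expand_ranges is an assumption for the example, and the input is the two rows from the question.

```python
from datetime import date, timedelta

def expand_ranges(ranges):
    """Return the sorted, distinct dates covered by (start, end) pairs.

    Both endpoints are inclusive, matching d_from + delta <= d_to with
    delta starting at 0. A pair whose end is before its start yields nothing.
    """
    days = set()
    for start, end in ranges:
        for delta in range((end - start).days + 1):
            days.add(start + timedelta(days=delta))
    return sorted(days)

ranges = [
    (date(2013, 3, 19), date(2013, 3, 21)),
    (date(2013, 3, 8), date(2013, 3, 8)),
]
dates = expand_ranges(ranges)
```

The set plays the role of SELECT DISTINCT: overlapping input ranges contribute each date only once.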

How to find the average time difference between rows in a table?

I have a mysql database that stores some timestamps. Let's assume that all there is in the table is the ID and the timestamp. The timestamps might be duplicated.
I want to find the average time difference between consecutive rows that are not duplicates (timewise). Is there a way to do it in SQL?
If your table is t, and your timestamp column is ts, and you want the answer in seconds:
SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) )
/
(COUNT(DISTINCT(ts)) -1)
FROM t
This will be miles quicker for large tables, as it has no n-squared JOIN.
This uses a cute mathematical trick which helps with this problem. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows - 1.
Proof: The average distance between consecutive rows is the sum of the distances between consecutive rows, divided by the number of consecutive pairs. But the sum of the differences between consecutive rows telescopes to the distance between the first row and the last row (assuming they are sorted by timestamp), and the number of consecutive pairs is the total number of rows - 1.
Then we just restrict the timestamps to be distinct, which is what COUNT(DISTINCT ts) does; duplicates do not affect MIN or MAX.
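The telescoping argument can be checked numerically. The sketch below compares the direct average of consecutive gaps with the (max - min) / (distinct count - 1) shortcut; the function names and the integer timestamps (think of them as seconds) are made up for the illustration.

```python
def mean_gap_direct(timestamps):
    """Average the gaps between consecutive distinct timestamps one by one."""
    distinct = sorted(set(timestamps))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    return sum(gaps) / len(gaps)

def mean_gap_telescoped(timestamps):
    """Same average via the shortcut: (last - first) / (distinct count - 1)."""
    distinct = set(timestamps)
    return (max(distinct) - min(distinct)) / (len(distinct) - 1)

# Hypothetical timestamps in seconds, with duplicates.
timestamps = [100, 100, 130, 190, 190, 250]
direct = mean_gap_direct(timestamps)
shortcut = mean_gap_telescoped(timestamps)
```

Both functions raise ZeroDivisionError when there are fewer than two distinct timestamps, which is the case where the average gap is undefined.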
Are the IDs contiguous?
You could do something like:
SELECT
    a.ID
  , b.ID
  , a.Timestamp
  , b.Timestamp
  , TIMESTAMPDIFF(SECOND, b.Timestamp, a.Timestamp) AS Difference
FROM
    MyTable a
    JOIN MyTable b
      ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp
That will give you a list of time differences for each consecutive row pair.
Then you could wrap that up in an AVG aggregate.
Here's one way:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on cur.id = prev.id + 1
and cur.datecol <> prev.datecol
The timestampdiff function allows you to choose between days, months, seconds, and so on.
If the IDs are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on prev.datecol < cur.datecol
and not exists (
    select *
    from table inbetween
    where prev.datecol < inbetween.datecol
    and inbetween.datecol < cur.datecol
)
Old post, but the easiest way is to use the LAG function (available in MySQL 8.0 and later) together with TIMESTAMPDIFF:
SELECT
    id,
    TIMESTAMPDIFF(MINUTE, PREVIOUS_TIMESTAMP, TIMESTAMP) AS TIME_DIFF_IN_MINUTES
FROM (
    SELECT
        id,
        TIMESTAMP,
        LAG(TIMESTAMP, 1) OVER (ORDER BY TIMESTAMP) AS PREVIOUS_TIMESTAMP
    FROM TABLE_NAME
) AS t
Adapted for SQL Server from this discussion.
Essential columns used are:
cmis_load_date: A date/time stamp associated with each record.
extract_file: The full path to a file from which the record was loaded.
Comments:
There can be many records in each file. Records have to be grouped by the files loaded on the extract_file column. Intervals of days may pass between one file and the next being loaded. There is no reliable sequential value in any column, so the grouped rows are sorted by the minimum load date in each file group, and the ROW_NUMBER() function then serves as an ad hoc sequential value.
SELECT
AVG(DATEDIFF(day, t2.MinCMISLoadDate, t1.MinCMISLoadDate)) as ElapsedAvg
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t1
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t2 on t2.RowNumber + 1 = t1.RowNumber

Row Number Tsql

I have the following query:
select rcp.CalendarPeriodId
,rc.CalendarId
,rcp.CalendarYearId
,rcp.PeriodNumber
,rcp.PeriodStartDate,rcp.PeriodEndDate
,CASE WHEN GETDATE() BETWEEN rcp.PeriodStartDate AND rcp.PeriodEndDate THEN 1 ELSE 0 END AS 'CurrentPeriod'
from RentCalendarPeriod rcp
LEFT JOIN RentCalendarYear rcy ON rcy.CalenderYearId = rcp.CalendarYearId
LEFT JOIN RentCalendar rc ON rc.CalendarId = rcy.CalendarId
I have two calendars in the RentCalendar table (CalendarId 1 = Weekly, CalendarId 2 = Monthly).
Each rent calendar has years (RentCalendarYear table), and each year in turn has a set of periods.
You will notice that on line 47 the final column has been marked as 1 (true). This is because it is the current period.
What I need to do is mark the previous 12 periods for any CalendarId. I was wondering if I could achieve this with ROW_NUMBER, so that the row where CurrentPeriod = 1 is numbered 1 and all earlier periods are numbered 2, 3, 4, 5 and so on.
I don't know how to do this though.
So something like this:
SELECT * FROM (
    select rcp.CalendarPeriodId,rc.CalendarId,rcp.CalendarYearId,rcp.PeriodNumber,rcp.PeriodStartDate,rcp.PeriodEndDate,
           ROW_NUMBER() OVER(ORDER BY rcp.PeriodStartDate DESC) AS CurrentPeriod
    from RentCalendarPeriod rcp
    LEFT JOIN RentCalendarYear rcy ON rcy.CalenderYearId = rcp.CalendarYearId
    LEFT JOIN RentCalendar rc ON rc.CalendarId = rcy.CalendarId
) AS t
WHERE CurrentPeriod <= 12
I'm not sure if I understood you correctly. This will give 1 for the latest period, 2 for the second latest, 3 for the third and so on in the CurrentPeriod column.
Something like this:
;WITH CTE AS (
    SELECT rcp.CalendarPeriodId, rc.CalendarId, rcp.CalendarYearId,
           rcp.PeriodNumber, rcp.PeriodStartDate, rcp.PeriodEndDate,
           ROW_NUMBER() OVER (ORDER BY rcp.CalendarPeriodId) AS rn,
           CASE
               WHEN GETDATE() BETWEEN rcp.PeriodStartDate AND
                                      rcp.PeriodEndDate THEN 1
               ELSE 0
           END AS CurrentPeriod
    FROM RentCalendarPeriod rcp
    LEFT JOIN RentCalendarYear rcy ON rcy.CalenderYearId = rcp.CalendarYearId
    LEFT JOIN RentCalendar rc ON rc.CalendarId = rcy.CalendarId
)
SELECT c.CalendarPeriodId, c.CalendarId, c.CalendarYearId,
       c.PeriodNumber, c.PeriodStartDate, c.PeriodEndDate,
       c.CurrentPeriod,
       (t.rn + 1) - c.rn AS rn
FROM CTE AS c
CROSS JOIN (SELECT rn FROM CTE WHERE CurrentPeriod = 1) AS t
WHERE c.rn BETWEEN t.rn - 11 AND t.rn
This will return 12 records, the one having CurrentPeriod = 1 and the previous 11 records. Field rn enumerates records starting from the one having CurrentPeriod = 1.
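The renumbering step, (t.rn + 1) - c.rn, can be illustrated with a short Python sketch. It assumes a single calendar whose periods are ordered by start date (the query orders by CalendarPeriodId instead); the function name and the weekly sample calendar are hypothetical.

```python
from datetime import date, timedelta

def number_back_from_current(periods, today, count=12):
    """Return (rn, period_id) pairs: the current period gets rn 1,
    the one before it rn 2, and so on for up to `count` periods.

    periods: list of (period_id, start_date, end_date) for ONE calendar.
    """
    ordered = sorted(periods, key=lambda p: p[1])
    # Index of the period containing today (raises StopIteration if none does).
    current = next(i for i, (_, start, end) in enumerate(ordered)
                   if start <= today <= end)
    first = max(0, current - (count - 1))
    # current - i + 1 plays the role of (t.rn + 1) - c.rn in the query.
    return [(current - i + 1, ordered[i][0])
            for i in range(current, first - 1, -1)]

# Hypothetical weekly calendar: 15 periods of 7 days starting 2024-01-01.
periods = [(k + 1,
            date(2024, 1, 1) + timedelta(days=7 * k),
            date(2024, 1, 1) + timedelta(days=7 * k + 6))
           for k in range(15)]
today = date(2024, 1, 1) + timedelta(days=7 * 13 + 2)  # falls in period 14
numbered = number_back_from_current(periods, today)
```

With both a weekly and a monthly calendar in the same table, the numbering would need to be done per CalendarId, for example by running this once per calendar.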