SQL gaps in dates - sql

I am trying to find gaps in a table based on a state code. The tables look like this.
StateTable:
StateID (PK) | Code
--------------------
1 | AK
2 | AL
3 | AR
StateModel Table:
StateModelID | StateID | EffectiveDate | ExpirationDate
-------------------------------------------------------------------------
1 | 1 | 2012-06-28 00:00:00.000| 2012-08-02 23:59:59.000
2 | 1 | 2012-08-03 00:00:00.000| 2050-12-31 23:59:59.000
3 | 1 | 2055-01-01 00:00:00.000| 2075-12-31 23:59:59.000
The query I am using is the following:
Declare @gapMessage varchar(250)
SET @gapMessage = ''
select
    @gapMessage = @gapMessage +
    (Select StateTable.Code FROM StateTable where t1.StateID = StateTable.StateID)
    + ' Row ' + CAST(t1.StateModelID as varchar(6)) + ' has a gap with ' +
    CAST(t2.StateModelID as varchar(6)) + CHAR(10)
from StateModel t1
inner join StateModel t2
    on t1.StateID = t2.StateID
    and DATEADD(ss, 1, t1.ExpirationDate) < t2.EffectiveDate
    and t1.EffectiveDate < t2.EffectiveDate
if (@gapMessage != '')
begin
    Print 'States with a gap problem'
    PRINT @gapMessage
end
else
begin
    PRINT 'No States with a gap problem'
end
But with the above table example I get the following output:
States with a gap problem
AK Row 1 has a gap with 3
AK Row 2 has a gap with 3
Is there any way to restructure my query so that the gap between 1 and 3 does not display, because there is not a gap between 1 and 2?
I am using MS SQL Server 2008.
Thanks

WITH
sequenced AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY StateID ORDER BY EffectiveDate) AS SequenceID,
        *
    FROM
        StateModel
)
SELECT
    *
FROM
    sequenced AS a
INNER JOIN
    sequenced AS b
        ON a.StateID = b.StateID
        AND a.SequenceID = b.SequenceID - 1
WHERE
    a.ExpirationDate < DATEADD(second, -1, b.EffectiveDate)
To make this as efficient as possible, also add an index on (StateID, EffectiveDate).
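The adjacent-row comparison above can be tried out with a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in for SQL Server, so DATEADD is swapped for SQLite's datetime() modifier. The helper name find_gaps is hypothetical, and the sample rows are the ones from the question with the milliseconds dropped.

```python
import sqlite3

# Sample rows from the question: (StateModelID, StateID, EffectiveDate, ExpirationDate).
# Dates are ISO-8601 text without milliseconds so that string comparison matches
# chronological order in SQLite.
sample_rows = [
    (1, 1, "2012-06-28 00:00:00", "2012-08-02 23:59:59"),
    (2, 1, "2012-08-03 00:00:00", "2050-12-31 23:59:59"),
    (3, 1, "2055-01-01 00:00:00", "2075-12-31 23:59:59"),
]


def find_gaps(rows):
    """Return (StateID, earlier StateModelID, later StateModelID) per gap."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE StateModel (StateModelID INTEGER, StateID INTEGER, "
        "EffectiveDate TEXT, ExpirationDate TEXT)"
    )
    con.executemany("INSERT INTO StateModel VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH sequenced AS (
            SELECT ROW_NUMBER() OVER (PARTITION BY StateID
                                      ORDER BY EffectiveDate) AS SequenceID,
                   *
            FROM StateModel
        )
        SELECT a.StateID, a.StateModelID, b.StateModelID
        FROM sequenced AS a
        INNER JOIN sequenced AS b
                ON a.StateID = b.StateID
               AND a.SequenceID = b.SequenceID - 1   -- only neighbouring rows
        WHERE a.ExpirationDate < datetime(b.EffectiveDate, '-1 second')
        ORDER BY a.StateID, a.SequenceID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


gaps = find_gaps(sample_rows)
```

Because each row is only compared with its immediate successor, row 1 is never paired with row 3, which is the behaviour the question asks for.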

I wanted to just give credit to MatBailie, but don't have the points to do it yet, so I thought I would help out anyone else looking for a similar solution that may want to take it a step further like I needed to. I have changed my application of his code (which involves member enrollment) to the same language as the example here.
In my case, I needed these things:
I have two similar tables that I need to develop into one total table. In this example, let's make the tables like this: SomeStates + OtherStates = UpdatedTable. These are UNIONED in the AS clause.
I didn't want to remove any rows due to gaps, but I wanted to flag them on the StateID level. This is added as an additional column 'StateID_GapFlag'.
I also wanted to add a column to hold the oldest or MIN(EffectiveDate). This would be used in later calculations of SUM(period) to get a total duration, excluding gaps. This is the column 'MIN_EffectiveDate'.
;WITH sequenced
    ( SequenceID
     ,StateID
     ,EffectiveDate
     ,ExpirationDate)
AS
(select
    ROW_NUMBER() OVER (PARTITION BY StateID ORDER by EffectiveDate) as SequenceID,
    StateModel.*
 from (select StateID, EffectiveDate, ExpirationDate from SomeStates
       UNION ALL
       select StateID, EffectiveDate, ExpirationDate from OtherStates
      ) StateModel
 where
    EffectiveDate > 'filter' -- placeholder: substitute your own date filter
)
Select DISTINCT
     IJ1.MIN_EffectiveDate
    ,coalesce(LJ2.StateID_GapFlag, '') as StateID_GapFlag
    ,seq.EffectiveDate
    ,seq.ExpirationDate
into UpdatedTable
from sequenced seq
inner join
    (select StateID, min(EffectiveDate) as MIN_EffectiveDate
     from sequenced
     group by StateID
    ) IJ1
    on seq.StateID = IJ1.StateID
left join
    (select a.StateID, 'GAP' as StateID_GapFlag
     from sequenced a
     inner join
     sequenced b
        on a.StateID = b.StateID
        and a.SequenceID = (b.SequenceID - 1)
     where a.ExpirationDate < DATEADD(day, -1, b.EffectiveDate)
    ) LJ2
    on seq.StateID = LJ2.StateID

You could use ROW_NUMBER to provide an ordering of the StateModel rows for each state, then check that the difference in seconds between consecutive rows doesn't exceed 1. Something like:
;WITH Models (StateModelID, StateID, Effective, Expiration, RowOrder) AS (
    SELECT StateModelID, StateID, EffectiveDate, ExpirationDate,
        ROW_NUMBER() OVER (PARTITION BY StateID ORDER BY EffectiveDate)
    FROM StateModel
)
SELECT F.StateModelId, S.StateModelId
FROM Models F
CROSS APPLY (
    SELECT M.StateModelId
    FROM Models M
    WHERE M.RowOrder = F.RowOrder + 1
    AND M.StateId = F.StateId
    AND DATEDIFF(SECOND, F.Expiration, M.Effective) > 1
) S
This will get you the state model IDs of the rows with gaps, which you can format how you wish.

Related

deleting specific duplicate and original entries in a table based on date

I have a table called "main" which has 4 columns: ID, name, DateID and Sign.
I want to create a query that will delete entries in this table if the same ID appears twice within a certain DateID range.
My where clause searches the previous 3 weeks:
where DateID =((SELECT MAX( DateID)
WHERE DateID < ( SELECT MAX( DateID )-3))
An example of the dataset I'm working with:
id    | name  | DateID | sign
------------------------------
12345 | Paul  | 1915   | Up
23658 | Danny | 1915   | Down
37868 | Jake  | 1916   | Up
37542 | Elle  | 1917   | Up
12345 | Paul  | 1917   | Down
87456 | John  | 1918   | Up
78563 | Luke  | 1919   | Up
23658 | Danny | 1920   | Up
In the case above, both entries for ID 12345 would need to be removed.
However, the entries for ID 23658 would need to be kept, as their DateID values are more than 3 apart.
How would this be possible?
You can use window functions for this.
It's not quite clear, but it seems LAG and conditional COUNT should fit what you need.
DELETE t
FROM (
    SELECT *,
        CountWithinDate = COUNT(CASE WHEN t.PrevDate >= t.DateId - 3 THEN 1 END) OVER (PARTITION BY t.id)
    FROM (
        SELECT *,
            PrevDate = LAG(t.DateID) OVER (PARTITION BY t.id ORDER BY t.DateID)
        FROM YourTable t
    ) t
) t
WHERE CountWithinDate > 0;
Note that you do not need to re-join the table, you can delete directly from the t derived table.
Hope this works:
DELETE FROM test_tbl
WHERE id IN (
SELECT T1.id
FROM test_tbl T1
WHERE EXISTS (SELECT 1 FROM test_tbl T2 WHERE T1.id = T2.id AND ABS(T2.dateid - T1.dateid) < 3 AND T1.dateid <> T2.dateid)
)
In case you need more logic for data processing, I would suggest using a stored procedure.
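The LAG-based idea from this thread can be checked against the question's sample data with a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) rather than SQL Server; SQLite cannot delete from a derived table, so the sketch deletes by id through an IN subquery instead. The table name main_table and the helper delete_close_repeats are hypothetical.

```python
import sqlite3

# Rows from the question: (id, name, DateID, sign).
sample_rows = [
    (12345, "Paul", 1915, "Up"),
    (23658, "Danny", 1915, "Down"),
    (37868, "Jake", 1916, "Up"),
    (37542, "Elle", 1917, "Up"),
    (12345, "Paul", 1917, "Down"),
    (87456, "John", 1918, "Up"),
    (78563, "Luke", 1919, "Up"),
    (23658, "Danny", 1920, "Up"),
]


def delete_close_repeats(rows, window=3):
    """Delete every row of any id that repeats within `window` DateIDs."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE main_table (id INTEGER, name TEXT, DateID INTEGER, sign TEXT)"
    )
    con.executemany("INSERT INTO main_table VALUES (?, ?, ?, ?)", rows)
    con.execute(
        """
        DELETE FROM main_table
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id, DateID,
                       LAG(DateID) OVER (PARTITION BY id ORDER BY DateID) AS PrevDate
                FROM main_table
            ) AS lagged
            WHERE PrevDate >= DateID - ?   -- previous entry is close enough
        )
        """,
        (window,),
    )
    remaining = con.execute(
        "SELECT id, DateID FROM main_table ORDER BY DateID, id"
    ).fetchall()
    con.close()
    return remaining


remaining = delete_close_repeats(sample_rows)
```

With the sample data, both rows for id 12345 are removed (1915 and 1917 are 2 apart), while both rows for id 23658 survive (1915 and 1920 are 5 apart).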

datediff for row that meets my condition only once per row

I want to do a datediff between 2 dates on different rows only if the rows have a condition.
my table looks like the following, with additional columns (like guid)
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | with this
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
With this example I would like to have 2 rows in my selection, which represent the difference between the dates from id 5-3 and from id 2-1.
So far I have come up with a query that gives me the difference between the dates from id 5-3, id 5-1 and id 2-1:
with t as (
SELECT TOP (100000)
*
FROM mydatatable
order by CreateDateAndTime desc)
select
DATEDIFF(SECOND, f.CreateDateAndTime, s.CreateDateAndTime) time
from t f
join t s on (f.[guid] = s.[guid] )
where f.condition like '%I need to compare this state%'
and s.condition like '%with this%'
and (f.id - s.id) < 0
My problem is I cannot set f.id - s.id to a value since other rows can be between the ones I want to make the diff on.
How can I make the datediff only on the first rows that meet my conditions?
EDIT: To make it clearer
My condition is an event name, and I want to calculate the time between the occurrence of my event 1 and my event 2 and fill a column named time, for example.
@Salman A's answer is really close to what I want, except that it will not work when my event 2 does not happen (which was not in my initial example),
i.e. in a table like the following, it will compute the datediff between row id 5 and row id 2:
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | state 3
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
the code I modified :
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id desc ) AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this ')
)
SELECT *
,DATEDIFF(second, currdate, prevdate) time
FROM cte
WHERE condition = 'I need to compare this state '
and DATEDIFF(second, currdate, prevdate) != 0
order by id desc
Perhaps you want to match ids with the nearest smaller id. You can use window functions for this:
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, CASE WHEN LAG(condition) OVER (PARTITION BY guid ORDER BY id) = 'with this'
THEN LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id) END AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT *
, DATEDIFF(second, currdate, prevdate)
FROM cte
WHERE condition = 'I need to compare this state'
The CASE expression will match an 'I need to compare this state' row with the 'with this' row next to it. If you have mismatching pairs then it'll return NULL.
Try using the analytic function lead():
with cte as
(
select 1 as id, '2018-12-11 12:07:55.273' as CreateDateAndTime,'with this' as condition union all
select 2,'2018-12-11 12:07:53.550','I need to compare this state' union all
select 3,'2018-12-11 12:07:53.550','with this' union all
select 4,'2018-12-11 12:06:40.780','state 3' union all
select 5,'2018-12-11 12:06:39.317','I need to compare this state'
) select *,
DATEDIFF(SECOND,CreateDateAndTime,lead(CreateDateAndTime) over(order by Id))
from cte
where condition in ('with this','I need to compare this state')
Ideally you would want LEADIF/LAGIF functions, because you are looking for the previous row where condition = 'with this'. Since there are no LEADIF/LAGIF functions, I think the best option is to use OUTER/CROSS APPLY with TOP 1, e.g.
CREATE TABLE #T (Id INT, CreateDateAndTime DATETIME, condition VARCHAR(28));
INSERT INTO #T (Id, CreateDateAndTime, condition)
VALUES
(1, '2018-12-11 12:07:55', 'with this'),
(2, '2018-12-11 12:07:53', 'I need to compare this state'),
(3, '2018-12-11 12:07:53', 'with this'),
(4, '2018-12-11 12:06:40', 'state 3'),
(5, '2018-12-11 12:06:39', 'I need to compare this state');
SELECT ID1 = t1.ID,
Date1 = t1.CreateDateAndTime,
ID2 = t2.ID,
Date2 = t2.CreateDateAndTime,
Difference = DATEDIFF(SECOND, t1.CreateDateAndTime, t2.CreateDateAndTime)
FROM #T AS t1
CROSS APPLY
( SELECT TOP 1 t2.CreateDateAndTime, t2.ID
FROM #T AS t2
WHERE t2.Condition = 'with this'
AND t2.CreateDateAndTime > t1.CreateDateAndTime
--AND t2.GUID = t1.GUID
ORDER BY CreateDateAndTime
) AS t2
WHERE t1.Condition = 'I need to compare this state';
Which gives:
ID1 Date1 ID2 Date2 Difference
-------------------------------------------------------------------------------
2 2018-12-11 12:07:53.000 1 2018-12-11 12:07:55.000 2
5 2018-12-11 12:06:39.000 3 2018-12-11 12:07:53.000 74
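The "nearest matching row" lookup above can be reproduced in a minimal sketch. It assumes Python's built-in sqlite3 module instead of SQL Server; SQLite has no CROSS APPLY, so a correlated subquery with ORDER BY ... LIMIT 1 plays the role of TOP 1, and strftime('%s', ...) stands in for DATEDIFF(SECOND, ...). The helper name pair_with_next_event is hypothetical.

```python
import sqlite3

# Same rows as the #T sample above: (Id, CreateDateAndTime, condition).
events = [
    (1, "2018-12-11 12:07:55", "with this"),
    (2, "2018-12-11 12:07:53", "I need to compare this state"),
    (3, "2018-12-11 12:07:53", "with this"),
    (4, "2018-12-11 12:06:40", "state 3"),
    (5, "2018-12-11 12:06:39", "I need to compare this state"),
]


def pair_with_next_event(rows):
    """Return (ID1, ID2, seconds) pairing each start row with the next 'with this'."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE T (Id INTEGER, CreateDateAndTime TEXT, condition TEXT)")
    con.executemany("INSERT INTO T VALUES (?, ?, ?)", rows)
    query = """
        SELECT t1.Id,
               t2.Id,
               strftime('%s', t2.CreateDateAndTime)
                 - strftime('%s', t1.CreateDateAndTime) AS Difference
        FROM T AS t1
        JOIN T AS t2
          ON t2.Id = (SELECT t3.Id
                      FROM T AS t3
                      WHERE t3.condition = 'with this'
                        AND t3.CreateDateAndTime > t1.CreateDateAndTime
                      ORDER BY t3.CreateDateAndTime
                      LIMIT 1)            -- nearest later 'with this' row
        WHERE t1.condition = 'I need to compare this state'
        ORDER BY t1.Id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


pairs = pair_with_next_event(events)
```

Like the CROSS APPLY version, this pairs every start row with the next 'with this' row regardless of what lies between them, so it does not by itself handle the case in the question's edit where the second event never occurs.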
I would enumerate the values and then use window functions for the difference.
select min(id), max(id),
datediff(second, min(CreateDateAndTime), max(CreateDateAndTime)) as seconds
from (select t.*,
row_number() over (partition by condition order by CreateDateAndTime) as seqnum
from t
where condition in ('I need to compare this state', 'with this')
) t
group by seqnum;
I cannot tell what you want the results to look like. This version only outputs the differences, with the ids of the rows you care about. The difference can also be applied to the original rows, rather than put into summary rows.

How to get the first record from a group of records MS sql

I have several business unit ids with the dates on which records were added, and I want to get only the row for the last record added per business unit.
2018-06-26 22:54:51.190 1
2018-07-05 10:36:49.563 1
2018-07-16 10:14:04.093 1
2018-07-17 15:24:22.173 1
2018-07-19 10:40:24.700 1
2018-07-23 09:53:34.607 13
2018-07-23 09:53:57.107 13
2018-09-04 14:55:04.860 4
2018-09-04 14:56:34.147 4
should be
2018-07-19 10:40:24.700 1
2018-07-23 09:53:57.107 13
2018-09-04 14:56:34.147 4
I tried this
select
mnt.DateAdded,
bu.BuId
from BusinessUnit bu
inner join MagicNumbersTable mnt on mnt.BuId = bu.BuId
where bu.BuId in (select top 1 b.BuId
from BusinessUnit b inner join MagicNumbersTable m on b.BuId = m.BuId
group by b.BuId, m.DateAdded
order by m.DateAdded desc)
which only returns one record, not the first record per business unit when ordered by DateAdded desc.
What would be the correct way of achieving this?
You can use the row_number() function with the TOP (1) WITH TIES clause:
select top (1) with ties mnt.DateAdded, bu.BuId
from BusinessUnit bu inner join
MagicNumbersTable mnt
on mnt.BuId = bu.BuId
order by row_number() over (partition by bu.BuId order by mnt.DateAdded desc);
However, you can simply use a GROUP BY clause with MAX(). The query above assumes you want some more columns from the latest row; if not, GROUP BY is enough and there is no need for a subquery or row_number().
This is what I wrote from my understanding. What do you think about it?
create table #maxDate (date datetime, id int)
insert #maxDate values
( '2018-06-26 22:54:51.190', 1)
, ( '2018-07-05 10:36:49.563', 1)
, ( '2018-07-16 10:14:04.093', 1)
, ( '2018-07-17 15:24:22.173', 1)
, ( '2018-07-19 10:40:24.700', 1)
, ( '2018-07-23 09:53:34.607', 13)
, ( '2018-07-23 09:53:57.107', 13)
, ( '2018-09-04 14:55:04.860', 4)
, ( '2018-09-04 14:56:34.147', 4)
select id, max(date) date from #maxDate
group by id
id date
-------------------------------
1 2018-07-19 10:40:24.700
4 2018-09-04 14:56:34.147
13 2018-07-23 09:53:57.107
Let me know if the query needs updates.
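When more columns than the date are needed from the latest row, the row_number() idea mentioned in this thread can be sketched with a derived table and a filter on rank 1. This is a minimal sketch assuming Python's built-in sqlite3 module (SQLite 3.25 or later) in place of SQL Server; the single flattened table MagicNumbers and the helper latest_per_unit are hypothetical simplifications of the question's two joined tables.

```python
import sqlite3

# (DateAdded, BuId) rows from the question.
sample_rows = [
    ("2018-06-26 22:54:51.190", 1),
    ("2018-07-05 10:36:49.563", 1),
    ("2018-07-16 10:14:04.093", 1),
    ("2018-07-17 15:24:22.173", 1),
    ("2018-07-19 10:40:24.700", 1),
    ("2018-07-23 09:53:34.607", 13),
    ("2018-07-23 09:53:57.107", 13),
    ("2018-09-04 14:55:04.860", 4),
    ("2018-09-04 14:56:34.147", 4),
]


def latest_per_unit(rows):
    """Return (BuId, DateAdded) for the most recent row of each business unit."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE MagicNumbers (DateAdded TEXT, BuId INTEGER)")
    con.executemany("INSERT INTO MagicNumbers VALUES (?, ?)", rows)
    query = """
        SELECT BuId, DateAdded
        FROM (
            SELECT BuId, DateAdded,
                   ROW_NUMBER() OVER (PARTITION BY BuId
                                      ORDER BY DateAdded DESC) AS rn
            FROM MagicNumbers
        ) AS ranked
        WHERE rn = 1          -- keep only the newest row per BuId
        ORDER BY BuId
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


latest = latest_per_unit(sample_rows)
```

Any extra columns added to the inner SELECT travel along with the winning row, which is the main reason to prefer this form over GROUP BY with MAX().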

How to select the first row for every sub group using a custom order?

Having a Table Person
and a Table PersonRecord
I need to select only one record for each person, the record with the max status.
The statuses are ordered C > B > A. A person can have multiple records with different or identical statuses; I always need to select the greatest status, or the first record if the person has several records with the same status.
I make the following query to get the rows ordered
select ep.personid, ep.persondesc, records.veryimportantcode, records.status
from extperson ep
left join
(
select rownum as rn, v.* from
(
select pr.personid, pr.veryimportantcode, pr.status
from personrecord pr
group by pr.personid, pr.veryimportantcode, pr.status
order by pr.personid,
decode(pr.status,
'C', 1,'B', 2,'A', 3,
4)
) v
) records
on ep.personid = records.personid
it give me:
I need
|PERSONID |PERSONDESC|VERYIMPORTANTCODE |STATUS |
|00325465 |Bjork |(null) |(null) |
|00527513 |Paul |ZP-2143540 |A |
|00542369 |Hazard |ZH-7531594 |C |
|0324567 |Jhon |ZJ-2346570 |B |
I tried to achieve this using an additional materialized subquery where I count the number of repetitions and make a left join with a where (subquerymat.nrorepeat > 1 and rownum = 1) or (subquerymat.nrorepeat = 1 or subquerymat.nrorepeat is null), but it does not work.
There is one very important rule for this query: I will append it to the right side of a union inside a view, so I can't use stored procedures.
Try:
select personid, persondesc, veryimportantcode, status
from (select pe.personid,
pe.persondesc,
pr.veryimportantcode,
pr.status,
row_number() over(partition by pe.personid order by pr.status desc,
pr.autoid) as rn
from person pe
left join personrecord pr
on pe.personid = pr.personid)
where rn = 1
Fiddle test: http://sqlfiddle.com/#!4/25074/2/0
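The answer above orders by pr.status desc, which works here only because C, B and A happen to sort alphabetically in the wanted order. A minimal sketch with an explicit CASE ranking (the portable equivalent of the question's DECODE) is shown below. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) rather than Oracle. The autoid values and the two extra records marked in the comments are invented for illustration, and pick_top_record is a hypothetical helper name.

```python
import sqlite3

people = [
    ("00325465", "Bjork"),
    ("00527513", "Paul"),
    ("00542369", "Hazard"),
    ("0324567", "Jhon"),
]

# (autoid, personid, veryimportantcode, status); autoid values are assumed.
records = [
    (1, "00527513", "ZP-2143540", "A"),
    (2, "00542369", "ZH-1111111", "A"),  # invented lower-status record
    (3, "00542369", "ZH-7531594", "C"),
    (4, "0324567", "ZJ-2346570", "B"),
    (5, "0324567", "ZJ-9999999", "B"),   # invented tie on status B
]


def pick_top_record(people_rows, record_rows):
    """Return one row per person: highest status, ties broken by autoid."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE person (personid TEXT, persondesc TEXT)")
    con.execute(
        "CREATE TABLE personrecord (autoid INTEGER, personid TEXT, "
        "veryimportantcode TEXT, status TEXT)"
    )
    con.executemany("INSERT INTO person VALUES (?, ?)", people_rows)
    con.executemany("INSERT INTO personrecord VALUES (?, ?, ?, ?)", record_rows)
    query = """
        SELECT personid, persondesc, veryimportantcode, status
        FROM (
            SELECT pe.personid, pe.persondesc, pr.veryimportantcode, pr.status,
                   ROW_NUMBER() OVER (
                       PARTITION BY pe.personid
                       ORDER BY CASE pr.status WHEN 'C' THEN 1
                                               WHEN 'B' THEN 2
                                               WHEN 'A' THEN 3
                                               ELSE 4 END,
                                pr.autoid
                   ) AS rn
            FROM person pe
            LEFT JOIN personrecord pr ON pe.personid = pr.personid
        ) AS ranked
        WHERE rn = 1
        ORDER BY personid
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


top_records = pick_top_record(people, records)
```

The LEFT JOIN keeps people without any record (Bjork), who come back with NULL code and status, matching the desired output in the question.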

SQL to answer: which customers were active in a given month, based on activate/deactivate records

Given a table
custid | date | action
1 | 2011-04-01 | activate
1 | 2011-04-10 | deactivate
1 | 2011-05-02 | activate
2 | 2011-04-01 | activate
3 | 2011-03-01 | activate
3 | 2011-04-01 | deactivate
The database is PostgreSQL.
I want an SQL query to show customers that were active at any stage during May.
So, in the above, that would be 1 and 2.
I just can't get my head around the way to approach this. Any pointers?
update
Customer 2 was active during May, as he was activated Before May, and not Deactivated since he was Activated. As in, I'm alive this Month, but wasn't born this month, and I've not died.
select distinct custid
from MyTable
where action = 'active' and date >= '20110501' and date < '20110601'
This approach won't work, as it only shows activations during may, not 'actives'.
Note: This is a starting point and only works for 2011.
Ignoring any lingering bugs, this code looks at two things for each customer: 1) the customer's latest status update before May, and 2) whether the customer became active during May.
SELECT
    Distinct MyTable.CustID
FROM
    MyTable -- Start with the Main table
-- So, was this customer active at the start of May?
LEFT JOIN -- Find this customer's latest entry before May of This Year
    (select
        CustID,
        max(Date) as Date
     from
        MyTable
     where
        Date < '2011-05-01'
     group by
        CustID) as CustMaxDate_PreMay on CustMaxDate_PreMay.CustID = MyTable.CustID
-- Return a record "1" here if the Customer was Active on this Date
LEFT JOIN
    (select
        1 as Bool,
        CustID,
        Date
     from
        MyTable
     where
        action = 'activate'
    ) as CustPreMay_Activated on CustPreMay_Activated.Date = CustMaxDate_PreMay.Date and CustPreMay_Activated.CustID = MyTable.CustID
-- Fallback plan: If the user wasn't already active at the start of May, did they turn active during May? If so, return a record here "1"
LEFT JOIN
    (select
        1 as Bool,
        CustID
     from
        MyTable
     where
        Date >= '2011-05-01' and Date < '2011-06-01' and action = 'activate') as TurnedActiveInMay on TurnedActiveInMay.CustID = MyTable.CustID
-- The Magic: If CustPreMay_Activated is Null, then they were not active before May
-- If TurnedActiveInMay is also Null, they did not turn active in May either
WHERE
    COALESCE(CustPreMay_Activated.Bool, TurnedActiveInMay.Bool, 0) = 1
Note:
You might need to replace the `FROM MyTable` with
From (Select distinct CustID from MyTable) as Customers
It is unclear to me, just from looking at this code, whether it will A) be too slow or B) somehow cause dupes or problems due to starting the FROM clause at MyTable, which may contain many records per customer. The DISTINCT clause probably takes care of this, but I figured I'd mention this workaround.
Finally, I'll leave it to you to make this work across different years.
Try this
select t2.custid from
(
    -- select the most recent entry for each customer
    select custid, date, action
    from cust_table t1
    where date = (select max(date)
                  from cust_table where custid = t1.custid)
) as t2
where t2.date < '2011-06-01'
-- where the most recent entry is in May or is an activate entry
-- assumes they have to have an activate entry before they get a deactivate entry
and (t2.date > '2011-05-01' or t2.action = 'activate')
In PostgreSQL 8.4+:
WITH ActivateDates AS (
SELECT
custid,
date,
ROW_NUMBER() OVER (PARTITION BY custid ORDER BY date) AS rownum
FROM atable
WHERE action = 'activate'
),
DeactivateDates AS (
SELECT
custid,
date,
ROW_NUMBER() OVER (PARTITION BY custid ORDER BY date) AS rownum
FROM atable
WHERE action = 'deactivate'
),
ActiveRanges AS (
SELECT
a.custid,
a.date AS activated,
COALESCE(d.date, '21000101'::date) AS deactivated
FROM ActivateDates a
LEFT JOIN DeactivateDates d ON a.custid = d.custid AND a.rownum = d.rownum
)
SELECT DISTINCT custid
FROM ActiveRanges
WHERE deactivated > '20110501'
AND activated < '20110601'
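The range-pairing query above can be run against the question's sample data with a minimal sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) instead of PostgreSQL, so dates are stored as ISO-8601 text and the '21000101'::date cast becomes a plain string. The helper name active_customers and its month-boundary parameters are hypothetical.

```python
import sqlite3

# Rows from the question: (custid, date, action).
sample_rows = [
    (1, "2011-04-01", "activate"),
    (1, "2011-04-10", "deactivate"),
    (1, "2011-05-02", "activate"),
    (2, "2011-04-01", "activate"),
    (3, "2011-03-01", "activate"),
    (3, "2011-04-01", "deactivate"),
]


def active_customers(rows, month_start, next_month_start):
    """Return custids whose active range overlaps [month_start, next_month_start)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE atable (custid INTEGER, date TEXT, action TEXT)")
    con.executemany("INSERT INTO atable VALUES (?, ?, ?)", rows)
    query = """
        WITH ActivateDates AS (
            SELECT custid, date,
                   ROW_NUMBER() OVER (PARTITION BY custid ORDER BY date) AS rownum
            FROM atable
            WHERE action = 'activate'
        ),
        DeactivateDates AS (
            SELECT custid, date,
                   ROW_NUMBER() OVER (PARTITION BY custid ORDER BY date) AS rownum
            FROM atable
            WHERE action = 'deactivate'
        ),
        ActiveRanges AS (
            -- pair the n-th activation with the n-th deactivation, if any
            SELECT a.custid,
                   a.date AS activated,
                   COALESCE(d.date, '2100-01-01') AS deactivated
            FROM ActivateDates a
            LEFT JOIN DeactivateDates d
                   ON a.custid = d.custid AND a.rownum = d.rownum
        )
        SELECT DISTINCT custid
        FROM ActiveRanges
        WHERE deactivated > ?
          AND activated < ?
        ORDER BY custid
    """
    result = [row[0] for row in con.execute(query, (month_start, next_month_start))]
    con.close()
    return result


active_in_may = active_customers(sample_rows, "2011-05-01", "2011-06-01")
```

The pairing by row number relies on the assumption, also made elsewhere in this thread, that every deactivation is preceded by a matching activation for the same customer.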