Deleting records in SQL depending on the next record

I have records with columns ID, Time_End and Attribute.
I need to delete all records
WHERE Time_End = '1990-01-01 00:00:00.000' AND Attribute <> '9'
but only if:
the next row does not have the same attribute number,
or
the next row has the same attribute number and a Time_End value of 1990-01-01 00:00:00.000.
For example:
ID Time_End Attribute
---------------------------------------------
235 1990-01-01 00:00:00.000 5 /delete
236 1990-01-01 00:00:00.000 5 /delete
237 1990-01-01 00:00:00.000 5
238 2016-10-10 23:45:40.000 5
ID Time_End Attribute
---------------------------------------------
312 1990-01-01 00:00:00.000 8 /delete
313 2016-01-09 18:00:00.000 6
314 1990-01-01 00:00:00.000 4 /delete
315 1990-01-01 00:00:00.000 7
316 2016-10-10 23:45:40.000 7
Our customer has 50 database tables with thousands of records in every table (and of course more columns; I mentioned only those that have an impact on the solution). Records are sent to the database from a PLC, but sometimes (we don't know why) the PLC also sends wrong records.
So what I need is a query which finds those wrong records and deletes them. :)
Does anybody know what the SQL code should look like?

Please see my SQL below. First, we collect the ids to delete, using two window functions (LEAD) to get the data we need from the next row. Then, with all needed data computed, we apply the evaluation rules proposed by the OP. Last, we use the obtained ids to delete the affected records of the table by id with an IN clause.
WITH dataSet
AS (SELECT toDeleteTable.id,
           toDeleteTable.time_end,
           toDeleteTable.attribute,
           -- values from the next row (by id); the defaults are used for the last row
           LEAD(toDeleteTable.time_end,1,0) OVER (ORDER BY toDeleteTable.id) AS next_time_end,
           LEAD(toDeleteTable.attribute,1,0) OVER (ORDER BY toDeleteTable.id) AS next_attribute
    FROM toDeleteTable)
DELETE toDeleteTable
WHERE toDeleteTable.id IN (SELECT dataSet.id
                           FROM dataSet
                           WHERE dataSet.time_end = '1990-01-01 00:00:00.000'
                             AND dataSet.attribute <> '9'
                             AND ( (dataSet.next_attribute = dataSet.attribute AND dataSet.next_time_end = '1990-01-01 00:00:00.000')
                                   OR dataSet.next_attribute <> dataSet.attribute)
                          );
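To sanity-check the LEAD-based rules against the sample data, here is a small sketch that runs the same logic in SQLite through Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions); the table name readings and the lowercase column names are illustrative stand-ins for the real tables, and since SQLite's dialect differs from SQL Server's, the CTE is written as a derived table here.

```python
import sqlite3

# Sample rows from the question: (id, time_end, attribute)
ROWS = [
    (235, "1990-01-01 00:00:00.000", 5),
    (236, "1990-01-01 00:00:00.000", 5),
    (237, "1990-01-01 00:00:00.000", 5),
    (238, "2016-10-10 23:45:40.000", 5),
    (312, "1990-01-01 00:00:00.000", 8),
    (313, "2016-01-09 18:00:00.000", 6),
    (314, "1990-01-01 00:00:00.000", 4),
    (315, "1990-01-01 00:00:00.000", 7),
    (316, "2016-10-10 23:45:40.000", 7),
]

# Hypothetical table "readings"; LEAD pulls the next row's values ordered by id.
DELETE_SQL = """
DELETE FROM readings
WHERE id IN (
    SELECT id
    FROM (SELECT id, time_end, attribute,
                 LEAD(time_end, 1, '') OVER (ORDER BY id) AS next_time_end,
                 LEAD(attribute, 1, 0) OVER (ORDER BY id) AS next_attribute
          FROM readings) AS d
    WHERE time_end = '1990-01-01 00:00:00.000'
      AND attribute <> 9
      AND (next_attribute <> attribute
           OR (next_attribute = attribute
               AND next_time_end = '1990-01-01 00:00:00.000'))
)
"""


def delete_wrong_records(rows):
    """Load rows into an in-memory table, run the delete, return the surviving ids."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, time_end TEXT, attribute INTEGER)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    con.execute(DELETE_SQL)
    survivors = [row[0] for row in con.execute("SELECT id FROM readings ORDER BY id")]
    con.close()
    return survivors


print(delete_wrong_records(ROWS))  # [237, 238, 313, 315, 316]
```

The surviving ids are exactly the rows the question marks as keepers; 235, 236, 312 and 314 are removed.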

You can accomplish this with a simple APPLY join. The code below should give you enough to make this work for your needs without doing anything complex:
declare @t table(ID int
                ,Time_End datetime
                ,Attribute int
                );
insert into @t values(235,'1990-01-01 00:00:00.000',5),(236,'1990-01-01 00:00:00.000',5),(237,'1990-01-01 00:00:00.000',5)
                    ,(238,'2016-10-10 23:45:40.000',5),(312,'1990-01-01 00:00:00.000',8),(313,'2016-01-09 18:00:00.000',6)
                    ,(314,'1990-01-01 00:00:00.000',4),(315,'1990-01-01 00:00:00.000',7),(316,'2016-10-10 23:45:40.000',7);
-- For each row, fetch the next row by ID (if any) and apply the deletion rules
select t.*
      ,tm.*
from @t t
    outer apply (select top 1 tt.Time_End
                             ,tt.Attribute
                 from @t tt
                 where t.ID < tt.ID
                 order by tt.ID
                ) tm
where t.Time_End = '1990-01-01 00:00:00.000'
  and t.Attribute <> 9
  and (t.Attribute <> tm.Attribute
       or (t.Attribute = tm.Attribute
           and tm.Time_End = '1990-01-01 00:00:00.000'
          )
      );
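Where window functions are not available, the OUTER APPLY ... TOP 1 lookup can be imitated with a correlated subquery that finds the next ID. The sketch below swaps in that technique and runs it in SQLite (which has no APPLY) through Python's sqlite3 module; the table name readings is hypothetical, and the query only selects the IDs that would be deleted.

```python
import sqlite3

# Sample rows from the question: (id, time_end, attribute)
ROWS = [
    (235, "1990-01-01 00:00:00.000", 5),
    (236, "1990-01-01 00:00:00.000", 5),
    (237, "1990-01-01 00:00:00.000", 5),
    (238, "2016-10-10 23:45:40.000", 5),
    (312, "1990-01-01 00:00:00.000", 8),
    (313, "2016-01-09 18:00:00.000", 6),
    (314, "1990-01-01 00:00:00.000", 4),
    (315, "1990-01-01 00:00:00.000", 7),
    (316, "2016-10-10 23:45:40.000", 7),
]

# The correlated subquery finds the next id, playing the role of
# OUTER APPLY (SELECT TOP 1 ... ORDER BY ID). For the last row tm is NULL,
# so the WHERE conditions are not true and that row is not selected.
SELECT_SQL = """
SELECT t.id
FROM readings AS t
LEFT JOIN readings AS tm
       ON tm.id = (SELECT MIN(tt.id) FROM readings AS tt WHERE tt.id > t.id)
WHERE t.time_end = '1990-01-01 00:00:00.000'
  AND t.attribute <> 9
  AND (t.attribute <> tm.attribute
       OR (t.attribute = tm.attribute
           AND tm.time_end = '1990-01-01 00:00:00.000'))
ORDER BY t.id
"""


def ids_to_delete(rows):
    """Return the ids that the rules mark for deletion (nothing is deleted)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, time_end TEXT, attribute INTEGER)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    ids = [row[0] for row in con.execute(SELECT_SQL)]
    con.close()
    return ids


print(ids_to_delete(ROWS))  # [235, 236, 312, 314]
```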

I think you can use ROW_NUMBER() like this:
;WITH t AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Time_End ORDER BY ID DESC) AS seq
FROM yourTable
WHERE Attribute <> '9'
AND Time_End = CAST('1990-01-01 00:00:00.000' as datetime)
)
DELETE FROM t
WHERE seq > 1;
Not Tested - HTH ;).

Related

Combine multiple rows of data into one (Start and End time)

I'm currently receiving notifications of when a device is switched on, and another when the device is switched off. These currently show up in separate rows; however, I'd like to combine the on/off records of each instance into one row.
The data comes in as below:
ObjectID On/OffID Msgtime
100 1 2022-04-15 10:01:00
1472 1 2022-04-15 10:04:00
100 0 2022-04-15 11:35:00
100 1 2022-04-15 12:00:00
1472 0 2022-04-15 15:00:00
I'd like to have it showing as below:
ObjectID OnTime OffTime
100 2022-04-15 10:01:00 2022-04-15 11:35:00
1472 2022-04-15 10:04:00 2022-04-15 15:00:00
100 2022-04-15 12:00:00 -
Maybe a GROUP BY query on a row_number column, like the one below (see the fiddle link):
select objectID,
min(msgTime) as OnTime,
case
when min(MsgTime) <>max(MsgTime)
then max(MsgTime) else NULL
end as OffTime
from
(
select *,
row_number() over (partition by ObjectID order by MsgTime asc)+1 as r
from T
)T
group by objectID, r/2
order by Objectid, r/2
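One way to see the pairing trick at work is to run it in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed for ROW_NUMBER). Adding 1 to the row number makes each ON row and the OFF row after it share the same integer value of r / 2. The table name events and the column name OnOffID are assumptions, and the trick relies on each object's rows strictly alternating ON, OFF, ON, ... starting with ON.

```python
import sqlite3

# Sample notifications from the question: (ObjectID, OnOffID, MsgTime)
EVENTS = [
    (100, 1, "2022-04-15 10:01:00"),
    (1472, 1, "2022-04-15 10:04:00"),
    (100, 0, "2022-04-15 11:35:00"),
    (100, 1, "2022-04-15 12:00:00"),
    (1472, 0, "2022-04-15 15:00:00"),
]

PAIR_SQL = """
SELECT ObjectID,
       MIN(MsgTime) AS OnTime,
       CASE WHEN MIN(MsgTime) <> MAX(MsgTime) THEN MAX(MsgTime) END AS OffTime
FROM (SELECT ObjectID, MsgTime,
             -- rows 1,2 give r/2 = 1; rows 3,4 give r/2 = 2; and so on
             ROW_NUMBER() OVER (PARTITION BY ObjectID ORDER BY MsgTime) + 1 AS r
      FROM events) AS T
GROUP BY ObjectID, r / 2
ORDER BY ObjectID, r / 2
"""


def pair_on_off(events):
    """Return (ObjectID, OnTime, OffTime) tuples; OffTime is None if still on."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (ObjectID INTEGER, OnOffID INTEGER, MsgTime TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    result = con.execute(PAIR_SQL).fetchall()
    con.close()
    return result


for row in pair_on_off(EVENTS):
    print(row)
```

Integer division of two integers truncates in SQLite, just as in SQL Server, which is what makes r / 2 a usable pair key.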
This query returns all 'going to on state' rows, and for each one it finds the nearest 'going to off state' row, if one exists (LEFT JOIN):
select
ontm.ObjectID, ontm.MsgTime as OnTime, offtm.MsgTime as OffTime
from yourtable ontm
left join
yourtable offtm
on ontm.ObjectId=offtm.ObjectId
and offtm.onoffid = 0
and ontm.MsgTime <= offtm.MsgTime
and not exists (select 1
from yourtable mdle
where mdle.ObjectId=offtm.ObjectId
and mdle.MsgTime < offtm.MsgTime
and ontm.MsgTime < mdle.MsgTime
)
where ontm.onoffid = 1
Explanation:
We first select all 'going to on' rows; these are the ones we want a result row for. We then find all 'matching', i.e. future, 'going to off' records for the same ObjectID (we use a LEFT JOIN so that if the object was left ON, we still show it). This would match every future 'going to off' row for the object, so we need something to make sure that only the earliest one matches our ON row. We do this with NOT EXISTS: a candidate OFF row is rejected if any other row for the same object lies between the ON row and that OFF row.
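The LEFT JOIN / NOT EXISTS query can also be tried on the sample data in SQLite via Python's sqlite3 module. This is a sketch: the table name events and the column name OnOffID are assumptions, and SQLite syntax stands in for SQL Server's.

```python
import sqlite3

# Sample notifications from the question: (ObjectID, OnOffID, MsgTime)
EVENTS = [
    (100, 1, "2022-04-15 10:01:00"),
    (1472, 1, "2022-04-15 10:04:00"),
    (100, 0, "2022-04-15 11:35:00"),
    (100, 1, "2022-04-15 12:00:00"),
    (1472, 0, "2022-04-15 15:00:00"),
]

ON_OFF_SQL = """
SELECT ontm.ObjectID, ontm.MsgTime AS OnTime, offtm.MsgTime AS OffTime
FROM events AS ontm
LEFT JOIN events AS offtm                 -- keeps ON rows that were never switched off
       ON ontm.ObjectID = offtm.ObjectID
      AND offtm.OnOffID = 0
      AND ontm.MsgTime <= offtm.MsgTime
      AND NOT EXISTS (SELECT 1             -- no other row between this ON and this OFF
                      FROM events AS mdle
                      WHERE mdle.ObjectID = offtm.ObjectID
                        AND mdle.MsgTime < offtm.MsgTime
                        AND ontm.MsgTime < mdle.MsgTime)
WHERE ontm.OnOffID = 1
ORDER BY ontm.ObjectID, ontm.MsgTime
"""


def on_off_ranges(events):
    """Return (ObjectID, OnTime, OffTime) tuples; OffTime is None if still on."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (ObjectID INTEGER, OnOffID INTEGER, MsgTime TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    result = con.execute(ON_OFF_SQL).fetchall()
    con.close()
    return result


for row in on_off_ranges(EVENTS):
    print(row)
```

Unlike the row_number pairing, this version does not depend on rows strictly alternating; an ON row simply gets no OffTime if another row intervenes before the next OFF.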

Select most recent InstanceID base on max end date

I am trying to pull the MemberInstanceID from a table based on the max DateEnd. If DateEnd is NULL, I want to pull that row, as it would still be ongoing. I am using SQL Server.
select memberinstanceid
from table
group by memberid
having MAX(ISNULL(date_end, '2099-12-31'))
The query above doesn't work for me. I have tried different ones and have gotten them to return the separate instances, but not just the one with the max date.
Below is what my table looks like.
MemberID MemberInstanceID DateStart DateEnd
2 abc12 2013-01-01 2013-12-31
4 abc21 2010-01-01 2013-12-31
2 abc10 2015-01-01 NULL
4 abc19 2014-01-01 2014-10-31
I would expect my results to look like this:
MemberInstanceID
abc10
abc19
I have been trying to figure out how to do this but have not had much luck. Any help would be much appreciated. Thanks
I think you need something like the following:
select MemberID, MemberInstanceID
from table t
where (
-- DateEnd is null...
DateEnd is null
or (
-- ...or pick the latest DateEnd for this member...
DateEnd = (
select max(DateEnd)
from table
where MemberID = t.MemberID
)
-- ... and check there's not a NULL entry for DateEnd for this member
and not exists (
select 1
from table
where MemberID = t.MemberID
and DateEnd is null
)
)
)
The problem with this approach arises if multiple rows match for a member, i.e. multiple NULL rows with the same MemberID, or multiple rows with the same DateEnd for the same MemberID.
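Here is a small sketch that runs this NULL-aware correlated-subquery approach on the sample rows in SQLite via Python's sqlite3 module. The table name members is an assumption, since the question's table is literally called table, which is a reserved word.

```python
import sqlite3

# Sample rows from the question: (MemberID, MemberInstanceID, DateStart, DateEnd)
MEMBERS = [
    (2, "abc12", "2013-01-01", "2013-12-31"),
    (4, "abc21", "2010-01-01", "2013-12-31"),
    (2, "abc10", "2015-01-01", None),
    (4, "abc19", "2014-01-01", "2014-10-31"),
]

LATEST_SQL = """
SELECT t.MemberID, t.MemberInstanceID
FROM members AS t
WHERE t.DateEnd IS NULL                        -- an open-ended row wins outright
   OR (t.DateEnd = (SELECT MAX(DateEnd)        -- otherwise the latest end date...
                    FROM members
                    WHERE MemberID = t.MemberID)
       AND NOT EXISTS (SELECT 1                -- ...provided no open-ended row exists
                       FROM members
                       WHERE MemberID = t.MemberID
                         AND DateEnd IS NULL))
ORDER BY t.MemberID
"""


def latest_instances(rows):
    """Return (MemberID, MemberInstanceID) for each member's current instance."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE members (MemberID INTEGER, MemberInstanceID TEXT, DateStart TEXT, DateEnd TEXT)")
    con.executemany("INSERT INTO members VALUES (?, ?, ?, ?)", rows)
    result = con.execute(LATEST_SQL).fetchall()
    con.close()
    return result


print(latest_instances(MEMBERS))  # [(2, 'abc10'), (4, 'abc19')]
```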
SELECT memberinstanceid
FROM (SELECT memberinstanceid,
             ROW_NUMBER() OVER (PARTITION BY MemberID
                                ORDER BY (CASE WHEN [DateEnd] IS NULL THEN 1 ELSE 0 END) DESC,
                                         [DateEnd] DESC) AS rn
      FROM [table]) AS ranked
WHERE rn = 1
The ORDER BY is essentially creating a "column" to sort the NULL values to the top, then doing a secondary sort on the dates that are not null.
You have a good start, but you don't need to perform any explicit grouping. What you want is the row where DateEnd is null or is the largest value (latest date) of all the records with the same MemberID. You also realized that a plain MAX won't do: MAX ignores NULLs, yet a NULL, if one exists, must count as the latest date. Replacing NULL with a far-future date inside the MAX takes care of that.
select m.*
from Members m
where m.DateEnd is null
or m.DateEnd =(
select Max( IsNull( DateEnd, '9999-12-31' ))
from Members
where MemberID = m.MemberID );
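The sentinel idea can be sketched in SQLite through Python's sqlite3 module. SQLite has no ISNULL function, so this sketch swaps in the standard COALESCE; the table name members follows the answer, and the dates are stored as ISO-8601 text so that string comparison orders them correctly.

```python
import sqlite3

# Sample rows from the question: (MemberID, MemberInstanceID, DateStart, DateEnd)
MEMBERS = [
    (2, "abc12", "2013-01-01", "2013-12-31"),
    (4, "abc21", "2010-01-01", "2013-12-31"),
    (2, "abc10", "2015-01-01", None),
    (4, "abc19", "2014-01-01", "2014-10-31"),
]

SENTINEL_SQL = """
SELECT m.MemberInstanceID
FROM members AS m
WHERE m.DateEnd IS NULL
   OR m.DateEnd = (SELECT MAX(COALESCE(DateEnd, '9999-12-31'))  -- NULL counts as latest
                   FROM members
                   WHERE MemberID = m.MemberID)
ORDER BY m.MemberID
"""


def latest_instance_ids(rows):
    """Return the current MemberInstanceID for each member, ordered by MemberID."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE members (MemberID INTEGER, MemberInstanceID TEXT, DateStart TEXT, DateEnd TEXT)")
    con.executemany("INSERT INTO members VALUES (?, ?, ?, ?)", rows)
    ids = [row[0] for row in con.execute(SENTINEL_SQL)]
    con.close()
    return ids


print(latest_instance_ids(MEMBERS))  # ['abc10', 'abc19']
```

For member 2 the sentinel makes the maximum '9999-12-31', so the closed abc12 row no longer matches and only the NULL row survives.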

Detect Anomaly Intervals with SQL

My problem is simple: I have a table with a series of statuses and timestamps (for the sake of curiosity, these statuses indicate alarm levels), and I would like to query this table in order to get the duration between two statuses.
Seems simple, but here comes the tricky part: I can't create look-up tables or procedures, and it should be as fast as possible, as this table is a little monster holding over 1 billion records (no kidding!)...
The schema is drop-dead simple:
[pk] Time
Value
(actually, there is a second pk, but it is useless for this)
And below a real world example:
Timestamp Status
2013-1-1 00:00:00 1
2013-1-1 00:00:05 2
2013-1-1 00:00:10 2
2013-1-1 00:00:15 2
2013-1-1 00:00:20 0
2013-1-1 00:00:25 1
2013-1-1 00:00:30 2
2013-1-1 00:00:35 2
2013-1-1 00:00:40 0
The output, considering only level 2 alarms, should report the beginning of each level 2 alarm and its end (when the status reaches 0), as follows:
StartTime EndTime Interval (seconds)
2013-1-1 00:00:05 2013-1-1 00:00:20 15
2013-1-1 00:00:30 2013-1-1 00:00:40 10
I have been trying all sorts of inner joins, but all of them lead me to an amazing Cartesian explosion. Can you guys help me figure out a way to accomplish this?
Thanks!
This has to be one of the harder questions I've seen today - thanks! I assume you can use CTEs? If so, try something like this:
;WITH Filtered
AS
(
SELECT ROW_NUMBER() OVER (ORDER BY dateField) RN, dateField, Status
FROM Test
)
SELECT F1.RN, F3.MinRN,
F1.dateField StartDate,
F2.dateField Enddate
FROM Filtered F1, Filtered F2, (
SELECT F1a.RN, MIN(F3a.RN) as MinRN
FROM Filtered F1a
JOIN Filtered F2a ON F1a.RN = F2a.RN+1 AND F1a.Status = 2 AND F2a.Status <> 2
JOIN Filtered F3a ON F1a.RN < F3a.RN AND F3a.Status <> 2
GROUP BY F1a.RN ) F3
WHERE F1.RN = F3.RN AND F2.RN = F3.MinRN
And the Fiddle. I didn't add the intervals, but I imagine you can handle that part from here.
Good luck.
Finally figured out a version I was happy with. It took me remembering an answer from another question (can't remember which one, though) where it was pointed out that the difference between two increasing row-number sequences stays constant for as long as they increase together, i.e. within one run of the same status.
WITH Ordered (occurredAt, status, row, grp)
as (SELECT occurredAt, status,
ROW_NUMBER() OVER (ORDER BY occurredat),
ROW_NUMBER() OVER (PARTITION BY status
ORDER BY occurredat)
FROM Alert)
SELECT Event.startDate, Ending.occurredAt as endDate,
DATEDIFF(second, Event.startDate, Ending.occurredAt) as interval
FROM (SELECT MIN(occurredAt) as startDate, MAX(row) as ending
FROM Ordered
WHERE status = 2
GROUP BY row - grp) Event
LEFT JOIN (SELECT occurredAt, row
FROM Ordered
WHERE status != 2) Ending
ON Event.ending + 1 = Ending.row
(working SQL Fiddle example, with some additional data rows for work checking).
This unfortunately doesn't correctly deal with level-2 statuses that are end rows (behavior unspecified), although it does list them.
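To see the "difference of two row numbers" idea on the sample data, here is a sketch in SQLite through Python's sqlite3 module. Assumptions: SQLite 3.25+ for ROW_NUMBER; the timestamps are stored as zero-padded ISO-8601 text (2013-01-01 00:00:05 rather than 2013-1-1 00:00:05) so that strftime('%s', ...) can parse them; and the column row is renamed rn because ROW is a SQLite keyword.

```python
import sqlite3

# Alarm log from the question: (occurredAt, status)
ALERTS = [
    ("2013-01-01 00:00:00", 1),
    ("2013-01-01 00:00:05", 2),
    ("2013-01-01 00:00:10", 2),
    ("2013-01-01 00:00:15", 2),
    ("2013-01-01 00:00:20", 0),
    ("2013-01-01 00:00:25", 1),
    ("2013-01-01 00:00:30", 2),
    ("2013-01-01 00:00:35", 2),
    ("2013-01-01 00:00:40", 0),
]

ISLANDS_SQL = """
WITH Ordered AS (
    SELECT occurredAt, status,
           ROW_NUMBER() OVER (ORDER BY occurredAt) AS rn,
           ROW_NUMBER() OVER (PARTITION BY status ORDER BY occurredAt) AS grp
    FROM Alert
)
SELECT Event.startDate,
       Ending.occurredAt AS endDate,
       CAST(strftime('%s', Ending.occurredAt) AS INTEGER)
         - CAST(strftime('%s', Event.startDate) AS INTEGER) AS interval_s
FROM (SELECT MIN(occurredAt) AS startDate, MAX(rn) AS ending
      FROM Ordered
      WHERE status = 2
      GROUP BY rn - grp) AS Event          -- rn - grp is constant within one run of 2s
LEFT JOIN (SELECT occurredAt, rn FROM Ordered WHERE status <> 2) AS Ending
       ON Event.ending + 1 = Ending.rn     -- the first non-2 row after the run
ORDER BY Event.startDate
"""


def level2_intervals(alerts):
    """Return (startDate, endDate, seconds) for each run of level-2 statuses."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Alert (occurredAt TEXT PRIMARY KEY, status INTEGER)")
    con.executemany("INSERT INTO Alert VALUES (?, ?)", alerts)
    result = con.execute(ISLANDS_SQL).fetchall()
    con.close()
    return result


for row in level2_intervals(ALERTS):
    print(row)
```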
Just for the sake of having an alternative. I tried to do some performance testing, but did not finish.
SELECT
MIN([main].[Start]) AS [Start],
[main].[End],
DATEDIFF(s, MIN([main].[Start]), [main].[End]) AS [Seconds]
FROM
(
SELECT
[sub].[Start],
MIN([sub].[End]) AS [End]
FROM
(
SELECT
[start].[Timestamp] AS [Start],
[start].[Status] AS [StartingStatus],
[end].[Timestamp] AS [End],
[end].[Status] AS [EndingStatus]
FROM [Alerts] [start], [Alerts] [end]
WHERE [start].[Status] = 2
AND [start].[Timestamp] < [end].[Timestamp]
AND [start].[Status] <> [end].[Status]
) AS [sub]
GROUP BY
[sub].[Start],
[sub].[StartingStatus]
) AS [main]
GROUP BY
[main].[End]
And here is a Fiddle.
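The self-join alternative can be sketched the same way in SQLite via Python's sqlite3 module. The columns are renamed ts, start_ts and end_ts (END is a keyword in SQLite; these names are assumptions, not the poster's). Note that the join pairs every level-2 row with every later row of a different status before taking the MIN, so the intermediate result can grow large on a big table.

```python
import sqlite3

# Alarm log from the question: (ts, status)
ALERTS = [
    ("2013-01-01 00:00:00", 1),
    ("2013-01-01 00:00:05", 2),
    ("2013-01-01 00:00:10", 2),
    ("2013-01-01 00:00:15", 2),
    ("2013-01-01 00:00:20", 0),
    ("2013-01-01 00:00:25", 1),
    ("2013-01-01 00:00:30", 2),
    ("2013-01-01 00:00:35", 2),
    ("2013-01-01 00:00:40", 0),
]

SELF_JOIN_SQL = """
SELECT MIN(pairs.start_ts) AS start_ts,
       pairs.end_ts,
       CAST(strftime('%s', pairs.end_ts) AS INTEGER)
         - CAST(strftime('%s', MIN(pairs.start_ts)) AS INTEGER) AS seconds
FROM (SELECT s.ts AS start_ts,
             MIN(e.ts) AS end_ts            -- earliest later row with another status
      FROM alerts AS s
      JOIN alerts AS e
        ON s.ts < e.ts AND e.status <> s.status
      WHERE s.status = 2
      GROUP BY s.ts) AS pairs
GROUP BY pairs.end_ts                       -- level-2 rows sharing an end form one alarm
ORDER BY 1
"""


def level2_intervals_self_join(alerts):
    """Return (start, end, seconds) for each level-2 alarm."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE alerts (ts TEXT PRIMARY KEY, status INTEGER)")
    con.executemany("INSERT INTO alerts VALUES (?, ?)", alerts)
    result = con.execute(SELF_JOIN_SQL).fetchall()
    con.close()
    return result


for row in level2_intervals_self_join(ALERTS):
    print(row)
```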
I do something similar by using an identity column as the table's id:
create table test(id int primary key identity(1,1),timstamp datetime,val int)
insert into test(timstamp,val) Values('1/1/2013 00:00:00',1)
insert into test(timstamp,val) Values('1/1/2013 00:00:05',2)
insert into test(timstamp,val) Values('1/1/2013 00:00:25',1)
insert into test(timstamp,val) Values('1/1/2013 00:00:30',2)
insert into test(timstamp,val) Values('1/1/2013 00:00:35',1)
select t1.timstamp,t1.val,DATEDIFF(s,t1.timstamp,t2.timstamp)
from test t1 left join test t2 on t1.id=t2.id-1
drop table test
I would also store the timestamps as seconds since 1980, 2000, or whatever epoch suits you. But then you might not want to do the reverse conversion all the time, so it depends on how often you use the actual timestamp.

Recursive CTE - consolidate start and end dates

I have the following table:
row_num customer_status effective_from_datetime
------- ------------------ -----------------------
1 Active 2011-01-01
2 Active 2011-01-02
3 Active 2011-01-03
4 Suspended 2011-01-04
5 Suspended 2011-01-05
6 Active 2011-01-06
And I am trying to achieve the following result, whereby consecutive rows with the same status are merged into one row with an effective from/to date range:
customer_status effective_from_datetime effective_to_datetime
--------------- ----------------------- ---------------------
Active 2011-01-01 2011-01-04
Suspended 2011-01-04 2011-01-06
Active 2011-01-06 NULL
I can get a recursive CTE to output the correct effective_to_datetime based on the next row, but am having trouble merging the ranges.
Code to generate sample data:
CREATE TABLE #temp
(
row_num INT IDENTITY(1,1),
customer_status VARCHAR(10),
effective_from_datetime DATE
)
INSERT INTO #temp
VALUES
('Active','2011-01-01')
,('Active','2011-01-02')
,('Active','2011-01-03')
,('Suspended','2011-01-04')
,('Suspended','2011-01-05')
,('Active','2011-01-06')
EDIT: SQL updated as per comment.
WITH
group_assigned_data AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY customer_status ORDER BY effective_from_datetime) AS status_sequence_id,
    ROW_NUMBER() OVER (                             ORDER BY effective_from_datetime) AS sequence_id,
    customer_status,
    effective_from_datetime
  FROM
    #temp
)
,
grouped_data AS
(
  SELECT
    customer_status,
    MIN(effective_from_datetime) AS min_effective_from_date,
    MAX(effective_from_datetime) AS max_effective_from_date
  FROM
    group_assigned_data
  GROUP BY
    customer_status,
    sequence_id - status_sequence_id   -- constant within each run of the same status
)
SELECT
  [current].customer_status,
  [current].min_effective_from_date AS effective_from,
  [next].min_effective_from_date    AS effective_to
FROM
  grouped_data AS [current]
LEFT JOIN
  grouped_data AS [next]
    -- the next run starts the day after the current run's last date
    ON [next].min_effective_from_date = DATEADD(day, 1, [current].max_effective_from_date)
ORDER BY
  [current].min_effective_from_date
This isn't recursive, but that's possibly a good thing.
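As a check on the grouping and the "next run starts the following day" join, here is the same idea in SQLite via Python's sqlite3 module. It is a sketch under assumptions: SQLite 3.25+ for ROW_NUMBER, a hypothetical table name customer_history, and SQLite's date(x, '+1 day') in place of SQL Server date arithmetic.

```python
import sqlite3

# Sample data from the question: (customer_status, effective_from_datetime)
HISTORY = [
    ("Active", "2011-01-01"),
    ("Active", "2011-01-02"),
    ("Active", "2011-01-03"),
    ("Suspended", "2011-01-04"),
    ("Suspended", "2011-01-05"),
    ("Active", "2011-01-06"),
]

MERGE_SQL = """
WITH group_assigned_data AS (
    SELECT customer_status,
           effective_from_datetime,
           -- constant within each run of consecutive rows with the same status
           ROW_NUMBER() OVER (ORDER BY effective_from_datetime)
             - ROW_NUMBER() OVER (PARTITION BY customer_status
                                  ORDER BY effective_from_datetime) AS island
    FROM customer_history
),
grouped_data AS (
    SELECT customer_status,
           MIN(effective_from_datetime) AS min_from,
           MAX(effective_from_datetime) AS max_from
    FROM group_assigned_data
    GROUP BY customer_status, island
)
SELECT cur.customer_status,
       cur.min_from AS effective_from,
       nxt.min_from AS effective_to
FROM grouped_data AS cur
LEFT JOIN grouped_data AS nxt
       ON nxt.min_from = date(cur.max_from, '+1 day')  -- next run starts the following day
ORDER BY cur.min_from
"""


def merge_status_ranges(rows):
    """Return (status, effective_from, effective_to) for each run of equal statuses."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customer_history ("
        "row_num INTEGER PRIMARY KEY, customer_status TEXT, effective_from_datetime TEXT)"
    )
    con.executemany(
        "INSERT INTO customer_history (customer_status, effective_from_datetime) VALUES (?, ?)",
        rows,
    )
    result = con.execute(MERGE_SQL).fetchall()
    con.close()
    return result


for row in merge_status_ranges(HISTORY):
    print(row)
```

The output matches the table the question asks for, with the last Active run left open (None).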
It doesn't deal with gaps in your data. To deal with that, you could create a calendar table with every relevant date, join on it to fill missing dates with an 'unknown' status, and then run the query against that. (In fact, you can do it in a CTE that is used by the CTEs above.)
At present...
- If row 2 was missing, it would not change the result
- If row 3 was missing, the end_date of the first row would change
Different behaviour can be determined by preparing your data, or other methods. We'd need to know the business logic you need though.
If any one date can have multiple status entries, you need to define what logic you want it to follow. At present the behaviour is undefined, but you could correct that as simply as adding customer_status to the ORDER BY portions of ROW_NUMBER().

Get a single max date if dates are not unique

For SQL Server 2000:
Very similar to what I asked here:
Get distinct max date using SQL
But this time the dates aren't unique, so for this table, pc_bsprdt_tbl:
pc_bsprhd_key pc_bsprdt_shpiadt pc_bsprdt_prod
21ST 99-00 2001-04-30 23:59:59.000 72608-12895
21ST 99-00 2001-04-30 23:59:59.000 72608-12910
AFCC990915 1999-09-01 00:00:00.000 72608-12115
AFCC990915 1999-09-01 00:00:00.000 CHU99-01514
AFCC990915 1999-09-01 00:00:00.000 POP99-01514
I would like this returned:
21ST 99-00 2001-04-30 23:59:59.000
AFCC990915 1999-09-01 00:00:00.000
Now, pc_bsprdt_prod is unique, so what I have tried is using the max of the product, like this, to give me uniqueness:
Select T.pc_bsprhd_key, T.pc_bsprdt_shpiadt
From pc_bsprdt_tbl As T
Join (
Select pc_bsprhd_key, Max( T1.pc_bsprdt_shpiadt ) As MaxDateTime, Max(pc_bsprdt_prod) as Product
From pc_bsprdt_tbl As T1
Group By T1.pc_bsprhd_key
) As Z
On Z.pc_bsprhd_key = T.pc_bsprhd_key
And Z.MaxDateTime = T.pc_bsprdt_shpiadt
AND Z.Product = T.pc_bsprdt_prod
It seems like it works :)
Is there a way to do it using just the date, though? Maybe a TOP 1 in there somewhere?
SELECT pc_bsprhd_key, MAX(pc_bsprdt_shpiadt)
FROM pc_bsprdt_tbl
GROUP BY pc_bsprhd_key;
That might not be working as you think it is. That will give you the MAX(Date) and MAX(prod) which might not be on the same row. Here is an example:
CREATE TABLE #Test
(
a int,
b date,
c int,
)
INSERT INTO #Test(a, b, c)
SELECT 1, '01/01/2010', 3 UNION ALL
SELECT 1, '01/02/2010', 2 UNION ALL
SELECT 1, '01/03/2010', 1 UNION ALL
SELECT 2, '01/01/2010', 1
SELECT a, MAX(b), MAX(c) FROM #TEST
GROUP BY a
Which will return
----------- ---------- -----------
1 2010-01-03 3
2 2010-01-01 1
Notice that 1/03/2010 and 3 are not in the same row. In this situation I don't think it matters to you, but just a heads up.
As for the actual question: in SQL Server 2005 we would probably apply ROW_NUMBER over the groups to get the row with the latest date for each key; however, you don't have access to this feature in 2000. If the above is giving you correct results, I'd say use it.
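For engines that do have window functions (SQL Server 2005 and later, or SQLite 3.25+), the ROW_NUMBER approach could look like the following sketch, run in SQLite via Python's sqlite3 module. Breaking date ties by pc_bsprdt_prod descending is an arbitrary assumption; any deterministic tiebreaker works. This will not run on SQL Server 2000.

```python
import sqlite3

# Sample rows from the question: (pc_bsprhd_key, pc_bsprdt_shpiadt, pc_bsprdt_prod)
ROWS = [
    ("21ST 99-00", "2001-04-30 23:59:59.000", "72608-12895"),
    ("21ST 99-00", "2001-04-30 23:59:59.000", "72608-12910"),
    ("AFCC990915", "1999-09-01 00:00:00.000", "72608-12115"),
    ("AFCC990915", "1999-09-01 00:00:00.000", "CHU99-01514"),
    ("AFCC990915", "1999-09-01 00:00:00.000", "POP99-01514"),
]

LATEST_ROW_SQL = """
SELECT pc_bsprhd_key, pc_bsprdt_shpiadt, pc_bsprdt_prod
FROM (SELECT pc_bsprhd_key, pc_bsprdt_shpiadt, pc_bsprdt_prod,
             ROW_NUMBER() OVER (
                 PARTITION BY pc_bsprhd_key
                 -- latest date first; the product breaks ties (arbitrary choice)
                 ORDER BY pc_bsprdt_shpiadt DESC, pc_bsprdt_prod DESC) AS rn
      FROM pc_bsprdt_tbl) AS ranked
WHERE rn = 1                                    -- exactly one row per key
ORDER BY pc_bsprhd_key
"""


def latest_row_per_key(rows):
    """Return one whole row per pc_bsprhd_key, taken from its latest date."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE pc_bsprdt_tbl (pc_bsprhd_key TEXT, pc_bsprdt_shpiadt TEXT, pc_bsprdt_prod TEXT)"
    )
    con.executemany("INSERT INTO pc_bsprdt_tbl VALUES (?, ?, ?)", rows)
    result = con.execute(LATEST_ROW_SQL).fetchall()
    con.close()
    return result


for row in latest_row_per_key(ROWS):
    print(row)
```

Unlike MAX(date) combined with MAX(prod), every column here comes from the same physical row.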