SQL Server delete a specific row

I have a SQL question about deleting some rows from a table. The table has a paired-row structure, which can be expressed with the following SQL:
create table #test
(
col1 int, col2 int, col3 int, id char(1), dtime datetime
)
insert into #test
values
(1,1,1,'a','2015-02-01 1:00:00')
,(1,1,1,'b','2015-02-01 1:00:01')
,(2,1,1,'a','2015-02-01 1:00:00')
,(2,1,1,'b','2015-02-01 1:00:01')
,(3,1,3,'b','2015-02-01 1:00:00') -- Remove this row
,(3,1,3,'a','2015-02-01 1:00:03')
,(3,1,3,'b','2015-02-01 1:00:04')
,(4,2,1,'a','2015-02-01 3:00:00')
,(4,2,1,'b','2015-02-01 3:00:01')
,(5,3,1,'a','2015-02-01 4:00:00')
,(5,3,1,'b','2015-02-01 4:00:01')
,(5,6,3,'b','2015-02-01 4:00:00') -- Remove this row
,(5,6,3,'a','2015-02-01 4:00:03')
,(5,6,3,'b','2015-02-01 4:00:04')
select *
from #test
order by col1,col2,col3
drop table #test
Sorry, I should make this clearer. The question comes from a real dataflow. The data describes workflow steps. Each step has a start time and a complete time, and each step might produce multiple rows (because the step is called multiple times). When I choose a begin time and an end time to extract the dataflow, some steps get cut off at the complete time rather than at the start time, which is what I want.
The query should remove the unpaired rows that begin with a complete time.
As you can see, every pair of rows should consist of an 'a' row and a 'b' row, and start with 'a' -- the start time. But the rows to be deleted (we do not actually know how many there are) start with 'b' -- the complete time.

Having a primary key makes deletions much easier. Adding one would be the ideal solution.
Without a primary key or some other unique constraint, there could be duplicate rows. The datetime column does not guarantee that the data is unique.
If there are duplicates, do you want all duplicate rows deleted? If so, you can delete them by specifying all of the columns:
delete from #Test
where col1 = 3
and col2 = 1
and col3 = 3
and id = 'b'
and dtime = '2015-02-01 1:00:00'
delete from #Test
where col1 = 5
and col2 = 6
and col3 = 3
and id = 'b'
and dtime = '2015-02-01 4:00:00'
If you want all but one of the potential duplicates removed, you would have to number them and delete all matching rows after the first row.
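For instance, here is a minimal sketch of that numbering approach, assuming the #test table above and keeping the first row of each group of identical rows:
;with Numbered as
(
    select *,
           row_number() over (partition by col1, col2, col3, id, dtime
                              order by (select null)) as rn -- arbitrary order within identical rows
    from #test
)
delete from Numbered
where rn > 1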

You cannot delete a specific row whose values are not unique.
So you have to add an id column (a primary key!).

As said, if there is no primary key set, you have to specify every value that distinguishes the row from the others. In this case:
DELETE FROM #test WHERE dtime ='2015-02-01 1:00:00' AND id = 'b' AND col1 = 3 AND col2 = 1 AND col3 = 3
But I warn you that this isn't good practice. You should set a primary key, as you've already been told.

WITH Ordered AS
(
    SELECT col1, col2, col3, id, dtime,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, id ORDER BY dtime DESC) AS Pos
    FROM #test
)
--SELECT a.*, b.Pos
DELETE a
FROM #test AS a
INNER JOIN Ordered AS b
    ON a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 = b.col3
   AND a.id = b.id AND a.dtime = b.dtime
   AND b.Pos <> 1
This will remove all but the most recent of each duplicate.
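A shorter equivalent, if it helps: SQL Server also lets you delete through the CTE directly, which avoids the self-join (and still deletes only the extra physical rows when rows are exact duplicates):
WITH Ordered AS
(
    SELECT col1, col2, col3, id, dtime,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, id ORDER BY dtime DESC) AS Pos
    FROM #test
)
DELETE FROM Ordered
WHERE Pos <> 1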

Related

SQL Query: Modify Next Row Based On Current Row

Using SQL Server 2012.
The table I'm working with is shown below, along with the code for the query. When I find ReasonString_S = 'Fault Reset', I would like to add that row's DurationMinutes_D to the next row and delete the current row.
I think a CASE WHEN statement would work, but I kept getting syntax issues and I'm fairly new to SQL queries.
select ROW_NUMBER() OVER (ORDER BY EntryDate_T) AS Row, e.equip_name,
ReasonString_S, DurationMinutes_D, rt.Name_S ProcStateString, EntryDate_T
into #temptable
from AT_PM_PlantStateEvent pse
inner join EQUIPMENT e on pse.OwnerKey_I = e.equip_key
inner join dbo.AT_PM_ReasonTree rt on pse.ReasonKey_64 = rt.atr_key
where EntryDate_T >= @jobstart and EntryDate_T < @jobend
and rt.Name_S <> 'Running'
and e.equip_name = @mach
Thanks for the help!
It seems Row is your identity column; if not, don't hesitate to create one or use another existing one. It is very useful in your case: you can perform the desired operation with a simple self-JOIN, as below.
create table #test(rowNum int identity(1,1),
ReasonString_S varchar(50),
DurationMinutes_D float)
insert into #test values
('Model1',0.34),
('Model2',0.35),
('Model3',0.36)
DATA:
rowNum ReasonString_S DurationMinutes_D
-----------------------------------------
1 Model1 0.34
2 Model2 0.35
3 Model3 0.36
update t set t.ReasonString_S = u.DurationMinutes_D
from #test t
left join #test u on u.rowNum = t.rowNum+1
OUTPUT:
rowNum ReasonString_S DurationMinutes_D
-----------------------------------------
1 0.35 0.34
2 0.36 0.35
3 NULL 0.36
Here's a general answer. To update a row based on the value in the next row, you should use the LEAD function. First you need to identify which column you want to sort by in order to determine which row is the next row. LEAD lets you get the value of a particular column from the next row. If you write a subquery using the LEAD function (which can only appear in the SELECT or ORDER BY clause), you can update by joining to the subquery.
Conversely, to update a row based on the value in the previous row, you'd use the LAG function.
Here's an example:
declare #t1 table (id int, val varchar(10))
insert into #t1 values (1, 'val1')
insert into #t1 values (2, 'val2')
insert into #t1 values (3, 'val3')
update #t1 set val = sq.next_row_val
from #t1 t1
inner join (
select t1.id, LEAD(t1.val) over (order by t1.id) as next_row_val
from #t1 t1
)sq on t1.id = sq.id
select * from #t1
id val
1 val2
2 val3
3 NULL
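Applied to the original question, a sketch might look like the following. It assumes the #temptable built above (with Row as the ordering column) and that 'Fault Reset' rows never appear back to back; it folds each 'Fault Reset' duration into the following row via LAG, then deletes the 'Fault Reset' rows:
update t
set DurationMinutes_D = t.DurationMinutes_D + sq.prev_duration
from #temptable t
inner join (
    select Row,
           lag(ReasonString_S) over (order by Row) as prev_reason,
           lag(DurationMinutes_D) over (order by Row) as prev_duration
    from #temptable
) sq on t.Row = sq.Row
where sq.prev_reason = 'Fault Reset'

delete from #temptable
where ReasonString_S = 'Fault Reset'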

insert into fails to insert selected data

I have been working on breaking up a 68GB table into a more normalized structure for the last few weeks, and everything had been going smoothly until today.
I am attempting to move a select few columns from the big table into the new table with this query:
insert into [destination] (col1, col2, col3...)
select col1, col2, col3
From [source]
where companyID = [source].companyID
I receive the message, (60113678 row(s) affected), but the data was not inserted into the destination, and the data in the source table hasn't been altered, so what has been affected, and why wasn't any of the data inserted into the destination?
The code you seem to want to execute is:
update d
set col1 = s.col1,
    col2 = s.col2,
    col3 = s.col3
from destination d join
     [source] s
     on d.companyID = s.companyID;
The code you have written is equivalent to:
insert into [destination] (col1, col2, col3...)
select s.col1, s.col2, s.col3
From [source] s
where s.companyID = s.companyID;
The where is equivalent to s.companyID is not null. Hence, you have inserted all 60,113,678 rows from source into new rows in destination.
Obviously, one moral of the story is to understand the difference between insert and update. More importantly, qualify all column names in a query. If you had done so, your query would have failed at source.CompanyID = destination.CompanyId -- and you wouldn't have to figure out how to delete 60,113,678 new rows.
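To illustrate the point about qualifying names (a hypothetical sketch): because [destination] does not appear in the FROM clause of the SELECT, fully qualifying the column makes SQL Server reject the query instead of silently inserting every row:
insert into [destination] (col1, col2, col3)
select s.col1, s.col2, s.col3
from [source] s
where s.companyID = [destination].companyID;
-- fails: the multi-part identifier "destination.companyID" could not be bound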

updating a table by moving data from one row to another multiple times

I have a SQL table and I need to take row 2, move the data in columns 4 & 5, and put it into row 1. The values in columns 4 & 5 are dynamic (i.e. not the same each time).
Background on the table: every 8 rows are grouped together (by an ascending action_id) under another entity in another table (but the other table isn't needed for this task).
For example:
action_id = 839283 col 4 = space col 5 = space
action_id = 839284 col 4 = SMS col 5 = L1
I need to move the SMS & L1 to the row above and blank out the row where action_id = 839284.
This will repeat multiple times.
I was thinking of creating a select into a temp table to get the rows I need to change (using 2 other tables as links), but I can't work out how to move data from one row to the other dynamically.
Assumptions:
SQL Server
action_id values are always one apart
3 steps:
-- Step 1: remember which rows currently hold the data (these will be blanked later)
select action_id as To_Empty
into #EmptyMe
from MyTable
where col4 is not null

-- Step 2: pull col4/col5 up from the next row into the empty row above it
with CTE as
(
    select action_id,
           col4,
           col5,
           lead(col4) over (order by action_id) as n_col4,
           lead(col5) over (order by action_id) as n_col5
    from MyTable
)
update CTE
set col4 = n_col4, col5 = n_col5
where col4 is null

-- Step 3: blank out the rows the data was moved from
update MyTable
set col4 = null, col5 = null
where exists (select 1 from #EmptyMe where action_id = To_Empty)
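A quick way to sanity-check the three steps, with a hypothetical table matching the example in the question:
create table MyTable (action_id int, col4 varchar(10), col5 varchar(10))
insert into MyTable values
(839283, null, null),
(839284, 'SMS', 'L1')
-- After running the three steps:
-- action_id = 839283: col4 = 'SMS', col5 = 'L1'
-- action_id = 839284: col4 = NULL,  col5 = NULL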

SQL Server Merge - Output returning null deleted rows as inserted

I'm using MERGE to sync data in a table, and I'm seeing some wrong (to me) behavior from SQL Server. When I OUTPUT the INSERTED.* values, and a row was deleted, the MERGE command returns a row with all NULL columns for each row that was deleted.
For example, take this schema:
CREATE TABLE tbl
(
col1 INT NOT NULL,
col2 INT NOT NULL
);
I do an initial load of data, and all 4 rows are outputted as expected.
WITH data1 AS (
SELECT 1 [col1],1 [col2]
UNION ALL SELECT 2 [col1],2 [col2]
UNION ALL SELECT 3 [col1],3 [col2]
UNION ALL SELECT 4 [col1],4 [col2]
)
MERGE tbl t
USING data1 s
ON t.col1 = s.col1 AND t.col2 = s.col2
WHEN NOT MATCHED BY TARGET
THEN INSERT (col1,col2) VALUES (s.col1,s.col2)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT INSERTED.*;
Now, say I remove 2 rows from the data I'm syncing with the table (in my CTE) and do the same MERGE: I see 2 rows of all-NULL columns returned.
WITH data1 as (
SELECT 1 [col1],1 [col2]
UNION ALL SELECT 2 [col1],2 [col2]
)
MERGE tbl t
USING data1 s
ON t.col1 = s.col1 AND t.col2 = s.col2
WHEN NOT MATCHED BY TARGET
THEN INSERT (col1,col2) VALUES (s.col1,s.col2)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT INSERTED.*;
To me, this seems like wrong behavior because A) I didn't ask for any deleted rows and B) it makes it seem like I inserted these 2 NULL rows into my table, which I clearly did not. Can anyone shed some light on what's happening?
From the documentation:
output_clause - Returns a row for every row in target_table that is updated, inserted, or deleted, in no particular order. $action can be specified in the output clause. $action is a column of type nvarchar(10) that returns one of three values for each row: 'INSERT', 'UPDATE', or 'DELETE', according to the action that was performed on that row.
It seems that SQL Server outputs one row for every row that changed (by an insert or delete). When I specify OUTPUT INSERTED.*, I'm really only asking for the inserted data, which is null for the 2 rows that were deleted. If I specify OUTPUT INSERTED.col1 [InsCol1], INSERTED.col2 [InsCol2], DELETED.col1 [DelCol1], DELETED.col2 [DelCol2], $action then I can see a better picture of what's happening.
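Spelled out against the same tbl/data1 example (a sketch; deleted rows then show $action = 'DELETE' with NULL inserted columns):
WITH data1 AS (
SELECT 1 [col1],1 [col2]
UNION ALL SELECT 2 [col1],2 [col2]
)
MERGE tbl t
USING data1 s
ON t.col1 = s.col1 AND t.col2 = s.col2
WHEN NOT MATCHED BY TARGET
THEN INSERT (col1,col2) VALUES (s.col1,s.col2)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT INSERTED.col1 [InsCol1], INSERTED.col2 [InsCol2],
       DELETED.col1 [DelCol1], DELETED.col2 [DelCol2], $action;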
Thanks to Laurence for your comment.

updating changed rows

I have a requirement to update a couple of thousand rows in a table based on whether any changes have happened to any of the values. At the moment I'm just updating all the values regardless, but I was wondering which is more efficient: should I check all the columns to see if there are any changes and then update, or should I just update regardless? E.g.
update someTable Set
column1 = somevalue,
column2 = somevalue,
column3 = somevalue,
etc....
from someTable inner join sometable2 on
someTable.id = sometable2.id
where
someTable.column1 != sometable2.column1 or
someTable.column2 != sometable2.column2 or
someTable.column3 != sometable2.column3 or
etc etc......
What's faster, and what's best practice?
See two articles on Paul White's Blog.
The Impact of Non-Updating Updates for discussion of the main issue.
Undocumented Query Plans: Equality Comparisons for a less tedious way of doing the inequality comparisons particularly if your columns are nullable (WHERE NOT EXISTS (SELECT someTable.* INTERSECT SELECT someTable2.*)).
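As a sketch of that INTERSECT pattern applied to the update above (hypothetical column list; unlike a chain of != comparisons, it treats two NULLs as equal):
update t1
set column1 = t2.column1,
    column2 = t2.column2,
    column3 = t2.column3
from someTable t1
inner join sometable2 t2 on t1.id = t2.id
where not exists (select t1.column1, t1.column2, t1.column3
                  intersect
                  select t2.column1, t2.column2, t2.column3)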
I believe this is the best way.
Tables and data:
declare #someTable1 table(id int, column1 int, column2 varchar(2))
declare #someTable2 table(id int, column1 int, column2 varchar(2))
insert @someTable1
select 1, 10, 'a3'
union all select 2, 20, 'a3'
union all select 3, null, 'a4'
insert @someTable2
select 1, 10, 'a3'
union all select 2, 19, 'a3'
union all select 3, null, 'a5'
Update:
UPDATE t1
set t1.column1 = t2.column1,
    t1.column2 = t2.column2
from @someTable1 t1
JOIN
    (select * from @someTable2
     EXCEPT
     select * from @someTable1) t2
    on t2.id = t1.id
Result:
select * from @someTable1
id          column1  column2
----------- -------- -------
1           10       a3
2           19       a3
3           NULL     a5
I've found that explicitly including the where clause that excludes no-op updates performs faster when working against large tables, but this is a very YMMV type of question.
If possible, compare the two approaches side by side against a realistic set of data. E.g. if your tables contain millions of rows and the updates affect only 10, make sure your sample data affects just a few rows. Likewise, if it's likely that most rows will change, make your sample data reflect that.
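One way to run that side-by-side comparison (a sketch against the hypothetical someTable/sometable2 above) is to wrap each variant in SQL Server's timing and IO statistics:
set statistics time on;
set statistics io on;

-- Variant 1: unconditional update
update t1
set column1 = t2.column1, column2 = t2.column2, column3 = t2.column3
from someTable t1
inner join sometable2 t2 on t1.id = t2.id;

-- Variant 2: update only the rows that actually changed
update t1
set column1 = t2.column1, column2 = t2.column2, column3 = t2.column3
from someTable t1
inner join sometable2 t2 on t1.id = t2.id
where not exists (select t1.column1, t1.column2, t1.column3
                  intersect
                  select t2.column1, t2.column2, t2.column3);

set statistics time off;
set statistics io off;
Remember to restore the data between runs so both variants see the same starting state.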