How do I commit/execute deletion row by row in SQL (MSSQL) - sql

I have a simple but long table (a few million rows).
The table contains many paired rows which I need to delete.
The row data is not distinct!
There are also single rows (which have no paired row).
Pairs are defined by crossed values in two columns (Column1 of one row equals Column2 of the other and vice versa), concatenated with a third column.
I would like to keep only one row for each data identifier.
Therefore, I need myTable to shrink immediately whenever the condition is met.
I tried:
ALTER TABLE myDB.dbo.myTable ADD
myIndexColumn AS (Column1 + Column2 + Column3),
myReversedIndexColumn AS (Column2 + Column1 + Column3)
CREATE NONCLUSTERED INDEX myIndex1 ON myDB.dbo.myTable (
myIndexColumn ASC
)
CREATE NONCLUSTERED INDEX myIndex2 ON myDB.dbo.myTable (
myReversedIndexColumn ASC
)
DELETE FROM myDB.dbo.myTable
WHERE myIndexColumn IN (SELECT myReversedIndexColumn FROM myDB.dbo.myTable)
The problem is that both rows of each pair are deleted instead of one of them being left.
Obviously, that is because the DELETE applies its changes only after the whole statement has run, so the subquery still sees both rows.
If I could persuade SQL Server 2008 R2 Express Edition to commit each deletion as soon as the condition is met, the SELECT subquery would return a shorter list each time a row is tested for deletion.
How do I do that?
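A minimal sketch that reproduces the behavior, run through Python's standard `sqlite3` module rather than SQL Server (the sample rows are hypothetical, and SQLite's `||` stands in for T-SQL's `+`). SQL evaluates the `IN` subquery against the table as it stood before the DELETE began (SQLite and SQL Server both behave this way), so both rows of a crossed pair match each other, and a row whose Column1 equals Column2 matches its own reversed value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("a", "b", "1"),  # hypothetical crossed pair ...
        ("b", "a", "1"),  # ... with this row
        ("c", "c", "2"),  # Column1 = Column2, so it matches its own reversed value
        ("c", "d", "3"),  # single row with no partner
    ],
)

# The question's DELETE with the index columns written inline.
# SQLite concatenates strings with ||; T-SQL uses +.
conn.execute(
    """
    DELETE FROM myTable
    WHERE Column1 || Column2 || Column3 IN
          (SELECT Column2 || Column1 || Column3 FROM myTable)
    """
)

remaining = conn.execute(
    "SELECT Column1, Column2, Column3 FROM myTable ORDER BY rowid"
).fetchall()
print(remaining)  # [('c', 'd', '3')]
```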

To avoid deleting the rows where column1 = column2 (each of which matches its own reversed value):
DELETE FROM myDB.dbo.myTable
WHERE myIndexColumn IN (SELECT myReversedIndexColumn FROM myDB.dbo.myTable)
AND column1 <> column2
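A SQLite sketch of the guarded DELETE (via Python's `sqlite3`, with hypothetical sample rows and `||` in place of T-SQL's `+`) shows what the extra condition changes: the row where Column1 = Column2 now survives, but both rows of the crossed pair ('a', 'b') / ('b', 'a') still satisfy the condition and are both removed. Keeping exactly one row of each pair needs a ranking, such as the ROW_NUMBER approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("a", "b", "1"), ("b", "a", "1"), ("c", "c", "2"), ("c", "d", "3")],
)

# The DELETE from the answer, with the extra guard for Column1 = Column2.
conn.execute(
    """
    DELETE FROM myTable
    WHERE Column1 || Column2 || Column3 IN
          (SELECT Column2 || Column1 || Column3 FROM myTable)
      AND Column1 <> Column2
    """
)

remaining = conn.execute(
    "SELECT Column1, Column2, Column3 FROM myTable ORDER BY rowid"
).fetchall()
print(remaining)  # [('c', 'c', '2'), ('c', 'd', '3')]
```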
To remove the duplicates among rows where column1 = column2:
;with cte as
(
select *,
row_number() over (
partition by Column1 + Column2 + Column3
order by (SELECT 1)
) rn
from yourtable
where column1 = column2
)
delete cte where rn > 1
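The CTE can also be used to delete all the duplicates, including reversed pairs, because the two CASE expressions always put the smaller of Column1 and Column2 first: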
;with cte as
(
select *,
row_number() over (
partition by
CASE WHEN Column1 > Column2 THEN Column2 ELSE Column1 END +
CASE WHEN Column1 > Column2 THEN Column1 ELSE Column2 END +
Column3
order by (SELECT 1)
) rn
from yourtable
)
delete cte where rn > 1
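A minimal SQLite sketch of the same idea, assuming SQLite 3.25 or later for window functions. SQLite cannot delete through a CTE the way SQL Server can, so this version ranks rows by `rowid` and deletes the rowids ranked above 1. It also partitions by the two normalized expressions and Column3 separately instead of concatenating them, which avoids accidental collisions such as 'ab' + 'c' = 'a' + 'bc'. The helper name `delete_reversed_duplicates` and the sample rows are hypothetical.

```python
import sqlite3


def delete_reversed_duplicates(conn):
    """Keep one row per unordered (Column1, Column2) pair and Column3."""
    conn.execute(
        """
        WITH ranked AS (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY
                           CASE WHEN Column1 > Column2 THEN Column2 ELSE Column1 END,
                           CASE WHEN Column1 > Column2 THEN Column1 ELSE Column2 END,
                           Column3
                       ORDER BY rowid
                   ) AS rn
            FROM myTable
        )
        DELETE FROM myTable
        WHERE rowid IN (SELECT rid FROM ranked WHERE rn > 1)
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Column1 TEXT, Column2 TEXT, Column3 TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [
        ("a", "b", "1"),  # crossed pair: one of these two survives
        ("b", "a", "1"),
        ("c", "c", "2"),  # plain duplicates where Column1 = Column2
        ("c", "c", "2"),
        ("c", "d", "3"),  # single row, always kept
    ],
)
delete_reversed_duplicates(conn)
remaining = conn.execute(
    "SELECT Column1, Column2, Column3 FROM myTable ORDER BY rowid"
).fetchall()
print(remaining)  # [('a', 'b', '1'), ('c', 'c', '2'), ('c', 'd', '3')]
```

The row kept in each group is simply the first one inserted (`ORDER BY rowid`); any other ORDER BY inside OVER decides which duplicate survives.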

Related

Delete oldest entries with two duplicate columns from a table - SQL

SELECT column1, column2, count(*) as duplicate
FROM table
GROUP BY column1, column2 HAVING count(*)> 1 ;
ID column1 column2 timestamp
abc 123 1 2020-02-03 19:36:27
xyz 123 1 2020-02-02 15:36:27
column1 and column2 should form a unique combination, but there are duplicate entries.
The query above returns the combinations that have duplicates. We want to delete the oldest entries, based on another column, timestamp.
One method is:
delete from t
where t.timestamp < (select max(t2.timestamp)
from t t2
where t2.column1 = t.column1 and t2.column2 = t.column2
);
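A minimal SQLite sketch of the correlated-subquery method, using the two rows from the question plus one hypothetical single row. It keeps the newest row of each (column1, column2) group by deleting every row whose timestamp is older than the group's MAX(timestamp); the timestamps are ISO-formatted strings, so they compare in chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id TEXT, column1 INTEGER, column2 INTEGER, "timestamp" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("abc", 123, 1, "2020-02-03 19:36:27"),
        ("xyz", 123, 1, "2020-02-02 15:36:27"),
        ("pqr", 456, 2, "2020-01-01 00:00:00"),  # hypothetical row without duplicates
    ],
)

# Delete every row that is older than the newest row of its group.
conn.execute(
    """
    DELETE FROM t
    WHERE "timestamp" < (
        SELECT MAX(t2."timestamp")
        FROM t AS t2
        WHERE t2.column1 = t.column1 AND t2.column2 = t.column2
    )
    """
)

ids = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(ids)  # ['abc', 'pqr']
```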
DELETE a
FROM table a
JOIN (
SELECT id, row_number() OVER (PARTITION BY column1, column2 ORDER BY timestamp DESC) AS rownum
FROM table ) b
ON a.id = b.id
WHERE b.rownum > 1
You can use the row_number function to get an ordered ranking of the results. Partitioning by column1 and column2 restarts the row number at each change in those values. Ordering by your timestamp descending starts the count with the newest record, so deleting anything where rownum > 1 keeps only the newest record. If you needed to keep, say, the newest 3 records, you would simply change rownum > 1 to rownum > 3.
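SQLite has no DELETE ... JOIN, so this minimal sketch expresses the same ROW_NUMBER ranking with an IN subquery (SQLite 3.25 or later assumed). `keep_newest` is a hypothetical helper whose `keep` parameter plays the role of the number after `rownum >`. Apart from the two rows from the question, the sample rows are hypothetical.

```python
import sqlite3


def keep_newest(conn, keep):
    """Delete all but the `keep` newest rows of each (column1, column2) group."""
    conn.execute(
        """
        DELETE FROM t
        WHERE id IN (
            SELECT id
            FROM (
                SELECT id,
                       ROW_NUMBER() OVER (
                           PARTITION BY column1, column2
                           ORDER BY "timestamp" DESC
                       ) AS rownum
                FROM t
            ) AS ranked
            WHERE rownum > ?
        )
        """,
        (keep,),
    )


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t (id TEXT, column1 INTEGER, column2 INTEGER, "timestamp" TEXT)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("abc", 123, 1, "2020-02-03 19:36:27"),
        ("xyz", 123, 1, "2020-02-02 15:36:27"),
        ("def", 123, 1, "2020-02-01 10:00:00"),  # hypothetical
        ("ghi", 123, 1, "2020-01-31 08:00:00"),  # hypothetical
        ("pqr", 456, 2, "2020-01-01 00:00:00"),  # hypothetical
    ],
)

keep_newest(conn, 3)  # drops only 'ghi', the oldest of the four duplicates
after_three = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]

keep_newest(conn, 1)  # now keeps only the newest row of each group
after_one = [row[0] for row in conn.execute("SELECT id FROM t ORDER BY id")]
print(after_three, after_one)  # ['abc', 'def', 'pqr', 'xyz'] ['abc', 'pqr']
```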

Subtract 2 rows using case statement in SQL Server 2008

My data looks like this; it is all in a single table:
Column1 Column2
abc 100
abc 200
Now I need this result:
abc 100 //here 200-100
I have been banging my head over how to achieve this.
I have tried using ROW_NUMBER and then subtracting with a CASE expression, like this:
Select
column1,
sum(
case when rownum=1
then column2
end
-
case when rownum=2
then column2
end
)
from table
group by column1
But this is giving me null.
Assuming there is no attribute that defines the row ordering:
;with cte as(
select
row_number() over (partition by Column1 order by (select null)) as IndexId,
Column1,
Column2
from #xyz
)
select sum(case when IndexID=1 then (-1 * Column2) else Column2 end), Column1
from cte
group by Column1
Input data:
create table #xyz(Column1 varchar(10), Column2 int)
insert into #xyz
select 'abc' ,100 union all
select 'abc' ,200
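The question's query returns NULL because neither CASE has an ELSE: on every row one of the two CASE expressions is NULL, so that row's difference is NULL, and SUM over nothing but NULLs is NULL. The signed sum avoids this. Below is a minimal SQLite sketch of it that numbers the rows per Column1 (so several groups work at once) and uses insertion order (`rowid`) as an assumed ordering; the 'def' rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (Column1 TEXT, Column2 INTEGER)")
conn.executemany(
    "INSERT INTO xyz VALUES (?, ?)",
    [("abc", 100), ("abc", 200), ("def", 80), ("def", 50)],
)

result = conn.execute(
    """
    WITH cte AS (
        SELECT Column1, Column2,
               ROW_NUMBER() OVER (PARTITION BY Column1 ORDER BY rowid) AS IndexId
        FROM xyz
    )
    -- Negate the first row of each group, then add: second minus first.
    SELECT Column1, SUM(CASE WHEN IndexId = 1 THEN -Column2 ELSE Column2 END)
    FROM cte
    GROUP BY Column1
    ORDER BY Column1
    """
).fetchall()
print(result)  # [('abc', 100), ('def', -30)]
```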
Assuming you have an attribute rownum in the table which is always 1 or 2 (it can be generated by ROW_NUMBER(), as you suggest in the question, using whatever ordering suits you):
Column1 Column2 Rownum
------------------------
abc 100 1
abc 200 2
then you can simply use
Select
column1,
sum(
case when rownum=1
then column2
else -column2
end
)
from table
group by column1
It sums Column2 per Column1, but negates the Column2 value in the row where rownum = 2. Therefore, in our example, you end up with 100 + (-200) = -100.
You could do:
select column1, max(column2) - min(column2)
from t
group by column1;
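A minimal SQLite check of the MAX - MIN approach with the question's rows and one hypothetical extra group. For a group of exactly two rows it returns the larger value minus the smaller one regardless of row order, so it needs no ordering column, but it cannot distinguish 200 - 100 from 100 - 200.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("abc", 100), ("abc", 200), ("def", 80), ("def", 50)],
)

result = conn.execute(
    """
    SELECT column1, MAX(column2) - MIN(column2)
    FROM t
    GROUP BY column1
    ORDER BY column1
    """
).fetchall()
print(result)  # [('abc', 100), ('def', 30)]
```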
Here is a shorter form of the answer above (note that IIF requires SQL Server 2012 or later):
SELECT
column1,
SUM(IIF(rownum=1,column2,-column2))
FROM table
GROUP BY column1

Recursive Lag Column Calculation in SQL

I am trying to write a procedure that inserts calculated table data into another table.
The problem I have is that I need each row's calculated column to be influenced by the result of the previous row's calculated column. I tried to lag the calculation itself but this does not work!
For example:
(Max is a function I created that returns the highest of two values)
Id Product Model Column1 Column2
1 A 1 5 =MAX(Column1*2, Lag(Column2))
2 A 2 2 =MAX(Column1*2, Lag(Column2))
3 B 1 3 =MAX(Column1*2, Lag(Column2))
If I try the above in SQL:
SELECT
Column1,
dbo.MyMAX(Column1 * 2, LAG(Column2, 1, 0) OVER (PARTITION BY Product ORDER BY Model ASC)) AS Column2
FROM Source
...it says column2 is unknown.
Output I get if I LAG the Column1 * 2 calculation instead:
SELECT Column1, dbo.MyMAX(Column1 * 2, LAG(Column1 * 2, 1, 0) OVER (PARTITION BY Product ORDER BY Model ASC)) AS Column2 FROM Source
Id Column1 Column2
1 5 10
2 2 10
3 3 6
Why 6 on row 3? Because 3*2 > 2*2.
Output that I want:
Id Column1 Column2
1 5 10
2 2 10
3 3 10
Why 10 on row 3? Because the previous result, 10, is greater than 3*2.
The problem is I can't lag the result of Column2 - I can only lag other columns or calculations of them!
Is there a technique for achieving this with LAG, or must I use a recursive CTE? I read that LAG is newer than CTEs, so I assumed it would be possible. If not, what would the CTE look like?
Edit: Or alternatively - what else could I do to resolve this calculation?
Edit
In hindsight, this problem is a running partitioned maximum over Column1 * 2. It can be done as simply as
SELECT Id, Column1, Model, Product,
MAX(Column1 * 2) OVER (Partition BY Model, Product Order BY ID ASC) AS Column2
FROM Table1;
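A minimal SQLite sketch of the window-function version, using the three sample rows from the question (SQLite 3.25 or later is assumed for window functions). With ORDER BY inside OVER, the default frame runs from the start of the partition to the current row, which is what turns MAX into a running maximum. Without partitioning it reproduces the desired 10, 10, 10; with the answer's PARTITION BY Model, Product every sample row sits in its own partition, so each row simply gets its own Column1 * 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, Product TEXT, Model INTEGER, Column1 INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(1, "A", 1, 5), (2, "A", 2, 2), (3, "B", 1, 3)],
)

# Running maximum of Column1 * 2 over all rows, ordered by Id.
running = conn.execute(
    """
    SELECT Id, MAX(Column1 * 2) OVER (ORDER BY Id) AS Column2
    FROM Table1
    ORDER BY Id
    """
).fetchall()
print(running)  # [(1, 10), (2, 10), (3, 10)]

# The partitioned form restarts the maximum for each (Model, Product).
partitioned = conn.execute(
    """
    SELECT Id, MAX(Column1 * 2) OVER (PARTITION BY Model, Product ORDER BY Id) AS Column2
    FROM Table1
    ORDER BY Id
    """
).fetchall()
print(partitioned)  # [(1, 10), (2, 4), (3, 6)]
```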
Original Answer
Here's a way to do this with a recursive CTE, without LAG at all, by joining on incrementing row numbers. I haven't assumed that your Id is contiguous, hence the additional ROW_NUMBER(). You haven't mentioned any partitioning, so I haven't applied any. The query simply starts at the first row and then, for each following row, projects the greater of the current Column1 * 2 and the preceding Column2.
WITH IncrementingRowNums AS
(
SELECT Id, Column1, Column1 * 2 AS Column2,
ROW_NUMBER() OVER (Order BY ID ASC) AS RowNum
FROM Table1
),
lagged AS
(
SELECT Id, Column1, Column2, RowNum
FROM IncrementingRowNums
WHERE RowNum = 1
UNION ALL
SELECT i.Id, i.Column1,
CASE WHEN (i.Column2 > l.Column2)
THEN i.Column2
ELSE l.Column2
END,
i.RowNum
FROM IncrementingRowNums i
INNER JOIN lagged l
ON i.RowNum = l.RowNum + 1
)
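SELECT Id, Column1, Column2
FROM lagged
ORDER BY Id
OPTION (MAXRECURSION 0); -- lift the default limit of 100 recursion levels for long tables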
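The same recursive CTE runs almost unchanged in SQLite, which makes a quick check easy; the main differences are the standard `WITH RECURSIVE` spelling (SQL Server uses plain `WITH`) and the absence of SQL Server's MAXRECURSION option. This minimal sketch uses the question's three sample rows and should produce 10 for every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, Column1 INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", [(1, 5), (2, 2), (3, 3)])

rows = conn.execute(
    """
    WITH RECURSIVE
    IncrementingRowNums AS (
        SELECT Id, Column1, Column1 * 2 AS Column2,
               ROW_NUMBER() OVER (ORDER BY Id) AS RowNum
        FROM Table1
    ),
    lagged AS (
        -- Anchor: the first row keeps its own Column1 * 2.
        SELECT Id, Column1, Column2, RowNum
        FROM IncrementingRowNums
        WHERE RowNum = 1
        UNION ALL
        -- Recursive step: join the next row number and carry the larger value.
        SELECT i.Id, i.Column1,
               CASE WHEN i.Column2 > l.Column2 THEN i.Column2 ELSE l.Column2 END,
               i.RowNum
        FROM IncrementingRowNums AS i
        JOIN lagged AS l ON i.RowNum = l.RowNum + 1
    )
    SELECT Id, Column1, Column2 FROM lagged ORDER BY Id
    """
).fetchall()
print(rows)  # [(1, 5, 10), (2, 2, 10), (3, 3, 10)]
```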
Edit, Re Partitions
Partitioning works much the same way: carry the Model and Product columns through, partition the row numbering by them (so that it restarts at 1 each time Product or Model changes), and include them in the CTE JOIN condition and in the final ordering.
WITH IncrementingRowNums AS
(
SELECT Id, Column1, Column1 * 2 AS Column2, Model, Product,
ROW_NUMBER() OVER (Partition BY Model, Product Order BY ID ASC) AS RowNum
FROM Table1
),
lagged AS
(
SELECT Id, Column1, Column2, Model, Product, RowNum
FROM IncrementingRowNums
WHERE RowNum = 1
UNION ALL
SELECT i.Id, i.Column1,
CASE WHEN (i.Column2 > l.Column2)
THEN i.Column2
ELSE l.Column2
END,
i.Model, i.Product,
i.RowNum
FROM IncrementingRowNums i
INNER JOIN lagged l
ON i.RowNum = l.RowNum + 1
AND i.Model = l.Model AND i.Product = l.Product
)
SELECT Id, Column1, Column2, Model, Product
FROM lagged
ORDER BY Model, Product, Id
OPTION (MAXRECURSION 0); -- lift the default limit of 100 recursion levels for long partitions
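A minimal SQLite sketch of the partitioned version. The sample rows are hypothetical and chosen so that each (Model, Product) partition holds more than one row. The recursive result is compared against the running MAX(Column1 * 2) OVER (PARTITION BY Model, Product ORDER BY Id) from the hindsight edit, and the two should agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, Product TEXT, Model INTEGER, Column1 INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [(1, "A", 1, 5), (2, "A", 1, 2), (3, "B", 1, 3), (4, "B", 1, 1), (5, "B", 1, 4)],
)

rows = conn.execute(
    """
    WITH RECURSIVE
    IncrementingRowNums AS (
        SELECT Id, Column1, Column1 * 2 AS Column2, Model, Product,
               ROW_NUMBER() OVER (PARTITION BY Model, Product ORDER BY Id) AS RowNum
        FROM Table1
    ),
    lagged AS (
        SELECT Id, Column1, Column2, Model, Product, RowNum
        FROM IncrementingRowNums
        WHERE RowNum = 1
        UNION ALL
        -- Only follow the next row number inside the same (Model, Product) partition.
        SELECT i.Id, i.Column1,
               CASE WHEN i.Column2 > l.Column2 THEN i.Column2 ELSE l.Column2 END,
               i.Model, i.Product, i.RowNum
        FROM IncrementingRowNums AS i
        JOIN lagged AS l
          ON i.RowNum = l.RowNum + 1
         AND i.Model = l.Model AND i.Product = l.Product
    )
    SELECT Id, Column2 FROM lagged ORDER BY Model, Product, Id
    """
).fetchall()

window = conn.execute(
    """
    SELECT Id, MAX(Column1 * 2) OVER (PARTITION BY Model, Product ORDER BY Id)
    FROM Table1
    ORDER BY Model, Product, Id
    """
).fetchall()
print(rows)  # [(1, 10), (2, 10), (3, 6), (4, 6), (5, 8)]
```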

Sampling unique set of records in Oracle table

I have an Oracle table from which I need to select a given percentage of records for each unique combination of a given set of columns.
For example,
SELECT distinct column1, column2, Column3 from TableX;
gives me all the unique combinations in that table. I need a percentage of the rows from each such combination. Currently I am using the following query to accomplish this, which is lengthy and slow:
SELECT *
FROM tableX Sample ( 3 )
WHERE Column1 = 'value1' and
Column2 = 'value2' and
Column3 = 'value3'
UNION
SELECT *
FROM tableX Sample ( 3 )
WHERE Column1 = 'value1' and
Column2 = 'value2' and
Column3 = 'value4'
UNION
…
…
SELECT *
FROM tableX Sample ( 3 )
WHERE Column1 = 'valueP' and
Column2 = 'valueQ' and
Column3 = 'valueR'
where each combination of "value" suffixes is one of the unique combinations in that table (obtained from the first query).
How can I make the query shorter and faster?
Here is one approach:
select t.*
from (select t.*,
row_number() over (partition by column1, column2, column3 order by dbms_random.value
) as seqnum,
count(*) over (partition by column1, column2, column3) as totcnt
from tablex t
) t
where seqnum / totcnt <= 0.10 -- or whatever your threshold is
It uses row_number() to assign a sequential number to rows in each group, in a random order. The where clause chooses the proportion that you want.
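A minimal SQLite sketch of the same query. SQLite's `random()` stands in for Oracle's DBMS_RANDOM.VALUE, and because SQLite divides two integers with integer division, seqnum is multiplied by 1.0 first (Oracle's NUMBER division does not need this). The table contents are hypothetical: 10 rows in one combination and 20 in another, with a threshold of 0.25, so exactly 2 and 5 rows should come back.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablex (id INTEGER, column1 TEXT, column2 TEXT, column3 TEXT)")
# Hypothetical data: 10 rows in one combination, 20 in another.
data = [(i, "value1", "value2", "value3") for i in range(10)]
data += [(i, "valueP", "valueQ", "valueR") for i in range(10, 30)]
conn.executemany("INSERT INTO tablex VALUES (?, ?, ?, ?)", data)

sample = conn.execute(
    """
    SELECT id, column1, column2, column3
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY column1, column2, column3 ORDER BY random()
               ) AS seqnum,
               COUNT(*) OVER (PARTITION BY column1, column2, column3) AS totcnt
        FROM tablex AS t
    ) AS ranked
    -- multiply by 1.0 because SQLite divides two integers with integer division
    WHERE seqnum * 1.0 / totcnt <= ?
    """,
    (0.25,),
).fetchall()

# Which rows are picked is random; how many per combination is not.
per_group = Counter((c1, c2, c3) for _, c1, c2, c3 in sample)
print(per_group)
```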

Delete reverse duplicate rows using sql

column1 column2
x y
y x
How does one go about eliminating such duplicates, or at worst selecting just one of those tuples?
It's become kind of a mainstream habit among question askers to withhold the information which RDBMS we are dealing with. In response: this is tested and works with a certain RDBMS I am not inclined to name. Go figure!
DELETE FROM tbl a
USING tbl b
WHERE (a.x, a.y) = (b.y, b.x)
AND a.y > a.x -- keep the one dupe with the biggest x
This assumes there are no dupes with x = y; such a row would be an ordinary duplicate anyway.
One approach is to identify only the valid rows, e.g.:
SELECT column1, column2
FROM T
WHERE column1 <= column2
UNION
SELECT column2 AS column1, column1 AS column2
FROM T
WHERE column1 > column2;
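A minimal SQLite sketch of the valid-row query on hypothetical rows. UNION both swaps the reversed rows into column1 <= column2 order and removes the resulting duplicate, so ('x', 'y') appears once; note that the unpaired row ('w', 'v') also comes back swapped, as ('v', 'w').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [("x", "y"), ("y", "x"), ("p", "q"), ("w", "v")],
)

valid = conn.execute(
    """
    SELECT column1, column2 FROM T WHERE column1 <= column2
    UNION
    SELECT column2 AS column1, column1 AS column2 FROM T WHERE column1 > column2
    """
).fetchall()
print(sorted(valid))  # [('p', 'q'), ('v', 'w'), ('x', 'y')]
```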
...then delete rows that aren't in the set of valid rows:
DELETE
FROM T
WHERE NOT EXISTS (
SELECT *
FROM (
SELECT column1, column2
FROM T
WHERE column1 <= column2
UNION
SELECT column2 AS column1, column1 AS column2
FROM T
WHERE column1 > column2
) AS DT1
WHERE DT1.column1 = T.column1
AND DT1.column2 = T.column2
);
Alternatively, the DELETE may be simplified to target only the invalid rows:
DELETE
FROM T
WHERE column1 > column2
AND EXISTS (
SELECT *
FROM T AS T1
WHERE T1.column1 = T.column2
AND T1.column2 = T.column1
);
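A minimal SQLite sketch of the simplified DELETE on hypothetical rows. It removes ('y', 'x') because its reversed partner ('x', 'y') exists, but leaves the unpaired row ('w', 'v') alone. This differs from the NOT EXISTS version, which would also delete ('w', 'v'), since only its swapped form ('v', 'w') is in the set of valid rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [("x", "y"), ("y", "x"), ("p", "q"), ("w", "v")],
)

# Delete a row only if it is in "reversed" order and its partner exists.
conn.execute(
    """
    DELETE FROM T
    WHERE column1 > column2
      AND EXISTS (
          SELECT * FROM T AS T1
          WHERE T1.column1 = T.column2 AND T1.column2 = T.column1
      )
    """
)

remaining = conn.execute("SELECT column1, column2 FROM T ORDER BY rowid").fetchall()
print(remaining)  # [('x', 'y'), ('p', 'q'), ('w', 'v')]
```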