about CTE and physically deleting records from table - sql

To delete duplicate records I found the query below on Stack Overflow, and it works fine. In this query we delete records from "a" and not from tblEmployee. So my question is: how do the duplicate records get physically deleted from the physical table even though we don't have any unique or primary key?
WITH a as (
SELECT Firstname,ROW_NUMBER() OVER(PARTITION by Firstname, empID ORDER BY Firstname)
AS duplicateRecCount
FROM dbo.tblEmployee
)
--Now Delete Duplicate Records
DELETE
FROM a
WHERE duplicateRecCount > 1

In order to understand this, let's consider one of the differences between temp tables and CTEs.
When we use a temporary table, it is stored in the tempdb database. So it is just a copy of your table tblEmployee. No matter what changes you make to the temp table, they won't affect tblEmployee.
But when you use a CTE, it actually points to the same table. That is why, if you delete from the CTE, tblEmployee is affected as well.
A CTE is nothing but a disposable view.

You appear to be using an updatable CTE in SQL Server. In this case, the CTE acts the same as a view.
The view is updatable because it refers to only one table and does not have aggregation. Hence, the effect of the CTE is merely to add columns to the table, columns which can be referenced in the DELETE statement.
The conditions for an updatable view are explained in the documentation. These conditions are the same as for your CTE.

Remove the duplicates except one record in SQL Server

Here I wanted to delete all records with value 1 and only keep a single record
Without knowing your DBMS it's hard to say which query you need. If your DBMS supports CTEs and row_number(), then the query below will work.
-- number every row, then delete all but the first one
with cte as
(select *, row_number() over (order by column_1) as rn from table_name)
delete from cte where rn > 1
In SQL Server this will work fine.
Given the nature of your data, I would suggest removing all rows and adding a new one back in:
truncate table t;
insert into t(column_1)
values (1);
Be careful! The truncate table removes all rows from the table.

delete rows if duplicate exist

I want to delete duplicate rows in my table. It should keep the first one (the one on top), and any other duplicates should be deleted.
As the image shows, there are two rows with ID 12 and two with ID 13. So keep the first 12 in the database and delete any others; the same goes for 13 or any other ID.
My Idea:
DELETE from [Table]
WHERE [ID]
HAVING COUNT(TABLE.ID) > 1;
You could try using a CTE and ROW_NUMBER.
A common table expression (CTE) can be thought of as a temporary
result set that is defined within the execution scope of a single
SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is
similar to a derived table in that it is not stored as an object and
lasts only for the duration of the query. Unlike a derived table, a
CTE can be self-referencing and can be referenced multiple times in
the same query.
Something like
;WITH DeleteRows AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY ID) RowID
FROM [Table]
)
DELETE from DeleteRows
WHERE RowID > 1
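The query above relies on SQL Server's updatable CTEs. As a rough, runnable sketch of the same ROW_NUMBER idea, the Python snippet below uses SQLite instead (an assumption made only so the example is self-contained). SQLite cannot delete through a CTE, so the sketch swaps in a different technique: it matches rows by SQLite's built-in rowid. The table name demo_table, the Name column, and the sample data are hypothetical, and window functions need SQLite 3.25 or newer.

```python
import sqlite3


def delete_duplicates(conn):
    """Keep the first row per ID and delete the rest; return the rows deleted."""
    cur = conn.execute(
        """
        DELETE FROM demo_table
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY ID ORDER BY rowid) AS rn
                FROM demo_table
            )
            WHERE rn > 1  -- rn = 1 is the first row of each ID group
        )
        """
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
# Hypothetical table with no primary or unique key, as in the question.
conn.execute("CREATE TABLE demo_table (ID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO demo_table VALUES (?, ?)",
    [(12, "a"), (12, "b"), (13, "c"), (13, "d"), (14, "e")],
)

deleted = delete_duplicates(conn)
remaining = conn.execute(
    "SELECT ID, Name FROM demo_table ORDER BY ID"
).fetchall()
print(deleted, remaining)
```

Ordering by rowid inside each partition makes "first" mean "inserted first" in this sketch; the original ORDER BY ID leaves the choice among duplicates up to the engine.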

How to optimize the following delete SQL query?

I have the following delete query in Oracle. About 1000 records will be deleted from the database at a time.
I have used "in" in the query. Is there a better way to write this query?
DELETE FROM BI_EMPLOYEE_ACTIVITY
WHERE EMPLOYEE_ID in (
SELECT
EMP_ID
FROM
BI_EMPLOYEE
WHERE
PRODUCT_ID = IN_PRODUCT_ID
);
It is not really possible to answer this question as we're missing a description of the data distribution: How many rows are in each table? What's the relationship between the tables? How many rows are affected by the delete?
I'll be assuming that both tables are large (since this is an optimization question) and that BI_EMPLOYEE and BI_EMPLOYEE_ACTIVITY have a parent-child 1..N relationship.
If there are few rows affected by the delete, this means that not many employees have the same PRODUCT_ID and each employee has few activities. In this case it would make sense to index both BI_EMPLOYEE (product_id) and BI_EMPLOYEE_ACTIVITY (employee_id).
This is probably not the case, though: the delete probably affects lots of rows. In that case the indexes could be a hindrance. If the delete affects lots of rows, the fastest access path is probably FULL TABLE SCAN + HASH JOIN.
We need some metrics here: how many rows are deleted? How long does it take? This matters because large DML will always take time, especially DELETE, since it produces the largest amount of undo.
There are alternatives to a large DELETE, as explained in "Deleting many rows from a big table" from asktom:
recreate the table without the deleted rows
partition the data, do a parallel delete
partition the data so that the delete is done by dropping a partition
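The first alternative in the list above (recreate the table without the deleted rows) might look roughly like the sketch below. It is written in Python against SQLite purely so that it can be run as-is; it is not Oracle syntax, and the ACTIVITY column, the _KEEP table name, and the sample rows are hypothetical. On a real system the indexes, constraints, and grants of the original table would also have to be recreated.

```python
import sqlite3


def rebuild_without_product(conn, product_id):
    """Copy the rows to keep into a new table, then swap it in for the old one."""
    conn.execute(
        "CREATE TABLE BI_EMPLOYEE_ACTIVITY_KEEP (EMPLOYEE_ID INTEGER, ACTIVITY TEXT)"
    )
    conn.execute(
        """
        INSERT INTO BI_EMPLOYEE_ACTIVITY_KEEP
        SELECT a.EMPLOYEE_ID, a.ACTIVITY
        FROM BI_EMPLOYEE_ACTIVITY a
        WHERE NOT EXISTS (
            SELECT 1 FROM BI_EMPLOYEE e
            WHERE e.EMP_ID = a.EMPLOYEE_ID AND e.PRODUCT_ID = ?
        )
        """,
        (product_id,),
    )
    conn.execute("DROP TABLE BI_EMPLOYEE_ACTIVITY")
    conn.execute(
        "ALTER TABLE BI_EMPLOYEE_ACTIVITY_KEEP RENAME TO BI_EMPLOYEE_ACTIVITY"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE BI_EMPLOYEE (EMP_ID INTEGER, PRODUCT_ID INTEGER);
    CREATE TABLE BI_EMPLOYEE_ACTIVITY (EMPLOYEE_ID INTEGER, ACTIVITY TEXT);
    INSERT INTO BI_EMPLOYEE VALUES (1, 10), (2, 20);
    INSERT INTO BI_EMPLOYEE_ACTIVITY VALUES (1, 'login'), (2, 'login'), (2, 'logout');
    """
)

rebuild_without_product(conn, 10)
kept = conn.execute(
    "SELECT EMPLOYEE_ID, ACTIVITY FROM BI_EMPLOYEE_ACTIVITY "
    "ORDER BY EMPLOYEE_ID, ACTIVITY"
).fetchall()
print(kept)
```

The trade-off is that the surviving rows are written once instead of the doomed rows being deleted one by one, which tends to pay off only when most of the table is going away.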
Putting an index on EMP_ID may help. I don't believe any other optimization is possible; the query is quite simple and straightforward.
Create an index on the PRODUCT_ID column. This would speed up the search. If the column is of varchar type, use a function-based index if you are converting values to uppercase or lowercase.
Maybe you can try EXISTS instead of IN:
DELETE FROM BI_EMPLOYEE_ACTIVITY
WHERE EXISTS (
SELECT
EMP_ID
FROM
BI_EMPLOYEE
WHERE
PRODUCT_ID = IN_PRODUCT_ID
AND
EMP_ID = EMPLOYEE_ID
);
Create an index on BI_EMPLOYEE table for PRODUCT_ID, EMP_ID columns in this order (product_id on the first place).
And create an index on the BI_EMPLOYEE_ACTIVITY table for the column EMPLOYEE_ID
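As a quick sanity check that the IN and EXISTS forms discussed above remove the same rows, here is a small sketch. It runs against SQLite from Python rather than Oracle (an assumption made so the example is self-contained), and the ACTIVITY column, index names, and sample rows are hypothetical. It says nothing about which form Oracle's optimizer would execute faster.

```python
import sqlite3

DELETE_IN = """
    DELETE FROM BI_EMPLOYEE_ACTIVITY
    WHERE EMPLOYEE_ID IN (
        SELECT EMP_ID FROM BI_EMPLOYEE WHERE PRODUCT_ID = ?
    )
"""

DELETE_EXISTS = """
    DELETE FROM BI_EMPLOYEE_ACTIVITY
    WHERE EXISTS (
        SELECT 1 FROM BI_EMPLOYEE e
        WHERE e.PRODUCT_ID = ?
          AND e.EMP_ID = BI_EMPLOYEE_ACTIVITY.EMPLOYEE_ID
    )
"""


def make_db():
    """Build a fresh in-memory database with the two suggested indexes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE BI_EMPLOYEE (EMP_ID INTEGER, PRODUCT_ID INTEGER);
        CREATE TABLE BI_EMPLOYEE_ACTIVITY (EMPLOYEE_ID INTEGER, ACTIVITY TEXT);
        CREATE INDEX IX_EMP_PRODUCT ON BI_EMPLOYEE (PRODUCT_ID, EMP_ID);
        CREATE INDEX IX_ACT_EMPLOYEE ON BI_EMPLOYEE_ACTIVITY (EMPLOYEE_ID);
        INSERT INTO BI_EMPLOYEE VALUES (1, 10), (2, 10), (3, 20);
        INSERT INTO BI_EMPLOYEE_ACTIVITY VALUES
            (1, 'login'), (1, 'logout'), (2, 'login'), (3, 'login');
        """
    )
    return conn


def remaining_after(delete_sql, product_id):
    """Run one delete variant on a fresh database and return what is left."""
    conn = make_db()
    conn.execute(delete_sql, (product_id,))
    conn.commit()
    return conn.execute(
        "SELECT EMPLOYEE_ID, ACTIVITY FROM BI_EMPLOYEE_ACTIVITY "
        "ORDER BY EMPLOYEE_ID, ACTIVITY"
    ).fetchall()


left_in = remaining_after(DELETE_IN, 10)
left_exists = remaining_after(DELETE_EXISTS, 10)
print(left_in, left_exists)
```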
I'll just add that, besides creating an index for the query, you need to watch for locking issues when your table grows really big. Try to lock the table in exclusive mode (if possible), as this takes only a single lock from the database. If that's not possible, commit the delete after every 2500 records so that, if you're stuck with row locking, you don't end up starving the database of locks.
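The batching advice can be sketched as a loop that deletes a limited number of rows and commits after each batch. The version below is only an illustration: it uses SQLite from Python, picks each batch by rowid with LIMIT (in Oracle one would typically limit with ROWNUM instead), and the function name, ACTIVITY column, and sample data are hypothetical.

```python
import sqlite3


def delete_in_batches(conn, product_id, batch_size=2500):
    """Delete matching activity rows in batches, committing after each one."""
    total = 0
    batches = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM BI_EMPLOYEE_ACTIVITY
            WHERE rowid IN (
                SELECT a.rowid
                FROM BI_EMPLOYEE_ACTIVITY a
                JOIN BI_EMPLOYEE e ON e.EMP_ID = a.EMPLOYEE_ID
                WHERE e.PRODUCT_ID = ?
                LIMIT ?
            )
            """,
            (product_id, batch_size),
        )
        conn.commit()  # release locks before starting the next batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        batches += 1
    return total, batches


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE BI_EMPLOYEE (EMP_ID INTEGER, PRODUCT_ID INTEGER);
    CREATE TABLE BI_EMPLOYEE_ACTIVITY (EMPLOYEE_ID INTEGER, ACTIVITY TEXT);
    INSERT INTO BI_EMPLOYEE VALUES (1, 10), (2, 20);
    """
)
# 25 rows for the employee on product 10, 5 rows for the other employee.
conn.executemany(
    "INSERT INTO BI_EMPLOYEE_ACTIVITY VALUES (?, ?)",
    [(1, "event")] * 25 + [(2, "event")] * 5,
)
conn.commit()

total_deleted, batch_count = delete_in_batches(conn, 10, batch_size=10)
rows_left = conn.execute("SELECT COUNT(*) FROM BI_EMPLOYEE_ACTIVITY").fetchone()[0]
print(total_deleted, batch_count, rows_left)
```

A small batch_size is used here only to make the loop run more than once; the 2500 default mirrors the figure mentioned in the answer.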

Delete rows from CTE in SQL SERVER

I have a CTE which is a select statement on a table. Now if I delete 1 row from the CTE, will it delete that row from my base table?
Also is it the same case if I have a temp table instead of CTE?
Checking the DELETE statement documentation, yes, you can use a CTE to delete from and it will affect the underlying table. Similarly for UPDATE statements...
Also is it the same case if I have a temp table instead of CTE?
No, deletion from a temp table will affect the temp table only -- there's no connection to the table(s) the data came from, a temp table is a stand alone object.
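The temp-table half of this answer is easy to check for yourself. The sketch below uses SQLite from Python (an assumption; SQL Server's #temp tables behave the same way in this respect) with hypothetical table names and data. It shows only that a temp table is an independent copy; it does not demonstrate deleting through a CTE, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table.
conn.execute("CREATE TABLE base_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO base_table VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

# The temp table is filled with a copy of the data at this moment.
conn.execute("CREATE TEMP TABLE temp_copy AS SELECT * FROM base_table")

# Deleting from the copy does not touch the table the data came from.
conn.execute("DELETE FROM temp_copy WHERE id = 1")
conn.commit()

base_count = conn.execute("SELECT COUNT(*) FROM base_table").fetchone()[0]
temp_count = conn.execute("SELECT COUNT(*) FROM temp_copy").fetchone()[0]
print(base_count, temp_count)
```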
You can think of CTE as a subquery, it doesn't have a temp table underneath.
So, if you run a delete statement against your CTE, you will delete rows from the table, provided SQL Server can infer which table to update/delete based on your CTE. Otherwise you'll see an error.
If you use temp table, and you delete rows from it, then the source table will not be affected, as temp table and original table don't have any correlation.
In cases where you have a subquery, say one joining multiple tables, and you need to use it in multiple places, both a CTE and a temp table can be used. If, however, you want to delete records based on the subquery condition, then a CTE is the way to go. Sometimes you can simply use the delete statement without a CTE, since only the rows that satisfy the query conditions get deleted, even when multiple conditions are used for filtering.

SQL: Remove rows whose associations are broken (orphaned data)

I have a table called "downloads" with two foreign key columns -- "user_id" and "item_id". I need to select all rows from that table and remove the rows where the User or the Item in question no longer exists. (Look up the User and if it's not found, delete the row in "downloads", then look up the Item and if it's not found, delete the row in "downloads").
It's 3.4 million rows, so all my scripted solutions have been taking 6+ hours. I'm hoping there's a faster, SQL-only way to do this?
Use two anti-joins and OR them together:
delete from your_table
where user_id not in (select id from users_table)
or item_id not in (select id from items_table)
Once that's done, consider adding two foreign keys, each with an ON DELETE CASCADE clause. It'll do this for you automatically.
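A rough sketch of both steps, run against SQLite from Python so it is self-contained (the question appears to be about MySQL, where the DDL differs). It writes the anti-joins with NOT EXISTS rather than NOT IN, which is a substitution on my part, and the downloads_fk table and sample rows are hypothetical. SQLite cannot add a foreign key to an existing table, so the sketch creates a second table with the constraints and copies the cleaned rows into it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    CREATE TABLE downloads (user_id INTEGER, item_id INTEGER);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO items VALUES (10), (20);
    -- (3, 10) has no user and (1, 30) has no item: both are orphans.
    INSERT INTO downloads VALUES (1, 10), (2, 20), (3, 10), (1, 30);
    """
)

# Step 1: remove rows whose user or item no longer exists.
orphans_deleted = conn.execute(
    """
    DELETE FROM downloads
    WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.id = downloads.user_id)
       OR NOT EXISTS (SELECT 1 FROM items i WHERE i.id = downloads.item_id)
    """
).rowcount
conn.commit()

# Step 2: with cascading foreign keys, the database cleans up from now on.
conn.executescript(
    """
    CREATE TABLE downloads_fk (
        user_id INTEGER REFERENCES users (id) ON DELETE CASCADE,
        item_id INTEGER REFERENCES items (id) ON DELETE CASCADE
    );
    INSERT INTO downloads_fk SELECT user_id, item_id FROM downloads;
    """
)
conn.execute("DELETE FROM users WHERE id = 2")
conn.commit()

after_cascade = conn.execute(
    "SELECT user_id, item_id FROM downloads_fk ORDER BY user_id"
).fetchall()
print(orphans_deleted, after_cascade)
```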
delete from your_table where user_id not in (select id from users_table) or item_id not in (select id from items_table)
I don't think there is a faster solution when there are so many rows.
At 6 hours for 3.4 million rows, that is about 157 rows per second on your server.
Check the user id:
if mysql num rows = 0, then delete the download; also check the item_id.
There was also a similar question about the performance of mysql num rows:
MySQL: Fastest way to count number of rows
Edit: I think the best option is to create some triggers so the database server does the job for you.
For the initial cleanup I would use a cron job.
For future reference: for long operations like these, it is possible to optimise the server independently of the SQL. For example, detach the SQL service, defragment the system disk, and, if you can, ensure the SQL log files are on a separate disk drive from the one holding the database.
This will at least reduce the pain of long operations like these.
I've found that in SQL 2008 R2, if the subquery in your "not in" clause returns a null value (perhaps from a table whose reference to this key is nullable), no records will be returned! To correct this, just add a clause to your selects in the union part:
delete from SomeTable where Key not in (
select SomeTableKey from TableB where SomeTableKey is not null
union
select SomeTableKey from TableC where SomeTableKey is not null
)
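The NULL behaviour described here is standard SQL three-valued logic rather than something specific to SQL Server 2008 R2, so it can be reproduced in a small sketch. The one below uses SQLite from Python; the column name SomeKey and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SomeTable (SomeKey INTEGER);
    CREATE TABLE TableB (SomeTableKey INTEGER);  -- nullable reference
    INSERT INTO SomeTable VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (1), (NULL);
    """
)

# With a NULL in the subquery, "x NOT IN (...)" is never true,
# so no row matches.
naive_matches = conn.execute(
    """
    SELECT COUNT(*) FROM SomeTable
    WHERE SomeKey NOT IN (SELECT SomeTableKey FROM TableB)
    """
).fetchone()[0]

# Filtering the NULLs out restores the expected result (keys 2 and 3).
filtered_matches = conn.execute(
    """
    SELECT COUNT(*) FROM SomeTable
    WHERE SomeKey NOT IN (
        SELECT SomeTableKey FROM TableB WHERE SomeTableKey IS NOT NULL
    )
    """
).fetchone()[0]

print(naive_matches, filtered_matches)
```

The same reasoning applies when the NOT IN condition sits in a DELETE: with a NULL in the subquery, no rows are deleted.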