Deleting rows from a big table based on a condition - SQL

I have a huge table with no indexes, and indexes can't be added. I need to delete rows like this:
delete from table1 where id in (
select id from table2 inner join table3 on table2.col1 = table3.col1);
But since the table has a huge number of rows, the delete is taking too much time. What can I do to make it faster, other than adding indexes (not permitted)?
I am using an Oracle database.
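One workaround that often comes up for this pattern, and that the SQLite answer further down uses as well, is to rebuild the table instead of deleting from it: write the rows you want to keep into a new table, then swap. A minimal sketch for Oracle, assuming the table and column names from the question and a hypothetical name table1_keep:
create table table1_keep as
select t1.*
from table1 t1
where not exists (
    select 1
    from table2 t2
    inner join table3 t3 on t2.col1 = t3.col1
    where t2.id = t1.id
);
-- validate table1_keep, then drop table1 and rename table1_keep to table1
Writing the surviving rows once with CREATE TABLE AS SELECT avoids per-row undo/redo and can be much cheaper than deleting a large fraction of an unindexed table.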

Related

Efficiently delete from one table where ID matches another table

I have two tables with a few million records in a PostgreSQL database.
I'm trying to delete rows from one table where the ID matches the ID of another table. I have used the following command:
delete from table1 where id in (select id from table2)
The above command has been taking a lot of time (a few hours), which got me wondering whether there is a faster way to do this operation. Will creating indices help?
I have also tried the delete using a join, as suggested by a few people:
delete from table1 join table2 on table1.id = table2.id
But the above command returned a syntax error. Can this be modified to avoid the error?
Syntax
Your second attempt is not legal DELETE syntax in PostgreSQL. This is:
DELETE FROM table1 t1
USING table2 t2
WHERE t2.id = t1.id;
Consider the "Notes" section for the DELETE command:
PostgreSQL lets you reference columns of other tables in the WHERE condition by specifying the other tables in the USING clause. For example,
[...]
This syntax is not standard.
[...]
In some cases the join style is easier to write or faster to execute than the sub-select style.
Index
Will creating indices help?
The usefulness of indexes always depends on the complete situation. If table1 is big, and much bigger than table2, an index on table1.id should typically help. Typically, id would be your PRIMARY KEY, which is indexed implicitly anyway ...
Also typically, an index on table2 would not help (and would not be used even if it existed).
But like I said: it depends on the complete situation, and you have disclosed precious little.
Other details of your setup might make the deletes expensive. FK constraints, triggers, indexes, locks held by concurrent transactions, table and index bloat ...
Or non-unique rows in table2. (But I would assume id to be unique?) Then you would first extract a unique set of IDs from table2. Depending on cardinalities, a simple DISTINCT or more sophisticated query techniques would be in order ...
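A minimal sketch of that, folding the DISTINCT into the USING clause (same table names as above):
DELETE FROM table1 t1
USING (SELECT DISTINCT id FROM table2) t2
WHERE t2.id = t1.id;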

Optimize deletion query for duplicate rows in SQLite3?

I'm trying to delete the duplicate rows in my table of 3.5 million rows, keeping only the row with the highest user ID in each set of duplicates. I have around 1300 rows to delete, and I am currently using the following query:
delete from Data
where exists (select 1 from Data t2
              where Data.code = t2.code and Data.issue = t2.issue
                and Data.id < t2.id);
The query has run for more than 15 minutes. Is there any way I can optimize this to decrease the time taken? I'm using SQLite version 3.22.0.
Often, deleting a lot of rows in a table is simply inefficient. It can be faster to reconstruct the table.
The idea is to select the rows you want into another table:
create table temp_data as
select t.*
from data t
where t.id = (select max(t2.id)
from data t2
where t2.code = t.code and t2.issue = t.issue
);
For this query, you want an index on (code, issue, id).
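A hypothetical index definition for that (the index name is illustrative):
create index idx_data_code_issue_id on data(code, issue, id);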
Then when the data is safely tucked away and validated, you can empty the existing table and re-insert:
delete from data;
Be sure you have removed any triggers on the table. You can read about SQLite's "truncate" optimization in the documentation. In most other databases, you would use the command truncate table data.
Then, you can re-insert the data:
insert into data
select *
from temp_data;

MSSQL - Question about how insert queries run

We have two tables we want to merge. Say, table1 and table2.
They have the exact same columns and the exact same purpose, the difference being that table2 has newer data.
We used a query with a LEFT JOIN to find the rows that are common between them and skip those rows while merging. The problem is this: both tables have 500M rows.
When we ran the query, it kept going on and on. For an hour it just kept running. We were certain this was because of the large number of rows.
But when we wanted to see how many rows had already been inserted into table2, we ran select count(*) from table2, and it gave us exactly the same row count for table2 as when we started.
Our question is: is that how it's supposed to be? Do the rows all get inserted at the same time, after all the matches have been found?
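For reference, the kind of LEFT JOIN insert described above looks roughly like this; a sketch only, assuming a shared id column and omitting the real column list:
insert into table2 (id /* , other columns ... */)
select t1.id /* , t1.<other columns> ... */
from table1 t1
left join table2 t2 on t1.id = t2.id
where t2.id is null; -- skip rows already present in table2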
If you would like to read uncommitted data, then the count should be modified like this:
select count(*) from table2 WITH (NOLOCK)
NOLOCK is over-used, but in this specific scenario, it might be handy.
No data is inserted or updated row by row.
I have no idea how that is related to "Select count(*) from table2 WITH (NOLOCK)".
The join is taking too long to produce the result set that will be used by the insert operator. So there actually is no insert yet, because no result set is being produced.
The join query is taking too long because the LEFT JOIN condition produces a very, very high cardinality estimate.
So one has to fix the join condition first.
For that, other information is needed: the table schemas, data types and lengths, existing indexes, and the requirements.
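One way to look at the estimated plan (and its cardinality estimates) without actually running the insert is SQL Server's SHOWPLAN option:
set showplan_xml on;
go
-- put the INSERT ... LEFT JOIN statement here; it is compiled, not executed
go
set showplan_xml off;
go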

Fastest options for merging two tables in SQL Server

Consider two very large tables: Table A with 20 million rows, and Table B, with 10 million rows, which has a large overlap with Table A. Both have an identifier column and a bunch of other data. I need to move all items from Table B into Table A, updating where they already exist.
Both table structures:
- Identifier int
- Date datetime
- Identifier A
- Identifier B
- General decimal data (maybe 10 columns)
I can get the items in Table B that are new, and the items in Table B that need to be updated in Table A, very quickly, but I can't get an update or a delete-and-insert to work quickly. What options are available to merge the contents of Table B into Table A (i.e. updating existing records instead of inserting) in the shortest time?
I've tried pulling out existing records in TableB and running a large update on table A to update just those rows (i.e. an update statement per row), and performance is pretty bad, even with a good index on it.
I've also tried doing a one shot delete of the different values out of TableA that exist in TableB and performance of the delete is also poor, even with the indexes dropped.
I appreciate that this may be difficult to perform quickly, but I'm looking for other options that are available to achieve this.
Since you are dealing with two large tables, in-place updates/inserts/merges can be time-consuming operations. I would recommend using some bulk-logging technique to load the desired content into a new table and then perform a table swap:
Example using SELECT INTO:
SELECT *
INTO NewTableA
FROM (
    -- all rows from TableB (the newer data) ...
    SELECT * FROM dbo.TableB b
    UNION ALL
    -- ... plus the TableA rows that have no replacement in TableB
    SELECT * FROM dbo.TableA a WHERE NOT EXISTS (SELECT * FROM dbo.TableB b WHERE b.id = a.id)
) d
exec sp_rename 'TableA', 'BackupTableA'
exec sp_rename 'NewTableA', 'TableA'
Simple or at least Bulk-Logged recovery is highly recommended for such an approach. Also, I assume it has to be done outside business hours, since plenty of missing objects have to be recreated on the new table: indexes, default constraints, the primary key, etc.
A MERGE is probably your best bet if you want both inserts and updates.
MERGE #TableA AS Tgt
USING (SELECT * FROM #TableB) Src  -- TableB supplies the newer rows
ON (Tgt.Identifier = Src.Identifier)
WHEN MATCHED THEN
    UPDATE SET Date = Src.Date, ...
WHEN NOT MATCHED THEN
    INSERT (Identifier, Date, ...)
    VALUES (Src.Identifier, Src.Date, ...);
Note that the MERGE statement must be terminated with a semicolon (;).

Count rows with varbinary column NOT NULL takes a lot of time

This query
SELECT COUNT(*)
FROM Table
WHERE [Column] IS NOT NULL
takes a lot of time. The table has 5000 rows, and the column is of type VARBINARY(MAX).
What can I do?
Your query has to scan the whole table, checking a column that can potentially be very large, with no way to use an index for it. There isn't much you can do to fix this without changing your approach.
One option is to split the table into two tables. The first table could have all the details you have now in it and the second table would have just the file. You can make this a 1-1 table to ensure data is not duplicated.
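A minimal sketch of that split (the FileData column name is hypothetical; Table1/Table2 match the query below; the shared primary key enforces the 1-1 relationship):
CREATE TABLE dbo.Table1 (
    Id INT PRIMARY KEY
    -- ... all the existing detail columns ...
);
CREATE TABLE dbo.Table2 (
    Id       INT PRIMARY KEY REFERENCES dbo.Table1 (Id), -- 1-1 via shared key
    FileData VARBINARY(MAX) NOT NULL
);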
You would only add the binary data as needed into the second table. If it is not needed anymore, you simply delete the record. This will allow you to simply write a JOIN query to get the information you are looking for.
SELECT COUNT(*)
FROM dbo.Table1
INNER JOIN dbo.Table2
    ON Table1.Id = Table2.Id;