How can I improve the performance of this Oracle SQL DELETE query? - sql

I need to delete some unwanted rows from a table based on the result of a SELECT query on another table:
DELETE /*+ parallels(fe) */ FROM fact_x fe
WHERE fe.key NOT IN(
SELECT DISTINCT di.key
FROM dim_x di
JOIN fact_y fa
ON fa.code = di.code
WHERE fa.code_type = 'ABC'
);
The inner SELECT query returns 77 rows and executes in a few milliseconds, but the outer DELETE runs forever (more than 8 hours). I tried to count how many rows would be deleted by converting the DELETE to SELECT COUNT(1), and it's around 66.4 million fact_x rows out of 66.8 million total. I'm not trying to truncate, though; I need to retain the remaining rows.
Is there any other way to achieve this? Would deleting the rows via a PL/SQL cursor work better?

Would it not make more sense just to insert the rows you want to keep into another table, then drop the existing table? Even if there are FKs to disable/recreate/etc., it's almost certain to be faster.
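A minimal sketch of that keep-and-swap approach, using the tables from the question (the new table name, NOLOGGING choice, and the final rebuild steps are assumptions to adapt to your schema):

```sql
-- Sketch only: keep the ~400k rows you want, then swap the tables.
CREATE TABLE fact_x_keep NOLOGGING AS
SELECT fe.*
FROM   fact_x fe
WHERE  fe.key IN (SELECT di.key
                  FROM   dim_x di
                  JOIN   fact_y fa ON fa.code = di.code
                  WHERE  fa.code_type = 'ABC');

-- After validating the row count:
DROP TABLE fact_x;
RENAME fact_x_keep TO fact_x;
-- Then recreate indexes, constraints, and grants on the new table.
```

Copying ~400k rows and dropping the old segment avoids generating undo/redo for 66.4 million individual row deletes.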

Could you add a "toBeDeleted" column? The query to set it wouldn't need that "NOT IN" construction, and deleting the marked rows should also be "simple".
Then again, deleting 99.4% of the 67 million rows will take some time.
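One way that marking idea might look (a sketch; the column name is made up, and note that adding a defaulted nullable column to a 66M-row table physically updates every row):

```sql
-- Default everything to 'Y' (delete), then un-mark the keepers with a
-- positive IN instead of the NOT IN construction.
ALTER TABLE fact_x ADD (to_be_deleted CHAR(1) DEFAULT 'Y');

UPDATE fact_x fe
SET    fe.to_be_deleted = 'N'
WHERE  fe.key IN (SELECT di.key
                  FROM   dim_x di
                  JOIN   fact_y fa ON fa.code = di.code
                  WHERE  fa.code_type = 'ABC');

DELETE FROM fact_x WHERE to_be_deleted = 'Y';
```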

Try /*+ parallel(fe) */. No "S".

Related

How to Optimize SQL query with DISTINCT and fetch first?

I am trying to fetch the first 5 distinct rows using the query SELECT DISTINCT col_A FROM table_name FETCH FIRST 5 ROWS ONLY;
But the table has millions of rows, so DISTINCT is expensive: it scans the whole table just to return 5 rows, taking a lot of time, around 200 seconds for me.
Is there a workaround or subquery for this?
The problem here is that you don't know whether a value is unique unless you check all the other rows. I believe your only solution is to index the column so that non-distinct values are stored together. That might buy you some efficiency when searching; however, it will cost you when inserting data.
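As a sketch of that suggestion (index name is made up): with an index on col_A, the optimizer can read distinct values off the sorted index instead of scanning the whole table, so the query can stop after finding 5 keys.

```sql
CREATE INDEX ix_table_name_col_a ON table_name (col_A);

SELECT DISTINCT col_A
FROM   table_name
FETCH FIRST 5 ROWS ONLY;
```

Whether the optimizer actually uses the index this way depends on the database and on NULLs in col_A; check the execution plan.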

Alternatives to UPDATE statement Oracle 11g

I'm currently using Oracle 11g, and let's say I have a table with the following columns (more or less):
Table1
ID varchar(64)
Status int(1)
Transaction_date date
tons of other columns
And this table has about 1 billion rows. I want to update the status column with a specific WHERE clause, let's say
where transaction_date = somedatehere
What other alternatives can I use rather than just the normal UPDATE statement?
Currently what I'm trying is CTAS (or INSERT INTO ... SELECT) to pull the rows I want to update into another table, using AS COLUMN_NAME so the values are already updated in the new/temporary table. It looks something like this:
INSERT INTO TABLE1_TEMPORARY (
  ID,
  STATUS,
  TRANSACTION_DATE,
  TONS_OF_OTHER_COLUMNS)
SELECT
  ID,
  3 AS STATUS,
  TRANSACTION_DATE,
  TONS_OF_OTHER_COLUMNS
FROM TABLE1
WHERE
  TRANSACTION_DATE = SOMEDATE
So far everything seems to work faster than the normal UPDATE statement. The problem now is that I also need the remaining data from the original table: the rows I do not need to update but do need included in my updated table/list.
What I tried first was to DELETE from the original table using the same WHERE clause, so that in theory everything left in that table would be the data I do not need to update, leaving me with two tables:
TABLE1 --which now contains the rows that i did not need to update
TABLE1_TEMPORARY --which contains the data I updated
But the DELETE statement is itself as slow as the original UPDATE statement, so dropping the delete step brings me to this point:
TABLE1 --which contains BOTH the data that I want to update and do not want to update
TABLE1_TEMPORARY --which contains the data I updated
What other alternatives can I use to get the data that's the opposite of my WHERE clause? (Note that the WHERE clause in this example has been simplified, so I'm not looking for an answer of NOT EXISTS/NOT IN/NOT EQUALS; those clauses are slower than positive clauses anyway.)
I have ruled out deletion by partition since the data I need to update and not update can exist in different partitions, as well as TRUNCATE since I'm not updating all of the data, just part of it.
Is there some kind of JOIN statement I use with my TABLE1 and TABLE1_TEMPORARY in order to filter out the data that does not need to be updated?
I would also like to achieve this with as little REDO/UNDO/LOGGING as possible.
Thanks in advance.
I'm assuming this is not a one-time operation, but you are trying to design for a repeatable procedure.
Partition/subpartition the table in a way so the rows touched are not totally spread over all partitions but confined to a few partitions.
Ensure your transactions wouldn't use these partitions for now.
Per each partition/subpartition you would normally UPDATE, perform CTAS of all the rows (I mean even the rows which stay the same go to TABLE1_TEMPORARY). Then EXCHANGE PARTITION and rebuild index partitions.
At the end rebuild global indexes.
If you don't have Oracle Enterprise Edition, you would need either to CTAS the entire billion rows (followed by ALTER TABLE RENAME instead of ALTER TABLE EXCHANGE PARTITION) or to prepare some kind of "poor man's partitioning" using a view (SELECT UNION ALL SELECT UNION ALL SELECT etc.) over a bunch of tables.
There is some chance that this mess would actually be faster than UPDATE.
I'm not saying that this is elegant or optimal, I'm saying that this is the canonical way of speeding up large UPDATE operations in Oracle.
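A rough sketch of one CTAS-and-exchange cycle (partition, table, and date values are all placeholders; the CASE expression stands in for "the UPDATE you would normally run"):

```sql
-- Build the updated copy of one partition, including unchanged rows.
CREATE TABLE table1_temporary NOLOGGING AS
SELECT id,
       CASE WHEN transaction_date = DATE '2014-01-01' THEN 3
            ELSE status END AS status,
       transaction_date
       -- , tons_of_other_columns
FROM   table1 PARTITION (p_2014_01);

-- Swap the new segment in place of the old partition.
ALTER TABLE table1
  EXCHANGE PARTITION p_2014_01 WITH TABLE table1_temporary
  INCLUDING INDEXES WITHOUT VALIDATION;
```

The exchange itself is a data-dictionary operation, so it generates almost no redo/undo regardless of partition size; the cost is the CTAS plus any global index rebuild afterwards.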
How about keeping the UPDATE in the same table, but breaking it into multiple small chunks?
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 0000000 and 0999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 1000000 and 1999999
COMMIT
UPDATE .. WHERE transaction_date = somedatehere AND id BETWEEN 2000000 and 2999999
COMMIT
This could help if the total workload is manageable but doing it all in one chunk is the problem; this approach breaks it into modest-sized pieces.
Doing it this way could, for example, let other apps keep running and give other workloads a look-in, and it avoids the need for a single humongous transaction in the logfile.
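The same chunking can be driven from a simple PL/SQL loop when the IDs don't fall into neat ranges (a sketch; the chunk size is arbitrary, and the `status <> 3` predicate is an assumption needed so each pass makes progress):

```sql
BEGIN
  LOOP
    UPDATE table1
    SET    status = 3
    WHERE  transaction_date = SOMEDATEHERE
      AND  status <> 3          -- skip rows already updated
      AND  ROWNUM <= 100000;    -- one modest chunk per pass
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
  COMMIT;
END;
/
```

Note this trades one big transaction for many small ones; it is not restartable-safe against concurrent writers without more care.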

Slow self-join delete query

Does it get any simpler than this query?
delete a.* from matches a
inner join matches b ON (a.uid = b.matcheduid)
Yes, apparently it does... because the performance on the above query is really bad when the matches table is very large.
matches is about 220 million records. I am hoping that this DELETE query takes the size down to about 15,000 records. How can I improve the performance of the query? I have indexes on both columns. UID and MatchedUID are the only two columns in this InnoDB table, both are of type INT(10) unsigned. The query has been running for over 14 hours on my laptop (i7 processor).
Deleting so many records can take a while; I think this is as fast as it gets if you do it this way. If you don't want to invest in faster hardware, I suggest another approach:
If you really want to delete 220 million records so that the table then has only 15,000 records left, that's about 99.99% of all entries. Why not
Create a new table,
just insert all the records you want to survive,
and replace your old one with the new one?
Something like this might work a little bit faster:
/* creating the new table */
CREATE TABLE matches_new
SELECT a.* FROM matches a
LEFT JOIN matches b ON (a.uid = b.matcheduid)
WHERE b.matcheduid IS NULL;
/* renaming tables */
RENAME TABLE matches TO matches_old;
RENAME TABLE matches_new TO matches;
After this you just have to check and create your desired indexes, which should be rather fast when dealing with only 15,000 records.
Running EXPLAIN SELECT a.* FROM matches a INNER JOIN matches b ON (a.uid = b.matcheduid) would show which indexes are present and being used.
I might be setting myself up to be roasted here, but in performing a delete operation like this in the midst of a self-join, isn't the query having to recompute the join after each deletion?
While it is clunky and brute force, you might consider either:
A. Create a temp table to store the UIDs resulting from the inner join, then join to THAT, THEN perform the delete.
OR
B. Add a boolean (bit) typed column, use the join to flag each match (this operation should be FAST), and THEN use:
DELETE FROM matches WHERE YourBitFlagColumn = TRUE
Then delete the boolean column.
You probably need to batch your delete. You can do this with a recursive delete using a common table expression, or just iterate with some batch size.
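In MySQL the simplest batched form is a LIMIT-ed delete repeated until no rows are affected (a sketch, reusing the flag column from option B above; the batch size is arbitrary):

```sql
-- Run repeatedly (e.g. from a script) until it reports 0 rows affected:
DELETE FROM matches
WHERE  YourBitFlagColumn = TRUE
LIMIT  10000;
```

Small batches keep each transaction and its locks short, at the cost of rescanning for matching rows on every pass, so an index on the flag column matters here.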

Limit number of rows - TSQL - Merge - SQL Server 2008

Hi all, I have the following MERGE SQL script, which works fine for a relatively small number of rows (up to about 20,000, I've found). However, sometimes the data in Table B can be up to 100,000 rows, and merging that into Table A (currently at 60 million rows) takes quite a while to process, which is understandable as it has to merge 100,000 rows with 60 million existing records!
I was just wondering if there is a better way to do this. Or is it possible to have some sort of count: merge 20,000 rows from Table B into Table A, then delete those merged rows from Table B, then do the next 20,000 rows, and so on until Table B has no rows left?
Script:
MERGE
Table A AS [target]
USING
Table B AS [source]
ON
([target].recordID = [source].recordID)
WHEN NOT MATCHED BY TARGET
THEN
INSERT([recordID],[Field 1],[Field 2],[Field 3],[Field 4],[Field 5])
VALUES([source].[recordID],[source].[Field 1],[source].[Field 2],[source].[Field 3],[source].[Field 4],[source].[Field 5]);
MERGE is overkill for this since all you want is to INSERT missing values.
Try:
INSERT INTO Table_A
([recordID],[Field 1],[Field 2],[Field 3],[Field 4],[Field 5])
SELECT B.[recordID],
B.[Field 1],B.[Field 2],B.[Field 3],B.[Field 4],B.[Field 5]
FROM Table_B as B
WHERE NOT EXISTS (SELECT 1 FROM Table_A A
WHERE A.RecordID = B.RecordID)
In my experience MERGE can perform worse for simple operations like this. I try to reserve it for when you need varying operations depending on conditions, like an UPSERT.
You can definitely do (SELECT TOP 20000 * FROM B ORDER BY [some_column]) AS [source] in USING and then delete those records after the MERGE. So your pseudo-code will look like:
1. Merge top 20000
2. Delete 20000 records from source table
3. Check @@ROWCOUNT. If it's 0, exit; otherwise go to step 1
I'm not sure if it runs any faster than merging all the records at the same time.
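Those steps might be sketched like this in T-SQL (a sketch only; I'm assuming the tables are named Table_A/Table_B and that recordID is unique in Table_B, and the batch size is arbitrary):

```sql
DECLARE @batch INT = 20000;

WHILE EXISTS (SELECT 1 FROM Table_B)
BEGIN
    MERGE Table_A AS [target]
    USING (SELECT TOP (@batch) * FROM Table_B ORDER BY recordID) AS [source]
    ON [target].recordID = [source].recordID
    WHEN NOT MATCHED BY TARGET THEN
        INSERT ([recordID],[Field 1],[Field 2],[Field 3],[Field 4],[Field 5])
        VALUES ([source].[recordID],[source].[Field 1],[source].[Field 2],
                [source].[Field 3],[source].[Field 4],[source].[Field 5]);

    -- Remove the rows that are now present in Table_A,
    -- whether this pass inserted them or they already matched.
    DELETE B
    FROM   Table_B AS B
    WHERE  EXISTS (SELECT 1 FROM Table_A AS A
                   WHERE  A.recordID = B.recordID);
END
```

Each pass shrinks Table_B, so the loop terminates once every recordID from B exists in A.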
Also, are you sure you need MERGE? From what I see in your code INSERT INTO ... SELECT should also work for you.

Querying Oracle with a pick list

I have an Oracle database to which I have read-only access (with no permission to create temporary tables). I have a pick list (in Excel) of 28,000 IDs corresponding to 28,000 rows in a table which has millions of records. How do I write a query to return the 28,000 rows?
I tried creating a table in Access and performing a join through ODBC, but Access freezes/takes an incredibly long time. Would I have to create a query with 28,000 items in an IN statement?
Is there anything in PL/SQL that would make it easier?
Thank you for your time and help.
-JC
What makes your 28,000 rows special?
Is there another field in the records you can use to restrict your query in a WHERE clause (or at least narrow down the millions of rows a bit)? Perhaps the IDs you're interested in fall within a certain range?
The max number of values in an IN (.., ..) list is 1,000 in Oracle 10g.
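A common workaround for that 1,000-value limit (sketched here; table and ID values are placeholders) is to split the pick list into chunks of at most 1,000 and OR the IN lists together:

```sql
SELECT *
FROM   some_table
WHERE  id IN (101, 102, 103 /* ... up to 1,000 values ... */)
   OR  id IN (1101, 1102, 1103 /* ... next chunk of up to 1,000 ... */);
```

With 28,000 IDs that means up to 28 IN lists, which is ugly but stays within the limit; generating the SQL from the Excel sheet is straightforward.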
Try creating an index on the table you created in Access.
That's a painful condition to be in. One workaround is to create a view that contains all of the ids, then join to it.
The example below is Oracle.
WITH ids AS (
SELECT 314 ID FROM DUAL UNION ALL
SELECT 159 ID FROM DUAL UNION ALL
SELECT 265 ID FROM DUAL
)
SELECT VALUE1, VALUE2
FROM SOME_TABLE, ids
WHERE SOME_TABLE.ID = ids.ID
This basically embeds all 28,000 IDs in the WITH clause, allowing you to do the join without actually creating a table.
Ugly, but it should work.
The best way to do it is described here: How to put more than 1000 values into an Oracle IN clause