Would a table lock speed up an update statement in Oracle 10g enterprise?

We have a fairly wide table BaseData with some 33 million rows in it. Then we have an update query that joins it to several other tables containing all kinds of parameters, applies some functions, groups by the original Id, and writes the results back to a few columns of the BaseData table.
This process is very slow, so I'm looking into ways of speeding it up. Most of my experience is with SQL Server, so I don't yet know these Oracle internals.
One thing I suspect is that during the update Oracle creates versions of every row so any other readers can read the unaffected row. This, however, takes up considerable resources. Is there any way to have the update take a write lock on the table so it wouldn't create versions of every row?
Any other tips you guys have for large updates? We already broke it down into batches. Each batch is in a separate partition of the table, and then several updates are run in parallel. But still it's all much too slow.

The short answer is that no, in Oracle, taking an exclusive lock on a table won't prevent other sessions from reading it, nor spare them the work of generating a read-consistent view of the data. Similarly, in Oracle, you can't tell a session to enable "dirty reads."
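To be concrete, even an explicit exclusive lock only serializes writers; other sessions can still query the table and still do the consistent-read work. A minimal sketch, using the table name from the question:
LOCK TABLE BaseData IN EXCLUSIVE MODE;
-- other sessions can still SELECT from BaseData (and still build
-- read-consistent row versions); they are only blocked from DML
-- run the big UPDATE here, then:
COMMIT;   -- releases the table lock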
Well, the first question is what's slow - is it all the work of joining and applying functions, or is it the writing back? How does a SELECT my_updated_resultset FROM BASEDATA JOIN... perform compared to your update statement? Have you verified that there's contention between the readers of BaseData and the update process? Also, is it too slow for the business, or just slower than you think it should be?
Another option to consider is to use partition exchange to perform your updates. The high level concept would be:
CREATE TABLE BASEDATA_XCHG as SELECT * FROM BASEDATA WHERE 1 = 0;
INSERT /*+ append */ INTO BASEDATA_XCHG SELECT my_updated_resultset FROM BASEDATA PARTITION (ONLY_ONE_PARTITION) JOIN...
Create all the required indexes and constraints on the BASEDATA_XCHG table.
ALTER TABLE BASEDATA EXCHANGE PARTITION ONLY_ONE_PARTITION WITH TABLE BASEDATA_XCHG;
If you're updating most of the rows in a partition of BASEDATA table, don't update them - create a new table and exchange it out. Tim Gorman has an excellent paper called "Scaling to Infinity" that covers this concept in greater depth; you may wish to check it out.

In addition to Adam's answer:
Run an EXPLAIN PLAN on your update statement and check the execution plan.
Chances are that adding indexes to support your joins and WHERE conditions can speed up the query.
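For reference, checking the plan in Oracle looks something like this (the UPDATE below is only a stand-in for your real statement, with made-up column and table names):
EXPLAIN PLAN FOR
UPDATE BaseData b
   SET b.result_col = NULL
 WHERE b.id IN (SELECT p.base_id FROM ParamTable p);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);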

Oracle uses undo segments (along with SCNs) for read consistency.
I'm assuming these large batch processes are running on a staging area and not a "prod" instance that is being used by a lot of other processes. If you are updating 25% or more (rough figure) of some big table, it may be better to do a CTAS (CREATE TABLE AS SELECT ...) than to attempt updates. Your CTAS would contain the update logic for the new table. Once done, add indexes/grants/etc. on the new table and rename the new table to the old name. You can also add a PARALLEL hint and NOLOGGING on the CTAS to potentially speed things up even more.
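A rough sketch of that CTAS-and-rename approach (the table and column names here are made up; the SELECT would carry your update logic):
CREATE TABLE BaseData_new NOLOGGING PARALLEL AS
SELECT b.id,
       p.new_value AS result_col   -- the column you would otherwise UPDATE
  FROM BaseData b
  JOIN ParamTable p ON p.base_id = b.id;

-- recreate indexes, grants, and constraints on BaseData_new, then swap the names
ALTER TABLE BaseData RENAME TO BaseData_old;
ALTER TABLE BaseData_new RENAME TO BaseData;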

Related

How to improve the execution time of inserting to a table from selecting rows from a table of million rows in ORACLE?

I have an empty table T1 into which rows have to be inserted by selecting rows from another table T2 in Oracle.
Like,
INSERT INTO T1
SELECT * FROM T2;
The issue is that table T2 has about 10 million rows. This simple SELECT statement seems to execute in around 25-30 seconds on its own. But when it inserts into T1, it takes 20-30 minutes to complete.
Why is the above statement taking such a long time to execute, and what is the best approach to improve inserting data into table T1 by selecting from table T2?
For one thing, the "apparent" execution time of a simple SELECT query is a bit misleading: the database engine figures out how to do the query then returns only the first "chunk" of information to you. (As you then move through the dataset, additional "chunks" are transparently supplied as needed.) But when you specify INSERT, now the database has no choice but to actually go through all those millions of rows.
There are often specialized tools that are specifically intended for "bulk" data operations such as this one. These might be significantly faster.
Another standard practice is to temporarily disable indexes. This avoids the overhead of updating the indexes for every record: the index will be completely rebuilt when you turn it back on. (The aforementioned "bulk operations" tools will usually do things like that automagically.)
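In Oracle, a sketched version of that disable-and-rebuild step, assuming a non-unique index named T1_SOME_IDX:
ALTER INDEX t1_some_idx UNUSABLE;
ALTER SESSION SET skip_unusable_indexes = TRUE;   -- default in 10g+; does not help for unique indexes

INSERT INTO T1 SELECT * FROM T2;
COMMIT;

ALTER INDEX t1_some_idx REBUILD NOLOGGING;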
Adding an APPEND hint may enable a direct path insert, which can avoid generating extra REDO data used for recovery:
INSERT /*+ append */ INTO T1
SELECT * FROM T2;
Adding parallelism can further improve performance:
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ parallel append */ INTO T1
SELECT * FROM T2;
Those two features could shrink the run time from minutes to seconds, but there are a lot of caveats you need to understand. Direct-path writes lock the table and are not recoverable; if the data is important, you may need a fresh backup soon afterwards. Parallel queries work harder, not smarter, and may steal resources from more important jobs. Finding the optimal degree of parallelism is tricky, and direct-path inserts are disabled by many features, like triggers and some kinds of referential integrity constraints.
With the right hardware, system configuration, and code, you can realistically improve performance by 100x. But if you're new to these features, prepare to spend hours learning about them.

avoiding write conflicts while re-sorting a table

I have a large table that I need to re-sort periodically. I am partly basing this on a suggestion I was given to stay away from using cluster keys since I am inserting data ordered differently (by time) from how I need it clustered (by ID), and that can cause re-clustering to get a little out of control.
Since I am writing to the table on an hourly basis, I am wary of causing problems with these two processes conflicting: if I CTAS to a newly sorted temp table and then swap the table name, it seems like I am opening the door to having a write on the source table not make it to the temp table.
I figure I can trigger a flag when I am re-sorting that causes the ETL to pause writing, but that seems a bit hacky and maybe fragile.
I was considering leveraging locking and transactions, but this doesn't seem to be the right use case for this since I don't think I'd be locking the table I am copying from while I write to a new table. Any advice on how to approach this?
I've asked some clarifying questions in the comments regarding the clustering that you are avoiding, but in regards to your sort, have you considered creating a nice 4XL warehouse and leveraging the INSERT OVERWRITE option back into itself? It'd look something like:
INSERT OVERWRITE INTO table SELECT * FROM table ORDER BY id;
Assuming that your table isn't hundreds of TB in size, this will complete rather quickly (inside an hour, I would guess), and any inserts into the table during that period will queue up and wait for it to finish.
There are some reasons to avoid the automatic reclustering, but they're basically all the same reasons why you shouldn't set up a job to re-cluster frequently. You're making the database do all the same work, but without the built in management of it.
If your table is big enough that you are seeing performance issues with the clustering by time, and you know that the ID column is the main way that this table is filtered (in JOINs and WHERE clauses) then this is probably a good candidate for automatic clustering.
So I would recommend at least testing out a cluster key on the ID and then monitoring/comparing performance.
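In Snowflake that test is a one-liner (assuming the column is literally named ID):
ALTER TABLE my_table CLUSTER BY (id);

-- later, check how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('my_table', '(id)');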
To give a brief answer to the question about resorting without conflicts as written:
I might recommend using a time column to re-sort records older than a given time (probably into a separate table). While it's sorting, you may get some new records, but you will be able to use that time column to marry up those new records with the now-sorted older records.
You might consider revoking INSERT, UPDATE, DELETE privileges on the original table within the same script or procedure that performs the CTAS creating the newly sorted copy of the table. After a successful swap you can re-enable the privileges for the roles that are used to perform updates.
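Something along these lines, with placeholder role and table names (Snowflake also has ALTER TABLE ... SWAP WITH ... for the swap itself):
-- before the re-sort: stop the ETL role from writing
REVOKE INSERT, UPDATE, DELETE ON TABLE my_table FROM ROLE etl_role;

CREATE OR REPLACE TABLE my_table_sorted AS
SELECT * FROM my_table ORDER BY id;

ALTER TABLE my_table_sorted SWAP WITH my_table;

-- after a successful swap: restore write access
GRANT INSERT, UPDATE, DELETE ON TABLE my_table TO ROLE etl_role;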

Delete large portion of huge tables

I have a very large table (more than 300 million records) that will need to be cleaned up. Roughly 80% of it will need to be deleted. The database software is MS SQL 2005. There are several indexes and statistics on the table but no external relationships.
The best solution I came up with so far is to put the database into "simple" recovery mode, copy all the records I want to keep to a temporary table, truncate the original table, set IDENTITY_INSERT on, and copy back the data from the temp table.
It works, but it's still taking several hours to complete. Is there a faster way to do this?
As per the comments, my suggestion would be to simply dispense with the copy-back step and promote the table containing the records to be kept to become the new main table by renaming it.
It should be quite straightforward to script out the index/statistics creation to be applied to the new table before it gets swapped in.
The clustered index should be created before the non clustered indexes.
A couple of points I'm not sure about though.
Whether it would be quicker to insert into a heap and then create the clustered index afterwards. (I guess not, if the insert can be done in clustered index order.)
Whether the original table should be truncated before being dropped (I guess yes)
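A rough sketch of the rename-based swap described above (the names and the keep condition are made up; script the real indexes onto the new table before swapping):
-- copy only the ~20% of rows to keep into a new table
SELECT *
INTO   dbo.BigTable_keep
FROM   dbo.BigTable
WHERE  KeepFlag = 1;   -- hypothetical "keep" criterion

-- create the clustered index first, then the nonclustered indexes, on BigTable_keep

EXEC sp_rename 'dbo.BigTable', 'BigTable_old';
EXEC sp_rename 'dbo.BigTable_keep', 'BigTable';
-- DROP TABLE dbo.BigTable_old;   -- once you are happy with the result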
@uriDium -- Chunking using batches of 50,000 will escalate to a table lock, unless you have disabled lock escalation via ALTER TABLE (SQL 2008) or other various locking tricks.
I am not sure what the structure of your data is. When does a row become eligible for deletion? If it is purely an ID-based or date-based thing, then you can create a new table for each day, insert your new data into the new tables, and when it comes to cleaning, simply drop the required tables. Then for any selects, construct a view over all the tables. Just an idea.
EDIT: (In response to comments)
If you are maintaining a view over all the tables then no, it won't be complicated at all. The complex part is coding the dropping and recreating of the view.
I am assuming that you don't want your data to be locked down too much during deletes. Why not chunk the delete operations? Create an SP that will delete the data in chunks, 50,000 rows at a time. This should make sure that SQL Server keeps a row lock instead of a table lock. Use the
WAITFOR DELAY 'x'
in your WHILE loop so that you can give other queries a bit of breathing room. Your problem is the age-old computer-science trade-off: space vs. time.
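A sketch of such a chunked delete (the table name, delete criterion, and delay are placeholders):
DECLARE @rows INT;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (50000)
    FROM   dbo.BigTable
    WHERE  CreatedDate <= '20080630';

    SET @rows = @@ROWCOUNT;

    WAITFOR DELAY '00:00:05';   -- give other queries a bit of breathing room
END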

SQL Server 2005 efficient delete

My issue at hand is that I need to remove about 60M records from a table without causing deadlocks with other processes that use this table. At this point I'm almost done removing the records, using a while loop that only processes about 1M records at a time; however, it's taken all day.
Q1: What is the optimal way to remove large quantities of data from a table, keeping the table online and minimal impact to other resources that need to use this table in MS SQL Server 2005?
Q2: Is there a way to implement the individual row locking (rather than table locking) in SQL Server like they have in Oracle? (Note answering this may answer Q1).
A2: So, as @Remus Rusanu informed me, there is a way to do row-level locking with a delete.
See this thread, the original poster actually did some tests and posted the most efficient method. An MVP did originally chime in with an option to actually insert the data you want to retain into a temp table and then truncate the original table and reinsert.
I just did something like that recently. I simply created a SQL Server job that ran every 10 mins deleting a million rows. Code follows
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
DELETE TOP (1000000) FROM BIG_TABLE WHERE CreatedDate <= '20080630';
Said table in question had about 900 mil rows to start with. Didn't notice any significant performance issues.
The most efficient way is to use partition switching, see Transferring Data Efficiently by Using Partition Switching. The drawback is that it requires planning ahead in how the partitions are deployed.
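Conceptually it boils down to something like this (a sketch only; the staging table must match the source partition's structure and reside on the same filegroup):
-- move an entire partition's rows out of the main table as a metadata-only operation
ALTER TABLE dbo.BigTable SWITCH PARTITION 1 TO dbo.BigTable_staging;

-- then get rid of those rows without touching the main table
TRUNCATE TABLE dbo.BigTable_staging;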
If partition switching is not available, then the answer depends on the actual table schema. You had better post the actual schema (including all indexes and, most importantly, the clustered key definition) and the criteria that qualify the delete candidates.
As for Q2, SQL Server has had row-level locking since the mid-90s, so I don't know what you're actually asking.

Oracle SQL technique to avoid filling trans log

Newish to Oracle programming (from Sybase and MS SQL Server). What is the "Oracle way" to avoid filling the trans log with large updates?
In my specific case, I'm doing an update of potentially a very large number of rows. Here's my approach:
UPDATE my_table
SET a_col = null
WHERE my_table_id IN
(SELECT my_table_id FROM my_table WHERE some_col < some_val and rownum < 1000)
...where I execute this inside a loop until the updated row count is zero.
Is this the best approach?
Thanks,
The amount of updates to the redo and undo logs will not be reduced at all if you break up the UPDATE into multiple runs of, say, 1000 records. On top of that, the total query time will most likely be higher compared to running a single large SQL statement.
There's no real way to address the UNDO/REDO log issue in UPDATEs. With INSERTs and CREATE TABLEs you can use a DIRECT aka APPEND option, but I guess this doesn't easily work for you.
It depends on the percentage of rows almost as much as on the number. It also depends on whether the update makes the rows longer than before, i.e. going from null to 200 bytes in every row. This could have an effect on your performance - chained rows.
Either way, you might want to try this.
Build a new table with the column corrected as part of the select instead of an update. You can build that new table via CTAS (Create Table as Select) which can avoid logging.
Drop the original table.
Rename the new table.
Reindex, repoint constraints, rebuild triggers, recompile packages, etc.
You can avoid a lot of logging this way.
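A sketch of that, reusing the names from the question (the remaining columns and the exact condition are hypothetical):
CREATE TABLE my_table_new NOLOGGING AS
SELECT my_table_id,
       some_col,
       CASE WHEN some_col < some_val THEN NULL ELSE a_col END AS a_col
  FROM my_table;

DROP TABLE my_table;
ALTER TABLE my_table_new RENAME TO my_table;
-- then recreate indexes, constraints, grants, and triggers on MY_TABLE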
Any UPDATE is going to generate redo. Realistically, a single UPDATE that updates all the rows is going to generate the smallest total amount of redo and run for the shortest period of time.
Assuming you are updating the vast majority of the rows in the table, if there are any indexes that use A_COL, you may be better off disabling those indexes before the update and then rebuilding those indexes with NOLOGGING specified after the massive UPDATE statement. In addition, if there are any triggers or foreign keys that would need to be fired/validated as a result of the update, getting rid of those temporarily might be helpful.
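For example, if there were an index on A_COL, a rough version (with a hypothetical index name) would be:
ALTER INDEX my_table_a_col_idx UNUSABLE;

UPDATE my_table SET a_col = NULL WHERE some_col < some_val;
COMMIT;

ALTER INDEX my_table_a_col_idx REBUILD NOLOGGING;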