Bulk insert into partitioned table and table level lock - sql

I want to know the core reason (the mechanics of segments, blocks, and locks that the engine uses) why a bulk insert (with direct-path) locks the entire table, so that if I insert into one partition, I can't truncate another partition which is apparently not affected by the insert.
A conventional insert (without the append hint) allows truncating unaffected partitions. (Note that I am talking about an uncommitted transaction.)
Below is an example to illustrate it.
Let be a table:
CREATE TABLE FG_TEST
(COL NUMBER )
PARTITION BY RANGE (COL)
(PARTITION "P1" VALUES LESS THAN (1000),
PARTITION "P2" VALUES LESS THAN (2000));
insert into fg_test values (1);
insert into fg_test values (1000);
commit;
Session 1:
insert into fg_test select * from fg_test where col >= 1000;
--1 row inserted
Session 2:
alter table fg_test truncate partition p1;
--table truncated
Session 1:
rollback;
insert /*+ append */ into fg_test select * from fg_test where col >= 1000;
--1 row inserted
Session 2:
alter table fg_test truncate partition p1;
--this throws ORA-00054: resource busy and acquire with NOWAIT specified
--or timeout expired
The documentation on Direct-Path Insert is pretty abrupt on this subject and just says:
During direct-path INSERT, the database obtains exclusive locks on the
table (or on all partitions of a partitioned table). As a result,
users cannot perform any concurrent insert, update, or delete
operations on the table, and concurrent index creation and build
operations are not permitted.
The section "How Direct-Path INSERT Works" does not explain why the lock is needed on all partitions.
And why does a conventional insert not lock unaffected partitions? (My intuition is that the locking is done at block level.)

Your premise is slightly wrong. A direct-path insert does not lock the entire table if you use the partition extension clause.
Session 1:
insert /*+ append */ into fg_test partition (p2)
select * from fg_test where col >=1000;
Session 2:
alter table fg_test truncate partition p1;
--table truncated
The new question is: When the partition extension clause is NOT used, why do conventional and direct-path inserts have different locking mechanisms? This clarification makes the question easier, but without inside knowledge the answer below is still only a guess.
It was easier to code a feature that locks the entire table. And it runs faster, since there is no need to track which partitions are updated.
There's usually no need for a more fine-grained lock. Most systems or processes that use direct-path writes only update one large table at a time. If a more fine-grained lock is really needed, the partition extension clause can be used. It's not quite as convenient, since only one partition can be referenced at a time. But it's good enough 99.9% of the time.
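If you want to see what is locked in each case, one way is to compare the TM (table/partition) locks that the inserting session holds. A quick sketch, assuming you have access to v$lock and dba_objects (lock mode 6 is exclusive, 3 is row-exclusive):
select o.object_name, o.subobject_name, l.lmode
from v$lock l
join dba_objects o on o.object_id = l.id1
where l.type = 'TM'
and o.object_name = 'FG_TEST';
Run it from a third session while each insert is still uncommitted and compare the lock modes held in the two cases.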

I found the following answer on asktom.oracle.com:
Ask Tom: Inserts with APPEND Hint
Tom explains many of the inner workings, but the reason why Oracle locks the whole table and not only affected partitions is still not clear.
Maybe it's just a design decision (e.g. not wanting the big, bulky direct load to be potentially blocked by one small uncommitted transaction, and therefore locking all partitions...).

Related

What is the best way to insert 6000000 records from one table to another table in ORACLE?

Hello guys, I need to copy 6,000,000 rows from TMP_DATA to DATA. What is the best way to do this?
I was thinking of doing INSERT INTO DATA SELECT * FROM TMP_DATA, but I think it will take ages to do the insert.
What do you suggest?
Kind Regards,
To expand a bit on Anders' answer and mathguy's comments, do the following:
alter table data nologging;
alter session enable parallel dml;
-- disable any triggers on `data` and temporarily drop any indexes
insert /*+ append */ into data
select /*+ parallel (4) */ * from tmp_data
--sample (10) -- if tmp_data has 60 million rows: 10 means 10%
-- where rownum < 6000001
-- pick one of the two prior clauses if tmp_table has > 6 million rows
after the insert is done:
alter table data logging;
-- enable triggers and recreate indexes
and have the DBA take a backup afterwards, since data loaded with NOLOGGING cannot be recovered from the redo stream if there is a media failure after the load.
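The trigger/index step mentioned in the comments above might look something like the following sketch (the trigger, index, and column names are hypothetical):
alter trigger data_trg disable;
drop index data_ix;
-- ... run the insert above, then:
create index data_ix on data (some_col) parallel 4 nologging;
alter index data_ix noparallel;
alter index data_ix logging;
alter trigger data_trg enable;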
There are a couple of ways of doing this:
If you want speed use parallel and nologging so (on a new table):
-- Caveat: this method is fast but will use a lot of cpu resources so just let
-- the DBA know. Also, index the table at the end.
create table DATA parallel 4 nologging as
select * from TMP_DATA;
If you are using an existing table, one of the things that can hurt insert performance is the presence of indexes. You can temporarily drop the indexes (or mark them unusable) to allow faster insertion and rebuild them afterwards.

Oracle Table LOGGING Mode - Bulk data insertion

I have a scenario wherein I need to copy 500 million rows from Table1 to Table2. A couple of points:
Table1 has 2 billion rows.
Table2 is a new table, identical to Table1.
Table1 and Table2 are both list-partitioned.
Both tables have to be in the same tablespace, and the tablespace is created in LOGGING mode.
Tablespace block size: 8192, FORCE_LOGGING NO, AUTOEXTEND ON, redo archival enabled.
So here is my approach for this activity; I am asking for recommendations to improve it, or perhaps to prevent some sudden unwanted situations.
Create Table2 with same structure without any indexes or PK.
Alter Table2 nologging; -- put the table in NOLOGGING mode to stop redo generation, purely to improve performance.
Do this activity in 50 parallel jobs (jobs created based on the partition column). The partition column has 120 distinct values, so 120 jobs in total; the first 50 are submitted, and as soon as one finishes, the 51st is submitted, and so on.
Using a cursor, bulk fetch with a LIMIT of 5000 and FORALL for the insert (with the APPEND hint). Commit immediately after each iteration, so the commit frequency is 5000 rows.
After all the jobs are finished, put Table2 back in LOGGING mode.
alter table Table2 logging;
Create all required indexes and the PK on Table2 in parallel mode, and then alter the indexes to NOPARALLEL.
Any suggestions? Thanks a lot for your time.
Use a single SQL statement (INSERT ... SELECT) instead of PL/SQL.
There's no need to commit in chunks or to have a parallel strategy that mirrors the partitions. If the APPEND hint works and a direct-path write is used, then there won't be any significant REDO or UNDO usage, so you don't need to run in chunks to reduce resource consumption. Oracle can easily divide a segment into granules; it's just copying a bunch of blocks from one place to another, so it doesn't matter whether it processes them per partition. (Some possible exceptions are if you're using an unusual column type that doesn't support parallel SQL, or if you're joining tables and relying on partition-wise joins.)
alter session enable parallel dml;
alter table table2 nologging;
--Picking a good DOP can be tricky. 32 might not be the best number for you.
insert /*+ append parallel(32) */ into table2
select * from table1;
commit;
alter table table2 logging;
Before you run this, check the execution plan. There are lots of things that can prevent direct-path writes and you want to find them before you start the DML.
In the execution plan, make sure you see "LOAD AS SELECT" to ensure direct-path writes, "PX" to ensure parallelism, and a "PX" operation before the "LOAD AS SELECT" to ensure that both the writes and the reads are done in parallel.
alter session enable parallel dml;
alter table table2 nologging;
explain plan for
insert /*+ append parallel(32) */ into table2
select * from table1;
select * from table(dbms_xplan.display);
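The exact operations vary by version and plan, but for a parallel direct-path insert the output should look roughly like this (an illustrative sketch, not real output from these tables):
---------------------------------------------------
| Id | Operation               | Name     |
---------------------------------------------------
|  0 | INSERT STATEMENT        |          |
|  1 |  PX COORDINATOR         |          |
|  2 |   PX SEND QC (RANDOM)   | :TQ10000 |
|  3 |    LOAD AS SELECT       | TABLE2   |
|  4 |     PX BLOCK ITERATOR   |          |
|  5 |      TABLE ACCESS FULL  | TABLE1   |
---------------------------------------------------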
I often find it's not worth dealing with indexes separately. But that may depend on the number of indexes.

Oracle MERGE deadlock

I want to insert rows with a MERGE statement in a specified order to avoid deadlocks. Deadlocks could otherwise happen because multiple transactions will call this statement with overlapping sets of keys. Note that this code is also sensitive to a duplicate value exception, but I handle that by retrying, so that is not my question. I was doing the following:
MERGE INTO targetTable t
USING (
SELECT ...
FROM sourceCollection
ORDER BY <desiredUpdateOrder>
) s
ON (<matchCondition>)
WHEN MATCHED THEN
UPDATE ...
WHEN NOT MATCHED THEN
INSERT ...
Now I'm still getting the deadlock, so I'm becoming unsure whether Oracle maintains the order of the sub-query. Does anyone know how best to make sure that Oracle locks the rows in targetTable in the same order in this case? Do I have to do a SELECT FOR UPDATE before the merge? In which order does the SELECT FOR UPDATE lock the rows? The Oracle UPDATE statement has an ORDER BY clause that MERGE seems to be missing. Is there another way to avoid deadlocks, other than locking the rows in the same order every time?
[Edit]
This query is used to maintain a count of how often a certain action has taken place. When the action happens for the first time a row is inserted; when it happens again, the "count" column is incremented. There are millions of different actions and they happen very often, so a table lock wouldn't work.
Controlling the order in which the target table rows are modified requires that you control the query execution plan of the USING subquery. That's a tricky business, and depends on what sort of execution plans your query is likely to be getting.
If you're getting deadlocks then I'd guess that you're getting a nested loop join from the source collection to the target table, as a hash join would probably be based on hashing the source collection and would modify the target table roughly in target-table rowid order because that would be full scanned -- in any case, the access order would be consistent across all of the query executions.
Likewise, if there was a sort-merge between the two data sets you'd get consistency in the order in which target table rows are accessed.
Ordering the source collection seems desirable, but the optimiser might not be applying it, so check the execution plan. If it is not, then try inserting your data into a global temporary table using APPEND and an ORDER BY clause, then selecting from there without an ORDER BY, and explore the use of hints to entrench a nested loop join.
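A rough sketch of that approach, with hypothetical names (src_gtt, key_col, val_col); whether the nested loop really visits the target rows in the GTT's insertion order still has to be verified from the plan:
create global temporary table src_gtt (key_col number, val_col number)
on commit preserve rows;
insert /*+ append */ into src_gtt
select key_col, val_col from sourceCollection order by key_col;
commit; -- a direct-path insert must be committed before the GTT can be read back (ORA-12838)
merge /*+ leading(s t) use_nl(t) */ into targetTable t
using src_gtt s
on (t.key_col = s.key_col)
when matched then update set t.val_col = s.val_col
when not matched then insert (key_col, val_col) values (s.key_col, s.val_col);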
I don't believe the ORDER BY will affect anything (though I'm more than willing to be proven wrong); I think MERGE will lock everything it needs to.
Assume I'm completely wrong, assume that you get row-by-row locks with MERGE. Your problem still isn't solved as you have no guarantees that your two MERGE statements won't hit the same row simultaneously. In fact, from the information given, you have no guarantees that an ORDER BY improves the situation; it might make it worse.
Despite there being no "skip locked rows" syntax for MERGE as there is for SELECT ... FOR UPDATE, there is still a simple answer: stop trying to update the same row from within different transactions. If feasible, you can use some form of parallel execution, for instance the DBMS_PARALLEL_EXECUTE subprogram CREATE_CHUNKS_BY_ROWID (sketched below), and ensure that your transactions only work on a specific subset of the rows in the table.
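A minimal sketch of what that looks like, assuming a hypothetical task name and a simple per-chunk statement (the real per-chunk SQL would be your MERGE restricted to the chunk's rowid range):
BEGIN
  DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'merge_counts');
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
      task_name   => 'merge_counts',
      table_owner => USER,
      table_name  => 'TARGETTABLE',
      by_row      => TRUE,
      chunk_size  => 10000);
  -- each chunk only touches rowids between :start_id and :end_id,
  -- so concurrent chunks never compete for the same rows
  DBMS_PARALLEL_EXECUTE.RUN_TASK(
      task_name      => 'merge_counts',
      sql_stmt       => 'update targetTable set cnt = cnt + 1
                         where rowid between :start_id and :end_id',
      language_flag  => DBMS_SQL.NATIVE,
      parallel_level => 4);
  DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => 'merge_counts');
END;
/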
As an aside, I'm a little worried by your description of the problem. You say there's some duplicate-key erroring that you fix by re-running the MERGE. If the data in these duplicates differs, you need to ensure that the ORDER BY covers not only the data being merged but also the data being merged into; otherwise there's no guarantee that you won't overwrite correct data with older, incorrect data.
First, locks are not really managed at row level but at block level, so you may encounter an ORA-00060 error even without modifying the same row. This can be tricky, and managing it is the developer's job.
One possible workaround is to organize your table as an index-organized table (never do that on huge tables or on tables with heavy change rates):
https://use-the-index-luke.com/sql/clustering/index-organized-clustered-index
Rather than doing a merge, I suggest that you try to lock the row. If successful, update it; if not, insert a new row. By default the lock will wait if another process has a lock on the same row.
CREATE TABLE brianl.deleteme_table
(
id INTEGER PRIMARY KEY
, cnt INTEGER NOT NULL
);
CREATE OR REPLACE PROCEDURE brianl.deleteme_table_proc (
p_id IN deleteme_table.id%TYPE)
AUTHID DEFINER
AS
l_id deleteme_table.id%TYPE;
-- This isolates this procedure so that it doesn't commit
-- anything outside of the procedure.
PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
-- select the row for update
-- this will pause if someone already has the row locked.
SELECT id
INTO l_id
FROM deleteme_table
WHERE id = p_id
FOR UPDATE;
-- Row was locked, update it.
UPDATE deleteme_table
SET cnt = cnt + 1
WHERE id = p_id;
COMMIT;
EXCEPTION
WHEN NO_DATA_FOUND
THEN
-- we were unable to lock the record, insert a new row
INSERT INTO deleteme_table (id, cnt)
VALUES (p_id, 1);
COMMIT;
END deleteme_table_proc;
/
CREATE OR REPLACE PROCEDURE brianl.deleteme_proc_test
AUTHID CURRENT_USER
AS
BEGIN
-- This resets the table to empty for the test
EXECUTE IMMEDIATE 'TRUNCATE TABLE brianl.deleteme_table';
brianl.deleteme_table_proc (p_id => 1);
brianl.deleteme_table_proc (p_id => 2);
brianl.deleteme_table_proc (p_id => 3);
brianl.deleteme_table_proc (p_id => 2);
FOR eachrec IN ( SELECT id, cnt
FROM brianl.deleteme_table
ORDER BY id)
LOOP
DBMS_OUTPUT.put_line (
a => 'id: ' || eachrec.id || ', cnt:' || eachrec.cnt);
END LOOP;
END;
/
BEGIN
-- runs the test;
brianl.deleteme_proc_test;
END;
/

Can an INSERT operation result in a deadlock?

Assuming:
I am using REPEATABLE_READ or SERIALIZABLE transaction isolation (locks get retained every time I access a row)
We are talking about multiple threads accessing multiple tables simultaneously.
I have the following questions:
Is it possible for an INSERT operation to cause a deadlock? If so, please provide a detailed scenario demonstrating how a deadlock may occur (e.g. Thread 1 does this, Thread 2 does that, ..., deadlock).
For bonus points: answer the same question for all other operations (e.g. SELECT, UPDATE, DELETE).
UPDATE:
3. For super bonus points: how can I avoid a deadlock in the following scenario?
Given tables:
permissions[id BIGINT PRIMARY KEY]
companies[id BIGINT PRIMARY KEY, name VARCHAR(30), permission_id BIGINT NOT NULL, FOREIGN KEY (permission_id) REFERENCES permissions(id)]
I create a new Company as follows:
INSERT INTO permissions; -- Inserts permissions.id = 100
INSERT INTO companies (name, permission_id) VALUES ('Nintendo', 100); -- Inserts companies.id = 200
I delete a Company as follows:
SELECT permission_id FROM companies WHERE id = 200; -- returns permission_id = 100
DELETE FROM companies WHERE id = 200;
DELETE FROM permissions WHERE id = 100;
In the above example, the INSERT locking order is [permissions, companies] whereas the DELETE locking order is [companies, permissions]. Is there a way to fix this example for REPEATABLE_READ or SERIALIZABLE isolation?
Generally, all modifications can cause a deadlock and selects will not (I'll get to that later). So:
No, you cannot ignore these.
You can somewhat ignore SELECT depending on your database and settings, but the others will give you deadlocks.
You don't even need multiple tables.
The best way to create a deadlock is to do the same thing in a different order.
SQL Server examples:
create table A
(
PK int primary key
)
Session 1:
begin transaction
insert into A values(1)
Session 2:
begin transaction
insert into A values(7)
Session 1:
delete from A where PK=7
Session 2:
delete from A where PK=1
You will get a deadlock. So that proved inserts & deletes can deadlock.
Updates are similar:
Session 1:
begin transaction
insert into A values(1)
insert into A values(2)
commit
begin transaction
update A set PK=7 where PK=1
Session 2:
begin transaction
update A set pk=9 where pk=2
update A set pk=8 where pk=1
Session 1:
update A set pk=9 where pk=2
Deadlock!
SELECT should never deadlock but on some databases it will because the locks it uses interfere with consistent reads. That's just crappy database engine design though.
SQL Server will not lock on a SELECT if you use SNAPSHOT ISOLATION. Oracle & I think Postgres will never lock on SELECT (unless you have FOR UPDATE which is clearly reserving for an update anyway).
So basically I think you have a few incorrect assumptions. I think I've proved:
Updates can cause deadlocks
Deletes can cause deadlocks
Inserts can cause deadlocks
You do not need more than one table
You do need more than one session
You'll just have to take my word on SELECT ;) but it will depend on your DB and settings.
In addition to LoztInSpace's answer, inserts may cause deadlocks even without any deletes or updates present. All you need is a unique index and a reversed operation order.
An example in Oracle:
create table t1 (id number);
create unique index t1_pk on t1 (id);
--thread 1 :
insert into t1 values(1);
--thread 2
insert into t1 values(2);
--thread 1 :
insert into t1 values(2);
--thread 2
insert into t1 values(1); -- deadlock !
Let us assume you have two relations A and B and two users X and Y. Table A is write-locked by user X and table B is write-locked by user Y. Then the following query will give you a deadlock if issued by both users X and Y:
Select * from A, B
So clearly a SELECT operation can cause a deadlock if a join involving more than one table is part of it. Usually INSERT and DELETE operations involve a single relation, so they may not cause a deadlock.

Does "SELECT FOR UPDATE" prevent other connections inserting when the row is not present?

I'm interested in whether a SELECT FOR UPDATE query will lock a non-existent row.
Example
Table FooBar with two columns, foo and bar; foo has a unique index.
Issue query SELECT bar FROM FooBar WHERE foo = ? FOR UPDATE
If the first query returns zero rows, issue a query
INSERT INTO FooBar (foo, bar) values (?, ?)
Now is it possible that the INSERT would cause an index violation or does the SELECT FOR UPDATE prevent that?
Interested in behavior on SQLServer (2005/8), Oracle and MySQL.
MySQL
SELECT ... FOR UPDATE with UPDATE
Using transactions with InnoDB (auto-commit turned off), a SELECT ... FOR UPDATE allows one session to temporarily lock down a particular record (or records) so that no other session can update it. Then, within the same transaction, the session can actually perform an UPDATE on the same record and commit or roll back the transaction. This would allow you to lock down the record so no other session could update it while perhaps you do some other business logic.
This is accomplished with locking. InnoDB utilizes indexes for locking records, so locking an existing record seems easy--simply lock the index for that record.
SELECT ... FOR UPDATE with INSERT
However, to use SELECT ... FOR UPDATE with INSERT, how do you lock an index for a record that doesn't exist yet? If you are using the default isolation level of REPEATABLE READ, InnoDB will also utilize gap locks. As long as you know the id (or even range of ids) to lock, then InnoDB can lock the gap so no other record can be inserted in that gap until we're done with it.
If your id column were an auto-increment column, then SELECT ... FOR UPDATE with INSERT INTO would be problematic because you wouldn't know what the new id was until you inserted it. However, since you know the id that you wish to insert, SELECT ... FOR UPDATE with INSERT will work.
CAVEAT
On the default isolation level, SELECT ... FOR UPDATE on a non-existent record does not block other transactions. So, if two transactions both do a SELECT ... FOR UPDATE on the same non-existent index record, they'll both get the lock, and neither transaction will be able to update the record. In fact, if they try, a deadlock will be detected.
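A rough sketch of that sequence (assuming InnoDB at the default REPEATABLE READ level, a unique index on foo, and no row with foo = 10 yet):
Session 1:
START TRANSACTION;
SELECT bar FROM FooBar WHERE foo = 10 FOR UPDATE; -- 0 rows, acquires a gap lock
Session 2:
START TRANSACTION;
SELECT bar FROM FooBar WHERE foo = 10 FOR UPDATE; -- 0 rows, also acquires the gap lock
Session 1:
INSERT INTO FooBar (foo, bar) VALUES (10, 1); -- blocks, waiting on session 2's gap lock
Session 2:
INSERT INTO FooBar (foo, bar) VALUES (10, 2);
-- ERROR 1213 (40001): Deadlock found when trying to get lock; one transaction is rolled back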
Therefore, if you don't want to deal with a deadlock, you might just do the following:
INSERT INTO ...
Start a transaction, and perform the INSERT. Do your business logic, and either commit or rollback the transaction. As soon as you do the INSERT on the non-existent record index on the first transaction, all other transactions will block if they attempt to INSERT a record with the same unique index. If the second transaction attempts to insert a record with the same index after the first transaction commits the insert, then it will get a "duplicate key" error. Handle accordingly.
SELECT ... LOCK IN SHARE MODE
If you SELECT ... LOCK IN SHARE MODE before the INSERT and a previous transaction has inserted that record but hasn't committed yet, the SELECT will block until the previous transaction has completed.
So to reduce the chance of duplicate key errors, especially if you hold the locks for a while while performing business logic before committing or rolling back:
SELECT bar FROM FooBar WHERE foo = ? LOCK IN SHARE MODE
If no records returned, then
INSERT INTO FooBar (foo, bar) VALUES (?, ?)
In Oracle, the SELECT ... FOR UPDATE has no effect on a non-existent row (the statement simply raises a No Data Found exception). The INSERT statement will prevent a duplicates of unique/primary key values. Any other transactions attempting to insert the same key values will block until the first transaction commits (at which time the blocked transaction will get a duplicate key error) or rolls back (at which time the blocked transaction continues).
On Oracle:
Session 1
create table t (id number);
alter table t add constraint pk primary key(id);
SELECT *
FROM t
WHERE id = 1
FOR UPDATE;
-- 0 rows returned
-- with zero rows returned there are no row locks, but the statement still takes a ROW SHARE table lock, preventing others from locking the table in exclusive mode
Session 2
SELECT *
FROM t
FOR UPDATE;
-- 0 rows returned
-- there are no problems with locking here
rollback; -- releases lock
INSERT INTO t
VALUES (1);
-- 1 row inserted without problems
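Continuing the sketch to illustrate the blocking behaviour described above:
Session 1
INSERT INTO t VALUES (1);
-- blocks: session 2 holds an uncommitted row with the same primary key value
Session 2
commit;
-- session 1 now fails with ORA-00001: unique constraint violated
-- (a rollback in session 2 would instead have let session 1's insert succeed)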
I wrote a detailed analysis of this thing on SQL Server: Developing Modifications that Survive Concurrency
Anyway, you need to use SERIALIZABLE isolation level, and you really need to stress test.
SQL Server only has the FOR UPDATE as part of a cursor. And, it only applies to UPDATE statements that are associated with the current row in the cursor.
So, the FOR UPDATE has no relationship with INSERT. Therefore, I think your answer is that it's not applicable in SQL Server.
Now, it may be possible to simulate the FOR UPDATE behavior with transactions and locking strategies. But, that may be more than what you're looking for.