Oracle Table LOGGING Mode - Bulk data insertion - sql

I have a scenario wherein I need to copy 500 million rows from Table1 to Table2. A couple of points:
Table1 has 2 billion rows.
Table2 is a new table and identical to Table1.
Table1 and Table2 both are of List partition type.
Both tables have to be in the same tablespace, and the tablespace is created in LOGGING mode.
Tablespace block size is 8192, FORCE_LOGGING is NO, AUTOEXTEND is ON, and redo archival is enabled.
So here is my approach for this activity, and I ask for recommendations to improve it or perhaps prevent some sudden unwanted situations.
Create Table2 with same structure without any indexes or PK.
Alter Table2 nologging; --Putting the table in NOLOGGING mode to stop redo generation. This is done just to improve performance.
Do this activity in 50 parallel jobs (jobs are created based on the partitioning column). The partitioning column has 120 distinct values, so there are 120 jobs in total. The first 50 will be submitted, and as soon as one finishes, the 51st will be submitted, and so on.
Each job uses a cursor, a bulk fetch with a LIMIT of 5000, and FORALL for the insert (with the APPEND hint). Commit immediately after each iteration, so the commit frequency is 5000 rows (a minimal sketch of this job body appears after the list).
After all the jobs are finished, put Table2 back in LOGGING mode.
alter table Table2 logging;
Create all required indexes and PK on Table2 with Parallel mode enabled and then alter index NOPARALLEL.
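For illustration, here is a minimal sketch of one such job body. The partition name is a placeholder, and note that a FORALL ... VALUES insert needs the APPEND_VALUES hint (11gR2+); the plain APPEND hint only applies to INSERT ... SELECT.
declare
    cursor c_src is
        select * from table1 partition (p_value_001);  -- placeholder: one partition per job
    type t_rows is table of table1%rowtype;
    l_rows t_rows;
begin
    open c_src;
    loop
        fetch c_src bulk collect into l_rows limit 5000;
        exit when l_rows.count = 0;

        forall i in 1 .. l_rows.count
            insert /*+ append_values */ into table2 values l_rows(i);

        commit;  -- commit every iteration, i.e. every 5000 rows
    end loop;
    close c_src;
end;
/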
Any suggestions? Thanks a lot for your time.

Use a single SELECT statement instead of PL/SQL.
There's no need to commit in chunks or to have a parallel strategy that mirrors the partitions. If the APPEND hint works and a direct-path write is used, then there won't be any significant REDO or UNDO usage, so you don't need to run in chunks to reduce resource consumption. Oracle can easily divide a segment into granules - it's just copying a bunch of blocks from one place to another; it doesn't matter whether it processes them per-partition. (Some possible exceptions are if you're using a weird column type that doesn't support parallel SQL, or if you're joining tables and using partition-wise joins.)
alter session enable parallel dml;
alter table table2 nologging;
--Picking a good DOP can be tricky. 32 might not be the best number for you.
insert /*+ append parallel(32) */ into table2
select * from table1;
commit;
alter table table2 logging;
Before you run this, check the execution plan. There are lots of things that can prevent direct-path writes and you want to find them before you start the DML.
In the execution plan, make sure you see "LOAD AS SELECT" to ensure direct-path writes, "PX" to ensure parallelism, and a "PX" operation before the "LOAD AS SELECT" to ensure that both the writes and the reads are done in parallel.
alter session enable parallel dml;
alter table table2 nologging;
explain plan for
insert /*+ append parallel(32) */ into table2
select * from table1;
select * from table(dbms_xplan.display);
I often find it's not worth dealing with indexes separately. But that may depend on the number of indexes.
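If you do build the indexes afterwards (step 6 in the question), a hedged sketch of that step; index names, columns, and the degree of parallelism are placeholders:
-- placeholder names and DOP
create index table2_ix1 on table2 (some_col) local parallel 16 nologging;
alter index table2_ix1 noparallel;
alter index table2_ix1 logging;

alter table table2 add constraint table2_pk primary key (id)
    using index (create unique index table2_pk on table2 (id) parallel 16);
alter index table2_pk noparallel;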

Related

What is the best way to insert 6000000 records from one table to another table in ORACLE?

Hello guys, I need to copy 6,000,000 rows from TMP_DATA to DATA. What is the best way to do this?
I was thinking of doing INSERT INTO DATA SELECT * FROM TMP_DATA, but I think it will take ages to do the insert.
What do you suggest?
Kind Regards,
To expand a bit on Anders' answer and mathguy's comments, do the following:
alter table data nologging;
alter session enable parallel dml;
-- disable any triggers on `data` and temporarily drop any indexes
insert /*+ append */ into data
select /*+ parallel (4) */ * from tmp_data
--sample (10) -- if tmp_data has 60 million rows: 10 means 10%
-- where rownum < 6000001
-- pick one of the two prior clauses if tmp_data has > 6 million rows
after the insert is done:
alter table data logging;
-- enable triggers and recreate indexes
and have the DBA take a backup, as the data table will not be recoverable if there is any issue after the load.
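For that last point, a hedged sketch of how the DBA might find and back up what was affected by the NOLOGGING load (the tablespace name is a placeholder):
-- datafiles touched by NOLOGGING operations since the last backup
select file#, unrecoverable_change#, unrecoverable_time
from   v$datafile
where  unrecoverable_time is not null;

-- then back up the affected tablespace, e.g. in RMAN:
-- RMAN> BACKUP TABLESPACE users;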
There are a couple of ways of doing this:
If you want speed, use parallel and nologging, like so (on a new table):
-- Caveat: this method is fast but will use a lot of cpu resources so just let
-- the DBA know. Also, index the table at the end.
create table DATA parallel 4 nologging as
select * from TMP_DATA;
If you are using an existing table, one of the things that could potentially decrease insertion performance is the use of indexes. You can temporarily disable the indexes to allow faster insertions.
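For example, in Oracle you can mark a non-unique index unusable instead of dropping it, then rebuild it after the load. The index name here is hypothetical, and SKIP_UNUSABLE_INDEXES (which defaults to TRUE) lets the insert skip index maintenance:
-- hypothetical index name; this does not work for unique indexes or indexes
-- enforcing constraints, which still have to be maintained or dropped
alter index data_ix1 unusable;

insert /*+ append */ into data
select * from tmp_data;
commit;

alter index data_ix1 rebuild;  -- optionally parallel / nologging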

Copying timestamp columns within a Postgres table

I have a table with about 30 million rows in a Postgres 9.4 db. This table has 6 columns: the primary key id, two text columns, one boolean, and two timestamps. There are indices on one of the text columns, and obviously the primary key.
I want to copy the values in the first timestamp column, call it timestamp_a into the second timestamp column, call it timestamp_b. To do this, I ran the following query:
UPDATE my_table SET timestamp_b = timestamp_a;
This worked, but it took an hour and 15 minutes to complete, which seems like a really long time to me considering that, as far as I know, it's just copying values from one column to another.
I ran EXPLAIN on the query and nothing seemed particularly inefficient. I then used pgtune to modify my config file, most notably it increased the shared_buffers, work_mem, and maintenance_work_mem.
I re-ran the query and it took essentially the same amount of time, actually slightly longer (an hour and 20 mins).
What else can I do to improve the speed of this update? Is this just how long it takes to write 30 million timestamps into postgres? For context I'm running this on a macbook pro, osx, quadcore, 16 gigs of ram.
The reason this is slow is that internally PostgreSQL doesn't update the field. It actually writes new rows with the new values. This usually takes a similar time to inserting that many values.
If there are indexes on any column this can further slow the update down. Even if they're not on columns being updated, because PostgreSQL has to write a new row and write new index entries to point to that row. HOT updates can help and will do so automatically if available, but that generally only helps if the table is subject to lots of small updates. It's also disabled if any of the fields being updated are indexed.
Since you're basically rewriting the table, if you don't mind locking out all concurrent users while you do it you can do it faster with:
BEGIN
DROP all indexes
UPDATE the table
CREATE all indexes again
COMMIT
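Concretely, something like this; the secondary index and column names are hypothetical, and the primary key index is left in place since dropping it would mean dropping the constraint:
BEGIN;
-- hypothetical secondary index on one of the text columns
DROP INDEX my_table_text_idx;

UPDATE my_table SET timestamp_b = timestamp_a;

CREATE INDEX my_table_text_idx ON my_table (text_col);
COMMIT;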
PostgreSQL also has an optimisation for writes to tables that've just been TRUNCATEd, but to benefit from that you'd have to copy the data to a temp table, then TRUNCATE and copy it back. So there's no benefit.
@Craig mentioned an optimization for COPY after TRUNCATE: Postgres can skip WAL entries because if the transaction fails, nobody will ever have seen the new table anyway.
The same optimization is true for tables created with CREATE TABLE AS:
What causes large INSERT to slow down and disk usage to explode?
Details are missing in your description, but if you can afford to write a new table (no concurrent transactions get in the way, no dependencies), then the fastest way might be (except if you have big TOAST table entries - basically big columns):
BEGIN;
LOCK TABLE my_table IN SHARE MODE; -- only for concurrent access
SET LOCAL work_mem = '???? MB'; -- just for this transaction
CREATE TABLE my_table2 AS
SELECT ..., timestamp_a, timestamp_a AS timestamp_b
-- columns in order, timestamp_a overwrites timestamp_b
FROM my_table
ORDER BY ??; -- optionally cluster table while being at it.
DROP TABLE my_table;
ALTER TABLE my_table2 RENAME TO my_table;
ALTER TABLE my_table
   ADD CONSTRAINT my_table_id_pk PRIMARY KEY (id);
-- more constraints, indices, triggers?
-- recreate views etc. if any
COMMIT;
The additional benefit: a pristine (optionally clustered) table without bloat. Related:
Best way to populate a new column in a large table?

SQL Server : how do I add a hint to every query against a table?

I have a transactional database. One of the tables (A) is almost empty. It has a unique indexed column (x) and no clustered index.
2 concurrent transactions:
begin trans
insert into A (x,y,z) values (1,2,3);
WAITFOR DELAY '00:00:02'; -- or manually run the first 2 lines only
select * from A where x=1; -- small tables produce a query plan of table scan here, and block against the transaction below.
commit
begin trans
insert into A (x,y,z) values (2,3,4);
WAITFOR DELAY '00:00:02';
-- on a table with 3 or less pages this hint is needed to not block against the above transaction
select * from A with(forceseek) -- force query plan of index seek + rid lookup
where x=2;
commit
My problem is that when the table has very few rows the 2 transactions can deadlock, because SQL Server generates a table scan for the select, even though there is an index, and both wait on the lock held by the newly inserted row of the other transaction.
When there are lots of rows in this table, the query plan changes to an index seek, and both happily complete.
When the table is small, the WITH(FORCESEEK) hint forces the correct query plan (5% more expensive for tiny tables).
Is it possible to provide a default hint for all queries on a table so they behave as if they had the FORCESEEK hint?
The deadlocking code above was generated by Hibernate; is it possible to have Hibernate emit the needed query hints?
We can make the tables pretend to be large enough that the query optimizer selects the index seek, using the undocumented features of UPDATE STATISTICS: http://msdn.microsoft.com/en-AU/library/ms187348(v=sql.110).aspx. Can anyone see any downsides to making all tables with fewer than 1000 rows pretend they have 1000 rows over 10 pages?
You can create a Plan Guide.
Or you can enable Read Committed Snapshot isolation level in the database.
Better still: make the index clustered.
For small tables that experience high update ratio, perhaps you can apply the advice from Using tables as Queues.
Can anyone see any downsides to making all tables with less than 1000 rows pretend they have 1000 rows over 10 pages?
If the table appears in another, more complex query (think joins), then the cardinality estimates may cascade wildly off and produce bad plans.
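For reference, the second and third suggestions look roughly like this (the database name and index name are hypothetical):
-- Read Committed Snapshot: readers see row versions instead of blocking on writers
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;

-- or make the unique index on x the clustered index
DROP INDEX IX_A_x ON A;
CREATE UNIQUE CLUSTERED INDEX IX_A_x ON A (x);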
You could create a view that is a copy of the table but with the hint and have queries use the view instead:
create view A2 as
select * from A with(forceseek)
If you want to preserve the table name used by queries, rename the table to something else then name the view "A":
sp_rename 'A', 'A2';
create view A as
select * from A2 with(forceseek)
Just to add another option you may consider.
You can lock entire table on update by using
ALTER TABLE MyTable SET (LOCK_ESCALATION = TABLE);
This workaround is fine if you do not have too many updates that will queue and slow performance.
It is table-wide, and no updates to other code are needed.

Bulk insert into partitioned table and table level lock

I want to know the core reason (the mechanics of segments, blocks, and locks that the engine uses) why a bulk insert (with direct-path) locks the entire table, so that if I insert into one partition, I can't truncate another partition which is (apparently) not affected by the insert.
A conventional insert (without the APPEND hint) permits truncating non-affected partitions. (Note that I am talking about uncommitted transactions.)
Below is an example to illustrate it.
Let be a table:
CREATE TABLE FG_TEST
(COL NUMBER )
PARTITION BY RANGE (COL)
(PARTITION "P1" VALUES LESS THAN (1000),
PARTITION "P2" VALUES LESS THAN (2000));
insert into fg_test values (1);
insert into fg_test values (1000);
commit;
Session 1:
insert into fg_test select * from fg_test where col >= 1000;
--1 rows inserted;
Session 2:
alter table fg_test truncate partition p1;
--table truncated
Session 1:
rollback;
insert /*+ append */ into fg_test select * from fg_test where col >= 1000;
--1 rows inserted;
Session 2:
alter table fg_test truncate partition p1;
--this throws ORA-00054: resource busy and acquire with NOWAIT specified
--or timeout expired
The documentation on Direct-Path Insert is pretty abrupt on this subject and just says:
During direct-path INSERT, the database obtains exclusive locks on the
table (or on all partitions of a partitioned table). As a result,
users cannot perform any concurrent insert, update, or delete
operations on the table, and concurrent index creation and build
operations are not permitted.
The "How Direct-Path INSERT Works" section does not explain why the lock is needed for all partitions.
And why conventional insert does not lock nonaffected partitions? (My intuition is that the lock is done at block level)
Your premise is slightly wrong. A direct-path insert does not lock the entire table if you use the partition extension clause.
Session 1:
insert /*+append */ into fg_test partition (p2)
select * from fg_test where col >=1000;
Session 2:
alter table fg_test truncate partition p1;
--table truncated
The new question is: When the partition extension clause is NOT used, why do conventional and direct-path inserts have different locking mechanisms? This clarification makes the question easier, but without inside knowledge the answer below is still only a guess.
It was easier to code a feature that locks the entire table. And it runs faster, since there is no need to track which partitions are updated.
There's usually no need for a more fine-grained lock. Most systems or processes that use direct-path writes only update one large table at a time. If a more fine-grained lock is really needed, the partition extension clause can be used. It's not quite as convenient, since only one partition can be referenced at a time. But it's good enough 99.9% of the time.
I found the following answer on asktom.oracle.com:
Ask Tom: Inserts with APPEND Hint
Tom explains many of the inner workings, but the reason why Oracle locks the whole table and not only affected partitions is still not clear.
Maybe it's just a design decision (e.g. not wanting the big bulky direct load to be potentially blocked by one smallish uncommitted transaction, and therefore locking all partitions ...)

SQL Insert with large dataset

When running a query like "INSERT INTO table SELECT ... FROM anotherTable", how do we handle the commit size? I.e., are all records from anotherTable inserted in a single transaction, or is there a way to set a commit size?
Thanks very much ~Sri
PS: I am a first timer here, and this site looks very good!
In good databases that is an atomic statement, so no, there is no way to limit the number of records inserted - which is a good thing!
In the context that the original poster wants to avoid rollback space problems, the answer is pretty straightforward. The rollback segments should be sized to accommodate the size of transactions, not the other way round. You commit when your transaction is complete.
I've written code in various languages, mostly Java, to do bulk inserts like what you described. Each time I did it, mostly from parsing some input file or something like that, I would basically just prepare a sub-set of data to insert from the total amount (usually batches of 4000 or so) and feed that data to our DAO layer. So it was done programmatically. We never noticed any real performance hit for doing it this way, and we were dealing with a few million records. If you have large data sets to insert, the operation will "take awhile" regardless of how you do it.
You can't handle the commit size unless you explicitly code it. For example, you could use a while loop and code up a way to limit the amount of data you're selecting.
David Aldridge is right, size the rollback segment based on the maximum transaction, when you want the INSERT to either succeed or fail as a whole.
Some alternatives:
If you don't care about being able to roll it back (which is what the segment is there for), you could ALTER TABLE and add the NOLOGGING clause. But that's not a wise move unless you're loading a reporting table where you drop all old rows and load new ones, or some other special cases.
If you're okay with some rows getting inserted and others failing for some reason, then add support for handling the failures using the INSERT ... LOG ERRORS INTO syntax (see the sketch after this list).
If you need the data set to be limited, build that limit into the query.
For example, in Microsoft SQL Server parlance, you can use "TOP N" to make sure the query only returns a limited number of rows.
INSERT INTO thisTable
SELECT TOP 100 * FROM anotherTable;
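And for the error-logging alternative mentioned above, a hedged Oracle sketch; DBMS_ERRLOG creates the ERR$_ log table, and the tag string is arbitrary:
-- create the error log table once (defaults to ERR$_THISTABLE)
EXEC DBMS_ERRLOG.CREATE_ERROR_LOG('THISTABLE');

INSERT INTO thisTable
SELECT * FROM anotherTable
LOG ERRORS INTO err$_thistable ('bulk load') REJECT LIMIT UNLIMITED;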
The reason why I want to do that is to avoid the rollback segment running out of space. Also, I want to see results being populated in the target table at regular intervals.
I don't want to use a while loop because it might add performance overhead, wouldn't it?
~Sri
You are right, you may want to run large inserts in batches. The attached link shows a way to do it in SQL Server; if you are using a different backend you would do something similar, but the exact syntax might be different. This is a case when a loop is acceptable.
http://www.tek-tips.com/faqs.cfm?fid=3141
"The reason why I want to do that is to avoid the rollback segment going out of space. Also, I want to see results being populated in the target table at regular intervals."
The first is simply a matter of sizing the undo tablespace correctly. Since the undo for an insert is just a delete of the newly inserted row, it doesn't require a lot of space. Conversely, a delete generally requires more space because it has to keep a copy of the entire deleted row in order to re-insert it when undoing it.
For the second, have a look at v$session_longops and/or rows_processed in v$sql
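A hedged sketch of those monitoring queries (the sql_text filter is a placeholder):
-- progress of long-running operations
select sid, opname, target, sofar, totalwork,
       round(sofar / totalwork * 100, 1) as pct_done
from   v$session_longops
where  totalwork > 0
and    sofar <> totalwork;

-- rows processed so far by the insert (placeholder filter)
select sql_id, rows_processed, elapsed_time / 1e6 as elapsed_seconds
from   v$sql
where  sql_text like 'INSERT INTO thisTable%';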
INSERT INTO TableInserted
SELECT X.Col1, X.Col2, X.Col3   -- list TableInserted's columns explicitly; don't insert the extra RowNumber column
FROM (
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY ID) AS RowNumber
    FROM TableSelected
) X
WHERE X.RowNumber BETWEEN 101 AND 200
You could wrap the above into a while loop pretty easily, replacing the 101 and 200 with variables. It's better than doing 1 record at a time.
I don't know what versions of Oracle support window functions.
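A hedged T-SQL sketch of that loop; the batch size, the column list, and ordering by ID are assumptions:
DECLARE @BatchSize  INT = 10000,
        @StartRow   INT = 1,
        @RowsCopied INT = 1;

WHILE @RowsCopied > 0
BEGIN
    -- each batch is its own (auto-committed) transaction
    INSERT INTO TableInserted
    SELECT X.Col1, X.Col2          -- placeholder column list; don't insert RowNumber
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (ORDER BY ID) AS RowNumber
        FROM TableSelected
    ) X
    WHERE X.RowNumber BETWEEN @StartRow AND @StartRow + @BatchSize - 1;

    SET @RowsCopied = @@ROWCOUNT;
    SET @StartRow   = @StartRow + @BatchSize;
END;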
This is an extended comment to demonstrate that setting indexes to NOLOGGING will not help reduce UNDO or REDO for INSERTs.
The manual implies NOLOGGING indexes may help improve DML by reducing UNDO and REDO. And since NOLOGGING helps with table DML, it seems logical that it would also help with index changes. But this test case demonstrates that changing indexes to NOLOGGING has no effect on INSERT statements.
drop table table_no_index;
drop table table_w_log_index;
drop table table_w_nolog_index;
--#0: Before
select name, value from v$mystat natural join v$statname where display_name in ('undo change vector size', 'redo size') order by 1;
--#1: NOLOGGING table with no index. This is the best case scenario.
create table table_no_index(a number) nologging;
insert /*+ append */ into table_no_index select level from dual connect by level <= 100000;
commit;
select name, value from v$mystat natural join v$statname where display_name in ('undo change vector size', 'redo size') order by 1;
--#2: NOLOGGING table with LOGGING index. This should generate REDO and UNDO.
create table table_w_log_index(a number) nologging;
create index table_w_log_index_idx on table_w_log_index(a);
insert /*+ append */ into table_w_log_index select level from dual connect by level <= 100000;
commit;
select name, value from v$mystat natural join v$statname where display_name in ('undo change vector size', 'redo size') order by 1;
--#3: NOLOGGING table with NOLOGGING index. Does this generate as much REDO and UNDO as previous step?
create table table_w_nolog_index(a number) nologging;
create index table_w_nolog_index_idx on table_w_nolog_index(a) nologging;
insert /*+ append */ into table_w_nolog_index select level from dual connect by level <= 100000;
commit;
select name, value from v$mystat natural join v$statname where display_name in ('undo change vector size', 'redo size') order by 1;
Here are the results from the statistics queries. The numbers are cumulative for the session. Test cases #2 and #3 have the same increase in UNDO and REDO.
--#0: BEFORE: Very little redo or undo since session just started.
redo size 35,436
undo change vector size 10,120
--#1: NOLOGGING table, no index: Very little redo or undo.
redo size 88,460
undo change vector size 21,772
--#2: NOLOGGING table, LOGGING index: Large amount of redo and undo.
redo size 6,895,100
undo change vector size 3,180,920
--#3: NOLOGGING table, NOLOGGING index: Large amount of redo and undo.
redo size 13,736,036
undo change vector size 6,354,032
You may just want to make the indexes NOLOGGING. That way the table data is recoverable, but the indexes will need to be rebuilt if the table is recovered. Index maintenance can create a lot of undo.