Oracle : Inserting large dataset into a table - sql

I need to load data from a different remote database into our own database. I wrote a single "complex" query using a WITH clause. It returns around 18 million rows of data.
What is the most efficient way to do the insert?
using a cursor to insert the rows one by one
using a single INSERT INTO ... SELECT
or is there any other way?

The fastest way to do anything should be to use a single SQL statement. The next most efficient approach is to use a cursor doing BULK COLLECT operations to minimize context shifts between the SQL and PL/SQL engines. The least efficient approach is to use a cursor and process the data row-by-row.

As Justin wrote, the most efficient approach is to use a single SQL statement (INSERT INTO ... SELECT ...). Additionally, you can take advantage of a direct-path insert.
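A minimal sketch of what that could look like for the original question (the table names, column names, and database link are placeholders):

INSERT /*+ APPEND */ INTO local_table (col1, col2, col3)
WITH src AS (
  -- the "complex" WITH-clause query against the remote database would go here
  SELECT col1, col2, col3
  FROM remote_table@remote_db_link
)
SELECT col1, col2, col3
FROM src;

COMMIT;

The APPEND hint requests a direct-path insert, which loads above the high-water mark and generates far less undo than a conventional insert.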

18 million rows will require quite a bit of undo/rollback space for your single-insert-statement scenario. A cursor FOR loop would be much slower, but you'd be able to commit every x rows.
Personally, I'd go old school: dump out to a file and load it via SQL*Loader (sqlldr) or Data Pump, especially as this is across databases.
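For the SQL*Loader route, a bare-bones control file might look like this (the file, table, and column names are made up; adding DIRECT=TRUE on the sqlldr command line enables a direct-path load):

LOAD DATA
INFILE 'remote_extract.csv'
APPEND
INTO TABLE local_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(col1, col2, col3)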

You could use Data Synchronisation Studio and change the select statement to take 1 million rows at a time (I think 18 million at once would probably overload your machine).

Related

How to run more than one query at a time in SQL Server

I have to generate a big report consisting of 45 insert statements. How can I run more than one insert statement at a time by splitting the queries into groups?
Use a stored procedure for that; by using it, you can also return a value.
You can write the 45 insert statements in the stored procedure.
Try using multiple stored procedures, where each stored procedure handles several insert statements. Even if you execute these procedures one by one, they should run in parallel on SQL Server as long as you're using different connections.
You might succeed in running multiple sessions (i.e. logins). [EDIT] I originally wrote that an insert locks the table, which is wrong. Thanks @marc_s. [/EDIT]
However, if your insert precedes a complex query, you might be successful, as the queries could be carried out in parallel.
But, it greatly depends on the code.
Isn't there anything you can improve in the existing code? Usually, there is enough room for a performance boost just by looking at the statements.

How to do a massive insert

I have an application that has to insert into a SQL Server 2008 database in groups of N tuples, and all the tuples in a group have to be inserted for the insert to succeed. My question is: how do I insert these tuples so that, if any of them fails, I can roll back and remove all the tuples that were inserted successfully?
Thanks
On SQL Server you might consider doing a bulk insert.
From .NET, you can use SqlBulkCopy.
Table-valued parameters (TVPs) are a second route. In your insert statement, use WITH (TABLOCK) on the target table for minimal logging, e.g.:
INSERT Table1 WITH (TABLOCK) (Col1, Col2, ...)
SELECT Col1, Col2, ... FROM @tvp
Wrap it in a stored procedure that exposes @tvp as a parameter, add some transaction handling, and call this procedure from your app.
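A rough sketch of that pattern, with an invented table type and procedure name:

CREATE TYPE dbo.Table1RowType AS TABLE (Col1 int, Col2 nvarchar(100));
GO
CREATE PROCEDURE dbo.InsertTable1Rows
    @tvp dbo.Table1RowType READONLY   -- TVP parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;
    INSERT Table1 WITH (TABLOCK) (Col1, Col2)
    SELECT Col1, Col2 FROM @tvp;
END
GO

From .NET you would pass a DataTable (or a DbDataReader) as a parameter of SqlDbType.Structured when calling the procedure.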
You might even try passing the data as XML if it has a nested structure, and shredding it to tables on the database side.
You should look into transactions. This is a good intro article that discusses rolling back and such.
If you are inserting the data directly from the program, it seems like what you need are transactions. You can start a transaction directly in a stored procedure or from a data adapter written in whatever language you are using (for instance, in C# you might be using ADO.NET).
Once all the data has been inserted, you can commit the transaction or do a rollback if there was an error.
See Scott Mitchell's "Managing Transactions in SQL Server Stored Procedures" for some details on creating, committing, and rolling back transactions.
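A minimal transactional wrapper in T-SQL might look like this (the procedure and table names are placeholders):

CREATE PROCEDURE dbo.InsertBatch
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION;
        -- all N inserts for the group go here
        INSERT INTO dbo.TargetTable (Col1) VALUES (1);
        INSERT INTO dbo.TargetTable (Col1) VALUES (2);
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;   -- undo everything inserted so far
        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);                    -- re-raise the error for the caller
    END CATCH
END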
For MySQL, look into LOAD DATA INFILE, which lets you insert from a disk file.
Also see the general MySQL discussion on Speed of INSERT Statements.
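A minimal LOAD DATA INFILE sketch (the file, table, and column names are made up):

LOAD DATA INFILE '/tmp/rows.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(col1, col2, col3);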
For a more detailed answer please provide some hints as to the software stack you are using, and perhaps some source code.
You have two competing interests, doing a large transaction (which will have poor performance, high risk of failure), or doing a rapid import (which is best not to do all in one transaction).
If you are adding rows to a table, then don't run in a transaction. You should be able to identify which rows are new and delete them should you not like how they look on the first round.
If the transaction is complicated (each row affects dozens of tables, etc) then run them in transactions in small batches.
If you absolutely have to run a huge data import in one transaction, consider doing it when the database is in single user mode and consider using the checkpoint keyword.
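On SQL Server that could look roughly like this (the database name is a placeholder; only do this in a maintenance window):

ALTER DATABASE MyDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
-- run the big import here
CHECKPOINT;  -- flush dirty pages to disk
ALTER DATABASE MyDb SET MULTI_USER;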

Query performance difference pl/sql forall insert and plain SQL insert

We have been using a temporary table to store intermediate results in a PL/SQL stored procedure. Could anyone tell me if there is a performance difference between doing a bulk-collect insert through PL/SQL and a plain SQL insert?
Insert into [Table name] [Select query Returning huge amount of data]
or
Cursor for [Select query returning huge amount of data]
open cursor
fetch cursor bulk collect into collection
Use FORALL to perform insert
Which of the above two options is better for inserting a huge amount of temporary data?
Some experimental data for your problem (Oracle 9.2)
bulk collect
DECLARE
TYPE t_number_table IS TABLE OF NUMBER;
v_tab t_number_table;
BEGIN
SELECT ROWNUM
BULK COLLECT INTO v_tab
FROM dual
CONNECT BY LEVEL < 100000;
FORALL i IN 1..v_tab.COUNT
INSERT INTO test VALUES (v_tab(i));
END;
/
-- 2.6 sec
insert
-- test table
CREATE global TEMPORARY TABLE test (id number)
ON COMMIT preserve ROWS;
BEGIN
INSERT INTO test
SELECT ROWNUM FROM dual
CONNECT BY LEVEL < 100000;
END;
/
-- 1.4 sec
direct path insert
http://download.oracle.com/docs/cd/B10500_01/server.920/a96524/c21dlins.htm
BEGIN
INSERT /*+ append */ INTO test
SELECT ROWNUM FROM dual
CONNECT BY LEVEL < 100000;
END;
/
-- 1.2 sec
An insert-into-select must certainly be faster. It skips the overhead of storing the data in a collection first.
It depends on the nature of the work you're doing to populate the intermediate results. If the work can be done relatively simply in the SELECT statement for the INSERT, that will generally perform better.
However, if you have some complex intermediate logic, it may be easier (from a code maintenance point of view) to fetch and insert the data in batches using bulk collects/binds. In some cases it might even be faster.
One thing to note very carefully: the query plan used by the INSERT INTO x SELECT ... will sometimes be quite different to that used when the query is run by itself (e.g. in a PL/SQL explicit cursor). When comparing performance, you need to take this into account.
Tom Kyte of AskTom fame has answered this question fairly firmly. If you are willing to do some searching, you can find the question and his response, which contains detailed test results and explanations. He compares a PL/SQL cursor, a PL/SQL bulk collect (including the effect of periodic commits), and a plain SQL insert-as-select.
Insert-as-select wins hands down every time, and the difference even on modest datasets is dramatic.
That said, a comment was made earlier about the complexity of intermediate computations. I can think of three situations where this would be relevant.
1) If the computations require going outside the Oracle database, then clearly a simple insert-as-select does not do the trick.
2) If the solution requires PL/SQL function calls, then context switching can potentially kill your query, and you may get better results with PL/SQL calling PL/SQL functions. PL/SQL was made to call SQL, not the other way around, so calling PL/SQL from SQL is expensive.
3) If the computations make the SQL very difficult to read, then even though it may be slower, a PL/SQL bulk-collect solution may be better for those other reasons.
Good luck.
When we declare a cursor explicitly, Oracle allocates a private SQL work area in memory. When a select statement returns multiple rows, they are copied from the table or view into the private SQL work area as the ACTIVE SET; its size is the number of rows that meet your search criteria. Once the cursor is opened, your pointer is placed on the first row of the active set, and from there you can perform DML. For example, if you perform an update operation, the changes are made to the rows in the work area rather than directly in the table, so the table is not hit every time we need an update: the rows are fetched once into the work area and, after the operations have been performed, the updates are applied back in one pass. This reduces input/output data transfer between the database and the user.
I suggest using a PL/SQL explicit cursor; you are just going to perform the DML operations in the private work area allotted to the cursor. This will not hit database server performance during peak hours.

Bulk Insert into Oracle database: Which is better: FOR Cursor loop or a simple Select?

Which would be a better option for a bulk insert into an Oracle database?
A FOR Cursor loop like
DECLARE
CURSOR C1 IS SELECT * FROM FOO;
BEGIN
FOR C1_REC IN C1 LOOP
INSERT INTO BAR(A,
B,
C)
VALUES(C1_REC.A,
C1_REC.B,
C1_REC.C);
END LOOP;
END;
or a simple select, like:
INSERT INTO BAR(A,
B,
C)
(SELECT A,
B,
C
FROM FOO);
Any specific reason either one would be better?
I would recommend the Select option because cursors take longer.
Also, using the SELECT is much easier to understand for anyone who has to modify your query.
The general rule-of-thumb is, if you can do it using a single SQL statement instead of using PL/SQL, you should. It will usually be more efficient.
However, if you need to add more procedural logic (for some reason), you might need to use PL/SQL, but you should use bulk operations instead of row-by-row processing. (Note: in Oracle 10g and later, your FOR loop will automatically use BULK COLLECT to fetch 100 rows at a time; however your insert statement will still be done row-by-row).
e.g.
DECLARE
TYPE tA IS TABLE OF FOO.A%TYPE INDEX BY PLS_INTEGER;
TYPE tB IS TABLE OF FOO.B%TYPE INDEX BY PLS_INTEGER;
TYPE tC IS TABLE OF FOO.C%TYPE INDEX BY PLS_INTEGER;
rA tA;
rB tB;
rC tC;
BEGIN
SELECT * BULK COLLECT INTO rA, rB, rC FROM FOO;
-- (do some procedural logic on the data?)
FORALL i IN rA.FIRST..rA.LAST
INSERT INTO BAR(A,
B,
C)
VALUES(rA(i),
rB(i),
rC(i));
END;
The above has the benefit of minimising context switches between SQL and PL/SQL. Oracle 11g also has better support for tables of records so that you don't have to have a separate PL/SQL table for each column.
Also, if the volume of data is very great, it is possible to change the code to process the data in batches.
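For example, a batched variant of the code above, fetching and inserting 1,000 rows at a time (the batch size is arbitrary and the FOO/BAR names follow the question):

DECLARE
CURSOR c IS SELECT A, B, C FROM FOO;
TYPE tA IS TABLE OF FOO.A%TYPE INDEX BY PLS_INTEGER;
TYPE tB IS TABLE OF FOO.B%TYPE INDEX BY PLS_INTEGER;
TYPE tC IS TABLE OF FOO.C%TYPE INDEX BY PLS_INTEGER;
rA tA;
rB tB;
rC tC;
BEGIN
OPEN c;
LOOP
FETCH c BULK COLLECT INTO rA, rB, rC LIMIT 1000;  -- one batch per iteration
EXIT WHEN rA.COUNT = 0;
FORALL i IN 1..rA.COUNT
INSERT INTO BAR(A, B, C) VALUES (rA(i), rB(i), rC(i));
END LOOP;
CLOSE c;
END;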
If your rollback segment/undo segment can accommodate the size of the transaction, then option 2 is better. Option 1 is useful if you do not have the rollback capacity needed and have to break the large insert into smaller commits so you don't get rollback/undo segment too small errors.
A simple insert/select like your 2nd option is far preferable. For each insert in the 1st option you require a context switch from pl/sql to sql. Run each with trace/tkprof and examine the results.
If, as Michael mentions, your rollback cannot handle the statement, then have your DBA give you more. Disk is cheap, while partial results that come from inserting your data in multiple passes are potentially quite expensive. (There is almost no undo associated with an insert.)
I think one important piece of information is missing from this question:
How many records will you insert?
If from 1 to approximately 10,000, then you should use a SQL statement (as the others said, it is easy to understand and easy to write).
If from approximately 10,000 to approximately 100,000, then you should use a cursor, but add logic to commit every 10,000 records.
If from approximately 100,000 up into the millions, then you should use bulk collect for better performance.
As you can see by reading the other answers, there are a lot of options available. If you are just doing < 10k rows you should go with the second option.
In short, from roughly 10k all the way up to, say, 100k rows is kind of a gray area. A lot of old geezers will bark at big rollback segments. But honestly, hardware and software have made amazing progress, and you may be able to get away with option 2 for a lot of records if you only run the code a few times. Otherwise you should probably commit every 1k-10k or so rows. Here is a snippet that I use. I like it because it is short and I don't have to declare a cursor. Plus it has the benefits of bulk collect and forall.
begin
for r in (select rownum rn, t.* from foo t) loop
insert into bar (A,B,C) values (r.A,r.B,r.C);
if mod(r.rn,1000)=0 then
commit;
end if;
end loop;
commit;
end;
I found this link from the oracle site that illustrates the options in more detail.
You can use:
BULK COLLECT along with FORALL; together they are called bulk binding.
The PL/SQL FORALL operator can be up to 30x faster for simple table inserts.
BULK COLLECT and Oracle FORALL together are known as bulk binding. Bulk binds are a PL/SQL technique where, instead of executing multiple individual SELECT, INSERT, UPDATE, or DELETE statements to retrieve data from, or store data in, a table, all of the operations are carried out at once, in bulk. This avoids the context switching you get when the PL/SQL engine has to pass over to the SQL engine, then back to the PL/SQL engine, and so on, as you access rows one at a time. To do bulk binds with INSERT, UPDATE, and DELETE statements, you enclose the SQL statement within a PL/SQL FORALL statement. To do bulk binds with SELECT statements, you include the BULK COLLECT clause in the SELECT statement instead of using INTO.
It improves performance.
I do neither for a daily complete reload of data. For example, say I am loading my Denver site. (There are other strategies for near-real-time deltas.)
I use a CREATE TABLE ... AS SELECT, which I have found is almost as fast as a bulk load.
For example, below a CREATE TABLE statement is used to stage the data, casting the columns to the data types needed:
CREATE TABLE sales_dataTemp as select
cast (column1 as Date) as SALES_QUARTER,
cast (sales as number) as SALES_IN_MILLIONS,
....
FROM
TABLE1;
This staging table mirrors my target table's structure exactly; the target table is list-partitioned by site.
I then do a partition swap with the DENVER partition and I have a new data set.
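The swap itself is just a data-dictionary operation, something along these lines (the partitioned table name is invented here; the exact options depend on your indexes and constraints):

ALTER TABLE sales_data
EXCHANGE PARTITION denver
WITH TABLE sales_dataTemp
INCLUDING INDEXES
WITHOUT VALIDATION;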

SQL Server - SQL Cursor vs ADO.NET

I have to compute a value involving data from several tables. I was wondering if using a stored procedure with cursors would offer a performance advantage over reading the data into a dataset (using simple select stored procedures) and then looping through the records. The dataset is not large; it consists of 6 tables, each with about 10 records, mainly GUIDs, several nvarchar(100) fields, a float column, and an nvarchar(max).
That would probably depend on the dataset you may be retrieving back (the larger the set, the more logical it may be to perform inside SQL Server instead of passing it around), but I tend to think that if you are looking to perform computations, do it in your code and away from your stored procedures. If you need to use cursors to pull the data together, so be it, but using them to do calculations and other non-retrieval functions I think should be shied away from.
Edit: This Answer to another related question will give some pros and cons to cursors vs. looping. This answer would seem to conflict with my previous assertion (read above) about scaling. Seems to suggest that the larger you get, the more you will probably want to move it off to your code instead of in the stored procedure.
alternative to a cursor
declare @table table (Fields int)
declare @count int
declare @i int
insert into @table (Fields)
select Fields
from [Table]
select @count = count(*) from @table
set @i = 1
while (@i <= @count)
begin
--whatever you need to do
set @i = @i + 1
end
Cursors should be faster, but if you have a lot of users running this it will eat up your server resources. Bear in mind you have a more powerful coding language when writing loops in .Net rather than SQL.
There are very few occasions where a cursor cannot be replaced using standard set based SQL. If you are doing this operation on the server you may be able to use a set based operation. Any more details on what you are doing?
If you do decide to use a cursor bear in mind that a FAST_FORWARD read only cursor will give you the best performance, and make sure that you use the deallocate statement to release it. See here for cursor tips
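A minimal skeleton following that advice (the table and column names are placeholders):

DECLARE @id int;
DECLARE cur CURSOR FAST_FORWARD FOR
SELECT Id FROM dbo.SomeTable;
OPEN cur;
FETCH NEXT FROM cur INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
-- per-row work goes here
FETCH NEXT FROM cur INTO @id;
END
CLOSE cur;
DEALLOCATE cur;  -- release the cursor when done, as recommended above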
Cursors should be faster (unless you're doing something weird in SQL and not in ADO.NET).
That said, I've often found that cursors can be eliminated with a little bit of legwork. What's the procedure you need to do?
Cheers,
Eric