I have to load a text file of about 50MB into a database on a daily basis. I am using Perl DBI to load the file into SQL Server with INSERT statements. It is not very fast, and I was wondering whether there are better/faster ways of loading from DBI into SQL Server.
You should probably use the BULK INSERT statement. No reason you couldn't run that from DBI.
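As a rough sketch (the table name, file path, and delimiters are assumptions about your layout), the statement you send through DBI could look like:

BULK INSERT dbo.DailyLoad                  -- hypothetical target table
FROM 'D:\feeds\daily_extract.txt'          -- path as seen by the SQL Server machine
WITH (
    FIELDTERMINATOR = '\t',                -- adjust to your file's delimiter
    ROWTERMINATOR   = '\n',
    TABLOCK,                               -- table lock helps SQL Server minimize logging
    BATCHSIZE       = 50000                -- commit in batches rather than one huge transaction
);

Note that the path is resolved on the server, not the client, so the file has to be reachable from the SQL Server machine (or supplied as a UNC path).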
When doing large INSERT/UPDATE operations, it's generally useful to disable any indexes on the target table(s), make the changes, and re-enable the indexes. This way the indexes only have to be rebuilt once, instead of being updated as each INSERT/UPDATE statement runs.
(This can also be applied in a zero-downtime way by copying the original table to an unindexed temp table, doing your work on the temp table, adding indexes, dropping the original table, and renaming the temp table to replace it.)
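On SQL Server, the disable/rebuild pattern looks roughly like the following (index and table names are placeholders, and you would only disable nonclustered indexes, since disabling the clustered index makes the table inaccessible):

ALTER INDEX IX_DailyLoad_CustomerId ON dbo.DailyLoad DISABLE;  -- repeat per nonclustered index
-- ... run the INSERT/UPDATE batch here ...
ALTER INDEX ALL ON dbo.DailyLoad REBUILD;                      -- rebuilds the disabled indexes in one pass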
Another way to speed things up (if not already done) is to use prepared statements and bind-values.
I have a large .sql file (with 1 million records) which contains INSERT statements.
This is provided by an external system I have no control over.
I have to import this data into my database table. I thought it would be a simple job, but alas, how wrong I was.
I am using PL/SQL Developer from AllroundAutomations. I went to
Tools -- Import Tables -- SQL Inserts -- pointed the executable to sqlldr.exe,
and set the input to my .sql file with the INSERT statements.
But this process is very slow, inserting only around 100 records a minute; I was expecting the whole thing to take no more than an hour.
Is there a better way to do this? It sounds simple to just import all the data, but it takes a hell of a lot of time.
P.S.: I am a developer, not a DBA, and not an expert on Oracle, so any help is appreciated.
When running massive numbers of INSERTs, you should first drop all indexes on the table, then disable all constraints, then run your INSERT statements. You should also modify your script to include a COMMIT after every 1,000 records or so. Afterwards, re-add your indexes, re-enable all constraints, and gather statistics on that table (DBMS_STATS.GATHER_TABLE_STATS).
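A rough SQL*Plus-style sketch of that wrap-around, with placeholder object names:

ALTER TABLE my_table DISABLE CONSTRAINT my_table_fk1;   -- repeat per constraint
DROP INDEX my_table_ix1;                                -- repeat per index (save the CREATE DDL first!)

@insert_script.sql                                      -- your .sql file, edited to COMMIT every ~1000 rows

CREATE INDEX my_table_ix1 ON my_table (some_col);       -- recreate the indexes
ALTER TABLE my_table ENABLE CONSTRAINT my_table_fk1;
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'MY_TABLE');   -- refresh optimizer statistics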
Best of luck.
So I need to do multiple bulk inserts into a table with row-level triggers. I thought it would be a good idea to gather the generated IDs first, combine them with my data, and then do a direct=true SQL*Loader load. Normally this would work fine, but the table is partitioned by reference, so the loader cannot disable the foreign key constraint that would allow me to do the direct load.
Does anyone know of any way around this? My first solution of bulk collecting into a varray and inserting every 100,000 rows was moderately fast, but if I were able to do a direct load, that would be much faster.
ERROR: SQL*Loader-965: Error -1 disabling constraint client_fk on table my_table
The manual implies there is no way to have SQL*Loader use a direct path load but not disable the foreign keys.
But direct-path inserts can work on reference partitioned tables, even with the foreign keys enabled, as I demonstrated in this question and answer.
Convert the process from SQL*Loader to an external table INSERT statement. SQL*Loader and external tables use similar mechanisms so the conversion shouldn't be too difficult. External tables require a little more work - you have to write the INSERT with an append hint, and manually disable and re-enable triggers and perhaps other objects. But that extra control allows loading data quickly with direct-path inserts.
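A sketch of the external-table route (the directory object, file name, and column list are assumptions about your data):

CREATE TABLE staging_ext (
    client_id  NUMBER,
    payload    VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY data_dir
    ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        FIELDS TERMINATED BY ','
    )
    LOCATION ('load_file.csv')
);

ALTER TABLE my_table DISABLE ALL TRIGGERS;     -- the step SQL*Loader would otherwise do for you
INSERT /*+ APPEND */ INTO my_table             -- APPEND requests a direct-path insert
SELECT client_id, payload FROM staging_ext;
COMMIT;
ALTER TABLE my_table ENABLE ALL TRIGGERS;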
I have an INSERT statement that is eating a hell of a lot of log space, so much so that the hard drive is actually filling up before the statement completes.
The thing is, I really don't need this to be logged as it is only an intermediate data upload step.
For argument's sake, let's say I have:
Table A: Initial upload table (populated using bcp, so no logging problems)
Table B: Populated using INSERT INTO B from A
Is there a way that I can copy between A and B without anything being written to the log?
P.S. I'm using SQL Server 2008 with simple recovery model.
From Louis Davidson, Microsoft MVP:
There is no way to insert without logging at all. SELECT INTO is the best way to minimize logging in T-SQL; using SSIS you can do the same sort of light logging using Bulk Insert.
From your requirements, I would probably use SSIS: drop all constraints, especially unique and primary key ones, load the data in, and add the constraints back. I load about 100GB in just over an hour like this, with fairly minimal overhead. I am using the BULK LOGGED recovery model, which just logs the existence of new extents during the load, and then you can remove them later.
The key is to start with barebones tables, and it just screams. Building the index once leaves you with no indexes to maintain, just the one index build per index.
If you don't want to use SSIS, the advice still applies: drop all of your constraints and use the BULK_LOGGED recovery model. This greatly reduces the logging done by INSERT INTO statements and should therefore solve your issue.
http://msdn.microsoft.com/en-us/library/ms191244.aspx
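A minimal T-SQL sketch of that approach (table names A and B are from the question; the database name is a placeholder):

ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED;   -- switch temporarily for the load

-- Option 1: SELECT INTO creates B and is minimally logged (B must not already exist)
SELECT * INTO dbo.B FROM dbo.A;

-- Option 2: keep B, but make it a bare heap (no indexes/constraints) and insert with a table lock
INSERT INTO dbo.B WITH (TABLOCK)
SELECT * FROM dbo.A;

ALTER DATABASE MyDb SET RECOVERY SIMPLE;        -- switch back when the load is done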
Upload the data into tempdb instead of your database, and do all the intermediate transformations in tempdb. Then copy only the final data into the destination database. Use batches to minimize individual transaction size. If you still have problems, look into deploying trace flag 610, see The Data Loading Performance Guide and Prerequisites for Minimal Logging in Bulk Import:
Trace Flag 610
SQL Server 2008 introduces trace flag 610, which controls minimally logged inserts into indexed tables.
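A hedged sketch of the tempdb-plus-batches pattern (staging table and batch size are assumptions; the OUTPUT ... INTO clause also requires that dbo.B has no triggers or foreign keys):

SELECT * INTO tempdb.dbo.Staging FROM dbo.A;    -- stage and transform in tempdb

DECLARE @moved INT = 1;
WHILE @moved > 0
BEGIN
    DELETE TOP (100000)
    FROM tempdb.dbo.Staging
    OUTPUT deleted.* INTO dbo.B;                -- move one batch per small transaction
    SET @moved = @@ROWCOUNT;
END;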
I'm running the following SAS command:
Proc SQL;
Delete From Server003.CustomerList;
Quit;
This is taking over 8 minutes, when it takes only a few seconds to read that file. What could be causing a delete to take so long, and what can I do to make it go faster?
(I do not have access to drop the table, so I can only delete all rows)
Thanks,
Dan
Edit: I also apparently cannot Truncate tables.
This is NOT regular SQL. SAS' Proc SQL does not support the Truncate statement. Ideally, you want to figure out what's going on with the performance of the delete from; but if what you really need is truncate functionality, you could always just use pure SAS and not mess with SQL at all.
data Server003.CustomerList;
set Server003.CustomerList (obs=0);
run;
This effectively operates like a TRUNCATE would: it keeps the dataset/table structure but writes no rows into it (because of the OBS=0 option).
Are there a lot of other tables that have foreign keys to this table? If those tables don't have indexes on the foreign key column(s), it can take a while for the database to determine whether it's safe to delete the rows, even if none of the other tables actually has a value in the foreign key column(s).
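If that is the case, an index on each referencing column usually fixes it; a hypothetical example (table and column names made up):

CREATE INDEX ix_orders_customer_id ON orders (customer_id);   -- child table that references CustomerList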
Try adding this to your LIBNAME statement:
DIRECT_EXE=DELETE
According to SAS/ACCESS(R) 9.2 for Relational Databases: Reference,
Performance improves significantly by using DIRECT_EXE=, because the SQL delete statement is passed directly to the DBMS, instead of SAS reading the entire result set and deleting one row at a time.
I would also mention that, in general, SQL commands run slower when issued through SAS PROC SQL. On a recent project I moved the TRUNCATE TABLE statements into a stored procedure to avoid the penalty of having them inside SAS, handled by its SQL optimizer and surrounding execution shell. This increased the performance of the TRUNCATE TABLE substantially.
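Purely for illustration (assuming the remote database happens to be SQL Server; the procedure name is made up), the stored procedure can be as small as:

CREATE PROCEDURE dbo.usp_TruncateCustomerList
AS
BEGIN
    TRUNCATE TABLE dbo.CustomerList;   -- runs natively on the DBMS, outside PROC SQL's implicit SQL handling
END;

The SAS job then only has to call this procedure via SQL pass-through.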
It might be slower because disk writes are typically slower than reads.
As for a way around it without dropping/truncating, good question! :)
You could also consider the elegant:
proc sql; create table libname.tablename like libname.tablename; quit;
It will produce a new table with the same name and the same metadata as your previous table, replacing the old one in the same operation.
How do we insert about 2 million rows into an Oracle database table that has many indexes on it?
I know that one option is disabling the indexes and then inserting the data. Can anyone tell me what the other options are?
Bulk load with presorted data in index key order.
Check SQL*Loader out (especially the paragraph about performance optimization) : it is the standard bulk loading utility for Oracle, and it does a good job once you know how to use it (as always with Oracle).
There are many tricks to speed up the insert; below I list some of them (with a combined sketch after the list).
If you use sequence.nextval for the insert, make sure the sequence has a big cache value (1000 is usually enough).
Drop indexes before the insert and recreate them afterwards (make sure you save the CREATE scripts of the indexes before dropping them); when recreating them you can use the PARALLEL option.
If the target table has FK dependencies, disable them before the insert and enable them again afterwards. If you are sure of your data, you can use the NOVALIDATE option (NOVALIDATE is an Oracle option; other RDBMSs probably have something similar).
If you insert from a SELECT, you can give a PARALLEL hint for the SELECT statement and use the APPEND hint (direct-path insert) for the INSERT (direct-path insert is an Oracle concept; other RDBMSs probably have something similar).
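A rough Oracle sketch combining these tricks (all object names are placeholders):

ALTER SEQUENCE my_seq CACHE 1000;                          -- bigger sequence cache

ALTER TABLE target_t DISABLE CONSTRAINT target_t_fk;       -- repeat per FK constraint
-- drop the indexes here, after saving their CREATE DDL

INSERT /*+ APPEND */ INTO target_t                         -- direct-path insert
SELECT /*+ PARALLEL(s 4) */ my_seq.NEXTVAL, s.col1, s.col2
FROM   source_t s;
COMMIT;

CREATE INDEX target_t_ix1 ON target_t (col1) PARALLEL 4;   -- recreate indexes, optionally in parallel
ALTER TABLE target_t ENABLE NOVALIDATE CONSTRAINT target_t_fk;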
Not sure how you are inserting the records, but if you can, insert the data in smaller chunks. In my experience, 50 sets of 20k records is often quicker than 1 x 1,000,000.
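If the data is coming from another table, one hedged way to chunk it in PL/SQL (all names and types below are placeholders) is BULK COLLECT with a LIMIT plus FORALL:

DECLARE
    CURSOR c IS SELECT * FROM staging_t;
    TYPE t_rows IS TABLE OF staging_t%ROWTYPE;
    l_rows t_rows;
BEGIN
    OPEN c;
    LOOP
        FETCH c BULK COLLECT INTO l_rows LIMIT 20000;   -- one chunk of 20k rows
        EXIT WHEN l_rows.COUNT = 0;
        FORALL i IN 1 .. l_rows.COUNT
            INSERT INTO target_t VALUES l_rows(i);      -- one bulk bind per chunk
        COMMIT;                                         -- keep each transaction small
    END LOOP;
    CLOSE c;
END;
/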
Make sure your database files are large enough before you start; this saves you from the database having to grow during the insert...
If you are sure about the data, then besides disabling the indexes you can disable referential integrity and constraint checks. You can also lower the transaction isolation level.
All these options come with a price, though. Each one increases the risk of ending up with corrupt data, in the sense that you may be left with invalid FKs, etc.
As another option, one can use Oracle's faster, more advanced Data Pump utilities (expdp, impdp), available from 10g onward. Oracle still supports the old export/import utilities (exp, imp), though.
Oracle provides us with many choices for data loading, some way faster than others:
Oracle10 Data Pump (the Oracle import utility)
SQL INSERT and MERGE statements
PL/SQL bulk loads using the FORALL PL/SQL operator
SQL*Loader
The pros/cons of each can be found here.