What is the best way to execute 100k insert statements? - sql

I have created a set of 100k insert queries to generate data in multiple Oracle tables for my performance testing. What is the best way to execute this?
In the past I've tried tools like Oracle SQL Developer and Toad, but I'm not sure they can handle this volume.
The statements are simple inserts like:
INSERT INTO SELLING_CODE (SELC_ID, VALC_ID, PROD_ID, SELC_CODE, SELC_MASK, VALC_ID_STATUS)
VALUES (5000001, 63, 1, '91111111', 'N/A', 107);

Inserting 100,000 rows with individual SQL statements is fine. It's not a huge amount of data, and a few simple tricks can keep the run time down to a few seconds.
First, make sure your tool isn't displaying output for each statement. Copying and pasting the statements into a worksheet window would be horribly slow, but saving the statements into a SQL*Plus script and running that script can be fast. Use the real SQL*Plus client if possible; it ships with virtually every Oracle installation and is good at running scripts.
If you have to use SQL Developer, save the 100K statements in a text file, and then run this as a script (F5). This method took 45 seconds on my PC.
set feedback off
@C:\temp\test1.sql
Second, batch the SQL statements to eliminate the per-statement overhead. You don't have to batch all of them; combining 100 rows per statement is enough to remove about 99% of the overhead. For example, generate one thousand statements that each insert 100 rows, like this:
INSERT INTO SELLING_CODE (SELC_ID, VALC_ID, PROD_ID, SELC_CODE, SELC_MASK, VALC_ID_STATUS)
select 5000001, 63, 1, '91111111', 'N/A', 107 from dual union all
select 5000001, 63, 1, '91111111', 'N/A', 107 from dual union all
...
select 5000001, 63, 1, '91111111', 'N/A', 107 from dual;
Save that in a text file, run it the same way in SQL Developer (F5). This method took 4 seconds on my PC.
set feedback off
@C:\temp\test1.sql
If you can't significantly change the format of the INSERT statements, you can simply wrap every 100 statements in a BEGIN ... END; block followed by a / on its own line, as sketched below. That passes 100 statements at a time to the server and significantly reduces the network overhead.
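For example, a minimal sketch reusing the sample row from the question (each anonymous PL/SQL block would hold roughly 100 of the original statements):
BEGIN
    INSERT INTO SELLING_CODE (SELC_ID, VALC_ID, PROD_ID, SELC_CODE, SELC_MASK, VALC_ID_STATUS)
    VALUES (5000001, 63, 1, '91111111', 'N/A', 107);
    INSERT INTO SELLING_CODE (SELC_ID, VALC_ID, PROD_ID, SELC_CODE, SELC_MASK, VALC_ID_STATUS)
    VALUES (5000001, 63, 1, '91111111', 'N/A', 107);
    -- ... about 100 INSERT statements per block ...
END;
/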
For even faster speeds, run the script in regular SQL*Plus. On my PC it only takes 2 seconds to load the 100,000 rows.
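A hedged sketch of running the same file from the command line with the SQL*Plus client (the user, password, connect string and path are placeholders for your environment):
sqlplus your_user/your_password@your_db @C:\temp\test1.sql
Ending the script with commit; and exit lets it run unattended and return to the shell when it finishes.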
For medium-sized data like this it's helpful to keep the convenience of SQL statements. And with a few tricks you can get the performance almost the same as a binary format.

Related

Speed up Python executemany

I'm inserting data from one database to another, so I have 2 connections (Conn1 and Conn2). Below is the code (using pypyodbc).
import pypyodbc
Conn1_Query = "SELECT column FROM Table"
Conn1_Cursor.execute(Conn1_Query)
Conn1_Data = Conn1_Cursor.fetchall()
Conn1_array = []
for row in Conn1_Data:
    Conn1_array.append(row)
The above part runs very quickly.
stmt = "INSERT INTO TABLE(column) values (?)"
Conn2_Cursor.executemany(stmt, Conn1_array)
Conn2.commit()
This part is extremely slow. I've also tried to do a for loop to insert each row at a time using cursor.execute, but that is also very slow. What am I doing wrong and is there anything I can do to speed it up? Thanks for taking a look.
Thought I should also add that the Conn1 data is only ~50k rows. I also have some more setup code at the beginning that I didn't include because it's not pertinent to the question. It takes about 15 minutes to insert. As a comparison, it takes about 25 seconds to write the output to a csv file.
Yes, executemany under pypyodbc sends separate INSERT statements for each row. It acts just the same as making individual execute calls in a loop. Given that pypyodbc is no longer under active development, that is unlikely to change.
However, if you are using a compatible driver like "ODBC Driver xx for SQL Server" and you switch to pyodbc then you can use its fast_executemany option to speed up the inserts significantly. See this answer for more details.
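A minimal sketch of that approach, reusing the placeholder table/column names from the question (the connection strings and driver name are assumptions for your environment; fast_executemany needs a reasonably recent pyodbc and a Microsoft ODBC driver for SQL Server):
import pyodbc

# Placeholder connection strings; adjust driver, server, database and authentication.
src = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=srv1;DATABASE=db1;Trusted_Connection=yes;")
dst = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};SERVER=srv2;DATABASE=db2;Trusted_Connection=yes;")

src_cursor = src.cursor()
dst_cursor = dst.cursor()
dst_cursor.fast_executemany = True  # send the rows as a parameter array instead of one INSERT per row

rows = src_cursor.execute("SELECT column FROM Table").fetchall()
dst_cursor.executemany("INSERT INTO TABLE(column) values (?)", rows)
dst.commit()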

Error While Inserting large amount of data using Insert statement in SQL Server 2008

I am trying to insert records into a table from a file that contains a large amount of data.
File Description:
Size : 65.0 MB
Records count : 10000
My Sample Data:
INSERT INTO tbldata(col1,col2,col3)values(col1,col2,col3)
GO
INSERT INTO tbldata(col1,col2,col3)values(col1,col2,col3)
GO
INSERT INTO tbldata(col1,col2,col3)values(col1,col2,col3)
GO
INSERT INTO tbldata(col1,col2,col3)values(col1,col2,col3)
GO
.......
INSERT INTO tbldata(col1,col2,col3)values(col1,col2,col3)
GO
INSERT INTO tbldata(col1,col2,col3)values(col1,col2,col3)
GO
.......
... up to 10,000 rows
ERROR:
Exception of type 'System.OutOfMemoryException' was thrown.(mscorlib)
What I tried:
I looked at this answer:
Under SQL Server > Properties > Memory there is a setting for Minimum Memory Per Query. You can raise this number temporarily to help increase the number of records between the GO statements. In my case I raised it to 5000 (10000 caused a system out-of-memory error, which is not good), so I settled for 5000. After a few tests I found that I could now import about 20,000 rows, so I placed a GO statement every 20,000 rows (it took about 10 minutes) and I was able to import over 200,000 rows in one query.
The maximum batch size for SQL Server is 65,536 * Network Packet Size (NPS), where NPS is usually 4 KB. That works out to 256 MB per batch, so a 65 MB script is comfortably under the limit; with 10,000 rows it averages about 6.5 KB per INSERT statement. Since the OutOfMemoryException comes from mscorlib, it is the client tool that is running out of memory while loading and parsing the script, not the server.
My first suggestion would be to keep a GO statement after every INSERT statement, as in your sample, so that the script runs as 10,000 small batches rather than one huge batch; small batches are much easier for the client and server to digest. Be careful: if one of those inserts fails you may have a hard time finding the culprit, so you might want to protect yourself with a transaction, as sketched below. You can add or adjust the GO statements quickly if your editor has a good search-and-replace (one that can match line-ending characters like \r\n) or a macro facility.
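A hedged sketch of such a script (tbldata and the values are placeholders; a transaction opened with BEGIN TRANSACTION stays open across GO because the batches run on the same connection):
BEGIN TRANSACTION;
GO
INSERT INTO tbldata(col1, col2, col3) VALUES (1, 'a', 'b');  -- placeholder values
GO
INSERT INTO tbldata(col1, col2, col3) VALUES (2, 'c', 'd');  -- placeholder values
GO
-- ... one batch per INSERT, up to 10,000 rows ...
COMMIT TRANSACTION;
GO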
The second suggestion is to use the Import and Export Wizard to load the data straight from the source file (Excel, CSV, and so on). The wizard builds a little SSIS package for you behind the scenes and then runs it, and it won't have this problem.
Reference: taken from this answer to Out of memory exception.

Transferring tables with an "insert from location" statement in Sybase IQ is very slow

I am trying to transfer several tables from a Sybase IQ database on one machine, to the same database on another machine (exact same schema and table layout etc).
To do this I'm using an insert from location statement:
insert into <local table> location <other machine> select * from mytablex
This works fine, but the problem is that it is desperately slow. I have a 1 gigabit connection between both machines, but the transfer rate is nowhere near that.
With a 1 gigabyte test file, it takes only 1 or 2 minutes to transfer it via ftp (just as a file, nothing to do with IQ).
But I am only managing 100 gigabytes over 24 hours in IQ. That means that the transfer rate is more like 14 or 15 minutes for 1 gigabyte for the data going through Sybase IQ.
Is there any way I can speed this up?
I saw there is an option to change the packet size, but would that make a difference? Surely if the transfer is 7 times faster for a file the packet size can't be that much of a factor?
Thanks! :)
It appears from the documentation (here and here) that insert ... location is a row-by-row operation, not a bulk operation. This could explain the performance issues that you are seeing.
You may want to look at the bulk loading LOAD TABLE operation instead.
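A hedged sketch of what that could look like, assuming the data has first been extracted to a delimited file that is visible to the target IQ server (the column list, path, delimiters and options are assumptions and will need adjusting for your environment and IQ version):
LOAD TABLE mytablex (col1, col2, col3)
FROM '/data/extracts/mytablex.dat'
FORMAT ASCII
DELIMITED BY ','
ROW DELIMITED BY '\n'
ESCAPES OFF
QUOTES OFF;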
If I recall correctly, IQ 15.x has known bugs where packetsize is effectively ignored for insert...location...select and the default 512 is always used.
The insert...location...select is typically a bulk TDS operation; however, we have found it to be of limited value when working with gigabytes of data, and built a process around extract / LOAD TABLE instead, which is significantly faster.
I know it's not the answer you want, but performance appears to degrade as the data size grows. Some tables will actually never finish, if they are large enough.
Just a thought: you might want to specify the exact columns and wrap the statement in an exec with dynamic SQL. Dynamic SQL is usually a no-no, but if you need the proc to be executable in dev/QA and prod environments, there really isn't another option. I'm assuming this will be called in a controlled environment anyway, but here's what I mean:
declare @cmd varchar(2500), @location varchar(255)
set @location = 'SOMEDEVHOST.database_name'
set @cmd = 'insert localtablename (col1, col2, coln...) location ' +
           '''' + trim(@location) + '''' +
           ' { select col1, col2, coln... from remotetablename }'
select @cmd      -- preview the generated statement
execute(@cmd)
go

Removing unwanted SQL queries based on a condition

I have no experience with SQL queries or SQL databases, so please excuse me if my terminology is wrong.
So, I have a file containing around 17,000 SQL insert statements that enter data for 5 columns/attributes in a database. Of those 17,000 statements, only around 1,200 have data for all 5 columns, while the rest have data for only 4 columns. I need to delete all the unwanted statements (the ones that don't have data for all 5 columns).
Is there a simple way/process to do it other than going one by one and deleting? If so, it would be great if someone could help me out with it.
A different approach from my fine colleagues here would be to run the file into a staging/disposable database. Use the DELETE that @Rob called out in his response to pare the table down to the desired dataset, then use an excellent, free tool like SSMS Tools Pack to reverse-engineer those insert statements.
I can think of two approaches:
1: Using SQL: insert all the data, then run a query that removes any records that do not have all of the necessary data. If the table is not currently empty, keep track of the ID where your current data "ends" so that your query can use that in a WHERE clause.
DELETE FROM myTable WHERE a IS NULL OR b IS NULL /* etc. */
2: Process the SQL file with a regular expression: use a text editor or the command line to match either the "bad" records or the "good" records. Most text editors have a find-and-replace that accepts regular expressions, and on the command line you can use grep or similar tools. Or write a small script that parses the file in your language of choice, for that matter.
Open the file in Notepad++ and replace all the "bad" lines using a regular expression, for example along the lines of the sketch below.
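A hedged sketch, assuming one INSERT per line and values that never contain commas or parentheses (the pattern is an illustration and would need adjusting for the real data). In Notepad++'s Replace dialog, with Search Mode set to Regular expression, matching the four-value lines and replacing them with nothing could look like:
Find what: ^INSERT.*VALUES\s*\((?:[^,()]*,){3}[^,()]*\);?\r?\n
Replace with: (leave empty)
A statement with all 5 values has four commas inside VALUES(...), so it will not match and survives; a 4-value statement has three commas and is deleted.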

is there a maximum number of inserts that can be run in a batch sql script?

I have a series of simple "INSERT INTO" type statements, but after running about 3 or 4 of them the script stops and I get empty sets when I try selecting from the appropriate tables. Aside from my specific code, I wonder whether there is an ideal way of running multiple insert-type queries.
Right now I just have a txt file saved as a.sql with normal SQL commands separated by ";".
No, there is not. However, if it stops after 3 or 4 inserts, it's a good bet there's an error in the 3rd or 4th insert. Depending on which SQL engine you use, there are different ways of making it report errors during and after operations.
Additionally, if you have lots of inserts, it's a good idea to wrap them inside a transaction: nothing is committed until the transaction's end command, and then everything is committed to your table at once. That way, if something goes wrong, your database doesn't get polluted with data that then has to be deleted again. More importantly, every insert outside a transaction is committed as its own transaction, which makes a long series of inserts really slow; doing 100 inserts inside a single transaction can be about as fast as doing two or three individually committed inserts. A sketch follows below.
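A minimal sketch, using generic SQL with placeholder table and column names (the exact transaction syntax varies slightly between engines, e.g. BEGIN vs START TRANSACTION):
START TRANSACTION;
INSERT INTO mytable (col1, col2) VALUES (1, 'a');
INSERT INTO mytable (col1, col2) VALUES (2, 'b');
-- ... the rest of the inserts ...
COMMIT;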
Maximum Capacity Specifications for SQL Server
Max Batch size = 65,536 * Network Packet Size
However I doubt that Max Batch size is your problem.