How to do a massive insert - sql

I have an application in which I have to insert in a database, with SQL Server 2008, in groups of N tuples and all the tuples have to be inserted to be a success insert, my question is how I insert these tuples and in the case that someone of this fail, I do a rollback to eliminate all the tuples than was inserted correctly.
Thanks

On SQL Server you might consider doing a bulk insert.

From .NET, you can use SQLBulkCopy.
Table-valued parameters (TVPs) are a second route. In your insert statement, use WITH (TABLOCK) on the target table for minimal logging. eg:
INSERT Table1 WITH (TABLOCK) (Col1, Col2....)
SELECT Col1, Col1, .... FROM #tvp
Wrap it in a stored procedure that exposes #tvp as parameter, add some transaction handling, and call this procedure from your app.
You might even try passing the data as XML if it has a nested structure, and shredding it to tables on the database side.

You should look into transactions. This is a good intro article that discusses rolling back and such.

If you are inserting the data directly from the program, it seems like what you need are transactions. You can start a transaction directly in a stored procedure or from a data adapter written in whatever language you are using (for instance, in C# you might be using ADO.NET).
Once all the data has been inserted, you can commit the transaction or do a rollback if there was an error.
See Scott Mitchell's "Managing Transactions in SQL Server Stored Procedures for some details on creating, committing, and rolling back transactions.

For MySQL, look into LOAD DATA INFILE which lets you insert from a disk file.
Also see the general MySQL discussion on Speed of INSERT Statements.
For a more detailed answer please provide some hints as to the software stack you are using, and perhaps some source code.

You have two competing interests, doing a large transaction (which will have poor performance, high risk of failure), or doing a rapid import (which is best not to do all in one transaction).
If you are adding rows to a table, then don't run in a transaction. You should be able to identify which rows are new and delete them should you not like how the look on the first round.
If the transaction is complicated (each row affects dozens of tables, etc) then run them in transactions in small batches.
If you absolutely have to run a huge data import in one transaction, consider doing it when the database is in single user mode and consider using the checkpoint keyword.

Related

Can I perform a dump tran mid-transaction? Also, effects of using delay on log?

Good day,
Two questions:
A) If I have something like this:
COMPLEX QUERY
WAIT FOR LOG TO FREE UP (DELAY)
COMPLEX QUERY
Would this actually work? Or would the log segment of tempdb remain just as full, due to still holding on to the log of the first query.
B) In the situation above, is it possible to have the middle query perform a dump tran with truncate_only ?
(It's a very long chain of various queries that are run together. They don't change anything in the databases and I don't care to even keep the logs if I don't have to.)
The reason for the chain is because I need the same two temp tables, and a whole bunch of variables, for various queries in the chain (Some of them for all of the queries). To simply the usage of the query chain by a user with VERY limited SQL knowledge, I collect very simple information at the beginning of the long script, retrieve the rest automatically, and then use it through out the script
I doubt either of these would work, but I thought I may as well ask.
Sybase versions 15.7 and 12 (12.? I don't remember)
Thanks,
Ziv.
Per my understanding of #michael-gardner 's answers this is what I plan:
FIRST TEMP TABLES CREATION
MODIFYING OPERATIONS ON FIRST TABLES
COMMIT
QUERY1: CREATE TEMP TABLE OF THIS QUERY
QUERY1: MODIFYING OPERATIONS ON TABLE
QUERY1: SELECT
COMMIT
(REPEAT)
DROP FIRST TABLES (end of script)
I read that 'select into' is not written to the log, so I'm creating the table with a create (I have to do it this way due to other reasons), and use select into existing table for initial population. (temp tables)
Once done with the table, I drop it, then 'commit'.
At various points in the chain I check the log segment of tempdb, if it's <70% (normally at >98%), I use a goto to reach the end of the script where I drop the last temp tables and the script ends. (So no need for a manual 'commit' here)
I misunderstood the whole "on commit preserve rows" thing, that's solely on IQ, and I'm on ASE.
Dumping the log mid-transaction won't have any affect on the amount of log space. The Sybase log marker will only move if there is a commit (or rollback), AND if there isn't an older open transaction (which can be found in syslogshold)
There are a couple of different ways you can approach solving the issue:
Add log space to tempdb.
This would require no changes to your code, and is not very difficult. It's even possible that tempdb is not properly sized for the sytem, and the extra log space would be useful to other applications utilizing tempdb.
Rework your script to add a commit at the beginning, and query only for the later transactions.
This would accomplish a couple of things. The commit at the beginning would move the log marker forward, which would allow the log dump to reclaim space. Then since the rest of your queries are only reads, there shouldn't be any transaction space associate with them. Remember the transaction log only stores information on Insert/Update/Delete, not Reads.
Int the example you listed above, the users details could be stored and committed to the database, then the rest of the queries would just be select statements using those details for the variables, then a final transaction would cleanup the table. In this scenario the log is only held for the first transaction, and the last transaction..but the queries in the middle would not fill the log.
Without knowing more about the DB configuration or query details it's hard to get much more detailed.

Stored procedure without transaction

I have a stored procedure that performs calculations and stores a large amount of data in new tables, in another database.
If anything goes wrong, I just drop the tables.
I've already set the recovery mode for this database as simple, as the data is available elsewhere. Is there anything else I can do in the stored procedure to limit writing to the transaction log or remove transactions entirely to speed up the process?
It is impossible to completely eliminate transaction log from the equation in SQL Server.
You may try to check bulk logged recovery model in conjunction with bulk insert, but if your calculations are complex and cannot be expressed within a single select statement, it could be worth trying SSIS.
I suggest you that using SSIS package in order to convert data from one database to another database. in SSIS you can control converted data and can use balk insert. In buck insert mode you limit your database to write transaction logs completely.
I ran into similar situations even while using SSIS where my staging database(s) kept logs more then 10 times the size of the actual data (on simple logging and using bulk insert). After lots of searching I have found that it is not feasable to prevent this from happening when doing large data operations like loading a datawarehouse. Instead it is easier to just clean up after you are done by shrinking the log.
dbcc shrinkfile

Debug Insert and temporal tables in SQL 2012

I'm using SQL Server 2012, and I'm debugging a store procedure that do some INSERT INTO #temporal table SELECT.
There is any way to view the data selected in the command (the subquery of the insert into?)
There is any way to view the data inserted and/or the temporal table where the insert maked the changes?
It doesn't matter if is the total rows, not one by one
UPDATE:
Requirements from AT Compliance and Company Policy requires that any modification can be done in the process of test and it's probable this will be managed by another team. There is any way to avoid any change on the script?
The main idea is that the AT user check in their workdesktop the outputs, copy and paste them, without make any change on environment or product.
Thanks and kind regards.
If I understand your question correctly, then take a look at the OUTPUT clause:
Returns information from, or expressions based on, each row affected
by an INSERT, UPDATE, DELETE, or MERGE statement. These results can be
returned to the processing application for use in such things as
confirmation messages, archiving, and other such application
requirements.
For instance:
INSERT INTO #temporaltable
OUTPUT inserted.*
SELECT *
FROM ...
Will give you all the rows from the INSERT statement that was inserted into the temporal table, which were selected from the other table.
Is there any reason you can't just do this: SELECT * FROM #temporal? (And debug it in SQL Server Management Studio, passing in the same parameters your application is passing in).
It's a quick and dirty way of doing it, but one reason you might want to do it this way over the other (cleaner/better) answer, is that you get a bit more control here. And, if you're in a situation where you have multiple inserts to your temp table (hopefully you aren't), you can just do a single select to see all of the inserted rows at once.
I would still probably do it the other way though (now I know about it).
I know of no way to do this without changing the script. Howeer, for the future, you should never write a complex strored proc or script without a debug parameter that allows you to put in the data tests you will want. Make it the last parameter with a default value of 0 and you won't even have to change your current code that calls the proc.
Then you can add statements like the below everywhere you will want to check intermediate results. Further in debug mode you might always rollback any transactions so that a bug will not affect the data.
IF #debug = 1
BEGIN
SELECT * FROM #temp
END

sql queries and inserts

I have a random question. If I were to do a sql select and while the sql server was querying my request someone else does a insert statement... could that data that was inputted in that insert statement also be retrieved from my select statement?
Queries are queued, so if the SELECT occurs before the INSERT there's no possibility of seeing the newly inserted data.
Using default isolation levels, SELECT is generally given higher privilege over others but still only reads COMMITTED data. So if the INSERT data has not been committed by the time the SELECT occurs--again, you wouldn't see the newly inserted data. If the INSERT has been committed, the subsequent SELECT will include the newly inserted data.
If the isolation level allowed reading UNCOMMITTED (AKA dirty) data, then yes--a SELECT occurring after the INSERT but before the INSERT data was committed would return that data. This is not recommended practice, because UNCOMMITTED data could be subject to a ROLLBACK.
If the SELECT statement is executed before the INSERT statement, the selected data will certainly not include the new inserted data.
What happens in MySQL with MyISAM, the default engine, is that all INSERT statements require a table lock; as a result, once an INSERT statement is executed, it first waits for all existing SELECTs to complete before locking the table, performs the INSERT, and then unlocks it.
For more information, see: Internal Locking Methods in the MySQL manual
No, a SELECT that is already executing that the moment of the INSERT will never gather new records that did not exist when the SELECT statement started executing.
Also if you use the transactional storage engine InnoDB, you can be assured that your SELECT will not include rows that are currently being inserted. That's the purpose of transaction isolation, or the "I" in ACID.
For more details see http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html because there are some nuances about read-committed and read-uncommitted transaction isolation modes.
I don't know particulars for MySQL, but in SQL Server it would depend on if there were any locking hints used, and the default behavior for locks. You have a couple of options:
Your SELECT locks the table, which means the INSERT won't process until your select is finished.
Your SELECT is able to do a "dirty read" which means the transaction doesn't care if you get slightly out-of-date data, and you miss the INSERT
Your SELECT is able to do a "dirty read" but the INSERT happens before the SELECT hits that row, and you get the result that was added.
The only way you do that is with a "dirty read".
Take a look at MYSql's documentation on TRANSACTION ISOLATION LEVELS to get a better understanding of what that is.

Local Temporary table in Oracle 10 (for the scope of Stored Procedure)

I am new to oracle. I need to process large amount of data in stored proc. I am considering using Temporary tables. I am using connection pooling and the application is multi-threaded.
Is there a way to create temporary tables in a way that different table instances are created for every call to the stored procedure, so that data from multiple stored procedure calls does not mix up?
You say you are new to Oracle. I'm guessing you are used to SQL Server, where it is quite common to use temporary tables. Oracle works differently so it is less common, because it is less necessary.
Bear in mind that using a temporary table imposes the following overheads:read data to populate temporary tablewrite temporary table data to fileread data from temporary table as your process startsMost of that activity is useless in terms of helping you get stuff done. A better idea is to see if you can do everything in a single action, preferably pure SQL.
Incidentally, your mention of connection pooling raises another issue. A process munging large amounts of data is not a good candidate for running in an OLTP mode. You really should consider initiating a background (i.e. asysnchronous) process, probably a database job, to run your stored procedure. This is especially true if you want to run this job on a regular basis, because we can use DBMS_SCHEDULER to automate the management of such things.
IF you're using transaction (rather than session) level temporary tables, then this may already do what you want... so long as each call only contains a single transaction? (you don't quite provide enough detail to make it clear whether this is the case or not)
So, to be clear, so long as each call only contains a single transaction, then it won't matter that you're using a connection pool since the data will be cleared out of the temporary table after each COMMIT or ROLLBACK anyway.
(Another option would be to create a uniquely named temporary table in each call using EXECUTE IMMEDIATE. Not sure how performant this would be though.)
In Oracle, it's almost never necessary to create objects at runtime.
Global Temporary Tables are quite possibly the best solution for your problem, however since you haven't said exactly why you need a temp table, I'd suggest you first check whether a temp table is necessary; half the time you can do with one SQL what you might have thought would require multiple queries.
That said, I have used global temp tables in the past quite successfully in applications that needed to maintain a separate "space" in the table for multiple contexts within the same session; this is done by adding an additional ID column (e.g. "CALL_ID") that is initially set to 1, and subsequent calls to the procedure would increment this ID. The ID would necessarily be remembered using a global variable somewhere, e.g. a package global variable declared in the package body. E.G.:
PACKAGE BODY gtt_ex IS
last_call_id integer;
PROCEDURE myproc IS
l_call_id integer;
BEGIN
last_call_id := NVL(last_call_id, 0) + 1;
l_call_id := last_call_id;
INSERT INTO my_gtt VALUES (l_call_id, ...);
...
SELECT ... FROM my_gtt WHERE call_id = l_call_id;
END;
END;
You'll find GTTs perform very well even with high concurrency, certainly better than using ordinary tables. Best practice is to design your application so that it never needs to delete the rows from the temp table - since the GTT is automatically cleared when the session ends.
I used global temporary table recently and it was behaving very unwantedly manner.
I was using temp table to format some complex data in a procedure call and once the data is formatted, pass the data to fron end (Asp.Net).
In first call to the procedure, i used to get proper data and any subsequent call used to give me data from last procedure call in addition to current call.
I investigated on net and found out an option to delete rows on commit.
I thought that will fix the problem.. guess what ? when i used on commit delete rows option, i always used to get 0 rows from database. so i had to go back to original approach of on commit preserve rows, which preserves the rows even after commiting the transaction.This option clears rows from temp table only after session is terminated.
then i found out this post and came to know about the column to track call_id of a session.
I implemented that solution and still it dint fix the problem.
then i wrote following statement in my procedure before i starting any processing.
Delete From Temp_table;
Above statemnet made the trick. my front end was using connection pooling and after each procedure call it was commitng the transaction but still keeping the connection in connection pool and subsequent request was using the same connection and hence the database session was not terminated after every call..
Deleting rows from temp table before strating any processing made it work....
It drove me nuts till i found this solution....