SQL insert into - for loop

How do I make SQL Server commit inserts in chunks? I need to copy a large number of rows from an old database into a new table, and the copy has several problems:
1: It takes ages to finish, and I don't see any rows in my table until the entire transaction is finished.
2: My log file is growing like crazy and will probably run out of space.
3: If something breaks in the middle, I have to repeat everything.
If I add SET ROWCOUNT 500, I can limit the number of rows, but I don't know how to continue from the last inserted ID. I could query the new table to see what was inserted last, but I am not sure that's the right thing to do. It's also tricky because my WHERE clause does not use the ID column, so I am not sure how to know exactly where to continue.
What's the best approach for this? Is there a "for loop" or something which would allow me to commit every once in a while?
I am using SSMS for SQL Server 2008 R2.

Although TomTom's answer is sarcastic, it contains two basic options that may help you:
You can write a loop in T-SQL (see, for example, WHILE) and use TOP with an ORDER BY to select the rows in chunks. Note that OFFSET ... FETCH requires SQL Server 2012 or later, so on 2008 R2 it is simpler to continue from the last key you copied. According to Microsoft, you can also minimize logging. If you are mainly worried about restarting without redoing everything, this should be fine, though I don't expect it to be fast.
You can export your selection to a file and use BULK INSERT (or bcp) to load it.
You may find more options here (About Bulk Import and Bulk Export Operations) and here (INSERT Section Best Practices).
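The keyset approach (continue from the last copied key rather than from a row offset) can be sketched like this. It uses Python's built-in sqlite3 module as a portable stand-in for SQL Server, with hypothetical tables old_table and new_table that both have an integer id key; the condition payload IS NOT NULL stands in for your real WHERE clause. Because every batch is ordered by id and committed on its own, reading MAX(id) from the new table is a safe place to resume, assuming the new table holds only copied rows, even though the WHERE clause itself does not use the ID column.

```python
import sqlite3


def copy_in_batches(conn, batch_size=500):
    """Copy matching rows from old_table to new_table, one committed batch at a time.

    Hypothetical schema: both tables have an INTEGER PRIMARY KEY `id` and a
    `payload` column. `payload IS NOT NULL` stands in for the real WHERE clause.
    Returns the number of non-empty batches copied in this call.
    """
    # Resume point: the highest key already committed to the target
    # (assumes new_table holds only rows copied by this routine).
    last_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM new_table").fetchone()[0]
    batches = 0
    while True:
        with conn:  # one transaction per batch: commit on success, roll back on error
            cur = conn.execute(
                """INSERT INTO new_table (id, payload)
                   SELECT id, payload FROM old_table
                   WHERE payload IS NOT NULL AND id > ?
                   ORDER BY id
                   LIMIT ?""",
                (last_id, batch_size),
            )
            if cur.rowcount == 0:
                break  # nothing left to copy
            # Rows went in ascending id order, so the new maximum is the last copied key.
            last_id = conn.execute("SELECT MAX(id) FROM new_table").fetchone()[0]
        batches += 1
    return batches
```

Calling it again after a failure simply picks up after the last committed batch. In T-SQL the same idea would be roughly a WHILE loop around BEGIN TRAN / INSERT ... SELECT TOP (n) ... ORDER BY id / COMMIT, stopping when @@ROWCOUNT is 0; treat that as an outline to adapt, not tested code.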

How do I make SQL Server commit inserts in chunks?
You write code that inserts the rows in chunks. Quite easy.
1: Yes, that is how transactions work, you know.
2: Yes, that is how transactions work, you know.
3: Yes, guess what - THAT IS HOW TRANSACTIONS WORK, you know ;)
but I don't know how to continue with the last inserted ID
It's called programming. Generate the IDs on the client side and remember the last one you generated.
In general, I would advise against doing the INSERT in code: depending on how your code works, this sounds like a data copy operation, and there are other mechanisms for that (the bulk copy interface).
I am using SSMS for SQL Server 2008 R2
Basically, it behaves as you program it. You can easily put a loop on the SSMS side, or export to a file and then bulk insert the file. That won't help you with item 2, though, unless you switch to the simple recovery model and do not care about restoring.
Also, 3 is complex then: how do you properly restart? That takes lots of additional code.
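The "remember the last one" idea for restarts can be sketched as a checkpoint row that is updated in the same transaction as each batch, so after a failure it never points past rows that were rolled back. This again uses Python's sqlite3 as a stand-in for SQL Server; the tables old_table, new_table and copy_progress, the job name, and the fail_after test hook are all hypothetical. Here new_table generates its own keys, which is exactly the case where you cannot read the resume point back from the target.

```python
import sqlite3


def resumable_copy(conn, job="old_to_new", batch_size=500, fail_after=None):
    """Copy old_table into new_table in batches, recording progress in copy_progress.

    new_table generates its own keys, so the last copied source id is kept in a
    checkpoint row instead. `fail_after` is a test hook that raises in the middle
    of the batch with that zero-based index, to simulate a crash.
    Returns the number of batches copied in this call.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS copy_progress ("
                 "job TEXT PRIMARY KEY, last_id INTEGER NOT NULL)")
    with conn:
        conn.execute("INSERT OR IGNORE INTO copy_progress (job, last_id) VALUES (?, 0)", (job,))
    batches = 0
    while True:
        last_id = conn.execute("SELECT last_id FROM copy_progress WHERE job = ?",
                               (job,)).fetchone()[0]
        rows = conn.execute("SELECT id, payload FROM old_table WHERE id > ? ORDER BY id LIMIT ?",
                            (last_id, batch_size)).fetchall()
        if not rows:
            return batches
        with conn:  # the rows and the checkpoint commit together, or roll back together
            conn.executemany("INSERT INTO new_table (payload) VALUES (?)",
                             [(payload,) for _, payload in rows])
            if fail_after is not None and batches == fail_after:
                raise RuntimeError("simulated failure mid-batch")
            conn.execute("UPDATE copy_progress SET last_id = ? WHERE job = ?",
                         (rows[-1][0], job))
        batches += 1
```

If the second batch fails, its inserts are rolled back along with the checkpoint, and the next call resumes from the first batch's last key instead of starting over.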
The best approach depends on the circumstances, and sadly you are way too vague to make more than a blind guess.

Related

How to debug a SQL Server table insertion that hangs?

On my SQL Server, I have a SELECT query that returns no rows and runs for about a minute, yet the INSERT based on it never seems to finish, and I don't understand why. The destination table has only 48k rows. I don't have rights to run any kind of tracing or other diagnostic queries that could help. What else can I try?
Turn your INSERT into a SELECT so you can see what you're trying to insert. Then remove parts of your SQL statement (joins, subqueries, etc.) until it starts running quickly; the last thing you removed was the cause of the slowness. Without an example we can't give you more specific help than that. The process of writing a minimal, complete, and verifiable example (https://stackoverflow.com/help/mcve) will probably help you answer this yourself.

Stored Procedure vs Direct Query in Excel

I have an Excel file that will select roughly 1100 rows with 5 columns of data. Most columns are 5-digit integers. I am using a macro to connect to a SQL Server database and insert these rows into one, maybe two, tables. That is all it does, and then it closes the connection. So the user opens an Excel file that has the rows, clicks a button, and the macro runs.
My question is: should the query be written in Excel, since it's simple and merely inserts the data into a few tables? Or is it more efficient to call a stored procedure, pass it all of the values, and have it allocate the values to the different tables? By efficient, I mean which is quicker. I know this will probably take a few seconds to complete. I just feel that a stored procedure is an extra point along the path the data has to travel before it reaches the tables. Am I wrong? Any thoughts?
There are some advantages to using stored procedures in SQL Server. One is that SQL Server caches the procedure's execution plan after its first run and reuses it, which can improve performance. With your current method, if the statements are sent as ad hoc SQL with the values embedded as literals, SQL Server may need to generate an execution plan each time. Stored procedures can also reduce client/server network traffic.
So, even though it may seem like an extra point along the path, it actually can be faster.
In addition to mark d.'s answer, another reason for using a stored procedure is security.
Your comment says that a customer is entering the data into Excel, so if you are putting direct SQL into your spreadsheet, then there is a risk that someone will open your spreadsheet and find out information about your database. But if you use a stored procedure then there is far less that can be learned.
Either way, make sure that you aren't hardcoding any connection string/account credentials into the spreadsheet.
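Whichever way you go, passing the values as parameters instead of concatenating them into the SQL text, and sending all of the rows in one transaction instead of committing row by row, usually matters more for about 1100 rows than the procedure-versus-query choice; parameterized statements also let SQL Server reuse a cached plan. The sketch below uses Python's sqlite3 as a stand-in for the Excel macro and SQL Server. The table sheet_data, its columns, and the environment variable SHEET_DB_PATH are hypothetical; reading the database location from the environment illustrates the advice about not hardcoding connection details.

```python
import os
import sqlite3


def connect_from_env(var="SHEET_DB_PATH"):
    """Open the database named in an environment variable rather than a hardcoded string."""
    path = os.environ.get(var)
    if not path:
        raise RuntimeError(f"set {var} instead of hardcoding the database location")
    return sqlite3.connect(path)


def insert_sheet_rows(conn, rows):
    """Insert rows of five integers using bound parameters, all in one transaction."""
    with conn:  # single transaction for the whole sheet: commit on success, roll back on error
        conn.executemany(
            "INSERT INTO sheet_data (c1, c2, c3, c4, c5) VALUES (?, ?, ?, ?, ?)",
            rows,  # values travel as parameters, never spliced into the SQL string
        )
```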

INSERT INTO SELECT Troubleshooting

When using INSERT INTO SELECT, the SELECT portion returns in about 25 seconds when run on its own. I let the INSERT INTO run for over 8 minutes and then cancelled it. What can I do to start troubleshooting this? I'm not sure whether locking is involved. This table has constant individual selects and inserts going on. There are also three indexes on this table that I know need to be updated when new rows come in. The scenario above is for only 68,000 rows, and I have other inserts to do that will contain more. Lastly, I'm using SQL Server 2008 R2.
Thanks in advance.
Is there space for the new table? I had a project way back in SQL Server 2000. The auto-growth was killing the timing. When I pre-allocated the space before loading, the process flew.
Make sure your growth options are not 1 MB or 10%. That will kill you too.
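Some quick arithmetic shows why both settings hurt. With hypothetical numbers (a file starting at 100 MB that has to reach 10 GB), 1 MB growth means thousands of separate growth events, while 10% growth means few events but ever larger ones, so the last single growth is close to 1 GB, and any growth that is not instantly initialized has to be zeroed while your insert waits. This is a simplified model, not how SQL Server computes growth internally.

```python
import math


def fixed_growth_events(start_mb, target_mb, step_mb):
    """How many fixed-size autogrowth events it takes to grow from start_mb to target_mb."""
    if target_mb <= start_mb:
        return 0
    return math.ceil((target_mb - start_mb) / step_mb)


def percent_growth_events(start_mb, target_mb, pct):
    """Return (events, largest single growth in MB) for percentage-based autogrowth."""
    size, events, largest = float(start_mb), 0, 0.0
    while size < target_mb:
        growth = size * pct / 100  # each growth is a percentage of the current size
        size += growth
        events += 1
        largest = max(largest, growth)
    return events, largest


for step in (1, 512):
    print(f"{step} MB steps: {fixed_growth_events(100, 10_000, step)} growth events")
events, largest = percent_growth_events(100, 10_000, 10)
print(f"10% steps: {events} growth events, the last one about {largest:.0f} MB")
```

It reports 9900 events for 1 MB steps, 20 for 512 MB steps, and 49 events for 10% steps, the last of them about 970 MB.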
Also, look into instant file initialization. If it is turned off, the SQL Server engine has to zero out the pages before giving you new space; if it is on, it skips this step. Note that it applies only to data files; log file growth is always zero-initialized.
Here is a good MSDN link on this topic:
http://msdn.microsoft.com/en-us/library/gg634626.aspx
Trace flag 1806 disables instant file initialization. Make sure it is not set on your server.
Please post back if this does not fix your issue.
Sincerely
John
The Crafty DBA
www.craftydba.com

Local vs Global temp tables - When to use what?

I have a report that, on execution, connects to the database with the my_report_user username. There can be many end users of the report, and each execution makes a new connection to the database as my_report_user (there is no connection pooling).
I have a result set that I think can be created just once (maybe on the first run of the report), and other report executions can reuse it. Basically, each report execution should check whether this result set (stored as a temp table) exists. If it does not exist, create it; otherwise, reuse what's available.
Should I use local temp tables (#) or global temp tables (##)?
Has anyone tried something like this? If so, please let me know what I should watch out for (almost simultaneous report runs, etc.).
EDIT: I am using SQL Server 2005.
Neither
If you want to cache result sets under your own control, then you cannot use temp tables of any kind. You should use ordinary user tables, stored either in tempdb or in your own result-set cache database.
Temp tables, both #local and ##global, have a lifetime controlled by the connection(s). If your application disconnects, the temp table is deleted, and this does not work well with what you describe.
The really difficult problem will be populating these cached result sets under concurrent runs without mixing things up (ending up with result sets containing duplicate items from concurrent report runs that both believed they were the 'first' run).
As a side note SQL Server Reporting Services already does this out-of-the-box. You can cache and share datasets, you can cache and share reports, it already works and was tested for you.
I find #temp tables can be useful in certain scenarios, but not as a best practice. I have yet to find a valid use for global ##temp tables, either in my own work, or in the work of anyone else who has written about them. The only case I can think of is BCP or other external process which needs to build a temporary data store and then retrieve it in some subsequent step. In that case I would prefer to use a permanent table with some kind of key and a background process to handle cleanup.
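The permanent-table cache these answers describe, including the "only one run builds it" problem and the background cleanup, might look roughly like this. It is a sketch using Python's sqlite3 as a stand-in; the tables report_cache_meta and report_cache_rows and all function names are hypothetical. BEGIN IMMEDIATE takes SQLite's write lock before the existence check, so two near-simultaneous runs cannot both decide they are first. In SQL Server you would need an equivalent, for example sp_getapplock, or UPDLOCK and HOLDLOCK hints on the existence check inside a transaction.

```python
import sqlite3
import time


def create_cache_tables(conn):
    # Ordinary (permanent) tables, so the cache outlives any single report connection.
    conn.execute("CREATE TABLE IF NOT EXISTS report_cache_meta ("
                 "cache_key TEXT PRIMARY KEY, built_at REAL NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS report_cache_rows ("
                 "cache_key TEXT NOT NULL, item TEXT NOT NULL)")


def get_or_build(conn, cache_key, build_items, max_age_s=3600):
    """Return cached items for cache_key, rebuilding them if missing or older than max_age_s.

    Expects a connection opened with isolation_level=None so the explicit
    BEGIN IMMEDIATE / COMMIT below fully control the transaction.
    """
    conn.execute("BEGIN IMMEDIATE")  # lock first, then check: only one run can build
    try:
        row = conn.execute("SELECT built_at FROM report_cache_meta WHERE cache_key = ?",
                           (cache_key,)).fetchone()
        now = time.time()
        if row is None or now - row[0] > max_age_s:
            conn.execute("DELETE FROM report_cache_rows WHERE cache_key = ?", (cache_key,))
            conn.executemany("INSERT INTO report_cache_rows (cache_key, item) VALUES (?, ?)",
                             [(cache_key, item) for item in build_items()])
            conn.execute("INSERT OR REPLACE INTO report_cache_meta (cache_key, built_at) "
                         "VALUES (?, ?)", (cache_key, now))
        items = [r[0] for r in conn.execute(
            "SELECT item FROM report_cache_rows WHERE cache_key = ? ORDER BY rowid",
            (cache_key,))]
        conn.execute("COMMIT")
        return items
    except Exception:
        conn.execute("ROLLBACK")
        raise


def purge_stale(conn, max_age_s=3600):
    """Background cleanup: drop cache entries older than max_age_s."""
    cutoff = time.time() - max_age_s
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("DELETE FROM report_cache_rows WHERE cache_key IN "
                     "(SELECT cache_key FROM report_cache_meta WHERE built_at < ?)", (cutoff,))
        conn.execute("DELETE FROM report_cache_meta WHERE built_at < ?", (cutoff,))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Each report run calls get_or_build on its own connection; a scheduled job calls purge_stale so old result sets do not accumulate.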
It sounds like you are getting into OLAP (reporting) territory now. Reading up on data warehousing will definitely help you.

SQL Timeouts and SSIS

I have an SSIS package that runs a stored procedure to export to an Excel file. Everything worked like a champ until I needed to do a bit of rewriting on the stored procedure. The procedure now takes about 1 minute to run and the exported columns are different, so my problems are the following:
1) When I click the Preview button, SSIS complains: "No column information returned by command".
2) It times out after about 30 seconds.
What I've done:
Tried to clean up/optimize the query. That helped a bit, but it is still doing some major calculations, and it runs just fine in SSMS.
Changed the timeout values to 90 seconds. Didn't seem to help. Maybe someone here can?
Thanks,
Found this little tidbit which helped immensely.
No Column Names
Basically all you need to do is add the following to your SQL query text in SSIS.
SET FMTONLY OFF
SET NOCOUNT ON
Only problem now is it runs slow as molasses :-(
EDIT: It's running just too damn slow.
I changed from using #tempTable to a regular tempTable and added the appropriate DROP statements. Argh...
Although it appears you may have answered part of your own question, you are probably getting the "No column information returned by command" error because the table doesn't exist at the time it tries to validate the metadata. Creating the tables as non-temporary tables resolves this issue.
If you insist on using temporary tables, you can create them in the step preceding the data flow. You would need to create them as ## tables and keep that connection open between tasks (set RetainSameConnection to True on the connection manager) for this to work, but it is an alternative to creating permanent tables.
A shot in the dark based on something obscure I hit years ago: When you modified the procedure, did you add a call to a second procedure? This might mess up SSIS's ability to determine the returned data set.
As for (2), does the procedure take 30+ or 90+ seconds to run in SSMS? If not, do you know that the query is actually getting into SQL from SSIS? Might be worth firing up SQL Profiler to see what's actually being sent to SQL Server. [Which was the way I found out my obscure factoid.]