I am using a MERGE query that inserts over 800 million records into a table from another table in the same database (conversion project). We run into the error below when the MERGE reaches this particular table that it has to write to.
2019-02-05 16:35:03.002 Error Could not allocate space for object 'dbo.SORT temporary run storage: 140820412694528' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
MERGE dbo.' + @p_TargetDLTable + ' as TARGET
USING dbo.' + @p_SourceDLSDTable + ' as SOURCE
ON (TARGET.docid = source.docid AND TARGET.objectid = source.objectid AND
target.pagenum = source.pagenum
and target.subpagenum = source.subpagenum and target.pagever =
source.pagever and target.pathid = source.pathid
and target.annote = source.annote)
WHEN NOT MATCHED BY TARGET AND source.clipid != ''X''
THEN INSERT (docid, pagenum, subpagenum, pagever, objectid, pathid, annote,
formatid, ftoffset, ftcount) VALUES (
source.docid, source.pagenum, source.subpagenum, source.pagever,
source.objectid, source.pathid,source.annote ,source.formatid
,source.ftoffset, source.ftcount); '
I decided to use MERGE rather than INSERT INTO because everything I read suggested that, for the type of join that has to be done, it would perform faster.
Is there a way to increase the size of tempdb, or a way for the MERGE to avoid using tempdb? Does an INSERT INTO query also use tempdb?
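The error text itself names the usual remedies: give tempdb more room, or let its files grow. Here is a minimal sketch of both. It assumes the default logical file name tempdev and a hypothetical path T:\tempdb\tempdev2.ndf; the sizes are placeholders, not recommendations.

```sql
-- Inspect the current tempdb files (size is reported in 8 KB pages).
SELECT name, physical_name, size * 8 / 1024 AS size_mb, growth, is_percent_growth
FROM tempdb.sys.database_files;

-- Enlarge the primary data file and enable autogrowth.
-- 'tempdev' is the default logical name; use the name returned by the query above.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 20GB, FILEGROWTH = 1GB);

-- Add a second data file on a drive with free space (hypothetical name and path).
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb\tempdev2.ndf', SIZE = 20GB, FILEGROWTH = 1GB);
```

Note that a plain INSERT ... SELECT can also use tempdb whenever its execution plan needs a sort or spills to disk, so switching statements does not by itself guarantee that tempdb stays untouched.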
I have a table with 3.4 million rows. I want to copy all of this data into another table.
I am performing this task using the below query:
select *
into new_items
from productDB.dbo.items
I need to know the best possible way to do this task.
I had the same problem, except I have a table with 2 billion rows, so the log file would grow without end if I did this, even with the recovery model set to Bulk-Logged:
insert into newtable select * from oldtable
So I operate on blocks of data. This way, if the transfer is interrupted, you just restart it. Also, you don't need a log file as big as the table. You also seem to get less tempdb I/O, not sure why.
set identity_insert newtable on
DECLARE @StartID bigint, @LastID bigint, @EndID bigint
-- resume after the highest id already copied
select @StartID = isNull(max(id),0) + 1
from newtable
select @LastID = max(ID)
from oldtable
-- <= so that a final block holding a single id is still copied
while @StartID <= @LastID
begin
    set @EndID = @StartID + 1000000
    insert into newtable (FIELDS,GO,HERE)
    select FIELDS,GO,HERE from oldtable WITH (NOLOCK)
    where id BETWEEN @StartID AND @EndID
    set @StartID = @EndID + 1
end
set identity_insert newtable off
go
You might need to change how you deal with IDs; this works best if your table is clustered by ID.
If you are copying into a new table, the quickest way is probably what you have in your question, unless your rows are very large.
If your rows are very large, you may want to use the bulk insert functions in SQL Server. I think you can call them from C#.
Or you can first export the data to a text file, then bulk-copy (bcp) it. This has the additional benefit of allowing you to ignore keys, indexes etc.
Also try the Import/Export utility that comes with SQL Server Management Studio. I'm not sure whether it will be as fast as a straight bulk copy, but it lets you skip the intermediate step of writing out a flat file and copy directly table-to-table, which might be a bit faster than your SELECT INTO statement.
I have been working with our DBA to copy an audit table with 240M rows to another database.
Using a simple select/insert created a huge tempdb file.
Using the Import/Export wizard worked but copied 8M rows in 10 min.
Creating a custom SSIS package and adjusting settings copied 30M rows in 10 min.
The SSIS package turned out to be the fastest and most efficient for our purposes.
Earl
Here's another way of transferring large tables. I've just transferred 105 million rows between two servers using this. Quite quick too.
Right-click on the database and choose Tasks/Export Data.
A wizard will take you through the steps. Choosing your SQL Server client as both the data source and the target will allow you to select the database and table(s) you wish to transfer.
For more information, see https://www.mssqltips.com/sqlservertutorial/202/simple-way-to-export-data-from-sql-server/
If it's a one-time import, the Import/Export utility in SSMS will probably be the easiest and fastest option. SSIS also seems to work better for importing large data sets than a straight INSERT.
BULK INSERT or BCP can also be used to import large record sets.
Another option would be to temporarily remove all indexes and constraints on the table you're importing into and add them back once the import process completes. A straight INSERT that previously failed might work in those cases.
If you're dealing with timeouts or locking/blocking issues when going directly from one database to another, you might consider going from one db into TEMPDB and then going from TEMPDB into the other database as it minimizes the effects of locking and blocking processes on either side. TempDB won't block or lock the source and it won't hold up the destination.
Those are a few options to try.
-Eric Isaacs
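For the BULK INSERT option mentioned above, a minimal sketch might look like the following. The table name, file path and terminators are hypothetical and need to match your own export file.

```sql
-- Hypothetical target table and file path; adjust the terminators to the file.
BULK INSERT dbo.new_items
FROM 'C:\export\items.dat'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 100000,  -- each batch of 100,000 rows is committed as its own transaction
    TABLOCK                    -- table-level lock for the duration of the load
);
```

Committing in batches keeps any single transaction small, which is the same idea as the block-by-block copy described earlier in this thread.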
Simple Insert/Select sp's work great until the row count exceeds 1 million. I've watched the tempdb file explode trying to insert/select 20 million+ rows. The simplest solution is SSIS, with the batch row size buffer set to 5000 and the commit size buffer set to 1000.
I know this is late, but if you are encountering semaphore timeouts then you can use row_number to set increments for your insert(s) using something like:
INSERT INTO DestinationTable (column1, column2, etc)
SELECT A.column1, A.column2, A.etc
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RN, column1, column2, etc
    FROM SourceTable
) AS A
WHERE A.RN >= 1 AND A.RN <= 10000
The log file will still grow, so there is that to contend with. You get better performance if you disable constraints and indexes when inserting into an existing table. Then enable the constraints and rebuild the indexes on the table you inserted into once the insertion is complete.
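As a rough sketch of that disable/load/rebuild sequence (the index name IX_DestinationTable_Column1 is hypothetical; substitute your own):

```sql
-- Disable a nonclustered index. Do not disable the clustered index:
-- that makes the table inaccessible until it is rebuilt.
ALTER INDEX IX_DestinationTable_Column1 ON dbo.DestinationTable DISABLE;

-- Stop checking foreign key and check constraints during the load.
ALTER TABLE dbo.DestinationTable NOCHECK CONSTRAINT ALL;

-- ... run the batched INSERT here ...

-- Re-enable the constraints and validate the rows that were loaded.
ALTER TABLE dbo.DestinationTable WITH CHECK CHECK CONSTRAINT ALL;

-- Rebuilding re-enables the disabled index.
ALTER INDEX IX_DestinationTable_Column1 ON dbo.DestinationTable REBUILD;
```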
I like the solution from @Mathieu Longtin to copy in batches, thereby minimising log file issues, and created a version with OFFSET FETCH as suggested by @CervEd.
Others have suggested using the Import/Export Wizard or SSIS packages, but that's not always possible.
It's probably overkill for many but my solution includes some checks for record counts and outputs progress as well.
USE [MyDB]
GO
SET NOCOUNT ON;
DECLARE @intStart int = 1;
DECLARE @intCount int;
DECLARE @intFetch int = 10000;
DECLARE @strStatus VARCHAR(200);
DECLARE @intCopied int = 0;
SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Getting count of HISTORY records currently in MyTable...';
RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
SELECT @intCount = COUNT(*) FROM [dbo].MyTable WHERE IsHistory = 1;
SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Count of HISTORY records currently in MyTable: ' + CONVERT(VARCHAR(20), @intCount);
RAISERROR (@strStatus, 10, 1) WITH NOWAIT; --(note: PRINT resets @@ROWCOUNT to 0 so using RAISERROR instead)
SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Starting copy...';
RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
--Use <= so that a final batch containing a single row is not skipped.
WHILE @intStart <= @intCount
BEGIN
    INSERT INTO [dbo].[MyTable_History] (
        [PK1], [PK2], [PK3], [Data1], [Data2])
    SELECT
        [PK1], [PK2], [PK3], [Data1], [Data2]
    FROM [MyDB].[dbo].[MyTable]
    WHERE IsHistory = 1
    ORDER BY
        [PK1], [PK2], [PK3]
    OFFSET @intStart - 1 ROWS
    FETCH NEXT @intFetch ROWS ONLY;
    SET @intCopied = @intCopied + @@ROWCOUNT;
    SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Records copied so far: ' + CONVERT(VARCHAR(20), @intCopied);
    RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
    SET @intStart = @intStart + @intFetch;
END
--Check the record count is correct.
IF @intCopied = @intCount
BEGIN
    SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Correct record count.';
    RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
END
ELSE
BEGIN
    SET @strStatus = CONVERT(VARCHAR(30), GETDATE()) + ' Only ' + CONVERT(VARCHAR(20), @intCopied) + ' records were copied, expected: ' + CONVERT(VARCHAR(20), @intCount);
    RAISERROR (@strStatus, 10, 1) WITH NOWAIT;
END
GO
If your focus is archiving (DW), you are dealing with a VLDB with 100+ partitioned tables, and you want to keep most of this resource-intensive work off the production (OLTP) server, here is a suggestion (OLTP -> DW):
1) Use backup / Restore to get the data onto the archive server (so now, on Archive or DW you will have Stage and Target database)
2) Stage database: Use partition switch to move data to corresponding stage table
3) Use SSIS to transfer data from staged database to target database for each staged table on both sides
4) Target database: Use partition switch on target database to move data from stage to base table
Hope this helps.
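The partition switches in steps 2 and 4 are plain ALTER TABLE statements. A minimal sketch, with hypothetical table names and partition number; it assumes both tables live in the same database, have identical structure and compatible partitioning, and that the receiving partition is empty:

```sql
-- Step 2 (stage database): move partition 3 of the restored table into its stage table.
ALTER TABLE dbo.AuditLog SWITCH PARTITION 3 TO dbo.AuditLog_Stage PARTITION 3;

-- Step 4 (target database): move the loaded stage partition into the base table.
ALTER TABLE dbo.AuditLog_Stage SWITCH PARTITION 3 TO dbo.AuditLog PARTITION 3;
```

A switch only changes metadata, so it completes quickly regardless of how many rows the partition holds.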
select * into new_items from productDB.dbo.items
That pretty much is it. This is the most efficient way to do it.
I am working on a stored procedure that will bulk insert data from a .csv into a table, but the procedure should not "insert duplicates". Some of the rows might already be in the table, but some of their values might have changed. I guess an UPSERT would be good here, but what about deleting records that already exist in the table but don't exist in the .csv?
The end goal is exactly the same as wiping the entire table and inserting all the .csv data into it, but as the .csv has millions of records and only a few hundred of them change from time to time, I assume "updating" the table is faster than wiping it.
DECLARE @ssql NVARCHAR(4000) = 'BULK INSERT gl_ip_range FROM '''
+ @psTempFilePath
+ ''' WITH ( FIELDTERMINATOR ='','', ROWTERMINATOR =''\n'', FORMAT =''CSV'', FIRSTROW =2 )';
The code above is what I currently use together with clearing the table prior to the insert.
Will clearing the table and inserting the millions of records be faster than doing what I am trying to achieve?
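One way to get the upsert plus the delete in a single statement is to bulk load the .csv into a staging table and MERGE from it. The sketch below is only illustrative: the real gl_ip_range schema is not shown in the question, so the columns ip_from, ip_to and country_code (with ip_from as the key) are assumptions.

```sql
-- Hypothetical staging table; mirror the real gl_ip_range columns here.
CREATE TABLE #gl_ip_range_stage (
    ip_from      BIGINT     NOT NULL PRIMARY KEY,
    ip_to        BIGINT     NOT NULL,
    country_code VARCHAR(2) NOT NULL
);

-- Load the .csv into #gl_ip_range_stage here, using the existing BULK INSERT
-- pointed at the staging table instead of gl_ip_range.

MERGE gl_ip_range AS target
USING #gl_ip_range_stage AS source
    ON target.ip_from = source.ip_from
-- Row exists in both: update only when a value actually changed.
WHEN MATCHED AND (target.ip_to <> source.ip_to
               OR target.country_code <> source.country_code)
    THEN UPDATE SET ip_to = source.ip_to,
                    country_code = source.country_code
-- Row is in the .csv but not in the table: insert it.
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ip_from, ip_to, country_code)
         VALUES (source.ip_from, source.ip_to, source.country_code)
-- Row is in the table but no longer in the .csv: delete it.
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
```

Whether this beats truncate-and-reload depends on the data: the staging load still reads the whole file, so the saving comes from touching only the few hundred changed rows in the real table. It is worth timing both on your own data.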
I am getting
Statement 'SELECT INTO' is not supported in this version of SQL Server
in SQL Server
for the below query inside stored procedure
DECLARE @sql NVARCHAR(MAX)
,@sqlSelect NVARCHAR(MAX) = ''
,@sqlFrom NVARCHAR(MAX) = ''
,@sqlTempTable NVARCHAR(MAX) = '#itemSearch'
,@sqlInto NVARCHAR(MAX) = ''
,@params NVARCHAR(MAX)
SET @sqlSelect ='SELECT
IT.ITEMNR
,IT.USERNR
,IT.ShopNR
,IT.ITEMID'
SET @sqlFrom =' FROM dbo.ITEM AS IT'
SET @sqlInto = ' INTO ' + @sqlTempTable + ' ';
IF (@cityId > 0)
BEGIN
SET @sqlFrom = @sqlFrom +
' INNER JOIN dbo.CITY AS CI2
ON CI2.CITYID = @cityId'
SET @sqlSelect = @sqlSelect +
'
,CI2.LATITUDE AS CITYLATITUDE
,CI2.LONGITUDE AS CITYLONGITUDE'
END
SELECT @params =N'@cityId int '
SET @sql = @sqlSelect +@sqlInto +@sqlFrom
EXEC sp_executesql @sql, @params, @cityId = @cityId
I have around 50,000 records, so I decided to use a temp table, but I was surprised to see this error.
How can I achieve the same in SQL Azure?
Edit: The blog post at http://blogs.msdn.com/b/sqlazure/archive/2010/05/04/10007212.aspx suggests creating a table inside the stored procedure to store the data instead of using a temp table. Is that safe under concurrency? Will it hurt performance?
Adding some points taken from http://blog.sqlauthority.com/2011/05/28/sql-server-a-quick-notes-on-sql-azure/
Each Table must have clustered index. Tables without a clustered index are not supported.
Each connection can use single database. Multiple database in single transaction is not supported.
‘USE DATABASE’ cannot be used in Azure.
Global Temp Tables (or Temp Objects) are not supported.
As there is no concept of cross database connection, linked server is not the concept in Azure at this moment.
SQL Azure is shared environment and because of the same there is no concept of Windows Login.
Always drop TempDB objects after their need as they create pressure on TempDB.
During bulk insert, use the batchsize option to limit the number of rows to be inserted. This will limit the usage of transaction log space.
Avoid unnecessary usage of grouping or blocking ORDER BY operations, as they lead to high memory usage.
SELECT INTO is one of the many things that you can unfortunately not perform in SQL Azure.
What you'd have to do is first create the temporary table, then perform the insert. Something like:
CREATE TABLE #itemSearch (ITEMNR INT, USERNR INT, ShopNR INT, ITEMID INT)
INSERT INTO #itemSearch
SELECT IT.ITEMNR, IT.USERNR, IT.ShopNR ,IT.ITEMID
FROM dbo.ITEM AS IT
The new Azure DB Update preview has this problem resolved:
The V12 preview enables you to create a table that has no clustered
index. This feature is especially helpful for its support of the T-SQL
SELECT...INTO statement which creates a table from a query result.
http://azure.microsoft.com/en-us/documentation/articles/sql-database-preview-whats-new/
Create the table using the # prefix, e.g. create table #itemsearch, then use insert into. The scope of the temp table is limited to the session, so there will be no concurrency problems.
As we all know, a SQL Azure table must have a clustered index, which is why SELECT INTO fails to copy data from one table into another.
If you want to migrate, you must first create a table with the same structure and then execute an INSERT INTO statement.
For a temporary table (a name prefixed with #) you don't need to create an index.
How do I create an index and execute insert into for a temp table?
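To illustrate the create-then-insert pattern for a temp table, including an optional index, here is a minimal sketch. The column types and the index name IX_itemSearch_ITEMID are assumptions; match the types to dbo.ITEM.

```sql
-- Column types are assumptions; match them to dbo.ITEM.
CREATE TABLE #itemSearch (
    ITEMNR INT,
    USERNR INT,
    ShopNR INT,
    ITEMID INT
);

-- Optional: an index on a temp table is created like any other index.
CREATE CLUSTERED INDEX IX_itemSearch_ITEMID ON #itemSearch (ITEMID);

INSERT INTO #itemSearch (ITEMNR, USERNR, ShopNR, ITEMID)
SELECT IT.ITEMNR, IT.USERNR, IT.ShopNR, IT.ITEMID
FROM dbo.ITEM AS IT;
```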
I was wondering how / if it's possible to have a query which drops all temporary tables?
I've been trying to work something out using the tempdb.sys.tables, but am struggling to format the name column to make it something that can then be dropped - another factor making things a bit trickier is that often the temp table names contain a '_' which means doing a replace becomes a bit more fiddly (for me at least!)
Is there anything I can use that will drop all temp tables (local or global) without having to drop them all individually on a named basis?
Thanks!
The point of temporary tables is that they are.. temporary. As soon as they go out of scope
#temp created in stored proc : stored proc exits
#temp created in session : session disconnects
##temp : session that created it disconnects
they disappear. If you find that you need to remove temporary tables manually, you need to revisit how you are using them.
For the global ones, this will generate and execute the statement to drop them all.
declare @sql nvarchar(max)
select @sql = isnull(@sql+';', '') + 'drop table ' + quotename(name)
from tempdb..sysobjects
where name like '##%'
exec (@sql)
It is a bad idea to drop other sessions' [global] temp tables though.
For the local (to this session) temp tables, just disconnect and reconnect again.
The version below avoids all of the hassles of dealing with the '_'s. I just wanted to get rid of non-global temp tables, hence the '#[^#]%' in my WHERE clause, drop the [^#] if you want to drop global temp tables as well, or use a '##%' if you only want to drop global temp tables.
The DROP statement seems happy to take the full name with the '_', etc., so we don't need to manipulate and edit these. The OBJECT_ID(...) NOT NULL allows me to avoid tables that were not created by my session, presumably since these tables should not be 'visible' to me, they come back with NULL from this call. The QUOTENAME is needed to make sure the name is correctly quoted / escaped. If you have no temp tables, @d_sql will be the empty string still, so we check for that before printing / executing.
DECLARE @d_sql NVARCHAR(MAX)
SET @d_sql = ''
SELECT @d_sql = @d_sql + 'DROP TABLE ' + QUOTENAME(name) + ';
'
FROM tempdb..sysobjects
WHERE name like '#[^#]%'
AND OBJECT_ID('tempdb..'+QUOTENAME(name)) IS NOT NULL
IF @d_sql <> ''
BEGIN
PRINT @d_sql
-- EXEC( @d_sql )
END
In a stored procedure they are dropped automatically when the execution of the proc completes.
I normally come across the desire for this when I copy code out of a stored procedure to debug part of it and the stored proc does not contain the drop table commands.
Closing and reopening the connection works as stated in the accepted answer. Rather than doing this manually after each execution you can enable SQLCMD mode on the Query menu in SSMS
And then use the :connect command (adjust to your server/instance name)
:connect (local)\SQL2014
create table #foo(x int)
create table #bar(x int)
select *
from #foo
Can be run multiple times without problems. The messages tab shows
Connecting to (local)\SQL2014...
(0 row(s) affected)
Disconnecting connection from (local)\SQL2014...
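If reconnecting is not convenient, another common pattern when debugging code lifted from a stored procedure is to guard each CREATE with a conditional drop. A minimal sketch using the same #foo and #bar tables:

```sql
-- Drop leftovers from a previous run before recreating them.
-- On SQL Server 2016 and later, DROP TABLE IF EXISTS #foo; does the same thing.
IF OBJECT_ID('tempdb..#foo') IS NOT NULL DROP TABLE #foo;
IF OBJECT_ID('tempdb..#bar') IS NOT NULL DROP TABLE #bar;

create table #foo(x int)
create table #bar(x int)

select *
from #foo
```

OBJECT_ID only resolves local temp tables that belong to the current session, so this does not touch other sessions' tables.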