I'm using a SELECT INTO statement to materialize the result of my query into a new table.
SELECT fields
INTO [newtable]
FROM table1, table2, table3, table4
[where clause to filter records from four tables]
I would like to know how this statement would impact performance if there are millions of records to insert into the new table. Will there be an OutOfMemory error in this case?
My environment is SQL Server 2008 R2.
Will there be OutOfMemory error in this case?
SQL Server is going to estimate how many records you are inserting. It will then ask the OS for that much memory, hopefully avoiding an out-of-memory situation.
If SQL Server isn't given enough memory, or if it asks for too little, it will temporarily store the excess data in TempDB. As the name suggests, TempDB is a temporary database that lives on disk. https://msdn.microsoft.com/en-us/library/ms190768.aspx
To ensure good performance, put as much RAM as possible on your database server.
To ensure good performance, put your TempDB on its own hard drive. And make sure you buy an SSD hard drive, not a cheap spinner.
If you look at the execution plan in SQL Server Management Studio, you can see details such as how much memory the query used. http://www.developer.com/db/understanding-a-sql-server-query-execution-plan.html
If the memory SQL Server used is very different than the memory it asked for, your statistics are probably out of date. (Search for "SQL Server Statistics" for more info.)
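If you want to inspect the requested versus granted memory directly while a query is running, one option is the memory-grants DMV, which exists in SQL Server 2008 R2 (a quick sketch):
SELECT session_id, requested_memory_kb, granted_memory_kb, max_used_memory_kb
FROM sys.dm_exec_query_memory_grants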
SQL Server doesn't hold the whole result set in memory while it is transferring the data, so there shouldn't be memory issues.
Logging and I/O performance will take a hit as you are creating millions of rows. Performance will also degrade if the joins in your SELECT statement are complex (if the SELECT alone takes ages, the INSERT will take even longer).
SELECT INTO is typically used to generate temp tables or to copy another table (data and/or structure).
SELECT INTO is usually much better performance wise since it is minimally logged.
Create a temp table with the necessary fields from the four tables, e.g.:
create table #temp_data
(
empid int,
emp_name varchar(50),
strt_date datetime,
emp_addrss varchar(50),
emp_sal numeric
)
Next, insert the data from the four tables into those columns, with whatever conditions are required.
insert #temp_data values
(
001,'raj','2013-04-03','hyderabad',25000
)
If you need to change rows afterwards, use an UPDATE with a SET list:
update #temp_data
set emp_name = 'raj',
    strt_date = '2013-04-03',
    emp_addrss = 'hyderabad',
    emp_sal = 25000
where empid = 1
Now write the query as per your requirements, with a where clause and so on:
select
empid,emp_name,strt_date,emp_addrss,emp_sal
from #temp_data
where empid = 001
Result:
empid  emp_name  strt_date                emp_addrss  emp_sal
1      raj       2013-04-03 00:00:00.000  hyderabad   25000
I am having some trouble selecting data from a source table into a destination table, to find out which data from the source table has not yet been integrated into the destination table.
The source table lives in another DBMS, which we access from SQL Server through a linked server; the integration is pretty much a straightforward column-to-column copy from source to destination (no other calculation).
When I execute a select statement like this
SELECT A.*
FROM [ORACLE_DB]..GROUP.TABLEA AS A
WHERE NOT EXISTS (SELECT 1 FROM TABLEA as B WHERE A.ID = B.ID)
It takes forever to select the data, and the amount of data is pretty huge, about 20 million rows.
Is there any other way to select these rows so that the SELECT executes efficiently and faster? Thank you so much; any ideas and advice will be much appreciated.
You are likely falling foul of a distributed query going 'N+1'. The heuristics are somewhat arcane in the way they penalise network speed. You can verify this using SQL Profiler.
If so, you can fix it as follows (a sketch comes after this list):
create a local temp table to hold the required data from the linked-server remote table(s) [and apply any differing collation to the temp table columns];
insert the remote data into the local temp table;
join the local table with the temp table.
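A minimal sketch of that staging approach, reusing the names from the question (the ID column and its type are assumptions for illustration):
CREATE TABLE #remote_a
(
    ID INT NOT NULL PRIMARY KEY  -- add COLLATE clauses here if collations differ
)

INSERT INTO #remote_a (ID)
SELECT A.ID
FROM [ORACLE_DB]..GROUP.TABLEA AS A  -- one streaming pull over the link

SELECT r.ID
FROM #remote_a AS r
WHERE NOT EXISTS (SELECT 1 FROM TABLEA AS B WHERE r.ID = B.ID)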
If TABLEA is significantly smaller than [ORACLE_DB]..GROUP.TABLEA, you can create a linked server on the Oracle side that references your SQL Server TABLEA, and then just query a view or execute a stored procedure residing in Oracle. That lets the expensive filtering be performed where the bulk of the data is, instead of on SQL Server.
I have a table with 20 billion rows. The table does not have any indexes, as it was created on the fly for a bulk insert operation. The table is used in a stored procedure, which does the following operations:
delete a
from master a
inner join (select distinct Col from TableB) b
    on a.Col = b.Col

insert into master
select col1, col2, col3  -- the grouped columns must be listed explicitly; SELECT * is not valid with GROUP BY
from TableB
group by col1, col2, col3
TableB is the one with 20 billion rows. I don't want to execute the SP directly, because it might take days to complete. Master is also a huge table and has a clustered index on Col.
Can I pass chunks of rows to the stored procedure and perform the operation in batches? That might reduce the log file growth. If yes, how can I do that?
Should I create a clustered index on the table and then execute the SP? That might be a little faster, but creating a clustered index on such a huge table might itself take 10 hours to complete.
Or is there any other way to perform this operation fast?
I've used a method similar to this one. I'd recommend putting your DB into Bulk Logged recovery mode instead of Full recovery mode if you can.
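For example (the database name is illustrative):
ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED
-- ... perform the bulk operation ...
ALTER DATABASE MyDatabase SET RECOVERY FULL
-- take a log backup afterwards to restore full point-in-time coverage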
The blog entry is reproduced below to future-proof it.
Below is a technique used to transfer a large amount of records from
one table to another. This scales pretty well for a couple reasons.
First, this will not fill up the entire log prior to committing the
transaction. Rather, it will populate the table in chunks of 10,000
records. Second, it’s generally much quicker. You will have to play
around with the batch size. Sometimes it’s more efficient at 10,000,
sometimes 500,000, depending on the system.
If you do not need to insert into an existing table and just need a
copy of the table, it is better to do a SELECT INTO. However for this
example, we are inserting into an existing table.
Another trick you should do is to change the recovery model of the
database to simple. This way, there will be much less logging in the
transaction log.
The WITH (TABLOCK) below only works in SQL 2008.
DECLARE @BatchSize INT = 10000

WHILE 1 = 1
BEGIN
    INSERT INTO [dbo].[Destination] --WITH (TABLOCK) -- Uncomment for 2008
    (
        FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize)
        s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[SOURCE] s
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.Destination
        WHERE PersonID = s.PersonID
    )

    IF @@ROWCOUNT < @BatchSize BREAK
END
With the above example, it is important to have at least a nonclustered index on PersonID in both tables.
Another way to transfer records is to use multiple threads, specifying a range of records for each, like so:
INSERT INTO [dbo].[Destination]
(
    FirstName
    ,LastName
    ,EmailAddress
    ,PhoneNumber
)
SELECT TOP(@BatchSize)
    s.FirstName
    ,s.LastName
    ,s.EmailAddress
    ,s.PhoneNumber
FROM [dbo].[SOURCE] s
WHERE PersonID BETWEEN 1 AND 5000
GO

INSERT INTO [dbo].[Destination]
(
    FirstName
    ,LastName
    ,EmailAddress
    ,PhoneNumber
)
SELECT TOP(@BatchSize)
    s.FirstName
    ,s.LastName
    ,s.EmailAddress
    ,s.PhoneNumber
FROM [dbo].[SOURCE] s
WHERE PersonID BETWEEN 5001 AND 10000
For super fast performance however, I’d recommend using SSIS.
Especially in SQL Server 2008. We recently transferred 17 million
records in 5 minutes with an SSIS package executed on the same server
as the two databases it transferred between.
SQL Server 2008 has made changes with regard to its logging mechanism when inserting records. Previously, to do an insert that was minimally logged, you would have to perform a SELECT.. INTO. Now, you can perform a minimally logged insert if you can lock the table you are inserting into; the TABLOCK hint shown above does this. The exception to this rule is if you have a clustered index on the table AND the table is not empty. If the table is empty, you acquire a table lock, and you have a clustered index, the insert will be minimally logged. However, if you have data in the table, the insert will be fully logged. If you have a nonclustered index on a heap and you acquire a table lock, then only the nonclustered index will be fully logged. It is always better to drop indexes prior to inserting records.
To determine the amount of logging, you can use the following statement:
SELECT * FROM ::fn_dblog(NULL, NULL)
Credit for above goes to Derek Dieter at SQL Server Planet.
If you're dead set on passing a table to your stored procedure, you can pass a table-valued parameter to a stored procedure in SQL Server 2008. You might have better luck with some of the other approaches suggested, like partitioning. SELECT DISTINCT on a table with 20 billion rows might be part of the problem. I wonder if some very basic tuning wouldn't help, too:
Delete A
from master a
where exists (select 1 from TableB b where b.Col = a.Col)
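As a sketch of the table-valued parameter approach mentioned above (the type name, procedure name, and column are illustrative, not from the original schema):
CREATE TYPE dbo.ColList AS TABLE (Col INT PRIMARY KEY)
GO
CREATE PROCEDURE dbo.ProcessChunk
    @Rows dbo.ColList READONLY
AS
BEGIN
    -- delete only the rows matching the chunk passed in
    DELETE a
    FROM master a
    WHERE EXISTS (SELECT 1 FROM @Rows r WHERE r.Col = a.Col)
END
GO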
Suppose we have a table of payments with 35 columns, a primary key (auto-increment bigint), and 3 nonclustered, non-unique indexes (each on one int column).
Among the table's columns we have two datetime fields:
payment_date datetime NOT NULL
edit_date datetime NULL
The table has about 1,200,000 rows.
Only ~1000 rows have edit_date = NULL.
9000 rows have edit_date not null and not equal to payment_date.
The rest have edit_date = payment_date.
When we run the following query 1:
select top 1 *
from payments
where edit_date is not null and (payment_date=edit_date or payment_date<>edit_date)
order by payment_date desc
the server needs a couple of seconds to do it. But if we run query 2:
select top 1 *
from payments
where edit_date is not null
order by payment_date desc
the execution ends up with: "The log file for database 'tempdb' is full. Back up the transaction log for the database to free up some log space."
If we replace * with one specific column, as in query 3:
select top 1 payment_date
from payments
where edit_date is not null
order by payment_date desc
it also finishes in a couple of seconds.
Where is the magic?
EDIT
I've changed query 1 so that it operates over exactly the same number of rows as the second query. It still returns in a second, while query 2 fills tempdb.
ANSWER
I followed the advice to add an index, and did this for both date fields; everything started working quickly, as expected. Though the question was: why, in this exact situation, does SQL Server behave differently on similar queries (query 1 vs query 2)? I wanted to understand the logic of the server's optimization. I would agree if both queries used tempdb similarly, but they didn't.
In the end I marked as the answer the first one, where I saw the must-be symptoms of my problem and, as well, the first thoughts on how to avoid it (i.e. indexes).
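A plausible form of the indexes that were added (names are illustrative; the exact definitions were not given):
CREATE NONCLUSTERED INDEX IX_payments_payment_date ON payments (payment_date DESC)
CREATE NONCLUSTERED INDEX IX_payments_edit_date ON payments (edit_date)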
This is happening because certain steps in an execution plan can trigger writes to tempdb, in particular certain sorts and joins involving lots of data.
Since you are sorting a table with a boatload of columns, SQL Server decides it would be crazy to perform the sort alone in tempdb without the associated data. If it did that, it would need to do a gazillion inefficient bookmark lookups on the underlying table.
Follow these rules:
Try to select only the data you need.
Size tempdb appropriately; if you need to run crazy queries that sort a gazillion rows, you had better have an appropriately sized tempdb (a sizing sketch follows).
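A hypothetical resize of the default tempdb data file (the logical name tempdev is the default; the sizes depend entirely on your workload):
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 10GB, FILEGROWTH = 1GB)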
Usually, tempdb fills up when you are low on disk space, or when you have set an unreasonably low maximum size for database growth.
Many people think that tempdb is used only for #temp tables, when in fact you can easily fill up tempdb without ever creating a single temp table. Some other scenarios that can cause tempdb to fill up:
any sorting that requires more memory than has been allocated to SQL Server will be forced to do its work in tempdb;
if the sorting requires more space than you have allocated to tempdb, one of the above errors will occur;
DBCC CHECKDB('any database') will perform its work in tempdb -- on larger databases, this can consume quite a bit of space;
DBCC DBREINDEX or similar DBCC commands with the 'Sort in tempdb' option set will also potentially fill up tempdb;
large result sets involving unions, order by / group by, cartesian joins, outer joins, cursors, temp tables, table variables, and hashing can often require help from tempdb;
any transactions left uncommitted and not rolled back can leave objects orphaned in tempdb;
use of an ODBC DSN with the option 'create temporary stored procedures' set can leave objects there for the life of the connection.
USE tempdb
GO
SELECT name
FROM tempdb..sysobjects
SELECT OBJECT_NAME(id), rowcnt
FROM tempdb..sysindexes
WHERE OBJECT_NAME(id) LIKE '#%'
ORDER BY rowcnt DESC
The higher rowcnt values indicate the biggest temporary tables that are consuming space.
Short-term fix
DBCC OPENTRAN -- or DBCC OPENTRAN('tempdb')
DBCC INPUTBUFFER(<number>)
KILL <number>
Long-term prevention
-- SQL Server 7.0, should show 'trunc. log on chkpt.'
-- or 'recovery=SIMPLE' as part of status column:
EXEC sp_helpdb 'tempdb'
-- SQL Server 2000, should yield 'SIMPLE':
SELECT DATABASEPROPERTYEX('tempdb', 'recovery')
ALTER DATABASE tempdb SET RECOVERY SIMPLE
Reference: https://web.archive.org/web/20080509095429/http://sqlserver2000.databases.aspfaq.com:80/why-is-tempdb-full-and-how-can-i-prevent-this-from-happening.html
Other references: http://social.msdn.microsoft.com/Forums/is/transactsql/thread/af493428-2062-4445-88e4-07ac65fedb76
Possible Duplicate:
Hidden Features of SQL Server
I've worked as a .NET developer for a while now, predominantly against a SQL Server database for a little over 3 years. I feel that I have a fairly decent grasp of SQL Server from a development standpoint, but I'm ashamed to admit that I just learned today about "WITH TIES" from this answer - Top 5 with most friends.
It is humbling to see questions and answers like this on SO because it helps me realize that I really don't know as much as I think I do and helps re-energize my will to learn more, so I figured what better way than to ask the masses of experts for input on other handy commands/features.
What is the most useful feature/command that the average developer is probably unaware of?
BTW - if you are like I was and don't know what "WITH TIES" is for, here is a good explanation. You'll see quickly why I was ashamed I was unaware of it. I could see where it could be useful though. - http://harriyott.com/2007/06/with-ties-sql-server-tip.aspx
I realize that this is a subjective question, so please allow for at least a few answers before you close it. :) I'll try to edit my question to keep a list of your responses. Thanks.
[EDIT] - Here is a summary of the responses. Please scroll down for more information. Thanks again, guys/gals.
MERGE - A single command to INSERT / UPDATE / DELETE into a table from a row source.
FILESTREAM feature of SQL Server 2008 allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system
CAST - get a date without a time portion
Group By - I gotta say you should definitely know this already
SQL Server Management Studio
Transactions
The sharing of local scope temp tables between nested procedure calls
INSERT INTO
MSDN
JOINS
PIVOT and UNPIVOT
WITH(FORCESEEK) - forces the query optimizer to use only an index seek operation as the access path to the data in the table.
FOR XML
COALESCE
How to shrink the database and log files
Information_Schema
SET IMPLICIT_TRANSACTIONS in Management Studio 2005
Derived tables and common table expressions (CTEs)
OUTPUT clause - allows access to the "virtual" tables called inserted and deleted (like in triggers)
CTRL + 0 to insert null
Spatial Data in SQL Server 2008
FileStream in SQL Server 2008: FILESTREAM feature of SQL Server 2008 allows storage of and efficient access to BLOB data using a combination of SQL Server 2008 and the NTFS file system.
Creating a Table for Storing FILESTREAM Data
Once the database has a FILESTREAM filegroup, tables can be created that contain FILESTREAM columns. As mentioned earlier, a FILESTREAM column is defined as a varbinary (max) column that has the FILESTREAM attribute. The following code creates a table with a single FILESTREAM column
USE Production;
GO
CREATE TABLE DocumentStore (
DocumentID INT IDENTITY PRIMARY KEY,
Document VARBINARY (MAX) FILESTREAM NULL,
DocGUID UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL
UNIQUE DEFAULT NEWID ())
FILESTREAM_ON FileStreamGroup1;
GO
In SQL Server 2008 (and in Oracle 10g): MERGE.
A single command to INSERT / UPDATE / DELETE into a table from a row source.
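A minimal hedged sketch (the table and column names are illustrative):
MERGE INTO dbo.TargetTable AS t
USING dbo.SourceTable AS s
    ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ID, Name) VALUES (s.ID, s.Name)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;  -- MERGE must be terminated with a semicolon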
To generate a list of numbers from 1 to 31 (say, for a calendar):
WITH cal AS
(
    SELECT 1 AS day
    UNION ALL
    SELECT day + 1
    FROM cal
    WHERE day <= 30
)
SELECT day FROM cal  -- the CTE needs this final SELECT to return the list
A single-column index with DESC clause in a clustered table can be used for sorting on column DESC, cluster_key ASC:
CREATE INDEX ix_column_desc ON mytable (column DESC)
SELECT TOP 10 *
FROM mytable
ORDER BY
column DESC, pk
-- Uses the index
SELECT TOP 10 *
FROM mytable
ORDER BY
column, pk
-- Doesn't use the index
CROSS APPLY and OUTER APPLY: these let you join row sources that depend on the values of the tables being joined:
SELECT *
FROM mytable
CROSS APPLY
my_tvf(mytable.column1) tvf
SELECT *
FROM mytable
CROSS APPLY
(
SELECT TOP 5 *
FROM othertable
WHERE othertable.column2 = mytable.column1
) q
EXCEPT and INTERSECT operators: these allow row comparisons in which NULLs compare as equal:
DECLARE @var1 INT
DECLARE @var2 INT
DECLARE @var3 INT

SET @var1 = 1
SET @var2 = NULL
SET @var3 = NULL

SELECT col1, col2, col3
FROM mytable
INTERSECT
SELECT @var1, @var2, @var3
-- selects rows with col1 = 1, col2 IS NULL and col3 IS NULL

SELECT col1, col2, col3
FROM mytable
EXCEPT
SELECT @var1, @var2, @var3
-- selects all other rows
WITH ROLLUP clause: adds a grand-total row to the grouped rows:
SELECT month, SUM(sale)
FROM mytable
GROUP BY
month WITH ROLLUP
Month  SUM(sale)
-----  ---------
Jan    10,000
Feb    20,000
Mar    30,000
NULL   60,000  -- the grand total due to WITH ROLLUP
It's amazing how many people work unprotected with SQL Server as they don't know about transactions!
BEGIN TRAN
...
COMMIT / ROLLBACK
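A fuller hedged sketch with error handling (the table and columns are illustrative):
BEGIN TRY
    BEGIN TRAN
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2
    COMMIT
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK  -- undo both updates on any error
END CATCH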
After creating a #TempTable in a procedure, it is available in all stored procedures that are then called from the original procedure, as the sketch below shows. It is a nice way to share set data between procedures. See: http://www.sommarskog.se/share_data.html
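A minimal sketch of the technique (the procedure and table names are illustrative):
CREATE PROCEDURE dbo.InnerProc AS
BEGIN
    SELECT ID FROM #Shared  -- resolved at run time, when the caller has created #Shared
END
GO
CREATE PROCEDURE dbo.OuterProc AS
BEGIN
    CREATE TABLE #Shared (ID INT)
    INSERT INTO #Shared (ID) VALUES (1)
    EXEC dbo.InnerProc  -- InnerProc can see #Shared
END
GO
EXEC dbo.OuterProc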
COALESCE() accepts a list of expressions and returns the first one that is not null.
For example, if you have a table with City, State, Zipcode, you can use COALESCE() to return the addresses as single strings:
City | State | Zipcode
Houston | Texas | 77058
Beaumont | Texas | NULL
NULL | Ohio | NULL
if you were to run this query against the table:
select COALESCE(City, '') + ' ' + COALESCE(State, '') + ' ' + COALESCE(Zipcode, '')
from tb_addresses
Would return:
Houston Texas 77058
Beaumont Texas
Ohio
You can also use it to collapse rows into a single string, e.g.:
DECLARE @addresses VARCHAR(MAX)
SET @addresses = ''
SELECT @addresses = @addresses + COALESCE(City, '') + ' ' + COALESCE(State, '')
    + ' ' + COALESCE(Zipcode, '') + ', '
FROM tb_addresses
SELECT @addresses
Would return:
Houston Texas 77058, Beaumont Texas, Ohio
A lot of SQL Server developers still don't seem to know about the OUTPUT clause (SQL Server 2005 and newer) on the DELETE, INSERT and UPDATE statement.
It can be extremely useful to know which rows have been INSERTed, UPDATEd, or DELETEd, and the OUTPUT clause allows you to do this very easily - it gives access to the "virtual" tables called inserted and deleted (like in triggers):
DELETE FROM (table)
OUTPUT deleted.ID, deleted.Description
WHERE (condition)
If you're inserting values into a table which has an INT IDENTITY primary key field, with the OUTPUT clause, you can get the inserted new ID right away:
INSERT INTO MyTable(Field1, Field2)
OUTPUT inserted.ID
VALUES (Value1, Value2)
And if you're updating, it can be extremely useful to know what changed - in this case, inserted represents the new values (after the UPDATE), while deleted refers to the old values before the UPDATE:
UPDATE (table)
SET field1 = value1, field2 = value2
OUTPUT inserted.ID, deleted.field1, inserted.field1
WHERE (condition)
If a lot of info will be returned, the output of OUTPUT can also be redirected to a temporary table or a table variable (OUTPUT INTO #myInfoTable).
Extremely useful - and very little known!
Marc
There are a handful of ways to get a date without a time portion; here's one that is quite performant:
SELECT CAST(FLOOR(CAST(getdate() AS FLOAT))AS DATETIME)
Indeed for SQL Server 2008:
SELECT CAST(getdate() AS DATE) AS TodaysDate
The "Information_Schema" gives me a lot of views that I can use to gather information about the SQL objects tables, procedures, views, etc.
If you are using Management Studio 2005 you can have it automatically execute your query as a transaction. In a new query window go to Query->Query Options. Then click on the ANSI "tab" (on the left). Check SET IMPLICIT_TRANSACTIONS. Click OK. Now if you run any query in this current query window it will run as a transaction and you must manually ROLLBACK or COMMIT it before continuing. Additionally, this only works for the current query window; pre-existing/new query windows will need to have the option set.
I've personally found it useful. However, it's not for the faint of heart. You must remember to ROLLBACK or COMMIT your query. It will NOT tell you that you have a pending transaction if you switch to a different query window (or even a new one). However, it will tell you if you try to close the query window.
PIVOT and UNPIVOT
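A minimal PIVOT sketch (the Sales table and its columns are illustrative, not taken from any answer above):
SELECT Product, [2008] AS Y2008, [2009] AS Y2009
FROM (SELECT Product, SaleYear, Amount FROM dbo.Sales) AS src
PIVOT (SUM(Amount) FOR SaleYear IN ([2008], [2009])) AS p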
FOR XML
BACKUP LOG <DB_NAME> WITH TRUNCATE_ONLY
DBCC SHRINKFILE(<DB_LOG_NAME>, <DESIRED_SIZE>)
When I started to manage very large databases on MS SQL Server and the log file had grown over 300 GB, these statements saved my life. In most cases shrinking the whole database will have no effect.
Before running them, be sure to take a full backup of the log, and after running them do a full backup of the DB (the restore sequence is no longer valid).
Most SQL Server developers should know about and use derived tables and common table expressions (CTEs).
The documentation.
Sad to say, but I have come to the conclusion that the most hidden feature developers are unaware of is the documentation on MSDN. Take, for instance, a Transact-SQL verb like RESTORE. The BOL covers the syntax and arguments of RESTORE, but that is only the tip of the iceberg when it comes to documentation. The BOL also covers:
the in depth fundamentals of recovery: Understanding How Restore and Recovery of Backups Work in SQL Server.
end-to-end scenarios on how to deploy a recovery strategy: Implementing Restore Scenarios for SQL Server Databases.
the issues around system databases: Considerations for Backing Up and Restoring System Databases.
optimizing the recovery procedures: Optimizing Backup and Restore Performance in SQL Server.
understanding how to do a restore: Backing Up and Restoring How-to Topics (Transact-SQL).
more corner cases and uncommon scenarios, there are examples like Example: Piecemeal Restore of Only Some Filegroups (Full Recovery Model).
The list goes on and on, and this is just one single topic (backup and restore). Every feature of SQL Server gets similar coverage. Reckon not everything will get the detail backup and recovery gets, but everything is documented and there are How To topics for every feature.
The amount of information available is just ludicrous. Yet the documentation is one of the most underused resources, hence my vote for it being a hidden feature.
How about materialised views? Add a clustered index to a view and you effectively create a table containing duplicate data that is automatically updated. Slows down inserts and updates because you are doing the operation twice but you make selecting a specific subset faster. And apparently the database optimiser uses it without you having to call it explicitly.
Is a view faster than a simple query?
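A minimal hedged sketch of an indexed view (table and column names are illustrative; indexed views require SCHEMABINDING, COUNT_BIG(*), and non-nullable columns under SUM):
CREATE VIEW dbo.SalesByProduct WITH SCHEMABINDING AS
    SELECT ProductID, SUM(Amount) AS TotalAmount, COUNT_BIG(*) AS RowCnt
    FROM dbo.Sales
    GROUP BY ProductID
GO
CREATE UNIQUE CLUSTERED INDEX IX_SalesByProduct ON dbo.SalesByProduct (ProductID)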
It sounds silly to say, but I've looked at a lot of queries where I just asked myself: does the person just not know what GROUP BY is? I'm not sure whether most developers are unaware of it, but it comes up often enough that I wonder sometimes.
Use Ctrl-0 to insert a null value in a cell.
WITH (FORCESEEK) which forces the query optimizer to use only an index seek operation as the access path to the data in the table.
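For example (the table and columns are illustrative; FORCESEEK is available from SQL Server 2008 on, and it needs a usable index on the filtered column):
SELECT OrderID, OrderDate
FROM dbo.Orders WITH (FORCESEEK)
WHERE CustomerID = 42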
Spatial Data in SQL Server 2008, i.e. storing lat/long data in a geography data type and being able to calculate/query using the functions that go along with it.
It supports both Planar and Geodetic data.
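A small sketch of the geodetic side (geography::Point takes latitude, longitude, and an SRID; 4326 is WGS 84, for which STDistance returns meters):
DECLARE @seattle geography = geography::Point(47.6, -122.3, 4326)
DECLARE @london geography = geography::Point(51.5, -0.1, 4326)
SELECT @seattle.STDistance(@london) AS DistanceInMeters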
Why am I tempted to say JOINS?
Derived tables are one of my favorites. They perform so much better than correlated subqueries, but many people continue to use correlated subqueries instead.
Example of a derived table:
select f.FailureFieldName, f.RejectedValue, f.RejectionDate,
       ft.FailureDescription, f.DataTableLocation, f.RecordIdentifierFieldName,
       f.RecordIdentifier, fs.StatusDescription
from dataFailures f
join (select max(dataFlowinstanceid) as dataFlowinstanceid
      from dataFailures
      where dataflowid = 13) a
    on f.dataFlowinstanceid = a.dataFlowinstanceid
join FailureType ft on f.FailureTypeID = ft.FailureTypeID
join FailureStatus fs on f.FailureStatusID = fs.FailureStatusID
When I first started working as a programmer, I started with SQL Server 2000. I had been taught DB theory on Oracle and MySQL, so I didn't know much about SQL Server 2000.
But, as it turned out, neither did the development staff I joined, because they didn't know that you could convert datetime (and related) data types to formatted strings with built-in functions. They were using a very inefficient custom function they had developed. I was more than happy to show them the error of their ways... (I'm not with that company anymore... :-D)
With that anecdote told, I wanted to add this to the list:
select Convert(varchar, getdate(), 101) -- 08/06/2009
select Convert(varchar, getdate(), 110) -- 08-06-2009
These are the two I use most often. There are a bunch more: CAST and CONVERT on MSDN
Bounty open:
OK people, the boss needs an answer and I need a pay rise. It doesn't seem to be a cold-caching issue.
UPDATE:
I've followed the advice below to no avail. However, the client statistics threw up an interesting set of numbers.
@temp (table variable) vs #temp (temp table)
Number of INSERT, DELETE and UPDATE statements:        0 vs 1
Rows affected by INSERT, DELETE, or UPDATE statements: 0 vs 7647
Number of SELECT statements:                           0 vs 0
Rows returned by SELECT statements:                    0 vs 0
Number of transactions:                                0 vs 1
The most interesting are the number of rows affected and the number of transactions. To remind you, the queries below return identical result sets, just into different styles of tables.
The following queries are basically doing the same thing. They both select a set of results (about 7000 rows) and populate it into either a temp table or a table variable. In my mind, the table variable @temp should be created and populated quicker than the temp table #temp; however, the table variable in the first example takes 1 min 15 sec to execute, whereas the temp table in the second example takes 16 seconds.
Can anyone offer an explanation?
declare @temp table (
    id uniqueidentifier,
    brand nvarchar(255),
    field nvarchar(255),
    date datetime,
    lang nvarchar(5),
    dtype varchar(50)
)

insert into @temp (id, brand, field, date, lang, dtype)
select id, brand, field, date, lang, dtype
from view
where brand = 'myBrand'

-- takes 1:15
vs
select id, brand, field, date, lang, dtype
into #temp
from view
where brand = 'myBrand'
DROP TABLE #temp
-- takes 16 seconds
I believe this almost completely comes down to table variable vs. temp table performance.
Table variables are optimized for having exactly one row. When the query optimizer chooses an execution plan, it does so on the (often false) assumption that the table variable contains only a single row.
I can't find a good source for this, but it is at least mentioned here:
http://technet.microsoft.com/en-us/magazine/2007.11.sqlquery.aspx
Other related sources:
http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=125052
http://databases.aspfaq.com/database/should-i-use-a-temp-table-or-a-table-variable.html
Run both with SET STATISTICS IO ON and SET STATISTICS TIME ON. Run 6-7 times each, discard the best and worst results for both cases, then compare the two average times.
I suspect the difference is primarily from a cold cache (first execution) vs. a warm cache (second execution). The output from STATISTICS IO would give away such a case, as a big difference in the physical reads between the runs.
And make sure you have 'lab' conditions for the test: no other tasks running (no lock contention), databases (including tempdb) and logs are pre-grown to required size so you don't hit any log growth or database growth event.
This is not uncommon. Table variables can be (and in a lot of cases ARE) slower than temp tables. Here are some of the reasons for this:
SQL Server maintains statistics for queries that use temporary tables, but not for queries that use table variables. Without statistics, SQL Server might choose a poor processing plan for a query that contains a table variable.
Non-clustered indexes cannot be created on table variables, other than the system indexes that are created for a PRIMARY KEY or UNIQUE constraint. That can influence query performance when compared to a temporary table with non-clustered indexes.
Table variables use internal metadata in a way that prevents the engine from using them within a parallel query (this means they won't take advantage of multi-processor machines).
A table variable is optimized for one row, by SQL Server (it assumes 1 row will be returned).
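A small sketch of the indexing difference (names are illustrative; this describes SQL Server 2008-era behavior):
CREATE TABLE #t (id INT PRIMARY KEY, name NVARCHAR(50))
CREATE NONCLUSTERED INDEX IX_t_name ON #t (name)  -- extra indexes are allowed on temp tables

DECLARE @t TABLE (id INT PRIMARY KEY, name NVARCHAR(50))
-- CREATE INDEX is not possible on @t; only PRIMARY KEY / UNIQUE constraints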
I'm not 100% sure that this is the cause, but the table variable will not have any statistics whereas the temp table will.
SELECT INTO is a minimally logged operation (under the simple or bulk-logged recovery model), which would likely explain most of the performance difference; a plain INSERT writes a log entry for every row.
Additionally, SELECT INTO creates the table as part of the operation, so SQL Server knows automatically that there are no constraints on it, which may factor in.
If it takes over a full minute to insert 7000 records into a temp table (persistent or variable), then the perf issue is almost certainly in the SELECT statement that's populating it.
Have you run DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS before profiling? I'm thinking that maybe it's using some cached results for the second query.