I am using several temp tables in my query. When I execute the query below
select * from myView
It takes only 5 seconds to execute.
but when I execute
select * into #temp from myView
It takes 50 seconds (10 times longer than the query above).
We migrated from SQL Server 2000 to SQL Server 2008 R2. In SQL Server 2000 both queries took the same time, but in SQL Server 2008 R2 the second one takes 10 times longer to execute.
Old question, but as I had a similar issue (though on SQL Server 2014) and resolved it in a way I have not seen on any readily available resource, I thought I would share it in hopes of it being helpful to someone else.
I had a similar situation: a view I had created was taking 21 seconds to return its complete result set, but would take 10+ minutes (at which point I stopped the query) when I converted it into a SELECT..INTO. The SELECT was a simple one, with no joins and no predicates. My hunch was that the optimizer was altering the original plan based on the additional INTO clause: instead of simply pulling the data set as in the first instance and then performing the INSERT, it altered the plan in a way that ran very sub-optimally.
I first tried an OPENQUERY, attempting to force the result set to be generated first and then inserted into the temp table. Total running time for this method was 23 seconds, obviously much closer to the original SELECT time. Following this, I returned to my original SELECT..INTO query and added an OPTION (FORCE ORDER) hint to try to replicate the OPENQUERY behavior. This seems to have done the trick, and the time was on par with the OPENQUERY method: 23 seconds.
I don't have enough time at the moment to compare the query plans, but as a quick and dirty option if you run into this issue, you can try:
select * into #temp from myView option (force order);
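If the hint doesn't help, the OPENQUERY method can be sketched like this (assuming a loopback linked server, here called LOCALSERVER, that you would have to set up yourself; the database and view names are placeholders):
select * into #temp from openquery(LOCALSERVER, 'select * from myDatabase.dbo.myView');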
Yeah, I would check the execution plan for your command. There may be overhead from a sort or something.
I think your tempdb database is in trouble. Maybe slow I/O, fragmentation, a broken RAID, etc.
Do you have an ORDER BY clause in your select statement, such as select * from myView order by col1, before inserting into the temp table? If there is an ORDER BY, it slows down the insertion into the temp table heavily. If that is the case, remove the ORDER BY while the insertion happens and order after the insertion has happened, like
select *
into #temp
from myView
then apply order by
select * from #temp order by col1
I've got a SELECT statement which takes around 500-600 ms to execute. If I use the same SELECT in an INSERT INTO ... SELECT ... or a SELECT ... INTO, it takes up to 30 seconds.
The table is essentially a data copy of a view, kept for performance reasons, which gets truncated and refilled with the data from time to time. So my SQL looks like:
TRUNCATE myTable
INSERT INTO myTable (col, col, col) SELECT col, col, col FROM otherTable INNER JOIN ...
I tried multiple things, like inserting the data into a temp table so no indexes etc. are on the table (I also tried dropping the indexes from the original table), but nothing seems to help. If I insert the data into the temp table first (which also takes 30 seconds) and then copy it to the real table, the copy itself is pretty fast (< 1 second).
The query results in ~3800 rows and 30-40 columns.
The second time I execute the TRUNCATE + INSERT INTO / SELECT INTO SQL, it takes less than a second (until I clear all caches). The execution plans look the same, except for the Table Insert, which has a cost of 90%.
I also tried to get rid of any implicit conversions, but that didn't help either.
Does anyone know how this can be possible, or how I could find the problem? The problem exists on multiple systems running SQL Server 2014/2016.
Edit: I just saw that the execution plan of my SELECT shows an "ExcessiveGrant" warning, as it estimated ~11000 rows but the result is only ~3800 rows. Could that be a reason for the slow insert?
I've just had the same problem. All the data types, sizes and allow-NULLs were the same in my SELECT and the target table. I tried changing the table to a heap, then to a clustered index, but it made no difference. The SELECT took around 15 seconds, but with the INSERT it took around 4 minutes.
In my case, I ended up using SELECT INTO a temp table, then SELECTing from that into my real table, and it reverted back to 15 seconds or so.
The OP said they tried this and it didn't work, but it may do for some people.
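A rough sketch of that two-step load, with placeholder table and column names:
SELECT col1, col2, col3
INTO #staging
FROM dbo.sourceTable;     -- the slow SELECT goes here
INSERT INTO dbo.targetTable (col1, col2, col3)
SELECT col1, col2, col3
FROM #staging;            -- this second copy runs fast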
I had an identical problem.
The SELECT takes around 900 ms to execute; the INSERT / SELECT INTO took more than 2 minutes.
I rewrote the SELECT to improve performance. It saved just a few ms on the SELECT itself, but it greatly improved the INSERT.
Try to simplify the query plan as much as possible.
For example, if you have multiple joins, try to prepare a multi-step solution, as sketched below.
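A hedged illustration of that multi-step approach, with hypothetical table names, materializing the intermediate join into a temp table so each statement gets a simple plan:
SELECT a.id, a.x, b.y
INTO #step1
FROM dbo.tableA AS a
INNER JOIN dbo.tableB AS b ON b.id = a.id;
SELECT s.id, s.x, s.y, c.z
INTO #result
FROM #step1 AS s
INNER JOIN dbo.tableC AS c ON c.id = s.id;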
For what it's worth now, I had a similar problem just today. It turned out that the table I was inserting into had INT columns, and the table I was selecting from had SMALLINT columns. Thus, a type conversion was going on (several times) for each row.
Once I changed the target table to have the same types as the source table, the insertion and the selection took times of the same order of magnitude.
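For illustration only (the table and column names are hypothetical), the fix amounts to aligning the target's types with the source:
ALTER TABLE dbo.targetTable ALTER COLUMN qty SMALLINT;   -- was INT; matching the source avoids a per-row conversion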
I am running a query which selects data on the basis of joins between 6-7 tables. When I execute the query, it takes 3-4 seconds to complete, but when I put a WHERE clause on the fetched data, it takes more than a minute to execute. My query fetches a large amount of data, so I can't write it here, but the situation I faced is explained below:
Select Category,x,y,z
from
(
---Sample Query
) as a
It takes only 3-4 seconds to execute. But
Select Category,x,y,z
from
(
---Sample Query
) as a
where category Like 'Spart%'
is taking more than 2-3 minutes to execute.
Why is it taking more time to execute when I use the where clause?
It's impossible to say exactly what the issue is without seeing the full query. It is likely that the optimiser is pushing the WHERE into the "Sample Query" in a way that is not performant. This could possibly be resolved by updating statistics on the tables, but an easier option is to insert the whole query into a temporary table and filter from there.
Select Category,x,y,z
INTO #temp
from
(
---Sample Query
) as a
SELECT * FROM #temp WHERE category Like 'Spart%'
This will force the optimiser to tackle it in the logical order of pulling your data together before applying the WHERE to the end result. You might like to consider indexing the temp table's category field also.
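For example, a minimal sketch of indexing the temp table before filtering (the index name is arbitrary):
CREATE NONCLUSTERED INDEX IX_temp_Category ON #temp (Category);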
If you're using MS SQL, check the actual execution plan in Management Studio; it may already suggest an index to create.
In any case, you should add the column "Category" to the index used by the query.
If you don't have an index on that table, create one composed of the column "Category" and all the other columns used in joins or the WHERE clause, as sketched below.
Bear in mind that by using a LIKE 'text%' clause you could end up with an index scan and not an index seek.
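A hedged example of such an index, using the placeholder names from the queries above (swap in your real table name):
CREATE NONCLUSTERED INDEX IX_myTable_Category
ON dbo.myTable (Category)
INCLUDE (x, y, z);   -- covers the selected columns so the seek needs no lookups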
With SourceTable having > 15MM records and Bad_Phrase having > 3K records, the following query takes almost 10 hours to run on SQL Server 2005 SP4.
Update [SourceTable]
Set Bad_Count = (Select count(*)
                 from Bad_Phrase
                 where [SourceTable].Name like '%' + Bad_Phrase.PHRASE + '%')
In English, this query is counting the number of times that any phrases listed in Bad_Phrase are a substring of the column [Name] in the SourceTable and then placing that result in the column Bad_Count.
I would like some suggestions on how to have this query run considerably faster.
For lack of a better idea, here is one:
I don't know if SQL Server natively supports parallelizing an UPDATE statement, but you can try to do it yourself manually by partitioning the work that needs to be done.
For instance, just as an example, if you can run the following two UPDATE statements in parallel (manually or by writing a small app), I'd be curious to see whether you can bring down your total processing time.
Update [SourceTable]
Set Bad_Count=(
Select count(*)
from Bad_Phrase
where [SourceTable].Name like '%'+Bad_Phrase.PHRASE+'%'
)
where Name < 'm'
Update [SourceTable]
Set Bad_Count=(
Select count(*)
from Bad_Phrase
where [SourceTable].Name like '%'+Bad_Phrase.PHRASE+'%'
)
where Name >= 'm'
So the 1st update statement takes care of updating all the rows whose names start with the letters a-l, and the 2nd query takes care of m-z.
It's just an idea, and you can try splitting this into smaller chunks and more parallel update statements, depending on the capacity of your SQL Server machine.
Sounds like your query is scanning the whole table. Do your tables have proper indexes on them? Putting an index on columns that appear in a WHERE clause is a good place to start. You can also get the cost of the query in SQL Server Management Studio ("Display Estimated Execution Plan", or, if you're willing to wait, "Include Actual Execution Plan"; both are buttons in the query window). The cost will provide insight into what is taking forever and possibly steer you toward writing faster queries.
You are updating the table using a subquery against the same table; every row update will scan the whole table, and that may cause the excessive execution time. I think it is better to insert all the data into a #temp table first and then use the #temp table in your update statement, or to JOIN the source table and the temp table, as sketched below.
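A hedged sketch of that approach, assuming the SourceTable / Bad_Phrase schema from the question and that Name is a reasonable key to join back on:
SELECT s.Name, COUNT(*) AS Bad_Count
INTO #counts
FROM SourceTable AS s
INNER JOIN Bad_Phrase AS b ON s.Name LIKE '%' + b.PHRASE + '%'
GROUP BY s.Name;
UPDATE s
SET Bad_Count = ISNULL(c.Bad_Count, 0)
FROM SourceTable AS s
LEFT JOIN #counts AS c ON c.Name = s.Name;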
I have a query that uses 3 functions and a few different views beneath it, which are too complex to post here. The odd thing I am experiencing is that, when running the top-level query, having more than one search key causes the query to take around an hour to run, whereas splitting the query in two takes about 5 seconds per query.
Here is the top level query:
Select *
from dbo.vwSimpleInvoice i
inner join dbo.vwRPTInvoiceLineItemDetail d on i.InvoiceID = d.InvoiceID
When I add this where clause:
Where i.InvoiceID = 109581
The query takes about 3 seconds to run. Similarly when I add this where clause:
Where i.InvoiceID = 109582
it takes about 3 seconds.
When I add this where clause though:
Where i.InvoiceID in (109581, 109582)
I have had to kill the query after about 50 minutes, and it never returns any results.
This is occurring on a remote client's server running SQL Server 2008 R2 Express. When I run it locally (also on SQL Server 2008 R2 Express), I don't get the massive delay, the last where clause takes about 30 seconds to return. The client has a lot more data than me though.
Any idea where to start troubleshooting this?
Edit:
After the comments below I rebuilt indexes and stats, which improved performance of the first 2 where clauses, but had no effect on the third. I then played around with the query, and discovered that if I rewrote it as:
Select *
from dbo.vwSimpleInvoice i
inner join
(Select * from dbo.vwRPTInvoiceLineItemDetail) d on i.InvoiceID = d.InvoiceID
Where i.InvoiceID in (109581, 109582)
Performance returns to expected levels, around 200 ms. I am now more mystified than ever as to what is occurring...
Edit 2:
Actually, I am wrong. It wasn't rewriting the query like that, I accidentally changed the Where Clause during the rewrite to:
Where d.InvoiceID in (109581, 109582)
(Changed i to d).
Still at a bit of a loss as to why this makes such a massive difference with an inner join?
Further edit:
Playing around with this even further, I still cannot understand it.
Select InvoiceId from tblInvoice Where CustomerID = 2000
returns:
80442, 4988, 98497, 102483, 102484, 107958, 127063, 168444, 168531, 173382, 173487, 173633, 174013, 174160, 174240, 175389
Select * from dbo.vwRPTInvoiceLineItemDetail
Where InvoiceID in
(80442, 4988, 98497, 102483, 102484, 107958, 127063, 168444, 168531, 173382, 173487, 173633, 174013, 174160, 174240, 175389)
Runs: 31 rows returned, 110 ms
Select * from dbo.vwRPTInvoiceLineItemDetail
Where InvoiceID in
(Select InvoiceId from tblInvoice Where CustomerID = 2000)
Runs: 31 rows returned, 65 minutes
The problem you are experiencing is (almost certainly) due to a cached query plan which is appropriate for some parameter values passed to the query but not for others (aka parameter sniffing).
This is a common occurrence, and is often made worse by out-of-date statistics and/or badly fragmented indexes.
First step: ensure you have rebuilt all your indexes and that statistics on non-indexed columns are up to date. (Also, make sure your client has a regularly scheduled index maintenance job)
exec sp_msforeachtable "DBCC DBREINDEX('?')"
go
exec sp_msforeachtable "UPDATE STATISTICS ? WITH FULLSCAN, COLUMNS"
go
This is the canonical reference: Slow in the Application, Fast in SSMS?
If the problem still exists after rebuilding indexes and updating statistics, then you have a few options:
Use dynamic SQL (but read this first: The Curse and Blessings of Dynamic SQL)
Use OPTIMIZE FOR
Use WITH(RECOMPILE)
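For instance, a statement-level recompile hint applied to the query from the question (a sketch; OPTIMIZE FOR would instead pin the plan to a representative value):
Select *
from dbo.vwSimpleInvoice i
inner join dbo.vwRPTInvoiceLineItemDetail d on i.InvoiceID = d.InvoiceID
Where i.InvoiceID in (109581, 109582)
OPTION (RECOMPILE);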
I'm selecting some rows from a table-valued function, but have found an inexplicable, massive performance difference caused by putting SELECT TOP in the query.
SELECT col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = @parameter
--ORDER BY col1
is taking upwards of 5 or 6 mins to complete.
However
SELECT TOP 6000 col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = @parameter
--ORDER BY col1
completes in about 4 or 5 seconds.
This wouldn't surprise me if the returned set of data were huge, but the particular query involved returns ~5000 rows out of 200,000.
So in both cases, the whole of the table is processed, as SQL Server continues to the end in search of 6000 rows which it will never get to. Why the massive difference then? Is this something to do with the way SQL Server allocates space in anticipation of the result set size (the TOP 6000 thereby giving it a low requirement which is more easily allocated in memory)?
Has anyone else witnessed something like this?
Thanks
Table valued functions can have a non-linear execution time.
Let's consider a function equivalent of this query:
SELECT  (
        SELECT  SUM(mi.value)
        FROM    mytable mi
        WHERE   mi.id <= mo.id
        )
FROM    mytable mo
ORDER BY
        mo.value
This query (which calculates the running SUM) is fast at the beginning and slow at the end, since for each row from mo it must sum all the preceding values, which requires rewinding the rowsource.
The time taken to calculate the SUM for each row increases as the row number increases.
If you make mytable large enough (say, 100,000 rows, as in your example) and run this query, you will see that it takes considerable time.
However, if you apply TOP 5000 to this query, you will see that it completes in much less than 1/20 of the time required for the full table.
Most probably, something similar happens in your case too.
To say something more definitely, I need to see the function definition.
Update:
SQL Server can push predicates into the function.
For instance, I just created this TVF:
CREATE FUNCTION fn_test()
RETURNS TABLE
AS
RETURN  (
        SELECT  *
        FROM    master
        );
These queries:
SELECT  *
FROM    fn_test()
WHERE   name = @name

SELECT  TOP 1000 *
FROM    fn_test()
WHERE   name = @name
yield different execution plans (the first one uses a clustered index scan, the second one uses an index seek with a TOP).
I had the same problem: a simple query joining five tables and returning 1000 rows took two minutes to complete. When I added TOP 10000 to it, it completed in less than one second. It turned out that the clustered index on one of the tables was heavily fragmented.
After rebuilding the index the query now completes in less than a second.
Your TOP has no ORDER BY, so it's simply the same as issuing SET ROWCOUNT 6000 first. An ORDER BY would require all rows to be evaluated first, and that would take a lot longer.
If dbo.some_table_function is an inline table-valued udf, then it's simply a macro that's expanded, so it returns the first 6000 rows, as mentioned, in no particular order.
If the udf is multi-statement, then it's a black box and will always pull in the full dataset before filtering. I don't think this is happening.
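For contrast, a multi-statement TVF looks something like this hypothetical example; the optimizer treats its body as a black box and materializes the full result before any outer filter is applied:
CREATE FUNCTION fn_test_ms()
RETURNS @t TABLE (name sysname)
AS
BEGIN
    INSERT INTO @t (name)
    SELECT name FROM sys.objects;   -- full result built here, filtered later
    RETURN;
END;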
Not directly related, but another SO question on TVFs
You may be running into something as simple as caching here. Perhaps (for whatever reason) the "TOP" query is cached? Or it is using an index that the other isn't?
In any case, the best way to quench your curiosity is to examine the full execution plan for both queries. You can do this right in SQL Server Management Studio, and it'll tell you EXACTLY what operations are being performed and how long each is predicted to take.
All SQL implementations are quirky in their own way - SQL Server's no exception. These kind of "whaaaaaa?!" moments are pretty common. ;^)
It's not necessarily true that the whole table is processed if col1 has an index.
The SQL optimizer will choose whether or not to use an index. Perhaps your "TOP" is forcing it to use the index.
If you are using SQL Query Analyzer (the name escapes me), hit Ctrl+K. This will show the execution plan for the query instead of executing it. Mousing over the icons will show the I/O/CPU usage, I believe.
I bet one is using an index seek, while the other isn't.
If you have a generic client:
SET SHOWPLAN_ALL ON;
GO
select ...;
go
see http://msdn.microsoft.com/en-us/library/ms187735.aspx for details.
I think Quassnoi's suggestion seems very plausible. By adding TOP 6000 you are implicitly giving the optimizer a hint that a fairly small subset of the 200,000 rows is going to be returned. The optimizer then uses an index seek instead of a clustered index scan or table scan.
Another possible explanation could be caching, as Jim Davis suggests. This is fairly easy to rule out by running the queries again. Try running the one with TOP 6000 first.
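If you want to rule caching out completely between runs (on a dev or test server only, as this hurts everything else running on the instance), you can clear the caches:
DBCC FREEPROCCACHE;       -- clears cached query plans
DBCC DROPCLEANBUFFERS;    -- clears the buffer pool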