sql query taking too long - sql

I have simple "insert into.." query which is taking around 40 seconds to execute. It simply takes records from one table and inserts into another.
I have index on F1 and BatchID on the tbl_TempCatalogue table
146624 records are affected.
select itself is not slow, insert into is slow.
The complete query is:
insert into tbl_ItemPrice
(CATALOGUEVERSIONID,SERIESNUMBER,TYPE,PRICEFIELD,PRICE,
PRICEONREQUEST,recordid)
select 296 as CATALOGUEVERSIONID
,ISNULL(F2,'-32768') as SERIESNUMBER
,ISNULL(F3,'-32768') as TYPE
,ISNULL(F4,'-32768') as PRICEFIELD,F5 as PRICE
,(case when F6 IS NULL then null when F6 = '0' then 'False'
else 'True' end ) as PRICEONREQUEST
,newid()
from tbl_TempCatalogue
where F1 = 450
and BATCHID = 72
Thanks

It's possible that if your table is large, you'd benefit from indexes on F1 and BATCHID in the tbl_TempCatalogue table. It's not clear what DBMS you're using, but most have decent tools to show you an execution plan. If you're doing full table scans on a large table, that may take a long time to run.
Also, you say that the "insert into" is slow, but you include just the code for the select. Is the select slow by itself?

You say the problem lies with the INSERT not the SELECT. So possible culprits are (in no fixed order):
triggers
storage allocation
foreign key validation
check constraint validation
i/o bottlenecks
faulty disk
contention with other sessions
What tools you have to undertake diagnosis will depend on which database product (and which version of which database) you are using. Please include these details in your question.

Is there an index on the tbl_TempCatalogue table to help the database find the rows where F1=450 and BATCHID=72?
Otherwise, it will probably need to scan the entire table to find them.

Your query looks fine ..but my concern is in newid() function , what logic written there that may hit the performance, plese try to run without newid() and see the execution time of select statement..
to solve such type of issue follow the steps
see execution time only for select statement
see execution time of insert statement with select
see execution time for select statement without newid() function
after comparing those time slots you will be able to locate the exact root cause that where the problem reside..after that please post that time slots so that we will try to sole this issue..

Related

INSERT INTO SELECT and SELECT INTO take much longer than the SELECT

I've got a SELECT statement which takes around 500-600ms to execute. If I use the same SELECT in a INSERT INTO ... SELECT ... or SELECT ... INTO it takes up to 30 seconds.
The table is more like a data copy of a view, for performance reasons which gets truncated and filled with the data from time to time. So my SQL looks like:
TRUNCATE myTable
INSERT INTO myTable (col, col, col) SELECT col, col, col FROM otherTable INNER JOIN ...
I tried multiple things like inserting the data into a temp table so no indexes etc. are on the table (well I also tried dropping the indexes from the original table) but nothing seems to help. If I'm inserting the data into the temp table first (which also takes 30 seconds) and then copy it to the real table, the copy itself is pretty fast (< 1 second).
The query results in ~3800 rows and like 30-40 columns.
The second time executing the Truncate-INSERT INTO/SELECT INTO sql takes less than a second (until I clear all caches). The execution plans look the same, except for the Table Insert which has a cost of 90%.
Also tried to get rid of any implicit conversions but that didnt help either.
Someone knows how this can be possible or how I could find the problem? The problem exists on multiple systems running Sql Server 2014/2016.
Edit: Just saw the execution plan of my SELECT shows an "Excessiv Grant" message as it estimated ~11000 rows but the result is only ~3800 rows. Could that be a reason for the slow insert?
I've just had the same problem. All the data types, sizes & allow-NULLS were the same in my SELECT and target table. I tried changing the table to a HEAP, then a cluster, but it made no difference. The SELECT took around 15 seconds but with the INSERT it took around 4 minutes.
In my case, I ended up using SELECT INTO a temp table, then SELECTing from that into my real table, and it reverted back to 15 seconds or so.
The OP said they tried this and it didn't work, but it may do for some people.
I had identical problem.
Select takes around 900ms to execute insert / select into took more then 2 minutes.
I have re written select to improve performance - just few ms for select but it have great improvement for insert.
Try to simplify query plan as much is possible.
for example if you have multiple joins try to prepare multi - steps solution.
For what it's worth now, I had a similar problem just today. It turned out that the table I was inserting into had INT types, and the table I was selecting from had SMALLINT types. Thus, a type conversion was going on (several times) for each row.
Once I changed the target table to have the same types as the source table, then the insertion and selection took the same order of magnitude.

Exclude Certain Records with Certain Values SQL Select

I've used the solution (by AdaTheDev) from the thread below as it relates to the same question:
How to exclude records with certain values in sql select
But when applying the same query to over 40,000 records, the query takes too long to process (>30mins). Is there another way that's efficient in querying for certain records with certain values (the same question as in the above stackoverflow thread). I've attempted to use this below and still no luck:
SELECT StoreId
FROM sc
WHERE StoreId NOT IN (
SELECT StoreId
FROM StoreClients
Where ClientId=5
);
Thanks --
You could use EXISTS:
SELECT StoreId
FROM sc
WHERE NOT EXISTS (
SELECT 1
FROM StoreClients cl
WHERE sc.StoreId = cl.StoreId
AND cl.ClientId = 5
)
Make sure to create an index on StoreId column. Your query could also benefit from an index on ClientId.
40,000 rows isn't too big for SQL server. The other thread is suggesting some good queries too. If I understand correctly, your problem right now is the performance.
To improve the performance, you may need to add a new index to your table, or just update the statistics on your table.
If you are running this query in SQL server management studio for testing and performance tuning, make sure that you are selecting "Include Actual Execution Plan". This will make the query to take longer to execute, but the execution plan result can help you on where the time is spent.
It usually suggests adding some indexes to improve the performance, which is easy to follow.
If adding new indexes are not possible, then you have to spend some time reading on how to understand the execution plan's diagram and finding the slow areas.

How to speed up this simple query

With SourceTable having > 15MM records and Bad_Phrase having > 3K records, the following query takes almost 10 hours to run on SQL Server 2005 SP4.
Update [SourceTable]
Set Bad_Count = (Select count(*)
from Bad_Phrase
where [SourceTable].Name like '%'+Bad_Phrase.PHRASE+'%')
In English, this query is counting the number of times that any phrases listed in Bad_Phrase are a substring of the column [Name] in the SourceTable and then placing that result in the column Bad_Count.
I would like some suggestions on how to have this query run considerably faster.
For a lack of a better idea, here is one:
I don't know if SQL Server natively supports parallelizing an UPDATE statement, but you can try to do it yourself manually by partitioning the work that needs to be done.
For instance, just as an example, if you can run the following 2 update statements in parallel manually or by writing a small app, I'd be curious to see if you can bring down your total processing time.
Update [SourceTable]
Set Bad_Count=(
Select count(*)
from Bad_Phrase
where [SourceTable].Name like '%'+Bad_Phrase.PHRASE+'%'
)
where Name < 'm'
Update [SourceTable]
Set Bad_Count=(
Select count(*)
from Bad_Phrase
where [SourceTable].Name like '%'+Bad_Phrase.PHRASE+'%'
)
where Name >= 'm'
So the 1st update statement takes care of updating all the rows whose names start with the letters a-l, and the 2nd query takes care of o-z.
It's just an idea, and you can try splitting this into smaller chunks and more parallel update statements, depending on the capacity of your SQL Server machine.
Sounds like your query is scanning the whole table. Does your tables have proper indexes on them. Putting an index on columns that appear in a where clause is a good place to start. You can also try and get the cost of the query in the Sql server management studio (display estimated execution cost) or if your willing to wait (display actual execution cost) are both buttons in the query window. The cost will provide insights as to what is taking forever and possibly steer you to wright faster queries.
You are updating the table using sub query with the same table, every row update will scan the whole table and that may cause too much execution time. I think is better if you will insert first all data in the #temp table and then use the #temp table in your update statement. Or you can JOIN the Source table and Temp table as well.

Query with a UNION sub-query takes a very long time

I've been having an odd problem with some queries that depend on a sub query. They run lightning fast, until I use a UNION statement in the sub query. Then they run endlessly, I've given after 10 minutes. The scenario I'm describing now isn't the original one I started with, but I think it cuts out a lot of possible problems yet yields the same problem. So even though it's a pointless query, bear with me!
I have a table:
tblUser - 100,000 rows
tblFavourites - 200,000 rows
If I execute:
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser);
… then it runs in under a second. However, if I modify it so that the sub query has a UNION, it will run for at least 10 minutes (before I give up!)
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser UNION SELECT uid FROM tblUser);
A pointless change, but it should yield the same result and I don't see why it should take any longer?
Putting the sub-query into a view and calling that instead has the same effect.
Any ideas why this would be? I'm using SQL Azure.
Problem solved. See my answer below.
UNION generate unique values, so the DBMS engine makes sorts.
You can use safely UNION ALL in this case.
UNION is really doing a DISTINCT on all fields in the combined data set. It filters out dupes in the final results.
Is Uid indexed? If not it may take a long time as the query engine:
Generates the first result set
Generates the second result set
Filters out all the dupes (which is half the records) in a hash table
If duplicates aren't a concern (and using IN means they won't be) then use UNION ALL which removes the expensive Sort/Filter step.
UNION's are usually implemented via temporary in-memory tables. You're essentially copying your tblUser two times into memory, WITH NO INDEX. Then every row in tblFavourites incur a complete table scan over 200,000 rows - that's 200Kx200K=40 billion double-row scans (because the query engine must get the uid from both table rows)
If your tblUser has an index on uid (which is definitely true because all tables in SQL Azure must have a clustered index), then each row in tblFavourites incurs a very fast index lookup, resulting in only 200Kxlog(100K) =200Kx17 = 200K row scans, each with 17 b-tree index comparisons (which is much faster than reading the uid from a row on a data page), so it should equate to roughly 200Kx(3-4) or 1 million double-row scans. I believe newer versions of SQL server may also build a temp hash table containing just the uid's, so essentially it gets down to 200K row scans (assuming hash table lookup to be trivial).
You should also generate your query plan to check.
Essentially, the non-UNION query runs around 500,000 times faster if tblUser has an index (should be on SQL Azure).
It turns out the problem was due to one of the indexes ... tblFavourites contained two foreign keys to the primary key (uid) in tblUser:
userId
otherUserId
both columns had the same definition and same indexes, but I discovered that swapping userId for otherUserId in the original query solved the problem.
I ran:
ALTER INDEX ALL ON tblFavourites REBUILD
... and the problem went away. The query now executes almost instantly.
I don't know too much about what goes on behind the scenes in Sql Server/Azure ... but I can only imagine that it was a damaged index or something? I update statistics frequently, but that had no effect.
Thanks!
---- UPDATE
The above was not fully correct. It did fix the problem for around 20 minutes, then it returned. I have been in touch with Microsoft support for several days and it seems the problem is to do with the tempDB. They are working on a solution at their end.
I just ran into this problem. I have about 1million rows to go through and then I realized that some of my IDs were in another table, so I unioned to get the same information in one "NOT EXISTS." I went from the query taking about 7 sec to processing only 5000 rows after a minute or so. This seemed to help. I absolutely hate the solution, but I've tried a multitude of things that all end up w/the same extremely slow execution plan. This one got me what I needed in about 18 sec.
DECLARE #PIDS TABLE ([PID] [INT] PRIMARY KEY)
INSERT INTO #PIDS SELECT DISTINCT [ID] FROM [STAGE_TABLE] WITH(NOLOCK)
INSERT INTO #PIDS SELECT DISTINCT [OTHERID] FROM [PRODUCTION_TABLE] WITH(NOLOCK)
WHERE NOT EXISTS(SELECT [PID] FROM #PIDS WHERE [PID] = [OTHERID]
SELECT (columns needed)
FROM [ORDER_HEADER] [OH] WITH(NOLOCK)
INNER JOIN #PIDS ON [OH].[SOME_ID] = [PID]
(And yes I tried "WHERE EXISTS IN..." for the final select... inner join was faster)
Please let me say again, I personaly feel this is really ugly, but I actually use this join twice in my proc, so it's going to save me time in the long run. Hope this helps.
Doesn't it make more sense to rephrase the questions from
"UserIds that aren't on the combined list of all the Ids that apper in this table and/or that table"
to
"UserIds that aren't on this table AND aren't on that table either
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser)
AND userID NOT IN (SELECT uid FROM tblUser);

SQL massive performance difference using SELECT TOP x even when x is much higher than selected rows

I'm selecting some rows from a table valued function but have found an inexplicable massive performance difference by putting SELECT TOP in the query.
SELECT col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = #parameter
--ORDER BY col1
is taking upwards of 5 or 6 mins to complete.
However
SELECT TOP 6000 col1, col2, col3 etc
FROM dbo.some_table_function
WHERE col1 = #parameter
--ORDER BY col1
completes in about 4 or 5 seconds.
This wouldn't surprise me if the returned set of data were huge, but the particular query involved returns ~5000 rows out of 200,000.
So in both cases, the whole of the table is processed, as SQL Server continues to the end in search of 6000 rows which it will never get to. Why the massive difference then? Is this something to do with the way SQL Server allocates space in anticipation of the result set size (the TOP 6000 thereby giving it a low requirement which is more easily allocated in memory)?
Has anyone else witnessed something like this?
Thanks
Table valued functions can have a non-linear execution time.
Let's consider function equivalent for this query:
SELECT (
SELECT SUM(mi.value)
FROM mytable mi
WHERE mi.id <= mo.id
)
FROM mytable mo
ORDER BY
mo.value
This query (that calculates the running SUM) is fast at the beginning and slow at the end, since on each row from mo it should sum all the preceding values which requires rewinding the rowsource.
Time taken to calculate SUM for each row increases as the row numbers increase.
If you make mytable large enough (say, 100,000 rows, as in your example) and run this query you will see that it takes considerable time.
However, if you apply TOP 5000 to this query you will see that it completes much faster than 1/20 of the time required for the full table.
Most probably, something similar happens in your case too.
To say something more definitely, I need to see the function definition.
Update:
SQL Server can push predicates into the function.
For instance, I just created this TVF:
CREATE FUNCTION fn_test()
RETURNS TABLE
AS
RETURN (
SELECT *
FROM master
);
These queries:
SELECT *
FROM fn_test()
WHERE name = #name
SELECT TOP 1000 *
FROM fn_test()
WHERE name = #name
yield different execution plans (the first one uses clustered scan, the second one uses an index seek with a TOP)
I had the same problem, a simple query joining five tables returning 1000 rows took two minutes to complete. When I added "TOP 10000" to it it completed in less than one second. It turned out that the clustered index on one of the tables was heavily fragmented.
After rebuilding the index the query now completes in less than a second.
Your TOP has no ORDER BY, so it's simply the same as SET ROWCOUNT 6000 first. An ORDER BY would require all rows to be evaluated first, and it's would take a lot longer.
If dbo.some_table_function is a inline table valued udf, then it's simply a macro that's expanded so it returns the first 6000 rows as mentioned in no particular order.
If the udf is multi valued, then it's a black box and will always pull in the full dataset before filtering. I don't think this is happening.
Not directly related, but another SO question on TVFs
You may be running into something as simple as caching here - perhaps (for whatever reason) the "TOP" query is cached? Using an index that the other isn't?
In any case the best way to quench your curiosity is to examine the full execution plan for both queries. You can do this right in SQL Management Console and it'll tell you EXACTLY what operations are being completed and how long each is predicted to take.
All SQL implementations are quirky in their own way - SQL Server's no exception. These kind of "whaaaaaa?!" moments are pretty common. ;^)
It's not necessarily true that the whole table is processed if col1 has an index.
The SQL optimization will choose whether or not to use an index. Perhaps your "TOP" is forcing it to use the index.
If you are using the MSSQL Query Analyzer (The name escapes me) hit Ctrl-K. This will show the execution plan for the query instead of executing it. Mousing over the icons will show the IO/CPU usage, I believe.
I bet one is using an index seek, while the other isn't.
If you have a generic client:
SET SHOWPLAN_ALL ON;
GO
select ...;
go
see http://msdn.microsoft.com/en-us/library/ms187735.aspx for details.
I think Quassnois' suggestion seems very plausible. By adding TOP 6000 you are implicitly giving the optimizer a hint that a fairly small subset of the 200,000 rows are going to be returned. The optimizer then uses an index seek instead of an clustered index scan or table scan.
Another possible explanation could caching, as Jim davis suggests. This is fairly easy to rule out by running the queries again. Try running the one with TOP 6000 first.