SQL query does not stop executing

I've created a stored procedure that selects from a few different tables in different dbs into temp tables to work with in the proc. Everything seems to be fine with filling the tables. My problem arises when I attempt to join them. This is the query I wrote:
SELECT #temp1.id,
#temp2.first_name,
#temp2.last_name,
#temp3.DOB,
#temp4.Sex,
#temp5.SSN
FROM (((#temp1
LEFT JOIN #temp3 ON #temp1.id = #temp3.id)
LEFT JOIN #temp4 ON #temp1.id = #temp4.id)
LEFT JOIN #temp2 ON #temp1.id = #temp2.id)
LEFT JOIN #temp5 ON #temp1.id = #temp5.id;
The query works to an extent. The output window is filled with the results of the select. The problem is that the query doesn't exit. It stops adding new records to the output but continues executing and thus the procedure hangs because it can't move on to the next statement. Any ideas?

I can see from the execution plan that your query is doing a full table scan on all five temp tables. You can create an index on the joining column (ID) in each of the five temp tables as follows:
CREATE CLUSTERED INDEX IDX_C_t1_ID ON #temp1(ID)
CREATE CLUSTERED INDEX IDX_C_t2_ID ON #temp2(ID)
CREATE CLUSTERED INDEX IDX_C_t3_ID ON #temp3(ID)
CREATE CLUSTERED INDEX IDX_C_t4_ID ON #temp4(ID)
CREATE CLUSTERED INDEX IDX_C_t5_ID ON #temp5(ID)
It'd be really helpful if you could include the number of rows and columns for all five tables.

How many records are in #temp1? Are you seeing that same number of records in your output window? If not, then your query is still returning records.
Also, did you explicitly begin a transaction somewhere in your script? Maybe it is waiting for you to commit or rollback.
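A quick way to check for an open transaction (assuming SQL Server; these are standard commands, not something from the question):

```sql
-- Reports the oldest active transaction in the current database, if any
DBCC OPENTRAN;

-- Returns the number of open transactions on the current connection
SELECT @@TRANCOUNT AS OpenTransactions;
```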
Also, I'm not sure why you have parentheses around your joins. They aren't needed (and maybe you are getting some weird execution plan by using them).
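For example, the same query from the question can be written without the nesting:

```sql
SELECT #temp1.id,
       #temp2.first_name,
       #temp2.last_name,
       #temp3.DOB,
       #temp4.Sex,
       #temp5.SSN
FROM #temp1
LEFT JOIN #temp3 ON #temp1.id = #temp3.id
LEFT JOIN #temp4 ON #temp1.id = #temp4.id
LEFT JOIN #temp2 ON #temp1.id = #temp2.id
LEFT JOIN #temp5 ON #temp1.id = #temp5.id;
```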

The running time of this query depends (among other things) on the number of rows in #temp1 and on the indexes it has (by default, none).
First step would be as I see it:
Isolate the execution plan for this specific query. You can do so by filling your data into a permanent table (temp1, rather than #temp1),
then running only the query against temp1.
Seeing the execution plan for the query may suggest which indexes would help, and is a first step towards optimizing it.
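A sketch of how to capture the plan and timings for just this statement (assuming SQL Server Management Studio):

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the five-table join here, against the permanent copies of the data.
-- In SSMS, also enable "Include Actual Execution Plan" (Ctrl+M) first.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```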


SQL Query very slow - Suddenly

I have a SQL stored procedure that was running perfectly (0.2 secs execution or less); suddenly today it's taking more than 10 minutes.
I can see that the issue comes from a LEFT JOIN of a documents table (which stores the location of all the digital files associated with records in the DB).
This documents table currently has 153,234 records.
The table has 2 indexes:
Primary key (uid)
documenttype (nonclustered)
The stored procedure is:
SELECT
.....,
CASE ISNULL(cd.countdocs,0) WHEN 0 THEN 0 ELSE 1 END as hasdocs
.....
FROM
requests re
JOIN
employee e ON (e.employeeuid = re.employeeuid)
LEFT JOIN
(SELECT
COUNT(0) as countnotes, n.objectuid as objectuid
FROM
notes n
WHERE
n.isactive = 1
GROUP BY
n.objectuid) n ON n.objectuid = ma.authorizationuid
/* IF I COMMENT THIS LEFT JOIN THEN WORKS AMAZING FAST */
LEFT JOIN
(SELECT
COUNT(0) as countdocs, cd.objectuid
FROM
cloud_document cd
WHERE
cd.isactivedocument = 1
AND cd.entity = 'COMPANY'
GROUP BY
cd.objectuid) cd ON cd.objectuid = re.authorizationuid
JOIN ....
So I don't know if I have to add another index to improve this query, or maybe the LEFT JOIN I have is not ideal.
If I run the execution plan I get this:
/*
Missing Index Details from SQLQuery7.sql - (local).db_prod (Test/test (55))
The Query Processor estimates that implementing the following index could improve the query cost by 60.8843%.
*/
/*
USE [db_prod]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[cloud_document] ([objectuid],[entity],[isactivedocument])
GO
*/
Any clue on how to solve this?
Thanks.
Just don't go out and add an index. Do some research first!
Can you grab a picture of the query plan and post it? It will show if the query is using the index or not.
Also, complete details of the table would be awesome, including Primary Keys, Foreign Keys, and any indexes. Just script them out to TSQL. A couple of sample records to boot and we can recreate it in a test environment and help you.
Also, take a look at Glenn Barry's DMVs.
http://sqlserverperformance.wordpress.com/tag/dmv-queries/
Good stuff like top running queries, read/write usages of indexes, etc - to name a few.
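As an illustration (a sketch using standard DMVs, not a query from the linked post), finding the top CPU-consuming statements might look like:

```sql
-- Top 10 cached statements by average CPU time
SELECT TOP 10
       qs.total_worker_time / qs.execution_count AS avg_cpu_time,
       qs.execution_count,
       SUBSTRING(st.text, qs.statement_start_offset / 2 + 1, 200) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_cpu_time DESC;
```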
Like many things in life, it all depends on your situation!
Just need more information before we can make a judgement call.
I would actually be surprised if an index on that field helps, as it likely only has two or three values (0, 1, NULL), and indexes are not generally useful when the data has so few distinct values.
I would suspect that either your statistics are out of date or your current indexes need to be rebuilt.
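A minimal sketch of those two maintenance steps, using the documents table named in the question:

```sql
-- Refresh optimizer statistics on the documents table
UPDATE STATISTICS dbo.cloud_document WITH FULLSCAN;

-- Rebuild all indexes on the table
ALTER INDEX ALL ON dbo.cloud_document REBUILD;
```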

why do some columns slow down the query

I am using SQL Server 2012.
I am trying to optimize a query which is something like this:
SELECT TOP 20 ta.id,
ta.name,
ta.amt,
tb.id,
tb.name,
tc.name,
tc.id,
tc.descr
FROM a ta
INNER JOIN b tb
ON ta.id = tb.id
INNER JOIN c tc
ON tb.id = tc.id
ORDER BY ta.mytime DESC
The query takes around 5 - 6 secs to run. There are indexes for all the columns used in joins. The tables have 500k records.
My question is: When I remove the columns tc.name, tc.id and tc.descr from the select, the query returns the results in less than a second. Why?
You need to post the execution plans to really know the difference.
As far as I know, SQL Server does not optimize away joins. After all, even without columns in the select list, the joins can still be used for filtering and multiplying the number of rows.
However, one step might be skipped. With those columns in the select, the engine needs to both go to the index and fetch the page with the data. Without them, the engine does not need to do the fetch. This may subtly tip the balance of the optimizer from one type of join to another.
A second possibility simply involves timing. If you ran the query once, then page caches might be filled on the machine. The second time you run it, the query goes much faster simply because the data is in memory. Don't ever run timings unless you either (1) clear the cache between each call or (2) are sure that the cache is filled equivalently.
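On a development server (never production), the cache can be cleared between timing runs like this:

```sql
-- Flush dirty pages, then drop clean pages from the buffer pool
CHECKPOINT;
DBCC DROPCLEANBUFFERS;

-- Optionally also clear cached query plans
DBCC FREEPROCCACHE;
```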
Do you have clustered indexes? If not, you should create clustered indexes, ideally on integer primary key columns, and rerun your query.
Check http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx for clustered index.
I was finally able to tune the query by adding an additional index to the table. SQL Server did not show/imply a missing index, but I figured it out by creating a new non-clustered index on a field that is present in the select.
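A sketch of that kind of covering index, using the table and columns from the question (the exact index name and key choice are assumptions, since the poster didn't share theirs):

```sql
-- Non-clustered index on the join column, covering the selected columns
-- so the engine doesn't have to fetch the data pages for c
CREATE NONCLUSTERED INDEX IX_c_id_covering
ON dbo.c (id)
INCLUDE (name, descr);
```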
Thanks to you all for coming forward for help.
@Wade the link is really helpful in understanding the SQL optimizer.

Joining Onto CTE Performance

I have a stored procedure where I use a Common Table Expression to build a hierarchical path up a menu (so it can display something like Parent Menu -> Sub Menu -> Sub Sub Menu -> ...)
It works great for what I want to use it for; the issue comes when combining the information I get from the recursive CTE with the information I really want. I do an Inner Join from my data to the CTE and get out the hierarchical path. For something that returns ~300 rows, the stored procedure takes 15-20 seconds on average.
When I insert the results from the CTE into a Temp Table and do the join based on that, the procedure takes less than a second.
I was just wondering why it takes so long to join using only the CTE, or if I am misusing CTE's in some way.
**Edit:** this is the stored procedure, essentially:
With Hierarchical_Path (Menu_ID, Parent_ID, Path)
As
(
Select
EM.Menu_Id, Parent_ID,
Convert(varchar(max),
EM.Description) as Path
From
Menu EM
Where
--EM.Topic_No is null
EM.Parent_ID = 0 and EM.Disabled = 0
Union All
Select
EM.Menu_ID,
EM.Parent_ID,
Convert(Varchar(max),E.Path + ' -> ' + EM.Description) as Path
From
Menu EM
Inner Join
Hierarchical_Path E
On
EM.Parent_ID = E.Menu_ID
)
SELECT distinct
EM.Description
,EMS.Path
FROM
dbo.Menu em
INNER JOIN
Hierarchical_Path EMS
ON
EMS.Menu_ID = em.Menu_Id
2 more INNER JOINs
2 Left Joins
WHERE Clause
When I run the query like this (joining onto the CTE) the performance is around 20 seconds.
When I insert the CTE results into a temp table, and join onto that, the performance is instantaneous.
Taking apart my query a bit more, it seems like it gets hung up on the WHERE clause. I guess my question is more to the point of when exactly a CTE runs and whether it gets stored in memory. I was running under the assumption that it gets called once and then sticks around in memory, but under some circumstances could it be called multiple times?
The difference is a CTE is not persisted and a temporary table is (at least for the session). Joining on a non-persisted column means SQL has no stats on the data at all compared to the same column in a temporary table which is already pre-evaluated. Basically, the temp table caches what you would use and SQL Server can better optimize for it. The same issues are run into when joining on the result of a function or a table variable.
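A sketch of the temp-table approach, using the CTE from the question (the clustered index is a suggestion of mine, not something from the original post):

```sql
-- Materialize the CTE's rows once into a temp table
WITH Hierarchical_Path (Menu_ID, Parent_ID, Path) AS
(
    SELECT Menu_Id, Parent_ID, CONVERT(varchar(max), Description) AS Path
    FROM Menu
    WHERE Parent_ID = 0 AND Disabled = 0
    UNION ALL
    SELECT EM.Menu_ID, EM.Parent_ID,
           CONVERT(varchar(max), E.Path + ' -> ' + EM.Description) AS Path
    FROM Menu EM
    INNER JOIN Hierarchical_Path E ON EM.Parent_ID = E.Menu_ID
)
SELECT Menu_ID, Parent_ID, Path
INTO #Paths
FROM Hierarchical_Path;

-- Now SQL Server has real statistics (and an index) to join against
CREATE CLUSTERED INDEX IX_Paths_Menu_ID ON #Paths (Menu_ID);

SELECT DISTINCT em.Description, p.Path
FROM dbo.Menu em
INNER JOIN #Paths p ON p.Menu_ID = em.Menu_Id;
```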
My guess is that your CTE execution plan is doing the execution with a single thread while your temp table can use multiple threads. You can check this by including actual execution plan when you run the queries and looking for two horizontal arrows pointing in opposite directions on each operator. That indicates parallelism.
P.S. - Try setting "set statistics io on" and "set statistics time on" to see if the actual cost of running the queries are the same regardless of run duration.

Why does this SQL query take 8 hours to finish?

There is a simple SQL JOIN statement below:
SELECT
REC.[BarCode]
,REC.[PASSEDPROCESS]
,REC.[PASSEDNODE]
,REC.[ENABLE]
,REC.[ScanTime]
,REC.[ID]
,REC.[Se_Scanner]
,REC.[UserCode]
,REC.[aufnr]
,REC.[dispatcher]
,REC.[matnr]
,REC.[unitcount]
,REC.[maktx]
,REC.[color]
,REC.[machinecode]
,P.PR_NAME
,N.NO_NAME
,I.[inventoryID]
,I.[status]
FROM tbBCScanRec as REC
left join TB_R_INVENTORY_BARCODE as R
ON REC.[BarCode] = R.[barcode]
AND REC.[PASSEDPROCESS] = R.[process]
AND REC.[PASSEDNODE] = R.[node]
left join TB_INVENTORY as I
ON R.[inventid] = I.[id]
INNER JOIN TB_NODE as N
ON N.NO_ID = REC.PASSEDNODE
INNER JOIN TB_PROCESS as P
ON P.PR_CODE = REC.PASSEDPROCESS
The table tbBCScanRec has 556,553 records, the table TB_R_INVENTORY_BARCODE has 260,513 records, and the table TB_INVENTORY has 7,688. However, the last two tables (TB_NODE and TB_PROCESS) both have fewer than 30 records.
Incredibly, when it runs in SQL Server 2005, it takes 8 hours to return the result set.
Why does it take so much time to execute?
If the two inner joins are removed, it takes just ten seconds to finish running.
What is the matter?
There are at least two UNIQUE NONCLUSTERED INDEXes.
One is IX_INVENTORY_BARCODE_PROCESS_NODE on the table TB_R_INVENTORY_BARCODE, which covers four columns (inventid, barcode, process, and node).
The other is IX_BARCODE_PROCESS_NODE on the table tbBCScanRec, which covers three columns (BarCode, PASSEDPROCESS, and PASSEDNODE).
Well, standard answer to questions like this:
Make sure you have all the necessary indexes in place, i.e. indexes on N.NO_ID, REC.PASSEDNODE, P.PR_CODE, REC.PASSEDPROCESS
Make sure that the types of the columns you join on are the same, so that no implicit conversion is necessary.
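A sketch of both checks, using the column names from the query (whether these are the right indexes depends on the actual plan):

```sql
-- Support the two INNER JOIN predicates
CREATE NONCLUSTERED INDEX IX_tbBCScanRec_Node_Process
    ON dbo.tbBCScanRec (PASSEDNODE, PASSEDPROCESS);
CREATE NONCLUSTERED INDEX IX_TB_NODE_NO_ID ON dbo.TB_NODE (NO_ID);
CREATE NONCLUSTERED INDEX IX_TB_PROCESS_PR_CODE ON dbo.TB_PROCESS (PR_CODE);

-- Check the join columns for type mismatches (implicit conversions)
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE (TABLE_NAME = 'tbBCScanRec' AND COLUMN_NAME IN ('PASSEDNODE', 'PASSEDPROCESS'))
   OR (TABLE_NAME = 'TB_NODE' AND COLUMN_NAME = 'NO_ID')
   OR (TABLE_NAME = 'TB_PROCESS' AND COLUMN_NAME = 'PR_CODE');
```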
You are working with around 500 million rows (556,553 × 30 × 30).
You probably have to add indexes on your tables.
If you are using SQL server, you can watch the plan query to see where you are losing time.
See the documentation here : http://msdn.microsoft.com/en-us/library/ms190623(v=sql.90).aspx
The query plan will help you to create indexes.
When you check the indexing, there should be clustered indexes as well - the nonclustered indexes use the clustered index, so not having one would render the nonclustered useless. Out-dated statistics could also be a problem.
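One way to check whether a table is a heap or has a clustered index, using the standard catalog views:

```sql
-- type_desc = 'HEAP' means no clustered index exists on the table
SELECT OBJECT_NAME(object_id) AS table_name, name AS index_name, type_desc
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.tbBCScanRec');
```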
However, why do you need to fetch ALL of the data? What is the purpose of that? You should have WHERE clauses restricting the result set to only what you need.

Query with a UNION sub-query takes a very long time

I've been having an odd problem with some queries that depend on a sub-query. They run lightning fast, until I use a UNION statement in the sub-query. Then they run endlessly; I've given up after 10 minutes. The scenario I'm describing now isn't the original one I started with, but I think it cuts out a lot of possible problems yet yields the same problem. So even though it's a pointless query, bear with me!
I have a table:
tblUser - 100,000 rows
tblFavourites - 200,000 rows
If I execute:
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser);
… then it runs in under a second. However, if I modify it so that the sub query has a UNION, it will run for at least 10 minutes (before I give up!)
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser UNION SELECT uid FROM tblUser);
A pointless change, but it should yield the same result and I don't see why it should take any longer?
Putting the sub-query into a view and calling that instead has the same effect.
Any ideas why this would be? I'm using SQL Azure.
Problem solved. See my answer below.
UNION generates unique values, so the DBMS engine performs sorts.
You can safely use UNION ALL in this case.
UNION is really doing a DISTINCT on all fields in the combined data set. It filters out dupes in the final results.
Is Uid indexed? If not it may take a long time as the query engine:
Generates the first result set
Generates the second result set
Filters out all the dupes (which is half the records) in a hash table
If duplicates aren't a concern (and using IN means they won't be) then use UNION ALL which removes the expensive Sort/Filter step.
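Applied to the query from the question, that's just:

```sql
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser
                     UNION ALL
                     SELECT uid FROM tblUser);
```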
UNIONs are usually implemented via temporary in-memory tables. You're essentially copying your tblUser twice into memory, with no index. Then every row in tblFavourites incurs a complete table scan over 200,000 rows - that's 200K × 200K = 40 billion double-row scans (because the query engine must get the uid from both table rows).
If your tblUser has an index on uid (which is definitely true because all tables in SQL Azure must have a clustered index), then each row in tblFavourites incurs a very fast index lookup: 200K × log(100K) ≈ 200K × 17 b-tree comparisons, each of which is much faster than reading the uid from a row on a data page, so it should equate to roughly 200K × (3-4), or about 1 million, double-row scans. I believe newer versions of SQL Server may also build a temp hash table containing just the uids, so essentially it gets down to 200K row scans (assuming hash-table lookups to be trivial).
You should also generate your query plan to check.
Essentially, the non-UNION query runs around 500,000 times faster if tblUser has an index (should be on SQL Azure).
It turns out the problem was due to one of the indexes ... tblFavourites contained two foreign keys to the primary key (uid) in tblUser:
userId
otherUserId
both columns had the same definition and same indexes, but I discovered that swapping userId for otherUserId in the original query solved the problem.
I ran:
ALTER INDEX ALL ON tblFavourites REBUILD
... and the problem went away. The query now executes almost instantly.
I don't know too much about what goes on behind the scenes in SQL Server/Azure, but I can only imagine that it was a damaged index or something. I update statistics frequently, but that had no effect.
Thanks!
---- UPDATE
The above was not fully correct. It fixed the problem for around 20 minutes, then it returned. I have been in touch with Microsoft support for several days and it seems the problem has to do with tempDB. They are working on a solution at their end.
I just ran into this problem. I have about 1 million rows to go through, and then I realized that some of my IDs were in another table, so I UNIONed to get the same information in one NOT EXISTS. The query went from taking about 7 seconds to processing only 5,000 rows after a minute or so. This approach seemed to help. I absolutely hate the solution, but I've tried a multitude of things that all end up with the same extremely slow execution plan. This one got me what I needed in about 18 seconds.
DECLARE @PIDS TABLE ([PID] [INT] PRIMARY KEY)
INSERT INTO @PIDS SELECT DISTINCT [ID] FROM [STAGE_TABLE] WITH(NOLOCK)
INSERT INTO @PIDS SELECT DISTINCT [OTHERID] FROM [PRODUCTION_TABLE] WITH(NOLOCK)
WHERE NOT EXISTS(SELECT [PID] FROM @PIDS WHERE [PID] = [OTHERID])
SELECT (columns needed)
FROM [ORDER_HEADER] [OH] WITH(NOLOCK)
INNER JOIN @PIDS [P] ON [OH].[SOME_ID] = [P].[PID]
(And yes I tried "WHERE EXISTS IN..." for the final select... inner join was faster)
Please let me say again, I personally feel this is really ugly, but I actually use this join twice in my proc, so it's going to save me time in the long run. Hope this helps.
Doesn't it make more sense to rephrase the question from
"UserIDs that aren't on the combined list of all the IDs that appear in this table and/or that table"
to
"UserIDs that aren't in this table AND aren't in that table either"?
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser)
AND userID NOT IN (SELECT uid FROM tblUser);