SQL Query very slow - Suddenly - sql

I have a SQL stored procedure that was running perfect (.2 secs execution or less), suddenly today its taking more than 10 minutes.
I see that the issue comes because of a LEFT JOIN of a documents table (that stores the location of all the digital files associated to records in the DB).
This documents table has today 153,234 records.
The schema is
The table has 2 indexes:
Primary key (uid)
documenttype (nonclustered)
The stored procedure is:
SELECT
.....,
CASE ISNULL(cd.countdocs,0) WHEN 0 THEN 0 ELSE 1 END as hasdocs
.....
FROM
requests re
JOIN
employee e ON (e.employeeuid = re.employeeuid)
LEFT JOIN
(SELECT
COUNT(0) as countnotes, n.objectuid as objectuid
FROM
notes n
WHERE
n.isactive = 1
GROUP BY
n.objectuid) n ON n.objectuid = ma.authorizationuid
/* IF I COMMENT THIS LEFT JOIN THEN WORKS AMAZING FAST */
LEFT JOIN
(SELECT
COUNT(0) as countdocs, cd.objectuid
FROM
cloud_document cd
WHERE
cd.isactivedocument = 1
AND cd.entity = 'COMPANY'
GROUP BY
cd.objectuid) cd ON cd.objectuid = re.authorizationuid
JOIN ....
So don't know if I have to add another INDEX to improve this query of maybe the LEFT JOIN I have is not ideal.
If I run the execution plan I get this:
/*
Missing Index Details from SQLQuery7.sql - (local).db_prod (Test/test (55))
The Query Processor estimates that implementing the following index could improve the query cost by 60.8843%.
*/
/*
USE [db_prod]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[cloud_document] ([objectuid],[entity],[isactivedocument])
GO
*/
Any clue on how to solve this?
Thanks.

Just don't go out and add a index. Do some research first!
Can you grab a picture of the query plan and post it? It will show if the query is using the index or not.
Also, complete details of the table would be awesome, including Primary Keys, Foreign Keys, and any indexes. Just script them out to TSQL. A couple of sample records to boot and we can recreate it in a test environment and help you.
Also, take a look at Glenn Barry's DMVs.
http://sqlserverperformance.wordpress.com/tag/dmv-queries/
Good stuff like top running queries, read/write usages of indexes, etc - to name a few.
Like many things in life, it all depends on your situation!
Just need more information before we can make a judgement call.

I would actually be surprised if an index on that field helps as it likely only has two or three values (0,1, null) and indexes are not generally useful when the data has so few values.
I would suspect that either your statistics are out of date or your current indexes need to be rebuilt.

Related

SQL query does not stop executing

I've created a stored procedure that selects from a few different tables in different dbs into temp tables to work with in the proc. Everything seems to be fine with filling the tables. My problem arises when I attempt to join them. This is the query I wrote:
SELECT #temp1.id,
#temp2.first_name,
#temp2.last_name,
#temp3.DOB,
#temp4.Sex,
#temp5.SSN
FROM (((#temp1
LEFT JOIN #temp3 ON #temp1.id = #temp3.id)
LEFT JOIN #temp4 ON #temp1.id = #temp4.id)
LEFT JOIN #temp2 ON #temp1.id = #temp2.id)
LEFT JOIN #temp5 ON #temp1.id = #temp5.id;
The query works to an extent. The output window is filled with the results of the select. The problem is that the query doesn't exit. It stops adding new records to the output but continues executing and thus the procedure hangs because it can't move on to the next statement. Any ideas?
I can see that your query is resulting in high table scan for all the 5 tables in the execution plan. You can create indexes on the joining column (ID) in all the 5 temp tables as follows:
CREATE CLUSTERED INDEX IDX_C_t1_ID ON #temp1(ID)
CREATE CLUSTERED INDEX IDX_C_t2_ID ON #temp2(ID)
CREATE CLUSTERED INDEX IDX_C_t3_ID ON #temp3(ID)
CREATE CLUSTERED INDEX IDX_C_t4_ID ON #temp4(ID)
CREATE CLUSTERED INDEX IDX_C_t5_ID ON #temp5(ID)
It'd be really helpful if you can include number of rows and columns for all the 5 tables.
How many records are in #temp1? Are you seeing that same number of records in your output window? If not, then your query is still returning records.
Also, did you explicitly begin a transaction somewhere in your script? Maybe it is waiting for you to commit or rollback.
Also, I'm not sure why you have parenthesis around your joins. They aren't needed (and maybe you are getting some weird execution plan by using them).
The running time of this query depends (among else) on the Number of rows in #temp1 and on the indexes that it has (by default --> nothing ).
First step would be as I see it:
Isolating the execution plan, specifically for this query. You can do so, by filling your data into temp1 (and not #temp1),
Then - try running only the query against temp1.
Seeing the execution plan for the query, may recommend which indexes might help, and will be a first step towards optimizing it.

SQL Query on table with 30mill records

I have been having problems building a table in my local SQL Server. Orginally it was causing the tempdb table to become full and throw an exception. This has a lot of joins and outer applies, and so to find specifically where the problem lay I did a select on the first table in the sql query to determine how long it took, that was fast so I then added the next table that was the first join in the query and reran, I continued to do this until I found the table that stalled.
I found the problem (or at least the first problem) was with the shipper_container table. This table is huge and pulling it alone gets a System.OutOfMemoryException just showing a select on the results of that table alone (it has only 5 columns). It cuts out at 16 million records but has 30 million rows. It is 1.2GB in size. This doesn't seem so big for me that SQL Management studio couldn't handle it.
Using a WHERE statement to collect values between 1 January - 10 January 2015 still resulted in a search that took over 5 minutes and was still executing when I cancelled. I have also added indexes on each of the select parameters and this did not increase performance either.
Here is the SQL Query. You can see I have commented out the other parameters that have yet to be added in other joins and outer applies.
DECLARE #startDate DATETIME
DECLARE #endDate DATETIME
DECLARE #Shipper_Key INT = NULL
DECLARE #Part_Key INT = NULL
SET #startDate = '2015-01-01'
SET #endDate = '2015-01-10'
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
INSERT Shipped_Container
(
Ship_Date,
Invoice_Quantity,
Shipper_No,
Serial_No,
Truck_Key,
Shipper_Key
)
SELECT
S.Ship_Date,
SC.Quantity,
S.Shipper_No,
SC.Serial_No,
S.Truck_Key,
S.Shipper_Key
FROM Shipper AS S
JOIN Shipper_Line AS SL
--ON SL.PCN = S.PCN
ON SL.Shipper_Key = S.Shipper_Key
JOIN Shipper_Container AS SC
--ON SC.PCN = SL.PCN
ON SC.Shipper_Line_Key = SL.Shipper_Line_Key
WHERE S.Ship_Date >= #startDate AND S.Ship_Date <= #endDate
AND S.Shipper_Key = ISNULL(#Shipper_Key, S.Shipper_Key)
AND SL.Part_Key = ISNULL(#Part_Key, SL.Part_Key)
The server instance is run on the local network - could this be an issue? I really have minimal experience at this and would really appreciate help and as detailed and clear as possible. Often in SQL forums people jump right into technical details I don't follow so well.
Don't do a Select ... From yourtable in SS Management Studio when it return
hundrend of thousand or millions of row. 1GB of data gets a lot bigger when the system has to draw and show it on screen in the Management Studio data sheet
The server instance is run on the local network
When you do a Select ... From yourtable in SSMS, the server must send all the data to your laptop/desktop. This is quite a lot of uneeded presure on the network.
It should not be an issue when you insert because everything stays on the server. However, staying on the server does not mean it will be fast if your data model is not good enough.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
You may get dirty data is you use that... It may be better to remove it unless you know why it is there and why you need it.
I have also added indexes on each of the select parameters and this did not increase performance either
If you mean indexes on :
S.Ship_Date,
SC.Quantity,
S.Shipper_No,
SC.Serial_No,
S.Truck_Key,
S.Shipper_Key
What are their definitions ?
If they are individual indexes on 1 column, you can drop indexes on SC.Quantity, S.Shipper_No, SC.Serial_No and S.Truck_Key. They are not used.
Ship_Date and Shipper_key may be usefull. It all depends on your model and existing primary keys. (which you need to describe, see below)
It will help to give a more accurate answer if you could tell us:
the relation between your 3 tables (which field link A to B and in which direction)
the primary key on your 3 tables
a complete list of all your indexes(and columns) on your 3 tables
If none of your indexes are usefull or if they are missing, it will most likely read the whole 3 tables and try to match them. Because it is pretty big, it does not have enough memory to process it and it uses tempdb to store intermediary data.
For now I will suppose that shipper_key + PCN is the primary key on each tables.
I think you can try that:
You can create an index on S.Ship_Date
Create Index Shipper_Line_Ship_Date(Ship_Date) -- subject to updates according to your Primary Key
The query optimizer may not use the indexes (if they exists) with such a where clause:
AND S.Shipper_Key = ISNULL(#Shipper_Key, S.Shipper_Key)
AND SL.Part_Key = ISNULL(#Part_Key, SL.Part_Key)
you can use:
AND (S.Shipper_Key = #Shipper_Key or #Shipper_Key is null)
AND (SL.Part_Key = #Part_Key or #Part_Keyis null)
It would help to have indexes on Shipper_Key and PCN
Finally
As I already said above, we need to know more about your data model (create table...), primary keys and indexes (create indexes). You can create a modele here http://sqlfiddle.com/ with all 3 create tables and their indexes. Then go to link and add the link here.
In SSMS, you can right click on a table and go to Script Table as / Create To / New Query Window and add it here or in http://sqlfiddle.com/. Only keep the CREATE TABLE ... part down to the first GO.
You can then do the same thing on all you indexes.
You should also add a copy of you query plan.
In SSMS, go to Query menu / Display Estimated Execution Plan and right click to save it as xml (xml is better). It is only an estimation and it won't execute the whole query. It should be pretty fast.

DB Design: Looking for Performance improvement when a BIT Column from every table used in every SQL Queries

I recently got added to a new ASP .NET Project(A web application) .There were recent performance issues with the application, and I am in a team with their current task to Optimize some slow running stored procedures.
The database designed is highly normalized. In all the tables we have a BIT column as [Status_ID]. In every Stored procedures, For every tsql query, this column is involved in WHERE condition for all tables.
Example:
Select A.Col1,
C.Info
From dbo.table1 A
Join dbo.table2 B On A.id = B.id
Left Join dbo.table21 C On C.map = B.Map
Where A.[Status_ID] = 1
And B.[Status_ID] = 1
And C.[Status_ID] = 1
And A.link > 50
In the above sql, 3 tables are involved, [Status_ID] column from all 3 tables are involved in the WHERE condition. This is just an example. Like this [Status_ID] is involved in almost all the queries.
When I see the execution plan of most of the SPs, there are lot of Key lookup (Clustered) task involved and most of them are looking for [Status_ID] in the respective table.
In the Application, I found that, it is not possible to avoid these column checking from queries. So
Will it be a good idea to
Alter all [Status_ID] columns to NOT NULL, and then adding them to PRIMARY KEY of that table.Columns 12,13.. will be (12,1) and (13,1)
Adding [Status_ID] column to all the NON Clustered indexes in the INCLUDE PART for that table.
Please share you suggestions over the above two points as well as any other.
Thanks for reading.
If you add the Status_ID to the PK you change the definition of the PK
If you add Status_ID to the PK then you could have duplicate ID
And changing the Status_ID would fragment the index
Don't do that
The PK should be what should make the row unique
Add a separate nonclustered index for the Status_ID
And if it is not null then change it to not null
This will only cut the workload in 1/2
Another option is to add [Status_ID] to every other non clustered.
But if it is first it only cuts the workload in 1/2.
And if is second it is only effective if the other component of the index is in the query
Try Status_ID as a separate index
I suspect the query optimizer will be smart enough to evaluate it last since it will be the least specific index
If you don't have an index on link then do so
And try changing the query
Some times this helps the query optimizer
Select A.Col1, C.Info
From dbo.table1 A
Join dbo.table2 B
On A.id = B.id
AND A.[Status_ID] = 1
And A.link > 50
And B.[Status_ID] = 1
Left Join dbo.table21 C
On C.map = B.Map
And C.[Status_ID] = 1
Check the fragmentation of the indexes
Check the type of join
If it is using a loop join then try join hints
This query should not be performing poorly
If might be lock contention
Try with (nolock)
That might not be an acceptable long term solution but it would tell you is locks are the problem

Why does this SQL query take 8 hours to finish?

There is a simple SQL JOIN statement below:
SELECT
REC.[BarCode]
,REC.[PASSEDPROCESS]
,REC.[PASSEDNODE]
,REC.[ENABLE]
,REC.[ScanTime]
,REC.[ID]
,REC.[Se_Scanner]
,REC.[UserCode]
,REC.[aufnr]
,REC.[dispatcher]
,REC.[matnr]
,REC.[unitcount]
,REC.[maktx]
,REC.[color]
,REC.[machinecode]
,P.PR_NAME
,N.NO_NAME
,I.[inventoryID]
,I.[status]
FROM tbBCScanRec as REC
left join TB_R_INVENTORY_BARCODE as R
ON REC.[BarCode] = R.[barcode]
AND REC.[PASSEDPROCESS] = R.[process]
AND REC.[PASSEDNODE] = R.[node]
left join TB_INVENTORY as I
ON R.[inventid] = I.[id]
INNER JOIN TB_NODE as N
ON N.NO_ID = REC.PASSEDNODE
INNER JOIN TB_PROCESS as P
ON P.PR_CODE = REC.PASSEDPROCESS
The table tbBCScanRec has 556553 records while the table TB_R_INVENTORY_BARCODE has 260513 reccords and the table TB_INVENTORY has 7688. However, the last two tables (TB_NODE and TB_PROCESS) both have fewer than 30 records.
Incredibly, when it runs in SQL Server 2005, it takes 8 hours to return the result set.
Why does it take so much time to execute?
If the two inner joins are removed, it takes just ten seconds to finish running.
What is the matter?
There are at least two UNIQUE NONCLUSTERED INDEXes.
One is IX_INVENTORY_BARCODE_PROCESS_NODE on the table TB_R_INVENTORY_BARCODE, which covers four columns (inventid, barcode, process, and node).
The other is IX_BARCODE_PROCESS_NODE on the table tbBCScanRec, which covers three columns (BarCode, PASSEDPROCESS, and PASSEDNODE).
Well, standard answer to questions like this:
Make sure you have all the necessary indexes in place, i.e. indexes on N.NO_ID, REC.PASSEDNODE, P.PR_CODE, REC.PASSEDPROCESS
Make sure that the types of the columns you join on are the same, so that no implicit conversion is necessary.
You are working with around (556553 *30 *30) 500 millions of rows.
You probably have to add indexes on your tables.
If you are using SQL server, you can watch the plan query to see where you are losing time.
See the documentation here : http://msdn.microsoft.com/en-us/library/ms190623(v=sql.90).aspx
The query plan will help you to create indexes.
When you check the indexing, there should be clustered indexes as well - the nonclustered indexes use the clustered index, so not having one would render the nonclustered useless. Out-dated statistics could also be a problem.
However, why do you need to fetch ALL of the data? What is the purpose of that? You should have WHERE clauses restricting the result set to only what you need.

Query with a UNION sub-query takes a very long time

I've been having an odd problem with some queries that depend on a sub query. They run lightning fast, until I use a UNION statement in the sub query. Then they run endlessly, I've given after 10 minutes. The scenario I'm describing now isn't the original one I started with, but I think it cuts out a lot of possible problems yet yields the same problem. So even though it's a pointless query, bear with me!
I have a table:
tblUser - 100,000 rows
tblFavourites - 200,000 rows
If I execute:
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser);
… then it runs in under a second. However, if I modify it so that the sub query has a UNION, it will run for at least 10 minutes (before I give up!)
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser UNION SELECT uid FROM tblUser);
A pointless change, but it should yield the same result and I don't see why it should take any longer?
Putting the sub-query into a view and calling that instead has the same effect.
Any ideas why this would be? I'm using SQL Azure.
Problem solved. See my answer below.
UNION generate unique values, so the DBMS engine makes sorts.
You can use safely UNION ALL in this case.
UNION is really doing a DISTINCT on all fields in the combined data set. It filters out dupes in the final results.
Is Uid indexed? If not it may take a long time as the query engine:
Generates the first result set
Generates the second result set
Filters out all the dupes (which is half the records) in a hash table
If duplicates aren't a concern (and using IN means they won't be) then use UNION ALL which removes the expensive Sort/Filter step.
UNION's are usually implemented via temporary in-memory tables. You're essentially copying your tblUser two times into memory, WITH NO INDEX. Then every row in tblFavourites incur a complete table scan over 200,000 rows - that's 200Kx200K=40 billion double-row scans (because the query engine must get the uid from both table rows)
If your tblUser has an index on uid (which is definitely true because all tables in SQL Azure must have a clustered index), then each row in tblFavourites incurs a very fast index lookup, resulting in only 200Kxlog(100K) =200Kx17 = 200K row scans, each with 17 b-tree index comparisons (which is much faster than reading the uid from a row on a data page), so it should equate to roughly 200Kx(3-4) or 1 million double-row scans. I believe newer versions of SQL server may also build a temp hash table containing just the uid's, so essentially it gets down to 200K row scans (assuming hash table lookup to be trivial).
You should also generate your query plan to check.
Essentially, the non-UNION query runs around 500,000 times faster if tblUser has an index (should be on SQL Azure).
It turns out the problem was due to one of the indexes ... tblFavourites contained two foreign keys to the primary key (uid) in tblUser:
userId
otherUserId
both columns had the same definition and same indexes, but I discovered that swapping userId for otherUserId in the original query solved the problem.
I ran:
ALTER INDEX ALL ON tblFavourites REBUILD
... and the problem went away. The query now executes almost instantly.
I don't know too much about what goes on behind the scenes in Sql Server/Azure ... but I can only imagine that it was a damaged index or something? I update statistics frequently, but that had no effect.
Thanks!
---- UPDATE
The above was not fully correct. It did fix the problem for around 20 minutes, then it returned. I have been in touch with Microsoft support for several days and it seems the problem is to do with the tempDB. They are working on a solution at their end.
I just ran into this problem. I have about 1million rows to go through and then I realized that some of my IDs were in another table, so I unioned to get the same information in one "NOT EXISTS." I went from the query taking about 7 sec to processing only 5000 rows after a minute or so. This seemed to help. I absolutely hate the solution, but I've tried a multitude of things that all end up w/the same extremely slow execution plan. This one got me what I needed in about 18 sec.
DECLARE #PIDS TABLE ([PID] [INT] PRIMARY KEY)
INSERT INTO #PIDS SELECT DISTINCT [ID] FROM [STAGE_TABLE] WITH(NOLOCK)
INSERT INTO #PIDS SELECT DISTINCT [OTHERID] FROM [PRODUCTION_TABLE] WITH(NOLOCK)
WHERE NOT EXISTS(SELECT [PID] FROM #PIDS WHERE [PID] = [OTHERID]
SELECT (columns needed)
FROM [ORDER_HEADER] [OH] WITH(NOLOCK)
INNER JOIN #PIDS ON [OH].[SOME_ID] = [PID]
(And yes I tried "WHERE EXISTS IN..." for the final select... inner join was faster)
Please let me say again, I personaly feel this is really ugly, but I actually use this join twice in my proc, so it's going to save me time in the long run. Hope this helps.
Doesn't it make more sense to rephrase the questions from
"UserIds that aren't on the combined list of all the Ids that apper in this table and/or that table"
to
"UserIds that aren't on this table AND aren't on that table either
SELECT COUNT(*)
FROM tblFavourites
WHERE userID NOT IN (SELECT uid FROM tblUser)
AND userID NOT IN (SELECT uid FROM tblUser);