why do some columns slow down the query - sql

I am using SQL Server 2012.
I am trying to optimize a query which is somehting like this:
SELECT TOP 20 ta.id,
ta.name,
ta.amt,
tb.id,
tb.name,
tc.name,
tc.id,
tc.descr
FROM a ta
INNER JOIN b tb
ON ta.id = tb.id
INNER JOIN c tc
ON tb.id = tc.id
ORDER BY ta.mytime DESC
The query takes around 5 - 6 secs to run. There are indexes for all the columns used in joins. The tables have 500k records.
My question is: When I remove the columns tc.name, tc.id and tc.descr from the select, the query returns the results in less than a second. Why?

You need to post the execution plans to really know the difference.
As far as I know, SQL Server does not optimize away joins. After all, even without columns in the select list, the joins can still be used for filtering and multiplying the number of rows.
However, one step might be skipped. With the variables in the select, the engine needs to both go to the index and fetch the page with the data. Without the variables, the engine does not need to do the fetch. This may subtly tip the balance of the optimizer from one type of join to another.
A second possibility simply involves timing. If you ran the query once, then page caches might be filled on the machine. The second time you run it, the query goes much faster simply because the data is in memory. Don't ever run timings unless you either (1) clear the cache between each call or (2) be sure that the cache is filled equivalently.

Do you have clustered indexes? If not, you should create clustered indexes and run your query integer and mostly on primary key columns.
Check http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx for clustered index.

I was finally able to tune the query by adding additional index to the table. SQL server did not show/imply a missing index but I figured it out by creating a new non-clustered index on a field that is present in a select.
Thanks to you all for coming forward for help.
#Wade the link is really helpful in understanding the SQL optimizer the

Related

SQL - Make Join Query Faster

I've made a query that selects 2 values from 2 tables. I need to run this query about 32 times when a visitor visits my website. This makes the page quite slow (it takes over 5 seconds to fully load).
The query looks like this:
SELECT tmdb.name, patch.sfo_title
FROM tmdb
RIGHT JOIN patch
ON tmdb.titleid = CONCAT(patch.cusa, '_00')
WHERE cusa = :titleid
LIMIT 1
is there any way to make this query faster? The query isn't the biggest operation if I look at it, so I'm not really sure why it's so slow?
I would write the query as a left join (out of preference, not performance):
SELECT t.name, p.sfo_title
FROM patch p LEFT JOIN
tmdb t
ON t.titleid = CONCAT(p.cusa, '_00')
WHERE p.cusa = :titleid
LIMIT 1;
Then for performance, I would recommend indexes on patch(cusa, sfo_title) and t(titleid).
Note that the use of LIMIT without ORDER BY is suspicious, although you might have reasons for it.
You are joining two tables based on a CALCULATED field? No wonder it is slow. I have no idea how your tables get maintained, but you need to get that concat'ed value of CUSA into the data base as a separate field and get it indexed, as Gordon Linoff suggested. You could even maintain it through On Insert and On Update triggers. Personally, I would examine why you have similar but different keys in your two tables and try to rationalize it down to one. That Concat(CUSA, '_00' ) looks suspiciously like an opportunity to simplify the application.

SQL query in dire need of optimization

I have this query, which works fine, except it takes a couple of minutes to load. I need help optimizing it so it runs faster and I don't know where to start:
SELECT
job_header.job,
job_header.suffix,
job_header.customer,
job_header.description,
job_header.comments_1,
job_header.date_due,
job_header.part,
job_header.customer_po,
job_header.date_closed,
job_header.flag_hold,
job_header.code_sort,
wo_user_flds.user_7,
wo_user_flds.user_3,
wo_user_flds.user_6,
wo_user_flds.user_5,
wo_user_flds.user_2,
quote_lines.user_2 as serialNo,
quote_lines.user_3 as unit,
quote_lines.user_4 as package
FROM job_header
LEFT JOIN wo_user_flds ON
(job_header.job = wo_user_flds.job) AND
(job_header.suffix = wo_user_flds.suffix)
LEFT JOIN quote_lines ON
(job_header.part = quote_lines.part)
WHERE job_header.date_closed = '000000'
AND LENGTH(job_header.job) > 5;
More information that might be of use:
Only the columns found in the select are the columns I need.
My query returns roughly 400 records.
Job_Header table has 97 columns and 6,300 records.
Wo_User_Flds table has 12 columns and 1,100 records.
Quote_Lines table has 198 columns and 46,000 records.
I could speculate on what I think I need to do, but I'm really just guessing at this point. I looked at similar questions and lot of talk of 'indexes', so I checked and these tables do have some indexes...if that helps? Thanks in advance.
[EDIT]
Thanks for the quick responses guys, really appreciate it. I'm going to look into everything everyone said, but here is the ddl for these tables: http://paste.ubuntu.com/13247664/
[EDIT 2]
My query takes 1 minute to load. My expectations may not be realistic in how much faster it can be. I might have to resort to breaking up the query into more than one and then just assemble the data on the client.
Without any other info you'd need an index on job_header on either (job, date_closed) or (date_closed, job). But post the indexes on the table e.g. sp_helpindex or better still the create index script (right click on the index in SSMS and script the index)
First be sure you have indexes on columns where you JOIN tables and your "WHERE clause column". In this case, you should have indexes on these columns:
--Table job_header indexes, beside unique index
job_header.job
job_header.suffix
job_header.part = quote_lines.part
job_header.date_closed
--Table wo_users_flds indexes, beside unique index
wo_user_flds.job
wo_user_flds.suffix
Then, avoid using UDFs (functions, like LENGHT, CAST, concatenation etc.). But in this case, you can leave LENGTH there. So your query would be same, only your indexes would improve query execution plan drastically.
Also, use execution plan to see where you have INDEX_SCAN and INDEX_SEEK. If you have INDEX_SCAN somewhere, it should be sign that you need index on that column.
This would be for start.

How to improve the performance of multiple joins

I have a query with multiple joins in it. When I execute the query it takes too long. Can you please suggest me how to improve this query?
ALTER View [dbo].[customReport]
As
SELECT DISTINCT ViewUserInvoicerReport.Owner,
ViewUserAll.ParentID As Account , ViewContact.Company,
Payment.PostingDate, ViewInvoice.Charge, ViewInvoice.Tax,
PaymentProcessLog.InvoiceNumber
FROM
ViewContact
Inner Join ViewUserInvoicerReport on ViewContact.UserID = ViewUserInvoicerReport.UserID
Inner Join ViewUserAll on ViewUserInvoicerReport.UserID = ViewUserAll.UserID
Inner Join Payment on Payment.UserID = ViewUserAll.UserID
Inner Join ViewInvoice on Payment.UserID = ViewInvoice.UserID
Inner Join PaymentProcessLog on ViewInvoice.UserID = PaymentProcessLog.UserID
GO
Work on removing the distinct.
THat is not a join issue. The problem is that ALL rows have to go into a temp table to find out which are double - if you analyze the query plan (programmers 101 - learn to use that fast) you will see that the join likely is not the big problem but the distinct is.
And IIRC that distinct is USELESS because all rows are unique anyway... not 100% sure, but the field list seems to indicate.
Use distincts VERY rarely please ;)
You should see the Query Execution Plan and optimize the query section by section.
The overall optimization process consists of two main steps:
Isolate long-running queries.
Identify the cause of long-running queries.
See - How To: Optimize SQL Queries for step by step instructions.
and
It's difficult to say how to improve the performance of a query without knowing things like how many rows of data are in each table, which columns are indexed, what performance you're looking for and which database you're using.
Most important:
1. Make sure that all columns used in joins are indexed
2. Make sure that the query execution plan indicates that you are using the indexes you expect

Subquery v/s inner join in sql server

I have following queries
First one using inner join
SELECT item_ID,item_Code,item_Name
FROM [Pharmacy].[tblitemHdr] I
INNER JOIN EMR.tblFavourites F ON I.item_ID=F.itemID
WHERE F.doctorID = #doctorId AND F.favType = 'I'
second one using sub query like
SELECT item_ID,item_Code,item_Name from [Pharmacy].[tblitemHdr]
WHERE item_ID IN
(SELECT itemID FROM EMR.tblFavourites
WHERE doctorID = #doctorId AND favType = 'I'
)
In this item table [Pharmacy].[tblitemHdr] Contains 15 columns and 2000 records. And [Pharmacy].[tblitemHdr] contains 5 columns and around 100 records. in this scenario which query gives me better performance?
Usually joins will work faster than inner queries, but in reality it will depend on the execution plan generated by SQL Server. No matter how you write your query, SQL Server will always transform it on an execution plan. If it is "smart" enough to generate the same plan from both queries, you will get the same result.
Here and here some links to help.
In Sql Server Management Studio you can enable "Client Statistics" and also Include Actual Execution Plan. This will give you the ability to know precisely the execution time and load of each request.
Also between each request clean the cache to avoid cache side effect on performance
USE <YOURDATABASENAME>;
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
I think it's always best to see with our own eyes than relying on theory !
Sub-query Vs Join
Table one 20 rows,2 cols
Table two 20 rows,2 cols
sub-query 20*20
join 20*2
logical, rectify
Detailed
The scan count indicates multiplication effect as the system will have to go through again and again to fetch data, for your performance measure, just look at the time
join is faster than subquery.
subquery makes for busy disk access, think of hard disk's read-write needle(head?) that goes back and forth when it access: User, SearchExpression, PageSize, DrilldownPageSize, User, SearchExpression, PageSize, DrilldownPageSize, User... and so on.
join works by concentrating the operation on the result of the first two tables, any subsequent joins would concentrate joining on the in-memory(or cached to disk) result of the first joined tables, and so on. less read-write needle movement, thus faster
Source: Here
First query is better than second query.. because first query we are joining both table.
and also check the explain plan for both queries...

Why does this SQL query take 8 hours to finish?

There is a simple SQL JOIN statement below:
SELECT
REC.[BarCode]
,REC.[PASSEDPROCESS]
,REC.[PASSEDNODE]
,REC.[ENABLE]
,REC.[ScanTime]
,REC.[ID]
,REC.[Se_Scanner]
,REC.[UserCode]
,REC.[aufnr]
,REC.[dispatcher]
,REC.[matnr]
,REC.[unitcount]
,REC.[maktx]
,REC.[color]
,REC.[machinecode]
,P.PR_NAME
,N.NO_NAME
,I.[inventoryID]
,I.[status]
FROM tbBCScanRec as REC
left join TB_R_INVENTORY_BARCODE as R
ON REC.[BarCode] = R.[barcode]
AND REC.[PASSEDPROCESS] = R.[process]
AND REC.[PASSEDNODE] = R.[node]
left join TB_INVENTORY as I
ON R.[inventid] = I.[id]
INNER JOIN TB_NODE as N
ON N.NO_ID = REC.PASSEDNODE
INNER JOIN TB_PROCESS as P
ON P.PR_CODE = REC.PASSEDPROCESS
The table tbBCScanRec has 556553 records while the table TB_R_INVENTORY_BARCODE has 260513 reccords and the table TB_INVENTORY has 7688. However, the last two tables (TB_NODE and TB_PROCESS) both have fewer than 30 records.
Incredibly, when it runs in SQL Server 2005, it takes 8 hours to return the result set.
Why does it take so much time to execute?
If the two inner joins are removed, it takes just ten seconds to finish running.
What is the matter?
There are at least two UNIQUE NONCLUSTERED INDEXes.
One is IX_INVENTORY_BARCODE_PROCESS_NODE on the table TB_R_INVENTORY_BARCODE, which covers four columns (inventid, barcode, process, and node).
The other is IX_BARCODE_PROCESS_NODE on the table tbBCScanRec, which covers three columns (BarCode, PASSEDPROCESS, and PASSEDNODE).
Well, standard answer to questions like this:
Make sure you have all the necessary indexes in place, i.e. indexes on N.NO_ID, REC.PASSEDNODE, P.PR_CODE, REC.PASSEDPROCESS
Make sure that the types of the columns you join on are the same, so that no implicit conversion is necessary.
You are working with around (556553 *30 *30) 500 millions of rows.
You probably have to add indexes on your tables.
If you are using SQL server, you can watch the plan query to see where you are losing time.
See the documentation here : http://msdn.microsoft.com/en-us/library/ms190623(v=sql.90).aspx
The query plan will help you to create indexes.
When you check the indexing, there should be clustered indexes as well - the nonclustered indexes use the clustered index, so not having one would render the nonclustered useless. Out-dated statistics could also be a problem.
However, why do you need to fetch ALL of the data? What is the purpose of that? You should have WHERE clauses restricting the result set to only what you need.