SQL Optimization Query - sql

The MSSQL 2005 query below is very slow. I feel like there ought to be a way to speed it up, but I am unsure how. Note that I edited the inner join to use select statements to make it more obvious (to people reading this question) what is going on, though this has no impact on speed (the execution plan is probably the same either way). Interestingly, I never actually use keywordvaluegroups for anything more than a count, but I'm not sure if there is a way to capitalize on this.
select top 1 cde.processPath as 'keywordValue', count(*) as 'total'
from dbo.ClientDefinitionEntry AS cde INNER JOIN dbo.KeywordValueGroups AS kvg
ON cde.keywordGroupId = kvg.keywordValueGrpId
where kvg.[name] = @definitionName
group by cde.processPath
order by total desc
Edit: Apparently, people keep complaining about my use of subqueries. In fact, they make no difference. I added them right before posting this question to make it easier to see what was going on, but they only made things more confusing, so I changed the query back to not use them.
Edit: Indexes in use:
ClientDefinitionEntry:
IX_ClientDefinitionEntry |nonclustered located on PRIMARY|clientId, keywordGroupId
KeywordValueGroups:
IX_KeywordValueGroups |nonclustered located on PRIMARY|keywordValueGrpId
IX_KeywordValueGroups_2 |nonclustered located on PRIMARY|version
IX_KeywordValueGroups_Name |nonclustered located on PRIMARY|name

What does the execution plan look like?
By having a look at it, you'll learn which part of the query takes the most time and resources.
Do you have indexes on the columns you filter on? On the columns you join on? On the columns you sort on?
Once you've looked at this and the query is still slow, you can check how fragmented your database/table is (DBCC SHOWCONTIG) and see whether it is necessary to rebuild the indexes.
It might be helpful to have a maintenance plan which rebuilds your indexes on a regular basis.
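As a rough sketch (using the table and index names from the question; the 30% threshold is only a common rule of thumb, not a hard rule), checking fragmentation and rebuilding on SQL Server 2005 might look like this:
-- Check fragmentation for one table and all of its indexes (SQL Server 2005 syntax)
DBCC SHOWCONTIG ('dbo.ClientDefinitionEntry') WITH ALL_INDEXES;
-- If logical fragmentation is high (say, above roughly 30%), rebuild the index
ALTER INDEX IX_ClientDefinitionEntry ON dbo.ClientDefinitionEntry REBUILD;
-- For lighter fragmentation, a reorganize is cheaper and stays online
ALTER INDEX IX_ClientDefinitionEntry ON dbo.ClientDefinitionEntry REORGANIZE;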

Run the query with this option on:
SET SHOWPLAN_TEXT ON
And add the result to the question.
Also check if your statistics are up to date:
SELECT
    object_name = OBJECT_NAME(ind.object_id),
    IndexName = ind.name,
    StatisticsDate = STATS_DATE(ind.object_id, ind.index_id)
FROM sys.indexes ind
ORDER BY STATS_DATE(ind.object_id, ind.index_id) DESC
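If the dates come back stale, one way to refresh the statistics (a sketch using the table names from the question) is:
-- Refresh statistics for the two tables involved in the query
UPDATE STATISTICS dbo.ClientDefinitionEntry;
UPDATE STATISTICS dbo.KeywordValueGroups;
-- Or let SQL Server update any statistics it considers out of date across the database
EXEC sp_updatestats;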
And information about indexes, table definitions and foreign keys would be helpful.

I'd make sure you had the following indexes.
The ID on KeywordValueGroups.
The Name on KeywordValueGroups.
The ID on ClientDefinitionEntry with an INCLUDE for the processPath.
CREATE INDEX [IX_ClientDefinitionEntry_Id_ProcessPath] ON [dbo].[ClientDefinitionEntry] ( [keywordGroupId] ASC ) INCLUDE ( [processPath]) ON [PRIMARY]
CREATE INDEX [IX_KeywordValueGroups_Id] ON [dbo].[KeywordValueGroups] ( [keywordValueGrpId] ASC )
CREATE INDEX [IX_KeywordValueGroups_Name] ON [dbo].[KeywordValueGroups] ( [name] ASC )
I'd also change the query to the following.
select top 1
cde.processPath as 'keywordValue',
count(*) as 'total'
from
dbo.ClientDefinitionEntry AS cde
INNER JOIN
dbo.KeywordValueGroups AS kvg
ON
cde.keywordGroupId = kvg.keywordValueGrpId
where
kvg.[name] = @definitionName
group by
processPath
order by
total desc

There really isn't enough information to know for sure. If you are having performance problems with that query, then the tables must have a non-trivial amount of data and you must be missing important indexes.
Which indexes will definitely help depends deeply on how large the tables are, and to a lesser extent on the distribution of values in the KeywordGroupId and KeywordValueGrpId fields.
Lacking any other information, I would say that you want to make sure that dbo.KeywordValueGroups.[name] is indexed, as well as dbo.ClientDefinitionEntry.[keywordGroupId].
Because of the way the query is written, an index on dbo.KeywordValueGroups.[keywordValueGrpId] alone cannot help, but a composite index on [name], [keywordValueGrpId] probably will. If you have that index, you don't need a dedicated index on [name].
Based on gut feeling alone, I might hazard that the index on [name] is a must and that an index on cde.keywordGroupId is likely important. Whether the composite index on [name], [keywordValueGrpId] would help depends on how many records there are with the same [name].
The only way to know for sure is to add the indexes and see what happens.
You also need to think about how often this query runs (so, how important is it to make it fast), and how often the underlying data changes. Depending on your particular circumstances, the increase in speed might not justify the added cost of maintaining the indexes.

Not sure how many records we are talking about but this:
order by total desc
is on a calculated column, meaning the count has to be computed for every group before the ordering can be done. Likely this is one of the things slowing it down, but I see no way out of that particular problem. It's not an issue if you only have a few records after the join, but it could be if there are lots of them.
I'd concentrate on indexing first. We often forget that when we create foreign keys they are not automatically indexed. Check to see whether both sides of the join are indexed.
Since you are passing a value in a parameter, you might also have a parameter sniffing problem. Google this for techniques to fix that.
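Two common workarounds for parameter sniffing, shown only as sketches (the local variable name and its type are made up; which approach is appropriate depends on your data distribution), are recompiling per execution or copying the parameter into a local variable:
-- Option 1: recompile the statement on every execution so the plan
-- is built for the actual parameter value passed in
select top 1 cde.processPath as 'keywordValue', count(*) as 'total'
from dbo.ClientDefinitionEntry AS cde
inner join dbo.KeywordValueGroups AS kvg
    on cde.keywordGroupId = kvg.keywordValueGrpId
where kvg.[name] = @definitionName
group by cde.processPath
order by total desc
option (recompile);
-- Option 2: copy the parameter into a local variable so the optimizer
-- uses average statistics instead of the sniffed value
declare @localName varchar(100);   -- hypothetical name and length; match your column's type
set @localName = @definitionName;
-- ...then reference @localName in the WHERE clause instead of @definitionName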

Related

How can I solve a performance issue in sql query?

All developers know that IN and DISTINCT create issues for SQL queries. My colleague created the query below, but he no longer works at my company. Please take a look at the code below. How can I tune this query for better performance?
SELECT xxx
, COUNT(DISTINCT Id) Count
FROM Test (NOLOCK)
WHERE IsDeleted = 0
AND xxx IN
(
SELECT CAST(value AS INT)
FROM STRING_SPLIT(@ProductIds, ',')
)
GROUP BY xxx
All developers know that IN and DISTINCT create issues for SQL queries.
This is not necessarily true. They do hurt performance, but sometimes they are necessary.
The IN is probably not a big deal. It gets evaluated once. If you have another way to pass in a list -- say using a temporary table -- that is better.
The COUNT(DISTINCT id) is suspicious. I would expect id to already be unique. If so, then just use COUNT(*).
The WITH (NOLOCK) is not recommended unless you really know what you are doing. Working with data that might be inconsistent is dangerous.
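A sketch of the temporary-table approach (the temp table name is illustrative, and this also folds in the COUNT(*) suggestion above, which is only valid if Id really is unique per row):
-- Parse the list once into a temp table, which gives the optimizer real statistics
CREATE TABLE #ProductIdList (ProductId INT PRIMARY KEY);
INSERT INTO #ProductIdList (ProductId)
SELECT DISTINCT CAST(value AS INT)
FROM STRING_SPLIT(@ProductIds, ',');
SELECT t.xxx, COUNT(*) AS [Count]   -- keep COUNT(DISTINCT Id) if Id can repeat within a group
FROM Test AS t
INNER JOIN #ProductIdList AS p ON p.ProductId = t.xxx
WHERE t.IsDeleted = 0
GROUP BY t.xxx;
Ideally the caller would populate #ProductIdList directly instead of passing a comma-separated string at all.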
I have used Sentry One Plan Explorer to help find the tuning points of queries I am having performance issues with:
https://www.sentryone.com/plan-explorer
First you need to decide what good performance is in your environment, then find the worst parts of the query and optimize those first.
Last, consider how you are storing your data, look for places it makes sense to add an index if needed.
It would also help to create an index on the xxx column.
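For example, something along these lines might cover the query (the index name is made up, and this assumes you are on a SQL Server edition that supports filtered indexes):
-- Filtered, covering index for the IsDeleted = 0 case
CREATE NONCLUSTERED INDEX IX_Test_xxx_NotDeleted
ON Test (xxx)
INCLUDE (Id)
WHERE IsDeleted = 0;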

Slow MSSQL Stored Procedure

I am not sure what I am missing. I have a stored procedure that brings back the newest content in my database (via PHP), but it is really slow.
I have a View that brings in a specific kind of data (about 8000 records).
My stored procedure looks like this and takes about 9-11 seconds to complete. Any advice? Be kind, I am new to this :)
WITH maxdate As (
SELECT id_cr, MAX(date_activation) "LastReading"
FROM [pwf].[dbo].[content_code_service_new_content]
GROUP BY id_cr
)
SELECT DISTINCT TOP 7 s.id_cr, s.date_activation, s.title, s.id_element
FROM [pwf].[dbo].[content_code_service] s
INNER JOIN maxdate t on s.id_cr = t.id_cr and s.date_activation = t.LastReading
WHERE (
id_service = @id_service
AND content_languages_list LIKE '%' + @id_language + '%'
) ORDER by date_activation DESC
Okay, you're admitting you're kind of new to this, so after all of this you'll probably want to do some googling on how to performance-tune SQL queries.
But, here's a quick rundown that should help you get through this particular problem.
First up: "Display Actual Execution Plan". One of the most useful tools in MS SQL is the "Display Actual Execution Plan" option, which can be found in the Query menu. When it is checked, running the query creates a third tab alongside Results and Messages. It displays each operation the SQL engine had to perform, along with the percentage of the total cost each one took. Usually this is enough to figure out what might be wrong (if one of your twelve steps took 95% of the time, it is probably pointing at the part that is reading the database slowly).
One of the most important things in this is looking at how it's actually reading the data from SQL - they're the right-most nodes in the little tree it constructs. There are a few possibilities:
Table Scan. This is usually bad - it means it's having to read the entire table to get what it wants.
Clustered Index Scan. This is also usually bad. The clustered index is the table, and if it's scanning it, it is looking through all the records.
Non-Clustered Index Scan. Not optimal, but not necessarily a problem. It's able to use an index to help out, but not enough to seek straight to what it's looking for (it has to scan the whole index).
Index Seek (Clustered or Non-Clustered). This is what you're after. It performs a seek (essentially a binary search) to get quickly to the specific data it's looking for.
So! How do you get Index Seeks? You make sure your table has indexes on the appropriate fields.
From a quick skimming of your query, here are the columns that SQL's having to look up:
content_code_service_new_content.id_cr
content_code_service_new_content.date_activation
content_code_service.id_cr
content_code_service.date_activation
content_code_service.id_service
content_code_service.content_languages_list
So right off the bat, I'd check those two tables and make sure those columns have indexes on them.
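As a starting point, something along these lines might work (the index names are made up; check the execution plan before and after adding them):
-- Supports the GROUP BY id_cr / MAX(date_activation) in the CTE
CREATE NONCLUSTERED INDEX IX_new_content_id_cr_date
ON [pwf].[dbo].[content_code_service_new_content] (id_cr, date_activation);
-- Supports the join plus the id_service filter, covering the selected columns
CREATE NONCLUSTERED INDEX IX_service_id_cr_date
ON [pwf].[dbo].[content_code_service] (id_service, id_cr, date_activation)
INCLUDE (title, id_element, content_languages_list);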
I don't know anything about your data, but I would guess that this bit is hurting your performance
AND content_languages_list LIKE '%' + @id_language + '%'
Searching with a leading wildcard like that is always slow, because an index on the column cannot be used to seek into the middle of a string. For more info see https://www.brentozar.com/archive/2010/06/sargable-why-string-is-slow/
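If content_languages_list really is a delimited list of language codes, one longer-term fix (only a sketch; the child table, its key, and the need to populate it are all assumptions about your schema) is to normalize it so the filter becomes an indexable equality:
-- Hypothetical child table: one row per (content row, language)
CREATE TABLE dbo.content_code_service_language (
    id_cr       INT NOT NULL,          -- adjust to whatever actually identifies a content row
    id_language VARCHAR(10) NOT NULL,
    PRIMARY KEY (id_cr, id_language)
);
-- The LIKE '%...%' filter could then be replaced with an EXISTS:
-- AND EXISTS (SELECT 1 FROM dbo.content_code_service_language l
--             WHERE l.id_cr = s.id_cr AND l.id_language = @id_language)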

Why is my SQL query getting disproportionally slow when adding a simple string comparison?

So, I have an SQL query for MSSQL looking like this (simplified for readability):
SELECT ...
FROM (
SELECT ..., ROUND(SUM(TOTAL_TIME)/86400.0,2) ...
FROM MY_DATA
WHERE STATUS NOT IN (107)
GROUP BY ...
) q
WHERE q.Tdays > 0
GROUP BY ...
It works fine, but I need a comparison against another table in the inner query, so I added a left join and said comparison:
SELECT ...
FROM (
SELECT ..., ROUND(SUM(TOTAL_TIME)/86400.0,2) ...
FROM MY_DATA
LEFT JOIN OTHER_TABLE ON MY_DATA.ID=OTHER_TABLE.ID -- new JOIN
WHERE STATUS NOT IN (107) AND (DEPARTMENT_ID='SP' OR DEPARTMENT_ID='BL') -- new AND branch
GROUP BY ...
) q
WHERE q.Tdays > 0
GROUP BY ...
This query works, but it is A LOT slower than the previous one. The weird thing is, commenting out the new AND branch of the WHERE clause while leaving the JOIN as it is makes it fast again. It's as if it isn't the join against another table that is slowing the query down, but the actual string comparisons... I am lost as to why this is so slow, or how I could speed it up... any advice would be appreciated!
Use an INNER JOIN. The outer join is being undone by the WHERE clause anyway:
SELECT ..., ROUND(SUM(TOTAL_TIME)/86400.0,2) ...
FROM MY_DATA d INNER JOIN
OTHER_TABLE ot
ON d.ID = ot.ID -- new JOIN
WHERE d.STATUS NOT IN (107) AND DEPARTMENT_ID IN ('SP', 'BL') -- new AND branch
GROUP BY ...
(The IN shouldn't make a difference; it is just easier to write.)
Next, if this still performs slowly, look at the execution plans. Slowness here usually means SQL Server is making a poor decision, probably on the JOIN algorithm. Normally I fix this by forbidding nested loop joins, but there might be other solutions as well.
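For instance, if the plan shows a bad nested loop join, a query-level hint can rule that join type out. This is only a sketch built on the rewritten query above (the elided columns stay elided), and hints should be a last resort after you have confirmed the plan is the problem:
SELECT ..., ROUND(SUM(TOTAL_TIME)/86400.0, 2) ...
FROM MY_DATA d
INNER JOIN OTHER_TABLE ot ON d.ID = ot.ID
WHERE d.STATUS NOT IN (107)
  AND DEPARTMENT_ID IN ('SP', 'BL')
GROUP BY ...
OPTION (HASH JOIN, MERGE JOIN)  -- allow only hash or merge joins, never nested loops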
It's hard to say definitively what will or won't speed things up without seeing the execution plan. Also, understanding how fast you need it to be affects what steps you might want to (or not want to) consider taking.
What follows is admittedly somewhat vague, but these are a few things that came to mind when I thought about this. Take a look at the execution plan as Philip Couling suggested in that good link to get an idea where the pain points are, and of course, take these suggestions with a grain of salt.
You might consider adding some indexes to either or both of the tables. The execution plan might even give you suggestions on what could be useful, but off the top of my head, something on OTHER_TABLE.DEPARTMENT_ID probably wouldn't hurt.
You might be able to build potential new indexes as Filtered Indexes if you know those hard-coded search terms (like STATUS and DEPARTMENT_ID are always going to be the same).
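A filtered index here might look something like this (only a sketch; it assumes DEPARTMENT_ID lives on OTHER_TABLE, as suggested above, and that the literals really are fixed):
CREATE NONCLUSTERED INDEX IX_OTHER_TABLE_Dept_SP_BL
ON OTHER_TABLE (ID)
INCLUDE (DEPARTMENT_ID)
WHERE DEPARTMENT_ID IN ('SP', 'BL');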
You could pre-calculate some of this information if it's not changing so rapidly that you need to query it fresh on every call. This comes back to how fast you need it to go, because for just about any query you can add columns or pre-populated lookup tables to avoid doing work at run time. For example, you could add a new bit field like IsNewOrBranch or IsStatusNot107 (both somewhat egregious steps, but things which could work), or pre-aggregate the data in the inner query ahead of time.
I know you simplified the query for our benefit, but that also makes it a little hard to know what's going on with the subquery, and the subsequent GROUP BY being performed against that subquery. There might be a way to avoid having to do two group bys.
Along the same vein, you might also look into splitting what you're doing into separate statements if SQL is having a difficult time figuring out how best to return the data. For example, you might populate a temp table or table variable with the results of your inner query, then perform your subsequent GROUP BY on that. While this approach isn't always useful, there are many times where trying to cram all the work into a single query will actually end up being worse than several individual, simple, optimized steps would be.
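A sketch of that split, using a temp table for the inner aggregation (the temp table name is made up, and the elided columns from the simplified query stay elided):
-- Step 1: materialize the inner aggregation once
SELECT ..., ROUND(SUM(TOTAL_TIME)/86400.0, 2) AS Tdays
INTO #inner_agg
FROM MY_DATA d
INNER JOIN OTHER_TABLE ot ON d.ID = ot.ID
WHERE d.STATUS NOT IN (107)
  AND DEPARTMENT_ID IN ('SP', 'BL')
GROUP BY ...;
-- Step 2: run the outer GROUP BY against the much smaller temp table
SELECT ...
FROM #inner_agg
WHERE Tdays > 0
GROUP BY ...;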
And as Gordon Linoff suggested, there are a number of query hints which could be used to coax the execution plan into doing things a specific way. But be careful, often that way lies madness.
Your SQL is fine, and restricting your data with an additional AND clause should usually not make it slower.
As it happens, choosing a fast execution path is a hard problem, and SQL Server sometimes (albeit seldom) gets it wrong.
What you can do to help SQL Server find the best execution path is to:
make sure the statistics on your tables are up-to-date and
make sure that there is an "obviously suitable" index that SQL Server can use. SQL Server Management Studio will usually give you suggestions on missing indexes when you select the "show actual execution plan" option.

Slow SQL Queries, Order Table by Date?

I have a SQL Server 2008 database that I query regularly and that now has over 30 million entries (joy!). Unfortunately this database cannot be drastically changed because it is still in use for R&D.
When I query this database, it takes FOREVER. By that I mean I haven't been patient enough to wait for results (after 2 minutes I have to cancel to avoid locking the R&D department out). Even with a short date range (no more than a few months), it is basically impossible to get any results from it. I am filtering on 4 of the columns and unfortunately have to use an inner join to another table (which I've been told is very costly in terms of query efficiency, but it is unavoidable). This inner-joined table has fewer than 100k entries.
What I was wondering is: is it possible to organize the table so that it is ordered by date by default, to reduce the number of rows it has to search through?
If this is not possible, is there anything I can do to reduce query times? Is there any other useful information that could assist me in coming up with a solution?
I have included a sample of the query that I use:
SELECT DISTINCT N.TestName
FROM [DalsaTE].[dbo].[ResultsUut] U
INNER JOIN [DalsaTE].[dbo].[ResultsNumeric] N
ON N.ModeDescription = 'Mode 8: Low Gain - Green-Blue'
AND N.ResultsUutId = U.ResultsUutId
WHERE U.DeviceName = 'BO-32-3HK60-00-R'
AND U.StartDateTime > '2011-11-25 01:10:10.001'
ORDER BY N.TestName
Any help or suggestions are appreciated!
It sounds like the datetime may be stored in a text-based field, and consequently an index isn't being used?
Could you try the following to see if you have any speed improvement:
select distinct N.TestName
from [DalsaTE].[dbo].[ResultsUut] U
inner join [DalsaTE].[dbo].[ResultsNumeric] N
on N.ModeDescription = 'Mode 8: Low Gain - Green-Blue'
and N.ResultsUutId = U.ResultsUutId
where U.DeviceName = 'BO-32-3HK60-00-R'
and U.StartDateTime > cast('2011-11-25 01:10:10.001' as datetime)
order by N.TestName
It would also be worth trying to change your inner join to a left outer join, as those occasionally perform faster for no obvious reason (at least none that I'm aware of).
You can add an index on your date column, which should improve your query time. You can either use a CREATE INDEX statement or the table designer.
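For example (only a sketch; the index name is made up, and you may want to key on other columns from the WHERE clause instead of just including them):
CREATE NONCLUSTERED INDEX IX_ResultsUut_StartDateTime
ON [DalsaTE].[dbo].[ResultsUut] (StartDateTime)
INCLUDE (DeviceName, ResultsUutId);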
Is the sole purpose of the join to provide sorting? If so, a quick thing to try would be to remove this, and see how much of a difference it makes - at least then you'll know where to focus your attention.
Finally, SQL server management studio has some useful tools such as execution plans that can help diagnose performance issues. Good luck!
There are a number of problems which may be causing delays in the execution of your query.
Indexes (except the clustered index, which in SQL Server is usually the primary key) do not reorder the data; they merely create a separate structure (think phone book) which orders a set of values and points back to the primary key.
Without seeing the type of data or the existing indexes it's difficult to say, but at the very least the following ascending indexes might help:
[DalsaTE].[dbo].[ResultsNumeric] ModeDescription and ResultsUutId and TestName
[DalsaTE].[dbo].[ResultsUut] StartDateTime and DeviceName and ResultsUutId
With the indexes above, the sample query you gave could be completed without performing a single lookup on the actual table data.
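In CREATE INDEX form, those suggestions might look roughly like this (the index names are placeholders):
CREATE NONCLUSTERED INDEX IX_ResultsNumeric_Mode_Uut_Test
ON [DalsaTE].[dbo].[ResultsNumeric] (ModeDescription, ResultsUutId, TestName);
CREATE NONCLUSTERED INDEX IX_ResultsUut_Date_Device_Uut
ON [DalsaTE].[dbo].[ResultsUut] (StartDateTime, DeviceName, ResultsUutId);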

Indexing table with duplicates MySQL/SQL Server with millions of records

I need help in indexing in MySQL.
I have a table in MySQL with the following columns:
ID Store_ID Feature_ID Order_ID Viewed_Date Deal_ID IsTrial
The ID is auto-generated. Store_ID goes from 1 to 8, Feature_ID from 1 to (let's say) 100. Viewed_Date is the date and time at which the row was inserted. IsTrial is either 0 or 1. You can ignore Order_ID and Deal_ID for this discussion.
There are millions of rows in the table, and we have a reporting backend that needs to show the number of views in a certain period (or overall) where trial is 0, for a particular store id and a particular feature.
The query takes the form of:
select count(viewed_date)
from theTable
where viewed_date between '2009-12-01' and '2010-12-31'
and store_id = '2'
and feature_id = '12'
and Istrial = 0
In SQL Server you can create a filtered index to use for Istrial. Is there anything similar in MySQL? Also, Store_ID and Feature_ID have a lot of duplicate values. I created an index on Store_ID and Feature_ID. Although this seems to have decreased the search time, I need a bigger improvement than that. Right now I have more than 4 million rows; to answer a query like the one above, it looks at 3.5 million rows in order to give me a count of 500k rows.
P.S. I forgot to include the viewed_date filter in the query originally; it is there now.
Well, you could expand your index to cover Store_ID, Feature_ID and IsTrial. You won't get any better than that, performance-wise.
My first idea would be an index on (feature_id, store_id, istrial), since feature_id seems to be the column with the highest Shannon entropy. But without knowing the statistics on feature_id I'm not sure. Maybe you should create two indexes, with (store_id, feature_id, istrial) being the other, and let the optimizer sort it out. Using all three columns also has the advantage that the database can answer your query from the index alone, which should improve performance, too.
But if neither of your columns is selective enough to sufficiently improve index performance, you might have to resort to denormalization by using INSERT/UPDATE triggers to fill a second table (feature_id, store_id, istrial, view_count). This would slow down inserts and updates, of course...
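A rough sketch of that denormalization in MySQL (the summary table, trigger name, and the extra viewed_day column are assumptions; updates and deletes would need matching triggers):
-- Summary table keyed by the dimensions the report filters on
CREATE TABLE view_counts (
    store_id    INT NOT NULL,
    feature_id  INT NOT NULL,
    istrial     TINYINT NOT NULL,
    viewed_day  DATE NOT NULL,
    view_count  INT NOT NULL DEFAULT 0,
    PRIMARY KEY (store_id, feature_id, istrial, viewed_day)
);
-- Keep it in sync on insert
DELIMITER //
CREATE TRIGGER trg_theTable_ai AFTER INSERT ON theTable
FOR EACH ROW
BEGIN
    INSERT INTO view_counts (store_id, feature_id, istrial, viewed_day, view_count)
    VALUES (NEW.Store_ID, NEW.Feature_ID, NEW.IsTrial, DATE(NEW.Viewed_Date), 1)
    ON DUPLICATE KEY UPDATE view_count = view_count + 1;
END//
DELIMITER ;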
You might want to think about splitting that table horizontally. You could run a nightly job that puts each store_id in a separate table. Or take a look at feature_id, yeah, it's a lot of tables but if you don't need real-time data. It's the route I would take.
If you need to optimize this query specifically in MySQL, why not add istrial to the end of the existing index on Store_ID and Feature_ID? This will completely index away the WHERE clause, and if the table is MyISAM the COUNT can be taken from the cardinality summary of the index. All of your existing queries that leverage the current index will be unchanged as well.
Edit: also, I'm unsure why you're doing COUNT(viewed_date) instead of COUNT(*). Is viewed_date ever NULL? If not, you can just use COUNT(*), which will eliminate the need to go to the .MYD file if you take it in conjunction with my other suggestion.
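In MySQL that would look roughly like this (the existing index name idx_store_feature is assumed; adjust it to whatever your index is actually called):
-- Extend the existing composite index so istrial is covered too
ALTER TABLE theTable
    DROP INDEX idx_store_feature,
    ADD INDEX idx_store_feature_istrial (Store_ID, Feature_ID, IsTrial);
-- Since the viewed_date filter was added later, appending the date column as well
-- would let the whole WHERE clause be answered by an index range scan:
-- ADD INDEX idx_store_feature_istrial_date (Store_ID, Feature_ID, IsTrial, Viewed_Date)
SELECT COUNT(*)   -- COUNT(*) instead of COUNT(viewed_date), assuming viewed_date is never NULL
FROM theTable
WHERE viewed_date BETWEEN '2009-12-01' AND '2010-12-31'
  AND store_id = '2'
  AND feature_id = '12'
  AND Istrial = 0;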
The best way I found in tackling this problem is to skip DTA's recommendation and do it on my own in the following way:
Use Profiler to find the costliest queries in terms of CPU usage (probably blocking queries) and apply indexes to tables based on those queries. If the query execution plan can be changed to decrease the reads, writes and overall execution time, do that first. If not (in which case the query is what it is), then apply a clustered/non-clustered index combination that best suits it. This depends on the nature of the existing table indexes, the total bytes of the columns participating in the indexes, etc.
Run queries in SSMS to find the most frequently executed queries and do the same as above.
Create a defragmentation schedule that either reorganizes or rebuilds indexes depending on how fragmented they are.
I am pretty sure others can suggest more good ideas, but doing these gave me good results. I hope someone finds this helpful. I think the DTA does not really make things faster in terms of indexing, because you really need to go through all of the indexes it proposes yourself. This is even more true for a database that gets hit a lot.