Slow MSSQL Stored Procedure - sql

I am not sure what I am missing: I have a stored procedure that brings back the newest content in my database (via PHP), but it is really slow.
I have a View that brings in a specific kind of data (about 8000 records).
My stored procedure looks like this and takes about 9-11 seconds to complete, any advice? Be kind, I am new to this :)
WITH maxdate AS (
    SELECT id_cr, MAX(date_activation) AS LastReading
    FROM [pwf].[dbo].[content_code_service_new_content]
    GROUP BY id_cr
)
SELECT DISTINCT TOP 7 s.id_cr, s.date_activation, s.title, s.id_element
FROM [pwf].[dbo].[content_code_service] s
INNER JOIN maxdate t ON s.id_cr = t.id_cr AND s.date_activation = t.LastReading
WHERE (
    id_service = @id_service
    AND content_languages_list LIKE '%' + @id_language + '%'
)
ORDER BY date_activation DESC

Okay, you're admitting you're kinda new to this, so after all of this you'll probably want to do some googling on how to performance-tune SQL queries.
But, here's a quick rundown that should help you get through this particular problem.
First up: "Display Actual Execution Plan". One of the most useful tools in MS SQL is the "Display Actual Execution Plan" option, which can be found in the Query menu. When this is checked, running the query will create a third tab alongside Results and Messages. It'll display each operation the SQL engine had to perform, along with the percentage of the total cost each took. Usually, this is enough to figure out what might be wrong (if one of your 12 steps took 95% of the time, that step is probably the culprit).
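If you prefer numbers to pictures, you can also get a text-based readout of the work a query does (standard T-SQL, not specific to this query):
-- Turn on per-query I/O and timing statistics for the session; after the
-- query runs, the Messages tab reports logical reads per table plus
-- CPU and elapsed time.
SET STATISTICS IO ON
SET STATISTICS TIME ON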
One of the most important things in this is looking at how it's actually reading the data from SQL - they're the right-most nodes in the little tree it constructs. There are a few possibilities:
Table Scan. This is usually bad - it means it's having to read the entire table to get what it wants.
Clustered Index Scan. This is also usually bad. Clustered Indexes are the table, and if it's Scanning it, it means it's looking through all the records.
Non-Clustered Index Scan. Not optimal, but not necessarily a problem. It's able to use an index to help out, but not enough that it can perform a binary search for what it's looking for (it has to scan the whole index.)
Index Seek (Clustered or Non-Clustered). This is what you're after. It's performing a binary search to get quickly to the specific data it's looking for.
So! How do you get Index Seeks? You make sure your table has indexes on the appropriate fields.
From a quick skimming of your query, here are the columns that SQL's having to look up:
content_code_service_new_content.id_cr
content_code_service_new_content.date_activation
content_code_service.id_cr
content_code_service.date_activation
content_code_service.id_service
content_code_service.content_languages_list
So right off the bat, I'd check those two tables and make sure those columns have indexes on them.
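As a minimal sketch (the index names are mine, and the exact key columns depend on your real schema and workload), that could look like:
CREATE NONCLUSTERED INDEX IX_new_content_id_cr
    ON [pwf].[dbo].[content_code_service_new_content] (id_cr, date_activation)

CREATE NONCLUSTERED INDEX IX_service_lookup
    ON [pwf].[dbo].[content_code_service] (id_service, id_cr, date_activation)
    INCLUDE (title, id_element, content_languages_list)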

I don't know anything about your data, but I would guess that this bit is hurting your performance:
AND content_languages_list LIKE '%' + @id_language + '%'
Searching with a leading wildcard like that is always slow, because it stops the engine from seeking on an index. For more info see https://www.brentozar.com/archive/2010/06/sargable-why-string-is-slow/
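One common fix, sketched below under the assumption that you control the schema (the table and column names here are made up): store one language per row, so the filter becomes an equality that an index can seek on.
-- Hypothetical child table replacing the delimited content_languages_list:
CREATE TABLE content_language (
    id_cr       INT         NOT NULL,
    id_language VARCHAR(10) NOT NULL,
    PRIMARY KEY (id_cr, id_language)
)

-- The LIKE '%...%' predicate then becomes a sargable join:
-- INNER JOIN content_language cl
--     ON cl.id_cr = s.id_cr
--    AND cl.id_language = @id_language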

Related

How to speed up this LIKE query?

I have a LIKE query that's processing millions of rows:
SELECT
sample_id,
REPLACE(sample_id, '*', '') AS term
FROM
sample.table
WHERE
sample_id LIKE '%*%'
ORDER BY
sample_id ASC;
I tried batching the queries but it's still too slow to process. Has anyone experienced this in the past and successfully solved it? I'm basically open to any ideas at this point. Thanks!
You did not mention which RDBMS you are using, but you can speed up processing by using a properly designed index.
Index properties (based on the Microsoft SQL Server RDBMS):
filtered index:
you can implement a filtered index. The filter corresponds to the WHERE clause from your query; you can add "sample_id LIKE '%*%'" as the filter condition.
covering index:
your query is not complicated, so it should be easy to create a covering index for it. By a covering index I mean a structure which contains all the columns mentioned in your query; that helps the RDBMS engine decide to use it during execution, because it contains all the needed columns, and the filter as well, as mentioned in the first point.
So the syntax could look like this (Microsoft SQL Server pseudo code - note that SQL Server's filtered indexes only accept simple comparison predicates, so the LIKE filter here is illustrative rather than valid syntax):
CREATE INDEX idx1 ON your_table_name (sample_id) WHERE sample_id LIKE '%*%'
If you build it, you will have a DEDICATED structure for your query. You can think of it as a subset of the data from your table, but physically present in your database, written to disk, and constantly updated as the data changes. As long as this index has the filter, it contains only the rows needed by your query. So you can imagine that if the RDBMS engine chooses it - by parsing and analyzing your code - the WHERE clause will not have to be evaluated at execution time.
Unfortunately, I am not aware whether RDBMSes other than Microsoft SQL Server provide filtered indexes.
If your RDBMS doesn't allow filtered indexes, you can at least create a covering one. It might still be a lighter structure than your table; however, you didn't present the structure of your table.
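Here the covering index would just be a plain index on the one column the query touches, since term is computed from sample_id (a sketch; the index and table names are placeholders):
-- sample_id is the only column the query reads, so a single-column
-- index already covers the whole statement:
CREATE INDEX idx_sample_id ON your_table_name (sample_id)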
An index doesn't come without a cost, but that is another story. Just remember that it takes space on disk and is updated along with the data in your table.

SQL Server: Performance of searching for hex strings in large tables (using LIKE, Full-Text Search, etc.)

I have a table with 40+ million rows in MS SQL Server 2019. One of the columns stores pure hexadecimal strings (both binary and readable ASCII content). I need to search this table for rows containing a specific hex string.
Normally, I would do this:
SELECT * FROM transactionoutputs WHERE outhex LIKE '%74657374%' ORDER BY id DESC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
Since the results are paginated, it can take less than a second to find the first 10 results. However, when increasing the offset, or searching for strings that only appear 1-2 times in the entire table, it can take more than a minute, at which point my application will time out.
The execution plan for this query: (screenshot not reproduced here)
Are there any easy ways to improve the performance of such a search?
Using this answer, I was able to reduce the query time from 33 seconds to 27 seconds:
SELECT * FROM transactionoutputs WHERE
CHARINDEX('74657374' collate Latin1_General_BIN, outhex collate Latin1_General_BIN) > 0
ORDER BY id DESC OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
When I leave out the ORDER BY and pagination, I can reduce this to 19 seconds. This is not ideal, because I need both the ordering and the pagination, and it still has to scan the entire table.
I have tried the following:
Create an index on that column. This has no noticeable effect.
I came across this article about slow queries. Initially, I was using parameterized queries in my application, which was much slower than running them in SSMS. I have since moved to the query shown above, but it is still slow.
I tried to enable Multiple Active Result Sets (MARS), but without any improvement in query time.
I also tried using Full-Text Search. This seemed to be the most promising solution as text search is exactly what I need. I created a full-text index and can do a similar query like above, but using the index:
SELECT * FROM transactionoutputs WHERE CONTAINS(outhex,'7465') ORDER BY id desc OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
This returns results almost instantly. However, when the search string is longer than a few characters (often 4), it doesn't return anything. Am I doing something wrong, or why is that happening?
The execution plan: (screenshot not reproduced here)
My understanding is that my case is not the ideal use case for FTS, as it is designed to search in readable text and not hexadecimal strings. Is it possible to use it anyway, and if so, how?
After reading dozens of articles and SO posts, I cannot confidently say I know how to improve the performance of such queries for my specific use case, or whether it is even possible at all. So, is there any easy option to improve this?
First, kudos for the fantastic explanation of your problem - this helps you get better answers fast. You should also include DDL, including indexes, when possible. This will become clear as I answer your question.
I'm going to tackle a couple of issues with your query that are unrelated to how you're parsing your text, and I'll talk about how to handle the string problem later tonight.
Answer Part 1: Unrelated to string parsing
It's quite possible that the way you are searching through the string is the main performance problem. Let's start with the SELECT * - do you absolutely need all the columns? Specifically, do you absolutely need all the columns that are included in that Key lookup? This is the most important thing to sort out. Let me explain.
Your query is performing a scan against your nonclustered index named outhex-index, then performs a Key Lookup to retrieve the columns not included in outhex-index. Key Lookups destroy performance, especially against a clustered and nonclustered index with 40,000,000 rows.
If you do need those columns, then you should consider adding them as included columns to your outhex-index index. I say consider because I don't know how many columns there are, nor their data types. Included columns speed up queries by eliminating costly key lookups, but they slow down data modification, sometimes dramatically, depending on the number/type of indexes. If you need columns not included in outhex-index and they are big columns (MAX/BLOB/LOB data types, XML, etc.) then a covering index is NOT an option. If you don't need them, then you should refactor your SELECT * statement to only include the columns you need.
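A sketch of such a rebuild, assuming the index is keyed on outhex and that id is the only other column the query really needs (both are assumptions, since the full column list isn't shown):
-- Rebuild the nonclustered index with included columns so the query can
-- be answered from the index alone, with no key lookups:
CREATE NONCLUSTERED INDEX [outhex-index]
    ON dbo.transactionoutputs (outhex)
    INCLUDE (id)  -- add whichever other columns the SELECT actually needs
    WITH (DROP_EXISTING = ON)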
Full text indexing is not an option here unless you can find a way to lose that sort. Sorting has N log N complexity, which means the sort gets more expensive the more rows you sort. A 40-million-row sort should be avoided whenever possible. That will be hard to avoid with full text indexing, for reasons which require more time to explain than I have. Adding/modifying a 40-million-row index can be expensive and take a lot of time; if you do go that route, I suggest taking an offline copy of that table to time how long it takes to build. You can also consider creating a filtered index, if possible, to reduce your search area.
I noticed, too, that both queries are getting serial execution plans. I don't know if a parallel plan will help the first query with the key lookup, but I know that it will likely help the second one, as there is a sort involved. Parallel execution plans can really speed up sorts. Consider testing your query with OPTION (QUERYTRACEON 8649) or make_parallel() by Adam Machanic.
I'll update this post tonight with some ideas to parse your string faster. One thing you could look into in the meantime is Paul White's clever Trigram Wildcard String Search trick which might be an option.

Using temp table for sorting data in SQL Server

Recently, I came across a pattern (not sure - it could be an anti-pattern) for sorting data in a SELECT query. The pattern is a verbose, non-declarative way of ordering data: dump the relevant data from the actual table into a temporary table, and then apply ORDER BY on a field of the temporary table. I guess the only reason someone would do that is to improve performance (which I doubt), and there is no other benefit.
For example, let's say there is a Users table. The table might contain millions of rows. We want to retrieve all the users whose first name starts with 'G', sorted by first name. The natural and more declarative way to implement a SQL query for this scenario is:
More natural and declarative way
SELECT * FROM Users
WHERE NAME LIKE 'G%'
ORDER BY Name
Verbose way
SELECT * INTO TempTable
FROM Users
WHERE NAME LIKE 'G%'
SELECT * FROM TempTable
ORDER BY Name
With that context, I have a few questions:
Will there be any performance difference between the two ways if there is no index on the first name field? If yes, which one would be better?
Will there be any performance difference between the two ways if there is an index on the first name field? If yes, which one would be better?
Should the SQL Server optimizer not generate the same execution plan for both ways?
Is there any benefit to the verbose way from any other perspective, like locking/blocking?
Thanks in advance.
Regularly: an anti-pattern used by people without an idea of what they are doing.
SOMETIMES: OK, because SQL Server has a problem that is not resolvable otherwise - though I haven't seen that one in years.
It makes things slower because it forces the tempdb table to be fully populated FIRST, while otherwise the query could POSSIBLY be resolved more efficiently.
The last time I saw this was about 3 years ago. We made it 3 times as fast by not being "smart" and dropping the tempdb table ;)
Answers:
1: No, it still needs a table scan, obviously.
2: Possibly - it depends on the data amount, but an index seek would deliver the data in order already (as the index is ordered by its content).
3: No, obviously. Query plan optimization is statement by statement. By cutting the execution in 2, the query optimizer CAN NOT merge the two statements into one plan.
4: Only if you run into a query optimizer issue or a limitation on how many tables you can join - not in this degenerate case (degenerate in a technical sense, i.e. very simplistic). But if you need to join MANY, MANY tables it may be better to go with an interim step.
If the field you want to ORDER BY is not indexed, you could put everything into a temp table, index it, and then do the ordering - it might be faster. You would have to test to make sure.
There is never any benefit to the second approach that I can think of.
It means that if the data is available pre-ordered, SQL Server can't take advantage of this, and it adds an unnecessary blocking operator and an additional sort to the plan.
In the case that the data is not available pre-ordered, SQL Server will sort it in a work table, either in memory or in tempdb, anyway, and adding an explicit #temp table just adds an unnecessary additional step.
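For example, with an index on Name (a sketch - the index name is mine), the first query can seek straight to the 'G%' range and read the rows already in order, so the plan needs no sort operator at all:
CREATE INDEX IX_Users_Name ON Users (Name)

-- With SELECT *, the optimizer may still prefer a scan if many rows
-- start with 'G' (key lookups get expensive), but the ORDER BY itself
-- costs nothing extra once the index supplies the order.
SELECT * FROM Users
WHERE Name LIKE 'G%'
ORDER BY Name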
Edit
I suppose one case where the second approach could give an apparent benefit might be if the presence of the ORDER BY caused SQL Server to choose a different plan that turned out to be suboptimal. In that case I would resolve it in a different way, by either improving statistics or by using hints/query rewrites to avoid the undesired plan.

How to improve query performance

I have a lot of records in a table. When I execute the following query it takes a lot of time. How can I improve the performance?
SET ROWCOUNT 10
SELECT StxnID
,Sprovider.description as SProvider
,txnID
,Request
,Raw
,Status
,txnBal
,Stxn.CreatedBy
,Stxn.CreatedOn
,Stxn.ModifiedBy
,Stxn.ModifiedOn
,Stxn.isDeleted
FROM Stxn,Sprovider
WHERE Stxn.SproviderID = SProvider.Sproviderid
AND Stxn.SProviderid = ISNULL(@pSProviderID,Stxn.SProviderid)
AND Stxn.status = ISNULL(@pStatus,Stxn.status)
AND Stxn.CreatedOn BETWEEN ISNULL(@pStartDate,getdate()-1) and ISNULL(@pEndDate,getdate())
AND Stxn.CreatedBy = ISNULL(@pSellerId,Stxn.CreatedBy)
ORDER BY StxnID DESC
The stxn table has more than 100,000 records.
The query is run from a report viewer in asp.net c#.
This is my go-to article when I'm trying to do a search query that has several search conditions which might be optional.
http://www.sommarskog.se/dyn-search-2008.html
The biggest problem with your query is the column = ISNULL(@column, column) syntax. MSSQL won't use an index for that. Consider changing each of those to (@column IS NULL OR column = @column), which preserves the "parameter is optional" behaviour.
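A sketch of that rewrite against the query above (parameter names as in the question); the OPTION (RECOMPILE) at the end is the companion trick from the linked dyn-search article, letting the optimizer build a plan for the actual parameter values:
WHERE Stxn.SproviderID = Sprovider.Sproviderid
  AND (@pSProviderID IS NULL OR Stxn.SProviderid = @pSProviderID)
  AND (@pStatus IS NULL OR Stxn.status = @pStatus)
  AND Stxn.CreatedOn BETWEEN ISNULL(@pStartDate, GETDATE() - 1)
                         AND ISNULL(@pEndDate, GETDATE())
  AND (@pSellerId IS NULL OR Stxn.CreatedBy = @pSellerId)
ORDER BY StxnID DESC
OPTION (RECOMPILE)  -- recompile per call so each parameter combination gets its own plan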
You should look at the execution plan and check for missing indexes. Also, how long does it take to execute? What is slow for you?
Maybe you could also return fewer rows, but that is just a guess. We really need to see your table and indexes plus the execution plan.
Check sql-tuning-tutorial
For one, use SELECT TOP (10) instead of SET ROWCOUNT - the optimizer will have a much better chance that way. Another suggestion is to use a proper INNER JOIN instead of potentially ending up with a Cartesian product using the old-style table,table join syntax (that is not the case here, but it can happen much more easily with the old syntax). It should be:
...
FROM Stxn INNER JOIN Sprovider
ON Stxn.SproviderID = SProvider.Sproviderid
...
And if you think 100K rows is a lot, or that this volume is a reason for slowness, you're sorely mistaken. Most likely you have really poor indexing strategies in place, possibly some parameter sniffing, possibly some implicit conversions... hard to tell without understanding the data types, indexes and seeing the plan.
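Putting those two suggestions together, the skeleton of the query becomes (a sketch; the column list is abbreviated as in the original):
SELECT TOP (10)
       StxnID
      ,Sprovider.description AS SProvider
      -- ...remaining columns as in the original query...
      ,Stxn.ModifiedOn
FROM Stxn
INNER JOIN Sprovider
    ON Stxn.SproviderID = Sprovider.Sproviderid
WHERE ...
ORDER BY StxnID DESC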
There are a lot of things that could impact the performance of a query, although 100k records really isn't all that many.
Items to consider (in no particular order)
Hardware:
Is SQL Server memory constrained? In other words, does it have enough RAM to do its job? If it is swapping memory to disk, then this is a sure sign that you need an upgrade.
Is the machine disk constrained? In other words, are the drives fast enough to keep up with the queries you need to run? If it's memory constrained, then disk speed becomes a larger factor.
Is the machine processor constrained? For example, when you execute the query does the processor spike for long periods of time? Or, are there already lots of other queries running that are taking resources away from yours...
Database Structure:
Do you have indexes on the columns used in your where clause? If the tables do not have indexes then it will have to do a full scan of both tables to determine which records match.
Eliminate the ISNULL function calls. If this is a direct query, have the calling code validate the parameters and set default values before executing. If it is in a stored procedure, do the checks at the top of the s'proc. Unless you are executing this WITH RECOMPILE (which redoes the parameter sniffing), those functions will have to be evaluated for each row.
Network:
Is the network slow between you and the server? Depending on the amount of data pulled, you could be moving GBs of data across the wire. I'm not sure what is stored in the "raw" column. The first question you need to ask here is "how much data is going back to the client?" For example, if each record is 1 MB+ in size, then you'll probably have disk and network constraints at play.
General:
I'm not sure what "slow" means in your question. Does it mean that the query is taking around 1 second to process or does it mean it's taking 5 minutes? Everything is relative here.
Basically, it is going to be impossible to give a hard answer without you answering a lot of questions. All of these will bear out if you profile the queries, understand what and how much is going back to the client, and watch the interactions amongst the various parts.
Finally depending on the amount of data going back to the client there might not be a way to improve performance short of hardware changes.
Make sure Stxn.SproviderID, Stxn.status, Stxn.CreatedOn, Stxn.CreatedBy, Stxn.StxnID and SProvider.Sproviderid all have indexes defined.
(NB -- you might not need all, but it can't hurt.)
I don't see much that can be done on the query itself, but I can see things being done on the schema :
Create an index / PK on Stxn.SproviderID
Create an index / PK on SProvider.Sproviderid
Create indexes on status, CreatedOn, CreatedBy, StxnID
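In T-SQL those might look like this (a sketch - the index names are mine, and whether some columns belong together in one composite index depends on the actual query patterns):
ALTER TABLE Sprovider ADD CONSTRAINT PK_Sprovider PRIMARY KEY (Sproviderid)
ALTER TABLE Stxn ADD CONSTRAINT PK_Stxn PRIMARY KEY (StxnID)
CREATE INDEX IX_Stxn_SproviderID ON Stxn (SproviderID)
CREATE INDEX IX_Stxn_CreatedOn ON Stxn (CreatedOn)
CREATE INDEX IX_Stxn_CreatedBy ON Stxn (CreatedBy)
CREATE INDEX IX_Stxn_Status ON Stxn (status)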
Something to consider: when ROWCOUNT or TOP is used with an ORDER BY clause, the entire result set is created and sorted first, and only then are the top 10 results returned - unless an index already delivers the rows in the required order.
How does this run without the Order By clause?

SQL Server Express performance issue

I know my questions will sound silly and probably nobody will have a perfect answer, but since I am at a complete dead-end with the situation, it will make me feel better to post it here.
So...
I have a SQL Server Express database that's 500 MB. It contains 5 tables and maybe 30 stored procedures. This database is used to store articles and is used for the Developer It web site. Normally the web pages load quickly, let's say 2 or 3 sec. BUT, the SQL Server process uses 100% of the processor for those 2 or 3 sec.
I tried to find which stored procedure was the problem and I could not find one. It seems like the issue is every read on the table that contains the articles (there are about 155,000 of them, and 20 or so get added every 15 minutes).
I added a few indexes but without luck...
Is it because the table is full-text indexed?
Should I order by the primary key instead of by date? I never had any problems with ordering by dates...
Should I use dynamic SQL ?
Should I add the primary key into the URL of the articles ?
Should I use multiple indexes for separate columns or one big index ?
If you want more details or code bits, just ask.
Basically, every little hint is much appreciated.
Thanks.
If your index is not being used, then it usually indicates one of two problems:
Non-sargable predicate conditions, such as WHERE DATEPART(YY, Column) = <something>. Wrapping columns in a function will impair or eliminate the optimizer's ability to effectively use an index.
Non-covered columns in the output list, which is very likely if you're in the habit of writing SELECT * instead of SELECT specific_columns. If the index doesn't cover your query, then SQL Server needs to perform a RID/key lookup for every row, one by one, which can slow down the query so much that the optimizer just decides to do a table scan instead.
See if one of these might apply to your situation; if you're still confused, I'd recommend updating the question with more information about your schema, the data, and the queries that are slow. 500 MB is very small for a SQL database, so this shouldn't be slow. Also post what's in the execution plan.
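As an example of the first point, here is a minimal sargable rewrite of the DATEPART predicate mentioned above (the column name is generic):
-- Non-sargable: the function wrapped around the column blocks an index seek.
-- WHERE DATEPART(YY, DateColumn) = 2009

-- Sargable equivalent: an open-ended range on the bare column,
-- which an index on DateColumn can seek into.
WHERE DateColumn >= '20090101'
  AND DateColumn <  '20100101'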
Use SQL Profiler to capture a lot of typical queries used in your app, then run the profiler results through the Index Tuning Wizard. That will tell you which indexes can be added to optimize.
Then look at the worst performing queries and analyze their execution plans manually.