We're having a problem we were hoping the good folks of Stack Overflow could help us with. We're running SQL Server 2008 R2 and are having problems with a query that takes a very long time to run on a moderate data set of about 100,000 rows. We're using CONTAINS to search through XML files and LIKE on another column to support leading wildcards.
We’ve reproduced the problem with the following small query that takes about 35 seconds to run:
SELECT something FROM table1
WHERE (CONTAINS(TextColumn, '"WhatEver"') OR
DescriptionColumn LIKE '%WhatEver%')
Query plan: (execution plan image not included)
If we modify the query above to use UNION instead, the running time drops from 35 seconds to under 1 second. We would like to avoid using this approach to solve the issue.
SELECT something FROM table1 WHERE CONTAINS(TextColumn, '"WhatEver"')
UNION
SELECT something FROM table1 WHERE DescriptionColumn LIKE '%WhatEver%'
Query plan: (execution plan image not included)
The column that we're searching with CONTAINS is of type image and contains XML files anywhere from 1 KB to 20 KB in size.
We have no good theories as to why the first query is so slow so we were hoping someone here would have something wise to say on the matter. The query plans don’t show anything out of the ordinary as far as we can tell. We've also rebuilt the indexes and statistics.
Is there anything blatantly obvious we’re overlooking here?
Thanks in advance for your time!
Why are you using DescriptionColumn LIKE '%WhatEver%' instead of CONTAINS(DescriptionColumn, '"WhatEver"')?
CONTAINS is obviously a full-text predicate and will use the SQL Server full-text engine to filter the search results. LIKE, however, is a "normal" SQL Server operator, so SQL Server will not use the full-text engine to assist with that part of the query. In this case, because the LIKE term begins with a wildcard, SQL Server will be unable to use any indexes to help with the query either, which will most likely result in a table scan and/or poorer performance than using the full-text engine.
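If DescriptionColumn can be covered by the full-text index as well, the whole filter can stay inside the full-text engine. A minimal sketch, assuming DescriptionColumn has been added to the same full-text index (note that CONTAINS supports prefix terms such as '"WhatEver*"' but not true leading wildcards, which may be why LIKE was chosen):
-- Sketch: both predicates served by the full-text engine,
-- assuming DescriptionColumn is part of the full-text index.
SELECT something FROM table1
WHERE (CONTAINS(TextColumn, '"WhatEver"') OR
CONTAINS(DescriptionColumn, '"WhatEver"'))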
It's difficult, if not impossible, to tell without an execution plan, but my guess at what's happening would be:
The UNION variation of the query is performing a table scan against table1 - the table scan is not fast, but because there are relatively few rows in the table it is not performing that slowly (compared to the 35-second benchmark).
In the OR variation of the query, SQL Server first uses the full-text engine to filter based on the CONTAINS predicate and then performs a RID lookup on each matching row to filter based on the LIKE predicate. However, for some reason SQL Server has massively underestimated the number of rows (this can happen with certain types of predicate) and so goes on to perform several thousand RID lookups, which ends up being incredibly slow (a table scan would have been much quicker).
To really understand what's going on you need to get a query plan.
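For example, you can capture the actual plan (including the estimated vs. actual row counts that would confirm the misestimate) straight from T-SQL:
-- Returns the actual execution plan as XML alongside the results,
-- including estimated vs. actual row counts per operator.
SET STATISTICS XML ON;
SELECT something FROM table1
WHERE (CONTAINS(TextColumn, '"WhatEver"') OR
DescriptionColumn LIKE '%WhatEver%');
SET STATISTICS XML OFF;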
Did you guys try this:
SELECT *
FROM table
WHERE CONTAINS((column1, column2, column3), '"*keyword*"')
Instead of this:
SELECT *
FROM table
WHERE CONTAINS(column1, '"*keyword*"')
OR CONTAINS(column2, '"*keyword*"')
OR CONTAINS(column3, '"*keyword*"')
The first one is a lot faster.
I just ran into this. This is reportedly a bug in SQL Server 2008 R2:
http://www.arcomit.co.uk/support/kb.aspx?kbid=000060
Your approach of using a UNION of two selects instead of an OR is the workaround they recommend in that article.
Related
Recently, while working in SQL Server, I came across something interesting: removing the DISTINCT keyword actually decreased my query performance and increased my search time.
I had read that DISTINCT can make queries slow, so I removed it to make them faster, but that made my query even slower.
Experimenting further, I found that when I add DISTINCT, SQL Server actually does a HASH MATCH (Flow Distinct) operation, and this reduces the time; parallelism is even added alongside the HASH MATCH.
My query looks like this:
SELECT DISTINCT TOP 5000
[A].[Row Id], [A].[Account], ... other columns
FROM
[Archival System].[dbo].[Activity] A
WHERE
([A].[Row Id] LIKE N'%{search_term}%'
OR [A].[Account] LIKE '%{search_term}%'
OR ... other conditions)
When I remove DISTINCT from this query, it becomes slower.
Here I have added TOP without ORDER BY, because a given search string is not going to repeat often, and if it does that would be rare; hence I am avoiding ORDER BY to gain a little more performance.
Below are the execution plans for both queries (before and after); the only change between them is the DISTINCT keyword.
Previous : https://www.brentozar.com/pastetheplan/?id=HyV7FKiU9
Later: https://www.brentozar.com/pastetheplan/?id=S1bFV9jUq
Can anybody tell me what this HASH MATCH (Flow Distinct) does, and why it only appears when I add DISTINCT?
And is it reliable to depend on? Now that I am using DISTINCT, will it continue to return results as quickly as it does now?
Or is there a better way to improve my query's search time?
I am using SQL Server 2012 Enterprise edition.
Thanks in advance.
I am not sure what I am missing, but I have a stored procedure that brings back the newest content in my database (via php), but it is really slow.
I have a View that brings in a specific kind of data (about 8000 records).
My stored procedure looks like this and takes about 9-11 seconds to complete. Any advice? Be kind, I am new to this :)
WITH maxdate AS (
SELECT id_cr, MAX(date_activation) AS LastReading
FROM [pwf].[dbo].[content_code_service_new_content]
GROUP BY id_cr
)
SELECT DISTINCT TOP 7 s.id_cr, s.date_activation, s.title, s.id_element
FROM [pwf].[dbo].[content_code_service] s
INNER JOIN maxdate t ON s.id_cr = t.id_cr AND s.date_activation = t.LastReading
WHERE (
id_service = @id_service
AND content_languages_list LIKE '%' + @id_language + '%'
) ORDER BY date_activation DESC
Okay, you're admitting you're kinda new to this, so after all of this you'll probably want to do some googling on how to performance-tune SQL queries.
But, here's a quick rundown that should help you get through this particular problem.
First up: "Include Actual Execution Plan". One of the most useful tools in MS SQL is the "Include Actual Execution Plan" option, which can be found in the Query menu. When this is checked, running the query will create a third tab alongside Results and Messages. It displays each operation the SQL engine had to perform, along with the percentage of the total cost each took. Usually this is enough to figure out what might be wrong (if 1 of your 12 steps took 95% of the time, that step is probably where the query is spending its time).
One of the most important things in this is looking at how it's actually reading the data from SQL - they're the right-most nodes in the little tree it constructs. There are a few possibilities:
Table Scan. This is usually bad - it means it's having to read the entire table to get what it wants.
Clustered Index Scan. This is also usually bad. Clustered Indexes are the table, and if it's Scanning it, it means it's looking through all the records.
Non-Clustered Index Scan. Not optimal, but not necessarily a problem. It's able to use an index to help out, but not enough that it can perform a binary search for what it's looking for (it has to scan the whole index.)
Index Seek (Clustered or Non-Clustered). This is what you're after. It's performing a binary search to get quickly to the specific data it's looking for.
So! How do you get Index Seeks? You make sure your table has indexes on the appropriate fields.
From a quick skimming of your query, here are the columns that SQL's having to look up:
content_code_service_new_content.id_cr
content_code_service_new_content.date_activation
content_code_service.id_cr
content_code_service.date_activation
content_code_service.id_service
content_code_service.content_languages_list
So right off the bat, I'd check those two tables and make sure those columns have indexes on them.
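As a starting point, something along these lines could cover those lookups - the index names and column order here are illustrative only, and the best composite order depends on your data:
-- Illustrative sketch: composite indexes matching the join and filter columns.
CREATE INDEX IX_new_content_idcr_activation
ON [pwf].[dbo].[content_code_service_new_content] (id_cr, date_activation);
CREATE INDEX IX_service_idcr_activation
ON [pwf].[dbo].[content_code_service] (id_cr, date_activation)
INCLUDE (id_service, title, id_element);
Note that no index will help the LIKE '%...%' predicate on content_languages_list, as discussed below.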
I don't know anything about your data, but I would guess that this bit is hurting your performance
AND content_languages_list LIKE '%' + #id_language + '%'
Searching with wildcards like that is always slow. For more info see https://www.brentozar.com/archive/2010/06/sargable-why-string-is-slow/
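To see the difference, compare the two forms below: a leading wildcard forces the engine to examine every row, while a prefix-only pattern can seek on an index over the column (assuming one exists):
-- Non-sargable: the leading wildcard defeats any index on the column.
SELECT id_cr FROM [pwf].[dbo].[content_code_service]
WHERE content_languages_list LIKE '%' + @id_language + '%';
-- Sargable: a prefix-only pattern can use an index seek.
SELECT id_cr FROM [pwf].[dbo].[content_code_service]
WHERE content_languages_list LIKE @id_language + '%';
If content_languages_list really holds a delimited list of languages, the longer-term fix is usually to normalise it into a separate mapping table so the predicate becomes an equality join.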
I'm currently designing a database application that executes SQL statements on a SQL Server linked to the PCs via ODBC drivers (SQL Native Client v10, local network, network latency >1 ms, executed from within an MS Access 2003 environment).
I'm dealing with a peculiar select query that is executed often and has to iterate through an indexed table with about 1.5 million entries. Currently the query structure is this:
SELECT *
FROM table1
WHERE field1 = value1
AND field2 = value2
AND textfield1 LIKE '* value3 *'
AND (field3 = value3 OR field4 = value4 OR field5 = value5)
ORDER BY indexedField1 DESC
(Simplified for reading comprehension and understandability, the real query can have up to 4 bracketed AND connected OR blocks, and up to a total of 31 AND connected statements).
Currently this query takes about 2 seconds every time it is executed, and it returns somewhere between 1,000 and 15,000 records in usual production. I'm looking for a way to make it execute faster or to restructure it in a way that makes it work faster.
Coworkers of mine have hinted that the LIKE operator might be inefficient and that restructuring the bracketed OR statements could bring additional performance.
Edit: additional relevant information: the table being pulled from is VERY active; a new row is inserted roughly every 1-5 minutes.
So the final questions are:
Is this version of the query the simplest I can get it?
Can I do something to otherwise speed up the query or its execution time?
General query optimisation is beyond the scope of a single answer, though there may be some help to be had on http://dba.stackexchange.com. However, you should learn how to read a query plan and figure out your bottlenecks before you start optimising.
(The way I'd do that would be to take a few typical queries and look at their estimated execution plan through a tool like SQL Server Management Studio. You may have to try to dig out the real SQL Server query that's resulted from your Access query, which your DBA might be able to help with. I'm assuming that your Access query is actually being translated into a SQL Server query and run natively on the server; if it's not then that will be your big problem!)
I'm going to assume you've indexed every column used in every predicate in your WHERE clause and still have a problem. If that's the case, the suspect is likely to be:
AND textfield1 LIKE '* value3 *'
Because that can't use an index. (It's not SARGable, because it has a wildcard at the beginning, so an index won't be any help.)
If you can't rearrange your search or pre-calculate this particular predicate, then you basically have the problem that Full-Text Searching was designed to solve by tokenising and pre-indexing the words in your text, and that will probably be the best solution.
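A minimal sketch of that setup, with hypothetical catalog and key-index names (and bearing in mind that CONTAINS does word matching, not arbitrary substring matching, so results can differ from the LIKE version):
-- Hypothetical names throughout; table1 needs an existing unique index
-- (PK_table1 here) to serve as the full-text key.
CREATE FULLTEXT CATALOG FTCatalog;
CREATE FULLTEXT INDEX ON table1 (textfield1)
KEY INDEX PK_table1
ON FTCatalog;
-- The non-sargable LIKE then becomes a full-text predicate:
SELECT * FROM table1 WHERE CONTAINS(textfield1, '"value3"');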
Consider the following queries.
select * from contact where firstname like '%some%'
select * from
(select * from contact) as t1
where firstname like '%some%'
The execution plans for both queries are the same and they execute in the same time. But I was expecting the second query to have a different plan and execute more slowly, since it has to select all the data from contact and then apply the filter. It looks like I was wrong.
I am wondering how this is happening?
Database Server : SQL server 2005
The "query optimiser" is what's happening. When you run a query, SQL Server uses a cost-based optimiser to identify what is likely to be the best way to fulfil that request (i.e. its execution plan). Think of it as a route map from place A to place B. There may be many different ways to get from A to B, and some will be quicker than others. SQL Server works out different routes to achieve the end goal of returning the data that satisfies the query and goes with one that has an acceptable cost. Note that it doesn't necessarily analyse EVERY possible way, as that would be unnecessarily expensive.
In your case, the optimiser has worked out that those 2 queries can be collapsed down to the same thing, hence you get the same plan.
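You can see this for yourself by comparing the textual plans; both statements should produce identical output:
-- Shows the chosen plan without executing the queries.
SET SHOWPLAN_TEXT ON;
GO
select * from contact where firstname like '%some%';
select * from (select * from contact) as t1 where firstname like '%some%';
GO
SET SHOWPLAN_TEXT OFF;
GO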
The query I'm writing runs fine when looking at the past few days; once I go over a week, it crawls (~20 min). I am joining 3 tables together, and I was wondering what things I should look for to make this run faster. I don't really know what other information is needed for the post.
EDIT: More info: db is Sybase 10. Query:
SELECT a.id, a.date, a.time, a.signal, a.noise,
b.signal_strength, b.base_id, b.firmware,
a.site, b.active, a.table_key_id
FROM adminuser.station AS a
JOIN adminuser.base AS b
ON a.id = b.base_id
WHERE a.site = 1234 AND a.date >= '2009-03-20'
I also took out the 3rd JOIN and it still runs extremely slow. Should I try another JOIN method?
I don't know Sybase 10 that well, but try running that query for, say, a 10-day period, and then 10 times, once for each day in the period, and compare the times. If the time in the first case is much higher, you've probably hit the database cache limits.
The solution then is to simply run queries for shorter periods in a loop (in the program, not SQL), as sketched below. It works especially well if table A is partitioned by date.
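The answer suggests looping in the client program, but the shape of the loop is the same in T-SQL. A sketch with illustrative dates (Sybase's T-SQL dialect also supports DATEADD and this assignment style):
-- Sketch: run the query one day at a time instead of one big range.
DECLARE @day datetime, @end datetime
SELECT @day = '2009-03-20', @end = '2009-03-30'
WHILE @day < @end
BEGIN
    SELECT a.id, a.date, a.time, a.signal, a.noise,
           b.signal_strength, b.base_id, b.firmware,
           a.site, b.active, a.table_key_id
    FROM adminuser.station AS a
    JOIN adminuser.base AS b ON a.id = b.base_id
    WHERE a.site = 1234
      AND a.date >= @day AND a.date < DATEADD(day, 1, @day)
    SELECT @day = DATEADD(day, 1, @day)
END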
You can get a lot of information (assuming you're using MSSQL here) by running your query in SQL Server Management Studio with the Include Actual Execution Plan option set (in the Query menu).
This will show you a diagram of the steps that SQL Server performs in order to execute the query - with relative costs against each step.
The next step is to rework the query a little (try doing it a different way) then run the new version and the old version at the same time. You will get two execution plans, with relative costs not only against each step, but against the two versions of the query! So you can tell objectively if you are making progress.
I do this all the time when debugging/optimizing queries.
Make sure you have indexes on the foreign keys.
It sounds more like you have a memory leak or aren't closing database connections in your client code than that there's anything wrong with the query.
[edit]
Nevermind: you mean querying over a date range rather than the duration the server has been active. I'll leave this up to help others avoid the same confusion.
Also, it would help if you could post the SQL query, even if you need to obfuscate it somewhat first, and it's a good bet to check whether there's an index on your date column and how many records the longer range returns.
You may want to look into using a PARTITION for the date ranges, if your DB supports it. I've heard this can help significantly.
Grab the book "Professional SQL Server 2005 Performance Tuning" - it's pretty great.
You didn't mention your database. If it's not SQL Server, the specifics of how to get the data might be different, but the advice is fundamentally the same.
Look at indexing, for sure, but the first thing to do is to follow Blorgbeard's advice and scan for execution plans using Management Studio (again, if you are running SQL Server).
What I'm guessing you'll see is that for small date ranges, the optimizer picks a reasonable query plan, but that when the date range is large, it picks something completely different, likely involving either table scans or index scans, and possibly joins that lead to very large temporary recordsets. The execution plan analyzer will reveal all of this.
A scan means that the optimizer thinks that grinding over the whole table or the whole index is cheaper for what you are trying to do than seeking specific values.
What you eventually want to do is get indexes and the syntax of your query set up such that you keep index seeks in the query plan for your query regardless of the date range, or, failing that, that the scans you require are filtered as well as you can manage to minimize temporary recordset size and thereby avoid excessive reads and I/O.
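For this query, that usually means a composite index matching the WHERE and JOIN columns, something like (names illustrative):
-- Illustrative: lets the site/date filter become an index seek
-- and supports the join on base_id.
CREATE INDEX ix_station_site_date ON adminuser.station (site, date)
CREATE INDEX ix_base_base_id ON adminuser.base (base_id)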
SELECT
a.id, a.date, a.time, a.signal, a.noise,a.site, b.active, a.table_key_id,
b.signal_strength, b.base_id, b.firmware
FROM
( SELECT * FROM adminuser.station
WHERE site = 1234 AND date >= '2009-03-20') AS a
JOIN
adminuser.base AS b
ON
a.id = b.base_id
I kind of rewrote the query so as to first filter the desired rows and then perform the join, rather than performing the join and then filtering the result.
Rather than pulling * from the sub-query, you can select just the columns you want, which might help a little (as sketched below).
Maybe this will be of some help in speeding things up.
While this is valid in MySQL, I am not sure of the Sybase syntax, though.
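For example, the sub-query could project only the columns the outer select actually uses:
SELECT a.id, a.date, a.time, a.signal, a.noise, a.site, a.table_key_id,
       b.signal_strength, b.base_id, b.firmware, b.active
FROM (SELECT id, date, time, signal, noise, site, table_key_id
      FROM adminuser.station
      WHERE site = 1234 AND date >= '2009-03-20') AS a
JOIN adminuser.base AS b ON a.id = b.base_id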