Optimize SQL Query containing Like Operator - sql

Trying to make the following query neat and run faster. Any insights are helpful. I have to use like Operator since need to search for the pattern anywhere in the field.
Select Col1,Col2,Col3 from TableName where
(Subject like '%Maths%' OR
Subject like '%Physics%' OR
Subject like '%Chemistry%' OR
Subject like '%English%')
AND
(Description like '%Maths%' OR
Description like '%Physics%' OR
Description like '%Chemistry%' OR
DESCRIPTION like '%English%')
AND
(Extra like '%Maths%' OR
Extra like '%Physics%' OR
Extra like '%Chemistry%' OR
Extra like '%English%') AND Created Date > 2017-01-01

Basically, you can't optimize this query using basic SQL. If the strings you are searching for are words in the texts, then you can use a full text string. The place to start in learning about this is the documentation.
If you happen to know that you will be searching for these four strings, you can set up computed columns and then build indexes on the computed columns. That would be fast. But you would be limited to exactly those strings.
All is not lost. Technically, there are other solutions, such as those based on n-grams or by converting to XML/JSON and indexing that. However, these are either not supported in SQL Server or non-trivial to implement.

You can try this using CTE and charindex
compare execution plan from your OP.
;with mycte as ( --- building words to look for
select * from
(values ('English'),
('Math'),
('Physics'),
('English'),
('Chemistry')
) t (words)
)
select t.* from
tablename t
inner join mycte c
on charindex(c.words,t.subject) > 0 --- check if there is a match
and charindex(c.words,t.description) > 0 --- check if there is a match
and charindex(c.words,t.extra) > 0 --- check if there is a match
where
t.createddate > '2017-01-01'

Related

SQL full text search behavior on numeric values

I have a table with about 200 million records. One of the columns is defined as varchar(100) and it's included in a full text index. Most of the values are numeric. Only few are not numeric.
The problem is that it's not working well. For example if a row contains the value '123456789' and i look for '567', it's not returning this row. It will only return rows where the value is exactly '567'.
What am I doing wrong?
sql server 2012.
Thanks.
Full text search doesn't support leading wildcards
In my setup, these return the same
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'28400')
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"2840*"')
This gives zero rows
SELECT *
FROM [dbo].[somelogtable]
where CONTAINS (logmessage, N'"*840*"')
You'll have to use LIKE or some fancy trigram approach
The problem is probably that you are using a wrong tool since Full-text queries perform linguistic searches and it seems like you want to use simple "like" condition.
If you want to get a solution to your needs then you can post DDL+DML+'desired result'
You can do this:
....your_query.... LIKE '567%' ;
This will return all the rows that have a number 567 in the beginning, end or in between somewhere.
99% You're missing % after and before the string you search in the LIKE clause.
es:
SELECT * FROM t WHERE att LIKE '66'
is the same as as using WHERE att = '66'
if you write:
SELECT * FROM t WHERE att LIKE '%66%'
will return you all the lines containing 2 'sixes' one after other

SQL FTS and comparison statements

Short story. I am working on a project where I need to communicate with SQLite database. And there I have several problems:
There is one FTS table with nodeId and nodeName columns. I need to select all nodeIds for which nodeNames contains some text pattern. For instance all node names with "Donald" inside. Something similar was discussed in this thread. The point is that I can't use CONTAINS keyword. Instead I use MATCH. And here is the question itself: how should this "Donald" string be "framed"? With '*' or with '%' character? Here is my query:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH "Donald"
Is it OK to write multiple comparison in SELECT statement? I mean something like this:
SELECT * FROM distanceTable WHERE pointId = 1 OR pointId = 4 OR pointId = 203 AND distance<200
I hope that it does not sound very confusing. Thank you in advance!
Edit: Sorry, I missed the fact that you are using FTS4. It looks like you can just do this:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH 'Donald'
Here is relevant documentation.
No wildcard characters are needed in order to match all entries in which Donald is a discrete word (e.g. the above will match Donald Duck). If you want to match Donald as part of a word (e.g. Donalds) then you need to use * in the appropriate place:
SELECT * FROM nodeFtsTable WHERE nodeName MATCH 'Donald*'
If your query wasn't working, it was probably because you used double quotes.
From the SQLite documentation:
The MATCH operator is a special syntax for the match()
application-defined function. The default match() function
implementation raises an exception and is not really useful for
anything. But extensions can override the match() function with more
helpful logic.
FTS4 is an extension that provides a match() function.
Yes, it is ok to use multiple conditions as in your second query. When you have a complex set of conditions, it is important to understand the order in which the conditions will be evaluated. AND is always evaluated before OR (they are analagous to mathematical multiplication and addition, respectively). In practice, I think it is always best to use parentheses for clarity when using a combination of AND and OR:
--This is the same as with no parentheses, but is clearer:
SELECT * FROM distanceTable WHERE
pointId = 1 OR
pointId = 4 OR
(pointId = 203 AND distance<200)
--This is something completely different:
SELECT * FROM distanceTable WHERE
(pointId = 1 OR pointId = 4 OR pointId = 203) AND
distance<200

Count and Order By Where clause Matches

I'm writing some very simple search functionality for a list of FAQ's. I'm splitting the search string on a variety of characters, including spaces. Then performing a select along the lines of
SELECT *
FROM "faq"
WHERE
((LOWER("Question") LIKE '%what%'
OR LOWER("Question") LIKE '%is%'
OR LOWER("Question") LIKE '%a%'
OR LOWER("Question") LIKE '%duck%'))
I've had to edit this slightly as its generated by our data access layer but it should give you an idea of whats going on.
The problem is demonstrated well with the above query in that most questions are likely to have the words a or is in them, however I can not filter these out as acronyms may well be important for the searcher. What has been suggested is we order by the number of matching keywords. However I have been unable to find a way of doing this in SQL (we do not have time to create a simple search engine with an index of keywords etc). Does anyone know if there's a way of counting the number of LIKE matches in an SQL statement and ordering by that so that the questions with the most keywords appear at the top of the results?
I assume the list of matching keywords is being entered by the user and inserted into the query dynamically by the application, immediately prior to executing the query. If so, I suggest amending the query like so:
SELECT *
FROM "faq"
WHERE
((LOWER("Question") LIKE '%what%'
OR LOWER("Question") LIKE '%is%'
OR LOWER("Question") LIKE '%a%'
OR LOWER("Question") LIKE '%duck%'))
order by
case when LOWER("Question") LIKE '%what%' then 1 else 0 end +
case when LOWER("Question") LIKE '%is%' then 1 else 0 end +
case when LOWER("Question") LIKE '%a%' then 1 else 0 end +
case when LOWER("Question") LIKE '%duck%' then 1 else 0 end
descending;
This would even enable you to "weight" the importance of each selection term, assuming the user (or an algorithm) could assign a weighting to each term.
One caveat: if your query is being constructed dynamically, are you aware of the risk of SQL Insertion attacks?
You can write a function which counts the occurrences of one string in another like this:
CREATE OR REPLACE FUNCTION CountInString(text,text)
RETURNS integer AS $$
SELECT(Length($1) - Length(REPLACE($1, $2, ''))) / Length($2) ;
$$ LANGUAGE SQL IMMUTABLE;
And use it in the select: select CountInString("Question",' what ') from "faq".

Searching a column containing CSV data in a MySQL table for existence of input values

I have a table say, ITEM, in MySQL that stores data as follows:
ID FEATURES
--------------------
1 AB,CD,EF,XY
2 PQ,AC,A3,B3
3 AB,CDE
4 AB1,BC3
--------------------
As an input, I will get a CSV string, something like "AB,PQ". I want to get the records that contain AB or PQ. I realized that we've to write a MySQL function to achieve this. So, if we have this magical function MATCH_ANY defined in MySQL that does this, I would then simply execute an SQL as follows:
select * from ITEM where MATCH_ANY(FEAURES, "AB,PQ") = 0
The above query would return the records 1, 2 and 3.
But I'm running into all sorts of problems while implementing this function as I realized that MySQL doesn't support arrays and there's no simple way to split strings based on a delimiter.
Remodeling the table is the last option for me as it involves lot of issues.
I might also want to execute queries containing multiple MATCH_ANY functions such as:
select * from ITEM where MATCH_ANY(FEATURES, "AB,PQ") = 0 and MATCH_ANY(FEATURES, "CDE")
In the above case, we would get an intersection of records (1, 2, 3) and (3) which would be just 3.
Any help is deeply appreciated.
Thanks
First of all, the database should of course not contain comma separated values, but you are hopefully aware of this already. If the table was normalised, you could easily get the items using a query like:
select distinct i.Itemid
from Item i
inner join ItemFeature f on f.ItemId = i.ItemId
where f.Feature in ('AB', 'PQ')
You can match the strings in the comma separated values, but it's not very efficient:
select Id
from Item
where
instr(concat(',', Features, ','), ',AB,') <> 0 or
instr(concat(',', Features, ','), ',PQ,') <> 0
For all you REGEXP lovers out there, I thought I would add this as a solution:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]]AB|PQ[[:>:]]';
and for case sensitivity:
SELECT * FROM ITEM WHERE FEATURES REGEXP BINARY '[[:<:]]AB|PQ[[:>:]]';
For the second query:
SELECT * FROM ITEM WHERE FEATURES REGEXP '[[:<:]]AB|PQ[[:>:]]' AND FEATURES REGEXP '[[:<:]]CDE[[:>:]];
Cheers!
select *
from ITEM where
where CONCAT(',',FEAURES,',') LIKE '%,AB,%'
or CONCAT(',',FEAURES,',') LIKE '%,PQ,%'
or create a custom function to do your MATCH_ANY
Alternatively, consider using RLIKE()
select *
from ITEM
where ','+FEATURES+',' RLIKE ',AB,|,PQ,';
Just a thought:
Does it have to be done in SQL? This is the kind of thing you might normally expect to write in PHP or Python or whatever language you're using to interface with the database.
This approach means you can build your query string using whatever complex logic you need and then just submit a vanilla SQL query, rather than trying to build a procedure in SQL.
Ben

Regular expressions inside SQL Server

I have stored values in my database that look like 5XXXXXX, where X can be any digit. In other words, I need to match incoming SQL query strings like 5349878.
Does anyone have an idea how to do it?
I have different cases like XXXX7XX for example, so it has to be generic. I don't care about representing the pattern in a different way inside the SQL Server.
I'm working with c# in .NET.
You can write queries like this in SQL Server:
--each [0-9] matches a single digit, this would match 5xx
SELECT * FROM YourTable WHERE SomeField LIKE '5[0-9][0-9]'
stored value in DB is: 5XXXXXX [where x can be any digit]
You don't mention data types - if numeric, you'll likely have to use CAST/CONVERT to change the data type to [n]varchar.
Use:
WHERE CHARINDEX(column, '5') = 1
AND CHARINDEX(column, '.') = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
CHARINDEX
ISNUMERIC
i have also different cases like XXXX7XX for example, so it has to be generic.
Use:
WHERE PATINDEX('%7%', column) = 5
AND CHARINDEX(column, '.') = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
PATINDEX
Regex Support
SQL Server 2000+ supports regex, but the catch is you have to create the UDF function in CLR before you have the ability. There are numerous articles providing example code if you google them. Once you have that in place, you can use:
5\d{6} for your first example
\d{4}7\d{2} for your second example
For more info on regular expressions, I highly recommend this website.
Try this
select * from mytable
where p1 not like '%[^0-9]%' and substring(p1,1,1)='5'
Of course, you'll need to adjust the substring value, but the rest should work...
In order to match a digit, you can use [0-9].
So you could use 5[0-9][0-9][0-9][0-9][0-9][0-9] and [0-9][0-9][0-9][0-9]7[0-9][0-9][0-9]. I do this a lot for zip codes.
SQL Wildcards are enough for this purpose. Follow this link: http://www.w3schools.com/SQL/sql_wildcards.asp
you need to use a query like this:
select * from mytable where msisdn like '%7%'
or
select * from mytable where msisdn like '56655%'