SQL Server: how to optimize "like" queries?

I have a query that searches for clients using "like" with wildcard. For example:
SELECT TOP (10)
[t0].[CLIENTNUMBER],
[t0].[FIRSTNAME],
[t0].[LASTNAME],
[t0].[MI],
[t0].[MDOCNUMBER]
FROM [dbo].[CLIENT] AS [t0]
WHERE (LTRIM(RTRIM([t0].[DOCREVNO])) = '0')
AND ([t0].[FIRSTNAME] LIKE '%John%')
AND ([t0].[LASTNAME] LIKE '%Smith%')
AND ([t0].[SSN] LIKE '%123%')
AND ([t0].[CLIENTNUMBER] LIKE '%123%')
AND ([t0].[MDOCNUMBER] LIKE '%123%')
AND ([t0].[CLIENTINDICATOR] = 'ON')
It can also use fewer parameters in the WHERE clause, for example:
SELECT TOP (10)
[t0].[CLIENTNUMBER],
[t0].[FIRSTNAME],
[t0].[LASTNAME],
[t0].[MI],
[t0].[MDOCNUMBER]
FROM [dbo].[CLIENT] AS [t0]
WHERE (LTRIM(RTRIM([t0].[DOCREVNO])) = '0')
AND ([t0].[FIRSTNAME] LIKE '%John%')
AND ([t0].[CLIENTINDICATOR] = 'ON')
Can anybody tell me the best way to optimize the performance of such a query? Maybe I need to create an index? This table can have up to 1000K records in production.

To do much for a LIKE where the pattern has the form '%XXX%', you want to look at SQL Server's full-text indexing capability, and use CONTAINS instead of LIKE. As-is, you're doing a full table scan, because a normal index won't help when the search pattern starts with a wildcard -- but a full-text index will.
/* ... */
WHERE (LTRIM(RTRIM([t0].[DOCREVNO])) = '0')
AND (contains([t0].[FIRSTNAME], 'John'))
AND (contains([t0].[LASTNAME], 'Smith'))
AND (contains([t0].[SSN], '123'))
AND (contains([t0].[CLIENTNUMBER],'123'))
AND (contains([t0].[MDOCNUMBER], '123'))
AND ([t0].[CLIENTINDICATOR] = 'ON')
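A full-text index has to exist before CONTAINS can be used. A minimal setup sketch, assuming a catalog name of your choosing and a unique key index on the table named PK_CLIENT (both names are placeholders for your own):
-- create a catalog and a full-text index covering the searched columns
CREATE FULLTEXT CATALOG ClientCatalog;
CREATE FULLTEXT INDEX ON [dbo].[CLIENT]
([FIRSTNAME], [LASTNAME], [SSN], [CLIENTNUMBER], [MDOCNUMBER])
KEY INDEX PK_CLIENT ON ClientCatalog;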

It looks like some databases (PostgreSQL 7.1+, MySQL v3.23.23+, Microsoft SQL v???, ...) already contain such features:
MySQL>> ALTER TABLE articles ADD FULLTEXT(body, title);
MySQL>> SELECT * FROM articles WHERE MATCH(title, body) AGAINST ('PHP')
MS-SQL>> SELECT ProductName FROM Products WHERE FREETEXT (ProductName, 'spread' )
PgSQL>> CREATE FUNCTION fti() RETURNS opaque AS '/path/to/fti.so' LANGUAGE 'C';
PgSQL>> CREATE TABLE articles_fti (string type, id oid);
....
Oracle..., Sybase...
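For Oracle, the rough equivalent is Oracle Text; a sketch in the same style, assuming the same articles table and a made-up index name:
Oracle>> CREATE INDEX articles_body_ctx ON articles(body) INDEXTYPE IS CTXSYS.CONTEXT;
Oracle>> SELECT * FROM articles WHERE CONTAINS(body, 'PHP') > 0;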

Related

Performance comparison: SQL query using in and like

Which query runs faster?
Considering that it returns 200 records, authorname is at least 20 characters long, and authorname is full-text indexed:
select * from quotestable
where quotesauthor like (select Authorname from Authortable where authorid =45)
versus:
select * from quotestable
where quotesauthor in (select Authorname from Authortable where authorid =45)
It's not a question of "faster". They have different meanings.
The first query can only run if the subquery returns 0 or 1 records (and should normally use TOP 1 to guarantee this). However, it can do wildcard matching on the results. The second query can run if the subquery returns any number of records, but will not do wildcard matching.
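For example, a sketch of the first form with the TOP 1 safeguard (assuming the wildcard semantics of LIKE are actually what you want):
select * from quotestable
where quotesauthor like (select TOP 1 Authorname from Authortable where authorid = 45)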
It sounds like what you should really have here is a JOIN:
SELECT q.*
FROM quotestable q
INNER JOIN AuthorTable a ON q.quotesauthor = a.authorname
WHERE a.authorid = 45
... assuming of course that AuthorID or AuthorName is unique in AuthorTable. This will also allow you to use LIKE with wildcards for the matching condition, in the case where the quotesauthor field might not always be a direct match with AuthorTable.AuthorName.
While I'm here, it's also strange to me that AuthorName would be full-text indexed. A traditional index, rather than fulltext, would be more helpful for this query. The only reason to use fulltext here is if you have full names like 'John Milton' in that field, and want to be able to do things like search on last name only or first name only. But even in that case, it seems like you'd be much better served by storing those as their own fields and removing the fulltext index. Fulltext indexes work best on longer fields, like descriptions or articles/posts.
They're not the same. One tests equality, the other tests a LIKE expression...
Try this for equality:
select * from quotestable f1
where exists
(
select * from Authortable f2
where f2.authorid =45 and f1.quotesauthor =f2.Authorname
)
Try this for LIKE:
select * from quotestable f1
where exists
(
select * from Authortable f2
where f2.authorid =45 and f1.quotesauthor like '%' + f2.Authorname + '%'
)

SQL Server Full-Text-Search FREETEXTTABLE search multiple columns

I'm using the below query to return results from a table using Full-Text-Search.
In SQL 2000 it was only possible to search one column or all columns in a table. Is it possible to search multiple specific columns in SQL 2008?
I would like to search two columns, Problem and Solution (both indexed and in the same table):
DECLARE @topRank int
SET @topRank = (SELECT MAX(RANK)
FROM FREETEXTTABLE([Support_Calls], Problem, 'test', 1))
SELECT [ID] AS [Call No], Company_Name, Problem, Solution, CONVERT(VARCHAR(20), CAST((CAST(ftt.RANK AS DECIMAL)/@topRank * 100) AS DECIMAL(13,0))) + '%' AS Match
FROM [Support_Calls] INNER JOIN FREETEXTTABLE([Support_Calls], Problem, 'test') AS ftt ON ftt.[KEY]=[ID] ORDER BY ftt.RANK DESC;
From what I can see the FREETEXTTABLE does not accept more than one column?
You specify them in parentheses; FREETEXTTABLE(tablename, (col1,col2,col3), 'expr') or use an asterisk to search all columns in the index.
From MSDN,
Returns a table of zero, one, or more rows for those columns containing character-based data types for values that match the meaning, but not the exact wording, of the text in the specified freetext_string. FREETEXTTABLE can only be referenced in the FROM clause of a SELECT statement like a regular table name.
Queries using FREETEXTTABLE specify freetext-type full-text queries that return a relevance ranking value (RANK) and full-text key (KEY) for each row.
They give the following syntax:
FREETEXTTABLE (table , { column_name | (column_list) | * }
,'freetext_string'
[ , LANGUAGE language_term ]
[ ,top_n_by_rank ] )
So yes, what Alex K. said as well.
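Applied to the query in the question, a sketch (assuming both Problem and Solution are covered by the full-text index):
DECLARE @topRank int
SET @topRank = (SELECT MAX(RANK)
FROM FREETEXTTABLE([Support_Calls], (Problem, Solution), 'test', 1))
SELECT [ID] AS [Call No], Company_Name, Problem, Solution, CONVERT(VARCHAR(20), CAST((CAST(ftt.RANK AS DECIMAL)/@topRank * 100) AS DECIMAL(13,0))) + '%' AS Match
FROM [Support_Calls] INNER JOIN FREETEXTTABLE([Support_Calls], (Problem, Solution), 'test') AS ftt ON ftt.[KEY]=[ID] ORDER BY ftt.RANK DESC;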
If you created a FULLTEXT INDEX on different columns, then you can simply use CONTAINS or FREETEXT to look at one of them, all of them, or some of them. Like this:
SELECT *
FROM YourTable
WHERE CONTAINS(*, @SearchTerm);
if you want to look at all the columns that are included in the FULLTEXT INDEX, or:
SELECT *
FROM YourTable
WHERE CONTAINS((ProductName, ProductNumber, Color), @SearchTerm);
if you want to specify the columns that you want to search. If you need the results in one column, you are going to have to do a UNION and do a search for every column you want searched.
SELECT *
FROM YourTable
WHERE CONTAINS(ProductName, @SearchTerm)
UNION
SELECT *
FROM YourTable
WHERE CONTAINS(ProductNumber, @SearchTerm)
UNION
SELECT *
FROM YourTable
WHERE CONTAINS(Color, @SearchTerm)

How to search multiple columns in MySQL?

I'm trying to make a search feature that will search multiple columns to find a keyword based match. This query:
SELECT title FROM pages LIKE %$query%;
works only for searching one column. I noticed that separating column names with commas results in an error. So is it possible to search multiple columns in MySQL?
If it is just for searching then you may be able to use CONCAT_WS.
This would allow wild card searching.
There may be performance issues depending on the size of the table.
SELECT *
FROM pages
WHERE CONCAT_WS('', column1, column2, column3) LIKE '%keyword%'
You can use the AND or OR operators, depending on what you want the search to return.
SELECT title FROM pages WHERE my_col LIKE '%$param1%' AND another_col LIKE '%$param2%';
Both clauses have to match for a record to be returned. Alternatively:
SELECT title FROM pages WHERE my_col LIKE '%$param1%' OR another_col LIKE '%$param2%';
If either clause matches then the record will be returned.
For more about what you can do with MySQL SELECT queries, try the documentation.
If your table is MyISAM:
SELECT *
FROM pages
WHERE MATCH(title, content) AGAINST ('keyword' IN BOOLEAN MODE)
This will be much faster if you create a FULLTEXT index on your columns:
CREATE FULLTEXT INDEX fx_pages_title_content ON pages (title, content)
but it will work even without the index.
1)
select *
from employee em
where CONCAT(em.firstname, ' ', em.lastname) like '%parth pa%';
2)
select *
from employee em
where CONCAT_ws('-', em.firstname, em.lastname) like '%parth-pa%';
The first is useful when we have data like 'firstname lastname'.
e.g.
parth patel
parth p
patel parth
The second is useful when we have data like 'firstname-lastname'. With it you can also use special characters.
e.g.
parth-patel
parth_p
patel#parth
Here is a query which you can use to search your database across several columns for a search term:
SELECT * FROM tbl_customer
WHERE CustomerName LIKE '%".$search."%'
OR Address LIKE '%".$search."%'
OR City LIKE '%".$search."%'
OR PostalCode LIKE '%".$search."%'
OR Country LIKE '%".$search."%'
Using this code will help you search multiple columns easily:
SELECT * FROM persons WHERE (`LastName` LIKE 'r%') OR (`FirstName` LIKE 'a%');
Please try the above query.

How to combine IN operator with LIKE condition (or best way to get comparable results)

I need to select rows where a field begins with one of several different prefixes:
select * from table
where field like 'ab%'
or field like 'cd%'
or field like "ef%"
or...
What is the best way to do this using SQL in Oracle or SQL Server? I'm looking for something like the following statements (which are incorrect):
select * from table where field like in ('ab%', 'cd%', 'ef%', ...)
or
select * from table where field like in (select foo from bar)
EDIT:
I would like to see how this is done either by giving all the prefixes in one SELECT statement, or by having all the prefixes stored in a helper table.
Length of the prefixes is not fixed.
Joining your prefix table with your actual table would work in both SQL Server & Oracle.
DECLARE @Table TABLE (field VARCHAR(32))
DECLARE @Prefixes TABLE (prefix VARCHAR(32))
INSERT INTO @Table VALUES ('ABC')
INSERT INTO @Table VALUES ('DEF')
INSERT INTO @Table VALUES ('ABDEF')
INSERT INTO @Table VALUES ('DEFAB')
INSERT INTO @Table VALUES ('EFABD')
INSERT INTO @Prefixes VALUES ('AB%')
INSERT INTO @Prefixes VALUES ('DE%')
SELECT t.*
FROM @Table t
INNER JOIN @Prefixes pf ON t.field LIKE pf.prefix
You can try a regular expression (Oracle):
SELECT * from table where REGEXP_LIKE ( field, '^(ab|cd|ef)' );
If your prefix is always two characters, could you not just use the SUBSTRING() function to get the first two characters of "field", and then see if it's in the list of prefixes?
select * from table
where SUBSTRING(field, 1, 2) IN (prefix1, prefix2, prefix3...)
That would be "best" in terms of simplicity, if not performance. Performance-wise, you could create an indexed virtual column that generates your prefix from "field", and then use the virtual column in your predicate.
Depending on the size of the dataset, the REGEXP solution may or may not be the right answer. If you're trying to get a small slice of a big dataset,
select * from table
where field like 'ab%'
or field like 'cd%'
or field like "ef%"
or...
may be rewritten behind the scenes as
select * from table
where field like 'ab%'
union all
select * from table
where field like 'cd%'
union all
select * from table
where field like 'ef%'
Doing three index scans instead of a full scan.
If you know you're only going after the first two characters, creating a function-based index could be a good solution as well. If you really really need to optimize this, use a global temporary table to store the values of interest, and perform a semi-join between them:
select * from data_table
where transform(field) in (select pre_transformed_field
from my_where_clause_table);
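For the function-based index route mentioned above, a sketch in Oracle syntax (the names are assumptions; this only helps if the predicate uses exactly the same expression as the index):
CREATE INDEX ix_table_field_prefix ON my_table (SUBSTR(field, 1, 2));
select * from my_table
where SUBSTR(field, 1, 2) IN ('ab', 'cd', 'ef');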
You can also try it like this; here tmp is a derived table populated with the required prefixes. It's a simple way, and it does the job.
select * from emp join
(select 'ab%' as Prefix
union
select 'cd%' as Prefix
union
select 'ef%' as Prefix) tmp
on emp.Name like tmp.Prefix

Alternative SQL ways of looking up multiple items of known IDs?

Is there a better solution to the problem of looking up multiple known IDs in a table:
SELECT * FROM some_table WHERE id='1001' OR id='2002' OR id='3003' OR ...
I can have several hundred known items. Ideas?
SELECT * FROM some_table WHERE ID IN ('1001', '1002', '1003')
and if your known IDs are coming from another table
SELECT * FROM some_table WHERE ID IN (
SELECT KnownID FROM some_other_table WHERE someCondition
)
The first (naive) option:
SELECT * FROM some_table WHERE id IN ('1001', '2002', '3003' ... )
However, we should be able to do better. IN is very bad when you have a lot of items, and you mentioned hundreds of these ids. What creates them? Where do they come from? Can you write a query that returns this list? If so:
SELECT *
FROM some_table
INNER JOIN ( your query here) filter ON some_table.id=filter.id
See Arrays and Lists in SQL Server 2005
ORs are notoriously slow in SQL.
Your question is short on specifics, but depending on your requirements and constraints I would build a look-up table with your IDs and use the EXISTS predicate:
select t.id from some_table t
where EXISTS (select * from lookup_table l where t.id = l.id)
For a fixed set of IDs you can do:
SELECT * FROM some_table WHERE id IN (1001, 2002, 3003);
For a set that changes each time, you might want to create a table to hold them and then query:
SELECT * FROM some_table WHERE id IN
(SELECT id FROM selected_ids WHERE key=123);
Another approach is to use collections - the syntax for this will depend on your DBMS.
Finally, there is always this "kludgy" approach:
SELECT * FROM some_table WHERE '|1001|2002|3003|' LIKE '%|' || id || '|%';
In Oracle, I always put the IDs into a TEMPORARY TABLE to perform massive SELECTs and DML operations:
CREATE GLOBAL TEMPORARY TABLE t_temp (id INT)
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT id
FROM t_temp
)
You can fill the temporary table in a single client-server roundtrip using Oracle collection types.
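A server-side sketch of the collection approach, using the built-in SYS.ODCINUMBERLIST type so no custom type needs to be created (in practice the collection would be bound from the client rather than written as literals):
SELECT *
FROM mytable
WHERE mytable.id IN
(
SELECT COLUMN_VALUE
FROM TABLE(SYS.ODCINUMBERLIST(1001, 2002, 3003))
)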
We have a similar issue in an application written for MS SQL Server 7. Although I dislike the solution used, we're not aware of anything better...
'Better' solutions exist in 2008 as far as I know, but we have Zero clients using that :)
We created a table-valued user-defined function that takes a comma-delimited string of IDs and returns a table of IDs. The SQL then reads reasonably well, and none of it is dynamic, but there is still the annoying double overhead:
1. Client concatenates the IDs into the string
2. SQL Server parses the string to create a table of IDs
There are lots of ways of turning '1,2,3,4,5' into a table of IDs, but the Stored Procedure which uses the function ends up looking like...
CREATE PROCEDURE my_road_to_hell @IDs AS VARCHAR(8000)
AS
BEGIN
SELECT
*
FROM
myTable
INNER JOIN
dbo.fn_split_list(@IDs) AS [IDs]
ON [IDs].id = myTable.id
END
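For reference, a minimal sketch of one way such a split function could be written (a loop-based table-valued function; on SQL Server 2016+ the built-in STRING_SPLIT covers the same ground):
CREATE FUNCTION dbo.fn_split_list (@IDs VARCHAR(8000))
RETURNS @result TABLE (id INT)
AS
BEGIN
    -- walk the comma-delimited string, peeling off one ID at a time
    DECLARE @pos INT
    SET @pos = CHARINDEX(',', @IDs)
    WHILE @pos > 0
    BEGIN
        INSERT INTO @result (id) VALUES (CAST(LEFT(@IDs, @pos - 1) AS INT))
        SET @IDs = SUBSTRING(@IDs, @pos + 1, 8000)
        SET @pos = CHARINDEX(',', @IDs)
    END
    -- whatever remains after the last comma is the final ID
    IF LEN(@IDs) > 0
        INSERT INTO @result (id) VALUES (CAST(@IDs AS INT))
    RETURN
END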
The fastest approach is to put the IDs in another table and JOIN:
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id
where some_other_table would have just one field (ids) and all values would be unique
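A sketch of what that lookup table could look like (the table name and seed values are made up for illustration):
CREATE TABLE some_other_table (id INT NOT NULL PRIMARY KEY);
INSERT INTO some_other_table (id) VALUES (1001);
INSERT INTO some_other_table (id) VALUES (2002);
INSERT INTO some_other_table (id) VALUES (3003);
SELECT some_table.*
FROM some_table INNER JOIN some_other_table ON some_table.id = some_other_table.id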