Performance comparison: SQL query using in and like - sql

Which query runs faster?
Assume it returns 200 records, authorname is at least 20 characters long, and authorname is full-text indexed.
select * from quotestable
where quotesauthor like (select Authorname from Authortable where authorid =45)
.
select * from quotestable
where quotesauthor in (select Authorname from Authortable where authorid =45)

It's not a question of "faster". They have different meanings.
The first query can only run if the subquery returns 0 or 1 records (and should normally use TOP 1 to guarantee this). However, it can do wildcard matching on the results. The second query can run if the subquery returns any number of records, but will not do wildcard matching.
It sounds like what you should really have here is a JOIN:
SELECT q.*
FROM quotestable q
INNER JOIN AuthorTable a ON q.quotesauthor = a.authorname
WHERE a.authorid = 45
... assuming of course that AuthorID or AuthorName is unique in AuthorTable. This will also allow you to use LIKE with wildcards for the matching condition, in the case where the quotesauthor field might not always be a direct match with AuthorTable.AuthorName.
While I'm here, it's also strange to me that AuthorName would be full-text indexed. A traditional index, rather than fulltext, would be more helpful for this query. The only reason to use fulltext here is if you have full names like 'John Milton' in that field, and want to be able to do things like search on last name only or first name only. But even in that case, it seems like you'd be much better served by storing those as their own fields and removing the fulltext index. Fulltext indexes work best on longer fields, like descriptions or articles/posts.

They are not the same: one tests equality, the other tests a LIKE expression.
Try this for equality:
select * from quotestable f1
where exists
(
select * from Authortable f2
where f2.authorid =45 and f1.quotesauthor =f2.Authorname
)
Try this for LIKE:
select * from quotestable f1
where exists
(
select * from Authortable f2
where f2.authorid =45 and f1.quotesauthor like '%' + f2.Authorname + '%'
)
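To make the difference between the two EXISTS queries concrete, here is a minimal sketch that runs both against made-up rows. It uses Python's built-in sqlite3 module as a stand-in engine, so string concatenation is written `||` instead of T-SQL's `+`; the sample rows are assumptions for illustration only.

```python
import sqlite3

# In-memory SQLite database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authortable (authorid INTEGER PRIMARY KEY, Authorname TEXT);
    CREATE TABLE quotestable (id INTEGER PRIMARY KEY, quotesauthor TEXT);
    INSERT INTO Authortable VALUES (45, 'John Milton'), (46, 'William Blake');
    INSERT INTO quotestable VALUES
        (1, 'John Milton'),
        (2, 'John Milton (1608-1674)'),
        (3, 'William Blake');
""")

# Equality: quotesauthor must be exactly the author's name.
equality_sql = """
    SELECT f1.id FROM quotestable f1
    WHERE EXISTS (SELECT 1 FROM Authortable f2
                  WHERE f2.authorid = 45 AND f1.quotesauthor = f2.Authorname)
    ORDER BY f1.id
"""

# LIKE with wildcards: quotesauthor only has to contain the author's name.
like_sql = """
    SELECT f1.id FROM quotestable f1
    WHERE EXISTS (SELECT 1 FROM Authortable f2
                  WHERE f2.authorid = 45
                    AND f1.quotesauthor LIKE '%' || f2.Authorname || '%')
    ORDER BY f1.id
"""

equality_ids = [row[0] for row in conn.execute(equality_sql)]
like_ids = [row[0] for row in conn.execute(like_sql)]
print("equality:", equality_ids)
print("like:", like_ids)
```

With these rows the equality version matches only the exact name, while the LIKE version also picks up the row where the name is embedded in a longer string.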

Related

Ensuring two columns only contain valid results from same subquery

I have the following table:
id symbol_01 symbol_02
1 abc xyz
2 kjh okd
3 que qid
I need a query that ensures symbol_01 and symbol_02 are both contained in a list of valid symbols. In other words, I would need something like this:
select *
from mytable
where symbol_01 in (
select valid_symbols
from somewhere)
and symbol_02 in (
select valid_symbols
from somewhere)
The above example would work correctly, but the subquery used to determine the list of valid symbols is identical both times and is quite large. It would be very inefficient to run it twice like in the example.
Is there a way to do this without duplicating two identical sub queries?
Another approach:
select *
from mytable t1
where 2 = (select count(distinct symbol)
from valid_symbols vs
where vs.symbol in (t1.symbol_01, t1.symbol_02));
This assumes that the valid symbols are stored in a table valid_symbols that has a column named symbol. The query would also benefit from an index on valid_symbols.symbol
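One thing worth checking with the count(distinct ...) form is the case where symbol_01 and symbol_02 hold the same valid symbol: the distinct count is then 1, not 2, so the row is filtered out. The sketch below compares it with the two-subquery form from the question, using Python's sqlite3 module and hypothetical rows (row 3 is added here to show the edge case; it is not in the original data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE valid_symbols (symbol TEXT PRIMARY KEY);
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, symbol_01 TEXT, symbol_02 TEXT);
    INSERT INTO valid_symbols VALUES ('abc'), ('xyz'), ('kjh');
    INSERT INTO mytable VALUES
        (1, 'abc', 'xyz'),   -- both valid, different symbols
        (2, 'kjh', 'okd'),   -- second symbol is not valid
        (3, 'abc', 'abc');   -- both valid, same symbol (hypothetical edge case)
""")

# count(distinct ...) form from the answer above.
count_sql = """
    SELECT t1.id FROM mytable t1
    WHERE 2 = (SELECT COUNT(DISTINCT vs.symbol)
               FROM valid_symbols vs
               WHERE vs.symbol IN (t1.symbol_01, t1.symbol_02))
    ORDER BY t1.id
"""

# Two-subquery form from the question.
in_sql = """
    SELECT id FROM mytable
    WHERE symbol_01 IN (SELECT symbol FROM valid_symbols)
      AND symbol_02 IN (SELECT symbol FROM valid_symbols)
    ORDER BY id
"""

count_ids = [row[0] for row in conn.execute(count_sql)]
in_ids = [row[0] for row in conn.execute(in_sql)]
print("count(distinct):", count_ids)
print("two IN subqueries:", in_ids)
```

If the two columns can never hold the same value, both forms return the same rows; otherwise the count(distinct ...) form drops the duplicated-symbol row.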
You could try using a CTE like this:
WITH ValidSymbols AS (
SELECT DISTINCT valid_symbol
FROM somewhere
)
SELECT mt.*
FROM MyTable mt
INNER JOIN ValidSymbols v1
ON mt.symbol_01 = v1.valid_symbol
INNER JOIN ValidSymbols v2
ON mt.symbol_02 = v2.valid_symbol
From a performance perspective, your query is the right way to do this. I would write it as:
select *
from mytable t
where exists (select 1
from valid_symbols vs
where t.symbol_01 = vs.valid_symbol
) and
exists (select 1
from valid_symbols vs
where t.symbol_02 = vs.valid_symbol
) ;
The important component is that you need an index on valid_symbols(valid_symbol). With this index, the lookup should be pretty fast. Appropriate indexes can even work if valid_symbols is a view, although the effect depends on the complexity of the view.
You seem to have a situation where you have two foreign key relationships. If you explicitly declare these relationships, then the database will enforce that the columns in your table match the valid symbols.

Solution to avoid non-sargable argument in where clause

In the code_list CTE in this query I have a row constructor that will eventually take any number of arguments. The column icd in the patient_codes CTE is a five-digit identifier that is more descriptive than the three-digit codes in the row constructor. The table icd_patient has 100 million rows, so for performance's sake I would like to filter the rows of this table before I do any further work. I have
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select distinct icd,pat_id,id
from icd_patient
where icd in (select code_list from code_list)
)
select distinct pat_id from patient_codes
The problem, however, is that in the icd_patient table all of the icd values are five digits and more descriptive. If I look at the execution plan of this query, it's pretty streamlined. If I do
;with code_list(code_list)
as
(
select x.code_list
from (values ('70700'),('25002')) as x(code_list)
),patient_codes
as
(
select substring(icd,1,3) as icd,pat_id
from icd_patient2
where substring(icd,1,3) in (select * from code_list)
)
select * from patient_codes
this of course has a large performance impact because of the substring expression in the where clause. Does something akin to "like in" exist so I can take advantage of my indexes?
Index on icd_patient
CREATE NONCLUSTERED INDEX [ix_icd_patient] ON [dbo].[icd_patient2]
(
[pat_id] ASC
)
INCLUDE ( [id],
This much simpler query should be better than (or, at worst, the same as) your existing query.
select pat_id
FROM dbo.icd_patient
where icd LIKE '707%'
OR icd LIKE '250%'
GROUP BY pat_id;
Note that sargability only matters if there is actually an index on this column.
An alternative (since OR can sometimes give the optimizer fits):
SELECT pat_id FROM
(
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '707%'
UNION ALL
SELECT pat_id
FROM dbo.icd_patient
WHERE icd LIKE '250%'
) AS x
GROUP BY pat_id;
To make this extensible beyond a handful of OR conditions, I would use a table-valued parameter (TVP).
CREATE TYPE dbo.StringPatterns AS TABLE(s VARCHAR(3) PRIMARY KEY);
Then your stored procedure could say:
CREATE PROCEDURE dbo.whatever
@sp dbo.StringPatterns READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT p.pat_id
FROM dbo.icd_patient AS p
INNER JOIN @sp AS sp
ON p.icd LIKE sp.s + '%'
GROUP BY p.pat_id;
END
Then you can pass in your set of three-character substrings from a DataTable or other collection in C#. From T-SQL just as an example:
DECLARE @p dbo.StringPatterns;
INSERT @p VALUES('707'),('250');
EXEC dbo.whatever @sp = @p;
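As a rough check of the idea behind the table-valued parameter, the sketch below joins icd against a small table of three-character prefixes and compares the result with the substring filter from the question. It is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, an ordinary table named patterns stands in for the TVP, concatenation is `||` rather than `+`, and the rows are made up. It shows that the two filters select the same patients, not how SQL Server would use an index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE icd_patient (id INTEGER PRIMARY KEY, pat_id INTEGER, icd TEXT);
    CREATE TABLE patterns (s TEXT PRIMARY KEY);  -- stand-in for the TVP
    INSERT INTO icd_patient (pat_id, icd) VALUES
        (1, '70700'), (1, '70703'), (2, '25002'), (3, '40190'), (4, '25080');
    INSERT INTO patterns VALUES ('707'), ('250');
""")

# Prefix match expressed as a join on LIKE with a trailing wildcard.
prefix_join_sql = """
    SELECT p.pat_id
    FROM icd_patient AS p
    INNER JOIN patterns AS sp ON p.icd LIKE sp.s || '%'
    GROUP BY p.pat_id
    ORDER BY p.pat_id
"""

# The substring filter from the question, for comparison.
substring_sql = """
    SELECT pat_id
    FROM icd_patient
    WHERE substr(icd, 1, 3) IN (SELECT s FROM patterns)
    GROUP BY pat_id
    ORDER BY pat_id
"""

prefix_ids = [row[0] for row in conn.execute(prefix_join_sql)]
substring_ids = [row[0] for row in conn.execute(substring_sql)]
print("prefix join:", prefix_ids)
print("substring:  ", substring_ids)
```

Both queries return the same patients here; the difference the answer cares about is that the LIKE form leaves the icd column unwrapped, which is what gives an index on icd a chance to be used.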
Something like "like in" does not exist. The following is sargable:
select *
from icd_patient
where icd like '70700%' or
icd like '25002%'
This works because LIKE with a constant initial substring is a special case for SQL Server. It does not work when the strings on the right are variables.
One solution is to create an indexed view on the icd_patient table with an index on the first five characters of the icd code.
Using "IN" makes that part of a command non-sargable on both sides. End of discussion.
Saying he fixes it using substring completely changes what the query would return, and it remains non-sargable.
Any "fix" should exactly match the results. The actual fix is to join the CTE so the five characters match, or put three characters in the CTE and match those in a join, or put four characters in the CTE, where the fourth is "%", and join using LIKE.
Using a LIKE pattern that starts with "%" increases the cost of the search, but the engine can still scan the index to find the value, because reading the index should take less I/O than reading the table, fetching the full table row only when a match is found.

SQL Server Full-Text-Search FREETEXTTABLE search multiple columns

I'm using the below query to return results from a table using Full-Text-Search.
In SQL2000 it was only possible to search one or all columns in a table. Is it possible in SQL 2008?
I would like to search two columns, Problem and Solution (both indexed and in the same table):
DECLARE @topRank int set @topRank=(SELECT MAX(RANK)
FROM FREETEXTTABLE([Support_Calls], Problem, 'test', 1))
SELECT [ID] AS [Call No],Company_Name, Problem, Solution, CONVERT(VARCHAR(20),CAST((CAST(ftt.RANK as DECIMAL)/@topRank * 100) AS DECIMAL(13,0))) + '%' as Match
FROM [Support_Calls] INNER JOIN FREETEXTTABLE([Support_Calls], Problem, 'test') as ftt ON ftt.[KEY]=[ID] ORDER BY ftt.RANK DESC;
From what I can see the FREETEXTTABLE does not accept more than one column?
You specify them in parentheses: FREETEXTTABLE(tablename, (col1,col2,col3), 'expr'), or use an asterisk to search all columns in the index.
From MSDN,
Returns a table of zero, one, or more rows for those columns containing character-based data types for values that match the meaning, but not the exact wording, of the text in the specified freetext_string. FREETEXTTABLE can only be referenced in the FROM clause of a SELECT statement like a regular table name.
Queries using FREETEXTTABLE specify freetext-type full-text queries that return a relevance ranking value (RANK) and full-text key (KEY) for each row.
They give the following syntax:
FREETEXTTABLE (table , { column_name | (column_list) | * }
,'freetext_string'
[ , LANGUAGE language_term ]
[ ,top_n_by_rank ] )
So yes, what Alex K. said as well.
If you created a FULLTEXT INDEX on different columns, then you can simply use CONTAINS or FREETEXT to search one of them, all of them, or some of them. Like this:
SELECT *
FROM YourTable
WHERE CONTAINS(*, @SearchTerm);
if you want to search all the columns that are included in the FULLTEXT INDEX, or:
SELECT *
FROM YourTable
WHERE CONTAINS((ProductName, ProductNumber, Color), @SearchTerm);
if you want to specify the columns to search. If you need the results in one column, you will have to do a UNION with a separate search for every column you want searched.
SELECT *
FROM YourTable
WHERE CONTAINS(ProductName, @SearchTerm)
UNION
SELECT *
FROM YourTable
WHERE CONTAINS(ProductNumber, @SearchTerm)
UNION
SELECT *
FROM YourTable
WHERE CONTAINS(Color, @SearchTerm)

How to search multiple columns in MySQL?

I'm trying to make a search feature that will search multiple columns to find a keyword based match. This query:
SELECT title FROM pages WHERE title LIKE '%$query%';
works only for searching one column. I noticed that separating column names with commas results in an error. So is it possible to search multiple columns in MySQL?
If it is just for searching then you may be able to use CONCAT_WS.
This would allow wild card searching.
There may be performance issues depending on the size of the table.
SELECT *
FROM pages
WHERE CONCAT_WS('', column1, column2, column3) LIKE '%keyword%'
You can use the AND or OR operators, depending on what you want the search to return.
SELECT title FROM pages WHERE my_col LIKE '%$param1%' AND another_col LIKE '%$param2%';
Both clauses have to match for a record to be returned. Alternatively:
SELECT title FROM pages WHERE my_col LIKE '%$param1%' OR another_col LIKE '%$param2%';
If either clause matches then the record will be returned.
For more about what you can do with MySQL SELECT queries, try the documentation.
If your table is MyISAM:
SELECT *
FROM pages
WHERE MATCH(title, content) AGAINST ('keyword' IN BOOLEAN MODE)
This will be much faster if you create a FULLTEXT index on your columns, but it will work even without the index:
CREATE FULLTEXT INDEX fx_pages_title_content ON pages (title, content)
1)
select *
from employee em
where CONCAT(em.firstname, ' ', em.lastname) like '%parth pa%';
2)
select *
from employee em
where CONCAT_ws('-', em.firstname, em.lastname) like '%parth-pa%';
The first is useful when we have data like 'firstname lastname'.
e.g
parth patel
parth p
patel parth
The second is useful when we have data like 'firstname-lastname'. In it you can also use special characters.
e.g
parth-patel
parth_p
patel#parth
Here is a query you can use to search several columns of a table for the same term:
SELECT * FROM tbl_customer
WHERE CustomerName LIKE '%".$search."%'
OR Address LIKE '%".$search."%'
OR City LIKE '%".$search."%'
OR PostalCode LIKE '%".$search."%'
OR Country LIKE '%".$search."%'
Using this code will help you search multiple columns easily.
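Because the query above splices the search term straight into the SQL text, a quote character in the input changes the statement. A minimal sketch of the same multi-column OR search with bound parameters is shown below. It uses Python's sqlite3 module rather than PHP and MySQL, the helper name search_customers and the sample rows are hypothetical, and the wildcards are added with SQLite's `||` operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_customer (
        CustomerName TEXT, Address TEXT, City TEXT, PostalCode TEXT, Country TEXT);
    -- Made-up rows for illustration only.
    INSERT INTO tbl_customer VALUES
        ('Alice Example', '1 Main St', 'Springfield', '11111', 'USA'),
        ('Bob Sample', '2 High St', 'London', 'N1 1AA', 'UK');
""")

# Column names are fixed in code; only the search term comes from the user.
SEARCH_COLUMNS = ("CustomerName", "Address", "City", "PostalCode", "Country")


def search_customers(conn, term):
    """Return customer names where any searched column contains `term`."""
    where = " OR ".join(f"{col} LIKE '%' || ? || '%'" for col in SEARCH_COLUMNS)
    sql = f"SELECT CustomerName FROM tbl_customer WHERE {where} ORDER BY CustomerName"
    # One bound parameter per column; the term is never pasted into the SQL text.
    return [row[0] for row in conn.execute(sql, [term] * len(SEARCH_COLUMNS))]


print(search_customers(conn, "london"))
print(search_customers(conn, "' OR '1'='1"))
```

The second call is treated as a literal search string and finds nothing, instead of turning the WHERE clause into an always-true condition. Note that `%` and `_` inside the term still act as LIKE wildcards in this sketch.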
SELECT * FROM persons WHERE (`LastName` LIKE 'r%') OR (`FirstName` LIKE 'a%');
Please try with above query.

SQL Server: how to optimize "like" queries?

I have a query that searches for clients using "like" with wildcard. For example:
SELECT TOP (10)
[t0].[CLIENTNUMBER],
[t0].[FIRSTNAME],
[t0].[LASTNAME],
[t0].[MI],
[t0].[MDOCNUMBER]
FROM [dbo].[CLIENT] AS [t0]
WHERE (LTRIM(RTRIM([t0].[DOCREVNO])) = '0')
AND ([t0].[FIRSTNAME] LIKE '%John%')
AND ([t0].[LASTNAME] LIKE '%Smith%')
AND ([t0].[SSN] LIKE '%123%')
AND ([t0].[CLIENTNUMBER] LIKE '%123%')
AND ([t0].[MDOCNUMBER] LIKE '%123%')
AND ([t0].[CLIENTINDICATOR] = 'ON')
It can also use fewer parameters in the "where" clause, for example:
SELECT TOP (10)
[t0].[CLIENTNUMBER],
[t0].[FIRSTNAME],
[t0].[LASTNAME],
[t0].[MI],
[t0].[MDOCNUMBER]
FROM [dbo].[CLIENT] AS [t0]
WHERE (LTRIM(RTRIM([t0].[DOCREVNO])) = '0')
AND ([t0].[FIRSTNAME] LIKE '%John%')
AND ([t0].[CLIENTINDICATOR] = 'ON')
Can anybody tell what is the best way to optimize performance of such query? Maybe I need to create an index? This table can have up to 1000K records in production.
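To do much for a LIKE where the pattern has the form '%XXX%', you want to look up SQL Server's full-text indexing capability, and use CONTAINS instead of LIKE. As-is, you're doing a full table scan, because a normal index won't help with a search for an item that starts with a wild card -- but a full-text index will. Keep in mind that full-text search matches whole words and word prefixes rather than arbitrary substrings, so CONTAINS is not an exact replacement for LIKE '%123%' on columns such as SSN.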
/* ... */
WHERE (LTRIM(RTRIM([t0].[DOCREVNO])) = '0')
AND (contains([t0].[FIRSTNAME], 'John'))
AND (contains([t0].[LASTNAME], 'Smith'))
AND (contains([t0].[SSN], '123'))
AND (contains([t0].[CLIENTNUMBER],'123'))
AND (contains([t0].[MDOCNUMBER], '123'))
AND ([t0].[CLIENTINDICATOR] = 'ON')
It looks like some databases (PostgreSQL 7.1+, MySQL v3.23.23+, Microsoft-SQL v???) already contain such features:
MySQL>> ALTER TABLE articles ADD FULLTEXT(body, title);
MySQL>> SELECT * FROM articles WHERE MATCH(title, body) AGAINST ('PHP')
MS-SQL>> SELECT ProductName FROM Products WHERE FREETEXT (ProductName, 'spread' )
PgSQL>> CREATE FUNCTION fti() RETURNS opaque AS '/path/to/fti.so' LANGUAGE 'C';
PgSQL>> CREATE TABLE articles_fti (string type, id oid);
....
Oracle..., Sybase...