SQL Full-Text Search vs. LIKE Search Performance - sql

I've been searching the internet and found that Full-Text Search usually has better performance.
I followed the instructions on this post to set up thesaurus tables on my machine so I can play around with it and get more familiar with full-text search.
I am viewing everything in Microsoft SQL Server Management Studio 2008.
When I ran the queries, I noticed that my LIKE search was faster than my FREETEXT search, which contradicts what I found on most wiki sites/pages.
Below are the queries I ran:
select *
from TheThesaurus
where freetext(TheDefinition, 'aspire')
select *
from TheThesaurus
where TheDefinition like '%aspire%'
The LIKE search took 0 seconds, whereas the FREETEXT search took 6 seconds.
The LIKE search returned 70 rows and the FREETEXT search returned 94, so FREETEXT finds more matches and gives the better result.
Is there something I'm missing that causes the FREETEXT search to be much slower than the LIKE search?
I would really like to use FREETEXT search in my program because it returns more hits (collects more data), but the speed is a significant issue.
Thanks for the help!

Have you created a full-text index? If not, see CREATE FULLTEXT CATALOG on the MSDN site, or this link, which walks you through it using SQL 2008: http://www.codeproject.com/Articles/29237/SQL-SERVER-2008-Creating-Full-Text-Catalog-and-Ful
Another reason the run times differ has to do with what the predicates are doing. LIKE is closer to an exact match. The FREETEXT function "searches for the values that match the meaning of a phrase and not just exact words", so your FREETEXT query is doing more work. That quote is from "Querying Microsoft SQL Server 2012".
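A minimal sketch of how the full-text setup on the question's table could be checked and, if it is missing, created. The dbo schema, the catalog name ThesaurusCatalog, and the key index name PK_TheThesaurus are assumptions for illustration; the key index must be an existing unique, single-column, non-nullable index on the table.

```sql
-- Returns 1 if the table has an active full-text index, 0 if it does not.
SELECT OBJECTPROPERTY(OBJECT_ID(N'dbo.TheThesaurus'),
                      'TableHasActiveFulltextIndex') AS HasFullTextIndex;

-- Hypothetical catalog name; pick any name that suits your database.
CREATE FULLTEXT CATALOG ThesaurusCatalog;

-- PK_TheThesaurus is an assumed name for the table's unique key index.
CREATE FULLTEXT INDEX ON dbo.TheThesaurus (TheDefinition)
    KEY INDEX PK_TheThesaurus
    ON ThesaurusCatalog
    WITH CHANGE_TRACKING AUTO;

-- 0 means the catalog is idle, i.e. population has finished.
SELECT FULLTEXTCATALOGPROPERTY(N'ThesaurusCatalog', 'PopulateStatus') AS PopulateStatus;
```

Timing a query while the catalog is still populating may not reflect steady-state performance, so it is worth waiting for the idle status before comparing FREETEXT against LIKE.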

Related

How to find href=blah but not href=/blah with Full-text search

I'm currently using the query
SELECT Url FROM Link WHERE CONTAINS(Url, 'href=blah')
It is including results with href=/blah. Any way I can tell the query to act more like WHERE Url LIKE '%href=blah%' and still use the full-text catalog?
Your problem is that = and / are both word breakers. In other words, SQL full-text is actually searching for href and blah.
There are a couple of options you could try. First you could filter down the search domain using the fulltext engine, then search the subset of data using LIKE. You'll need to experiment to see how to squeeze out the best performance.
The other option is, if href=blah is a consistent term you could add that to a custom dictionary. A good article on this is here.
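The first option above could be sketched as follows, reusing the table and column from the question. This is only an illustration: the optimizer is not guaranteed to evaluate the full-text predicate before the LIKE, so the execution plan and timings should be checked on real data.

```sql
SELECT Url
FROM Link
WHERE CONTAINS(Url, 'href=blah')      -- full-text index narrows the candidate rows
  AND Url LIKE '%href=blah%';         -- exact substring check excludes href=/blah
```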

SQL Server 2008 R2 Full Text Search with FORMSOF and Accent Insensitive

I'm using MS SQL Server 2008 R2 with Full Text Search for searching text data stored in different languages.
I'm a bit confused about how CONTAINS predicate works with accents.
When I use the following predicate
CONTAINS([Text], @keywords, LANGUAGE @language)
on a catalog with ACCENT_SENSITIVITY = OFF, the search results are the same for e.g. 'Lächeln' and 'lacheln' when German is specified as the language.
But if I change the predicate to look like
CONTAINS([Text], FORMSOF(INFLECTIONAL, @keywords), LANGUAGE @language)
then the results are different, and it seems to me that accent insensitivity doesn't work with FORMSOF.
I've tried to find an answer on MSDN and Google but didn't find anything useful.
Does anybody know why the results are different?
Thanks!
My understanding is that these serve two separate purposes in finding matches for a full-text search. With an accent insensitive catalog there is a simple character equality performed for the term matching so that eñya = enya because 'n' is considered the accent insensitive equivalent of 'ñ'.
With FORMSOF you're requesting that the search perform a stemming operation on the terms so that verb and noun forms will be searched as additional terms in the search. e.g. searching for 'foot' would include 'feet' and 'run' would include 'ran'.
If the FORMSOF seems to be fundamentally not working for your values you may want to make sure that you have the appropriate language support installed for full-text languages.
SELECT * FROM sys.fulltext_languages
If you haven't had a chance to review MSDN the SQL Word Breakers documentation may shed some light on the observed behavior. http://msdn.microsoft.com/en-us/library/ms142509.aspx
FORMSOF strips diacritics from your word:
SELECT * FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, "Lächeln")', 1031, 0, 1)
Check the "display_term" column.

Searching numeric strings with Full-Text Search in SQL 2005

I'm using SQL Full-Text Search and have a stored procedure that uses the FREETEXTTABLE function.
This all works great, however, I have noticed that if I search for something such as 'Chapter 19' the 19 seems as if it is thrown away and the search only searches on 'Chapter'.
Also if I search for just '19' I get no results. I know the columns I have indexed contain a '19' in multiple rows.
Is this the intended behaviour? To not index numerics?
If so, then I suppose I'll have to live with it, but if not I'll be happy to post any T-SQL if anyone thinks I'm doing anything wrong.
Thanks.
P.S. I've googled this and have found nothing on searching numerics with full-text search.
I eventually found the reason behind this.
Numerics are treated as noise words in SQL Server. You can allow searching on numerics by removing the numeric entries from the appropriate noise file for your language.
Noise files are found in the FTData directory of your SQL Server install.
The English noise files are noiseENU.txt and noiseENG.txt.
Hope this helps someone.

What is the SQL used to do a search similar to "Related Questions" on Stackoverflow

I am trying to implement a feature similar to the "Related Questions" on Stackoverflow.
How do I go about writing the SQL statement that will search the Title and Summary field of my database for similar questions?
If my question is: "What is the SQL used to do a search similar to "Related Questions" on Stackoverflow".
Steps that I can think of are:
Strip the quotation marks
Split the sentence into an array of words and run a SQL search on each word.
If I do it this way, I am guessing that I wouldn't get any meaningful results. I am not sure if Full Text Search is enabled on the server, so I am not using that. Will there be an advantage of using Full Text Search?
I found a similar question but there was no answer: similar question
Using SQL 2005
Check out this podcast.
"One of our major performance optimizations for the “related questions” query is removing the top 10,000 most common English dictionary words (as determined by Google search) before submitting the query to the SQL Server 2008 full text engine. It’s shocking how little is left of most posts once you remove the top 10k English dictionary words. This helps limit and narrow the returned results, which makes the query dramatically faster."
They probably relate based on tags that are added to the questions...
After enabling Full Text search on my SQL 2005 server, I am using the following stored procedure to search for text.
ALTER PROCEDURE [dbo].[GetSimilarIssues]
(
@InputSearch varchar(255)
)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
DECLARE @SearchText varchar(500);
SELECT @SearchText = '"' + @InputSearch + '*"'
SELECT PostId, Summary, [Description],
Created
FROM Issue
WHERE FREETEXT (Summary, @SearchText);
END
I'm pretty sure it would be most efficient to implement the feature based on the tags associated with each post.
It's probably done using a full text search which matches like words/phrases. I've used it in MySQL and SQL Server with decent success with out of the box functionality.
You can find more on MySQL full text searches at:
http://dev.mysql.com/doc/refman/5.1/en/fulltext-search.html
Or just google Full Text search and you will find a lot of information.
It looks keyword-based on the title you enter, queried against the titles and content of other questions. Probably easier (and more appropriate) to do in Lucene (or similar) than in a relational database.
I'd say it's probably a full text search on the question title and the question content and answers as well using the individual words (not the whole title) you enter. Then, using the ranking features of full-text, the top 10 or so questions that rank the highest are displayed.
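A rough sketch of that ranking idea, reusing the Issue table from the stored procedure above. It assumes that both Summary and [Description] are full-text indexed and that PostId is the full-text key column; neither is confirmed by the thread, and the sample search text is made up.

```sql
DECLARE @SearchText nvarchar(500);
-- Hypothetical input: the words of the title the user typed.
SET @SearchText = N'SQL search similar related questions';

-- FREETEXTTABLE returns one row per match with [KEY] and [RANK] columns.
SELECT TOP 10 i.PostId, i.Summary, ft.[RANK]
FROM FREETEXTTABLE(Issue, (Summary, [Description]), @SearchText) AS ft
INNER JOIN Issue AS i
    ON i.PostId = ft.[KEY]            -- [KEY] holds the full-text key value
ORDER BY ft.[RANK] DESC;              -- highest-ranked matches first
```

RANK values are only meaningful relative to other rows of the same query, so they are suited to ordering results rather than to comparing across searches.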
As tydok pointed out, it looks like they are using full-text searching (I couldn't imagine any other way).
Here's the MSDN reference on Full-Text Searching. Nailing down the specific query used probably isn't going to happen.
The SQL very well may be just "SELECT * FROM questions;". I find it hard to imagine that the algorithm for finding similar questions is implemented in SQL.

Best way to implement a stored procedure with full text search

I would like to run a search with MSSQL Full text engine where given the following user input:
"Hollywood square"
I want the results to have both Hollywood and square[s] in them.
I can create a method on the web server (C#, ASP.NET) to dynamically produce a sql statement like this:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE,'"hollywood*"')
AND CONTAINS(TITLE, '"square*"')
Easy enough. HOWEVER, I would like this in a stored procedure for added speed benefit and security for adding parameters.
Can I have my cake and eat it too?
I agree with the above; look into AND clauses:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE,'"hollywood*" AND "square*"')
However, you shouldn't have to split the input sentence; you can pass the search condition in a variable:
SELECT TITLE
FROM MOVIES
WHERE CONTAINS(TITLE,@parameter)
By the way:
CONTAINS searches for the exact term.
FREETEXT searches for any term in the phrase.
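Wrapped in a stored procedure, the variable form might look like the sketch below. The procedure name dbo.SearchMovieTitles and the parameter name @SearchCondition are made up for illustration; the caller is assumed to supply a valid CONTAINS search condition.

```sql
CREATE PROCEDURE dbo.SearchMovieTitles
    @SearchCondition nvarchar(4000)   -- e.g. N'"hollywood*" AND "square*"'
AS
BEGIN
    SET NOCOUNT ON;

    -- The search condition is passed as a parameter, not concatenated into SQL.
    SELECT TITLE
    FROM MOVIES
    WHERE CONTAINS(TITLE, @SearchCondition);
END
```

It could then be called with EXEC dbo.SearchMovieTitles @SearchCondition = N'"hollywood*" AND "square*"';. The web server still has to turn the user's words into that condition string, but the value stays a parameter rather than dynamically built SQL.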
The last time I had to do this (with MSSQL Server 2005) I ended up moving the whole search functionality over to Lucene (the Java version, though Lucene.Net now exists I believe). I had high hopes of the full text search but this specific problem annoyed me so much I gave up.
Have you tried using the AND logical operator in your string? I pass in a raw string to my sproc and stuff 'AND' between the words.
http://msdn.microsoft.com/en-us/library/ms187787.aspx