I'm using SQL Full-Text Search and have a stored procedure that uses the FREETEXTTABLE function.
This all works great; however, I have noticed that if I search for something such as 'Chapter 19', the 19 seems to be thrown away and the search only matches on 'Chapter'.
Also if I search for just '19' I get no results. I know the columns I have indexed contain a '19' in multiple rows.
Is this the intended behaviour? To not index numerics?
If so, then I suppose I'll have to live with it, but if not I'll be happy to post any T-SQL if anyone thinks I'm doing anything wrong.
Thanks.
P.S. I've googled this and have found nothing on searching numerics with full-text search.
I eventually found the reason behind this.
Numerics are considered noise words in SQL Server. You can allow searching on numerics by removing the numeric entries from the appropriate noise file for your language.
Noise files are found in the FTData directory of your SQL Server install.
The English noise files are: noiseENU.txt & noiseENG.txt
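To make the effect concrete, here is a minimal Python sketch of noise-word filtering. It is not SQL Server's actual implementation: the noise list, the rule that purely numeric tokens count as noise, and the helper name `effective_terms` are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the entries of a noise file such as noiseENU.txt.
NOISE_WORDS = {"a", "an", "and", "of", "the"}


def effective_terms(query, drop_numerics=True):
    """Return the lower-cased query terms that survive noise filtering."""
    kept = []
    for term in re.findall(r"\w+", query.lower()):
        if term in NOISE_WORDS:
            continue  # ordinary noise word
        if drop_numerics and term.isdigit():
            continue  # assumption: numerics are treated as noise
        kept.append(term)
    return kept


print(effective_terms("Chapter 19"))                       # ['chapter']
print(effective_terms("19"))                               # []
print(effective_terms("Chapter 19", drop_numerics=False))  # ['chapter', '19']
```

With numerics filtered, 'Chapter 19' collapses to a search on 'chapter' and '19' alone leaves nothing to search for, which matches the behaviour in the question.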
Hope this helps someone.
I have a Cloudant database with a search index. In the search index I index the titles of my documents. For instance, search for 'rijkspersoneel':
http://wetten.cloudant.com/regelingen/_design/RegelingInfo/_search/regeling?q=title:rijkspersoneel
Returns 48 rows.
However, when I replace the 'o' with a ? wildcard:
http://wetten.cloudant.com/regelingen/_design/RegelingInfo/_search/regeling?q=title:rijkspers?neel
I get 0 results. Why is that? The Cloudant docs say that this should match 'rijkspersoneel' as well!
My previous answer was definitely mistaken. Internal wildcards do appear to be supported. Try:
title:rijkspe*on*
title:rijksper?on*
I'm fairly sure what is happening here is an analysis issue, and that you are using a stemming analyzer. I'm not really all that familiar with Cloudant and their implementation of this, but in Lucene, wildcard queries are not subject to the same analysis as term queries. I'm guessing that your analysis of this field includes a stemmer, in which case "rijkspersoneel" is actually indexed as "rijkspersone".
So, when you search for
rijkspersonee*
or
rijksper?oneel
the "el" is missing from the end of the indexed form, so you find no matches. A plain search for rijkspersoneel does get analyzed, though, so you search for the stemmed form of the word and find matches.
Stemming and wildcards just don't get along.
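Here is a rough Python sketch of that mismatch. It is not Lucene or Cloudant code: `toy_stem` is a made-up stemmer that simply strips a trailing "el" (following my guess about the indexed form), and `fnmatch` stands in for wildcard matching against the indexed terms.

```python
from fnmatch import fnmatchcase


def toy_stem(term):
    """Hypothetical stemmer: strip a trailing 'el' (assumed behaviour)."""
    return term[:-2] if term.endswith("el") else term


# The index stores the analyzed (stemmed) form of each title term.
INDEX = {toy_stem("rijkspersoneel")}  # {'rijkspersone'}


def term_query(term):
    # Term queries are analyzed, so the query is stemmed like the index.
    return toy_stem(term.lower()) in INDEX


def wildcard_query(pattern):
    # Wildcard queries skip analysis and are matched against indexed terms as-is.
    return any(fnmatchcase(indexed, pattern.lower()) for indexed in INDEX)


print(term_query("rijkspersoneel"))      # True
print(wildcard_query("rijkspers?neel"))  # False
print(wildcard_query("rijkspe*on*"))     # True
```

The wildcard pattern still expects the "el" that the stemmer removed, so it can only match when the pattern is compatible with the stemmed form.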
I've been searching on the internet and found that Full-Text Search usually has better performance.
I followed the instructions on this post to set up thesaurus tables on my machine so I can play around with it and get more familiar with full-text search.
I am viewing everything in Microsoft SQL Server Management Studio 2008.
When I run the queries, I notice that my LIKE search is faster than my FREETEXT search, which contradicts what I found on most wiki sites/pages.
Below are the queries I ran:
select *
from TheThesaurus
where freetext(TheDefinition, 'aspire')
select *
from TheThesaurus
where TheDefinition like '%aspire%'
The LIKE search took 0 sec, whereas the FREETEXT search took 6 sec.
The LIKE search returns 70 rows, whereas the FREETEXT search returns 94, so the FREETEXT search gives the more complete result.
Is there something I'm missing that causes the FREETEXT search to be much slower than the LIKE search?
I would really like to use FREETEXT search in my program because it returns more hits (collects more data), but the speed is a significant issue.
Thanks for the help!
Have you created a full-text index? If not, see CREATE FULLTEXT CATALOG on the MSDN site, or this link, which walks you through it using SQL 2008: http://www.codeproject.com/Articles/29237/SQL-SERVER-2008-Creating-Full-Text-Catalog-and-Ful
Another reason the run times would be different has to do with what the predicates are doing. The LIKE is closer to an exact match. The FREETEXT function "searches for the values that match the meaning of a phrase and not just exact words" so your FREETEXT command is doing more work. That is from "Querying Microsoft SQL Server 2012"
Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and get related records back. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? I.e.:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Is there any way to do this programmatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526).
You can also use the DIFFERENCE function (on the same link as SOUNDEX), which compares levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
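For anyone curious how the codes come about, or who wants to handle several search words, here is a minimal Python sketch. It implements the standard American Soundex rules, which may differ from SQL Server's SOUNDEX in edge cases, and `soundex_match` is a hypothetical helper that treats the search as an OR over its words.

```python
import re

# Consonant groups of standard American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def soundex_match(search, text):
    """True if any search word sounds like any word in text (hypothetical helper)."""
    wanted = {soundex(w) for w in re.findall(r"[A-Za-z]+", search)}
    found = {soundex(w) for w in re.findall(r"[A-Za-z]+", text)}
    return bool(wanted & found)


print(soundex("Dinosaurs"), soundex("Dinosores"))  # D526 D526
print(soundex_match("Dinosores wrocks",
                    "Hello my name is Tom I like dinosaurs to talk about SQL."))  # True
```

Matching word by word like this means one badly misspelled term does not block the whole search, at the cost of looser results.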
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzy Logic answer provided by @Neil Knight (+1 to that, for me!).
This stackoverflow article also details possible sources for implementations of Fuzzy Logic in TSQL. One respondent also outlined Full Text Indexing as a potential option that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that sound alike and thus correct for phonetic misspellings, and because it ignores vowels it even tolerates a dropped or mistyped vowel ("Dinosars" and "Dinosayrs" both still code to D526). But if the user truly "fat-fingered" a consonant and entered "Dinpsaurs" (D512), or got the first letter wrong, SoundEx would not return a match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
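The "closest dictionary word" step could be sketched roughly like this in Python. It uses the optimal string alignment variant of Damerau-Levenshtein distance, which is my pick for illustration and not necessarily what any real search engine does. The names `osa_distance` and `suggest`, the `max_distance` cutoff and the tiny dictionary are all hypothetical.

```python
def osa_distance(a, b):
    """Edits (substitute, insert, delete, adjacent transpose) to turn a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def suggest(word, dictionary, max_distance=2):
    """Return the closest known term, or None if nothing is close enough."""
    word = word.lower()
    if word in dictionary:
        return word
    best = min(dictionary, key=lambda known: osa_distance(word, known),
               default=None)
    if best is None or osa_distance(word, best) > max_distance:
        return None
    return best


KNOWN_TERMS = ["amazing", "dinosaurs", "sql"]  # hypothetical "good" search terms
print(suggest("Dinosayrs", KNOWN_TERMS))  # dinosaurs
print(suggest("Dinosars", KNOWN_TERMS))   # dinosaurs
```

Comparing against every dictionary word is fine for a small list; a large dictionary would need an index or a pre-filter, which is where the "heavy procedural code" comes in.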
You can also try SubString() to compare only the first 3 or so characters. Below is an example of how that can be achieved:
-- substr and || are Oracle-style; in T-SQL the equivalents are SUBSTRING and +
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3) = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;
I've searched for answers to this and I can't seem to find an answer to what should be somewhat simple.
This is related to another question I asked, but it's different. What's the best way to take a user's search phrase and throw it into a CONTAINSTABLE(table, column, @phrase, topN) call?
Say, for example the user inputs: Books by "Dr. Seuss"
What's the best way to turn that into something that will return results in my CONTAINSTABLE() call?
I was previously parsing the search phrase and writing something like ISABOUT("Books" WEIGHT(1.0), "by" WEIGHT(0.9), "Dr. Seuss" WEIGHT(0.8)) as my @phrase, but ISABOUT seems to be returning odd results, especially when one-word searches are entered.
Any ideas?
We've implemented a slightly modified version of the code found in this article on SQL Server Central. It uses the Irony Compiler Construction Kit from Codeplex.
There was a bug in the original version when starting any search query with a reserved word. For example, by searching for 'Orange', it would recognise the OR term and expect binary operands which weren't supplied. This was fixed in some code provided in the discussion forum on the article which is now up to 13 pages!
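If a full grammar is more than you need, a much simpler approach can be sketched in Python. This is not the code from the article. It keeps quoted phrases whole, drops a hypothetical stop-word list, and wraps every remaining term in double quotes so that words like 'Orange' or 'OR' are passed as literal terms. The name `build_contains_condition` and the stop words are assumptions for illustration.

```python
import re

# Hypothetical stop words; the real list depends on the server's noise-word setup.
STOP_WORDS = {"a", "an", "by", "of", "the"}


def build_contains_condition(user_input, operator="AND"):
    """Turn free-form user input into a quoted full-text search condition."""
    terms = []
    # Either a "quoted phrase" or a run of non-whitespace characters.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', user_input):
        term = (phrase or word).replace('"', "").strip()
        if not term or term.lower() in STOP_WORDS:
            continue
        terms.append('"%s"' % term)
    return (" %s " % operator).join(terms)


print(build_contains_condition('Books by "Dr. Seuss"'))  # "Books" AND "Dr. Seuss"
print(build_contains_condition("Orange"))                # "Orange"
```

The resulting string is meant to be passed to the query as a parameter, not concatenated into the SQL text.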
I seem to have a weird bug in Microsoft SQL Server 2005 where FREETEXT() searches are somewhat case-sensitive despite the collation being case-insensitive (Latin1_General_CI_AS).
First off, LIKE queries are perfectly case-insensitive, so
WHERE column LIKE '%word%'
and
WHERE column LIKE '%Word%'
return the same results.
Also, FREETEXT searches are in fact case-insensitive to some extent, for instance
WHERE FREETEXT(column, 'Word')
will return results with different cases.
BUT
WHERE FREETEXT(column, 'word')
while still returning case-insensitive matches for word, gives a different resultset.
Or, as I found out after some investigation, searching for word gives all matches for different cases of word but searching for Word gives the same PLUS inflectional results.
Or to use one of the actual cases I found, searching for marketingleader returns all results containing that word, independent of the case, whereas searching for Marketingleader would return those, but also results that just contain leader that don't show up when searching for the lower case.
Has anyone got any idea as to what is causing this and how I could turn on inflectional/fuzzy searching for lower-case words as well?
Any help would be appreciated.
Use the alternative to FREETEXT, which is CONTAINS; there, the inflectional results are optional:
CONTAINS (Transact-SQL)
Oops, just saw that you mention CONTAINS in your question, but does it behave the same way as FREETEXT in the provided examples?