How to disable word stemming in SQL Server Full Text Search?

I need to do a Full Text Search with NO word stemming. I tried wrapping the search term in double quotes, but no joy: I still get results like "bologna" when I search for '"bolognalo"'.
Any help is appreciated.

Switch from using FREETEXT to CONTAINS.
I assume that you're currently using FREETEXT because stemming is automatically applied to FREETEXT queries, whereas CONTAINS doesn't use stemming by default.
A second, inferior, option is to specify language neutrality in your FREETEXT query:
SELECT *
FROM my_table
WHERE FREETEXT(my_column, 'my search', LANGUAGE 0x0)
If you use this, no other language-specific rules will be applied either (e.g., word breaking, stopwords, etc.).
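As a rough sketch of the first option, the two queries below contrast a plain CONTAINS term with an explicit request for stemming. The names my_table and my_column are the placeholders from the answer above, and the word "run" is only an illustrative assumption.

```sql
-- Placeholder names (my_table, my_column) as used in the answer above.

-- A simple term in CONTAINS is not expanded to inflectional forms.
SELECT *
FROM my_table
WHERE CONTAINS(my_column, '"bolognalo"');

-- With CONTAINS, stemming has to be requested explicitly via FORMSOF.
-- 'run' is just an illustrative word.
SELECT *
FROM my_table
WHERE CONTAINS(my_column, 'FORMSOF(INFLECTIONAL, run)');
```

Word breaking and stopwords still follow the language of the full-text index in both queries; only the expansion to inflected forms differs.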

After too many days of trying, I finally
got it to work.
I recreated the full-text index, setting the language to 0 (neutral):
CREATE FULLTEXT INDEX ON table_name
(DescriptionField LANGUAGE 0)
KEY INDEX idx_DescriptionField
ON catalog_name
and then, in each CONTAINS query, I set the language to 0:
select * from table_name where contains(DescriptionField,'bolognolo',LANGUAGE 0)
This didn't work before because I had skipped the first step.
Thank you very much!

Maybe setting the language of the fulltextindex to neutral will do the trick...
(Though, then you'd never get stemming at all...)

Related

SQL Server Full Text Search matches part of a word, even without wildcard

Take this query:
SELECT * FROM Books
WHERE CONTAINS(([Description], ReverseDescription), '"øgle"')
And these two texts for the columns being searched:
http://textuploader.com/5bg5r
http://textuploader.com/5bg59
Why does that one match? I cannot find an exact match in either of those texts. As far as I know, the partial match should only show up if I use the following query:
SELECT * FROM Books
WHERE CONTAINS(([Description], ReverseDescription), '"øgle*"')
Anyone know what's going on?
Full-Text works on the basis of the selected language's grammar and vocabulary, not on simple character comparison as LIKE does. Each language defines its own stemmers and word breakers. I can't say whether øgle is a full word by itself, or how your FT index is treating that ø. My suspicion is that your index was not created with Danish language rules. If your index is indeed using the correct language, then you need to check the stemmer and word breaker rules in use for that language.
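One way to check both points, sketched under some assumptions: the table is taken to live in the dbo schema, and 1030 is used as the LCID for Danish. sys.dm_fts_parser typically requires sysadmin permission.

```sql
-- Which language is each full-text indexed column of Books using?
-- (dbo is an assumed schema name.)
SELECT c.name AS column_name, fic.language_id
FROM sys.fulltext_index_columns AS fic
JOIN sys.columns AS c
  ON c.object_id = fic.object_id
 AND c.column_id = fic.column_id
WHERE fic.object_id = OBJECT_ID('dbo.Books');

-- How does the Danish word breaker (LCID 1030) parse the search term?
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT display_term, special_term, expansion_type
FROM sys.dm_fts_parser(N'"øgle"', 1030, 0, 0);
```

If language_id is not 1030, the index is not using Danish rules, which would be consistent with the suspicion above.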
Update
Actually, I think it is simpler. The presence of "" makes the search term a prefix term, even without an *. MSDN is a bit ambiguous here, because, for example, in Performing Prefix Searches it states:
When the prefix term is a phrase, each token making up the phrase is considered a separate prefix term. All rows that have words beginning with the prefix terms will be returned. For example, the prefix term "light bread*" will find rows with text of either "light breaded," "lightly breaded," or "light bread", but will not return "Lightly toasted bread".
Note how light in the example is a prefix and does not require light*. I do not have a system to test on, so this is a bit of speculation on my part, but I suspect that CONTAINS treats "øgle" as a case-insensitive prefix search, in which case your text contains two matches: Øgledronning and Øgledronningens.
Add a filter with a case-sensitive collation such as Latin1_General_CS_AS.
For example, the query would look like this:
SELECT * FROM Books
WHERE CONTAINS(([Description], ReverseDescription), '"øgle*"')
AND [Description] COLLATE Latin1_General_CS_AS LIKE '%øgle%'

Sql full text search for 'c' in a candidate table

I found this in the Microsoft documentation (http://technet.microsoft.com/en-us/library/ms142547(v=SQL.105).aspx):
SELECT candidate_name,SSN FROM candidates WHERE CONTAINS(candidate_resume,'"SQL Server"')
Is it possible to run the same query with "c" (to find candidates who know the C language)?
Because "c" is a single character, it doesn't work with full-text search. The script returns data only if I turn the stoplist off, as follows:
ALTER FULLTEXT INDEX ON [ADNEOM.BE].[adneom].[T_CANDIDATE] SET STOPLIST = OFF
But the data returned includes c# and c++ too, and I want only C.
In addition, I don't think that disabling the system stoplist is a good idea.
From a technical point of view, I see nothing that prevents you from doing a full-text search on the "c" character, except for the minimal pattern length threshold mentioned in the question, so the answer is yes, it is possible.
I assume that candidate_resume is a text field and the LIKE operator is applicable, so you may consider another solution:
select candidate_name, SSN from candidates where candidate_resume like '%C%';
But before doing that, you should consider that searching for a single letter will give you tons of false-positive results. For example, my answer contains it 14 times.
If you have a detailed list of wanted and unwanted patterns, you can add them to the query:
--positive cases list
... where (candidate_resume like '%c/%' or candidate_resume like '%c,%' or ...
--negative cases list
...) and candidate_resume not like '%C#%' and candidate_resume not like '%mvc%' and ...
Disclaimer: Having lots of such clauses will slow down your query.
Disclaimer: Maybe you'll need "%c/%", not '%c/%', I don't remember unfortunately. In that case feel free to edit my post or add a comment so I fix it.

Postgresql prefix wildcard for full text

I am trying to run a fulltext query using Postgresql that can cater for partial matches using wildcards.
It seems easy enough to have a postfix wildcard after the search term, however I cannot figure out how to specify a prefix wildcard.
For example, I can perform a postfix search easily enough using something like..
SELECT "t1".*
FROM "t1"
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', 'don:*') )
should return results matching "London"
However, I can't seem to do a prefix search like...
SELECT "t1".*
FROM "t1"
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', ':*don') )
Ideally I'd like to have a wildcard prefixed to the front and end of the search term, something like...
SELECT "t1".*
FROM "t1"
WHERE (to_tsvector('simple', "t1"."city") @@ to_tsquery('simple', ':*don:*') )
I can use a LIKE condition; however, I was hoping to benefit from the performance of the full-text search features in Postgres.
Full text search is good for finding words, not substrings.
For substring searches you'd be better off using LIKE '%don%' with the pg_trgm extension (available from PostgreSQL 9.1) and a trigram index, created either with using gin (column_name gin_trgm_ops) or with using gist (column_name gist_trgm_ops). Be aware that the index can be very big (even several times bigger than your table) and that write performance will suffer.
There's a very good example of using pg_trgm for substring search on the "select * from depesz" blog.
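A minimal sketch of that setup, assuming the table t1 and column city from the question; the index name is made up for illustration.

```sql
-- pg_trgm ships with PostgreSQL's contrib modules.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index on the searched column (index name is hypothetical).
CREATE INDEX t1_city_trgm_idx ON t1 USING gin (city gin_trgm_ops);

-- Substring search that the trigram index can serve.
SELECT t1.*
FROM t1
WHERE t1.city LIKE '%don%';
```

On a small table the planner may still prefer a sequential scan, so EXPLAIN is worth checking before drawing conclusions about performance.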
One wild and crazy way of doing it would be to create a tsvector index of all your documents, reversed. And reverse your queries for postfix search too.
This is essentially what Solr does with its ReversedWildcardFilterFactory
select
reverse('brown fox')::tsvector @@ (reverse('rown') || ':*')::tsquery --true

SQL: transform full-text search into like construction

I've got a stored procedure that performs a search using full-text indexes in the general case. But I can't build a full-text index for one field, so I need to use a LIKE construction there.
So, the problem is: the parameter could be
"a*" or "b*"
as in a parameter for the CONTAINS command.
Can anyone suggest a good way to transform this parameter for a LIKE construction?
Thank you.
P.S: I use MSSQL Server
Depending on the full-text search constructs you want to support, this is generally impossible.
According to MSDN, full-text search syntax on SQL Server supports these constructs:
One or more specific words or phrases (simple term)
something along LIKE '%[-,;.()!? ]Term[-,;.()!? ]%'
A word or a phrase where the words begin with specified text (prefix term)
something along LIKE '%[-,;.()!? ]Term%'
Inflectional forms of a specific word (generation term)
Not possible
A word or phrase close to another word or phrase (proximity term)
Not possible
Synonymous forms of a specific word (thesaurus)
Not possible
Words or phrases using weighted values (weighted term)
Not possible
Those which I have marked "not possible" can't really be translated to LIKE queries, but of course you could get inventive (using your own stemming algorithm for inflectional forms, or your own thesaurus for synonyms) to support at least some of those.
In the end, you will probably need to use dynamic SQL.
Here is a way you can get the correct WHERE clause, given that input:
declare @str varchar(255) = '"a*" or "b*"';
with const as (select 'col' as col)
select col+' like '+replace(replace(REPLACE(@str, '"', ''''), '*', '%'), 'or ', 'or '+COL+' like ') as WhereClause
from const
The "const" is just a table with one column to specify your column name. It allows it to be specified in one place.
This just does replaces to get the correct syntax for LIKE. Of course, this would be more complex to support more functionality from CONTAINS.
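To actually run the generated clause, it could be spliced into dynamic SQL. The following is only a sketch: my_table and col are hypothetical names, and it assumes the input string comes from a trusted source, since concatenating it into a statement is open to SQL injection otherwise.

```sql
-- Hypothetical table (my_table) and column (col); input assumed trusted.
DECLARE @str varchar(255) = '"a*" or "b*"';

-- Same chain of replaces as above: " becomes ', * becomes %, and each
-- 'or ' gets the column name and LIKE keyword appended.
DECLARE @where nvarchar(max) =
    N'col like ' +
    REPLACE(REPLACE(REPLACE(@str, '"', ''''), '*', '%'), 'or ', 'or col like ');

DECLARE @sql nvarchar(max) = N'SELECT * FROM my_table WHERE ' + @where;

-- Executes: SELECT * FROM my_table WHERE col like 'a%' or col like 'b%'
EXEC sp_executesql @sql;
```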
Thanks to everyone!
Unfortunately, expression parsing is not enough for the general case.
I use regular expressions in MS SQL Server:
http://anastasiosyal.com/POST/2008/07/05/REGULAR-EXPRESSIONS-IN-MS-SQL-SERVER-USING-CLR.ASPX

SQL Contains - only match at start

For some reason I cannot find the answer on Google! With the SQL CONTAINS function, how can I tell it to match only at the beginning of a string? That is, I am looking for the full-text equivalent of
LIKE 'some_term%'.
I know I can use LIKE, but since I already have the full-text index set up, and the table is expected to have thousands of rows, I would prefer to use CONTAINS.
Thanks!
You want something like this:
Rather than specify multiple terms, you can use a 'prefix term' if the
terms begin with the same characters. To use a prefix term, specify
the beginning characters, then add an asterisk (*) wildcard to the end
of the term. Enclose the prefix term in double quotes. The following
statement returns the same results as the previous one.
-- Search for all terms that begin with 'storm'
SELECT StormID, StormHead, StormBody FROM StormyWeather
WHERE CONTAINS(StormHead, '"storm*"')
http://www.simple-talk.com/sql/learn-sql-server/full-text-indexing-workbench/
You can use CONTAINS with a LIKE subquery for matching only a start:
SELECT *
FROM (
SELECT *
FROM myTable WHERE CONTAINS(edition, '"Alice in wonderland"')
) AS S1
WHERE S1.edition LIKE 'Alice in wonderland%'
This way, the slow LIKE query will be run against a smaller set.
The only solution I can think of is to actually prepend a unique word to the beginning of every field in the table.
e.g. Update every row so that 'xfirstword ' appears at the start of the text (e.g. Field1). Then you can search for CONTAINS(Field1, 'NEAR ((xfirstword, "TERM*"),0)')
Pretty crappy solution, especially as we know that the full text index stores the actual position of each word in the text (see this link for details: http://msdn.microsoft.com/en-us/library/ms142551.aspx)
I am facing a similar issue. This is what I have implemented as a workaround.
I created another table and copied into it only the rows matching LIKE 'some_term%'.
I then set up the full-text search on this new table.
Please let me know if you have tried a better approach.