Separate between words in SQL Server full text search - sql

I use SQL Server 2012 and have a full-text index on the Description field of the Document table.
With the following query, I can't find numbers that are attached to alphabetic characters:
select *
from document
where contains (Description,'"123"')
How can I separate numbers from alphabetic characters when the text is indexed?
Some numbers should not be separated, such as the date 2014/01/01.
The descriptions are in Persian.
Thanks in advance.

Word breakers are the most effective way to handle this scenario. They define where the word boundaries exist when text is being indexed. (Example: the default word-breaker indexes "the dog-catcher" as [the] [dog] [catcher]). You can create a custom word breaker to split digits and letters (example: "123abc" becomes [123] [abc]). This will enable a search for "123" to match "123abc".
Unfortunately I don't have an example to show. I recommend you start with the link above and also look at the Windows Search SDK docs starting with the links below. (MSSQL and Windows Search both use the same technology.)
Windows Search Developer's Guide
Windows Search: Extending the Index
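The splitting behaviour described above can be illustrated outside SQL Server. The following Python sketch is not a word breaker; it only inserts a space at every digit/letter boundary. One possible use, which is an assumption and not part of the original answer, is to store text pre-processed this way in a separate indexed column so that the default word breaker sees 123 and abc as separate words. The name split_digits_letters is hypothetical.

```python
import re

# Zero-width boundary between a digit and a letter (in either order).
# [^\W\d_] matches any Unicode letter, so Persian letters are covered too.
_BOUNDARY = re.compile(r"(?<=\d)(?=[^\W\d_])|(?<=[^\W\d_])(?=\d)")


def split_digits_letters(text: str) -> str:
    """Insert a space wherever a run of digits touches a run of letters."""
    return _BOUNDARY.sub(" ", text)


if __name__ == "__main__":
    print(split_digits_letters("123abc"))      # 123 abc
    print(split_digits_letters("2014/01/01"))  # 2014/01/01 (unchanged)
```

A date such as 2014/01/01 is left alone because the slashes are neither digits nor letters, so no digit/letter boundary exists inside it.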

Have you tried the NEAR search?
select *
from document
where contains (Description,'123 NEAR abc')

Related

Search for part of the word in the phrase with full text search in SQL Server 2016

In Microsoft SQL Server, full-text search only matches from the start of a word. That is, we cannot search for text in the middle of a word the way the LIKE operator can.
I tried to execute the query below, but the result is not what I expected.
I want to search for the middle of a term. For example, if my term is "Microsoft" and my query is:
SELECT *
FROM dbo.SMS_Outbox
WHERE CONTAINS(MessageText, N'"*soft*"')
There is no result returned!
The documentation is quite clear that wildcards are allowed only at the end of search terms:
The CONTAINS predicate supports the use of the asterisk (*) as a wildcard character to represent words and phrases. You can add the asterisk only at the end of the word or phrase. The presence of the asterisk enables the prefix-matching mode. In this mode, matches are returned if the column contains the specified search word followed by zero or more other characters.
You cannot do what you want easily. One simple option is to switch to LIKE and take the performance hit:
WHERE MessageText LIKE N'%soft%'
Another option might be to parse your text in such a way that soft is always at the beginning of a search term.
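One way to read that last suggestion is to index every suffix of each word, so that a prefix search such as "soft*" can match the suffix soft of Microsoft. The Python sketch below only shows the suffix generation; the function name suffixes and the min_len cutoff are assumptions made for illustration, and how the suffixes are stored and indexed is left open.

```python
from typing import List


def suffixes(word: str, min_len: int = 3) -> List[str]:
    """Return every suffix of `word` with at least `min_len` characters.

    Words shorter than `min_len` yield an empty list.
    """
    word = word.lower()
    return [word[i:] for i in range(len(word) - min_len + 1)]


if __name__ == "__main__":
    # A prefix search for "soft" can now match one of the stored suffixes.
    print(suffixes("Microsoft"))
```

The trade-off is index size: a word of n characters produces up to n - min_len + 1 entries.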

How to use Lucene Luke for testing search results on more than one field?

I am using Lucene Luke to test search index results and noticed that I cannot select more than one field in the 'Default field' drop-down list. Is this by design, or can the Luke tool not be used to search against multiple fields?
Basically, I would like to know the Lucene equivalent of Solr's qf (query fields) parameter.
Thanks
You can search using the format field:query.
For details see: https://lucene.apache.org/core/8_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#package.description
Lucene supports fielded data. When performing a search you can either
specify a field, or use the default field. The field names and default
field is implementation specific.
You can search any field by typing the field name followed by a colon
":" and then the term you are looking for.
As an example, let's assume a Lucene index contains two fields, title
and text and text is the default field. If you want to find the
document entitled "The Right Way" which contains the text "don't go
this way", you can enter:
title:"The Right Way" AND text:go
or
title:"The Right Way" AND go
Since text is the default field, the field indicator is not required.
Note: The field is only valid for the term that it directly precedes,
so the query
title:The Right Way
will only find "The" in the title field. It will find "Right" and "Way"
in the default field (in this case the text field).
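Following the field:query syntax quoted above, a query over several fields can be written by repeating the term once per field and joining the clauses with OR. The Python helper below is a minimal sketch that builds such a string to paste into Luke; the name multi_field_query is hypothetical, and the helper does not escape query-parser special characters. In Java code, Lucene's MultiFieldQueryParser class serves a similar purpose.

```python
def multi_field_query(fields, term):
    """Build a classic query-parser string that searches `term` in every field."""
    # Quote multi-word terms so the field prefix applies to the whole phrase,
    # not just to the first word (see the note about title:The Right Way).
    if any(ch.isspace() for ch in term):
        term = '"' + term + '"'
    return " OR ".join(f"{field}:{term}" for field in fields)


if __name__ == "__main__":
    print(multi_field_query(["title", "text"], "way"))
    print(multi_field_query(["title", "text"], "Right Way"))
```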

SQL Full-Text Search can't find numeric value in nvarchar field

I have a stored procedure that uses full-text search on my nvarchar fields. I got stuck when I realized that full-text search can't find a row if I type only the numeric part of the field's value.
For example, I have field Name in my table with value 'Request_121'
If I type Запрос_120 or Request - it's okay
If I type 120 - nothing is found
What is going on?
Screenshots:
No results found: https://gyazo.com/9e9e061ce68432c368db7e9162909771
Results found: https://gyazo.com/e4cb9a06da5bf8b9f4d702c55e7f181e
You cannot find 121 in your full-text indexed column because SQL Server treats Request_121 as a single term. You can verify this by running the full-text parser manually:
select * from sys.dm_fts_parser('"Request_121"', 1033, 0, 0)
This returns Request_121 as a single term, while running:
select * from sys.dm_fts_parser('"Request 121"', 1033, 0, 0)
picks up 121 as a separate search term.
What you could do is to try using wildcards in your FTS query like:
FROM dbo.CardSearchIndexes idx WHERE CONTAINS(idx.Name, '"121*"');
However, I doubt this will match 121 when it sits inside a non-breakable word; it will only match when 121 is a standalone word. Experiment with sys.dm_fts_parser to see how the full-text engine breaks up your input and adjust your query accordingly.
UPDATE: I've noticed that you use Cyrillic search terms together with English ones. When running full-text queries, it is also important to know which language was specified when the full-text index was created for the Name column. If the full-text language is Cyrillic, the query will not find the English term Request in the Name column.
Note that in my dm_fts_parser examples above I used language ID 1033 (English). Examine the LANGUAGE language_term option in your CREATE FULLTEXT INDEX statement to check which language was used for the index.
I have field Name in my table with value 'Request_121'
Your query is wrong, you have a typo, write 121 instead of 120
FROM dbo.CardSearchIndexes idx WHERE CONTAINS(idx.Name, '121');

Word search within texts to find text that contains most matched variant

I want to find the most suitable row in a table, that is, the row containing the word most similar to the word I'm entering. Any ideas? (I'm using OCR, which sometimes misreads words, for example 'specific' as 'spccific'.)
If you are using Oracle, you can try UTL_MATCH, which uses the Levenshtein distance to calculate the minimum number of edits needed to transform one string into another. Other systems may have something similar, or you can use the algorithm as a starting point for your own function.
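For anyone writing their own function, here is a minimal Python sketch of the Levenshtein distance and of picking the closest candidate word. It is a textbook dynamic-programming version, not Oracle's implementation, and the names levenshtein and closest_word are chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn `a` into `b`."""
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        previous = current
    return previous[-1]


def closest_word(query, candidates):
    """Return the candidate with the smallest edit distance to `query`."""
    return min(candidates, key=lambda word: levenshtein(query, word))


if __name__ == "__main__":
    print(levenshtein("specific", "spccific"))
    print(closest_word("spccific", ["special", "specific", "pacific"]))
```

Comparing the query against every word in a large table this way is slow, so in practice the candidate set would usually be narrowed first.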
Maybe you can use the SOUNDEX functionality (SQL Server) or SOUNDS LIKE (MySQL) if it is available with the SQL engine you are using.

Why doesn't SQL Full Text Indexing return results for words containing #?

For instance, my query is like the following using SQL Server 2005:
SELECT * FROM Table WHERE FREETEXT(SearchField, 'c#')
I have a full text index defined to use the column SearchField which returns results when using:
SELECT * FROM Table WHERE SearchField LIKE '%c#%'
I believe # is a special character, so how do I get FREETEXT to work correctly for the query above?
The # char is indexed as punctuation and therefore ignored, so it looks like we'll remove the letter C from our word indexing ignore lists.
Tested it locally after doing that and rebuilding the indexes and I get results!
We are also looking at using a different word breaker language on the indexed column, so that those special characters aren't ignored.
EDIT: I also found this information:
c# is indexed as c (if c is not in your noise word list, see more on noise word lists later), but C# is indexed as C# (in SQL 2005 and SQL 2000 running on Win2003 regardless if C or c is in your noise word list). It is not only C# that is stored as C#, but any capital letter followed by #. Conversely, c++ (and any other lower-cased letter followed by ++) is indexed as c (regardless of whether c is in your noise word list).
Quoting a much-replicated help page about Indexing Service query language:
To use specially treated characters such as &, |, ^, #, @, $, (, ), in a query, enclose your query in quotation marks (").
As far as I know, full text search in MSSQL is also done by the Indexing Service, so this might help.