SQL full-text search can't find numeric value in nvarchar field

I have a stored procedure that uses full-text search on my nvarchar fields. I got stuck when I realized that full-text search can't find a row if I type only the numeric part of the field's value.
For example, I have a field Name in my table with the value 'Request_121'.
If I type Запрос_120 or Request, it's okay.
If I type 120, nothing is found.
What is going on?
Screenshots:
No results found: https://gyazo.com/9e9e061ce68432c368db7e9162909771
Results found: https://gyazo.com/e4cb9a06da5bf8b9f4d702c55e7f181e

You cannot find the 121 part in your full-text-indexed column because SQL Server treats Request_121 as a single term. You can verify this by running the FTS parser manually:
select * from sys.dm_fts_parser('"Request_121"', 1033, 0, 0)
Returns the whole string as a single term, while running:
select * from sys.dm_fts_parser('"Request 121"', 1033, 0, 0)
Returns two terms. Note that in the second example 121 was picked up as a separate search term.
What you could do is try using wildcards in your FTS query, like:
SELECT * FROM dbo.CardSearchIndexes idx WHERE CONTAINS(idx.Name, '"121*"');
However, I doubt it will pick up 121 when it sits inside a non-breakable word part; it will only match when 121 appears as a standalone word. Play with sys.dm_fts_parser to see how the SQL FTS engine breaks up your input and adjust your query accordingly.
UPDATE: I've noticed that you use Cyrillic search terms together with English ones. When running FTS queries it's also important to know which language was specified when the full-text index was created for the Name column. If the FTS language is a Cyrillic one, it will not find the English term Request in the Name column.
Note that in my dm_fts_parser examples above I used the 1033 (English) language id. Examine the LANGUAGE language_term clause in your CREATE FULLTEXT INDEX statement to check which language was used for the FTS index.
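If you would rather check this from the catalog views than from the CREATE statement, a minimal sketch (assuming the dbo.CardSearchIndexes table from the query above) would be:
-- Lists the word-breaker language configured for each full-text indexed column.
SELECT c.name AS column_name,
       fl.name AS fts_language
FROM sys.fulltext_index_columns AS fic
JOIN sys.columns AS c
  ON c.object_id = fic.object_id AND c.column_id = fic.column_id
JOIN sys.fulltext_languages AS fl
  ON fl.lcid = fic.language_id
WHERE fic.object_id = OBJECT_ID('dbo.CardSearchIndexes');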

I have a field Name in my table with the value 'Request_121'
Your query is wrong - you have a typo; write 121 instead of 120:
SELECT * FROM dbo.CardSearchIndexes idx WHERE CONTAINS(idx.Name, '121');

Related

MS SQL fulltext search with ignoring special characters

I have the following (simplified) database structure:
[places]
name NVARCHAR(255)
description TEXT (usually quite a lot of text)
region_id INT FK
[regions]
id INT PK
name NVARCHAR(255)
[regions_translations]
lang_code NVARCHAR(5) FK
label NVARCHAR(255)
region_id INT FK
In the real db I have a few more fields in the [places] table to search in, and a [countries] table with a structure similar to [regions].
My requirements are:
Search using name, description and region label, with the same behaviour as name LIKE '%text%' OR description LIKE '%text%' OR regions_translations.label LIKE '%text%'
Ignore all special characters like Ą, Ć, Ó, Š, Ö, Ü, etc., so that, for example, when someone searches for
PO ZVAIGZDEM I return the place named PO ŽVAIGŽDĖM - but of course also return this record when the user types the proper accented characters.
Quite fast. ;)
I had a few approaches to solve this issue.
Create a new column 'searchable_content', normalize the text (replace Ą -> A, Ö -> O and so on) and just do a simple SELECT ... FROM places WHERE searchable_content LIKE '%text%' - but it was slow.
Add a full-text search index to the places and regions_translations tables - it was faster, but I could not find a way to ignore special characters (the characters come from various languages, so specifying an index language will not work).
Create a new column as in the first attempt, and add a full-text index only on that column - it was faster than attempt 1 (probably because I do not need to join tables) and I could manually normalize the content, but it feels like a poor solution.
Question is - what is the best approach here?
My top priority is to ignore special characters.
EDIT:
ALTER FULLTEXT CATALOG [catalog_name] REBUILD WITH ACCENT_SENSITIVITY = OFF
This is probably the solution to my issue with special characters (I need to test it a bit more) - I queried too quickly, before the index had been rebuilt, which is why I did not get any records at first.
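For what it's worth, one way to confirm the rebuild actually took effect is to read the catalog property afterwards; a minimal sketch, using the [catalog_name] placeholder from the statement above:
-- Returns 0 when the catalog is accent-insensitive, 1 when accent-sensitive.
SELECT FULLTEXTCATALOGPROPERTY('catalog_name', 'AccentSensitivity') AS accent_sensitivity;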
You can use the COLLATE clause on a column to specify a SQL collation that will treat these special characters as their non-accented counterparts. Think of it as essentially casting one data type as another, except you're casting é as e (for example). You can use the same tool to return case-sensitive or case-insensitive results.
The documentation talks a little more about it, and you can do a search to find exactly which collation works best for you.
https://learn.microsoft.com/en-us/sql/t-sql/statements/collations?view=sql-server-ver16
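For illustration only (the collation name here is an assumption; pick whichever accent-insensitive collation suits your languages), an accent- and case-insensitive comparison looks like this:
-- _CI_AI = case-insensitive, accent-insensitive, so ZVAIGZDEM can match ŽVAIGŽDĖM.
SELECT name
FROM places
WHERE name COLLATE Latin1_General_CI_AI LIKE '%ZVAIGZDEM%';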

Separate between words in SQL Server full-text search

I use SQL Server 2012 and have full-text search on the Description field of the Document table.
I can't find numbers that are attached to alphabetic characters with the following query:
select *
from document
where contains (Description,'"123"')
How can I separate numbers from alphabetic characters? Some numbers should not be separated, such as the date 2014/01/01.
The Description text is in Persian.
Thanks in advance.
Word breakers are the most effective way to handle this scenario. They define where the word boundaries exist when text is being indexed. (Example: the default word-breaker indexes "the dog-catcher" as [the] [dog] [catcher]). You can create a custom word breaker to split digits and letters (example: "123abc" becomes [123] [abc]). This will enable a search for "123" to match "123abc".
Unfortunately I don't have an example to show. I recommend you start with the word-breaker documentation and also look at the Windows Search SDK docs, starting with the links below. (MSSQL and Windows Search both use the same technology.)
Windows Search Developer's Guide
Windows Search: Extending the Index
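Before investing in a custom word breaker, it may also help to see how the current one tokenizes your text; a minimal sketch using sys.dm_fts_parser, where 1065 (Persian) is an assumption - use the LCID your index was actually created with, e.g. 1033 for English, if Persian is not registered as a full-text language on your instance:
-- Shows the terms produced for a mixed digit/letter string and for a date,
-- so you can see whether "123" is split out as its own term.
SELECT * FROM sys.dm_fts_parser(N'"abc123 2014/01/01"', 1065, 0, 0);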
Have you tried the NEAR search?
select *
from document
where contains (Description,'123 NEAR abc')

Fastest way to find string by substring in SQL?

I have a huge table with 2 columns: Id and Title. Id is bigint and I'm free to choose the type of the Title column: varchar, char, text, whatever. The Title column contains random text strings like "abcdefg", "q", "allyourbasebelongtous", with a maximum of 255 chars.
My task is to find strings by a given substring. Substrings also have random length and can be at the start, middle or end of a string. The most obvious way to do it:
SELECT * FROM t WHERE Title LIKE '%abc%'
I don't care about INSERT performance, I only need fast selects. What can I do to make the search as fast as possible?
I use MS SQL Server 2008 R2; full-text search will be useless, as far as I can see.
If you don't care about storage, you can create another table with partial Title entries, one starting at each character position (up to 255 entries per title).
This way you can index these substrings and match only at the beginning of the string, which should greatly improve performance.
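A minimal sketch of that idea, assuming the t(Id, Title) table from the question (the helper table and index names here are made up):
-- Store every trailing substring of Title so an index-friendly prefix search
-- (LIKE 'abc%') can replace the unindexable LIKE '%abc%'.
CREATE TABLE TitleSuffixes (
    Id     BIGINT       NOT NULL,  -- references t.Id
    Suffix VARCHAR(255) NOT NULL
);
CREATE INDEX IX_TitleSuffixes_Suffix ON TitleSuffixes (Suffix);

-- Populate: one row per trailing substring of each title.
INSERT INTO TitleSuffixes (Id, Suffix)
SELECT t.Id, SUBSTRING(t.Title, n.n, 255)
FROM t
JOIN (SELECT TOP (255) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects) AS n
  ON n.n <= LEN(t.Title);

-- The search becomes an indexed prefix match instead of a full scan.
SELECT DISTINCT t.*
FROM t
JOIN TitleSuffixes s ON s.Id = t.Id
WHERE s.Suffix LIKE 'abc%';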
If you want to use less space than Randy's answer and there is considerable repetition in your data, you can create an N-ary tree data structure where each edge is the next character, and hang each string and trailing substring in your data off it.
Number the nodes in depth-first order. Then you can create a table with up to 255 rows for each of your records, holding the Id of your record and the node id in your tree that matches the string or trailing substring. When you do a search, find the node id that represents the string you are searching for (and all trailing substrings) and do a range search.
Sounds like you've ruled out all good alternatives.
You already know that your query
SELECT * FROM t WHERE TITLE LIKE '%abc%'
won't use an index, it will do a full table scan every time.
If you were sure that the string was at the beginning of the field, you could do
SELECT * FROM t WHERE TITLE LIKE 'abc%'
which would use an index on Title.
Are you sure full text search wouldn't help you here?
Depending on your business requirements, I've sometimes used the following logic:
Do a "begins with" query (LIKE 'abc%') first, which will use an index.
Depending on whether any rows are returned (or how many), conditionally move on to the "harder" search that does the full scan (LIKE '%abc%')
Depends on what you need, of course, but I've used this in situations where I can show the easiest and most common results first, and only move on to the more difficult query when necessary.
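A minimal sketch of that two-stage logic, assuming the t(Id, Title) table from the question:
DECLARE @term VARCHAR(255) = 'abc';

-- Stage 1: index-friendly prefix search.
SELECT * FROM t WHERE Title LIKE @term + '%';

-- Stage 2: fall back to the full-scan substring search only if stage 1 found nothing.
IF @@ROWCOUNT = 0
    SELECT * FROM t WHERE Title LIKE '%' + @term + '%';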
You can add a computed column to the table: TitleLength AS LEN(Title) PERSISTED. This stores the length of the Title column. Create an index on it.
Also add another computed column called ReverseTitle AS REVERSE(Title) PERSISTED.
Now when someone searches for a keyword, check if the length of the keyword equals TitleLength; if so, do an "=" search. If the keyword is shorter than TitleLength, do a LIKE - but first do Title LIKE 'abc%', then ReverseTitle LIKE 'cba%'. Similar to Brad's approach - i.e. you only run the more expensive query when required.
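A sketch of those columns and indexes, again assuming the t(Id, Title) table from the question (the index names are made up):
ALTER TABLE t ADD TitleLength  AS LEN(Title)     PERSISTED;
ALTER TABLE t ADD ReverseTitle AS REVERSE(Title) PERSISTED;

CREATE INDEX IX_t_TitleLength  ON t (TitleLength);
CREATE INDEX IX_t_ReverseTitle ON t (ReverseTitle);

-- Prefix search from the front, then from the back (the second finds titles ending in 'abc').
SELECT * FROM t WHERE Title LIKE 'abc%';
SELECT * FROM t WHERE ReverseTitle LIKE REVERSE('abc') + '%';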
Also, if the 80-20 rule applies to your keywords/substrings (i.e. most searches are for a minority of the keywords), you can also consider some sort of caching. For example, say you find that many users search for the keyword "abc" and this search returns records with ids 20, 22, 24, 25 - you can store this in a separate table and have it indexed.
Now when someone searches for a keyword, first look in this "cache" table to see whether the search was already performed by an earlier user. If so, there is no need to hit the main table again; simply return the results from the "cache" table.
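A minimal sketch of such a cache table (all names here are hypothetical):
CREATE TABLE KeywordCache (
    Keyword VARCHAR(255) NOT NULL,
    Id      BIGINT       NOT NULL,  -- matching t.Id
    PRIMARY KEY (Keyword, Id)
);

-- Check the cache first; only fall back to the main table on a miss.
SELECT t.*
FROM KeywordCache kc
JOIN t ON t.Id = kc.Id
WHERE kc.Keyword = 'abc';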
You can also combine the above with SQL Server full-text search (assuming you have a valid reason not to rely on it alone). You could use full-text search first to shortlist the result set, and then run a SQL query against your table to get exact results, using the Ids returned by the full-text search as a parameter along with your keyword.
All this obviously assumes you have to use SQL. If not, you can explore something like Apache Solr.
You can create an indexed view (a newer SQL Server feature), put an index on the column you need to search, and then query that view; it will give you faster results.
Use an ASCII character set and a clustered index on the char column. The character set influences search performance because of the data size in both RAM and on disk; the bottleneck is often I/O.
Your column is 255 characters long, so you can use a normal index on your char field rather than full-text search, which is faster. Do not select unnecessary columns in your SELECT statement.
Lastly, add more RAM to the server and increase the cache size.
One more thing: put a primary key on the specific column and index it as a clustered index.
Then search using any method (wildcard, = or anything else); it will search optimally because the table is already clustered, so the engine knows where to look (the column is already in sorted order).

Full-text search (SQL Server 2005) works only on some fields

OK, this is the situation:
I am enabling full-text search on a table, but it only works on some fields.
CREATE FULLTEXT CATALOG [defaultcatalog]
CREATE UNIQUE INDEX ui_staticid ON static(id)
CREATE FULLTEXT INDEX ON static(title_gr LANGUAGE 19, title_en, description_gr LANGUAGE 19, description_en) KEY INDEX ui_staticid ON [defaultcatalog] WITH CHANGE_TRACKING AUTO
Now, why does the following bring results:
Select * from static where freetext(description_en, N'str')
and this does not (while both have text with str in it)?
Select * from static where freetext(description_gr, N'str')
(I have also tried it without the language specification - Greek in this case.)
(The collation of the database is Greek_CI_AS.)
By the way,
Select * from static where description_gr like N'%str%'
works just fine.
All fields are of nvarchar type and the _gr fields hold both English and Greek text (that should not matter).
All help will be greatly appreciated
Just trying to figure out what's going on: what do you get with this query here?
SELECT * FROM static WHERE FREETEXT(*, N'str')
If you're not explicitly specifying any column to search in - does it give you the expected results?
Another point: I think you have a wrong language ID in your statement. According to SQL Server Books Online:
When specified as a string, language_term corresponds to the alias column value in the syslanguages system table. The string must be enclosed in single quotation marks, as in 'language_term'. When specified as an integer, language_term is the actual LCID that identifies the language.
and from what I found searching around on the internet, the LCID for Greek is 1032 - not 19. Can you try with 1032 instead of 19? Does that make a difference?
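For illustration only, the index definition from the question with 1032 substituted for 19 would look like this (a sketch; the table, column, index and catalog names are taken from the question):
CREATE FULLTEXT INDEX ON static
(
    title_gr       LANGUAGE 1032,  -- Greek LCID
    title_en,
    description_gr LANGUAGE 1032,
    description_en
)
KEY INDEX ui_staticid ON [defaultcatalog]
WITH CHANGE_TRACKING AUTO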
Marc

Why doesn't SQL Full Text Indexing return results for words containing #?

For instance, my query is like the following using SQL Server 2005:
SELECT * FROM Table WHERE FREETEXT(SearchField, 'c#')
I have a full text index defined to use the column SearchField which returns results when using:
SELECT * FROM Table WHERE SearchField LIKE '%c#%'
I believe # is a special character, so how do I allow FREETEXT to work correctly for the query above?
The # char is indexed as punctuation and therefore ignored, so it looks like we'll remove the letter C from our word indexing ignore lists.
Tested it locally after doing that and rebuilding the indexes and I get results!
Looking at using a different word breaker language on the indexed column, so that those special characters aren't ignored.
EDIT: I also found this information:
c# is indexed as c (if c is not in your noise word list; see more on noise word lists later), but C# is indexed as C# (in SQL 2005, and in SQL 2000 running on Win2003, regardless of whether C or c is in your noise word list). It is not only C# that is stored as C#, but any capital letter followed by #. Conversely, c++ (and any other lower-case letter followed by ++) is indexed as c (regardless of whether c is in your noise word list).
Quoting a much-replicated help page about the Indexing Service query language:
To use specially treated characters such as &, |, ^, #, @, $, (, ), in a query, enclose your query in quotation marks (").
As far as I know, full text search in MSSQL is also done by the Indexing Service, so this might help.
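Applied to the query from the question, that advice would look something like the sketch below. Note it uses CONTAINS rather than the original FREETEXT, since CONTAINS accepts an exact quoted phrase; whether it actually matches still depends on how the word breaker indexed the # in the first place (see the quoted behaviour above):
-- Enclose the term containing the special character in double quotes inside the search condition.
SELECT * FROM [Table] WHERE CONTAINS(SearchField, N'"c#"');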