Fulltext search numbers does not work in SQL Server 2012 - sql

Full-text search for numbers does not work in SQL Server 2012.
I tried creating an empty stoplist and repopulating the index. Can anyone tell me what I am doing wrong here?
CREATE FULLTEXT CATALOG Orders_FTS
WITH ACCENT_SENSITIVITY = OFF;
GO
CREATE FULLTEXT INDEX ON dbo.Orders
(
a Language 1031,
b Language 1031,
c Language 1031,
d Language 1031
)
KEY INDEX [PK_Orders]
ON Orders_FTS;
GO
CREATE FULLTEXT STOPLIST EMPTY_STOPLIST;
ALTER FULLTEXT STOPLIST empty_stoplist DROP ALL;
ALTER FULLTEXT INDEX ON Orders SET STOPLIST EMPTY_STOPLIST;
ALTER FULLTEXT INDEX ON Orders SET STOPLIST = OFF;
ALTER FULLTEXT INDEX ON Orders START UPDATE POPULATION;
The SQL query:
SELECT
T.*, R.RANK
FROM
Orders As T
INNER JOIN
CONTAINSTABLE(Orders, *, '"*007440147*"') AS R On T.ID = R.[KEY]
ORDER BY
RANK DESC, ID DESC

The problem is that leading wildcards (ex: *bcde) are not supported by SQL Server full-text search. (More here.) The query executes without errors, but the leading asterisk is not treated as a wildcard, so it will not return the substring matches you expect. Full-text search only supports prefix terms, that is, a wildcard at the end of a word (ex: "abcd*").
Usually this can be worked around by creating columns that contain the reverse string and searching on those columns (ex: Column1 = abcde, Column1Reverse = edcba, query has CONTAINS(Column1Reverse, '"edcb*"')).
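A minimal sketch of the reversed-column workaround, assuming column a in dbo.Orders is nvarchar(100) and holds a single token per row. The column name aReverse is hypothetical, and the helper column would have to be kept in sync by the application or a trigger.

```sql
-- Assumption: dbo.Orders.a is nvarchar(100); aReverse is a hypothetical helper column.
ALTER TABLE dbo.Orders ADD aReverse nvarchar(100) NULL;
GO
-- Store each value back to front, so a suffix of a becomes a prefix of aReverse.
UPDATE dbo.Orders SET aReverse = REVERSE(a);
GO
-- Add the helper column to the existing full-text index.
-- With default change tracking this is expected to trigger a population.
ALTER FULLTEXT INDEX ON dbo.Orders ADD (aReverse LANGUAGE 1031);
GO
-- "Ends with 440147" becomes "starts with 741044" on the reversed column.
SELECT ID, a
FROM dbo.Orders
WHERE CONTAINS(aReverse, '"741044*"');
```

Reversing the whole value also reverses the word order, so this only behaves predictably when each value is a single word or number.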
However in your case you want to use a wildcard at the beginning and end of the string. I think your options are limited to:
If you don't need a leading wildcard, then don't use it. For example, if the text you are trying to match is 007440147xxx then using 007440147* in your query will work fine.
Use LIKE instead of CONTAINSTABLE, for example: SELECT * FROM Orders WHERE Column1 LIKE '%007440147%'. The downside to this approach is that you won't get a rank value and queries may take a long time to execute. (Then again, even if you could use a leading wildcard in full text searches, they would be slow.)
Redesign how the data is stored and queried. I can't offer any suggestions without understanding what these numbers mean and how they need to be queried.
Consider using another search product. I believe Lucene can perform leading wildcard searches but such searches tend to be slow.
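The first two options might look like this against the Orders table from the question. This is only a sketch: it assumes a, b, c and d are character columns, and it reuses the ID key column from the original query.

```sql
-- Option 1: prefix term, matches indexed words that start with 007440147.
SELECT T.*, R.[RANK]
FROM dbo.Orders AS T
INNER JOIN CONTAINSTABLE(dbo.Orders, *, '"007440147*"') AS R
    ON T.ID = R.[KEY]
ORDER BY R.[RANK] DESC, T.ID DESC;

-- Option 2: substring match with LIKE. No rank, and every row has to be examined.
SELECT T.*
FROM dbo.Orders AS T
WHERE T.a LIKE '%007440147%'
   OR T.b LIKE '%007440147%'
   OR T.c LIKE '%007440147%'
   OR T.d LIKE '%007440147%'
ORDER BY T.ID DESC;
```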

Related

SQL Contains() not returning results for 'The'

I have the SQL script below for querying my ContactInfoes table:
SELECT *
FROM ContactInfoes
WHERE CONTAINS(Name, 'The')
I am getting only an empty result set, although my table has an entry with the Name 'The Company'.
Why am I not getting any data here, and how can this be resolved? Any help is appreciated.
I am using SQL Server 2019.
You created the FULLTEXT index without specifying a STOPLIST, so the default (system) STOPLIST was used. By default, 'the' is a stop word and is left out of the index. If you want to search for the word 'the', create an empty STOPLIST and then assign it to your FULLTEXT INDEX.
You can list the default stop words with this query:
SELECT *
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033 -- English
Then you can create an empty STOPLIST:
CREATE FULLTEXT STOPLIST MyEmptyStopList;
GO
Then assign it to your FULLTEXT INDEX:
CREATE FULLTEXT INDEX ON table_name ... STOPLIST = MyEmptyStopList;
The simple solution for me was to turn off the stoplist using the script below:
ALTER FULLTEXT INDEX ON [tablename] Set StopList = OFF
I already had an index configured that was using the default stoplist, so I turned the stoplist off here.
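To confirm what is going on before and after changing the stoplist, something like the following can help. It is a sketch that assumes the table lives in the dbo schema; sys.dm_fts_parser requires sysadmin membership. In sys.dm_fts_parser a stoplist id of 0 means the system stoplist and NULL means no stoplist; the same convention is believed to apply to the stoplist_id column of sys.fulltext_indexes.

```sql
-- Which stoplist is attached to the full-text index on the table?
SELECT OBJECT_NAME(object_id) AS table_name, stoplist_id
FROM sys.fulltext_indexes
WHERE object_id = OBJECT_ID(N'dbo.ContactInfoes');

-- How is the text tokenized with the system stoplist (third argument 0)?
-- Stop words are expected to show up with special_term = 'Noise Word'.
SELECT display_term, special_term
FROM sys.dm_fts_parser(N'"The Company"', 1033, 0, 0);

-- Same text with no stoplist (third argument NULL), for comparison.
SELECT display_term, special_term
FROM sys.dm_fts_parser(N'"The Company"', 1033, NULL, 0);
```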

Best way to index a SQL table to find best matching string

Let's say I have a SQL table with an int PK column and an nvarchar(max) column. In the nvarchar(max) column, I have a bunch of table entries that are all like this:
SOME_PEOPLE_LIKE_APPLES
SOME_PEOPLE_LIKE_APPLES_ON_TUESDAY
SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON
SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_CAFE
SOME_PEOPLE_LIKE_APPLES_ON_THE_RIVER
.
.
.
SOME_ANTS_HATE_SYRUP
SOME_ANTS_HATE_SYRUP_WITH_STRAWBERRIES
There are millions of these rows. My goal is to find the row with the most overlap for an input searchTerm. In this case, if I input SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_MOUNTAIN, the returned entry would be the third entry from the table above, SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON.
I have a SPROC that does this very naively; it goes through the entire table as follows:
SELECT DISTINCT phrase, len(phrase) l, [id] FROM X WHERE searchTerm LIKE phrase + '%'
-- phrase is the row entry being searched against
-- searchTerm is the phrase we're searching for
I then ORDER BY length and pick the TOP only
Would there be a way to speed this up, perhaps by doing some indexing?
If this is confusing, think of it as tableRowEntry + wildcard = searchTerm
I'm on MSSQL 2008 if that makes any difference
If there is an index on your NVARCHAR-column a LIKE 'Something%' -search will be able to use it and should be pretty fast.
If there is a wildcard in the beginning you are out of luck. But - in your case - this should work.
You might use an indexed persisted computed column storing the length of the string. In this case you might reduce the workload enormously by filtering out all strings which are too short or too long.
If there are certain words in your search terms which appear often but not everywhere, you might use side columns again and filter like AND IncludePEOPLE=1 AND IncludeMOON=1
UPDATE
Here is an example
CREATE TABLE Phrase(ID INT IDENTITY
,Phrase NVARCHAR(100)
,PhraseLength AS LEN(Phrase) PERSISTED);
CREATE INDEX IX_Phrase_Phrase ON Phrase(Phrase);
CREATE INDEX IX_Phrase_PhraseLength ON Phrase(PhraseLength);
INSERT INTO Phrase
VALUES
('SOME_PEOPLE_LIKE_APPLES')
,('SOME_PEOPLE_LIKE_APPLES_ON_TUESDAY')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_CAFE')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_RIVER')
,('SOME_ANTS_HATE_SYRUP')
,('SOME_ANTS_HATE_SYRUP_WITH_STRAWBERRIES');
DECLARE @SearchTerm NVARCHAR(100)=N'SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_MOUNTAIN';
--This uses the index (checked against execution plan)
SELECT TOP 1 *
FROM Phrase
WHERE @SearchTerm LIKE Phrase + '%'
ORDER BY PhraseLength DESC;
--This might be even better, check with your high row count.
SELECT TOP 1 *
FROM Phrase
WHERE Phrase=LEFT(@SearchTerm,PhraseLength)
ORDER BY PhraseLength DESC;
GO
--Clean-Up
DROP TABLE Phrase;
The best solution here is to create a full-text search index:
https://msdn.microsoft.com/en-us/library/ms142571.aspx
Full text search is optimized for this task, once the index is created you can use full-text queries with the CONTAINS full-text function to find the matches efficiently:
SELECT DISTINCT phrase, len(phrase) l, [id] FROM X WHERE CONTAINS(phrase, searchPhrase)
Full-text queries can be combined with query hints like OPTIMIZE FOR. Full-text search also allows Boolean operators like AND and OR within the search terms, and a variety of other text-searching goodies, like being able to find variations of the same word automatically and rank results by relevance.

SQL Server 2012 - Fulltext search on nvarchar field with SHA hashes returns nothing

I have a SQL table with 60 million records and 2 columns: ID and Hash.
ID is an incremental int PK.
Hash is an nvarchar field with an index.
I created a full-text index on the Hash field like this:
CREATE FULLTEXT CATALOG hashes_catalog;
GO
CREATE FULLTEXT INDEX
ON dbo.hashes(hash LANGUAGE 1033)
KEY INDEX IXC_Hash ON hashes_catalog;
GO
But when I try to do a FullText search there are no results, for example:
SELECT * FROM Hashes WHERE CONTAINS(hash,'1CE')
Returns nothing.
But data like this exists, this is some sample of data from the table:
1CefATjZSfzDK1bn15tFv5EHzQtxmCkNQL
1CEfatKXLUomearrh7JyKgv4w1Ci1jxM8B
1CefAtyhBVXda5324NwTkfBMkEZ9YcF6vN
1CEfAUbiB2AfqjGpg8r8hxuAxTdzrDPGmv
1CEFAUzKC2Ffi8HwMSfkqTDN8deBTjXnrD
1CEfavd9sVZmLsez8JHKUHHZ7ZEAaKbp6W
1CEFAVfD55it65d6MdQpo3mnnBhBviLTh4
1CEfAVjjGrBQCkLh6qBEfwX46G213DnNhc
1Cefavph9RxQdLfasHR25B3P9W98tCGGus
1CEfavqq739Ny9sH7F1qCS5GzSpVB1Yz5g
1CEFAw68XLVRwzQSP7HNW4kd5z3JRdcPgU
If I execute
SELECT * FROM Hashes WHERE Hash like '%1CE%'
There are 129184 results.
Any ideas how to get full-text search working for fields like this? Is it even possible?
If the hash begins with this term, then you can use the following syntax:
SELECT * FROM Hashes WHERE CONTAINS(hash,'"1CE*"')
Fulltext search is probably not the best bet for searches of this sort. It works by tokenizing your text and if all the text in the field is in one single "word" (with no punctuation), there isn't a major advantage to using fulltext over LIKE.
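As a sketch of what that means in practice, the queries below assume the table is dbo.Hashes and that the existing regular index covers the Hash column. A LIKE pattern without a leading wildcard can use that index, and sys.dm_fts_parser (sysadmin only) shows how the word breaker splits one stored value.

```sql
-- Prefix match: with no leading wildcard, the regular index on Hash can be used.
SELECT ID, Hash
FROM dbo.Hashes
WHERE Hash LIKE N'1CE%';

-- Inspect how the English word breaker tokenizes a single hash value.
-- If it comes back as one long term, only prefix terms such as "1CE*" can match it.
SELECT occurrence, display_term
FROM sys.dm_fts_parser(N'"1CEfatKXLUomearrh7JyKgv4w1Ci1jxM8B"', 1033, 0, 0);
```

The prefix query returns fewer rows than LIKE '%1CE%', because it only matches hashes that begin with 1CE rather than those that contain it anywhere.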

Full-Text Catalog unable to search Primary Key

I have created this small sample table, Person.
The Id is a primary key with identity = True.
Now when I try to create a full-text catalog, I'm unable to search for the Id.
If I follow this wizard, I end up with a catalog where it is only possible to search for a person's name. I would really like to know how I can make it possible to do a full-text search for both the Id and the Name.
edit
SELECT *
FROM [TestDB].[dbo].[Person]
WHERE FREETEXT (*, 'anders' );
SELECT *
FROM [TestDB].[dbo].[Person]
WHERE FREETEXT (*, '1' );
I would like them to return the same result. The first returns id = 1, name = Anders, while the second query doesn't return anything.
edit 2
Looks like the problem is in using int, but is it not possible to trick FullText to support it?
edit 3
I created a view where I convert the int to an nvarchar: CONVERT(nvarchar(50), Id) AS PersonId. This made it possible for me to select that column when creating the full-text catalog, but it still won't let me find the row by searching for the Id.
From reading your question, I am not sure that you understand the purpose of a full-text index. A full-text index is intended to search TEXT (one or more columns) on a table. And not just as a replacement for:
SELECT *
FROM table
WHERE col1 LIKE 'Bob''s Pizzaria%'
OR col2 LIKE 'Bob''s Pizzaria%'
OR col3 LIKE 'Bob''s Pizzaria%'
It also allows you to search for variations of "Bob's Pizzaria" like "Bobs Pizzeria" (in case someone misspells Pizzeria or forgot to put in the ' or over-zealous anti-SQL-injection code stripped the ') or "Robert's Pizza" or "Bob's Pizzeria" or "Bob's Pizza", etc. It also allows you to search "in the middle" of a text column (char, varchar, nchar, nvarchar, etc.) without the dreaded "%Bob%Pizza%" that eliminates any chance of using a traditional index.
Enough with the lecture, however. To answer your specific question, I would create a separate column (not a "computed column") "IdText varchar(10)" and then an AFTER INSERT trigger something like this:
UPDATE t
SET IdText = CAST(Id AS varchar(10))
FROM table AS t
INNER JOIN inserted i ON i.Id = t.Id
If you don't know what "inserted" is, see this MSDN article or search Stack Overflow for "trigger inserted". Then you can include the IdText column in your full-text index.
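Put together, the approach could look like the sketch below. It assumes the table is dbo.Person with an int identity column Id and an existing full-text index; the trigger name trg_Person_IdText is made up for illustration.

```sql
-- Assumption: dbo.Person(Id int IDENTITY PRIMARY KEY, Name ...) already has a full-text index.
ALTER TABLE dbo.Person ADD IdText varchar(10) NULL;
GO
-- Backfill the rows that already exist.
UPDATE dbo.Person SET IdText = CAST(Id AS varchar(10));
GO
CREATE TRIGGER dbo.trg_Person_IdText
ON dbo.Person
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- "inserted" holds the rows added by the statement that fired the trigger.
    UPDATE p
    SET IdText = CAST(i.Id AS varchar(10))
    FROM dbo.Person AS p
    INNER JOIN inserted AS i ON i.Id = p.Id;
END;
GO
-- Make the text copy of the Id searchable.
ALTER FULLTEXT INDEX ON dbo.Person ADD (IdText);
GO
-- Single digits may be stop words in the system stoplist; this check shows whether '1' is one.
SELECT stopword
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033 AND stopword = N'1';
```

If the last query returns a row, a search for '1' will still find nothing until the index uses a different stoplist or has its stoplist turned off.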
Again, from your example I am not sure that a full-text index is what you should use here but then again your actual situation might be something different and you just created this example for the question. Also, full-text indexes are relatively "expensive" so make sure you do a cost-benefit analysis. Here is a Stack Overflow question about full-text index usage.
Why not just do a query like this.
SELECT *
FROM [TestDB].[dbo].[Person]
WHERE FREETEXT (*, '1' )
OR ID = 1
You can leave off the "OR ID = 1" part when the search term is not a number, which you can check first.
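One way to do that check in T-SQL is sketched below. The variable names are hypothetical, and the DECLARE-with-initializer syntax needs SQL Server 2008 or later. When the term is not a plain non-negative integer, @id stays NULL and the Id comparison simply matches nothing.

```sql
DECLARE @term nvarchar(50) = N'1';   -- hypothetical user input

-- Only digits, non-empty, and short enough to fit in an int; otherwise NULL.
DECLARE @id int =
    CASE WHEN @term <> N''
          AND LEN(@term) <= 9
          AND @term NOT LIKE N'%[^0-9]%'
         THEN CAST(@term AS int)
    END;

SELECT *
FROM [TestDB].[dbo].[Person]
WHERE FREETEXT(*, @term)
   OR Id = @id;   -- never true when @id is NULL
```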

SQL Server 2005 FTS unexpected results

I have an Indexed View with two columns, a primary key char and a field for full-text indexing varchar(300). I need to search from a MS Great Plains database, so I created the view to populate a field with concatenated values from my primary table IV00101.
CREATE VIEW SearchablePartContent WITH SCHEMABINDING AS
SELECT ITEMNMBR, rtrim(ITEMNMBR)+' '+rtrim(ITMSHNAM)+' '+rtrim(ITMGEDSC)
as SearchableContent
FROM dbo.IV00101
GO
-- create the index on the view to be used as full text key index
CREATE UNIQUE CLUSTERED INDEX IDX_ITEMNMBR ON SearchablePartContent(ITEMNMBR)
CREATE FULLTEXT INDEX ON SearchablePartContent(SearchableContent) KEY INDEX IDX_ITEMNMBR ON Cat1_PartContent
WHILE fulltextcatalogproperty('Cat1_PartContent','populatestatus') <> 0
BEGIN
WAITFOR DELAY '00:00:01'
END
The problem is that when I do a search with particular keyword(s) it will yield unexpected results. For instance a simple query such as:
SELECT * FROM SearchablePartContent WHERE CONTAINS(SearchableContent, 'rotor')
should yield 5 results, but instead I get 1. There are about 72,000 records indexed. However, if I do a LIKE comparison, I get the expected rows. My data is not complex; here are a couple of results that should be returned from my query, but are not:
MN-H151536 John Chopper, Rotor Assembly Monkey 8820,9600,8820FRT
MN-H152756 John Rotor, Bearing 9650STS,9750STS1
MN-H160613 John Rotor, Bearing 9650STS,9750STS2
Any help would be greatly appreciated. Thanks
Just a thought: Try enclosing your search term with double quotes to see if it makes a difference.
SELECT * FROM SearchablePartContent WHERE CONTAINS(SearchableContent, ' "rotor" ')