Specific word only with CONTAINS, FREETEXT? - sql

How do I search for only a specific word in full-text indexed fields?
I know I could just do "where field='word'" but I would rather have the search form as generic as possible and throw the search term to the CONTAINS, FREETEXT functions.
Seems there should be a word boundary or end of phrase character that could be used but there doesn't seem to be one.
I'm using MS SQL Server 2005.

In order to use Contains or FreeText, you must have a full-text index on the column(s) in question. Assuming you do, you can force a search for a specific word by enclosing it in double quotes:
Select ..
From Table
Where Contains( SomeCol, '"word"')
See the CONTAINS (Transact-SQL) documentation.
Update
If what you seek is to return results where the content is only and exactly your search term, then the only way to do that is to use = or Like without a wildcard. That is, the full-text search predicates offer no way to match the entire contents of a column exactly. Thus, you must use one of the following:
Select ..
From Table
Where SomeCol = 'word'
or:
Select ..
From Table
Where SomeCol Like 'word'

Related

Is there a way to optimize a simple SQL request with a big database

I'm using SQLite with Unity3D to create a word game.
I have a big database of all existing words (700k words), and I just want to check whether a word exists in the database.
Here is my query: SELECT Word FROM Words WHERE Word = 'IA' COLLATE NOCASE
This takes approximately 0.3599885 sec to execute.
Is it possible to reduce the query time?
OK, I just found that COLLATE NOCASE is really slow.
I used LOWER() instead.
The query takes 0.0009974 sec now.
Make sure all words are stored in the table in lower case (or all in upper case, for that matter). You can use a CHECK constraint to enforce this. Then you don't need a collation to find the words, and an indexed lookup becomes very fast.
For example:
create table words (
word varchar(50)
);
insert into words (word) values ('chicago'); -- always lower case
insert into words (word) values ('ia');
insert into words (word) values ('london');
Then create an index on the column:
create index ix1 on words (word);
Now, you can search fast:
select word from words where word = lower('IA');
Note: the LOWER() function is applied to the right-hand side of the comparison (the value). If you apply it to the column instead, the query becomes slow because the index can no longer be used.
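To check the note above for yourself, here is a minimal sketch using Python's built-in sqlite3 module. The CHECK constraint and the plan() helper are my own additions for illustration, not part of the answer. It asks SQLite for the query plan of both variants:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a CHECK constraint is one way to keep every stored word lower case.
conn.execute("CREATE TABLE words (word TEXT NOT NULL CHECK (word = lower(word)))")
conn.execute("CREATE INDEX ix1 ON words (word)")
conn.executemany("INSERT INTO words (word) VALUES (?)",
                 [("chicago",), ("ia",), ("london",)])

def plan(sql, params=()):
    """Return the detail text of each EXPLAIN QUERY PLAN row (hypothetical helper)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# LOWER() on the value: the column is compared as stored, so the index can seek.
fast_plan = plan("SELECT word FROM words WHERE word = lower(?)", ("IA",))
# LOWER() on the column: each row has to be transformed first, so no index seek.
slow_plan = plan("SELECT word FROM words WHERE lower(word) = ?", ("ia",))

found = conn.execute("SELECT word FROM words WHERE word = lower(?)", ("IA",)).fetchall()

# The constraint rejects mixed-case input.
try:
    conn.execute("INSERT INTO words (word) VALUES ('Paris')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(fast_plan, slow_plan, found, rejected)
```

The first plan should report a SEARCH using ix1 and the second a SCAN; the exact wording of the plan text varies between SQLite versions.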

Best way to index a SQL table to find best matching string

Let's say I have a SQL table with an int PK column and an nvarchar(max) column. In the nvarchar(max) column, I have a bunch of table entries that all look like this:
SOME_PEOPLE_LIKE_APPLES
SOME_PEOPLE_LIKE_APPLES_ON_TUESDAY
SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON
SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_CAFE
SOME_PEOPLE_LIKE_APPLES_ON_THE_RIVER
.
.
.
SOME_ANTS_HATE_SYRUP
SOME_ANTS_HATE_SYRUP_WITH_STRAWBERRIES
There are millions of these rows. My goal is to find the row with the most overlap with an input searchTerm. In this case, if I input SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_MOUNTAIN, the returned entry would be the third entry from the table above, SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON.
I have a SPROC that does this very naively by going through the entire table as follows:
SELECT DISTINCT phrase, len(phrase) l, [id] FROM X WHERE searchTerm LIKE phrase + '%'
-- phrase is the row entry being searched against
-- searchTerm is the phrase we're searching for
I then ORDER BY length and pick the TOP only
Would there be a way to speed this up, perhaps by doing some indexing?
If this is confusing, think of it as tableRowEntry + wildcard = searchTerm
I'm on MSSQL 2008 if that makes any difference
If there is an index on your NVARCHAR column, a LIKE 'Something%' search will be able to use it and should be pretty fast. (Note that an NVARCHAR(MAX) column cannot be an index key, so the column needs a bounded length such as NVARCHAR(100).)
If there is a wildcard at the beginning, you are out of luck. But in your case this should work.
You might use an indexed, persisted computed column storing the length of the string. That could reduce the workload enormously by filtering out all strings that are too short or too long.
If certain words appear often in your search terms but not everywhere, you might add further side columns and filter like AND IncludePEOPLE=1 AND IncludeMOON=1
UPDATE
Here is an example
CREATE TABLE Phrase(ID INT IDENTITY
,Phrase NVARCHAR(100)
,PhraseLength AS LEN(Phrase) PERSISTED);
CREATE INDEX IX_Phrase_Phrase ON Phrase(Phrase);
CREATE INDEX IX_Phrase_PhraseLength ON Phrase(PhraseLength);
INSERT INTO Phrase
VALUES
('SOME_PEOPLE_LIKE_APPLES')
,('SOME_PEOPLE_LIKE_APPLES_ON_TUESDAY')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_CAFE')
,('SOME_PEOPLE_LIKE_APPLES_ON_THE_RIVER')
,('SOME_ANTS_HATE_SYRUP')
,('SOME_ANTS_HATE_SYRUP_WITH_STRAWBERRIES');
DECLARE @SearchTerm NVARCHAR(100)=N'SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_MOUNTAIN';
--This uses the index (checked against execution plan)
SELECT TOP 1 *
FROM Phrase
WHERE @SearchTerm LIKE Phrase + '%'
ORDER BY PhraseLength DESC;
--This might be even better, check with your high row count.
SELECT TOP 1 *
FROM Phrase
WHERE Phrase=LEFT(@SearchTerm,PhraseLength)
ORDER BY PhraseLength DESC;
GO
--Clean-Up
DROP TABLE Phrase;
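A different angle, not from the answer above: every stored phrase that is a prefix of the search term must equal one of the term's own prefixes, so you can enumerate those prefixes on the client and do plain equality lookups, which an ordinary index handles well. Equality also sidesteps the fact that _ is a single-character wildcard in LIKE patterns. Below is a minimal sketch with Python's sqlite3 standing in for SQL Server; the table layout and the longest_prefix_match name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrase (id INTEGER PRIMARY KEY, phrase TEXT NOT NULL)")
conn.execute("CREATE INDEX ix_phrase_phrase ON phrase (phrase)")
conn.executemany(
    "INSERT INTO phrase (phrase) VALUES (?)",
    [(p,) for p in (
        "SOME_PEOPLE_LIKE_APPLES",
        "SOME_PEOPLE_LIKE_APPLES_ON_TUESDAY",
        "SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON",
        "SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_CAFE",
        "SOME_PEOPLE_LIKE_APPLES_ON_THE_RIVER",
        "SOME_ANTS_HATE_SYRUP",
        "SOME_ANTS_HATE_SYRUP_WITH_STRAWBERRIES",
    )],
)

def longest_prefix_match(conn, search_term):
    """Return the longest stored phrase that is a prefix of search_term, or None."""
    # All candidate matches: search_term[:1], search_term[:2], ..., search_term itself.
    prefixes = [search_term[:i] for i in range(1, len(search_term) + 1)]
    # One bound parameter per prefix; very long terms may need chunking
    # because SQLite limits the number of parameters per statement.
    placeholders = ",".join("?" * len(prefixes))
    row = conn.execute(
        f"SELECT phrase FROM phrase WHERE phrase IN ({placeholders}) "
        "ORDER BY length(phrase) DESC LIMIT 1",
        prefixes,
    ).fetchone()
    return row[0] if row else None

best = longest_prefix_match(conn, "SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON_MOUNTAIN")
none_found = longest_prefix_match(conn, "NOBODY_LIKES_MONDAYS")
print(best)  # SOME_PEOPLE_LIKE_APPLES_ON_THE_MOON
```

The number of lookups is bounded by the length of the search term rather than by the number of rows, so it is worth comparing against the two queries above on your real data.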
The best solution here is to create a full-text search index:
https://msdn.microsoft.com/en-us/library/ms142571.aspx
Full-text search is built for text-matching tasks like this. Once the index is created, you can use the CONTAINS predicate in your queries to find matches efficiently:
SELECT DISTINCT phrase, len(phrase) l, [id] FROM X WHERE CONTAINS(phrase, searchPhrase)
Full-text search also accepts boolean operators such as AND and OR inside the search condition, and it offers other text-searching features, such as matching inflectional forms of a word (FORMSOF) and ranking results by relevance (CONTAINSTABLE). General query hints like OPTIMIZE FOR still apply to queries that use it.

REGEXP_LIKE Oracle function

I have a list of 100 words that I need to pattern-match against 55 million rows of data. Is there a way to create a list of words and pass it to the REGEXP_LIKE function, instead of using the | (or) operator multiple times?
Select *
From table
Where REGEXP_LIKE(C1, 'word1|word2|etc...', 'i');
You cannot pass a list of words as the pattern in REGEXP_LIKE.
The pattern is a regular expression, usually a text literal, and cannot be longer than 512 bytes.
What you can do instead is store the words you are searching for in a separate table/column and use a LIKE condition in your query, since you only need to find occurrences of the words and do not need regular expression support.
So, if there is a table/column (new_table.col) that stores your search words, your query might look like this (using UPPER on both sides for a case-insensitive search, and wrapping each word in % wildcards so it matches anywhere in the column):
SELECT a.* FROM table a, new_table b WHERE UPPER(a.col1) LIKE '%' || UPPER(b.col) || '%';
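The word-table approach can be tried out on a small scale. This is a rough sketch using Python's sqlite3 in place of Oracle; the table names data_rows and search_words and the sample rows are made up for illustration. DISTINCT is added because a row that contains several of the words would otherwise come back once per matching word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_rows (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.execute("CREATE TABLE search_words (word TEXT)")
conn.executemany(
    "INSERT INTO data_rows (col1) VALUES (?)",
    [("The quick brown Fox",), ("A lazy dog",), ("Nothing to see",), ("fox and DOG",)],
)
conn.executemany("INSERT INTO search_words (word) VALUES (?)", [("fox",), ("dog",)])

# Join each data row against every search word; the % wildcards make the
# word match anywhere in col1, and UPPER() on both sides ignores case.
matches = conn.execute(
    """
    SELECT DISTINCT a.id, a.col1
    FROM data_rows a
    JOIN search_words b
      ON UPPER(a.col1) LIKE '%' || UPPER(b.word) || '%'
    ORDER BY a.id
    """
).fetchall()

print(matches)
```

Keep in mind that a leading % wildcard forces every row to be compared against every word, so on 55 million rows this is still a heavy query.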

Finding the "&" character in SQL SERVER using a like statement and Wildcards

I need to find the '&' in a string.
SELECT * FROM TABLE WHERE FIELD LIKE ..&...
Things we have tried :
SELECT * FROM TABLE WHERE FIELD LIKE '&&&'
SELECT * FROM TABLE WHERE FIELD LIKE '&\&&'
SELECT * FROM TABLE WHERE FIELD LIKE '&|&&' escape '|'
SELECT * FROM TABLE WHERE FIELD LIKE '&[&]&'
None of these give the expected results in SQL Server.
Some return all rows, some return none.
Similar questions that didn't work or were not specific enough.
Find the % character in a LIKE query
How to detect if a string contains special characters?
An old reference for SQL Server 2000:
http://web.archive.org/web/20150519072547/http://sqlserver2000.databases.aspfaq.com:80/how-do-i-search-for-special-characters-e-g-in-sql-server.html
& isn't a wildcard in SQL, therefore no escaping is needed.
Use % around the value you're looking for.
SELECT * FROM TABLE WHERE FIELD LIKE '%&%'
Your statements contain no % or _ wildcards, so each one behaves like an exact match against a fixed string (for example, LIKE '&&&' is equivalent to WHERE FIELD = '&&&').
& isn't a special character in SQL so it doesn't need to be escaped. Just write
WHERE FIELD LIKE '%&%'
to search for entries that contain & somewhere in the field.
Be aware, though, that this will result in a full table scan, as the server can't use any indexes. Had you typed WHERE FIELD LIKE '&%', the server could do a range seek to find all entries starting with &.
If you have a lot of data and can't add any more constraints, you should consider using SQL Server's full-text search to create and use an FTS index, with predicates like CONTAINS or FREETEXT.
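To see the difference between the patterns, here is a small sketch using Python's sqlite3 as a stand-in; for these simple patterns, LIKE treats & and % the same way as in SQL Server. The table t and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (field TEXT)")
conn.executemany(
    "INSERT INTO t (field) VALUES (?)",
    [("Tom & Jerry",), ("&",), ("R&D",), ("no ampersand",), ("&first",)],
)

def like(pattern):
    """Return the fields matching a LIKE pattern, in insertion order."""
    rows = conn.execute(
        "SELECT field FROM t WHERE field LIKE ? ORDER BY rowid", (pattern,)
    ).fetchall()
    return [r[0] for r in rows]

exact = like("&")        # no wildcard: only the row that is exactly '&'
anywhere = like("%&%")   # & anywhere in the field
starts = like("&%")      # & at the start only (the index-friendly form)

print(exact, anywhere, starts)
```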

Make an SQL request more efficient and tidy?

I have the following SQL query:
SELECT Phrases.*
FROM Phrases
WHERE (((Phrases.phrase) Like "*ing aids*")
AND ((Phrases.phrase) Not Like "*getting*")
AND ((Phrases.phrase) Not Like "*contracting*"))
AND ((Phrases.phrase) Not Like "*preventing*"); //(etc.)
Now, if I were using RegEx, I might bunch all the Nots into one big (getting|contracting|preventing), but I'm not sure how to do this in SQL.
Is there a way to render this query more legibly/elegantly?
Just by removing redundant stuff and using a consistent naming convention your SQL looks way cooler:
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND phrase NOT LIKE '%getting%'
AND phrase NOT LIKE '%contracting%'
AND phrase NOT LIKE '%preventing%'
You mention regular expressions. Some DBMSs support them: MySQL, Oracle... However, the choice between the two syntaxes should take the query's execution plan into account: "how quick it is" rather than "how nice it looks".
With MySQL, you're able to use regular expression where-clause parameters:
SELECT something FROM table WHERE column REGEXP 'regexp'
So if that's what you're using, you could write a regular expression string that is possibly a bit more compact than your 4 LIKE criteria. It may not be as easy for other people to see what the query is doing, however.
SQL Server's LIKE and PATINDEX support only a limited pattern syntax; full regular expressions there have traditionally required CLR integration.
Since it sounds like you're building this as you go to mine your data, here's something that you could consider:
CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL)
CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL)
INSERT INTO Includes VALUES ('%ing aids%')
INSERT INTO Excludes VALUES ('%getting%')
INSERT INTO Excludes VALUES ('%contracting%')
INSERT INTO Excludes VALUES ('%preventing%')
SELECT
*
FROM
Phrases P
WHERE
EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase) AND
NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
You are then always just running the same query and you can simply change what's in the Includes and Excludes tables to refine your searches.
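Here is a minimal sketch of that idea using Python's sqlite3, just to show the query staying fixed while the criteria tables change. The sample phrases and the find_phrases helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Phrases  (phrase TEXT NOT NULL);
    CREATE TABLE Includes (phrase TEXT NOT NULL);
    CREATE TABLE Excludes (phrase TEXT NOT NULL);
    INSERT INTO Phrases VALUES ('hearing aids for seniors');
    INSERT INTO Phrases VALUES ('getting hearing aids');
    INSERT INTO Phrases VALUES ('preventing aids');
    INSERT INTO Phrases VALUES ('walking aids');
    INSERT INTO Phrases VALUES ('first aid kit');
    INSERT INTO Includes VALUES ('%ing aids%');
    INSERT INTO Excludes VALUES ('%getting%');
    INSERT INTO Excludes VALUES ('%contracting%');
    INSERT INTO Excludes VALUES ('%preventing%');
    """
)

QUERY = """
    SELECT P.phrase
    FROM Phrases P
    WHERE EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase)
      AND NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
    ORDER BY P.phrase
"""

def find_phrases():
    """Run the fixed query against whatever the criteria tables hold right now."""
    return [row[0] for row in conn.execute(QUERY)]

first_pass = find_phrases()

# Refine the search by editing data, not SQL.
conn.execute("INSERT INTO Excludes VALUES ('%walking%')")
second_pass = find_phrases()

print(first_pass, second_pass)
```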
Depending on which database server you are using, it may support regular expressions itself. For example, Oracle and MySQL support them natively, and SQL Server can do so through CLR functions.
You could push all your negative criteria into a short-circuiting CASE expression (works in SQL Server; not sure about MS Access).
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND CASE
WHEN phrase LIKE '%getting%' THEN 2
WHEN phrase LIKE '%contracting%' THEN 2
WHEN phrase LIKE '%preventing%' THEN 2
ELSE 1
END = 1
On the "more efficient" side, you need to find some criterion that lets you avoid reading the entire Phrases column. A double-sided wildcard (LIKE '%x%') is bad because it prevents an index seek. A trailing-only wildcard (LIKE 'x%') is good because it allows one.