SQL Server Full Text Search with complete sentences - sql

I have an Azure SQL database and tried the full text search.
Is it possible to search for a complete sentence?
E.g. a query with the LIKE operator that works (but is probably not as fast as full-text search):
SELECT Sentence
FROM Sentences
WHERE 'This is a whole sentence for example.' LIKE '%'+Sentence+'%'
Would return: "a whole sentence"
I need something like that with full text search:
SELECT Sentence
FROM Sentences
WHERE FREETEXT(WorkingExperience,'This is a whole sentence for example.')
This will return each hit on a word, but not on the complete sentence.
E.g. would return: "a whole sentence" and "another sentence".
Is that possible or do I have to use the LIKE-operator?

Have you tried this:
SELECT Sentence
FROM Sentences
WHERE FREETEXT(WorkingExperience,'"This is a whole sentence for example."')
If the above doesn't work, you may need to construct the search string with the AND operator. Note that FREETEXT does not interpret boolean operators, so this form needs CONTAINS:
SELECT Sentence
FROM Sentences
WHERE CONTAINS(WorkingExperience,'"This" AND "is" AND "a" AND "whole"
AND "sentence" AND "for" AND "example."')
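A search condition of this shape can be generated from an input sentence rather than typed by hand. The following is a minimal Python sketch, not part of the original answer; the function name build_and_condition is hypothetical, and it assumes that stripping punctuation and double quotes from each word is acceptable for your data. Pass the resulting string to the query as a parameter instead of concatenating it into the SQL text.

```python
def build_and_condition(sentence: str) -> str:
    """Turn a sentence into a '"w1" AND "w2" ...' full-text search condition."""
    words = []
    for raw in sentence.split():
        # Drop embedded double quotes and surrounding punctuation so that
        # every word can be safely wrapped in double quotes.
        word = raw.replace('"', "").strip(".,;:!?'")
        if word:
            words.append(f'"{word}"')
    return " AND ".join(words)


condition = build_and_condition("This is a whole sentence for example.")
print(condition)
```

Whether very common words such as "is" or "a" take part in the match depends on the stoplist configured for the full-text index, so results may differ between databases.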
Also, for more precise matching I recommend using CONTAINS or CONTAINSTABLE:
SELECT Sentence
FROM Sentences
WHERE CONTAINS(WorkingExperience,'"This is a whole sentence for example."')
HTH

If anyone else is interested here is a link for a good article with examples:
https://www.microsoftpressstore.com/articles/article.aspx?p=2201634&seqNum=3
You can choose the best method to accommodate your need from the examples.
In my experience, matching a whole sentence is easily done with the WHERE clause below, as mentioned in the other answer:
WHERE CONTAINS(WorkingExperience,'"This is a whole sentence for example."')
If you need to match all the words but the user might enter them in a different order, I would suggest:
WHERE CONTAINS(WorkingExperience, N'NEAR(This, whole, sentence, is, a, for, example)')
You can do more with this; see the article above. If you need to order the results by hit score/rank, use CONTAINSTABLE instead of CONTAINS.

Related

Search postgresql database for strings containing specific words

I'm looking to query a postgresql database full of strings, specifically for strings with the word 'LOVE' in - this means only this specific version of the word and nothing where love is the stem or has that sequence of characters inside another word. I've so far been using the SELECT * FROM songs WHERE title LIKE '%LOVE%';, which mostly returns the desired results.
However, it also returns results like CRIMSON AND CLOVER, LOVESTONED/I THINK SHE KNOWS (INTERLUDE), LOVER YOU SHOULD'VE COME OVER and TO BE LOVED, which I want to exclude as they do not contain specifically the word 'LOVE'.
I know you can use SELECT * FROM songs WHERE title = 'LOVE';, but this will obviously miss any string that isn't exactly 'LOVE'. Is there an operation in postgresql that can return the results I need?
You can use a regular expression that looks for love either with a space before or after, or if the word is at the start or end of the string:
with songs (title) as (
values
('Crimson And Clover'),
('Love hurts'),
('Only love can tear us apart'),
('To be loved'),
('Tainted love')
)
select *
from songs
where title ~* '\mlove\M';
The ~* is the regex operator and uses case insensitive comparison. The \m and \M restrict the match to the beginning and end of a word.
The query returns:
title
---------------------------
Love hurts
Only love can tear us apart
Tainted love
Online example: http://rextester.com/EUTHKM33922
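For comparison, the same whole-word filter can be sketched outside the database. This Python version is an illustration only, not part of the original answer; it assumes the titles are already loaded into a list and uses \b (which matches at either edge of a word) in place of PostgreSQL's \m and \M.

```python
import re

titles = [
    "Crimson And Clover",
    "Love hurts",
    "Only love can tear us apart",
    "To be loved",
    "Tainted love",
]

# \b requires a word boundary on both sides of "love";
# IGNORECASE plays the role of the case-insensitive ~* operator.
pattern = re.compile(r"\blove\b", re.IGNORECASE)

matches = [title for title in titles if pattern.search(title)]
print(matches)
```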

Regex matching sequence of characters

I have a test string such as: The Sun and the Moon together, forever
I want to be able to type a few characters or words and be able to match this string if the characters appear in the correct sequence together, even if there are missing words. For example, the following search word(s) should all match against this string:
The Moon
Sun tog
Tsmoon
The get ever
What regex pattern should I be using for this? I should add that the supplied test strings are going to be dynamic within an app, and so I'd like to be able to use a pattern based on the search string.
Your example Tsmoon shows partial words (T), ignores case (s, m), and allows anything between the entered characters. So as a first attempt you can:
Set the ignore case option
Between each character of the input, insert the regular expression that matches zero or more of anything. You can choose whether to match the shortest or longest run.
Try that, reading the documentation for NSRegularExpression if you're stuck, and see how it goes. If you get stuck ask a new question showing your code and the RE constructed and explain what happens/doesn't work as expected.
HTH
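The two steps above can be sketched as follows. This is a minimal illustration in Python rather than NSRegularExpression, so the option names differ; the function names are hypothetical, and dropping whitespace from the search string is an assumption that the question's examples happen to allow.

```python
import re


def subsequence_pattern(query: str) -> re.Pattern:
    """Build a regex that matches the query's characters in order, with gaps."""
    # Escape each character so that input such as "." or "*" is taken literally.
    chars = [re.escape(c) for c in query if not c.isspace()]
    # ".*?" matches the shortest run of anything between two characters.
    return re.compile(".*?".join(chars), re.IGNORECASE | re.DOTALL)


def is_match(query: str, text: str) -> bool:
    return subsequence_pattern(query).search(text) is not None


text = "The Sun and the Moon together, forever"
for query in ["The Moon", "Sun tog", "Tsmoon", "The get ever", "Moon Sun"]:
    print(query, is_match(query, text))
```

The first four queries come from the question and match; "Moon Sun" does not, because its characters do not appear in that order.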

How exact phrase search is performed by a Search Engine?

I am using Lucene to search a data set, and I need to know how "" search (i.e. exact phrase search) is implemented.
I want it to return all "little cat" hits when the user enters "littlecat". I know that I need to change the indexing code, but first I need to know how "" search works.
I want to make it able to result all "little cat" hits when the user enters "littlecat"
This might sound easy, but it is very tough to implement. To a human, little and cat are two different words, but a computer cannot tell little and cat apart within littlecat unless you have a dictionary and your code checks those two words against it. Going the other way is easier: a search for "little cat" can be made to match "littlecat" as well. I believe this goes beyond the concept of an exact phrase search, which returns littlecat only if you search for "littlecat", and vice versa. Even Google seemingly (and as expected) doesn't return "little cat" for a littlecat search.
One way to implement this is dynamic programming: use a dictionary/corpus to check candidate words against (including the words left over after you have parsed the text into strings).
Think of it as writing a custom spell-checker. There are cases where more than one split is possible, e.g. "walkingmydoginrain": you could take the first word as "walk" or as "walking". This is the beauty of DP: since you know (from your corpus) that you can't form legitimate words from "ingmydoginrain" (i.e. the rest of the string), you discover that in this context you should pick "walking" and not "walk".
Also think of a failed match as adding to a COST function that you define, so that you get optimal results. Whenever a valid split exists, your text (not separated by white space) will be broken into legitimate words, though there may be more than one possible word sequence for that line (and hence more than one possible intent of the person searching).
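The idea can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: the tiny dictionary is made up for the example, and instead of a cost function it only records whether each prefix can be split into known words, returning one valid split if any exists.

```python
def segment(text, dictionary):
    """Split text into dictionary words; return a list of words, or None."""
    n = len(text)
    max_len = max((len(word) for word in dictionary), default=0)
    # best[i] holds one valid segmentation of text[:i], or None if there is none.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if best[start] is not None and piece in dictionary:
                best[end] = best[start] + [piece]
                break
    return best[n]


# Hypothetical dictionary; a real one would come from your corpus.
words = {"walk", "walking", "my", "dog", "in", "rain", "little", "cat"}
print(segment("walkingmydoginrain", words))
print(segment("littlecat", words))
```

Because "ingmydoginrain" cannot be split into dictionary words, the split that starts with "walk" is abandoned and the one that starts with "walking" is kept. Replacing the true/false check with a per-word cost would let you rank alternative splits.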
You should be able to find pretty good base implementations over the web for your use case (read also : How does Google implement - "Did you mean" )
For now, see also -
How to split text without spaces into list of words?

Space issue in Lucene.NET C#

I want to search for a phrase that contains a space using full-text search.
Ex: Tom is a very good boy in class.
I want to search for the keyword "very good".
I'm using a whitespace tokenizer to create and search the index, but the keyword is not found when it contains a space.
Code:
Query searchItemQuery = new WildcardQuery(new Term(string-field-name, searchkeyword.ToLower()));
I've tried splitting the keyword, but it does not work properly.
Can anyone suggest a solution for this problem?
Thanks,
Vijay
Since you are working with a tokenized string, every word is a separate term.
In order to find a phrase consisting of multiple terms, you need to use PhraseQuery instead of WildcardQuery.
Like this:
PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.Add(new Term(string-field-name, "very"));
phraseQuery.Add(new Term(string-field-name, "good"));
Note also that you are using a wildcard query. Wildcards in a phrase query are a bit complex. Check this post for details: Lucene - Wildcards in phrases
And finally, I would suggest to consider using QueryParser instead of constructing query manually.
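To see why a phrase has to be expressed as several terms with positions, here is a small Python sketch of a positional index. It is a conceptual illustration only and not Lucene's actual implementation; the function names and sample documents are made up, and tokenization is a plain lowercase whitespace split.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to {doc_id: [positions]} using whitespace tokenization."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            index[token][doc_id].append(position)
    return index


def phrase_search(index, phrase):
    """Return ids of documents where the phrase terms occur consecutively."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for doc_id, positions in index[terms[0]].items():
        for start in positions:
            # Term i of the phrase must sit exactly i positions after the first.
            if all(start + i in index[term].get(doc_id, ())
                   for i, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


docs = {
    1: "Tom is a very good boy in class.",
    2: "Good boys are very rare",
}
index = build_index(docs)
print(phrase_search(index, "very good"))
```

A single term "very good" never exists in such an index, which is why a one-term wildcard query finds nothing, while a phrase lookup over the two separate terms does.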

"Exclude these words" feature

How do I implement an "Exclude these words" feature for a search application using Lucene?
Thanks!
For that you can use the StopAnalyzer:
StopAnalyzer includes the lower-case filter, and also has a filter that drops out any "stop words", words like articles (a, an, the, etc.) that occur so commonly in English that they might as well be noise for searching purposes. StopAnalyzer comes with a set of stop words, but you can instantiate it with your own array of stop words.
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/analysis/StopAnalyzer.html
more information:
http://www.darksleep.com/lucene/
How to sort by Lucene.Net field and ignore common stop words such as 'a' and 'the'?
Look at the NOT operator here. Just construct your query accordingly or massage if it is a user-generated query.
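As a rough illustration of constructing such a query, the Python sketch below builds a query string in Lucene's classic query parser syntax, where a leading + marks a required term and a leading - marks a prohibited one. The function name build_query is hypothetical, and quoting every term as a phrase is a simplifying assumption.

```python
def build_query(include, exclude):
    """Build a query string with required (+) and prohibited (-) terms."""
    if not include:
        # A query made only of prohibited terms matches nothing in Lucene.
        raise ValueError("at least one term to search for is required")

    def quote(term):
        # Escape backslashes and double quotes, then wrap the term as a phrase.
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    parts = [f"+{quote(term)}" for term in include]
    parts += [f"-{quote(term)}" for term in exclude]
    return " ".join(parts)


print(build_query(["cat", "dog"], ["bird"]))
```

The resulting string would then be handed to the query parser; the words from the "Exclude these words" box go into the exclude list.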