How to search multiple fields using Surround QueryParser? - lucene

I have a few questions about the Surround QueryParser. Any suggestions would be appreciated.
How to search multiple fields at once?
As shown below, the syntax allows searching against only one field. But how do I submit a query like "FIELD1:N(abc,corp) FIELD2:N(xyz,corp)"? Is something like this possible with the Surround QueryParser?
SrndQuery srndQuery = org.apache.lucene.queryparser.surround.parser.QueryParser.parse(strTxtSearchString);
Query query = srndQuery.makeLuceneQueryField(, new BasicQueryFactory());
How do I escape special characters, the way QueryParser.escape() does for the regular QueryParser?
How to escape words such as "and", "or", "W", "N" etc.? The search string itself might have the words such as "and". In that case, my query would look something like "N(abc,and,sons)" or "W(abc,n,company)".
I get a org.apache.lucene.queryparser.surround.parser.ParseException when I submit such a query.
How do I put a wildcard at the beginning of a word?
The regular QueryParser lets us do parser.setAllowLeadingWildcard(true); Is there some way to do this with the Surround QueryParser?
Any inputs will be very helpful. Thanks!

Related

SQL Server Full Text Search with complete sentences

I have an Azure SQL database and tried the full text search.
Is it possible to search for a complete sentence?
E.g. a query with the LIKE operator that works (but is probably not as fast as full-text search):
SELECT Sentence
FROM Sentences
WHERE 'This is a whole sentence for example.' LIKE '%'+Sentence+'%'
Would return: "a whole sentence"
I need something like that with full text search:
SELECT Sentence
FROM Sentences
WHERE FREETEXT(WorkingExperience,'This is a whole sentence for example.')
This will return each hit on a word, but not on the complete sentence.
E.g. would return: "a whole sentence" and "another sentence".
Is that possible or do I have to use the LIKE-operator?
Have you tried this:
SELECT Sentence
FROM Sentences
WHERE FREETEXT(WorkingExperience,'"This is a whole sentence for example."')
If the above doesn't work, you may need to construct the proper FTS search string using the AND operator, like below:
SELECT Sentence
FROM Sentences
WHERE FREETEXT(WorkingExperience,'"This" AND "is" AND "a" AND "whole"
AND "sentence" AND "for" AND "example."')
Also, for more precise matching I recommend using CONTAINS or CONTAINSTABLE:
SELECT Sentence
FROM Sentences
WHERE CONTAINS(WorkingExperience,'"This is a whole sentence for example."')
HTH
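As a rough illustration of building the two search strings above from user input, here is a minimal Java sketch. The class and method names (FullTextCondition, exactPhrase, allWords) are hypothetical, and stripping embedded double quotes is an assumption made to keep the generated string well-formed; it is not taken from the SQL Server documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds full-text search strings for use as a query parameter.
final class FullTextCondition {

    // Wraps the whole sentence in double quotes so it is treated as one phrase.
    static String exactPhrase(String sentence) {
        // Assumption: embedded double quotes are simply dropped.
        return "\"" + sentence.replace("\"", "").trim() + "\"";
    }

    // Quotes every word and joins the words with AND.
    static String allWords(String sentence) {
        List<String> terms = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            String cleaned = word.replace("\"", "");
            if (!cleaned.isEmpty()) {
                terms.add("\"" + cleaned + "\"");
            }
        }
        return String.join(" AND ", terms);
    }

    public static void main(String[] args) {
        System.out.println(exactPhrase("This is a whole sentence for example."));
        System.out.println(allWords("This is a whole sentence for example."));
    }
}
```

The resulting string is best passed to the query as a bound parameter rather than concatenated into the SQL text. How very common words such as "is" and "a" are treated depends on the server's stoplist settings, so the AND form may need testing against your data.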
If anyone else is interested here is a link for a good article with examples:
https://www.microsoftpressstore.com/articles/article.aspx?p=2201634&seqNum=3
You can choose the best method to accommodate your need from the examples.
In my view, matching a whole sentence is easily done with the WHERE clause below, as mentioned in the other answer:
WHERE CONTAINS(WorkingExperience,'"This is a whole sentence for example."')
If you need to find all the words but the user might enter them in a different order, I would suggest using:
WHERE CONTAINS(WorkingExperience, N'NEAR(This, whole, sentence, is, a, for, example)')
There are other useful things you can do with this, which are covered in the article above. If you need to order the results by hit score/rank, you will need to use CONTAINSTABLE instead of CONTAINS.

Lucene phrase query case insensitive

I am writing a query to do an exact match on a 'city' field. The field/property is defined as:
@org.hibernate.search.annotations.Field(index = Index.YES, analyze = Analyze.NO, store = Store.NO)
private String city;
If I have the value of "New York", I want to find a match if user enters "new york", or some variation of case. I am using the StandardAnalyzer for the entity, so I know that will lowercase all the tokens. I don't tokenize since I want to match the phrase (Analyze.NO).
I tried to lowercase my search value, but no luck.
Query query = qb.phrase().onField(.....).sentence(location.toLowerCase()).createQuery();
If I don't lowercase the search term and the value is 'New York', results are returned. Searching for 'new york' does not return any result.
If I tokenize (Analyze.YES), then other cities like 'New Jersey' are returned. I know I can use a wildcard query (searchTerm*), but I was hoping to be able to do a case insensitive search on a phrase. Just not sure if that's possible unless you use the wildcard.
thanks
It sounds like you want an analyzer that emits the entire text as a single token while lower-casing the input. In that case, use analyze=Analyze.YES and specify the appropriate analyzer (the answer here has code that looks like what you need) with analyzer=@Analyzer(impl=your.fully.qualified.Analyzer.class).

Space issue in Lucene.NET C#

I want to search for text that contains a space using full-text search.
Ex: Tom is a very good boy in class.
I want to search for the keyword "very good".
I'm using a whitespace tokenizer to create and search the index, but it does not find the keyword when it contains a space.
Code:
Query searchItemQuery = new WildcardQuery(new Term(string-field-name, searchkeyword.ToLower()));
I've tried splitting the keyword, but that does not work properly.
Can anyone suggest a solution for this problem?
Thanks,
Vijay
Since you are working with a tokenized string, every word is a separate term.
In order to find a phrase consisting of multiple terms, you need to use PhraseQuery instead of WildcardQuery.
Like this:
PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.Add(new Term(string-field-name, "very"));
phraseQuery.Add(new Term(string-field-name, "good"));
Note also that you are using a wildcard query. Wildcards in phrase queries are a bit complex. Check this post for details: Lucene - Wildcards in phrases
Finally, I would suggest considering QueryParser instead of constructing the query manually.

Lucene ignore keywords in search term

This seems like it should be simple, but I can't figure out how to get Lucene to ignore the AND, OR, and NOT keywords - the query parser throws a parse error when it gets one. I have a query builder class that splits the search term so that it searches on the words themselves as well as on n-grams in the word. I'm using Lucene in Java.
So in a search for, say, "ANDERSON COOPER" the query string looks like:
name: (ANDERSON COOPER "ANDERSON COOPER")^5 gram4: ( ANDE NDER DERS ERSO RSON
SONC ONCO NCOO COOP OOPE OPER)
The query parser throws an error when it gets those ANDs. Ideally, I'd like the parser to just ignore AND, OR, NOT altogether, and I'll use the &&, ||, and ! equivalents if I need them. Do I have to modify the code in the QueryParser class itself to get this, or is there an easier way? I could also just insert an escape character for these cases if that is the best way to do it, but adding \ before the word AND doesn't seem to do anything.
You can wrap the AND in quotes like this: "AND". Is that easy? A regex could probably do that easily if you know exactly what your queries look like.
The parser shouldn't have a problem with it, and the single-term PhraseQuery will be rewritten as a term query, so the performance cost is a small constant, O(1).
The regex could probably look like this:
\b(AND|OR|NOT)\b
Which would be replaced with
"$1"

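The regex replacement suggested in the answer above could be applied in Java roughly as follows. This is a minimal sketch: the class and method names (ReservedWordQuoter, quoteReservedWords) are hypothetical, and it assumes the query string does not already contain these words inside a quoted phrase.

```java
import java.util.regex.Pattern;

// Hypothetical helper: wraps the upper-case keywords AND, OR, NOT in double quotes.
final class ReservedWordQuoter {

    // \b on both sides ensures only whole words match, so ANDE or NOTE are left alone.
    private static final Pattern RESERVED = Pattern.compile("\\b(AND|OR|NOT)\\b");

    static String quoteReservedWords(String query) {
        // $1 refers to the keyword that was matched.
        return RESERVED.matcher(query).replaceAll("\"$1\"");
    }

    public static void main(String[] args) {
        // prints: gram3:(ANDE "AND" NDE "NOT")
        System.out.println(quoteReservedWords("gram3:(ANDE AND NDE NOT)"));
    }
}
```

Because the pattern is applied blindly, a keyword that already sits inside a quoted phrase would end up with nested quotes, so the input would need checking first if such phrases can occur.
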
How to make Lucene match all words in query?

I am using Lucene to allow a user to search for words in a large number of documents. Lucene seems to default to returning all documents containing any of the words entered.
Is it possible to change this behaviour? I know that '+' can be used to force a term to be included, but I would like to make that the default action.
Ideally I would like functionality similar to Google's: '-' to exclude words and "abc xyz" to group words.
Just to clarify
I also thought of inserting '+' into all spaces in the query. I just wanted to avoid detecting grouped terms (brackets, quotes etc) and potentially breaking the query. Is there another approach?
This looks similar to the Lucene Sentence Search question. If you're interested, this is how I answered that question:
String defaultField = ...;
Analyzer analyzer = ...;
QueryParser queryParser = new QueryParser(defaultField, analyzer);
queryParser.setDefaultOperator(QueryParser.Operator.AND);
Query query = queryParser.parse("Searching is fun");
Like Adam said, there's no need to do anything to the query string. QueryParser's setDefaultOperator does exactly what you're asking for.
Why not just preparse the user's search input and adjust it to fit your criteria using the Lucene query syntax before passing it on to Lucene? Alternatively, you could just create some help documentation on how to use the standard syntax to create a specific query and let the user decide how the query should be performed.
Lucene has an extensive query language, as described here, that covers everything you want except for + being the default, and that is something you can handle simply by replacing spaces with +. So the only thing you need to do is define the format you want people to enter their search queries in (I would strongly advise adhering to the default Lucene syntax), and then you can write the transformations from your own syntax to the Lucene syntax.
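A rough sketch of that kind of preparsing in Java is shown below. The names (RequireAllTerms, requireAll) are hypothetical. It keeps a double-quoted phrase together as one token and leaves terms that already start with + or - alone. It does not handle parentheses, boolean keywords, or unbalanced quotes, which is where this approach gets brittle compared with setDefaultOperator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: prefixes '+' to every term of a simple user query.
final class RequireAllTerms {

    // A token is a run of non-space characters; a double-quoted segment
    // (which may contain spaces) counts as part of the same token.
    private static final Pattern TOKEN = Pattern.compile("(?:[^\\s\"]+|\"[^\"]*\")+");

    static String requireAll(String userQuery) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(userQuery);
        while (m.find()) {
            String token = m.group();
            // Respect an explicit include/exclude marker typed by the user.
            boolean alreadyMarked = token.startsWith("+") || token.startsWith("-");
            out.add(alreadyMarked ? token : "+" + token);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // prints: +lucene -solr +"abc xyz" +fast
        System.out.println(requireAll("lucene -solr \"abc xyz\" +fast"));
    }
}
```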
The behavior is hard-coded in method addClause(List, int, int, Query) of class org.apache.lucene.queryParser.QueryParser, so the only way to change the behavior (other than the workarounds above) is to change that method. The end of the method looks like this:
if (required && !prohibited)
    clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST));
else if (!required && !prohibited)
    clauses.addElement(new BooleanClause(q, BooleanClause.Occur.SHOULD));
else if (!required && prohibited)
    clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST_NOT));
else
    throw new RuntimeException("Clause cannot be both required and prohibited");
Changing "SHOULD" to "MUST" should make clauses (e.g. words) required by default.