How to get many terms matched using Hibernate Search query DSL?

When I search for "cars blue" I get every result that matches "cars" or "blue", but I need results that match both. I've read about setting the default operator to AND, but I can't find where to do that.
Also, I can't use a PhraseQuery because the order of the terms in the search query is irrelevant.
This is my code so far. Thanks!
// create the query using Hibernate Search query DSL
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
.buildQueryBuilder().forEntity(Articulo.class).get();
// a very basic query by keywords
BooleanJunction<BooleanJunction> bool = queryBuilder.bool();
bool.must(queryBuilder.keyword()
.onFields("description")
.matching(text)
.createQuery()
);
Query query = bool.createQuery();
FullTextQuery jpaQuery =
fullTextEntityManager.createFullTextQuery(query, Articulo.class);
return jpaQuery.getResultList();
Note: I'm using Hibernate Search 5.6.4

I think you're looking for the Simple query string feature.
See http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_simple_query_string_queries for more details about it.
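The documentation includes an example using .withAndAsDefaultOperator() (as far as I know, this feature was added in Hibernate Search 5.8, so it would mean upgrading from 5.6.4):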
Query luceneQuery = mythQB
.simpleQueryString()
.onField("history")
.withAndAsDefaultOperator()
.matching("storm tree")
.createQuery();
This blog post explaining the rationale of this feature might be helpful too: http://in.relation.to/2017/04/27/simple-query-string-what-about-it/ .
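To make the semantics concrete without involving Hibernate Search at all, here is a minimal plain-Java sketch of what AND as the default operator means: every term of the query must occur in the field, in any order. The names (`DefaultOperatorDemo`, `matchesAny`, `matchesAll`) are made up for illustration, and the lowercase/whitespace tokenization is only a stand-in for a real analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class DefaultOperatorDemo {

    // Stand-in for an analyzer (assumption): lowercase and split on whitespace.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // OR semantics: at least one query term occurs in the field.
    static boolean matchesAny(String field, String query) {
        Set<String> fieldTokens = tokens(field);
        for (String term : tokens(query)) {
            if (fieldTokens.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: every query term occurs in the field; order is irrelevant.
    static boolean matchesAll(String field, String query) {
        return tokens(field).containsAll(tokens(query));
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("blue bikes", "cars blue"));
        System.out.println(matchesAll("blue bikes", "cars blue"));
        System.out.println(matchesAll("blue vintage cars", "cars blue"));
    }
}
```

With OR, "blue bikes" is a hit for "cars blue"; with AND it is not, while "blue vintage cars" still is, even though the words appear in a different order than in the query.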

Related

What is the difference between TermQuery and QueryParser in Lucene 6.0?

There are two queries. One is created by QueryParser:
QueryParser parser = new QueryParser(field, analyzer);
Query query1 = parser.parse("Lucene");
The other is a TermQuery:
Query query2 = new TermQuery(new Term("title", "Lucene"));
What is the difference between query1 and query2?
This is the definition of Term from the Lucene docs:
A Term represents a word from text. This is the unit of search. It is composed of two elements, the text of the word, as a string, and the name of the field that the text occurred in.
So in your case the query will be created to search the word "Lucene" in the field "title".
To explain the difference between the two, let me take a different example. Consider the following:
Query query2 = new TermQuery(new Term("title", "Apache Lucene"));
In this case the query will search for the exact term "Apache Lucene", as a single token including the space, in the field title.
Now for QueryParser. As an example, let's assume a Lucene index contains two fields, "title" and "body".
QueryParser parser = new QueryParser("title", new StandardAnalyzer());
Query query1 = parser.parse("title:Apache body:Lucene");
Query query2 = parser.parse("body:Apache Lucene");
Query query3 = parser.parse("title:\"Apache Lucene\"");
A couple of things to note:
"title" is the default field (given in the constructor): QueryParser searches it for any term that is not prefixed with a field name.
query1: parser.parse("title:Apache body:Lucene") yields the query title:apache body:lucene (the analyzer lowercases the terms).
query2: parser.parse("body:Apache Lucene") yields body:apache title:lucene, so the parser searches for "apache" in the body field and "lucene" in the title field. This is because a field name is only valid for the term that it directly precedes (http://lucene.apache.org/core/2_9_4/queryparsersyntax.html). Since no field is specified for "Lucene", the default field "title" is used.
query3: parser.parse("title:\"Apache Lucene\"") explicitly asks for "Apache Lucene" in the field "title". This is a phrase query, and it behaves much like a term query if the text is analyzed consistently.
To summarize: a TermQuery does not analyze the term and searches for it as is, while QueryParser parses and analyzes the input according to the rules described above.
The QueryParser parses the string and constructs a BooleanQuery (afaik) consisting of BooleanClauses and analyzes the terms along the way.
The TermQuery does NOT do analysis, and takes the term as-is. This is the main difference.
So query1 and query2 might be equivalent (in the sense that they return the same search results) if the field is the same and the QueryParser's analyzer does not change the term.
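The analysis point is easy to see in a toy model. The following is a plain-Java sketch, not Lucene code: `analyze` is a hypothetical stand-in for an analyzer (lowercase, split on non-alphanumerics), `rawTermMatches` imitates a TermQuery lookup, and `analyzedMatches` imitates what happens when the input goes through analysis first. All names are invented for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class TermVsParserDemo {

    // Stand-in for an analyzer (assumption): lowercase, then split on anything
    // that is not a letter or a digit.
    static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // TermQuery-like lookup: the term is compared as is, with no analysis.
    static boolean rawTermMatches(String indexedText, String term) {
        return analyze(indexedText).contains(term);
    }

    // QueryParser-like lookup: the input is analyzed first; any resulting term
    // may match, mirroring the parser's default OR behaviour.
    static boolean analyzedMatches(String indexedText, String input) {
        Set<String> indexed = analyze(indexedText);
        for (String term : analyze(input)) {
            if (indexed.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String title = "Apache Lucene in Action";
        System.out.println(rawTermMatches(title, "Lucene"));
        System.out.println(rawTermMatches(title, "lucene"));
        System.out.println(rawTermMatches(title, "Apache Lucene"));
        System.out.println(analyzedMatches(title, "Apache Lucene"));
    }
}
```

The raw term "Lucene" misses because the indexed term is "lucene", and the raw term "Apache Lucene" misses because no single indexed term contains a space. The analyzed input matches in both cases.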

Search by exact words in a phrase using Umbraco Examine

I have a description field for each content item:
For content1:
The quick brown fox jumps over the lazy dog. And the lazy dog is good.
For content2:
The lazy fog is crazy.
Now, when I search with keyword = lazy dog, I want content1 returned but not content2.
I tried:
BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["MySearch"];
ISearchCriteria criteria =
searcher.CreateSearchCriteria()
.GroupedAnd( new List<string> { "description" }, "lazy dog" )
.Compile();
ISearchResults result = searcher.Search( criteria );
But it didn't give me the desired result: it returns both content1 and content2.
What should I do to get only content1?
By default, Examine compiles this query to:
+(+description:lazy dog)
and based on that it returns results matching the individual words lazy and dog rather than the exact phrase.
What you want to achieve is:
+(+description:"lazy dog")
The first thing to try is escaping the phrase. In your case it would be:
BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["MySearch"];
ISearchCriteria criteria =
searcher.CreateSearchCriteria()
.GroupedAnd( new List<string> { "description" }, "lazy dog".Escape() )
.Compile();
ISearchResults result = searcher.Search( criteria );
I can't test it right now, but from what I remember there were some problems with it in the past. The second option, which may be a life saver for you, is to build the search query manually and use a raw query:
BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["MySearch"];
ISearchCriteria criteria = searcher.CreateSearchCriteria();
var query = criteria.RawQuery("+description:\"lazy dog\"");
ISearchResults result = searcher.Search( query );
That should return only the correctly matched results. Personally, I've also boosted specific words to push some results higher in the score list, but if you only want matching items, try the solutions above and let me know if they helped.
If you want to deal with more than one property, you can either use fluent API methods like GroupedAnd or GroupedOr (depending on the desired search behaviour) or build a more advanced raw query.
For the first option, check the Grouped Operations documentation: https://github.com/Shazwazza/Examine/wiki/Grouped-Operations.
For the second scenario, it would be best to analyze how it's done in, e.g., the ezSearch package (which, by the way, is awesome!): https://github.com/umco/umbraco-ezsearch/blob/master/Src/Our.Umbraco.ezSearch/Web/UI/Views/MacroPartials/ezSearch.cshtml.
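If it helps to see why the quoted phrase behaves differently from the loose words, here is a small sketch of the two matching modes. It is plain Java rather than Examine or C#, purely a toy model: `tokenize`, `containsAnyWord` and `containsPhrase` are invented names, and the tokenization is an assumption, not what the real analyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PhraseMatchDemo {

    // Stand-in for an analyzer (assumption): lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Loose match: at least one of the words occurs somewhere in the text.
    static boolean containsAnyWord(String text, String words) {
        List<String> tokens = tokenize(text);
        for (String word : tokenize(words)) {
            if (tokens.contains(word)) {
                return true;
            }
        }
        return false;
    }

    // Phrase match: the words occur next to each other, in the given order.
    static boolean containsPhrase(String text, String phrase) {
        return Collections.indexOfSubList(tokenize(text), tokenize(phrase)) >= 0;
    }

    public static void main(String[] args) {
        String content1 = "The quick brown fox jumps over the lazy dog. And the lazy dog is good.";
        String content2 = "The lazy fog is crazy.";
        System.out.println(containsAnyWord(content2, "lazy dog"));
        System.out.println(containsPhrase(content1, "lazy dog"));
        System.out.println(containsPhrase(content2, "lazy dog"));
    }
}
```

content2 passes the loose check because it contains "lazy", but only content1 contains the two words side by side.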

Lucene wildcard query with space

I have a Lucene index which contains city names.
Say I want to search for 'New Delhi'. I have the string 'New Del', which I want to pass to the Lucene searcher, and I expect 'New Delhi' as output.
If I generate a query like Name:New Del*, it gives me all cities with 'New' and 'Del' in them.
Is there any way to create a Lucene wildcard query with spaces in it?
I referred to and tried a few solutions given at http://www.gossamer-threads.com/lists/lucene/java-user/5487
It sounds like you have indexed your city names with analysis, which makes this more difficult. With analysis, "new" and "delhi" are separate terms and must be treated as such, and wildcard searches that span multiple terms take more work.
The easiest solution would be to index your city names without tokenization (lowercasing might not be a bad idea though). Then you would be able to search with the query parser simply by escaping the space:
QueryParser parser = new QueryParser("defaultField", analyzer);
Query query = parser.parse("cityname:new\\ del*");
Or you could use a simple WildcardQuery:
Query query = new WildcardQuery(new Term("cityname", "new del*"));
If the field is analyzed by the standard analyzer, you will need to rely on SpanQueries, something like this:
SpanQuery queryPart1 = new SpanTermQuery(new Term("cityname", "new"));
SpanQuery queryPart2 = new SpanMultiTermQueryWrapper<>(new WildcardQuery(new Term("cityname", "del*")));
// slop 0, in order: "new" must be immediately followed by a term matching del*
Query query = new SpanNearQuery(new SpanQuery[] {queryPart1, queryPart2}, 0, true);
Or, you can use the surround query parser (which provides query syntax intended to provide more robust support of span queries), using a query like W(new, del*):
org.apache.lucene.queryparser.surround.parser.QueryParser surroundparser = new org.apache.lucene.queryparser.surround.parser.QueryParser();
SrndQuery srndquery = surroundparser.parse("W(new, del*)");
Query query = srndquery.makeLuceneQueryField("cityname", new BasicQueryFactory());
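For intuition only, this is roughly the condition that both the span query and W(new, del*) express on an analyzed field: the term "new" immediately followed by a term starting with "del". The sketch below is plain Java with no Lucene involved; `TermThenPrefixDemo`, `tokenize` and `termThenPrefix` are made-up names, and the tokenization is an assumed stand-in for the standard analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TermThenPrefixDemo {

    // Stand-in for an analyzer (assumption): lowercase, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True when `term` is directly followed by a token that starts with `prefix`
    // (ordered, no gap), which is the toy equivalent of slop 0 and inOrder = true.
    static boolean termThenPrefix(String text, String term, String prefix) {
        List<String> tokens = tokenize(text);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            if (tokens.get(i).equals(term) && tokens.get(i + 1).startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(termThenPrefix("New Delhi", "new", "del"));
        System.out.println(termThenPrefix("New York", "new", "del"));
        System.out.println(termThenPrefix("Delhi New", "new", "del"));
    }
}
```

Only "New Delhi" satisfies the condition; "Delhi New" fails because the order matters, which is what separates this from a plain AND of the two terms.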
As I learned from the thread you mentioned (http://www.gossamer-threads.com/lists/lucene/java-user/5487), you can either do an exact match with the space or treat each part with a wildcard.
So something like this should work: [New* Del*]

Lucene - "AND" sets of "OR" terms

Suppose I have a search using criteria such as a list of countries. A user can select a set of countries to search across and combine this set with other criteria.
In SQL I'd do this in my where clause, i.e. WHERE (country = 'brazil' OR country = 'france' OR country = 'china') AND (other search criteria).
It isn't clear how to do this in Lucene. Query.combine seems to have promise but that would increase in complexity very quickly if I have multiple sets of "OR" terms to work through.
Is Lucene capable in this regard? Or should I just hit my regular DB with these types of criteria and filter my Lucene results?
Digging deeper, it looks like you can nest boolean queries to accomplish this. I'll update with an answer if this technique works and if it is performant.
Using the standard query parser (you can take a look at the relevant documentation), you can use syntax similar to a DB query, such as:
(country:brazil OR country:france OR country:china) AND (other search criteria)
Or, to simplify a bit:
country:(brazil OR france OR china) AND (other search criteria)
Alternatively, Lucene also supports queries written using +/-, rather than AND/OR syntax. I find that syntax more expressive for a Lucene query. The equivalent in this form would be:
+country:(brazil france china) +(other search criteria)
If manually constructing queries, you can indeed nest BooleanQueries to create a similar structure, using the correct BooleanClauses to establish the And/Or logic you've specified:
// Declared as BooleanQuery, since Query itself has no add() method.
// On Lucene 5.3+ use BooleanQuery.Builder instead of the constructor.
BooleanQuery countryQuery = new BooleanQuery();
countryQuery.add(new TermQuery(new Term("country","brazil")),BooleanClause.Occur.SHOULD);
countryQuery.add(new TermQuery(new Term("country","france")),BooleanClause.Occur.SHOULD);
countryQuery.add(new TermQuery(new Term("country","china")),BooleanClause.Occur.SHOULD);
Query otherStuffQuery = //Set up the other query here,
//or get it from a query parser, or something
BooleanQuery rootQuery = new BooleanQuery();
rootQuery.add(countryQuery, BooleanClause.Occur.MUST);
rootQuery.add(otherStuffQuery, BooleanClause.Occur.MUST);
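The nesting logic itself can be checked in isolation. Below is a plain-Java sketch that models each clause as a `java.util.function.Predicate` over a document represented as a map. This is not Lucene code: the "category" field and the value "wine" are hypothetical stand-ins for "other search criteria", and all class and method names are invented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class NestedBooleanDemo {

    // Inner group: like SHOULD clauses, it passes if any listed country matches (OR).
    static Predicate<Map<String, String>> anyCountry(String... countries) {
        Set<String> allowed = new HashSet<>(Arrays.asList(countries));
        return doc -> allowed.contains(doc.get("country"));
    }

    // Hypothetical stand-in for "other search criteria": exact match on a
    // made-up "category" field.
    static Predicate<Map<String, String>> categoryIs(String category) {
        return doc -> category.equals(doc.get("category"));
    }

    // Outer level: like two MUST clauses, both groups have to pass (AND).
    static final Predicate<Map<String, String>> ROOT =
            anyCountry("brazil", "france", "china").and(categoryIs("wine"));

    // Builds a two-field document and evaluates the nested query against it.
    static boolean matches(String country, String category) {
        Map<String, String> doc = new HashMap<>();
        doc.put("country", country);
        doc.put("category", category);
        return ROOT.test(doc);
    }

    public static void main(String[] args) {
        System.out.println(matches("france", "wine"));
        System.out.println(matches("france", "cheese"));
        System.out.println(matches("spain", "wine"));
    }
}
```

Adding another set of OR terms is just one more predicate chained with `and`, which mirrors adding another nested BooleanQuery with Occur.MUST to the root.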
There are two ways.
The first is to let Lucene formulate the query. To do that, pass in the query string in the following format:
Query: "country:(brazil france china)"
The built-in QueryParser parses the above string into a BooleanQuery with an OR operator.
QueryParser qp = new QueryParser(Version.LUCENE_41, "country", new StandardAnalyzer(Version.LUCENE_41));
Query q = qp.parse(s); // s is the query string shown above
The second is to formulate the query yourself:
BooleanQuery bq = new BooleanQuery();
//
TermQuery tq = new TermQuery(new Term("country", "brazil"));
bq.add(tq, Occur.SHOULD); // SHOULD ==> OR operator
//
tq = new TermQuery(new Term("country", "france"));
bq.add(tq, Occur.SHOULD);
//
tq = new TermQuery(new Term("country", "china"));
bq.add(tq, Occur.SHOULD);
Unless you add hundreds of subqueries, Lucene will meet your expectations performance-wise.

Lucene: Can I run a query against few specific docs of the collection only?

Can I run a query against only a few specific docs of the collection?
Can I filter the built collection according to the content of the documents' fields?
For example, I would like to query only documents that have field2 = "abc".
Thanks
Sure -- use a Filter. See http://lucene.apache.org/java/3_0_1/api/core/org/apache/lucene/search/QueryWrapperFilter.html
The code will look something like:
QueryParser qp = ...
Filter filter = new QueryWrapperFilter(qp.parse("field2:abc"));
// pass filter to searcher.search()