Searching using SOLR on multiple fields - apache

I have two requirements for my SOLR implementation:
I need to be able to search on multiple fields at the same time (preferably with field boosting). This is possible using dismax parser.
I also have a specific set of indexed fields (example gender field). I need to be able to apply such specific filters (example: select?q=david&gender:male&status:married). As per my understanding of dismax, this is not possible.
Please suggest if the second requirement can be handled using dismax (or edismax)? For now i am forced to use standard query parser, even though i really liked dismax.

There is nothing stopping you from using dismax or edismax. Use qf to tell it which fields to search by default, and use fq to apply queries that act as filters.
/select?q=david&fq=gender:male&fq=status:married&qf=name^10 address^3
Filter Queries doesn't affect score, and will be cached separately. If you always filter on both gender and status, you could combine them to get a single query cache instead (fq=gender:male AND status:married).

Related

Endeca search query on multiple fields

How to create an Endeca query on combination of multiple fields [just like where clause in sql query]. Suppose we have three fields indexed are -
empId
empName
empGender
Now, I need a query like "where empName like 's%' AND empGender=male"
Thanks.
Firstly,
Checkout Record Filters in the Advanced Development Guide.
If you are trying to use a Record Filter on a property, you will need to enable it explicitly in Developer Studio for that property, while your Dimensions will automatically have the ability to apply a Record Filter. This will help when you have explicit values to filter on, for example empGender.
Your Record Filter can then look as follow:
Nr=AND(empGender:male)
You can further use the Ntk parameter to specify fields to search on so assuming your empName field is enabled for wildcard searching (configure this in Developer Studio) searching this field will look as follow:
Ntk=empName&Ntt=s*
So assuming your properties have been configured correctly, your example above will probably end up looking as follow:
Nr=AND(empGender:male)&Ntk=empName&Ntt=s*
To take this one step further, you can specify Search Filters (ie. Ntk + Ntt parameters) together. I haven't tried this for wildcards so you'll need to confirm that yourself but to combine Search Filters you delimit them with |
Ntk=empName|empId&Ntt=s*|1234*
I suggest you manually build up queries in the Reference Application to confirm you get your expected results and then start to code this up in your application.
radimbe, the problem with record filters for this use case is that they need to be precise. This means you don't get pelling correction, thesaurus expansion, case insensitivity or stemming. It's very unlikely that a user will input precise information like this.
Saraubh, you can do a boolean search to do OR text search queries. You can also use the Endeca Query Language to specify a complex set of boolean logic that goes beyond boolean search and which would incorporate spelling correction, stemming, etc.
In general though, I think for an application like this, you should move away from searching specific individual fields simultaneously and make use of the faceting capabilities of dimensions to guide the user. Additionally, a search box that searches many fields in combination simultaneously in order of importance is really the way to go for a simplified user interface for this sort of application.

Hibernate Search - possible to get new Lucene query after facets applied?

A Lucene Query is generated as so:
Query luceneQuery = builder.all().createQuery();
Then facets are applied.
I'm not sure if when facets are applied the luceneQuery is ANDed and ORed with other Querys resulting in a new Lucene Query. Alternatively, perhaps a bunch of BitSets's are applied to the original Query to refine the results. (I don't know).
If a new query is generated I'd like to retrieve it. If not, I need a rethink. That's the crux of the question.
Why:
I'm applying a faceted search on a field with multiple possible values.
E.g. TMovie.class many-to-many TTag.class (multiple-value-facet)
I'm filtering on TMovie where TTag is some value.
Anyway, the filtering works but there is a known problem whereby the Facet-counts returned are incorrect.
Detailed here: Add faceting over multivalued to application using Hibernate Search and https://forum.hibernate.org/viewtopic.php?f=9&t=1010472
I'm using this solution:
http://sujitpal.blogspot.ie/2007/04/lucene-search-within-search-with.html (see comment on new API under article)
The BitSet solution (in this example at least) generates counts based on the original Lucene Query. This works perfectly. However.....
If alternate (different, not TTags) facets are applied to the original query some complications arise.
The Bitset solution calculates on the original Lucene query. It does not calculate on the lucene query now reduced by the application of alternate Facets (a different FacetSelection) (or even TTag Facets themselves for that matter). I.e. the count calculations are irrespective of any other FacetSelection Facets applied.
So...
A. can I get the new Lucene query after facets are applied? The BitSet solution applied to this would be correct.
B. Any other alternative suggestions?
Thanks so much.. All comments welcome.
John
Regarding your first question, applying a facet is not modifying the original query, it uses a custom Collector called FacetCollector - see https://github.com/hibernate/hibernate-search/blob/master/engine/src/main/java/org/hibernate/search/query/collector/impl/FacetCollector.java. Under the hood the collector uses a Lucene FieldCache for doing the facet count. There is also the root of the limitation for multi-value faceting. FieldCache does not support multiple values per field.
Anyways, no additional queries are applied during faceting and the original query is unmodified. The benefit of course is performance. The solution you are pointing to probably works as well, but relies on running multiple queries. However, it might be a valid work around for your use case.

Any way to merge two queries in solr?

In my project, we use solr to index a lot of different kind of documents, by example Books and Persons, with some common fields (like the name) and some type-specific fields (like the category, or the group people belong to).
We would like to do queries that can find both books and persons, with for each document type some filters applied. Something like:
find all Books and Persons with "Jean" in the name and/or content
but only Books from category "fiction" and "fantasy"
and only Persons from the group "pangolin"
everything sorted by score
A very simple way to do that would be:
q = name:jean content:jean
&
fq=
(type:book AND category:(fiction fantasy))
OR
(type:person AND group:pangolin)
But alas, as fq are cached, I'd prefer something allowing me simpler and so more reusable fq like :
fq=type:book,
fq=type:person,
fq=category(fiction fantasy),
fq=group:pangolin.
Is there a way to tell solr to merge or combine many queries? Something like 'grouping' fq together.
I read a bit about nested queries with _query_, but the very few documentation about it makes me think it's not the solution I'm looking for.
As Geert-Jan mentioned it in his answer, the possibility to do OR between fq is a solr asking feature, but with very little support by now: https://issues.apache.org/jira/browse/SOLR-1223
So I managed to simulate what I want to in a simple way:
for each field a document type can have, we have to define everytime a value (so if in my own example Books can have no category, at index time we still have to define something like category=noCategoryCode
when using a filter on one of this fields in a query on multiple types, we add a non-present condition in the filter, so fq=category:fiction becomes fq=category:fiction (*:* AND -category:*)
By this way, all other types (like Person) will pass through this filter, and the filter stands quite atomic and often used - so caching is still useful.
So, my full example becomes:
q = name:jean content:jean
&
fq= type:(book person)
&
fq= category:(fiction fantasy) (*:* AND -category:*)
&
fq= group:(pangolin) (*:* AND -group:*)
Still, can't wait SOLR-1223 to be patched :)
You can apply multiple filter queries at the same time
q=name:jean content:jean&fq=type:book&fq=type:person&fq=category(fiction fantasy)&fq=group:pangolin
Perhaps I am not understanding your issue, but the only difference between a query and a filter is that the filter is cached. If you don't care about the caching, just modify their query:
real query +((type:book category:fiction) (type:person group:pangolin))

Solr query parser that allows specifying multiple default fields

I would like to use the Dismax query parser because it allows me to specify multiple default search fields (using the 'qf' parameter) as well as other nice features such as field boosting.
However, I want a query parser/scoring algorithm that takes the sum of all field scores, rather than just the max.
Is there a way to configure DisMax to take a sum of scores rather than the max?
Can I specify multiple default search fields using the standard query parser?
Is there a different query parser alltogether that would achieve this?
Do I need to write my own query parser?
Any help is greatly appreciated.
Thanks!
Isn't that qt=fieldA fieldB what you are looking for?
if fieldA is more important do qt=fieldA^2 fieldB

searching in solr for specific values with dismax

I'm using the dismax handler to perform solr search over records (boosting some fields).
In my index, I have a RetailerId for each document, as well as other fields.
My query needs to search for documents that have this RetailerId as well as keywords:
http://localhost:8983/solr/select?qt=dismax&q=RetailerId:(27 OR 92) AND socks
What is the syntax for such a query?
Thanks!
Dismax does not support boolean operators. For a query like the one you described, you need to use the Standard Query Handler.
UPDATE
I have made a couple of tests and the fq parameter seems to work with dismax:
/select?qt=dismax&q=socks&fq=RetailerId:(27 OR 92)
if you want to filter by facet, user eDismax (extended disMax) that way you can say for instance q= your query AND face_name:"facet value"