Solr query parser that allows specifying multiple default fields

I would like to use the Dismax query parser because it allows me to specify multiple default search fields (using the 'qf' parameter) as well as other nice features such as field boosting.
However, I want a query parser/scoring algorithm that takes the sum of all field scores, rather than just the max.
Is there a way to configure DisMax to take a sum of scores rather than the max?
Can I specify multiple default search fields using the standard query parser?
Is there a different query parser altogether that would achieve this?
Do I need to write my own query parser?
Any help is greatly appreciated.
Thanks!

Isn't qf=fieldA fieldB what you are looking for?
If fieldA is more important, use qf=fieldA^2 fieldB.
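On the "sum rather than max" part of the question: the dismax parser has a tie parameter, and as documented a value of 0.0 gives a pure max while 1.0 gives a pure sum (for example defType=dismax&qf=fieldA^2 fieldB&tie=1.0). The sketch below only mirrors that documented combination rule to show the effect. The class name and the per-field scores are made up for illustration, and this is not Solr's actual scoring code.

```java
import java.util.Arrays;

/** Illustrative only: mirrors the documented DisMax combination rule, not Solr's implementation. */
final class DisMaxTieDemo {

    /** score = max(fieldScores) + tie * (sum of the remaining field scores) */
    static double combine(double[] fieldScores, double tie) {
        if (fieldScores.length == 0) {
            return 0.0; // no field matched
        }
        double max = Arrays.stream(fieldScores).max().getAsDouble();
        double sum = Arrays.stream(fieldScores).sum();
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        double[] scores = {3.0, 2.0, 1.0}; // hypothetical per-field scores, boosts already applied
        System.out.println(combine(scores, 0.0)); // tie=0.0: only the best field counts
        System.out.println(combine(scores, 1.0)); // tie=1.0: all field scores are added up
    }
}
```

Values between 0.0 and 1.0 let the non-best fields contribute partially, which may be a reasonable middle ground if a plain sum over-rewards documents that match weakly in many fields.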

Related

How to get all the records matching a regex in Aerospike?

I have millions of records in a set. I would like to retrieve all the records that match the same pattern.
For example I may have :
id=4444?mode=mode1?fieldA=abc
id=4444?mode=mode1?fieldA=azerty
id=4444?mode=mode1?fieldA=qwerty
id=4444?mode=mode1?fieldA=foo
id=4444?mode=mode1?fieldA=bar
Is it possible to make a query to get all the above records without knowing in advance the value of the fieldA ? Something like this in regex :
id=4444?mode=mode1?fieldA=[\w]*
Thanks for your time.
Yes, it can be done. You would need to query by a secondary index first to narrow the result set to a manageable size, then write a filter in Lua that discards the records you don't want. This filter could take the regex you want to match against (passed in dynamically) and return only those records that match.
Whilst this would work, it would not be as performant as the key-value operations in Aerospike. You would definitely want to benchmark such a solution before putting it into production.
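Whichever way the filter is run, the per-record check is a regex match. Below is a minimal sketch of just that check in plain Java, using the sample values from the question. It does not touch the Aerospike API, and the class and method names are made up. Note that '?' is a regex quantifier, so the literal '?' separators have to be escaped or the pattern from the question will not match.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Illustrative only: the regex check a record filter would apply, without any Aerospike calls. */
final class KeyPatternDemo {

    // The literal '?' separators are escaped; \w* accepts any fieldA value made of word characters.
    static final Pattern KEY = Pattern.compile("id=4444\\?mode=mode1\\?fieldA=\\w*");

    /** Returns only the values that match the whole pattern. */
    static List<String> matching(List<String> values) {
        return values.stream()
                .filter(v -> KEY.matcher(v).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> sample = java.util.Arrays.asList(
                "id=4444?mode=mode1?fieldA=abc",
                "id=4444?mode=mode1?fieldA=azerty",
                "id=4444?mode=mode2?fieldA=abc",   // different mode, filtered out
                "id=5555?mode=mode1?fieldA=foo");  // different id, filtered out
        matching(sample).forEach(System.out::println);
    }
}
```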
Predicate filtering was added in release 3.12 on March 15. You can use the stringRegex method of the PredExp class of the Java client to build complex filters such as the one you mentioned. It also currently exists for the C, C# and Go clients.
There's a similar example in the Aerospike Java client:
Statement stmt = new Statement();
stmt.setNamespace(params.namespace);
stmt.setSetName(params.set);
stmt.setFilter(Filter.range(binName, begin, end));
stmt.setPredExp(
    PredExp.stringBin("bin3"),
    PredExp.stringValue("prefix.*suffix"),
    PredExp.stringRegex(RegexFlag.ICASE | RegexFlag.NEWLINE)
);
The RegexFlag class in com.aerospike.client.query defines the flags (such as ICASE and NEWLINE above) that control how the regular expression is matched.

Searching using SOLR on multiple fields

I have two requirements for my SOLR implementation:
I need to be able to search on multiple fields at the same time (preferably with field boosting). This is possible using dismax parser.
I also have a specific set of indexed fields (example gender field). I need to be able to apply such specific filters (example: select?q=david&gender:male&status:married). As per my understanding of dismax, this is not possible.
Please suggest whether the second requirement can be handled using dismax (or edismax). For now I am forced to use the standard query parser, even though I really liked dismax.
There is nothing stopping you from using dismax or edismax. Use qf to tell it which fields to search by default, and use fq to apply queries that act as filters.
/select?q=david&fq=gender:male&fq=status:married&qf=name^10 address^3
Filter queries don't affect the score and are cached separately. If you always filter on both gender and status, you could combine them into a single filter so they share one cache entry (fq=gender:male AND status:married).
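When the request is built in code, each parameter value needs URL encoding (the ^ boosts, the colons and the spaces). Here is a minimal sketch with the JDK only, assuming Java 10+ for URLEncoder.encode(String, Charset). The class and method names are made up, and defType=edismax is added as an assumption, since the URL above would otherwise rely on the handler's default parser.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: builds a /select query string with qf and repeated fq parameters. */
final class SolrSelectUrl {

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** q is the user query, qf the boosted default fields, filters become one fq parameter each. */
    static String build(String q, String qf, List<String> filters) {
        StringBuilder url = new StringBuilder("/select?defType=edismax"); // assumption: edismax parser
        url.append("&q=").append(enc(q));
        url.append("&qf=").append(enc(qf));
        for (String fq : filters) {
            url.append("&fq=").append(enc(fq)); // each fq is applied and cached as its own filter
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("david", "name^10 address^3",
                Arrays.asList("gender:male", "status:married")));
    }
}
```

Passing a single "gender:male AND status:married" element instead of two gives the combined-filter variant mentioned above.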

Endeca search query on multiple fields

How do I create an Endeca query on a combination of multiple fields (just like a WHERE clause in an SQL query)? Suppose the three indexed fields are:
empId
empName
empGender
Now, I need a query like "where empName like 's%' AND empGender=male"
Thanks.
First, check out Record Filters in the Advanced Development Guide.
If you are trying to use a Record Filter on a property, you will need to enable it explicitly in Developer Studio for that property, while your Dimensions will automatically have the ability to apply a Record Filter. This will help when you have explicit values to filter on, for example empGender.
Your Record Filter can then look as follows:
Nr=AND(empGender:male)
You can further use the Ntk parameter to specify the fields to search on. Assuming your empName field is enabled for wildcard searching (configure this in Developer Studio), searching this field will look as follows:
Ntk=empName&Ntt=s*
So, assuming your properties have been configured correctly, your example above will probably end up looking as follows:
Nr=AND(empGender:male)&Ntk=empName&Ntt=s*
To take this one step further, you can specify several Search Filters (i.e. Ntk + Ntt parameters) together. I haven't tried this with wildcards, so you'll need to confirm that yourself, but to combine Search Filters you delimit them with |:
Ntk=empName|empId&Ntt=s*|1234*
I suggest you manually build up queries in the Reference Application to confirm you get your expected results and then start to code this up in your application.
radimbe, the problem with record filters for this use case is that they need to be precise. This means you don't get spelling correction, thesaurus expansion, case insensitivity or stemming. It's very unlikely that a user will input precise information like this.
Saraubh, you can do a boolean search to do OR text search queries. You can also use the Endeca Query Language to specify a complex set of boolean logic that goes beyond boolean search and which would incorporate spelling correction, stemming, etc.
In general though, I think for an application like this, you should move away from searching specific individual fields simultaneously and make use of the faceting capabilities of dimensions to guide the user. Additionally, a search box that searches many fields in combination simultaneously in order of importance is really the way to go for a simplified user interface for this sort of application.

What is the Filter equivalent of TermQuery?

I'm doing some complex querying using Lucene 4.0, and my information-retrieval-theory buddy has told me that anywhere that I can use a filter instead of a query, I should, in order to improve performance. Therefore, I decided to take one particularly hairy component of the query and transform it into a filter. This is relatively straightforward, as there are Filter equivalents of BooleanQuery and NumericRangeQuery, but there doesn't seem to be a TermFilter equivalent of TermQuery. There is a FieldValueFilter, but that seems only to filter on the presence of a given field, not a particular value in that field.
What filter should I use for this?
I believe TermsFilter is what you are looking for.

searching in solr for specific values with dismax

I'm using the dismax handler to perform solr search over records (boosting some fields).
In my index, I have a RetailerId for each document, as well as other fields.
My query needs to search for documents that have this RetailerId as well as keywords:
http://localhost:8983/solr/select?qt=dismax&q=RetailerId:(27 OR 92) AND socks
What is the syntax for such a query?
Thanks!
Dismax does not support boolean operators. For a query like the one you described, you need to use the Standard Query Handler.
UPDATE
I have made a couple of tests and the fq parameter seems to work with dismax:
/select?qt=dismax&q=socks&fq=RetailerId:(27 OR 92)
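If the list of retailer IDs comes from code, the fq value can be assembled from it. This is a small sketch with the JDK only. The class and method names are made up, and the result still has to be URL-encoded before it goes into the request.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: builds a filter clause such as RetailerId:(27 OR 92). */
final class RetailerFilter {

    /** Joins the ids with OR inside one parenthesised clause for the given field. */
    static String anyOf(String field, List<Integer> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("at least one id is required");
        }
        String joined = ids.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR "));
        return field + ":(" + joined + ")";
    }

    public static void main(String[] args) {
        System.out.println(anyOf("RetailerId", java.util.Arrays.asList(27, 92)));
    }
}
```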
If you want to filter by facet, use eDisMax (extended DisMax); that way you can write, for instance, q=your query AND facet_name:"facet value"