How to exclude results having (or not) a specific value in a collection? - lucene

Let's say I have an entity more or less like this (pseudocode):
class Contact {
String name;
String surname;
List<Address> addresses;
}
class Address {
String streetName;
String type;
}
* Let's say every field is correctly annotated with @Field / @Indexed / @Embeddable etc.
Using JPA with Hibernate Search, I can find contacts correctly with full-text and fuzzy queries, but I cannot find a way to limit the search to
name or surname or (addresses.streetName, but only if addresses.type="XYZ"). I don't want it to search streetName values whose address is not of type XYZ.
org.apache.lucene.search.Query baseQuery = qb
.keyword()
.fuzzy()
.onFields("name", "surname")
.matching(String.join("+", queryStrings))
.createQuery();
org.apache.lucene.search.Query addressQueryRestriction = qb.keyword()
.onField("addresses.type")
.matching("XYZ")
.createQuery();
org.apache.lucene.search.Query addressQuery = qb.keyword().fuzzy()
.onFields("addresses.streetName")
.matching(String.join("+", queryStrings))
.createQuery();
org.apache.lucene.search.Query queryAddressComposite = qb
.bool()
.must(addressQuery)
.must(addressQueryRestriction)
.createQuery();
org.apache.lucene.search.Query finalQuery = qb
.bool()
.should(baseQuery)
.should(queryAddressComposite)
.createQuery();
I've tried composing alternative queries with .bool().must() / should() / must().not(), but without much success, especially when a contact has an XYZ address but also others that aren't.
I'm starting to think it's a logical issue here, as I'm looking into a list, but if you have any idea of what I'm doing wrong, please blast me.

If you indexed-embed a list of addresses in your document, and want to apply conditions to each of these addresses, rather than to all of them merged together, you need to index each object as a nested document and then use a "nested" predicate.
The concept of nested documents exists in Hibernate Search 6 (still in Beta), but not in Hibernate Search 5.
I would recommend that you upgrade.
See this section of the Hibernate Search 6 documentation for more information.
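To make the difference concrete, here is a minimal plain-Java sketch of the two matching semantics. It does not use the Hibernate Search API; the class and method names are hypothetical, and substring matching stands in for the real full-text match. It only shows why conditions applied to merged address values can match a contact whose XYZ address has a different street.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Hibernate Search API.
final class NestedMatchSketch {

    static final class Address {
        final String streetName;
        final String type;

        Address(String streetName, String type) {
            this.streetName = streetName;
            this.type = type;
        }
    }

    // Flattened semantics: each condition may be satisfied by a different address.
    static boolean matchesFlattened(List<Address> addresses, String street, String type) {
        boolean streetHit = addresses.stream().anyMatch(a -> a.streetName.contains(street));
        boolean typeHit = addresses.stream().anyMatch(a -> a.type.equals(type));
        return streetHit && typeHit;
    }

    // Nested semantics: both conditions must hold on the same address.
    static boolean matchesNested(List<Address> addresses, String street, String type) {
        return addresses.stream()
                .anyMatch(a -> a.streetName.contains(street) && a.type.equals(type));
    }

    // A contact with one XYZ address and one address of another type.
    static List<Address> sample() {
        return Arrays.asList(
                new Address("Main Street", "XYZ"),
                new Address("Oak Avenue", "ABC"));
    }

    public static void main(String[] args) {
        System.out.println(matchesFlattened(sample(), "Oak", "XYZ"));
        System.out.println(matchesNested(sample(), "Oak", "XYZ"));
    }
}
```

Searching for "Oak" restricted to type XYZ matches under the flattened semantics, because "Oak Avenue" and "XYZ" both appear somewhere in the merged values, but not under the nested semantics, which is the behaviour the question asks for.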

Related

How to get many terms matched using Hibernate Search query DSL?

When I search for "cars blue" I get every result that matches "cars" or "blue", but I need to match them both. I've read about setting some defaultOperator to AND, but I can't find where to do that.
Also, I can't use PhraseQuery because the order of the terms in the search query is irrelevant.
This is my code so far. Thanks!
// create the query using Hibernate Search query DSL
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
.buildQueryBuilder().forEntity(Articulo.class).get();
// a very basic query by keywords
BooleanJunction<BooleanJunction> bool = queryBuilder.bool();
bool.must(queryBuilder.keyword()
.onFields("description")
.matching(text)
.createQuery()
);
Query query = bool.createQuery();
FullTextQuery jpaQuery =
fullTextEntityManager.createFullTextQuery(query, Articulo.class);
return jpaQuery.getResultList();
Note: I'm using Hibernate Search 5.6.4
I think you're looking for the Simple query string feature.
See http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_simple_query_string_queries for more details about it.
You have an example with .withAndAsDefaultOperator():
Query luceneQuery = mythQB
.simpleQueryString()
.onField("history")
.withAndAsDefaultOperator()
.matching("storm tree")
.createQuery();
This blog post explaining the rationale of this feature might be helpful too: http://in.relation.to/2017/04/27/simple-query-string-what-about-it/ .
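As a rough illustration of what the default operator changes, the plain-Java sketch below contrasts "any term" (OR) with "all terms, in any order" (AND). It is not Hibernate Search code: the class name is hypothetical and the whitespace tokenizer is a naive stand-in for a real analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of OR vs AND as the default operator.
final class DefaultOperatorSketch {

    // Naive lowercase whitespace tokenizer standing in for a real analyzer.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    // OR semantics: a document matches if it contains at least one query term.
    static boolean matchesAny(String document, String query) {
        Set<String> docTokens = tokens(document);
        return tokens(query).stream().anyMatch(docTokens::contains);
    }

    // AND semantics: a document matches only if it contains every query term, in any order.
    static boolean matchesAll(String document, String query) {
        return tokens(document).containsAll(tokens(query));
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("red cars", "cars blue"));
        System.out.println(matchesAll("red cars", "cars blue"));
        System.out.println(matchesAll("blue vintage cars", "cars blue"));
    }
}
```

With OR, "red cars" matches the query "cars blue"; with AND it does not, while "blue vintage cars" still matches because term order is irrelevant.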

Search by exact words in a phrase using Umbraco Examine

Each content item has a description field. The values are:
For content1:
The quick brown fox jumps over the lazy dog. And the lazy dog is good.
For content2:
The lazy fog is crazy.
Now, when I search for the keyword lazy dog, I want to get content1 as the result and not content2.
I tried this:
BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["MySearch"];
ISearchCriteria criteria =
searcher.CreateSearchCriteria()
.GroupedAnd( new List<string> { "description" }, "lazy dog" )
.Compile();
ISearchResults result = searcher.Search( criteria );
But it didn't give me the desired results; it returned both content1 and content2.
What should I do to get only content1 as the result?
By default, Examine compiles this query to:
+(+description:lazy dog)
so it matches on the individual words lazy and dog rather than on the exact phrase.
What you want to achieve is:
+(+description:"lazy dog")
The first thing to try is escaping the phrase. In your case it will be:
BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["MySearch"];
ISearchCriteria criteria =
searcher.CreateSearchCriteria()
.GroupedAnd( new List<string> { "description" }, "lazy dog".Escape() )
.Compile();
ISearchResults result = searcher.Search( criteria );
I can't test it now, but from what I remember there were some problems with it in the past. The second option, which may be a lifesaver for you, is to build the search query manually and use a raw query.
BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["MySearch"];
ISearchCriteria criteria = searcher.CreateSearchCriteria();
var query = criteria.RawQuery("+description:\"lazy dog\"");
ISearchResults result = searcher.Search( query );
This should return only the matching result. Personally, I've also boosted specific words to push some results higher in the score list, but if you only want exact matches, try the solutions above and let me know whether they helped.
If you want to deal with more than one property, you can either use fluent API methods like GroupedAnd or GroupedOr (depending on the desired search behaviour) or build a more advanced raw query.
For the first option, check the Grouped Operations documentation: https://github.com/Shazwazza/Examine/wiki/Grouped-Operations.
For the second scenario, it would be best to analyze how it's done in, for example, the ezSearch package (which, by the way, is awesome!): https://github.com/umco/umbraco-ezsearch/blob/master/Src/Our.Umbraco.ezSearch/Web/UI/Views/MacroPartials/ezSearch.cshtml.
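If you build raw queries by hand, the quoting is the part that is easy to get wrong. Here is a small sketch, written in Java rather than C# only to keep it self-contained, of a helper that produces the +field:"phrase" form shown above. The helper name is hypothetical, and it assumes standard Lucene query syntax, where a backslash escapes a double quote inside a phrase.

```java
// Hypothetical helper; not part of Examine or Lucene.
final class RawPhraseQuerySketch {

    // Builds +field:"phrase", escaping backslashes and double quotes inside the phrase
    // so that user input cannot terminate the quoted phrase early.
    static String requiredPhrase(String field, String phrase) {
        String escaped = phrase.replace("\\", "\\\\").replace("\"", "\\\"");
        return "+" + field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // prints +description:"lazy dog"
        System.out.println(requiredPhrase("description", "lazy dog"));
    }
}
```

The resulting string is what you would pass to RawQuery in the second option above.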

Lucene.net PerFieldAnalyzerWrapper

I've read up on how to use the per-field analyzer wrapper, but can't get it to work with a custom analyzer of mine. I can't even get the analyzer's constructor to run, which makes me believe I'm calling the per-field analyzer incorrectly.
Here's what I'm doing:
Create the per field analyzer:
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("<special field>", dta);
Add all the fields to the document as usual, including a special field that we analyze differently.
Then add the document using the analyzer like this:
iw.AddDocument(doc, perFieldAnalyzer);
Am I on the right track?
The problem was related to my reliance on the built-in Lucene helper classes of the CMS (Kentico). Basically, with those classes you need to specify the custom analyzer at index level through the CMS, and I did not wish to do that. So I ended up using Lucene.net directly almost everywhere, gaining the flexibility to use any custom analyzer I want.
I also made some changes to how I structure data and ended up using the tried-and-true KeywordAnalyzer to analyze document tags. Previously I was trying to do some custom tokenization magic on comma-separated values like [tag1, tag2, tag with many parts] and could not get it working reliably with multi-part tags. I still kept that field, but started adding multiple "tag" fields to the document, each storing one tag. So now I have N "tag" fields for N tags, each analyzed as a keyword, meaning each tag (one word or many) is a single token.
I think I overthought it with my initial approach.
Here is what I ended up with.
On Indexing:
KeywordAnalyzer ka = new KeywordAnalyzer();
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("documenttags_t", ka);
// Some procedure to compile all documents by reading from the DB and putting them into Lucene docs
foreach(var doc in docs)
{
iw.AddDocument(doc, perFieldAnalyzer);
}
On Searching:
KeywordAnalyzer ka = new KeywordAnalyzer();
PerFieldAnalyzerWrapper perFieldAnalyzer = new PerFieldAnalyzerWrapper(srchInfo.GetAnalyzer(true));
perFieldAnalyzer.AddAnalyzer("documenttags_t", ka);
string baseQuery = "documenttags_t:\"" + tagName + "\"";
Query query = _parser.Parse(baseQuery);
var results = _searcher.Search(query, sortBy);
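To show why one keyword-analyzed field per tag behaves differently from a tokenized field, here is a small sketch in plain Java (not Lucene.net; the class and method names are hypothetical, and whitespace splitting is only a stand-in for a tokenizing analyzer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not Lucene.net API.
final class TagTokenSketch {

    // Keyword-style analysis: each tag value is kept whole as a single token.
    static List<String> keywordTokens(List<String> tags) {
        return new ArrayList<>(tags);
    }

    // Whitespace-style analysis: multi-word tags are split into separate tokens.
    static List<String> whitespaceTokens(List<String> tags) {
        List<String> out = new ArrayList<>();
        for (String tag : tags) {
            out.addAll(Arrays.asList(tag.trim().split("\\s+")));
        }
        return out;
    }

    static List<String> sampleTags() {
        return Arrays.asList("tag1", "tag2", "tag with many parts");
    }

    public static void main(String[] args) {
        System.out.println(keywordTokens(sampleTags()));
        System.out.println(whitespaceTokens(sampleTags()));
    }
}
```

An exact lookup for "tag with many parts" finds a token only in the keyword-style output, which is why the multi-part tags became reliable once each tag was stored in its own keyword-analyzed field.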

How do you get Endeca to search on a particular target field rather than across all indexed fields?

We have an Endeca index configured across multiple fields of email content - subject and body. But we only want searches to be performed on the subject lines. Endeca is returning matches within the bodies too. How do you limit the search to the subject?
You can search a specific field or fields by specifying it (them) with the Ntk parameter.
Or if you wish to search a specific group of fields frequently you can set up an interface (also specified with the Ntk parameter), that includes that group of fields.
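At the URL level, a record search restricted to one search key can be sketched roughly as below. This is only an illustration under assumptions: the helper and class names are hypothetical, "subject" is assumed to be the name of the searchable property or search interface, N=0 is the root navigation state, and the search terms are assumed to travel in the Ntt parameter alongside Ntk.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds the query-string part of a search request.
final class EndecaSearchUrlSketch {

    // searchKey is the property or search interface name (sent as Ntk),
    // terms is the user's search text (sent as Ntt).
    static String searchQuery(String searchKey, String terms) {
        return "N=0"
                + "&Ntk=" + URLEncoder.encode(searchKey, StandardCharsets.UTF_8)
                + "&Ntt=" + URLEncoder.encode(terms, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints N=0&Ntk=subject&Ntt=quarterly+report
        System.out.println(searchQuery("subject", "quarterly report"));
    }
}
```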
This is how you can do it using the Presentation API.
final ENEQuery query = new ENEQuery();
final DimValIdList dimValIdList = new DimValIdList("0");
query.setNavDescriptors(dimValIdList);
final ERecSearchList searches = new ERecSearchList();
final StringBuilder builder = new StringBuilder();
for(final String productId : productIds){
builder.append(productId);
builder.append(" ");
}
final ERecSearch eRecSearch = new ERecSearch("product.id", builder.toString().trim(), "mode matchany");
searches.add(eRecSearch);
query.setNavERecSearches(searches);
Please see this post for a complete example.
Use search interfaces in Developer Studio.
See http://docs.oracle.com/cd/E28912_01/DeveloperStudio.612/pdf/DevStudioHelp.pdf#page=209

SQL Select Like Keywords in Any Order

I am building a search function for a shopping cart site, which queries a SQL Server database. When the user enters "Hula Hoops" in the search box, I want results for all records containing both "Hula" and "Hoop", in any order. Furthermore, I need to search multiple columns (e.g. ProductName, Description, ShortName, ManufacturerName, etc.).
All of these product names should be returned, when searching for "Hula hoop":
Hula hoop
Hoop Hula
The Hoopity of xxhula sticks
(Bonus points if these can be ordered by relevance!)
It sounds like you're really looking for full-text search, especially since you want to weight the words.
In order to use LIKE, you'll have to use multiple expressions (one per word, per column), which means dynamic SQL. I don't know which language you're using, so I can't provide an example, but you'll have to produce a statement that's like this:
For "Hula Hoops":
where (ProductName like '%hula%' or Description like '%hula%' or ShortName like '%hula%')
and (ProductName like '%hoops%' or Description like '%hoops%' or ShortName like '%hoops%')
etc.
Unfortunately, that's really the only way to do it. Using Full Text Search would allow you to reduce your criteria to one per column, but you'll still have to specify the columns explicitly.
Since you're using SQL Server, I'm going to hazard a guess that this is a C# question. You'd have to do something like this (assuming you're constructing the SqlCommand or DbCommand object yourself; if you're using an ORM, all bets are off and you probably wouldn't be asking this anyway):
SqlCommand command = new SqlCommand();
int paramCount = 0;
string searchTerms = "Hula Hoops";
string commandPrefix = @"select *
from Products";
StringBuilder whereBuilder = new StringBuilder();
foreach(string term in searchTerms.Split(' '))
{
if(whereBuilder.Length == 0)
{
whereBuilder.Append(" where ");
}
else
{
whereBuilder.Append(" and ");
}
paramCount++;
SqlParameter param = new SqlParameter(string.Format("param{0}",paramCount), "%" + term + "%");
command.Parameters.Add(param);
whereBuilder.AppendFormat("(ProductName like @param{0} or Description like @param{0} or ShortName like @param{0})",paramCount);
}
command.CommandText = commandPrefix + whereBuilder.ToString();
SQL Server Full Text Search should help you out. You basically create full-text indexes on the columns you want to search. In the WHERE clause of your query, you use the CONTAINS predicate and pass it your search input.
You can start HERE or HERE to learn more.
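As a rough sketch of the CONTAINS approach, the helper below turns the user's input into a search condition that requires every word, using prefix terms. It is written in Java only to keep the example self-contained. The class and method names are hypothetical, the column list in the comment is assumed from the question, and full-text prefix terms match the beginnings of words, so "xxhula" would not be found this way.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper; the SQL below assumes a full-text index exists on the listed columns:
//   SELECT * FROM Products
//   WHERE CONTAINS((ProductName, Description, ShortName), ?)
final class ContainsConditionSketch {

    // Builds a condition such as "Hula*" AND "Hoop*" from whitespace-separated input.
    // Double quotes are stripped from each word so the input cannot break the quoting.
    static String allPrefixTerms(String input) {
        return Arrays.stream(input.trim().split("\\s+"))
                .map(term -> term.replace("\"", ""))
                .filter(term -> !term.isEmpty())
                .map(term -> "\"" + term + "*\"")
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        // prints "Hula*" AND "Hoop*"
        System.out.println(allPrefixTerms("Hula Hoop"));
    }
}
```

The result is meant to be bound as the query parameter rather than concatenated into the SQL text. An empty input yields an empty string, which is not a valid search condition, so the caller should skip the query in that case.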
You might want to check out Solr too if you're going to be doing this type of searching. Super cool.
http://lucene.apache.org/solr/