How to boost hibernate-search query with field values? - lucene

I have two fields in an entity class:
establishmentName
contactType
contactType has values like PBX, GSM, TEL and FAX
I want a scoring mechanism that returns the best-matching data first, and then orders results by contact type: PBX, TEL, GSM and FAX.
Scoring:
On establishmentName to get the most matching data first
On contactType to get first PBX then TEL and so on
My final query is:
(+establishmentName:kamran~1^2.5 +(contactType:PBX^2.0 contactType:TEL^1.8 contactType:GSM^1.6 contactType:FAX^1.4))
But it is not returning any results.
My question is: how can I boost a specific field based on its different values?
We can use the following query for two different fields:
Query query = qb.keyword()
.onField( field_one).boostedTo(2.0f)
.andField( field_two)
.matching( searchTerm)
.createQuery();
But I need to boost a field based on its values, which in my case is contactType.
My dataset:
(establishmentName : Concert Decoration, contactType : GSM),
(establishmentName : Elissa Concert, contactType : TEL),
(establishmentName : Yara Concert, contactType : FAX),
(establishmentName : E Concept, contactType : TEL),
(establishmentName : Infinity Concept, contactType : FAX),
(establishmentName : SD Concept, contactType : PBX),
(establishmentName : Broadcom Technical Concept, contactType : GSM),
(establishmentName : Concept Businessmen, contactType : PBX)
Searching for term=concert (fuzzy query on establishmentName) should return the list below:
(establishmentName : Elissa Concert, contactType : TEL)
[term=concert, exact matching so it will be on top by keeping the
order as PBX, TEL, GSM and FAX]
(establishmentName : Concert Decoration, contactType : GSM)
[term=concert, exact matching and by keeping the order as PBX, TEL,
GSM and FAX]
(establishmentName : Yara Concert, contactType : FAX)
[term=concert, exact matching and by keeping the order as PBX, TEL,
GSM and FAX]
(establishmentName : Concept Businessmen, contactType : PBX)
[term=concert, partial matching and keeping the order as PBX, TEL, GSM
and FAX]
(establishmentName : SD Concept, contactType : PBX)
[term=concert, partial matching and keeping the order as PBX, TEL, GSM
and FAX]
(establishmentName : E Concept, contactType : TEL)
[term=concert, partial matching and keeping the order as PBX, TEL,
GSM and FAX]
(establishmentName : Broadcom Technical Concept, contactType : GSM)
[term=concert, partial matching and keeping the order as PBX, TEL, GSM
and FAX]
(establishmentName : Infinity Concept, contactType : FAX)
[term=concert, partial matching and keeping the order as PBX, TEL, GSM
and FAX]

From what I understand you basically want a two-phase sort:
Put exact matches before other (fuzzy) matches.
Sort by contact type.
The second sort is trivial, but the first one will require a bit of work.
You can actually rely on scoring to implement it.
Essentially the idea would be to run a disjunction of multiple queries, and to assign a constant score to each query.
Instead of doing this:
Query query = qb.keyword()
.fuzzy().withEditDistanceUpTo(1)
.boostedTo(2.5f)
.onField("establishmentName")
.matching(searchTerm)
.createQuery();
Do this:
Query query = qb.bool()
.should(qb.keyword()
.withConstantScore().boostedTo(100.0f) // Higher score, sort first
.onField("establishmentName")
.matching(searchTerm)
.createQuery())
.should(qb.keyword()
.fuzzy().withEditDistanceUpTo(1)
.withConstantScore().boostedTo(1.0f) // Lower score, sort last
.onField("establishmentName")
.matching(searchTerm)
.createQuery())
.createQuery();
The matched documents will be the same, but now the query will assign predictable scores: 1.0 for fuzzy-only matches, and 101.0 (1 from the fuzzy query and 100 from the exact query) for exact matches.
This way, you can define the sort as follows:
fullTextQuery.setSort(qb.sort()
.byScore()
.andByField("contactType")
.createSort());
This may not be a very elegant, or optimized solution, but I think it will work.
To customize the relative order of contact types, I would suggest a different approach: use a custom bridge to index numbers instead of "PBX"/"TEL"/etc., assigning to each contact type the ordinal you expect. Essentially something like this:
public class Establishment {
@Field(name = "contactType_sort", bridge = @FieldBridge(impl = ContactTypeOrdinalBridge.class))
private ContactType contactType;
}
public class ContactTypeOrdinalBridge implements MetadataProvidingFieldBridge {
@Override
public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
if ( value != null ) {
int ordinal = getOrdinal((ContactType) value);
luceneOptions.addNumericFieldToDocument(name, ordinal, document);
luceneOptions.addNumericDocValuesFieldToDocument(name, ordinal, document);
}
}
@Override
public void configureFieldMetadata(String name, FieldMetadataBuilder builder) {
builder.field(name, FieldType.INTEGER).sortable(true);
}
private int getOrdinal(ContactType value) {
switch( value ) {
case PBX: return 0;
case TEL: return 1;
case GSM: return 2;
case FAX: return 3;
default: return 4;
}
}
}
Then reindex, and sort like this:
fullTextQuery.setSort(qb.sort()
.byScore()
.andByField("contactType_sort")
.createSort());
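To sanity-check the ordering this approach should produce, here is a minimal plain-Java simulation of "constant score first, then contact type ordinal" over the dataset from the question. It does not use Hibernate Search or Lucene: the class `RankingSketch`, the `score` helper and the simple Levenshtein distance are all illustrative assumptions (Lucene's fuzzy matching is more involved), and the ordinals are assumed to be PBX=0, TEL=1, GSM=2, FAX=3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of the two-level sort; all names are hypothetical.
class RankingSketch {

    // Declaration order doubles as the sort ordinal: PBX first, FAX last.
    enum ContactType { PBX, TEL, GSM, FAX }

    static final class Row {
        final String name;
        final ContactType type;
        Row(String name, ContactType type) { this.name = name; this.type = type; }
    }

    // Classic dynamic-programming Levenshtein distance (a simplification of Lucene's fuzzy matching).
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    // 101 if a word equals the term (exact and fuzzy clauses both match),
    // 1 if a word is within edit distance 1 (fuzzy clause only), 0 if nothing matches.
    static int score(String name, String term) {
        int best = 0;
        for (String word : name.toLowerCase().split("\\s+")) {
            if (word.equals(term)) return 101;
            if (editDistance(word, term) <= 1) best = 1;
        }
        return best;
    }

    // Returns the matching rows as "name (TYPE)", best score first, then by ordinal.
    static List<String> rank(String term) {
        List<Row> rows = new ArrayList<>(Arrays.asList(
            new Row("Concert Decoration", ContactType.GSM),
            new Row("Elissa Concert", ContactType.TEL),
            new Row("Yara Concert", ContactType.FAX),
            new Row("E Concept", ContactType.TEL),
            new Row("Infinity Concept", ContactType.FAX),
            new Row("SD Concept", ContactType.PBX),
            new Row("Broadcom Technical Concept", ContactType.GSM),
            new Row("Concept Businessmen", ContactType.PBX)));
        rows.removeIf(r -> score(r.name, term) == 0);
        rows.sort(Comparator
            .comparingInt((Row r) -> -score(r.name, term))   // higher score first
            .thenComparingInt(r -> r.type.ordinal()));        // then PBX, TEL, GSM, FAX
        List<String> out = new ArrayList<>();
        for (Row r : rows) out.add(r.name + " (" + r.type + ")");
        return out;
    }

    public static void main(String[] args) {
        rank("concert").forEach(System.out::println);
    }
}
```

Rows with the same score and the same contact type keep their input order here, because `List.sort` is stable; a real index gives no such guarantee unless a further sort field is added.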

Related

JPA 2 Relationship JOIN Named Query

I'm using Spring Boot 2.4 with JPA 2.0 and I have a model like the following:
@Entity
@Data
public class Nation {
@Id
@GeneratedValue (strategy = GenerationType.IDENTITY)
private Integer id;
@OneToMany(mappedBy = "nation")
private List<Country> country;
}
And:
@Entity
@Data
public class Country {
@Id
@GeneratedValue (strategy = GenerationType.IDENTITY)
private Integer id;
@ManyToOne
private Nation nation;
}
Now I would like to find all Nations filtered by their ID and by country ID. In pure SQL it is simply something like:
select * from nation n, country c where n.id = [nation_id] AND c.id = [country_id];
Therefore I thought about doing this way with JPA:
@Query("select n from Nation n JOIN n.country c where n.id = ?1 AND c.id = ?2")
public List<Nation> find(Integer nationID, Integer countryID);
But it doesn't work; it is filtered by Nation but not by countries.
If I print the Hibernate-generated SQL by adding:
spring.jpa.show-sql=true
I can see that the query is exactly the same as the one I posted above in pure SQL. The problem occurs when I invoke nation.getCountry(): it generates another query that loads all countries connected to the given Nation id.
Is there a way to solve this?
I believe that you can use JPA DTO projections for this case ...
So, in your Nation class create a constructor like this:
/**
* Copy Constructor. It creates a "detached" copy of the
* given nation with only a copy of the provided country.
*/
public Nation(Nation n, Country c) {
super();
this.id = n.id;
// copy other nation values ...
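this.country = new ArrayList<>(); // the list is not initialized anywhere else in the entity
this.country.add( new Country(c) ); // assumes Country has a similar copy constructor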
}
Modify your query to invoke this constructor, something like this (assuming that Nation is declared in the Java package my.domain):
@Query("select new my.domain.Nation(n, c) from Nation n JOIN n.country c where n.id = ?1 AND c.id = ?2")
Disclaimer:
I have done this using JPA and Hibernate. So far, I haven't tested it with Spring, but I guess this does not matter because your JPA provider is probably Hibernate too.
I have done this using only parts (or attributes) of the target entities (as described in the provided link); I have never passed the full entities (as I suggest doing in the query). To explain: in the cases where I have applied this, I had a constructor like Nation(String name, int population), and in the query I did something like: SELECT new my.domain.Nation(n.name, c.population) ... Try passing the full entities to see if it works; if it fails, fall back to a constructor that receives only the attributes you require for your business case.

Process data representing relationships between a web of entities

Here are the facts:
There are many companies.
Each company can have many businesses.
There are many addresses.
You don't know which businesses are owned by which companies (or the name of the company).
However, you do know the address of each business and you know a business might trade at more than one address.
Forming relationships between addresses:
If a business has the same address as another then, for the purpose of this question, we will say that they are owned by the same company.
i.e. a link is formed between two addresses when a business uses both addresses.
So, an address "A" might be linked to many other addresses.
Note that:
6a. the addresses that address "A" links to might also be linked to one OR MORE addresses.
6b. One of the addresses "A" links to might link back to "A" via a third address (i.e. two businesses that use both these addresses)
A complex example of this is shown in the picture attached. In this picture, there are only two companies. One has the red business, the other has the blue, green and black business.
Here is some example data in tableBA ( I have attached a photo to describe these relationships)
BUSINESS Address
A 1
A 2
B 1
C 3
D 4 < four businesses sharing the same address
E 4
F 4
G 4
W 2
W 5
X 5
X 6
So I want to create code that will produce the following output. The output has one row per company and lists the business names that are in the company.
i.e. there is one row for every complete chain of addresses
A,B,W,X
D,E,F,G
C
This question is a simplification/improvement of another SO question here.
This answer to the other question uses a combination of SQL and VBA code to solve the problem, because MS Access doesn't support recursive joining.
How can this be done with pure SQL, either with recursive joining or some other technique (not with a stored procedure)?
This is an SQL Server answer.
Quote from this answer:
To clarify:
A business can have multiple addresses
Any business in a business group (AKA company) shares an address with any other business in the group
A business group can own multiple businesses
Each business is only associated with one business group (corollary to second point)
SQL Fiddle with sample data
We can refer to each business group by the first (smallest name in
alphabetical order) business in the group. Let's call this the key
business. After we've identified the key business for each business,
we can group by the key business and get the results.
In order to get the key business:
Generate a list of pairs of businesses where both businesses are in the same group, based on any shared address. This list should exclude
the following (see next point for why):
A -> B, when we have B -> A
A -> A
The left side of the pairs should be unique: each business should appear on the left side of the pair no more than once, if at all.
For each business, follow the pairs from one to the next until the right business is never the left business in any other pair. That is
the key business.
That is the reason for the exclusions in the first point. If we have both A -> B and B -> A, we'll get to a never-ending loop.
Same goes for A -> A.
The following query will return the key business for each business:
WITH Pairs AS (
SELECT Businesses.Business AS Business2, MIN(Businesses_1.Business) AS Business1
FROM Businesses
INNER JOIN Businesses AS Businesses_1 ON Businesses.Address = Businesses_1.Address
WHERE Businesses.Business > Businesses_1.Business
GROUP BY Businesses.Business
),
KeyBusinesses AS (
SELECT Business2 AS Business, Business1 AS KeyBusiness
FROM Pairs
UNION ALL
SELECT Pairs.Business2, KeyBusinesses.KeyBusiness
FROM Pairs
INNER JOIN KeyBusinesses ON Pairs.Business1 = KeyBusinesses.Business
)
SELECT Businesses.*, ISNULL(KeyBusinesses.KeyBusiness, Businesses.Business) AS KeyBusiness
FROM Businesses
LEFT JOIN KeyBusinesses ON Businesses.Business = KeyBusinesses.Business
SQL Fiddle
This is a crude solution (probably not efficient), but it gets the correct results for the sample data. It works by maintaining two maps to a Company (where a Company contains a list of business names). The first maps business address to company, the second maps business name to company. If the company is not found in either map, a new company is created:
package test;
import org.junit.Assert;
import org.junit.Test;
import java.util.*;
import java.util.stream.Collectors;
public class Companies {
public static List<Company> listCompanies(List<Business> businesses) {
List<Company> companies = new ArrayList<>();
Map<String, Company> companyByAddress = new HashMap<>(); // map allows many addresses to map to same company
Map<String, Company> companyByBusiness = new HashMap<>(); // map allows many businesses to map to same company
for (Business business : businesses) {
Company company = companyByAddress.get(business.address);
if (company == null)
company = companyByBusiness.get(business.name);
if (company != null) {
company.addAddress(business.name);
} else {
company = new Company();
company.addAddress(business.name);
companies.add(company);
}
companyByBusiness.put(business.name, company);
companyByAddress.put(business.address, company);
}
return companies;
}
@Test
public void testOne() {
List<Business> businesses = new ArrayList<>();
businesses.add(new Business("A", "1"));
businesses.add(new Business("A", "2"));
businesses.add(new Business("B", "1"));
businesses.add(new Business("C", "3"));
businesses.add(new Business("D", "4"));
businesses.add(new Business("E", "4"));
businesses.add(new Business("F", "4"));
businesses.add(new Business("G", "4"));
businesses.add(new Business("W", "2"));
businesses.add(new Business("W", "5"));
businesses.add(new Business("X", "5"));
businesses.add(new Business("X", "6"));
List<Company> companies = listCompanies(businesses);
Assert.assertEquals("A, B, W, X", companies.get(0).toString()); // compare the rendered names, not the Company object
Assert.assertEquals("C", companies.get(1).toString());
Assert.assertEquals("D, E, F, G", companies.get(2).toString());
}
static class Business {
private final String name;
private final String address;
Business(String business, String address) {
this.name = business;
this.address = address;
}
}
static class Company {
private final Set<String> addresses; // Despite the name, this holds business names; being a "Set", each occurs only once
Company() {
this.addresses = new LinkedHashSet<>(); // A "LinkedHashSet" preserves insertion order
}
void addAddress(String address) {
addresses.add(address);
}
@Override
public String toString() {
return addresses.stream().collect(Collectors.joining(", "));
}
}
}
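The two-map approach depends on the order of the input rows: if two companies have already been created and a later row shows they are linked, they are not merged. A more order-independent way to express the same grouping is a union-find (disjoint set) over businesses and addresses. The following is a minimal sketch of that swapped-in technique, not the code from the answer above; the class `BusinessGroups`, its methods and the `"B:"`/`"A:"` key prefixes are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Union-find over business and address nodes; every row links one business to one address.
class BusinessGroups {
    private final Map<String, String> parent = new HashMap<>();

    // Finds the representative of x, registering x on first use and compressing the path.
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!parent.get(root).equals(root)) root = parent.get(root);
        while (!parent.get(x).equals(root)) {
            String next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    private void union(String a, String b) {
        String ra = find(a);
        String rb = find(b);
        if (!ra.equals(rb)) parent.put(ra, rb);
    }

    // rows[i] = {business, address}; returns one comma-separated line per company, sorted.
    static List<String> group(String[][] rows) {
        BusinessGroups uf = new BusinessGroups();
        // Prefixes keep business names and address ids in separate namespaces.
        for (String[] row : rows) uf.union("B:" + row[0], "A:" + row[1]);
        Map<String, TreeSet<String>> groups = new HashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(uf.find("B:" + row[0]), k -> new TreeSet<>()).add(row[0]);
        }
        List<String> result = new ArrayList<>();
        for (TreeSet<String> names : groups.values()) result.add(String.join(",", names));
        Collections.sort(result);
        return result;
    }

    // The sample data from the question.
    static String[][] sampleRows() {
        return new String[][] {
            {"A", "1"}, {"A", "2"}, {"B", "1"}, {"C", "3"},
            {"D", "4"}, {"E", "4"}, {"F", "4"}, {"G", "4"},
            {"W", "2"}, {"W", "5"}, {"X", "5"}, {"X", "6"}
        };
    }

    public static void main(String[] args) {
        group(sampleRows()).forEach(System.out::println);
    }
}
```

Because each row is just a union of two nodes, the result does not depend on the order in which rows are read.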

Sub-optimal queries over many-to-many relations with HQL

I have two entities, Location and Industry, and a link-table between them. I've configured a many-to-many relationship, in both directions, between the two entities.
In a search query, I'm trying to select Locations that are associated with a list of industries.
After days and days of trying to wrangle the criteria API, I've decided to drop down to HQL and abandon the criteria API. But even that isn't going well for me - it seems, regardless of whether I hand-write this HQL query, or let the criteria API do it, I end up with the same result.
I managed to produce the right result in two ways - like this:
var q = Data.Query("select distinct loc from Location loc join loc.Industries ind where ind in (:ind)");
q.SetParameterList("ind", new Industry[] { Data.GetIndustry(4), Data.GetIndustry(5) });
And (better) like this:
var q = Data.Query("select distinct loc from Location loc join loc.Industries ind where ind.id in (:ind)");
q.SetParameterList("ind", new int[] { 4, 5 });
Unfortunately, both result in a sub-optimal query:
select distinct
location0_.Id as Id16_,
location0_.Name as Name16_,
(etc.)
from Location location0_
inner join LocationIndustry industries1_
on location0_.Id=industries1_.LocationId
inner join Industry industry2_
on industries1_.IndustryId=industry2_.Id
where
industry2_.Id in (? , ?)
Why the extra join?
Is NH not smart enough to know that the Industry.Id property, being the only Industry-property involved in the query, is stored in the LocationIndustry link-table, and there is no need for the extra join to the Industry table itself?
Or am I doing something wrong?
Ideally, the most intuitive thing for me would be to write:
from Location loc where loc.Industries in (:ind)
This does not work - it throws an error and says it does not know about the Industries property. I guess because Industries, being a "property" in programming terms, is actually a "relationship" in terms of DBMS.
What is the simplest and most efficient way to write this query in HQL?
Thanks!
I'm not sure you can avoid this extra join given the mapping strategy you have used.
You could avoid it by using an intermediary class but this would mean you would need a class structure like this:
public class Industry {
//Other stuff
public virtual IList<LocationIndustry> LocationIndustries {get; set;}
}
public class LocationIndustry {
public virtual Location Location {get; set;}
public virtual Industry Industry {get; set;}
}
public class Location {
//normal stuff
public virtual IList<LocationIndustry> LocationIndustries {get; set;}
}
Then you can query on the LocationIndustry class and avoid the join to Location.

Order By Aggregate Subquery with NHibernate ICriteria or QueryOver

Is there a way to achieve SQL like this with NHibernate ICriteria or QueryOver?
select *
from [BlogPost] b
inner join (select blogpost_id, count(*) matchCount
from [Tag]
where name in ('tag X', 'tag Y')
group by blogpost_id
) tagmatch
on tagmatch.blogpost_id = b.Id
order by tagmatch.matchCount desc
The aim is to rank blog posts by the number of matching tags so that a post with both tag X and tag Y comes above posts with just tag X.
I've got this so far:
DetachedCriteria
.For<Tag>("tag")
.Add(Restrictions.In(Projections.Property<Tag>(x => x.Name), tags.ToArray()))
.SetProjection(Projections.Group<Tag>(t => t.BlogPost))
.CreateCriteria("BlogPost")
.SetFetchMode("BlogPost", FetchMode.Eager)
.AddOrder(Order.Desc(Projections.RowCount()));
However, the resulting query doesn't join fetch BlogPost. Instead it returns just the ids, which leads to select n+1 when the BlogPosts are iterated.
public class BlogPost
{
...
ISet<Tag> Tags {get; set;}
}
public class Tag
{
BlogPost BlogPost { get; set; }
string Name { get; set; }
}
This looks like a similar issue.
Is this now possible with NHibernate 3?
If not, is there an alternative solution?
I can change schema & domain model if necessary. I don't want to use SQL or HQL if possible.
I know this question was asked some time ago, but I want to do about the same thing. Please take a look at my question here, and this one here; maybe you can use the idea.

NHibernate sorting (SQL as a second option)

I'm using NHibernate as my ORM and I'm trying to sort some data. The data needs to be retrieved paged.
Two of the columns in my Request table are UrgencyID and CreateDate. UrgencyID is a FK to the Urgency table with static data:
1 = Low, 2 = Normal, 3 = High, 4 = Critical.
I need to order my Requests in the following manner.
Critical Requests by CreateDate descending
All other requests by CreateDate descending
So my list of Requests should always have Critical by CreateDate desc at the top and then all other Requests (disregarding UrgencyID) by CreateDate desc
Is it possible to perform this sort order in NHibernate (using the Criteria API)?
If not, how would I do this in SQL? In a stored procedure?
EDIT: Solution thanks to both @DanP and @Michael Pakhantsov
Using the this_ prefix in the SQL string, as this is the default NHibernate prefix for the primary table selection.
public class OperatorJobQueueOrder : Order
{
public OperatorJobQueueOrder() : base("", true) { }
public override NHibernate.SqlCommand.SqlString ToSqlString(ICriteria criteria, ICriteriaQuery criteriaQuery)
{
return new NHibernate.SqlCommand.SqlString("case this_.JobRequestUrgencyID when 4 then 4 else 0 end desc, this_.CreatedDate desc");
}
}
You may be able to create a custom sort order to handle this through the criteria API; see this question for an example implementation.
In SQL, the ORDER BY would be
ORDER by case UrgencyId when 4 then 4 else 0 end desc, CreateDate desc
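The intended ordering (critical requests first, everything else after, each group by creation date descending) can be checked in memory with a two-key comparator. This is a minimal Java sketch of the ordering semantics only, not NHibernate code; `UrgencyOrderSketch`, `Request` and the sample rows are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// In-memory version of: ORDER BY (critical first), CreateDate DESC. All names are illustrative.
class UrgencyOrderSketch {
    static final int CRITICAL = 4; // 1 = Low, 2 = Normal, 3 = High, 4 = Critical

    static final class Request {
        final String label;
        final int urgencyId;
        final LocalDate createDate;
        Request(String label, int urgencyId, LocalDate createDate) {
            this.label = label;
            this.urgencyId = urgencyId;
            this.createDate = createDate;
        }
    }

    static final Comparator<Request> QUEUE_ORDER = Comparator
        // Boolean.FALSE sorts before Boolean.TRUE, so critical requests come first.
        .comparing((Request r) -> r.urgencyId != CRITICAL)
        // Within each group, newest creation date first.
        .thenComparing((Request a, Request b) -> b.createDate.compareTo(a.createDate));

    static List<String> sortedLabels() {
        List<Request> requests = new ArrayList<>(Arrays.asList(
            new Request("R1", 1, LocalDate.of(2024, 1, 5)),
            new Request("R2", 4, LocalDate.of(2024, 1, 2)),
            new Request("R3", 3, LocalDate.of(2024, 1, 9)),
            new Request("R4", 4, LocalDate.of(2024, 1, 7)),
            new Request("R5", 2, LocalDate.of(2024, 1, 3))));
        requests.sort(QUEUE_ORDER);
        List<String> labels = new ArrayList<>();
        for (Request r : requests) labels.add(r.label);
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(sortedLabels());
    }
}
```

Note that the non-critical group ignores the urgency level entirely, which is why a High request only outranks a Low one when it is newer.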