How to configure Hibernate Search to find words with accents - lucene

I need to implement a global search, on a website which I am implementing using Spring (4.0.2)/Hibernate(4.3.1)/MySQL. I have decided to use Hibernate Search(4.5.0) for this.
This seems to be working fine, but only when I do a search for an exact pattern.
Imagine I have the following text on an indexed field:
"A história do Capuchinho e do Lobo Mau"
1) If I search for "história" or "lobo mau", the query will retrieve the corresponding indexed entity, as I would have expected.
2) If I search for "historia" or "lobos maus" the search will not retrieve the entity.
As far as I have read, it should be possible to configure Hibernate Search to perform a much smarter search than this. Can anyone point me on the right direction to achieve this? See below key aspects of the implementation I executed. Thanks!
This is the "parent" indexed entity
#Entity
#Table(name="NEWS_HEADER")
#Indexed
public class NewsHeader implements Serializable {
static final long serialVersionUID = 20140301L;
private int id;
private String articleHeader;
private String language;
private Set<NewsParagraph> paragraphs = new HashSet<NewsParagraph>();
/**
* #return the id
*/
#Id
#Column(name="ID")
#GeneratedValue(strategy=GenerationType.AUTO)
#DocumentId
public int getId() {
return id;
}
/**
* #param id the id to set
*/
public void setId(int id) {
this.id = id;
}
/**
* #return the articleHeader
*/
#Column(name="ARTICLE_HEADER")
#Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
public String getArticleHeader() {
return articleHeader;
}
/**
* #param articleHeader the articleHeader to set
*/
public void setArticleHeader(String articleHeader) {
this.articleHeader = articleHeader;
}
/**
* #return the language
*/
#Column(name="LANGUAGE")
public String getLanguage() {
return language;
}
/**
* #param language the language to set
*/
public void setLanguage(String language) {
this.language = language;
}
/**
* #return the paragraphs
*/
#OneToMany(mappedBy="newsHeader", fetch=FetchType.EAGER, cascade=CascadeType.ALL)
#IndexedEmbedded
public Set<NewsParagraph> getParagraphs() {
return paragraphs;
}
// Other standard getters/setters go here
And this the IndexedEmbedded entity
#Entity
#Table(name="NEWS_PARAGRAPH")
public class NewsParagraph implements Serializable {
static final long serialVersionUID = 20140302L;
private int id;
private String content;
private NewsHeader newsHeader;
/**
* #return the id
*/
#Id
#Column(name="ID")
#GeneratedValue(strategy=GenerationType.AUTO)
public int getId() {
return id;
}
/**
* #param id the id to set
*/
public void setId(int id) {
this.id = id;
}
/**
* #return the content
*/
#Column(name="CONTENT")
#Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
public String getContent() {
return content;
}
// Other standard getters/setters go here
This is my search method, implemented on my SearchDAOImpl
public class SearchDAOImpl extends DAOBasics implements SearchDAO {
...
public List<NewsHeader> searchParagraph(String patternStr) {
Session session = null;
Transaction tx;
List<NewsHeader> result = null;
try {
session = sessionFactory.getCurrentSession();
FullTextSession fullTextSession = Search.getFullTextSession(session);
tx = fullTextSession.beginTransaction();
// Create native Lucene query using the query DSL
QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(NewsHeader.class).get();
org.apache.lucene.search.Query luceneSearchQuery = queryBuilder
.keyword()
.onFields("articleHeader", "paragraphs.content")
.matching(patternStr)
.createQuery();
// Wrap Lucene query in a org.hibernate.Query
org.hibernate.Query hibernateQuery =
fullTextSession.createFullTextQuery(luceneSearchQuery, NewsHeader.class, NewsParagraph.class);
// Execute search
result = hibernateQuery.list();
} catch (Exception xcp) {
logger.error(xcp);
} finally {
if ((session != null) && (session.isOpen())) {
session.close();
}
}
return result;
}
...
}

This is what I have ended up doing, to resolve my problem.
Configure an AnalyzerDef at the entity level. Within it, use LowerCaseFilterFactory, ASCIIFoldingFilterFactory and SnowballPorterFilterFactory to achieve the type of filtering I needed.
#AnalyzerDef(name = "customAnalyzer",
tokenizer = #TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
#TokenFilterDef(factory = LowerCaseFilterFactory.class),
#TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
#TokenFilterDef(factory = SnowballPorterFilterFactory.class)
})
public class NewsHeader implements Serializable {
...
}
Add this notation for each of the fields I want indexed, either in the Parent entity or its IndexedEmbedded counterpart, to use the defined above analyzer.
#Field(index=Index.YES, store=Store.NO)
#Analyzer(definition = "customAnalyzer")
You will need to either re-index, or re-insert your entities, for the analyser to take effect.

You could configure, or you can use a standard language analyzer, such as PortugueseAnalyzer. I'd recommend starting from the existing analyzer, and creating you own if necessary, using it as a starting point for tweaking the filter chain.
You can set this in using the #Analyzer annotation for the field:
#Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO, analyzer = #Analyzer(impl = org.apache.lucene.analysis.pt.PortugueseAnalyzer.class))
Or you can set that analyzer as the default for the class, if you place an #analyzerannotation are the head of the class instead.

If you want to search accent character and also find same normal keyword in result then you must have to implement ASCIIFoldingFilterFactory class in analyzer like
#AnalyzerDef(name = "customAnalyzer",
tokenizer = #TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
#TokenFilterDef(factory = LowerCaseFilterFactory.class),
#TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
#TokenFilterDef(factory = StopFilterFactory.class, params = {
#Parameter(name="words", value= "com/ik/resource/stoplist.properties" ),
#Parameter(name="ignoreCase", value="true")
})
})
#Analyzer(definition = "customAnalyzer") apply on entity or fields

Related

Search where A or B with querydsl and spring data rest

http://localhost:8080/users?firstName=a&lastName=b ---> where firstName=a and lastName=b
How to make it to or ---> where firstName=a or lastName=b
But when I set QuerydslBinderCustomizer customize
#Override
default public void customize(QuerydslBindings bindings, QUser user) {
bindings.bind(String.class).all((StringPath path, Collection<? extends String> values) -> {
BooleanBuilder predicate = new BooleanBuilder();
values.forEach( value -> predicate.or(path.containsIgnoreCase(value) );
});
}
http://localhost:8080/users?firstName=a&firstName=b&lastName=b ---> where (firstName=a or firstName = b) and lastName=b
It seem different parameters with AND. Same parameters with what I set(predicate.or/predicate.and)
How to make it different parameters with AND like this ---> where firstName=a or firstName=b or lastName=b ??
thx.
Your current request param are grouped as List firstName and String lastName. I see that you want to keep your request parameters without a binding, but in this case it would make your life easier.
My suggestion is to make a new class with request param:
public class UserRequest {
private String lastName;
private List<String> firstName;
// getters and setters
}
For QueryDSL, you can create a builder object:
public class UserPredicateBuilder{
private List<BooleanExpression> expressions = new ArrayList<>();
public UserPredicateBuilder withFirstName(List<String> firstNameList){
QUser user = QUser.user;
expressions.add(user.firstName.in(firstNameList));
return this;
}
//.. same for other fields
public BooleanExpression build(){
if(expressions.isEmpty()){
return Expressions.asBoolean(true).isTrue();
}
BooleanExpression result = expressions.get(0);
for (int i = 1; i < expressions.size(); i++) {
result = result.and(expressions.get(i));
}
return result;
}
}
And after you can just use the builder as :
public List<User> getUsers(UserRequest userRequest){
BooleanExpression expression = new UserPredicateBuilder()
.withFirstName(userRequest.getFirstName())
// other fields
.build();
return userRepository.findAll(expression).getContent();
}
This is the recommended solution.
If you really want to keep the current params without a binding (they still need some kind of validation, otherwise it can throw an Exception in query dsl binding)
you can group them by path :
Map<StringPath,List<String>> values // example firstName => a,b
and after that to create your boolean expression based on the map:
//initial value
BooleanExpression result = Expressions.asBoolean(true).isTrue();
for (Map.Entry entry: values.entrySet()) {
result = result.and(entry.getKey().in(entry.getValues());
}
return userRepository.findAll(result);

JPA query to retrieve data using list of matching tuples as arguments [duplicate]

I have an Entity Class like this:
#Entity
#Table(name = "CUSTOMER")
class Customer{
#Id
#Column(name = "Id")
Long id;
#Column(name = "EMAIL_ID")
String emailId;
#Column(name = "MOBILE")
String mobile;
}
How to write findBy method for the below query using crudrepository spring data jpa?
select * from customer where (email, mobile) IN (("a#b.c","8971"), ("e#f.g", "8888"))
I'm expecting something like
List<Customer> findByEmailMobileIn(List<Tuple> tuples);
I want to get the list of customers from given pairs
I think this can be done with org.springframework.data.jpa.domain.Specification. You can pass a list of your tuples and proceed them this way (don't care that Tuple is not an entity, but you need to define this class):
public class CustomerSpecification implements Specification<Customer> {
// names of the fields in your Customer entity
private static final String CONST_EMAIL_ID = "emailId";
private static final String CONST_MOBILE = "mobile";
private List<MyTuple> tuples;
public ClaimSpecification(List<MyTuple> tuples) {
this.tuples = tuples;
}
#Override
public Predicate toPredicate(Root<Customer> root, CriteriaQuery<?> query, CriteriaBuilder cb) {
// will be connected with logical OR
List<Predicate> predicates = new ArrayList<>();
tuples.forEach(tuple -> {
List<Predicate> innerPredicates = new ArrayList<>();
if (tuple.getEmail() != null) {
innerPredicates.add(cb.equal(root
.<String>get(CONST_EMAIL_ID), tuple.getEmail()));
}
if (tuple.getMobile() != null) {
innerPredicates.add(cb.equal(root
.<String>get(CONST_MOBILE), tuple.getMobile()));
}
// these predicates match a tuple, hence joined with AND
predicates.add(andTogether(innerPredicates, cb));
});
return orTogether(predicates, cb);
}
private Predicate orTogether(List<Predicate> predicates, CriteriaBuilder cb) {
return cb.or(predicates.toArray(new Predicate[0]));
}
private Predicate andTogether(List<Predicate> predicates, CriteriaBuilder cb) {
return cb.and(predicates.toArray(new Predicate[0]));
}
}
Your repo is supposed to extend interface JpaSpecificationExecutor<Customer>.
Then construct a specification with a list of tuples and pass it to the method customerRepo.findAll(Specification<Customer>) - it returns a list of customers.
It is maybe cleaner using a projection :
#Entity
#Table(name = "CUSTOMER")
class CustomerQueryData {
#Id
#Column(name = "Id")
Long id;
#OneToOne
#JoinColumns(#JoinColumn(name = "emailId"), #JoinColumn(name = "mobile"))
Contact contact;
}
The Contact Entity :
#Entity
#Table(name = "CUSTOMER")
class Contact{
#Column(name = "EMAIL_ID")
String emailId;
#Column(name = "MOBILE")
String mobile;
}
After specifying the entities, the repo :
CustomerJpaProjection extends Repository<CustomerQueryData, Long>, QueryDslPredicateExecutor<CustomerQueryData> {
#Override
List<CustomerQueryData> findAll(Predicate predicate);
}
And the repo call :
ArrayList<Contact> contacts = new ArrayList<>();
contacts.add(new Contact("a#b.c","8971"));
contacts.add(new Contact("e#f.g", "8888"));
customerJpaProjection.findAll(QCustomerQueryData.customerQueryData.contact.in(contacts));
Not tested code.

Spring JDBCTemplates: execute a query with Join

I'm developing an app for DB querying, using Spring Boot and JDBCTemplates.
The problem is this: if I have to ask the db on a single table, I have no problems. But, if I have a join, how can I perform this task?
More specifically, the SQL commands to create tables are these:
CREATE TABLE firewall_items
(
id INT NOT NULL AUTO_INCREMENT,
firewall_id INT NOT NULL,
date DATE,
src VARCHAR(15),
src_port INT,
dst VARCHAR(15),
dst_port INT,
protocol VARCHAR(4),
PRIMARY KEY(id)
);
CREATE TABLE firewalls (
id INT NOT NULL AUTO_INCREMENT,
name VARCHAR(20) NOT NULL,
ip VARCHAR(15) NOT NULL,
info TEXT,
PRIMARY KEY(id)
);
The correspondings java class are these:
import java.util.Date;
public class FirewallItems
{
private Date date;
private String id;
private String protocol;
private String src;
private String dst;
private String src_port;
private String dst_port;
private String firewall_id;
public FirewallItems() {}
public FirewallItems(Date data, String identificativo, String protocollo, String sorgente, String destinazione,
String porta_sorgente, String porta_destinazione, String firewall_id)
{
super();
this.date = data;
this.id = identificativo;
this.protocol = protocollo;
this.src = sorgente;
this.dst = destinazione;
this.src_port = porta_sorgente;
this.dst_port = porta_destinazione;
this.firewall_id = firewall_id;
}
/**
* Return the date of the report
* #return date
*/
public Date getDate()
{
return date;
}
/**
* Set the date of the report
* #param date the report's date
*/
public void setDate(Date date)
{
this.date = date;
}
/**
* Return the id of the report
* #return id
*/
public String getId()
{
return id;
}
/**
* Set the id of the report
* #param id the report's id
*/
public void setId(String id)
{
this.id = id;
}
/**
* Return the protocol cecked by report
* #return protocol
*/
public String getProtocol()
{
return protocol;
}
/**
* Set the protocol cecked by report
* #param protocol
*/
public void setProtocol(String protocol)
{
this.protocol = protocol;
}
/**
* Return the source of firewall's drop
* #return Src
*/
public String getSrc()
{
return src;
}
/**
* Set the source of firewall's drop
* #param src the firewall's source drop
*/
public void setSrc(String src)
{
this.src = src;
}
/**
* Return the firewall's destionation drop
* #return dst
*/
public String getDst()
{
return dst;
}
/**
* Set the firewall's destination drop
* #param dst the firewall's destination drop
*/
public void setDst(String dst)
{
this.dst = dst;
}
/**
* Return the source's port
* #return src_port
*/
public String getSrc_port()
{
return src_port;
}
/**
* Set the source's port
* #param src_port the source's port
*/
public void setSrc_port(String src_port)
{
this.src_port = src_port;
}
/**
* Return the destination's port
* #return dst_port
*/
public String getDst_port()
{
return dst_port;
}
/**
* Set the destination's port
* #param dst_port the destination's port
*/
public void setDst_port(String dst_port)
{
this.dst_port = dst_port;
}
/**
* Return the id of firewall associated to report
* #return firewall_id
*/
public String getFirewall_id()
{
return firewall_id;
}
/**
* Set the id of firewall associated to report
* #param firewall_id the id of firewall associated to report
*/
public void setFirewall_id(String firewall_id)
{
this.firewall_id = firewall_id;
}
}
public class Firewall
{
private String id;
private String ip;
private String info;
private String name;
/**
* Empty constructor, which instantiates a Firewall specimen without setting default values
*/
public Firewall() {}
/**
* Constructor instantiating a Firewall specimen specifying its initial values
*
* #param id the firewall's id code
* #param ip the firewall's ip code
* #param info the info about firewall
* #param name firewall's name
*/
public Firewall(String id, String ip, String info, String nome)
{
super();
this.id = id;
this.ip = ip;
this.info = info;
this.name = nome;
}
/**
* Return the firewall's id
* #return id
*/
public String getId()
{
return id;
}
/**
* Set firewall's id
* #param id the firewall's id
*/
public void setId(String id)
{
this.id = id;
}
/**
* Return the firewall's ip
* #return ip
*/
public String getIp()
{
return ip;
}
/**
* Set firewall's ip
* #param ip the firewall's ip
*/
public void setIp(String ip)
{
this.ip = ip;
}
/**
* Return firewall's info
* #return info
*/
public String getInfo()
{
return info;
}
/**
* Set firewall's info
* #param info firewall's info fields
*/
public void setInfo(String info)
{
this.info = info;
}
/**
* Return firewall's name
* #return name
*/
public String getName()
{
return name;
}
/**
* Set firewall's name
* #param name firewall's name
*/
public void setName(String name)
{
this.name = name;
}
}
The constraint is that firewall_Items.firewall_id = firewall.id (so, these are the variables that i must use to perform join).
Now, if i want perform this query:
SELECT info, src
FROM firewalls, firewall_items
WHERE firewall_items.firewall_id = firewalls.id;
How my java code must be, using jdbctemplate?
Should i add to firewall class a collection to collect object of FirewallsItems, like an ArrayList?
Note1: i must use jdbctemplate project specifications. I can't use Hibernate or other instruments.
Note2: I know what rowmapper and resultset are, i regolary use them with query on a single table. What i nedd is to understand how to use them for a query with join, like that of the example.
Thanks a lot in advance for response!
you should use the JOIN keyword to join your tables before you query them.
Like so:
String query= "SELECT firewall_items.src, firewalls.info
FROM firewall_items
JOIN firewalls
ON firewall_items.firewalls_id = firewalls.id"
List<Item> items = jdbcTemplate.query(
query,
new MapSqlParameterSource(),
new FirewallInfoRowMapper()
);
Where Item is your retrieved object. You decide what that is.
Look at this article for more info
EDIT:
In response to your further inquiry. Above is the amended usage of jdbcTemplate, below you can find the classes you need. This requires you to have Spring.
I've assumed that if you're using jdbcTemplate you already have Spring.
Below is a cheat sheet, but please look at this site and learn more about querying databases with java Spring and jdbcTemplates.
The correct implementation for a row mapper is like so :
public class FirewallInfoRowMapper implements RowMapper<FirewallInfo>{
#Override
public FirewallInfo mapRow(ResultSet rs, int rowNum) throws SQLException{
return new FirewallInfo(rs.getString("src"), rs.getString("info"))
}
}
public class FirewallInfo{
private String src;
private String info;
public FirewallInfo(String src, String info){
this.src = src;
this.info = info;
}
{}<<< Getters and Setters Here
}
I know its late. There aren't a lot of good tutorial out there so incase someone else needs to know how to execute join query with spring and jdbc template. This is the way I did mine.
First ensure to import the following jars in your dao class like so
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import org.springframework.jdbc.core.BeanPropertyRowMapper;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
Then in your FirewallDao class, do something like
public List<Firewall> getData(){
return template.query("SELECT firewalls.info, firewall_items.src FROM firewalls INNER JOIN firewall_items ON firewall_items.firewall_id = firewalls.id",new RowMapper<Firewall>(){
public Firewall mapRow(ResultSet rs, int row) throws SQLException {
Firewall f =new Firewall();
f.setInfo(rs.getString(1));
f.setSrc(rs.getString(2));
return f;
}
});
}
In you controller, something like
#Autowired
FirewallDao dao;
#RequestMapping("/viewinfo")
public ModelAndView viewinfo(){
List<Firewall> list=dao.getData();
return new ModelAndView("viewinfo","list",list);
}
in ur view, do something like
<table border="2" width="70%" cellpadding="2">
<tr><th>Info</th><th>Src</th></th><th>Edit</th></tr>
<c:forEach var="f" items="${list}">
<tr>
<td>${f.info}</td>
<td>${f.src}</td>
</tr>
</c:forEach>
</table>

Hibernate Search with AnalyzerDiscriminator - Analyzer called only when creating Entity?

can you help me?
I am implementing Hibernate Search, to retrieve results for a global search on a localized website (portuguese and english content)
To do this, I have followed the steps indicated on the Hibernate Search docs:
http://docs.jboss.org/hibernate/search/4.5/reference/en-US/html_single/#d0e4141
Along with the specific configuration in the entity itself, I have implemented a "LanguageDiscriminator" class, following the instructions in this doc.
Because I am not getting exactly the results I was expecting (e.g. my entity has the text "Capuchinho" stored, but when I search for "capucho" I get no hits), I have decided to try and debug the execution, and try to understand if the Analyzers which I have configured are being used at all.
When creating a new record for the entity in the database, I can see that the "getAnalyzerDefinitionName()" method from the "LanguageDiscriminator" gets called. Great. But the same does not happen when I execute a search. Can anyone explain me why?
I am posting the key parts of my code below. Thanks a lot for any feedback!
This is one entity I want to index
#Entity
#Table(name="NEWS_HEADER")
#Indexed
#AnalyzerDefs({
#AnalyzerDef(name = "en",
tokenizer = #TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
#TokenFilterDef(factory = LowerCaseFilterFactory.class),
#TokenFilterDef(factory = SnowballPorterFilterFactory.class,
params = {#Parameter(name="language", value="English")}
)
}
),
#AnalyzerDef(name = "pt",
tokenizer = #TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
#TokenFilterDef(factory = LowerCaseFilterFactory.class),
#TokenFilterDef(factory = SnowballPorterFilterFactory.class,
params = {#Parameter(name="language", value="Portuguese")}
)
}
)
})
public class NewsHeader implements Serializable {
static final long serialVersionUID = 20140301L;
private int id;
private String articleHeader;
private String language;
private Set<NewsParagraph> paragraphs = new HashSet<NewsParagraph>();
/**
* #return the id
*/
#Id
#Column(name="ID")
#GeneratedValue(strategy=GenerationType.AUTO)
#DocumentId
public int getId() {
return id;
}
/**
* #param id the id to set
*/
public void setId(int id) {
this.id = id;
}
/**
* #return the articleHeader
*/
#Column(name="ARTICLE_HEADER")
#Field(index=Index.YES, store=Store.NO)
public String getArticleHeader() {
return articleHeader;
}
/**
* #param articleHeader the articleHeader to set
*/
public void setArticleHeader(String articleHeader) {
this.articleHeader = articleHeader;
}
/**
* #return the language
*/
#Column(name="LANGUAGE")
#Field
#AnalyzerDiscriminator(impl=LanguageDiscriminator.class)
public String getLanguage() {
return language;
}
...
}
This is my LanguageDiscriminator class
public class LanguageDiscriminator implements Discriminator {
#Override
public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
String result = null;
if (value != null) {
result = (String) value;
}
return result;
}
}
This is my search method present in my SearchDAO
public List<NewsHeader> searchParagraph(String patternStr) {
Session session = null;
Transaction tx;
List<NewsHeader> result = null;
try {
session = sessionFactory.getCurrentSession();
FullTextSession fullTextSession = Search.getFullTextSession(session);
tx = fullTextSession.beginTransaction();
// Create native Lucene query using the query DSL
QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(NewsHeader.class).get();
org.apache.lucene.search.Query luceneSearchQuery = queryBuilder
.keyword()
.onFields("articleHeader", "paragraphs.content")
.matching(patternStr)
.createQuery();
// Wrap Lucene query in a org.hibernate.Query
org.hibernate.Query hibernateQuery =
fullTextSession.createFullTextQuery(luceneSearchQuery, NewsHeader.class, NewsParagraph.class);
// Execute search
result = hibernateQuery.list();
} catch (Exception xcp) {
logger.error(xcp);
} finally {
if ((session != null) && (session.isOpen())) {
session.close();
}
}
return result;
}
When creating a new record for the entity in the database, I can see that the "getAnalyzerDefinitionName()" method from the "LanguageDiscriminator" gets called. Great. But the same does not happen when I execute a search. Can anyone explain me why?
The selection of the analyzer is dependent on the state of a given entity, in your case NewsHeader. You are dealing with entity instances during indexing. While querying you don't have entities to start with, you are searching for them. Which analyzer would you Hibernate Search to select for your query?
That said, I think there is a shortcoming in the DSL. It does not allow you to explicitly specify the analyzer for a class. There is ignoreAnalyzer, but that's not what you want. I guess you could create a feature request in the Search issue tracker - https://hibernate.atlassian.net/browse/HSEARCH.
In the mean time you can build the query using the native Lucene query API. However, you will need to know which language you are targeting with your query (for example via the preferred language of the logged in user or whatever). This will depend on your use case. It might be you are looking at the wrong feature to start with.

why does hibernate hql distinct cause an sql distinct on left join?

I've got this test HQL:
select distinct o from Order o left join fetch o.lineItems
and it does generate an SQL distinct without an obvious reason:
select distinct order0_.id as id61_0_, orderline1_.order_id as order1_62_1_...
The SQL resultset is always the same (with and without an SQL distinct):
order id | order name | orderline id | orderline name
---------+------------+--------------+---------------
1 | foo | 1 | foo item
1 | foo | 2 | bar item
1 | foo | 3 | test item
2 | empty | NULL | NULL
3 | bar | 4 | qwerty item
3 | bar | 5 | asdfgh item
Why does hibernate generate the SQL distinct? The SQL distinct doesn't make any sense and makes the query slower than needed.
This is contrary to the FAQ which mentions that hql distinct in this case is just a shortcut for the result transformer:
session.createQuery("select distinct o
from Order o left join fetch
o.lineItems").list();
It looks like you are using the SQL DISTINCT keyword here. Of course, this is not SQL, this is HQL. This distinct is just a shortcut for the result transformer, in this case. Yes, in other cases an HQL distinct will translate straight into a SQL DISTINCT. Not in this case: you can not filter out duplicates at the SQL level, the very nature of a product/join forbids this - you want the duplicates or you don't get all the data you need.
thanks
Have a closer look at the sql statement that hibernate generates - yes it does use the "distinct" keyword but not in the way I think you are expecting it to (or the way that the Hibernate FAQ is implying) i.e. to return a set of "distinct" or "unique" orders.
It doesn't use the distinct keyword to return distinct orders, as that wouldn't make sense in that SQL query, considering the join that you have also specified.
The resulting sql set still needs processing by the ResultTransformer, as clearly the sql set contains duplicate orders. That's why they say that the HQL distinct keyword doesn't directly map to the SQL distinct keyword.
I had the exact same problem and I think this is an Hibernate issue (not a bug because code doesn't fail). However, I have to dig deeper to make sure it's an issue.
Hibernate (at least in version 4 which its the version I'm working on my project, specifically 4.3.11) uses the concept of SPI, long story short: its like an API to extend or modify the framework.
I took advantage of this feature to replace the classes org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory (This class is called by Hibernate and delegates the job of generating the SQL query) and org.hibernate.hql.internal.ast.QueryTranslatorImpl (This is sort of an internal class which is called by org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory and generates the actual SQL query). I did it as follows:
Replacement for org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory:
package org.hibernate.hql.internal.ast;
import java.util.Map;
import org.hibernate.engine.query.spi.EntityGraphQueryHint;
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.hql.spi.QueryTranslator;
public class NoDistinctInSQLASTQueryTranslatorFactory extends ASTQueryTranslatorFactory {
#Override
public QueryTranslator createQueryTranslator(String queryIdentifier, String queryString, Map filters, SessionFactoryImplementor factory, EntityGraphQueryHint entityGraphQueryHint) {
return new NoDistinctInSQLQueryTranslatorImpl(queryIdentifier, queryString, filters, factory, entityGraphQueryHint);
}
}
Replacement for org.hibernate.hql.internal.ast.QueryTranslatorImpl:
package org.hibernate.hql.internal.ast;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.hibernate.HibernateException;
import org.hibernate.MappingException;
import org.hibernate.QueryException;
import org.hibernate.ScrollableResults;
import org.hibernate.engine.query.spi.EntityGraphQueryHint;
import org.hibernate.engine.spi.QueryParameters;
import org.hibernate.engine.spi.RowSelection;
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.engine.spi.SessionImplementor;
import org.hibernate.event.spi.EventSource;
import org.hibernate.hql.internal.QueryExecutionRequestException;
import org.hibernate.hql.internal.antlr.HqlSqlTokenTypes;
import org.hibernate.hql.internal.antlr.HqlTokenTypes;
import org.hibernate.hql.internal.antlr.SqlTokenTypes;
import org.hibernate.hql.internal.ast.exec.BasicExecutor;
import org.hibernate.hql.internal.ast.exec.DeleteExecutor;
import org.hibernate.hql.internal.ast.exec.MultiTableDeleteExecutor;
import org.hibernate.hql.internal.ast.exec.MultiTableUpdateExecutor;
import org.hibernate.hql.internal.ast.exec.StatementExecutor;
import org.hibernate.hql.internal.ast.tree.AggregatedSelectExpression;
import org.hibernate.hql.internal.ast.tree.FromElement;
import org.hibernate.hql.internal.ast.tree.InsertStatement;
import org.hibernate.hql.internal.ast.tree.QueryNode;
import org.hibernate.hql.internal.ast.tree.Statement;
import org.hibernate.hql.internal.ast.util.ASTPrinter;
import org.hibernate.hql.internal.ast.util.ASTUtil;
import org.hibernate.hql.internal.ast.util.NodeTraverser;
import org.hibernate.hql.spi.FilterTranslator;
import org.hibernate.hql.spi.ParameterTranslations;
import org.hibernate.internal.CoreMessageLogger;
import org.hibernate.internal.util.ReflectHelper;
import org.hibernate.internal.util.StringHelper;
import org.hibernate.internal.util.collections.IdentitySet;
import org.hibernate.loader.hql.QueryLoader;
import org.hibernate.param.ParameterSpecification;
import org.hibernate.persister.entity.Queryable;
import org.hibernate.type.Type;
import org.jboss.logging.Logger;
import antlr.ANTLRException;
import antlr.RecognitionException;
import antlr.TokenStreamException;
import antlr.collections.AST;
/**
* A QueryTranslator that uses an Antlr-based parser.
*
* #author Joshua Davis (pgmjsd#sourceforge.net)
*/
public class NoDistinctInSQLQueryTranslatorImpl extends QueryTranslatorImpl implements FilterTranslator {
private static final CoreMessageLogger LOG = Logger.getMessageLogger(
CoreMessageLogger.class,
QueryTranslatorImpl.class.getName()
);
private SessionFactoryImplementor factory;
private final String queryIdentifier;
private String hql;
private boolean shallowQuery;
private Map tokenReplacements;
//TODO:this is only needed during compilation .. can we eliminate the instvar?
private Map enabledFilters;
private boolean compiled;
private QueryLoader queryLoader;
private StatementExecutor statementExecutor;
private Statement sqlAst;
private String sql;
private ParameterTranslations paramTranslations;
private List<ParameterSpecification> collectedParameterSpecifications;
private EntityGraphQueryHint entityGraphQueryHint;
/**
* Creates a new AST-based query translator.
*
* #param queryIdentifier The query-identifier (used in stats collection)
* #param query The hql query to translate
* #param enabledFilters Currently enabled filters
* #param factory The session factory constructing this translator instance.
*/
public NoDistinctInSQLQueryTranslatorImpl(
String queryIdentifier,
String query,
Map enabledFilters,
SessionFactoryImplementor factory) {
super(queryIdentifier, query, enabledFilters, factory);
this.queryIdentifier = queryIdentifier;
this.hql = query;
this.compiled = false;
this.shallowQuery = false;
this.enabledFilters = enabledFilters;
this.factory = factory;
}
public NoDistinctInSQLQueryTranslatorImpl(
String queryIdentifier,
String query,
Map enabledFilters,
SessionFactoryImplementor factory,
EntityGraphQueryHint entityGraphQueryHint) {
this(queryIdentifier, query, enabledFilters, factory);
this.entityGraphQueryHint = entityGraphQueryHint;
}
/**
* Compile a "normal" query. This method may be called multiple times.
* Subsequent invocations are no-ops.
*
* #param replacements Defined query substitutions.
* #param shallow Does this represent a shallow (scalar or entity-id)
* select?
* #throws QueryException There was a problem parsing the query string.
* #throws MappingException There was a problem querying defined mappings.
*/
#Override
public void compile(
Map replacements,
boolean shallow) throws QueryException, MappingException {
doCompile(replacements, shallow, null);
}
/**
* Compile a filter. This method may be called multiple times. Subsequent
* invocations are no-ops.
*
* #param collectionRole the role name of the collection used as the basis
* for the filter.
* #param replacements Defined query substitutions.
* #param shallow Does this represent a shallow (scalar or entity-id)
* select?
* #throws QueryException There was a problem parsing the query string.
* #throws MappingException There was a problem querying defined mappings.
*/
#Override
public void compile(
String collectionRole,
Map replacements,
boolean shallow) throws QueryException, MappingException {
doCompile(replacements, shallow, collectionRole);
}
/**
* Performs both filter and non-filter compiling.
*
* #param replacements Defined query substitutions.
* #param shallow Does this represent a shallow (scalar or entity-id)
* select?
* #param collectionRole the role name of the collection used as the basis
* for the filter, NULL if this is not a filter.
*/
private synchronized void doCompile(Map replacements, boolean shallow, String collectionRole) {
// If the query is already compiled, skip the compilation.
if (compiled) {
LOG.debug("compile() : The query is already compiled, skipping...");
return;
}
// Remember the parameters for the compilation.
this.tokenReplacements = replacements;
if (tokenReplacements == null) {
tokenReplacements = new HashMap();
}
this.shallowQuery = shallow;
try {
// PHASE 1 : Parse the HQL into an AST.
final HqlParser parser = parse(true);
// PHASE 2 : Analyze the HQL AST, and produce an SQL AST.
final HqlSqlWalker w = analyze(parser, collectionRole);
sqlAst = (Statement) w.getAST();
// at some point the generate phase needs to be moved out of here,
// because a single object-level DML might spawn multiple SQL DML
// command executions.
//
// Possible to just move the sql generation for dml stuff, but for
// consistency-sake probably best to just move responsiblity for
// the generation phase completely into the delegates
// (QueryLoader/StatementExecutor) themselves. Also, not sure why
// QueryLoader currently even has a dependency on this at all; does
// it need it? Ideally like to see the walker itself given to the delegates directly...
if (sqlAst.needsExecutor()) {
statementExecutor = buildAppropriateStatementExecutor(w);
} else {
// PHASE 3 : Generate the SQL.
generate((QueryNode) sqlAst);
queryLoader = new QueryLoader(this, factory, w.getSelectClause());
}
compiled = true;
} catch (QueryException qe) {
if (qe.getQueryString() == null) {
throw qe.wrapWithQueryString(hql);
} else {
throw qe;
}
} catch (RecognitionException e) {
// we do not actually propagate ANTLRExceptions as a cause, so
// log it here for diagnostic purposes
LOG.trace("Converted antlr.RecognitionException", e);
throw QuerySyntaxException.convert(e, hql);
} catch (ANTLRException e) {
// we do not actually propagate ANTLRExceptions as a cause, so
// log it here for diagnostic purposes
LOG.trace("Converted antlr.ANTLRException", e);
throw new QueryException(e.getMessage(), hql);
}
//only needed during compilation phase...
this.enabledFilters = null;
}
private void generate(AST sqlAst) throws QueryException, RecognitionException {
if (sql == null) {
final SqlGenerator gen = new SqlGenerator(factory);
gen.statement(sqlAst);
sql = gen.getSQL();
//Hack: The distinct operator is removed from the sql
//string to avoid executing a distinct query in the db server when
//the distinct is used in hql.
sql = sql.replace("distinct", "");
if (LOG.isDebugEnabled()) {
LOG.debugf("HQL: %s", hql);
LOG.debugf("SQL: %s", sql);
}
gen.getParseErrorHandler().throwQueryException();
collectedParameterSpecifications = gen.getCollectedParameters();
}
}
private static final ASTPrinter SQL_TOKEN_PRINTER = new ASTPrinter(SqlTokenTypes.class);
private HqlSqlWalker analyze(HqlParser parser, String collectionRole) throws QueryException, RecognitionException {
final HqlSqlWalker w = new HqlSqlWalker(this, factory, parser, tokenReplacements, collectionRole);
final AST hqlAst = parser.getAST();
// Transform the tree.
w.statement(hqlAst);
if (LOG.isDebugEnabled()) {
LOG.debug(SQL_TOKEN_PRINTER.showAsString(w.getAST(), "--- SQL AST ---"));
}
w.getParseErrorHandler().throwQueryException();
return w;
}
private HqlParser parse(boolean filter) throws TokenStreamException, RecognitionException {
// Parse the query string into an HQL AST.
final HqlParser parser = HqlParser.getInstance(hql);
parser.setFilter(filter);
LOG.debugf("parse() - HQL: %s", hql);
parser.statement();
final AST hqlAst = parser.getAST();
final NodeTraverser walker = new NodeTraverser(new JavaConstantConverter());
walker.traverseDepthFirst(hqlAst);
showHqlAst(hqlAst);
parser.getParseErrorHandler().throwQueryException();
return parser;
}
private static final ASTPrinter HQL_TOKEN_PRINTER = new ASTPrinter(HqlTokenTypes.class);
#Override
void showHqlAst(AST hqlAst) {
if (LOG.isDebugEnabled()) {
LOG.debug(HQL_TOKEN_PRINTER.showAsString(hqlAst, "--- HQL AST ---"));
}
}
private void errorIfDML() throws HibernateException {
if (sqlAst.needsExecutor()) {
throw new QueryExecutionRequestException("Not supported for DML operations", hql);
}
}
private void errorIfSelect() throws HibernateException {
if (!sqlAst.needsExecutor()) {
throw new QueryExecutionRequestException("Not supported for select queries", hql);
}
}
#Override
public String getQueryIdentifier() {
return queryIdentifier;
}
#Override
public Statement getSqlAST() {
return sqlAst;
}
private HqlSqlWalker getWalker() {
return sqlAst.getWalker();
}
/**
* Types of the return values of an <tt>iterate()</tt> style query.
*
* #return an array of <tt>Type</tt>s.
*/
#Override
public Type[] getReturnTypes() {
errorIfDML();
return getWalker().getReturnTypes();
}
#Override
public String[] getReturnAliases() {
errorIfDML();
return getWalker().getReturnAliases();
}
#Override
public String[][] getColumnNames() {
errorIfDML();
return getWalker().getSelectClause().getColumnNames();
}
#Override
public Set<Serializable> getQuerySpaces() {
return getWalker().getQuerySpaces();
}
#Override
public List list(SessionImplementor session, QueryParameters queryParameters)
throws HibernateException {
// Delegate to the QueryLoader...
errorIfDML();
final QueryNode query = (QueryNode) sqlAst;
final boolean hasLimit = queryParameters.getRowSelection() != null && queryParameters.getRowSelection().definesLimits();
final boolean needsDistincting = (query.getSelectClause().isDistinct() || hasLimit) && containsCollectionFetches();
QueryParameters queryParametersToUse;
if (hasLimit && containsCollectionFetches()) {
LOG.firstOrMaxResultsSpecifiedWithCollectionFetch();
RowSelection selection = new RowSelection();
selection.setFetchSize(queryParameters.getRowSelection().getFetchSize());
selection.setTimeout(queryParameters.getRowSelection().getTimeout());
queryParametersToUse = queryParameters.createCopyUsing(selection);
} else {
queryParametersToUse = queryParameters;
}
List results = queryLoader.list(session, queryParametersToUse);
if (needsDistincting) {
int includedCount = -1;
// NOTE : firstRow is zero-based
int first = !hasLimit || queryParameters.getRowSelection().getFirstRow() == null
? 0
: queryParameters.getRowSelection().getFirstRow();
int max = !hasLimit || queryParameters.getRowSelection().getMaxRows() == null
? -1
: queryParameters.getRowSelection().getMaxRows();
List tmp = new ArrayList();
IdentitySet distinction = new IdentitySet();
for (final Object result : results) {
if (!distinction.add(result)) {
continue;
}
includedCount++;
if (includedCount < first) {
continue;
}
tmp.add(result);
// NOTE : ( max - 1 ) because first is zero-based while max is not...
if (max >= 0 && (includedCount - first) >= (max - 1)) {
break;
}
}
results = tmp;
}
return results;
}
/**
* Return the query results as an iterator
*/
#Override
public Iterator iterate(QueryParameters queryParameters, EventSource session)
throws HibernateException {
// Delegate to the QueryLoader...
errorIfDML();
return queryLoader.iterate(queryParameters, session);
}
/**
* Return the query results, as an instance of <tt>ScrollableResults</tt>
*/
#Override
public ScrollableResults scroll(QueryParameters queryParameters, SessionImplementor session)
throws HibernateException {
// Delegate to the QueryLoader...
errorIfDML();
return queryLoader.scroll(queryParameters, session);
}
#Override
public int executeUpdate(QueryParameters queryParameters, SessionImplementor session)
throws HibernateException {
errorIfSelect();
return statementExecutor.execute(queryParameters, session);
}
/**
* The SQL query string to be called; implemented by all subclasses
*/
#Override
public String getSQLString() {
return sql;
}
#Override
public List<String> collectSqlStrings() {
ArrayList<String> list = new ArrayList<>();
if (isManipulationStatement()) {
String[] sqlStatements = statementExecutor.getSqlStatements();
Collections.addAll(list, sqlStatements);
} else {
list.add(sql);
}
return list;
}
// -- Package local methods for the QueryLoader delegate --
#Override
public boolean isShallowQuery() {
return shallowQuery;
}
#Override
public String getQueryString() {
return hql;
}
#Override
public Map getEnabledFilters() {
return enabledFilters;
}
#Override
public int[] getNamedParameterLocs(String name) {
return getWalker().getNamedParameterLocations(name);
}
#Override
public boolean containsCollectionFetches() {
errorIfDML();
List collectionFetches = ((QueryNode) sqlAst).getFromClause().getCollectionFetches();
return collectionFetches != null && collectionFetches.size() > 0;
}
#Override
public boolean isManipulationStatement() {
return sqlAst.needsExecutor();
}
#Override
public void validateScrollability() throws HibernateException {
// Impl Note: allows multiple collection fetches as long as the
// entire fecthed graph still "points back" to a single
// root entity for return
errorIfDML();
final QueryNode query = (QueryNode) sqlAst;
// If there are no collection fetches, then no further checks are needed
List collectionFetches = query.getFromClause().getCollectionFetches();
if (collectionFetches.isEmpty()) {
return;
}
// A shallow query is ok (although technically there should be no fetching here...)
if (isShallowQuery()) {
return;
}
// Otherwise, we have a non-scalar select with defined collection fetch(es).
// Make sure that there is only a single root entity in the return (no tuples)
if (getReturnTypes().length > 1) {
throw new HibernateException("cannot scroll with collection fetches and returned tuples");
}
FromElement owner = null;
for (Object o : query.getSelectClause().getFromElementsForLoad()) {
// should be the first, but just to be safe...
final FromElement fromElement = (FromElement) o;
if (fromElement.getOrigin() == null) {
owner = fromElement;
break;
}
}
if (owner == null) {
throw new HibernateException("unable to locate collection fetch(es) owner for scrollability checks");
}
// This is not strictly true. We actually just need to make sure that
// it is ordered by root-entity PK and that that order-by comes before
// any non-root-entity ordering...
AST primaryOrdering = query.getOrderByClause().getFirstChild();
if (primaryOrdering != null) {
// TODO : this is a bit dodgy, come up with a better way to check this (plus see above comment)
String[] idColNames = owner.getQueryable().getIdentifierColumnNames();
String expectedPrimaryOrderSeq = StringHelper.join(
", ",
StringHelper.qualify(owner.getTableAlias(), idColNames)
);
if (!primaryOrdering.getText().startsWith(expectedPrimaryOrderSeq)) {
throw new HibernateException("cannot scroll results with collection fetches which are not ordered primarily by the root entity's PK");
}
}
}
private StatementExecutor buildAppropriateStatementExecutor(HqlSqlWalker walker) {
final Statement statement = (Statement) walker.getAST();
switch (walker.getStatementType()) {
case HqlSqlTokenTypes.DELETE: {
final FromElement fromElement = walker.getFinalFromClause().getFromElement();
final Queryable persister = fromElement.getQueryable();
if (persister.isMultiTable()) {
return new MultiTableDeleteExecutor(walker);
} else {
return new DeleteExecutor(walker, persister);
}
}
case HqlSqlTokenTypes.UPDATE: {
final FromElement fromElement = walker.getFinalFromClause().getFromElement();
final Queryable persister = fromElement.getQueryable();
if (persister.isMultiTable()) {
// even here, if only properties mapped to the "base table" are referenced
// in the set and where clauses, this could be handled by the BasicDelegate.
// TODO : decide if it is better performance-wise to doAfterTransactionCompletion that check, or to simply use the MultiTableUpdateDelegate
return new MultiTableUpdateExecutor(walker);
} else {
return new BasicExecutor(walker, persister);
}
}
case HqlSqlTokenTypes.INSERT:
return new BasicExecutor(walker, ((InsertStatement) statement).getIntoClause().getQueryable());
default:
throw new QueryException("Unexpected statement type");
}
}
#Override
public ParameterTranslations getParameterTranslations() {
if (paramTranslations == null) {
paramTranslations = new ParameterTranslationsImpl(getWalker().getParameters());
}
return paramTranslations;
}
#Override
public List<ParameterSpecification> getCollectedParameterSpecifications() {
return collectedParameterSpecifications;
}
#Override
public Class getDynamicInstantiationResultType() {
AggregatedSelectExpression aggregation = queryLoader.getAggregatedSelectExpression();
return aggregation == null ? null : aggregation.getAggregationResultType();
}
public static class JavaConstantConverter implements NodeTraverser.VisitationStrategy {
private AST dotRoot;
#Override
public void visit(AST node) {
if (dotRoot != null) {
// we are already processing a dot-structure
if (ASTUtil.isSubtreeChild(dotRoot, node)) {
return;
}
// we are now at a new tree level
dotRoot = null;
}
if (node.getType() == HqlTokenTypes.DOT) {
dotRoot = node;
handleDotStructure(dotRoot);
}
}
private void handleDotStructure(AST dotStructureRoot) {
final String expression = ASTUtil.getPathText(dotStructureRoot);
final Object constant = ReflectHelper.getConstantValue(expression);
if (constant != null) {
dotStructureRoot.setFirstChild(null);
dotStructureRoot.setType(HqlTokenTypes.JAVA_CONSTANT);
dotStructureRoot.setText(expression);
}
}
}
#Override
public EntityGraphQueryHint getEntityGraphQueryHint() {
return entityGraphQueryHint;
}
#Override
public void setEntityGraphQueryHint(EntityGraphQueryHint entityGraphQueryHint) {
this.entityGraphQueryHint = entityGraphQueryHint;
}
}
If you follow the code flow you will notice that I just modified the method private void generate(AST sqlAst) throws QueryException, RecognitionException and added the a following lines:
//Hack: The distinct keywordis removed from the sql string to
//avoid executing a distinct query in the DBMS when the distinct
//is used in hql.
sql = sql.replace("distinct", "");
What I do with this code is to remove the distinct keyword from the generated SQL query.
After creating the classes above, I added the following line in the hibernate configuration file:
<property name="hibernate.query.factory_class">org.hibernate.hql.internal.ast.NoDistinctInSQLASTQueryTranslatorFactory</property>
This line tells hibernate to use my custom class to parse HQL queries and generate SQL queries without the distinct keyword. Notice I created my custom classes in the same package where the original HQL parser resides.