Hibernate Search with AnalyzerDiscriminator - Analyzer called only when creating Entity?

can you help me?
I am implementing Hibernate Search to retrieve results for a global search on a localized website (Portuguese and English content).
To do this, I have followed the steps indicated on the Hibernate Search docs:
http://docs.jboss.org/hibernate/search/4.5/reference/en-US/html_single/#d0e4141
Along with the specific configuration in the entity itself, I have implemented a "LanguageDiscriminator" class, following the instructions in this doc.
Because I am not getting exactly the results I was expecting (e.g. my entity has the text "Capuchinho" stored, but when I search for "capucho" I get no hits), I decided to debug the execution and try to understand whether the Analyzers I have configured are being used at all.
When creating a new record for the entity in the database, I can see that the "getAnalyzerDefinitionName()" method from the "LanguageDiscriminator" gets called. Great. But the same does not happen when I execute a search. Can anyone explain to me why?
I am posting the key parts of my code below. Thanks a lot for any feedback!
This is one entity I want to index
@Entity
@Table(name="NEWS_HEADER")
@Indexed
@AnalyzerDefs({
@AnalyzerDef(name = "en",
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = SnowballPorterFilterFactory.class,
params = {@Parameter(name="language", value="English")}
)
}
),
@AnalyzerDef(name = "pt",
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
@TokenFilterDef(factory = LowerCaseFilterFactory.class),
@TokenFilterDef(factory = SnowballPorterFilterFactory.class,
params = {@Parameter(name="language", value="Portuguese")}
)
}
)
})
public class NewsHeader implements Serializable {
static final long serialVersionUID = 20140301L;
private int id;
private String articleHeader;
private String language;
private Set<NewsParagraph> paragraphs = new HashSet<NewsParagraph>();
/**
* @return the id
*/
@Id
@Column(name="ID")
@GeneratedValue(strategy=GenerationType.AUTO)
@DocumentId
public int getId() {
return id;
}
/**
* @param id the id to set
*/
public void setId(int id) {
this.id = id;
}
/**
* @return the articleHeader
*/
@Column(name="ARTICLE_HEADER")
@Field(index=Index.YES, store=Store.NO)
public String getArticleHeader() {
return articleHeader;
}
/**
* @param articleHeader the articleHeader to set
*/
public void setArticleHeader(String articleHeader) {
this.articleHeader = articleHeader;
}
/**
* @return the language
*/
@Column(name="LANGUAGE")
@Field
@AnalyzerDiscriminator(impl=LanguageDiscriminator.class)
public String getLanguage() {
return language;
}
...
}
This is my LanguageDiscriminator class
public class LanguageDiscriminator implements Discriminator {
@Override
public String getAnalyzerDefinitionName(Object value, Object entity, String field) {
String result = null;
if (value != null) {
result = (String) value;
}
return result;
}
}
This is my search method present in my SearchDAO
public List<NewsHeader> searchParagraph(String patternStr) {
Session session = null;
Transaction tx;
List<NewsHeader> result = null;
try {
session = sessionFactory.getCurrentSession();
FullTextSession fullTextSession = Search.getFullTextSession(session);
tx = fullTextSession.beginTransaction();
// Create native Lucene query using the query DSL
QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(NewsHeader.class).get();
org.apache.lucene.search.Query luceneSearchQuery = queryBuilder
.keyword()
.onFields("articleHeader", "paragraphs.content")
.matching(patternStr)
.createQuery();
// Wrap Lucene query in a org.hibernate.Query
org.hibernate.Query hibernateQuery =
fullTextSession.createFullTextQuery(luceneSearchQuery, NewsHeader.class, NewsParagraph.class);
// Execute search
result = hibernateQuery.list();
} catch (Exception xcp) {
logger.error(xcp);
} finally {
if ((session != null) && (session.isOpen())) {
session.close();
}
}
return result;
}

When creating a new record for the entity in the database, I can see that the "getAnalyzerDefinitionName()" method from the "LanguageDiscriminator" gets called. Great. But the same does not happen when I execute a search. Can anyone explain to me why?
The selection of the analyzer depends on the state of a given entity, in your case NewsHeader. You are dealing with entity instances during indexing. While querying, you don't have entities to start with; you are searching for them. Which analyzer would you want Hibernate Search to select for your query?
That said, I think there is a shortcoming in the DSL. It does not allow you to explicitly specify the analyzer for a class. There is ignoreAnalyzer, but that's not what you want. I guess you could create a feature request in the Search issue tracker - https://hibernate.atlassian.net/browse/HSEARCH.
In the meantime you can build the query using the native Lucene query API. However, you will need to know which language you are targeting with your query (for example via the preferred language of the logged-in user, or whatever fits). This will depend on your use case. It might be that you are looking at the wrong feature to begin with.
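For illustration, here is a minimal sketch of that approach, assuming Hibernate Search 4.x with Lucene 3.6, and that the target language ("pt" or "en") is known up front, e.g. from the user's locale:
// Fetch the analyzer registered under the @AnalyzerDef name
org.apache.lucene.analysis.Analyzer analyzer = fullTextSession.getSearchFactory().getAnalyzer("pt");
// Parse the user input with that analyzer across the searched fields
org.apache.lucene.queryParser.MultiFieldQueryParser parser =
new org.apache.lucene.queryParser.MultiFieldQueryParser(
org.apache.lucene.util.Version.LUCENE_36,
new String[] { "articleHeader", "paragraphs.content" },
analyzer);
org.apache.lucene.search.Query luceneQuery = parser.parse(patternStr); // throws ParseException
List<NewsHeader> result = fullTextSession
.createFullTextQuery(luceneQuery, NewsHeader.class)
.list();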

Related

JPA - AttributeConverter not working with native query

I have created an AttributeConverter class which converts an enum to a DB value and have overridden the necessary methods.
If I use a JPA query like the one below, the converter gets called and I get the correct result.
public List<Driver> findByStatus(DriverStatus status);
BUT if I use the @Query annotation, the AttributeConverter is not called. I have a more complex query where I need to use a native query with the AttributeConverter, but it is not working for me.
@Query(value = "select * from driver where status=:status", nativeQuery = true)
public List<Driver> findByStatus1(DriverStatus status);
Is there any way to handle this requirement?
Update 1 - below is the Converter code
@Converter(autoApply = true)
public class DriverStatusConverter implements AttributeConverter<DriverStatus, String> {
@Override
public String convertToDatabaseColumn(DriverStatus driverStatus) {
if (driverStatus == null) {
return null;
}
System.err.println("from converter " + driverStatus.getCode());
return driverStatus.getCode();
}
@Override
public DriverStatus convertToEntityAttribute(String code) {
if (code == null) {
return null;
}
return Stream.of(DriverStatus.values()).filter(c -> c.getCode().equals(code)).findFirst()
.orElseThrow(IllegalArgumentException::new);
}
}
Faced a similar issue, wasn't able to find a satisfying solution, so had to resort to an ugly hack... The repository method signature was changed like so:
public List<Driver> findByStatus1(Integer status);
...and was called like so:
repository.findByStatus1(DriverStatus.YOUR_DESIRED_STATUS.getCode());
Yes, it's ugly and kinda defeats the purpose of AttributeConverter, but at least it works. In my particular case I had to resort to such measures with just one native query. AttributeConverter still works for all the other JPA queries.
I just came across the same problem and used the following to get around it.
In DriverStatusConverter, move the logic from convertToEntityAttribute into a public static method:
@Override
public DriverStatus convertToEntityAttribute(String code) {
return convertStringToEnum(code);
}
public static DriverStatus convertStringToEnum(String code) {
if (code == null) {
return null;
}
return Stream.of(DriverStatus.values()).filter(c -> c.getCode().equals(code)).findFirst()
.orElseThrow(IllegalArgumentException::new);
}
Create a new DriverDTO interface for the native query result:
public interface DriverDTO {
public Integer getId();
public String getName();
public String getStatusCode();
default DriverStatus getStatus() {
return DriverStatusConverter.convertStringToEnum(getStatusCode());
}
}
In DriverRepository, have findByStatusCode() select the columns in the query and return the DriverDTO projection:
@Query(value = "select id, name, status as statusCode from driver WHERE status=:statusCode", nativeQuery = true)
public List<DriverDTO> findByStatusCode(String statusCode);
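A hypothetical call site might then look like this (the "ACTIVE" code value is made up for illustration):
List<DriverDTO> drivers = driverRepository.findByStatusCode("ACTIVE"); // hypothetical status code
DriverStatus status = drivers.get(0).getStatus(); // mapped via the static converter helper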
Remove the nativeQuery = true and try.
#Query(value = "select * from driver where status=:status")
public List<Driver> findByStatus1(DriverStatus status);
Alternatively you can try using entityManager like so -
This is just sample code.
TypedQuery<Trip> q = entityManager.createQuery("SELECT t FROM Trip t WHERE t.vehicle = :v", Trip.class);
q.setParameter("v", Vehicle.PLANE);
List<Trip> trips = q.getResultList();
Reference -
https://thorben-janssen.com/jpa-21-type-converter-better-way-to/

Search where A or B with querydsl and spring data rest

http://localhost:8080/users?firstName=a&lastName=b ---> where firstName=a and lastName=b
How can I make it use or ---> where firstName=a or lastName=b
But when I override customize in QuerydslBinderCustomizer:
@Override
default public void customize(QuerydslBindings bindings, QUser user) {
bindings.bind(String.class).all((StringPath path, Collection<? extends String> values) -> {
BooleanBuilder predicate = new BooleanBuilder();
values.forEach(value -> predicate.or(path.containsIgnoreCase(value)));
return Optional.of(predicate); // on older Spring Data versions, return the predicate directly
});
}
http://localhost:8080/users?firstName=a&firstName=b&lastName=b ---> where (firstName=a or firstName = b) and lastName=b
It seems different parameters are combined with AND, while repeated parameters are combined with whatever I set (predicate.or/predicate.and).
How can I make different parameters combine with OR as well, like this ---> where firstName=a or firstName=b or lastName=b ??
thx.
Your current request params are grouped as List<String> firstName and String lastName. I see that you want to keep your request parameters without a binding, but in this case a binding would make your life easier.
My suggestion is to make a new class with request param:
public class UserRequest {
private String lastName;
private List<String> firstName;
// getters and setters
}
For QueryDSL, you can create a builder object:
public class UserPredicateBuilder{
private List<BooleanExpression> expressions = new ArrayList<>();
public UserPredicateBuilder withFirstName(List<String> firstNameList){
QUser user = QUser.user;
expressions.add(user.firstName.in(firstNameList));
return this;
}
//.. same for other fields
public BooleanExpression build(){
if(expressions.isEmpty()){
return Expressions.asBoolean(true).isTrue();
}
BooleanExpression result = expressions.get(0);
for (int i = 1; i < expressions.size(); i++) {
result = result.and(expressions.get(i));
}
return result;
}
}
And after you can just use the builder as :
public List<User> getUsers(UserRequest userRequest){
BooleanExpression expression = new UserPredicateBuilder()
.withFirstName(userRequest.getFirstName())
// other fields
.build();
return (List<User>) userRepository.findAll(expression); // the interface declares Iterable; the JPA Querydsl implementation returns a List
}
This is the recommended solution.
If you really want to keep the current params without a binding (they still need some kind of validation, otherwise it can throw an Exception in query dsl binding)
you can group them by path :
Map<StringPath,List<String>> values // example firstName => a,b
and after that to create your boolean expression based on the map:
//initial value
BooleanExpression result = Expressions.asBoolean(true).isTrue();
for (Map.Entry<StringPath, List<String>> entry : values.entrySet()) {
result = result.and(entry.getKey().in(entry.getValue()));
}
return userRepository.findAll(result);

How to configure Hibernate Search to find words with accents

I need to implement a global search on a website which I am building using Spring (4.0.2)/Hibernate (4.3.1)/MySQL. I have decided to use Hibernate Search (4.5.0) for this.
This seems to be working fine, but only when I do a search for an exact pattern.
Imagine I have the following text on an indexed field:
"A história do Capuchinho e do Lobo Mau"
1) If I search for "história" or "lobo mau", the query will retrieve the corresponding indexed entity, as I would have expected.
2) If I search for "historia" or "lobos maus" the search will not retrieve the entity.
As far as I have read, it should be possible to configure Hibernate Search to perform a much smarter search than this. Can anyone point me in the right direction to achieve this? See below the key aspects of my implementation. Thanks!
This is the "parent" indexed entity
@Entity
@Table(name="NEWS_HEADER")
@Indexed
public class NewsHeader implements Serializable {
static final long serialVersionUID = 20140301L;
private int id;
private String articleHeader;
private String language;
private Set<NewsParagraph> paragraphs = new HashSet<NewsParagraph>();
/**
* @return the id
*/
@Id
@Column(name="ID")
@GeneratedValue(strategy=GenerationType.AUTO)
@DocumentId
public int getId() {
return id;
}
/**
* @param id the id to set
*/
public void setId(int id) {
this.id = id;
}
/**
* @return the articleHeader
*/
@Column(name="ARTICLE_HEADER")
@Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
public String getArticleHeader() {
return articleHeader;
}
/**
* @param articleHeader the articleHeader to set
*/
public void setArticleHeader(String articleHeader) {
this.articleHeader = articleHeader;
}
/**
* @return the language
*/
@Column(name="LANGUAGE")
public String getLanguage() {
return language;
}
/**
* @param language the language to set
*/
public void setLanguage(String language) {
this.language = language;
}
/**
* @return the paragraphs
*/
@OneToMany(mappedBy="newsHeader", fetch=FetchType.EAGER, cascade=CascadeType.ALL)
@IndexedEmbedded
public Set<NewsParagraph> getParagraphs() {
return paragraphs;
}
// Other standard getters/setters go here
And this the IndexedEmbedded entity
@Entity
@Table(name="NEWS_PARAGRAPH")
public class NewsParagraph implements Serializable {
static final long serialVersionUID = 20140302L;
private int id;
private String content;
private NewsHeader newsHeader;
/**
* @return the id
*/
@Id
@Column(name="ID")
@GeneratedValue(strategy=GenerationType.AUTO)
public int getId() {
return id;
}
/**
* @param id the id to set
*/
public void setId(int id) {
this.id = id;
}
/**
* @return the content
*/
#Column(name="CONTENT")
#Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO)
public String getContent() {
return content;
}
// Other standard getters/setters go here
This is my search method, implemented on my SearchDAOImpl
public class SearchDAOImpl extends DAOBasics implements SearchDAO {
...
public List<NewsHeader> searchParagraph(String patternStr) {
Session session = null;
Transaction tx;
List<NewsHeader> result = null;
try {
session = sessionFactory.getCurrentSession();
FullTextSession fullTextSession = Search.getFullTextSession(session);
tx = fullTextSession.beginTransaction();
// Create native Lucene query using the query DSL
QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
.buildQueryBuilder().forEntity(NewsHeader.class).get();
org.apache.lucene.search.Query luceneSearchQuery = queryBuilder
.keyword()
.onFields("articleHeader", "paragraphs.content")
.matching(patternStr)
.createQuery();
// Wrap Lucene query in a org.hibernate.Query
org.hibernate.Query hibernateQuery =
fullTextSession.createFullTextQuery(luceneSearchQuery, NewsHeader.class, NewsParagraph.class);
// Execute search
result = hibernateQuery.list();
} catch (Exception xcp) {
logger.error(xcp);
} finally {
if ((session != null) && (session.isOpen())) {
session.close();
}
}
return result;
}
...
}
This is what I ended up doing to resolve my problem.
Configure an AnalyzerDef at the entity level. Within it, use LowerCaseFilterFactory, ASCIIFoldingFilterFactory and SnowballPorterFilterFactory to achieve the type of filtering I needed.
#AnalyzerDef(name = "customAnalyzer",
tokenizer = #TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
#TokenFilterDef(factory = LowerCaseFilterFactory.class),
#TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
#TokenFilterDef(factory = SnowballPorterFilterFactory.class)
})
public class NewsHeader implements Serializable {
...
}
Add this annotation to each of the fields I want indexed, either in the parent entity or its IndexedEmbedded counterpart, so that they use the analyzer defined above.
@Field(index=Index.YES, store=Store.NO)
@Analyzer(definition = "customAnalyzer")
You will need to either re-index or re-insert your entities for the analyzer to take effect.
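For example, a minimal re-indexing sketch, assuming Hibernate Search 4.x's MassIndexer:
FullTextSession fullTextSession = Search.getFullTextSession(session);
// Rebuild the NewsHeader index from the current database contents
fullTextSession.createIndexer(NewsHeader.class).startAndWait(); // throws InterruptedException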
You could configure your own analyzer, or you can use a standard language analyzer, such as PortugueseAnalyzer. I'd recommend starting from the existing analyzer and creating your own if necessary, using it as a starting point for tweaking the filter chain.
You can set this using the @Analyzer annotation on the field:
@Field(index=Index.YES, analyze=Analyze.YES, store=Store.NO, analyzer = @Analyzer(impl = org.apache.lucene.analysis.pt.PortugueseAnalyzer.class))
Or you can set that analyzer as the default for the class, if you place an @Analyzer annotation at the head of the class instead.
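For instance, a minimal sketch of the class-level variant:
@Entity
@Indexed
@Analyzer(impl = org.apache.lucene.analysis.pt.PortugueseAnalyzer.class)
public class NewsHeader implements Serializable {
...
}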
If you want searches for accented characters to also match the plain, unaccented keyword (and vice versa), then you have to add the ASCIIFoldingFilterFactory class to your analyzer, like so:
#AnalyzerDef(name = "customAnalyzer",
tokenizer = #TokenizerDef(factory = StandardTokenizerFactory.class),
filters = {
#TokenFilterDef(factory = LowerCaseFilterFactory.class),
#TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
#TokenFilterDef(factory = StopFilterFactory.class, params = {
#Parameter(name="words", value= "com/ik/resource/stoplist.properties" ),
#Parameter(name="ignoreCase", value="true")
})
})
#Analyzer(definition = "customAnalyzer") apply on entity or fields

get next sequence value from database using hibernate

I have an entity that has a NON-ID field that must be set from a sequence.
Currently, I fetch the first value of the sequence, store it on the client's side, and compute subsequent values from it.
However, I'm looking for a "better" way of doing this. I have implemented a way to fetch the next sequence value:
public Long getNextKey()
{
Query query = session.createSQLQuery( "select nextval('mySequence')" );
Long key = ((BigInteger) query.uniqueResult()).longValue();
return key;
}
However, this way reduces performance significantly (creation of ~5000 objects slows down by more than a factor of 2: from 5740 ms to 13648 ms).
I have tried to add a "fake" entity:
@Entity
@SequenceGenerator(name = "sequence", sequenceName = "mySequence")
public class SequenceFetcher
{
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "sequence")
private long id;
public long getId() {
return id;
}
}
However this approach didn't work either (all the Ids returned were 0).
Can someone advise me how to fetch the next sequence value using Hibernate efficiently?
Edit: Upon investigation, I have discovered that calling Query query = session.createSQLQuery( "select nextval('mySequence')" ); is far less efficient than using @GeneratedValue, because Hibernate somehow manages to reduce the number of fetches when accessing the sequence described by @GeneratedValue.
For example, when I create 70,000 entities (thus with 70,000 primary keys fetched from the same sequence), I get everything I need.
HOWEVER, Hibernate only issues 1404 select nextval('local_key_sequence') commands. NOTE: On the database side, the caching is set to 1.
If I try to fetch all the data manually, it will take me 70,000 selects, thus a huge difference in performance. Does anyone know the internal functioning of Hibernate, and how to reproduce it manually?
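One likely explanation (an assumption, not something stated in the question): @GeneratedValue is typically backed by a pooled/hi-lo style optimizer that reserves a block of ids per round trip. With an allocationSize of 50, 70,000 inserts need about 1,400 nextval calls, which matches the 1,404 observed. A sketch of such a mapping (names illustrative):
@Id
// allocationSize = 50: one database round trip hands out 50 ids
@SequenceGenerator(name = "keySeq", sequenceName = "local_key_sequence", allocationSize = 50)
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "keySeq")
private long id;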
You can use the Hibernate Dialect API for database independence, as follows:
class SequenceValueGetter {
private SessionFactory sessionFactory;
// For Hibernate 3
public Long getId(final String sequenceName) {
final List<Long> ids = new ArrayList<Long>(1);
sessionFactory.getCurrentSession().doWork(new Work() {
public void execute(Connection connection) throws SQLException {
DialectResolver dialectResolver = new StandardDialectResolver();
Dialect dialect = dialectResolver.resolveDialect(connection.getMetaData());
PreparedStatement preparedStatement = null;
ResultSet resultSet = null;
try {
preparedStatement = connection.prepareStatement( dialect.getSequenceNextValString(sequenceName));
resultSet = preparedStatement.executeQuery();
resultSet.next();
ids.add(resultSet.getLong(1));
}catch (SQLException e) {
throw e;
} finally {
if(preparedStatement != null) {
preparedStatement.close();
}
if(resultSet != null) {
resultSet.close();
}
}
}
});
return ids.get(0);
}
// For Hibernate 4
public Long getID(final String sequenceName) {
ReturningWork<Long> maxReturningWork = new ReturningWork<Long>() {
@Override
public Long execute(Connection connection) throws SQLException {
DialectResolver dialectResolver = new StandardDialectResolver();
Dialect dialect = dialectResolver.resolveDialect(connection.getMetaData());
PreparedStatement preparedStatement = null;
ResultSet resultSet = null;
try {
preparedStatement = connection.prepareStatement( dialect.getSequenceNextValString(sequenceName));
resultSet = preparedStatement.executeQuery();
resultSet.next();
return resultSet.getLong(1);
}catch (SQLException e) {
throw e;
} finally {
if(preparedStatement != null) {
preparedStatement.close();
}
if(resultSet != null) {
resultSet.close();
}
}
}
};
Long maxRecord = sessionFactory.getCurrentSession().doReturningWork(maxReturningWork);
return maxRecord;
}
}
Here is what worked for me (specific to Oracle, but using scalar seems to be the key)
Long getNext() {
Query query =
session.createSQLQuery("select MYSEQ.nextval as num from dual")
.addScalar("num", StandardBasicTypes.BIG_INTEGER);
return ((BigInteger) query.uniqueResult()).longValue();
}
Thanks to the posters here: springsource_forum
I found the solution:
public class DefaultPostgresKeyServer
{
private Session session;
private Iterator<BigInteger> iter;
private long batchSize;
public DefaultPostgresKeyServer (Session sess, long batchFetchSize)
{
this.session=sess;
batchSize = batchFetchSize;
iter = Collections.<BigInteger>emptyList().iterator();
}
@SuppressWarnings("unchecked")
public Long getNextKey()
{
if ( ! iter.hasNext() )
{
Query query = session.createSQLQuery( "SELECT nextval( 'mySchema.mySequence' ) FROM generate_series( 1, " + batchSize + " )" );
iter = (Iterator<BigInteger>) query.list().iterator();
}
return iter.next().longValue() ;
}
}
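Usage is then along these lines (the batch size of 1000 is illustrative):
DefaultPostgresKeyServer keyServer = new DefaultPostgresKeyServer(session, 1000);
Long key = keyServer.getNextKey(); // one SELECT per 1000 keys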
If you are using Oracle, consider specifying a cache size for the sequence. If you routinely create objects in batches of 5K, you can just set it to 1000 or 5000. We did it for the sequence used for the surrogate primary key and were amazed that execution times for an ETL process hand-written in Java dropped in half.
I could not paste formatted code into comment. Here's the sequence DDL:
create sequence seq_mytable_sid
minvalue 1
maxvalue 999999999999999999999999999
increment by 1
start with 1
cache 1000
order
nocycle;
To get the new id, all you have to do is flush the entity manager. See getNext() method below:
@Entity
@SequenceGenerator(name = "sequence", sequenceName = "mySequence")
public class SequenceFetcher
{
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "sequence")
private long id;
public long getId() {
return id;
}
public static long getNext(EntityManager em) {
SequenceFetcher sf = new SequenceFetcher();
em.persist(sf);
em.flush();
return sf.getId();
}
}
POSTGRESQL
String psqlAutoincrementQuery = "SELECT NEXTVAL(CONCAT(:psqlTableName, '_id_seq')) as id";
Long psqlAutoincrement = (Long) YOUR_SESSION_OBJ.createSQLQuery(psqlAutoincrementQuery)
.addScalar("id", Hibernate.LONG)
.setParameter("psqlTableName", psqlTableName)
.uniqueResult();
MYSQL
String mysqlAutoincrementQuery = "SELECT AUTO_INCREMENT as id FROM information_schema.tables WHERE table_name = :mysqlTableName AND table_schema = DATABASE()";
Long mysqlAutoincrement = (Long) YOUR_SESSION_OBJ.createSQLQuery(mysqlAutoincrementQuery)
.addScalar("id", Hibernate.LONG)
.setParameter("mysqlTableName", mysqlTableName)
.uniqueResult();
Interesting that it works for you. When I tried your solution, an error came up saying "Type mismatch: cannot convert from SQLQuery to Query". --> Therefore my solution looks like this:
SQLQuery query = session.createSQLQuery("select nextval('SEQUENCE_NAME')");
Long nextValue = ((BigInteger)query.uniqueResult()).longValue();
With that solution I didn't run into performance problems.
And don't forget to reset your value, if you just wanted to know for information purposes.
--nextValue;
query = session.createSQLQuery("select setval('SEQUENCE_NAME'," + nextValue + ")");
Spring 5 has some built-in helper classes for that:
org/springframework/jdbc/support/incrementer
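For example, a minimal sketch with the PostgreSQL incrementer (the sequence name is illustrative):
import org.springframework.jdbc.support.incrementer.PostgresSequenceMaxValueIncrementer;
PostgresSequenceMaxValueIncrementer incrementer =
new PostgresSequenceMaxValueIncrementer(dataSource, "mySequence");
long nextKey = incrementer.nextLongValue();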
Here is the way I do it:
@Entity
public class ServerInstanceSeq
{
@Id //mysql bigint(20)
@SequenceGenerator(name="ServerInstanceIdSeqName", sequenceName="ServerInstanceIdSeq", allocationSize=20)
@GeneratedValue(strategy=GenerationType.SEQUENCE, generator="ServerInstanceIdSeqName")
public Long id;
}
ServerInstanceSeq sis = new ServerInstanceSeq();
session.beginTransaction();
session.save(sis);
session.getTransaction().commit();
System.out.println("sis.id after save: "+sis.id);
Your idea with the SequenceGenerator fake entity is good.
@Id
@GenericGenerator(name = "my_seq", strategy = "sequence", parameters = {
@org.hibernate.annotations.Parameter(name = "sequence_name", value = "MY_CUSTOM_NAMED_SQN"),
})
@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "my_seq")
It is important to use the parameter with the key name "sequence_name". Run a debugging session on the Hibernate class SequenceStyleGenerator, in the configure(...) method at the line final QualifiedName sequenceName = determineSequenceName( params, dialect, jdbcEnvironment );, to see more details about how the sequence name is computed by Hibernate. There are some defaults in there you could also use.
After the fake entity, I created a CrudRepository:
public interface SequenceRepository extends CrudRepository<SequenceGenerator, Long> {}
In the JUnit test, I call the save method of the SequenceRepository.
SequenceGenerator sequenceObject = new SequenceGenerator();
SequenceGenerator result = sequenceRepository.save(sequenceObject);
If there is a better way to do this (maybe support for a generator on any type of field instead of just Id), I would be more than happy to use it instead of this "trick".

why does hibernate hql distinct cause an sql distinct on left join?

I've got this test HQL:
select distinct o from Order o left join fetch o.lineItems
and it does generate an SQL distinct without an obvious reason:
select distinct order0_.id as id61_0_, orderline1_.order_id as order1_62_1_...
The SQL resultset is always the same (with and without an SQL distinct):
order id | order name | orderline id | orderline name
---------+------------+--------------+---------------
1 | foo | 1 | foo item
1 | foo | 2 | bar item
1 | foo | 3 | test item
2 | empty | NULL | NULL
3 | bar | 4 | qwerty item
3 | bar | 5 | asdfgh item
Why does hibernate generate the SQL distinct? The SQL distinct doesn't make any sense and makes the query slower than needed.
This is contrary to the FAQ which mentions that hql distinct in this case is just a shortcut for the result transformer:
session.createQuery("select distinct o
from Order o left join fetch
o.lineItems").list();
It looks like you are using the SQL DISTINCT keyword here. Of course, this is not SQL, this is HQL. This distinct is just a shortcut for the result transformer, in this case. Yes, in other cases an HQL distinct will translate straight into a SQL DISTINCT. Not in this case: you can not filter out duplicates at the SQL level, the very nature of a product/join forbids this - you want the duplicates or you don't get all the data you need.
thanks
Have a closer look at the sql statement that hibernate generates - yes it does use the "distinct" keyword but not in the way I think you are expecting it to (or the way that the Hibernate FAQ is implying) i.e. to return a set of "distinct" or "unique" orders.
It doesn't use the distinct keyword to return distinct orders, as that wouldn't make sense in that SQL query, considering the join that you have also specified.
The resulting sql set still needs processing by the ResultTransformer, as clearly the sql set contains duplicate orders. That's why they say that the HQL distinct keyword doesn't directly map to the SQL distinct keyword.
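For reference, a sketch of the transformer-based alternative the FAQ alludes to (classic Hibernate 3/4 API): drop the HQL distinct and deduplicate the root entities in memory instead.
List<Order> orders = session
.createQuery("select o from Order o left join fetch o.lineItems")
.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY) // constant on org.hibernate.Criteria
.list();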
I had the exact same problem and I think this is a Hibernate issue (not a bug, because the code doesn't fail). However, I would have to dig deeper to make sure.
Hibernate (at least in version 4, which is the version I'm working with on my project, specifically 4.3.11) uses the concept of an SPI. Long story short: it's like an API to extend or modify the framework.
I took advantage of this feature to replace the classes org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory (this class is called by Hibernate and delegates the job of generating the SQL query) and org.hibernate.hql.internal.ast.QueryTranslatorImpl (this is sort of an internal class which is called by org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory and generates the actual SQL query). I did it as follows:
Replacement for org.hibernate.hql.internal.ast.ASTQueryTranslatorFactory:
package org.hibernate.hql.internal.ast;
import java.util.Map;
import org.hibernate.engine.query.spi.EntityGraphQueryHint;
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.hql.spi.QueryTranslator;
public class NoDistinctInSQLASTQueryTranslatorFactory extends ASTQueryTranslatorFactory {
@Override
public QueryTranslator createQueryTranslator(String queryIdentifier, String queryString, Map filters, SessionFactoryImplementor factory, EntityGraphQueryHint entityGraphQueryHint) {
return new NoDistinctInSQLQueryTranslatorImpl(queryIdentifier, queryString, filters, factory, entityGraphQueryHint);
}
}
Replacement for org.hibernate.hql.internal.ast.QueryTranslatorImpl:
package org.hibernate.hql.internal.ast;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.hibernate.HibernateException;
import org.hibernate.MappingException;
import org.hibernate.QueryException;
import org.hibernate.ScrollableResults;
import org.hibernate.engine.query.spi.EntityGraphQueryHint;
import org.hibernate.engine.spi.QueryParameters;
import org.hibernate.engine.spi.RowSelection;
import org.hibernate.engine.spi.SessionFactoryImplementor;
import org.hibernate.engine.spi.SessionImplementor;
import org.hibernate.event.spi.EventSource;
import org.hibernate.hql.internal.QueryExecutionRequestException;
import org.hibernate.hql.internal.antlr.HqlSqlTokenTypes;
import org.hibernate.hql.internal.antlr.HqlTokenTypes;
import org.hibernate.hql.internal.antlr.SqlTokenTypes;
import org.hibernate.hql.internal.ast.exec.BasicExecutor;
import org.hibernate.hql.internal.ast.exec.DeleteExecutor;
import org.hibernate.hql.internal.ast.exec.MultiTableDeleteExecutor;
import org.hibernate.hql.internal.ast.exec.MultiTableUpdateExecutor;
import org.hibernate.hql.internal.ast.exec.StatementExecutor;
import org.hibernate.hql.internal.ast.tree.AggregatedSelectExpression;
import org.hibernate.hql.internal.ast.tree.FromElement;
import org.hibernate.hql.internal.ast.tree.InsertStatement;
import org.hibernate.hql.internal.ast.tree.QueryNode;
import org.hibernate.hql.internal.ast.tree.Statement;
import org.hibernate.hql.internal.ast.util.ASTPrinter;
import org.hibernate.hql.internal.ast.util.ASTUtil;
import org.hibernate.hql.internal.ast.util.NodeTraverser;
import org.hibernate.hql.spi.FilterTranslator;
import org.hibernate.hql.spi.ParameterTranslations;
import org.hibernate.internal.CoreMessageLogger;
import org.hibernate.internal.util.ReflectHelper;
import org.hibernate.internal.util.StringHelper;
import org.hibernate.internal.util.collections.IdentitySet;
import org.hibernate.loader.hql.QueryLoader;
import org.hibernate.param.ParameterSpecification;
import org.hibernate.persister.entity.Queryable;
import org.hibernate.type.Type;
import org.jboss.logging.Logger;
import antlr.ANTLRException;
import antlr.RecognitionException;
import antlr.TokenStreamException;
import antlr.collections.AST;
/**
* A QueryTranslator that uses an Antlr-based parser.
*
* @author Joshua Davis (pgmjsd@sourceforge.net)
*/
public class NoDistinctInSQLQueryTranslatorImpl extends QueryTranslatorImpl implements FilterTranslator {
private static final CoreMessageLogger LOG = Logger.getMessageLogger(
CoreMessageLogger.class,
QueryTranslatorImpl.class.getName()
);
private SessionFactoryImplementor factory;
private final String queryIdentifier;
private String hql;
private boolean shallowQuery;
private Map tokenReplacements;
//TODO:this is only needed during compilation .. can we eliminate the instvar?
private Map enabledFilters;
private boolean compiled;
private QueryLoader queryLoader;
private StatementExecutor statementExecutor;
private Statement sqlAst;
private String sql;
private ParameterTranslations paramTranslations;
private List<ParameterSpecification> collectedParameterSpecifications;
private EntityGraphQueryHint entityGraphQueryHint;
/**
* Creates a new AST-based query translator.
*
* @param queryIdentifier The query-identifier (used in stats collection)
* @param query The hql query to translate
* @param enabledFilters Currently enabled filters
* @param factory The session factory constructing this translator instance.
*/
public NoDistinctInSQLQueryTranslatorImpl(
String queryIdentifier,
String query,
Map enabledFilters,
SessionFactoryImplementor factory) {
super(queryIdentifier, query, enabledFilters, factory);
this.queryIdentifier = queryIdentifier;
this.hql = query;
this.compiled = false;
this.shallowQuery = false;
this.enabledFilters = enabledFilters;
this.factory = factory;
}
public NoDistinctInSQLQueryTranslatorImpl(
String queryIdentifier,
String query,
Map enabledFilters,
SessionFactoryImplementor factory,
EntityGraphQueryHint entityGraphQueryHint) {
this(queryIdentifier, query, enabledFilters, factory);
this.entityGraphQueryHint = entityGraphQueryHint;
}
/**
* Compile a "normal" query. This method may be called multiple times.
* Subsequent invocations are no-ops.
*
* @param replacements Defined query substitutions.
* @param shallow Does this represent a shallow (scalar or entity-id)
* select?
* @throws QueryException There was a problem parsing the query string.
* @throws MappingException There was a problem querying defined mappings.
*/
@Override
public void compile(
Map replacements,
boolean shallow) throws QueryException, MappingException {
doCompile(replacements, shallow, null);
}
/**
* Compile a filter. This method may be called multiple times. Subsequent
* invocations are no-ops.
*
* @param collectionRole the role name of the collection used as the basis
* for the filter.
* @param replacements Defined query substitutions.
* @param shallow Does this represent a shallow (scalar or entity-id)
* select?
* @throws QueryException There was a problem parsing the query string.
* @throws MappingException There was a problem querying defined mappings.
*/
@Override
public void compile(
String collectionRole,
Map replacements,
boolean shallow) throws QueryException, MappingException {
doCompile(replacements, shallow, collectionRole);
}
/**
* Performs both filter and non-filter compiling.
*
* @param replacements Defined query substitutions.
* @param shallow Does this represent a shallow (scalar or entity-id)
* select?
* @param collectionRole the role name of the collection used as the basis
* for the filter, NULL if this is not a filter.
*/
private synchronized void doCompile(Map replacements, boolean shallow, String collectionRole) {
// If the query is already compiled, skip the compilation.
if (compiled) {
LOG.debug("compile() : The query is already compiled, skipping...");
return;
}
// Remember the parameters for the compilation.
this.tokenReplacements = replacements;
if (tokenReplacements == null) {
tokenReplacements = new HashMap();
}
this.shallowQuery = shallow;
try {
// PHASE 1 : Parse the HQL into an AST.
final HqlParser parser = parse(true);
// PHASE 2 : Analyze the HQL AST, and produce an SQL AST.
final HqlSqlWalker w = analyze(parser, collectionRole);
sqlAst = (Statement) w.getAST();
// at some point the generate phase needs to be moved out of here,
// because a single object-level DML might spawn multiple SQL DML
// command executions.
//
// Possible to just move the sql generation for dml stuff, but for
// consistency-sake probably best to just move responsiblity for
// the generation phase completely into the delegates
// (QueryLoader/StatementExecutor) themselves. Also, not sure why
// QueryLoader currently even has a dependency on this at all; does
// it need it? Ideally like to see the walker itself given to the delegates directly...
if (sqlAst.needsExecutor()) {
statementExecutor = buildAppropriateStatementExecutor(w);
} else {
// PHASE 3 : Generate the SQL.
generate((QueryNode) sqlAst);
queryLoader = new QueryLoader(this, factory, w.getSelectClause());
}
compiled = true;
} catch (QueryException qe) {
if (qe.getQueryString() == null) {
throw qe.wrapWithQueryString(hql);
} else {
throw qe;
}
} catch (RecognitionException e) {
// we do not actually propagate ANTLRExceptions as a cause, so
// log it here for diagnostic purposes
LOG.trace("Converted antlr.RecognitionException", e);
throw QuerySyntaxException.convert(e, hql);
} catch (ANTLRException e) {
// we do not actually propagate ANTLRExceptions as a cause, so
// log it here for diagnostic purposes
LOG.trace("Converted antlr.ANTLRException", e);
throw new QueryException(e.getMessage(), hql);
}
//only needed during compilation phase...
this.enabledFilters = null;
}
private void generate(AST sqlAst) throws QueryException, RecognitionException {
if (sql == null) {
final SqlGenerator gen = new SqlGenerator(factory);
gen.statement(sqlAst);
sql = gen.getSQL();
//Hack: The distinct operator is removed from the sql
//string to avoid executing a distinct query in the db server when
//the distinct is used in hql.
sql = sql.replace("distinct", "");
if (LOG.isDebugEnabled()) {
LOG.debugf("HQL: %s", hql);
LOG.debugf("SQL: %s", sql);
}
gen.getParseErrorHandler().throwQueryException();
collectedParameterSpecifications = gen.getCollectedParameters();
}
}
private static final ASTPrinter SQL_TOKEN_PRINTER = new ASTPrinter(SqlTokenTypes.class);
private HqlSqlWalker analyze(HqlParser parser, String collectionRole) throws QueryException, RecognitionException {
final HqlSqlWalker w = new HqlSqlWalker(this, factory, parser, tokenReplacements, collectionRole);
final AST hqlAst = parser.getAST();
// Transform the tree.
w.statement(hqlAst);
if (LOG.isDebugEnabled()) {
LOG.debug(SQL_TOKEN_PRINTER.showAsString(w.getAST(), "--- SQL AST ---"));
}
w.getParseErrorHandler().throwQueryException();
return w;
}
private HqlParser parse(boolean filter) throws TokenStreamException, RecognitionException {
// Parse the query string into an HQL AST.
final HqlParser parser = HqlParser.getInstance(hql);
parser.setFilter(filter);
LOG.debugf("parse() - HQL: %s", hql);
parser.statement();
final AST hqlAst = parser.getAST();
final NodeTraverser walker = new NodeTraverser(new JavaConstantConverter());
walker.traverseDepthFirst(hqlAst);
showHqlAst(hqlAst);
parser.getParseErrorHandler().throwQueryException();
return parser;
}
private static final ASTPrinter HQL_TOKEN_PRINTER = new ASTPrinter(HqlTokenTypes.class);
@Override
void showHqlAst(AST hqlAst) {
if (LOG.isDebugEnabled()) {
LOG.debug(HQL_TOKEN_PRINTER.showAsString(hqlAst, "--- HQL AST ---"));
}
}
private void errorIfDML() throws HibernateException {
if (sqlAst.needsExecutor()) {
throw new QueryExecutionRequestException("Not supported for DML operations", hql);
}
}
private void errorIfSelect() throws HibernateException {
if (!sqlAst.needsExecutor()) {
throw new QueryExecutionRequestException("Not supported for select queries", hql);
}
}
@Override
public String getQueryIdentifier() {
return queryIdentifier;
}
@Override
public Statement getSqlAST() {
return sqlAst;
}
private HqlSqlWalker getWalker() {
return sqlAst.getWalker();
}
/**
* Types of the return values of an <tt>iterate()</tt> style query.
*
* @return an array of <tt>Type</tt>s.
*/
@Override
public Type[] getReturnTypes() {
errorIfDML();
return getWalker().getReturnTypes();
}
@Override
public String[] getReturnAliases() {
errorIfDML();
return getWalker().getReturnAliases();
}
@Override
public String[][] getColumnNames() {
errorIfDML();
return getWalker().getSelectClause().getColumnNames();
}
@Override
public Set<Serializable> getQuerySpaces() {
return getWalker().getQuerySpaces();
}
@Override
public List list(SessionImplementor session, QueryParameters queryParameters)
throws HibernateException {
// Delegate to the QueryLoader...
errorIfDML();
final QueryNode query = (QueryNode) sqlAst;
final boolean hasLimit = queryParameters.getRowSelection() != null && queryParameters.getRowSelection().definesLimits();
final boolean needsDistincting = (query.getSelectClause().isDistinct() || hasLimit) && containsCollectionFetches();
QueryParameters queryParametersToUse;
if (hasLimit && containsCollectionFetches()) {
LOG.firstOrMaxResultsSpecifiedWithCollectionFetch();
RowSelection selection = new RowSelection();
selection.setFetchSize(queryParameters.getRowSelection().getFetchSize());
selection.setTimeout(queryParameters.getRowSelection().getTimeout());
queryParametersToUse = queryParameters.createCopyUsing(selection);
} else {
queryParametersToUse = queryParameters;
}
List results = queryLoader.list(session, queryParametersToUse);
if (needsDistincting) {
int includedCount = -1;
// NOTE : firstRow is zero-based
int first = !hasLimit || queryParameters.getRowSelection().getFirstRow() == null
? 0
: queryParameters.getRowSelection().getFirstRow();
int max = !hasLimit || queryParameters.getRowSelection().getMaxRows() == null
? -1
: queryParameters.getRowSelection().getMaxRows();
List tmp = new ArrayList();
IdentitySet distinction = new IdentitySet();
for (final Object result : results) {
if (!distinction.add(result)) {
continue;
}
includedCount++;
if (includedCount < first) {
continue;
}
tmp.add(result);
// NOTE : ( max - 1 ) because first is zero-based while max is not...
if (max >= 0 && (includedCount - first) >= (max - 1)) {
break;
}
}
results = tmp;
}
return results;
}
/**
* Return the query results as an iterator
*/
@Override
public Iterator iterate(QueryParameters queryParameters, EventSource session)
throws HibernateException {
// Delegate to the QueryLoader...
errorIfDML();
return queryLoader.iterate(queryParameters, session);
}
/**
* Return the query results, as an instance of <tt>ScrollableResults</tt>
*/
@Override
public ScrollableResults scroll(QueryParameters queryParameters, SessionImplementor session)
throws HibernateException {
// Delegate to the QueryLoader...
errorIfDML();
return queryLoader.scroll(queryParameters, session);
}
@Override
public int executeUpdate(QueryParameters queryParameters, SessionImplementor session)
throws HibernateException {
errorIfSelect();
return statementExecutor.execute(queryParameters, session);
}
/**
* The SQL query string to be called; implemented by all subclasses
*/
@Override
public String getSQLString() {
return sql;
}
@Override
public List<String> collectSqlStrings() {
ArrayList<String> list = new ArrayList<>();
if (isManipulationStatement()) {
String[] sqlStatements = statementExecutor.getSqlStatements();
Collections.addAll(list, sqlStatements);
} else {
list.add(sql);
}
return list;
}
// -- Package local methods for the QueryLoader delegate --
@Override
public boolean isShallowQuery() {
return shallowQuery;
}
@Override
public String getQueryString() {
return hql;
}
@Override
public Map getEnabledFilters() {
return enabledFilters;
}
@Override
public int[] getNamedParameterLocs(String name) {
return getWalker().getNamedParameterLocations(name);
}
@Override
public boolean containsCollectionFetches() {
errorIfDML();
List collectionFetches = ((QueryNode) sqlAst).getFromClause().getCollectionFetches();
return collectionFetches != null && collectionFetches.size() > 0;
}
@Override
public boolean isManipulationStatement() {
return sqlAst.needsExecutor();
}
@Override
public void validateScrollability() throws HibernateException {
// Impl Note: allows multiple collection fetches as long as the
// entire fetched graph still "points back" to a single
// root entity for return
errorIfDML();
final QueryNode query = (QueryNode) sqlAst;
// If there are no collection fetches, then no further checks are needed
List collectionFetches = query.getFromClause().getCollectionFetches();
if (collectionFetches.isEmpty()) {
return;
}
// A shallow query is ok (although technically there should be no fetching here...)
if (isShallowQuery()) {
return;
}
// Otherwise, we have a non-scalar select with defined collection fetch(es).
// Make sure that there is only a single root entity in the return (no tuples)
if (getReturnTypes().length > 1) {
throw new HibernateException("cannot scroll with collection fetches and returned tuples");
}
FromElement owner = null;
for (Object o : query.getSelectClause().getFromElementsForLoad()) {
// should be the first, but just to be safe...
final FromElement fromElement = (FromElement) o;
if (fromElement.getOrigin() == null) {
owner = fromElement;
break;
}
}
if (owner == null) {
throw new HibernateException("unable to locate collection fetch(es) owner for scrollability checks");
}
// This is not strictly true. We actually just need to make sure that
// it is ordered by root-entity PK and that that order-by comes before
// any non-root-entity ordering...
AST primaryOrdering = query.getOrderByClause().getFirstChild();
if (primaryOrdering != null) {
// TODO : this is a bit dodgy, come up with a better way to check this (plus see above comment)
String[] idColNames = owner.getQueryable().getIdentifierColumnNames();
String expectedPrimaryOrderSeq = StringHelper.join(
", ",
StringHelper.qualify(owner.getTableAlias(), idColNames)
);
if (!primaryOrdering.getText().startsWith(expectedPrimaryOrderSeq)) {
throw new HibernateException("cannot scroll results with collection fetches which are not ordered primarily by the root entity's PK");
}
}
}
private StatementExecutor buildAppropriateStatementExecutor(HqlSqlWalker walker) {
final Statement statement = (Statement) walker.getAST();
switch (walker.getStatementType()) {
case HqlSqlTokenTypes.DELETE: {
final FromElement fromElement = walker.getFinalFromClause().getFromElement();
final Queryable persister = fromElement.getQueryable();
if (persister.isMultiTable()) {
return new MultiTableDeleteExecutor(walker);
} else {
return new DeleteExecutor(walker, persister);
}
}
case HqlSqlTokenTypes.UPDATE: {
final FromElement fromElement = walker.getFinalFromClause().getFromElement();
final Queryable persister = fromElement.getQueryable();
if (persister.isMultiTable()) {
// even here, if only properties mapped to the "base table" are referenced
// in the set and where clauses, this could be handled by the BasicDelegate.
// TODO : decide if it is better performance-wise to doAfterTransactionCompletion that check, or to simply use the MultiTableUpdateDelegate
return new MultiTableUpdateExecutor(walker);
} else {
return new BasicExecutor(walker, persister);
}
}
case HqlSqlTokenTypes.INSERT:
return new BasicExecutor(walker, ((InsertStatement) statement).getIntoClause().getQueryable());
default:
throw new QueryException("Unexpected statement type");
}
}
@Override
public ParameterTranslations getParameterTranslations() {
if (paramTranslations == null) {
paramTranslations = new ParameterTranslationsImpl(getWalker().getParameters());
}
return paramTranslations;
}
@Override
public List<ParameterSpecification> getCollectedParameterSpecifications() {
return collectedParameterSpecifications;
}
@Override
public Class getDynamicInstantiationResultType() {
AggregatedSelectExpression aggregation = queryLoader.getAggregatedSelectExpression();
return aggregation == null ? null : aggregation.getAggregationResultType();
}
public static class JavaConstantConverter implements NodeTraverser.VisitationStrategy {
private AST dotRoot;
@Override
public void visit(AST node) {
if (dotRoot != null) {
// we are already processing a dot-structure
if (ASTUtil.isSubtreeChild(dotRoot, node)) {
return;
}
// we are now at a new tree level
dotRoot = null;
}
if (node.getType() == HqlTokenTypes.DOT) {
dotRoot = node;
handleDotStructure(dotRoot);
}
}
private void handleDotStructure(AST dotStructureRoot) {
final String expression = ASTUtil.getPathText(dotStructureRoot);
final Object constant = ReflectHelper.getConstantValue(expression);
if (constant != null) {
dotStructureRoot.setFirstChild(null);
dotStructureRoot.setType(HqlTokenTypes.JAVA_CONSTANT);
dotStructureRoot.setText(expression);
}
}
}
@Override
public EntityGraphQueryHint getEntityGraphQueryHint() {
return entityGraphQueryHint;
}
@Override
public void setEntityGraphQueryHint(EntityGraphQueryHint entityGraphQueryHint) {
this.entityGraphQueryHint = entityGraphQueryHint;
}
}
If you follow the code flow, you will notice that I just modified the method private void generate(AST sqlAst) throws QueryException, RecognitionException and added the following lines:
//Hack: The distinct keyword is removed from the sql string to
//avoid executing a distinct query in the DBMS when the distinct
//is used in hql.
sql = sql.replace("distinct", "");
What I do with this code is to remove the distinct keyword from the generated SQL query.
After creating the classes above, I added the following line in the hibernate configuration file:
<property name="hibernate.query.factory_class">org.hibernate.hql.internal.ast.NoDistinctInSQLASTQueryTranslatorFactory</property>
This line tells Hibernate to use my custom class to parse HQL queries and generate SQL queries without the distinct keyword. Notice I created my custom classes in the same package where the original HQL parser resides.