How to stay compatible with the moreLikeThis syntax after upgrading to Hibernate Search 6.x

I recently upgraded Hibernate Search from version 5.x to 6.x and ran into some problems. Most queries could be migrated by following the documentation, but the moreLikeThis query cannot be translated directly. The official documentation does mention it, but the description is not detailed enough for me to complete the migration.
This is my 5.x code:
queryBuilder.moreLikeThis().comparingFields("name").toEntity(product).createQuery()
I want to move to 6.x but don't yet know how to rewrite this.
I hope someone who knows can answer, thanks!

As explained in the migration guide, the moreLikeThis predicate doesn't exist anymore in Hibernate Search 6.
But if it's just about a single field, you didn't really need the moreLikeThis predicate to begin with.
This should return the same results as your current code:
SearchSession session = Search.session(entityManager);
List<Product> hits = session.search(Product.class)
.where(f -> f.match().field("name").matching(product.getName()))
.fetchHits(20);
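If you later need to compare more than one field, one way to approximate the old moreLikeThis behavior is to OR together one match predicate per field using a bool predicate. A minimal sketch, assuming a second "description" field and a matching getter on Product (both invented for illustration):

```java
// Sketch: OR-ing one match predicate per field to approximate
// moreLikeThis over several fields in Hibernate Search 6.
// The "description" field and Product#getDescription() are assumptions.
SearchSession session = Search.session(entityManager);
List<Product> hits = session.search(Product.class)
        .where(f -> f.bool()
                .should(f.match().field("name").matching(product.getName()))
                .should(f.match().field("description").matching(product.getDescription())))
        .fetchHits(20);
```

Each should clause is optional, so documents matching any of the fields are returned, ranked higher the more clauses they match.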

Related

Using Field.index from apache lucene index

I am attempting to implement a simple Lucene index, using Lucene 7.1.
There are a lot of changes to the API between versions, so answers written for one version often don't apply to another.
In this tutorial I am following
https://www.avajava.com/tutorials/lessons/how-do-i-use-lucene-to-index-and-search-text-files.html
There is a line
document.add(new Field(FIELD_PATH, path, Field.Store.YES, Field.Index.UN_TOKENIZED));
However, Field.Index produces compile errors. I can convert it to a TextField, but I am not sure that is the same thing. Can anyone tell me what Field.Index does and how to modify the code so that it will run?
That tutorial is using Lucene 2.3; it's so old that the folks at Apache don't even keep that version in the archives. I wouldn't bother with a resource that old, it's more headache than it's worth. It looks like it mostly just walks through the Lucene demo that ships with every released version, though, so try going through the current Lucene demo instead.
As for what to replace that exact field with: it is indexed, stored, and not tokenized, so you'll want to use a StringField. A TextField would be for a field that is tokenized.
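A minimal sketch of what that line could look like on Lucene 7.x (FIELD_PATH and path come from the tutorial; the FIELD_CONTENTS line is only there to show the contrast and is an assumption):

```java
Document document = new Document();
// StringField: stored and indexed as a single token (not tokenized),
// the closest match to the old Field.Index.UN_TOKENIZED.
document.add(new StringField(FIELD_PATH, path, Field.Store.YES));
// TextField: tokenized full-text content, the old Field.Index.TOKENIZED.
document.add(new TextField(FIELD_CONTENTS, contents, Field.Store.YES));
```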

Can Liquibase or Flyway handle multi non-linear versioning scenario?

Here is a tough one.
v1.1 has a table with index i.
v2.1 contains this table and index as well.
A bug was discovered, and in v1.1.0.1 we changed the code and, as a result, decided to drop the index.
We created a corresponding patch for v2.1, namely v2.1.0.6.
The customer applied patch v1.1.0.1 and a few weeks later upgraded to v2.1 (without patch 6).
As v2.1 code base performs better with the index we have a "broken" application.
I can't force my customers to apply the latest patch.
I can't force the developers to avoid such scenarios.
Can Liquibase or Flyway handle this scenario?
I guess these kinds of problems are more organizational than tool-specific. If you support multiple versions (a 1.0 branch and a newer 2.0 branch) and provide patches for both (which is a totally legitimate approach, don't get me wrong), you will probably have to provide upgrade notes for all these versions, and maybe a matrix that shows from which version to which you can go (and what you can't do).
I just happened to upgrade an older version of Atlassian's Jira Bugtracker and had to find out that they do provide upgrade notes for all versions.
That would have meant going from one version to the next to finally arrive at the latest version (I was on 4.x and wanted to go to the latest 5.x), obeying all the upgrade notes in between. (Btw, I skipped all this and set it up as a completely fresh installation to avoid it.)
Just to give you an impression, here is a page that shows all these upgrade notes:
https://confluence.atlassian.com/display/JIRA/Important+Version-Specific+Upgrade+Notes
So I guess you could provide a small script that recreates the index if somebody wants to go from version 1.1.0.1 to 2.1 and state in upgrade notes that it needs to be applied.
Since you asked whether Liquibase (or Flyway) can support this, it may help to mention that Liquibase (I only know Liquibase) has something called preConditions. This means you can run a changeset (or an SQL script) conditionally, for example based on whether an index exists (<indexExists>).
That could help to re-create the index if it is missing.
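A sketch of such a changeset, assuming Liquibase's XML changelog format (the table, column, and changeset id names are invented for illustration):

```xml
<!-- Re-create index "i" only when it is missing; otherwise mark the
     changeset as run so it is never attempted again. -->
<changeSet id="recreate-index-i" author="upgrade">
    <preConditions onFail="MARK_RAN">
        <not>
            <indexExists indexName="i" tableName="some_table"/>
        </not>
    </preConditions>
    <createIndex indexName="i" tableName="some_table">
        <column name="some_column"/>
    </createIndex>
</changeSet>
```

The onFail="MARK_RAN" setting makes the changeset a no-op on databases where the index already exists.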
But since version 2.1 has already been released (before knowing that the index might be dropped in a future bugfix) there is no chance to add this feature to the upgrade procedure of version 2.1.
Liquibase will handle the drop-index change across branches fine, but since you are going from a version that contains a change (the drop-index change) to one that does not expect it, you are going to end up with your broken app state.
With Liquibase, changes are completely independent of each other and of any versioning. You can think of the Liquibase changelog as an ordered list of changes, each with a unique identifier. When you do an update, Liquibase checks each change in turn to see whether it has been run, and runs it if it has not.
Any "versioning" is purely within your codebase and branching scheme, liquibase does not care.
Imagine you start out with your 1.1.0 release that looks like:
change a
change b
change c
when you deploy 1.1.0, the customer database will record that changes a, b, and c were run.
In v2.1 you have added new changesets at the end of your changelog file, so it looks like:
change a
change b
change c
change x
change y
change z
and all 2.1 customers' databases know that a, b, c, x, y, z are applied.
When you create 1.1.0.1 with changeset d that drops your index, you end up with this changelog in the 1.1.0.1 branch:
change a
change b
change c
change d
But when you upgrade your 1.1.0.1 customers to 2.1, Liquibase just compares the defined changesets (a, b, c, x, y, z) against the known changesets (a, b, c, d) and runs x, y, z. It doesn't care that there is an already-run changeset d; it does nothing about it.
The liquibase diff support can be used as a bit of a sanity check and would be able to report that there is a missing index compared to some "correct" database, but that is not something you would normally do in a production deployment scenario.
The answer may be a bit late, but I will share my experience. We came across the same problem in our project and dealt with it in the following way:
Since releases in our project were not made often, we marked each changeset in Liquibase with a particular context, whose value was the exact version migration (like v6.2.1-v6.2.2). We passed the value to Liquibase through JNDI properties, so the customer was able to specify it. During each upgrade, the customer was responsible for setting the right value for the migration scope. A Liquibase context can accept a list of values, so in the end the context looked like this:
context=v5.1-5.2,v5.3-5.3.1,v5.3.1-5.4,v6.2.1-v6.2.2
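For reference, the same context selection can also be made on the Liquibase command line via the --contexts option (the changelog file name is invented for illustration):

```shell
# Run only the changesets tagged with these migration-scope contexts.
liquibase --changeLogFile=changelog.xml \
          --contexts="v5.1-5.2,v5.3.1-5.4,v6.2.1-v6.2.2" \
          update
```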

Migrating from lucene 2.x to 3.x

I am porting my app from Lucene 2.x to Lucene 3.x. The following is my issue.
This call was valid in 2.x, but 3.5 throws an error:
IndexReader reader = IndexReader.open("/home/path/to/my/dataDir");
2.x accepted a string, but 3.5 strictly wants a Directory object. Directory is abstract, and the only way to instantiate it seemed to be RAMDirectory().
How do I go about this, and how do I point my reader at the desired directory?
Try using
DirectoryReader.open(FSDirectory.open(new File(indexFilePath)))
as the IndexReader.open method is deprecated as of Lucene 4 :)
I was able to do it this way:
IndexReader reader = IndexReader.open(new SimpleFSDirectory(new File("my/desired/path")));
Thanks for your time.

Lucene StandardAnalyzer 3.5 TypeAttribute

I have recently noticed that the behavior of the Lucene StandardAnalyzer has changed somewhat since version 3.1. Concretely, 3.0 and earlier versions recognized e-mails, IP addresses, company names, etc. as separate lexical types, while later versions don't.
For example, for input text : "example#mail.com 127.0.0.1 H&M", the 3.0 analyzer would recognize the following types:
1: example#mail.com: 0->16: <EMAIL>
2: 127.0.0.1: 17->26: <HOST>
3: h&m: 27->30: <COMPANY>
However, version 3.1 and later give the following output for the same input text:
1: example: 0->7: <ALPHANUM>
2: mail.com: 8->16: <ALPHANUM>
3: 127.0.0.1: 17->26: <NUM>
My question is, how can I implement the old StandardAnalyzer behavior with newer version of the Lucene library? Are there some standard TokenFilters that can help me achieve this, or do I need to implement custom filters?
See the javadocs for StandardAnalyzer: As of 3.1, StandardTokenizer implements Unicode text segmentation.... ClassicTokenizer and ClassicAnalyzer are the pre-3.1 implementations of StandardTokenizer and StandardAnalyzer.
Alternatively, you can pass Version.LUCENE_30 to StandardAnalyzer and you also get the previous behavior. That's the purpose of these version constants: behavior stays consistent for existing users, and you decide when to upgrade your app to the changed behavior.
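A minimal sketch of both options against Lucene 3.5 (the variable names are only illustrative):

```java
// Option 1: ClassicAnalyzer is the pre-3.1 StandardAnalyzer under a new
// name, so it still emits the <EMAIL>, <HOST>, <COMPANY> token types.
Analyzer classic = new ClassicAnalyzer(Version.LUCENE_35);

// Option 2: keep StandardAnalyzer but request the pre-3.1 behavior
// via the version constant.
Analyzer legacy = new StandardAnalyzer(Version.LUCENE_30);
```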

What is the real difference between INDEX.TOKENIZER vs INDEX.ANALYZER in Lucene 2.9?

With Lucene 2.9.1, Field.Index.TOKENIZED is deprecated. The documentation says it was just renamed to ANALYZED, but I don't think the meaning has stayed the same. I have an existing app based on 2.3 that I'm upgrading to 2.9, and the behavior seems to have changed from what I expect.
Does anyone know any more details about TOKENIZED vs ANALYZED?
I assume you refer to the Field.Index fields ANALYZED and TOKENIZED?
It is true that the TOKENIZED field has been deprecated; this was already the case in 2.4.
Field.Index.ANALYZED is equal to the old Field.Index.TOKENIZED. Could you show how your results deviate from the behaviour you expect?