I am attempting to implement a simple Lucene index, using Lucene 7.1.
There are a lot of API changes between versions, so the code I find varies a lot from answer to answer.
This is the tutorial I am following:
https://www.avajava.com/tutorials/lessons/how-do-i-use-lucene-to-index-and-search-text-files.html
There is a line
document.add(new Field(FIELD_PATH, path, Field.Store.YES, Field.Index.UN_TOKENIZED));
However, Field.Index produces compile errors. I can convert it to a TextField, but I am not sure whether that is the same thing. Can anyone tell me what Field.Index does and how one could modify the code so that it will run?
That tutorial is using Lucene 2.3; it's so old that the folks at Apache don't even keep that version in the archives. I wouldn't bother with a resource that old, it's more headache than it's worth. It looks like they're mostly just walking through the Lucene demo that comes with every released version of Lucene, though. Try going through the current Lucene demo instead.
As far as what to replace that exact field with: it's indexed, stored, and not tokenized, so you'll want to use a StringField. A TextField would be for a field that is tokenized.
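To make that concrete, here is a minimal sketch of the document setup against the Lucene 7.x API (FIELD_PATH and FIELD_CONTENTS are the tutorial's constants; the variable names are my assumptions):

import java.io.FileReader;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// ...
Document document = new Document();
// StringField: indexed as a single token (not tokenized) and stored,
// matching the old Field.Store.YES + Field.Index.UN_TOKENIZED combination
document.add(new StringField(FIELD_PATH, path, Field.Store.YES));
// TextField: run through the analyzer (tokenized), e.g. for the file body
document.add(new TextField(FIELD_CONTENTS, new FileReader(file)));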
Related
I recently upgraded Hibernate Search from version 5.x to version 6.x and ran into some problems. Most of the query syntax can be migrated by referring to the documentation, but there is a moreLikeThis-style query that cannot be translated directly. The official documentation describes it, but not in enough detail for me to complete the migration.
This is my query in the 5.x version:
queryBuilder.moreLikeThis().comparingFields("name").toEntity(product).createQuery()
But I want to use the 6.x version, and I don't know how to rewrite it yet.
I hope someone who knows can answer. Thanks!
As explained in the migration guide, the moreLikeThis predicate doesn't exist anymore in Hibernate Search 6.
But if it's just about a single field, you didn't really need the moreLikeThis predicate to begin with.
This should return the same results as your current code:
SearchSession session = Search.session(entityManager);
List<Product> hits = session.search(Product.class)
.where(f -> f.match().field("name").matching(product.getName()))
.fetchHits(20);
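Note that fetchHits(20) returns at most the first 20 hits; raise the limit, or use fetchAllHits() if you really need everything.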
I want to parse text files with Lucene, using HunspellStemmer to check for spelling errors. I will use Hunspell dictionaries; that's why I want to use HunspellStemmer.
At this point I'm not sure how I should parse the files and do the checking.
Could I use a StandardAnalyzer with a word filter to index the text in a file and then check, term by term, whether the keyword is present in the HunspellDictionary?
I did that and it works. I'm not sure it's the optimal solution, but if I want to output 3-5 suggestions for each word that is not present, I have no idea what to do.
I could use an IndexSearcher when I use a PlainTextDictionary, but I have no idea how to get that functionality with HunspellDictionary (it doesn't implement Dictionary).
Any help will be really appreciated.
Thanks.
Examples of what I want to check: hell, hello, hall, helli. I'm hoping to get suggestions for "helli" using Hunspell.
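For reference, the PlainTextDictionary route I mean looks roughly like this. This is a minimal sketch (the file names and the Lucene 7.x API are my assumptions), and it sidesteps Hunspell entirely by feeding SpellChecker a plain word list:

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.spell.PlainTextDictionary;
import org.apache.lucene.search.spell.SpellChecker;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SuggestDemo {
    public static void main(String[] args) throws IOException {
        // Index that SpellChecker builds from the dictionary words
        Directory spellDir = FSDirectory.open(Paths.get("spellindex"));
        SpellChecker checker = new SpellChecker(spellDir);
        // words.txt: one word per line (hell, hello, hall, ...)
        checker.indexDictionary(new PlainTextDictionary(Paths.get("words.txt")),
                new IndexWriterConfig(), false);
        // Up to 5 suggestions for the misspelled word
        String[] suggestions = checker.suggestSimilar("helli", 5);
        for (String s : suggestions) {
            System.out.println(s);
        }
        checker.close();
    }
}

What I still don't know is how to get the same suggestSimilar behaviour directly from a HunspellDictionary, short of dumping its word list out to a plain text file first.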
Today I was trying to use SnowballAnalyzer with the Lucene Java API v3.6.0, but it is already deprecated, and when my code reaches the analyzer it stops. I actually want to use a PorterStemmer, but I could not find one in Lucene, so I decided to use Snowball, and then this problem occurred.
Does anyone know how to fix this?
Also, does anyone know how the stop word file should be formatted? When I put:
a
as
able
about
above
according
accordingly
across
actually
after
afterwards
.
.
.
in stopword.txt and load it, the program stops. Can anyone share with me how to format the stopword.txt file?
Thanks.
Being deprecated cannot make your code stop running; you must have some other issue.
Your stopword.txt seems to have the right format.
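For what it's worth, here is a minimal sketch of reading a one-word-per-line file like yours and handing it to the (deprecated but still functional) SnowballAnalyzer in 3.6; the file name is yours, the rest is my assumption:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;

// ...
// Read one stop word per line, exactly the format shown in the question
Set<String> stopWords = new HashSet<String>();
BufferedReader reader = new BufferedReader(new FileReader("stopword.txt"));
String line;
while ((line = reader.readLine()) != null) {
    line = line.trim();
    if (!line.isEmpty()) {
        stopWords.add(line);
    }
}
reader.close();

// "English" selects the English Snowball stemmer (derived from Porter's algorithm)
SnowballAnalyzer analyzer = new SnowballAnalyzer(Version.LUCENE_36, "English", stopWords);

If you specifically want Porter stemming, EnglishAnalyzer (org.apache.lucene.analysis.en, in the contrib analyzers jar) is the non-deprecated alternative in 3.6; as far as I know it wraps a PorterStemFilter and also accepts a stop-word set.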
I have a question: how could somebody get a term's document frequency, as we get it in Lucene with the following method, using Solr/SolrNet?
DocFreq(new Term("Field", "value"));
Try debugQuery=on or the TermsComponent. Neither of them is currently supported through SolrNet, so you can either work around it or implement them and contribute them to the project.
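As a workaround, you can hit the TermsComponent over plain HTTP. A sketch, assuming the default /terms handler is enabled in solrconfig.xml (host, port, and core name are placeholders; the field and value are from your question):

http://localhost:8983/solr/terms?terms.fl=Field&terms.lower=value&terms.upper=value&terms.upper.incl=true

The response lists each matching term together with its document frequency.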
With Lucene 2.9.1, Field.Index.TOKENIZED is deprecated. The documentation says it has just been renamed to ANALYZED, but I don't think the meaning has stayed the same. I have an existing app based on 2.3 and I'm upgrading to 2.9, and the behavior seems to have changed from what I expect.
Does anyone know any more details about Field.Index.TOKENIZED vs. Field.Index.ANALYZED?
I assume you are referring to the Field.Index values ANALYZED and TOKENIZED?
It is true that the TOKENIZED value has been deprecated; this was already the case in 2.4.
Field.Index.ANALYZED is equal to the old Field.Index.TOKENIZED. Could you show how your results deviate from the behaviour you expect?
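For illustration, the two spellings side by side (the field name and value are placeholders; the behaviour is identical):

// Lucene 2.3 style, deprecated since 2.4:
doc.add(new Field("contents", text, Field.Store.YES, Field.Index.TOKENIZED));
// Lucene 2.9 spelling of the same thing:
doc.add(new Field("contents", text, Field.Store.YES, Field.Index.ANALYZED));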