How to read Lucene 3.2 index by Lucene 4.10? - lucene

Getting Lucene 4.10 read 3.2 version indexes
I upgraded to Lucene 4.10 but still need to read 3.2 indexes. JRE 7 is deployed as required. I made all the changes within an existing code base, which now reports errors. I still need to read the 3.2 indexes before taking on re-indexing. How can Lucene 4.10 read existing 3.2 indexes (and what code changes, if any, are needed)?

You can use IndexUpgrader, something like:
IndexUpgrader upgrader = new IndexUpgrader(myIndexDirectory, Version.LUCENE_4_10_0);
upgrader.upgrade();
or run it from the command line:
java -cp lucene-core.jar org.apache.lucene.index.IndexUpgrader myIndexDirectory

You can set the codec used to decode the indexes in the IndexWriterConfig. Lucene3xCodec would be the codec to use here:
IndexWriterConfig config = new IndexWriterConfig(Version.LATEST, analyzer);
config.setCodec(new Lucene3xCodec());
IndexWriter writer = new IndexWriter(directory, config);
IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(writer, true));
Bear in mind, this codec is strictly read-only. Any attempt to add, delete, or update a document will result in an UnsupportedOperationException being thrown. If you wish to support writing to the index, you must upgrade your index (see my original answer).

Related

How to use a compass lucene generated cfs index?

With (the latest) lucene 8.7, is it possible to open a .cfs compound index file generated by lucene 2.2 around 2009, in a legacy application that I cannot modify, with the lucene utility "Luke"?
or alternatively, could it be possible to generate the .idx file for Luke from the .cfs?
the .cfs was generated by compass on top of lucene 2.2, not by lucene directly
Is it possible to use a compass generated index containing :
_b.cfs
segments.gen
segments_d
possibly with solr ?
are there any examples how to open a file based .cfs index with compass anywhere ?
the conversion tool won't work because the index version is too old :
from lucene\build\demo :
java -cp ../core/lucene-core-8.7.0-SNAPSHOT.jar;../backward-codecs/lucene-backward-codecs-8.7.0-SNAPSHOT.jar org.apache.lucene.index.IndexUpgrader -verbose path_of_old_index
and the searchfiles demo :
java -classpath ../core/lucene-core-8.7.0-SNAPSHOT.jar;../queryparser/lucene-queryparser-8.7.0-SNAPSHOT.jar;./lucene-demo-8.7.0-SNAPSHOT.jar org.apache.lucene.demo.SearchFiles -index path_of_old_index
both fail with :
org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported
This version of Lucene only supports indexes created with release 6.0 and later.
Is it possible to use an old index with lucene somehow? how to use the old "codec"?
also from lucene.net if possible ?
current lucene 8.7 yields an index containing these files :
segments_1
write.lock
_0.cfe
_0.cfs
_0.si
==========================================================================
update : amazingly it seems to open that very old format index with lucene.net v. 3.0.3 from nuget !
this seems to work in order to extract all terms from the index :
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Globalization;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main()
        {
            var reader = IndexReader.Open(FSDirectory.Open("C:\\Temp\\ftsemib_opzioni\\v210126135604\\index\\search_0"), true);
            Console.WriteLine("number of documents: "+reader.NumDocs() + "\n");
            Console.ReadLine();
            TermEnum terms = reader.Terms();
            while (terms.Next())
            {
                Term term = terms.Term;
                String termField = term.Field;
                String termText = term.Text;
                int frequency = reader.DocFreq(term);
                Console.WriteLine(termField +" "+termText);
            }
            var fieldNames = reader.GetFieldNames(IndexReader.FieldOption.ALL);
            int numFields = fieldNames.Count;
            Console.WriteLine("number of fields: " + numFields + "\n");
            for (IEnumerator<String> iter = fieldNames.GetEnumerator(); iter.MoveNext();)
            {
                String fieldName = iter.Current;
                Console.WriteLine("field: " + fieldName);
            }
            reader.Close();
            Console.ReadLine();
        }
    }
}
out of curiosity could it be possible to find out what index version it is ?
are there any examples of (old) compass with file system based index ?
Unfortunately you can't use an old codec to access index files from Lucene 2.2, because codecs were only introduced in Lucene 4.0. Before that, the code for reading and writing the index files was not grouped into a codec; it was simply part of the overall Lucene library.
So in versions of Lucene prior to 4.0 there is no codec, just file reading and writing code baked into the library. It would be very difficult to track down all that code and turn it into a codec that could be plugged into a modern version of Lucene. It's not an impossible task, but it would require an expert Lucene developer and a large amount of effort (i.e. an extremely expensive endeavor).
In light of all that, the answer to this SO question may be of some use: How to upgrade lucene files from 2.2 to 4.3.1
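On the side question of which version wrote the index: a rough first check is to look at the first four bytes of the segments_N file (segments_d in the listing above). This is only a sketch under assumptions. As far as I know, Lucene 4.0 and later start that file with the codec header magic 0x3fd76c17, while older releases start it with a negative format number. The class and method names are made up for illustration and are not part of Lucene, and the sketch does not try to map a legacy format number to an exact release.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Lucene: peeks at the start of a segments_N file. */
class SegmentsHeader {
    // Assumption: magic number that opens codec-era (4.0+) index files.
    static final int CODEC_MAGIC = 0x3fd76c17;

    static String describe(Path segmentsFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(segmentsFile))) {
            int first = in.readInt(); // DataInputStream reads big-endian
            if (first == CODEC_MAGIC) {
                return "codec header found: written by Lucene 4.0 or later";
            }
            if (first < 0) {
                return "legacy format marker " + first + ": written before Lucene 4.0";
            }
            return "unrecognized leading int " + first;
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to the segments_N file of the index to inspect.
        System.out.println(describe(Path.of(args[0])));
    }
}
```

It only reads the file, so it is safe to run against the legacy index.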
Update
Your best bet would be to use an old 3.x copy of Java Lucene or Lucene.Net 3.0.3 to open the index, then add and commit one doc (which will create a second segment) and call Optimize, which will merge the two segments into one new segment. The new segment will be a version 3 segment. Then you can use Lucene.Net 4.8 Beta or Java Lucene 4.x to do the same thing again (note that Optimize was renamed ForceMerge in version 4) to convert the index to a 4.x index.
Then you can use the current Java version of Lucene 8.x to do this once more to move the index all the way up to 8, since the current version of Java Lucene has codecs reaching all the way back to 5.0; see: https://github.com/apache/lucene-solr/tree/master/lucene/core/src/java/org/apache/lucene/codecs
However if you do receive the error again that you reported:
This version of Lucene only supports indexes created with release 6.0 and later.
then you will have to play this game one more cycle with a version 6.x Java Lucene to get from a 5.x index to a 6.x index. :-)
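To plan the worst case of this hop-by-hop upgrade, here is a small sketch. It assumes the conservative rule that each major version can only read indexes written by the major version before it. The answer above hopes 8.x reaches further back, so some hops may turn out to be unnecessary. The class and method are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner, not part of Lucene. */
class UpgradePath {
    /**
     * Lists the major versions to run an "add a doc, commit, merge" cycle with,
     * assuming each major version reads only the previous major's indexes.
     */
    static List<Integer> plan(int indexMajor, int targetMajor) {
        if (indexMajor < 1 || targetMajor < indexMajor) {
            throw new IllegalArgumentException("target must not be older than the index");
        }
        List<Integer> hops = new ArrayList<>();
        for (int major = indexMajor + 1; major <= targetMajor; major++) {
            hops.add(major);
        }
        return hops;
    }

    public static void main(String[] args) {
        // A 2.x index that should end up readable by 8.x.
        System.out.println(plan(2, 8)); // prints [3, 4, 5, 6, 7, 8]
    }
}
```

If a hop fails with IndexFormatTooOldException, the previous hop was skipped or did not rewrite every segment.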

Solr: import Lucene index while server is up and running

As described in "Can a raw Lucene index be loaded by Solr?", Lucene indexes can be imported into Solr. This works well when the Solr server is not running (by creating a Solr core folder structure in the data folder with all the needed configuration files), but it does not work when the Solr server is up and running.
Is there any call (via rest endpoint or java api) to tell Solr to re-scan the data folder?
You want to generate an index with Lucene (outside Solr) and add it to Solr without a restart.
You must not change the index folder directly. But you can create a new core that points to the already built index folder and swap it with the outdated old core. Or you can merge the new index folder into the old core.
All of this can be done with the SolrJ admin API.
e.g. create:
CoreAdminRequest.Create req = new CoreAdminRequest.Create();
req.setConfigName(configName);
req.setSchemaName(schemaName);
req.setDataDir(dataDir);
req.setCoreName(coreName);
req.setInstanceDir(instanceDir);
req.setIsTransient(true);
req.setIsLoadOnStartup(false); // <= unless it's a production core.
return req.process(adminServer);
e.g. the swap:
CoreAdminRequest request = new CoreAdminRequest();
request.setAction(CoreAdminAction.SWAP);
request.setCoreName(coreName1);
request.setOtherCoreName(coreName2);
request.process(solrClient);
For SolrCloud use the first "create" approach with the collections api and use alias instead of swap.
e.g. the alias:
CollectionAdminRequest.CreateAlias req = new CollectionAdminRequest.CreateAlias();
req.setAliasedCollections(coreName);
req.setAliasName(aliasName);
return req.process(solrClient);
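Since the question also asks about a REST endpoint: the same operations are exposed by the CoreAdmin HTTP API, so for the non-cloud case they can be triggered with a plain GET. The sketch below only builds the URLs. The base URL (http://localhost:8983/solr), the core names and the index path are assumptions for illustration, and the helper class is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds CoreAdmin URLs; it does not send any request. */
class CoreAdminUrls {
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Swaps two cores, e.g. a freshly built one with the outdated live one. */
    static String swap(String baseUrl, String core, String other) {
        return baseUrl + "/admin/cores?action=SWAP&core=" + enc(core) + "&other=" + enc(other);
    }

    /** Merges an externally built index directory into an existing core. */
    static String mergeIndexes(String baseUrl, String core, String indexDir) {
        return baseUrl + "/admin/cores?action=MERGEINDEXES&core=" + enc(core)
                + "&indexDir=" + enc(indexDir);
    }

    public static void main(String[] args) {
        String solr = "http://localhost:8983/solr"; // assumed default address
        System.out.println(swap(solr, "live", "staging"));
        System.out.println(mergeIndexes(solr, "live", "/data/new index"));
    }
}
```

After a merge, the target core still needs a commit before the merged documents become visible to searches.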

how to use very old iText(under 0.99) to create bookmarks / outlines?

May I know how to use old iText (a very old version under 0.99, package path = com.lowagie.xxx) to create bookmarks that jump to a location inside the same PDF?
Something like this API in the new iText jar:
PdfAction action = com.itextpdf.text.pdf.PdfAction.gotoLocalPage("destinationName", false);
We have found the code below to create a bookmark, but old iText seems to need the file name (see outFileName in the code below). What we want is a jump within the same PDF (not to a remote PDF):
olineSignature = new PdfOutline(root, new PdfAction(outFileName, "Signature2TxtDestination"), "Signature2TxtOutline");
FYI, we don't know the page number in advance, so we cannot use the old API PdfAction.gotoLocalPage(int, PdfDestination, PdfWriter).
Can anybody help me? Thanks. #Bruno Lowagie, #itext :)
We are in the process of upgrading to new iText (iText 5+), but for now we have a request to create bookmarks (using old iText) that others can retrieve.
My memory can't go that far back but local destinations are most probably not supported. Your only chance is to do an interim upgrade to the Jurassic 2.1.7 that should be more or less compatible with that Pleistocene 0.99.

avoid indexing documents again Lucene

When I run my program, I index the documents each time I run the program in eclipse. However, I want to just index once. Perhaps by deleting the index after each use, but I don't know how to go about doing that.
Set your IndexWriter to OpenMode.CREATE. It's probably set to OpenMode.CREATE_OR_APPEND now. Setting it to CREATE will cause the existing index in the specified directory to be overwritten when you open the IndexWriter, to make way for the new one.
Like:
IndexWriterConfig config = new IndexWriterConfig(version, analyzer);
config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
//etc.....
IndexWriter writer = new IndexWriter(directory, config);
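If the goal is really to index only once rather than to overwrite on every run, another option is to skip indexing when an index is already there. A minimal sketch, assuming the index lives in a plain file system directory and that a committed index always contains a segments_N file (as in the file listings earlier on this page). The class is hypothetical, and the check is a heuristic rather than a validation of the index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: decides whether the indexing step can be skipped. */
class IndexOnce {
    /** Returns true if the directory holds at least one segments_N file. */
    static boolean looksIndexed(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return false;
        }
        try (DirectoryStream<Path> segments = Files.newDirectoryStream(indexDir, "segments_*")) {
            return segments.iterator().hasNext();
        }
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Path.of(args[0]);
        if (looksIndexed(indexDir)) {
            System.out.println("index found, skipping indexing");
        } else {
            System.out.println("no index yet, build it here");
        }
    }
}
```

With Lucene on the classpath, DirectoryReader.indexExists(directory) is probably the better check, since it asks the library itself.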

Lucene IndexSearcher locks index causing IOException when rebuilding

I've learned from reading the available documentation that an IndexSearcher instance should be shared across searches, for optimal performance, and that a new instance must be created in order to load any changes made to the index. This implies that the index is writable (using IndexWriter) after having created an instance of IndexSearcher that points to the same directory. However, this is not the behaviour I see in my implementation of Lucene.Net. I'm using FSDirectory. RAMDirectory is not a viable option. The IndexSearcher locks one of the index files (in my implementation it's the _1.cfs file) making the index non-updatable during the lifetime of the IndexSearcher instance.
Is this a known behaviour? Can't I rebuild the index from scratch while using an IndexSearcher instance created prior to rebuilding? Is it only possible to make modifications to the index, but not to rebuild it?
Here is how I create the IndexSearcher instance:
// Create FSDirectory
var directory = FSDirectory.GetDirectory(storagePath, false);
// Create IndexReader
var reader = IndexReader.Open(directory);
// I get the same behaviour regardless of whether I close the directory or not.
directory.Close();
// Create IndexSearcher
var searcher = new IndexSearcher(reader);
// Closing the reader will cause "object reference not set..." when searching.
//reader.Close();
Here is how I create the IndexWriter:
var directory = FSDirectory.GetDirectory(storagePath, true);
var indexWriter = new IndexWriter(directory, new StandardAnalyzer(), true);
I'm using Lucene.Net version 2.0.
Edit:
Upgrading to Lucene.Net 2.1 (thx KenE) and slightly modifying the way I create my IndexWriter fixed the problem:
var directory = FSDirectory.GetDirectory(storagePath, false);
var indexWriter = new IndexWriter(directory, new StandardAnalyzer(), true);
The latest version of Lucene.Net (2.1) appears to support opening an IndexWriter with create=true even when there are open readers:
http://incubator.apache.org/lucene.net/docs/2.1/Lucene.Net.Index.IndexWriter.html
Earlier versions are not clear as to whether they support this or not. I would try using 2.1.