How to add another field to an existing Lucene index?

I have a Lucene index that contains documents with the following fields: num (IntField), title (TextField, stored), contents (TextField, not stored).
I want to add a field to a document in this index. I tried the following after finding the documentId (both the reader and the writer are open, and q is the query I used to find documentId):
Document doc = indexreader.document(documentId);
doc.add(new TextField("terms",terms,Store.YES));
writer.deleteDocuments(q);
writer.addDocument(doc);
However, when I query the index for the newly edited document, I can't find it.
Edit: it worked perfectly before I added the field, and it still works for other documents that I haven't edited.

Related

Lucene 6.2.1 - IllegalStateException "field-name" was indexed without position data; cannot run SpanTermQuery

I am not familiar with Lucene. I recently became involved in a project that is moving an application from the old Lucene version 2.4.1 to 6.2.1.
When running with the new version 6.2.1, we get an exception while searching:
Exception during query field "field_name" was indexed without position data; cannot run SpanTermQuery (term=2887629129)
In code, the field is created as follows:
doc.add(new Field("field_1", "field_value", StringField.TYPE_STORED));
Finally, we tried the following:
FieldType type = new FieldType();
type.setStored(true);
type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
doc.add(new Field("field_1", "field_value", StringField.TYPE_STORED));
With the above change, the previous error was gone, but searches now return an empty result.

Given that you are using a SpanQuery, I assume you want the field to be analyzed. StringField indexes without analysis, as a single token. You will want to use TextField.
doc.add(new Field("field_1", "field_value", TextField.TYPE_STORED));
There is no need to set IndexOptions here; TextField.TYPE_STORED already indexes with DOCS_AND_FREQS_AND_POSITIONS.
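
To make the difference concrete, here is a rough plain-Java sketch of "one token" versus "analyzed into several tokens". It does not use Lucene at all: the class FieldTokenSketch, its method names, and the lowercase-and-split rule are assumptions made up for illustration, and a real analyzer does more than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only; it does not use Lucene and only approximates an analyzer.
class FieldTokenSketch {

    // StringField-like behaviour: the whole value is indexed as one token.
    static List<String> singleToken(String value) {
        return List.of(value);
    }

    // TextField-like behaviour (assumed simplification): lowercase the value
    // and split it on anything that is not a letter or a digit.
    static List<String> analyzedTokens(String value) {
        List<String> tokens = new ArrayList<>();
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String value = "Quick brown fox";
        System.out.println(singleToken(value));    // the whole string as one token
        System.out.println(analyzedTokens(value)); // one token per word
    }
}
```

In this sketch, a term query for a single word can only match the analyzed form, because the single-token form contains only the complete original string.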

How to read a document in lucene which is not stored but indexed

Hello, I have three fields (title, content, url). I created the index and added some documents:
Document doc = new Document();
doc.add(new TextField("title", title, Field.Store.YES));
doc.add(new TextField("content", title, Field.Store.YES));
doc.add(new StringField("url", isbn, Field.Store.NO));
w.addDocument(doc);
I can read the index using the index writer, iterate over the documents, and retrieve the title and content fields. How can I retrieve the url field?

You can search on the "url" field, but you cannot retrieve (display) its value.
Field.Store.NO is suitable for ID-like fields that you need only for finding documents, not for displaying.

Since you chose not to store it, I don't think you can. That's exactly what the "store" option is for (allowing you to retrieve more data than just the document id).

how can i receive the field url?
You can't. Field.Store.NO means Lucene takes this value and only uses it for indexing purposes so this document can be found if you query by matching url.
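
The distinction can be pictured with a toy in-memory model. This is only a sketch under heavy assumptions: StoredVsIndexedSketch and its methods are hypothetical, the two maps are a simplification and not Lucene's actual data structures, and matching is exact-value only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Lucene: one map for searching, one map for retrieval.
class StoredVsIndexedSketch {
    // "field:value" -> ids of documents containing it (used for search).
    private final Map<String, Set<Integer>> inverted = new HashMap<>();
    // doc id -> field values kept for retrieval (only when store == true).
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    void add(int docId, String field, String value, boolean store) {
        inverted.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(docId);
        if (store) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
        }
    }

    // Searching works for every indexed field, stored or not.
    Set<Integer> search(String field, String value) {
        return inverted.getOrDefault(field + ":" + value, Set.of());
    }

    // Retrieval only works for stored fields; otherwise null is returned.
    String get(int docId, String field) {
        Map<String, String> fields = stored.get(docId);
        return fields == null ? null : fields.get(field);
    }

    public static void main(String[] args) {
        StoredVsIndexedSketch index = new StoredVsIndexedSketch();
        index.add(0, "title", "Some title", true);
        index.add(0, "url", "some-url", false);
        System.out.println(index.search("url", "some-url")); // the document is found
        System.out.println(index.get(0, "url"));             // but the value is not kept
    }
}
```

In this model, as in the answers above, the only way to get the url back is to store it when the document is added.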

How to index a WEB TREC collection?

I've built a WEB TREC collection by downloading and parsing HTML pages myself. Each TREC file contains a Category field. How can I build an index with Lucene in order to search that collection? The idea is that the search should return categories as results instead of documents.
Thank you!

This should be a relatively simple task since you have them in HTML format. You could index them in Lucene like this (Java-based pseudocode):
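foreach(file in htmlfiles)
{
    Document d = new Document();
    // Category is kept as a single, unanalyzed term.
    d.add(new Field("Category", GetCategoryName(...), Field.Store.YES, Field.Index.NOT_ANALYZED));
    // Contents are analyzed so they can be searched as full text.
    d.add(new Field("Contents", GetContents(...), Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(d);
}
// Close the writer once, after all documents have been added.
writer.close();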
GetCategoryName(...) should return the category string and GetContents(...) the contents of the corresponding HTML file. It would be a good idea to strip the HTML tags from the contents first; there are several ways of doing this, HtmlParser being one.
When you search, search the Contents field and iterate through your search results to collect your categories.
If you want to get a list of categories with counts attached ("facets") look into faceted search. Solr is a search server built using Lucene that provides this out of the box.
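
The "iterate and collect" step could look roughly like the sketch below. It is plain Java without Lucene: the input list stands in for the stored Category value of each hit in rank order, and CategoryCountSketch and countCategories are hypothetical names used only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch; the input list stands in for the "Category" value of each search hit.
class CategoryCountSketch {

    // Counts how often each category occurs, keeping the order of first appearance.
    static Map<String, Integer> countCategories(List<String> hitCategories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String category : hitCategories) {
            counts.merge(category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical categories of four hits, best hit first.
        List<String> hits = List.of("Sports", "News", "Sports", "Science");
        System.out.println(countCategories(hits));
    }
}
```

The keys of the resulting map are the categories to return, and the counts give a simple facet-like summary without a dedicated faceting component.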

How can I read a Lucene document field tokens after they are analyzed?

If I create a document and add a field that is both stored and analyzed, how can I then read this field back as a list of tokens? I have the following:
Document doc = new Document();
doc.add(new Field("url", fileName, Store.YES, Index.NOT_ANALYZED));
doc.add(new Field("text", fileContent, Store.YES, Index.ANALYZED));
// add the document to the index
writer.addDocument(doc);
Here fileContent is a String containing a lot of text. It is analyzed, and therefore tokenized, when it is added to the index. How can I get these tokens? I can retrieve the document from the index after it is stored, and I can read the "text" field from the document, but it is returned as a single string. I would like to get the tokens if possible. My 'writer' is an IndexWriter instance and it uses a StandardAnalyzer. Any pointers would be very welcome.
Thank you very much

Check out document.getField("name").tokenStreamValue().
EDIT: Actually this question gives you the full solution using the above TokenStream.

Lucene.net - how to query a path field with numeric sections?

I've created an index of the event items in different sections of a website.
These items are on the website in a structure like this:
/Start/Section1/Events/2011/12/25/X-mas
/Start/Section2/Events/2012/01/01/New-years-day
These paths are stored in the field path in the index.
On the start page I need an overview of the events from all the different sections.
When I'm in a section I only need the events placed under that section.
I add a clause to a BooleanQuery like this:
QueryParser queryParser = new QueryParser("path", analyzer);
Query query = queryParser.Parse(startPath);
completeQuery.Add(query, BooleanClause.Occur.MUST);
"path" is a field that is added through a custom index script.
To retrieve the items for the start page I would search my index using:
string startPath = "/Start";
This normally gives me all items where the path starts with "/Start".
To retrieve the items for Section1 I would search my index using:
string startPath = "/Start/Section1/Events";
This normally gives me all items where the path starts with "/Start/Section1/Events".
I've implemented this solution for news items and it works fine. For event items it does not.
When I search my index it returns no hits. The problem is that the last three folder names are numeric.
When I rename the folders (e.g. 2011, 12, 25) to text (two-thousand, twelve, twenty-five) it DOES return hits.
How can I get my index to return results while keeping my folder names numeric?

Use a CharTokenizer for your path, and have IsTokenChar(char c) return false for the /.
This way you'll be sure each part of your path is an individual Token.
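
The effect of that rule can be sketched without Lucene. The class below is plain Java and is not a CharTokenizer subclass; PathTokenSketch and tokenize are hypothetical names, and the sketch only shows which tokens the "false for /" rule would produce for one of the paths from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this is NOT Lucene's CharTokenizer.
class PathTokenSketch {

    // Mirrors the suggested rule: '/' is not a token character, everything else is.
    static boolean isTokenChar(char c) {
        return c != '/';
    }

    // Splits the input into maximal runs of token characters.
    static List<String> tokenize(String path) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (isTokenChar(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [Start, Section1, Events, 2011, 12, 25, X-mas]
        System.out.println(tokenize("/Start/Section1/Events/2011/12/25/X-mas"));
    }
}
```

With this rule the numeric folder names come out as ordinary tokens, just like the textual ones.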
