Lucene updateDocument does not work


I am using Lucene 3.6. Why does updateDocument not work here? Is there anything wrong with this code?
public class TokenTest
{
private static String IndexPath = "D:\\update\\index";
private static Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);
public static void main(String[] args) throws Exception
{
try
{
update();
display("content", "content");
}
catch (IOException e)
{
e.printStackTrace();
}
}
@SuppressWarnings("deprecation")
public static void display(String keyField, String words) throws Exception
{
IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(IndexPath)));
Term term = new Term(keyField, words);
Query query = new TermQuery(term);
TopDocs results = searcher.search(query, 100);
ScoreDoc[] hits = results.scoreDocs;
for (ScoreDoc hit : hits)
{
Document doc = searcher.doc(hit.doc);
System.out.println("doc_id = " + hit.doc);
System.out.println("content: " + doc.get("content"));
System.out.println("path: " + doc.get("path"));
}
}
public static String update() throws Exception
{
IndexWriterConfig writeConfig = new IndexWriterConfig(Version.LUCENE_33, analyzer);
IndexWriter writer = new IndexWriter(FSDirectory.open(new File(IndexPath)), writeConfig);
Document document = new Document();
Field field_name2 = new Field("path", "update_path", Field.Store.YES, Field.Index.ANALYZED);
Field field_content2 = new Field("content", "content update", Field.Store.YES, Field.Index.ANALYZED);
document.add(field_name2);
document.add(field_content2);
Term term = new Term("path", "qqqqq");
writer.updateDocument(term, document);
writer.optimize();
writer.close();
return "update_path";
}
}

I assume you want to update your document so that the field "path" = "qqqqq". You have this exactly backwards (please read the documentation).
updateDocument performs two steps:
1. Find and delete any documents containing the term. In this case none are found, because your indexed documents do not contain path:qqqqq.
2. Add the new document to the index.
You appear to be doing the opposite: trying to look up by document and then add the term to it. It doesn't work that way. What you are looking for, I believe, is something like:
Term term = new Term("content", "update");
document.removeField("path");
document.add(new Field("path", "qqqqq", Field.Store.YES, Field.Index.ANALYZED));
writer.updateDocument(term, document);
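To make the delete-then-add behaviour concrete, here is a toy model in plain Java. It is not Lucene code and not how Lucene is implemented: the class UpdateSemanticsSketch and its methods are made up for illustration, and each "document" is just a map from field name to one exact value. It only shows why an update keyed on a term that matches nothing ends up adding a document without removing any.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "delete by term, then add". Hypothetical class, not part of Lucene. */
public class UpdateSemanticsSketch {
    // Each document is a map from field name to a single exact value (an assumption of this sketch).
    private final List<Map<String, String>> docs = new ArrayList<>();

    public void addDocument(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
    }

    /** Step 1: delete every document whose field equals the term. Step 2: add the new document. */
    public void updateDocument(String field, String term, Map<String, String> newDoc) {
        docs.removeIf(d -> term.equals(d.get(field)));
        addDocument(newDoc);
    }

    public int size() {
        return docs.size();
    }

    public long countMatching(String field, String term) {
        return docs.stream().filter(d -> term.equals(d.get(field))).count();
    }

    public static void main(String[] args) {
        UpdateSemanticsSketch index = new UpdateSemanticsSketch();
        Map<String, String> existing = new HashMap<>();
        existing.put("path", "old_path");
        index.addDocument(existing);

        Map<String, String> replacement = new HashMap<>();
        replacement.put("path", "update_path");

        // No document has path=qqqqq, so nothing is deleted and the index simply grows.
        index.updateDocument("path", "qqqqq", replacement);
        System.out.println(index.size()); // prints 2

        // Keyed on a term that does match, the old document is removed before the add.
        index.updateDocument("path", "old_path", replacement);
        System.out.println(index.countMatching("path", "old_path")); // prints 0
    }
}
```

The point for the real API is the same: the Term passed to updateDocument has to identify the document as it is currently indexed, not the value you want it to have afterwards.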

Related

Index field is empty

I'm working with the Lucene library. I want to index some documents and generate TermVectors for them. I've written an Indexer class to create the fields of the index, but this code returns an empty field.
My index class is:
public class Indexer {
private static File sourceDirectory;
private static File indexDirectory;
private String fieldtitle,fieldbody;
public Indexer() {
this.sourceDirectory = new File(LuceneConstants.dataDir);
this.indexDirectory = new File(LuceneConstants.indexDir);
fieldtitle = LuceneConstants.CONTENTS1;
fieldbody= LuceneConstants.CONTENTS2;
}
public void index() throws CorruptIndexException,
LockObtainFailedException, IOException {
Directory dir = FSDirectory.open(indexDirectory.toPath());
Analyzer analyzer = new StandardAnalyzer(StandardAnalyzer.STOP_WORDS_SET); // using stop words
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
if (indexDirectory.exists()) {
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
} else {
// Add new documents to an existing index:
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
}
IndexWriter writer = new IndexWriter(dir, iwc);
for (File f : sourceDirectory.listFiles()) {
Document doc = new Document();
String[] linetext=getAllText(f);
String title=linetext[1];
String body=linetext[2];
doc.add(new Field(fieldtitle, title, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
doc.add(new Field(fieldbody, body, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
writer.addDocument(doc);
}
writer.close();
}
public String[] getAllText(File f) throws FileNotFoundException, IOException {
String textFileContent = "";
String[] ar = null;
try {
BufferedReader in = new BufferedReader(new FileReader(f));
for (String str : Files.readAllLines(Paths.get(f.getAbsolutePath()))) {
textFileContent += str;
ar=textFileContent.split("--");
}
in.close();
} catch (IOException e) {
System.out.println("File Read Error");
}
return ar;
}
}
The debugger shows:
doc Document #534
fields ArrayList "size=0"
Static
linetext String[] #535(length=4)
title String "how ...."
body String "I created ...."
I also get another error while debugging:
Non-static method "toString" cannot be referenced from a static context.
This error occurs for filepath.

Sounds like you've got an empty file, or are running into an IOException. See this part of your code:
String[] ar = null;
try {
//Do Stuff
} catch (IOException e) {
System.out.println("File Read Error");
}
return ar;
If an IOException occurs, you only print a message and carry on, so getAllText returns null and the caller fails immediately afterwards with another exception when it reads linetext[1]. You need to decide how to handle an IOException, and also what to do if getAllText returns an array of length 1 or 2.
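One way to handle both cases is to let the IOException propagate and to reject files that do not split into enough parts. The following is only a sketch under assumptions: the class name SectionReader, the method readSections and the minSections parameter are invented here, and it keeps the question's approach of joining the lines and splitting on "--".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical replacement for getAllText that never returns null or a short array. */
public class SectionReader {

    /**
     * Joins all lines of the file (without line breaks, as in the question) and splits on "--".
     * Read errors propagate to the caller; too few sections is reported as an IOException.
     */
    public static String[] readSections(Path file, int minSections) throws IOException {
        String content = String.join("", Files.readAllLines(file, StandardCharsets.UTF_8));
        String[] sections = content.split("--");
        if (sections.length < minSections) {
            throw new IOException("Expected at least " + minSections + " sections in "
                    + file + " but found " + sections.length);
        }
        return sections;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("sections", ".txt");
        Files.write(sample, "id--title--body".getBytes(StandardCharsets.UTF_8));
        String[] parts = readSections(sample, 3);
        System.out.println(parts[1]); // prints title
        System.out.println(parts[2]); // prints body
        Files.delete(sample);
    }
}
```

In the indexing loop the caller can then catch the IOException per file and skip or log that file, instead of hitting a NullPointerException or ArrayIndexOutOfBoundsException later.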
Also, not the issue you are currently running into, but this is almost certainly backwards:
if (indexDirectory.exists()) {
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
} else {
// Add new documents to an existing index:
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
}
And there really isn't a need for it at all, anyway. That's what CREATE_OR_APPEND is for, to write to an existing index, or create it if it isn't there. Just replace that whole bit with
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);

Searching sentences in PDF using Lucene phrase query and PDFBOX

I have used the following code to search for text in a PDF. It works fine with a single word, but for sentences such as the one in the code, it reports that the text is not present even though it is in the document. Can anyone help me resolve this?
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead:
//Directory directory = FSDirectory.open("/tmp/testindex");
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
PDDocument document = null;
try {
document = PDDocument.load(strFilepath);
}
catch (IOException ex) {
System.out.println("Exception Occured while Loading the document: " + ex);
}
int i =1;
String name = null;
String output=new PDFTextStripper().getText(document);
//String text = "This is the text to be indexed";
doc.add(new Field("contents", output, TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();
// Now search the index
DirectoryReader ireader = DirectoryReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "contents", analyzer);
String sentence = "Following are the";
PhraseQuery query = new PhraseQuery();
String[] words = sentence.split(" ");
for (String word : words) {
query.add(new Term("contents", word));
}
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
if(hits.length>0){
System.out.println("Searched text existed in the PDF.");
}
ireader.close();
directory.close();
}
catch(Exception e){
System.out.println("Exception: "+e.getMessage());
}
}

You should use the query parser to create a query from your sentence instead of building the PhraseQuery yourself. Your hand-built query contains the term "Following", which is not in the index: the standard analyzer lowercases tokens during indexing, so only "following" is indexed.
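The mismatch can be reproduced without Lucene. The sketch below is a toy stand-in, not StandardAnalyzer: the class CaseMismatchSketch and its methods are hypothetical, and the "analyzer" here only splits on whitespace and lowercases. It shows why a raw term taken straight from user input misses, while the same input passed through the same analysis step matches.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy illustration of an index-time/query-time analysis mismatch. Not Lucene code. */
public class CaseMismatchSketch {

    /** Stand-in analyzer (assumption): split on whitespace and lowercase each token. */
    public static Set<String> analyze(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Looks the term up exactly as given, like building a Term by hand. */
    public static boolean containsRawTerm(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    /** Analyzes the input first, which is roughly what a query parser does for you. */
    public static boolean containsAnalyzedTerm(Set<String> indexedTerms, String input) {
        return indexedTerms.containsAll(analyze(input));
    }

    public static void main(String[] args) {
        Set<String> indexed = analyze("Following are the steps");
        System.out.println(containsRawTerm(indexed, "Following"));      // prints false
        System.out.println(containsAnalyzedTerm(indexed, "Following")); // prints true
    }
}
```

The real StandardAnalyzer does more than this (for example tokenization rules and, depending on version and configuration, stop-word removal), which is one more reason to let the query parser apply it rather than splitting the sentence by hand.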

Lucene Full Text search Engine Did you mean feature

How can I implement a "Did you mean" / spellchecker feature in the Lucene full-text search engine?

After you've created the index, you can create the index with the dictionary used by the spell checker using:
public void createSpellChekerIndex() throws CorruptIndexException,
IOException {
final IndexReader reader = IndexReader.open(this.indexDirectory, true);
final Dictionary dictionary = new LuceneDictionary(reader,
LuceneExample.FIELD);
final SpellChecker spellChecker = new SpellChecker(this.spellDirectory);
final Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
final IndexWriterConfig writerConfig = new IndexWriterConfig(
Version.LUCENE_36, analyzer);
spellChecker.indexDictionary(dictionary, writerConfig, true);
spellChecker.close();
}
and then ask for an array of suggestions with:
public String[] getSuggestions(final String queryString,
final int numberOfSuggestions, final float accuracy) {
try {
final SpellChecker spellChecker = new SpellChecker(
this.spellDirectory);
final String[] similarWords = spellChecker.suggestSimilar(
queryString, numberOfSuggestions, accuracy);
return similarWords;
} catch (final Exception e) {
return new String[0];
}
}
Example:
After indexing the following documents:
luceneExample.index("spell checker");
luceneExample.index("did you mean");
luceneExample.index("hello, this is a test");
luceneExample.index("Lucene is great");
And creating the spell index with the method above, I tried to search for the string "lucete", asking for suggestions with:
final String query = "lucete";
final String[] suggestions = luceneExample.getSuggestions(query, 5,
0.2f);
System.out.println("Did you mean:\n" + Arrays.toString(suggestions));
This was the output:
Did you mean:
[lucene]
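To give a feel for why "lucene" is suggested for "lucete", here is a toy ranking in plain Java. It is a sketch, not the SpellChecker implementation: the class SuggestSketch is hypothetical, it assumes a similarity of 1 minus the Levenshtein distance divided by the longer word's length, and it scans the whole word list instead of using an index. Its accuracy threshold is therefore not directly comparable to the 0.2f passed to the real SpellChecker above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy "did you mean" based on normalised edit distance. Not Lucene code. */
public class SuggestSketch {

    /** Classic two-row dynamic-programming Levenshtein distance. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1]; 1 means identical (normalisation is an assumption of this sketch). */
    public static float similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0f : 1.0f - (float) levenshtein(a, b) / longest;
    }

    /** Returns dictionary words, other than the query itself, at or above the accuracy threshold. */
    public static List<String> suggest(String word, List<String> dictionary, float accuracy) {
        List<String> suggestions = new ArrayList<>();
        for (String candidate : dictionary) {
            if (!candidate.equals(word) && similarity(word, candidate) >= accuracy) {
                suggestions.add(candidate);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("spell", "checker", "did", "you", "mean",
                "hello", "this", "is", "a", "test", "lucene", "great");
        System.out.println(suggest("lucete", dictionary, 0.5f)); // prints [lucene]
    }
}
```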

How to use Lucene 4.0 Spatial API?

I cannot find any complete examples of how to use this API. The code below is not giving any results. Any idea why?
static String spatialPrefix = "_point";
static String latField = spatialPrefix + "lat";
static String lngField = spatialPrefix + "lon";
public static void main(String[] args) throws IOException {
SpatialLuceneExample spatial = new SpatialLuceneExample();
spatial.addData();
IndexReader reader = DirectoryReader.open(modules.getDirectory());
IndexSearcher searcher = new IndexSearcher(reader);
searchAndUpdateDocument(38.9510000, -77.4107000, 100.0, searcher,
modules);
}
private void addLocation(IndexWriter writer, String name, double lat,
double lng) throws IOException {
Document doc = new Document();
doc.add(new org.apache.lucene.document.TextField("name", name,
Field.Store.YES));
doc.add(new org.apache.lucene.document.DoubleField(latField, lat,
Field.Store.YES));
doc.add(new org.apache.lucene.document.DoubleField(lngField, lng,
Field.Store.YES));
doc.add(new org.apache.lucene.document.TextField("metafile", "doc",
Field.Store.YES));
writer.addDocument(doc);
System.out.println("===== Added Doc to index ====");
}
private void addData() throws IOException {
IndexWriter writer = modules.getWriter();
addLocation(writer, "McCormick & Schmick's Seafood Restaurant",
38.9579000, -77.3572000);
addLocation(writer, "Jimmy's Old Town Tavern", 38.9690000, -77.3862000);
addLocation(writer, "Ned Devine's", 38.9510000, -77.4107000);
addLocation(writer, "Old Brogue Irish Pub", 38.9955000, -77.2884000);
//...
writer.close();
}
private final static Logger logger = LogManager
.getLogger(SpatialTools.class);
public static void searchAndUpdateDocument(double lo, double la,
double dist, IndexSearcher searcher, LuceneModules modules) {
SpatialContext ctx = SpatialContext.GEO;
SpatialArgs args = new SpatialArgs(SpatialOperation.IsWithin,
ctx.makeCircle(lo, la, DistanceUtils.dist2Degrees(dist,
DistanceUtils.EARTH_MEAN_RADIUS_KM)));
PointVectorStrategy strategy = new PointVectorStrategy(ctx, "_point");
// RecursivePrefixTreeStrategy recursivePrefixTreeStrategy = new
// RecursivePrefixTreeStrategy(grid, fieldName);
// How to use it?
Query makeQueryDistanceScore = strategy.makeQueryDistanceScore(args);
LuceneSearcher instance = LuceneSearcher.getInstance(modules);
instance.getTopResults(makeQueryDistanceScore);
//no results
Filter geoFilter = strategy.makeFilter(args);
try {
Sort chainedSort = new Sort().rewrite(searcher);
TopDocs docs = searcher.search(new MatchAllDocsQuery(), geoFilter,
10000, chainedSort);
logger.debug("search finished, num: " + docs.totalHits);
//no results
for (ScoreDoc scoreDoc : docs.scoreDocs) {
Document doc = searcher.doc(scoreDoc.doc);
double la1 = Double.parseDouble(doc.get(latField));
double lo1 = Double.parseDouble(doc.get(latField));
double distDEG = ctx.getDistCalc().distance(
args.getShape().getCenter(), lo1, la1);
logger.debug("dist deg: : " + distDEG);
double distKM = DistanceUtils.degrees2Dist(distDEG,
DistanceUtils.EARTH_MEAN_RADIUS_KM);
logger.debug("dist km: : " + distKM);
}
} catch (IOException e) {
logger.error("fail to get the search result!", e);
}
}

Did you see the javadocs? These docs in turn point to SpatialExample.java which is what you're looking for. What could I do to make them more obvious?
If you're bent on using a pair of doubles as the internal index approach, then use PointVectorStrategy. However, you'll get superior filter performance if you instead use RecursivePrefixTreeStrategy. Presently, though, PVS does better distance sorting, scalability-wise. You could use both for their respective benefits.
Just looking quickly at your example, I see you didn't use SpatialStrategy.createIndexableFields(). The intention is that you use that.

See the following link for an example: http://mad4search.blogspot.in/2013/06/implementing-geospatial-search-using.html

storing the RDBMS table data through lucene in text file on hard disk

I want to store the result of an RDBMS SQL query (3.2 million records) in a text file using Lucene and then search it.
I saw the example in "how to integrate RAMDirectory into FSDirectory in lucene". I have this piece of code that is working for me:
public class lucetest {
public static void main(String args[]) {
lucetest lucetestObj = new lucetest();
lucetestObj.main1(lucetestObj);
}
public void main1(lucetest lucetestObj) {
final File INDEX_DIR = new File(
"C:\\Documents and Settings\\44444\\workspace\\lucenbase\\bin\\org\\lucenesample\\index");
try {
Connection conn;
Class.forName("com.teradata.jdbc.TeraDriver").newInstance();
conn = DriverManager.getConnection(
"jdbc:teradata://x.x.x.x/CHARSET=UTF16", "aaa", "bbb");
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
// Directory index = new RAMDirectory(); //To use RAM space
Directory index = FSDirectory.open(INDEX_DIR); //To use Hard disk,This will not consume RAM
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
analyzer);
IndexWriter writer = new IndexWriter(index, config);
// IndexWriter writer = new IndexWriter(INDEX_DIR, analyzer, true);
System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
lucetestObj.indexDocs(writer, conn);
writer.optimize();
writer.close();
System.out.println("pepsi");
lucetestObj.searchDocs(index, analyzer, "india");
try {
conn.close();
} catch (SQLException e2) {
// TODO Auto-generated catch block
e2.printStackTrace();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
}
}
void indexDocs(IndexWriter writer, Connection conn) throws Exception {
String sql = "select id, name, color from pet";
String queryy = " SELECT CFMASTERNAME, " + " ULTIMATEPARENTID,"
+ "ULTIMATEPARENT, LONG_NAMEE FROM XCUST_SRCH_SRCH"
+ "sample 100000;";
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(queryy);
int kk = 0;
while (rs.next()) {
Document d = new Document();
d.add(new Field("id", rs.getString("CFMASTERID"), Field.Store.YES,
Field.Index.NO));
d.add(new Field("name", rs.getString("CFMASTERNAME"),
Field.Store.YES, Field.Index.ANALYZED));
d.add(new Field("color", rs.getString("LONG_NAMEE"),
Field.Store.YES, Field.Index.ANALYZED));
writer.addDocument(d);
}
if (rs != null) {
rs.close();
}
}
void searchDocs(Directory index, StandardAnalyzer analyzer,
String searchstring) throws Exception {
String querystr = searchstring.length() > 0 ? searchstring : "lucene";
Query q = new QueryParser(Version.LUCENE_35, "name", analyzer)
.parse(querystr);
int hitsPerPage = 10;
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(
hitsPerPage, true);
searcher.search(q, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println((i + 1) + ".CFMASTERNAME " + d.get("name")
+ " ****LONG_NAMEE**" + d.get("color") + "****ID******"
+ d.get("id"));
}
searcher.close();
}
}
How can I change this code so that, instead of a RAMDirectory, the SQL result table is saved on the hard disk at the specified path? I have not been able to work out a solution. My requirement is that the table data stored on disk through Lucene returns results very fast, which is why I am saving the data to disk as a Lucene index.

Directory index = FSDirectory.open(INDEX_DIR);
You mention saving the sql result to a text file, but that is unnecessary overhead. As you iterate through a ResultSet, save the rows directly to the Lucene index.
As an aside, not that it matters much, but naming your local var (final or otherwise) in all caps is against the convention. Use camelCase. All caps is only for class-level constants (static final members of a class).
