iText - TU Entries in PDF Dictionary

I have the following code in Java. How can I create a PDF dictionary and add TU entries to it? I have instances of PdfReader, AcroFields, and PdfStamper, and the code currently prints out the field names. I'm trying to add TU entries so that I can change the field names.
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.AcroFields;
import com.itextpdf.text.pdf.PdfStamper;
import java.util.Set;
import java.io.FileOutputStream;

public class PDFFile {
    public static final String xfaForm3 = "C:/PDF_Service/pdf-project/src/sample.pdf";
    public static final String dest = "sample2.xml";

    public static void main(String[] args)
    {
        PdfReader reader;
        PdfReader.unethicalreading = true;
        AcroFields form;
        try {
            reader = new PdfReader(xfaForm3);
            PdfStamper stamper2 = new PdfStamper(reader, new FileOutputStream(dest));
            form = stamper2.getAcroFields();
            stamper2.close();
            Set<String> fldNames = form.getFields().keySet();
            for (String fldName : fldNames)
            {
                System.out.println( fldName + " : " + form.getField( fldName ) );
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
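For what it's worth, one approach that might work with iText 5 is to write the TU entry (the "alternate field name" shown as the tooltip) into each field's merged and widget dictionaries through AcroFields.Item, and only close the stamper afterwards. This is a sketch against the setup above, not a tested answer: the display text is just a placeholder, and it assumes additional imports for PdfName, PdfString, and java.util.Map.

AcroFields form = stamper2.getAcroFields();
for (Map.Entry<String, AcroFields.Item> entry : form.getFields().entrySet()) {
    AcroFields.Item item = entry.getValue();
    // TU holds the alternate (display) name that most viewers show as the tooltip.
    item.writeToAll(PdfName.TU, new PdfString("Display name for " + entry.getKey()),
            AcroFields.Item.WRITE_MERGED | AcroFields.Item.WRITE_WIDGET);
    item.markUsed(form, AcroFields.Item.WRITE_WIDGET);
}
stamper2.close(); // the updated dictionaries are written out when the stamper closes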

Related

Is there a way to get the word-count from a Lucene index?

I'm using the latest Apache Lucene version, 8.1.1.
Is it possible to get the word count of all the (non-stop-word) terms stored in a Lucene index, and how? The result should be:
term1 453443
term2 445484
term3 443333
etc.
I need this in Java or Scala but any language will be fine to illustrate the API for it ...
You can find an example implementation below.
Note that this counts the number of documents containing each term, not the number of occurrences (for "lucene" the word count is 4 but the document count is 3). Stop words are also not omitted. A variation that reports occurrence counts is sketched after the output.
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.misc.HighFreqTerms;
import org.apache.lucene.misc.TermStats;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneTest2 {
    final static String index = "index";
    final static String field = "text";

    public static void index() {
        try {
            Directory dir = FSDirectory.open(Paths.get(index));
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
            iwc.setOpenMode(OpenMode.CREATE);
            IndexWriter writer = new IndexWriter(dir, iwc);
            String[] lines = {
                "lucene java lucene mark",
                "lucene",
                "lucene an example",
                "java python"
            };
            for (int i = 0; i < lines.length; i++) {
                String line = lines[i];
                Document doc = new Document();
                doc.add(new StringField("id", "" + i, Field.Store.YES));
                doc.add(new TextField(field, line.trim(), Field.Store.YES));
                writer.addDocument(doc);
            }
            System.out.println("indexed " + lines.length + " sentences");
            writer.close();
        } catch (IOException e) {
            System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        }
    }

    public static void count() {
        try {
            IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
            int numTerms = 100;
            TermStats[] stats = HighFreqTerms.getHighFreqTerms(reader, numTerms, field, new HighFreqTerms.DocFreqComparator());
            for (TermStats termStats : stats) {
                String termText = termStats.termtext.utf8ToString();
                System.out.println(termText + " " + termStats.docFreq);
            }
            reader.close();
        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        index();
        count();
    }
}
This outputs:
lucene 3
java 2
python 1
mark 1
example 1
an 1
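If you need occurrence counts rather than document counts, a small variation of count() (call it countOccurrences, a sketch against the same index and field) could rank by total term frequency using the misc module's TotalTermFreqComparator:

public static void countOccurrences() {
    try {
        IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
        // Rank terms by how often they occur in total; TermStats.totalTermFreq carries that number.
        TermStats[] stats = HighFreqTerms.getHighFreqTerms(reader, 100, field,
                new HighFreqTerms.TotalTermFreqComparator());
        for (TermStats termStats : stats) {
            System.out.println(termStats.termtext.utf8ToString() + " " + termStats.totalTermFreq);
        }
        reader.close();
    } catch (Exception e) {
        System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
    }
}

Dropping stop words would have to happen at index time, for example by constructing the StandardAnalyzer with a stop-word set such as EnglishAnalyzer.ENGLISH_STOP_WORDS_SET.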

Lucene Document and Java ArrayList

I have a set of data: an ID, a name, and an ArrayList associated with the ID. I want this data to be stored in a Lucene document. The search will be based on the ID and name; the list should not be indexed.
I do not know how a List/ArrayList can be stored in a Lucene Document. What is the best way of doing this?
The Apache Lucene version I am using is 7.1.0.
Thanks.
Document doc = new Document();
String id = "something";
String name = "someName";
List someList = new ArrayList();
doc.add(new StringField("Id", id, Field.Store.YES));
doc.add(new TextField("name", name, Field.Store.YES));
How do I do something similar for the 'someList'?
You can use StoredField.
A simple approach might be to serialize the list. Here is some sample code:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Random;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneTest {
    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(200000);
        ObjectOutputStream os = new ObjectOutputStream(bos);
        os.writeObject(obj);
        os.close();
        return bos.toByteArray();
    }

    public static Object unserialize(byte[] bytes) throws ClassNotFoundException, IOException {
        ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(bytes));
        Object result = is.readObject();
        is.close();
        return result;
    }

    public static void index() {
        String indexPath = "index";
        try {
            System.out.println("Indexing to directory '" + indexPath + "'...");
            Directory dir = FSDirectory.open(Paths.get(indexPath));
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
            iwc.setOpenMode(OpenMode.CREATE);
            IndexWriter writer = new IndexWriter(dir, iwc);
            for (int i = 0; i < 100; i++) {
                Document doc = new Document();
                String id = "id" + i;
                String name = "jack " + i;
                ArrayList<Object> someList = new ArrayList<>();
                someList.add(id + " => " + name);
                someList.add("index" + i);
                doc.add(new StringField("Id", id, Field.Store.YES));
                doc.add(new TextField("name", name, Field.Store.YES));
                doc.add(new StoredField("list", serialize(someList)));
                writer.addDocument(doc);
            }
            writer.close();
        } catch (IOException e) {
            System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        }
    }

    public static void search() {
        String index = "index";
        String field = "name";
        int hitsPerPage = 10;
        try {
            IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(index)));
            IndexSearcher searcher = new IndexSearcher(reader);
            Analyzer analyzer = new StandardAnalyzer();
            QueryParser parser = new QueryParser(field, analyzer);
            Random rnd = new Random();
            for (int i = 0; i < 10; i++) {
                int randIndex = rnd.nextInt(100);
                String nameQuery = "" + randIndex;
                Query query = parser.parse(nameQuery);
                System.out.println("Searching for: " + query.toString(field));
                TopDocs results = searcher.search(query, hitsPerPage);
                ScoreDoc[] hits = results.scoreDocs;
                for (ScoreDoc scoreDoc : hits) {
                    Document doc = searcher.doc(scoreDoc.doc);
                    String id = doc.get("Id");
                    String name = doc.get("name");
                    ArrayList<Object> list = (ArrayList<Object>) unserialize(doc.getBinaryValue("list").bytes);
                    System.out.println("id: " + id + " name: " + name + " list: " + list);
                }
                System.out.println();
            }
            reader.close();
        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass() + "\n with message: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        index();
        search();
    }
}
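If the list only ever holds strings, an alternative to Java serialization is to add one StoredField per element under the same name and read them back with Document.getValues. A minimal sketch, reusing the "list" field name and the someList/doc/searcher variables from the sample above:

// Indexing: one stored-only field per list element (stored values keep their insertion order).
for (Object element : someList) {
    doc.add(new StoredField("list", element.toString()));
}

// Retrieval: all stored values for the field come back as a String[].
String[] storedList = searcher.doc(scoreDoc.doc).getValues("list");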

SQL Query to JTable NullPointerException

This runs perfectly:
package sledmonitor;

import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GetList3 {
    private Connection conn;
    private String ConnectURL = "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=." + File.separator + "db" + File.separator + "sledprod.mdb";
    private String user = "";
    private String pw = "";
    String offer;

    GetList3(String sql) {
        try {
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
            conn = DriverManager.getConnection(ConnectURL, user, pw);
            Statement stmt = conn.createStatement();
            System.out.println(sql);
            ResultSet rs = stmt.executeQuery(sql);
            while (rs.next()) {
                offer = rs.getString("offernumber");
                System.out.println(offer);
            }
            rs.close();
            stmt.close();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
        finally {
            if (conn != null) {
                try {
                    conn.close();
                }
                catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
Now I would like to show the results in a JTable, so I wrote this, which should normally work:
while (rs.next()) {
    offer = rs.getString("offernumber");
    List.tablemodel.addRow(new String[]{offer});
}
But this somehow throws the following NullPointerException:
SELECT o.offernumber, o.TotalMPL
FROM sled_offers o left join sled_changes c on o.offerguid = c.objectid
WHERE (c.nextsequencenr is null or c.nextsequencenr=99999) AND TotalMPL='0'
AND (o.offerstate=2 or o.offerstate=4 or o.offerstate=5)
java.lang.NullPointerException
at sledmonitor.GetList3.<init>(GetList3.java:28)
at sledmonitor.GetList.<init>(GetList.java:45)
at sledmonitor.Monitor$4.mouseClicked(Monitor.java:99)
at java.awt.AWTEventMulticaster.mouseClicked(AWTEventMulticaster.java:270)
at java.awt.Component.processMouseEvent(Component.java:6508)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3321)
at java.awt.Component.processEvent(Component.java:6270)
at java.awt.Container.processEvent(Container.java:2229)
at java.awt.Component.dispatchEventImpl(Component.java:4861)
at java.awt.Container.dispatchEventImpl(Container.java:2287)
at java.awt.Component.dispatchEvent(Component.java:4687)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4832)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4501)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4422)
at java.awt.Container.dispatchEventImpl(Container.java:2273)
at java.awt.Window.dispatchEventImpl(Window.java:2719)
at java.awt.Component.dispatchEvent(Component.java:4687)
at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:703)
at java.awt.EventQueue.access$000(EventQueue.java:102)
at java.awt.EventQueue$3.run(EventQueue.java:662)
at java.awt.EventQueue$3.run(EventQueue.java:660)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:87)
at java.awt.EventQueue$4.run(EventQueue.java:676)
at java.awt.EventQueue$4.run(EventQueue.java:674)
at java.security.AccessController.doPrivileged(Native Method)
at java.security.ProtectionDomain$1.doIntersectionPrivilege(ProtectionDomain.java:76)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:673)
at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:244)
at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:163)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:151)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:147)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:139)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:97)
The class List looks like this:
package sledmonitor;

import java.awt.BorderLayout;
import java.awt.EventQueue;
import javax.swing.JFrame;
import javax.swing.JPanel;
import javax.swing.border.EmptyBorder;
import javax.swing.table.DefaultTableModel;
import javax.swing.JScrollPane;
import javax.swing.JTable;

public class List extends JFrame {
    private JPanel contentPane;
    private JTable table;
    static DefaultTableModel tablemodel;

    /**
     * Launch the application.
     */
    public static void main(String[] args) {
        EventQueue.invokeLater(new Runnable() {
            public void run() {
                try {
                    List frame = new List();
                    frame.setVisible(true);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }

    /**
     * Create the frame.
     */
    public List() {
        setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        setBounds(100, 100, 450, 300);
        contentPane = new JPanel();
        contentPane.setBorder(new EmptyBorder(5, 5, 5, 5));
        contentPane.setLayout(new BorderLayout(0, 0));
        setContentPane(contentPane);
        JScrollPane scrollPane = new JScrollPane();
        contentPane.add(scrollPane, BorderLayout.CENTER);
        tablemodel = new DefaultTableModel();
        tablemodel.addColumn("OfferNumber");
        table = new JTable(tablemodel);
        scrollPane.setViewportView(table);
    }
}
Any idea?
You seem to be accessing the static field List.tablemodel without first instantiating a List object, so the List() constructor that creates your tablemodel never runs. You could use a static initializer block instead:
static {
    tablemodel = new DefaultTableModel();
    tablemodel.addColumn("OfferNumber");
}
Then remove those two lines from the List() constructor.
Use this:
DefaultTableModel myData = new DefaultTableModel();
JTable table = new JTable(myData); // initialize the table with the model-type constructor
while (rs.next()) {
    String offer = rs.getString("offernumber");
    Object[] rowData = new Object[]{offer, null};
    myData.addRow(rowData); // use the addRow(Object[] rowData) method of DefaultTableModel
}
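Another option is to avoid the static field altogether: create the List frame first and pass its DefaultTableModel into GetList3, which guarantees the model exists before the query runs. The getTableModel() accessor and the two-argument constructor below are hypothetical additions, sketched only to show the shape of the change:

// Caller side: the frame, and therefore the model, exists before the query runs.
List frame = new List();
frame.setVisible(true);
new GetList3(sql, frame.getTableModel()); // getTableModel() would simply return tablemodel

// GetList3 side: take the model as a parameter and fill it inside the result loop.
GetList3(String sql, DefaultTableModel tablemodel) {
    // same JDBC code as above, but the loop becomes:
    // while (rs.next()) {
    //     tablemodel.addRow(new String[]{rs.getString("offernumber")});
    // }
}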

How do I get all the fields and value from a PDF file using iText?

Using the code below, I'm trying to get all the fields and their values, but it's only returning the field values. What do I need to do to get both?
package PrintFiled;

/*
 Usage: java TestDocument <pdf file>
 */
import java.io.File;
import java.io.FileOutputStream;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Iterator;
import java.util.Map;
import org.pdfbox.pdmodel.interactive.form.PDField;
import com.lowagie.text.pdf.AcroFields;
import com.lowagie.text.pdf.MultiColumnText;
import com.lowagie.text.pdf.PdfDictionary;
import com.lowagie.text.pdf.RadioCheckField;
import com.lowagie.text.pdf.PdfSignature;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfStamper;
import com.lowagie.text.pdf.AcroFields.*;

public class TestDocument
{
    public static void main(String[] args) throws Exception
    {
        // get the destination
        String location = "test/";
        if (location != null && !location.endsWith(File.separator))
        {
            location = location + File.separator;
        }
        PdfReader reader = new PdfReader(location + "acc.pdf");
        String name = "Output.pdf";
        PdfStamper stamp = new PdfStamper(reader, new FileOutputStream(location + name));
        AcroFields form = stamp.getAcroFields();
        String last = "subform[0].LastName[0]";
        String first = "subform[0].FirstName[0]";
        String ProcessDate = "subform[0].ProcessDate[0]";
        form.setField(last, "HRISTOV");
        form.setField(first, "NEDKO");
        form.setField(ProcessDate, "Process");
        System.out.println("last Name :" + last.toString());
        Map val = new HashMap();
        //val = form.getFieldCache();
        //System.out.println("Value :" + val);
        Iterator it = val.entrySet().iterator();
        while (it.hasNext())
        {
            Map.Entry pairs = (Map.Entry) it.next();
            System.out.println("Key :" + pairs.getKey() + " = " + "Value :" + pairs.getValue());
        }
        Collection fieldlist = form.getFields().keySet();
        // for (Iterator i = val.iterator(); i.hasNext(); )
        for (Iterator i = fieldlist.iterator(); i.hasNext(); )
        {
            System.out.println(i.next().toString());
            System.out.println("Value :" + val);
        }
        /*List fields = form.getFields();
        Iterator fieldsIter = fields.iterator();
        System.out.println(new Integer(fields.size()).toString()
            + " top-level fields were found on the form");
        while (fieldsIter.hasNext()) {
            PDField field = (PDField) fieldsIter.next();
            // processField(field, "|--", field.getPartialName());
            System.out.println("Field :" + fieldsIter);
        }
        */
        System.out.println("First Name:" + form.getField(first));
        System.out.println("LastName :" + form.getField(last));
        System.out.println("ProcessDate :" + form.getField(ProcessDate));
        // close pdf stamper
        stamp.setFormFlattening(true);
        stamp.close();
        reader.close();
    }
}
To get all the fields and their values with iText:
// you only need a PdfStamper if you're going to change the existing PDF.
PdfReader reader = new PdfReader( pdfPath );
AcroFields fields = reader.getAcroFields();
Set<String> fldNames = fields.getFields().keySet();
for (String fldName : fldNames) {
    System.out.println( fldName + ": " + fields.getField( fldName ) );
}
Beyond that, it would help if we could see a PDF that is giving you trouble.
Here's the answer updated for iText version 7.0.2:
PdfReader reader = new PdfReader( pdfPath );
PdfDocument document = new PdfDocument(reader);
PdfAcroForm acroForm = PdfAcroForm.getAcroForm(document, false);
Map<String,PdfFormField> fields = acroForm.getFormFields();
for (String fldName : fields.keySet()) {
    System.out.println( fldName + ": " + fields.get( fldName ).getValueAsString() );
}
document.close();
reader.close();
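For reference, that snippet relies on the iText 7 package layout, so the imports should look roughly like this:

import java.util.Map;
import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;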

Lucene ,highlighting and NullPointerException

I am trying to highlight some results. I index the body (the text) of my documents in the field "contents", and when I try to highlight using highlighter.getBestFragment(...) I get a NullPointerException.
But when, for example, I try to highlight the fileName, it works properly.
I know that since I feed the "contents" field from a Reader (ParsingReader), the text is tokenized, which is different from a file name.
Here's my code, please help me.
package xxxxxx;

import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.parser.ParsingReader;

public class Indexer {
    static long start = 0;

    public static void main(String[] args) throws Exception {
        System.out.println("l'index se trouve à " + args[0]);
        System.out.println("le dossier ou s'effectue l'indexation est :" + args[1]);
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java " + Indexer.class.getName()
                + " <index dir> <data dir>");
        }
        String indexDir = args[0];
        String dataDir = args[1];
        start = System.currentTimeMillis();
        Indexer indexer = new Indexer(indexDir);
        int numIndexed;
        try {
            numIndexed = indexer.index(dataDir, new TextFilesFilter());
        } finally {
            indexer.close();
        }
        long end = System.currentTimeMillis();
        System.out.println("Indexing " + numIndexed + " files took "
            + (end - start) + " milliseconds");
    }

    private IndexWriter writer;

    public Indexer(String indexDir) throws IOException, InterruptedException {
        Directory dir = FSDirectory.open(new File(indexDir));
        writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true,
            IndexWriter.MaxFieldLength.UNLIMITED);
        writer.setUseCompoundFile(true);
    }

    public void close() throws IOException {
        writer.optimize();
        writer.close();
    }

    public int index(String dataDir, FileFilter filter) throws Exception {
        File[] files = new File(dataDir).listFiles();
        for (File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead() && (filter == null || filter.accept(f))) {
                if (!(f.getCanonicalPath().endsWith("~"))) {
                    indexFile(f);
                }
            } else {
                index(f.toString(), filter);
            }
        }
        return writer.numDocs();
    }

    private static class TextFilesFilter implements FileFilter {
        public boolean accept(File path) {
            return true;
        }
    }

    protected Document getDocument(File f) throws Exception {
        // FileReader frf = new FileReader(f);
        Document doc = new Document();
        Reader reader = new ParsingReader(f);
        doc.add(new Field("contents", reader, Field.TermVector.WITH_POSITIONS_OFFSETS));
        doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.ANALYZED ));
        doc.add(new Field("fullpath", f.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
        return doc;
    }

    private void indexFile(File f) throws Exception {
        System.out.println("Indexing " + f.getCanonicalPath());
        Document doc = getDocument(f);
        writer.addDocument(doc);
        System.out.println(System.currentTimeMillis() - start);
    }
}
-------------------------------------------------------------------
package xxxxxxxxxxxxxxxxxxxx;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
import org.apache.lucene.search.highlight.TokenSources;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
    public static void main(String[] args) throws IllegalArgumentException,
            IOException, ParseException, InvalidTokenOffsetsException {
        System.out.println("endroit ou se situe l'index " + args[0]);
        System.out.println(args[1]);
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java "
                + Searcher.class.getName()
                + " <index dir> <query>");
        }
        String indexDir = args[0];
        String q = args[1];
        search(indexDir, q);
    }

    public static void search(String indexDir, String q) throws IOException, ParseException, InvalidTokenOffsetsException {
        Directory dir = FSDirectory.open(new File(indexDir));
        IndexSearcher indexSearcher = new IndexSearcher(dir);
        QueryParser parserC = new QueryParser(Version.LUCENE_30, "contents", new StandardAnalyzer(Version.LUCENE_30));
        // QueryParser parserN = new QueryParser(Version.LUCENE_30, "filename", new StandardAnalyzer(Version.LUCENE_30));
        QueryParser parserP = new QueryParser(Version.LUCENE_30, "fullpath", new StandardAnalyzer(Version.LUCENE_30));
        parserC.setDefaultOperator(QueryParser.Operator.OR);
        // parserN.setDefaultOperator(QueryParser.Operator.OR);
        parserC.setPhraseSlop(10);
        // parserN.setPhraseSlop(10);
        DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(6);
        Query query = new MultiFieldQueryParser(Version.LUCENE_30, new String[]{"contents", "filename"},
            new CustomAnalyzer()).parse(q);
        Query queryC = parserC.parse(q);
        // Query queryN = parserN.parse(q);
        dmq.add(queryC);
        // dmq.add(queryN);
        // dmq.add(query);
        QueryScorer scorer = new QueryScorer(dmq, "contents");
        Highlighter highlighter = new Highlighter(scorer);
        highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));
        System.out.println(query.toString());
        long start = System.currentTimeMillis();
        TopDocs hits = indexSearcher.search(dmq, 15);
        System.out.println(hits.totalHits);
        long end = System.currentTimeMillis();
        System.err.println("Found " + hits.totalHits
            + " document(s) (in " + (end - start)
            + " milliseconds) that matched query '"
            + q + "':");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            Document doc = indexSearcher.doc(scoreDoc.doc);
            System.out.print(scoreDoc.score);
            System.out.println(doc.get("fullpath"));
            String contents = doc.get("contents"); // I am pretty sure the mistake is here, contents is always null
            // But what can I do to make this thing work?
            TokenStream stream =
                TokenSources.getAnyTokenStream(indexSearcher.getIndexReader(),
                    scoreDoc.doc,
                    "contents",
                    doc,
                    new StandardAnalyzer(Version.LUCENE_30));
            String fragment =
                highlighter.getBestFragment(stream, contents);
            System.out.println(fragment);
        }
        indexSearcher.close();
    }
}
----------------------------------------------------------------------
The field needs to be stored if you want to retrieve its text for the highlighter. "filename" is stored but "contents" isn't, which is why you see them behaving differently:
doc.add(new Field("contents", reader, Field.TermVector.WITH_POSITIONS_OFFSETS));
doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.ANALYZED ));
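A minimal sketch of the fix in getDocument, assuming the extracted text is first read into a String (for example via Tika's Tika.parseToString facade) so that it can be both analyzed and stored:

// Read the body into a String so doc.get("contents") is non-null at highlight time.
String text = new org.apache.tika.Tika().parseToString(f);
doc.add(new Field("contents", text, Field.Store.YES, Field.Index.ANALYZED,
        Field.TermVector.WITH_POSITIONS_OFFSETS));
doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.ANALYZED));

Storing full document bodies makes the index larger; the alternative is to keep "contents" unstored and re-read the file text at search time, passing that string to getBestFragment instead of doc.get("contents").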