I'm working with the Lucene library. I want to index some documents and generate TermVectors for them. I've written an Indexer class to create the fields of the index, but this code returns an empty field.
My index class is:
public class Indexer {
private static File sourceDirectory;
private static File indexDirectory;
private String fieldtitle,fieldbody;
public Indexer() {
this.sourceDirectory = new File(LuceneConstants.dataDir);
this.indexDirectory = new File(LuceneConstants.indexDir);
fieldtitle = LuceneConstants.CONTENTS1;
fieldbody= LuceneConstants.CONTENTS2;
}
public void index() throws CorruptIndexException,
LockObtainFailedException, IOException {
Directory dir = FSDirectory.open(indexDirectory.toPath());
Analyzer analyzer = new StandardAnalyzer(StandardAnalyzer.STOP_WORDS_SET); // using stop words
IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
if (indexDirectory.exists()) {
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
} else {
// Add new documents to an existing index:
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
}
IndexWriter writer = new IndexWriter(dir, iwc);
for (File f : sourceDirectory.listFiles()) {
Document doc = new Document();
String[] linetext=getAllText(f);
String title=linetext[1];
String body=linetext[2];
doc.add(new Field(fieldtitle, title, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
doc.add(new Field(fieldbody, body, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
writer.addDocument(doc);
}
writer.close();
}
public String[] getAllText(File f) throws FileNotFoundException, IOException {
String textFileContent = "";
String[] ar = null;
try {
BufferedReader in = new BufferedReader(new FileReader(f));
for (String str : Files.readAllLines(Paths.get(f.getAbsolutePath()))) {
textFileContent += str;
ar=textFileContent.split("--");
}
in.close();
} catch (IOException e) {
System.out.println("File Read Error");
}
return ar;
}
}
The debugger output is:
doc Document #534
fields ArrayList "size=0"
Static
linetext String[] #535(length=4)
title String "how ...."
body String "I created ...."
I also get another error in debugging:
Non-static method "toString" cannot be referenced from a static context.
This error occurs for filepath.
Sounds like you've got an empty file, or are running into an IOException. See this part of your code:
String[] ar = null;
try {
//Do Stuff
} catch (IOException e) {
System.out.println("File Read Error");
}
return ar;
On an IOException you only print a message and then return null, which all but guarantees a NullPointerException as soon as the caller indexes into the array. You need to decide how to handle an IOException, and also what to do if getAllText returns an array of length 1 or 2.
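One way to make both failure modes explicit is sketched below. This is a minimal sketch rather than the asker's code: the class name SectionReader, the method readSections, and the minSections parameter are hypothetical, and it assumes the same "--" delimiter used in the question. Instead of swallowing the IOException and returning null, it lets the exception propagate and rejects files that split into too few parts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: the names are illustrative, not part of Lucene or the question's code.
class SectionReader {

    // Reads the whole file, splits it on "--", and fails loudly if there are
    // fewer than minSections parts, so callers never index into a short or null array.
    static String[] readSections(Path file, int minSections) throws IOException {
        // Any IOException propagates to the caller instead of being swallowed.
        // Unlike the question's loop, this keeps line breaks inside each section.
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String[] parts = content.split("--");
        if (parts.length < minSections) {
            throw new IOException("Expected at least " + minSections
                    + " sections in " + file + " but found " + parts.length);
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sections", ".txt");
        Files.write(tmp, "id--some title--some body".getBytes(StandardCharsets.UTF_8));
        String[] parts = readSections(tmp, 3);
        System.out.println("title = " + parts[1] + ", body = " + parts[2]);
        Files.delete(tmp);
    }
}
```

The indexing loop can then catch the IOException per file and skip or log that file, rather than adding a document built from missing data.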
Also, not the issue you are currently running into, but this is almost certainly backwards:
if (indexDirectory.exists()) {
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
} else {
// Add new documents to an existing index:
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
}
And there really isn't a need for it at all, anyway. That's what CREATE_OR_APPEND is for, to write to an existing index, or create it if it isn't there. Just replace that whole bit with
iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);
Related
My goal is to programmatically create a .doc file on Google Drive with text and generated shapes.
My first problem, which I cannot overcome, is that when I create a .doc file, it is generated and saved on Google Drive but cannot be opened.
Before, when I created plain text files, everything was OK.
private void createFile() {
final Task<DriveFolder> rootFolderTask = getDriveResourceClient().getRootFolder();
final Task<DriveContents> createContentsTask = getDriveResourceClient().createContents();
Tasks.whenAll(rootFolderTask, createContentsTask)
.continueWithTask(new Continuation<Void, Task<DriveFile>>() {
@Override
public Task<DriveFile> then(@NonNull Task<Void> task) throws Exception {
DriveFolder parent = rootFolderTask.getResult();
final Task<MetadataBuffer> Createlist = getDriveResourceClient().listChildren(parent);
DriveContents contents = createContentsTask.getResult();
OutputStream outputStream = contents.getOutputStream();
try (Writer writer = new OutputStreamWriter(outputStream)) {
writer.write("Witaj z nuclearhelperaa");
}
MetadataChangeSet changeSet = new MetadataChangeSet.Builder()
.setTitle("test6")
// .setMimeType("text/plain")
.setMimeType("application/msword")
.setStarred(true)
.build();
return getDriveResourceClient().createFile(parent, changeSet, contents);
}
})
.addOnSuccessListener(this,
new OnSuccessListener<DriveFile>() {
@Override
public void onSuccess(DriveFile driveFile) {
showMessage(getString(R.string.file_created,
driveFile.getDriveId().encodeToString()));
finish();
}
})
.addOnFailureListener(this, new OnFailureListener() {
@Override
public void onFailure(@NonNull Exception e) {
Log.e(TAG, "Unable to create file", e);
showMessage(getString(R.string.file_create_error));
finish();
}
});
// [END create_file]
}
Is my task possible? That is, after creating the doc, can I add images, formatted text, and generated shapes (I need to create a very simple picture containing a couple of dots)? If so, do you know any good libraries for this?
I am using Lucene 3.6. I want to know why update does not work. Is there anything wrong?
public class TokenTest
{
private static String IndexPath = "D:\\update\\index";
private static Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);
public static void main(String[] args) throws Exception
{
try
{
update();
display("content", "content");
}
catch (IOException e)
{
e.printStackTrace();
}
}
@SuppressWarnings("deprecation")
public static void display(String keyField, String words) throws Exception
{
IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(IndexPath)));
Term term = new Term(keyField, words);
Query query = new TermQuery(term);
TopDocs results = searcher.search(query, 100);
ScoreDoc[] hits = results.scoreDocs;
for (ScoreDoc hit : hits)
{
Document doc = searcher.doc(hit.doc);
System.out.println("doc_id = " + hit.doc);
System.out.println("content: " + doc.get("content"));
System.out.println("path: " + doc.get("path"));
}
}
public static String update() throws Exception
{
IndexWriterConfig writeConfig = new IndexWriterConfig(Version.LUCENE_33, analyzer);
IndexWriter writer = new IndexWriter(FSDirectory.open(new File(IndexPath)), writeConfig);
Document document = new Document();
Field field_name2 = new Field("path", "update_path", Field.Store.YES, Field.Index.ANALYZED);
Field field_content2 = new Field("content", "content update", Field.Store.YES, Field.Index.ANALYZED);
document.add(field_name2);
document.add(field_content2);
Term term = new Term("path", "qqqqq");
writer.updateDocument(term, document);
writer.optimize();
writer.close();
return "update_path";
}
}
I assume you want to update your document such that field "path" = "qqqq". You have this exactly backwards (please read the documentation).
updateDocument performs two steps:
1. Find and delete any documents containing the term. In this case, none are found, because your indexed documents do not contain path:qqqq.
2. Add the new document to the index.
You appear to be doing the opposite: trying to look up by document and then add the term to it. It doesn't work that way. What you are looking for, I believe, is something like:
Term term = new Term("content", "update");
document.removeField("path");
document.add(new Field("path", "qqqq", Field.Store.YES, Field.Index.ANALYZED));
writer.updateDocument(term, document);
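To make the delete-then-add behaviour concrete, here is a toy in-memory model. It is not Lucene: the ToyIndex class and its methods are hypothetical, documents are plain maps, and a term is assumed to match when the field's whitespace-separated, lowercased tokens contain it. It only illustrates why updating by a term that matches nothing simply appends the new document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an index; NOT the Lucene API. All names are hypothetical.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Mirrors the two steps described above:
    // 1) delete every document whose field contains the term, 2) add the new document.
    // Returns how many documents were deleted in step 1.
    int updateDocument(String field, String term, Map<String, String> newDoc) {
        int before = docs.size();
        docs.removeIf(d -> containsTerm(d.get(field), term));
        int deleted = before - docs.size();
        docs.add(newDoc);
        return deleted;
    }

    int size() {
        return docs.size();
    }

    // Simplified matching: lowercase the value and compare whole whitespace-separated tokens.
    private static boolean containsTerm(String fieldValue, String term) {
        if (fieldValue == null) {
            return false;
        }
        return Arrays.asList(fieldValue.toLowerCase().split("\\s+")).contains(term);
    }

    static Map<String, String> doc(String path, String content) {
        Map<String, String> d = new HashMap<>();
        d.put("path", path);
        d.put("content", content);
        return d;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument(doc("original_path", "content update"));

        // No document has path:qqqq, so nothing is deleted and the index just grows.
        int deleted = index.updateDocument("path", "qqqq", doc("update_path", "content update"));
        System.out.println("deleted=" + deleted + " size=" + index.size());

        // Every document containing content:update is removed before the new one is added.
        deleted = index.updateDocument("content", "update", doc("qqqq", "content update"));
        System.out.println("deleted=" + deleted + " size=" + index.size());
    }
}
```

In this model, as in the explanation above, the term only selects which existing documents to delete; the replacement document is stored exactly as passed in.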
In my app, documents represent the data for each category, and the application automatically indexes new and modified documents.
If I index all documents in one category, it works fine and retrieves correct results. The problem is that if I modify a document or create a new one, it is not retrieved even when it matches my search query.
The search usually returns all docs except the last modified one.
Any help, please?
I have this IndexWriter config:
private IndexWriter getIndexWriter() throws IOException {
Directory directory = FSDirectory.open(new File(filepath));
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_43, IndexFactory.ANALYZER);
config.setRAMBufferSizeMB(350);
TieredMergePolicy tmp = new TieredMergePolicy();
tmp.setUseCompoundFile(false);
config.setMergePolicy(tmp);
ConcurrentMergeScheduler scheduler = (ConcurrentMergeScheduler) config.getMergeScheduler();
scheduler.setMaxThreadCount(2);
scheduler.setMaxMergeCount(20);
IndexWriter writer = new IndexWriter(directory, config);
writer.forceMerge(1);
return writer;
}
My Collector:
public void collect(int docNum) throws IOException {
try {
if ((getCount() == getMaxSearchLimit() + 1) && getMaxSearchResults() != null) {
setCounterExceededLimit(true);
return;
}
addDocKey();// method to add and render the matching docs by customize way
} catch(IOException exp) {
if (!getErrors().toArrayList(getApplication().getLocale()).contains(exp.getMessage())) {
getErrors().addError(exp.getMessage());
}
} catch (BusinessException bEx) {
if (!getErrors().containsError(bEx.getErrorNumber())) {
getErrors().addError(bEx);
}
} catch (CounterExceededLimitException counterEx) {
return;
}
}
@Override
public boolean acceptsDocsOutOfOrder() {
// TODO Auto-generated method stub
return true;
}
@Override
public void setNextReader(AtomicReaderContext context) throws IOException {
// TODO Auto-generated method stub
}
@Override
public void setScorer(Scorer scorer) throws IOException {
// TODO Auto-generated method stub
}
Actually, I have this business logic to save my doc; then, if the doc was saved successfully, I add it to the index process.
public boolean saveDocument(CategoryDocument doc) {
boolean saved = false;
// code to save my doc
if(saved) {
//add this document to the index process
IndexManager.getInstance().addToIndex(this);
}
return saved;
}
Then my index manager creates a new thread to handle indexing this doc.
Here is my process for indexing my data document:
private void processDocument(IndexDocument indexDoc, DocKey docKey, boolean addToIndex) throws SearchException, BusinessException {
CategorySetting catSetting = docKey.getCategorySetting();
Integer catID = catSetting.getID();
IndexManager manager = IndexManager.getInstance();
IndexWriter writer = null;
try {
//Delete the lock file in case previous index operation failed to delete it
File lockFile = new File(filepath, IndexWriter.WRITE_LOCK_NAME);
if (lockFile != null && lockFile.exists()) {
lockFile.delete();
}
if(!manager.isGlobalIndexingProcess(catID)) {
writer = getIndexWriter();
} else {
writer = manager.getGlobalIndexWriter(catID);
}
writer.forceMerge(1);
removeDocument(docKey, writer);
if (addToIndex) {
writer.addDocument(indexDoc.getLuceneIndexDoc());
}
} catch(IOException exp) {
throw new SearchException(exp.getMessage(), true);
} finally {
if(!manager.isGlobalIndexingProcess(catID)) {
if (writer != null) {
try {
writer.close(true);
} catch(IOException ex) {
throw new SearchException(ex);
}
}
}
}
}
Use a Lucene search for a word or phrase that you edited in the document, and let us know whether you get the correct hits. If you don't get any hits, then you are probably not indexing the edited or newly added documents.
I cannot find any complete examples of how to use this API. The code below is not giving any results. Any idea why?
static String spatialPrefix = "_point";
static String latField = spatialPrefix + "lat";
static String lngField = spatialPrefix + "lon";
public static void main(String[] args) throws IOException {
SpatialLuceneExample spatial = new SpatialLuceneExample();
spatial.addData();
IndexReader reader = DirectoryReader.open(modules.getDirectory());
IndexSearcher searcher = new IndexSearcher(reader);
searchAndUpdateDocument(38.9510000, -77.4107000, 100.0, searcher,
modules);
}
private void addLocation(IndexWriter writer, String name, double lat,
double lng) throws IOException {
Document doc = new Document();
doc.add(new org.apache.lucene.document.TextField("name", name,
Field.Store.YES));
doc.add(new org.apache.lucene.document.DoubleField(latField, lat,
Field.Store.YES));
doc.add(new org.apache.lucene.document.DoubleField(lngField, lng,
Field.Store.YES));
doc.add(new org.apache.lucene.document.TextField("metafile", "doc",
Field.Store.YES));
writer.addDocument(doc);
System.out.println("===== Added Doc to index ====");
}
private void addData() throws IOException {
IndexWriter writer = modules.getWriter();
addLocation(writer, "McCormick & Schmick's Seafood Restaurant",
38.9579000, -77.3572000);
addLocation(writer, "Jimmy's Old Town Tavern", 38.9690000, -77.3862000);
addLocation(writer, "Ned Devine's", 38.9510000, -77.4107000);
addLocation(writer, "Old Brogue Irish Pub", 38.9955000, -77.2884000);
//...
writer.close();
}
private final static Logger logger = LogManager
.getLogger(SpatialTools.class);
public static void searchAndUpdateDocument(double lo, double la,
double dist, IndexSearcher searcher, LuceneModules modules) {
SpatialContext ctx = SpatialContext.GEO;
SpatialArgs args = new SpatialArgs(SpatialOperation.IsWithin,
ctx.makeCircle(lo, la, DistanceUtils.dist2Degrees(dist,
DistanceUtils.EARTH_MEAN_RADIUS_KM)));
PointVectorStrategy strategy = new PointVectorStrategy(ctx, "_point");
// RecursivePrefixTreeStrategy recursivePrefixTreeStrategy = new
// RecursivePrefixTreeStrategy(grid, fieldName);
// How to use it?
Query makeQueryDistanceScore = strategy.makeQueryDistanceScore(args);
LuceneSearcher instance = LuceneSearcher.getInstance(modules);
instance.getTopResults(makeQueryDistanceScore);
//no results
Filter geoFilter = strategy.makeFilter(args);
try {
Sort chainedSort = new Sort().rewrite(searcher);
TopDocs docs = searcher.search(new MatchAllDocsQuery(), geoFilter,
10000, chainedSort);
logger.debug("search finished, num: " + docs.totalHits);
//no results
for (ScoreDoc scoreDoc : docs.scoreDocs) {
Document doc = searcher.doc(scoreDoc.doc);
double la1 = Double.parseDouble(doc.get(latField));
double lo1 = Double.parseDouble(doc.get(lngField));
double distDEG = ctx.getDistCalc().distance(
args.getShape().getCenter(), lo1, la1);
logger.debug("dist deg: : " + distDEG);
double distKM = DistanceUtils.degrees2Dist(distDEG,
DistanceUtils.EARTH_MEAN_RADIUS_KM);
logger.debug("dist km: : " + distKM);
}
} catch (IOException e) {
logger.error("fail to get the search result!", e);
}
}
Did you see the javadocs? These docs in turn point to SpatialExample.java which is what you're looking for. What could I do to make them more obvious?
If you're bent on using a pair of doubles as the internal index approach then use PointVectorStrategy. However, you'll get superior filter performance if you instead use RecursivePrefixTreeStrategy. Presently, PVS does better distance sorting, though, scalability wise. You could use both for their respective benefits.
Just looking quickly at your example, I see you didn't use SpatialStrategy.createIndexableFields(). The intention is that you use that.
See the following link for an example: http://mad4search.blogspot.in/2013/06/implementing-geospatial-search-using.html
I am developing a test automation tool on a Linux system. I don't have write permissions for the Tomcat directory, which is located on the server. I need to develop an application where we can select an Excel file so that its content is automatically stored in an already existing table.
For this purpose I have written a form to select a file, which is posted to a servlet, CommonsFileUploadServlet, where I store the uploaded file and then call the ReadExcelFile class. That class reads the file path and creates a vector of the data in the file, which is used to store the data in the database.
My problem is that I am not able to store the uploaded file in a directory. Is it necessary to have permission rights for Tomcat to do this? Can I store the file on my system and pass the path to ReadExcelFile.class?
Please guide me.
My code is as follows:
Form in jsp
CommonsFileUploadServlet class code:
public void init(ServletConfig config) throws ServletException {
super.init(config);
}
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
PrintWriter out = response.getWriter();
response.setContentType("text/plain");
out.println("<h1>Servlet File Upload Example using Commons File Upload</h1>");
DiskFileItemFactory fileItemFactory = new DiskFileItemFactory ();
fileItemFactory.setSizeThreshold(1*1024*1024);
fileItemFactory.setRepository(new File("/home/example/Documents/Project/WEB-INF/tmp"));
ServletFileUpload uploadHandler = new ServletFileUpload(fileItemFactory);
try {
List items = uploadHandler.parseRequest(request);
Iterator itr = items.iterator();
while(itr.hasNext()) {
FileItem item = (FileItem) itr.next();
if(item.isFormField()) {
out.println("File Name = "+item.getFieldName()+", Value = "+item.getString());
} else {
out.println("Field Name = "+item.getFieldName()+
", File Name = "+item.getName()+
", Content type = "+item.getContentType()+
", File Size = "+item.getSize());
File file = new File("/",item.getName());
String realPath = getServletContext().getRealPath("/")+"/"+item.getName();
item.write(file);
ReadExcelFile ref= new ReadExcelFile();
String res=ref.insertReq(realPath,"1");
}
out.close();
}
}catch(FileUploadException ex) {
log("Error encountered while parsing the request",ex);
} catch(Exception ex) {
log("Error encountered while uploading file",ex);
}
}
}
ReadExcelFile code:
public static String insertReq(String fileName,String sno) {
//Read an Excel File and Store in a Vector
Vector dataHolder=readExcelFile(fileName,sno);
//store the data to database
storeCellDataToDatabase(dataHolder);
}
public static Vector readExcelFile(String fileName,String Sno)
{
/** --Define a Vector
--Holds Vectors Of Cells
*/
Vector cellVectorHolder = new Vector();
try{
/** Creating Input Stream**/
//InputStream myInput= ReadExcelFile.class.getResourceAsStream( fileName );
FileInputStream myInput = new FileInputStream(fileName);
/** Create a POIFSFileSystem object**/
POIFSFileSystem myFileSystem = new POIFSFileSystem(myInput);
/** Create a workbook using the File System**/
HSSFWorkbook myWorkBook = new HSSFWorkbook(myFileSystem);
int s=Integer.valueOf(Sno);
/** Get the first sheet from workbook**/
HSSFSheet mySheet = myWorkBook.getSheetAt(s);
/** We now need something to iterate through the cells.**/
Iterator rowIter = mySheet.rowIterator();
while(rowIter.hasNext())
{
HSSFRow myRow = (HSSFRow) rowIter.next();
Iterator cellIter = myRow.cellIterator();
Vector cellStoreVector=new Vector();
short minColIndex = myRow.getFirstCellNum();
short maxColIndex = myRow.getLastCellNum();
for(short colIndex = minColIndex; colIndex < maxColIndex; colIndex++)
{
HSSFCell myCell = myRow.getCell(colIndex);
if(myCell == null)
{
cellStoreVector.addElement(myCell);
}
else
{
cellStoreVector.addElement(myCell);
}
}
cellVectorHolder.addElement(cellStoreVector);
}
}catch (Exception e){e.printStackTrace(); }
return cellVectorHolder;
}
private static void storeCellDataToDatabase(Vector dataHolder)
{
Connection conn;
Statement stmt;
String query;
try
{
// get connection and declare statement
int z;
for (int i=1;i<dataHolder.size(); i++)
{
z=0;
Vector cellStoreVector=(Vector)dataHolder.elementAt(i);
String []stringCellValue=new String[10];
for (int j=0; j < cellStoreVector.size();j++,z++)
{
HSSFCell myCell = (HSSFCell)cellStoreVector.elementAt(j);
if(myCell==null)
stringCellValue[z]=" ";
else
stringCellValue[z] = myCell.toString();
}
try
{
//inserting into database
}
catch(Exception error)
{
String e="Error"+error;
System.out.println(e);
}
}
stmt.close();
conn.close();
System.out.println("success");
}
catch(Exception error)
{
String e="Error"+error;
System.out.println(e);
}
}
POI will happily open a workbook from any old InputStream; it needn't be a FileInputStream.
I'd suggest you look at the Commons FileUpload Streaming API and consider just passing the Excel part straight to POI without touching the disk.