Field.Store.COMPRESS in Lucene 3.0.2

I am upgrading Lucene 2.4.1 to 3.0.2 in my Java web project.
In the Lucene API I found that Field.Store.COMPRESS is no longer present in 3.0.2, so what can I use in place of Field.Store.COMPRESS?
Sometimes the field data is so large that I have to compress it.

Lucene made the decision to stop compressing fields, as it was really slow and not Lucene's forte. The Javadocs say:

Please use CompressionTools instead. For string fields that were previously indexed and stored using compression, the new way to achieve this is: First add the field indexed-only (no store) and additionally using the same field name as a binary, stored field with CompressionTools.compressString(java.lang.String).
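
A minimal sketch of that pattern against the Lucene 3.0 API (the field name "content" and the sample value are illustrative, not from the original question):

    import java.util.zip.DataFormatException;

    import org.apache.lucene.document.CompressionTools;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class CompressedFieldExample {
        public static void main(String[] args) throws DataFormatException {
            String body = "some very large field value...";
            Document doc = new Document();

            // 1) Add the field indexed-only: searchable, but not stored.
            doc.add(new Field("content", body, Field.Store.NO, Field.Index.ANALYZED));

            // 2) Add the same field name as a stored, compressed binary field.
            doc.add(new Field("content", CompressionTools.compressString(body), Field.Store.YES));

            // At retrieval time, decompress the stored bytes back into a string.
            String restored = CompressionTools.decompressString(doc.getBinaryValue("content"));
            System.out.println(restored);
        }
    }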

Related

how to test search accuracy in solr

Hello, I am new to the Solr information retrieval system,
and I want to add a text file to Solr and then search for a word from the file, in order to test Solr's accuracy in other languages, but I am not sure how. I found that there is a UI for search, but I don't know how to use it either. There is also the Data Import Handler, but it expects XML, CSV or JSON and I want a plain text file; and even if I use it, I don't know how to search for a word or sentence.
I would recommend a basic Apache Lucene/Solr course and a deep dive into the Solr Reference Guide [1].
The Getting Started section especially should really help you.
Good luck.
[1] https://lucene.apache.org/solr/guide/7_0/solr-tutorial.html
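
If you would rather try it programmatically than through the admin UI, here is a hedged SolrJ sketch. The core name "mycore", the file name, and the content_txt field are assumptions for illustration (content_txt relies on the default *_txt dynamic field in Solr's stock configset):

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrTextFileDemo {
        public static void main(String[] args) throws Exception {
            HttpSolrClient client =
                new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

            // Index the raw text of the file into a searchable field.
            String text = new String(Files.readAllBytes(Paths.get("sample.txt")),
                                     StandardCharsets.UTF_8);
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "sample-1");
            doc.addField("content_txt", text);
            client.add(doc);
            client.commit();

            // Search for a word that occurs in the file.
            QueryResponse rsp = client.query(new SolrQuery("content_txt:someword"));
            System.out.println("Hits: " + rsp.getResults().getNumFound());
            client.close();
        }
    }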

How to import an external file of indexed documents in solr core

We are working as a team to create a Persian search engine.
I am doing the "indexing" part.
I worked with Solr and indexed some English documents to see if it works.
It worked! So it's time for the Persian indexer. I customized the code for PersianAnalyzer a little (extending the stop-word set, for instance) and it can index the documents. Now I want to import the externally indexed Persian documents into the core, to see the indexing process and run a search query against them. How can I import these indexed documents into the core?
I am kind of in a hurry, so I will appreciate any help.
Thanks,
Mahshid
You have several options:
- the quickest option for getting content from a file would be to use the Solr DataImportHandler (see the sketch after this list);
- another option would be to write a custom crawler/indexer, but that would require time;
- if you need a web crawler instead, you can use Apache Nutch.
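
As a rough illustration of the DataImportHandler route, a data-config along these lines reads plain-text files from a directory. The baseDir, fileName pattern, and target field name "content" are assumptions; the handler itself must also be registered in solrconfig.xml:

    <dataConfig>
      <dataSource type="FileDataSource" encoding="UTF-8"/>
      <document>
        <!-- Enumerate the files to load -->
        <entity name="files"
                processor="FileListEntityProcessor"
                baseDir="/path/to/persian/docs"
                fileName=".*\.txt"
                rootEntity="false"
                dataSource="null">
          <!-- Read each file's text and map it onto an index field -->
          <entity name="file"
                  processor="PlainTextEntityProcessor"
                  url="${files.fileAbsolutePath}">
            <field column="plainText" name="content"/>
          </entity>
        </entity>
      </document>
    </dataConfig>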

Lucene search query in different file formats

I'm using Apache Lucene 3.0.3 on Windows 7. I'm able to index files successfully given any file extension (.doc, .ppt, .pdf, .txt, .rtf etc.). But I'm able to search for a word (or words) in any spoken human language (Indian/foreign) only in the indexed text documents, not in the indexed Word/PowerPoint/PDF documents. Why is this? Is it possible for Lucene to do this directly?
Do I need to use a higher version of Lucene? I'm aware of Lucene 4.8.1. Do I need to use that to achieve the task stated above, or is it not possible for Lucene 3 to achieve the same?
Lucene doesn't interpret content. It indexes the content you give it and makes it searchable. If you hand it binary garbage, it will happily index it and make it searchable; it just won't be searchable via human language. The .doc, .ppt, .pdf and .rtf formats are not plain text, and so won't index well if you just read them and chuck them directly into Lucene.
You need to extract the text content from the documents in order to search them meaningfully. I'd recommend using Tika.
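
A hedged sketch of that pipeline using Tika's facade API (the file path, index directory, and field names are illustrative): Tika auto-detects the format and pulls the plain text out of the .doc/.ppt/.pdf file, and Lucene 3.x then indexes that text instead of the raw bytes:

    import java.io.File;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;
    import org.apache.tika.Tika;

    public class TikaIndexer {
        public static void main(String[] args) throws Exception {
            // Extract plain text from the binary document.
            Tika tika = new Tika();
            String text = tika.parseToString(new File("report.pdf"));

            // Index the extracted text with Lucene 3.x.
            IndexWriter writer = new IndexWriter(
                FSDirectory.open(new File("index")),
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

            Document doc = new Document();
            doc.add(new Field("path", "report.pdf", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
            writer.close();
        }
    }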

missing packages in lucene 4.0 snapshot

Does anybody know why there is no QueryParser, nor IndexWriter.MaxFieldLength(25000), nor some other classes in the Lucene 4.0 snapshot?
I'm having a hard time porting the code to this newer version, though I'm following the code as given here: http://search-lucene.com/jd/lucene/overview-summary.html
How do I find the missing packages, and how do I get them? The snapshot jar doesn't contain all the features.
Thanks
Lucene has been re-architected, and some classes which used to be in the core module are now in submodules. You will now find the QueryParser stuff in the queryparser submodule. Similarly, lots of useful analyzers, tokenizers and token filters have been moved to the analysis submodule.
Regarding IndexWriter, the maximum-field-length option has been deprecated; it is now recommended to wrap an analyzer with LimitTokenCountAnalyzer (in the analysis submodule) instead.
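
For example, a sketch of the replacement pattern against the Lucene 4.x API (the 25000 limit mirrors the old MaxFieldLength value from the question; the "content" field is illustrative):

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.miscellaneous.LimitTokenCountAnalyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class PortingExample {
        public static void main(String[] args) throws Exception {
            Directory dir = new RAMDirectory();

            // 3.x: new IndexWriter(dir, analyzer, new IndexWriter.MaxFieldLength(25000));
            // 4.x: cap the token count by wrapping the analyzer instead.
            Analyzer wrapped = new LimitTokenCountAnalyzer(
                new StandardAnalyzer(Version.LUCENE_40), 25000);
            IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_40, wrapped));

            // QueryParser now lives in the queryparser submodule.
            QueryParser parser = new QueryParser(Version.LUCENE_40, "content", wrapped);

            writer.close();
        }
    }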

Where can I get XSD schema for solrconfig.xml and schema.xml

I want the XML schemas defining which elements can appear where in the solrconfig and schema XML files, for some IDE completion help, and also so I can hand-write some config instead of copy-pasting from the net, where content for many Solr versions is mixed together. I'm using Solr 3.3 (which has Lucene 3.3 under it).
I cannot find it in the svn, or anywhere else for that matter. Maybe Lucene has the XSD for schema.xml, which looks a lot like a mapping to a Lucene document.
Take a look at the patch attached to this issue.