Lucene Query (with shingles?)

I have a Lucene index containing documents like these:

_id | Name                    | Alternate Names      | Population
123 | Bosc de Planavilla      | (some names here in  | 5000
345 | Planavilla              | other languages)     | 20000
456 | Bosc de la Planassa     |                      | 1000
567 | Bosc de Plana en Blanca |                      | 100000
What's the best Lucene query type to use, and how should I structure it, given the following requirements?

If a user queries for:
"Italian Restaurant near Bosc de Planavilla"
I want the document with id 123 to be returned, because the query contains an exact match with the document's name.

If a user queries for:
"Italian Restaurant near Planavilla"
I want document 345, because the query contains an exact match of its name AND it has the highest population.

If a user queries for "Italian Restaurant near Bosc",
I want 567, because the query contains "Bosc" AND, of the three "Bosc" documents, it has the highest population.

There are probably many other use cases, but you get the idea of what I need.

What kind of query will do this for me?
Should I generate word n-grams (shingles) and create an OR'ed boolean query from the shingles, then apply custom scoring? Or will a regular phrase query do? I also saw DisjunctionMaxQuery, but I don't know if it's what I'm looking for.

The idea, as you might have understood by now, is to find the exact location a user implied in their query. From there I can start my geo search and add some further querying around it.

What's the best approach?
Thanks in advance.

Here is the code for sorting as well. Although I think it would make more sense to add custom scoring that takes the city size into account, rather than brute-forcing the sort on population (a sketch of that idea follows the results below). Also note that this uses the FieldCache, which may not be the best solution regarding memory usage.
public class ShingleFilterTests {
    private Analyzer analyzer;
    private IndexSearcher searcher;
    private IndexReader reader;
    private QueryParser qp;
    private Sort sort;

    public static Analyzer createAnalyzer(final int shingles) {
        return new Analyzer() {
            @Override
            public TokenStream tokenStream(String fieldName, Reader reader) {
                TokenStream tokenizer = new WhitespaceTokenizer(reader);
                tokenizer = new StopFilter(false, tokenizer, ImmutableSet.of("de", "la", "en"));
                if (shingles > 0) {
                    tokenizer = new ShingleFilter(tokenizer, shingles);
                }
                return tokenizer;
            }
        };
    }

    public class PopulationComparatorSource extends FieldComparatorSource {
        @Override
        public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
            return new PopulationComparator(fieldname, numHits);
        }

        private class PopulationComparator extends FieldComparator {
            private final String fieldName;
            private Integer[] values;
            private int[] populations;
            private int bottom;

            public PopulationComparator(String fieldname, int numHits) {
                values = new Integer[numHits];
                this.fieldName = fieldname;
            }

            @Override
            public int compare(int slot1, int slot2) {
                if (values[slot1] > values[slot2]) return -1;
                if (values[slot1] < values[slot2]) return 1;
                return 0;
            }

            @Override
            public void setBottom(int slot) {
                bottom = values[slot];
            }

            @Override
            public int compareBottom(int doc) throws IOException {
                int value = populations[doc];
                if (bottom > value) return -1;
                if (bottom < value) return 1;
                return 0;
            }

            @Override
            public void copy(int slot, int doc) throws IOException {
                values[slot] = populations[doc];
            }

            @Override
            public void setNextReader(IndexReader reader, int docBase) throws IOException {
                /* XXX uses field cache */
                populations = FieldCache.DEFAULT.getInts(reader, "population");
            }

            @Override
            public Comparable value(int slot) {
                return values[slot];
            }
        }
    }

    @Before
    public void setUp() throws Exception {
        Directory dir = new RAMDirectory();
        analyzer = createAnalyzer(3);
        IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        ImmutableList<String> cities = ImmutableList.of("Bosc de Planavilla", "Planavilla", "Bosc de la Planassa",
                "Bosc de Plana en Blanca");
        ImmutableList<Integer> populations = ImmutableList.of(5000, 20000, 1000, 100000);
        for (int id = 0; id < cities.size(); id++) {
            Document doc = new Document();
            doc.add(new Field("id", String.valueOf(id), Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("city", cities.get(id), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("population", String.valueOf(populations.get(id)),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();
        qp = new QueryParser(Version.LUCENE_30, "city", createAnalyzer(0));
        sort = new Sort(new SortField("population", new PopulationComparatorSource()));
        searcher = new IndexSearcher(dir);
        searcher.setDefaultFieldSortScoring(true, true);
        reader = searcher.getIndexReader();
    }

    @After
    public void tearDown() throws Exception {
        searcher.close();
    }

    @Test
    public void testShingleFilter() throws Exception {
        System.out.println("shingle filter");
        printSearch("city:\"Bosc de Planavilla\"");
        printSearch("city:Planavilla");
        printSearch("city:Bosc");
    }

    private void printSearch(String query) throws ParseException, IOException {
        Query q = qp.parse(query);
        System.out.println("query " + q);
        TopDocs hits = searcher.search(q, null, 4, sort);
        System.out.println("results " + hits.totalHits);
        int i = 1;
        for (ScoreDoc dc : hits.scoreDocs) {
            Document doc = reader.document(dc.doc);
            System.out.println(i++ + ". " + dc + " \"" + doc.get("city") + "\" population: " + doc.get("population"));
        }
        System.out.println();
    }
}
This gives the following results:
query city:"Bosc Planavilla"
results 1
1. doc=0 score=1.143841[5000] "Bosc de Planavilla" population: 5000
query city:Planavilla
results 2
1. doc=1 score=1.287682[20000] "Planavilla" population: 20000
2. doc=0 score=0.643841[5000] "Bosc de Planavilla" population: 5000
query city:Bosc
results 3
1. doc=3 score=0.375[100000] "Bosc de Plana en Blanca" population: 100000
2. doc=0 score=0.5[5000] "Bosc de Planavilla" population: 5000
3. doc=2 score=0.5[1000] "Bosc de la Planassa" population: 1000
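Regarding the custom-scoring alternative mentioned above: one way to blend population into the relevance score, rather than sorting on it, is a CustomScoreQuery wrapping a ValueSourceQuery. This is only a rough sketch (assuming Lucene 3.0.x's org.apache.lucene.search.function package; the log-damping of the population is an arbitrary placeholder, not a tuned formula):

import org.apache.lucene.search.Query;
import org.apache.lucene.search.function.CustomScoreQuery;
import org.apache.lucene.search.function.IntFieldSource;
import org.apache.lucene.search.function.ValueSourceQuery;

// Blends text relevance with the field-cached population value.
public class PopulationBoostQuery extends CustomScoreQuery {
    public PopulationBoostQuery(Query textQuery) {
        super(textQuery, new ValueSourceQuery(new IntFieldSource("population")));
    }

    @Override
    public float customScore(int doc, float subQueryScore, float valSrcScore) {
        // Placeholder weighting: log10 damps the raw population so a large
        // city does not drown out every textual match.
        return subQueryScore * (float) Math.log10(Math.max(valSrcScore, 10f));
    }
}

It would be used in place of the Sort, e.g. searcher.search(new PopulationBoostQuery(qp.parse(query)), 4).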

How do you tokenize the fields? Do you store them as complete strings? Also, how do you parse the query?
Okay, so I have been playing around with this a bit. I used a StopFilter to remove la, en, de, then a ShingleFilter to get the multiple combinations needed for the "exact matches". For example, Bosc de Planavilla gets tokenized to [Bosc] [Planavilla] [Bosc Planavilla], and Bosc de Plana en Blanca gets tokenized to [Bosc] [Plana] [Blanca] [Bosc Plana] [Plana Blanca] [Bosc Plana Blanca]. This is so that you can have "exact matches" on parts of the query. (A quick way to inspect the tokens is shown below.)
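If you want to verify what the analyzer actually emits, a small token dump does the job (a sketch for Lucene 3.0.x, where token text is read through TermAttribute; dumpTokens is a hypothetical helper, not part of the test class below):

// Prints the tokens the shingle analyzer produces for a given input.
public static void dumpTokens(String text) throws IOException {
    TokenStream stream = createAnalyzer(3).tokenStream("city", new StringReader(text));
    TermAttribute term = stream.addAttribute(TermAttribute.class);
    while (stream.incrementToken()) {
        System.out.print("[" + term.term() + "] ");
    }
    System.out.println();
    stream.close();
}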
I then query the exact string the user passed, although there could be some adaptation there as well. I went with the simple case to make the results better match what you were looking for.
Here is the code I am using (Lucene 3.0.3):
public class ShingleFilterTests {
    private Analyzer analyzer;
    private IndexSearcher searcher;
    private IndexReader reader;

    public static Analyzer createAnalyzer(final int shingles) {
        return new Analyzer() {
            @Override
            public TokenStream tokenStream(String fieldName, Reader reader) {
                TokenStream tokenizer = new WhitespaceTokenizer(reader);
                tokenizer = new StopFilter(false, tokenizer, ImmutableSet.of("de", "la", "en"));
                if (shingles > 0) {
                    tokenizer = new ShingleFilter(tokenizer, shingles);
                }
                return tokenizer;
            }
        };
    }

    @Before
    public void setUp() throws Exception {
        Directory dir = new RAMDirectory();
        analyzer = createAnalyzer(3);
        IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        ImmutableList<String> cities = ImmutableList.of("Bosc de Planavilla", "Planavilla", "Bosc de la Planassa",
                "Bosc de Plana en Blanca");
        ImmutableList<Integer> populations = ImmutableList.of(5000, 20000, 1000, 100000);
        for (int id = 0; id < cities.size(); id++) {
            Document doc = new Document();
            doc.add(new Field("id", String.valueOf(id), Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("city", cities.get(id), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("population", String.valueOf(populations.get(id)),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
        writer.close();
        searcher = new IndexSearcher(dir);
        reader = searcher.getIndexReader();
    }

    @After
    public void tearDown() throws Exception {
        searcher.close();
    }

    @Test
    public void testShingleFilter() throws Exception {
        System.out.println("shingle filter");
        QueryParser qp = new QueryParser(Version.LUCENE_30, "city", createAnalyzer(0));
        printSearch(qp, "city:\"Bosc de Planavilla\"");
        printSearch(qp, "city:Planavilla");
        printSearch(qp, "city:Bosc");
    }

    private void printSearch(QueryParser qp, String query) throws ParseException, IOException {
        Query q = qp.parse(query);
        System.out.println("query " + q);
        TopDocs hits = searcher.search(q, 4);
        System.out.println("results " + hits.totalHits);
        int i = 1;
        for (ScoreDoc dc : hits.scoreDocs) {
            Document doc = reader.document(dc.doc);
            System.out.println(i++ + ". " + dc + " \"" + doc.get("city") + "\" population: " + doc.get("population"));
        }
        System.out.println();
    }
}
I am now looking into sorting per population.
This prints out:
query city:"Bosc Planavilla"
results 1
1. doc=0 score=1.143841 "Bosc de Planavilla" population: 5000
query city:Planavilla
results 2
1. doc=1 score=1.287682 "Planavilla" population: 20000
2. doc=0 score=0.643841 "Bosc de Planavilla" population: 5000
query city:Bosc
results 3
1. doc=0 score=0.5 "Bosc de Planavilla" population: 5000
2. doc=2 score=0.5 "Bosc de la Planassa" population: 1000
3. doc=3 score=0.375 "Bosc de Plana en Blanca" population: 100000

Related

hadoop mapreduce common friends reducer spillage

I am trying to run the code below to find the common friends between two people. The input is as follows:
A : B C D
B : A C D E
C : A B D E
D : A B C E
E : B C D
I am not getting any output in the output file, and there is no exception.
Please find my code below:
public class Friend {
    public static class Mapperfriend extends Mapper<Object, Text, Text, Text> {
        private Text vendor = new Text();

        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString(), "\n");
            String line = null;
            String[] lineArray = null;
            String[] friendArray = null;
            String[] tempArray = null;
            while (tokenizer.hasMoreTokens()) {
                line = tokenizer.nextToken();
                lineArray = line.split(":");
                friendArray = lineArray[1].split(" ");
                tempArray = new String[2];
                for (int i = 0; i < friendArray.length; i++) {
                    tempArray[0] = friendArray[i];
                    tempArray[1] = lineArray[0];
                    Arrays.sort(tempArray);
                    context.write(new Text(tempArray[0] + " " + tempArray[1]), new Text(lineArray[1]));
                }
            }
        }
    }

    public static class ReducerFriend extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            Text[] texts = new Text[2];
            int index = 0;
            for (Text val : values) {
                texts[index++] = new Text(val);
            }
            String[] list1 = texts[0].toString().split(" ");
            String[] list2 = texts[1].toString().split(" ");
            List<String> list = new LinkedList<String>();
            for (String friend1 : list1) {
                for (String friend2 : list2) {
                    if (friend1.equals(friend2)) {
                        list.add(friend1);
                    }
                }
            }
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < list.size(); i++) {
                sb.append(list.get(i));
                if (i != list.size() - 1)
                    sb.append(" ");
            }
            context.write(key, new Text(sb.toString()));
        }
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Friends");
        job.setJarByClass(Friend.class);
        job.setMapperClass(Mapperfriend.class);
        job.setReducerClass(ReducerFriend.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The problem was in the mapper: it emitted the entire key from the first split instead of picking the individual entries out of the array. Once that was removed, it works fine.
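For reference, a cleaned-up mapper might look like the sketch below. This is my reading of the fix, not the poster's exact code; in particular it trims the whitespace around the ':' separator, without which the pair keys contain stray spaces and never group correctly in the reducer:

// Sketch: emit one sorted (person, friend) pair per friend, with the person's
// full friend list as the value, so the reducer can intersect the two lists.
public static class MapperFriend extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] lineArray = value.toString().split(":");
        String person = lineArray[0].trim();
        String friendList = lineArray[1].trim();
        for (String friend : friendList.split(" ")) {
            String[] pair = {person, friend};
            Arrays.sort(pair); // canonical order so (A,B) and (B,A) meet at the same reducer
            context.write(new Text(pair[0] + " " + pair[1]), new Text(friendList));
        }
    }
}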

Get results with exact match

I want to run a query like this: "banana apple cherry" on a "fruit" field.
All the fruits in a dessert need to be in the query, but not all the fruits in the query need to be in the dessert.
Here's an example:
NAME     | FRUIT                     |
Dessert1 | banana apple              | OK (we got banana and apple in the query)
Dessert2 | cherry apple banana       | OK (the order doesn't matter)
Dessert3 | cherry apple banana melon | NO (melon is missing in the query)
public class ArrayStringFieldBridge implements TwoWayFieldBridge {
    @Override
    public Object get(String name, Document document) {
        IndexableField[] fields = document.getFields(name);
        String[] values = new String[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = fields[i].stringValue();
        }
        return values;
    }

    @Override
    public String objectToString(Object value) {
        return StringUtils.join((String[]) value, " ");
    }

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        String newString = StringUtils.join((String[]) value, " ");
        Field field = new Field(name, newString, luceneOptions.getStore(), luceneOptions.getIndex(), luceneOptions.getTermVector());
        field.setBoost(luceneOptions.getBoost());
        document.add(field);
    }
}
@Indexed
@AnalyzerDef(name = "customanalyzer",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class))
public class Dessert {
    @Analyzer(definition = "customanalyzer")
    @Field(name = "equipment", index = Index.YES, analyze = Analyze.YES, store = Store.YES)
    @FieldBridge(impl = ArrayStringFieldBridge.class)
    public String[] fruits = new String[]{};
}
Even if you are not using hibernate-search, any suggestions about the theory for handling this would be great... Thank you
Step 1: Fire the Lucene query "fruit:banana OR fruit:apple OR fruit:cherry"
Step 2: Gather all matched dessert documents
Step 3: Post-process each matched dessert document against the query:
- convert the matched document to an array of terms - matchDocArr: {banana, apple}
- convert the query terms to an array - queryArr: {banana, apple, cherry}
- iterate over matchDocArr and make sure each of its terms is found in queryArr; if NOT (the melon use case), knock out that matched document
Here is an example function which needs to be called for every matched doc:
public static boolean isDocInterested(String query, String matchDoc) {
    List<String> matchDocArr = Arrays.asList(matchDoc.split(" "));
    List<String> queryArr = Arrays.asList(query.split(" "));
    int matchCounter = 0;
    for (int i = 0; i < matchDocArr.size(); i++) {
        if (queryArr.contains(matchDocArr.get(i)))
            matchCounter++;
    }
    return matchCounter == matchDocArr.size();
}
If the function returns TRUE we are interested in the doc/dessert; if it returns FALSE, ignore that doc/dessert.
Of course this function can be written in many different ways, but I think you get the point.
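Wiring it into the Step 3 post-processing might look like this (a hypothetical sketch; the "name" and "fruit" field names follow the example table above):

// Hypothetical post-processing over the hits of the Step 1 query.
String query = "banana apple cherry";
for (ScoreDoc hit : hits.scoreDocs) {
    Document dessert = searcher.doc(hit.doc);
    String fruits = dessert.get("fruit"); // e.g. "banana apple"
    if (isDocInterested(query, fruits)) {
        System.out.println("keep: " + dessert.get("name"));
    }
}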

Highlighter in lucene.net not working for wildchard and fuzzy search

The highlighter using Lucene.Net (3.0.3) is not working in the code below. If I search for the word "dealing", the highlighter shows; but if I search for a word with a wildcard, "deal*", there is no highlighting.
protected void btnIndex_Click(object sender, EventArgs e)
{
    string indexPath = @"D:\temp\LuceneIndex1";
    Lucene.Net.Store.Directory directory = FSDirectory.Open(indexPath);
    Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
    IndexWriter writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
    IndexReader red = IndexReader.Open(directory, true);
    int totDocs = red.MaxDoc;
    red.Close();

    //Add documents to the index
    string text = String.Empty;
    text = "One thing that may be of interest, is that if you are dealing with vast quantites of data you may want to create static Field fields and reuse them rather than creating new one each time you rebuild the index. Obviously for this demo the Lucene index is only created once per application run, but in a production application you may build the index every 5 mins or something like that, in which case I would recommend reusing the Field objects by making static fields that get re-used.";
    int txts = totDocs;
    AddTextToIndex(txts++, text, writer);
    writer.Optimize();
    writer.Dispose();

    //Setup searcher
    IndexSearcher searcher = new IndexSearcher(directory);
    QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "postBody", analyzer);
    text = txtSearchData.Text;
    Label1.Text = Search(text, searcher, parser, analyzer);

    //Clean up everything
    searcher.Close();
    directory.Close();
}

private static void AddTextToIndex(int txts, string text, IndexWriter writer)
{
    Document doc = new Document();
    doc.Add(new Field("id", txts.ToString(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    doc.Add(new Field("postBody", text, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    writer.AddDocument(doc);
}

private string Search(string text, IndexSearcher searcher, QueryParser parser, Analyzer analyzer)
{
    string indexPath = @"D:\temp\LuceneIndex1";
    Lucene.Net.Store.Directory directory = FSDirectory.Open(indexPath);
    string result = "";
    string snip = "";
    var booleanQuery = new BooleanQuery();
    var fuzzyQuery = new FuzzyQuery(new Term("postBody", text), 0.7f, 3);
    booleanQuery.Add(new BooleanClause(fuzzyQuery, Occur.SHOULD));

    //Supply conditions
    Query query = parser.Parse(text);
    FastVectorHighlighter highlighter = getHighlighter();
    parser.AllowLeadingWildcard = true;
    query = parser.Parse(text);
    BooleanQuery.MaxClauseCount = 10;
    query = query.Rewrite(IndexReader.Open(directory, true));
    query.Rewrite(IndexReader.Open(directory, true));
    FieldQuery fieldQuery = highlighter.GetFieldQuery(booleanQuery);
    TopScoreDocCollector collector = TopScoreDocCollector.Create(100, true);
    searcher.Search(query, collector);
    ScoreDoc[] hits = collector.TopDocs().ScoreDocs;
    int results = hits.Length;
    Console.WriteLine("Found {0} results", results);
    for (int i = 0; i < hits.Length; i++)
    {
        int docId = hits[i].Doc;
        float score = hits[i].Score;
        Lucene.Net.Documents.Document doc = searcher.Doc(docId);
        result = "Score: " + score.ToString() +
                 " Field: " + doc.Get("id") +
                 " Field2: " + doc.Get("postBody");
        string text1 = doc.Get("postBody");
        // NOTE: the fragments returned here are never appended to 'snip',
        // so the highlighted text is discarded before the method returns.
        string[] hight = getFragmentsWithHighlightedTerms(analyzer, query, "postBody", text1, 5, 100, directory);
    }
    return result + " :::: " + snip;
}

private FastVectorHighlighter getHighlighter()
{
    FragListBuilder fragListBuilder = new SimpleFragListBuilder();
    FragmentsBuilder fragmentsBuilder = new ScoreOrderFragmentsBuilder(
        BaseFragmentsBuilder.COLORED_PRE_TAGS,
        BaseFragmentsBuilder.COLORED_POST_TAGS);
    return new FastVectorHighlighter(true, true, fragListBuilder, fragmentsBuilder);
}

private static String[] getFragmentsWithHighlightedTerms(Analyzer analyzer, Query query, string fieldName, string fieldContents, int fragmentSize, int maxsize, Lucene.Net.Store.Directory directory)
{
    TokenStream stream = TokenSources.GetTokenStream(fieldName, fieldContents, analyzer);
    query = query.Rewrite(IndexReader.Open(directory, true));
    QueryScorer scorer = new QueryScorer(query, fieldName);
    scorer.IsExpandMultiTermQuery = true;
    SimpleSpanFragmenter fragmenter = new SimpleSpanFragmenter(scorer, fragmentSize);
    Highlighter highlighter = new Highlighter(scorer);
    highlighter.TextFragmenter = fragmenter;
    highlighter.MaxDocCharsToAnalyze = maxsize;
    String[] fragments = highlighter.GetBestFragments(stream, fieldContents, 10);
    return fragments;
}

With Lucene 4.3.1, how to get all terms which occur in a sub-range of all docs

Suppose a Lucene index with the fields date and content.
I want to get all term values and their frequencies from docs whose date is yesterday. The date field is a keyword field; the content field is analyzed and indexed.
Please help me with sample code.
My solution source is as follows:
/**
 * Collects the ten most frequent content terms among docs in a date range.
 *
 * @param searcher
 * @param fromDateTime
 *            - yyyymmddhhmmss
 * @param toDateTime
 *            - yyyymmddhhmmss
 * @return a space-separated string of the ten most frequent terms
 */
static public String top10(IndexSearcher searcher, String fromDateTime,
        String toDateTime) {
    String top10Query = "";
    try {
        Query query = new TermRangeQuery("tweetDate", new BytesRef(
                fromDateTime), new BytesRef(toDateTime), true, false);
        final BitSet bits = new BitSet(searcher.getIndexReader().maxDoc());
        searcher.search(query, new Collector() {
            private int docBase;

            @Override
            public void setScorer(Scorer scorer) throws IOException {
            }

            @Override
            public void setNextReader(AtomicReaderContext context)
                    throws IOException {
                this.docBase = context.docBase;
            }

            @Override
            public void collect(int doc) throws IOException {
                bits.set(doc + docBase);
            }

            @Override
            public boolean acceptsDocsOutOfOrder() {
                return false;
            }
        });

        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43,
                EnglishStopWords.getEnglishStopWords());

        HashMap<String, Long> wordFrequency = new HashMap<>();
        for (int wx = 0; wx < bits.length(); ++wx) {
            if (bits.get(wx)) {
                Document wd = searcher.doc(wx);
                // Re-analyze the stored content to count term frequencies.
                TokenStream tokenStream = analyzer.tokenStream("temp",
                        new StringReader(wd.get("content")));
                CharTermAttribute charTermAttribute = tokenStream
                        .addAttribute(CharTermAttribute.class);
                tokenStream.reset();
                while (tokenStream.incrementToken()) {
                    String term = charTermAttribute.toString();
                    if (term.length() < 2)
                        continue;
                    Long wl;
                    if ((wl = wordFrequency.get(term)) == null)
                        wordFrequency.put(term, 1L);
                    else {
                        wl += 1;
                        wordFrequency.put(term, wl);
                    }
                }
                tokenStream.end();
                tokenStream.close();
            }
        }
        analyzer.close();
        // Sort by frequency (descending) via zero-padded "count\tterm" strings.
        List<String> occurterm = new ArrayList<String>();
        for (String ws : wordFrequency.keySet()) {
            occurterm.add(String.format("%06d\t%s", wordFrequency.get(ws),
                    ws));
        }
        Collections.sort(occurterm, Collections.reverseOrder());
        // Make the query string from the top 10 words.
        int topCount = 10;
        for (String ws : occurterm) {
            if (topCount-- == 0)
                break;
            String[] tks = ws.split("\\t");
            top10Query += tks[1] + " ";
        }
        top10Query = top10Query.trim(); // trim() returns a new string; assign it back
    } catch (IOException e) {
        e.printStackTrace();
    }
    // return top-10 word string
    return top10Query;
}
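A usage sketch (hypothetical; the index path is a placeholder and the bounds follow the yyyymmddhhmmss format documented above):

// Hypothetical call: top-10 content terms for July 8, 2013.
DirectoryReader reader = DirectoryReader.open(FSDirectory.open(new File("/path/to/index")));
IndexSearcher searcher = new IndexSearcher(reader);
String terms = top10(searcher, "20130708000000", "20130709000000");
System.out.println("top-10 terms: " + terms);
reader.close();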

In Lucene, why do my boosted and unboosted documents get the same score?

At index time I am boosting certain documents in this way:
if (myCondition)
{
    document.SetBoost(1.2f);
}
But at search time, documents with all the same qualities, where some pass and some fail myCondition, all end up having the same score.
And here is the search code:
BooleanQuery booleanQuery = new BooleanQuery();
booleanQuery.Add(new TermQuery(new Term(FieldNames.HAS_PHOTO, "y")), BooleanClause.Occur.MUST);
booleanQuery.Add(new TermQuery(new Term(FieldNames.AUTHOR_TYPE, AuthorTypes.BLOGGER)), BooleanClause.Occur.MUST_NOT);
indexSearcher.Search(booleanQuery, 10);
Can you tell me what I need to do to get the documents that were boosted to get a higher score?
Many Thanks!
Lucene encodes boosts on a single byte (although a float is generally encoded on four bytes) using the SmallFloat#floatToByte315 method. As a consequence, there can be a big loss in precision when converting the byte back to a float.
In your case, SmallFloat.byte315ToFloat(SmallFloat.floatToByte315(1.2f)) returns 1f because 1f and 1.2f are too close to each other. Try using a bigger boost so that your documents get different scores. (For example 1.25: SmallFloat.byte315ToFloat(SmallFloat.floatToByte315(1.25f)) gives 1.25f.)
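You can check which boost values survive the round trip with a few lines against org.apache.lucene.util.SmallFloat (a quick Java sketch; the values are the ones discussed above):

import org.apache.lucene.util.SmallFloat;

public class BoostPrecisionDemo {
    public static void main(String[] args) {
        // 1.2f collapses to 1.0f after the byte round trip; 1.25f survives.
        for (float boost : new float[] {1.0f, 1.2f, 1.25f, 1.5f}) {
            byte b = SmallFloat.floatToByte315(boost);
            System.out.println(boost + " -> " + SmallFloat.byte315ToFloat(b));
        }
    }
}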
Here is the requested test program that was too long to post in a comment.
class Program
{
    static void Main(string[] args)
    {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer());
        const string FIELD = "name";
        for (int i = 0; i < 10; i++)
        {
            StringBuilder notes = new StringBuilder();
            notes.AppendLine("This is a note 123 - " + i);
            string text = notes.ToString();
            Document doc = new Document();
            var field = new Field(FIELD, text, Field.Store.YES, Field.Index.NOT_ANALYZED);
            if (i % 2 == 0)
            {
                field.SetBoost(1.5f);
                doc.SetBoost(1.5f);
            }
            else
            {
                field.SetBoost(0.1f);
                doc.SetBoost(0.1f);
            }
            doc.Add(field);
            writer.AddDocument(doc);
        }
        writer.Commit();

        //string TERM = QueryParser.Escape("*+*");
        string TERM = "T";
        IndexSearcher searcher = new IndexSearcher(dir);
        Query query = new PrefixQuery(new Term(FIELD, TERM));
        var hits = searcher.Search(query);
        int count = hits.Length();
        Console.WriteLine("Hits - {0}", count);
        for (int i = 0; i < count; i++)
        {
            var doc = hits.Doc(i);
            Console.WriteLine(doc.ToString());
            var explain = searcher.Explain(query, i);
            Console.WriteLine(explain.ToString());
        }
    }
}