How can I update document without losing fields?

How can I update document without losing fields? - lucene

CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/");
SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "id1");
doc1.addField("name", "doc1");
doc1.addField("price", new Float(10));
SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "id1");
doc2.addField("name", "doc2");
server.add(doc1);
server.add(doc2);
server.commit();
SolrQuery query = new SolrQuery();
query.setQuery("id:id1");
query.addSortField("price", SolrQuery.ORDER.desc);
QueryResponse rsp = server.query(query);
Iterator<SolrDocument> iter = rsp.getResults().iterator();
while(iter.hasNext()){
SolrDocument doc = iter.next();
Collection fieldNames = doc.getFieldNames();
Iterator<String> fieldIter = fieldNames.iterator();
StringBuffer content = new StringBuffer("");
while(fieldIter.hasNext()){
String field = fieldIter.next();
content.append(field+":"+doc.get(field)).append(" ");
//System.out.println(field);
}
System.out.println(content);
}
The question is that I want to get the result "id:id1 name:doc2 price:10.0", but the output is "id:id1 name:doc2"...
So I want to know if I want to get the result as "id:id1 name:doc2 price:10.0", how can I modify my programming?

As you are adding the documents with same id. You are basically adding a same document twice.
Solr will update/overwrite the document. updated is basically delete and add.
As the second document you added with the same id does not have the price field, it won't be added and you wont find it the index.
you would need to have all the fields changed and unchanged when you are adding back the document.
doc2.addField("price", new Float(10)); // should add it back to the document

Related

Apache Lucene fuzzy search for multi-worded phrases

I have the following Apache Lucene 7 application:
StandardAnalyzer standardAnalyzer = new StandardAnalyzer();
Directory directory = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(standardAnalyzer);
IndexWriter writer = new IndexWriter(directory, config);
Document document = new Document();
document.add(new TextField("content", new FileReader("document.txt")));
writer.addDocument(document);
writer.close();
IndexReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
Query fuzzyQuery = new FuzzyQuery(new Term("content", "Company"), 2);
TopDocs results = searcher.search(fuzzyQuery, 5);
System.out.println("Hits: " + results.totalHits);
System.out.println("Max score:" + results.getMaxScore())
when I use it with :
new FuzzyQuery(new Term("content", "Company"), 2);
the application works fine and returns the following result:
Hits: 1
Max score:0.35161147
but when I try to search with multi term query, for example:
new FuzzyQuery(new Term("content", "Company name"), 2);
it returns the following result:
Hits: 0
Max score:NaN
Anyway, the phrase Company name exists in the source document.txt file.
How to properly use FuzzyQuery in this case in order to be able to do the fuzzy search for multi-word phrases.
UPDATED
Based on the provided solution I have tested it on the following text information:
Company name: BlueCross BlueShield Customer Service
1-800-521-2227
of Texas Preauth-Medical 1-800-441-9188
Preauth-MH/CD 1-800-528-7264
Blue Card Access 1-800-810-2583
For the following query:
SpanQuery[] clauses = new SpanQuery[2];
clauses[0] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueCross"), 2));
clauses[1] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueShield"), 2));
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);
the search works fine:
Hits: 1
Max score:0.5753642
but when I try to corrupt a little bit the search query(for example from BlueCross to BlueCros)
SpanQuery[] clauses = new SpanQuery[2];
clauses[0] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueCros"), 2));
clauses[1] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("content", "BlueShield"), 2));
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);
it stops working and returns:
Hits: 0
Max score:NaN

The problem here is the following, you're using TextField, which is tokenizing field. E.g. your text "Company name is working on something" would be effectively split by spaces (and others delimeters). So, even if you have the text Company name, during indexation it will become Company, name, is, etc.
In this case this TermQuery won't be able to find what you're looking for. The trick which going to help you would look like this:
SpanQuery[] clauses = new SpanQuery[2];
clauses[0] = new SpanMultiTermQueryWrapper(new FuzzyQuery(new Term("content", "some"), 2));
clauses[1] = new SpanMultiTermQueryWrapper(new FuzzyQuery(new Term("content", "text"), 2));
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);
However, I wouldn't recommend this approach much, especially if your load would be big and you're planning on searching on a 10 term long company names. One should be aware, that those query are potentially heavy to execute.
The following problem with BlueCros is the following. By default Lucene uses StandardAnalyzer for TextField. So it means it effectively lowercase the terms, basically it means that BlueCross in the content field becomes bluecross.
Fuzzy difference between BlueCros and bluecross is 3, that's the reason you do not have a match.
Simple proposal would be to convert term in query to the lowercase, by doing something like .toLowerCase()
In general, one should prefer to use same analyzers during the query time as well (e.g. during construction of the query)

For Lucene.Net it can be like this.
private string _IndexPath = #"Your Index Path";
private Directory _Directory;
private Searcher _IndexSearcher;
private MultiPhraseQuery _MultiPhraseQuery;
_Directory = FSDirectory.Open(_IndexPath);
IndexReader indexReader = IndexReader.Open(_Directory, true);
string field = "Name" // Your field name
string keyword = "big red fox"; // your search term
float fuzzy = 0,7f; // between 0-1
using (_IndexSearcher = new IndexSearcher(indexReader))
{
// "big red fox" to [big,red,fox]
var keywordSplit = keyword.Split();
_MultiPhraseQuery = new MultiPhraseQuery();
FuzzyTermEnum[] _FuzzyTermEnum = new FuzzyTermEnum[keywordSplit.Length];
Term[] _Term = new Term[keywordSplit.Length];
for (int i = 0; i < keywordSplit.Length; i++)
{
_FuzzyTermEnum[i] = new FuzzyTermEnum(indexReader, new Term(field, keywordSplit[i]),fuzzy);
_Term[i] = _FuzzyTermEnum[i].Term;
if (_Term[i] == null)
{
_MultiPhraseQuery.Add(new Term(field, keywordSplit[i]));
}
else
{
_MultiPhraseQuery.Add(_FuzzyTermEnum[i].Term);
}
}
var results = _IndexSearcher.Search(_MultiPhraseQuery, indexReader.MaxDoc);
foreach (var loopDoc in results.ScoreDocs.OrderByDescending(s => s.Score))
{
//YourCode Here
}
}

Creating a MoreLikeThis query yields empty queries and terms

I'm trying to get similar Document of another one . I'm using the Lucene.Net MoreLikeThis-Class to achieve this. For this i seperate my Documents in multiple Fields - Title and Content. Now creating the actual query results in an empty query without interesting Terms.
My could looks like this:
var queries = new List<Query>();
foreach(var docField in docFields)
var similarSearch = new MoreLikeThis(indexReader);
similarSearch.SetFieldNames(docField.fieldName);
similarSearch.Analyzer = new GermanAnalyzer(Version.LUCENE_30, new HashSet<string>(StopWords));
similarSearch.MinDocFreq = 1;
similarSearch.MinTermFreq = 1;
similarSearch.MinWordLen = 1;
similarSearch.Boost = true;
similarSearch.BoostFactor = boostFactor;
using(var reader = new StringReader(docField.Content)){
var searchQuery = similarSearch.Like(reader);
// debugging purpose
var queryString = searchQuery.ToString(); // empty
var terms = similarSearch.RetrieveInterestingTerms(reader); // also empty
queries.Add(searchQuery);
}
var booleanQuery = new BooleanQuery();
foreach(var moreLikeThisQuery in queries)
{
booleanQuery.Add(moreLikeThisQuery, Occur.SHOULD);
}
var topDocs = indexSearcher.Search(booleanQuery, maxNumberOfResults); // and of course no results obtained
So the question is:
Why there are no Terms / why there is no query generated?
I hope important thing's be seen, if not please help me to make my first question better :)

I got it to work.
The problem was, that i worked on the false directory.
I have different Solutions for creating the index and creating the queries and had a missmatch with the index-location.
So the generall solution would be:
Is your Querygenerating-Class fully initialized? (MinDocFreq, MinTermFreq, MinWordLen, has a Analyzer, set the fieldNames)
Is your used IndexReader correctly initialized?

Lucene.net highlight searched term in the text

I am using Lucene.net to search a given document. Requirement is once search is done, it should highlight the searched term in the document. I have seen examples which returns the best fragments. But what i need is to highlight in the main content.
using (StandardAnalyzer standardAnalyzer = new StandardAnalyzer(Version.LUCENE_30, stopWords))
{
QueryParser parser = new QueryParser(Version.LUCENE_30, "Content", standardAnalyzer);
parser.AllowLeadingWildcard = true;
Query qry = parser.Parse(searchText);
Directory indexDir = CreateRAMDirectory(htmlContent);
IndexReader reader = IndexReader.Open(indexDir, true);
IndexSearcher searcher = new IndexSearcher(reader);
searcher.SetDefaultFieldSortScoring(true, true);
IFormatter formatter = new SimpleHTMLFormatter("<span style=\"font-weight:bold; background-color:yellow;\">", "</span>");
SimpleFragmenter fragmenter = new SimpleFragmenter(1000);
QueryScorer scorer = null;
scorer = new QueryScorer(qry);
ScoreDoc[] hits = searcher.Search(qry, null, 10000, Sort.RELEVANCE).ScoreDocs;
Highlighter highlighter = new Highlighter(formatter, scorer);
highlighter.TextFragmenter = fragmenter;
foreach (var result in hits)
{
int docId = result.Doc;
float score = result.Score;
Document doc = searcher.Doc(docId);
Lucene.Net.Analysis.TokenStream stream = standardAnalyzer.TokenStream("Content", new IO.StringReader(searchText));
String highlighterData = highlighter.GetBestFragments(stream, searchText, 1000, "");
}
}
I am a newbie to Lucene.net, how can i get the entire document with searched term content highlighted rather than fragments?

The fragmenter governs how large the chunks of text returned are. To use the entire field contents, just use NullFragmenter, instead of SimpleFragmenter.
Fragmenter fragmenter = new NullFragmenter();
.....
highlighter.TextFragmenter = fragmenter;

I had the same issue, even with the NullFragmenter, it only returned roughly 51 kB of text.
By analyzing the objects, I found out that there is another property at the highligher which sets how large a fragment would be at maximum. Set this value to the length of your string, then the whole document will be processed.
highlighter.TextFragmenter = new NullFragmenter();
highlighter.MaxDocCharsToAnalyze = text.Length;

Setting Custom Menu Field values in Rightnow API of Oracle

How to set Custom Menu Field values in Rightnow API of Oracle ?
I have a Custom field of data type Menu like :
Custom field Name : user type
Data Type : Menu
Value can be : Free, Paid or Premium
Can any one send me the java code by solving this problem?
Thanks in Advance

The following link is from the Oracle Service Cloud developer documentation. It has an example of setting a contact custom field using Java and Axis2, which would likely give you most of the information that you need in order to set your custom field.
At a high level, you must create an Incident object and specific the ID of the incident that you want to update. Then, you must create the custom field object structure using generic objects (because each site can have its own unique custom fields). Ultimately, your SOAP envelope will contain the node structure that you build through your java code. Since you're trying to set a menu, the end result is that your custom field is a NamedID object. You'll set the lookup name of the menu to one of the three values that you give above.
I'm a C# guy myself, so my example is in C#, but it should be easy to port to Java using the link above as an example too.
public static void SetMenuTest()
{
Incident incident = new Incident();
incident.ID = new ID();
incident.ID.id = 1234;
incident.ID.idSpecified = true;
GenericField customField = new GenericField();
customField.name = "user_type";
customField.dataType = DataTypeEnum.NAMED_ID;
customField.dataTypeSpecified = true;
customField.DataValue = new DataValue();
customField.DataValue.Items = new object[1];
customField.DataValue.ItemsElementName = new ItemsChoiceType[18]; //18 is a named ID value. Inspect ItemChoiceTypes for values.
customField.DataValue.Items[0] = "Free"; //Or Paid, or Premium
customField.DataValue.ItemsElementName[0] = ItemsChoiceType.NamedIDValue;
GenericObject customFieldsc = new GenericObject();
customFieldsc.GenericFields = new GenericField[1];
customFieldsc.GenericFields[0] = customField;
customFieldsc.ObjectType = new RNObjectType();
customFieldsc.ObjectType.TypeName = "IncidentCustomFieldsc";
GenericField cField = new GenericField();
cField.name = "c";
cField.dataType = DataTypeEnum.OBJECT;
cField.dataTypeSpecified = true;
cField.DataValue = new DataValue();
cField.DataValue.Items = new object[1];
cField.DataValue.Items[0] = customFieldsc;
cField.DataValue.ItemsElementName = new ItemsChoiceType[1];
cField.DataValue.ItemsElementName[0] = ItemsChoiceType.ObjectValue;
incident.CustomFields = new GenericObject();
incident.CustomFields.GenericFields = new GenericField[1];
incident.CustomFields.GenericFields[0] = cField;
incident.CustomFields.ObjectType = new RNObjectType();
incident.CustomFields.ObjectType.TypeName = "IncidentCustomFields";
}

Lucene DrillSideways with custom sort field or with TopScoreDocCollector

I am trying to implement drillsideways search with Lucene 4.6.1.
Following code works fine:
DrillSideways ds = new DrillSideways(searcher, taxoReader);
FacetSearchParams fsp = new FacetSearchParams(getAllFacetCounts());
DrillDownQuery ddq = new DrillDownQuery(fsp.indexingParams, mainQuery);
List<CategoryPath> paths = new ArrayList<CategoryPath>();
...
add category path
...
if (paths.size() >0)
ddq.add(paths.toArray(new CategoryPath[paths.size()]));
DrillSidewaysResult dsr = ds.search(null, ddq, 500, fsp); // <-- here
TopDocs topDocs = dsr.hits;
ScoreDoc[] hits = topDocs.scoreDocs;
// list search results
listSearchResults(searcher, hits, Math.min(500, topDocs.totalHits));
But what if I want to pass TopScoreDocCollector, like
// for now it is top score collector,
// but I may want to implement custom sort
TopScoreDocCollector topDocsCollector = TopScoreDocCollector.create(500, true);
DrillSidewaysResult dsr = ds.search(ddq, topDocsCollector, fsp);
the result is empty set and no errors. What is wrong?

I'm guessing you are referring to the value of DrillSidewaysResult.hits, and it is intended behavior, as noted in the documentation of DrillSidewaysResult:
Note that if you called DrillSideways.search(DrillDownQuery, Collector, FacetSearchParams), then hits will be null.
You should get your hits from the Collector instead.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How can I update document without losing fields? - lucene

Related

Apache Lucene fuzzy search for multi-worded phrases

Creating a MoreLikeThis query yields empty queries and terms

Lucene.net highlight searched term in the text

Setting Custom Menu Field values in Rightnow API of Oracle

Lucene DrillSideways with custom sort field or with TopScoreDocCollector

Categories

Resources