I am using the eXist implementation of Lucene. Is there a query that would allow me to find, for instance, all occurrences of <span>A</span> B in a document? I.e., all Bs that occur within 1 word of <span>A</span>, but aren’t wrapped in their own elements?
This XPath should do the trick:
//span[. = 'A'][following-sibling::node()[1] = ' B']
This doesn't make use of eXist's Lucene-based full text index, but you haven't said if you've applied an index to the span element here. If there's another aspect to the challenge, please let me know.
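If you want to check the expression outside eXist, here is a minimal sketch using the XPath engine that ships with the JDK. The class name, method name, and sample markup are made up for illustration, and it runs against a plain in-memory document rather than an eXist collection.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of eXist or Lucene.
class SpanFollowedByText {
    // The XPath from the answer: a span whose text is "A" and whose
    // immediately following sibling node has the string value " B".
    static final String EXPR = "//span[. = 'A'][following-sibling::node()[1] = ' B']";

    // Counts how many spans in the given XML string match EXPR.
    static int countMatches(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                EXPR, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // B as bare text right after the span.
        System.out.println(countMatches("<p><span>A</span> B</p>"));
        // B wrapped in its own element; the next sibling is only a space.
        System.out.println(countMatches("<p><span>A</span> <span>B</span></p>"));
    }
}
```

Note that the comparison is an exact string match, so a text node such as " B and more" would not be selected by this expression.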
Related
I notice that in Django, when a sentence contains PLAZA/MASTERPIECE, searching for masterpiece does not find that sentence. Is this a limitation of PostgreSQL full text search? If so, how can I solve it?
finalquery = SearchQuery("keyword")
vector = SearchVector('thefieldIwanttosearch')
self.search_results = self.search_results.annotate(search=vector).filter(search=finalquery).annotate(rank=SearchRank(vector, finalquery))
Is there any documentation about this? Thanks!
Yes, this is all documented.
When you write filter(search=finalquery) you're not specifying a lookup type.
As a convenience when no lookup type is provided (like in Entry.objects.get(id=14)) the lookup type is assumed to be exact.
So you're filtering on an exact match for "masterpiece". What you probably want is contains or icontains.
I want to make the first n words of a document (n is a value I set) more important than the rest of the document in Lucene. How can I do that? I found something about boosting, but that boosts a whole field. My document is supposed to have only one field.
Would numbering the words at indexing time and boosting them be a solution? Something like this:
TextField myField = new TextField("text",termAtt.toString(),Store.YES);
myField.setBoost(2);
document.add(myField);
applied only while I haven't yet reached the n-th word in my document?
I want to get the following result: let's say that the first 20 words in a document are more important than the rest. I have 2 identical documents with more than 20 words each. I add the word I am searching for as the first word of one document and as the last word of the other, and I want the first document to get the higher score.
The best approach would be to simply create two different fields, one containing the higher value portion of the text (this wouldn't need to be stored), and the next containing the full text:
int leadinLength = 20; // note: substring() counts characters, not words
String text = termAtt.toString();
TextField myFieldLeadin = new TextField("text_leadin", text.substring(0, Math.min(leadinLength, text.length())), Store.NO);
TextField myField = new TextField("text", text, Store.YES);
myFieldLeadin.setBoost(2);
document.add(myFieldLeadin);
document.add(myField);
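Since the question talks about the first 20 words while String.substring works on characters, here is a minimal sketch of a helper that cuts the lead-in by words instead. LeadIn and leadingWords are hypothetical names, not Lucene API, and the sketch assumes that splitting on whitespace is close enough to what your analyzer treats as a word.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Lucene.
class LeadIn {
    // Returns the first n whitespace-separated words of text, joined by single spaces.
    // If the text has fewer than n words, all of them are returned.
    static String leadingWords(String text, int n) {
        String[] words = text.trim().split("\\s+");
        int count = Math.max(0, Math.min(n, words.length));
        return String.join(" ", Arrays.copyOfRange(words, 0, count));
    }

    public static void main(String[] args) {
        System.out.println(leadingWords("one two three four", 2));
    }
}
```

The returned string could then be used as the value of the lead-in field in place of a character-based substring.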
You could use a MultiFieldQueryParser to streamline searching in both fields at once, if desired, like:
QueryParser parser = new MultiFieldQueryParser(Version.LUCENE_48, new String[] {"text_leadin", "text"}, analyzer);
Query query = parser.parse("my search query");
TopDocs docs = searcher.search(query, 10);
Can someone help me get this code working? I have several select fields and I only want the last one in my variable.
variable = browser.elements_by_xpath('//div[@class="nested-field"]//select[last()]')
Thanks!
This is a FAQ: The [] operator in XPath has higher precedence (priority) than the // pseudo-operator. This is why brackets must be used to change the default operator priorities. There are at least several similar questions with good explanations -- search for them and read and understand.
Instead of:
//div[@class="nested-field"]//select[last()]
Use:
(//div[@class="nested-field"]//select)[last()]
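To see the difference the brackets make, here is a minimal sketch using the JDK's built-in XPath engine. The class name and the sample markup (three selects spread over two nested-field divs) are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for comparing the two expressions.
class LastSelectDemo {
    // Hypothetical markup: two nested-field blocks holding three selects in total.
    static final String MARKUP =
        "<form>"
      + "<div class='nested-field'><select id='s1'/><select id='s2'/></div>"
      + "<div class='nested-field'><select id='s3'/></div>"
      + "</form>";

    // Returns the id attributes of the elements selected by the expression.
    static List<String> ids(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(MARKUP)), XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Without brackets: last() is evaluated per parent element.
        System.out.println(ids("//div[@class='nested-field']//select[last()]"));
        // With brackets: last() is applied to the whole combined node-set.
        System.out.println(ids("(//div[@class='nested-field']//select)[last()]"));
    }
}
```

With this markup the first expression selects s2 and s3 (the last select inside each parent), while the bracketed one selects only s3.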
Is the class attribute an exact match?
If the markup is like this:
<div class="nested-field other">
...
then you'll have to either match the exact class value or use XPath contains().
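A minimal sketch of the difference, again with the JDK's XPath engine. The class name and sample markup are made up, and the concat/normalize-space expression is a common idiom for matching one class token, offered here as one option rather than the only way to do it.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class comparing exact and token-based class matching.
class ClassTokenMatch {
    // Matches only when the whole attribute value is exactly "nested-field".
    static final String EXACT = "//div[@class='nested-field']";
    // Matches when "nested-field" is one of the space-separated class names.
    static final String TOKEN =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' nested-field ')]";

    // Counts the nodes selected by expr in the given XML string.
    static int count(String expr, String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<r><div class='nested-field other'/></r>";
        System.out.println(count(EXACT, xml));
        System.out.println(count(TOKEN, xml));
    }
}
```

Padding the value with spaces keeps a plain contains(@class, 'nested-field') from also matching unrelated names such as nested-fieldset.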
I'm extracting terms from the query by calling ExtractTerms() on the Query object that I get as the result of QueryParser.Parse(). I get a Hashtable, but each item is presented as:
Key - term:term
Value - term:term
Why are the key and the value the same? And why is the term value duplicated and separated by a colon?
Do highlighters only insert tags, or do they do anything else? I want not only to get text fragments but also to highlight the source text (it's quite big). I am trying to get the terms and insert tags by hand using their offsets, but I'm not sure this is the right solution.
I think the answer to this question may help.
It is because .NET 2.0 doesn't have an equivalent to Java's HashSet. The conversion to .NET uses Hashtables with the same object as both key and value. The colon you see is just the result of Term.ToString(): a Term is a field name plus the term text, and your field name is probably "term".
To highlight an entire document using the Highlighter contrib, use the NullFragmenter.
I am using Lucene to allow a user to search for words in a large number of documents. Lucene seems to default to returning all documents containing any of the words entered.
Is it possible to change this behaviour? I know that '+' can be used to force a term to be included, but I would like to make that the default action.
Ideally I would like functionality similar to Google's: '-' to exclude words and "abc xyz" to group words.
Just to clarify
I also thought of inserting '+' into all spaces in the query. I just wanted to avoid detecting grouped terms (brackets, quotes etc) and potentially breaking the query. Is there another approach?
This looks similar to the Lucene Sentence Search question. If you're interested, this is how I answered that question:
String defaultField = ...;
Analyzer analyzer = ...;
QueryParser queryParser = new QueryParser(defaultField, analyzer);
queryParser.setDefaultOperator(QueryParser.Operator.AND);
Query query = queryParser.parse("Searching is fun");
Like Adam said, there's no need to do anything to the query string. QueryParser's setDefaultOperator does exactly what you're asking for.
Why not just preparse the user search input and adjust it to fit your criteria using the Lucene query syntax before passing it on to Lucene? Alternatively, you could just create some help documentation on how to use the standard syntax to create a specific query and let the user decide how the query should be performed.
Lucene has an extensive query language, as described here, that covers everything you want except for + being the default, but that's something you can simply handle by replacing spaces with +. So the only thing you need to do is define the format you want people to enter their search queries in (I would strongly advise adhering to the default Lucene syntax), and then you can write the transformations from your own syntax to the Lucene syntax.
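If you do go the preprocessing route, blindly replacing spaces would also rewrite the spaces inside quoted phrases. Below is a rough sketch of a preprocessor that leaves quoted phrases and already prefixed terms alone. RequireAllTerms and requireAll are hypothetical names, only the standard library is used, and the sketch makes no attempt to handle parentheses, AND/OR/NOT keywords, or escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessor, not part of Lucene.
class RequireAllTerms {
    // A token is either an optionally prefixed quoted phrase (plus any
    // trailing modifier such as ~2), or a run of non-space characters.
    private static final Pattern TOKEN = Pattern.compile("[+-]?\"[^\"]*\"\\S*|\\S+");

    // Prefixes every token that has no + or - prefix with '+'.
    static String requireAll(String userQuery) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(userQuery);
        while (m.find()) {
            String token = m.group();
            boolean prefixed = token.startsWith("+") || token.startsWith("-");
            out.add(prefixed ? token : "+" + token);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        System.out.println(requireAll("lucene -java \"full text\""));
    }
}
```

The limitations listed above are a good argument for letting the query parser's default operator do this work where that option is available.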
The behavior is hard-coded in method addClause(List, int, int, Query) of class org.apache.lucene.queryParser.QueryParser, so the only way to change the behavior (other than the workarounds above) is to change that method. The end of the method looks like this:
if (required && !prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST));
else if (!required && !prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.SHOULD));
else if (!required && prohibited)
clauses.addElement(new BooleanClause(q, BooleanClause.Occur.MUST_NOT));
else
throw new RuntimeException("Clause cannot be both required and prohibited");
Changing "SHOULD" to "MUST" should make clauses (e.g. words) required by default.