Ravendb. Filter documents considered for suggestions

I would like to use a suggest query and filter the documents considered for suggestions by a few fields. Is that even possible? I could not find anything about this in the RavenDB documentation.
I have tried adding my filter conditions to the queryable, but no luck:
using (IDocumentSession documentSession = _storeProvider.GetStore().OpenSession())
{
    var queryable = documentSession.Query<SearchableProduct>("SearchableProducts");
    var result = queryable
        // I would like to filter by this field!
        .Where(m => m.BrandNo == query.BrandNumber)
        .Suggest(new SuggestionQuery
        {
            Term = query.SearchTerm,
            Accuracy = 0.4f,
            Field = nameof(SearchableProduct.ProductName),
            MaxSuggestions = 10,
            Distance = (StringDistanceTypes)2,
            Popularity = true
        });
    return result.Suggestions;
}
RavenDB version: 3.0

You cannot use additional filters on a suggestion query.
The way suggestions work, the search term is evaluated against the terms stored in the index, without considering any other fields that may apply there.
You can, though, use facets to filter on additional fields, and use the suggestion output as input to the facets.
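To make the two-step idea concrete, here is a purely conceptual, in-memory sketch in plain Java. It does not use any RavenDB API: the Product class, the suggest and filterByBrand helpers, and the sample data are all hypothetical, and a plain Levenshtein distance stands in for whatever the index's suggester does. Step 1 produces suggestions from every stored term without looking at the brand; step 2 plays the role of the facet or filtered query and keeps only suggestions that have at least one document for the requested brand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FilteredSuggestions {
    // Hypothetical stand-in for an indexed document.
    public static final class Product {
        final String name;
        final int brandNo;
        public Product(String name, int brandNo) {
            this.name = name;
            this.brandNo = brandNo;
        }
    }

    // Plain Levenshtein distance (assumption: stand-in for the real suggester).
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Step 1: suggestions come from all stored terms; the brand is not consulted.
    public static Set<String> suggest(List<Product> index, String term, int maxDistance) {
        Set<String> out = new TreeSet<>();
        for (Product p : index) {
            if (distance(term, p.name) <= maxDistance) out.add(p.name);
        }
        return out;
    }

    // Step 2: keep only suggestions with at least one document for the brand.
    public static List<String> filterByBrand(List<Product> index, Set<String> suggestions, int brandNo) {
        List<String> kept = new ArrayList<>();
        for (String s : suggestions) {
            for (Product p : index) {
                if (p.brandNo == brandNo && p.name.equals(s)) {
                    kept.add(s);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Product> index = List.of(
            new Product("hammer", 1), new Product("hamper", 2), new Product("hummer", 1));
        Set<String> all = suggest(index, "hamer", 1);
        System.out.println(all);                          // [hammer, hamper]
        System.out.println(filterByBrand(index, all, 1)); // [hammer]
    }
}
```

In a real application the second step would be a facet or filtered query against the index, issued with the suggested terms, rather than a loop over documents.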

Related

Lucene calculate term vectors for existing index

With Lucene.net I would like to get the term vectors as described in this stackoverflow question.
The problem is, the index is already generated with the field indexed and stored, but without term vectors.
FieldType type = new FieldType();
type.setIndexed(true);
type.setStored(true);
type.setStoreTermVectors(false);
Theoretically, it should be possible to re-calculate the term vectors for each document and then store it in the index.
Do you know how this could be possible, without deleting the complete Lucene index?

As mentioned in my comments in the question, you can generate term vector data on-the-fly, which may help you to avoid a complete rebuild of your indexed data.
In my scenario, I want to find the offset positions of my search term in the matched document.
I don't want to oversell this approach - it's absolutely not a substitute for re-indexing - but if your queries are basic, it may help.
Step 1: Perform whatever query you are currently performing.
For each document in the list of hits, you will then need to re-process the relevant field from that document - so, either you already have the field data stored in your existing index, or you will need to retrieve it from its original source.
Step 2: For each such field, you can re-use the same analyzer to build a token stream on-the-fly. The token stream can be configured with different attributes, such as:
token attributes
offset attributes
and others (see here)
Example:
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;
const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;
String? fieldName = null;
String fieldContent = "Foo Bar Baz Bar Bat";
String searchTerm = "bar";
var analyzer = new StandardAnalyzer(AppLuceneVersion);
// Dispose the token stream when finished; this replaces the original try/catch that only rethrew.
using (var ts = analyzer.GetTokenStream(fieldName, fieldContent))
{
    // Ask the stream to expose each token's text and its character offsets.
    var charTermAttr = ts.AddAttribute<ICharTermAttribute>();
    var offsetAttr = ts.AddAttribute<IOffsetAttribute>();
    ts.Reset();
    Console.WriteLine("");
    Console.WriteLine("Token: " + searchTerm);
    while (ts.IncrementToken())
    {
        // Compare the analyzed (lowercased) token with the search term.
        if (searchTerm.Equals(charTermAttr.ToString()))
        {
            var start = offsetAttr.StartOffset;
            var end = offsetAttr.EndOffset;
            Console.WriteLine(String.Format(" > offset: {0}-{1}", start, end));
        }
    }
    ts.End();
}
The above example assumes one of the hits from step 1 was a field containing "Foo Bar Baz Bar Bat" - with a search term of bar.
The output generated is:
Token: bar
> offset: 4-7
> offset: 12-15
So, as you can see, you are not re-executing a query - you are just re-processing a token stream. The more complex the original search term is, the harder it will be to make this approach work the way you probably need it to.
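The re-processing step can also be pictured without Lucene at all. The following is a minimal Java sketch, assuming a deliberately simplified tokenizer (runs of letters or digits, lowercased) in place of a real analyzer; the TermOffsets class and its find method are hypothetical names used only for illustration. It will not reproduce stemming, stop words, or any other analyzer behaviour, so offsets only line up with Lucene's for simple input like the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermOffsets {
    // Simplified stand-in for an analyzer: a token is a run of letters or digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    // Returns {start, end} pairs (end exclusive) for every token equal to searchTerm.
    public static List<int[]> find(String fieldContent, String searchTerm) {
        List<int[]> offsets = new ArrayList<>();
        Matcher m = TOKEN.matcher(fieldContent);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            if (token.equals(searchTerm)) {
                offsets.add(new int[] {m.start(), m.end()});
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println("Token: bar");
        for (int[] o : find("Foo Bar Baz Bar Bat", "bar")) {
            System.out.println(" > offset: " + o[0] + "-" + o[1]);
        }
    }
}
```

For the sample field this prints offsets 4-7 and 12-15, the same positions as the token stream example.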

How to get many terms matched using Hibernate Search query DSL?

When I search for "cars blue" I get every result that matches "cars" or "blue", but I need to match them both. I've read about setting some defaultOperator to AND, but I can't find where to do that.
Also, I can't use PhraseQuery because the order of the terms in the search query is irrelevant.
This is my code so far, thanks!
// create the query using Hibernate Search query DSL
QueryBuilder queryBuilder = fullTextEntityManager.getSearchFactory()
    .buildQueryBuilder().forEntity(Articulo.class).get();
// a very basic query by keywords
BooleanJunction<BooleanJunction> bool = queryBuilder.bool();
bool.must(queryBuilder.keyword()
    .onFields("description")
    .matching(text)
    .createQuery()
);
Query query = bool.createQuery();
FullTextQuery jpaQuery =
    fullTextEntityManager.createFullTextQuery(query, Articulo.class);
return jpaQuery.getResultList();
Note: I'm using Hibernate Search 5.6.4

I think you're looking for the Simple query string feature.
See http://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_simple_query_string_queries for more details about it.
You have an example with .withAndAsDefaultOperator():
Query luceneQuery = mythQB
.simpleQueryString()
.onField("history")
.withAndAsDefaultOperator()
.matching("storm tree")
.createQuery();
This blog post explaining the rationale of this feature might be helpful too: http://in.relation.to/2017/04/27/simple-query-string-what-about-it/ .

Limitations of Filters to search data

I am exploring how I can write a generic query for any Node given a set of search parameters, and came across org.neo4j.ogm.cypher.Filters (in neo4j-ogm-core-2.0.3.jar).
I would have liked to have more options for ComparisonOperator, such as CONTAINS, IN, STARTSWITH, etc.
Right now the operators supported are:
EQUALS("=")
MATCHES("=~")
LIKE("=~", new CaseInsensitiveLikePropertyValueTransformer())
GREATER_THAN(">")
LESS_THAN("<")
Is there any plan to enhance this to support more operations?
Here is an example of how I am using Filters:
public Collection<User> findUserByFirstNameLike(String firstName) {
    Filters filters = new Filters();
    Filter firstNameFilter = new Filter("firstName", firstName);
    firstNameFilter.setComparisonOperator(ComparisonOperator.LIKE);
    filters.add(firstNameFilter);
    Collection<User> users = session.loadAll(User.class, filters);
    return users;
}

Filters has been updated in neo4j-ogm-core version 2.1.0.
All 3 options you want to see (CONTAINS, IN, STARTS WITH) are available in this version along with:
LESS_THAN("<")
LESS_THAN_EQUAL("<=")
IS_NULL("IS NULL")
ENDING_WITH("ENDS WITH")
EXISTS("EXISTS")
IS_TRUE("=")

Lucene fuzzy search on a phrase (FuzzyQuery + SpanQuery)

I am looking for a way to write a Lucene fuzzy query that finds all documents relevant to an exact phrase. If I search for "mosa employee appreciata", a document that contains "most employees appreciate" should be returned as a result.
I tried to use:
FuzzyQuery query = new FuzzyQuery(new Term("contents", "mosa employee appreicata"));
Unfortunately, in practice it doesn't work. FuzzyQuery uses edit distance, so in theory "mosa employee appreciata" should match "most employees appreciate", provided an appropriate distance is given. It seems a bit odd.
Any clues? Thank you.

There are two likely problems here. First, I'm guessing the "contents" field is analyzed, so "most employees appreciate" is indexed not as one term but as three. Defining the query as a single term is not appropriate in this case.
However, even if the content were indexed as a single term, a second likely problem is that there is too much distance between the terms to get a match. The Damerau-Levenshtein distance between "mosa employee appreicata" and "most employees appreciate" is 4 (approximately the distance, incidentally, between my average first shot at spelling "Damerau-Levenshtein" and the correct spelling). FuzzyQuery, as of 4.0, handles edit distances of no more than 2, due to performance constraints and the assumption that larger distances are usually not particularly relevant.
If you need to perform a phrase query with fuzzy terms, you should look into either MultiPhraseQuery, or a combination of SpanQueries (especially SpanMultiTermQueryWrapper and SpanNearQuery) to meet your needs.
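SpanQuery[] clauses = new SpanQuery[3];
// Wrap each fuzzy term so it can take part in a span (positional) query.
clauses[0] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("contents", "mosa")));
clauses[1] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("contents", "employee")));
clauses[2] = new SpanMultiTermQueryWrapper<FuzzyQuery>(new FuzzyQuery(new Term("contents", "appreicata")));
// Slop 0, in order: the three fuzzy matches must be adjacent and in sequence.
SpanNearQuery query = new SpanNearQuery(clauses, 0, true);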
And since none of the individual terms have an edit distance greater than 2, this should be more effective.
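The per-term distances can be checked by hand with a small Java sketch. This is not Lucene's implementation: it computes the optimal string alignment distance (a restricted form of Damerau-Levenshtein in which insertions, deletions, substitutions, and swaps of two adjacent characters each cost 1), and the EditDistance class and osa method are hypothetical names chosen for this illustration.

```java
public class EditDistance {
    // Optimal string alignment distance between a and b.
    public static int osa(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // Deletion, insertion, substitution (or match).
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                // Transposition of two adjacent characters.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"mosa", "most"},
            {"employee", "employees"},
            {"appreicata", "appreciate"},
        };
        for (String[] p : pairs) {
            System.out.println(p[0] + " -> " + p[1] + ": " + osa(p[0], p[1]));
        }
    }
}
```

The three pairs come out at 1, 1, and 2, which sum to the phrase-level distance of 4 mentioned above while each term stays within the limit of 2.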

ComplexPhraseQueryParser handles fuzzy searching on the words of a phrase, i.e., you specify which words should be fuzzy searched and which should not. It works as follows:
Query query = new ComplexPhraseQueryParser("content", analyzer)
    .parse("some test~ query~ blah blah");
It seems to work nicely. I am not sure about performance, but it works well on small data sets.

I had some (very small) mileage with the following:
String[] searchTerms = searchString.split(" ");
FuzzyLikeThisQuery fltq = new FuzzyLikeThisQuery(searchTerms.length, new StandardAnalyzer());
Arrays.stream(searchTerms)
    .forEach(term -> fltq.addTerms(term, FIELD, SIMILARITY_IN_EDITS, PREFIX_LENGTH));
This query matches strings that are far too distant from the indexed content. The strings that don't match are those where each of the terms is more than 2 edits away from the terms in the indexed content.
Please use at your own peril.

The answer from femtoRgon is great! Thank you.
There is another way to solve this problem.
// declare a MultiPhraseQuery
MultiPhraseQuery childrenInOrder = new MultiPhraseQuery();
// use FuzzyTermEnum to enumerate the index terms close to each query word
FuzzyTermEnum fuzzyEnumeratedTerms1 = new FuzzyTermEnum(reader, new Term(searchField, "mosa"));
FuzzyTermEnum fuzzyEnumeratedTerms2 = new FuzzyTermEnum(reader, new Term(searchField, "employee"));
FuzzyTermEnum fuzzyEnumeratedTerms3 = new FuzzyTermEnum(reader, new Term(searchField, "appreicata"));
// this basically pulls the possible terms out of the index
Term termHolder1 = fuzzyEnumeratedTerms1.term();
Term termHolder2 = fuzzyEnumeratedTerms2.term();
Term termHolder3 = fuzzyEnumeratedTerms3.term();
// put the possible terms into the MultiPhraseQuery, falling back to the original word
if (termHolder1 == null) {
    childrenInOrder.add(new Term(searchField, "mosa"));
} else {
    childrenInOrder.add(fuzzyEnumeratedTerms1.term());
}
if (termHolder2 == null) {
    childrenInOrder.add(new Term(searchField, "employee"));
} else {
    childrenInOrder.add(fuzzyEnumeratedTerms2.term());
}
if (termHolder3 == null) {
    childrenInOrder.add(new Term(searchField, "appreicata"));
} else {
    childrenInOrder.add(fuzzyEnumeratedTerms3.term());
}
// close the enumerations - it is important to close them
fuzzyEnumeratedTerms1.close();
fuzzyEnumeratedTerms2.close();
fuzzyEnumeratedTerms3.close();

PDO fetchColumn() and fetchObject() which is better and proper usage

This has been bugging me: I have a query that returns a single row, and I need to get the value of each of its columns.
// Retrieve ticket information from the database
$r = db_query("SELECT title, description, terms_cond, image, social_status, sched_stat FROM giveaway_table WHERE ticket_id = :ticket_id",
    array(
        ':ticket_id' => $ticket_id
    ));
There are two ways that I can get data which is, by using fetchColumn() and fetchObject()
fetchObject()
$object = $r->fetchObject();
$ticket_info[] = $object->title;
$ticket_info[] = $object->description;
$ticket_info[] = $object->terms_cond;
$ticket_info[] = $object->image;
$ticket_info[] = $object->social_status;
$ticket_info[] = $object->sched_stat;
fetchColumn()
$title = $r->fetchColumn(); //Returns title column value
$description = $r->fetchColumn(1); //Returns description column value
I was wondering which one is better, or whether there are any pros and cons to each.
If possible, can you also suggest the best way (if there is one) to retrieve all the columns selected in a query and store them in an array with fewer lines of code?

There are two ways that I can get data which is, by using fetchColumn() and fetchObject()
really? what about fetch()?
There is a PDO tag wiki where you can find everything you need

I don't know the pros and cons of each. In my projects I have usually fetched rows as arrays rather than objects, because it was more convenient. But if you are building an ORM-style project, it may be better to use fetchObject and turn the result into your own object instead of a stdClass. You could write a constructor that takes one stdClass parameter and builds your object from it.
To answer your other question, you can fetch all columns using fetchAll().
Follow this link to learn more about this function: http://www.php.net/manual/en/pdostatement.fetchall.php
You can find more about database abstraction layers here: http://www.doctrine-project.org/
