RavenDB MoreLikeThis - understanding results - lucene

I'm using Lucene's MoreLikeThis function with RavenDB, very much as per the docs:
MyDocument[] docs = session
    .Advanced
    .MoreLikeThis<MyDocument>(
        "Documents/MoreLikeThis",
        null,
        new MoreLikeThisQuery
        {
            IndexName = "Documents/MoreLikeThis",
            DocumentId = "Document/1",
            Fields = new[] { "Body" },
        });
It's performing quite well - some of the results are clearly similar. Others less apparently so. But I can't see any way to understand why a given match is returned - I have only an array of MyDocument.
Is there any way to get a better insight into the response? Some sort of similarity "score" or measure?

According to the sorting documentation, the score should be in the document's metadata under Temp-Index-Score.
Lucene can also explain in more detail how a score was arrived at, via IndexSearcher.explain. I couldn't find any reference to RavenDB exposing that explanation, but I mention it in case I simply missed it.
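As a rough illustration of using that score, here is a minimal Python sketch that ranks results by it. It assumes each returned document is available as JSON with an "@metadata" object containing "Temp-Index-Score", as the answer above suggests; the "@id" key, the sample documents, and the function name are hypothetical and only there for illustration.

```python
def rank_by_score(documents):
    """Return (id, score) pairs sorted from highest to lowest score.

    Documents without a Temp-Index-Score entry are skipped.
    """
    scored = []
    for doc in documents:
        metadata = doc.get("@metadata", {})
        score = metadata.get("Temp-Index-Score")
        if score is not None:
            scored.append((metadata.get("@id"), float(score)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data, not real RavenDB output.
sample = [
    {"Body": "first", "@metadata": {"@id": "Document/3", "Temp-Index-Score": 0.41}},
    {"Body": "second", "@metadata": {"@id": "Document/7", "Temp-Index-Score": 0.82}},
    {"Body": "third", "@metadata": {"@id": "Document/9"}},
]

for doc_id, score in rank_by_score(sample):
    print(doc_id, score)
```

Whether the score is actually present for MoreLikeThis results would need to be checked against the server version in use.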

Related

Laravel Scout toSearchableArray attribute is not filterable

I've been doing some testing with laravel scout and according to the documentation (https://laravel.com/docs/8.x/scout#configuring-searchable-data), I've mapped my User model as such:
/**
 * Get the indexable data array for the model.
 *
 * @return array
 */
public function toSearchableArray()
{
    $data = $this->toArray();

    return array_merge($data, [
        'entity' => 'An entity'
    ]);
}
Just for the sake of testing, this is what I reduced it to while debugging.
After importing the User model with this mapping, I can see on the Meilisearch dashboard that it does show the user data plus entity = 'An entity'.
However, when applying this:
User::search('something')->where('entity', 'An entity')->get()
It produces the following error:
"message": " --> 1:1\n |\n1 | entity=\"An entity\"\n | ^----^\n |\n = attribute `entity` is not filterable, available filterable attributes are: ",
"exception": "MeiliSearch\\Exceptions\\ApiException",
"file": "/var/www/api/vendor/meilisearch/meilisearch-php/src/Http/Client.php",
Tracing back to inspect the 'filterable attributes', I found the following:
$client = app(\MeiliSearch\Client::class);
dump($client->index('users')->getFilterableAttributes()); // Returns []
$client->index('users')->updateFilterableAttributes(['entity']);
dump($client->index('users')->getFilterableAttributes()); // Returns ['entity']
Forcing updateFilterableAttributes now lets me complete the search as intended, but I don't feel this should be the regular behaviour. If it's mapped in toSearchableArray, shouldn't it be searchable? What am I not understanding, and what other approaches are there to achieve this goal?
This is not a bug but a requirement specific to Meilisearch. Under the hood, Scout supports several drivers for indexing ("algolia", "meilisearch", "database", "collection" and even "null"), and each indexes differently; unifying them would, I believe, be troublesome and inefficient for Scout.
Filtering, or faceted search as Meilisearch calls it, requires declaring the filterable attributes first, and that list is empty by default for document (model, in Laravel terms) fields.
Quoting from the docs:
This step is mandatory and cannot be done at search time. Filters need
to be properly processed and prepared by Meilisearch before they can
be used.
Updating filterableAttributes requires recreating the entire
index. This may take a significant amount of time depending on your
dataset size.
For more info please refer to meilisearch official docs https://docs.meilisearch.com/learn/advanced/filtering_and_faceted_search.html
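For illustration, the same setting can be applied outside Laravel by calling the Meilisearch settings endpoint directly. The sketch below only builds the request and does not send it. It assumes a server at http://localhost:7700 and an index named users (as in the question); the function name is hypothetical, and the PUT method reflects recent Meilisearch releases as far as I know (older releases used POST for this route), so check the docs for your version.

```python
import json
import urllib.request


def filterable_attributes_request(host, index_uid, attributes, method="PUT"):
    """Build (but do not send) a request that sets an index's filterable attributes."""
    url = f"{host.rstrip('/')}/indexes/{index_uid}/settings/filterable-attributes"
    body = json.dumps(list(attributes)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Assumed host and index name; adjust to your setup.
request = filterable_attributes_request("http://localhost:7700", "users", ["entity"])
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Running this once as part of deployment (rather than at search time) matches the requirement quoted above that filters must be prepared before they are used.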

How to query RavenDB using HTTP API for all documents of a type

I'm trying to query RavenDB using the HTTP client for all documents by type.
I would like a collection of the documents with a given type.
I understand that there might be a limitation: only the first 1024 documents will be returned.
I am well under that number and besides it's for a proof of concept.
I am able to obtain all the documents using the following syntax:
http://localhost:8080/databases/{database name}/docs/
I see that I could use the @metadata field to get the documents of the type I want but I don't know the syntax.
Since the HTTP api allows you to query indexes, I attempted to write a static index.
When I wrote the index from Raven Studio, the index was not returning the documents of the type I wanted. It was giving zero results.
from doc in docs.MyType
select new { doc };
I also tried this:
from doc in docs
let Tag = doc["@metadata"]["Raven-Entity-Name"]
where Tag == "MyType"
select new { doc };
You can do it using:
http://localhost:8080/databases/{database name}/indexes/dynamic/CollectionName
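A small Python sketch of building that URL follows. The server address mirrors the one in the question; the database name Demo, the collection name MyTypes, and the helper itself are hypothetical, and the pageSize query parameter is an assumption about the HTTP API that should be verified against your server version. Note that the collection name is the value stored in Raven-Entity-Name, which the .NET client pluralizes by default, so a type called MyType usually ends up in a collection called MyTypes.

```python
from urllib.parse import quote, urlencode


def collection_query_url(server, database, collection, page_size=None):
    """Build the dynamic-index URL that returns documents of one collection."""
    url = (
        f"{server.rstrip('/')}/databases/{quote(database)}"
        f"/indexes/dynamic/{quote(collection)}"
    )
    if page_size is not None:
        # Assumed paging parameter; omit it to use the server default.
        url += "?" + urlencode({"pageSize": page_size})
    return url


print(collection_query_url("http://localhost:8080", "Demo", "MyTypes"))
print(collection_query_url("http://localhost:8080", "Demo", "MyTypes", page_size=128))
```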

Best design approach to query documents for 'labels'

I am storing documents - and each document has a collection of 'labels' - like this. Labels are user defined, and could be any plain text.
{
    "FeedOwner": "4ca44f7d-b3e0-4831-b0c7-59fd9e5bd30d",
    "MessageBody": "blablabla",
    "Labels": [
        {
            "IsUser": false,
            "Text": "Mine"
        },
        {
            "IsUser": false,
            "Text": "Incomplete"
        }
    ],
    "CreationDate": "2012-04-30T15:35:20.8588704"
}
I need to allow the user to query for any combination of labels, i.e.
"Mine" OR "Incomplete"
"Incomplete" only
or
"Mine" AND NOT "Incomplete"
This results in Raven queries like this:
Query: (FeedOwner:25eb541c\-b04a\-4f08\-b468\-65714f259ac2) AND (Labels,Text:Mine) AND (Labels,Text:Incomplete)
I realise that Raven will generate a 'dynamic index' for queries it has not seen before, and I can see that this could result in a lot of indexes.
What would be the best approach to achieving this functionality with Raven?
[EDIT]
This is my Linq, but I get an error from Raven "All is not supported"
var result = from candidateAnnouncement in session.Query<FeedAnnouncement>()
             where listOfRequiredLabels.All(
                 requiredLabel => candidateAnnouncement.Labels.Any(
                     candidateLabel => candidateLabel.Text == requiredLabel))
             select candidateAnnouncement;
[EDIT]
I had a similar question, and the answer for that resolved both questions: Raven query returns 0 results for collection contains
Note that if FeedOwner is unique per document, the query makes little sense, because it can match at most one document. In that case, you should do the label check on the client using standard LINQ to Objects.
Now, given that FeedOwner is not something unique, your query is basically correct. However, depending on what you actually want to return, you may need to create a static index instead:
If you rely on the dynamically generated indexes, the query always returns whole documents, and you cannot tell which particular labels matched. If that is acceptable, go with that approach and let the query optimizer do its job; build the index upfront only if you have a really large number of documents.
In the other case, where you want the actual labels as the query result, you have to build a simple map index upfront that covers the fields you want to query on; in your sample these would be FeedOwner and the Text of every label. You have to use FieldStorage.Yes on the fields you want a query to return, so enable it on the Text property of your labels. There is no need to do so for FeedOwner, because it is part of the actual document, which Raven gives you as part of any query result. Please refer to Raven's documentation to see how to build a static index and use field storage.
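To make the label combinations from the question concrete, here is a minimal Python sketch that assembles the query string in the same shape as the one shown in the question. It is not Raven client code: the function names are hypothetical, the escaped character set is a simplified subset of Lucene's special characters, and labels containing whitespace are not handled.

```python
# Simplified set of characters that Lucene query syntax treats as special.
LUCENE_SPECIAL = set('+-!(){}[]^"~*?:\\')


def escape_term(text):
    """Backslash-escape Lucene special characters in a single term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def label_clause(label):
    return f"(Labels,Text:{escape_term(label)})"


def build_query(feed_owner, all_of=(), any_of=(), none_of=()):
    """Combine required, optional (OR) and excluded labels into one query."""
    clauses = [f"(FeedOwner:{escape_term(feed_owner)})"]
    clauses += [label_clause(label) for label in all_of]
    if any_of:
        clauses.append("(" + " OR ".join(label_clause(l) for l in any_of) + ")")
    query = " AND ".join(clauses)
    for label in none_of:
        query += f" AND NOT {label_clause(label)}"
    return query


owner = "25eb541c-b04a-4f08-b468-65714f259ac2"
print(build_query(owner, any_of=["Mine", "Incomplete"]))      # "Mine" OR "Incomplete"
print(build_query(owner, all_of=["Incomplete"]))              # "Incomplete" only
print(build_query(owner, all_of=["Mine"], none_of=["Incomplete"]))  # "Mine" AND NOT "Incomplete"
```

Every combination built this way touches only the same two fields, FeedOwner and the label Text.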

Understanding Orchard Joins and Data Relations

In Orchard, how is a module developer able to learn how "joins" work, particularly when joining to core parts and records? One of the better helps I've seen was in Orchard documentation, but none of those examples show how to form relations with existing or core parts. As an example of something I'm looking for, here is a snippet of module service code taken from a working example:
_contentManager
    .Query<TaxonomyPart>()
    .Join<RoutePartRecord>()
    .Where(r => r.Title == name)
    .List()
In this case, a custom TaxonomyPart is joining with a core RoutePartRecord. I've investigated the code, and I can't see how a TaxonomyPart is "joinable" to a RoutePartRecord. Likewise, from working code, here is another snippet of driver code which relates a custom TagsPart with a core CommonPartRecord:
List<string> tags = new List<string> { "hello", "there" };
IContentQuery<TagsPart, TagsPartRecord> query = _cms.Query<TagsPart, TagsPartRecord>();
query.Where(tpr => tpr.Tags.Any(t => tags.Contains(t.TagRecord.TagName)));
IEnumerable<TagsPart> parts =
    query.Join<CommonPartRecord>()
        .Where(cpr => cpr.Id != currentItemId)
        .OrderByDescending(cpr => cpr.PublishedUtc)
        .Slice(part.MaxItems);
I thought I could learn from either of the prior examples of how to form my own query. I did this:
List<string> tags = new List<string> { "hello", "there" };
IContentQuery<TagsPart, TagsPartRecord> query = _cms.Query<TagsPart, TagsPartRecord>();
query.Where(tpr => tpr.Tags.Any(t => tags.Contains(t.TagRecord.TagName)));
var stuff =
    query.Join<ContainerPartRecord>()
        .Where(ctrPartRecord => ctrPartRecord.ContentItemRecord.ContentType.Name == "Primary")
        .List();
The intent of my code is to limit the content items found to only those of a particular container (or blog). When the code ran, it threw an exception on my join query saying {"could not resolve property: ContentType of: Orchard.Core.Containers.Models.ContainerPartRecord"}. This leads to a variety of questions:
1. Why in the driver's Display() method of the second example is the CommonPartRecord populated, but not the ContainerPartRecord? In general how would I know what part records are populated, and when?
2. In the working code snippets, how exactly is the join working since no join key/condition is specified (and no implicit join keys are apparent)? For example, I checked the data migration file and models classes, and found no inherent relation between a TagsPart and a CommonPartRecord. Thus, besides looking at that sample code, how would anyone have known in the first place that such a join was legal or possible?
3. Is the join I tried with TagsPart and ContainerPartRecord legal in any context? Which?
4. Is the query syntax of these examples primarily a reflection of Orchard, of NHibernate, or LINQ to NHibernate? If it is primarily a reflection of NHibernate, then which NHibernate book or article is recommended reading so that I can dig deeper into Orchard?
It seems there is a hole in the documentation regarding these kinds of thoughts and questions, which makes it hard to write a module. Whatever answers can be found for this topic, I'd be glad to compile into an article or community Orchard documentation.
1. The join is only there to enable the where that follows it. It doesn't mean that the part being joined will be actually brought down from the DB. That will happen no matter what with the latest 1.x source, and will happen lazily with 1.3.
2. You don't need a condition as you can only join parts this way. The join condition is implicit: parts are joined by the item id.
3. Yes. What is not legal is that the condition in the where is using data that is not available from the joined part records.
4. Those examples are all Orchard Content Manager queries, so they are fairly constrained, but also fairly easy to build as long as you don't step outside of their boundaries because so much can be assumed and will happen implicitly. If you need more control, you could use the new HQL capabilities that were added in the latest 1.x drops.
As for holes in the documentation, well, but of course. The documentation that we have today is only covering a very small part of the platform. Your best reference today is the source code. Any contribution you could make to this is highly appreciated by us and by the rest of the community. Let me know if you need help with this.
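To illustrate the implicit join described in this answer, here is a conceptual Python model. It is not Orchard or NHibernate code: the record classes, field names, and functions are hypothetical stand-ins that only show the idea that part records are matched on the content item id, after which the where, order, and slice steps operate on the joined record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TagsPartRecord:
    content_item_id: int
    tags: List[str]


@dataclass
class CommonPartRecord:
    content_item_id: int
    published_utc: str  # ISO 8601 text, so string order equals time order


def join_on_item_id(tag_records, common_records):
    """Pair up part records that belong to the same content item."""
    common_by_id = {c.content_item_id: c for c in common_records}
    return [
        (t, common_by_id[t.content_item_id])
        for t in tag_records
        if t.content_item_id in common_by_id
    ]


def related_items(tag_records, common_records, wanted_tags, current_item_id, max_items):
    """Mimic the second snippet: filter by tag, exclude the current item, newest first."""
    wanted = set(wanted_tags)
    rows = [
        (t, c)
        for t, c in join_on_item_id(tag_records, common_records)
        if wanted.intersection(t.tags) and c.content_item_id != current_item_id
    ]
    rows.sort(key=lambda row: row[1].published_utc, reverse=True)
    return [t.content_item_id for t, _ in rows[:max_items]]


tag_rows = [
    TagsPartRecord(1, ["hello"]),
    TagsPartRecord(2, ["there", "x"]),
    TagsPartRecord(3, ["other"]),
    TagsPartRecord(4, ["hello"]),
]
common_rows = [
    CommonPartRecord(1, "2012-01-01T00:00:00"),
    CommonPartRecord(2, "2012-03-01T00:00:00"),
    CommonPartRecord(4, "2012-02-01T00:00:00"),
]
print(related_items(tag_rows, common_rows, ["hello", "there"], current_item_id=1, max_items=5))
```

In this model the where clause can only see fields of the joined record, which mirrors the point above that a condition reaching for data outside the joined part record is what fails.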

Advice on polling for new documents in RavenDB

I want to poll for new documents in my RavenDB database. What is the recommended way of doing this? Can I use the IndexTimestamp or can I rely on the order of the documents?
I guess I want to either do it in two steps:
1. Check if there is anything new, if so:
1.1. Get the latest X documents.
Or in one step: Get the latest X documents and have it either return those or tell me that there's nothing new, according to some argument I send.
FYI: I have no corresponding CLR objects to the documents.
I would not poll for it, but I would use the Changes API included with RavenDB to just get the continuous stream of documents from RavenDB.
Check out the Changes API here http://ravendb.net/docs/2.0/client-api/changes-api
I personally would use the Changes API with some kind of Message Bus (RabbitMQ) to make sure every change is processed and resilient.
If you still want to poll, just create an index with your date time and sort in descending order.
var result = session.Query<Orders>()
    .OrderByDescending(x => x.Created)
    .Take(10)
    .ToList();
If you need to process every document, you might want to create marker documents that include the id of the document you get and make sure they have not been processed.
To do that:
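Marker id: polling/processed/orders/1 (the "polling/processed/" prefix followed by the order's own id, which is what the LoadDocument call in the index below builds)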
Index:
from o in orders
let processed = LoadDocument("polling/processed/" + o.Id)
select new {
    WasProcessed = processed != null,
    Created = o.Created
}
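The marker-document idea can be sketched independently of RavenDB. The Python model below uses a plain dictionary as a stand-in for the document store; the order ids (orders/1 and so on), field names, and helper functions are hypothetical, and only the "polling/processed/" prefix comes from the index above.

```python
MARKER_PREFIX = "polling/processed/"


def marker_id(document_id):
    """Id of the marker document that records a processed document."""
    return MARKER_PREFIX + document_id


def unprocessed(orders, store):
    """Orders that have no marker yet, newest first."""
    pending = [o for o in orders if marker_id(o["Id"]) not in store]
    return sorted(pending, key=lambda o: o["Created"], reverse=True)


def mark_processed(order, store):
    """Write the marker so the order is skipped on the next poll."""
    store[marker_id(order["Id"])] = {"ProcessedFor": order["Id"]}


# In-memory stand-ins for the database contents.
orders = [
    {"Id": "orders/1", "Created": "2013-01-01T10:00:00"},
    {"Id": "orders/2", "Created": "2013-01-02T10:00:00"},
    {"Id": "orders/3", "Created": "2013-01-03T10:00:00"},
]
store = {}

mark_processed(orders[1], store)
for order in unprocessed(orders, store):
    print(order["Id"])
```

Each poll then processes whatever unprocessed() returns and writes a marker per document, so a crash between polls only causes a document to be retried rather than lost.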
A few options for you, hope that helps :)