Apache Solr 7.4: Nesting with "_childDocuments_" not working, the document is still flat

My use case is this: I have a Parent -> Children -> GrandChildren hierarchy.
I would like to ingest documents as nested documents and run BlockJoin queries to retrieve, for example, all grandchildren of a particular parent or all children of a particular parent.
I have defined the fields, copy fields, and field types my application needs in the schema (using curl). I have also defined "text" as a copy field for everything, because I have to support arbitrary searches.
I have defined the document to ingest as follows:
{
"id": "3443",
"path": "1.employee",
"employeeId": 3443,
"employeeName": "Tom",
"employeeCounty": "Maricopa",
"_childDocuments_": [{
"id": "3443.54545454",
"path": "2.employee.assets",
"assetId": 54545454,
"assetName": "Lenovo",
"assetType": "Laptop",
"_childDocuments_": [{
"id": "3443.54545454.5764646",
"path": "3.employee.assets.assetType",
"processorId": 5764646,
"processorType": "Intel core i7"
}]
}]
}
Now when I query using the Admin UI, I get the following flattened-out object, and block join queries don't work either:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"_":"1533252181415"}},
"response":{"numFound":1,"start":0,"docs":[
{
"id":"3443",
"employeeId":3443,
"text":["3443",
"Tom",
"Maricopa"],
"employeeName":"Tom",
"employeeCounty":"Maricopa",
"_childDocuments_.id":[3443.54545454,
3443.643534544],
"_childDocuments_.path":["2.employee.assets"],
"_childDocuments_.assetId":[54545454,
643534544],
"_childDocuments_.assetName":["Lenovo"],
What am I missing? How can I make Solr process the nested documents like they are supposed to be rather than flattening them out?
Any help is appreciated.

Found the solution. I was posting to the wrong URL.
I was using http://localhost:8983/solr/my-core/update/json/docs
Instead I should just use http://localhost:8983/solr/my-core/update
because the document is already in Solr's own format, so Solr needn't do any special processing to index it.
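A minimal sketch of that fix in Python, using only the standard library. It builds (but does not send) a POST to the plain /update handler with the nested document wrapped in a JSON array. The URL and core name come from the post above; the commit=true parameter and the helper name build_update_request are assumptions for illustration, and the document is a trimmed copy of the one in the question.

```python
import json
from urllib.request import Request

# URL from the answer above; commit=true is an assumption so the document
# becomes searchable right away.
SOLR_UPDATE_URL = "http://localhost:8983/solr/my-core/update?commit=true"


def build_update_request(docs):
    """Build a POST that sends Solr-format documents to the /update handler."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


employee = {
    "id": "3443",
    "path": "1.employee",
    "employeeName": "Tom",
    "_childDocuments_": [{
        "id": "3443.54545454",
        "path": "2.employee.assets",
        "assetName": "Lenovo",
        "_childDocuments_": [{
            "id": "3443.54545454.5764646",
            "path": "3.employee.assets.assetType",
            "processorType": "Intel core i7",
        }],
    }],
}

# A top-level JSON array is read as a list of documents to add.
request = build_update_request([employee])
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

The only difference from the failing setup is the path: /update/json/docs treats the body as arbitrary JSON to be flattened into fields, while /update expects Solr's own document format, including "_childDocuments_".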


How do I join and project a multi-map index?

I'm struggling to get my head around this one and I know the way to do this is through a custom index. Essentially, I have several collections that share some common properties, one of which is "system id" which describes a many-to-one relationship, e.g.
// Id() = "component:a"
{
"Name": "Component A",
"SystemId": "system:foo"
}
// Id() = "resource:a"
{
"Name": "Resource A",
"SystemId": "system:foo"
}
So these are two example objects which live in two different collections, Components and Resources, respectively.
I have another collection called "Notifications" and they have a RecipientID which refers to the Id of one of the entities described above. e.g.
// Id() = "Notifications/84-A"
{
"RecipientID": "component:a",
"Message": "hello component"
}
// Id() = "Notifications/85-A"
{
"RecipientID": "resource:a",
"Message": "hello resource"
}
The query that I want to be able to perform is pretty straightforward -- "Give me all notifications that are addressed to entities which have a system of ''" -- but I also want to make sure I have some other bits from the entities themselves, such as their name, so a result object something like this:
{
"System": "system:foo",
"Notifications": [{
"Entity": "component:a",
"EntityName": "Component A",
"Notifications": [{
"Id": "Notifications/84-A",
"Message": "hello component"
}]
}, {
"Entity": "resource:a",
"EntityName": "Resource A",
"Notifications": [{
"Id": "Notifications/85-A",
"Message": "hello resource"
}]
}]
}
Where I am with it right now is that I'm creating an AbstractMultiMapIndexCreationTask which has maps for all of these different entities, and then I'm trying to use the reduce function to mimic the relationship.
Where I'm struggling is with how these map reduces work. They require that the shape is identical between maps as well as the reduce output. I've tried some hacky things like including all of the notifications as part of the map and putting dummy values where the properties don't match, but besides it making it really hard to read/understand, I still couldn't figure out the reduce function to make it work properly.
How can I achieve this? I've been digging for examples, but all of the examples I've come across make some assumptions about how references are arranged. For example, I don't think I can use Include because of the different collections (I don't know how Raven would know what to query), and coming from the other direction, the entities don't have enough info to include or load notifications (I haven't found any 'load by query' function). The map portion of the index seems fine: I can say 'give me all the entities that share this system', but the next part of grabbing the notifications and creating that response above has eluded me. I can't really do this in multiple steps either, because I also need to be able to page. I've gone in circles a lot and I could use some help.
How about indexing the related docs?
Something like this (a javascript index):
map("Notifications", (n) => {
let c = load(n.RecipientID, 'Components');
let r = load(n.RecipientID, 'Resources');
return {
Message: n.Message,
System: c.SystemId || r.SystemId,
Name: c.Name || r.Name,
RelatedDocId: id(c) || id(r)
};
})
Then you can query on this index, filter by the system value, and get all matching notifications docs.
i.e. sample query:
from index 'yourIndexName'
where System == "system:foo"
Info about related documents is here:
RavenDB Demo
https://demo.ravendb.net/demos/csharp/related-documents/index-related-documents
Documentation
https://ravendb.net/docs/article-page/5.4/csharp/indexes/indexing-related-documents
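The index above yields one flat row per notification, so the nested shape from the question still has to be assembled by the caller. A rough client-side sketch in Python follows. It assumes each row is a plain dict exposing the index fields (System, Name, RelatedDocId, Message) plus the notification's Id; how those rows are fetched and projected from RavenDB is not shown, and the function name group_notifications is hypothetical.

```python
def group_notifications(system_id, rows):
    """Group flat index rows into one entry per recipient entity."""
    entities = {}  # keyed by entity id; dicts keep insertion order
    for row in rows:
        entry = entities.setdefault(
            row["RelatedDocId"],
            {
                "Entity": row["RelatedDocId"],
                "EntityName": row["Name"],
                "Notifications": [],
            },
        )
        entry["Notifications"].append({"Id": row["Id"], "Message": row["Message"]})
    return {"System": system_id, "Notifications": list(entities.values())}


# Example rows, shaped like the documents in the question.
rows = [
    {"Id": "Notifications/84-A", "Message": "hello component",
     "System": "system:foo", "Name": "Component A", "RelatedDocId": "component:a"},
    {"Id": "Notifications/85-A", "Message": "hello resource",
     "System": "system:foo", "Name": "Resource A", "RelatedDocId": "resource:a"},
]
result = group_notifications("system:foo", rows)
```

Because the grouping happens after the query, paging applies to notifications rather than to entities; whether that is acceptable depends on the use case.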

In Scrapy, download files nested below the top level of a yielded item dict

To download files in Scrapy, one adds the key 'file_urls' to the yielded item dict with a value of the URLs to download. But my files are nested somewhere below the top level of the yielded dict. An item looks like this:
{
"title": "foo",
"files": {
"drawings": [
{
"caption": "bar",
"fileurl": "http://foo.com/foo/foo.pdf"
},
{
"caption": "second floor",
"fileurl": "http://foo.com/foo/bar.pdf"
}
],
"photos": [
{
"caption": "bar",
"fileurl": "http://foo.com/foo/baz.pdf"
}
]
}
}
Ideally, I'd like each file downloaded and have scrapy add its "file" element next to the "fileurl". But this does not seem to work automatically.
How can I achieve this? The current version of Scrapy is 1.6.0.
To do something like this, you will need to make your own subclass of scrapy's FilesPipeline.
To make the downloading happen, you'll need a custom get_media_requests method, which should get the URLs from your item and return an iterable of requests which will be used to download the files.
After that, you'll also need to modify the item_completed and/or the file_downloaded method to store the result in the exact way you want.
If you need more details than what's provided in the docs, take a look at the source and see how the existing pipeline works.
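As a rough sketch of the two pieces of logic such a subclass needs, here are two plain Python helpers written against the item shape from the question. The names iter_file_entries and attach_results are hypothetical, and the code deliberately avoids importing Scrapy so it can be tried on its own. It relies on item_completed receiving (success, file_info_or_failure) tuples in the same order as the requests returned by get_media_requests.

```python
def iter_file_entries(item):
    """Yield every nested dict under item["files"] that has a "fileurl"."""
    for entries in item.get("files", {}).values():
        for entry in entries:
            if "fileurl" in entry:
                yield entry


def attach_results(item, results):
    """Store each successful download next to its "fileurl" as "file"."""
    for entry, (ok, info) in zip(iter_file_entries(item), results):
        if ok:
            entry["file"] = info
    return item


item = {
    "title": "foo",
    "files": {
        "drawings": [
            {"caption": "bar", "fileurl": "http://foo.com/foo/foo.pdf"},
            {"caption": "second floor", "fileurl": "http://foo.com/foo/bar.pdf"},
        ],
        "photos": [
            {"caption": "bar", "fileurl": "http://foo.com/foo/baz.pdf"},
        ],
    },
}

urls = [entry["fileurl"] for entry in iter_file_entries(item)]

# Simulated pipeline results: the second download failed.
results = [
    (True, {"url": urls[0], "path": "full/a.pdf", "checksum": "aaa"}),
    (False, Exception("download failed")),
    (True, {"url": urls[2], "path": "full/c.pdf", "checksum": "ccc"}),
]
attach_results(item, results)
```

In the FilesPipeline subclass, get_media_requests(self, item, info) could return a scrapy.Request for each entry["fileurl"] produced by iter_file_entries(item), and item_completed(self, results, item, info) could return attach_results(item, results).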

Rendering same component (with synced data) twice on same page

I've read a lot of documentation but I can't get the following use case to work:
I've got a component 'product-filter'. This component contains the child component 'product-filter-option', which renders an individual filter option (checkbox with label).
The json data for a product-filter instance looks like:
"name": "category",
"title": "Category",
"options": [
{
"value": "value",
"label": "Label 1",
"active": true,
"amount": 8
},
{
"value": "value2",
"label": "Label 2",
"amount": 15
},
etc.
]
I've got multiple instances of product-filter (and a lot of product-filter-option instances) on my page. So far so good.
Now I'd like to render one of my filters (eg. the given Category filter) multiple times on my page (sort of current 'highlighted' filter, which can change during user interaction).
So I've tried to fix this with the following template code:
<filter-component v-if="activefilter"
:name="activefilter.name"
:type="activefilter.type"
:title="activefilter.title"
:tooltip="activefilter.tooltip"
:configuration="activefilter.configuration"
:options="activefilter.options">
</filter-component>
So this filter now shows up twice on my page (only when the activefilter property in the Vue app is set). But as you might guess, when changing an option in this 'cloned' filter the original filter doesn't change, because the data is not synced between these 'clones'.
How can I fix this with Vue?
Thanks for your help!
@roy-j, thanks for your comment about sync. I already tried that by setting:
<filter-component v-if="activefilter"
:name="activefilter.name"
:type="activefilter.type"
:title="activefilter.title"
:tooltip="activefilter.tooltip"
:configuration="activefilter.configuration"
:options.sync="activefilter.options">
</filter-component>
This didn't work. But you got me thinking: syncing the options was not the issue, syncing the 'checked' state was.
It worked after changing :checked="option.active" to :checked.sync="option.active" in the child component 'filter-option-component'!
Thanks!!

How can I use REST url as data in Vega-lite

I have this REST API that returns tabular data in the following way:
{"data": [{"el1": 8, "el2": 9}, {"el1": 3, "el2": 4}]}
I would like to use el1 and el2 in a Vega-lite chart. How should I refer to the elements in the array?
From the documentation here:
(JSON only) The JSON property containing the desired data. This parameter can be used when the loaded JSON file may have surrounding structure or meta-data. For example
"property": "values.features"
is equivalent to retrieving json.values.features from the loaded JSON object.
It seems that you can try to specify the "property" property (punny, eh) on the format. Something like this:
"data": {
"url": <your url here>,
"format": {
"type": "json",
"property": "data"
}
}
Disclaimer: I haven't actually tested this but it looks to be supported (:
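For reference, a complete spec along those lines can be sketched in Python as a plain dict and serialized to JSON. The URL is a placeholder, and the point mark and the x/y encodings on el1 and el2 are assumptions for illustration; only the data.format.property part comes from the documentation quoted above.

```python
import json

# Hypothetical endpoint returning {"data": [{"el1": 8, "el2": 9}, ...]}
API_URL = "https://example.com/api/table"


def build_spec(url):
    """Return a Vega-Lite spec that reads rows from the "data" property."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {
            "url": url,
            # "property" tells the loader to use json.data as the row array.
            "format": {"type": "json", "property": "data"},
        },
        "mark": "point",
        "encoding": {
            "x": {"field": "el1", "type": "quantitative"},
            "y": {"field": "el2", "type": "quantitative"},
        },
    }


spec = build_spec(API_URL)
spec_json = json.dumps(spec, indent=2)
```

Once the rows are unwrapped this way, el1 and el2 are referenced as ordinary field names in the encoding.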

REST API: fields of objects in a list of objects in response JSON

Suppose we are building one-page app with two views: list view and detail view.
In list view, we present a list of objects with just their names and maybe some more minimal data.
In detail view, we present all possible fields of particular object.
Hence the question: when we GET /api/items/, should we JSON-encode all fields of the listed objects, or just those presented in the list view?
In other words, if we show list of food like
Name Price
Potato 1
Milk 2
does our API need to respond with JSON like this:
[
{
"name": "Potato",
"quantity": "1 kg",
"origin": "Egypt",
"manufacturer": "Egypt Farmers",
"price": 1,
"packaging": "String bag",
"_type": "Food"
},
{
"name": "Milk",
"quantity": "1 litre",
"origin": "Finland",
"manufacturer": "Valio",
"price": 2,
"packaging": "Tetra Pak",
"_type": "Food"
}
]
or like this:
[
{
"name": "Potato",
"price": 1,
"_type": "Food"
},
{
"name": "Milk",
"price": 2,
"_type": "Food"
}
]
The RESTful API should concentrate on the resources that are represented, not necessarily how those resources are used.
In a master/detail scenario, typically the master will contain details of the master object and include a list of its details (including a link to the API for each detail resource). So /api/items/ might look like this:
{
items: [
{ name: 'item 1', href: '/api/items/1' },
{ name: 'item 2', href: '/api/items/2' }
]
}
The detail resource would contain properties of an individual item in the items list. So the /api/items/{itemName} api might look like this:
{
name: 'item 1',
color: 'blue',
weight: 100,
id: '/api/items/1'
}
So this would probably be closest to your second scenario. There are a number of advantages to this model: it probably matches the domain model that your api is accessing, it makes each api very simple and single-purpose, it's easy to scale, even to very large lists. The disadvantage is that it may lead to more complexity on the client.
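A small Python sketch of that master/detail split, assuming items are held in a dict keyed by a numeric id (the storage, the ids, and the function names list_view and detail_view are all hypothetical):

```python
# Hypothetical in-memory store standing in for the real data source.
ITEMS = {
    1: {"name": "item 1", "color": "blue", "weight": 100},
    2: {"name": "item 2", "color": "red", "weight": 250},
}


def item_href(item_id):
    """Link to the detail resource for one item."""
    return "/api/items/{}".format(item_id)


def list_view():
    """Body for GET /api/items/: just a name and a link per item."""
    return {
        "items": [
            {"name": item["name"], "href": item_href(item_id)}
            for item_id, item in sorted(ITEMS.items())
        ]
    }


def detail_view(item_id):
    """Body for GET /api/items/<id>: every field of a single item."""
    detail = dict(ITEMS[item_id])
    detail["id"] = item_href(item_id)
    return detail


listing = list_view()
first = detail_view(1)
```

The list stays small no matter how many fields an item has, and the client follows href only for the item it wants to show in full.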
The answer as usual may be: it all depends ;)
If the connection is limited or unstable (e.g. a mobile connection like LTE, or even Wi-Fi), the best idea is to return the whole list of resources with all fields filled in and use the same data in both views. In the company I work for we often take this approach, since our backend almost always provides data for mobile applications.
The second idea is to use a mechanism called field or resource expansion. In general a request is made to the endpoint and fields of resources to be returned are included in this request:
/api/items?fields=(name, quantity, origin, whatever)
This mechanism is very convenient, since you can use this one endpoint to serve multiple views without sending fields a view does not need.
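Server-side, field selection like this can be sketched in a few lines of Python. The parenthesized, comma-separated format follows the example URL above; the whitelist ALLOWED_FIELDS and the function names are assumptions for illustration, not part of any particular framework.

```python
# Hypothetical whitelist of fields a client may ask for in the list view.
ALLOWED_FIELDS = {"name", "quantity", "origin", "price"}


def parse_fields(raw):
    """Turn "(name, quantity, origin)" into ["name", "quantity", "origin"]."""
    inner = raw.strip().strip("()")
    return [name.strip() for name in inner.split(",") if name.strip()]


def select_fields(item, requested, allowed=ALLOWED_FIELDS):
    """Keep only the requested fields that are allowed and present."""
    return {
        field: item[field]
        for field in requested
        if field in allowed and field in item
    }


potato = {
    "name": "Potato",
    "quantity": "1 kg",
    "origin": "Egypt",
    "manufacturer": "Egypt Farmers",
    "price": 1,
    "packaging": "String bag",
    "_type": "Food",
}

requested = parse_fields("(name, quantity, origin, whatever)")
trimmed = select_fields(potato, requested)
```

Unknown names such as "whatever" are silently dropped here; rejecting them with a 400 response would be an equally reasonable choice.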
Personally I'd use two endpoints. An /api/items/ endpoint with field/resource expansion mechanism built-in (with a limited list of fields that can be expanded) and the second one /api/items/{itemID}/ to return a particular item with all the data. This is also the most RESTful approach.