Solr Nested Documents: query for parent document which has several specific nested documents

Let's imagine I have a document with several nested documents:
{
"id": "doc1",
"type": "maindoc",
"title": "some document 1 title",
"nested": [
{
"id": "nested1",
"nested_type": "nestedType1",
"title": "nested doc 1 title"
},
{
"id": "nested2",
"nested_type": "nestedType2",
"title": "nested doc 2 title"
},
{
"id": "nested3",
"nested_type": "nestedType3",
"title": "nested doc 3 title"
}
]
}
So now, if I want to search for a document that has nested doc 1, I do this:
{!parent which='type:maindoc'}
nested_type:nestedType1
But what if I want to search for a document that has two specific children at the same time?
For example, I want to find a doc that has both nestedType1 and nestedType2.
Obviously, a query like this will not work, because no single child has both types:
{!parent which='type:maindoc'}
nested_type:nestedType1 AND nested_type:nestedType2
So how can I do that? Is it possible at all?

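Something like this did the trick in my testing. Each {!parent} clause returns the parents that have at least one matching child, and the AND intersects those two sets of parents: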
({!parent which='type:maindoc' v='nested_type:nestedType1'}) AND ({!parent which='type:maindoc' v='nested_type:nestedType2'})
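
The same pattern extends to any number of required child types. The helper below is a minimal sketch that only assembles the query string; the function name is made up for illustration, and it assumes the values contain no quotes or other characters that would need escaping.

```python
def parents_with_all_children(parent_filter, child_field, child_values):
    """Build a Solr query matching parents that have a child for every value.

    Hypothetical helper: one {!parent} clause per required child value,
    joined with AND so that only parents present in every clause survive.
    """
    clauses = [
        f"({{!parent which='{parent_filter}' v='{child_field}:{value}'}})"
        for value in child_values
    ]
    return " AND ".join(clauses)


query = parents_with_all_children(
    "type:maindoc", "nested_type", ["nestedType1", "nestedType2"]
)
print(query)
```

For the two types in the question, this produces the same query string as the one shown above.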


Snowflake JSON FLATTEN with ORDER BY

I have a working query that flattens a nested JSON object into rows of data. What I would like to do, however, is preserve the original order of one array of objects which is nested several layers in.
I have tried using ROW_NUMBER with ORDER BY NULL and with ORDER BY (SELECT NULL), and neither seems to preserve the order.
Any ideas on how to accomplish that? Examples below. I chose to mask the real data, but the important parts of the structure are the same. The data in JSON format comes through with no rank-identifying information, but I used numbers as examples here to show the strange results.
Original structure (masked):
{
"topNode": {
"childNode": {
"list": [
{
"title": "example title 1"
},
{
"title": "example title 2"
},
{
"title": "example title 3"
},
{
"title": "example title 4"
},
{
"title": "example title 5",
}
]
}
}
}
Example query (masked):
SELECT
A.VALUE:"title"::VARCHAR AS "TITLE",
ROW_NUMBER() OVER(ORDER BY NULL) AS RANK
FROM
DB.SCHEMA.TABLE as A,
lateral flatten(input=>A.JSON:topNode.childNode.list) "list_flatten"
Example output:
TITLE RANK
"example title 3" 1
"example title 5" 2
"example title 2" 3
"example title 1" 4
"example title 4" 5
This is possible with the INDEX column that FLATTEN returns, which holds the zero-based index of each element in the array:
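SELECT "list_flatten".VALUE:"title"::VARCHAR AS "TITLE",
"list_flatten".INDEX AS "RANK" -- zero-based position in the source array
FROM DB.SCHEMA.TABLE AS A,
LATERAL FLATTEN(input => A.JSON:topNode.childNode.list) "list_flatten"
ORDER BY "RANK" -- sort explicitly; row order is not guaranteed otherwise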
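
To illustrate why carrying the index works where ROW_NUMBER() over a null ordering does not, here is a small Python sketch. It is not Snowflake code: `flatten_with_index` is a hypothetical stand-in for FLATTEN's INDEX and VALUE columns, and the shuffle stands in for an engine returning rows in arbitrary order.

```python
import random

# Same shape as the masked document in the question.
doc = {
    "topNode": {
        "childNode": {
            "list": [{"title": f"example title {i}"} for i in range(1, 6)]
        }
    }
}


def flatten_with_index(array):
    """Return one (index, value) row per element, like FLATTEN's INDEX and VALUE."""
    return [(index, value) for index, value in enumerate(array)]


rows = flatten_with_index(doc["topNode"]["childNode"]["list"])
random.shuffle(rows)  # rows may come back in any order

# Sorting on the stored index recovers the original array order.
ordered_titles = [value["title"] for _, value in sorted(rows, key=lambda row: row[0])]
print(ordered_titles)
```

Because the position is stored with each row, the original order can be restored no matter how the rows are shuffled in between.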

SQL - Extract values from key value pairs to array

I'm querying some data (SQL, Presto). The source data has an array of structs, each with a name and an ID. I need an array of just the IDs.
The data looks like:
[ { "id": 123456789,
"name": "name 1" },
{ "id": 234567891,
"name": "name 2" }
]
and I need it to look like:
[123456789, 234567891]
Do you know how I can achieve this?
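Use MAP_KEYS(MAP_FROM_ENTRIES(column)). Assuming each element is a two-field row with the id first, MAP_FROM_ENTRIES turns the array of (id, name) pairs into a map keyed by id, and MAP_KEYS returns those keys as an array.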
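
As a rough Python analogy of what that expression does (not Presto code, and the function names are made up for illustration), the sketch below builds a map from the entries and takes its keys, next to the more direct "pick the id from each element" approach. One caveat worth checking in your engine: a Python dict silently keeps a single entry per duplicate key, and a SQL map function may treat duplicate ids differently.

```python
entries = [
    {"id": 123456789, "name": "name 1"},
    {"id": 234567891, "name": "name 2"},
]


def ids_via_map_keys(items):
    """Analogy for MAP_KEYS(MAP_FROM_ENTRIES(column)): build an id -> name map, take its keys."""
    as_map = {item["id"]: item["name"] for item in items}
    return list(as_map.keys())


def ids_direct(items):
    """Pick the id out of each element without building a map first."""
    return [item["id"] for item in items]


print(ids_via_map_keys(entries))
print(ids_direct(entries))
```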

Query for entire JSON document in nested JSON schema

Background:
I want to retrieve the entire JSON document when it contains a feature where "state" = "new" and length(Features.id) > 4:
{
"id": "123",
"feedback": {
"Features": [
{
"state": "new",
"id": "12345"
}
]
}
}
This is what I have tried so far. Because the document is nested, a Stack Overflow member helped me access the nested contents within the query:
SELECT VALUE t.id FROM t IN f.feedback.Features where t.state = 'new' and length(t.id)>4
This gives me only the ids. I would like to get the full document that satisfies this condition:
{
"id": "123",
"feedback": {
"Features": [
{
"state": "new",
"id": "12345"
}
]
}
}
Any help is appreciated
Try this, which checks the first element of the Features array:
SELECT *
FROM f
WHERE
f.feedback.Features[0].state = 'new'
AND length(f.feedback.Features[0].id)>4
For more details, here is the SELECT spec for CosmosDB:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-select
Also, check out "working with JSON" in the CosmosDB notes:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-working-with-json
If the Features array has more than one element, you can use an EXISTS expression to search across all of them. See the EXISTS documentation, which includes examples:
https://learn.microsoft.com/en-us/azure/cosmos-db/sql-query-subquery#exists-expression
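
To make the difference between checking Features[0] and checking every element concrete, here is a small Python sketch of the "any element matches" condition. The `EXISTS_QUERY` string is one way the EXISTS form might be written for this document shape; it is an untested assumption, not something taken from the answer, and `has_matching_feature` is a hypothetical local helper rather than part of any SDK.

```python
# Assumed EXISTS form of the query (not run against a live account).
EXISTS_QUERY = (
    "SELECT * FROM f WHERE EXISTS ("
    "SELECT VALUE t FROM t IN f.feedback.Features "
    "WHERE t.state = 'new' AND LENGTH(t.id) > 4)"
)


def has_matching_feature(document):
    """Return True if any feature has state 'new' and an id longer than 4 characters."""
    features = document.get("feedback", {}).get("Features", [])
    return any(
        feature.get("state") == "new" and len(feature.get("id", "")) > 4
        for feature in features
    )


documents = [
    {"id": "123", "feedback": {"Features": [{"state": "new", "id": "12345"}]}},
    # The match is in the second element, so a Features[0] check would miss it.
    {"id": "456", "feedback": {"Features": [
        {"state": "old", "id": "1"},
        {"state": "new", "id": "67890"},
    ]}},
    {"id": "789", "feedback": {"Features": [{"state": "new", "id": "12"}]}},
]

matching_ids = [d["id"] for d in documents if has_matching_feature(d)]
print(matching_ids)
```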

Searching within an array in kibana

I am pushing my logs to Elasticsearch, which stores a typical document as:
{
"_index": "logstash-2014.08.11",
"_type": "machine",
"_id": "2tSlN1P1QQuHUkmoJfkmnQ",
"_score": null,
"_source": {
"category": "critical log with list",
"app_name": "attachment",
"stacktrace_array": [
"this is the first line",
"this is the second line",
"this is the third line",
"this is the fourth line"
],
"@timestamp": "2014-08-11T13:30:51+00:00"
},
"sort": [
1407763851000,
1407763851000
]
}
Kibana makes searching substrings very easy. For example searching for "critical" in the dashboard will fetch all logs with the word critical in any string mapped value.
How do I go about searching for something like "second line", which is a string nested in an array within my doc?
It would be a simple field:<search_term> query, like this:
"query": {
"query_string": {
"query": "stacktrace_array:*second line*"
}
...
In layman's terms, in the Kibana dashboard, enter your search query like so:
stacktrace_array:*second line*
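
If you build the request in code rather than in the dashboard, the same field:<search_term> pattern can be assembled as below. This is a minimal sketch: `query_string_body` is a hypothetical helper, the body contains only the query_string part shown above, and how the wildcard actually matches will depend on the field's mapping and analyzer.

```python
import json


def query_string_body(field, term):
    """Build a minimal query_string request body for a wildcard search on one field."""
    return {"query": {"query_string": {"query": f"{field}:*{term}*"}}}


body = query_string_body("stacktrace_array", "second line")
print(json.dumps(body, indent=2))
```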

How to get list of statements for a given Wikidata ID?

The only thing I managed to do is this link:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q568&format=jsonfm
But this produces lots of useless data. What I need is to get all the statements for the given item, but I can't see any of the statements in the query above.
For this item, I would expect something like:
{ "instance of" : "chemical element",
"element symbol" : "Li",
"atomic number" : 3,
"oxidation state" : 1,
"subclass of" : ["chemical element", "alkali metal"]
// etc...
}
Is there an API for this or must I scrape the web page?
The information you want is in your query, except it's hard to decode. For example, this:
"P246": [
{
"id": "q568$E47B8CE7-C91D-484A-9DA4-6153F132997D",
"mainsnak": {
"snaktype": "value",
"property": "P246",
"datatype": "string",
"datavalue": {
"value": "Li",
"type": "string"
}
},
"type": "statement",
"rank": "normal",
"references": …
}
]
means that the “element symbol” (property P246) is “Li”. So, you will need to read all the properties from your query and then find out the name for each of the properties you found.
To get just the statements, you could also use action=wbgetclaims, but it's in the same format as above.
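
Reading the properties out of that structure could look roughly like the sketch below. It works on an already-parsed claims object (no network call), `simple_statements` is a hypothetical helper, and the sample is the P246 statement from above with the elided references left out. Note that for item-valued properties the "value" is itself an object rather than a plain string, and property IDs such as P246 still have to be resolved to names separately.

```python
def simple_statements(claims):
    """Map each property ID to the list of main values of its statements.

    Statements whose main snak has no value (snaktype other than "value") are skipped.
    """
    result = {}
    for property_id, statements in claims.items():
        values = [
            statement["mainsnak"]["datavalue"]["value"]
            for statement in statements
            if statement.get("mainsnak", {}).get("snaktype") == "value"
        ]
        if values:
            result[property_id] = values
    return result


claims = {
    "P246": [
        {
            "id": "q568$E47B8CE7-C91D-484A-9DA4-6153F132997D",
            "mainsnak": {
                "snaktype": "value",
                "property": "P246",
                "datatype": "string",
                "datavalue": {"value": "Li", "type": "string"},
            },
            "type": "statement",
            "rank": "normal",
        }
    ]
}

print(simple_statements(claims))
```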