I have a data source A and I'd like to create a new data source B containing just the last element of A. What is the best way to do this in Vega?
This is relatively straightforward to do. That said, I am slightly confused by your use of "max" in the aggregation, since that gives the largest value rather than the last one.
Either way, here is my solution for obtaining the last value in a dataset, using the following series of transforms:
"transform": [
  {
    "type": "window",
    "ops": ["row_number"]
  },
  {
    "type": "joinaggregate",
    "fields": ["row_number"],
    "ops": ["max"],
    "as": ["max_row_number"]
  },
  {
    "type": "filter",
    "expr": "datum.row_number == datum.max_row_number"
  }
]
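As a rough illustration of what this chain computes, here is a plain-JavaScript emulation. It assumes rows are numbered in the order they appear in the data (the window transform above has no sort); the function name lastRow and the rows are made up for illustration, and this is not how Vega implements the transforms.

```javascript
// Plain-JavaScript emulation of the window -> joinaggregate -> filter chain.
// Illustration only; this is not how Vega implements these transforms.
function lastRow(rows) {
  // window with op "row_number": number rows 1..n in their current order
  const numbered = rows.map((row, i) => ({ ...row, row_number: i + 1 }));
  // joinaggregate with op "max": attach the largest row_number to every row
  const maxRowNumber = Math.max(...numbered.map((r) => r.row_number));
  const joined = numbered.map((r) => ({ ...r, max_row_number: maxRowNumber }));
  // filter: keep only the row whose row_number equals max_row_number
  return joined.filter((r) => r.row_number === r.max_row_number);
}

// Made-up rows standing in for data source A.
const data = [
  { Horsepower: 130, Miles_per_Gallon: 18 },
  { Horsepower: 165, Miles_per_Gallon: 15 },
  { Horsepower: 150, Miles_per_Gallon: 18 },
];
console.log(lastRow(data)); // a single row: Horsepower 150, row_number 3
```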
I was able to get this working in the Vega Editor using the following:
{
"$schema": "https://vega.github.io/schema/vega/v5.json",
"data": [
{
"name": "source",
"url": "https://raw.githubusercontent.com/vega/vega/master/docs/data/cars.json",
"transform": [
{
"type": "filter",
"expr": "datum['Horsepower'] != null && datum['Miles_per_Gallon'] != null && datum['Acceleration'] != null"
}
]
},
{
"name": "avg",
"source":"source",
"transform":[
{
"type":"aggregate",
"groupby":["Horsepower"],
"ops": ["average"],
"fields":["Miles_per_Gallon"],
"as":["Avg_Miles_per_Gallon"]
}
]
},
{
"name":"last",
"source": "avg",
"transform": [
{
"type": "aggregate",
"ops": ["max"],
"fields": ["Horsepower"],
"as": ["maxHorsepower"]
},
{
"type": "lookup",
"from": "avg",
"key": "Horsepower",
"fields": ["maxHorsepower"],
"values": ["Horsepower","Avg_Miles_per_Gallon"]
}
]
}
]
}
Result:
maxHorsepower  Horsepower  Avg_Miles_per_Gallon
230            230         16
I'd be interested to know if there are better ways, but this worked for me.
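The difference between "max" and "last" can be seen in a small plain-JavaScript emulation of the "avg" and "last" datasets in the spec above. This is a sketch only: maxHorsepowerGroup is a hypothetical helper and the rows are made up so that the last row is not the one with the largest Horsepower.

```javascript
// Rough emulation of the "avg" and "last" datasets in the spec above:
// group by Horsepower, average Miles_per_Gallon, then keep the group with
// the largest Horsepower (aggregate "max" followed by "lookup").
function maxHorsepowerGroup(rows) {
  const groups = new Map();
  for (const { Horsepower, Miles_per_Gallon } of rows) {
    const g = groups.get(Horsepower) || { sum: 0, count: 0 };
    g.sum += Miles_per_Gallon;
    g.count += 1;
    groups.set(Horsepower, g);
  }
  const avg = [...groups].map(([hp, g]) => ({
    Horsepower: hp,
    Avg_Miles_per_Gallon: g.sum / g.count,
  }));
  const maxHorsepower = Math.max(...avg.map((r) => r.Horsepower));
  const match = avg.find((r) => r.Horsepower === maxHorsepower);
  return { maxHorsepower, ...match };
}

// Made-up rows: the last row (Horsepower 150) is not the maximum.
const rows = [
  { Horsepower: 165, Miles_per_Gallon: 15 },
  { Horsepower: 165, Miles_per_Gallon: 17 },
  { Horsepower: 130, Miles_per_Gallon: 18 },
  { Horsepower: 150, Miles_per_Gallon: 18 },
];
console.log(maxHorsepowerGroup(rows)); // maxHorsepower 165, average 16
```

The result is the highest-Horsepower group, not the last row, which is why the window/row_number approach is needed when "last" really means last.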
I am facing an issue when I search for several words that include a special character (the section sign "§").
Example: AB § 32.
I want the words "AB" and "32" and the symbol "§" all to be present in the documents that are found.
In some cases the document is found, in others it is not.
If my document contains the following text, the search finds it:
Lagrum: 32 § 1 mom. första stycket a) kommunalskattelagen (1928:370) AB
But if the document contains this text, the search does not find it:
Lagrum: 32 § 1 mom. första stycket AB
For the symbol "§" I use its UTF-8 encoding, "\xc2\xa7".
The index uses the "lucene.swedish" analyzer:
"Content": [
{
"analyzer": "lucene.swedish",
"minGrams": 4,
"tokenization": "nGram",
"type": "autocomplete"
},
{
"analyzer": "lucene.swedish",
"type": "string"
}
]
The query looks like this:
{
"index": "test_index",
"compound": {
"filter": [
{
"text": {
"query": [
"111111111111"
],
"path": "ProductId"
}
}
],
"must": [
{
"autocomplete": {
"query": [
"AB"
],
"path": "Content"
}
},
{
"autocomplete": {
"query": [
"\xc2\xa7"
],
"path": "Content"
}
},
{
"autocomplete": {
"query": [
"32"
],
"path": "Content"
}
}
]
},
"count": {
"type": "lowerBound",
"threshold": 500
}
}
What is wrong with this search, and how can I get the correct result, i.e. have it return both of the documents mentioned above?
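Focusing only on the content field, here is an index definition that should work for your requirements. The key change is the analyzer on the autocomplete mapping: lucene.whitespace splits text only on whitespace, so a standalone "§" is kept as a token, whereas lucene.swedish is built on Lucene's standard tokenizer, which discards punctuation such as "§". The docs are here. Let me know if this works for you.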
{
"mappings": {
"dynamic": false,
"fields": {
"content": [
{
"type": "autocomplete",
"tokenization": "nGram",
"minGrams": 4,
"maxGrams": 7,
"foldDiacritics": false,
"analyzer": "lucene.whitespace"
},
{
"analyzer": "lucene.swedish",
"type": "string"
}
]
}
}
}
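Separately, the query in the question passes "\xc2\xa7" for the section sign. In raw JSON that is not a valid escape at all (JSON only supports \uXXXX plus a few single-character escapes), and if the query is built in JavaScript, which is an assumption since the question does not say how it is sent, the escape produces two characters rather than "§". The sketch below demonstrates this; sectionClause is a hypothetical name, and whether this alone explains the missing match has not been verified.

```javascript
// In JavaScript source, "\xc2\xa7" denotes two UTF-16 code units
// (U+00C2 and U+00A7), i.e. the string "Â§", not the single character "§".
const escaped = "\xc2\xa7";
const sectionSign = "\u00a7"; // identical to writing "§" literally

console.log(escaped.length, sectionSign.length); // 2 1
console.log(escaped === sectionSign); // false

// Hypothetical "must" clause passing the real character; the driver should
// take care of encoding the string as UTF-8 when sending it.
const sectionClause = {
  autocomplete: { query: [sectionSign], path: "Content" },
};
console.log(JSON.stringify(sectionClause));
```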
This question was closed 1 year ago as a duplicate of: Is there a simple match for objects containing array where the array content order doesn't matter?
I am trying to match two JSON objects, but the test fails. Both contain the same data, but the objects inside the "elements" array are in a different order, which I think should not make any difference. Here are the two JSON objects.
This is the step: And match response contains ScenarioModelResponse, where
response:
{
"relationships": [
{
"sourceId": "36",
"targetId": "149",
"type": "Reid Enright"
}
],
"modelId": "027f93d1-ef9e-4f1e-b2c4-684436c5b18a",
"elements": [
{
"externalRefId": "36",
"attributes": {
"jsonPbject": "Reid Enright"
},
"id": "057f7b7e-11b9-4779-97c0-67485153c285",
"type": "Rocky Shore"
},
{
"externalRefId": "149",
"attributes": {
"jsonPbject": "Ben Lyon"
},
"id": "325b989e-b299-4cfc-86b5-0813106da38e",
"type": "Claire Voyance"
}
]
}
ScenarioModelResponse:
{
"relationships": [
{
"sourceId": "36",
"targetId": "149",
"type": "Reid Enright"
}
],
"modelId": "027f93d1-ef9e-4f1e-b2c4-684436c5b18a",
"elements": [
{
"externalRefId": "149",
"attributes": {
"jsonPbject": "Ben Lyon"
},
"id": "325b989e-b299-4cfc-86b5-0813106da38e",
"type": "Claire Voyance"
},
{
"externalRefId": "36",
"attributes": {
"jsonPbject": "Reid Enright"
},
"id": "057f7b7e-11b9-4779-97c0-67485153c285",
"type": "Rocky Shore"
}
]
}
This is the error I get after execution:
$.elements[0].externalRefId | not equal (STRING:STRING)
'149'
'36'
The arrays are NOT the same: in an exact match, the order of array elements matters. This can be solved in 2 lines:
* match response.relationships == expected.relationships
* match response.elements contains only expected.elements
For a detailed explanation, refer to:
https://stackoverflow.com/a/65939070/143475
https://stackoverflow.com/a/55710769/143475
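The idea behind "contains only" (same elements, same count, any order) can be illustrated outside Karate with a small plain-JavaScript sketch. This is not Karate's implementation; canonical and containsOnly are hypothetical helpers, and the element objects are trimmed from the question's JSON.

```javascript
// Rough sketch of an order-insensitive array comparison: same elements,
// same count, any order. Not Karate's implementation.
function canonical(value) {
  // Sort object keys recursively so property order does not matter.
  if (Array.isArray(value)) return value.map(canonical);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value).sort()) out[key] = canonical(value[key]);
    return out;
  }
  return value;
}

function containsOnly(actual, expected) {
  if (actual.length !== expected.length) return false;
  const remaining = expected.map((e) => JSON.stringify(canonical(e)));
  for (const item of actual) {
    const i = remaining.indexOf(JSON.stringify(canonical(item)));
    if (i === -1) return false;
    remaining.splice(i, 1); // each expected element can be matched only once
  }
  return true;
}

// Elements trimmed from the question's response and ScenarioModelResponse.
const responseElements = [
  { externalRefId: "36", type: "Rocky Shore" },
  { externalRefId: "149", type: "Claire Voyance" },
];
const expectedElements = [
  { externalRefId: "149", type: "Claire Voyance" },
  { type: "Rocky Shore", externalRefId: "36" },
];
console.log(containsOnly(responseElements, expectedElements)); // true
console.log(JSON.stringify(responseElements) === JSON.stringify(expectedElements)); // false
```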
I'm fairly new to JSONPath, so this could be my fault, but when I try this expression in an online evaluator (https://jsonpath.com/) it works, whereas in Karate it does not:
$..entry[?(@.resource.resourceType == 'AllergyIntolerance' && @.resource.category=='food')].resource.code.coding.*.system
If I use an index, I am able to get the first element out, but I want to grab all elements that match the expression regardless of their index, in case the array holds more items than in my specific data example.
Working JSONPath:
$..entry[?(@.resource.resourceType == 'AllergyIntolerance' && @.resource.category[0]=='food')].resource.code.coding.*.system
I've tried to use wildcards, but that doesn't seem to work:
$..entry[?(@.resource.resourceType == 'AllergyIntolerance' && @.resource.category[*]=='food')].resource.code.coding.*.system
JSON snippet with the relevant sections:
{
"entry": [ {
"resource": {
"resourceType": "AllergyIntolerance",
"id": "allergyFood",
"category": [ "food" ],
"criticality": "high",
"code": {
"coding": [ {
"system": "http://snomed.info/sct",
"code": "91935009",
"display": "Allergy to peanuts"
} ],
"text": "Allergy to peanuts"
},
"reaction": [ {
"manifestation": [ {
"coding": [ {
"system": "http://snomed.info/sct",
"code": "271807003",
"display": "skin rash"
} ],
"text": "skin rash"
} ],
"severity": "mild"
} ]
}
}, {
"resource": {
"resourceType": "AllergyIntolerance",
"id": "allergyMed",
"verificationStatus": "unconfirmed",
"type": "allergy",
"category": [ "medication" ],
"criticality": "high",
"code": {
"coding": [ {
"system": "http://www.nlm.nih.gov/research/umls/rxnorm",
"code": "7980",
"display": "penicillin"
} ]
}
}
} ]
}
The JsonPath engine is known to have issues with such complex expressions. Please use karate.filter() instead, which I am sure you will agree is much more readable: https://github.com/intuit/karate#json-transforms
* def resources = $..resource
* def fun = function(x){ return x.resourceType == 'AllergyIntolerance' && x.category[0] == 'food' }
* def temp = karate.filter(resources, fun)
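To match "food" regardless of its index, the predicate can test membership instead of category[0]. The sketch below shows that logic in plain JavaScript (Node.js), not inside Karate, and also extracts the coding systems like the original JSONPath. foodAllergySystems is a hypothetical name, and whether Array.prototype.includes works on the arrays Karate passes into a karate.filter() function has not been verified here.

```javascript
// Plain-JavaScript counterpart of the karate.filter() call, but testing
// whether "food" appears anywhere in category rather than only at index 0.
function foodAllergySystems(bundle) {
  return bundle.entry
    .map((entry) => entry.resource)
    .filter((r) => r.resourceType === "AllergyIntolerance"
      && Array.isArray(r.category)
      && r.category.includes("food"))
    // mirrors .resource.code.coding.*.system in the original JSONPath
    .flatMap((r) => ((r.code && r.code.coding) || []).map((c) => c.system));
}

// Trimmed version of the JSON snippet from the question.
const bundle = {
  entry: [
    { resource: { resourceType: "AllergyIntolerance", category: ["food"],
      code: { coding: [{ system: "http://snomed.info/sct", code: "91935009" }] } } },
    { resource: { resourceType: "AllergyIntolerance", category: ["medication"],
      code: { coding: [{ system: "http://www.nlm.nih.gov/research/umls/rxnorm", code: "7980" }] } } },
  ],
};
console.log(foodAllergySystems(bundle)); // [ 'http://snomed.info/sct' ]
```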
I am using Cosmos DB and have a document with the following simplified structure:
{
"id1":"123",
"stuff": [
{
"id2": "stuff",
"a": {
"b": {
"c": {
"d": [
{
"e": [
{
"id3": "things",
"name": "animals",
"classes": [
{
"name": "ostrich",
"meta": 1
},
{
"name": "big ostrich",
"meta": 1
}
]
},
{
"id3": "default",
"name": "other",
"classes": [
{
"name": "green trees",
"meta": 1
},
{
"name": "trees",
"score": 1
}
]
}
]
}
]
}
}
}
}
]
}
My issue is this: I have an array of these documents and need to search the name field to see whether it matches my search word. For example, I want both "big trees" and "trees" to be returned if a user types in "trees".
So currently I push every document into an array and do the following:
For each document
  for each stuff
    for each a.b.c.d[0].e
      for each classes
        var splice = name.split(' ')
        if (splice.includes(searchWord))
          return id1, id2 and id3
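For reference, the loop described above can be written out as runnable JavaScript. This is a sketch: searchClasses is a hypothetical name, it collects every match instead of returning on the first one, and, like the pseudocode, it only looks at d[0].

```javascript
// Runnable version of the nested client-side scan described above.
function searchClasses(documents, searchWord) {
  const matches = [];
  for (const doc of documents) {
    for (const stuff of doc.stuff) {
      for (const e of stuff.a.b.c.d[0].e) {
        for (const cls of e.classes) {
          // Whole-word match: "green trees" matches "trees", "treetop" would not.
          if (cls.name.split(" ").includes(searchWord)) {
            matches.push({ id1: doc.id1, id2: stuff.id2, id3: e.id3 });
          }
        }
      }
    }
  }
  return matches;
}

// The sample document from the question.
const sampleDoc = {
  id1: "123",
  stuff: [{
    id2: "stuff",
    a: { b: { c: { d: [{ e: [
      { id3: "things", name: "animals",
        classes: [{ name: "ostrich", meta: 1 }, { name: "big ostrich", meta: 1 }] },
      { id3: "default", name: "other",
        classes: [{ name: "green trees", meta: 1 }, { name: "trees", score: 1 }] },
    ] }] } } },
  }],
};
console.log(searchClasses([sampleDoc], "trees")); // two matches, both with id3 "default"
```

Every document still has to be loaded and scanned on the client, which is exactly the cost the question wants to avoid.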
With Cosmos DB, I am using SQL with the following code:
client.queryDocuments(
collection,
`SELECT * FROM root r`
).toArray((err, results) => {stuff});
This effectively pulls every document in my collection into an array so that I can perform the manual search described above.
This is going to cause problems once I have thousands or millions of documents, and I believe I should be leveraging the search capabilities built into Cosmos DB itself. Can anyone help me work out which SQL query would perform this kind of search?
Beyond searching everything, is it also possible to search only the 5 latest documents?
Thanks for any insight in advance!
1. Is anyone able to help me to work out what SQL query would be able to perform this type of function?
Based on your sample and description, I suggest using ARRAY_CONTAINS in Cosmos DB SQL. Please refer to my sample:
Sample documents:
[
{
"id1": "123",
"stuff": [
{
"id2": "stuff",
"a": {
"b": {
"c": {
"d": [
{
"e": [
{
"id3": "things",
"name": "animals",
"classes": [
{
"name": "ostrich",
"meta": 1
},
{
"name": "big ostrich",
"meta": 1
}
]
},
{
"id3": "default",
"name": "other",
"classes": [
{
"name": "green trees",
"meta": 1
},
{
"name": "trees",
"score": 1
}
]
}
]
}
]
}
}
}
}
]
},
{
"id1": "456",
"stuff": [
{
"id2": "stuff2",
"a": {
"b": {
"c": {
"d": [
{
"e": [
{
"id3": "things2",
"name": "animals",
"classes": [
{
"name": "ostrich",
"meta": 1
},
{
"name": "trees",
"meta": 1
}
]
},
{
"id3": "default2",
"name": "other",
"classes": [
{
"name": "green trees",
"meta": 1
},
{
"name": "trees",
"score": 1
}
]
}
]
}
]
}
}
}
}
]
},
{
"id1": "789",
"stuff": [
{
"id2": "stuff3",
"a": {
"b": {
"c": {
"d": [
{
"e": [
{
"id3": "things3",
"name": "animals",
"classes": [
{
"name": "ostrich",
"meta": 1
},
{
"name": "big",
"meta": 1
}
]
},
{
"id3": "default3",
"name": "other",
"classes": [
{
"name": "big trees",
"meta": 1
}
]
}
]
}
]
}
}
}
}
]
}
]
Query:
SELECT distinct c.id1,stuff.id2,e.id3 FROM c
join stuff in c.stuff
join d in stuff.a.b.c.d
join e in d.e
where ARRAY_CONTAINS(e.classes,{name:"trees"},true)
or ARRAY_CONTAINS(e.classes,{name:"big trees"},true)
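To make the query's semantics concrete, here is a rough client-side emulation in JavaScript; it is not how Cosmos DB executes the query. ARRAY_CONTAINS with the third argument set to true does a partial match, so an element of classes qualifies when its name equals one of the given strings, whatever other properties it has. emulateQuery and makeDoc are hypothetical helpers, and the documents are the samples above trimmed to the fields the query uses.

```javascript
// Hypothetical emulation of: SELECT DISTINCT c.id1, stuff.id2, e.id3 ... JOIN ...
// WHERE ARRAY_CONTAINS(e.classes, {name: x}, true) for any x in names.
function emulateQuery(documents, names) {
  const seen = new Set();
  const rows = [];
  for (const c of documents) {
    for (const stuff of c.stuff) {            // JOIN stuff IN c.stuff
      for (const d of stuff.a.b.c.d) {        // JOIN d IN stuff.a.b.c.d
        for (const e of d.e) {                // JOIN e IN d.e
          // partial match on {name}: exact name equality, other fields ignored
          const hit = e.classes.some((cls) => names.includes(cls.name));
          const row = { id1: c.id1, id2: stuff.id2, id3: e.id3 };
          const key = JSON.stringify(row);
          if (hit && !seen.has(key)) {        // SELECT DISTINCT
            seen.add(key);
            rows.push(row);
          }
        }
      }
    }
  }
  return rows;
}

const makeDoc = (id1, id2, groups) =>
  ({ id1, stuff: [{ id2, a: { b: { c: { d: [{ e: groups }] } } } }] });

// The three sample documents above, trimmed.
const docs = [
  makeDoc("123", "stuff", [
    { id3: "things", classes: [{ name: "ostrich" }, { name: "big ostrich" }] },
    { id3: "default", classes: [{ name: "green trees" }, { name: "trees", score: 1 }] },
  ]),
  makeDoc("456", "stuff2", [
    { id3: "things2", classes: [{ name: "ostrich" }, { name: "trees" }] },
    { id3: "default2", classes: [{ name: "green trees" }, { name: "trees" }] },
  ]),
  makeDoc("789", "stuff3", [
    { id3: "things3", classes: [{ name: "ostrich" }, { name: "big" }] },
    { id3: "default3", classes: [{ name: "big trees" }] },
  ]),
];
console.log(emulateQuery(docs, ["trees", "big trees"]));
```

Because the comparison is exact equality rather than a word search, "big trees" is only found because it is listed explicitly; with just "trees", the third document would not be returned.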
2. Having searched everything, is it also possible to search the 5 latest documents?
Per my research, LIMIT was not supported in Cosmos DB at the time of writing. However, TOP is supported. So if you add a sort field (such as a date or an id), you could use this SQL:
SELECT TOP 5 * FROM c ORDER BY c.sort DESC
I can't find very much documentation on how to properly define the index function such that I can do a full text search on the information that I need.
I've used the Alchemy API to add "entities" JSON to my documents.
For instance, I have a document with the following:
"_id": "redacted",
"_rev": "redacted",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
I haven't been able to find any code snippets showing how to construct a search index function that looks at nested JSON.
My stab at indexing the whole object appears to be incorrect.
This is the full design document:
{
"_id": "_design/entities",
"_rev": "redacted",
"views": {},
"language": "javascript",
"indexes": {
"entities": {
"analyzer": "standard",
"index": "function (doc) {\n if (doc.entities.relevance > 0.5){\n index(\"default\", doc.entities.text, {\"store\":\"yes\"});\n }\n\n}"
}
}
}
And here is the search index function, formatted a bit more clearly:
function (doc) {
if (doc.entities.relevance > 0.5){
index("default", doc.entities.text, {"store":"yes"});
}
}
Adding the for loop as suggested below makes a lot of sense.
However, I still am not able to return any results.
My query is
"https://user.cloudant.com/calbills/_design/entities/_search/entities?q=Governors"
Server response is:
{"total_rows":0,"bookmark":"g2o","rows":[]}
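The "for..in" style loop as originally suggested doesn't seem to work, presumably because for..in iterates over the array's indices (as strings) rather than its elements, so entity.relevance is undefined.
However, I do get results with a standard indexed for loop: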
function (doc) {
if(doc.entities){
var arrayLength = doc.entities.length;
for (var i = 0; i < arrayLength; i++) {
if (parseFloat(doc.entities[i].relevance) > 0.5)
index("default", doc.entities[i].text);
}
}
}
Cheers!
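You need to loop over the elements of the doc.entities array.
function (doc) {
  for (var i in doc.entities) {
    // for..in yields the array indices, so look each element up by index
    var entity = doc.entities[i];
    if (parseFloat(entity.relevance) > 0.5) {
      index("default", entity.text, {"store": "yes"});
    }
  }
}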
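To see why a for..in loop needs the index lookup, here is a small Node.js harness that runs an index function outside Cloudant. The index() defined here is a hypothetical stub that only records its arguments (in Cloudant, index() is supplied by the search indexer), and the document is the sample from the question trimmed to the fields used.

```javascript
// Stub standing in for Cloudant's index(): it just records what was indexed.
const calls = [];
function index(field, value, options) {
  calls.push({ field, value, options });
}

const doc = {
  entities: [
    { relevance: "0.797773", text: "California Constitution" },
    { relevance: "0.690092", text: "Governors Highway Safety Association" },
  ],
};

// for..in over an array yields the indices as strings, not the elements:
for (const key in doc.entities) {
  console.log(typeof key, key, key.relevance); // string 0 undefined, then string 1 undefined
}

// Looking each element up by index gives the intended behaviour.
function indexEntities(doc) {
  if (!doc.entities) return;
  for (var i = 0; i < doc.entities.length; i++) {
    var entity = doc.entities[i];
    // relevance is stored as a string, so convert it before comparing
    if (parseFloat(entity.relevance) > 0.5) {
      index("default", entity.text, { store: "yes" });
    }
  }
}

indexEntities(doc);
console.log(calls.map((c) => c.value)); // both entity texts
```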
This is what I tried:
function(doc){
if(doc.entities){
for( var p in doc.entities ){
if (doc.entities[p].relevance > 0.5)
{
index("entitiestext", doc.entities[p].text, {"store":"yes"});
}
}
}
}
Query string used: "q=entitiestext:California Constitution&include_docs=true"
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
},
"doc": {
"_id": "redacted",
"_rev": "4-7f6e6db246abcf2f884dc0b91451272a",
"session": "20152016",
"entities": [
{
"relevance": "0.797773",
"count": "3",
"type": "Organization",
"text": "California Constitution"
},
{
"relevance": "0.690092",
"count": "1",
"type": "Organization",
"text": "Governors Highway Safety Association"
}
]
}
}
]
}
Query string used: q=entitiestext:California Constitution
Result:
{
"total_rows": 1,
"bookmark": "xxxx",
"rows": [
{
"id": "redacted",
"order": [
0.03693288564682007,
1
],
"fields": {
"entitiestext": [
"Governors Highway Safety Association",
"California Constitution"
]
}
}
]
}