GraphDB Elasticsearch Connector - graphdb

Is there a working example of mapping lat/long properties from GraphDB to geo_point objects in Elasticsearch?
{
  "fieldName": "location",
  "propertyChain": [
    "http://example.com/coordinates"
  ],
  "objectFields": [
    {
      "fieldName": "lat",
      "propertyChain": [
        "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
      ]
    },
    {
      "fieldName": "lon",
      "propertyChain": [
        "http://www.w3.org/2003/01/geo/wgs84_pos#long"
      ]
    }
  ]
}
thanks

The only way to index data as geo_point with the current version of GraphDB and the Elasticsearch connector is to have the latitude and the longitude in a single literal, e.g. with the property http://www.w3.org/2003/01/geo/wgs84_pos#lat_long. The connector would look like this:
PREFIX : <http://www.ontotext.com/connectors/elasticsearch#>
PREFIX inst: <http://www.ontotext.com/connectors/elasticsearch/instance#>
INSERT DATA {
  inst:geopoint :createConnector '''
{
  "elasticsearchNode": "localhost:9300",
  "types": ["http://geopoint.ontotext.com/Point"],
  "fields": [
    {
      "fieldName": "location",
      "propertyChain": [
        "http://www.w3.org/2003/01/geo/wgs84_pos#lat_long"
      ],
      "datatype": "native:geo_point"
    }
  ]
}
''' .
}
Note that datatype: "native:geo_point" is important as it tells Elasticsearch what type of data this is.
We are currently looking into possible ways to introduce support for latitude and longitude coming from separate literals.
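If your data only has separate lat and long literals, one possible workaround is to materialize the combined literal yourself before indexing. Here is a minimal sketch in Python, assuming the "latitude,longitude" string form that wgs84_pos#lat_long describes. The helper names and the example IRI are made up for illustration and are not part of GraphDB or the connector:

```python
WGS84_LAT_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#lat_long"


def lat_long_literal(lat: float, lon: float) -> str:
    """Combine latitude and longitude into a single "lat,lon" string."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return f"{lat},{lon}"


def insert_lat_long(subject_iri: str, lat: float, lon: float) -> str:
    """Build a SPARQL update that adds the combined literal to one resource."""
    literal = lat_long_literal(lat, lon)
    return f'INSERT DATA {{ <{subject_iri}> <{WGS84_LAT_LONG}> "{literal}" . }}'


# Hypothetical resource IRI, used only to show the generated update.
print(insert_lat_long("http://example.com/point/1", 42.6977, 23.3219))
```

The generated update can then be run against the repository so that the connector definition above finds a lat_long value to index.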

karate.repeat creates a malformed JSON that is unable to traverse

I have a scenario where I need to call a secondary feature file that contains an API call whose response is a JSON object. I need to call this scenario multiple times, so I am using karate.repeat. However, the resulting response is malformed JSON that I cannot traverse.
This is what I am doing:
* def fun = function(i){ return karate.call('abc.feature#abc', value)}
* def loop = karate.repeat(2, fun)
* karate.log(loop)
The response I get is:
{
  "Total_packages1": {
    "package1": {
      "tags": [
        "kj21",
        "j1",
        "sj2",
        "z1"
      ],
      "expectedResponse": [
        {
          "firstName": "Name",
          "lastName": "lastName",
          "purchase": [
            {
              "title": "title",
              "category": [
                "a",
                "b",
                "c"
              ]
            }
          ]
        }
      ]
    }
  }
}
{
  "Total_packages2": {
    "package2": {
      "tags": [
        "kj212",
        "j12",
        "sj22",
        "z12"
      ],
      "expectedResponse": [
        {
          "firstName": "Name2",
          "lastName": "lastName2",
          "purchase": [
            {
              "title": "title2",
              "category": [
                "a2",
                "b2",
                "c2"
              ]
            }
          ]
        }
      ]
    }
  }
}
As you can see, Total_packages2 starts malformed. I need to grab the "tags" values from each package, but I cannot simply use Total_packages1.package1.tags as I could with a single response in the JSON.
If I cannot achieve what I need by karate.repeat, is there another method that is recommended for looping like this? I haven't found anything in the documentation for this particular scenario.
Don't use karate.repeat(); use call with a JSON array instead. Read this part of the docs: https://github.com/karatelabs/karate#data-driven-features

Getting the last datum in a vega dataset

I have a data source A and I'd like to create a new data source B containing just the last element of A. What is the best way to do this in Vega?
This is relatively straightforward to do, although I am slightly confused by your use of "max" in the aggregation, since that isn't necessarily the last value.
Either way, here is my solution for obtaining the last value in a dataset, using this series of transforms:
"transform": [
  {
    "type": "window",
    "ops": ["row_number"]
  },
  {
    "type": "joinaggregate",
    "fields": ["row_number"],
    "ops": ["max"],
    "as": ["max_row_number"]
  },
  {
    "type": "filter",
    "expr": "datum.row_number == datum.max_row_number"
  }
]
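To make the logic of these three transforms explicit, here is a rough emulation in Python on a plain list of records. This is not Vega code; the function name and the sample data are made up for illustration, and it only mirrors what window, joinaggregate and filter do in this particular pipeline:

```python
def last_datum(rows):
    """Emulate window(row_number) -> joinaggregate(max) -> filter."""
    # window: number each row starting from 1, in dataset order
    numbered = [{**row, "row_number": i} for i, row in enumerate(rows, start=1)]
    # joinaggregate: attach the overall maximum row number to every row
    max_row_number = max((r["row_number"] for r in numbered), default=0)
    joined = [{**r, "max_row_number": max_row_number} for r in numbered]
    # filter: keep only the row whose number equals the maximum
    return [r for r in joined if r["row_number"] == r["max_row_number"]]


data_a = [{"x": 1}, {"x": 2}, {"x": 3}]
print(last_datum(data_a))
```

Note that the surviving row keeps the two helper fields (row_number and max_row_number), just as it would in the Vega dataset.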
I was able to get this working in the Vega Editor using the following:
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "source",
      "url": "https://raw.githubusercontent.com/vega/vega/master/docs/data/cars.json",
      "transform": [
        {
          "type": "filter",
          "expr": "datum['Horsepower'] != null && datum['Miles_per_Gallon'] != null && datum['Acceleration'] != null"
        }
      ]
    },
    {
      "name": "avg",
      "source": "source",
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["Horsepower"],
          "ops": ["average"],
          "fields": ["Miles_per_Gallon"],
          "as": ["Avg_Miles_per_Gallon"]
        }
      ]
    },
    {
      "name": "last",
      "source": "avg",
      "transform": [
        {
          "type": "aggregate",
          "ops": ["max"],
          "fields": ["Horsepower"],
          "as": ["maxHorsepower"]
        },
        {
          "type": "lookup",
          "from": "avg",
          "key": "Horsepower",
          "fields": ["maxHorsepower"],
          "values": ["Horsepower", "Avg_Miles_per_Gallon"]
        }
      ]
    }
  ]
}
maxHorsepower   Horsepower   Avg_Miles_per_Gallon
230             230          16
I'd be interested to know if there are better ways, but this worked for me.

GraphDB Lucene connector: indexing rdfs:label values of a single language

I have a question about indexes in the GraphDB Lucene connector.
In the context of a multilingual RDF resource, how can I index the rdfs:label values of a single language only (for example, English)?
I tried with this:
PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>
PREFIX : <http://www.ontotext.com/connectors/lucene#>
INSERT DATA {
  inst:lexicalEntryIndex :createConnector '''
{
  "types": [
    "http://www.w3.org/ns/lemon/ontolex#LexicalEntry"
  ],
  "fields": [
    {
      "fieldName": "type",
      "propertyChain": [
        "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
        "http://www.w3.org/2000/01/rdf-schema#label"
      ],
      "languages": [
        "en"
      ]
    }
  ]
}
''' .
}
but all the languages are indexed.
Thanks in advance,
Andrea
The GraphDB Lucene Connector documentation clearly demonstrates how to index a single language.
Here is a sample snippet showing how to do it:
PREFIX luc: <http://www.ontotext.com/connectors/lucene#>
PREFIX luc-index: <http://www.ontotext.com/connectors/lucene/instance#>
INSERT DATA {
  luc-index:my_index luc:createConnector '''
{
  "types": ["http://www.ontotext.com/example#gadget"],
  "fields": [
    {
      "fieldName": "name",
      "propertyChain": [
        "http://www.ontotext.com/example#name"
      ]
    },
    {
      "fieldName": "nameLanguage",
      "propertyChain": [
        "http://www.ontotext.com/example#name",
        "lang()"
      ]
    }
  ],
  "entityFilter": "?nameLanguage in (\\"en\\")"
}
''' .
}
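The doubled backslashes in the entityFilter are easy to get wrong, because the value passes through two layers of escaping: once as a JSON string and once as a SPARQL string literal. If you generate the update from a script, the following Python sketch may help. It assumes standard SPARQL string escaping; the helper names are my own and not part of GraphDB:

```python
import json


def sparql_long_literal(text: str) -> str:
    """Wrap text in a SPARQL '''long string''', escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'''{escaped}'''"


def create_connector_update(name: str, config: dict) -> str:
    """Build the createConnector update for a Lucene connector instance."""
    body = sparql_long_literal(json.dumps(config, indent=2))
    return (
        "PREFIX luc: <http://www.ontotext.com/connectors/lucene#>\n"
        "PREFIX luc-index: <http://www.ontotext.com/connectors/lucene/instance#>\n"
        f"INSERT DATA {{\n  luc-index:{name} luc:createConnector {body} .\n}}"
    )


config = {
    "types": ["http://www.ontotext.com/example#gadget"],
    "fields": [
        {"fieldName": "name",
         "propertyChain": ["http://www.ontotext.com/example#name"]},
        {"fieldName": "nameLanguage",
         "propertyChain": ["http://www.ontotext.com/example#name", "lang()"]},
    ],
    # Plain quotes here; both escaping layers are added by the helpers above.
    "entityFilter": '?nameLanguage in ("en")',
}

print(create_connector_update("my_index", config))
```

The filter is written once with ordinary quotes, json.dumps adds the JSON layer, and sparql_long_literal adds the SPARQL layer, which reproduces the \\"en\\" form used in the snippet.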

How to query microsoft academic graph for citation and co-citation?

Reading through:
https://www.microsoft.com/cognitive-services/en-us/Academic-Knowledge-API/documentation/GraphSearchMethod
The meaning of "path" is a bit obscure:
"path": "/paper/AuthorIDs/author" - I don't see authorIds object in the returned results.
# post data query
{
  "path": "/paper/AuthorIDs/author",
  "paper": {
    "type": "Paper",
    "NormalizedTitle": "graph engine",
    "select": [
      "OriginalTitle"
    ]
  },
  "author": {
    "return": {
      "type": "Author",
      "Name": "bin shao"
    }
  }
}
#results
{
  "Results": [
    [
      {
        "CellID": 2160459668,
        "OriginalTitle": "Trinity: a distributed graph engine on a memory cloud"
      },
      {
        "CellID": 2093502026
      }
    ],
    [
      {
        "CellID": 2171539317,
        "OriginalTitle": "A distributed graph engine for web scale RDF data"
      },
      {
        "CellID": 2093502026
      }
    ],
    [
      {
        "CellID": 2411554868,
        "OriginalTitle": "A distributed graph engine for web scale RDF data"
      },
      {
        "CellID": 2093502026
      }
    ],
    [
      {
        "CellID": 73304046,
        "OriginalTitle": "The Trinity graph engine"
      },
      {
        "CellID": 2093502026
      }
    ]
  ]
}
Which is the correct path (or data to post) to query for citation and co-citation of an article, and paginate results?
You will find AuthorIDs in the graph schema from Microsoft Academic Search.
Assuming you know the ID of the source paper (2118322263 in the following example), here is the POST part of the request:
{
  "path": "/paper/CitationIDs/citation",
  "paper": {
    "type": "Paper",
    "id": [ 2118322263 ],
    "select": [
      "OriginalTitle"
    ]
  },
  "citation": {
    "return": {
      "type": "Paper"
    },
    "select": [
      "OriginalTitle"
    ]
  }
}
This returns 634 results in one response, while a query to the paper itself shows a citation count of 732. I have no idea why there is a difference, nor how to do pagination.
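If you need the same query for several papers, the POST body can be generated rather than written by hand. This is only a sketch in Python that mirrors the JSON above; the function name is made up, it does not send any request, and it does not address pagination:

```python
import json


def citation_query(paper_id: int, select=("OriginalTitle",)) -> dict:
    """Build the POST body for the /paper/CitationIDs/citation path."""
    return {
        "path": "/paper/CitationIDs/citation",
        "paper": {"type": "Paper", "id": [paper_id], "select": list(select)},
        "citation": {"return": {"type": "Paper"}, "select": list(select)},
    }


body = citation_query(2118322263)
print(json.dumps(body, indent=2))
```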

ElasticSearch - how to give priority to the matching from the same row

I have the following documents in ElasticSearch 0.19.11:
{ "title": "dogs species",
  "col_names": [ "name", "description", "country_of_origin" ],
  "rows": [
    { "row": [ "Boxer", "good dog", "Germany" ] },
    { "row": [ "Irish Setter", "great dog", "Ireland" ] }
  ]
}
{ "title": "Misc stuff",
  "col_names": [ "foo" ],
  "rows": [
    { "row": [ "Setter is important" ] },
    { "row": [ "Ireland is green" ] }
  ]
}
The mapping is as follows:
{
  "table" : {
    "properties" : {
      "title" : {"type" : "string"},
      "col_names" : {"type" : "string"},
      "rows" : {
        "properties" : {
          "row" : {"type" : "string"}
        }
      }
    }
  }
}
Question: I'm now searching for "Ireland Setter" and I need to have a higher score for documents that have search terms in the same row.
Currently the second document gets a score of 0.22, while the first one gets 0.14.
I want the first document to get a higher score in this case, since it has both "Ireland" and "Setter" in the same row. How can it be done?
With great help from members of the ElasticSearch Google group, a solution was found.
Here is the link to the discussion: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/4O9dff2SNhg