How to create and use GeoSpatial indexes in Marklogic from Sparql - sparql

I have loaded the geospatial data from geonames.org into Marklogic using RDF import.
When using the Query Console to explore the data, I see the data has been loaded into an xml document and looks like this:
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#lat</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">40.41476</sem:object>
</sem:triple>
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#long</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">-8.54304</sem:object>
</sem:triple>
I am able to do a SPARQL DESCRIBE and see data. Here is an example.
#prefix geonames: <http://www.geonames.org/ontology#> .
#prefix xs: <http://www.w3.org/2001/XMLSchema#> .
#prefix p0: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<http://sws.geonames.org/2736540/> geonames:parentCountry <http://sws.geonames.org/2264397/> ;
geonames:countryCode "PT"^^xs:string ;
p0:long "-8.54304"^^xs:string ;
geonames:featureCode <http://www.geonames.org/ontology#P.PPL> ;
geonames:parentADM1 <http://sws.geonames.org/2742610/> ;
geonames:parentFeature <http://sws.geonames.org/2742610/> ;
<http://www.w3.org/2000/01/rdf-schema#isDefinedBy> "http://sws.geonames.org/2736540/about.rdf"^^xs:string ;
a geonames:Feature ;
geonames:locationMap <http://www.geonames.org/2736540/pedreira-de-vilarinho.html> ;
geonames:name "Pedreira de Vilarinho"^^xs:string ;
geonames:nearbyFeatures <http://sws.geonames.org/2736540/nearby.rdf> ;
geonames:featureClass geonames:P ;
p0:lat "40.41476"^^xs:string .
I want to query over this data using SPARQL QUERY as my Query Type in a way where I can take advantage of the geospatial indexes that MarkLogic can create.
I have been having trouble with two aspects of this.
How to correctly create the geospatial indexes for the wgs84_pos#lat and wgs84_pos#long predicates?
How do I access those indexes from SPARQL QUERY?
I would like to have a sparql query that would be able to find subjects within some range of a Point.
=====================================
Followup:
After following David Ennis's Answer (Which worked nicely, thanks!) I ended up with this sample Xquery that was able to select data out of documents via geosearch and then use those IRI's in a sparql values query.
Example:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
let $matches := cts:search(//rdf:RDF,
cts:element-pair-geospatial-query (
fn:QName("http://www.geonames.org/ontology#","Feature"),
fn:QName("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat"),
fn:QName ("http://www.w3.org/2003/01/geo/wgs84_pos#","long"),
cts:circle(10, cts:point(19.8,99.8))))
let $iris := sem:iri($matches//#rdf:about)
let $bindings := (fn:map(function($n) { map:entry("featureIRI", $n) }, $iris))
let $sparql := '
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT *
WHERE {
?featureIRI wgs:lat ?lat;
wgs:long ?long.
}
'
return sem:sparql-values($sparql, $bindings)
This xquery queries the geospatial index, finds matching documents and then selects the IRI in the rdf:about attribute of the xml document.
It then maps over all of those IRIs and creates map entries that can be passed in the bindings parameter of the sem:sparql-values function.

I do not believe you can do what you want via just native SPARQL. Geospacial queries in any SPARQL implementation are extensions like geoSPARQL, Apache Jena geospacial queries etc.
My suggested approach in MarkLogic:
Insert the geonames subjects into MarkLogic as unmanaged triples (an XML or JSON document with embedded triples for each one)
In the same document, include the geo-spacial data in one of the acceptable MarkLogic formats. This essentially adds geo-spacial metadata to the triple since it is in the same fragment.
Add geo-spacial path-range-indexes for the geospacial data.
Use SPARQL inside of MarkLogic with a cts query restriction.
The Building Blocks for above:
Understanding unmanaged triples
Understanding Geo-spacial Region Types
Understanding Geo-spacial Indexes
Understanding Geo-spacial Queries
Understanding Semantics with cts search
Another approach to the final query could be the Optic API but I do not see how it would negate the need to do steps 1-3

Related

GraphDB: use similarity search with embedded repository

I have been using GraphDB's (version 9.2) similarity search from the workbench. Now I also want to use this feature for an embedded repository using graphdb-free-runtime 9.2.1. However I have no clue how this feature can be used from the APIs provided by the runtime. My questions are:
In http://graphdb.ontotext.com/documentation/standard/semantic-similarity-searches.html it is mentioned that similarity search is a plugin. However I could not find any such plugin within the runtime. Do I have to load it from some external resource? Where?
Is there any documentation or example how to create a similarity index programmatically?
Would it be an option to run the workbench as some kind of server and access the similarity search via REST API? However the REST API documentation at http://graphdb.ontotext.com/documentation/9.0/free/using-the-workbench-rest-api.html does not mention any API for similarity searches. Am I missing something?
Any hints or pointers are welcome.
You could add similarity plugin runtime, setting following "graphdb.extra.plugins" property to directory where similarity-plugin (you could find such into GDB instance -> dist/lib/plugins) is located , or:
-Dgraphdb.extra.plugins=directory
System.setProperty("graphdb.extra.plugins", "directory");
You may create index programmatically using SPARQL or:
to create similarity text index "allNews" execute following update:
PREFIX : <http://www.ontotext.com/graphdb/similarity/>
PREFIX inst: <http://www.ontotext.com/graphdb/similarity/instance/>
PREFIX pred: <http://www.ontotext.com/graphdb/similarity/psi/>
insert {
inst:allNews :createIndex "-termweight idf" ;
:analyzer "org.apache.lucene.analysis.en.EnglishAnalyzer" ;
:documentID ?documentID .
?documentID :documentText ?documentText .
} where {
SELECT ?documentID ?documentText {
?documentID ?p ?documentText .
filter(isLiteral(?documentText))
}
}
to delete index "allNews" execute following update:
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
insert { inst:allNews :deleteIndex '' } where {}
to rebuild index "allNews":
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
insert { inst:allNews :rebuildIndex '' } where {}
followed by the create query!
to list all created indexes, execute following query:
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
select ?index ?status ?type
where {
?index :status ?status .
?index :type ?type .
}

Is it possible to get the nested json output in SPARQL?

I am using MarkLogic 8.0-6.3
I came across a scenario where I need nested JSON output in SPARQL.
For example there are multiple same predicates for an IRI, in the result I want the multiple values in array not as a whole triple.
for example:
Assume triples:
#prefix p0: <http://www.mla.com/term/> .
p0:7 <http://www.w3.org/2004/02/skos/core#narrower> p0:768 ,
p0:769 ,
p0:770 ,
p0:771 .
SPARQL query:
PREFIX skos-mla: <http://www.mlacustom.com#>
PREFIX term: <http://www.mla.com/term/>
select ?iri ?o {
graph<thesaurus-term>{
bind(term:7 as ?iri)
term:7 skos:narrower ?o .
}
}
the above query will return the 4 triples as output.
What I want is it should just return me a single json structure like
{
"iri": "http://www.mla.com/term/7",
"narrowers": ["http://www.mla.com/term/768", "http://www.mla.com/term/769", "http://www.mla.com/term/770", "http://www.mla.com/term/771"]
}
Above JSON is just to explain the problem.
In actual I would need a more complex json structure like instead of string array I need an array of json objects.
I know that I can read the response and recreate the whole json in any format but it has performance impacts.
In recent versions of MarkLogic 9, the Optic API can support this requirement:
Use the op.fromSPARQL() accessor to project columns of values from the triples.
Chain a select() call using op.jsonObject() to collect the values as properties of objects.
Chain a groupBy() call using op.arrayAggregate() to aggregate the objects in an array.
Chain a result() call to get the output.
For more information, see:
http://docs.marklogic.com/op.jsonObject
and:
http://docs.marklogic.com/op.arrayAggregate
Hoping that helps,

How to compare values, ignoring diacritics, with SPARQL

I've been trying (with no success so far) to filter values with a "broader equals" condition. That is, ignoring diacritics.
select * where {
?s per:surname1 ?t.
bind (fn:starts-with(str(?t),'Maria') as ?noAccent1) .
bind (fn:translate(str(?t),"áéíóú","aeiou") as ?noAccent2) .
} limit 100
To this moment, I've tried with XPath functions fn:contains, fn:compare, fn:translate, fn:starts-with, but none of them seem to be working.
Is there any other way (other than chaining replaces) to add collation into these functions or achieve the same goal?
The XPath functions you mention are not part of the SPARQL standard really, so as you found out, you can't rely on them being supported out of the box (though some vendors may provide them as an add-on).
However, GraphDB (which is based on RDF4J) allows you to create your own custom functions in SPARQL. It is a matter of writing a Java class that implements the org.eclipse.rdf4j.query.algebra.evaluation.function.Function interface, and registering it in the RDF4J engine by packaging it as a Java Service Provider Interface (SPI) implementation.
SPARQL and REGEX do not support efficiently transliterating character maps. If you want an efficient implementation you would need a custom RDF4J custom as described by Jeen.
If you want a quick and dirty solution use this code sample:
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX spif: <http://spinrdf.org/spif#>
select * where {
BIND("Mariana" as ?t) .
BIND("Márénísótú" as ?t2) .
BIND (regex(str(?t),'^Maria') as ?noAccent1) .
BIND (spif:replaceAll(
spif:replaceAll(
spif:replaceAll(
spif:replaceAll(
spif:replaceAll(str(?t2),"á","a"),
"é","e")
,"í","i"),
"ó","o"),
"ú","u") as ?noAccent2) .
}

How to filter query SPARQL for property "type"

I have a data source file that one of its properties is an actual class instance:
<clinic:Radiology rdf:ID="rad1234">
<clinic:diagnosis>Stage 4</clinic:diagnosis>
<clinic:ProvidedBy rdf:resource="#MountSinai"/>
<clinic:ReceivedBy rdf:resource="#JohnSmith"/>
<clinic:patientId>7890123</clinic:patientId>
<clinic:radiologyDate>01-01-2017</clinic:radiologyDate>
</clinic:Radiology>
so clinic:ProvidedBy is pointing to this:
<clinic:Radiologists rdf:ID="MountSinai">
<clinic:name>Mount Sinai</clinic:name>
<clinic:npi>1234567</clinic:npi>
<clinic:specialty>Oncology</clinic:specialty>
</clinic:Radiologists>
How do I query using the property clinic:providedBy (which is of type clinic:Radiologists)? Whatever I have tried does not bring back results.
It's also not clear what exactly you want to have, so my answer will return "all radiology resources that are provided by MountSinai":
PREFIX clinic: <THE NAMESPACE OF_THE_CLINIC_PREFIX>
PREFIX : <THE_BASE_NAMESPACE_OF_YOUR_RDF_DOCUMENT>
SELECT DISTINCT ?s WHERE {
?s clinic:ProvidedBy :MountSinai
}
But, I really suggest to start with an RDF and SPARQL tutorial, since form your comment your query
SELECT * WHERE { ?x rdf:resource "#MountSinai" }
is missing fundamental SPARQL basics. And for writing a matching SPARQL query it'S always good to have a look at the data in Turtle resp. N-Triples format both of which being closer to the SPARQL syntax.

How Do I Query Against Data.gov

I am trying to teach myself this weekend how to run API queries against a data source in this case data.gov. At first I thought I'd use a simple SQL variant, but it seems in this case I have to use SPARQL.
I've read through the documentation, downloaded Twinkle, and can't seem to quite get it to run. Here is an example of a query I'm running. I'm basically trying to find all gas stations that are null around Denver, CO.
PREFIX station: https://api.data.gov/nrel/alt-fuel-stations/v1/nearest.json?api_key=???location=Denver+CO
SELECT *
WHERE
{ ?x station:network ?network like "null"
}
Any help would be very much appreciated.
SPARQL is a graph pattern language for RDF triples. A query consists of a set of "basic graph patterns" described by triple patterns of the form <subject>, <predicate>, <object>. RDF defines the subject and predicate with URI's and the object is either a URI (object property) or literal (datatype or language-tagged property). Each triple pattern in a query must therefore have three entities.
Since we don't have any examples of your data, I'll provide a way to explore the data a bit. Let's assume your prefix is correctly defined, which I doubt - it will not be the REST API URL, but the URI of the entity itself. Then you can try the following:
PREFIX station: <http://api.data.gov/nrel...>
SELECT *
WHERE
{ ?s station:network ?network .
}
...setting the PREFIX to correctly represent the namespace for network. Then look at the binding for ?network and find out how they represent null. Let's say it is a string as you show. Then the query would look like:
PREFIX station: <http://api.data.gov/nrel...>
SELECT ?s
WHERE
{ ?s station:network "null" .
}
There is no like in SPARQL, but you could use a FILTER clause using regex or other string matching features of SPARQL.
And please, please, please google "SPARQL" and "RDF". There is lots of information about SPARQL, and the W3C's SPARQL 1.1 Query Language Recommendation is a comprehensive source with many good examples.