I am using MarkLogic 8.0-6.3
I came across a scenario where I need nested JSON output in SPARQL.
For example there are multiple same predicates for an IRI, in the result I want the multiple values in array not as a whole triple.
for example:
Assume triples:
#prefix p0: <http://www.mla.com/term/> .
p0:7 <http://www.w3.org/2004/02/skos/core#narrower> p0:768 ,
p0:769 ,
p0:770 ,
p0:771 .
SPARQL query:
PREFIX skos-mla: <http://www.mlacustom.com#>
PREFIX term: <http://www.mla.com/term/>
select ?iri ?o {
graph<thesaurus-term>{
bind(term:7 as ?iri)
term:7 skos:narrower ?o .
}
}
the above query will return the 4 triples as output.
What I want is it should just return me a single json structure like
{
"iri": "http://www.mla.com/term/7",
"narrowers": ["http://www.mla.com/term/768", "http://www.mla.com/term/769", "http://www.mla.com/term/770", "http://www.mla.com/term/771"]
}
Above JSON is just to explain the problem.
In actual I would need a more complex json structure like instead of string array I need an array of json objects.
I know that I can read the response and recreate the whole json in any format but it has performance impacts.
In recent versions of MarkLogic 9, the Optic API can support this requirement:
Use the op.fromSPARQL() accessor to project columns of values from the triples.
Chain a select() call using op.jsonObject() to collect the values as properties of objects.
Chain a groupBy() call using op.arrayAggregate() to aggregate the objects in an array.
Chain a result() call to get the output.
For more information, see:
http://docs.marklogic.com/op.jsonObject
and:
http://docs.marklogic.com/op.arrayAggregate
Hoping that helps,
Related
I have loaded the geospatial data from geonames.org into Marklogic using RDF import.
When using the Query Console to explore the data, I see the data has been loaded into an xml document and looks like this:
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#lat</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">40.41476</sem:object>
</sem:triple>
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#long</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">-8.54304</sem:object>
</sem:triple>
I am able to do a SPARQL DESCRIBE and see data. Here is an example.
#prefix geonames: <http://www.geonames.org/ontology#> .
#prefix xs: <http://www.w3.org/2001/XMLSchema#> .
#prefix p0: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<http://sws.geonames.org/2736540/> geonames:parentCountry <http://sws.geonames.org/2264397/> ;
geonames:countryCode "PT"^^xs:string ;
p0:long "-8.54304"^^xs:string ;
geonames:featureCode <http://www.geonames.org/ontology#P.PPL> ;
geonames:parentADM1 <http://sws.geonames.org/2742610/> ;
geonames:parentFeature <http://sws.geonames.org/2742610/> ;
<http://www.w3.org/2000/01/rdf-schema#isDefinedBy> "http://sws.geonames.org/2736540/about.rdf"^^xs:string ;
a geonames:Feature ;
geonames:locationMap <http://www.geonames.org/2736540/pedreira-de-vilarinho.html> ;
geonames:name "Pedreira de Vilarinho"^^xs:string ;
geonames:nearbyFeatures <http://sws.geonames.org/2736540/nearby.rdf> ;
geonames:featureClass geonames:P ;
p0:lat "40.41476"^^xs:string .
I want to query over this data using SPARQL QUERY as my Query Type in a way where I can take advantage of the geospatial indexes that MarkLogic can create.
I have been having trouble with two aspects of this.
How to correctly create the geospatial indexes for the wgs84_pos#lat and wgs84_pos#long predicates?
How do I access those indexes from SPARQL QUERY?
I would like to have a sparql query that would be able to find subjects within some range of a Point.
=====================================
Followup:
After following David Ennis's Answer (Which worked nicely, thanks!) I ended up with this sample Xquery that was able to select data out of documents via geosearch and then use those IRI's in a sparql values query.
Example:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
let $matches := cts:search(//rdf:RDF,
cts:element-pair-geospatial-query (
fn:QName("http://www.geonames.org/ontology#","Feature"),
fn:QName("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat"),
fn:QName ("http://www.w3.org/2003/01/geo/wgs84_pos#","long"),
cts:circle(10, cts:point(19.8,99.8))))
let $iris := sem:iri($matches//#rdf:about)
let $bindings := (fn:map(function($n) { map:entry("featureIRI", $n) }, $iris))
let $sparql := '
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT *
WHERE {
?featureIRI wgs:lat ?lat;
wgs:long ?long.
}
'
return sem:sparql-values($sparql, $bindings)
This xquery queries the geospatial index, finds matching documents and then selects the IRI in the rdf:about attribute of the xml document.
It then maps over all of those IRIs and creates map entries that can be passed in the bindings parameter of the sem:sparql-values function.
I do not believe you can do what you want via just native SPARQL. Geospacial queries in any SPARQL implementation are extensions like geoSPARQL, Apache Jena geospacial queries etc.
My suggested approach in MarkLogic:
Insert the geonames subjects into MarkLogic as unmanaged triples (an XML or JSON document with embedded triples for each one)
In the same document, include the geo-spacial data in one of the acceptable MarkLogic formats. This essentially adds geo-spacial metadata to the triple since it is in the same fragment.
Add geo-spacial path-range-indexes for the geospacial data.
Use SPARQL inside of MarkLogic with a cts query restriction.
The Building Blocks for above:
Understanding unmanaged triples
Understanding Geo-spacial Region Types
Understanding Geo-spacial Indexes
Understanding Geo-spacial Queries
Understanding Semantics with cts search
Another approach to the final query could be the Optic API but I do not see how it would negate the need to do steps 1-3
I have a data source file that one of its properties is an actual class instance:
<clinic:Radiology rdf:ID="rad1234">
<clinic:diagnosis>Stage 4</clinic:diagnosis>
<clinic:ProvidedBy rdf:resource="#MountSinai"/>
<clinic:ReceivedBy rdf:resource="#JohnSmith"/>
<clinic:patientId>7890123</clinic:patientId>
<clinic:radiologyDate>01-01-2017</clinic:radiologyDate>
</clinic:Radiology>
so clinic:ProvidedBy is pointing to this:
<clinic:Radiologists rdf:ID="MountSinai">
<clinic:name>Mount Sinai</clinic:name>
<clinic:npi>1234567</clinic:npi>
<clinic:specialty>Oncology</clinic:specialty>
</clinic:Radiologists>
How do I query using the property clinic:providedBy (which is of type clinic:Radiologists)? Whatever I have tried does not bring back results.
It's also not clear what exactly you want to have, so my answer will return "all radiology resources that are provided by MountSinai":
PREFIX clinic: <THE NAMESPACE OF_THE_CLINIC_PREFIX>
PREFIX : <THE_BASE_NAMESPACE_OF_YOUR_RDF_DOCUMENT>
SELECT DISTINCT ?s WHERE {
?s clinic:ProvidedBy :MountSinai
}
But, I really suggest to start with an RDF and SPARQL tutorial, since form your comment your query
SELECT * WHERE { ?x rdf:resource "#MountSinai" }
is missing fundamental SPARQL basics. And for writing a matching SPARQL query it'S always good to have a look at the data in Turtle resp. N-Triples format both of which being closer to the SPARQL syntax.
I am trying to teach myself this weekend how to run API queries against a data source in this case data.gov. At first I thought I'd use a simple SQL variant, but it seems in this case I have to use SPARQL.
I've read through the documentation, downloaded Twinkle, and can't seem to quite get it to run. Here is an example of a query I'm running. I'm basically trying to find all gas stations that are null around Denver, CO.
PREFIX station: https://api.data.gov/nrel/alt-fuel-stations/v1/nearest.json?api_key=???location=Denver+CO
SELECT *
WHERE
{ ?x station:network ?network like "null"
}
Any help would be very much appreciated.
SPARQL is a graph pattern language for RDF triples. A query consists of a set of "basic graph patterns" described by triple patterns of the form <subject>, <predicate>, <object>. RDF defines the subject and predicate with URI's and the object is either a URI (object property) or literal (datatype or language-tagged property). Each triple pattern in a query must therefore have three entities.
Since we don't have any examples of your data, I'll provide a way to explore the data a bit. Let's assume your prefix is correctly defined, which I doubt - it will not be the REST API URL, but the URI of the entity itself. Then you can try the following:
PREFIX station: <http://api.data.gov/nrel...>
SELECT *
WHERE
{ ?s station:network ?network .
}
...setting the PREFIX to correctly represent the namespace for network. Then look at the binding for ?network and find out how they represent null. Let's say it is a string as you show. Then the query would look like:
PREFIX station: <http://api.data.gov/nrel...>
SELECT ?s
WHERE
{ ?s station:network "null" .
}
There is no like in SPARQL, but you could use a FILTER clause using regex or other string matching features of SPARQL.
And please, please, please google "SPARQL" and "RDF". There is lots of information about SPARQL, and the W3C's SPARQL 1.1 Query Language Recommendation is a comprehensive source with many good examples.
I use this sparql query to get as much cities as possible:
select * where {
?city rdf:type dbo:PopulatedPlace
}
However, some expected ones are missing e.g.
http://dbpedia.org/resource/Heidelberg
(neither that nor one of its wikiRedirects)
which is of a dbo:PopulatedPlace as this query returns true (in JSON):
ask {
:Heidelberg a dbo:PopulatedPlace
}
I need that list to be exhaustiv because later I will add constraints based on user input.
I use http://dbpedia.org/snorql/ to test the queries.
Any help is appreciated.
UPDATE:
One of the Devs told me the public endpoint is limited ( about 1K ).
I'll come up with a paginated solution and see if it contains the 'outlier'.
UPDATE2:
The outlier is definitly in the resultset of rdf:type dbo:Town.
Using dbo:PopulatedPlace yields too many results to check per hand, though.
The public endpoint limits results to about 1K. Pagination or use of a smaller subclass of dbo:PopulatedPlace yields the result.
I'm playing around some with Dotnetrdf's sparql engine and I'm trying to create parametered queries with no success yet.
Say I'm working on a graph g with a blank node identified as _:1690 with the code
Dim queryString As SparqlParameterizedString = New SparqlParameterizedString()
queryString.Namespaces.AddNamespace("rdfs", UriFactory.Create("http://www.w3.org/2000/01/rdf-schema#"))
queryString.CommandText = "SELECT ?label { #context rdfs:label ?label } "
queryString.SetParameter("context", g.GetBlankNode("1690"))
Dim result As VDS.RDF.Query.SparqlResultSet = g.ExecuteQuery(New SparqlQueryParser().ParseFromString(queryString))
Whenever I run this, I get all nodes having a rdfs:label property instead of filtering the result on my blank node only.
Please, how to set the parameter's value properly so I get only one item in the result ?
Thanks in advance,
Max.
Blank Nodes in a SPARQL query differ from Blank Nodes in a RDF Graph
In a SPARQL Query a blank node is treated as a temporary variable which has limited scope, it does not match a specific blank node so you cannot write a SPARQL query to select by blank node identifier.
So your code creating your query is giving a result the same as if you replaced #context with a variable e.g. ?s
If you need to find the value associated with a specific blank node then you need to formulate a query that uniquely selects that blank node based on the triples it participates in. If you can't do that then you need to re-think your data modelling since if this is the case then you should likely be using URIs instead of blank nodes.
As a workaround since you are using dotNetRDF and have the original graph you are querying you can use the IGraph API instead e.g.
INode label = g.GetTriplesWithSubjectPredicate(g.GetBlankNode("1690"), g.CreateUriNode("rdfs:label")).Select(t => t.Object).FirstOrDefault();
Just remember that label could always be null if the triple you are looking for doesn't exist