How Do I Query Against Data.gov - sparql

I am trying to teach myself this weekend how to run API queries against a data source in this case data.gov. At first I thought I'd use a simple SQL variant, but it seems in this case I have to use SPARQL.
I've read through the documentation, downloaded Twinkle, and can't seem to quite get it to run. Here is an example of a query I'm running. I'm basically trying to find all gas stations that are null around Denver, CO.
PREFIX station: https://api.data.gov/nrel/alt-fuel-stations/v1/nearest.json?api_key=???location=Denver+CO
SELECT *
WHERE
{ ?x station:network ?network like "null"
}
Any help would be very much appreciated.

SPARQL is a graph pattern language for RDF triples. A query consists of a set of "basic graph patterns" described by triple patterns of the form <subject>, <predicate>, <object>. RDF defines the subject and predicate with URI's and the object is either a URI (object property) or literal (datatype or language-tagged property). Each triple pattern in a query must therefore have three entities.
Since we don't have any examples of your data, I'll provide a way to explore the data a bit. Let's assume your prefix is correctly defined, which I doubt - it will not be the REST API URL, but the URI of the entity itself. Then you can try the following:
PREFIX station: <http://api.data.gov/nrel...>
SELECT *
WHERE
{ ?s station:network ?network .
}
...setting the PREFIX to correctly represent the namespace for network. Then look at the binding for ?network and find out how they represent null. Let's say it is a string as you show. Then the query would look like:
PREFIX station: <http://api.data.gov/nrel...>
SELECT ?s
WHERE
{ ?s station:network "null" .
}
There is no like in SPARQL, but you could use a FILTER clause using regex or other string matching features of SPARQL.
And please, please, please google "SPARQL" and "RDF". There is lots of information about SPARQL, and the W3C's SPARQL 1.1 Query Language Recommendation is a comprehensive source with many good examples.

Related

How to compare values, ignoring diacritics, with SPARQL

I've been trying (with no success so far) to filter values with a "broader equals" condition. That is, ignoring diacritics.
select * where {
?s per:surname1 ?t.
bind (fn:starts-with(str(?t),'Maria') as ?noAccent1) .
bind (fn:translate(str(?t),"áéíóú","aeiou") as ?noAccent2) .
} limit 100
To this moment, I've tried with XPath functions fn:contains, fn:compare, fn:translate, fn:starts-with, but none of them seem to be working.
Is there any other way (other than chaining replaces) to add collation into these functions or achieve the same goal?
The XPath functions you mention are not part of the SPARQL standard really, so as you found out, you can't rely on them being supported out of the box (though some vendors may provide them as an add-on).
However, GraphDB (which is based on RDF4J) allows you to create your own custom functions in SPARQL. It is a matter of writing a Java class that implements the org.eclipse.rdf4j.query.algebra.evaluation.function.Function interface, and registering it in the RDF4J engine by packaging it as a Java Service Provider Interface (SPI) implementation.
SPARQL and REGEX do not support efficiently transliterating character maps. If you want an efficient implementation you would need a custom RDF4J custom as described by Jeen.
If you want a quick and dirty solution use this code sample:
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX spif: <http://spinrdf.org/spif#>
select * where {
BIND("Mariana" as ?t) .
BIND("Márénísótú" as ?t2) .
BIND (regex(str(?t),'^Maria') as ?noAccent1) .
BIND (spif:replaceAll(
spif:replaceAll(
spif:replaceAll(
spif:replaceAll(
spif:replaceAll(str(?t2),"á","a"),
"é","e")
,"í","i"),
"ó","o"),
"ú","u") as ?noAccent2) .
}

GeoSPARQL within query

I'm struggling with the GeoSPARQL functions. I have two points defined in my ontology. Using this query I get them in my results:
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?what ?met
WHERE {
?what geo:hasGeometry ?met .
FILTER geof:within( ?met ,"ENVELOPE(51.900991, 51.913594, 4.502206, 4.476328)"^^geo:wktLiteral ) .
}
http://www.example.org/POI#Headquater
http://www.example.org/POI#MiddenVanDeBrug
The question is why http://www.example.org/POI#ErasmusBrug is not part of the search result. Should it be possible to search for polygons within an envelop?
Which GeoSPARQL functions are available in Stardog? Any good example resource?
The ontology I use can be found here
The Stardog documentation for GeoSPARQL can be found here. For more specific support, please come visit us at Stardog Community.
I found out that there is an error in the log file of Stardog upon importing the data:
WARN 2017-12-14 08:31:30,989 [XNIO-1 task-24] com.complexible.stardog.spatial.io.StatementSourceGeospatialSource:parse(95): Failed to parse unknown/malformed shape POLYGON((4.476027 51.91137, 4.497099 51.911291, 4.497142 51.905307, 4.75813 51.905201, 4.476027 51.91137 )). Skipping this record
What could be wrong with this polygon?

How to create and use GeoSpatial indexes in Marklogic from Sparql

I have loaded the geospatial data from geonames.org into Marklogic using RDF import.
When using the Query Console to explore the data, I see the data has been loaded into an xml document and looks like this:
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#lat</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">40.41476</sem:object>
</sem:triple>
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#long</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">-8.54304</sem:object>
</sem:triple>
I am able to do a SPARQL DESCRIBE and see data. Here is an example.
#prefix geonames: <http://www.geonames.org/ontology#> .
#prefix xs: <http://www.w3.org/2001/XMLSchema#> .
#prefix p0: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<http://sws.geonames.org/2736540/> geonames:parentCountry <http://sws.geonames.org/2264397/> ;
geonames:countryCode "PT"^^xs:string ;
p0:long "-8.54304"^^xs:string ;
geonames:featureCode <http://www.geonames.org/ontology#P.PPL> ;
geonames:parentADM1 <http://sws.geonames.org/2742610/> ;
geonames:parentFeature <http://sws.geonames.org/2742610/> ;
<http://www.w3.org/2000/01/rdf-schema#isDefinedBy> "http://sws.geonames.org/2736540/about.rdf"^^xs:string ;
a geonames:Feature ;
geonames:locationMap <http://www.geonames.org/2736540/pedreira-de-vilarinho.html> ;
geonames:name "Pedreira de Vilarinho"^^xs:string ;
geonames:nearbyFeatures <http://sws.geonames.org/2736540/nearby.rdf> ;
geonames:featureClass geonames:P ;
p0:lat "40.41476"^^xs:string .
I want to query over this data using SPARQL QUERY as my Query Type in a way where I can take advantage of the geospatial indexes that MarkLogic can create.
I have been having trouble with two aspects of this.
How to correctly create the geospatial indexes for the wgs84_pos#lat and wgs84_pos#long predicates?
How do I access those indexes from SPARQL QUERY?
I would like to have a sparql query that would be able to find subjects within some range of a Point.
=====================================
Followup:
After following David Ennis's Answer (Which worked nicely, thanks!) I ended up with this sample Xquery that was able to select data out of documents via geosearch and then use those IRI's in a sparql values query.
Example:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
let $matches := cts:search(//rdf:RDF,
cts:element-pair-geospatial-query (
fn:QName("http://www.geonames.org/ontology#","Feature"),
fn:QName("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat"),
fn:QName ("http://www.w3.org/2003/01/geo/wgs84_pos#","long"),
cts:circle(10, cts:point(19.8,99.8))))
let $iris := sem:iri($matches//#rdf:about)
let $bindings := (fn:map(function($n) { map:entry("featureIRI", $n) }, $iris))
let $sparql := '
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT *
WHERE {
?featureIRI wgs:lat ?lat;
wgs:long ?long.
}
'
return sem:sparql-values($sparql, $bindings)
This xquery queries the geospatial index, finds matching documents and then selects the IRI in the rdf:about attribute of the xml document.
It then maps over all of those IRIs and creates map entries that can be passed in the bindings parameter of the sem:sparql-values function.
I do not believe you can do what you want via just native SPARQL. Geospacial queries in any SPARQL implementation are extensions like geoSPARQL, Apache Jena geospacial queries etc.
My suggested approach in MarkLogic:
Insert the geonames subjects into MarkLogic as unmanaged triples (an XML or JSON document with embedded triples for each one)
In the same document, include the geo-spacial data in one of the acceptable MarkLogic formats. This essentially adds geo-spacial metadata to the triple since it is in the same fragment.
Add geo-spacial path-range-indexes for the geospacial data.
Use SPARQL inside of MarkLogic with a cts query restriction.
The Building Blocks for above:
Understanding unmanaged triples
Understanding Geo-spacial Region Types
Understanding Geo-spacial Indexes
Understanding Geo-spacial Queries
Understanding Semantics with cts search
Another approach to the final query could be the Optic API but I do not see how it would negate the need to do steps 1-3

How to filter query SPARQL for property "type"

I have a data source file that one of its properties is an actual class instance:
<clinic:Radiology rdf:ID="rad1234">
<clinic:diagnosis>Stage 4</clinic:diagnosis>
<clinic:ProvidedBy rdf:resource="#MountSinai"/>
<clinic:ReceivedBy rdf:resource="#JohnSmith"/>
<clinic:patientId>7890123</clinic:patientId>
<clinic:radiologyDate>01-01-2017</clinic:radiologyDate>
</clinic:Radiology>
so clinic:ProvidedBy is pointing to this:
<clinic:Radiologists rdf:ID="MountSinai">
<clinic:name>Mount Sinai</clinic:name>
<clinic:npi>1234567</clinic:npi>
<clinic:specialty>Oncology</clinic:specialty>
</clinic:Radiologists>
How do I query using the property clinic:providedBy (which is of type clinic:Radiologists)? Whatever I have tried does not bring back results.
It's also not clear what exactly you want to have, so my answer will return "all radiology resources that are provided by MountSinai":
PREFIX clinic: <THE NAMESPACE OF_THE_CLINIC_PREFIX>
PREFIX : <THE_BASE_NAMESPACE_OF_YOUR_RDF_DOCUMENT>
SELECT DISTINCT ?s WHERE {
?s clinic:ProvidedBy :MountSinai
}
But, I really suggest to start with an RDF and SPARQL tutorial, since form your comment your query
SELECT * WHERE { ?x rdf:resource "#MountSinai" }
is missing fundamental SPARQL basics. And for writing a matching SPARQL query it'S always good to have a look at the data in Turtle resp. N-Triples format both of which being closer to the SPARQL syntax.

DBpedia get all cities in the world - missing a few

I use this sparql query to get as much cities as possible:
select * where {
?city rdf:type dbo:PopulatedPlace
}
However, some expected ones are missing e.g.
http://dbpedia.org/resource/Heidelberg
(neither that nor one of its wikiRedirects)
which is of a dbo:PopulatedPlace as this query returns true (in JSON):
ask {
:Heidelberg a dbo:PopulatedPlace
}
I need that list to be exhaustiv because later I will add constraints based on user input.
I use http://dbpedia.org/snorql/ to test the queries.
Any help is appreciated.
UPDATE:
One of the Devs told me the public endpoint is limited ( about 1K ).
I'll come up with a paginated solution and see if it contains the 'outlier'.
UPDATE2:
The outlier is definitly in the resultset of rdf:type dbo:Town.
Using dbo:PopulatedPlace yields too many results to check per hand, though.
The public endpoint limits results to about 1K. Pagination or use of a smaller subclass of dbo:PopulatedPlace yields the result.