Sparql Query Geometry doing strange things - sparql

I'm writing a SPARQL query to find places within a given radius of a point, using the DBpedia endpoint (Snorql).
My first attempt (which already works on some other endpoints) was this:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT *
WHERE {
?resource rdfs:label ?label .
?resource geo:lat ?lat .
?resource geo:long ?long .
?resource geo:geometry ?coordinates .
FILTER(bif:st_within(?coordinates, bif:st_geomFromText("POINT(10.2788 47.4093)"), 1)) .
FILTER (lang(?label)= "de") .
}
I noticed that it doesn't return any results. Then I tried the same thing, building the point from the rounded values in geo:lat and geo:long:
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT *
WHERE {
?resource rdfs:label ?label .
?resource geo:lat ?lat .
?resource geo:long ?long .
?resource geo:geometry ?coordinates .
FILTER(bif:st_within(bif:st_point(?long, ?lat), bif:st_geomFromText("POINT(10.2788 47.4093)"), 1)) .
FILTER (lang(?label)= "de") .
}
Now I get 2 results. When I increase the radius in the first query to 21, there are plenty of results, but when I decrease it to 20, there are none. Did I make a mistake in the first query?
Thank you very much,
SaW

As I answered on Confluence...
Interesting!
Your original query (with addition of FROM <http://dbpedia.org> clause) does return the expected results when run against the LOD Cloud Cache, which is now on a somewhat older Virtuoso engine. This looks like a regression in newer versions.
To check things on DBpedia, I started with your query, and added a couple of BINDs using the st_distance() function to the WHERE clause --
BIND
( bif:st_distance
( ?coordinates,
bif:st_geomFromText("POINT(10.2788 47.4093)")
) AS ?coord_distance
) .
BIND
( bif:st_distance
( bif:st_point(?long, ?lat),
bif:st_geomFromText("POINT(10.2788 47.4093)")
) AS ?latlong_distance
)
}
I also added a final ORDER BY ?coord_distance to the query.
My results on DBpedia.org/sparql clearly show two entities within your desired radius of 1, and the calculated distances are the same whether based on ?coordinates or st_point(?long, ?lat) but they are not delivered unless bif:st_within specifies a radius of 21 or greater -- and those results include a number of other entities that are within the larger radius.
I've raised this to Virtuoso Development, and it's being tracked internally as bug#18399.
... and as followed up there ...
st_within() uses st_distance(), so given that the srid is 4326 (as is typical for DBpedia geodata), "the haversine function is used to compute a great circle distance in kilometers on Earth." You can divide your distance in meters by 1000 (or multiply it by 0.001) to get the distance in kilometers for use in the st_within() call.
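To make the kilometre units concrete, here is a minimal Python sketch of a haversine great-circle distance. It only illustrates the formula and is not Virtuoso's implementation: the function name `haversine_km`, the 6371 km mean Earth radius, and the second test point are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; Virtuoso's constant may differ


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Query point from the question, in the same (long, lat) order as POINT(10.2788 47.4093)
centre = (10.2788, 47.4093)
# Hypothetical nearby point, 0.005 degrees further north (made up for illustration)
nearby = (10.2788, 47.4143)

distance = haversine_km(*centre, *nearby)
print(f"{distance:.3f} km")
print("within 1 km:", distance <= 1.0)
```

A radius of 1 in st_within() therefore means roughly one kilometre on the ground, not one degree.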
The computing time is dependent on the instance host, other load on the instance, etc. The public DBpedia instance's response time may well be longer than you can tolerate. You can set up your own mirror in a local server or in the cloud (AMIs based on DBpedia 2016-10 Snapshot [current DBpedia.org/sparql], or DBpedia-Live [current live.DBpedia.org/sparql]), which you can put on any AWS instance type -- so you can give it as much processor and/or RAM as you like.
Note that the LOD Cloud Cache instance may be upgraded to a newer Virtuoso engine at any time, so you should not rely on this delivering the desired results via st_within(). A slightly adjusted DBpedia query will deliver what I think you want using only the st_distance() function (here calculated from ?coordinates, but you could also use the more complex construction based on ?long and ?lat), and not the malfunctioning st_within().
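The adjusted query itself is not reproduced above, so the following is only my guess at its general shape, not the answerer's actual query. It is a small Python helper (the name `build_distance_query` and its parameters are hypothetical) that assembles a query filtering on bif:st_distance() rather than bif:st_within(). I have not verified how it performs on the public endpoint.

```python
def build_distance_query(lon, lat, radius_km, lang="de"):
    """Return a SPARQL query string that filters on bif:st_distance instead of bif:st_within."""
    # Same WKT point construction as in the question: longitude first, then latitude
    point = f'bif:st_geomFromText("POINT({lon} {lat})")'
    return f"""PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?label ?coord_distance
WHERE {{
  ?resource rdfs:label ?label .
  ?resource geo:geometry ?coordinates .
  BIND ( bif:st_distance( ?coordinates, {point} ) AS ?coord_distance ) .
  FILTER ( ?coord_distance <= {radius_km} ) .
  FILTER ( lang(?label) = "{lang}" ) .
}}
ORDER BY ?coord_distance"""


# Point and radius taken from the question
print(build_distance_query(10.2788, 47.4093, 1))
```

The bif: prefix is Virtuoso-specific, so a query like this is not expected to run on other SPARQL engines.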

Related

Sparql identifying Powerstations more precisely

I would like to get all powerstations and also the type of powerstation (Nuclear, Water, Coal etc...) from DBpedia.
I noticed that there are no specific types for power stations, so I query all power stations and try to work out the type from the name. (I will not catch all of them, but enough.)
My query so far:
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
#PREFIX georss: <http://www.georss.org/georss/>
select distinct *
{
?name rdf:type dbp-ont:PowerStation .
?name geo:lat ?lat .
?name geo:long ?long
OPTIONAL { ?name dbo:installedCapacity ?installedCapacity }
OPTIONAL { ?name dbo:openingDate ?openingDate }
OPTIONAL { ?name dbo:closingDate ?closingDate }
} limit 100
Is there a way to add a new field named 'nuclear' that has the value '1' if the name contains 'nuclear'?
First thing is to remember that DBpedia data is a moving target, just like the Wikipedia data from which it is derived. Updates to Wikipedia will eventually (typically, in 3-9 months) be part of DBpedia. More quickly (typically in a few hours, if not minutes or seconds; sometimes days, for various reasons), they'll be part of DBpedia-Live.
The long-term solution for giving every power station a type, as you wish, is to edit Wikipedia.
For your specific immediate workaround, note that a large number of OPTIONAL clauses can make your query take much longer, and so may eventually mean that DBpedia will not return the data you want. You may need to spin up your own mirror (such as DBpedia or DBpedia-Live in the Amazon EC2 Cloud).
Finally, as @AKSW noted in comments, the line below should deliver your ?nuclear variable with a 1 if that string appears in ?name. Just put it after your OPTIONAL lines.
BIND ( IF ( CONTAINS ( LCASE ( STR ( ?name ) ), "nuclear" ), 1, 0 ) AS ?nuclear )
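If you would rather do the guessing client-side after fetching the results, the same substring test is easy to mirror and extend to other types. This is a swapped-in post-processing approach, not part of the SPARQL answer; the keyword table, the function names, and the sample IRI are all made up for illustration.

```python
# Hypothetical keyword table; extend it to match the names you actually see.
KEYWORDS = {
    "nuclear": "nuclear",
    "hydro": "water",
    "coal": "coal",
}


def nuclear_flag(resource_iri):
    """Mirror of the BIND line: 1 if the lower-cased IRI contains 'nuclear', else 0."""
    return 1 if "nuclear" in resource_iri.lower() else 0


def classify(resource_iri):
    """Guess a power station type from its IRI; returns None when no keyword matches."""
    name = resource_iri.lower()
    for keyword, station_type in KEYWORDS.items():
        if keyword in name:
            return station_type
    return None


# Made-up IRI, used only to show the calls
sample = "http://dbpedia.org/resource/Example_Nuclear_Power_Plant"
print(nuclear_flag(sample), classify(sample))
```

As with the BIND, this only catches stations whose name happens to mention the type.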

How to query for publication date of books using SPARQL in DBPEDIA

I am trying to retrieve the publication date and the number of pages for books in DBpedia. I tried the following query and it gives empty results. I see that these are properties under Book (http://mappings.dbpedia.org/server/ontology/classes/Book), but I could not retrieve them.
I would like to know if there is an error in the code or if dbpedia does not store these dates related to books.
SELECT ?book ?genre ?date ?numberOfPages
WHERE {
?book rdf:type dbpedia-owl:Book .
?book dbp:genre ?genre .
?book dbp:firstPublicationDate ?date .
OPTIONAL {?book dbp:numberOfPages ?numberOfPages .}
}
dbp:firstPublicationDate does not work for two reasons:
First, as pointed out in the first answer, you used the wrong prefix.
But even if you correct it, you will still get no results. The best thing to do then is to test with the minimum number of patterns; in your case you should ask for books with a first publication date, using two triple patterns only. If you still don't get results, you should test how <http://dbpedia.org/ontology/firstPublicationDate> is actually used, with a query like this:
SELECT ?class (COUNT (DISTINCT ?s) AS ?instances)
WHERE {
?s <http://dbpedia.org/ontology/firstPublicationDate> ?date ;
a ?class
}
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 1000
Mapping-based properties use the namespace http://dbpedia.org/ontology/, so the prefix must be dbo instead of dbp, which stands for http://dbpedia.org/property/.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?book ?genre ?date ?numberOfPages
WHERE {
?book a dbo:Book ;
dbp:genre ?genre ;
dbo:firstPublicationDate ?date .
OPTIONAL {?book dbp:numberOfPages ?numberOfPages .}
}
Some additional comments:
Put the prefixes in the SPARQL query so that others here can run it without errors (also in the future). The current query uses dbpedia-owl, but this prefix is no longer predefined on the official DBpedia endpoint; it's called dbo instead.
Which brings me to the second point: if you're using a public SPARQL endpoint, show its URL.
You can start debugging your own SPARQL query by running only part of it and then adding more triple patterns. In your case, for example, you could check whether any triple uses the property with:
PREFIX dbp: <http://dbpedia.org/property/>
SELECT * WHERE {?book dbp:firstPublicationDate ?date } LIMIT 10
Update
As Ivo Velitchkov noticed in his answer below, the property dbo:firstPublicationDate is only used for mangas, etc., i.e. written work that was published periodically. Thus, the result will be empty.

Query on sindice SPARQL endpoint

I tried to make this query on http://sparql.sindice.com/
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE
{
?thing rdfs:label ?name .
?thing rev:hasReview ?review .
filter regex(str(?name), "harlem", "i")
} LIMIT 10
And it returns 504 Gateway Time-out
The server didn't respond in time.
What am I doing wrong?
Thanks.
You made a query that was too hard for the endpoint to answer in a timely fashion, which is why you got a timeout response. Note that their website states the following:
all queries are time and resource limited. notice that this means that
sometime you will get incomplete or even no results. If this is
happening often for you or you really want to run more complex queries
please contact us
Your query essentially selects a vast swathe of data and then makes the engine run a regular expression over every possible value, which is extremely slow.
I believe Sindice uses Virtuoso as its SPARQL implementation, so you can cheat and use the Virtuoso-specific full-text query extension like so:
PREFIX rev: <http://purl.org/stuff/rev#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE
{
?thing rdfs:label ?name .
?thing rev:hasReview ?review .
?name bif:contains "harlem" .
}
LIMIT 10
However, this query also seems to time out. If you can add more conditions to constrain your query further, you will have more chance of getting results in a timely fashion.

Limit a SPARQL query to one dataset

I'm working with the following SPARQL query, which is an example on the web-based end of my institution's SPARQL endpoint:
SELECT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
The problem is that as well as getting data from 'Buildings and Places', the Dataset I'm interested in, and would expect the example to use, it also gets data from the 'Facilities and Equipment' dataset, which isn't relevant. You should see this if you follow the link.
I suspect the example may pre-date the addition of the Facilities and Equipment dataset, but even with the research I've done into SPARQL, I can't see a clear way to define which datasets to include.
Can anyone recommend a starting point to limit it to just show 'Buildings' or, more specifically, results from the 'Buildings and Places' dataset?
Thanks
First things first, you really need to use SELECT DISTINCT, as otherwise you'll get repeated results.
To answer your question, you can use GRAPH { ... } to restrict certain parts of a SPARQL query so that they only match data from a specific dataset. This only works if the SPARQL endpoint is divided up into GRAPHs (this one is). The solution you asked for isn't the best choice, as it assumes that things within sites in the 'places' dataset will always be restricted to buildings. That's risky, as the dataset might end up containing trees and signposts at some time in the future.
Step one is to just find out what graphs are in play:
SELECT DISTINCT ?g1 ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH ?g1 { ?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Try it here: http://is.gd/WdRAGX
From this you can see that http://id.southampton.ac.uk/dataset/places/latest and http://id.southampton.ac.uk/dataset/places/facilities are the two relevant ones.
To only look for things 'within' a site according to the "places" graph, use:
SELECT DISTINCT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH <http://id.southampton.ac.uk/dataset/places/latest> {
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Alternate solutions:
Using rdf:type
Above I've answered your question, but it's not the answer to your problem. This solution is more semantic as it actually says 'only give me buildings within the campus' which is what you really mean.
Instead of filtering by graph, which is not very 'semantic', you could also restrict ?building to be of class 'building', which research facilities are not. Facilities are still sometimes listed as 'within' a site, usually when the uni has only published which campus they are on but not which building.
?building a rooms:Building
Using FILTER
In extreme cases you may not have data in different GRAPHS and there may not be an elegant relationship to use to filter your results. In this case you can use a FILTER and turn the building URI into a string and use a regular expression to match acceptable ones:
FILTER regex(str(?building), "^http://id.southampton.ac.uk/building/")
This is by far the worst option; don't use it unless you have to.
Belt and Braces
You can use any of these restrictions together, and a combination of restricting the GRAPH plus ensuring that all ?buildings really are buildings would be my recommended solution.

Reverse wikipedia geotagging lookup

Wikipedia is geotagging a lot of its articles. (Look in the top right corner of the page.)
Is there any API for querying all geotagged pages within a specified radius of a geographical position?
Update
Okay, so based on lost-theory's answer I tried this (on DBpedia query explorer):
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?subject ?label ?lat ?long WHERE {
?subject geo:lat ?lat.
?subject geo:long ?long.
?subject rdfs:label ?label.
FILTER(xsd:float(?lat) - 57.03185 <= 0.05 && 57.03185 - xsd:float(?lat) <= 0.05
&& xsd:float(?long) - 9.94513 <= 0.05 && 9.94513 - xsd:float(?long) <= 0.05
&& lang(?label) = "en"
).
} LIMIT 20
This is very close to what I want, except it returns results within a (local) square around the point and not a circle. Also, I would like the results to be sorted by distance from the point (if possible).
Update 2
I am trying to determine the Euclidean distance as an approximation of the true distance, but I am having trouble squaring a number in SPARQL. (Question opened here.) When I get something useful I will update the question, but in the meantime I will appreciate any suggestions on alternative approaches.
Update 3
A final update. I gave up on using SPARQL through DBpedia. I have written a simple parser which fetches the Wikipedia article text nightly database dump and parses all articles for geocodes. It works rather nicely and it allows me to store information about geotagged articles however I wish.
This is probably the solution I will continue using, and if I get around to creating a nice interface to it I might consider allowing public API access and/or publishing the source of the parser.
The OpenLink Virtuoso server used by the dbpedia endpoint has several query features. I found the information on http://docs.openlinksw.com/virtuoso/rdfsparqlgeospat.html useful for a similar problem.
I ended up with a query such as this:
SELECT ?page ?lat ?long (bif:st_distance(?geo, bif:st_point(15.560278, 58.394167)))
WHERE{
?m foaf:page ?page.
?m geo:geometry ?geo.
?m geo:lat ?lat.
?m geo:long ?long.
FILTER (bif:st_intersects (?geo, bif:st_point(15.560278, 58.394167), 30))
}
ORDER BY ASC 4 LIMIT 15
This example retrieves the geotagged locations within 30 km from the origin position.
You should be able to query for latitude/longitude using SPARQL and dbpedia. An example (from here):
SELECT distinct ?s ?la ?lo ?name ?country WHERE {
?s dbpedia2:latitude ?la .
?s dbpedia2:longitude ?lo .
?s dbpedia2:officialName ?name .
?s dbpedia2:country ?country .
filter (
regex(?country, 'England|Scotland|Wales|Ireland')
&& regex(?name, '^[Aa]')
)
}
You can run your own queries here.
There are a couple of tools listed on Tools and applications based on coordinates from Wikipedia. I'm not sure if it's what you're looking for, but the Geosearch.py tool looks pretty cool.
Not an API, but you can also download this nice set of all geo-tagged wikipedia articles and query it directly in a local database:
http://www.google.com/fusiontables/DataSource?dsrcid=423292
The free GeoNames.org FindNearbyWikipedia service can fetch geotagged articles for a given postal code or coordinates (latitude, longitude).
It has a daily limit of 30,000 credits per application (identified by the parameter 'username') and an hourly limit of 2,000 credits. For most services, one credit is one web service request. An exception is thrown when the limit is exceeded.
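As a rough illustration, a request URL for that service can be assembled like this in Python. The endpoint path and the parameter names (lat, lng, radius in km, maxRows, username) are written from my reading of the GeoNames web service documentation, so check them there before relying on this. The helper name `nearby_wikipedia_url` and the `your_username` placeholder are made up for the example.

```python
from urllib.parse import urlencode

# JSON variant of the FindNearbyWikipedia service (verify against the GeoNames docs)
BASE_URL = "http://api.geonames.org/findNearbyWikipediaJSON"


def nearby_wikipedia_url(lat, lng, username, radius_km=10, max_rows=20):
    """Build a request URL; every call made with it costs credits on the given username."""
    params = {
        "lat": lat,
        "lng": lng,
        "radius": radius_km,
        "maxRows": max_rows,
        "username": username,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Coordinates from the question; replace the placeholder with your own registered username
print(nearby_wikipedia_url(57.03185, 9.94513, "your_username"))
```

The sketch only builds the URL and performs no request, so running it does not use up any credits.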
I'm not familiar enough with SPARQL, but if it can use powers in its filter then it's easy to compute the distance of a given article from a given point using Pythagoras' theorem (a^2 + b^2 = c^2), and that would give you all the articles within a radius.
Another option would be to get a Wikipedia data dump and process it yourself. This is what I did when I needed to do some linguistic analysis on Wikipedia articles.
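Here is a minimal Python sketch of that Pythagorean idea, applied client-side to rows that have already been fetched. It is an approximation that is only reasonable over short distances. The kilometres-per-degree constant assumes a 6371 km Earth radius, the longitude difference is scaled by the cosine of the latitude so that the search area is roughly a circle rather than a stretched box, and the function names and sample points are made up for illustration. Note that squaring is written as a plain multiplication, which needs no power function.

```python
import math

KM_PER_DEGREE = 111.195  # approx. length of one degree of latitude, assuming a 6371 km Earth radius


def approx_distance_km(lat1, lon1, lat2, lon2):
    """Planar (Pythagorean) approximation of the distance between two points in degrees."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dy = (lat2 - lat1) * KM_PER_DEGREE
    # One degree of longitude gets shorter towards the poles, hence the cosine factor
    dx = (lon2 - lon1) * KM_PER_DEGREE * math.cos(mean_lat)
    return math.sqrt(dx * dx + dy * dy)


def within_radius(points, lat, lon, radius_km):
    """Return (distance, point) pairs inside the radius, nearest first."""
    hits = []
    for p in points:
        d = approx_distance_km(lat, lon, p[0], p[1])
        if d <= radius_km:
            hits.append((d, p))
    return sorted(hits)


# Centre from the question; the candidate (lat, long) points are invented
candidates = [(57.04185, 9.94513), (57.03185, 9.99513), (58.0, 9.94513)]
for d, p in within_radius(candidates, 57.03185, 9.94513, 5.0):
    print(f"{d:.2f} km  {p}")
```

Sorting the (distance, point) pairs also gives the nearest-first ordering asked for in the question.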