Querying DBpedia-Live with SPARQL does not give same answer as DBpedia - sparql

I want to query DBpedia with DBpedia Live endpoint.
I have this query :
SELECT *
WHERE {
?x a dbo:Person .
?x rdfs:label "Usain Bolt"#en .
}
This query gives the correct answer with most names I tried (for example “Teddy Riner"#en) but it fails with Usain Bolt and Rachid Badouri.
I don’t get why as their DBpedia pages (Teddy Riner, Usain Bolt) are constructed the same way: they both have a rdfs:label, which is written exactly like I did.
It seems to me that there is an incoherence between the endpoint and DBpedia. But I don’t think that it's because the endpoint is not to date.
Even more surprising, this query gives the correct answer:
SELECT *
WHERE {
?x rdfs:label "Usain Bolt"#en .
}
However, Usain Bolt is a dbo:Person! Same thing for Rachid Badouri.
Could someone explain me why the first query does not give answer?
Any help would be appreciated! Thanks

According to DBpedia-Live, at the time of writing, the entity with rdfs:label "Usain Bolt"#en has many types, but is not a dbo:Person. Similar for the entity with rdfs:label "Rachid Badouri"#en.
In contrast, the entity with rdfs:label "Teddy Riner"#en is a dbo:Person.
Note: DBpedia-Live content is a moving target, varying with Wikipedia content changes, adjustments in the templates, and other variables. The statements I made above may no longer be true when you read this.

Related

DBPedia SPARQL, return certain number of relevant page URIs for entity EXCEPT the URIs where the entity belongs to a set of subclasses of Owl:Thing

Looking for SPARQL query to do the following:
For example, I have the word Apple. Apple may refer to the organization Apple_Inc or the Species of Plants class as per the ontology. Owl: Thing has a subclass called Species, so I want to return those most relevant/maximum-hit URIs where the keyword Apple does not belong to the Species subclass. So when you return all the URIs, http://dbpedia.org/page/Apple should not be one of them, neither must ANY relevant link that comes under Species subclass.
By maximum-hit/most relevant I mean the top returned results that match the query! Like when you access the PrefixSearch (i.e. Autocomplete) API, it has the parameter called MaxHits.
For example http://lookup.dbpedia.org/api/search/PrefixSearch?QueryClass=&MaxHits=2&QueryString=berl is a link where you want to return the top 2 URIs that match the QueryString=berl.
Like I'm actually really struggling to even explain the work I've done so far because I'm not able to understand the structure and how to formulate a proper query..
with respect to negation in SPARQL, I found a relevant portion of the documentation in the link here.. But I do not know how and where to proceed from there, and cannot understand why keywords like ?person are used.. I can understand the person is used to selected well.. PEOPLE names, but I would like to know how and where to find these keywords like ?person, ?name to represent a specific entity..
SELECT ?uri ?label
WHERE {
?uri rdfs:label ?label .
filter(?label="car"#en)
}
I would really appreciate if someone could link me the part of the documentation I can clearly read and understand that ?uri is used to select a URI in the form www.dbpedia.org'/page/SomeEntity and what these ?person, ?name, ?label represent.
I'm actually so lost.. I will go up and start eating one elephant at a time. For now, I'll be very grateful if I get an answer to this.
If there is anyway you know where I can avoid learning and using SPARQL, that would work too! I know Python well enough, so leveraging an API to pull this information is also fine by me. This question was posted by me.
Answer posted by #Stanislav-Kravin --
SELECT DISTINCT ?s
WHERE
{ ?s a owl:Thing .
?s rdfs:label ?label .
FILTER ( LANGMATCHES ( LANG ( ?label ), 'en' ) )
?label bif:contains '"apple"' .
FILTER NOT EXISTS { ?s rdf:type/rdfs:subClassOf* dbo:Species }
}

How to retrieve abstract for a DBpedia resource?

I need to find all DBpedia categories and articles that their abstract include a specific word.
I know how to write a SPARQL query that queries the label like the following:
SELECT ?uri ?txt WHERE {
?uri rdfs:label ?txt .
?txt bif:contains "Machine" .
}
but I have not figured out yet how to search the abstract.
I've tried with the following but it seems not to be correct.
SELECT ?uri ?txt WHERE {
?uri owl:abstract ?txt .
?txt bif:contains "Machine" .
}
How can I retrieve the abstract in order to query its text?
Since you already know how to search a string for text content, this question is really about how to get the abstract. If you retrieve any DBpedia resource in a web browser, e.g., http://dbpedia.org/resource/Mount_Monadnock (which will redirect to http://dbpedia.org/page/Mount_Monadnock), you can see the triples of which it's a subject or predicate. In this case, you'll see that the property is dbpedia-owl:abstract. Thus you can do things like
select * where {
?s dbpedia-owl:abstract ?abstract .
?abstract bif:contains "Monadnock" .
filter langMatches(lang(?abstract),"en")
}
limit 10
SPARQL results
Instead of visiting the page for the resource, which not endpoints will support, you could have simply retrieved all the triples for the subject, and looked at which ones relate it to its abstract. Since you know the abstract is a literal, you could even restrict it to triples where the object is a literal, and perhaps with a language that you want. E.g.,
select ?p ?o where {
dbpedia:Mount_Monadnock ?p ?o .
filter ( isLiteral(?o) && langMatches(lang(?o),'en') )
}
SPARQL results
This also clearly shows that the property you want is http://dbpedia.org/ontology/abstract. When you have a live query interface that you can use to pull down arbitrary data, it's very easy to find out what parts of the data you want. Just pull down more than you want at first, and then refine to get just what you want.

How to form dbPedia iSPARQL query (for wikipedia content)

Say I need to fetch content from wikipedia about all mountains. My target is to show initial paragraph, and an image from respective article (eg. Monte Rosa and Vincent Pyramid.
I came to know about dbpedia, and with some research got to find that it provides live queries into wiki database directly.
I have 2 questions:
1 - I am finding it difficult how could I formulate my queries. I can't play around iSPARQL. I tried following query but it throws error saying invalid xml.
SELECT DISTINCT ?Mountain FROM <http://dbpedia.org> WHERE {
[] rdf:type ?Mountain
}
2 - My requirement is to show only mountains that have at least 1 image (I need to show this image too). Now the ones I listed above have images, but how could I be sure? Also, looking at both examples I see many fields differ in wiki articles - so for future extension it maybe quite difficult to fetch them.
I just want to reject those which do not have sufficient data or description.
How can I filter out mountains based on pictures present?
UPDATE:
My corrected query, which solves my first problem:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?name ?description
WHERE {
?name rdf:type <http://dbpedia.org/ontology/Mountain>;
dbpedia-owl:abstract ?description .
}
You can also query dbpedia using its SPARQL endpoint (less fancy than iSPARQL). To find out more about what queries to write, take a look at the DBpedia's datasets page. The examples there show how one can select pages based on Wikipedia categories. To select resources in the Wikipedia Mountains category, you can use the following query:
select ?mountain where {
?mountain a dbpedia-owl:Mountain .
}
SPARQL Results
Once you have some of these links in hand, you can look at them in a web browser and see the data associated with them. For instance the page for Mount Everest shows lots of properties. For restricting results to those pages that have an image, you might be interested in the dbpedia-owl:thumbnail property, or perhaps better yet foaf:depiction. For the introductory paragraph, you probably want something like the dbpedia-owl:abstract. Using those, we can enhance the query from before. The following query finds things in the category Stratovolcanoes with an abstract and an depiction. Since StackOverflow is an English language site, I've restricted the abstracts to those in English.
select * where {
?mountain a dbpedia-owl:Mountain ;
dbpedia-owl:abstract ?abstract ;
foaf:depiction ?depiction .
FILTER(langMatches(lang(?abstract),"EN"))
}
LIMIT 10
SPARQL Results

DBpedia SPARQL and predicate connection

I've got a problem with the DBpedia SPARQL endpoint because the properties of the properties like the label of rdf:type are not stocked in the endpoint. So when I run this query:
SELECT *
WHERE{
<http://dbpedia.org/ontology/Place> ?predicat ?object .
OPTIONAL{?predicat rdfs:label ?label}
}
I've got nothing for ?label.
If someone got any idea to solve this problem it would be very helpful.
You can't get the real labels from DBpedia because the SPARQL endpoint doesn't have them. But you can take the local name of the property URI. So, for rdfs:subClassOf you'd get "subClassOf". That's better than nothing. This can be done using Virtuoso's (non-standard) bif:regexp_replace function.
SELECT DISTINCT (bif:regexp_replace(STR(?p), "^.*[/#]", "") AS ?label) WHERE {
<http://dbpedia.org/ontology/Place> ?p ?o .
}
I don't think there's a SPARQL solution. Dbpedia doesn't have the data you want, and I couldn't easily find a SPARQL endpoint for that RDF at W3C. And I don't think the Virtuoso dbpedia endpoint supports federation yet, even if we did find a SPARQL endpoint for W3C.
Happy to be proven wrong on any of those points.

Reverse wikipedia geotagging lookup

Wikipedia is geotagging a lot of its articles. (Look in the top right corner of the page.)
Is there any API for querying all geotagged pages within a specified radius of a geographical position?
Update
Okay, so based on lost-theory's answer I tried this (on DBpedia query explorer):
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT ?subject ?label ?lat ?long WHERE {
?subject geo:lat ?lat.
?subject geo:long ?long.
?subject rdfs:label ?label.
FILTER(xsd:float(?lat) - 57.03185 <= 0.05 && 57.03185 - xsd:float(?lat) <= 0.05
&& xsd:float(?long) - 9.94513 <= 0.05 && 9.94513 - xsd:float(?long) <= 0.05
&& lang(?label) = "en"
).
} LIMIT 20
This is very close to what I want, except it returns results within a (local) square around the point and not a circle. Also I would like if the results where sorted based on the distance from the point. (If possible.)
Update 2
I am trying to determine the euclidean distance as an approximation of the true distance, But I am having trouble on squaring a number in SPARQL. (Question opened here.) When I get something useful I will update the question, but in the meantime I will appreciate any suggestions on alternative approaches.
Update 3
A final update. I gave up on using SPARQL through DBpedia. I have written a simple parser which fetches the Wikipedia article text nightly database dump and parses all articles for geocodes. It works rather nicely and it allows me to store information about geotagged articles however I wish.
This is probably the solution I will continue using, and if I get around to create a nice interface to it I might consider allowing public API access and/or publishing the source to the parser.
The OpenLink Virtuoso server used by the dbpedia endpoint has several query features. I found the information on http://docs.openlinksw.com/virtuoso/rdfsparqlgeospat.html useful for a similar problem.
I ended up with a query such as this:
SELECT ?page ?lat ?long (bif:st_distance(?geo, bif:st_point(15.560278, 58.394167)))
WHERE{
?m foaf:page ?page.
?m geo:geometry ?geo.
?m geo:lat ?lat.
?m geo:long ?long.
FILTER (bif:st_intersects (?geo, bif:st_point(15.560278, 58.394167), 30))
}
ORDER BY ASC 4 LIMIT 15
This example retrieves the geotagged locations within 30 km from the origin position.
You should be able to query for latitude/longitude using SPARQL and dbpedia. An example (from here):
SELECT distinct ?s ?la ?lo ?name ?country WHERE {
?s dbpedia2:latitude ?la .
?s dbpedia2:longitude ?lo .
?s dbpedia2:officialName ?name .
?s dbpedia2:country ?country .
filter (
regex(?country, 'England|Scotland|Wales|Ireland')
and regex(?name, '^[Aa]')
)
}
You can run your own queries here.
There are a couple of tools listed on Tools and applications based on coordinates from Wikipedia. I'm not sure if it's what you're looking for, but the Geosearch.py tool looks pretty cool.
Not an API, but you can also download this nice set of all geo-tagged wikipedia articles and query it directly in a local database:
http://www.google.com/fusiontables/DataSource?dsrcid=423292
The free GeoNames.org FindNearbyWikipedia service can fetch geotagged articles for a give postal code or coordinates (latitude, longitude)
It provides 30,000 credits daily limit per application (identified by the parameter 'username'), the hourly limit is 2000 credits. A credit is a web service request hit for most services. An exception is thrown when the limit is exceeded.
I'm not familiar enough with SPARQL, but if it can use power in its filter then its easy to compute the distance of a given article from a given point using Pythagoras theorem (a^2 + b^2 = c^2) and that would give you all the articles in a radius.
Another option would be to get a Wikipedia data dump and process it yourself - this is what I did when I needed to do some linguistic analysis on Wikipedia article.