DBpedia Sparql by page template - sparql

I am trying to run a SPARQL query on DBpedia to find all pages that use a certain template. It doesn't seem to work. I am looking for all pages with dbpprop:wikiPageUsesTemplate. Does anyone know how to correct this so that it properly looks up templates?
SELECT ?name ?member_Of ?country ?lat ?lng ?link
WHERE {
?x dbpprop:wikiPageUsesTemplate "dbpedia:Template:Infobox_settlement" .
?x a <http://dbpedia.org/ontology/Settlement> .
?x foaf:name ?name .
?x dbpedia-owl:isPartOf ?member_Of.
?x dbpedia-owl:country ?country.
?x geo:lat ?lat .
?x geo:long ?lng .
?x foaf:isPrimaryTopicOf ?link .
}
LIMIT 2500 OFFSET 0
I've also tried querying by the dbpprop property alone, to no avail.
SELECT * WHERE { ?page dbpprop:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_settlement> . ?page dbpedia:name ?name .}
If anyone is trying to do a similar thing, it is also possible via the Wikipedia API, where you can paginate over all results: http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Infobox_settlement
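A minimal sketch of that pagination, assuming the standard MediaWiki continuation scheme in which each response may carry a continue.eicontinue token to pass back on the next request. The helper names (embeddedin_url, fetch_json, pages_using_template) and the User-Agent string are made up for illustration; only the API parameters come from the MediaWiki API.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def embeddedin_url(template, continue_token=None, limit=500):
    """Build one request URL for list=embeddedin."""
    params = {
        "action": "query",
        "list": "embeddedin",
        "eititle": template,
        "eilimit": str(limit),
        "format": "json",
    }
    if continue_token is not None:
        # Token returned by the previous response; resumes the listing.
        params["eicontinue"] = continue_token
    return API_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body (hypothetical User-Agent)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "template-lister-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def pages_using_template(template, fetch=fetch_json):
    """Yield the titles of all pages that embed the given template."""
    token = None
    while True:
        data = fetch(embeddedin_url(template, token))
        for page in data["query"]["embeddedin"]:
            yield page["title"]
        # No "continue" object means this was the last batch.
        token = data.get("continue", {}).get("eicontinue")
        if token is None:
            break
```

Calling list(pages_using_template("Template:Infobox_settlement")) would walk every batch; the fetch parameter exists only so the loop can be exercised without network access.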

There are at least two problems: (i) you need to use IRIs in places, and not strings; and (ii) you need to use properties that DBpedia uses.
Use IRIs
In
?x a <http://dbpedia.org/ontology/Settlement> .
and
?x foaf:isPrimaryTopicOf ?link .
you've demonstrated that you know that URIs need to be written in full with < and >, or abbreviated with a prefix. However,
?x dbpprop:wikiPageUsesTemplate "dbpedia:Template:Infobox_settlement" .
certainly isn't going to work. It's legal SPARQL, because a string can be the object of a triple, but you almost certainly want an IRI.
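To make the difference concrete, here is a small sketch that renders the same value both ways. The helpers iri and literal are hypothetical names used only for this illustration; the point is that the first pattern has an IRI as its object and the second has a plain string, and the two never match the same triples.

```python
def iri(value):
    """Render a full IRI as a SPARQL term: <...>."""
    return "<" + value + ">"


def literal(value):
    """Render a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


TEMPLATE = "http://dbpedia.org/resource/Template:Infobox_settlement"

# Object is an IRI: can match a triple whose object is that resource.
as_iri = "?x dbpprop:wikiPageUsesTemplate " + iri(TEMPLATE) + " ."

# Object is a string: only matches a triple whose object is that exact text.
as_string = (
    "?x dbpprop:wikiPageUsesTemplate "
    + literal("dbpedia:Template:Infobox_settlement")
    + " ."
)

print(as_iri)
print(as_string)
```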
Use DBpedia's vocabulary
A query with dbpprop:wikiPageUsesTemplate like this returns no results:
select distinct ?template where {
[] dbpprop:wikiPageUsesTemplate ?template
}
SPARQL results
That's easy enough to check, and quickly confirms that there's no data that can possibly match your query. Where did you find this property? Have you seen it used somewhere? I'm not confident that you can query DBpedia based on infobox templates. DBpedia is not the same as Wikipedia, and even if the Wikipedia API supports it, it doesn't mean that DBpedia has a counterpart. There is a note on DBpedia Data Set Properties that says:
http://xx.dbpedia.org/property/wikiPageUsesTemplate (will be changed to http://dbpedia.org/ontology/wikiPageUsesTemplate)
but that latter property doesn't seem to be in use on the endpoints either. See my answer to Syntax for Sparql query for pages with specific infobox for more details.

Related

How to query for publication date of books using SPARQL in DBPEDIA

I am trying to retrieve the publication date and the number of pages for books in DBpedia. I tried the following query and it gives empty results. I see that these are properties under Book (http://mappings.dbpedia.org/server/ontology/classes/Book) but could not retrieve them.
I would like to know if there is an error in the code or if dbpedia does not store these dates related to books.
SELECT ?book ?genre ?date ?numberOfPages
WHERE {
?book rdf:type dbpedia-owl:Book .
?book dbp:genre ?genre .
?book dbp:firstPublicationDate ?date .
OPTIONAL {?book dbp:numberOfPages ?numberOfPages .}
}
The dbp:firstPublicationDate pattern does not work for two reasons:
First, as pointed out in the first answer, you used the wrong prefix.
But even if you correct it, you will still get no results. The best thing to do then is to test with the minimum number of patterns. In your case you should ask for books with a first publication date, using two triple patterns only. If you still don't get results, you should test how <http://dbpedia.org/ontology/firstPublicationDate> is actually used, with a query like this:
SELECT ?class (COUNT (DISTINCT ?s) AS ?instances)
WHERE {
?s <http://dbpedia.org/ontology/firstPublicationDate> ?date ;
a ?class
}
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 1000
Mapping based properties are using the namespace http://dbpedia.org/ontology/, thus, the prefix must be dbo instead of dbp, which stands for http://dbpedia.org/property/.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?book ?genre ?date ?numberOfPages
WHERE {
?book a dbo:Book ;
dbp:genre ?genre ;
dbo:firstPublicationDate ?date .
OPTIONAL {?book dbp:numberOfPages ?numberOfPages .}
}
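As a quick illustration of why the prefix matters, the sketch below expands prefixed names into full IRIs. The expand helper is a hypothetical name; the two namespace IRIs are the ones declared in the query above.

```python
PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbp": "http://dbpedia.org/property/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as dbo:Book into its full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


# Same local name, different namespace: these are two unrelated properties.
print(expand("dbo:firstPublicationDate"))
print(expand("dbp:firstPublicationDate"))
```

Because the two expansions differ, a triple stored under one namespace is never matched by a pattern written with the other.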
Some additional comments:
Put the prefixes in the SPARQL query so that others here can run it without errors (also in the future). The current query uses dbpedia-owl, but that prefix is no longer predefined on the official DBpedia endpoint; it is called dbo instead.
Which brings me to the second point: if you're using a public SPARQL endpoint, show its URL.
You can start debugging your own SPARQL query by running only part of it and then adding more triple patterns. In your case, for example, you could check whether there is any triple with the property at all:
PREFIX dbp: <http://dbpedia.org/property/>
SELECT * WHERE {?book dbp:firstPublicationDate ?date } LIMIT 10
Update
As Ivo Velitchkov noticed in his answer below, the property dbo:firstPublicationDate is only used for mangas, etc., i.e. written work that was published periodically. Thus, the result will be empty.
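The incremental debugging approach described in this thread can be sketched as a helper that emits one query per added triple pattern, so you can see at which step the results become empty. The names PATTERNS and probe_queries are made up for this example, and the order of the patterns is just one reasonable choice.

```python
PREFIX_BLOCK = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX dbp: <http://dbpedia.org/property/>\n"
)

# Patterns from the question, ordered so the suspect one is tested early.
PATTERNS = [
    "?book a dbo:Book .",
    "?book dbo:firstPublicationDate ?date .",
    "?book dbp:genre ?genre .",
]


def probe_queries(patterns, limit=10):
    """Yield one query per prefix of the pattern list (1, 2, ... patterns)."""
    for count in range(1, len(patterns) + 1):
        body = "\n".join("  " + pattern for pattern in patterns[:count])
        yield f"{PREFIX_BLOCK}SELECT * WHERE {{\n{body}\n}} LIMIT {limit}"


for query in probe_queries(PATTERNS):
    print(query)
    print("-" * 40)
```

Run each printed query against the endpoint in turn; the first one that comes back empty points at the pattern that has no matching data.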

Retrieving the wider dbpedia vocabulary for tagging pictures

I'm trying to develop a tool in JS for tagging pictures, so I need a set of possible "things" from DBpedia. I already tried to retrieve them this way:
select ?s ?l {
?s a owl:Class .
?s rdf:type ?l
FILTER regex(str(?s), "House", "i").
}
http://dbpedia.org/snorql/?query=select+%3Fs+%3Fl+%7B%0D%0A+++%3Fs+a+owl%3AClass+.%0D%0A+++%3Fs+rdf%3Atype+%3Fl%0D%0A+++FILTER+regex%28str%28%3Fs%29%2C+%22House%22%2C+%22i%22%29.%0D%0A%7D
And also this way:
select ?label
WHERE {
?concept a skos:Concept.
?concept skos:prefLabel ?label.
FILTER regex(str(?label), "^House", "i").
}
http://dbpedia.org/snorql/?query=select+%3Flabel+%0D%0AWHERE+%7B%0D%0A++%3Fconcept+a+skos%3AConcept.%0D%0A++%3Fconcept+skos%3AprefLabel+%3Flabel.%0D%0A++FILTER+regex%28str%28%3Flabel%29%2C+%22%5EHouse%22%2C+%22i%22%29.%0D%0A%7D
In the first case, I just get "instances" of the house "thing", but not the "House" class itself. In the second one, I never retrieve "house"; the closest thing is "houses". Is there an alternative for retrieving a better vocabulary based on the DBpedia dataset?
If you don't need to restrict yourself to owl:Thing or to skos:Concept, you can just get things that have a label that contains "house". Rather than using regex, I chose to use contains and lcase, since string containment could be less expensive than invoking a full regular expression processor.
select ?thing ?label where {
?thing rdfs:label ?label .
filter contains(lcase(?label), "house")
}
SPARQL results (limited to 200)
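Since the tool in question needs to issue this query programmatically, here is a hedged sketch of building the request. It assumes the public endpoint at https://dbpedia.org/sparql and the standard SPARQL protocol (a query parameter plus an Accept header for JSON results); label_search_query and sparql_request are hypothetical helper names, and the sketch is in Python rather than JS only to keep it short.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint


def label_search_query(word, limit=200):
    """Build the contains/lcase label search for a lowercase needle."""
    needle = word.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?thing ?label WHERE {\n"
        "  ?thing rdfs:label ?label .\n"
        f'  FILTER contains(lcase(?label), "{needle}")\n'
        "}\n"
        f"LIMIT {limit}"
    )


def sparql_request(query, endpoint=ENDPOINT):
    """Wrap a query in a GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


request = sparql_request(label_search_query("House"))
print(request.full_url)
```

Passing the request to urllib.request.urlopen would execute it; that step is left out here so the sketch runs without network access.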

Extract Chemical Data from DBpedia via SPARQL

I'd like to know how to submit a SPARQL query to DBpedia and get back a table that includes the information found in the Wikipedia "chembox" infobox template, such as molecular weight or formula.
So, the first step was just to make a query whose results should be a list of chemical substances that had the formula and molecularWeight properties included. But the following returns no results:
SELECT * WHERE {
?y rdf:type dbpedia-owl:ChemicalSubstance.
?y rdfs:label ?Name .
?y dbpedia:molecularWeight ?molecularWeight .
?y dbpedia:formula ?formula .
OPTIONAL {?y dbpedia-owl:iupacName ?iupacname} .
FILTER (langMatches(lang(?Name),"en"))
}
LIMIT 50
SPARQL Explorer at dbpedia.org
And so I'm stuck. Is something wrong with this query or does DBPedia really not collect that information from the Wikipedia chemboxes?
You used the wrong namespace for both dbpedia:molecularWeight and dbpedia:formula. The correct namespace here would be dbpedia2.
Furthermore, there seem to be very few entries that have all of dbpedia-owl:iupacName, dbpedia2:molecularWeight and dbpedia2:formula, which is why they are made OPTIONAL here:
SELECT * WHERE {
?y rdf:type dbpedia-owl:ChemicalSubstance.
?y rdfs:label ?Name .
OPTIONAL {?y dbpedia2:formula ?formula }.
OPTIONAL {?y dbpedia2:molecularWeight ?molecularWeight}.
OPTIONAL {?y dbpedia-owl:iupacName ?iupacname} .
FILTER (langMatches(lang(?Name),"en"))
}
LIMIT 50
SPARQL Explorer #dbpedia.org
To get the correct namespaces, you could either look at one example like this or get a list of all used properties for type dbpedia-owl:ChemicalSubstance using
SELECT DISTINCT ?rel WHERE {
?y rdf:type dbpedia-owl:ChemicalSubstance.
?y ?rel ?x
}
SPARQL Explorer #dbpedia.org
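Once the property list comes back, grouping the IRIs by namespace makes it easy to see which prefix each property needs. The sketch below assumes the results are in the standard SPARQL JSON results format; the sample data is invented for illustration and is not real endpoint output, and binding_values and by_namespace are hypothetical helper names.

```python
def binding_values(results, variable):
    """Collect the values bound to one variable in SPARQL JSON results."""
    return [
        row[variable]["value"]
        for row in results["results"]["bindings"]
        if variable in row
    ]


def by_namespace(iris):
    """Group IRIs by namespace (everything up to the last '/' or '#')."""
    groups = {}
    for value in iris:
        cut = max(value.rfind("/"), value.rfind("#")) + 1
        groups.setdefault(value[:cut], []).append(value[cut:])
    return groups


# Invented sample in the SPARQL JSON results shape, for illustration only.
SAMPLE = {
    "head": {"vars": ["rel"]},
    "results": {
        "bindings": [
            {"rel": {"type": "uri", "value": "http://dbpedia.org/property/formula"}},
            {"rel": {"type": "uri", "value": "http://dbpedia.org/property/molecularWeight"}},
            {"rel": {"type": "uri", "value": "http://dbpedia.org/ontology/iupacName"}},
        ]
    },
}

for namespace, names in by_namespace(binding_values(SAMPLE, "rel")).items():
    print(namespace, names)
```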

Limit a SPARQL query to one dataset

I'm working with the following SPARQL query, which is an example on the web-based end of my institution's SPARQL endpoint;
SELECT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
The problem is that as well as getting data from 'Buildings and Places', the Dataset I'm interested in, and would expect the example to use, it also gets data from the 'Facilities and Equipment' dataset, which isn't relevant. You should see this if you follow the link.
I suspect the example may pre-date the addition of the Facilities and Equipment dataset, but even with the research I've done into SPARQL, I can't see a clear way to define which datasets to include.
Can anyone recommend a starting point to limit it to just show 'Buildings', or, more specifically, results from the 'Buildings and Places' dataset.
Thanks
First things first, you really need to use SELECT DISTINCT, as otherwise you'll get repeated results.
To answer your question, you can use GRAPH { ... } to restrict certain parts of a SPARQL query so that they only match data from a specific dataset. This only works if the SPARQL endpoint is divided up into named graphs (this one is). The solution you asked for isn't the best choice, as it assumes that things within sites in the 'places' dataset will always be restricted to buildings. That's risky, as the dataset might end up containing trees and signposts at some time in the future.
Step one is to just find out what graphs are in play:
SELECT DISTINCT ?g1 ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH ?g1 { ?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Try it here: http://is.gd/WdRAGX
From this you can see that http://id.southampton.ac.uk/dataset/places/latest and http://id.southampton.ac.uk/dataset/places/facilities are the two relevant ones.
To only look for things 'within' a site according to the "places" graph, use:
SELECT DISTINCT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH <http://id.southampton.ac.uk/dataset/places/latest> {
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Alternate solutions:
Using rdf:type
Above I've answered your question, but it's not the answer to your problem. This solution is more semantic as it actually says 'only give me buildings within the campus' which is what you really mean.
Instead of filtering by graph, which is not very 'semantic', you could also restrict ?building to be of class 'building', which research facilities are not. Facilities are still sometimes listed as 'within' a site, usually when the university has only published which campus they are on but not which building.
?building a rooms:Building
Using FILTER
In extreme cases you may not have data in different GRAPHS and there may not be an elegant relationship to use to filter your results. In this case you can use a FILTER and turn the building URI into a string and use a regular expression to match acceptable ones:
FILTER regex(str(?building), "^http://id.southampton.ac.uk/building/")
This is by far the worst option; don't use it unless you have to.
Belt and Braces
You can use any of these restrictions together. A combination of restricting the GRAPH and ensuring that every ?building really is a building would be my recommended solution.
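The combined form can be sketched as a small query builder that switches each restriction on or off. This is only an illustration: buildings_query is a hypothetical helper, the prefixes (org, spacerel, skos, rdfs, soton, rooms) are assumed to be predefined by the endpoint's web form as in the examples above, and the type pattern is placed outside the GRAPH block on the assumption that the type triple may live in any graph.

```python
PLACES_GRAPH = "http://id.southampton.ac.uk/dataset/places/latest"


def buildings_query(graph=None, require_building_type=False):
    """Compose the buildings query with optional GRAPH and type limits."""
    within = (
        "?building spacerel:within ?site ;\n"
        "            skos:notation ?building_number ;\n"
        "            rdfs:label ?name ."
    )
    if graph is not None:
        # Only accept 'within' statements made in the named graph.
        within = f"GRAPH <{graph}> {{\n  {within}\n  }}"
    lines = [
        "SELECT DISTINCT ?building_number ?name ?occupants WHERE {",
        "  ?site a org:Site ;",
        '        rdfs:label "Highfield Campus" .',
        "  " + within,
    ]
    if require_building_type:
        # Assumption: matched wherever the type is stated, not per graph.
        lines.append("  ?building a rooms:Building .")
    lines += [
        "  OPTIONAL {",
        "    ?building soton:buildingOccupants ?occ .",
        "    ?occ rdfs:label ?occupants .",
        "  }",
        "} ORDER BY ?name",
    ]
    return "\n".join(lines)


print(buildings_query(PLACES_GRAPH, require_building_type=True))
```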

How to match exact string literals in SPARQL?

I have this query. It matches anything which has "South" in its name. But I only want the one whose foaf:name is exactly "South".
SELECT Distinct ?TypeLabel
WHERE
{
?a foaf:name "South" .
?a rdf:type ?Type .
?Type rdfs:label ?TypeLabel .
}
A bit late, but anyway... I think this is what you're looking for:
SELECT Distinct ?TypeLabel Where {
?a foaf:name ?name .
?a rdf:type ?Type .
?Type rdfs:label ?TypeLabel .
FILTER (?name="South"^^xsd:string)
}
You can use FILTER with the xsd types to restrict the result.
hope this helps...
cheers!
(Breaking out of comments for this)
Data issues
The issue is the data, not your query. If you use the following query:
SELECT DISTINCT ?a
WHERE {
?a foaf:name "Imran Khan" .
}
You find (as you say) "Imran Khan Niazy".
But looking at the dbpedia entry for Imran Khan, you'll see both:
foaf:name "Imran Khan Niazy"
foaf:name "Imran Khan"
This is because RDF allows repeated use of properties.
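A toy model of this, using plain Python tuples in place of a triple store, shows how an exact match on one name still returns a resource that also carries a longer name. The data below mirrors the two foaf:name values quoted above; the helper names subjects_with and values_of are made up for the example.

```python
NAME = "http://xmlns.com/foaf/0.1/name"
KHAN = "http://dbpedia.org/resource/Imran_Khan"

# One subject, the same property used twice with different values.
TRIPLES = {
    (KHAN, NAME, "Imran Khan Niazy"),
    (KHAN, NAME, "Imran Khan"),
}


def subjects_with(triples, predicate, value):
    """Exact match on the object, like ?a foaf:name "Imran Khan"."""
    return {s for s, p, o in triples if p == predicate and o == value}


def values_of(triples, subject, predicate):
    """All values of a property for one subject."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


matches = subjects_with(TRIPLES, NAME, "Imran Khan")
print(matches)
for subject in matches:
    # The matched resource also has the longer name attached.
    print(values_of(TRIPLES, subject, NAME))
```

The match itself is exact; the longer name only appears because you then look at every foaf:name of the matched resource.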
Cause
"South" had the same issue (album, artist, and oddly 'South Luton'). These are cases where there are both familiar names ("Imran Khan", "South"), and more precise names ("Imran Khan Niazy", "South (album)") for the purposes of correctness or disambiguation.
Resolution
If you want a more precise match try adding a type (e.g. http://dbpedia.org/ontology/MusicalWork for the album).
Beware
Be aware that DBpedia derives from Wikipedia, and the extraction process isn't perfect. This is an area alive with wonky data, so don't assume your query has gone wrong.
That query should match exactly the literal South, and not literals that merely contain South as a substring. For partial matches you'd use FILTER with e.g. REGEX(). If you are seeing substring matches, your query engine is broken in this respect. Which query engine are you working with?