I am a novice with SPARQL and DBpedia.
I would like to learn how to build simple SPARQL queries.
Could you please help me build queries that answer questions such as:
the hometown of a footballer (any one), a list of artists, a list of Oscar winners (any year)?
I think this question is probably too broad, but in case it's useful, it might make sense to describe how to approach this type of problem. For one of the problems, here's what I did.
List of Oscar winners (any year)
In this case, I started by visiting the DBpedia entry for an Academy Award winner, Brad Pitt. There you'll see the property dcterms:subject category:Producers_who_won_the_Best_Picture_Academy_Award.
That category has the property skos:broader category:Best_Picture_Academy_Award_winners which, in turn, has skos:broader category:Academy_Award_winners. So you could look for things that have a dcterms:subject value of some category that's connected by a path of skos:broader links to the Academy_Award_winners category. That will actually turn up some things that aren't persons, because those categories are categories of articles, not classes of entities, so you'll also want to filter down to those things which are Persons. That's probably going to give you a list of Academy Award winners, though it's possible that some are just in that category because they have some other relationship to the category:
select ?person where {
  ?person a dbpedia-owl:Person ;
          dcterms:subject/skos:broader* category:Academy_Award_winners .
}
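The query above relies on prefixes that the public DBpedia endpoint predefines. If you run it somewhere that does not, a sketch with explicit declarations might look like the following. The namespace URIs are the usual ones for these prefixes, and the distinct and limit are my additions to keep the result manageable, not something the original query needs:

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX category:    <http://dbpedia.org/resource/Category:>
PREFIX dcterms:     <http://purl.org/dc/terms/>
PREFIX skos:        <http://www.w3.org/2004/02/skos/core#>

# Persons whose category, or any ancestor of it via skos:broader,
# is Academy_Award_winners.
SELECT DISTINCT ?person WHERE {
  ?person a dbpedia-owl:Person ;
          dcterms:subject/skos:broader* category:Academy_Award_winners .
}
LIMIT 100
```

The same pattern should carry over to the other questions: find one example resource, look at its properties and categories, and generalize from there.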
I'm a little stuck on whether a SPARQL query is possible for what I want to do:
I get that
?human wdt:P31 wd:Q5
would give me all items that are an instance of human.
Is there a similar way to find all items that are an instance of a place (e.g., town, city, country, river, continent, museum, building, etc.)?
The trick will likely be to find a Wikidata class that is a good proxy for what you consider a "place". The statement
?item wdt:P31/wdt:P279* wd:Q618123 .
will give you all the instances of "geographical object" and of its subclasses, which might be a good starting point to explore.
Wikidata provides a query browser at https://query.wikidata.org
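As a sketch of how that statement fits into a complete query you can paste into the query browser: the label service and the LIMIT below are my additions (the limit is arbitrary and only there to avoid pulling back a very large result), and the prefixes are assumed to be the ones the Wikidata Query Service predefines.

```sparql
# Items that are instances of Q618123 or of any of its subclasses,
# with their English labels.
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31/wdt:P279* wd:Q618123 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```

If the class turns out to be too broad or too narrow for your notion of "place", swapping in a different class ID is the only change needed.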
I want to display all fields for films. I tried using * but it's not working. Does anybody know how to display all fields of the data for films?
To work with SPARQL it is necessary to understand some concepts, as #AKSW said in the comments on the question, for example the meaning of ?film ?p ?o. This is called a triple and is composed of a subject, a predicate and an object. E.g., in the case of films, it could be: x is a film. This is what you are querying in the Wikidata Query Service (WDQS) when you use ?film wdt:P31 wd:Q11424.
I don't think it is feasible to display all the property-values of all the items in a single query; it would probably cause a timeout, because there are many statements for many items.
If you want to check the property-values of all the films in Wikidata, one option might be to write or find a script that extracts the items with P31-Q11424 (instance of film). For that, the accessing data section could be useful (e.g., with pywikibot you could query and extract what you want).
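For a quick look without a script, a sketch along these lines lists every statement for just a handful of films. The subquery and its LIMIT of 5 are my assumptions, chosen only to keep the query small; they are not a recommended value.

```sparql
# Pick a few films first, then fetch every predicate/object pair for each.
SELECT ?film ?p ?o WHERE {
  { SELECT ?film WHERE { ?film wdt:P31 wd:Q11424 . } LIMIT 5 }
  ?film ?p ?o .
}
```

The result contains raw predicate and object values, so expect labels and descriptions in many languages alongside the direct claims.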
If you are interested in SPARQL and WDQS, I recommend reading some help resources:
Wikidata Query Service Help, specifically the SPARQL tutorial.
Query examples (reading other people's queries is how I began to learn).
SPARQL 1.1 Query Language specification.
RDF Dump Format (because reading about the ontology of Wikidata could help you understand the concepts).
Edit
When I first answered, I wrote triplestore and linked it to its page on English Wikipedia, but after the comment from #AKSW I consider that I was wrong: triplestore is the concept used to refer to the storage and retrieval of triples, whereas a triple, or semantic triple, is "a set of three entities that codifies a statement about semantic data in the form of subject–predicate–object expressions" (from the Semantic triple page on English Wikipedia).
I am trying to map DBpedia types to Wikipedia categories. A simple example would be the following SPARQL query:
select distinct ?cat where {
  ?s a dbpedia-owl:LacrossePlayer ;
     dcterms:subject ?cat .
  filter( regex(?cat, 'players', 'i') )
}
limit 100
But this is highly inefficient as it has to first map the DBpedia types to DBpedia Named Entities(resources) and then extract their corresponding Wikipedia categories. I am trying to do this mapping for a lot of other DBpedia types.
Is there a direct or more efficient way to do this?
Improving the filter may help…
As an initial note, you may get some speedup if you remove or improve your filter. You can, of course, just remove it, but you could also make it more efficient, since you're not really using any special regular expressions. Just do
filter contains(lcase(str(?cat)),'players')
to check whether the URI for ?cat contains the string players. It might even be better (I'm not sure) to grab the English rdfs:label of ?cat and check that, since you wouldn't have to do the case or string conversions.
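A sketch of the label-based variant, in case it helps. It assumes the category resources carry English rdfs:label values and that the endpoint predefines the rdfs: prefix; I have not measured whether it is actually faster. Note that, unlike the original regex with the 'i' flag, this check is case-sensitive.

```sparql
select distinct ?cat where {
  ?s a dbpedia-owl:LacrossePlayer ;
     dcterms:subject ?cat .
  ?cat rdfs:label ?label .
  # Test the English label directly; no str() or lcase() conversion needed.
  filter( langMatches(lang(?label), 'en') && contains(?label, 'players') )
}
limit 100
```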
… but there are lots of results.
But this is highly inefficient as it has to first map the DBpedia
types to DBpedia Named Entities(resources) and then extract their
corresponding Wikipedia categories. I am trying to do this mapping for
a lot of other DBpedia types. Is there a direct or more efficient way
to do this?
I'm not sure exactly what's inefficient in this. The only way that DBpedia types and categories are associated is that resources have types (via rdf:type) and have categories (via dcterms:subject). If you want to find the connections, then you'll need to find the instances of the type and the categories to which they belong. It may be worth looking into whether any particular infoboxes provide categories to articles and are also used in the infobox mapping to provide DBpedia types. That's the only way I can think of to get category/DBpedia-type associations directly, without going through instances, and I don't know whether the current dataset has that kind of information.
In general, since Wikipedia categories are not a type hierarchy, there will be lots of categories with which instances of any particular type are associated. For instance, we can count the number of categories associated with the types Fish and LacrossePlayer with a query like this:
select ?type (count(distinct ?category) as ?nCategories) where {
  values ?type { dbpedia-owl:Fish dbpedia-owl:LacrossePlayer }
  ?type ^a/dcterms:subject ?category
}
group by ?type
SPARQL results
type                                          nCategories
http://dbpedia.org/ontology/LacrossePlayer    346
http://dbpedia.org/ontology/Fish              2375
That query responds pretty quickly, and you can even get those categories pretty easily, too:
select distinct ?type ?category where {
  values ?type { dbpedia-owl:Fish dbpedia-owl:LacrossePlayer }
  ?type ^a/dcterms:subject ?category
}
order by ?type
limit 4000
When you start using types that have many more instances, though, these counts get big, and the queries take a while to return. E.g., a very common type like Place:
select ?type (count(distinct ?category) as ?nCategories) where {
  values ?type { dbpedia-owl:Place }
  ?type ^a/dcterms:subject ?category
}
group by ?type
type                                 nCategories
http://dbpedia.org/ontology/Place    191172
I wouldn't suggest trying to pull all that data down from the remote server. If you want to extract it, you should load the data locally.
I would like to know how to express the following question in SPARQL:
"Give me the parents whose every child goes to MIT"
More generally, I would like to know what the limits of SPARQL queries are. What kinds of questions whose answers are in the database cannot be formulated in SPARQL?
Thank you for your help
You can express this using a negated existential quantification. Like this:
SELECT ?parent
WHERE {
  ?parent a :Parent .
  FILTER NOT EXISTS {
    ?c :childOf ?parent .
    ?c :enrolledIn ?school .
    FILTER (str(?school) != "MIT")
  }
}
This query asks for all parents who do not have any child that is enrolled in a school different from MIT.
Your question involves quantification, and is one example of things that cannot be expressed as one query in regular SPARQL 1.0. (It may be expressed in SPARQL 1.1 as shown in Jeen Broekstra's answer, or as an OWL class.)
Many SPARQL 1.0 implementations, though, have developed extensions to handle quantification. A commercial example is Intellidimension Semantics Platform, which would give you something like:
SELECT ?parent
WHERE { ?child :hasParent ?parent FORALL(?child){ ?child :hasSchool "MIT" } }
An academic example is SPARQLog from Oxford University Computing Lab. I am not aware that this system is available as an easy download, but the paper is freely available and provides insight into the difficulties of implementing quantification for SPARQL.
As for your question about the limits of SPARQL, it is too general to answer in a few words, but here is a link to a relevant paper, again as far as SPARQL 1.0 is concerned: Semantics and Complexity of SPARQL
For the specific question you can read up on relational division. Alternatively, you can find all the children who do not go to MIT, find their parents, and remove those parents from the list of all parents.
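The "remove those parents" approach can be sketched in SPARQL 1.1 with MINUS. The property and class names (:Parent, :childOf, :enrolledIn) are borrowed from the NOT EXISTS query earlier in this thread, and the example.org namespace for the empty prefix is a placeholder of mine; adjust both to your data.

```sparql
PREFIX : <http://example.org/>

SELECT ?parent
WHERE {
  # Start from all parents ...
  ?parent a :Parent .
  # ... and subtract those with at least one child enrolled somewhere other than MIT.
  MINUS {
    ?c :childOf ?parent ;
       :enrolledIn ?school .
    FILTER (str(?school) != "MIT")
  }
}
```

Like the NOT EXISTS version, this keeps parents whose children have no enrolment recorded at all, so add a positive pattern if you need at least one child who is actually at MIT.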
Sorry, can't help with SPARQL's limits.
I've forgotten all I once knew about DBpedia and SPARQL, and I find all the examples too complex and hard to understand when I Google for them.
What I wish to do is pass in two or three Wikipedia pages and get back the set of Wikipedia categories that all of the pages are members of.
This seems like it should be utterly simple in SPARQL, so I would appreciate a very minimal example to get me started.
This is actually a variation of your earlier question about getting all pages belonging to two categories. The only difference is that this time, you want two/three subjects rather than objects, so you cannot use a comma-separated enumeration of values, but instead have to write out the triple pattern that you want to match.
For example, to get back all categories that both Spain and Portugal belong to, you could simply do a query like this:
SELECT ?cat
WHERE {
  <http://dbpedia.org/resource/Spain> dcterms:subject ?cat .
  <http://dbpedia.org/resource/Portugal> dcterms:subject ?cat .
}
What this query does is select every value of ?cat that appears as a dcterms:subject value for both of the subjects 'Spain' and 'Portugal'. In other words, it retrieves precisely those categories that both resources are members of.
The trick is to think in terms of a graph, or triples with connected subjects and objects. It's a bit of a mental shift but once you've got that, query writing becomes a lot easier.
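For three or more pages you can keep adding one triple pattern per page. An alternative sketch, if you would rather list the pages once, is to group by category and keep only the categories matched by every page. France is added here purely as an illustration, and the number in HAVING has to equal the number of pages you list.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?cat
WHERE {
  VALUES ?page {
    <http://dbpedia.org/resource/Spain>
    <http://dbpedia.org/resource/Portugal>
    <http://dbpedia.org/resource/France>
  }
  ?page dcterms:subject ?cat .
}
GROUP BY ?cat
# Keep only categories shared by all three pages.
HAVING (COUNT(DISTINCT ?page) = 3)
```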
The mapping between Wikipedia and DBpedia URIs is as follows. For the Wikipedia page
http://en.wikipedia.org/wiki/Spain
the DBpedia URI is:
http://dbpedia.org/resource/Spain
So, to find the categories for the above:
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?categoryUri ?categoryName
WHERE {
  <http://dbpedia.org/resource/Spain> dcterms:subject ?categoryUri .
  ?categoryUri rdfs:label ?categoryName .
  FILTER (lang(?categoryName) = "en")
}