SPARQL query for all people for an institution on DBpedia

I'm trying to extract alumni lists for universities using SPARQL.
I've identified the ontology classes I need:
http://mappings.dbpedia.org/server/ontology/classes/University
http://mappings.dbpedia.org/server/ontology/classes/Person
I tried this query:
SELECT * WHERE {
?University dbpedia2:alumni ?Person .
}
This seemed to make sense, except that it returns counts instead of the people that, according to the ontology, the property should contain.
I found this query somewhere; it seemed to do a better job of finding universities, but it was very slow.
SELECT * WHERE {
{ <http://dbpedia.org/ontology/University> ?property ?hasValue }
UNION
{ ?isValueOf ?property <http://dbpedia.org/ontology/University> }
}
I also tried going the other way, starting with all people and looking for their almae matres, in this form:
SELECT * WHERE {
?person dbpedia2:almaMater ?University
}
But this is much slower, possibly because searching through the space of people is too laborious. It does work, but in practice it returns a different set of results: all people with a listed alma mater, rather than all people listed by universities as alumni. I'd prefer a query that gets me the alumni.
How can I phrase this to return all alumni listed for universities?

The performance of DBpedia's SPARQL endpoint can be a bit unreliable at times. After all, it's a public service, and it isn't intended for huge queries. Nonetheless, I think you can get what you're looking for here without too much trouble. First, you can check how many results there are with a query like this at the public SPARQL endpoint:
select (count(*) as ?nResults) where {
?person dbpedia-owl:almaMater ?almaMater
}
At the time of writing, this returned 64928 results.
Now, if you just want the big list, you'd get it like this. The order by clause groups the results by alma mater for easier reading, but it isn't technically necessary:
select ?almaMater ?person where {
?person dbpedia-owl:almaMater ?almaMater
}
order by ?almaMater ?person
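If you'd rather fetch the big list from a script than from the web form, a minimal standard-library sketch is to build a SPARQL 1.1 Protocol GET request that carries the query in the `query` parameter. It assumes the public endpoint is at https://dbpedia.org/sparql and honours an `Accept: application/sparql-results+json` header; `build_request` is a hypothetical helper name, and the prefix is declared explicitly so the query doesn't rely on the endpoint's predefined prefixes.

```python
import urllib.parse
import urllib.request

# Assumed location of the public DBpedia endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# The listing query from above, with the dbpedia-owl prefix declared explicitly
# so it also works with clients or endpoints that don't predefine it.
ALUMNI_QUERY = """PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
select ?almaMater ?person where {
  ?person dbpedia-owl:almaMater ?almaMater
}
order by ?almaMater ?person"""


def build_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> urllib.request.Request:
    """Build, but do not send, a SPARQL Protocol GET request for `query`."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the W3C SPARQL JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_request(ALUMNI_QUERY)
    print(req.full_url)
    # Sending it requires network access, e.g.:
    #   import json
    #   with urllib.request.urlopen(req) as resp:
    #       results = json.load(resp)
```

GET keeps the sketch simple; for very long queries the SPARQL 1.1 Protocol also allows sending the query via POST.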
If you need to place some additional restrictions on ?almaMater, e.g., to ensure that it's a university, then you can add them to the query. For instance:
select ?almaMater ?person where {
?person dbpedia-owl:almaMater ?almaMater .
?almaMater a dbpedia-owl:University .
}
order by ?almaMater ?person

With your last query, you are almost there. However, it currently accepts any resource in the place of the ?University variable. Since you only want universities there, you can add another triple pattern to restrict that variable:
SELECT * WHERE {
?University a dbpedia-owl:University.
?person dbpedia2:almaMater ?University.
}
This means that ?University can only be an individual of class dbpedia-owl:University (where dbpedia-owl is mapped to http://dbpedia.org/ontology/).
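To make the effect of the extra triple pattern concrete, here is a small in-memory sketch of the same join in plain Python. The triples are made-up toy data (the `ex:` names are hypothetical, not DBpedia resources): the almaMater pattern alone matches every alma mater, while joining with the type pattern keeps only bindings whose ?University is typed as a university.

```python
# Hypothetical toy data standing in for DBpedia triples: (subject, predicate, object).
TRIPLES = [
    ("ex:Alice", "dbpedia2:almaMater", "ex:SomeUniversity"),
    ("ex:Bob", "dbpedia2:almaMater", "ex:SomeSchool"),
    ("ex:Carol", "dbpedia2:almaMater", "ex:SomeUniversity"),
    ("ex:SomeUniversity", "rdf:type", "dbpedia-owl:University"),
    ("ex:SomeSchool", "rdf:type", "dbpedia-owl:School"),
]


def alma_mater_pairs(triples):
    """Solutions of `?person dbpedia2:almaMater ?University` on its own."""
    return [(s, o) for s, p, o in triples if p == "dbpedia2:almaMater"]


def university_alumni(triples):
    """The same solutions joined with `?University a dbpedia-owl:University`."""
    universities = {s for s, p, o in triples
                    if p == "rdf:type" and o == "dbpedia-owl:University"}
    return [(person, uni) for person, uni in alma_mater_pairs(triples)
            if uni in universities]


print(alma_mater_pairs(TRIPLES))   # three pairs, including the school
print(university_alumni(TRIPLES))  # only the two university pairs
```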

Your first query:
SELECT * WHERE {
?University dbpedia2:alumni ?Person .
}
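isn't just returning counts; it's returning both counts and individual alumni. Apparently DBpedia's data here is of poor quality, and a number of triples misuse the dbpedia2:alumni property. (Values in the dbpedia2: namespace, http://dbpedia.org/property/, are extracted fairly directly from Wikipedia infoboxes, so they aren't checked against the types the ontology expects.)
You can filter out the counts by adding a second triple pattern requiring that the value bound to ?person be an instance of the appropriate class: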
SELECT * WHERE {
?university dbpedia2:alumni ?person .
?person rdf:type <http://dbpedia.org/ontology/Person>
}
When you run this, you'll see that very few individuals are tagged as alumni; unfortunately, the data is surprisingly scant.
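If you fetch the results of the first, unfiltered query in the W3C SPARQL JSON results format, the mix is also visible client-side: every binding carries a `type` field, and a count shows up as a literal while a person shows up as a `uri`. The sketch below is a client-side check that keeps only IRI values; it complements the rdf:type restriction rather than replacing it. The sample document is made up purely to show the format's shape, and comparing against `"uri"` also covers the older `typed-literal` value that some endpoints (Virtuoso, as far as I know) still emit.

```python
# Made-up sample shaped like the SPARQL 1.1 Query Results JSON format
# (what json.load() on the endpoint's response would give you); not real DBpedia output.
SAMPLE = {
    "head": {"vars": ["University", "Person"]},
    "results": {"bindings": [
        {"University": {"type": "uri", "value": "http://example.org/UniA"},
         "Person": {"type": "literal", "value": "250000",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer"}},
        {"University": {"type": "uri", "value": "http://example.org/UniB"},
         "Person": {"type": "uri", "value": "http://example.org/SomeAlumnus"}},
    ]},
}


def iri_alumni(results, uni_var="University", person_var="Person"):
    """Return (university, person) pairs whose person value is an IRI, not a literal."""
    pairs = []
    for row in results["results"]["bindings"]:
        uni = row.get(uni_var)
        person = row.get(person_var)
        # Skip rows with an unbound variable, or where the "person" is a literal such as a count.
        if uni is None or person is None or person["type"] != "uri":
            continue
        pairs.append((uni["value"], person["value"]))
    return pairs


print(iri_alumni(SAMPLE))
```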

Related

Get movie(s) based on book(s) from DBpedia

I am new to SPARQL and trying to fetch a movie adapted from a specific book from DBpedia. This is what I have so far:
PREFIX onto: <http://dbpedia.org/ontology/>
SELECT *
WHERE
{
<http://dbpedia.org/page/2001:_A_Space_Odyssey> a ?type.
?type onto:basedOn ?book .
?book a onto:Book
}
I can't get any results. How can I do that?
When using any web resource, in your case the property :basedOn, you need to make sure that you have declared the right prefix. If you are querying the DBpedia SPARQL endpoint, you can use dbo:basedOn directly, even without declaring it, as dbo is among the predefined prefixes. Alternatively, if you want to use your own prefix, or if you are using another SPARQL client, make sure that whatever short name you choose for this property, you declare the prefix for http://dbpedia.org/ontology/.
Next, to get more results, don't restrict the type of the subject of this triple pattern, as some movies may not actually be typed as such. So, a query like this
select distinct *
{
?movie dbo:basedOn ?book .
?book a dbo:Book .
}
will give you many good results, but not all of them. For example, the resource from your example will be missing. You can easily check which properties link these two resources with a query like this:
select ?p
{
{<http://dbpedia.org/resource/2001:_A_Space_Odyssey_(film)> ?p <http://dbpedia.org/resource/2001:_A_Space_Odyssey> }
UNION
{ <http://dbpedia.org/resource/2001:_A_Space_Odyssey> ?p <http://dbpedia.org/resource/2001:_A_Space_Odyssey_(film)>}
}
You'll get only one result:
http://www.w3.org/2000/01/rdf-schema#seeAlso
(Note that the IRIs use /resource/, not /page/: the /page/ URLs are the HTML pages describing the resources, not the resources themselves.)
From there, you could search for any path between the two resources, or find a combination of other patterns that would increase the number of results.
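The two-branch UNION above just asks for predicates linking the resources in either direction. A toy Python version makes that logic explicit. The triple list is an assumption: it holds only the rdfs:seeAlso link reported above (its direction chosen arbitrarily) plus one made-up unrelated triple, and it also shows that a /page/ IRI never matches the /resource/ IRIs stored in the data.

```python
FILM = "http://dbpedia.org/resource/2001:_A_Space_Odyssey_(film)"
BOOK = "http://dbpedia.org/resource/2001:_A_Space_Odyssey"
SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Toy triples: the seeAlso link (direction assumed) and one hypothetical unrelated triple.
TRIPLES = [
    (BOOK, SEE_ALSO, FILM),
    (FILM, "http://example.org/unrelated", "http://example.org/Other"),
]


def linking_predicates(triples, a, b):
    """Predicates ?p with `a ?p b` or `b ?p a`, mirroring the two UNION branches."""
    forward = {p for s, p, o in triples if s == a and o == b}
    backward = {p for s, p, o in triples if s == b and o == a}
    return forward | backward


print(linking_predicates(TRIPLES, FILM, BOOK))
print(linking_predicates(TRIPLES, FILM, BOOK.replace("/resource/", "/page/")))  # empty set
```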

Find number of some entity type

What SPARQL query finds the count of some entity type? For example, on the Linked Movie Database, if I want to find the number of actors or films, how can I get it?
I tried this
SELECT (count ( ?Film)){?entity rdf:type ?Film}
but I got the wrong number.
There's a whole lot missing from this question (e.g., where you ran the query, what you expected as a result, etc.) but I think we can pinpoint the problem even without those details. First, let's rewrite your query using proper syntax (the formatting is optional; the important thing is count(?Film) as ?count):
select (count(?Film) as ?count) {
?entity rdf:type ?Film
}
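?Film here is a variable, so you're asking "find me things and their types, and then count how many (thing, type) pairs were found", which is the total number of rdf:type triples; count(distinct ?Film) would give the number of different types instead. If you were trying to count the number of things of some particular film type, though, you probably wanted a query like: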
select (count(?entity) as ?numberOfFilms) {
?entity rdf:type :Film .
}
Where :Film is some particular IRI, not a variable. Also note that you can abbreviate rdf:type with a, so you can make this even shorter and fit it nicely on one line again, if you want:
select (count(?entity) as ?numberOfFilms) { ?entity a :Film }
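A few lines of Python over made-up type assertions (the `:film1`, `:actor1`, etc. names are hypothetical) show the three different numbers these query shapes produce:

```python
# Hypothetical rdf:type assertions as (entity, type) pairs.
TYPE_TRIPLES = [
    (":film1", ":Film"),
    (":film1", ":Work"),
    (":film2", ":Film"),
    (":actor1", ":Actor"),
]

# select (count(?Film) as ?count) { ?entity rdf:type ?Film }
# ?Film is a variable, so every (entity, type) solution is counted.
count_type_bindings = len([t for _, t in TYPE_TRIPLES])

# select (count(distinct ?Film) as ?count) { ?entity rdf:type ?Film }
count_distinct_types = len({t for _, t in TYPE_TRIPLES})

# select (count(?entity) as ?numberOfFilms) { ?entity a :Film }
# :Film is a constant IRI, so only entities of that type are counted.
number_of_films = len([e for e, t in TYPE_TRIPLES if t == ":Film"])

print(count_type_bindings, count_distinct_types, number_of_films)  # 4 3 2
```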

How to create a small SPARQL query for DBpedia?

I'm a SPARQL beginner, and I would like to know how to write this small query against DBpedia:
The query is: Getting the topics of a thing (name of person, organisation …)
SELECT DISTINCT ?occupation WHERE {
?s <w3.org/2000/01/rdf-schema#label>; 'Madonna'#en . ?occupation dbpedia-owl:occupation ?s
}
I wrote this query to get the occupation of Madonna; is it correct? Here it's Madonna, but it could be anything else.
I tried this query, but I think it's wrong:
SELECT DISTINCT ?occupation WHERE {
?s <http://www.w3.org/2000/01/rdf-schema#label> 'Madonna'#en .
?s dbpedia-owl:occupation ?occupation
}
I tried this too, and I think it's correct:
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?string
WHERE {
res:Tom_Cruise dbpprop:occupation ?string .
}
It works with Tom_Cruise but not with Madonna or barack_Obama for example.
A query like your attempt is a good start:
PREFIX res: <http://dbpedia.org/resource/>
SELECT DISTINCT ?string
WHERE {
res:Tom_Cruise dbpprop:occupation ?string .
}
Now that we've got something specific to work from, we can look at the specific problems that it might have. First, I'm going to rewrite it using the same namespace prefixes that the public endpoint web interface supports, so that we can copy and paste to it. I'm also putting the keywords in lower case because I don't like yelling.
select distinct ?string where {
dbpedia:Tom_Cruise dbpprop:occupation ?string .
}
Now, you mentioned that
It works with Tom_Cruise but not with Madonna or barack_Obama, for example.
All the data in DBpedia is publicly available for you to browse. If you want to see why there are no results for Madonna, note that dbpedia:Madonna is shorthand for http://dbpedia.org/resource/Madonna and pull up that page in your browser. The properties listed on that page show that it's a redirect (indeed, you'll see the same thing if you go to the corresponding Wikipedia article, http://en.wikipedia.org/wiki/Madonna). You want the IRI http://dbpedia.org/resource/Madonna_(entertainer). Unfortunately, you can't write that as a simple prefixed name because of the parentheses, so you have to write the full IRI:
select distinct ?string where {
<http://dbpedia.org/resource/Madonna_(entertainer)> dbpprop:occupation ?string .
}
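When generating such queries from code, one way to sidestep the parenthesis problem is to fall back to the full IRI whenever the local name contains characters outside a conservative safe set. The helper below is a hypothetical sketch: its safe set (letters, digits, underscore, hyphen) is deliberately narrower than what the SPARQL grammar allows, so it sometimes uses the full form where a prefixed name would have worked, and it assumes the local name is already a valid IRI fragment. It also relies on the dbpedia: prefix being predefined, as it is on the public endpoint.

```python
import re

RESOURCE_NS = "http://dbpedia.org/resource/"
# Deliberately conservative: letters, digits, underscore and hyphen, not starting with a hyphen.
SAFE_LOCAL_NAME = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]*")


def resource_term(local_name: str) -> str:
    """Render a DBpedia resource as `dbpedia:Name` when safe, else as a full <IRI>."""
    if SAFE_LOCAL_NAME.fullmatch(local_name):
        return "dbpedia:" + local_name
    return "<" + RESOURCE_NS + local_name + ">"


def occupation_query(local_name: str) -> str:
    """Build the occupation query above for an arbitrary resource name."""
    return ("select distinct ?string where {\n"
            f"  {resource_term(local_name)} dbpprop:occupation ?string .\n"
            "}")


print(resource_term("Tom_Cruise"))
print(occupation_query("Madonna_(entertainer)"))
```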
Now, there are a couple of problems with barack_Obama. First, the capitalization needs to be Barack_Obama if you want any results. Second, if you visit http://dbpedia.org/resource/Barack_Obama, you'll see that there's no dbpprop:occupation property. There's not much you can do about that; you can't query for data that isn't there. The data that is there and might be useful to you (and is of a similar nature) is dbpedia-owl:office and dbpedia-owl:profession. For instance, using a property path that accepts either property:
select distinct ?string where {
dbpedia:Barack_Obama (dbpedia-owl:office|dbpedia-owl:profession) ?string .
}
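In SPARQL 1.1, `(dbpedia-owl:office|dbpedia-owl:profession)` is an alternative property path: it matches a triple using either predicate. The Python sketch below mimics that over hypothetical placeholder triples (the `ex:` objects are made up, not real DBpedia values), and also shows that asking for a predicate that isn't present, such as dbpprop:occupation here, simply yields nothing.

```python
# Hypothetical toy triples; the objects are placeholders, not DBpedia data.
TRIPLES = [
    ("dbpedia:Barack_Obama", "dbpedia-owl:office", "ex:SomeOffice"),
    ("dbpedia:Barack_Obama", "dbpedia-owl:profession", "ex:SomeProfession"),
    ("dbpedia:Barack_Obama", "dbpedia-owl:birthPlace", "ex:SomePlace"),
]


def alternative_path(triples, subject, predicates):
    """Distinct objects reachable from `subject` via any one of `predicates`, like (p1|p2)."""
    return {o for s, p, o in triples if s == subject and p in predicates}


print(alternative_path(TRIPLES, "dbpedia:Barack_Obama",
                       {"dbpedia-owl:office", "dbpedia-owl:profession"}))
print(alternative_path(TRIPLES, "dbpedia:Barack_Obama", {"dbpprop:occupation"}))  # empty set
```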

SPARQL query with multiple aggregates exceeds memory limit

I am trying to generate some user statistics from a triple store using SPARQL. Please see the query below. How can this be improved? Am I doing something evil here? Why is this consuming so much memory? (see the background story at the end of this post)
I prefer to do the aggregation and the joins all inside the triple store. Splitting up the query would mean that I had to join the results "manually", outside the database, losing the efficiency and optimizations of the triple store. No need to reinvent the wheel for no good reason.
The query
SELECT
?person
(COUNT(DISTINCT ?sent_email) AS ?sent_emails)
(COUNT(DISTINCT ?received_email) AS ?received_emails)
(COUNT(DISTINCT ?receivedInCC_email) AS ?receivedInCC_emails)
(COUNT(DISTINCT ?revision) AS ?commits)
WHERE {
?person rdf:type foaf:Person.
OPTIONAL {
?sent_email rdf:type email:Email.
?sent_email email:sender ?person.
}
OPTIONAL {
?received_email rdf:type email:Email.
?received_email email:recipient ?person.
}
OPTIONAL {
?receivedInCC_email rdf:type email:Email.
?receivedInCC_email email:ccRecipient ?person.
}
OPTIONAL {
?revision rdf:type vcs:VcsRevision.
?revision vcs:committedBy ?person.
}
}
GROUP BY ?person
ORDER BY DESC(?commits)
Background
The problem is that I get the error "QUERY MEMORY LIMIT REACHED" in AllegroGraph (please also see my related Stack Overflow question). As the repository only contains around 200k triples, which easily fit into an N-Triples input file of about 60 MB, I wonder how executing the query can require more than 4 GB of RAM, roughly two orders of magnitude more.
Try splitting the computation into subqueries, one per count, for example:
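SELECT
?person
(MAX(?sent_emails_) AS ?sent_emails)
(MAX(?received_emails_) AS ?received_emails)
(MAX(?receivedInCC_emails_) AS ?receivedInCC_emails)
(MAX(?commits_) AS ?commits)
WHERE {
# One subquery per count; each projects 0 for the counts it doesn't compute,
# so MAX in the outer query picks up the real value for every column.
{
SELECT ?person
(COUNT(DISTINCT ?sent_email) AS ?sent_emails_)
(0 AS ?received_emails_) (0 AS ?receivedInCC_emails_) (0 AS ?commits_)
WHERE {
?sent_email rdf:type email:Email.
?sent_email email:sender ?person.
?person rdf:type foaf:Person.
} GROUP BY ?person
} UNION {
SELECT ?person (0 AS ?sent_emails_)
(COUNT(DISTINCT ?received_email) AS ?received_emails_)
(0 AS ?receivedInCC_emails_) (0 AS ?commits_)
WHERE {
?received_email rdf:type email:Email.
?received_email email:recipient ?person.
?person rdf:type foaf:Person.
} GROUP BY ?person
} UNION {
SELECT ?person (0 AS ?sent_emails_) (0 AS ?received_emails_)
(COUNT(DISTINCT ?receivedInCC_email) AS ?receivedInCC_emails_)
(0 AS ?commits_)
WHERE {
?receivedInCC_email rdf:type email:Email.
?receivedInCC_email email:ccRecipient ?person.
?person rdf:type foaf:Person.
} GROUP BY ?person
} UNION {
SELECT ?person (0 AS ?sent_emails_) (0 AS ?received_emails_) (0 AS ?receivedInCC_emails_)
(COUNT(DISTINCT ?revision) AS ?commits_)
WHERE {
?revision rdf:type vcs:VcsRevision.
?revision vcs:committedBy ?person.
?person rdf:type foaf:Person.
} GROUP BY ?person
}
}
GROUP BY ?person
ORDER BY DESC(?commits)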
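The objective is to avoid generating a huge number of rows that must be processed for aggregation (with four independent OPTIONAL patterns, each person produces one row per combination of matching sent, received, CC'd and committed items, so the row count is the product of the four counts), and to avoid OPTIONAL{} patterns, which can also hurt performance. Note that, unlike the OPTIONAL version, people with no emails or commits at all won't appear in the result.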
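To see why the OPTIONAL version can exhaust memory on such a small store, here is a back-of-the-envelope sketch in Python. It assumes the engine materializes the left-joined solutions before grouping (real engines may optimize differently), and the per-person counts are hypothetical.

```python
from math import prod


def optional_rows(sent, received, cc, commits):
    """Solutions one person contributes when four independent OPTIONALs are left-joined.

    Each OPTIONAL joins only on ?person, so matches multiply; an OPTIONAL with no
    matches keeps the row once with its variable unbound, hence max(n, 1).
    """
    return prod(max(n, 1) for n in (sent, received, cc, commits))


def subquery_rows(sent, received, cc, commits):
    """Solutions the four grouped subqueries scan for the same person: the counts just add up."""
    return sent + received + cc + commits


# Hypothetical activity for a single busy person.
counts = (200, 300, 100, 50)
print(optional_rows(*counts))  # 300000000 rows to group
print(subquery_rows(*counts))  # 650 rows to group
```

A single busy person is enough to produce hundreds of millions of intermediate rows in the OPTIONAL version, which is consistent with a few hundred thousand triples blowing past a 4 GB limit.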

Problem with a select SPARQL query on dbpedia

I'm trying to get some data about a city using a SPARQL query on DBpedia. The problem is that I can't get the query to work.
Currently I do something like this:
SELECT ?title,?name,?abs WHERE {
?title skos:subject
<http://dbpedia.org/resource/Category:Cities%2C_towns_and_villages_in_Slovenia>.
?title dbpprop:officialName ?name.
?title dbpprop:abstract ?abs
}
I get all the towns and villages in Slovenia with all their data. The problem is that I would like to get the data (officialName and/or abstract) for only one town, for example Ljubljana. So I tried something like this:
SELECT ?name WHERE {
?name dbpprop:officialName
<http://dbpedia.org/resource/Ljubljana>.
}
Of course it doesn't work, though I don't exactly know why :). I've been experimenting a bit and noticed that if I put
?name skos:subject <http://dbpedia.org/resource/Category:Ljubljana>.
I get some results (which are not relevant to me, but anyway), but if I put
?name skos:subject <http://dbpedia.org/resource/Ljubljana>.
there are no results at all, even though the skos:subject element exists on the page http://dbpedia.org/resource/Ljubljana.
Could someone please explain why the second example does not work and how to get the result I would like to have?
Thanks,
Ablak
You want to query for <http://dbpedia.org/resource/Ljubljana> as a subject, not an object; it takes the place of the ?title variable in your SPARQL query, for example:
SELECT ?name ?abs WHERE {
<http://dbpedia.org/resource/Ljubljana>
skos:subject <http://dbpedia.org/resource/Category:Cities%2C_towns_and_villages_in_Slovenia> ;
dbpprop:officialName ?name ;
dbpprop:abstract ?abs .
}
This is why your triple pattern ?name skos:subject <http://dbpedia.org/resource/Ljubljana> does not return the expected results: the URI for Ljubljana should be the subject of the statement(s) you want to match, not the object.
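A toy triple-pattern matcher in Python shows the subject/object distinction: a pattern only matches when each fixed term sits in the same position as in the stored triple. The triples below are shaped like the data described above, but the literal values are placeholders, not real DBpedia content.

```python
LJUBLJANA = "http://dbpedia.org/resource/Ljubljana"
CATEGORY = "http://dbpedia.org/resource/Category:Cities%2C_towns_and_villages_in_Slovenia"

# Toy triples; the literal values are placeholders.
TRIPLES = [
    (LJUBLJANA, "skos:subject", CATEGORY),
    (LJUBLJANA, "dbpprop:officialName", "Ljubljana (placeholder)"),
    (LJUBLJANA, "dbpprop:abstract", "Placeholder abstract."),
]


def match(triples, s=None, p=None, o=None):
    """Match a single triple pattern; None plays the role of a variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# ?name skos:subject <Ljubljana>: Ljubljana is in the object position, so nothing matches.
print(match(TRIPLES, p="skos:subject", o=LJUBLJANA))
# <Ljubljana> dbpprop:officialName ?name: Ljubljana is the subject, as in the data.
print(match(TRIPLES, s=LJUBLJANA, p="dbpprop:officialName"))
```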