SPARQL query to find "notable" people - sparql

I am using live Dbpedia (http://dbpedia-live.openlinksw.com/sparql/) to get basic details of notable people. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?x0 ?name ?dob WHERE {
?x0 rdf:type foaf:Person.
?x0 rdfs:label ?name.
?x0 dbpedia-owl:birthDate ?dob.
FILTER REGEX(?name,"^[A-Z]","i").
} LIMIT 200
This works and I use LIMIT 200 to limit the output to a small number of people. My problem is the 200 people are random, and I want some way of measuring 'notability' such that I return 200 notable people, rather than 200 random people. There are over 500,000 people in Dbpedia.
My question is, how can I measure 'notability' and limit the query to return notable people only? I realize there is no 'notability' property and it is very subjective. I am happy to use any indirect or approximate measure such as number of links or number of references. But I don't know how to do this.
Edit : As a result of the helpful comments I improved the query to include page ranks:
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX vrank:<http://purl.org/voc/vrank#>
SELECT DISTINCT ?s ?name2 ?dob ?v
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
?s rdf:type foaf:Person.
?s rdfs:label ?name.
?s dbo:birthDate ?dob.
?s vrank:hasRank/vrank:rankValue ?v.
FILTER REGEX(?name,"^[A-Z].*").
BIND (str(?name) AS ?name2)
} ORDER BY DESC(?v) LIMIT 100
The problem now is there are lots of duplicates, even though I am using DISTINCT.

Related

Sparql: Retrieve information on all people in dbpedia with bulk query

I can fetch several metadata fields for a particular person using the following query at https://dbpedia.org/sparql:
prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
select * {
<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:birthName ?name.
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:birthDate ?birth_date}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:deathDate ?death_date}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:thumbnail ?thumbnail}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:abstract ?abstract FILTER (lang(?abstract) = 'en')}
}
I've also seen query syntax that shows how to get metadata fields on many people at once:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
LIMIT 10000 OFFSET 10000
How can I combine these two so that I can fetch the birth_date, death_date, thumbnail, and abstract (in English) for all people in DBPedia? Any pointers others can offer would be hugely helpful!

Dbpedia sparql gives empty result to my query, what is wrong?

I'm doing sparql query in this site.
It gives me an empty result, what is wrong with my query?
prefix foaf: <http://xmlns.com/foaf/0.1/>
select * where {
?s rdf:type foaf:Person.
} LIMIT 100
This query is ok, but when I add the second pattern, I got empty result.
?s foaf:name 'Abraham_Robinson'.
prefix foaf: <http://xmlns.com/foaf/0.1/>
select * where {
?s rdf:type foaf:Person.
?s foaf:name 'Abraham_Robinson'.
} LIMIT 100
How to correct my query so the result includes this record:
http://dbpedia.org/resource/Abraham_Robinson
I guess Kingsley misread this as a freetext search, rather than a simple string comparison.
The query Kingsley posted and linked earlier delivers no solution, for the same reasons as the original query failed, as identified in the comment by AKSW, i.e. --
foaf:name values don't generally replaces spaces with underscores as in the original value; i.e., 'Abraham_Robinson' should have been 'Abraham Robinson'
foaf:name strings are typically langtagged, and it is in this case, so that actually needs to be 'Abraham Robinson'#en
Incorporating AKSW's fixes with this line ?s foaf:name 'Abraham Robinson'#en., the query works.
All that said -- you may prefer an alternative query, which will deliver results whether or not the foaf:name value is langtagged and whether or not the spaces are replaced by underscores. This one is Virtuoso-specific, and produces results faster because the bif:contains function uses its free-text indexes, would be --
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT *
WHERE
{
?s rdf:type foaf:Person ;
foaf:name ?name .
?name bif:contains "'Abraham Robinson'" .
}
LIMIT 100
Generic SPARQL using a REGEX FILTER works against both Virtuoso and other RDF stores, but produces results more slowly because REGEX does not leverage the free-text indexes, as in --
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT *
WHERE
{
?s rdf:type foaf:Person ;
foaf:name ?name .
FILTER ( REGEX ( ?name, 'Abraham Robinson' ) ) .
}
LIMIT 100
The following query should work. Right now it isn't working due to a need to refresh the text index associated with this instance (currently in progress):
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * where {
?s rdf:type foaf:Person.
?s foaf:name "'Abraham Robinson'".
}
LIMIT 100
Note how the phrase is placed within single-quotes that are within double-quotes.
If the literal content is language-tagged, as is the case in DBpedia the exact-match query would take the form (already clarified in #TallTed's response):
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * where {
?s rdf:type foaf:Person.
?s foaf:name 'Abraham Robinson'#en.
}
LIMIT 100
This SPARQL Results Page Link should produce a solution when the index update completes.

Get birthplace from DBpedia people in SPARQL

I have the following code to retrieve all people that was born in Barcelona
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?birthPlace
WHERE {
?person rdfs:label ?label.
?person rdf:type dbo:Person.
?person <http://dbpedia.org/property/birthPlace>
<http://dbpedia.org/resource/Barcelona>.
}
However, I do not know how to get the birthPlace. I want a variable that says next to each name that Barcelona is the place of birth. Any ideas?
How about this:
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?birthPlace
WHERE {
?person rdfs:label ?label.
?person rdf:type dbo:Person.
?person <http://dbpedia.org/property/birthPlace> ?birthPlace.
FILTER (?birthPlace = <http://dbpedia.org/resource/Barcelona>)
}
Note that your query has a pattern to match labels, but the labels are not returned. That leads to duplicate results because some people have multiple labels (in different languages). Remove the pattern, or add ?label to the SELECT clause.
You can abbreviate <http://dbpedia.org/property/birthPlace> to dbp:birthPlace.

How to query dbpedia.org with sparql

I'm very new in OpenData and try to write a query in SPARQL.
My goal is to get data for following set of criteria:
- Category: Home_automation
- select all items from type "Thing"
- with at least one entry in "is Product of"
- that have a picture-url with a German description
I tried the following:
PREFIX cat: <http://dbpedia.org/resource/Category:>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT *
WHERE {
cat:Home_automation skos:broader ?x
}
But now I don't know how to add the other filters to the where clause.
I tried to user broader:... to get the items, but I think that was the wrong direction.
I tested the queries with: https://dbpedia.org/sparql
The result should be:
| (label) | (url)
|--------------------------|-----------------------------------
|"Kurzzeitwecker"#de | urls to the picture of the device
|"Staubsauger"#de | -||-
|"Waschmaschine"#de | -||-
|"Geschirrspülmaschine"#de | -||-
Does anyone have some tips please?
UPDATE: new query:
PREFIX cat: <http://dbpedia.org/resource/Category:>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?label WHERE {
?s ?p cat:Home_automation .
?s rdf:type owl:Thing .
?s rdfs:label ?label
FILTER (LANG(?label)='de')
}
order by ?p
It is not clear what you want. You must first know what exact related information dbpedia contains and how they are structured.
However, you can try discovering what types of relationships cat:Home_automation is involved in, thus, you may know better what you want.
I suggest starting by generic queries, to specify how cat:Home_automation occurs in dbpedia, then, you might be able to go more specific, and pose further queries.
A query to list triples where cat:Home_automation is an subject:
PREFIX cat: <http://dbpedia.org/resource/Category:>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?p ?o WHERE {
cat:Home_automation ?p ?o
}
order by ?p
A query to list triples where cat:Home_automation is an object:
PREFIX cat: <http://dbpedia.org/resource/Category:>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?s ?p WHERE {
?s ?p cat:Home_automation
}
order by ?p
check the results, see what is interesting for you, and then continue with further queries.

SPARQL query to get all Person available in DBpedia is showing only some Person data, not all

I am writing SPARQL query to get all Person available in DBpedia. My query is ->
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
It's giving around 10000 rows,when I am taking the output as HTML/csv/spreadsheet format.
But when I am giving query to get total count
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT COUNT(*)
WHERE{
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
It's giving -> 1783404
Can anyone suggest a solution to get all rows of Person available in DBpedia?
DBPedia is being smart enough here to not overload its servers with large queries, and capping matches at 10000. Since you are ordering the results, you can use LIMIT and OFFSET to get result in sets of 10000. For example, to get the second set of 10000 results use this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
LIMIT 10000 OFFSET 10000
Actually, since DBPedia is limiting the results to 10000 matches, the LIMIT isn't really necessary.