Sparql: Retrieve information on all people in dbpedia with bulk query - sparql

I can fetch several metadata fields for a particular person using the following query at https://dbpedia.org/sparql:
prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
select * {
<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:birthName ?name.
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:birthDate ?birth_date}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:deathDate ?death_date}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:thumbnail ?thumbnail}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:abstract ?abstract FILTER (lang(?abstract) = 'en')}
}
I've also seen query syntax that shows how to get metadata fields on many people at once:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
LIMIT 10000 OFFSET 10000
How can I combine these two so that I can fetch the birth_date, death_date, thumbnail, and abstract (in English) for all people in DBPedia? Any pointers others can offer would be hugely helpful!

Related

SPARQL query all persons who lived through the 20th century in DBPedia

I want to query all persons who lived through the entire 20th century in DBPedia.
Im using https://dbpedia.org/sparql/ to process my query. I have limited the output to 20.
The query I've tried is the following:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?personName ?birthDate ?deathDate
WHERE {
FILTER (?birthDate < "1900-01-01"^^xsd:date AND ?deathDate > "1999-12-31"^^xsd:date).
?p rdf:type dbo:Person.
?p dbp:name ?personName.
?p dbp:birthDate ?birthDate.
?p dbp:deathDate ?deathDate.
}
LIMIT 20.
In the output all person died after 1999-12-31 but they weren't born before 1900-01-01.
Why is my query wrong? How can I fix it?
Thanks in advance for you time.
This issue is due to integer values being included in the query results, and can be resolved using the DATATYPE() function in the same or a new FILTER() section.
For example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?personName ?birthDate ?deathDate
WHERE
{
?person rdf:type dbo:Person;
dbp:name ?personName;
dbp:birthDate ?birthDate;
dbp:deathDate ?deathDate.
#This FILTER ensures that only xsd:date values are returned
FILTER(DATATYPE(?birthDate) = xsd:date && DATATYPE(?deathDate) = xsd:date).
FILTER (?birthDate < "1900-01-01"^^xsd:date && ?deathDate > "1999-12-31"^^xsd:date).
}
LIMIT 20
Live Query Definition
Live Query Result
DATATYPE() Documentation

SPARQL query to find "notable" people

I am using live Dbpedia (http://dbpedia-live.openlinksw.com/sparql/) to get basic details of notable people. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?x0 ?name ?dob WHERE {
?x0 rdf:type foaf:Person.
?x0 rdfs:label ?name.
?x0 dbpedia-owl:birthDate ?dob.
FILTER REGEX(?name,"^[A-Z]","i").
} LIMIT 200
This works and I use LIMIT 200 to limit the output to a small number of people. My problem is the 200 people are random, and I want some way of measuring 'notability' such that I return 200 notable people, rather than 200 random people. There are over 500,000 people in Dbpedia.
My question is, how can I measure 'notability' and limit the query to return notable people only? I realize there is no 'notability' property and it is very subjective. I am happy to use any indirect or approximate measure such as number of links or number of references. But I don't know how to do this.
Edit : As a result of the helpful comments I improved the query to include page ranks:
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX vrank:<http://purl.org/voc/vrank#>
SELECT DISTINCT ?s ?name2 ?dob ?v
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
?s rdf:type foaf:Person.
?s rdfs:label ?name.
?s dbo:birthDate ?dob.
?s vrank:hasRank/vrank:rankValue ?v.
FILTER REGEX(?name,"^[A-Z].*").
BIND (str(?name) AS ?name2)
} ORDER BY DESC(?v) LIMIT 100
The problem now is there are lots of duplicates, even though I am using DISTINCT.

SPARQL: Federated query gives no result when using local file, while same query on dbpedia does

the local file uploaded on stardog:
#prefix dbo: <http://dbpedia.org/ontology/> .
#prefix dbr: <http://dbpedia.org/resource/> .
dbr:United_States dbo:leader dbr:John_Roberts ,
dbr:Joe_Biden ,
dbr:Barack_Obama ,
dbr:Paul_Ryan .
1.query using the local file:
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX db: <http://dbpedia.org/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person ?o
FROM <http://example.com/leaders.ttl>
WHERE{
dbr:United_States dbo:leader ?person .
SERVICE <http://dbpedia.org/sparql> { ?person dbo:abstract ?o .}
}
2.Same query using only dbpedia will give results:
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX db: <http://dbpedia.org/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person ?o
FROM <http://example.com/leaders.ttl>
WHERE{
#dbr:United_States dbo:leader ?person .
SERVICE <http://dbpedia.org/sparql> { dbr:United_States dbo:leader ?person. ?person dbo:abstract ?o.}
}
Using the second query will result in a colum with the leaders and a column of abstract of the leaders in all languages available from dbpedia. Why does the first query where I use the local rdf file not work? The select query on the local file with dbr:United_States dbo:leader ?person . returns exactly the same column with the same leaders as running it directly on the dbpedia endpoint: dbpedia:John_Roberts, dbpedia:Joe_Biden, dbpedia:Barack_Obama, dbpedia:Paul_Ryan.
Why does the first query give no results?

SPARQL query to get all Person available in DBpedia is showing only some Person data, not all

I am writing SPARQL query to get all Person available in DBpedia. My query is ->
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
It's giving around 10000 rows,when I am taking the output as HTML/csv/spreadsheet format.
But when I am giving query to get total count
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT COUNT(*)
WHERE{
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
It's giving -> 1783404
Can anyone suggest a solution to get all rows of Person available in DBpedia?
DBPedia is being smart enough here to not overload its servers with large queries, and capping matches at 10000. Since you are ordering the results, you can use LIMIT and OFFSET to get result in sets of 10000. For example, to get the second set of 10000 results use this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
LIMIT 10000 OFFSET 10000
Actually, since DBPedia is limiting the results to 10000 matches, the LIMIT isn't really necessary.

Number of incoming link fo each DBpedia resource

I have the SPARQL DBpedia Query below:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vrank:<http://purl.org/voc/vrank#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT distinct ?Nom ?resource ?url (count( (?o) as ?nb))
WHERE{
?resource rdfs:label ?Nom.
?resource foaf:isPrimaryTopicOf ?url.
?resource dbpedia-owl:wikiPageWikiLink ?o
?Nom <bif:contains> "Apple".
FILTER ( langMatches( lang(?Nom), "EN" )).
MINUS {?resource dbo:wikiPageRedirects|dbo:wikiPageDisambiguates ?dis}
}
Group By ?Nom ?resource ?url
I want to get the number of incoming links of each entitie within wikipedia. How can I proceed?
Thanks
First of all syntax:
you are missing a dot after ?o,
also it should be bif:contains, not <bif:contains>.
Next:
i ran this simpler query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vrank:<http://purl.org/voc/vrank#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT distinct ?Nom ?resource
WHERE{
?resource rdfs:label ?Nom.
?Nom bif:contains "Apple".
}
which produced a lot of results....
now i added this triple:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vrank:<http://purl.org/voc/vrank#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT distinct ?Nom ?resource
WHERE{
?resource rdfs:label ?Nom.
?resource dbpedia-owl:wikiPageWikiLink ?o.
?Nom bif:contains "Apple".
}
which returned no results.
this means there is no triple containing apple in its object literal, where the subject has a wikiPageWikiLink in the whole endpoint.
If this property does exist, its instances are not included in the official endpoint since there is no triple containing this property (I checked). This is probably due to the fact, that the official endpoint does not hold every dbpedia dataset, or it might be deprecated.