SPARQL query to get all Person available in DBpedia is showing only some Person data, not all - sparql

I am writing SPARQL query to get all Person available in DBpedia. My query is ->
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
It's giving around 10000 rows,when I am taking the output as HTML/csv/spreadsheet format.
But when I am giving query to get total count
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT COUNT(*)
WHERE{
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
It's giving -> 1783404
Can anyone suggest a solution to get all rows of Person available in DBpedia?

DBPedia is being smart enough here to not overload its servers with large queries, and capping matches at 10000. Since you are ordering the results, you can use LIMIT and OFFSET to get result in sets of 10000. For example, to get the second set of 10000 results use this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
LIMIT 10000 OFFSET 10000
Actually, since DBPedia is limiting the results to 10000 matches, the LIMIT isn't really necessary.

Related

Sparql: Retrieve information on all people in dbpedia with bulk query

I can fetch several metadata fields for a particular person using the following query at https://dbpedia.org/sparql:
prefix dbpedia: <http://dbpedia.org/resource/>
prefix dbpedia-owl: <http://dbpedia.org/ontology/>
select * {
<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:birthName ?name.
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:birthDate ?birth_date}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:deathDate ?death_date}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:thumbnail ?thumbnail}
OPTIONAL{<http://dbpedia.org/resource/Mahatma_Gandhi> dbpedia-owl:abstract ?abstract FILTER (lang(?abstract) = 'en')}
}
I've also seen query syntax that shows how to get metadata fields on many people at once:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
LIMIT 10000 OFFSET 10000
How can I combine these two so that I can fetch the birth_date, death_date, thumbnail, and abstract (in English) for all people in DBPedia? Any pointers others can offer would be hugely helpful!

SPARQL query all persons who lived through the 20th century in DBPedia

I want to query all persons who lived through the entire 20th century in DBPedia.
Im using https://dbpedia.org/sparql/ to process my query. I have limited the output to 20.
The query I've tried is the following:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?personName ?birthDate ?deathDate
WHERE {
FILTER (?birthDate < "1900-01-01"^^xsd:date AND ?deathDate > "1999-12-31"^^xsd:date).
?p rdf:type dbo:Person.
?p dbp:name ?personName.
?p dbp:birthDate ?birthDate.
?p dbp:deathDate ?deathDate.
}
LIMIT 20.
In the output all person died after 1999-12-31 but they weren't born before 1900-01-01.
Why is my query wrong? How can I fix it?
Thanks in advance for you time.
This issue is due to integer values being included in the query results, and can be resolved using the DATATYPE() function in the same or a new FILTER() section.
For example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?personName ?birthDate ?deathDate
WHERE
{
?person rdf:type dbo:Person;
dbp:name ?personName;
dbp:birthDate ?birthDate;
dbp:deathDate ?deathDate.
#This FILTER ensures that only xsd:date values are returned
FILTER(DATATYPE(?birthDate) = xsd:date && DATATYPE(?deathDate) = xsd:date).
FILTER (?birthDate < "1900-01-01"^^xsd:date && ?deathDate > "1999-12-31"^^xsd:date).
}
LIMIT 20
Live Query Definition
Live Query Result
DATATYPE() Documentation

Querying dbpedia, not getting expected result not sure what's the mistake

My query is
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?craft where
{
?craft <http://dbpedia.org/property/title> ?v.
}
Now this returning lot of results,
but nothing related to Steve Jobs or Tim Cook, even though in their page there is a property named title.
dbp:title dbr:List_of_Apple_Computer_CEOs
http://dbpedia.org/page/Steve_Jobs
The query:
PREFIX dbp: <http://dbpedia.org/property/>
SELECT (COUNT(*) AS ?nb_result)
WHERE {
?craft dbp:title ?v .
}
returns:
nb_result
---------
1566113
The public query endpoint for DBpedia limits the number of results to 10,000, among other restrictions. So your chances of retrieving any specific statement there are very small. If you worry that the data at the query endpoint is different than at the front end, you can check that the data is there with the query:
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?prop ?value
WHERE {
dbr:Steve_Jobs ?prop ?value .
}
and compare with what's displayed at http://dbpedia.org/page/Steve_Jobs.

SPARQL query to find "notable" people

I am using live Dbpedia (http://dbpedia-live.openlinksw.com/sparql/) to get basic details of notable people. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?x0 ?name ?dob WHERE {
?x0 rdf:type foaf:Person.
?x0 rdfs:label ?name.
?x0 dbpedia-owl:birthDate ?dob.
FILTER REGEX(?name,"^[A-Z]","i").
} LIMIT 200
This works and I use LIMIT 200 to limit the output to a small number of people. My problem is the 200 people are random, and I want some way of measuring 'notability' such that I return 200 notable people, rather than 200 random people. There are over 500,000 people in Dbpedia.
My question is, how can I measure 'notability' and limit the query to return notable people only? I realize there is no 'notability' property and it is very subjective. I am happy to use any indirect or approximate measure such as number of links or number of references. But I don't know how to do this.
Edit : As a result of the helpful comments I improved the query to include page ranks:
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX vrank:<http://purl.org/voc/vrank#>
SELECT DISTINCT ?s ?name2 ?dob ?v
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
?s rdf:type foaf:Person.
?s rdfs:label ?name.
?s dbo:birthDate ?dob.
?s vrank:hasRank/vrank:rankValue ?v.
FILTER REGEX(?name,"^[A-Z].*").
BIND (str(?name) AS ?name2)
} ORDER BY DESC(?v) LIMIT 100
The problem now is there are lots of duplicates, even though I am using DISTINCT.

Number of incoming link fo each DBpedia resource

I have the SPARQL DBpedia Query below:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vrank:<http://purl.org/voc/vrank#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT distinct ?Nom ?resource ?url (count( (?o) as ?nb))
WHERE{
?resource rdfs:label ?Nom.
?resource foaf:isPrimaryTopicOf ?url.
?resource dbpedia-owl:wikiPageWikiLink ?o
?Nom <bif:contains> "Apple".
FILTER ( langMatches( lang(?Nom), "EN" )).
MINUS {?resource dbo:wikiPageRedirects|dbo:wikiPageDisambiguates ?dis}
}
Group By ?Nom ?resource ?url
I want to get the number of incoming links of each entitie within wikipedia. How can I proceed?
Thanks
First of all syntax:
you are missing a dot after ?o,
also it should be bif:contains, not <bif:contains>.
Next:
i ran this simpler query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vrank:<http://purl.org/voc/vrank#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT distinct ?Nom ?resource
WHERE{
?resource rdfs:label ?Nom.
?Nom bif:contains "Apple".
}
which produced a lot of results....
now i added this triple:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX vrank:<http://purl.org/voc/vrank#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT distinct ?Nom ?resource
WHERE{
?resource rdfs:label ?Nom.
?resource dbpedia-owl:wikiPageWikiLink ?o.
?Nom bif:contains "Apple".
}
which returned no results.
this means there is no triple containing apple in its object literal, where the subject has a wikiPageWikiLink in the whole endpoint.
If this property does exist, its instances are not included in the official endpoint since there is no triple containing this property (I checked). This is probably due to the fact, that the official endpoint does not hold every dbpedia dataset, or it might be deprecated.