I am working on a fun query. In DBpedia, I want to find records of women who married younger men or, equivalently, husbands whose wives are older than they are.
I came up with this SPARQL query. Sadly, it returns too few records: only 3.
# I always add these for easier commandline querying
prefix dbo: <http://dbpedia.org/ontology/>
prefix dbp: <http://dbpedia.org/property/>
prefix dbr: <http://dbpedia.org/resource/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix exo: <http://example.org/ontology/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?husband ?husbandBirthDate ?wife ?wifeBirthDate
WHERE {
# there are 300000 triples that use dbp:spouse
# unfortunately very few have spouses which are also in dbpedia
# and for those partners the birthDate occurs even more rarely in dbpedia
{
?husband dbp:spouse ?wife .
?husband dbp:gender ?husbandGender.
FILTER(str(?husbandGender) = "male")
?husband dbp:birthDate ?husbandBirthDate .
?wife dbp:birthDate ?wifeBirthDate .
?wife dbp:gender ?wifeGender.
FILTER(str(?wifeGender) = "female") .
FILTER (?husbandBirthDate > ?wifeBirthDate)
}
UNION
# there are ~1400 triples that use dbo:husband which means "hasHusband"
{
?wife dbp:husband ?husband .
?husband dbo:birthDate ?husbandBirthDate .
?wife dbo:birthDate ?wifeBirthDate .
FILTER (?husbandBirthDate > ?wifeBirthDate)
}
UNION
# there are ~1300 triples that use dbo:wife which means "hasWife"
{
?husband dbp:wife ?wife .
?husband dbo:birthDate ?husbandBirthDate .
?wife dbo:birthDate ?wifeBirthDate .
FILTER (?husbandBirthDate > ?wifeBirthDate)
}
}
ORDER BY ?husbandBirthDate
Sadly, it returns only these three records (as of 2020):
husband | husbandBirthDate | wife | wifeBirthDate
http://dbpedia.org/resource/Karel_Kosík | 1926-06-26 | http://dbpedia.org/resource/Růžena_Grebeníčková | 1925-11-01
http://dbpedia.org/resource/Keiran_Lee | 1984-01-15 | http://dbpedia.org/resource/Puma_Swede | 1976-09-13
http://dbpedia.org/resource/Keiran_Lee | 1984-01-15 | http://dbpedia.org/resource/Kirsten_Price_(actress) | 1981-11-13
What am I doing wrong?
Should I switch to the FOAF ontology? Do I have to explicitly parse the date with ^^xsd:date?
Note: I run my own Docker image of DBpedia; the production version of DBpedia finds about 25 records. That still seems too few, I think.
I have the following query on this endpoint, https://data.bnf.fr/sparql/:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdagroup2elements: <http://rdvocab.info/ElementsGr2/>
PREFIX bio: <http://vocab.org/bio/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT distinct ?name ?nationality
WHERE {
?oeuvre dcterms:creator ?author.
?author foaf:name ?name.
?author rdagroup2elements:countryAssociatedWithThePerson ?nationality.
}
ORDER BY DESC (?mort) LIMIT 100
This returns a list of authors with a nationality field that is itself a link to another RDF resource:
So, for the first author, Jean Martin, I get this link to another RDF resource, here for the country France: http://id.loc.gov/vocabulary/countries/fr
How could I modify the query to receive the country code (or the country name, if that is not possible) instead of this link, in this case FR (or France)?
An alternative to extracting the country code from the URI, using "the Linked Data way":
The default graph of the endpoint https://data.bnf.fr/sparql/ doesn’t provide any data about the entities under the namespace http://id.loc.gov/vocabulary/countries/, but it provides entities under the namespace http://data.bnf.fr/vocabulary/countrycodes/, which have an owl:sameAs link to them.
For example:
<http://data.bnf.fr/vocabulary/countrycodes/fr> <http://www.w3.org/2002/07/owl#sameAs> <http://id.loc.gov/vocabulary/countries/fr> .
And these http://data.bnf.fr/vocabulary/countrycodes/ entities refer to the country code with skos:notation, and to the country name with skos:prefLabel (language-tagged).
For these cases, getting the country code would be possible with this property path:
?author rdagroup2elements:countryAssociatedWithThePerson/^owl:sameAs/skos:notation ?countryCode .
Unfortunately, only some rdagroup2elements:countryAssociatedWithThePerson values are under the http://id.loc.gov/vocabulary/countries/ namespace, while other values are under the http://data.bnf.fr/vocabulary/countrycodes/ namespace directly.
To find both cases, you could use UNION:
{ ?author rdagroup2elements:countryAssociatedWithThePerson/^owl:sameAs/skos:notation ?countryCode . }
UNION
{ ?author rdagroup2elements:countryAssociatedWithThePerson/skos:notation ?countryCode . }
The full query:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdagroup2elements: <http://rdvocab.info/ElementsGr2/>
PREFIX bio: <http://vocab.org/bio/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?name ?countryCode
WHERE {
?oeuvre dcterms:creator ?author.
?author foaf:name ?name.
{ ?author rdagroup2elements:countryAssociatedWithThePerson/^owl:sameAs/skos:notation ?countryCode . }
UNION
{ ?author rdagroup2elements:countryAssociatedWithThePerson/skos:notation ?countryCode . }
}
LIMIT 100
(In case you didn’t intend it: Your query treats different persons with the same name and country as one entry. To prevent this, you could output the person’s URI.)
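As a rough sketch of that last suggestion, the person's URI can be added to the projection so that two different authors with the same name and country stay on separate rows. This only adds ?author to the SELECT of the full query; I have not re-run it against the endpoint, so treat it as an illustration rather than a tested result.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdagroup2elements: <http://rdvocab.info/ElementsGr2/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# ?author is projected so that DISTINCT works per person, not per name
SELECT DISTINCT ?author ?name ?countryCode
WHERE {
  ?oeuvre dcterms:creator ?author .
  ?author foaf:name ?name .
  # Case 1: the value is an id.loc.gov URI, reached via the inverse owl:sameAs link
  { ?author rdagroup2elements:countryAssociatedWithThePerson/^owl:sameAs/skos:notation ?countryCode . }
  UNION
  # Case 2: the value is already a data.bnf.fr country-code entity
  { ?author rdagroup2elements:countryAssociatedWithThePerson/skos:notation ?countryCode . }
}
LIMIT 100
```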
You can use the SPARQL REPLACE function:
SELECT ... (REPLACE(STR(?nationality), "http://id.loc.gov/vocabulary/countries/", "") AS ?nationalityShort)
WHERE ...
to extract the code from the URL.
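A fuller sketch of the REPLACE approach could look like the query below. It assumes the country code is always the last path segment of the URI (an assumption based on the example http://id.loc.gov/vocabulary/countries/fr, not something I have checked for every value). The pattern "^.*/" strips everything up to the last slash, and UCASE turns fr into FR.

```sparql
PREFIX rdagroup2elements: <http://rdvocab.info/ElementsGr2/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT ?name
       # Assumption: the code is the final path segment of the country URI
       (UCASE(REPLACE(STR(?nationality), "^.*/", "")) AS ?nationalityShort)
WHERE {
  ?oeuvre dcterms:creator ?author .
  ?author foaf:name ?name .
  ?author rdagroup2elements:countryAssociatedWithThePerson ?nationality .
}
LIMIT 100
```

Because the pattern does not name a specific namespace, it should behave the same for any country URI that ends in the code, whichever vocabulary it comes from.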
But you can also follow the link to this resource and retrieve additional fields from it, such as skos:notation in your case:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdagroup2elements: <http://rdvocab.info/ElementsGr2/>
PREFIX bio: <http://vocab.org/bio/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT distinct ?name ?nationalityCode
WHERE {
?oeuvre dcterms:creator ?author.
?author foaf:name ?name.
?author rdagroup2elements:countryAssociatedWithThePerson ?nationality.
?nationality skos:notation ?nationalityCode
}
ORDER BY DESC (?mort) LIMIT 100
I want to query all persons who lived through the entire 20th century in DBpedia.
I'm using https://dbpedia.org/sparql/ to run my query, and I have limited the output to 20 rows.
The query I've tried is the following:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?personName ?birthDate ?deathDate
WHERE {
FILTER (?birthDate < "1900-01-01"^^xsd:date AND ?deathDate > "1999-12-31"^^xsd:date).
?p rdf:type dbo:Person.
?p dbp:name ?personName.
?p dbp:birthDate ?birthDate.
?p dbp:deathDate ?deathDate.
}
LIMIT 20.
In the output, all persons died after 1999-12-31, but they weren't born before 1900-01-01.
Why is my query wrong? How can I fix it?
Thanks in advance for your time.
This issue is due to integer values being included in the query results, and can be resolved using the DATATYPE() function in the same or a new FILTER() section.
For example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?personName ?birthDate ?deathDate
WHERE
{
?person rdf:type dbo:Person;
dbp:name ?personName;
dbp:birthDate ?birthDate;
dbp:deathDate ?deathDate.
#This FILTER ensures that only xsd:date values are returned
FILTER(DATATYPE(?birthDate) = xsd:date && DATATYPE(?deathDate) = xsd:date).
FILTER (?birthDate < "1900-01-01"^^xsd:date && ?deathDate > "1999-12-31"^^xsd:date).
}
LIMIT 20
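To see which datatypes are actually mixed into dbp:birthDate, a diagnostic query along these lines may help. This is only a sketch: the variable names ?type and ?n are made up for illustration, and on the public endpoint the aggregation may hit a timeout or be cut off.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

# Count how often each literal datatype occurs among dbp:birthDate values
SELECT ?type (COUNT(*) AS ?n)
WHERE {
  ?person rdf:type dbo:Person ;
          dbp:birthDate ?birthDate .
  # DATATYPE() raises an error for non-literals, which leaves ?type unbound
  BIND(DATATYPE(?birthDate) AS ?type)
}
GROUP BY ?type
ORDER BY DESC(?n)
```

If rows other than xsd:date show up (for example xsd:integer), those are the values the DATATYPE() filter removes before the date comparison.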
I am trying to retrieve some data about movies from DBpedia. This is my query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX onto: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT *
{{SELECT *
WHERE {
SERVICE<http://dbpedia.org/sparql>{
?movie dcterms:subject <http://dbpedia.org/resource/Category:American_films> ;
a onto:Film ;
rdfs:label ?title ;
dbpedia2:gross ?revenue .
?movie onto:starring ?actorUri .
?actorUri rdfs:label ?actor .
OPTIONAL {
?movie onto:imdbId ?imdbId .
}
BIND(xsd:integer(?revenue) as ?intRevenue) .
FILTER ((datatype(?revenue) = 'http://dbpedia.org/datatype/usDollar') && (LANGMATCHES(LANG(?title), 'en')) && (LANGMATCHES(LANG(?actor), 'en'))) .
}
}
}}
ORDER BY DESC (?intRevenue)
LIMIT 40000
OFFSET 0
Running this query on http://dbpedia.org/snorql/ (without the SERVICE keyword) returns the correct result. However, running it from a third-party triplestore doesn't yield the same ordered results (e.g., Hobbit and Lord of the Rings are missing).
What do I need to change in the query to get identical results?
The best way to overcome this specific limitation is to have your own DBpedia mirror, on which you can set your own limits (including none), and which you can then use as either your primary or remote data store and/or query engine.
(ObDisclaimer: OpenLink Software provides the public DBpedia SPARQL endpoint, produces Virtuoso and the DBpedia Mirror AMI, and employs me.)
Some Wikipedia articles have a precise timestamp in the infobox, like this one:
https://en.wikipedia.org/wiki/Apollo_11
(Launch date: July 16, 1969, 13:32:00 UTC)
or:
https://en.wikipedia.org/wiki/Remembrance_Day_bombing
(Date: 8 November 1987 10:43 (GMT))
Is there a way to get a list of all articles like this? It seems like it should be possible with SPARQL.
AFAIK it is possible, but you need to know which wiki property is linked to the date (or date-time) field of the infobox. Let me explain with an example:
PREFIX : <http://dbpedia.org/resource/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX time-of-spacecraft-launch: <http://www.wikidata.org/entity/P619c>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity_label ?property_label ?time_of_spacecraft_launch WHERE {
:Apollo_11 owl:sameAs ?wikidata_entity .
?wikidata_entity time-of-spacecraft-launch: ?time_of_spacecraft_launch .
?wikidata_entity rdfs:label ?entity_label .
?wke_prop ?property_rel time-of-spacecraft-launch:.
?wke_prop rdfs:label ?property_label .
FILTER (LANG(?property_label)='en' && LANG(?entity_label)='it')
}
Now we can get all the articles with the same kind of information simply by removing the WHERE condition on Apollo_11:
PREFIX : <http://dbpedia.org/resource/>
PREFIX time-of-spacecraft-launch: <http://www.wikidata.org/entity/P619c>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity_label ?property_label ?time_of_spacecraft_launch WHERE {
?wikidata_entity time-of-spacecraft-launch: ?time_of_spacecraft_launch .
?wikidata_entity rdfs:label ?entity_label .
?wke_prop ?property_rel time-of-spacecraft-launch:.
?wke_prop rdfs:label ?property_label .
FILTER (LANG(?property_label)='en' && LANG(?entity_label)='it')
}
In some cases it may be useful to simplify the query:
PREFIX : <http://dbpedia.org/resource/>
PREFIX time-of-spacecraft-launch: <http://www.wikidata.org/entity/P619c>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?wikidata_entity time-of-spacecraft-launch: ?time_of_spacecraft_launch .
?wikidata_entity rdfs:label ?entity_label .
FILTER (LANG(?entity_label)='en')
}
ORDER BY DESC(?time_of_spacecraft_launch)
I am writing a SPARQL query to get all persons available in DBpedia. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
It returns around 10000 rows when I take the output in HTML/CSV/spreadsheet format.
But when I run a query to get the total count:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT COUNT(*)
WHERE{
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
it returns 1783404.
Can anyone suggest a solution to get all rows of Person available in DBpedia?
DBpedia is being smart here: to avoid overloading its servers with large queries, it caps matches at 10000. Since you are ordering the results, you can use LIMIT and OFFSET to get the results in sets of 10000. For example, to get the second set of 10000 results, use this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT ?resource ?name
WHERE {
?resource rdf:type dbo:Person;
dbp:name ?name.
FILTER (lang(?name) = 'en')
}
ORDER BY ASC(?name)
LIMIT 10000 OFFSET 10000
Actually, since DBpedia is limiting the results to 10000 matches, the LIMIT isn't really necessary.
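One caveat when paging like this: ORDER BY ?name alone does not give a total order when several persons share a name, so rows with equal names may come back in a different order between requests. A possible sketch adds ?resource as a secondary sort key (my addition, not part of the original query) to keep the pages stable. The OFFSET shown here would fetch the third set of 10000.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

SELECT ?resource ?name
WHERE {
  ?resource rdf:type dbo:Person ;
            dbp:name ?name .
  FILTER (lang(?name) = 'en')
}
# ?resource breaks ties between persons with identical names
ORDER BY ASC(?name) ASC(?resource)
LIMIT 10000 OFFSET 20000
```

To walk the whole result set, the same query would be repeated with OFFSET 0, 10000, 20000, and so on until a page comes back with fewer than 10000 rows.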