Getting all wikipedia articles with precise time - sparql

Some Wikipedia articles have a precise timestamp in the infobox, like this one:
https://en.wikipedia.org/wiki/Apollo_11
(Launch date: July 16, 1969, 13:32:00 UTC)
or:
https://en.wikipedia.org/wiki/Remembrance_Day_bombing
(Date: 8 November 1987 10:43 (GMT))
Is there a way to get a list of all articles like this? It seems like it should be possible with SPARQL.

AFAIK it is possible, but it requires knowing which Wikidata property is linked to the date (or date-time) field of the infobox. Let me explain with an example:
PREFIX : <http://dbpedia.org/resource/>
PREFIX time-of-spacecraft-launch: <http://www.wikidata.org/entity/P619c>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?entity_label ?property_label ?time_of_spacecraft_launch WHERE {
:Apollo_11 owl:sameAs ?wikidata_entity .
?wikidata_entity time-of-spacecraft-launch: ?time_of_spacecraft_launch .
?wikidata_entity rdfs:label ?entity_label .
?wke_prop ?property_rel time-of-spacecraft-launch:.
?wke_prop rdfs:label ?property_label .
FILTER (LANG(?property_label)='en' && LANG(?entity_label)='it')
}
click here to see the result
Now we can get all the articles with the same kind of information simply by removing the WHERE condition on Apollo_11:
PREFIX : <http://dbpedia.org/resource/>
PREFIX time-of-spacecraft-launch: <http://www.wikidata.org/entity/P619c>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity_label ?property_label ?time_of_spacecraft_launch WHERE {
?wikidata_entity time-of-spacecraft-launch: ?time_of_spacecraft_launch .
?wikidata_entity rdfs:label ?entity_label .
?wke_prop ?property_rel time-of-spacecraft-launch:.
?wke_prop rdfs:label ?property_label .
FILTER (LANG(?property_label)='en' && LANG(?entity_label)='it')
}
see the result here
In some cases it may be useful to simplify the query:
PREFIX : <http://dbpedia.org/resource/>
PREFIX time-of-spacecraft-launch: <http://www.wikidata.org/entity/P619c>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?wikidata_entity time-of-spacecraft-launch: ?time_of_spacecraft_launch .
?wikidata_entity rdfs:label ?entity_label .
FILTER (LANG(?entity_label)='en')
}
ORDER BY DESC(?time_of_spacecraft_launch)
see the result here
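The queries above return every launch time, including values that may only carry day precision. Below is a minimal client-side sketch for keeping only values with an actual time of day. It assumes the endpoint returns ISO 8601 timestamps and serializes date-only values as midnight, which the queries themselves do not guarantee. `has_precise_time` is a hypothetical helper name.

```python
from datetime import datetime, time


def has_precise_time(value: str) -> bool:
    """Return True if an ISO 8601 timestamp has a time of day other than midnight.

    Assumption: values with only day precision are serialized as 00:00:00.
    """
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    normalized = value.strip().replace("Z", "+00:00")
    try:
        parsed = datetime.fromisoformat(normalized)
    except ValueError:
        # Not a parseable timestamp: treat it as imprecise.
        return False
    return parsed.time() != time(0, 0, 0)


# Apollo 11 launch time quoted in the question.
print(has_precise_time("1969-07-16T13:32:00Z"))  # True
print(has_precise_time("1969-07-16T00:00:00Z"))  # False
```

Note that an event that really happened at exactly midnight would be dropped by this check, so it is a heuristic rather than a reliable precision test.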

Related

Retrieve country code from RDF source received via SPARQL query

I have the following query on this endpoint https://data.bnf.fr/sparql/ :
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdagroup2elements: <http://rdvocab.info/ElementsGr2/>
PREFIX bio: <http://vocab.org/bio/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT distinct ?name ?nationality
WHERE {
?oeuvre dcterms:creator ?author.
?author foaf:name ?name.
?author rdagroup2elements:countryAssociatedWithThePerson ?nationality.
}
ORDER BY DESC (?mort) LIMIT 100
This returns a list of authors with a nationality field that is itself another RDF source:
So, for the first author, Jean Martin, I get this link to another RDF source, in this case for the country France: http://id.loc.gov/vocabulary/countries/fr
How could I modify the query to receive the country code (or country name, if not possible) instead of this link, in this case FR (or France)?
An alternative to extracting the country code from the URI, using "the Linked Data way":
The default graph of the endpoint https://data.bnf.fr/sparql/ doesn’t provide any data about the entities under the namespace http://id.loc.gov/vocabulary/countries/, but it provides entities under the namespace http://data.bnf.fr/vocabulary/countrycodes/, which have an owl:sameAs link to them.
For example:
<http://data.bnf.fr/vocabulary/countrycodes/fr> <http://www.w3.org/2002/07/owl#sameAs> <http://id.loc.gov/vocabulary/countries/fr> .
And these http://data.bnf.fr/vocabulary/countrycodes/ entities refer to the country code with skos:notation, and to the country name with skos:prefLabel (language-tagged).
For these cases, getting the country code would be possible with this property path:
?author rdagroup2elements:countryAssociatedWithThePerson/^owl:sameAs/skos:notation ?countryCode .
Unfortunately, only some rdagroup2elements:countryAssociatedWithThePerson values are under the http://id.loc.gov/vocabulary/countries/ namespace, while other values are under the http://data.bnf.fr/vocabulary/countrycodes/ namespace directly.
To find both cases, you could use UNION:
{ ?author rdagroup2elements:countryAssociatedWithThePerson/^owl:sameAs/skos:notation ?countryCode . }
UNION
{ ?author rdagroup2elements:countryAssociatedWithThePerson/skos:notation ?countryCode . }
The full query:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdagroup2elements: <http://rdvocab.info/ElementsGr2/>
PREFIX bio: <http://vocab.org/bio/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?name ?countryCode
WHERE {
?oeuvre dcterms:creator ?author.
?author foaf:name ?name.
{ ?author rdagroup2elements:countryAssociatedWithThePerson/^owl:sameAs/skos:notation ?countryCode . }
UNION
{ ?author rdagroup2elements:countryAssociatedWithThePerson/skos:notation ?countryCode . }
}
LIMIT 100
(In case you didn’t intend it: Your query treats different persons with the same name and country as one entry. To prevent this, you could output the person’s URI.)
You can use the SPARQL REPLACE function:
SELECT ... (REPLACE(STR(?nationality), "http://id.loc.gov/vocabulary/countries/", "") AS ?nationalityShort)
WHERE ...
to extract the code from the URL.
But you can also follow the link to this resource and retrieve additional fields from it, such as skos:notation in your case:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdagroup2elements: <http://rdvocab.info/ElementsGr2/>
PREFIX bio: <http://vocab.org/bio/0.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT distinct ?name ?nationalityCode
WHERE {
?oeuvre dcterms:creator ?author.
?author foaf:name ?name.
?author rdagroup2elements:countryAssociatedWithThePerson ?nationality.
?nationality skos:notation ?nationalityCode
}
ORDER BY DESC (?mort) LIMIT 100
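If the URIs are already in hand, the REPLACE approach above amounts to stripping a fixed namespace prefix, which can also be done client-side. A minimal sketch, assuming the two namespaces mentioned in the answers are the only ones that occur; `country_code_from_uri` is a hypothetical helper name, and upper-casing the result is an assumption made to match the "FR" asked for in the question.

```python
from typing import Optional

# Namespaces mentioned in the answers above.
COUNTRY_NAMESPACES = (
    "http://id.loc.gov/vocabulary/countries/",
    "http://data.bnf.fr/vocabulary/countrycodes/",
)


def country_code_from_uri(uri: str) -> Optional[str]:
    """Strip a known country namespace from the URI and return the code."""
    for namespace in COUNTRY_NAMESPACES:
        if uri.startswith(namespace):
            # Upper-casing is an assumption, to match the "FR" asked for.
            return uri[len(namespace):].upper()
    return None  # URI is not in a namespace we know how to shorten


print(country_code_from_uri("http://id.loc.gov/vocabulary/countries/fr"))  # FR
```

Unlike the skos:notation lookup, this relies on the code being the last URI segment, so it is only a fallback when the linked data is not available.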

SPARQL query all persons who lived through the 20th century in DBPedia

I want to query all persons who lived through the entire 20th century in DBPedia.
I'm using https://dbpedia.org/sparql/ to process my query. I have limited the output to 20 results.
The query I've tried is the following:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?personName ?birthDate ?deathDate
WHERE {
FILTER (?birthDate < "1900-01-01"^^xsd:date AND ?deathDate > "1999-12-31"^^xsd:date).
?p rdf:type dbo:Person.
?p dbp:name ?personName.
?p dbp:birthDate ?birthDate.
?p dbp:deathDate ?deathDate.
}
LIMIT 20.
In the output, every person died after 1999-12-31, but not all of them were born before 1900-01-01.
Why is my query wrong? How can I fix it?
Thanks in advance for your time.
This issue is due to integer values being included in the query results, and can be resolved using the DATATYPE() function in the same or a new FILTER() section.
For example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?personName ?birthDate ?deathDate
WHERE
{
?person rdf:type dbo:Person;
dbp:name ?personName;
dbp:birthDate ?birthDate;
dbp:deathDate ?deathDate.
#This FILTER ensures that only xsd:date values are returned
FILTER(DATATYPE(?birthDate) = xsd:date && DATATYPE(?deathDate) = xsd:date).
FILTER (?birthDate < "1900-01-01"^^xsd:date && ?deathDate > "1999-12-31"^^xsd:date).
}
LIMIT 20
Live Query Definition
Live Query Result
DATATYPE() Documentation
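The same datatype check can be applied client-side when post-processing results. The following is a minimal sketch, assuming rows in the shape of the SPARQL JSON results format (each binding a dict with "type", "value" and an optional "datatype"). The sample rows are made up for illustration and are not DBpedia data; the function names are hypothetical.

```python
from datetime import date

XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def as_xsd_date(binding: dict):
    """Return a date for an xsd:date binding, or None for any other datatype."""
    if binding.get("datatype") != XSD_DATE:
        return None
    try:
        return date.fromisoformat(binding["value"])
    except ValueError:
        # Malformed or timezone-suffixed value: skip it rather than guess.
        return None


def lived_through_20th_century(row: dict) -> bool:
    """True only if both dates are real xsd:date values spanning 1900 to 1999."""
    born = as_xsd_date(row["birthDate"])
    died = as_xsd_date(row["deathDate"])
    if born is None or died is None:
        return False
    return born < date(1900, 1, 1) and died > date(1999, 12, 31)


# Made-up rows, for illustration only.
rows = [
    {"birthDate": {"type": "literal", "datatype": XSD_DATE, "value": "1898-03-04"},
     "deathDate": {"type": "literal", "datatype": XSD_DATE, "value": "2001-07-09"}},
    {"birthDate": {"type": "literal", "datatype": XSD_INTEGER, "value": "1950"},
     "deathDate": {"type": "literal", "datatype": XSD_DATE, "value": "2005-01-01"}},
]
print([lived_through_20th_century(r) for r in rows])  # [True, False]
```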

SPARQL query on dbpedia: Find women who have husbands younger than them

I am working on a fun query. In DBpedia, I want to find records of females who have married younger males. Or, alternatively, husbands who had wives older than them.
I came up with this SPARQL query. Sadly it returns too few records, only n = 3.
# I always add these for easier commandline querying
prefix dbo: <http://dbpedia.org/ontology/>
prefix dbp: <http://dbpedia.org/property/>
prefix dbr: <http://dbpedia.org/resource/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix exo: <http://example.org/ontology/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?husband ?husbandBirthDate ?wife ?wifeBirthDate
WHERE {
# there are 300000 triples that use dbp:spouse
# unfortunately very few have spouses which are also in dbpedia
# and for those partners the birthDate occurs even more rarely in dbpedia
{
?husband dbp:spouse ?wife .
?husband dbp:gender ?husbandGender.
FILTER(str(?husbandGender) = "male")
?husband dbp:birthDate ?husbandBirthDate .
?wife dbp:birthDate ?wifeBirthDate .
?wife dbp:gender ?wifeGender.
FILTER(str(?wifeGender) = "female") .
FILTER (?husbandBirthDate > ?wifeBirthDate)
}
UNION
# there are ~1400 triples that use dbo:husband which means "hasHusband"
{
?wife dbp:husband ?husband .
?husband dbo:birthDate ?husbandBirthDate .
?wife dbo:birthDate ?wifeBirthDate .
FILTER (?husbandBirthDate > ?wifeBirthDate)
}
UNION
# there are ~1300 triples that use dbo:wife which means "hasWife"
{
?husband dbp:wife ?wife .
?husband dbo:birthDate ?husbandBirthDate .
?wife dbo:birthDate ?wifeBirthDate .
FILTER (?husbandBirthDate > ?wifeBirthDate)
}
}
ORDER BY ?husbandBirthDate
Sadly it returns too few records, only these three (currently, 2020):
husband | husbandBirthDate | wife | wifeBirthDate
http://dbpedia.org/resource/Karel_Kosík | 1926-06-26 | http://dbpedia.org/resource/Růžena_Grebeníčková | 1925-11-01
http://dbpedia.org/resource/Keiran_Lee | 1984-01-15 | http://dbpedia.org/resource/Puma_Swede | 1976-09-13
http://dbpedia.org/resource/Keiran_Lee | 1984-01-15 | http://dbpedia.org/resource/Kirsten_Price_(actress) | 1981-11-13
What am I doing wrong?
Should I switch to the foaf ontology? Do I have to explicitly parse the date with ^^xsd:date?
Note: I have my own Docker image of DBpedia; in the production version of DBpedia about 25 records are found. Still too few, I think.

Results filtered out even though the data is there

Here's my query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?resource ?parentOrSpouse
WHERE {
?resource a dbo:Royalty ;
rdfs:label ?label ;
dbo:parent ?parent ;
dbo:birthDate ?bd ;
dbo:birthPlace ?bp .
?bp dbo:isPartOf :England .
FILTER(?bd < '1900-01-01'^^xsd:dateTime) .
FILTER(?bd > '1800-01-01'^^xsd:dateTime) .
FILTER(LANGMATCHES(LANG(?label), 'en')) .
{ ?resource dbo:parent ?parentOrSpouse } UNION { ?resource dbo:spouse ?parentOrSpouse }
?parentOrSpouse dbo:birthPlace ?psbp .
?psbp dbo:isPartOf :England .
}
ORDER BY(?bd)
This searches for all royals born in England between 1800 and 1900 that have a spouse or parent that is born in England.
Result
In the result list I get http://dbpedia.org/page/George_V with http://dbpedia.org/page/Mary_of_Teck listed as a spouse, but Mary_of_Teck herself is not listed with George as her spouse, even though George was clearly born in England.
Why is Mary disappearing? There are a lot of other people disappearing that should clearly be on the list when I look at the data.
So the solution to Mary not showing up is to use dbo:parent|dbo:spouse/dbo:wikiPageRedirects?, since Mary refers to George via a redirect.
The other problem was that dbo:birthPlace/dbo:location?/dbo:isPartOf dbr:England threw an error that is probably(?) related to a bug in the compiler. Using ?parentOrSpouse dbo:birthPlace|dbo:birthPlace/dbo:location/dbo:isPartOf :England . instead seems to work fine.
Credit goes to #AKSW.

Different order of results in DBpedia through SNORQL vs. SERVICE query

I am trying to retrieve some data about movies from DBpedia. This is my query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX onto: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT *
{{SELECT *
WHERE {
SERVICE<http://dbpedia.org/sparql>{
?movie dcterms:subject <http://dbpedia.org/resource/Category:American_films> ;
a onto:Film ;
rdfs:label ?title ;
dbpedia2:gross ?revenue .
?movie onto:starring ?actorUri .
?actorUri rdfs:label ?actor .
OPTIONAL {
?movie onto:imdbId ?imdbId .
}
BIND(xsd:integer(?revenue) as ?intRevenue) .
FILTER ((datatype(?revenue) = 'http://dbpedia.org/datatype/usDollar') && (LANGMATCHES(LANG(?title), 'en')) && (LANGMATCHES(LANG(?actor), 'en'))) .
}
}
}}
ORDER BY DESC (?intRevenue)
LIMIT 40000
OFFSET 0
Running this query on http://dbpedia.org/snorql/ (without the SERVICE keyword) returns the correct result. However, running it from a third-party triplestore doesn't yield the same order (e.g., The Hobbit and The Lord of the Rings are missing).
What do I need to change in the query to get identical results?
The best way to overcome this specific limitation is to have your own DBpedia mirror, on which you can set your own limits (including none), and which you can then use as either your primary or remote data store and/or query engine.
(ObDisclaimer: OpenLink Software provides the public DBpedia SPARQL endpoint, produces Virtuoso and the DBpedia Mirror AMI, and employs me.)