DBPedia skipping some articles

I have a SPARQL query:
SELECT DISTINCT ?film_title ?title ?year
WHERE {
  ?film_title rdf:type <http://dbpedia.org/ontology/Film> .
  ?film_title rdfs:label ?title .
  ?film_title <http://dbpedia.org/ontology/releaseDate> ?year .
  FILTER (LANG(?title)='en')
} ORDER BY DESC(?year) LIMIT 100 OFFSET 0
I order the results by release date, descending, but some movies are missing, for example https://en.wikipedia.org/wiki/The_Lady_in_the_Car_with_Glasses_and_a_Gun_(2015_film). It was released on 5 August 2015, yet it is not in the list.
Am I doing something wrong here?

DBpedia does not appear to have an entry for that film. DBpedia is extracted from Wikipedia periodically, so if the article was created recently, it may not be in the latest DBpedia release.
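One way to confirm this is to ask the endpoint directly whether it knows the resource at all. The sketch below assumes DBpedia's usual convention of mirroring the Wikipedia article title in the resource IRI; the IRI is an assumption derived from the Wikipedia URL in the question, not something taken from DBpedia.

```sparql
# Returns true only if DBpedia holds at least one triple about the film.
# The resource IRI is assumed from the Wikipedia article title.
ASK {
  <http://dbpedia.org/resource/The_Lady_in_the_Car_with_Glasses_and_a_Gun_(2015_film)> ?p ?o .
}
```

If this returns false, no variation of the original query can find the film. If it returns true, the next thing to check is whether the resource has both the type http://dbpedia.org/ontology/Film and a http://dbpedia.org/ontology/releaseDate value, since the original query requires both.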

Related

How to get the number of languages a Wikipedia page is available in? (SPARQL query)

I'm trying to get a list of Italian books from 1980 on, together with the number of Wikipedia language editions that have a page for each book. For instance, I would like to have:
Book, number
The Name of the Rose, 5
Here 5 is the number of languages in which The Name of the Rose has a Wikipedia page: for instance, there is an English page, a Dutch one, a Spanish one, a French one, and a Greek one.
Here is the query so far:
SELECT ?item ?label
WHERE
{
  VALUES ?type {wd:Q571 wd:Q7725634} # book or literary work
  ?item wdt:P31 ?type .
  ?item wdt:P577 ?date FILTER (?date > "1980-01-01T00:00:00Z"^^xsd:dateTime) . # from 1980 on
  ?item rdfs:label ?label filter (lang(?label) = "it")
  ?item wdt:P495 wd:Q38 .
}
I get the list of books, but I can't find the right property to look for.

Did you actually get The Name of the Rose in your example? Because its date of publication is set to 1980 (which is = 1980-01-01 for our purposes), I had to change your query to a >= comparison.
Then, using COUNT() and GROUP BY as mentioned in the comment gets you what you want. But if you really just need the number of sitelinks, there is a shortcut that may be useful. It was added because that number is often used as a proxy for an item's popularity, and using the precomputed number is vastly more efficient than getting all links, grouping, and counting.
SELECT ?book ?bookLabel ?sitelinks ?date WHERE {
  VALUES ?type { wd:Q571 wd:Q47461344 wd:Q7725634 }
  ?book wdt:P31 ?type;
        wdt:P577 ?date;
        wdt:P495 wd:Q38;
        wikibase:sitelinks ?sitelinks.
  FILTER((?date >= "1980-01-01T00:00:00Z"^^xsd:dateTime) && (?date < "1981-01-01T00:00:00Z"^^xsd:dateTime))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }
}
Note that the values may slightly differ from the version with group & count because here sites such as commons or wikiquote are also included.
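For comparison, the COUNT() and GROUP BY variant could look roughly like the sketch below, restricted to Wikipedia editions. It assumes the Wikidata Query Service conventions that sitelinks are modelled with schema:about and schema:isPartOf, and that each site carries a wikibase:wikiGroup value; the variable names ?article, ?site, and ?wikipedias are chosen here for illustration.

```sparql
# Count only Wikipedia articles per book, ignoring Commons, Wikiquote, etc.
SELECT ?book ?bookLabel (COUNT(DISTINCT ?article) AS ?wikipedias) WHERE {
  VALUES ?type { wd:Q571 wd:Q47461344 wd:Q7725634 }
  ?book wdt:P31 ?type;
        wdt:P577 ?date;
        wdt:P495 wd:Q38.
  FILTER((?date >= "1980-01-01T00:00:00Z"^^xsd:dateTime) && (?date < "1981-01-01T00:00:00Z"^^xsd:dateTime))
  # One ?article per sitelink; keep those whose site belongs to the Wikipedia group.
  ?article schema:about ?book;
           schema:isPartOf ?site.
  ?site wikibase:wikiGroup "wikipedia".
  SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }
}
GROUP BY ?book ?bookLabel
ORDER BY DESC(?wikipedias)
```

COUNT(DISTINCT ...) is used because a book with several types or publication dates would otherwise have each article counted more than once. Widening the date window makes this version noticeably more expensive than reading wikibase:sitelinks.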

SPARQL query returning much more rows than expected

I'm using the following SPARQL query to get a list of all US presidents together with the start and end dates of their presidency:
SELECT ?person $personLabel $start $end
WHERE {
  ?person wdt:P39 wd:Q11696.
  ?person p:P39 ?statement.
  ?position_held_statement ps:P39 wd:Q11696.
  ?statement pq:P580 ?start.
  ?statement pq:P582 ?end
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC($start)
Why does it return so many rows?
EDIT: I know I can use SELECT DISTINCT to get only distinct results, but I want to learn how I can find out the reason for the duplicates. Moreover, there are entries which state that Barack Obama's presidency lasted from January 2009 to January 2017, while others state his two terms separately.

You have ?position_held_statement there. That should be ?statement.
When you unexpectedly have a lot of results in SPARQL, it’s usually because of mismatched variable names turning what should be a join into a cartesian product.
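With the variable name unified, every pattern refers to the same statement node, so the position, start date, and end date are all read from one "position held" statement. A minimal sketch of the corrected query, which also drops the wdt:P39 triple because the statement patterns already imply it:

```sparql
SELECT ?person ?personLabel ?start ?end
WHERE {
  # ?statement is the single "position held" statement node for this row.
  ?person p:P39 ?statement.
  ?statement ps:P39 wd:Q11696;   # the position is President of the United States
             pq:P580 ?start;     # start time qualifier
             pq:P582 ?end.       # end time qualifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?start)
```

Because pq:P582 is required, a statement without an end date (such as a term still in progress) is not returned; wrapping that pattern in OPTIONAL would include it.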

Retrieve the US release date for a movie from Wikidata using Sparql

I am trying to retrieve the titles and release dates (publication date) for movies using the wikidata.org SPARQL endpoint (https://query.wikidata.org/). The titles are listed in different languages, which are filtered in the query below. However, some movies also have several publication dates (e.g. for different countries), e.g. https://www.wikidata.org/wiki/Q217020. I'm not sure how the RDF triple structure is actually used to assign a country to the value of another triple, but specifically, how can I retrieve only the US publication date for a movie?
SELECT ?item ?title ?publicationdate
WHERE {
  ?item wdt:P31 wd:Q11424 ;
        rdfs:label ?title ;
        wdt:P577 ?publicationdate ;
  filter ( lang(?title) = "en" )
}
ORDER BY ?movieid
LIMIT 10
Solution
The solution provided by M.Sarmini works. Apparently, facts such as publication dates are stored as n-ary relations: each statement gets its own node that links the resources. The value that wdt:P577 points to is just the date, whereas p:P577 points to the statement node, which you can connect to qualifiers such as the place of publication.

Just add a new variable to hold the publication statement and keep only the statements whose place of publication (P291) is the United States (Q30), like this:
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX s: <http://www.wikidata.org/prop/statement/>
SELECT distinct ?item ?title ?publicationdate
WHERE {
  ?item wdt:P31 wd:Q11424;
        rdfs:label ?title;
        p:P577 ?placeofpublication.
  ?placeofpublication q:P291 wd:Q30.
  ?placeofpublication s:P577 ?publicationdate.
  filter ( lang(?title) = "en")
}
ORDER BY ?item

How to extract persons on a wikipedia list using dbpedia/sparql

I'm trying to extract all persons who won a (gold) medal at the Olympics, and ideally their birth location, using the DBpedia SPARQL endpoint. Basically it's this list I'm aiming at: https://de.wikipedia.org/wiki/Liste_der_olympischen_Medaillengewinner_aus_Spanien
I guess it must somehow work with this piece of code:
yago-res:wikicategory_Olympic_bronze_medalists_for_Spain
This doesn't work:
SELECT ?res
WHERE {
?res yago-res:wikicategory_Olympic_bronze_medalists_for_Spain .
}
Any ideas?

To get all the Spanish people who have won a gold medal at the Olympics:
select ?person where
{
?person a <http://dbpedia.org/class/yago/OlympicGoldMedalistsForSpain>
}
If you look at what dbpedia has, there is no class:
http://dbpedia.org/class/yago/OlympicGoldMedalists
but there is
http://dbpedia.org/class/yago/OlympicGoldMedalistsForItaly
and
http://dbpedia.org/class/yago/OlympicGoldMedalistsForFrance
and
http://dbpedia.org/class/yago/OlympicGoldMedalistsForGermany
so a workaround could be:
select distinct ?person ?birthPlace where
{
  ?goldForCountry rdfs:subClassOf yago:Medalist110305062 .
  ?person a ?goldForCountry .
  optional {
    ?person dbo:birthPlace ?birthPlace
  }
  filter (contains(str(?goldForCountry), "http://dbpedia.org/class/yago/OlympicGoldMedalistsFor"))
}
The birthPlace pattern should be optional because DBpedia has no birth place for 3994 of these persons.

About UNION and FILTER NOT EXISTS in SPARQL (OpenRDF 2.8.0)

I learnt some semantic technologies, including RDF and SPARQL, a few years ago, then I didn't have the chance to work with them for some time. Now I've started a new project which uses OpenRDF 2.8.0 as a semantic store and I'm refreshing my knowledge, although there are things I have forgotten and need to relearn.
In particular, in the past few days I had some trouble correctly understanding the FILTER NOT EXISTS construct in SPARQL.
Problem: I have a semantic store imported from DbTune.org (music ontologies). A mo:MusicArtist, intended as foaf:maker of a mo:Track, can be present in four scenarios (I'm only listing relevant statements):
<http://dbtune.org/musicbrainz/resource/artist/013c8e5b-d72a-4cd3-8dee-6c64d6125823> a mo:MusicArtist ;
    vocab:artist_type "1"^^xs:short ;
    rdfs:label "Edvard Grieg" .
<http://dbtune.org/musicbrainz/resource/artist/032df978-9130-490e-8857-0c9ef231fae8> a mo:MusicArtist ;
    vocab:artist_type "2"^^xs:short ;
    rel:collaboratesWith <http://dbtune.org/musicbrainz/resource/artist/3db5dfb1-1b91-4038-8268-ae04d15b6a3e> , <http://dbtune.org/musicbrainz/resource/artist/d78afc01-f918-440c-89fc-9d546a3ba4ac> ;
    rdfs:label "Doris Day & Howard Keel".
<http://dbtune.org/musicbrainz/resource/artist/1645f335-2367-427d-8e2d-ad206946a8eb> a mo:MusicArtist ;
    vocab:artist_type "2"^^xs:short ;
    rdfs:label "Pat Metheny & Anna Maria Jopek".
<http://dbtune.org/musicbrainz/resource/artist/12822d4f-4607-4f1d-ab16-d6bacc27cafe> a mo:MusicArtist ;
    rdfs:label "René Marie".
From what I understand, vocab:artist_type is 1 for single artists (example #1) and 2 for groups or collaborations (examples #2 and #3). In the latter case, there might be a few rel:collaboratesWith statements that point to the descriptions of the individual members of the group or collaboration (example #2). In some cases, the vocab:artist_type statement is missing (example #4).
Now I want to extract all the artists as single entities, where possible. That is, I don't want to retrieve example #2, because I will get "Doris Day" and "Howard Keel" separately. I have to retrieve example #3, "Pat Metheny & Anna Maria Jopek", because there is nothing better I can do. Of course, I also want to retrieve "René Marie".
I've solved the problem in a satisfactory way with this SPARQL:
SELECT *
WHERE
{
  ?artist a mo:MusicArtist.
  ?artist rdfs:label ?label.
  MINUS
  {
    ?artist vocab:artist_type "2"^^xs:short.
    ?artist rel:collaboratesWith ?any1 .
  }
}
ORDER BY ?label
It makes sense and it looks like it's readable ("retrieve all mo:MusicArtist items minus those that are collaborations with individual members listed").
I didn't find the solution immediately. I first thought of putting together the three separate cases, with UNION:
SELECT *
WHERE
{
  ?artist a mo:MusicArtist.
  ?artist rdfs:label ?label.
  # Single artists
  {
    ?artist vocab:artist_type "1"^^xs:short.
  }
  UNION
  # Groups for which there is no defined collaboration with single persons
  {
    ?artist vocab:artist_type "2"^^xs:short.
    FILTER NOT EXISTS
    {
      ?artist rel:collaboratesWith ?any1
    }
  }
  UNION
  # Some artists don't have this attribute
  {
    FILTER NOT EXISTS
    {
      ?artist vocab:artist_type ?any2
    }
  }
}
ORDER BY ?label
I found that the third UNION branch, the one that should add mo:MusicArtist items without a vocab:artist_type, didn't work: it didn't return items such as "René Marie".
While I'm satisfied with the shorter MINUS solution, it bothers me that I don't understand why the earlier attempt didn't work. Clearly I'm missing something about FILTER NOT EXISTS that could be useful in other cases.
Any help is welcome.

When I run the following query, I get the results that it sounds like you're looking for:
select distinct ?label where {
  ?artist a mo:MusicArtist ;
          rdfs:label ?label .
  #-- artists with type 1
  {
    ?artist vocab:artist_type "1"^^xs:short
  }
  #-- artists with no type
  union {
    filter not exists {
      ?artist vocab:artist_type ?type
    }
  }
  #-- artists with type 2 that have no
  #-- collaborators
  union {
    ?artist vocab:artist_type "2"^^xs:short
    filter not exists {
      ?artist rel:collaboratesWith ?another
    }
  }
}
------------------------------------
| label                            |
====================================
| "René Marie"                     |
| "Pat Metheny & Anna Maria Jopek" |
| "Edvard Grieg"                   |
------------------------------------
I'm not sure I see where this essentially differs from yours, though. I do think you could clean this query up a bit. You can use optional and values to specify that the type is optional but, if present, must be 1 or 2. Then you can add a filter that requires that, when the value is 2, there is no collaborator.
select ?label where {
  #-- get an artist and their label
  ?artist a mo:MusicArtist ;
          rdfs:label ?label .
  #-- and optionally their type, if it is
  #-- "1"^^xs:short or "2"^^xs:short
  optional {
    values ?type { "1"^^xs:short "2"^^xs:short }
    ?artist vocab:artist_type ?type
  }
  #-- if ?type is "2"^^xs:short, then ?artist
  #-- must not collaborate with anyone.
  filter ( !sameTerm(?type,"2"^^xs:short)
        || not exists { ?artist rel:collaboratesWith ?anyone })
}
------------------------------------
| label                            |
====================================
| "René Marie"                     |
| "Pat Metheny & Anna Maria Jopek" |
| "Edvard Grieg"                   |
------------------------------------
