Extracting data from Wikipedia soccer player infoboxes - SPARQL

I want to retrieve information from soccer player Wikipedia infoboxes with the following properties (name, team, team number, appearances, goals) using the URIs returned by this Wikidata query:
SELECT ?SoccerPlayer ?SoccerPlayerLabel ?Team ?TeamLabel ?TeamNumber ?numMatches ?numGoals ?startTime ?article WHERE {
  ?SoccerPlayer wdt:P106 wd:Q937857;
                p:P54 ?stmt .
  ?stmt ps:P54 ?Team;
        pq:P1350 ?numMatches;
        pq:P1351 ?numGoals;
        pq:P580 ?startTime .
  OPTIONAL { ?stmt pq:P1618 ?TeamNumber }
  FILTER NOT EXISTS {
    ?SoccerPlayer p:P54/pq:P580 ?startTimeOther
    FILTER(?startTimeOther > ?startTime)
  }
  FILTER(?startTime >= "2018-01-01T00:00:00Z"^^xsd:dateTime)
  OPTIONAL {
    ?article schema:about ?SoccerPlayer .
    ?article schema:isPartOf <https://en.wikipedia.org/> .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 200

Wikipedia infoboxes are plain text, which cannot be queried.
Instead, use either DBpedia or Wikidata.
DBpedia is likely more complete than Wikidata when it comes to data stored in English Wikipedia infoboxes, but it cannot provide much information beyond that.
In contrast, Wikidata aggregates data from various sources and can provide information about entities that do not have a Wikipedia article.
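As a rough sketch of the Wikidata route, the bindings returned by the query in the question could be flattened into infobox-like rows. The function name `rows_from_bindings`, the `FIELDS` mapping, and the sample record are illustrative assumptions rather than part of any library; the input is assumed to follow the standard SPARQL JSON results layout.

```python
# Maps the desired infobox-style fields to the variable names used in the question's query.
FIELDS = {
    "name": "SoccerPlayerLabel",
    "team": "TeamLabel",
    "team_number": "TeamNumber",
    "appearances": "numMatches",
    "goals": "numGoals",
}


def rows_from_bindings(results: dict) -> list:
    """Flatten SPARQL JSON results into one dict per player/team statement."""
    rows = []
    for binding in results["results"]["bindings"]:
        # OPTIONAL variables (such as ?TeamNumber) are absent when unbound,
        # so they end up as None in the row.
        rows.append(
            {field: binding.get(var, {}).get("value") for field, var in FIELDS.items()}
        )
    return rows


# Made-up placeholder record, only to show the expected input shape.
sample = {
    "results": {
        "bindings": [
            {
                "SoccerPlayerLabel": {"type": "literal", "value": "Example Player"},
                "TeamLabel": {"type": "literal", "value": "Example FC"},
                "numMatches": {"type": "literal", "value": "12"},
                "numGoals": {"type": "literal", "value": "3"},
            }
        ]
    }
}

print(rows_from_bindings(sample))
```

All values arrive as strings in the JSON results, so numeric fields would still need converting if you want to compute with them.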

Related

Wikidata Query: Find American authors of children’s fiction

I want to find all children's fiction writers using a Wikidata SPARQL query, but I couldn't figure out how. Can someone help, please? The following is my approach, but I don't think it is the correct way.
SELECT ?item ?itemLabel {
?item wdt:P31 wd:Q5. #find humans
?item wdt:P106 wd: #humans whose occupation is a novelist
[another condition needed] #children's fiction.
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
} LIMIT 10
There is not one correct way, especially not in Wikidata where not all items of the same kind necessarily have the same properties.
One way would be to find the authors of works that are intended for (P2360) children:
# it’s a literary work (incl. any subclasses)
?book wdt:P31/wdt:P279* wd:Q7725634 .
# the literary work is intended for children
?book wdt:P2360 wd:Q7569 .
# the literary work has an author
?book wdt:P50 ?author .
# the author is a US citizen
?author wdt:P27 wd:Q30 .
Instead of getting all works that belong to the class "literary work" or any of its subclasses, you could decide to use only the class "fiction literature" (Q38072107) instead; with the risk that not all relevant works use this class.
Another way would be to find all authors that have "children’s writer" (Q4853732), or any of its subclasses, as occupation:
?author wdt:P106/wdt:P279* wd:Q4853732 .
?author wdt:P27 wd:Q30 .
As the different ways might find different results, you could use them in the same query, using UNION:
SELECT DISTINCT ?author ?authorLabel
WHERE {
{
# way 1
}
UNION
{
# way 2
}
UNION
{
# way 3
}
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
}
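To make the UNION template concrete, here is a small sketch that assembles the two approaches from this answer into one query string. The helper `union_query` and the constant names are hypothetical; the triple patterns are copied from the answer above.

```python
import textwrap

# Way 1: authors of literary works intended for children.
WAY_1 = """
?book wdt:P31/wdt:P279* wd:Q7725634 .
?book wdt:P2360 wd:Q7569 .
?book wdt:P50 ?author .
?author wdt:P27 wd:Q30 .
"""

# Way 2: authors whose occupation is "children's writer" or a subclass of it.
WAY_2 = """
?author wdt:P106/wdt:P279* wd:Q4853732 .
?author wdt:P27 wd:Q30 .
"""


def union_query(*ways: str) -> str:
    """Wrap each group of triple patterns in braces and join them with UNION."""
    if not ways:
        raise ValueError("at least one pattern group is required")
    branches = "\n  UNION\n".join(
        "  {\n" + textwrap.indent(way.strip(), "    ") + "\n  }" for way in ways
    )
    return (
        "SELECT DISTINCT ?author ?authorLabel\n"
        "WHERE {\n"
        f"{branches}\n"
        "  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en'. }\n"
        "}"
    )


print(union_query(WAY_1, WAY_2))
```

Because every branch binds `?author`, `SELECT DISTINCT` removes authors found by more than one approach.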

Query Wikidata for Wikipedia URLs in multiple languages

I'm trying to use Wikidata as an intermediary to get from a unique identifier listed in Wikidata (for example VIAF ID) to a Wikipedia description.
I've managed to piece together this query to get the Wikipedia page ID from a given VIAF ID ("153672966" below is the VIAF ID for "Southern Illinois University Press"):
SELECT ?pageid WHERE {
?item wdt:P214 "153672966".
[ schema:about ?item ; schema:name ?name ;
schema:isPartOf <https://en.wikipedia.org/> ]
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam mwapi:generator "allpages" .
bd:serviceParam mwapi:gapfrom ?name .
bd:serviceParam mwapi:gapto ?name .
?pageid wikibase:apiOutput "#pageid" .
}
}
This results in the pageid 9393762, which I am able to look up in the Wikipedia API to get the introduction text I need using this request:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&pageids=9393762
The resulting response includes an unparsed description (explaintext) taken from the first section of the Wikipedia article, so this gets me where I need to be, provided the language is English.
Now the problem is that I need to use this on an internationalized site where I might not even know up front which languages might be used in the future. The query against Wikidata is supposed to run as a batch job on the backend, while fetching the actual descriptions from Wikipedia will be done from the frontend and be rendered asynchronously.
Ideally I would want the Wikidata query to return a pageid for each given language where there is a Wikipedia article available. On the frontend I would then check whether the currently active language has a pageid associated and call the Wikipedia API, or render a fallback if no pageid is given.
In the future I would need to make similar queries with other library-related identifiers, such as ISNI, but I don't imagine that being much different from the current use case.
Is this a reasonable way to get the job done and how can I expand it to support multiple languages?
To get the explaintext you don't necessarily need the pageid; the page title is enough.
You can get the page titles in all languages from Wikidata with the following query:
SELECT ?item ?title ?site WHERE {
?item wdt:P214 "153672966" .
[ schema:about ?item ; schema:name ?title ;
schema:isPartOf ?site ] .
}
And afterwards you can use Wikipedia API to get the explaintext:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=Southern%20Illinois%20University%20Press
The downside of working with page titles is that they are not stable. So you will need to run your batch job regularly to check for renamings of articles.
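A minimal sketch of how the batch job might turn each `?site`/`?title` row into a request URL per language. The function names are made up for illustration. It assumes `?site` values look like `https://en.wikipedia.org/`, and that only hosts ending in `.wikipedia.org` are wanted (the query also returns sitelinks to other Wikimedia projects). The second sample row is a placeholder.

```python
from urllib.parse import urlencode, urlsplit


def extract_url(site: str, title: str) -> str:
    """Build an extracts request for one (?site, ?title) row."""
    host = urlsplit(site).netloc
    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "redirects": 1,
        "titles": title,  # urlencode takes care of spaces and non-ASCII titles
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def urls_by_subdomain(rows) -> dict:
    """Map the wiki subdomain (usually the language code) to its request URL."""
    urls = {}
    for site, title in rows:
        host = urlsplit(site).netloc
        if host.endswith(".wikipedia.org"):
            urls[host.split(".")[0]] = extract_url(site, title)
    return urls


rows = [
    ("https://en.wikipedia.org/", "Southern Illinois University Press"),
    ("https://commons.wikimedia.org/", "Category:Placeholder"),  # filtered out
]
print(urls_by_subdomain(rows))
```

The frontend could then look up the active language in the resulting mapping and fall back when the key is missing.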

Get all properties, sub-properties and label IDs from Wikidata item

I am trying to write a query that returns all possible information from a Wikidata page, such as https://www.wikidata.org/wiki/Q1299.
Ideally, I would like to retrieve all the information present on that page in English.
So I am trying this query:
SELECT ?wdLabel ?ps_Label ?wdpqLabel ?pq_Label WHERE {
VALUES ?artist {
wd:Q1299
}
?artist ?p ?statement.
?statement ?ps ?ps_.
?wd wikibase:claim ?p;
wikibase:statementProperty ?ps.
OPTIONAL {
?statement ?pq ?pq_.
?wdpq wikibase:qualifier ?pq.
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This works quite well, but I would also like to retrieve the Wikidata IDs for the values in the ps_Label column.
For example, here I have Paul McCartney as a string, and I would also like to have the Wikidata ID associated with that item, which is Q2599:
wdLabel | ps_Label | wdpqLabel | pq_Label
has part | Paul McCartney | start time | 1960-01-01T00:00:00Z
has part | Paul McCartney | end time | 1970-01-01T00:00:00Z
has part | Paul McCartney | object has role | singer
has part | Paul McCartney | object has role | instrumentalist
I want something similar to the query below, but I can't merge the two because I am missing some hint about sub-properties.
SELECT ?propUrl ?propLabel ?valUrl ?valLabel WHERE {
wd:Q1299 ?propUrl ?valUrl.
?property ?ref ?propUrl;
rdf:type wikibase:Property;
rdfs:label ?propLabel.
?valUrl rdfs:label ?valLabel.
FILTER((LANG(?valLabel)) = "en")
FILTER((LANG(?propLabel)) = "en")
}
ORDER BY (?propUrl) (?valUrl)
Thanks a lot for your help!
This isn’t exactly what you’re asking for, but if you really just want all data on a single object, it is far easier and more economical to use the data access provided by https://www.wikidata.org/wiki/Special:EntityData/Q2599.json
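Assuming the JSON returned by Special:EntityData follows the usual Wikibase layout (`entities`, `claims`, `mainsnak`, `datavalue`), a sketch like the following could list the item IDs behind each statement value, which is what the question asks for. The function name and the abridged, hand-written sample are illustrative only.

```python
def item_valued_statements(entity_data: dict, qid: str) -> list:
    """Return (property ID, value QID) pairs for statements whose value is an item."""
    pairs = []
    claims = entity_data["entities"][qid].get("claims", {})
    for prop, statements in claims.items():
        for statement in statements:
            snak = statement["mainsnak"]
            # "novalue" and "somevalue" snaks carry no datavalue.
            if snak.get("snaktype") != "value":
                continue
            datavalue = snak["datavalue"]
            if datavalue["type"] == "wikibase-entityid":
                pairs.append((prop, datavalue["value"]["id"]))
    return pairs


# Abridged, hand-written sample in the assumed layout (not real API output).
sample = {
    "entities": {
        "Q1299": {
            "claims": {
                "P527": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P527",
                            "datavalue": {
                                "type": "wikibase-entityid",
                                "value": {"entity-type": "item", "id": "Q2599"},
                            },
                        }
                    }
                ],
                "P571": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P571",
                            "datavalue": {
                                "type": "time",
                                "value": {"time": "+1960-00-00T00:00:00Z"},
                            },
                        }
                    }
                ],
            }
        }
    }
}

print(item_valued_statements(sample, "Q1299"))
```

The entity JSON only contains IDs for statement values, so English labels for those IDs would need a separate lookup.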

Easiest way to query if a name exists in Wikidata?

I'm trying to create the simplest possible query to check if a name exists in Wikidata. For example, I just want to see if the common name "Jack Smith" is in Wikidata.
Following the example query in this StackOverflow answer, I created the following SPARQL query:
SELECT distinct ?item ?itemLabel ?itemDescription WHERE{
?item ?label "Jack Smith" .
?article schema:about ?item .
?article schema:inLanguage "en" .
?article schema:isPartOf <https://en.wikipedia.org/>.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10
However, it returns zero results.
On the other hand, if I search for "Jack Smith" through the Wikidata search webpage, I get back 1200+ results with a lot of people named "Jack Smith".
The behavior with the SPARQL interface is inconsistent. When I run a SPARQL search for "Abraham Lincoln", I correctly get back an entry for the American president by that name.
My specific questions are:
Why is my SPARQL query returning zero results for "Jack Smith"?
What is the simplest API call to check if a name (e.g. Jack Smith) exists in Wikidata? It could be SPARQL or some other REST API.
Thank you.
The query returns zero results because no item matching "Jack Smith" also has an article that is in English and is part of en.wikipedia.org.
Every line in the body of a SPARQL query is a way to restrict the search as opposed to just a way to add new variables.
Look at this query:
SELECT ?person ?dateOfDeath
WHERE {
?person a :Person .
?person :date_of_death ?dateOfDeath .
}
This would only return people who have a date of death, i.e. who are dead, as well as the date.
If you want to return all people and, where available, their date of death (so that people who are still alive are returned too, just without a date), use OPTIONAL:
SELECT ?person ?dateOfDeath
WHERE {
?person a :Person .
OPTIONAL {?person :date_of_death ?dateOfDeath }
}
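As a loose analogy only (this is not how a SPARQL engine is implemented), the difference between the two queries resembles an inner join versus a left outer join. The data below is invented, and each person is assumed to have at most one date of death.

```python
# Everything matching "?person a :Person" (invented data).
people = ["alice", "bob"]
# The ":date_of_death" triples, keyed by person (invented data).
date_of_death = {"bob": "1999-01-01"}


def required_pattern(people, dates):
    """Both patterns must match: people without a date are dropped."""
    return [(person, dates[person]) for person in people if person in dates]


def optional_pattern(people, dates):
    """OPTIONAL: every person is kept; the date is None when unbound."""
    return [(person, dates.get(person)) for person in people]


print(required_pattern(people, date_of_death))
print(optional_pattern(people, date_of_death))
```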
In terms of your second question, I'd try something like this:
SELECT ?boolean
WHERE{
BIND(EXISTS{?item ?label "Jack Smith"} AS ?boolean)
}
This can of course also be issued as, say, a cURL request:
curl https://query.wikidata.org/bigdata/namespace/wdq/sparql -X POST --data 'query=SELECT%20%3Fboolean%0AWHERE%7B%0ABIND%28EXISTS%7B%3Fitem%20%3Flabel%20%22Jack%20Smith%22%7D%20AS%20%3Fboolean%29%0A%20%20%7D'
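The same request could be prepared from Python with only the standard library. This is a sketch: the helper names and the User-Agent string are placeholders, and the response is assumed to use the standard SPARQL JSON results format, requested here through the Accept header.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the cURL example above.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

QUERY = """SELECT ?boolean
WHERE {
  BIND(EXISTS { ?item ?label "Jack Smith" } AS ?boolean)
}"""


def build_request(query: str) -> Request:
    """Prepare (but do not send) a form-encoded POST carrying the query."""
    body = urlencode({"query": query}).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,  # supplying data makes urllib issue a POST
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "name-check-sketch/0.1 (placeholder contact)",
        },
    )


def parse_boolean(payload: str) -> bool:
    """Read the single ?boolean binding from a SPARQL JSON response."""
    bindings = json.loads(payload)["results"]["bindings"]
    return bool(bindings) and bindings[0]["boolean"]["value"] == "true"
```

To send it, pass the prepared request to `urllib.request.urlopen` and feed the decoded body to `parse_boolean`.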

How to get Wikidata ID for DBpedia Entities?

I have a set of DBpedia concepts and would like to get the corresponding wikidata IDs of them. For example, consider word2vec. The wikidata ID of word2vec is wd:Q22673982.
Currently, I am doing it as follows.
SELECT * {
VALUES ?searchTerm { "word2vec" "fasttext" "natural language processing" "deep learning" "support vector machine" }
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "EntitySearch".
bd:serviceParam wikibase:endpoint "www.wikidata.org".
bd:serviceParam wikibase:limit 10 .
bd:serviceParam mwapi:search ?searchTerm.
bd:serviceParam mwapi:language "en".
?item wikibase:apiOutputItem mwapi:item.
?num wikibase:apiOrdinal true.
}
?item (wdt:P279|wdt:P31) ?type
}
ORDER BY ?searchTerm ?num
However, I noted that when I do it this way, most of my terms do not get a wikidata ID.
Therefore, I would like to know:
Is every DBpedia concept associated with its relevant Wikidata ID?
How can I get the Wikidata ID associated with a DBpedia concept using SPARQL?
I am happy to provide more details if needed.
I used the following SPARQL query to solve my issue:
SELECT distinct ?wikidata_concept
WHERE {dbr:Word2vec owl:sameAs ?wikidata_concept}
LIMIT 100
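To look up several concepts at once, the query above could be generated with a VALUES clause. This is a sketch under assumptions: the helper names are hypothetical, the resource names are assumed to be valid DBpedia local names (underscores instead of spaces), and Wikidata links are assumed to appear as `http://www.wikidata.org/entity/Q…` URIs. Since `owl:sameAs` may also point to other datasets, a filter keeps only the Wikidata ones.

```python
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"


def sameas_query(resource_names) -> str:
    """Build one query for the DBpedia endpoint covering several resources."""
    values = " ".join(
        f"<http://dbpedia.org/resource/{name}>" for name in resource_names
    )
    return (
        "SELECT DISTINCT ?concept ?wikidata_concept\n"
        "WHERE {\n"
        f"  VALUES ?concept {{ {values} }}\n"
        "  ?concept owl:sameAs ?wikidata_concept .\n"
        f'  FILTER(STRSTARTS(STR(?wikidata_concept), "{WIKIDATA_PREFIX}"))\n'
        "}"
    )


def wikidata_id(uri: str):
    """Return the bare QID from a Wikidata entity URI, or None for other URIs."""
    if uri.startswith(WIKIDATA_PREFIX):
        return uri[len(WIKIDATA_PREFIX):]
    return None


print(sameas_query(["Word2vec", "Deep_learning"]))
```

Returning `?concept` alongside the match makes it easy to see which DBpedia resources have no Wikidata link at all.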