Wikidata query to select items with both Simple Wikipedia & English Wikipedia Pages - sparql

I am trying to get Wikidata entries that have both a Simple Wikipedia page and an English Wikipedia page, and to return both links. So far I have:
SELECT ?item ?itemLabel ?articleSimple ?article
WHERE {
?articleSimple schema:about ?item ;
schema:isPartOf <https://simple.wikipedia.org/> ;
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]" }
}
LIMIT 10
Right now, this successfully selects the Simple Wikipedia pages, but I'm not able to get the regular Wikipedia page.
I've tried with an "OPTIONAL" block, or another "schema:about", but have not been successful. Thank you for any help provided!
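One direction that might work, offered as a sketch rather than a confirmed answer: every sitelink is its own node that points at the item via schema:about, so a second article variable restricted to https://en.wikipedia.org/ should join on the same ?item. Because neither pattern is OPTIONAL, only items with both pages would remain. The helper name both_sitelinks_query and its parameters are made up for illustration; the Python wrapper only assembles the query text for pasting into the query service.

```python
def both_sitelinks_query(site_a="https://simple.wikipedia.org/",
                         site_b="https://en.wikipedia.org/",
                         limit=10):
    """Build a SPARQL query for items that have an article on both sites.

    Hypothetical helper: it only assembles query text, it does not run it.
    """
    return f"""SELECT ?item ?itemLabel ?articleSimple ?article
WHERE {{
  ?articleSimple schema:about ?item ;
                 schema:isPartOf <{site_a}> .
  ?article schema:about ?item ;
           schema:isPartOf <{site_b}> .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }}
}}
LIMIT {limit}"""


print(both_sitelinks_query())
```

The two triple patterns share ?item, which is what ties the Simple Wikipedia link and the English Wikipedia link to the same entry.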

Related

Can I use generators in WikiData's API?

I am trying to get all of the pages in a given category from Wikipedia, including the ones in subcategories. That part is no problem, but I also want certain fields from each page, such as birth date.
From this topic I gather that I need to use https://wikidata.org/w/api.php rather than, for example, https://pl.wikipedia.org/...
I assumed I should use a generator, but my trouble is that calling Wikidata gives me an error about a bad ID, which I don't get from Wikipedia.
query.params = {
"action": "query", // placeholder for test
"generator": "categorymembers",
"gcmpageid": 1810130, // sophists' category at pl.wikipedia
"format": "json"
}
https://pl.wikipedia.org/w/api.php -> data
https://en.wikipedia.org/w/api.php -> error: nosuchpage (expected)
https://www.wikidata.org/w/api.php -> error: invalidcategory (why???)
I've tried to use the ID from Wikidata prefixed with "Q", but then I got badinteger.
Alternatively, I could make requests to Wikipedia for the IDs and then to Wikidata, but that means calling twice for the same thing and passing all those IDs into the second request...
Please help
TL;DR: Using generators from Polish Wikipedia in the Wikidata API does not work, but other solutions exist.
A few things to note about Wikidata and its API:
Wikidata doesn't know anything about the category hierarchies on Polish Wikipedia (or on any other Wikipedia language version)
There is no API to query pages in all subcategories. This is mainly because the category system of MediaWiki allows cycles in the category hierarchies and infinite levels of nested categories.
pageIds are only unique within a project. So using a pageId from pl.wikipedia.org does not work on https://en.wikipedia.org/w/api.php or https://www.wikidata.org/w/api.php
There are multiple solutions to your problem:
Use the query in your question recursively to get all page titles from Kategoria:Sofiści and its subcategories.
Afterwards, use the Wikidata API to retrieve the Wikidata item for each Polish Wikipedia article: e.g. for Protagoras the query is this: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=plwiki&titles=Protagoras&props=claims&format=json
This returns a JSON document with all statements about Protagoras stored on Wikidata. The birth date is found in that document under claims->P569->mainsnak->datavalue->value->time.
Use the Wikidata Query Service. It allows you to call the MediaWiki API from SPARQL.
SELECT ?item ?itemLabel ?date_of_birth WHERE {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam wikibase:endpoint "pl.wikipedia.org" .
bd:serviceParam mwapi:gcmtitle 'Kategoria:Sofiści' .
bd:serviceParam mwapi:generator "categorymembers" .
bd:serviceParam mwapi:gcmprop "ids|title|type" .
bd:serviceParam mwapi:gcmlimit "max" .
?item wikibase:apiOutputItem mwapi:item .
}
?item wdt:P569 ?date_of_birth
SERVICE wikibase:label { bd:serviceParam wikibase:language "pl". }
}
Insert this query on https://query.wikidata.org/. That page also offers code examples showing how to access the results programmatically.
The drawback of this solution is that pages in subcategories are not included.
Fully rely on Wikidata. Use the following query in https://query.wikidata.org/:
SELECT ?item ?itemLabel ?date_of_birth WHERE {
?item wdt:P106 wd:Q3750514.
?item wdt:P569 ?date_of_birth
SERVICE wikibase:label { bd:serviceParam wikibase:language "pl,en". }
}
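The first option above (walk the categories recursively, then read P569 from wbgetentities) could be sketched roughly as follows. This is an illustration under assumptions, not a tested client: fetch_members is a hypothetical callable standing in for a categorymembers request against pl.wikipedia.org, and the category names and the time string in the sample data are made up. The seen set is there because, as noted above, category hierarchies may contain cycles.

```python
def walk_category(root, fetch_members):
    """Collect article titles under `root`, descending into subcategories.

    `fetch_members(title)` is a hypothetical callable returning a list of
    (title, kind) pairs, where kind is "page" or "subcat"; in practice it
    would wrap a categorymembers request against pl.wikipedia.org.
    """
    seen = {root}      # categories already queued; guards against cycles
    queue = [root]
    pages = set()
    while queue:
        category = queue.pop()
        for title, kind in fetch_members(category):
            if kind == "subcat":
                if title not in seen:
                    seen.add(title)
                    queue.append(title)
            else:
                pages.add(title)
    return sorted(pages)


def birth_times(response):
    """Pull P569 time strings out of a wbgetentities response (a dict)."""
    result = {}
    for entity_id, entity in response.get("entities", {}).items():
        times = []
        for claim in entity.get("claims", {}).get("P569", []):
            # "datavalue" is absent for "unknown value" / "no value" snaks
            value = claim.get("mainsnak", {}).get("datavalue", {}).get("value", {})
            if "time" in value:
                times.append(value["time"])
        result[entity_id] = times
    return result


# Made-up category graph with a cycle (A -> B -> A) to show the guard works.
fake_graph = {
    "Kategoria:A": [("Kategoria:B", "subcat"), ("Protagoras", "page")],
    "Kategoria:B": [("Kategoria:A", "subcat"), ("Gorgiasz", "page")],
}
print(walk_category("Kategoria:A", lambda c: fake_graph.get(c, [])))
```

Passing the fetcher in as a parameter keeps the traversal independent of how the HTTP requests are made, so the same loop can be exercised with canned data.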

Query Wikidata for Wikipedia URLs in multiple languages

I'm trying to use Wikidata as an intermediary to get from a unique identifier listed in Wikidata (for example VIAF ID) to a Wikipedia description.
I've managed to piece together this query to get the Wikipedia page ID from a given VIAF ID ("153672966" below is the VIAF ID for "Southern Illinois University Press"):
SELECT ?pageid WHERE {
?item wdt:P214 "153672966".
[ schema:about ?item ; schema:name ?name ;
schema:isPartOf <https://en.wikipedia.org/> ]
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam mwapi:generator "allpages" .
bd:serviceParam mwapi:gapfrom ?name .
bd:serviceParam mwapi:gapto ?name .
?pageid wikibase:apiOutput "#pageid" .
}
}
This results in the pageid 9393762, which I am able to look up in the Wikipedia API to get the introduction text I need using this request:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&pageids=9393762
The resulting response includes an unparsed description (explaintext) taken from the first section of the Wikipedia article, so this gets me where I need to be, provided the language is English.
Now the problem is that I need to use this on an internationalized site where I might not even know upfront which languages might be used in the future. The query against Wikidata is supposed to run as a batch job on the backend, while fetching the actual descriptions from Wikipedia will be done from the frontend and rendered asynchronously.
Ideally I would want the Wikidata query to return a pageid for each language in which a Wikipedia article is available. On the frontend I would then check whether the currently active language has a pageid associated with it, and either call the Wikipedia API or render a fallback if no pageid is given.
In the future I would need to make similar queries with other library-related identifiers, such as ISNI, but I don't imagine that being much different from the current use case.
Is this a reasonable way to get the job done and how can I expand it to support multiple languages?
To get the explaintext you don't necessarily need the pageid; the page title is enough.
You can get the page titles in all languages from Wikidata with the following query:
SELECT ?item ?title ?site WHERE {
?item wdt:P214 "153672966" .
[ schema:about ?item ; schema:name ?title ;
schema:isPartOf ?site ] .
}
Afterwards you can use the Wikipedia API to get the explaintext:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=Southern%20Illinois%20University%20Press
The downside of working with page titles is that they are not stable, so you will need to run your batch job regularly to check for renamed articles.
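To connect the two steps, the ?site and ?title values from the query could be turned into one request URL per language along these lines. This is only a sketch: the function names are invented here, and it assumes each ?site value is a base URL such as https://en.wikipedia.org/ and that only hosts ending in .wikipedia.org are wanted (the query also returns sitelinks to other projects).

```python
from urllib.parse import quote, urlencode, urlsplit

WIKIPEDIA_SUFFIX = ".wikipedia.org"


def extract_url(site, title):
    """Build an extracts request for `title` on the wiki at `site`."""
    host = urlsplit(site).netloc
    params = {
        "format": "json",
        "action": "query",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "redirects": 1,
        "titles": title,
    }
    # quote_via=quote encodes spaces as %20 instead of "+"
    return f"https://{host}/w/api.php?" + urlencode(params, quote_via=quote)


def urls_by_language(rows):
    """Map language code -> request URL for (site, title) result rows.

    Rows whose site is not a Wikipedia host are skipped.
    """
    urls = {}
    for site, title in rows:
        host = urlsplit(site).netloc
        if host.endswith(WIKIPEDIA_SUFFIX):
            language = host[: -len(WIKIPEDIA_SUFFIX)]
            urls[language] = extract_url(site, title)
    return urls


rows = [
    ("https://en.wikipedia.org/", "Southern Illinois University Press"),
    ("https://commons.wikimedia.org/", "Category:Example"),  # made-up row
]
print(urls_by_language(rows))
```

The frontend could then look up the active language in the resulting mapping and fall back when the key is missing.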

How to fetch data by giving page URL in Wikidata Query Service (SPARQL)

I am trying to get some lists of data from Wikipedia, so I am using the Wikidata SPARQL query service. What I need to know is how to fetch the data by URL (the Wikipedia page URL). Currently I am searching by name. Here is my query:
SELECT DISTINCT ?item ?itemLabel ?birthLocation ?birthLocationLabel WHERE {
?item (wdt:P31|wdt:P101|wdt:P106)/wdt:P279* wd:Q482980 ;
rdfs:label "Enid Blyton"@en ;
wdt:P19 ?birthLocation
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I need to search by URL instead of by name. How can I achieve this? Also, is it possible to get the sub-links from the URL? I tried, but I couldn't find any way to get them. Can anybody tell me what I am doing wrong here? I would highly appreciate it. Thank you.
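One possibility for the lookup by URL, sketched under assumptions: in the query service the article URL is itself a node linked to the item through schema:about, so the URL can be written as an IRI in the subject position in place of the label match. The helper name item_by_article_query is invented for this example, and it assumes the URL is already in the stored form (underscores for spaces, percent-encoding for other special characters). It does not address the sub-links part of the question.

```python
def item_by_article_query(article_url):
    """Build a SPARQL query that resolves a Wikipedia article URL to its item.

    Hypothetical helper: it only assembles query text, it does not run it.
    """
    # Characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in article_url for ch in '<>" {}|\\^`'):
        raise ValueError("URL must be percent-encoded before use as an IRI")
    return f"""SELECT ?item ?itemLabel ?birthLocation ?birthLocationLabel WHERE {{
  <{article_url}> schema:about ?item .
  ?item wdt:P19 ?birthLocation .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


print(item_by_article_query("https://en.wikipedia.org/wiki/Enid_Blyton"))
```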

Extracting data from Wikipedia soccer player infoboxes

I want to retrieve information from soccer player Wikipedia infoboxes with the following properties (name, team, team number, appearances, goals) using the URIs returned by this Wikidata query:
SELECT ?SoccerPlayer ?SoccerPlayerLabel ?Team ?TeamLabel ?TeamNumber ?numMatches ?numGoals ?startTime ?article WHERE
{?SoccerPlayer wdt:P106 wd:Q937857;
p:P54 ?stmt .
?stmt ps:P54 ?Team;
pq:P1350 ?numMatches;
pq:P1351 ?numGoals;
pq:P580 ?startTime .
OPTIONAL { ?stmt pq:P1618 ?TeamNumber }
FILTER NOT EXISTS { ?SoccerPlayer p:P54/pq:P580 ?startTimeOther FILTER(?startTimeOther > ?startTime) }
FILTER(?startTime >= "2018-01-01T00:00:00Z"^^xsd:dateTime).
OPTIONAL { ?article schema:about ?SoccerPlayer .
?article schema:isPartOf <https://en.wikipedia.org/> . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".} } limit 200
Wikipedia infoboxes are plain text, which cannot be queried.
Instead, use either DBpedia or Wikidata.
DBpedia is likely more complete than Wikidata when it comes to data stored in English Wikipedia infoboxes, but it cannot provide much information beyond that.
In contrast, Wikidata aggregates data from various sources and can provide information about entities that do not have a Wikipedia article.

How to get a MediaWiki SPARQL query to return the text of an article?

For example, I need to get the names of all people in Wikipedia along with the text of their pages (parsed or not, it doesn't matter).
I wrote this SPARQL query:
SELECT ?human ?humanLabel WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
?human wdt:P31 wd:Q5.
}
LIMIT 10
How can I get the full text of the articles as an additional column in this query?
You can't. The SPARQL endpoint is designed to return only data stored in Wikidata. So the best solution is to run your query first, then loop over the results and call the following API for each record to get the page text.
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Barack_Obama&utf8=1&rvprop=content
Replace Barack_Obama with the page title.
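The per-record step could look roughly like this. It is a sketch with assumptions: the function names are invented, the actual HTTP call is left out, and the parser assumes the legacy JSON layout that this request returns, where the wikitext sits under the "*" key of the first revision. The sample response below is hand-written, not real API output.

```python
from urllib.parse import urlencode


def revisions_url(title, host="en.wikipedia.org"):
    """Build the revisions request shown above for one page title."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "utf8": 1,
        "rvprop": "content",
    }
    return f"https://{host}/w/api.php?" + urlencode(params)


def page_texts(response):
    """Map page title -> wikitext for a decoded response to that request.

    Assumes the legacy layout (content under "*"); pages without
    revisions, such as missing pages, are skipped.
    """
    texts = {}
    for page in response.get("query", {}).get("pages", {}).values():
        revisions = page.get("revisions")
        if revisions:
            texts[page["title"]] = revisions[0].get("*", "")
    return texts


# Hand-written sample in the assumed layout, for illustration only.
sample = {"query": {"pages": {"1": {"title": "Example",
                                    "revisions": [{"*": "Some wikitext"}]}}}}
print(revisions_url("Barack_Obama"))
print(page_texts(sample))
```

In a real run the titles would come from the sitelinks of the query results, and each URL would be fetched with whatever HTTP client is already in use.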