SPARQL - How to specify property paths with regex - sparql

Imagine I would like to query all descendants of Otto von Bismarck down to the third generation.
How could I write the SPARQL code with regex? This tutorial says that we can use regex, but I don't know how.
I tried to use "{3}":
SELECT ?descendant ?descendantLabel
WHERE
{
wd:Q8442 wdt:P40{3} ?descendant.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
However, this does not work. The output should be this:
try here

You can't use regular expressions to describe a path: SPARQL's REGEX function only matches string values. Property path syntax borrows regex-like operators (*, +, ?, |), though, which is probably where the confusion comes from.
As for paths of length up to 3, the {n} syntax you are using did not make it into the SPARQL 1.1 standard, even though it does appear in a few documents. Note also that {3} would mean exactly three steps, not up to three.
I'd use something like:
SELECT DISTINCT ?descendant ?descendantLabel
WHERE
{
wd:Q8442 wdt:P40/wdt:P40?/wdt:P40? ?descendant.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
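This matches paths of length 1, 2, and 3: ? means 'zero or one occurrence' of the property, so the first step is mandatory and the next two are optional. DISTINCT removes the duplicates that arise when the same descendant can be matched in more than one way.
This trick works for relatively short paths; for larger bounds the expression becomes unwieldy.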
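If you generate your queries from code, the repeated-step pattern above can be built mechanically. Here is a minimal Python sketch; the helper names bounded_path and descendants_query are made up for illustration and are not part of any library:

```python
def bounded_path(prop: str, min_len: int, max_len: int) -> str:
    """Build a SPARQL property path matching min_len..max_len hops of prop.

    Hypothetical helper: it only concatenates strings and does not
    validate that prop is a well-formed prefixed name.
    """
    if min_len < 0 or max_len < min_len or max_len < 1:
        raise ValueError("need 0 <= min_len <= max_len and max_len >= 1")
    # Mandatory hops are written plainly; optional hops get a trailing '?'.
    steps = [prop] * min_len + [prop + "?"] * (max_len - min_len)
    return "/".join(steps)


def descendants_query(person: str, max_generations: int) -> str:
    """Return the query from the answer above for any generation bound."""
    path = bounded_path("wdt:P40", 1, max_generations)
    return (
        "SELECT DISTINCT ?descendant ?descendantLabel\n"
        "WHERE\n"
        "{\n"
        f"  {person} {path} ?descendant.\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )
```

Calling descendants_query("wd:Q8442", 3) produces the same path as the hand-written query, wdt:P40/wdt:P40?/wdt:P40?.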

Related

Can I use generators in WikiData's API?

I am trying to get all of the pages in a given category from Wikipedia, including the ones in subcategories. No problem with that, but I also want certain fields for each page, like birth date.
From this topic I suppose I need to use https://wikidata.org/w/api.php and not, for example, https://pl.wikipedia.org/...
I assumed I should use a generator, but my trouble is that when calling Wikidata I get an error about a bad ID, which I don't get for Wikipedia.
query.params = {
"action": "query", // placeholder for test
"generator": "categorymembers",
"gcmpageid": 1810130, // sophists' category at pl.wikipedia
"format": "json"
}
https://pl.wikipedia.org/w/api.php -> data
https://en.wikipedia.org/w/api.php -> error: nosuchpage (expected)
https://www.wikidata.org/w/api.php -> error: invalidcategory (why???)
I've tried to use the ID from Wikidata prefixed with "Q", but then I got badinteger.
Alternatively, I could make requests to Wikipedia for the IDs and then to Wikidata, but that means calling twice for the same thing and passing all those IDs into the second request...
Please help
TL;DR Using generators from Polish Wikipedia in Wikidata API does not work but other solutions exist.
A few things to note about Wikidata and its API:
Wikidata doesn't know anything about the category hierarchies on Polish Wikipedia (or on any other Wikipedia language version)
There is no API to query pages in all subcategories. This is mainly because the category system of MediaWiki allows cycles in the category hierarchy and arbitrarily deep nesting.
Page IDs are only unique within a project, so a page ID from pl.wikipedia.org does not work on https://en.wikipedia.org/w/api.php or https://www.wikidata.org/w/api.php.
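Because of the cycles mentioned above, any do-it-yourself traversal of subcategories needs to remember which categories it has already visited. A minimal Python sketch of that idea follows; collect_pages and the fetch_members callback are hypothetical names, and the callback is assumed to wrap one categorymembers request and return (title, is_category) pairs:

```python
from typing import Callable, Iterable, List, Set, Tuple

# One category member: (page title, True if the member is itself a category).
Member = Tuple[str, bool]


def collect_pages(
    root: str,
    fetch_members: Callable[[str], Iterable[Member]],
    max_depth: int = 5,
) -> List[str]:
    """Collect article titles below root without looping on category cycles.

    fetch_members is an assumed callback (not a MediaWiki function) that
    returns the direct members of one category.
    """
    seen_categories: Set[str] = {root}
    seen_pages: Set[str] = set()
    pages: List[str] = []
    frontier: List[Tuple[str, int]] = [(root, 0)]
    while frontier:
        category, depth = frontier.pop()
        for title, is_category in fetch_members(category):
            if is_category:
                # Skip categories already visited (cycles) or nested too deep.
                if title not in seen_categories and depth < max_depth:
                    seen_categories.add(title)
                    frontier.append((title, depth + 1))
            elif title not in seen_pages:
                # A page can sit in several categories; report it once.
                seen_pages.add(title)
                pages.append(title)
    return pages
```

The max_depth cap is an arbitrary safety limit for very deep hierarchies, not something the API requires.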
There are multiple solutions to your problem:
Use the query in your question recursively to get all page titles from Kategoria:Sofiści and its subcategories.
Afterwards, use the Wikidata API to retrieve the Wikidata item for each Polish Wikipedia article: e.g. for Protagoras the query is this: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=plwiki&titles=Protagoras&props=claims&format=json
This returns a JSON document with all statements about Protagoras stored on Wikidata. The date of birth is found in it under claims->P569->mainsnak->datavalue->value->time.
Use the Wikidata Query Service. It lets you call the MediaWiki API from SPARQL.
SELECT ?item ?itemLabel ?date_of_birth WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam wikibase:endpoint "pl.wikipedia.org" .
    bd:serviceParam mwapi:gcmtitle 'Kategoria:Sofiści' .
    bd:serviceParam mwapi:generator "categorymembers" .
    bd:serviceParam mwapi:gcmprop "ids|title|type" .
    bd:serviceParam mwapi:gcmlimit "max" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  ?item wdt:P569 ?date_of_birth
  SERVICE wikibase:label { bd:serviceParam wikibase:language "pl". }
}
Paste this query into https://query.wikidata.org/. That page also offers code examples showing how to access the results programmatically.
The drawback of this solution is that pages in subcategories are not included.
Fully rely on Wikidata. Use the following query in https://query.wikidata.org/:
SELECT ?item ?itemLabel ?date_of_birth WHERE {
?item wdt:P106 wd:Q3750514.
?item wdt:P569 ?date_of_birth
SERVICE wikibase:label { bd:serviceParam wikibase:language "pl,en". }
}
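For the first solution above, building the wbgetentities URL and digging the date of birth out of the response could look roughly like this in Python. This is a sketch, not a complete client: entity_url and birth_times are invented names, nothing is fetched here, and it relies on the fact that the claims for a property come back as a list of statements, so the claims->P569->mainsnak path from the answer is applied to the first statement that has a value:

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def entity_url(title: str, site: str = "plwiki") -> str:
    """Build the wbgetentities URL for one article title on one wiki."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "claims",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def birth_times(response: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Map each entity ID in a decoded response to its first P569 time.

    Entities without a usable date of birth map to None.
    """
    result: Dict[str, Optional[str]] = {}
    for entity_id, entity in response.get("entities", {}).items():
        time = None
        for claim in entity.get("claims", {}).get("P569", []):
            snak = claim.get("mainsnak", {})
            # "somevalue" and "novalue" snaks carry no datavalue.
            if snak.get("snaktype") == "value":
                time = snak["datavalue"]["value"]["time"]
                break
        result[entity_id] = time
    return result
```

You would fetch entity_url("Protagoras") with whatever HTTP client you prefer, decode the JSON, and pass the resulting dict to birth_times.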

Wikidata/Wikidata-Toolkit Getting entities by SPARQL query (#362)

I am trying to understand the above issue, but I am still unclear about its exact requirement. Can you please describe it once again in detail?
What I have figured out so far:
This query works on the Wikidata Query Service:
SELECT ?station WHERE {
?station wdt:P954 ?ibnr.
FILTER regex(?ibnr,"^80","i")
}
So does this one:
SELECT ?station ?stationLabel WHERE {
?station wdt:P954 ?ibnr
FILTER regex(?ibnr,"^80","i")
SERVICE wikibase:label { # ... include the labels
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"
}
}
This one also returns the labels of the stations, which are fetched using the SERVICE keyword.
(These two are the ones you mentioned in the comments.)
So what exactly is the aim of the API, given that standard SPARQL queries already work on the Wikidata Query Service? If possible, also mention the files I should go through.

How to fetch data by giving page URL in Wikidata Query Service (SPARQL)

I am trying to get some lists of data from Wikipedia, so I am using the Wikidata SPARQL query service. What I need to know is how to fetch the data by URL (the Wikipedia page URL). Currently I am searching by name. Following is my query:
SELECT DISTINCT ?item ?itemLabel ?birthLocation ?birthLocationLabel WHERE {
?item (wdt:P31|wdt:P101|wdt:P106)/wdt:P279* wd:Q482980 ;
rdfs:label "Enid Blyton"@en ;
wdt:P19 ?birthLocation
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I need to search by URL instead of by name. How can I achieve this? Also, is it possible to get the sub-links from the URL? I tried, but I couldn't find any way to get them. Can anybody tell me what I am doing wrong here? I would highly appreciate it. Thank you.

How to find the correct property for Wikidata Sparql queries?

How should I find the correct property for a query?
Say I want to find all the kings in Wikidata.
My first attempt was this:
SELECT ?king ?kingLabel
WHERE
{
#all result where occupation is king
?king wdt:P106 wd:Q12097.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
It returned 515 hits in total,
which I think is a very small number for all the kings.
Second attempt:
SELECT ?king ?kingLabel
WHERE
{
# all result where position held is king
?king wdt:P39 wd:Q12097.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Which returned 817 hits.
So you can see I used different properties, P106 vs P39, and they returned different result sets. And those are only two properties; maybe there are another 10-12 properties which I haven't discovered yet and which would be better for my query.
Edit: On top of that, the usage of a given property doesn't seem to be consistent. For property P39 (position held), some items have the value Q12097 (king), others have Q6412254 (king of Hungary).
So as you can see, it seems impossible to fetch all the kings in Wikidata with one query. Or should I fetch a bunch of properties and parse their values for the word "king"? It would be a nightmare, but right now I can't find a better way.
Unfortunately, the usage of properties seems ad hoc to me, and because of this I am using trial and error to discover properties.
So my question is (and it isn't related to the 'king' problem only):
If I want to formulate a query how can I decide which is the best property?

How to get Mediawiki SPARQL query with return text of article?

For example, I need to get the names of all people in Wikipedia and the text of their pages (parsed or not, it's not important).
I wrote this SPARQL query:
SELECT ?human ?humanLabel WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
?human wdt:P31 wd:Q5.
}
LIMIT 10
How can I get the full text of the articles as an additional column in this query?
You can't. The SPARQL endpoint only serves data from Wikidata, not the text of Wikipedia articles. So the best solution is to run your query first, then loop over the results and call the following API for each record to get the page text.
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Barack_Obama&utf8=1&rvprop=content
Replace Barack_Obama with the title of the page you want.
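The per-record call could be sketched in Python like this. The function names are made up, no request is sent, and the parsing assumes the legacy JSON layout that this exact URL returns (pages keyed by page ID, with the wikitext under the "*" key of the first revision); if you add formatversion=2 or rvslots the layout differs:

```python
from typing import Any, Dict, Optional
from urllib.parse import urlencode


def page_text_url(title: str, host: str = "en.wikipedia.org") -> str:
    """Build the revisions query URL from the answer for a given title."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "utf8": 1,
        "rvprop": "content",
    }
    return f"https://{host}/w/api.php?" + urlencode(params)


def extract_wikitext(response: Dict[str, Any]) -> Optional[str]:
    """Return the wikitext of the first page that has a revision, else None.

    Assumes the legacy response shape described in the lead-in.
    """
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        revisions = page.get("revisions")
        if revisions:
            return revisions[0].get("*")
    return None
```

In the loop over your SPARQL results you would fetch page_text_url(title) for each article title, decode the JSON, and call extract_wikitext on it.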