How to speed up this federated query? - sparql

I have a query that returns people who are typed as dbo:Writer and have different birth dates stated in DBpedia and Wikidata. I initially wrote it to be run independently, but I am currently running it on Wikidata. It is quite slow, and interestingly, when I do not call the Wikidata endpoint with SERVICE (which I can afford to do here), it gets even slower. I would be interested to learn why that is. But my main question is: how can I optimize it? Here is the query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT DISTINCT ?personDBP ?personWD ?personWDLabel ?BirthDateDBP ?BirthDateWD
WHERE {
SERVICE <http://dbpedia.org/sparql> {
?personDBP a dbo:Writer ;
dbo:birthDate ?BirthDate_DBP;
owl:sameAs ?personWD .
FILTER REGEX (?personWD, "wikidata.org")
BIND (xsd:date(?BirthDate_DBP) AS ?BirthDateDBP)
}
SERVICE <https://query.wikidata.org/sparql> {
?personWD wdt:P569 ?BirthDate_WD .
BIND (xsd:date(?BirthDate_WD) AS ?BirthDateWD)
}
FILTER (?BirthDateDBP != ?BirthDateWD)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?BirthDateDBP)
LIMIT 50

Related

getting labels from Wikidata in graphDB

I have a list of art styles in GraphDB, and I am trying to use SERVICE to get their labels from Wikidata with this query:
PREFIX gp: <http://www.semanticweb.org/kandd/group76/final_project#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?movement ?label
WHERE{
?artist gp:hasArtStyle ?movement.
SERVICE <https://query.wikidata.org/sparql>{
?movement rdfs:label ?label .
FILTER (langMatches( lang(?label), "EN" ) )
}
}
Note that gp is a namespace that exists only in my graph, not anywhere on the internet, and that ?movement binds to valid Wikidata URIs such as http://www.wikidata.org/entity/Q186030.
Yet the response I get is:
Error 500: error
Query evaluation error: org.eclipse.rdf4j.query.QueryEvaluationException: org.eclipse.rdf4j.query.QueryEvaluationException: java.io.IOException: Unkown record type: 83 (HTTP status 500)
What am I doing wrong?
Remember that your query is handled from the inside out, meaning that the SERVICE part is handled first, and only then the part where you use your own specific property.
Currently, your query on Wikidata is very general: you ask for everything that has an rdfs:label, and then filter for the English labels among everything it returns.
Given this, my guess is that your query simply times out. Instead, I would try something like this:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT *
WHERE{
SERVICE <https://query.wikidata.org/sparql>{
?artist wdt:P101 wd:Q186030 ; #Field of Work is contemporary art
wdt:P31 wd:Q5 ; #instance of Human
rdfs:label ?name . #get the label
FILTER (langmatches(lang(?name), "en"))
}
}
If I try this in GraphDB, it returns 156 results.
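Another way to keep the remote part specific, which is not what the query above does, is to send the known entity URIs along inside the SERVICE block with a VALUES clause. The following is a minimal Python sketch of that idea; it assumes the URIs were first collected with a separate local query, and the helper name build_label_query is hypothetical.

```python
def build_label_query(movement_uris, lang="en"):
    """Build a federated query that asks Wikidata only for the given entities."""
    # Each IRI goes into a VALUES clause, so the remote pattern is already
    # constrained instead of matching every rdfs:label on Wikidata.
    values = " ".join(f"<{uri}>" for uri in movement_uris)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?movement ?label WHERE {\n"
        "  SERVICE <https://query.wikidata.org/sparql> {\n"
        f"    VALUES ?movement {{ {values} }}\n"
        "    ?movement rdfs:label ?label .\n"
        f'    FILTER (langMatches(lang(?label), "{lang}"))\n'
        "  }\n"
        "}"
    )


# The URI is the example given in the question.
query = build_label_query(["http://www.wikidata.org/entity/Q186030"])
print(query)
```

The generated string can then be pasted into GraphDB or sent through whatever client is already in use.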

SPARQL Query returns empty result set

I am querying marine data via SPARQL. I developed a SPARQL console with CodeMirror, RDFLib and SPARQLWrapper to display a number of predefined queries and their results on my website. In the console, the query:
prefix geo: <https://www.w3.org/2003/01/geo/wgs84_pos#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix ssn: <http://www.w3.org/ns/ssn/>
prefix xml: <http://www.w3.org/XML/1998/namespace>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix argo: <http://www.argodatamgt.org/argo-ontology#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sosa: <http://www.w3.org/ns/sosa/>
prefix nerc: <http://vocab.nerc.ac.uk/collection/>
prefix dct: <http://purl.org/dc/terms/>
prefix prov: <https://www.w3.org/TR/prov-o/>
# stations/date of each cycle
SELECT distinct ?wmo ?lat ?lon ?date where{
?float argo:cycle ?cycle;
argo:wmoCode ?wmo.
?cycle geo:latitude ?lat;
geo:longitude ?lon;
argo:startDate ?date.
}
returns nothing. I cross-checked it with the query editor at https://www.orpha.net/sparql, and the result was the same: an empty result set.
However, when I run the exact same query on the research infrastructure's SPARQL endpoint, https://www.ifremer.fr/co/argo-linked-data/html/Argo-HTML-SPARQL/, it works flawlessly.
I have tried very generic queries like:
SELECT DISTINCT * WHERE {
?s ?p ?o
}
LIMIT 10
or
select distinct ?p ?label
where {
?s ?p ?o .
OPTIONAL { ?p rdfs:label ?label }
}
and they return non-empty results both in my console and in the generic SPARQL editor I mentioned before.
With a curl request like the following, replacing the template query with the "stations/date of each cycle" query shown first, I am able to get the data:
curl -X POST "https://sparql.ifremer.fr/argo/query" --data-urlencode "query=select ?s ?o ?p where{?s ?o ?p.} limit 10"
This makes me think that an outdated Virtuoso server on their side might be the culprit. However, I am too new to SPARQL and semantic technologies to tell, and I would appreciate any clue.
As the OP said in a comment:
I found the culprit and it was an error from my side. I am using SPARQLWrapper and somewhere in my code I had a fixed SPARQL endpoint from another research infrastructure, I totally missed that.
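One way to make this kind of mistake easier to spot is to keep the endpoint URL in a single place and print it before sending anything. The sketch below uses only the Python standard library rather than SPARQLWrapper; the endpoint is the one from the curl command in the question, while the helper name build_request and the Accept header are assumptions for illustration.

```python
import urllib.parse
import urllib.request

# Endpoint taken from the curl command in the question; defined once so that
# no other URL can stay hard-coded somewhere else by accident.
ARGO_ENDPOINT = "https://sparql.ifremer.fr/argo/query"


def build_request(query, endpoint=ARGO_ENDPOINT):
    """Return a POST request equivalent to curl --data-urlencode "query=..."."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        # Assumption: the server honours this standard SPARQL results type.
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_request("select ?s ?p ?o where { ?s ?p ?o } limit 10")
print(request.full_url)  # shows which endpoint will really be queried

# Sending the request needs network access, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```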

SPARQL wikidata query: Obtain number of languages that associated wikipedia article is available in

Is it possible to use SPARQL on Wikidata to extract the number of languages in which the Wikipedia article associated with a Wikidata item is available?
I am new to SPARQL and Wikidata. I have tried to find examples online, but no luck so far. I am able to extract the URL of the Wikipedia article, but I wonder whether it is possible to count the number of languages for which such a URL exists. Here is my code so far:
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel ?article WHERE {
?museum wdt:P17 wd:Q39; # countries
wdt:P31 ?type.
?type (wdt:P279*) wd:Q207694.
# If available, get the "en" entry, use native language as fallback:
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
# get wikipedia article in english
OPTIONAL {
?article schema:about ?museum .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}
}
order by ?museum
I would like to add another column to my output that gives the number of languages in which a Wikipedia article exists.
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel (COUNT(DISTINCT ?article) as ?wiki_article_count) WHERE {
?museum wdt:P17 wd:Q39; # countries
wdt:P31 ?type.
?type (wdt:P279*) wd:Q207694.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
OPTIONAL {
?article schema:about ?museum .
}
}
GROUP BY ?museum ?museumLabel
ORDER BY DESC( ?wiki_article_count )
Building on the query given in the question, one can obtain the number of linked wiki pages (I am not at all sure about the nomenclature).
Please try the Wikidata Query Service here: https://w.wiki/334g
For example, for the Wikidata entry https://www.wikidata.org/wiki/Q194626 (Kunstmuseum Basel; again, the nomenclature is unclear to me, is it a wiki entity?), one can see 22 linked wiki pages. That is 21 Wikipedia language editions and one multilingual site.
I would be thankful for any comments on this answer and will improve it accordingly.
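Because the count above mixes Wikipedia articles with other linked pages, one could separate the two after fetching the article URLs. This is only a sketch in Python: it assumes the ?article values have already been retrieved as plain URL strings, the sample URLs are illustrative rather than actual query output, and the function name is hypothetical.

```python
from urllib.parse import urlparse

WIKIPEDIA_SUFFIX = ".wikipedia.org"


def count_wikipedia_languages(article_urls):
    """Count distinct Wikipedia language editions among linked page URLs."""
    languages = set()
    for url in article_urls:
        host = urlparse(url).hostname or ""
        # Language editions have hosts of the form "<code>.wikipedia.org";
        # multilingual sites such as commons.wikimedia.org do not match.
        if host.endswith(WIKIPEDIA_SUFFIX):
            languages.add(host[: -len(WIKIPEDIA_SUFFIX)])
    return len(languages)


# Illustrative URLs only, not real query output.
sample = [
    "https://en.wikipedia.org/wiki/Kunstmuseum_Basel",
    "https://de.wikipedia.org/wiki/Kunstmuseum_Basel",
    "https://commons.wikimedia.org/wiki/Category:Kunstmuseum_Basel",
]
print(count_wikipedia_languages(sample))
```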

SPARQL Federated Query Not Returning All Solutions

This is an evolution of this question.
Basically I am having trouble getting all the solutions to a SPARQL query from a remote endpoint. I have read through section 2.4 here because it seems to describe a situation almost identical to mine.
The idea is that I want to filter my results from DBPedia based on information in my local RDF graph. The query is here:
PREFIX ns1: <http://www.semanticweb.org/caeleanb/ontologies/twittermap#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT *
WHERE {
?p ns1:displayName ?name .
SERVICE <http://dbpedia.org/sparql> {
?s rdfs:label ?name .
?s rdf:type foaf:Person .
}
}
And the only result I get is dbpedia:John_McCain (for ?s). I think this is because John McCain is the only match in the first 'x' results, but I can't figure out how to get the query to return all matches. For example, if I add a filter like:
SERVICE <http://dbpedia.org/sparql> {
?s rdfs:label ?name .
?s rdf:type foaf:Person .
FILTER(?name = "John McCain"@en || ?name = "Jamie Oliver"@en)
}
Then it correctly returns BOTH dbpedia:Jamie_Oliver and dbpedia:John_McCain. There are dozens of other matches like Jamie Oliver that do not come through unless I specifically add it to a Filter like this.
Can someone explain a way to extract the rest of the matches? Thanks.
It looks like the cause of this issue is that the SERVICE block attempts to pull all foaf:Persons from DBpedia and only then filters them against my local Stardog database. Since there is a 10,000-result limit when querying DBpedia, only matches that occur in that arbitrary set of 10,000 persons will be found. To fix this, I wrote a script that assembles a FILTER block containing every name string in my Stardog database and attached it to the SERVICE block, so that the filtering happens remotely and the 10,000-result limit is avoided. It looks something like this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ns1: <http://www.semanticweb.org/caeleanb/ontologies/twittermap#>
CONSTRUCT{
?s rdf:type ns1:Person ,
ns1:Politician .
}
WHERE {
?s rdfs:label ?name .
?s rdf:type dbo:Politician .
FILTER(?name IN ("John McCain"@en, ...))
}
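The script that assembles the FILTER block is not shown. A minimal Python sketch of how such a clause could be generated follows; it assumes the names are already available as a list of strings with English language tags, and both helper names are hypothetical.

```python
def sparql_string(value, lang="en"):
    """Return a language-tagged SPARQL string literal with quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def build_name_filter(names, variable="?name", lang="en"):
    """Build a FILTER(... IN (...)) clause covering every given name."""
    literals = ", ".join(sparql_string(name, lang) for name in names)
    return f"FILTER({variable} IN ({literals}))"


print(build_name_filter(["John McCain", "Jamie Oliver"]))
# prints: FILTER(?name IN ("John McCain"@en, "Jamie Oliver"@en))
```

For a very long list of names, the resulting query may exceed what the endpoint accepts, so splitting the list into several smaller queries could be necessary.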

How to check if Japanese version of Wikidata item exists

I'm trying to build a list of names and match them to their Japanese equivalents.
I first thought about trying to crawl Wikipedia and following a link to the Japanese version of the page but I didn't know how to check whether the page was about a person or anything else.
Thankfully, there are the Wikidata and DBpedia projects.
I started tinkering with Wikidata and found this example:
https://www.mediawiki.org/wiki/Wikibase/Indexing/SPARQL_Query_Examples#People_born_before_year_1880_with_no_death_date
which can be reduced to a query for 'people':
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?h wdt:P31 wd:Q5 .
} LIMIT 1
That results in a link for George Washington
https://www.wikidata.org/wiki/Q23
At the bottom of that page is a list of links to Wikipedia pages for this person in other languages, including Japanese.
Is there a way of returning the name and Japanese version in the same query?
If you are querying Wikidata, you can use the label service:
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
SELECT * WHERE {
wd:Q30 p:P6/v:P6 ?p .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
?p rdfs:label ?enName .
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "ja" .
?p rdfs:label ?jaName .
}
}
I found this answer here.