How to ensure that you only get a single claim returned by a SPARQL query in Wikidata?

This is a Wikidata query to get the TED ID (P2611) for an individual. Even though the TED ID property is constrained to a single value, two values have been entered, so this query returns two rows. I would just like one of the two claims returned (the default one? I don't really care which).
https://w.wiki/5RVk
SELECT ?itemLabel ?TED
WHERE {
VALUES ?item {
wd:Q321698
}
OPTIONAL { ?item wdt:P2611 ?TED. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?itemLabel ?TED
I don't really want to group them - but I could fall back on that.
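If grouping turns out to be acceptable, the usual in-query route is the SPARQL 1.1 SAMPLE aggregate, e.g. `SELECT ?itemLabel (SAMPLE(?TED) AS ?aTED) ... GROUP BY ?itemLabel`, which returns one arbitrary value per group. If you would rather keep the query as-is and pick a value client-side, here is a minimal Python sketch. It assumes the results have already been parsed into dictionaries keyed by variable name; the function name `one_value_per_item` and the sample rows are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def one_value_per_item(rows: List[Row], key: str = "itemLabel",
                       value: str = "TED") -> List[Row]:
    """Keep the first row seen for each item and drop any further claims."""
    seen = set()
    result: List[Row] = []
    for row in rows:
        item = row.get(key)
        if item in seen:
            continue  # a second (or later) TED value for an item already kept
        seen.add(item)
        result.append({key: item, value: row.get(value)})
    return result


# Hypothetical rows shaped like the query's two-column result.
rows = [
    {"itemLabel": "example person", "TED": "ted_id_a"},
    {"itemLabel": "example person", "TED": "ted_id_b"},
]
print(one_value_per_item(rows))  # [{'itemLabel': 'example person', 'TED': 'ted_id_a'}]
```

Keying on the item IRI rather than its label would be safer if two different items can share a label.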


Can this SPARQL search query be made more efficient?

I have a compound 'search' query made in SPARQL that
(1) Searches for unique subject URIs that are of a certain rdf:type:
Example:
SELECT ?s FROM NAMED <http://www.example.org/graph1> FROM NAMED <http://www.example.org/graph2>
{
GRAPH ?g
{
?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.example.org/widget>.
}
} OFFSET 10000 LIMIT 100
This query is quite simple and just returns one page (100 results) of subjects of type 'widget'.
(2) For the returned page of satisfying subject URIs, search for all subject URIs that have a reference to those subject URIs (i.e. referencing entities), specifying the reference predicate URIs that indicate a reference.
Let's say the previous query (1) returned 2 subject URIs http://www.example.org/widget100 and http://www.example.org/widget101 and the referencing predicate I wanted to query for was http://www.example.org/widget:
Example:
SELECT ?s FROM NAMED <http://www.example.org/graph1> FROM NAMED <http://www.example.org/graph2>
WHERE {
{
?s <http://www.example.org/widget> <http://www.example.org/widget100>
}
UNION
{
?s <http://www.example.org/widget> <http://www.example.org/widget101>
}
}
If the previous page returned 100 subject URIs, there would be 100 UNION branches here, one per subject.
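Instead of emitting one UNION branch per subject, SPARQL 1.1 lets you bind a variable to a whole list of values with a VALUES block, so the generated query stays the same size regardless of the page. The Python sketch below builds such a query string; the function name `references_query` is hypothetical, and unlike the query above it wraps the pattern in `GRAPH ?g`, on the assumption that the data lives in the named graphs listed with FROM NAMED (as in query (1)).

```python
from typing import Iterable


def references_query(predicate: str, targets: Iterable[str],
                     graphs: Iterable[str]) -> str:
    """Build one query that binds ?o to every target IRI via VALUES,
    instead of one UNION branch per target."""
    from_named = " ".join(f"FROM NAMED <{g}>" for g in graphs)
    values = "\n    ".join(f"<{t}>" for t in targets)
    return (
        f"SELECT ?s {from_named}\n"
        "WHERE {\n"
        f"  VALUES ?o {{\n    {values}\n  }}\n"
        f"  GRAPH ?g {{ ?s <{predicate}> ?o }}\n"
        "}"
    )


query = references_query(
    "http://www.example.org/widget",
    ["http://www.example.org/widget100", "http://www.example.org/widget101"],
    ["http://www.example.org/graph1", "http://www.example.org/graph2"],
)
print(query)
```

Whether this is faster than the UNION form depends on the engine, but it avoids building and parsing hundreds of near-identical branches.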
This query works - it selects the subject URIs of the given type, and returns the additional subject URIs that reference those subjects with the given reference predicate.
The problem is that in practice, with hundreds of thousands of triples across my query graphs, this query typically takes over a minute to execute, even on a fast machine with an in-memory graph. This is unacceptably slow for users in this fairly typical search scenario.
When profiled, each of the two queries accounts for roughly 50% of the total query time.
I have enough experience with SPARQL to construct a query like the one above, but I am certainly not an expert. I am wondering whether this could be made more efficient. For example, could it be combined into a single query that might reduce query times by 50% or more? Is my use of UNIONs across potentially many subjects replaceable by a more efficient method?
Thank you
SPARQL Guy
UPDATE: I have managed to reduce the query down to a single query of the following form:
SELECT *
FROM NAMED <http://www.example.org/widgets>
FROM NAMED <http://www.example.org/widgetstats>
FROM NAMED <http://www.example.org/widgetmetadata>
FROM NAMED <http://www.example.org/widgetfactory>
WHERE
{ { SELECT ?s ?p ?o
WHERE
{ GRAPH ?g
{ ?s ?p ?o }
{ SELECT ?s
WHERE
{ GRAPH ?i
{ ?s a <http://www.example.org/widget> }
}
OFFSET 0
LIMIT 100
}
}
}
UNION
{ SELECT ?s ?p ?o
WHERE
{ GRAPH ?g
{ ?s ?p ?o }
{ SELECT DISTINCT ?s
WHERE
{ GRAPH ?h
{ ?s <http://www.example.org/widgetstats/widget>
| <http://www.example.org/widgetmetadata/widget>
| <http://www.example.org/widgetfactory/widget> ?x
}
{ SELECT ?x
WHERE
{ GRAPH ?i
{ ?x a <http://www.example.org/widget> }
}
OFFSET 0
LIMIT 100
}
}
}
}
}
}
This improves query speed by approximately 50%, but I think the query can be made faster still. This form of query, which first fetches all triples associated with the primary entities of the given type and then all triples associated with the referencing entities, requires two identical innermost subqueries that fetch the unique subjects of the given type.
Is there a way to simplify this query, perhaps by using a single query instead of a UNION of two subqueries? I assume this would improve performance further.
UPDATE 2: I couldn't improve on the query above (first update), so I will mark this as the answer for now.
If you still want the paging of the first query, then the best approach is probably to combine the queries using a SPARQL subquery.
Note that with subqueries you work from the inside out: the subquery selects the widgets, and the outer query expands to find the references. If you are using FROM NAMED, then you need to match on the graph (assuming your results are in a named graph and you aren't working with a union default graph). The OFFSET and LIMIT on the inner query mean that the example below returns references to the 3rd widget (in whatever default sort order the engine is applying).
I'm not sure whether this will speed up the overall query time, but it's worth experimenting with, and it saves you a bunch of string concatenation!
PREFIX ex: <http://www.example.org/>
SELECT ?s FROM NAMED ex:g1 FROM NAMED ex:g2 WHERE {
GRAPH ?h {
?s ex:widget ?x
}
{
SELECT ?x WHERE {
GRAPH ?g {
?x a ex:widget
}
} OFFSET 2 LIMIT 1
}
}
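Because the inner OFFSET/LIMIT is taken from the engine's default order, paging through all widgets is only predictable if that order is pinned. A minimal Python sketch that generates the same subquery for a given page, adding ORDER BY ?x (and DISTINCT, in case a widget is typed in more than one graph); the function name `referencing_page_query` and the page size are assumptions for illustration.

```python
PREFIXES = "PREFIX ex: <http://www.example.org/>\n"


def referencing_page_query(page: int, page_size: int = 100) -> str:
    """Build the subquery-based query for one page of widgets.

    ORDER BY ?x fixes the order the inner OFFSET/LIMIT window is taken from,
    so successive pages are slices of the same ordering.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size
    return PREFIXES + (
        "SELECT ?s FROM NAMED ex:g1 FROM NAMED ex:g2 WHERE {\n"
        "  GRAPH ?h { ?s ex:widget ?x }\n"
        "  {\n"
        "    SELECT DISTINCT ?x WHERE { GRAPH ?g { ?x a ex:widget } }\n"
        f"    ORDER BY ?x OFFSET {offset} LIMIT {page_size}\n"
        "  }\n"
        "}"
    )


print(referencing_page_query(2))
```

Sorting has a cost of its own on large graphs, so it is worth measuring against the unordered version.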

Get a variable number of columns for output in SPARQL

Is there a way to get a variable number of columns for a given predicate? Essentially, I want to turn this:
title  note
A.     1
A.     2
A.     3
B.     4
B.     5
into
title  note1  note2  note3
A.     1      2      3
B.     4      5      null
In other words, can I make the number of columns equal to the maximum number of notes per title, or something similar? Thanks.
There are several ways you can approach this. One way is to change your query. Now, in the general case it is not possible to do a SELECT query that does exactly what you want. However, if you happen to know in advance what the maximum number of notes per title is, you can sort of do this.
Supposing your original query was something like this:
SELECT ?title ?note
WHERE { ?title :hasNote ?note }
And supposing you know titles have at most 3 notes, you could probably (untested) do something like this:
SELECT ?title ?note1 ?note2 ?note3
WHERE {
?title :hasNote ?note1 .
OPTIONAL { ?title :hasNote ?note2 . FILTER (?note2 != ?note1) }
OPTIONAL { ?title :hasNote ?note3 . FILTER (?note3 != ?note1 && ?note3 != ?note2) }
}
As you can see this is not a very nice solution though: it doesn't scale and is probably very inefficient to process as well.
Alternatives are various forms of post-processing. To make it simpler to post-process you could use an aggregate operator to get all notes for a single item on a single line at least:
SELECT ?title (GROUP_CONCAT(?note) as ?notes)
WHERE { ?title :hasNote ?note }
GROUP BY ?title
result:
title  notes
A.     "1 2 3"
B.     "4 5"
You could then post-process the values of the ?notes variable to split them into the separate notes again.
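The post-processing step can be sketched in Python: split each concatenated string and pad the shorter rows with None so every row has as many note columns as the title with the most notes. This assumes the results are already available as a dict from ?title to ?notes; the function name `widen` is hypothetical. GROUP_CONCAT's default separator is a single space, so if notes can contain spaces, use a different one (e.g. `GROUP_CONCAT(?note; separator="|")`) and pass it as `sep`. The order of values inside a GROUP_CONCAT is not fixed by the SPARQL spec, so column order follows whatever order the engine concatenated in.

```python
from typing import Dict, List, Optional, Tuple


def widen(grouped: Dict[str, str], sep: str = " ") -> List[Tuple[Optional[str], ...]]:
    """Split each GROUP_CONCAT string and pad rows with None to the widest title."""
    split = {title: (notes.split(sep) if notes else []) for title, notes in grouped.items()}
    width = max((len(notes) for notes in split.values()), default=0)
    return [(title, *notes, *([None] * (width - len(notes))))
            for title, notes in split.items()]


# ?title -> ?notes, as in the GROUP_CONCAT result above
print(widen({"A.": "1 2 3", "B.": "4 5"}))
# [('A.', '1', '2', '3'), ('B.', '4', '5', None)]
```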
Another solution is that instead of using a SELECT query, you use a CONSTRUCT query to give you back an RDF graph, rather than a table, and work directly with that in your code. Tables are kinda weird in an RDF world if you think about it: you're querying a graph model, why is the query result not a graph but a table?
CONSTRUCT
WHERE { ?title :hasNote ?note }
...and then process the result in whatever API you're using to do the queries.

Wikidata: an effective way to count items that share two properties

I would like to count the number of Wikidata items that have two properties at the same time, for example a VIAF ID and a BnF ID, or a LoC ID and a SUDOC ID. The first approach that comes to mind is a query like this:
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
?item wdt:P214 ?viaf.
?item wdt:P268 ?bnf.
}
But this query is inefficient (23 seconds), and applying it to 10 properties would require 45 pairwise queries, one for each pair of properties. Is there a more efficient way to perform these calculations?
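Ten properties give C(10, 2) = 45 unordered pairs. If you can obtain, for each item, the set of these identifier properties it carries (for example from one query listing items with any of them, or from a dump; on Wikidata that input can itself be large), all pair counts fall out of a single pass. A minimal Python sketch under that assumption; the function name `pair_counts`, the item data and the `P_other` placeholder are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set


def pair_counts(item_props: Dict[str, Set[str]], props: Iterable[str]) -> Counter:
    """Count, for every unordered pair of the given properties, the items that have both."""
    wanted = sorted(set(props))
    counts: Counter = Counter()
    for present in item_props.values():
        hits = [p for p in wanted if p in present]
        # combinations() yields each unordered pair once, keeping the sorted order
        counts.update(combinations(hits, 2))
    return counts


# Hypothetical input: item -> identifier properties it carries.
items = {
    "Q1": {"P214", "P268"},
    "Q2": {"P214", "P268", "P_other"},
    "Q3": {"P214"},
}
print(pair_counts(items, ["P214", "P268", "P_other"]))
```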

How to get all the entities that do not have a given attribute?

I need to formulate a SPARQL query that returns all entities that have a given number of values for a given attribute. For example, I want all the countries that border exactly two other countries.
I might also want to find all countries that do not border any other country (so the number of values of the attribute "hasBorderWith" is zero). In this context, it is not clear to me whether there is a difference between the following two cases:
An entity has zero values for the given attribute.
An entity does not have the given attribute.
For example, I can imagine that a country that does not border any other country has no "hasBorderWith" attribute at all. Will that cause a problem?
There are a couple of questions embedded here. To find countries bordered by exactly two countries, you'd need to group by the country match and get the count. Then use HAVING, which is executed after the aggregate has been calculated to filter by the count criteria:
SELECT ?country (count(?bordered) AS ?borderCount)
WHERE {
?country a :Country .
?country :hasBorderWith ?bordered
} GROUP BY ?country
HAVING (COUNT(?bordered) = 2)
For the second question, I don't see a difference between 0 and no property, and this can be computed with a negation query:
SELECT ?country
WHERE {
?country a :Country .
FILTER NOT EXISTS {
?country :hasBorderWith ?x
}
}
EDIT: to find a count of 0
Per the questions and #ASKW's suggestion, the following would get a count of 0 if there are no hasBorderWith properties:
SELECT ?country (count(?bordered) AS ?borderCount)
WHERE {
?country a :Country .
OPTIONAL {
?country :hasBorderWith ?bordered
}
} GROUP BY ?country
HAVING (COUNT(?bordered) = 0)
The OPTIONAL clause allows the match to occur, but will not contribute to the count(?bordered) aggregate if ?bordered is not bound, hence members of :Country without a :hasBorderWith property will get a count of 0.
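The counting rule can be mimicked in a few lines of Python: each row stands for one solution of the OPTIONAL pattern, with None where ?bordered is unbound, and the count per country adds 1 only for bound values. This is an illustrative model of the semantics, not how an engine evaluates the query; the function name and the country identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def border_counts(rows: List[Tuple[str, Optional[str]]]) -> Dict[str, int]:
    """Mimic GROUP BY ?country with COUNT(?bordered): unbound values (None),
    as produced by an OPTIONAL that did not match, are not counted."""
    counts: Dict[str, int] = defaultdict(int)
    for country, bordered in rows:
        counts[country] += 0 if bordered is None else 1  # += 0 still creates the group
    return dict(counts)


rows = [
    ("ex:CountryA", "ex:CountryB"),
    ("ex:CountryA", "ex:CountryC"),
    ("ex:Island", None),  # OPTIONAL matched nothing: ?bordered unbound
]
counts = border_counts(rows)
exactly_two = [c for c, n in counts.items() if n == 2]  # HAVING (COUNT(?bordered) = 2)
no_borders = [c for c, n in counts.items() if n == 0]   # HAVING (COUNT(?bordered) = 0)
print(exactly_two, no_borders)
```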

How to form SPARQL queries that refers to multiple resources

This question is a follow-up to my first question about SPARQL here.
My SPARQL query results for Mountain objects are here.
From those results I picked a certain object resource.
Now I want to get values of "is dbpedia-owl:highestPlace of" records for this chosen Mountain object.
That is, the names of the mountain ranges of which this mountain is the highest place.
This seems complex to me, not only because I do not know the required syntax, but also because I get two objects here:
One of them is Mont Blanc massif, which is of type "place".
The other one is Western Alps, which is of type "mountain range", the record I want.
I need record 2 above but not record 1. I know 1 is also relevant, but sometimes it doesn't follow the same pattern. Sometimes the records appear to be of a YAGO type, which can be totally misleading. To be safe, I simply want to discard those records whenever there is a type mismatch.
How can I form my SPARQL query to get these "is dbpedia-owl:highestPlace of" records and also have the type filtering?
You can use this query. Note, however, that Mont_Blanc_massif in your example is both a dbpedia-owl:Place and a dbpedia-owl:MountainRange:
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
}
edit after comment: filter
It is not really clear what you want to filter (YAGO types?), but technically you can filter like this, for example:
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
FILTER NOT EXISTS {
?place ?pred ?obj
FILTER (regex(str(?obj), "yago"))
}
}
This filters out results that have any object whose string value (for example its IRI) contains 'yago'.
Extending the result from the previous answer, the appropriate query would be
select * where {
?mountain a dbpedia-owl:Mountain ;
dbpedia-owl:abstract ?abstract ;
foaf:depiction ?depiction .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
FILTER(langMatches(lang(?abstract),"EN"))
}
LIMIT 10
This selects mountains with English abstracts that have at least one depiction (or else the pattern wouldn't match) and for which there is some mountain range of which the mountain is the highest place. Without the parts from the earlier question, if you just want to retrieve mountains that are the highest place of a range, you can use a query like this:
select * where {
?mountain a dbpedia-owl:Mountain .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
}
LIMIT 10