Duplicates in SPARQL clause VALUES lead to unexplainable result

I accidentally noticed that if you write a SPARQL query like this:
SELECT ?id ?idLabel WHERE{
VALUES ?id { wd:Q1 wd:Q1 }.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
the result is strange: four rows instead of the expected two.
I realize it's pointless to write duplicates in the VALUES clause, but I'm wondering why it behaves like this. Could someone explain, please?

It's a bug in the non-standard label service. When you run this query, you get two results for the wikibase:label of wd:Q1, and these are then joined to each wd:Q1 binding. So with 2 identical values you get 4 rows, with 3 identical values you get 9, and so on.
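
The row counts described above can be reproduced with a small in-memory model. This is only a sketch of the behaviour the answer describes, not the label service's actual implementation: the function names and the placeholder label are made up for illustration, and it assumes the service emits one label row per incoming binding.

```python
from itertools import product


def join(left, right, key):
    """Naive inner join on `key` that keeps duplicates (bag semantics)."""
    return [{**l, **r} for l, r in product(left, right) if l[key] == r[key]]


def simulate(n_duplicates):
    # n identical bindings, as produced by VALUES ?id { wd:Q1 wd:Q1 ... }
    values_rows = [{"id": "wd:Q1"} for _ in range(n_duplicates)]
    # Assumption: one label row per incoming binding, so duplicates are
    # duplicated on the label side as well. The label text is a placeholder.
    label_rows = [{"id": r["id"], "idLabel": "<label of Q1>"} for r in values_rows]
    # Joining n duplicate rows with n duplicate label rows gives n * n rows.
    return join(values_rows, label_rows, "id")


print(len(simulate(2)))  # 4
print(len(simulate(3)))  # 9
```

Under that assumption the result grows quadratically with the number of repeated values, which matches the 4 and 9 rows mentioned in the answer.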

Related

Wikidata: order by last updated item in specific date time range

I noticed that when a query does not have an 'order by' statement at the end, the results are ordered in a last-updated-first manner.
SELECT ?subjectLabel ?subject WHERE {
?subject wdt:P31 wd:Q11424.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 50
What if I wanted to control this and get all items that have been updated during the month of April 2021 only? I would like to limit the results in some way but I can't find the proper way of doing that. Thanks for your help!

SPARQL query for semantic similarity in Wikidata times out

I'd like to find entities "similar" to John Harrison in Wikidata. My naive SPARQL query always times out.
SELECT ?similar ?similarLabel (COUNT(?p) AS ?similarity) WHERE {
wd:Q314335 ?p ?o.
?similar ?p ?o.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?similar ?similarLabel
HAVING (?similarity > 5)
ORDER BY DESC(?similarity)
I've tried limiting the number of properties from a sub query, but it still times out.
Is there a more efficient SPARQL query that might succeed? Are there any other Blazegraph extensions (e.g., GAS) that might help here?

Wikidata SPARQL query returns wrong results

Why does this query return Gabriel Heinze and not Cristiano Ronaldo? Both satisfy the criteria.
SELECT DISTINCT ?person ?personLabel WHERE {
?person wdt:P54 wd:Q18656.
?person wdt:P54 wd:Q75729.
?person wdt:P54 wd:Q8682.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

When you query the clubs Cristiano Ronaldo has been a member of, using wdt as the property prefix returns only Real Madrid FC, because that statement has a higher rank and wdt only returns the highest-ranked statements.
Unfortunately, there is no direct substitute for wdt that includes lower-ranked statements, but you can use a combination of p and ps:
the fixed query to find all clubs Ronaldo has been part of
your query fixed
Thanks for asking: researching, I finally learned what the t stands for in wdt: truthy \o/
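
To illustrate why the wdt lookup misses some clubs, here is a minimal Python model of "truthy" selection. It is a sketch under assumptions: the statement list, the ranks given to the two unnamed clubs, and the function names are hypothetical, and only the preferred rank for Real Madrid follows from the answer above.

```python
# Rank order used by this sketch; "deprecated" statements are never truthy.
RANK_ORDER = {"preferred": 2, "normal": 1}


def truthy(statements):
    """Values a wdt:-style lookup would see: only the best available rank."""
    usable = [s for s in statements if s["rank"] in RANK_ORDER]
    if not usable:
        return []
    best = max(RANK_ORDER[s["rank"]] for s in usable)
    return [s["value"] for s in usable if RANK_ORDER[s["rank"]] == best]


def all_values(statements):
    """Every statement value regardless of rank, in the spirit of p:/ps:."""
    return [s["value"] for s in statements]


# Hypothetical "member of sports team" statements for one player.
statements = [
    {"value": "Real Madrid", "rank": "preferred"},
    {"value": "other club 1", "rank": "normal"},
    {"value": "other club 2", "rank": "normal"},
]

print(truthy(statements))      # ['Real Madrid']
print(all_values(statements))  # ['Real Madrid', 'other club 1', 'other club 2']
```

In SPARQL terms, the second function corresponds roughly to going through the statement node, for example ?person p:P54 ?statement . ?statement ps:P54 ?club, instead of ?person wdt:P54 ?club.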

How to force virtuoso sparql endpoint return full answer?

I want to query DBpedia using Virtuoso. For some queries with very large result sets, it returns only part of the results. For example, in the query below, the predicate http://dbpedia.org/ontology/birthplace is missing. Is there any way to get all results, either from Virtuoso or from any other endpoint?
SELECT DISTINCT ( ?p AS ?outEdge )
( ?q AS ?inEdge )
( ?px AS ?dest )
( ?qx AS ?source )
WHERE {
{ <http://dbpedia.org/resource/England> ?p ?px . }
UNION
{ ?qx ?q <http://dbpedia.org/resource/England> . }
}

While I don't detect anything malicious or mean-spirited in your question, you're essentially asking how to circumvent DBpedia's defenses against intentional and unintentional denial of service attacks. Internal limits help to ensure that no single query consumes too many resources. The right way to get all the results from a SPARQL query, if they aren't all returned at once, is to use limit, offset, and order by, and to issue multiple queries. E.g.,
#-- get first 10 results
select ... where ...
order by ?name
limit 10 offset 0
#-- get next 10 results
select ... where ...
order by ?name
limit 10 offset 10
#-- get more results
select ... where ...
order by ?name
limit 10 offset 20
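
The paging pattern above can be driven from a small client-side loop. The following Python sketch assumes a hypothetical run_query(limit, offset) callable that sends the ordered query to the endpoint and returns a list of rows; here it is replaced by an in-memory stand-in so the example runs without network access.

```python
def fetch_all(run_query, page_size=10):
    """Collect all rows by requesting LIMIT/OFFSET pages until one is short."""
    rows, offset = [], 0
    while True:
        page = run_query(limit=page_size, offset=offset)
        rows.extend(page)
        # A page shorter than page_size (possibly empty) means we are done.
        if len(page) < page_size:
            return rows
        offset += page_size


# Hypothetical stand-in for an endpoint: 25 rows in a stable order,
# which is what ORDER BY provides in the real queries.
DATA = sorted(f"name{i:02d}" for i in range(25))


def fake_endpoint(limit, offset):
    return DATA[offset:offset + limit]


all_rows = fetch_all(fake_endpoint)
print(len(all_rows))  # 25
```

The stable ordering matters: without order by, an endpoint is free to return rows in a different order for each request, so pages could overlap or skip results.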

How to retrieve blank nodes from DBpedia in SPARQL, and explaining reduced results with DISTINCT

I want to retrieve blank nodes with a SPARQL query. I am using DBpedia as my dataset. For example, when I use the following query, I got a count of about 3.4 million results.
PREFIX prop:<http://dbpedia.org/property/>
select count(?x) where {
?x prop:name ?y
}
When I use the DISTINCT solution modifier, I get approximately 2.2 million results.
PREFIX prop:<http://dbpedia.org/property/>
select count(DISTINCT ?x) where {
?x prop:name ?y
}
I have two questions:
Are the 1.2 million records eliminated in the second query duplicates or blank nodes or something else?
How can I retrieve blank nodes and their values from DBpedia?

Getting Blank Nodes
A query like this could be used to retrieve (up to 10) blank nodes:
select ?bnode where {
?bnode ?p ?o
filter(isBlank(?bnode))
}
limit 10
However, I get no results. It doesn't look like there are blank nodes (as subjects, anyhow) in the DBpedia data.
Using DISTINCT and duplicate results
The reason that your queries return a different number of results is that some values of ?x have more than one name. A query like your first one:
select count(?x) where { ?x prop:name ?y }
on data like:
<somePerson> prop:name "Jim" .
<somePerson> prop:name "James" .
would produce 2, since there are two ways to match ?x prop:name ?y. ?x is bound to <somePerson> in both of them, but ?y is bound to different names. In a query like your second one:
select count(DISTINCT ?x) where { ?x prop:name ?y }
you're explicitly only counting the distinct values of ?x, and there's only one of those in my sample data. This is one way that you can end up with different numbers of results, and it doesn't require any blank nodes.
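
The same counting difference can be checked outside SPARQL. This Python sketch mirrors the two-triple sample data from the answer; the variable names are illustrative only.

```python
# The two sample triples from the answer, as (subject, predicate, object).
triples = [
    ("<somePerson>", "prop:name", "Jim"),
    ("<somePerson>", "prop:name", "James"),
]

# One entry per match of the pattern ?x prop:name ?y.
subjects = [s for s, p, o in triples if p == "prop:name"]

count_all = len(subjects)            # like count(?x)
count_distinct = len(set(subjects))  # like count(DISTINCT ?x)

print(count_all, count_distinct)  # 2 1
```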
