SPARQL: filter by number of objects returned for a given predicate

In the following query I'm trying to get a list of all entries ?s that include more than 3 objects for the predicate sctap:mentionedBy. However, I keep getting a malformed query error for this search. Does anyone see anything wrong with my query?
Thanks
SELECT ?s
WHERE {
?s sctap:mentionedBy ?o
FILTER (count(?o) > 3)
}
The SPARQL error says: "Aggregate expression not legal". I'm not sure what that means.

Does anyone see anything wrong with my query?
Sure. Just like the error message says, you're using an aggregate expression (count(?o)) where one isn't legal. You can see in the table of contents of SPARQL 1.1 Query Language what things are filter functions that you can use in a filter, and what things are aggregates, and where you can use each. You can also try parsing queries at sparql.org's query validator. For your query, it will give you the line and column numbers where something went wrong. It's at count(?o).
In this case, you're trying to count the number of ?o values for each ?s, which means that you need to group by ?s and move the condition farther out, from FILTER into a HAVING clause. E.g.,
select ?s where {
?s sctap:mentionedBy ?o
}
group by ?s
having (count(?o) > 3)
It's unlikely to make a difference in this case, but you probably only want to count distinct values of ?o, so you could also consider:
select ?s where {
?s sctap:mentionedBy ?o
}
group by ?s
having (count(distinct ?o) > 3)
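To make the grouping step concrete, here is a minimal Python sketch of what GROUP BY ?s with HAVING computes over a list of (?s, ?o) solution rows. The rows and the `ex:` names are made-up sample data, and the function name is hypothetical; this only mimics the semantics and is not how any SPARQL engine is implemented.

```python
from collections import defaultdict

def subjects_with_more_than(rows, threshold, distinct=False):
    """Mimic: SELECT ?s ... GROUP BY ?s HAVING (COUNT([DISTINCT] ?o) > threshold)."""
    groups = defaultdict(list)
    for s, o in rows:
        groups[s].append(o)  # GROUP BY ?s
    result = []
    for s, objs in groups.items():
        # COUNT(DISTINCT ?o) ignores repeated values, COUNT(?o) does not
        n = len(set(objs)) if distinct else len(objs)
        if n > threshold:  # HAVING (... > threshold)
            result.append(s)
    return sorted(result)

# Hypothetical solution rows; ex:b has a repeated ?o value.
rows = [
    ("ex:a", "ex:w1"), ("ex:a", "ex:w2"), ("ex:a", "ex:w3"), ("ex:a", "ex:w4"),
    ("ex:b", "ex:w1"), ("ex:b", "ex:w1"), ("ex:b", "ex:w2"), ("ex:b", "ex:w3"),
    ("ex:c", "ex:w1"),
]

plain = subjects_with_more_than(rows, 3)
unique = subjects_with_more_than(rows, 3, distinct=True)
```

With this sample data, `plain` contains both ex:a and ex:b, while `unique` contains only ex:a, because ex:b has just three distinct values. The condition is evaluated once per group, after all rows are collected, which is why it cannot live in a per-row FILTER.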

Related

How to filter the simple Subject in a SPARQL Query

I guess I am stuck at the basics with SPARQL. Can someone help?
I simply want to filter all subjects containing "Mountain" in an RDF database.
Prefix lgdr:<http://linkedgeodata.org/triplify/> Prefix lgdo:<http://linkedgeodata.org/ontology/>
Select * where {
?s ?p ?o .
filter (contains(?s, "Mountain"))
} Limit 1000
The query leads to an error:
Virtuoso 22023 Error SL001: The SPARQL 1.1 function CONTAINS() needs a string value as first argument
You can get it to "work" using:
Prefix lgdr:<http://linkedgeodata.org/triplify/> Prefix lgdo:<http://linkedgeodata.org/ontology/>
Select * where {
?s ?p ?o .
filter (contains(str(?s), "Mountain"))
} Limit 1000
Note the additional str in the query.
However, that results in
Virtuoso S1T00 Error SR171: Transaction timed out
and I am not sure how to deal with that.
But in principle it works: when you use
Limit 1
you get
s p o
http://linkedgeodata.org/ontology/MountainRescue http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2002/07/owl#Class

Non-group key variable in SELECT

I can successfully run the following query on the Virtuoso web interface (e.g. http://live.dbpedia.org/sparql):
SELECT ?o (COUNT(?member) as ?memberCount) WHERE {
?member <http://purl.org/dc/terms/subject> ?o.
FILTER isIRI(?o) {
SELECT ?o WHERE {
<http://dbpedia.org/resource/Heroic_Purgatory>
<http://purl.org/dc/terms/subject>
?o.
}
}
}
ORDER BY ?memberCount
LIMIT 1
When I run this query through Apache Jena, an exception is raised:
Non-group key variable in SELECT: ?o
I don't understand why... any suggestion?
Adding GROUP BY ?o before ORDER BY solved my issue.
The original query should still be illegal, even though Virtuoso accepts it (and it is illegal in current Apache Jena).
SELECT can only contain GROUP BY variables and aggregations.
When there is no GROUP BY but there is an aggregation (COUNT here), there is an implicit GROUP BY over the whole of the results with no GROUP BY key, so no variable can be mentioned directly; only aggregates are possible.
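The implicit grouping can be illustrated with a small Python sketch. The rows, the `ex:` and `cat:` names, and the function name are all hypothetical; the code only imitates how rows are bucketed by the grouping key, under the assumption that each row is a dict of variable bindings.

```python
from collections import defaultdict

def count_rows(rows, group_keys=()):
    """Count rows per group; an empty key tuple puts every row in one group."""
    groups = defaultdict(int)
    for row in rows:
        key = tuple(row[k] for k in group_keys)  # () when there is no GROUP BY
        groups[key] += 1
    return dict(groups)

# Hypothetical bindings for ?member and ?o.
rows = [
    {"member": "ex:m1", "o": "cat:A"},
    {"member": "ex:m2", "o": "cat:A"},
    {"member": "ex:m3", "o": "cat:B"},
]

implicit = count_rows(rows)          # like COUNT(?member) with no GROUP BY
by_o = count_rows(rows, ("o",))      # like COUNT(?member) ... GROUP BY ?o
```

In `implicit` there is a single group whose key is empty, so there is no single value of ?o that could be reported next to the count. In `by_o` each group carries its own ?o value, which is why ?o becomes legal in SELECT once it is a GROUP BY key.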

Good SPARQL query to find all triples with a resource as subject or object

I need to find all triples on DBpedia where http://dbpedia.org/resource/Benin is a subject or object. This query gives me the output that I want in a format that works the best for me (just three variables and no blank spaces):
PREFIX : <http://dbpedia.org/resource/>
SELECT * WHERE {
?s ?p ?o
FILTER (?s=:Benin OR ?o=:Benin)
}
I get similar results if I have this query:
PREFIX : <http://dbpedia.org/resource/>
SELECT * WHERE {
{:Benin ?p ?o}
UNION
{?s ?p :Benin}
}
However, the formatting of the latter is off. It first gives me p and o output leaving s blank and then s and p leaving o blank. Also, the first query takes more time to execute. I will be grateful for an explanation of the mechanics of how the two queries work and why there is a difference in the output.
However, the formatting of the latter is off
That's because the two branches of the UNION bind different variables, and SELECT * projects all of them. The first branch binds only ?p and ?o and the second only ?s and ?p, so each row leaves one column unbound, which gives the skewed output.
You can resolve the problem by explicitly listing and selecting the variables:
PREFIX : <http://dbpedia.org/resource/>
SELECT ?s ?p ?o WHERE {
{
?s ?p ?o
FILTER (?s=:Benin)
}
UNION
{
?s ?p ?o .
FILTER (?o=:Benin)
}
}
Note that this is still much faster on DBpedia than the OR filter.
The union will return duplicates when a tuple matches both filter expressions (i.e. :Benin ?p :Benin).
SELECT DISTINCT would remedy that at additional cost and since it looks like the problem is non-existent, I omitted it for improved performance.
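The duplicate behaviour is easy to see in a small Python sketch that imitates the two query shapes over an in-memory list of triples. The triples, the predicate names, and the function names are invented for illustration, and the self-referencing triple is there only to trigger the duplicate.

```python
def match_union(triples, node):
    """Imitate { ?s ?p ?o FILTER(?s = node) } UNION { ?s ?p ?o FILTER(?o = node) }."""
    left = [t for t in triples if t[0] == node]
    right = [t for t in triples if t[2] == node]
    return left + right  # UNION concatenates both branches, duplicates included

def match_or(triples, node):
    """Imitate a single pattern with FILTER(?s = node || ?o = node)."""
    return [t for t in triples if t[0] == node or t[2] == node]

def distinct(rows):
    """Imitate SELECT DISTINCT while keeping first-seen order."""
    return list(dict.fromkeys(rows))

# Hypothetical triples; the third one has :Benin as both subject and object.
triples = [
    (":Benin", ":capital", ":Porto-Novo"),
    (":Cotonou", ":country", ":Benin"),
    (":Benin", ":related", ":Benin"),
    (":Togo", ":capital", ":Lome"),
]

union_rows = match_union(triples, ":Benin")
or_rows = match_or(triples, ":Benin")
```

Here `union_rows` has four rows because the self-referencing triple matches both branches, while `or_rows` has three. Applying `distinct(union_rows)` brings the union back to the same set of rows as the single-filter form.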
Also, the first query takes more time to execute.
That's hard to say without the result of an EXPLAIN(), but my first guess would be that the equality filter is using the index, while the OR filter is using a full table scan. Virtuoso does not seem to generate good query plans for nested filters.
Try this --
PREFIX : <http://dbpedia.org/resource/>
DESCRIBE :Benin
-- or just --
DESCRIBE <http://dbpedia.org/resource/Benin>
You can get the output in various other serializations, including N-Triples.

Japanese Virtuoso SPARQL EndPoint times out very quickly

I am trying to use the following query on Japanese dbpedia SPARQL Endpoint
select ?s (group_concat(?album_ja ; separator = "|") AS ?name_album_ja_csv) where{
values ?sType { dbpedia-owl:Song
dbpedia-owl:Single
} .
?s a ?sType .
?s (dbpedia-owl:album|^dbpedia-owl:album)* ?albums;rdfs:label ?album_ja
}group by ?s order by ?s offset 0 limit 10
but I get this error: "Virtuoso 42000 Error The estimated execution time 1005 (sec) exceeds the limit of 400 (sec)". Almost any query involving group by has this problem. Is this a server problem? Is my query inefficient? How can I get around it?
I'm not sure exactly what you're trying to do, but since ?o isn't used in the results, you can get rid of it. Even after you do that, though, you'll still have the same problem. You need to change the property path somehow. I don't think that you actually need arbitrary length paths in both directions. You could use ? instead of * to say "path of length 0 or 1" and thus:
select ?s
(group_concat(?album_ja ; separator = "|") AS ?name_album_ja_csv)
where {
values ?sType { dbpedia-owl:Song dbpedia-owl:Single }
?s a ?sType .
?s (dbpedia-owl:album|^dbpedia-owl:album)? ?album_ja
}
group by ?s
limit 100
Note that a path of length 0 is going to be a link to itself, so one value of ?album_ja is always going to be the same as ?s. Is this really what you want?
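The effect of the zero-length case can be sketched in Python. The edges stand in for dbpedia-owl:album links as (subject, object) pairs, all `ex:` names are made up, and the function name is hypothetical; this is only an illustration of what a path of the form (p|^p)? can reach from one start node.

```python
def zero_or_one(edges, start):
    """Nodes reachable from start via (p|^p)?: zero steps, or one step in either direction."""
    reached = {start}  # the zero-length path always matches the start node itself
    for s, o in edges:
        if s == start:
            reached.add(o)  # one step forward along p
        if o == start:
            reached.add(s)  # one step backward, i.e. along ^p
    return reached

# Hypothetical album links: (subject, object) pairs.
edges = [
    ("ex:song1", "ex:album1"),
    ("ex:song2", "ex:album1"),
    ("ex:cover1", "ex:song1"),
]

from_song1 = zero_or_one(edges, "ex:song1")
from_isolated = zero_or_one(edges, "ex:song9")
```

For ex:song1 the result includes ex:song1 itself alongside ex:album1 and ex:cover1, and a node with no links at all still yields itself. If the start node should not appear among the results, the zero-length case has to be excluded, for example by using the path without the `?` modifier.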

SPARQL Query problem -> wrong answer

I want to select a triple using SPARQL. To do so, I'm using the following query:
SELECT count (*)
WHERE {?s ?p ?o}
FILTER (?s=http://kjkhlsa.net && ?p=http://lkasdjlkjas.com && ?o=Test)
As an answer I get a completely wrong triple :( The subject is not equal to "http://kjkhlsa.net", the predicate is not equal to "http://lkasdjlkjas.com" and the object is not equal to "Test". Can someone explain what I'm doing wrong? :(
edit1:
I have put the query into a PHP file:
$inst_query = 'SELECT * { <http://kjkhlsa.net> <http://lkasdjlkjas.com> "Test"}';
echo $inst_query;
The output of the echo was "SELECT * { "Test"}". Then I tried it with WHERE:
$inst_query = 'SELECT * WHERE { <http://kjkhlsa.net> <http://lkasdjlkjas.com> "Test"}';
echo $inst_query;
Here the output was "SELECT * WHERE { "Test"}", so I'm missing the URIs, but this seems to me to be a PHP issue and not a SPARQL problem.
edit2:
I've put the query into the SPARQL query editor and I get the response "no result", but I'm sure that I have this triple.
In its current form the question is not very clear (see my comment above).
Since you are essentially trying to get triples matching a pattern, it is more efficient to use a graph pattern instead of FILTER. Many SPARQL implementations first match candidate triples by graph patterns and only then apply the FILTER expression. In essence, with a ?s ?p ?o graph pattern, you're doing a linear scan over all your triples.
So, here's something that should work, using graph patterns instead of FILTER.
SELECT * { <http://kjkhlsa.net> <http://lkasdjlkjas.com> "Test" }
Notes: I left out COUNT(*). Aggregates were not part of SPARQL 1.0, and in SPARQL 1.1 the expression has to be written as (COUNT(*) AS ?count). Put <> around URIs and "" around literals.
Try this:
SELECT (COUNT(*) AS ?count)
WHERE {
?s ?p ?o
FILTER (?s=<http://kjkhlsa.net> && ?p=<http://lkasdjlkjas.com> && ?o="Test")
}
The following query uses the COUNT function to count the number of distinct URIs bound to the ?s variable. The FILTER has to sit inside the WHERE braces.
SELECT (COUNT(DISTINCT ?s) AS ?count)
WHERE {
?s ?p ?o
FILTER (?o="Test")
}