The following SPARQL query returns 20 results. I was expecting 10, given the OFFSET and LIMIT:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpedia:<http://dbpedia.org/resource/>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
SELECT ?person_id ?person2_id
WHERE {
{
SELECT DISTINCT ?person_id ?person2_id WHERE {
?person rdf:type dbpedia-owl:Person .
?person2 rdf:type dbpedia-owl:Person .
?person ?link ?person2 .
?person dbpedia-owl:wikiPageID ?person_id .
?person2 dbpedia-owl:wikiPageID ?person2_id .
FILTER (?link = dbpedia-owl:wikiPageWikiLink) .
} ORDER BY ?link
}
} OFFSET 10 LIMIT 10
I am running the query against the SPARQL endpoint of an OpenLink Virtuoso server.
What is the problem with the query?
@jordipala said:
The clause that makes the query behave strangely is ORDER BY ?link. Replacing it with ORDER BY ?person_id makes everything work as expected. It still makes no sense to me, but I am a SPARQL newbie too.
Part of the issue is that the ?link values aren't string literals, though they may appear to be if you include that variable in the SELECT clauses. (Also note that the ?link values are all the same for these solutions, so you definitely need to put something else into the ORDER BY such that it does the desired job of preventing both duplicate solutions and omitted solutions.)
Also, counterintuitively given the ?link datatype, the numeric-appearing person_id and person2_id are not typed as numbers -- they're strings, unless you force their datatype.
If you simply change ?link to str(?link) in the ORDER BY, the query will deliver the desired 10 rows (and you may note that all the ?link values are identical!). If you also include person_id and person2_id in the ORDER BY (done in my following links by using the ordinals of the SELECT variables, because of where and how I've coerced the datatypes), you'll get more useful output, as in this query and these results.
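To make the point about ties in the ORDER BY concrete, here is a toy Python model, not anything Virtuoso actually runs. It assumes two page requests may see the rows arrive in different orders; the 20 integer "rows" and both arrival orders are invented for illustration.

```python
# Toy model of paging over an ORDER BY whose key is the same for every row.
# The rows and both arrival orders are made up for illustration.
rows = list(range(20))


def constant_key(row):
    """Like ORDER BY ?link when every ?link value is identical."""
    return 0


def unique_key(row):
    """Like ORDER BY ?person_id ?person2_id: no two rows tie."""
    return row


def page(arrival_order, sort_key, offset, limit):
    """Sort the rows as they happened to arrive, then slice out one page."""
    return sorted(arrival_order, key=sort_key)[offset:offset + limit]


# Assumption: the second request sees the rows in a different order.
first_request = rows
second_request = list(reversed(rows))

# A constant key cannot repair the difference, so the two pages collide.
tied = (page(first_request, constant_key, 0, 10)
        + page(second_request, constant_key, 10, 10))
# A unique key sorts every request into the same sequence.
total = (page(first_request, unique_key, 0, 10)
         + page(second_request, unique_key, 10, 10))

print(len(set(tied)))   # 10 distinct rows: ten duplicated, ten never returned
print(len(set(total)))  # 20 distinct rows: each returned exactly once
```

This only shows why a tie-free sort key is needed for OFFSET and LIMIT to page reliably; it does not by itself explain the 20 rows seen in the question.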
Related
I'm running the following SPARQL query on DBpedia (in fact I'm running a similar CONSTRUCT query via rdflib, see below in the edited section):
SELECT *
WHERE {
{ ?influencer dbo:influenced ?influencee .}
UNION
{ ?influencee dbo:influencedBy ?influencer .}
?influencer rdf:type dbo:Person .
?influencee rdf:type dbo:Person .
}
The above query almost works, except that some (a small number of) triples are missing.
E.g. the following relation is missing:
<http://dbpedia.org/resource/Plato> --> <http://dbpedia.org/resource/Aristotle>
Yet, we can see the above relation really should be included, e.g. by manually examining the Aristotle entry on DBpedia and looking at the dbo:influencedBy section.
What's worse is that if I augment the above code with some FILTER() expressions to limit the number of returned tuples, I do get this missing relation in return:
SELECT *
WHERE {
{?influencer dbo:influenced ?influencee .}
UNION
{?influencee dbo:influencedBy ?influencer .}
?influencer rdf:type dbo:Person .
?influencee rdf:type dbo:Person .
FILTER(regex(?influencer, "Plato"))
FILTER(regex(?influencee, "Aristo"))
}
Edit 2022-07-02: I'm aware of the 10k query result limit imposed by the DBpedia backend, yet I believe this limit is not interfering here (as hinted by TallTed below). This is because I'm actually using rdflib to run the query, and in it I'm using a CONSTRUCT rather than a SELECT clause:
>>> import rdflib
>>> g = rdflib.Graph()
>>> query = """
... PREFIX schema: <http://schema.org/>
... PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
... PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
... PREFIX dbo: <http://dbpedia.org/ontology/>
... PREFIX dbp: <http://dbpedia.org/property/>
... PREFIX dbr: <http://dbpedia.org/resource/>
...
... CONSTRUCT {
... ?influencer dbo:influenced ?influencee .
... }
... WHERE {
... SERVICE <https://dbpedia.org/sparql/query> {
... {?influencer (dbo:influenced|dbo:influences) ?influencee .}
... UNION
... {?influencee dbo:influencedBy ?influencer .}
...
... ?influencer rdf:type dbo:Person .
... ?influencee rdf:type dbo:Person .
... }
... }"""
>>> qres = g.query(query)
>>> len(qres)
9464
And this query returns fewer than 10k results...
EDIT: 2022-07-02, part2:
Interestingly, running the above code with the following selector:
SELECT ?influencer ?influencee
instead of the CONSTRUCT, does indeed return 10000 results, suggesting that I'm hitting the limit.
So the question really is: why does my CONSTRUCT clause return far fewer results than the SELECT clause?
Thanks!
A quick check of the COUNT(*) on your initial query shows that there are 19,527 solutions.
It is not surprising that you don't get them all when you run your query, as there's a 10,000 solution limit on that public instance -- and it's not "a small number of" triples that's missing!
To get all solutions, you'll need to add a few clauses -- i.e., ORDER BY ?influencer ?influencee , LIMIT 10000, and OFFSET 0 -- and then run the query to get the first 10,000 solutions. Then, change the OFFSET clause to OFFSET 10000 and run the query again, to get the remaining 9,527.
Of course, you'll probably also want to add a DISTINCT to your SELECT, as there's probably no point in repeated statements of influence.
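The ORDER BY / LIMIT / OFFSET paging described above could be sketched in Python roughly like this. The helper names (paged_query, fetch_all) and the fake endpoint are my own inventions for illustration; a real run would replace fake_endpoint with an actual call to the SPARQL endpoint.

```python
import re
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, str]


def paged_query(base_query: str, order_by: str, limit: int, offset: int) -> str:
    """Append ORDER BY, LIMIT and OFFSET to a query that has none of them yet."""
    return f"{base_query.rstrip()}\nORDER BY {order_by}\nLIMIT {limit}\nOFFSET {offset}"


def fetch_all(run_query: Callable[[str], List[Row]], base_query: str,
              order_by: str, page_size: int = 10000) -> Iterator[Row]:
    """Request page after page until the endpoint returns a short page."""
    offset = 0
    while True:
        rows = run_query(paged_query(base_query, order_by, page_size, offset))
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size


# Stand-in for a real endpoint (hypothetical): 19,527 fake solutions, sliced
# according to the LIMIT and OFFSET found in the query text.
FAKE_SOLUTIONS: List[Row] = [(f"influencer{i}", f"influencee{i}") for i in range(19527)]
requests_made: List[str] = []


def fake_endpoint(query: str) -> List[Row]:
    requests_made.append(query)
    limit = int(re.search(r"LIMIT (\d+)", query).group(1))
    offset = int(re.search(r"OFFSET (\d+)", query).group(1))
    return FAKE_SOLUTIONS[offset:offset + limit]


base = "SELECT DISTINCT ?influencer ?influencee WHERE { ... }"  # pattern elided
solutions = list(fetch_all(fake_endpoint, base, "?influencer ?influencee"))
print(len(solutions), len(requests_made))  # 19527 2
```

Stopping on the first short page means one extra, empty request when the total happens to be an exact multiple of the page size, which seemed an acceptable trade for not needing a separate COUNT(*) query.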
I'm running a SPARQL query on this site.
It gives me an empty result. What is wrong with my query?
prefix foaf: <http://xmlns.com/foaf/0.1/>
select * where {
?s rdf:type foaf:Person.
} LIMIT 100
This query is OK, but when I add a second pattern, I get an empty result.
?s foaf:name 'Abraham_Robinson'.
prefix foaf: <http://xmlns.com/foaf/0.1/>
select * where {
?s rdf:type foaf:Person.
?s foaf:name 'Abraham_Robinson'.
} LIMIT 100
How can I correct my query so that the result includes this record:
http://dbpedia.org/resource/Abraham_Robinson
I guess Kingsley misread this as a freetext search, rather than a simple string comparison.
The query Kingsley posted and linked earlier delivers no solution, for the same reasons as the original query failed, as identified in the comment by AKSW, i.e. --
foaf:name values don't generally replace spaces with underscores, as the original value did; i.e., 'Abraham_Robinson' should have been 'Abraham Robinson'
foaf:name strings are typically langtagged, as this one is, so the value actually needs to be 'Abraham Robinson'@en
Incorporating AKSW's fixes with this line, ?s foaf:name 'Abraham Robinson'@en., the query works.
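Why both fixes are needed can be shown with a very simplified Python stand-in for an RDF literal. The Literal class here is hypothetical, not any library's API, and the stored value is assumed to be the English-tagged name discussed above.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Simplified RDF literal: the lexical form plus an optional language tag."""
    lexical: str
    lang: Optional[str] = None


# Assumed stored value, per the fixes described above: 'Abraham Robinson'@en
stored = Literal("Abraham Robinson", "en")

# A literal written into a triple pattern matches only if both parts are equal.
print(stored == Literal("Abraham_Robinson"))        # False: underscore, no tag
print(stored == Literal("Abraham Robinson"))        # False: right text, no tag
print(stored == Literal("Abraham Robinson", "en"))  # True

# Comparing only the lexical form, much as FILTER(str(?name) = "...") would,
# ignores the language tag.
print(stored.lexical == "Abraham Robinson")         # True
```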
All that said, you may prefer an alternative query, which will deliver results whether or not the foaf:name value is langtagged and whether or not the spaces are replaced by underscores. This one is Virtuoso-specific, and produces results faster because the bif:contains function uses Virtuoso's free-text indexes --
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT *
WHERE
{
?s rdf:type foaf:Person ;
foaf:name ?name .
?name bif:contains "'Abraham Robinson'" .
}
LIMIT 100
Generic SPARQL using a REGEX FILTER works against both Virtuoso and other RDF stores, but produces results more slowly because REGEX does not leverage the free-text indexes, as in --
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT *
WHERE
{
?s rdf:type foaf:Person ;
foaf:name ?name .
FILTER ( REGEX ( ?name, 'Abraham Robinson' ) ) .
}
LIMIT 100
The following query should work. Right now it isn't working due to a need to refresh the text index associated with this instance (currently in progress):
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * where {
?s rdf:type foaf:Person.
?s foaf:name "'Abraham Robinson'".
}
LIMIT 100
Note how the phrase is placed within single-quotes that are within double-quotes.
If the literal content is language-tagged, as is the case in DBpedia, the exact-match query would take the form (already clarified in @TallTed's response):
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * where {
?s rdf:type foaf:Person.
?s foaf:name 'Abraham Robinson'@en.
}
LIMIT 100
This SPARQL Results Page Link should produce a solution when the index update completes.
This is a follow-up to an earlier query in which the schema.org database is used to find the number of children of a class (it is a simpler database than my application). I want to get the names of the children concatenated in alphabetical order. The query:
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?child (group_concat (?string) as ?strings)
where {
?child rdfs:subClassOf schema:Event .
?grandchild rdfs:subClassOf ?child .
bind (strafter(str(?grandchild), "http://schema.org/") as ?string)
} group by ?child order by asc(?string)
limit 20
gives
schema:PublicationEvent "OnDemandEvent BroadcastEvent"
schema:UserInteraction "UserPageVisits UserComments UserPlays UserBlocks UserDownloads UserPlusOnes UserLikes UserCheckins UserTweets"
This is not alphabetically ordered. If I change the sort order to desc, the result is exactly the same. I don't seem to understand how group by, order by and possibly bind interact.
An additional select subquery is required to push the order inside the groups:
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?child (group_concat (?string) as ?strings)
where {
select *
{
?child rdfs:subClassOf schema:Event .
?grandchild rdfs:subClassOf ?child .
bind (strafter(str(?grandchild), "http://schema.org/") as ?string)
} order by asc(?string)
} group by ?child
limit 20
18.5.1.7 GroupConcat:
The order of the strings is not specified.
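The "sort first, then group and concatenate" idea behind the subquery can be mimicked in plain Python. This is only a sketch of the intended logic. The names come from the output in the question, the input order is made up, and the sort here is by (child, name) because itertools.groupby needs each group's rows to be adjacent.

```python
from itertools import groupby

# (child, grandchild) pairs in an arbitrary order, as a store might return them.
rows = [
    ("PublicationEvent", "OnDemandEvent"),
    ("UserInteraction", "UserPageVisits"),
    ("UserInteraction", "UserComments"),
    ("PublicationEvent", "BroadcastEvent"),
    ("UserInteraction", "UserPlays"),
    ("UserInteraction", "UserBlocks"),
    ("UserInteraction", "UserDownloads"),
    ("UserInteraction", "UserPlusOnes"),
    ("UserInteraction", "UserLikes"),
    ("UserInteraction", "UserCheckins"),
    ("UserInteraction", "UserTweets"),
]


def concat_sorted(pairs):
    """Sort the rows first, then concatenate the names within each child."""
    ordered = sorted(pairs)  # plays the role of the inner ORDER BY
    return {
        child: " ".join(name for _, name in group)
        for child, group in groupby(ordered, key=lambda pair: pair[0])
    }


result = concat_sorted(rows)
print(result["PublicationEvent"])  # BroadcastEvent OnDemandEvent
print(result["UserInteraction"])
```

Unlike this sketch, a SPARQL engine is not obliged to keep the subquery's order when it aggregates, which is exactly the caveat in the spec excerpt.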
From the horse's mouth:
On 2011-04-22, at 19:01, Steve Harris wrote:
On 2011-04-22, at 06:18, Jeen Broekstra wrote:
However, looking at the SPARQL 1.1 query spec, I think this is not a guaranteed result: as far as I can tell the solution modifier ORDER BY should be applied to the solution sequence after grouping and aggregation, so it can not influence the order of the input for the GROUP_CONCAT.
That's correct.
The following SPARQL query is giving duplicates in Virtuoso even when the DISTINCT clause is used. You can test the query in the DBpedia public endpoint. What is the problem with the query?
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbpedia:<http://dbpedia.org/resource/>
PREFIX dbpedia-owl:<http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX vrank:<http://purl.org/voc/vrank#>
SELECT DISTINCT ?person1 ?person1_id ?person2 ?person2_id ?person2_rank
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
?person1 rdf:type dbpedia-owl:Person.
?person2 rdf:type dbpedia-owl:Person.
?person1 ?link ?person2.
?person1 dbpedia-owl:wikiPageID ?person1_id.
?person2 dbpedia-owl:wikiPageID ?person2_id.
?person2 vrank:hasRank/vrank:rankValue ?person2_rank.
FILTER (?person1_id != ?person2_id).
FILTER (?person1_id = 308)
} ORDER BY DESC(?person2_rank) ASC(?person2_id)
SPARQL results
The results include rows that appear to be duplicates, e.g.:
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Democritus 8211 27.281
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Democritus 8211 27.281
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Heraclitus 13792 26.6914
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Heraclitus 13792 26.6914
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Parmenides 23575 19.6082
http://dbpedia.org/resource/Aristotle 308 http://dbpedia.org/resource/Parmenides 23575 19.6082
I can confirm that there appear to be duplicates in the results. I'm not absolutely sure what causes them, but I wonder if it might have something to do with the inexact equality of floating point numbers. If, instead of selecting the floating point numbers directly, you select their lexical forms (note the (str(...) as ?rank) at the end):
SELECT DISTINCT
?person1 ?person1_id
?person2 ?person2_id
(str(?person2_rank) as ?rank)
I get none of the duplicates. This might be worth reporting to the Virtuoso folks as a bug. For what it's worth, if you want floating point values for rank, you can use xsd:float as a function to turn that string back into a floating point value, and when I do that, with the select like the following, I still get the expected distinct results.
SELECT DISTINCT
?person1 ?person1_id
?person2 ?person2_id
(xsd:float(str(?person2_rank)) as ?rank)
SPARQL results
Although this won't help with your DBpedia query, anyone who arrived here via a search on the title and who has control of the model and data may want to know that:
Virtuoso's double does not seem to suffer from this SELECT DISTINCT problem that occurs with float.
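For what the floating-point guess could look like in practice, here is a small Python illustration. It is only an assumption about the kind of effect involved, not a confirmed account of what Virtuoso does internally. The pair of rows is hypothetical, and the rank value is taken from the output in the question.

```python
import struct


def as_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


rank_double = 27.281              # rank value as a 64-bit double
rank_single = as_float32(27.281)  # same number after a trip through 32 bits

# Hypothetical pair of rows: identical except for how the rank was stored.
rows = [("Democritus", 8211, rank_double), ("Democritus", 8211, rank_single)]

distinct_by_value = set(rows)
distinct_by_text = {(name, page_id, f"{rank:.4f}") for name, page_id, rank in rows}

print(len(distinct_by_value))  # 2: the two ranks are not equal as numbers
print(len(distinct_by_text))   # 1: both ranks print as 27.2810
```

Rows like these look identical on screen yet survive a value-based DISTINCT, while comparing their printed forms collapses them, much as selecting str(?person2_rank) did above.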
On this site, for example, take the first SPARQL query and make something very similar:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p: <http://dbpedia.org/property/>
SELECT *
WHERE {
?name p:name <http://dbpedia.org/resource/Olivier_Theyskens> .
}
Try to execute it: here
And I get no results. However, modify the query to the following:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p: <http://dbpedia.org/property/>
SELECT *
WHERE {
?name p:name ?otherthing.
}
And I get results, even though they're not the results I want.
Why doesn't the first query work -- what am I doing wrong? :/
In this case, I think it's because you're ordering your query statement backwards.
The DBpedia resource (<http://dbpedia.org/resource/Olivier_Theyskens>) is the Entity or Subject (?s), the property (p:name) is the Attribute or Predicate (?p), and the value of that property (?name) is the Value or Object (?o).
SPARQL expects all statements to be { ?s ?p ?o }, but yours seems to be written as { ?o ?p ?s }...
To sum up, if you try this query --
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX p: <http://dbpedia.org/property/>
SELECT *
WHERE
{
<http://dbpedia.org/resource/Olivier_Theyskens> p:name ?name .
}
-- you'll get the results I think you want.
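The effect of swapping subject and object can be seen with a tiny in-memory triple matcher in Python. The single triple and its literal value are assumed for illustration, and match is a made-up helper, not part of any SPARQL library.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# One hypothetical triple; the literal value is assumed for illustration.
TRIPLES: List[Triple] = [
    ("<http://dbpedia.org/resource/Olivier_Theyskens>", "p:name", '"Olivier Theyskens"'),
]


def match(s: Optional[str], p: Optional[str], o: Optional[str]) -> List[Triple]:
    """Return the triples matching a pattern; None plays the role of a variable."""
    return [
        triple for triple in TRIPLES
        if all(want is None or want == got for want, got in zip((s, p, o), triple))
    ]


resource = "<http://dbpedia.org/resource/Olivier_Theyskens>"
print(match(None, "p:name", resource))  # []: the resource is never an object here
print(match(resource, "p:name", None))  # one triple: the resource is the subject
```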
The problem with your first query is that p:name links to a literal, whereas you are trying to match a URI.
If you want your first query to work, you have to use the property http://dbpedia.org/ontology/artist, which links to the URI and not the literal:
SELECT *
WHERE {
?s <http://dbpedia.org/ontology/artist> <http://dbpedia.org/resource/The_Velvet_Underground> .
}
Notice the different namespace for the property <http://dbpedia.org/ontology/artist>: this namespace contains ontology instead of property, and ontology is the one used for object properties.