Property paths in SPARQL

I am running a SPARQL query with property paths against the DBpedia endpoint:
select (COUNT(distinct ?s2) AS ?count) WHERE{
?s2 skos:broader{0,2} dbc:Countries_in_Europe
}
I want to pose the same query without property paths:
select (COUNT(distinct ?s2) AS ?count) (COUNT(distinct ?s1) AS ?count1) WHERE{
?s2 skos:broader dbc:Countries_in_Europe.
?s1 skos:broader ?s2.
}
I have two questions:
Is it possible to get the sum of the two counts (?s1 + ?s2) in the second query?
For the second query, I expected the sum of the two counts, plus 1 for dbc:Countries_in_Europe itself, to equal the result of the first query. But it does not. What is wrong?
Thanks in advance.

You're using non-standard SPARQL: the {n,m} depth restriction did not make it into the final version of the specification (see the W3C specs).
I guess the first query is supposed to return the sub-categories of the given category up to a depth of 2, right? Your second query doesn't do the same. You have to use a UNION with one branch per distance, i.e. one for the direct sub-categories and one for the next level.
SELECT (COUNT(distinct ?s) AS ?count) WHERE {
{
?s skos:broader dbc:Countries_in_Europe
} UNION {
?s1 skos:broader dbc:Countries_in_Europe.
?s skos:broader ?s1
}
}
Note that in your first query you used {0,2}. Because of the 0 distance, the category dbc:Countries_in_Europe itself is also part of the result. If you need it, add 1 to the result of the UNION query above.
Update
As per @JoshuaTaylor's comment below, a more compact syntax would be
SELECT (COUNT(distinct ?s) AS ?count) WHERE {
?s skos:broader/skos:broader? dbc:Countries_in_Europe
}
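If the goal is to reproduce {0,2} exactly, including dbc:Countries_in_Europe itself, one option in standard SPARQL 1.1 is to chain two optional steps. This is a sketch that has not been run against the live endpoint; the prefix declarations are the usual SKOS and DBpedia category namespaces, written out here as an assumption because DBpedia predefines them.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbc: <http://dbpedia.org/resource/Category:>

# skos:broader? matches zero or one step, so two of them in sequence
# cover path lengths 0, 1 and 2, the same range as the draft {0,2}.
SELECT (COUNT(DISTINCT ?s) AS ?count) WHERE {
  ?s skos:broader?/skos:broader? dbc:Countries_in_Europe
}
```

This also suggests why adding the two counts from the join query does not match: the join only keeps a ?s2 that has at least one sub-category, and a category reachable at both depths is counted once in each column.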

Related

SPARQL - Return mutual objects of a list of subjects

How can I get all predicates + objects that are shared by a list of subjects, without knowing anything about the predicates/objects of these subjects?
Let's look at this example query from Wikidata:
SELECT ?chancellor WHERE{
?chancellor wdt:P39 wd:Q4970706. #P39 = position held, Q4970706 = Chancellor of Germany
}
This query returns all former chancellors of Germany.
Now I want to return every predicate + object pair that all chancellors have in common, e.g. every one of the subjects is an instance of human, was born in Germany, and so on.
I guess this is an easy one. However, I have no idea how to do it.
This is a good one. Here's a near-hit:
prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix wd: <http://www.wikidata.org/entity/>
select ?p ?o (count(distinct ?chancellor) as ?cs) where {
?chancellor wdt:P39 wd:Q4970706.
?chancellor ?p ?o .
}
group by ?p ?o
order by desc(?cs)
This takes all chancellors, and their properties and values. It counts the number of chancellors per prop/val.
By ordering that you can see the most common prop / vals at the top.
What you want, though, is only the results shared by all chancellors. We can get the number of chancellors in a separate subquery easily enough, and stick the two together:
prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix wd: <http://www.wikidata.org/entity/>
select ?p ?o where {
{
# Find number of chancellors
select (count(?chancellor) as ?num_chancellors) where {
?chancellor wdt:P39 wd:Q4970706
}
}
{
# Find number of chancellors per predicate / value
select ?p ?o (count(distinct ?chancellor) as ?chancellor_count) where {
?chancellor wdt:P39 wd:Q4970706.
?chancellor ?p ?o .
}
group by ?p ?o
}
# Choose results all chancellors share
filter (?num_chancellors = ?chancellor_count)
}
I think this does what you want. Not very pretty, I confess.
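A somewhat more compact variant of the same idea, as a sketch only (not run against the Wikidata endpoint): compute the total once in a subquery, carry it through GROUP BY, and compare it in HAVING. The variable ?c is just an illustrative name.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>

SELECT ?p ?o WHERE {
  {
    # Total number of chancellors, computed once
    SELECT (COUNT(DISTINCT ?c) AS ?num_chancellors) WHERE {
      ?c wdt:P39 wd:Q4970706
    }
  }
  ?chancellor wdt:P39 wd:Q4970706 ;
              ?p ?o .
}
GROUP BY ?p ?o ?num_chancellors
# Keep only predicate/object pairs held by every chancellor
HAVING (COUNT(DISTINCT ?chancellor) = ?num_chancellors)
```

The pair wdt:P39 / wd:Q4970706 will itself appear in the output, since every chancellor has it by construction.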
An interesting aspect of SPARQL and RDF is that you don't need to know anything about the data to query it. In your case I'd suggest adding the triple pattern ?chancellor ?p ?o . and select ?p and ?o. From there you can choose any property you're looking for. Be sure to use OPTIONAL if some of the ?chancellor matches don't have that property value.

How to return all S->P->O triples from a starting resource to a specified path depth?

My goal is to graphically represent the S->P->O relations within a depth two edges from the specified resource, p:Person_1. I want all relations within that path length to be returned from my query as ?s, ?p, ?o for further processing in my graphical application.
I tried the first query below which gives me my first set of ?s ?p ?o with repeats, then ?p2, ?o2, ?p3, ?o3 as additional columns in the result. I want to bind ?p2 and ?p3 to ?p, ?o2 and ?o3 to ?o.
SELECT *
WHERE {
p:Person_1 ?p ?o .
BIND(p:Person_1 as ?s)
OPTIONAL{
?o ?p2 ?o2 .
}
OPTIONAL{
?o2 ?p3 ?o3 .
}
}
Then, based on How do I construct get the whole sub graph from a given resource in RDF Graph?, I tried using CONSTRUCT to return the graph.
PREFIX p: <http://www.example.org/person/>
PREFIX x: <example.org/foo/>
construct { ?s ?p ?o }
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:|!x:)* ?s .
?s ?p ?o .
}
I am using Virtuoso and I get the error:
Virtuoso 37000 Error SP031: SPARQL compiler: Variable ?_::trans_subj_9_3 in T_IN list is not a value from some triple
I could post-process the result from my first query but I want to learn how to do this correctly with SPARQL, preferably on Virtuoso.
Update after testing the advice from @AKSW:
Both CONSTRUCT and SELECT statements work with the pattern suggested.
CONSTRUCT { ?s ?p ?o }
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar)* ?s .
?s ?p ?o .
} LIMIT 100
and:
SELECT ?s ?p ?o
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar)* ?s .
?s ?p ?o .
} LIMIT 100
The SELECT results in several duplicates that cannot be removed using DISTINCT, which results in an error that I assume is due to the 'datatype' of some of the returned values.
Virtuoso 22023 Error SR066: Unsupported case in CONVERT (DATETIME -> IRI_ID)
It appears some post-SPARQL processing is in order.
This gets me most of the way there. Still hoping I can find a solution for SPARQL that is like Cypher's "number of hops away" :
OPTIONAL MATCH path=s-[*1..3]-(o)
Here is a SPARQL query that works in Virtuoso. Note the SPARQL W3C standard does not support this syntax and it will fail in other triplestores.
PREFIX p: <http://www.example.org/person/>
PREFIX x: <example.org/foo/>
# CONSTRUCT {?s ?p ?o} # If you wish to return the graph
SELECT ?s ?p ?o # To return the triples
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar){1,3} ?s .
?s ?p ?o .
} LIMIT 100
See also K. Idehen's wiki entry here: http://linkedwiki.com/exampleView.php?ex_id=141
And thanks to @Joshua Taylor for advice in the same area.
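For the original goal of getting both hops into the same ?s ?p ?o columns, a portable alternative is a UNION with one branch per hop. This is a sketch in standard SPARQL 1.1 that has not been tested on Virtuoso; ?p1 is a helper variable introduced here for illustration.

```sparql
PREFIX p: <http://www.example.org/person/>

SELECT ?s ?p ?o
FROM <http://localhost:8890/MYGRAPH>
WHERE {
  {
    # Hop 1: edges leaving the starting resource
    p:Person_1 ?p ?o .
    BIND(p:Person_1 AS ?s)
  }
  UNION
  {
    # Hop 2: edges leaving anything the starting resource points to
    p:Person_1 ?p1 ?s .
    ?s ?p ?o .
  }
}
```

Replacing the SELECT line with CONSTRUCT { ?s ?p ?o } should return the same edges as a graph, in which repeated triples collapse naturally.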
Working Drafts of SPARQL 1.1 Property Paths included the {n,m} operator for handling this issue, which was implemented (and will remain supported) in Virtuoso. Here's a tweak to @tim's response.
Live SPARQL Query Results Page using the DBpedia endpoint (which is a Virtuoso instance).
Live SPARQL Query Definition Page that opens up query source code in the default DBpedia query editor.
Actual Query Example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s AS ?Entity
?o AS ?Category
WHERE {
?s rdf:type <http://dbpedia.org/ontology/AcademicJournal> ;
rdf:type{1,3} ?o
}
LIMIT 100
Should you be looking for LinkedIn-like presentation of Contact Networks and Degrees of Separation between individuals, here is an example using Virtuoso-specific SPARQL Extensions that solve this particular issue:
SELECT ?o AS ?WebID
((SELECT COUNT (*) WHERE {?o foaf:knows ?xx})) AS ?contact_network_size
?dist AS ?DegreeOfSeparation
<http://www.w3.org/People/Berners-Lee/card#i> AS ?knowee
WHERE
{
{
SELECT ?s ?o
WHERE
{
?s foaf:knows ?o
}
} OPTION (TRANSITIVE, t_distinct, t_in(?s), t_out(?o), t_min (1), t_max (4), t_step ('step_no') AS ?dist) .
FILTER (?s= <http://www.w3.org/People/Berners-Lee/card#i>)
FILTER (isIRI(?s) and isIRI(?o))
}
ORDER BY ?dist DESC (?contact_network_size)
LIMIT 500
Note: this approach is the only way (at the current time) to expose actual relational hops between entities in an Entity Relationship Graph that includes Transitive relations.
Bearing in mind that the r{n,m} operator was dropped from the final SPARQL 1.1 specification (but will remain supported in Virtuoso), you can use r/r?/r? instead of r{1,3} if you want to work strictly off the current spec:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s AS ?Entity
?o AS ?Category
WHERE {
?s rdf:type <http://dbpedia.org/ontology/AcademicJournal> ;
rdf:type / rdf:type? / rdf:type? ?o
}
LIMIT 100
Here's a live example, against the DBpedia instance hosted in Virtuoso.

How to exclude nodes from path?

I want to get all mathematicians from DBpedia, so I wrote this query for DBpedia's SPARQL service:
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader* dbc:Mathematicians.
}
The problem with this is that the category Mathematicians is polluted, due to categories like dbc:Euclid, which then includes all of Euclidean geometry. I believe it's categories like these which cause the query to fail:
Virtuoso 42000 Error TN...: Exceeded 1000000000 bytes in transitive temp memory. use t_distinct, t_max or more T_MAX_memory options to limit the search or increase the pool
A lot of the problematic categories are in dbc:Wikipedia_categories_named_after_mathematicians.
Is there some way to ignore these categories in the skos:broader* path that would make the error go away?
You can list the categories that you don't want to include by filtering them out:
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader* dbc:Mathematicians.
FILTER (?category NOT IN (dbc:Euclid))
}
But that won't remove the error because Virtuoso still needs to traverse the skos:broader hierarchy, exhausting 'transitive heap memory'. Other approaches include selecting specific categories or traversing part of the hierarchy.
Selecting specific categories could be done with UNION statements, but the VALUES shortcut is a simpler syntax:
SELECT DISTINCT ?person
{
VALUES ?category {dbc:Mathematicians dbc:Mental_calculators dbc:Lists_of_mathematicians}
?person dct:subject ?category.
}
For querying part of the hierarchy, you can use some property path expressions. This one matches categories whose parent or grandparent is dbc:Mathematicians (it does not match dbc:Mathematicians itself):
SELECT DISTINCT ?person
{
?person dct:subject ?category.
?category skos:broader | (skos:broader/skos:broader) dbc:Mathematicians.
# filter as desired - FILTER (?category NOT IN (dbc:Euclid))
}
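When the depth is bounded like this, the intermediate category can be exposed as a variable and filtered, so that whole branches under an unwanted category are skipped. The following is a sketch that has not been run against DBpedia; ?mid is a name introduced here, and the prefixes are the standard ones that DBpedia predefines.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbc: <http://dbpedia.org/resource/Category:>

SELECT DISTINCT ?person
{
  ?person dct:subject ?category .
  # ?mid is dbc:Mathematicians itself or one of its direct sub-categories
  ?mid skos:broader? dbc:Mathematicians .
  # ?category is ?mid itself or one of its direct sub-categories
  ?category skos:broader? ?mid .
  # Drop any path that touches an unwanted category
  FILTER (?mid NOT IN (dbc:Euclid) && ?category NOT IN (dbc:Euclid))
}
```

Unlike the alternation above, the two skos:broader? steps also cover depth 0, so people filed directly under dbc:Mathematicians are included.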

sparql how to count variable pairs

I have the following query that gets instances of a class and their label/names. I want to count how many total results there are. However, I do not know how to formulate the count statement.
select ?s ?l {
?s a <http://dbpedia.org/ontology/Ship> .
{?s <http://www.w3.org/2000/01/rdf-schema#label> ?l}
union
{?s <http://xmlns.com/foaf/0.1/name> ?l}
}
I have tried
select ?s ?l (count (?s) as ?count) {
?s a <http://dbpedia.org/ontology/Ship> .
{?s <http://www.w3.org/2000/01/rdf-schema#label> ?l}
union
{?s <http://xmlns.com/foaf/0.1/name> ?l}
}
But that gives the counting for each ?s ?l pair, instead I need to know how many of the ?s ?l pairs there are. Or maybe I should not use count at all? As mentioned all I need to know is how many results in total a query returns (regardless of the hard limit that is put by the server, e.g., DBPedia returns a maximum of 50000 results for each query).
Any suggestions please?
Many thanks!
To count the number of matches, use
SELECT (COUNT(*) AS ?count)
WHERE {
?s a <http://dbpedia.org/ontology/Ship> .
?s <http://www.w3.org/2000/01/rdf-schema#label> | <http://xmlns.com/foaf/0.1/name> ?l .
}
Note I'm using the property path "or" (|) to get the union of the properties.
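If a ship carries the same literal as both rdfs:label and foaf:name, that pair may be counted twice by COUNT(*). Should distinct ?s ?l pairs be what is wanted, one option is to deduplicate in a subquery first. This is a sketch, untested on the live endpoint, and the dbo:, rdfs: and foaf: prefixes are written out as assumptions.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT (COUNT(*) AS ?count)
WHERE {
  {
    # Inner query collapses duplicate (?s, ?l) rows first
    SELECT DISTINCT ?s ?l WHERE {
      ?s a dbo:Ship ;
         rdfs:label|foaf:name ?l .
    }
  }
}
```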

sparql to retrieve the value of a min constraint

How can I retrieve a min constraint on a class' attribute using sparql? I have value min 1000 decimal, and I would like to get 1000
Suppose, hypothetically, that you have a statement such as:
Class: X subClassOf: hasObjectProperty min 1 Y
If you write a SPARQL query as:
SELECT *
WHERE {
?s rdfs:subClassOf ?o.
}
This retrieves all the rdfs:subClassOf axioms. However, if you need to be precise and know which ones have cardinality restrictions, you need to go further:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix : <http://example.com#>
SELECT *
WHERE {
?s rdfs:subClassOf ?o.
?o ?x ?y.
filter(?s = :X)
}
Among others, you can see the following result:
As you can see, there are two relevant items: one is Y and the other is the number, represented as a non-negative integer. Therefore, one way to get each item is to restrict ?x in the SPARQL query and retrieve them one at a time. For example, using owl:onClass as the predicate will give you Y in ?y:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
prefix : <http://example.com#>
SELECT *
WHERE {
?s rdfs:subClassOf ?o.
?o owl:onClass ?y.
filter(?s = :X)
}
Here is the SPARQL query I used, following Artemis' answer:
SELECT ?min
WHERE {?s rdfs:subClassOf ?o.
?o owl:minQualifiedCardinality ?min.
FILTER(?s = :value) }
And with Jena, I use getLiteral("min").getFloat();
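If it is not known in advance whether the ontology uses a qualified or an unqualified minimum, a property path alternative can cover both and also report which property the restriction applies to. This is a sketch under the assumptions of the example above (class :X in the http://example.com# namespace); ?restriction and ?property are names introduced here.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX : <http://example.com#>

SELECT ?property ?min
WHERE {
  :X rdfs:subClassOf ?restriction .
  # Match either the unqualified or the qualified minimum cardinality
  ?restriction a owl:Restriction ;
               owl:onProperty ?property ;
               owl:minCardinality|owl:minQualifiedCardinality ?min .
}
```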