SPARQL arbitrary-length property path query with arbitrary but directed properties

Let's say I have the following RDF graph:
@prefix : <http://example/> .
:order1 :item :z1 .
:order2 :item :z2 .
:z1 :name "Small" .
:z1 :price 5 .
:z2 :name "Large" .
:z2 :price 5 .
I want a SPARQL query to select all triples that describe :z1, i.e., all triples that are children, or children of children (and so on), of :z1. From what I have found, this is called an arbitrary-length property path with arbitrary properties. I used the solutions from "SPARQL property path queries with arbitrary properties" and "Sparql - query to get all triples related to a specific subject" to come up with this query:
PREFIX : <http://example/>
SELECT ?s ?p ?o
WHERE {
?s ?p ?o .
:order1 (<>|!<>)+ ?o . # Match arbitrary-length property path with arbitrary properties
}
Which results in (in ttl format):
@prefix : <http://example/> .
:order1 :item :z1 .
:z2 :price 5 .
:z1 :name "Small" ;
:price 5 .
As you can see this also selects the :z2 :price 5 . triple, which I did not expect and is also not what I want. For some reason, it seems like the inverse :price path inference from object 5 to subject :z2 is also included.
How do I achieve selecting all triples included in the arbitrary-length property path of :z1 with arbitrary but directed properties?
NB: This is a simplified version of the real problem I try to solve where the predicates in the path and the length of the path can indeed be arbitrary.
Perhaps this drawing helps:

As mentioned by @UninformedUser, I was getting all incoming edges of the nodes reachable from :order1, whereas I want all outgoing edges.
The correct query would be:
PREFIX : <http://example/>
SELECT ?s ?p ?o
WHERE {
?s ?p ?o .
:order1 (<>|!<>)* ?s . # Match arbitrary-length property path with arbitrary properties
}
Where I changed ?o to ?s in :order1 (<>|!<>)+ ?o . so I only match outgoing edges, i.e. triples on the path where the subject (?s) is the target of a 'mother' triple.
I also changed + to * in :order1 (<>|!<>)+ ?o . so I match paths of length 0 or more instead of 1 or more so I also get the triples where :order1 is the subject.
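To make the difference between the two queries concrete, here is a minimal Python sketch that mimics them on the example graph. It is only a toy model under stated assumptions: prefixed names are plain strings, and the helper names (reachable, first_attempt, corrected) are invented for illustration. It is not how a SPARQL engine evaluates property paths.

```python
# Toy model of the graph from the question; prefixed names are plain strings.
TRIPLES = {
    (":order1", ":item", ":z1"),
    (":order2", ":item", ":z2"),
    (":z1", ":name", "Small"),
    (":z1", ":price", 5),
    (":z2", ":name", "Large"),
    (":z2", ":price", 5),
}


def reachable(start, include_start):
    """Nodes reachable from `start` by following edges subject -> object.

    include_start=True mimics `*` (zero or more steps),
    include_start=False mimics `+` (one or more steps).
    """
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, _, o in TRIPLES:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    if include_start:
        seen.add(start)
    return seen


def first_attempt():
    # ?s ?p ?o . :order1 (<>|!<>)+ ?o  -> the OBJECT must be reachable
    targets = reachable(":order1", include_start=False)
    return {t for t in TRIPLES if t[2] in targets}


def corrected():
    # ?s ?p ?o . :order1 (<>|!<>)* ?s  -> the SUBJECT must be reachable
    sources = reachable(":order1", include_start=True)
    return {t for t in TRIPLES if t[0] in sources}
```

In this sketch, first_attempt() returns four triples, including (":z2", ":price", 5), because the literal 5 is reachable from :order1 and is also the object of that triple. corrected() returns only the three triples whose subject is :order1 or :z1.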

Related

Type of returned value in SPARQL Query

Is it possible to know the type of the return values in a SPARQL query?
For example, is there a function to define the type of ?x ?price ?p
in the following query?
SELECT DISTINCT ?x ?price ?p
WHERE {
?x a :Product .
?x :price ?price .
?x ?p ?o .
}
I want to know that
typeOf(x) = resource
typeOf(?p) = property
typeOf(?price) = property target etc.
datatype(?x)
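The datatype function tells you exactly which datatype a literal has. On Virtuoso (as in the DBpedia example below) it also returns xsd:anyURI for IRIs, so there it can tell a resource from a literal. In standard SPARQL 1.1, datatype is defined only for literals, and isIRI, isBlank and isLiteral are the portable way to distinguish the kinds of term.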
For example, the following query on https://dbpedia.org/sparql...
SELECT DISTINCT ?x ?code ?p datatype(?x) datatype(?code) datatype(?p)
WHERE {
?x a dbo:City.
?x dbo:areaCode ?code .
?x ?p ?o .
} limit 1
...will return:
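---------------------------------------------------------------
| column    | value                                           |
===============================================================
| x         | http://dbpedia.org/resource/Aconchi             |
| code      | "+52 623"                                       |
| p         | http://www.w3.org/1999/02/22-rdf-syntax-ns#type |
| callret-3 | http://www.w3.org/2001/XMLSchema#anyURI         |
| callret-4 | http://www.w3.org/2001/XMLSchema#string         |
| callret-5 | http://www.w3.org/2001/XMLSchema#anyURI         |
---------------------------------------------------------------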
However, this will not differentiate between "resource" and "property", because a resource may be a property. What you probably mean is "individual" and "property", but even a property can be treated as an individual, for example in the triple rdfs:label rdfs:label "label".
You can, however, always query the rdf:type of a resource, which may give you rdf:Property, owl:DatatypeProperty or owl:ObjectProperty.

How to return all S->P->O triples from a starting resource to a specified path depth?

My goal is to graphically represent the S->P->O relations within a depth two edges from the specified resource, p:Person_1. I want all relations within that path length to be returned from my query as ?s, ?p, ?o for further processing in my graphical application.
I tried the first query below which gives me my first set of ?s ?p ?o with repeats, then ?p2, ?o2, ?p3, ?o3 as additional columns in the result. I want to bind ?p2 and ?p3 to ?p, ?o2 and ?o3 to ?o.
SELECT *
WHERE {
p:Person_1 ?p ?o .
BIND("p:Person_1" as ?s)
OPTIONAL{
?o ?p2 ?o2 .
}
OPTIONAL{
?o2 ?p3 ?o3 .
}
}
Then, based on How do I construct get the whole sub graph from a given resource in RDF Graph?, I tried using CONSTRUCT to return the graph.
PREFIX p: <http://www.example.org/person/>
PREFIX x: <example.org/foo/>
construct { ?s ?p ?o }
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:|!x:)* ?s .
?s ?p ?o .
}
I am using Virtuoso and I get the error:
Virtuoso 37000 Error SP031: SPARQL compiler: Variable ?_::trans_subj_9_3 in T_IN list is not a value from some triple
I could post-process the result from my first query but I want to learn how to do this correctly with SPARQL, preferably on Virtuoso.
Update after testing the advice from @AKSW:
Both CONSTRUCT and SELECT statements work with the pattern suggested.
CONSTRUCT { ?s ?p ?o }
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar)* ?s .
?s ?p ?o .
} LIMIT 100
and:
SELECT ?s ?p ?o
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar)* ?s .
?s ?p ?o .
} LIMIT 100
The SELECT results in several duplicates that cannot be removed using DISTINCT, which results in an error that I assume is due to the 'datatype' of some of the returned values.
Virtuoso 22023 Error SR066: Unsupported case in CONVERT (DATETIME -> IRI_ID)
It appears some post-SPARQL processing is in order.
This gets me most of the way there. Still hoping I can find a solution for SPARQL that is like Cypher's "number of hops away" :
OPTIONAL MATCH path=s-[*1..3]-(o)
Here is a SPARQL query that works in Virtuoso. Note the SPARQL W3C standard does not support this syntax and it will fail in other triplestores.
PREFIX p: <http://www.example.org/person/>
PREFIX x: <example.org/foo/>
# CONSTRUCT {?s ?p ?o} # If you wish to return the graph
SELECT ?s ?p ?o # To return the triples
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar){1,3} ?s .
?s ?p ?o .
}LIMIT 100
See also K. Idehen's wiki entry here: http://linkedwiki.com/exampleView.php?ex_id=141
And thanks to @Joshua Taylor for advice in the same area.
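For readers who end up post-processing outside the triplestore, the "number of hops away" idea can be sketched in plain Python. This is a rough illustration only: the toy graph, the x:knows predicate and the function names are all hypothetical, and the code models {n,m} as "there is a path of exactly k steps for some k between n and m", which is my reading of the operator rather than a description of Virtuoso's implementation.

```python
# Hypothetical toy graph; all names are invented for illustration.
TRIPLES = [
    ("p:Person_1", "x:knows", "p:Person_2"),
    ("p:Person_2", "x:knows", "p:Person_3"),
    ("p:Person_3", "x:knows", "p:Person_4"),
    ("p:Person_4", "x:knows", "p:Person_5"),
]


def nodes_within(start, min_hops, max_hops):
    """Nodes reachable from `start` by a path of min_hops..max_hops edges."""
    found = {start} if min_hops == 0 else set()
    frontier = {start}
    for hops in range(1, max_hops + 1):
        # Advance every node in the frontier by exactly one edge.
        frontier = {o for s, _, o in TRIPLES if s in frontier}
        if hops >= min_hops:
            found |= frontier
    return found


def triples_within(start, min_hops, max_hops):
    """Mimics: start path{min,max} ?s . ?s ?p ?o ."""
    subjects = nodes_within(start, min_hops, max_hops)
    return [t for t in TRIPLES if t[0] in subjects]
```

Note that with a lower bound of 1 the triples whose subject is the starting resource itself are not returned, since the start is zero hops away. In this sketch, triples_within("p:Person_1", 0, 1) returns the edges out of p:Person_1 and out of its direct neighbours, which is one way to read "within a depth of two edges".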
Working Drafts of SPARQL 1.1 Property Paths included the {n,m} operator for handling this issue, which was implemented (and will remain supported) in Virtuoso. Here's a tweak to @tim's response.
Actual Query Example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s AS ?Entity
?o AS ?Category
WHERE {
?s rdf:type <http://dbpedia.org/ontology/AcademicJournal> ;
rdf:type{1,3} ?o
}
LIMIT 100
Should you be looking for LinkedIn-like presentation of Contact Networks and Degrees of Separation between individuals, here is an example using Virtuoso-specific SPARQL Extensions that solve this particular issue:
SELECT ?o AS ?WebID
((SELECT COUNT (*) WHERE {?o foaf:knows ?xx})) AS ?contact_network_size
?dist AS ?DegreeOfSeparation
<http://www.w3.org/People/Berners-Lee/card#i> AS ?knowee
WHERE
{
{
SELECT ?s ?o
WHERE
{
?s foaf:knows ?o
}
} OPTION (TRANSITIVE, t_distinct, t_in(?s), t_out(?o), t_min (1), t_max (4), t_step ('step_no') AS ?dist) .
FILTER (?s= <http://www.w3.org/People/Berners-Lee/card#i>)
FILTER (isIRI(?s) and isIRI(?o))
}
ORDER BY ?dist DESC (?contact_network_size)
LIMIT 500
Note: this approach is the only way (at the current time) to expose actual relational hops between entities in an Entity Relationship Graph that includes Transitive relations.
Bearing in mind that the r{n,m} operator was dropped from the final SPARQL 1.1 specification (but will remain supported in Virtuoso), you can use r/r?/r? instead of r{1,3} if you want to work strictly from the current spec:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s AS ?Entity
?o AS ?Category
WHERE {
?s rdf:type <http://dbpedia.org/ontology/AcademicJournal> ;
rdf:type / rdf:type? / rdf:type? ?o
}
LIMIT 100
Here's a live example, against the DBpedia instance hosted in Virtuoso.
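The claim that r/r?/r? covers the same pairs as r{1,3} can be checked on a small example. The following Python sketch uses a made-up edge set for a single relation r (including a cycle) and hypothetical helper names. It compares node sets only and says nothing about duplicate solutions or about how any engine evaluates the path.

```python
# Hypothetical edges for a single relation r; the cycle c -> a is deliberate.
EDGES = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("c", "a")}


def step(nodes):
    """One application of r: every node one edge away from `nodes`."""
    return {o for s, o in EDGES if s in nodes}


def optional_step(nodes):
    """r? : zero or one application of r."""
    return nodes | step(nodes)


def bounded(start, low, high):
    """r{low,high} for low >= 1: nodes exactly k steps away, low <= k <= high."""
    found, frontier = set(), {start}
    for k in range(1, high + 1):
        frontier = step(frontier)
        if k >= low:
            found |= frontier
    return found


def sequence_form(start):
    """r / r? / r? : one mandatory step followed by two optional ones."""
    return optional_step(optional_step(step({start})))
```

On this edge set both forms give the same nodes for every starting point; for example, starting from "a" each yields a, b, c and d.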

SPARQL if an instance has a property, others must as well

I have a specific instance and a SPARQL query that retrieves other instances similar to that one. By similar, I mean that the other instances have at least one property and value in common with the specific instance, or have a class in common with the specific instance.
Now, I'd like to extend the query such that if the specific instance has a value for a "critical" property, then the only instances that are considered similar are those that also have that critical property (as opposed to just having at least one property and value in common).
For instance, here is some sample data in which instance1 has a value for predicate2 which is a subproperty of isCriticalPredicate.
@prefix : <http://example.org/rs#>
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
:instance1 a :class1.
:instance1 :predicate1 :instance3.
:instance1 :predicate2 :instance3. # (#) critical property
:instance2 a :class1.
:instance2 :predicate1 :instance3.
:instance4 :predicate2 :instance3.
:predicate1 :hasSimilarityValue 0.6.
:predicate2 rdfs:subPropertyOf :isCriticalPredicate.
:predicate2 :hasSimilarityValue 0.6.
:class1 :hasSimilarityValue 0.4.
Here is a query in which ?x is the specific instance, instance1. The query retrieves just instance4, which is correct. However, if I remove the critical property from instance1 (line labeled with #), I get no results, but should get instance2, since it has a property in common with instance1. How can I fix this?
PREFIX : <http://example.org/rs#>
select ?item (SUM(?similarity * ?importance * ?levelImportance) as ?summedSimilarity)
(group_concat(distinct ?becauseOf ; separator = " , ") as ?reason) where
{
values ?x {:instance1}
bind (4/7 as ?levelImportance)
{
values ?instanceImportance {1}
?x ?p ?instance.
?item ?p ?instance.
?p :hasSimilarityValue ?similarity
bind (?p as ?becauseOf)
bind (?instanceImportance as ?importance)
}
union
{
values ?classImportance {1}
?x a ?class.
?item a ?class.
?class :hasSimilarityValue ?similarity
bind (?class as ?becauseOf)
bind (?classImportance as ?importance)
}
filter (?x != ?item)
?x :isCriticalPredicate ?y.
?item :isCriticalPredicate ?y.
}
group by ?item
I've said it before and I'll say it again: minimal data is very helpful. From your past questions, I know that your working project has similarity values on properties and the like, but none of that really matters for the problem at hand. Here's some data that just has a few instances, property values, and one property designated as critical:
@prefix : <urn:ex:>
:p a :criticalProperty .
:a :p :u ; #-- :a has a critical property, so
:q :v . #-- only :d can be similar to it.
:c :q :v ; #-- :c has no critical properties, so
:r :w . #-- both :a and :d can be similar to it.
:d :p :u ;
:q :v .
The trick in a query like this is to filter out the results that have the problem, not to try to select the ones that don't. Logically, those mean the same thing, but in writing the query, it's easier to think about constraint violation, and to try to filter out the results that violate the constraint. In this case, you want to filter out any results where the instance has a critical property and value but the similar instance doesn't.
prefix : <urn:ex:>
select ?i (group_concat(distinct ?j) as ?js) where {
#-- :a has a critical property, but
#-- :c does not, so these are useful
#-- starting points
values ?i { :a :c }
#-- get ?j instances that have a value
#-- in common with ?i.
?i ?property ?value .
?j ?property ?value .
#-- make sure that ?i and ?j aren't
#-- the same instance
filter (?i != ?j)
#-- make sure that there is no critical
#-- property value that ?i has that
#-- ?j does not also have
filter not exists {
?i ?criticalProperty ?criticalValue .
?criticalProperty a :criticalProperty .
filter not exists {
?j ?criticalProperty ?criticalValue .
}
}
}
group by ?i
----------------------------
| i | js |
============================
| :a | "urn:ex:d" |
| :c | "urn:ex:d urn:ex:a" |
----------------------------
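The nested filter not exists is a double negation: "there is no critical value of ?i that ?j lacks" is the same as "every critical value of ?i is also a value of ?j". A small Python sketch of that reading, using the sample data above as plain tuples (the function name similar and the string encoding of terms are assumptions made for illustration):

```python
# The sample data from above as (subject, predicate, object) tuples.
TRIPLES = {
    (":p", "rdf:type", ":criticalProperty"),
    (":a", ":p", ":u"), (":a", ":q", ":v"),
    (":c", ":q", ":v"), (":c", ":r", ":w"),
    (":d", ":p", ":u"), (":d", ":q", ":v"),
}

# Properties declared critical: ?criticalProperty a :criticalProperty
CRITICAL = {
    s for s, p, o in TRIPLES
    if p == "rdf:type" and o == ":criticalProperty"
}


def similar(i):
    """Instances sharing a property value with i, minus constraint violators."""
    own = {(p, o) for s, p, o in TRIPLES if s == i}
    # ?i ?property ?value . ?j ?property ?value . filter (?i != ?j)
    candidates = {s for s, p, o in TRIPLES if (p, o) in own and s != i}
    critical = {(p, o) for p, o in own if p in CRITICAL}
    # Keep j only if every critical (property, value) of i also holds for j.
    return sorted(
        j for j in candidates
        if all((j, p, o) in TRIPLES for p, o in critical)
    )
```

Here similar(":a") gives [":d"] and similar(":c") gives [":a", ":d"], matching the table above.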
Related
There are some other questions that also touch on constraint satisfaction/violation that might be useful reading. While not all of these use nested filter not exists, most of them do have a pattern of filter not exists { … filter <negative condition> }.
Find lists containing ALL values in a set? (This actually uses a nested filter not exists.)
Is it possible to express a recursive definition in SPARQL?
Count values that satisfy a constraint and return 0 for those that don't satisfy it in SPARQL
SPARQL: return all intersections fulfilled by specified or equivalent classes (There's another nested filter not exists, in the last query in the answer.)

sparql to retrieve the value of a min constraint

How can I retrieve a min constraint on a class' attribute using sparql? I have value min 1000 decimal, and I would like to get 1000
Suppose, hypothetically, that you have a statement like this:
Class: X subClassOf: hasObjectProperty min 1 Y
If you write a SPARQL query as:
SELECT *
WHERE {
?s rdfs:subClassOf ?o.
}
You will get all the rdfs:subClassOf axioms. However, if you need to be precise and know which ones have cardinality restrictions, you need to go further:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix : <http://example.com#>
SELECT *
WHERE {
?s rdfs:subClassOf ?o.
?o ?x ?y.
filter(?s = :X)
}
Among others, you can see the following result:
As you can see, there are 2 relevant items: one is Y and the other is the number, presented as a non-negative integer. Therefore, one way to get each item is to fix ?x to a specific predicate in the SPARQL query and retrieve the items one at a time. For example, using owl:onClass will give you ?y:
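PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
prefix : <http://example.com#>
SELECT *
WHERE {
?s rdfs:subClassOf ?o.
?o owl:onClass ?y.
filter(?s = :X)
}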
Here is the sparql query I used following Artemis' answer
SELECT ?min
WHERE {?s rdfs:subClassOf ?o.
?o owl:minQualifiedCardinality ?min.
FILTER(?s = :value) }
And with jena, I use getLiteral("min").getFloat();
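As a rough illustration of what the query above does, here is a Python sketch of the same two-step join over a hand-written set of tuples. The data is an assumption: it follows the usual RDF encoding of a min qualified cardinality restriction (a blank node carrying owl:onProperty, owl:onClass and owl:minQualifiedCardinality), with the blank node label _:r1 and the function name invented for this example.

```python
# Assumed RDF shape of "X subClassOf: hasObjectProperty min 1 Y".
TRIPLES = [
    (":X", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "rdf:type", "owl:Restriction"),
    ("_:r1", "owl:onProperty", ":hasObjectProperty"),
    ("_:r1", "owl:minQualifiedCardinality", 1),
    ("_:r1", "owl:onClass", ":Y"),
]


def min_constraints(cls):
    """Join ?s rdfs:subClassOf ?o with ?o owl:minQualifiedCardinality ?min."""
    supers = [o for s, p, o in TRIPLES if s == cls and p == "rdfs:subClassOf"]
    return [
        o for s, p, o in TRIPLES
        if s in supers and p == "owl:minQualifiedCardinality"
    ]
```

With this data, min_constraints(":X") returns [1]; a class with no such restriction yields an empty list.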

How to query multiple tables using ARQ jena?

Overview
I am using ARQ to query local RDF files. The query is applied to 5 files, which are:
a_m.nt, description.nt, labels.nt, links.nt, literals.nt
Information is modeled as a set of triples:
subject predicate object
Algorithm
First, I want to select specific topics from the a_m.nt file. Second, I want to select the labels and descriptions of the selected topics from description.nt and labels.nt. In other words, I want to search description.nt and labels.nt for the descriptions and labels that have the same topic as the one that was extracted from a_m.nt. Finally, I want to extract the rest of the properties from links.nt and literals.nt.
Query
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select ?x ?y ?p ?o
where {
?topic rdf:type music.
?topic rdf:description ?x.
?topic rdf:label ?y.
?topic ?p ?o.
}
Command line
sparql --data a_m.nt --data description.nt --data labels.nt --data links.nt --data literals.nt --query query_sparql
questions
By using this query, I first select a topic that has the type music, then I select its description, label and other properties. Is that correct?
In your current query, it looks like you don't need all those bindings in the where clause, since you are retrieving everything anyhow with the last statement ?topic ?p ?o. You need to put the music term in a namespace (a bare music is not a valid term) and probably add DISTINCT to the select clause. So maybe rewrite the query like this:
PREFIX : <http://example.org/>
select DISTINCT ?topic ?p ?o
where {
?topic a :music.
?topic ?p ?o.
}
A possible result may be:
<foo> <type> <music>
<foo> <description> "this is foo"
<foo> <label> "foo"
<bar> <type> <music>
<bar> <label> "bar"
This is different from the query you have, more general. You basically get everything back that is of type music along with all the properties and values associated with them. In your query you only get results back that have some description and label (and are of type music), along with all the properties and values that are associated with them:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <http://example.org/>
select ?x ?y ?p ?o
where {
?topic rdf:type :music.
?topic rdf:description ?x.
?topic rdf:label ?y.
?topic ?p ?o.
}
Think of it as a table with ?x ?y ?p ?o being the column headers. A possible result may be:
"this is foo" "foo" <type> <music>
"this is foo" "foo" <description> "this is foo"
"this is foo" "foo" <label> "foo"
etc.
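The repeated "this is foo" and "foo" cells come from joining all four triple patterns on ?topic. A small Python sketch of that join, using toy data shaped like the possible result shown earlier (the bare names and the helper functions are simplifications made up for illustration, not ARQ's evaluation strategy):

```python
# Toy data matching the possible result shown earlier; names are illustrative.
TRIPLES = [
    ("foo", "type", "music"),
    ("foo", "description", "this is foo"),
    ("foo", "label", "foo"),
    ("bar", "type", "music"),
    ("bar", "label", "bar"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def rows():
    """Evaluate the four triple patterns as nested joins on ?topic."""
    result = []
    topics = [s for s, p, o in TRIPLES if p == "type" and o == "music"]
    for topic in topics:
        for x in objects(topic, "description"):   # ?topic rdf:description ?x
            for y in objects(topic, "label"):     # ?topic rdf:label ?y
                for s, p, o in TRIPLES:           # ?topic ?p ?o
                    if s == topic:
                        result.append((x, y, p, o))
    return result
```

In this sketch rows() yields three rows, all for foo. bar contributes nothing because it has no description, which is the difference from the more general query.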
Your query will depend on how your data is organised. My question is, are there any other properties in description.nt and labels.nt that you want to avoid in the results? If so, then you may want to load that data into a named graph and extract only descriptions and labels from that graph in your query. Arbitrary example:
SELECT ?a ?b
FROM <A>
FROM NAMED <B>
WHERE
{
?x a <foo> .
GRAPH <B> { ?x ?a ?b }
}