How to create a new variable in CONSTRUCT query - sparql

I'm transforming the data in one data store to another form / ontology / schema, using SPARQL.
The data is actually provenance, but it can be simplified as a series of relationships like this: A produces D, B consumes D.
:A0 :consumes :D0 ;
    :produces :D1, :D2 .
:A1 :produces :D3 .
:A2 :consumes :D1, :D2 ;
    :produces :D4 .
:A3 :consumes :D3, :D4 ;
    :produces :D5, :D6 .
(There is no guarantee D is always produced by some A, or will be consumed by some other A. But every D will only be produced by one A.)
I would like to extract the data-dependency information. An example query looks like this:
CONSTRUCT {
  ?producer :hasNextStage ?consumer .
}
WHERE {
  ?producer :produces ?data .
  OPTIONAL {
    ?consumer :consumes ?data .
    FILTER (?producer != ?consumer)
  }
}
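For concreteness, against the sample data above this should produce (if I have traced it correctly):
:A0 :hasNextStage :A2 .
:A1 :hasNextStage :A3 .
:A2 :hasNextStage :A3 .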
Everything is fine up to here. However, I would like to have more information, say "which A is connected to which other A, and by what data", something like this:
:A0 :hasInfluence :INFLUENCE .
:INFLUENCE :stage :A2 ;
    :data :D1, :D2 .
As demonstrated, this requires me to mint a new node (:INFLUENCE) and attach triples to it.
Is there a way to do this in SPARQL?
------ UPDATED SECONDARY QUESTION ------
According to cygri's answer, I changed the query to this:
CONSTRUCT {
  ?producer :hasInfluence ?influence .
  ?influence :stage ?consumer ;
      :data ?data .
}
WHERE {
  ?producer :produces ?data .
  OPTIONAL {
    ?consumer :consumes ?data .
    FILTER (?producer != ?consumer)
    BIND (IRI(CONCAT("http://my/ns/#", CONCAT(STRAFTER(STR(?producer), "#"), STRAFTER(STR(?consumer), "#")))) AS ?influence)
  }
}
However, the BIND clause seems to have no effect. After shortening it, I narrowed the problem down to the ?producer variable: if I use that variable inside the BIND, it doesn't work. It seems ?producer is not bound there? (The FILTER, however, does work.)
If I move this BIND clause out of the OPTIONAL (the working variant is sketched below), everything works fine. But this is not intuitive, and I'm wondering why it won't work inside the OPTIONAL?
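For reference, here is the variant that does work for me, with the BIND moved out of the OPTIONAL (the IRI-minting expression is unchanged):
CONSTRUCT {
  ?producer :hasInfluence ?influence .
  ?influence :stage ?consumer ;
      :data ?data .
}
WHERE {
  ?producer :produces ?data .
  OPTIONAL {
    ?consumer :consumes ?data .
    FILTER (?producer != ?consumer)
  }
  BIND (IRI(CONCAT("http://my/ns/#", CONCAT(STRAFTER(STR(?producer), "#"), STRAFTER(STR(?consumer), "#")))) AS ?influence)
}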

The simplest solution would be to avoid a new variable in the CONSTRUCT template altogether and just use a blank node:
CONSTRUCT {
  ?producer :hasInfluence [
    :stage ?consumer ;
    :data ?data
  ]
}
This should produce the desired graph structure. If you insist on an IRI instead of a blank node for the influence node (as you probably should), then you would want something like:
CONSTRUCT {
  ?producer :hasInfluence ?influence .
  ?influence :stage ?consumer ;
      :data ?data .
}
WHERE {
  ...
  BIND (IRI(xxx) AS ?influence)
}
This assigns a new IRI to variable ?influence and uses that variable in the CONSTRUCT template.
Now, xxx is just a placeholder for the expression that calculates the IRI. You don’t provide enough detail to say what should go in there. Would there be one influence node for each data node? If so, you could take the string form of the data IRI: str(?data) and do some string replacement using replace(s, search, replace) to make a nice unique IRI for the influence node.
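For illustration only: assuming there should be one influence node per data node, and that your IRIs share the http://my/ns/# namespace as in your own BIND attempt, the WHERE clause could look roughly like this (the "influence-" naming is just a made-up convention):
WHERE {
  ?producer :produces ?data .
  ?consumer :consumes ?data .
  FILTER (?producer != ?consumer)
  # mint e.g. http://my/ns/#influence-D1 from http://my/ns/#D1
  BIND (IRI(REPLACE(STR(?data), "#", "#influence-")) AS ?influence)
}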

Related

RDFlib Blank node update query

I am trying to update the object of a triple whose subject is a blank node, using RDFlib. I first select the blank node in one function and insert it into the update query in a second function; however, this doesn't produce the required output. I can't use the add() method or initBindings, as I need to save the SPARQL query executed for the user.
Sample data
@prefix rr: <http://www.w3.org/ns/r2rml#> .

[ rr:objectMap [ rr:column "age" ;
                 rr:language "dhhdhd" ] ] .
Code
from rdflib import Graph
from rdflib.plugins.sparql.processor import processUpdate

mapping_graph = Graph().parse("valid_mapping.ttl", format="ttl")

# find the blank node for the update query
def find_om_IRI():
    query = """SELECT ?om
    WHERE {
        ?om rr:language 'dhhdhd' .
    }
    """
    qres = mapping_graph.query(query)
    for row in qres:
        return row[0]

# insert blank node as subject to update query
def change_language_tag():
    om_IRI = find_om_IRI()
    update_query = """
    PREFIX rr: <http://www.w3.org/ns/r2rml#>
    DELETE DATA {
        _:%s rr:language 'dhhdhd' .
    }
    """ % (om_IRI)
    processUpdate(mapping_graph, update_query)
    print(update_query)
    print(mapping_graph.serialize(format="ttl").decode("utf-8"))
    return update_query

change_language_tag()
This, however, prints the following output, leaving the graph unchanged:
@prefix rr: <http://www.w3.org/ns/r2rml#> .

[ rr:objectMap [ rr:column "age" ;
                 rr:language "dhhdhd" ] ] .
If you filter based on the blank node's string value, it does work. This is the final query I came up with:
PREFIX rr: <http://www.w3.org/ns/r2rml#>
DELETE { ?om rr:language "dhhdhd" }
INSERT { ?om rr:language "en-fhfhfh" }
WHERE {
  SELECT ?om
  WHERE {
    ?om rr:language "dhhdhd" .
    FILTER(str(?om) = "ub1bL24C24") .
  }
}
Indeed, as commenter TallTed says, "Blank nodes cannot be directly referenced in separate queries". You are trying to do something with blank nodes for which they are expressly not designed, namely persisting their absolute identity, e.g. across separate queries. You should take the approach of relative identification (locate the blank node with reference to identified, URI, nodes) or use single SPARQL queries. So this is an RDF/SPARQL question, not an RDFlib question.
You said: "I can't combine the queries as there could be other object maps with the same language tag". If you cannot deterministically refer to a node because it lacks distinctness, you will have to change your data, but I suspect you can refer to it; see the suggestion at the end.
Then you said, "I have figured out the solution and have updated the question accordingly. Its a hack really...". Yes, don't do this! You should have a solution that does not depend on quirks of RDFlib. RDF and the Semantic Web in general are all about universally defined, standard data and querying, so don't rely on a particular toolkit for a data question like this. Use RDFlib only as an implementation, one that should be replicable in another language. I personally model all my RDFlib triple adding/deleting/selecting code as standard SPARQL queries first, so that my RDFlib code is just a standard-function equivalent.
In your own answer you said, "If you filter based on the blank node value..."; don't do this either!
My suggestion is to change your underlying data to include representations of things, named nodes etc., that you can fix on for querying. If you cannot distinguish between the things you want to change without resorting to hacks, then you have a data modelling problem that needs solving. I do think you can distinguish object maps, though.
In your data, you must be able to fix on the particular object map for which you are changing the language. Is the object map unique per column, and is the column uniquely identified by its rr:column value? If so:
SELECT ?om ?lang
WHERE {
  ?om rr:column ?col .
  ?om rr:language ?lang .
  FILTER (?col = "age")
}
This will get you the object map for the column "age" so, to change it:
DELETE {
  ?om rr:language ?lang .
}
INSERT {
  ?om rr:language "new-language" .
}
WHERE {
  ?om rr:column ?col .
  ?om rr:language ?lang .
  FILTER (?col = "age")
}

SPIN representation to SPARQL

Is there an API that could help convert SPIN representation (of a SPARQL query) back to its SPARQL query form?
From:
[ a <http://spinrdf.org/sp#Select> ;
  <http://spinrdf.org/sp#where> (
    [ <http://spinrdf.org/sp#object>    [ <http://spinrdf.org/sp#varName> "o"^^<http://www.w3.org/2001/XMLSchema#string> ] ;
      <http://spinrdf.org/sp#predicate> [ <http://spinrdf.org/sp#varName> "p"^^<http://www.w3.org/2001/XMLSchema#string> ] ;
      <http://spinrdf.org/sp#subject>   [ <http://spinrdf.org/sp#varName> "s"^^<http://www.w3.org/2001/XMLSchema#string> ]
    ]
  )
] .
To:
SELECT *
WHERE {
?s ?p ?o .
}
Thanks in advance.
I know two Jena-based APIs for working with SPIN.
You can use either org.topbraid:shacl:1.0.1, which is based on jena-arq:3.0.4, or org.spinrdf:spinrdf:3.0.0-SNAPSHOT (mentioned in the comment), which is a fork of the first one but with changed namespaces and updated dependencies.
Note that the first (original) API may also work with modern Jena (3.13.x); at least, your task can be solved in those circumstances.
The second API has no Maven release yet, but it can be included in your project via JitPack.
To solve the problem, you need to find the root org.apache.jena.rdf.model.Resource and cast it to org.topbraid.spin.model.Select (or org.spinrdf.model.Select) using Jena polymorphism (i.e. the operation org.apache.jena.rdf.model.RDFNode#as(Class)).
Then #toString() will return the desired query with the model's prefixes.
Note that all the personalities are already included in the model via static initialization.
A demonstration of this approach is SpinTransformer from the ONT-API test scope, which transforms SPARQL-based queries into an equivalent form with sp:text.

SPARQL path traversing

I am trying to create a query using SPARQL on a Turtle file where I have part of the graph representing links as follows [graph image omitted].
Is it possible to search for the type Debit and get all the literals associated with its parent, i.e. R494Vol1D2, Salvo, Vassallo?
Do I need to use paths?
As AKSW correctly said, RDF is about directed graphs. So I created a small N-Triples file based on your image of the graph. I assume that the dataset looks like this:
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://purl.org/dc/terms/type> "Debit".
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://purl.org/dc/terms/identifier> "R494Vol1D2".
<http://natarchives.com.mt/deed/R494Vol1-D2> <http://data.archiveshub.ac.uk/def/associatedWith> <http://natarchives.com.mt/person/person796>.
<http://natarchives.com.mt/person/person796> <http://xmlns.com/foaf/0.1/firstName> "Salvo".
<http://natarchives.com.mt/person/person796> <http://xmlns.com/foaf/0.1/family_name> "Vassallo".
Also, I did not know the prefix locah, but according to http://prefix.cc it stands for http://data.archiveshub.ac.uk/def/
So if this dataset is correct you could use the following query:
1 SELECT ?literal WHERE{
2 ?start <http://purl.org/dc/terms/type> "Debit".
3 ?start <http://data.archiveshub.ac.uk/def/associatedWith>* ?parent.
4 ?parent ?hasLiteral ?literal.
5 FILTER(isLiteral(?literal) && ?literal != "Debit" )
6 }
In line 2 we define the starting point of our path, which is every vertex that has the type "Debit". Then we look for all vertices that are connected to ?start by an edge labelled <http://data.archiveshub.ac.uk/def/associatedWith>; these vertices are bound to ?parent. After that we look for all triples that have ?parent as subject and store the object in ?literal. In line 6 we filter out of ?literal everything that is not a literal, or that is "Debit", resulting in the desired outcome.
If I modeled the direction of <http://data.archiveshub.ac.uk/def/associatedWith> wrongly, you could change line 3 of the query to:
?start ^<http://data.archiveshub.ac.uk/def/associatedWith>* ?parent
This would change the direction of the edge.
And to answer the question of whether you need to use paths: if you do not know how long the path of edges labelled <http://data.archiveshub.ac.uk/def/associatedWith> will be, then in my opinion yes, you will have to use property paths (either * or +).
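For readability, the same query can also be written with prefix declarations (assuming locah really does resolve to http://data.archiveshub.ac.uk/def/ as noted above, and using the standard dcterms prefix):
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX locah:   <http://data.archiveshub.ac.uk/def/>

SELECT ?literal WHERE {
  ?start dcterms:type "Debit" .
  ?start locah:associatedWith* ?parent .
  ?parent ?hasLiteral ?literal .
  FILTER(isLiteral(?literal) && ?literal != "Debit")
}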

Is it possible to get the nested json output in SPARQL?

I am using MarkLogic 8.0-6.3
I came across a scenario where I need nested JSON output from SPARQL.
For example, when an IRI has multiple values for the same predicate, I want those values returned as an array in the result, not as separate whole triples.
For example, assume these triples:
@prefix p0: <http://www.mla.com/term/> .

p0:7 <http://www.w3.org/2004/02/skos/core#narrower> p0:768 ,
     p0:769 ,
     p0:770 ,
     p0:771 .
SPARQL query:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX skos-mla: <http://www.mlacustom.com#>
PREFIX term: <http://www.mla.com/term/>
select ?iri ?o {
  graph <thesaurus-term> {
    bind(term:7 as ?iri)
    term:7 skos:narrower ?o .
  }
}
The above query will return four result rows, one per narrower term.
What I want is for it to return a single JSON structure like this:
{
  "iri": "http://www.mla.com/term/7",
  "narrowers": ["http://www.mla.com/term/768", "http://www.mla.com/term/769", "http://www.mla.com/term/770", "http://www.mla.com/term/771"]
}
The above JSON is just to explain the problem. In reality I would need a more complex JSON structure: instead of a string array, I need an array of JSON objects. I know that I can read the response and recreate the whole JSON in any format, but that has performance impacts.
In recent versions of MarkLogic 9, the Optic API can support this requirement:
Use the op.fromSPARQL() accessor to project columns of values from the triples.
Chain a select() call using op.jsonObject() to collect the values as properties of objects.
Chain a groupBy() call using op.arrayAggregate() to aggregate the objects in an array.
Chain a result() call to get the output.
For more information, see:
http://docs.marklogic.com/op.jsonObject
and:
http://docs.marklogic.com/op.arrayAggregate
Hoping that helps,

How to filter query SPARQL for property "type"

I have a data source file where one of the property values is an actual class instance:
<clinic:Radiology rdf:ID="rad1234">
  <clinic:diagnosis>Stage 4</clinic:diagnosis>
  <clinic:ProvidedBy rdf:resource="#MountSinai"/>
  <clinic:ReceivedBy rdf:resource="#JohnSmith"/>
  <clinic:patientId>7890123</clinic:patientId>
  <clinic:radiologyDate>01-01-2017</clinic:radiologyDate>
</clinic:Radiology>
so clinic:ProvidedBy is pointing to this:
<clinic:Radiologists rdf:ID="MountSinai">
  <clinic:name>Mount Sinai</clinic:name>
  <clinic:npi>1234567</clinic:npi>
  <clinic:specialty>Oncology</clinic:specialty>
</clinic:Radiologists>
How do I query using the property clinic:ProvidedBy (whose value is of type clinic:Radiologists)? Whatever I have tried does not bring back results.
It's also not clear what exactly you want to have, so my answer will return "all radiology resources that are provided by MountSinai":
PREFIX clinic: <THE_NAMESPACE_OF_THE_CLINIC_PREFIX>
PREFIX : <THE_BASE_NAMESPACE_OF_YOUR_RDF_DOCUMENT>

SELECT DISTINCT ?s WHERE {
  ?s clinic:ProvidedBy :MountSinai
}
But I really suggest starting with an RDF and SPARQL tutorial, since from your comment your query
SELECT * WHERE { ?x rdf:resource "#MountSinai" }
is missing fundamental SPARQL basics. And for writing a matching SPARQL query, it's always good to have a look at the data in Turtle or N-Triples format, both of which are closer to the SPARQL syntax.
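And if you want to match by the type of the object of clinic:ProvidedBy rather than by a specific resource, a sketch along these lines should work (again, the clinic prefix IRI is a placeholder you have to fill in):
PREFIX clinic: <THE_NAMESPACE_OF_THE_CLINIC_PREFIX>

SELECT DISTINCT ?radiology ?provider WHERE {
  ?radiology clinic:ProvidedBy ?provider .
  ?provider a clinic:Radiologists .
}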