Verify rdf:Container (rdf:Seq) using SHACL constraints - SPARQL

I am currently trying to build a constraint validation shape for an incoming object of type rdf:Seq. The SHACL shapes I have used so far use sh:path to identify a triple and then add constraint validations on datatype, length, and count. But in the case of rdf:Seq the predicate is a variable: it can be rdf:_1, rdf:_2, and so on. How can I build an effective shape when I do not know how many elements will be present in the incoming rdf:Seq object?
Or is there a way I can check that, if the predicate is of type rdfs:ContainerMembershipProperty, the datatype of its value is validated, and otherwise it is ignored?
Appreciate any help. Thanks!
Let's say the incoming data is:
...
<incoming node> schema:colors _:blankNode1 .
_:blankNode1 rdf:type rdf:Seq .
_:blankNode1 rdf:_1 "Red" .
_:blankNode1 rdf:_2 "Blue" .
_:blankNode1 rdf:_3 "Yellow" .
...
What could the shape be?
I was trying the following
...
sh:property [
    sh:path schema:colors ;
    sh:class rdf:Seq ;
    sh:property [
        sh:path <ideally would like a regex here> ;
        # or a way to identify the path to the data triple
        sh:in ("Red" "Blue" "Green") ;
    ] ;
] ;
...
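One possible direction (a sketch of my own, not a confirmed answer): SHACL property paths cannot match a predicate by regex, but a SHACL-SPARQL constraint can enumerate the container membership predicates itself. The shape name and message below are hypothetical, and this requires a processor with SHACL-SPARQL support:
schema:ColorsSeqShape
    a sh:NodeShape ;
    sh:targetObjectsOf schema:colors ;  # validates the node holding the rdf:Seq
    sh:sparql [
        a sh:SPARQLConstraint ;
        sh:message "Seq member is not an allowed color" ;
        sh:select """
            SELECT $this ?value
            WHERE {
                $this ?member ?value .
                # match rdf:_1, rdf:_2, ... without knowing how many exist
                FILTER(STRSTARTS(STR(?member),
                    "http://www.w3.org/1999/02/22-rdf-syntax-ns#_"))
                FILTER(?value NOT IN ("Red", "Blue", "Green"))
            }
        """ ;
    ] .
Each row returned by the SELECT is reported as one violation, so every out-of-range member of the Seq produces its own validation result.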

Related

Is it possible to write a shape that validates domain and range of a given property?

I am trying to validate my ontology instances using SHACL shapes. However, I cannot find how to say that a given property assertion is valid only if it has an instance of Class1 as its subject and an instance of Class2 as its object.
In other words, I want to specify the domain (i.e., Class1) and the range (i.e., Class2) of this property.
In the following example, we specify that the range is (Customer and Person), but the domain is not specified.
ex:InvoiceShape
    a sh:NodeShape ;
    sh:property [
        sh:path ex:customer ;
        sh:class ex:Customer ;
        sh:class ex:Person ;
    ] .
I know it is possible to specify a target class (TC) for the shape, but this constrains the range of ex:customer only when the subject is an instance of TC, not in all cases.
Is it possible to write a shape that fixes both the domain and the range of a given property?
Thank you!
To state that the property constraint above applies to all instances of ex:Invoice, you can either use SHACL's implicit class targets (declare ex:Invoice itself as both an rdfs:Class and a sh:NodeShape carrying the constraint) or add ex:InvoiceShape sh:targetClass ex:Invoice. This, however, does not specify that all subjects of an ex:customer triple must be instances of ex:Invoice.
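For illustration, the implicit class target variant would look like this (my sketch, not code from the original answer):
# The class doubles as a shape; its instances become the focus nodes.
ex:Invoice
    a rdfs:Class, sh:NodeShape ;
    sh:property [
        sh:path ex:customer ;
        sh:class ex:Customer ;
    ] .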
To make sure that the property ex:customer can only be used at instances of ex:Invoice, you can use:
ex:InverseInvoiceShape
    a sh:NodeShape ;
    sh:targetSubjectsOf ex:customer ;
    sh:class ex:Invoice .
The shape above applies to all subjects of an ex:customer triple. A violation will be reported if that subject is not an instance of ex:Invoice.
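Symmetrically (my addition, not part of the original answer), the range can be enforced for every use of the property with sh:targetObjectsOf:
ex:CustomerRangeShape
    a sh:NodeShape ;
    sh:targetObjectsOf ex:customer ;
    sh:class ex:Customer .
Together, the two inverse-target shapes pin down the domain and range of ex:customer regardless of which nodes the property appears on.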
FWIW, your original example states that the values of ex:customer must be both ex:Customer and ex:Person instances. If you meant to express 'either customer or person', then use:
ex:InvoiceShape
    a sh:NodeShape ;
    sh:targetClass ex:Invoice ;
    sh:property [
        sh:path ex:customer ;
        sh:or (
            [ sh:class ex:Customer ]
            [ sh:class ex:Person ]
        )
    ] .

RDF language tag with gender declension

Is there a best practice for defining a literal with a language tag and a gender declension?
I couldn't find if there is a native solution for this.
We're using SPARQL and TriG, so a pseudo example of what I'm trying to achieve would be (using TriG):
content:data2
    a k_content:Data ;
    content:content "איך אתה מרגיש?"#he-male ;
    content:content "איך את מרגישה"#he-female .
Would #he-female / #he-male be a big no no? How did you address this?
As @AKSW suggested, one approach is using an intermediate node (in real TriG, language tags are written with @, so @he is used below):
content:data2
    a k_content:Data ;
    content:content [
        rdf:value "איך אתה מרגיש?"@he ;
        content:declension gender:male
    ] ;
    content:content [
        rdf:value "איך את מרגישה"@he ;
        content:declension gender:female
    ] .
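To select the male variant you could then query the intermediate node like this (a sketch; assumes the rdf:, gender:, and content: prefixes from the data above):
SELECT ?text WHERE {
    content:data2 content:content ?n .
    ?n rdf:value ?text ;
       content:declension gender:male .
}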
Another approach is to use dedicated (sub-)properties:
content:data2
    a k_content:Data ;
    content:content-male "איך אתה מרגיש?"@he ;
    content:content-female "איך את מרגישה"@he .
The ontology (T-Box) could contain the following:
content:content-female rdfs:subPropertyOf content:content .
content:content-intersex rdfs:subPropertyOf content:content .
content:content-male rdfs:subPropertyOf content:content .
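With that T-Box, a query can still retrieve all variants through the subproperty hierarchy; a sketch (the property path substitutes for RDFS inference, and the content: prefix is as in the data above):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?text WHERE {
    content:data2 ?p ?text .
    ?p rdfs:subPropertyOf* content:content .
}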
I agree that label characteristics should be stored in separate fields. The Lemon ontology provides appropriate properties, and OLIA and GOLD (ISO DCR) provide values. (OLIA probably also has appropriate properties.)

RDF Data Cubes, AttributeProperty, units of measurement & QUDT

I'm doing some work with the RDF Data Cube vocabulary for publishing time series of various data, among others sensor data. A sensor measurement is taken at a specific time at a specific station.
I will model both time and station as qb:DimensionProperty and the measurement itself as qb:MeasureProperty. I would also like to state what unit the measurement is in; in this particular example it is atmospheric pressure at the height of the station. My understanding from the spec is that the unit would be modeled as a qb:AttributeProperty.
In the description of the data structure I would have something like this:
<dsd/prestas0> a qb:DataStructureDefinition ;
    qb:component
        [ qb:dimension <stn> ; qb:order 1 ],
        [ qb:dimension <time> ; qb:order 2 ],
        [ qb:attribute <unit> ; qb:order 3 ],
        [ qb:measure <prestas0> ; qb:order 4 ] .

<stn> a qb:DimensionProperty ;
    rdfs:label "Station" .

<time> a qb:DimensionProperty ;
    rdfs:label "Time" .

<unit> a qb:AttributeProperty ;
    rdfs:label "Unit" ;
    rdfs:comment "The unit of the measurement" .

<prestas0> a qb:MeasureProperty ;
    rdfs:label "Measurement" ;
    rdfs:range xsd:float .

# Units in use
<hPa> a qudt:Unit ;
    rdfs:label "Atmospheric pressure (hPa)" ;
    rdfs:comment "Atmospheric pressure on station level" ;
    rdfs:subClassOf unit:Pascal .
As you can see, I also created an instance of a unit, called <hPa>, where I use rdfs:subClassOf to derive it from QUDT's unit:Pascal.
Now my questions:
Is my understanding correct that qb:AttributeProperty is the right way to model the unit?
Is it fine to subclass from QUDT the way I did? I am aware that I have hPa while QUDT defines only Pa, so I would probably have to change the data accordingly.
Can I, in general, simply use units from QUDT directly (in terms of their URIs) if they do not need specific tailoring like in this example?
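For context, an observation under this data structure definition might look like the following (my illustration, not from the original post; the URIs and values are hypothetical):
<obs/2015-11-13T12/ABC> a qb:Observation ;
    qb:dataSet <dataset/prestas0> ;
    <stn> <station/ABC> ;                          # dimension: station
    <time> "2015-11-13T12:00:00Z"^^xsd:dateTime ;  # dimension: time
    <unit> <hPa> ;                                 # attribute: unit of measurement
    <prestas0> "1013.25"^^xsd:float .              # measure: the pressure value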

Set a default value for a property with SPIN

I am new to SPIN. I wonder if it makes sense to use it for the following purpose. Say I have the following class hierarchy:
ex:Vehicle
    ex:Car
        ex:Sedan
Some classes have the property owl:equivalentClass set to some value, for example:
ex:Vehicle
    owl:equivalentClass wd:MeanOfTransportation .
ex:Sedan
    owl:equivalentClass wd:Sedan .
In case owl:equivalentClass is not set to a value, it should take the value from its parent class. In the above example:
ex:Car
    owl:equivalentClass wd:MeanOfTransportation .
Can this be accomplished with SPIN, in my case using TopBraid?
It makes sense to use SPIN for this purpose, because the SPIN inference engine is the only inference engine available in TopBraid Composer Free Edition.
In other TopBraid Composer editions, the appropriateness, as well as the result obtained, may vary depending on your inferencing configuration (Inferences > Configure Inferencing).
The rule is:
rdfs:Class spin:rule [
    rdf:type sp:Construct ;
    sp:text """
        CONSTRUCT {
            ?this owl:equivalentClass ?equivalentClass .
        }
        WHERE {
            ?this rdfs:subClassOf ?superClass .
            ?superClass owl:equivalentClass|^owl:equivalentClass ?equivalentClass .
            FILTER NOT EXISTS {
                ?this owl:equivalentClass|^owl:equivalentClass [] .
            }
        }
    """
] .
Please note that this SPIN rule is attached to rdfs:Class, the class of which ex:Vehicle, ex:Car, etc. are instances.
?this is a special variable that refers to the "current" instance of this class.
It seems that the spl:InferDefaultValue SPIN template cannot be used in your case, because spl:InferDefaultValue doesn't accept SPARQL variables as its spl:defaultValue argument.
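For comparison, this is roughly how spl:InferDefaultValue is used with a constant default value (a sketch from the SPL library as I recall it; check the argument names against your TopBraid version):
ex:Car spin:rule [
    rdf:type spl:InferDefaultValue ;
    spl:predicate owl:equivalentClass ;
    spl:defaultValue wd:MeanOfTransportation
] .
Since spl:defaultValue must be a constant node, it cannot pull the value from the superclass dynamically, which is why the CONSTRUCT rule above is needed.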

Apache Jena full-text search (with external content)

I would like to configure something like this:
RDF dataset of metadata about books;
Books placed separately as XHTML files, with paragraphs carrying unique IDs;
Every book's metadata includes something like a dc:source link to the file (absolute, like a proper URI? what about scaling?);
I know this could be pretty trivial, but I can't grasp it properly. To begin with, I am trying to index just tiny plain TXT files, each linked from dc:source in the metadata file. As I understand it, this should be enough to index everything included. I am trying to do it like the guy in this post here. Unlike him, I want to index the RDF dataset as well as the external files. Notably, these two commands log no errors (on the contrary, they log that 57 triples were loaded):
java -cp /home/honza/.apache-jena-fuseki-2.3.0/fuseki-server.jar tdb.tdbloader --tdb=run/configuration/service2.ttl testDir/test_dataset.ttl
INFO -- Start triples data phase
INFO ** Load into triples table with existing data
INFO -- Start quads data phase
INFO ** Load empty quads table
INFO Load: testDir/test_dataset.ttl -- 2015/11/13 12:46:22 CET
INFO -- Finish triples data phase
INFO ** Data: 57 triples loaded in 0,29 seconds [Rate: 193,22 per second]
INFO -- Finish quads data phase
INFO -- Start triples index phase
INFO -- Finish triples index phase
INFO -- Finish triples load
INFO ** Completed: 57 triples loaded in 0,33 seconds [Rate: 172,21 per second]
INFO -- Finish quads load
and
java -cp /home/honza/.apache-jena-fuseki-2.3.0/fuseki-server.jar jena.textindexer --desc=run/configuration/service2.ttl
WARN Values stored but langField not set. Returned values will not have language tag or datatype.
After that, the server runs properly and I can see the graph, but it includes no data.
My config for this service is below (I don't know whether it is right to have the service and DB config in one file; for me it works better at the moment, as splitting them throws some errors):
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix : <#> .
[] rdf:type fuseki:Server .

<#service2> rdf:type fuseki:Service ;
    rdfs:label "TDB/text service" ;
    fuseki:name "test" ;                        # http://host:port/test
    fuseki:serviceQuery "sparql" ;              # SPARQL query service
    fuseki:serviceQuery "query" ;               # SPARQL query service (alt name)
    fuseki:serviceUpdate "update" ;             # SPARQL update service
    fuseki:serviceUpload "upload" ;             # Non-SPARQL upload service
    fuseki:serviceReadWriteGraphStore "data" ;  # SPARQL Graph Store protocol (read and write)
    # A separate read-only graph store endpoint:
    fuseki:serviceReadGraphStore "get" ;        # SPARQL Graph Store protocol (read only)
    fuseki:dataset :text_dataset .
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .
:text_dataset rdf:type text:TextDataset ;
    text:dataset <#test> ;
    text:index <#indexLucene> .
Firstly, you haven't actually defined a Lucene index explicitly, so what you likely get is a transient in-memory index that is thrown away every time your application stops. At a minimum you need the following in your configuration:
# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:path/to/index/> .
Where <file:path/to/index/> points to a directory where you want your text index to be stored.
Secondly, you haven't told the text search how the Lucene index is structured. Even if you have separately created your index from your external files, you need to define in your configuration how Jena should use and access that index.
From the documentation you need to define an entity map:
# Mapping in the index
# URI stored in field "uri"
# rdfs:label is mapped to field "text"
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        [ text:field "text" ; text:predicate rdfs:label ]
    ) .
The comments in the example from the documentation hopefully describe things fairly well. The text:entityField property specifies the field in your index that stores the URI associated with the indexed data, i.e. it provides the means to link text index hits back to the RDF in your triple store. The text:defaultField specifies the field containing the indexed data, i.e. the field that the text search will actually search.
The optional text:map shown here can be used to further customise which fields are searched, allowing you to index multiple pieces of content in different fields and then write queries that search your text index in different ways.
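For instance (a hypothetical extension of the map above, not from the original answer), a second field could index titles separately, assuming the metadata uses dc:title:
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:defaultField "text" ;
    text:map (
        # rdfs:label literals go to the default "text" field
        [ text:field "text" ; text:predicate rdfs:label ]
        # dc:title literals are indexed in a separate "title" field
        [ text:field "title" ; text:predicate dc:title ]
    ) .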
Once you have an appropriately defined entity map you need to link it to your index configuration like so:
# Text index description
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:path/to/index/> ;
    text:entityMap <#entMap> .
With this in place you should actually be able to get results from your index.
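Once the index is populated, queries go through the text:query property function (my addition for completeness; the search term is just an example):
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?label WHERE {
    # find subjects whose indexed "text" field matches "book"
    ?s text:query (rdfs:label 'book') ;
       rdfs:label ?label .
}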