GraphDB : datatyped dates comparison - sparql

Given this test data with an xsd:gYearMonth value:
<http://exemple.fr/agent/123> <http://exemple.fr/ontologie/date> "2021-12"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> <http://exemple.fr/graphe/testGraphe> .
This SPARQL query, which compares it with an xsd:gYear, returns nothing in GraphDB 9.10:
SELECT *
WHERE {
?x <http://exemple.fr/ontologie/date> ?date .
FILTER(?date < "2023"^^<http://www.w3.org/2001/XMLSchema#gYear>)
}
I am aware that the SPARQL spec only defines the comparison operators for xsd:dateTime, but I am surprised that no extension has been done on this. I read here that it was done in RDF4J in "2016"^^xsd:gYear :-)
Does GraphDB support comparison between dates using different datatypes?
Thanks!

Related

MarkLogic: How can we perform a case-insensitive search in a pure SPARQL query?

I have a scenario where I am trying to find content using a SPARQL query over the triples stored in MarkLogic. The filter condition in the SPARQL query needs to perform a case-insensitive search for a particular term. How can I do that?
For example:
filter(strstarts(?personName, "FA"^^xs:string))
The above filter should also return results whose personName value starts with the lowercase form (e.g. fa). I think this gives a clear idea of the issue I am asking about.
I believe you have two options to do case-insensitive search using SPARQL in MarkLogic.
If you want to use SPARQL only, then you can do the following (modify the select statement as needed):
select * where {
?personName ?p ?o
FILTER (strstarts(lcase(str(?personName)), "fa"))
}
As an alternative you could also mix some fn:* functions with your SPARQL statement so you could do something similar to:
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select * where {
?personName ?p ?o
FILTER (fn:starts-with(fn:lower-case(str(?personName)), "fa"))
}
Don't forget that in MarkLogic you can use any fn:* or cts:* function as well (the prefix for cts:* functions would be prefix cts: <http://marklogic.com/cts#>).
I hope this helps.
Next to the good suggestions of Tamas, there is also REGEX. It accepts a case-insensitivity flag. Something like:
select * where {
?personName ?p ?o
FILTER( regex(str(?personName), "^fa", "i") )
}
HTH!
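To make the difference between the suggestions concrete, here is a minimal Python sketch of the same two matching strategies applied outside the triple store: lower-casing before a prefix test, and a regex anchored at the start with the case-insensitive flag. The helper names and the sample names are hypothetical, chosen only for illustration; this is not MarkLogic code.

```python
import re


def starts_with_lowercased(value: str, prefix: str) -> bool:
    # Same idea as lower-casing the value before a prefix test in SPARQL.
    return value.lower().startswith(prefix.lower())


def starts_with_regex(value: str, prefix: str) -> bool:
    # Same idea as REGEX(STR(?personName), "^fa", "i"):
    # re.match anchors at the start, re.IGNORECASE plays the role of "i".
    return re.match(re.escape(prefix), value, flags=re.IGNORECASE) is not None


# Hypothetical sample values, not taken from the question.
names = ["Fatima", "fabio", "FARID", "Stefan"]
matches_lower = [n for n in names if starts_with_lowercased(n, "FA")]
matches_regex = [n for n in names if starts_with_regex(n, "FA")]
```

Both strategies select the same names here. Note that a plain equality test on the lower-cased value would only match values that are exactly "fa", which is not what a prefix search wants.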

Complex SPARQL query - Virtuoso performance hints?

I have a rather complex SPARQL query, which is executed thousands of times in parallel threads (400 threads). The query is here somewhat simplified (namespaces, properties, and variables have been reduced) for readability, but the complexity is left untouched (unions, number of graphs, etc.). The query is run against 4 graphs, the biggest of which contains 5,561,181 triples.
PREFIX graphA: <GraphABaseURI:>
ASK
FROM NAMED <GraphBURI>
FROM NAMED <GraphCURI>
FROM NAMED <GraphABaseURI>
FROM NAMED <GraphDBaseURI>
WHERE{
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL <GraphABaseURI:propertyB> ?variableD .
?variableD <propertyBURI> ?variableE
}
.
GRAPH <GraphBURI>{
?variableF <propertyCURI>/<propertyDURI> ?variableG .
?variableF <propertyEURI> ?variableH
}
.
GRAPH <GraphCURI>{
?variableI <http://www.w3.org/2004/02/skos/core#notation> ?variableJ .
?variableI <http://www.w3.org/2004/02/skos/core#prefLabel> ?variableK .
FILTER (isLiteral(?variableK) && REGEX(?variableK, "literalA", "i"))
}
.
FILTER (isLiteral(?variableJ) && ?variableG = ?variableJ) .
FILTER (?variableE = ?variableH)
}
UNION
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL <propertyBURI> ?variableE .
?variableL <propertyFURI> ?variableD .
}
.
GRAPH <GraphDBaseURI>{
?variableM <propertyGURI> ?variableN .
?variableM <propertyHURI> ?variableO .
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i"))
}
.
FILTER (?variableE = ?variableN) .
}
UNION
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL <propertyBURI> ?variableE .
?variableL <propertyIURI> ?variableD .
}
.
GRAPH <GraphDBaseURI>{
?variableM <propertyGURI> ?variableN .
?variableM <propertyHURI> ?variableO .
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i"))
}
.
FILTER (?variableE = ?variableN) .
}
. FILTER (isLiteral(?variableC) && REGEX(?variableC, "literalB", "i")) .
}
I would not expect someone to transform the above query (of course...). I am only posting the query to demonstrate the complexity and all the SPARQL structures used.
My questions:
Would I gain regarding performance if I had all my triples in one graph? This way I would avoid unions and simplify my query, however, would this also benefit in terms of performance?
Are there any kinds of indexes that I could build that would help with the above query? I am not very confident about data indexing; however, having read the RDF Index Scheme section of RDF Performance Tuning, I wonder if Virtuoso 7's default indexing scheme is suitable for queries like the above. While the predicates are defined in the above query's SPARQL triple patterns, there are many triple patterns that do not have a defined subject or predicate. Could this be a major problem regarding performance?
Perhaps there is a SPARQL syntax structure that I am not aware of and could be of great help in the above query. Could you suggest something? For example, I have already improved performance by removing STR() casts and using the isLiteral() function. Could you suggest anything else?
Perhaps you could suggest overusing a complex SPARQL syntax structure?
Please note that I use Virtuoso Open source edition, built on Ubuntu, Version: 07.20.3214, Build: Oct 14 2015.
Regards,
Pantelis Natsiavas
First thing -- your Virtuoso build is long outdated; updating to 7.20.3217 as of April 2016 (or later) is strongly recommended.
Optimization suggestions are naturally limited when looking at a simplified query, but here are several thoughts, in no particular order...
Index Scheme Selection, the RDF Performance Tuning doc section following RDF Index Scheme, offers a couple of alternative and/or additional indexes which may make sense for your queries and data. As you say that some of your patterns will have defined graph and object, and undefined subject and predicate, some other indexes may also make sense (e.g., GOPS, GOSP), depending on some other factors.
Depending on how much your data has changed since original load, it may be worth rebuilding the free-text indexes, with this SQL command (which may be issued through any SQL interface -- iSQL, ODBC, JDBC, etc.) —
VT_INC_INDEX_DB_DBA_RDF_OBJ ()
Using the bif:contains predicate can result in substantially better performance than regex() filters, for instance replacing —
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i")) .
— with —
?variableO bif:contains "'literalA'" .
FILTER ( isLiteral(?variableO) ) .
Explain() and profile() can be helpful in query optimization efforts. Much of this output is meant for analysis by Development, so it may not mean much to you, but providing it to other Virtuoso users can still yield helpful suggestions.
For a number of reasons, the rdf:type predicate (often expressed as a, thanks to SPARQL/Turtle syntactic sugar) can be a performance killer. Removing those predicates from your graph pattern is likely to boost performance substantially. If needed, there are other ways to limit the solution set (such as by testing for attributes only possessed by entities of your desired rdf:type) which do not have such negative performance impacts.
(ObDisclaimer: OpenLink Software produces Virtuoso, and employs me.)

How to get a concise bounded description of a resource with Sesame?

I've been testing Sesame 2.7.2 and got a big surprise when faced with the fact that DESCRIBE queries do not include the blank node closure [EDIT: the right term for this is CBD, for concise bounded description].
If I correctly understand, the SPARQL spec is quite loose on that and says that what is returned is actually up to the provider, but I'm still surprised at the choice, since bnodes (in the results of the describe query) cannot be used in subsequent SPARQL queries.
So the question is: how can I get a closed description of a resource <uri1> without doing:
query DESCRIBE <uri1>
iterate over the result to determine which objects are blank nodes
then DESCRIBE ?b WHERE { <uri1> pred_relating_to_bnode_ ?b }
do it recursively and chaining over as long as bnodes are found
If I'm not mistaken, depth-2 bnodes would have to be described with
DESCRIBE ?b2 WHERE { <uri1> <p1> ?b . ?b <p2> ?b2 }
unless there is a simpler way to do this?
Finally, would it not be better and simpler to let DESCRIBE return a closed description of a resource where you can still obtain the currently returned result with something like the following?
CONSTRUCT {<uri1> ?p ?o} WHERE {<uri1> ?p ?o}
EDIT: here is an example of a closed result I want to get back from Sesame
<urn:sites#1> a my:WebSite .
<urn:sites#1> my:domainName _:autos1 .
<urn:sites#1> my:online "true"^^xsd:boolean .
_:autos1 a rdf:Alt .
_:autos1 rdf:_1 _:autos2 .
_:autos2 my:url "192.168.2.111:15001"@fr .
_:autos2 my:url "192.168.2.111:15002"@en .
Currently, DESCRIBE <urn:sites#1> returns the same result as the query CONSTRUCT WHERE {<urn:sites#1> ?p ?o}, so I only get this:
<urn:sites#1> a my:WebSite .
<urn:sites#1> my:domainName _:autos1 .
<urn:sites#1> my:online "true"^^xsd:boolean .
Partial solutions using SPARQL
Based on your comments, this isn't an exact solution yet, but note that you can describe multiple things in a given describe query. For instance, given the data:
@prefix : <http://example.org/> .
:Alice :named "Alice" ;
:likes :Bill, [ :named "Carl" ;
:likes [ :named "Daphne" ]].
:Bill :likes :Elaine ;
:named "Bill" .
you can run the query:
PREFIX : <http://example.org/>
describe :Alice ?object where {
:Alice :likes* ?object .
FILTER( isBlank( ?object ) )
}
and get the results:
@prefix : <http://example.org/> .
:Alice
:likes :Bill ;
:likes [ :likes [ :named "Daphne"
] ;
:named "Carl"
] ;
:named "Alice" .
That's not a complete description of course, because it's only following :likes out from :Alice, not arbitrary predicates. But it does get the blank nodes named "Carl" and "Daphne", which is a start.
The larger issue in Sesame
It looks like you're going to have to do something like what's described above, possibly with multiple searches, or you're going to have to modify Sesame. The alternative to writing some creative SPARQL is to change the way that Sesame implements describe queries. Some endpoints make this relatively easy, but Sesame doesn't seem to be one of them. There's a mailing list thread from 2011, Custom SPARQL DESCRIBE Implementation, that seems to address this same problem.
Roberto García asks:
I'm trying to customise the behaviour of SPARQL DESCRIBE queries.
I'm willing to get something similar to CBD (i.e. all properties and
values for the described resource plus all properties and values for
the blank nodes connected to it).
I have tried to reproduce a similar behaviour using a CONSTRUCT query
but the performance is not good and the query gets quite complex if I
try to consider long chains of properties pointing to blank nodes
starting from the described resource.
Jeen Broekstra replies:
The implementation of DESCRIBE in Sesame is hardcoded in the query
parser. It can only be changed by adapting the parser itself, and even
then it will be tricky, as the query model has no easy way to express it
either: it needs an extension of the algebra.
> If this is not possible, any advice about how to implement it using CONSTRUCT
queries?
I'm not sure it's technically possible to do this in a single query.
CBDs are recursive in nature, and while SPARQL does have some support
for recursivity (property chains), the problem is that you have to do an
intermediate check in every step of the property chain to see if the
bound value is a blank node or not. This is not something that SPARQL
supports out of the box: property chains are defined to have only length
of the path as the stop condition.
Perhaps something is possible using a convoluted combination of
subqueries, unions and optionals, but I doubt it.
I think the best workaround is instead to use the standard DESCRIBE
format that Sesame supports, and for each blank node value in that
result do a separate consecutive query. In other words: you solve it by
hand.
The only other option is to log a feature request for support of CBDs in
Sesame. I can't give any guarantees about if/when that will be followed
up on though.
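The "solve it by hand" workaround amounts to a small graph traversal: collect every triple whose subject is the resource, then repeat for each blank node found in object position. Below is a minimal Python sketch of that traversal over an in-memory list of triples. It is an illustration only: the BNode marker class and the function name are hypothetical, it does not use the Sesame API, and it ignores the reification part of the full CBD definition.

```python
from collections import deque


class BNode(str):
    """Hypothetical marker type: a string label that stands for a blank node."""


def concise_bounded_description(triples, start):
    """Return the triples about `start`, following blank-node objects recursively."""
    # Index triples by subject so each node is expanded with one lookup.
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    result = []
    seen = {start}          # guards against cycles between blank nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for triple in by_subject.get(node, []):
            result.append(triple)
            obj = triple[2]
            # Only blank nodes are expanded; IRIs and literals end the walk.
            if isinstance(obj, BNode) and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return result


# Data modelled on the <urn:sites#1> example from the question.
site = "urn:sites#1"
b1, b2 = BNode("autos1"), BNode("autos2")
triples = [
    (site, "rdf:type", "my:WebSite"),
    (site, "my:domainName", b1),
    (site, "my:online", "true"),
    (b1, "rdf:type", "rdf:Alt"),
    (b1, "rdf:_1", b2),
    (b2, "my:url", "192.168.2.111:15001"),
    (b2, "my:url", "192.168.2.111:15002"),
    ("urn:sites#2", "my:online", "false"),  # unrelated resource, must be excluded
]
cbd = concise_bounded_description(triples, site)
```

Against a remote store the same loop would have to issue one query per level, since blank node labels from one result generally cannot be reused in a later query; that is the cost the thread is pointing at.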

Inquiry on example of explicit join in the SPARQL

I have the following SPARQL query (from the book A Semantic Web Primer):
select ?n
where
{
?x rdf:type uni:Course;
uni:isTaughtBy :949352 .
?c uni:name ?n .
FILTER(?c=?x) .
}
In this case, I guess this code is the same as the following:
Select ?n
Where
{
?x rdf:type uni:course;
uni:isTaughtBy :949352 .
?x uni:name ?n .
}
Does this query lead to coding error?
No, I don't see why it should give you an error or produce wrong results. Just make sure to always use the right case (uni:Course vs. uni:course), as SPARQL is case sensitive.
To be honest, the first version seems rather obscure as it uses a FILTER without a real need for it. That said, you may further slim down your query if you wish:
SELECT ?n
WHERE
{
?x rdf:type uni:Course;
uni:isTaughtBy :949352;
uni:name ?n .
}
However, keep in mind that saving characters does not always lead to improved readability.
For your example, yes, the queries are identical and there would be no value in using a FILTER over a join.
However, the reason why you might use the FILTER form is the difference in semantics between joins and the = operator.
Joins require that the values of the variables be exactly the same RDF term, whereas = tests value equality: do the RDF terms represent the same value? This is primarily a concern when one or both of the values may be literals.
It's easier to see with a specific example: assume ?x = 4 and ?c = 4.0 (which is a bad example for your query but illustrates the point).
?x = ?c would give true, while a join would give no results because they are not the exact same term.
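The distinction can be sketched in a few lines of Python. This is a simplified model, not a SPARQL engine: literals are represented as hypothetical (lexical form, datatype) pairs, term identity is plain tuple equality, and value equality is approximated for numeric literals only by parsing the lexical form.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


def same_term(a, b):
    # Join semantics: both the lexical form and the datatype must be identical.
    return a == b


def same_value(a, b):
    # Simplified stand-in for "=" on numeric literals: compare parsed values.
    return Decimal(a[0]) == Decimal(b[0])


x = ("4", XSD + "integer")    # the literal 4
c = ("4.0", XSD + "decimal")  # the literal 4.0

joined = same_term(x, c)      # what a join on a shared variable would require
filtered = same_value(x, c)   # what FILTER(?c = ?x) would test
```

Here the join-style comparison fails while the value comparison succeeds, which is the only situation in which the FILTER form of the query would return more results than the version with a shared variable.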

How do I write this SPARQL query?

I've the following Ontology built in Protege 4.
In this ontology, the main class Frame has a datatype property hasDuration with domain Frame and range UnsignedShort. The classes ShortFrame and LongFrame are inferred from the class SizedFrame with the following restrictions.
Restriction for the ShortFrame class:
SizedFrame that hasDuration some unsignedLong[<=20]
Restriction for the LongFrame class:
SizedFrame that hasDuration some unsignedLong[>=200]
I've manually created an instance of the class Frame named frame0, whose hasDuration property is set to 12.
What SPARQL query do I need to get all the ShortFrame instances? I expect frame0 to be inferred as a ShortFrame.
Thanks for any reply!
Edit: sample query
PREFIX frame: <http://www.semantic.org/sample.owl#>
SELECT ?y WHERE {?y rdf:type frame:Frame}
but it is not working! Maybe it is not correct.
I believe you're trying to query OWL restriction information using SPARQL. SPARQL is an RDF query language and has no understanding of OWL concepts. Instead of defining a restriction, you can use a data property to hold the duration value and then retrieve all the short frames from it using SPARQL. Another option I would recommend is to use SWRL rules instead of SPARQL. Hope this helps!
The query you give asks for all instances of type frame:Frame. Since you want just the short frames, you should adapt it like so:
SELECT ?y WHERE {?y a frame:ShortFrame}
...but the above will only work if the reasoner understands your restriction and can correctly classify frame0 as an instance of ShortFrame. I am not overly familiar with Protege's syntax for owl restrictions, so I am not 100% sure your restriction expresses what you want it to express.
As an alternative, you can actually express the restriction you require in SPARQL. To query for all frames with a duration of at most 20:
SELECT ?y
WHERE {
?y a frame:Frame;
frame:hasDuration ?d .
FILTER (?d <= 20)
}