SPARQL: join/combine the langString values of two properties into one

I'm using the GeoNames dataset, which has two properties, gn:officialName and gn:alternateName, that both contain rdf:langString values. I have a CONSTRUCT query in which I would like to combine the values of both properties into one. Is that possible with SPARQL 1.1?
Bonus
How can I prioritize the values of one property and use the other one only if no translation is available for a locale?

You can combine them in the WHERE part of the CONSTRUCT query with BIND( ... AS ?var) and then use ?var in the CONSTRUCT template.
BIND ( CONCAT(?v1, ?v2) AS ?var)
If ?v1 and ?v2 have different language tags, take the str of each so that plain strings are concatenated:
BIND ( CONCAT(str(?v1), str(?v2)) AS ?var)
and, if you want, use strlang to set the language tag of the result.
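For the bonus question, one possible sketch is a UNION that takes every gn:officialName and adds a gn:alternateName only when no official name exists in the same language. The target property ex:name and the ex: namespace are hypothetical placeholders, and the comparison assumes that language tags are written in the same case in both properties.

```sparql
PREFIX gn: <http://www.geonames.org/ontology#>
PREFIX ex: <http://example.org/>   # hypothetical namespace for the combined property

CONSTRUCT { ?place ex:name ?name . }
WHERE {
  {
    # Preferred source: every official name, in whatever language it has.
    ?place gn:officialName ?name .
  }
  UNION
  {
    # Fallback: an alternate name is used only if the same place
    # has no official name with the same language tag.
    ?place gn:alternateName ?name .
    FILTER NOT EXISTS {
      ?place gn:officialName ?official .
      FILTER ( LANG(?official) = LANG(?name) )
    }
  }
}
```

Because the literals are passed through unchanged, each value keeps its original language tag, so no STRLANG call is needed here.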

Related

Functions in SPARQL to Manipulate IRIs?

I want to write some reusable SPARQL queries to do things like take an IRI, get the name part (typically after the # sign), modify it (e.g., replace underscores with blank spaces) and put it in the rdfs:label property. This would be useful for Protege which doesn't fill in the rdfs:label if you use user defined IRIs. Or take an IRI with a user defined name, do the same and then replace the user defined IRI with a UUID. I looked in the SPARQL spec for functions to manipulate IRIs and either they don't exist or I'm missing something obvious. I'm posting this to make sure it isn't the latter. I know it is easy to do the equivalent with things like SUBSTR but I'm surprised that there aren't predefined operators to do things like getting the name part of an IRI and getting the base and want to double check before I roll my own.

In case anyone else wants to do this, I figured it out. There are some answers on this site, but they are all for SQL or languages other than SPARQL. The following is for classes, and it should be obvious how to adapt it for other entities. Note: this works in the Snap SPARQL Plugin for Protege (that's why I use CONSTRUCT rather than INSERT). However, there is a bug in its implementation of SUBSTR: it uses 0-based indexing rather than 1-based indexing as the spec says. So if you use this in a spec-compliant engine, change the 1 to a 2.
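PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT { ?c rdfs:label ?lblname . }
WHERE {
  ?c rdfs:subClassOf owl:Thing .
  BIND(STRAFTER(STR(?c), '#') AS ?name)
  # Insert a space before every capital letter, then drop a leading space.
  BIND(REPLACE(?name, "([A-Z])", " $1") AS ?namewbs)
  BIND(IF(STRSTARTS(?namewbs, " "), SUBSTR(?namewbs, 1), ?namewbs) AS ?lblname)
  # Exclude owl:Thing and owl:Nothing themselves.
  FILTER(?c != owl:Thing && ?c != owl:Nothing)
}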
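For the underscore case mentioned in the question, a similar sketch could look like the following. It assumes that classes are typed as owl:Class and that the local name is everything after the last '#' or '/'. Both are assumptions made for illustration, not something the answer above states.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>

CONSTRUCT { ?c rdfs:label ?label . }
WHERE {
  ?c a owl:Class .
  # Skip anonymous classes, which have no IRI to take a name from.
  FILTER ( isIRI(?c) )
  # Strip everything up to and including the last '/' or '#'.
  BIND ( REPLACE(STR(?c), "^.*[/#]", "") AS ?local )
  # Turn underscores into blank spaces.
  BIND ( REPLACE(?local, "_", " ") AS ?label )
}
```

The greedy `^.*[/#]` pattern is what makes this work for both hash and slash namespaces, at the cost of assuming the local name itself contains neither character.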

Is it possible to use variables as integers in SPARQL property paths?

I am currently trying to create pointers to datatype values, as they cannot be linked directly. However, I would like to be able to evaluate the pointers from within the SPARQL environment, which raised some questions for me, specifically in the case where the desired value is part of an ordered rdf:List. My approach is to use property paths within a SPARQL query, in which I can use the individual, property and list index that are attached to the pointer.
Given the following example data, using Turtle's shorthand syntax for ordered lists:
ex:myObject ex:someProperty ("1" "2" "3") .
ex:myPointer ex:lookAtIndividual ex:myObject ;
    ex:lookAtProperty ex:someProperty ;
    ex:lookAtIndex "3"^^xsd:integer .
Now I would like to create a SPARQL query that -- based on the pointer -- returns the value at the given index. To my understanding the query could/should look something like this:
SELECT ?value
WHERE {
ex:myPointer ex:lookAtIndividual ?individual ;
ex:lookAtProperty ?prop ;
ex:lookAtIndex ?index .
?individual ?prop/rdf:rest{?index-1}/rdf:first ?value .
}
But if I try to execute this query with TopBraid, it shows an error message that ?index has been found when <INTEGER> was expected. I also tried binding the index in the SPARQL query via BIND(?index-1 AS ?i), again without success. If the pointed value is not stored in a list, the query without property path works fine.
Is it in general possible to use a value that is connected via datatype property within a SPARQL query as path length for property paths?

This syntax: rdf:rest{<number>} is not standard SPARQL. So the short answer is, regrettably: no, you can't use variables as integers in SPARQL property paths, for the simple reason that you can't use integers in SPARQL property paths at all.
In an earlier draft of the SPARQL standard, there was a proposal to use this kind of syntax to allow specifying the min and max length of a property path, e.g. rdf:rest{1, 3} would match any paths using rdf:rest properties between length 1 and 3. But this was never fully standardized and most SPARQL engines don't implement it.
If you happen to use a SPARQL engine that does implement it, you will have to get in touch with the developers directly to ask if they can extend the mechanism to allow use of variables in this position (the error message suggests to me that it's currently just not possible).
As an aside: there's a SPARQL 1.2 community initiative going on. It only just got started but one of the proposals on the table is re-introducing this particular piece of functionality to the standard.
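A workaround that stays within standard SPARQL 1.1 is to compute each list member's position by counting the list nodes that lie between the head of the list and that member, and then to compare the count with the stored index. The sketch below assumes a 1-based index, as in the question, and an ex: namespace that is made up for illustration. Note that variables are not allowed inside property paths at all, so ?prop is matched with an ordinary triple pattern instead of ?prop/... .

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://example.org/>   # assumed namespace for the example data

SELECT ?value
WHERE {
  ex:myPointer ex:lookAtIndividual ?individual ;
               ex:lookAtProperty   ?prop ;
               ex:lookAtIndex      ?index .
  ?individual ?prop ?list .
  # ?mid ranges over every list node from the head up to and including ?node,
  # so the number of ?mid bindings is the 1-based position of ?node.
  ?list rdf:rest* ?mid .
  ?mid  rdf:rest* ?node .
  ?node rdf:first ?value .
}
GROUP BY ?individual ?prop ?index ?list ?node ?value
HAVING ( COUNT(?mid) = ?index )
```

For the list ("1" "2" "3") and an index of 3, the third node is the only one with three nodes at or before it, so only its rdf:first value passes the HAVING clause.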

How to ignore VALUES in SPARQL?

Part of my query looks something like this:
GRAPH g1: {VALUES (?ut) {$U1}
?IC_uri skos:related ?ut .
}
Normally, $U1 is filled with a list of URIs based on user input. For test purposes, I would like to pass a value for $U1 that causes the VALUES declaration to be ignored, so that all possible values are considered. It should produce the same results as:
GRAPH g1: {
# VALUES (?ut) {$U1}
?IC_uri skos:related ?ut .
}
I remember there was a way to do that, but I couldn't find it in the SPARQL specification.

I'd propose three options:
1) FILTER (?ut IN ($ut)), passing $ut instead of a list of URIs;
2) BIND ($ut as ?ut), passing $ut instead of a single URI;
3) VALUES (?ut) {(UNDEF)}, passing (UNDEF) instead of a space-separated list of (parentheses-enclosed) URIs.
Note that SPARQL injections like these cannot be considered safe.
The UNDEF keyword is first mentioned in section 10.2.2, VALUES Examples, of the SPARQL 1.1 Query Language specification:
If a variable has no value for a particular solution in the VALUES clause, the keyword UNDEF is used instead of an RDF term.
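As a sketch of the third option, this is roughly what the query fragment looks like once (UNDEF) has been substituted for $U1. The skos: prefix is the standard SKOS namespace. The IRI bound to g1: is a placeholder, since the question does not show it.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX g1:   <http://example.org/graphs/g1>   # placeholder graph IRI

SELECT ?IC_uri ?ut
WHERE {
  GRAPH g1: {
    # One row in which ?ut is left unbound: it joins with every solution,
    # so the VALUES block no longer restricts ?ut.
    VALUES (?ut) { (UNDEF) }
    ?IC_uri skos:related ?ut .
  }
}
```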

Avoiding count explosion in querying with SPARQL ontologies containing owl:sameAs triples

I am trying to build a (local) ontology that describes a finite number of objects and that links these objects to external resources via the owl:sameAs predicate. However, when I simply query for the number of objects of that kind, I get twice as many as there are objects described. Clearly the external resources are also counted independently, since what is counted is the number of URIs and not the number of distinct objects.
I have solved this issue in the following way: I assume that the local ontology can be seen as a "reference hub" for knowing basic stuff about these objects, so I select all the objects of a certain kind and then keep only those whose URI contains the base URI of the local ontology, i.e.:
# How many objects are there?
PREFIX ch: <http://www.example.com/ontologies/domain#>
SELECT (COUNT(DISTINCT ?elem) AS ?count) WHERE {
?elem a ch:Element.
FILTER (REGEX (STR(?elem) ,"http://www.example.com/ontologies/domain") ).
}
However, I have two concerns with this approach:
1) it looks like a bit of a hack (even if somewhat principled), whereas I would like something that makes more logical sense;
2) I have the impression that this query is not very efficient.
I have searched quite a bit here and on Google, but didn't come up with any better solution. Any suggestions?
Thank you very much for any help!

GROUP BY a representative element
If there's some property that should have distinct values for each individual, then you can use it to impose the "equivalence class" structure that you need. E.g., something like this:
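prefix ch: <http://www.example.com/ontologies/domain#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select (count(*) as ?count) where {
  # one group per distinct label, i.e. per equivalence class
  { select ?label where {
      ?elem a ch:Element ;
            rdfs:label ?label .
    }
    group by ?label }
}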
Synthesize a representative element
If there's no value that will be shared by all elements in an equivalence class, you can still get a representative element from the set by asking for the minimal element in each equivalence class. We can use the IRIs of the elements to order them, and use that ordering to select a unique individual. This does presume that each ?elem and all the things that it is the same as have well-defined behavior under the str function (and IRIs do).
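prefix ch: <http://www.example.com/ontologies/domain#>
prefix owl: <http://www.w3.org/2002/07/owl#>
select (count(distinct ?elem) as ?count) where {
  ?elem a ch:Element .
  # keep ?elem only if nothing it is the same as has a smaller IRI
  filter not exists {
    ?elem (owl:sameAs|^owl:sameAs)* ?elem_ .
    filter( str(?elem_) < str(?elem) )
  }
}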
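On the efficiency concern in the question: if filtering by the local namespace is acceptable, STRSTARTS expresses the same intent as the REGEX filter without regular-expression matching, and without treating the dots in the URI as wildcards. Whether it is actually faster depends on the engine, so this is a sketch rather than a guaranteed optimization.

```sparql
PREFIX ch: <http://www.example.com/ontologies/domain#>

SELECT (COUNT(DISTINCT ?elem) AS ?count)
WHERE {
  ?elem a ch:Element .
  # Keep only elements whose IRI begins with the local ontology's base URI.
  FILTER ( STRSTARTS(STR(?elem), "http://www.example.com/ontologies/domain") )
}
```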

Selecting strings with LIKE operator in SPARQL

In ANSI SQL, you can write something like:
SELECT * FROM DBTable WHERE Description LIKE 'MEET'
or also:
SELECT * FROM DBTable WHERE Description LIKE '%MEET%'
What I would like help with is writing the SPARQL equivalent of the above please.

Use a regex filter. You can find a short tutorial here
Here's what it looks like:
PREFIX ns: <http://example.com/namespace#>
SELECT ?x
WHERE {
  ?x ns:SomePredicate ?y .
  FILTER regex(?y, "YOUR_REGEX", "i")
}
YOUR_REGEX must be an expression in the XQuery regular expression language.
"i" is an optional flag. It makes the match case-insensitive.

If you have a fixed string to match, you can use it directly in your graph pattern, e.g.
PREFIX ns: <http://example.com/namespace#>
SELECT ?x
WHERE {
  ?x ns:SomePredicate "YourString"
}
Note that this does not always work, because pattern matching is based on RDF term equality, which means "YourString" is not considered the same term as, say, "YourString"@en. If that may be an issue, use the REGEX approach Tom suggests.
Also, some SPARQL engines provide a full-text search extension that lets you integrate Lucene-style queries into your SPARQL queries. This may better fit your use case and will almost certainly be more efficient for the generic search that would otherwise require a REGEX.
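For simple substring patterns, the SPARQL 1.1 string functions CONTAINS, STRSTARTS and STRENDS are another option, and they map fairly directly onto the common LIKE forms. The sketch below reuses the placeholder ns: namespace from the answers above. STR(?y) is applied so that language-tagged literals match too. These functions are case-sensitive, which may or may not match your database's LIKE behavior.

```sparql
PREFIX ns: <http://example.com/namespace#>

SELECT ?x ?y
WHERE {
  ?x ns:SomePredicate ?y .
  # LIKE '%MEET%' : the value contains "MEET" anywhere.
  FILTER ( CONTAINS(STR(?y), "MEET") )
  # Other LIKE forms, to be swapped in for the filter above:
  #   LIKE 'MEET%' : FILTER ( STRSTARTS(STR(?y), "MEET") )
  #   LIKE '%MEET' : FILTER ( STRENDS(STR(?y), "MEET") )
  #   LIKE 'MEET'  : FILTER ( STR(?y) = "MEET" )
}
```

For a case-insensitive match, wrapping both sides in LCASE, as in CONTAINS(LCASE(STR(?y)), "meet"), is one way to do it.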
