Elasticsearch connector and owl:sameAs in GraphDB

I'm using the OWL-RL (Optimized) ruleset and the Elasticsearch connector for search.
All I want is to recognize entities that have the same value and merge all their values into one document in ES.
I'm doing this with:
Person - hasPhone - Phone, where hasPhone is declared an owl:InverseFunctionalProperty.
Example:
<http://example.com#1> <http://example.com#hasPhone> <http://example.com#111> .
<http://example.com#2> <http://example.com#hasPhone> <http://example.com#111> .
=> #1 owl:sameAs #2
When I search in ES, I receive two results, #1 and #2. But when I repair the connector, I get only one result (which is what I want).
1. Is there a way for the ES connector to merge the documents automatically and delete the previous document? I don't want to repair the connector all the time. When I set manageIndex:false, I always get two results when searching.
2. How can I receive only one record with SPARQL, excluding the others that are owl:sameAs this record?
3. Is there a better ruleset for owl:sameAs and InverseFunctionalProperty?

The connector watches for changes to property paths (as specified by the index definition), but I don't think it can detect the merging (smushing) caused by sameAs; that's why you need the rebuild. If this is an important case, I can post an improvement issue for you, but please email graphdb-support and vladimir.alexiev at ontotext with a description of your business case (and a link to this question).
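To make the smushing concrete, here is a toy Python model of what the owl:InverseFunctionalProperty rule plus sameAs merging amounts to: subjects that share a hasPhone value collapse into one equivalence class, and each class becomes one merged document, roughly what the asker sees in ES after a rebuild. It is only an illustration, not GraphDB's or the connector's implementation; the name/email predicates, person #3, and picking the smallest IRI as the representative are assumptions made for the example.

```python
from collections import defaultdict

EX = "http://example.com#"
HAS_PHONE = EX + "hasPhone"


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        # Assumption: keep the smallest IRI as the class representative.
        if rb < ra:
            ra, rb = rb, ra
        parent[rb] = ra


def smush(triples, inverse_functional):
    """Merge subjects that share a value of an inverse-functional property.

    Returns {representative: {predicate: set_of_values}}, i.e. one merged
    "document" per owl:sameAs equivalence class.
    """
    parent = {s: s for s, _, _ in triples}
    seen = {}  # (property, value) -> first subject observed with it
    for s, p, o in triples:
        if p in inverse_functional:
            if (p, o) in seen:
                union(parent, seen[(p, o)], s)  # infer s owl:sameAs that subject
            else:
                seen[(p, o)] = s
    docs = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        docs[find(parent, s)][p].add(o)
    return {rep: dict(props) for rep, props in docs.items()}


# The two hasPhone triples from the question plus hypothetical
# name/email triples and an unrelated person #3.
triples = [
    (EX + "1", HAS_PHONE, EX + "111"),
    (EX + "2", HAS_PHONE, EX + "111"),
    (EX + "1", EX + "name", "Alice"),
    (EX + "2", EX + "email", "alice@example.com"),
    (EX + "3", HAS_PHONE, EX + "222"),
]
docs = smush(triples, {HAS_PHONE})
for rep, props in docs.items():
    print(rep, props)
```

Union-find keeps the merge incremental: each newly inferred sameAs link just joins two classes, which is the kind of update the connector cannot currently mirror without a rebuild.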
If you have "sameAs optimization" enabled for the repo (which it is by default) and do NOT have "expand sameAs URLs" in the query, you should get only 1 result for queries like ?x <http://example.com#hasPhone> <http://example.com#111>
OWL-RL-Optimized is good for your case. (The rulesets supporting InverseFunctionalProperty are OWL-RL, OWL-QL, rdfsPlus and their optimized variants.)

Related

Analysis with SPARQL

I am trying to accomplish some relatively simple analysis with a specific graph.
In MarkLogic, SPARQL property paths are built with the following patterns:
path+ (one or more repetitions of path)
path* (zero or more repetitions of path)
path? (zero or one occurrence of path)
path1/path2 (path1 followed by path2)
From here, one analysis I would like to do is retrieve all nodes that satisfy a specific condition between node X and node Y. Based on this, my query would be something like:
?nodeX <nodeID> 1 .
?nodeY <nodeID> 250 .
?nodeX <nodeLink>* ?nodeY .
This does not seem right to me, as I don't think it allows me to retrieve the path linking nodeX to nodeY.
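A property path such as <nodeLink>* only tests whether nodeY is reachable; SPARQL 1.1 does not bind the intermediate nodes. One workaround is to export the edges (for example with SELECT ?a ?b WHERE { ?a <nodeLink> ?b }) and reconstruct a path client-side. The Python sketch below runs a breadth-first search over such an edge list; the integer node IDs and the edges themselves are made up for illustration.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return one shortest list of nodes from start to goal, or None."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    parent = {start: None}  # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical edges standing in for ?a <nodeLink> ?b results.
edges = [(1, 2), (2, 3), (3, 250), (1, 4)]
print(shortest_path(edges, 1, 250))
```

BFS returns a shortest path by hop count; if every path is needed rather than one, the same edge list can be fed to a depth-first enumeration instead.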
I would also like to know whether it is possible to compute things such as:
Betweenness centrality, which measures how often a vertex lies on the shortest paths between pairs of other vertices in a graph.
Closeness centrality, which measures how close one vertex is to all other vertices reachable from it in the graph.
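SPARQL 1.1 has no built-in centrality functions, so one option is to export the edge list (again, something like SELECT ?a ?b WHERE { ?a <nodeLink> ?b }) and compute the measures outside the query. The Python sketch below assumes a simple, unweighted, directed graph. Closeness is taken as "number of reachable vertices divided by the total distance to them", which is only one of several common variants; betweenness uses Brandes' algorithm without normalization. The diamond-shaped example graph is hypothetical.

```python
from collections import deque


def adjacency(edges):
    """Directed adjacency lists; every vertex appears as a key."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, [])
    return adj


def bfs_distances(adj, source):
    """Hop counts from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """Number of reachable vertices divided by the sum of distances to them."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())  # dist[v] is 0, so it adds nothing
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Brandes' algorithm for unweighted, directed graphs (unnormalized)."""
    cb = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s->v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # vertices in order of non-increasing distance from s
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb


# Hypothetical "diamond" graph: 1 -> 2 -> 4 and 1 -> 3 -> 4.
adj = adjacency([(1, 2), (1, 3), (2, 4), (3, 4)])
print(betweenness(adj))
print({v: closeness(adj, v) for v in adj})
```

In the diamond, the two shortest paths from 1 to 4 split the credit, so vertices 2 and 3 each get half of that pair.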
==Update==
Based on the suggestion, I have managed to retrieve the path using the following query:
?nodeX <nodeID> "1" .
?nodeY <nodeID> "250" .
?nodeX <nodeLink>* ?v .
?v ?p ?u .
?u <nodeLink>* ?nodeY .
When I attempted to use <p> | !<p> in my query, an error occurred stating that ! was not a valid expression. However, I believe I can achieve the same by using a predicate variable (?p above), which accepts any predicate.

How to compare values, ignoring diacritics, with SPARQL

I've been trying (with no success so far) to filter values with a "broader equals" condition. That is, ignoring diacritics.
select * where {
?s per:surname1 ?t.
bind (fn:starts-with(str(?t),'Maria') as ?noAccent1) .
bind (fn:translate(str(?t),"áéíóú","aeiou") as ?noAccent2) .
} limit 100
So far, I've tried the XPath functions fn:contains, fn:compare, fn:translate and fn:starts-with, but none of them seem to work.
Is there any other way (other than chaining replaces) to add collation to these functions or achieve the same goal?
The XPath functions you mention are not really part of the SPARQL standard, so, as you found out, you can't rely on them being supported out of the box (though some vendors may provide them as an add-on).
However, GraphDB (which is based on RDF4J) allows you to create your own custom functions in SPARQL. It is a matter of writing a Java class that implements the org.eclipse.rdf4j.query.algebra.evaluation.function.Function interface, and registering it in the RDF4J engine by packaging it as a Java Service Provider Interface (SPI) implementation.
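The core of such a custom function is usually Unicode normalization: decompose each character (NFD) and drop the combining marks. The Python sketch below shows only that logic, as an illustration; an actual RDF4J function would be written in Java, where java.text.Normalizer offers the same NFD step. The function names are hypothetical. Note that letters without a decomposition, such as "ø" or "ß", are left unchanged by this approach.

```python
import unicodedata


def strip_diacritics(text):
    # NFD splits "á" into "a" + U+0301 COMBINING ACUTE ACCENT;
    # dropping the combining characters leaves the base letters.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def starts_with_ignoring_accents(value, prefix):
    """Accent-insensitive starts-with (still case-sensitive)."""
    return strip_diacritics(value).startswith(strip_diacritics(prefix))


print(strip_diacritics("Márénísótú"))
print(starts_with_ignoring_accents("María", "Maria"))
```

Normalizing both sides avoids maintaining a per-character map like the chained replaces, and it covers any accent that Unicode encodes as a combining mark.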
SPARQL and REGEX do not support transliterating character maps efficiently. If you want an efficient implementation, you would need a custom RDF4J function as described by Jeen.
If you want a quick and dirty solution, use this code sample:
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX spif: <http://spinrdf.org/spif#>
select * where {
  BIND("Mariana" as ?t) .
  BIND("Márénísótú" as ?t2) .
  BIND(regex(str(?t), '^Maria') as ?noAccent1) .
  BIND(spif:replaceAll(
         spif:replaceAll(
           spif:replaceAll(
             spif:replaceAll(
               spif:replaceAll(str(?t2), "á", "a"),
             "é", "e"),
           "í", "i"),
         "ó", "o"),
       "ú", "u") as ?noAccent2) .
}

Use same URL for both query and update

I know that by default Fuseki provides different URLs for query and update, which allows some elegant management.
Now I want a single URL for both update and query. The rationale is to avoid propagating two URLs through the codebase.
I know that update and query code should be kept separate, but my requests are not mixed. It's just to avoid passing around two objects instead of one.
My current config looks like:
<#service1> rdf:type fuseki:Service ;
fuseki:name "dataset" ; # http://host:port/dataset
fuseki:serviceQuery "endpoint" ; # SPARQL query service
fuseki:serviceUpdate "endpoint" ; # SPARQL update service
fuseki:dataset <#dataset> ;
.
In theory, a service exists at /endpoint, but it only accepts updates. When I query with:
prefix sfm: <sfm/>
SELECT DISTINCT ?value
WHERE {
sfm:config sfm:component ?value.
}
the server reports many lines like the following:
INFO [4] POST http://localhost:9876/sfm/endpoint
INFO [4] POST /sfm :: 'endpoint' :: [application/x-www-form-urlencoded] ?
INFO [4] 400 SPARQL Update: No 'update=' parameter (0 ms)
I can't find anything in the docs that says the query and update services can't be at the same place, so I assume it's possible and I've just missed something.
However, the last line of the log is explicit: Fuseki expects an update.
One other solution could be to define the URL as localhost/dataset/ and, depending on whether I query or update, append the relevant part, giving localhost/dataset/query and localhost/dataset/update respectively.
But (1) this requires the database to use a particular URL naming scheme, and (2) it looks like a strong requirement on the triplestore: when I use another one, it will have to provide the same interface, which may not be possible. (I don't know whether this feature is implemented in other triplestores.)
EDIT: fix the POST/GET error
405 HTTP method not allowed: SPARQL Update : use POST
It looks like you are using GET for a SPARQL Update.
It has correctly routed the operation to the update processor (you can use the same endpoint, and you can even drop the service part and just use the dataset URL).
However, in HTTP, GET requests are cacheable and should not be used for operations that cause changes: a GET may never reach the end server, because an intermediary such as a web cache may respond to it instead.
Use POST.
The same is true if you separate services for query and update.
Original Context
The original question has been edited. The original report was asking about this:
INFO [1] 405 HTTP method not allowed: SPARQL Update : use POST (2 ms)
Answer to the revised and different question:
The endpoint for shared services is the dataset URL:
http://localhost:9876/sfm
Whether update, query or services are available is controlled by the configuration file.
Setting fuseki:serviceQuery and fuseki:serviceUpdate the same is not necessary and is discouraged.
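As a client-side illustration of using one URL for both, the Python sketch below sends queries and updates to the dataset URL via POST, distinguishing them only by the form parameter name ("query=" or "update="), as the SPARQL 1.1 Protocol does. It assumes the Fuseki configuration enables both services on the dataset; the helper names are hypothetical, and nothing is sent until run() is called.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Dataset URL from the question (assumed to have query and update enabled).
DATASET_URL = "http://localhost:9876/sfm"


def sparql_request(operation, text, url=DATASET_URL):
    """Build a POST request for a SPARQL query or update against one URL."""
    if operation not in ("query", "update"):
        raise ValueError("operation must be 'query' or 'update'")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    if operation == "query":
        headers["Accept"] = "application/sparql-results+json"
    # The parameter name tells the server which processor to use.
    body = urlencode({operation: text}).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


def run(operation, text, url=DATASET_URL):
    # Sends the request; requires a running Fuseki server at `url`.
    with urlopen(sparql_request(operation, text, url)) as response:
        return response.read().decode("utf-8")
```

Usage: run("query", "SELECT * WHERE { ?s ?p ?o } LIMIT 1") and run("update", "INSERT DATA { <urn:a> <urn:b> <urn:c> }") share the single DATASET_URL, and both use POST, which also avoids the 405 error discussed above.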

How Do I Query Against Data.gov

I am trying to teach myself this weekend how to run API queries against a data source in this case data.gov. At first I thought I'd use a simple SQL variant, but it seems in this case I have to use SPARQL.
I've read through the documentation and downloaded Twinkle, but I can't quite get it to run. Here is an example of a query I'm running. I'm basically trying to find all gas stations around Denver, CO whose network is "null".
PREFIX station: https://api.data.gov/nrel/alt-fuel-stations/v1/nearest.json?api_key=???location=Denver+CO
SELECT *
WHERE
{ ?x station:network ?network like "null"
}
Any help would be very much appreciated.
SPARQL is a graph pattern language for RDF triples. A query consists of a set of "basic graph patterns" described by triple patterns of the form <subject>, <predicate>, <object>. RDF defines the subject and predicate as URIs, and the object is either a URI (object property) or a literal (datatype or language-tagged property). Each triple pattern in a query must therefore have three entities.
Since we don't have any examples of your data, I'll show a way to explore the data a bit. Let's assume your prefix is correctly defined, which I doubt: it will not be the REST API URL, but the URI of the entity itself. Then you can try the following:
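To illustrate what matching a basic graph pattern means, here is a toy Python evaluator: each triple pattern either matches a triple (binding its "?" variables) or not, and the solutions of successive patterns are joined on shared variables. It is a teaching sketch only, not how real SPARQL engines work; it assumes terms are plain strings, variables start with "?", and the station data is made up.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable
            if result.setdefault(term, value) != value:
                return None  # already bound to a different value
        elif term != value:  # constant must match exactly
            return None
    return result


def evaluate(patterns, triples):
    """All solutions of a basic graph pattern (a list of triple patterns)."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        solutions = extended
    return solutions


# Hypothetical data in the shape the answer guesses at.
data = [
    ("ex:station1", "station:network", "null"),
    ("ex:station2", "station:network", "ex:networkA"),
]
print(evaluate([("?s", "station:network", "?network")], data))
print(evaluate([("?s", "station:network", "null")], data))
```

The first pattern mirrors the exploratory query (every ?network value), the second the final one (only subjects whose network is the literal "null").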
PREFIX station: <http://api.data.gov/nrel...>
SELECT *
WHERE
{ ?s station:network ?network .
}
...setting the PREFIX to correctly represent the namespace for network. Then look at the binding for ?network and find out how they represent null. Let's say it is a string as you show. Then the query would look like:
PREFIX station: <http://api.data.gov/nrel...>
SELECT ?s
WHERE
{ ?s station:network "null" .
}
There is no like in SPARQL, but you could use a FILTER clause using regex or other string matching features of SPARQL.
And please, please, please google "SPARQL" and "RDF". There is lots of information about SPARQL, and the W3C's SPARQL 1.1 Query Language Recommendation is a comprehensive source with many good examples.

SPARQL prefix wildcard

I'm attempting to write a SPARQL query which would allow me to find all nodes which are reachable from a given node. At the moment every edge has the prefix http://www.foo.com/edge# and there are 3 possible edges (uses, extends, implements). While I can get the correct result from "?start (edge:uses | edge:implements | edge:extends)* ?reached " I would like to reduce that down to one statement, some kind of wildcard after edge:, so that if I add more edge types then I wouldn't need to extend the query. Is this possible?
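See this question: SPARQL - Restricting Result Resource to Certain Namespace(s).
If you know the edges are always going to be in the same namespace, you could use something like the following (note that it matches direct, one-hop edges only, because a variable cannot be used inside a property path):
?start ?edge ?reached .
FILTER(REGEX(STR(?edge), "^http://www.foo.com/edge#"))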
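Because the namespace filter above only covers one hop, one way to get the full closure (the equivalent of (edge:uses | edge:implements | edge:extends)*) is to fetch the candidate edges, for example with SELECT ?s ?p ?o WHERE { ?s ?p ?o FILTER(STRSTARTS(STR(?p), "http://www.foo.com/edge#")) }, and compute reachability client-side. The Python sketch below does that; the node names and the extra "dependsOn" edge type are hypothetical, included to show that new edge types in the namespace are picked up without changing the code.

```python
from collections import deque

EDGE_NS = "http://www.foo.com/edge#"


def reachable(triples, start, namespace=EDGE_NS):
    """Nodes reachable from start via zero or more edges in `namespace`."""
    adj = {}
    for s, p, o in triples:
        if p.startswith(namespace):  # any predicate in the edge namespace
            adj.setdefault(s, []).append(o)
    seen = {start}  # zero hops: start reaches itself, as with `*`
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical graph; the C -> D edge is outside the namespace.
triples = [
    ("A", EDGE_NS + "uses", "B"),
    ("B", EDGE_NS + "extends", "C"),
    ("C", "http://other.example/#p", "D"),
    ("B", EDGE_NS + "dependsOn", "E"),
]
print(sorted(reachable(triples, "A")))
```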