GraphDB: use similarity search with embedded repository - graphdb

I have been using GraphDB's (version 9.2) similarity search from the workbench. Now I also want to use this feature for an embedded repository using graphdb-free-runtime 9.2.1. However I have no clue how this feature can be used from the APIs provided by the runtime. My questions are:
In http://graphdb.ontotext.com/documentation/standard/semantic-similarity-searches.html it is mentioned that similarity search is a plugin. However I could not find any such plugin within the runtime. Do I have to load it from some external resource? Where?
Is there any documentation or example how to create a similarity index programmatically?
Would it be an option to run the workbench as some kind of server and access the similarity search via REST API? However the REST API documentation at http://graphdb.ontotext.com/documentation/9.0/free/using-the-workbench-rest-api.html does not mention any API for similarity searches. Am I missing something?
Any hints or pointers are welcome.

You could add similarity plugin runtime, setting following "graphdb.extra.plugins" property to directory where similarity-plugin (you could find such into GDB instance -> dist/lib/plugins) is located , or:
-Dgraphdb.extra.plugins=directory
System.setProperty("graphdb.extra.plugins", "directory");
You may create index programmatically using SPARQL or:
to create similarity text index "allNews" execute following update:
PREFIX : <http://www.ontotext.com/graphdb/similarity/>
PREFIX inst: <http://www.ontotext.com/graphdb/similarity/instance/>
PREFIX pred: <http://www.ontotext.com/graphdb/similarity/psi/>
insert {
inst:allNews :createIndex "-termweight idf" ;
:analyzer "org.apache.lucene.analysis.en.EnglishAnalyzer" ;
:documentID ?documentID .
?documentID :documentText ?documentText .
} where {
SELECT ?documentID ?documentText {
?documentID ?p ?documentText .
filter(isLiteral(?documentText))
}
}
to delete index "allNews" execute following update:
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
insert { inst:allNews :deleteIndex '' } where {}
to rebuild index "allNews":
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
insert { inst:allNews :rebuildIndex '' } where {}
followed by the create query!
to list all created indexes, execute following query:
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
select ?index ?status ?type
where {
?index :status ?status .
?index :type ?type .
}

Related

GraphDB Lucence Index

Any reason when creating an index the query
PREFIX elastic: <http://www.ontotext.com/connectors/elasticsearch#>
SELECT ?cntUri ?cntStr {
?cntUri elastic:listConnectors ?cntStr .
}
doesn't list it?
You can re-run the creation sparql and it reports, rightly, that you can't create it because it already exists.
So, the question is.. where is it?

SPIN representation to SPARQL

Is there an API that could help convert SPIN representation (of a SPARQL query) back to its SPARQL query form?
From:
[ a <http://spinrdf.org/sp#Select> ;
<http://spinrdf.org/sp#where> ( [ <http://spinrdf.org/sp#object> [ <http://spinrdf.org/sp#varName>
"o"^^<http://www.w3.org/2001/XMLSchema#string> ] ;
<http://spinrdf.org/sp#predicate>
[ <http://spinrdf.org/sp#varName>
"p"^^<http://www.w3.org/2001/XMLSchema#string> ] ;
<http://spinrdf.org/sp#subject>
[ <http://spinrdf.org/sp#varName>
"s"^^<http://www.w3.org/2001/XMLSchema#string> ]
] )
] .
To:
SELECT *
WHERE {
?s ?p ?o .
}
Thanks in advance.
I know two jena-based APIs to work with SPIN.
You can use either org.topbraid:shacl:1.0.1 which is based on jena-arq:3.0.4 or the mentioned in the comment org.spinrdf:spinrdf:3.0.0-SNAPSHOT, which is a fork of the first one, but with changed namespaces and updated dependencies.
Note the first (original) API also may work with modern jena (3.13.x), at least your task can be solved in such circumstances.
The second API has no maven release yet, but can be included into your project via jitpack.
To solve the problem you need to find the root org.apache.jena.rdf.model.Resource and cast it to org.topbraid.spin.model.Select (or org.spinrdf.model.Select) using jena polymorphism (i.e. the operation org.apache.jena.rdf.model.RDFNode#as(Class)).
Then #toString() will return the desired query with the model's prefixes.
Note that all personalities are already included into model via static initialization.
A demonstration of this approach is SpinTransformer from ONT-API test-scope, which transforms SPARQL-based queries to an equivalent form with sp:text.

How to compare values, ignoring diacritics, with SPARQL

I've been trying (with no success so far) to filter values with a "broader equals" condition. That is, ignoring diacritics.
select * where {
?s per:surname1 ?t.
bind (fn:starts-with(str(?t),'Maria') as ?noAccent1) .
bind (fn:translate(str(?t),"áéíóú","aeiou") as ?noAccent2) .
} limit 100
To this moment, I've tried with XPath functions fn:contains, fn:compare, fn:translate, fn:starts-with, but none of them seem to be working.
Is there any other way (other than chaining replaces) to add collation into these functions or achieve the same goal?
The XPath functions you mention are not part of the SPARQL standard really, so as you found out, you can't rely on them being supported out of the box (though some vendors may provide them as an add-on).
However, GraphDB (which is based on RDF4J) allows you to create your own custom functions in SPARQL. It is a matter of writing a Java class that implements the org.eclipse.rdf4j.query.algebra.evaluation.function.Function interface, and registering it in the RDF4J engine by packaging it as a Java Service Provider Interface (SPI) implementation.
SPARQL and REGEX do not support efficiently transliterating character maps. If you want an efficient implementation you would need a custom RDF4J custom as described by Jeen.
If you want a quick and dirty solution use this code sample:
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX spif: <http://spinrdf.org/spif#>
select * where {
BIND("Mariana" as ?t) .
BIND("Márénísótú" as ?t2) .
BIND (regex(str(?t),'^Maria') as ?noAccent1) .
BIND (spif:replaceAll(
spif:replaceAll(
spif:replaceAll(
spif:replaceAll(
spif:replaceAll(str(?t2),"á","a"),
"é","e")
,"í","i"),
"ó","o"),
"ú","u") as ?noAccent2) .
}

How to create and use GeoSpatial indexes in Marklogic from Sparql

I have loaded the geospatial data from geonames.org into Marklogic using RDF import.
When using the Query Console to explore the data, I see the data has been loaded into an xml document and looks like this:
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#lat</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">40.41476</sem:object>
</sem:triple>
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#long</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">-8.54304</sem:object>
</sem:triple>
I am able to do a SPARQL DESCRIBE and see data. Here is an example.
#prefix geonames: <http://www.geonames.org/ontology#> .
#prefix xs: <http://www.w3.org/2001/XMLSchema#> .
#prefix p0: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<http://sws.geonames.org/2736540/> geonames:parentCountry <http://sws.geonames.org/2264397/> ;
geonames:countryCode "PT"^^xs:string ;
p0:long "-8.54304"^^xs:string ;
geonames:featureCode <http://www.geonames.org/ontology#P.PPL> ;
geonames:parentADM1 <http://sws.geonames.org/2742610/> ;
geonames:parentFeature <http://sws.geonames.org/2742610/> ;
<http://www.w3.org/2000/01/rdf-schema#isDefinedBy> "http://sws.geonames.org/2736540/about.rdf"^^xs:string ;
a geonames:Feature ;
geonames:locationMap <http://www.geonames.org/2736540/pedreira-de-vilarinho.html> ;
geonames:name "Pedreira de Vilarinho"^^xs:string ;
geonames:nearbyFeatures <http://sws.geonames.org/2736540/nearby.rdf> ;
geonames:featureClass geonames:P ;
p0:lat "40.41476"^^xs:string .
I want to query over this data using SPARQL QUERY as my Query Type in a way where I can take advantage of the geospatial indexes that MarkLogic can create.
I have been having trouble with two aspects of this.
How to correctly create the geospatial indexes for the wgs84_pos#lat and wgs84_pos#long predicates?
How do I access those indexes from SPARQL QUERY?
I would like to have a sparql query that would be able to find subjects within some range of a Point.
=====================================
Followup:
After following David Ennis's Answer (Which worked nicely, thanks!) I ended up with this sample Xquery that was able to select data out of documents via geosearch and then use those IRI's in a sparql values query.
Example:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
let $matches := cts:search(//rdf:RDF,
cts:element-pair-geospatial-query (
fn:QName("http://www.geonames.org/ontology#","Feature"),
fn:QName("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat"),
fn:QName ("http://www.w3.org/2003/01/geo/wgs84_pos#","long"),
cts:circle(10, cts:point(19.8,99.8))))
let $iris := sem:iri($matches//#rdf:about)
let $bindings := (fn:map(function($n) { map:entry("featureIRI", $n) }, $iris))
let $sparql := '
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT *
WHERE {
?featureIRI wgs:lat ?lat;
wgs:long ?long.
}
'
return sem:sparql-values($sparql, $bindings)
This xquery queries the geospatial index, finds matching documents and then selects the IRI in the rdf:about attribute of the xml document.
It then maps over all of those IRIs and creates map entries that can be passed in the bindings parameter of the sem:sparql-values function.
I do not believe you can do what you want via just native SPARQL. Geospacial queries in any SPARQL implementation are extensions like geoSPARQL, Apache Jena geospacial queries etc.
My suggested approach in MarkLogic:
Insert the geonames subjects into MarkLogic as unmanaged triples (an XML or JSON document with embedded triples for each one)
In the same document, include the geo-spacial data in one of the acceptable MarkLogic formats. This essentially adds geo-spacial metadata to the triple since it is in the same fragment.
Add geo-spacial path-range-indexes for the geospacial data.
Use SPARQL inside of MarkLogic with a cts query restriction.
The Building Blocks for above:
Understanding unmanaged triples
Understanding Geo-spacial Region Types
Understanding Geo-spacial Indexes
Understanding Geo-spacial Queries
Understanding Semantics with cts search
Another approach to the final query could be the Optic API but I do not see how it would negate the need to do steps 1-3

Path queries in Wikidata endpoint?

Consider the following snippet
ASK WHERE { wd:Q734774 wdt:P31 wd:Q3918. }
This works fine in Wikidata. I want to use some of the path syntax in the this snippet. Specifically I want to limit the number of the times "wdt:P31" used in the path. According to the guidelines this should be the right syntax:
ASK WHERE { wd:Q734774 wdt:P31{,3} wd:Q3918. }
But it's giving me weird error messages. Any ideas?
The final version of SPARQL 1.1 Property Paths lets you do this with the following query --
ASK WHERE
{ wd:Q734774
wdt:P31? / wdt:P31? / wdt:P31?
wd:Q3918
}
For clarity, I've put the full Property Path Predicate (wdt:P31? / wdt:P31? / wdt:P31?) on a separate line between Subject (wd:Q734774) and Object (wd:Q3918). The trailing ? asks for one-or-zero instances of the wdt:P31 predicate, and the / asks for a sequence, so this full path asks for a sequence of zero-or-one-or-two-or-three instances.