GraphDB Lucence Index - lucene

Any reason when creating an index the query
PREFIX elastic: <http://www.ontotext.com/connectors/elasticsearch#>
SELECT ?cntUri ?cntStr {
?cntUri elastic:listConnectors ?cntStr .
}
doesn't list it?
You can re-run the creation sparql and it reports, rightly, that you can't create it because it already exists.
So, the question is.. where is it?

Related

GraphDB: use similarity search with embedded repository

I have been using GraphDB's (version 9.2) similarity search from the workbench. Now I also want to use this feature for an embedded repository using graphdb-free-runtime 9.2.1. However I have no clue how this feature can be used from the APIs provided by the runtime. My questions are:
In http://graphdb.ontotext.com/documentation/standard/semantic-similarity-searches.html it is mentioned that similarity search is a plugin. However I could not find any such plugin within the runtime. Do I have to load it from some external resource? Where?
Is there any documentation or example how to create a similarity index programmatically?
Would it be an option to run the workbench as some kind of server and access the similarity search via REST API? However the REST API documentation at http://graphdb.ontotext.com/documentation/9.0/free/using-the-workbench-rest-api.html does not mention any API for similarity searches. Am I missing something?
Any hints or pointers are welcome.
You could add similarity plugin runtime, setting following "graphdb.extra.plugins" property to directory where similarity-plugin (you could find such into GDB instance -> dist/lib/plugins) is located , or:
-Dgraphdb.extra.plugins=directory
System.setProperty("graphdb.extra.plugins", "directory");
You may create index programmatically using SPARQL or:
to create similarity text index "allNews" execute following update:
PREFIX : <http://www.ontotext.com/graphdb/similarity/>
PREFIX inst: <http://www.ontotext.com/graphdb/similarity/instance/>
PREFIX pred: <http://www.ontotext.com/graphdb/similarity/psi/>
insert {
inst:allNews :createIndex "-termweight idf" ;
:analyzer "org.apache.lucene.analysis.en.EnglishAnalyzer" ;
:documentID ?documentID .
?documentID :documentText ?documentText .
} where {
SELECT ?documentID ?documentText {
?documentID ?p ?documentText .
filter(isLiteral(?documentText))
}
}
to delete index "allNews" execute following update:
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
insert { inst:allNews :deleteIndex '' } where {}
to rebuild index "allNews":
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
insert { inst:allNews :rebuildIndex '' } where {}
followed by the create query!
to list all created indexes, execute following query:
PREFIX :<http://www.ontotext.com/graphdb/similarity/>
PREFIX inst:<http://www.ontotext.com/graphdb/similarity/instance/>
select ?index ?status ?type
where {
?index :status ?status .
?index :type ?type .
}

Elasticsearch connector and owl:sameAs on graphdb

im using ruleset OWL-RL optimized and using elasticsearch connector for search.
All i want is to recoginize the entity has same value and merge all values into one document in es.
Im doing this by:
Person - hasPhone - Phone and have InverseFunctionalProperty on relation hasPhone
Example:
http://example.com#1 http://example.com#hasPhone http://example.com#111.
http://example.com#2 http://example.com#hasPhone http://example.com#111.
=> #1 owl:sameAs #2
when i search by ES, i receive two result both #1, #2 . But when i repair connector i get only one result (that what i want).
1./ I want to ask is there a way that ES connector auto merge doc and delete previous doc ?, because i dont want to repair connector all the time. When i set manageIndex:false, it always get two results when searching.
2./ How to receive only one record, exculding the others have owl:sameAs with this record by SPARQL.
3./ Is there a better ruleset for owl:sameAs and InverseFunctionalProperty for reference ?
The connector watches for changes to property paths (as specified by the index definition) but I don't think it can detect the merging (smushing) caused by sameAs, that's why you need the rebuilding. If this is an important case, I can post an improvement issue for you, but please email graphdb-support and vladimir.alexiev at ontotext with a description of your business case (and link to this question)
If you have "sameAs optimization" enabled for the repo (which it is by default) and do NOT have "expand sameAs URLs" in the query, you should get only 1 result for queries like ?x <http://example.com#hasPhone> <http://example.com#111>
OWL-RL-Optimized is good for your case. (The rulesets supporting InverseFunctionalProperty are OWL-RL, OWL-QL, rdfsPlus and their optimized variants.)

How compute lucene FuzzyQuery on top GraphDB lucene index?

GraphDB supports FTS Lucene plugin to build RDF 'molecule' to index texts efficiently. However, when there is a typo (missspell) in the word your are searching, Lucene would not retrieve a result. I wonder if it is possible to implement a FuzzyQuery based on the Damerau-Levenshtein algorithm on top the Lucene Index in GraphDB for FTS. That way even if the word is not correctly spell you can get a list of more 'closed' words based on an edit distance similarity.
This is the index I have created for indexing labels of NounSynset in WordNet RDF.
PREFIX wn20schema: <http://www.w3.org/2006/03/wn/wn20/schema/>
INSERT DATA {
luc:index luc:setParam "uris" .
luc:include luc:setParam "literals" .
luc:moleculeSize luc:setParam "1" .
luc:includePredicates luc:setParam "http://www.w3.org/2000/01/rdf-schema#label" .
luc:includeEntities luc:setParam wn20schema:NounSynset.
luc:nounIndex luc:createIndex "true".
}
When running the query
select * where {
{?id luc:nounIndex "credict"}
?id luc:score ?score.
}
The result is empty and I would like to get at least the word "credit" as the edit distance is 1.
Thank you!!!
If you use the ~ it should give you a fuzzy match.

How to Sparql INSERT in TopBraid Composer?

I'm trying to insert new classes by using Sparql in TopBraid Composer (ME 5.5.2). My simple ontology looks like this:
Then I wrote a Sparql query to insert Berry as a subclass of Fruit:
PREFIX ft: <http://www.semanticweb.org/ontologies/2018/7/fruit#>
PREFIX rdfs: <ttp://www.w3.org/2000/01/rdf-schema#>
INSERT
{ft:Berry rdfs:subClassOf ft:Fruit}
But an error message appeared, saying Encountered "insert". Was expecting one of: "base, "select", ...
A similar post: Sparql insert data not working says that Sparql Query is a different language from Sparql Update. Some other post says that Sparql Update is not supported in Protege but is supported in Composer (for which reason I downloaded Composer). I also checked the Composer manual: https://www.topquadrant.com/docs/TBC-Getting-Started-Guide52.pdf, which mentions Sparql Update but doesn't say much.
My question then is, is it possible to insert classes and axioms in TopBraid? If so, how? My end goal is that the inserted classes will appear in the hierarchical view, and their inserted class definitions can be seen on the side as well. If Composer can't do this, what other tools/workflows can I use?
Sorry for such a newbie question. Any help is appreciated.
There are two forms of INSERT in SPARQL 1.1 Update:
INSERT DATA
INSERT
You are mixing them up.
The following works for me in TBC 5.5.2 Free Edition against the kennedys.ttl example:
INSERT DATA
{ kennedys:UralStateUniversity a kennedys:College }
Being an unknown URI, the subject is underlined in query editor, but just press the "Execute SPARQL" button.
Update
In your particular case, you should say something like
INSERT DATA
{ ft:Berry rdfs:subClassOf ft:Fruit; a owl:Class }
Please note that owl:Class is used. TBC considers instances of rdfs:Class as "system classes", their icons are brown, not yellow.

How to create and use GeoSpatial indexes in Marklogic from Sparql

I have loaded the geospatial data from geonames.org into Marklogic using RDF import.
When using the Query Console to explore the data, I see the data has been loaded into an xml document and looks like this:
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#lat</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">40.41476</sem:object>
</sem:triple>
<sem:triple>
<sem:subject>http://sws.geonames.org/2736540/</sem:subject>
<sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#long</sem:predicate>
<sem:object datatype="http://www.w3.org/2001/XMLSchema#string">-8.54304</sem:object>
</sem:triple>
I am able to do a SPARQL DESCRIBE and see data. Here is an example.
#prefix geonames: <http://www.geonames.org/ontology#> .
#prefix xs: <http://www.w3.org/2001/XMLSchema#> .
#prefix p0: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<http://sws.geonames.org/2736540/> geonames:parentCountry <http://sws.geonames.org/2264397/> ;
geonames:countryCode "PT"^^xs:string ;
p0:long "-8.54304"^^xs:string ;
geonames:featureCode <http://www.geonames.org/ontology#P.PPL> ;
geonames:parentADM1 <http://sws.geonames.org/2742610/> ;
geonames:parentFeature <http://sws.geonames.org/2742610/> ;
<http://www.w3.org/2000/01/rdf-schema#isDefinedBy> "http://sws.geonames.org/2736540/about.rdf"^^xs:string ;
a geonames:Feature ;
geonames:locationMap <http://www.geonames.org/2736540/pedreira-de-vilarinho.html> ;
geonames:name "Pedreira de Vilarinho"^^xs:string ;
geonames:nearbyFeatures <http://sws.geonames.org/2736540/nearby.rdf> ;
geonames:featureClass geonames:P ;
p0:lat "40.41476"^^xs:string .
I want to query over this data using SPARQL QUERY as my Query Type in a way where I can take advantage of the geospatial indexes that MarkLogic can create.
I have been having trouble with two aspects of this.
How to correctly create the geospatial indexes for the wgs84_pos#lat and wgs84_pos#long predicates?
How do I access those indexes from SPARQL QUERY?
I would like to have a sparql query that would be able to find subjects within some range of a Point.
=====================================
Followup:
After following David Ennis's Answer (Which worked nicely, thanks!) I ended up with this sample Xquery that was able to select data out of documents via geosearch and then use those IRI's in a sparql values query.
Example:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
let $matches := cts:search(//rdf:RDF,
cts:element-pair-geospatial-query (
fn:QName("http://www.geonames.org/ontology#","Feature"),
fn:QName("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat"),
fn:QName ("http://www.w3.org/2003/01/geo/wgs84_pos#","long"),
cts:circle(10, cts:point(19.8,99.8))))
let $iris := sem:iri($matches//#rdf:about)
let $bindings := (fn:map(function($n) { map:entry("featureIRI", $n) }, $iris))
let $sparql := '
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT *
WHERE {
?featureIRI wgs:lat ?lat;
wgs:long ?long.
}
'
return sem:sparql-values($sparql, $bindings)
This xquery queries the geospatial index, finds matching documents and then selects the IRI in the rdf:about attribute of the xml document.
It then maps over all of those IRIs and creates map entries that can be passed in the bindings parameter of the sem:sparql-values function.
I do not believe you can do what you want via just native SPARQL. Geospacial queries in any SPARQL implementation are extensions like geoSPARQL, Apache Jena geospacial queries etc.
My suggested approach in MarkLogic:
Insert the geonames subjects into MarkLogic as unmanaged triples (an XML or JSON document with embedded triples for each one)
In the same document, include the geo-spacial data in one of the acceptable MarkLogic formats. This essentially adds geo-spacial metadata to the triple since it is in the same fragment.
Add geo-spacial path-range-indexes for the geospacial data.
Use SPARQL inside of MarkLogic with a cts query restriction.
The Building Blocks for above:
Understanding unmanaged triples
Understanding Geo-spacial Region Types
Understanding Geo-spacial Indexes
Understanding Geo-spacial Queries
Understanding Semantics with cts search
Another approach to the final query could be the Optic API but I do not see how it would negate the need to do steps 1-3