I'd like to copy all the data from an existing Sesame repository into a new one. I need the migration because I want to use OWL inferencing on my triplestore, which is not possible using OWLIM on an In Memory repository (the type of my existing repository).
What is the most efficient way to copy all triples from a repository into a new one?
UPDATE 1:
I'm curious to understand why using SPARQL INSERT cannot be a valid approach. I tried this code under the SPARQL Update section of a new repository:
PREFIX : <http://dbpedia.org/resource/>
INSERT{?s ?p ?o}
WHERE
{
SERVICE <http://my.ip.ad.here:8080/openrdf-workbench/repositories/rep_name>
{
?s ?p ?o
}
}
I get the following error:
org.openrdf.query.UpdateExecutionException: org.openrdf.sail.SailException: org.openrdf.query.QueryEvaluationException:
Is there an error in the query or can the data not be inserted this way? I've inserted data from DBpedia using queries of similar structure.
Manually (Workbench)
Open the repository you want to copy from.
Select 'Export'.
Choose a suitable export format ('TriG' or 'BinaryRDF' are good choices, as both preserve context information) and hit the 'download' button to save the data export to local disk.
Open the repository you want to copy to.
Select 'Add'.
Choose 'select the file containing the RDF data you wish to upload'.
Click 'Browse' to find the data export file on your local disk.
Make sure 'use Base URI as context identifier' is not selected.
Hit 'upload', and sit back.
Programmatically
First, open RepositoryConnections to both repositories:
RepositoryConnection source = sourceRepo.getConnection();
RepositoryConnection target = targetRepo.getConnection();
Then, read all statements from source and immediately insert into target:
target.add(source.getStatements(null, null, null, true));
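For completeness, here is a minimal sketch of the same copy with explicit resource handling, assuming sourceRepo and targetRepo are already-initialized Repository objects and the Sesame 2.x (org.openrdf) API; adapt names and error handling to your setup:
import org.openrdf.model.Statement;
import org.openrdf.repository.Repository;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.RepositoryException;
import org.openrdf.repository.RepositoryResult;

public class RepositoryCopy {
    // Copies every statement, including context (named graph) information, from source to target.
    public static void copyAll(Repository sourceRepo, Repository targetRepo) throws RepositoryException {
        RepositoryConnection source = sourceRepo.getConnection();
        RepositoryConnection target = targetRepo.getConnection();
        try {
            // includeInferred = true also copies inferred statements; use false to copy explicit triples only
            RepositoryResult<Statement> statements = source.getStatements(null, null, null, true);
            try {
                target.add(statements); // contexts are preserved on each statement
            } finally {
                statements.close();
            }
        } finally {
            source.close();
            target.close();
        }
    }
}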
Either basic method should work just fine for any repository up to about several million triples in size. Of course there are plenty of more advanced methods for larger bulk sizes.
To avoid a possible "XY problem", let me explain my real goal: I am trying to change the capitalization of language tags in an rdf4j repo, using sparql. But although rdf4j stores language tags as written when they were defined, it knows enough to treat them as case-insensitive as the standard dictates. So it treats my attempted edit as a no-op:
Set-up:
INSERT DATA { test:a skos:prefLabel "hello"@EN }
Attempt:
DELETE { test:a skos:prefLabel "hello"@EN }
INSERT { test:a skos:prefLabel "hello"@en }
WHERE
{ test:a skos:prefLabel "hello"@EN }
Result:
This query does nothing. The language tag is still spelled EN.
Interestingly, this also fails if I execute two separate queries:
Query 1:
DELETE DATA { test:a skos:prefLabel "hello"@EN }
Query 2:
INSERT DATA { test:a skos:prefLabel "hello"@en }
Evidently, deleted strings remain in an internal cache and are resurrected, so my INSERT query resurrects "hello"@EN instead. A restart will clear the cache, but it's not the best UX...
Now, with some older versions of rdf4j I could clear this internal cache with the magic command CLEAR SILENT GRAPH <urn:uri:cache>. But this does not appear to work with rdf4j 2.3.3, which is what we are stuck with at the moment. Is there still a way to clear the string cache without a restart, or to change the capitalization of language tags in any other way?
PS I found this interesting thread about the handling of case in language tags; but it has brought me no closer to a solution.
At first glance this looks like a bug to me, an unintended consequence of a fix we did donkey's years ago for allowing preservation of case in language tags (https://openrdf.atlassian.net/browse/SES-1659).
I'm not sure there are any SPARQL-only workarounds for this, so please feel free to log a bug report/feature request at https://github.com/eclipse/rdf4j/issues.
Having said that, RDF4J does have functionality for normalizing language tags of course. In particular, the RDF parsers can be configured to normalize language tags (see the Rio configuration documentation), and in addition there's a utility method Literals.normalizeLanguageTag which you can use to convert any language tag to a standard canonical form.
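For illustration, here is a rough sketch of how Literals.normalizeLanguageTag could be used from the RDF4J API to rewrite the tags in bulk; I'm assuming the method takes and returns a plain String, and whether a bulk rewrite like this actually sidesteps the server-side string cache you hit is something you'd need to verify:
import java.util.ArrayList;
import java.util.List;

import org.eclipse.rdf4j.model.Literal;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.util.Literals;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.RepositoryResult;

public class NormalizeLanguageTags {

    // Rewrites every language-tagged literal in the repository to its canonical form (e.g. "EN" -> "en").
    // Changes are collected first and then applied in a single transaction.
    public static void normalize(Repository repo) {
        ValueFactory vf = repo.getValueFactory();
        List<Statement> toRemove = new ArrayList<>();
        List<Statement> toAdd = new ArrayList<>();

        try (RepositoryConnection conn = repo.getConnection()) {
            try (RepositoryResult<Statement> statements = conn.getStatements(null, null, null, false)) {
                while (statements.hasNext()) {
                    Statement st = statements.next();
                    if (st.getObject() instanceof Literal) {
                        Literal lit = (Literal) st.getObject();
                        lit.getLanguage().ifPresent(tag -> {
                            String canonical = Literals.normalizeLanguageTag(tag);
                            if (!canonical.equals(tag)) {
                                toRemove.add(st);
                                toAdd.add(vf.createStatement(st.getSubject(), st.getPredicate(),
                                        vf.createLiteral(lit.getLabel(), canonical), st.getContext()));
                            }
                        });
                    }
                }
            }
            conn.begin();
            conn.remove(toRemove);
            conn.add(toAdd);
            conn.commit();
        }
    }
}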
I'm using TopBraid Composer Free Edition (5.1.3) to create ontologies including SPIN constraints. I then load the resulting RDF files into RDF4J (2.0.1) and use RDF4J Workbench for testing.
I'm working on SPIN constraints. Here's an example that flags non-positive signal rates, which I've added to the CRO2:SignalRate class:
CONSTRUCT {
?this soo:hasConstraintViolation _:b0 .
_:b0 a spin:ConstraintViolation .
_:b0 rdfs:label "Non-Positive SignalRate" .
_:b0 spin:violationRoot ?this .
_:b0 spin:violationPath Nuvio:hasDataValue .
_:b0 spin:violationLevel spin:Warning .
}
WHERE {
?this Nuvio:hasDataValue ?signalRate .
FILTER (?signalRate <= 0.0) .
}
So, I'm testing this constraint in RDF4J workbench using the following SPARQL update query:
PREFIX inst: <http://www.disa.mil/dso/a2i/ontologies/PBSM/Sharing/Instantiations#>
PREFIX Nuvio: <http://cogradio.org/ont/Nuvio.owl#>
PREFIX CRO2: <http://cogradio.org/ont/CRO2.owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
INSERT DATA {
inst:aSignalRate_test a CRO2:SignalRate ;
Nuvio:hasDataValue "-10"^^xsd:long .
}
This test instance violates the constraint shown above. If I omit the spin:violationLevel triple and allow it to default to spin:Error, then I get an error message from the query and the test instance is not asserted, as expected. When executed as shown, the constraint violation is a spin:Warning, so the inst:aSignalRate_test individual is created with data value -10.0. My question is: where do the assertions in the constraint's CONSTRUCT clause go? I believe they are being asserted, since the change in spin:violationLevel impacts behavior. Note that I've tried to tie into the blank node with my own soo:hasConstraintViolation property, but this doesn't work. Are the CONSTRUCT triples asserted in some other context/graph? I'm just using the default graph for everything.
I'm looking for the expected triples both using RDF4J Workbench's Explore and using SPARQL queries. For example, the following query returns nothing after I assert my errant CRO2:SignalRate:
PREFIX spin: <http://spinrdf.org/spin#>
SELECT DISTINCT *
WHERE {
?s spin:violationRoot ?o .
}
This behavior is consistent between asserting in TopBraid Composer FE and RDF4J Workbench.
My goal is to find and use the diagnostic messages asserted in case of SPIN constraint violations, preferably by using SPARQL queries to find such diagnostics. Seems reasonable. I'm missing something.
Thanks.
The short answer: you can't.
SPIN constraints are intended to detect violations and report them. In RDF4J, that reporting mechanism is the log.
The relevant part of the SPIN spec (http://spinrdf.org/spin.html#spin-constraints):
[...] if an ASK constraint evaluates to true for one instance, then the instance violates the condition. Optionally, CONSTRUCT queries can create instances of a spin:ConstraintViolation class that provide details on a specific violation.
Note that there is no requirement on the reasoner to do anything with the data that a CONSTRUCT-based constraint produces - it's merely for optional "additional information".
It's perhaps worth seeing if we could add an enhancement to the reasoner to report such triples back in one form or another, but in the current system, only SPIN rules (using DELETE/INSERT etc) modify the database.
So, following @JeenBroekstra's comments and my response comment above, I've switched to using constructors so that the error information remains as visible artifacts. I've created several of my own subproperties of spin:constructor in order to keep things ordered. I've also specified the execution order of these constructors so that these checks run ahead of other rules that might be tripped up (e.g. by a negative signal rate).
Advantages of this approach:
The error detail artifacts (e.g. spin:violationRoot) remain visible in the triple store. This is very important in my application, which involves machine-to-machine interaction.
All the compliance checks are done, so an individual with multiple problems lists all problems as separate hasConstraintViolation properties, not just the first violation to block instantiation.
Disadvantages of this approach:
The errant individual is still instantiated.
This is not standard behavior, so tools geared to look for constraint artifacts in the logs probably won't find them.
Here's a screen shot of an example rule implemented as a subproperty of spin:constructor:
I am having a problem with my Virtuoso RDF Store.
Even though I am uploading new RDF files, every time I run a SPARQL query the only values retrieved are from old RDF files (ones I manually deleted a long time ago).
This is an example of the SPARQL query I am doing:
PREFIX owl-time: <http://www.w3.org/2006/time#>
PREFIX cdc-owl: <http://www.contextdatacloud.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT *
FROM <miOnt:move>
WHERE
{
?ws rdf:type cdc-owl:WeatherSituation.
?ws cdc-owl:hasWeatherTime ?time.
?time owl-time:inXSDDateTime "2015-06-16T09:00:00".
?ws cdc-owl:hasTemperature ?temperature
}
And these are the results I am obtaining (as can be seen, they come from the old files):
Any idea of why is this happening?
This is what my repository looks like:
From what I can see, you have loaded your files into the Virtuoso WebDAV (file) repository, but you have not loaded the RDF therein into the Virtuoso (RDF) Quad Store.
See this guide to the bulk loader, and this page of RDF loading methods.
(ObDisclaimer: I work for OpenLink Software, producer of Virtuoso.)
I'll try to keep a long story short:
MySQL has had a spatial Point data type for some time. To insert data we need to use an expression of the form INSERT (...) VALUES (POINT(lon, lat) ...). To do that with the Cake ORM we need to set the given property to new QueryExpression("POINT($lon,$lat)"), and CakePHP will handle the data binding upon saving the entity.
Now what I want is to have an entity with a property (field) of type GeoJSON Point that corresponds to a MySQL column of type Point, and the ability to store and fetch such entities to and from the database.
I have used the beforeSave callback for that, in this manner:
public function beforeSave(Event $event, \Cake\Datasource\EntityInterface $entity, ArrayObject $options) {
$coords = $entity->position->getCoordinates();
$entity->position = new \Cake\Database\Expression\QueryExpression('POINT(' . $coords[0] . ',' . $coords[1] . ')');
}
where $entity->position is of type Point (from an additional library, but this could be anything). This works in general, but it modifies the actual entity, so if I do
$entity->position=new Point(5,10);
$table->save($entity);
$entity->position // <- this will be a QueryExpression instead of Point(5,10)
And in the third step I still want to have my Point object, not the modified value (a QueryExpression).
I know that CakePHP is supposed to have support for custom database data types, but it is useless because... well, it just does not work as I would expect. Here is a GitHub issue describing why the solution recommended by the docs does not work:
https://github.com/cakephp/cakephp/issues/8041
For my current project I need to load a dataset and several ontologies and expose everything as linked data using Fuseki with TDB and Pubby. Pubby takes a data set from a single location and creates URIs based on that location, so if we need multiple different locations (as in the case with 2–3 separate ontologies), that is easy to do with Pubby by adding another data set.
The concept of dataset seems to also apply to Fuseki.
Essentially I will need to expose three types of URIs:
www.mywebsite.com/project/data
www.mywebsite.com/project/data/structure
www.mywebsite.com/project/ontology
In order to create such URIs with Pubby 0.3.3 you have to specify lines like these:
conf:dataset [
conf:sparqlEndpoint <sparql_endpoint_url_ONE>;
conf:sparqlDefaultGraph <sparql_default_graph_name_ONE>;
conf:datasetBase <http://mywebsite.com/project/>;
conf:datasetURIPattern "(data)/.*";
(...)
]
Each data set specified in Pubby will take its data from a certain URL (typically a SPARQL endpoint).
For the ontologies you will have a second dataset that uses a different conf:datasetURIPattern, like this one:
conf:dataset [
conf:sparqlEndpoint <sparql_endpoint_url_TWO>;
conf:sparqlDefaultGraph <sparql_default_graph_name_TWO>;
conf:datasetBase <http://mywebsite.com/project/>;
conf:datasetURIPattern "(ontology)/.*";
(...)
]
As you can see, the differences are in the following: conf:sparqlEndpoint (the SPARQL endpoint), conf:sparqlDefaultGraph (the default graph), and conf:datasetURIPattern (needed for creating the actual URIs with Pubby).
It is not clear to me, however, how I can have separate URIs for the data sets when using Fuseki. When using Sesame, for example, I just create two different repositories, and this trick works like a charm when publishing data with Pubby. It is not immediately clear how to do the same with Fuseki.
The examples from the official Fuseki documentation present a single dataset (read-only or not, etc), but none of them seem to present such a scenario. There is no immediate example where there is a clear separation between the TBox and the ABox, even though this is a fundamental principle of Linked Data (see Keeping ABox and TBox Split).
As far as I understand this should be possible, but how? Also, is it correct that the TBox and ABox can be reunited under a single SPARQL endpoint later by using tdb:unionDefaultGraph true?
The dataset concept is not unique to Jena Fuseki; it's quite central in SPARQL. A dataset is a collection of named graphs and a default graph. The prefix of a URI has nothing to do with where triples about it are stored (whether in a named graph or in the default graph).
It sounds like you want to keep your ABox triples in one named graph and your TBox triples in another. Then, if the default graph is the union of the named graphs, you'll see the contents of both in the default graph. You can do that in Fuseki.
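As a rough sketch (not a definitive configuration), a Fuseki assembler file along these lines should give you a TDB-backed dataset whose default graph is the union of the named graphs; the service name "project", the TDB location "DB", and the graph layout are placeholder assumptions, so check the Fuseki/TDB assembler documentation for your version:
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb:    <http://jena.hpl.hp.com/2008/tdb#> .

<#service> rdf:type fuseki:Service ;
    fuseki:name         "project" ;   # SPARQL endpoint at /project
    fuseki:serviceQuery "sparql" ;
    fuseki:dataset      <#dataset> .

<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;
    # Expose the union of all named graphs (e.g. your ABox graph and TBox graph) as the default graph:
    tdb:unionDefaultGraph true .
You would then load the ABox into one named graph and the TBox into another (for example via the SPARQL Graph Store protocol, or SPARQL Update's LOAD ... INTO GRAPH), and point each Pubby conf:dataset at the same endpoint, distinguishing them by conf:datasetURIPattern as you already do.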