I am trying to clear most of the graphs contained in my local Virtuoso triple store, using Apache Jena, as part of my clean-up process before and after my unit tests. I think something like the following should work: first, I retrieve the graph URIs to be deleted; then I execute a SPARUL DROP operation for each.
String sparqlEndpointUsername = ...;
String sparqlEndpointPassword = ...;
String sparqlQueryString = ...; // Returns the URIs of the graphs to be deleted

HttpAuthenticator authenticator = new SimpleAuthenticator(sparqlEndpointUsername,
        sparqlEndpointPassword.toCharArray());
ResultSet resultSetToReturn = null;

try (QueryEngineHTTP queryEngine = new QueryEngineHTTP(sparqlEndpoint, sparqlQueryString, authenticator)) {
    resultSetToReturn = queryEngine.execSelect();
    resultSetToReturn = ResultSetFactory.copyResults(resultSetToReturn);

    while (resultSetToReturn.hasNext()) {
        String graphURI = resultSetToReturn.next().getResource("?g").getURI();
        UpdateRequest request = UpdateFactory.create();
        request.add("DROP GRAPH <" + graphURI + ">");

        Dataset dataset = ...; // how can I create a default dataset pointing to my local Virtuoso installation?
        // And perform the operations.
        UpdateAction.execute(request, dataset);
    }
}
Questions:
As shown in this example, ARQ needs a dataset to operate on. How would I create this dataset pointing to my local Virtuoso installation for an update operation?
Is there perhaps an alternative to my approach? Would using something other than Jena be a better idea?
Please note that I am not trying to delete all graphs. I am deleting only the graphs whose names are returned through the SPARQL query defined in the beginning (3rd line).
Your question appears to be specific to Virtuoso, and meant to remove all RDF data, so you could use Virtuoso's built-in RDF_GLOBAL_RESET() function.
This is not a SPARQL/SPARUL query; it is usually issued through an SQL connection -- which could be JDBC, ODBC, ADO.NET, OLE DB, iSQL, etc.
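For illustration, here is a minimal Java/JDBC sketch of issuing that reset; it assumes the Virtuoso JDBC driver (virtuoso.jdbc4.Driver) is on the classpath and uses the default host, port and credentials of a local installation (localhost:1111, dba/dba), which you would adjust for your setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Hedged sketch: driver class, JDBC URL, and credentials are assumptions about a
// default local Virtuoso installation; adapt them to your environment.
Class.forName("virtuoso.jdbc4.Driver");
try (Connection conn = DriverManager.getConnection("jdbc:virtuoso://localhost:1111", "dba", "dba");
     Statement stmt = conn.createStatement()) {
    // Removes *all* RDF data from the quad store.
    stmt.execute("RDF_GLOBAL_RESET()");
}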
That said, as you are connecting through a SPARUL-privileged connection, you should be able to use Virtuoso's (limited) SQL-in-SPARQL support, a la --
SELECT
  ( bif:RDF_GLOBAL_RESET() AS ?reset )
WHERE
  { ?s ?p ?o }
LIMIT 1
(Executing this through an unprivileged connection like the default SPARQL endpoint will result in an error like Virtuoso 37000 Error SP031: SPARQL compiler: Function bif:RDF_GLOBAL_RESET() can not be used in text of SPARQL query due to security restrictions.)
(ObDisclaimer: OpenLink Software produces Virtuoso, and employs me.)
You can build a single SPARQL Update request:
DROP GRAPH <g1> ;
DROP GRAPH <g2> ;
DROP GRAPH <g3> ;
... ;
because in SPARQL Update, one HTTP request can contain several update operations, separated by ;.
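As a minimal Jena-side sketch of that idea (hedged: graphURIs stands for the URIs returned by the SELECT query in the question, updateEndpoint for the Virtuoso SPARUL endpoint, and authentication setup is omitted):

import org.apache.jena.update.UpdateExecutionFactory;
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateProcessor;
import org.apache.jena.update.UpdateRequest;

// Collect all DROPs into one UpdateRequest and send them in a single HTTP request.
UpdateRequest request = UpdateFactory.create();
for (String graphURI : graphURIs) {
    request.add("DROP SILENT GRAPH <" + graphURI + ">");
}
UpdateProcessor processor = UpdateExecutionFactory.createRemote(request, updateEndpoint);
processor.execute();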
I just started using Apache Jena; in their introduction they explain how to write out the created model. As input I'm using a Turtle syntax file containing some data about OWL ontologies, and I'm using the @base directive so I can use relative URIs in the syntax:
@base <https://valbuena.com/ontology-test/> .
And then writing my data as:
<sensor/AD590/1> a sosa:Sensor ;
    rdfs:label "AD590 #1 temperature sensor"@en ;
    sosa:observes <room/1#Temperature> ;
    ssn:implements <MeasureRoomTempProcedure> .
Apache Jena is able to read that @base directive and expands the relative URIs to their full form, but when I write the model out Jena doesn't write the @base directive or the relative URIs. The output is shown as:
<https://valbuena.com/ontology-test/sensor/AD590/1> a sosa:Sensor ;
    rdfs:label "AD590 #1 temperature sensor"@en ;
    sosa:observes <https://valbuena.com/ontology-test/room/1#Temperature> ;
    ssn:implements <https://valbuena.com/ontology-test/MeasureRoomTempProcedure> .
My code is the following:
Model m = ModelFactory.createOntologyModel();
String base = "https://valbuena.com/ontology-test/";
InputStream in = FileManager.get().open("src/main/files/example.ttl");
if (in == null) {
    System.out.println("file error");
    return;
} else {
    m.read(in, null, "TURTLE");
}
m.write(System.out, "TURTLE");
There are multiple read and write methods that take the base as a parameter:
On read() I've found that if @base isn't declared in the data file it must be passed to the read call; otherwise it can be set to null.
On write() the base parameter is optional; whether I specify the base or not (even as null or a URI), the output is always the same: @base doesn't appear and all relative URIs are written as full URIs.
I'm not sure if this is a bug or it's just not possible.
First, consider using a prefix like ":" -- this is not the same as base, but it makes the output nicer as well.
You can configure the base with (current version of Jena):
RDFWriter.create()
    .source(model)
    .lang(Lang.TTL)
    .base("http://base/")
    .output(System.out);
It seems that the code shown in the Jena RDF API introduction tutorial is not up to date: it uses the reading method I showed above (FileManager), which has since been replaced by RDFDataMgr. The FileManager way doesn't handle the @base directive well.
After experimenting I've found that the base directive works well with:
Model model = ModelFactory.createDefaultModel();
RDFDataMgr.read(model,"src/main/files/example.ttl");
model.write(System.out, "TURTLE", base);
or
Model model = ModelFactory.createDefaultModel();
model.read("src/main/files/example.ttl");
model.write(System.out, "TURTLE", base);
Although model.write() is described as legacy in the RDF output documentation (whereas model.read() is considered common in the RDF input documentation, which I don't understand), it is the only method I have found that accepts the base parameter (required to put the @base directive back in the output); the RDFDataMgr write methods don't include it.
Thanks to @AndyS for providing a simpler way to read the data, which led to fixing the problem.
@AndyS's answer allowed me to write relative URIs to the file, but did not include the base when using the RDF/XML variants. To get the xml:base directive added correctly, I had to use the following:
RDFDataMgr.read(graph, is, Lang.RDFXML);

Map<String, Object> properties = new HashMap<>();
properties.put("xmlbase", "http://example#");

Context cxt = new Context();
cxt.set(SysRIOT.sysRdfWriterProperties, properties);

RDFWriter.create()
    .source(graph)
    .format(RDFFormat.RDFXML_PLAIN)
    .base("http://example#")
    .context(cxt)
    .output(os);
I have loaded the geospatial data from geonames.org into MarkLogic using RDF import.
When using the Query Console to explore the data, I see the data has been loaded into an xml document and looks like this:
<sem:triple>
  <sem:subject>http://sws.geonames.org/2736540/</sem:subject>
  <sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#lat</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">40.41476</sem:object>
</sem:triple>
<sem:triple>
  <sem:subject>http://sws.geonames.org/2736540/</sem:subject>
  <sem:predicate>http://www.w3.org/2003/01/geo/wgs84_pos#long</sem:predicate>
  <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">-8.54304</sem:object>
</sem:triple>
I am able to do a SPARQL DESCRIBE and see data. Here is an example.
@prefix geonames: <http://www.geonames.org/ontology#> .
@prefix xs: <http://www.w3.org/2001/XMLSchema#> .
@prefix p0: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
<http://sws.geonames.org/2736540/> geonames:parentCountry <http://sws.geonames.org/2264397/> ;
    geonames:countryCode "PT"^^xs:string ;
    p0:long "-8.54304"^^xs:string ;
    geonames:featureCode <http://www.geonames.org/ontology#P.PPL> ;
    geonames:parentADM1 <http://sws.geonames.org/2742610/> ;
    geonames:parentFeature <http://sws.geonames.org/2742610/> ;
    <http://www.w3.org/2000/01/rdf-schema#isDefinedBy> "http://sws.geonames.org/2736540/about.rdf"^^xs:string ;
    a geonames:Feature ;
    geonames:locationMap <http://www.geonames.org/2736540/pedreira-de-vilarinho.html> ;
    geonames:name "Pedreira de Vilarinho"^^xs:string ;
    geonames:nearbyFeatures <http://sws.geonames.org/2736540/nearby.rdf> ;
    geonames:featureClass geonames:P ;
    p0:lat "40.41476"^^xs:string .
I want to query over this data using SPARQL QUERY as my Query Type in a way where I can take advantage of the geospatial indexes that MarkLogic can create.
I have been having trouble with two aspects of this.
How to correctly create the geospatial indexes for the wgs84_pos#lat and wgs84_pos#long predicates?
How do I access those indexes from SPARQL QUERY?
I would like to have a SPARQL query that can find subjects within some range of a point.
=====================================
Followup:
After following David Ennis's answer (which worked nicely, thanks!), I ended up with this sample XQuery, which selects data out of documents via a geospatial search and then uses those IRIs in a SPARQL values query.
Example:
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
at "/MarkLogic/semantics.xqy";
let $matches := cts:search(//rdf:RDF,
cts:element-pair-geospatial-query (
fn:QName("http://www.geonames.org/ontology#","Feature"),
fn:QName("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat"),
fn:QName ("http://www.w3.org/2003/01/geo/wgs84_pos#","long"),
cts:circle(10, cts:point(19.8,99.8))))
let $iris := sem:iri($matches//#rdf:about)
let $bindings := (fn:map(function($n) { map:entry("featureIRI", $n) }, $iris))
let $sparql := '
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT *
WHERE {
?featureIRI wgs:lat ?lat;
wgs:long ?long.
}
'
return sem:sparql-values($sparql, $bindings)
This XQuery queries the geospatial index, finds matching documents, and then selects the IRI in the rdf:about attribute of each XML document.
It then maps over all of those IRIs and creates map entries that can be passed in the bindings parameter of the sem:sparql-values function.
I do not believe you can do what you want via native SPARQL alone. Geospatial queries in any SPARQL implementation are extensions, like GeoSPARQL, Apache Jena's geospatial queries, etc.
My suggested approach in MarkLogic:
Insert the geonames subjects into MarkLogic as unmanaged triples (an XML or JSON document with embedded triples for each one)
In the same document, include the geospatial data in one of the acceptable MarkLogic formats. This essentially adds geospatial metadata to the triple since it is in the same fragment.
Add geospatial path range indexes for the geospatial data.
Use SPARQL inside of MarkLogic with a cts:query restriction.
The Building Blocks for above:
Understanding unmanaged triples
Understanding Geospatial Region Types
Understanding Geospatial Indexes
Understanding Geospatial Queries
Understanding Semantics with cts search
Another approach to the final query could be the Optic API, but I do not see how it would negate the need to do steps 1-3.
I know that by default Fuseki provides different URLs for query and update, allowing some elegant management.
Now, I want a single URL for both update and query. The rationale behind this is to avoid propagating two URLs through the codebase.
I know that update and query code should be separated, but my requests are not mixed; it's just to avoid passing around two objects instead of one.
My current config looks like:
<#service1> rdf:type fuseki:Service ;
    fuseki:name "dataset" ;            # http://host:port/dataset
    fuseki:serviceQuery "endpoint" ;   # SPARQL query service
    fuseki:serviceUpdate "endpoint" ;  # SPARQL update service
    fuseki:dataset <#dataset> ;
    .
In theory, an interface now exists at /endpoint, but it only accepts updates. When querying with:
prefix sfm: <sfm/>
SELECT DISTINCT ?value
WHERE {
sfm:config sfm:component ?value.
}
the server reports many lines like the following:
INFO [4] POST http://localhost:9876/sfm/endpoint
INFO [4] POST /sfm :: 'endpoint' :: [application/x-www-form-urlencoded] ?
INFO [4] 400 SPARQL Update: No 'update=' parameter (0 ms)
I can't find anything in the documentation that specifies that the query and update services can't be at the same place, so I assume it's possible and I've just missed something.
However, the last log line is explicit: Fuseki expects an update.
Another solution could be to define the URL as localhost/dataset/ and, depending on whether I query or update, append the relevant part at the end, giving respectively localhost/dataset/query and localhost/dataset/update.
But (1) this forces the dataset to follow a particular URL naming scheme, and (2) it looks like a strong requirement on the triple store: when I use another one, it will have to provide the same interface, which might not be possible (I don't know whether this feature is implemented in other triple stores).
EDIT: fix the POST/GET error
405 HTTP method not allowed: SPARQL Update : use POST
It looks like you are using GET for a SPARQL Update.
It has correctly routed the operation to the update processor (you can use the same endpoint, including dropping the service part and just using the dataset URL).
However, in HTTP, GET is a cacheable operation and should not be used for requests that cause changes; a GET may never reach the end server, and some intermediary may answer it from a web cache.
Use POST.
The same is true if you separate services for query and update.
Original Context
The original question has been edited. The original report was asking about this:
INFO [1] 405 HTTP method not allowed: SPARQL Update : use POST (2 ms)
Answer to the revised and different question:
The endpoint for shared services is the dataset URL:
http://localhost:9876/sfm
Whether update, query or other services are available is controlled by the configuration file.
Setting fuseki:serviceQuery and fuseki:serviceUpdate to the same name is not necessary and is discouraged.
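If you drive Fuseki from Jena, here is a minimal sketch of using just the dataset URL for both kinds of request, assuming the http://localhost:9876/sfm dataset from the question and Jena's RDFConnection API:

import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

// One URL for everything; the connection sends queries and updates as the
// appropriate HTTP operations against the dataset endpoint.
try (RDFConnection conn = RDFConnectionFactory.connect("http://localhost:9876/sfm")) {
    conn.update("INSERT DATA { <urn:example:s> <urn:example:p> <urn:example:o> }");
    conn.querySelect("SELECT DISTINCT ?value WHERE { ?s ?p ?value }",
            row -> System.out.println(row.get("value")));
}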
I am trying to teach myself this weekend how to run API queries against a data source in this case data.gov. At first I thought I'd use a simple SQL variant, but it seems in this case I have to use SPARQL.
I've read through the documentation, downloaded Twinkle, and can't quite seem to get it to run. Here is an example of a query I'm running; I'm basically trying to find all gas stations around Denver, CO whose network is null.
PREFIX station: https://api.data.gov/nrel/alt-fuel-stations/v1/nearest.json?api_key=???location=Denver+CO
SELECT *
WHERE
{ ?x station:network ?network like "null"
}
Any help would be very much appreciated.
SPARQL is a graph pattern language for RDF triples. A query consists of a set of "basic graph patterns" described by triple patterns of the form <subject>, <predicate>, <object>. RDF identifies the subject and predicate with URIs, and the object is either a URI (object property) or a literal (datatype or language-tagged property). Each triple pattern in a query must therefore have three entities.
Since we don't have any examples of your data, I'll provide a way to explore the data a bit. Let's assume your prefix is correctly defined, which I doubt - it will not be the REST API URL, but the URI of the entity itself. Then you can try the following:
PREFIX station: <http://api.data.gov/nrel...>
SELECT *
WHERE
{ ?s station:network ?network .
}
...setting the PREFIX to correctly represent the namespace for network. Then look at the binding for ?network and find out how they represent null. Let's say it is a string as you show. Then the query would look like:
PREFIX station: <http://api.data.gov/nrel...>
SELECT ?s
WHERE
{ ?s station:network "null" .
}
There is no like in SPARQL, but you could use a FILTER clause using regex or other string matching features of SPARQL.
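For instance, here is a hedged sketch of the FILTER/regex form, wrapped in a Jena call for anyone running it programmatically; the prefix URI and endpoint URL are placeholders, not the real data.gov service:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

String query =
        "PREFIX station: <http://example.org/station#>\n" +
        "SELECT ?s WHERE {\n" +
        "  ?s station:network ?network .\n" +
        "  FILTER regex(str(?network), \"null\", \"i\")\n" +
        "}";
// Case-insensitive match on the lexical form of ?network.
try (QueryExecution qe = QueryExecutionFactory.sparqlService("http://example.org/sparql", query)) {
    ResultSetFormatter.out(qe.execSelect());
}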
And please, please, please google "SPARQL" and "RDF". There is a lot of information about SPARQL, and the W3C's SPARQL 1.1 Query Language Recommendation is a comprehensive source with many good examples.
I am trying to add some individuals to my existing ontology (OntModel), with the objective of adding values/literals for a DatatypeProperty with a specific datatype known at runtime from the range of the DatatypeProperty. My OntModel is backed by a TDB store, linked to a dataset (so any changes made to the OntModel are reflected back to my TDB store/dataset). Individuals are added with the following code:
Individual ind = oc.createIndividual(namespace + nameOfIndividual); // oc is an OntClass object
Literal l = ontModel.createTypedLiteral("1230", dp.getRange().getURI()); // dp is a DatatypeProperty object
ind.addLiteral(dp, l);
When the code executes, the literal is added, and a SPARQL query:
"SELECT * WHERE { ?s :ABCConstant ?o }"; //:ABCConstant is the datatypeProperty for which the literal is added by the above code.
gives me the following result:
------------------------------------------------------------------------------------------------------------------------------------------------
| s | o |
================================================================================================================================================
| <http://www.semanticweb.org/ontologies/2012/10/Ontology.owl#individual1ABCDConstant> | "10000.0"^^<http://www.w3.org/2001/XMLSchema#int> |
| <http://www.semanticweb.org/ontologies/2012/10/Ontology.owl#individual2ABCDConstant> | 1230 |
------------------------------------------------------------------------------------------------------------------------------------------------
But when I try to use the same query in a second run (without creating any individual this time), I get the following exception when the program tries to display/access the result:
Exception in thread "main" com.hp.hpl.jena.tdb.base.file.FileException: ObjectFileStorage.read[nodes](181624)[filesize=248062][file.size()=248062]: Impossibly large object : 1634628966 bytes > filesize-(loc+SizeOfInt)=66434
at com.hp.hpl.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:346)
at com.hp.hpl.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:178)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
at com.hp.hpl.jena.tdb.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:74)
at com.hp.hpl.jena.tdb.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:103)
at com.hp.hpl.jena.tdb.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:74)
at com.hp.hpl.jena.tdb.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:55)
at com.hp.hpl.jena.tdb.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
at com.hp.hpl.jena.tdb.lib.TupleLib.triple(TupleLib.java:126)
at com.hp.hpl.jena.tdb.lib.TupleLib.triple(TupleLib.java:114)
at com.hp.hpl.jena.tdb.lib.TupleLib.access$000(TupleLib.java:45)
at com.hp.hpl.jena.tdb.lib.TupleLib$3.convert(TupleLib.java:76)
at com.hp.hpl.jena.tdb.lib.TupleLib$3.convert(TupleLib.java:72)
at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:317)
at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:317)
at org.apache.jena.atlas.iterator.Iter$4.next(Iter.java:317)
at org.apache.jena.atlas.iterator.Iter.next(Iter.java:915)
at com.hp.hpl.jena.util.iterator.WrappedIterator.next(WrappedIterator.java:94)
at com.hp.hpl.jena.util.iterator.WrappedIterator.next(WrappedIterator.java:94)
at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55)
at com.hp.hpl.jena.graph.compose.CompositionBase$2.hasNext(CompositionBase.java:94)
at com.hp.hpl.jena.util.iterator.NiceIterator$1.hasNext(NiceIterator.java:103)
at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIterTriplePattern$TripleMapper.hasNextBinding(QueryIterTriplePattern.java:151)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:81)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIterBlockTriples.hasNextBinding(QueryIterBlockTriples.java:64)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:40)
at com.hp.hpl.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:112)
at com.hp.hpl.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:75)
at com.hp.hpl.jena.sparql.resultset.ResultSetMem.<init>(ResultSetMem.java:97)
at com.hp.hpl.jena.query.ResultSetFactory.makeRewindable(ResultSetFactory.java:420)
at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:149)
at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:132)
at com.hp.hpl.jena.sparql.resultset.TextOutput.write(TextOutput.java:120)
at com.hp.hpl.jena.sparql.resultset.TextOutput.format(TextOutput.java:67)
at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:122)
at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:74)
at com.hp.hpl.jena.query.ResultSetFormatter.out(ResultSetFormatter.java:65)
at senseXploreApi.SparqlQuery.sparqlQuery(SparqlQuery.java:82)
at senseXploreApi.TryMain.main(TryMain.java:39)
Note: Other queries to view the added individuals execute fine.
But queries like:
"SELECT * WHERE { ?s ?o 1230}"
lead to the same error.
Also, queries like:
"SELECT * WHERE { ?s ?o 10000.0}"
OR
"SELECT * WHERE { ?s ?o 10000}"
don't give any error but return no results.
The literal 10000 was added using the statement:
ind.addLiteral(opp,new Integer(10000));
Please help me! Where am I going wrong? Is the procedure used to create individuals wrong? If so, what other ways are there to add literals with a specific datatype known at runtime?
The "Impossibly large object" message means that your TDB database is (partially) corrupted; specifically, part of the node table that maps between database-internal identifiers and the original RDF terms is corrupted. Any query that touches that part of the node table will see this error; other queries may still run without issue.
The only way to recover your database is to rebuild it from the original data, once the corruption has happened it cannot be repaired.
Corruption can happen in various ways, the most common of which are:
Accessing TDB in a non-transactional manner and failing to call sync() on the dataset after making changes (or more likely in your example failing to call close() on the Model after making changes)
Having multiple JVMs access the same TDB dataset at the same time; if your application requires this, use a server like Fuseki to centralise access to the database.
See the TDB Transactions documentation for how to use TDB transactionally, or see the Fuseki documentation if you need to expose a TDB database to multiple JVMs simultaneously.
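For reference, here is a minimal sketch of the transactional pattern (shown with the current org.apache.jena packages; older 2.x releases used the com.hp.hpl.jena.* equivalents, and the directory path is a placeholder):

import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.tdb.TDBFactory;

Dataset dataset = TDBFactory.createDataset("/path/to/tdb");
dataset.begin(ReadWrite.WRITE);
try {
    // ... add individuals / literals to the model obtained from this dataset ...
    dataset.commit();   // make the changes durable
} finally {
    dataset.end();      // always release the transaction
}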