Apache Jena - Is it possible to write to output the BASE directive? - serialization

I just started using Apache Jena. In their introduction they explain how to write out the created model. As input I'm using a Turtle syntax file containing some data about OWL ontologies, and I'm using the @base directive so the syntax can use relative URIs:
@base <https://valbuena.com/ontology-test/> .
And then writing my data as:
<sensor/AD590/1> a sosa:Sensor ;
    rdfs:label "AD590 #1 temperature sensor"@en ;
    sosa:observes <room/1#Temperature> ;
    ssn:implements <MeasureRoomTempProcedure> .
Apache Jena is able to read that @base directive and expands the relative URIs to their full form, but when I write the model out, Jena emits neither the @base directive nor the relative URIs. The output is shown as:
<https://valbuena.com/ontology-test/sensor/AD590/1> a sosa:Sensor ;
    rdfs:label "AD590 #1 temperature sensor"@en ;
    sosa:observes <https://valbuena.com/ontology-test/room/1#Temperature> ;
    ssn:implements <https://valbuena.com/ontology-test/MeasureRoomTempProcedure> .
My code is the following:
Model m = ModelFactory.createOntologyModel();
String base = "https://valbuena.com/ontology-test/";
InputStream in = FileManager.get().open("src/main/files/example.ttl");
if (in == null) {
    System.out.println("file error");
    return;
} else {
    m.read(in, null, "TURTLE");
}
m.write(System.out, "TURTLE");
There are multiple read and write methods that take the base as a parameter:
On read() I've found that if @base isn't declared in the data file, it must be passed to the read call; otherwise it can be set to null.
On write() the base parameter is optional: whether I specify the base (as null or as a URI) or not, the output is always the same; the @base directive doesn't appear and all relative URIs are written in full.
I'm not sure if this is a bug or it's just not possible.

First, consider using a prefix like ":" -- this is not the same as base, but it makes the output nicer as well.
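For example, a minimal sketch (assuming the model variable used below): registering the base URI as the empty prefix lets the Turtle writer abbreviate URIs under that namespace wherever prefixed-name syntax allows it:
model.setNsPrefix("", "https://valbuena.com/ontology-test/");
model.write(System.out, "TURTLE");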
You can configure the base with (current version of Jena):
RDFWriter.create()
    .source(model)
    .lang(Lang.TTL)
    .base("http://base/")
    .output(System.out);

It seems the code in the introductory Jena RDF API tutorial is out of date: it shows the reading method I used above (FileManager), which has since been replaced by RDFDataMgr. The FileManager route doesn't handle the base directive well.
After experimenting I've found that the base directive works well with:
Model model = ModelFactory.createDefaultModel();
RDFDataMgr.read(model, "src/main/files/example.ttl");
model.write(System.out, "TURTLE", base);
or
Model model = ModelFactory.createDefaultModel();
model.read("src/main/files/example.ttl");
model.write(System.out, "TURTLE", base);
Although model.write() is described as legacy in the RDF output documentation (while model.read() is still listed as common in the RDF input documentation, which I don't understand), it is the only method I have found that accepts the base parameter, required to get the @base directive back into the output; the RDFDataMgr write methods don't take it.
Thanks to @AndyS for providing a simpler way to read the data, which led to the fix.

@AndyS's answer allowed me to write relative URIs to the file, but did not include the base for the RDF/XML variants. To get the xml:base declaration added correctly, I had to use the following:
RDFDataMgr.read(graph, is, Lang.RDFXML);
Map<String, Object> properties = new HashMap<>();
properties.put("xmlbase", "http://example#");
Context cxt = new Context();
cxt.set(SysRIOT.sysRdfWriterProperties, properties);
RDFWriter.create()
    .source(graph)
    .format(RDFFormat.RDFXML_PLAIN)
    .base("http://example#")
    .context(cxt)
    .output(os);
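With this in place, the opening rdf:RDF element of the output should carry an xml:base="http://example#" attribute (assuming the xmlbase writer property is honoured as above).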

Related

Parse portions of SPARQL and reuse them in RDF4j SparqlBuilder

I use some configuration logic to generate SPARQL queries with RDF4J and the SparqlBuilder.
// prepare selectVariables, prefixes and whereCondition according to configuration
SelectQuery mainQuery = Queries.SELECT(selectVariables)
    .prefix(prefixes)
    .where(whereCondition);
Now I wish to allow users to configure custom WHERE conditions to be used as subselects and composed with the rest of the query logic.
Since the configuration is YAML and the users are trained in SPARQL, I wanted to let them specify custom patterns as YAML multiline strings, as in this example:
customQuery: |
  ?_ wdt:P31 wd:Q5;
     wdt:P19/wdt:P131* wd:Q60.
This way I can let users freely customize the different queries I generate based on the configured condition.
The problem
I already managed to parse the query fragment using RDF4J's SPARQLParser:
SPARQLParserFactory PARSER_FACTORY = new SPARQLParserFactory();
QueryParser parser = PARSER_FACTORY.getParser();
ParsedQuery parsed = parser.parseQuery(query, null);
ProjectionVisitor projectionVisitor = new ProjectionVisitor();
parsed.getTupleExpr().visit(projectionVisitor);
TupleExpr parsedExpression = projectionVisitor.getProjectionArg();
but I can't feed the parsedExpression into the SparqlBuilder methods; the node representation used by the parser looks incompatible with the one used by the fluent builder.
Is there any way to use parsed expressions inside the SparqlBuilder?
No, it is not possible to use parsed expressions in the SparqlBuilder. What you could probably do instead though (freewheeling here) is use the SparqlBuilder to generate a query with a placeholder pattern of some sort, parse that, and then use a parse tree visitor to find that placeholder pattern and replace it with the custom parsed expression you got from the user.
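A rough sketch of that idea (untested and freewheeling in the same spirit; the urn:x-placeholder IRI is an invented marker, and parsedExpression is the visitor result from the question):
// Parse the SparqlBuilder output, then splice the user's pattern in place of a
// planted placeholder triple (here: one whose predicate is urn:x-placeholder).
ParsedQuery builtQuery = parser.parseQuery(mainQuery.getQueryString(), null);
builtQuery.getTupleExpr().visit(new AbstractQueryModelVisitor<RuntimeException>() {
    @Override
    public void meet(StatementPattern node) {
        Var p = node.getPredicateVar();
        if (p.hasValue() && "urn:x-placeholder".equals(p.getValue().stringValue())) {
            node.replaceWith(parsedExpression); // swap in the user's custom pattern
        }
    }
});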

imported .owl files have #'s in prefixes vs original rdf4j triplestore

When I import the dump "PathwayCommons12.All.BIOPAX.owl.gz" (linked from this page) of this Virtuoso triplestore, I've noticed that there are "#"s inserted after the prefix of various URIs.
In particular, the following query runs on the original endpoint:
# Query 1
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX pfx: <http://pathwaycommons.org/pc12/>
select ?pw
where {
  ?pw a bp:Pathway
  values ?pw {pfx:Pathway_c2fd3d95c8c65552a0514393ede60c37}
}
But to get it running on the local endpoint (imported owl dump) I have to add a "#" to the end of pfx: like:
# Query 2
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX pfx: <http://pathwaycommons.org/pc12/#>
select ?pw
where {
  ?pw a bp:Pathway
  values ?pw {pfx:Pathway_c2fd3d95c8c65552a0514393ede60c37}
}
Note that Query 1 works only on the original endpoint, while Query 2 works only on the local endpoint.
What is going on here?
If we look at the first few lines of that massive RDF/XML file, we see:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#"
    xml:base="http://pathwaycommons.org/pc12/">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://www.biopax.org/release/biopax-level3.owl#" />
  </owl:Ontology>
  <bp:ExperimentalForm rdf:ID="ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0">
    <bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">[ExperimentalFormVocabulary_bait]</bp:comment>
    <bp:experimentalFormDescription rdf:resource="#ExperimentalFormVocabulary_701737e5cf53d06134cbd3ee59611827" />
  </bp:ExperimentalForm>
Note the value of the rdf:ID attribute here: "ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0". This is a relative URI, and needs to be resolved against the base URI (which is declared in the document header: "http://pathwaycommons.org/pc12/"). How this resolution is supposed to happen is described in section 2.14 of the RDF/XML syntax specification:
The rdf:ID attribute on a node element (not property element, that has another meaning) can be used instead of rdf:about and gives a relative IRI equivalent to # concatenated with the rdf:ID attribute value. So for example if rdf:ID="name", that would be equivalent to rdf:about="#name".
(emphasis mine)
Example 16 in the specification illustrates this further.
What it comes down to is that in parsing this RDF/XML, the values supplied as rdf:ID attributes all resolve to http://pathwaycommons.org/pc12/#<ID>. So the result you're getting in GraphDB is correct for the given input. Why it is different in the Virtuoso endpoint I don't know: either they used a different input file, or they have a bug in their parser, or whatever tool was used to produce this dump file contains a bug.
It is probably safe to say that the intent of whoever created the dump file was that
rdf:ID="ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0" would resolve to the IRI http://pathwaycommons.org/pc12/ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0 (that is, without the added # character). There are several ways to fix this in the file: either replace all occurrences of rdf:ID with rdf:about, or else don't rely on relative URI resolution at all and use the full URI as the rdf:about value.

Path queries in Wikidata endpoint?

Consider the following snippet
ASK WHERE { wd:Q734774 wdt:P31 wd:Q3918. }
This works fine in Wikidata. I want to use some of the property path syntax in this snippet. Specifically, I want to limit the number of times wdt:P31 is used in the path. According to the guidelines, this should be the right syntax:
ASK WHERE { wd:Q734774 wdt:P31{,3} wd:Q3918. }
But it's giving me weird error messages. Any ideas?
The {,3} counting form appeared in drafts of SPARQL 1.1 Property Paths but was dropped from the final specification, which is why the endpoint rejects it. The final version lets you express the same constraint with the following query --
ASK WHERE
  { wd:Q734774
      wdt:P31? / wdt:P31? / wdt:P31?
      wd:Q3918
  }
For clarity, I've put the full Property Path Predicate (wdt:P31? / wdt:P31? / wdt:P31?) on a separate line between Subject (wd:Q734774) and Object (wd:Q3918). The trailing ? asks for one-or-zero instances of the wdt:P31 predicate, and the / asks for a sequence, so this full path asks for a sequence of zero-or-one-or-two-or-three instances.

SPARUL query to drop most graphs, using Jena API

I am trying to clear most of the graphs contained in my local Virtuoso triple store, using Apache Jena, as part of my clean-up process before and after my unit tests. I think something like this should work: first I retrieve the URIs of the graphs to be deleted; then I execute a SPARUL DROP operation for each.
String sparqlEndpointUsername = ...;
String sparqlEndpointPassword = ...;
String sparqlQueryString = ...; // Returns the URIs of the graphs to be deleted

HttpAuthenticator authenticator = new SimpleAuthenticator(sparqlEndpointUsername,
        sparqlEndpointPassword.toCharArray());
ResultSet resultSetToReturn = null;
try (QueryEngineHTTP queryEngine = new QueryEngineHTTP(sparqlEndpoint, sparqlQueryString, authenticator)) {
    resultSetToReturn = queryEngine.execSelect();
    resultSetToReturn = ResultSetFactory.copyResults(resultSetToReturn);
    while (resultSetToReturn.hasNext()) {
        String graphURI = resultSetToReturn.next().getResource("?g").getURI();
        UpdateRequest request = UpdateFactory.create();
        request.add("DROP GRAPH <" + graphURI + ">");
        Dataset dataset = ...; // how can I create a default dataset pointing to my local virtuoso installation?
        // And perform the operations.
        UpdateAction.execute(request, dataset);
    }
}
Questions:
As shown in this example, ARQ needs a dataset to operate on. How would I create this dataset pointing to my local Virtuoso installation for an update operation?
Is there perhaps an alternative to my approach? Would another approach (apart from Jena) be a better idea?
Please note that I am not trying to delete all graphs. I am deleting only the graphs whose names are returned through the SPARQL query defined in the beginning (3rd line).
Your question appears to be specific to Virtuoso, and meant to remove all RDF data, so you could use Virtuoso's built-in RDF_GLOBAL_RESET() function.
This is not a SPARQL/SPARUL query; it is usually issued through an SQL connection -- which could be JDBC, ODBC, ADO.NET, OLE DB, iSQL, etc.
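For instance, a minimal sketch over JDBC (the connection URL, port, and credentials are assumptions; adjust them to your installation):
// Issue RDF_GLOBAL_RESET() over a plain SQL connection to Virtuoso
// (assumed defaults: localhost:1111, dba/dba).
Class.forName("virtuoso.jdbc4.Driver");
try (Connection conn = DriverManager.getConnection("jdbc:virtuoso://localhost:1111", "dba", "dba");
     Statement stmt = conn.createStatement()) {
    stmt.execute("RDF_GLOBAL_RESET()");
}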
That said, as you are connecting through a SPARUL-privileged connection, you should be able to use Virtuoso's (limited) SQL-in-SPARQL support, a la --
SELECT
  ( bif:RDF_GLOBAL_RESET() AS ?reset )
WHERE
  { ?s ?p ?o }
LIMIT 1
(Executing this through an unprivileged connection like the default SPARQL endpoint will result in an error like Virtuoso 37000 Error SP031: SPARQL compiler: Function bif:RDF_GLOBAL_RESET() can not be used in text of SPARQL query due to security restrictions.)
(ObDisclaimer: OpenLink Software produces Virtuoso, and employs me.)
You can build a single SPARQL Update request:
DROP GRAPH <g1> ;
DROP GRAPH <g2> ;
DROP GRAPH <g3> ;
... ;
because in SPARQL Update, one HTTP request can carry several update operations, separated by ;.
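A sketch of that with the Jena API (the endpoint URL is an assumption; UpdateExecutionFactory.createRemote sends the whole request in one HTTP call):
// Collect all DROP operations into a single UpdateRequest, then send it
// to the (assumed) update endpoint in one HTTP request.
UpdateRequest request = UpdateFactory.create();
while (resultSetToReturn.hasNext()) {
    String graphURI = resultSetToReturn.next().getResource("?g").getURI();
    request.add("DROP GRAPH <" + graphURI + ">");
}
UpdateExecutionFactory.createRemote(request, "http://localhost:8890/sparql").execute();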

Use same URL for both query and update

I know that by default Fuseki provides different URLs for query and update, allowing some elegant management.
Now I want a single URL for both update and query. The rationale behind this need is to avoid propagating two URLs through the codebase.
I know that update and query code should be separated, but my requests are not mixed; it's just to avoid propagating two objects instead of one.
My current config looks like:
<#service1> rdf:type fuseki:Service ;
    fuseki:name "dataset" ;            # http://host:port/dataset
    fuseki:serviceQuery "endpoint" ;   # SPARQL query service
    fuseki:serviceUpdate "endpoint" ;  # SPARQL update service
    fuseki:dataset <#dataset> ;
    .
In theory, an interface exists at /endpoint, but it only accepts updates. When I query it with:
prefix sfm: <sfm/>
SELECT DISTINCT ?value
WHERE {
  sfm:config sfm:component ?value.
}
the server reports many lines like the following:
INFO [4] POST http://localhost:9876/sfm/endpoint
INFO [4] POST /sfm :: 'endpoint' :: [application/x-www-form-urlencoded] ?
INFO [4] 400 SPARQL Update: No 'update=' parameter (0 ms)
I can't find anything in the docs specifying that the query and update services can't live at the same location, so I assume it's possible and I've just missed something.
However, the last line of the log is explicit: Fuseki expects an update.
One other solution could be to define the URL as localhost/dataset/ and, depending on whether I query or update, append the relevant part at the end, giving respectively localhost/dataset/query and localhost/dataset/update.
But (1) this requires the database to follow a particular URL naming scheme, and (2) it imposes a strong requirement on the triplestore: when I use another one, it will have to provide the same interface, which may not be possible (I don't know whether other triplestores implement this feature).
EDIT: fix the POST/GET error
405 HTTP method not allowed: SPARQL Update : use POST
It looks like you are using GET for a SPARQL Update.
It has correctly routed the operation to the update processor (you can use the same endpoint - including dropping the service part and just using the dataset URL).
However, in HTTP, GET is a cacheable operation and should not be used when it can cause changes; a GET may not even reach the end server, as some intermediate may answer it from a web cache.
Use POST.
The same is true if you separate services for query and update.
Original Context
The original question has been edited. The original report was asking about this:
INFO [1] 405 HTTP method not allowed: SPARQL Update : use POST (2 ms)
Answer to the revised and different question:
The endpoint for shared services is the dataset URL:
http://localhost:9876/sfm
Whether update, query, or other services are available is controlled by the configuration file.
Setting fuseki:serviceQuery and fuseki:serviceUpdate the same is not necessary and is discouraged.
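A configuration along these lines (a sketch using the default service names; adjust the dataset name as needed) keeps the services distinct while clients use the single dataset URL http://localhost:9876/sfm for both operations:
<#service1> rdf:type fuseki:Service ;
    fuseki:name "sfm" ;              # http://host:port/sfm handles query and update
    fuseki:serviceQuery "query" ;    # also at http://host:port/sfm/query
    fuseki:serviceUpdate "update" ;  # also at http://host:port/sfm/update
    fuseki:dataset <#dataset> ;
    .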