imported .owl files have #'s in prefixes vs original rdf4j triplestore - sparql

When I import the dump "PathwayCommons12.All.BIOPAX.owl.gz" (linked from this page) of this Virtuoso triplestore, I notice that a "#" has been inserted after the prefix of various URIs.
In particular, the following query runs on the original endpoint:
# Query 1
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX pfx: <http://pathwaycommons.org/pc12/>
SELECT ?pw
WHERE {
  ?pw a bp:Pathway .
  VALUES ?pw { pfx:Pathway_c2fd3d95c8c65552a0514393ede60c37 }
}
But to get it running on the local endpoint (the imported OWL dump), I have to add a "#" to the end of pfx:, like:
# Query 2
PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
PREFIX pfx: <http://pathwaycommons.org/pc12/#>
SELECT ?pw
WHERE {
  ?pw a bp:Pathway .
  VALUES ?pw { pfx:Pathway_c2fd3d95c8c65552a0514393ede60c37 }
}
Note that Query 1 works only on the original endpoint, while Query 2 works only on the local endpoint.
What is going on here?

If we look at the first few lines of that massive RDF/XML file, we see:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#"
    xml:base="http://pathwaycommons.org/pc12/">
  <owl:Ontology rdf:about="">
    <owl:imports rdf:resource="http://www.biopax.org/release/biopax-level3.owl#" />
  </owl:Ontology>
  <bp:ExperimentalForm rdf:ID="ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0">
    <bp:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">[ExperimentalFormVocabulary_bait]</bp:comment>
    <bp:experimentalFormDescription rdf:resource="#ExperimentalFormVocabulary_701737e5cf53d06134cbd3ee59611827" />
  </bp:ExperimentalForm>
Note the value of the rdf:ID attribute here: "ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0". This is a relative URI, and needs to be resolved against the base URI (which is declared in the document header: "http://pathwaycommons.org/pc12/"). How this resolution is supposed to happen is described in section 2.14 of the RDF/XML syntax specification:
The rdf:ID attribute on a node element (not property element, that has another meaning) can be used instead of rdf:about and gives a relative IRI equivalent to # concatenated with the rdf:ID attribute value. So for example if rdf:ID="name", that would be equivalent to rdf:about="#name".
(emphasis mine)
Example 16 in the specification illustrates this further.
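In miniature (a made-up example in the spirit of the spec's Example 16; the names here are hypothetical):

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#"
         xml:base="http://example.org/docs/doc1">
  <!-- rdf:ID="frag" is equivalent to rdf:about="#frag", which resolves
       against xml:base to http://example.org/docs/doc1#frag -->
  <ex:Thing rdf:ID="frag"/>
</rdf:RDF>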
What it comes down to is that in parsing this RDF/XML, the values supplied as rdf:ID attributes all resolve to http://pathwaycommons.org/pc12/#<ID>. So the result you're getting in GraphDB is correct for the given input. Why it is different in the Virtuoso endpoint I don't know: either they used a different input file, or they have a bug in their parser, or whatever tool was used to produce this dump file contains a bug.
It is probably safe to say that the intent of whoever created the dump file was that
rdf:ID="ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0" would resolve to the IRI http://pathwaycommons.org/pc12/ExperimentalForm_ee10aeab-1129-49ad-8217-4193f4fbf7e0 (that is, without the added # character). There are several ways to fix this in the file: either replace all occurrences of rdf:ID with rdf:about (a relative reference in rdf:about resolves against the base without a # being prepended), or else don't rely on relative URI resolution at all and spell out the full URI in an rdf:about attribute (rdf:ID itself only permits a bare name, not a full URI).

Related

Is there any way to select some of the values for SUMO to write to the output file

I would like to generate the FullOutput file in SUMO. From https://sumo.dlr.de/docs/Simulation/Output/FullOutput.html we can see that the FullOutput file looks like this:
<full-export>
  <data timestep="<TIME_STEP>">
    <vehicles>
      <vehicle id="<VEHICLE_ID>" eclass="<VEHICLE_ECLASS>" co2="<VEHICLE_CO2>" co="<VEHICLE_CO>" hc="<VEHICLE_HC>"
               nox="<VEHICLE_NOX>" pmx="<VEHICLE_PMX>" fuel="<VEHICLE_FUEL>" electricity="<VEHICLE_ELECTRICITY>" noise="<VEHICLE_NOISE>" route="<VEHICLE_ROUTE>" type="<VEHICLE_TYPE>"
               waiting="<VEHICLE_WAITING>" lane="<VEHICLE_LANE>" pos_lane="<VEHICLE_POS_LANE>" speed="<VEHICLE_SPEED>"
               angle="<VEHICLE_ANGLE>" x="<VEHICLE_POS_X>" y="<VEHICLE_POS_Y>"/>
      ... more vehicles ...
    </vehicles>
    <edges>
      <edge id="<EDGE_ID>" traveltime="<EDGE_TRAVELTIME>">
        <lane id="<LANE_ID>" co="<LANE_CO>" co2="<LANE_CO2>" nox="<LANE_NOX>" pmx="<LANE_PMX>"
              hc="<LANE_HC>" noise="<LANE_NOISE>" fuel="<LANE_FUEL>" electricity="<LANE_ELECTRICITY>" maxspeed="<LANE_MAXSPEED>" meanspeed="<LANE_MEANSPEED>"
              occupancy="<LANE_OCCUPANCY>" vehicle_count="<LANE_VEHICLES_COUNT>"/>
        ... more lanes of the edge if they exist
      </edge>
      ... more edges of the network
    </edges>
    <tls>
      <trafficlight id="0/0" state="GgGr"/>
      ... more traffic lights
    </tls>
  </data>
  ... the next timestep ...
</full-export>
The resulting .xml file is too big, usually more than 1 GB, and it contains a lot of values, such as
eclass="<VEHICLE_ECLASS>" co2="<VEHICLE_CO2>" co="<VEHICLE_CO>" hc="<VEHICLE_HC>"
nox="<VEHICLE_NOX>" pmx="<VEHICLE_PMX>" fuel="<VEHICLE_FUEL>"
which I don't need.
So I wonder: is there any way to select only the values I need in the output?
You have different options here:
- Use a different output. Maybe fcd-output is already enough. It contains all the vehicle positions, and you can usually aggregate it to edges yourself if you want to. Furthermore, fcd-output can be given a list of attributes to write using --fcd-output.attributes (full-output does not have this feature).
- Filter the output directly. Instead of giving an output file you can give a socket connection and SUMO will direct the output there. See https://github.com/eclipse/sumo/blob/main/tests/complex/sumo/socketout/runner.py for an example.
- If you are on Linux, use a named pipe (see "Example of using named pipes in Linux shell (Bash)") and do the filtering yourself; see the sketch after this list.
- Redirect the output to the xml2csv.py script, which at least removes the XML overhead; depending on your setup it may also be easier to drop columns from a CSV file.
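For the named-pipe or socket routes, here is a minimal filtering sketch in Python (assumptions: the attribute names follow the FullOutput template quoted above, the XML arrives on stdin, and the selection in KEEP is just an example):

#!/usr/bin/env python3
# Stream SUMO full-output XML and keep only selected vehicle attributes.
# iterparse processes elements as they arrive, so the multi-gigabyte
# document is never held in memory at once.
import csv
import sys
import xml.etree.ElementTree as ET

KEEP = ("id", "speed", "x", "y")  # hypothetical selection; adjust as needed

writer = csv.writer(sys.stdout)
writer.writerow(("timestep",) + KEEP)

timestep = None
for event, elem in ET.iterparse(sys.stdin.buffer, events=("start", "end")):
    if event == "start" and elem.tag == "data":
        timestep = elem.get("timestep")
    elif event == "end" and elem.tag == "vehicle":
        writer.writerow((timestep,) + tuple(elem.get(a) for a in KEEP))
        elem.clear()  # free each vehicle element once it is written
    elif event == "end" and elem.tag == "data":
        elem.clear()  # drop the finished timestep subtree

Used with a named pipe this would look something like: mkfifo full.xml, start sumo with --full-output full.xml, then run python3 filter_full.py < full.xml > slim.csv.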

Apache Jena - Is it possible to write to output the BASE directive?

I just started using Apache Jena; their introduction explains how to write out the created model. As input I'm using a Turtle syntax file containing some data about some OWL ontologies, and I'm using the @base directive so that I can use relative URIs in the syntax:
@base <https://valbuena.com/ontology-test/> .
And then writing my data as:
<sensor/AD590/1> a sosa:Sensor ;
    rdfs:label "AD590 #1 temperature sensor"@en ;
    sosa:observes <room/1#Temperature> ;
    ssn:implements <MeasureRoomTempProcedure> .
Apache Jena is able to read that @base directive and expands the relative URIs to their full form, but when I write the model out, Jena writes neither the @base directive nor the relative URIs. The output is shown as:
<https://valbuena.com/ontology-test/sensor/AD590/1> a sosa:Sensor ;
    rdfs:label "AD590 #1 temperature sensor"@en ;
    sosa:observes <https://valbuena.com/ontology-test/room/1#Temperature> ;
    ssn:implements <https://valbuena.com/ontology-test/MeasureRoomTempProcedure> .
My code is the following:
Model m = ModelFactory.createOntologyModel();
String base = "https://valbuena.com/ontology-test/";
InputStream in = FileManager.get().open("src/main/files/example.ttl");
if (in == null) {
    System.out.println("file error");
    return;
} else {
    m.read(in, null, "TURTLE");
}
m.write(System.out, "TURTLE");
There are multiple read and write commands that take the base as a parameter:
On read(), I've found that if @base isn't declared in the data file, it must be given in the read command; otherwise it can be set to null.
On write(), the base parameter is optional; it doesn't matter whether I specify the base (even as null or a URI) or not, the output is always the same: the @base doesn't appear and all relative URIs are written as full URIs.
I'm not sure if this is a bug or it's just not possible.
First, consider using a prefix like ":" -- this is not the same as a base, but it makes the output nice as well.
You can configure the base with (current version of Jena):
RDFWriter.create()
    .source(model)
    .lang(Lang.TTL)
    .base("http://base/")
    .output(System.out);
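With .base(...) set to the ontology's base URI, the writer can relativize the URIs, so the Turtle output should start with a base declaration roughly like the following (a sketch; whether it is spelled BASE or @base depends on the Jena version):

BASE <https://valbuena.com/ontology-test/>

<sensor/AD590/1> a sosa:Sensor ;
    rdfs:label "AD590 #1 temperature sensor"@en ;
    sosa:observes <room/1#Temperature> ;
    ssn:implements <MeasureRoomTempProcedure> .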
It seems that the introduction tutorial of the Jena RDF API is not up to date: it shows the reading method I used above (FileManager), which has since been replaced by RDFDataMgr. The FileManager way doesn't work well with the base directive.
After experimenting I've found that the base directive works well with:
Model model = ModelFactory.createDefaultModel();
RDFDataMgr.read(model,"src/main/files/example.ttl");
model.write(System.out, "TURTLE", base);
or
Model model = ModelFactory.createDefaultModel();
model.read("src/main/files/example.ttl");
model.write(System.out, "TURTLE", base);
Although model.write() is described as legacy in the RDF output documentation (whereas model.read() is still presented as common in the RDF input documentation; I don't understand why), it is the only method I have found that accepts the base parameter (required to put the @base directive back into the output); the RDFDataMgr write methods don't include it.
Thanks to @AndyS for providing a simpler way to read the data, which led to fixing the problem.
@AndyS's answer allowed me to write relative URIs to the file, but did not include the base in use for the RDF/XML variations. To get the xml:base directive added correctly, I had to use the following:
RDFDataMgr.read(graph, is, Lang.RDFXML);
Map<String, Object> properties = new HashMap<>();
properties.put("xmlbase", "http://example#");
Context cxt = new Context();
cxt.set(SysRIOT.sysRdfWriterProperties, properties);
RDFWriter.create()
    .source(graph)
    .format(RDFFormat.RDFXML_PLAIN)
    .base("http://example#")
    .context(cxt)
    .output(os);

How to configure Virtuoso URL rewrite rule to give SPARQL results in JSON?

JSON is available as a result option in the SPARQL endpoint interface, but it is missing when configuring a rewrite rule. Currently, the SPARQL results options in Virtuoso 07.20.3217 are only "Automatic", "RDF/XML", and "Turtle".
How to configure the rule to give the results in JSON?
As answered in reply to your email to OpenLink Support --
This was an oversight in the Conductor interface, as the SPARQL query results can be returned in any of the serialization formats available from the /sparql Query Form page, which includes JSON. We have logged an internal enhancement request to have these additional formats added to the Conductor URL Rewrite Rule UI.
In the meantime, you can export an existing rule through the link in the Conductor UI, to see the SQL that is used to create the selected rule. For the default RDF/XML output format, it is presented as format=application%2Frdf%2Bxml.
Working from the list of supported output formats, you should be able to change the format=application%2Frdf%2Bxml in the exported rule to something like format=application%2Frdf%2Bjson to get your desired JSON output. You can then manually load the edited rule via isql, which will look something like:
DB.DBA.VHOST_REMOVE (
    lhost=>'*ini*',
    vhost=>'*ini*',
    lpath=>'/rewrite-json'
);

DB.DBA.VHOST_DEFINE (
    lhost=>'*ini*',
    vhost=>'*ini*',
    lpath=>'/rewrite-json',
    ppath=>'/',
    is_dav=>0,
    is_brws=>0,
    def_page=>'',
    vsp_user=>'dba',
    ses_vars=>0,
    opts=>vector ('browse_sheet', '', 'url_rewrite', 'http_rule_list_1'),
    is_default_host=>0
);

DB.DBA.URLREWRITE_CREATE_RULELIST (
    'http_rule_list_1', 1,
    vector ('http_rule_1')
);

DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
    'http_rule_1', 1,
    '/rewrite-json',
    vector (),
    0,
    '/sparql?query=select%%20%%2A%%20where%%20%%7B%%3Fs%%20%%3Fp%%20%%3Fo%%7D%%20limit%%205&format=application%2Frdf%2Bjson',
    vector (),
    NULL,
    NULL,
    2,
    301,
    ''
);
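Once the rule is loaded, a quick way to confirm the format is to request the rewrite path and let the client follow the 301 redirect the rule issues. A minimal sketch in Python, assuming a default Virtuoso listening on localhost:8890 (hypothetical host and port):

import urllib.request

# urllib follows the 301 to /sparql?...&format=... automatically
with urllib.request.urlopen("http://localhost:8890/rewrite-json") as resp:
    print(resp.headers.get("Content-Type"))  # expect a JSON media type
    print(resp.read()[:200])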
Note: you will not be able to edit this rewrite rule in the Conductor until this issue is fixed therein, as the JSON output format will be overwritten with one of those in the current list.
Also see this article about working with Virtuoso URL Rewrite Rules.

Use same URL for both query and update

I know that by default Fuseki provides different URLs for query and update, allowing some elegant management.
Now, I want a single URL for both update and query. The rationale behind this need is to avoid the propagation of two URLs in the codebase.
I know that query and update code should be separated, but my requests are not mixed. It's just to avoid propagating two objects instead of one.
My current config looks like:
<#service1> rdf:type fuseki:Service ;
    fuseki:name "dataset" ;            # http://host:port/dataset
    fuseki:serviceQuery "endpoint" ;   # SPARQL query service
    fuseki:serviceUpdate "endpoint" ;  # SPARQL update service
    fuseki:dataset <#dataset> ;
    .
In theory, an interface now exists at /endpoint, but it only accepts updates. When I query with:
PREFIX sfm: <sfm/>
SELECT DISTINCT ?value
WHERE {
  sfm:config sfm:component ?value .
}
the server reports many lines like the following:
INFO [4] POST http://localhost:9876/sfm/endpoint
INFO [4] POST /sfm :: 'endpoint' :: [application/x-www-form-urlencoded] ?
INFO [4] 400 SPARQL Update: No 'update=' parameter (0 ms)
I can't find anything in the docs specifying that the query and update services can't be at the same place, so I assume it's possible and I've just missed something.
However, the last line of the log is explicit: Fuseki expects an update.
One other solution could be to define the URL as localhost/dataset/ and, depending on whether I query or update, add the relevant part at the end, giving respectively localhost/dataset/query and localhost/dataset/update.
But (1) this forces the database to follow a particular URL naming scheme, and (2) it looks like a strong requirement on the triplestore: when I use another one, it will have to provide the same interface, which may not be possible (I don't know whether this feature is implemented in other triplestores).
EDIT: fix the POST/GET error
405 HTTP method not allowed: SPARQL Update : use POST
It looks like you are using GET for a SPARQL Update.
The operation has been correctly routed to the update processor (you can use the same endpoint, including dropping the service part and just using the dataset URL).
However, in HTTP, GET is a cacheable operation and should not be used for requests that cause changes: a GET may never reach the end server, because some intermediate may answer it from a web cache.
Use POST.
The same is true if you use separate services for query and update.
Original Context
The original question has been edited. The original report was asking about this:
INFO [1] 405 HTTP method not allowed: SPARQL Update : use POST (2 ms)
Answer to the revised and different question:
The endpoint for shared services is the dataset URL:
http://localhost:9876/sfm
Whether update, query or services are available is controlled by the configuration file.
Setting fuseki:serviceQuery and fuseki:serviceUpdate the same is not necessary and is discouraged.
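To illustrate, a minimal sketch of both operations going to the one dataset URL, assuming a local Fuseki as in the logs above (host, port, and dataset name taken from there):

# Both operations are POSTed to the dataset URL; Fuseki dispatches on
# whether the form carries a 'query=' or an 'update=' parameter.
import urllib.parse
import urllib.request

DATASET = "http://localhost:9876/sfm"

def post(form, accept="*/*"):
    req = urllib.request.Request(
        DATASET,
        data=urllib.parse.urlencode(form).encode(),
        headers={"Accept": accept},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

post({"update": "INSERT DATA { <urn:ex:s> <urn:ex:p> <urn:ex:o> }"})
print(post({"query": "SELECT * WHERE { ?s ?p ?o } LIMIT 5"},
           accept="application/sparql-results+json").decode())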

How to query for namespace dynamically using Linq to XML

I have 290 Group Policy Backup xml files which I need to enumerate in separate folders.
With each Group Policy backup xml file, I need to query the Policy settings.
Anyone who's looked at a Group Policy xml backup file before would know they're chock-a-block full of Namespace declarations.
I want to know, using LINQ to XML, how I can dynamically discover each file's namespace(s) as I query it, and then use that namespace in the LINQ query for the child nodes/values.
Here are some examples of the xml structure.
<User>
  <ExtensionData>
    <Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/Scripts" xsi:type="q1:Scripts">
      <q1:Script>

<ExtensionData>
  <Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/IE" xsi:type="q1:InternetExplorerSettings">
    <q1:PreferenceMode>true</q1:PreferenceMode>

<ExtensionData>
  <Extension xmlns:q2="http://www.microsoft.com/GroupPolicy/Settings/Registry" xsi:type="q2:RegistrySettings">
    <q2:Policy>
      <q2:Name>Disable changing accessibility settings</q2:Name>
      <q2:State>Enabled</q2:State>

<ExtensionData>
  <Extension xmlns:q1="http://www.microsoft.com/GroupPolicy/Settings/DriveMaps" xsi:type="q1:DriveMapSettings">
    <q1:DriveMapSettings clsid="{8FDDCC1A-0C3C-43cd-A6B4-71A6DF20DA8C}">
My initial code looks like this:
Dim NS As XNamespace = "http://www.microsoft.com/GroupPolicy/Settings"
NodeValue = XDoc.Descendants(NS + NodeName).First().Value
As you can see, I'm going to face literally dozens of different namespaces; at this stage I don't even know what they all are.
My end-task is to trawl through 290 directories, each containing one Group Policy xml backup file. I then need to read the Policy Name from each of the settings contained within the backup file.
Because I don't know what Policy settings each xml file will contain, I don't know what namespace(s) I need to use when attempting to read the xml values. Each xml file may even contain multiple namespaces.
How do I dynamically read the Namespace in Linq so I can read the values?
Thanks
Do you care about these namespaces? I.e., do you distinguish them and do different kinds of processing depending on the namespace URI? If you don't (e.g. you just display something), you can do something like this:
XDoc.Descendants().Where(e => e.Name.LocalName == "Extension")
This should select all Extension elements regardless of the namespace. Alternatively, if you need to query elements using namespaces instead of hardcoding one you could do something like this:
foreach (XElement extensionElement in XDoc.Descendants().Where(e => e.Name.LocalName == "Extension"))
{
    XNamespace ns = extensionElement.Name.Namespace;
    Console.WriteLine(extensionElement.Element(ns + "DriveMapSettings"));
}
From e In XDoc.Descendants Group By e.Name.Namespace.NamespaceName should find all distinct namespaces, but it is untested.
Here's what I ended up with:
' Collect every namespace declaration in the document into a dictionary,
' keyed by prefix (the default namespace gets an empty-string key):
Dim ListOfNamespaces = z.Root.DescendantsAndSelf.Attributes().
    Where(Function(a) a.IsNamespaceDeclaration).
    GroupBy(Function(a) If(a.Name.[Namespace] = XNamespace.None, [String].Empty, a.Name.LocalName),
            Function(a) XNamespace.[Get](a.Value)).
    ToDictionary(Function(g) g.Key, Function(g) g.First())