Map blank nodes from Stardog to Pubby - SPARQL

So I have an .rdf file that I have loaded into Stardog, and I am using Pubby, running over Jetty, to browse the triple store.
In my RDF file I have several blank nodes, each of which is given a blank node identifier by Stardog. This is a snippet of the RDF file:
<kbp:ORG rdf:about="http://somehostname/resource/res1">
<kbp:canonical_mention>
<rdf:Description>
<kbp:mentionStart rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1234</kbp:mentionStart>
<kbp:mentionEnd rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1239</kbp:mentionEnd>
</rdf:Description>
</kbp:canonical_mention>
</kbp:ORG>
So basically I have some resource "res1" that links to a blank node holding mention start and mention end offset values.
The snippet of the config.ttl file for Pubby is shown below.
conf:dataset [
# SPARQL endpoint URL of the dataset
conf:sparqlEndpoint <http://localhost:5822/xxx/query>;
#conf:sparqlEndpoint <http://localhost:5822/foaf/query>;
# Default graph name to query (not necessary for most endpoints)
conf:sparqlDefaultGraph <http://dbpedia.org>;
# Common URI prefix of all resource URIs in the SPARQL dataset
conf:datasetBase <http://somehostname/>;
...
...
So the key thing is the datasetBase, which maps URIs to URLs.
When I try to browse this, there is an "Anonymous node" link, but upon clicking it, nothing is displayed. My guess is that this is because the blank node has some identifier like _:bnode1234 which is not mapped by Pubby.
I wanted to know if anyone out there knows how to map these blank nodes.
(Note: if I load this RDF as a static RDF file directly into Pubby, it works fine. But when I use Stardog as the triple store, this mapping doesn't quite work.)

It probably works in Pubby because it keeps the bnode IDs available; in general, the SPARQL spec does not guarantee or require that bnode identifiers be persistent. That is, you can issue the same query multiple times and get back the same result set (including bnodes), yet the identifiers can be different each time. Similarly, a bnode identifier in a query is treated like a variable; it does not mean you are querying for that specific bnode.
Thus, Pubby is probably being helpful and making that work, which is why using it directly works, as opposed to going through a third-party database.
Stardog does support the Jena/ARQ trick of putting a bnode identifier in angle brackets, that is, <_:bnode1234>, which is taken to mean "the bnode with the identifier bnode1234". If you can get Pubby to use that syntax in queries for bnodes, it will probably work.
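For illustration, a minimal sketch of that syntax (the kbp: namespace URI is not shown in the question, so the prefix below is a guess, and _:bnode1234 stands in for an identifier taken from an earlier result set):
# Sketch only: <_:...> is a Stardog/Jena-ARQ extension, not portable SPARQL.
PREFIX kbp: <http://somehostname/kbp#>
SELECT ?start ?end
WHERE {
  # refer to the specific bnode by its internal identifier
  <_:bnode1234> kbp:mentionStart ?start ;
                kbp:mentionEnd ?end .
}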
But generally, I think this is something you will have to take up with the Pubby developers.

Related

GraphDB Lucene index - how to exclude property URIs from search results?

It seems that by default a Lucene index that indexes "uris" will index both nodes and properties. How can properties be excluded from search results?
The documentation shows a setting:
luc:exclude luc:setParam "bnode".
However, its only valid values are "literal", "bnode", and "uri". How can property URIs be excluded? (They are not something a search would be interested in.)
I assume that you're using https://graphdb.ontotext.com/documentation/standard/full-text-search.html and not https://graphdb.ontotext.com/documentation/standard/lucene-graphdb-connector.html ?
The doc doesn't show what you show above, but shows
luc:exclude luc:setParam "hello.*"
which means "exclude strings that match the regex".
Which things to index is controlled by
luc:include luc:setParam "literal" # literal, uri, centre
If I understand correctly, you want to index the URIs of nodes, but not the URIs of outgoing properties? Then the answer depends on the kind of molecule you are traversing.
luc:include luc:setParam "literal centre" will index literals and only the central node's URI, which is probably what you want.
With luc:excludePredicates you can list all the properties you want to exclude, but that will also cut out the nodes that they reach...
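For reference, a minimal sketch of applying this through SPARQL updates, following the FTS plugin documentation linked above (the index name myIndex is made up):
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
# index literals and the central node's URI, but not predicate URIs
INSERT DATA { luc:include luc:setParam "literal centre" . }
followed by a second update to (re)build the index with the parameters set above:
PREFIX luc: <http://www.ontotext.com/owlim/lucene#>
INSERT DATA { luc:myIndex luc:createIndex "true" . }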

Is there a way to do string replacement/substitution in SQL?

I have some records in a CMS that include HTML fragments with custom tags for a widget tool. The maker of the CMS has apparently updated their CMS without providing proper data conversion. Their widgets use keys for layout based on screen width, such as block_lg, block_md, block_sm. The problem kicks in with the fact that they used to have a block_xs and have now shifted them all -- dropping the block_xs and instead placing a block_xl on the other end.
We don't really use these things, but their widget configurations do. What this means for us is that the values for each key are identical. The problem occurs when the updated CMS code looks for the block_xl key in a widget definition tag: it can't find it and errors out.
What I'm thinking, then, is that the new code will appear to 'ignore' the block_xs due to how it reads the tags (and similarly, the old code will ignore block_xl). Since the values for each are identical, I need to basically read any widget definition and add a block_xl value to it matching the value of [any one of] the other width parameters.
Since the best place order-wise would be before the block_lg value, it's probably easiest to do it as follows:
Replace anything matching the POSIX-style regex /block_lg(="\d+,\d+,?")/ with: block_xl$1 block_lg$1
Or whatever the equivalent of that would be.
Example of an existing CMS block with multiple widget definitions:
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="The Album" classes="highlight-bottom modish greenfont font52 fontlight"
enable_fullwidth="0" block_ids="127" lazyload="0"
block_lg="127,12," block_md="127,12," block_sm="127,12," block_xs="127,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
<!-- Image Block -->
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="What’s Your Favorite Cover Style?"
classes="zoo-widget-style2 modish grey font26 fontlight"
enable_fullwidth="0" block_ids="126" lazyload="0"
block_lg="126,12," block_md="126,12," block_sm="126,12," block_xs="126,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
What I would prefer to end up with from the above (adding block_xl):
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="The Album" classes="highlight-bottom modish greenfont font52 fontlight"
enable_fullwidth="0" block_ids="127" lazyload="0"
block_xl="127,12," block_lg="127,12," block_md="127,12," block_sm="127,12," block_xs="127,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
<!-- Image Block -->
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="What’s Your Favorite Cover Style?"
classes="zoo-widget-style2 modish grey font26 fontlight"
enable_fullwidth="0" block_ids="126" lazyload="0"
block_xl="126,12," block_lg="126,12," block_md="126,12," block_sm="126,12," block_xs="126,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
I know how to do it in PHP and, if necessary, I will just replace it on my local DB and write an SQL script to update the modified records, but the HTML blocks can be fairly big in some cases. It would be preferable, if possible, to make the substitutions right in the SQL, but I'm not sure how to do it or whether it's even possible.
And yes, there can be more than one instance of a widget in any given CMS page or block (i.e. there may be a need for more than one such substitution, with different local 'values' assigned to the block_lg).
If anyone can help me do it in SQL, it would be greatly appreciated.
For reference, the affected tables are called cms_page and cms_block; in both cases the column is named content.
SW
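A sketch of one way this could be done directly in SQL, assuming a MySQL 8.0+ backend (REGEXP_REPLACE does not exist in older MySQL, and MariaDB spells backreferences \1 rather than $1); back up both tables first and repeat the statement for cms_block:
-- Copy each block_lg value into a new block_xl attribute placed before it.
-- MySQL 8 replaces every occurrence by default, so rows containing several
-- widget definitions are handled in one pass.
UPDATE cms_page
SET content = REGEXP_REPLACE(
        content,
        'block_lg="([0-9]+,[0-9]+,?)"',
        'block_xl="$1" block_lg="$1"')
WHERE content LIKE '%block_lg=%'
  AND content NOT LIKE '%block_xl=%'; -- keeps the update idempotent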

Creating Configuration File for DDS Recording Service

I'm a beginner looking for some clarity on two aspects of creating configuration files for the DDS Recording Service.
First, if you are looking to record a set of specific topics from a domain, how do you set up the topic group? Can you list the topics as individual <topic_expr> tags, i.e.
<topic_group name="SomeTopics">
<topics>
<topic_expr>topic2</topic_expr>
<topic_expr>topic8</topic_expr>
</topics>
<field_expr>*</field_expr>
</topic_group>
When I tried something like this, not all of the listed topics were recorded. Is there something I am overlooking?
Secondly, when you use -deserialize, do you need to make any changes to the configuration file you used to record the database? I sometimes get errors about how "rti dds failed to find" followed by something like X::Y::Z. Thanks.
The XSD schema for the configuration file does not expect you to use multiple <topic_expr> tags, but a single tag with a comma-separated list of Topic names. The RTI Recording Service User's Manual explains it as follows:
<topic_expr>POSIX fn expression</topic_expr>
Required.
A comma-separated list of POSIX expressions that specify the names of Topics to be included in the TopicGroup.
The syntax and semantics are the same as for Partition matching.
Default: Null
Note: Keep in mind that spaces are valid first characters in topic names, thus they can affect the matching process. For example, this will match both Triangle and Square topics (notice there is no space before Square):
<topic_expr>Triangle,Square</topic_expr>
However the following will only match Triangle topics (because there is a space before Square):
<topic_expr>Triangle, Square</topic_expr>
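Applied to the snippet from the question, the topic group would therefore look like this (same topic names, merged into a single expression):
<topic_group name="SomeTopics">
  <topics>
    <!-- one tag, comma-separated, and no space after the comma -->
    <topic_expr>topic2,topic8</topic_expr>
  </topics>
  <field_expr>*</field_expr>
</topic_group>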
With regard to the -deserialize option, this is not applicable to the Recording Service but to the Converter tool (rtirecconv). If you want to record deserialized, you will have to indicate that in the Recording Service configuration, via the tag <deserialize_mode>. Again, see the User's Manual for details.

Querying for shared nodes in JCR (ModeShape)

I have a JCR content repository implemented in ModeShape (4.0.0.Final). The structure of the repository is quite simple and looks like this:
/ (root)
Content/
Item 1
Item 2
Item 3
...
Tags/
Foo/
Bar/
.../
The content is initially created and stored under /Content as [nt:unstructured] nodes with [mix:shareable] mixin. When a content item is tagged, the tag node is first created under /Tags if it's not already there, and the content node is shared/cloned to the tag node using Workspace.clone(...) as described in the JCR 2.0 spec, section 14.1, Creation of Shared Nodes.
(I don't find this particularly elegant, and I just read this answer about creating a tag-based search system in JCR, so I realize this might not be the best/fastest/most scalable solution. But I "inherited" this solution from the developers before me, so I hope I don't have to rewrite it all...)
Anyway, the sharing itself seems to work (I can verify that the nodes are there using the ModeShape Content Explorer web app, or programmatically via session.getRootNode().getNode("Tags/Foo").getNodes()). But I am not able to find any shared nodes using a query!
My initial try (using JCR_SQL2 syntax) was:
SELECT * FROM [nt:unstructured] AS content
WHERE PATH(content) LIKE '/Tags/Foo/%' // ISDESCENDANTNODE(content, '/Tags/Foo') gives the same result
ORDER BY NAME(content)
To my surprise, the result set was empty.
I also tried searching in [mix:shareable] like this:
SELECT * FROM [mix:shareable] AS content
WHERE PATH(content) LIKE '/Tags/Foo/%' // ISDESCENDANTNODE(content, '/Tags/Foo') gives the same result
ORDER BY NAME(content)
This also returned an empty result set.
I can see from the query:
SELECT * FROM [nt:unstructured] AS content
WHERE PATH(content) LIKE '/Content/%' // ISDESCENDANTNODE(content, '/Content') works just as well
ORDER BY NAME(content)
...that the query otherwise works, and returns the expected result (all content). It just doesn't work when searching for the shared nodes.
How do I correctly search for shared nodes in JCR using ModeShape?
Update: I upgraded to 4.1.0.Final to see if that helped, but it had no effect on the described behaviour.
Cross-posted from the ModeShape forum:
Shared nodes are really just a single node that appears in multiple places within a workspace, so it's not exactly clear what it semantically means to get multiple query results for that one shareable node. Per Section 14.16 of the JSR-283 (JCR 2.0) specification, implementations are free to include shareable nodes in query results at just one, or at multiple/all, of those locations.
ModeShape 2.x and 3.x always returned only a single location for a shared node in query results, as this was the behavior of the reference implementation and this was the feedback we got from users. When we were working on ModeShape 4.0, we tried to make it possible to return multiple results, but we ran into problems with the TCK and uncertainty about what the new expected behavior should be. Therefore, we backed off our goals and implemented query to return only one of the shared locations, as we did in 2.x and 3.x.
I'm not exactly sure whether any JCR implementation returns multiple rows for a single shared node, but I may be wrong.
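As a possible workaround (a sketch of mine, not something the ModeShape answer prescribes): since each shared node is returned at exactly one location, you can query once and then walk the rest of the shared set through the standard JCR 2.0 API, filtering for the tag path you care about:
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.Session;
import javax.jcr.query.Query;
import javax.jcr.query.QueryResult;

public class TaggedContentLookup {

    // True if any member of the node's shared set lives under tagPath.
    static boolean isTaggedWith(Node node, String tagPath) throws Exception {
        for (NodeIterator it = node.getSharedSet(); it.hasNext(); ) {
            if (it.nextNode().getPath().startsWith(tagPath + "/")) {
                return true;
            }
        }
        return false;
    }

    // Prints the one reported path of every shareable node tagged /Tags/Foo.
    static void printTagged(Session session) throws Exception {
        Query query = session.getWorkspace().getQueryManager().createQuery(
                "SELECT * FROM [mix:shareable] AS content", Query.JCR_SQL2);
        QueryResult result = query.execute();
        for (NodeIterator it = result.getNodes(); it.hasNext(); ) {
            Node node = it.nextNode();
            if (isTaggedWith(node, "/Tags/Foo")) {
                System.out.println(node.getPath());
            }
        }
    }
}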

SPARQL - what does it take to find an ontology?

I'm pretty new to SPARQL, OWL and Jena, so please excuse me if I'm asking utterly stupid questions. I'm having a problem that has been driving me nuts for a couple of days. I'm using the following string as a query for a Jena QueryFactory.create(queryString):
queryString = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>"+
"PREFIX ho: <http://www.flatlandfarm.de/fhtw/ontologies/2010/5/22/helloOwl.owl#>" +
"SELECT ?name ?person ?test ?group "+
"WHERE { ?person foaf:name ?name ; "+
" a ho:GoodPerson ; "+
" ho:isMemberOf ?group ; "+
"}";
Until this morning it worked as long as I only asked for properties from the foaf namespace. As soon as I asked for properties from my own namespace, I always got empty results. While I was about to post this question here and did some final tests so I could state it as precisely as possible, it suddenly worked. As I no longer knew what exactly to ask, I deleted my question before posting it.
A couple of hours later I used Protégé's Pellet plugin to create and export an inferred model. I called it helloOwlInferred.owl and uploaded it to the directory on my server where helloOwl.owl already resided. I adjusted my method to load the inferred ontology and changed the above query so that the prefix ho: was bound to the inferred ontology as well. At once, nothing worked any more. To be exact, I saw the same symptoms I had had until this morning with my original query: my prefix did not work any more.
I did a simple test: I renamed all the helloOwlInferred.owl files (the one on my server behind the prefix and the local copy I loaded) to helloOwl.owl. Strangely enough, that fixed everything. Renaming them back to helloOwlInferred.owl broke everything again. And so on. What is going on here? Do I just need to wait a couple of weeks until my ontology gets "registered as a valid prefix"?
Maybe your OWL file contains the rdf:ID="something" construct (or some other form of relative URL, such as rdf:about="#something")?
rdf:ID and relative URLs are expanded into full absolute URIs, such as http://whatever/file.owl#something, using the base URI of the OWL file. If the base URI is not explicitly specified in the file (with something like xml:base="http://whatever/file.owl"), then the location of the file on the web server (or in your file system, if you load a local file) is used as the base URI.
So if you move the file around, or have copies in several locations, the URIs in your file will change, and hence you'd have to change your SPARQL query accordingly.
Including an explicit xml:base, or avoiding relative URIs and rdf:ID, should fix the problem.
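For instance, a sketch of an explicit base declaration matching the ho: prefix used in the query:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xml:base="http://www.flatlandfarm.de/fhtw/ontologies/2010/5/22/helloOwl.owl">
  <!-- with the base pinned down, rdf:ID="GoodPerson" always expands to
       ...helloOwl.owl#GoodPerson, no matter where the file is served from -->
</rdf:RDF>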
The whole idea of prefixes and QNames is just to compress URIs to save space and improve readability. The most common issue with them is spelling errors, either in the prefix definitions themselves or in the QNames.
Most likely the prefix definition you are using in your query causes URIs to be generated that don't match the actual URIs of the properties in your ontology.
That being said, your issue may be due to something in Jena, so it may well be worth asking your question on the Jena mailing list.
It looks like this was caused by a bug (or a feature?) in Protégé. When I exported the inferred ontology under a new name, Protégé changed the definitions of xmlns (blank) and xml:base to the name of the new file, but it did not change the definition of the actual namespace prefix:
xmlns="http://xyz.com/helloOwl.owl" => xmlns="http://xyz.com/helloOwlInferred.owl"
xml:base="http://xyz.com/helloOwl.owl" => xml:base="http://xyz.com/helloOwlInferred.owl"
xmlns:helloOwl="http://xyz.com/helloOwl.owl" => xml:base="http://xyz.com/helloOwl.owl"
<!ENTITY helloOwl "http://wxyz.com/helloOwl.owl#" > => <!ENTITY helloOwl "http://wxyz.com/helloOwl.owl#" >
Since I fixed that, it seems to work.
My fault for not having examined the actual source with the necessary attention.
You have to define a precise URI prefix for ho:, then tell Protégé about it (there is a panel for namespaces; define the same URI as the ontology prefix), so that when you define GoodPerson in Protégé it assumes you mean http://www.flatlandfarm.de/fhtw/ontologies/2010/5/22/helloOwl.owl#GoodPerson, which is the same as ho:GoodPerson only if you have used the same URI prefix in both places.
If you don't do so, Protégé (or some other component, like a web server) will do silly things like composing the ontology's URI and its default URI prefix (the one that goes in front of GoodPerson when you don't specify any prefix) from the file name (or, even worse, from a URI like file:///home/user/...).
Remember, the ontology's URI is technically different from the URI prefix that you use for the entities associated with the ontology itself (classes, properties, etc.), and ho: is just a shortcut with a local meaning, which depends on what you define in documents like files or SPARQL queries.
The ontology URI can also be different from the URL from which the ontology file can be fetched, although it is good practice to make them the same. Usually you need to play with URL rewriting in Apache to make that happen, but sometimes the ontology file isn't physically published at all: the ontology is loaded into a SPARQL endpoint, and its URI is resolved to an RDF document with the help of the endpoint itself, by rewriting the ontology URI into a SPARQL request that issues a DESCRIBE statement. The same trick can be used to resolve any other URI (e.g., your ontology-instantiating data), as long as the associated data are accessible from your SPARQL endpoint (i.e., are in your triple store).
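For instance, the endpoint-side resolution mentioned above amounts to rewriting a request for the ontology URI into something like:
# sketch: ask the endpoint for everything it knows about the ontology URI
DESCRIBE <http://www.flatlandfarm.de/fhtw/ontologies/2010/5/22/helloOwl.owl>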