Creating Configuration File for DDS Recording Service - data-distribution-service

I'm a beginner looking for some clarity on how to create configuration files for the DDS Recording Service in two areas.
If you are looking to record a set of specific topics from a domain how do you set up the topic group? Can you list the topics as individual <topic_expr> i.e.
<topic_group name="SomeTopics">
<topics>
<topic_expr>topic2</topic_expr>
<topic_expr>topic8</topic_expr>
</topics>
<field_expr>*</field_expr>
</topic_group>
When I tried something like this not all the listed topics would be recorded. Is there something I am overlooking?
Secondly, when you use -deserialize to you need to make any changes to the configuration file you used to record the database? As I sometimes get errors about how "rti dds failed to find" followed by something like X::Y::Z. Thanks.

The XSD schema for the configuration file does not expect you to use multiple <topic_expr> tags, but a single tag with a comma-separated list of Topic names. The RTI Recording Service User's Manual explains it as follows:
<topic_expr>POSIX fn expression</topic_expr>
Required.
A comma-separated list of POSIX expressions that specify the names of Topics to be included in the TopicGroup.
The syntax and semantics are the same as for Partition matching.
Default: Null
Note: Keep in mind that spaces are valid first characters in topic names, thus they can affect the matching process. For example, this will match both Triangle and Square topics (notice there is no space before Square):
<topic_expr>Triangle,Square</topic_expr>
However the following will only match Triangle topics (because there is a space before Square):
<topic_expr>Triangle, Square</topic_expr>
With regard to the -deserialize option, this is not applicable to the Recording Service but to the Converter tool (rtirecconv). If you want to record deserialized, you will have to indicate that in the Recording Service configuration, via the tag <deserialize_mode>. Again, see the User's Manual for details.

Related

Is there a way to do string replacement/substitution in sql?

I have some records in a CMS that include HTML fragments with custom tags for a widget tool. The maker of the CMS has apparently updated their CMS without providing proper data conversion. Their widgets use keys for layout based on screen width such as block_lg, block_md, block_sm. The problem kicks in with the fact they used to have a block_xs and they have now shifted them all -- dropping the block_xs and instead placing a block_xl on the other end.
We don't really use these things, but their widget configurations do. What this means for us is the values for each key are identical. The problem occurs when the updated CMS code is looking for the 'block_xl' in any widget definition tags, it can't find it and errors out.
What I'm thinking then is that the new code will appear to 'ignore' the block_xs due to how it reads the tags. (and similarly, the old code will ignore block_xl) Since the values for each are identical, I need to basically read any widget definition and add a block_xl value to it matching the value of [any one of] the other width parameters.
Since the best place order-wise would be 'before' the block_lg value, it's probably easiest to do it as follows:
Replace any thing matching posix style regex matching /block_lg(="\d+,\d+")/ with: block_xl="$1" block_lg="$1"
Or whatever the equivalent of that would be.
Example of an existing CMS block with multiple widget definitions:
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="The Album" classes="highlight-bottom modish greenfont font52 fontlight"
enable_fullwidth="0" block_ids="127" lazyload="0"
block_lg="127,12," block_md="127,12," block_sm="127,12," block_xs="127,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
<!-- Image Block -->
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="What’s Your Favorite Cover Style?"
classes="zoo-widget-style2 modish grey font26 fontlight"
enable_fullwidth="0" block_ids="126" lazyload="0"
block_lg="126,12," block_md="126,12," block_sm="126,12," block_xs="126,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
What I would prefer to end up with from the above (adding block_xl):
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="The Album" classes="highlight-bottom modish greenfont font52 fontlight"
enable_fullwidth="0" block_ids="127" lazyload="0"
block_xl="127,12," block_lg="127,12," block_md="127,12," block_sm="127,12," block_xs="127,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
<!-- Image Block -->
<div>{{widget type="CleverSoft\CleverBlock\Block\Widget"
widget_title="What’s Your Favorite Cover Style?"
classes="zoo-widget-style2 modish grey font26 fontlight"
enable_fullwidth="0" block_ids="126" lazyload="0"
block_xl="126,12," block_lg="126,12," block_md="126,12," block_sm="126,12," block_xs="126,12,"
template="widget/block.phtml" scroll="0" background_overlay_o="0"}}</div>
I know how to do it in php and if necessary, I will just replace it on my local DB and write an sql script to update the modified records, but the html blocks can be kind of big in some cases. It would be preferable, if it is possible, to make the substitutions right in the SQL but I'm not sure how to do it or if it's even possible to do.
And yes, there can be more than one instance of a widget in any given cms page or block. (i.e. there may be a need for more than one such substitutions with different local 'values' assigned to the block_lg)
If anyone can help me do it in SQL, it would be greatly appreciated.
for reference, the tables effected are called cms_page and cms_block, the name of the row in both cases is content
SW

Downloading all full-text articles in PMC and PubMed databases

According to one of the answered questions by NCBI Help Desk , we cannot "bulk-download" PubMed Central. However, can I use "NCBI E-utilities" to download all full-text papers in PMC database using Efetch or at least find all corresponding PMCids using Esearch in Entrez Programming Utilities? If yes, then how? If E-utilities cannot be used, is there any other way to download all full-text articles?
First of all, before you go downloading files in bulk, I highly recommend you read the E-utilities usage guidelines.
If you want full-text articles, you're going to want to limit your search to open access files. Furthermore, I suggest also restricting your search to Medline articles if you want articles that are any good. Then you can do the search.
Using Biopython, this gives us :
search_query = 'medline[sb] AND "open access"[filter]'
# getting search results for the query
search_results = Entrez.read(Entrez.esearch(db="pmc", term=search_query, retmax=10, usehistory="y"))
You can use the search function on the PMC website and it will display the generated query that you can copy/paste into your code.
Now that you've done the search, you can actually download the files :
handle = Entrez.efetch(db="pmc", rettype="full", retmode="xml", retstart=0, retmax=int(search_results["Count"]), webenv=search_results["WebEnv"], query_key=search_results["QueryKey"])
You might want to download in batches by changing retstart and retmax by variables in a loop in order to avoid flooding the servers.
If handle contains only one file, handle.read() contains the whole XML file as a string. If it contains more, the articles are contained in <article></article> nodes.
The full text is only available in XML, and the default parser available in pubmed doesn't handle XML namespaces, so you're going to be on your own with ElementTree (or an other parser) to parse your XML.
Here, the articles are found thanks to the internal history of E-utilities, which is accessed with the webenv argument and enabled thanks to the usehistory="y" argument in Entrez.read()
A few tips about XML parsing with ElementTree : You can't delete a grandchild node, so you're probably going to want to delete some nodes recursively. node.text returns the text in node, but only up to the first child, so you'll need to do something along the lines of "".join(node.itertext()) if you want to get all the text in a given node.
According to one of the answered questions by NCBI Help Desk , we cannot "bulk-download" PubMed Central.
https://www.nlm.nih.gov/bsd/medline.html + https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/ will download a good portion of it (I don't know the percentage). It will indeed miss the PMC full-texts articles whose license doesn't allow redistribution as explained on https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.

Specifying multiple Domain Bases in Rocket.Chat LDAP

On Rocket.Chat's LDAP configuration page, the helper text for Domain Base states that you should enter (emphasis mine):
The fully qualified Distinguished Name (DN) of an LDAP subtree you want to search for users and groups. You can add as many as you like; however, each group must be defined in the same domain base as the users that belong to it. If you specify restricted user groups, only users that belong to those groups will be in scope. We recommend that you specify the top level of your LDAP directory tree as your domain base and use search filter to control access.
Problem is, I don't know how to enter more than one.
My DN looks like this:
OU=IT,OU=Staff,DC=companyname,DC=local
And I want the following users to also be synced:
OU=Example,OU=Staff,DC=companyname,DC=local
But I don't know how to add them both, as the docs aren't clear, and the source code is even less clear.
I've tried the following ways:
Space separated
Semicolon separated
Ampersand (and double ampersand) separated
Wrapping them up in an array (e.g. ["OU=Example ...", "OU=IT ..."]) and as a JSON object
Pipe (and double pipe) separated
'Plus' separated (e.g. DC=local + OU=Example)
But no matter what I do, it won't sync users. The logs tell me:
Exception while invoking method 'ldap_sync_users' NoSuchObjectError: 0000208D: NameErr: DSID-03100238, problem 2001 (NO_OBJECT), data 0, best match of: at Object.Future.wait (/snap/rocketchat-server/511/node_modules/fibers/future.js:449:15) ...
I know I can set up a group restriction so only users in a certain group will be synced, but the helper text says I can use multiple DNs, and I want to know how to use multiple DNs
After reading RFC-4514, I discovered I should construct my DN like so:
OU=Example+OU=IT,OU=Staff,DC=companyname,DC=local
With the plus occurring between the two OUs I wish to add. Now my users are syncing correctly.

Apache OpenNLP: How do I implement a dictionary based entity recognition?

I have already downloaded the jar files to eclipse.
http://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/index.html
How do I do the following:
1.) Be able to add my own names and tags.
2.) Be able to get the names and tags that were in the dictionary.
3.) Configure between case sensitive and insensitive.
For example, let's say, I add the name "Mike Smith" with name tag "Author".
If I have text that has that name, it should be able to recognize that its there along with the tag.
Please give actual java code!!!
I have asked a very similar question here:
Is it possible to conduct 'Context Analysis' for precise entity extraction with OpenNLP?
general concensus is that its 2 steps, first to identify if your sentence contains Author, the second to find the name.
I too would like to do it in 1 step (where the analysis of the corpus includes the words within itself as a way to determine the context of the name)

Map blank nodes from stardog to pubby

So I have this .rdf that I have loaded onto Stardog and then I am using Pubby running over Jetty, to browse the triple store.
In my rdf file, I have several blank nodes which is given a blank node identifier by stardog. So this is a snippet of the rdf file.
<kbp:ORG rdf:about="http://somehostname/resource/res1">
<kbp:canonical_mention>
<rdf:Description>
<kbp:mentionStart rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1234</kbp:mentionStart>
<kbp:mentionEnd rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1239</kbp:mentionEnd>
</rdf:Description>
</kbp:canonical_mention>
So basically I have some resource "res1" which has links to blank node which has a mention start and mention end offset value.
The snippet of the config.ttl file for Pubby is shown below.
conf:dataset [
# SPARQL endpoint URL of the dataset
conf:sparqlEndpoint <http://localhost:5822/xxx/query>;
#conf:sparqlEndpoint <http://localhost:5822/foaf/query>;
# Default graph name to query (not necessary for most endpoints)
conf:sparqlDefaultGraph <http://dbpedia.org>;
# Common URI prefix of all resource URIs in the SPARQL dataset
conf:datasetBase <http://somehostname/>;
...
...
So the key thing is the datasetBase which maps URIs to URL.
When I try to map this, there is an "Anonymous node" link but upon clicking, nothing is displayed. My guess is, this is because the blank node has some identifier like _:bnode1234 which is not mapped by Pubby.
I wanted to know if anyone out there knows how to map these blank nodes.
(Note: If I load this rdf as a static rdf file directly onto Pubby, it works fine. But when I use stardog as a triple store, this mapping doesn't quite work)
It probably works in Pubby because they are keeping the BNode id's available; generally, the SPARQL spec does not guarantee or require that BNode identifiers are persistent. That is, you can issue the same query multiple times, which brings back the same result set (including bnodes) and the identifiers can be different each time. Similarly, a bnode identifier in a query is treated like a variable, it does not mean you are querying for that specific bnode.
Thus, Pubby is probably being helpful and making that work which is why using it directly works as opposed to a third party database.
Stardog does support the Jena/ARQ trick of putting a bnode identifier in angle brackets, that is, <_:bnode1234> which is taken to mean, the bnode with the identifier "bnode1234". If you can get Pubby to use that syntax in queries for bnodes, it will probably work.
But generally, I think this is something you will have to take up with the Pubby developers.