How to use a pattern to match element names in compact RELAX NG

I have some XML from an external source that needs validating. It has a layout similar to the one below:
<stuff>
<id-0001>test</id-0001>
<id-0002>test</id-0002>
<id-0003>test</id-0003>
<id-0004>test</id-0004>
</stuff>
I tried the following, but it is not valid:
datatypes xs = "http://www.w3.org/2001/XMLSchema-datatypes"
start = stuff
stuff = element stuff
{
element id-* { text }*
}
Ideally, I would like a regex match on the id tag names.

To my knowledge it's not possible to define patterns in RELAX NG for element names. See also RelaxNG enumerated element names and relax-ng compact: attribute whose name matches a reg-ex for similar questions.
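One possible workaround, offered as a sketch rather than as part of the answer above: RELAX NG name classes do allow a wildcard (element * { text }* in the compact syntax), so the schema can accept any child element and a separate step can check the names against a regular expression. The Python sketch below covers only that second step, using the standard library. The pattern id-\d{4} and the function name invalid_child_names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed naming rule: "id-" followed by exactly four digits.
ID_NAME = re.compile(r"id-\d{4}")


def invalid_child_names(xml_text, pattern=ID_NAME):
    """Return the names of the root's direct children that do not match the pattern."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root if not pattern.fullmatch(child.tag)]


sample = (
    "<stuff>"
    "<id-0001>test</id-0001>"
    "<id-0002>test</id-0002>"
    "<other>x</other>"
    "</stuff>"
)
print(invalid_child_names(sample))  # names that break the assumed rule
```

An empty list means every child name follows the assumed rule; the structural checks would still be left to the RELAX NG schema.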

Open Refine: Exporting nested XML with templating

I have a question regarding the templating option for XML in Open Refine. Is it possible to export data from two columns as a nested XML structure if both columns contain multiple values that need to be split first?
Here's an example to illustrate better what I mean. My columns look like this:
Column1 | Column2
https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889 | Grützner, Eduard von;Elisabeth II., Großbritannien, Königin
https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660 | Müller, Jakob;Meier, Anina
Each value separated by semicolon in Column1 has a corresponding value in Column2 in the right order and my desired output would look like this:
<rootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/1037554086">
<skos:prefLabel xml:lang="zxx">Müller, Jakob</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/1245873660">
<skos:prefLabel xml:lang="zxx">Meier, Anina</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
</rootElement>
(note: in my initial posting, the position of the root element was not indicated and it looked like this:
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
)
I managed to split the values separated by ";" for both columns like this:
{{forEach(cells["Column1"].value.split(";"),v,"<edm:Agent rdf:about=\""+v+"\">"+"\n"+"</edm:Agent>")}}
{{forEach(cells["Column2"].value.split(";"),v,"<skos:prefLabel xml:lang=\"zxx\">"+v+"</skos:prefLabel>")}}
but I can't figure out how to nest the split skos:prefLabel inside the edm:Agent element. Is that even possible? If not, I would work with separate columns or another workaround, but I wanted to check first whether there's a more direct way.
Thank you!
Kristina
I am going to expand the answer from RolfBly using the Templating Exporter from OpenRefine.
I make the following assumptions:
There is some other column to the left of Column1 acting as the record-identifying column (see first screenshot).
The columns actually have proper names (URI and Name in the recipe below).
The columns URI and Name are the only columns with multiple values. Otherwise the following recipe might produce empty XML elements.
We will use the information about records available via GREL to determine whether to write a <recordRootElement> or not.
Recipe:
Split first Name and then URI on the separator ";" via "Edit cells" => "Split multi-valued cells".
Go to "Export" => "Templating..."
In the prefix field use the value
<?xml version="1.0" encoding="utf-8"?>
<rootElement>
Please note that I skipped the namespace imports for edm, skos, rdf and xml.
In the row template field use the value:
{{if(row.index - row.record.fromRowIndex == 0, '<recordRootElement>', '')}}
<edm:Agent rdf:about="{{escape(cells['URI'].value, 'xml')}}">
<skos:prefLabel xml:lang="zxx">{{escape(cells['Name'].value, 'xml')}}</skos:prefLabel>
</edm:Agent>
{{if(row.index - row.record.fromRowIndex == row.record.rowCount - 1, '</recordRootElement>', '')}}
The row separator field should just contain a linebreak.
In the suffix field use the value:
</rootElement>
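For comparison, the pairing that this recipe achieves by splitting both columns into record rows can be sketched outside OpenRefine. The following is a minimal Python sketch, not part of either answer: agents_xml is a hypothetical helper, and it assumes both cells hold the same number of ";"-separated values in matching order, as described in the question.

```python
from xml.sax.saxutils import escape, quoteattr


def agents_xml(uris_cell, names_cell, sep=";"):
    """Pair the i-th URI with the i-th name and nest the label inside the agent."""
    uris = uris_cell.split(sep)
    names = names_cell.split(sep)
    if len(uris) != len(names):
        raise ValueError("both cells must hold the same number of values")
    parts = []
    for uri, name in zip(uris, names):
        parts.append(
            # quoteattr adds the surrounding quotes and escapes the value
            f"<edm:Agent rdf:about={quoteattr(uri)}>\n"
            f'<skos:prefLabel xml:lang="zxx">{escape(name)}</skos:prefLabel>\n'
            "</edm:Agent>"
        )
    return "\n".join(parts)


print(agents_xml(
    "https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660",
    "Müller, Jakob;Meier, Anina",
))
```

The length check fails loudly instead of silently dropping a value, which is the main risk when two multi-valued columns are split independently.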
Disclaimer: If you're keen on using only OpenRefine, this won't be the answer you were hoping for. There may be ways in OR that I don't know of. That said, here's how I would do it.
Edit: The trick is to keep URL and literal side by side on one line. b2m's answer does just that: it splits going from right to left, not from left to right. You can then skip steps 2 and 3 to get the result in the image.
1. Split each column into 2 columns by the separator ;. You'll get 4 columns; 1 and 3 belong together, and 2 and 4 belong together. I'm assuming this will be the case consistently in your data.
2. Export 1 and 3 to a file, and export 2 and 4 to another file, in any convenient format, using the custom tabular exporter.
3. Concatenate those two files into one single file using an editor (I use Notepad++) or any other method you prefer. Many roads lead to Rome here. The result in OR would be something like this.
You then have all sorts of options to put text strings in front, between and after your two columns.
In OR, you could use transform on column URL to build your XML using the below code
(note the \n for newline, that's probably just a line feed, you may want to use \r\n for carriage return + line feed if you're using Windows).
'<edm:Agent rdf:about="' + value + '">\n<skos:prefLabel xml:lang="zxx">' + cells.Name.value + '</skos:prefLabel>\n</edm:Agent>'
to get your XML in one column, like so
which you can then export using the custom tabular exporter again. Or instead you could use Add column based on this column in a similar manner, if you want to retain your URL column.
You could even do this in the editor without re-importing the file back into OR, but that's beyond the scope of this answer.

Axiomatics - condition editor

I have a subject like "accessTo" = ["123", "123-edit"]
and a resource like "interestedId" = "123"
Now I'm trying to write a condition that checks whether "interestedId" concatenated with "-edit" (that is, "123-edit") is contained in "accessTo".
I'm trying to write a rule like this:
anyOfAny_xacml1(function[stringEqual], "accessTo", "interestedId"+"-edit")
The editor does not allow this.
Any help is appreciated.
In addition to the answer from Keerthi S ...
If you know there should only be one value of interestedId then you can do this to prevent the indeterminate from happening:
stringBagSize(interestedId) == 1 && anyOfAny(function[stringEqual], accessTo, stringOneAndOnly(interestedId) + "-edit")
If more than one value is present, evaluation stops before reaching the function that expects only one value. This condition would return false if more than one value is present.
On the other hand if interestedId can have multiple values then this would work:
anyOfAny(function[stringEqual], accessTo, map(function[stringConcatenate],interestedId, "-edit"))
The map function will apply the stringConcatenate function to all values in the bag.
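To make the bag semantics of the two conditions concrete, here is a plain-Python model. It is not Axiomatics or XACML code: the helper names only mimic the XACML functions, bags are modelled as lists, and the Indeterminate result is modelled as a ValueError, all of which are assumptions of this sketch.

```python
import operator


def string_one_and_only(bag):
    """Return the single value of a bag; fail if the bag has any other size."""
    if len(bag) != 1:
        raise ValueError("bag must contain exactly one value")
    return bag[0]


def any_of_any(predicate, bag_a, bag_b):
    """True if predicate(a, b) holds for at least one pair from the two bags."""
    return any(predicate(a, b) for a in bag_a for b in bag_b)


def permits_single(access_to, interested_id, suffix="-edit"):
    # The size check short-circuits, so string_one_and_only never sees a bad bag.
    return len(interested_id) == 1 and any_of_any(
        operator.eq, access_to, [string_one_and_only(interested_id) + suffix]
    )


def permits_multi(access_to, interested_id, suffix="-edit"):
    # Counterpart of map(function[stringConcatenate], interestedId, "-edit").
    mapped = [value + suffix for value in interested_id]
    return any_of_any(operator.eq, access_to, mapped)


access_to = ["123", "123-edit"]
print(permits_single(access_to, ["123"]))         # one value, "123-edit" is present
print(permits_multi(access_to, ["999", "123"]))   # any mapped value may match
print(permits_single(access_to, ["123", "999"]))  # more than one value: no error, just False
```

The first variant trades flexibility for safety (it never raises on an oversized bag), while the second accepts any number of values.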
Since Axiomatics products comply with the XACML specification, all attributes are assumed by default to contain multiple values (called 'bags').
So if you would like to append a string to an attribute, apply the stringOneAndOnly XACML function to the attribute to indicate that it can have only one value.
Assuming accessTo has the attribute ID Attributes.access_subject.subject_id, interestedId has the attribute ID Attributes.resource.resource_id, and anyOfAny_xacml1 is equivalent to the anyOfAny XACML function, the resulting condition would look like this:
anyOfAny(function[stringEqual], Attributes.access_subject.subject_id, stringOneAndOnly(Attributes.resource.resource_id) + "-edit")

Cypher: Scope of Match Statements in which Variables are Valid

I think I have a general problem with understanding the structure of matches and the scope in which variables of the match live.
The specific piece of code I have the problem with is this:
// S sentiment toward A goodFor/badFor T
// => S sentiment toward the idea of A goodFor/badFor T
MATCH (S:A)-[:SOURCE]->(sent1:PS {type:"sentiment"})-[:TARGET]->(gfbf:E {type:"gfbf"}) , (A)-[:SOURCE]->(gfbf)-[:TARGET]->(T) , (Writer:A {type:"writer"})
// if there is some negative belief in any of the writers private state spaces that involve gfbf then inference is blocked
WHERE NOT (Writer)-[*1..]->({type:"believesTrue" , spec:FALSE})-[*1..]->(gfbf)
// if sent1 is in some private state spaces of the writer return all of these
OPTIONAL MATCH p=(Writer)-[*]->(sent1)
WITH NODES(p)[1..-1] AS ps_nodes
WHERE ALL(x IN ps_nodes[1..] WHERE LABELS(x) = "PS")
MERGE (S)-[:SOURCE]->(sent2:PS {type:"sentiment" , spec:(sent1.spec)})-[:TARGET]->(ideaOf:I {name:"ideaOf" , type:"ideaOf"})-[:TARGET]->(gfbf)
ON CREATE SET sent2.name =
CASE sent2.spec
WHEN FALSE THEN "-S"
ELSE "+S"
END
RETURN p
I don't think it's necessary to understand what this is for; the structure should suffice. Basically, it looks for a subgraph where there is a path S-->sent1-->gfbf and also a path A-->gfbf-->T. If it finds that, it makes a new path A-->sent2-->ideaOf-->gfbf, all the while setting the properties of the new nodes depending on the properties of the nodes from the match. Furthermore, it checks whether there is also a path writer-->...-->sent where all nodes in the ... part have the label PS. If it finds that path, it returns it for further operations in a different part of the program.
The error I am getting is this:
py2neo.cypher.error.statement.InvalidSyntax: sent1 not defined (line 6, column 58 (offset: 421))
"MERGE (S)-[:SOURCE]->(sent2:PS {type:"sentiment" , spec:(sent1.spec)})-[:TARGET]->(ideaOf:I {name:"ideaOf" , type:"ideaOf"})-[:TARGET]->(gfbf)"
Why is sent1 no longer defined where I use it and how would I need to restructure the code to make it valid?
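sent1 isn't listed in the preceding WITH, so it is no longer in scope after it. WITH only carries forward the variables it names, so add sent1 (and any other variable you still use afterwards, here S, gfbf and p):
WITH NODES(p)[1..-1] AS ps_nodes, sent1, S, gfbf, p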

Lucene query with filter "without property"

I need to write lucene query/filter to get objects without specific property.
I tried with ... ISNULL:"cm:param_name" but it didn't work.
Edit: I have added new property in aspect but objects that haven't been updated yet don't have it amongst their listed properties (checked with node browser).
With a query like "cm:*", you should only receive documents that have the field "cm" plus content. Note that you have to allow leading wildcard queries by the query parser with setAllowLeadingWildcard(true).
Also check out this post, which deals with a reversed version of your problem:
Find all Lucene documents having a certain field
Can you please clarify what "without property" means? Do you mean that you do not want to specify the field like "field:value" and instead set the filter to "value"?
EDIT
Are you generating these field names dynamically, or is this the only field name that can have its value missing? If there is only one field that may or may not appear in your document, you could just populate it with a default value when it's missing and then search for that. Otherwise, you could try a negated range query like so: NOT foo:[* TO *]. This should match all documents without a value in the foo field. For performance, in the second case the field should be indexed as a string field (not analyzed).
I managed to get this done with .. AND NOT (#namespace\:property:"")
In Java with Lucene 3.6.2, a FieldValueFilter with negation enabled can be used (although that was not what the question asked):
import org.apache.lucene.search.FieldValueFilter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.TopDocs;
final IndexSearcher indexSearcher = getIndexSearcher(); // wherever that comes from
final TopDocs topdocs = indexSearcher.search(new MatchAllDocsQuery(), new FieldValueFilter("cm", true), Integer.MAX_VALUE);
You can use ISUNSET and/or ISNULL for this scenario.
ISUNSET:"cm:title"
ISNULL:"cm:title"

SQL like statement, issue with singular, plural keyword

for (String column : searchCols) {
    for (String keyword : keywords) {
        listAllSql.append(getDBColumnName(column));
        listAllSql.append(" like "); // like
        listAllSql.append("'%"); // '%vision%'
        listAllSql.append(keyword);
        listAllSql.append("%'");
        listAllSql.append(" or ");
    }
}
Here is a snippet of the code. I pass a keyword, for example "Networks", to be searched for. I want the statement to return results even if it finds "Network" (singular), as well as those that contain "Networks" (plural). What changes do I need to make to the above statement to achieve this? FYI, I am working with the SQLite Manager add-on for Mozilla Firefox.
You could just add this at the top of the inner loop (shown in Java to match the question):
for (String keyword : keywords) {
    // Drop a trailing "s" so that "Networks" also matches "Network"
    if (keyword.length() > 1 && keyword.endsWith("s")) {
        keyword = keyword.substring(0, keyword.length() - 1);
    }
    listAllSql.append.....
You may run into problems where someone searches for 'as' and gets every row containing 'a', but finding plurals that don't simply end in s will be MUCH harder. Also, false positives in a search are generally better than false negatives (in my opinion).
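The same idea can be tried out directly against SQLite. The sketch below uses Python's sqlite3 module and is only an illustration: singular_stem and build_like_query are hypothetical helpers, and the table items with its title column is made-up sample data. It strips a trailing "s" as suggested above and passes the search terms as bound parameters instead of concatenating them into the SQL string. Table and column names cannot be bound, so they are assumed to come from trusted code.

```python
import sqlite3


def singular_stem(keyword):
    """Naively drop one trailing 's' so that 'Networks' becomes 'Network'."""
    if len(keyword) > 1 and keyword.lower().endswith("s"):
        return keyword[:-1]
    return keyword


def build_like_query(table, columns, keywords):
    """Build a SELECT with one 'column LIKE ?' clause per column/keyword pair."""
    if not columns or not keywords:
        raise ValueError("need at least one column and one keyword")
    clauses, params = [], []
    for column in columns:
        for keyword in keywords:
            clauses.append(f"{column} LIKE ?")
            # '%Network%' matches both 'Network' and 'Networks'
            params.append(f"%{singular_stem(keyword)}%")
    return f"SELECT * FROM {table} WHERE " + " OR ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("Network design",), ("Neural Networks",), ("Vision",)],
)
sql, params = build_like_query("items", ["title"], ["Networks"])
rows = sorted(row[0] for row in conn.execute(sql, params))
print(rows)
```

Joining the clauses with " OR " also avoids the dangling " or " that the original loop leaves at the end of the statement.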