Multiple matchers outputting to the same field - fast-esp

We are looking for a solution to have multiple matchers which all output to the same field.
Background is that in our initial tests every matcher overwrites the output from the previous stages.
Example:
Machter A outputs "red,blue,organge" to fields meta_keywords
Matcher B outputs "soft,hard,whobbly" to fileds meta_keywords
Expected Result: After all stages the field meta_keywords contains: red,blue,organge,soft,hard,whobbly
Actual Result: meta_keywords contains "soft,hard,whobbly"

When having multiple matcher output to a single fields make shure the field is set as a multi value field in the index profile. The field needs to have a seperator set e.g:
<field name="locations" separator=";"/>
With this is each matchers output is appended to the field.

Related

Open Refine: Exporting nested XML with templating

I have a question regarding the templating option for XML in Open Refine. Is it possible to export data from two columns in a nested XML-structure, if both columns contain multiple values, that need to be split first?
Here's an example to illustrate better what I mean. My columns look like this:
Column1
Column2
https://d-nb.info/gnd/119119110;https://d-nb.info/gnd/118529889
Grützner, Eduard von;Elisabeth II., Großbritannien, Königin
https://d-nb.info/gnd/1037554086;https://d-nb.info/gnd/1245873660
Müller, Jakob;Meier, Anina
Each value separated by semicolon in Column1 has a corresponding value in Column2 in the right order and my desired output would look like this:
<rootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<recordRootElement>
...
<edm:Agent rdf:about="https://d-nb.info/gnd/1037554086">
<skos:prefLabel xml:lang="zxx">Müller, Jakob</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/1245873660">
<skos:prefLabel xml:lang="zxx">Meier, Anina</skos:prefLabel>
</edm:Agent>
...
</recordRootElement>
<rootElement>
(note: in my initial posting, the position of the root element was not indicated and it looked like this:
<edm:Agent rdf:about="https://d-nb.info/gnd/119119110">
<skos:prefLabel xml:lang="zxx">Grützner, Eduard von</skos:prefLabel>
</edm:Agent>
<edm:Agent rdf:about="https://d-nb.info/gnd/118529889">
<skos:prefLabel xml:lang="zxx">Elisabeth II., Großbritannien, Königin</skos:prefLabel>
</edm:Agent>
)
I managed to split the values separated by ";" for both columns like this
{{forEach(cells["Column1"].value.split(";"),v,"<edm:Agent rdf:about=\""+v+"\">"+"\n"+"</edm:Agent>")}}
{{forEach(cells["Column2"].value.split(";"),v,"<skos:prefLabel xml:lang=\"zxx\">"+v+"</skos:prefLabel>")}}
but I can't find out how to nest the splitted skos:prefLabel into the edm:Agent element. Is that even possible? If not, I would work with seperate columns or another workaround, but I wanted to make sure, if there's a more direct way before.
Thank you!
Kristina
I am going to expand the answer from RolfBly using the Templating Exporter from OpenRefine.
I do have the following assumptions:
There is some other column left of Column1 acting as record identifying column (see first screenshot).
The columns actually have some proper names
The columns URI and Name are the only columns with multiple values. Otherwise we might produce empty XML elements with the following recipe.
We will use the information about records available via GREL to determine whether to write a <recordRootElement> or not.
Recipe:
Split first Name and then URI on the separator ";" via "Edit cells" => "Split multi-valued cells".
Go to "Export" => "Templating..."
In the prefix field use the value
<?xml version="1.0" encoding="utf-8"?>
<rootElement>
Please note that I skipped the namespace imports for edm, skos, rdf and xml.
In the row template field use the value:
{{if(row.index - row.record.fromRowIndex == 0, '<recordRootElement>', '')}}
<edm:Agent rdf:about="{{escape(cells['URI'].value, 'xml')}}">
<skos:prefLabel xml:lang="zxx">{{escape(cells['Name'].value, 'xml')}}</skos:prefLabel>
</edm:Agent>
{{if(row.index - row.record.fromRowIndex == row.record.rowCount - 1, '</recordRootElement>', '')}}
The row separator field should just contain a linebreak.
In the suffix field use the value:
</rootElement>
Disclaimer: If you're keen on using only OpenRefine, this won't be the answer you were hoping for. There may be ways in OR that I don't know of. That said, here's how I would do it.
Edit The trick is to keep URL and literal side by side on one line. b2m's answer below does just that: go from right to left splitting, not from left to right. You can then skip steps 2 and 3, to get the result in the image.
split each column into 2 columns by separator ;. You'll get 4 columns, 1 and 3 belong together, and 2 and 4 belong together. I'm assuming this will be the case consistently in your data.
export 1 and 3 to a file, and export 2 and 4 to another file, of any convenient format, using the custom tabular exporter.
concatenate those two files into one single file using an editor (I use Notepad++), or any other method you may prefer. Several ways to Rome here. Result in OR would be something like this.
You then have all sorts of options to put text strings in front, between and after your two columns.
In OR, you could use transform on column URL to build your XML using the below code
(note the \n for newline, that's probably just a line feed, you may want to use \r\n for carriage return + line feed if you're using Windows).
'<edm:Agent rdf:about="' + value + '">\n<skos:prefLabel xml:lang="zxx">' + cells.Name.value + '</skos:prefLabel>\n</edm:Agent>'
to get your XML in one column, like so
which you can then export using the custom tabular exporter again. Or instead you could use Add column based on this column in a similar manner, if you want to retain your URL column.
You could even do this in the editor without re-importing the file back into OR, but that's beyond the scope of this answer.

Splunk field extractor unable to extract all values

I want to extract 4 values out of one field, called msg, from a Splunk query; and the msg is in the form of:
msg: "Service call successful k1=v1 k2=v2 k3=v3 k4=v4 k5=v5 something else can be ignored"
keys are always static but values are not, for instance, v2 could be XXX or XXYYZZ; similarly possible values for v3 just have unpredictable length.
I query to get some sample results and hope to use Field Extractor to generate a regex, but the regex generated can't get all the values out and I guess it's probably because values are not having the same length?
Do I need to change my logging format by separating each key=value using a common? Or I am not using the field extractor correctly?
[Update1]: A few sample data:
msg:Service call successful k1=XXX k2=BBBB k3=Something I made up k4=YYYNNN k5=do not need to retrieve this value
msg:Service call successful k1=SSSSSS k2=AAA k3=This could contain space and comma, like this one k4=YYYNNM k5=can be ignored
I could change the logging format if it makes easier to query and extract fields. Will adding a separator like dot or pipe help?
Normally Splunk will pull key-value pairs out automatically
However, when it doesn't, go try your regular expression(s) on regex101 - the field extractor is often a good[ish] start, but rarely creates efficient (or complete) regular expressions
An inline version of this would be as follows (presuming the "value" half of the key-value pair is contiguous characters):
| rex field=_raw "k1=(?<k1>\S+)\s+k2=(?<k2>\S+)\s+k3=(?<k3>\S+)\s+k4=(?<k4>\S+)\s+k5=(?<k5>\S+)"
Normally I prefer to do sequential rex calls, in case something's out of order or missing, but if your data's consistent, this will work
Once you have it the way you want it, update your props.conf and transforms.conf as appropriate for the sourcetype
EDIT for updated sample data / comment response:
...
| rex field=_raw "k3=(?<k3>.+)\s+k4="
| rex field=_raw "k4=(?<k4>.+)\s+k5="
...

Axiomatics - condition editor

I have a subject like "accessTo" = ["123", "123-edit"]
and a resource like "interestedId" = "123"
Now I'm trying to write a condition - where it checks "interestedId" concatenated with "-edit" equals "123-edit" in "AccessTo".
Im trying to write rule like this
anyOfAny_xacml1(function[stringEqual], "accessTo", "interestedId"+"-edit")
It is not allowing to do this.
Any help is appreciated.
In addition to the answer from Keerthi S ...
If you know there should only be one value of interestedId then you can do this to prevent the indeterminate from happening:
stringBagSize(interestedId) == 1 && anyOfAny(function[stringEqual], accessTo, stringOneAndOnly(interestedId) + "-edit")
If more than value is present then evaluation stops prior to reaching the function that expects only one value. This condition would return false if more than one value is present.
On the other hand if interestedId can have multiple values then this would work:
anyOfAny(function[stringEqual], accessTo, map(function[stringConcatenate],interestedId, "-edit"))
The map function will apply the stringConcatenate function to all values in the bag.
Since Axiomatics products are compliant with XACML specification, all attributes by default are assumed to contain multiple values(called as 'bags').
So if you would like to append a string to an attribute use stringOneAndOnly XACML function for the attribute to indicate that the attribute can have only one value.
So assuming you mean accessTo has attribute ID as Attributes.access_subject.subject_id, interestedId has the attribute ID as Attributes.resource.resource_id and anyOfAny_xacml1 is equivalent to anyOfAny XACML function, the resulting condition would look like,
anyOfAny(function[stringEqual], Attributes.access_subject.subject_id, stringOneAndOnly(Attributes.resource.resource_id) + "-edit")

How to get name of streams in MinibatchSource?

How can I get the names of each of the streams in a MinibatchSource?
Can I get the names associated with with the stream information returned by stream_infos?
minibatch_source.stream_infos()
I also have a follow-up-question:
The result from:
print(reader_train.streams.keys())
is
dict_keys(['labels', 'features'
How does these names relate to the construction of the MiniBatchSource, which is done like this?
return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
features = StreamDef(field='image', transforms=transforms), # first column in map file is referred to as 'image'
labels = StreamDef(field='label', shape=num_classes) # and second as 'label'
)))
I would have thought that my streams would be named ‘image’ and ‘label’, but they were named ‘labels’ and ‘features’.
I guess those names are somehow default names?
For your original question:
minibatch_source.streams.keys()
See for example this tutorial under the section "A Brief Look at Data and Data Reading".
For your followup question: The names returned by keys() are the arguments of StreamDefs(). This is all you need in your program. If you define your MinibatchSource like this
return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
image = StreamDef(field='image', transforms=transforms), # first column in map file is referred to as 'image'
label = StreamDef(field='label', shape=num_classes) # and second as 'label')))
then the names will match. You can choose any names you want but the value of the field inside StreamDef() should match the source (which depends on your input data and the Deserializer you are using).

Is getting the General ID same as getting FormattedID in rally?

I am trying to get the ID under "General" from a feature item in rally. This is my query:
body = { "find" => {"_ProjectHierarchy" => projectID, "_TypeHierarchy" => "PortfolioItem/Feature"
},
"fields" => ["FormattedID","Name","State","Release","_ItemHierarchy","_TypeHierarchy","Tags"],
"hydrate" => ["_ItemHierarchy","_TypeHierarchy","Tags"],
"fetch"=>true
}
I am not able to get any value for FormattedID, I tried using "_UnformattedID" but it pulls up an entirely different value than the FormattedID. Any help would be appreciated.
LBAPI does not have FormattedID field. You are correct using _UnformattedID. It is the FormattedID without the prefix. For example, this query:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/1111/artifact/snapshot/query.js?find={"_ProjectHierarchy":2222,"_TypeHierarchy":"PortfolioItem/Feature","State":"Developing",_ValidFrom: {$gte: "2013-06-01TZ",$lt: "2013-09-01TZ"}},sort:{_ValidFrom:-1}}&fields=["_UnformattedID","Name","State"]&hydrate=["State"]&compress=true&pagesize:200
shows _UnformattedID that correspond to FormattedID as this screenshot shows:
I noticed your are using fields and fetch . Per LBAPI's documentation, it uses fields rather than fetch. If you want to get all fields, use fields=true
As far as the missing custom fields, make sure that the custom field value was set within the dates of the query.
Compare these almost identical queries: the first query does not return a custom field, the second query does.
Query #1:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/1111/artifact/snapshot/query.js?find={"_ProjectHierarchy":2222,"_TypeHierarchy":"PortfolioItem/Feature","State":"Developing",_ValidFrom: {$gte: "2013-06-01TZ",$lt: "2013-09-01TZ"}}}&fields=["_UnformattedID","Name","State","c_PiCustomField"]&hydrate=["State","c_PiCustomField"]
Query #2:
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/11111/artifact/snapshot/query.js?find={"_ProjectHierarchy":2222,"_TypeHierarchy":"PortfolioItem/Feature","State":"Developing",__At: "current"}&fields=["_UnformattedID","Name","State","c_PiCustomField"]&hydrate=["State","c_PiCustomField"]
The first query uses time period: _ValidFrom: {$gte: "2013-06-01TZ",$lt: "2013-09-01TZ"}
The second query uses __At: "current"
Let's say I just create a new custom field on PortfolioItem. It is not possible to create a custom field on PorfolioItem/Feature, so the field is created on PI, but both queries still use "_TypeHierarchy":"PortfolioItem/Feature".
After I created this custom field, called PiCustomField, I set a value of that field for a specific Feature, F4.
The first query does not have a single snapshot that includes that field because that field did not exist in the time period we lookback. We can't change the past.
The second query returns this field for F4. It does not return it for other Features because all other Features do not have this field set.
Here is the screenshot: