Forward only a single group using the WirePattern helper when config.group is true - noflo

I'm trying to use the WirePattern helper to perform some synchronisation within my graph. I'm setting config.group to true so I can ensure that only packets received with the same group are collected and handled within this component.
For the sake of argument, here is an example packet from the first in-port:
<my-group>
123
</my-group>
And the second in-port:
<my-group>
456
</my-group>
Because config.group is set to true, these 2 packets will match by group and I can do something with them in my component. So far so good.
The problem is that I want to wrap the output in the same group that the two in-ports were matched by. This is what the out packet group should look like:
<my-group>
123456
</my-group>
I assumed config.group would do this by default, but it doesn't; it just sends the output with no group:
123456
I tried setting config.forwardGroups to various values in an effort to forward the group from only one of the in-ports (seeing as they're identical). Regardless of whether this is set to true, "portname" or ["portname"], it double-wraps the out packet:
<my-group>
<my-group>
123456
</my-group>
</my-group>
This causes headaches further down the line as the grouping has changed and no longer matches up with the other components. I could manually remove one of the groups using another component, but I shouldn't have to do that.
How can I set up the WirePattern to continue matching by group (using config.group) but only forward a single group to the out port?
I don't mind doing it manually for now if this is something that the WirePattern doesn't support. I just need to know whether I'm doing something wrong, or whether it's just not possible in NoFlo yet.
Here's my config for reference:
var config = {
  in: ["in", "value"],
  params: ["property"],
  out: "out",
  // This doesn't forward the group
  group: true, // Wait for packets of same group
  // This duplicates groups when group: true
  forwardGroups: ["value"],
  arrayPolicy: {
    in: "all",    // Wait for all indexes
    params: "all" // Wait for all indexes
  }
};

This looks like a bug to me, as I remember enforcing group uniqueness upon forwarding. I've opened https://github.com/noflo/noflo/issues/269 and will fix it by the next NoFlo release.
For now, another workaround would be: don't use the forwardGroups feature, but rather send groups manually to the output inside the process handler (which is absolutely legal when using WirePattern too):
out.beginGroup(groups[0]);
out.send(input.in + input.value);
out.endGroup();
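For context, here's a rough sketch of how that handler could sit inside a full component using the config above; the port datatypes and the exact noflo.helpers.WirePattern signature are from memory of NoFlo 0.5.x, so treat it as illustrative rather than canonical:
var noflo = require('noflo');

exports.getComponent = function () {
  var c = new noflo.Component();
  c.inPorts.add('in', { datatype: 'all' });
  c.inPorts.add('value', { datatype: 'all' });
  c.inPorts.add('property', { datatype: 'string' });
  c.outPorts.add('out', { datatype: 'all' });

  return noflo.helpers.WirePattern(c, config, function (input, groups, out) {
    // 'groups' holds the group(s) the incoming packets were matched by,
    // so wrapping the output in groups[0] keeps exactly one group level.
    out.beginGroup(groups[0]);
    out.send(input.in + input.value);
    out.endGroup();
  });
};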

Related

Splunk Host header overrides host key from log messages

How can I stop Splunk from considering the "host" metadata field more important than the "host" key from my logs?
Let's suppose that I have the following logs:
color = red ; host = localhost
color = blue ; host = newhost
The following query works fine:
index=myindex | stats count by color
but the following doesn't:
index=myindex | stats count by host
because instead of considering "host" to be the key from the log, it treats the Host header as "host".
How can I deal with this?
When there are two fields with the same name one of them has to "win". In this case, it's the one Splunk defines before it processes the event itself. As you probably know, every event is given 4 fields at input time: index, host, source, and sourcetype. Data from the event won't override these unless specifically told to do so in the config files.
To override the settings, put this in your transforms.conf file
[sethost]
REGEX = host\s*=\s*(\w+)
DEST_KEY = MetaData:Host
FORMAT = host::$1
You'll also need to reference the transform in your props.conf file
[mysourcetype]
TRANSFORMS-host = sethost
I would have thought this solution would be more prominent, but I found it buried deep in the Splunk docs.
From the Splunk docs (https://docs.splunk.com/Documentation/Splunk/8.2.6/Metrics/Search):
You can use reserved fields such as "source", "sourcetype", or "host" as dimensions. However, when extracted dimension names are reserved names, the name is prefixed with "extracted_" to avoid name collision. For example, if a dimension name is "host", search for "extracted_host" to find it.
So, in your case:
index=myindex | stats count by extracted_host

How can I show the most recent events per user with Keen IO?

Suppose you have a Keen IO collection called "survey-completed" that contains events matching the following pattern:
keen.id: <unique autogenerated id>
keen.timestamp: <autogenerated overridable timestamp>
userId: <hex string for user>
surveyScore: <integer from 1 to 10>
...
How would you create a report of only the most up-to-date satisfaction score for each user that responded to one or more surveys within a given amount of time (such as one week)?
There isn't a really elegant way to make it happen, but for a given userId you could return the most up-to-date event by creating a count query with a group_by on [surveyScore, keen.timestamp] and an order_by on the keen.timestamp property. You will want to set limit=1 to select only the most recent surveyScore.
If you'd like to use an extraction, the most straightforward way would be to run an extraction with property_names set to ["userId","keen.timestamp","surveyScore"]. Once you receive the results, you can do some client-side post-processing. This is probably the best way if you want to look at all of your userIds.
If you're interested in a given userId and want to use an extraction, you can run an extraction with a filter on userId eq X and set the optional latest parameter to latest=1. The latest property is an integer containing the number of most recent events to extract. Note: using latest relies on the keen.created_at timestamp instead of keen.timestamp (https://keen.io/docs/api/#the-keen-object).
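For what it's worth, here's a rough sketch of the extraction-plus-post-processing approach using the classic keen-js client; the client setup, the this_7_days timeframe, and the nested keen.timestamp access are assumptions to adapt to your own setup:
var client = new Keen({
  projectId: 'YOUR_PROJECT_ID',
  readKey: 'YOUR_READ_KEY'
});

var extraction = new Keen.Query('extraction', {
  event_collection: 'survey-completed',
  timeframe: 'this_7_days',
  property_names: ['userId', 'keen.timestamp', 'surveyScore']
});

client.run(extraction, function (err, res) {
  if (err) throw err;
  // Keep only the most recent event per userId.
  // ISO 8601 timestamps sort correctly as plain strings.
  var latestByUser = {};
  res.result.forEach(function (event) {
    var current = latestByUser[event.userId];
    if (!current || event.keen.timestamp > current.keen.timestamp) {
      latestByUser[event.userId] = event;
    }
  });
  console.log(latestByUser);
});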

Apache Flink Error Handing and Conditional Processing

I am new to Flink and have gone through the site(s)/examples/blogs to get started. I am struggling with the correct use of operators. Basically I have two questions:
Question 1: Does Flink support declarative exception handling? I need to handle parse/validate/... errors.
Can I use org.apache.flink.runtime.operators.sort.ExceptionHandler or similar to handle errors, or is a Rich/FlatMap function my best option?
If Rich/FlatMap is the only option, is there a way to get a handle to the stream inside the Rich/FlatMap function so that sink(s) could be attached for error processing?
Question 2: Can I conditionally attach different Sink(s)?
Based on certain field(s) in the keyed/split streams I need to select different sink(s); do I split the stream again or use a Rich/FlatMap to handle that?
I am using Flink 1.3.2. Here is the relevant portion of my job
.....
.....
DataStream<String> eventTextStream = env.addSource(messageSource);

KeyedStream<EventPojo, Tuple> eventPojoStream = eventTextStream
        // parse, transform or enrich
        .flatMap(new MyParseTransformEnrichFunction())
        .assignTimestampsAndWatermarks(new EventAscendingTimestampExtractor())
        .keyBy("eventId");

// split stream based on eventType as different reduce and windowing functions need to be applied
SplitStream<EventPojo> splitStream = eventPojoStream
        .split(new EventStreamSplitFunction());

// need to apply reduce function
DataStream<EventPojo> event1TypeStream = splitStream.select("event1Type");
// need to apply reduce function
DataStream<EventPojo> event2TypeStream = splitStream.select("event2Type");
// need to apply time based windowing function
DataStream<EventPojo> event3TypeStream = splitStream.select("event3Type");
....
....
env.execute("Event Processing");
Am I using the correct operators here?
Update 1:
Tried using the ProcessFunction as suggested by @alpinegizmo, but that didn't work as it depends upon a keyed stream, which I don't have until I parse/validate the input. I get "InvalidProgramException: Field expression must be equal to '*' or '_' for non-composite types.".
It's such a common use case where you first parse/validate the input and don't yet have a keyed stream, so how do you solve it?
Thanks for your patience and help.
There's one key building block that you've overlooked. Take a look at side outputs.
This mechanism provides a type-safe way to produce any number of additional output streams. This can be a clean way to report errors, among other uses. In Flink 1.3 side outputs can only be used with a ProcessFunction, but 1.4 will add side outputs to ProcessWindowFunction.
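As a rough sketch of what that could look like for the parse/validate step in the job above (Flink 1.3-era API from memory; parseAndEnrich and MyErrorSink are placeholders for your own logic):
// Needs: org.apache.flink.util.OutputTag, org.apache.flink.util.Collector,
//        org.apache.flink.streaming.api.functions.ProcessFunction
final OutputTag<String> parseErrors = new OutputTag<String>("parse-errors") {};

SingleOutputStreamOperator<EventPojo> parsed = eventTextStream
        .process(new ProcessFunction<String, EventPojo>() {
            @Override
            public void processElement(String raw, Context ctx, Collector<EventPojo> out) throws Exception {
                try {
                    out.collect(parseAndEnrich(raw)); // your parse/transform/enrich logic
                } catch (Exception e) {
                    // Route the bad record to the side output instead of failing the job
                    ctx.output(parseErrors, raw);
                }
            }
        });

// Attach whatever error-handling sink you like to the side output
parsed.getSideOutput(parseErrors).addSink(new MyErrorSink());

// The happy path continues as in your job
parsed
        .assignTimestampsAndWatermarks(new EventAscendingTimestampExtractor())
        .keyBy("eventId");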

Infinite list using websql proxy store

Is there support for an infinite list backed by the websql proxy? It doesn't seem so, as whether infinite is true or false, there are only 25 items in the list.
You should use the ListPaging plugin in the list.
{
    xclass: 'Ext.plugin.ListPaging',
    autoPaging: true,
    loadMoreText: 'Loading more',
    noMoreRecordsText: 'loaded'
}
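For illustration, the plugin would be attached to the list roughly like this (the class and store names are just examples):
Ext.define('MyApp.view.ItemList', {
    extend: 'Ext.dataview.List',
    config: {
        store: 'Items', // a store backed by the websql/Sql proxy
        itemTpl: '{name}',
        plugins: [{
            xclass: 'Ext.plugin.ListPaging',
            autoPaging: true,
            loadMoreText: 'Loading more',
            noMoreRecordsText: 'loaded'
        }]
    }
});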
Please check the Sencha Touch documentation for further info.
I was able to get this to work by modifying the Sql proxy to include the total record count. More specifically, in the selectRecords method I had to change the code:
result.setTotal(count);
to a second executeSql call that queries all records. The SQL statement is similar to the original one, except that (1) it does not include the LIMIT expression, and (2) the SELECT * should be SELECT COUNT(*) AS TotalCount. Then read the TotalCount value from the first row of the result set, call result.setTotal(totalCount), and finally fire the callback.
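A rough sketch of that change inside an overridden selectRecords; the surrounding variables (table, result, callback, scope) stand in for whatever the proxy actually has in scope at that point:
// After the original LIMITed query has populated the result set,
// run a second query for the total row count (same WHERE clause, no LIMIT).
var countSql = 'SELECT COUNT(*) AS TotalCount FROM ' + table /* + same WHERE clause */;

transaction.executeSql(countSql, [],
    function (tx, resultSet) {
        var totalCount = resultSet.rows.item(0).TotalCount;
        result.setTotal(totalCount); // replaces the original result.setTotal(count)
        if (typeof callback == 'function') {
            callback.call(scope, result);
        }
    },
    function (tx, error) {
        console.log('count query failed', error);
    }
);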

Read response body in Apache mod_lua

I'm prototyping a simple "output" filter with Apache + mod_lua. How can I read the response body, after all the other native output filters have been applied, via Lua? For example, can I get the actual response that will be sent to the client?
The manual has some good guidance on this:
http://httpd.apache.org/docs/current/mod/mod_lua.html#modifying_buckets
Modifying contents with Lua filters
Filter functions implemented via LuaInputFilter or LuaOutputFilter are designed as three-stage non-blocking functions using coroutines to suspend and resume a function as buckets are sent down the filter chain. The core structure of such a function is:
function filter(r)
    -- Our first yield is to signal that we are ready to receive buckets.
    -- Before this yield, we can set up our environment, check for conditions,
    -- and, if we deem it necessary, decline filtering a request alltogether:
    if something_bad then
        return -- This would skip this filter.
    end
    -- Regardless of whether we have data to prepend, a yield MUST be called here.
    -- Note that only output filters can prepend data. Input filters must use the
    -- final stage to append data to the content.
    coroutine.yield([optional header to be prepended to the content])
    -- After we have yielded, buckets will be sent to us, one by one, and we can
    -- do whatever we want with them and then pass on the result.
    -- Buckets are stored in the global variable 'bucket', so we create a loop
    -- that checks if 'bucket' is not nil:
    while bucket ~= nil do
        local output = mangle(bucket) -- Do some stuff to the content
        coroutine.yield(output) -- Return our new content to the filter chain
    end
    -- Once the buckets are gone, 'bucket' is set to nil, which will exit the
    -- loop and land us here. Anything extra we want to append to the content
    -- can be done by doing a final yield here. Both input and output filters
    -- can append data to the content in this phase.
    coroutine.yield([optional footer to be appended to the content])
end
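Building on that structure, a minimal sketch of a filter that collects the whole body (as produced by the filters that ran before it) while passing it through unchanged could look like this; the logging call and names are illustrative:
function read_body(r)
    local body = {}
    coroutine.yield()               -- ready to receive buckets, nothing to prepend
    while bucket ~= nil do
        table.insert(body, bucket)  -- remember this chunk of the response
        coroutine.yield(bucket)     -- pass the content through unmodified
    end
    -- All buckets have been seen: 'body' now holds the full response body.
    r:debug("response body was " .. #table.concat(body) .. " bytes")
    coroutine.yield()               -- nothing to append
end
You would then register it with a LuaOutputFilter directive and attach it via SetOutputFilter (check the exact directive syntax for your httpd version); where it sits in the output filter chain determines which native filters run before it.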