how can i handle syslog-ng parser failure when using a patterndb - syslog-ng

We parse millions of messages a day using syslog-ng, and are in the process of implementing patterndb.
Due to inconsistency in how the messages are composed, in a small percentage of cases, my patterns are insufficient to capture the fields of the message (spacing is off, or sometimes a field is missing altogether).
How can I deal with these cases? Ideally, the parser entry in my log destination would evaluate to false (like a filter) and it would be captured by my fallback log destination.

Try setting drop-unmatched(yes) (needs syslog-ng OSE 3.11 or later):
parser pattern_db {
db-parser(
file("/opt/syslog-ng/var/db/patterndb.xml")
drop-unmatched(yes)
);
};
Also, recent syslog-ng versions have several different parsers that might be better for certain log messages than patterndb, for example, JSON and key=value parsers.

Related

attempt to call field 'replicate_commands' (a nil value)

I use jedis + lua to eval script, here is my lua script:
redis.replicate_commands()
local second = redis.call('TIME')[1]
local currentKey = KEYS[1]..second
if redis.call('EXISTS', currentKey) == 0 then
redis.call('SETEX', currentKey, 1, 1)
return 1
else
return redis.call('INCR', currentKey)
end
As I use 'Time', it reports error:Write commands not allowed after non deterministic commands.
after searching on internet, I add 'redis.replicate_commands()' as first line of lua script, but it still reports error:ERR Error running script (call to f_c89a6ee8ad732a325e530f4a69226851cde302e2): #user_script:1: user_script:1: attempt to call field 'replicate_commands' (a nil value)
Does replicate_commands need arguments or is there a way to solve my problem?
redis version:3.0
jedis version:2.9
lua version: I don't know where to find
The error attempt to call field 'replicate_commands' (a nil value) means replicate_commands() doesn't exists in the redis object. It is a Lua-side error message.
replicate_commands() was introduced until Redis 3.2. See EVAL - Replicating commands instead of scripts. Consider upgrading.
The first error message (Write commands not allowed after non deterministic commands) is a redis-side message, you cannot call write-commands (like SET, SETEX, INCR, etc) after calling non-deterministic commands (like SPOP, SCAN, RANDOMKEY, TIME, etc).
A very important part of scripting is writing scripts that are pure functions.
Scripts executed in a Redis instance are, by default, propagated to
replicas and to the AOF file by sending the script itself -- not the
resulting commands.
This is so if the Redis server is restarted, playing again the AOF log, or also if replicated in a slave, the script should deliver the same dataset.
This is why in Redis 3.2 replicate_commands() was introduced. And starting with Redis 5 scripts are always replicated as effects -- as if replicate_commands() was called when the script started. But for versions before 3.2, you simply cannot do this.
Therefore, either upgrade to 3.2 or later, or pass currentKey already calculated to the script from the client instead.
Note that creating currentKey dynamically makes your script single-instance-only.
All Redis commands must be analyzed before execution to determine
which keys the command will operate on. In order for this to be true
for EVAL, keys must be passed explicitly. This is useful in many ways,
but especially to make sure Redis Cluster can forward your request to
the appropriate cluster node.
Note this rule is not enforced in order to provide the user with
opportunities to abuse the Redis single instance configuration, at the
cost of writing scripts not compatible with Redis Cluster.
Finally, the Lua version at Redis 3.0.0 is Lua 5.1.5, same as all the way up to Redis 6 RC1.

Flink s3 read error: Data read has a different length than the expected

Using flink 1.7.0, but also seen on flink 1.8.0. We are getting frequent but somewhat random errors when reading gzipped objects from S3 through the flink .readFile source:
org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Data read has a different length than the expected: dataLength=9713156; expectedLength=9770429; includeSkipped=true; in.getClass()=class org.apache.flink.fs.s3base.shaded.com.amazonaws.services.s3.AmazonS3Client$2; markedSupported=false; marked=0; resetSinceLastMarked=false; markCount=0; resetCount=0
at org.apache.flink.fs.s3base.shaded.com.amazonaws.util.LengthCheckInputStream.checkLength(LengthCheckInputStream.java:151)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:93)
at org.apache.flink.fs.s3base.shaded.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:76)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AInputStream.closeStream(S3AInputStream.java:529)
at org.apache.flink.fs.shaded.hadoop3.org.apache.hadoop.fs.s3a.S3AInputStream.close(S3AInputStream.java:490)
at java.io.FilterInputStream.close(FilterInputStream.java:181)
at org.apache.flink.fs.s3.common.hadoop.HadoopDataInputStream.close(HadoopDataInputStream.java:89)
at java.util.zip.InflaterInputStream.close(InflaterInputStream.java:227)
at java.util.zip.GZIPInputStream.close(GZIPInputStream.java:136)
at org.apache.flink.api.common.io.InputStreamFSInputWrapper.close(InputStreamFSInputWrapper.java:46)
at org.apache.flink.api.common.io.FileInputFormat.close(FileInputFormat.java:861)
at org.apache.flink.api.common.io.DelimitedInputFormat.close(DelimitedInputFormat.java:536)
at org.apache.flink.streaming.api.functions.source.ContinuousFileReaderOperator$SplitReader.run(ContinuousFileReaderOperator.java:336)
ys
Within a given job, we generally see many / most of the jobs read successfully, but there's pretty much always at least one failure (say out of 50 files).
It seems this error is actually originating from the AWS client, so perhaps flink has nothing to do with it, but I'm hopeful someone might have an insight as to how to make this work reliably.
When the error occurs, it ends up killing the source and canceling all the connected operators. I'm still new to flink, but I would think that this is something that could be recoverable from a previous snapshot? Should I expect that flink will retry reading the file when this kind of exception occurs?
Maybe you can try to add more connection for s3a like
flink:
...
config: |
fs.s3a.connection.maximum: 320

How to interpret the RabbitMQ Message stats?

I to want get and historize queue metrics for the "Enqueued, Dequeued an Size" (Terminology formerly met on ActiveMQ).
The moving charts provided in the management plugin are not enough for the monitoring that I need to do.
So with RabbitMQ, I'm getting data from https://rabbitmq-server:15672/api/queues/myvhost
This returns json.. for a queue, I can obtain real life production data like :
"messages":0, // for "Size"
"message_stats":{
"deliver_get":171528, // for "Dequeued"
"ack":162348,
"redeliver":9513,
"deliver_no_ack":0,
"deliver":171528,
"get":0,
"publish":51293 // for "Enqueued"
(...)
I'm in particular surprised by the publish counter:
Its value can even decrease between 2 measures done with a couple of minutes of delay ! (see sample chart around 17:00)
As you can see on my data, the deliver_get is significantly larger than the publish.
https://my-rabbitmq:15672/doc/stats.html doesn't give a lot of details that could explain what I actually notice.
Also, under the message_stats object that I obtain, I'm missing the some counters like confirm and return which could be related to the enqueuing.
Are there relationships between these metrics ? (like deliver_get + messages = redeliver + publish.. but that one doesn't work with my figures)
Is there another more detailed documentation about these metrics ?

Corrupted MD-SAL queries when trying to read flows on tables: missing flow rules

I am experiencing a glitch from OpenDaylight (using Mininet).
Essentially, I am querying flow rules on specific nodes and on specific tables. The relevant code is the following, and is run by 1 separate thread per node that I am polling:
public static final InstanceIdentifier<Nodes NODES_II = InstanceIdentifier
.builder(Nodes.class).build();
public static InstanceIdentifier<Table> makeTableIId(NodeId nodeId, Short tableId) {
return NODES_IID.child(Node.class, new NodeKey(nodeId))
.augmentation(FlowCapableNode.class)
.child(Table.class, new TableKey(tableId));
}
and
InstanceIdentifier<Table> tableIId = makeTableIId(nodeId, tableId);
Optional<Table> tableOptional = dataBroker.newReadOnlyTransaction()
.read(LogicalDatastoreType.OPERATIONAL, tableIId).get();
if(!tableOptional.isPresent()) {
continue;
}
List<Flow> flows = tableOptional.get().getFlow();
The behavior: tableOptional is present, and getFlow() returns an empty list.
The observation: there ARE flow rules installed on ALL nodes on the tables I am querying, but for some reason, some of these nodes show none of these flows on none of the tables (here, tables 3, 4, 5, and 6).
The weirdness: On one of the problematic nodes, I have four rules, installed on tables 9, 13, 17 and 22 respectively. They timeout simultaneously after 150 seconds. After they disappear, the query suddenly begins to "see" the flows installed on tables 3, 4, 5, and 6, returning these for each table.
Question: How is this even possible?
EDIT I just realized that the rules whose timeout "suddenly fix everything" were also rules that generated warnings in ODL's log (OpenFlowPlugin to be more specific). I did not observe any obvious issue, so I'd sort of brushed it aside.
Here is the code relevant to the error:
https://pastebin.com/yJDZesXU
Here are the errors I get every time I install a rule that walks through these lines:
https://pastebin.com/c9HYLBt6
I must stress that these rules work as intended, and that printing them out reveals no evident formatting issue. Again, they appear fine when dumped.
My hypothesis is that this warning is a symptom of ODL "messing up" trying to store the rules in MD-SAL, which ends up messing a lot of rule-reading queries. On uninstallation of the garbage that ensues, rule-reading queries become functional again.
This makes sense to me, but then... I haven't understood how to fix these warnings, or what these warnings were about in the first place.
EDIT 2: By commenting lines suspecting of causing the warnings in the above pastebin:
//ipv4MatchBuilder.setIpv4SourceAddressNoMask(...);
//ipv4MatchBuilder.setIpv4SourcenArbitraryBitmask(...);
The warnings disappear, AND the flows appear correctly on all tables, when pinged. This confirms my hypothesis that somewhere, something wrong happens in the data store.
EDIT 3: I have found that by setting any non-trivial arbitrary bitmask, this error goes away. That is, I have tried setting an arbitrary bitmask which was neither null nor "255.255.255.255", and this error has gone away. The problem is I might like having a bitmask for the source, but an exact match on the destination. Even setting the bitmask to "127.255.255.255" (as I tried) is still unnerving. It really feels to me like this is an OpenFlowPlugin glitch, though.
EDIT 4: Steps for reproducing the bug
Install a rule with ipv4 arbitrary bitmask match, with the destination ip set, and the destination arbitrary bitmask either null or set to 255.255.255.255.
Ipv4MatchArbitraryBitMaskBuilder ipv4MatchBuilder = new Ipv4MatchArbitraryBitMaskBuilder();
ipv4MatchBuilder.setIpv4DestinationAddressNoMask(new Ipv4Address("10.0.0.1"));
ipv4MatchBuilder.setIpv4DestinationArbitraryBitmask(new DottedQuad("255.255.255.255"));
matchBuilder = new MatchBuilder().setEthernetMatch(ethernetMatchBuilder.build()).setLayer3Match(ipv4MatchBuilder.build());
... and so on ...
Extra optional steps: Install one such rule for the destination, one such rule for the source, and install equivalent rules where the bitmask is set to something else, like 127.255.255.255.
Make a query to MDSal to fetch flow information from the node on which you installed the flow rule.
Now, do "log:display" inside your ODL controller. You should have a warning about a malformed destination address. Additionally, the Table object you queried should contain no flows, so tableObject.getFlow() should return an empty list.

Parsing structured syslog with syslog-ng

I am trying to leverage the parsing of structured data feature in syslog-ng. From my firewall, I am forwarding the following message:
<14>1 2012-10-06T11:03:56.493 SRX100 RT_FLOW - RT_FLOW_SESSION_CLOSE [junos#2636.1.1.1.2.36 reason="TCP FIN" source-address="192.168.199.207" source-port="59292" destination-address="184.73.190.157" destination-port="80" service-name="junos-http" nat-source-address="50.193.12.149" nat-source-port="19230" nat-destination-address="184.73.190.157" nat-destination-port="80" src-nat-rule-name="source-nat-rule" dst-nat-rule-name="None" protocol-id="6" policy-name="trust-to-untrust" source-zone-name="trust" destination-zone-name="untrust" session-id-32="9375" packets-from-client="9" bytes-from-client="4342" packets-from-server="7" bytes-from-server="1507" elapsed-time="1" application="UNKNOWN" nested-application="UNKNOWN" username="N/A" roles="N/A" packet-incoming-interface="vlan.0"]
Based on the format of the IETF logs, it appears to be correct, but for some reason the structured data is actually being parsed as the message portion of the log and not being parsed as structured data.
On the syslog-ng side, you need to use either a syslog() source, or a tcp() source with flags(syslog-proto) set, and then the stuff will end up in variables like ${.SDATA.junos#2636.1.1.1.2.36.reason} and so on and so forth, which then you can use as you see fit.