What is causing rules to fire repeatedly? - falco

I have created a local test environment using minikube to test custom falco rules.
The goal is to search for keywords in the namespace and pod names and set an Info priority on them so they can be filtered out in Kibana.
The following are the custom macros and rules that I have written:
- macro: ns_contains_whitelist_terms
  condition: k8s.ns.name = monitoring or k8s.ns.name = jenkins

- macro: pod_name_contains_whitelist_terms
  condition: >
    (k8s.pod.name startswith meseeks or
    k8s.pod.name startswith jenkins or
    k8s.pod.name startswith wazuh)

- rule: priority_whitelist_ns_alert
  desc: add an Info priority to the monitoring and jenkins namespaces
  condition: ns_contains_whitelist_terms
  output: "Namespace is jenkins or monitoring findme1"
  priority: INFO
  tag: whitelist

- rule: priority_whitelist_pod_name_alert
  desc: add an Info priority to pods that start with wazuh, jenkins or meseeks
  condition: pod_name_contains_whitelist_terms
  output: "Pod name starts with wazuh, jenkins or meseeks findme2"
  priority: INFO
  tag: whitelist
I have created namespaces and pods to test the rules, and they are firing when I expect them to (when falco starts up, when I spawn shells or interact with the pods, for example).
However, the alerts are firing repeatedly, sometimes hundreds at a time, so that the output when I grep the logs looks something like this sample.
Out of curiosity, I took line counts of the different rule alerts when different events occurred and noted that they are not the same. See the below table:
Event        | Namespace rule fired # | Pod name rule fired #
------------ | ---------------------- | ---------------------
Startup      | 6                      | 4
Spawn shell  | 106                    | 55
apt update   | 943                    | 23
install wget | 84                     | 26
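(For reference, counts like these can be gathered by grepping the Falco output for the unique markers in each rule's output string; a minimal sketch is below. The "falco" namespace and "app=falco" label are assumptions about how Falco was deployed, not something stated in the question.)
# Hypothetical sketch: count rule firings by grepping for the output markers.
kubectl logs -n falco -l app=falco --tail=-1 | grep -c findme1   # namespace rule
kubectl logs -n falco -l app=falco --tail=-1 | grep -c findme2   # pod name rule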
The only two reasons I can think of that these rules would be triggered so many times are:
1. I have written the rules incorrectly, or
2. there are events taking place in the background (not directly triggered by me) that are causing the rules to fire repeatedly.
I believe 2 is the more likely, but I would appreciate it if anyone could confirm that the rules I have written look alright, or share any other insights.
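(A quick syntax check on the rules file can be done with Falco's validate option; a minimal sketch, assuming a path for the custom rules file. This only confirms that the YAML and conditions parse, not how often the conditions will match.)
# Hypothetical sketch: validate the custom rules file. The path is assumed.
falco -V /etc/falco/rules.d/custom_rules.yaml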

Related

semgrep doesn't scan, reports METRICS message

I run semgrep with a local YAML rule, but I get the message below, which seems to block the result; there is no obvious error from semgrep.
The result is:
METRICS: Using configs from the Registry (like --config=p/ci) reports
pseudonymous rule metrics to semgrep.dev.
16:38:04 To disable Registry rule metrics, use "--metrics=off".
Running 1 rules...
16:38:04 ran 1 rules on 0 files: 0 findings
The environment is: Linux Docker + Python 3.6.8 + semgrep 0.81.0.
I already added "--metrics=off" to the command according to its documentation. Is this a bug? It seems this message blocked the scanning (ran 1 rules on 0 files). Does anybody know the reason? Thanks.
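(For comparison, a minimal invocation with a local rule, metrics disabled, and an explicit target path might look like the sketch below; the rule path and target directory are assumed names. "ran 1 rules on 0 files" usually means no files under the scan path matched the rule's languages, rather than that the METRICS notice blocked anything.)
# Hypothetical sketch: run one local rule against an explicit target directory
# with registry metrics disabled. "rules/my_rule.yaml" and "src/" are assumed.
semgrep --config rules/my_rule.yaml --metrics=off src/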

How to interrupt triggered GitLab pipelines

I'm using a webhook to trigger my GitLab pipeline. Sometimes this trigger fires a bunch of times, but only the last pipeline needs to run (static site generation). Right now, it runs as many pipelines as I have triggered. My pipeline takes 20 minutes, so sometimes it is running for the rest of the day, which is completely unnecessary.
https://docs.gitlab.com/ee/ci/yaml/#interruptible and https://docs.gitlab.com/ee/user/project/pipelines/settings.html#auto-cancel-pending-pipelines only work on pushed commits, not on triggers
A similar problem is discussed in gitlab-org/gitlab-foss issue 41560
Example of a use-case:
I want to always push the same Docker "image:tag", for example: "myapp:dev-CI". The idea is that "myapp:dev-CI" should always be the latest Docker image of the application that matches the HEAD of the develop branch.
However, if 2 commits are pushed, then 2 pipelines are triggered and executed in parallel, and the latest triggered pipeline often finishes before the oldest one.
As a consequence the pushed Docker image is not the latest one.
Proposition:
As a workaround on *nix, you can get the running pipelines from the API and wait until they finish, or cancel them with the same API.
In the example below, the script checks for running pipelines with lower IDs for the same branch and sleeps.
The jq package is required for this code to work.
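(The script itself is not reproduced here; a minimal sketch of the idea, assuming it runs inside a CI job so the predefined CI_* variables are available, and that API_TOKEN is a CI variable holding an API token:)
#!/bin/bash
# Hypothetical sketch (not the original script): sleep while older pipelines
# for the same branch are still running. CI_* variables are provided by
# GitLab CI; API_TOKEN is an assumed CI variable with API access.
while true; do
  older=$(curl --silent --header "PRIVATE-TOKEN: ${API_TOKEN}" \
    "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/pipelines?ref=${CI_COMMIT_REF_NAME}&status=running" \
    | jq --argjson id "${CI_PIPELINE_ID}" '[.[] | select(.id < $id)] | length')
  [ "${older}" -eq 0 ] && break
  echo "Waiting for ${older} older running pipeline(s) on this branch..."
  sleep 30
done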
Or:
Create a new runner instance
Configure it to run jobs marked as deploy with concurrency 1
Add the deploy tag to your CD job.
It's now impossible for two deploy jobs to run concurrently.
To guard against a situation where an older pipeline may run after a newer one, add a check to your deploy job that exits if the current pipeline ID is lower than that of the current deployment (see the sketch below).
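(A minimal sketch of such a guard, assuming the deploy job records the pipeline ID of the last successful deployment in a file on the deploy target; the path is made up for illustration.)
# Hypothetical sketch: skip the deploy if a newer pipeline has already deployed.
# /var/lib/myapp/last_deployed_pipeline_id is an assumed location.
STATE_FILE=/var/lib/myapp/last_deployed_pipeline_id
last_deployed=$(cat "${STATE_FILE}" 2>/dev/null || echo 0)
if [ "${CI_PIPELINE_ID}" -lt "${last_deployed}" ]; then
  echo "Pipeline ${CI_PIPELINE_ID} is older than deployed pipeline ${last_deployed}; skipping."
  exit 0
fi
# ... perform the actual deployment here ...
echo "${CI_PIPELINE_ID}" > "${STATE_FILE}"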
Slight modification:
For me, one slight change: I kept the global concurrency setting the same (8 runners on my machine so concurrency: 8).
But I tagged one of the runners with deploy and added limit: 1 to its config.
I then updated my .gitlab-ci.yml to use the deploy tag in my deploy job.
Works perfectly: my code_tests job can run simultaneously on 7 runners but deploy is "single threaded" and any other deploy jobs go into pending state until that runner is freed up.

Flink job started from another program on YARN fails with "JobClientActor seems to have died"

I'm a new Flink user and I have the following problem.
I use Flink on a YARN cluster to transfer related data extracted from an RDBMS to HBase.
I wrote a Flink batch application in Java with multiple ExecutionEnvironments (one per RDB table, to transfer table rows in parallel) that transfers the tables one by one sequentially (because the call to env.execute() is blocking).
I start the YARN session like this:
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/yarn-session.sh -n 1 -s 4 -d -jm 2048 -tm 8096
Then I run my application on the started YARN session via the shell script transfer.sh. Its content is:
#!/bin/bash
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/flink run -p 4 transfer.jar
When I start this script from the command line manually, it works fine - jobs are submitted to the YARN session one by one without errors.
Now I need to be able to run this script from another Java program.
For this aim I use
Runtime.exec("transfer.sh");
(Maybe there are better ways to do this? I have looked at the REST API, but there are some difficulties because the JobManager is proxied by YARN.)
At the beginning it works as usual - the first several jobs are submitted to the session and finish successfully. But the following jobs are not submitted to the YARN session.
In /opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log I see this error (and no other errors are found at DEBUG level):
The program execution failed: JobClientActor seems to have died before the JobExecutionResult could be retrieved.
I have tried to analyse this problem myself and found out that this error occurs in the JobClient class while sending a ping request with a timeout to the JobClientActor (i.e. the YARN cluster).
I tried to increase several heartbeat and timeout options like akka.*.timeout, akka.watch.heartbeat.* and yarn.heartbeat-delay, but it doesn't solve the problem - new jobs are not submitted to the YARN session from CliFrontend.
The environment for both cases (manual call and call from another program) is the same. When I call
$ ps axu | grep transfer
it will give me output
/usr/lib/jvm/java-8-oracle/bin/java -Dlog.file=/opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log -Dlog4j.configuration=file:/opt/flink-1.3.1/conf/log4j-cli.properties -Dlogback.configurationFile=file:/opt/flink-1.3.1/conf/logback.xml -classpath /opt/flink-1.3.1/lib/flink-metrics-graphite-1.3.1.jar:/opt/flink-1.3.1/lib/flink-python_2.11-1.3.1.jar:/opt/flink-1.3.1/lib/flink-shaded-hadoop2-uber-1.3.1.jar:/opt/flink-1.3.1/lib/log4j-1.2.17.jar:/opt/flink-1.3.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.3.1/lib/flink-dist_2.11-1.3.1.jar:::/etc/hadoop/conf org.apache.flink.client.CliFrontend run -p 4 transfer.jar
I also tried updating Flink to the 1.4.0 release and changing the parallelism of the job (even to -p 1), but the error still occurred.
I have no idea what could be different. Is there any workaround, by the way?
Thank you for any help.
Finally, I found out how to resolve the error.
Just replace Runtime.exec(...) with new ProcessBuilder(...).inheritIO().start().
I really don't know why the call to inheritIO() helps in this case, because as I understand it, it just redirects the IO streams from the child process to the parent process. (A likely explanation is that, when the child's stdout/stderr are never read, its output buffers eventually fill up and the child process blocks; inheritIO() avoids this by forwarding the streams to the parent.)
But I have verified that if I comment out this call, the program starts to fail again.

Snakemake MissingOutputException latency-wait ignored

I am attempting to run some Picard tools metrics collection in Snakemake. A --dryrun works fine with no errors. When I actually run the Snakefile, I receive a MissingOutputException for reasons I do not understand.
First, here is my rule:
rule CollectAlignmentSummaryMetrics:
    input:
        "bam_input/final/{sample}/{sample}.ready.bam"
    output:
        "bam_input/final/{sample}/metrics/{reference}/alignment_summary.metrics"
    params:
        reference=config['reference']['file'],
        memory="10240m"
    run:
        "java -Xmx{params.memory} -jar $HOME/software/picard/build/libs/picard.jar CollectAlignmentSummaryMetrics R={params.reference} I={input} O={output}"
Now the error:
snakemake --latency-wait 120 -s metrics.snake -p
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
38 CollectAlignmentSummaryMetrics
1 all
39
rule CollectAlignmentSummaryMetrics:
input: bam_input/final/TB5173-T14/TB5173-T14.ready.bam
output: bam_input/final/TB5173-T14/metrics/GRCh37/alignment_summary.metrics
jobid: 7
wildcards: reference=GRCh37, sample=TB5173-T14
Error in job CollectAlignmentSummaryMetrics while creating output file bam_input/final/TB5173-T14/metrics/GRCh37/alignment_summary.metrics.
MissingOutputException in line 21 of /home/bwubb/projects/PD1WES/metrics.snake:
Missing files after 5 seconds:
bam_input/final/TB5173-T14/metrics/GRCh37/alignment_summary.metrics
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
The --latency-wait is completely ignored. I have even tried bumping it up to 84600. If I run the intended Picard java command myself, it executes without a problem. I've made several Snakemake pipelines without any mysterious issues, so this is driving me quite mad. Thank you for any insight!
Thanks for reporting.
It is a bug that latency-wait is not propagated when using the run directive. I have fixed that in the master branch.
In your rule, you use the run directive. After run, Snakemake expects plain Python code, but you simply provide a string. This means that Python will simply create the string and then exit. What you really want here is the shell directive. By using the shell directive, your current problem will be fixed, and you should not be affected by the bug. There is also no need to modify latency-wait. In any case, the fix for the latency-wait bug will be in the next Snakemake release.

Unable to start ActiveMQ on Linux

I am trying to get the ActiveMQ server running on a Raspberry Pi Debian Squeeze box. Everything appears to be installed correctly, but when I try to start the service I get the following...
root@raspberrypi:/var/www/activemq/apache-activemq-5.7.0# bin/activemq
INFO: Loading '/etc/default/activemq'
INFO: Using java '/usr/jre1.7.0_07/bin/java'
/usr/jre1.7.0_07/bin/java: 1: /usr/jre1.7.0_07/bin/java:ELF0
4°: not found
/usr/jre1.7.0_07/bin/java: 2: /usr/jre1.7.0_07/bin/java: Syntax error: "(" unexpected
Tasks provided by the sysv init script:
restart - stop running instance (if there is one), start new instance
console - start broker in foreground, useful for debugging purposes
status - check if activemq process is running
setup - create the specified configuration file for this init script
(see next usage section)
Configuration of this script:
The configuration of this script can be placed on /etc/default/activemq or /root/.activemqrc.
To use additional configurations for running multiple instances on the same operating system
rename or symlink script to a name matching to activemq-instance-<INSTANCENAME>.
This changes the configuration location to /etc/default/activemq-instance-<INSTANCENAME> and
$HOME/.activemqrc-instance-<INSTANCENAME>. Configuration files in /etc have higher precedence.
root@raspberrypi:/var/www/activemq/apache-activemq-5.7.0#
It looks like there is an error somewhere, but I am fairly new at this and don't know where to look.
Just adding an answer because, as per the documentation, the command is wrong.
To start ActiveMQ:
Navigate to [installation directory]/bin and run ./activemq start, or simply bin/activemq start from the installation directory.
If you want to see it live in a window, use bin/activemq console.
To stop it, you have to kill the process (see the sketch below).
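(Putting the commands together; a sketch assuming the installation path shown in the question.)
# Hypothetical sketch using the install path from the question.
cd /var/www/activemq/apache-activemq-5.7.0

# Start the broker in the background:
bin/activemq start

# Or run it in the foreground to watch the output live:
bin/activemq console

# To stop it, as described above, kill the broker process (PID is illustrative):
# kill <activemq-pid>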
The default ActiveMQ "getting started" link sends you here: http://activemq.apache.org/getting-started.html which is the "Getting Started Guide for ActiveMQ 4.x".
If you scroll down the main documentation page, you will find this link with the proper commands:
http://activemq.apache.org/version-5-getting-started.html