How not to destroy container upon failure - molecule

Is there a way to have molecule not destroy the docker container that was created upon a failure?
I have the following scenario in molecule.yml:
scenario:
  name: default
  test_sequence:
    - create
    - converge
    - verify
One of the testinfra tests is failing and I'd like to inspect the container after the failure (docker exec -it xxxx /bin/bash).
However, molecule keeps on cleaning up the container:
An error occurred during the test sequence action: 'verify'. Cleaning up.
--> Scenario: 'default'
--> Action: 'destroy'

The --destroy never flag should be used (e.g. molecule test --destroy never), which leaves the instance running after a failure.
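For example, a minimal sketch of the workflow (the container name is a placeholder; check the exact flag spelling against your Molecule version):
molecule test --destroy never
docker ps                                   # find the instance Molecule left running
docker exec -it <container-name> /bin/bash  # inspect it after the failed verify step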

Related

Secret detection in GitLab CI needs a runner, I can't get it to run

I'm really new to GitLab CI and currently struggling to figure out how to get the secret detection stage to run. It keeps saying I need to configure a runner, and the pipeline gets stuck:
This job is stuck because the project doesn't have any runners online assigned to it.
If I set the tag to docker it runs, but it doesn't seem to work. Any ideas?
I added the following to the .gitlab-ci.yml file:
include:
  - template: Security/Secret-Detection.gitlab-ci.yml
When I add the above, the pipeline gets stuck, stating it has no runners assigned to run this job.
You have to assign the runner to the secret detection job. See below:
include:
  - template: Jobs/Secret-Detection.gitlab-ci.yml
secret_detection:
  tags:
    - docker # the example tag name of a GitLab runner using the Docker executor
If you would like to change the stage:
secret_detection:
  stage: test
  tags:
    - docker
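Putting both snippets together, a minimal .gitlab-ci.yml sketch (assuming a runner registered with the docker tag is online for the project):
include:
  - template: Jobs/Secret-Detection.gitlab-ci.yml

secret_detection:
  stage: test   # set this if you want the job in a specific stage
  tags:
    - docker    # must match a tag of your online runner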

step-functions-local: Can't start state machine within state machine

I've got step-functions-local and serverless-offline configured to test a state machine (let's call it #1) that triggers another state machine (#2) defined within the project.
Both show as created when I fire up the local server with sls offline start --stage dev:
[Serverless Step Functions Local] 2022-07-29 11:03:59.867: [200] CreateStateMachine <=
{"sdkResponseMetadata":null,"sdkHttpMetadata":null,"stateMachineArn":"arn:aws:states:us-east-1:123:stateMachine:Foo",
"creationDate":1659117839863}
[Serverless Step Functions Local] 2022-07-29 11:03:59.883: [200] CreateStateMachine <=
{"sdkResponseMetadata":null,"sdkHttpMetadata":null,"stateMachineArn":
"arn:aws:states:us-east-1:123:stateMachine:Bar","creationDate":1659117839882}
I then test #1 with the following command:
aws stepfunctions --endpoint http://localhost:8083 start-execution --state-machine \
arn:aws:states:us-east-1:123:stateMachine:Foo --name local-test-$RANDOM --input <JSON string payload>
#1 executes several steps successfully, including read/write S3 operations, until it reaches the step to trigger #2; at that point, it fails with an exception that reads in part:
"Error":"StepFunctions- StateMachineDoesNotExistException",
"Cause":"State Machine Does Not Exist: 'arn:aws:states:us-east-1:123:stateMachine:Bar'
(Service: AWSStepFunctions; Status Code: 400; Error Code: StateMachineDoesNotExist
Here's how the step that starts state machine #2 is defined in the .yml file for #1:
BarStateMachine:
  Type: Task
  Resource: "arn:aws:states:::states:startExecution.sync:2"
  Parameters:
    StateMachineArn: arn:aws:states:us-east-1:123:stateMachine:Bar
I can get #1 to work if, instead of pointing to the arn for the locally-created #2, I point it to the arn of the deployed version. However, this deployed version is of course a remote resource, which sort of defeats the purpose of local testing. Any ideas on how to get the local version of #2 executed properly?
Try something like export STEP_FUNCTIONS_ENDPOINT=http://localhost:8083 && serverless offline start; that should cause Step Functions Local to use itself for the Step Functions service integration.
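Concretely, a minimal sketch of that suggestion (same stage and ARNs as in the question; the variable is exported in the shell that starts the local server):
export STEP_FUNCTIONS_ENDPOINT=http://localhost:8083
sls offline start --stage dev
# in a second terminal, run the same start-execution command as above;
# the startExecution.sync step should now resolve the locally created Bar state machine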

Ansible: will it skip to next task if the current task failed

Since Ansible executes tasks in order, will it skip to the next task if the current task fails?
As documented in Error handling in playbooks:
When Ansible receives a non-zero return code from a command or a failure from a module, by default it stops executing on that host and continues on other hosts.
[...]
You can use ignore_errors to continue on in spite of the failure
From the Ansible docs:
By default Ansible stops executing tasks on a host when a task fails on that host. You can use ignore_errors to continue on in spite of the failure.
- name: Do not count this as a failure
  ansible.builtin.command: /bin/false
  ignore_errors: yes
The ignore_errors directive only works when the task is able to run and returns a value of ‘failed’. It does not make Ansible ignore undefined variable errors, connection failures, execution issues (for example, missing packages), or syntax errors.
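For example, a minimal sketch (hypothetical task names): the second task still runs on the same host because the failure of the first one is ignored:
- name: This command fails
  ansible.builtin.command: /bin/false
  ignore_errors: yes

- name: This task still runs on the same host
  ansible.builtin.debug:
    msg: The previous failure was ignored, so execution continues
Without ignore_errors, Ansible would stop executing further tasks on that host (while continuing on other hosts), as quoted above.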
REFERENCES:
Ansible docs: Error handling in playbooks: https://docs.ansible.com/ansible/latest/user_guide/playbooks_error_handling.html#error-handling-in-playbooks

Global variable in Jenkins Repository URL

I am trying to use a global Jenkins variable in the Repository URL field:
Repository URL: ${BUILD-PEND-SRC}
BUILD-PEND-SRC is defined in Configure System and set to a proper URL. If I run a shell build step with echo ${BUILD-PEND-SRC}, it displays the correct value.
However, when I run the job, I get
ERROR: Failed to check out ${BUILD-PEND-SRC}
org.tmatesoft.svn.core.SVNException: svn: E125002: Malformed URL '${BUILD-PEND-SRC}'
Which tells me that Jenkins did not resolve ${BUILD-PEND-SRC}.
I am summarizing the SO answer that solved this for Git-based Jenkins pipeline jobs but also applies to SVN-based jobs: https://stackoverflow.com/a/57065165/1994888 (credit goes to @rupesh).
Summary
Edit your job configuration.
Go to the Pipeline section.
Under Definition, keep Pipeline script from SCM.
Uncheck Lightweight checkout.
The issue seems to be with the scm-api-plugin (see the bug report in the Jenkins issue tracker), hence, it is not specific to a version control system.

Flink job started from another program on YARN fails with "JobClientActor seems to have died"

I'm a new Flink user and I have the following problem.
I use Flink on a YARN cluster to transfer related data extracted from an RDBMS to HBase.
I wrote a Flink batch application in Java with multiple ExecutionEnvironments (one per RDB table, to transfer table rows in parallel) that transfers the tables sequentially (because the call to env.execute() is blocking).
I start the YARN session like this:
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/yarn-session.sh -n 1 -s 4 -d -jm 2048 -tm 8096
Then I run my application on the started YARN session via the shell script transfer.sh. Its content is:
#!/bin/bash
export YARN_CONF_DIR=/etc/hadoop/conf
export FLINK_HOME=/opt/flink-1.3.1
export FLINK_CONF_DIR=$FLINK_HOME/conf
$FLINK_HOME/bin/flink run -p 4 transfer.jar
When I start this script manually from the command line, it works fine: jobs are submitted to the YARN session one by one without errors.
Now I need to run this script from another Java program.
For this I use
Runtime.exec("transfer.sh");
(Maybe there are better ways to do this? I have looked at the REST API, but there are some difficulties because the job manager is proxied by YARN.)
At the beginning it works as usual: the first several jobs are submitted to the session and finish successfully. But the following jobs are not submitted to the YARN session.
In /opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log I see the following error (and no other errors at DEBUG level):
The program execution failed: JobClientActor seems to have died before the JobExecutionResult could be retrieved.
I have tried to analyse this problem myself and found that the error occurs in the JobClient class while sending a ping request with a timeout to the JobClientActor (i.e. the YARN cluster).
I tried increasing several heartbeat and timeout options, such as akka.*.timeout, akka.watch.heartbeat.* and yarn.heartbeat-delay, but it didn't solve the problem: new jobs are still not submitted to the YARN session from CliFrontend.
The environment for both cases (manual call and call from another program) is the same. When I run
$ ps axu | grep transfer
it gives me this output:
/usr/lib/jvm/java-8-oracle/bin/java -Dlog.file=/opt/flink-1.3.1/log/flink-tsvetkoff-client-hadoop-dev1.log -Dlog4j.configuration=file:/opt/flink-1.3.1/conf/log4j-cli.properties -Dlogback.configurationFile=file:/opt/flink-1.3.1/conf/logback.xml -classpath /opt/flink-1.3.1/lib/flink-metrics-graphite-1.3.1.jar:/opt/flink-1.3.1/lib/flink-python_2.11-1.3.1.jar:/opt/flink-1.3.1/lib/flink-shaded-hadoop2-uber-1.3.1.jar:/opt/flink-1.3.1/lib/log4j-1.2.17.jar:/opt/flink-1.3.1/lib/slf4j-log4j12-1.7.7.jar:/opt/flink-1.3.1/lib/flink-dist_2.11-1.3.1.jar:::/etc/hadoop/conf org.apache.flink.client.CliFrontend run -p 4 transfer.jar
I also tried updating Flink to the 1.4.0 release and changing the job parallelism (even to -p 1), but the error still occurs.
I have no idea what else could be different. Is there any workaround?
Thank you for any help.
Finally I found out how to resolve the error.
Just replace Runtime.exec(...) with new ProcessBuilder(...).inheritIO().start().
I really don't know why the call to inheritIO() helps in this case, because as I understand it, it just redirects the IO streams of the child process to the parent process.
But I have checked that if I comment out that call, the program starts failing again.
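For reference, a minimal sketch of that replacement (hypothetical wrapper class; assumes transfer.sh is resolvable from the working directory or PATH):
import java.io.IOException;

public class TransferLauncher {
    public static void main(String[] args) throws IOException, InterruptedException {
        // inheritIO() forwards the child's stdin/stdout/stderr to this JVM's own streams
        // instead of leaving them attached to pipes that nothing reads
        Process process = new ProcessBuilder("transfer.sh")
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        System.out.println("transfer.sh exited with code " + exitCode);
    }
}
A plausible explanation (an assumption, not confirmed in the answer): with Runtime.exec(...) the child's output pipes are never drained, so the Flink CliFrontend can block once the pipe buffer fills up, which would match jobs silently no longer being submitted.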