We are orchestrating a data pipeline with AWS Step Functions and need to run EMR jobs in parallel.
I have tried using a Map state and it works as expected. The only problem with Map is that if one step fails, it cancels all the other steps as well. To get around this, I am wondering whether we can create an array of steps and pass it dynamically to Branches in a Parallel state, but I have not been able to do this, as Branches does not accept strings.
Is there a workaround, or can Branches in a Parallel state only be hard-coded? Could States.Array() be of help in this situation?
Wrap the inner state machine in a one-branch Parallel state and add error/retry policies to it. Basically, you want to catch all errors and ensure that each iteration always succeeds.
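A minimal sketch of that pattern in Amazon States Language, used as the Map iterator (the state names, ResultPath, and the EMR addStep integration here are assumptions, not your actual definition):

{
  "StartAt": "TryStep",
  "States": {
    "TryStep": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "RunEMRStep",
          "States": {
            "RunEMRStep": {
              "Type": "Task",
              "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
              "End": true
            }
          }
        }
      ],
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "ResultPath": "$.error",
          "Next": "StepFailed"
        }
      ],
      "End": true
    },
    "StepFailed": {
      "Type": "Pass",
      "End": true
    }
  }
}

Because the Catch swallows every error, each Map iteration reports success and the remaining iterations keep running; the error details land in $.error for later inspection.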
For anyone looking for a solution to the stated problem: as suggested by Pooya, I used a Catch block on the Task inside the Map rather than keeping it at the Map level. The state machine looks like this:
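A minimal sketch of that shape (the original was shown as a screenshot; state names, ItemsPath, and the EMR addStep parameters here are assumptions):

{
  "StartAt": "RunEMRSteps",
  "States": {
    "RunEMRSteps": {
      "Type": "Map",
      "ItemsPath": "$.steps",
      "Iterator": {
        "StartAt": "AddStep",
        "States": {
          "AddStep": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
            "Catch": [
              {
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": "StepFailed"
              }
            ],
            "End": true
          },
          "StepFailed": {
            "Type": "Pass",
            "End": true
          }
        }
      },
      "End": true
    }
  }
}

With the Catch on the Task itself, a failed EMR step transitions its own iteration to StepFailed instead of failing the whole Map, so the other parallel steps run to completion.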
Can every branch of a GStreamer pipeline have a different state?
For example, the main branch is in one state while a different branch is in another state?
I've tried an implementation using several pipelines, but this approach did not resolve the issue. The issue cannot be solved with multiple pipelines, since the branches need to change their state based on the resolution of the active pipeline.
Thank you very much.
The situation is that we are moving from Jenkins to GitLab CI. Every time a stage runs in the pipeline, a new container is created. I would like to know whether it is possible to reuse the container from the previous stage, i.e. a single container. The GitLab executor is Docker.
I want to preserve the state of one container.
No, this is not possible in a practical way with the docker executor. Each job is executed in its own container. There is no setting to change this behavior.
Keep in mind that jobs (even across stages) can run concurrently and that jobs can land on runners on completely different underlying machines. Therefore, this is not really practical.
I have a pipeline in GitLab that consists of multiple stages.
Each stage has a few jobs and produces artifacts that are passed to the next stage if all of the stage's jobs pass.
Something similar to this screenshot:
Is there any way to achieve something similar in GitHub Actions?
Generally speaking, you can get very close to what you have above in GitHub Actions. You'd trigger a workflow on push and pull_request events so that it runs when someone pushes to your repository, then you'd define each of your jobs. You would then use the needs syntax to define dependencies instead of stages (similar to the 14.2 needs syntax from GitLab), so for example your auto-deploy job would have needs: [test1, test2].
The one thing you will not be able to replicate is the manual wait before pushing to production. GitHub Actions does not have the ability to pause at a job step and wait for a manual action. You can typically work around this by running workflows on the release event, or by kicking off the whole pipeline manually with a given variable set.
When looking at how to handle artifacts, check out the answer to this other Stack Overflow question: Github actions share workspace/artifacts between jobs?
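As a rough sketch, the needs-based wiring could look like the following (the script paths are hypothetical placeholders):

name: CI
on: [push, pull_request]

jobs:
  test1:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/test1.sh    # hypothetical test script
  test2:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/test2.sh    # hypothetical test script
  auto-deploy:
    needs: [test1, test2]          # dependency graph instead of stages
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh   # hypothetical deploy script

test1 and test2 run concurrently; auto-deploy starts only after both succeed, mirroring GitLab's stage ordering.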
I have a dynamic_task which kicks off a number of python_tasks. However, as soon as one of the python_tasks fails, the other ones that are still running fail as well. Is this by design? Is there a way to change this behavior so that the other tasks can still complete without failing?
This is by design, as a means to save resources, but it is configurable. Presumably, dynamic tasks are related to each other, and downstream tasks will need the output of all of them. So if one fails, the default behavior is to fail the rest.
If you'd like to change this, pass a float as this argument in your dynamic task's decorator: https://github.com/lyft/flytekit/blob/d4cfedc4c580f08bf904e6e474a0b948a4608737/flytekit/common/tasks/sdk_dynamic.py#L84
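For illustration, a minimal sketch against the legacy flytekit API linked above (the argument name allowed_failure_ratio and the task bodies are assumptions; check them against your flytekit version):

from flytekit.sdk.tasks import dynamic_task, inputs, outputs, python_task
from flytekit.sdk.types import Types

@inputs(item=Types.String)
@outputs(out=Types.String)
@python_task
def process_one(wf_params, item, out):
    # Placeholder work for a single element.
    out.set(item.upper())

@inputs(items=[Types.String])
@dynamic_task(allowed_failure_ratio=0.2)  # tolerate up to 20% failed sub-tasks
def fan_out(wf_params, items):
    for item in items:
        yield process_one(item=item)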
The idea is that partial failures are not tolerated within a data-passing DAG. If some node fails, then by definition the data is partial.
But for dynamic array tasks, Flyte makes a special provision (via the Array task plugin), which lets users specify a ratio of acceptable successful tasks.
We are using the Jenkins Build Flow plugin (https://wiki.jenkins-ci.org/display/JENKINS/Build+Flow+Plugin) to run our test cases by dividing them into small subsets and testing them in parallel.
The current problem is that even if one of the jobs fails, the other parallel jobs and the hosting flow job keep running, which is a big waste of resources.
I checked the docs and there is no way to control the jobs inside the parallel {}. Any ideas how to deal with that?
Looking at the code, I don't see a way to achieve that. I would ask the user mailing list for help.
I am thinking of using guard/rescue embedded in parallel to do this.
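For reference, a guard/rescue sketch in the Build Flow DSL (job names are placeholders); note that rescue behaves like a finally block, so it guarantees cleanup rather than cancelling the sibling jobs:

guard {
    parallel(
        { build("test-suite-a") },  // placeholder downstream jobs
        { build("test-suite-b") }
    )
} rescue {
    build("cleanup")  // runs whether the guarded block passed or failed
}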
Adding failFast: true within the parallel block causes the build to fail as soon as one of the parallel nodes fails.
You can view this as an example.
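A minimal sketch of the failFast approach, using the Jenkins Pipeline parallel step's syntax (branch names and the test command are placeholders):

parallel(
    failFast: true,  // abort the remaining branches as soon as one fails
    "suite-a": {
        node { sh "./run_tests.sh suite-a" }  // hypothetical test command
    },
    "suite-b": {
        node { sh "./run_tests.sh suite-b" }
    }
)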