In Tekton it's possible to set up a pipeline with multiple tasks that can (potentially) run in parallel and that access the same workspace. However, the documentation is not completely clear on what happens in this situation. Does it "lock" the workspace and force one task to wait until the other is done using it, or can both tasks access and modify it at the same time (potentially interfering with each other's execution)?
Both tasks can access and modify it at the same time. There is no locking. Be careful that you do not have a concurrency problem!
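If the tasks must not step on each other, the usual workaround is to order them explicitly with runAfter so they touch the shared workspace one at a time. A minimal sketch, assuming hypothetical Task and workspace names:

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: shared-workspace-pipeline
spec:
  workspaces:
    - name: shared-data
  tasks:
    - name: write-task
      taskRef:
        name: writer            # hypothetical Task that writes to the workspace
      workspaces:
        - name: output
          workspace: shared-data
    - name: read-task
      taskRef:
        name: reader            # hypothetical Task that reads from it
      runAfter:
        - write-task            # forces serial access to shared-data
      workspaces:
        - name: input
          workspace: shared-data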
We are planning to implement a locking mechanism for our documents using the xdmp:lock-acquire API in MarkLogic, with no timeout option. A document would stay locked until the user edits and saves it. As part of this, we need to release all the locks at a specified time, say 12:00 AM every day.
For this, we could use the xdmp:lock-release API, but if there are many documents it would take some time to complete.
Can someone suggest a better way to achieve this in MarkLogic?
If you have a potentially large set of locks that you need to process, and are concerned about timeouts or other issues from doing all of the work in a single transaction, then you can break up the work into smaller chunks or individual transactions.
There are a variety of batch processing tools and frameworks to do that. CoRB is one option that makes it easy to plug in custom selectors and processing scripts, and to execute against giant sets.
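For orientation, a CoRB2 job is typically driven by a small properties file plus two modules: a URIs module that selects the lock URIs and a process module that releases each one. A minimal sketch (the module paths are hypothetical; the option names are standard CoRB2 options):

XCC-CONNECTION-URI=xcc://user:password@localhost:8000
URIS-MODULE=/ext/corb/select-locked-uris.xqy
PROCESS-MODULE=/ext/corb/release-lock.xqy
THREAD-COUNT=8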
If you are looking to initiate the work from a MarkLogic scheduled task and perform all of the work within MarkLogic, then you could spawn multiple tasks to process subsets.
A simple example demonstrating how to set a "chunk size" for each transaction and to keep spawning more work:
declare function local:release-locks($locks, $chunk-size) {
  if (exists($locks))
  then (
    (: release this chunk of locks (you might apply some sort of filter to restrict to a subset,
       and maybe a try/catch in case a lock gets released before this runs) :)
    subsequence($locks, 1, $chunk-size) ! xdmp:node-uri(.) ! xdmp:lock-release(.),
    (: now spawn the next set to be released in a separate transaction :)
    xdmp:spawn-function(
      function() {
        local:release-locks(subsequence($locks, $chunk-size + 1), $chunk-size)
      },
      <options xmlns="xdmp:eval">
        <update>true</update>
        <commit>auto</commit>
      </options>)
  )
  else () (: nothing left to do, stop spawning work :)
};

let $locks := xdmp:document-locks()
let $chunk-size := 1000
return local:release-locks($locks, $chunk-size)
If you are looking to go down this route, there are some libraries available:
https://github.com/bradmann/marklogic-spawnlib
https://github.com/mblakele/taskbot
The risk of spawning multiple items onto the task server is that if there is a restart or interruption, some tasks may not execute and some locks may remain unreleased. But if you are just looking to release all of the locks, you could simply re-run the script to kick off another round.
This is not clear to me from the docs. Here's our scenario, and why we need this, as succinctly as I can put it:
We have 60 coordinators running, launching workflows (usually hourly), some of which have sub-workflows (some with multiple in parallel). This works out to around 40 workflows running at any given time. However, when the cluster is under load or some underlying service is slow (e.g. Impala or HBase), workflows run longer than usual and back up, so we can end up with 80+ workflows (including sub-workflows) running.
This sometimes results in ALL workflows hanging indefinitely, because the pool only has enough memory and cores allocated for Oozie to start the launcher jobs (i.e. oozie:launcher:T=sqoop:W=JobABC:A=sqoop-d596:ID=XYZ), but not their corresponding actions (i.e. oozie:action:T=sqoop:W=JobABC:A=sqoop-d596:ID=XYZ).
We could simply allocate enough resources to the pool to accommodate these spikes, but that would be a massive waste (hundreds of cores and GBs that other pools/tenants could never use).
So I'm trying to enforce some limit on the number of workflows running, even if that means some will run behind sometimes. BTW, all our coordinators are configured with execution=LAST_ONLY, and any delayed workflow will simply catch up fully on the next run. We are on CDH 5.13 with Oozie 4.1; pools are set up with the DRF scheduler.
Thanks in advance for your ideas.
AFAIK there is no configuration parameter that lets you control the number of workflows running at a given time.
If your coordinators are scheduled to run in approximately the same time window, you could consider collapsing them into just one coordinator/workflow and using the fork/join control nodes to control the degree of parallelism. You can distribute your actions across K queues in your workflow, and this ensures that you never have more than K actions running at the same time, limiting the load on the cluster.
We use a script to automatically generate the fork queues inside the workflow and distribute the actions (of course this only works for actions that can run in parallel, i.e. there are no data dependencies, etc.).
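For illustration, a trimmed workflow skeleton with K=2 queues (all action and node names are hypothetical); each queue chains its actions sequentially, so at most two actions are in flight at any moment:

<workflow-app name="collapsed-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="fork-queues"/>
    <fork name="fork-queues">
        <path start="queue-1-action-1"/>
        <path start="queue-2-action-1"/>
    </fork>
    <action name="queue-1-action-1">
        <!-- first action of queue 1 (e.g. a sqoop or shell action) goes here -->
        <ok to="join-queues"/>      <!-- or chain to queue-1-action-2 -->
        <error to="fail"/>
    </action>
    <action name="queue-2-action-1">
        <!-- first action of queue 2 goes here -->
        <ok to="join-queues"/>
        <error to="fail"/>
    </action>
    <join name="join-queues" to="end"/>
    <kill name="fail">
        <message>One of the queued actions failed</message>
    </kill>
    <end name="end"/>
</workflow-app>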
Hope this helps
I want to manage a huge workflow in Camunda.
I have decided to split this into different processes like Create, Configuration, Review & Confirm. Each of these processes has 10 to 15 tasks. These processes should be executed in sequence.
If I want to design my workflow like this, how do I link the processes? What is the proper way to do modular design in Camunda?
You would probably go with some kind of SubProcess. If you plan to model different processes, you will most likely use Call Activities and execute them one after another in some kind of root process.
Beware of the fact that each sub process starts its own process instance, so you have to handle different execution scopes. That becomes relevant when you request information from the system, e.g. the list of UserTasks: you cannot use the processInstanceId of the root process in this case and will have to use a businessKey instead.
You also have to handle the process variables and decide which variables you want to propagate to the sub process.
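As a sketch, the root process could chain Call Activities, each propagating only the variables its sub process needs (the process keys and variable names below are made up):

<bpmn:callActivity id="CallCreate" name="Create" calledElement="create-process">
  <bpmn:extensionElements>
    <camunda:in source="requestId" target="requestId"/>        <!-- pass only what's needed -->
    <camunda:out source="createResult" target="createResult"/> <!-- map results back to the root -->
  </bpmn:extensionElements>
</bpmn:callActivity>
<bpmn:sequenceFlow id="toConfiguration" sourceRef="CallCreate" targetRef="CallConfiguration"/>
<bpmn:callActivity id="CallConfiguration" name="Configuration" calledElement="configuration-process"/>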
We are using the Jenkins Build Flow plugin (https://wiki.jenkins-ci.org/display/JENKINS/Build+Flow+Plugin) to run our test cases by dividing them into small sub test cases and testing them in parallel.
The current problem is that even if one of the jobs fails, the other parallel jobs and the hosting flow job keep running, which is a big waste of resources.
I checked the docs; there seems to be no way to control the jobs inside the parallel {} block. Any ideas how to deal with that?
Looking at the code, I don't see a way to achieve that. I would ask the user mailing list for help.
I am thinking of using Guard / Rescue embedded in Parallel to do this.
Adding failFast: true within the parallel block will cause the build to fail as soon as one of the parallel nodes fails.
You can view this as an example.
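A minimal sketch in Jenkins Pipeline (scripted) syntax, where the parallel step accepts a failFast entry in its branch map; the downstream job names are hypothetical:

def branches = [failFast: true]                      // abort remaining branches once one fails
branches['suite-a'] = { build job: 'test-suite-a' }  // hypothetical downstream test jobs
branches['suite-b'] = { build job: 'test-suite-b' }
parallel branches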
In our environment we have quite a few long-running functional tests which currently tie up build agents and force other builds to queue. Since these agents are only waiting on test results they could theoretically just be handing off the tests to other machines (test agents) and then run queued builds until the test results are available.
For CI builds (including unit tests) this should remain inline as we want instant feedback on failures, but it would be great to get a better balance between the time taken to run functional tests, the lead time of their results, and the throughput of our collective builds.
As far as I can tell, TeamCity does not natively support this scenario so I'm thinking there are a few options:
Spin up more agents and assign them to a 'Test' pool. Trigger functional build configs to run on these agents (triggered by successful CI builds). While this seems the cleanest, it doesn't scale very well: we then have the lead time of purchasing licenses, and we often need to run tests in alternate environments, which would temporarily double (or more) the required number of test agents.
Add builds or build steps to launch tests on external machines, then immediately mark the build as successful so queued builds can be processed; then, when the tests are complete, mark the build as succeeded/failed. This relies on being able to update the results of a previous build (via the REST API, perhaps?). It also feels ugly to mark something as successful and update it as failed later, but we could always be selective in what we monitor so we only see the final result.
Just keep spinning up agents until we no longer have builds queueing. The problem with this is that it's a moving target. If we knew where the plateau was (or whether it existed) this would be the way to go, but our usage pattern means this isn't viable.
Has anyone had success with a similar scenario, or knows pros/cons of any of the above I haven't thought of?
Your description of the available options seems to be pretty accurate.
If you want live updates of build progress, you will need one TeamCity agent "busy" for each running build.
The only downside here seems to be the agent licenses cost.
If the testing builds just launch processes on other machines, the TeamCity agent processes themselves can be run on a low-end machine and even many agents on a single computer.
An extension to your second scenario could be two build configurations instead of a single one: the first starts the external process, and the second is triggered on the external process's completion and then publishes all the external process results as its own. The second can also have a snapshot dependency on the starting build to maintain the relation.
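One way the second configuration could publish the external results as its own is via TeamCity service messages, which any build step can write to stdout. A minimal sketch (the suite and test names are hypothetical):

#!/bin/sh
# Re-emit results recorded on the external machine as this build's own tests.
echo "##teamcity[testSuiteStarted name='functional']"
echo "##teamcity[testStarted name='login_flow']"
# if the external run recorded a failure for this test:
echo "##teamcity[testFailed name='login_flow' message='assertion failed on external host']"
echo "##teamcity[testFinished name='login_flow']"
echo "##teamcity[testSuiteFinished name='functional']"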
For anyone curious, we ended up buying more agents and assigning them to a test pool. Investigation showed that it isn't possible to update build results (I can definitely understand why this ugliness isn't supported out of the box).