Number of concurrent pipeline executions config - Spinnaker

I have been looking at the Spinnaker code to see where the MAX value for the number of concurrent pipeline executions is defined.
The closest I found was new JedisPipelineStack("PIPELINE_QUEUE", jedisPool) in OrcaPersistenceConfiguration.groovy. I'm also trying to find the configmap for Orca. Any pointers?
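For what it's worth, the per-pipeline concurrency knob (the option usually surfaced in the UI as disabling concurrent executions) is stored in the pipeline JSON itself rather than in Orca's configuration. A minimal sketch with illustrative values; this only limits a single pipeline to one running execution and may not be the global MAX being asked about:

    {
      "name": "example-pipeline",
      "limitConcurrent": true,
      "keepWaitingPipelines": false,
      "stages": []
    }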

Related

Snakemake 200000 job submission

I have 200000 fasta sequences. I am using GATK to call variants and have created a wildcard for every sequence. Now I would like to submit 200000 jobs using snakemake. Will this cause a problem for the cluster? Is there a way to submit jobs in sets of 10-20?
First off, it might take some time to calculate the DAG, although I have been told DAG calculation has recently been greatly improved. Even so, it might be wise to split the work into batches.
Most clusters won't allow you to submit more than X jobs at the same time, usually somewhere in the range of 100-1000. I believe the documentation is not fully correct here, but when using --cluster, the --jobs argument controls how many jobs are submitted at the same time, so with snakemake --jobs 20 --cluster "myclustercommand" you should be able to control this. Note that this controls the number of submitted jobs, not the number of actively running jobs. It might be that all your jobs end up waiting in the queue, so it is probably best to check with your cluster administrator what the maximum number of submitted jobs is and stay as close to that number as possible.
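For example, on a SLURM cluster (sbatch and its flags are just an assumption about your scheduler; substitute your own submission command) something along these lines keeps at most 20 jobs submitted at any one time:

    # keep at most 20 jobs submitted to the scheduler at once
    # (sbatch resource flags below are illustrative)
    snakemake --jobs 20 --cluster "sbatch --cpus-per-task=4 --mem=8G --time=02:00:00"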

Azure Data Factory Limits

I have created a simple pipeline that works as follows:
Generates an access token via an Azure Function. No problem.
Uses a Lookup activity to retrieve a table and iterate through its rows (4 columns by 0.5M rows). No problem.
For Each activity (sequential off, batch-size = 10):
(within For Each): Set some variables for checking important values.
(within For Each): Pass values through web activity to return a json.
(within For Each): Copy Data activity mapping parts of the json to the sink-dataset (postgres).
Problem: The pipeline slows to a crawl after approximately 1000 entries/inserts.
I was looking at this documentation regarding the limits of ADF.
ForEach items: 100,000
ForEach parallelism: 20
I would expect that this falls within those limits, unless I'm misunderstanding it.
I also cloned the pipeline and tried it by offsetting the query in one, and it tops out at 2018 entries.
Anyone with more experience be able to give me some idea of what is going on here?
As a suggestion, whenever I have to fiddle with variables inside a ForEach, I make a new pipeline for the ForEach body and call it from within the ForEach (see the sketch below). That way I make sure the variables get their own context for each iteration.
Have you already checked that the bottleneck is not at the source or sink? If the database or web service is under stress, going sequential may help, if your scenario allows that.
Hope this helped!
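A rough sketch of that child-pipeline pattern in ADF pipeline JSON (the activity and pipeline names are hypothetical and the property shapes are from memory, so treat it as illustrative rather than exact):

    {
      "name": "ForEachRow",
      "type": "ForEach",
      "typeProperties": {
        "items": { "value": "@activity('LookupRows').output.value", "type": "Expression" },
        "isSequential": false,
        "batchCount": 10,
        "activities": [
          {
            "name": "ProcessRow",
            "type": "ExecutePipeline",
            "typeProperties": {
              "pipeline": { "referenceName": "ProcessSingleRow", "type": "PipelineReference" },
              "waitOnCompletion": true
            }
          }
        ]
      }
    }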

Spinnaker Automated Canary Analysis results in either a 0 or 100 score, nothing in between

I am using Spinnaker/Kayenta for canary analysis. When the canary stage runs, it results in a score of either 0 or 100, nothing in between.
Is this the expected behavior?
How is the scoring done?
Looking at the pattern, it seems that if the Run Canary # phase fails for a genuine reason ('Canary score of previous interval (it doesn't matter whether you have intervals or not) is less than marginal score.'), the Aggregate Canary Results phase never runs and it just produces a score of 0. Example snapshot below.
Steps to Reproduce:
Set up a canary pipeline in Spinnaker.
Set it to fail during canary analysis.
Additional Details:
When the Run Canary # phase is successful, it executes the Aggregate Canary Results phase and produces a score of 100.
Just found out it was the configuration that was causing Kayenta to terminate the canary build and not perform the Aggregate Canary Results phase: the "Criticality: Fail the canary if this metric fails" option was toggled on. It should be off to get the score.
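If I recall the Kayenta canary config layout correctly, that checkbox corresponds to a per-metric critical flag in the canary config JSON, roughly like the sketch below (the metric name and the surrounding fields are illustrative):

    {
      "metrics": [
        {
          "name": "error_rate",
          "analysisConfigurations": {
            "canary": { "critical": false }
          }
        }
      ]
    }

With critical set to true, a failing metric terminates the canary run outright instead of just lowering the aggregate score, which matches the behavior described above.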

Check number of slots used by a query in BigQuery

Is there a way to check how many slots were used by a query over the period of its execution in BigQuery? I checked the execution plan, but I could only see the Slot Time in ms and could not find any parameter or graph showing the number of slots used over the period of execution. I even tried looking at Stackdriver Monitoring, but I could not find anything like this. Please let me know if it can be calculated in some way, or if I can see it somewhere I might have missed.
A BigQuery job will report the total number of slot-milliseconds from the extended query stats in the job metadata, which is analogous to computational cost. Each stage of the query plan also indicates input stats for the stage, which can be used to indicate the number of units of work each stage dispatched.
More details about the representation can be found in the REST reference for jobs. See statistics.query.totalSlotMs and statistics.query.queryPlan[].parallelInputs for more information.
BigQuery now provides a key in the Jobs API JSON called "timeline". This structure provides "statistics.query.timeline[].completedUnits" which you can obtain either during job execution or after. If you choose to pull this information after a job has executed, "completedUnits" will be the cumulative sum of all the units of work (slots) utilised during the query execution.
The question might have two parts though: (1) Total number of slots utilised (units of work completed) or (2) Maximum parallel number of units used at a point in time by the query.
For (1), the answer is as above, given by "completedUnits".
For (2), you might need to consider the maximum value of queryPlan.parallelInputs across all query stages, which would indicate the maximum "number of parallelizable units of work for the stage" (https://cloud.google.com/bigquery/query-plan-explanation)
If, after this, you additionally want to know whether the 2000 parallel slots that you are allocated across your entire on-demand query project are sufficient, you'd need to find the point in time, across all queries running in your project, where slot utilisation is at a maximum. This is not a trivial task, but Stackdriver Monitoring gives you the clearest view of this.
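As a rough sketch of pulling those fields with the google-cloud-bigquery Python client (the job ID and location are placeholders, and this assumes default credentials and project are configured):

    from google.cloud import bigquery

    client = bigquery.Client()  # uses default project and credentials

    # Fetch a query job by its ID; "my_job_id" is a placeholder and the
    # location must match where the job ran.
    job = client.get_job("my_job_id", location="US")

    # (1) Total computational cost: cumulative slot-milliseconds for the whole job.
    print("total slot-ms:", job.slot_millis)

    # Cumulative units of work completed, sampled over the job's timeline.
    for entry in job.timeline:
        print(entry.elapsed_ms, "ms elapsed:", entry.completed_units, "units completed")

    # (2) Rough upper bound on parallelism: the largest parallelInputs across stages.
    print("max parallel inputs:", max(stage.parallel_inputs or 0 for stage in job.query_plan))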

JMeter fail tests if threshold exceeded

I'm hoping that I can find help here because I didn't find anything on the internet. I have multiple JMeter plans and I want to fail a plan if a throughput threshold for a group of requests is exceeded. How can I get the real threshold value from JMeter and fail the test if it is exceeded? I need to do this per request, like the threshold value displayed in the Summary Report for each group of requests.
Thank you in advance.
You cannot fail the "plan"; you can only fail a sampler, using an Assertion.
The options are:
Using the JMeter AutoStop Plugin, stop the test if the average response time exceeds the threshold. After the test finishes you can compare the anticipated duration with the real duration and, if it is less, state that the test has failed somehow (i.e. return a non-zero exit code).
Using the Taurus tool as a wrapper for your JMeter test, you can use its Pass/Fail Criteria subsystem to set the desired failure conditions. Taurus will automatically fail the test if the specified criteria are met.
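As an illustration of the Taurus route, a minimal pass/fail configuration might look roughly like the sketch below (the label MyRequestGroup, the script name and the thresholds are placeholders; check the Taurus pass/fail documentation for the exact subject that matches your throughput requirement):

    execution:
    - scenario:
        script: existing-plan.jmx        # your existing JMeter plan (placeholder name)

    reporting:
    - module: passfail
      criteria:
      - avg-rt of MyRequestGroup>800ms for 1m, stop as failed   # per-label response time
      - succ of MyRequestGroup<100% for 1m, stop as failed      # per-label success rate

When a criterion marked "stop as failed" triggers, Taurus stops the run and exits with a non-zero code, which is what lets a CI job treat the plan as failed.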