How to create multiple instances from a template map in groovy - variables

For internal status logging in my Jenkins pipeline I have created a "template" map which I want to use in multiple stages that run independently in parallel:
def status = [
    a: '',
    b: [
        b1: '',
        b2: '',
        b3: ''
    ],
    c: [
        c1: '',
        c2: ''
    ]
]
I want to pass this status template to multiple functions/executors running in parallel. Inside the parallel branches I want to modify the status independently. See the following minimal example:
def status = [
    a: '',
    b: [
        b1: '',
        b2: '',
        b3: ''
    ],
    c: [
        c1: '',
        c2: ''
    ]
]
def label1 = "windows"
def label2 = ''
parallel firstBranch: {
run_node(label1, status)
}, secondBranch: {
run_node(label2, status)
},
failFast: true|false
def run_node(label, status) {
    node(label) {
        status.b.b1 = env.NODE_NAME + "_" + env.EXECUTOR_NUMBER
        sleep(1)
        echo "env.NODE_NAME_env.EXECUTOR_NUMBER: ${status.b.b1}"
        // expected: env.NODE_NAME_env.EXECUTOR_NUMBER
        this.a_function(status)
        echo "env.NODE_NAME_env.EXECUTOR_NUMBER: ${status.b.b1}"
        // expected (still): env.NODE_NAME_env.EXECUTOR_NUMBER (of the current node)
        // is: env.NODE_NAME_env.EXECUTOR_NUMBERmore Info AND probably from the wrong node
    }
}
def a_function(status) {
    status.b.b1 += "more Info"
    echo "env.NODE_NAME_env.EXECUTOR_NUMBERmore Info: ${status.b.b1}"
    // expected: env.NODE_NAME_env.EXECUTOR_NUMBERmore Info
    sleep(0.5)
    echo "env.NODE_NAME_env.EXECUTOR_NUMBERmore Info: ${status.b.b1}"
    // expected: env.NODE_NAME_env.EXECUTOR_NUMBERmore Info
}
Which results in
[firstBranch] env.NODE_NAME_env.EXECUTOR_NUMBER:
LR-Z4933-39110bdb_0
[firstBranch] env.NODE_NAME_env.EXECUTOR_NUMBERmore Info:
LR-Z4933-39110bdb_0more Info
[firstBranch] env.NODE_NAME_env.EXECUTOR_NUMBERmore Info:
LR-Z4933-39110bdb_0more Info
[firstBranch] env.NODE_NAME_env.EXECUTOR_NUMBER:
LR-Z4933-39110bdb_0more Info
[secondBranch] env.NODE_NAME_env.EXECUTOR_NUMBER:
LR-Z4933-39110bdb_0more Info
[secondBranch] env.NODE_NAME_env.EXECUTOR_NUMBERmore Info:
LR-Z4933-39110bdb_0more Infomore Info
[secondBranch] env.NODE_NAME_env.EXECUTOR_NUMBERmore Info:
LR-Z4933-39110bdb_0more Infomore Info
[secondBranch] env.NODE_NAME_env.EXECUTOR_NUMBER:
LR-Z4933-39110bdb_0more Infomore Info
Note that the status in the first branch is overwritten by the second branch, and vice versa.
How can I get independent status variables when passing them as parameters to functions?

You could define the template map once and, whenever you need multiple instances that you want to modify independently, clone the template for each instance.
Here is a short code snippet to show the idea.
def template = [a: '', b: '']
def instancea = template.clone()
def instanceb = template.clone()
def instancec = template.clone()
instancea.a = 'testa'
instanceb.a = 'testb'
instancec.a = 'testc'
println instancea
println instanceb
println instancec
Of course, you can use a bigger map; the above is only for demonstration.

You are passing status by reference to the function. But even if you do a status.clone(), I suspect this isn't a deep copy of status. status.b probably still points to the same reference. You need to make a deep copy of status and send that deep copy to the function.
I'm not sure a deep copy of a framework map is the right way to do this. You could just send an empty map [:] and let the called functions add the pieces to the map that they need. If you really need to pre-define the content of the map, then I think you should add a class and create new objects from that class.
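To illustrate the deep-copy idea, here is a minimal sketch (not from the original answers) that detaches all nested maps by round-tripping the template through JSON. JsonSlurperClassic is used because it returns plain, serializable maps, which matters inside a Jenkins pipeline; the helper name deepCopy is just an illustration.
import groovy.json.JsonOutput
import groovy.json.JsonSlurperClassic

// Serialize the template to JSON and parse it back: every nested map/list
// in the result is a fresh object, not a reference into the template.
def deepCopy(Map template) {
    new JsonSlurperClassic().parseText(JsonOutput.toJson(template))
}

def status1 = deepCopy(status)   // independent copy for firstBranch
def status2 = deepCopy(status)   // independent copy for secondBranch

status1.b.b1 = 'branch one'
assert status2.b.b1 == ''        // the nested map is no longer shared
Alternatively, as suggested above, a small class with the same fields would give each branch its own instance via new.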

Related

Creating ruleset for API Governance - Anypoint Platform

/example:
  /{uriParams}:
    get:
      is: [defaultResponses, commonHeaders]
      uriParameters:
        uriParams:
          description: Example description uriParams
      body:
        application/json:
          example: !include examples.example.json
I would like to create a ruleset that checks both the example !include and the traits (defaultResponses, commonHeaders). What I have now is below, but the rulesets only work separately (that is, if I have the "traits" and "example" rulesets in the same file, only "traits" works; if I delete the "traits" ruleset from the file, the "example" ruleset works). I would like them to work together.
I am also trying to write a ruleset that checks that all fields have camelCase names, for example: "camelCase-exampleTwo"
provide-examples:
  message: Always include examples in request and response bodies
  targetClass: apiContract.Payload
  rego: |
    schema = find with data.link as $node["http://a.ml/vocabularies/shapes#schema"]
    nested_nodes[examples] with data.nodes as object.get(schema, "http://a.ml/vocabularies/apiContract#examples", [])
    examples_from_this_payload = { element |
      example = examples[_]
      sourcemap = find with data.link as object.get(example, "http://a.ml/vocabularies/document-source-maps#sources", [])
      tracked_element = find with data.link as object.get(sourcemap, "http://a.ml/vocabularies/document-source-maps#tracked-element", [])
      tracked_element["http://a.ml/vocabularies/document-source-maps#value"] = $node["#id"]
      element := example
    }
    $result := (count(examples_from_this_payload) > 0)
traits:
  message: common default
  targetClass: apiContract.EndPoint
  propertyConstraints:
    apiContract.ParametrizedTrait:
      core.name:
        pattern: defaultResponses
camel-case-fields:
  message: Use camelCase.
  targetClass: apiContract.EndPoint
  if:
    propertyConstraints:
      shacl.name:
        in: ['path']
  then:
    propertyConstraints:
      shacl.name:
        pattern: "^[a-z]+([A-Z][a-z]+)*$"

Nextflow: Not all items in channel used by process

I've been struggling to identify why a nextflow (v20.10.00) process is not using all the items in a channel. I want the process to run for each sample bam file (10 in total) and for each chromosome (3 in total).
Here is the creation of the channels and the process:
ref_genome = file( params.RefGen, checkIfExists: true )
ref_dir = ref_genome.getParent()
ref_name = ref_genome.getBaseName()
ref_dict = file( "${ref_dir}/${ref_name}.dict", checkIfExists: true )
ref_index = file( "${ref_dir}/${ref_name}.*.fai", checkIfExists: true )

// Handles reading in data if the previous step is skipped
if( params.Skip_BP ){
    Channel
        .fromFilePairs("${params.ProcBamDir}/*{bam,bai}") { file -> file.name.replaceAll(/.bam|.bai$/,'') }
        .ifEmpty { error "No bams found in ${params.ProcBamDir}" }
        .map { ID, files -> tuple(ID, files[0], files[1]) }
        .set { processed_bams }
}

// Setting up the chromosome channel
if( params.Chroms == "" ){
    // Defaulting to using all chromosomes
    chromosomes_ch = Channel
        .from("AgamP4_2L", "AgamP4_2R", "AgamP4_3L", "AgamP4_3R", "AgamP4_X", "AgamP4_Y_unplaced", "AgamP4_UNKN")
    println "No chromosomes specified, using all major chromosomes: AgamP4_2L, AgamP4_2R, AgamP4_3L, AgamP4_3R, AgamP4_X, AgamP4_Y_unplaced, AgamP4_UNKN"
} else {
    // User option to choose which chromosome will be used
    // This worked with the following syntax nextflow run testing.nf --profile imperial --Chroms "AgamP4_3R,AgamP4_2L"
    chrs = params.Chroms.split(",")
    chromosomes_ch = Channel
        .from( chrs )
    println "User defined chromosomes set: ${params.Chroms}"
}

process DNA_HCG {
    errorStrategy { sleep(Math.pow(2, task.attempt) * 600 as long); return 'retry' }
    maxRetries 3
    maxForks params.HCG_Forks

    tag { SampleID+"-"+chrom }

    executor = 'pbspro'
    clusterOptions = "-lselect=1:ncpus=${params.HCG_threads}:mem=${params.HCG_memory}gb:mpiprocs=1:ompthreads=${params.HCG_threads} -lwalltime=${params.HCG_walltime}:00:00"

    publishDir(
        path: "${params.HCDir}",
        mode: 'copy',
    )

    input:
    each chrom from chromosomes_ch
    set SampleID, path(bam), path(bai) from processed_bams
    path ref_genome
    path ref_dict
    path ref_index

    output:
    tuple chrom, path("${SampleID}-${chrom}.vcf") into HCG_ch
    path("${SampleID}-${chrom}.vcf.idx") into idx_ch

    beforeScript 'module load anaconda3/personal; source activate NF_GATK'

    script:
    """
    if [ ! -d tmp ]; then mkdir tmp; fi

    taskset -c 0-${params.HCG_threads} gatk --java-options \"-Xmx${params.HCG_memory}G -XX:+UseParallelGC -XX:ParallelGCThreads=${params.HCG_threads}\" HaplotypeCaller \\
        --tmp-dir tmp/ \\
        --pair-hmm-implementation AVX_LOGLESS_CACHING_OMP \\
        --native-pair-hmm-threads ${params.HCG_threads} \\
        -ERC GVCF \\
        -L ${chrom} \\
        -R ${ref_genome} \\
        -I ${bam} \\
        -O ${SampleID}-${chrom}.vcf ${params.GVCF_args}
    """
}
But for reasons I cannot figure out, nextflow only creates 3 jobs: [d8/45499b] process > DNA_HCG (0_wt5_BP-CM029350.1) [ 0%] 0 of 3
I thought maybe it was because it only took the first sample and then ran one process for each chromosome, though I doubted this since the code works correctly for a different reference genome. Regardless, I adjusted the input channels:
processed_bams
    .combine(chromosomes_ch)
    .set { HCG_in }
and
input:
set SampleID, path(bam), path(bai), chrom from HCG_in
But this resulted in only a single job being created: [6e/78b070] process > DNA_HCG (0_wt10_BP-CM029350.1) [ 0%] 0 of 1
Confusingly, when I use HCG_in.view() there are 30 items. And to further confuse me, the correct number of jobs comes from the following code:
chrs = params.Chroms.split(",")
chromosomes_ch = Channel
    .from(chrs)

Channel
    .fromFilePairs("${params.ProcBamDir}/*{bam,bai}") { file -> file.name.replaceAll(/.bam|.bai$/,'') }
    .ifEmpty { error "No bams found in ${params.ProcBamDir}" }
    .map { ID, files -> tuple(ID, files[0], files[1]) }
    .set { processed_bams }

process HCG {
    executor 'local'

    input:
    each chrom from chromosomes_ch
    set SampleID, path(bam), path(bai) from processed_bams
    //set SampleID, path(bam), path(bai), chrom from HCG_in

    script:
    """
    echo "${SampleID} - ${chrom}"
    """
}
Output: [75/c1c25a] process > HCG (27) [100%] 30 of 30 ✔
I'm hoping I've just missed something obvious, but I cannot see it at the moment. Thanks in advance for the help.
Issues like this almost always involve the use of multiple input channels:
When two or more channels are declared as process inputs, the process
stops until there’s a complete input configuration ie. it receives an
input value from all the channels declared as input.
Your initial assessment was correct. However, the reason only three processes were run (i.e. one sample for each of the three chromosomes) is that this line (probably) returned a list (i.e. a java LinkedList) containing a single element, and lists behave like queue channels:
ref_index = file( "${ref_dir}/${ref_name}.*.fai", checkIfExists: true )
You might have expected this to return a UnixPath. Ultimately, the solution is to ensure ref_index is a value channel.
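One way to do that (a sketch, not from the original answer) is to build the index channel explicitly and collect the glob matches into a value channel, so every (sample, chromosome) task can reuse it:
// Sketch: Channel.fromPath resolves the glob, and collect() turns the
// result into a value channel that can be read any number of times.
ref_index = Channel
    .fromPath( "${ref_dir}/${ref_name}.*.fai", checkIfExists: true )
    .collect()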

channel checks as empty even if it has content

I am trying to have a process that is launched only if a combination of conditions is met, but when checking whether a channel has a path to a file, it always comes back as empty. Probably I am doing something wrong; in that case, please correct my code. I tried to follow some of the suggestions in this issue but with no success.
Consider the following minimal example:
process one {
    output:
    file("test.txt") into _chProcessTwo

    script:
    """
    echo "Hello world" > "test.txt"
    """
}
// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into {
    _chProcessTwoView;
    _chProcessTwoCheck;
    _chProcessTwoUse
}

//print contents of channel
println "Channel contents: " + _chProcessTwoView.toList().view()
process two {
    input:
    file(myInput) from _chProcessTwoUse

    when:
    (!_chProcessTwoCheck.toList().isEmpty())

    script:
    def test = _chProcessTwoUse.toList().isEmpty() ? "I'm empty" : "I'm NOT empty"
    println "The outcome is: " + test
}
I want to have process two run if and only if there is a file in the _chProcessTwo channel.
If I run the above code I obtain:
marius#dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [infallible_gutenberg] - revision: 9f57464dc1
[c8/bf38f5] process > one [100%] 1 of 1 ✔
[- ] process > two -
[/home/marius/pipeline/work/c8/bf38f595d759686a497bb4a49e9778/test.txt]
where the last line is actually the contents of _chProcessTwoView
If I remove the when directive from the second process I get:
marius#mg-dev:~/pipeline$ ./bin/nextflow run test.nf
N E X T F L O W ~ version 19.09.0-edge
Launching `test.nf` [modest_descartes] - revision: 5b2bbfea6a
[57/1b7b97] process > one [100%] 1 of 1 ✔
[a9/e4b82d] process > two [100%] 1 of 1 ✔
[/home/marius/pipeline/work/57/1b7b979933ca9e936a3c0bb640c37e/test.txt]
with the contents of the second worker .command.log file being: The outcome is: I'm empty
I also tried without toList().
What am I doing wrong? Thank you in advance.
Update: a workaround would be to check _chProcessTwoUse.view() != "" but that is pretty dirty
Update 2: as requested by @Steve, I've updated the code to reflect a bit more the actual conditions I have in my own pipeline:
def runProcessOne = true

process one {
    when:
    runProcessOne

    output:
    file("inputProcessTwo.txt") into _chProcessTwo optional true
    file("inputProcessThree.txt") into _chProcessThree optional true

    script:
    // this would replace the probability that output is not created
    def outputSomething = false
    """
    if ${outputSomething}; then
        echo "Hello world" > "inputProcessTwo.txt"
        echo "Goodbye world" > "inputProcessThree.txt"
    else
        echo "Sorry. Process one did not write to file."
    fi
    """
}
// making a copy so I check first if something in the channel or not
// avoids raising exception of MultipleInputChannel
_chProcessTwo.into {
    _chProcessTwoView;
    _chProcessTwoCheck;
    _chProcessTwoUse
}

//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
process two {
    input:
    file(myInput) from _chProcessTwoUse

    when:
    (runProcessOne)

    script:
    """
    echo "The outcome is: ${myInput}"
    """
}
process three {
    input:
    file(defaultInput) from _chUpstreamProcesses
    file(inputFromProcessTwo) from _chProcessThree

    script:
    def extra_parameters = _chProcessThree.isEmpty() ? "" : "--extra-input " + inputFromProcessTwo
    """
    echo "Hooray! We got: ${extra_parameters}"
    """
}
As @Steve mentioned, I should not even need to check whether a channel is empty; Nextflow should know not to initiate the process. But I think in this construct I will have to.
Marius
I think part of the problem here is that process 'one' creates only optional outputs. This makes dealing with the optional inputs in process 'three' a bit tricky. I would try to reconcile this if possible. If this can't be reconciled, then you'll need to deal with the optional inputs in process 'three'. To do this, you'll basically need to create a dummy file, pass it into the channel using the ifEmpty operator, then use the name of the dummy file to check whether or not to prepend the argument's prefix. It's a bit of a hack, but it works pretty well.
The first step is to actually create the dummy file. I like shareable pipelines, so I would just create this in your baseDir, perhaps under a folder called 'assets':
mkdir assets
touch assets/NO_FILE
Then pass in your dummy file if your '_chProcessThree' channel is empty:
params.dummy_file = "${baseDir}/assets/NO_FILE"

dummy_file = file(params.dummy_file)

process three {
    input:
    file(defaultInput) from _chUpstreamProcesses
    file(optfile) from _chProcessThree.ifEmpty(dummy_file)

    script:
    def extra_parameters = optfile.name != 'NO_FILE' ? "--extra-input ${optfile}" : ''
    """
    echo "Hooray! We got: ${extra_parameters}"
    """
}
Also, these lines are problematic:
//print contents of channel
println "Channel contents: " + _chProcessTwoView.view()
println _chProcessTwoView.view() ? "Me empty" : "NOT empty"
Calling view() will emit all values from the channel to stdout. You can ignore whatever value it returns. Unless you enable DSL2, the channel will then be empty. I think what you're looking for here is a closure:
_chProcessTwoView.view { "Found: $it" }
Be sure to append -ansi-log false to your nextflow run command so the output doesn't get clobbered. HTH.
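For example (using the script name from the question):
nextflow run test.nf -ansi-log false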

How to do conditional variables definition on Karate

I had written Karate tests for one environment only (staging). Since the tests have been successful at capturing bugs (thanks a lot, Karate and Intuit team!), there is now a request to run the tests on production.
Our tests are GraphQL-based, where most of the requests are queries. I wonder if it is possible for us to switch variables based on the karate.env we pass on the terminal?
Most of our requests look like this:
And def variables = {objectID:"1234566", cursor:"1", cursorType:PAGE, size:'10', objectType:USER}
And request { query: '#(query)', variables: '#(variables)' }
When method POST
Then status 200
I tried reading the conditional-logic page on the GitHub site but haven't had any success yet.
What I tried so far is:
* if (karate.env == 'staging') * def variables = {objectID:"1234566", cursor:"1", cursorType:PAGE, size:'10', objectType:USER}
But to no success.
Any help will be greatly appreciated. Thanks a lot!
We keep our GraphQL queries & variables in separate JSON files, but we're attempting to solve the same issue. Based on what Peter wrote I came up with this, though it will likely get cleaned up before deployment.
Given def query = read('graphqlQuery.graphql')
And def prodVariable = read('prod-variables.json')
And def stageVariable = read('stage-variables.json')
And def variables = karate.env == 'prod' ? prodVariable : stageVariable
And path 'api/' + 'graphql'
And request { query: '#(query)', variables: '#(variables)' }
When method post
Then status 200
This should be easy:
* def variables = karate.env == 'staging' ? { objectID: "1234566", cursor: "1", cursorType: 'PAGE', size: '10', objectType: 'USER' } : { }
Here is another hint:
* def data = { staging: { foo: 'bar' }, production: { foo: 'baz' } }
* def variables = data[karate.env]
EDIT: also see this explanation: https://stackoverflow.com/a/59162760/143475
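For reference, the karate.env used above is normally supplied from the terminal as a JVM system property, for example (assuming a Maven runner; adjust for Gradle or your IDE):
mvn test -Dkarate.env=staging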

How to stream SQL results to JSON using Groovy StreamingJsonBuilder?

I am trying to execute a SQL query and convert the results to JSON as follows. Though I got it working without streaming, I'm having some issues using StreamingJsonBuilder to stream the results.
non-streaming code
def writer = new StringWriter()
def jsonBuilder = new StreamingJsonBuilder(writer)

sql.eachRow("select * from client") { row ->
    jsonBuilder( id: row.id, name: row.name )
}

println writer.toString()
Result from the code above
{"id":123,"name":"ABCD"}{"id":124,"name":"NYU"}
The problem with this result is that all documents are printed on the same line without delimitation. How do I get the results as an array, with each document pretty-printed as below?
Expected result
[
    {
        id: 123,
        name: "ABCD",
        ...
    },
    {
        id: 124,
        name: "NYU",
        ...
    },
]
I put this here more as a fallback. If your problem is just to have your data properly formatted as JSON, but the sheer amount of data makes you use the streaming API, then you are better off using streaming for your data and handling the "array" yourself.
All the calls in StreamingJsonBuilder take an object and directly write it to the writer. So there is no safe way (that I can see) to have the builder open the array, then send the data in chunks you provide, and then close the array. So while we already hold the writer, why not just deal with the array yourself (this part of JSON is rather easy to get right):
new File('/tmp/out.json').withWriter{ writer ->
    writer << '['
    def jsonBuilder = new groovy.json.StreamingJsonBuilder(writer)
    def first = true
    10000000.times{
        if (!first) writer << "\n,"
        first = false
        jsonBuilder(id: it, name: it.toString())
    }
    writer << ']'
}
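Applied to the original sql.eachRow loop, the same pattern could look roughly like this (a sketch, assuming a groovy.sql.Sql instance named sql and the client table from the question; the output path is just an example):
import groovy.json.StreamingJsonBuilder

new File('/tmp/clients.json').withWriter { writer ->
    writer << '['
    def jsonBuilder = new StreamingJsonBuilder(writer)
    def first = true
    sql.eachRow("select * from client") { row ->
        if (!first) writer << ",\n"        // hand-written delimiter between array elements
        first = false
        jsonBuilder(id: row.id, name: row.name)
    }
    writer << ']'
}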
I've no access to any SQL to try, but the following piece of code should do the job (you need to replace the data variable):
import groovy.json.*

def writer = new StringWriter()
def jsonBuilder = new StreamingJsonBuilder(writer)

def data = [
    [id: 1, name: 'n1', other: 'o1'],
    [id: 2, name: 'n2', other: 'o2']
]

def dataJson = jsonBuilder(data.collect { [id: it.id, name: it.name] })

println(JsonOutput.prettyPrint(JsonOutput.toJson(dataJson)))
UPDATE (after @cfrick's comment)
Here, every row is processed one after another, but a key (data in this case) is needed.
import groovy.json.*

def writer = new StringWriter()
def jsonBuilder = new StreamingJsonBuilder(writer)

def data = [
    [id: 1, name: 'n1', other: 'o1'],
    [id: 2, name: 'n2', other: 'o2']
]

def root = jsonBuilder(data: [])

data.each { d ->
    root.data << [id: d.id, name: d.name]
}

println(JsonOutput.prettyPrint(JsonOutput.toJson(root)))