Gitlab CI only variables that come from extends - gitlab-ci

I have the following setup (simplified version), which doesn't run the expected merge::my job when I use a tag that includes the string "TEST". I can't figure out why it is happening. I know that only doesn't support variable expansion, but here the variable is just a string that is set in a different extends template. Is that a problem? Would using YAML anchors be better? Are there other suggestions?
The reason I check only:variables in .merge_builds is that I have many languages. In this example I used en, but there are many others, and I don't want to repeat only:variables for each of them (the real matching is more complex; I stripped it down to the bare minimum for the example).
.merge_builds:
  script:
    - echo 'test'
  only:
    variables:
      - $CI_COMMIT_TAG =~ $VARIABLEMATCH
.en_variables:
  variables:
    VARIABLEMATCH: /^$|(?i)EN/
merge::en:
  extends:
    - .en_variables
    - .merge_builds

Based on GitLab issue 35438, I'd say that it is not currently possible to use a variable (as opposed to a literal) as the regular expression pattern.
Within issue 35438, @furkanayhan explains in a comment titled "Introduction" from 2021-09-06 (sorry, I wasn't able to get a permalink to it) that GitLab makes a simple string comparison between a value and a pattern given as a variable:
variables:
  teststring: 'abcde'
  pattern: '/^ab.*/'
test1:
  script: exit 0
  rules:
    - if: '$teststring =~ $pattern'
test2:
  script: exit 0
  rules:
    - if: '$teststring =~ /^ab.*/'
The test1 job is not created because the backend makes a string comparison between "abcde" and "/^ab.*/".
The test2 job is created because the backend makes a regexp comparison between "abcde" and /^ab.*/.
I believe that you are encountering the same behavior that caused "test1 job" not to be created.
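To make the difference between test1 and test2 concrete, here is a simplified, hypothetical Python model of the behavior described in that comment. It is not GitLab's actual implementation: a /literal/ on the right of =~ is used as a regular expression, while a $variable is expanded and compared as a plain string. The function name evaluate_match is made up for illustration.

```python
import re
from typing import Dict


def evaluate_match(value: str, rhs: str, variables: Dict[str, str]) -> bool:
    """Hypothetical model of `value =~ rhs` as described in issue 35438
    (behavior before the fix), not GitLab's real code."""
    if rhs.startswith("$"):
        # Variable on the right: its value is compared as a plain string,
        # so '/^ab.*/' is never interpreted as a pattern.
        return value == variables.get(rhs[1:], "")
    if len(rhs) >= 2 and rhs.startswith("/") and rhs.endswith("/"):
        # Literal on the right: strip the slashes and do a regexp search.
        return re.search(rhs[1:-1], value) is not None
    return value == rhs


variables = {"teststring": "abcde", "pattern": "/^ab.*/"}
# test1: '$teststring =~ $pattern'  -> string comparison -> no job
test1 = evaluate_match(variables["teststring"], "$pattern", variables)
# test2: '$teststring =~ /^ab.*/'   -> regexp comparison -> job created
test2 = evaluate_match(variables["teststring"], "/^ab.*/", variables)
```

Under this model, $CI_COMMIT_TAG =~ $VARIABLEMATCH would only be true if the tag were literally the text "/^$|(?i)EN/", which matches the symptom in the question.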
However, issue 35438 shows that GitLab is planning on offering a fix in version 15.0, scheduled for 2022-05-22.
One other thing you might want to check is the regular expression itself. GitLab's regexp doc (here) states that GitLab uses the re2 regular expression syntax for these kinds of comparisons. To achieve case insensitivity, I believe one appends the "i" flag, as in:
/pattern/i
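As an illustration of the /pattern/flags form, the sketch below splits such a literal into its body and trailing flags and compiles it with Python's re module. This is only an approximation: Python's re is not RE2, only the i flag is mapped, and compile_slash_pattern is a hypothetical helper, not part of any GitLab tooling.

```python
import re
from typing import Pattern


def compile_slash_pattern(literal: str) -> Pattern[str]:
    """Turn a '/pattern/flags' literal into a compiled Python regex.

    Only the 'i' (case-insensitive) flag is handled in this sketch.
    """
    if not literal.startswith("/"):
        raise ValueError("pattern literal must start with '/'")
    end = literal.rfind("/")
    if end == 0:
        raise ValueError("pattern literal needs a closing '/'")
    body, flags = literal[1:end], literal[end + 1:]
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.compile(body, re_flags)


# The question's pattern, with the case-insensitive flag moved after the slash.
variablematch = compile_slash_pattern("/^$|EN/i")
```

With the flag at the end, the whole pattern is case-insensitive, so "release-en" and "RELEASE-EN" both match, and the ^$ branch still matches an empty string.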

Related

Deploy sql workflow with DBX

I am developing a deployment via DBX to Azure Databricks. For this, I need a data job written in SQL to run every day. The job is in the file data.sql. I know how to do this with a Python file; there I would do the following:
build:
  python: "pip"
environments:
  default:
    workflows:
      - name: "workflow-name"
        #schedule:
        quartz_cron_expression: "0 0 9 * * ?" # every day at 9.00
        timezone_id: "Europe"
        format: MULTI_TASK #
        job_clusters:
          - job_cluster_key: "basic-job-cluster"
            <<: *base-job-cluster
        tasks:
          - task_key: "task-name"
            job_cluster_key: "basic-job-cluster"
            spark_python_task:
              python_file: "file://filename.py"
But how can I change it so I can run a SQL job instead? I imagine it is the last two lines of code (spark_python_task: and python_file: "file://filename.py") that need to be changed.
There are various ways to do that.
(1) One of the simplest is to add a SQL query in the Databricks SQL lens, and then reference this query via sql_task as described here.
(2) If you want a Python project that reuses SQL statements from a static file, you can add this file to your Python package and then read it from your package code, e.g.:
sql_statement = ... # code to read from the file
spark.sql(sql_statement)
(3) A third option is to use the DBT framework with Databricks. In this case you probably would like to use dbt_task as described here.
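For option (2), here is a minimal sketch of reading a SQL file shipped inside a Python package and running it statement by statement. It assumes data.sql is included as package data of a hypothetical package, and it takes the executor as a callable so that it can be spark.sql inside the job and a stub elsewhere. The semicolon split is deliberately naive, because spark.sql generally expects a single statement per call.

```python
from importlib import resources
from typing import Callable, List


def split_sql_statements(sql_text: str) -> List[str]:
    """Split a SQL script on ';'. Naive: a ';' inside a string literal or
    comment would also split the script, so keep data.sql simple."""
    return [stmt.strip() for stmt in sql_text.split(";") if stmt.strip()]


def run_sql_script(sql_text: str, execute: Callable[[str], object]) -> int:
    """Run each statement through `execute` (e.g. spark.sql); return the count."""
    statements = split_sql_statements(sql_text)
    for stmt in statements:
        execute(stmt)
    return len(statements)


def load_packaged_sql(package: str, filename: str) -> str:
    """Read a SQL file that was installed as package data (Python 3.9+)."""
    return resources.files(package).joinpath(filename).read_text(encoding="utf-8")
```

In the job's entry point this would be something like run_sql_script(load_packaged_sql("my_project.sql", "data.sql"), spark.sql), where my_project.sql is a placeholder package name, and the task itself stays a spark_python_task.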
I found a simple workaround (although it might not be the prettiest): change data.sql into a Python file and run the queries using Spark. This way I could use the same spark_python_task.

Delimit BigQuery REGEXP_EXTRACT strings in Google Cloud Build YAML script

I have a complex query that creates a View within the BigQuery console.
I have simplified it to the following to illustrate the issue:
SELECT
REGEXP_EXTRACT(FIELD1, r"[\d]*") as F1,
REGEXP_REPLACE(FIELD2, r"\'", "") AS F2,
FROM `project.mydataset.mytable`
Now I am trying to automate the creation of the view with cloud build.
I cannot work out how to delimit the strings inside the regex so that they work with both YAML and SQL.
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bq'
  args: [
    'mk',
    '--use_legacy_sql=false',
    '--project_id=${_PROJECT_ID}',
    '--expiration=0',
    '--view=
    REGEXP_EXTRACT(FIELD1, r"[\d]*") as F1 ,
    REGEXP_REPLACE(FIELD2, r"\'", "") AS F2,
    REGEXP_EXTRACT(FIELD3, r"\[(\d{3,12}).*\]") AS F3
    FROM `project.mydataset.mytable`"
    '${_TARGET_DATASET}.${_TARGET_VIEW}'
  ]
I get the following error
Failed to trigger build: failed unmarshalling build config cloudbuild/build-views.yaml: json: cannot unmarshal number into Go value of type string
I have tried using Cloud Build substitution parameters, and as many combinations of SQL and YAML escape sequences as I can think of to find a working solution.
Generally, you want to use block scalars in such cases, as they do not process any special characters inside them and are terminated via indentation.
I have no idea how the command is supposed to look, but here's something that's at least valid YAML:
- name: 'gcr.io/cloud-builders/gcloud'
  entrypoint: 'bq'
  args:
    - 'mk'
    - '--use_legacy_sql=false'
    - '--project_id=${_PROJECT_ID}'
    - '--expiration=0'
    - >- # folded block scalar; newlines are folded into spaces
      --view=SELECT
      REGEXP_EXTRACT(FIELD1, r"[\d]*") AS F1,
      REGEXP_REPLACE(FIELD2, r"\'", "") AS F2,
      REGEXP_EXTRACT(FIELD3, r"\[(\d{3,12}).*\]") AS F3
      FROM `project.mydataset.mytable`
    - '${_TARGET_DATASET}.${_TARGET_VIEW}' # separate item: the scalar ends where the indentation ends
A folded block scalar starts with >; the following - (the "strip" chomping indicator) tells YAML not to append the final newline to its value.

snakemake unpack with shell & conda

I have the basic "input can be single-end or paired-end reads" problem in my snakemake pipeline. I'd like to use unpack if possible, since it seems designed for this situation (as illustrated in the answer to this issue), but I also want to use conda:, which requires shell:. I believe that shell: will fail if I reference {input.read2} but it's not provided by unpack(). Is there a good way around this besides 1) creating two nearly identical rules, or 2) making an empty read2 (if single-end) and adding an if-else in shell to check whether read2 is empty? Neither is ideal.
Try to combine your input function with a params function to generate the flags for either paired or single end. Using the bowtie example from your link:
# seq_type ("pe" or "se") and bowtie2_index are assumed to be defined elsewhere in the Snakefile, e.g. from config
def bowtie2_inputs(wildcards):
    if (seq_type == "pe"):
        return expand("{reads}_{strand}.fastq", strand=["R1", "R2"], reads=wildcards.reads)
    elif (seq_type == "se"):
        return expand("{reads}.fastq", reads=wildcards.reads)
def bowtie2_params(wildcards, input):
    # build the read-file flags: -1/-2 for paired end, -U for single end
    if (seq_type == "pe"):
        return f'-1 {input.reads[0]} -2 {input.reads[1]}'
    else:
        return f'-U {input.reads}'
rule bowtie2:
    input:
        reads=bowtie2_inputs,
        index=bowtie2_index
    output:
        sam="{reads}_bowtie2.sam"
    params:
        file_args=bowtie2_params
    conda: <env>
    shell:
        "bowtie2 -x {input.index} {params.file_args} -S {output.sam}"
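A variant of the params function that decides from the number of input files, instead of a global seq_type, might look like the sketch below. bowtie2_file_args is a hypothetical name, and the function is plain Python, so it can be checked outside Snakemake.

```python
from typing import Sequence


def bowtie2_file_args(reads: Sequence[str]) -> str:
    """Build bowtie2's read-file flags from the list of input FASTQ files:
    two files -> paired end (-1/-2), one file -> single end (-U)."""
    if len(reads) == 2:
        return f"-1 {reads[0]} -2 {reads[1]}"
    if len(reads) == 1:
        return f"-U {reads[0]}"
    raise ValueError(f"expected 1 or 2 FASTQ files, got {len(reads)}")
```

In the rule it could be wired up as params: file_args=lambda wildcards, input: bowtie2_file_args(input.reads), assuming input.reads holds the list returned by the input function. A wrong number of files then fails loudly instead of producing a broken command line.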
Not sure it's any better than the shell option. I would use two rules with a ruleorder preferring the paired-end rule. That would be easier to modify if you wanted, say, a different aligner or different parameters for each case. As is, this requires a bit of jumping around to see what the rule actually does.

What's the easiest way to get Grammar::Tracer working on Perl6 itself?

To get an idea how perl6 parses your code, you can use the --target option:
$ perl6 --target=parse -e '"Hello World".say'
- statementlist: "Hello World".say
- statement: 1 matches
- EXPR: .say
- 0: "Hello World"
- value: "Hello World"
- quote: "Hello World"
- nibble: Hello World
- OPER: .say
- sym: .
- dottyop: say
- methodop: say
- longname: say
- name: say
- identifier: say
- O: <object>
- dotty: .say
- sym: .
- dottyop: say
- methodop: say
- longname: say
- name: say
- identifier: say
- O: <object>
$
Far better is the Grammar::Tracer module described here. According to the module documentation, one simply adds use Grammar::Tracer and any grammar defined in the scope where the use statement appears will be traced.
My question is simply this: If I'm using a "star release", what's the easiest way to get tracing (using Grammar::Tracer) on the Perl6 Grammar itself?
Alternatively, if I'm using rakudobrew, what's the easiest way to get tracing on the Perl6 Grammar itself?
Perl 6 users are encouraged to use Star releases. Would wanting to examine more closely how Perl 6 parses itself, using Grammar::Tracer, be a good reason to build from source locally instead?
The grammar in Rakudo is near enough a Perl 6 grammar, but it's implemented at the NQP level (https://github.com/rakudo/rakudo/blob/nom/src/Perl6/Grammar.nqp), so the magic of Grammar::Tracer won't work there. However, you can use the STD grammar (https://github.com/perl6/std/blob/master/STD.pm6) to parse some code, and that should work with Grammar::Tracer. I've been fiddling around trying to get it to work with Grammar::Highlighter. Hope that helps.

Go benchmark by function name

I have this Benchmark function:
func BenchmarkMyTest(b *testing.B) {
}
I would like to run only this function, without running all the other tests, but these commands never worked for me:
go test -bench='BenchmarkMyTest'
or
go test -run='BenchmarkMyTest'
What's the correct way of running one single benchmark function in Go?
The help says to use a regex, but I can't find any documentation on it.
This is described in Command go, under "Description of testing flags":
-bench regexp
    Run benchmarks matching the regular expression.
    By default, no benchmarks run. To run all benchmarks,
    use '-bench .' or '-bench=.'.
-run regexp
    Run only those tests and examples matching the regular
    expression.
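So the flag and its value are separated by either a space or an equals sign, and the value is a regexp. Leave out the quotes on Windows cmd.exe, which passes single quotes through literally so the regexp would never match; a POSIX shell strips the quotes before go sees them, so there they are harmless, and they are needed when the pattern contains a backslash: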
go test -bench BenchmarkMyTest
go test -run TestMyTest
Or:
go test -bench=BenchmarkMyTest
go test -run=TestMyTest
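To check what a POSIX shell actually hands to go test, Python's shlex.split can serve as a stand-in, since in its default POSIX mode it applies the same quoting and backslash rules. It shows that the quotes are stripped before go sees the pattern, and that an unquoted backslash is consumed by the shell. Windows cmd.exe behaves differently and is not modeled here.

```python
import shlex

# Split each command line the way a POSIX shell would.
quoted = shlex.split("go test -bench='BenchmarkMyTest'")
unquoted_backslash = shlex.split(r"go test -bench BenchmarkMyTest\b")
quoted_backslash = shlex.split(r"go test -bench 'BenchmarkMyTest\b'")

print(quoted)              # ['go', 'test', '-bench=BenchmarkMyTest']
print(unquoted_backslash)  # ['go', 'test', '-bench', 'BenchmarkMyTestb']
print(quoted_backslash)    # ['go', 'test', '-bench', 'BenchmarkMyTest\\b']
```

The middle case is the trap: the shell turns \b into a plain b, so go receives BenchmarkMyTestb and matches nothing.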
Specifying exactly 1 function
As the specified expression is a regexp, it also matches functions whose names contain the specified name, for example "BenchmarkMyTestB". If you only want to match "BenchmarkMyTest", append the regexp word boundary '\b' (or the end anchor '$'), quoting the pattern so that a POSIX shell does not strip the backslash:
go test -bench 'BenchmarkMyTest\b'
go test -run 'TestMyTest\b'
Note that it's enough to anchor only the end: a function whose name doesn't start with "Benchmark" is not considered a benchmark function, and similarly one whose name doesn't start with "Test" is not considered a test function, so neither would be picked up anyway.
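The selection can be approximated with Python's re.search, which, like go test, looks for the pattern anywhere in the name rather than requiring a full match. The benchmark names below are made up, and Python's re is not Go's RE2 (go test also splits the pattern on / for sub-benchmarks), but for \b, ^ and $ the behavior is the same.

```python
import re
from typing import List

# Hypothetical benchmark names from a _test.go file.
NAMES = ["BenchmarkMyTest", "BenchmarkMyTestB", "BenchmarkMyTest2", "BenchmarkOther"]


def selected(pattern: str, names: List[str]) -> List[str]:
    """Return the names an unanchored regexp search would pick."""
    return [name for name in names if re.search(pattern, name)]


plain = selected("BenchmarkMyTest", NAMES)             # substring match: three names
bounded = selected(r"BenchmarkMyTest\b", NAMES)        # word boundary: exact name only
anchored = selected("BenchmarkMyTest$", NAMES)         # end anchor: exact name only
nothing = selected("^$", NAMES)                        # matches no real name
```

The last line is why -run=^$ is a common way to skip all tests while benchmarking: no test name is the empty string.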
I found those answers incomplete, so here is more on the topic.
The following command runs all benchmarks whose names start with BenchmarkMyTest (BenchmarkMyTest1, BenchmarkMyTest2, etc.) and skips all tests with -run=^$, a pattern that only matches the empty string, which no test name is.
You can also set the benchmark duration with -benchtime 5s, and with -benchmem report memory allocations for every benchmark (as if each one called b.ReportAllocs()), which gives output like:
BenchmarkLogsWithBytesBufferPool-48 46416456 26.91 ns/op 0 B/op 0 allocs/op
The final command would be:
go test -bench='^BenchmarkMyTest' -run='^$' -v -benchtime 5s -benchmem .