Accessing the attempt counter for change in job behaviour - snakemake

I have a job that might fail with a specific configuration. What I want to do is let it run once and, if it fails, rerun it with a slightly different configuration.
I found the attempt parameter, but I can't find a way to access it outside the resources section.
Do you know how to access it, or is there an alternative?

The attempt counter is passed as the argument "'--attempt' 'int'" to the jobscript (in my case a wrapper script in Python).
You can therefore access it (as a string) with, for example:
sys.argv[sys.argv.index('--attempt')+1]
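
A slightly more defensive version of the same idea, as a minimal sketch. It assumes the jobscript really does receive '--attempt' followed by an integer, as described above; the helper name get_attempt, the fallback value of 1, and the "default"/"fallback" configuration names are illustrative assumptions, not part of snakemake.

```python
import sys


def get_attempt(argv, default=1):
    """Return the integer that follows '--attempt' in argv, or default if absent."""
    try:
        return int(argv[argv.index("--attempt") + 1])
    except (ValueError, IndexError):
        # '--attempt' is missing, has no value after it, or the value is not an integer
        return default


if __name__ == "__main__":
    attempt = get_attempt(sys.argv)
    # Hypothetical switch: use the alternative configuration on any retry.
    config_name = "default" if attempt == 1 else "fallback"
    print(f"attempt={attempt}, using {config_name} configuration")
```

Converting to int up front makes comparisons such as attempt > 1 safe, since the raw argv entry is a string.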

serverless - aws - SecureLambdaFunction env

I have the following case:
I'm setting several environment variables in my serverless.yml file, like:
ONE_CLIENT_SECRET=${ssm:/one/key_one~true}
ONE_CLIENT_PUBLIC=${ssm:/one/key_two~true}
ANOTHER_SERVICE_KEY=${ssm:/two/key_one~true}
ANOTHER_SERVICE_SECRET=${ssm:/two/key_two~true}
Let's say I have about 10 of these. When I try to deploy, I get the following error:
An error occurred: SecureLambdaFunction - Lambda was unable to configure your environment variables because the environment variables you have provided exceeded the 4KB limit. String measured: JSON_WITH_MY_VARIABLES_HERE
So I cannot deploy. I have an idea of what the problem is, but I don't have a clear path to solving it, so my questions are:
1) How can I extend the 4 KB limit?
2) My variables are set using SSM; I'm using the EC2 Parameter Store to save them. How does this work behind the scenes? (This is more a question for the Serverless team or someone who knows the topic.)
- When I run sls deploy, does it fetch the values and include them in the .zip file (this is what I think it does; I just want to confirm), or does it fetch the values when the lambdas execute? I'm asking because in the AWS Lambda console I can see them set there.
Thanks!
After digging into this in depth, I came to the following conclusion:
Using the pattern ONE_CLIENT_SECRET=${ssm:/one/key_one~true} means that the sls framework downloads the values when the project is built for deployment and embeds them in it. This is where the problem comes from. You can see this after uploading the project: your variables are set in plain text in the Lambda console.
My solution was to use a middy middleware to load the SSM values when the lambda executes. This means you need to write your project so that no code runs until the variables are available, and you need a good strategy for caching the variables (loading them on cold start); otherwise every execution takes longer.
The 4 KB limit cannot be changed and, after reading about it, that seems obvious.
Long story short: if you hit this problem, find the mix of middleware-loaded and embedded values that works best for you.
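
The load-at-runtime idea can be sketched independently of middy. The following is a minimal Python illustration of the caching strategy only, not the middy API: fetch stands in for whatever call retrieves the decrypted parameters (for example an SSM client), and the class name ParameterCache, the TTL, fake_fetch, and the handler are all assumptions made for the example. The point is that values are fetched on the first (cold) invocation and reused on warm ones.

```python
import time


class ParameterCache:
    """Fetch parameters lazily and reuse them across warm invocations."""

    def __init__(self, fetch, names, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # callable: list of names -> dict of values
        self._names = list(names)
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._values = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._values is None or now >= self._expires_at:
            # Cold start or expired cache: pay the network cost once.
            self._values = self._fetch(self._names)
            self._expires_at = now + self._ttl
        return self._values


def fake_fetch(names):
    # Stand-in for a real parameter-store call; returns dummy values.
    return {name: f"value-of-{name}" for name in names}


# Created at module level so it survives between invocations of a warm container.
cache = ParameterCache(fake_fetch, ["/one/key_one", "/one/key_two"])


def handler(event, context):
    secrets = cache.get()  # handler logic only runs once the values are available
    return {"loaded": sorted(secrets)}
```

Because the secrets never become Lambda environment variables, they do not count against the 4 KB limit and do not appear in the console.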

Execute one feature at a time during application execution

I'm using Karate as follows: during application execution, I get the test files from another source and create feature files based on what I get.
Then I iterate over the list of tests and execute them.
My problem is that by using
CucumberRunner.parallel(getClass(), 5, resultDirectory);
I execute all the tests at every iteration, which causes tests to be executed multiple times.
Is there a way to execute one test at a time during application execution? (I'm fully aware of the empty test class with an annotation to specify one class, but that doesn't seem to help me here.)
I thought about creating every feature file in a new folder, so that I can specify the path of a folder that contains only one feature at a time, but CucumberRunner.parallel() accepts a Class and not a path.
Do you have any suggestions, please?
You can explicitly set a single file (or even a directory path) to run via the annotation:
@CucumberOptions(features = "classpath:animals/cats/cats-post.feature")
I think you are already aware of the Java API, which can take one file at a time, but you won't get reports.
You can also try this: set a System property cucumber.options with the value classpath:animals/cats/cats-post.feature and see if that works. If you add tags (search the docs), each iteration can use a different tag, and that would give you the behavior you need.
Another idea: why not generate a single feature and, in that feature, make calls to all the generated feature files?
Also, how about programmatically deleting (or moving) the files after you are done with each iteration?
If all of the above fails, I would try to replicate some of this code: https://github.com/intuit/karate/blob/master/karate-junit4/src/main/java/com/intuit/karate/junit4/Karate.java

How can an uncalled test affect another in Go?

I have a test function TestJobqueue() in https://github.com/VertebrateResequencing/wr/blob/develop/jobqueue/jobqueue_test.go that I can call in isolation: go test -tags netgo ./jobqueue -v -run 'TestJobqueue$'.
I recently started getting test failures related to boltdb (one of my dependencies) bombing out with signal SIGBUS: bus error code panics, or just normally failing tests because the database couldn't be opened. But only when working off an NFS mounted directory. Fair enough, I or boltdb have some kind of NFS-related bug.
But the thing I can't wrap my head around is that I only get these errors when an entirely different test function exists.
As per the comments in TestREST() in https://github.com/VertebrateResequencing/wr/blob/92fb61ccd7819c8f1edfa8cce8468c4250d40ea7/jobqueue/rest_test.go, if I call Serve(serverConfig) (a function in the package being tested, a function call which is made many times in TestJobqueue() and other test functions) in that test function, TestJobqueue() fails. If I don't, it doesn't.
In short, the failure of tests in one test function can be controlled by the value of a boolean in a test function that I'm not running.
How is this possible?
Edit: to address some points brought up by the first answer, TestJobqueue() is being run in isolation. No other test runs before or after it. If the database file already exists, Serve() results in those files being deleted first, then a new one created to run the new set of tests. The odd thing that I'm seeking an answer for is how an unexecuted function can have this side effect. I can demonstrate it is really unexecuted by beginning or ending TestREST() with a panic call: the output of that panic is never seen, but TestJobqueue() failure can still be controlled by the boolean in TestREST() (if the panic comes at the end).
Edit2: this turns out to be caused by an unusual thing I do in TestJobqueue(), which is to call go test on itself. Needless to say, if you do this, strange things can happen...
"In short, the failure of tests in one test function can be controlled by the value of a boolean in a test function that I'm not running."
This is not a great summary. Your test starts a server. The other test starts a server too, so clearly the problem is there. You appear to have commented out the bit of code that stops the server at the end of the test? You can't run two servers on the same port.
You probably have a port conflict or some network condition that is triggered by running the two servers at once, because they both appear to use a similar (identical?) config loaded like this:
config := internal.ConfigLoad("development", true)
Running with no config uses default values, which avoids the conflict; running with the config causes it. So to pin it down, try creating a config with one setting at a time until you find the setting that causes the problem (most likely Port or WebPort). Alternatively, make sure the tests stop the server at the end.
[EDIT] Looks like you have narrowed it down to DBFile config setting by changing one at a time. This implies the server starts a new db instance - if both try to use the same file for a new db, this would cause contention and the second test to run would fail.
It's not entirely clear from your description above what you're doing or what the problem is, so you could try to improve that to state exactly the sequence of actions and the problem. If for example you have previously run a test which creates a db, it could affect later test runs because of the presence of a db file, so your tests are not completely independent.
[EDIT 2 - after further edits to question]
If commenting out TestREST completely solves your problem (or a panic before it starts), and given changing it breaks the other test, you are executing TestREST somehow.
Looking at your code for jobqueue_test, it appears to invoke go test, so you might be running more tests than you assume. Given that you don't see the panic output, I'd suspect your use of exec.Command in this big test. Try removing bits of the failing test until it works, to narrow down exactly which invocation is running the other test. Calling go test within a test is pretty unusual!
https://github.com/VertebrateResequencing/wr/blob/develop/jobqueue/jobqueue_test.go#L2445

Check for multiple files

Okay, I'll try to explain as well as I can... Quite a particular case.
Tools: SSIS 2008
We have a control flow that now needs to be triggered by an event: the presence of one or more files (1, 2 or 3).
The variables used:
BO_FileLocation_1
BO_FileLocation_2
BO_FileLocation_3
BO_FileName_1
BO_FileName_2
BO_FileName_3
There can be one, two or three files, defined in the variables above. When a location/name pair is filled in, the file should be processed. When a pair is empty (for example, because there is just one file), the process should ignore it and jump to the next (file watcher?) task.
For example:
BO_FileLocation_1 = "C:\"
BO_FileLocation_2 = NULL
BO_FileLocation_3 = NULL
BO_FileName_1 = "test.csv"
BO_FileName_2 = NULL
BO_FileName_3 = NULL
The report only needs one file.
I need a generic approach for checking the presence of these files, possibly more generic than my SSIS knowledge can handle right now. That would be handy, for example, if a fourth file is added in the future. I was also thinking of using a single script to handle all the logic.
Thanks in advance
If all you want is to trigger the Copy Source File task when one or more of the files is present, just use the OR constraint in your flow:
First, connect all the tasks to the destination.
Then click one of the green arrows. This makes its properties window pop up. Select Logical OR instead of Logical AND.
If everything went well, you should now see the connections as dashed lines.
There are several possible solutions:
1) Create a sequence container and include all the file imports in it. Add int variables RowCountFile1, RowCountFile2, and RowCountFile3 and set their values to 0 (this is the default value when you create an int variable). Add a RowCount transformation to each of the data flows. Create a precedence constraint from the sequence container to the "Do something" task. Set the precedence constraint to success and expression. Set the expression value to @RowCountFile1 > 0 || @RowCountFile2 > 0 || @RowCountFile3 > 0. The advantage of this approach is that you can act as soon as the files are detected, you import all available files, and you only act after all the files have been imported. You could then schedule this SSIS package as a SQL Server Agent job step and run it as frequently as you want.
2) A variant on solution 1 is to use Foreach File enumerator containers inside the sequence container. This is useful if you don't know the exact name of the file and you expect to import more than one under some circumstances. For instance, if you get a file every few minutes with a timestamp in its file name and your process doesn't run for some reason, then you may have to process multiple files to catch up and then take an action once that is done.
3) You could use the file watcher task as you outlined in your question. The only problem I have with the file watcher task is that the package has to be in a constantly running state. This makes it hard to troubleshoot problems and performance. It can also introduce other problems: I remember having some trouble with the file watcher task years ago when it first came out. It may well be a totally stable task now, but I prefer other methods after having been burned previously. If you really want the package to run continuously instead of being called by a job, you could always use a script task to check for the file, sleep the thread if it is not found, check again, and so on. I'm sure that's what the file watcher task does, but I would trust my own C# over the task. Power to anyone who has had better experiences than me with File Watcher...
4) Use PowerShell. If you just want to take an action when a file appears and you aren't importing the data, then a PowerShell script could do this just as well as an SSIS package. The drawbacks are that you have to learn some basic PowerShell, that it may be hard to maintain in the future since PowerShell is probably not your bread-and-butter language, and that you may have to rewrite the code as an SSIS package if you later want to import the data. You would probably call the PowerShell script from a SQL Server Agent job step, so scheduling can be handled pretty easily.
There are more options than what I listed, so let me know if you still want more suggestions.
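
The "single script" idea from the question, together with the check, sleep, check-again loop mentioned above, can be sketched generically. This is a Python illustration of the logic only, not SSIS script-task code (which would be written in C# or VB.NET); the function names, the polling interval, and the timeout are assumptions. Location/name pairs that are empty or NULL are skipped, so supporting a fourth file just means adding another pair.

```python
import os
import time


def configured_files(locations, names):
    """Pair up locations and names, skipping pairs where either is empty or None."""
    return [os.path.join(loc, name)
            for loc, name in zip(locations, names)
            if loc and name]


def missing_files(paths):
    """Return the configured paths that do not exist yet."""
    return [p for p in paths if not os.path.isfile(p)]


def wait_for_files(paths, timeout=60.0, interval=5.0):
    """Poll until every path exists; return True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while missing_files(paths):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

With the example values from the question, configured_files(["C:\\", None, None], ["test.csv", None, None]) yields a single path, and wait_for_files returns as soon as that one file is present.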

SSIS Intermittent variable error: The system cannot find the file specified

Our SSIS packages are structured as one Control package and many child packages (about 30) that are invoked from the control package. The child packages are invoked with Execute Package Task. There is one Execute Package Task per child package. Each Execute Package Task uses a File Connection Manager to specify the path to the child package dtsx file. There is one File Connection Manager per child package. Each File Connection Manager has an expression defined for its ConnectionString property. This expression looks like this:
@[Template::FolderPackages]+"MyPackage.dtsx"
The file name is different for each package. The variable (FolderPackages) is specified in the SSIS package configuration file.
The error that is generated during run time is
Error 0x80070002 while loading package file "MyPackage.dtsx". The system cannot find the file specified.
The package that fails is different from run to run, and sometimes no packages fail at all. This is when running in exactly the same environment, with the same data, etc.
I ran FileMon during this error and found that when the error happens, SSIS tries to read the dtsx file from the wrong place, namely system32. I checked that this is identical to what would happen if the @[Template::FolderPackages] variable were empty. But because the very same variable is used for every child package, and works for some but sometimes doesn't for others, I have no explanation for this.
Anything obvious, or time to raise a support call with Microsoft?
Are you using Expressions on the SSIS variables directly? Variables with Expressions are calculated each time the variable is referenced by the consuming object which needs to use it. That is where the race condition bug exists, because sometimes the expression doesn't get evaluated if another thread is already evaluating a different variable, and the default value for the variable is provided to the consumer object.
If that matches your design, these two bugs on the connect site discuss the problem, and the workarounds:
https://connect.microsoft.com/SQLServer/feedback/details/332372/ssis-variable-expressions-dont-always-evaluate
A second one at
connect.microsoft.com/SQLServer/feedback/details/406534/ssis-2008-variable-expressions-dont-always-evaluate
A summary of workarounds:
- Note the parallel tasks that could run in your SSIS control flow and use these expression variables. If two tasks sit side by side and each relies on the same variable, and that variable has an Expression to set its value, then you could hit this.
- Manually serialize such tasks so that they don't run in parallel. That is, add a green arrow on the control flow so that the tasks run in order (Task1, Task2, Task3) rather than side by side on parallel paths or inside the same container with no paths.
- Avoid variable expressions: assign local variables in the required order using a home-made script task that does the same kind of work, so that variables are not evaluated using expressions (the thing that can hit this race condition). In other words, manually assign the variable values at a point in your control flow just before they are used. The point of using expressions on variables is to dynamically set a value based on another value whenever it is used, so this achieves a similar design goal in a manual way.
- Reduce threads to minimize the potential for the race: set the Dataflow task EngineThreads to 1 and MaxConcurrentExecutables to 1. This helps serialize execution of your package to one task at a time, with the side effect of possibly slower performance.
- Create and set values on distinct copies of variables at different scope levels in the design, so that they evaluate in different parallel execution scopes and avoid expression evaluation on parallel threads: Master::Var1, Child1::Var1, Child2::Var1.
A bit of a stab in the dark but...
I've had a similar issue with variables where readonly=false and multiple components were reading the variable at the same time and causing locking issues.
I could consistently recreate the problem by running, inside a For Loop container, a pair of data flows that did nothing but reference the variable. Changing the variable to read-only resolved the problem.
If you temporarily hardcode the package name does this resolve the issue?
Turns out after sending trace info to Microsoft that we are encountering heap corruption. I'll update this question if we get to the bottom of it.
The current suggestion is to disable heap lookaside for dtexec.exe.
The official answer to this issue is that it is a bug in SQL 2005 and 2008. Many tasks accessing the same variable cause a race condition, and some tasks get the default value for the expression instead of the evaluated value.
The workaround is to ensure that the default value (the value defined in the property sheet for whatever property you are having trouble with) is the value that will work in your production environment.
This way, when the race condition happens in prod, SSIS will fall back to the package value, which will still work.
In dev? Well you're just going to have to deal with that manually until we get a bug fix from Microsoft.
There is a KB article relating to this issue: http://support.microsoft.com/kb/2448991 which states when and where this was fixed.