I am trying to run a twisted.trial.TestCase that depends on resource folders (images, for instance) that reside alongside my Python package called test. Unfortunately, the temporary directory that the test runner creates when I issue trial test naturally doesn't include a copy of the whole original working directory, so my tests fail because the images cannot be found. The function of the software is heavily dependent on those images, so they'll need to be a part of testing.
The question is, is there a way to customize the _trial_temp directory that the test runner normally creates from scratch so that it includes certain files and folders, besides what the test runner itself thinks it needs?
No.
Don't do it this way. If you need data from your project, it is not in any sense temporary data. If you point trial at a directory using --temp-directory, it will assume it is in fact "temporary" and will blow it away. Instead, you should access the data relative to the path of the tests.
If you put your sample data into the same directory as your tests, and treat it as package_data, you can do this:
from twisted.python.modules import getModule
thisModule = getModule(__name__)
dataPath = thisModule.filePath.parent()
and to get data in your tests:
fileobj = dataPath.child("sample_file.data").open()
databytes = dataPath.child("other_file.txt").getContent()
So keep your temporary directories and your sample data separate.
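Putting that together, a complete trial test along these lines could look like the following sketch; sample_file.data is just a placeholder for one of your real image files shipped as package_data next to the test module.
from twisted.trial import unittest
from twisted.python.modules import getModule

class ImageDataTests(unittest.TestCase):
    # Directory containing this test module and its package data.
    dataPath = getModule(__name__).filePath.parent()

    def test_sample_data_is_readable(self):
        # "sample_file.data" is a hypothetical file living next to the tests.
        databytes = self.dataPath.child("sample_file.data").getContent()
        self.assertTrue(len(databytes) > 0)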
Are there guidelines regarding how to share a Snakemake workflow among multiple users on the same data under Linux, or is the whole thing considered bad practice?
Let me explain in case it's not clear:
Suppose user A executes a workflow in directory dir/. Assume the workflow terminates successfully, and that user A then properly sets file/directory permissions recursively on all output and intermediate files and on the .snakemake/ subdirectory so that other users can read/write them.
User B subsequently navigates to dir/, adds input files to the workflow, then executes it. Can anything go wrong?
TL;DR: I'm asking about non-concurrent execution of the same workflow by distinct users on the same system, and on the same data on disk. Is Snakemake designed for such use cases?
It's possible to run snakemake --nolock, which prevents locking of the directory, so multiple runs can be made from inside the same directory. However, without the lock there is now an opening for errors due to concurrent runs trying to modify the same files. It's probably OK if you are certain that this will be avoided, e.g. if you are in constant communication with the other user about which files will be modified.
An alternative option is to create a third directory/path and put all the data there. This way you can work from separate directories/paths and avoid costly recomputes.
I would say that from the point of view of snakemake, and workflow management in general, it's ok for user B to add or update input files and re-run the pipeline. After all, one of the advantages of a workflow management system is to update results according to new input. The problem is that user A could find her results updated without being aware of it.
Off the top of my head, and without more detail, this is what I would suggest. Make snakemake read the list of input files from a table (pandas comes in handy for this) or from some configuration file. Keep this sample sheet under version control (with git/github) together with the Snakefile and other source code.
When users update the working directory with new files, they will also need to update the sample sheet in order for snakemake to "see" the new input and other users will know about it via version control. I prefer this setup over dumping files in a directory and letting snakemake process whatever is in there.
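As a rough illustration of that setup, a Snakefile could read the sample sheet like this; the file names (samples.tsv, the data/ and results/ directories) and the sample column are made up for the example.
import pandas as pd

# Version-controlled sample sheet with one row per input file.
samples = pd.read_csv("samples.tsv", sep="\t")["sample"].tolist()

rule all:
    input:
        expand("results/{sample}.out", sample=samples)

rule process:
    input:
        "data/{sample}.txt"
    output:
        "results/{sample}.out"
    shell:
        "cp {input} {output}"  # stand-in for the real processing step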
I'm using Karate in this way; during application execution, I get the test files from another source and I create feature files based on what I get.
Then I iterate over the list of tests and execute them.
My problem is that by using
CucumberRunner.parallel(getClass(), 5, resultDirectory);
I execute all the tests at every iteration, which causes tests to be executed multiple times.
Is there a way to execute one test at a time during application execution? (I'm fully aware of the empty test class with an annotation to specify one class, but that doesn't seem to serve me here.)
I thought about creating every feature file in a new folder so that I could specify the path of the folder that contains only one feature at a time, but CucumberRunner.parallel() accepts a Class and not a path.
Do you have any suggestions please?
You can explicitly set a single file (or even directory path) to run via the annotation:
@CucumberOptions(features = "classpath:animals/cats/cats-post.feature")
I think you already are aware of the Java API which can take one file at a time, but you won't get reports.
Well, you can try this: set a system property cucumber.options with the value classpath:animals/cats/cats-post.feature and see if that works. If you add tags (search the docs), each iteration can use a different tag, and that would give you the behavior you need.
Just got an interesting idea: why don't you generate a single feature, and in that feature make calls to all the generated feature files?
Also, how about programmatically deleting (or moving) the files after you are done with each iteration?
If all the above fails, I would try to replicate some of this code: https://github.com/intuit/karate/blob/master/karate-junit4/src/main/java/com/intuit/karate/junit4/Karate.java
I currently have one container which runs Crate, and stores all its data in the /data/ directory. I am trying to create a clone of this container for debugging purposes -- ideally, the clone would be running Crate (which I can query) using the exact same data. I've tried mounting the same data directory into the /data/ directory of the cloned container and starting Crate, but when I run any queries, I notice that Crate shows 0 tables (that is, it doesn't recognize the data in the folder as database tables). How do I get around this? I know I can export and import data using COPY TO and COPY FROM, but I have so many tables that that would be quite cumbersome to write.
I'm wondering a little bit why you want to use the same data directory for debugging purposes, since you would then modify data that you probably don't want to change. Also, the two instances would overwrite each other's data when using the same data directory at the same time. That's why this is not working.
What you can still do is simply copy the folder in your file system and mount the second (debugging) node onto the cloned folder.
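As a sketch of that approach (the host paths, container name and image tag here are assumptions, and the copy should be made while Crate is not writing to the directory):
import shutil
import subprocess

# Clone the data directory on the host so the debugging node gets its own copy.
shutil.copytree("/srv/crate/data", "/srv/crate/data-debug")

# Start a second Crate container mounted on the cloned folder.
subprocess.run([
    "docker", "run", "-d", "--name", "crate-debug",
    "-v", "/srv/crate/data-debug:/data",
    "crate",
], check=True)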
Another solution would be to create a cluster containing both nodes as documented here: https://crate.io/docs/crate/guide/best_practices/docker.html.
Hope that helps.
I have a unit test project for my VB.NET solution that takes a very large data set (multiple gigabytes). The data set is static and outside of source control (developers unzip a big .zip file somewhere on their machine).
I'm trying to figure out how to reference this data set inside my project and I'm having trouble because of the following constraints:
The data set may be in a different location on each developer's machine (e:\dev\MySolution\TestProject\ vs. c:\MySolution\TestProject\data). We can require it to be in the project folder, but I can't require that the project folder be in the same place. This means I can't store the path in source control, or I need some way to override it.
Because of its size, I don't want to copy or deploy the test data at runtime, because that would slow things down a lot and I don't want to have two copies of it.
My hacky solution is to use a project property called DataPath and reference it inside each test. Users would then just change the property, and I'd instruct them not to commit this change... which feels like an anti-pattern, though.
My initial thoughts were to somehow store the path in the project.suo file which is outside of source control but I can't figure out how to make that work.
Anyone have experience with this?
I'm pretty new to both Taverna and Abaqus, but I am trying to run an Abaqus model using a "Tool" in Taverna remotely on an HPC. This works fine if I already have my model file and inputs on the HPC, but I need a way of uploading the files dynamically in Taverna (I'm trying to generically wrap Abaqus models).
I've tried adding an input port that takes a file list, but I don't know how I can copy it to the "location" that I've set for the tool. Could a beanshell service be the answer, or can I iterate through the file list and copy the files up before executing the Abaqus model?
Thanks
When you say that you created an input port that takes a file list, I guess you mean an input to the tool service.
Assuming the input port is called my_file_list, when the tool service is run, it will take a list of data values on port my_file_list. As an example, say "hello", "hi" and "hola" are the three values in the list.
On the location where the tool service is run, it executes in a temporary directory - a different directory for each execution of the service. It is normally something like /tmp/usecase-2029778474741087696
Three files will be created in the temporary directory; those files contain the (in this example) three values the tool service received on port my_file_list. The files could be called
/tmp/usecase-2029778474741087696/tempfile.0.tmp containing hello
/tmp/usecase-2029778474741087696/tempfile.1.tmp containing hi
/tmp/usecase-2029778474741087696/tempfile.2.tmp containing hola
There will also be a file called my_file_list. That file will contain
/tmp/usecase-2029778474741087696/tempfile.0.tmp
/tmp/usecase-2029778474741087696/tempfile.1.tmp
/tmp/usecase-2029778474741087696/tempfile.2.tmp
The script of your tool service would normally read the contents of my_file_list line by line and do something with the contents of the listed file(s).
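For instance, if the tool script happened to be a small Python script (just a sketch; a shell script looping over the list would work equally well), it could look like this:
# Read the list file written for the port and process each staged file.
with open("my_file_list") as listing:
    for line in listing:
        path = line.strip()
        if not path:
            continue
        with open(path) as staged:
            contents = staged.read()
        # Do something with the contents, e.g. hand them to the Abaqus model.
        print(path, len(contents))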
I have also seen some scripts that 'cheat' and iterate directly over tempfile*.tmp, but that would be "a bad thing". The problem with that trick is that if you want to add a second list of files to the tool service, then the file my_file_list could contain
/tmp/usecase7932018053449784034/tempfile.4.tmp
/tmp/usecase7932018053449784034/tempfile.5.tmp
/tmp/usecase7932018053449784034/tempfile.6.tmp
as other temporary files were used for the other file list port.
I hope that helps
The tool service allows you to upload files - but if you are using the HPC through a job submission node, then you would have to modify your command-line tool so that it uses the job file-staging command to push the files further as part of the job. The files would be available in the current (temporary) directory of the specified tool script.
I would try to do it through the Tool service and not involve the beanshell - then you can keep your workflow simpler.
A good thing to remember is that you can write multiple shell commands in the box.
Similarly, you would probably want to retrieve the results so that you can process them further in the workflow (unless they are massive - in which case you should just output their remote filenames and send them in again to the next HPC job).
The exact commands to use for staging files and retrieving them depends on the HPC job submission system. Which one are you using?
Thanks for the input guys.
It was my misunderstanding of how Taverna uses the File list. All the files in the list are copied to the temp "sandbox" and are therefore available for use.
Another nice easy way is to zip the directory and pass the zipped files into an input port for the service. Then just unzip the files inside the command.
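For what it's worth, zipping the directory on the sending side can be as simple as the sketch below (model_inputs is a hypothetical directory name), with a matching unzip call in the tool's command.
import shutil

# Create model_inputs.zip from the model_inputs/ directory; pass the archive to the input port.
archive = shutil.make_archive("model_inputs", "zip", "model_inputs")
print("pass this file to the tool's input port:", archive)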
Thanks again