Determining in Snakefile whether running in cluster mode or not

Is there a generic way to tell whether snakemake was executed in cluster mode (e.g., --cluster, --cluster-sync, --drmaa)?
The use case is that in the cluster case I'd like to copy data from/to storage in some rules.

No, this information is intentionally not exposed. A more snakemakeish way would be to keep such special handling out of the workflow definition. That way, the scalability of the workflow is not limited and the code is not cluttered with platform-specific details. Instead, you can use the --default-remote-provider argument if your storage protocol is supported, see here. Another possibility is to copy the files in the jobscript. Both strategies can be implemented very flexibly via configuration profiles, see here. A good example of a comprehensive profile that performs a similar task is this one.

Would it help to use an input function to copy the files?
This solution helped me in a slightly related case:
Snakemake: Generic input function for different file locations
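A minimal sketch of what such an input function could look like, written as plain Python of the kind used inside a Snakefile. The linked answer is not reproduced here, so the names make_locator and locate, the wildcard name sample, the .txt suffix, and the idea of searching a list of directories are all assumptions for illustration:

```python
import os


def make_locator(search_dirs, suffix=".txt"):
    """Build an input function that looks for a sample in several directories.

    search_dirs is a hypothetical list of locations, in order of preference.
    """

    def locate(wildcards):
        filename = f"{wildcards.sample}{suffix}"
        for directory in search_dirs:
            candidate = os.path.join(directory, filename)
            if os.path.exists(candidate):
                return candidate
        # Nothing found: return the last candidate so that the missing
        # input is reported against a concrete path.
        return os.path.join(search_dirs[-1], filename)

    return locate
```

In a Snakefile this could be wired up as `input: make_locator(["/scratch/data", "/mnt/storage/data"])`, where both paths are placeholders. Snakemake then calls the returned function with the job's wildcards.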

Related

What do I need to wrap a PL/SQL package in GitLab CI/CD?

I am trying to learn more about GitLab CI/CD and want to write a stage such as "wrap_packages" that takes a specific list of .sql files, wraps them into .plb files, and copies them into a specific folder.
Everything works so far, but now I have to implement the wrapping. I guess I have to use an image with Oracle Middleware to get the wrap command? Or is there a better way to do this? I can't find anything that helps me with this.
I hope you can help me with this.

The wrap utility either exists in the full OCI client installation (not instant client), or within the actual database as an API. The simplest way to wrap your code is using the database API, after it is installed, as demonstrated here: https://github.com/pmdba/code-obfuscation-toolkit. There are a variety of ways that this could be incorporated into your CI/CD pipeline.
If you're looking for a more robust commercial (licensed $$$) solution, consider PFCLObfuscate (http://www.petefinnigan.com/products/pfclobfuscate.htm). It has a command-line option that integrates well with CI/CD.
A question that must also be asked is why you want to obfuscate your code with "wrap"? At best obfuscation only slows down someone who wants to see your code, as it is rather easily undone (at least the wrapping part). Deeper obfuscation (as provided by PFCLObfuscate, for example) actually changes the formatting of your code, your variable names, etc. before wrapping to make it much harder to tell what is going on even after it is unwrapped.
It is important to understand that there is no level of protection available for PL/SQL that can prevent someone with access to the wrapped code from unwrapping it and seeing the actual PL/SQL.

What is an efficient and simple way to lock access to a specific resource in Kotlin

We received an assignment where we have to create a distributed file system. This file-system should have multiple servers, each performing a certain function.
This question relates to the lock server, which is used to prevent two people from writing to the same file at once. Every attempt to access a file spawns a thread that, when finished, provides access to the requested file. If a file that is not currently free is accessed, the thread should be BLOCKED until the lock is released. In Java I would probably just use the wait() and notify() methods, but these are not present in Kotlin (I know you can force them in by casting, but it is frowned upon). Is there an elegant way to do this? We are not limited in what libraries we can use, so if you know one that could fit I will gladly check it out. Right now the one I think would fit best is ReentrantLock, but I am looking for more possibilities.
I have also checked out this list: https://stackoverflow.com/a/35521983/7091281
But none of the ones listed seemed to fit - I specifically need to block the thread, while everything I find does the exact opposite.
BTW the different parts of the system are supposed to communicate via RMI. Also, while we can go our own way, we are encouraged to use threads instead of coroutines. (We are supposed to work in Java, but we were allowed to use Kotlin and Scala.)

If you want to use pure Kotlin, you could leverage coroutines, and more specifically its Mutex for locking.
More info can be found in the Kotlin docs on Shared Mutable State and Concurrency.

In the Diode library for scalajs, what is the distinction between an Action, AsyncAction, and PotAction, and which is appropriate for authentication?

In the scala and scalajs library Diode, I have used but not entirely understood the PotAction class and only recently discovered the AsyncAction class, both of which seem to be favored in situations involving, well, asynchronous requests. While I understand that, I don't entirely understand the design decisions and the naming choices, which seem to suggest a more narrow use case.
Specifically, both AsyncAction and PotAction require an initialModel and a next, as though both are modeling an asynchronous request for some kind of refreshable, updateable content rather than a command in the sense of CQRS. I have a somewhat-related question open regarding synchronous actions on form inputs by the way.
I have a few specific use cases in mind. I'd like to know a sketch (not asking for implementation, just the concept) of how you use something like PotAction in conjunction with any of:
Username/password authentication in a conventional flow
OpenAuth-style authentication with a third-party involved and a redirect
Token or cookie authentication behind the scenes
Server-side validation of form inputs
Submission of a command for a remote shell
All of these seem to be a bit different in nature to what I've seen using PotAction but I really want to use it because it has already been helpful when I am, say, rendering something based on the current state of the Pot.

Historically speaking, PotAction came first and then at a later time AsyncAction was generalized out of it (to support PotMap and PotVector), which may explain their relationship a bit. Both provide abstraction and state handling for processing async actions that retrieve remote data. So they were created for a very specific (and common) use case.
I wouldn't, however, use them for authentication as that is typically something you do even before your application is loaded, or any data requested from the server.
Form validation is usually a synchronous thing: you don't do it in the background while the user is doing something else. So again, Async/PotAction are not a very good match and do not provide much added value.
Finally for the remote command use case PotAction might be a good fit, assuming you want to show the results of the command to the user when they are ready. Perhaps PotStream would be even better, depending on whether the command is producing a steady stream of data or just a single message.
In most cases you should use the various Pot structures for what they were meant for, that is, fetching and updating remote data, and maybe apply some of the ideas or internal models (such as the retry mechanism) to other request types.
All the Pot stuff was separated from Diode core into its own module to emphasize that they are just convenient helpers for working with Diode. Developers should feel free to create their own helpers (and contribute back to Diode!) for new use cases.

Managing complex configurations

I would like to ask for your opinion on best practices for managing large numbers of complex configuration files (for example XML, .properties, custom formats, etc.), as nowadays every more complex project consists of too many to count.
How do you avoid getting lost in such a mess? How can they best be reused? Is there any good tool that can help (maybe something Eclipse-based)?

Obviously, the most fundamental thing to do is to put those configuration files into source control, as they really are a kind of source code.
But the real challenge with configuration management is deciding how many files to have and what to put where so that you can reuse common configurations. Those are design decisions and very project- and environment-specific, and no tool can make them for you.
A common approach is to have a master config file that contains default values, and environment-specific files that contain only those config values that are different for their environment, and which overwrite the defaults. This happens as part of an automated build process (which you really, really should have).
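A minimal sketch of that layering in Python, assuming the config files have already been parsed into nested dictionaries. The function name merge_config and the keys and values in DEFAULTS and PRODUCTION are made up for illustration:

```python
from copy import deepcopy


def merge_config(defaults, overrides):
    """Return a new dict with overrides applied on top of defaults.

    Nested dicts are merged key by key; any other value is replaced.
    Neither input is modified.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of the master file and of one environment file.
DEFAULTS = {"db": {"host": "localhost", "port": 5432}, "debug": True}
PRODUCTION = {"db": {"host": "db.example.internal"}, "debug": False}

config = merge_config(DEFAULTS, PRODUCTION)
```

The environment file only lists what differs (here the database host and the debug flag), and everything else, such as the port, comes from the defaults.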

I'm not sure if this would be applicable to your situation, but I'd try to store them all in a database in a normalised fashion. Maybe you could write some import tools that grab the info from the files and add it to the DB.
Alternatively, reading this or this might help you.

How do you test your applications for reliability under badly behaving I/O

Almost every application out there performs i/o operations, either with disk or over network.
My applications work fine in the development environment, but I want to be sure they will still do so when the Internet connection is slow or unstable, or when the user attempts to read data from a badly written CD.
What tools would you recommend to simulate:
slow i/o (opening files, closing files, reading and writing, enumeration of directory items)
occasional i/o errors
occasional 'access denied' responses
packet loss in tcp/ip
etc...
EDIT:
Windows:
The closest solution to do the job as described seems to be holodeck, commercial software (>$900).
Linux:
No open solution has been found so far, but the same effect can be achieved as described by smcameron and krosenvold.
Decorator pattern is a good idea.
It would require wrapping my I/O classes, but the result would be a testing framework.
The only remaining untested code would be in 3rd party libraries.
Yet I decided not to go this way, but leave my code as it is and simulate i/o errors from outside.
I now know that what I need is called 'fault injection'.
I thought it was a common production-line part with plenty of solutions I just didn't know.
(By the way, another similar good idea is 'fuzz testing', thanks to Lennart)
In my opinion, the problem is still not worth $900.
I'm going to implement my own open-source tool based on hooks (targeting win32).
I'll update this post when I'm done with it. Come back in 3 or 4 weeks or so...

What you need is a fault injecting testing system. James Whittaker's 'How to break software' is a good read on this subject and includes a CD with many of the tools needed.

If you're on Linux you can do tons of magic with iptables:
iptables -I OUTPUT -p tcp --dport 7991 -j DROP
The same approach can simulate connections going up and down as well. There are lots of tutorials out there.

Check out "Fuzz testing": http://en.wikipedia.org/wiki/Fuzzing

At a programming level many frameworks will let you wrap the IO stream classes and delegate calls to the wrapped instance. I'd do this and add in a couple of wait calls in the key methods (writing bytes, closing the stream, throwing IO exceptions, etc). You could write a few of these with different failure or issue type and use the decorator pattern to combine as needed.
This should give you quite a lot of flexibility with tweaking which operations would be slowed down, inserting "random" errors every so often etc.
The other advantage is that you could develop it in the same code as your software so maintenance wouldn't require any new skills.
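A minimal sketch of such a wrapper in Python, assuming the code under test only needs read, write and close. The class name FaultyStream and its delay, error_rate and rng parameters are invented for this example:

```python
import io
import random
import time


class FaultyStream:
    """Wraps a file-like object, adding a delay and occasional I/O errors."""

    def __init__(self, wrapped, delay=0.0, error_rate=0.0, rng=None):
        self._wrapped = wrapped
        self._delay = delay            # seconds to wait before each call
        self._error_rate = error_rate  # probability in [0, 1] of a fault
        self._rng = rng or random.Random()

    def _maybe_fail(self, operation):
        time.sleep(self._delay)
        if self._rng.random() < self._error_rate:
            raise OSError(f"injected fault during {operation}")

    def read(self, size=-1):
        self._maybe_fail("read")
        return self._wrapped.read(size)

    def write(self, data):
        self._maybe_fail("write")
        return self._wrapped.write(data)

    def close(self):
        self._maybe_fail("close")
        self._wrapped.close()


# Wrappers can be stacked: a slow stream that is also occasionally flaky.
slow = FaultyStream(io.BytesIO(b"payload"), delay=0.01)
slow_and_flaky = FaultyStream(slow, error_rate=0.1, rng=random.Random(42))
```

Passing a seeded random.Random keeps the "random" errors reproducible between test runs, which makes failures easier to debug.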

You don't say what OS, but if it's linux or unix-ish, you can wrap open(), read(), write(), or any library or system call etc, with an LD_PRELOAD-able library to inject faults.
Along these lines:
http://scaryreasoner.wordpress.com/2007/11/17/using-ld_preload-libraries-and-glibc-backtrace-function-for-debugging/

I didn't go writing my own file system filter, as I initially thought, because there's a simpler solution.
1. Network i/o
I've found at least 2 ways to simulate i/o errors here.
a) Running a virtual machine (such as VMware) lets you configure bandwidth and packet loss rate. VMware supports on-machine debugging.
b) Running a proxy on the local machine and tunneling all the traffic through it. For UDP/TCP communications a proxifier (e.g. WideCap) can be used.
2. File i/o
I've managed to reduce this scenario to the previous one by mapping a drive letter to a network share that resides inside the virtual machine. The file I/O will then be slow.
A cheaper alternative exists: to set up a local ftp server (e.g. FileZilla), configure speeds and use Novell's NetDrive to access it.

You'll want to set up a test lab for this. What type of application are you building anyway? Are you really expecting the application to be fed corrupt data?
A test technique I know the Microsoft Exchange Server people tried was sending noise to the server. Basically feeding every possible input with seemingly random data. They managed to crash the server quite often this way.
But still, if you can't trust input that hasn't been signed then general rules apply. Track every operation which could potentially be untrusted (result of corrupt data) and you should be able to handle most problems gracefully.
Just test your application's behavior on random input. That should catch most problems, but you'll never be able to fully protect yourself from corrupt data. That's just not possible, as the data could be part of some internal buffer being handed off within the application itself.
Be mindful of when and how you decode data. That is all.

The first thing you'll need to do is define what "correct" means under these circumstances. You can only test against a definition of what behaviour is intended.
The tactics of testing will depend on technology. In the context of automated unit testing, I have found it very useful, in OO languages such as Java, to use various flavors of "mocking" or "stubbing" to pass e.g. misbehaving InputStreams to parts of my code that used file I/O.
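The same stubbing idea can be sketched in Python with unittest.mock (the answer's own experience was with Java). The function read_all_or_default is a made-up stand-in for the code under test:

```python
import io
from unittest import mock


def read_all_or_default(stream, default=b""):
    """Hypothetical code under test: falls back to a default on I/O errors."""
    try:
        return stream.read()
    except OSError:
        return default


# A stub that looks like a binary stream but fails on every read.
failing = mock.Mock(spec=io.BufferedIOBase)
failing.read.side_effect = OSError("simulated read failure")

result = read_all_or_default(failing, default=b"fallback")
```

Because the stub is passed in like any other stream, the error path can be exercised without touching a real disk or network.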

Consider Holodeck for some of the fault injection. If you have access to spare hardware, you can simulate network impairment using Netem or the Mini-Maxwell, a commercial product based on it, which is much more expensive than free but possibly easier to use.
