Is there any way to generate a set of JWebUnit tests from an Apache rewrite config?

Seems unlikely, but is there any way to generate a set of unit tests for the following rewrite rule:
RewriteRule ^/(user|group|country)/([a-z]+)/(photos|videos)$ http://whatever?type=$1&entity=$2&resource=$3
From this I'd like to generate a set of URLs of the form:
/user/foo/photos
/user/bar/photos
/group/baz/videos
/country/bar/photos
etc...
The reason I don't want to just do this once by hand is that I'd like the bounded alternation groups (e.g. (user|group|country)) to be able to grow while the tests maintain coverage, without my having to update them by hand.
Is there a rewrite rule or regex parser that might be able to do this, or am I doing it by hand?

If you don't mind hacking a few lines of Perl, there's a package, Regexp::Genex, that you can use to generate something close to what you require, e.g.
# perl -MRegexp::Genex=:all -le 'print for strings(qr/\/(user|group|country)\/([a-z]+)\//)'
/user/dxb/
/user/dx/
/user/d/
/group/xd/
/group/x/
# perl -MRegexp::Genex=:all -le 'my $re=qr/\/(user|group|country)\/([a-z]+)\/(phone|videos)/;$Regexp::Genex::DEFAULT_LEN = length $re;print for strings($re)'
/user/mgcgmccdmgdmmzccgmczgmzzdcmmd/phone
/user/mgcgmccdmgdmmzccgmczgmzzdcmm/phone
/user/mgcgmccdmgdmmzccgmczgmzzdcm/phone
/user/mgcgmccdmgdmmzccgmczgmzzdc/phone
...
/group/gg/videos
/group/g/phone
/group/g/videos
/country/jvmmm/phone
/country/jvmmm/videos
/country/jvmm/phone
/country/jvmm/videos
/country/jvm/phone
/country/jvm/videos
/country/jv/phone
/country/jv/videos
/country/j/phone
/country/j/videos
#
Note:
1) You'll need to write a wrapper that parses the source file, tokenises (extracts) the source patterns, escapes certain characters in each rule (e.g. "/"), and possibly splits your rules into more manageable parts, before expanding them via Genex and outputting the results in the desired format (a sketch follows below).
2) To install the module type: cpan Regexp::Genex
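For illustration, a minimal sketch of such a wrapper (the file handling and output format are assumptions, and it only covers the simple one-pattern-per-line RewriteRule layout shown above):
#!/usr/bin/perl
# genex_rewrite.pl -- hypothetical wrapper: expand the pattern part of each
# RewriteRule in an Apache config into sample URLs via Regexp::Genex.
use strict;
use warnings;
use Regexp::Genex qw(strings);

while (my $line = <>) {
    # Grab the pattern field of each RewriteRule; the substitution is ignored.
    next unless $line =~ /^\s*RewriteRule\s+(\S+)/;
    my $pattern = $1;
    $pattern =~ s/^\^//;    # drop the leading ^ anchor
    $pattern =~ s/\$$//;    # drop the trailing $ anchor
    print "$_\n" for strings(qr/$pattern/);
}
Run it as: perl genex_rewrite.pl httpd.conf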


Can I add a file to rule all: which is not defined in output

A number of commands silently produce extra files that are not defined in the rule's output section.
When I try to make sure these are produced by adding them to 'rule all:', a re-run of the workflow fails because the files are not found in any rule's output list.
Can I add a supplementary file (not present as {output}) to the 'rule all:'?
Thanks
e.g. the STAR indexing step produces a number of files in a folder defined by command arguments; checking for the presence of the folder does not mean that indexing completed normally.
Added for clarity: the STAR index example takes 'star_idx_75' as its output argument and makes a folder of that name in which all the following files are stored (their number may vary depending on the index type).
chrLength.txt
chrName.txt
chrNameLength.txt
chrStart.txt
exonGeTrInfo.tab
exonInfo.tab
geneInfo.tab
Genome
genomeParameters.txt
SA
SAindex
sjdbInfo.txt
sjdbList.fromGTF.out.tab
sjdbList.out.tab
transcriptInfo.tab
What I wanted was to check that they are all present, BUT none of them is used to build the command itself, and if I require them in the 'rule all:' a rerun breaks because they are not in any snakemake {output} definition.
This is why I asked whether I could create 'fake' output variables that are not 'used' for running a command but allow placing the corresponding items in the 'rule all:' - am I more clear now :-).
Can I add a supplementary file (not present as {output}) to the 'rule all:'?
I don't think so, at least not without resorting to some convoluted solution. Every file in rule all (or, more precisely, in the first rule) must have a rule that lists it in its output.
If you don't want to repeat a long list, why not do something like this?
star_index = ['ref.idx1', 'ref.idx2', ...]

rule all:
    input:
        star_index

rule make_index:
    input:
        ...
    output:
        star_index
    shell:
        ...
It's probably better to list them all in the rule's output, but only use the relevant ones in subsequent rules. You could also look into using directory(), which could possibly fit here.
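For reference, a minimal sketch of the directory() approach (the rule name, paths, and the STAR invocation here are assumptions, not the poster's actual workflow):
rule all:
    input:
        "star_idx_75"

rule star_index:
    input:
        fasta="ref.fa"  # hypothetical reference
    output:
        directory("star_idx_75")  # track the whole folder rather than each file
    shell:
        "STAR --runMode genomeGenerate --genomeDir {output} "
        "--genomeFastaFiles {input.fasta}"
Snakemake then treats the rule as complete once the directory exists, so the variable set of index files inside it never needs to appear in any output section.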

How to gather files from subdirectories to run jobs in Snakemake?

I am currently working on a project where I am struggling with the following issue.
My current directory structure is
/shared/dir1/file1.bam
/shared/dir2/file2.bam
/shared/dir3/file3.bam
I want to convert various .bam files to fastq in the results directory
results/file1_1.fastq.gz
results/file1_2.fastq.gz
results/file2_1.fastq.gz
results/file2_2.fastq.gz
results/file3_1.fastq.gz
results/file3_2.fastq.gz
I have the following code:
END = ["1", "2"]
(dirs, files) = glob_wildcards("/shared/{dir}/{file}.bam")

rule all:
    input:
        expand("/results/{sample}_{end}.fastq.gz", sample=files, end=END)

rule bam_to_fq:
    input:
        "{dir}/{sample}.bam"
    output:
        left="/results/{sample}_1.fastq",
        right="/results/{sample}_2.fastq"
    shell:
        "/shared/packages/bam2fastq/bam2fastq --force -o /results/{sample}.fastq {input}"
This outputs the following error:
Wildcards in input files cannot be determined from output files:
'dir'
Any help would be appreciated
You're just missing an assignment for "dir" in the input directive of your rule bam_to_fq. In your code you are trying to get Snakemake to determine "{dir}" from the output of the same rule, because you have it set up as a wildcard. Since it doesn't exist as a variable in your output directive, you get the error.
input:
    "{dir}/{sample}.bam"
output:
    left="/results/{sample}_1.fastq",
    right="/results/{sample}_2.fastq",
Rule of thumb: input and output wildcards must match
rule all:
    input:
        expand("/results/{sample}_{end}.fastq.gz", sample=files, end=END)

rule bam_to_fq:
    input:
        expand("{dir}/{{sample}}.bam", dir=dirs)
    output:
        left="/results/{sample}_1.fastq",
        right="/results/{sample}_2.fastq"
    shell:
        "/shared/packages/bam2fastq/bam2fastq --force -o /results/{wildcards.sample}.fastq {input}"
NOTES
The sample variable in the input directive now requires double braces, {{sample}}, because that is how you keep a wildcard intact inside an expand (see the small example after these notes).
dir is no longer a wildcard: it is explicitly set to the list of directories returned by the glob_wildcards call and assigned to the variable "dirs", which I assume you make earlier in your script, since the assignment of the other variable already works in your rule all input ("sample=files").
Inside the shell directive, wildcards need the wildcards. prefix, i.e. {wildcards.sample} rather than {sample}.
I like and recommend easily differentiable variable names. I'm not a huge fan of "dir" and "dirs": that makes you prone to pedantic spelling errors. Consider changing them to "dirLIST" and "dir"... or anything, really. I just fear one day someone will miss an 's' somewhere and it's going to be frustrating to debug. I'm personally guilty, and thus a slight hypocrite, as I use "sample=samples" in my core Snakefile; it has caused me minor stress, which is why I make this recommendation. It also makes your code easier for others to read.
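To make the double-brace behaviour concrete, here is a small, hypothetical expand call and its result:
# {dir} is filled in now; {{sample}} survives as the wildcard {sample}
expand("{dir}/{{sample}}.bam", dir=["dir1", "dir2"])
# -> ["dir1/{sample}.bam", "dir2/{sample}.bam"]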
EDIT 1: adding to the response, as I had initially missed the requirement for key-value matching of dir and sample.
I recommend keeping the path and the sample name in separate variables. Two approaches I can think of:
Keep using glob_wildcards to make a blanket search for all possible variables, and then use a Python function to validate which path+file combinations are legitimate (see the sketch at the end of this answer).
Drop the usage of glob_wildcards. Propagate the directory name as a wildcard variable, {dir}, throughout your rules. Just set it as a sub-directory of "results". Use pandas to pass known key-value pairs listed in a file to the rule all. Initially I suggest generating the key-value pairs file manually, but eventually its generation could just be a rule upstream of the others.
Generalizing bam_to_fq a little bit and utilizing an external config, something like....
from pandas import read_table

rule all:
    input:
        expand("/results/{sample[1][dir]}/{sample[1][file]}_{end}.fastq.gz",
               sample=read_table(config["sampleFILE"], sep=" ").iterrows(),
               end=['1', '2'])

rule bam_to_fq:
    input:
        "{dir}/{sample}.bam"
    output:
        left="/results/{dir}/{sample}_1.fastq",
        right="/results/{dir}/{sample}_2.fastq"
    shell:
        "/shared/packages/bam2fastq/bam2fastq --force -o /results/{wildcards.sample}.fastq {input}"
sampleFILE
dir file
dir1 file1
dir2 file2
dir3 file3
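For the first approach, here is a minimal sketch of pairing the glob_wildcards results with an input function (the function and dict names are hypothetical): glob_wildcards returns dirs and files in matching order, so zipping them records which file actually lives in which directory.
(dirs, files) = glob_wildcards("/shared/{dir}/{file}.bam")

# Pair each sample with the directory it was actually found in,
# instead of taking the full dirs x files cross product.
sample_dir = dict(zip(files, dirs))

def bam_path(wildcards):
    # Look up the source directory for this sample.
    return "/shared/{}/{}.bam".format(sample_dir[wildcards.sample], wildcards.sample)

rule bam_to_fq:
    input:
        bam_path
    output:
        left="/results/{sample}_1.fastq",
        right="/results/{sample}_2.fastq"
    shell:
        "/shared/packages/bam2fastq/bam2fastq --force -o /results/{wildcards.sample}.fastq {input}"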

apache velocity: remap $ and # keys

I wonder if it is possible to remap "$" and "#" to other keys.
sample:
#set( $foo = "bar" )
I want to use other keys because those interfere with another syntax of a script I am using.
The $ and # characters are not configurable in Velocity. Even changing them at compile time would mean recompiling the parser and doing a full code review for standalone $ and # characters...
That said:
Velocity copes pretty well with syntax fragments it cannot parse, like the jQuery $ object. It just renders them as-is, and most of the time that does the job.
You can escape your other script's sensitive characters whenever needed, for instance by using the EscapeTool: ${esc.d} for dollar, ${esc.h} for hash.
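As a small illustration, a template sketch (this assumes the EscapeTool is exposed in the context under the key "esc", e.g. via a toolbox configuration):
## Velocity still parses its own directives normally:
#set( $foo = "bar" )
## Escaped characters render literally for the other script's syntax:
var price = ${esc.d}{price};  ## renders as: var price = ${price};
${esc.h}!/bin/sh              ## renders as: #!/bin/sh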

OCLint rule customization

I am using the OCLint static code analysis tool for Objective-C and want to find out how to customize its rules. The rules are represented by a set of dylib files.
In lieu of passing configuration as arguments (see Jon Boydell's answer), you can also create a YML file named .oclint in the project directory.
Here's an example file that customizes a few things:
rules:
  - LongLine
disable-rules:
rulePaths:
  - /etc/rules
rule-configurations:
  - key: LONG_LINE
    value: 20
output: filename
report-type: xml
max-priority-1: 10
max-priority-2: 20
max-priority-3: 30
enable-clang-static-analyzer: false
The answer, as with so many things, is that it depends.
If you want to write your own custom rule then you'll need to get down and dirty writing it in C++ on top of the existing source code. Check out the oclint-rules/rules directory; size/LongLineRule.cpp is a simple rule to get going with. You'll need to recompile, etc.
If you want to change the parameters of an existing rule you need to add the command line parameter -rc=<rulename>=<value> to the call to oclint. For example, if you want the long lines rule to only activate for lines longer than 150 chars you need to add -rc=LONG_LINE=150.
I don't have the patience to list out all the different parameters you can change. The list of rules is at http://docs.oclint.org/en/dev/rules/index.html and the threshold-based rules are at http://docs.oclint.org/en/dev/customizing/rules.html, but there's no list of acceptable values, and I don't know whether these two URLs cover all the rules. You might have to look into the source code for each rule to work out how it works.
If you're running OCLint from an Xcode script you should pass oclint_args like this:
oclint-json-compilation-database oclint_args "-rc LONG_LINE=150" | sed 's/\(.*\.m\{1,2\}:[0-9]*:[0-9]*:\)/\1 warning:/'
In that sample I'm changing the LONG_LINE rule to 150 chars; the sed expression tags each finding as a "warning:" so that Xcode displays it.

Anyone know how to make a self-contained Awk/Gawk program on Windows

I'm using an awk script to do some reasonably heavy parsing that could be useful to repeat in the future, but I'm not sure my unix-unfriendly co-workers will be willing to install awk/gawk in order to do the parsing. Is there a way to create a self-contained executable from my script?
I'm not aware of a way to make a self-contained binary using AWK. However, if you like AWK, chances seem good that you might like Python, and there are several ways to make a self-contained Python program. For example, Py2Exe.
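For instance, a minimal py2exe build script could look like this (the script name parse.py is hypothetical, and py2exe's classic distutils interface is assumed):
# setup.py -- build a standalone .exe from parse.py with py2exe
from distutils.core import setup
import py2exe  # importing registers the "py2exe" command

setup(console=["parse.py"])
Running python setup.py py2exe then drops a self-contained parse.exe (plus support files) into the dist folder.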
Here's a quick example of Python:
# comments are introduced by '#', same as AWK
import re   # make regular expressions available
import sys  # system stuff like args or stdin

# read from specified file, else read standard input
if len(sys.argv) == 2:
    f = open(sys.argv[1])
else:
    f = sys.stdin

# Compile some regular expressions to use later.
# You don't have to pre-compile, but it's more efficient.
pat0 = re.compile("regexp_pattern_goes_here")
pat1 = re.compile("some_other_regexp_here")

# for loop to read input lines.
# This assumes you want normal line separation.
# If you want lines split on some other character, you would
# have to split the input yourself (which isn't hard).
# I can't remember ever changing the line separator in my AWK code...
for line in f:
    FS = None  # default: split on whitespace
    # change FS to some other string to change the field separator
    words = line.split(FS)
    if pat0.search(line):
        pass  # handle the pat0 match case
    elif pat1.search(line):
        pass  # handle the pat1 match case
    elif words and words[0].lower() == "the":
        # (the "words and" guard skips blank lines)
        pass  # handle the case where the first word is "the"
    else:
        for word in words:
            pass  # do something with each word
Not the same as AWK, but easy to learn, and actually more powerful than AWK (the language has more features and there are many "modules" to import and use). Python doesn't have anything implicit like the
/pattern_goes_here/ {
    # code goes here
}
feature in AWK, but you can simply have an if/elif/elif/else chain with patterns to match.
There's a standalone awk.exe in the Cygwin toolkit, as far as I know.
You could just bundle that in with whatever files you're distributing to your colleagues.
Does it have to be self-contained? You could write a small executable that invokes awk with the right arguments and pipes the results to a file the user chooses, or to stdout, whichever is appropriate for your co-workers.
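A sketch of such a wrapper as a Windows batch file (bundling awk.exe and a parse.awk script next to it are assumptions):
@echo off
rem run_parse.bat -- invoke the bundled awk.exe with our parsing script.
rem %~dp0 is the folder this batch file lives in; %* forwards all arguments.
"%~dp0awk.exe" -f "%~dp0parse.awk" %*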
MAWK in GnuWin32: http://gnuwin32.sourceforge.net/packages/mawk.htm
Another interesting alternative is Jawk, a Java implementation: http://sourceforge.net/projects/jawk/