Snakemake MissingInputException - snakemake

We have always run our Snakemake pipelines through Amazon S3.
snakemake --default-remote-provider S3 --default-remote-prefix '$s3' --use-conda
However, we need to run our Snakemake pipeline locally for a new experiment.
snakemake --use-conda
The pipeline works great when running with --default-remote-provider S3 --default-remote-prefix '$s3'; however, when we try to run locally we get:
Building DAG of jobs...
MissingInputException in line 226 of /usr/local/eclipse/snakemake_eclip/rules/rep_element.smk:
Missing input files for rule compress_parsed:
output: rep_element_pipeline/IN1_BET_S35_R1_001.combined_w_uniquemap.rmDup.sam.parsed.gz
wildcards: sample=IN1_BET_S35_R1_001
affected files:
rep_element_pipeline/IN1_BET_S35_R1_001.combined_w_uniquemap.rmDup.sam.parsed
ERROR conda.cli.main_run:execute(41): `conda run snakemake --use-conda --cores 36` failed. (See above for error)
Below are the two rules that are causing the error. I'm fairly certain it has something to do with the local wrapper temp(local('{full_path}' + 'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed')).
{full_path} is an absolute path where the files at the beginning of the pipeline are located.
#Create final rep element parsed file
rule merge_parsed:
input:
'rep_element_pipeline/AA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/AC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/AG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/AN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/AT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/CA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/CC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/CG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/CN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/CT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/GA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/GC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/GG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/GN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/GT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/NA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/NC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/NG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/NN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/NT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/TA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/TC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/TG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/TN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
'rep_element_pipeline/TT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed'
output:
temp(local('{full_path}' + 'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed'))
conda:
'../envs/rep_element.yaml'
params:
fp=full_path
shell:
'perl ../scripts/merge_multiple_parsed_files.pl {output[0]} {input}'
#Compress sam.parsed
rule compress_parsed:
input:
'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed'
output:
temp('rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed.gz')
params:
fp=full_path
conda:
'../envs/standard_eclip.yaml'
shell:
'pigz -c {input[0]} > {params.fp}rep_element_pipeline/{wildcards.sample}.combined_w_uniquemap.rmDup.sam.parsed.gz'

The output of rule merge_parsed is:
temp(local('{full_path}' + 'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed'))
but the input of compress_parsed is:
'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed'
The string {full_path} is in the first but not the second. Isn't this an issue or am I missing something?
Besides, I would avoid concatenating file paths with +. Better to use os.path.join which takes care of adding the correct separator.
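
To make the mismatch concrete, here is a small sketch in plain Python (not a Snakefile). It assumes full_path is an ordinary Python string; the value "/data/project/" is made up for illustration. Once the wildcard is filled in, the path that merge_parsed produces and the path that compress_parsed asks for are different strings, so Snakemake cannot connect the two rules when it works on the local filesystem.

```python
import os

# Hypothetical value; the real full_path is not shown in the question.
full_path = "/data/project/"
sample = "IN1_BET_S35_R1_001"
name = "{sample}.combined_w_uniquemap.rmDup.sam.parsed".format(sample=sample)

# What merge_parsed declares as its output (prefix + relative path).
merge_output = full_path + "rep_element_pipeline/" + name

# What compress_parsed declares as its input (relative path only).
compress_input = "rep_element_pipeline/" + name

# The two strings differ, so no rule is known to produce compress_input.
paths_match = merge_output == compress_input

# os.path.join inserts separators as needed, with or without a trailing slash.
joined = os.path.join(full_path, "rep_element_pipeline", name)
joined_no_slash = os.path.join(full_path.rstrip("/"), "rep_element_pipeline", name)

print(paths_match)
print(joined)
```

Using the same expression for the output of one rule and the input of the next (both with the prefix, or both without it) removes the ambiguity.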


How to remove line that starts with + or - followed by empty space only

Here is a file that contains:
+
-
+ <>cow apple</>
- apple
+ ball
+ +
- -
+ -
- +
+ !
-
-
+
+ $
+ **
+ *
+ =
+ #
- ?
- ◊
+ ◊◊
-
-
Expect output:
+ <>cow apple</>
- apple
+ ball
+ +
- -
+ -
- +
+ !
+ $
+ **
+ *
+ =
+ #
- ?
- ◊
+ ◊◊
How to remove line that starts with + or - followed by empty space only?
Here is a command that gives the expected result, but a better solution would be very helpful, since I am running it on a large file and it has to be accurate.
sed '/^[^[:alnum:]]* $/d'
You may use this grep with -v (inverse) option:
grep -v '^[-+][[:blank:]]*$' file
+ <>cow apple</>
- apple
+ ball
+ +
- -
+ -
- +
+ !
+ $
+ **
+ *
+ =
+ #
- ?
- ◊
+ ◊◊
Here:
^[-+][[:blank:]]*$: matches a line that starts with - or + and is followed by zero or more spaces or tabs up to the end of the line.
The following sed and awk solutions also work:
sed '/^[-+][[:blank:]]*$/d' file
awk '!/^[-+][[:blank:]]*$/' file
grep '^[+-]\s\S' file
^ start of line anchor
[+-] match on + or -
\s match a whitespace
\S match a non-whitespace
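
The two approaches above are not quite interchangeable, which a short Python sketch can show. This uses Python's re module rather than grep or sed, and the sample lines (including "context line") are made up for illustration. The first pattern deletes only marker-only lines; the second keeps only lines that have a marker, one whitespace character, and then a non-whitespace character, so it also drops lines that do not start with + or - at all.

```python
import re

# Same idea as: grep -v '^[-+][[:blank:]]*$'
MARKER_ONLY = re.compile(r"^[-+][ \t]*$")

# Same idea as: grep '^[+-]\s\S'
MARKER_THEN_TEXT = re.compile(r"^[+-]\s\S")


def drop_marker_only(lines):
    """Remove lines that are just + or - plus optional spaces/tabs."""
    return [line for line in lines if not MARKER_ONLY.match(line)]


def keep_marker_then_text(lines):
    """Keep only lines with a marker, one whitespace, then content."""
    return [line for line in lines if MARKER_THEN_TEXT.match(line)]


sample = ["+", "- ", "+ \t", "+ <>cow apple</>", "- apple", "+ +", "context line"]

kept_by_delete = drop_marker_only(sample)
kept_by_match = keep_marker_then_text(sample)

print(kept_by_delete)
print(kept_by_match)
```

If the file can contain lines without a leading + or -, the delete-style pattern is the safer choice.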

Rendering `%` using element_markdown()

How can I keep the % symbol in the title?
library(ggtext)
library(ggplot2)
ggplot(mtcars, aes(cyl, mpg)) +
geom_col() +
ggtitle("%") +
theme(plot.title = element_markdown())
Created on 2022-01-28 by the reprex package (v2.0.1)
You can add a space before the % and the character will display correctly. The leading space is ignored when the title is rendered:
ggplot(mtcars, aes(cyl, mpg)) +
geom_col() +
ggtitle(" %") +
theme(plot.title = element_markdown())

CPLEX: Error 5002 Objective is not convex -> Problem can be solved to global optimality with optimality target 3 ->

I am receiving this error on CPLEX Optimization studio. The problem is a simple quadratic problem with one equality and two inequality constraints.
.mod code shown below (no .dat used):
/*********************************************
* OPL 12.10.0.0 Model
* Author: qdbra
* Creation Date: Sep 14, 2020 at 9:40:57 PM
*********************************************/
range R = 1..5;
range B= 6..10;
dvar float x[R];
dvar boolean y[B];
minimize
( x[1]^2 - 2*x[2]^2 + 3*x[3]^2 + 4*x[4]^2
- 5*x[5]^2 + 6*y[6]^2 + 7*y[7]^2 -
8*y[8]^2 + 9*y[9]^2 + 10*y[10]^2 +
8*x[1]*x[2] + 17*x[3]*y[8] - 20*y[6]*y[9]
+ 26*y[9]*y[10])/2 ;
subject to {
ct1:
x[1] + x[2] + x[3] + x[5] + y[6] + y[7] == 20;
ct2:
x[1] + x[4] + y[8] + y[9] + y[10] >= 1;
ct3:
x[2] - x[4] - y[6] + y[7] >= 0;
}
If you set the optimality target to 3, you'll get a result:
execute
{
cplex.optimalitytarget=3;
}
range R = 1..5;
range B= 6..10;
dvar float x[R];
dvar boolean y[B];
minimize
( x[1]^2 - 2*x[2]^2 + 3*x[3]^2 + 4*x[4]^2
- 5*x[5]^2 + 6*y[6]^2 + 7*y[7]^2 -
8*y[8]^2 + 9*y[9]^2 + 10*y[10]^2 +
8*x[1]*x[2] + 17*x[3]*y[8] - 20*y[6]*y[9]
+ 26*y[9]*y[10])/2 ;
subject to {
ct1:
x[1] + x[2] + x[3] + x[5] + y[6] + y[7] == 20;
ct2:
x[1] + x[4] + y[8] + y[9] + y[10] >= 1;
ct3:
x[2] - x[4] - y[6] + y[7] >= 0;
}
will give
x = [20
0 0 0 0];
y = [0 0 0 0 0];
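
As a rough illustration of why CPLEX reports the objective as not convex, the sketch below re-implements the objective in plain Python (it does not call CPLEX) and checks the midpoint inequality along the x[2] axis, where the coefficient is negative. It also checks that the point printed above satisfies the three constraints. The helper names are made up for this example, and the check says nothing about whether that point is optimal.

```python
def objective(x, y):
    """Objective from the .mod file; x is keyed 1..5 and y is keyed 6..10."""
    return (
        x[1] ** 2 - 2 * x[2] ** 2 + 3 * x[3] ** 2 + 4 * x[4] ** 2
        - 5 * x[5] ** 2 + 6 * y[6] ** 2 + 7 * y[7] ** 2
        - 8 * y[8] ** 2 + 9 * y[9] ** 2 + 10 * y[10] ** 2
        + 8 * x[1] * x[2] + 17 * x[3] * y[8] - 20 * y[6] * y[9]
        + 26 * y[9] * y[10]
    ) / 2


def feasible(x, y):
    """Constraints ct1, ct2 and ct3 from the model."""
    ct1 = x[1] + x[2] + x[3] + x[5] + y[6] + y[7] == 20
    ct2 = x[1] + x[4] + y[8] + y[9] + y[10] >= 1
    ct3 = x[2] - x[4] - y[6] + y[7] >= 0
    return ct1 and ct2 and ct3


zero_x = {i: 0.0 for i in range(1, 6)}
zero_y = {i: 0 for i in range(6, 11)}

# Two points that differ only in x[2], and their midpoint (the origin).
f_a = objective({**zero_x, 2: 1.0}, zero_y)
f_b = objective({**zero_x, 2: -1.0}, zero_y)
f_mid = objective(zero_x, zero_y)

# A convex function needs f(mid) <= (f(a) + f(b)) / 2; here that fails.
violates_convexity = f_mid > (f_a + f_b) / 2

# The point reported in the answer: x = [20 0 0 0 0], y = [0 0 0 0 0].
reported_x = {**zero_x, 1: 20.0}
reported_feasible = feasible(reported_x, zero_y)
reported_value = objective(reported_x, zero_y)

print(violates_convexity, reported_feasible, reported_value)
```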

Problem using inprod() to summarise linear predictor

I am having a problem when trying to summarise my additive predictor:
mu[j] <- b0 + weights1[1] * A[j] + weights1[2] * A[j+1] + weights1[3] * A[j+2] + weights1[4] * A[j+3] +
weights1[5] * A[j+4] + weights1[6] * A[j+5] + weights1[7] * A[j+6] + weights1[8] * A[j+7] +
weights1[9] * A[j+8] + weights1[10] * A[j+9] + weights1[11] * A[j+10] + weights1[12] * A[j+11] +
weights2[1] * B[j] + weights2[2] * B[j+1] + weights2[3] * B[j+2] + weights2[4] * B[j+3] +
weights2[5] * B[j+4] + weights2[6] * B[j+5] + weights2[7] * B[j+6] + weights2[8] * B[j+7] +
weights2[9] * B[j+8] + weights2[10] * B[j+9] + weights2[11] * B[j+10] + weights2[12] * B[j+11]
by using inprod(). This is what I thought should be the equivalent:
mu[j] <- b0 + inprod(weights1[],A[j:(j+11)]) + inprod(weights2[],B[j:(j+11)])
While the model compiles and seems to work, it stays updating forever. It has been running for hours without finishing, while the first approach finishes in a few minutes.
These are the priors, just in case:
weights1[1] ~ dnorm(0,1.0E-6)
weights2[1] ~ dnorm(0,1.0E-6)
for(t in 2:12) {
weights1[t]~dnorm(weights1[t-1],tauweight1)}
for(t in 2:12) {
weights2[t]~dnorm(weights2[t-1],tauweight2)}
b0 ~ dnorm(0,.001)
tau ~ dgamma(0.001, 0.001)
sigma <- 1/sqrt(tau)
tauweight1~dgamma(1.0E-3,1.0E-3)
tauweight2~dgamma(1.0E-3,1.0E-3)
I am calling OpenBUGS from R using R2OpenBUGS just in case.
Thanks very much for your time!
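
For reference, the two ways of writing the predictor are algebraically the same, which can be checked outside BUGS. The sketch below is plain Python with made-up integer data, not BUGS code; the inprod helper is a stand-in written for this example. Note the index shift: the BUGS slice A[j:(j+11)] has 12 elements, which in 0-based Python is A[j:j+12]. This only confirms the algebra and says nothing about why OpenBUGS updates slowly with inprod().

```python
def inprod(u, v):
    """Inner product of two equal-length sequences (stand-in for BUGS inprod)."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    return sum(a * b for a, b in zip(u, v))


def mu_explicit(b0, w1, w2, A, B, j):
    """Predictor written out term by term, as in the first formulation."""
    total = b0
    for k in range(12):
        total += w1[k] * A[j + k]
    for k in range(12):
        total += w2[k] * B[j + k]
    return total


def mu_inprod(b0, w1, w2, A, B, j):
    """Predictor written with inner products over 12-element windows."""
    return b0 + inprod(w1, A[j:j + 12]) + inprod(w2, B[j:j + 12])


# Made-up integer data so the comparison is exact.
b0 = 3
weights1 = list(range(1, 13))
weights2 = list(range(12, 0, -1))
A = list(range(20))
B = [2 * i for i in range(20)]

# Every j for which the 12-element window fits inside the data.
all_equal = all(
    mu_explicit(b0, weights1, weights2, A, B, j)
    == mu_inprod(b0, weights1, weights2, A, B, j)
    for j in range(len(A) - 11)
)
print(all_equal)
```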

T-SQL update with switch-case statement

I want to implement this pseudocode in T-SQL:
UPDATE Resources SET [Path]= CASE ([Path].Substring([Path].LastIndexOf('.')))
WHEN '.jpg' THEN '/image.jpg'
WHEN '.png' THEN '/image.jpg'
WHEN '.avi' THEN '/video.jpg'
WHEN '.mkv' THEN '/video.jpg'
For this I use the following:
UPDATE Resources SET [Path] = CASE (SUBSTRING([Path], LEN([Path]) - CHARINDEX('.', REVERSE([Path])) + 1, 3))
WHEN '.jpg' THEN '/image.jpg'
WHEN '.png' THEN '/image.jpg'
WHEN '.avi' THEN '/video.jpg'
WHEN '.mkv' THEN '/video.jpg'
END
but it does not return the expected result.
Can anyone give me working version please?
UPDATE
Resources
SET
Path = CASE SUBSTRING(Path, LEN(Path) - CHARINDEX('.', REVERSE(Path)) + 1, 4)
WHEN '.jpg' THEN '/image.jpg'
WHEN '.png' THEN '/image.jpg'
WHEN '.avi' THEN '/video.jpg'
WHEN '.mkv' THEN '/video.jpg'
END
Instead of SUBSTRING([Path], LEN([Path]) - CHARINDEX('.', REVERSE([Path])) + 1, 3),
try using lower(right([Path], 4))
Your read of the extension is wrong, instead try:
SUBSTRING(Path, LEN(Path) - CHARINDEX('.', REVERSE(Path)) + 1, LEN(Path))
(Using LEN(Path) as the read length; fine if it overflows the end of the string and allows for n-character extensions)
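
To see why a read length of 3 fails, the sketch below mimics the T-SQL expression in Python. The helper functions are simplified stand-ins written for this example (1-based positions, no handling of T-SQL's trailing-space rules for LEN), and the file names are made up.

```python
def charindex(needle, haystack):
    """1-based position of needle in haystack, 0 if absent (like CHARINDEX)."""
    return haystack.find(needle) + 1


def substring(text, start, length):
    """1-based substring; reading past the end is allowed (like SUBSTRING)."""
    return text[start - 1:start - 1 + length]


def extension(path, length):
    """Mimics SUBSTRING(Path, LEN(Path) - CHARINDEX('.', REVERSE(Path)) + 1, length)."""
    start = len(path) - charindex(".", path[::-1]) + 1
    return substring(path, start, length)


jpg = "photo.jpg"
docx = "report.docx"

three = extension(jpg, 3)                 # cuts the extension short
four = extension(jpg, 4)                  # enough for 3-letter extensions
four_docx = extension(docx, 4)            # still too short for .docx
full_docx = extension(docx, len(docx))    # reads to the end of the string

print(three, four, four_docx, full_docx)
```

With a length of 3 the CASE never matches '.jpg'; a length of 4 covers three-letter extensions only, while using the full string length covers extensions of any size.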
Try using PARSENAME:
UPDATE Resources SET [Path] = CASE (Parsename(Path,1))
WHEN 'jpg' THEN '/image.jpg'
WHEN 'png' THEN '/image.jpg'
WHEN 'avi' THEN '/video.jpg'
WHEN 'mkv' THEN '/video.jpg'
END
-- @root is assumed to be a variable declared earlier in the batch
UPDATE Resources SET ThumbnailPath = CASE SUBSTRING(ThumbnailPath, LEN(ThumbnailPath) - CHARINDEX('.', REVERSE(ThumbnailPath)) + 1, LEN(ThumbnailPath))
WHEN '.doc' THEN @root + '/doc.png'
WHEN '.docx' THEN @root + '/doc.png'
WHEN '.jpg' THEN @root + '/image.png'
WHEN '.jpeg' THEN @root + '/image.png'
WHEN '.gif' THEN @root + '/image.png'
WHEN '.png' THEN @root + '/image.png'
WHEN '.ppt' THEN @root + '/ppt.png'
WHEN '.pptx' THEN @root + '/ppt.png'
WHEN '.pdf' THEN @root + '/pdf.png'
ELSE @root + '/other.png'
END
Thank you, I finally used this.
This script ensures that you do not update every row each time you run it. It only updates rows whose value actually changes.
-- @root is assumed to be a variable declared earlier in the batch
UPDATE r
SET ThumbnailPath = newvalue
FROM Resources r
cross apply
(SELECT right(ThumbnailPath, patindex('%_.%', reverse(ThumbnailPath))) T) a
cross apply
(SELECT CASE
WHEN a.T in ('doc','docx' ) THEN @root + '/doc.png'
WHEN a.T in ('jpg','jpeg','gif','png') THEN @root + '/image.png'
WHEN a.T in ('ppt','pptx') THEN @root + '/ppt.png'
WHEN a.T = 'pdf' THEN @root + '/pdf.png'
ELSE @root + '/other.png'
END newvalue) b
WHERE r.ThumbnailPath <> b.newvalue