How to remove lines that start with + or - followed by whitespace only - awk

Here is a file that contains:
+
-
+ <>cow apple</>
- apple
+ ball
+ +
- -
+ -
- +
+ !
-
-
+
+ $
+ **
+ *
+ =
+ #
- ?
- ◊
+ ◊◊
-
-
Expected output:
+ <>cow apple</>
- apple
+ ball
+ +
- -
+ -
- +
+ !
+ $
+ **
+ *
+ =
+ #
- ?
- ◊
+ ◊◊
How can I remove lines that start with + or - followed only by whitespace?
Here is code that gives the expected result, but a better solution would be very helpful, since I am running this command on a large file and it has to be accurate.
sed '/^[^[:alnum:]]* $/d'

You may use grep with the -v (invert match) option:
grep -v '^[-+][[:blank:]]*$' file
+ <>cow apple</>
- apple
+ ball
+ +
- -
+ -
- +
+ !
+ $
+ **
+ *
+ =
+ #
- ?
- ◊
+ ◊◊
Here:
^[-+][[:blank:]]*$ matches a line starting with - or + followed by zero or more blanks (spaces or tabs) up to the end of the line.
The following sed or awk solutions would also work:
sed '/^[-+][[:blank:]]*$/d' file
awk '!/^[-+][[:blank:]]*$/' file
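If the filter needs to run from inside a script rather than directly in the shell, a minimal Python sketch using the same pattern works too (the filenames here are placeholders; it streams the input line by line, so large files are not a problem):

import re

# Same idea as the grep/sed pattern: a + or - at the start of the line,
# followed only by blanks (spaces or tabs) up to the end of the line.
drop = re.compile(r'^[-+][ \t]*$')

with open('file') as src, open('file.filtered', 'w') as dst:
    for line in src:
        if not drop.match(line.rstrip('\n')):
            dst.write(line)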

grep '^[+-]\s\S' file
^ start of line anchor
[+-] match on + or -
\s match a whitespace
\S match a non-whitespace

Related

Snakemake MissingInputException

We have always run our Snakemake pipelines through Amazon S3.
snakemake --default-remote-provider S3 --default-remote-prefix '$s3' --use-conda
However, we need to run our Snakemake pipeline locally for a new experiment.
snakemake --use-conda
The pipeline works great when running with --default-remote-provider S3 --default-remote-prefix '$s3'; however, when we try to run locally we get:
Building DAG of jobs...
MissingInputException in line 226 of /usr/local/eclipse/snakemake_eclip/rules/rep_element.smk:
Missing input files for rule compress_parsed:
output: rep_element_pipeline/IN1_BET_S35_R1_001.combined_w_uniquemap.rmDup.sam.parsed.gz
wildcards: sample=IN1_BET_S35_R1_001
affected files:
rep_element_pipeline/IN1_BET_S35_R1_001.combined_w_uniquemap.rmDup.sam.parsed
ERROR conda.cli.main_run:execute(41): `conda run snakemake --use-conda --cores 36` failed. (See above for error)
Below are the two rules that are causing the error. I'm fairly certain it has something to do with the local wrapper temp(local('{full_path}' + 'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed')).
{full_path} is an absolute path where the files at the beginning of the pipeline are located.
#Create final rep element parsed file
rule merge_parsed:
    input:
        'rep_element_pipeline/AA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/AC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/AG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/AN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/AT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/CA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/CC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/CG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/CN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/CT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/GA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/GC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/GG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/GN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/GT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/NA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/NC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/NG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/NN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/NT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/TA.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/TC.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/TG.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/TN.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed',
        'rep_element_pipeline/TT.{sample}.fastq.gz.mapped_vs_' + config["ref"]["bt2_index"] + '.sam.tmp.combined_w_uniquemap.rmDup.sam.parsed'
    output:
        temp(local('{full_path}' + 'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed'))
    conda:
        '../envs/rep_element.yaml'
    params:
        fp=full_path
    shell:
        'perl ../scripts/merge_multiple_parsed_files.pl {output[0]} {input}'
#Compress sam.parsed
rule compress_parsed:
    input:
        'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed'
    output:
        temp('rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed.gz')
    params:
        fp=full_path
    conda:
        '../envs/standard_eclip.yaml'
    shell:
        'pigz -c {input[0]} > {params.fp}rep_element_pipeline/{wildcards.sample}.combined_w_uniquemap.rmDup.sam.parsed.gz'
The output of rule merge_parsed is:
temp(local('{full_path}' + 'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed'))
but the input of compress_parsed is:
'rep_element_pipeline/{sample}.combined_w_uniquemap.rmDup.sam.parsed'
The string {full_path} is in the first but not the second. Isn't this an issue or am I missing something?
Besides, I would avoid concatenating file paths with +. It is better to use os.path.join, which takes care of adding the correct separator.
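For example, a quick illustration of the difference, with a hypothetical prefix:

import os

full_path = '/data/eclipse/run1'  # hypothetical absolute prefix
rel = 'rep_element_pipeline/sampleX.combined_w_uniquemap.rmDup.sam.parsed'

# Plain concatenation silently drops the separator if the prefix lacks a trailing '/':
print(full_path + rel)               # /data/eclipse/run1rep_element_pipeline/...
# os.path.join inserts the separator exactly when it is needed:
print(os.path.join(full_path, rel))  # /data/eclipse/run1/rep_element_pipeline/...

The same idea helps with the mismatch above: build the parsed-file path once, store it in a variable, and reference that single variable in both merge_parsed's output and compress_parsed's input, so the two can never drift apart.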

Problem using inprod() to summarise linear predictor

I am having a problem when trying to summarise my additive predictor:
mu[j] <- b0 + weights1[1] * A[j] + weights1[2] * A[j+1] + weights1[3] * A[j+2] + weights1[4] * A[j+3] +
weights1[5] * A[j+4] + weights1[6] * A[j+5] + weights1[7] * A[j+6] + weights1[8] * A[j+7] +
weights1[9] * A[j+8] + weights1[10] * A[j+9] + weights1[11] * A[j+10] + weights1[12] * A[j+11] +
weights2[1] * B[j] + weights2[2] * B[j+1] + weights2[3] * B[j+2] + weights2[4] * B[j+3] +
weights2[5] * B[j+4] + weights2[6] * B[j+5] + weights2[7] * B[j+6] + weights2[8] * B[j+7] +
weights2[9] * B[j+8] + weights2[10] * B[j+9] + weights2[11] * B[j+10] + weights2[12] * B[j+11]
by using inprod(). This is what I thought should be the equivalent:
mu[j] <- b0 + inprod(weights1[],A[j:(j+11)]) + inprod(weights2[],B[j:(j+11)])
While the model compiles and seems to work, it stays updating forever. It has been running for hours without finishing, while the first approach finishes in a few minutes.
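As a sanity check outside BUGS, the two formulations should compute exactly the same value; a small NumPy sketch with made-up data (and 0-based indexing, unlike BUGS) illustrates the equivalence the inprod() version relies on:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=50)      # made-up covariate series
B = rng.normal(size=50)
w1 = rng.normal(size=12)     # the 12 lag weights, as in the model
w2 = rng.normal(size=12)
b0 = 0.5
j = 3                        # 0-based here; BUGS' A[j:(j+11)] is the 12-element window A[j:j+12]

# long-hand predictor: explicit sum of the lagged terms
mu_long = b0 + sum(w1[k] * A[j + k] for k in range(12)) \
             + sum(w2[k] * B[j + k] for k in range(12))

# inner-product form, the analogue of inprod(weights1[], A[j:(j+11)])
mu_inprod = b0 + w1 @ A[j:j + 12] + w2 @ B[j:j + 12]

assert np.isclose(mu_long, mu_inprod)

This suggests the slowdown is about how OpenBUGS evaluates inprod(), not about what is being computed.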
These are the priors, just in case:
weights1[1] ~ dnorm(0,1.0E-6)
weights2[1] ~ dnorm(0,1.0E-6)
for(t in 2:12) {
weights1[t]~dnorm(weights1[t-1],tauweight1)}
for(t in 2:12) {
weights2[t]~dnorm(weights2[t-1],tauweight2)}
b0 ~ dnorm(0,.001)
tau ~ dgamma(0.001, 0.001)
sigma <- 1/sqrt(tau)
tauweight1~dgamma(1.0E-3,1.0E-3)
tauweight2~dgamma(1.0E-3,1.0E-3)
I am calling OpenBUGS from R using R2OpenBUGS just in case.
Thanks very much for your time!

SQL replace char(0)

I am trying to remove the CHAR(0) character from the input text.
Why do cases 3 and 4 produce these results?
print '1------' + Replace('123 nyon 47647' ,CHAR(0),'');
print '2------' + Replace('123 nyon' ,CHAR(0),'');
print '3------' + Replace('nyon' ,CHAR(0),'');
print '4------' + Replace('ny' ,CHAR(0),'');
print '5------' + Replace('n' ,CHAR(0),'');
print '6------' + Replace('yn' ,CHAR(0),'');
1------123 nyon 47647
2------123 nyon
3------yon
4------y
5------n
6------yn
Thank you!

What is the Big-O of this nested loop?

int z = 1;
for (int i = 0; i * i < n; i++) {
    z *= 3;
    for (int j = 0; j < z; j++) {
        // Some code
    }
}
The stated answer is O(3^n). Is that correct? How do I figure out the time complexity of a nested loop like this?
outer loop: i goes from 0 up to about sqrt(n), so it runs roughly sqrt(n) times;
inner loop: on the i-th outer pass z = 3^(i+1), so the inner loop runs 3^(i+1) times, reaching about 3^(sqrt(n)) on the last pass;
"some code" therefore runs roughly 1 + 3 + 3^2 + ... + 3^(sqrt(n)) times
let sum = 1 + 3 + 3^2 + ... + 3^(sqrt(n))
3*sum - sum = 3^(sqrt(n)+1) - 1
sum = (3^(sqrt(n)+1) - 1) / 2, which is Theta(3^(sqrt(n)))
That is far smaller than 3^n, so O(3^n) is a huge overestimate;
O(3^(sqrt(n))) is the accurate bound.
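As an empirical check, a small Python sketch that mirrors the loop (with a counter in place of "some code") reproduces the geometric-sum count for perfect-square n:

import math

def inner_iterations(n):
    """Count how many times the inner loop body runs in the nested loop above."""
    count = 0
    z = 1
    i = 0
    while i * i < n:
        z *= 3
        count += z   # the inner loop runs z = 3^(i+1) times on this pass
        i += 1
    return count

for n in (4, 16, 64, 256):
    k = math.isqrt(n)                        # outer iterations (exact, since n is a perfect square)
    closed_form = (3 ** (k + 1) - 3) // 2    # 3 + 3^2 + ... + 3^k
    print(n, inner_iterations(n), closed_form)

The counts grow like 3^sqrt(n), not 3^n.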
You could approach the problem using Sigma notation this way: the inner body runs
sum_{i=0}^{ceil(sqrt(n))-1} 3^(i+1) = (3^(ceil(sqrt(n))+1) - 3) / 2 = O(3^(sqrt(n)))
times.

How do I sequentially select parts of an expression in Vim?

There is a feature I would find really useful in Vim. May I ask whether it already exists, or whether anyone has an idea about how I could start implementing it?
Inspired by the Ctrl-. feature of Mathematica's front end, one would be able to sequentially select, in visual mode, the successive layers of the expression the cursor sits in. For example, consider the following expression in an imaginary language:
# enter visual mode at this position:
for(i in 1:n){
a = append(a, b[i %% floor((n + 1) / 2)] + c - n * last(a));
^
}
---------------------------------------------------------------------------------
2 # selected text after first hit
(n + 1) / 2 # second hit
floor((n + 1) / 2) # third hit
i %% floor((n + 1) / 2) # fourth hit
b[i %% floor((n + 1) / 2)] # fifth hit
b[i %% floor((n + 1) / 2)] + c - n * last(a) # sixth hit
append(a, b[i %% floor((n + 1) / 2)] + c - n * last(a)) # seventh hit
a = append(a, b[i %% floor((n + 1) / 2)] + c - n * last(a)); # eight hit
for(i in 1:n){
a = append(a, b[i %% floor((n + 1) / 2)] + c - n * last(a)); # etc. until the whole file gets selected
}
I am aware this would require the feature to know the various operators of the language and their respective precedences, but that is not too much input, is it?
Any ideas?
Vim can't do that by default, but there is at least one plugin that does what you want: vim-expand-region.
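As a starting point for the "how would I start implementing it" part, here is a deliberately simplified, editor-agnostic Python sketch of the underlying idea: it expands a selection to successively larger enclosing bracket pairs. It only handles bracket nesting; the precedence-aware layers from the example (getting floor(...) before i %% floor(...)) would need a real parser for the language, and vim-expand-region itself works by cycling through Vim text objects rather than parsing the language.

def expand_selection(text, start, end):
    """Return the contents of the smallest bracketed region that strictly
    contains the selection [start, end); fall back to the whole text."""
    openers, closers = set('([{'), set(')]}')
    stack = []    # indices of currently open brackets (assumes well-formed nesting)
    regions = []  # (open_index, close_index) of every balanced pair
    for i, ch in enumerate(text):
        if ch in openers:
            stack.append(i)
        elif ch in closers and stack:
            regions.append((stack.pop(), i))
    candidates = [(o, c) for o, c in regions
                  if o + 1 <= start and c >= end and (o + 1, c) != (start, end)]
    if not candidates:
        return 0, len(text)                      # last layer: the whole buffer
    o, c = min(candidates, key=lambda r: r[1] - r[0])
    return o + 1, c                              # text between the brackets

line = "a = append(a, b[i %% floor((n + 1) / 2)] + c - n * last(a));"
sel = (line.index("2"), line.index("2") + 1)     # start on the '2', as in the example
for _ in range(4):
    sel = expand_selection(line, *sel)
    print(line[sel[0]:sel[1]])

# prints, in order:
#   (n + 1) / 2
#   a, b[i %% floor((n + 1) / 2)] + c - n * last(a)   (skipping the precedence-only layers)
#   ...up to the whole line, mirroring the later hits in the example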