I am trying to run this code on Google Colab, but I keep getting errors.
# path (prefix) of the HISAT2 index files
hisat2Ref=/mnt/c/3adera-RNA/mkr-index/mkr1

# align the single-end reads, convert to BAM, and sort
hisat2 -x ${hisat2Ref} -U IL1R_Th17_PreTrans_R1.fastq.gz |
    samtools view -bh - |
    samtools sort - > IL1R_Th17_PreTrans_R1.bam

# index the sorted BAM
samtools index IL1R_Th17_PreTrans_R1.bam

# count reads per feature against the annotation
htseq-count -f bam IL1R_Th17_PreTrans_R1.bam mkr-mus-genome.gtf.gz > IL1R_Th17_PreTrans_R1.txt
Any help is appreciated.
I am trying out the TensorFlow Serving example from the tutorial page. At the third step I run:
# Start TensorFlow Serving container and open the REST API port
docker run -t --rm -p 8501:8501 \
-v "$TESTDATA/saved_model_half_plus_two_cpu:/models/half_plus_two" \
-e MODEL_NAME=half_plus_two \
tensorflow/serving &
I get the following error message
2020-07-19 11:54:52.858203: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:362] FileSystemStoragePathSource encountered a filesystem access error: /models/half_plus_two; Permission denied
This is continuously repeated. I have installed the demo model as mentioned in the tutorial.
git clone https://github.com/tensorflow/serving
TESTDATA="$(pwd)/serving/tensorflow_serving/servables/tensorflow/testdata"
Can someone please point out what I am missing? I am just getting started with the serving part.
Thanks
Krishnan
The problem could be with the -v parameter you are using to bind the path.
Try the following instead (change the source parameter to your local model path):
docker run -p 8501:8501 --mount type=bind,\
source=/path/to/yourmodels/,\
target=/models/half_plus_two/1 \
-e MODEL_NAME=half_plus_two -t tensorflow/serving
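Once the container is up, you can sanity-check it with the REST call the tutorial uses; the request below assumes the default port 8501 and MODEL_NAME=half_plus_two:
# query the model's predict endpoint (half_plus_two computes 0.5*x + 2)
curl -d '{"instances": [1.0, 2.0, 5.0]}' \
    -X POST http://localhost:8501/v1/models/half_plus_two:predict
# expected output: {"predictions": [2.5, 3.0, 4.5]}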
I was trying to run a command which ideally looks like this,
minimap2 -a -x map-ont -t 20 /staging/reference.fasta fastq/sample01.fastq | samtools view -bS -F 4 - | samtools sort -o fastq_minon/sample01.bam
I have multiple samples like this (e.g. fastq/sample01.fastq) in the folder. However, the Snakemake file I wrote to automate this is passing all the files at once to a single command, like this:
minimap2 -a -x map-ont -t 1 /staging/reference.fasta fastq/sample02.fastq fastq/sample03.fastq fastq/sample01.fastq | samtools view -bS -F 4 - | samtools sort -o fastq_minon/sample02.bam fastq_minon/sample03.bam fastq_minon/sample01.bam
I have pasted the code and log below. Please help me figure out what I am doing wrong.
Code
SAMPLES, = glob_wildcards("fastq/{smp}.fastq")

rule minimap:
    input:
        expand("fastq/{smp}.fastq", smp=SAMPLES)
    output:
        expand("fastq_minon/{smp}.bam", smp=SAMPLES)
    params:
        ref = FASTA
    threads: 40
    shell:
        """
        minimap2 -a -x map-ont -t {threads} {params.ref} {input} | samtools view -bS -F 4 - | samtools sort -o {output}
        """
Log
Building DAG of jobs...
Job counts:
count jobs
1 minimap
1
[Tue May 5 03:28:50 2020]
rule minimap:
input: fastq/sample02.fastq, fastq/sample03.fastq, fastq/sample01.fastq
output: fastq_minon/sample02.bam, fastq_minon/sample03.bam, fastq_minon/sample01.bam
jobid: 0
minimap2 -a -x map-ont -t 1 /staging/reference.fasta fastq/sample02.fastq fastq/sample03.fastq fastq/sample01.fastq | samtools view -bS -F 4 - | samtools sort -o fastq_minon/sample02.bam fastq_minon/sample03.bam fastq_minon/sample01.bam
Job counts:
count jobs
1 minimap
1
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
The expand function creates a list. Thus, in your rule minimap, you're telling Snakemake that you want all the fastq files as input of a single job, and that this one job will produce all the bam files. What you want is a rule that is triggered once per sample, using a wildcard:
SAMPLES, = glob_wildcards("fastq/{smp}.fastq")

rule all:
    input: expand("fastq_minon/{smp}.bam", smp=SAMPLES)

rule minimap:
    input:
        "fastq/{smp}.fastq"
    output:
        "fastq_minon/{smp}.bam"
    params:
        ref = FASTA
    threads: 40
    shell:
        """
        minimap2 -a -x map-ont -t {threads} {params.ref} {input} | samtools view -bS -F 4 - | samtools sort -o {output}
        """
By defining all the files wanted in rule all, the rule minimap will be triggered as many times as necessary to create ONE bam file from ONE fastq file.
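As a quick check (assuming the Snakefile above and the same sample files), a dry run should now plan one minimap job per sample instead of one combined job:
# print the planned jobs and their shell commands without running anything
snakemake -n -p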
Have a look at my answer to this question to understand the use of wildcards and expand.
I usually get the releases/tags from the GitHub API with the command below:
$ repo="helm/helm"
$ curl -sL https://api.github.com/repos/${repo}/tags |jq -r ".[].name"
v3.2.0-rc.1
v3.2.0
v3.1.3
v3.1.2
v3.1.1
v3.1.0-rc.3
v3.1.0-rc.2
v3.1.0-rc.1
v3.1.0
v3.0.3
v3.0.2
v3.0.1
v3.0.0-rc.4
v3.0.0-rc.3
v3.0.0-rc.2
v3.0.0-rc.1
v3.0.0-beta.5
v3.0.0-beta.4
v3.0.0-beta.3
v3.0.0-beta.2
v3.0.0-beta.1
v3.0.0-alpha.2
v3.0.0-alpha.1
v3.0.0
v2.16.6
v2.16.5
v2.16.4
v2.16.3
v2.16.2
v2.16.1
But in fact it doesn't list all the releases. What should I do?
For example, I can't get the releases before v2.16.1, which are listed at this link:
https://github.com/helm/helm/tags?after=v2.16.1
I tried adding ?after=v2.16.1 to the curl API call in the same way, but it didn't help:
curl -sL https://api.github.com/repos/${repo}/tags?after=v2.16.1 |jq -r ".[].name"
I got the same output.
Reference: https://developer.github.com/v3/git/tags/
This could be because of pagination: the GitHub API returns list results in pages (30 items per page by default), so a single call only gives you the first page.
See this script as an example of detecting the number of pages and adding the required ?page=x to access all the data from a GitHub API call.
Relevant extract:
# single page result-s (no pagination), have no Link: section, the grep result is empty
last_page=`curl -s -I "https://api.github.com${GITHUB_API_REST}" -H "${GITHUB_API_HEADER_ACCEPT}" -H "Authorization: token $GITHUB_TOKEN" | grep '^Link:' | sed -e 's/^Link:.*page=//g' -e 's/>.*$//g'`
# does this result use pagination?
if [ -z "$last_page" ]; then
# no - this result has only one page
rest_call "https://api.github.com${GITHUB_API_REST}"
else
# yes - this result is on multiple pages
for p in `seq 1 $last_page`; do
rest_call "https://api.github.com${GITHUB_API_REST}?page=$p"
done
fi
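If you don't want to parse the Link header at all, another sketch is to keep requesting pages until an empty one comes back (per_page and page are standard query parameters of the GitHub list APIs):
# fetch all tag names by walking the pages until an empty page is returned
repo="helm/helm"
page=1
while true; do
    names=$(curl -sL "https://api.github.com/repos/${repo}/tags?per_page=100&page=${page}" | jq -r '.[].name')
    [ -z "$names" ] && break
    echo "$names"
    page=$((page + 1))
done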
With help from VonC, I got the result by adding the extra query string ?page=2 (and so on if I'd like to query older releases).
curl -sL https://api.github.com/repos/${repo}/tags?page=2 |jq -r ".[].name"
I can easily get the last page now.
$ GITHUB_API_REST="/repos/helm/helm/tags"
$ GITHUB_API_HEADER_ACCEPT="Accept: application/vnd.github.v3+json"
$ GITHUB_TOKEN=xxxxxxxx
$ last_page=`curl -s -I "https://api.github.com${GITHUB_API_REST}" -H "${GITHUB_API_HEADER_ACCEPT}" -H "Authorization: token $GITHUB_TOKEN" | grep '^Link:' | sed -e 's/^Link:.*page=//g' -e 's/>.*$//g'`
$ echo $last_page
4
Is there any way to filter BigQuery jobs by label(s)?
I created a job (query) with the label task_id:my_task:
bq query --use_legacy_sql=false --label "task_id:my_task" --project my-project 'SELECT * FROM `dataset.mytable`'
I tried these to get all jobs with that label, but none of them worked:
bq ls -j --filter 'configuration.labels(task_id):my_task'
bq ls -j --filter 'configuration.labels.task_id:my_task'
bq ls -j --filter configuration.labels(task_id):my_task
bq ls -j --filter labels.task_id:my_task
According to the documentation for "bq ls" [1], --filter lists datasets that match the filter expression, but for listing jobs the documentation [2] mentions only three allowed flags: -j, -a and -n.
So there isn't a way to filter the job listing, at least through the BigQuery command-line tool. As a workaround, you can use the following command to get all the jobs that are labeled task_id:my_task:
for i in $(bq ls -j | awk 'NR>2 {print $1}'); do echo "$(bq show -j $i) $i" | awk '/task_id:my_task/ && /SUCCESS/ {print $(NF)}'; done
This command may take some time, though, so consider limiting the number of jobs inspected with the -n flag, like this:
for i in $(bq ls -j -n 10 | awk 'NR>2 {print $1}'); do echo "$(bq show -j $i) $i" | awk '/task_id:my_task/ && /SUCCESS/ {print $(NF)}'; done
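The same workaround, spelled out as a readable loop (it still assumes, as the one-liner does, that the label text appears in the bq show output for the job):
# list recent job ids, then keep only those whose details mention the label
for job_id in $(bq ls -j -n 100 | awk 'NR>2 {print $1}'); do
    if bq show -j "$job_id" | grep -q 'task_id:my_task'; then
        echo "$job_id"
    fi
done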
You may also submit a feature request to the BigQuery's Issue Tracker [3].
[1] https://cloud.google.com/bigquery/docs/reference/bq-cli-reference#bq_ls
[2] https://cloud.google.com/bigquery/docs/managing-jobs#listing_jobs
[3] https://issuetracker.google.com/issues/new?component=187149&template=1100108
I'm having a problem with xargs and Wget when run as shell scripts in an AppleScript app. I want Wget to run 4 parallel processes in the background. The problem: when I try to run the process in the background with
cat urls.txt | xargs -P 4 -n 1 /usr/local/bin/wget -q -E -b 1> NUL 2> NUL
a Wget process is apparently started immediately for every URL in the .txt file. This is too burdensome on the user's memory. When I run it in the foreground, however, with something like:
cat urls.txt | xargs -P 4 -n 1 /usr/local/bin/wget -q -E
I seem to get the four parallel Wget processes I need. Does anybody know how to get this script to run in the background with only 4 processes? I'm a bit of a novice, and I'm afraid I can't figure out why backgrounding the process causes this change.
The -b flag tells Wget itself to go to the background immediately, so xargs sees each invocation exit right away and keeps launching new ones. Drop -b and run xargs in the background instead:
cat urls.txt | xargs -P4 -n1 wget -q &
Or, if you want to return control to the AppleScript, disown the xargs process so it is not terminated when the calling shell exits:
do shell script "cat urls.txt | xargs -P4 -n1 /usr/local/bin/wget -q & disown $!"
As far as I can tell, I have solved the problem with
cat urls.txt | (xargs -P4 -n1 wget -q -E >/dev/null 2>&1) &
There may well be a better solution, though...
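Another option (just a sketch, not tested from AppleScript) is to detach the whole pipeline with nohup so it keeps running even if the calling shell goes away:
# run the whole download pipeline detached from the calling shell
nohup sh -c 'cat urls.txt | xargs -P4 -n1 /usr/local/bin/wget -q -E' >/dev/null 2>&1 &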