I'm trying to learn Nextflow, but it's not going very well.
I used NGS paired-end sequencing data to build an analysis pipeline from fastq files to vcf files using Nextflow. However, I got stuck right at the beginning, as shown in the code below. The first process, soapnuke, works fine, but when passing the files from the channels (clean_fq1 / clean_fq2) to the next process I get: ERROR: No such variable: from. What should I do? Thanks for any help.
params.fq1 = "/data/mPCR/220213_I7_V350055104_L3_SZPVL22000812-81/*1.fq.gz"
params.fq2 = "/data/mPCR/220213_I7_V350055104_L3_SZPVL22000812-81/*2.fq.gz"
params.index = "/home/duxu/project/data/index.list"
params.primer = "/home/duxu/project/data/primer_*.fasta"
params.output='results'
fq1 = Channel.fromPath(params.fq1)
fq2 = Channel.fromPath(params.fq2)
index = Channel.fromPath(params.index)
primer = Channel.fromPath(params.primer)
process soapnuke {

    conda 'soapnuke'

    tag { "soapnuke ${fq1} ${fq2}" }

    publishDir "${params.outdir}/SOAPnuke", mode: 'copy'

    input:
    file rawfq1 from fq1
    file rawfq2 from fq2

    output:
    file 'clean1.fastq.gz' into clean_fq1
    file 'clean2.fastq.gz' into clean_fq2

    script:
    """
    SOAPnuke filter -1 $rawfq1 -2 $rawfq2 -l 12 -q 0.5 -Q 2 -o . \
        -C clean1.fastq.gz -D clean2.fastq.gz
    """
}
I get stuck on this:
process barcode_splitter {

    conda 'barcode_splitter'

    tag { "barcode_splitter ${fq1} ${fq2}" }

    publishDir "${params.outdir}/barcode_splitter", mode: 'copy'

    input:
    file split1 from clean_fq1
    file split2 from clean_fq2
    index from params.index

    output:
    file '*-read-1.fastq.gz' into trimmed_index1
    file '*-read-2.fastq.gz' into trimmed_index2

    script:
    """
    barcode_splitter --bcfile $index $split1 $split2 --idxread 1 2 --mismatches 1 --suffix .fastq --gzipout
    """
}
The code below will produce the error you see:
index = Channel.fromPath( params.index )

process barcode_splitter {
    ...
    input:
    index from params.index
    ...
}
What you want is:
index = file( params.index )

process barcode_splitter {
    ...
    input:
    path index
    ...
}
Note that when the file input name is the same as the channel name, the from channel declaration can be omitted. I also used the path qualifier above, as it should be preferred over the file qualifier when using Nextflow 19.10.0 or later.
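In other words, the path index declaration above is just shorthand for the explicit form (a minimal sketch; both behave the same here):

process barcode_splitter {
    ...
    input:
    path index from index    // equivalent: qualifier, name, and source spelled out
    ...
}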
You may also want to consider refactoring to use the fromFilePairs factory method. Here's one way, untested of course:
params.reads = "/data/mPCR/220213_I7_V350055104_L3_SZPVL22000812-81/*_{1,2}.fq.gz"
params.index = "/home/duxu/project/data/index.list"
params.outdir = 'results'

reads_ch = Channel.fromFilePairs( params.reads )
index = file( params.index )

process soapnuke {

    tag { sample }

    publishDir "${params.outdir}/SOAPnuke", mode: 'copy'

    conda 'soapnuke'

    input:
    tuple val(sample), path(reads) from reads_ch

    output:
    tuple val(sample), path('clean{1,2}.fastq.gz') into clean_reads_ch

    script:
    def (rawfq1, rawfq2) = reads

    """
    SOAPnuke filter \\
        -1 "${rawfq1}" \\
        -2 "${rawfq2}" \\
        -l 12 \\
        -q 0.5 \\
        -Q 2 \\
        -o . \\
        -C "clean1.fastq.gz" \\
        -D "clean2.fastq.gz"
    """
}

process barcode_splitter {

    tag { sample }

    publishDir "${params.outdir}/barcode_splitter", mode: 'copy'

    conda 'barcode_splitter'

    input:
    tuple val(sample), path(reads) from clean_reads_ch
    path index

    output:
    tuple val(sample), path('*-read-{1,2}.fastq.gz') into trimmed_index

    script:
    def (splitfq1, splitfq2) = reads

    """
    barcode_splitter \\
        --bcfile "${index}" \\
        "${splitfq1}" \\
        "${splitfq2}" \\
        --idxread 1 2 \\
        --mismatches 1 \\
        --suffix ".fastq" \\
        --gzipout
    """
}
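For reference, fromFilePairs emits tuples of a sample id and the list of paired files. A quick way to sanity-check what your glob matches before wiring it in (a sketch; the printed sample id below is illustrative, not from your data):

Channel
    .fromFilePairs( params.reads )
    .view()

// prints one tuple per pair, e.g.:
// [sample1, [/data/mPCR/sample1_1.fq.gz, /data/mPCR/sample1_2.fq.gz]]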
I followed these steps:
1. CityScapes dataset preparation
2. Generate TFRecords for CityScapes
3. Download the pre-trained model
4. Run the official instruction:
python deeplab/train.py \
--logtostderr \
--training_number_of_steps=1000 \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=513 \
--train_crop_size=513 \
--train_batch_size=1 \
--dataset="cityscapes" \
--tf_initial_checkpoint='/root/newP/official_tf/models-master/research/deeplab/backbone/deeplabv3_cityscapes_train/model.ckpt' \
--train_logdir='/root/newP/official_tf/models-master/research/deeplab/exp/train_on_train_set/train' \
--dataset_dir='/root/dataset/cityscapesScripts/tfrecord'
An error occurred while training DeepLabv3+ on the CityScapes dataset:
"data split name train not recognized".
I found the problem after debugging: "train" no longer exists in
_CITYSCAPES_INFORMATION.splits_to_sizes.
Content in the code:
_CITYSCAPES_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train_fine': 2975,
        'train_coarse': 22973,
        'trainval_fine': 3475,
        'trainval_coarse': 23473,
        'val_fine': 500,
        'test_fine': 1525,
    },
    num_classes=19,
    ignore_label=255,
)
I tried several of the other split names, "train_fine" and "train_coarse", and a new error occurred:
"Total size of new array must be unchanged for image_pooling/weights lh_shape: [(1, 1, 2048, 256)], rh_shape: [(1, 1, 320, 256)]".
What modifications should I make?
I found that the latest version of the pre-trained model had a problem; when I did not use the pre-trained model, training ran directly without errors.
https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md
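For reference, a sketch of the invocation that ran for me: the same command as in the question, with an existing split name and without --tf_initial_checkpoint (untested as written; paths are the ones from the question):

python deeplab/train.py \
    --logtostderr \
    --training_number_of_steps=1000 \
    --train_split="train_fine" \
    --model_variant="xception_65" \
    --atrous_rates=6 \
    --atrous_rates=12 \
    --atrous_rates=18 \
    --output_stride=16 \
    --decoder_output_stride=4 \
    --train_crop_size=513 \
    --train_crop_size=513 \
    --train_batch_size=1 \
    --dataset="cityscapes" \
    --train_logdir='/root/newP/official_tf/models-master/research/deeplab/exp/train_on_train_set/train' \
    --dataset_dir='/root/dataset/cityscapesScripts/tfrecord'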
Now I am practicing converting my own image data to TFRecords for TensorFlow. I am really new to TensorFlow, so I just modified the build_image_data.py that I got from GitHub.
This is part of the original code:
bazel-bin/inception/build_image_data \
--train_directory="${TRAIN_DIR}" \
--validation_directory="${VALIDATION_DIR}" \
--output_directory="${OUTPUT_DIRECTORY}" \
--labels_file="${LABELS_FILE}" \
--train_shards=128 \
--validation_shards=24 \
--num_threads=8
And I replaced them with:
# convert the data.
bazel-bin/inception/build_image_data \
--train_directory=("C:/Dataset/Training data")
--validation_directory=("C:/Dataset/Test data")
--output_directory=("C:/Dataset/Trf")
--labels_file="C:/Dataset/Labels file"
--train_shards=128
--validation_shards=24
--num_threads=8
But I got an error as follows:
File "<ipython-input-12-4e5ff554c85f>", line 90
bazel-bin/inception/build_image_data --train_directory=("C:/Dataset/Training data")
^
SyntaxError: can't assign to operator
Could someone help me, please?
Thanks.
Just remove the parentheses around the paths (and keep the line continuations, since this is a multi-line shell command):
bazel-bin/inception/build_image_data \
    --train_directory="C:/Dataset/Training data" \
    --validation_directory="C:/Dataset/Test data" \
    --output_directory="C:/Dataset/Trf" \
    --labels_file="C:/Dataset/Labels file" \
    --train_shards=128 \
    --validation_shards=24 \
    --num_threads=8
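Note also that the SyntaxError in the traceback comes from Python itself: the command was pasted into an IPython cell, so Python tried to parse it as Python code. This is a shell command; run it in a shell, or prefix it with ! in IPython/Jupyter. For example, as a single line so no continuation backslashes are needed (a sketch with the same paths):

!bazel-bin/inception/build_image_data --train_directory="C:/Dataset/Training data" --validation_directory="C:/Dataset/Test data" --output_directory="C:/Dataset/Trf" --labels_file="C:/Dataset/Labels file" --train_shards=128 --validation_shards=24 --num_threads=8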
I need help understanding why the audio I send to the camera sounds bad and plays very fast.
The camera is configured with the G.711 u-law audio codec.
The process I am following is this:
I download a wav audio file and convert it to the codec the camera is configured for; these are all the conversions I have tried:
ffmpeg -i padrino.wav -acodec pcm_mulaw -ar 8000 -ac 1 -b:a 32k output.wav
ffmpeg -i padrino.wav -acodec pcm_mulaw -ar 8000 -ac 2 -b:a 32000 output.wav
ffmpeg -i padrino.wav -f mulaw -acodec pcm_mulaw -ac 1 output.wav
ffmpeg -i padrino.wav -ar 8000 -ac 1 -ab 64k -f mulaw output.ulaw
I turned on two-way audio; "data.xml" contains the XML that enables it:
curl -H "Content-Type: application/xml" -X PUT -d @data.xml USER:PASS@IPCAM/ISAPI/System/...hannels/1/open
Then I send the audio through curl:
curl -H "Content-Type: application/binary" -X PUT -d @output.ulaw USER:PASS@IPCAM/ISAPI/System/...ls/1/audioData
or
curl -H "Content-Type: application/binary" -X PUT -d @output.wav USER:PASS@IPCAM/ISAPI/System/...ls/1/audioData
The audio plays on the camera, but as I explained at the beginning, it sounds wrong: distorted and very fast.
What am I doing wrong?
Regards
I have found out why this is; it's nothing to do with the encoding. I wrote a C# app to test this, and if you send the data at the rate the camera expects (8000 samples per second), it plays correctly.
I send the audio data in packets (currently 160 bytes; I am still experimenting with the optimum value, but it does not seem to matter much as long as the delay is correct) and sleep for the appropriate amount of time before sending again, so that the correct number of samples is sent per second.
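To make the pacing concrete: G.711 u-law is one byte per sample, so at 8000 samples per second each packet should be followed by a delay of packet_bytes / 8000 seconds. A quick sanity check in Python (plain arithmetic; 160 bytes is the packet size I mentioned, 128 bytes is what the script below uses):

SAMPLE_RATE = 8000  # G.711 u-law: 8000 one-byte samples per second

for packet_bytes in (160, 128):
    delay = packet_bytes / SAMPLE_RATE  # seconds to wait between sends
    print(packet_bytes, 'bytes ->', delay, 's')

# 160 bytes -> 0.02 s (20 ms)
# 128 bytes -> 0.016 s (the script below sleeps 1/64 s = 0.015625 s, effectively the same)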
I found an interesting project on GitHub that helped me create this simple app that sends the audio to the camera using Python:
import urllib.request
import requests
import socket
import time

class SocketGrabber:
    """ A horrible hack, so as to allow us to recover
        the socket we still need from urllib """

    def __init__(self):
        self.sock = None

    def __enter__(self):
        self._temp = socket.socket.close
        socket.socket.close = lambda sock: self._close(sock)
        return self

    def __exit__(self, type, value, tb):
        socket.socket.close = self._temp
        if tb is not None:
            self.sock = None

    def _close(self, sock):
        if sock._closed:
            return
        if self.sock == sock:
            return
        if self.sock is not None:
            self._temp(self.sock)
        self.sock = sock

audio_file = "output.ulaw"
ip = "IPCAM"
username = "USER"
password = "PASS"
index = 1
base = f"http://{ip}"
chunksize = 128
sleep_time = 1.0 / 64  # ~16 ms: the time 128 one-byte samples take at 8000 Hz

base_url = f"http://{username}:{password}@{ip}"

# Open the two-way audio channel first.
req = requests.put(
    f"{base_url}/ISAPI/System/TwoWayAudio/channels/{index}/open")

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, [base], username, password)
auth = urllib.request.HTTPDigestAuthHandler(mgr)
opener = urllib.request.build_opener(auth)

audiopath = f"{base}/ISAPI/System/TwoWayAudio/channels/{index}/audioData"

# Start the PUT request, then recover the underlying socket so we can
# stream raw audio over it at our own pace.
with SocketGrabber() as sockgrab:
    req = urllib.request.Request(audiopath, method='PUT')
    resp = opener.open(req)
    output = sockgrab.sock

def frames_yield(ulaw_data, chunksize=128):
    for i in range(0, len(ulaw_data), chunksize):
        for x in [ulaw_data[i:i + chunksize]]:
            # Pad the final chunk with u-law silence (0xff) and pace the
            # stream so the camera receives ~8000 samples per second.
            tosend = x + (b'\xff' * (chunksize - len(x)))
            time.sleep(sleep_time)
            yield tosend

with open(audio_file, 'rb') as file_obj:
    ulaw_data = file_obj.read()
    for dataframe in frames_yield(ulaw_data, chunksize):
        output.send(dataframe)
I have two machines, each with 4 GPUs. I use
with tf.device('/job:worker/replica:%d/task:%d/gpu:%d' % (FLAGS.replica_id, FLAGS.task_id, FLAGS.gpu_device_id)):
to dictate the device placement, but it failed with this error log:
tensorflow.python.framework.errors.InvalidArgumentError: Cannot assign a device to node 'init_all_tables': Could not satisfy explicit device specification '/job:worker/replica:1/task:4/device:GPU:0' because no devices matching that specification are registered in this process; available devices: /job:ps/replica:0/task:0/cpu:0,
/job:worker/replica:0/task:0/cpu:0, /job:worker/replica:0/task:0/gpu:0, /job:worker/replica:0/task:0/gpu:1, /job:worker/replica:0/task:0/gpu:2, /job:worker/replica:0/task:0/gpu:3, /job:worker/replica:0/task:1/cpu:0, /job:worker/replica:0/task:1/gpu:0, /job:worker/replica:0/task:1/gpu:1, /job:worker/replica:0/task:1/gpu:2, /job:worker/replica:0/task:1/gpu:3, /job:worker/replica:0/task:2/cpu:0, /job:worker/replica:0/task:2/gpu:0, /job:worker/replica:0/task:2/gpu:1, /job:worker/replica:0/task:2/gpu:2, /job:worker/replica:0/task:2/gpu:3, /job:worker/replica:0/task:4/cpu:0, /job:worker/replica:0/task:4/gpu:0, /job:worker/replica:0/task:4/gpu:1, /job:worker/replica:0/task:4/gpu:2, /job:worker/replica:0/task:4/gpu:3, /job:worker/replica:0/task:5/cpu:0, /job:worker/replica:0/task:5/gpu:0, /job:worker/replica:0/task:5/gpu:1, /job:worker/replica:0/task:5/gpu:2, /job:worker/replica:0/task:5/gpu:3, /job:worker/replica:0/task:6/cpu:0, /job:worker/replica:0/task:6/gpu:0, /job:worker/replica:0/task:6/gpu:1, /job:worker/replica:0/task:6/gpu:2, /job:worker/replica:0/task:6/gpu:3, /job:worker/replica:0/task:7/cpu:0, /job:worker/replica:0/task:7/gpu:0, /job:worker/replica:0/task:7/gpu:1, /job:worker/replica:0/task:7/gpu:2, /job:worker/replica:0/task:7/gpu:3
It seems like TensorFlow can't find machine B? But both machines have exactly the same hardware and software configuration.
The start scripts:
# machine 10.10.102.28
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=0 \
--task_id=0 \
--gpu_device_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=0 \
--task_id=1 \
--gpu_device_id=1 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=0 \
--task_id=2 \
--gpu_device_id=2 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=0 \
--task_id=3 \
--gpu_device_id=3 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
CUDA_VISIBLE_DEVICES='' ~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--job_name='ps' \
--task_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
# machine 10.10.102.29
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=1 \
--task_id=4 \
--gpu_device_id=0 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=1 \
--task_id=5 \
--gpu_device_id=1 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=1 \
--task_id=6 \
--gpu_device_id=2 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
~/models/inception/bazel-bin/inception/imagenet_distributed_train \
--batch_size=32 \
--data_dir=/data1/imagenet1k \
--job_name='worker' \
--replica_id=1 \
--task_id=7 \
--gpu_device_id=3 \
--ps_hosts='10.10.102.28:2220' \
--worker_hosts='10.10.102.28:2221,10.10.102.28:2222,10.10.102.28:2223,10.10.102.29:2224,10.10.102.29:2221,10.10.102.29:2222,10.10.102.29:2223,10.10.102.29:2224' &
TL;DR: Don't ever use '/replica:%d' in your device specification.
The problem seems to be in your device string:
'/job:worker/replica:%d/task:%d/gpu:%d' % (FLAGS.replica_id, FLAGS.task_id, FLAGS.gpu_device_id)
The device specification '/replica:%d' is not supported in the open-source version of TensorFlow (but it is retained for some backwards compatibility reasons). The replica ID should be 0 for all tasks. You can solve this immediately by passing 0 as the --replica_id for each task, but you should really remove that flag from your version of the code.
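With the replica component dropped, the device string reduces to something like this (a minimal sketch; the constant is just a placeholder op):

import tensorflow as tf

task_id = 4        # FLAGS.task_id for the first worker on machine B
gpu_device_id = 0  # FLAGS.gpu_device_id

# No '/replica:%d' component: the replica is implicitly 0 for every task.
with tf.device('/job:worker/task:%d/gpu:%d' % (task_id, gpu_device_id)):
    logits = tf.zeros([32, 1000])  # placeholder for the real model ops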