Encountering AttributeError about module 'pulp' when using sample group info - snakemake

Snakemake version
5.25.0
Describe the bug
I am trying to construct a rule that performs a group-wise, merge-like operation. The dry-run is OK, but snakemake -p -j1 fails with this error:
AttributeError: module 'pulp' has no attribute 'apis'
Logs
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
2 IDRPeakMerge
1 all
3
Traceback (most recent call last):
File ".../miniconda3/lib/python3.7/site-packages/snakemake/__init__.py", line 735, in snakemake
keepincomplete=keep_incomplete,
File ".../miniconda3/lib/python3.7/site-packages/snakemake/workflow.py", line 972, in execute
success = scheduler.schedule()
File ".../miniconda3/lib/python3.7/site-packages/snakemake/scheduler.py", line 406, in schedule
else self.job_selector_ilp(needrun)
File ".../miniconda3/lib/python3.7/site-packages/snakemake/scheduler.py", line 616, in job_selector_ilp
if pulp.apis.LpSolverDefault is None:
AttributeError: module 'pulp' has no attribute 'apis'
Minimal example
SAMPLES = {
    'group1': ["SRR1552451", "SRR1552453", "ERR127302"],
    'group2': ["ERR127302", "SRR1552452"]
}

rule all:
    input:
        expand('idr/{group}.idr.peak', group=list(SAMPLES.keys()))

rule IDRPeakMerge:
    input:
        lambda wildcards: expand('macs2/{sample}.summits.bed', sample=SAMPLES[wildcards.group])
    output:
        'idr/{group}.idr.peak'
    run:
        shell("head -n 1 {input} >> {output}")
Additional context
file tree
s03peaks
├── idr
└── macs2
    ├── ERR127302.summits.bed
    ├── SRR1552451.summits.bed
    ├── SRR1552452.summits.bed
    └── SRR1552453.summits.bed
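For what it's worth, the pulp.apis submodule only exists in PuLP 2.0 and later, so a likely cause (an assumption, not confirmed in this report) is an outdated pulp package in the same environment. A minimal sketch of two possible workarounds, upgrading pulp or bypassing the ILP scheduler if your snakemake version supports the --scheduler flag:
pip install --upgrade pulp   # PuLP >= 2.0 provides the pulp.apis module
# or, alternatively, skip the ILP-based job selection entirely:
snakemake -p -j1 --scheduler greedy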

Related

How to use snakemake.script in wrappers?

I've been trying to create portable snakemake wrappers that execute pre-written scripts from the wrapper.py script. So far, though, all the examples I've found call shell from snakemake.shell to run command-line tools. So I thought the equivalent for scripts would be to use script from snakemake.script to execute the scripts. But when I use this in a rule, it throws an error like this:
Traceback (most recent call last):
File "/home/robertlink/stack_overflow_dummy_example/.snakemake/scripts/tmpqfzkhuv_.wrapper.py", line 7, in <module>
script("scripts/foo.py")
TypeError: script() missing 19 required positional arguments: 'basedir', 'input', 'output', 'params', 'wildcards', 'threads', 'resources', 'log', 'config', 'rulename', 'conda_env', 'container_img', 'singularity_args', 'env_modules', 'bench_record', 'jobid', 'bench_iteration', 'cleanup_scripts', and 'shadow_dir'
Is there a way to easily retrieve the information required for using script? Or am I mistaken that I should even use script in this fashion? Here's a dummy example to replicate the message:
Directory structure:
.
├── Snakefile
└── wrapper
    └── path
        ├── scripts
        │   ├── bar.py
        │   └── foo.py
        └── wrapper.py
Snakefile:
rule foobar:
    output:
        "foobar.txt"
    wrapper:
        "file:wrapper/path"
wrapper.py
from snakemake.script import script
script("scripts/foo.py")
script("scripts/bar.py")
foo.py
with open("foo_intermediate.txt", 'w') as handle:
handle.write("foo")
bar.py
with open("foo_intermediate.txt", 'w') as handle:
foo = handle.read()
foo += 'bar'
with open(snakemake.output) as handle:
handle.write(foo)
command run:
$ snakemake --cores 3
Any insight into this would be wonderful. Thanks!
You don't have to write a wrapper to call your scripts - the scripts can be the wrapper. Maybe take a look at this wrapper based on an Rscript to get the idea:
https://snakemake-wrappers.readthedocs.io/en/latest/wrappers/tximport.html
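To make that concrete, here is a minimal sketch of a wrapper.py that is itself the script (the file contents are hypothetical, not taken from the question): snakemake injects a snakemake object into the wrapper, so input, output, and params can be read directly and shell() can be used for any command-line steps, with no call to script() needed.
from snakemake.shell import shell

# The "snakemake" object is injected automatically when this file is run
# via the wrapper: directive of a rule, just like with the script: directive.
with open(snakemake.output[0], "w") as handle:
    handle.write("foo" + "bar")

# Command-line tools can still be invoked through shell(), e.g.:
# shell("head -n 1 {snakemake.input} > {snakemake.output[0]}")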

Can't build spark py-files with pandas included

I am attempting to package up the dependencies for a Spark program I am creating. I have a requirements.txt file as below:
pandas
I then run
pip3 install -t dependencies -r requirements.txt
cd dependencies
zip -r ../dependencies.zip .
pyspark --py-files dependencies.zip
And run the line -
import pandas
And I get the error -
Traceback (most recent call last):
File "/mnt/tmp/spark-REDACTED/userFiles-REDACTED/dependencies.zip/pandas/__init__.py", line 31, in <module>
File "/mnt/tmp/spark-REDACTED/userFiles-REDACTED/dependencies.zip/pandas/_libs/__init__.py", line 3, in <module>
File "/mnt/tmp/spark-REDACTED/userFiles-REDACTED/dependencies.zip/pandas/_libs/tslibs/__init__.py", line 3, in <module>
ModuleNotFoundError: No module named 'pandas._libs.tslibs.conversion'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/mnt/tmp/spark-REDACTED/userFiles-REDACTED/dependencies.zip/pandas/__init__.py", line 36, in <module>
ImportError: C extension: No module named 'pandas._libs.tslibs.conversion' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
Any ideas on how to fix this?
To ship dependencies to the workers, there are two ways: one is exactly what you did, zipping the files (or a single .py file) and passing them with --py-files. The problem you encountered is caused by missing C extensions on the worker side; packages like NumPy and pandas all depend on compiled C code.
To solve this, create a virtualenv, zip the virtualenv including the Python executable, and run:
PYSPARK_DRIVER_PYTHON=<path to current working python>
PYSPARK_PYTHON='./venv/<path to python executable>'
pyspark --archives <path to zip file>#venv
or follow this link
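A rough sketch of that workflow (paths and names are placeholders, and it assumes you are running on YARN, where --archives is supported):
# build a virtualenv that contains pandas with its compiled C extensions
python3 -m venv venv
./venv/bin/pip install -r requirements.txt
# zip the whole environment, including bin/python
cd venv && zip -r ../venv.zip . && cd ..
# point the driver at a local interpreter and the executors at the shipped one
export PYSPARK_DRIVER_PYTHON=$(which python3)
export PYSPARK_PYTHON=./venv/bin/python
pyspark --archives venv.zip#venv
Note that the executors unpack venv.zip under the alias venv (the part after the #), which is why PYSPARK_PYTHON points at ./venv/bin/python.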

Tensorflow build error : Cannot find cudnn.h under ~

I am trying to build TensorFlow r1.12 using Bazel 0.15 on Red Hat 7.5 ppc64le.
I am stuck with the following error.
[u0017649@sys-97184 tensorflow]$ bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
...
ERROR: error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package
'@local_config_cuda//cuda': Traceback (most recent call last):
File "/home/u0017649/files/tensorflow/third_party/gpus/cuda_configure.bzl", line 1447
    _create_local_cuda_repository(repository_ctx)
File "/home/u0017649/files/tensorflow/third_party/gpus/cuda_configure.bzl", line 1187, in _create_local_cuda_repository
    _get_cuda_config(repository_ctx)
File "/home/u0017649/files/tensorflow/third_party/gpus/cuda_configure.bzl", line 911, in _get_cuda_config
    _cudnn_version(repository_ctx, cudnn_install_base..., ...)
File "/home/u0017649/files/tensorflow/third_party/gpus/cuda_configure.bzl", line 582, in _cudnn_version
    _find_cudnn_header_dir(repository_ctx, cudnn_install_base...)
File "/home/u0017649/files/tensorflow/third_party/gpus/cuda_configure.bzl", line 869, in _find_cudnn_header_dir
    auto_configure_fail(("Cannot find cudnn.h under %s" ...))
File "/home/u0017649/files/tensorflow/third_party/gpus/cuda_configure.bzl", line 317, in auto_configure_fail
    fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: Cannot find cudnn.h under /usr/local/cuda-9.2/targets/ppc64le-linux/lib
Cuda Configuration Error: Cannot find cudnn.h under /usr/local/cuda-9.2/targets/ppc64le-linux/lib
I do have a soft link for cudnn.h under /usr/local/cuda-9.2/targets/ppc64le-linux/lib as below.
[u0017649@sys-97184 tensorflow]$ ls -l /usr/local/cuda-9.2/targets/ppc64le-linux/lib/cudnn.h
lrwxrwxrwx. 1 root root 57 Feb 20 10:15 /usr/local/cuda-9.2/targets/ppc64le-linux/lib/cudnn.h -> /usr/local/cuda-9.2/targets/ppc64le-linux/include/cudnn.h
Any comments, please?
After reading tensorflow/third_party/gpus/cuda_configure.bzl, I was able to solve this with the following:
$ sudo ln -sf /usr/local/cuda-9.2/targets/ppc64le-linux/include/cudnn.h /usr/include/cudnn.h
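An alternative that avoids touching /usr/include (an assumption based on how cuda_configure.bzl resolves paths, not something verified here) is to point the configure step at the directory that actually contains include/cudnn.h before building:
export CUDNN_INSTALL_PATH=/usr/local/cuda-9.2/targets/ppc64le-linux
./configure
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package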

Mrjob failed when running on hadoop with lxml library

I'm working on a project using Hadoop MapReduce. My project tree is shown below:
MyProject
├── parse_xml_file.py
├── store_xml_directory
│   └── my_xml_file.xml
├── requirements.txt
├── input_to_hadoop.txt
└── testMrjob.py
It runs without error locally with this command:
python testMrjob.py < input_to_hadoop.txt > output
But when running on Hadoop with the following command (the lxml library is installed on all nodes):
python testMrjob.py -r hadoop --file parse_xml_file.py < input_to_hadoop.txt
Or
python testMrjob.py -r hadoop --file parse_xml_file.py --file store_xml_directory/my_xml_file.xml < input_to_hadoop.txt > output
I got this error:
no configs found; falling back on auto-configuration
creating tmp directory /tmp/testMrjob.haduser.20141018.152349.482573
Uploading input to hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/input
reading from STDIN
Copying non-input files into hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/
Using Hadoop version 1.2.1
HADOOP: Loaded the native-hadoop library
HADOOP: Snappy native library not loaded
HADOOP: Total input paths to process : 1
HADOOP: getLocalDirs(): [/opt/hadoop/dfs/mapred/local]
HADOOP: Running job: job_201410182107_0012
HADOOP: To kill this job, run:
HADOOP: /opt/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=master:54311 -kill job_201410182107_0012
HADOOP: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201410182107_0012
HADOOP: map 0% reduce 0%
HADOOP: map 100% reduce 100%
HADOOP: To kill this job, run:
HADOOP: /opt/hadoop/libexec/../bin/hadoop job -Dmapred.job.tracker=master:54311 -kill job_201410182107_0012
HADOOP: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201410182107_0012
HADOOP: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201410182107_0012_m_000000
HADOOP: killJob...
HADOOP: Streaming Command Failed!
STDOUT: packageJobJar: [/opt/hadoop/tmp/hadoop-unjar9122722052766576889/] [] /tmp/streamjob2542718124608434574.jar tmpDir=null
Job failed with return code 1: ['/opt/hadoop/bin/hadoop', 'jar', '/opt/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar', '-files', 'hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/testMrjob.py#testMrjob.py,hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/requirements.txt#requirements.txt,hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/parse_xml_file.py#parse_xml_file.py', '-archives', 'hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/mrjob.tar.gz#mrjob.tar.gz', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/input', '-output', 'hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/output', '-mapper', 'python testMrjob.py --step-num=0 --mapper', '-reducer', 'python testMrjob.py --step-num=0 --reducer']
Scanning logs for probable cause of failure
Traceback (most recent call last):
File "testMrjob.py", line 25, in <module>
MRWordFrequencyCount.run()
File "/usr/lib/python2.7/dist-packages/mrjob/job.py", line 516, in run
mr_job.execute()
File "/usr/lib/python2.7/dist-packages/mrjob/job.py", line 532, in execute
self.run_job()
File "/usr/lib/python2.7/dist-packages/mrjob/job.py", line 602, in run_job
runner.run()
File "/usr/lib/python2.7/dist-packages/mrjob/runner.py", line 516, in run
self._run()
File "/usr/lib/python2.7/dist-packages/mrjob/hadoop.py", line 239, in _run
self._run_job_in_hadoop()
File "/usr/lib/python2.7/dist-packages/mrjob/hadoop.py", line 442, in _run_job_in_hadoop
raise Exception(msg)
Exception: Job failed with return code 1: ['/opt/hadoop/bin/hadoop', 'jar', '/opt/hadoop/contrib/streaming/hadoop-streaming-1.2.1.jar', '-files', 'hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/testMrjob.py#testMrjob.py,hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/requirements.txt#requirements.txt,hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/parse_xml_file.py#parse_xml_file.py', '-archives', 'hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/files/mrjob.tar.gz#mrjob.tar.gz', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/input', '-output', 'hdfs:///user/haduser/tmp/mrjob/testMrjob.haduser.20141018.152349.482573/output', '-mapper', 'python testMrjob.py --step-num=0 --mapper', '-reducer', 'python testMrjob.py --step-num=0 --reducer']
To distribute Python modules with mrjob, you should use --python-archive rather than --file.
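A minimal sketch of that, assuming parse_xml_file.py is the only module that needs to be shipped (the archive name is arbitrary):
# pack the module into a tarball; mrjob uploads it and adds it to PYTHONPATH on each node
tar -czf parse_xml.tar.gz parse_xml_file.py
python testMrjob.py -r hadoop --python-archive parse_xml.tar.gz \
    --file store_xml_directory/my_xml_file.xml < input_to_hadoop.txt > output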

How do I run a single test with Nose in Pylons

I have a Pylons 1.0 app with a bunch of tests in the test/functional directory.
I'm getting weird test results and I want to just run a single test.
The nose documentation says I should be able to pass in a test name at the command line, but I get ImportErrors no matter what I do.
For example:
nosetests -x -s sometestname
Gives:
Traceback (most recent call last):
File "/home/ben/.virtualenvs/tsq/lib/python2.6/site-packages/nose-0.11.4-py2.6.egg/nose/loader.py", line 371, in loadTestsFromName
module = resolve_name(addr.module)
File "/home/ben/.virtualenvs/tsq/lib/python2.6/site-packages/nose-0.11.4-py2.6.egg/nose/util.py", line 334, in resolve_name
module = __import__('.'.join(parts_copy))
ImportError: No module named sometestname
I get the same error for
nosetests -x -s appname.tests.functional.testcontroller
What is the correct syntax?
nosetests appname.tests.functional.test_controller should work, where the file is named test_controller.py.
To run a specific test class and method, use a path of the form module.path:ClassNameInFile.method_name, that is, with a colon separating the module/file path from the objects within the file. Here module.path is the relative path to the file (e.g. tests/my_tests.py:ClassNameInFile.method_name).
For me, using nosetests 1.3.0, these variants work (but make sure you have an __init__.py in your tests folder):
nosetests [options] tests.ui_tests
nosetests [options] tests/ui_tests.py
nosetests [options] tests.ui_tests:TestUI.test_admin_page
Note the single colon between the module name and the class name.
I had to add the ".py" file extension, that is:
r'/path_to/my_file.py:' + r'test_func_xy'
Maybe this is because I don't have any classes in the file.
Without the .py, nose was complaining:
Can't find callable test_func_xy in file /path_to/my_file: file is not a python module
And this even though I have an __init__.py in the folder /path_to/.
I wrote this small script, based on the previous answers:
#!/usr/bin/env bash
#
# Usage:
#
# ./noseTest <filename> <method_name>
#
# e.g.:
#
# ./noseTest test/MainTest.py mergeAll
#
# It is assumed that the file and the test class have the _same name_
# (e.g. the test class `MainTest` is defined in the file `MainTest.py`).
# If you don't follow this convention, this script won't work for you.
#
testFile="$1"
testMethod="$2"
testClass="$(basename "$testFile" .py)"
nosetests "$testFile:$testClass.test_$testMethod"
The following worked just fine for me:
nosetests test_file.py:method_name
Note that my tests were not in a class; the test methods were all in a single file.
For nosetests 1.3.7, you need to do:
nosetests --tests=tests.test_something.py,tests.test_something_else.py.