combine to outputs of diffrent rules in Snakemake - snakemake

I would like to use snakemake to first merge some files and than later process other files based on that merge. (Less abstract: I want to combine control IGG bam files of two different sets and than use those to perform peakcalling on other files.
In a minimal example, the folder structure would look like this.
├── data
│   ├── toBeMerged
│   │   ├── singleA
│   │   ├── singleB
│   │   ├── singleC
│   │   └── singleD
│   └── toBeProcessed
│   ├── NotProcess1
│   ├── NotProcess2
│   ├── NotProcess3
│   ├── NotProcess4
│   └── NotProcess5
├── merge.cfg
├── output
│   ├── mergeAB_merge
│   ├── mergeCD_merge
│   ├── NotProcess1_processed
│   ├── NotProcess2_processed
│   ├── NotProcess3_processed
│   ├── NotProcess4_processed
│   └── NotProcess5_processed
├── process.cfg
└── Snakefile
Which files are combined and which are processed are defined in two config files.
merge.cfg
singlePath controlName
data/toBeMerged/singleA output/controlAB
data/toBeMerged/singleB output/controlAB
data/toBeMerged/singleC output/controlCD
data/toBeMerged/singleD output/controlCD
and process.cfg
controlName inName
output/controlAB data/toBeProcessed/NotProcess1
output/controlAB data/toBeProcessed/NotProcess2
output/controlCD data/toBeProcessed/NotProcess3
output/controlCD data/toBeProcessed/NotProcess4
output/controlAB data/toBeProcessed/NotProcess5
I am currently stuck with a snakefile like this, which itself does not work and gives me the error that both rules are ambiguous. And even if I would get it to work, I suspect, that this not the "correct" way, since the process rule, should have {mergeName} as input to build its dag. But this does not work, since then I would need two wildcarts in one rule.
import pandas as pd
cfgMerge = pd.read_table("merge.cfg").set_index("controlName", drop=False)
cfgProc= pd.read_table("process.cfg").set_index("inName", drop=False)
rule all:
input:
expand('{mergeName}', mergeName= cfgMerge.controlName),
expand('{rawName}_processed', rawName= cfgProc.inName)
rule merge:
input:
lambda wc: cfgMerge[cfgMerge.controlName == wc.mergeName].singlePath
output:
"{mergeName}"
shell:
"cat {input} > {output}"
rule process:
input:
inMerge=lambda wc: cfgProc[cfgProc.inName == wc.rawName].controlName.iloc[0],
Name=lambda wc: cfgProc[cfgProc.inName == wc.rawName].inName.iloc[0]
output:
'{rawName}_processed'
shell:
"cat {input.inMerge} {input.Name} > {output}"
I guess the key problem is how to use the output of a rule as the input for another one, when it does not depend on the same wildcard, or includes other another wildcard.

For future reference:
The problem did not seem to be the "using the output of a rule as the input for another one, when it does not depend on the same wildcard, or includes other another wildcard."
It seems that the input for rule all and the output for the other two rules where ambiguous. The simple solution is to put every output in a different directory and it worked (see below).
import pandas as pd
cfgMerge = pd.read_table("merge.cfg").set_index("controlName", drop=False)
cfgProc= pd.read_table("process.cfg").set_index("inName", drop=False)
#ruleorder: merge > process
rule all:
input:
expand('output/bam/{rawName}_processed', rawName= cfgProc.inName),
expand('output/control/{controlNameSnake}', controlNameSnake= cfgMerge.controlName.unique())
rule merge:
input:
lambda wc: cfgMerge[cfgMerge.controlName == wc.controlNameSnake].singlePath.unique()
output:
'output/control/{controlNameSnake}'
shell:
'echo {input} > {output}'
rule process:
input:
in1="data/toBeProcessed/{rawName}",
in2=lambda wc: "output/control/"+"".join(cfgProc[cfgProc.inName == wc.rawName].controlName.unique())
output:
'output/bam/{rawName}_processed'
shell:
'echo {input} > {output}'

Related

How to properly use CMake to create a complex project with dependencies on custom libraries

I'm trying to create a complex project that becomes a single executable file that uses the following libraries: two libraries BHV and HAL that use one interface library.
I have this project structure:
.
├── BHV
│   ├── CMakeCache.txt
│   ├── CMakeFiles
│   ├── cmake_install.cmake
│   ├── CMakeLists.txt
│   ├── include
│   ├── libBHV_Library.so
│   ├── Makefile
│   └── sources
├── HAL
│   ├── check_libraries.cmake
│   ├── CMakeCache.txt
│   ├── CMakeFiles
│   ├── cmake_install.cmake
│   ├── CMakeLists.txt
│   ├── include
│   ├── libHAL_Library.so
│   ├── Makefile
│   └── sources
├── Interface
│   ├── CMakeCache.txt
│   ├── CMakeFiles
│   ├── cmake_install.cmake
│   ├── CMakeLists.txt
│   ├── include
│   ├── libInterface_Library.a
│   ├── Makefile
│   └── sources
├── CMakeCache.txt
├── CMakeFiles
├── cmake_install.cmake
├── CMakeLists.txt
├── main.cpp
├── Makefile
├── README.md
Unfortunately, I can't connect the individual libraries to each other.
In Interface_lib CMakeList.txt I have this:
cmake_minimum_required(VERSION 3.10)
project(Interface_Library)
#requires at least C++17
set(CMAKE_CXX_STANDARD 17)
# Add all .cpp files from sources folder
file(GLOB SOURCES "sources/*.cpp")
# Add all .h files from include folder
file(GLOB HEADERS "include/*.h")
# Add main.cpp to the project
add_library(Interface_Library STATIC ${SOURCES} ${HEADERS})
In HAL_lib CMakeList.txt I have this:
cmake_minimum_required(VERSION 3.10)
project(HAL_Library)
# requires at least C++17
set(CMAKE_CXX_STANDARD 17)
################################### DIR_MANAGMENT #########################################
# Get the parent directory
get_filename_component(PARENT_DIR ${CMAKE_CURRENT_LIST_DIR} DIRECTORY)
#Set the directory of the parent file as the current directory
set(CMAKE_CURRENT_LIST_DIR ${PARENT_DIR})
message("MYPROJECT_DIR directory: ${CMAKE_CURRENT_LIST_DIR}")
################################### HAL_LIB #########################################
# Add all .cpp files from sources folder
file(GLOB SOURCES "sources/*.cpp")
# Add all .h files from include folder
file(GLOB HEADERS "include/*.h")
# Add main.cpp to the project
add_library(HAL_Library SHARED ${SOURCES} ${HEADERS})
################################### INTERFACE_LIB #########################################
target_link_directories(${PROJECT_NAME} PUBLIC ${CMAKE_CURRENT_LIST_DIR}/Interface)
# Link the Interface_Library into the HAL_Library
target_link_libraries(HAL_Library Interface_Library)
# check if libraries were included
set(TARGET_NAME HAL_Library)
include(check_libraries.cmake)
this is the code i use to check_libraries.cmake (from the internet)
# Get a list of referenced libraries
get_target_property(LINK_LIBS ${TARGET_NAME} LINK_LIBRARIES)
# Print the list of referenced libraries
message("Odkazované knihovny: ${LINK_LIBS}")
# Verify that libraries are available on the system
foreach(LIB ${LINK_LIBS})
execute_process(COMMAND ldd $<TARGET_FILE:${TARGET_NAME}> | grep ${LIB}
RESULT_VARIABLE res
OUTPUT_QUIET ERROR_QUIET)
if(res EQUAL "0")
message("Library ${LIB} was successfully linked with ${TARGET_NAME}")
else()
message(FATAL_ERROR "Error: Library ${LIB} not found.")
endif()
endforeach()
As an output I keep getting the library not found. What am I doing wrong?
And is my approach to project structure correct?
Thank you.
Typically you do:
# ./CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(Interface_Library)
add_subdirectory(HAL)
add_subdirectory(Interface)
# HAL/CMakeListst.txt
file(srcs GLOB sources/*.c include/*.h)
add_library(HAL ${srcs})
target_include_directories(HAL PUBLIC include)
target_link_libraries(HAL PUBLIC Interface)
# Interface/CMakeLists.txt
file(srcs GLOB source/*.c include/*.h)
add_library(Interface ${srcs})
target_include_directories(HAL PUBLIC include)
That's all. Only one project() call, in the root. Note the target_include_directories missing in your code. Note the main root CMakeLists.txt that includes all. I find HEADERS and SOURCES separately confusing, I would also use lowercase everything. No target_link_directories, CMake will find everything. Also, everything is compiled as one big project, not separately.

Can I make a CMake target dependent upon a target in another CMake project?

I have two separate projects, but one of them must now incorporate aspects of the other, including the generation of some code, which done by a Python script which is called by CMake.
Here is my project structure:
repo/
├── project_top/
│ ├── stuff_and_things.cpp
│ └── CMakeLists.txt
│
└── submods/
└── project_bottom/
├── CMakeLists.txt
└── tools/
├── build_scripts
│ └── cmake_bits.cmake
└── generator
└── gen_code.py
In repo/submods/project_bottom/tools/build_scripts/cmake_bits.cmake there is a macro set_up_additional_targets(), which includes a custom target which runs repo/submods/project_bottom/tools/generator/gen_code.py in that directory. This is based on project_bottom being its own project.
add_custom_target(gen_code
COMMAND echo "Generating code"
COMMAND python3 gen_code.py args
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/tools/generator
)
Now, I need to make a new target in project_top dependent upon the gen_code target in project_bottom. How do I do this? The gen_code target needs to be run as part of the project_top build, but within the context of project_bottom, because for that target, ${CMAKE_CURRENT_SOURCE_DIR} needs to be repo/submods/project_bottom, not repo/project_top.

What does FindPackageHandleStandardArgs do exactly?

I am trying to write a Find Module for a package that I have installed. But I am having trouble understanding the CMake functions.
Here is a snippet of my code.
find_package(PkgConfig)
pkg_check_modules(PC_zcm QUIET zcm)
find_path(zcm_INCLUDE_DIR
NAMES zcm.h
PATHS $ENV{PATH}
)
mark_as_advanced(zcm_FOUND zcm_INCLUDE_DIR)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(zcm DEFAULT_MSG
REQUIRED_VARS zcm_INCLUDE_DIR
)
find_path() is able to find my zcm_INCLUDE_DIR just fine: /usr/bin/zcm/usr/local/include
But find_package_handle_standard_args() gives
-- Could NOT find zcm (missing: REQUIRED_VARS)
My directory tree looks like this:
└── zcm
├── eventlog.h
├── json
│   ├── json-forwards.h
│   └── json.h
├── message_tracker.hpp
├── tools
│   ├── IndexerPlugin.hpp
│   └── TranscoderPlugin.hpp
├── transport
│   └── generic_serial_transport.h
├── transport.h
├── transport_register.hpp
├── transport_registrar.h
├── url.h
├── util
│   └── Filter.hpp
├── zcm-cpp-impl.hpp
├── zcm-cpp.hpp
├── zcm.h
└── zcm_coretypes.h
My understanding is find_package_handle_standard_args() attempts to find the package at the path, which sounds like it would be straightforward as the path is already determined.
As for REQUIRED_VARS the docs just say "Specify the variables which are required for this package." Which doesn't tell much for a noobie like me.
Description of find_package_handle_standard_args notes about two signatures of given function, one signature accepts DEFAULT_MSG option and another one accepts REQUIRED_VARS option.
You are trying to mix these signatures, and this is wrong.
Proper usage of the first signature:
# Everything after DEFAULT_MSG is treated as required variable.
find_package_handle_standard_args(zcm DEFAULT_MSG
zcm_INCLUDE_DIR
)
Proper usage of the second signature:
# By default, the standard error message is used.
find_package_handle_standard_args(zcm REQUIRED_VARS
zcm_INCLUDE_DIR
)

How to add include_directory in simple CMake Project

I have CMake project whose directory structure is as follows:
├── build
├── CMakeLists.txt
├── src
│   ├── CMakeLists.txt
│   ├── headers
│   │   └── utility.h
│   └── main.cpp
└── tests
├── CMakeLists.txt
├── testfeature_a
│   ├── CMakeLists.txt
│   └── test_me.cpp
└── test_main.cpp
In test_me.cpp I wanted to include utility.h as I wanted to test functions defined there. So I did #include "headers/utility.h" and in testfeature_a CMakeLists.txt I did this:
file(GLOB SRCS *.cpp)
ADD_EXECUTABLE(testfeature_a ${SRCS})
include_directories(src/headers)
TARGET_LINK_LIBRARIES(
testfeature_a
libgtest
libgmock
)
add_test(NAME testfeature_a
COMMAND testfeature_a)
But the make fails with the error message fatal error: headers/utility.h: No such file or directory.
How can I include the headers directory in test_me.cpp
Your path in include_directories() may be incorrect. Here are two things that you could check
The file seems to be the CMakeLists.txt in tests-folder so you need to go up one folder before you can go to src-folder, i.e. include_directories(../src/headers).
You repeat the headers-folder in the #include "headers/utility.h" when you have already specified it in include_directories(src/headers). Either use #include "utility.h" in cpp-file or include_directories(src) in CMakeLists.txt.
Other option is that you don't need to specify the headers-folder in the CMakeLists.txt at all. You can simply use #include "path/to/your/file.h without any other configuration.
For debugging your path in the CMakeLists.txt you can call message-function, e.g. message(${your_path}), so it's printed when executed and you can check if it's correct.
In addition you can use CMake built-in variables such as CMAKE_CURRENT_SOURCE_DIR and CMAKE_SOURCE_DIR, e.g. include_directories(${CMAKE_SOURCE_DIR}/src/headers)

Are multiple document roots to serve the same path possible?

Is it possible to have a directory structure on an Apache server similar to this:
└── www
├── root1
│   └── doc1.html
└── root2
└── doc2.html
such that doc1.html and doc2.html are both served on the same path?
(e.g. http://localhost/doc1.html and http://localhost/doc2.html both result in successful requests?)
Do it with aliasing. See httpd.apache.org/docs/2.2/mod/mod_alias.html#alias