Problem using Lloyd optimization and Mesh_domain::create_labeled_image_mesh_domain - cgal

I'm using CGAL 4.13 (on Linux, Fedora 29) to generate 3D meshes from segmented anatomical images. I would like to use Lloyd optimization, but I get a runtime error in a reproducible way.
To illustrate my problem, I modified the example mesh_3D_image.cpp by adding a Lloyd optimization step, as shown below. The program compiles with no errors or warnings.
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Mesh_triangulation_3.h>
#include <CGAL/Mesh_complex_3_in_triangulation_3.h>
#include <CGAL/Mesh_criteria_3.h>
#include <CGAL/Labeled_mesh_domain_3.h>
#include <CGAL/make_mesh_3.h>
#include <CGAL/Image_3.h>
#include <fstream>   // std::ofstream
#include <iostream>  // std::cerr
typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef CGAL::Labeled_mesh_domain_3<K> Mesh_domain;
typedef CGAL::Sequential_tag Concurrency_tag;
typedef CGAL::Mesh_triangulation_3<Mesh_domain,CGAL::Default,Concurrency_tag>::type Tr;
typedef CGAL::Mesh_complex_3_in_triangulation_3<Tr> C3t3;
typedef CGAL::Mesh_criteria_3<Tr> Mesh_criteria;
using namespace CGAL::parameters;
int main(int argc, char* argv[])
{
const char* fname = (argc>1)?argv[1]:"data/liver.inr.gz";
CGAL::Image_3 image;
if(!image.read(fname)){
std::cerr << "Error: Cannot read file " << fname << std::endl;
return EXIT_FAILURE;
}
Mesh_domain domain = Mesh_domain::create_labeled_image_mesh_domain(image);
Mesh_criteria criteria(facet_angle=30, facet_size=6, facet_distance=4,
cell_radius_edge_ratio=3, cell_size=8);
C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria);
// !!! THE FOLLOWING LINE MAKES THE PROGRAM CRASH !!!
CGAL::lloyd_optimize_mesh_3(c3t3, domain, time_limit=30);
std::ofstream medit_file("out.mesh");
c3t3.output_to_medit(medit_file);
return 0;
}
I compile it by using the following CMakeLists.txt file:
# Created by the script cgal_create_CMakeLists
project( executables )
cmake_minimum_required(VERSION 2.8.11)
find_package( CGAL QUIET COMPONENTS )
# !!! I had to add manually the following line !!!
find_package(CGAL COMPONENTS ImageIO)
include( ${CGAL_USE_FILE} )
find_package( Boost REQUIRED )
add_executable( executables lloyd.cpp )
add_to_cached_list( CGAL_EXECUTABLE_TARGETS executables )
target_link_libraries(executables ${CGAL_LIBRARIES} ${CGAL_3RD_PARTY_LIBRARIES} )
No mesh is generated. I obtain the following message:
$ ./build/mesh_3D_image
terminate called after throwing an instance of 'CGAL::Precondition_exception'
what(): CGAL ERROR: precondition violation!
Expr: std::distance(first,last) >= 3
File: /usr/include/CGAL/Mesh_3/Lloyd_move.h
Line: 419
Aborted (core dumped)
Where is my code wrong, and how can I enable optimization for meshes generated from 3D images?

Actually, when CGAL::make_mesh_3() is called like this:
C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria);
it internally runs CGAL::perturb_mesh_3() and CGAL::exude_mesh_3(). The latter changes the weights of the vertices in the regular triangulation and should always be called last (see the warning in the documentation of CGAL::exude_mesh_3()).
The only constraint on the order is that the exuder must be called last. So you can either call
C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria, lloyd(time_limit=30));
or
C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria, no_exude());
CGAL::lloyd_optimize_mesh_3(c3t3, domain, time_limit = 30);
CGAL::exude_mesh_3(c3t3);
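If you want to run all of the optimizers explicitly, a minimal sketch of the full pipeline under the same typedefs might look like this (ODT is added only for completeness; the exuder stays last because it modifies the vertex weights; depending on your setup you may need to include <CGAL/lloyd_optimize_mesh_3.h>, <CGAL/odt_optimize_mesh_3.h>, <CGAL/perturb_mesh_3.h> and <CGAL/exude_mesh_3.h> explicitly):
// Build the mesh without the in-place optimizers, then run them by hand.
C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria, no_perturb(), no_exude());
CGAL::lloyd_optimize_mesh_3(c3t3, domain, time_limit = 30);
CGAL::odt_optimize_mesh_3(c3t3, domain, time_limit = 30);
CGAL::perturb_mesh_3(c3t3, domain, time_limit = 10);
CGAL::exude_mesh_3(c3t3, time_limit = 10); // must remain the last optimization step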

You removed the part:
if(!image.read(fname)){
std::cerr << "Error: Cannot read file " << fname << std::endl;
return EXIT_FAILURE;
}
from the example, which is what actually reads the image from the file.

Related

Eigen::placeholders::all for Eigen array/matrix with pybind11

I am building a C++ program linked to Python code using pybind11, and I use Eigen for the matrix operations.
I am having issues with slicing an Eigen array.
According to the Eigen documentation, it is possible to slice an array using Eigen::placeholders::all:
std::vector<int> ind{4,2,5,5,3};
MatrixXi A = MatrixXi::Random(4,6);
cout << "Initial matrix A:\n" << A << "\n\n";
cout << "A(all,ind):\n" << A(Eigen::placeholders::all,ind) << "\n\n";
However, when I try to use this syntax in my code, I get the following error:
error: ‘Eigen::indexing’ has not been declared
I found an explanation for this: the Eigen header I was using comes from pybind11, not from the original Eigen installation.
This explains the issue, but it does not help with a solution.
I tried including the original Eigen headers, but they do not provide the indexing or placeholders namespaces.
Thanks for your assistance!
Edit:
Here is the code I tried to compile:
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl.h>
#include <pybind11/numpy.h>
#include <pybind11/iostream.h>
#include <iostream>
#include <valarray>
#include <Eigen/Core>
namespace py = pybind11;
void example()
{
std::vector<int> ind{4,2,5,5,3};
Eigen::MatrixXi A = Eigen::MatrixXi::Random(4,6);
std::cout << "Initial matrix A:\n" << A << "\n\n";
std::cout << "A(all,ind):\n" << A(Eigen::placeholders::all,ind) << "\n\n";
}
For which I got the following error:
error: ‘Eigen::placeholders’ has not been declared
Eventually I was able to solve the issue: it was a problem with the CMake files; specifically, FindEigen3.cmake was missing from the cmake folder.
Somehow (probably because of the pybind11 Eigen header) the program was able to compile, but it could not find all the relevant headers.
After adding FindEigen3.cmake to the cmake folder, all include directories were correct and I could use Eigen::placeholders::all.
Thanks @Homer512 for the assistance!
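One way to double-check which Eigen the build actually picks up is to print Eigen's version macros (a minimal, self-contained sketch; the macros are provided by Eigen's own headers, and Eigen::placeholders::all requires Eigen 3.4 or newer):
#include <iostream>
#include <Eigen/Core>
int main() {
    // If an older or bundled Eigen is found first on the include path,
    // these numbers will reveal it immediately.
    std::cout << "Using Eigen "
              << EIGEN_WORLD_VERSION << "."
              << EIGEN_MAJOR_VERSION << "."
              << EIGEN_MINOR_VERSION << std::endl;
    return 0;
}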

How to copy a surface mesh in CGAL

I want to copy a mesh with the function copy_face_graph(source, target). But the target mesh is different (it has the same number of vertices and faces, but the coordinates and the order are totally different).
The code:
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Exact_predicates_exact_constructions_kernel.h>
#include <CGAL/Surface_mesh.h>
#include <iostream>
#include <fstream>
#include <CGAL/boost/graph/copy_face_graph.h>
typedef CGAL::Exact_predicates_inexact_constructions_kernel Kernel;
typedef CGAL::Surface_mesh<Kernel::Point_3> Mesh;
namespace PMP = CGAL::Polygon_mesh_processing;
int main(int argc, char* argv[]) {
const char* filename1 = (argc > 1) ? argv[1] : "data/blobby.off";
std::cout << ".off loaded" << std::endl;
std::ifstream input(filename1);
Mesh mesh_orig;
if (!input || !(input >> mesh_orig))
{
std::cerr << "First mesh is not a valid off file." << std::endl;
return 1;
}
input.close();
// ========================================================
Mesh mesh_copy;
CGAL::copy_face_graph(mesh_orig, mesh_copy);
// ========================================================
std::ofstream mesh_cpy("CPY_ANYLYZE/mesh_copy.off");
mesh_cpy << mesh_copy;
mesh_cpy.close();
return 0;
}
Does anyone know how to get an identical mesh from the original mesh? Do I need to add named parameters, or maybe use another function?
Thanks a lot.
Unless you intend to write code that works with different data structures, you can use the copy constructor of the Surface_mesh class: Mesh mesh_copy(mesh_orig). copy_face_graph does not do a raw copy, because it also works when the input and output are of different types. However, the output should be the same up to the order of the simplices.
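Applied to the code in the question, that amounts to replacing the copy_face_graph call with the copy constructor (a minimal sketch reusing the question's Mesh typedef; the output file name is only an example):
// Deep copy that preserves the order of the vertices and faces.
Mesh mesh_copy(mesh_orig);
// Plain assignment works the same way:
// Mesh mesh_copy2 = mesh_orig;
std::ofstream mesh_cpy("mesh_copy.off");
mesh_cpy << mesh_copy;
mesh_cpy.close();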

getting error in running cmake std Tutorial step 2 example

I am totally new to CMake and its syntax, but fortunately I was able to run step 1 of the CMake tutorial as per the instructions at the link below:
https://cmake.org/cmake/help/latest/guide/tutorial/index.html
But I am totally stuck at the step 2 project.
I have created the step 2 project and understand the syntax for linking the library that computes the square root of a number, but I do not understand how to run it, because I get the error below:
user#server:~/TER_CMAKE/Tutorial/step2_build$ cmake ../step2
CMake Error at CMakeLists.txt:19 (add_subdirectory):
The binary directory
/home/user/TER_CMAKE/Tutorial/step2/MathFunctions
is already used to build a source directory. It cannot be used to build
source directory
/home/user/TER_CMAKE/Tutorial/step2/MathFunctions
Specify a unique binary directory name.
-- Configuring incomplete, errors occurred!
The step 2 example is available at the location below, under the heading Adding a Library (Step 2):
https://moodle.rrze.uni-erlangen.de/pluginfile.php/14829/mod_resource/content/5/CMakeTutorial.pdf
My intention is to run my example this way:
step2_build$ cmake ../step2
step2_build$ cmake --build .
step2_build$ ./Tutorial 121
I am not sure whether it is appropriate to ask this way on this platform, but since I do not have any other guidance, I am working through this on my own.
Note: I do not want to use any IDE or other tool to run my step 2 example. I want to run everything using the command prompt and the cmake command only, so that I can understand CMake.
Edit:
Here is my CMakeLists.txt:
cmake_minimum_required(VERSION 3.5)
#set the project name
project(Tutorial VERSION 1.0)
#specify the c++ std
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED True)
option(USE_MYMATH "Use tutorial provided math implementation" ON)
#Configure a header file to pass the version number to the source code
configure_file(TutorialConfig.h.in TutorialConfig.h)
#add the MathFunctions Library
add_subdirectory(MathFunctions)
if(USE_MYMATH)
add_subdirectory(MathFunctions)
list(APPEND EXTRA_LIBS MathFunctions)
list(APPEND EXTRA_INCLUDES "${PROJECT_SOURCE_DIR}/MathFunctions")
endif()
#add the executable
add_executable(Tutorial tutorial.cpp)
target_link_libraries(Tutorial PUBLIC ${EXTRA_LIBS})
# add the binary tree to the search path for include files
# so that we will find TutorialConfig.h
target_include_directories(Tutorial PUBLIC
"${PROJECT_BINARY_DIR}"
${EXTRA_LIBS}
)
My tutorial.cpp source file:
#include <iostream>
#include <cmath>
#include <cstdlib>
#include <string>
#ifdef USE_MYMATH
#include "MathFunctions.h"
#endif
#include "TutorialConfig.h"
using namespace std;
int main(int argc, char* argv[])
{
if (argc < 2) {
cout << "Usage: " << argv[0] << " number" << endl;
return 1;
}
// convert input to double
const double inputValue = atof(argv[1]);
// calculate square root
#ifdef USE_MYMATH
const double outputValue = mysqrt(inputValue);
#else
const double outputValue = sqrt(inputValue);
#endif
cout << "The square root of " << inputValue << " is " << outputValue << endl;
return 0;
}
TutorialConfig.h.in file:
#define Tutorial_VERSION_MAJOR @Tutorial_VERSION_MAJOR@
#define Tutorial_VERSION_MINOR @Tutorial_VERSION_MINOR@
#cmakedefine USE_MYMATH
EDIT:
Step 2 has a folder MathFunctions, which contains a CMakeLists.txt file and a mysqrt.cpp file:
/TER_CMAKE/Tutorial/step2/MathFunctions/CMakeLists.txt
add_library(MathFunctions mysqrt.cpp)
/TER_CMAKE/Tutorial/step2/MathFunctions/mysqrt.cpp
#include <iostream>
// a hack square root calculation using simple operations
double mysqrt(double x)
{
if (x <= 0) {
return 0;
}
double result = x;
// do ten iterations
for (int i = 0; i < 10; ++i) {
if (result <= 0) {
result = 0.1;
}
double delta = x - (result * result);
result = result + 0.5 * delta / result;
std::cout << "Computing sqrt of " << x << " to be " << result << std::endl;
}
return result;
}
If the USE_MYMATH variable is set, add_subdirectory(MathFunctions) is invoked twice. You need to remove one of the two occurrences, on lines 16 and 19 of your CMakeLists.txt.
Two issues I can see:
You're adding the subdirectory "MathFunctions" twice when you configure the build with -DUSE_MYMATH=ON. This is why you are getting "CMake Error at CMakeLists.txt:19 (add_subdirectory):"
To fix, remove
#add the MathFunctions Library
add_subdirectory(MathFunctions)
and rely on
if(USE_MYMATH)
add_subdirectory(MathFunctions)
list(APPEND EXTRA_LIBS MathFunctions)
list(APPEND EXTRA_INCLUDES "${PROJECT_SOURCE_DIR}/MathFunctions")
endif()
In your CMakeLists.txt file, you are doing
target_include_directories(Tutorial PUBLIC
"${PROJECT_BINARY_DIR}"
${EXTRA_LIBS}
)
Instead of ${EXTRA_LIBS} it should be ${EXTRA_INCLUDES}.
From the CMake Discourse thread "help with tutorial step 2", Josef Angstenberger (jtxa) said:
"The files in Step3 are the expected result if you do everything from Step2. Can you please compare your files against the ones from Step3 to see if there are any relevant differences?"
Marshallb's solution will solve nahesh relkar's problem. Loading Step2/CMakeLists.txt and Step3/CMakeLists.txt into vimdiff helped me to fix mine.

cooperative_groups::this_grid() causes any CUDA API call to return 'unknown error'

Following the same steps as the CUDA samples to launch a kernel and synchronize across the grid using cooperative_groups::this_grid().sync() causes every subsequent CUDA API call to fail, while using cooperative_groups::this_thread_block().sync() works fine and gives correct results.
I used the following code and CMakeLists.txt (CMake version 3.11.1) to test it with CUDA 10 on a TITAN V GPU (driver version 410.73) running Ubuntu 16.04.5 LTS. The code is also available on GitHub to make it easy to reproduce the error.
The code reads an array and then reverses it (from [0 1 2 ... 9] to [9 8 7 ... 0]). To do this, each thread reads a single element from the array, syncs, and then writes its element to the right destination. The code can easily be modified to verify that this_thread_block().sync() works fine: simply change arr_size to be less than 1024 and use cg::thread_block barrier = cg::this_thread_block(); instead.
test_cg.cu
#include <cuda_runtime_api.h>
#include <stdio.h>
#include <stdint.h>
#include <cstdint>
#include <numeric>
#include <cuda.h>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;
//********************** CUDA_ERROR
inline void HandleError(cudaError_t err, const char *file, int line) {
//Error handling micro, wrap it around function whenever possible
if (err != cudaSuccess) {
printf("\n%s in %s at line %d\n", cudaGetErrorString(err), file, line);
#ifdef _WIN32
system("pause");
#else
exit(EXIT_FAILURE);
#endif
}
}
#define CUDA_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
//******************************************************************************
//********************** cg kernel
__global__ void testing_cg_grid_sync(const uint32_t num_elements,
uint32_t *d_arr){
uint32_t tid = threadIdx.x + blockDim.x*blockIdx.x;
if (tid < num_elements){
uint32_t my_element = d_arr[tid];
//to sync across the whole grid
cg::grid_group barrier = cg::this_grid();
//to sync within a single block
//cg::thread_block barrier = cg::this_thread_block();
//wait for all reads
barrier.sync();
uint32_t tar_id = num_elements - tid - 1;
d_arr[tar_id] = my_element;
}
}
//******************************************************************************
//********************** execute
void execute_test(const int sm_count){
//host array
const uint32_t arr_size = 1 << 20; //1M
uint32_t* h_arr = (uint32_t*)malloc(arr_size * sizeof(uint32_t));
//fill with sequential numbers
std::iota(h_arr, h_arr + arr_size, 0);
//device array
uint32_t* d_arr;
CUDA_ERROR(cudaMalloc((void**)&d_arr, arr_size*sizeof(uint32_t)));
CUDA_ERROR(cudaMemcpy(d_arr, h_arr, arr_size*sizeof(uint32_t),
cudaMemcpyHostToDevice));
//launch config
const int threads = 512;
//following the same steps done in conjugateGradientMultiBlockCG.cu
//cuda sample to launch kernel that sync across grid
//https://github.com/NVIDIA/cuda-samples/blob/master/Samples/conjugateGradientMultiBlockCG/conjugateGradientMultiBlockCG.cu#L436
int num_blocks_per_sm = 0;
CUDA_ERROR(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&num_blocks_per_sm,
(void*)testing_cg_grid_sync, threads, 0));
dim3 grid_dim(sm_count * num_blocks_per_sm, 1, 1), block_dim(threads, 1, 1);
if(arr_size > grid_dim.x*block_dim.x){
printf("\n The grid size (numBlocks*numThreads) is less than array size.\n");
exit(EXIT_FAILURE);
}
printf("\n Launching %d blocks, each containing %d threads", grid_dim.x,
block_dim.x);
//argument passed to the kernel
void *kernel_args[] = {
(void *)&arr_size,
(void *)&d_arr, };
//finally launch the kernel
cudaLaunchCooperativeKernel((void*)testing_cg_grid_sync,
grid_dim, block_dim, kernel_args);
//make sure everything went okay
CUDA_ERROR(cudaGetLastError());
CUDA_ERROR(cudaDeviceSynchronize());
//get results on the host
CUDA_ERROR(cudaMemcpy(h_arr, d_arr, arr_size*sizeof(uint32_t),
cudaMemcpyDeviceToHost));
//validate
for (uint32_t i = 0; i < arr_size; i++){
if (h_arr[i] != arr_size - i - 1){
printf("\n Result mismatch in h_arr[%u] = %u\n", i, h_arr[i]);
exit(EXIT_FAILURE);
}
}
}
//******************************************************************************
int main(int argc, char**argv) {
//set to Titan V
uint32_t device_id = 0;
cudaSetDevice(device_id);
//get sm count
cudaDeviceProp devProp;
CUDA_ERROR(cudaGetDeviceProperties(&devProp, device_id));
int sm_count = devProp.multiProcessorCount;
//execute
execute_test(sm_count);
printf("\n Mission accomplished \n");
return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.8 FATAL_ERROR)
set(PROJECT_NAME "test_cg")
project(${PROJECT_NAME} LANGUAGES CXX CUDA)
#default build type is Release
if (CMAKE_BUILD_TYPE STREQUAL "")
set(CMAKE_BUILD_TYPE Release)
endif ()
SET(CUDA_SEPARABLE_COMPILATION ON)
########## Libraries/flags Starts Here ######################
find_package(CUDA REQUIRED)
include_directories("${CUDA_INCLUDE_DIRS}")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS}; -lineinfo; -std=c++11; -expt-extended-lambda; -O3; -use_fast_math; -rdc=true;)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode=arch=compute_70,code=sm_70) #for TITAN V
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -m64 -Wall -std=c++11")
########## Libraries/flags Ends Here ######################
########## inc/libs/exe/features Starts Here ######################
set(CMAKE_INCLUDE_CURRENT_DIR ON)
CUDA_ADD_EXECUTABLE(${PROJECT_NAME} test_cg.cu)
target_compile_features(${PROJECT_NAME} PUBLIC cxx_std_11)
set_target_properties(${PROJECT_NAME} PROPERTIES POSITION_INDEPENDENT_CODE ON)
set_target_properties(${PROJECT_NAME} PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(${PROJECT_NAME} ${CUDA_LIBRARIES} ${CUDA_cudadevrt_LIBRARY})
########## inc/libs/exe/features Ends Here ######################
Running this code gives:
unknown error in /home/ahdhn/test_cg/test_cg.cu at line 67
This is the first line that uses cudaMalloc. I made sure that the code is compiled for the correct architecture by querying __CUDA_ARCH__ from the device, and the result is 700. Kindly let me know if you spot anything wrong in the code or the CMakeLists.txt file.
With external help, the solution that got the code working is to add string(APPEND CMAKE_CUDA_FLAGS " -gencode arch=compute_70,code=sm_70 --cudart shared") after the second set(CUDA_NVCC_FLAGS ...) line. The reason is that I only have libcudadevrt.a under /usr/local/cuda-10.0/lib64/, so I have to tell CUDA to link against the shared/dynamic runtime library, since the default is to link against the static one. An earlier version of this fix added only string(APPEND CMAKE_CUDA_FLAGS " -gencode arch=compute_70,code=sm_70") after the second set(CUDA_NVCC_FLAGS ...), because the sm_70 flag was not being passed to the linker properly.
Additionally, using only CUDA_NVCC_FLAGS passes the sm_70 information to the compiler but not to the linker, while using only CMAKE_CUDA_FLAGS reports the error: namespace "cooperative_groups" has no member "grid_group".
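As a side note (this is not part of the original answer), cooperative launches also require hardware and driver support, which can be verified at runtime with the standard CUDA attribute query. A minimal sketch of such a check, with a hypothetical helper name:
#include <cstdio>
#include <cuda_runtime_api.h>
// Returns true if the device can run cooperative kernels, i.e. kernels
// that rely on cooperative_groups::this_grid().sync().
bool supports_cooperative_launch(int device_id)
{
    int supported = 0;
    cudaError_t err = cudaDeviceGetAttribute(&supported,
                                             cudaDevAttrCooperativeLaunch,
                                             device_id);
    if (err != cudaSuccess) {
        std::printf("cudaDeviceGetAttribute failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return supported != 0;
}
If the attribute comes back as 0, cudaLaunchCooperativeKernel is expected to fail no matter how the binary is compiled or linked.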

Why do I need separable compilation?

I have the code shown below. As far as I understood, separable compilation must be turned on when:
CUDA device code is separated into .h and .cu files
ObjectA's device code is used in ObjectB's device code
However, my main function has neither of the cases above. Could you tell me why I have to set separable compilation for this sample project?
BitHelper.h
#pragma once
#include <cuda_runtime.h>
#define COMPILE_TARGET __host__ __device__
class BitHelper
{
public:
COMPILE_TARGET BitHelper();
COMPILE_TARGET ~BitHelper();
COMPILE_TARGET static void clear(unsigned int& val0);
};
BitHelper.cu
#include "bithelper.h"
BitHelper::BitHelper()
{}
BitHelper::~BitHelper()
{}
void BitHelper::clear(unsigned int& val0)
{
val0 = 0x0000;
}
Consume_BitHelper.h
#pragma once
class Consume_BitHelper
{
public:
void apply();
private:
bool test_cpu();
bool test_gpu();
};
Consume_BitHelper.cu
#include "consume_bithelper.h"
#include <cuda_runtime.h>
#include <iostream>
#include "bithelper.h"
__global__
void myKernel()
{
unsigned int FLAG_VALUE = 0x2222;
printf("GPU before: %d\n", FLAG_VALUE);
BitHelper::clear(FLAG_VALUE);
printf("GPU after: %d\n", FLAG_VALUE);
}
void Consume_BitHelper::apply()
{
test_cpu();
test_gpu();
cudaDeviceSynchronize();
}
bool Consume_BitHelper::test_cpu()
{
std::cout << "TEST CPU" << std::endl;
unsigned int FLAG_VALUE = 0x1111;
std::cout << "CPU before: " << FLAG_VALUE << std::endl;
BitHelper::clear(FLAG_VALUE);
std::cout << "CPU after : " << FLAG_VALUE << std::endl;
return true;
}
bool Consume_BitHelper::test_gpu()
{
std::cout << "TEST GPU" << std::endl;
myKernel<<<1, 1>>>();
return true;
}
main.cu
#include "consume_bithelper.h"
#include "bithelper.h"
#include <iostream>
int main(int argc, char** argv)
{
Consume_BitHelper cbh;
cbh.apply();
std::cout << "\nPress any key to continue...";
std::cin.get();
return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(cuda_class LANGUAGES CXX CUDA)
#BitHelper needs separable compilation because we have separated declaration from definition
add_library(bithelper_lib STATIC bithelper.cu)
set_property(TARGET bithelper_lib PROPERTY CUDA_SEPARABLE_COMPILATION ON)
#Consume_BitHelper needs separable compilation because we call BitHelper's device code
#from Consume_BitHelper's kernel
add_library(consume_bithelper_lib STATIC consume_bithelper.cu)
set_property(TARGET consume_bithelper_lib PROPERTY CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(consume_bithelper_lib bithelper_lib)
#We only call CPU code so no need of separable compilation?
add_executable(${PROJECT_NAME} main.cu)
target_link_libraries(${PROJECT_NAME} bithelper_lib consume_bithelper_lib)
The errors I'm getting are these
EDIT
According to Robert Crovella's post, Consume_BitHelper.cu uses BitHelper::clear, which is defined in a separate compilation unit.
Does that mean I have to activate separable compilation only for BitHelper, since separable compilation only concerns device code called from device code?
Why am I getting the mentioned errors when separable compilation is NOT on for cuda_class (the executable created from CMake, which does not call any device code)?
Separable compilation has to do with how the compiler handles function calls. In exchange for a little bit of overhead, you get the ability to make true function calls and thus access code from other "compilation units" (i.e. .cu source files).
As GPU programmers are obsessed with performance (particularly the extra registers that get used when separable compilation is enabled), Nvidia made it an option instead of the default.
You should only need separable compilation for .cu files that access functions/globals defined in other .cu files.
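For this particular example, an alternative sketch (assuming you are happy with header-only device code rather than the question's .h/.cu split) is to define BitHelper's members inline in bithelper.h. Every .cu file that includes the header then compiles the definitions itself, and no relocatable device code is needed:
// bithelper.h -- header-only variant, so consume_bithelper.cu sees the
// definition of clear() in its own compilation unit and
// CUDA_SEPARABLE_COMPILATION can stay OFF.
#pragma once
#include <cuda_runtime.h>
#define COMPILE_TARGET __host__ __device__
class BitHelper
{
public:
    COMPILE_TARGET BitHelper() {}
    COMPILE_TARGET ~BitHelper() {}
    COMPILE_TARGET static void clear(unsigned int& val0) { val0 = 0x0000; }
};
The trade-off is exactly the one described above: with separable compilation you keep the definitions in separate .cu files at the cost of true function calls; with header-only definitions the compiler can inline them, but every including file recompiles them.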