I am building a C++ program bound to Python code, using pybind11.
I use Eigen for the matrix operations.
I am having issues with slicing an Eigen array.
According to the Eigen documentation, it is possible to slice an array using Eigen::placeholders::all:
std::vector<int> ind{4,2,5,5,3};
MatrixXi A = MatrixXi::Random(4,6);
cout << "Initial matrix A:\n" << A << "\n\n";
cout << "A(all,ind):\n" << A(Eigen::placeholders::all,ind) << "\n\n";
However, when I try to use this syntax in my code, I get the following error:
error: ‘Eigen::indexing’ has not been declared
I found an explanation for this: the Eigen header I was using comes from pybind11, not from the original Eigen distribution.
This explains the issue, but it does not help with a solution.
I tried including the original Eigen headers, but the indexing and placeholders namespaces still were not found.
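A minimal standalone check can confirm which Eigen the compiler actually picks up. This is just a sketch with no pybind11 includes; it assumes an Eigen version >= 3.4 (where the placeholders live) is on the include path, and it fails to compile against older headers:
#include <Eigen/Dense>
#include <iostream>
#include <vector>
int main() {
    std::vector<int> ind{4, 2, 5, 5, 3};
    Eigen::MatrixXi A = Eigen::MatrixXi::Random(4, 6);
    // Compiles only if the headers provide Eigen::placeholders::all (Eigen >= 3.4).
    std::cout << A(Eigen::placeholders::all, ind) << "\n";
    return 0;
}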
Thanks for your assistance!
Edit:
Here is the code I tried to compile:
#include <pybind11/pybind11.h>
#include <pybind11/eigen.h>
#include <pybind11/stl.h>
#include <pybind11/numpy.h>
#include <pybind11/iostream.h>
#include <iostream>
#include <valarray>
#include <Eigen/Core>
namespace py = pybind11;
void example()
{
    std::vector<int> ind{4,2,5,5,3};
    Eigen::MatrixXi A = Eigen::MatrixXi::Random(4,6);
    std::cout << "Initial matrix A:\n" << A << "\n\n";
    std::cout << "A(all,ind):\n" << A(Eigen::placeholders::all,ind) << "\n\n";
}
For which I got the following error:
error: ‘Eigen::placeholders’ has not been declared
Eventually I was able to solve the issue: it was a problem with the CMake files. Specifically, FindEigen3.cmake was missing from the cmake folder.
Somehow (probably because of the pybind11 Eigen header) the program was able to compile, but it could not find all the relevant headers.
After adding FindEigen3.cmake to the cmake folder, all include directories were correct, and I could use Eigen::placeholders::all.
Thanks @Homer512 for the assistance!
I was recently trying to compare different Python and C++ matrix libraries against each other for their linear algebra performance, in order to see which one(s) to use in an upcoming project. While there are multiple types of linear algebra operations, I have chosen to focus mainly on matrix inversion, as it seems to be the one giving strange results. I have written the code below for the comparison, but I think I must be doing something wrong.
C++ Code
#include <iostream>
#include "eigen/Eigen/Dense"
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xview.hpp>
#include <xtensor/xrandom.hpp>
#include <xtensor-blas/xlinalg.hpp> //-lblas -llapack for cblas, -llapack -L OpenBLAS/OpenBLAS_Install/lib -l:libopenblas.a -pthread for openblas
//including accurate timer
#include <chrono>
//including vector array
#include <vector>
void basicMatrixComparisonEigen(std::vector<int> dims, int numrepeats = 1000);
void basicMatrixComparisonXtensor(std::vector<int> dims, int numrepeats = 1000);
int main()
{
    std::vector<int> sizings{1, 10, 100, 1000, 10000, 100000};
    basicMatrixComparisonEigen(sizings, 2);
    basicMatrixComparisonXtensor(sizings, 2);
    return 0;
}
void basicMatrixComparisonEigen(std::vector<int> dims, int numrepeats)
{
    std::chrono::high_resolution_clock::time_point t1;
    std::chrono::high_resolution_clock::time_point t2;
    using time = std::chrono::high_resolution_clock;
    std::cout << "Timing Eigen: " << std::endl;
    for (auto &dim : dims)
    {
        std::cout << "Scale Factor: " << dim << std::endl;
        try
        {
            // Linear operations
            auto l = Eigen::MatrixXd::Random(dim, dim);
            // Eigen matrix inversion
            t1 = time::now();
            for (int i = 0; i < numrepeats; i++)
            {
                Eigen::MatrixXd pinv = l.completeOrthogonalDecomposition().pseudoInverse();
                // note this does not come out to be identity. The inverse is wrong.
                // std::cout << l * pinv << std::endl;
            }
            t2 = time::now();
            std::cout << "Eigen Matrix inversion took: " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() * 1000 / (double)numrepeats << " milliseconds." << std::endl;
            std::cout << "\n\n\n";
        }
        catch (const std::exception &e)
        {
            std::cout << "Error: '" << e.what() << "'\n";
        }
    }
}
void basicMatrixComparisonXtensor(std::vector<int> dims, int numrepeats)
{
    std::chrono::high_resolution_clock::time_point t1;
    std::chrono::high_resolution_clock::time_point t2;
    using time = std::chrono::high_resolution_clock;
    std::cout << "Timing Xtensor: " << std::endl;
    for (auto &dim : dims)
    {
        std::cout << "Scale Factor: " << dim << std::endl;
        try
        {
            // Linear operations
            auto l = xt::random::randn<double>({dim, dim});
            // Xtensor matrix inversion
            t1 = time::now();
            for (int i = 0; i < numrepeats; i++)
            {
                auto inverse = xt::linalg::pinv(l);
                // something is wrong here. The inverse is not actually the inverse when you multiply it out.
                // std::cout << xt::linalg::dot(inverse, l) << std::endl;
            }
            t2 = time::now();
            std::cout << "Xtensor Matrix inversion took: " << std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1).count() * 1000 / (double)numrepeats << " milliseconds." << std::endl;
            std::cout << "\n\n\n";
        }
        catch (const std::exception &e)
        {
            std::cout << "Error: '" << e.what() << "'\n";
        }
    }
}
This is compiled with:
g++ cpp_library.cpp -O2 -llapack -L OpenBLAS/OpenBLAS_Install/lib -l:libopenblas.a -pthread -march=native -o benchmark.exe
for OpenBLAS, and
g++ cpp_library.cpp -O2 -lblas -llapack -march=native -o benchmark.exe
for CBLAS.
g++ version 9.3.0.
And for Python 3:
import numpy as np
from datetime import datetime as dt
#import timeit
start=dt.now()
l=np.random.rand(1000,1000)
for i in range(2):
    result = np.linalg.inv(l)
end=dt.now()
print("Completed in: "+str((end-start)/2))
#print(np.matmul(l,result))
#print(np.dot(l,result))
#Timeit also gives similar results
I will focus on the largest decade that runs in a reasonable amount of time on my computer: 1000x1000. I know that only 2 runs introduce a bit of variance, but I've run it with more and the results are roughly the same as below:
Eigen 3.3.9: 196.804 milliseconds
Xtensor/Xtensor-blas w/ OpenBlas: 378.156 milliseconds
Numpy 1.17.4: 172.582 milliseconds
Is this a reasonable result to expect? Why are the C++ libraries slower than Numpy? All 3 packages are using some sort of LAPACK/BLAS backend, yet there is a significant difference between the 3. In particular, Xtensor will pin my CPU at 100% usage with OpenBLAS's threads, yet still manages to have worse performance.
I'm wondering if the C++ libraries are actually performing the inverse/pseudoinverse of the matrix, and if this is what is causing these results. In the commented sections of the C++ test code, I have noted that when I sanity-checked the results from both Eigen and Xtensor, the resulting product between the matrix and its inverse was not even close to the identity matrix. I tried with smaller matrices (10x10) thinking it might be a precision error, but the problem remained. In another test, I checked the rank, and these matrices are full rank. To be sure I wasn't going crazy, I tried with inv() instead of pinv() in both cases, and the results are the same. Am I using the wrong functions for this linear algebra benchmark, or is Numpy twisting the knife on 2 dysfunctional low-level libraries?
EDIT:
Thank you everyone for your interest in this problem. I think I have figured out the issue. Eigen and Xtensor use lazy evaluation, and capturing the unevaluated expression with auto was causing errors downstream, producing effectively random matrices instead of the inverted ones. I was able to correct the strange numerical inversion failure with the following replacements in the code:
auto temp = Eigen::MatrixXd::Random(dim, dim);
Eigen::MatrixXd l(dim, dim);
l = temp;
and
auto temp = xt::random::randn<double>({dim, dim});
xt::xarray<double> l = temp;
However, the timings didn't change much:
Eigen 3.3.9: 201.386 milliseconds
Xtensor/Xtensor-blas w/ OpenBlas: 337.299 milliseconds.
Numpy 1.17.4: (from before) 172.582 milliseconds
Somewhat strangely, adding -O3 and -ffast-math actually slowed the code down a little. -march=native had the biggest performance increase when I tried it. Also, OpenBLAS is 2-3X faster than CBLAS for these problems.
Firstly, you are not computing the same things.
To compute the inverse of the matrix l, use l.inverse() for Eigen and xt::linalg::inv() for xtensor.
When you link a BLAS into Eigen or xtensor, these operations are automatically dispatched to your chosen BLAS.
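For example, a minimal sketch of the corrected timing bodies (adapted from the question's code; dim and numrepeats as defined there):
// Eigen: materialize the random matrix instead of holding a lazy
// expression, and time the plain inverse rather than the pseudoinverse.
Eigen::MatrixXd l = Eigen::MatrixXd::Random(dim, dim);
Eigen::MatrixXd inv(dim, dim);
for (int i = 0; i < numrepeats; i++)
{
    inv = l.inverse();
}
// xtensor: same idea, evaluate the expression into a concrete container.
xt::xarray<double> lx = xt::random::randn<double>({dim, dim});
xt::xarray<double> invx = xt::linalg::inv(lx);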
I tried replacing the inverse functions, replaced auto with Eigen::MatrixXd and xt::xtensor to avoid lazy evaluation, linked OpenBLAS into Eigen, xtensor, and numpy, and compiled with only the -O3 flag. The following are the results on my MacBook Pro M1:
Eigen-3.3.9 (with OpenBLAS): ~38 ms
Eigen-3.3.9 (without OpenBLAS): ~85 ms
xtensor-master (with OpenBLAS): ~41 ms
Numpy-1.21.2 (with OpenBLAS): ~35 ms
I am totally new to CMake and its syntax, but fortunately I was able to run Step 1 of the CMake tutorial by following the instructions at the link below:
https://cmake.org/cmake/help/latest/guide/tutorial/index.html
However, I am completely stuck at running the Step 2 project with CMake.
I have created the Step 2 project and understand the syntax for linking the library that computes the square root of a number, but I do not understand how to run it, as I am getting the error below:
user#server:~/TER_CMAKE/Tutorial/step2_build$ cmake ../step2
CMake Error at CMakeLists.txt:19 (add_subdirectory):
The binary directory
/home/user/TER_CMAKE/Tutorial/step2/MathFunctions
is already used to build a source directory. It cannot be used to build
source directory
/home/user/TER_CMAKE/Tutorial/step2/MathFunctions
Specify a unique binary directory name.
-- Configuring incomplete, errors occurred!
The Step 2 example is available at the location below, under the heading Adding a Library (Step 2):
https://moodle.rrze.uni-erlangen.de/pluginfile.php/14829/mod_resource/content/5/CMakeTutorial.pdf
My intention is to run my example this way:
step2_build$ cmake ../step2
step2_build$ cmake --build .
step2_build$ ./Tutorial 121
I am not sure whether it is good to ask this way on this platform, but as I do not have any other guidance, I am working through this on my own.
Note: I do not want to use any IDE or other tool to run my Step 2 example. I want to run everything using the command prompt and the cmake command only, so that I can understand CMake.
Edit:
Adding my CMakeLists.txt:
cmake_minimum_required(VERSION 3.5)
#set the project name
project(Tutorial VERSION 1.0)
#specify the c++ std
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED True)
option(USE_MYMATH "Use tutorial provided math implementation" ON)
#Configure a header file to pass the version number to the source code
configure_file(TutorialConfig.h.in TutorialConfig.h)
#add the MathFunctions Library
add_subdirectory(MathFunctions)
if(USE_MYMATH)
    add_subdirectory(MathFunctions)
    list(APPEND EXTRA_LIBS MathFunctions)
    list(APPEND EXTRA_INCLUDES "${PROJECT_SOURCE_DIR}/MathFunctions")
endif()
#add the executable
add_executable(Tutorial tutorial.cpp)
target_link_libraries(Tutorial PUBLIC ${EXTRA_LIBS})
# add the binary tree to the search path for include files
# so that we will find TutorialConfig.h
target_include_directories(Tutorial PUBLIC
"${PROJECT_BINARY_DIR}"
${EXTRA_LIBS}
)
My source tutorial.cpp file:
#include <iostream>
#include <cmath>
#include <cstdlib>
#include <string>
#ifdef USE_MYMATH
#include "MathFunctions.h"
#endif
#include "TutorialConfig.h"
using namespace std;
int main(int argc, char* argv[])
{
    if (argc < 2) {
        cout << "Usage: " << argv[0] << " number" << endl;
        return 1;
    }
    // convert input to double
    const double inputValue = atof(argv[1]);
    // calculate square root
#ifdef USE_MYMATH
    const double outputValue = mysqrt(inputValue);
#else
    const double outputValue = sqrt(inputValue);
#endif
    cout << "The square root of " << inputValue << " is " << outputValue << endl;
    return 0;
}
TutorialConfig.h.in file:
#define Tutorial_VERSION_MAJOR @Tutorial_VERSION_MAJOR@
#define Tutorial_VERSION_MINOR @Tutorial_VERSION_MINOR@
#cmakedefine USE_MYMATH
EDIT:
Step 2 has a folder MathFunctions, which contains a CMakeLists.txt file and a mysqrt.cpp file:
/TER_CMAKE/Tutorial/step2/MathFunctions/CMakeLists.txt
add_library(MathFunctions mysqrt.cpp)
/TER_CMAKE/Tutorial/step2/MathFunctions/mysqrt.cpp
#include <iostream>
// a hack square root calculation using simple operations
double mysqrt(double x)
{
    if (x <= 0) {
        return 0;
    }
    double result = x;
    // do ten iterations
    for (int i = 0; i < 10; ++i) {
        if (result <= 0) {
            result = 0.1;
        }
        double delta = x - (result * result);
        result = result + 0.5 * delta / result;
        std::cout << "Computing sqrt of " << x << " to be " << result << std::endl;
    }
    return result;
}
If the USE_MYMATH variable is set, add_subdirectory(MathFunctions) is invoked twice. You need to decide on and remove one of the two occurrences, on lines 16 and 19 of your CMakeLists.txt.
Two issues I can see:
You're adding the subdirectory "MathFunctions" twice when you configure the build with -DUSE_MYMATH=ON. This is why you are getting "CMake Error at CMakeLists.txt:19 (add_subdirectory)".
To fix, remove
#add the MathFunctions Library
add_subdirectory(MathFunctions)
and rely on
if(USE_MYMATH)
    add_subdirectory(MathFunctions)
    list(APPEND EXTRA_LIBS MathFunctions)
    list(APPEND EXTRA_INCLUDES "${PROJECT_SOURCE_DIR}/MathFunctions")
endif()
In your CMakeLists.txt file, you are doing
target_include_directories(Tutorial PUBLIC
"${PROJECT_BINARY_DIR}"
${EXTRA_LIBS}
)
Instead of
${EXTRA_LIBS}
it should be
${EXTRA_INCLUDES}
From the CMake Discourse thread "Help with tutorial step 2":
Josef Angstenberger (jtxa) said:
The files in Step3 are the expected result if you do everything from Step2.
Can you please compare your files against the ones from Step3 to see if there are any relevant differences?
Marshallb's solution will solve nahesh relkar's problem.
Loading Step2/CMakeLists.txt and Step3/CMakeLists.txt into vimdiff helped me fix mine.
I'm using CGAL 4.13 (Linux, Fedora 29) to generate 3D meshes from segmented anatomical images. I would like to use Lloyd optimization, but I get a runtime error in a reproducible way.
To illustrate my problem, I modified the example mesh_3D_image.cpp by adding a Lloyd optimization step, as shown below. The program compiles with no error or warning messages.
#include <CGAL/Exact_predicates_inexact_constructions_kernel.h>
#include <CGAL/Mesh_triangulation_3.h>
#include <CGAL/Mesh_complex_3_in_triangulation_3.h>
#include <CGAL/Mesh_criteria_3.h>
#include <CGAL/Labeled_mesh_domain_3.h>
#include <CGAL/make_mesh_3.h>
#include <CGAL/Image_3.h>
#include <fstream>
#include <iostream>
typedef CGAL::Exact_predicates_inexact_constructions_kernel K;
typedef CGAL::Labeled_mesh_domain_3<K> Mesh_domain;
typedef CGAL::Sequential_tag Concurrency_tag;
typedef CGAL::Mesh_triangulation_3<Mesh_domain,CGAL::Default,Concurrency_tag>::type Tr;
typedef CGAL::Mesh_complex_3_in_triangulation_3<Tr> C3t3;
typedef CGAL::Mesh_criteria_3<Tr> Mesh_criteria;
using namespace CGAL::parameters;
int main(int argc, char* argv[])
{
    const char* fname = (argc > 1) ? argv[1] : "data/liver.inr.gz";
    CGAL::Image_3 image;
    if (!image.read(fname)) {
        std::cerr << "Error: Cannot read file " << fname << std::endl;
        return EXIT_FAILURE;
    }
    Mesh_domain domain = Mesh_domain::create_labeled_image_mesh_domain(image);
    Mesh_criteria criteria(facet_angle=30, facet_size=6, facet_distance=4,
                           cell_radius_edge_ratio=3, cell_size=8);
    C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria);
    // !!! THE FOLLOWING LINE MAKES THE PROGRAM CRASH !!!
    CGAL::lloyd_optimize_mesh_3(c3t3, domain, time_limit=30);
    std::ofstream medit_file("out.mesh");
    c3t3.output_to_medit(medit_file);
    return 0;
}
I compile it by using the following CMakeLists.txt file:
# Created by the script cgal_create_CMakeLists
project( executables )
cmake_minimum_required(VERSION 2.8.11)
find_package( CGAL QUIET COMPONENTS )
# !!! I had to add manually the following line !!!
find_package(CGAL COMPONENTS ImageIO)
include( ${CGAL_USE_FILE} )
find_package( Boost REQUIRED )
add_executable( executables lloyd.cpp )
add_to_cached_list( CGAL_EXECUTABLE_TARGETS executables )
target_link_libraries(executables ${CGAL_LIBRARIES} ${CGAL_3RD_PARTY_LIBRARIES} )
No mesh is generated. I obtain the following message:
$ ./build/mesh_3D_image
terminate called after throwing an instance of 'CGAL::Precondition_exception'
what(): CGAL ERROR: precondition violation!
Expr: std::distance(first,last) >= 3
File: /usr/include/CGAL/Mesh_3/Lloyd_move.h
Line: 419
Aborted (core dumped)
Where is my code wrong, and how can I trigger optimizations for meshes generated from 3D images?
Actually, when CGAL::make_mesh_3() is called like this:
C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria);
it internally launches CGAL::perturb_mesh_3() and CGAL::exude_mesh_3(). The latter changes the weights of vertices in the regular triangulation and should always be called last (see the warning in the documentation of CGAL::exude_mesh_3()).
The only limitation on the order is that the exuder should be called last. So you can either call
C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria, lloyd(time_limit=30));
or
C3t3 c3t3 = CGAL::make_mesh_3<C3t3>(domain, criteria, no_exude());
CGAL::lloyd_optimize_mesh_3(c3t3, domain, time_limit = 30);
CGAL::exude_mesh_3(c3t3);
You removed the part:
if(!image.read(fname)){
std::cerr << "Error: Cannot read file " << fname << std::endl;
return EXIT_FAILURE;
}
from the example, which is what actually reads the image from the file.
I have the code shown below. As far as I understood, separable compilation must be turned on when:
CUDA device code is separated into .h and .cu files;
ObjectA's device code is used inside ObjectB's device code.
However, my main function has neither of the cases above. Could you tell me why I have to set separable compilation for this sample project?
BitHelper.h
#pragma once
#include <cuda_runtime.h>
#define COMPILE_TARGET __host__ __device__
class BitHelper
{
public:
    COMPILE_TARGET BitHelper();
    COMPILE_TARGET ~BitHelper();
    COMPILE_TARGET static void clear(unsigned int& val0);
};
BitHelper.cu
#include "bithelper.h"
BitHelper::BitHelper()
{}
BitHelper::~BitHelper()
{}
void BitHelper::clear(unsigned int& val0)
{
    val0 = 0x0000;
}
Consume_BitHelper.h
#pragma once
class Consume_BitHelper
{
public:
    void apply();
private:
    bool test_cpu();
    bool test_gpu();
};
Consume_BitHelper.cu
#include "consume_bithelper.h"
#include <cuda_runtime.h>
#include <iostream>
#include "bithelper.h"
__global__
void myKernel()
{
    unsigned int FLAG_VALUE = 0x2222;
    printf("GPU before: %u\n", FLAG_VALUE);
    BitHelper::clear(FLAG_VALUE);
    printf("GPU after: %u\n", FLAG_VALUE);
}
void Consume_BitHelper::apply()
{
    test_cpu();
    test_gpu();
    cudaDeviceSynchronize();
}
bool Consume_BitHelper::test_cpu()
{
std::cout << "TEST CPU" << std::endl;
unsigned int FLAG_VALUE = 0x1111;
std::cout << "CPU before: " << FLAG_VALUE << std::endl;
BitHelper::clear(FLAG_VALUE);
std::cout << "CPU after : " << FLAG_VALUE << std::endl;
return true;
}
bool Consume_BitHelper::test_gpu()
{
std::cout << "TEST GPU" << std::endl;
myKernel << <1, 1 >> > ();
return true;
}
main.cu
#include "consume_bithelper.h"
#include "bithelper.h"
#include <iostream>
int main(int argc, char** argv)
{
    Consume_BitHelper cbh;
    cbh.apply();
    std::cout << "\nPress any key to continue...";
    std::cin.get();
    return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(cuda_class LANGUAGES CXX CUDA)
#BitHelper needs separable compilation because we have separated declaration from definition
add_library(bithelper_lib STATIC bithelper.cu)
set_property(TARGET bithelper_lib PROPERTY CUDA_SEPARABLE_COMPILATION ON)
#Consume_BitHelper needs separable compilation because we call BitHelper's device code
#from Consume_BitHelper's kernel
add_library(consume_bithelper_lib STATIC consume_bithelper.cu)
set_property(TARGET consume_bithelper_lib PROPERTY CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(consume_bithelper_lib bithelper_lib)
#We only call CPU code so no need of separable compilation?
add_executable(${PROJECT_NAME} main.cu)
target_link_libraries(${PROJECT_NAME} bithelper_lib consume_bithelper_lib)
The errors I'm getting are these
EDIT
According to Robert Crovella's post, Consume_BitHelper.cu uses BitHelper::clear, which is defined in a separate compilation unit.
Does that mean I only have to activate separable compilation for BitHelper, since separable compilation only concerns device code called from device code?
Why am I getting the mentioned errors when separable compilation is NOT on for cuda_class (the executable created by CMake), which does not call any device code?
Separable compilation has to do with how the compiler handles function calls. In exchange for a little overhead, you get the ability to make true function calls and thus access device code from other compilation units (i.e., other .cu source files).
As GPU programmers are obsessed with performance (particularly the extra registers that get used when separable compilation is enabled), NVIDIA made it an option instead of the default.
You should only need separable compilation for .cu files that access functions/globals defined in other .cu files, as in the sketch below.
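As a minimal sketch (hypothetical files a.cu and b.cu, not taken from the question), this is the pattern that requires relocatable device code, i.e. nvcc -rdc=true, or CUDA_SEPARABLE_COMPILATION in CMake:
// a.cu -- defines a __device__ function in its own compilation unit
__device__ unsigned int square(unsigned int x) { return x * x; }
// b.cu -- only declares it; the device-side call across the two
// compilation units is what needs separable compilation at link time
__device__ unsigned int square(unsigned int x);
__global__ void kernel(unsigned int* out) { *out = square(7); }
This is exactly the situation in your code: the kernel in Consume_BitHelper.cu calls BitHelper::clear, whose __device__ definition lives in BitHelper.cu.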
I am compiling the following code with the -ffast-math option:
#include <limits>
#include <cmath>
#include <iostream>
int main() {
    std::cout << std::isnan(std::numeric_limits<double>::quiet_NaN()) << std::endl;
}
I am getting 0 as output. How can my code tell whether a floating point number is NaN when it is compiled with -ffast-math?
Note: On Linux, std::isnan works even with -ffast-math.
Since -ffast-math instructs GCC not to handle NaNs, it is expected that isnan() has undefined behaviour. Returning 0 is therefore valid.
You can use the following fast replacement for isnan():
#if defined __FAST_MATH__
#  undef isnan
#endif
#if !defined isnan
#  define isnan isnan
#  include <stdint.h>
static inline int isnan(float f)
{
    // Reinterpret the float's bits; shifting out the sign bit leaves a
    // value above 0xff000000 exactly when the exponent is all ones and
    // the mantissa is non-zero, i.e. when f is a NaN.
    union { float f; uint32_t x; } u = { f };
    return (u.x << 1) > 0xff000000u;
}
#endif
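A minimal test driver for this replacement (a sketch; quiet_NaN() still yields the NaN bit pattern as a compile-time constant even under -ffast-math):
#include <cstdio>
#include <limits>
// ... isnan() replacement from above ...
int main()
{
    // expected output: 1 0
    std::printf("%d %d\n",
                isnan(std::numeric_limits<float>::quiet_NaN()),
                isnan(1.0f));
    return 0;
}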
On Linux, the GCC flag -ffast-math breaks isnan(), isinf() and isfinite(); there may be other related functions that are also broken that I have not tested.
The trick of wrapping the function/macro in parentheses, i.e. (isnan)(x), also did not work.
Removing -ffast-math works ;-)