Segmentation fault when I use std::packaged_task?

This code compiles successfully but crashes at runtime:
#include <thread>
#include <future>
#include <functional>
#include <memory>
#include <iostream>

int func(int x, int y)
{
    return x + y;
}

int main()
{
    std::packaged_task<int()> task(std::bind(func, 2, 11));
    std::future<int> ret = task.get_future();
    task();
    std::cout << ret.get() << std::endl;
}
Segfault backtrace:
#1 0x56cdb624 in std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool) (this=0x578a814c, __res=..., __ignore_failure=false) at /opt/buildtools/gcc-7.3.0/include/c++/7.3.0/future:401
#2 0x56cde1bc in std::__future_base::_Task_state<std::_Bind<int (*(int, int))(int, int)>, std::allocator<int>, int ()>::_M_run() (this=0x578a814c)
at /opt/buildtools/gcc-7.3.0/include/c++/7.3.0/future:1423
#3 0x56cdc239 in std::packaged_task<int ()>::operator()() (this=0xffffc154) at /opt/buildtools/gcc-7.3.0/include/c++/7.3.0/future:1556
#4 0x56cdb072 in test () at /home/builduser/code/sourcecode/otn_xcs_board/bd_otn_test/code/test_frame/test_main.cpp:22
#5 0x56cdb11d in main (argc=1, argv=0xffffc244) at /home/builduser/code/sourcecode/otn_xcs_board/bd_otn_test/code/test_frame/test_main.cpp:28
I tried running the same code in other environments (Visual Studio 2022, CLion) and everything works fine.
But the same code gets a segmentation fault on Linux. Why?
Version:
g++ (GCC) 7.3.0
C++17
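
For reference, here is a minimal variant of the same API that runs the task on its own thread. Note that GCC on Linux needs -pthread whenever <thread> or <future> is used (e.g. g++ -std=c++17 -pthread test.cpp); building without it is a known source of misbehaviour, including crashes, inside the future machinery, so this sketch assumes that flag is present:
#include <future>
#include <functional>
#include <iostream>
#include <thread>
#include <utility>

int func(int x, int y)
{
    return x + y;
}

int main()
{
    std::packaged_task<int()> task(std::bind(func, 2, 11));
    std::future<int> ret = task.get_future();
    std::thread t(std::move(task));      // packaged_task is move-only
    std::cout << ret.get() << std::endl; // blocks until the task has run
    t.join();
    return 0;
}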

Related

Is there a working example of yaml-cpp for Visual Studio?

I've been looking for more than just documentation (example code and a YAML file) to be able to get going quickly with Yaml-Cpp. I'd prefer a Visual Studio (2019) solution if someone can point me to it.
One solution, if you are using CMake: install vcpkg as documented, then run vcpkg install yaml-cpp.
Afterwards you can add this to your CMakeLists.txt:
find_package(yaml-cpp CONFIG REQUIRED)
### your other config here ...
### ...
target_link_libraries(main PRIVATE yaml-cpp) #main is your executable target's name
After that, something like this in your main.cpp should be enough:
#pragma warning(push)
#pragma warning(disable: 4996)
#pragma warning(disable: 4251)
#pragma warning(disable: 4275)
#include <yaml-cpp/yaml.h>
#pragma warning(pop)

#include <fstream>
#include <filesystem> // C++17, used here just as an example to save a YAML file
namespace fs = std::filesystem;

int main() {
    fs::path pf = "file.yaml"; // you can use std::string as well
    std::ofstream wf(pf, std::ios::binary | std::ios::out);
    if (!wf.is_open()) {
        return 1;
    }

    int values[] = {1, 2, 3, 4};

    // Emit a map containing one key holding a sequence of values
    YAML::Emitter out(wf);
    out << YAML::BeginMap;
    out << YAML::Key << "MyKey" << YAML::Value << YAML::BeginSeq;
    for (auto& v : values) {
        out << v;
    }
    out << YAML::EndSeq << YAML::EndMap;

    wf.close();
    if (!wf.good()) {
        return 1;
    }
    return 0;
}
Note: the #pragma warning(...) directives just disable some MSVC warnings, so they are optional.
This should be enough for a quickstart example.
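
To round the example out, here is a small sketch of reading that file back, assuming the same file.yaml and MyKey written above:
#include <yaml-cpp/yaml.h>
#include <iostream>

int main() {
    // Load the document written by the emitter above
    YAML::Node doc = YAML::LoadFile("file.yaml");

    // "MyKey" holds a sequence; convert each element back to int
    for (const auto& item : doc["MyKey"]) {
        std::cout << item.as<int>() << "\n";
    }
    return 0;
}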

C++17 refuses to compile example if constexpr giving expected ‘(’ before ‘constexpr’

I am trying out this textbook example of if constexpr and I am getting the error expected ‘(’ before ‘constexpr’ when compiling.
I am compiling with g++ -std=c++17 test.cpp, so the compiler should support it. Visual Studio Code understands the code and hints that the expression will compile to the number 120 (correct).
#include <iostream>
using std::cout;
using std::endl;

template <int N>
constexpr int fun() {
    if constexpr (N <= 1) {
        return 1;
    } else {
        return N * fun<N - 1>();
    }
}

int main(int argc, char** argv) {
    cout << fun<5>() << endl;
    return 0;
}
This code should compile error-free
You need a more recent version of GCC. Version 7 and up support this. See:
https://en.cppreference.com/w/cpp/compiler_support#cpp17
(Search for "constexpr if".)
So upgrade your GCC version. If you're on Ubuntu, you can add the Toolchain PPA to install the latest available GCC version:
https://launchpad.net/~ubuntu-toolchain-r/+archive/ubuntu/test
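
If upgrading is not an option, here is a sketch of the same computation without if constexpr, using an explicit specialization as the recursion base case (unlike the original it does not guard N <= 0; that simplification is an assumption of the sketch):
#include <iostream>

// Recursive case: plain constexpr, no if constexpr required
template <int N>
constexpr int fun() {
    return N * fun<N - 1>();
}

// Base case as an explicit specialization (stops the recursion at N == 1)
template <>
constexpr int fun<1>() {
    return 1;
}

int main() {
    std::cout << fun<5>() << std::endl; // prints 120
    return 0;
}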

cooperative_groups::this_grid() causes any CUDA API call to return 'unknown error'

Following the same steps as the CUDA samples to launch a kernel and sync across the grid, using cooperative_groups::this_grid().sync() causes every CUDA API call to fail, while using cooperative_groups::this_thread_block().sync() works fine and gives correct results.
I used the following code and CMakeLists.txt (CMake version 3.11.1) to test it using CUDA 10 on a TITAN V GPU (driver version 410.73) with Ubuntu 16.04.5 LTS. The code is also available on GitHub to make the error easy to reproduce.
The code reads an array and then reverses it (from [0 1 2 ... 9] to [9 8 7 ... 0]). To do this, each thread reads a single element from the array, syncs, and then writes its element to the right destination. The code can easily be modified to confirm that this_thread_block().sync() works fine: simply change arr_size to be less than 1024 and use cg::thread_block barrier = cg::this_thread_block(); instead.
test_cg.cu
#include <cuda_runtime_api.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <cstdint>
#include <numeric>
#include <cuda.h>
#include <cooperative_groups.h>
namespace cg = cooperative_groups;
//********************** CUDA_ERROR
inline void HandleError(cudaError_t err, const char *file, int line) {
    //Error handling macro, wrap it around a function call whenever possible
    if (err != cudaSuccess) {
        printf("\n%s in %s at line %d\n", cudaGetErrorString(err), file, line);
#ifdef _WIN32
        system("pause");
#else
        exit(EXIT_FAILURE);
#endif
    }
}
#define CUDA_ERROR( err ) (HandleError( err, __FILE__, __LINE__ ))
//******************************************************************************
//********************** cg kernel
__global__ void testing_cg_grid_sync(const uint32_t num_elements,
                                     uint32_t *d_arr){
    uint32_t tid = threadIdx.x + blockDim.x*blockIdx.x;
    if (tid < num_elements){
        uint32_t my_element = d_arr[tid];

        //to sync across the whole grid
        cg::grid_group barrier = cg::this_grid();

        //to sync within a single block
        //cg::thread_block barrier = cg::this_thread_block();

        //wait for all reads
        barrier.sync();

        uint32_t tar_id = num_elements - tid - 1;
        d_arr[tar_id] = my_element;
    }
}
//******************************************************************************
//********************** execute
void execute_test(const int sm_count){

    //host array
    const uint32_t arr_size = 1 << 20; //1M
    uint32_t* h_arr = (uint32_t*)malloc(arr_size * sizeof(uint32_t));

    //fill with sequential numbers
    std::iota(h_arr, h_arr + arr_size, 0);

    //device array
    uint32_t* d_arr;
    CUDA_ERROR(cudaMalloc((void**)&d_arr, arr_size*sizeof(uint32_t)));
    CUDA_ERROR(cudaMemcpy(d_arr, h_arr, arr_size*sizeof(uint32_t),
        cudaMemcpyHostToDevice));

    //launch config
    const int threads = 512;

    //following the same steps done in conjugateGradientMultiBlockCG.cu
    //cuda sample to launch kernel that sync across grid
    //https://github.com/NVIDIA/cuda-samples/blob/master/Samples/conjugateGradientMultiBlockCG/conjugateGradientMultiBlockCG.cu#L436
    int num_blocks_per_sm = 0;
    CUDA_ERROR(cudaOccupancyMaxActiveBlocksPerMultiprocessor(&num_blocks_per_sm,
        (void*)testing_cg_grid_sync, threads, 0));
    dim3 grid_dim(sm_count * num_blocks_per_sm, 1, 1), block_dim(threads, 1, 1);

    if(arr_size > grid_dim.x*block_dim.x){
        printf("\n The grid size (numBlocks*numThreads) is less than array size.\n");
        exit(EXIT_FAILURE);
    }
    printf("\n Launching %d blocks, each containing %d threads", grid_dim.x,
        block_dim.x);

    //argument passed to the kernel
    void *kernel_args[] = {
        (void *)&arr_size,
        (void *)&d_arr, };

    //finally launch the kernel
    cudaLaunchCooperativeKernel((void*)testing_cg_grid_sync,
        grid_dim, block_dim, kernel_args);

    //make sure everything went okay
    CUDA_ERROR(cudaGetLastError());
    CUDA_ERROR(cudaDeviceSynchronize());

    //get results on the host
    CUDA_ERROR(cudaMemcpy(h_arr, d_arr, arr_size*sizeof(uint32_t),
        cudaMemcpyDeviceToHost));

    //validate
    for (uint32_t i = 0; i < arr_size; i++){
        if (h_arr[i] != arr_size - i - 1){
            printf("\n Result mismatch in h_arr[%u] = %u\n", i, h_arr[i]);
            exit(EXIT_FAILURE);
        }
    }
}
//******************************************************************************
int main(int argc, char**argv) {
    //set to Titan V
    uint32_t device_id = 0;
    cudaSetDevice(device_id);

    //get sm count
    cudaDeviceProp devProp;
    CUDA_ERROR(cudaGetDeviceProperties(&devProp, device_id));
    int sm_count = devProp.multiProcessorCount;

    //execute
    execute_test(sm_count);

    printf("\n Mission accomplished \n");
    return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.8 FATAL_ERROR)
set(PROJECT_NAME "test_cg")
project(${PROJECT_NAME} LANGUAGES CXX CUDA)
#default build type is Release
if (CMAKE_BUILD_TYPE STREQUAL "")
    set(CMAKE_BUILD_TYPE Release)
endif ()
SET(CUDA_SEPARABLE_COMPILATION ON)
########## Libraries/flags Starts Here ######################
find_package(CUDA REQUIRED)
include_directories("${CUDA_INCLUDE_DIRS}")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS}; -lineinfo; -std=c++11; -expt-extended-lambda; -O3; -use_fast_math; -rdc=true;)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-gencode=arch=compute_70,code=sm_70) #for TITAN V
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -m64 -Wall -std=c++11")
########## Libraries/flags Ends Here ######################
########## inc/libs/exe/features Starts Here ######################
set(CMAKE_INCLUDE_CURRENT_DIR ON)
CUDA_ADD_EXECUTABLE(${PROJECT_NAME} test_cg.cu)
target_compile_features(${PROJECT_NAME} PUBLIC cxx_std_11)
set_target_properties(${PROJECT_NAME} PROPERTIES POSITION_INDEPENDENT_CODE ON)
set_target_properties(${PROJECT_NAME} PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(${PROJECT_NAME} ${CUDA_LIBRARIES} ${CUDA_cudadevrt_LIBRARY})
########## inc/libs/exe/features Ends Here ######################
Running this code gives:
unknown error in /home/ahdhn/test_cg/test_cg.cu at line 67
This is the first line that uses cudaMalloc. I made sure the code is compiled for the correct architecture by querying __CUDA_ARCH__ from the device; the result is 700. Kindly let me know if you spot me doing something wrong in the code or in the CMakeLists.txt file.
With external help, the solution that got the code working is to add string(APPEND CMAKE_CUDA_FLAGS " -gencode arch=compute_70,code=sm_70 --cudart shared") after the second set(CUDA_NVCC_FLAGS ...). The --cudart shared part is needed because I only have libcudadevrt.a under /usr/local/cuda-10.0/lib64/, so CUDA has to be told to link against the shared/dynamic runtime library (the default is to link statically). The -gencode part is needed because the sm_70 flag was not being passed to the linker properly.
Additionally, using only CUDA_NVCC_FLAGS passes the sm_70 info to the compiler but not the linker, while using only CMAKE_CUDA_FLAGS reports the error: namespace "cooperative_groups" has no member "grid_group".
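
As a side check, the runtime can also report whether the device supports cooperative launches at all; here is a minimal sketch (CUDA 9+ attribute query, device 0 assumed):
#include <cuda_runtime_api.h>
#include <stdio.h>

int main() {
    int device_id = 0, supported = 0;
    cudaSetDevice(device_id);
    // Returns 1 if cudaLaunchCooperativeKernel (and so grid-wide sync) is usable
    cudaDeviceGetAttribute(&supported, cudaDevAttrCooperativeLaunch, device_id);
    printf("cooperative launch supported: %d\n", supported);
    return supported ? 0 : 1;
}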

Why do I need separable compilation?

I have the code shown below. As far as I understood, separable compilation must be turned on when:
CUDA device code is separated into .h and .cu files
one object's device code is used from another object's device code
However, my main function has neither of the cases above. Could you tell me why I have to set separable compilation for this sample project?
BitHelper.h
#pragma once
#include <cuda_runtime.h>

#define COMPILE_TARGET __host__ __device__

class BitHelper
{
public:
    COMPILE_TARGET BitHelper();
    COMPILE_TARGET ~BitHelper();

    COMPILE_TARGET static void clear(unsigned int& val0);
};
BitHelper.cu
#include "bithelper.h"
BitHelper::BitHelper()
{}
BitHelper::~BitHelper()
{}
void BitHelper::clear(unsigned int& val0)
{
val0 = 0x0000;
}
Consume_BitHelper.h
#pragma once

class Consume_BitHelper
{
public:
    void apply();

private:
    bool test_cpu();
    bool test_gpu();
};
Consume_BitHelper.cu
#include "consume_bithelper.h"
#include <cuda_runtime.h>
#include <iostream>
#include "bithelper.h"
__global__
void myKernel()
{
unsigned int FLAG_VALUE = 0x2222;
printf("GPU before: %d\n", FLAG_VALUE);
BitHelper::clear(FLAG_VALUE);
printf("GPU after: %d\n", FLAG_VALUE);
}
void Consume_BitHelper::apply()
{
test_cpu();
test_gpu();
cudaDeviceSynchronize();
}
bool Consume_BitHelper::test_cpu()
{
std::cout << "TEST CPU" << std::endl;
unsigned int FLAG_VALUE = 0x1111;
std::cout << "CPU before: " << FLAG_VALUE << std::endl;
BitHelper::clear(FLAG_VALUE);
std::cout << "CPU after : " << FLAG_VALUE << std::endl;
return true;
}
bool Consume_BitHelper::test_gpu()
{
std::cout << "TEST GPU" << std::endl;
myKernel << <1, 1 >> > ();
return true;
}
main.cu
#include "consume_bithelper.h"
#include "bithelper.h"
#include <iostream>
int main(int argc, char** argv)
{
Consume_BitHelper cbh;
cbh.apply();
std::cout << "\nPress any key to continue...";
std::cin.get();
return 0;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.10)
project(cuda_class LANGUAGES CXX CUDA)
#BitHelper needs separable compilation because we have separated declaration from definition
add_library(bithelper_lib STATIC bithelper.cu)
set_property(TARGET bithelper_lib PROPERTY CUDA_SEPARABLE_COMPILATION ON)
#Consume_BitHelper needs separable compilation because we call BitHelper's device code
#from Consume_BitHelper's kernel
add_library(consume_bithelper_lib STATIC consume_bithelper.cu)
set_property(TARGET consume_bithelper_lib PROPERTY CUDA_SEPARABLE_COMPILATION ON)
target_link_libraries(consume_bithelper_lib bithelper_lib)
#We only call CPU code so no need of separable compilation?
add_executable(${PROJECT_NAME} main.cu)
target_link_libraries(${PROJECT_NAME} bithelper_lib consume_bithelper_lib)
The errors I'm getting are shown in a screenshot in the original post.
EDIT
According to Robert Crovella's post, Consume_BitHelper.cu uses BitHelper::clear, which is defined in a separate compilation unit.
Does that mean I only have to activate separable compilation for BitHelper, since separable compilation only concerns device code called from device code?
Why am I getting the mentioned errors when separable compilation is NOT on for cuda_class (the executable created by CMake, which does not call any device code)?
Separable compilation has to do with how the compiler handles function calls. In exchange for a little overhead, you get the ability to make true function calls and thus access code from other "compilation units" (i.e. .cu source files).
As GPU programmers are obsessed with performance (particularly the extra registers that get used when separable compilation is enabled), NVIDIA made it an option instead of the default.
You should only need separable compilation for .cu files that access functions/globals defined in other .cu files.
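
To make that rule concrete, here is a minimal two-file sketch (file and function names invented for illustration) of the one pattern that genuinely requires it:
// helper.cu -- defines a __device__ function in its own translation unit
__device__ void dev_clear(unsigned int& val) { val = 0u; }

// kernel.cu -- calls that function from a different translation unit
__device__ void dev_clear(unsigned int& val); // declaration only

__global__ void myKernel(unsigned int* out)
{
    unsigned int v = *out;
    dev_clear(v); // cross-file device call: resolved at device link time
    *out = v;
}

// Build sketch:
//   nvcc -rdc=true -c helper.cu kernel.cu
//   nvcc -rdc=true helper.o kernel.o
// Without -rdc=true the device linker cannot resolve dev_clear across files.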

Mingw32 std::isnan with -ffast-math

I am compiling the following code with the -ffast-math option:
#include <limits>
#include <cmath>
#include <iostream>

int main() {
    std::cout << std::isnan(std::numeric_limits<double>::quiet_NaN()) << std::endl;
}
I am getting 0 as output. How can my code tell whether a floating point number is NaN when it is compiled with -ffast-math?
Note: On linux, std::isnan works even with -ffast-math.
Since -ffast-math instructs GCC not to handle NaNs, it is expected that isnan() has undefined behaviour. Returning 0 is therefore valid.
You can use the following fast replacement for isnan():
#if defined __FAST_MATH__
#   undef isnan
#endif
#if !defined isnan
#   define isnan isnan
#   include <stdint.h>
static inline int isnan(float f)
{
    // Reinterpret the float's bits; shifting out the sign bit leaves
    // exponent+mantissa, which exceeds 0xff000000 only for NaNs.
    union { float f; uint32_t x; } u = { f };
    return (u.x << 1) > 0xff000000u;
}
#endif
On Linux, the GCC flag -ffast-math breaks isnan(), isinf() and isfinite(); there may be other related functions that are also broken that I have not tested.
The trick of wrapping the function/macro in parentheses also did not work (i.e. (isnan)(x)).
Removing -ffast-math works ;-)
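
For C++, here is a sketch of the same bit test using memcpy instead of the union (union type-punning is technically undefined behaviour in C++, while memcpy is well-defined):
#include <cstdint>
#include <cstring>
#include <iostream>
#include <limits>

// Same trick as the union version: shift out the sign bit and compare
// against the all-ones exponent pattern; only NaNs exceed it.
static inline bool isnan_bits(float f)
{
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    return (x << 1) > 0xff000000u;
}

int main()
{
    std::cout << isnan_bits(std::numeric_limits<float>::quiet_NaN()) << std::endl; // 1
}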