CUDA Separable Compilation with CMake, invalid device function

CUDA Separable Compilation with CMake, invalid device function - cmake

I am developing a C++ application with cmake as the build system. Each component in the application builds into a static library, which the executable links to.
I am trying to link in some cuda code that is built as a separate static library, also with cmake. When I attempt to invoke the global function entry point in the cuda static library from the main application, everything seems to work fine - the cudaDeviceSynchronize that follows my global function invocation returns 0. However, the output of the kernel is not set and the call returns immediately.
I ran cuda-gdb. Despite the code being compiled with -g and -G, I was not able to break within the device function called by the kernel. So, I ran cuda-memcheck. When the kernel is launched, this message appears:
========= Program hit cudaErrorInvalidDeviceFunction (error 8) due to "invalid device function" on CUDA API call to cudaLaunchKernel.
I looked this up, and the NVIDIA docs/forum posts I read suggested this is usually due to compiling for the wrong compute capability. However, I'm running Titan V's, and the CC is correctly set to 7.0 when compiling.
I have set CUDA_SEPARABLE_COMPILATION on both the cuda library and the component in the main application that the cuda code links to per https://devblogs.nvidia.com/building-cuda-applications-cmake/. I've also tried setting CUDA_RESOLVE_DEVICE_SYMBOLS.
Here is the relevant portion of the cmake for the main application:
(kronmult_cuda is the component in the main application that links to the cuda library ${KRONLIB}. another component, kronmult, links to kronmult_cuda. Eventually, something that links to kronmult is linked to the main application).
find_package(CUDA 9.0 REQUIRED)
include_directories(${CUDA_INCLUDE_DIRS})
enable_language(CUDA)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -arch sm_70 -g --ptxas-options=-O3")
set_source_files_properties( src/kronmult_cuda.cpp PROPERTIES LANGUAGE CUDA ) # no .cu extension
...
target_include_directories(kronmult_cuda PRIVATE ${KRON_PATH})
target_link_libraries(kronmult_cuda PRIVATE OpenMP::OpenMP_CXX PUBLIC ${KRON_LIB})
if (ASGARD_USE_CUDA)
set_target_properties(kronmult_cuda
PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
endif()
if(APPLE AND ASGARD_USE_GPU)
set_target_properties(kronmult_cuda
PROPERTIES
BUILD_RPATH ${CMAKE_CUDA_IMPLICIT_LINK_DIRECTORIES})
endif ()
target_link_libraries(kronmult PRIVATE kronmult_cuda)
...
Full CMakeLists: https://github.com/bmcdanie/ASGarD/blob/feature/kronmult/CMakeLists.txt.
relevant CMakeLists portion for cuda library:
project(kronmult LANGUAGES CXX CUDA)
set(KRONSRC
[list of all sources]
)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -arch sm_70 -g --ptxas-options=-O3")
set_source_files_properties( ${KRONSRC} PROPERTIES LANGUAGE CUDA )
add_library(kron STATIC ${KRONSRC})
target_compile_features(kron PUBLIC cxx_std_11)
set_target_properties( kron
PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
Full CmakeLists: https://github.com/project-asgard/kronmult/blob/master/CMakeLists.txt.
What am I missing here?
EDIT:
Output of cuda-gdb when I attempt to call kernel:
Thread 1 "asgard" hit Breakpoint 1, kronmult2_xbatched<double> (n=2, Aarray_=0x15551fa24800, lda=8, pX_=0x15551fa23c00, pY_=0x15551fa24400, pW_=0x15551fa24000, batchCount=128)
at /home/3bm/asgard/contrib/kronmult/src/kronmult-ext/kronmult2_xbatched.hpp:36
36 {
(cuda-gdb) step
__wrapper__device_stub_kronmult2_xbatched<double> (__cuda_0=#0x7fffffff9e1c: 2, __cuda_1=0x15551fa24800, __cuda_2=#0x7fffffff9e18: 8, __cuda_3=0x15551fa23c00,
__cuda_4=0x15551fa24400, __cuda_5=0x15551fa24000, __cuda_6=#0x7fffffff9e30: 128) at /tmp/tmpxft_0000ac33_00000000-5_kronmult_cuda.cudafe1.stub.c:40
40 /tmp/tmpxft_0000ac33_00000000-5_kronmult_cuda.cudafe1.stub.c: No such file or directory.
(cuda-gdb) step
__device_stub__Z18kronmult2_xbatchedIdEviPKPKT_iPPS0_S6_S6_i (__par0=2, __par1=0x15551fa24800, __par2=8, __par3=0x15551fa23c00, __par4=0x15551fa24400, __par5=0x15551fa24000,
__par6=128) at /tmp/tmpxft_0000ac33_00000000-5_kronmult_cuda.cudafe1.stub.c:39
39 in /tmp/tmpxft_0000ac33_00000000-5_kronmult_cuda.cudafe1.stub.c
(cuda-gdb) step
dim3::dim3 (this=0x7fffffff9d28, vx=1, vy=1, vz=1)
at /home/dg6/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.3.0/cuda-10.0.130-s6ervywpchxmerrju62il7xkeeamlfcv/include/vector_types.h:420
420 __host__ __device__ dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1) : x(vx), y(vy), z(vz) {}
(cuda-gdb) step
dim3::dim3 (this=0x7fffffff9d34, vx=1, vy=1, vz=1)
at /home/dg6/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.3.0/cuda-10.0.130-s6ervywpchxmerrju62il7xkeeamlfcv/include/vector_types.h:420
420 __host__ __device__ dim3(unsigned int vx = 1, unsigned int vy = 1, unsigned int vz = 1) : x(vx), y(vy), z(vz) {}
(cuda-gdb) step
cudaLaunchKernel<char> (
func=0x5555555f94c0 <kronmult2_xbatched<double>(int, double const* const*, int, double**, double**, double**, int)> "UH\211\345H\203\354\060\211}\374H\211u\360\211U\370H\211M\350L\211E\340L\211M\330L\213E\330H\213}\340H\213M\350H\215U\370H\213u\360H\215E\374H\203\354\bL\215M\020AQM\211\301I\211\370H\211\307\350\355\343\377\377H\203\304\020\220\311\303UH\211\345H\203\354\060\211}\374H\211u\360\211U\370H\211M\350L\211E\340L\211M\330L\213E\330H\213}\340H\213M\350H\215U\370H\213u\360H\215E\374H\203\354\bL\215M\020AQM\211\301I\211\370H\211\307\350\267\345\377\377H\203\304\020\220\311\303UH\211\345H\203\354\060\211}\374H\211u\360\211U\370H\211M\350L\211E\340L\211", <incomplete sequence \330>..., gridDim=..., blockDim=...,
args=0x7fffffff9d40, sharedMem=0, stream=0x0)
at /home/dg6/spack/opt/spack/linux-ubuntu18.04-x86_64/gcc-7.3.0/cuda-10.0.130-s6ervywpchxmerrju62il7xkeeamlfcv/bin/..//include/cuda_runtime.h:202
202 return ::cudaLaunchKernel((const void *)func, gridDim, blockDim, args, sharedMem, stream);
(cuda-gdb) step
warning: Cuda API error detected: cudaLaunchKernel returned (0x8)

After the helpful hint from #talonmies, I suspected this was a device linking problem. I simplified my build process, included all CUDA files in one translation unit, and turned off SEPARABLE COMPILATION.
Still, I did not see a cmake_device_link.o in either my main application binary or the component that called into my cuda library. And, still had the same error. Tried setting CUDA_RESOLVE_DEVICE_SYMBOLS to no effect.
Finally, I tried building the component that calls into my cuda library as SHARED. I saw the device linking step when building the .so in my cmake output, and the program runs fine. I do not know why building SHARED fixes what I suspect was a device linking problem - will accept any answer that deciphers that?

Related

At which address is qemu expecting to find the image?

I'm working with the qemu riscv32 emulator. I have managed to boot a simple hello-world image I have got from github, however I haven't managed to boot my own image. I suspect this is because I built my image without a linker script, therefore it is being loaded at the wrong address. I'm trying to understand how the qemu boot sequence works to fix this.
This is the linker script I'm using
OUTPUT_ARCH( "riscv" )
OUTPUT_FORMAT("elf32-littleriscv")
ENTRY( _start )
SECTIONS
{
/* text: test code section */
. = 0x20400000;
.text : { *(.text) }
/* gnu_build_id: readonly build identifier */
.gnu_build_id : { *(.note.gnu.build-id) }
/* rodata: readonly data segment */
.rodata : { *(.rodata) }
/* data: Initialized data segment */
. = 0x80000000;
.data : { *(.data) }
.sdata : { *(.sdata) }
.debug : { *(.debug) }
. += 0x1000;
stack_top = .;
/* End of uninitalized data segement */
_end = .;
}
And this is the qemu command I'm executing:
qemu-system-riscv32 -nographic -machine sifive_e -bios none -kernel hello
# with -s -S when debugging
The source code is not very relevant, it is just a small assembly file that writes "hello".
My main question is:
How can I know at which address is qemu expecting to find the image?
Other questions I would like to answer:
With gdb, I have noticed that qemu starts executing at address 0x1004 (before me doing anything). I was expecting it to be 0x0. Why is this?
I have read hat qemu can use U-boot. Does it use it, or any other bootloader, by default?
If so, is there any way to load an image at address 0x0 without any sort of bootloader intervening? (I ask this for debugging purposes, because the first time you try a new arch. possibly yo want to keep everything as simple as possible)
Does the kernel option just load the provided image, or does it something more? (like loading a Linux kernel and execute the provided image on top of it)
I'm using the sifive_e emulator, therefore I have gone to the SiFive E series datasheet (like this one ) to check the memory map, and find the starting address. This is what I have found:
Those address are very different from those specified in the linker script above. It seems I'm looking at the wrong place, where can I found the SiFive E boot address?
EDIT
With regards to the last question about the memory map, I found the answer. It is explained here (5.16) and here (chapter 6)

debugging vxworks loadModule failure

I have a VxWorks Image Project project without a File-System on MPC5200B, using DIAB tool-chain.
I need to dynamically load a module from flash.
I allocated memory on my stack char myTemporaryModuleData[MAX_MODULE_SIZE]
and filled it with the module data from Flash.
(checked that the binary data is intact)
then i create a memDevice('/mem/mem01', myTemporaryModuleData, moduleReadLength)
open the psuedo-stream int fdModuleData = open("/mem/mem01", O_RDONLY, 777);
when i run int mId = loadModule(fdModuleData, LOAD_ALL_SYMBOLS);
did not see anything in the console after running loadModule();
but mId = 0 which indicates failure :(.
getErrno() returned 0x3D0004 (S_objLib_OBJ_TIMEOUT)
NOTE: it didn't take long at all to fail => timeout?
i tried replacing the module with a simple void foo() { printf(...); } module but still failes with same issue.
tried loading an .out instead of .o
unfortunately, nothing got me nowhere,
How can i know what caused it to fail? (log, last_error, anything i should check?)

FOUND IT.
Apparently, it was a mistake in the data read from the flash.
What I can contribute is that 'loadModule()' from memDrv device is possible and working.

wx Widgets multiple errors in Windows 10

I have downloaded wxWidgets-3.0.2 and am trying to create a simple FRAME from the main program.
Unfortunately I am receiving multiple errors that are all to do with wxWidgets, NOT my code. This is odd as I am lead to believe wxWidgets should work.
Here are some of the errors I am getting:
*\msw\chkconf.h(19): Error! E080: col(10) "wxUSE_ACTIVEX must be defined."
\msw\chkconf.h(394): Error! E080: col(13) "wxUSE_DATAOBJ requires wxUSE_OLE"
\msw\chkconf.h(414): Error! E080: col(13) "wxMediaCtl requires wxActiveXContainer"
\chkconf.h(1630): Error! E080: col(13) "wxRearrangeCtrl requires wxCheckListBox"
\vector.h(197): Error! E148: col(71) access to private member 'reverse_iterator::m_ptr' is not allowed
\vector.h(187): Note! N392: col(21) definition: 'wxToolTip * * wxVector<wxToolTip *>::reverse_iterator::m_ptr'*
Why am I receiving these messages when wxWidgets is supposed to be ready to go?

Have you configured properly? If not try this link to configure on Windows. Also, you can try to use Linux. It is much easier to set up WxWidgets.

High level steps to installing wxWidgets:
Download ( you appear to have done this )
Build library ( I cannot tell if you have done this )
-OR-
Download binary built libraries if available for your compiler ( what compiler are you using - you haven't told us! )
Build one or two of the sample programs ( It really does not look like you have done this )
NOT UNTIL you have completed all these steps are you "ready to go"

Mounting a network volume from OS X App

I'm trying to mount a network volume in a OS X App.
I get it to work using the FSMountServerVolume function which is deprecated. The documentation says "To mount a network volume, use NetFSMountURLAsync instead". But when I try to use this function I get the following error message:
dyld: lazy symbol binding failed: Symbol not found: _NetFSMountURLSync
Referenced from: /Users/username/Library/Developer/Xcode/DerivedData/AppName-ammmlfwhvlfxkdburfmzioformdn/Build/Products/Debug/AppName.app/Contents/MacOS/AppName
Expected in: /System/Library/Frameworks/NetFS.framework/Versions/A/NetFS
dyld: Symbol not found: _NetFSMountURLSync
Referenced from: /Users/username/Library/Developer/Xcode/DerivedData/AppName-ammmlfwhvlfxkdburfmzioformdn/Build/Products/Debug/AppName.app/Contents/MacOS/AppName
Expected in: /System/Library/Frameworks/NetFS.framework/Versions/A/NetFS
Did I forget anything? I imported the NetFS Framework.

OK, it looks like NetFSMountURLSync() etc where introduced in 10.8.
From NetFS Changes:
Added AsyncRequestID
Added NetFSMountURLAsync()
Added NetFSMountURLBlock
Added NetFSMountURLCancel()
Added NetFSMountURLSync()
Added #def kNAUIOptionAllowUI
Added #def kNAUIOptionForceUI
Added #def kNAUIOptionKey
Added #def kNAUIOptionNoUI
Added #def kNetFSMountAtMountDirKey
Therefore you are going to have to use the "old way" in 10.7 and below and the "new way" in 10.8 and above. This means making the NetFS.framework optional rather than required and the need to perform various runtime checks to see which API you need to use.

windbg - module not loading "dll not found in the image list"

I am trying to get a proper callstack for an unhandled exception in my VS2010 .net4 application using windbg.
The main program is a console application. This dll loads with it's symbols properly.
In the same directory, I have a dll+matching pdb which won't load.
I am running the app on a windows server 2008 R2, 64 bit (no VS installed). But the app was compiled on 32bit. I am using the winX86 debugger to attach to the process.
I have downloaded sosex that supports .net 4 from
http://www.stevestechspot.com/ (the 32 bit version)
the sos.dll (from "C:\Program Files (x86)\Debugging Tools for Windows (x86)\clr10\sos.dll") version is 6.12.2.633.
Issued the following commands:
sympath+ "...folder of exe and dll"
.loadby sos clr
.load sosex.dll
When running !mk I get the following:
Thread 0:
ESP EIP
00:U 0016ec6c 5f636578 0x5f636578
01:U 0016ec70 05f1380f SN!SN_MedistoreEngine::FetchNotes+0x13f [v:\mp\mp\src\sn\sn_medistoreengine.cpp # 59]
02:M 0016edbc 07dd0db1 SN_Bridge.SNB_bridge.fetch_notes(IOD_Msg*, System.String, System.String, Boolean)(+0x4a IL)(+0x131 Native) [v:\mp\portal\commonutils\src\snb\snb_bridge.cpp, # 142,0]
03:M 0016ef24 055bfde5 SN_Bridge.SNB_bridge.FetchNotesByStudyUId(System.String, System.String, System.Collections.Generic.List`1<SN_Bridge.SNB_StickyNote>)(+0x25 IL)(+0x55 Native) [v:\mp\portal\commonutils\src\snb\snb_bridge.cpp, # 296,0]
04:M 0016efb0 055bfd22
"xxx.Services.StickyNotes.dll" was not found in the image list.
Debugger will attempt to load "xxx.Services.StickyNotes.dll" at given base 00000000.
Please provide the full image name, including the extension (i.e. kernel32.dll)
for more reliable results.Base address and size overrides can be given as
.reload <image.ext>=<base>,<size>.
Unable to add module at 00000000
"xxx.ni.Services.StickyNotes.dll" was not found in the image list.
Debugger will attempt to load "xxx.ni.Services.StickyNotes.dll" at given base 00000000.
Please provide the full image name, including the extension (i.e. kernel32.dll)
for more reliable results.Base address and size overrides can be given as
.reload <image.ext>=<base>,<size>.
Unable to add module at 00000000
xxx.Services.StickyNotes.StickyNotesLogic.StickyNotesByStudyID(System.String, System.String, System.String)(+0x1d IL)(+0x52 Native)
05:M 0016efcc 055ba0ab SNConsole.Program.Main(System.String[])(+0x101 IL)(+0x24b Native) [D:\Documents and Settings\tamar\My Documents\Visual Studio 2010\Projects\SNConsole\Program.cs, # 48,17]
06:U 0016f02c 72da21db clr+0x21db
07:U 0016f034 72dae021 clr!DllUnregisterServerInternal+0x8025
08:U 0016f090 72dbc58d clr!DllUnregisterServerInternal+0x16591
When running !clrstack I get the following:
PDB symbol for clr.dll not loaded
OS Thread Id: 0x2984 (0)
Child SP IP Call Site
0016edc8 5f636578 [InlinedCallFrame: 0016edc8]
0016edc4 07dd0db1 SN_Bridge.SNB_bridge.fetch_notes(IOD_Msg*, System.String, System.String, Boolean) [v:\mp\portal\commonutils\src\snb\snb_bridge.cpp # 142]
0016ef30 055bfde5 SN_Bridge.SNB_bridge.FetchNotesByStudyUId(System.String, System.String, System.Collections.Generic.List`1<SN_Bridge.SNB_StickyNote>) [v:\mp\portal\commonutils\src\snb\snb_bridge.cpp # 296]
0016efb0 055bfd22 xxx.Services.StickyNotes.StickyNotesLogic.StickyNotesByStudyID(System.String, System.String, System.String)
0016efcc 055ba0ab SNConsole.Program.Main(System.String[]) [D:\Documents and Settings\tamar\My Documents\Visual Studio 2010\Projects\SNConsole\Program.cs # 48]
0016f25c 72da21db [GCFrame: 0016f25c]
As you can see, in both cases I get the file name + row number in the stack, except for the xxx.Services.StickyNotes.dll line.
I've tried:
.realod /f ""c:...\xxx.Services.StickyNotes.dll" - same errors
and
ld "c:...\xxx.Services.StickyNotes.dll" which resulted in
No modules matched
'c:...\Bin\xxx.Services.StickyNotes.dll'
using !sym noisy didn't help, I think it doesn't even try to load the pdb since the module itself isn't loaded.
I can't figure out why this specific dll won't load. SN_Console.exe and SNB_Bridge.dll load without a problem from the same directory.
(I am not concerned with the exception itself, I planted the code that creates is. The issue is about setting up a good debugging environment).
Thanks in advance,
Tamar

I happened to answer this question in person, but here's the documented story in case anyone else stumbles upon it: http://blogs.microsoft.co.il/blogs/sasha/archive/2011/01/16/clr-4-does-not-use-loadlibrary-to-load-assemblies.aspx
The big picture is that CLR 4 doesn't use LoadLibrary to load assemblies, so the debugger can't pick up the DLL and therefore its symbols.

You can hint the debugger the location of the DLL in its address space using:
.reload /f "c:...\xxx.Services.StickyNotes.dll=image_base_address"
image_base_address is the start of the address range which this DLL is mapped in the process. You can find this value in the DLL Pane of Process Explorer.
I wrote about it with a little more detail here.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

CUDA Separable Compilation with CMake, invalid device function - cmake

Related

At which address is qemu expecting to find the image?

debugging vxworks loadModule failure

wx Widgets multiple errors in Windows 10

Mounting a network volume from OS X App

windbg - module not loading "dll not found in the image list"

Categories

Resources