cupy.RawModule using name_expressions and nvcc and/or path

I am using CuPy to test CUDA kernels from a library; more specifically, I use cupy.RawModule to exercise the kernels in Python. However, the kernels are templated and enclosed in a namespace. Before the name_expressions parameter to RawModule was added in CuPy 8.0.0, I had to manually copy the C++-mangled names into RawModule's get_function() method. With name_expressions I thought this manual step could be avoided; however, that feature requires the code to be compiled from source via the code parameter in combination with backend='nvrtc'.
Should it be possible to enable either of the following?
name_expressions in conjunction with path
name_expressions in conjunction with backend='nvcc'

The answer is no for both questions.
The name_expressions feature requires the source code for just-in-time (JIT) compilation of your C++ template kernels using NVRTC, whereas the path argument is for loading an external cubin, fatbin, or PTX file. If you want to compile external source code, you can do so by loading it in Python first and then passing it as the code argument:
import cupy as cp

# Read the external CUDA C++ source and compile it from source with NVRTC
with open('my_cuda_cpp_code.cu') as f:
    code = f.read()
mod = cp.RawModule(code=code, name_expressions=(...), ...)
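Once the module is built this way, get_function() accepts the un-mangled name expression directly (the same string you listed in name_expressions). A minimal sketch, where the namespace, kernel name, and launch arguments are hypothetical placeholders:

# hypothetical kernel: namespace my_ns { template<typename T> __global__ void kern(...) }
ker = mod.get_function('my_ns::kern<float>')  # pass the C++ name expression, not the mangled name
ker((n_blocks,), (n_threads,), (arg1, arg2))  # launch: grid dims, block dims, kernel arguments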
Unfortunately, unlike NVRTC, NVCC does not provide an API to return mangled names, so using NVCC is not possible. If you pass backend='nvcc' together with name_expressions to RawModule, it raises an error.


Defining custom compiler/linker in CMake custom language

Questions
Main question: can we use/define a placeholder like <CMAKE_MYDUMMYLANGUAGE_LINKER> for CMAKE_MYDUMMYLANGUAGE_LINK_EXECUTABLE, in the same way we can define CMAKE_MYDUMMYLANGUAGE_COMPILE_OBJECT with <CMAKE_MYDUMMYLANGUAGE_COMPILER>?
If you can elaborate or share good links on placeholders and custom languages, that would be appreciated too. In particular:
Is there a list of placeholders somewhere or do we need to read the code to get them?
Can we define our own ones? And if yes, is it a recommended practice?
What is the usual way of finding and defining a custom linker?
What should we place into CMake<Lang>Information.cmake and CMakeDetermine<Lang>Compiler.cmake, and is there a standardized way of setting them?
Speaking of placeholders and tokens, what is the purpose of the apparently redundant set() calls with "#MYVAR#"? Is there any non-trivial usage outside packaging? Example from /usr/share/cmake-3.16/Modules/FindCUDA/run_nvcc.cmake:
# Set these up as variables to make reading the generated file easier
set(CMAKE_COMMAND "#CMAKE_COMMAND#") # path
set(source_file "#source_file#") # path
(This question follows on from How do CMake placeholders work?)
Context
When doing cross-compilation or intermediate compilations, I know we can use custom commands (for example, to pre-compile a Ragel .rl file into .cpp). But when browsing the languages CMake ships with on a standard Linux distribution, there are custom *Information.cmake, *DetermineCompiler.cmake, etc. files defining custom languages.
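For context, the custom-command alternative mentioned above looks something like this (a minimal sketch; the ragel command line and file names are my own assumptions):

# hypothetical: generate foo.cpp from foo.rl with ragel at build time
add_custom_command(
    OUTPUT  ${CMAKE_CURRENT_BINARY_DIR}/foo.cpp
    COMMAND ragel -o ${CMAKE_CURRENT_BINARY_DIR}/foo.cpp ${CMAKE_CURRENT_SOURCE_DIR}/foo.rl
    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/foo.rl
    COMMENT "Pre-compiling foo.rl into foo.cpp"
)
add_executable(app main.cpp ${CMAKE_CURRENT_BINARY_DIR}/foo.cpp)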
I tried to define a custom language too, but got stuck on the linker, and it made me realize I don't understand the placeholder machinery or the standard way of defining custom languages (if any). Why can't we use the same logic as for the compiler?
CMakeMyDummyLanguageInformation.cmake
set( CMAKE_MYDUMMYLANGUAGE_COMPILE_OBJECT
    "<CMAKE_MYDUMMYLANGUAGE_COMPILER> ..."
)
# placeholder CMAKE_MYDUMMYLANGUAGE_LINKER unknown, resulting in the literal
# `CMAKE_MYDUMMYLANGUAGE_LINKER ...` command being attempted at link stage
set( CMAKE_MYDUMMYLANGUAGE_LINK_EXECUTABLE
    "<CMAKE_MYDUMMYLANGUAGE_LINKER> ..."
)
CMakeDetermineMyDummyLanguageCompiler.cmake
find_program( CMAKE_MYDUMMYLANGUAGE_COMPILER
    NAMES "dummycc"
    HINTS "${CMAKE_SOURCE_DIR}"
    DOC   "dummy cc"
)
mark_as_advanced(CMAKE_MYDUMMYLANGUAGE_COMPILER)

find_program( CMAKE_MYDUMMYLANGUAGE_LINKER
    NAMES "dummyld"
    HINTS "${CMAKE_SOURCE_DIR}"
    DOC   "dummy ld"
)
mark_as_advanced(CMAKE_MYDUMMYLANGUAGE_LINKER)
Some related links (more or less):
How do CMake placeholders work?
https://gitlab.kitware.com/cmake/cmake/-/commit/677c091b8140a0655f512b8283c5b120fbd0f99c?view=parallel
https://cmake.org/cmake/help/latest/module/CMakePackageConfigHelpers.html

Why do GNU Radio complex blocks (not custom) have different itemsizes?

I am running GNU Radio 3.7.13.4, working in GNU Radio Companion on Ubuntu 18.04.
I have a very simple flowgraph: a source of type complex (I've tried both a signal source and a constant source) connected to a transcendental block (of type complex), whose output goes to a sink of type complex (I don't think anything after the transcendental block matters).
I have tried the transcendental block with functions "sin", "cos" and "exp". When I execute the flowgraph, I get the error:
ValueError: itemsize mismatch: sig_source_c0:0 using 8, transcendental0:0 using 16
The transcendental block is meant to take any cmath function, so I thought perhaps there were different function names for the complex and float cases, something like "ccos" or "csin", but I haven't seen any on the list of available functions.
I have seen similar questions where people are creating custom blocks and OOT modules and seeing this problem. They have often used the wrong datatype (numpy complex 32 instead of 64).
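For reference, the two sizes in the error message map onto NumPy's complex dtypes, which can be checked directly (a quick sanity check, not from the original question):

import numpy as np

# 8 bytes per item: single-precision complex, i.e. GNU Radio's gr_complex
print(np.dtype(np.complex64).itemsize)   # 8
# 16 bytes per item: double-precision complex
print(np.dtype(np.complex128).itemsize)  # 16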
I am not using any custom blocks. This problem is with stock/shipped GR blocks.
Any help is appreciated!

Best way to modify a built-in TensorFlow kernel

I would like to learn the best way to modify TensorFlow built-in operator kernels.
For example, I want to modify the value of static const double A in tensorflow/core/kernels/resize_bicubic_op.cc. I have come up with two possible ways:
Modify it directly and recompile the whole TensorFlow library. The problems with this solution are: (a) it influences every function that uses bicubic interpolation, and (b) it requires recompiling the entire library and does not work when installing from a binary.
Define it as a custom op. The problem is that there is no REGISTER_OP() call in that source file; I don't know how to write the REGISTER_OP() for this bicubic function, or whether other modifications need to be made.
Are there other better ways?
Thanks.
The best way to approach this problem is to build a custom op. See this tutorial for more details about how to add custom ops in general. The REGISTER_OP call for the tf.image.resize_bicubic() op is in tensorflow/core/ops/image_ops.cc.
Another alternative is to re-use the same op registration, and register a new kernel with the alternative implementation. This would enable you to use the (experimental) Graph.kernel_label_map() API to select an alternative implementation for the "ResizeBicubic" op. For example, you could do the following in your Python program:
import tensorflow as tf

_ = tf.load_op_library(...)  # Load the .so containing your implementation.
with tf.get_default_graph().kernel_label_map({"ResizeBicubic": "my_impl"}):
    images = tf.image.resize_bicubic(...)  # Will use your implementation.
...and add a kernel registration that specifies the label "my_impl" with your C++ code:
template <typename Device, typename T>
class MyResizeBicubicOp : public OpKernel {
  // Custom implementation goes here...
};

#define REGISTER_KERNEL(T)                            \
  REGISTER_KERNEL_BUILDER(Name("ResizeBicubic")       \
                              .Device(DEVICE_CPU)     \
                              .Label("my_impl")       \
                              .TypeConstraint<T>("T") \
                              .HostMemory("size"),    \
                          MyResizeBicubicOp<CPUDevice, T>);
TF_CALL_REAL_NUMBER_TYPES(REGISTER_KERNEL);

Why does this module only have part of the registered functions available?

I've got this UTF-8 module for Lua.
The thing is that if I require() it, only the first two functions (charbytes and len) are available. The rest are unavailable, despite being defined.
I tested this with a very simple script:
utf8 = require("utf8")
print(utf8.len, utf8.sub)
It returns: function: 0xsomeaddress nil. Why is that?
Lua 5.3 has a built-in utf8 module that is already loaded, so require("utf8") doesn't load your file at all: it simply returns the preloaded built-in module (which is why utf8.sub is nil).
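The usual fix is to give your module a name that doesn't collide with the built-in one. A minimal sketch (the file name myutf8.lua is an assumption):

-- rename the module file to myutf8.lua so require() doesn't return the built-in utf8
local myutf8 = require("myutf8")
print(myutf8.len, myutf8.sub)  -- both should now be functions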

Using system symbol table from VxWorks RTP

I have an existing project, originally implemented as a VxWorks 5.5-style kernel module.
This project creates many tasks that act as a "host" to run external code. We do something like this:
void loadAndRun(char* file, char* function)
{
    // load the module into the kernel
    int fd = open(file, O_RDONLY, 0644);
    loadModule(fd, LOAD_ALL_SYMBOLS);

    // look up the entry point in the system symbol table
    SYM_TYPE type;
    FUNCPTR func;
    symFindByName(sysSymTbl, function, (char**) &func, &type);

    while (true)
    {
        func();
    }
}
This all works like a dream; however, the functions that get called are non-reentrant, with global data all over the place, etc. We have a new requirement to run multiple instances of these external modules, and my obvious first thought is to use VxWorks RTPs to provide memory isolation.
However, no matter what I try, I cannot persuade my new RTP project to compile and link.
error: 'sysSymTbl' undeclared (first use in this function)
If I add the correct include:
#include <sysSymTbl.h>
I get:
error: sysSymTbl.h: No such file or directory
and if I just declare it extern:
extern SYMTAB_ID sysSymTbl;
I get:
error: undefined reference to `sysSymTbl'
I haven't even begun to stitch in the actual module-load code; at the moment I just want to get the symbol lookup working.
So, is the system symbol table accessible from VxWorks RTP applications? Can loadModule() be used?
EDIT
It appears that what I am trying to do is covered by the Application Programmer's Guide in the section on Plugins (section 4.9 for 6.8) (thanks @nos), which is to use dlopen() etc., like this:
void* hdl = dlopen("pathname", RTLD_NOW);            /* load the plugin from the RTP side */
FUNCPTR func = (FUNCPTR) dlsym(hdl, "FunctionName"); /* resolve the entry point */
func();
However, I still end up in linker hell, even when I specify -Xbind-lazy -non-static to the compiler.
undefined reference to `_rtld_dlopen'
undefined reference to `_rtld_dlsym'
The problem here was that the documentation says to specify -Xbind-lazy and -non-static as compiler options. However, these should actually be added to the linker options.
libc.so.1 for the appropriate build target is then required on the target to satisfy the run-time link requirements.
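As a sketch of what that looks like in practice (the makefile fragment below is an assumption about a hand-written RTP makefile, not from the original post):

# pass the dynamic-linking options at the link step, not the compile step
LDFLAGS += -Xbind-lazy -non-static

myApp.vxe: main.o
	$(CC) main.o $(LDFLAGS) -o myApp.vxe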