I am working on hardware testing. Our test framework is written in C#, but we are using native DLLs to talk to the hardware.
Say we have a C++ method:
unsigned char someMethod(unsigned long * nativeStatus)
which in turn executes an embedded command and returns a status when the command completes.
To use it we create a wrapper:
[DllImport(@"native.dll", CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
internal static extern Byte someMethod(ref UInt32 managedStatus);
This works fine. But there is a scenario in which the someMethod call does not actually execute a command but just adds it to a sequence. The sequence can then be executed by sending a special command, say ExecuteSequence. As the sequence is being executed, the C++ code updates nativeStatus by copying data into the memory referenced by the nativeStatus pointer. When the sequence completes, the ExecuteSequence method returns, and at that point I am sure that all the data (nativeStatus in this case) has been updated. Will my managedStatus be correctly updated as well? I have heard that in this case managedStatus and nativeStatus do not point to the same memory: the marshaler just copies the value back into managedStatus after the call completes. If not, what is the solution? Do I need to use the unsafe keyword and put the code that creates and executes the sequence in a fixed block?
[DllImport(@"native.dll", CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
internal static unsafe extern Byte someMethod(UInt32* managedStatus);
So what you need is a variable whose location will not change over that span of time.
Yes, you can use a fixed block for that.
Alternatively, you can pin the value with a GCHandle. Note that GCHandle.Alloc boxes a value type, so what gets pinned is the boxed copy, not the field itself; pass the address of the pinned box to the API and read the result back through the handle:
private uint g_Pinnable = 0;
...
var gc = System.Runtime.InteropServices.GCHandle.Alloc(
    g_Pinnable, System.Runtime.InteropServices.GCHandleType.Pinned);
try
{
    // Pass gc.AddrOfPinnedObject() (an IntPtr) to the API as the status pointer.
    // Then execute the sequence.
    // When the sequence has completed, copy the result back:
    // g_Pinnable = (uint)gc.Target;
}
finally
{
    gc.Free(); // the pinned address is invalid after this point
}
Related
We have a plugin system that calls functions in dlls (user-generated plugins) by dlopening/LoadLibrarying the dll/so/dylib and then dlsyming/GetProcAddressing the function, and then storing that result in a function pointer.
Unfortunately, due to some bad example code being copy-pasted, some of these dlls in the wild do not have the correct function signature, and do not contain a return statement.
A dll might contain this:
extern "C" void Foo() { stuffWithNoReturn(); } // copy-paste from bad code
or it might contain this:
extern "C" int Foo() { doStuff(); return 1; } // good code
The application that loads the dll relies on the return value, but there are a nontrivial number of dlls out there that don't have the return statement. I am trying to detect this situation, and warn the user about the problem with his plugin.
This naive code should explain what I'm trying to do:
typedef int (*Foo_f)(void);

Foo_f func = (Foo_f)getFromDll(); // does dlsym or GetProcAddress depending on platform
int canary = 0x42424242;
canary = (*func)();
if (canary == 0x42424242)
    printf("You idiot, this is the wrong signature!!!\n");
else
    real_return_value = canary;
This unfortunately does not work: canary contains a random value after calling a DLL that has the known defect. I naively assumed that calling a function with no return statement would leave the canary intact, but it doesn't (the caller reads the return value from the rax register, which the call can clobber, not from canary's memory).
My next idea was to write a little bit of inline assembler to call the function and check the eax register upon return, but Visual Studio 2015 doesn't allow __asm() in x64 code.
I know there is no standards-conforming solution to this, as casting the function pointer to the wrong type is of course undefined behavior. But if someone has a solution that works at least on 64-bit Windows with Visual C++, or one that works with clang on macOS, I would be most delighted.
@Lorinczy Zsigmond is right that the contents of the register are undefined if the function does something but returns nothing.
We found, however, that in practice the plugins that return nothing almost always have empty functions that compile to a retn 0x0 and leave the return register untouched. We can detect this case by spraying the rax register with a known value (0xdeadbeef) before the call and checking for that value afterwards.
I have a C++/CLI method, ManagedMethod, with one output argument that will be modified by a native method, as follows:
// file: test.cpp
#pragma unmanaged
void NativeMethod(int& n)
{
    n = 123;
}

#pragma managed
void ManagedMethod([System::Runtime::InteropServices::Out] int% n)
{
    pin_ptr<int> pinned = &n;
    NativeMethod(*pinned);
}

int main()
{
    int n = 0;
    ManagedMethod(n);
    // n is now modified
    return 0;
}
Once ManagedMethod returns, the value of n has been modified as I would expect. So far, the only way I've been able to get this to compile is to use a pin_ptr inside ManagedMethod, so is pinning in fact the correct/only way to do this? Or is there a more elegant way of passing n to NativeMethod?
Yes, this is the correct way to do it. It is very highly optimized inside the CLR: the variable gets the [pinned] attribute, so the CLR knows that it stores an interior pointer to an object that should not be moved. Distinct from GCHandle::Alloc(), pin_ptr<> can do it without creating another handle. It is reported in the table that the jitter generates when it compiles the method; the GC uses that table to know where to look for object roots.
This only ever matters when a garbage collection occurs at the exact moment that NativeMethod() is running. That doesn't happen very often in practice; you'd have to use threads in the program. YMMV.
There is another way to do it that doesn't require pinning but takes a wee bit more machine code:
void ManagedMethod(int% n)
{
    int copy = n;
    NativeMethod(copy);
    n = copy;
}
This works because local variables have stack storage and thus won't be moved by the garbage collector. It does not win any elegance points for style, but it is what I normally use myself; estimating the side effects of pinning is not that easy. But, really, don't fear pin_ptr<>.
I'm optimizing a very time-critical CUDA kernel. My application accepts a wide range of switches that affect its behavior (for instance, whether to use a 3rd- or 5th-order derivative). Consider as an approximation a set of 50 switches, where every switch is an integer variable (sometimes a bool, or a float, but that case is not so relevant to this question).
All these switches are constant during the execution of the application. Most of them are run-time switches, and I store them in constant memory so as to exploit the caching mechanism. Some others can be compile-time switches, and the customer is fine with having to re-compile the application to change a switch's value. A very simple example could be:
__global__ void mykernel(const float* in, float *out)
{
    for ( /* many many times */ )
        if (compile_time_switch)
            do_this(in, out);
        else
            do_that(in, out);
}
Assume that do_this and do_that are compute-bound and very cheap, that I optimize the for loop so that its overhead is negligible, and that I have to place the if inside the iteration. If the compiler recognizes that compile_time_switch is static information, it can optimize away the call to the "wrong" function and produce code that is just as optimized as if the if weren't there. Now the real question:
In which ways can I provide the compiler with the static value of this switch? I see two such ways, listed below, but none of them work for me. What other possibilities remain?
Template parameters
Providing a template parameter enables this static optimization.
template<int compile_time_switch>
__global__ void mykernel(const float* in, float *out)
{
    for ( /* many many times */ )
        if (compile_time_switch)
            do_this(in, out);
        else
            do_that(in, out);
}
This simple solution does not work for me, since I don't have direct access to the code that calls the kernel.
Static members
Consider the following struct:
struct GlobalParameters
{
    static const bool compile_time_switch = true;
};
Now GlobalParameters::compile_time_switch contains the static information I want, and the compiler would be able to optimize the kernel. Unfortunately, CUDA does not support such static members.
EDIT: the last statement is apparently wrong. The definition of the struct is of course legal, and you are able to use the static member GlobalParameters::compile_time_switch in device code. The compiler inlines the variable, so the final code directly contains the value, not a run-time variable access, which is the behavior you would expect from an optimizing compiler. So the second option is actually suitable.
I consider my problem solved thanks both to this fact and to kronos' answer. However, I'm still looking for other alternative methods of providing compile-time information to the compiler.
Your third option is a preprocessor definition:
#define compile_time_switch 1

__global__ void mykernel(const float* in, float *out)
{
    for ( /* many many times */ )
        if (compile_time_switch)
            do_this(in, out);
        else
            do_that(in, out);
}
The preprocessor will discard the else case completely, and the compiler has nothing to optimize in its dead-code elimination pass, because there is no dead code.
Furthermore, you can specify the definition with the -D command-line switch, and (I think) any NVIDIA-supported host compiler will accept -D (MSVC uses /D).
I've inherited a piece of custom test equipment with a control library built in a COM object, and I'm trying to connect it to our Tcl test script library. I can connect to the DLL using TCOM, and do some simple control operations with single int parameters. However, certain features are controlled by passing in a C/C++ struct that contains the control blocks, and attempting to use them in TCOM is giving me an error 0x80020005 {Type mismatch.}. The struct is defined in the .idl file, so it's available to TCOM to use.
The simplest example is a particular call as follows:
C++ .idl file:
struct SourceScaleRange
{
    float MinVoltage;
    float MaxVoltage;
};

interface IAnalogIn : IDispatch {
    ...
    [id(4), helpstring("method GetAdcScaleRange")] HRESULT GetAdcScaleRange(
        [out] struct SourceScaleRange *scaleRange);
    ...
}
Tcl wrapper:
::tcom::import [file join $::libDir "PulseMeas.tlb"] ::char
set ::characterizer(AnalogIn) [::char::AnalogIn]
set scaleRange ""
set response [$::characterizer(AnalogIn) GetAdcScaleRange scaleRange]
Resulting error:
0x80020005 {Type mismatch.}
while executing
"$::characterizer(AnalogIn) GetAdcScaleRange scaleRange"
(procedure "charGetAdcScaleRange" line 4)
When I dump TCOM's methods, it knows the name of the struct, at least, but it seems to have dropped the struct keyword. Some introspection code
set ifhandle [::tcom::info interface $::characterizer(AnalogIn)]
puts "methods: [$ifhandle methods]"
returns
methods: ... {4 VOID GetAdcScaleRange {{out {SourceScaleRange *} scaleRange}}} ...
I don't know if this is meaningful or not.
At this point, I'd be happy to get any ideas on where to look next. Is this a known TCOM limitation (undocumented, but known)? Is there a way to pre-process the parameter into an appropriate format using tcom? Do I need to force it into a correctly sized block of memory via binary format by manual construction? Do I need to take the DLL back to the original developer and have him pull out all the struct parameters? (Not likely to happen, in this reality.) Any input is good input.
I've assumed that dumping a .bc file from a module was a trivial operation, but now, the first time I actually have to do it from code, for the life of me I can't find the one missing step in the process:
static void WriteModule ( const Module * M, BitstreamWriter & Stream )
http://llvm.org/docs/doxygen/html/BitcodeWriter_8cpp.html#a828cec7a8fed9d232556420efef7ae89
To write that module, first I need a BitstreamWriter
BitstreamWriter::BitstreamWriter (SmallVectorImpl< char > &O)
http://llvm.org/docs/doxygen/html/classllvm_1_1BitstreamWriter.html
And for a BitstreamWriter I need a SmallVectorImpl. But what next? Should I write the contents of the SmallVectorImpl byte by byte to a file handle myself? Is there an LLVM API for this? Do I need something else?
The WriteModule function is static within lib/Bitcode/Writer/BitcodeWriter.cpp, which means it's not there for outside consumption (you can't even access it).
The same file has another function, however, called WriteBitcodeToFile, with this interface:
/// WriteBitcodeToFile - Write the specified module to the specified output
/// stream.
void llvm::WriteBitcodeToFile(const Module *M, raw_ostream &Out);
I can't imagine a more convenient interface. The header file declaring it is ./include/llvm/Bitcode/ReaderWriter.h, by the way.
I use the following code:
std::error_code EC;
llvm::raw_fd_ostream OS("module", EC, llvm::sys::fs::F_None);
WriteBitcodeToFile(pBiFModule, OS);
OS.flush();
and then disassemble using llvm-dis.