Get the address in ARM Inline assembly - inline-assembly

The IAR compiler for ARM Cortex-M3 provides inline assembly. How can one store the address of a specific function to a location on the stack?
The C code would look like this:
void tick();
void bar()
{
int x;
// modify a value on stack
(&x)[4] = &tick;
}
While this works in general, it is optimized away by the compiler in a release build. I have tried to code it with inline assembly:
void bar()
{
int x;
asm("ldr r0,%0" : : "i" (&tick));
asm("str r0,[sp+#0x10]");
}
The ldr instruction is not accepted by the IAR compiler. The problem is that this instruction requires an addressing mode with a register and offset. The actual address of the function tick is stored behind the function, and the ldr instruction holds only the offset to the memory location that holds the actual address. The disassembly looks similar to this:
ldr r0,??tick_address
str r0,[sp+#0x10]
bx lr ; return
??tick_address dd tick
How do I get the address of tick immediately to a register to use it for the stack manipulation?

GNU GCC inline assembly can do mere assignments via pseudo-empty asm() statements, like:
asm("" : "=r"(foo) : "0"(tick));
This tells the compiler:
The variable foo is to be taken from a register after the inline assembly block
The variable tick is to be passed in - in the same register (argument zero)
The actual choice of which register is to be used is completely left to the compiler.
The trick here is the output and input constraints: we just alias the (one and only) input to the output, and the compiler will, on its own, choose a suitable register and generate the instructions necessary to load / store the respective variables before / after the "actual" inline assembly code. You could even do:
asm("" : "=r"(foo1), "=r"(foo2) : "0"(tick1) , "1"(tick2));
to do two "assignments" in a single inline assembly statement.
This compiler-generated "set the inputs, retrieve the outputs" code generation happens even if the actual inline assembly is empty (as here).
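The aliasing trick above can be tried in isolation. The following is a minimal sketch, assuming GCC or Clang (extended asm is a compiler extension); the names tick_address_via_asm and check_tick_address are made up for illustration, and the function address is converted to an integer so that the input and output operands have the same type:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the function whose address we want.
void tick() {}

// Routes the address of tick() through an empty asm statement.
std::uintptr_t tick_address_via_asm() {
    std::uintptr_t foo;
    // "=r": foo is taken from a register after the asm statement.
    // "0" : the input is placed in the same register as operand 0.
    asm volatile("" : "=r"(foo) : "0"(reinterpret_cast<std::uintptr_t>(&tick)));
    return foo;
}

// Usage example: the empty asm behaves like a plain assignment.
void check_tick_address() {
    assert(tick_address_via_asm() == reinterpret_cast<std::uintptr_t>(&tick));
}
```

Because the asm body is empty, this sketch is not tied to ARM; the compiler picks whatever register suits the target.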
Another example: Say you want to read the current program counter - the PC register. You can do that on ARM via two different inline assembly statements:
asm("" : "=pc"(foo));
asm("mov %0, PC" : "=r"(foo));
This is not 100% identical; in the second case, the compiler knows that whatever register it wants to see foo in after the asm, it'll find it there. In the former, the compiler knows that were it to use foo after the statement, it needs to retrieve it from PC. The difference between the two would be if you did:
uintptr_t *val;
uintptr_t foo;
asm("" : "=pc"(foo));
*val = foo;
In this case, the compiler can possibly identify that this can be turned into a single STR PC, [R...] because it knows foo is in pc after the asm. Were you to write this via
asm("mov %0, PC" : "=r"(foo));
*val = foo;
the compiler would be forced to create (assuming it chooses R0 / R1 for foo/val):
MOV R0, PC
STR R0, [R1]
The documentation for this behaviour is largely in the "Extended ASM" section of the GCC manuals, see the example for the contrived combine instruction.

There is no assignment to the variable x in your code, therefore its value is undefined, and setting foo to an undefined value isn't required to change foo.
You need to assign the value to the variable, not to some memory location you assume the compiler uses to implement it.

Related

Check whether function called through function-pointer has a return statement

We have a plugin system that calls functions in dlls (user-generated plugins) by dlopening/LoadLibrarying the dll/so/dylib and then dlsyming/GetProcAddressing the function, and then storing that result in a function pointer.
Unfortunately, due to some bad example code being copy-pasted, some of these dlls in the wild do not have the correct function signature, and do not contain a return statement.
A dll might contain this:
extern "C" void Foo() { stuffWithNoReturn(); } // copy-paste from bad code
or it might contain this:
extern "C" int Foo() { doStuff(); return 1; } // good code
The application that loads the dll relies on the return value, but there are a nontrivial number of dlls out there that don't have the return statement. I am trying to detect this situation, and warn the user about the problem with his plugin.
This naive code should explain what I'm trying to do:
typedef int (*Foo_f)(void);
Foo_f func = (Foo_f)getFromDll(); // does dlsym or GetProcAddress depending on platform
int canary = 0x42424242;
canary = (*func)();
if (canary == 0x42424242)
printf("You idiot, this is the wrong signature!!!\n");
else
real_return_value = canary;
Unfortunately this does not work: canary contains a random value after calling a dll that has the known defect. I naively assumed calling a function with no return statement would leave the canary intact, but it doesn't.
My next idea was to write a little bit of inline assembler to call the function, and check the eax register upon return, but Visual Studio 2015 doesn't allow __asm() in x64 code anymore.
I know there is no standards-conforming solution to this, as casting the function pointer to the wrong type is of course undefined behavior. But if someone has a solution that works at least on 64-bit Windows with Visual C++, or a solution that works with clang on MacOS, I would be most delighted.
@Lorinczy Zsigmond is right in that the contents of the register are undefined if the function does something but returns nothing.
We found, however, that in practice the plugins that return nothing almost always have empty functions, which compile to a retn 0x0 and leave the return register untouched. We can detect this case by spraying the rax register with a known value (0xdeadbeef) and checking for that.

Compile-time information in CUDA

I'm optimizing a very time-critical CUDA kernel. My application accepts a wide range of switches that affect the behavior (for instance, whether to use 3rd or 5th order derivative). Consider as an approximation a set of 50 switches, where every switch is an integer variable (a bool sometimes, or a float, but this case is not so relevant for this question).
All these switches are constant during the execution of the application. Most of these switches are run-time and I store them in constant memory, so to exploit the caching mechanism. Some other switches can be compile-time and the customer is fine with having to re-compile the application if he wants to change the value in the switch. A very simple example could be:
__global__ void mykernel(const float* in, float *out)
{
for ( /* many many times */ )
if (compile_time_switch)
do_this(in, out);
else
do_that(in, out);
}
Assume that do_this and do_that are compute-bound and very cheap, that I optimize the for loop so that its overhead is negligible, and that I have to place the if inside the iteration. If the compiler recognizes that compile_time_switch is static information, it can optimize out the call to the "wrong" function and create code that is just as optimized as if the if weren't there. Now the real question:
In which ways can I provide the compiler with the static value of this switch? I see two such ways, listed below, but none of them work for me. What other possibilities remain?
Template parameters
Providing a template parameter enables this static optimization.
template<int compile_time_switch>
__global__ void mykernel(const float* in, float *out)
{
for ( /* many many times */ )
if (compile_time_switch)
do_this(in, out);
else
do_that(in, out);
}
This simple solution does not work for me, since I don't have direct access to the code that calls the kernel.
Static members
Consider the following struct:
struct GlobalParameters
{
static const bool compile_time_switch = true;
};
Now GlobalParameters::compile_time_switch contains the static information as I want it, and the compiler would be able to optimize the kernel. Unfortunately, CUDA does not support such static members.
EDIT: the last statement is apparently wrong. The definition of the struct is of course legitimate, and you are able to use the static member GlobalParameters::compile_time_switch in device code. The compiler inlines the variable, so the final code directly contains the value rather than a run-time variable access, which is the behavior you would expect from an optimizing compiler. So the second option is actually suitable.
I consider my problem solved both thanks to this fact and to kronos' answer. However, I'm still looking for other alternative methods to provide compile-time information to the compiler.
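The static-member approach from the EDIT can be sketched in plain host-side C++. This is only an illustration under assumptions: do_this and do_that are reduced to hypothetical integer helpers, apply stands in for the kernel body, and no CUDA qualifiers are used, so it shows the language mechanism rather than actual device code:

```cpp
#include <cassert>

struct GlobalParameters {
    // Compile-time constant, as in the question.
    static const bool compile_time_switch = true;
};

// Hypothetical stand-ins for the two code paths.
inline int do_this(int x) { return x + 1; }
inline int do_that(int x) { return x - 1; }

// Stand-in for the kernel body: the condition is a constant expression,
// so an optimizing compiler can drop the branch that is never taken.
int apply(int x, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        if (GlobalParameters::compile_time_switch)
            x = do_this(x);
        else
            x = do_that(x);
    }
    return x;
}

// Usage example: with the switch set to true, only do_this runs.
void check_apply() {
    assert(apply(0, 3) == 3);
}
```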
Your third option is preprocessor definitions:
#define compile_time_switch 1
__global__ void mykernel(const float* in, float *out)
{
for ( /* many many times */ )
if (compile_time_switch)
do_this(in, out);
else
do_that(in, out);
}
The preprocessor replaces compile_time_switch with the literal 1, so the compiler sees if (1) and drops the else branch during dead-code elimination. If you want the preprocessor itself to discard the unused branch, so that it never reaches the compiler, use #if compile_time_switch / #else / #endif instead of a plain if.
Furthermore, you can specify the definition with the -D command-line switch, and (I think) any compiler supported by NVIDIA will accept -D (MSVC may use a different switch).
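A minimal host-side C++ sketch of the preprocessor approach, under stated assumptions: the macro name COMPILE_TIME_SWITCH, the helper bodies, and the functions process and demo_sum are invented for illustration, and CUDA qualifiers are left out. It uses #if rather than a plain if, so the unused branch is removed before the compiler proper sees it, and the #ifndef guard lets a -D option on the command line override the default:

```cpp
#include <cassert>

// Default value; can be overridden at build time, e.g. -DCOMPILE_TIME_SWITCH=0.
#ifndef COMPILE_TIME_SWITCH
#define COMPILE_TIME_SWITCH 1
#endif

// Hypothetical stand-ins for the two code paths.
inline float do_this(float x) { return x * 2.0f; }
inline float do_that(float x) { return x + 1.0f; }

// Stand-in for the kernel body.
float process(const float* in, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
#if COMPILE_TIME_SWITCH
        sum += do_this(in[i]);  // the only branch compiled when the switch is nonzero
#else
        sum += do_that(in[i]);  // the only branch compiled when the switch is zero
#endif
    }
    return sum;
}

// Usage example on a fixed input.
float demo_sum() {
    const float in[] = {1.0f, 2.0f, 3.0f};
    const float sum = process(in, 3);
    assert(sum == (COMPILE_TIME_SWITCH ? 12.0f : 9.0f));
    return sum;
}
```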

How to pass a struct parameter using TCOM in Tcl

I've inherited a piece of custom test equipment with a control library built in a COM object, and I'm trying to connect it to our Tcl test script library. I can connect to the DLL using TCOM, and do some simple control operations with single int parameters. However, certain features are controlled by passing in a C/C++ struct that contains the control blocks, and attempting to use them in TCOM is giving me an error 0x80020005 {Type mismatch.}. The struct is defined in the .idl file, so it's available to TCOM to use.
The simplest example is a particular call as follows:
C++ .idl file:
struct SourceScaleRange
{
float MinVoltage;
float MaxVoltage;
};
interface IAnalogIn : IDispatch{
...
[id(4), helpstring("method GetAdcScaleRange")] HRESULT GetAdcScaleRange(
[out] struct SourceScaleRange *scaleRange);
...
}
Tcl wrapper:
::tcom::import [file join $::libDir "PulseMeas.tlb"] ::char
set ::characterizer(AnalogIn) [::char::AnalogIn]
set scaleRange ""
set response [$::characterizer(AnalogIn) GetAdcScaleRange scaleRange]
Resulting error:
0x80020005 {Type mismatch.}
while executing
"$::characterizer(AnalogIn) GetAdcScaleRange scaleRange"
(procedure "charGetAdcScaleRange" line 4)
When I dump TCOM's methods, it knows of the name of the struct, at least, but it seems to have dropped the struct keyword. Some introspection code
set ifhandle [::tcom::info interface $::characterizer(AnalogIn)]
puts "methods: [$ifhandle methods]"
returns
methods: ... {4 VOID GetAdcScaleRange {{out {SourceScaleRange *} scaleRange}}} ...
I don't know if this is meaningful or not.
At this point, I'd be happy to get any ideas on where to look next. Is this a known TCOM limitation (undocumented, but known)? Is there a way to pre-process the parameter into an appropriate format using tcom? Do I need to force it into a correctly sized block of memory via binary format by manual construction? Do I need to take the DLL back to the original developer and have him pull out all the struct parameters? (Not likely to happen, in this reality.) Any input is good input.

How do I link with FreePascal a NASM program calling a DLL?

Problem
I have a function "bob" written in assembler (nasm), which makes use of functions in kernel32.dll. And I have a program in FreePascal, that calls "bob".
I use nasm with:
nasm -fwin32 bob.asm
In FreePascal I declare:
{$link bob.obj}
function bob(s:pchar):longint; stdcall; external name 'bob';
But I get an error when I compile with fpc, saying it cannot find GetStdHandle and WriteConsoleA (without the @n suffix), which are declared extern in bob.asm. I would like to tell fpc to look for them in kernel32.dll, or in an adequate import library.
However, when I use the same function in a pure assembly program, it works fine with nasm and golink. And when I don't call DLL functions, I can link with FreePascal with no trouble.
How can I link kernel32 functions with FreePascal, so that assembly functions "see" them ?
A Solution
Given by BeniBela. I change names so that things are easy to follow.
program dlltest;
function WindowsGetStdHandle(n: longint): longint; stdcall;
external 'kernel32.dll' name 'GetStdHandle';
{$asmmode intel}
procedure WrapperGetStdHandle; assembler; public name 'AliasGetStdHandle';
asm
jmp WindowsGetStdHandle
end;
{$link myget.obj}
function AsmGetStdHandle(n: longint): longint; stdcall;
external name 'gethandle';
const STDOUT = -11;
begin
writeln(AsmGetStdHandle(STDOUT));
writeln(WindowsGetStdHandle(STDOUT));
end.
With this in assembly, in myget.asm:
section .text
extern AliasGetStdHandle
global gethandle
gethandle:
mov eax, [esp+4]
push eax
call AliasGetStdHandle
ret 4
WindowsGetStdHandle is another name for GetStdHandle in kernel32.dll.
WrapperGetStdHandle only jumps to the preceding function; it is there for the alias (public name) capability: we give it the name AliasGetStdHandle for external objects. This is the important part: the function becomes visible to the assembly program.
AsmGetStdHandle is the name in FreePascal of the assembly function gethandle. It calls WrapperGetStdHandle (nicknamed AliasGetStdHandle), which jumps to WindowsGetStdHandle, the DLL function.
And we are done, now the assembly program can be linked, without changing anything in it. All the renaming machinery is done in the pascal program calling it.
The only drawback: the need for a wrapper function, but it's not overpriced for a fine control of names.
Another solution
If kernel32.dll is not specified in the declaration of WindowsGetStdHandle, but with {$linklib kernel32}, then the symbol becomes visible in object files linked into the Pascal program. However, it seems the $linklib directive alone is not enough; one still has to declare in Pascal some function referring to it:
program dlltest;
{$linklib kernel32}
function WindowsGetStdHandle(n: longint): longint; stdcall;
external name 'GetStdHandle';
{$link myget.obj}
function AsmGetStdHandle(n: longint): longint; stdcall;
external name 'gethandle';
const STDOUT = -11;
begin
writeln(AsmGetStdHandle(STDOUT));
writeln(WindowsGetStdHandle(STDOUT));
end.
With the following assembly program. AliasGetStdHandle is replaced with GetStdHandle, which now points directly to kernel32 function.
section .text
extern GetStdHandle
global gethandle
gethandle:
mov eax, [esp+4]
push eax
call GetStdHandle
ret 4
But this only works when using the external linker (gnu ld), with command
fpc -Xe dlltest.pas
When omitting the option '-Xe', fpc gives the following error:
Free Pascal Compiler version 2.6.0 [2011/12/25] for i386
Copyright (c) 1993-2011 by Florian Klaempfl and others
Target OS: Win32 for i386
Compiling dlltest.pas
Linking dlltest.exe
dlltest.pas(17,1) Error: Asm: Duplicate label __imp_dir_kernel32.dll
dlltest.pas(17,1) Error: Asm: Duplicate label __imp_names_kernel32.dll
dlltest.pas(17,1) Error: Asm: Duplicate label __imp_fixup_kernel32.dll
dlltest.pas(17,1) Error: Asm: Duplicate label __imp_dll_kernel32.dll
dlltest.pas(17,1) Error: Asm: Duplicate label __imp_names_end_kernel32.dll
dlltest.pas(17,1) Error: Asm: Duplicate label __imp_fixup_end_kernel32.dll
dlltest.pas(17,1) Fatal: There were 6 errors compiling module, stopping
Fatal: Compilation aborted
I do not know how to fix the linking issue directly, but you could declare public wrapper functions that export these functions from the Pascal source.
E.g.:
{$ASMMODE INTEL}
procedure WrapperGetStdHandle; assembler; public; alias: '_GetStdHandle@4';
asm jmp GetStdHandle end;
procedure WrapperWriteConsoleA; assembler; public; alias: '_WriteConsoleA@20';
asm jmp WriteConsoleA end;
I suspect there is some import library that is linked automatically for use with NASM code, and you probably need to link the relevant stubs from that library too.
amended:
It might be a problem with smart linking. As said FPC generates import stubs on the fly, but only when needed. Because the Windows unit (that holds all core WINAPI calls) is so large, smart linking (only adding what you use) is activated for it. (there are other reasons too)
The NASM originated obj is outside FPC's control, so the relevant functions are not generated for it.
If that is the case, BeniBela's code might work because it forces a reference from FPC code, linking in the symbols. This is speculation though, it might be something with the decoration too, or something with the leading underscore.
Testing that is simple: use the functions from Pascal code without the declarations from BeniBela.
By the way, FPC's default is NOT stdcall, so BeniBela's functions should probably get a stdcall modifier.

Statement goto can not cross variable definition?

Suppose this code is compiled with g++:
#include <stdlib.h>
int main() {
int a =0;
goto exit;
int *b = NULL;
exit:
return 0;
}
g++ will throw errors:
goto_test.c:10:1: error: jump to label ‘exit’ [-fpermissive]
goto_test.c:6:10: error: from here [-fpermissive]
goto_test.c:8:10: error: crosses initialization of ‘int* b’
It seems that goto cannot cross a pointer definition, yet gcc compiles the same code without complaint.
To fix the error, we must declare all the pointers before any goto statement; that is, we must declare these pointers even where we do not need them yet (which violates some principles).
What was the original design consideration behind g++ forbidding the useful tail-goto statement?
Update:
goto can cross a variable declaration (for any type of variable, not only pointers), except for those that have an initializer. If we remove the NULL initializer above, g++ stays silent. So if you want to declare variables in the region a goto jumps over, do not initialize them (which still violates some principles).
Goto can't skip over initializations of variables, because the respective objects would not exist after the jump: the lifetime of an object with non-trivial initialization starts when that initialization is executed:
C++11 §3.8/1:
[…] The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-trivial initialization, its initialization is complete.
C++11 §6.7/3:
It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program that jumps from a point where a variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has scalar type, class type with a trivial default constructor and a trivial destructor, a cv-qualified version of one of these types, or an array of one of the preceding types and is declared without an initializer (8.5).
Since the error mentions [-fpermissive], you can turn it into a warning by specifying that compiler flag. This indicates two things: that it used to be allowed (the variable would exist, but be uninitialized after the jump), and that the gcc developers believe the specification forbids it.
The compiler only checks whether the variable should be initialized, not whether it's used; otherwise the results would be rather inconsistent. But if you don't need the variable anymore, you can end its lifetime yourself, making the "tail-goto" viable:
int main() {
int a =0;
goto exit;
{
int *b = NULL;
}
exit:
return 0;
}
is perfectly valid.
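The same scoping idea, sketched as a small function with two paths so the effect of the jump can be observed. The names run and check_run are hypothetical; the point is only that b and value live inside their own block, so the label is outside their scope and the goto bypasses no initialization that is still in scope at its target:

```cpp
#include <cassert>

// Returns 0 on the early-exit path, 42 otherwise.
int run(bool bail_out) {
    int status = 0;
    if (bail_out)
        goto done;  // legal: the block below is skipped as a whole
    {
        int value = 42;   // initialized, but confined to this block
        int* b = &value;  // initialized pointer, also confined to this block
        status = *b;
    }
done:
    return status;
}

// Usage example covering both paths.
void check_run() {
    assert(run(true) == 0);
    assert(run(false) == 42);
}
```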
On a side-note, the file has extension .c, which suggests it is C and not C++. If you compile it with gcc instead of g++, the original version should compile, because C does not have that restriction (it only has the restriction for variable-length arrays—which don't exist in C++ at all).
There is an easy work-around for those primitive types like int:
// --- original form, subject to cross initialization error. ---
// int foo = 0;
// --- work-around form: no more cross initialization error. ---
int foo; foo = 0;
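A short sketch of this work-around in context, with hypothetical names parse and check_parse. It relies on foo being a scalar declared without an initializer, which is the case the standard text quoted earlier permits; note that foo is in scope but holds an indeterminate value at the label, so it must not be read there:

```cpp
#include <cassert>

// Returns twice the input, or -1 for negative input.
int parse(int input) {
    if (input < 0)
        goto fail;    // jumps past the declaration of foo

    int foo;          // no initializer, so the jump above is permitted
    foo = input * 2;  // plain assignment, executed only on this path
    return foo;

fail:
    return -1;        // foo is in scope here but must not be read
}

// Usage example covering both paths.
void check_parse() {
    assert(parse(-5) == -1);
    assert(parse(4) == 8);
}
```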