I need help translating this basic ACC pragma to OMP - gpu

My question is: I'm trying to make a CUDA function call (cublasDgemm) and I'm getting an error because I'm accessing addresses that should be inaccessible.
I think it is because the CUBLAS function isn't using the device variables, but the host ones.
I've seen that in OpenACC, you would use this:
#pragma acc host_data use_device(list of variables) {
(call to CUBLAS function)
}
host_data makes the device addresses of the listed variables available on the host, and use_device makes the code inside the braces {} use those device addresses instead of the host ones. More detail is available here -> https://www.openacc.org/sites/default/files/inline-files/OpenACC_2_0_specification.pdf
So, is there a way to replicate this in OpenMP? Do I have to do this? How do I make sure that the CUBLAS call is using the variables of the device?

Try:
#pragma omp target data use_device_ptr(list of variables)
{
call to cuda(vars)
}
See slide 27 of: https://on-demand.gputechconf.com/gtc/2018/presentation/s8344-openmp-on-gpus-first-experiences-and-best-practices.pdf
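A minimal sketch of this pattern in C, assuming the arrays are mapped by an enclosing target data region before their device addresses are requested. The names multiply_on_device and library_dgemm are hypothetical. library_dgemm is a host-side stand-in for the cublasDgemm call so that the sketch builds without CUDA (a compiler without OpenMP support just ignores the pragmas). In a real GPU build it has to be replaced by the actual cuBLAS call, because inside the inner region a, b and c are device addresses that host code must not dereference.

```c
/* Hypothetical stand-in for a device library routine such as cublasDgemm:
   computes C = A * B for n x n row-major matrices through raw pointers. */
static void library_dgemm(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Map the arrays to the device, then hand their device addresses
   to the library call. */
void multiply_on_device(int n, double *a, double *b, double *c)
{
    /* Outer region: copy a and b to the device, copy c back at the end. */
    #pragma omp target data map(to: a[0:n*n], b[0:n*n]) map(from: c[0:n*n])
    {
        /* Inner region: a, b and c now refer to the device copies. */
        #pragma omp target data use_device_ptr(a, b, c)
        {
            library_dgemm(n, a, b, c);
        }
    }
}
```

When no offload device is available, the pointers inside the inner region are simply the host pointers, so the stand-in still produces the right result.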

Related

How to use ModelicaError() function in a .dll (Dymola)

I have a Modelica external C function that calls a function that is in a .dll.
In the C function in the .dll I would like to make use of the ModelicaError() function. However, when ModelicaUtilities.h is included, a number of errors occur.
What is the correct method for doing this?
I take it I'll need to link against an existing Dymola .lib, which one? What should DYMOLA_STATIC be defined as?
Or should I be compiling the .dll in such a way that these missing functions will be available after compilation with the model?
Any insight into this would be great, Thanks
As far as I know, it is currently not possible in a tool-independent way to have shared objects (DLLs on Windows) depend on ModelicaError (or any other function of ModelicaUtilities). See https://github.com/modelica/ModelicaSpecification/issues/2191 for the open issue on the Modelica Language specification.
To use the ModelicaError function in a dll, pass the dll a pointer to the ModelicaError function. To do this from Dymola, create a wrapper function that passes the pointer to the dll function. For example, MathLibraryWrapper:
#pragma once
#include "MathLibrary.h"
int fibonacci_next_int_wrap()
{
return fibonacci_next_int(&ModelicaError);
}
This calls the fibonacci_next_int function, which is in MathLibrary.cpp in the dll. It is modified to accept a pointer to the ModelicaError function.
int fibonacci_next_int(void(*mError)(const char *))
{
(*mError)("broken");
return (int)fibonacci_next();
}
If this is run it will immediately crash with "broken".
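If many dll functions need to report errors, the pointer can be stored once instead of being passed on every call. A minimal sketch of that variation, assuming a reporter with the same one-string signature used above; error_fn, set_error_handler and validate_input are hypothetical names, not part of Dymola or ModelicaUtilities:

```c
#include <stddef.h>
#include <stdio.h>

/* Signature of a ModelicaError-style reporter: one message string. */
typedef void (*error_fn)(const char *message);

/* Hypothetical dll-side state: the reporter handed over by the wrapper. */
static error_fn registered_error = NULL;

/* Called once by the wrapper, for example with &ModelicaError. */
void set_error_handler(error_fn fn)
{
    registered_error = fn;
}

/* Hypothetical dll function. Returns 0 if x is acceptable, -1 otherwise.
   With a reporter that does not return, the -1 is never seen by the caller. */
int validate_input(double x)
{
    if (x < 0.0) {
        if (registered_error != NULL)
            registered_error("negative argument");
        else
            fprintf(stderr, "negative argument\n"); /* fallback: no reporter set */
        return -1;
    }
    return 0;
}
```

The wrapper on the Dymola side would call set_error_handler(&ModelicaError) before any other dll function.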

Code sharing between multiple independently compiled binaries/hex files

I'm looking for documentation/information on how to share information/code between multiple binaries compiled for Cortex-M0/M4/M7 architectures. The two binaries will be on the same chip and the same architecture. They are flashed at different locations, and one binary sets the main stack pointer and resets the program counter so that it "jumps" to the other binary. I want to share code between these two binaries.
I've done a simple copy of an array of function pointers into a RAM section defined in the linker script. The other binary then reads that RAM, casts it to an array, and uses the index to call functions in the first binary. This works as a proof of concept, but I think what I'm looking for is a bit more complex, as I want some way of describing compatibility between the two binaries. I want somewhat the functionality of shared libraries, but I'm unsure if I need position-independent code.
As an example of how the current copy process is done, it is basically:
Source binary:
void copy_func()
{
memcpy(address_custom_ram_section, array_of_function_pointers, fixed_size);
}
Binary which is jumped to from source binary:
array_fp_type get_funcs()
{
memcpy(array_of_fp, address_custom_ram_section, fixed_size);
return array_of_fp;
}
Then I can use the array_of_fp to call into functions residing in the source binary from the jump binary.
So what I'm looking for is some resources or input from someone who has implemented a similar system. For instance, I would like to avoid needing a custom RAM section to copy the function pointers into.
I would be fine with the compilation step of the source binary outputting something that can be included in the compilation step of the jump binary. However, it needs to be reproducible, and recompiling the source binary shouldn't break compatibility with the jump binary (even if the jump binary included a different file from the one now output) as long as the interface doesn't change.
To clarify, the source binary shouldn't require any specific knowledge about the jump binary. The code should not reside in both binaries, as this would defeat the purpose of this mechanism. The overall goal of this mechanism is to save space when creating multi-binary applications on Cortex-M processors.
Any ideas or links to resources are welcome. If you have any more questions feel free to comment on the question and I'll try to answer it.
It's very hard for me to picture what you want to do, but if you're interested in having an application link against your bootloader/ROM, then see "Loading symbol file while linking" for a hint on what you could do.
Build your "source"(?) image, scrape its mapfile and make a symbol file, then use that when you link your "jump"(?) image.
This does mean you need to link your "jump" image against a specific version of your "source" image.
If you need them to be semi-version independent (i.e. you define a set of functions that get exported, but you can rebuild on either side), then you need to export function pointers at known locations in your "source" image and link against those function pointers in your "jump" image. You can simplify the bookkeeping by making a structure of function pointers and accessing the functions through it on either side.
For example:
shared_functions.h:
struct FunctionPointerTable
{
void(*function1)(int);
void(*function2)(char);
};
extern struct FunctionPointerTable sharedFunctions;
Source file in "source" image:
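#include <stdio.h>
#include "shared_functions.h"
void function2Implementation(char b); /* forward declaration */
void function1Implementation(int a)
{
printf("You sent me an integer: %d\r\n", a);
function2Implementation((char)(a%256)); /* direct call: always the "source" version */
sharedFunctions.function2((char)(a%256)); /* call through the table: can be overridden */
}
void function2Implementation(char b)
{
printf("You sent me a char: %c\r\n", b);
}
struct FunctionPointerTable sharedFunctions =
{
function1Implementation,
function2Implementation,
};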
Source file in "jump" image:
#include "shared_functions.h"
sharedFunctions.function1(1024);
sharedFunctions.function2(100);
When you compile/link the "source", take its mapfile, extract the location of sharedFunctions, and create a symbol file that is linked with the source of the "jump" image.
Note: the printfs (or anything directly called by the shared functions) would come from the "source" image (and not the "jump" image).
If you need them to come from the "jump" image (or be overridable), then you need to access them through the same function pointer table, and the "jump" image needs to fix the function pointer table up with its version of the relevant function. I updated function1() to show this. The direct call to function2 will always be the "source" version. The call through the shared function table will call the "source" version unless the "jump" image updates the table to point to its own implementation.
You CAN get away from the structure, but then you need to export the function pointers one by one (not a big problem), but you want to keep them in order and at a fixed location, which means explicitly putting them in the linker descriptor file, etc. etc. I showed the structure method to distill it down to the easiest example.
As you can see, things get pretty hairy, and there is some penalty (calling through the function pointer is slower because you need to load up the address to jump to)
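Since the question also asks for a way to describe compatibility between the two images, one option is to put a small header in front of the function pointers. A minimal sketch, assuming both images agree on this layout; VersionedTable, TABLE_MAGIC, TABLE_MAJOR and table_is_compatible are hypothetical names chosen for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_MAGIC 0x46505442u /* arbitrary marker value ("FPTB") */
#define TABLE_MAJOR 1u          /* bumped on incompatible interface changes */

struct VersionedTable
{
    uint32_t magic; /* identifies a valid table at the known address */
    uint16_t major; /* must match exactly */
    uint16_t count; /* number of valid entries; grows on compatible additions */
    void (*function1)(int);
    void (*function2)(char);
};

/* Returns 1 if the "jump" image can use the table: right marker, same
   major version, and at least as many entries as it needs. */
int table_is_compatible(const struct VersionedTable *t, uint16_t needed)
{
    return t != NULL
        && t->magic == TABLE_MAGIC
        && t->major == TABLE_MAJOR
        && t->count >= needed;
}
```

The "jump" image would run this check once at startup, before calling through any entry. New functions are appended at the end of the structure so that older "jump" images keep working.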
As explained in the comments, we could imagine an application and a bootloader relying on the same dynamic library. Since both rely on the library, the application can be changed without impact on the library or the bootloader.
I did not find an easy way to build a shared library with arm-none-eabi-gcc. However, this document gives some alternatives to shared libraries. In your case, I would recommend the jump table solution.
Write a library with the functions that need to be used in both the bootloader and the application.
"library" code
#include <stdint.h>
typedef void (*genericFunctionPointer)(void);
void lib_f1(void);
uint8_t lib_f2(uint8_t param);
// use the linker script to set MySection at a known address
// I think this could be a structure as in Russ Schultz's solution, but a struct may or may not be laid out identically in lib and boot. That said, a struct would be much easier and would avoid many function pointer casts.
const genericFunctionPointer FpointerArray[] __attribute__ ((section ("MySection")))=
{
(genericFunctionPointer)lib_f1,
(genericFunctionPointer)lib_f2,
};
void lib_f1(void)
{
//some code
}
uint8_t lib_f2(uint8_t param)
{
//some code
return param; // placeholder result
}
application and/or bootloader code
#include <stdint.h>
typedef void (*genericFunctionPointer)(void);
// typedefs for casting each entry back to its real signature
typedef void (*correctCastF1)(void);
typedef uint8_t (*correctCastF2)(uint8_t);
enum
{
lib_f1,
lib_f2,
NB_F,
};
// Use the linker script to set MySection at the same address the library was compiled for
// in the linker script also mark this section as `NOLOAD` because it is initialized by the library and not by our code
// volatile is needed here because the array is read from flash and the compiler may otherwise assume its entries are NULL pointers
volatile const genericFunctionPointer FpointerArray[NB_F] __attribute__ ((section ("MySection")));
int main(void)
{
((correctCastF1)FpointerArray[lib_f1])();
uint8_t a = ((correctCastF2)FpointerArray[lib_f2])(10);
}
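To keep the casts out of the calling code, each table entry can get a small typed wrapper. A minimal host-side sketch, with an ordinary array standing in for the array placed in MySection; the function bodies, the IDX_ names, call_f1 and call_f2 are hypothetical and only there to make the example complete:

```c
#include <stdint.h>

typedef void (*genericFunctionPointer)(void);

/* Hypothetical library functions standing in for lib_f1 and lib_f2. */
int f1_calls = 0;
static void lib_f1(void) { f1_calls++; }
static uint8_t lib_f2(uint8_t param) { return (uint8_t)(param + 1); }

enum { IDX_F1, IDX_F2, NB_F };

/* On target this array would live in the shared section. */
static const genericFunctionPointer table[NB_F] = {
    (genericFunctionPointer)lib_f1,
    (genericFunctionPointer)lib_f2,
};

/* Typed wrappers: the cast back to the real signature appears once. */
void call_f1(void)
{
    ((void (*)(void))table[IDX_F1])();
}

uint8_t call_f2(uint8_t param)
{
    return ((uint8_t (*)(uint8_t))table[IDX_F2])(param);
}
```

Callers then write call_f2(10) and never see the generic pointer type. Each pointer must be cast back to exactly the signature the function was defined with before it is called.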
You can look into using linker sections. If you have your bootloader source code in folder bootloader, you can use
SECTIONS
{
.bootloader :
{
build_output/bootloader/*.o(.text)
} >flash_region1
.binary1 :
{
build_output/binary1/*.o(.text)
} >flash_region2
.binary2 :
{
build_output/binary2/*.o(.text)
} >flash_region3
}

Using system symbol table from VxWorks RTP

I have an existing project, originally implemented as a VxWorks 5.5 style kernel module.
This project creates many tasks that act as a "host" to run external code. We do something like this:
void loadAndRun(char* file, char* function)
{
//load the module
int fd = open (file, O_RDONLY,0644);
loadModule(fd, LOAD_ALL_SYMBOLS);
SYM_TYPE type;
FUNCPTR func;
symFindByName(sysSymTbl, function, (char**) &func, &type);
while (true)
{
func();
}
}
This all works a dream. However, the functions that get called are non-reentrant, with global data all over the place etc. We have a new requirement to be able to run multiple instances of these external modules, and my obvious first thought is to use VxWorks RTPs to provide memory isolation.
However, no matter what I try, I cannot persuade my new RTP project to compile and link.
error: 'sysSymTbl' undeclared (first use in this function)
If I add the correct include:
#include <sysSymTbl.h>
I get:
error: sysSymTbl.h: No such file or directory
and if I just define it extern:
extern SYMTAB_ID sysSymTbl;
I get:
error: undefined reference to `sysSymTbl'
I haven't even begun trying to stitch in the actual module load code; at the moment I just want to get the symbol lookup working.
So, is the system symbol table accessible from VxWorks RTP applications? Can moduleLoad be used?
EDIT
It appears that what I am trying to do is covered by the Application Programmers Guide in the section on Plugins (section 4.9 for V6.8) (thanks @nos), which is to use dlopen() etc. Like this:
void * hdl= dlopen("pathname",RTLD_NOW);
FUNCPTR func = dlsym(hdl,"FunctionName");
func();
However, I still end up in linker hell, even when I specify -Xbind-lazy -non-static to the compiler.
undefined reference to `_rtld_dlopen'
undefined reference to `_rtld_dlsym'
The problem here was that the documentation says to specify -Xbind-lazy and -non-static as compiler options. However, these should actually be added to the linker options.
libc.so.1 for the appropriate build target is then required on the target to satisfy the run-time link requirements.
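For reference, a minimal sketch of the dlopen()/dlsym() lookup with error checking, written against the POSIX <dlfcn.h> interface. The names plugin_entry and load_entry are hypothetical, and the assumed entry-point signature (no arguments, int result) is only an illustration:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Assumed entry-point type for this sketch. */
typedef int (*plugin_entry)(void);

/* Loads the shared object at `path` and looks up `symbol`.
   Returns NULL if either step fails; on success *handle_out receives
   the handle that must later be passed to dlclose(). */
plugin_entry load_entry(const char *path, const char *symbol, void **handle_out)
{
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL)
        return NULL; /* dlerror() describes the failure */

    /* dlsym returns void *; POSIX allows converting it to a function pointer. */
    plugin_entry entry = (plugin_entry)dlsym(handle, symbol);
    if (entry == NULL) {
        dlclose(handle);
        return NULL;
    }

    *handle_out = handle;
    return entry;
}
```

Checking both results separately makes it clear whether the shared object failed to load or the symbol was missing from it.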

IAR initializer function placement

Does anybody know how to deal with the following problem:
I have an IAR Embedded Workbench project. It runs its code from both SDRAM and Flash ROM. The code for SDRAM is loaded from an SD card. However, the SDRAM also holds some data, such as global or static variables, and some of them have to be initialized. The initialization step, the iar_data_init3 function call, runs after the low_level_init function. The problem is that, to initialize some of the variables in SDRAM, iar_data_init3 calls an initializer function whose code is itself located in SDRAM. That fails because the SDRAM code has not yet been loaded from the SD card.
I have tried manual initialization as described in the C/C++ development guide, but this didn't help.
The function which is called is __sti__routine, which provides initialization of variables. All of these functions are generated by IAR. Is there any way to tell the linker to put the initializer functions in Flash ROM?
EDIT 1:
Here is information from IAR manual for C/C++.
It is an example of how to use manual initialization.
In the linker config file:
initialize manually { section MYSECTION };
Then IAR documentation says:
you can use this source code example to initialize the section:
#pragma section = "MYSECTION"
#pragma section = "MYSECTION_init"
void DoInit()
{
char * from = __section_begin("MYSECTION_init");
char * to = __section_begin("MYSECTION");
memcpy(to, from, __section_size("MYSECTION"));
}
However, I can't understand, first of all, what the difference is between MYSECTION_init and MYSECTION.
Also, if I have a global variable:
SomeClass myclass;
and it should be placed in SDRAM, then how is the initialization done for it? I want to manually initialize the variable and place its initializing functions in flash ROM. (The problem is that placing the variable in SDRAM also places its initializing function in SDRAM.)
You can specify the location of variables and functions through the use of pragma preprocessor directives. You will need to use either one of the predefined sections or define your own.
You don't mention the specific flavor of IAR you're using. The following is from the Renesas IAR Compiler Reference Guide but you should check the proper reference guide to make sure that the syntax is exactly the same and to learn what the predefined sections are.
Use the @ operator or the #pragma location directive to place
groups of functions or global and static variables in named segments,
without having explicit control of each object. The variables must be
declared either __no_init or const. The segments can, for
example, be placed in specific areas of memory, or initialized or
copied in controlled ways using the segment begin and end operators.
This is also useful if you want an interface between separately
linked units, for example an application project and a boot loader
project. Use named segments when absolute control over the placement
of individual variables is not needed, or not useful.
Examples of placing functions in named segments
void f(void) @ "FUNCTIONS";
void g(void) @ "FUNCTIONS"
{
}
#pragma location="FUNCTIONS"
void h(void);
To override the default segment allocation, you can explicitly specify
a memory attribute other than the default:
__code32 void f(void) @ "FUNCTIONS";
Edit
Based on your comments you should have a linker file named generic_cortex.icf that defines your memory regions. In it should be instructions somewhat similar to the following:
/* Define the addressable memory */
define memory Mem with size = 4G;
/* Define a region named SDCARD with start address 0xA0000000 and to be 256 Mbytes large */
define region SDCARD = Mem:[from 0xA0000000 size 0x10000000 ];
/* Define a region named SDRAM with start address 0xB0000000 and to be 256 Mbytes large */
define region SDRAM = Mem:[from 0xB0000000 size 0x10000000 ];
/* Place sections named MyCardStuff in the SDCARD region */
place in SDCARD {section MyCardStuff };
/* Place sections named MyRAMStuff in the SDRAM region */
place in SDRAM {section MyRAMStuff };
/* Override default copy initialization for named section */
initialize manually { section MyRAMStuff };
The actual names, addresses and sizes will be different but should look similar. I'm just using the full size of the first two dynamic memory areas from the datasheet. What's happening here is you are assigning names to address space for the different types of memory (i.e. your SD Card and SDRAM) so that sections named during the compile will be placed in the correct locations by the linker.
So first you must define the address space with define memory:
The maximum size of possible addressable memories
The define memory directive defines a memory space with a given size,
which is the maximum possible amount of addressable memory, not
necessarily physically available.
Then tell it which chips go where with define region:
Available physical memory
The define region directive defines a region in the available memories
in which specific sections of application code and sections of
application data can be placed.
Next the linker needs to know in what region to place the named section with place in:
Placing sections in regions
The place at and place in directives place sets of sections with
similar attributes into previously defined regions.
And tell the linker you want to override part of its initialization with initialize manually:
Initializing the application
The directives initialize and do not initialize control how the
application should be started. With these directives, the application
can initialize global symbols at startup, and copy pieces of code.
Finally, in your C file, tell the compiler what goes into what sections and how to initialize sections declared manually.
SomeClass myClass @ "MyCardStuff";
#pragma section = "MyCardStuff"
#pragma section = "MyRAMStuff"
void DoInit()
{
/* Copy your code and variables from your SD Card into SDRAM */
char * from = __section_begin("MyCardStuff");
char * to = __section_begin("MyRAMStuff");
memcpy(to, from, __section_size("MyRAMStuff"));
/* Initialize your variables */
myClass.init();
}
In order to customize startup initialization among multiple different memory devices, you will need to study the IAR Development Guide for ARM very carefully. Also try turning on the --log initialization option and studying the logs and the map files to make sure you are getting what you want.
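On the MYSECTION_init versus MYSECTION question: as far as I understand the manual, MYSECTION is where the variables live at run time and MYSECTION_init is the linker-generated image of their initial values, so DoInit() just copies one onto the other. A host-side sketch of that copy, with ordinary arrays standing in for the two sections. The array names and copy_initializers are hypothetical, and on target the bounds would come from __section_begin and __section_size:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for MYSECTION_init: the initial values as stored in ROM. */
const unsigned char section_init_image[4] = { 0x11, 0x22, 0x33, 0x44 };

/* Stand-in for MYSECTION: where the variables live at run time. */
unsigned char section_runtime[sizeof section_init_image];

/* The same copy DoInit() performs, with the bounds passed in explicitly. */
void copy_initializers(void *to, const void *from, size_t size)
{
    memcpy(to, from, size);
}
```

The copy routine itself has to live in memory that is already usable at that point (Flash ROM here), which is the placement problem the question describes.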

How can I grab single key hit in D Programming Language + Tango?

I read this article and tried to do the exercises in the D programming language, but ran into a problem with the first exercise.
(1) Display series of numbers
(1,2,3,4, 5....etc) in an infinite
loop. The program should quit if
someone hits a specific key (Say
ESCAPE key).
Of course the infinite loop is not a big problem, but the rest is. How can I grab a key hit in D/Tango? The Tango FAQ says to use the C function kbhit() or get(), but as far as I know these are not in the C standard library, and they do not exist in the glibc that comes with the Linux machine I use for programming.
I know I can use a 3rd-party library like ncurses, but it has the same problem as kbhit() or get(): it is not part of the standard library in C or D and is not pre-installed on Windows. What I hope is that I can do this exercise using just D/Tango and run it on both Linux and Windows.
How can I do it?
Here's how you do it in the D programming language:
import std.c.stdio;
import std.c.linux.termios;
termios ostate; /* saved tty state */
termios nstate; /* values for editor mode */
// Open stdin in raw mode
/* Adjust output channel */
tcgetattr(1, &ostate); /* save old state */
tcgetattr(1, &nstate); /* get base of new state */
cfmakeraw(&nstate);
tcsetattr(1, TCSADRAIN, &nstate); /* set mode */
// Read characters in raw mode
c = fgetc(stdin);
// Close
tcsetattr(1, TCSADRAIN, &ostate); // return to original mode
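The same idea as a complete C sketch using POSIX termios, which D can call through its C bindings. It only switches off line buffering and echo rather than using full raw mode, so Ctrl-C keeps working. single_key_mode and read_single_key are hypothetical names, and this covers POSIX systems only, not Windows:

```c
#include <termios.h>
#include <unistd.h>

/* Derive a single-key input mode from the saved settings:
   no line buffering and no echo; signal keys stay enabled. */
struct termios single_key_mode(struct termios saved)
{
    struct termios mode = saved;
    mode.c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    mode.c_cc[VMIN] = 1;  /* read() returns after one byte */
    mode.c_cc[VTIME] = 0; /* no inter-byte timeout */
    return mode;
}

/* Read one key from stdin without waiting for Enter.
   Returns the byte read, or -1 on failure (for example when stdin
   is not a terminal). */
int read_single_key(void)
{
    struct termios saved;
    unsigned char ch;
    int result = -1;

    if (tcgetattr(STDIN_FILENO, &saved) != 0)
        return -1;

    struct termios mode = single_key_mode(saved);
    if (tcsetattr(STDIN_FILENO, TCSADRAIN, &mode) != 0)
        return -1;

    if (read(STDIN_FILENO, &ch, 1) == 1)
        result = ch;

    tcsetattr(STDIN_FILENO, TCSADRAIN, &saved); /* always restore */
    return result;
}
```

For the exercise, the loop would stop once read_single_key() returns 27, the ESC key code also used in the ncurses version in this thread.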
kbhit is indeed not part of any standard C interface, but can be found in conio.h.
However, you should be able to use getc/getchar from tango.stdc.stdio - I changed the FAQ you mention to reflect this.
D generally has all the C stdlib available (Tango or Phobos) so answers to this question for GNU C should work in D as well.
If tango doesn't have the needed function, generating the bindings is easy. (Take a look at CPP to cut through any macro junk.)
Thanks for both of your replies.
Unfortunately, my main development environment is Linux + GDC + Tango, so I don't have conio.h, since I don't use DMC as my C compiler.
I also found that both getc() and getchar() are line buffered in my development environment, so they could not achieve what I want.
In the end, I did this exercise using the GNU ncurses library. Since D can interface with C libraries directly, it did not take much effort. I just declared the prototypes of the functions I use, called them, and linked my program against the ncurses library.
It works perfectly on my Linux machine, but I still have not figured out how to do this without any 3rd-party library in a way that runs on both Linux and Windows.
import tango.io.Stdout;
import tango.core.Thread;
// Prototype for used ncurses library function.
extern(C)
{
void * initscr();
int cbreak ();
int getch();
int endwin();
int noecho();
}
// A keyboard handler that quits the program when the user hits the ESC key.
void keyboardHandler ()
{
initscr();
cbreak();
noecho();
while (getch() != 27) {
}
endwin();
}
// Main Program
int main ()
{
Thread handler = new Thread (&keyboardHandler);
handler.start();
for (int i = 0; ; i++) {
Stdout.format ("{}\r\n", i).flush;
// If keyboardHandler is not running, it means the user hit the
// ESC key, so we break the infinite loop.
if (handler.isRunning == false) {
break;
}
}
return 0;
}
As Lars pointed out, you can use _kbhit and _getch defined in conio.h and implemented in (I believe) msvcrt for Windows. Here's an article with C++ code for using _kbhit and _getch.