Input setting using Registers - embedded

I have a simple c program for printing n Fibonacci numbers and I would like to compile it to ELF object file. Instead of setting the number of fibonacci numbers (n) directly in my c code, I would like to set them in the registers since I am simulating it for an ARM processor.How can I do that?
Here is the code snippet
#include <stdio.h>
#include <stdlib.h>
#define ITERATIONS 3
static float fib(float i) {
return (i>1) ? fib(i-1) + fib(i-2) : i;
}
int main(int argc, char **argv) {
float i;
printf("starting...\n");
for(i=0; i<ITERATIONS; i++) {
printf("fib(%f) = %f\n", i, fib(i));
}
printf("finishing...\n");
return 0;
}
I would like to set the ITERATIONS counter in my Registers rather than in the code.
Thanks in advance

The register keyword can be used to suggest to the compiler that it uses a registers for the iterator and the number of iterations:
register float i;
register int numIterations = ITERATIONS;
but that will not help much. First of all, the compiler may or may not use your suggestion. Next, values will still need to be placed on the stack for the call to fib(), and, finally, depending on what functions you call within your loop, code in the procedure are calling could save your register contents in the stack frame at procedure entry, and restore them as part of the code implementing the procedure return.
If you really need to make every instruction count, then you will need to write machine code (using an assembly language). That way, you have direct control over your register usage. Assembly language programming is not for the faint of heart. Assembly language development is several times slower than using higher level languages, your risk of inserting bugs is greater, and they are much more difficult to track down. High level languages were developed for a reason, and the C language was developed to help write Unix. The minicomputers that ran the first Unix systems were extremely slow, but the reason C was used instead of assembly was that even then, it was more important to have code that took less time to code, had fewer bugs, and was easier to debug than assembler.
If you want to try this, here are the answers to a previous question on stackoverflow about resources for ARM programming that might be helpful.
One tactic you might take is to isolate your performance-critical code into a procedure, write the procedure in C, the capture the generated assembly language representation. Then rewrite the assembler to be more efficient. Test thoroughly, and get at least one other set of eyeballs to look the resulting code over.
Good Luck!

Make ITERATIONS a variable rather than a literal constant, then you can set its value directly in your debugger/simulator's watch or locals window just before the loop executes.
Alternatively as it appears you have stdio support, why not just accept the value via console input?

Related

Error: No operator "=" matches these operands in "Servo_Project.cpp", Line: 15, Col: 22

So I tried using code from another post around here to see if I could use it, it was a code meant to utilize a potentiometer to move a servo motor, but when I attempted to compile it is gave the error above saying No operator "=" matches these operands in "Servo_Project.cpp". How do I go about fixing this error?
Just in case ill say this, the boards I was trying to compile the code were a NUCLEO-L476RG, the board from the post I mentioned utilized Nucleo L496ZG board and a Tower Pro Micro Servo 9G.
#include "mbed.h"
#include "Servo.h"
Servo myservo(D6);
AnalogOut MyPot(A0);
int main() {
float PotReading;
PotReading = MyPot.read();
while(1) {
for(int i=0; i<100; i++) {
myservo = (i/100);
wait(0.01);
}
}
}
This line:
myservo = (i/100);
Is wrong in a couple of ways. First, i/100 will always be zero - integer division truncates in C++. Second, there's not an = operator that allows an integer value to be assigned to a Servo object. YOu need to invoke some kind of Servo method instead, likely write().
myservo.write(SOMETHING);
The SOMETHING should be the position or speed of the servo you're trying to get working. See the Servo class reference for an explanation. Your code tries to use fractions from 0-1 and thatvisn't going to work - the Servo wants a position/speed between 0 and 180.
You should look in the Servo.h header to see what member functions and operators are implemented.
Assuming what you are using is this, it does have:
Servo& operator= (float percent);
Although note that the parameter is float and you are passing an int (the parameter is also in the range 0.0 to 1.0 - so not "percent" as its name suggests - so be wary, both the documentation and the naming are poor). You should have:
myservo = i/100.0f;
However, even though i / 100 would produce zero for all i in the loop, that does not explain the error, since an implicit cast should be possible - even if clearly undesirable. You should look in the actual header you are using to see if the operator= is declared - possibly you have the wrong file or a different version or just an entirely different implementation that happens to use teh same name.
I also notice that if you look in the header, there is no documentation mark-up for this function and the Servo& operator= (Servo& rhs); member is not documented at all - hence the confusing automatically generated "Shorthand for the write and read functions." on the Servo doc page when the function shown is only one of those things. It is possible it has been removed from your version.
Given that the documentation is incomplete and that the operator= looks like an after thought, the simplest solution is to use the read() / write() members directly in any case. Or implement your own Servo class - it appears to be only a thin wrapper/facade of the PwmOut class in any case. Since that is actually part of mbed rather than user contributed code of unknown quality, you may be on firmer ground.

What is the order of local variables on the stack?

I'm currently trying to do some tests with the buffer overflow vulnerability.
Here is the vulnerable code
void win()
{
printf("code flow successfully changed\n");
}
int main(int argc, char **argv)
{
volatile int (*fp)();
char buffer[64];
fp = 0;
gets(buffer);
if(fp) {
printf("calling function pointer, jumping to 0x%08x\n", fp);
fp();
}
}
The exploit is quite sample and very basic: all what I need here is to overflow the buffer and override the fp value to make it hold the address of win() function.
While trying to debug the program, I figured out that fb is placed below the buffer (i.e with a lower address in memory), and thus I am not able to modify its value.
I thought that once we declare a local variable x before y, x will be higher in memory (i.e at the bottom of the stack) so x can override y if it exceeds its boundaries which is not the case here.
I'm compiling the program with gcc gcc version 5.2.1, no special flags (only tested -O0)
Any clue?
The order of local variable on the stack is unspecified.
It may change between different compilers, different versions or different optimization options. It may even depend on the names of the variables or other seemingly unrelated things.
The order of local variables on the stack is not defined until compile/link (build) time. I'm no expert certainly, but I think you'd have to do some sort of a "hex dump", or perhaps run the code in a debugger environment to find out where it's allocated. And I'd also guess that this would be a relative address, not an absolute one.
And as #RalfFriedl has indicated in his answer, the location could change under any number of compiler options invoked for different platforms, etc.
But we do know it's possible to do buffer overflows. Although you do hear less about them now due to defensive measures such as address space layout randomization (ASLR), they're still around and paying the bills for someone I'd guess. There are of course many many online articles on the subject; here's one that seems fairly current(https://www.synopsys.com/blogs/software-security/detect-prevent-and-mitigate-buffer-overflow-attacks/).
Good luck (should you even say that to someone practicing buffer overflow attacks?). At any rate, I hope you learn some things, and use it for good :)

Is inline PTX more efficient than C/C++ code?

I have noticed that PTX code allows for some instructions with complex semantics, such as bit field extract (bfe), find most-significant non-sign bit (bfind), and population count (popc).
Is it more efficient to use them explicitly rather than write code with their intended semantics in C/C++?
For example: "population count", or popc, means counting the one bits. So should I write:
__device__ int popc(int a) {
int d = 0;
while (a != 0) {
if (a & 0x1) d++;
a = a >> 1;
}
return d;
}
for that functionality, or should I, rather, use:
__device__ int popc(int a) {
int d;
asm("popc.u32 %1 %2;":"=r"(d): "r"(a));
return d;
}
? Will the inline PTX be more efficient? Should we write inline PTX to to get peak performance?
also - does GPU have some extra magic instruction corresponding to PTX instructions?
The compiler may identify what you're doing and use a fancy instruction to do it, or it may not. The only way to know in the general case is to look at the output of the compilation in ptx assembly, by using -ptx flag added to nvcc. If the compiler generates it for you, there is no need to hand-code the inline assembly yourself (or use an instrinsic).
Also, whether or not it makes a performance difference in the general case depends on whether or not the code path is used in a significant way, and on other factors such as the current performance limiters of your kernel (e.g. compute-bound or memory-bound).
A few more points in addition to #RobertCrovella's answer:
Even if you do use PTX for something - that should happen rarely. Limit it to small functions of no more than a few PTX lines - which you can then re-use for multiple purposes as you see fit, with most of your coding being in C/C++.
An example of this principle are the intrinsics #njuffa mentiond, in (that's not an official copy of the file I think). Please read it through to see which intrinsics are available to you. That doesn't mean you should use them all, of course.
For your specific example - you do want the PTX over the first version; it certainly won't do any harm. But, again, it is also an example of how you do not need to actually write PTX, since popc has a corresponding __popc intrinsic (again, as #njuffa has noted).
You might also want to have a look at the source code of some CUDA-based libraries to see what kind of PTX snippets they've chosen to use.

Calling functions from within function(float *VeryBigArray,long SizeofArray) from within objC method fails with EXC_BAD_ACCESS

Ok I finally found the problem. It was inside the C function(CarbonTuner2) not the objC method. I was creating inside the function an array of the same size as the file size so if the filesize was big it created a really big array and my guess is that when I called another function from there, the local variables were put on the stack which created the EXC_BAD_ACCESS. What I did then is instead of using a variable to declare to size of the array I put the number directly. Then the code didnt even compile. it knew. The error wassomething like: Array size too big. I guess working 20+hours in a row isnt good XD But I am definitly gonna look into tools other than step by step debuggin to figure these ones out. Thanks for your help. Here is the code. If you divide gFileByteCount by 2 you dont get the error anymore:
// ConverterController.h
# import <Cocoa/Cocoa.h>
# import "Converter.h"
#interface ConverterController : NSObject {
UInt64 gFileByteCount ;
}
-(IBAction)ProcessFile:(id)sender;
void CarbonTuner2(long numSampsToProcess, long fftFrameSize, long osamp);
#end
// ConverterController.m
# include "ConverterController.h"
#implementation ConverterController
-(IBAction)ProcessFile:(id)sender{
UInt32 packets = gTotalPacketCount;//alloc a buffer of memory to hold the data read from disk.
gFileByteCount=250000;
long LENGTH=(long)gFileByteCount;
CarbonTuner2(LENGTH,(long)8192/2, (long)4*2);
}
#end
void CarbonTuner2(long numSampsToProcess, long fftFrameSize, long osamp)
{
long numFrames = numSampsToProcess / fftFrameSize * osamp;
float g2DFFTworksp[numFrames+2][2 * fftFrameSize];
double hello=sin(2.345);
}
Your crash has nothing to do with incompatibilities between C and ObjC.
And as previous posters said, you don't need to include math.h.
Run your code under gdb, and see where the crash happens by using backtrace.
Are you sure you're not sending bad arguments to the math functions?
E.g. this causes BAD_ACCESS:
double t = cos(*(double *)NULL);
Objective C is built directly on C, and the C underpinnings can and do work.
For an example of using math.h and parts of standard library from within an Objective C module, see:
http://en.wikibooks.org/wiki/Objective-C_Programming/syntax
There are other examples around.
Some care is needed around passing the variables around; use the C variables for the C and standard library calls; don't mix the C data types and Objective C data types incautiously. You'll usually want a conversion here.
If that is not the case, then please consider posting the code involved, and the error(s) you are receiving.
And with all respect due to Mr Hellman's response, I've hit errors when I don't have the header files included; I prefer to include the headers. But then, I tend to dial the compiler diagnostics up a couple of notches, too.
For what it's worth, I don't include math.h in my Cocoa app but have no problem using math functions (in C).
For example, I use atan() and don't get compiler errors, or run time errors.
Can you try this without including math.h at all?
First, you should add your code to your question, rather than posting it as an answer, so people can see what you're asking about. Second, you've got all sorts of weird problems with your memory management here - gFileByteCount is used to size a bunch of buffers, but it's set to zero, and doesn't appear to get re-set anywhere.
err = AudioFileReadPackets (fileID,
false, &bytesReturned, NULL,0,
&packets,(Byte *)rawAudio);
So, at this point, you pass a zero-sized buffer to AudioFileReadPackets, which prompty overruns the heap, corrupting the value of who knows what other variables...
fRawAudio =
malloc(gFileByteCount/(BITS/8)*sizeof(fRawAudio));
Here's another, minor error - you want sizeof(*fRawAudio) here, since you're trying to allocate an array of floats, not an array of float pointers. Fortunately, those entities are the same size, so it doesn't matter.
You should probably start with some example code that you know works (SpeakHere?), and modify it. I suspect there are other similar problems in the code yoou posted, but I don't have time to find them right now. At least get the rawAudio buffer appropriately-sized and use the values returned from AudioFileReadPackets appropriately.

Measuring performance after Tail Call Optimization(TCO)

I have an idea about what it is. My question is :-
1.) If i program my code which is amenable to Tail Call optimization(Last statement in a function[recursive function] being a function call only, no other operation there) then do i need to set any optimization level so that compiler does TCO. In what mode of optimization will compiler perform TCO, optimizer for space or time.
2.) How do i find out which all compilers (MSVC, gcc, ARM-RVCT) does support TCO
3.) Assuming some compiler does TCO, we enable it then, What is the way to find out that the compielr has actually done TCO? Will Code size, tell it or Cycles taken to execute it will tell that or both?
-AD
Most compilers support TCO, it is a relatively old technique. As far as how to enable it with a specific compiler, check the documentation for your compilers. gcc will enable the optimization at every optimization level except -O1, I think the specific option for this is -foptimize-sibling-calls. As far as how to tell how/if the compiler is doing TCO, look at the assembler output (gcc -S for example) or disassemble the object code.
Optimization is Compiler specific. Consult the documentation for the various optimization flags for them
You will find that in the Compilers documentation too. If you are curious, you can write a tail recursive function and pass it a big argument, and lookout for a stack-overflow. (tho checking the generated assembler might be a better choice, if you understand the code generated.)
You just use the debugger, and look out the address of function arguments/local variables. If they increase/decrease on each logical frame that the debugger shows (or if it actually only shows one frame, even though you did several calls), you know whether TCO was done or wasn't done.
If you want your compiler to do tail call optimization, just check either
a) the doc of the compiler at which optimization level it will be performed or
b) check the asm, if the function will call itself (you dont even need big asm knowledge to spot the just the symbol of the function again)
If you really really want tail recursion my question would be:
Why dont you perform the tail call removal yourself? It means nothing else than removing the recursion, and if its removable then its not only possible by the compiler on low level but also on algorithmic level by you, that you can programm it direct into your code (it means nothing else than go for a loop instead of a call to yourself).
One way to determine if tail-call is happening is to see if you can force a stack overflow. The following program does not produce a stack overflow using VC++ 2005 Express Edition and, even though its results exceed the capacity of long double rather quickly, you can tell that all of the iterations are being processed when TCO is happening:
/* FibTail.c 0.00 UTF-8 dh:2008-11-23
* --|----1----|----2----|----3----|----4----|----5----|----6----|----*
*
* Demonstrate Fibonacci computation by tail call to see whether it is
* is eliminated through compiler optimization.
*/
#include <stdio.h>
long double fibcycle(long double f0, long double f1, unsigned i)
{ /* accumulate successive fib(n-i) values by tail calls */
if (i == 0) return f1;
return fibcycle(f1, f0+f1, --i);
}
long double fib(unsigned n)
{ /* the basic fib(n) setup and return. */
return fibcycle(1.0, 0.0, n);
}
int main(int argc, char* argv[ ])
{ /* compute some fibs until something breaks */
int i;
printf("\n i fib(i)\n\n");
for (i = 1; i > 0; i+=i)
{ /* Do for powers of 2 until i flips negative
or stack overflow, whichever comes first */
printf("%12d %30.20LG \n", i, fib((unsigned) i) );
}
printf("\n\n");
return 0;
}
Notice, however, that the simplifications to make a pure tail-call in fibcycle is tantamount to figuring out an interative version that doesn't do a tail-call at all (and will work with or without TCO in the compiler.
It might be interesting to experiment in order to see how well the TCO can find optimizations that are not already near-optimal and easily replaced by iterations.