CGAL 4.12's Lazy_exact_nt not exact?

The test function below works as expected with CGAL 4.9.1 but with CGAL 4.12 the computations are not exact. Any ideas what can cause the problem (more details below)?
#include <iostream>
#include <CGAL/Exact_predicates_exact_constructions_kernel.h>
using std::cout;
using std::endl;
typedef CGAL::Exact_predicates_exact_constructions_kernel K;
typedef K::FT dbl;
void test4()
{
// 1. Create the smallest possible double and verify
double smallDouble(1.0);
while(smallDouble/2.0>0) smallDouble/=2.0;
if(smallDouble/2.0==0) cout<<"can't divide smallDouble anymore, as expected"<<endl;
// 2. Make it a dbl (K::FT)
dbl a(smallDouble);
// 3. Let b be even smaller ( smaller than the smallest double )
dbl b(a/2.0);
// 4. Show interval and try fit_in_double
cout<<"a.approx()="<<a.approx()<<endl;
cout<<"b.approx()="<<b.approx()<<endl;
double d;
if(CGAL::internal::fit_in_double(b,d))
{
cout<<"Yes, b fits in double d: "<<d<<endl;
}
else
{
cout<<"Before b.exact(): b does not fit back into double (as expected)"<<endl;
}
// 5. Call exact and try fit_in_double again
cout<<"\nCalling exact()"<<endl;
b.exact();
cout<<"a.approx()="<<a.approx()<<endl;
cout<<"b.approx()="<<b.approx()<<endl;
if(CGAL::internal::fit_in_double(b,d))
{
cout<<"Yes, after exact() b fits in double d: "<<d<<" - Huh, not as expected!"<<endl;
}
else
{
cout<<"NOK after exact()"<<endl;
}
if(b<a) cout<<"b<a, as expected"<<endl;
else cout<<"b >= a, not expected"<<endl;
double c(to_double(b));
cout<<"c="<<c<<endl;
if(c==b) cout<<"c==b, not as expected"<<endl;
}
The output of CGAL4.9.1 is
can't divide smallDouble anymore, as expected
a.approx()=[4.94066e-324;4.94066e-324]
b.approx()=[0;4.94066e-324]
Before b.exact(): b does not fit back into double (as expected)
Calling exact()
a.approx()=[4.94066e-324;4.94066e-324]
b.approx()=[0;4.94066e-324]
NOK after exact()
b<a, as expected
c=0
The output of CGAL4.12 is
can't divide smallDouble anymore, as expected
a.approx()=[4.94066e-324;4.94066e-324]
b.approx()=[0;4.94066e-324]
Before b.exact(): b does not fit back into double (as expected)
Calling exact()
a.approx()=[4.94066e-324;4.94066e-324]
b.approx()=[4.94066e-324;4.94066e-324]
Yes, after exact() b fits in double d: 4.94066e-324 - Huh, not as expected!
b >= a, not expected
c=4.94066e-324
c==b, not as expected
Details: Ubuntu 18.04, gcc 7.3. I have used the script CGAL412/Scripts/scripts/cgal_create_cmake_script to create a CMakeLists.txt and thus the compiler options should be correct. CGAL4.12 has been compiled from source using
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/CGAL412 ../..
make
make install

This is a bug in CGAL. Unless you also use MPFR directly yourself, you can easily work around it for this specific value by calling mpfr_set_emin (-1073); at the beginning of your program. However, it may not solve all such issues (we are also missing a call to mpfr_subnormalize), for instance for Gmpq(DBL_TRUE_MIN)*3/2. The safest would be to use the old code. For this, locate a test #if MPFR_VERSION_MAJOR >= 3 in files Gmpq.h and mpq_class.h, and replace it with #if 0.
I filed this in CGAL's github so we don't forget to fix it.
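For readers who want to check step 1 of the question in isolation, here is a minimal sketch in plain C++ (no CGAL or MPFR involved). The function names are made up for illustration, and the sketch assumes IEEE-754 doubles with subnormals enabled and the default round-to-nearest mode. The smallest positive double is 2^-1074, which is presumably why the workaround above uses an emin of -1073: MPFR normalises significands to [1/2, 1).

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: repeat the halving loop from the question.
double smallest_positive_double()
{
    double d = 1.0;
    while (d / 2.0 > 0) d /= 2.0;
    return d;
}

// Hypothetical helper: true when d/2 is no longer representable and rounds to zero.
bool halving_underflows(double d)
{
    return d / 2.0 == 0.0;
}

// The value the loop is expected to reach on an IEEE-754 platform: 2^-1074.
double expected_smallest()
{
    return std::ldexp(1.0, -1074);
}
```

Any exact number type must therefore keep b = a/2 strictly between 0 and a, which is what the 4.9.1 output shows and the 4.12 output violates.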


Approximation using gmp mpf_class

I am writing a UnitTest using Catch2.
I want to check if two vectors are equal. They look like the following using gmplib:
std::vector<mpf_class> result
Due to me 'faking' the expected_result vector, I get the following message after a failed test:
unittests/test.cpp:01: FAILED:
REQUIRE( actual_result == expected_result )
with expansion:
{ 0.5, 0.166667, 0.166667, 0.166667 }
==
{ 0.5, 0.166667, 0.166667, 0.166667 }
So I was looking for a function that could do an approximation for me.
I just wasn't successful in finding a solution that worked out for me.
I found some Comparison Functions but they do not work on my project.
EDIT:
The "minimal, reproducible example" would simply be:
TEST_CASE("DemoTest") {
// simplified:
mpf_class a = 1;
mpf_class b = 6;
mpf_class actual_result = a / b;
mpf_class expected_result= 0.16666666667;
REQUIRE(actual_result == expected_result);
}
The "only" difference to my real application is that the results are stored in vectors. But because I am only "faking" the result by saying it is "0.1666666667" it probably doesn't fit the == anymore. So I need a function that takes an approximation and compares the range like epsilon = +-0.001.
Edit:
After implementing the solution @Arc suggested, it worked well until I had some values that were not completely "even".
So I have a failure with the following values:
actual 0.16666666666666666666700000000000000000000000000000
expected 0.16666666666666665741500000000000000000000000000000
Even though my "expected" value looks like this:
mpf_class expected = 0.16666666666666666666700000000000000000000000000000
This brings me back to my original question: is there a way to compare an approximation of the number with an epsilon of, say, +-0.0001, or what would be the best way to fix this issue?
First, we need to see some Minimal, Reproducible Example to be sure of what is happening. You can for example cut down some code from your test.cpp until you are left with just a few lines of code, but the issue still happens. Also, please provide compilation and running instructions. Frequently, a little bit of explanation on what your goals are may also help. As Catch2 is available on GitHub you don't need to provide it.
Without seeing the code, the best I can guess is that your code is trying to compare the mpf_t types in the mpf_class using the == operator, which I'm afraid has not been overloaded (see here). You should compare mpf_ts with the cmp function, since the C type mpf_t is actually a struct containing the pointer to the actual significand limbs. Check some usage examples in the tests/cxx/ directory of GMP (like here).
I note you are using GNU MP version 4.1, which is very old; you probably want to move to the latest version, 6.2.1, if possible. Also, for floating-point work it's recommended that you use the GNU MPFR library instead of GMP floats.
EDIT: I did not yet manage to run Catch2, but the issue with your code is the expected_result is actually not equal to the actual_result. In GMP mpf_t variables are created with a 64-bit significand precision (on 64-bit machines), so that the division a / b actually results in a binary that prints 0.166666666666666666667 (that's 19 sixes after the digit 1). Try printing the result with gmp_printf("%.50Ff\n", actual_result);, because the standard cout output will only give you the value rounded to 6 digits: 0.166667.
But the problem is that you can't just assign this like expected_result = 0.166666666666666666667, because in C/C++ numeric constants are parsed as double; you have to use the string assignment overload to get more precision.
Nor can you easily (or, in general, justifiably) coin a decimal string that will convert to exactly the same binary value given by a / b, because decimal to float conversion has subtleties, see for example here and here.
So, it all depends on your application and the kind of numerical validation you aim to do. If you know that your decimal validation values are correct to some known precision, and if you set the mpf_t variables to a sufficient precision (using for example mpf_set_prec), then you can use a tolerance comparison, like so.
In C++ (without Catch2), it works like this:
#include <iostream>
#include <gmpxx.h>
using namespace std;
int main (void)
{
mpf_class a = 1;
mpf_class b = 6;
mpf_class actual = a / b;
mpf_class expected;
mpf_class tol;
expected = "0.166666666666666666666666666666667";
tol = "1e-30";
cout << "actual " << actual << "\n";
cout << "expected " << expected << "\n";
gmp_printf("actual %.50Ff\n", actual);
gmp_printf("expected %.50Ff\n", expected);
gmp_printf("tol %.50Ff\n", tol);
mpf_class diff = expected - actual;
gmp_printf("diff %.50Ff\n", diff);
if (abs(actual - expected) < tol)
cout << "ok\n";
else
cout << "nop\n";
return 0;
}
And compile with -lgmpxx -lgmp options.
It produces the output:
actual 0.166667
expected 0.166667
actual 0.16666666666666666666700000000000000000000000000000
expected 0.16666666666666666666700000000000000000000000000000
tol 0.00000000000000000000000000000100000000000000000000
diff 0.00000000000000000000000000000000033333529249058470
ok
If I understand Catch2 well, it should be ok if you assign expected_result with string then compare with REQUIRE(abs(actual - expected) < tol).
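Since the real application stores its results in vectors, the tolerance test can be wrapped in a small helper. The following is only a sketch: approx_equal is a made-up name, not part of GMP or Catch2, and it is written as a template that needs just subtraction, unary minus and comparison, so it can be checked with double. It is an assumption, not something tested here, that it instantiates unchanged for std::vector<mpf_class>.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise comparison with an absolute tolerance.
// Returns false if the sizes differ or any |a[i] - b[i]| exceeds tol.
template <typename T>
bool approx_equal(const std::vector<T>& a, const std::vector<T>& b, const T& tol)
{
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        T diff = a[i] - b[i];
        if (diff < 0) diff = -diff;   // absolute value without relying on abs()
        if (diff > tol) return false;
    }
    return true;
}
```

In a test this would be used as REQUIRE(approx_equal(actual_result, expected_result, tol)), with tol chosen to match the number of digits that the expected values are known to be correct to.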

ROL / ROR on variable using inline assembly only in Objective-C [duplicate]

A few days ago, I asked the question below. Because I was in need of a quick answer, I added:
The code does not need to use inline assembly. However, I haven't found a way to do this using Objective-C / C++ / C instructions.
Today, I would like to learn something. So I ask the question again, looking for an answer using inline assembly.
I would like to perform ROR and ROL operations on variables in an Objective-C program. However, I can't manage it – I am not an assembly expert.
Here is what I have done so far:
uint8_t v1 = ....;
uint8_t v2 = ....; // v2 is either 1, 2, 3, 4 or 5
asm("ROR v1, v2");
the error I get is:
Unknown use of instruction mnemonic with unknown size suffix
How can I fix this?
A rotate is just two shifts - some bits go left, the others right - once you see this rotating is easy without assembly. The pattern is recognised by some compilers and compiled using the rotate instructions. See wikipedia for the code.
Update: Xcode 4.6.2 (others not tested) on x86-64 compiles the double shift + or to a rotate for 32 & 64 bit operands; for 8 & 16 bit operands the double shift + or is kept. Why? Maybe the compiler understands something about the performance of these instructions, maybe they just didn't optimise - but in general, if you can avoid assembler, do so; the compiler invariably knows best! Also, declaring the functions static inline, or using macros defined in the same way as the standard macro MAX (a macro has the advantage of adapting to the type of its operands), lets the operations be inlined.
Addendum after OP comment
Here is the x86-64 assembler as an example; for full details of how to use the asm construct, start here.
First the non-assembler version:
static inline uint32 rotl32_i64(uint32 value, unsigned shift)
{
// assume shift is in range 0..31 or subtraction would be wrong
// however we know the compiler will spot the pattern and replace
// the expression with a single roll and there will be no subtraction
// so if the compiler changes this may break without:
// shift &= 0x1f;
return (value << shift) | (value >> (32 - shift));
}
void test_rotl32(uint32 value, unsigned shift)
{
uint32 shifted = rotl32_i64(value, shift);
NSLog(@"%8x <<< %u -> %8x", value & 0xFFFFFFFF, shift, shifted & 0xFFFFFFFF);
}
If you look at the assembler output for profiling (so the optimiser kicks in) in Xcode (Product > Generate Output > Assembly File, then select Profiling in the pop-up menu at the bottom of the window) you will see that rotl32_i64 is inlined into test_rotl32 and compiles down to a rotate (roll) instruction.
Now producing the assembler directly yourself is a bit more involved than for the ARM code FrankH showed. This is because to take a variable shift value a specific register, cl, must be used, so we need to give the compiler enough information to do that. Here goes:
static inline uint32 rotl32_i64_asm(uint32 value, unsigned shift)
{
// i64 - shift must be in register cl so create a register local assigned to cl
// no need to mask as i64 will do that
register uint8 cl asm ( "cl" ) = shift;
uint32 shifted;
// emit the rotate left long
// %n values are replaced by args:
// 0: "=r" (shifted) - any register (r), result(=), store in var (shifted)
// 1: "0" (value) - *same* register as %0 (0), load from var (value)
// 2: "r" (cl) - any register (r), load from var (cl - which is the cl register so this one is used)
__asm__ ("roll %2,%0" : "=r" (shifted) : "0" (value), "r" (cl));
return shifted;
}
Change test_rotl32 to call rotl32_i64_asm and check the assembly output again - it should be the same, i.e. the compiler did as well as we did.
Further note that if the commented out masking line in rotl32_i64 is included it essentially becomes rotl32 - the compiler will do the right thing for any architecture all for the cost of a single and instruction in the i64 version.
So asm is there if you need it, using it can be somewhat involved, and the compiler will invariably do as well or better by itself...
HTH
The 32bit rotate in ARM would be:
__asm__("MOV %0, %1, ROR %2\n" : "=r"(out) : "r"(in), "M"(N));
where N is required to be a compile-time constant.
But the output of the barrel shifter, whether used on a register or an immediate operand, is always a full-register-width; you can shift a constant 8-bit quantity to any position within a 32bit word, or - as here - shift/rotate the value in a 32bit register any which way.
But you cannot rotate 16bit or 8bit values within a register using a single ARM instruction. None such exists.
That's why the compiler, on ARM targets, when you use the "normal" (portable [Objective-]C/C++) code (in << xx) | (in >> (w - xx)) will create you one assembler instruction for a 32bit rotate, but at least two (a normal shift followed by a shifted or) for 8/16bit ones.
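The portable shift-and-or pattern discussed in both answers, with the masking applied so that a shift count of 0 (or one larger than the width) stays well defined, could be sketched as follows. The function names are illustrative only, and whether a given compiler turns each one into a single rotate instruction is not guaranteed, as noted above for 8-bit operands.

```cpp
#include <cstdint>

// Rotate a 32-bit value left; the count is reduced modulo 32 first.
static inline std::uint32_t rotl32(std::uint32_t value, unsigned shift)
{
    shift &= 31;
    // (32 - shift) & 31 maps shift == 0 to 0, avoiding an undefined 32-bit shift.
    return (value << shift) | (value >> ((32 - shift) & 31));
}

// Rotate an 8-bit value left; the operands are promoted to int, so the
// result is truncated back to 8 bits by the cast.
static inline std::uint8_t rotl8(std::uint8_t value, unsigned shift)
{
    shift &= 7;
    return static_cast<std::uint8_t>((value << shift) | (value >> ((8 - shift) & 7)));
}

// Rotate an 8-bit value right.
static inline std::uint8_t rotr8(std::uint8_t value, unsigned shift)
{
    shift &= 7;
    return static_cast<std::uint8_t>((value >> shift) | (value << ((8 - shift) & 7)));
}
```

For the uint8_t variables in the question this would be called as v1 = rotr8(v1, v2); for example, rotl8(0x81, 1) gives 0x03 and rotr8(0x81, 1) gives 0xC0.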

PIC C18: Converting double to string

I am using PIC18F2550. Programming it with C18 language.
I need a function that converts double to string like below:
void dtoa( char *szString, // Output string
double dbDouble, // Input number
unsigned char ucFPlaces) // Number of digits in the resulting fractional part
{
// ??????????????
}
To be called like this in the main program:
void main (void)
{
// ...
double dbNumber = 123.45678;
char szText[9];
dtoa(szText, dbNumber, 3); // szText becomes "123.456" or rounded to "123.457"
// ...
}
So write one!
5mins, a bit of graph paper and a coffee is all it should take.
In fact it's a good interview question
Tiny printf might work for you: http://www.sparetimelabs.com/tinyprintf/index.html
Generally, the Newlib C library (BSD license, from RedHat, part of Cygwin as well as used in many "bare-metal" embedded-systems compilers) is a good place to start for useful sources for things that would be in the standard C library.
The Newlib dtoa.c sources are in the src/newlib/libc/stdlib subdirectory of the source tree:
Online source browser: http://sourceware.org/cgi-bin/cvsweb.cgi/src/newlib/libc/stdlib/?cvsroot=src#dirlist
Direct link to the current version of the dtoa.c file: http://sourceware.org/cgi-bin/cvsweb.cgi/~checkout~/src/newlib/libc/stdlib/dtoa.c?rev=1.5&content-type=text/plain&cvsroot=src
The file is going to be a little odd, in that Newlib uses some odd macros for the function declarations, but should be straightforward to adapt -- and, being BSD-licensed, you can pretty much do whatever you want with it if you keep the copyright notice on it.
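If pulling in the Newlib sources is more than is needed, a hand-written version along the lines suggested in the first answer might look like the sketch below. It is written in the C-style subset of C++ and has not been tried on C18. It assumes the integer part fits in an unsigned long, that the caller's buffer is large enough, and it rounds half up at the last requested digit; the number of trustworthy digits depends on how wide double is on the target.

```cpp
#include <cassert>

// Sketch: convert value to decimal text with `places` fractional digits.
// Assumes |value| fits in an unsigned long and that out has room for the
// sign, the digits, the decimal point and the terminating '\0'.
void dtoa(char *out, double value, unsigned char places)
{
    assert(out != nullptr);

    if (value < 0) {            // emit the sign and continue with |value|
        *out++ = '-';
        value = -value;
    }

    double half = 0.5;          // half a unit in the last requested place
    for (unsigned char i = 0; i < places; ++i) half /= 10.0;
    value += half;              // round half up

    unsigned long whole = (unsigned long)value;
    double frac = value - (double)whole;

    char tmp[20];               // integer digits, produced in reverse order
    int n = 0;
    do {
        tmp[n++] = (char)('0' + whole % 10);
        whole /= 10;
    } while (whole != 0);
    while (n > 0) *out++ = tmp[--n];

    if (places > 0) {
        *out++ = '.';
        while (places-- > 0) {  // peel off one fractional digit at a time
            frac *= 10.0;
            int digit = (int)frac;
            *out++ = (char)('0' + digit);
            frac -= digit;
        }
    }
    *out = '\0';
}
```

With the call from the question, dtoa(szText, 123.45678, 3) leaves "123.457" in szText.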

How to do numerical integration with quantum harmonic oscillator wavefunction?

How to do numerical integration (what numerical method, and what tricks to use) for one-dimensional integration over infinite range, where one or more functions in the integrand are 1d quantum harmonic oscillator wave functions. Among others I want to calculate matrix elements of some function in the harmonic oscillator basis:
\phi_n(x) = N_n H_n(x) \exp(-x^2/2)
where H_n(x) is the Hermite polynomial
V_{m,n} = \int_{-\infty}^{\infty} \phi_m(x) V(x) \phi_n(x) \, dx
Also in the case where there are quantum harmonic wavefunctions with different widths.
The problem is that the wavefunctions phi_n(x) oscillate, which becomes a problem for large n: algorithms such as the adaptive Gauss-Kronrod quadrature from GSL (GNU Scientific Library) take a long time to compute and have large errors.
An incomplete answer, since I'm a little short on time at the moment; if others can't complete the picture, I can supply more details later.
Apply orthogonality of the wavefunctions whenever and wherever possible. This should significantly cut down the amount of computation.
Do analytically whatever you can. Lift constants, split integrals by parts, whatever. Isolate the region of interest; most wavefunctions are band-limited, and reducing the area of interest will do a lot to save work.
For the quadrature itself, you probably want to split the wavefunctions into three pieces and integrate each separately: the oscillatory bit in the center plus the exponentially-decaying tails on either side. If the wavefunction is odd, you get lucky and the tails will cancel each other, meaning you only have to worry about the center. For even wavefunctions, you only have to integrate one and double it (hooray for symmetry!). Otherwise, integrate the tails using a high order Gauss-Laguerre quadrature rule. You might have to calculate the rules yourself; I don't know if tables list good Gauss-Laguerre rules, as they're not used too often. You probably also want to check the error behavior as the number of nodes in the rule goes up; it's been a long time since I used Gauss-Laguerre rules and I don't remember if they exhibit Runge's phenomenon. Integrate the center part using whatever method you like; Gauss-Kronrod is a solid choice, of course, but there's also Fejer quadrature (which sometimes scales better to high numbers of nodes, which might work nicer on an oscillatory integrand) and even the trapezoidal rule (which exhibits stunning accuracy with certain oscillatory functions). Pick one and try it out; if results are poor, give another method a shot.
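As one concrete way to try the trapezoidal rule mentioned above, here is a minimal sketch. It makes several assumptions of its own: the wavefunctions are the dimensionless ones from the question (same width for both), they are generated already normalised by the standard three-term recurrence so that no large Hermite polynomial values or normalisation constants appear, and the infinite range is cut off a fixed margin beyond the classical turning point sqrt(2n+1). The function names, the margin of 8 and the default step of 0.01 are illustrative choices, not tuned values.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

// Normalised oscillator functions phi_0(x) .. phi_nmax(x), built with
//   phi_{n+1} = sqrt(2/(n+1)) x phi_n - sqrt(n/(n+1)) phi_{n-1}.
std::vector<double> ho_wavefunctions(int nmax, double x)
{
    std::vector<double> phi(nmax + 1);
    const double pi = std::acos(-1.0);
    phi[0] = std::pow(pi, -0.25) * std::exp(-0.5 * x * x);
    if (nmax >= 1) phi[1] = std::sqrt(2.0) * x * phi[0];
    for (int n = 1; n < nmax; ++n)
        phi[n + 1] = std::sqrt(2.0 / (n + 1)) * x * phi[n]
                   - std::sqrt(double(n) / (n + 1)) * phi[n - 1];
    return phi;
}

// V_{m,n} by the trapezoidal rule on [-L, L], with L a little beyond the
// classical turning point of the higher of the two states.
double matrix_element(int m, int n, const std::function<double(double)>& V,
                      double h = 0.01)
{
    const int nmax = std::max(m, n);
    const double L = std::sqrt(2.0 * nmax + 1.0) + 8.0;
    const long steps = static_cast<long>(std::ceil(L / h));
    double sum = 0.0;
    for (long i = -steps; i <= steps; ++i) {
        const double x = i * h;
        const std::vector<double> phi = ho_wavefunctions(nmax, x);
        const double w = (i == -steps || i == steps) ? 0.5 : 1.0;  // end-point weights
        sum += w * phi[m] * V(x) * phi[n];
    }
    return h * sum;
}
```

A quick sanity check is V(x) = x^2, for which the diagonal elements should come out as n + 1/2. For much larger n the step must shrink with the oscillation length, and the cutoff has to stay small enough that exp(-x^2/2) does not underflow.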
Hardest question ever on SO? Hardly :)
I'd recommend a few other things:
Try transforming the function onto a finite domain to make the integration more manageable.
Use symmetry where possible - break it up into the sum of two integrals from negative infinity to zero and zero to infinity and see if the function is symmetric or anti-symmetric. It could make your computing easier.
Look into Gauss-Laguerre quadrature and see if it can help you.
The WKB approximation?
I am not going to explain or qualify any of this right now. This code is written as is and probably incorrect. I am not even sure if it is the code I was looking for, I just remember that years ago I did this problem and upon searching my archives I found this. You will need to plot the output yourself, some instruction is provided. I will say that the integration over infinite range is a problem that I addressed and upon execution of the code it states the round off error at 'infinity' (which numerically just means large).
// compile g++ base.cc -lm
#include <iostream>
#include <cstdlib>
#include <fstream>
#include <math.h>
using namespace std;
int main ()
{
double xmax,dfx,dx,x,hbar,k,dE,E,E_0,m,psi_0,psi_1,psi_2;
double w,num;
int n,temp,parity,order;
double last;
double propogator(double E,int parity);
double eigen(double E,int parity);
double f(double x, double psi, double dpsi);
double g(double x, double psi, double dpsi);
double rk4(double x, double psi, double dpsi, double E);
ofstream datas ("test.dat");
E_0= 1.602189*pow(10.0,-19.0);// ev joules conversion
dE=E_0*.001;
//w^2=k/m v=1/2 k x^2 V=??? = E_0/xmax x^2 k-->
//w=sqrt( (2*E_0)/(m*xmax) );
//E=(0+.5)*hbar*w;
cout << "Enter what energy level you're looking for, as an (0,1,2...) INTEGER: ";
cin >> order;
E=0;
for (n=0; n<=order; n++)
{
parity=0;
//if its even parity is 1 (true)
temp=n;
if ( (n%2)==0 ) {parity=1; }
cout << "Energy " << n << " has these parameters: ";
E=eigen(E,parity);
if (n==order)
{
propogator(E,parity);
cout <<" The positive values of the wave function were written to sho.dat \n";
cout <<" In order to plot the data should be reflected about the y-axis \n";
cout <<" evenly for even energy levels and oddly for odd energy levels\n";
}
E=E+dE;
}
}
double propogator(double E,int parity)
{
ofstream datas ("sho.dat") ;
double hbar =1.054*pow(10.0,-34.0);
double m =9.109534*pow(10.0,-31.0);
double E_0= 1.602189*pow(10.0,-19.0);
double dx =pow(10.0,-10);
double xmax= 100*pow(10.0,-10.0)+dx;
double dE=E_0*.001;
double last=1;
double x=dx;
double psi_2=0.0;
double psi_0=0.0;
double psi_1=1.0;
// cout <<parity << " parity passsed \n";
psi_0=0.0;
psi_1=1.0;
if (parity==1)
{
psi_0=1.0;
psi_1=m*(1.0/(hbar*hbar))* dx*dx*(0-E)+1 ;
}
do
{
datas << x << "\t" << psi_0 << "\n";
psi_2=(2.0*m*(dx/hbar)*(dx/hbar)*(E_0*(x/xmax)*(x/xmax)-E)+2.0)*psi_1-psi_0;
//cout << psi_1 << "=psi_1\n";
psi_0=psi_1;
psi_1=psi_2;
x=x+dx;
} while ( x<= xmax);
//I return 666 as a dummy value sometimes to check the function has run
return 666;
}
double eigen(double E,int parity)
{
double hbar =1.054*pow(10.0,-34.0);
double m =9.109534*pow(10.0,-31.0);
double E_0= 1.602189*pow(10.0,-19.0);
double dx =pow(10.0,-10);
double xmax= 100*pow(10.0,-10.0)+dx;
double dE=E_0*.001;
double last=1;
double x=dx;
double psi_2=0.0;
double psi_0=0.0;
double psi_1=1.0;
do
{
psi_0=0.0;
psi_1=1.0;
if (parity==1)
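{psi_0=1.0; psi_1=m*(1.0/(hbar*hbar))* dx*dx*(0-E)+1 ;} // assign the outer variables; redeclaring them here would shadow them and leave the even-parity start values unused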
x=dx;
do
{
psi_2=(2.0*m*(dx/hbar)*(dx/hbar)*(E_0*(x/xmax)*(x/xmax)-E)+2.0)*psi_1-psi_0;
psi_0=psi_1;
psi_1=psi_2;
x=x+dx;
} while ( x<= xmax);
if ( sqrt(psi_2*psi_2)<=1.0*pow(10.0,-3.0))
{
cout << E << " is an eigen energy and " << psi_2 << " is psi of 'infinity' \n";
return E;
}
else
{
if ( (last >0.0 && psi_2<0.0) ||( psi_2>0.0 && last<0.0) )
{
E=E-dE;
dE=dE/10.0;
}
}
last=psi_2;
E=E+dE;
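} while (E<=E_0);
// no eigenvalue found below E_0: return the last trial energy so the function always returns a value
return E;
}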
If this code seems correct, wrong, interesting or you do have specific questions ask and I will answer them.
I am a student majoring in physics, and I also encountered the problem. These days I keep thinking about this question and get my own answer. I think it may help you solve this question.
1. In GSL there are functions that can help you integrate an oscillatory function: qawo and qawf. You can pick a value a and split the integration into two parts, [0,a] and [a,+infinity]. On the first interval you can use any GSL integration function you want, and on the second interval you can use qawo or qawf.
2. Or you can integrate the function up to an upper limit b, that is, over [0,b]. The integral can then be calculated with a Gauss-Legendre method, which is provided in GSL. There may be some difference between the real value and the computed value, but if you set b properly the difference can be neglected, as long as it is less than the accuracy you want. With this method the GSL function only has to be called once and its result can be used many times, because what it returns is the points and their corresponding weights, and the integral is just the sum of f(xi)*wi; for more details you can search for Gauss-Legendre quadrature on Wikipedia. Multiplication and addition are much faster than a full integration call.
3. There is also a function that can calculate the integral over an infinite range: qagi; you can look it up in the GSL user's guide. But it has to be called every time you need to calculate the integral, which may be time consuming; I'm not sure how long it will take in your program.
I suggest option 2.
If you are going to work with Harmonic oscillator functions less than n = 100 you might want to try:
http://www.mymathlib.com/quadrature/gauss_hermite.html
The program computes an integral via gauss-hermite quadrature with 100 zeroes and weights (the zeroes of H_100). Once you go over Hermite_100 the integrals are not as accurate.
Using this integration method I wrote a program calculating exactly what you want to calculate and it works fairly well. Also, there might be a way to go beyond n=100 by using the asymptotic form of the Hermite-polynomial zeroes but I haven't looked into it.
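To make the "sum of f(xi)*wi" idea concrete without depending on the linked routine, here is a tiny sketch that hard-codes the 3-point Gauss-Hermite rule (nodes 0 and ±sqrt(3/2), weights 2 sqrt(pi)/3 and sqrt(pi)/6). It approximates the integral of f(x) exp(-x^2) over the whole real line and is exact when f is a polynomial of degree 5 or less. The function name is made up for illustration; a real calculation would swap in a much larger table of nodes and weights, such as the 100-point one mentioned above.

```cpp
#include <cmath>

// 3-point Gauss-Hermite rule: approximates the integral over (-inf, inf)
// of f(x) * exp(-x*x) dx as a weighted sum of three samples of f.
template <typename F>
double gauss_hermite3(F f)
{
    const double sqrt_pi = std::sqrt(std::acos(-1.0));
    const double node    = std::sqrt(1.5);        // zeros of H_3 are 0 and +-sqrt(3/2)
    const double w_outer = sqrt_pi / 6.0;         // weight of the two outer nodes
    const double w_mid   = 2.0 * sqrt_pi / 3.0;   // weight of the node at 0
    return w_outer * f(-node) + w_mid * f(0.0) + w_outer * f(node);
}
```

For a matrix element, the Gaussian factors of the two wavefunctions supply the exp(-x^2) weight, so f(x) would be N_m N_n H_m(x) V(x) H_n(x). For instance, the ground-state expectation value of x^2 is gauss_hermite3 applied to x^2, divided by sqrt(pi), which gives 1/2.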

Measuring performance after Tail Call Optimization(TCO)

I have an idea about what it is. My question is :-
1.) If I write my code so that it is amenable to tail call optimization (the last statement in a [recursive] function being only a function call, with no other operation there), do I need to set any optimization level so that the compiler does TCO? In what mode of optimization will the compiler perform TCO: optimizing for space or for time?
2.) How do I find out which compilers (MSVC, gcc, ARM-RVCT) support TCO?
3.) Assuming some compiler does TCO and we enable it, what is the way to find out that the compiler has actually done TCO? Will code size tell, or the cycles taken to execute it, or both?
-AD
Most compilers support TCO; it is a relatively old technique. As for how to enable it with a specific compiler, check the documentation for your compilers. gcc enables the optimization at -O2, -O3 and -Os (not at -O0 or -O1); the specific option for this is -foptimize-sibling-calls. As for how to tell whether the compiler is doing TCO, look at the assembler output (gcc -S, for example) or disassemble the object code.
Optimization is compiler specific. Consult the documentation of each compiler for its optimization flags.
You will find that in the compiler's documentation too. If you are curious, you can write a tail-recursive function, pass it a big argument, and look out for a stack overflow (though checking the generated assembler might be a better choice, if you understand the generated code).
You just use the debugger, and look out the address of function arguments/local variables. If they increase/decrease on each logical frame that the debugger shows (or if it actually only shows one frame, even though you did several calls), you know whether TCO was done or wasn't done.
If you want your compiler to do tail call optimization, just check either
a) the compiler's documentation, to see at which optimization level it is performed, or
b) the asm, to see whether the function calls itself (you don't even need much asm knowledge to spot the symbol of the function appearing again)
If you really, really want tail recursion, my question would be:
Why don't you perform the tail call removal yourself? It means nothing more than removing the recursion, and if it is removable, then it can be removed not only by the compiler at a low level but also by you at the algorithmic level, so you can program it directly into your code (which means nothing more than using a loop instead of a call to yourself).
One way to determine if tail-call is happening is to see if you can force a stack overflow. The following program does not produce a stack overflow using VC++ 2005 Express Edition and, even though its results exceed the capacity of long double rather quickly, you can tell that all of the iterations are being processed when TCO is happening:
/* FibTail.c 0.00 UTF-8 dh:2008-11-23
* --|----1----|----2----|----3----|----4----|----5----|----6----|----*
*
* Demonstrate Fibonacci computation by tail call to see whether it is
* is eliminated through compiler optimization.
*/
#include <stdio.h>
long double fibcycle(long double f0, long double f1, unsigned i)
{ /* accumulate successive fib(n-i) values by tail calls */
if (i == 0) return f1;
return fibcycle(f1, f0+f1, --i);
}
long double fib(unsigned n)
{ /* the basic fib(n) setup and return. */
return fibcycle(1.0, 0.0, n);
}
int main(int argc, char* argv[ ])
{ /* compute some fibs until something breaks */
int i;
printf("\n i fib(i)\n\n");
for (i = 1; i > 0; i+=i)
{ /* Do for powers of 2 until i flips negative
or stack overflow, whichever comes first */
printf("%12d %30.20LG \n", i, fib((unsigned) i) );
}
printf("\n\n");
return 0;
}
Notice, however, that the simplification needed to make a pure tail call in fibcycle is tantamount to figuring out an iterative version that doesn't do a tail call at all (and will work with or without TCO in the compiler).
It might be interesting to experiment in order to see how well the TCO can find optimizations that are not already near-optimal and easily replaced by iterations.
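To illustrate the point about removing the tail call by hand, here is a sketch that puts the tail-recursive fibcycle from the program above next to a loop that performs the same updates. fib_iter is a made-up name; the recursive pair is repeated only so the two can be compared.

```cpp
// Tail-recursive form, as in FibTail.c above.
long double fibcycle(long double f0, long double f1, unsigned i)
{
    if (i == 0) return f1;
    return fibcycle(f1, f0 + f1, i - 1);
}

long double fib(unsigned n)
{
    return fibcycle(1.0, 0.0, n);
}

// The same computation with the tail call replaced by a loop: each
// iteration performs the argument update that the recursive call made.
long double fib_iter(unsigned n)
{
    long double f0 = 1.0, f1 = 0.0;
    while (n-- > 0) {
        const long double next = f0 + f1;
        f0 = f1;
        f1 = next;
    }
    return f1;
}
```

The loop version uses constant stack space at any optimization level, so it is also a convenient reference when checking whether the recursive version was actually optimized.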