Using g++12 and CMake.
I have a source file
holes5.cpp
which does not
#include <execution>
and does not need to link to tbb.
Now if I add
#include <execution>
it does not require linking to tbb either. So what exact step does it take to start depending on tbb (thus requiring linking to tbb). I am confused.
Installed tbb via
sudo apt install libtbb-dev
In CMakeList.txt:
list(APPEND CMAKE_MODULE_PATH "deps/tbb/cmake/")
find_package(TBB REQUIRED)
set (SOURCES holes5.cpp)
add_executable(holes5 ${SOURCES})
set (SOURCES par_unseq.cpp)
add_executable(par_unseq ${SOURCES})
target_link_libraries(par_unseq PUBLIC TBB::tbb)
par_unseq.cpp:
#include <cstdint>
#include <iostream>
#include <chrono>
#include <cmath>
#include <numeric>
#include <utility>
#include <algorithm>
#include <execution>
using namespace std;
double f(double x) noexcept
{
const int N = 1000;
for (int i = 0; i < N; ++i) {
x = log2(x);
x = cos(x);
x = x * x + 1;
}
return x;
}
double sum(const vector<double>& vec)
{
double sum = 0;
for (auto x : vec)
sum += x;
return sum;
}
int main()
{
cout << "Hey! Your machine has " << thread::hardware_concurrency() << " cores!\n";
// Make an input vector.
const int N = 1000000;
vector<double> vecInput(N);
for (int i = 0; i < N; ++i)
vecInput[i] = i + 1;
{ // Case #1: Plain transform, no parallelism.
auto startTime = chrono::system_clock::now();
vector<double> vecOutput(N);
transform(vecInput.cbegin(), vecInput.cend(), vecOutput.begin(), f);
auto endTime = chrono::system_clock::now();
chrono::duration<double> diff = endTime - startTime;
cout << "1. sum = " << sum(vecOutput) << ", time = " << diff.count() << "\n";
}
{ // Case #2: Transform with parallel unsequenced.
vector<double> vecOutput(N);
auto startTime = chrono::system_clock::now();
transform(execution::par_unseq,
vecInput.cbegin(), vecInput.cend(), vecOutput.begin(), f);
auto endTime = chrono::system_clock::now();
chrono::duration<double> diff = endTime - startTime;
cout << "2. sum = " << sum(vecOutput) << ", time = " << diff.count() << "\n";
}
}
/* Output:
Hey! Your machine has 4 cores!
1. sum = 1.60346e+06, time = 43.7997
2. sum = 1.60346e+06, time = 10.8235
*/
According to the libstdc++ documentation (see Note 3), you must link -ltbb whenever you include the <execution> header.
If you don't actually use any of the functions from <execution>, or you do but the compiler manages to inline them all, then your program might link even without -ltbb. This might be dependent on a specific version of the header or library, compiler version, or compilation options. And even then, it does not guarantee that the program will work correctly. It could be that including the header changes the compilation in some way, and that -ltbb includes initialization code needed for this to work correctly. Even if it doesn't now, it might in the future.
So I think the answer is very simple: if you include the header, link the library. If you don't need the <execution> features, then don't include the header in the first place.
Related
Now I want to do something on the branch and bound using SCIP, and begin from the branching rule. While I tracking the branching process I found something I cannot understand. Begining from getting the branching variable candidates using SCIPgetLPBranchCands, I get the SCIP_VAR** lpcands, then I select the first variable to branch using SCIPbranchVar. This branching rule works on each focused node.
Consider the branch decision at the number 1 node (the root node), after executing SCIPbranchVar function, SCIPgetChildren return two child nodes (2 and 3), but the child node number of root node changed, two or more child node appeared. Suppose node selector selected the node of number 2 to branch, SCIPgetSiblings return 3 siblings (3, 4, and 5).
To make it clear that what happened, I print the branching decision path using SCIPprintNodeRootPath, and print the relationship between nodes. The result shows that after I branching at a node on the selected variable, the SCIIP branching at the same node on another variable which I don't know.
I have print these information generated by scipoptsuite-7.0.1/scip/examples/Binpacking, no more branching decision is made. But more child nodes appearing after I replace the ryanfoster rule using the branching on the first variable. After that, I try the create child node branching style for my procedure, extra nodes appeared anyway.
I cannot find out what happened and how to totally control the branching process, it is very important for my work in the future for it determined how to design the branching rule. I have even try to read the source code of SCIP, unfortunately, it's too hard for me to read.
My procedure as follows:
main.cpp
/* standard library includes */
#include <stdio.h>
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
/* scip includes */
#include "objscip/objscip.h"
#include "objscip/objscipdefplugins.h"
/*user file includes*/
#include "branch_rule.h"
/* namespace usage */
using namespace std;
using namespace scip;
static
SCIP_RETCODE execmain()
{
SCIP* scip = NULL;
/* initialize SCIP environment */
SCIP_CALL( SCIPcreate(&scip) );
SCIP_CALL( SCIPincludeBranchRrule(scip) );
/* include default plugins */
SCIP_CALL( SCIPincludeDefaultPlugins(scip) );
SCIP_CALL( SCIPsetIntParam(scip,"presolving/maxrestarts",0) );
SCIP_CALL( SCIPsetSeparating(scip, SCIP_PARAMSETTING_OFF, TRUE) );
SCIP_CALL(SCIPreadProb(scip, "map18.mps.gz", NULL));
SCIP_CALL( SCIPsolve(scip) );
return SCIP_OKAY;
}
int main(int argc, char** argv)
{
return execmain() != SCIP_OKAY ? 1 : 0;
}
branch_rule.h
#ifndef VRP_BRANCH_RULE_H
#define VRP_BRANCH_RULE_H
#include "scip/scip.h"
#include <vector>
SCIP_RETCODE SCIPincludeBranchRrule(
SCIP* scip
);
#endif //VRP_BRANCH_RULE_H
branch_rule.cpp
#include "branch_rule.h"
#include <assert.h>
#include <string.h>
#include <iostream>
#include <sstream>
#include <iomanip>
#include <fstream>
#include "scip/struct_tree.h"
#include "scip/type_var.h"
using namespace std;
/**#name Branching rule properties
*
* #{
*/
#define BRANCHRULE_NAME "branch rule test"
#define BRANCHRULE_DESC "branch rule test"
#define BRANCHRULE_PRIORITY 50000
#define BRANCHRULE_MAXDEPTH -1
#define BRANCHRULE_MAXBOUNDDIST 1.0
void printBranchTreeNode(SCIP_NODE *node, SCIP_NODE **siblings, int nsiblings){
ofstream f1("branch_path/node_information.txt", ios::app);
if(!f1)return;
f1 << "node number:" << node->number << ", sibling number: " << nsiblings;
if(NULL==node->parent)
f1 << std::endl;
else{
f1 << ", parent number: " << node->parent->number << std::endl;
}
f1 << setw(20) << "siblings:" << std::endl;
for(int i = 0; i < nsiblings; ++i){
f1 << setw(20) << "node number:" << siblings[i]->number << ", sibling number: " << nsiblings << ", parent number: " << siblings[i]->parent->number << endl;
}
f1.close();
}
static
SCIP_DECL_BRANCHEXECLP(branchExeclpTest)
{
assert(scip != NULL);
assert(branchrule != NULL);
assert(strcmp(SCIPbranchruleGetName(branchrule), BRANCHRULE_NAME) == 0);
assert(result != NULL);
SCIP_NODE *current_node = SCIPgetCurrentNode(scip) ;
SCIP_NODE **leaves = NULL;
int nleaves;
SCIP_CALL( SCIPgetLeaves (scip, &leaves, &nleaves) );
std::vector<SCIP_NODE*> leaves_vector;
for(int i =0; i < nleaves; ++i){
leaves_vector.push_back(leaves[i]);
}
SCIP_NODE **current_node_slibings = NULL;
int ncurrent_node_slibings;
SCIP_CALL( SCIPgetSiblings(scip, ¤t_node_slibings, &ncurrent_node_slibings) );
std::vector<SCIP_NODE*> slibings_vector;
for(int i =0; i < ncurrent_node_slibings; ++i){
slibings_vector.push_back(current_node_slibings[i]);
}
printBranchTreeNode(current_node, current_node_slibings, ncurrent_node_slibings);
SCIP_NODE **childrens;
int nchildrens;
SCIP_CALL( SCIPgetChildren(scip, &childrens, &nchildrens) );
std::vector<SCIP_NODE*> childrens_vector;
for(int i =0; i < nchildrens; ++i){
childrens_vector.push_back(childrens[i]);
}
stringstream filename;
filename.str("");
filename << "branch_path/branch_path_to_node_" << current_node->number <<".dat";
std::string stringFileName = filename.str();
FILE *fp = fopen(stringFileName.c_str(), "wt+");
SCIP_CALL( SCIPprintNodeRootPath(scip, current_node, fp) );
fclose(fp);
// create child node branching style
SCIP_NODE *childsame;
SCIP_NODE *childdiffer;
SCIP_CALL( SCIPcreateChild(scip, &childsame, 0.0, SCIPgetLocalTransEstimate(scip)) );
SCIP_CALL( SCIPcreateChild(scip, &childdiffer, 0.0, SCIPgetLocalTransEstimate(scip)) );
/*
// SCIPbranchVar branching style
SCIP_VAR** lpcands;
SCIP_Real* lpcandssol;
SCIP_Real* lpcandsfrac;
int nlpcands;
int npriolpcands;
int nfracimplvars;
SCIP_CALL( SCIPgetLPBranchCands(scip, &lpcands, &lpcandssol, &lpcandsfrac, &nlpcands, &npriolpcands, &nfracimplvars) );
SCIP_NODE* downchild;
SCIP_NODE* eqchild;
SCIP_NODE* upchild;
SCIP_CALL( SCIPbranchVar(scip, lpcands[0], &downchild, &eqchild, &upchild) );
// SCIP_CALL( SCIPbranchVar(scip, var, NULL, NULL, NULL) );
SCIP_CALL( SCIPgetLeaves (scip, &leaves, &nleaves) );
int numLeaves = SCIPgetNLeaves(scip);
SCIP_CALL( SCIPgetSiblings(scip, ¤t_node_slibings, &ncurrent_node_slibings) );
slibings_vector.clear();
for(int i =0; i < ncurrent_node_slibings; ++i){
slibings_vector.push_back(current_node_slibings[i]);
}
SCIP_CALL( SCIPgetChildren(scip, &childrens, &nchildrens) );
childrens_vector.clear();
for(int i =0; i < nchildrens; ++i){
childrens_vector.push_back(childrens[i]);
}
*/
return SCIP_OKAY;
}
SCIP_RETCODE SCIPincludeBranchRrule(
SCIP* scip /**< SCIP data structure */
){
SCIP_BRANCHRULEDATA* branchruledata;
SCIP_BRANCHRULE* branchrule;
branchruledata = NULL;
branchrule = NULL;
// include branching rule
SCIP_CALL( SCIPincludeBranchruleBasic(scip, &branchrule, BRANCHRULE_NAME, BRANCHRULE_DESC, BRANCHRULE_PRIORITY, BRANCHRULE_MAXDEPTH,
BRANCHRULE_MAXBOUNDDIST, branchruledata) );
assert(branchrule != NULL);
SCIP_CALL( SCIPsetBranchruleExecLp(scip, branchrule, branchExeclpTest) );
return SCIP_OKAY;
}
Does anyone can help me to solve this problem?
so the first thing I notice when looking at your code is that you do not set the result pointer. After branching, you need to set *result = SCIP_BRANCHED;.
Can you please try that and check if it fixes your problem?
I am new to CGAL.
I tried to modify Examples/Arrangement_on_surfaces_2 Bezier_curves.cpp to save arrangement to file as shown below:
//! \file examples/Arrangement_on_surface_2/Bezier_curves.cpp
// Constructing an arrangement of Bezier curves.
#include <fstream>
#include <CGAL/basic.h>
#ifndef CGAL_USE_CORE
#include <iostream>
int main ()
{
std::cout << "Sorry, this example needs CORE ..." << std::endl;
return 0;
}
#else
#include <CGAL/Cartesian.h>
#include <CGAL/CORE_algebraic_number_traits.h>
#include <CGAL/Arr_Bezier_curve_traits_2.h>
#include <CGAL/Arrangement_2.h>
#include <CGAL/IO/Arr_iostream.h>
#include "arr_inexact_construction_segments.h"
#include "arr_print.h"
typedef CGAL::CORE_algebraic_number_traits Nt_traits;
typedef Nt_traits::Rational NT;
typedef Nt_traits::Rational Rational;
typedef Nt_traits::Algebraic Algebraic;
typedef CGAL::Cartesian<Rational> Rat_kernel;
typedef CGAL::Cartesian<Algebraic> Alg_kernel;
typedef Rat_kernel::Point_2 Rat_point_2;
typedef CGAL::Arr_Bezier_curve_traits_2<Rat_kernel, Alg_kernel, Nt_traits>
Traits_2;
typedef Traits_2::Curve_2 Bezier_curve_2;
typedef CGAL::Arrangement_2<Traits_2> Arrangement_2;
//typedef CGAL::Arrangement_2<Traits_2> Arrangement;
int main (int argc, char *argv[])
{
// Get the name of the input file from the command line, or use the default
// Bezier.dat file if no command-line parameters are given.
const char *filename = (argc > 1) ? argv[1] : "Bezier.dat";
const char *outfilename = (argc > 1) ? argv[1] : "BezierOut.dat";
// Open the input file.
std::ifstream in_file (filename);
if (! in_file.is_open()) {
std::cerr << "Failed to open " << filename << std::endl;
return 1;
}
// Read the curves from the input file.
unsigned int n_curves;
std::list<Bezier_curve_2> curves;
Bezier_curve_2 B;
unsigned int k;
in_file >> n_curves;
for (k = 0; k < n_curves; k++) {
// Read the current curve (specified by its control points).
in_file >> B;
curves.push_back (B);
std::cout << "B = {" << B << "}" << std::endl;
}
in_file.close();
// Construct the arrangement.
Arrangement_2 arr;
insert (arr, curves.begin(), curves.end());
// Print the arrangement size.
std::ofstream out_file;
out_file.open(outfilename);
out_file << "The arrangement size:" << std::endl
<< " V = " << arr.number_of_vertices()
<< ", E = " << arr.number_of_edges()
<< ", F = " << arr.number_of_faces() << std::endl;
out_file << arr;
out_file.close();
return 0;
}
#endif
If I comment out the line out_file << arr; it works fine. Otherwise it generates a C2678 error in read_x_monotone_curve in Arr_text_formtter.h
I am using Visual Studio 15 x86.
Thank you for any help.
I solve this by modifying the print_arrangement(arr) routine in arr_print.h to save_arrangement(arr) with a std::ofstream in place of std::cout.
It appears that the << operator does not work.
If someone else has a better solution I am open to it.
Points of intersections in an arrangement of Bezier curves cannot be represented in an exact manner. Therefore, such an arrangement cannot be saved using the default export (<<) operator and the standard format.
The easiest solution is to store the curves, but this means that the arrangement must be recomputed each time the curves are read. Perhaps other solution could be devised, but they are not implemented.
I am working on this code (in c++) and I finished but i have 2 errors on line 19 when I use them in for loops about variables y and m, saying that they are uninitialized local variables. I don't see how this is possible because I declared them at the beginning as int and their value is assigned when the user inputs in cin.
#include <iostream>
#include <string>
#include <cmath>
#include <math.h>
#include <vector>
using namespace std;
int main()
{
int a, b, n, l = 0;
cin >> a, b, n;
for (int i = 0; i < 20; i++)
{
for (int j = 0; j < 20; j++)
{
if (l < (i*a + j*b) && (i*a + j*b) <= n)
l = i*a + j*b;
}
}
cout << l;
return 0;
}
I'm not in a position to test this, but Multiple inputs on one line suggests that your syntax should be
cin >> a >> b >> c;
Regardless, I think the compiler is suggesting that assignment to all variables isn't guaranteed by cin so without explicit initialisation when they're declared you're assuming too much.
I am using QtCreator on windows, and I would like to know how to make my compiler optimize the output.
My understanding is that MinGW is a port of GCC. So, I should be able to use arguments such as -O2. However, in the "Projects" bar, the only things I see are :
Build step for qmake (probably not here, qmake is about the .pro files / MOC / Qt stuff ...)
Build step for mingw32-make (probably)
Clean steps (probably not)
So, I tried to add -O2 in the "Make arguments" box, but unfortunately, I get an error "invalid option --O"
For anyone interested, I was trying to make an implementation of the Ackermann function because I read that :
The Ackermann function, due to its definition in terms of extremely
deep recursion, can be used as a benchmark of a compiler's ability to
optimize recursion
The code (which doesn't really use Qt) :
#include <QtCore/QCoreApplication>
#include <QDebug>
#include <ctime>
using namespace std;
int nbRecursion;
int nbRecursions9;
int Ackermann(int m, int n){
nbRecursion++;
if(nbRecursion % 1000000 == 0){
qDebug() << nbRecursions9 << nbRecursion;
}
if(nbRecursion == 1000000000){
nbRecursion = 0;
nbRecursions9++;
}
if(m==0){
return n+1;
}
if(m>0 && n>0){
return Ackermann(m-1,Ackermann(m, n-1));
}
if(m>0 && n==0){
return Ackermann(m-1,1);
}
qDebug() << "Bug at " << m << ", " << n;
}
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
nbRecursion = 0;
nbRecursions9 = 0;
int m = 3;
int n = 13;
clock_t begin = clock();
Ackermann(m,n);
clock_t end = clock();
double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
qDebug() << "There are " << CLOCKS_PER_SEC << " CLOCKS_PER_SEC";
qDebug() << "There were " << nbRecursions9 << nbRecursion << " recursions in " << elapsed_secs << " seconds";
double timeX = 1000000000.0*((elapsed_secs)/(double)nbRecursion);
if(nbRecursions9>0){
timeX += elapsed_secs/(double)nbRecursions9;
}
qDebug() << "Time for a function call : " << timeX << " nanoseconds";
return a.exec();
}
-O2 is used by default when you do a release build. Only debug builds don't use optimization. Regardless, if you want to use specific compiler options, you do so in the project file (*.pro) itself, by appending your options to the QMAKE_CFLAGS_RELEASE (for C files) and QMAKE_CXXFLAGS_RELEASE (for C++ files). For example:
QMAKE_CFLAGS_RELEASE += -O3 -march=i686 -mtune=generic -fomit-frame-pointer
QMAKE_CXXFLAGS_RELEASE += -O3 -march=i686 -mtune=generic -fomit-frame-pointer
If you really want to use some specific options always, regardless of whether it's a debug or release build, then append to QMAKE_CFLAGS and QMAKE_CXXFLAGS instead. But usually, you'll only want optimization options in your release builds, not the debug ones.
So, after spending hours reading and understanding I have finally made my first OpenCL program that actually does something, which is it adds two vectors and outputs to a file.
#include <iostream>
#include <vector>
#include <cstdlib>
#include <string>
#include <fstream>
#define __CL_ENABLE_EXCEPTIONS
#include <CL/cl.hpp>
int main(int argc, char *argv[])
{
try
{
// get platforms, devices and display their info.
std::vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
std::vector<cl::Platform>::iterator i=platforms.begin();
std::cout<<"OpenCL \tPlatform : "<<i->getInfo<CL_PLATFORM_NAME>()<<std::endl;
std::cout<<"\tVendor: "<<i->getInfo<CL_PLATFORM_VENDOR>()<<std::endl;
std::cout<<"\tVersion : "<<i->getInfo<CL_PLATFORM_VERSION>()<<std::endl;
std::cout<<"\tExtensions : "<<i->getInfo<CL_PLATFORM_EXTENSIONS>()<<std::endl;
// get devices
std::vector<cl::Device> devices;
i->getDevices(CL_DEVICE_TYPE_ALL,&devices);
int o=99;
std::cout<<"\n\n";
// iterate over available devices
for(std::vector<cl::Device>::iterator j=devices.begin(); j!=devices.end(); j++)
{
std::cout<<"\tOpenCL\tDevice : " << j->getInfo<CL_DEVICE_NAME>()<<std::endl;
std::cout<<"\t\t Type : " << j->getInfo<CL_DEVICE_TYPE>()<<std::endl;
std::cout<<"\t\t Vendor : " << j->getInfo<CL_DEVICE_VENDOR>()<<std::endl;
std::cout<<"\t\t Driver : " << j->getInfo<CL_DRIVER_VERSION>()<<std::endl;
std::cout<<"\t\t Global Mem : " << j->getInfo<CL_DEVICE_GLOBAL_MEM_SIZE>()/(1024*1024)<<" MBytes"<<std::endl;
std::cout<<"\t\t Local Mem : " << j->getInfo<CL_DEVICE_LOCAL_MEM_SIZE>()/1024<<" KBbytes"<<std::endl;
std::cout<<"\t\t Compute Unit : " << j->getInfo<CL_DEVICE_MAX_COMPUTE_UNITS>()<<std::endl;
std::cout<<"\t\t Clock Rate : " << j->getInfo<CL_DEVICE_MAX_CLOCK_FREQUENCY>()<<" MHz"<<std::endl;
}
std::cout<<"\n\n\n";
//MAIN CODE BEGINS HERE
//get Kernel
std::ifstream ifs("vector_add_kernel.cl");
std::string kernelSource((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
std::cout<<kernelSource;
//Create context, select device and command queue.
cl::Context context(devices);
cl::Device &device=devices.front();
cl::CommandQueue cmdqueue(context,device);
// Generate Source vector and push the kernel source in it.
cl::Program::Sources sourceCode;
sourceCode.push_back(std::make_pair(kernelSource.c_str(), kernelSource.size()));
//Generate program using sourceCode
cl::Program program=cl::Program(context, sourceCode);
//Build program..
try
{
program.build(devices);
}
catch(cl::Error &err)
{
std::cerr<<"Building failed, "<<err.what()<<"("<<err.err()<<")"
<<"\nRetrieving build log"
<<"\n Build Log Follows \n"
<<program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(devices.front());
}
//Declare and initialize vectors
std::vector<cl_float>B(993448,1.3);
std::vector<cl_float>C(993448,1.3);
std::vector<cl_float>A(993448,1.3);
cl_int N=A.size();
//Declare and intialize proper work group size and global size. Global size raised to the nearest multiple of workGroupSize.
int workGroupSize=128;
int GlobalSize;
if(N%workGroupSize) GlobalSize=N - N%workGroupSize + workGroupSize;
else GlobalSize=N;
//Declare buffers.
cl::Buffer vecA(context, CL_MEM_READ_WRITE, sizeof(cl_float)*N);
cl::Buffer vecB(context, CL_MEM_READ_ONLY , (B.size())*sizeof(cl_float));
cl::Buffer vecC(context, CL_MEM_READ_ONLY , (C.size())*sizeof(cl_float));
//Write vectors into buffers
cmdqueue.enqueueWriteBuffer(vecB, 0, 0, (B.size())*sizeof(cl_float), &B[0] );
cmdqueue.enqueueWriteBuffer(vecB, 0, 0, (C.size())*sizeof(cl_float), &C[0] );
//Executing kernel
cl::Kernel kernel(program, "vector_add");
cl::KernelFunctor kernel_func=kernel.bind(cmdqueue, cl::NDRange(GlobalSize), cl::NDRange(workGroupSize));
kernel_func(vecA, vecB, vecC, N);
//Reading back values into vector A
cmdqueue.enqueueReadBuffer(vecA,true,0,N*sizeof(cl_float), &A[0]);
cmdqueue.finish();
//Saving into file.
std::ofstream output("vectorAdd.txt");
for(int i=0;i<N;i++) output<<A[i]<<"\n";
}
catch(cl::Error& err)
{
std::cerr << "OpenCL error: " << err.what() << "(" << err.err() <<
")" << std::endl;
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
The problem is, for smaller values of N, I'm getting the correct result that is 2.6
But for larger values, like the one in the code above (993448) I get garbage output varying between 1 and 2.4.
Here is the Kernel code :
__kernel void vector_add(__global float *A, __global float *B, __global float *C, int N) {
// Get the index of the current element
int i = get_global_id(0);
//Do the operation
if(i<N) A[i] = C[i] + B[i];
}
UPDATE : Ok it seems the code is working now. I have fixed a few minor mistakes in my code above
1) The part where GlobalSize is initialized has been fixed.
2)Stupid mistake in enqueueWriteBuffer (wrong parameters given)
It is now outputting the correct result for large values of N.
Try to change the data type from float to double etc.