I an using Nef Polyhedra for some 3D boolean operation. I first convert a bunch polyhedron_3 meshes into Nef_polyhedron_3 meshes and then perform the boolean operation. However, almost any manipulation of the Nef polyhedra results in a bunch of Uncertain_conversion_exception. I tried running CGAL with CGAL_PROFILE profile. This told me that the I get this exception on two different types. These are
GCAL::Uncertain<enum CGAL::Sign>
CGAL::Uncertain<bool>
Especially the second is thrown a lot. They are throw both when converting meshes into Nef polyhedra and during the boolean operations them selves. The call that that throws the exception is made in Filtered_predicate. The exceptions are being handled here and the boolean operation returns the expected result. However, the exceptions seems to slow down the operations quite a bit.
I am experiencing this with CGAL 4.7 on windows using visual studio 2015.I have the following symbols defined for the preprocessor
WIN32;
_WINDOWS;
_CRT_SECURE_NO_DEPRECATE;
_SCL_SECURE_NO_DEPRECATE;
_CRT_SECURE_NO_WARNINGS;
_SCL_SECURE_NO_WARNINGS;
_DEBUG;CGAL_USE_MPFR;
CGAL_USE_GMP;
BOOST_ALL_DYN_LINK;
CMAKE_INTDIR="Debug";
CGAL_EXPORTS;
CGAL is built with the flags: /GR, /EHsc, fp:strict, fp:except-, /wd4503 and /bigobj.
Additionally the project that includes CGAL is build with the flags: /GS /Zc:wchar_t /Zc:inline /fp:precise /Zc:forScope /clr /MDd /EHa
I left out a bunch flags that didn't seem relevant like include flags.
Is this expected behavior or can i do something to avoid it?
Related
For fun, I'm writing a bignum library in Rust. My goal (as with most bignum libraries) is to make it as efficient as I can. I'd like it to be efficient even on unusual architectures.
It seems intuitive to me that a CPU will perform arithmetic faster on integers with the native number of bits for the architecture (i.e., u64 for 64-bit machines, u16 for 16-bit machines, etc.) As such, since I want to create a library that is efficient on all architectures, I need to take the target architecture's native integer size into account. The obvious way to do this would be to use the cfg attribute target_pointer_width. For instance, to define the smallest type which will always be able to hold more than the maximum native int size:
#[cfg(target_pointer_width = "16")]
type LargeInt = u32;
#[cfg(target_pointer_width = "32")]
type LargeInt = u64;
#[cfg(target_pointer_width = "64")]
type LargeInt = u128;
However, while looking into this, I came across this comment. It gives an example of an architecture where the native int size is different from the pointer width. Thus, my solution will not work for all architectures. Another potential solution would be to write a build script which codegens a small module which defines LargeInt based on the size of a usize (which we can acquire like so: std::mem::size_of::<usize>().) However, this has the same problem as above, since usize is based on the pointer width as well. A final obvious solution is to simply keep a map of native int sizes for each architecture. However, this solution is inelegant and doesn't scale well, so I'd like to avoid it.
So, my questions: is there a way to find the target's native int size, preferably before compilation, in order to reduce runtime overhead? Is this effort even worth it? That is, is there likely to be a significant difference between using the native int size as opposed to the pointer width?
It's generally hard (or impossible) to get compilers to emit optimal code for BigNum stuff, that's why https://gmplib.org/ has its low level primitive functions (mpn_... docs) hand-written in assembly for various target architectures with tuning for different micro-architecture, e.g. https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/core2/mul_basecase.asm for the general case of multi-limb * multi-limb numbers. And https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/coreisbr/aors_n.asm for mpn_add_n and mpn_sub_n (Add OR Sub = aors), tuned for SandyBridge-family which doesn't have partial-flag stalls so it can loop with dec/jnz.
Understanding what kind of asm is optimal may be helpful when writing code in a higher level language. Although in practice you can't even get close to that so it sometimes makes sense to use a different technique, like only using values up to 2^30 in 32-bit integers (like CPython does internally, getting the carry-out via a right shift, see the section about Python in this). In Rust you do have access to add_overflow to get the carry-out, but using it is still hard.
For practical use, writing Rust bindings for GMP is probably your best bet, unless that already exists.
Using the largest chunks possible is very good; on all current CPUs, add reg64, reg64 has the same throughput and latency as add reg32, reg32 or reg8. So you get twice as much work done per unit. And carry propagation through 64 bits of result in 1 cycle of latency.
(There are alternate ways to store BigInteger data that can make SIMD useful; #Mysticial explains in Can long integer routines benefit from SSE?. e.g. 30 value bits per 32-bit int, allowing you to defer normalization until after a few addition steps. But every use of such numbers has to be aware of these issues so it's not an easy drop-in replacement.)
In Rust, you probably want to just use u64 regardless of the target, unless you really care about small-number (single-limb) performance on 32-bit targets. Let the compiler build u64 operations for you out of add / adc (add with carry).
The only thing that might need to be ISA-specific is if u128 is not available on some targets. You want to use 64 * 64 => 128-bit full multiply as your building block for multiplication; if the compiler can do that for you with u128 then that's great, especially if it inlines efficiently.
See also discussion in comments under the question.
One stumbling block for getting compilers to emit efficient BigInt addition loops (even inside the body of one unrolled loop) is writing an add that takes a carry input and produces a carry output. Note that x += 0xff..ff + carry=1 needs to produce a carry out even though 0xff..ff + 1 wraps to zero. So in C or Rust, x += y + carry has to check for carry out in both the y+carry and the x+= parts.
It's really hard (probably impossible) to convince compiler back-ends like LLVM to emit a chain of adc instructions. An add/adc is doable when you don't need the carry-out from adc. Or probably if the compiler is doing it for you for u128.overflowing_add
Often compilers will turn the carry flag into a 0 / 1 in a register instead of using adc. You can hopefully avoid that for at least pairs of u64 in addition by combining the input u64 values to u128 for u128.overflowing_add. That will hopefully not cost any asm instructions because a u128 already has to be stored across two separate 64-bit registers, just like two separate u64 values.
So combining up to u128 could just be a local optimization for a function that adds arrays of u64 elements, to get the compiler to suck less.
In my library ibig what I do is:
Select architecture-specific size based on target_arch.
If I don't have a value for an architecture, select 16, 32 or 64 based on target_pointer_width.
If target_pointer_width is not one of these values, use 64.
I'm using MPFR multiple precision library, and in particular the implementation from here.
Is there any way to compile the code in such a way that all operations are carried out using the standard types (e.g. double)? E.g. a compilation flag that would turn all "software operations" into "hardware operations" normally implemented in standard types?
In practice, the code is slow even when I'm using 64 bits, I profiled that the culprit is the mpfr/gmp, and I would like to measure how much I gain by changing to double (without having to re-write all the code).
This is not possible in the MPFR library for several reasons. First the formats are different. In particular, MPFR has a different exponent range, no subnormals, a single NaN... Moreover it provides correct rounding in 5 rounding modes, while processors only have 4 rounding modes, and for the native types, most operations are not correctly rounded.
You might want to write wrappers, C++ classes or whatever to do what you want, but this is not necessarily interesting as you may get many conversions between both formats.
EDIT: If you don't care about the exact behavior, perhaps what you want is something based on C++ templates. You probably need to look at another C++ MPFR interface such as MPFRCPP or mpfr::real class.
As far as I understand, the implementation you mention (MPFR C++ from Pavel Holoborodko) uses operator overloading to make MPFR calls look like standard C float operations, from the site:
//MPFR C - version
void mpfr_schwefel(mpfr_t y, mpfr_t x)
{
mpfr_t t;
mpfr_init(t);
mpfr_abs(t,x,GMP_RNDN);
mpfr_sqrt(t,t,GMP_RNDN);
mpfr_sin(t,t,GMP_RNDN);
mpfr_mul(t,t,x,GMP_RNDN);
mpfr_set_str(y,“418.9829“,10,GMP_RNDN);
mpfr_sub(y,y,t,GMP_RNDN);
mpfr_clear(t);
}
can be written like this:
// MPFR C++ - version
mpreal mpfr_schwefel(mpreal& x)
{
return "418.9829"-x*sin(sqrt(abs(x)));
}
which is cool by the way, so you just have to make slighty changes like replacing "418.9829" by 418.9829, and comment out MPFR inclusion to your code.
If your code still has remaining mpfr_... calls you can get native double-like behaviour by setting the MPFR precision to 53 bits in variable initialization or using, say, specific functions like mpfr_set_prec, but note that (as another answer points out), results won't be exactly the same:
In particular, with a precision of 53 bits and in any of the four standard rounding modes, MPFR is able to exactly reproduce all computations with double-precision machine floating-point numbers (e.g., double type in C, with a C implementation that rigorously follows Annex F of the ISO C99 standard and FP_CONTRACT pragma set to OFF) on the four arithmetic operations and the square root, except the default exponent range is much wider and subnormal numbers are not implemented (but can be emulated).
This might be just good enough for you to have a rough idea of how much MPFR performance differs from native floats.
If that isn't precise enough though, you can place a temporary include into your main file after including MPFR, with defines that override the MPFR functions you use, more or less like so:
typedef double mpfr_t;
#define mpfr_add(a,b,c,r) {a=b+c;}
I'm trying to do some 3D boolean operations with CGAL. I have successfully converted my polyhedra to nef polyhedra. However, when ever I try doing a simple union, I get an assertion fail on line 286 of SM_overlayer.h:
CGAL error: assertion violation!
Expression : G.mark(v1,0)==G.mark(v2,0)&& G.mark(v1,1)==G.mark(v2,1)
File : ./CGAL/include/CGAL/Nef_S2/SM_overlayer.h
Line : 287
I tried searching the documentation for "mark". Apparently it is a template argument on the Nef_polyhedron_3" that defaults to bool. However the documentation also says it is not implemented and that you shouldn't mess with it. I am a bit confused why there is even an assertion on some unimplemented feature. I tried simply commenting out the assertion, but it simply goes on to fail at a later point.
I am using the following kernel and typedefs as it was the only example I could find that allowed doubles in the construction of the meshes.
typedef CGAL::Exact_predicates_exact_constructions_kernel Kernel;
typedef CGAL::Nef_polyhedron_3<Kernel> Nef_polyhedron;
typedef CGAL::Polyhedron_3<Kernel> Polyhedron;
I used CGAL 4.6.3 installed with the exe installer. I have also tried the 4.7 beta and I get the same exception(at line 300 instead though).
Related github issue: https://github.com/CGAL/cgal/issues/353
EDIT: The issue turned out to be with the meshes. The meshes I were using had self intersections. Thus, even though is_valid, is_closed and is_triangular returned true, the mesh was in valid for conversion to a nef polyhedron. From CGAL 4.7. The Polygon mesh processing package has been introduced which contains this which can be used to check for self intersections.
I have a simple hello world objective-c lib:
hello.m:
#import <Foundation/Foundation.h>
#import "hello.h"
void sayHello()
{
#ifdef FRENCH
NSString *helloWorld = #"Hello World!\n";
#else
NSString *helloWorld = #"Bonjour Monde!\n";
#endif
NSFileHandle *stdout = [NSFileHandle fileHandleWithStandardOutput];
NSData *strData = [helloWorld dataUsingEncoding: NSASCIIStringEncoding];
[stdout writeData: strData];
}
the hello.h file looks like this:
int main (int argc, const char * argv[]);
int sum(int a, int b);
void sayHello();
This compiles just fine on osx and linux using clang and gcc.
Now my question:
When running a clean compile against hello.m multiple times with clang on ubuntu the generated hello.o can differ. This seems not related to a timestamp, because even after a second or more, the generated .o file can have the same checksum. From my naive point of view, this seems like a complete random/unpredicatable behaviour.
I ran the compilation with the -Sto inspect the generated assembler code. The assembler code also differs (as expected). The diff file of comparing the assembler code can be found here: http://pastebin.com/uY1LERGX
From a first look it just looks like the sorting is different in the assembler code.
This does not happen when compiling it with gcc.
Is there a way to tell clang to generate exactly the same .o file like gcc does?
clang --version:
Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0)
The feature when compiler always produce the same code is called Reproducible Builds or deterministic compilation.
One of possible sources of compiler's output instability is ASLR (Address space layout randomization). Sometimes compiler, or some libraries used by it, may read object address and use them, for example as keys of hashes or maps; or when sorting objects according to their addresses. When compiler is iterating over the hash, it will read objects in the order that depends on addresses of objects, and ASLR will place objects in different orders. The effect of such may looks like your reordered symbols (.quads in your diffs)
You can disable Linux ASLR globally with echo 0 | sudo tee /proc/sys/kernel/randomize_va_space. Local way of disabling ASLR in Linux is
setarch `uname -m` -R /bin/bash`
man page of setarch says: -R, "--addr-no-randomize" Disables randomization of the virtual address space (turns on ADDR_NO_RANDOMIZE).
For OS X 10.6 there is DYLD_NO_PIE environment variable (check man dyld, possible usage in bash export DYLD_NO_PIE=1); in 10.7 and newer there is --no_pie build flag to be used in building the LLVM itself or by setting _POSIX_SPAWN_DISABLE_ASLR which should be used in posix_spawnattr_setflags before starting the llvm; or by using in 10.7+ the script http://src.chromium.org/viewvc/chrome/trunk/src/build/mac/change_mach_o_flags.py with --no-pie option to clear PIE flag from llvm binaries (thanks to asan people).
There were some errors in clang and llvm which prevents/prevented them to be completely deterministic, for example:
[cfe-dev] clang: not deterministic anymore? - Nov 3 2009, indeterminism was detected on code from LLVM bug 5355. Author says that indeterminism was present only with -g option enabled
[LLVMdev] Deterministic code generation and llvm::Iterators (2010)
[llvm-commits] Fix some TableGen non-deterministic behavior. (Sep 2012)
r196520 - Fix non-deterministic behavior. - SLPVectorizer was fixed into deterministic only at Dec 5, 2013 (replaced SmallSet with VectorSet)
190793 - TableGen: give asm match classes deterministic order. "TableGen was sorting the entries in some of its internal data structures by pointer." - Sep 16, 2013
LLVM bug 14901 is the case when order of compiler warnings was Non-deterministic (Jan 2013).
The patch from 14901 contains comments about non-deterministic iterating over llvm::DenseMap:
- typedef llvm::DenseMap<const VarDecl *, std::pair<UsesVec*, bool> > UsesMap;
+ typedef std::pair<UsesVec*, bool> MappedType;
+ // Prefer using MapVector to DenseMap, so that iteration order will be
+ // the same as insertion order. This is needed to obtain a deterministic
+ // order of diagnostics when calling flushDiagnostics().
+ typedef llvm::MapVector<const VarDecl *, MappedType> UsesMap;
...
- // FIXME: This iteration order, and thus the resulting diagnostic order,
- // is nondeterministic.
Documentation of LLVM says that there are non-deterministic and deterministic variants of several internal containers, like Map vs MapVector: trunk/docs/ProgrammersManual.rst:
1164 The difference between SetVector and other sets is that the order of iteration
1165 is guaranteed to match the order of insertion into the SetVector. This property
1166 is really important for things like sets of pointers. Because pointer values
1167 are non-deterministic (e.g. vary across runs of the program on different
1168 machines), iterating over the pointers in the set will not be in a well-defined
1169 order.
1170
1171 The drawback of SetVector is that it requires twice as much space as a normal
1172 set and has the sum of constant factors from the set-like container and the
1173 sequential container that it uses. Use it **only** if you need to iterate over
1174 the elements in a deterministic order.
...
1277 StringMap iteratation order, however, is not guaranteed to be deterministic, so
1278 any uses which require that should instead use a std::map.
...
1364 ``MapVector<KeyT,ValueT>`` provides a subset of the DenseMap interface. The
1365 main difference is that the iteration order is guaranteed to be the insertion
1366 order, making it an easy (but somewhat expensive) solution for non-deterministic
1367 iteration over maps of pointers.
It is possible that some authors of LLVM thought that in their code there was no need to save determinism in iteration order. For example, there are comments in ARMTargetStreamer about usage of MapVector for ConstantPools (ARMTargetStreamer.cpp - class AssemblerConstantPools). But how can we sure that all usages of non-deterministic containers like DenseMap will not affect output of compiler? There are tens loops iterating over DenseMap: "DenseMap.*const_iterator" regex in codesearch.debian.net
Your version of LLVM and clang (3.0, from 2011-11-30) is clearly too old to have all determinism enhances from 2012 and 2013 years (some are listed in my answer). You should update your LLVM and Clang, then recheck your program for deterministic compilation, then locate non-determinism in shorter and easier to reproduce examples (e.g. save bc - bitcode - from middle stages), then you can post a bug in LLVM bugzilla.
Try the -S option for clang and gcc during compiling your source. This will generate a .s file in which you can see the assembler code this could give you an idea what the differences on a lower level. Maybe you will realise the output will be the same and your problem shifts from the compiler further down to the linker.
You should report this as a bug; a compiler certainly should be deterministic.
Your guess about the sort order is quite probably correct, in my experience. Most likely the compiler makes an arbitrary decision when two items compare equal (according to whatever measure is significant; they don't have to be actually the same), and that can vary depending on environmental factors, somehow. I've seen this before, in GCC, in which the same compiler compiled for different host OS produced different results; in that case it turned out that the Windows qsort function operated slightly differently to the Linux (glibc) implementation.
That said, it could be something else; compilers aren't supposed to make random decisions, but there plenty of opportunities for arbitrary decisions that might turn out to be unstable, somehow (address space randomization, perhaps?)
Stupid question that I'm sure is some bit of syntax that's not right. How do I get dlsym to work with a function that returns a value? I'm getting the error 'invalid conversion of void* to LSError (*)()' in the following code - trying to get the compile the linux lightscribe sample program hoping that I can link it against the OSX dylib (why the hell won't HP release an actual Cocoa SDK? LS has only been around for what? 6 or 7 years now?):
void* LSHandle = dlopen("liblightscribe.1.dylib", RTLD_LOCAL|RTLD_LAZY);
if (LSHandle) {
LSError (*LS_DiscPrinter_ReleaseExclusiveUse)() = dlsym(LSHandle, "LS_DiscPrinter_ReleaseExclusiveUse");
..
lsError = LS_DiscPrinter_ReleaseExclusiveUse( pDiscPrinter);
The C standard does not actually define behaviour for converting to and from function pointers. Explanations vary as to why; the most common being that not all architectures implement function pointers as simple pointers to data. On some architectures, functions may reside in an entirely different segment of memory that is unaddressable using a pointer to void.
The “recommended” way to use dlsym is:
LSError (*LS_DiscPrinter_ReleaseExclusiveUse)(LS_DiscPrinterHandle);
*(void **)&LS_DiscPrinter_ReleaseExclusiveUse = dlsym("LS_DiscPrinter_ReleaseExclusiveUse");
Read the rationale and example on the POSIX page for dlsym for more information.