Testing bignum arithmetic

I'm writing an arbitrary precision rational number package, which I'll need to test for correctness and efficiency. Of course I could put together an ad hoc set of tests myself, but since I'm far from the first to be doing this, I figure it's worth asking: can anyone recommend an existing set of tests I could use?
Edit: I ended up writing a test routine that, on each pass through a loop, generates three random numbers and verifies that various arithmetic identities hold. It has found several bugs in the numeric code so far. Here's the actual code:
for (i = 0;; i++)
{
    mem = memlo;
    printf(fmtw "\r", i);
    a = rndnum();
    b = rndnum();
    c = rndnum();

    // Equality
    test(eq(a, a));
    test(!eq(a, b) || !eq(b, c) || eq(a, c));

    // Addition
    test(eq(add(add(a, b), c), add(a, add(b, c))));
    test(eq(add(a, b), add(b, a)));
    test(eq(add(a, zero), a));

    // Subtraction
    test(eq(sub(add(a, b), b), a));
    test(sub(a, a) == zero);
    test(eq(sub(a, b), add(a, sub(zero, b))));

    // Multiplication
    test(eq(mul(mul(a, b), c), mul(a, mul(b, c))));
    test(eq(mul(a, b), mul(b, a)));
    test(eq(mul(a, one), a));
    test(eq(mul(a, add(b, c)), add(mul(a, b), mul(a, c))));

    // Division
    test(b == zero || eq(div_(mul(a, b), b), a));
    test(a == zero || div_(a, a) == one);
    test(b == zero || eq(div_(a, b), mul(a, div_(one, b))));
    test(c == zero
        || eq(div_(sub(a, b), c), sub(div_(a, c), div_(b, c))));

    // I/O
    test(eq(a, roundtrip(a)));
}

Try looking at unit tests of open source rational implementations. Ruby's rational supports arbitrary precision, though only a few tests in test/ruby/test_rational2.rb push past 32 bits. For instance:
assert_equal(Rational(2305842940494218450, 1152921470247108503),
             Rational(1073741789, 1073741827) + Rational(1073741827, 1073741789))
Similarly for Python's test_fractions.py:
self.assertTypedEquals(10**23, 10**22 // F(1, 10))
The GNU MP library (GMP) has some rational unit tests, mostly based on random numbers.
The IMath package has a good set of tests, such as:
qadd:-14,9/2,-20:-19/2
qadd:-60,=1,43/2:-120
qadd:375/18696391582109365451,-131/32949770573031503434,166/56750232802998421883:9906936667630486913669/616041813174061083543307811398267458734
qadd:615/80348516296708248277,=1,=2:1230/80348516296708248277
It's not obvious to me how other open source rational bignum packages, such as those in Scheme implementations or Sage, are tested, but if you're motivated their tests should exist somewhere.
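Haskell's QuickCheck is another existing framework built around exactly the random-identity style of testing from your edit: the same identities map almost directly onto properties over Haskell's built-in Rational type, and the library supplies the random generation and shrinking. A minimal sketch (the property names and the particular identities chosen here are illustrative, not taken from any existing test suite):

import Test.QuickCheck

prop_addAssoc, prop_mulDistrib :: Rational -> Rational -> Rational -> Bool
prop_addAssoc   a b c = (a + b) + c == a + (b + c)
prop_mulDistrib a b c = a * (b + c) == a * b + a * c

-- division only makes sense for a nonzero divisor
prop_divCancels :: Rational -> Rational -> Property
prop_divCancels a b = b /= 0 ==> (a * b) / b == a

-- the analogue of the roundtrip I/O test
prop_showReadRoundtrip :: Rational -> Bool
prop_showReadRoundtrip a = read (show a) == a

main :: IO ()
main = mapM_ quickCheck
  [ property prop_addAssoc
  , property prop_mulDistrib
  , property prop_divCancels
  , property prop_showReadRoundtrip
  ]

Because Rational is exact, these identities hold exactly, which is the same reason they make good tests for an arbitrary precision rational package.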

Related

Algorithms for factorizing a 30 decimal digit number

My professor has given me an RSA factoring problem as an assignment. The given modulus is 30 decimal digits long. I have been reading a lot about factoring algorithms, but it has been quite a headache to choose one for my requirements. Which algorithms give the best performance for 30-decimal-digit numbers?
Note: So far I have read about the brute-force approach and the quadratic sieve. The latter is complex and the former time-consuming.
There's another method called Pollard's Rho algorithm, which is not as fast as the GNFS but is capable of factoring 30-digit numbers in minutes rather than hours.
The algorithm is very simple. It stops when it finds any factor, so you'll need to call it recursively to obtain a complete factorisation. Here's a basic implementation in Python:
def rho(n):
    def gcd(a, b):
        while b > 0:
            a, b = b, a % b
        return a

    # Floyd cycle detection on the sequence z -> z^2 + 1 (mod n)
    g = lambda z: (z**2 + 1) % n
    x, y, d = 2, 2, 1
    while d == 1:
        x = g(x)         # "tortoise": one step per iteration
        y = g(g(y))      # "hare": two steps per iteration
        d = gcd(abs(x - y), n)
    if d == n:
        print("Can't factor this, sorry.")
        print("Try a different polynomial for g(), maybe?")
    else:
        print("%d = %d * %d" % (n, d, n // d))

rho(441693463910910230162813378557)  # = 763728550191017 * 578338290221621
Or you could just use an existing software library. I can't see much point in reinventing this particular wheel.

Calculate function time complexity

I am trying to calculate the time complexity of this function
Code
int Almacen::poner_items(id_sala s, id_producto p, int cantidad) {
    it_prod r = productos.find(p);
    if (r != productos.end()) {
        int n = salas[s - 1].size();
        int m = salas[s - 1][0].size();
        for (int i = n - 1; i >= 0 && cantidad > 0; --i) {
            for (int j = 0; j < m && cantidad > 0; ++j) {
                if (salas[s - 1][i][j] == "NULL") {
                    salas[s - 1][i][j] = p;
                    r->second += 1;
                    --cantidad;
                }
            }
        }
    }
    else {
        displayError();
        return -1;
    }
    return cantidad;
}
The variable productos is a std::map, whose find method has a time complexity of O(log n), and the other variable, salas, is a std::vector.
I calculated the complexity and found it to be log(n) + nm, but I'm not sure whether that is the correct expression, whether I should leave it as nm because that is the dominant part, or whether I should use n² only.
Thanks
The overall function is O(nm). Big-O notation is all about "in the limit of large values" (and ignores constant factors). "Small" overheads (like an O(log n) lookup, or even an O(n log n) sort) are ignored.
Actually, the O(n log n) sort case is a bit more subtle. If you expect m to typically be of the same order as n (or at least to grow faster than log n), then O(nm + n log n) == O(nm); if m stays small compared to log n, then O(nm + n log n) == O(n log n). For example, with n = m = 1000, nm = 10^6 dwarfs n log n ≈ 10^4, while with n = 10^6 and m = 2, nm = 2·10^6 is dominated by n log n ≈ 2·10^7.
Incidentally, this is not a question about C++.
In general when using big-O notation, you keep only the most dominant term when taking all variables to infinity.
n by itself is much larger than log n at infinity, so even without m you can (and generally should) drop the log n term, so O(nm) looks fine to me.
In non-theoretical use cases, it is sometimes important to understand the actual complexity (for non-infinite inputs), since algorithms that are slow in the limit can produce better results for shorter inputs (there are examples where O(1) algorithms have such a terrible constant that an exponential algorithm does better in real life). Quicksort is the classic practical example: an O(n^2) worst-case algorithm that often does better than its O(n log n) counterparts.
Read about "Big O Notation" for more info.
Let
    k = productos.size()
    n = salas[s - 1].size()
    m = salas[s - 1][0].size()
Then your algorithm is O(log(k) + nm). You need to use a distinct name for each independent variable.
Now it might be the case that there is a relation between k, n, and m that lets you re-label with a reduced set of variables, but that is not discernible from your code; you need to know about the data.
It may also be the case that some of these terms won't grow large, in which case they are actually constants, i.e. O(1).
E.g. you may know k << n, k << m and n ~= m, which allows you to describe it as O(n^2).

meaning of "<>" in ADT specification

In reading "Data Structures Using C" (Tenenbaum, Langsam, Augenstein), I've come across an unfamiliar operator in an explanation of abstract data types.
Here's a part specifying rational as an ADT (pg. 14):
"
/* value definition*/
abstract typedef <integer, integer> RATIONAL;
condition RATIONAL[1] <> 0;
/*operator definition*/
abstract RATIONAL makerational(a,b)
int a,b;
precondition b<>0;
postcondition makerational[0] == a;
makerational[1] == b;
abstract RATIONAL add(a,b)
RATIONAL a,b;
postcondition add[1] == a[1] * b[1]
postcondition add[0] == a[0] * b[1] + b[0] * a[1];
..."
My understanding is that the overall goal here is to define a rational data type for input integers a over b in terms of legal operations. However, the use of the <> operator in the condition and precondition lines is not clear. What's the meaning of "<>" in this context?
A quick search through the stacks returned no relevant results, but I apologize in advance for a possible repeat.
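For what it's worth, the spec reads consistently if <> is taken as the Pascal-style "not equal" sign, so the condition says the denominator can never be zero. Here is a hedged Haskell transcription of the value and operator definitions under that reading; the names and the Int representation are mine for illustration, not the book's:

-- Illustrative only: a direct transcription of the book's RATIONAL spec,
-- reading <> as "not equal" (/=) in the condition and precondition.
data RATIONAL = RATIONAL Int Int      -- (numerator, denominator); denominator /= 0

makerational :: Int -> Int -> RATIONAL
makerational a b
  | b /= 0    = RATIONAL a b          -- precondition b <> 0
  | otherwise = error "makerational: denominator must be nonzero"

add :: RATIONAL -> RATIONAL -> RATIONAL
add (RATIONAL a0 a1) (RATIONAL b0 b1) =
  RATIONAL (a0 * b1 + b0 * a1) (a1 * b1)   -- postconditions from the spec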

Optimizing partial computation in Haskell

I'm curious how to optimize this code:
fun n = (sum l, f $ f0 l, g $ g0 l)
  where l = map h [1..n]
Assuming that f, f0, g, g0, and h are all costly, but the creation and storage of l is extremely expensive.
As written, l is stored until the returned tuple is fully evaluated or garbage collected. Instead, sum l, f0 l, and g0 l should all be computed whenever any one of them is forced, but f and g should be delayed.
It appears this behavior could be fixed by writing:
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
Or the very similar:
fun n = (a, b, c) `deepseq` (a, f b, g c)
  where ...
We could perhaps specify a bunch of internal types to achieve the same effects as well, which looks painful. Are there any other options?
Also, I'm obviously hoping with my inlines that the compiler fuses sum, f0, and g0 into a single loop that constructs and consumes l term by term. I could make this explicit through manual inlining, but that'd suck. Are there ways to explicitly prevent the list l from ever being created and/or compel inlining? Pragmas that produce warnings or errors if inlining or fusion fail during compilation perhaps?
As an aside, I'm curious why seq, inline, lazy, etc. are all defined to be let x = x in x in the Prelude. Is this simply to give them a definition for the compiler to override?
If you want to be sure, the only way is to do it yourself. For any given compiler version, you can try out several source formulations and check the generated Core/assembly/LLVM bitcode/whatever to see whether it does what you want. But that could break with each new compiler version.
If you write
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
or the deepseq version thereof, the compiler might be able to merge the computations of a, b and c so that they are performed in parallel (not in the concurrency sense) during a single traversal of l, but for the time being, I'm rather convinced that GHC doesn't, and I'd be surprised if JHC or UHC did. And for that to work, the structure of the computations of b and c needs to be simple enough.
The only way to obtain the desired result portably across compilers and compiler versions is to do it yourself. For the next few years, at least.
Depending on f0 and g0, it might be as simple as doing a strict left fold with an appropriate accumulator type and combining function, like the famous average:
import Data.List (foldl')

data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Double

average :: [Double] -> Double
average = ratio . foldl' count (P 0 0)
  where
    ratio (P n s) = s / fromIntegral n
    count (P n s) x = P (n+1) (s+x)
but if the structure of f0 and/or g0 doesn't fit, say one's a left fold and the other a right fold, it may be impossible to do the computation in one traversal. In such cases, the choice is between recreating l and storing l. Storing l is easy to achieve with explicit sharing (where l = map h [1..n]), but recreating it may be difficult to achieve if the compiler does some common subexpression elimination (unfortunately, GHC does have a tendency to share lists of that form, even though it does little CSE). For GHC, the flags -fno-cse and -fno-full-laziness can help avoid unwanted sharing.
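When the shapes do line up, the same trick extends to the question's fun directly. This is only a sketch under that assumption: the Acc type, the step function, and the stand-in definitions of h, f, g (with f0 taken as a sum of squares and g0 as a running maximum) are mine, not from the question.

import Data.List (foldl')

-- one strict accumulator carrying all three running results
data Acc = Acc !Double !Double !Double   -- running sum, f0 accumulator, g0 accumulator

fun :: Double -> (Double, Double, Double)
fun n = (s, f b, g c)                    -- f and g stay delayed, as desired
  where
    Acc s b c = foldl' step (Acc 0 0 0) (map h [1 .. n])
    step (Acc s' b' c') x = Acc (s' + x) (b' + x * x) (max c' x)
    -- trivial stand-ins for the expensive functions in the question:
    h x = x * 2
    f y = y + 1
    g y = y - 1

Because the accumulator fields are strict and foldl' forces it at every step, the list is consumed as it is produced, and only the three scalar results are retained.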

Finite difference in Haskell, or how to disable potential optimizations

I'd like to implement the following naive (first order) finite differencing function:
finite_difference :: Fractional a => a -> (a -> a) -> a -> a
finite_difference h f x = ((f $ x + h) - (f x)) / h
As you may know, there is a subtle problem: one has to make sure that (x + h) and x differ by an exactly representable number. Otherwise, the result has a huge error, leveraged by the fact that (f $ x + h) - (f x) involves catastrophic cancellation (and one has to carefully choose h, but that is not my problem here).
In C or C++, the problem can be solved like this:
volatile double temp = x + h;
h = temp - x;
and the volatile modifier disables any optimization pertaining to the variable temp, so we are assured that a "clever" compiler will not optimize away those two lines.
I don't know enough Haskell yet to know how to solve this. I'm afraid that
let temp = x + h
    hh = temp - x
in ((f $ x + hh) - (f x)) / h
will get optimized away by Haskell (or the backend it uses). How do I get the equivalent of volatile here (if possible without sacrificing laziness)? I don't mind GHC specific answers.
I have two solutions and a suggestion:
First solution: You can guarantee that this won't be optimized out with two helper functions and the NOINLINE pragma:
norm1 x h = x+h
{-# NOINLINE norm1 #-}
norm2 x tmp = tmp-x
{-# NOINLINE norm2 #-}
normh x h = norm2 x (norm1 x h)
This will work, but will introduce a small cost.
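To make that concrete, this is roughly how the helper would be wired into the original function. A sketch only; note that I divide by the normalised step hh rather than the original h, matching the C idiom where h is reassigned:

finite_difference :: Fractional a => a -> (a -> a) -> a -> a
finite_difference h f x = (f (x + hh) - f x) / hh
  where hh = normh x h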
Second solution: Write the normalization function in C using volatile and call it through the FFI. The performance penalty will be minimal.
Now for the suggestion: Currently the math isn't optimized out, so it will work properly at present. You're afraid it will break in a future compiler. I think this is unlikely, but not so unlikely that I wouldn't want to guard against it also. So write some unit tests that cover the cases in question. Then if it does break in the future (for any reason), you'll know exactly why.
One way is to look at the Core.
Specializing to Doubles (which will be the case most likely to trigger some optimization):
finite_difference :: Double -> (Double -> Double) -> Double -> Double
finite_difference h f x = ((f $ x + hh) - (f x)) / h
  where
    temp = x + h
    hh = temp - x
Is compiled to:
A.$wfinite_difference h f x =
    case f (case x of
              D# x' -> D# (+## x' (-## (+## x' h) x'))
           ) of
      D# x'' -> case f x of D# y -> /## (-## x'' y) h
And similarly (with even less rewriting) for the polymorphic version.
So while the variables are inlined, the math isn't optimized away.
Beyond looking at the Core, I can't think of a way to guarantee the property you want.
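For reference, output like the above can be regenerated with something like

ghc -O2 -ddump-simpl FiniteDifference.hs

where the file name is just a placeholder; the exact shape of the dump varies between GHC versions.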
I don't think that
    temp = unsafePerformIO $ return $ x + h
would get optimised out. Just a guess.