F# comparing lambdas for equality - serialization

I would like to try and compare F# lambdas for equality. This is, at first inspection, not possible.
let foo = 10
let la = (fun x y -> x + y + foo)
let lb = (fun x y -> x + y + foo)
printfn "lambda equals %b" (la = lb)
which generates the error
The type '('a -> 'b -> int)' does not support the 'equality' constraint because it is a function type
However, and surprisingly, it is possible to serialize lambda functions.
open System.Runtime.Serialization.Formatters.Binary
open System.IO
let serialize o =
    let bf = BinaryFormatter()
    use ms = new MemoryStream()
    bf.Serialize(ms, o)
    ms.ToArray()
let ByteToHex bytes =
    bytes
    |> Array.map (fun (x : byte) -> System.String.Format("{0:X2}", x))
    |> String.concat System.String.Empty
let foo = 10
let la = (fun x y -> x + y + foo)
let lb = (fun x y -> x + y + foo)
let a = serialize la
let b = serialize lb
printfn "%s" (ByteToHex a)
printfn "%s" (ByteToHex b)
printfn "lambda equals %b" (a = b)
which suggests that if they can be serialized they can be compared. However, inspection of the byte stream for this example shows two bytes where there is a difference.
Is there possibly a strategy to solve this problem by intelligently comparing the byte arrays?

From an equivalence perspective, functions aren't meaningfully serialized.
Curryable functions in F# are implemented as derived from FSharpFunc.
let la = (fun x y -> x + y + foo)
would be implemented as an instance of the following class (in equivalent C#):
[Serializable] class Impl : FSharpFunc<int, int, int>
{
    public int foo;
    Impl(int foo_) => foo = foo_;
    public override int Invoke(int x, int y) =>
        x + y + foo;
}
What binary serialization captures is the full type name and the value of foo.
In fact if we look at strings in the byte stream we see:
test, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
Program+la#28
foo
...where la#28 is the name of our derived class.
Where the byte stream for la and lb differs is the name of the implementing class. The implementations of la and lb could be entirely different.
You could, for instance, change lb to let lb = (fun x y -> x * y + foo), and the byte streams would still differ only in the class name, even though the two implementations now compute different things.
You can however, do this with Code Quotations:
let foo = 10
let la = <@ fun x y -> x + y + foo @>
let lb = <@ fun x y -> x + y + foo @>
printfn "Is same: %b" (la.ToString() = lb.ToString()) //true
F# also supports Expression<Func<>> (C#'s expression trees) - which is also a valid avenue for comparison.
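For instance, a minimal sketch of the quotation comparison (note this compares the printed structure, not semantics — renaming a bound variable already changes the string):

```fsharp
let foo = 10
let qa = <@ fun x y -> x + y + foo @>
let qb = <@ fun x y -> x + y + foo @>
let qc = <@ fun a b -> a + b + foo @>  // same behaviour, different variable names

printfn "qa = qb: %b" (qa.ToString() = qb.ToString())  // true
printfn "qa = qc: %b" (qa.ToString() = qc.ToString())  // false: the strings differ
```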

How can I reference an Idris interface method from within the same interface declaration?

I built a little TotalOrder class that has stronger proof properties than the regular Ord class. The problem is that these properties need to refer to each other in ways I'm not sure how to do, syntactically.
I need to reference my lessThan type in the other methods in the interface, but it seems to be getting treated as a regular (implicitly forall'd) variable.
module Order
%default total
public export
data Ordering : (t -> t -> Type) -> t -> t -> Type where
  LessThan : f x y -> Ordering f x y
  Equal : x = y -> Ordering f x y
  GreaterThan : f y x -> Ordering f x y
public export
interface TotalOrder a where
  lessThan : a -> a -> Type
  compare : (x, y : a) -> Ordering lessThan x y
  ltNeq : lessThan x y -> x = y -> Void
  ltNgt : lessThan x y -> lessThan y x -> Void
-- Implementation for Nats
compareNat : (x, y : Nat) -> Ordering LT x y
compareNat x y = ...
ltNeqNat : LT x y -> x = y -> Void
ltNeqNat lt eq = ...
ltNgtNat : LT x y -> LT y x -> Void
ltNgtNat lt gt = ...
implementation TotalOrder Nat where
  lessThan = LT
  compare = compareNat -- THIS LINE REFUSES TO COMPILE
  ltNeq = ltNeqNat
  ltNgt = ltNgtNat
As best I can determine, the typechecker automatically assumes that the lessThans in function position in ltNeq and ltNgt are the same as the lessThan declared in the interface, but the lessThan in argument position in compare is treated as a regular argument to compare, meaning that the type for the Nat implementation is
compare : {lessThan : Nat -> Nat -> Type} -> (x, y : Nat) -> Ordering lessThan x y
rather than
compare : (x, y : Nat) -> Ordering LT x y
and compare = compareNat no longer typechecks. When I run into this problem in top-level functions, fully-qualifying the name works, but I'm not sure what the fully-qualified name here would be, and the obvious ideas (Order.lessThan, TotalOrder.lessThan, Order.TotalOrder.lessThan) all get "no such variable"d.

Changing a mutable field in OCaml

When I run the following code I get a syntax error, although as far as I can tell the syntax is correct. This attempts to implement a queue structure, where the function from_list converts a list to a queue with the corresponding values. I wrote str_of_int_q to print the contents of a queue. x and y are supposed to be two nodes, with x at the head and y at the tail.
;; open Assert
type 'a qnode = {v: 'a;
                 mutable next: 'a qnode option}
type 'a queue = {mutable head: 'a qnode option;
                 mutable tail: 'a qnode option}
let from_list (l: 'a list) : 'a queue =
  let rec loop (l2: 'a list) (qu: 'a queue) =
    begin match l2 with
    | [] -> qu
    | [x] -> let y = {v = x; next = None} in
             qu.head <- Some y; qu.tail <- Some y;
             qu
    | h1::h2::t -> let y = qu.head in
                   let z = {v = h1; next = y} in
                   qu.head <- Some z;
                   qu
    end
  in loop l {head = None; tail = None}
let str_of_int_q (q: int queue) : string =
  let rec loop (r: int qnode option) (s: string) : string =
    begin match r with
    | None -> s
    | Some n -> loop n.next (s ^ (string_of_int n.v))
    end
  in loop q.head ""
let x = {v = 1; next = None}
let y = {v = 2; next = None}
x.next <- Some y;
let z = {head = Some x; tail = Some y}
;; print_endline (str_of_int_q z)
My error:
line 32, characters 7-9:
Error: Syntax error
Line 32 is the line x.next <- Some y; and characters 7-9 indicate the <-. But I'm storing into a mutable field an object of the appropriate type, so I don't see what's going wrong.
Top-level statements are separated by ;; in OCaml. However, ;; is optional before several keywords, such as let, open, type, etc. This is why you don't need ;; most of the time.
In your case, ;; is needed to disambiguate between let y = {v = 2; next = None} and x.next <- Some y. The latter is an expression and doesn't start with a special keyword, so OCaml doesn't know to insert an implicit ;; here.
See also http://ocaml.org/learn/tutorials/structure_of_ocaml_programs.html#The-disappearance-of.
As explained there, you can either do
let y = {v = 2; next = None}
;; x.next <- Some y
or
let y = {v = 2; next = None}
let () = x.next <- Some y
This latter solution works because by introducing a dummy binding we're starting our statement with let, which disambiguates again.
Note: I've also removed the trailing ; from your code. ; is actually an infix operator that combines two expressions (by throwing the result of the first one away and returning the result of the second one). This is not what you want here.

How to prove that the boolean inequality of a type with itself is uninhabited in Idris?

I was wondering how to prove that (So (not (y == y))) is an instance of Uninhabited, and I'm not sure how to go about it. Is it provable in Idris, or is not provable due to the possibility of a weird Eq implementation for y?
The Eq interface does not require an implementation to follow the normal laws of equality. But, we can define an extended LawfulEq interface which does:
%default total
is_reflexive : (t -> t -> Bool) -> Type
is_reflexive {t} rel = (x : t) -> rel x x = True
is_symmetric : (t -> t -> Bool) -> Type
is_symmetric {t} rel = (x : t) -> (y : t) -> rel x y = rel y x
is_transitive : (t -> t -> Bool) -> Type
is_transitive {t} rel = (x : t) -> (y : t) -> (z : t) -> rel x y = True -> rel x z = rel y z
interface Eq t => LawfulEq t where
  eq_is_reflexive : is_reflexive {t} (==)
  eq_is_symmetric : is_symmetric {t} (==)
  eq_is_transitive : is_transitive {t} (==)
The result asked for in the question can be proved for type Bool:
so_false_is_void : So False -> Void
so_false_is_void Oh impossible
so_not_y_eq_y_is_void : (y : Bool) -> So (not (y == y)) -> Void
so_not_y_eq_y_is_void False = so_false_is_void
so_not_y_eq_y_is_void True = so_false_is_void
The result can be proved not true for the following Weird type:
data Weird = W
Eq Weird where
  W == W = False
weird_so_not_y_eq_y : (y : Weird) -> So (not (y == y))
weird_so_not_y_eq_y W = Oh
The Weird (==) can be shown to be not reflexive, so an implementation of LawfulEq Weird is not possible:
weird_eq_not_reflexive : is_reflexive {t=Weird} (==) -> Void
weird_eq_not_reflexive is_reflexive_eq =
  let w_eq_w_is_true = is_reflexive_eq W in
  trueNotFalse $ trans (sym w_eq_w_is_true) (the (W == W = False) Refl)
Shersh is right: you can't. Implementations of (==) aren't guaranteed to be reflexive, so it might not be true.
You could restrict the type of y so that you are proving a property of a specific implementation of (==), but I suspect you want to use decEq and (=) instead of So and (==). It's easy to show Not (y = y) is uninhabited.

Proof in Coq that equality is reflexivity

The HoTT book writes on page 51:
... we can prove by path induction on p : x = y that
$(x, y, p) =_{\sum_{(x,y:A)} (x=y)} (x, x, \mathsf{refl}\,x)$.
Can someone show me how to prove this in Coq?
Actually, it is possible to prove this result in Coq:
Notation "y ; z" := (existT _ y z) (at level 80, right associativity).
Definition hott51 T x y e :
  (x; y; e) = (x; x; eq_refl) :> {x : T & {y : T & x = y} } :=
  match e with
  | eq_refl => eq_refl
  end.
Here, I've used a semicolon tuple notation to express dependent pairs; in Coq, {x : T & P x} is the sigma type \sum_{x : T} P x. There is also a slightly easier-to-read variant, where we do not mention y:
Definition hott51' T x e : (x; e) = (x; eq_refl) :> {y : T & x = y} :=
  match e with
  | eq_refl => eq_refl
  end.
If you're not used to writing proof terms by hand, this code might look a bit mysterious, but it is doing exactly what the HoTT book says: proceeding by path induction. There's one crucial bit of information that is missing here, which are the type annotations needed to do path induction. Coq is able to infer those, but we can ask it to tell us what they are explicitly by printing the term. For hott51', we get the following (after some rewriting):
hott51' =
  fun (T : Type) (x : T) (e : x = x) =>
    match e as e' in _ = y' return (y'; e') = (x; eq_refl) with
    | eq_refl => eq_refl
    end
  : forall (T : Type) (x : T) (e : x = x),
      (x; e) = (x; eq_refl)
The important detail there is that in the return type of the match, both x and e are generalized to y' and e'. The only reason this is possible is because we wrapped x in a pair. Consider what would happen if we tried proving UIP:
Fail Definition uip T (x : T) (e : x = x) : e = eq_refl :=
  match e as e' in _ = y' return e' = eq_refl with
  | eq_refl => eq_refl
  end.
Here, Coq complains, saying:
The command has indeed failed with message:
In environment
T : Type
x : T
e : x = x
y' : T
e' : x = y'
The term "eq_refl" has type "x = x" while it is expected to have type
"x = y'" (cannot unify "x" and "y'").
What this error message is saying is that, in the return type of the match, the e' has type x = y', where y' is generalized. This means that the equality e' = eq_refl is ill-typed, because the right-hand side must have type x = x or y' = y'.
Simple answer: you can't. Not all proofs of x = y in Coq are instances of eq_refl. You would have to assume Uniqueness of Identity Proofs (UIP) to obtain such a result. It is a very reasonable axiom, but it is still an axiom in the Calculus of Inductive Constructions.

To memoize or not to memoize

... that is the question. I have been working on an algorithm which takes an array of vectors as input, and part of the algorithm repeatedly picks pairs of vectors and evaluates a function of these two vectors, which doesn't change over time. Looking at ways to optimize the algorithm, I thought this would be a good case for memoization: instead of recomputing the same function value over and over again, cache it lazily and hit the cache.
Before jumping to code, here is the gist of my question: the benefits I get from memoization depend on the number of vectors, which I think is inversely related to the number of repeated calls, and in some circumstances memoization completely degrades performance. So is my situation inadequate for memoization? Am I doing something wrong, and are there smarter ways to optimize for my situation?
Here is a simplified test script, which is fairly close to the real thing:
open System
open System.Diagnostics
open System.Collections.Generic
let size = 10 // observations
let dim = 10 // features per observation
let runs = 10000000 // number of function calls
let rng = new Random()
let clock = new Stopwatch()
let data =
    [| for i in 1 .. size ->
        [ for j in 1 .. dim -> rng.NextDouble() ] |]
let testPairs = [| for i in 1 .. runs -> rng.Next(size), rng.Next(size) |]
let f v1 v2 = List.fold2 (fun acc x y -> acc + (x-y) * (x-y)) 0.0 v1 v2
printfn "Raw"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> f data.[i] data.[j]) |> printfn "Check: %f"
printfn "Raw: %i" clock.ElapsedMilliseconds
I create a list of random vectors (data), a random collection of indexes (testPairs), and run f on each of the pairs.
Here is the memoized version:
let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v -> v
        | false, _ ->
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v
printfn "Memoized"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoized (i, j)) |> printfn "Check: %f"
printfn "Memoized: %i" clock.ElapsedMilliseconds
Here is what I am observing:
* when size is small (10), memoization goes about twice as fast as the raw version,
* when size is large (1000), memoization takes 15x more time than the raw version,
* when f is costly, memoization improves things
My interpretation is that when the size is small, we have more repeat computations, and the cache pays off.
What surprised me was the huge performance hit for larger sizes, and I am not certain what is causing it. I know I could improve the dictionary access a bit, with a struct key for instance - but I didn't expect the "naive" version to behave so poorly.
So - is there something obviously wrong with what I am doing? Is memoization the wrong approach for my situation, and if yes, is there a better approach?
I think memoization is a useful technique, but it is not a silver bullet. It is very useful in dynamic programming where it reduces the (theoretical) complexity of the algorithm. As an optimization, it can (as you would probably expect) have varying results.
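To illustrate the dynamic-programming case (an added sketch, not part of the original benchmark), here is a memoized Fibonacci, where caching turns the exponential naive recursion into a linear one:

```fsharp
open System.Collections.Generic

let fib =
    let cache = Dictionary<int, int64>()
    let rec go n =
        if n < 2 then int64 n
        else
            match cache.TryGetValue n with
            | true, v -> v                       // cache hit: reuse the earlier result
            | _ ->
                let v = go (n - 1) + go (n - 2)  // each n is computed exactly once
                cache.[n] <- v
                v
    go

printfn "%d" (fib 50)  // 12586269025; the uncached recursion would take minutes
```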
In your case, the cache is certainly more useful when the number of observations is smaller (and f is more expensive computation). You can add simple statistics to your memoization:
let stats = ref (0, 0) // Count number of cache misses & hits
let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        let (mis, hit) = !stats
        match cache.TryGetValue(key) with
        | true, v -> stats := (mis, hit + 1); v // Increment hit count
        | false, _ ->
            stats := (mis + 1, hit) // Increment miss count
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v
For small size, the numbers I get are something like (100, 999900) so there is a huge benefit from memoization - the function f is computed 100x and then each result is reused 9999x.
For big size, I get something like (632331, 1367669) so f is called many times and each result is reused just twice. In that case, the overhead with allocation and lookup in the (big) hash table is much bigger.
As a minor optimization, you can pre-allocate the Dictionary and write new Dictionary<_, _>(10000,HashIdentity.Structural), but that does not seem to help much in this case.
To make this optimization efficient, I think you would need to know some more information about the memoized function. In your example, the inputs are quite regular, so there is probably no point in memoization, but if you know that the function is more often called with some values of arguments, you can perhaps memoize only those common arguments.
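A sketch of that last idea (the helper name memoizeWhen is hypothetical, not a library function): cache only the keys you expect to repeat, and fall through to a plain call for the rest.

```fsharp
open System.Collections.Generic

// Hypothetical selective memoization: only keys satisfying the predicate are cached.
let memoizeWhen (shouldCache: 'k -> bool) (f: 'k -> 'v) =
    let cache = Dictionary<'k, 'v>(HashIdentity.Structural)
    fun key ->
        if shouldCache key then
            match cache.TryGetValue key with
            | true, v -> v
            | _ ->
                let v = f key
                cache.Add(key, v)
                v
        else
            f key  // rare key: recompute rather than grow the table

// e.g. cache only pairs drawn from the first 100 observations:
// let fMemo = memoizeWhen (fun (i, j) -> i < 100 && j < 100)
//                         (fun (i, j) -> f data.[i] data.[j])
```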
Tomas's answer is great for when you should use memoization. Here's why memoization is going so slow in your case.
It sounds like you're testing in Debug mode. Run your test again in Release and you should get a faster result for memoization. Tuples can cause a large performance hit while in Debug mode. I added a hashed version for comparison along with some micro optimizations.
Release
Raw
Check: 1.441687
Raw: 894
Memoized
Check: 1.441687
Memoized: 733
memoizedHash
Check: 1.441687
memoizedHash: 552
memoizedHashInline
Check: 1.441687
memoizedHashInline: 493
memoizedHashInline2
Check: 1.441687
memoizedHashInline2: 385
Debug
Raw
Check: 1.409310
Raw: 797
Memoized
Check: 1.409310
Memoized: 5190
memoizedHash
Check: 1.409310
memoizedHash: 593
memoizedHashInline
Check: 1.409310
memoizedHashInline: 497
memoizedHashInline2
Check: 1.409310
memoizedHashInline2: 373
Source
open System
open System.Diagnostics
open System.Collections.Generic
let size = 10 // observations
let dim = 10 // features per observation
let runs = 10000000 // number of function calls
let rng = new Random()
let clock = new Stopwatch()
let data =
    [| for i in 1 .. size ->
        [ for j in 1 .. dim -> rng.NextDouble() ] |]
let testPairs = [| for i in 1 .. runs -> rng.Next(size), rng.Next(size) |]
let f v1 v2 = List.fold2 (fun acc x y -> acc + (x-y) * (x-y)) 0.0 v1 v2
printfn "Raw"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> f data.[i] data.[j]) |> printfn "Check: %f"
printfn "Raw: %i\n" clock.ElapsedMilliseconds
let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v -> v
        | false, _ ->
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v
printfn "Memoized"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoized (i, j)) |> printfn "Check: %f"
printfn "Memoized: %i\n" clock.ElapsedMilliseconds
let memoizedHash =
    let cache = new Dictionary<int,float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v -> v
        | false, _ ->
            let i = key / size
            let j = key % size
            let v = f data.[i] data.[j]
            cache.Add(key, v)
            v
printfn "memoizedHash"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoizedHash (i * size + j)) |> printfn "Check: %f"
printfn "memoizedHash: %i\n" clock.ElapsedMilliseconds
let memoizedHashInline =
    let cache = new Dictionary<int,float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v -> v
        | false, _ ->
            let i = key / size
            let j = key % size
            let v = f data.[i] data.[j]
            cache.Add(key, v)
            v
printfn "memoizedHashInline"
clock.Restart()
let mutable total = 0.0
for i, j in testPairs do
    total <- total + memoizedHashInline (i * size + j)
printfn "Check: %f" (total / float testPairs.Length)
printfn "memoizedHashInline: %i\n" clock.ElapsedMilliseconds
printfn "memoizedHashInline2"
clock.Restart()
let mutable total2 = 0.0
let cache = new Dictionary<int,float>(HashIdentity.Structural)
for i, j in testPairs do
    let key = (i * size + j)
    match cache.TryGetValue(key) with
    | true, v -> total2 <- total2 + v
    | false, _ ->
        let i = key / size
        let j = key % size
        let v = f data.[i] data.[j]
        cache.Add(key, v)
        total2 <- total2 + v
printfn "Check: %f" (total2 / float testPairs.Length)
printfn "memoizedHashInline2: %i\n" clock.ElapsedMilliseconds
Console.ReadLine() |> ignore