Pointwise function equality proof

I'm just starting to play with Idris and theorem proving in general. I can follow most of the examples of proofs of basic facts on the internet, so I wanted to try something arbitrary on my own. I want to write a proof term for the following basic property of map:
map : (a -> b) -> List a -> List b
prf : map id = id
Intuitively, I can imagine how the proof should work: take an arbitrary list l and analyze the possibilities for map id l. When l is empty, it's obvious; when l is non-empty, it follows from the fact that function application preserves equality.
So, I can do something like this:
prf' : (l : List a) -> map id l = id l
It's like a forall statement. How can I turn it into a proof of the equality of the functions involved?

You can't. Idris's type theory (like Coq's and Agda's) does not support function extensionality. Given two functions f and g that "act the same", you will never be able to prove Not (f = g), but you will only be able to prove f = g if f and g are defined the same way, up to alpha and eta equivalence or so. Unfortunately, things only get worse when you consider higher-order functions; there's a theorem about this in the Coq standard library, but I can't seem to find or remember it right now.
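For reference, the pointwise statement prf' itself does go through, by straightforward induction on the list. A minimal sketch in Idris 1 syntax (in Idris 2, cong takes the function as an explicit argument rather than the implicit f):
prf' : (l : List a) -> map id l = id l
prf' [] = Refl
-- the head is unchanged, so apply congruence of (x ::) to the induction hypothesis
prf' (x :: xs) = cong {f = (x ::)} (prf' xs)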

Related

Idiomatic way of listing elements of a sum type in Idris

I have a sum type representing arithmetic operators:
data Operator = Add | Subtract | Multiply | Divide
and I'm trying to write a parser for it. For that, I would need an exhaustive list of all the operators.
In Haskell I would use deriving (Enum, Bounded) like suggested in the following StackOverflow question: Getting a list of all possible data type values in Haskell
Unfortunately, there doesn't seem to be such a mechanism in Idris, as suggested by Issue #19. There is some ongoing work by David Christiansen on the question, so hopefully the situation will improve in the future: david-christiansen/derive-all-the-instances
Coming from Scala, I am used to listing the elements manually, so I pretty naturally came up with the following:
Operators : Vect 4 Operator
Operators = [Add, Subtract, Multiply, Divide]
To make sure that Operators contains all the elements, I added the following proof:
total
opInOps : Elem op Operators
opInOps {op = Add} = Here
opInOps {op = Subtract} = There Here
opInOps {op = Multiply} = There (There Here)
opInOps {op = Divide} = There (There (There Here))
so that if I add an element to Operator without adding it to Operators, the totality checker complains:
Parsers.opInOps is not total as there are missing cases
It does the job but it is a lot of boilerplate.
Did I miss something? Is there a better way of doing it?
One option is to use the elaborator reflection feature of the language to get the list of all constructors.
Here is a pretty dumb approach to solving this particular problem (I'm posting this because the documentation at the moment is very scarce):
%language ElabReflection
data Operator = Add | Subtract | Multiply | Divide
constrsOfOperator : Elab ()
constrsOfOperator =
  do -- look up the datatype and pull out its constructor names
     (MkDatatype _ _ _ constrs) <- lookupDatatypeExact `{Operator}
     loop $ map fst constrs
  where loop : List TTName -> Elab ()
        loop [] =
          -- no constructors left: close the list with nil
          do fill `([] : List Operator); solve
        loop (c :: cs) =
          -- build a cons cell with two holes, fill the head with the
          -- current constructor, then recurse into the tail hole
          do [x, xs] <- apply `(List.(::) : Operator -> List Operator -> List Operator) [False, False]
             solve
             focus x; fill (Var c); solve
             focus xs
             loop cs
allOperators : List Operator
allOperators = %runElab constrsOfOperator
A couple of comments:
It seems that to solve this problem for any inductive datatype of a similar structure, one would need to work through the paper Elaborator Reflection: Extending Idris in Idris.
Maybe the pruviloj library has something that might make solving this problem for a more general case easier.

Where is the Idris == operator useful?

As a beginner in type-driven programming, I'm curious about the use of the == operator. Examples demonstrate that it's not sufficient to prove equality between two values of a certain type, and special equality checking types are introduced for the particular data types. In that case, where is == useful at all?
(==) (as the single constituent function of the Eq interface) is a function T -> T -> Bool: it computes a boolean answer that you can branch on at runtime. Whereas x = y (where x : T and y : T), AKA "intensional equality", is itself a type and therefore a proposition, and is what you use for equational reasoning. You can and often will want to bounce back and forth between the two different ways of expressing equality for a particular type.
x == y = True is also a proposition, and is often an intermediate step between reasoning about (==) and reasoning about =.
The exact relationship between the two kinds of equality is rather complex; you can read https://github.com/pdorrell/learning-idris/blob/9d3454a77f6e21cd476bd17c0bfd2a8a41f382b7/finished/EqFromEquality.idr for my own attempt to understand some aspects of it. (One thing to note is that even though an inductively defined type will have decidable intensional equality, you still have to go through a few hoops to prove that, and a few more hoops to define a corresponding implementation of Eq.)
One particular handy code snippet is this:
-- for rel x y, provide both the computed value, and the proposition that it is equal to the value (as a dependent pair)
has_value_dpair : (rel : t -> t -> Bool) -> (x : t) -> (y : t) -> (value: Bool ** rel x y = value)
has_value_dpair rel x y = (rel x y ** Refl)
You can use it with the with construct when you have a value returned from rel x y and you want to reason about the proposition rel x y = True or rel x y = False (and rel is some function that might represent a notion of equality between x and y).
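For example, a hypothetical use with the with construct (describe is an illustrative name, not part of any library):
-- branch on the computed Bool while keeping the proposition in scope
describe : (rel : t -> t -> Bool) -> (x : t) -> (y : t) -> String
describe rel x y with (has_value_dpair rel x y)
  describe rel x y | (True ** prf) = "related"     -- prf : rel x y = True
  describe rel x y | (False ** prf) = "unrelated"  -- prf : rel x y = False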
(In this answer I assume the case where (==) corresponds to =, but you are entirely free to define a (==) function that doesn't correspond to =, eg when defining a Setoid. So that's another reason to use (==) instead of =.)
You still need good old boolean equality because sometimes you can't prove things, and sometimes you don't even need a proof. Consider the next example:
countEquals : Eq a => a -> List a -> Nat
countEquals x = length . filter (== x)
You might want to just count the number of equal elements to show some statistics to the user. Another example: tests. Yes, even with a strong type system and dependent types you might want to perform good old unit tests, and checking expectations is rather convenient with the (==) operator, as sketched below.
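For instance, a tiny expectation check for the countEquals function above (testCountEquals is a hypothetical name):
-- a unit-test-style expectation, evaluated with (==)
testCountEquals : Bool
testCountEquals = countEquals 'a' ['a', 'b', 'a'] == 2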
I'm not going to write a full list of cases where you might need (==). The equality operator is not enough for proving, but you don't always need proofs.

Can GHC really never inline map, scanl, foldr, etc.?

I've noticed the GHC manual says "for a self-recursive function, the loop breaker can only be the function itself, so an INLINE pragma is always ignored."
Doesn't this say every application of common recursive functional constructs like map, zip, scan*, fold*, sum, etc. cannot be inlined?
You could always rewrite all these functions when you employ them, adding appropriate strictness tags, or maybe employ fancy techniques like the "stream fusion" recommended here.
Yet, doesn't all this dramatically constrain our ability to write code that's simultaneously fast and elegant?
Indeed, GHC cannot at present inline recursive functions. However:
GHC will still specialise recursive functions. For instance, given
fac :: (Eq a, Num a) => a -> a
fac 0 = 1
fac n = n * fac (n-1)
f :: Int -> Int
f x = 1 + fac x
GHC will spot that fac is used at type Int -> Int and generate a specialised version of fac for that type, which uses fast integer arithmetic.
This specialisation happens automatically within a module (e.g. if fac and f are defined in the same module). For cross-module specialisation (e.g. if f and fac are defined in different modules), mark the to-be-specialised function with an INLINABLE pragma:
{-# INLINABLE fac #-}
fac :: (Eq a, Num a) => a -> a
...
There are manual transformations which make functions nonrecursive. The lowest-power technique is the static argument transformation, which applies to recursive functions with arguments that don't change on recursive calls (e.g. many higher-order functions such as map, filter, and fold*). This transformation turns
map f [] = []
map f (x:xs) = f x : map f xs
into
map f xs0 = go xs0
  where
    go [] = []
    go (x:xs) = f x : go xs
so that a call such as
g :: [Int] -> [Int]
g xs = map (2*) xs
will have map inlined and become
g [] = []
g (x:xs) = 2*x : g xs
This transformation has been applied to Prelude functions such as foldr and foldl.
Fusion techniques also make many functions nonrecursive, and are more powerful than the static argument transformation. The main approach for lists, which is built into the Prelude, is shortcut fusion. The basic approach is to write as many functions as possible as non-recursive functions which use foldr and/or build; then all the recursion is captured in foldr, and there are special RULES for dealing with foldr.
Taking advantage of this fusion is in principle easy: avoid manual recursion, preferring library functions such as foldr, map, filter, and any functions in this list. In particular, writing code in this style produces code which is "simultaneously fast and elegant".
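As a sketch of that style, here is map expressed non-recursively via build and foldr (mapViaBuild is a hypothetical name; base's actual definition goes through a mapFB helper, shown in the next answer):
import GHC.Exts (build)

-- all recursion is delegated to foldr, so the foldr/build RULES
-- can fuse this with adjacent list producers and consumers
mapViaBuild :: (a -> b) -> [a] -> [b]
mapViaBuild f xs = build (\cons nil -> foldr (cons . f) nil xs)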
Modern libraries such as text and vector use stream fusion behind the scenes. Don Stewart wrote a pair of blog posts (1, 2) demonstrating this in action in the now obsolete library uvector, but the same principles apply to text and vector.
As with shortcut fusion, taking advantage of stream fusion in text and vector is in principle easy: avoid manual recursion, preferring library functions which have been marked as "subject to fusion".
There is ongoing work on improving GHC to support inlining of recursive functions. This falls under the general heading of supercompilation, and recent work on this seems to have been led by Max Bolingbroke and Neil Mitchell.
In short, not as often as you would think. The reason is that the "fancy techniques" such as stream fusion are employed when the libraries are implemented, and library users don't need to worry about them.
Consider Data.List.map. The base package defines map as
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs
This map is self-recursive, so GHC won't inline it.
However, base also defines the following rewrite rules:
{-# RULES
"map" [~1] forall f xs. map f xs = build (\c n -> foldr (mapFB c f) n xs)
"mapList" [1] forall f. foldr (mapFB (:) f) [] = map f
"mapFB" forall c f g. mapFB (mapFB c f) g = mapFB c (f.g)
#-}
This replaces uses of map via foldr/build fusion, then, if the function cannot be fused, replaces it with the original map. Because the fusion happens automatically, it doesn't depend on the user being aware of it.
As proof that this all works, you can examine what GHC produces for specific inputs. For this function:
proc1 = sum . take 10 . map (+1) . map (*2)
eval1 = proc1 [1..5]
eval2 = proc1 [1..]
when compiled with -O2, GHC fuses all of proc1 into a single recursive form (as seen in the core output with -ddump-simpl).
Of course there are limits to what these techniques can accomplish. For example, the naive average function mean xs = sum xs / length xs is easily transformed by hand into a single fold (and frameworks exist that can do so automatically), but at present there's no known way to automatically translate between standard functions and the fusion framework. So in this case the user does need to be aware of the limitations of the compiler-produced code.
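For concreteness, the manual transformation might look like this (naiveMean and mean are illustrative names; note the naive version also needs a fromIntegral to type-check):
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- naive version: two traversals, and the whole list stays alive between them
naiveMean :: [Double] -> Double
naiveMean xs = sum xs / fromIntegral (length xs)

-- manual transformation: one strict left fold accumulating sum and count
mean :: [Double] -> Double
mean = finish . foldl' step (0, 0 :: Int)
  where
    step (!s, !n) x = (s + x, n + 1)
    finish (s, n) = s / fromIntegral n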
So in many cases compilers are sufficiently advanced to create code that's fast and elegant. Knowing when they will do so, and when the compiler is likely to fall down, is IMHO a large part of learning how to write efficient Haskell code.
for a self-recursive function, the loop breaker can only be the function itself, so an INLINE pragma is always ignored.
To inline a recursive function, the compiler would have to know at compile time how many times the recursion unfolds; since that depends on the input, which has variable length, this is not possible in general.
Yet, doesn't all this dramatically constrain our ability to write code that's simultaneously fast and elegant?
There are certain techniques, though, that can make recursive calls much, much faster than they would otherwise be, for example tail call optimization (see the SO wiki).

What is the name of this generalization of idempotence?

Lots of commonly useful properties of functions have concise names. For example, associativity, commutativity, transitivity, etc.
I am making a library for use with QuickCheck that provides shorthand definitions of these properties and others.
The one I have a question about is idempotence of unary functions. A function f is idempotent iff ∀x . f x == f (f x).
There is an interesting generalization of this property for which I am struggling to find a similarly concise name. To avoid biasing people's name choices by suggesting one, I'll call it P and provide the following definition:
A function f has the P property with respect to g iff ∀x . f x == f (g x). We can see this as a generalization of idempotence by redefining idempotence in terms of P: a function f is idempotent iff it has the P property with respect to itself.
To see that this is a useful property, observe that it justifies a rewrite rule that can be used to implement a number of common optimizations. This often (but not always) arises when g is some sort of canonicalization function. Some examples (a QuickCheck formulation of P follows the list):
length is P with respect to map f (for all choices of f)
Converting to CNF is P with respect to converting to DNF (and vice versa)
Unicode normalization to form NFC is P with respect to normalization to form NFD (and vice versa)
minimum is P with respect to nub
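In QuickCheck terms, the property could be expressed along these lines (prop_P is a hypothetical name, since the question deliberately leaves the property unnamed):
import Test.QuickCheck

-- ∀x . f x == f (g x), as a testable property
prop_P :: (Arbitrary a, Show a, Eq b) => (a -> b) -> (a -> a) -> Property
prop_P f g = property $ \x -> f x == f (g x)

-- e.g. length is P with respect to map negate:
-- quickCheck (prop_P (length :: [Int] -> Int) (map negate))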
What would you name this property?
One can say that map f is length-preserving, or that length is invariant under map f. So how about:
g is f-preserving.
f is invariant under (applying) g.

GroupBy function from .NET in Haskell

The LINQ library in the .NET framework has a very useful function called GroupBy, which I have been using all the time.
Its type in Haskell would look like
Ord b => (a -> b) -> [a] -> [(b, [a])]
Its purpose is to classify items based on the given classification function f into buckets, with each bucket containing similar items, that is (b, l) such that for any item x in l, f x == b.
Its performance in .NET is O(N) because it uses hash-tables, but in Haskell I am OK with O(N*log(N)).
I can't find anything similar in standard Haskell libraries. Also, my implementation in terms of standard functions is somewhat bulky:
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

myGroupBy :: Ord k => (a -> k) -> [a] -> [(k, [a])]
myGroupBy f = map toFst
            . groupBy ((==) `on` fst)
            . sortBy (comparing fst)
            . map (\a -> (f a, a))
  where
    toFst l@((k,_):_) = (k, map snd l)
This is definitely not something I want to see amongst my problem-specific code.
My question is: how can I implement this function nicely exploiting standard libraries to their maximum?
Also, the seeming absence of such a standard function hints that it may rarely be needed by experienced Haskellers because they may know some better way. Is that true? What can be used to implement similar functionality in a better way?
Also, what would be the good name for it, considering groupBy is already taken? :)
GHC.Exts.groupWith
groupWith :: Ord b => (a -> b) -> [a] -> [[a]]
Introduced as part of generalised list comprehensions: http://www.haskell.org/ghc/docs/7.0.2/html/users_guide/syntax-extns.html#generalised-list-comprehensions
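groupWith returns only the groups, so to get the [(b, [a])] shape the question asks for, one still has to re-attach each group's key; a small hypothetical bridge (groupByKey is an illustrative name):
import GHC.Exts (groupWith)

-- head is safe here: groupWith never produces an empty group
groupByKey :: Ord b => (a -> b) -> [a] -> [(b, [a])]
groupByKey f xs = [ (f (head g), g) | g <- groupWith f xs ]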
Using Data.Map as the intermediate structure:
import Control.Arrow ((&&&))
import qualified Data.Map as M
myGroupBy f = M.toList . M.fromListWith (++) . map (f &&& return)
The map operation turns the input list into a list of keys paired with singleton lists containing the elements. M.fromListWith (++) turns this into a Data.Map, concatenating when two items have the same key, and M.toList gets the pairs back out again.
Note that this reverses the lists, so adjust for that if necessary. It is also easy to replace return and (++) with other monoid-like operations if, for example, you only wanted the sum of the elements in each group.
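For instance (illustrative examples building on the snippet above; sumGroupBy is a hypothetical name):
-- grouping words by length; note the reversed order within each group:
-- myGroupBy length ["a", "bb", "cc", "d"]  ==>  [(1,["d","a"]),(2,["cc","bb"])]

-- replacing return/(++) with id/(+) to sum each group instead of collecting it
sumGroupBy :: (Ord k, Num a) => (a -> k) -> [a] -> [(k, a)]
sumGroupBy f = M.toList . M.fromListWith (+) . map (f &&& id)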