What is the name of this generalization of idempotence? - optimization

Lots of commonly useful properties of functions have concise names. For example, associativity, commutativity, transitivity, etc.
I am making a library for use with QuickCheck that provides shorthand definitions of these properties and others.
The one I have a question about is idempotence of unary functions. A function f is idempotent iif ∀x . f x == f (f x).
There is an interesting generalization of this property for which I am struggling to find a similarly concise name. To avoid biasing peoples name choices by suggesting one, I'll name it P and provide the following definition:
A function f has the P property with respect to g iif ∀x . f x == f (g x). We can see this as a generalization of idempotence by redefining idempotence in terms of P. A function f is idempotent iif it has the P property with respect to itself.
To see that this is a useful property observe that it justifies a rewrite rule that can be used to implement a number of common optimizations. This often but not always arises when g is some sort of canonicalization function. Some examples:
length is P with respect to map f (for all choices of f)
Converting to CNF is P with respect to converting to DNF (and vice versa)
Unicode normalization to form NFC is P with respect to normalization to form NFD (and vice versa)
minimum is P with respect to nub
What would you name this property?

One can say that map f is length-preserving, or that length is invariant under map fing. So how about:
g is f-preserving.
f is invariant under (applying) g.

Related

Pointwise function equality proof [duplicate]

I'm just starting playing with idris and theorem proving in general. I can follow most of the examples of proofs of basic facts on the internet, so I wanted to try something arbitrary by my own. So, I want to write a proof term for the following basic property of map:
map : (a -> b) -> List a -> List b
prf : map id = id
Intuitively, I can imagine how the proof should work: Take an arbitrary list l and analyze the possibilities for map id l. When l is empty, it's obvious; when
l is non-empty it's based on the concept that function application preserves equality.
So, I can do something like this:
prf' : (l : List a) -> map id l = id l
It's like a for all statement. How can I turn it into a proof of the equality of the functions involved?
You can't. Idris's type theory (like Coq's and Agda's) does not support general extensionality. Given two functions f and g that "act the same", you will never be able to prove Not (f = g), but you will only be able to prove f = g if f and g are defined the same, up to alpha and eta equivalence or so. Unfortunately, things only get worse when you consider higher-order functions; there's a theorem about such in the Coq standard library, but I can't seem to find or remember it right now.

Return highest or lowest value Z notation , formal method

I am new to Z notation,
Lets say I have a function f defined as X |--> Y ,
where X is string and Y is number.
How can I get highest Y value in this function? Does 'loop' exist in formal method so I can solve it using loop?
I know there is recursion in Z notation, but based on the material provided, I only found it apply in multiset or bag, can it apply in function?
Any extra reference application of 'loop' or recursion application will be appreciated. Sorry for my English.
You can just use the predefined function max that takes a set of integers as input and returns the maximum number. The input values here are the range (the set of all values) of the function:
max(ran(f))
Please note that the maximum is not defined for empty sets.
Regarding your question about recursion or loops: You can actually define a function recursively but I think your question aims more at a way to compute something. This is not easily expressed in Z and this is IMO a good thing because it is used for specifications and it is not a programming language. Even if there wouldn't be a max or ran function, you could still specify the number m you are looking for by:
\exists s:String # (s,m):f /\
\forall s2:String, i2:Z # (s2,i2):f ==> i2 <= m
("m is a value of f, belonging to an s and all other values i2 of f are smaller or equal")
After getting used to the style it is usually far better to understand than any programming language (except your are trying to describe an algorithm itself and not its expected outcome).#
Just for reference: An example of a recursive definition (let's call it rmax) for the maximum would consist of a base case:
\forall e:Z # rmax({e}) = e
and a recursive case:
\forall e:Z; S:\pow(Z) #
S \noteq {} \land
rmax({e} \cup S) = \IF e > rmax(S) \THEN e \ELSE rmax(S)
But note that this is still not a "computation rule" of rmax because e in the second rule can be an arbitrary element of S. In more complex scenarios it might even be not obvious that the defined relation is a function at all because depending on the chosen elements different results could be computed.

How to use SmallCheck in Haskell?

I am trying to use SmallCheck to test a Haskell program, but I cannot understand how to use the library to test my own data types. Apparently, I need to use the Test.SmallCheck.Series. However, I find the documentation for it extremely confusing. I am interested in both cookbook-style solutions and an understandable explanation of the logical (monadic?) structure. Here are some questions I have (all related):
If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)? What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?
I managed to make quickCheck work like this without getting my hands too dirty by using judicious applications of Control.Monad.liftM to functions like makeTale. I couldn't figure out a way to do this with smallCheck (please explain it to me!).
What is the relationship between the types Serial, Series, etc.?
(optional) What is the point of coSeries? How do I use the Positive type from SmallCheck.Series?
(optional) Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.
If there is there any intro/tutorial to using smallCheck, I'd appreciate a link. Thank you very much!
UPDATE: I should add that the most useful and readable documentation I found for smallCheck is this paper (PDF). I could not find the answer to my questions there on the first look; it is more of a persuasive advertisement than a tutorial.
UPDATE 2: I moved my question about the weird Identity that shows up in the type of Test.SmallCheck.list and other places to a separate question.
NOTE: This answer describes pre-1.0 versions of SmallCheck. See this blog post for the important differences between SmallCheck 0.6 and 1.0.
SmallCheck is like QuickCheck in that it tests a property over some part of the space of possible types. The difference is that it tries to exhaustively enumerate a series all of the "small" values instead of an arbitrary subset of smallish values.
As I hinted, SmallCheck's Serial is like QuickCheck's Arbitrary.
Now Serial is pretty simple: a Serial type a has a way (series) to generate a Series type which is just a function from Depth -> [a]. Or, to unpack that, Serial objects are objects we know how to enumerate some "small" values of. We are also given a Depth parameter which controls how many small values we should generate, but let's ignore it for a minute.
instance Serial Bool where series _ = [False, True]
instance Serial Char where series _ = "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
series d = Nothing : map Just (series d)
In these cases we're doing nothing more than ignoring the Depth parameter and then enumerating "all" possible values for each type. We can even do this automatically for some types
instance (Enum a, Bounded a) => Serial a where series _ = [minBound .. maxBound]
This is a really simple way of testing properties exhaustively—literally test every single possible input! Obviously there are at least two major pitfalls, though: (1) infinite data types will lead to infinite loops when testing and (2) nested types lead to exponentially larger spaces of examples to look through. In both cases, SmallCheck gets really large really quickly.
So that's the point of the Depth parameter—it lets the system ask us to keep our Series small. From the documentation, Depth is the
Maximum depth of generated test values
For data values, it is the depth of nested constructor applications.
For functional values, it is both the depth of nested case analysis and the depth of results.
so let's rework our examples to keep them Small.
instance Serial Bool where
series 0 = []
series 1 = [False]
series _ = [False, True]
instance Serial Char where
series d = take d "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
-- we shrink d by one since we're adding Nothing
series d = Nothing : map Just (series (d-1))
instance (Enum a, Bounded a) => Serial a where series d = take d [minBound .. maxBound]
Much better.
So what's coseries? Like coarbitrary in the Arbitrary typeclass of QuickCheck, it lets us build a series of "small" functions. Note that we're writing the instance over the input type---the result type is handed to us in another Serial argument (that I'm below calling results).
instance Serial Bool where
coseries results d = [\cond -> if cond then r1 else r2 |
r1 <- results d
r2 <- results d]
these take a little more ingenuity to write and I'll actually refer you to use the alts methods which I'll describe briefly below.
So how can we make some Series of Persons? This part is easy
instance Series Person where
series d = SnowWhite : take (d-1) (map Dwarf [1..7])
...
But our coseries function needs to generate every possible function from Persons to something else. This can be done using the altsN series of functions provided by SmallCheck. Here's one way to write it
coseries results d = [\person ->
case person of
SnowWhite -> f 0
Dwarf n -> f n
| f <- alts1 results d ]
The basic idea is that altsN results generates a Series of N-ary function from N values with Serial instances to the Serial instance of Results. So we use it to create a function from [0..7], a previously defined Serial value, to whatever we need, then we map our Persons to numbers and pass 'em in.
So now that we have a Serial instance for Person, we can use it to build more complex nested Serial instances. For "instance", if FairyTale is a list of Persons, we can use the Serial a => Serial [a] instance alongside our Serial Person instance to easily create a Serial FairyTale:
instance Serial FairyTale where
series = map makeFairyTale . series
coseries results = map (makeFairyTale .) . coseries results
(the (makeFairyTale .) composes makeFairyTale with each function coseries generates, which is a little confusing)
If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)?
First of all, you need to decide which values you want to generate for each depth. There's no single right answer here, it depends on how fine-grained you want your search space to be.
Here are just two possible options:
people d = SnowWhite : map Dwarf [1..7] (doesn't depend on the depth)
people d = take d $ SnowWhite : map Dwarf [1..7] (each unit of depth increases the search space by one element)
After you've decided on that, your Serial instance is as simple as
instance Serial m Person where
series = generate people
We left m polymorphic here as we don't require any specific structure of the underlying monad.
What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?
Use cons1:
instance Serial m FairyTale where
series = cons1 makeTale
What is the relationship between the types Serial, Series, etc.?
Serial is a type class; Series is a type. You can have multiple Series of the same type — they correspond to different ways to enumerate values of that type. However, it may be arduous to specify for each value how it should be generated. The Serial class lets us specify a good default for generating values of a particular type.
The definition of Serial is
class Monad m => Serial m a where
series :: Series m a
So all it does is assigning a particular Series m a to a given combination of m and a.
What is the point of coseries?
It is needed to generate values of functional types.
How do I use the Positive type from SmallCheck.Series?
For example, like this:
> smallCheck 10 $ \n -> n^3 >= (n :: Integer)
Failed test no. 5.
there exists -2 such that
condition is false
> smallCheck 10 $ \(Positive n) -> n^3 >= (n :: Integer)
Completed 10 tests without failure.
Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.
When you are writing a Serial instance (or any Series expression), you work in the Series m monad.
When you are writing tests, you work with simple functions that return Bool or Property m.
While I think that #tel's answer is an excellent explanation (and I wish smallCheck actually worked the way he describes), the code he provides does not work for me (with smallCheck version 1). I managed to get the following to work...
UPDATE / WARNING: The code below is wrong for a rather subtle reason. For the corrected version, and details, please see this answer to the question mentioned below. The short version is that instead of instance Serial Identity Person one must write instance (Monad m) => Series m Person.
... but I find the use of Control.Monad.Identity and all the compiler flags bizarre, and I have asked a separate question about that.
Note also that while Series Person (or actually Series Identity Person) is not actually exactly the same as functions Depth -> [Person] (see #tel's answer), the function generate :: Depth -> [a] -> Series m a converts between them.
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, FlexibleContexts, UndecidableInstances #-}
import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Monad.Identity
data Person = SnowWhite | Dwarf Int
instance Serial Identity Person where
series = generate (\d -> SnowWhite : take (d-1) (map Dwarf [1..7]))

QuickCheck: Arbitrary instances of nested data structures that generate balanced specimens

tl;dr: how do you write instances of Arbitrary that don't explode if your data type allows for way too much nesting? And how would you guarantee these instances produce truly random specimens of your data structure?
I want to generate random tree structures, then test certain properties of these structures after I've mangled them with my library code. (NB: I'm writing an implementation of a subtyping algorithm, i.e. given a hierarchy of types, is type A a subtype of type B. This can be made arbitrarily complex, by including multiple-inheritance and post-initialization updates to the hierarchy. The classical method that supports neither of these is Schubert Numbering, and the latest result known to me is Alavi et al. 2008.)
Let's take the example of rose-trees, following Data.Tree:
data Tree a = Node a (Forest a)
type Forest a = [Tree a]
A very simple (and don't-try-this-at-home) instance of Arbitray would be:
instance (Arbitrary a) => Arbitrary (Tree a) where
arbitrary = Node <$> arbitrary <$> arbitrary
Since a already has an Arbitrary instance as per the type constraint, and the Forest will have one, because [] is an instance, too, this seems straight-forward. It won't (typically) terminate for very obvious reasons: since the lists it generates are arbitrarily long, the structures become too large, and there's a good chance they won't fit into memory. Even a more conservative approach:
arbitrary = Node <$> arbitrary <*> oneof [arbitrary,return []]
won't work, again, for the same reason. One could tweak the size parameter, to keep the length of the lists down, but even that won't guarantee termination, since it's still multiple consecutive dice-rolls, and it can turn out quite badly (and I want the odd node with 100 children.)
Which means I need to limit the size of the entire tree. That is not so straight-forward. unordered-containers has it easy: just use fromList. This is not so easy here: How do you turn a list into a tree, randomly, and without incurring bias one way or the other (i.e. not favoring left-branches, or trees that are very left-leaning.)
Some sort of breadth-first construction (the functions provided by Data.Tree are all pre-order) from lists would be awesome, and I think I could write one, but it would turn out to be non-trivial. Since I'm using trees now, but will use even more complex stuff later on, I thought I might try to find a more general and less complex solution. Is there one, or will I have to resort to writing my own non-trivial Arbitrary generator? In the latter case, I might actually just resort to unit-tests, since this seems too much work.
Use sized:
instance Arbitrary a => Arbitrary (Tree a) where
arbitrary = sized arbTree
arbTree :: Arbitrary a => Int -> Gen (Tree a)
arbTree 0 = do
a <- arbitrary
return $ Node a []
arbTree n = do
(Positive m) <- arbitrary
let n' = n `div` (m + 1)
f <- replicateM m (arbTree n')
a <- arbitrary
return $ Node a f
(Adapted from the QuickCheck presentation).
P.S. Perhaps this will generate overly balanced trees...
You might want to use the library presented in the paper "Feat: Functional Enumeration of Algebraic Types" at the Haskell Symposium 2012. It is on Hackage as testing-feat, and a video of the talk introducing it is available here: http://www.youtube.com/watch?v=HbX7pxYXsHg
As Janis mentioned, you can use the package testing-feat, which creates enumerations of arbitrary algebraic data types. This is the easiest way to create unbiased uniformly distributed generators
for all trees of up to a given size.
Here is how you would use it for rose trees:
import Test.Feat (Enumerable(..), uniform, consts, funcurry)
import Test.Feat.Class (Constructor)
import Data.Tree (Tree(..))
import qualified Test.QuickCheck as QC
-- We make an enumerable instance by listing all constructors
-- for the type. In this case, we have one binary constructor:
-- Node :: a -> [Tree a] -> Tree a
instance Enumerable a => Enumerable (Tree a) where
enumerate = consts [binary Node]
where
binary :: (a -> b -> c) -> Constructor c
binary = unary . funcurry
-- Now we use the Enumerable instance to create an Arbitrary
-- instance with the help of the function:
-- uniform :: Enumerable a => Int -> QC.Gen a
instance Enumerable a => QC.Arbitrary (Tree a) where
QC.arbitrary = QC.sized uniform
-- QC.shrink = <some implementation>
The Enumerable instance can also be generated automatically with TemplateHaskell:
deriveEnumerable ''Tree

Can GHC really never inline map, scanl, foldr, etc.?

I've noticed the GHC manual says "for a self-recursive function, the loop breaker can only be the function itself, so an INLINE pragma is always ignored."
Doesn't this say every application of common recursive functional constructs like map, zip, scan*, fold*, sum, etc. cannot be inlined?
You could always rewrite all these function when you employ them, adding appropriate strictness tags, or maybe employ fancy techniques like the "stream fusion" recommended here.
Yet, doesn't all this dramatically constrain our ability to write code that's simultaneously fast and elegant?
Indeed, GHC cannot at present inline recursive functions. However:
GHC will still specialise recursive functions. For instance, given
fac :: (Eq a, Num a) => a -> a
fac 0 = 1
fac n = n * fac (n-1)
f :: Int -> Int
f x = 1 + fac x
GHC will spot that fac is used at type Int -> Int and generate a specialised version of fac for that type, which uses fast integer arithmetic.
This specialisation happens automatically within a module (e.g. if fac and f are defined in the same module). For cross-module specialisation (e.g. if f and fac are defined in different modules), mark the to-be-specialised function with an INLINABLE pragma:
{-# INLINABLE fac #-}
fac :: (Eq a, Num a) => a -> a
...
There are manual transformations which make functions nonrecursive. The lowest-power technique is the static argument transformation, which applies to recursive functions with arguments which don't change on recursive calls (eg many higher-order functions such as map, filter, fold*). This transformation turns
map f [] = []
map f (x:xs) = f x : map f xs
into
map f xs0 = go xs0
where
go [] = []
go (x:xs) = f x : go xs
so that a call such as
g :: [Int] -> [Int]
g xs = map (2*) xs
will have map inlined and become
g [] = []
g (x:xs) = 2*x : g xs
This transformation has been applied to Prelude functions such as foldr and foldl.
Fusion techniques are also make many functions nonrecursive, and are more powerful than the static argument transformation. The main approach for lists, which is built into the Prelude, is shortcut fusion. The basic approach is to write as many functions as possible as non-recursive functions which use foldr and/or build; then all the recursion is captured in foldr, and there are special RULES for dealing with foldr.
Taking advantage of this fusion is in principle easy: avoid manual recursion, preferring library functions such as foldr, map, filter, and any functions in this list. In particular, writing code in this style produces code which is "simultaneously fast and elegant".
Modern libraries such as text and vector use stream fusion behind the scenes. Don Stewart wrote a pair of blog posts (1, 2) demonstrating this in action in the now obsolete library uvector, but the same principles apply to text and vector.
As with shortcut fusion, taking advantage of stream fusion in text and vector is in principle easy: avoid manual recursion, preferring library functions which have been marked as "subject to fusion".
There is ongoing work on improving GHC to support inlining of recursive functions. This falls under the general heading of supercompilation, and recent work on this seems to have been led by Max Bolingbroke and Neil Mitchell.
In short, not as often as you would think. The reason is that the "fancy techniques" such as stream fusion are employed when the libraries are implemented, and library users don't need to worry about them.
Consider Data.List.map. The base package defines map as
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs
This map is self-recursive, so GHC won't inline it.
However, base also defines the following rewrite rules:
{-# RULES
"map" [~1] forall f xs. map f xs = build (\c n -> foldr (mapFB c f) n xs)
"mapList" [1] forall f. foldr (mapFB (:) f) [] = map f
"mapFB" forall c f g. mapFB (mapFB c f) g = mapFB c (f.g)
#-}
This replaces uses of map via foldr/build fusion, then, if the function cannot be fused, replaces it with the original map. Because the fusion happens automatically, it doesn't depend on the user being aware of it.
As proof that this all works, you can examine what GHC produces for specific inputs. For this function:
proc1 = sum . take 10 . map (+1) . map (*2)
eval1 = proc1 [1..5]
eval2 = proc1 [1..]
when compiled with -O2, GHC fuses all of proc1 into a single recursive form (as seen in the core output with -ddump-simpl).
Of course there are limits to what these techniques can accomplish. For example, the naive average function, mean xs = sum xs / length xs is easily manually transformed into a single fold, and frameworks exist that can do so automatically, however at present there's no known way to automatically translate between standard functions and the fusion framework. So in this case, the user does need to be aware of the limitations of the compiler-produced code.
So in many cases compilers are sufficiently advanced to create code that's fast and elegant. Knowing when they will do so, and when the compiler is likely to fall down, is IMHO a large part of learning how to write efficient Haskell code.
for a self-recursive function, the loop breaker can only be the function itself, so an INLINE pragma is always ignored.
If something is recursive, to inline it, you would have to know how many times it is executed at compile time. Considering it will be a variable length input, that is not possible.
Yet, doesn't all this dramatically constrain our ability to write code that's simultaneously fast and elegant?
There are certain techniques though that can make recursive calls much, much faster than their normal situation. For example, tail call optimization SO Wiki