How to properly use GHC's SPECIALIZE pragma? (Example: specializing pure functions from monadic ones using Identity.)

As an example, suppose I want to write a monadic and non-monadic map over lists. I'll start with the monadic one:
import Control.Monad
import Control.Monad.Identity
mapM' :: (Monad m) => (a -> m b) -> ([a] -> m [b])
mapM' _ [] = return []
mapM' f (x:xs) = liftM2 (:) (f x) (mapM f xs)
Now I want to reuse the code to write the pure map (instead of repeating the code):
map' :: (a -> b) -> ([a] -> [b])
map' f = runIdentity . mapM' (Identity . f)
What is necessary to make map' as optimized as if it were written explicitly like map is? In particular:
Is it necessary to write
{-# SPECIALIZE mapM' :: (a -> Identity b) -> ([a] -> Identity [b]) #-}
or does GHC optimize map' itself (by factoring out Identity completely)?
Anything else (more pragmas) need to be added?
How can I verify how well the compiled map' is optimized with respect to the explicitly written code for map?

Well, let us ask the compiler itself.
Compiling the module
module PMap where
import Control.Monad
import Control.Monad.Identity
mapM' :: (Monad m) => (a -> m b) -> ([a] -> m [b])
mapM' _ [] = return []
mapM' f (x:xs) = liftM2 (:) (f x) (mapM f xs)
map' :: (a -> b) -> ([a] -> [b])
map' f = runIdentity . mapM' (Identity . f)
with ghc -O2 -ddump-simpl -ddump-to-file PMap.hs produces the following Core for map' (ghc-7.6.1; 7.4.2 produces the same except for unique names):
PMap.map'
  :: forall a_afB b_afC. (a_afB -> b_afC) -> [a_afB] -> [b_afC]
[GblId,
 Arity=2,
 Caf=NoCafRefs,
 Str=DmdType LS,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True,
         ConLike=True, WorkFree=True, Expandable=True,
         Guidance=IF_ARGS [60 30] 160 40}]
PMap.map' =
  \ (@ a_c) (@ b_d) (f_afK :: a_c -> b_d) (eta_B1 :: [a_c]) ->
    case eta_B1 of _ {
      [] -> GHC.Types.[] @ b_d;
      : x_afH xs_afI ->
        GHC.Types.:
          @ b_d
          (f_afK x_afH)
          (letrec {
             go_ahZ [Occ=LoopBreaker]
               :: [a_c] -> Data.Functor.Identity.Identity [b_d]
             [LclId, Arity=1, Str=DmdType S]
             go_ahZ =
               \ (ds_ai0 :: [a_c]) ->
                 case ds_ai0 of _ {
                   [] ->
                     (GHC.Types.[] @ b_d)
                     `cast` (Sym <(Data.Functor.Identity.NTCo:Identity <[b_d]>)>
                             :: [b_d] ~# Data.Functor.Identity.Identity [b_d]);
                   : y_ai5 ys_ai6 ->
                     (GHC.Types.:
                        @ b_d
                        (f_afK y_ai5)
                        ((go_ahZ ys_ai6)
                         `cast` (<Data.Functor.Identity.NTCo:Identity <[b_d]>>
                                 :: Data.Functor.Identity.Identity [b_d] ~# [b_d])))
                     `cast` (Sym <(Data.Functor.Identity.NTCo:Identity <[b_d]>)>
                             :: [b_d] ~# Data.Functor.Identity.Identity [b_d])
                 }; } in
           (go_ahZ xs_afI)
           `cast` (<Data.Functor.Identity.NTCo:Identity <[b_d]>>
                   :: Data.Functor.Identity.Identity [b_d] ~# [b_d]))
    }
Yup, only casts, no real overhead. You get a local worker go that acts exactly as map does.
Summing up: you only need -O2, and you can verify how well optimised the code is by looking at the Core (-ddump-simpl) or, if you can read it, at the produced assembly (-ddump-asm) or LLVM code (-ddump-llvm).
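To make the comparison with a hand-written map direct, one option is to put the generic and the explicit version side by side in one module and dump its Core. A minimal sketch, assuming a hypothetical file name PMapCompare.hs and the base-only Data.Functor.Identity (instead of mtl's Control.Monad.Identity). Note that here the recursive case calls mapM' itself, whereas the PMap module above calls mapM, so the resulting Core will not be identical to the dump shown.

```haskell
{-# OPTIONS_GHC -O2 -ddump-simpl -ddump-to-file -dsuppress-uniques #-}
-- With the flags in the file, a plain `ghc PMapCompare.hs` writes the
-- simplified Core to a .dump-simpl file (exact name and location vary
-- with the GHC version); -dsuppress-uniques drops the _afB-style suffixes.
module Main (main, map', mapExplicit) where

import Control.Monad (liftM2)
import Data.Functor.Identity (Identity (..))

mapM' :: Monad m => (a -> m b) -> [a] -> m [b]
mapM' _ [] = return []
mapM' f (x:xs) = liftM2 (:) (f x) (mapM' f xs)

-- Pure map derived from the monadic one via Identity.
map' :: (a -> b) -> [a] -> [b]
map' f = runIdentity . mapM' (Identity . f)

-- Hand-written reference version; exported so its Core is kept for comparison.
mapExplicit :: (a -> b) -> [a] -> [b]
mapExplicit _ [] = []
mapExplicit f (x:xs) = f x : mapExplicit f xs

main :: IO ()
main = do
  print (map' (* 2) [1, 2, 3 :: Int])
  print (mapExplicit (* 2) [1, 2, 3 :: Int])
```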
It is probably good to elaborate a bit. Concerning
Is it necessary to write
{-# SPECIALIZE mapM' :: (a -> Identity b) -> ([a] -> Identity [b]) #-}
or does GHC optimize map' itself (by factoring out Identity completely)?
the answer is that if you use the specialisation in the same module in which the general function is defined, then in general you don't need a {-# SPECIALISE #-} pragma: GHC creates the specialisation on its own if it sees any benefit in doing so. In the above module, GHC created the specialisation rule
"SPEC PMap.mapM' [Data.Functor.Identity.Identity]" [ALWAYS]
    forall (@ a_abG)
           (@ b_abH)
           ($dMonad_sdL :: GHC.Base.Monad Data.Functor.Identity.Identity).
      PMap.mapM' @ Data.Functor.Identity.Identity
                 @ a_abG
                 @ b_abH
                 $dMonad_sdL
      = PMap.mapM'_$smapM' @ a_abG @ b_abH
that also benefits any uses of mapM' at the Identity monad outside the defining module (if compiled with optimisations, and the monad is recognised as Identity in time for the rule to fire).
However, if GHC doesn't understand the type to specialise to well enough, it may see no benefit and not specialise (I don't know GHC's internals well enough to tell whether it will try anyway; so far I have found a specialisation each time I looked).
If you want to be sure, look at the core.
If you need the specialisation in a different module, GHC has no reason to specialise the function when it compiles the defining module, so in that case a pragma is necessary. Instead of a {-# SPECIALISE #-} pragma demanding a specialisation for a few hand-picked types, it is probably better, as of ghc-7, to use an {-# INLINABLE #-} pragma: it makes the (slightly modified) source code available to importing modules, which can then create specialisations for any types they need.
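A sketch of what the defining side could look like with both pragmas, assuming base's Data.Functor.Identity. In a real library mapM' would sit in its own module and the importing code elsewhere; everything is kept in one file here only so it runs. As in the previous sketch, the recursion goes through mapM' (the question's version calls mapM). Using both pragmas together is one possible choice, not a requirement.

```haskell
import Control.Monad (liftM2)
import Data.Functor.Identity (Identity (..))

-- INLINABLE keeps the unfolding of mapM' in the interface file, so
-- importing modules can create their own specialisations for whatever
-- monad they use it at.
{-# INLINABLE mapM' #-}
mapM' :: Monad m => (a -> m b) -> [a] -> m [b]
mapM' _ [] = return []
mapM' f (x:xs) = liftM2 (:) (f x) (mapM' f xs)

-- SPECIALIZE instead asks for one fixed specialisation, created here in
-- the defining module; GHC also generates a rewrite rule so that calls
-- at Identity can be redirected to it.
{-# SPECIALIZE mapM' :: (a -> Identity b) -> [a] -> Identity [b] #-}

map' :: (a -> b) -> [a] -> [b]
map' f = runIdentity . mapM' (Identity . f)

main :: IO ()
main = print (map' (* 2) [1, 2, 3 :: Int])
```

The INLINABLE route scales to monads the library author never anticipated, while SPECIALIZE only covers the types listed in the pragma.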
Anything else (more pragmas) need to be added?
Different uses may of course require different pragmas, but as a rule of thumb, {-# INLINABLE #-} is the one you want most. And of course {-# RULES #-} can perform rewrites the compiler cannot discover on its own.
How can I verify how well the compiled map' is optimized with respect to the explicitly written code for map?
Look at the produced core, asm, or llvm bitcode, whichever you understand best (core is relatively easy).
Benchmark the produced code against a hand-written specialisation if the Core leaves you unsure and you need to know. Ultimately, unless you get identical intermediate results at some stage (Core/Cmm/asm/LLVM), benchmarking is the only way to know for sure.
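A possible benchmark harness, assuming the criterion package is installed (it is not mentioned in the original answer). It checks that both versions agree before timing them, and uses nf so that the whole result list is forced rather than just its first cons cell. The list size is an arbitrary choice.

```haskell
-- Assumption: the criterion package is available.
import Control.Monad (liftM2)
import Criterion.Main (bench, bgroup, defaultMain, nf)
import Data.Functor.Identity (Identity (..))

mapM' :: Monad m => (a -> m b) -> [a] -> m [b]
mapM' _ [] = return []
mapM' f (x:xs) = liftM2 (:) (f x) (mapM' f xs)

map' :: (a -> b) -> [a] -> [b]
map' f = runIdentity . mapM' (Identity . f)

main :: IO ()
main = do
  let xs = [1 .. 100000] :: [Int]
  -- Sanity check: only time the two versions if they compute the same thing.
  if map' (+ 1) xs /= map (+ 1) xs
    then error "map' and map disagree"
    else defaultMain
      [ bgroup "map"
          [ bench "map' via Identity" (nf (map' (+ 1)) xs)
          , bench "Prelude.map" (nf (map (+ 1)) xs)
          ]
      ]
```

Compile it with -O2 as well, otherwise you are measuring unoptimised code.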

Related

Transformers in type signature or not?

Just thinking about an API design. What is "common" in Haskell? Transformers in type signature or rather "hidden"?
findById :: ID -> IO (Maybe User)
findById x = runMaybeT $ do
...
return User
or
findById :: ID -> MaybeT IO User
findById x = do
...
return User
If this is for something simple, and it's only a few functions that do this maybe-in-IO, I would just make the type IO (Maybe User).
If this is a pattern that stretches across your library, I would give a semi-abstract name to the transformer-stack monad:
type Request = MaybeT IO
findById :: ID -> Request User
... or even
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Control.Monad.Trans.Maybe (MaybeT)
newtype Request a = Request { runRequest :: MaybeT IO a }
  deriving (Functor, Applicative, Monad)
Making the signature ID -> MaybeT IO User isn't very good: the transformer only helps if you're doing a whole bunch of actions in that monad, but in that case always writing out MaybeT IO violates the DRY principle.
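A minimal sketch of the type-synonym approach, assuming the transformers package. The ID and User types, the in-memory users table and the findPair/findPairIO helpers are hypothetical, made up for illustration. It shows that chained lookups inside Request need no manual Maybe plumbing, and that you can still hand out the plain IO (Maybe _) type at the API boundary.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Maybe (MaybeT (..))

-- Hypothetical domain types and data, for illustration only.
type ID = Int
data User = User { userId :: ID, userName :: String } deriving (Show, Eq)

users :: [(ID, User)]
users = [(1, User 1 "alice"), (2, User 2 "bob")]

-- Semi-abstract name for the transformer stack.
type Request = MaybeT IO

findById :: ID -> Request User
findById x = do
  liftIO (putStrLn ("looking up user " ++ show x))
  MaybeT (return (lookup x users)) -- Nothing here aborts the whole Request

-- Inside Request, a failed lookup short-circuits the rest of the block.
findPair :: ID -> ID -> Request (User, User)
findPair a b = do
  u <- findById a
  v <- findById b
  return (u, v)

-- At the API boundary the simpler IO (Maybe _) type can still be exposed.
findPairIO :: ID -> ID -> IO (Maybe (User, User))
findPairIO a b = runMaybeT (findPair a b)

main :: IO ()
main = do
  findPairIO 1 2 >>= print
  findPairIO 1 3 >>= print
```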

How does one use monadic properties in SmallCheck?

I would like to write a SmallCheck property that uses IO, but I can't figure out how I am supposed to do it. Specifically, the goal is to write a property that is an instance of Testable IO Bool so that I can feed it into smallCheck (or testProperty in test-framework). Unfortunately, the best I can come up with is the following:
smallCheck 5 (\(x :: Int) → return True :: IO Bool)
This doesn't work because it is an instance of Testable IO (IO Bool) rather than Testable IO Bool, but I can't figure out how to rewrite it so that it works.
Any help would be appreciated.
You want the monadic combinator. It takes an action in an arbitrary monad m and wraps it into a Property m, which is an instance of Testable.
smallCheck 5 $ \(x :: Int) -> monadic $ (return True :: IO Bool)
It turns out that there is a function that does exactly what I wanted:
monadic :: Testable m a => m a -> Property m
You use it like so:
smallCheck 5 $ \(x :: Int) → monadic (putStrLn (show x) >> return True)
Specifically, note how monadic needs to be applied inside the lambda, after the function argument.
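A self-contained sketch, assuming smallcheck 1.x, where smallCheck and monadic are exported from Test.SmallCheck. The property prop_incrementRef is hypothetical: it does some IORef work in IO for each generated Int and reports the result as an IO Bool, which monadic then turns into a Property IO.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Data.IORef (modifyIORef, newIORef, readIORef)
import Test.SmallCheck (monadic, smallCheck)

-- Hypothetical IO-based check: incrementing a fresh IORef adds exactly one.
prop_incrementRef :: Int -> IO Bool
prop_incrementRef x = do
  ref <- newIORef x
  modifyIORef ref (+ 1)
  y <- readIORef ref
  return (y == x + 1)

-- monadic sits inside the lambda, so the whole thing has type
-- Int -> Property IO, which is Testable IO.
main :: IO ()
main = smallCheck 5 $ \(x :: Int) -> monadic (prop_incrementRef x)
```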

Why does smallCheck's `Series` class have two types in the constructor?

This question is related to my other question about smallCheck's Test.SmallCheck.Series module. When I try to define an instance of the class Serial in the following natural way (suggested to me by an answer by @tel to the above question), I get compiler errors:
data Person = SnowWhite | Dwarf Int
instance Serial Person where ...
It turns out that Serial wants two arguments. This, in turn, necessitates some language extensions. The following works:
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Monad.Identity
data Person = SnowWhite | Dwarf Int
instance Serial Identity Person where
series = generate (\d -> SnowWhite : take (d-1) (map Dwarf [1..7]))
My question is:
Was putting that Identity there the "right thing to do"? I was inspired by the type of the Test.SmallCheck.Series.list function (which I also found extremely bizarre when I first saw it):
list :: Depth -> Series Identity a -> [a]
What is the right thing to do? Will I be OK if I just blindly put Identity in whenever I see it? Should I have put something like Serial m Integer => Serial m Person instead (that necessitates some more scary-looking compiler flags: FlexibleContexts and UndecidableInstances at least)?
What is that first parameter (the m in Serial m n) for?
Thank you!
I'm just a user of smallcheck, not a developer, but I think the answer is:
1) Not really. You should leave it polymorphic, which you can do without the extra extensions you mention (FlexibleContexts and UndecidableInstances):
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Monad.Identity
data Person = SnowWhite | Dwarf Int deriving (Show)
instance (Monad m) => Serial m Person where
series = generate (\d -> SnowWhite : take (d-1) (map Dwarf [1..7]))
2) Series is currently defined as
newtype Series m a = Series (ReaderT Depth (LogicT m) a)
which means that m is the base monad for LogicT, which is used to generate the values in the series. For example, writing IO in place of m would allow IO actions to happen while generating the series.
In SmallCheck, m appears also in the Testable instance declarations, such as instance (Serial m a, Show a, Testable m b) => Testable m (a->b). This has the concrete effect that the pre-existing driver functions such as smallCheck :: Testable IO a => Depth -> a -> IO () cannot be used if you only have instances for Identity.
In practice, you could make use of this fact by writing a custom driver function that interleaves some monadic effect, such as logging the generated values, inside the driver.
It might also be useful for other things which I'm not aware of.
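A sketch showing why the polymorphic instance is the convenient choice, assuming smallcheck 1.x (Test.SmallCheck.Series exporting Serial, generate and list). The same instance serves both list, which runs the series in Identity, and the stock smallCheck driver, which needs Serial IO Person. The property p /= Dwarf 0 is an arbitrary example.

```haskell
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
import Test.SmallCheck (smallCheck)
import Test.SmallCheck.Series (Serial (..), generate, list)

data Person = SnowWhite | Dwarf Int deriving (Show, Eq)

-- Polymorphic in the base monad m, as recommended above.
instance Monad m => Serial m Person where
  series = generate (\d -> SnowWhite : take (d - 1) (map Dwarf [1 .. 7]))

main :: IO ()
main = do
  -- m = Identity: list enumerates the series purely, up to depth 3.
  print (list 3 series :: [Person])
  -- m = IO: smallCheck's driver uses Serial IO Person, from the same instance.
  smallCheck 3 (\p -> p /= Dwarf 0)
```

With an instance fixed to Identity, the second call would not type-check, which is the concrete effect described above.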

Is it possible to create a collection api like Scala 2.8's in Haskell?

The Scala collections API has some pretty interesting properties and I'm wondering how one would implement it in Haskell, or whether it's even possible (or a good idea in general). I'm a bit of a Haskell newbie so I'd like to hear your thoughts.
The scala map definition looks like this:
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That
An interesting feature of this API is that if you map over a string and your map function returns a character, the result will be of type string (and not a list of characters).
We have something roughly as general as the Scala API. It's called Foldable.
class Foldable t where
  fold :: Monoid m => t m -> m
  foldMap :: Monoid m => (a -> m) -> t a -> m
  foldr :: (a -> b -> b) -> b -> t a -> b
  foldl :: (a -> b -> a) -> a -> t b -> a
  foldr1 :: (a -> a -> a) -> t a -> a
  foldl1 :: (a -> a -> a) -> t a -> a
http://www.haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Data-Foldable.html
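A small illustration of what Foldable buys you: one function, here a hypothetical total, works unchanged over lists, Maybe and a user-defined tree (also hypothetical, with a derived Foldable instance). Note that Foldable only consumes containers; it has no method for building a new container, which is the part Scala's CanBuildFrom takes care of.

```haskell
{-# LANGUAGE DeriveFoldable #-}
import Data.Foldable (toList)
import Data.Monoid (Sum (..))

-- Hypothetical binary tree; the derived Foldable visits fields left to right.
data Tree a = Leaf | Node (Tree a) a (Tree a) deriving (Show, Foldable)

-- One generic function for every Foldable container.
total :: (Foldable t, Num a) => t a -> a
total = getSum . foldMap Sum

main :: IO ()
main = do
  let t = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf) :: Tree Int
  print (total [1, 2, 3 :: Int])     -- 6
  print (total (Just 5 :: Maybe Int)) -- 5
  print (total t)                     -- 6
  print (toList t)                    -- [1,2,3]
```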
I would say this map function in Scala is really closer to this one from Haskell:
fmap :: (Functor f) => (a -> b) -> f a -> f b
where the list type is just one more Functor.
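A few fmap calls to make the comparison concrete. Because String is a type synonym for [Char], mapping a Char -> Char function over a String already gives back a String, which covers the string example from the question. In general fmap keeps the functor fixed, so mapping Char -> Int over a String yields [Int].

```haskell
import Data.Char (toUpper)

main :: IO ()
main = do
  print (fmap (+ 1) [1, 2, 3 :: Int])       -- [2,3,4]
  print (fmap (+ 1) (Just 41 :: Maybe Int)) -- Just 42
  -- String = [Char], so Char -> Char over a String stays a String.
  putStrLn (fmap toUpper "hello")           -- HELLO
  -- The container stays a list; only the element type changes.
  print (fmap fromEnum "ab")                -- [97,98]
```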

How does Haskell deal with documentation?

How do I get online documentation in Haskell?
Is there anything as elegant/handy as what Python does below?
>>> help([].count)
Help on built-in function count:
count(...)
L.count(value) -> integer -- return number of occurrences of value
Interactive help in GHCi
The standard Haskell REPL is GHCi. While it is not possible to access complete documentation from within GHCi, it is possible to get quite a lot of useful info.
Print types. In 90% of cases this is enough to understand what a function does and how to use it.
ghci> :t zipWith
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
:t is short for :type.
Print information about symbols. This is useful to find which module a symbol belongs to. For a data type it shows its definition and class instances. For a type class it shows its interface and a list of types which are its instances.
ghci> :i Bool
data Bool = False | True -- Defined in GHC.Bool
instance Bounded Bool -- Defined in GHC.Enum
instance Enum Bool -- Defined in GHC.Enum
instance Eq Bool -- Defined in GHC.Base
instance Ord Bool -- Defined in GHC.Base
instance Read Bool -- Defined in GHC.Read
instance Show Bool -- Defined in GHC.Show
ghci> :i Eq
class Eq a where
  (==) :: a -> a -> Bool
  (/=) :: a -> a -> Bool
    -- Defined in GHC.Classes
instance (Eq a) => Eq (Maybe a) -- Defined in Data.Maybe
instance (Eq a, Eq b) => Eq (Either a b) -- Defined in Data.Either
(many more instances follow)
ghci> :i zipWith
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
-- Defined in GHC.List
:i is short for :info.
Print kinds. Use :k on type constructors.
ghci> :k Maybe
Maybe :: * -> *
ghci> :k Int
Int :: *
:k is short for :kind.
Browse a module's contents. This shows what symbols an imported module offers.
ghci> :browse Data.List
(\\) :: (Eq a) => [a] -> [a] -> [a]
delete :: (Eq a) => a -> [a] -> [a]
deleteBy :: (a -> a -> Bool) -> a -> [a] -> [a]
...
(many lines follow)
:t, :k and :i work only for symbols in scope (you need to import the module with :m + Module.Name first). :browse works for all available modules.
Online documentation
Most Haskell libraries are documented with Haddock. You can open an HTML version of the documentation and read the details.
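For reference, a sketch of what Haddock markup looks like in source; the Shape type and area function are hypothetical. Comments starting with -- | document the following declaration, -- ^ documents the preceding item (here each constructor), and >>> marks an example in the documentation. Running cabal haddock on a package turns such comments into the HTML pages mentioned here.

```haskell
-- | A hypothetical shape type, documented with Haddock comments.
data Shape
  = Circle Double      -- ^ A circle with the given radius.
  | Rect Double Double -- ^ A rectangle with the given width and height.
  deriving (Show)

-- | Compute the area of a 'Shape'.
--
-- >>> area (Rect 2 3)
-- 6.0
area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

main :: IO ()
main = print (area (Rect 2 3))
```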
You can build it locally by passing the --enable-documentation flag to cabal install.
Otherwise, a good place to browse through all the documentation is the package list on Hackage. It also shows the documentation for earlier versions of any package, which is sometimes very useful.
At the time of writing there was no way to view the Haddock documentation within GHCi, only a ticket requesting it; GHC 8.6 later added a :doc command for this.
You can however get a small bit of info using the :info command, e.g.
ghci> :i nub
nub :: (Eq a) => [a] -> [a] -- Defined in Data.List
so that you at least know where to look for the documentation for a particular function.
You can use Hoogle to search for documentation by function name or by type signature (even an approximate one). There's also a command-line offline version of this tool which you can get from Hackage.