Optimizing Haskell Recursive Lists

Another Haskell optimization question, following on from my previous one. I need to generate a list recursively, similar to the fibs function found in many introductory Haskell articles:
import Data.Bits (rotate, xor)
import Data.List (zipWith4)
import Data.Word (Word32)

generateSchedule :: [Word32] -> [Word32]
generateSchedule blkw = take 80 ws
  where
    ws = blkw ++ zipWith4 gen (drop 13 ws) (drop 8 ws) (drop 2 ws) ws
    gen a b c d = rotate (a `xor` b `xor` c `xor` d) 1
The above function has become the most time- and allocation-consuming one in my program. The profiler gives me the following statistics:
COST CENTRE        MODULE           %time  %alloc  ticks  bytes
generateSchedule   Test.Hash.SHA1    22.1    40.4     31  702556640
I thought of applying unboxed vectors to calculate the list, but I cannot figure out a way to do it since the list is recursive. This would have a natural implementation in C, but I do not see a way to make it faster in Haskell (other than to unroll it and write 80 lines of variable declarations). Any help?
Update: I actually did unroll it quickly to see if it helps. The code is here. It is ugly, and in fact it was slower.
COST CENTRE        MODULE         %time  %alloc  ticks  bytes
generateSchedule   GG.Hash.SHA1    22.7    27.6     40  394270592

import Data.Array.Base
import Data.Array.ST
import Data.Array.Unboxed
import Data.Word (Word32)

-- gen as defined in the question
generateSchedule :: [Word32] -> UArray Int Word32
generateSchedule ws0 = runSTUArray $ do
    arr <- unsafeNewArray_ (0,79)
    let fromList i [] = fill i 0
        fromList i (w:ws) = do
            unsafeWrite arr i w
            fromList (i+1) ws
        fill i j
            | i == 80 = return arr
            | otherwise = do
                d <- unsafeRead arr j
                c <- unsafeRead arr (j+2)
                b <- unsafeRead arr (j+8)
                a <- unsafeRead arr (j+13)
                unsafeWrite arr i (gen a b c d)
                fill (i+1) (j+1)
    fromList 0 ws0
will create an unboxed array corresponding to your list. It relies on the assumption that the list argument contains at least 14 and at most 80 items; otherwise it will misbehave badly. I think it will always be 16 items (64 bytes), so that should be safe for you. (But it's probably better to start filling the array directly from the ByteString rather than to construct an intermediate list, as sketched below.)
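For instance, reading the sixteen big-endian Word32s straight out of a strict ByteString block could look like this (a sketch of my own, not from the answer; word32At is a hypothetical helper and assumes the block holds at least 64 bytes):

import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)
import qualified Data.ByteString as B

-- Hypothetical helper: the i-th big-endian Word32 of the block.
word32At :: B.ByteString -> Int -> Word32
word32At bs i =
        fromIntegral (B.index bs (4*i)) `shiftL` 24
    .|. fromIntegral (B.index bs (4*i + 1)) `shiftL` 16
    .|. fromIntegral (B.index bs (4*i + 2)) `shiftL` 8
    .|. fromIntegral (B.index bs (4*i + 3))

With something like this, fromList can be replaced by a loop that writes word32At block i for i in [0..15] before handing over to fill.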
By strictly evaluating the array before doing the hashing rounds, you save the switching between list construction and hashing that you have with the lazily constructed list, which should reduce the time needed. By using an unboxed array, we avoid the allocation overhead of lists, which may further improve speed (but GHC's allocator is very fast, so don't expect too much impact from that).
In your hashing rounds, get the needed Word32 via unsafeAt array t to avoid unnecessary bounds-checking.
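For example, a round step could pull schedule words like this (a sketch; step and its arithmetic are hypothetical stand-ins, not the real SHA-1 round logic):

import Data.Array.Base (unsafeAt)
import Data.Array.Unboxed (UArray)
import Data.Word (Word32)

-- Hypothetical round step: fetch schedule word t with no bounds check.
step :: UArray Int Word32 -> Int -> Word32 -> Word32
step sched t acc = acc + (sched `unsafeAt` t)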
Addendum: Unrolling the creation of the list might be faster if you put a bang on each wn, though I'm not sure. Since you already have the code, adding bangs and checking isn't too much work, is it? I'm curious.

We can use lazy arrays to get a halfway house between going straight mutable and using pure lists. You get the benefits of a recursive definition, but for that reason still pay the price of laziness and boxing, though less so than with lists. The following code uses criterion to benchmark two lazy-array solutions (standard boxed arrays, and boxed vectors) against the original list code and Daniel's mutable uarray code above:
module Main where

import Data.Bits
import Data.List
import Data.Word
import qualified Data.Vector as LV
import Data.Array.ST
import Data.Array.Unboxed
import qualified Data.Array as A
import Data.Array.Base
import Criterion.Main

gen :: Word32 -> Word32 -> Word32 -> Word32 -> Word32
gen a b c d = rotate (a `xor` b `xor` c `xor` d) 1

-- Lazy boxed vector, defined recursively in terms of itself.
gss :: [Word32] -> [Word32]
gss blkw = LV.toList v
  where
    v = LV.fromList $ blkw ++ rest
    rest = map (\i -> gen (LV.unsafeIndex v (i + 13))
                          (LV.unsafeIndex v (i + 8))
                          (LV.unsafeIndex v (i + 2))
                          (LV.unsafeIndex v i))
               [0 .. 79 - 14]

-- Lazy boxed array, same trick.
gss' :: [Word32] -> [Word32]
gss' blkw = A.elems v
  where
    v = A.listArray (0,79) $ blkw ++ rest
    rest = map (\i -> gen (unsafeAt v (i + 13))
                          (unsafeAt v (i + 8))
                          (unsafeAt v (i + 2))
                          (unsafeAt v i))
               [0 .. 79 - 14]

generateSchedule :: [Word32] -> [Word32]
generateSchedule blkw = take 80 ws
  where
    ws = blkw ++ zipWith4 gen (drop 13 ws) (drop 8 ws) (drop 2 ws) ws

gs :: [Word32] -> [Word32]
gs ws = elems (generateSched ws)

generateSched :: [Word32] -> UArray Int Word32
generateSched ws0 = runSTUArray $ do
    arr <- unsafeNewArray_ (0,79)
    let fromList i [] = fill i 0
        fromList i (w:ws) = do
            unsafeWrite arr i w
            fromList (i+1) ws
        fill i j
            | i == 80 = return arr
            | otherwise = do
                d <- unsafeRead arr j
                c <- unsafeRead arr (j+2)
                b <- unsafeRead arr (j+8)
                a <- unsafeRead arr (j+13)
                unsafeWrite arr i (gen a b c d)
                fill (i+1) (j+1)
    fromList 0 ws0

args :: [Word32]
args = [0..13]

main :: IO ()
main = defaultMain
    [ bench "list"   $ whnf (sum . generateSchedule) args
    , bench "vector" $ whnf (sum . gss) args
    , bench "array"  $ whnf (sum . gss') args
    , bench "uarray" $ whnf (sum . gs) args
    ]
I compiled the code with -O2 and -funfolding-use-threshold=256 to force lots of inlining.
The criterion benchmarks demonstrate that the vector solution is slightly better, and the array solution slightly better still, but that the unboxed mutable solution still wins by a landslide:
benchmarking list
mean: 8.021718 us, lb 7.720636 us, ub 8.605683 us, ci 0.950
std dev: 2.083916 us, lb 1.237193 us, ub 3.309458 us, ci 0.950
benchmarking vector
mean: 6.829923 us, lb 6.725189 us, ub 7.226799 us, ci 0.950
std dev: 882.3681 ns, lb 76.20755 ns, ub 2.026598 us, ci 0.950
benchmarking array
mean: 6.212669 us, lb 5.995038 us, ub 6.635405 us, ci 0.950
std dev: 1.518521 us, lb 946.8826 ns, ub 2.409086 us, ci 0.950
benchmarking uarray
mean: 2.380519 us, lb 2.147896 us, ub 2.715305 us, ci 0.950
std dev: 1.411092 us, lb 1.083180 us, ub 1.862854 us, ci 0.950
I ran some basic profiling too, and noticed that the lazy/boxed array solutions did slightly better than the list solution, but again significantly worse than the unboxed array approach.

Related

Compile-time invariant/property checking in Elm

I have some value constraints/properties that I want to check at compile-time. In this example, I want to track whether a vector is normalized or not. I think I have a solution, using type tags, but I need someone with some Elm/FP experience to tell me if I have missed something obvious. Thank you.
module TagExperiment exposing (..)

type Vec t = Vec Float Float Float

type Unit = Unit

type General = General

toGeneral : Vec t -> Vec General
toGeneral (Vec x y z) =
    Vec x y z

scaleBy : Vec t -> Vec t -> Vec t
scaleBy (Vec bx by bz) (Vec ax ay az) =
    {- Operate on two General's or two Unit's. If you have one of each,
       then the Unit needs to be cast to a General.
    -}
    let
        mag =
            sqrt ((bx * bx) + (by * by) + (bz * bz))
    in
    Vec (ax * mag) (ay * mag) (az * mag)

-- These cases work as desired.

a : Vec Unit
a = Vec 0 0 1

b : Vec General
b = Vec 2 2 2

d : Vec Unit
d = scaleBy a a

e : Vec General
e = scaleBy b b

g : Vec General
g = scaleBy (toGeneral a) b

h : Vec General
h = scaleBy b (toGeneral a)

-- Here is where I have trouble.

c : Vec t -- unknown... uh-oh
c = Vec 3 3 3

f : Vec t -- still unknown... sure
f = scaleBy c c

i : Vec Unit -- wrong !!!
i = scaleBy a c

j : Vec Unit -- wrong !!!
j = scaleBy c a

k : Vec General -- lucky
k = scaleBy b c

l : Vec General -- lucky
l = scaleBy c b

{- The trouble is that I am not required to specify a tag for c. I guess the
   solution is to write "constructors" and make the built-in Vec constructor
   opaque?
-}

newUnitVec : Float -> Float -> Float -> Vec Unit
newUnitVec x y z =
    -- add normalization
    Vec x y z

newVec : Float -> Float -> Float -> Vec General
newVec x y z =
    Vec x y z
Yes, without Dependent Types the most ergonomic way to ensure constraints on values at compile time is to use opaque types along with a "parse, don't validate" approach.
Perhaps something like:
module Vec exposing (UnitVector, Vector, vectorToUnit)

type UnitVector
    = UnitVector Float Float Float

type Vector
    = Vector Float Float Float

vectorToUnit : Vector -> Maybe UnitVector
vectorToUnit (Vector x y z) =
    case ( x, y, z ) of
        ( 0, 0, 0 ) ->
            Nothing

        _ ->
            Just (normalize x y z)

-- Hypothetical helper; the original answer leaves normalize undefined.
normalize : Float -> Float -> Float -> UnitVector
normalize x y z =
    let
        mag = sqrt ((x * x) + (y * y) + (z * z))
    in
    UnitVector (x / mag) (y / mag) (z / mag)
Then, since the only ways to get a UnitVector are defined inside this module and known to obey the constraints, any time you see a UnitVector at compile time it is correct to assume the constraints are met.
For vectors in particular, it may be worth having a look at ianmackenzie/elm-geometry for comparison?

To memoize or not to memoize

... that is the question. I have been working on an algorithm which takes an array of vectors as input, and part of the algorithm repeatedly picks pairs of vectors and evaluates a function of these two vectors, which doesn't change over time. Looking at ways to optimize the algorithm, I thought this would be a good case for memoization: instead of recomputing the same function value over and over again, cache it lazily and hit the cache.
Before jumping to code, here is the gist of my question: the benefits I get from memoization depend on the number of vectors, which I think is inversely related to the number of repeated calls, and in some circumstances memoization completely degrades performance. So is my situation inadequate for memoization? Am I doing something wrong, and are there smarter ways to optimize for my situation?
Here is a simplified test script, which is fairly close to the real thing:
open System
open System.Diagnostics
open System.Collections.Generic

let size = 10       // observations
let dim = 10        // features per observation
let runs = 10000000 // number of function calls

let rng = new Random()
let clock = new Stopwatch()

let data =
    [| for i in 1 .. size ->
        [ for j in 1 .. dim -> rng.NextDouble() ] |]

let testPairs = [| for i in 1 .. runs -> rng.Next(size), rng.Next(size) |]

let f v1 v2 = List.fold2 (fun acc x y -> acc + (x-y) * (x-y)) 0.0 v1 v2

printfn "Raw"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> f data.[i] data.[j]) |> printfn "Check: %f"
printfn "Raw: %i" clock.ElapsedMilliseconds
I create a list of random vectors (data), a random collection of indexes (testPairs), and run f on each of the pairs.
Here is the memoized version:
let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v -> v
        | false, _ ->
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v
printfn "Memoized"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoized (i, j)) |> printfn "Check: %f"
printfn "Memoized: %i" clock.ElapsedMilliseconds
Here is what I am observing:
* when size is small (10), memoization goes about twice as fast as the raw version,
* when size is large (1000), memoization takes 15x more time than the raw version,
* when f is costly, memoization improves things
My interpretation is that when the size is small, we have more repeat computations, and the cache pays off.
What surprised me was the huge performance hit for larger sizes, and I am not certain what is causing it. I know I could improve the dictionary access a bit, with a struct key for instance - but I didn't expect the "naive" version to behave so poorly.
So - is there something obviously wrong with what I am doing? Is memoization the wrong approach for my situation, and if yes, is there a better approach?
I think memoization is a useful technique, but it is not a silver bullet. It is very useful in dynamic programming where it reduces the (theoretical) complexity of the algorithm. As an optimization, it can (as you would probably expect) have varying results.
In your case, the cache is certainly more useful when the number of observations is smaller (and f is more expensive computation). You can add simple statistics to your memoization:
let stats = ref (0, 0) // Count number of cache misses & hits
let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        let (mis, hit) = !stats
        match cache.TryGetValue(key) with
        | true, v -> stats := (mis, hit + 1); v // Increment hit count
        | false, _ ->
            stats := (mis + 1, hit) // Increment miss count
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v
For small size, the numbers I get are something like (100, 999900) so there is a huge benefit from memoization - the function f is computed 100x and then each result is reused 9999x.
For big size, I get something like (632331, 1367669) so f is called many times and each result is reused just twice. In that case, the overhead with allocation and lookup in the (big) hash table is much bigger.
As a minor optimization, you can pre-allocate the Dictionary and write new Dictionary<_, _>(10000,HashIdentity.Structural), but that does not seem to help much in this case.
To make this optimization efficient, I think you would need to know some more information about the memoized function. In your example, the inputs are quite regular, so there is probably no point in memoization, but if you know that the function is called more often with some argument values, you can perhaps memoize only for those common arguments.
Tomas's answer is great for when you should use memoization. Here's why memoization is going so slow in your case.
It sounds like you're testing in Debug mode. Run your test again in Release and you should get a faster result for memoization. Tuples can cause a large performance hit while in Debug mode. I added a hashed version for comparison along with some micro optimizations.
Release
Raw
Check: 1.441687
Raw: 894
Memoized
Check: 1.441687
Memoized: 733
memoizedHash
Check: 1.441687
memoizedHash: 552
memoizedHashInline
Check: 1.441687
memoizedHashInline: 493
memoizedHashInline2
Check: 1.441687
memoizedHashInline2: 385
Debug
Raw
Check: 1.409310
Raw: 797
Memoized
Check: 1.409310
Memoized: 5190
memoizedHash
Check: 1.409310
memoizedHash: 593
memoizedHashInline
Check: 1.409310
memoizedHashInline: 497
memoizedHashInline2
Check: 1.409310
memoizedHashInline2: 373
Source
open System
open System.Diagnostics
open System.Collections.Generic

let size = 10       // observations
let dim = 10        // features per observation
let runs = 10000000 // number of function calls

let rng = new Random()
let clock = new Stopwatch()

let data =
    [| for i in 1 .. size ->
        [ for j in 1 .. dim -> rng.NextDouble() ] |]

let testPairs = [| for i in 1 .. runs -> rng.Next(size), rng.Next(size) |]

let f v1 v2 = List.fold2 (fun acc x y -> acc + (x-y) * (x-y)) 0.0 v1 v2

printfn "Raw"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> f data.[i] data.[j]) |> printfn "Check: %f"
printfn "Raw: %i\n" clock.ElapsedMilliseconds

let memoized =
    let cache = new Dictionary<(int*int),float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v -> v
        | false, _ ->
            let v = f data.[fst key] data.[snd key]
            cache.Add(key, v)
            v

printfn "Memoized"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoized (i, j)) |> printfn "Check: %f"
printfn "Memoized: %i\n" clock.ElapsedMilliseconds

let memoizedHash =
    let cache = new Dictionary<int,float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v -> v
        | false, _ ->
            let i = key / size
            let j = key % size
            let v = f data.[i] data.[j]
            cache.Add(key, v)
            v

printfn "memoizedHash"
clock.Restart()
testPairs |> Array.averageBy (fun (i, j) -> memoizedHash (i * size + j)) |> printfn "Check: %f"
printfn "memoizedHash: %i\n" clock.ElapsedMilliseconds

let memoizedHashInline =
    let cache = new Dictionary<int,float>(HashIdentity.Structural)
    fun key ->
        match cache.TryGetValue(key) with
        | true, v -> v
        | false, _ ->
            let i = key / size
            let j = key % size
            let v = f data.[i] data.[j]
            cache.Add(key, v)
            v

printfn "memoizedHashInline"
clock.Restart()
let mutable total = 0.0
for i, j in testPairs do
    total <- total + memoizedHashInline (i * size + j)
printfn "Check: %f" (total / float testPairs.Length)
printfn "memoizedHashInline: %i\n" clock.ElapsedMilliseconds

printfn "memoizedHashInline2"
clock.Restart()
let mutable total2 = 0.0
let cache = new Dictionary<int,float>(HashIdentity.Structural)
for i, j in testPairs do
    let key = (i * size + j)
    match cache.TryGetValue(key) with
    | true, v -> total2 <- total2 + v
    | false, _ ->
        let i = key / size
        let j = key % size
        let v = f data.[i] data.[j]
        cache.Add(key, v)
        total2 <- total2 + v
printfn "Check: %f" (total2 / float testPairs.Length)
printfn "memoizedHashInline2: %i\n" clock.ElapsedMilliseconds

Console.ReadLine() |> ignore

Why is `logBase 10 x` slower than `log x / log 10`, even when specialized?

solrize in #haskell asked a question about one version of this code and I tried some other cases and was wondering what was going on. On my machine the "fast" code takes ~1 second and the "slow" code takes ~1.3-1.5 (everything is compiled with ghc -O2).
import Data.List

log10 :: Double -> Double
--log10 x = log x / log 10 -- fast
--log10 = logBase 10       -- slow
--log10 = barLogBase 10    -- fast
--log10 = bazLogBase 10    -- fast
log10 = fooLogBase 10      -- see below

class Foo a where
    fooLogBase :: a -> a -> a

instance Foo Double where
    --fooLogBase x y = log y / log x                  -- slow
    fooLogBase x = let lx = log x in \y -> log y / lx -- fast

barLogBase :: Double -> Double -> Double
barLogBase x y = log y / log x

bazLogBase :: Double -> Double -> Double
bazLogBase x = let lx = log x in \y -> log y / lx

main :: IO ()
main = print . foldl' (+) 0 . map log10 $ [1..1e7]
I'd've hoped that GHC would be able to turn logBase x y into exactly the same thing as log y / log x, when specialised. What's going on here, and what would be the recommended way of using logBase?
As always, look at the Core.
Fast (1.563s):
-- note: top-level constant, referred to by specialized fooLogBase
Main.main_lx :: GHC.Types.Double
Main.main_lx =
  case GHC.Prim.logDouble# 10.0 of { r ->
  GHC.Types.D# r
  }

Main.main7 :: GHC.Types.Double -> GHC.Types.Double
Main.main7 =
  \ (y :: GHC.Types.Double) ->
    case y of _ { GHC.Types.D# y# ->
    case GHC.Prim.logDouble# y# of { r0 ->
    case Main.main_lx of { GHC.Types.D# r ->
    case GHC.Prim./## r0 r of { r1 ->
    GHC.Types.D# r1
    }
    }
    }
    }
Slow (2.013s):
-- simpler, but recomputes log10 each time
Main.main7 =
  \ (y_ahD :: GHC.Types.Double) ->
    case y_ahD of _ { GHC.Types.D# x_aCD ->
    case GHC.Prim.logDouble# x_aCD of wild1_aCF { __DEFAULT ->
    case GHC.Prim.logDouble# 10.0 of wild2_XD9 { __DEFAULT ->
    case GHC.Prim./## wild1_aCF wild2_XD9 of wild3_aCz { __DEFAULT ->
    GHC.Types.D# wild3_aCz
    }
    }
    }
    }
In the fast version, log10 is computed once and shared (the static argument is applied once only). In the slow version it is recomputed each time.
You can follow this line of reasoning to produce even better versions:
-- 1.30s
import Data.List (foldl')

lx :: Double
lx = log 10

log10 :: Double -> Double
log10 y = log y / lx

main :: IO ()
main = print . foldl' (+) 0 . map log10 $ [1..1e7]
And, using array fusion, you can remove the penalty of the compositional style:
import qualified Data.Vector.Unboxed as V

lx :: Double
lx = log 10

log10 :: Double -> Double
log10 y = log y / lx

main :: IO ()
main = print . V.sum . V.map log10 $ V.enumFromN 1 (10^7)
Cutting the cost by 3x
$ time ./A
6.5657059080059275e7
real 0m0.672s
user 0m0.000s
sys 0m0.000s
Which is as good as writing it by hand. The below offers no benefit over the correctly written version above.
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Double(D#), logDouble#, (/##))
import qualified Data.Vector.Unboxed as V

lx :: Double
lx = D# (logDouble# 10.0##)

log10 :: Double -> Double
log10 (D# y) = case lx of
    D# d# -> D# (logDouble# y /## d#)

main :: IO ()
main = print . V.sum . V.map log10 $ V.enumFromN 1 (10^7)
Another missed optimization: dividing by a constant (log 10) should be replaced with multiplying by the reciprocal.
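A minimal sketch of that variant (my own illustration, not from the thread; note that multiplying by the reciprocal can differ from division in the last bit or so of precision):

import qualified Data.Vector.Unboxed as V

rlx :: Double
rlx = recip (log 10)   -- reciprocal of the constant, computed once

log10 :: Double -> Double
log10 y = log y * rlx  -- multiply instead of divide

main :: IO ()
main = print . V.sum . V.map log10 $ V.enumFromN 1 (10^7)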

How to Eliminate Cost-centres in String Traversals and List Comprehensions

I'm implementing a motif-finding algorithm from the domain of bioinformatics using Haskell. I won't go into the details of the algorithm other than to say it's a branch-and-bound median string search. I had planned on making my implementation more interesting by implementing a concurrent approach (and later an STM approach) in order to get a multicore speedup, but after compiling with the following flags
$ ghc -prof -auto-all -O2 -fllvm -threaded -rtsopts --make main
and printing the profile I saw something interesting (and perhaps obvious):
COST CENTRE        entries    %time  %alloc
hammingDistance    34677951    47.6    14.7
motifs              4835446    43.8    71.1
It's clear that a remarkable speedup could be gained without going anywhere near multicore programming (although that's been done and I just need to find some good test data and sort out Criterion for that).
Anyway, both of these functions are purely functional and in no way concurrent. They're also doing quite simple stuff, so I was surprised that they took so much time. Here's the code for them:
data NukeTide = A | T | C | G deriving (Read, Show, Eq, Ord, Enum)

type Motif = [NukeTide]

hammingDistance :: Motif -> Motif -> Int
hammingDistance [] [] = 0
hammingDistance xs [] = 0 -- optimistic
hammingDistance [] ys = 0 -- optimistic
hammingDistance (x:xs) (y:ys) = case (x == y) of
    True  -> hammingDistance xs ys
    False -> 1 + hammingDistance xs ys

motifs :: Int -> [a] -> [[a]]
motifs n nukeTides = [ take n $ drop k nukeTides | k <- [0 .. length nukeTides - n] ]
Note that of the two arguments to hammingDistance, I can actually assume that xs is going to be x long and that ys is going to be less than or equal to that, if that opens up room for improvements.
As you can see, hammingDistance calculates the hamming distance between two motifs, which are lists of nucleotides. The motifs function takes a number and a list and returns all the sub strings of that length, e.g.:
> motifs 3 "hello world"
["hel","ell","llo","lo ","o w"," wo","wor","orl","rld"]
Since the algorithmic processes involved are so simple I can't think of a way to optimize this further. I do however have two guesses as to where I should be headed:
HammingDistance: The data types I'm using (NukeTide and []) are slow/clumsy. This is just a guess, since I'm not familiar with their implementations, but I think defining my own datatype, although more legible, probably involves more overhead than I intend. Also, the pattern matching is foreign to me; I don't know if that is trivial or costly.
Motifs: If I'm reading this correctly, 70% of all memory allocations are done by motifs, and I'd assume that has to be garbage collected at some point. Again, using the all-purpose list might be slowing me down, or it might be the list comprehension, since the cost of that is incredibly unclear to me.
Does anybody have any advice on the usual procedure here? If data types are the problem, would arrays be the right answer? (I've heard they come in boxes)
Thanks for the help.
Edit: It just occurred to me that it might be useful if I describe the manner in which these two functions are called:
totalDistance :: Motif -> Int
totalDistance motif = sum $ map (minimum . map (hammingDistance motif) . motifs l) dna
This function is the result of another function and is passed around the nodes of a tree. At each node in the tree, an evaluation of the nucleotide sequence (of length <= n; if == n then it is a leaf node) is done, using totalDistance to score the node. From then on it's your typical branch-and-bound algorithm.
Edit: John asked that I post the change I made, which virtually eliminated the cost of motifs:
scoreFunction :: DNA -> Int -> (Motif -> Int)
scoreFunction dna l = totalDistance
  where
    -- The sum of the minimum hamming distance in each line of dna
    -- is given by totalDistance motif
    totalDistance motif = sum $ map (minimum . map (hammingDistance motif)) possibleMotifs
    possibleMotifs = map (motifs l) dna -- Previously this was computed in the line above
I didn't make it clear in my original post, but scoreFunction is only called once, and the result is passed around in a tree traversal/branch and bound and used to evaluate nodes. Recomputing motifs at every step of the way, in retrospect, isn't one of the brightest things I've done.
Your definition of motifs looks like it's doing a lot more traversing than necessary because each application of drop has to traverse the list from the beginning. I would implement it using Data.List.tails instead:
motifs2 :: Int -> [a] -> [[a]]
motifs2 n nukeTides = map (take n) $ take count $ tails nukeTides
  where count = length nukeTides - n + 1
A quick comparison in GHCi shows the difference (using sum . map length to force evaluation):
*Main> let xs = concat (replicate 10000 [A, T, C, G])
(0.06 secs, 17914912 bytes)
*Main> sum . map length $ motifs 5 xs
199980
(3.47 secs, 56561208 bytes)
*Main> sum . map length $ motifs2 5 xs
199980
(0.15 secs, 47978952 bytes)
Your definition of hammingDistance is probably much less efficient than it could be.
hammingDistance (x:xs) (y:ys) = case (x == y) of
    True  -> hammingDistance xs ys
    False -> 1 + hammingDistance xs ys
Because of haskell's laziness, this will be expanded to (in the worst case):
(1 + (1 + (1 + ...)))
which will exist as a thunk on the stack, getting reduced only when it's used. Whether this is actually a problem depends on the call site, compiler options, etc., so it's often good practice to write your code in a form which avoids this issue altogether.
A common solution is to create a tail-recursive form with a strict accumulator, but in this case you could use higher-order functions, like this:
hammingDistance :: Motif -> Motif -> Int
hammingDistance xs ys = length . filter (uncurry (/=)) $ zip xs ys
Here's the tail-recursive implementation, for comparison:
hammingDistance :: Motif -> Motif -> Int
hammingDistance xs ys = go 0 xs ys
  where
    go !acc [] [] = acc
    go !acc xs [] = acc -- optimistic
    go !acc [] ys = acc -- optimistic
    go !acc (x:xs) (y:ys) = case (x == y) of
        True  -> go acc xs ys
        False -> go (acc+1) xs ys
This uses the BangPatterns extension to force the accumulator to be strictly evaluated, otherwise it would have the same problem as your current definition.
To directly answer some of your other questions:
Pattern matching is trivial
Whether you should use lists or arrays depends mostly on how the data is created and how it's consumed. For this case, it's possible that lists may be the best type. In particular, if your lists are all consumed as they're created, and you don't ever need the whole list in memory, they should be fine. If you do retain lists in memory though, they have a lot of space overhead.
Usage patterns
I think the way you use these functions does some extra work as well:
minimum . map (hammingDistance motif) . motifs l
Since you only need the minimum hammingDistance, you may be calculating a lot of extra values which aren't necessary. I can think of two solutions to this:
Option 1. Define a new function hammingDistanceThresh :: Motif -> Int -> Motif -> Int, which stops when it exceeds the threshold. The slightly odd type ordering is to facilitate using it in a fold, like this:
let motifs' = motifs l
in foldl' (hammingDistanceThresh motif) (hammingDistance motif $ head motifs') (tail motifs')
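A possible implementation of hammingDistanceThresh (my own sketch, not from the answer; it abandons a candidate as soon as the running count reaches the best distance found so far):

hammingDistanceThresh :: Motif -> Int -> Motif -> Int
hammingDistanceThresh motif best candidate = go 0 motif candidate
  where
    -- BangPatterns again, so the running count is forced as we go
    go !acc _ _ | acc >= best = best   -- early exit: can't improve on best
    go !acc (x:xs) (y:ys)
        | x == y    = go acc xs ys
        | otherwise = go (acc + 1) xs ys
    go !acc _ _ = min acc best         -- one motif exhausted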
Option 2. If you define a lazy natural number type, you can use that instead of Ints for the result of hammingDistance. Then only as much of the hamming distance as necessary will be calculated.
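For illustration, such a type could be as simple as this (my sketch; the derived Ord compares constructor by constructor, so minimum forces only as many S layers, i.e. mismatches, as it needs to decide an ordering):

data Nat = Z | S Nat deriving (Eq, Ord)

-- One lazy S per mismatch; e.g. use it as
--   minimum (map (hammingDistanceLazy motif) (motifs l dnaLine))
hammingDistanceLazy :: Motif -> Motif -> Nat
hammingDistanceLazy (x:xs) (y:ys)
    | x == y    = hammingDistanceLazy xs ys
    | otherwise = S (hammingDistanceLazy xs ys)
hammingDistanceLazy _ _ = Z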
One final note: using -auto-all will very frequently generate much slower code than other profiling options. I would suggest you try using just -auto first, and then adding manual SCC annotations if necessary.
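For reference, a manual cost centre is attached with an SCC pragma in an expression, something like this (illustrative placement, on the zip-based definition from above):

hammingDistance :: Motif -> Motif -> Int
hammingDistance xs ys =
    {-# SCC "hammingDistance" #-}
    (length . filter (uncurry (/=)) $ zip xs ys)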
Right... I could not resist going to the limit and wrote a plain-metal-ish packed-bits implementation:
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE BangPatterns #-}

import Data.Bits
import Data.Word

data NukeTide = A | T | C | G deriving (Read, Show, Eq, Ord, Enum)

type UnpackedMotif = [NukeTide]

type PackageType = Word32
nukesInPackage = 16 :: Int
allSetMask = complement 0 :: PackageType

-- Be careful to have length of motif == nukesInPackage here!
packNukesToWord :: UnpackedMotif -> PackageType
packNukesToWord = packAt 0
  where packAt _ [] = 0
        packAt i (m:ml) = (b0 m .&. bit i)
                      .|. (b1 m .&. bit (i+1))
                      .|. packAt (i+2) ml
        b0 A = 0
        b0 T = allSetMask
        b0 C = 0
        b0 G = allSetMask
        b1 A = 0
        b1 T = 0
        b1 C = allSetMask
        b1 G = allSetMask

unpackNukesWord :: PackageType -> UnpackedMotif
unpackNukesWord = unpackNNukesFromWord nukesInPackage

unpackNNukesFromWord :: Int -> PackageType -> UnpackedMotif
unpackNNukesFromWord = unpackN
  where unpackN 0 _ = []
        unpackN i w = (nukeOf $ w .&. r2Mask) : (unpackN (i-1) $ w `shiftR` 2)
        nukeOf bs
          | bs == 0     = A
          | bs == bit 0 = T
          | bs == bit 1 = C
          | otherwise   = G
        r2Mask = (bit 1 .|. bit 0) :: PackageType

data PackedMotif = PackedMotif { motifPackets :: [PackageType]
                               , nukesInLastPack :: Int }

-- note nukesInLastPack will never be zero;
-- motifPackets must be [] to represent empty motifs.
packNukes :: UnpackedMotif -> PackedMotif
packNukes m = case remain of
       [] -> PackedMotif [packNukesToWord takeN] (length takeN)
       r  -> prAppend (packNukesToWord takeN) (packNukes r)
  where (takeN, remain) = splitAt nukesInPackage m
        prAppend w (PackedMotif l i) = PackedMotif (w:l) i

unpackNukes :: PackedMotif -> UnpackedMotif
unpackNukes (PackedMotif l i) = unpack l i
  where unpack [l] i    = unpackNNukesFromWord i l
        unpack (l:ls) i = unpackNukesWord l ++ unpack ls i
        unpack [] _     = []

instance Show PackedMotif where
  show = show . unpackNukes

class Nukes a where
  pLength :: a -> Int
  shiftLN1 :: a -> a
  hammingDistance :: a -> a -> Int
  motifs :: Int -> a -> [a]

instance Nukes PackageType where
  pLength _ = nukesInPackage
  shiftLN1 = (`shiftR` 2)
  hammingDistance !x !y = fromIntegral $ abt (x `xor` y)
    where abt !b = bbt (b .&. a0Mask .|. ((b .&. a1Mask) `shiftR` 1))
          bbt !b = sbt $ (b .&. r16Mask) + (b `shiftR` nukesInPackage)
          sbt !b = (r2Mask .&. b) + (r2Mask .&. (b`shiftR`2))
                 + (r2Mask .&. (b`shiftR`4)) + (r2Mask .&. (b`shiftR`6))
                 + (r2Mask .&. (b`shiftR`8)) + (r2Mask .&. (b`shiftR`10))
                 + (r2Mask .&. (b`shiftR`12)) + (r2Mask .&. (b`shiftR`14))
          a0Mask = 0x55555555 :: PackageType
          a1Mask = 0xAAAAAAAA :: PackageType
          r16Mask = 0xFFFF :: PackageType
          r2Mask = 0x3 :: PackageType
  motifs 0 _ = []
  motifs l x = x : motifs (l-1) (shiftLN1 x)

maskNukesBut :: Int -> PackageType -> PackageType
maskNukesBut i = ((allSetMask `shiftR` (2*(nukesInPackage - i))) .&.)

instance Nukes PackedMotif where
  pLength (PackedMotif (x:xs) ix) = nukesInPackage * (length xs) + ix
  pLength _ = 0
  shiftLN1 ξ@(PackedMotif [] _) = ξ
  shiftLN1 (PackedMotif [x] ix)
     | ix > 1    = PackedMotif [x `shiftR` 2] (ix-1)
     | otherwise = PackedMotif [] nukesInPackage
  shiftLN1 (PackedMotif (x:x':xs) ix)
              = PackedMotif ((shiftLN1 x .|. pnext) : sxs) resLMod
    where sxs = motifPackets $ shiftLN1 (PackedMotif (x':xs) ix)
          pnext = shiftL (x' .&. 0x3) 30
          resLMod = if ix > 1 then ix - 1 else nukesInPackage
  hammingDistance xs ys = go 0 xs ys
    where
      go :: Int -> PackedMotif -> PackedMotif -> Int
      go !acc (PackedMotif [x] ix) (PackedMotif [y] iy)
        | ix > iy   = acc + (hammingDistance y $ maskNukesBut iy x)
        | otherwise = acc + (hammingDistance x $ maskNukesBut ix y)
      go !acc (PackedMotif [x] ix) (PackedMotif (y:ys) iy)
                    = acc + (hammingDistance x $ maskNukesBut ix y)
      go !acc (PackedMotif (x:xs) ix) (PackedMotif [y] iy)
                    = acc + (hammingDistance y $ maskNukesBut iy x)
      go !acc (PackedMotif (x:xs) ix) (PackedMotif (y:ys) iy)
                    = go (acc + hammingDistance x y) (PackedMotif xs ix) (PackedMotif ys iy)
      go !acc _ _   = acc
  motifs l ξ
     | l > 0     = fShfts (min nukesInPackage $ pLength ξ + 1 - l) ξ >>= ct
     | otherwise = []
    where fShfts k χ | k > 0     = χ : fShfts (k-1) (shiftLN1 χ)
                     | otherwise = []
          ct (PackedMotif ys iy) = case remain of
               [] -> if (length takeN - 1) * nukesInPackage + iy >= l
                      then [PackedMotif takeN lMod] else []
               _  -> PackedMotif takeN lMod : ct (PackedMotif (tail ys) iy)
            where (takeN, remain) = splitAt lQuot ys
                  (lQuot, lMod) = case l `quotRem` nukesInPackage of
                             (i,0) -> (i, nukesInPackage)
                             (i,m) -> (i+1, m)
It can be used on UnpackedMotif (= [NukeTide]) values via the packNukes function, e.g.:
*BioNuke0> motifs 23 $ packNukes $ take 27 $ cycle [A,T,G,C,A]
[[A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G],[T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C],[G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A],[C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A],[A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T]]
*BioNuke0> hammingDistance (packNukes [A,T,G,C,A,A,T,G]) (packNukes [A,T,C,C,A,T,G])
3
*BioNuke0> map (hammingDistance (packNukes $ take 52 $ cycle [A,T,C,C,A,T,G])) (motifs 52 $ packNukes $ take 523 $ cycle [A,T,C,C,A,T,G])
[0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44]
I haven't compared the performance to the original version yet, but it should be quite a bit faster than any algebraic-datatype implementation. Plus, it readily offers a space-efficient storage format.

Dynamically build list comprehension in Haskell

I am curious if it is possible to dynamically build a list comprehension in Haskell.
As an example, if I have the following:
all_pows (a,a') (b,b') = [ a^y * b^z | y <- take a' [0..], z <- take b' [0..] ]
I get what I am after
*Main> List.sort $ all_pows (2,3) (5,3)
[1,2,4,5,10,20,25,50,100]
However, what I'd really like is to have something like
all_pows :: [(Int,Int)] -> [Integer]
So that I can support N pairs of arguments without building N versions of all_pows. I'm still pretty new to Haskell so I may have overlooked something obvious. Is this even possible?
The magic of the list monad:
ghci> let powers (a, b) = [a ^ n | n <- [0 .. b-1]]
ghci> powers (2, 3)
[1,2,4]
ghci> map powers [(2, 3), (5, 3)]
[[1,2,4],[1,5,25]]
ghci> sequence it
[[1,1],[1,5],[1,25],[2,1],[2,5],[2,25],[4,1],[4,5],[4,25]]
ghci> mapM powers [(2, 3), (5, 3)]
[[1,1],[1,5],[1,25],[2,1],[2,5],[2,25],[4,1],[4,5],[4,25]]
ghci> map product it
[1,5,25,2,10,50,4,20,100]
ghci> let allPowers list = map product $ mapM powers list
ghci> allPowers [(2, 3), (5, 3)]
[1,5,25,2,10,50,4,20,100]
This probably deserves a bit more explanation.
You could have written your own
cartesianProduct :: [[a]] -> [[a]]
cartesianProduct [] = [[]]
cartesianProduct (list:lists)
    = [ (x:xs) | x <- list, xs <- cartesianProduct lists ]
such that cartesianProduct [[1],[2,3],[4,5,6]] ⇒ [[1,2,4],[1,2,5],[1,2,6],[1,3,4],[1,3,5],[1,3,6]].
However, comprehensions and monads are intentionally similar. The standard Prelude has sequence :: Monad m => [m a] -> m [a], and when m is the list monad [], it actually does exactly what we wrote above.
As another shortcut, mapM :: Monad m => (a -> m b) -> [a] -> m [b] is simply a composition of sequence and map.
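Specialised to lists, the two could have been written like this (illustrative definitions of my own that mirror the Prelude versions):

sequenceList :: [[a]] -> [[a]]
sequenceList []       = [[]]
sequenceList (xs:xss) = [ y:ys | y <- xs, ys <- sequenceList xss ]

mapMList :: (a -> [b]) -> [a] -> [[b]]
mapMList f = sequenceList . map f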
For each inner list of varying powers of each base, you want to multiply them to a single number. You could write this recursively
product list = product' 1 list
  where product' accum [] = accum
        product' accum (x:xs)
            = let accum' = accum * x
              in accum' `seq` product' accum' xs
or using a fold
import Data.List
product list = foldl' (*) 1 list
but actually, product :: Num a => [a] -> a is already defined! I love this language ☺☺☺