Elm calculation using fold - elm

I just started playing around with Elm and functional programming. I really like the language but I do have trouble implementing very simply calculations.
My below code takes as input wacc : Float and cfs : List Float and should calculate a net preset value (i.e. for each element of cfs calculate cfs_i / (1 + wacc)^i and then calculate the sum of the values).
The code works but is very verbose and potentially not idiomatic.
My main question besides hints how to make it more concise / idiomatic is how do I change my code to be able to accept wacc and cfs of types Maybe.
Helpful for any hint / info. Thanks!
-- helper functions
zip : List a -> List b -> List (a,b)
zip list1 list2 =
List.map2 Tuple.pair list1 list2
calcDF : Float -> Int -> List Float
calcDF wacc n =
let
waccs = List.repeat n wacc
time = List.range 0 n |> List.map toFloat
waccs_time = zip waccs time
in
List.map (\x -> 1/ (1 + Tuple.first x)^(Tuple.second x)) waccs_time
-- my npv function
calcNPV : List Float -> Float -> Html text
calcNPV cfs wacc =
let
n = List.length cfs
df = calcDF wacc n
cfs_df = zip cfs df
in
List.map (\x -> (Tuple.first x) * (Tuple.second x)) cfs_df
|> List.foldl (+) 0
Example:
calcNPV [100,100,100] 0.1
-- returns 273.553719

I'm not sure why you want to use Maybes.
But as you suspected, you have made the current code more complex than needed. Here's a reworking of the first function. You want n discount values, so we start by creating something with n items to loop over and then just do the calculation in the map function
calcDF : Float -> Int -> List Float
calcDF wacc n =
List.range 0 n
|> List.map (calcDF_ wacc)
calcDF_ : Float -> Int -> Float
calcDF_ wacc idx =
1 / (1 + toFloat idx) ^ wacc
If you use https://package.elm-lang.org/packages/elm-community/list-extra/latest/List-Extra#indexedFoldl you could simplify the main function just to call calcDF_ while looping folding over cfs and skip calcDF altogether

With the help from Elm discourse forum (see here) I came up with the following solutions.
NPV calculation without type Maybe
calcNPV : List Float -> Float -> Float
calcNPV cashflows wacc =
let
time = List.length cashflows |> List.range 0 |> List.map toFloat
waccs = List.repeat (List.length cashflows) wacc
calcPV : Float -> Float -> Float -> Float
calcPV cf i t = cf / (1+i)^t
in
List.map3 calcPV cashflows waccs time |> List.foldl (+) 0
NPV calculation with type Maybe
calcMaybeNPV : List (Maybe Float) -> Maybe Float -> Maybe Float
calcMaybeNPV maybecashflows maybewacc =
let
time = List.length maybecashflows |> List.range 0 |> List.map (\x -> Just (toFloat x))
waccs = List.repeat (List.length maybecashflows) maybewacc
calcPV : Maybe Float -> Maybe Float -> Maybe Float -> Maybe Float
calcPV cf i t =
Maybe.map3 (\a b c -> a / (1+b)^c) cf i t
in
List.map3 calcPV maybecashflows waccs time |> List.foldl (Maybe.map2 (+)) (Just 0)

Related

Partition list into more than 2 parts

So I want to partitision a List ItemModel in Elm into List (List ItemModel). List.partition only makes the list into two lists.
I wrote some code that makes the list into the parts I want (code below).
But it's not as nice of a solution as I'd like, and since it seems like an issue many people would have, I wonder are there better examples of doing this?
partition : List (ItemModel -> Bool) -> List ItemModel -> List (List ItemModel)
partition filters models =
let
filterMaybe =
List.head filters
in
case filterMaybe of
Just filter ->
let
part =
Tuple.first (List.partition filter models)
in
part :: (partition (List.drop 1 filters) models)
Nothing ->
[]
The returned list maps directly from the filters parameter, so it's actually pretty straightforward to do this using just List.map and List.filter (which is what you're really doing since you're discarding the remainder list returned from List.partition):
multifilter : List (a -> Bool) -> List a -> List (List a)
multifilter filters values =
filters |> List.map(\filter -> List.filter filter values)
Repeated partitioning needs to use the leftovers from each step as the input for the next step. This is different than simple repeated filtering of the same sequence by several filters.
In Haskell (which this question was initially tagged as, as well),
partitions :: [a -> Bool] -> [a] -> [[a]]
partitions preds xs = go preds xs
where
go [] xs = []
go (p:ps) xs = let { (a,b) = partition p xs } in (a : go ps b)
which is to say,
partitions preds xs = foldr g (const []) preds xs
where
g p r xs = let { (a,b) = partition p xs } in (a : r b)
or
-- mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])
partitions preds xs = snd $ mapAccumL (\xs p -> partition (not . p) xs) xs preds
Testing:
> partitions [ (<5), (<10), const True ] [1..15]
[[1,2,3,4],[5,6,7,8,9],[10,11,12,13,14,15]]
unlike the repeated filtering,
> [ filter p xs | let xs = [1..15], p <- [ (<5), (<10), const True ]]
[[1,2,3,4],[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]]

Dependent types: enforcing global properties in inductive types

I have the following inductive type MyVec:
import Data.Vect
data MyVec: {k: Nat} -> Vect k Nat -> Type where
Nil: MyVec []
(::): {k, n: Nat} -> {v: Vect k Nat} -> Vect n Nat -> MyVec v -> MyVec (n :: v)
-- example:
val: MyVec [3,2,3]
val = [[2,1,2], [0,2], [1,1,0]]
That is, the type specifies the lengths of all vectors inside a MyVec.
The problem is, val will have k = 3 (k is the number of vectors inside a MyVec), but the ctor :: does not know this fact. It will first build a MyVec with k = 1, then with 2, and finally with 3. This makes it impossible to define constraints based on the final shape of the value.
For example, I cannot constrain the values to be strictly less than k. Accepting Vects of Fin (S k) instead of Vects of Nat would rule out some valid values, because the last vectors (the first inserted by the ctor) would "know" a smaller value of k, and thus a stricter contraint.
Or, another example, I cannot enforce the following constraint: the vector at position i cannot contain the number i. Because the final position of a vector in the container is not known to the ctor (it would be automatically known if the final value of k was known).
So the question is, how can I enforce such global properties?
There are (at least) two ways to do it, both of which may require tracking additional information in order to check the property.
Enforcing properties in the data definition
Enforcing all elements < k
I cannot constrain the values to be strictly less than k. Accepting Vects of Fin (S k) instead of Vects of Nat would rule out some valid values...
You are right that simply changing the definition of MyVect to have Vect n (Fin (S k)) in it would not be correct.
However, it is not too hard to fix this by generalizing MyVect to be polymorphic, as follows.
data MyVec: (A : Type) -> {k: Nat} -> Vect k Nat -> Type where
Nil: {A : Type} -> MyVec A []
(::): {A : Type} -> {k, n: Nat} -> {v: Vect k Nat} -> Vect n A -> MyVec A v -> MyVec A (n :: v)
val : MyVec (Fin 3) [3,2,3]
val = [[2,1,2], [0,2], [1,1,0]]
The key to this solution is separating the type of the vector from k in the definition of MyVec, and then, at top level, using the "global value of k to constrain the type of vector elements.
Enforcing vector at position i does not contain i
I cannot enforce that the vector at position i cannot contain the number i because the final position of a vector in the container is not known to the constructor.
Again, the solution is to generalize the data definition to keep track of the necessary information. In this case, we'd like to keep track of what the current position in the final value will be. I call this index. I then generalize A to be passed the current index. Finally, at top level, I pass in a predicate enforcing that the value does not equal the index.
data MyVec': (A : Nat -> Type) -> (index : Nat) -> {k: Nat} -> Vect k Nat -> Type where
Nil: {A : Nat -> Type} -> {index : Nat} -> MyVec' A index []
(::): {A : Nat -> Type} -> {k, n, index: Nat} -> {v: Vect k Nat} ->
Vect n (A index) -> MyVec' A (S index) v -> MyVec' A index (n :: v)
val : MyVec' (\n => (m : Nat ** (n == m = False))) 0 [3,2,3]
val = [[(2 ** Refl),(1 ** Refl),(2 ** Refl)], [(0 ** Refl),(2 ** Refl)], [(1 ** Refl),(1 ** Refl),(0 ** Refl)]]
Enforcing properties after the fact
Another, sometimes simpler way to do it, is to not enforce the property immediately in the data definition, but to write a predicate after the fact.
Enforcing all elements < k
For example, we can write a predicate that checks whether all elements of all vectors are < k, and then assert that our value has this property.
wf : (final_length : Nat) -> {k : Nat} -> {v : Vect k Nat} -> MyVec v -> Bool
wf final_length [] = True
wf final_length (v :: mv) = isNothing (find (\x => x >= final_length) v) && wf final_length mv
val : (mv : MyVec [3,2,3] ** wf 3 mv = True)
val = ([[2,1,2], [0,2], [1,1,0]] ** Refl)
Enforcing vector at position i does not contain i
Again, we can express the property by checking it, and then asserting that the value has the property.
wf : (index : Nat) -> {k : Nat} -> {v : Vect k Nat} -> MyVec v -> Bool
wf index [] = True
wf index (v :: mv) = isNothing (find (\x => x == index) v) && wf (S index) mv
val : (mv : MyVec [3,2,3] ** wf 0 mv = True)
val = ([[2,1,2], [0,2], [1,1,0]] ** Refl)

Idris Vect.fromList usage with generated list

I am trying to feel my way into dependent types. Based on the logic of the windowl function below, I want to return a list of vectors whose length depend on the size provided.
window : (n : Nat) -> List a -> List (Vect n a)
window size = map fromList loop
where
loop xs = case splitAt size xs of
(ys, []) => if length ys == size then [ys] else []
(ys, _) => ys :: loop (drop 1 xs)
windowl : Nat -> List a -> List (List a)
windowl size = loop
where
loop xs = case List.splitAt size xs of
(ys, []) => if length ys == size then [ys] else []
(ys, _) => ys :: loop (drop 1 xs)
When I attempt to load the function into Idris, I get the following:
When checking argument func to function Prelude.Functor.map:
Type mismatch between
(l : List elem) -> Vect (length l) elem (Type of fromList)
and
a1 -> List (Vect size a) (Expected type)
Specifically:
Type mismatch between
Vect (length v0) elem
and
List (Vect size a)
When reading the documentation on fromList I notice that it says
The length of the list should be statically known.
So I assume that the type error has to do with Idris not knowing that the length of the list is corresponding to the size specified.
I am stuck because I don't even know if it is something impossible I want to do or whether I can specify that the length of the list corresponds to the length of the vector that I want to produce.
Is there a way to do that?
Since in your case it is not possible to know the length statically, we need a function which can fail at run-time:
total
fromListOfLength : (n : Nat) -> (xs : List a) -> Maybe (Vect n a)
fromListOfLength n xs with (decEq (length xs) n)
fromListOfLength n xs | (Yes prf) = rewrite (sym prf) in Just (fromList xs)
fromListOfLength n xs | (No _) = Nothing
fromListOfLength converts a list of length n into a vector of length n or fails. Now let's combine it and windowl to get to window.
total
window : (n : Nat) -> List a -> List (Vect n a)
window n = catMaybes . map (fromListOfLength n) . windowl n
Observe that the window function's type is still an underspecification of what we are doing with the input list, because nothing prevents us from always returning the empty list (this could happen if fromListOfLength returned Nothing all the time).

Finding and replacing

There are times that we want to find an element in a list with a function a -> Bool and replace it using a function a -> a, this may result in a new list:
findr :: (a -> Bool) -> (a -> a) -> [a] -> Maybe [a]
findr _ _ [] = Nothing
findr p f (x:xs)
| p x = Just (f x : xs)
| otherwise = case findr p f xs of Just xs -> Just (x:xs)
_ -> Nothing
Is there any function in the main modules which is similar to this?
Edit: #gallais points out below that you end up only changing the first instance; I thought you were changing every instance.
This is done with break :: (a -> Bool) -> [a] -> ([a], [a]) which gives you the longest prefix which does not satisfy the predicate, followed by the rest of the list.
findr p f list = case break p list of
(xs, y : ys) -> Just (xs ++ f y : ys)
(_, []) -> Nothing
This function is, of course, map, as long as you can combine your predicate function and replacement function the right way.
findr check_f replace_f xs = map (replace_if_needed check_f replace_f) xs
replace_if_needed :: (a -> Bool) -> (a -> a) -> (a -> a)
replace_if_needed check_f replace_f = \x -> if check_f x then replace_f x else x
Now you can do things like findr isAplha toUpper "a123-bc".

How to Eliminate Cost-centres in String Taversals and List Comprehensions

I'm implementing a motif finding algorithm from the domain of bioinformatics using Haskell. I wont go into the details of the algorithm other then to say it's branch and bound median string search. I had planned on making my implementation more interesting by implementing a concurrent approach (and later an STM approach) in order to get a multicore speed up but after compiling with the follow flags
$ ghc -prof -auto-all -O2 -fllvm -threaded -rtsopts --make main
and printing the profile I saw something interesting (and perhaps obvious):
COST CENTRE entries %time %alloc
hammingDistance 34677951 47.6 14.7
motifs 4835446 43.8 71.1
It's clear that a remarkable speedup could be gained without going anywhere near multicore programming (although that's been done and I just need to find some good test data and sort out Criterion for that).
Anyway, both of these functions are purely functional and in no way concurrent. They're also doing quite simple stuff, so I was surprised that they took so much time. Here's the code for them:
data NukeTide = A | T | C | G deriving (Read, Show, Eq, Ord, Enum)
type Motif = [NukeTide]
hammingDistance :: Motif -> Motif -> Int
hammingDistance [] [] = 0
hammingDistance xs [] = 0 -- optimistic
hammingDistance [] ys = 0 -- optimistic
hammingDistance (x:xs) (y:ys) = case (x == y) of
True -> hammingDistance xs ys
False -> 1 + hammingDistance xs ys
motifs :: Int -> [a] -> [[a]]
motifs n nukeTides = [ take n $ drop k nukeTides | k <- [0..length nukeTides - n] ]
Note that of the two arguments to hammingDistance, I can actually assume that xs is going to be x long and that ys is going to be less than or equal to that, if that opens up room for improvements.
As you can see, hammingDistance calculates the hamming distance between two motifs, which are lists of nucleotides. The motifs function takes a number and a list and returns all the sub strings of that length, e.g.:
> motifs 3 "hello world"
["hel","ell","llo","lo ","o w"," wo","wor","orl","rld"]
Since the algorithmic processes involved are so simple I can't think of a way to optimize this further. I do however have two guesses as to where I should be headed:
HammingDistance: The data types I'm using (NukeTides and []) are slow/clumsy. This is just a guess, since I'm not familiar with their implementations but I think defining my own datatype, although more legible, probably involves more overhead then I intend. Also the pattern matching is foreign to me, I don't know if that is trivial or costly.
Motifs: If I'm reading this correctly, 70% of all memory allocations are done by motifs, and I'd assume that has to be garbage collected at some time. Again using the all purpose list might be slowing me down or the list comprehension, since the cost of that is incredibly unclear to me.
Does anybody have any advice on the usual procedure here? If data types are the problem, would arrays be the right answer? (I've heard they come in boxes)
Thanks for the help.
Edit: It just occurred to me that it might be useful if I describe the manner in which these two functions are called:
totalDistance :: Motif -> Int
totalDistance motif = sum $ map (minimum . map (hammingDistance motif) . motifs l) dna
This function is the result of another function, and is passed around nodes in a tree. At each node in the tree an evaluation of the nucleotide (of length <= n, that is if == n then it is a leaf node) is done, using totalDistance to score the node. From then on it's your typical branch and bound algorithm.
Edit: John asked that I print out the change I made which virutally eliminated the cost of motifs:
scoreFunction :: DNA -> Int -> (Motif -> Int)
scoreFunction dna l = totalDistance
where
-- The sum of the minimum hamming distance in each line of dna
-- is given by totalDistance motif
totalDistance motif = sum $ map (minimum . map (hammingDistance motif)) possibleMotifs
possibleMotifs = map (motifs l) dna -- Previously this was computed in the line above
I didn't make it clear in my original post, but scoreFunction is only called once, and the result is passed around in a tree traversal/branch and bound and used to evaluate nodes. Recomputing motifs at every step of the way, in retrospect, isn't one of the brightest things I've done.
Your definition of motifs looks like it's doing a lot more traversing than necessary because each application of drop has to traverse the list from the beginning. I would implement it using Data.List.tails instead:
motifs2 :: Int -> [a] -> [[a]]
motifs2 n nukeTides = map (take n) $ take count $ tails nukeTides
where count = length nukeTides - n + 1
A quick comparison in GHCi shows the difference (using sum . map length to force evaluation):
*Main> let xs = concat (replicate 10000 [A, T, C, G])
(0.06 secs, 17914912 bytes)
*Main> sum . map length $ motifs 5 xs
199980
(3.47 secs, 56561208 bytes)
*Main> sum . map length $ motifs2 5 xs
199980
(0.15 secs, 47978952 bytes)
Your definition of hammingDistance is probably much less efficient than it could be.
hammingDistance (x:xs) (y:ys) = case (x == y) of
True -> hammingDistance xs ys
False -> 1 + hammingDistance xs ys
Because of haskell's laziness, this will be expanded to (in the worst case):
(1 + (1 + (1 + ...)))
which will exist as a thunk on the stack, getting reduced only when it's used. Whether this is actually a problem depends on the call site, compiler options, etc., so it's often good practice to write your code in a form which avoids this issue altogether.
A common solution is to create a tail-recursive form with a strict accumulator, but in this case you could use higher-order functions, like this:
hammingDistance :: Motif -> Motif -> Int
hammingDistance xs ys = length . filter (uncurry (==)) $ zip xs ys
here's the tail-recursive implementation, for comparison
hammingDistance :: Motif -> Motif -> Int
hammingDistance xs ys = go 0 xs ys
where
go !acc [] [] = acc
go !acc xs [] = acc -- optimistic
go !acc [] ys = acc -- optimistic
go !acc (x:xs) (y:ys) = case (x == y) of
True -> go acc xs ys
False -> go (acc+1) xs ys
This uses the BangPatterns extension to force the accumulator to be strictly evaluated, otherwise it would have the same problem as your current definition.
To directly answer some of your other questions:
Pattern matching is trivial
Whether you should use lists or arrays depends mostly on how the data is created and how it's consumed. For this case, it's possible that lists may be the best type. In particular, if your lists are all consumed as they're created, and you don't ever need the whole list in memory, they should be fine. If you do retain lists in memory though, they have a lot of space overhead.
Usage patterns
I think the way you use these functions does some extra work as well:
(minimum . map (hammingDistance motif) . motifs l
Since you only need the minimum hammingDistance, you may be calculating a lot of extra values which aren't necessary. I can think of two solutions to this:
Option 1. Define a new function hammingDistanceThresh :: Motif -> Int -> Motif -> Int, which stops when it exceeds the threshold. The slightly odd type ordering is to facilitate using it in a fold, like this:
let motifs' = motifs l
in foldl' (hammingDistanceThresh motif) (hammingDistance motif $ head motifs') (tail motifs')
Option 2. If you define a lazy natural number type, you can use that instead of Ints for the result of hammingDistance. Then only as much of the hamming distance as necessary will be calculated.
One final note: using -auto-all will very frequently generate much slower code than other profiling options. I would suggest you try using just -auto first, and then adding manual SCC annotations if necessary.
Right... I could not resist going to the limit and wrote a plain-metal-ish packed-bits implementation:
{-# language TypeSynonymInstances #-}
{-# language BangPatterns #-}
import Data.Bits
import Data.Word
data NukeTide = A | T | C | G deriving (Read, Show, Eq, Ord, Enum)
type UnpackedMotif = [NukeTide]
type PackageType = Word32
nukesInPackage = 16 :: Int
allSetMask = complement 0 :: PackageType
-- Be careful to have length of motif == nukesInPackage here!
packNukesToWord :: UnpackedMotif -> PackageType
packNukesToWord = packAt 0
where packAt _ [] = 0
packAt i (m:ml) = (b0 m .&. bit i)
.|. (b1 m .&. bit (i+1))
.|. packAt (i+2) ml
b0 A = 0
b0 T = allSetMask
b0 C = 0
b0 G = allSetMask
b1 A = 0
b1 T = 0
b1 C = allSetMask
b1 G = allSetMask
unpackNukesWord :: PackageType -> UnpackedMotif
unpackNukesWord = unpackNNukesFromWord nukesInPackage
unpackNNukesFromWord :: Int -> PackageType -> UnpackedMotif
unpackNNukesFromWord = unpackN
where unpackN 0 _ = []
unpackN i w = (nukeOf $ w .&. r2Mask):(unpackN (i-1) $ w`shiftR`2)
nukeOf bs
| bs == 0 = A
| bs == bit 0 = T
| bs == bit 1 = C
| otherwise = G
r2Mask = (bit 1 .|. bit 0) :: PackageType
data PackedMotif = PackedMotif { motifPackets::[PackageType]
, nukesInLastPack::Int }
-- note nukesInLastPack will never be zero; motifPackets must be [] to represent empty motifs.
packNukes :: UnpackedMotif -> PackedMotif
packNukes m = case remain of
[] -> PackedMotif [packNukesToWord takeN] (length takeN)
r -> prAppend (packNukesToWord takeN) (packNukes r)
where (takeN, remain) = splitAt nukesInPackage m
prAppend w (PackedMotif l i) = PackedMotif (w:l) i
unpackNukes :: PackedMotif -> UnpackedMotif
unpackNukes (PackedMotif l i) = unpack l i
where unpack [l] i = unpackNNukesFromWord i l
unpack (l:ls) i = unpackNukesWord l ++ unpack ls i
unpack [] _ = []
instance Show PackedMotif where
show = show . unpackNukes
class Nukes a where
pLength :: a -> Int
shiftLN1 :: a -> a
hammingDistance :: a -> a -> Int
motifs :: Int -> a -> [a]
instance Nukes PackageType where
pLength _ = nukesInPackage
shiftLN1 = (`shiftR`2)
hammingDistance !x !y = fromIntegral $ abt (x `xor` y)
where abt !b = bbt(b.&.a0Mask .|. ((b.&.a1Mask) `shiftR` 1))
bbt !b = sbt $ (b.&.r16Mask) + (b `shiftR` nukesInPackage)
sbt !b = (r2Mask .&. b) + (r2Mask .&. (b`shiftR`2))
+ (r2Mask .&. (b`shiftR`4)) + (r2Mask .&. (b`shiftR`6))
+ (r2Mask .&. (b`shiftR`8)) + (r2Mask .&. (b`shiftR`10))
+ (r2Mask .&. (b`shiftR`12)) + (r2Mask .&. (b`shiftR`14))
a0Mask = 0x55555555 :: PackageType
a1Mask = 0xAAAAAAAA :: PackageType
r16Mask = 0xFFFF :: PackageType
r2Mask = 0x3 :: PackageType
motifs 0 _ = []
motifs l x = x : motifs (l-1) (shiftLN1 x)
maskNukesBut :: Int -> PackageType -> PackageType
maskNukesBut i = ( ( allSetMask `shiftR` (2*(nukesInPackage - i)) ) .&.)
instance Nukes PackedMotif where
pLength (PackedMotif (x:xs) ix) = nukesInPackage * (length xs) + ix
pLength _ = 0
shiftLN1 ξ#(PackedMotif [] _) = ξ
shiftLN1 (PackedMotif [x] ix) | ix>1 = PackedMotif [x`shiftR`2] (ix-1)
| otherwise = PackedMotif [] nukesInPackage
shiftLN1 (PackedMotif (x:x':xs) ix)
= PackedMotif (( shiftLN1 x .|. pnext ):sxs) resLMod
where sxs = motifPackets $ shiftLN1 (PackedMotif (x':xs) ix)
pnext = shiftL (x'.&.0x3) 30
resLMod = if ix > 1 then (ix-1) else nukesInPackage
hammingDistance xs ys = go 0 xs ys
where
go :: Int -> PackedMotif -> PackedMotif -> Int
go !acc (PackedMotif [x] ix) (PackedMotif [y] iy)
| ix > iy = acc + (hammingDistance y $ maskNukesBut iy x)
| otherwise = acc + (hammingDistance x $ maskNukesBut ix y)
go !acc (PackedMotif [x] ix) (PackedMotif (y:ys) iy)
= acc + (hammingDistance x $ maskNukesBut ix y)
go !acc (PackedMotif (x:xs) ix) (PackedMotif [y] iy)
= acc + (hammingDistance y $ maskNukesBut iy x)
go !acc (PackedMotif (x:xs) ix) (PackedMotif (y:ys) iy)
= go (acc + hammingDistance x y) (PackedMotif xs ix) (PackedMotif ys iy)
go !acc _ _ = acc
motifs l ξ
| l>0 = fShfts (min nukesInPackage $ pLength ξ + 1 - l) ξ >>= ct
| otherwise = []
where fShfts k χ | k > 0 = χ : fShfts (k-1) (shiftLN1 χ)
| otherwise = []
ct (PackedMotif ys iy) = case remain of
[] -> if (length takeN - 1) * nukesInPackage + iy >= l
then [PackedMotif takeN lMod] else []
_ -> PackedMotif takeN lMod : ct(PackedMotif (tail ys) iy)
where (takeN, remain) = splitAt lQuot ys
(lQuot,lMod) = case l `quotRem` nukesInPackage of
(i,0) -> (i, nukesInPackage)
(i,m) -> (i+1, m)
It can be used from UnpackedMotif = [NukeTide]s with the packNukes function, e.g.
*BioNuke0> motifs 23 $ packNukes $ take 27 $ cycle [A,T,G,C,A]
[[A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G],[T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C],[G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A],[C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A],[A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T,G,C,A,A,T]]
*BioNuke0> hammingDistance (packNukes [A,T,G,C,A,A,T,G]) (packNukes [A,T,C,C,A,T,G])
3
*BioNuke0> map (hammingDistance (packNukes $ take 52 $ cycle [A,T,C,C,A,T,G])) (motifs 52 $ packNukes $ take 523 $ cycle [A,T,C,C,A,T,G])
[0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44,38,52,0,52,36,45,44]
I haven't compared the performance to the original version yet, but it should be quite a bit faster than any algebraic-datatype implementation. Plus, it readily offers a space-efficient storage format.