optimization of a haskell code - optimization

I write the following Haskell code which take a triplet (x,y,z) and a list of triplets [(Int,Int,Int)] and look if there is a triplet (a,b,c) in the list such that x == a and y == b if it is a case i just need to update c = c + z, if there is not a such of triplet in the list I just add the triplet in the list.
-- insertEdge :: (Int,Int,Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdge (x,y,z) cs =
if (length [(a,b,c) | (a,b,c) <- cs, a /= x || b /= y]) == (length cs)
then ((x,y,z):cs))
else [if (a == x && b == y) then (a,b,c+1) else (a,b,c) | (a,b,c) <- cs]
After profiling my code it appears that this fuction take 65% of the execution time.
How can I re-write my code to be more efficient?

Other answers are correct, so I want to offer some unasked-for advice instead: how about using Data.Map (Int,Int) Int instead of list?
Then your function becomes insertWith (+) (a,b) c mymap

The first thing that jumps out at me is the conditional: length examines the entire list, so in the worst-case scenario (updating the last element) your function traverses the list three times: Once for the length of the filtered list, once for the length of cs, and once to find the element to update.
However, even getting rid of the extra traversals, the best you can do with the function as written will usually require a traversal of most of the list. From the name of the function and how much time was being spent in it, I'm guessing you're calling this repeatedly to build up a data structure? If so, you should strongly consider using a more efficient representation.
For instance, a quick and easy improvement would be to use Data.Map, the first two elements of the triplet in a 2-tuple as the key, and the third element as the value. That way you can avoid making so many linear-time lookups/redundant traversals.
As a rule of thumb, lists in Haskell are only an appropriate data structure when all you do is either walk sequentially down the list a few times (ideally, just once) or add/remove from the head of the list (i.e., using it like a stack). If you're searching, filtering, updating elements in the middle, or--worst of all--indexing by position, using lists will only end in tears.
Here's a quick example, if that helps:
import qualified Data.Map as M
incEdge :: M.Map (Int, Int) Int -> ((Int, Int), Int) -> M.Map (Int, Int) Int
incEdge cs (k,v) = M.alter f k cs
where f (Just n) = Just $ n + v
f Nothing = Just v
The alter function is just insert/update/delete all rolled into one. This inserts the key into the map if it's not there, and sums the values if the key does exist. To build up a structure incrementally, you can do something like foldl incEdge M.empty edgeList. Testing this out, for a few thousand random edges your version with a list takes several seconds, whereas the Data.Map version is pretty much immediate.

It's always a good idea to benchmark (and Criterion makes it so easy). Here are the results for the original solution (insertEdgeO), Geoff's foldr (insertEdgeF), and Data.Map (insertEdgeM):
benchmarking insertEdgeO...
mean: 380.5062 ms, lb 379.5357 ms, ub 381.1074 ms, ci 0.950
benchmarking insertEdgeF...
mean: 74.54564 ms, lb 74.40043 ms, ub 74.71190 ms, ci 0.950
benchmarking insertEdgeM...
mean: 18.12264 ms, lb 18.03029 ms, ub 18.21342 ms, ci 0.950
Here's the code (I compiled with -O2):
module Main where
import Criterion.Main
import Data.List (foldl')
import qualified Data.Map as M
insertEdgeO :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeO (x, y, z) cs =
if length [(a, b, c) | (a, b, c) <- cs, a /= x || b /= y] == length cs
then (x, y, z) : cs
else [if (a == x && b == y) then (a, b, c + z) else (a, b, c) | (a, b, c) <- cs]
insertEdgeF :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeF (x,y,z) cs =
case foldr f (False, []) cs of
(False, cs') -> (x, y, z) : cs'
(True, cs') -> cs'
where
f (a, b, c) (e, cs')
| (a, b) == (x, y) = (True, (a, b, c + z) : cs')
| otherwise = (e, (a, b, c) : cs')
insertEdgeM :: (Int, Int, Int) -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int
insertEdgeM (a, b, c) = M.insertWith (+) (a, b) c
testSet n = [(a, b, c) | a <- [1..n], b <- [1..n], c <- [1..n]]
testO = foldl' (flip insertEdgeO) [] . testSet
testF = foldl' (flip insertEdgeF) [] . testSet
testM = triplify . M.toDescList . foldl' (flip insertEdgeM) M.empty . testSet
where
triplify = map (\((a, b), c) -> (a, b, c))
main = let n = 25 in defaultMain
[ bench "insertEdgeO" $ nf testO n
, bench "insertEdgeF" $ nf testF n
, bench "insertEdgeM" $ nf testM n
]
You can improve insertEdgeF a bit by using foldl' (55.88634 ms), but Data.Map still wins.

The main reason your function is slow is that it traverses the list at least twice, maybe three times. The function can be rewritten to to traverse the list only once using a fold. This will transform the list into a tuple (Bool,[(Int,Int,Int)]) where the Bool indicates if there was a matching element in the list and the list is the transformed list
insertEdge (x,y,z) cs = case foldr f (False,[]) cs of
(False,cs') -> (x,y,z):cs'
(True,cs') -> cs'
where f (a,b,c) (e,cs') = if (a,b) == (x,y) then (True,(a,b,c+z):cs') else (e,(a,b,c):cs')
If you haven't seen foldr before, it has type
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr embodies a pattern of recursive list processing of defining a base case and combining the current list element with the result from the rest of the list. Writing foldr f b xs is the same as writing a function g with definition
g [] = b
g (x:xs) = f x (g xs)

Sticking with your data structure, you might
type Edge = (Int,Int,Int)
insertEdge :: Edge -> [Edge] -> [Edge]
insertEdge t#(x,y,z) es =
case break (abx t) es of
(_, []) -> t : es
(l, ((_,_,zold):r)) -> l ++ (x,y,z+zold) : r
where abx (a1,b1,_) (a2,b2,_) = a1 == a2 && b1 == b2
No matter what language you're using, searching lists is always a red flag. When searching you want sublinear complexity (think: hashes, binary search trees, and so on). In Haskell, an implementation using Data.Map is
import Data.Map
type Edge = (Int,Int,Int)
type EdgeMap = Map (Int,Int) Int
insertEdge :: Edge -> EdgeMap -> EdgeMap
insertEdge (x,y,z) es = alter accumz (x,y) es
where accumz Nothing = Just z
accumz (Just zold) = Just (z + zold)
You may not be familiar with alter:
alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a
O(log n). The expression (alter f k map) alters the value x at k, or absence thereof. alter can be used to insert, delete, or update a value in a Map. In short: lookup k (alter f k m) = f (lookup k m).
let f _ = Nothing
alter f 7 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "a")]
alter f 5 (fromList [(5,"a"), (3,"b")]) == singleton 3 "b"
let f _ = Just "c"
alter f 7 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "a"), (7, "c")]
alter f 5 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "c")]
But as ADEpt shows in another answer, this is a bit of overengineering.

In
insertEdgeM :: (Int, Int, Int) -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int
insertEdgeM (a, b, c) = M.insertWith (+) (a, b) c
you want to use the strict version of insertWith, namely insertWith'.

Very small optimisation: Use an as-pattern, this avoids multiple reconstructions of the same tuple. Like this:
insertEdge xyz#(x,y,z) cs =
if (length [abc | abc#(a,b,c) <- cs, a /= x || b /= y]) == (length cs)
then (xyz:cs))
else [if (a == x && b == y) then (a,b,c+1) else abc' | abc'#(a,b,c) <- cs]
You should apply the other optimization hionts first, but this may save a very small amount of time, since the tuple doesn't have to be reconstructed again and again. At least in the last at-pattern (The first two patterns are not important, since the tuple never gets evaluated in the first case and the as-pattern is only applied once in the second case).

Related

Is it possible to use the intermediate result of a pipe in F#?

I have to implement a function that takes two lists of tuples let foo l1 l2 and has to append them and apply a recursive function let rec bar x l to one element of each tuple in the list.
The number of recursive calls depends on l, so I'd like to use the intermediate results of the pipe as l in order to reduce the calls instead of saving the initial list and pass that one.
My current solution is as follows and I'd like to optimise it with some sort of dynamic programming solution
let foo l1 l2 =
l = l1 # l2
l |> List.map (fun (x, y) -> x, bar y l)

Understanding 'impossible'

Type-Driven Development with Idris presents:
twoPlusTwoNotFive : 2 + 2 = 5 -> Void
twoPlusTwoNotFive Refl impossible
Is the above a function or value? If it's the former, then why is there no variable arguments, e.g.
add1 : Int -> Int
add1 x = x + 1
In particular, I'm confused at the lack of = in twoPlusTwoNotFive.
impossible calls out combinations of arguments which are, well, impossible. Idris absolves you of the responsibility to provide a right-hand side when a case is impossible.
In this instance, we're writing a function of type (2 + 2 = 5) -> Void. Void is a type with no values, so if we succeed in implementing such a function we should expect that all of its cases will turn out to be impossible. Now, = has only one constructor (Refl : x = x), and it can't be used here because it requires ='s arguments to be definitionally equal - they have to be the same x. So, naturally, it's impossible. There's no way anyone could successfully call this function at runtime, and we're saved from having to prove something that isn't true, which would have been quite a big ask.
Here's another example: you can't index into an empty vector. Scrutinising the Vect and finding it to be [] tells us that n ~ Z; since Fin n is the type of natural numbers less than n there's no value a caller could use to fill in the second argument.
at : Vect n a -> Fin n -> a
at [] FZ impossible
at [] (FS i) impossible
at (x::xs) FZ = x
at (x::xs) (FS i) = at xs i
Much of the time you're allowed to omit impossible cases altogether.
I slightly prefer Agda's notation for the same concept, which uses the symbol () to explicitly pinpoint which bit of the input expression is impossible.
twoPlusTwoNotFive : (2 + 2 ≡ 5) -> ⊥
twoPlusTwoNotFive () -- again, no RHS
at : forall {n}{A : Set} -> Vec A n -> Fin n -> A
at [] ()
at (x ∷ xs) zero = x
at (x ∷ xs) (suc i) = at xs i
I like it because sometimes you only learn that a case is impossible after doing some further pattern matching on the arguments; when the impossible thing is buried several layers down it's nice to have a visual aid to help you spot where it was.

Idris: function works with Nat parameter and fails type checking with Integer parameter

I am new to Idris. I am experimenting with types and my task is to make an "onion": a function that takes two arguments: a number and whatever and puts whatever into List nested such number of times.
For example, the result for mkOnion 3 "Hello World" should be [[["Hello World"]]].
I've made such a function, this is my code:
onionListType : Nat -> Type -> Type
onionListType Z b = b
onionListType (S a) b = onionListType a (List b)
mkOnionList : (x : Nat) -> y -> onionListType x y
mkOnionList Z a = a
mkOnionList (S n) a = mkOnionList n [a]
prn : (Show a) => a -> IO ();
prn a = putStrLn $ show a;
main : IO()
main = do
prn $ mkOnionList 3 4
prn $ mkOnionList 2 'a'
prn $ mkOnionList 5 "Hello"
prn $ mkOnionList 0 3.14
The result of program work:
[[[4]]]
[['a']]
[[[[["Hello"]]]]]
3.14
This is exactly what I need.
But when I do the same, but change Nat to Integer like this
onionListTypeI : Integer -> Type -> Type
onionListTypeI 0 b = b
onionListTypeI a b = onionListTypeI (a-1) (List b)
mkOnionListI : (x : Integer) -> y -> onionListTypeI x y
mkOnionListI 0 a = a
mkOnionListI n a = mkOnionListI (n-1) [a]
I get an error:
When checking right hand side of mkOnionListI with expected type
onionListTypeI 0 y
Type mismatch between
y (Type of a) and
onionListTypeI 0 y (Expected type)
Why does type checking fails?
I think this is because Integer can take negative values and Type can't be computed in case of negative values. If I am right, how does the compiler understand this?
You are right, that the type can't be computed. But that is because the onionListTypeI is not total. You can check this in the REPL
*test> :total onionListTypeI
Main.onionListTypeI is possibly not total due to recursive path:
Main.onionListTypeI, Main.onionListTypeI
(Or even better, demanding %default total in the source code, which would raise an error.)
Because the type constructor is not total, the compiler won't normalize onionListTypeI 0 y to y. It is not total, because of the case onionListTypeI a b = onionListTypeI (a-1) (List b). The compiler does only know that subtracting 1 from an Integer results to an Integer, but not which number exactly (unlike when doing it with a Nat). This is because arithmetic with Integer, Int, Double and the various Bits are defined with primary functions like prim__subBigInt. And if these functions wouldn't be blind, the compiler should have a problem with negative values, like you assumed.

vector reflexivity under setoid equality using CoRN MathClasses

I have a simple lemma:
Lemma map2_comm: forall A (f:A->A->B) n (a b:t A n),
(forall x y, (f x y) = (f y x)) -> map2 f a b = map2 f b a.
which I was able to prove using standard equality (≡). Now I am need to prove the similar lemma using setoid equality (using CoRN MathClasses). I am new to this library and type classes in general and having difficulty doing so. My first attempt is:
Lemma map2_setoid_comm `{Equiv B} `{Equiv (t B n)} `{Commutative B A}:
forall (a b: t A n),
map2 f a b = map2 f b a.
Proof.
intros.
induction n.
dep_destruct a.
dep_destruct b.
simpl.
(here '=' is 'equiv'). After 'simpl' the goal is "(nil B)=(nil B)" or "[]=[]" using VectorNotations. Normally I would finish it using 'reflexivity' tactics but it gives me:
Tactic failure: The relation equiv is not a declared reflexive relation. Maybe you need to require the Setoid library.
I guess I need somehow to define reflexivity for vector types, but I am not sure how to do that. Please advise.
First of all the lemma definition needs to be adjusted to:
Lemma map2_setoid_comm : forall `{CO:Commutative B A f} `{SB: !Setoid B} ,
forall n:nat, Commutative (map2 f (n:=n)).
To be able to use reflexivity:
Definition vec_equiv `{Equiv A} {n}: relation (vector A n) := Vforall2 (n:=n) equiv.
Instance vec_Equiv `{Equiv A} {n}: Equiv (vector A n) := vec_equiv.

Optimizing Haskell Inner Loops

Still working on my SHA1 implementation in Haskell. I've now got a working implementation and this is the inner loop:
iterateBlock' :: Int -> [Word32] -> Word32 -> Word32 -> Word32 -> Word32 -> Word32 -> [Word32]
iterateBlock' 80 ws a b c d e = [a, b, c, d, e]
iterateBlock' t (w:ws) a b c d e = iterateBlock' (t+1) ws a' b' c' d' e'
where
a' = rotate a 5 + f t b c d + e + w + k t
b' = a
c' = rotate b 30
d' = c
e' = d
The profiler tells me that this function takes 1/3 of the runtime of my implementation. I can think of no way to further optimize it other than maybe inlining the temp variables but I believe -O2 will do that for me anyway.
Can anyone see a significant optimization that can be further applied?
FYI the k and f calls are below. They are so simple I don't think there is a way to optimize these other. Unless the Data.Bits module is slow?
f :: Int -> Word32 -> Word32 -> Word32 -> Word32
f t b c d
| t <= 19 = (b .&. c) .|. ((complement b) .&. d)
| t <= 39 = b `xor` c `xor` d
| t <= 59 = (b .&. c) .|. (b .&. d) .|. (c .&. d)
| otherwise = b `xor` c `xor` d
k :: Int -> Word32
k t
| t <= 19 = 0x5A827999
| t <= 39 = 0x6ED9EBA1
| t <= 59 = 0x8F1BBCDC
| otherwise = 0xCA62C1D6
Looking at the core produced by ghc-7.2.2, the inlining works out well. What doesn't work so well is that in each iteration a couple of Word32 values are first unboxed, to perform the work, and then reboxed for the next iteration. Unboxing and re-boxing can cost a surprisingly large amount of time (and allocation).
You can probably avoid that by using Word instead of Word32. You couldn't use rotate from Data.Bits then, but would have to implement it yourself (not hard) to have it work also on 64-bit systems. For a' you would have to manually mask out the high bits.
Another point that looks suboptimal is that in each iteration t is compared to 19, 39 and 59 (if it's large enough), so that the loop body contains four branches. It will probably be faster if you split iterateBlock' into four loops (0-19, 20-39, 40-59, 60-79) and use constants k1, ..., k4, and four functions f1, ..., f4 (without the t parameter) to avoid branches and have smaller code-size for each loop.
And, as Thomas said, using a list for the block data isn't optimal, an unboxed Word array/vector would probably help too.
With the bang patterns, the core looks much better. Two or three less-than-ideal points remain.
(GHC.Prim.narrow32Word#
(GHC.Prim.plusWord#
(GHC.Prim.narrow32Word#
(GHC.Prim.plusWord#
(GHC.Prim.narrow32Word#
(GHC.Prim.plusWord#
(GHC.Prim.narrow32Word#
(GHC.Prim.plusWord#
(GHC.Prim.narrow32Word#
(GHC.Prim.or#
(GHC.Prim.uncheckedShiftL# sc2_sEn 5)
(GHC.Prim.uncheckedShiftRL# sc2_sEn 27)))
y#_aBw))
sc6_sEr))
y#1_XCZ))
y#2_XD6))
See all these narrow32Word#? They're cheap, but not free. Only the outermost is needed, there may be a bit to harvest by hand-coding the steps and using Word.
Then the comparisons of t with 19, ..., they appear twice, once to determine the k constant, and once for the f transform. The comparisons alone are cheap, but they cause branches and without them, further inlining may be possible. I expect a bit could be gained here too.
And still, the list. That means w can't be unboxed, the core could be simpler if w were unboxable.