Is it possible to use the intermediate result of a pipe in F#? - optimization

I have to implement a function let foo l1 l2 that takes two lists of tuples, appends them, and applies a recursive function let rec bar x l to one element of each tuple in the resulting list.
The number of recursive calls depends on l, so I'd like to use the intermediate results of the pipe as l to reduce the number of calls, instead of saving the initial list and passing that one.
My current solution is as follows, and I'd like to optimise it with some sort of dynamic programming solution:
let foo l1 l2 =
    let l = l1 @ l2
    l |> List.map (fun (x, y) -> x, bar y l)

Related

List split in Elm

Write a function to split a list into two lists. The length of the first part is specified by the caller.
I am new to Elm, so I am not sure if my reasoning is correct. I think that I need to transform the input list into an array so that I can slice it by the provided input number. I am struggling a bit with the syntax as well. Here is my code so far:
listSplit: List a -> Int -> List(List a)
listSplit inputList nr =
    let myArray = Array.fromList inputList
    in Array.slice 0 nr myArray
So I am thinking to return a list containing 2 lists(first one of the specified length), but I am stuck in the syntax. How can I fix this?
Alternative implementation:
split : Int -> List a -> (List a, List a)
split i xs =
    (List.take i xs, List.drop i xs)
I'll venture a simple recursive definition, since a big part of learning functional programming is understanding recursion (which foldl is just an abstraction of):
split : Int -> List a -> (List a, List a)
split splitPoint inputList =
    splitHelper splitPoint inputList []
{- We use a typical trick here, where we define a helper function
that requires some additional arguments. -}
splitHelper : Int -> List a -> List a -> (List a, List a)
splitHelper splitPoint inputList leftSplitList =
    case inputList of
        [] ->
            -- This is a base case, we end here if we ran out of elements
            (List.reverse leftSplitList, [])
        head :: tail ->
            if splitPoint > 0 then
                -- This is the recursive case
                -- Note the typical trick here: we are shuffling elements
                -- from the input list and putting them onto the
                -- leftSplitList.
                -- This will reverse the list, so we need to reverse it back
                -- in the base cases
                splitHelper (splitPoint - 1) tail (head :: leftSplitList)
            else
                -- here we got to the split point,
                -- so the rest of the list is the output
                (List.reverse leftSplitList, inputList)
Use List.foldl
split : Int -> List a -> (List a, List a)
split i xs =
    let
        f : a -> (List a, List a) -> (List a, List a)
        f x (p, q) =
            if List.length p >= i then
                (p, q++[x])
            else
                (p++[x], q)
    in
        List.foldl f ([], []) xs
When list p reaches the desired length, append element x to the second list q.
Append element x to list p otherwise.
Normally in Elm, you use List for a sequence of values. Array is used specifically for fast indexing access.
When dealing with lists in functional programming, try to think in terms of map, filter, and fold. They should be all you need.
To return a pair of something (e.g. two lists), use tuple. Elm supports tuples of up to three elements.
Additionally, there is a function splitAt in the List.Extra package that does exactly the same thing, although it is better to roll your own for the purpose of learning.

Is there a way to cache a function result in elm?

I want to calculate nth Fibonacci number with O(1) complexity and O(n_max) preprocessing.
To do it, I need to store previously calculated value like in this C++ code:
#include<vector>
using namespace std;
vector<int> cache;
int fibonacci(int n)
{
    if(n<=0)
        return 0;
    if(cache.size()>n-1)
        return cache[n-1];
    int res;
    if(n<=2)
        res=1;
    else
        res=fibonacci(n-1)+fibonacci(n-2);
    cache.push_back(res);
    return res;
}
But it relies on side effects which are not allowed in Elm.
Fibonacci
A normal recursive definition of fibonacci in Elm would be:
fib1 n = if n <= 1 then n else fib1 (n-2) + fib1 (n-1)
Caching
If you want simple caching, the maxsnew/lazy library should work. It uses some side effects in the native JavaScript code to cache computation results. It went through a review to check that the native code doesn't expose side effects to the Elm user; for memoisation, it's easy to check that it preserves the semantics of the program.
You should be careful in how you use this library. When you create a Lazy value, the first time you force it it will take time, and from then on it's cached. But if you recreate the Lazy value multiple times, those won't share a cache. So for example, this DOESN'T work:
fib2 n = Lazy.lazy (\() ->
    if n <= 1
    then n
    else Lazy.force (fib2 (n-2)) + Lazy.force (fib2 (n-1)))
Working solution
What I usually see used for fibonacci is a lazy list. I'll just give the whole compiling piece of code:
import Lazy exposing (Lazy)
import Debug
-- slow
fib1 n = if n <= 1 then n else fib1 (n-2) + fib1 (n-1)
-- still just as slow
fib2 n = Lazy.lazy <| \() -> if n <= 1 then n else Lazy.force (fib2 (n-2)) + Lazy.force (fib2 (n-1))
type List a = Empty | Node a (Lazy (List a))
cons : a -> Lazy (List a) -> Lazy (List a)
cons first rest =
    Lazy.lazy <| \() -> Node first rest
unsafeTail : Lazy (List a) -> Lazy (List a)
unsafeTail ll = case Lazy.force ll of
    Empty -> Debug.crash "unsafeTail: empty lazy list"
    Node _ t -> t
map2 : (a -> b -> c) -> Lazy (List a) -> Lazy (List b) -> Lazy (List c)
map2 f ll lr = Lazy.map2 (\l r -> case (l,r) of
        (Node lh lt, Node rh rt) -> Node (f lh rh) (map2 f lt rt)
    ) ll lr
-- lazy list you can index into, better speed
fib3 = cons 0 (cons 1 (map2 (+) fib3 (unsafeTail fib3)))
So fib3 is a lazy list that has all the fibonacci numbers. Because it uses fib3 itself internally, it'll use the same (cached) lazy values and not need to compute much.
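For comparison, here is a minimal Haskell sketch of the same self-referential lazy list. It is not part of the original answer: the names fibs and fib are chosen for illustration, and Haskell is used because laziness is built in there, so no Lazy wrapper is needed. Note that indexing a list with !! is linear in n, so this gives sharing of already computed values rather than true O(1) lookup.

```haskell
-- Infinite, lazily evaluated list of Fibonacci numbers.
-- Because fibs refers to itself, every element is computed at most once
-- and the already evaluated prefix is shared between lookups.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- n-th Fibonacci number (0-based). Non-positive n yields 0, mirroring
-- the guard in the C++ version from the question.
fib :: Int -> Integer
fib n
  | n <= 0    = 0
  | otherwise = fibs !! n
```

In GHCi, take 10 fibs gives [0,1,1,2,3,5,8,13,21,34] and fib 10 gives 55.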

Optimizing partial computation in Haskell

I'm curious how to optimize this code :
fun n = (sum l, f $ f0 l, g $ g0 l)
    where l = map h [1..n]
Assuming that f, f0, g, g0, and h are all costly, but the creation and storage of l is extremely expensive.
As written, l is stored until the returned tuple is fully evaluated or garbage collected. Instead, sum l, f0 l, and g0 l should all be executed whenever any one of them is executed, but f and g should be delayed.
It appears this behavior could be fixed by writing :
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
    where
        l = map h [1..n]
        a = sum l
        b = inline f0 $ l
        c = inline g0 $ l
Or the very similar :
fun n = (a,b,c) `deepSeq` (a, f b, g c)
    where ...
We could perhaps specify a bunch of internal types to achieve the same effects as well, which looks painful. Are there any other options?
Also, I'm obviously hoping with my inlines that the compiler fuses sum, f0, and g0 into a single loop that constructs and consumes l term by term. I could make this explicit through manual inlining, but that'd suck. Are there ways to explicitly prevent the list l from ever being created and/or compel inlining? Pragmas that produce warnings or errors if inlining or fusion fail during compilation perhaps?
As an aside, I'm curious about why seq, inline, lazy, etc. are all defined to be let x = x in x in the Prelude. Is this simply to give them a definition for the compiler to override?
If you want to be sure, the only way is to do it yourself. For any given compiler version, you can try out several source-formulations and check the generated core/assembly/llvm byte-code/whatever whether it does what you want. But that could break with each new compiler version.
If you write
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
    where
        l = map h [1..n]
        a = sum l
        b = inline f0 $ l
        c = inline g0 $ l
or the deepseq version thereof, the compiler might be able to merge the computations of a, b and c to be performed in parallel (not in the concurrency sense) during a single traversal of l, but for the time being, I'm rather convinced that GHC doesn't, and I'd be surprised if JHC or UHC did. And for that the structure of computing b and c needs to be simple enough.
The only way to obtain the desired result portably across compilers and compiler versions is to do it yourself. For the next few years, at least.
Depending on f0 and g0, it might be as simple as doing a strict left fold with appropriate accumulator type and combining function, like the famous average
data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Double
average :: [Double] -> Double
average = ratio . foldl' count (P 0 0)
    where
        ratio (P n s) = s / fromIntegral n
        count (P n s) x = P (n+1) (s+x)
but if the structure of f0 and/or g0 doesn't fit, say one's a left fold and the other a right fold, it may be impossible to do the computation in one traversal. In such cases, the choice is between recreating l and storing l. Storing l is easy to achieve with explicit sharing (where l = map h [1..n]), but recreating it may be difficult to achieve if the compiler does some common subexpression elimination (unfortunately, GHC does have a tendency to share lists of that form, even though it does little CSE). For GHC, the flags -fno-cse and -fno-full-laziness can help avoid unwanted sharing.
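To make the single-traversal idea concrete for the three-result case in the question, here is a minimal sketch. It assumes, purely for illustration, that f0 is maximum and g0 is length (the question does not say what they are), and the names Acc and summarize are hypothetical. It only works because all three results are left folds over the same list.

```haskell
import Data.List (foldl')

-- Strict accumulator: running sum, running maximum, element count.
-- Strict fields keep thunks from building up during the fold.
data Acc = Acc !Int !Int !Int

-- Compute sum, maximum and length of map h [1..n] in one pass.
-- The list is consumed element by element by foldl', so it never has
-- to be retained for a second traversal.
-- For n <= 0 the maximum component is minBound (empty input).
summarize :: (Int -> Int) -> Int -> (Int, Int, Int)
summarize h n = finish (foldl' step (Acc 0 minBound 0) (map h [1 .. n]))
  where
    step (Acc s m c) x = Acc (s + x) (max m x) (c + 1)
    finish (Acc s m c) = (s, m, c)
```

For example, summarize (*2) 5 evaluates to (30,10,5). The costly f and g from the question would then be applied lazily to the second and third components.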

Given a substitution S and list Xs, how to apply S to Xs

Suppose I have a substitution S and list Xs, where each variable occurring in Xs also occurs in S. How would I find the list S(Xs), i.e., the list obtained by applying the substitution S to the list Xs.
More concretely, I have a set of predicates and DCG rules that look something like
pat(P) --> seg(_), P, seg(_).
seg(X,Y,Z) :- append(X,Z,Y).
If I attempt to match a pattern P with variables against a list, I receive a substitution S:
?- pat([a,X,b,Y],[d,a,c,b,e,d],[]).
X = c,
Y = e
I want to apply the substitution S = {X = c, Y = e} to a list Xs with variables X and Y, and receive the list with substitutions made, but I'm not sure what the best way to approach the problem is.
If I were approaching this problem in Haskell, I would build a finite map from variables to values, then perform the substitution. The equivalent approach would be to produce a list in the DCG rule of pairs of variables and values, then use the map to find the desired list. This is not a suitable approach, however.
Since the substitution is not reified (is not a Prolog object), you can bind the list to a variable and let unification do its work:
?- Xs = [a,X,b,Y], pat(Xs,[d,a,c,b,e,d],[]).
Xs = [a, c, b, e],
X = c,
Y = e .
Edit: If you want to keep the original list around after the substitution, use copy_term:
?- Xs = [a,X,b,Y], copy_term(Xs,Ys), pat(Xs,[d,a,c,b,e,d],[]).
Xs = [a, c, b, e],
X = c,
Y = e,
Ys = [a, _G118, b, _G124] .

optimization of a haskell code

I wrote the following Haskell code, which takes a triplet (x,y,z) and a list of triplets [(Int,Int,Int)] and checks whether there is a triplet (a,b,c) in the list such that x == a and y == b. If there is, I just need to update c = c + z; if there is no such triplet in the list, I just add the triplet to the list.
-- insertEdge :: (Int,Int,Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdge (x,y,z) cs =
    if (length [(a,b,c) | (a,b,c) <- cs, a /= x || b /= y]) == (length cs)
    then ((x,y,z):cs)
    else [if (a == x && b == y) then (a,b,c+z) else (a,b,c) | (a,b,c) <- cs]
After profiling my code, it appears that this function takes 65% of the execution time.
How can I re-write my code to be more efficient?
Other answers are correct, so I want to offer some unasked-for advice instead: how about using Data.Map (Int,Int) Int instead of list?
Then your function becomes insertWith (+) (a,b) c mymap
The first thing that jumps out at me is the conditional: length examines the entire list, so in the worst-case scenario (updating the last element) your function traverses the list three times: Once for the length of the filtered list, once for the length of cs, and once to find the element to update.
However, even getting rid of the extra traversals, the best you can do with the function as written will usually require a traversal of most of the list. From the name of the function and how much time was being spent in it, I'm guessing you're calling this repeatedly to build up a data structure? If so, you should strongly consider using a more efficient representation.
For instance, a quick and easy improvement would be to use Data.Map, the first two elements of the triplet in a 2-tuple as the key, and the third element as the value. That way you can avoid making so many linear-time lookups/redundant traversals.
As a rule of thumb, lists in Haskell are only an appropriate data structure when all you do is either walk sequentially down the list a few times (ideally, just once) or add/remove from the head of the list (i.e., using it like a stack). If you're searching, filtering, updating elements in the middle, or--worst of all--indexing by position, using lists will only end in tears.
Here's a quick example, if that helps:
import qualified Data.Map as M
incEdge :: M.Map (Int, Int) Int -> ((Int, Int), Int) -> M.Map (Int, Int) Int
incEdge cs (k,v) = M.alter f k cs
    where f (Just n) = Just $ n + v
          f Nothing = Just v
The alter function is just insert/update/delete all rolled into one. This inserts the key into the map if it's not there, and sums the values if the key does exist. To build up a structure incrementally, you can do something like foldl incEdge M.empty edgeList. Testing this out, for a few thousand random edges your version with a list takes several seconds, whereas the Data.Map version is pretty much immediate.
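As a small usage sketch of that fold, the block below repeats incEdge so that it is self-contained and runs it over a made-up edge list. The names sampleEdges and edgeWeights and the data itself are hypothetical, not from the original answer.

```haskell
import qualified Data.Map as M

-- Same definition as incEdge in the answer: insert the key if it is
-- missing, otherwise add v to the stored value.
incEdge :: M.Map (Int, Int) Int -> ((Int, Int), Int) -> M.Map (Int, Int) Int
incEdge cs (k, v) = M.alter f k cs
  where
    f (Just n) = Just (n + v)
    f Nothing  = Just v

-- Hypothetical sample data: the edge (1,2) occurs twice.
sampleEdges :: [((Int, Int), Int)]
sampleEdges = [((1, 2), 5), ((3, 4), 1), ((1, 2), 7)]

-- Build the map incrementally, one edge at a time.
edgeWeights :: M.Map (Int, Int) Int
edgeWeights = foldl incEdge M.empty sampleEdges
```

Here M.toList edgeWeights is [((1,2),12),((3,4),1)]: the two weights for (1,2) were summed and (3,4) was inserted as new.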
It's always a good idea to benchmark (and Criterion makes it so easy). Here are the results for the original solution (insertEdgeO), Geoff's foldr (insertEdgeF), and Data.Map (insertEdgeM):
benchmarking insertEdgeO...
mean: 380.5062 ms, lb 379.5357 ms, ub 381.1074 ms, ci 0.950
benchmarking insertEdgeF...
mean: 74.54564 ms, lb 74.40043 ms, ub 74.71190 ms, ci 0.950
benchmarking insertEdgeM...
mean: 18.12264 ms, lb 18.03029 ms, ub 18.21342 ms, ci 0.950
Here's the code (I compiled with -O2):
module Main where
import Criterion.Main
import Data.List (foldl')
import qualified Data.Map as M
insertEdgeO :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeO (x, y, z) cs =
    if length [(a, b, c) | (a, b, c) <- cs, a /= x || b /= y] == length cs
    then (x, y, z) : cs
    else [if (a == x && b == y) then (a, b, c + z) else (a, b, c) | (a, b, c) <- cs]
insertEdgeF :: (Int, Int, Int) -> [(Int, Int, Int)] -> [(Int, Int, Int)]
insertEdgeF (x,y,z) cs =
    case foldr f (False, []) cs of
        (False, cs') -> (x, y, z) : cs'
        (True, cs') -> cs'
    where
        f (a, b, c) (e, cs')
            | (a, b) == (x, y) = (True, (a, b, c + z) : cs')
            | otherwise = (e, (a, b, c) : cs')
insertEdgeM :: (Int, Int, Int) -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int
insertEdgeM (a, b, c) = M.insertWith (+) (a, b) c
testSet n = [(a, b, c) | a <- [1..n], b <- [1..n], c <- [1..n]]
testO = foldl' (flip insertEdgeO) [] . testSet
testF = foldl' (flip insertEdgeF) [] . testSet
testM = triplify . M.toDescList . foldl' (flip insertEdgeM) M.empty . testSet
    where
        triplify = map (\((a, b), c) -> (a, b, c))
main = let n = 25 in defaultMain
    [ bench "insertEdgeO" $ nf testO n
    , bench "insertEdgeF" $ nf testF n
    , bench "insertEdgeM" $ nf testM n
    ]
You can improve insertEdgeF a bit by using foldl' (55.88634 ms), but Data.Map still wins.
The main reason your function is slow is that it traverses the list at least twice, maybe three times. The function can be rewritten to traverse the list only once using a fold. This will transform the list into a tuple (Bool,[(Int,Int,Int)]) where the Bool indicates whether there was a matching element in the list and the list is the transformed list:
insertEdge (x,y,z) cs = case foldr f (False,[]) cs of
        (False,cs') -> (x,y,z):cs'
        (True,cs') -> cs'
    where f (a,b,c) (e,cs') = if (a,b) == (x,y) then (True,(a,b,c+z):cs') else (e,(a,b,c):cs')
If you haven't seen foldr before, it has type
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr embodies a pattern of recursive list processing of defining a base case and combining the current list element with the result from the rest of the list. Writing foldr f b xs is the same as writing a function g with definition
g [] = b
g (x:xs) = f x (g xs)
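As a small illustration of that equivalence, the sketch below picks f = (+) and b = 0. That choice and the names sumRec and sumFold are arbitrary and not from the original answer.

```haskell
-- Explicit recursion, following the g [] / g (x:xs) pattern
-- with f = (+) and b = 0.
sumRec :: [Int] -> Int
sumRec []       = 0
sumRec (x : xs) = x + sumRec xs

-- The same function expressed with foldr.
sumFold :: [Int] -> Int
sumFold = foldr (+) 0
```

Both sumRec [1,2,3,4] and sumFold [1,2,3,4] evaluate to 10.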
Sticking with your data structure, you might write:
type Edge = (Int,Int,Int)
insertEdge :: Edge -> [Edge] -> [Edge]
insertEdge t@(x,y,z) es =
    case break (abx t) es of
        (_, []) -> t : es
        (l, ((_,_,zold):r)) -> l ++ (x,y,z+zold) : r
    where abx (a1,b1,_) (a2,b2,_) = a1 == a2 && b1 == b2
No matter what language you're using, searching lists is always a red flag. When searching you want sublinear complexity (think: hashes, binary search trees, and so on). In Haskell, an implementation using Data.Map is
import Data.Map
type Edge = (Int,Int,Int)
type EdgeMap = Map (Int,Int) Int
insertEdge :: Edge -> EdgeMap -> EdgeMap
insertEdge (x,y,z) es = alter accumz (x,y) es
    where accumz Nothing = Just z
          accumz (Just zold) = Just (z + zold)
You may not be familiar with alter:
alter :: Ord k => (Maybe a -> Maybe a) -> k -> Map k a -> Map k a
O(log n). The expression (alter f k map) alters the value x at k, or absence thereof. alter can be used to insert, delete, or update a value in a Map. In short: lookup k (alter f k m) = f (lookup k m).
let f _ = Nothing
alter f 7 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "a")]
alter f 5 (fromList [(5,"a"), (3,"b")]) == singleton 3 "b"
let f _ = Just "c"
alter f 7 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "a"), (7, "c")]
alter f 5 (fromList [(5,"a"), (3,"b")]) == fromList [(3, "b"), (5, "c")]
But as ADEpt shows in another answer, this is a bit of overengineering.
In
insertEdgeM :: (Int, Int, Int) -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int
insertEdgeM (a, b, c) = M.insertWith (+) (a, b) c
you want to use the strict version of insertWith, namely insertWith'.
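A minimal sketch of the strict variant follows, under one assumption worth stating: as far as I know, insertWith' has been dropped from recent releases of the containers package, and the strict behaviour now comes from the insertWith exported by Data.Map.Strict. The name insertEdgeStrict is made up for this example.

```haskell
import qualified Data.Map.Strict as MS

-- Data.Map.Strict.insertWith evaluates the combined value to weak head
-- normal form before storing it, so repeated (+) updates do not pile up
-- as a chain of unevaluated thunks.
insertEdgeStrict :: (Int, Int, Int) -> MS.Map (Int, Int) Int -> MS.Map (Int, Int) Int
insertEdgeStrict (a, b, c) = MS.insertWith (+) (a, b) c
```

For instance, MS.toList (foldr insertEdgeStrict MS.empty [(1,2,3),(1,2,4)]) is [((1,2),7)].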
Very small optimisation: use an as-pattern, which avoids multiple reconstructions of the same tuple. Like this:
insertEdge xyz@(x,y,z) cs =
    if (length [abc | abc@(a,b,c) <- cs, a /= x || b /= y]) == (length cs)
    then (xyz:cs)
    else [if (a == x && b == y) then (a,b,c+z) else abc' | abc'@(a,b,c) <- cs]
You should apply the other optimization hints first, but this may save a very small amount of time, since the tuple doesn't have to be reconstructed again and again, at least in the last as-pattern. (The first two patterns are not important, since the tuple never gets evaluated in the first case and the as-pattern is only applied once in the second case.)