Clojure - optimize a threaded map reduce

I have the following code:
(defn series-sum
  "Compute a series: (+ 1 1/4 1/7 1/10 1/13 1/16 ...)"
  [n]
  (->> (iterate (partial + 3) 1)
       (map #(/ 1 %))
       (take n)
       (reduce +)
       float
       (format "%.2f")
       (str)))
It works just fine, except that its running time explodes when numbers get big. On my computer (series-sum 2500) takes maybe a second or two, but with (series-sum 25000) I have to kill my REPL.
I tried moving (take n) as early as possible, but that is not enough. I feel that I don't understand something about Clojure, since I don't see why it should be slower (I would expect (series-sum 25000) to take roughly 10 times as long as (series-sum 2500)).
There is an obvious loop/recur solution to optimize it, but I like being able to print the intermediate steps and having one step (the (take n)) that looks like the docstring.
How can I improve the performance of this code while keeping it debuggable?
Better yet, can I measure the time of each step to see which one is taking the time?

Yes, it is relevant to zerkms's link. You map to rationals; you should probably map to floats instead:
(defn series-sum
  "Compute a series: (+ 1 1/4 1/7 1/10 1/13 1/16 ...)"
  [n]
  (->> (iterate (partial + 3) 1)
       (take n)
       (map #(/ 1.0 %))
       (reduce +)
       (format "%.2f")))
now it works much faster:
user> (time (series-sum 2500000))
"Elapsed time: 686.233199 msecs"
"5,95"
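The slowdown has the same shape in any language with exact rationals: the numerators and denominators of the partial sums keep growing, so each addition gets more expensive than the last, while float additions stay constant-cost. A rough Python sketch of the two approaches, as an illustration only (the function names are mine, not from the original):

```python
from fractions import Fraction

def series_sum_exact(n):
    # Exact rational arithmetic: the partial sums' numerators and
    # denominators grow without bound, so additions get slower as n grows.
    return sum(Fraction(1, 3 * i + 1) for i in range(n))

def series_sum_float(n):
    # Fixed-width float arithmetic: every addition costs the same.
    return sum(1.0 / (3 * i + 1) for i in range(n))

# Both agree to well within float rounding for moderate n.
assert abs(float(series_sum_exact(1000)) - series_sum_float(1000)) < 1e-9
```

Timing both versions for growing n shows the exact variant's per-call cost climbing far faster than linearly, exactly the behavior the question describes.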

For this type of mathematical operation, computing in a loop is faster than using lazy sequences. This is an order of magnitude faster than the other answer for me:
(defn series-sum
  [n]
  (loop [i 0
         acc 0.0]
    (if (< i n)
      (recur (inc i)
             (+ acc (/ (float 1) (inc (* 3 i)))))
      (format "%.2f" acc))))
Note: you don't need the str because format returns a string.
Edit: of course this is not the main issue with the code in the original question. The bulk of the improvement comes from eliminating rationals as shown by the other answer. This is just a further optimization.
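The lazy-pipeline versus tight-loop trade-off translates to other languages too. A hypothetical Python rendering of the two styles (floats in both, so only the per-element overhead differs, and both names are mine):

```python
from itertools import count, islice

def series_sum_loop(n):
    # Tight accumulating loop: one division and one addition per term.
    acc = 0.0
    for i in range(n):
        acc += 1.0 / (1 + 3 * i)
    return f"{acc:.2f}"

def series_sum_pipeline(n):
    # Chained-iterator pipeline, mirroring the ->> version above:
    # count(1, 3) yields 1, 4, 7, ...; islice plays the role of (take n).
    return f"{sum(1.0 / k for k in islice(count(1, 3), n)):.2f}"

# Same result either way; only constant-factor overhead differs.
assert series_sum_loop(2500) == series_sum_pipeline(2500)
```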

Asymptotic growth (Big O notation)

What I am trying to do is to sort the following functions:
n, n^3, nlogn, n/logn, n/log^2n, sqrt(n), sqrt(n^3)
in increasing order of asymptotic growth.
What I did is,
n/logn, n/log^2n, sqrt(n), n, sqrt(n^3), nlogn, n^3.
1) Is my answer correct?
2) I know the time complexity of basic functions such as n, n log n, and n^2, but I am really confused about functions like n/log n and sqrt(n^3).
How should I figure out which one is faster or slower? Is there any way to do this with mathematical calculations?
3) Are the big O time complexity and asymptotic growth different thing?
I would really appreciate it if anyone could clear up my confusion. Thanks!
An important result we need here is:
log n grows more slowly than n^a for any strictly positive number a > 0.
For a proof of the above, see here.
If we re-write sqrt(n^3) as n^1.5, we can see that n log n grows more slowly than it (divide both by n and use the result above).
Similarly, n / log n grows more quickly than any n^b with b < 1; again this follows directly from the result above. Note, however, that it is slower than n by a factor of log n; the same goes for n / log^2 n.
Combining the above, we find the increasing order to be:
sqrt(n)
n / log^2 n
n / log n
n
n log n
sqrt(n^3)
n^3
So I'm afraid to say you got only a few of the orderings right.
EDIT: to answer your other questions:
If you take the limit of f(n) / g(n) as n -> infinity, then it can be said that f(n) is asymptotically greater than g(n) if this limit is infinite, and lesser if the limit is zero. This comes directly from the definition of big-O.
big-O is a method of classifying asymptotic growth, typically as the parameter approaches infinity.
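The limit test described above can also be checked numerically: for each adjacent pair f, g in the claimed increasing order, the ratio f(n)/g(n) should shrink as n grows. A small Python sketch of that check:

```python
import math

# The seven functions, in the increasing order derived above.
funcs = [
    lambda n: math.sqrt(n),          # sqrt(n)
    lambda n: n / math.log(n) ** 2,  # n / log^2 n
    lambda n: n / math.log(n),       # n / log n
    lambda n: float(n),              # n
    lambda n: n * math.log(n),       # n log n
    lambda n: n ** 1.5,              # sqrt(n^3)
    lambda n: float(n) ** 3,         # n^3
]

# For each adjacent pair (f, g), f(n)/g(n) decreases as n grows,
# which is the limit criterion from the answer above.
for f, g in zip(funcs, funcs[1:]):
    assert f(10**12) / g(10**12) < f(10**6) / g(10**6)
```

This is evidence rather than proof, of course, but it is a quick way to catch an ordering mistake before doing the limits by hand.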

Optimization in Typed Racket... Is this going too far?

A question about typed/racket. I'm currently working my way through the Project Euler problems to learn Racket better. Some of my solutions are really slow, especially when dealing with primes and factors. So for some problems I've made a typed/racket version, and I find no improvement in speed, quite the opposite. (I try to minimize the impact of overhead by using really big numbers; calculations take around 10 seconds.)
I know from the Racket docs that the best optimizations happen when using Floats/Flonums. So... yeah, I've tried to make float versions of problems dealing with integers. As in this problem, with a Racket version using integers and a typed/racket one artificially turning the integers into floats. I have to use tricks: checking equality between two numbers actually means checking that they are "close enough", as in this function, which checks whether x is divisible by y:
(: divide? (-> Flonum Flonum Boolean))
(define (divide? x y)
  (let ([r (/ x y)])
    (< (- r (floor r)) 1e-6)))
It works (well... the solution is correct) and I have a 30%-40% speed improvement.
How acceptable is this? Do people actually do that in real life? If not, what is the best way to optimize typed/racket solutions when using integers? Or should typed/racket be abandoned altogether when dealing with integers and reserved for problems with float calculations?
In most cases the solution is to use better algorithms rather than converting to Typed Racket.
Since most problems at Project Euler concern integers, here are a few tips and tricks:
The division operator / needs to compute the greatest common divisor of the denominator and the numerator in order to cancel out common factors. This makes / a bad choice if you only want to know whether one number divides another. Use (= (remainder n m) 0) to check whether m divides n. Also: use quotient rather than / when you know the division has a zero remainder.
Use memoization to avoid recomputation. I.e. use a vector to store already computed results. Example: https://github.com/racket/math/blob/master/math-lib/math/private/number-theory/eulerian-number.rkt
First implement a naive algorithm. Then consider how to reduce the number of cases. A rule of thumb: brute force works best if you can reduce the number of cases to 1-10 million.
To reduce the number of cases, look for parametrizations of the search space. Example: if you need to find a Pythagorean triple, loop over numbers m and n and compute a = m^2 - n^2, b = 2mn, and c = m^2 + n^2. This is faster than looping over a, b, and c and skipping those triples where a^2 + b^2 = c^2 does not hold.
Look for tips and tricks in the source of math/number-theory.
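Two of the tips above can be sketched concretely; a Python illustration (not Racket, and the helper names are mine): the remainder test for divisibility, and Euclid's parametrization generating triples directly instead of filtering all (a, b, c) candidates.

```python
def divides(m, n):
    # Remainder test: no rational arithmetic, no gcd computation.
    return n % m == 0

def pythagorean_triples(c_max):
    # Euclid's parametrization: every pair m > n > 0 yields a triple
    # (m^2 - n^2, 2mn, m^2 + n^2), so we loop over (m, n) pairs
    # instead of over all (a, b, c) candidates.
    triples = []
    m = 2
    while m * m + 1 <= c_max:  # smallest c for this m uses n = 1
        for n in range(1, m):
            a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
            if c <= c_max:
                triples.append((a, b, c))
        m += 1
    return triples

assert divides(3, 12) and not divides(5, 12)
assert (3, 4, 5) in pythagorean_triples(5)
assert all(a * a + b * b == c * c for a, b, c in pythagorean_triples(100))
```

Looping over (m, n) visits O(sqrt(c_max) * sqrt(c_max)) pairs instead of O(c_max^3) candidate triples, which is the point of the parametrization tip.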
Not aspiring to be a real answer, since I can't provide any general tips soegaard hasn't posted already, but since I recently did "Amicable numbers, Problem 21", I thought I might as well leave you my solution here (sadly, not many Lisp solutions get posted on Euler...).
(define (divSum n)
  (define (aux i sum)
    (if (> (sqr i) n)
        (if (= (sqr (sub1 i)) n) ; final check if n is a perfect square
            (- sum (sub1 i))
            sum)
        (aux (add1 i) (if (= (modulo n i) 0)
                          (+ sum i (/ n i))
                          sum))))
  (aux 2 1))

(define (amicableSum n)
  (define (aux a sum)
    (if (>= a n)
        sum
        (let ([b (divSum a)])
          (aux (add1 a)
               (if (and (> b a) (= (divSum b) a))
                   (+ sum a b)
                   sum)))))
  (aux 2 0))
> (time (amicableSum 10000))
cpu time: 47 real time: 46 gc time: 0
When dealing with divisors, one can often stop at the square root of n, as divSum does here. And when you find an amicable pair you may as well add both members to the sum at once, which saves an unnecessary recomputation of (divSum b) in my code.
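The square-root trick reads the same in any language; a Python sketch of the divSum idea (proper-divisor sum, pairing each divisor i below the root with n // i above it — an illustration, not the original Racket):

```python
def div_sum(n):
    # Sum of the proper divisors of n, scanning only up to sqrt(n):
    # each divisor i below the root pairs with the divisor n // i above it.
    total = 1  # 1 divides everything; n itself is excluded
    i = 2
    while i * i < n:
        if n % i == 0:
            total += i + n // i
        i += 1
    if i * i == n:
        total += i  # perfect square: count the root only once
    return total

# 220 and 284 are the classic amicable pair from Problem 21.
assert div_sum(220) == 284 and div_sum(284) == 220
```

The perfect-square branch plays the same role as the (sqr (sub1 i)) check in divSum: without it, sqrt(n) would be counted twice.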

O(n log n) with input of n^2

Could somebody explain to me why, when you have an algorithm A with a time complexity of O(n log n) and you give it input of size n^2, it results in O(n^2 log n)?
I understand that it becomes O(n^2 log n^2) and then O(n^2 * 2 * log n), but why does the 2 disappear?
It disappears because time complexity does not care about things that have no effect when n increases (such as a constant multiplier). In fact, it often doesn't even care about things that have less effect.
That's why, if your program runtime can be calculated as n^3 - n + 7, the complexity is the much simpler O(n^3). You can think of what happens as n approaches infinity. In that case, all the other terms become totally irrelevant compared to the first. That's when you're adding terms.
It's slightly different when multiplying since even lesser terms will still have a substantial effect (because they're multiplied by the thing having the most effect, rather than being added to).
For your specific case, O(n^2 log n^2) becomes O(n^2 * 2 * log n). Then you can remove all terms that have no effect on the outcome as n increases. That's the 2.
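The identity behind this is log(n^2) = 2 log n, so the ratio between the two expressions is exactly the constant 2 for every n; a quick numerical check in Python:

```python
import math

# log(n^2) = 2 * log(n), so n^2 * log(n^2) = 2 * (n^2 * log n):
# the factor 2 is a constant and drops out of the big-O class.
for n in (10.0, 1e3, 1e6):
    ratio = (n ** 2 * math.log(n ** 2)) / (n ** 2 * math.log(n))
    assert math.isclose(ratio, 2.0)
```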

Can't explain results when testing clojure data structures

I'm fairly new to Clojure and functional programming generally. Being curious about the speed of some basic operations on data structures (Clojure's defaults and some I may implement), I wrote something to automate testing operations such as adding to the structures.
My method, which runs a test for 3 data structures, consistently returns very different average run times depending on how it's called, even when its inputs remain the same.
Code with tests and results at the bottom
(import '(java.util Date))

(defrecord test-suite ;;holds the test results for 3 datastructures
  [t-list
   t-vector
   t-set])

(defrecord test-series ;;holds list of test results (list of test-suite) and the list of functions used in the respective tests
  [t-suites
   t-functions])

;;;Runs the test, returns time it took
(defn time-test [func init-ds delta-list]
  (def startTime (. (new Date) (getTime)))
  (reduce func init-ds delta-list)
  (def endTime (. (new Date) (getTime)))
  (- endTime startTime))
;;;Runs the test x number of times returning the average run time
(defn test-struct
  ([iter func init-ds delta-list] (test-struct iter func init-ds delta-list ()))
  ([iter       ;;number of times to run tests
    func       ;;function being tested (add remove etc)
    init-ds    ;;initial datastructure being tested
    delta-list
    addRes]    ;;test results
   (println (first addRes)) ;;print previous run time for debugging
   ;;test if done recursing
   (if (> iter 0)
     (test-struct
      (- iter 1)
      func
      init-ds
      delta-list
      (conj addRes (time-test func init-ds delta-list)))
     (/ (reduce + addRes) (count addRes)))))
;;;Tests a function on a passed in data structure and a randomly generated list of numbers
(defn run-test
  [iter     ;;the number of times each test will be run
   func     ;;the function being tested
   init-ds] ;;the initial datastructure being tested
  (def delta-list (shuffle (range 1000000))) ;;the random values being added/removed/whatever from the ds
  (println init-ds)
  (println iter)
  (test-suite.
   ;;the list test
   (test-struct iter func (nth init-ds 0) delta-list)
   ;;the vector test
   (test-struct iter func (nth init-ds 1) delta-list)
   ;;the set test
   (test-struct iter func (nth init-ds 2) delta-list)))
;;;Calls run-test a number of times storing the results as a list in a test-series data structure along with the list of functions tested.
(defn run-test-set
  ([iter func-list ds-list] (run-test-set iter (test-series. nil func-list) func-list ds-list))
  ([iter      ;;the number of times each test is run before being averaged
    series    ;;data structure that aggregates the list of test results, and ultimately is returned
    func-list ;;the list of functions to be tested
    ds-list]  ;;the list of initial datastructures to be tested
   (if (> (count func-list) 0)
     (run-test-set ;;recursively run this, aggregating test-suites as we go
      iter
      (test-series. ;;create a new test series with all the functions and suites run so far
       (conj (:t-suites series) ;;run a test suite and append it to those run so far
             (run-test iter (first func-list) (first ds-list)))
       (:t-functions series))
      (rest func-list)
      (rest ds-list))
     series))) ;;finished with last run, return results
Tests
All times in ms
;;;;;;;;;;;;;;;;;;EVALUATING 'run-test' directly
;;;get average speeds for adding 100000 random elements to list vector and set
;;;run the test 20 times and average the results
(run-test 20 conj '(() [] #{}))
;;;;;RESULT
#test.test-suite{:t-list 254/5, :t-vector 2249/20, :t-set 28641/20}
or about 51 112 and 1432 for list vector and set respectively
;;;;;;;;;;;;;;;;;;EVALUATING using 'run-test-set' which calls run-test
(run-test-set
 20             ;;;times the test is run
 '(conj)        ;;;just run conj (adding to the ds for now)
 '((() [] #{}))) ;;;add random values to blank structures
;;;;RESULT
#test.test-series{
:t-suites (
#test.test-suite{
:t-list 1297/10,
:t-vector 1297/10,
:t-set 1289/10}) ;;;;;;;;;;;;Result of adding values
:t-functions (conj)}
or about 130 for list vector and set, this is roughly the same rate as vector above
Does anyone know why it's returning such different results depending on how it's run?
Is this Clojure-related, or possibly an optimization Java is doing?
The right way to test performance of Clojure code is criterium. Among other things, criterium reports statistical information about the distribution of execution times, and ensures that the JVM HotSpot compiler is warmed up before taking the measurements. The HotSpot compiler is likely the reason you are seeing these performance discrepancies.
Don't use def inside defn, def is designed for top level global definitions. Use let for bindings that are only used inside one function.
Defining records that are only used once and exist only to hold a few values is not idiomatic in Clojure; the overhead of defining the classes is greater than any benefit they give you (in the difficulty of understanding the code, if not in its performance). Save records for when you need to specialize a protocol or improve performance in a tight loop.
When your priority is human readability of numbers, rather than accuracy, you can use double to coerce to a more readable format for printing.
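The same discipline (many samples, summary statistics, no trust in a single cold-start measurement) exists in other ecosystems; a hypothetical Python sketch using the standard timeit module, for comparison only (criterium additionally handles JVM HotSpot warm-up, which timeit does not need):

```python
import timeit

# Take several independent samples of many calls each, then summarize,
# instead of trusting one wall-clock measurement like the original
# time-test function does.
samples = timeit.repeat(
    stmt="sorted(data)",
    setup="import random; data = [random.random() for _ in range(1000)]",
    repeat=5,    # five samples...
    number=100,  # ...of 100 calls each
)
best = min(samples)  # least-noisy estimate of the cost of 100 calls
assert len(samples) == 5
assert best > 0.0
```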
Here is how one would test the properties you are interested in idiomatically (transcript from a repl session, though this could be run from a -main function as well):
user> (require '[criterium.core :as crit])
nil
user> (def input (shuffle (range 1000000)))
#'user/input
user> (crit/bench (reduce conj [] input))
WARNING: Final GC required 3.501501258094413 % of runtime
WARNING: Final GC required 2.381979156956431 % of runtime
Evaluation count : 1680 in 60 samples of 28 calls.
Execution time mean : 36.435413 ms
Execution time std-deviation : 1.607847 ms
Execution time lower quantile : 35.764207 ms ( 2.5%)
Execution time upper quantile : 37.527755 ms (97.5%)
Overhead used : 2.222121 ns
Found 4 outliers in 60 samples (6.6667 %)
low-severe 1 (1.6667 %)
low-mild 3 (5.0000 %)
Variance from outliers : 30.3257 % Variance is moderately inflated by outliers
nil
user> (crit/bench (reduce conj () input))
WARNING: Final GC required 9.024275674955403 % of runtime
Evaluation count : 3540 in 60 samples of 59 calls.
Execution time mean : 19.623083 ms
Execution time std-deviation : 3.842658 ms
Execution time lower quantile : 17.891881 ms ( 2.5%)
Execution time upper quantile : 26.569738 ms (97.5%)
Overhead used : 2.222121 ns
Found 3 outliers in 60 samples (5.0000 %)
low-severe 3 (5.0000 %)
Variance from outliers : 91.0960 % Variance is severely inflated by outliers
nil
user> (crit/bench (reduce conj #{} input))
WARNING: Final GC required 12.0207279064623 % of runtime
Evaluation count : 120 in 60 samples of 2 calls.
Execution time mean : 965.389668 ms
Execution time std-deviation : 682.645918 ms
Execution time lower quantile : 674.958427 ms ( 2.5%)
Execution time upper quantile : 2.287535 sec (97.5%)
Overhead used : 2.222121 ns
Found 9 outliers in 60 samples (15.0000 %)
low-severe 9 (15.0000 %)
Variance from outliers : 98.2830 % Variance is severely inflated by outliers
nil
user>

Speed up string matching in emacs

I want to implement the vim commandT plugin in emacs. This code is mostly a translation of the matcher.
I've got some elisp here that's still too slow to use on my netbook -
how can I speed it up?
(eval-when-compile (require 'cl))

(defun commandT-fuzzy-match (choices search-string)
  (sort (loop for choice in choices
              for score = (commandT-fuzzy-score choice search-string
                                                (commandT-max-score-per-char choice search-string))
              if (> score 0.0) collect (list score choice))
        #'(lambda (a b) (> (first a) (first b)))))

(defun* commandT-fuzzy-score (choice search-string
                                     &optional
                                     (score-per-char (commandT-max-score-per-char choice search-string))
                                     (choice-pointer 0)
                                     (last-found nil))
  (condition-case error
      (loop for search-char across search-string
            sum (loop until (char-equal search-char (elt choice choice-pointer))
                      do (incf choice-pointer)
                      finally return
                      (let ((factor (cond (last-found (* 0.75 (/ 1.0 (- choice-pointer last-found))))
                                          (t 1.0))))
                        (setq last-found choice-pointer)
                        (max (commandT-fuzzy-score choice search-string score-per-char
                                                   (1+ choice-pointer) last-found)
                             (* factor score-per-char)))))
    (args-out-of-range 0.0))) ; end of string hit without match found

(defun commandT-max-score-per-char (choice search-string)
  (/ (+ (/ 1.0 (length choice)) (/ 1.0 (length search-string))) 2))
Be sure to compile that part, as that already helps a lot.
And a benchmark:
(let ((choices (split-string (shell-command-to-string "curl http://sprunge.us/FcEL") "\n")))
  (benchmark-run-compiled 10
    (commandT-fuzzy-match choices "az")))
Here are some micro optimizations you can try:
Use car-less-than-car instead of your lambda expression. This has no visible effect since the time is not spent in sort but in commandT-fuzzy-score.
Use defun instead of defun*: those optional arguments with a non-nil default have a non-negligible hidden cost. This reduces the GC cost by almost half (and you started with more than 10% of the time spent in the GC).
(* 0.75 (/ 1.0 XXX)) is equal to (/ 0.75 XXX).
Use eq instead of char-equal (though that changes the behavior to always be case-sensitive). This makes a fairly large difference.
Use aref instead of elt.
I don't understand why you pass last-found in your recursive call, so I obviously don't fully understand what your algorithm is doing. But assuming that was an error, you can turn it into a local variable instead of passing it as an argument. This saves you time.
I don't understand why you make a recursive call for every search-char that you find, instead of only for the first one. Another way to look at this is that your max compares a "single-char score" with a "whole search-string score" which seems rather odd. If you change your code to do the max outside of the two loops with the recursive call on (1+ first-found), that speeds it up by a factor of 4 in my test case.
The multiplication by score-per-char can be moved outside of the loop (this doesn't seem to be true for your original algorithm).
Also, Elisp as implemented in Emacs is pretty slow, so you're often better off using "big primitives" so as to spend less time interpreting Elisp (byte-)code and more time running C code. Here is, for example, an alternative implementation (not of your original algorithm, but of the one I got after moving the max outside of the loops), using regexp pattern matching to do the inner loop:
(defun commandT-fuzzy-match-re (choices search-string)
  (let ((search-re (regexp-quote (substring search-string 0 1)))
        (i 1))
    (while (< i (length search-string))
      (setq search-re (concat search-re
                              (let ((c (aref search-string i)))
                                (format "[^%c]*\\(%s\\)"
                                        c (regexp-quote (string c))))))
      (setq i (1+ i)))
    (sort
     (delq nil
           (mapcar (lambda (choice)
                     (let ((start 0)
                           (best 0.0))
                       (while (string-match search-re choice start)
                         (let ((last-found (match-beginning 0)))
                           (setq start (1+ last-found))
                           (let ((score 1.0)
                                 (i 1)
                                 (choice-pointer nil))
                             (while (setq choice-pointer (match-beginning i))
                               (setq i (1+ i))
                               (setq score (+ score (/ 0.75 (- choice-pointer last-found))))
                               (setq last-found choice-pointer))
                             (setq best (max best score)))))
                       (when (> best 0.0)
                         (list (* (commandT-max-score-per-char
                                   choice search-string)
                                  best)
                               choice))))
                   choices))
     #'car-less-than-car)))