clojure sum of all the primes under 2000000 - optimization

It's a Project Euler problem.
I learned from Fastest way to list all primes below N
and implemented it in Clojure:
(defn get-primes [n]
  (loop [numbers (set (range 2 n))
         primes []]
    (let [item (first numbers)]
      (cond
        (empty? numbers)
        primes
        :else
        (recur (clojure.set/difference numbers (set (range item n item)))
               (conj primes item))))))
used as follows:
(reduce + (get-primes 2000000))
but it is so slow.
I am wondering why. Can someone enlighten me?

This algorithm is not even correct: at each iteration except the final one it adds the value of (first numbers) at that point to primes, but there is no guarantee that it will in fact be a prime, since the set data structure in use is unordered. (This is also true of the Python original, as mentioned by its author in an edit to the question you link to.) So, you'd first need to fix it by changing (set (range ...)) to (into (sorted-set) (range ...)).
Even then, this is simply not a great algorithm for finding primes. To do better, you may want to write an imperative implementation of the Sieve of Eratosthenes using a Java array and loop / recur, or maybe a functional SoE-like algorithm such as those described in Melissa E. O'Neill's beautiful paper The Genuine Sieve of Eratosthenes.
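To give a concrete flavour of the imperative approach, here is a minimal sketch (my own illustrative code; the name sum-primes-below is not from the question) using a Java boolean array with loop / recur:

(defn sum-primes-below [^long n]
  ;; (aget composite? i) becomes true once i has been crossed off
  (let [composite? (boolean-array n)]
    (loop [i 2, sum 0]
      (if (< i n)
        (if (aget composite? i)
          (recur (inc i) sum)
          (do ;; i is prime: cross off its multiples, starting at i*i
            (loop [j (* i i)]
              (when (< j n)
                (aset composite? j true)
                (recur (+ j i))))
            (recur (inc i) (+ sum i))))
        sum))))

(sum-primes-below 2000000)

This allocates a single array instead of repeatedly rebuilding large immutable sets, so it should finish in a small fraction of the time of the original.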

Related

Is it possible / what are examples of using hygienic macros for compile-time computational optimization?

I've been reading through https://lispcast.com/when-to-use-a-macro, and it states (about Clojure's macros):
Another example is performing expensive calculations at compile time as an optimization
I looked it up, and it seems Clojure has unhygienic macros. Can this also be applied to hygienic ones? I am particularly talking about Scheme. As far as I understand hygienic macros, they only transform syntax, and the actual execution of code is deferred until run time no matter what.
Yes. Macro hygiene just refers to whether or not macro expansion can accidentally capture identifiers. Whether or not a macro is hygienic, regular macro expansion (as opposed to reader macro expansion) occurs at compile time. Macro expansion replaces the macro's code with the result of executing it. Two major use cases are transforming syntax (e.g. for DSLs), enhancing performance by eliminating computations at run time, or both.
A few examples come to mind:
You prefer to write your code with angles in degrees but all of the calculations are actually in radians. You could have macros eliminate these trivial, but unnecessary (at run time), conversions at compile time (see the sketch after this list).
Memoization is a broad example of compute optimization that macros can be used for.
You have a string representing a SQL statement or complex textual math expression which you want to parse and possibly even execute at compile time.
You could also combine the examples and have a memoizing SQL parser. Pretty much any scenario where you have all the necessary inputs at compile time and can therefore compute the result is a candidate.
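Since the question came from Clojure, here is a minimal sketch of the degrees-to-radians idea (illustrative code; the macro name deg->rad is my own choice). The conversion happens once, at macro-expansion time, whenever the argument is a literal number:

(defmacro deg->rad [deg]
  (if (number? deg)
    (Math/toRadians deg)       ; a literal angle is converted during expansion
    `(Math/toRadians ~deg)))   ; anything else is converted at run time

(deg->rad 90)      ; the compiled code only ever sees 1.5707963267948966
(let [d 90]
  (deg->rad d))    ; expands to (Math/toRadians d)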
Yes, hygienic macros can do this sort of thing. As an example, here is a macro called plus in Racket which is like +, except that, at macroexpansion time, it sums sequences of adjacent literal numbers. So it does some of the work you might expect to be done at run time at macroexpansion time (so, effectively, at compile time). For instance,
(plus a b 1 2 3 c 4 5)
expands to
(+ a b 6 c 9)
Some notes on this macro.
It's probably not very idiomatic Racket, because I'm a mostly-unreformed CL hacker, which means I live in a cave and wear animal skins and say 'ug' a lot. In particular I am sure I should use syntax-parse but I can't understand it.
It might not even be right.
There are subtleties with arithmetic which mean that this macro can return different results than +. In particular, + is defined to add pairwise from left to right, while plus does not in general: all the literals get added first. For example (assuming you have done (require racket/flonum), and that +max.0 &c have the same values as they do on my machine), (+ -max.0 1.7976931348623157e+308 1.7976931348623157e+308) has a value of 1.7976931348623157e+308, while (plus -max.0 1.7976931348623157e+308 1.7976931348623157e+308) has a value of +inf.0, because the two literals get added first and this overflows.
In general this is a useless thing: it's safe to assume, I think, that any reasonable compiler will do these kinds of optimisations for you. The only purpose of it is to show that it's possible to detect and compile away compile-time constants.
Remarkably, at least from the point of view of caveman-lisp users like me, you can treat this just like + because of the last clause in the syntax-case: it works to say (apply plus ...), for instance (although no clever optimisation happens in that case, of course).
Here it is:
(require (for-syntax racket/list))

(define-syntax (plus stx)
  (define +/stx (datum->syntax stx +))
  (syntax-case stx ()
    [(_)
     ;; return additive identity
     #'0]
    [(_ a)
     ;; identity with one argument
     #'a]
    [(_ a ...)
     ;; the interesting case: there's more than one argument, so walk over them
     ;; looking for literal numbers. This is probably overcomplicated and
     ;; unidiomatic
     (let* ([syntaxes (syntax->list #'(a ...))]
            [reduced (let rloop ([current (first syntaxes)]
                                 [tail (rest syntaxes)]
                                 [accum '()])
                       (cond
                         [(null? tail)
                          (reverse (cons current accum))]
                         [(and (number? (syntax-e current))
                               (number? (syntax-e (first tail))))
                          (rloop (datum->syntax stx
                                                (+ (syntax-e current)
                                                   (syntax-e (first tail))))
                                 (rest tail)
                                 accum)]
                         [else
                          (rloop (first tail)
                                 (rest tail)
                                 (cons current accum))]))])
       (if (= (length reduced) 1)
           (first reduced)
           ;; make sure the operation is our +
           #`(#,+/stx #,@reduced)))]
    [_
     ;; plus on its own is +, but we want our one. I am not sure this is right
     +/stx]))
It is possible to do this even more aggressively, in fact, so that (plus a b 1 2 c 3) is turned into (+ a b c 6). This probably has even more exciting might-get-different-answers implications. It's worth noting what the CL spec says about this:
For functions that are mathematically associative (and possibly commutative), a conforming implementation may process the arguments in any manner consistent with associative (and possibly commutative) rearrangement. This does not affect the order in which the argument forms are evaluated [...]. What is unspecified is only the order in which the parameter values are processed. This implies that implementations may differ in which automatic coercions are applied [...].
So an optimisation like this is clearly legal in CL: I'm not clear that it's legal in Racket (although I think it should be).
(require (for-syntax racket/list))

(define-for-syntax (split-literals syntaxes)
  ;; split a list into literal numbers and the rest
  (let sloop ([tail syntaxes]
              [accum/lit '()]
              [accum/nonlit '()])
    (if (null? tail)
        (values (reverse accum/lit) (reverse accum/nonlit))
        (let ([current (first tail)])
          (if (number? (syntax-e current))
              (sloop (rest tail)
                     (cons (syntax-e current) accum/lit)
                     accum/nonlit)
              (sloop (rest tail)
                     accum/lit
                     (cons current accum/nonlit)))))))

(define-syntax (plus stx)
  (define +/stx (datum->syntax stx +))
  (syntax-case stx ()
    [(_)
     ;; return additive identity
     #'0]
    [(_ a)
     ;; identity with one argument
     #'a]
    [(_ a ...)
     ;; the interesting case: there's more than one argument: split the
     ;; arguments into literals and nonliterals and handle appropriately
     (let-values ([(literals nonliterals)
                   (split-literals (syntax->list #'(a ...)))])
       (if (null? literals)
           (if (null? nonliterals)
               #'0
               #`(#,+/stx #,@nonliterals))
           (let ([sum/stx (datum->syntax stx (apply + literals))])
             (if (null? nonliterals)
                 sum/stx
                 #`(#,+/stx #,@nonliterals #,sum/stx)))))]
    [_
     ;; plus on its own is +, but we want our one. I am not sure this is right
     +/stx]))

Lisp, iterating backwards

Is there a way (with loop or iterate, doesn't matter) to iterate over a sequence backwards?
Apart from (loop for i downfrom 10 to 1 by 1 do (print i)), which works with indexes and requires the length, or (loop for elt in (reverse seq)), which requires reversing the sequence (even worse than the first option).
For lists the easiest is (dolist (x (reverse list)) ..) or using the more efficient nreverse if the list can be modified.
For vectors an alternative is dotimes with index calculation, something like:
(let* ((vec #(1 2 3))
       (len (length vec)))
  (dotimes (i len)
    (print (aref vec (- len i 1)))))
Typically lists are iterated over from the start as each cons points to the next. Doing it from the back is inherently inefficient.
If you nevertheless have a list and wish for fast reverse or random access, an option is to coerce it to a vector using e.g. (coerce my-list 'vector) and then access the elements using aref (or coerce to a simple-vector and use svref).
If you are the one building the list, consider creating an adjustable vector with fill-pointer (see make-array documentation) and then use vector-push-extend to add items. That gives fast random access from the beginning.
Iterate can do it:
(iterate (for x :in-sequence #(1 2 3) :downto 0)
  (princ x))
; => 321
As others have noted, this will be very inefficient if used on lists.

How to generate random elements depending on previous elements using QuickCheck?

I'm using QuickCheck to do generative testing in Clojure.
However I don't know it well, and I often end up doing convoluted things. One thing that I need to do quite often is something like this:
generate a first prime number from a list of prime-numbers (so far so good)
generate a second prime number which is smaller than the first one
generate a third prime number which is smaller than the first one
However I have no idea as to how to do that cleanly using QuickCheck.
Here's an even simpler, silly example which doesn't work:
(prop/for-all [a (gen/choose 1 10)
               b (gen/such-that #(= a %) (gen/choose 1 10))]
  (= a b))
It doesn't work because a cannot be resolved (prop/for-all isn't like a let statement).
So how can I generate the three primes, with the condition that the two latter ones are smaller than the first one?
In test.check, gen/bind is the bind operator in the generator monad, so we can use it to make generators which depend on other generators.
For example, to generate pairs [a b] where we must have (>= a b) we can use this generator:
(def pair-gen (gen/bind (gen/choose 1 10)
                        (fn [a]
                          (gen/bind (gen/choose 1 a)
                                    (fn [b]
                                      (gen/return [a b]))))))
To satisfy ourselves:
(c/quick-check 10000
               (prop/for-all [[a b] pair-gen]
                 (>= a b)))
gen/bind takes a generator g and a function f. It generates a value from g, let's call it x. gen/bind then returns the value of (f x), which must be a new generator. gen/return is a generator which only generates its argument (so above I used it to return the pairs).
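Applied to the original three-primes question, the same pattern might look like this (a sketch only; the primes pool and the generator names are my own illustrative choices):

(require '[clojure.test.check.generators :as gen])

(def primes [2 3 5 7 11 13 17 19 23 29]) ; assumed pool of primes

(def three-primes-gen
  ;; pick a first prime that has at least two primes below it, then
  ;; generate the second and third from the primes smaller than it
  (gen/bind (gen/elements (drop 2 primes))
            (fn [p]
              (let [smaller-gen (gen/elements (filter #(< % p) primes))]
                (gen/bind (gen/tuple smaller-gen smaller-gen)
                          (fn [[q r]]
                            (gen/return [p q r])))))))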

Fast way to count how often pairs appear in Clojure 1.4

I need to count the number of times that particular pairs of integers occur in some process (a heuristic detecting similar images using locality sensitive hashing - integers represent images and the "neighbours" below are images that have the same hash value, so the count indicates how many different hashes connect the given pair of images).
The counts are stored as a map from (ordered) pair to count (matches below).
The input (nbrs-list below) is a list of lists of integers that are considered neighbours, where every distinct (ordered) pair within an inner list ("the neighbours") should be counted. So, for example, if nbrs-list is [[1,2,3],[10,8,9]] then the pairs are [1,2],[1,3],[2,3],[8,9],[8,10],[9,10].
The routine collect is called multiple times; the matches parameter accumulates results and the nbrs-list is new on each call. The smallest number of neighbours (size of the inner list) is 1 and the largest ~1000. Each integer in nbrs-list occurs just once in any call to collect (this implies that, on each call, no pair occurs more than once) and the integers cover the range 0 - 1,000,000 (so the size of nbrs-list is less than 1,000,000 - since each value occurs just once and sometimes they occur in groups - but typically larger than 100,000 - as most images have no neighbours).
I have pulled the routines out of a larger chunk of code, so they may contain small edit errors, sorry.
(defn- flatten-1
  [list]
  (apply concat list))

(defn- pairs
  ([nbrs]
   (let [nbrs (seq nbrs)]
     (if nbrs (pairs (rest nbrs) (first nbrs) (rest nbrs)))))
  ([nbrs a bs]
   (lazy-seq
     (let [bs (seq bs)]
       (if bs
         (let [b (first bs)]
           (cons (if (> a b) [[b a] 1] [[a b] 1]) (pairs nbrs a (rest bs))))
         (pairs nbrs))))))

(defn- pairs-map
  [nbrs]
  (println (count nbrs))
  (let [pairs-list (flatten-1 (pairs nbrs))]
    (apply hash-map pairs-list)))

(defn- collect
  [matches nbrs-list]
  (let [to-add (map pairs-map nbrs-list)]
    (merge-with + matches (apply (partial merge-with +) to-add))))
So the above code expands each set of neighbours to ordered pairs; creates a map from pairs to 1; then combines maps using addition of values.
I'd like this to run faster. I don't see how to avoid the O(n^2) expansion of pairs, but I imagine I can at least reduce the constant overhead by avoiding so many intermediate structures. At the same time, I'd like the code to be fairly compact and readable...
Oh, and now I am exceeding the "GC overhead limit". So reducing memory use/churn is also a priority :o)
[Maybe this is too specific? I am hoping the lessons are general, and I haven't seen many posts about optimising "real" Clojure code. Also, I can profile etc., but my code seems so ugly I am hoping there's an obvious, cleaner approach - particularly for the pairs expansion.]
I guess you want the frequency with which each pair occurs?
Try the function frequencies. It uses transients under the hood, which should avoid GC overheads.
(I hope I haven't misunderstood your question.)
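A minimal sketch of that idea (illustrative code; pair-counts is my own name): expand each neighbour group into its ordered pairs and let frequencies count them all in one pass:

(defn pair-counts [nbrs-list]
  (frequencies
    (for [nbrs nbrs-list
          :let [v (vec (sort nbrs))] ; sorting makes each pair ordered
          i (range (count v))
          j (range (inc i) (count v))]
      [(v i) (v j)])))

(pair-counts [[1 2 3] [10 8 9]])
; => {[1 2] 1, [1 3] 1, [2 3] 1, [8 9] 1, [8 10] 1, [9 10] 1}

Accumulating across calls is then (merge-with + matches (pair-counts nbrs-list)).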
If you just want to count the pairs in lists such as [[1,2,3],[8,9,10]], then:
(defn count-nbr-pairs [n]
  (/ (* n (dec n))
     2))

(defn count-pairs [nbrs-list]
  (apply + (map #(count-nbr-pairs (count %)) nbrs-list)))

(count-pairs [[1 2 3 4] [8 9 10]])
; => 9
This of course assumes that you don't need to remove duplicate pairs.
=> (require '[clojure.math.combinatorics :as c])
=> (map #(c/combinations % 2) [[1 2 3] [8 9 10]])
(((1 2) (1 3) (2 3)) ((8 9) (8 10) (9 10)))
It's a pretty small library; take a look at the source.
Performance-wise, you're looking at around the following numbers for your use case of 1K unique values under 1M:
=> (time (doseq [i (c/combinations (take 1000 (shuffle (range 1000000))) 2)]))
"Elapsed time: 270.99675 msecs"
That's including generating the target set, which takes about 100 ms on my machine.
The above suggestions seemed to help, but were insufficient. I finally got decent performance with:
packing pairs into a long value. This works because MAX_LONG > 1e12, and Long instances are comparable (so they work well as hash keys, unlike long[2]). This had a significant effect on lowering memory use compared to [n1 n2].
using a TLongByteHashMap (a primitive long-to-byte hash map from the Trove library) and mutating it.
handling the pair code with nested doseq loops (or nested for loops when using immutable data structures).
improving my locality-sensitive hash. A big part of the problem was that it was too weak, so it found too many neighbours - if the neighbours of a million images are ill-constrained then you get a million million pairs, which consumes a little too much memory...
The inner loop now looks like:
(doseq [i (range 1 (count nbrs))]
  (doseq [j (range i)]
    (let [pair (pack size (nth nbrs i) (nth nbrs j))]
      (if-let [value (.get matches pair)]
        (.put matches pair (byte (inc value)))
        (.put matches pair (byte 0))))))
where
(defn- pack
  [size n1 n2]
  (+ (* size n1) n2))

(defn- unpack
  [size pair]
  [(int (/ pair size)) (mod pair size)])
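For illustration, a round trip with size = 1,000,000, matching the id range in the question:

(pack 1000000 123456 7890)    ; => 123456007890
(unpack 1000000 123456007890) ; => [123456 7890]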

Transition from infix to prefix notation

I started learning Clojure recently. Generally it looks interesting, but I can't get used to some syntactic inconveniences (compared to my previous Ruby/C# experience).
Prefix notation for nested expressions. In Ruby I got used to writing complex expressions by chaining/piping them left-to-right: some_object.map { some_expression }.select { another_expression }. It's really convenient: you move from input value to result step by step, you can focus on a single transformation, and you don't need to move the cursor as you type. Contrary to that, when I write nested expressions in Clojure, I write the code from the inner expression to the outer one, and I have to move the cursor constantly. It slows me down and distracts me. I know about the -> and ->> macros, but I've noticed that they're not idiomatic. Did you have the same problem when you started coding in Clojure/Haskell etc.? How did you solve it?
I felt the same about Lisps initially so I feel your pain :-)
However the good news is that you'll find that with a bit of time and regular usage you will probably start to like prefix notation. In fact with the exception of mathematical expressions I now prefer it to infix style.
Reasons to like prefix notation:
Consistency with functions - most languages use a mix of infix (mathematical operators) and prefix (function call) notation. In Lisps it is all consistent, which has a certain elegance if you consider mathematical operators to be functions
Macros - become much more sane if the function call is always in the first position.
Varargs - it's nice to be able to have a variable number of parameters for pretty much all of your operators. (+ 1 2 3 4 5) is nicer IMHO than 1 + 2 + 3 + 4 + 5
A trick then is to use -> and ->> liberally when it makes logical sense to structure your code this way. This is typically useful when dealing with subsequent operations on objects or collections, e.g.
(->>
  "Hello World"
  distinct
  sort
  (take 3))
==> (\space \H \W)
The final trick I found very useful when working in prefix style is to make good use of indentation when building more complex expressions. If you indent properly, then you'll find that prefix notation is actually quite clear to read:
(defn add-foobars [x y]
  (+
   (bar x y)
   (foo y)
   (foo x)))
To my knowledge -> and ->> are idiomatic in Clojure. I use them all the time, and in my opinion they usually lead to much more readable code.
Here are some examples of these macros being used in popular projects from around the Clojure "ecosystem":
Ring cookie parsing
Leiningen internals
ClojureScript compiler
Proof by example :)
If you have a long expression chain, use let. Long runaway expressions or deeply nested expressions are not especially readable in any language. This is bad:
(do-something (map :id (filter #(> (:age %) 19) (fetch-data :people))))
This is marginally better:
(do-something (map :id
                   (filter #(> (:age %) 19)
                           (fetch-data :people))))
But this is also bad:
fetch_data(:people).select{|x| x.age > 19}.map{|x| x.id}.do_something
If we're reading this, what do we need to know? We're calling do_something on some attributes of some subset of people. This code is hard to read because there's so much distance between the first and last parts that we forget what we're looking at by the time we travel between them.
In the case of Ruby, do_something (or whatever is producing our final result) is lost way at the end of the line, so it's hard to tell what we're doing to our people. In the case of Clojure, it's immediately obvious that do-something is what we're doing, but it's hard to tell what we're doing it to without reading through the whole thing to the inside.
Any code more complex than this simple example is going to become pretty painful. If all of your code looks like this, your neck is going to get tired scanning back and forth across all of these lines of spaghetti.
I'd prefer something like this:
(let [people (fetch-data :people)
      adults (filter #(> (:age %) 19) people)
      ids (map :id adults)]
  (do-something ids))
Now it's obvious: I start with people, I goof around, and then I do-something to them.
And you might get away with this:
fetch_data(:people).select{|x|
  x.age > 19
}.map{|x|
  x.id
}.do_something
But I'd probably rather do this, at the very least:
adults = fetch_data(:people).select{|x| x.age > 19}
do_something( adults.map{|x| x.id} )
It's also not unheard of to use let even when your intermediary expressions don't have good names. (This style is occasionally used in Clojure's own source code, e.g. the source code for defmacro)
(let [x (complex-expr-1 x)
      x (complex-expr-2 x)
      x (complex-expr-3 x)
      ...
      x (complex-expr-n x)]
  (do-something x))
This can be a big help in debugging, because you can inspect things at any point by doing:
(let [x (complex-expr-1 x)
      x (complex-expr-2 x)
      _ (prn x)
      x (complex-expr-3 x)
      ...
      x (complex-expr-n x)]
  (do-something x))
I did indeed face the same hurdle when I first started with a Lisp, and it was really annoying until I saw the ways it makes code simpler and clearer. Once I understood the upside, the annoyance faded.
initial + scale + offset
became
(+ initial scale offset)
And then try (+): prefix notation allows functions to specify their own identity values
user> (*)
1
user> (+)
0
There are lots more examples, and my point is NOT to defend prefix notation. I just hope to convey that the learning curve flattens (emotionally) as the positive sides become apparent.
Of course, when you start writing macros, prefix notation becomes a must-have instead of a convenience.
To address the second part of your question: the thread-first and thread-last macros are idiomatic anytime they make the code more clear :) They are more often used in function calls than in pure arithmetic, though nobody will fault you for using them when they make the equation more palatable.
P.S. (.. object method2 method3) -> object.method2().method3();
(doto my-object
  (.setX 4)
  (.setY 5))
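As a concrete (illustrative) example of doto: it returns the object itself, which makes it handy for initialising mutable Java objects in a single expression:

(doto (java.util.HashMap.)
  (.put "a" 1)
  (.put "b" 2))
; => {"a" 1, "b" 2}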