Serialize persistent data structures in Clojure

We all know that Rich uses an ideal hash tree-based method to implement the persistent data structures in Clojure. This structure lets us manipulate persistent data structures without much copying.
But I cannot find the correct way to serialize this specific structure. For example, given:
(def foo {:a :b :c :d})
(def bar (assoc foo :e :f))
(def bunny {:foo foo :bar bar})
My question is:
How can I serialize the bunny such that the contents of foo, i.e. :a mapping to :b and :c mapping to :d, appear only once in the serialized content? It's like dumping a memory image of the structures. It's also like serializing the "internal nodes" as well as the "leaf nodes" referenced here.
P.S. In case this is relevant, I am building a big DAG (directed acyclic graph) where we assoc quite a bit to link these nodes to those nodes, and I want to serialize the DAG for later deserialization. The expanded representation of the graph (i.e., the content you get when printing the DAG in the REPL) is unacceptably long.

Davyzhu,
A few things first:
The DAG, without a tokenization strategy, will print as long as the DAG is. If foo is referenced one or more times, each reference will be fully realized (i.e. displayed) in turn during printing.
For interchange of the information (serializing and deserializing), the right approach depends largely on your goals. For example, if you are serializing to send it over the wire, you will either want to do it fully (like the printed representation) or you will need to encode individual data points with some identification/tokenization strategy. The latter, of course, assumes the receiving end can deserialize with an understanding of the tokenization protocol.
A tokenization strategy (which could perhaps use Clojure's metadata facilities) would require encoding a unique key for each referenced content block, so that your DAG contains nodes whose edges are represented by those keys.
Edit: Modified since the original post to clarify, as per the comments, but the example does not reflect the hierarchical nature of the DAG.
A contrived example:
(require '[clojure.walk :as w])

(def node1 {:a :b :c :d})
(def node2 {:e :f})
(def dictionary {:foo node1 :bar node2})
(def DAG [:bunny [:foo :bar]])
(println DAG) ; => [:bunny [:foo :bar]]
;; Replace a keyword by its dictionary entry; leave everything else alone.
(defn expand-dag1
  [x]
  (if (keyword? x)
    (get dictionary x x)
    x))
(println (w/postwalk expand-dag1 DAG)) ; => [:bunny [{:a :b, :c :d} {:e :f}]]
Note: Use of vectors, maps, lists, etc. to express your DAG is up to you.
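One way to picture the tokenization idea is to give every distinct node a small integer token and to store each node exactly once, with children referred to by token. The sketch below is in Haskell rather than Clojure, and every name in it (Node, Flat, intern, serialize, expand) is made up for illustration. It detects sharing by structural equality, which is an assumption; a real DAG might prefer explicit node ids.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as M

-- A node is either a leaf value or a branch over child nodes.
data Node = Leaf String | Branch [Node]
  deriving (Eq, Ord, Show)

-- Flattened form: children are referred to by integer token.
data Flat = FLeaf String | FBranch [Int]
  deriving (Eq, Ord, Show)

-- Table from each flattened node seen so far to its token.
type Table = M.Map Flat Int

-- Look up a flattened node, allocating a fresh token if it is new.
addFlat :: Table -> Flat -> (Table, Int)
addFlat tbl f = case M.lookup f tbl of
  Just i  -> (tbl, i)
  Nothing -> let i = M.size tbl in (M.insert f i tbl, i)

-- Intern a node and, first, all of its children.
intern :: Table -> Node -> (Table, Int)
intern tbl (Leaf s)    = addFlat tbl (FLeaf s)
intern tbl (Branch cs) =
  let (tbl', ids) = internAll tbl cs
  in addFlat tbl' (FBranch ids)

internAll :: Table -> [Node] -> (Table, [Int])
internAll tbl []       = (tbl, [])
internAll tbl (c : cs) =
  let (tbl1, i)  = intern tbl c
      (tbl2, is) = internAll tbl1 cs
  in (tbl2, i : is)

-- Serialized form: one entry per distinct node, plus the root token.
serialize :: Node -> ([(Int, Flat)], Int)
serialize n =
  let (tbl, root) = intern M.empty n
  in (sortOn fst [(i, f) | (f, i) <- M.toList tbl], root)

-- Rebuild the expanded structure from the table.
expand :: [(Int, Flat)] -> Int -> Node
expand entries = go
  where
    m = M.fromList entries
    go i = case m M.! i of
      FLeaf s    -> Leaf s
      FBranch is -> Branch (map go is)
```

With foo = Branch [Leaf "a", Leaf "b"] and bunny = Branch [foo, Branch [foo, Leaf "e"]], serialize bunny produces six table entries even though foo occurs twice, and expand rebuilds the original value. In Clojure the same table could simply be a map from token to node, much like the dictionary above.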

This is one option (that will not work in ClojureScript, in case that matters) and in general may be perceived as a bad idea, but it is worth mentioning anyway.
If I understand your question, you want the foo in (def bunny {:foo foo :bar bar}) to not be "pasted" as a full copy, but rather retain a "reference" to the (def foo..) such that the original foo map is only serialized once.
1. One technique I would consider, though not necessarily encourage (and only after exhausting other options such as a reorganization of your data structure, as hinted by Frank C.), is to serialize the code for bunny rather than the structure itself. Then you read the code string back in and eval it. This would only work if the structure for bunny does not change, or if it does, that you can easily build a string of the bunny map with the relevant symbols included as part of the string, rather than the contents of those symbols.
2. But a much better idea would be to serialize your "raw" data structures only, like the maps foo and bar, then build your bunny after these are read back in, by also serializing the structure but not the contents of bunny. I believe this is what Frank's answer is getting at.
Worth noting that if the structure of bunny does change dynamically, and you are able to create a string of symbols as suggested in 1. above, then that means you also have the tools to instead build a representation of bunny as in 2. above, which would be preferable.
Since code is data, option 1. is an example of the type of flexibility available to us as lisp programmers -- but that doesn't mean there are not better options.


How do I remember the root of a binary search tree in Haskell

I am new to Functional programming.
The challenge I have is regarding the mental map of how a binary search tree works in Haskell.
In other languages (C, C++) we have something called the root. We store it in a variable. We insert elements into it, do balancing, etc.
The program takes a break, does other things (maybe processes user input or creates threads), and then figures out it needs to insert a new element in the already created tree. It knows the root (stored as a variable) and invokes the insert function with the root and the new value.
So far so good in other languages. But how do I mimic such a thing in Haskell, i.e.
I see functions that convert a list to a binary tree, insert a value, etc. That's all good.
I want this functionality to be part of a bigger program, so I need to know what the root is so that I can use it to insert again. Is that possible? If so, how?
Note: Is it not possible at all because data structures are immutable, so we cannot use the root at all to insert something? In such a case, how is the above situation handled in Haskell?
It all happens in the same way, really, except that instead of mutating the existing tree variable we derive a new tree from it and remember that new tree instead of the old one.
For example, a sketch in C++ of the process you describe might look like:
int main(void) {
    Tree<string> root;
    while (true) {
        string next;
        cin >> next;
        if (next == "quit") exit(0);
        root.insert(next);
        doSomethingWith(root);
    }
}
A variable, a read action, and a loop with a mutate step. In Haskell, we do the same thing, but using recursion for looping and a recursion variable instead of mutating a local.
import Control.Monad (when)

main = loop Empty
  where loop t = do
          next <- getLine
          when (next /= "quit") $ do
            let t' = insert next t
            doSomethingWith t'
            loop t'
If you need doSomethingWith to be able to "mutate" t as well as read it, you can lift your program into State:
import Control.Monad (when)
import Control.Monad.State (execState)

main = loop Empty
  where loop t = do
          next <- getLine
          when (next /= "quit") $ do
            loop (execState doSomethingWith (insert next t))
Writing an example with a BST would take too much time, but I'll give you an analogous example using lists.
Let's invent an updateListN which updates the n-th element in a list.
updateListN :: Int -> a -> [a] -> [a]
updateListN i n l = take (i - 1) l ++ n : drop i l
Now for our program:
list = [1,2,3,4,5,6,7,8,9,10] -- The big data structure we might want to use multiple times

main = do
  -- only for show
  print $ updateListN 3 30 list -- [1,2,30,4,5,6,7,8,9,10]
  print $ updateListN 8 80 list -- [1,2,3,4,5,6,7,80,9,10]
  -- now some illustrative complicated processing
  let list' = foldr (\i l -> updateListN i (i*10) l) list list
  -- list' = [10,20,30,40,50,60,70,80,90,100]
  -- Our crazily complicated illustrative algorithm still needs `list`
  print $ zipWith (-) list' list
  -- [9,18,27,36,45,54,63,72,81,90]
See how we "updated" list but it was still available? Most data structures in Haskell are persistent, so updates are non-destructive. As long as we keep a reference to the old data around, we can use it.
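For completeness, the same behaviour can be sketched with an actual search tree. This is a minimal, unbalanced BST written for illustration only; the names Tree, insert, toList and fromList are assumptions, not taken from any particular library. The point is that insert returns a new root while the old one stays valid.

```haskell
-- A minimal unbalanced binary search tree.
data Tree a = Empty | Node (Tree a) a (Tree a)

-- Insert a value, returning the root of the new tree.
-- Only the nodes on the path to the insertion point are rebuilt;
-- every other subtree is shared with the old tree.
insert :: Ord a => a -> Tree a -> Tree a
insert x Empty = Node Empty x Empty
insert x t@(Node l v r) = case compare x v of
  LT -> Node (insert x l) v r
  GT -> Node l v (insert x r)
  EQ -> t

-- In-order traversal, which yields the elements in ascending order.
toList :: Tree a -> [a]
toList Empty        = []
toList (Node l v r) = toList l ++ [v] ++ toList r

-- Build a tree from a list by repeated insertion.
fromList :: Ord a => [a] -> Tree a
fromList = foldr insert Empty
```

Given t0 = fromList [5, 2, 8] and t1 = insert 3 t0, toList t1 is [2,3,5,8] while toList t0 is still [2,5,8]. "Remembering the root" just means holding on to whichever of t0 or t1 you need next.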
As for your comment:
My program is trying the following a) Convert a list to a Binary Search Tree b) do some I/O operation c) Ask for a user input to insert a new value in the created Binary Search Tree d) Insert it into the already created list. This is what the program intends to do. Not sure how to get this done in Haskell (or) is am i stuck in the old mindset. Any ideas/hints welcome.
We can sketch a program:
data BST

readInt :: IO Int; readInt = undefined
toBST :: [Int] -> BST; toBST = undefined
printBST :: BST -> IO (); printBST = undefined

loop :: [Int] -> IO ()
loop list = do
  int <- readInt
  let newList = int : list
  let bst = toBST newList
  printBST bst
  loop newList

main = loop []
"do balancing" ... "It knows the root" nope. After re-balancing the root is new. The function balance_bst must return the new root.
Same in Haskell, but also with insert_bst. It too will return the new root, and you will use that new root from that point forward.
Even if the new root's value is the same, in Haskell it's a new root, since one of its children has changed.
See ''How to "think functional"'' here.
Even in C++ (or other imperative languages), it would usually be considered a poor idea to have a single global variable holding the root of the binary search tree.
Instead code that needs access to a tree should normally be parameterised on the particular tree it operates on. That's a fancy way of saying: it should be a function/method/procedure that takes the tree as an argument.
So if you're doing that, then it doesn't take much imagination to figure out how several different sections of code (or one section, on several occasions) could get access to different versions of an immutable tree. Instead of passing the same tree to each of these functions (with modifications in between), you just pass a different tree each time.
It's only a little more work to imagine what your code needs to do to "modify" an immutable tree. Obviously you won't produce a new version of the tree by directly mutating it, you'll instead produce a new value (probably by calling methods on the class implementing the tree for you, but if necessary by manually assembling new nodes yourself), and then you'll return it so your caller can pass it on - by returning it to its own caller, by giving it to another function, or even calling you again.
Putting that all together, you can have your whole program manipulate (successive versions of) this binary tree without ever having it stored in a global variable that is "the" tree. An early function (possibly even main) creates the first version of the tree, passes it to the first thing that uses it, gets back a new version of the tree and passes it to the next user, and so on. And each user of the tree can call other subfunctions as needed, with possibly many new versions of the tree produced internally before it gets returned to the top level.
Note that I haven't actually described any special features of Haskell here. You can do all of this in just about any programming language, including C++. This is what people mean when they say that learning other types of programming makes them better programmers even in imperative languages they already knew. You can see that your habits of thought are drastically more limited than they need to be; you could not imagine how you could deal with a structure "changing" over the course of your program without having a single variable holding a structure that is mutated, when in fact that is just a small part of the tools that even C++ gives you for approaching the problem. If you can only imagine this one way of dealing with it then you'll never notice when other ways would be more helpful.
Haskell also has a variety of tools it can bring to this problem that are less common in imperative languages, such as (but not limited to):
Using the State monad to automate and hide much of the boilerplate of passing around successive versions of the tree.
Function arguments allow a function to be given an unknown "tree-consumer" function, to which it can give a tree, without any one place both having the tree and knowing which function it's passing it to.
Lazy evaluation sometimes negates the need to even have successive versions of the tree; if the modifications are expanding branches of the tree as you discover they are needed (like a move-tree for a game, say), then you could alternatively generate "the whole tree" up front even if it's infinite, and rely on lazy evaluation to limit how much work is done generating the tree to exactly the amount you need to look at.
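The lazy-evaluation point can be illustrated with a small sketch. The move rule below (from position n you may go to 2n or 2n+1) is invented purely for the example, as are the names Rose, gameTree, prune and flatten. The tree is conceptually infinite, but only the part that is inspected ever gets built.

```haskell
-- A rose tree: a value and any number of children.
data Rose a = Rose a [Rose a]

-- The whole (infinite) move tree, described up front.
-- Hypothetical rule: from position n you can move to 2n or 2n+1.
gameTree :: Integer -> Rose Integer
gameTree n = Rose n [gameTree (2 * n), gameTree (2 * n + 1)]

-- Keep only the top d levels below the root.
prune :: Int -> Rose a -> Rose a
prune d (Rose x cs)
  | d <= 0    = Rose x []
  | otherwise = Rose x (map (prune (d - 1)) cs)

-- Pre-order listing of all values in a (finite) tree.
flatten :: Rose a -> [a]
flatten (Rose x cs) = x : concatMap flatten cs
```

Here flatten (prune 2 (gameTree 1)) evaluates to [1,2,4,5,3,6,7]. Nothing deeper than two levels is ever computed, so there is no need to "grow" the tree by producing successive versions.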
Haskell does in fact have mutable variables, it just doesn't have functions that can access mutable variables without exposing in their type that they might have side effects. So if you really want to structure your program exactly as you would in C++ you can; it just won't really "feel like" you're writing Haskell, won't help you learn Haskell properly, and won't allow you to benefit from many of the useful features of Haskell's type system.
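As a rough sketch of that C++-like style, the root can live in an IORef from Data.IORef. Data.Set (itself a balanced search tree) stands in for a hand-written BST here, and the helper names newRoot, insertInPlace and snapshot are invented for the example. Note how every function that touches the variable has IO in its type.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.Set as Set

-- Create a mutable cell holding an empty tree.
newRoot :: IO (IORef (Set.Set Int))
newRoot = newIORef Set.empty

-- "Mutate" the tree: replace the cell's contents with a new, larger set.
insertInPlace :: IORef (Set.Set Int) -> Int -> IO ()
insertInPlace ref x = modifyIORef' ref (Set.insert x)

-- Read the current contents in ascending order.
snapshot :: IORef (Set.Set Int) -> IO [Int]
snapshot ref = Set.toAscList <$> readIORef ref
```

After ref <- newRoot, insertInPlace ref 5 and insertInPlace ref 2, snapshot ref returns [2,5]. The set values themselves are still immutable; only the reference cell changes which one it points to.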

Difference between VQ and V2Q

To build a verb phrase whose object is a question, the RGL seems to offer only two functions:
VQ -> QS -> VP
V2Q -> NP -> QS -> VP
And in these two functions, the verb type was divided into two different categories. But the type V2Q has a parameter that requires adding a preposition to the sentence. In order to generate the sentence Tell me who I am I used the following code:
MySentence = {s = (mkPhr
                    (mkImp
                      (mkVP
                        (mkV2Q
                          (mkV "tell")
                          (mkPrep ""))
                        (i_NP)
                        (mkQS
                          (mkQCl
                            (mkIComp (who_IP))
                            (i_NP)))))).s };
The code above generates the output I desire without a problem. So my question is, is there any reason the preposition was added to the verb V2Q? Or was this output generated in a wrong way?
First, yes you constructed the sentence correctly.
Why is there a slot for preposition in V2Q
In general, all V2* (and V3*) may take their NP object as a direct object, like eat ___, see ___, or with a preposition, like believe in ___.
This is more flexible than forcing all transitive verbs to take only direct objects, and all prepositional phrases to be analysed as optional adverbials. Take a VP like "believe in yourself": it's not that you're believing (something) and also your location is yourself. It's nice to be able to encode that believe_V2 takes an obligatory argument, and that argument is introduced by the preposition in.
(Side note: for a VP like "sleep in a soft bed", "in a soft bed" is not an obligatory argument of sleep. So then we just make sleep into an intransitive verb, sleep_V, and make in a soft bed into an Adv.)
So, this generalises to all verbs that take some NP argument (V2V, V2S, V2Q, V2A). Take a VP like "lie [to children] [that the moon is made of cheese]": the verb lie is a V2S that introduces its NP object with the preposition to.
In fact, many RGL languages offer a noPrep in their Paradigms module—you can Ctrl+F in the RGL synopsis page to see examples.
The constructors of V2Q
So why are you forced to make your V2Q with mkV2Q (mkV "tell") (mkPrep ""), even when there is no preposition?
More common verb types, like V2, have several overload instances of mkV2. The simplest is just mkV2 : Str -> V2. Since it's such a common thing for transitive verbs to have a direct object (i.e. not introduce their object with a preposition), and there are so many simple V2s, it would be pretty annoying to have to always specify a noPrep for most of them.
V2Q is rarer than V2, so nobody had bothered to create an instance that doesn't take a preposition. The constructor that takes a preposition is more general than the constructor that doesn't, since you can always choose the preposition to be noPrep. Well, I just pushed a few new additions, see here, so if you get the latest RGL, you can now just do mkV2Q "tell".
This kind of thing is completely customisable: if you want more overload instances of some mkX oper, you can just make them.

Clojure: Store and Compile Large Derived Data Structure

I have a large data structure, a tree, that takes up about 2 GB of RAM. It includes Clojure sets in the leaves, and refs as the branches. The tree is built by reading and parsing a large flat file and inserting the rows into the tree. However, this takes about 30 seconds. Is there a way I can build the tree once, emit it to a clj file, and then compile the tree into my standalone jar so I can look up values in the tree without re-reading the large text file? I think this will trim out the 30-second tree build, and it will also let me deploy my standalone jar without needing the text file to come along for the ride.
My first swing at this failed:
(def x (ref {:zebra (ref #{1 2 3 4})}))
#<Ref#6781a7dc: {:zebra #<Ref#709c4f85: #{1 2 3 4}>}>
(def y #<Ref#6781a7dc: {:zebra #<Ref#709c4f85: #{1 2 3 4}>}>)
RuntimeException Unreadable form clojure.lang.Util.runtimeException (Util.java:219)
Embedding data this big in compiled code may not be possible because of size limits imposed upon the JVM. In particular, no single method may exceed 64 KiB in length. Embedding data in the way I describe further below also necessitates including tons of stuff in the class file it's going to live in; doesn't seem like a great idea.
Given that you're using the data structure read-only, you can construct it once, then emit it to a .clj / .edn file (that's for edn, the serialization format based on Clojure literal notation). Include that file on your class path as a "resource" so that it ends up in the überjar (put it in resources/ with default Leiningen settings; it'll then get included in the überjar unless excluded by :uberjar-exclusions in project.clj), and read it from the resource at runtime at the full speed of Clojure's reader:
(ns foo.core
  (:require [clojure.java.io :as io]))
(defn get-the-huge-data-structure []
  (let [r   (io/resource "huge.edn")
        rdr (java.io.PushbackReader. (io/reader r))]
    ;; read from the PushbackReader, not from the resource URL
    (read rdr)))
;; if you then do something like this:
(def ds (get-the-huge-data-structure))
;; your app will load the data as soon as this namespace is required;
;; for your :main namespace, this means as soon as the app starts;
;; note that if you use AOT compilation, it'll also be loaded at
;; compile time
You could also not add it to the überjar, but rather add it to the classpath when running your app. This way your überjar itself would not have to be huge.
Handling stuff other than persistent Clojure data could be accomplished using print-method (when serializing) and reader tags (when deserializing). Arthur already demonstrated using reader tags; to use print-method, you'd do something like
(defmethod print-method clojure.lang.Ref [x writer]
  (.write writer "#ref ")
  ;; deref the Ref and print its current value
  (print-method @x writer))
;; from the REPL, after doing the above:
user=> (pr-str {:foo (ref 1)})
"{:foo #ref 1}"
Of course you only need to have the print-method methods defined when serializing; your deserializing code can leave it alone, but will need appropriate data readers.
Disregarding the code size issue for a moment, as I find the data embedding issue interesting:
Assuming your data structure only contains immutable data natively handled by Clojure (Clojure persistent collections, arbitrarily nested, plus atomic items such as numbers, strings (atomic for this purpose), keywords, symbols; no Refs etc.), you can indeed include it in your code:
(defmacro embed [x]
x)
The generated bytecode will then recreate x without reading anything, by using constants included in the class file and static methods of the clojure.lang.RT class (e.g. RT.vector and RT.map).
This is, of course, how literals are compiled, since the macro above is a noop. We can make things more interesting though:
(ns embed-test.core
  (:require [clojure.java.io :as io])
  (:gen-class))
(defmacro embed-resource [r]
  (let [r   (io/resource r)
        rdr (java.io.PushbackReader. (io/reader r))]
    ;; read from the PushbackReader, not from the resource URL
    (read rdr)))
(defn -main [& args]
  (println (embed-resource "foo.edn")))
This will read foo.edn at compile time and embed the result in the compiled code (in the sense of including appropriate constants and code to reconstruct the data in the class file). At run time, no further reading will be performed.
Is this structure something that doesn't change? If so, consider using Java serialization to persist the structure. Deserializing will be much faster than rebuilding every time.
If you can structure the tree to be a single value instead of a tree of references to many values, then you would be able to print the tree and read it. Because refs are not readable, you won't be able to treat the entire tree as something readable without doing your own parsing.
It may be worth looking into using the extensible reader to add print and read functions for your tree by making it a type.
Here is a minimal example of using data-readers to produce references to sets and maps from a string:
First, define handlers for the contents of each EDN tag/type:
user> (defn parse-map-ref [m] (ref (apply hash-map m)))
#'user/parse-map-ref
user> (defn parse-set-ref [s] (ref (set s)))
#'user/parse-set-ref
Then bind *data-readers* to a map that associates the handlers with textual tags:
(def y-as-string
  "#user/map-ref [:zebra #user/set-ref [1 2 3 4]]")
user> (def y (binding [*data-readers* {'user/set-ref user/parse-set-ref
                                       'user/map-ref user/parse-map-ref}]
               (read-string y-as-string)))
user> y
#<Ref#6d130699: {:zebra #<Ref#7c165ec0: #{1 2 3 4}>}>
This also works with more deeply nested trees:
(def z-as-string
  "#user/map-ref [:zebra #user/set-ref [1 2 3 4]
                  :ox #user/map-ref [:animal #user/set-ref [42]]]")
user> (def z (binding [*data-readers* {'user/set-ref user/parse-set-ref
                                       'user/map-ref user/parse-map-ref}]
               (read-string z-as-string)))
#'user/z
user> z
#<Ref#2430c1a0: {:ox #<Ref#7cf801ef: {:animal #<Ref#7e473201: #{42}>}>,
                 :zebra #<Ref#7424206b: #{1 2 3 4}>}>
Producing strings from trees can be accomplished by extending the print-method multimethod, though it would be a lot easier if you defined types for ref-map and ref-set using deftype, so the printer can know which ref should produce which string.
If in general reading them as strings is too slow there are faster binary serialization libraries like protocol buffers.

QuickCheck: Arbitrary instances of nested data structures that generate balanced specimens

tl;dr: how do you write instances of Arbitrary that don't explode if your data type allows for way too much nesting? And how would you guarantee these instances produce truly random specimens of your data structure?
I want to generate random tree structures, then test certain properties of these structures after I've mangled them with my library code. (NB: I'm writing an implementation of a subtyping algorithm, i.e. given a hierarchy of types, is type A a subtype of type B. This can be made arbitrarily complex, by including multiple-inheritance and post-initialization updates to the hierarchy. The classical method that supports neither of these is Schubert Numbering, and the latest result known to me is Alavi et al. 2008.)
Let's take the example of rose-trees, following Data.Tree:
data Tree a = Node a (Forest a)
type Forest a = [Tree a]
A very simple (and don't-try-this-at-home) instance of Arbitrary would be:
instance (Arbitrary a) => Arbitrary (Tree a) where
  arbitrary = Node <$> arbitrary <*> arbitrary
Since a already has an Arbitrary instance as per the type constraint, and the Forest will have one, because [] is an instance, too, this seems straight-forward. It won't (typically) terminate for very obvious reasons: since the lists it generates are arbitrarily long, the structures become too large, and there's a good chance they won't fit into memory. Even a more conservative approach:
arbitrary = Node <$> arbitrary <*> oneof [arbitrary,return []]
won't work, again, for the same reason. One could tweak the size parameter, to keep the length of the lists down, but even that won't guarantee termination, since it's still multiple consecutive dice-rolls, and it can turn out quite badly (and I want the odd node with 100 children.)
Which means I need to limit the size of the entire tree. That is not so straight-forward. unordered-containers has it easy: just use fromList. This is not so easy here: How do you turn a list into a tree, randomly, and without incurring bias one way or the other (i.e. not favoring left-branches, or trees that are very left-leaning.)
Some sort of breadth-first construction (the functions provided by Data.Tree are all pre-order) from lists would be awesome, and I think I could write one, but it would turn out to be non-trivial. Since I'm using trees now, but will use even more complex stuff later on, I thought I might try to find a more general and less complex solution. Is there one, or will I have to resort to writing my own non-trivial Arbitrary generator? In the latter case, I might actually just resort to unit-tests, since this seems too much work.
Use sized:
import Control.Monad (replicateM)

instance Arbitrary a => Arbitrary (Tree a) where
  arbitrary = sized arbTree

arbTree :: Arbitrary a => Int -> Gen (Tree a)
arbTree 0 = do
  a <- arbitrary
  return $ Node a []
arbTree n = do
  (Positive m) <- arbitrary
  let n' = n `div` (m + 1)
  f <- replicateM m (arbTree n')
  a <- arbitrary
  return $ Node a f
(Adapted from the QuickCheck presentation).
P.S. Perhaps this will generate overly balanced trees...
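If the even split is a concern, one possible variation is to treat the size as a node budget and divide what is left after the root into random, unequal parts, one per child. This is only a sketch under assumptions: Rose, genRose, splitBudget, size and sampleRose are names invented here, a local Rose type is used instead of Data.Tree to avoid clashing with any existing instances, and the resulting distribution is not uniform over tree shapes. It needs the QuickCheck package.

```haskell
import Test.QuickCheck (Gen, arbitrary, choose, generate)

-- A local rose tree type, used only for this sketch.
data Rose a = Rose a [Rose a]
  deriving Show

-- Generate a tree with exactly (max 1 n) nodes: one node for the
-- root, and the remaining budget split randomly among the children.
genRose :: Gen a -> Int -> Gen (Rose a)
genRose g n = do
  x <- g
  budgets <- splitBudget (n - 1)
  Rose x <$> mapM (genRose g) budgets

-- Split a budget into a random list of positive parts that sum to it.
-- An empty list (no children) is returned when nothing is left.
splitBudget :: Int -> Gen [Int]
splitBudget n
  | n <= 0    = return []
  | otherwise = do
      k <- choose (1, n)
      rest <- splitBudget (n - k)
      return (k : rest)

-- Number of nodes in a tree.
size :: Rose a -> Int
size (Rose _ cs) = 1 + sum (map size cs)

-- Draw one sample tree of Ints with the given node budget.
sampleRose :: Int -> IO (Rose Int)
sampleRose n = generate (genRose arbitrary n)
```

Every child receives a strictly smaller budget than its parent, so generation always terminates, and size of the result equals the requested budget (or 1 for a budget of zero). To plug it into an Arbitrary instance you would write arbitrary = sized (genRose arbitrary).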
You might want to use the library presented in the paper "Feat: Functional Enumeration of Algebraic Types" at the Haskell Symposium 2012. It is on Hackage as testing-feat, and a video of the talk introducing it is available here: http://www.youtube.com/watch?v=HbX7pxYXsHg
As Janis mentioned, you can use the package testing-feat, which creates enumerations of arbitrary algebraic data types. This is the easiest way to create unbiased, uniformly distributed generators for all trees of up to a given size.
Here is how you would use it for rose trees:
import Test.Feat (Enumerable(..), uniform, consts, unary, funcurry)
import Test.Feat.Class (Constructor)
import Data.Tree (Tree(..))
import qualified Test.QuickCheck as QC

-- We make an enumerable instance by listing all constructors
-- for the type. In this case, we have one binary constructor:
-- Node :: a -> [Tree a] -> Tree a
instance Enumerable a => Enumerable (Tree a) where
  enumerate = consts [binary Node]
    where
      binary :: (a -> b -> c) -> Constructor c
      binary = unary . funcurry

-- Now we use the Enumerable instance to create an Arbitrary
-- instance with the help of the function:
-- uniform :: Enumerable a => Int -> QC.Gen a
instance Enumerable a => QC.Arbitrary (Tree a) where
  -- method names in an instance body must be unqualified
  arbitrary = QC.sized uniform
  -- shrink = <some implementation>

The Enumerable instance can also be generated automatically with TemplateHaskell:
deriveEnumerable ''Tree

Transition from infix to prefix notation

I started learning Clojure recently. Generally it looks interesting, but I can't get used to some syntactic inconveniences (compared to my previous Ruby/C# experience).
Prefix notation for nested expressions. In Ruby I got used to writing complex expressions by chaining/piping them left to right: some_object.map { some_expression }.select { another_expression }. It's really convenient: you move from input value to result step by step, you can focus on a single transformation, and you don't need to move the cursor as you type. In contrast, when I write nested expressions in Clojure, I write the code from the inner expression to the outer one and I have to move the cursor constantly. It slows me down and distracts me. I know about the -> and ->> macros, but I have heard that they're not idiomatic. Did you have the same problem when you started coding in Clojure/Haskell etc.? How did you solve it?
I felt the same about Lisps initially so I feel your pain :-)
However the good news is that you'll find that with a bit of time and regular usage you will probably start to like prefix notation. In fact with the exception of mathematical expressions I now prefer it to infix style.
Reasons to like prefix notation:
Consistency with functions - most languages use a mix of infix (mathematical operators) and prefix (function call) notation. In Lisps it is all consistent, which has a certain elegance if you consider mathematical operators to be functions.
Macros - become much more sane if the function call is always in the first position.
Varargs - it's nice to be able to have a variable number of parameters for pretty much all of your operators. (+ 1 2 3 4 5) is nicer IMHO than 1 + 2 + 3 + 4 + 5
A trick then is to use -> and ->> liberally when it makes logical sense to structure your code this way. This is typically useful when dealing with subsequent operations on objects or collections, e.g.
(->>
  "Hello World"
  distinct
  sort
  (take 3))
==> (\space \H \W)
The final trick I found very useful when working in prefix style is to make good use of indentation when building more complex expressions. If you indent properly, then you'll find that prefix notation is actually quite clear to read:
(defn add-foobars [x y]
  (+
    (bar x y)
    (foo y)
    (foo x)))
To my knowledge -> and ->> are idiomatic in Clojure. I use them all the time, and in my opinion they usually lead to much more readable code.
Here are some examples of these macros being used in popular projects from around the Clojure "ecosystem":
Ring cookie parsing
Leiningen internals
ClojureScript compiler
Proof by example :)
If you have a long expression chain, use let. Long runaway expressions or deeply nested expressions are not especially readable in any language. This is bad:
(do-something (map :id (filter #(> (:age %) 19) (fetch-data :people))))
This is marginally better:
(do-something (map :id
                   (filter #(> (:age %) 19)
                           (fetch-data :people))))
But this is also bad:
fetch_data(:people).select{|x| x.age > 19}.map{|x| x.id}.do_something
If we're reading this, what do we need to know? We're calling do_something on some attributes of some subset of people. This code is hard to read because there's so much distance between first and last, that we forget what we're looking at by the time we travel between them.
In the case of Ruby, do_something (or whatever is producing our final result) is lost way at the end of the line, so it's hard to tell what we're doing to our people. In the case of Clojure, it's immediately obvious that do-something is what we're doing, but it's hard to tell what we're doing it to without reading through the whole thing to the inside.
Any code more complex than this simple example is going to become pretty painful. If all of your code looks like this, your neck is going to get tired scanning back and forth across all of these lines of spaghetti.
I'd prefer something like this:
(let [people (fetch-data :people)
      adults (filter #(> (:age %) 19) people)
      ids    (map :id adults)]
  (do-something ids))
Now it's obvious: I start with people, I goof around, and then I do-something to them.
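Since the question also mentions Haskell, here is roughly how the same two styles might look there: named intermediate steps with let, and a left-to-right pipeline with the (&) operator from Data.Function. The Person type, its fields and both function names are made up for the illustration.

```haskell
import Data.Function ((&))

-- A hypothetical record standing in for the rows of fetch-data.
data Person = Person { personId :: Int, age :: Int }

-- Named intermediate steps, in the spirit of the let version above.
adultIds :: [Person] -> [Int]
adultIds people =
  let adults = filter (\p -> age p > 19) people
      ids    = map personId adults
  in ids

-- The same computation as a left-to-right pipeline.
-- (&) is reverse function application: x & f = f x.
adultIds' :: [Person] -> [Int]
adultIds' people =
  people
    & filter (\p -> age p > 19)
    & map personId
```

Both functions return the same list; which one reads better depends mostly on whether the intermediate values deserve names.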
And you might get away with this:
fetch_data(:people).select{|x|
  x.age > 19
}.map{|x|
  x.id
}.do_something
But I'd probably rather do this, at the very least:
adults = fetch_data(:people).select{|x| x.age > 19}
do_something( adults.map{|x| x.id} )
It's also not unheard of to use let even when your intermediary expressions don't have good names. (This style is occasionally used in Clojure's own source code, e.g. the source code for defmacro)
(let [x (complex-expr-1 x)
      x (complex-expr-2 x)
      x (complex-expr-3 x)
      ...
      x (complex-expr-n x)]
  (do-something x))
This can be a big help in debugging, because you can inspect things at any point by doing:
(let [x (complex-expr-1 x)
      x (complex-expr-2 x)
      _ (prn x)
      x (complex-expr-3 x)
      ...
      x (complex-expr-n x)]
  (do-something x))
I did indeed hit the same hurdle when I first started with a Lisp, and it was really annoying until I saw the ways it makes code simpler and clearer. Once I understood the upside, the annoyance faded.
initial + scale + offset
became
(+ initial scale offset)
And then try (+): prefix notation allows functions to specify their own identity values.
user> (*)
1
user> (+)
0
There are lots more examples and my point is NOT to defend prefix notation. I just hope to convey that the learning curve flattens (emotionally) as the positive sides become apparent.
Of course, when you start writing macros, prefix notation becomes a must-have instead of a convenience.
To address the second part of your question: the thread-first and thread-last macros are idiomatic any time they make the code clearer :) They are more often used in function calls than in pure arithmetic, though nobody will fault you for using them when they make the equation more palatable.
P.S.: (.. object object2 object3) -> object.object2().object3();
(doto my-object
  (.setX 4)
  (.setY 5))