How to use SmallCheck in Haskell? - testing

I am trying to use SmallCheck to test a Haskell program, but I cannot understand how to use the library to test my own data types. Apparently, I need to use the Test.SmallCheck.Series. However, I find the documentation for it extremely confusing. I am interested in both cookbook-style solutions and an understandable explanation of the logical (monadic?) structure. Here are some questions I have (all related):
If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)? What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?
I managed to make quickCheck work like this without getting my hands too dirty by using judicious applications of Control.Monad.liftM to functions like makeTale. I couldn't figure out a way to do this with smallCheck (please explain it to me!).
What is the relationship between the types Serial, Series, etc.?
(optional) What is the point of coSeries? How do I use the Positive type from SmallCheck.Series?
(optional) Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.
If there is there any intro/tutorial to using smallCheck, I'd appreciate a link. Thank you very much!
UPDATE: I should add that the most useful and readable documentation I found for smallCheck is this paper (PDF). I could not find the answer to my questions there on the first look; it is more of a persuasive advertisement than a tutorial.
UPDATE 2: I moved my question about the weird Identity that shows up in the type of Test.SmallCheck.list and other places to a separate question.

NOTE: This answer describes pre-1.0 versions of SmallCheck. See this blog post for the important differences between SmallCheck 0.6 and 1.0.
SmallCheck is like QuickCheck in that it tests a property over some part of the space of possible types. The difference is that it tries to exhaustively enumerate a series all of the "small" values instead of an arbitrary subset of smallish values.
As I hinted, SmallCheck's Serial is like QuickCheck's Arbitrary.
Now Serial is pretty simple: a Serial type a has a way (series) to generate a Series type which is just a function from Depth -> [a]. Or, to unpack that, Serial objects are objects we know how to enumerate some "small" values of. We are also given a Depth parameter which controls how many small values we should generate, but let's ignore it for a minute.
instance Serial Bool where series _ = [False, True]
instance Serial Char where series _ = "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
series d = Nothing : map Just (series d)
In these cases we're doing nothing more than ignoring the Depth parameter and then enumerating "all" possible values for each type. We can even do this automatically for some types
instance (Enum a, Bounded a) => Serial a where series _ = [minBound .. maxBound]
This is a really simple way of testing properties exhaustively—literally test every single possible input! Obviously there are at least two major pitfalls, though: (1) infinite data types will lead to infinite loops when testing and (2) nested types lead to exponentially larger spaces of examples to look through. In both cases, SmallCheck gets really large really quickly.
So that's the point of the Depth parameter—it lets the system ask us to keep our Series small. From the documentation, Depth is the
Maximum depth of generated test values
For data values, it is the depth of nested constructor applications.
For functional values, it is both the depth of nested case analysis and the depth of results.
so let's rework our examples to keep them Small.
instance Serial Bool where
series 0 = []
series 1 = [False]
series _ = [False, True]
instance Serial Char where
series d = take d "abcdefghijklmnopqrstuvwxyz"
instance Serial a => Serial (Maybe a) where
-- we shrink d by one since we're adding Nothing
series d = Nothing : map Just (series (d-1))
instance (Enum a, Bounded a) => Serial a where series d = take d [minBound .. maxBound]
Much better.
So what's coseries? Like coarbitrary in the Arbitrary typeclass of QuickCheck, it lets us build a series of "small" functions. Note that we're writing the instance over the input type---the result type is handed to us in another Serial argument (that I'm below calling results).
instance Serial Bool where
coseries results d = [\cond -> if cond then r1 else r2 |
r1 <- results d
r2 <- results d]
these take a little more ingenuity to write and I'll actually refer you to use the alts methods which I'll describe briefly below.
So how can we make some Series of Persons? This part is easy
instance Series Person where
series d = SnowWhite : take (d-1) (map Dwarf [1..7])
...
But our coseries function needs to generate every possible function from Persons to something else. This can be done using the altsN series of functions provided by SmallCheck. Here's one way to write it
coseries results d = [\person ->
case person of
SnowWhite -> f 0
Dwarf n -> f n
| f <- alts1 results d ]
The basic idea is that altsN results generates a Series of N-ary function from N values with Serial instances to the Serial instance of Results. So we use it to create a function from [0..7], a previously defined Serial value, to whatever we need, then we map our Persons to numbers and pass 'em in.
So now that we have a Serial instance for Person, we can use it to build more complex nested Serial instances. For "instance", if FairyTale is a list of Persons, we can use the Serial a => Serial [a] instance alongside our Serial Person instance to easily create a Serial FairyTale:
instance Serial FairyTale where
series = map makeFairyTale . series
coseries results = map (makeFairyTale .) . coseries results
(the (makeFairyTale .) composes makeFairyTale with each function coseries generates, which is a little confusing)

If I have a data type data Person = SnowWhite | Dwarf Integer, how do I explain to smallCheck that the valid values are Dwarf 1 through Dwarf 7 (or SnowWhite)?
First of all, you need to decide which values you want to generate for each depth. There's no single right answer here, it depends on how fine-grained you want your search space to be.
Here are just two possible options:
people d = SnowWhite : map Dwarf [1..7] (doesn't depend on the depth)
people d = take d $ SnowWhite : map Dwarf [1..7] (each unit of depth increases the search space by one element)
After you've decided on that, your Serial instance is as simple as
instance Serial m Person where
series = generate people
We left m polymorphic here as we don't require any specific structure of the underlying monad.
What if I have a complicated FairyTale data structure and a constructor makeTale :: [Person] -> FairyTale, and I want smallCheck to make FairyTale-s from lists of Person-s using the constructor?
Use cons1:
instance Serial m FairyTale where
series = cons1 makeTale
What is the relationship between the types Serial, Series, etc.?
Serial is a type class; Series is a type. You can have multiple Series of the same type — they correspond to different ways to enumerate values of that type. However, it may be arduous to specify for each value how it should be generated. The Serial class lets us specify a good default for generating values of a particular type.
The definition of Serial is
class Monad m => Serial m a where
series :: Series m a
So all it does is assigning a particular Series m a to a given combination of m and a.
What is the point of coseries?
It is needed to generate values of functional types.
How do I use the Positive type from SmallCheck.Series?
For example, like this:
> smallCheck 10 $ \n -> n^3 >= (n :: Integer)
Failed test no. 5.
there exists -2 such that
condition is false
> smallCheck 10 $ \(Positive n) -> n^3 >= (n :: Integer)
Completed 10 tests without failure.
Any elucidation of what is the logic behind what should be a monadic expression, and what is just a regular function, in the context of smallCheck, would be appreciated.
When you are writing a Serial instance (or any Series expression), you work in the Series m monad.
When you are writing tests, you work with simple functions that return Bool or Property m.

While I think that #tel's answer is an excellent explanation (and I wish smallCheck actually worked the way he describes), the code he provides does not work for me (with smallCheck version 1). I managed to get the following to work...
UPDATE / WARNING: The code below is wrong for a rather subtle reason. For the corrected version, and details, please see this answer to the question mentioned below. The short version is that instead of instance Serial Identity Person one must write instance (Monad m) => Series m Person.
... but I find the use of Control.Monad.Identity and all the compiler flags bizarre, and I have asked a separate question about that.
Note also that while Series Person (or actually Series Identity Person) is not actually exactly the same as functions Depth -> [Person] (see #tel's answer), the function generate :: Depth -> [a] -> Series m a converts between them.
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses, FlexibleContexts, UndecidableInstances #-}
import Test.SmallCheck
import Test.SmallCheck.Series
import Control.Monad.Identity
data Person = SnowWhite | Dwarf Int
instance Serial Identity Person where
series = generate (\d -> SnowWhite : take (d-1) (map Dwarf [1..7]))

Related

menhir - associate AST nodes with token locations in source file

I am using Menhir to parse a DSL. My parser builds an AST using an elaborate collection of nested types. During later typecheck and other passes in error reports generated for a user, I would like to refer to source file position where it occurred. These are not parsing errors, and they generated after parsing is completed.
A naive solution would be to equip all AST types with additional location information, but that would make working with them (e.g. constructing or matching) unnecessary clumsy. What are the established practices to do that?
I don't know if it's a best practice, but I like the approach taken in the abstract syntax tree of the Frama-C system; see https://github.com/Frama-C/Frama-C-snapshot/blob/master/src/kernel_services/ast_data/cil_types.mli
This approach uses "layers" of records and algebraic types nested in each other. The records hold meta-information like source locations, as well as the algebraic "node" you can match on.
For example, here is a part of the representation of expressions:
type ...
and exp = {
eid: int; (** unique identifier *)
enode: exp_node; (** the expression itself *)
eloc: location; (** location of the expression. *)
}
and exp_node =
| Const of constant (** Constant *)
| Lval of lval (** Lvalue *)
| UnOp of unop * exp * typ
| BinOp of binop * exp * exp * typ
...
So given a variable e of type exp, you can access its source location with e.eloc, and pattern match on its abstract syntax tree in e.enode.
So simple, "top-level" matches on syntax are very easy:
let rec is_const_expr e =
match e.enode with
| Const _ -> true
| Lval _ -> false
| UnOp (_op, e', _typ) -> is_const_expr e'
| BinOp (_op, l, r, _typ) -> is_const_expr l && is_const_expr r
To match deeper in an expression, you have to go through a record at each level. This adds some syntactic clutter, but not too much, as you can pattern match on only the one record field that interests you:
let optimize_double_negation e =
match e.enode with
| UnOp (Neg, { enode = UnOp (Neg, e', _) }, _) -> e'
| _ -> e
For comparison, on a pure AST without metadata, this would be something like:
let optimize_double_negation e =
match e.enode with
| UnOp (Neg, UnOp (Neg, e', _), _) -> e'
| _ -> e
I find that Frama-C's approach works well in practice.
You need somehow to attach the location information to your nodes. The usual solution is to encode your AST node as a record, e.g.,
type node =
| Typedef of typdef
| Typeexp of typeexp
| Literal of string
| Constant of int
| ...
type annotated_node = { node : node; loc : loc}
Since you're using records, you can still pattern match without too much syntactic overhead, e.g.,
match node with
| {node=Typedef t} -> pp_typedef t
| ...
Depending on your representation, you may choose between wrapping each branch of your type individually, wrapping the whole type, or recursively, like in Frama-C example by #Isabelle Newbie.
A similar but more general approach is to extend a node not with the location, but just with a unique identifier and to use a final map to add arbitrary data to nodes. The benefit of this approach is that you can extend your nodes with arbitrary data as you actually externalize node attributes. The drawback is that you can't actually guarantee the totality of an attribute since finite maps are no total. Thus it is harder to preserve an invariant that, for example, all nodes have a location.
Since every heap allocated object already has an implicit unique identifier, the address, it is possible to attach data to the heap allocated objects without actually wrapping it in another type. For example, we can still use type node as it is and use finite maps to attach arbitrary pieces of information to them, as long as each node is a heap object, i.e., the node definition doesn't contain constant constructors (in case if it has, you can work around it by adding a bogus unit value, e.g., | End can be represented as | End of unit.
Of course, by saying an address, I do not literally mean the physical or virtual address of an object. OCaml uses a moving GC so an actual address of an OCaml object may change during a program execution. Moreover, an address, in general, is not unique, as once an object is deallocated its address can be grabbed by a completely different entity.
Fortunately, after ephemera were added to the recent version of OCaml it is no longer a problem. Moreover, an ephemeron will play nicely with the GC, so that if a node is no longer reachable its attributes (like file locations) will be collected by the GC. So, let's ground this with a concrete example. Suppose we have two nodes c1 and c2:
let c1 = Literal "hello"
let c2 = Constant 42
Now we can create a location mapping from nodes to locations (we will represent the latter as just strings)
module Locations = Ephemeron.K1.Make(struct
type t = node
let hash = Hashtbl.hash (* or your own hash if you have one *)
let equal = (=) (* or a specilized equal operator *)
end)
The Locations module provides an interface of a typical imperative hash table. So let's use it. In the parser, whenever you create a new node you should register its locations in the global locations value, e.g.,
let locations = Locations.create 1337
(* somewhere in the semantics actions, where c1 and c2 are created *)
Locations.add c1 "hello.ml:12:32"
Locations.add c2 "hello.ml:13:56"
And later, you can extract the location:
# Locations.find locs c1;;
- : string = "hello.ml:12:32"
As you see, although the solution is nice in the sense, that it doesn't touch the node data type, so the rest of your code can pattern match on it nice and easy, it is still a little bit dirty, as it requires global mutable state, that is hard to maintain. Also, since we are using an object address as a key, every newly created object, even if it was logically derived from the original object, will have a different identity. For example, suppose you have a function, that normalizes all literals:
let normalize = function
| Literal str -> Literal (normalize_literal str)
| node -> node
It will create a new Literal node from the original nodes, so all the location information will be lost. That means, that you need to update the location information, every time you derive one node from another.
Another issue with ephemera is that they can't survive the marshaling or serialization. I.e., if you store your AST somewhere in a file, and then you restore it, all nodes will loose their identity, and the location table will become empty.
Speaking of the "monadic approach" that you mentioned in comments. Though monads are magic, they still can't magically solve all the problems. They are not silver bullets :) In order to attach something to a node we still need to extend it with an extra attribute - either a location information directly or an identity through which we can attach properties indirectly. The monad can be useful for the latter though, as instead of having a global reference to the last assigned identifier, we can use a state monad, to encapsulate our id generator. And for the sake of completeness, instead of using a state monad or a global reference to generate unique identifiers, you can use UUID and get identifiers that are not only unique in a program run, but are also universally unique, in the sense that there are no other objects in the world with the same identifier, no matter how often you run your program (in the sane world). And although it looks like that generating the UUID doesn't use any state, underneath the hood it still uses an imperative random number generator, so it is sort of cheating, but still can seen as pure functional, as it doesn't contain observable effects.

Return highest or lowest value Z notation , formal method

I am new to Z notation,
Lets say I have a function f defined as X |--> Y ,
where X is string and Y is number.
How can I get highest Y value in this function? Does 'loop' exist in formal method so I can solve it using loop?
I know there is recursion in Z notation, but based on the material provided, I only found it apply in multiset or bag, can it apply in function?
Any extra reference application of 'loop' or recursion application will be appreciated. Sorry for my English.
You can just use the predefined function max that takes a set of integers as input and returns the maximum number. The input values here are the range (the set of all values) of the function:
max(ran(f))
Please note that the maximum is not defined for empty sets.
Regarding your question about recursion or loops: You can actually define a function recursively but I think your question aims more at a way to compute something. This is not easily expressed in Z and this is IMO a good thing because it is used for specifications and it is not a programming language. Even if there wouldn't be a max or ran function, you could still specify the number m you are looking for by:
\exists s:String # (s,m):f /\
\forall s2:String, i2:Z # (s2,i2):f ==> i2 <= m
("m is a value of f, belonging to an s and all other values i2 of f are smaller or equal")
After getting used to the style it is usually far better to understand than any programming language (except your are trying to describe an algorithm itself and not its expected outcome).#
Just for reference: An example of a recursive definition (let's call it rmax) for the maximum would consist of a base case:
\forall e:Z # rmax({e}) = e
and a recursive case:
\forall e:Z; S:\pow(Z) #
S \noteq {} \land
rmax({e} \cup S) = \IF e > rmax(S) \THEN e \ELSE rmax(S)
But note that this is still not a "computation rule" of rmax because e in the second rule can be an arbitrary element of S. In more complex scenarios it might even be not obvious that the defined relation is a function at all because depending on the chosen elements different results could be computed.

How are records stored in erlang, and how are they mutated?

I recently came across some code that looked something like the following:
-record(my_rec, {f0, f1, f2...... f711}).
update_field({f0, Val}, R) -> R#my_rec{f0 = Val};
update_field({f1, Val}, R) -> R#my_rec{f1 = Val};
update_field({f2, Val}, R) -> R#my_rec{f2 = Val};
....
update_field({f711, Val}, R) -> R#my_rec{f711 = Val}.
generate_record_from_proplist(Props)->
lists:foldl(fun update_field/2, #my_rec{}, Props).
My question is about what actually happens to the record - lets say the record has 711 fields and I'm generating it from a proplist - since the record is immutable, we are, at least semantically, generating a new full record one every step in the foldr - making what looks like a function that would be linear in the length of the arguments, into one that is actually quadratic in the length, since there are updates corresponding to the length the record for every insert - Am I correct in this assumption,or is the compiler intelligent enough
to save me?
Records are tuples which first element contains the name of the records, and the next one the record fields.
The name of the fields is not stored, it is a facility for the compiler, and of course the programmer. I think it was introduced only to avoid errors in the field order when writting programs, and to allow tupple extension when releasing new version without rewriting all pattern matches.
Your code will make 712 copies of a 713 element tuple.
I am afraid, the compiler is not smart enough.
You can read more in this SO answer. If you have so big number of fields and you want to update it in O(1) time, you should use ETS tables.

QuickCheck: Arbitrary instances of nested data structures that generate balanced specimens

tl;dr: how do you write instances of Arbitrary that don't explode if your data type allows for way too much nesting? And how would you guarantee these instances produce truly random specimens of your data structure?
I want to generate random tree structures, then test certain properties of these structures after I've mangled them with my library code. (NB: I'm writing an implementation of a subtyping algorithm, i.e. given a hierarchy of types, is type A a subtype of type B. This can be made arbitrarily complex, by including multiple-inheritance and post-initialization updates to the hierarchy. The classical method that supports neither of these is Schubert Numbering, and the latest result known to me is Alavi et al. 2008.)
Let's take the example of rose-trees, following Data.Tree:
data Tree a = Node a (Forest a)
type Forest a = [Tree a]
A very simple (and don't-try-this-at-home) instance of Arbitray would be:
instance (Arbitrary a) => Arbitrary (Tree a) where
arbitrary = Node <$> arbitrary <$> arbitrary
Since a already has an Arbitrary instance as per the type constraint, and the Forest will have one, because [] is an instance, too, this seems straight-forward. It won't (typically) terminate for very obvious reasons: since the lists it generates are arbitrarily long, the structures become too large, and there's a good chance they won't fit into memory. Even a more conservative approach:
arbitrary = Node <$> arbitrary <*> oneof [arbitrary,return []]
won't work, again, for the same reason. One could tweak the size parameter, to keep the length of the lists down, but even that won't guarantee termination, since it's still multiple consecutive dice-rolls, and it can turn out quite badly (and I want the odd node with 100 children.)
Which means I need to limit the size of the entire tree. That is not so straight-forward. unordered-containers has it easy: just use fromList. This is not so easy here: How do you turn a list into a tree, randomly, and without incurring bias one way or the other (i.e. not favoring left-branches, or trees that are very left-leaning.)
Some sort of breadth-first construction (the functions provided by Data.Tree are all pre-order) from lists would be awesome, and I think I could write one, but it would turn out to be non-trivial. Since I'm using trees now, but will use even more complex stuff later on, I thought I might try to find a more general and less complex solution. Is there one, or will I have to resort to writing my own non-trivial Arbitrary generator? In the latter case, I might actually just resort to unit-tests, since this seems too much work.
Use sized:
instance Arbitrary a => Arbitrary (Tree a) where
arbitrary = sized arbTree
arbTree :: Arbitrary a => Int -> Gen (Tree a)
arbTree 0 = do
a <- arbitrary
return $ Node a []
arbTree n = do
(Positive m) <- arbitrary
let n' = n `div` (m + 1)
f <- replicateM m (arbTree n')
a <- arbitrary
return $ Node a f
(Adapted from the QuickCheck presentation).
P.S. Perhaps this will generate overly balanced trees...
You might want to use the library presented in the paper "Feat: Functional Enumeration of Algebraic Types" at the Haskell Symposium 2012. It is on Hackage as testing-feat, and a video of the talk introducing it is available here: http://www.youtube.com/watch?v=HbX7pxYXsHg
As Janis mentioned, you can use the package testing-feat, which creates enumerations of arbitrary algebraic data types. This is the easiest way to create unbiased uniformly distributed generators
for all trees of up to a given size.
Here is how you would use it for rose trees:
import Test.Feat (Enumerable(..), uniform, consts, funcurry)
import Test.Feat.Class (Constructor)
import Data.Tree (Tree(..))
import qualified Test.QuickCheck as QC
-- We make an enumerable instance by listing all constructors
-- for the type. In this case, we have one binary constructor:
-- Node :: a -> [Tree a] -> Tree a
instance Enumerable a => Enumerable (Tree a) where
enumerate = consts [binary Node]
where
binary :: (a -> b -> c) -> Constructor c
binary = unary . funcurry
-- Now we use the Enumerable instance to create an Arbitrary
-- instance with the help of the function:
-- uniform :: Enumerable a => Int -> QC.Gen a
instance Enumerable a => QC.Arbitrary (Tree a) where
QC.arbitrary = QC.sized uniform
-- QC.shrink = <some implementation>
The Enumerable instance can also be generated automatically with TemplateHaskell:
deriveEnumerable ''Tree

If I come from an imperative programming background, how do I wrap my head around the idea of no dynamic variables to keep track of things in Haskell?

So I'm trying to teach myself Haskell. I am currently on the 11th chapter of Learn You a Haskell for Great Good and am doing the 99 Haskell Problems as well as the Project Euler Problems.
Things are going alright, but I find myself constantly doing something whenever I need to keep track of "variables". I just create another function that accepts those "variables" as parameters and recursively feed it different values depending on the situation. To illustrate with an example, here's my solution to Problem 7 of Project Euler, Find the 10001st prime:
answer :: Integer
answer = nthPrime 10001
nthPrime :: Integer -> Integer
nthPrime n
| n < 1 = -1
| otherwise = nthPrime' n 1 2 []
nthPrime' :: Integer -> Integer -> Integer -> [Integer] -> Integer
nthPrime' n currentIndex possiblePrime previousPrimes
| isFactorOfAnyInThisList possiblePrime previousPrimes = nthPrime' n currentIndex theNextPossiblePrime previousPrimes
| otherwise =
if currentIndex == n
then possiblePrime
else nthPrime' n currentIndexPlusOne theNextPossiblePrime previousPrimesPlusCurrentPrime
where currentIndexPlusOne = currentIndex + 1
theNextPossiblePrime = nextPossiblePrime possiblePrime
previousPrimesPlusCurrentPrime = possiblePrime : previousPrimes
I think you get the idea. Let's also just ignore the fact that this solution can be made to be more efficient, I'm aware of this.
So my question is kind of a two-part question. First, am I going about Haskell all wrong? Am I stuck in the imperative programming mindset and not embracing Haskell as I should? And if so, as I feel I am, how do avoid this? Is there a book or source you can point me to that might help me think more Haskell-like?
Your help is much appreciated,
-Asaf
Am I stuck in the imperative programming mindset and not embracing
Haskell as I should?
You are not stuck, at least I don't hope so. What you experience is absolutely normal. While you were working with imperative languages you learned (maybe without knowing) to see programming problems from a very specific perspective - namely in terms of the van Neumann machine.
If you have the problem of, say, making a list that contains some sequence of numbers (lets say we want the first 1000 even numbers), you immediately think of: a linked list implementation (perhaps from the standard library of your programming language), a loop and a variable that you'd set to a starting value and then you would loop for a while, updating the variable by adding 2 and putting it to the end of the list.
See how you mostly think to serve the machine? Memory locations, loops, etc.!
In imperative programming, one thinks about how to manipulate certain memory cells in a certain order to arrive at the solution all the time. (This is, btw, one reason why beginners find learning (imperative) programming hard. Non programmers are simply not used to solve problems by reducing it to a sequence of memory operations. Why should they? But once you've learned that, you have the power - in the imperative world. For functional programming you need to unlearn that.)
In functional programming, and especially in Haskell, you merely state the construction law of the list. Because a list is a recursive data structure, this law is of course also recursive. In our case, we could, for example say the following:
constructStartingWith n = n : constructStartingWith (n+2)
And almost done! To arrive at our final list we only have to say where to start and how many we want:
result = take 1000 (constructStartingWith 0)
Note that a more general version of constructStartingWith is available in the library, it is called iterate and it takes not only the starting value but also the function that makes the next list element from the current one:
iterate f n = n : iterate f (f n)
constructStartingWith = iterate (2+) -- defined in terms of iterate
Another approach is to assume that we had another list our list could be made from easily. For example, if we had the list of the first n integers we could make it easily into the list of even integers by multiplying each element with 2. Now, the list of the first 1000 (non-negative) integers in Haskell is simply
[0..999]
And there is a function map that transforms lists by applying a given function to each argument. The function we want is to double the elements:
double n = 2*n
Hence:
result = map double [0..999]
Later you'll learn more shortcuts. For example, we don't need to define double, but can use a section: (2*) or we could write our list directly as a sequence [0,2..1998]
But not knowing these tricks yet should not make you feel bad! The main challenge you are facing now is to develop a mentality where you see that the problem of constructing the list of the first 1000 even numbers is a two staged one: a) define how the list of all even numbers looks like and b) take a certain portion of that list. Once you start thinking that way you're done even if you still use hand written versions of iterate and take.
Back to the Euler problem: Here we can use the top down method (and a few basic list manipulation functions one should indeed know about: head, drop, filter, any). First, if we had the list of primes already, we can just drop the first 1000 and take the head of the rest to get the 1001th one:
result = head (drop 1000 primes)
We know that after dropping any number of elements form an infinite list, there will still remain a nonempty list to pick the head from, hence, the use of head is justified here. When you're unsure if there are more than 1000 primes, you should write something like:
result = case drop 1000 primes of
[] -> error "The ancient greeks were wrong! There are less than 1001 primes!"
(r:_) -> r
Now for the hard part. Not knowing how to proceed, we could write some pseudo code:
primes = 2 : {-an infinite list of numbers that are prime-}
We know for sure that 2 is the first prime, the base case, so to speak, thus we can write it down. The unfilled part gives us something to think about. For example, the list should start at some value that is greater 2 for obvious reason. Hence, refined:
primes = 2 : {- something like [3..] but only the ones that are prime -}
Now, this is the point where there emerges a pattern that one needs to learn to recognize. This is surely a list filtered by a predicate, namely prime-ness (it does not matter that we don't know yet how to check prime-ness, the logical structure is the important point. (And, we can be sure that a test for prime-ness is possible!)). This allows us to write more code:
primes = 2 : filter isPrime [3..]
See? We are almost done. In 3 steps, we have reduced a fairly complex problem in such a way that all that is left to write is a quite simple predicate.
Again, we can write in pseudocode:
isPrime n = {- false if any number in 2..n-1 divides n, otherwise true -}
and can refine that. Since this is almost haskell already, it is too easy:
isPrime n = not (any (divides n) [2..n-1])
divides n p = n `rem` p == 0
Note that we did not do optimization yet. For example we can construct the list to be filtered right away to contain only odd numbers, since we know that even ones are not prime. More important, we want to reduce the number of candidates we have to try in isPrime. And here, some mathematical knowledge is needed (the same would be true if you programmed this in C++ or Java, of course), that tells us that it suffices to check if the n we are testing is divisible by any prime number, and that we do not need to check divisibility by prime numbers whose square is greater than n. Fortunately, we have already defined the list of prime numbers and can pick the set of candidates from there! I leave this as exercise.
You'll learn later how to use the standard library and the syntactic sugar like sections, list comprehensions, etc. and you will gradually give up to write your own basic functions.
Even later, when you have to do something in an imperative programming language again, you'll find it very hard to live without infinte lists, higher order functions, immutable data etc.
This will be as hard as going back from C to Assembler.
Have fun!
It's ok to have an imperative mindset at first. With time you will get more used to things and start seeing the places where you can have more functional programs. Practice makes perfect.
As for working with mutable variables you can kind of keep them for now if you follow the rule of thumb of converting variables into function parameters and iteration into tail recursion.
Off the top of my head:
Typeclassopedia. The official v1 of the document is a pdf, but the author has moved his v2 efforts to the Haskell wiki.
What is a monad? This SO Q&A is the best reference I can find.
What is a Monad Transformer? Monad Transformers Step by Step.
Learn from masters: Good Haskell source to read and learn from.
More advanced topics such as GADTs. There's a video, which does a great job explaining it.
And last but not least, #haskell IRC channel. Nothing can even come close to talk to real people.
I think the big change from your code to more haskell like code is using higher order functions, pattern matching and laziness better. For example, you could write the nthPrime function like this (using a similar algorithm to what you did, again ignoring efficiency):
nthPrime n = primes !! (n - 1) where
primes = filter isPrime [2..]
isPrime p = isPrime' p [2..p - 1]
isPrime' p [] = True
isPrime' p (x:xs)
| (p `mod` x == 0) = False
| otherwise = isPrime' p xs
Eg nthPrime 4 returns 7. A few things to note:
The isPrime' function uses pattern matching to implement the function, rather than relying on if statements.
the primes value is an infinite list of all primes. Since haskell is lazy, this is perfectly acceptable.
filter is used rather than reimplemented that behaviour using recursion.
With more experience you will find you will write more idiomatic haskell code - it sortof happens automatically with experience. So don't worry about it, just keep practicing, and reading other people's code.
Another approach, just for variety! Strong use of laziness...
module Main where
nonmults :: Int -> Int -> [Int] -> [Int]
nonmults n next [] = []
nonmults n next l#(x:xs)
| x < next = x : nonmults n next xs
| x == next = nonmults n (next + n) xs
| otherwise = nonmults n (next + n) l
select_primes :: [Int] -> [Int]
select_primes [] = []
select_primes (x:xs) =
x : (select_primes $ nonmults x (x + x) xs)
main :: IO ()
main = do
let primes = select_primes [2 ..]
putStrLn $ show $ primes !! 10000 -- the first prime is index 0 ...
I want to try to answer your question without using ANY functional programming or math, not because I don't think you will understand it, but because your question is very common and maybe others will benefit from the mindset I will try to describe. I'll preface this by saying I an not a Haskell expert by any means, but I have gotten past the mental block you have described by realizing the following:
1. Haskell is simple
Haskell, and other functional languages that I'm not so familiar with, are certainly very different from your 'normal' languages, like C, Java, Python, etc. Unfortunately, the way our psyche works, humans prematurely conclude that if something is different, then A) they don't understand it, and B) it's more complicated than what they already know. If we look at Haskell very objectively, we will see that these two conjectures are totally false:
"But I don't understand it :("
Actually you do. Everything in Haskell and other functional languages is defined in terms of logic and patterns. If you can answer a question as simple as "If all Meeps are Moops, and all Moops are Moors, are all Meeps Moors?", then you could probably write the Haskell Prelude yourself. To further support this point, consider that Haskell lists are defined in Haskell terms, and are not special voodoo magic.
"But it's complicated"
It's actually the opposite. It's simplicity is so naked and bare that our brains have trouble figuring out what to do with it at first. Compared to other languages, Haskell actually has considerably fewer "features" and much less syntax. When you read through Haskell code, you'll notice that almost all the function definitions look the same stylistically. This is very different than say Java for example, which has constructs like Classes, Interfaces, for loops, try/catch blocks, anonymous functions, etc... each with their own syntax and idioms.
You mentioned $ and ., again, just remember they are defined just like any other Haskell function and don't necessarily ever need to be used. However, if you didn't have these available to you, over time, you would likely implement these functions yourself when you notice how convenient they can be.
2. There is no Haskell version of anything
This is actually a great thing, because in Haskell, we have the freedom to define things exactly how we want them. Most other languages provide building blocks that people string together into a program. Haskell leaves it up to you to first define what a building block is, before building with it.
Many beginners ask questions like "How do I do a For loop in Haskell?" and innocent people who are just trying to help will give an unfortunate answer, probably involving a helper function, and extra Int parameter, and tail recursing until you get to 0. Sure, this construct can compute something like a for loop, but in no way is it a for loop, it's not a replacement for a for loop, and in no way is it really even similar to a for loop if you consider the flow of execution. Similar is the State monad for simulating state. It can be used to accomplish similar things as static variables do in other languages, but in no way is it the same thing. Most people leave off the last tidbit about it not being the same when they answer these kinds of questions and I think that only confuses people more until they realize it on their own.
3. Haskell is a logic engine, not a programming language
This is probably least true point I'm trying to make, but hear me out. In imperative programming languages, we are concerned with making our machines do stuff, perform actions, change state, and so on. In Haskell, we try to define what things are, and how are they supposed to behave. We are usually not concerned with what something is doing at any particular time. This certainly has benefits and drawbacks, but that's just how it is. This is very different than what most people think of when you say "programming language".
So that's my take how how to leave an imperative mindset and move to a more functional mindset. Realizing how sensible Haskell is will help you not look at your own code funny anymore. Hopefully thinking about Haskell in these ways will help you become a more productive Haskeller.