Does functional programming use variables?
If no, how do the functional programs occupy memory?
Both functional programs and imperative (C#, Java) programs use variables, but they define them differently.
In functional programs, variables are like those in mathematics: once a value has been assigned, it cannot change.
In imperative languages, the values held by variables can typically be changed.
In both cases variables use memory.
If you're asking about implementation details for various methods of compiling functional programs, you probably need to start with reading "Implementing functional languages: a tutorial". It is a bit out of date (e.g., it does not cover the modern STG approach), but still valuable. Another, even older text to read is Field, Harrison, "Functional programming" (never mind the title, it's mostly about implementing FP compilers).
Pure functional programming uses no variables, but maybe constants in the C sense (that is, assigned only once, but at runtime).
Functional programs occupy memory with the function call "stack", i.e. the current expression and the arguments of recursively called functions.
Does functional programming use variables?
Well, at least you can bind names to values. One can call this name a variable, even if it is not variable. But in math, when we see:
x + 3 = 5
we call x a variable, though it is just another name for 2.
On the other hand, the names that are bound to arguments of functions are indeed variable, if only across different invocations of the function.
If no, how do the functional programs occupy memory?
There will be language elements to construct non-primitive values, like lists, tuples, etc. Such a data constructor creates new values from old ones (somewhere in memory, but those details are irrelevant for FP).
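As a small illustration of "new values from old ones", here is a sketch in Python (chosen only for illustration; the `Point` type and the names below are made up for this example). The old values stay untouched and the new ones occupy their own memory:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after construction
class Point:
    x: int
    y: int


origin = Point(0, 0)
moved = replace(origin, x=5)   # builds a new Point; origin is left as it was

numbers = (1, 2, 3)
extended = numbers + (4,)      # builds a new tuple; numbers is left as it was
```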
I've started playing around with Kotlin, but I sense my own limitations in the way I program. My problem is that I still think in Java, so my style is still imperative. My question goes to all functional programming zealots, and I believe the answers would be very useful to everyone at the very beginning stage who needs to 'break' their brain in order to rebuild it: to leave the comfort zone and start thinking in pseudocode rather than in "whatever your first language is". I believe highly experienced polyglot developers can boil the concepts down to plain advice about what makes a program entirely functional and what violates the paradigm. I don't know all the quirks, but please don't hesitate to include universally accepted terms that might be unknown to me (I can always look them up). At this point I need this set of rules so that I can suffer at first by not breaking them; later I know I will get a feel for it, analyze the guidelines and understand where they are worse or better, which of course is my own homework.
So examples of these guidelines would be something like:
Never change state, this can be avoided by using x, y, z
Operate using higher-order functions only (I may be wrong, just an example)
I hope the answer will give me a long-term reference for putting myself in extreme conditions where I stop escaping to OOP whenever I feel uncomfortable. Now that I look at Kotlin, I understand how I should have been thinking about problems: it is about intention, not about the structure imposed by one language or another. Intention can always be converted to a language of your choice and backed up by design patterns applicable to that language, but to find that middle ground I first need to lock myself out of my comfort zone.
Avoid mutable state like the plague.
One of the main points of using functional programming, possibly the main one, is to avoid all the little pitfalls, bugs and issues one needs to deal with when using mutable state. You should do everything you can to avoid mutating state. For instance, instead of using C-style for-loops where you need to keep a counter variable updated, use map and other higher-order functions to abstract away your iteration patterns. This also means that you should never change the value of a variable if you can avoid it. Instead, you should define almost all of your variables, preferably all of them, as constants, and use functions to compute new values from them instead of mutating them.
Avoid side-effects like the plague.
Mutable state's ugly cousin, side-effects. Side effects mean anything other than taking a value and returning a value in a function. If that function prints data, mutates global variables, sends messages to threads, or anything, anything other than simply taking its parameters, computing a value from them, and returning a value, that function has side-effects. Side-effects are important (see next bullet point), but if you use them a lot, they get impossible to track. Just think of how everyone tells you to avoid global variables in imperative programming. Functional programming goes a step further and tries to avoid all side-effects. The bulk of your program should be made of pure functions. (See ahead)
When you need to use side-effects, keep them contained.
Yes, I just told you to run away from side-effects. However, no program is useful without side-effects of some kind. Graphical User Interface? Side-effect. Audio output? Side-effect. Printing to a shell? Side-effect. So you can't really get rid of side-effects if you want to build useful stuff.
What you should do instead is write your code so that all your side-effecting code lives in a thin layer which mostly calls pure functions and then does the required side-effects using the result of these pure function calls.
Use pure functions for everything you can.
This is sort of the flipside of the previous point. A pure function is a function which has no side-effects and does not mutate anything. It can only take in parameters and return a value. You should use these a lot. For instance, instead of doing your logging within functions which are computing stuff, you should be constructing your log strings using pure functions, and then letting your side-effects layer call these pure functions, call more pure functions in order to format the log strings into a full log, and then output the log itself from your side-effects layer.
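To make the logging example concrete, here is a minimal sketch in Python. The function names (`area`, `log_line`, `format_log`, `main`) are invented for this illustration. Everything except `main` is pure, and `main` is the thin side-effects layer:

```python
def area(width: int, height: int) -> int:
    # Pure: the result depends only on the arguments.
    return width * height


def log_line(width: int, height: int, result: int) -> str:
    # Pure: builds a log string, does not print it.
    return f"area({width}, {height}) = {result}"


def format_log(lines: list) -> str:
    # Pure: combines individual log lines into one log.
    return "\n".join(lines)


def main() -> None:
    # Side-effects layer: the only place that performs output.
    inputs = [(2, 3), (4, 5)]
    lines = [log_line(w, h, area(w, h)) for w, h in inputs]
    print(format_log(lines))


if __name__ == "__main__":
    main()
```

Because the pure functions never print, they can be tested by simply comparing return values.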
Use higher-order functions to structure your code.
Higher-order functions are, in a way, the glue that makes functional programming work. A higher-order function is a function which takes one or more functions as parameters and/or returns a function. The power of higher-order functions is that they can encapsulate many of the patterns which you would use in an imperative-style program in a declarative manner. For instance, let's take a look at the three most common higher-order functions:
map is a function which takes a function and a list of values, applies its function argument to each of those values, and returns a new list with the results. map encapsulates the whole pattern of iterating over a list doing an operation on each value in a declarative manner.
filter is a function which takes a function which returns a boolean and a list of values, applies its function argument to each of those values and returns a list containing only those values for which its function argument returns true. It encapsulates the whole pattern of selecting results from a list in a declarative manner.
reduce, also known as fold, takes an initial value, a binary function and a list of values. It uses its function argument to combine the initial value with the first value of the list, then combines the result with the next value of the list and keeps on doing this until it has reduced the list to just one single value. It encapsulates the entire pattern of obtaining an aggregate value from a list of values.
This is in no way an exhaustive list of higher-order functions, but these three are the most common ones. I hope this has been enough to show how you can structure code which would require a lot of tracking variables using only functions in a declarative manner. If you use these higher-order functions well, it's likely you won't ever need a for or while loop again.
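For reference, this is roughly what the three functions look like in Python (used here only as an example language; the sample list is made up). Note that Python's `reduce` lives in `functools` and takes the initial value as its last argument:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# map: apply a function to every element, producing a new list
squares = list(map(lambda n: n * n, numbers))

# filter: keep only the elements for which the predicate returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce (fold): combine all elements into one value, starting from 0
total = reduce(lambda acc, n: acc + n, numbers, 0)
```

No counter variable or explicit loop appears anywhere, and `numbers` itself is never modified.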
This is definitely not an exhaustive list of functional programming practices, but I think most functional programmers would agree these five guidelines form the core of what functional programming is about. If you want to really learn how to apply these, my advice would be to learn a pure functional programming language such as Haskell, so you are forced to abandon the imperative paradigm and to learn how to structure things functionally instead. I would recommend the fantastic Haskell Programming from First Principles as a starting resource if you choose to go this way. In case you don't want to/can't put down the cash, Brent Yorgey's Haskell course at UPenn is also a great free resource.
In Chapter 36.4 of HTDP (How to Design Programs),
I found this warning:
Warning: The state variable is never a parameter of a function.
But as far as I've heard, in functional programming functions are corrupted if they refer to state variables: they are no longer pure functions. They become hard to test, behave unpredictably, cannot be memoized, and so on. State variables should also be passed as parameters, not just referred to as if they were global constants.
So I wonder
is HTDP arguing something wrong,
are global state variables allowed in some functional programming practices, or
do I have the wrong idea?
Thanks in advance.
Disclaimer: I like and respect this book very much and have learned a lot from it. Actually, I would like to spread the good word about this book to my friends (if any). So don't get me wrong.
I don't think there's anything incompatible between what you've heard about functional programming and what is written in the chapter you linked. However, you're conflating two concepts here: the presence of mutable state in functional programs (a purity issue) vs. the order in which things are evaluated, and the restrictions on the syntax you have available to write things down.
Consider: if you're using an eager evaluation strategy, then passing a "state variable" of the kind they describe in that chapter would have the effect of dereferencing it, and you would get the value of the variable as the function argument. Similarly, if the variable was bound as a parameter to the function, you would get a different bit of memory at every call. There are many different options here. The fact that some languages permit you to pass references around as values is not universal.
So they are really just describing global variables (or variables that are accessed from some parent scope), which by their very nature need not be passed to functions as parameters. If the specific language permits pass-by-reference, this might not be such a clear distinction.
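A rough illustration of the point about eager evaluation, written in Python rather than in the book's teaching language (the names `counter`, `increment` and `try_increment` are made up): passing the state variable as an argument hands the function its current value, so assigning to the parameter cannot update the variable itself.

```python
counter = 0  # a global "state variable"


def increment() -> None:
    # Refers to the state variable directly, so it can update it.
    global counter
    counter += 1


def try_increment(value: int) -> None:
    # Receives the current value of the variable, not the variable itself;
    # this only rebinds the local name.
    value += 1


increment()
try_increment(counter)
```

After both calls `counter` is 1, not 2: only the function that refers to the variable directly was able to change it.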
The browser-based software StudyTRAX ( http://wiki.studytrax.com ), used for research data management, allows for custom form and form variable management via JavaScript. However, a StudyTRAX "variable" (essentially, a representation of both an element of a form [HTML properties included] and its corresponding parameter, with some data typing/etc.) must be referred to with #<varname>, while regular JavaScript variables will just be <varname>.
Is this sort of thing done to make parsing easier, or is it just to distinguish between the two so that researchers who aren't so technologically-inclined won't have as much trouble figuring out what they're doing? Given the nature of JavaScript, I would think the StudyTRAX "variables" are just regular JavaScript objects defined in such a way to make form design and customization simpler, and thus the latter would make more sense, but am I wrong?
Also, I know that there are other programming languages that do require specific variable prefixes (though I can't think of some off the top of my head at the moment); what is/was the usual reasoning for that choice in language design?
A two-part answer. First, StudyTRAX is almost certainly using a preprocessor to do some magic. JavaScript makes this relatively easy, but not as easy as a Lisp would. You still need to parse the code. By prefixing, the parser can ignore a lot of the complicated syntax of JavaScript and get to the good part without needing a "picture perfect" compiler. Actually, a lot of templating systems do this. It is an implementation of Lisp's quasi-quote (see Greenspun's Tenth Rule).
As for prefixes in general, the best way to understand them is to try to write a parser for a language without them. For very dynamic and pure languages like Lisp and JavaScript, where everything is a list or an object, it is not too bad. When you get to languages where methods are distinct from objects, or functions are not first class, the parser has to start asking itself what type of thing "foo" refers to. An annoying example from Ruby: an unprefixed identifier is either a local variable or a method implicitly called on self. In Rails there are a few functions that are implemented with method_missing. Person.find_first_by_rank works fine, but
class Person < ActiveRecord::Base
  def promotion(name)
    p = find_first_by_rank
    [...]
  end
end
gives an error because find_first_by_rank looks like it might be a local variable and Ruby is scared to call method_missing on something that might just be a misspelled local variable.
Now imagine trying to distinguish between instance variables (prefix @), class variables (prefix @@), global variables (prefix $), constants (capital first letter), method names and local variables (no prefix, lower case) by context alone.
(From a hobbyist compiler and language designer.)
Your question is more specific to the "StudyTRAX" software.
In the early days of programming, BASIC marked string variables with a $ suffix ("a$") to distinguish them from numeric values. Today, some programming languages such as PHP prefix variables with "$". FORTRAN implicitly treated variables whose names start with I through N as integers, and variables starting with other letters as reals.
Transforming and then executing code is a complex task; that's why many language implementers use shortcuts such as adding prefixes or suffixes to programming languages.
Many colleges and universities have specialized courses on transforming code from a programming language into something the computer executes, such as "Compilers", "Automata" and "Language Design", because it is not an easy task.
Perl requires different variable prefixes, depending on the type of data:
$scalar = 4.2;
@array = (1, 4, 9, 16);
%map = ("foo" => 42, "bar" => 17, "baz" => 137);
As I understand it, this is so the reader can immediately identify what kind of object they're dealing with. It's not a matter of whether the reader is technologically inclined or not: if you reduce the programmer's cognitive load, he can use his brainpower for more important things than figuring out fiddly syntactic details.
Whether Perl's design is successful in this respect is another question, but I believe that's the reasoning behind the feature.
I'm currently enjoying the transition from an object oriented language to a functional language. It's a breath of fresh air, and I'm finding myself much more productive than before.
However - there is one aspect of OOP that I've not yet seen a satisfactory answer for on the FP side, and that is polymorphism. i.e. I have a large collection of data items, which need to be processed in quite different ways when they are passed into certain functions. For the sake of argument, let's say that there are multiple factors driving polymorphic behaviour so potentially exponentially many different behaviour combinations.
In OOP that can be handled relatively well using polymorphism: either through composition+inheritance or a prototype-based approach.
In FP I'm a bit stuck between:
Writing or composing pure functions that effectively implement polymorphic behaviours by branching on the value of each data item - feels rather like assembling a huge conditional or even simulating a virtual method table!
Putting functions inside pure data structures in a prototype-like fashion - this seems like it works but doesn't it also violate the idea of defining pure functions separately from data?
What are the recommended functional approaches for this kind of situation? Are there other good alternatives?
Putting functions inside pure data structures in a prototype-like fashion - this seems like it works but doesn't it also violate the idea of defining pure functions separately from data?
If virtual method dispatch is the way you want to approach the problem, this is a perfectly reasonable approach. As for separating functions from data, that is a distinctly non-functional notion to begin with. I consider the fundamental principle of functional programming to be that functions ARE data. And as for your feeling that you're simulating a virtual function, I would argue that it's not a simulation at all. It IS a virtual function table, and that's perfectly OK.
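Here is a minimal sketch of that idea in Python (the shape records and function names are invented for the example). Each data item carries its own function, so the `area` entry of each record is effectively a one-slot virtual function table:

```python
import math


def circle_area(shape: dict) -> float:
    return math.pi * shape["radius"] ** 2


def rect_area(shape: dict) -> float:
    return shape["width"] * shape["height"]


# Plain data with a function stored alongside it.
circle = {"radius": 1.0, "area": circle_area}
rect = {"width": 2.0, "height": 3.0, "area": rect_area}


def area(shape: dict) -> float:
    # Dispatch: look up the function in the data item and call it.
    return shape["area"](shape)


areas = [area(s) for s in (circle, rect)]
```

All of the functions involved are still pure; the only thing that changed is where the choice of function is stored.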
Just because the language doesn't have OOP support built in doesn't mean it's not reasonable to apply the same design principles - it just means you'll have to write more of the machinery that other languages provide built-in, because you're fighting against the natural spirit of the language you're using. Modern typed functional languages do have very deep support for polymorphism, but it's a very different approach to polymorphism.
Polymorphism in OOP is a lot like "existential quantification" in logic - a polymorphic value has SOME run-time type but you don't know what it is. In many functional programming languages, polymorphism is more like "universal quantification" - a polymorphic value can be instantiated to ANY compatible type its user wants. They're two sides of the exact same coin (in particular, they swap places depending on whether you're looking at a function from the "inside" or the "outside"), but it turns out to be extremely hard when designing a language to "make the coin fair", especially in the presence of other language features such as subtyping or higher-kinded polymorphism (polymorphism over polymorphic types).
If it helps, you may want to think of polymorphism in functional languages as something very much like "generics" in C# or Java, because that's exactly the type of polymorphism that, e.g., ML and Haskell, favor.
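As a rough illustration of this "generics"-style polymorphism, here is a sketch using Python type hints (the function `map_list` is made up; also note that Python does not check these annotations at run time, unlike ML or Haskell, so they only document the intent). The caller decides what `A` and `B` are:

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def map_list(f: Callable[[A], B], xs: List[A]) -> List[B]:
    # Works for ANY element types A and B; it never inspects the values.
    return [f(x) for x in xs]


lengths = map_list(len, ["a", "bb"])   # here A = str, B = int
labels = map_list(str, [1, 2])         # here A = int, B = str
```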
Well, in Haskell you can always make a type-class to achieve a kind of polymorphism. Basically, it is defining functions that are processed for different types. Examples are the classes Eq and Show:
data Foo = Bar | Baz
instance Show Foo where
  show Bar = "bar"
  show Baz = "baz"
main = putStrLn $ show Bar
The function show :: (Show a) => a -> String is defined for every data type that instances the typeclass Show. The compiler finds the correct function for you, depending on the type.
This allows you to define functions more generally, for example:
compare a b = a < b
will work with any type of the type class Ord. This is not exactly like OOP, but you can even have one type class extend another, like so:
class (Show a) => Combinator a where
  combine :: a -> a -> String
It is up to the instance to define the actual function; the class only declares its type, similar to virtual functions.
This is not complete, and as far as I know, many FP languages do not feature type classes. OCaml does not; it pushes that over to its OOP part. And Scheme has no static type system. But in Haskell it is a powerful way to achieve a kind of polymorphism, within limits.
To go even further, newer extensions of the 2010 standard allow type families and suchlike.
Hope this helped you a bit.
Who said
defining pure functions separately from data
is best practice?
If you want polymorphic objects, you need objects. In a functional language, objects can be constructed by glueing together a set of "pure data" with a set of "pure functions" operating on that data. This works even without the concept of a class. In this sense, a class is nothing but a piece of code that constructs objects with the same set of associated "pure functions".
And polymorphic objects are constructed by replacing some of those functions of an object by different functions with the same signature.
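A small sketch of this construction in Python (the account example and all names are invented for illustration). `make_account` plays the role of a class: it is just code that builds objects with the same set of functions. `make_fee_account` builds a polymorphic variant by replacing one function with another of the same signature:

```python
def make_account(balance: int) -> dict:
    def deposit(amount: int) -> dict:
        # Returns a new object instead of mutating this one.
        return make_account(balance + amount)

    def get_balance() -> int:
        return balance

    return {"deposit": deposit, "balance": get_balance}


def make_fee_account(balance: int, fee: int) -> dict:
    base = make_account(balance)

    def deposit(amount: int) -> dict:
        # Same signature as the base deposit, different behaviour.
        return make_fee_account(balance + amount - fee, fee)

    # Reuse the base functions, replacing only "deposit".
    return {**base, "deposit": deposit}


plain = make_account(10)["deposit"](5)
charged = make_fee_account(10, 1)["deposit"](5)
```

Code that only calls `obj["deposit"]` and `obj["balance"]` works with either kind of object without knowing which one it has.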
If you want to learn more about how to implement objects in a functional language (like Scheme), have a look into this book:
Abelson / Sussman: "Structure and Interpretation of Computer Programs"
Mike, both your approaches are perfectly acceptable, and the pros and cons of each are discussed, as Doc Brown says, in Chapter 2 of SICP. The first suffers from having a big type table somewhere, which needs to be maintained. The second is just traditional single-dispatch polymorphism/virtual function tables.
The reason that Scheme doesn't have a built-in system is that using the wrong object system for the problem leads to all sorts of trouble, so if you're the language designer, which do you choose? Single-dispatch single inheritance won't deal well with 'multiple factors driving polymorphic behaviour so potentially exponentially many different behaviour combinations.'
To synopsize, there are many ways of constructing objects, and Scheme, the language discussed in SICP, just gives you a basic toolkit from which you can construct the one you need.
In a real Scheme program, you'd build your object system by hand and then hide the associated boilerplate with macros.
In Clojure you actually have a prebuilt object/dispatch system in the form of multimethods, and one of its advantages over the traditional approach is that it can dispatch on the types of all arguments. You can (apparently) also use the hierarchy system to give you inheritance-like features, although I've never used it, so you should take that cum grano salis.
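To show what "dispatch on the types of all arguments" means, here is a hand-rolled sketch in Python. It is not Clojure's API: the registry, the `defmethod` decorator and the `collide` example are all made up, and it only matches exact types, with no hierarchy support:

```python
_registry = {}  # maps a tuple of argument types to an implementation


def defmethod(*types):
    # Register the decorated function for this combination of types.
    def register(fn):
        _registry[types] = fn
        return fn
    return register


def collide(a, b):
    # Dispatch on the run-time types of BOTH arguments.
    fn = _registry.get((type(a), type(b)))
    if fn is None:
        raise TypeError(f"no method for {type(a).__name__}, {type(b).__name__}")
    return fn(a, b)


class Asteroid:
    pass


class Ship:
    pass


@defmethod(Asteroid, Ship)
def _asteroid_ship(a, b):
    return "asteroid hits ship"


@defmethod(Ship, Ship)
def _ship_ship(a, b):
    return "ships bump"
```

With single dispatch, only the first argument's type would select the code; here `collide(Asteroid(), Ship())` and `collide(Ship(), Ship())` reach different implementations because of the combination.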
But if you need something different from the object scheme chosen by the language designer, you can just make one (or several) that suits.
That's effectively what you're proposing above.
Build what you need, get it all working, hide the details with macros.
The argument between FP and OO is not about whether data abstraction is bad, it's about whether the data abstraction system is the place to stuff all the separate concerns of the program.
"I believe that a programming language should allow one to define new data types. I do not believe that a program should consist solely of definitions of new data types."
http://www.haskell.org/haskellwiki/OOP_vs_type_classes#Everything_is_an_object.3F nicely discusses some solutions.
Alonzo Church's lambda calculus is the mathematical theory behind functional languages. Does object-oriented programming have a formal theory?
Object Orientation comes from psychology not math.
If you think about it, it resembles more how humans work than how computers work.
We think in objects that we class-ify. For instance, this table is a piece of furniture.
Take Jean Piaget (1896-1980), who worked on a theory of children's cognitive development.
Wikipedia says:
Piaget also had a considerable effect in the field of computer science and artificial intelligence.
Some cognitive concepts he discovered (that relate to the concept of object orientation):
Classification The ability to group objects together on the basis of common features.
Class Inclusion The understanding, more advanced than simple classification, that some classes or sets of objects are also sub-sets of a larger class. (E.g. there is a class of objects called dogs. There is also a class called animals. But all dogs are also animals, so the class of animals includes that of dogs)
Read more: Piaget's developmental theory http://www.learningandteaching.info/learning/piaget.htm#ixzz1CipJeXyZ
OOP is a bit of a mixed bag of features that various languages implement in slightly different ways. There is no single formal definition of OOP but a number of people have tried to describe OOP based on the common features of languages that claim to be object oriented. From Wikipedia:
Benjamin C. Pierce and some other researchers view as futile any attempt to distill OOP to a minimal set of features. He nonetheless identifies fundamental features that support the OOP programming style in most object-oriented languages:
Dynamic dispatch – when a method is invoked on an object, the object itself determines what code gets executed by looking up the method at run time in a table associated with the object. This feature distinguishes an object from an abstract data type (or module), which has a fixed (static) implementation of the operations for all instances. It is a programming methodology that gives modular component development while at the same time being very efficient.
Encapsulation (or multi-methods, in which case the state is kept separate)
Subtype polymorphism
object inheritance (or delegation)
Open recursion – a special variable (syntactically it may be a keyword), usually called this or self, that allows a method body to invoke another method body of the same object. This variable is late-bound; it allows a method defined in one class to invoke another method that is defined later, in some subclass thereof.
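Two of the features in the list above, dynamic dispatch and open recursion, can be seen in a few lines of Python (the `Shape` and `Circle` classes are made up for the example). `describe` is written once in the base class, yet the `self.name()` call inside it is late-bound and reaches a method defined later in a subclass:

```python
class Shape:
    def name(self) -> str:
        return "shape"

    def describe(self) -> str:
        # Open recursion: self.name() is looked up on the actual object
        # at run time, not fixed to Shape.name.
        return f"I am a {self.name()}"


class Circle(Shape):
    def name(self) -> str:
        return "circle"


base_text = Shape().describe()
circle_text = Circle().describe()
```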
Abadi and Cardelli have written A Theory Of Objects, you might want to look into that. Another exposition is given in the venerable TAPL (IIRC, they approach objects as recursive records in a typed lambda calculus). I don't really know much about this stuff.
One formal definition I've run into for strongly defining and constraining subtyping is the Liskov Substitution Principle. It is certainly not all of object-oriented programming, but yet it might serve as a link into the formal foundations in development.
I'd check out wikipedia's page on OO http://en.wikipedia.org/wiki/Object-oriented_programming It's got the principles and fundamentals and history.
My understanding is that it was an evolutionary progression of features and ideas in a variety of languages that finally came together with the push in the 90's for GUIs going mainstream. But I could be horribly wrong :-D
Edit: What's even more interesting is that people still argue about "what makes an OO language OO". I'm not sure the feature set that defines an OO language is even generally agreed upon.
The history (simplified) goes like this:
First came spaghetti code.
Then came procedural code (like C and Pascal).
Then came modular code (like in Modula).
Then came object-oriented code (like in Smalltalk).
What's the purpose of object-oriented programming?
You can only understand it if you recall the history.
At first, code was simply a sequence of instructions given to the computer (literally in binary representation).
Then came the macro assemblers, with mnemonics for instructions.
Then people noticed that sometimes the same code is repeated in several places.
So they created GOTO. But GOTO (or branch, or jump, etc.) cannot return to where it was called from, cannot give direct return values, and cannot accept formal parameters (you had to use global variables).
To solve the first problem, people created subroutines (GOSUB-like): groups of instructions that could be called repeatedly and return to where they were called from.
Then people noticed that routines would be more useful if they had parameters and could return values.
For this they created functions, procedures and calling conventions. Those abstractions were built on top of an abstraction called the stack.
The stack allows formal parameters, return values and something called recursion (direct or indirect).
With the stack and the ability for a function to be called arbitrarily (even indirectly) came procedural programming, solving the GOTO problem.
But then came the large projects, and the necessity to group procedures into logical entities (modules).
That's where you will understand why object-oriented programming evolved.
When you have a module, you have module local variables.
Think about this :
Module MyScreenModule;
Var X, Y : Integer;
Procedure SetX(value : Integer);
Procedure SetY(value : Integer);
End Module.
There are X and Y variables that are local to that module. In this example, X and Y hold the position of the cursor. But let's suppose your computer has more than one screen. What can we do now? X and Y alone aren't able to hold the X and Y values of all the screens you have. You need a way to INSTANTIATE that information. That's where the jump from modular programming to object-oriented programming happens.
In a non-object-oriented language you would usually do:
Var Screens : Array of Record X, Y : Integer End;
And then pass an index value to each module call:
Procedure SetX(ScreenID : Integer; X : Integer);
Procedure SetY(ScreenID : Integer; Y : Integer);
Here ScreenID refers to which of the multiple screens you are talking about.
Object orientation inverts the relationship. Instead of a module with multiple data instances (effectively what ScreenID does here), you make the data the first-class citizen and attach code to it, like this:
Class MyScreenModule;
Field X, Y : Integer;
Procedure SetX(value : Integer);
Procedure SetY(value : Integer);
End Class.
It's almost the same thing as a module!
Now you instantiate it by providing an implicit pointer to an instance, like:
ScreenNumber1 := New MyScreenModule;
And then proceed to use it :
ScreenNumber1::SetX(100);
You have effectively turned your modular programming into multi-instance programming, where the variable holding the module pointer itself differentiates each instance. Got it?
It's an evolution of the abstraction.
So now that you have multiple instances, what's the next level?
Polymorphism, etc. The rest is pretty standard object-oriented lessons.
What's the point? The point is that object orientation (like procedures, subroutines, etc.) did not evolve from a theoretical standpoint but from the practice of many coders working over decades. It's an evolution of computer programming, a continual evolution.
IMO a good example of what makes a successful OO language can be found by comparing the similarities between Java and C#. Both are extremely popular and very similar. Though I think the general idea of a minimal OO language can be found by looking at Simula 67. I believe the general idea behind object-oriented programming is that it makes it seem like the computer thinks more like a human; this is supported by things like inheritance (both class "mountain bike" and class "road bike" belong to the parent class "bicycle", and have the same basic features). Another important idea is that objects (which can be executable lines of code) can be passed around like variables, effectively allowing the program to edit itself based on certain criteria (although this point is highly arguable, I cite the ability to change every instance of an object based on one comparison). Another point to make is modularity. As entire programs can effectively be passed around like a variable (because everything is treated as an object), it becomes easier to modify large programs by simply modifying the class or method being called, without ever having to modify the main method. Because of this, expanding the functionality of a program can become much simpler. This is why web businesses love languages like C# and Java (which are fully fledged OO). Gaming companies like C++ because it gives them some of the control of an imperative language (like C) and some of the features of object orientation (like C# or Java).
Object-oriented is a bit of a misnomer and I don't really think classes or type systems have much to do with OO programming. Alan Kay's inspiration was biological and what's really going on that matters is communication. A better name would be message-oriented programming. There is plenty of theory about messaging, for example Pi calculus and the actor model both have rigorous mathematical descriptions. And that is really just the tip of the iceberg.
What about Petri nets? An object might be a place, a composition an arc, and messages tokens. I have not thought about it very thoroughly, so there might be some flaws I am not aware of, but you can investigate: there is a lot of theoretical work related to Petri nets.
I found this, for example:
http://link.springer.com/book/10.1007%2F3-540-45397-0
Readable PDF: http://www.informatik.uni-hamburg.de/bib/medoc/M-329.pdf
In 2000, in my degree thesis, I proposed this model; very briefly:
y_{t+1} = f(u_t, x_t)
x_{t+1} = g(u_t, x_t)
where:
u: input
y: output
x: state
f: output function
g: state function
Formally it's very similar to a finite state machine, but the difference is that the sets U, Y and S are not finite but countably infinite, and f and g are Turing machines (TM).
f and g together form a class; if we add an initial state x0 we have an object. So OOP is something more than a TM; a TM is a specific case of OOP. Note that the state x is at a different level than the state "inside" the TM. Inside the TM there are no side effects; the state x accounts for the side effects.
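A toy sketch of this model in Python, under my reading that each step computes both the output and the next state from the current input u and the current state x. The `Obj` type, the `step` function and the accumulator example are made up for illustration; f and g are ordinary pure functions standing in for the Turing machines:

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Obj:
    f: Callable[[int, int], int]  # output function: (u, x) -> y
    g: Callable[[int, int], int]  # state function:  (u, x) -> next x
    x: int                        # current state


def step(obj: Obj, u: int) -> Tuple[int, Obj]:
    # f and g themselves have no side effects; all change over time
    # is carried by the state x handed to the next step.
    return obj.f(u, obj.x), Obj(obj.f, obj.g, obj.g(u, obj.x))


# f and g together play the role of the class; adding x0 = 0 gives an object.
accumulator = Obj(f=lambda u, x: u + x, g=lambda u, x: u + x, x=0)

y1, acc1 = step(accumulator, 5)
y2, acc2 = step(acc1, 3)
```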