Why is Clojure dynamically typed? - dynamic

One thing I like very much is reading about different programming languages. Currently, I'm learning Scala but that doesn't mean I'm not interested in Groovy, Clojure, Python, and many others. All these languages have a unique look and feel and some characteristic features. In the case of Clojure I don't understand one of these design decisions. As far as I know, Clojure puts great emphasis on its functional paradigm and pretty much forces you to use immutable "variables" wherever possible. So if half of your values are immutable, why is the language dynamically typed?
The Clojure website says:
First and foremost, Clojure is dynamic. That means that a Clojure program is not just something you compile and run, but something with which you can interact.
Well, that sounds completely strange. If a program is compiled you can't change it anymore. Sure you can "interact" with it, that's what UIs are used for but the website certainly doesn't mean a neat "dynamic" GUI.
How does Clojure benefit from dynamical typing
I mean the special case of Clojure and not general advantages of dynamic typing.
How does the dynamic type system help improve functional programming
Again, I know the pleasure of not spilling "int a;" all over the source code but type inference can ease a lot of the pain. Therefore I would just like to know how dynamic typing supports the concepts of a functional language.

If a program is compiled you can't change it anymore.
This is wrong. In image-based systems, like Lisp (Clojure can be seen as a Lisp dialect) and Smalltalk, you can change the compiled environment. Development in such a language typically means working on a running system, adding and changing function definitions, macro definitions, parameters etc. (adding means compiling and loading into the image).
This has a lot of benefits. For one, all the tools can interact directly with the program and do not need to guess at the system's behaviour. You also do not have any long compilation pauses, because each compiled unit is very small (it is very rare to recompile everything). The NASA JPL once corrected a running Lisp system on a probe hundreds of thousands of kilometres away in space.
For such a system, it is very natural to have type information available at runtime (that is what dynamic typing means). Of course, nothing hinders you from also doing type inference and type checks at compilation time. These concepts are orthogonal. Modern Lisp implementations typically can do both.

Well first of all Clojure is a Lisp and Lisps traditionally have always been dynamically typed.
Second as the excerpt you quoted said Clojure is a dynamic language. This means, among other things, that you can define new functions at runtime, evaluate arbitrary code at runtime and so on. All of these things are hard or impossible to do in statically typed languages (without plastering casts all over the place).
Another reason is that macros might complicate debugging type errors immensely. I imagine that generating meaningful error messages for type errors produced by macro-generated code would be quite a task for the compiler.

I agree, a purely functional language can still have an interactive read-eval-print-loop, and would have an easier time with type inference. I assume Clojure wanted to attract lisp programmers by being "lisp for the jvm", and chose to be dynamic like other lisps. Another factor is that type systems need to be designed as the very first step of the language, and it's faster for language implementors to just skip that step.

(I'm rephrasing the original answer since it generated too much misunderstanding)
One of the reasons to keep Clojure (and any Lisp) dynamically typed is to simplify creation of macros. In short, macros deal with abstract syntax trees (ASTs) which can contain nodes of many, many different types (usually, any objects at all). In theory, it's possible to make full statically typed macro system, but in practice such systems are usually limited and sparsely spread. Please, see examples below and extended discussion in the thread.
EDIT 2020: Wow, 9 years passed from the time I posted this answer, and people still add comments. What a legacy we all have left!
Some people noted in comments that having a statically typed language doesn't prevent you from expressing code as data structure. And, strictly speaking, it's true - union types allow to express data structures of any complexity, including syntax of a language. However I claim that to express the syntax, you must either reduce expressiveness, or use such wide unions that you lose all advantages of static typing. To prove this claim I will use another language - Julia.
Julia is optionally typed - you can constrain any function or struct field to have a particular type, and Julia will check it. The language supports AST as a first class citizen using Expr and Symbol types. Expression definition looks something like this:
struct Expr
head::Symbol
args::Vector{Any}
end
Expression consists of a head which is always a symbol and list of arguments which may have any types. Julia also supports special Union which can constrain argument to specific types, e.g. Symbols and other Exprs:
struct Expr
head::Symbol
args::Vector{Union{Symbol, Expr}}
end
Which is sufficient to express e.g. :(x + y):
dump(:(x + y))
Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol +
2: Symbol x
3: Symbol y
But Julia also supports a number of other types in expressions. One obvious and helpful example is literals:
:(x + 1)
Moreover, you can use interpolation or construct expressions manually to put any object to AST:
obj = create_some_object()
ex1 = :(x + $objs)
ex2 = Expr(:+, :x, obj)
These examples are not just a funny experiments, they are actively used in real code, especially in macros. So you cannot constrain expression arguments to a specific union of types - expressions may contain any values.
Of course, when designing a new language you can put any restrictions on it. Perhaps, restricting Expr to contain only Symbol, Expr and some Literals would be useful in some contexts. But it goes against principles of simplicity and flexibility in both - Julia and Clojure, and would significantly reduce usefulness of macros.

Because that's what the world/market needed. No sense in building what's already built.
I hear the JVM already has a statically typed language ;)

Related

Are there any interpreted languages in which you can dynamically modify the interpreter?

I've been thinking about this writing (apparently) by Mark Twain in which he starts off writing in English but throughout the text makes changes to the rules of spelling so that by the end he ends up with something probably best described as pseudo-German.
This made me wonder if there is interpreter for some established language in which one has access to the interpreter itself, so that you can change the syntax and structure of the language as you go along. For example, often an if clause is a keyword; is there a language that would let you change or redefine this on the fly? Imagine beginning a console session in one language, and by the end, working in another.
Clearly one could write an interpreter and run it, and perhaps there is no concrete distinction between doing this and modifying the interpreter. I'm not sure about this. Perhaps there are limits to the modifications you can make dynamically to any given interpreter?
These more open questions aside, I would simply like to know if there are any known interpreters that allow this at all? Or, perhaps, this ability is just a matter of extent and my question is badly posed.
There are certainly languages in which this kind of self-modifying behavior at the level of the language syntax itself is possible. Lisp programs can contain macros, which allow among other things the creation of new control constructs on the fly, to the extent that two Lisp programs that depend on extensive macro programming can look almost as if they are written in two different languages. Forth is somewhat similar in that a Forth interpreter provides a core set of just a dozen or so primitive operations on which a program must be built in the language of the problem domain (frequently some kind of real-world interaction that must be done precisely and programmatically, such as industrial robotics). A Forth programmer creates an interpreter that understands a language specific to the problem he or she is trying to solve, then writes higher-level programs in that language.
In general the common idea here is that of languages or systems that treat code and data as equivalent and give the user just as much power to modify one as the other. Every Lisp program is a Lisp data structure, for example. This is in contrast to a language such as Java, in which a sharp distinction is made between the program code and the data that it manipulates.
A related subject is that of self-modifying low-level code, which was a fairly common technique among assembly-language programmers in the days of minicomputers with complex instruction sets, and which spilled over somewhat into the early 8-bit and 16-bit microcomputer worlds. In this programming idiom, for purposes of speed or memory savings, a program would be written with the "awareness" of the location where its compiled or interpreted instructions would be stored in memory, and could alter in place the actual machine-level instructions byte by byte to affect its behavior on the fly.
Forth is the most obvious thing I can think of. It's concatenative and stack based, with the fundamental atom being a word. So you write a stream of words and they are performed in the order in which they're written with the stack being manipulated explicitly to effect parameter passing, results, etc. So a simple Forth program might look like:
6 3 + .
Which is the words 6, 3, + and .. The two numbers push their values onto the stack. The plus symbol pops the last two items from the stack, adds them and pushes the result. The full stop outputs whatever is at the top of the stack.
A fundamental part of Forth is that you define your own words. Since all words are first-class members of the runtime, in effect you build an application-specific grammar. Having defined the relevant words you might end up with code like:
red circle draw
That wold draw a red circle.
Forth interprets each sequence of words when it encounters them. However it distinguishes between compile-time and ordinary words. Compile-time words do things like have a sequence of words compiled and stored as a new word. So that's the equivalent of defining subroutines in a classic procedural language. They're also the means by which control structures are implemented. But you can also define your own compile-time words.
As a net result a Forth program usually defines its entire grammar, including relevant control words.
You can read a basic introduction here.
Prolog is an homoiconic language, allowing meta interpreters (MIs) to be declined in a variety of ways. A meta interpreter - interpreting the interpreter - is a common and useful native construct in Prolog.
See this page for an introduction to this argument. An interesting and practical technique illustrated is partial execution:
The overhead incurred by implementing these things using MIs can be compiled away using partial evaluation techniques.

I have heard "dynamic" changing during Runtime? Whats that?

i heard that these(say for example Groovy) languages have the capability of changing the variable name or call methods dynamically in runtime! What you meant by dynamic languages? And what is the real need for changing any values during runtime? Is that doen't lead to confusion, because at runtime if you change any value(or your programming constriants change anything), then whats the need for compilation(because it decides and confirms these values will be used, then there is no meaning of changing it in dynamically)? And i know there should be something useful, so only people have introduced these concepts!
I guess i'm clear about my question! And i need some brief explanation :)
Three points:
1) Yes, that is what is meant by 'dynamic languages' -- that you can add methods to a class at runtime is a common feature of dynamic languages (for example).
2) You bring up a good point that this ability could lead to confusing runtime issues. Proponents of dynamic languages would say the benefits of the feature outway the downsides. Being able to meta program can be very powerful.
3) compile time checks can still help in development with dynamic languages, if nothing more than for syntax checks. However, with dynamic languages, you do lose some of the compile time safety of non-dynamic languages. Note that some dynamic languages are interpreted (e.g. javascript) so its kind of a moot point.

Basic F# questions: mutability, capitalization standards, functions vs. methods

Feel free to point me to other answers if these have already been asked!
I'm just starting F# with the new release this month. I've got some background in both OO and functional languages (Haskell and Scheme, but not OCaml/ML). A couple of questions have arisen so far from reading through the little tutorial thing that comes with the F# CTP.
1) Are mutable variables preferred over monads? If so, are monads entirely shunned in F#?
2) I'm a tad confused by the capitalization being used. In this tutorial code file, sometimes functions start with a lowercase letter, and sometimes uppercase. I know MS tends to like initial caps with functions and methods, but here it seems like there are two ways of doing it. It's not a huge deal to me, as I'm just playing around on my own time, but I am curious what the standard is.
3) I'm pretty confused about this whole combination of OO and functional styles. print_string "string" makes sense, but then here is List.map fn list (unless List is just the namespace, forgive me if so). Then here is str.Length. Anyone care to elucidate when to use what, and which is preferred?
Thanks!
Regarding mutability: F# allows you to be pragmatic. I rarely prefer a state monad to a mutable/ref, but you can do whichever you like. Monads are not shunned, but I think people tend to only use them when they're a clear-cut win (e.g. async programming).
Regarding naming: there is a tension in the fact that 'functions are values' means you might choose to name a let-bound function with a capital letter (because it's a function, and functions begin with capitals) or with a lower-case letter (because it's a let-bound value (that just happens to have an '->' in its type name)). Personally I prefer to always use upper-case names for all functions, but you'll see both styles (especially since the style of the F# library itself has been slowly evolving/standardizing over the past year or two). Underscores seem to be shunned throughout .Net, and the F# library is no longer an exception (there are a few names left that use underscores, but they stand out now like a sore thumb and will probably be changed).
Regarding function style: I am unclear what you are asking. In the case of List.map, 'List' is the name of an F# module, and 'map' is a function in that module. Member functions (e.g. str.Length) have the advantage of being commonly used throughout .Net, and provide a nice intellisense experience in the editor.
1) I wouldn't go so far as to say monads are shunned... You mentioned you have some background with Haskell - so F# workflows are what you want to look into (the term workflow maybe confusing but these only have a tiny bit to do with business process stuff). In general, sequence expressions and more generally computational expressions (a.k.a workflows) are going to be close to monads. That said mutable is pretty common though I'm not sure 'preferred' would be the way to express it. They each have a place - Personally, I started with two books -- 'Foundation of F#' - but if you're looking to dive in - go 'Expert F#' -- both are good. Honestly, I needed Foundations to help me get started.
2) In the experience I've had, the confusion is that traditional functions in .NET have a convention that really doesn't lend itself to functional programming. As such, I feel that in F# you the confusion can sometimes be when you're looking at usage of 'traditional .NET' named functions vs. elements of F# that are clearly functional... For example, in the book I mention above 'Expert F#' - they mention that you'll see let values such as List.map and Dates.Today in both camelCase and PascalCase (Pascal case being a more traditional .NET). A good rule of thumb is that if you're staying in the functional world - use more traditional functional (camelCase) naming - however if what you're making is expected to be used by other .NET languages, go with the more .NET norm (Pascal). Also note in the functional world there is a much higher tolerance for abbreviation (itr, tbl, etc... ) where as .NET in general has went away from this... So again, you'll see a varying degree of this sort of thing based on if you're calling functional elements vs. elements exposed in F# that are shared across the whole of .NET.
3) I agree that the combination of OO and function styles can be confusing. Again, I'm not sure which is preferred, beyond saying that F# (being functional) clearly styles itself in terms of the functional paradigm. However, F# is a functional language in the .NET world (notice the relative confusion of naming I mention above)... So again, it's not totally clear cut.
Hope this helps... Believe it or not, as you use F# and think of other languages that have shared concepts C++ with C (for example) - it gets easier. Personally, the naming began to make sense and the concepts worked as I started to get that I was a F# is a functional language operating on a traditional platform - made to interoperate (though I'm not sure calling the interoperation seamless would be appropriate :D)
In general monads ("workflows" in F#) are much rarer than in Haskell, firstly because mutable variables are available so it's unlikely that people would choose to use a state monad instead. Secondly there are no higher-kinded type variables, which means that you can't write code that is overloaded to work on any monad, which makes using them much less attractive.
One place that they are commonly used is in sequence expressions (which are really like list comprehensions in Haskell, but those are closely related to the list monad).

what would be the impediments to creating an "Europanto" type universal scripting language?

After switching back and forth between several scripting languages this week, I found myself thinking how similar they all are. Yet I'm always reaching for Google (or nowadays SO) to remember details like what the local equivalents of "instanceof" and "endswith" are, or the right syntax to declare an interface, or whatever.
This reminded me of the (human) language Europonto. Just pick some vaguely English syntax and some vaguely Romance/Germanic/Slavic vocabulary, and it's all good!
So what would happen if we tried to do the same thing with a scripting language. In the mood for Python-style indented blocks today? Fine! Want to use a prototype object? Ok! Can only remember how to spell the PHP names of some library function? No problem!
Anyway, that's the wild and crazy idea. Since we need a question that admits concrete answers, let's tighten it up like this:
What would be the most significant conflicts in creating a scripting language that permitted all the native syntax and library functions of [Python, Ruby, PHP, Perl, shell, and JavaScript], such that you could freely intermix code blocks and function names between languages?
And let's say that any particular construction should be consistent at the statement level. So we'll allow:
foreach( $foo as $bar )
{
if $foo == 2:
print "hi"
}
but not, say,
foreach( $foo as $bar )
{
if $foo == 2:
print "hi"
endif
end
Conflicts can include: parser ambiguities; name collision; conflicting semantics for objects or functions or closures; etc. I'm guessing that scope will be a ginormous issue, but you tell me.
I'll start this as "community wiki" from the get go, so if you think it's a fun question but want to make it more rigorous, feel free to edit.
I would suggest that the main problem is recognising what the syntax of each statement is supposed to be.
In any case, what is the point? Almost all scripting languages have facilities to do much the same things, which is why people tend to master one that they use consistently, and stick with it.
The main difficulty would to be to allow people maintain it. With a well defined language you can only print a certain way and do sys.argv a certain way. once you allow multiple syntaxes there is no sane way to search for all the sys.argv in the code base you have.
At the syntactical level the only problem I can see would be to detect which block has which syntax, then separate them and parse them with specific parsers. Of course given very small statements there could be ambiguities as to which language it is and you could argue that it doesn't matter, but it just may be the case, that in different languages the same string of characters does different things so this could be a subtle issue.
At the API level you would have lots of different methods of doing the same thing but in a subtly different way or subset of doing it. So for example you could have no way of doing Java's string.startsWith() in let's say PHP, so you would do something different, or no way of doing PHP's strstr() (which returns a part of the string from the found needle to the end) and you would implement something different for that or even think differently about the problem. Then you would have to have all those different API methods of doing the same things and that would be huge API to implement, support and (god forbid) learn.
At the wetware level the code written by others would be totally unreadable unless you know a ton of languages and their subtle differences. I think it is difficult enough to learn a single programming language to the smallest details and so it is not practical at all to have this kind of frankensteinish beast created. I can think of an exception for use as an algorithm description language which it already is used in universities all over the world, where teacher takes some language of his liking and makes the code as readable as it can be for a human without needing to implement a parser for it.
As a side note I think this kind of system could be implemented at the least effort by somehow utilizing .NET's CLR where you have a ton of different languages each compiling to the same bytecode and accessing the same variables and stuff. All you'd need to do is split the code to clusters of different languages, then compile them separately on their respective compilers and then just merge the bytecode and somehow make sure they all point to the same variables and functions when mentioning the same names across the different languages.
I have begun to see that syntax is but one property of a language. And most of them look like C to me. The purpose of a language (object oriented, strong typing, etc) is something else again. It starts to look like syntax is not the most important aspect.
I went and read the wikipedia entry...
Europanto is a linguistic jest presented as a "constructed language" with a hodge-podge vocabulary
"Hodge-podge" sounds like the way Perl has been described to me!
I found a rather detailed discussion of closures in Ruby. It sounds like getting Ruby's behavior to coexist with JavaScript's or Python's would require some kind of ugly disambiguation.
If anybody were to add Perl to the list of languages to be covered, I think its lexical scoping rules would present a related problem?

Inversion of Control in Compilers

Has anyone out there actually used inversion of control containers within compiler implementations yet? I know that by design, compilers need to be very fast, but I've always been curious about how IoC/DI could affect the construction of a programming language--hot-swappable syntaxes, anyone?
Lisp-style languages often do this. Reader macros are pieces of user-written code which extend the reader (and hence, the syntax) of a language. Plain-old macros are pieces of user-written code which also extend the language.
The entire syntaxes aren't hot-swappable, but certain pieces are extendable in various ways.
All this isn't a new idea. Before it was deemed worthy of a three-letter acronym, IoC was known as "late binding", and pretty agreed on as a Good Idea.
LR(k) grammars typically use a generic parser system driven by tables (action-goto/shift-reduce tables), so you use a table generator tool which produces these tables and feed them to the generic parser system which then can parse your input using the tables. In general these parser systems then signal you that a non-terminal has been reduced. See for example the GoldParser system which is free.
I wouldn't really call it inversion of control because it's natural for compilers. They are usually a series of passes that transform code in the input language to code in the output language. You can, of course, swap in a different pass (for example, gcc compiles multiple languages by using a different frontend).