What would be the impediments to creating a "Europanto"-type universal scripting language?

After switching back and forth between several scripting languages this week, I found myself thinking how similar they all are. Yet I'm always reaching for Google (or nowadays SO) to remember details like what the local equivalents of "instanceof" and "endswith" are, or the right syntax to declare an interface, or whatever.
This reminded me of the (human) language Europanto. Just pick some vaguely English syntax and some vaguely Romance/Germanic/Slavic vocabulary, and it's all good!
So what would happen if we tried to do the same thing with a scripting language? In the mood for Python-style indented blocks today? Fine! Want to use a prototype object? OK! Can only remember how to spell the PHP name of some library function? No problem!
Anyway, that's the wild and crazy idea. Since we need a question that admits concrete answers, let's tighten it up like this:
What would be the most significant conflicts in creating a scripting language that permitted all the native syntax and library functions of [Python, Ruby, PHP, Perl, shell, and JavaScript], such that you could freely intermix code blocks and function names between languages?
And let's say that any particular construction should be consistent at the statement level. So we'll allow:
foreach( $foo as $bar )
{
    if $foo == 2:
        print "hi"
}
but not, say,
foreach( $foo as $bar )
{
    if $foo == 2:
        print "hi"
    endif
end
Conflicts can include: parser ambiguities; name collision; conflicting semantics for objects or functions or closures; etc. I'm guessing that scope will be a ginormous issue, but you tell me.
I'll start this as "community wiki" from the get go, so if you think it's a fun question but want to make it more rigorous, feel free to edit.

I would suggest that the main problem is recognising what the syntax of each statement is supposed to be.
In any case, what is the point? Almost all scripting languages have facilities to do much the same things, which is why people tend to master one that they use consistently, and stick with it.

The main difficulty would be allowing people to maintain it. With a well-defined language there is only one way to print and only one way to get at sys.argv. Once you allow multiple syntaxes, there is no sane way to search for every use of sys.argv in the code base you have.

At the syntactic level, the only problem I can see would be detecting which block is written in which syntax, then separating the blocks and parsing each with its own parser. Of course, with very small statements there could be ambiguity as to which language they belong to, and you could argue that it doesn't matter; but it may well be that the same string of characters does different things in different languages, so this could be a subtle issue.
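To make the "same characters, different meaning" point concrete, here is a tiny illustration of my own; the Python part is runnable, and the other languages' behaviour is described in comments.

# In JavaScript, "5" + 2 evaluates to the string "52";
# in PHP, "5" + 2 evaluates to the number 7;
# in Python, the very same expression is a type error.
try:
    print("5" + 2)
except TypeError as err:
    print("Python refuses:", err)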
At the API level you would have lots of different methods of doing the same thing, but in subtly different ways or covering different subsets of it. For example, there might be no way of doing Java's string.startsWith() in, say, PHP, so you would do something different; or no way of doing PHP's strstr() (which returns the part of the string from the found needle to the end), so you would implement something different for that, or even think about the problem differently. You would then have to keep all those different API methods for doing the same things, and that would be a huge API to implement, support and (God forbid) learn.
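As a rough sketch of that divergence (my own stand-ins in Python, reproducing only the behaviours named above, not any language's actual library):

def starts_with(haystack, needle):
    # The behaviour of Java's string.startsWith() / Python's str.startswith().
    return haystack.startswith(needle)

def strstr(haystack, needle):
    # PHP-style strstr(): the part of haystack from the first occurrence
    # of needle to the end, or None if needle is not found.
    idx = haystack.find(needle)
    return haystack[idx:] if idx != -1 else None

print(starts_with("hello world", "hello"))  # True
print(strstr("hello world", "o wo"))        # o world

A hybrid language would have to carry every one of those spellings around at once.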
At the wetware level, code written by others would be totally unreadable unless you knew a ton of languages and their subtle differences. I think it is difficult enough to learn a single programming language down to the smallest details, so it is not practical at all to create this kind of Frankensteinish beast. I can think of one exception: use as an algorithm description language, something already done informally in universities all over the world, where a teacher takes some language of their liking and makes the code as readable as possible for a human, without ever needing to implement a parser for it.
As a side note, I think this kind of system could be implemented with the least effort by somehow utilizing .NET's CLR, where you have a ton of different languages each compiling to the same bytecode and accessing the same variables and such. All you'd need to do is split the code into clusters by language, compile each cluster with its respective compiler, and then merge the bytecode, somehow making sure they all point to the same variables and functions when the same names are mentioned across the different languages.

I have begun to see that syntax is but one property of a language. And most of them look like C to me. The purpose of a language (object oriented, strong typing, etc) is something else again. It starts to look like syntax is not the most important aspect.
I went and read the wikipedia entry...
Europanto is a linguistic jest presented as a "constructed language" with a hodge-podge vocabulary
"Hodge-podge" sounds like the way Perl has been described to me!

I found a rather detailed discussion of closures in Ruby. It sounds like getting Ruby's behavior to coexist with JavaScript's or Python's would require some kind of ugly disambiguation.
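As a small, concrete illustration (my own, not from that discussion) of the kind of behaviour that would need pinning down: even Python's own closure capture rules surprise people, and a hybrid language would have to pick one semantics and disambiguate the rest.

# Python closures capture the variable, not its value, so every closure
# below ends up seeing the final value of i.
funcs = [lambda: i for i in range(3)]
print([f() for f in funcs])   # [2, 2, 2], not [0, 1, 2]

# Forcing capture by value (via a default argument) changes the result.
funcs = [lambda i=i: i for i in range(3)]
print([f() for f in funcs])   # [0, 1, 2]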
If anybody were to add Perl to the list of languages to be covered, I think its lexical scoping rules would present a related problem?

Any Data Structures & Algorithms books w/ examples in Objective-C or other Keyword Message Language?

I've tried searching for Data Structure / Algorithms books that provide examples in either Objective-C, or another language supporting keyword message syntax, to no avail.
The reason I'm interested in this is that I really think keyword syntax would help me understand the intent of code, which I find I have to think about for longer in languages with typical function-call syntax.
A good example is this snippet from a SplayTree implementation in C:
/* Continue down the tree. */
n = splay_tree_splay_helper (sp, key, next, node, parent);
The function name is pretty unhelpful, and even with the comment I have to thoroughly read the code to have any idea what's really happening there.
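A hypothetical sketch of the difference (all names here are invented for illustration), using Python keyword arguments as a rough stand-in for keyword messages:

# In Objective-C the same call might read something like
#   [tree splayAroundKey:key startingFrom:next towardNode:node parent:parent]
# (selector invented for illustration). Python keyword arguments give a
# similarly self-documenting call site:

def splay_helper(tree, key, start, node, parent):
    # Stub standing in for splay_tree_splay_helper; the real logic is omitted.
    return (tree, key, start, node, parent)

n = splay_helper(tree="sp", key=42, start="next", node="n1", parent="root")
print(n)

Each argument now says what role it plays, which is the readability I'm after.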
I know that technically any piece of C code is valid Objective-C, but I'm looking for something that structures algorithm implementations using a good object model like Objective-C's, since I believe the resulting code is more maintainable. This may seem counter-intuitive in the performance-restricted space of algorithm design, but I've seen plenty of algorithms books with examples in idiomatic Ruby, Python, JavaScript, etc.
Basically I'm looking for anything with a good object model that allows for very descriptive keyword messages, whether it's Objective-C or even (though probably unlikely) anything else in the Smalltalk family.
Why would you want a book? Just download a Smalltalk environment and read the actual source in full. Open a system browser, select one of the Collections categories (a category is a collection of classes) and start browsing the code (the extra column is for message categories). Open a workspace, type Object cmd-B (or ctrl-B, for browse) and see for yourself why the single responsibility principle was invented. Navigate through the code with hierarchy, senders and implementors.
I think you are looking for the wrong thing.
A good algorithms and data structures books will try to not waste your time with hard to read source code. Most of the good books I know spend most of their time explaining things at a high level and only show actual code in small snippets that can be easily understood independently of the language used and how proficient you are with it.
It doesn't matter how convoluted some guy's implementation of splay trees is. As long as you know what a splay tree is, you should be able to implement your own version without looking at it too much.
And finally, a good object model and nice syntax are not the be-all and end-all. Many data structures make use of union types that are not expressed very nicely in OO style, and naming patterns and syntax are things you should be able to get used to very quickly.

What is the point of the lower camel case variable casing convention (thisVariable, for example)?

I hope this doesn't get closed due to being too broad. I know it comes down to personal preference, but there is an origin to all casing conventions and I would like to know where this one came from and a logical explanation as to why people use it.
It's where you go all like var empName;. I call that lower camel, although it's probably technically called something else. Personally, I go like var EmpName. I call that proper camel and I like it.
When I first started programming, I began with the lower camel convention. I didn't know why. I just followed the examples set by all the old guys. Variables and functions (VB) got lower camel while subs and properties got proper camel. Then, after I finally acquired a firm grasp on programming itself, I became comfortable enough to question the tactics of my mentors. It didn't make logical sense to me to use lower camel because it wasn't consistent, especially if you have a variable that consists of one word which ends up being in all lowercase. There is also no validation mechanism in place to make sure you are appropriately using lower vs. upper camel, so I asked why not just use proper camel for everything. It's consistent since all variable names are subject to proper camelization.
Having dug deeper into it, it turns out that this is a very sensitive issue to many programmers when it is brought to question. They usually answer with, "Well, it's just personal preference" or "That's just how I learned it". Upon prodding further, it usually invokes a sort of dogmatic reaction with the person as I attempt to find a logical reason behind their use of lower camel.
So anyone want to shed a little history and logic behind casing of the proper camelatory variety?
It's a combination of two things:
The convention of variables starting with lower case, to differentiate from classes or other entities which use a capital. This is also sometimes used to differentiate based on access level (private/public)
CamelCasing as a way to make multi-word names more readable without spaces (of course this is a preference over underscore, which some people use). I would guess the logic is that CamelCasing is easier/faster for some to type than word_underscores.
Whether or not it gets used is of course up to whoever is setting the coding standards that govern the code being written. Underscores vs CamelCase, lowercasevariables vs Uppercasevariables. CamelCase + lowercasevariable = camelCase
In languages like C# or VB, the standard is to start private things with lowercase and public/protected things with uppercase. This way, just by looking at the first letter, you can tell whether the thing you are messing with could be used by other classes, and thus whether any changes need more scrutiny. Also, there are tools to enforce naming conventions like this; the one created and used internally at Microsoft is called StyleCop and is available as a free download.
Historically, well named variables in C (a case-sensitive language) consisted of a single word in lower case. UPPERCASE was reserved for macros.
Then along came C++, where classes are usually CapitalizedAndCamelCased, and variables/functions consisting of several words are camelCased. (Note that C people tend to dislike camelCase, and instead write identifiers_this_way.)
From there, it spread.
And, yes, probably other case-sensitive languages have had some influence.
lowerCamelCase, I think, became popular because of Java and JavaScript.
In Java the convention is explicitly defined: a method name should start with a lowercase verb, and each remaining word should start with a capital letter.
Why Java chose lowerCamelCase, I think, depends on what it wanted to solve. Java was launched in 1995 as a language that would make programming easy, while C/C++, which were widely used at the time, were often considered difficult and too technical.
This was something Java claimed to solve: more people would be able to program, and the same code would work on different hardware. The code was the documentation; you didn't need to comment it, just read it, and everything would be great.
lowerCamelCase makes it harder to write "technical" code because it removes the option of using uppercase and lowercase letters to describe the code more precisely from a technical perspective. Java didn't want to be hard; Java was the language where everyone could learn to program.
JavaScript in browsers was created in ten days by Brendan Eich in 1995. Why JavaScript chose lowerCamelCase is, I think, because of Java: it has nothing to do with Java, but it has "Java" in its name.

Common variable names in different languages

I see a lot of different styles of variable names used in different kinds of languages. Sometimes these names are lowercase with underscores (e.g. test_var) and other times I see variables like testVar.
Is there a specific reason why programmers use different variable name styles in different languages?
It's really just the convention for that programming language.
For example, most Java programs use camel casing (testVar), while a lot of C programs use underscores to separate words (test_var).
It's completely the choice of the programmer, but most languages have "standard" naming conventions.
As Wikipedia says:
Reasons for using a naming convention (as opposed to allowing programmers to choose any character sequence) include the following:
to reduce the effort needed to read and understand source code;
to enhance source code appearance (for example, by disallowing overly long names or abbreviations).
Also, there are coding conventions in companies that care about the readability of their code. This simplifies code sharing between programmers, so nobody has to spend time working out what variable names like "aaa" and "bbb" mean.
There is no real reason. Each language and sometimes even platform can have varying naming conventions.
For instance, in .NET you would see TestVar if it were a public class variable. In C++, testVar would probably be opted for. In Ruby, test_var, and so on. It's just a matter of preference on the part of the community and/or creators.
I urge you to follow language standards. I work on a team that has had many developers working on the code over the years, and very few standards have been followed. The majority of our code is nearly unreadable. I have been working on a standardization project for the last several months. It has been very difficult to enforce and get buy-in. I'm hopeful that people will come around as they start seeing the benefits of easy to read code.
For naming conventions/standards, keep this in mind:
Follow team/company standards
Follow language standards
Follow the style that the program is already using
Do whatever you want (not really: if you don't have standards, follow your language's standards/conventions)

Why is Clojure dynamically typed?

One thing I like very much is reading about different programming languages. Currently, I'm learning Scala but that doesn't mean I'm not interested in Groovy, Clojure, Python, and many others. All these languages have a unique look and feel and some characteristic features. In the case of Clojure I don't understand one of these design decisions. As far as I know, Clojure puts great emphasis on its functional paradigm and pretty much forces you to use immutable "variables" wherever possible. So if half of your values are immutable, why is the language dynamically typed?
The Clojure website says:
First and foremost, Clojure is dynamic. That means that a Clojure program is not just something you compile and run, but something with which you can interact.
Well, that sounds completely strange. If a program is compiled, you can't change it anymore. Sure, you can "interact" with it (that's what UIs are for), but the website certainly doesn't mean a neat "dynamic" GUI.
How does Clojure benefit from dynamic typing?
I mean the special case of Clojure and not general advantages of dynamic typing.
How does the dynamic type system help improve functional programming?
Again, I know the pleasure of not spilling "int a;" all over the source code, but type inference can ease a lot of that pain. Therefore I would just like to know how dynamic typing supports the concepts of a functional language.
If a program is compiled you can't change it anymore.
This is wrong. In image-based systems, like Lisp (Clojure can be seen as a Lisp dialect) and Smalltalk, you can change the compiled environment. Development in such a language typically means working on a running system, adding and changing function definitions, macro definitions, parameters etc. (adding means compiling and loading into the image).
This has a lot of benefits. For one, all the tools can interact directly with the program and do not need to guess at the system's behaviour. You also do not have any long compilation pauses, because each compiled unit is very small (it is very rare to recompile everything). The NASA JPL once corrected a running Lisp system on a probe hundreds of thousands of kilometres away in space.
For such a system, it is very natural to have type information available at runtime (that is what dynamic typing means). Of course, nothing hinders you from also doing type inference and type checks at compilation time. These concepts are orthogonal. Modern Lisp implementations typically can do both.
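A loose analogy of my own, in Python (itself a dynamic language with a running interpreter you can talk to): definitions can be replaced while the program is running, and type information is there at runtime.

def greet(name):
    return "Hello, " + name

print(greet("world"))          # Hello, world
print(type(greet("world")))    # <class 'str'> -- types are known at runtime

# "Patch" the running session by rebinding the name; everything that
# calls greet from now on sees the new definition, with no restart.
def greet(name):
    return "Hi, " + name.upper()

print(greet("world"))          # Hi, WORLD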
Well, first of all, Clojure is a Lisp, and Lisps have traditionally always been dynamically typed.
Second, as the excerpt you quoted says, Clojure is a dynamic language. This means, among other things, that you can define new functions at runtime, evaluate arbitrary code at runtime, and so on. All of these things are hard or impossible to do in statically typed languages (without plastering casts all over the place).
Another reason is that macros might complicate debugging type errors immensely. I imagine that generating meaningful error messages for type errors produced by macro-generated code would be quite a task for the compiler.
I agree, a purely functional language can still have an interactive read-eval-print-loop, and would have an easier time with type inference. I assume Clojure wanted to attract lisp programmers by being "lisp for the jvm", and chose to be dynamic like other lisps. Another factor is that type systems need to be designed as the very first step of the language, and it's faster for language implementors to just skip that step.
(I'm rephrasing the original answer since it generated too much misunderstanding)
One of the reasons to keep Clojure (and any Lisp) dynamically typed is to simplify the creation of macros. In short, macros deal with abstract syntax trees (ASTs), which can contain nodes of many, many different types (usually, any objects at all). In theory, it's possible to build a fully statically typed macro system, but in practice such systems are usually limited and not widely adopted. Please see the examples below and the extended discussion in the thread.
EDIT 2020: Wow, nine years have passed since I posted this answer, and people still add comments. What a legacy we all have left!
Some people noted in the comments that having a statically typed language doesn't prevent you from expressing code as a data structure. And, strictly speaking, that's true: union types let you express data structures of any complexity, including the syntax of a language. However, I claim that to express the syntax you must either reduce expressiveness or use unions so wide that you lose all the advantages of static typing. To support this claim I will use another language, Julia.
Julia is optionally typed: you can constrain any function argument or struct field to a particular type, and Julia will check it. The language supports ASTs as first-class citizens using the Expr and Symbol types. The expression definition looks something like this:
struct Expr
    head::Symbol
    args::Vector{Any}
end
An expression consists of a head, which is always a symbol, and a list of arguments, which may have any type. Julia also supports a special Union type which can constrain the arguments to specific types, e.g. Symbols and other Exprs:
struct Expr
    head::Symbol
    args::Vector{Union{Symbol, Expr}}
end
Which is sufficient to express e.g. :(x + y):
dump(:(x + y))

Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol +
    2: Symbol x
    3: Symbol y
But Julia also supports a number of other types in expressions. One obvious and helpful example is literals:
:(x + 1)
Moreover, you can use interpolation, or construct expressions manually, to put any object into the AST:
obj = create_some_object()
ex1 = :(x + $obj)
ex2 = Expr(:call, :+, :x, obj)
These examples are not just funny experiments; they are actively used in real code, especially in macros. So you cannot constrain expression arguments to a specific union of types: expressions may contain any values.
Of course, when designing a new language you can put any restrictions on it. Perhaps restricting Expr to contain only Symbols, Exprs and some literals would be useful in some contexts. But it goes against the principles of simplicity and flexibility of both Julia and Clojure, and would significantly reduce the usefulness of macros.
Because that's what the world/market needed. No sense in building what's already built.
I hear the JVM already has a statically typed language ;)

Basic F# questions: mutability, capitalization standards, functions vs. methods

Feel free to point me to other answers if these have already been asked!
I'm just starting F# with the new release this month. I've got some background in both OO and functional languages (Haskell and Scheme, but not OCaml/ML). A couple of questions have arisen so far from reading through the little tutorial thing that comes with the F# CTP.
1) Are mutable variables preferred over monads? If so, are monads entirely shunned in F#?
2) I'm a tad confused by the capitalization being used. In this tutorial code file, sometimes functions start with a lowercase letter, and sometimes uppercase. I know MS tends to like initial caps with functions and methods, but here it seems like there are two ways of doing it. It's not a huge deal to me, as I'm just playing around on my own time, but I am curious what the standard is.
3) I'm pretty confused about this whole combination of OO and functional styles. print_string "string" makes sense, but then here is List.map fn list (unless List is just the namespace, forgive me if so). Then here is str.Length. Anyone care to elucidate when to use what, and which is preferred?
Thanks!
Regarding mutability: F# allows you to be pragmatic. I rarely prefer a state monad to a mutable/ref, but you can do whichever you like. Monads are not shunned, but I think people tend to only use them when they're a clear-cut win (e.g. async programming).
Regarding naming: there is a tension in the fact that 'functions are values' means you might choose to name a let-bound function with a capital letter (because it's a function, and functions begin with capitals) or with a lower-case letter (because it's a let-bound value (that just happens to have an '->' in its type name)). Personally I prefer to always use upper-case names for all functions, but you'll see both styles (especially since the style of the F# library itself has been slowly evolving/standardizing over the past year or two). Underscores seem to be shunned throughout .Net, and the F# library is no longer an exception (there are a few names left that use underscores, but they stand out now like a sore thumb and will probably be changed).
Regarding function style: I am unclear what you are asking. In the case of List.map, 'List' is the name of an F# module, and 'map' is a function in that module. Member functions (e.g. str.Length) have the advantage of being commonly used throughout .Net, and provide a nice intellisense experience in the editor.
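A rough Python analogy of my own for that split between module functions and member functions (purely illustrative, not F#):

import math

s = "hello"
print(len(s))          # free-standing function applied to a value
print(s.upper())       # member-style call on the value itself
print(math.sqrt(9.0))  # qualified module function, much like List.map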
1) I wouldn't go so far as to say monads are shunned... You mentioned you have some background with Haskell, so F# workflows are what you want to look into (the term "workflow" may be confusing, but these have only a tiny bit to do with business-process stuff). In general, sequence expressions and, more generally, computation expressions (a.k.a. workflows) are going to be close to monads. That said, mutable is pretty common, though I'm not sure "preferred" would be the way to express it; they each have a place. Personally, I started with two books: 'Foundations of F#', but if you're looking to dive in, go with 'Expert F#'. Both are good; honestly, I needed Foundations to help me get started.
2) In my experience, the confusion is that traditional .NET functions follow a naming convention that doesn't really lend itself to functional programming. As such, I feel the confusion in F# can sometimes come from looking at usages of traditionally named .NET functions versus elements of F# that are clearly functional. For example, in the book I mention above, 'Expert F#', they mention that you'll see let-bound values such as List.map and Dates.Today in both camelCase and PascalCase (PascalCase being the more traditional .NET style). A good rule of thumb is that if you're staying in the functional world, use the more traditional functional (camelCase) naming; however, if what you're making is expected to be used by other .NET languages, go with the .NET norm (PascalCase). Also note that the functional world has a much higher tolerance for abbreviation (itr, tbl, etc.), whereas .NET in general has moved away from this. So again, you'll see varying degrees of this sort of thing depending on whether you're calling functional elements or elements exposed in F# that are shared across the whole of .NET.
3) I agree that the combination of OO and functional styles can be confusing. Again, I'm not sure which is preferred, beyond saying that F# (being functional) clearly styles itself in terms of the functional paradigm. However, F# is a functional language in the .NET world (notice the relative confusion of naming I mention above), so again, it's not totally clear cut.
Hope this helps... Believe it or not, as you use F# and think of other languages that share concepts (C++ with C, for example), it gets easier. Personally, the naming began to make sense and the concepts clicked once I understood that F# is a functional language operating on a traditional platform, made to interoperate (though I'm not sure calling the interoperation seamless would be appropriate :D).
In general, monads ("workflows" in F#) are much rarer than in Haskell, firstly because mutable variables are available, so it's unlikely that people would choose a state monad instead. Secondly, there are no higher-kinded type variables, which means you can't write code that is overloaded to work over any monad, and that makes using them much less attractive.
One place that they are commonly used is in sequence expressions (which are really like list comprehensions in Haskell, but those are closely related to the list monad).