Related
I've been thinking about this writing (apparently) by Mark Twain in which he starts off writing in English but throughout the text makes changes to the rules of spelling so that by the end he ends up with something probably best described as pseudo-German.
This made me wonder if there is interpreter for some established language in which one has access to the interpreter itself, so that you can change the syntax and structure of the language as you go along. For example, often an if clause is a keyword; is there a language that would let you change or redefine this on the fly? Imagine beginning a console session in one language, and by the end, working in another.
Clearly one could write an interpreter and run it, and perhaps there is no concrete distinction between doing this and modifying the interpreter. I'm not sure about this. Perhaps there are limits to the modifications you can make dynamically to any given interpreter?
These more open questions aside, I would simply like to know if there are any known interpreters that allow this at all? Or, perhaps, this ability is just a matter of extent and my question is badly posed.
There are certainly languages in which this kind of self-modifying behavior at the level of the language syntax itself is possible. Lisp programs can contain macros, which allow among other things the creation of new control constructs on the fly, to the extent that two Lisp programs that depend on extensive macro programming can look almost as if they are written in two different languages. Forth is somewhat similar in that a Forth interpreter provides a core set of just a dozen or so primitive operations on which a program must be built in the language of the problem domain (frequently some kind of real-world interaction that must be done precisely and programmatically, such as industrial robotics). A Forth programmer creates an interpreter that understands a language specific to the problem he or she is trying to solve, then writes higher-level programs in that language.
In general the common idea here is that of languages or systems that treat code and data as equivalent and give the user just as much power to modify one as the other. Every Lisp program is a Lisp data structure, for example. This is in contrast to a language such as Java, in which a sharp distinction is made between the program code and the data that it manipulates.
A related subject is that of self-modifying low-level code, which was a fairly common technique among assembly-language programmers in the days of minicomputers with complex instruction sets, and which spilled over somewhat into the early 8-bit and 16-bit microcomputer worlds. In this programming idiom, for purposes of speed or memory savings, a program would be written with the "awareness" of the location where its compiled or interpreted instructions would be stored in memory, and could alter in place the actual machine-level instructions byte by byte to affect its behavior on the fly.
Forth is the most obvious thing I can think of. It's concatenative and stack based, with the fundamental atom being a word. So you write a stream of words and they are performed in the order in which they're written with the stack being manipulated explicitly to effect parameter passing, results, etc. So a simple Forth program might look like:
6 3 + .
Which is the words 6, 3, + and .. The two numbers push their values onto the stack. The plus symbol pops the last two items from the stack, adds them and pushes the result. The full stop outputs whatever is at the top of the stack.
A fundamental part of Forth is that you define your own words. Since all words are first-class members of the runtime, in effect you build an application-specific grammar. Having defined the relevant words you might end up with code like:
red circle draw
That wold draw a red circle.
Forth interprets each sequence of words when it encounters them. However it distinguishes between compile-time and ordinary words. Compile-time words do things like have a sequence of words compiled and stored as a new word. So that's the equivalent of defining subroutines in a classic procedural language. They're also the means by which control structures are implemented. But you can also define your own compile-time words.
As a net result a Forth program usually defines its entire grammar, including relevant control words.
You can read a basic introduction here.
Prolog is an homoiconic language, allowing meta interpreters (MIs) to be declined in a variety of ways. A meta interpreter - interpreting the interpreter - is a common and useful native construct in Prolog.
See this page for an introduction to this argument. An interesting and practical technique illustrated is partial execution:
The overhead incurred by implementing these things using MIs can be compiled away using partial evaluation techniques.
Can PMD be customized to fully support a new language, in a reasonable amount of time. I mean I know that technically almost anything can be done, but im wondering if this can be done in a reasonable amount of time? E.g. < 2 weeks
This page mentions how to write a CPD parser http://pmd.sourceforge.net/cpd-parser-howto.html
But is this just for copy / paste detection? Does writing a CPD parser give me full support of PMD in terms of rile sets?
I would guess not, but I'm not a PMD expert (and I have my own bias, check my bio).
The issues are:
Can you define a syntax for my langauge quickly (maybe, depending on how good you are, how messy the language is, and the strength of the parsing machinery offered by PMD)
Can you define the semantics of my language so that "semantic checks" provided by PMD work. You have to do this, because syntax tells you (and a tool) literally nothing about semantic of the syntax. I would guess that the PMD tool 'semantic checks' are pretty wired into the precise details of Java; if you language matched java perfectly, this would be zero work. But it doesn't, or you wouldn't be asking the question. And langauge semantics differences, even minor ones, cause discontinuous changes to the interpreation of the code. Before you get to doing even "serious" semantics, you're likely to have to build a symbol table mapping identifiers in the code to declarations (and the "semantic" type) for those symbols. Based on tool infrastructure I work with, this step alone takes 1-2 months for a real language.
Lastly, you are likely to have to code special PMD checks that are specific to your langauge. That takes time and energy, too.
I build generic compiler-type machinery (parsers, flow analyzers, style/error checkers) and get asked the equivalent of this question all the time WRT to our machinery. We try to have a lot of machinery available, try to make it easy to integrate new langauges, and we've been working on trying to make this "convenient and fast" for 15+ years. Its still not convenient, and there's no way to do this with our tools in a few weeks. I doubt PMD is better.
I just wanted to know if it is 100% possible, if my language is turing-complete, to write a program in it that prints itself out (of course not using a file reading function)
So if the language just has the really necessary things in order to make it turing complete (I would prove that by translating Brainf*ck code to it), like output, variables, conditions and gotos (hell yes, gotos), can I try writing a quine in it?
I'm also asking this because I'm not sure that a quine directly fits into Turing's law that the turing machine is capable of any computational task.
I just want to know so I don't try for years without knowing that it may be impossible.
Any programming language which is
Turing complete, and which is able to
output any string (by a computable
function of the string as program —
this is a technical condition that is
satisfied in every programming
language in existence) has a quine
program (and, in fact, infinitely many
quine programs, and many similar
curiosities) as follows by the
fixed-point theorem.
See here
I ran into this issue a couple of months ago.
While writing a quine doesn't necessarily prove that a language is Turing Complete, it is a strong suggestion ;) As far as Turing Completeness goes, if you can (like you said) provide a valid translation from your language to another Turing-Complete language, then your language is Turing Complete.
That being said, any language that is Turing Complete that can output a string should be able to generate a quine. Also, from Wikipedia:
A quine is a fixed point of an execution environment, when the execution environment is viewed as a function. Quines are possible in any programming language that has the ability to output any computable string, as a direct consequence of Kleene's recursion theorem. For amusement, programmers sometimes attempt to develop the shortest possible quine in any given programming language.
It is possible to have a programming language that cannot print all the symbols in its representation. For example, the I/O may be limited to 7-bit ASCII characters with language keywords in Arabic. That's the only exception I can think of.
Well, technically, not always. According to the proof on Wikipedia, the programming language has to be an admissible numbering. Practical and sane Turing-complate programming languages are all admissible numberings. And a Turing-complate programming language is an admissible numbering if it's possible to translate between that and another admissible numbering.
An example Turing-complete programming language that is not an admissible numbering:
The source code always contains one or two doublequoted escaped strings. If the input is empty, output the first string if there are two strings, or loop forever if there is one. Otherwise, evaluate the last string in Python, using the original input as input.
It's not an admissible numbering because, given a Python program, we have to know its behavior when the input is empty, to translate it into this language. But we may never know if it is an infinite loop, as we cannot solve the halting problem. We know a translation always exists, though.
It's impossible to write quines in this language.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
For the popular languages and libraries we use every day:
What are examples of some bad design, embarrassing APIs, or generally bad usability? Design errors that we have to pay for because they introduce subtle bugs, we have to use awkward workarounds or memorize unintuitive ways to get things done.
I'm especially thinking of issues like: There's a class in an OO language that really shouldn't inherit from that other class. There's special operator that makes a certain language hard to parse, and it turned out to be unused anyway. A function that's misnamed or is often in use for other things than it was designed for (I'm thinking of std::getline to tokenize strings).
I'm not looking for contributions that bash languages and claim that, say, Perl or some other language is just badly designed. I'm more looking for concrete examples or anecdotes about things that clearly should have been done differently. (Maybe the designers caught it too late and tried to fix it in subsequent versions, but had to retain backward compatibility.)
Java's URL class does (or did, a few years ago) a DNS lookup when determining the hashCode of a URL object. So, not only is a hash table with URLs as keys extremely slow, but it can change at runtime if two successive DNS requests return different values or you unplug the network cable!
Java class called "NullPointerException"
Having migrated from C++ to Java I always found it amusing to find a NullPointerException in a language with "no pointers"
The majority of C++ is design flaws yet most people learn to live with it. [Ducks for mass of haters downvotes]
PHP has a good bunch of them, the ($needle,$haystack) ($haystack,$needle) inconsistencies.
The string functions with mb_ prefixed to them to enable multibytesupport.
My favourite mysql_escape_string and mysql_real_escape_string...
Lets not get started on the OO part... :)
My personal favorite is atoi of the C standard lib.
int atoi ( const char * str );
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
Well, unfortunately it's painful to convert "0" with this function as you never sure if it was an error or "0".
Java "Calendar" API. It is error-prone and hard to use.
To enumerate just a few of the problems:
misleading name: a Calendar object is supposed to model a "calendar system", but in practice encapsulates a time object, i.e it is a kind of Date (which is itself a problem, since Date should better be named DateTime or TimeStamp)
there is only one concrete subclass of Calendar (GregorianCalendar), but the abstract Calendar class contains constants only useful in this specific case (JANUARY, MONDAY, AM_PM, ERA)
you modify fields by using constants ("MONDAY", "WEEK_OF_MONTH" etc); these are integers and can be mixed up rather easily.
in fact, just about every argument and return value is an integer; the usual problems with 0- and 1-based numbers apply (is January 0 or 1?=
constants like "HOUR" and "HOUR_OF_DAY" that only make sense when the AM/PM system is used (which nobody outside the US understands anyway ;-)
My favorite comes from the world of Smalltalk. Squeak to be specific. In Squeak the Semaphore class inherits from LinkedList. Semaphores may utilize linked lists, but they are not linked lists themselves. This makes for a very odd interface. This is terrible OO design.
API's where functions return null when they should return empty items...and then don't document when, how, or why they'll return null.
Checked vs. unchecked exceptions in Java. Most Java developers either don't know when to use which, or, if they do, they disagree.
Then we have things like IOExeption which you can never handle them when they occur but have to throw them up (in all senses of the word). When you have finally reached a place where you can handle them, you have no idea how to figure out what might have caused them, so you can only present the user with the message and stack trace and hope she can figure it out (here "she" is normal user who knows that Java is a kind of coffee).
http://www.infoq.com/presentations/effective-api-design
Skip to about 40 minutes and he talks about some of the XML APIs in Java.
In general, it's a pretty interesting presentation.
I found one in Ruby's Dir.glob().
I haven't been able to prove it's not related to my particular environment yet, since my environment is old and cobbled together by hand and I need to continue to support it unfortunately. There seem to be at least 5 cases:
Directory has files and some match -> List of those files
Directory has files and none match -> Empty list
Directory is empty -> Nil
Directory exists but the current user can't read it (don't know what happens here)
Directory is missing (haven't tested this either)
It had me pulling my hair out because the documentation doesn't describe it that way.
In Java I would go with InterruptedException being a checked exception. If you need to know that your sleep method woke up early, that should have been a boolean returned from the method. Nothing gives checked exceptions a bad name like InterruptedException.
After switching back and forth between several scripting languages this week, I found myself thinking how similar they all are. Yet I'm always reaching for Google (or nowadays SO) to remember details like what the local equivalents of "instanceof" and "endswith" are, or the right syntax to declare an interface, or whatever.
This reminded me of the (human) language Europonto. Just pick some vaguely English syntax and some vaguely Romance/Germanic/Slavic vocabulary, and it's all good!
So what would happen if we tried to do the same thing with a scripting language. In the mood for Python-style indented blocks today? Fine! Want to use a prototype object? Ok! Can only remember how to spell the PHP names of some library function? No problem!
Anyway, that's the wild and crazy idea. Since we need a question that admits concrete answers, let's tighten it up like this:
What would be the most significant conflicts in creating a scripting language that permitted all the native syntax and library functions of [Python, Ruby, PHP, Perl, shell, and JavaScript], such that you could freely intermix code blocks and function names between languages?
And let's say that any particular construction should be consistent at the statement level. So we'll allow:
foreach( $foo as $bar )
{
if $foo == 2:
print "hi"
}
but not, say,
foreach( $foo as $bar )
{
if $foo == 2:
print "hi"
endif
end
Conflicts can include: parser ambiguities; name collision; conflicting semantics for objects or functions or closures; etc. I'm guessing that scope will be a ginormous issue, but you tell me.
I'll start this as "community wiki" from the get go, so if you think it's a fun question but want to make it more rigorous, feel free to edit.
I would suggest that the main problem is recognising what the syntax of each statement is supposed to be.
In any case, what is the point? Almost all scripting languages have facilities to do much the same things, which is why people tend to master one that they use consistently, and stick with it.
The main difficulty would to be to allow people maintain it. With a well defined language you can only print a certain way and do sys.argv a certain way. once you allow multiple syntaxes there is no sane way to search for all the sys.argv in the code base you have.
At the syntactical level the only problem I can see would be to detect which block has which syntax, then separate them and parse them with specific parsers. Of course given very small statements there could be ambiguities as to which language it is and you could argue that it doesn't matter, but it just may be the case, that in different languages the same string of characters does different things so this could be a subtle issue.
At the API level you would have lots of different methods of doing the same thing but in a subtly different way or subset of doing it. So for example you could have no way of doing Java's string.startsWith() in let's say PHP, so you would do something different, or no way of doing PHP's strstr() (which returns a part of the string from the found needle to the end) and you would implement something different for that or even think differently about the problem. Then you would have to have all those different API methods of doing the same things and that would be huge API to implement, support and (god forbid) learn.
At the wetware level the code written by others would be totally unreadable unless you know a ton of languages and their subtle differences. I think it is difficult enough to learn a single programming language to the smallest details and so it is not practical at all to have this kind of frankensteinish beast created. I can think of an exception for use as an algorithm description language which it already is used in universities all over the world, where teacher takes some language of his liking and makes the code as readable as it can be for a human without needing to implement a parser for it.
As a side note I think this kind of system could be implemented at the least effort by somehow utilizing .NET's CLR where you have a ton of different languages each compiling to the same bytecode and accessing the same variables and stuff. All you'd need to do is split the code to clusters of different languages, then compile them separately on their respective compilers and then just merge the bytecode and somehow make sure they all point to the same variables and functions when mentioning the same names across the different languages.
I have begun to see that syntax is but one property of a language. And most of them look like C to me. The purpose of a language (object oriented, strong typing, etc) is something else again. It starts to look like syntax is not the most important aspect.
I went and read the wikipedia entry...
Europanto is a linguistic jest presented as a "constructed language" with a hodge-podge vocabulary
"Hodge-podge" sounds like the way Perl has been described to me!
I found a rather detailed discussion of closures in Ruby. It sounds like getting Ruby's behavior to coexist with JavaScript's or Python's would require some kind of ugly disambiguation.
If anybody were to add Perl to the list of languages to be covered, I think its lexical scoping rules would present a related problem?