Is there some other way to describe a formal language other than grammars?

I'm looking for the mathematical theory which deals with describing formal languages (set of strings) in general and not just grammar hierarchies.

Grammars give you an algorithm for enumerating all the strings in the language. You could specify such an algorithm in any other way, but grammars are a concise and well-accepted format for doing so.
Another way is to list every string that belongs to the language -- this will only work if the set of strings in the language is small (and definitely not when the set is infinite).

Regular expressions, for instance, are another formalism for describing a class of languages. Although there are algorithms for converting between regular grammars and regular expressions in both directions, they are still two different formalisms. Automata (the plural of automaton) can also describe languages: not only DFAs and NFAs, which describe exactly the regular languages, but also 2DFAs, stack automata, and so on. For example, a two-stack automaton is as powerful as a Turing machine. Finally, Turing machines themselves are a formalism for defining languages: for any Turing machine, the set of all strings on which it halts in a finite number of steps is a formally defined language.
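For a concrete example of the automaton view, here is a minimal sketch in Python (the state names and helper function are made up for illustration) of a DFA describing the language of binary strings with an even number of 1s:

    # A DFA for binary strings containing an even number of 1s.
    # The transition table below is, in effect, the whole description of the language.
    TRANSITIONS = {
        ('even', '0'): 'even', ('even', '1'): 'odd',
        ('odd',  '0'): 'odd',  ('odd',  '1'): 'even',
    }

    def accepts(word, start='even', accepting=('even',)):
        state = start
        for symbol in word:
            state = TRANSITIONS[(state, symbol)]
        return state in accepting

    print(accepts('1011'))   # False (three 1s)
    print(accepts('1001'))   # True  (two 1s)

The same language could equally be given by a regular expression or a right-linear grammar; the point is that each formalism is a finite description of an infinite set of strings.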

Related

How to describe the semantics of a language?

For syntax there is the EBNF standard, ISO 14977.
For runtime we have the CLI standard, ISO 23271.
See also Simple definition of "semantics" as it is commonly used in relation to programming languages/APIs?
But how do we describe the transition from the EBNF to the CLI specs in a declarative way?
That is, is it enough to use an S-attributed grammar? Which standard defines the syntax of such a grammar?
There are many ways to define the semantics of a language. All of them have to express somehow the relationship between the program text and "what it computes".
A short but incomplete list of basic techniques:
Define an interpreter ("operational semantics")
Define a map from the source code to an enriched lambda calculus ("denotational semantics")
Define a map from the source code to another well-defined language ("transformational semantics")
Essentially, these are computations defined over the source text of a program instance.
You can implement these computations in many different ways. One way to implement them might be "S-attributed" grammars, although why you would want to restrict yourself to only S-attributes rather than a standard attributed grammar with inherited attributes is beyond me.
Given that there are so many ways to do this, I doubt you are going to find a standard. Certainly the programming language committees aren't using one. Heck, they won't even use a standard for BNF.
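As a minimal sketch of the first technique (in Python, with a made-up miniature expression language), "define an interpreter" means the meaning of a program is whatever the interpreter computes for it:

    # Operational semantics as an interpreter for a tiny expression language.
    # Terms are nested tuples; the encoding is purely illustrative.
    def evaluate(term, env):
        kind = term[0]
        if kind == 'num':                      # ('num', 3)          -> 3
            return term[1]
        if kind == 'var':                      # ('var', 'x')        -> env['x']
            return env[term[1]]
        if kind == 'add':                      # ('add', e1, e2)     -> value of e1 plus value of e2
            return evaluate(term[1], env) + evaluate(term[2], env)
        if kind == 'let':                      # ('let', x, e, body) -> evaluate body with x bound
            value = evaluate(term[2], env)
            return evaluate(term[3], {**env, term[1]: value})
        raise ValueError(f'unknown term: {term!r}')

    # let x = 2 + 3 in x + x   evaluates to 10
    program = ('let', 'x', ('add', ('num', 2), ('num', 3)),
               ('add', ('var', 'x'), ('var', 'x')))
    print(evaluate(program, {}))   # 10

A denotational or transformational semantics for the same toy language would instead map each term to a mathematical function or to a term of another language, but it would have to agree with this interpreter on every program.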

Chomsky hierarchy in plain English

I'm trying to find a plain (i.e. non-formal) explanation of the 4 levels of formal grammars (unrestricted, context-sensitive, context-free, regular) as set out by Chomsky.
It's been an age since I studied formal grammars, and the various definitions are now confusing for me to visualize. To be clear, I'm not looking for the formal definitions you'll find everywhere (e.g. here and here -- I can google as well as anyone else), or really even formal definitions of any sort. Instead, what I was hoping to find was clean and simple explanations that don't sacrifice clarity for the sake of completeness.
Maybe you will get a better understanding if you recall the automata that recognize these languages.
Regular languages are recognized by finite automata. They have only a finite knowledge of the past (their working memory is bounded), so whenever you have a language in which suffixes depend on arbitrarily long prefixes (such as the palindrome language), it cannot be regular.
Context-free languages are recognized by nondeterministic pushdown automata. They have a kind of knowledge of the past (the stack, which is unbounded, in contrast to finite automata), but a stack can only be read from the top, so you do not have complete knowledge of the past.
Context-sensitive languages are recognized by nondeterministic linear-bounded Turing machines (linear bounded automata). They know the past and can deal with different contexts because they are nondeterministic and can access all of their (linearly bounded) tape at any time.
Unrestricted (recursively enumerable) languages are recognized by Turing machines. According to the Church-Turing thesis, Turing machines are able to compute anything that can be computed at all.
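To make the first two levels concrete, here is a small sketch in Python (illustrative only) of why the stack matters: balanced parentheses form a context-free language that no finite automaton can recognize, because recognizing it requires counting arbitrarily deep nesting.

    # Balanced parentheses: context-free but not regular.
    # The unbounded stack (used here as a counter) is exactly what a finite automaton lacks.
    def balanced(word):
        stack = []
        for symbol in word:
            if symbol == '(':
                stack.append(symbol)
            elif symbol == ')':
                if not stack:          # a closing paren with nothing open
                    return False
                stack.pop()
        return not stack               # accept only if everything was closed

    print(balanced('(()(()))'))   # True
    print(balanced('(()'))        # False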
As for regular languages, there are many equivalent characterizations. They give many different ways of looking at regular languages. It is hard to give a "plain English" definition, and if you find it hard to understand any of the characterizations of regular languages, it is unlikely that a "plain English" explanation will help. One thing to note from the definitions and various closure properties is that regular languages embody the notion of "finiteness" somehow. But this is again hard to appreciate without better familiarity with regular languages.
Do you find the notion of a finite automaton to be not simple and clean?
Let me mention some of the many equivalent characterizations (at least for other readers); a small simulation sketch follows the list:
Languages accepted by deterministic finite automata
Languages accepted by nondeterministic finite automata
Languages accepted by alternating finite automata
Languages accepted by two-way deterministic finite automata
Languages generated by left-linear grammars
Languages generated by right-linear grammars
Languages generated by regular expressions.
A union of some equivalence classes of a right-congruence of finite index.
A union of some equivalence classes of a congruence of finite index.
The inverse image under a monoid homomorphism of a subset of a finite monoid.
Languages expressible in monadic second order logic over words.
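As a small illustration of the nondeterministic-automaton characterization mentioned above, here is a sketch in Python (state names and the helper are made up) that simulates an NFA by tracking the set of states it could currently be in; the language is the set of binary strings whose third symbol from the end is 1:

    # NFA for binary strings whose third symbol from the end is 1.
    # DELTA maps (state, symbol) to the set of possible next states.
    DELTA = {
        ('q0', '0'): {'q0'},   ('q0', '1'): {'q0', 'q1'},   # q1 = "guessed this 1 is third from the end"
        ('q1', '0'): {'q2'},   ('q1', '1'): {'q2'},
        ('q2', '0'): {'q3'},   ('q2', '1'): {'q3'},
    }

    def nfa_accepts(word, start='q0', accepting=('q3',)):
        states = {start}
        for symbol in word:
            states = set().union(*(DELTA.get((s, symbol), set()) for s in states))
        return any(s in accepting for s in states)

    print(nfa_accepts('0100'))   # True
    print(nfa_accepts('0010'))   # False

The set-of-states trick is the usual determinization argument, which is one way to see that the first two characterizations describe the same class of languages.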
Regular: membership ("yes/no") can be decided with a finite automaton.
Context-free: given an input word, we can always answer yes/no as to whether it belongs to the language (using a state machine plus a stack).
Context-sensitive: as long as no production in the grammar shrinks (|α| ≤ |β| for every rule α → β), we can answer yes/no (using a state machine and a chunk of memory linear in the size of the input).
Recursively enumerable: the machine can answer yes, but in the "no" case it may run into an infinite loop.
See this video for a full explanation.

When you are proving a language is decidable, what are you effectively doing?

When you are proving a language is decidable, what are you effectively doing?
If you are asking HOW it is done, I'm unsure, but I can check.
Basically, a decidable language is one for which you can construct an algorithm (i.e. a Turing machine) that halts on ANY finite input, either accepting or rejecting it.
An undecidable language is one that is not decidable.
http://en.wikipedia.org/wiki/Recursive_language has only a quick mention of the term, but more on the subject can easily be found.
P.S. So, by constructing the above-mentioned algorithm, you are effectively proving that the language is decidable.
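For a minimal illustration (in Python; the language and function name are made up), here is a decider for the language of decimal strings whose digit sum is divisible by 3. It visibly halts on every finite input, since it performs one bounded step per input symbol, which is precisely the property a decidability proof has to establish:

    # A decider: halts on EVERY finite input, answering accept (True) or reject (False).
    def decides_digit_sum_divisible_by_3(word):
        remainder = 0
        for ch in word:                  # exactly len(word) iterations, then halt
            if not ch.isdigit():
                return False             # reject: not a decimal string at all
            remainder = (remainder + int(ch)) % 3
        return remainder == 0

    print(decides_digit_sum_divisible_by_3('123'))   # True  (1 + 2 + 3 = 6)
    print(decides_digit_sum_divisible_by_3('124'))   # False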

Is it possible to create a quine in every turing-complete language?

I just wanted to know whether it is 100% possible, if my language is Turing-complete, to write a program in it that prints itself out (of course not using a file-reading function).
So if the language just has the things strictly necessary to make it Turing-complete (I would prove that by translating Brainf*ck code to it), like output, variables, conditionals, and gotos (hell yes, gotos), can I try writing a quine in it?
I'm also asking this because I'm not sure a quine directly fits the claim that a Turing machine is capable of any computational task.
I just want to know so I don't try for years without knowing that it may be impossible.
Any programming language which is Turing complete, and which is able to output any string (by a computable function of the string as program — this is a technical condition that is satisfied in every programming language in existence), has a quine program (and, in fact, infinitely many quine programs, and many similar curiosities), as follows by the fixed-point theorem.
See here
I ran into this issue a couple of months ago.
While writing a quine doesn't necessarily prove that a language is Turing-complete, it is a strong suggestion ;) As far as Turing completeness goes, if you can (like you said) provide a valid translation from another Turing-complete language (e.g. Brainf*ck) into your language, then your language is Turing-complete.
That being said, any language that is Turing Complete that can output a string should be able to generate a quine. Also, from Wikipedia:
A quine is a fixed point of an execution environment, when the execution environment is viewed as a function. Quines are possible in any programming language that has the ability to output any computable string, as a direct consequence of Kleene's recursion theorem. For amusement, programmers sometimes attempt to develop the shortest possible quine in any given programming language.
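For concreteness, here is one classic fixed point in Python (just one of the infinitely many quines the theorem promises); running these two lines prints exactly these two lines:

    s = 's = %r\nprint(s %% s)'
    print(s % s)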
It is possible to have a programming language that cannot print all the symbols in its representation. For example, the I/O may be limited to 7-bit ASCII characters with language keywords in Arabic. That's the only exception I can think of.
Well, technically, not always. According to the proof on Wikipedia, the programming language has to be an admissible numbering. Practical and sane Turing-complete programming languages are all admissible numberings. And a Turing-complete programming language is an admissible numbering if it's possible to translate between it and another admissible numbering.
An example Turing-complete programming language that is not an admissible numbering:
The source code always contains one or two doublequoted escaped strings. If the input is empty, output the first string if there are two strings, or loop forever if there is one. Otherwise, evaluate the last string in Python, using the original input as input.
It's not an admissible numbering because, given a Python program, we have to know its behavior when the input is empty, to translate it into this language. But we may never know if it is an infinite loop, as we cannot solve the halting problem. We know a translation always exists, though.
It's impossible to write quines in this language.

Is there any Mathematical Model or Theory behind Programming Languages? [closed]

RDBMS are based on Relational Algebra as well as Codd's Model. Do we have something similar to that for Programming languages or OOP?
Do we have [an underlying model] for programming languages?
Heavens, yes. And because there are so many programming languages, there are multiple models to choose from. Most important first:
Church's untyped lambda calculus is a model of computation that is as powerful as a Turing machine (no more and no less). The famous "Church-Turing hypothesis" is that these two equivalent models represent the most general model of computation that we know how to implement. The lambda calculus is extremely simple; in its entirety the language is
e ::= x | e1 e2 | \x.e
which constitute variables, function applications, and function definitions. The lambda calculus also comes with a fairly large collection of "reduction rules" for simplifying expressions. If you find an expression that can't be reduced, that is called a "normal form" and represents a value.
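To make the reduction rules concrete, here is a minimal sketch of the untyped lambda calculus in Python (the tuple encoding and all helper names are made up for illustration): terms follow the grammar above, and normalize applies normal-order beta reduction until it reaches a normal form.

    # Terms: ('var', x), ('app', f, a), ('lam', x, body)
    import itertools

    def Var(x):        return ('var', x)
    def App(f, a):     return ('app', f, a)
    def Lam(x, body):  return ('lam', x, body)

    _fresh = itertools.count()

    def free_vars(t):
        tag = t[0]
        if tag == 'var': return {t[1]}
        if tag == 'app': return free_vars(t[1]) | free_vars(t[2])
        return free_vars(t[2]) - {t[1]}

    def subst(t, x, s):
        """Capture-avoiding substitution t[x := s]."""
        tag = t[0]
        if tag == 'var':
            return s if t[1] == x else t
        if tag == 'app':
            return App(subst(t[1], x, s), subst(t[2], x, s))
        y, body = t[1], t[2]
        if y == x:
            return t
        if y in free_vars(s):                       # rename the bound variable to avoid capture
            z = f'{y}_{next(_fresh)}'               # assumed fresh for this sketch
            body, y = subst(body, y, Var(z)), z
        return Lam(y, subst(body, x, s))

    def reduce_step(t):
        """One normal-order beta-reduction step, or None if t is already a normal form."""
        tag = t[0]
        if tag == 'app':
            f, a = t[1], t[2]
            if f[0] == 'lam':                       # beta: (\x.body) a  ->  body[x := a]
                return subst(f[2], f[1], a)
            r = reduce_step(f)
            if r is not None: return App(r, a)
            r = reduce_step(a)
            if r is not None: return App(f, r)
        elif tag == 'lam':
            r = reduce_step(t[2])
            if r is not None: return Lam(t[1], r)
        return None

    def normalize(t, limit=1000):
        for _ in range(limit):
            r = reduce_step(t)
            if r is None:
                return t                            # a normal form: a value
            t = r
        return t                                    # give up; the term may have no normal form

    # (\x.x) y   reduces to   y
    print(normalize(App(Lam('x', Var('x')), Var('y'))))   # ('var', 'y')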
The lambda calculus is so general that you can take it in several directions.
If you want to use all the available rules, you can write specialized tools like partial evaluators and parts of compilers.
If you avoid reducing any subexpression under a lambda, but otherwise use all the rules available, you wind up with a model of a lazy functional language like Haskell or Clean. In this model, if a reduction can terminate, it is guaranteed to, and it is easy to represent infinite data structures. Very powerful.
If you avoid reducing any subexpression under a lambda, and if you also insist on reducing each argument to a normal form before a function is applied, then you have a model of an eager functional language like F#, Lisp, Objective Caml, Scheme, or Standard ML.
There are also several flavors of typed lambda calculi, of which the most famous are grouped under the name System F, which were discovered independently by Girard (in logic) and by Reynolds (in computer science). System F is an excellent model for languages like CLU, Haskell, and ML, which are polymorphic but have compile-time type checking. Hindley (in logic) and Milner (in computer science) discovered a restricted form of System F (now called the Hindley-Milner type system) which makes it possible to infer System F expressions from some expressions of the untyped lambda calculus. Damas and Milner developed an algorithm to do this inference, which is used in Standard ML and has been generalized in other languages.
Lambda calculus is just pushing symbols around. Dana Scott's pioneering work in denotational semantics showed that expressions in the lambda calculus actually correspond to mathematical functions—and he identified which ones. Scott's work is especially important in making sense of "recursive definitions", which are commonplace in computer science but are nonsensical from a mathematical point of view. Scott and Christopher Strachey showed that a recursive definition is equivalent to the least defined solution to a recursion equation, and furthermore showed how that solution could be constructed. Any language that allows recursion, and especially languages that allow recursion at arbitrary type (like Haskell and Clean) owes something to Scott's model.
There is a whole family of models based on abstract machines. Here there is not so much an individual model as a technique. You can define a language by using a state machine and defining transitions on the machine. This definition encompasses everything from Turing machines to Von Neumann machines to term-rewriting systems, but generally the abstract machine is designed to be "as close to the language as possible." The design of such machines, and the business of proving theorems about them, comes under the heading of operational semantics.
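A minimal sketch of the idea (in Python, with a made-up three-instruction machine): the machine state is a program counter plus a stack, each instruction is one transition, and the meaning of a program is whatever the machine computes.

    # A tiny abstract machine in the operational-semantics style.
    def run(program):
        stack, pc = [], 0
        while pc < len(program):                     # each loop iteration is one machine transition
            op, *operands = program[pc]
            if op == 'push':
                stack.append(operands[0])
            elif op == 'add':
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == 'mul':
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            else:
                raise ValueError(f'unknown instruction: {op}')
            pc += 1
        return stack.pop()

    # (2 + 3) * 4 = 20
    print(run([('push', 2), ('push', 3), ('add',),
               ('push', 4), ('mul',)]))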
What about object-oriented programming?
I'm not as well educated as I should be about abstract models used for OOP. The models I'm most familiar with are very closely connected to implementation strategies. If I wanted to investigate this area further I would start with William Cook's denotational semantics for Smalltalk. (Smalltalk as a language is very simple, almost as simple as the lambda calculus, so it makes a good case study for modeling more complicated object-oriented languages.)
Wei Hu reminds me that Martin Abadi and Luca Cardelli have put together an ambitious body of work on foundational calculi (analogous to the lambda calculus) for object-oriented languages. I don't understand the work well enough to summarize it, but here is a passage from the Prologue of their book, which I feel is worth quoting:
Procedural languages are generally well understood; their constructs are by now standard, and their formal underpinnings are solid. The fundamental features of these languages have been distilled into formalisms that prove useful in identifying and explaining issues of implementation, static analysis, semantics, and verification.
An analogous understanding has not yet emerged for object-oriented languages. There is no widespread agreement on a collection of basic constructs and on their properties... This situation might improve if we had a better understanding of the foundations of object-oriented languages.
... we take objects as primitive and concentrate on the intrinsic rules that objects should obey. We introduce object calculi and develop a theory of objects around them. These object calculi are as simple as function calculi, but represent objects directly.
I hope this quotation gives you an idea of the flavor of the work.
Lisp is based on Lambda Calculus, and is the inspiration for much of what we see in modern languages today.
Von Neumann machines are the foundation of modern computers, which were first programmed in assembly language, then in the FORmula TRANslator. Then the formal linguistic theory of context-free grammars was applied, and it underlies the syntax of all modern languages.
Computability theory (formal automata) has a hierarchy of machine types that parallels the hierarchy of formal grammars, for example: regular grammar = finite-state machine, context-free grammar = pushdown automaton, context-sensitive grammar = linear-bounded automaton, unrestricted grammar = Turing machine.
There also is information theory, of two types, Shannon and Kolmogorov, that can be applied to computing.
There are lesser-known models of computing, such as recursive-function-theory, register-machines, and Post-machines.
And don't forget predicate-logic in its various forms.
Added: I forgot to mention discrete math - group theory and lattice theory. Lattices in particular are (IMHO) a particularly nifty concept underlying all boolean logic, and some models of computation, such as denotational semantics.
Functional languages like Lisp inherit their basic concepts from Church's "lambda calculus" (wikipedia article here).
One relevant concept may be the Turing machine.
If you study programming languages (eg: at a University), there is quite a lot of theory, and not a little math involved.
Examples are:
Finite State Machines
Formal Languages (and context-free grammars, in notations like BNF, used to describe them)
The construction of LRish parser tables
The closest analogy I can think of is Gurevich Evolving Algebras that, nowadays, are more known under the name of "Gurevich Abstract State Machines" (GASM).
I had long hoped to see more real applications of the theory once Gurevich joined Microsoft, but it seems that very little has come out. You can check the ASML page on the Microsoft site.
The good point about GASMs is that they closely resemble pseudo-code, even though their semantics is formally specified. This means that practitioners can easily grasp them.
After all, I think that part of the success of Relational Algebra is that it is the formal foundation of concepts that can be easily grasped, namely tables, foreign keys, joins, etc.
I think we need something similar for the dynamic components of a software system.
There are many dimensions to your question, scattered across the answers.
First of all, to describe the syntax of a language and specify how a parser would work, we use context-free grammars.
Then you need to assign meanings to the syntax. Formal semantics come in handy; the main players are operational semantics, denotational semantics, and axiomatic semantics.
To rule out bad programs you have the type system.
In the end, all computer programs can reduce to (or compile to, if you will) very simple computation models. Imperative programs are more easily mapped to Turing machines, and functional programs are mapped to lambda calculus.
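To see how little machinery such a model needs, here is a sketch of a Turing-machine simulator in Python (the encoding of the transition function and the example machine are made up for illustration):

    # A Turing-machine simulator: the transition table delta maps
    # (state, symbol) to (next state, symbol to write, head move).
    def run_tm(delta, word, state='q0', accept='qa', reject='qr', blank='_', max_steps=10000):
        tape = dict(enumerate(word))         # sparse tape indexed by integer position
        head = 0
        for _ in range(max_steps):
            if state in (accept, reject):
                return state == accept
            symbol = tape.get(head, blank)
            state, write, move = delta[(state, symbol)]
            tape[head] = write
            head += 1 if move == 'R' else -1
        raise RuntimeError('step limit reached (the machine may not halt)')

    # Example machine: accept strings of a's of even length.
    delta = {
        ('q0', 'a'): ('q1', 'a', 'R'),
        ('q0', '_'): ('qa', '_', 'R'),
        ('q1', 'a'): ('q0', 'a', 'R'),
        ('q1', '_'): ('qr', '_', 'R'),
    }
    print(run_tm(delta, 'aaaa'))   # True
    print(run_tm(delta, 'aaa'))    # False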
If you're learning all this stuff by yourself, I highly recommend http://www.uni-koblenz.de/~laemmel/paradigms0910/, because the lectures are videotaped and put online.
The history section of Wikipedia's Object-oriented programming could be enlightening.
Plenty has been mentioned of the application of math to computational theory and semantics. I like the mention of type theory and I'm glad someone mentioned lattice theory. Here are just a few more.
No one has explicitly mentioned category theory, which shows up more in functional languages than elsewhere, such as through the concepts of monads and functors. Then there's model theory and the various incarnations of logic that actually show up in theorem provers or the logic language Prolog. There are also mathematical applications to foundations of and problems in concurrent languages.
There is no mathematical model for OOP.
Relational algebra is the mathematical model for SQL. It was created by E.F. Codd; C.J. Date was also a renowned scientist who helped develop the theory. The whole idea is that you can do every operation as a set operation, affecting many values at the same time. This of course means that the database engine only has to be told WHAT to get out, and the database is able to optimize your query.
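As a minimal sketch of that idea (in Python, with rows as dicts; this is illustrative, not any particular library), the classic operators of the relational algebra are all set-at-a-time operations:

    # Selection (sigma), projection (pi), and a natural join over lists of rows.
    def select(relation, predicate):
        return [row for row in relation if predicate(row)]

    def project(relation, attrs):
        return [{a: row[a] for a in attrs} for row in relation]

    def join(r, s, key):
        return [{**x, **y} for x in r for y in s if x[key] == y[key]]

    employees = [{'id': 1, 'name': 'Ada',  'dept': 10},
                 {'id': 2, 'name': 'Alan', 'dept': 20}]
    depts     = [{'dept': 10, 'dname': 'Research'},
                 {'dept': 20, 'dname': 'Crypto'}]

    # "WHAT to get", not how: the names of everyone in Research.
    print(project(select(join(employees, depts, 'dept'),
                         lambda row: row['dname'] == 'Research'),
                  ['name']))                       # [{'name': 'Ada'}]

A query planner is free to reorder these operations (for example, selecting before joining) precisely because each one is defined on whole sets of rows.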
Both Codd and Date criticized SQL because they were involved in the theory, but they were not involved in the creation of SQL.
See this video: http://player.oreilly.com/videos/9781491908853?toc_id=182164
There is a lot of information from Chris Date. I remember that Date criticized the SQL programming language as being a terrible language, but I cannot find the paper.
The critique was basically that most languages let you write expressions and assign those expressions to variables, but SQL does not.
Since SQL is a kind of logical language, I guess you could write relational algebra in Prolog. At least you would have a real language. So you could write queries in Prolog. And since in Prolog you have a lot of programs to interpret natural language, you could query your database using natural language.
According to Uncle Bob, databases are not going to be needed when everyone has SSDs, because the architecture of SSDs means that access is as fast as RAM. So you can have all your objects in RAM.
https://www.youtube.com/watch?feature=player_detailpage&v=t86v3N4OshQ#t=3287
The only problem with ditching SQL is that you would end up without a query language for the database.
So yes and no, relational algebra was used as inspiration for SQL, but SQL is not really an implementation of relational algebra.
In the case of Lisp, things are different. The main idea was that by implementing the eval function in Lisp you have the whole language implemented. That's why the first Lisp implementation is only half a page of code.
http://www.michaelnielsen.org/ddi/lisp-as-the-maxwells-equations-of-software/
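In the same spirit, here is a minimal sketch of that "eval in half a page" idea, written in Python rather than Lisp (the tiny language and all names are made up for illustration):

    # A toy evaluator for s-expressions: numbers, symbols, (lambda (x ...) body), and application.
    import operator as op

    def tokenize(src):
        return src.replace('(', ' ( ').replace(')', ' ) ').split()

    def parse(tokens):
        tok = tokens.pop(0)
        if tok == '(':
            expr = []
            while tokens[0] != ')':
                expr.append(parse(tokens))
            tokens.pop(0)                 # drop the ')'
            return expr
        try:
            return int(tok)
        except ValueError:
            return tok                    # a symbol

    GLOBAL_ENV = {'+': op.add, '-': op.sub, '*': op.mul}

    def eval_(x, env=GLOBAL_ENV):
        if isinstance(x, str):            # symbol: look it up in the environment
            return env[x]
        if not isinstance(x, list):       # literal number
            return x
        if x[0] == 'lambda':              # (lambda (params ...) body)
            _, params, body = x
            return lambda *args: eval_(body, {**env, **dict(zip(params, args))})
        fn, *args = [eval_(e, env) for e in x]
        return fn(*args)

    print(eval_(parse(tokenize('((lambda (x) (* x x)) (+ 2 3))'))))   # 25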
To laugh a little bit: https://www.youtube.com/watch?v=hzf3hTUKk8U
The importance of functional programming all comes down to curried functions and lazy calls. And never forget environments and closures. And map-reduce. This all means we will be coding in functional languages in 20 years.
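For readers who want those terms pinned down, here is a short sketch in Python (illustrative names only) of currying, closures, laziness, and map/reduce:

    from functools import reduce
    from itertools import islice

    def add(a):                    # a curried function: add(2)(3) == 5
        def inner(b):              # inner is a closure over a's environment
            return a + b
        return inner

    def naturals():                # a lazy, infinite sequence via a generator
        n = 0
        while True:
            yield n
            n += 1

    squares = map(lambda x: x * x, [1, 2, 3, 4])        # map
    total   = reduce(lambda acc, x: acc + x, squares)   # reduce: 1 + 4 + 9 + 16
    print(add(2)(3), total, list(islice(naturals(), 5)))  # 5 30 [0, 1, 2, 3, 4]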
Now back to OOP, there is no formalization of OOP.
Interestingly, the second OO language ever created, Smalltalk, only has objects; it doesn't have primitives or anything like that. And its creator, Alan Kay, explicitly created blocks to work exactly like Lisp functions.
Some people claim OOP could maybe be formalized using category theory, which is something like set theory but with morphisms. A morphism is a structure-preserving map between objects. So in general you could have map( f, collection ) and get back a collection with f applied to every element.
I'm pretty sure Lisp has that, but Lisp also has functions that return a single element of a collection, which destroys the structure. A morphism is a special kind of function, so you would need to restrict the functions in Lisp so that they are all morphisms.
https://www.youtube.com/watch?feature=player_detailpage&v=o6L6XeNdd_k#t=250
The main problem with this is that functions don't exist independently of objects in OOP, but in category theory they do. They are therefore incompatible. You could develop a new language in which to express category theory.
An experimental theoretical language created explicitly to try to formalize OOP is Z. Z is derived from requirements formalism.
Another attempt is Luca Cardelli's formalism:
http://lucacardelli.name/Papers/PrimObjImp.pdf
http://lucacardelli.name/Papers/PrimObj1stOrder.A4.pdf
http://lucacardelli.name/Papers/PrimObjSemLICS.A4.pdf
I'm unable to read and understand that notation. It seems like a useless exercise, since as far as I know, no one has ever implemented it the way the lambda calculus was implemented in Lisp.
As far as I know, formal grammars are used for the description of syntax.