Can someone give a simple but non-toy example of a context-sensitive grammar? [closed]

I'm trying to understand context-sensitive grammars, and I understand why languages like
{ww | w is a string}
{aⁿbⁿcⁿ | n ≥ 0}
are not context-free, but what I'd like to know is whether a language similar to the untyped lambda calculus is context-sensitive. I'd like to see an example of a simple but non-toy (I consider the above toy examples) context-sensitive grammar that can, for some production rule, tell whether or not some string of symbols is currently in scope (e.g. when producing the body of a function). Are context-sensitive grammars powerful enough to make undefined/undeclared/unbound variables a syntactic (rather than semantic) error?

Yes, context-sensitive grammars (CSGs) are powerful enough to express checks for undefined/undeclared/unbound variables, but unfortunately we don't know any efficient algorithm for parsing the strings of an arbitrary CSG.
A real example of a context-sensitive language is the C programming language. A feature like "declare variables first, then use them later" makes C a context-sensitive language (CSL). (I don't know about the untyped lambda calculus.)
And because we don't know any linear-time parsing algorithm for CSLs (or CSGs), compiler design uses CFGs (and their parsing algorithms) for syntax checking, since we do know efficient algorithms for parsing CFGs (in restricted forms). Compilers first parse the context-free structure and then handle the context-sensitive features programmatically (for example, by checking every used variable against the symbol table and generating an error if it is not defined).
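Here is a minimal sketch of that two-phase idea in Python (my own illustration, with a made-up toy syntax of decl/use lines, not from any real compiler):

# Phase 1 is a trivially context-free "parse" (one statement per line);
# phase 2 is the programmatic symbol-table check described above.
def check(program):
    symbols = set()                # the symbol table
    errors = []
    for lineno, line in enumerate(program.splitlines(), 1):
        op, name = line.split()    # each line is "decl x" or "use x"
        if op == "decl":
            symbols.add(name)      # record the declaration
        elif op == "use" and name not in symbols:
            errors.append("line %d: '%s' used before declaration" % (lineno, name))
    return errors

print(check("decl x\nuse x\nuse y"))   # ["line 3: 'y' used before declaration"]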
Context-sensitive grammars are also used in natural-language processing (NLP), and most natural languages are examples of context-sensitive languages. (I am not sure about Sanskrit.)
I will try to explain it with a silly but simple example (it's just an idea, you can refine it):
NOUN --> { BlueBomber, Grijesh, I, We}
TENSE --> { am, was, is, were}
VERB --> { going, eating, working}
SENTENCE --> <NOUN> <TENSE> <VERB>
Now, using this grammar, we can generate some correct statements, but also some wrong ones. For example,
SENTENCE --> <NOUN> <TENSE> <VERB>
Grijesh is working [Correct statement]
But
Grijesh am working [wrong statement]
Reason: the value of <TENSE> depends on the value of <NOUN> (for example, I <TENSE> --> I am), so the context-free grammar above also generates statements that are not correct English.
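In fact, the toy grammar can be repaired with genuinely context-sensitive productions, where <TENSE> may only be rewritten in the context of the noun to its left. A sketch in the same notation, using the same toy vocabulary:

I <TENSE> --> I am
We <TENSE> --> We were
Grijesh <TENSE> --> Grijesh is
BlueBomber <TENSE> --> BlueBomber was

With these rules, "Grijesh is working" is still derivable, but "Grijesh am working" no longer is.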
Actually we can't write a context-free grammar for complete English!
You might have noticed that no natural-language translator or grammar checker works perfectly (try it with long sentences). That is because this problem requires context-sensitive parsing.
REFERENCE: You can watch Dr. Arun Kumar's lectures; in one of them he explains exactly what you are interested in.

Guidelines for Google Code Prettify [closed]

I'm hoping to pick up Issue 295 from Google's Code-Prettify project; i.e. to add support for lang-powershell.
Whilst I've found some code examples, I can't find any documentation on how this code should be written, or any submission guidelines (e.g. should syntax highlighting work for invalid code, or should it attempt to highlight such errors?).
Ironically I've tried Googling, but with no joy. The best I could find was their Style Guide.
Question
Please could someone point me to documentation for submitting a new language support script to Google Code Prettify?
"should syntax highlighting work for invalid code, or should it attempt to highlight such errors?"
Prettify is often applied to code fragments, so you can assume that the fragment starts at a token boundary, but should not assume that it starts at a top level production.
On sites like SO, prettify is applied to inputs written by novices and maintainers who have a passing familiarity with other languages and are trying to make spot edits to an existing snippet of code.
Prettify should make it easy for people with a deep understanding of the language to quickly scan for problems in the snippet of code.
You should make a best effort to recover from errors. For example, if the snippet only includes single-line tokens, then an invalid token on one line shouldn't prevent prettifying of every subsequent line. If that's unavoidable, then an invalid token shouldn't prevent prettifying of previous tokens -- seeing where tokenizing fails can convey useful information to someone scanning a code snippet for problems.
If you want to call out obvious errors like unclosed string literals, that's great. I'd apply .err and then a style that wants to apply a wiggly underline in red could do so. I'd be happy to accept a change to the default stylesheet for that.
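To illustrate that recovery strategy with a sketch (in Python for brevity; prettify itself is JavaScript, and none of the names below come from its API): classify what you can, emit err for what you can't, and keep going.

import re

# Hypothetical single-line token classes; an unmatchable character becomes
# an "err" token instead of aborting the rest of the input.
TOKEN_RE = re.compile(r'''
    (?P<str>"[^"\n]*")                   # a closed string literal
  | (?P<kwd>\b(?:if|else|function)\b)    # keywords
  | (?P<pln>[A-Za-z_]\w*|\d+|\s+|[(){};=])
  | (?P<err>.)                           # anything else: mark it and move on
''', re.VERBOSE)

def tokenize(source):
    for m in TOKEN_RE.finditer(source):
        yield m.lastgroup, m.group()

# The unclosed quote becomes a single "err" token; the following line
# is still tokenized normally.
for kind, text in tokenize('y = "unclosed\nz = 1;'):
    print(kind, repr(text))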
The way I think about it is that prettify bridges the gap between two concepts of language:
In parser theory, a "language" is a set of strings. The PowerShell language is the set of strings defined by a grammar in a spec document.
In common descriptivist usage, a "language" relates that which is produced by a speaker or author in the associated linguistic community to that which is in their mind when they produce the string. When a programmer sits down to produce a PowerShell script, what they produce is a string in the language even if they do a bad job or their mental model of PowerShell differs significantly from the spec document.
In the first sense, there is no such thing as a malformed PowerShell program, just a string that is not in the language and hence has no semantics per the spec. In the second, a malformed PowerShell program is a PowerShell program.
Please keep the second definition in mind, and remember that prettify does not need to work on the output of code generators.

Simplest real-world language [closed]

I am in the process of writing interpreters for a couple of languages, in TeX, which would allow TeX users to insert some code from their favorite language (if supported), and have TeX run it when producing the pdf result.
I started by writing an interpreter for Brainfuck, since it is a very simple language. I thought GolfScript would be a piece of cake, but it is richer than I had expected (mostly because it is based on the rather elaborate Ruby). I'll probably do Whitespace for the sake of it. But none of those is actually used by people, so up to now the whole process is mostly an exercise to see how to best write interpreters in TeX.
My question is: what real-world language should I consider? It should have the following qualities:
simple (I'm not ready for Python),
typically be used as one-liners (if possible),
and have a reasonably large user base.
I'm assuming that every language can have an interpreter (compiling only enhances the speed); please mention any technical hurdles you can think of for a proposed language.
EDIT: I am also interested in comments such as "implement Perl 2, then gradually add support for later versions" (no idea if that particular scenario is a good idea, though). I've already coded some support for regular expressions.
How about a stack-based language like Forth (or even a subset of PostScript)? Stack operations should translate to TeX constructs relatively easily. Finally, if all of this is done for the sake of an exercise, what about implementing a C preprocessor?
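To make the stack-based suggestion concrete, here is a minimal sketch of a Forth-like core (my own illustration, in Python rather than TeX, with a deliberately tiny word set):

# A data stack plus a dictionary of words; each word pops its
# arguments and pushes (or prints) its result.
def run(source):
    stack = []
    words = {
        "+":    lambda: stack.append(stack.pop() + stack.pop()),
        "*":    lambda: stack.append(stack.pop() * stack.pop()),
        "dup":  lambda: stack.append(stack[-1]),
        "swap": lambda: stack.extend([stack.pop(), stack.pop()]),
        ".":    lambda: print(stack.pop()),
    }
    for token in source.split():
        if token in words:
            words[token]()
        else:
            stack.append(int(token))   # anything else is a number literal

run("2 3 + dup * .")   # prints 25, i.e. (2 + 3) squared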
You could use some Lisp-based language like Beginning Student Language, or some ML variant like OCaml. For a simple exercise in interpreting ML-like code, just implementing definitions, function application, arithmetic, and conditional expressions gives you a nice start:
let rec factorial n =
  if n = 0 then 1 else n * factorial (n - 1)
let sqr x = x * x
let compose f g x = f (g x)
let () = print_int ((compose sqr sqr) (factorial 3))   (* prints 1296 *)
Also consider Lua, which aims at being a good scripting language for letting users customize your applications, and is pretty popular.

When you are proving a language is decidable, what are you effectively doing?

If you're asking HOW it is done, I'm unsure, but I can check.
Basically, a decidable language is one for which you can construct an algorithm (i.e. a Turing machine) that will halt on ANY finite input, either accepting or rejecting it.
An undecidable language is one that is not decidable.
See http://en.wikipedia.org/wiki/Recursive_language; that page only mentions the term briefly, but more on the subject can easily be found.
P.S. So, by constructing the above-mentioned algorithm, you are basically proving that the language is decidable.
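To tie that to a concrete (if small) instance (my addition, not part of the original answer), here is a decider for the language {aⁿbⁿcⁿ | n ≥ 0}: it halts on every finite input, accepting or rejecting, which is exactly what the definition asks for.

# A decider for { a^n b^n c^n : n >= 0 }. It always halts, so the
# language it accepts is decidable by construction.
def decide(s):
    n = len(s) // 3
    return len(s) == 3 * n and s == "a" * n + "b" * n + "c" * n

print(decide("aabbcc"))   # True: accepted
print(decide("aabbc"))    # False: rejected, but we still halted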

Is it possible to create a quine in every turing-complete language?

I just wanted to know if it is 100% possible, if my language is Turing-complete, to write a program in it that prints itself out (of course not using a file-reading function).
So if the language just has the really necessary things to make it Turing-complete (which I would prove by translating Brainf*ck code to it), like output, variables, conditions, and gotos (hell yes, gotos), can I try writing a quine in it?
I'm also asking this because I'm not sure that a quine directly fits into the claim that a Turing machine is capable of any computational task.
I just want to know so I don't try for years without knowing that it may be impossible.
Any programming language which is Turing complete, and which is able to output any string (by a computable function of the string as program — this is a technical condition that is satisfied in every programming language in existence) has a quine program (and, in fact, infinitely many quine programs, and many similar curiosities), as follows from the fixed-point theorem.
See here
I ran into this issue a couple of months ago.
While writing a quine doesn't necessarily prove that a language is Turing-complete, it is a strong suggestion ;) As far as Turing-completeness goes, if you can (like you said) provide a valid translation from another Turing-complete language into yours, then your language is Turing-complete.
That being said, any Turing-complete language that can output arbitrary strings should be able to generate a quine. Also, from Wikipedia:
A quine is a fixed point of an execution environment, when the execution environment is viewed as a function. Quines are possible in any programming language that has the ability to output any computable string, as a direct consequence of Kleene's recursion theorem. For amusement, programmers sometimes attempt to develop the shortest possible quine in any given programming language.
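For concreteness (an example I'm adding, not part of the quoted answers), here is the classic two-line Python quine, built in exactly this fixed-point style: a template applied to its own representation. Its output is exactly its own two lines (no comments inside, since any comment would have to reproduce itself too):

s = 's = %r\nprint(s %% s)'
print(s % s)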
It is possible to have a programming language that cannot print all the symbols in its representation. For example, the I/O may be limited to 7-bit ASCII characters with language keywords in Arabic. That's the only exception I can think of.
Well, technically, not always. According to the proof on Wikipedia, the programming language has to be an admissible numbering. Practical and sane Turing-complete programming languages are all admissible numberings, and a Turing-complete programming language is an admissible numbering if it is possible to translate between it and another admissible numbering.
An example Turing-complete programming language that is not an admissible numbering:
The source code always contains one or two doublequoted escaped strings. If the input is empty, output the first string if there are two strings, or loop forever if there is one. Otherwise, evaluate the last string in Python, using the original input as input.
It's not an admissible numbering because, given a Python program, we have to know its behavior when the input is empty, to translate it into this language. But we may never know if it is an infinite loop, as we cannot solve the halting problem. We know a translation always exists, though.
It's impossible to write quines in this language.

Is it easier to write a recursive-descent parser using an EBNF or a BNF?

I've got a BNF and EBNF for a grammar. The BNF is obviously more verbose. I have a fairly good idea as far as using the BNF to build a recursive-descent parser; there are many resources for this. I am having trouble finding resources to convert an EBNF to a recursive-descent parser. Is this because it's more difficult? I recall from my CS theory classes that we went over EBNFs, but we didn't go over converting them into a recursive-descent parser. We did go over converting BNF's into a recursive-descent parser.
The reason I'm asking is because the EBNF is more compact.
From looking at EBNFs in general, I notice that terms enclosed between { and } can be converted into a while loop. Are there any other guidelines or rules?
You should investigate so-called metacompilers, which essentially compile EBNF into recursive-descent parsers. How they do it is exactly the answer to your question. (It's pretty straightforward, but good to understand the details.)
A really wonderful paper is the "MetaII" paper by Val Schorre. This is metacompiler technology from honest-to-God 1964. In 10 pages, he shows you how to build a metacompiler, and provides not just that, but another compiler too and the output of both! There's an astonishing moment you come to if you build one of these, when you realize how the metacompiler compiles itself using its own grammar. This moment got me hooked on compilers back in about 1970, when I first tripped over this paper. This is one of those computer science papers that everybody in the software business should read.
James Neighbors (the inventor of the term "domain" in software engineering, and builder of the first program transformation system, based on these metacompilers) has a great online MetaII tutorial, for those of you who don't want the do-it-from-scratch experience. (I have nothing to do with this except that Neighbors and I were undergraduates together.)
Both ways are a fine way to learn about metacompilers and generating parsers from EBNF.
The key idea is that the left-hand side of a rule becomes a function that parses that nonterminal: it returns true and advances the input stream on a match, or returns false without advancing the input stream on a mismatch.
The body of the function is determined by the right-hand side. Literal tokens are matched directly, nonterminals become calls to the functions generated for the other rules, Kleene star maps to while loops, and alternations map to conditional branches.
What EBNF doesn't address, and the metacompilers do, is how parsing does anything other than saying "matched" or not. The secret is weaving output operations into the EBNF. The MetaII paper makes all this crystal clear.
Neither is harder than the other. It is really the difference between implementing something iteratively and implementing something recursively. In BNF, everything is recursive. In EBNF, some of the recursion is expressed iteratively. There are different variations in EBNF syntax, so I'll just use the English... "zero or more" is a simple while loop as you have discovered. "One or more" is the same as one followed by "zero or more". "Zero or one times" is a simple if statement. That should cover most of the cases.
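Here is a sketch of those three mappings in a hand-written recursive-descent parser (Python, with a made-up token-stream helper; the grammar is hypothetical):

# Grammar: list = item { "," item }    -- "zero or more" -> while loop
#          item = [ "-" ] NUMBER       -- "zero or one"  -> if statement
# ("one or more" would be one call followed by the same while loop.)
class Tokens:
    def __init__(self, toks):
        self.toks, self.i = toks, 0
    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None
    def next(self):
        self.i += 1
        return self.toks[self.i - 1]

def parse_item(t):
    sign = 1
    if t.peek() == "-":        # [ "-" ] : zero or one
        t.next()
        sign = -1
    return sign * int(t.next())

def parse_list(t):
    items = [parse_item(t)]
    while t.peek() == ",":     # { "," item } : zero or more
        t.next()
        items.append(parse_item(t))
    return items

print(parse_list(Tokens(["1", ",", "-", "2", ",", "3"])))   # [1, -2, 3]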
The early metacompilers META II and TREEMETA and their kin are not exactly recursive-descent parsers; they were described as using recursive functions, which just means those functions could call themselves. We don't call C a recursive language, and a C or C++ function is recursive in the same way the early metacompilers are.
Recursion can be used, but it is generally needed only when analyzing nested language constructs, such as parenthesized expressions and nested blocks.
They are more of an LR / recursive-descent combination. CWIC, the last documented one, has extensive backtracking and lookahead features. Its '-' (not) operator can match any language construct and inverts its success or failure: -term fails if a term is matched, for example, and the input is never advanced. The '?' operator looks ahead and matches any language construct: ?expr, for example, would try to parse an expr, but the construct matched by the lookahead '?' is not kept, nor is the input advanced.