Regular grammars

In a regular grammar with the following rules:
S -> aS / aSbS / ε
is it acceptable to do the following steps?
S -> aSbS -> a{aSbS}bS -> aa{aSbS}bSbS -> aaa{aSbS}bSbSbS
Do I have to replace every S in each step, or can I replace one S out of two, for example? In this: from aSbS, can I do aSb (following the rule S -> ε)? And if I cannot, do I have to replace all the S's using the same rule, i.e. (aSbS) -> a(aS)b(aS) (following the rule S -> aS), or can I do (aSbS) -> a(aS)b(aSbS)?
P.S. I use the parentheses to indicate which S's I have replaced.

In a formal grammar, a derivation step consists of replacing a single instance of the left-hand side of a rule with the right-hand side of that rule. In the case of a context-free grammar, the left-hand side is a single non-terminal, so a derivation step consists of replacing a single instance of a non-terminal with one of the possible corresponding right-hand sides.
You never perform two replacements at the same time, and each derivation step is independent of every other derivation step, so in a context-free grammar, there is no way to express the constraint that two non-terminals need to be replaced with the same right-hand side. (In fact, you cannot directly express this constraint with a context-sensitive grammar either, but you can achieve the effect by using markers.)
Regular grammars are a subset of context-free grammars in which either every right-hand side which includes a non-terminal has the form aC, or every right-hand side which includes a non-terminal has the form Ba. (This comes straight out of the Wikipedia page which you link.) Your grammar is not a regular grammar, because the rule S → aSbS has two non-terminals on the right-hand side, which matches neither of the regular forms.
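To make "one replacement per step" concrete, here is a minimal Python sketch (the string encoding and names are mine, not part of the question) which rewrites exactly one S per step, always the leftmost one; any other single occurrence could be chosen instead:

    RULES = ["aS", "aSbS", ""]      # right-hand sides for S -> aS | aSbS | ε

    def step(form, rhs):
        """Replace the leftmost S in the sentential form with one RHS."""
        i = form.index("S")         # exactly one occurrence is rewritten
        return form[:i] + rhs + form[i + 1:]

    # One possible derivation of "aab"; each line rewrites a single S,
    # and different steps are free to use different rules.
    form = "S"
    for rhs in ["aSbS", "aS", "", ""]:
        form = step(form, rhs)
        print(form)                 # aSbS, aaSbS, aabS, aab

Note how the third step erases one S with S → ε while the other S had earlier been rewritten with S → aS: each occurrence is replaced independently.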

Related

How can I use the same lexer to provide token streams with and without whitespace?

I have a lexer grammar that defines a lexer that is used in two ways: to identify tokens for a syntax-aware editor, and to identify tokens for the parser. In the first case, the lexer should return comments and whitespace, but in the second case, the comments and whitespace are not wanted. Do I need two different lexer classes, each defined by its own variant of the grammar? Or can I accomplish this with a single lexer by using channels? How?
If I need two separate grammars, I assume I can factor out all the rules except for comments and whitespace, and then import those rules from that separate "common" grammar.
Usually you filter out tokens (like whitespaces) via token channels (or skip them entirely). This is part of your grammar and hence you'd need 2 grammars if you want whitespaces in one use case and not in the other. And yes, you can import a base grammar with all the common rules into specialized grammars which only hold the differences. You can even override rules (define e.g. the whitespace rule in the base grammar and redefine it in your main grammar).
But keep in mind that not filtering whitespaces will have consequences for all your other rules. In that case you would have to explicitly add whitespace handling to your parser rules everywhere. For instance:
blah: a or b;
versus
blah: a WS* or WS* b;
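Going back to the channel option mentioned above, here is a minimal Python sketch of how two consumers could read tokens differently (MyLexer is a hypothetical generated lexer whose whitespace and comment rules are sent to the HIDDEN channel; this only illustrates the channel mechanism, not the two-grammar setup):

    from antlr4 import InputStream, CommonTokenStream, Token
    from MyLexer import MyLexer     # hypothetical ANTLR-generated lexer

    def tokens_for_editor(text):
        # The syntax-aware editor wants every token, hidden or not.
        return MyLexer(InputStream(text)).getAllTokens()

    def tokens_for_parser(text):
        # CommonTokenStream only feeds default-channel tokens to the
        # parser, so hidden whitespace/comments never reach parser rules.
        stream = CommonTokenStream(MyLexer(InputStream(text)))
        stream.fill()
        return [t for t in stream.tokens
                if t.channel == Token.DEFAULT_CHANNEL]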

Can a context-sensitive grammar have an empty string?

In one of my CS classes they mentioned that the difference between a context-free grammar and a context-sensitive grammar is that in a CSG, the left side of a production rule has to be at most as long as the right side.
So, one example they gave was that context-sensitive grammars can't derive an empty string, because then that rule wouldn't be satisfied.
However, I have understood that regular grammars are contained in context-free grammars, context-free grammars in context-sensitive grammars, and context-sensitive grammars in recursively enumerable grammars.
So, for example, if a grammar is regular, then it is also context-free, context-sensitive, and recursively enumerable.
The problem is that if this holds, and I have a context-free grammar that contains an empty-string production, then it would not satisfy the rule to be counted as context-sensitive, and a contradiction would occur, because every context-free grammar is supposed to be context-sensitive.
Empty productions ("lambda productions", so-called because λ is often used to refer to the empty string) can be mechanically eliminated from any context-free grammar, except for the possible top-level production S → λ. The algorithm to do so is presented in pretty well every text on formal language theory.
So for any CFG with lambda productions, there is an equivalent CFG without lambda productions which generates the same language, and which is also a context-sensitive grammar. Thus, the prohibition on contracting rules in CSGs does not affect the hierarchy of languages: any context-free language is a context-sensitive language.
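For concreteness, here is a small Python sketch of that standard elimination (the grammar encoding is mine): compute the nullable non-terminals by fixed-point iteration, then replace each production by every variant obtained by deleting some subset of its nullable symbols:

    from itertools import product

    def eliminate_lambda(rules):
        """rules maps a non-terminal to a set of right-hand sides,
        each RHS a tuple of symbols; () is a lambda production.
        The result generates the same language minus the empty string."""
        # 1. Find all nullable non-terminals (fixed-point iteration).
        nullable = set()
        changed = True
        while changed:
            changed = False
            for a, rhss in rules.items():
                if a not in nullable and any(
                        all(s in nullable for s in rhs) for rhs in rhss):
                    nullable.add(a)
                    changed = True
        # 2. Expand each production: every nullable symbol may be
        #    kept or dropped; discard the resulting lambda productions.
        new_rules = {}
        for a, rhss in rules.items():
            out = set()
            for rhs in rhss:
                options = [[(s,), ()] if s in nullable else [(s,)]
                           for s in rhs]
                for pick in product(*options):
                    variant = tuple(s for part in pick for s in part)
                    if variant:
                        out.add(variant)
            new_rules[a] = out
        return new_rules

    # S -> aSbS | λ  becomes  S -> aSbS | aSb | abS | ab
    print(eliminate_lambda({"S": {("a", "S", "b", "S"), ()}}))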
Chomsky's original definition of context-sensitive grammars did not specify the non-contracting property, but rather an even more restrictive one: every production had to be of the form αAβ→αγβ where A is a single symbol and γ is not empty. This set of grammars generates the same set of languages as non-contracting grammars (that was also proven by Chomsky), but it is not the same set. Also, his context-free grammars were indeed a subset of context-sensitive grammars because by his original definition of CFGs, lambda productions were prohibited. (The 1959 paper is available online; see the Wikipedia article on the Chomsky hierarchy for a reference link.)
It is precisely the existence of a non-empty context -- α and β -- which leads to the names "context-sensitive" and "context-free"; it is much less clear what "context-sensitive" might mean with respect to an arbitrary non-contracting rule such as AB→BA . (Note 1)
In short, the claim that "every CFG is a CSG" is not technically correct given the common modern usage of CFG and CSG, as cited in your question. But it is only a technicality: the CFG with lambda productions can be mechanically transformed, just as a non-contracting grammar can be mechanically transformed into a grammar fitting Chomsky's definition of context-sensitive (see the Wikipedia article on non-contracting grammars).
(It is also quite common to allow both context-sensitive and context-free languages to include the empty string, by adding an exception for the rule S→λ to both CFG and CSG definitions.)
Notes
In Chomsky's formulation of context-free and context-sensitive grammars, it was unambiguous what was meant by a parse tree, even for a CSG; since Chomsky is a linguist and was seeking a framework to explain the structure of natural language, the notion of a parse tree mattered. It is not at all obvious how you might describe the result of AB → BA as a parse tree, although it is quite clear in the case of, for example, aAb → aBb.

Mutually left-recursive?

I'm working on a parser for a grammar in ANTLR. I'm currently working on expressions, where () has the highest precedence, then unary minus, etc.
When I add the line, ANTLR gives the error:
The following sets of rules are mutually left-recursive [add, mul, unary, not, and, expr, paren, accessMem, relation, or, assign, equal]
How can I go about solving this issue? Thanks in advance.
The easiest answer is to use ANTLR 4, not 3, which has no problem with immediate left recursion. It automatically rewrites the grammar under the covers to do the right thing. There are plenty of examples; one could, for example, examine the Java grammar or my little blog entry on left-recursive rules. If you are stuck with v3, then there are earlier versions of the Java grammar, and also plenty of material on how to build arithmetic expression rules in the documentation and the book.
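In case it helps to see what removing immediate left recursion amounts to, here is a short Python sketch of the classic textbook transformation (this is the standard construction from compiler texts, not ANTLR 4's exact internal rewrite, and the names are mine):

    def eliminate_left_recursion(a, productions):
        """A -> A a1 | ... | A am | b1 | ... | bn   becomes
        A  -> b1 A' | ... | bn A'
        A' -> a1 A' | ... | am A' | ε
        productions: right-hand sides as tuples of symbols."""
        a2 = a + "'"
        alphas = [p[1:] for p in productions if p and p[0] == a]
        betas = [p for p in productions if not p or p[0] != a]
        return {
            a:  [b + (a2,) for b in betas],
            a2: [x + (a2,) for x in alphas] + [()],  # () = ε
        }

    # expr -> expr + term | term   becomes
    # expr -> term expr'   and   expr' -> + term expr' | ε
    print(eliminate_left_recursion("expr",
                                   [("expr", "+", "term"), ("term",)]))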

RFC 6570 URL Templates : the role of / vs. other prefixes

I recently read some of : https://www.rfc-editor.org/rfc/rfc6570#section-1
And I found the following URL template examples:

Given:
    var  = "value"
    x    = 1024
    path = "/foo/bar"

    {/var,x}/here   →   /value/1024/here
    {#path,x}/here  →   #/foo/bar,1024/here
These seem contradictory.
In the first one, it appears that the / replaces the ,.
In the second one, it appears that the , is kept.
Thus, I'm wondering whether there are inconsistencies in this particular RFC. I'm new to these RFCs, so maybe I don't fully understand the culture behind how they develop.
There's no contradiction in those two examples. They illustrate the point that the rules for expanding an expression whose first character is / are different from the rules for expanding an expression whose first character is #. These alternative expansion rules are pretty much the entire point of having a variety of different magic leading characters -- which are called operators in the RFC.
The expression with the leading / is expanded according to a rule that says "each variable in the expression is replaced by its value, preceded by a / character". (I'm paraphrasing the real rule, which is described in section 3.2.6 of that RFC.) The expression with the leading # is expanded according to a rule that says "each variable in the expression is replaced by its value, with the first variable preceded by a # and subsequent variables preceded by a ,". (Again paraphrased; see section 3.2.4 for the real rule.)
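A quick way to see the two rules side by side is to put the paraphrases in code. The Python sketch below implements only those two paraphrases (no percent-encoding, value lists, or modifiers, so it is nowhere near a full RFC 6570 implementation):

    def expand_path(names, values):
        # "/" operator (section 3.2.6): every value is preceded by "/".
        return "".join("/" + values[n] for n in names)

    def expand_fragment(names, values):
        # "#" operator (section 3.2.4): "#" before the first value,
        # "," before each subsequent one.
        return "#" + ",".join(values[n] for n in names)

    values = {"var": "value", "x": "1024", "path": "/foo/bar"}
    print(expand_path(["var", "x"], values) + "/here")       # /value/1024/here
    print(expand_fragment(["path", "x"], values) + "/here")  # #/foo/bar,1024/here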

Is there an algorithm to convert a right-linear grammar to a left-linear grammar?

Is there an algorithm to convert a right-linear grammar to an equivalent left-linear grammar?
For every right-linear grammar, there exists an equivalent left-linear grammar that generates the same language, and vice-versa.
1. Use the grammar to build the FSA that recognizes the language generated by the original grammar.
2. Swap the initial states with the final states.
3. Invert the orientation of every arrow.
4. If multiple initial states are present, set them as not initial, create a dummy initial state, and link it to them using spontaneous moves.
5. From the modified FSA, obtain another right-linear grammar, using the "standard" approach.
6. Reverse the right side of every production of the grammar.

You should get an equivalent left-linear grammar; a sketch of the whole pipeline follows.
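Here is a minimal Python sketch of that pipeline (the encoding is mine; it assumes productions of the forms A -> aB, A -> a, and A -> ε, with a a single terminal, so that inverting an arrow never has to reverse a label):

    def right_to_left_linear(rules, start):
        """rules maps a non-terminal to a list of (terminal, next) pairs,
        where next is a non-terminal or None for a final production.
        Returns (left_rules, new_start) for an equivalent left-linear
        grammar; in left_rules, (B, x) encodes A -> Bx and (None, x)
        encodes A -> x."""
        final = "F#"                          # fresh accepting state
        # Step 1: read the grammar as an FSA: A -x-> B for A -> xB,
        # and A -x-> final for A -> x.
        edges = [(a, x, b if b is not None else final)
                 for a, alts in rules.items() for x, b in alts]
        # Steps 2-4: swap initial and final states and invert every
        # arrow (there is one initial and one final state here, so no
        # dummy initial state is needed).
        redges = [(dst, x, src) for src, x, dst in edges]
        new_start, new_final = final, start
        # Step 5: standard FSA -> right-linear grammar, now describing
        # the reversed language: edge P -x-> Q gives P -> xQ, and the
        # final state also gets an ε production.
        rev = {}
        for src, x, dst in redges:
            rev.setdefault(src, []).append((x, dst))
        rev.setdefault(new_final, []).append(("", None))
        # Step 6: reversing each right side turns A -> xB into A -> Bx,
        # giving a left-linear grammar for the original language.
        left = {a: [(b, x) for x, b in alts] for a, alts in rev.items()}
        return left, new_start

    # L = a*b with S -> aS | b yields F# -> Sb and S -> Sa | ε.
    print(right_to_left_linear({"S": [("a", "S"), ("b", None)]}, "S"))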