How to prove that a grammar is ambigous/unambigous? - automation

If we have the the grammer
how to correctly prove that this is unambigious? In my opinion it's hard to describe it so is there a mathematical or logic proof for that?
And then how to proove that L(G) = G? Next thing where it is hard for me to give a logic proof.
Hope somebody can help.

You can prove that a grammar is unambiguous by constructing a deterministic parser. You can prove that it is ambiguous by finding a sentence with two different parse trees (or, better said, with two different leftmost or rightmost derivations).
There is no definitive algorithm for producing such proofs, because the ambiguity of a grammar is undecidable.
I have no idea what you mean by proving that L(G) = G. That's clearly not true, because L(G) is a set of strings, while G is a grammar. The two objects are from different universes, so they cannot be equal. Perhaps you meant proving that some set of strings S is equal to L(G)? Again, this problem is, in general, undecidable, but in many useful cases you can construct such a proof. A common strategy is to use induction on the length of the string.

Related

How to find the language of a NFA

As above, I have a transition graph, but I'm not sure how to find the language of it, seems to me that there are a lot of possibilities, but I must be misunderstanding somehow. My understanding is that any word that leads from the initial to the final state is accepted. Surely there are a lot of different ways to achieve this. aab, ab, abb, abbaab. As I understand it, a language is a set of all the possible words, but if there are a vast amount of possible words, how can you find the language? Im a first year University student, this is part of my homework, but I'm not just trying to get you to do it for me, I want to understand it - if this doesn't make sense/ is in the wrong place I apologise in advance, thanks. Here's my graph
Some regular languages are finite - they contain a finite number of strings. When you have a finite number of things, that means you can count them all and get to the end eventually; you can write them all down if they're words in a language. Writing down all the words in a language is a way of giving an extensive definition of a language.
However - there are languages which are not finite. They do not contain any number of words you can count from beginning to end, or ever completely write down. If you think of all natural numbers (1, 2, …, 100, …) as strings in the language of decimal representations, clearly there are not finitely many of them. There are infinitely many. You cannot give extensive definitions of infinite languages (except, possibly, by suggestive use of the ellipsis, as I have done in my example). In these cases, you must describe the strings which are included and/or excluded and rely on the reader to deduce membership for any particular case.
Giving a finite automaton is one perfectly reasonable way of giving a criterion according to which membership in the language can be determined: run the string through the automaton and see if it is accepted. In this sense, asking what the language of a finite automaton is can be viewed as trivial: it accepts the language of strings that leave the finite automaton in an accepting state.
Another way of describing the language is to give a grammar or, for regular languages, a regular expression. These are not necessarily more or less helpful ways of describing a language than the finite automaton you already have is.
Typically, in coursework, when you are asked to describe the language of a finite automaton, you are probably being asked to give a plain, English-language definition - a simple one - of the strings, and possibly provide some set-builder notation. That's what we'll try to do here.
Your NFA loops in q0, accepting any number of a, until it sees at least one a. If it sees a b before an a, it crashes. After that, if it sees at least one b, it can accept; it can see any number of b, but after the initial run of a, it can never again see two a in a row. The machine accepts only if it ends with a b.
Taken together, this might be a description in plain English that is good for this language:
All strings that begin with at least one a, which end with b and which do not contain the substring aa after the first instance of b.
A regular expression for this language is aa*(bb*a)*.

Many instances of a terminal symbol in a BNF grammar

given a grammar like
<term>::= x[i]+exp(x[i]) | x[i]
<i>::= 1|2|3
Does a way exist to force the use of the same "i" in one solution of non terminal symbol ? So, I want to avoid solutions like x[1]+exp(2) or x[3]+exp(1)
Does a way exist to avoid that the same "i" is used in one solution of non terminal symbol ?So, I want to avoid solutions like x[1]+exp(1)
No, that's not possible with a context-free grammar.
This is essentially what "context-free" means. Every non-terminal in a production can be expanded independently without regard to the context in which it appears.
Of course, if i really only has three possible values, you can enumerate the finite number of legal productions, according to any definition of "legal" which you find convenient. But that gets really messy when the number of possibilities increases.
The most convenient solution is generally to accept the base syntax and check for concordance (or difference) in the associated semantic rule. That also allows for better error messages.

String - Matching Automaton

So I am going to find the occurrence of s in d. s = "infinite" and d = "ininfinintefinfinite " using finite automaton. The first step I need is to construct a state diagram of s. But I think I have problems on identifying the occurrence of the string patterns. I am so confused about this. If someone could explain a little bit on this topic to me, it'll be really helpful.
You should be able to use regular expressions to accomplish your goal. Every programming language I've seen has support for regular expressions and should make your task much easier. You can test your regexs here: https://regexr.com/ and find a reference sheet for different things. For your example, the regex /(infinite)/ should do the trick.

Is it, and if so why, wrong that these two regular grammars are different?

I'm tasked with writing a regular grammar based on a regular expression.
Given the regular expression a*b can be written as S -> b | aS
Is it incorrect that ba* as a regular grammar is S -> b | Sa?
I'm told the correct answer is in fact S -> bA, A -> ^| aA but I don't see the difference myself.
An explanation would be greatly appreciated!
IIRC, both your answer and the one being called "correct" are correct. See this. What you have constructed is a "left regular grammar", while the proponent(s) of the "correct" answer obviously prefer a "right regular grammar". There are other arbitrary rules that may be held more or less pedantically, like the "no empty productions" rule, but they don't really affect the class of regular languages, just the compactness of the grammar you use for a particular language, as your example highlights - a single production with two alternatives vs. two productions, one with a single clause, and one with two alternatives, one of which is empty.

common meanings of punctuation characters

I'm writing my own syntax and want characters that do not have obvious common meanings in that syntax [1]. Is there a list of the common meanings of punctuation characters (e.g. '?' could be part of a ternary operator, or part of a regex) so I can try to pick those which may not have 'obvious' syntax (I can be the judge of that :-).
[1] It's actually an extended Fortran FORMAT, but the details are irrelevant here
Here is an exhaustive survey of syntax across languages.
I am loath to be so defeatist, but this does sound a bit like it doesn't exist ( a list of all the symbols / operators across languages ) a quick look around would give a good idea of what is commonplace.
Assuming that you will restrict yourself to ASCII, the short-list is more or less what you can see on your keyboard and I can can think of a few uses for most of them. So maybe avoiding conflicts is a bit ambitious. Of course it depends on who is to be the user of this syntax, if for example symbols that are relatively unused in Fotran would be suitable then that is more realistic.
This link: Fotran 95 Spec gives a list of Fortran operators, which might help if avoided.
I'm sorry if any of this is a statement of the obvious or missing the point, or just not very helpful :)
I would say [a-z][A-Z] All do not have an obvious syntax for instance. if you used Upper case T as an operator.
x T v
The downfall is people like to use letters for variables.
Other than that you might want to investigate multicharacter operators, the downfall of these however is that they quickly grow weary to type things like
scalar = vec4i *+ vec4j
if you perhaps had a Fused multiply add operator. Well that one isnt so bad, but I'm sure you can find more cumbersome ones.