Is it, and if so why, wrong that these two regular grammars are different? - grammar

I'm tasked with writing a regular grammar based on a regular expression.
Given the regular expression a*b can be written as S -> b | aS
Is it incorrect that ba* as a regular grammar is S -> b | Sa?
I'm told the correct answer is in fact S -> bA, A -> ^| aA but I don't see the difference myself.
An explanation would be greatly appreciated!

IIRC, both your answer and the one being called "correct" are correct. See this. What you have constructed is a "left regular grammar", while the proponent(s) of the "correct" answer obviously prefer a "right regular grammar". There are other arbitrary rules that may be held more or less pedantically, like the "no empty productions" rule, but they don't really affect the class of regular languages, just the compactness of the grammar you use for a particular language, as your example highlights - a single production with two alternatives vs. two productions, one with a single clause, and one with two alternatives, one of which is empty.

Related

Many instances of a terminal symbol in a BNF grammar

given a grammar like
<term>::= x[i]+exp(x[i]) | x[i]
<i>::= 1|2|3
Does a way exist to force the use of the same "i" in one solution of non terminal symbol ? So, I want to avoid solutions like x[1]+exp(2) or x[3]+exp(1)
Does a way exist to avoid that the same "i" is used in one solution of non terminal symbol ?So, I want to avoid solutions like x[1]+exp(1)
No, that's not possible with a context-free grammar.
This is essentially what "context-free" means. Every non-terminal in a production can be expanded independently without regard to the context in which it appears.
Of course, if i really only has three possible values, you can enumerate the finite number of legal productions, according to any definition of "legal" which you find convenient. But that gets really messy when the number of possibilities increases.
The most convenient solution is generally to accept the base syntax and check for concordance (or difference) in the associated semantic rule. That also allows for better error messages.

String - Matching Automaton

So I am going to find the occurrence of s in d. s = "infinite" and d = "ininfinintefinfinite " using finite automaton. The first step I need is to construct a state diagram of s. But I think I have problems on identifying the occurrence of the string patterns. I am so confused about this. If someone could explain a little bit on this topic to me, it'll be really helpful.
You should be able to use regular expressions to accomplish your goal. Every programming language I've seen has support for regular expressions and should make your task much easier. You can test your regexs here: https://regexr.com/ and find a reference sheet for different things. For your example, the regex /(infinite)/ should do the trick.

Efficient method for storing simple regular expressions

I have a list of simple regular expressions:
ABC.+DE.+FHIJ.+
.+XY.+Z.+AB
.+KLM.+NO.+J.+
QRST.+UV
they all have alternating patterns of .+ and some text (I will call "words") repeated some number of times. A pattern may or may not begin or end in .+. These regular expression are all mutually exclusive. When another regex is added I want to remove any other matching regular expressions, and add one regular expression that combines the added one with all of its matches. For example, adding:
.+J.+
would match,
ABC.+DE.+FHIJ.+
.+KLM.+NO.+J.+
and thus, these would be remove and replaced with the added regular expression resulting in:
.+J.+
.+XY.+Z.+AB
QRST.+UV
I need to store these patterns either in some data structure or (preferably) in a database in an efficient manner. I first tried a tree of dictionaries, only to realize that in the case that a regex starts with a .* it has to search the entire tree for the next word, which is order O(2^n). Unfortunately, (unless I am mistaken) it appears that neither SQLite (which I am using) nor any other relational database that I have used, supports "regular expression" as a data type. My question is, is there an efficient method for storing and retrieving such simple regular expressions? If there is no canned method, is there some data structure that would be relatively efficient (say, at worst amortized polynomial time)?
Could you please explain what you are using these regular expressions for as that would make it easier to provide a better answer? In particular when I see the way you are splitting your regular expressions I'm wondering if a Trie or a Directed acyclic word graph would be a better fit.
From their you may find your answer is as simple as providing better normalization or finding an alternative no SQL db made specifically for your problem area.

Am I correct? (Finite Automata)

I was given a regular expression, and I am suppose to covert it to NFA and then DFA. Here's the regular expression:
a ( b | c )* a | a a c* b
Then I coverted this to NFA using Thomson's algorithm:
and here's the DFA:
Can someone please take a quick look at let me know if I am wrong or right?
Since this is very likely homework, I'm hesitant to just give you the complete correct solution.
Your NFA appears correct, but has a lot of superfluous states that aren't necessary but do not adversely affect its correctness. (At first glance it looks like you could remove 11 states.)
Your DFA is incorrect, though. This is because when you branch off to begin handling one condition of the string or the other, you later rejoin them together. This allows it to take the path from an accepted string matching a(b|c)*a and take in another b or c by travelling to nodes 15,17 or 11. It then accepts this string even though it doesn't match your expression.
What you need to do is basically stop this from happening. If you have additional questions feel free to ask.
I highly recommend making a list of test strings that you know should be and shouldn't be accepted, and then trace them through, making sure your automata ends in the correct (accept or reject) state.

common meanings of punctuation characters

I'm writing my own syntax and want characters that do not have obvious common meanings in that syntax [1]. Is there a list of the common meanings of punctuation characters (e.g. '?' could be part of a ternary operator, or part of a regex) so I can try to pick those which may not have 'obvious' syntax (I can be the judge of that :-).
[1] It's actually an extended Fortran FORMAT, but the details are irrelevant here
Here is an exhaustive survey of syntax across languages.
I am loath to be so defeatist, but this does sound a bit like it doesn't exist ( a list of all the symbols / operators across languages ) a quick look around would give a good idea of what is commonplace.
Assuming that you will restrict yourself to ASCII, the short-list is more or less what you can see on your keyboard and I can can think of a few uses for most of them. So maybe avoiding conflicts is a bit ambitious. Of course it depends on who is to be the user of this syntax, if for example symbols that are relatively unused in Fotran would be suitable then that is more realistic.
This link: Fotran 95 Spec gives a list of Fortran operators, which might help if avoided.
I'm sorry if any of this is a statement of the obvious or missing the point, or just not very helpful :)
I would say [a-z][A-Z] All do not have an obvious syntax for instance. if you used Upper case T as an operator.
x T v
The downfall is people like to use letters for variables.
Other than that you might want to investigate multicharacter operators, the downfall of these however is that they quickly grow weary to type things like
scalar = vec4i *+ vec4j
if you perhaps had a Fused multiply add operator. Well that one isnt so bad, but I'm sure you can find more cumbersome ones.