Construct a grammar for a language

I have a question regarding this problem:
L = the empty language, where the alphabet is {a,b}.
How do I create a grammar for this? What would the production rules be?
Thanks in advance.

A grammar G is a tuple (S, N, E, e, P) where:
N is a set of non-terminal symbols
E is a set of terminal symbols
N and E are disjoint
E is a superset of the alphabet of L(G)
e is the empty string
P is a set of ordered pairs of strings over (N U E U e); that is, P is a subset of (N U E U e)* X (N U E U e)*.
S, the start symbol, is in N
A derivation in G is a sequence of elements of (N U E U e)* such that:
The first element is S
Adjacent elements w[i] and w[i+1] can be written as w[i] = uxv and w[i+1] = uyv such that (x, y) is in P
If there is a derivation in G whose last element is a string w[n] in (E U e)*, we say G generates w[n]; that is, w[n] is in L(G).
Now, we want to define a grammar G such that L(G) is the empty set. We fix the alphabet E = {a, b}. We must still define:
N, the set of nonterminals
S, the start symbol
P, the productions
We might as well take S as our start symbol. So N contains at least S; N is a superset of {S}. We will only add more nonterminals if we determine we need them. Let us turn our attention to the condition that L(G) is empty.
If L(G) is empty, that means there is no derivation in G that leads to a string of only terminal symbols. We can accomplish this easily by ensuring that every production produces at least one nonterminal along with any terminals, or produces no terminals at all. So the following grammars would all work:
S := S
or
S := aSb
or
S := aXb | XXSSX
X := aabbXbbaaS
etc. All of these grammars have L(G) empty, since none of them can ever derive a string consisting only of terminals.
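As a sanity check, here is a minimal Haskell sketch (my own illustration, not part of the original answer) that encodes the last grammar above and verifies the property the argument relies on: every right-hand side contains a nonterminal (written as an uppercase letter), so no derivation can ever reach a terminal-only string.

import Data.Char (isUpper)

-- Productions of the grammar S := aXb | XXSSX, X := aabbXbbaaS.
productions :: [(String, String)]
productions = [("S", "aXb"), ("S", "XXSSX"), ("X", "aabbXbbaaS")]

-- True if every right-hand side contains at least one nonterminal, in which
-- case every sentential form derived from S still contains a nonterminal,
-- so L(G) is empty.
everyRhsHasNonterminal :: [(String, String)] -> Bool
everyRhsHasNonterminal = all (any isUpper . snd)

main :: IO ()
main = print (everyRhsHasNonterminal productions)  -- prints True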

Related

Stating a grammar formally as a 4-tuple

I got this interview question that I know is easy, but I can't seem to get it.
State the following grammar formally as a 4-tuple. (Assume the terminal alphabet is the set of lowercase letters that appear in productions, the nonterminal alphabet is the set of uppercase letters that appear in productions, and the start symbol is S.)
S -> abS|X
X -> baX|epsilon
A grammar G is defined as a four-tuple (N, S, E, P) where:
N is a finite set of nonterminal symbols
S, an element of N, is the start symbol
E is a finite set of terminal symbols
N and E are disjoint
P is a set of productions, i.e., ordered pairs over (N + E)* x (N + E)*
There is a derivation of string w in grammar G if there is a sequence w[1], w[2], ..., w[n] of strings over (N + E)* such that:
w[1] = S
For each 1 <= i < n: w[i] = xyz, w[i+1] = xy'z and (y, y') is a production in P
w[n] = w is a string of terminal symbols.
The set of all strings w which have a derivation in G is called the language of G, L(G).
Now, for your grammar:
Nonterminal symbols: S, X
Start symbol: S
Terminal symbols: a, b
Productions: (S, abS), (S, X), (X, baX), (X, epsilon)
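If it helps to see the answer written down concretely, here is a small Haskell encoding of exactly this 4-tuple (my own sketch, not part of the original answer), with epsilon represented as the empty string:

data Grammar = Grammar
  { nonterminals :: [Char]             -- N
  , startSymbol  :: Char               -- S, an element of N
  , terminals    :: [Char]             -- E
  , prods        :: [(String, String)] -- P, ordered pairs over (N + E)*
  } deriving Show

interviewGrammar :: Grammar
interviewGrammar = Grammar
  { nonterminals = "SX"
  , startSymbol  = 'S'
  , terminals    = "ab"
  , prods        = [("S", "abS"), ("S", "X"), ("X", "baX"), ("X", "")]
  }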

Constructing a linear grammar for the language

I have difficulty constructing grammars for languages, especially linear grammars.
Can anyone give me some basic tips or a methodology I can use to construct a grammar for a given language? Thanks in advance.
I also doubt whether the following answer to the question "Construct a linear grammar for the language" is right:
L = {a^n b c^n | n belongs to the natural numbers}
Solution:
Right-Linear Grammar :
S--> aS | bA
A--> cA | ^
Left-Linear Grammar:
S--> Sc | Ab
A--> Aa | ^
As pointed out in the comments, these grammars are wrong since they generate strings not in the language. Here's a derivation of abcc in both grammars:
S -> aS -> abA -> abcA -> abccA -> abcc
S -> Sc -> Scc -> Abcc -> Aabcc -> abcc
Also as pointed out in the comments, there is a simple linear grammar for this language, where a linear grammar is defined as having at most one nonterminal symbol in the RHS of any production:
S -> aSc | b
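As a quick sanity check (a sketch of my own, not from the original answer), the strings this grammar derives can be enumerated by counting how many times S -> aSc is applied before S -> b terminates the derivation; the output is exactly a^n b c^n:

fromS :: Int -> [String]
fromS 0 = ["b"]                                     -- S -> b
fromS k = [ "a" ++ w ++ "c" | w <- fromS (k - 1) ]  -- S -> aSc

main :: IO ()
main = mapM_ putStrLn (concatMap fromS [0 .. 3])    -- b, abc, aabcc, aaabccc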
There are some general rules for constructing grammars for languages. These are either obvious simple rules or rules derived from closure properties and the way grammars work. For instance (a small Haskell sketch of the last three rules follows this list):
if L = {a} for an alphabet symbol a, then S -> a is a grammar for L.
if L = {e} for the empty string e, then S -> e is a grammar for L.
if L = R U T for languages R and T, then S -> S' | S'', along with the grammars for R and T, is a grammar for L if S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
if L = RT for languages R and T, then S -> S'S'', along with the grammars for R and T, is a grammar for L if S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
if L = R* for language R, then S -> S'S | e, along with the grammar for R, is a grammar for L if S' is the start symbol of the grammar for R.
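Here is the sketch: a minimal Haskell encoding of the last three rules (my own illustration, not from the original answer), assuming the two input grammars already use disjoint nonterminal names and that "S" is fresh, i.e. used by neither of them:

data Sym = N String | T Char deriving Show

data CFG = CFG { start :: String, rules :: [(String, [Sym])] } deriving Show

-- L = R U T:  S -> S' | S''
unionG :: CFG -> CFG -> CFG
unionG r t = CFG "S" ([("S", [N (start r)]), ("S", [N (start t)])] ++ rules r ++ rules t)

-- L = RT:  S -> S'S''
concatG :: CFG -> CFG -> CFG
concatG r t = CFG "S" (("S", [N (start r), N (start t)]) : rules r ++ rules t)

-- L = R*:  S -> S'S | e
starG :: CFG -> CFG
starG r = CFG "S" (("S", [N (start r), N "S"]) : ("S", []) : rules r)

-- Example: the grammar R -> aRb | ab used below.
rG :: CFG
rG = CFG "R" [("R", [T 'a', N "R", T 'b']), ("R", [T 'a', T 'b'])]

Note that concatG and starG introduce right-hand sides containing two nonterminals, which is exactly why rules 4 and 5 do not preserve linearity, as discussed next.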
Rules 4 and 5, as written, do not preserve linearity. Linearity can be preserved for left-linear and right-linear grammars (since those grammars describe regular languages, and regular languages are closed under these kinds of operations); but linearity cannot be preserved in general. To prove this, an example suffices:
R -> aRb | ab
T -> cTd | cd
L = RT = {a^n b^n c^m d^m : n, m > 0}
L' = R* = (a^n b^n)*, where n > 0 may differ in each repetition
Suppose there were a linear grammar for L. We must have a production for the start symbol S that produces something. To produce something, we require a string of terminal and nonterminal symbols. To be linear, we must have at most one nonterminal symbol. That is, our production must be of the form
S := xYz
where x is a string of terminals, Y is a single nonterminal, and z is a string of terminals. If x is non-empty, reflection shows the only useful choice is a; anything else fails to derive known strings in the language. Similarly, if z is non-empty, the only useful choice is d. This gives four cases:
x empty, z empty. This is useless, since we now have the same problem to solve for nonterminal Y as we had for S.
x = a, z empty. Y must now generate exactly a^n' b^n' b c^m d^m where n' = n - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
x empty, z = d. Y must now generate exactly a^n b^n c c^m' d^m' where m' = m - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
x = a, z = d. Y must now generate exactly a^n' b^n' bc c^m' d^m' where n' and m' are as in 2 and 3. But then the exact same argument applies to the grammar whose start symbol is Y.
None of the possible productions for S gets us any closer to a string in the language; the same problem simply recurs for Y. Therefore no strings are derived, a contradiction, so a grammar for L cannot be linear.
Suppose there were a linear grammar for L'. Then that grammar would have to generate all the strings in (a^n b^n)R(a^m b^m), as well as those in {e} U R. But it cannot generate the former, by the argument used above: any production useful for that purpose gets us no closer to a string in the language.

Return rows where the count of a character is?

I'm developing a rudimentary word-finder app with SQL and Ruby, where I have an array of letters from which to find available words. It's easier to build the query by narrowing down which alphabetic letters aren't in the array. For example:
alphabet= %w{a b c d e f g h i j k l m n o p q r s t u v w x y z}
available_letters = %w{p k z l p m t l n g g r u a r t n d z w a l m n e}
I can then subtract available_letters from alphabet to get the letters to exclude from my search, and I end up with an SQL query like the one below.
select * from words
where word not like '%b%' and word not like '%c%' and word not like '%f%'.....
This gives me all the words made up of some combination of the available letters, but it does not narrow them down by the number of times each letter occurs. So if I only have one "e", I would like the query to return only words that contain at most one e. I'm not sure whether this can be done with an SQL query or whether I will need a stored procedure. Does anyone know a good way of solving this?
You will probably want to look into faster ways of implementing this, but to answer your question, you can exclude words with more than one "e" with not like '%e%e%'.
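For completeness, the check that the query is approximating is "the word uses each letter no more often than it is available"; here is a sketch of that check in application code (a Haskell illustration of my own, outside the Ruby/SQL setup of the question):

import Data.List (group, sort)

-- Letter frequencies of a string, e.g. counts "letter" = [('e',2),('l',1),('r',1),('t',2)].
counts :: String -> [(Char, Int)]
counts s = [ (head g, length g) | g <- group (sort s) ]

-- A word is usable if every letter it needs occurs often enough among the available letters.
usable :: String -> String -> Bool
usable available word = all enough (counts word)
  where
    enough (c, k) = k <= maybe 0 id (lookup c (counts available))

main :: IO ()
main = print (usable "pkzlpmtlnggruartndzwalmne" "plant")  -- True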

Optimizing partial computation in Haskell

I'm curious how to optimize this code:
fun n = (sum l, f $ f0 l, g $ g0 l)
  where l = map h [1..n]
Assuming that f, f0, g, g0, and h are all costly, but the creation and storage of l is extremely expensive.
As written, l is stored until the returned tuple is fully evaluated or garbage collected. Instead, sum l, f0 l, and g0 l should all be executed whenever any one of them is executed, but f and g should be delayed.
It appears this behavior could be fixed by writing:
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
Or the very similar:
fun n = (a,b,c) `deepseq` (a, f b, g c)
  where ...
We could perhaps specify a bunch of internal types to achieve the same effects as well, which looks painful. Are there any other options?
Also, I'm obviously hoping with my inlines that the compiler fuses sum, f0, and g0 into a single loop that constructs and consumes l term by term. I could make this explicit through manual inlining, but that'd suck. Are there ways to explicitly prevent the list l from ever being created and/or compel inlining? Pragmas that produce warnings or errors if inlining or fusion fail during compilation perhaps?
As an aside, I'm curious about why seq, inline, lazy, etc. are all defined to be let x = x in x in the Prelude. Is this simply to give them a definition for the compiler to override?
If you want to be sure, the only way is to do it yourself. For any given compiler version, you can try out several source formulations and check the generated Core/assembly/LLVM bitcode/whatever to see whether it does what you want. But that could break with each new compiler version.
If you write
fun n = a `seq` b `seq` c `seq` (a, f b, g c)
  where
    l = map h [1..n]
    a = sum l
    b = inline f0 $ l
    c = inline g0 $ l
or the deepseq version thereof, the compiler might be able to merge the computations of a, b and c to be performed in parallel (not in the concurrency sense) during a single traversal of l, but for the time being, I'm rather convinced that GHC doesn't, and I'd be surprised if JHC or UHC did. And for that the structure of computing b and c needs to be simple enough.
The only way to obtain the desired result portably across compilers and compiler versions is to do it yourself. For the next few years, at least.
Depending on f0 and g0, it might be as simple as doing a strict left fold with an appropriate accumulator type and combining function, like the famous average example:
import Data.List (foldl')

data P = P {-# UNPACK #-} !Int {-# UNPACK #-} !Double

average :: [Double] -> Double
average = ratio . foldl' count (P 0 0)
  where
    ratio (P n s) = s / fromIntegral n
    count (P n s) x = P (n+1) (s+x)
but if the structure of f0 and/or g0 doesn't fit, say one is a left fold and the other a right fold, it may be impossible to do the computation in one traversal. In such cases, the choice is between recreating l and storing l. Storing l is easy to achieve with explicit sharing (where l = map h [1..n]), but recreating it may be difficult if the compiler does some common subexpression elimination (unfortunately, GHC does have a tendency to share lists of that form, even though it does little CSE). For GHC, the flags -fno-cse and -fno-full-laziness can help avoid unwanted sharing.
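For instance, if both f0 and g0 do fit the left-fold shape, the original fun can be written with a single foldl' and a strict accumulator; the following is only a sketch with stand-ins of my own choosing (sum of squares for f0, product for g0, fromIntegral for h), not the actual costly functions from the question:

import Data.List (foldl')

data Acc = Acc {-# UNPACK #-} !Double {-# UNPACK #-} !Double {-# UNPACK #-} !Double

fun :: Int -> (Double, Double, Double)
fun n = done (foldl' step (Acc 0 0 1) (map h [1 .. n]))
  where
    h = fromIntegral                                  -- stand-in for the costly h
    step (Acc s sq p) x = Acc (s + x) (sq + x * x) (p * x)
    done (Acc s sq p) = (s, f sq, g p)                -- f and g stay delayed
    f = sqrt                                          -- stand-in for the costly f
    g = recip                                         -- stand-in for the costly g

The single traversal consumes the list as map h [1 .. n] produces it, so l is never stored, while f and g are only applied lazily when the corresponding tuple components are demanded.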

First & Follow Sets check for simple grammar

Here are a few questions I had on a quiz in class; I just want to verify their correctness.
Grammar:
S -> ABC
A -> df | epsilon
B -> f | epsilon
C -> g | epsilon
1.) The Follow set of B contains g and epsilon (T/F)? Ans: F.
There is no epsilon in the Follow sets, correct? (Only $ aka end of input)
2.) The First set of S contains d, f, g, and epsilon (T/F)? Ans: T.
I said false for this because I thought First(S) = First(A), which g is not a part of. Who is correct?
You are correct that epsilon never appears in Follow sets; if epsilon is involved, it is accounted for in the First set. If it is possible for a nonterminal to appear at the end of a derivable string, then $ goes in its Follow set, not epsilon.
The quiz is correct. A string derived from S can indeed start with any of d, f, and g, and S can also derive the empty string, so epsilon is in First(S) as well. Consider the input string g. It matches S, right? A is satisfied by the empty string, B is satisfied by the empty string, and C is satisfied by g. Since A, B, and C are all satisfied, S is satisfied. The first character consumed by S is g, so g must be in First(S).
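If you want to check this mechanically, here is a small Haskell sketch (my own, not part of the original answers) that computes the First set and nullability of each nonterminal in the quiz grammar by fixed-point iteration; uppercase letters are nonterminals and the empty string stands for epsilon:

import Data.Char (isUpper)
import qualified Data.Map as M
import qualified Data.Set as S

-- The quiz grammar.
quiz :: [(Char, String)]
quiz = [('S', "ABC"), ('A', "df"), ('A', ""), ('B', "f"), ('B', ""), ('C', "g"), ('C', "")]

-- For each nonterminal: (terminals in its First set, can it derive epsilon?).
firsts :: [(Char, String)] -> M.Map Char (S.Set Char, Bool)
firsts g = go (M.fromList [ (n, (S.empty, False)) | (n, _) <- g ])
  where
    go m = let m' = foldl step m g in if m' == m then m else go m'
    step m (n, rhs) =
      let (fs, nullable)   = firstOf m rhs
          (fs0, nullable0) = m M.! n
      in  M.insert n (S.union fs0 fs, nullable0 || nullable) m
    firstOf _ [] = (S.empty, True)                    -- an empty body is nullable
    firstOf m (x : xs)
      | isUpper x = let (fs, nullable) = M.findWithDefault (S.empty, False) x m
                    in if nullable
                         then let (fs', rest) = firstOf m xs in (S.union fs fs', rest)
                         else (fs, False)
      | otherwise = (S.singleton x, False)            -- a terminal starts the string

main :: IO ()
main = print (firsts quiz M.! 'S')
-- prints (fromList "dfg",True): First(S) = {d, f, g} and epsilon is in First(S).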