Let's say I'm using the grammar below for a compiler:
S -> a | aB
If I perform left factoring on it, the result is (e is epsilon):
S -> aC
C -> B | e
Then I want to remove the epsilon production, which ends up as:
S -> a | aC
C -> B
Notice that it looks like I need to perform left factoring again, and doing so has me alternating between left factoring and epsilon removal forever. Am I doing something wrong?
Is it required to perform both left factoring and epsilon removal on a grammar for a compiler?
You shouldn't need to do either transformation, most of the time. Left factoring exists to make a grammar suitable for top-down (LL) parsing, and an LL(1) parser can handle the epsilon production directly, so there is no need to remove it afterwards. In fact, for some grammars, left factoring will create shift-reduce conflicts for LALR(1) parser generators.
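To see why nothing more is needed: a predictive (LL(1)) parser handles the epsilon alternative by looking at the next token. Here is a minimal recursive-descent sketch in Python, under the assumption (mine, for illustration) that B derives the single terminal b:

def parse_S(tokens, i=0):
    # S -> a C
    if i < len(tokens) and tokens[i] == 'a':
        return parse_C(tokens, i + 1)
    raise SyntaxError("expected 'a'")

def parse_C(tokens, i):
    # C -> B | e: take B only if the lookahead can start B;
    # otherwise match epsilon by consuming nothing.
    if i < len(tokens) and tokens[i] == 'b':
        i += 1  # B -> b (assumed for this sketch)
    return i

def accepts(s):
    try:
        return parse_S(list(s)) == len(s)
    except SyntaxError:
        return False

print(accepts('a'))   # True  (S -> a C, C -> e)
print(accepts('ab'))  # True  (S -> a C, C -> B)
print(accepts('b'))   # False

The epsilon production costs nothing here, so there is no transformation loop to fall into.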
select age from person where name in (select name from eats where pizza="mushroom")
I am not sure what to write for the "in". How should I translate this into relational algebra?
In this case the sub-select is equivalent to a join:
select age
from person p, eats e
where p.name = e.name and pizza='mushroom'
So you could translate it as:
π_age (person p ⋈_(p.name=e.name) (σ_(pizza='mushroom') (eats e)))
Here's my guess. I'm assuming that the set membership symbol is not part of your relational algebra, so the in has to be translated away.
For base table r, C a column of both r & s, and x an unused name,
select ... from r where C in s
returns the same value as
select ... from r natural join (s) x
The use of in requires that s has exactly one column. The in is true for a row of r exactly when its C equals some value in s. So the where keeps exactly the rows of r whose C equals some value in s. We assumed that s's single column is C, so the where keeps exactly the rows of r whose C equals the C of some row of s. Those are the same rows that are returned by the natural join.
(For a where-in expression like this with C not a column of both r and s, this translation is not applicable. Similarly, the reverse translation is only applicable under certain conditions.)
How useful this particular translation is to you, and whether you could simplify it or must complexify it, depends on what variants of SQL and "relational algebra" you are using, what limitations you have on input expressions, and what other translation decisions you have made. If you use very straightforward and general translations, then the output is more complex but obviously correct. If you translate using a lot of special-case rules and ad hoc simplifications along the way, then the output is simpler but the justification that the answer is correct is longer.
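As a concrete sanity check of the equivalence, here is a throwaway Python sketch for the pizza query; the table contents are invented for illustration:

def natural_join(r, s):
    # merge every pair of rows that agrees on all shared column names
    return [{**x, **y}
            for x in r for y in s
            if all(x[c] == y[c] for c in set(x) & set(y))]

person = [{'name': 'Amy', 'age': 16}, {'name': 'Ben', 'age': 21}]
eats = [{'name': 'Amy', 'pizza': 'mushroom'},
        {'name': 'Ben', 'pizza': 'cheese'}]

# select name from eats where pizza='mushroom'   (the one-column s)
s = [{'name': e['name']} for e in eats if e['pizza'] == 'mushroom']

# select age from person where name in s
via_in = [p['age'] for p in person
          if p['name'] in {row['name'] for row in s}]

# select age from person natural join s
via_join = [row['age'] for row in natural_join(person, s)]

assert via_in == via_join == [16]

Both forms keep exactly the rows of person whose name appears in s, as argued above.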
I am trying to work out how SQL queries are run and have hit a bit of a stumbling block.
If a where clause akin to the below is used:
A OR B AND C
This could mean either of the below
(A OR B) AND C
or
A OR (B AND C)
In the majority of cases the results will be the same, but if the set to be queried contains solely {A}, the first variant would return an empty result set and the second would return {A}. SQL does in fact return the one result.
Does anyone know (or have links to) any insight that will help me understand how queries are built?
The order is the following according to MSDN:
1. ~ (Bitwise NOT)
2. * (Multiply), / (Division), % (Modulo)
3. + (Positive), - (Negative), + (Add), + (Concatenate), - (Subtract), & (Bitwise AND), ^ (Bitwise Exclusive OR), | (Bitwise OR)
4. =, >, <, >=, <=, <>, !=, !>, !< (Comparison operators)
5. NOT
6. AND
7. ALL, ANY, BETWEEN, IN, LIKE, OR, SOME
8. = (Assignment)
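Since AND sits above OR in that list, A OR B AND C groups as A OR (B AND C). Python's boolean operators happen to have the same relative precedence (NOT above AND above OR), so a quick truth-table sketch illustrates the grouping:

# AND binds tighter than OR, so a or b and c means a or (b and c)
for a in (False, True):
    for b in (False, True):
        for c in (False, True):
            assert (a or b and c) == (a or (b and c))

# The two readings really differ: with A true, B false, C false,
# A OR (B AND C) is true but (A OR B) AND C is false.
assert (True or (False and False)) != ((True or False) and False)

This matches the behaviour observed in the question: the row where only A holds is returned.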
Knowing (from the documentation) that AND has a higher precedence than OR, you should aim to write predicates for WHERE clauses in conjunctive normal form ("a series of AND clauses").
If the intention is
( A OR B ) AND C
then write it thus and all is good.
However, if the intention is
A OR ( B AND C )
then I suggest you apply the distributive rewrite law that results in conjunctive normal form, i.e.
( P AND Q ) OR R <=> ( P OR R ) AND ( Q OR R )
In your case:
A OR ( B AND C ) <=> ( A OR B ) AND ( A OR C )
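If in doubt, the rewrite is easy to verify exhaustively; here is a small truth-table check in Python:

from itertools import product

# ( P AND Q ) OR R  <=>  ( P OR R ) AND ( Q OR R )
for p, q, r in product((False, True), repeat=3):
    assert ((p and q) or r) == ((p or r) and (q or r))

# A OR ( B AND C )  <=>  ( A OR B ) AND ( A OR C )
for a, b, c in product((False, True), repeat=3):
    assert (a or (b and c)) == ((a or b) and (a or c))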
AND and OR have different precedence.
See Precedence Level
For SQL Server (which is your tag), here is the precedence: http://msdn.microsoft.com/en-us/library/ms190276.aspx But if you're worried about the exact result set you get, you should indeed start working with () groupings.
This MSDN link says that '(' and ')' have left-to-right associativity.
How does that make sense? Can someone give me an example?
In formal grammars, an operator with left-to-right associativity corresponds to a left-recursive rule. So with binary addition:
S -> E
E -> E + a
E -> a
So the tree would look like:
S
|
E
|
E + a
|
E + a
|
a
As you can see, the first two a's are added together before the third is added to their sum.
For a right-associative operator, your grammar would contain right-recursion. So with binary exponentiation:
S -> E
E -> a ** E
E -> a
Correspondingly, your parse tree would look like:
S
|
E
|
a ** E
|
a ** E
|
a
As you can see, the last two a's are exponentiated first, and the first a is then raised to that result (this is the proper associativity for exponentiation, by the way).
For ternary and higher-arity operators the same pattern applies: the recursive rule is either the left-most or the right-most nonterminal. In the case of unary operators, though, whether the rule is left- or right-recursive depends on whether the nonterminal is on the left or the right, respectively. In the case of ( E ) there is a terminal on either side, and the nonterminal, although recursive, is neither left- nor right-recursive from a grammatical point of view, so I think the MSDN article has arbitrarily declared it to be "left to right."
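To make the two recursion styles concrete, here is a minimal Python sketch; the tokens and grammar are invented for illustration (single digits stand in for the a's). Naive recursive descent cannot call a left-recursive rule directly, so the left-associative sum is written as the equivalent loop, while the right-recursive power rule simply calls itself:

def parse_sum(tokens):
    # E -> E + a (left-recursive), written as a loop:
    # accumulate left to right
    value = parse_pow(tokens)
    while tokens and tokens[0] == '+':
        tokens.pop(0)
        value = value + parse_pow(tokens)
    return value

def parse_pow(tokens):
    # E -> a ** E (right-recursive): the recursive call
    # binds the right-hand side first
    base = int(tokens.pop(0))
    if tokens[:2] == ['*', '*']:
        del tokens[:2]
        return base ** parse_pow(tokens)
    return base

print(parse_sum(list('1+2+3')))    # 6, grouped as (1+2)+3
print(parse_pow(list('2**3**2')))  # 512, grouped as 2**(3**2)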
The associativity of parentheses has no bearing on the fact that d is evaluated prior to a+b+c in d+(a+b+c), and it has no bearing on the associativity of a+b+c either, so I have no idea what the other Jared is talking about.
What is unclear? They come in pairs: whenever you have a ( you will have a ). Once inside them, you work the arithmetic from left to right, so if you have d+(a+b+c) you will do a+b, then add the sum of a+b to c, then add the total sum of a, b, c to d.
I need some help converting an SQL query into relational algebra.
Here is the SQL query:
SELECT * FROM Customer, Appointment
WHERE Appointment.CustomerCode = Customer.CustomerCode
AND Appointment.ServerCode IN
(
SELECT ServerCode FROM Appointment WHERE CustomerCode = '102'
)
;
I'm stuck because of the IN subquery in the above example.
Can anyone demonstrate for me how to express this SQL query in relational algebra?
Many thanks.
EDIT: Here is my proposed solution in relational algebra. Is this correct? Does it reproduce the SQL query?
Scodes ← Π_ServerCode (σ_CustomerCode='102' (Appointment))
Ccodes ← Π_CustomerCode (Appointment ⋉ Scodes)
Result ← (Customer ⋉ Ccodes)
Your SQL code will result in duplicate columns for CustomerCode, and the use of SELECT [ALL] is likely to result in duplicate rows. Because the result is not a relation, it cannot be expressed in relational algebra.
These problems are easily fixed in SQL:
SELECT DISTINCT *
FROM Customer NATURAL JOIN Appointment
WHERE Appointment.ServerCode IN
(
SELECT ServerCode FROM Appointment WHERE CustomerCode = '102'
)
;
You didn't specify which relational algebra you are interested in. Date and Darwen proposed an algebra named A, specified an A language named D, and designed a D language named Tutorial D.
Tutorial D uses the operators JOIN for natural join, WHERE for restriction, and MATCHING for semijoin. The slight complication is the comparison in the SQL:
CustomerCode = '102'
The comparison of a CustomerCode value to a CHAR value in SQL is possible because of implicit coercion. Tutorial D is stricter -- type safe, if you will -- requiring you to overload the equality operator or, more practically, define a selector operator for CHAR, which would typically have the same name as the type.
Therefore, the above (revised) SQL may be written in Tutorial D as:
( Customer JOIN Appointment )
MATCHING ( ( Appointment WHERE CustomerCode = CustomerCode ( '102' ) ) { ServerCode } )
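For comparison, here is a toy Python sketch of the semijoin that MATCHING performs; the table contents are hypothetical, and this illustrates the operator only, not any real Tutorial D implementation:

def semijoin(r, s):
    # r MATCHING s (r ⋉ s): keep the rows of r that agree with
    # at least one row of s on their shared attributes
    return [x for x in r
            if any(all(x[c] == y[c] for c in set(x) & set(y))
                   for y in s)]

appointment = [
    {'CustomerCode': '101', 'ServerCode': 'S1'},
    {'CustomerCode': '102', 'ServerCode': 'S2'},
    {'CustomerCode': '103', 'ServerCode': 'S2'},
]

# ( Appointment WHERE CustomerCode = '102' ) { ServerCode }
scodes = [{'ServerCode': a['ServerCode']}
          for a in appointment if a['CustomerCode'] == '102']

# Appointment MATCHING Scodes: appointments whose server also
# serves customer 102
print(semijoin(appointment, scodes))
# [{'CustomerCode': '102', 'ServerCode': 'S2'},
#  {'CustomerCode': '103', 'ServerCode': 'S2'}]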
"How do I represent my query in this standard form of RA?"
It's not so much a question of "type of algebra" as it is of "type of notation".
Notation using Greek symbols typically uses sigma for restriction, with the restrict condition in subscript appended to the sigma character, followed by the subject of the restriction (the relational expression that is subjected to the restrict condition).
Date avoids that notation, because typesetting and/or creating text with such notations is usually a lot harder than with just the western alphabet (a math teacher of mine once told us that math textbooks contain the most errors of all).
σ_<cond> (<rel exp>) thus denotes the very same algebra expression as (Date's syntax) "<rel exp> WHERE <cond>".
Similarly, with Greek symbols, projection is typically denoted using the letter Pi, with the list of retained attributes in subscript appended to the Pi, followed by the expression that is the subject of the projection.
Π_<attr list> (<rel exp>) thus denotes the very same algebra expression as (Date's syntax) "<rel exp> { <attr list> }".
The join family of operators is usually denoted, in Greek-symbol notation, using (variations of) the Unicode BOWTIE character, or the character consisting of a lowercase letter 'x' surrounded by a full circle (usually used to denote the full cartesian product, cross product, or whatever your algebra course happens to name it).
Some courses provide a Greek-symbol notation for rename, using the Greek letter Rho. Appended in subscript is the rename list, in the form a1->b1, a2->b2, ..., and after that comes the relational expression that is subjected to the rename. Likewise, Date has a non-Greek-symbol equivalent syntax: <rel exp> RENAME a1 AS b1, a2 AS b2, ...
The important thing is to see that these differences are merely differences in syntactical notation, not "different algebrae".
EDIT
One could imagine that the Greek-symbol notation would be the way to program relational algebra into an APL engine, that Date's syntax would be the way to program relational algebra into a COBOL-like or PL/1-like engine (there effectively exists such an engine, called Rel), and that the way to program relational algebra into an OO-like engine could look something like relation.NaturalJoin(otherRelation).Matching(yetOtherRelation.Restrict(condition).Project(attributesList)).
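To make that last flavour concrete, here is a hypothetical fluent wrapper in Python; the method names mirror the imagined chain above and are not any real library's API:

class Relation:
    def __init__(self, rows):
        self.rows = list(rows)

    def restrict(self, pred):       # sigma / WHERE
        return Relation(r for r in self.rows if pred(r))

    def project(self, attrs):       # Pi / { ... } (duplicates not removed here)
        return Relation({a: r[a] for a in attrs} for r in self.rows)

    def natural_join(self, other):  # bowtie / JOIN
        return Relation({**x, **y}
                        for x in self.rows for y in other.rows
                        if all(x[c] == y[c] for c in set(x) & set(y)))

    def matching(self, other):      # semijoin / MATCHING
        return Relation(x for x in self.rows
                        if any(all(x[c] == y[c] for c in set(x) & set(y))
                               for y in other.rows))

customer = Relation([{'CustomerCode': '101', 'Name': 'Ann'},
                     {'CustomerCode': '102', 'Name': 'Bob'}])
appointment = Relation([{'CustomerCode': '101', 'ServerCode': 'S1'},
                        {'CustomerCode': '102', 'ServerCode': 'S2'}])

result = customer.natural_join(appointment).matching(
    appointment.restrict(lambda r: r['CustomerCode'] == '102')
               .project(['ServerCode']))
print(result.rows)  # [{'CustomerCode': '102', 'Name': 'Bob', 'ServerCode': 'S2'}]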
I just heard that the DLR has a three-level caching strategy, but what is it? A simple explanation with a simple example would be very helpful.
Thanks
This is how I understand it: the idea of the caching is to reuse expressions wherever possible, to reduce the overhead of dynamic expression evaluation relative to static code.
Imagine a dynamic expression:
>> a + b
Working this out the first time, an expression/syntax tree will need to be created (if one doesn't already exist). It is of the form:
if a is an int and not null and b is an int and not null then result = a + b
This is essentially a rule that can be evaluated, and if it holds, the cached expression can be reused. Hence we have a level 1 cache.
Level 2 is similar but with more complex rules, probably along the lines of:
if a is an int and not null and b is an int and not null then result = a + b
if a is string and b is an int then do Int.Parse(a) + b
etc...
Level 3 is more complex still.
If no matching expression can be found, then a new expression is created and added to one of the caches (though I don't know the details of that).
As I understand it, L1 holds 1 rule, L2 about 10 rules, and L3 about 100 rules.
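Here is a toy Python sketch of the guard-plus-rule idea; it mirrors the concept only, not the DLR's actual API or its real cache sizes:

class CallSite:
    def __init__(self):
        self.rules = []  # the cache: (guard, action) pairs

    def invoke(self, a, b):
        for guard, action in self.rules:
            if guard(a, b):           # cache hit: reuse the rule
                return action(a, b)
        rule = self._bind(a, b)       # cache miss: build a new rule
        self.rules.append(rule)
        return rule[1](a, b)

    def _bind(self, a, b):
        # build a (guard, action) pair for the observed operand types
        if isinstance(a, int) and isinstance(b, int):
            return (lambda x, y: isinstance(x, int) and isinstance(y, int),
                    lambda x, y: x + y)
        if isinstance(a, str) and isinstance(b, int):
            return (lambda x, y: isinstance(x, str) and isinstance(y, int),
                    lambda x, y: int(x) + y)
        raise TypeError('no binding rule for a + b')

site = CallSite()
print(site.invoke(1, 2))    # builds the int/int rule -> 3
print(site.invoke(3, 4))    # reuses it               -> 7
print(site.invoke('5', 6))  # builds the str/int rule -> 11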
I got all this from reading around the subject on Google:
- http://dotnetslackers.com/articles/csharp/Dissecting-C-Sharp-4-0-Dynamic-Programming.aspx
- http://msdn.microsoft.com/en-us/magazine/cc163344.aspx
and some others I cannot recall now.