How can an ambiguous grammar with quantifiers be transformed into LL(1)?

I cannot get my head around resolving this grammar (? means zero or one occurrences, + means at least one occurrence) into some equivalent that can be parsed using LL(1):
S -> X? Y+
X -> aU
Y -> aV
The problem is: when I see an 'a', was it produced by X or Y? Any ideas?
EDIT: U and V can start with the same symbol ...

You need to left-factor the rules to create an LL(1) grammar. As long as U and V cannot start with the same symbol (and are not nullable), you could start with the equivalent regular expression a(Ua)?V(aV)*: the leading a is shared by X and Y, and the next symbol then tells you whether a U follows (so the a came from X) or a V follows (so it came from Y).
If U and V can start with the same symbol, you'll also need to factor out the common prefix(es).
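As a rough illustration of the left-factored form a(Ua)?V(aV)*, here is a minimal recursive-descent recognizer in Python. It assumes, purely for the example, that U is the single terminal u and V is the single terminal v (so they cannot start with the same symbol and are not nullable). The name parse_s and the returned labels are hypothetical.

```python
def parse_s(tokens):
    """Recognize a (U a)? V (a V)* using one token of lookahead.

    Assumption for this sketch: U is the terminal 'u' and V is the terminal 'v'.
    Returns a list of 'X' / 'Y' labels in the order the original rules applied,
    or raises SyntaxError if the input does not match.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    result = []
    expect("a")            # the 'a' shared by X and Y
    if peek() == "u":      # lookahead decides: this 'a' belonged to X
        expect("u")
        result.append("X")
        expect("a")        # start of the first mandatory Y
    expect("v")
    result.append("Y")
    while peek() == "a":   # (a V)*: any further Y's
        expect("a")
        expect("v")
        result.append("Y")
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result


print(parse_s(list("auavav")))
```

Every decision is made by looking at a single upcoming token, which is what makes this form LL(1) under the stated assumption.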

Why does SOUNDEX return irrelevant results?

I wonder why:
WHERE 1=1
AND LTRIM(RTRIM(lastName)) ='Schmdli'
OR (
SOUNDEX(lastName) = SOUNDEX('Schmdli')
)
returns results like
lastName
Schöntal
Schindler-Külling
Schindler
Schmidlin
Schindler
Schmidli
Schmidli
Schindler
while I expect only:
Schmidli
Schmidli
Schmidlin
My first condition, AND LTRIM(RTRIM(lastName)) ='Schmdli', is there to match the exact value; with SOUNDEX I then expect results that are close to Schmdli, so results like
Schöntal
Schindler-Külling
Schindler
shouldn't appear.
Thanks
Trivial answer: because SOUNDEX is a simple algorithm with limited space (one letter and three digits), and all of your examples happen to translate to the same code, S534. After the initial S, the C is dropped because it shares a code with S, the H and the vowels are ignored, and the three digits come from M, D and L. Schöntal gets its digits from N, T and L instead, producing the same output since M and N encode in the same way, as do D and T.
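To see the collapse for yourself, here is a small Python sketch of the classic American Soundex rules. It is a simplified illustration, not SQL Server's implementation, and may differ from it in edge cases; in particular, skipping anything outside A-Z (such as ö or a hyphen) is an assumption made for this example.

```python
# Letter groups of classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code (simplified sketch)."""
    # Assumption: characters outside A-Z are skipped entirely.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code can count again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


for name in ("Schmdli", "Schmidli", "Schmidlin", "Schindler"):
    print(name, soundex(name))
```

All four names come out with the same code, so a filter on SOUNDEX alone cannot tell them apart.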

SQL and logic: in a WHERE clause, is "not (p and q)" equal to "(not p) or (not q)"?

A SQL and logic problem. In the where clause, is
not (p and q)
equal to
(not p) or (not q)
Yes. De Morgan's laws are language-independent.
Compare the following two queries (taken from a working fiddle):
Query 1: not (p and q)
select * from table1
where
not (p = 1 and q = 1);
Query 2: (not p) or (not q)
select * from table1
where p <> 1 or q <> 1;
There is no difference in the output, which is what the boolean algebra rule not (p and q) = (not p) or (not q) predicts.
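If you have no fiddle at hand, the comparison can be reproduced with a small Python script using the standard sqlite3 module. This is only a sketch: the table name table1 and the columns p and q follow the queries above, the sample rows (every combination of 0, 1 and NULL) are made up for the example, and the result shown is SQLite's, so other engines should be checked separately.

```python
import sqlite3


def demorgan_rows():
    """Run both forms of the predicate and return their result sets."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (id INTEGER PRIMARY KEY, p INTEGER, q INTEGER)"
    )
    values = [0, 1, None]  # None is stored as SQL NULL
    conn.executemany(
        "INSERT INTO table1 (p, q) VALUES (?, ?)",
        [(p, q) for p in values for q in values],
    )
    rows_not_and = conn.execute(
        "SELECT p, q FROM table1 WHERE NOT (p = 1 AND q = 1) ORDER BY id"
    ).fetchall()
    rows_or = conn.execute(
        "SELECT p, q FROM table1 WHERE p <> 1 OR q <> 1 ORDER BY id"
    ).fetchall()
    conn.close()
    return rows_not_and, rows_or


rows_not_and, rows_or = demorgan_rows()
print(rows_not_and == rows_or)
```

Rows for which the condition evaluates to NULL (unknown) are filtered out by both queries, so the equivalence also holds when p or q is NULL.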
This is a rather late answer, but what you are describing is De Morgan's law. Your expression not (p and q) is converted to
not p or not q
because the negation (not) is applied to each part of the statement (p and q):
p becomes not p
and becomes or
q becomes not q
Although the two expressions are logically equivalent, they may not be functionally equivalent. It depends on the nature of p and q and the operation of the language's optimiser.
Consider, for instance, that p is false. In the case of (not p) or (not q), we can deduce that the expression is true without having to evaluate q. A clever optimiser that understands or might take a short cut like that. But we cannot do so in the case of not (p and q) (unless our theorised optimiser could itself apply de Morgan first).
Does anyone know whether SQL Server, Oracle or the other major players do this type of optimisation?
The result may not just be a performance saving. Suppose the q is not just a simple boolean variable but some expression that includes the execution of some more complex function. If that function has side-effects other than returning a truth value, then by optimising-out the evaluation of q we would also not see those side effects.

SQL 'between' vs. comparing both ends

As far as I know, in SQL,
X between Y and Z
gives the same result as
X >= Y and X <= Z
But I often see the latter from people I believe to be SQL experts.
Is there a subtle difference that I should know about?
Great question. In a nutshell, they are essentially the same. Documentation says:
expr BETWEEN min AND max
If expr is greater than or equal to min and expr is less than or equal to max, BETWEEN returns 1, otherwise it returns 0. This is equivalent to the expression (min <= expr AND expr <= max) if all the arguments are of the same type. Otherwise type conversion takes place according to the rules described in Section 12.2, “Type Conversion in Expression Evaluation”, but applied to all the three arguments.
reference: https://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#operator_between
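The equivalence is easy to check empirically. The sketch below uses Python's standard sqlite3 module rather than MySQL, so it only demonstrates SQLite's behaviour for integers of a single type; the table t, the column x and the helper name are made up for the example.

```python
import sqlite3


def between_vs_comparisons(values, low, high):
    """Return the rows matched by BETWEEN and by the two explicit comparisons."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t (x) VALUES (?)", [(v,) for v in values])
    with_between = conn.execute(
        "SELECT x FROM t WHERE x BETWEEN ? AND ? ORDER BY x", (low, high)
    ).fetchall()
    with_comparisons = conn.execute(
        "SELECT x FROM t WHERE x >= ? AND x <= ? ORDER BY x", (low, high)
    ).fetchall()
    conn.close()
    return with_between, with_comparisons


with_between, with_comparisons = between_vs_comparisons(range(11), 3, 7)
print(with_between == with_comparisons)
```

Both bounds are inclusive in either form, so 3 and 7 themselves are part of the result.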
BETWEEN is part of standard SQL (it has been in the standard since SQL-92) and the major databases implement it, so portability is not a reason to avoid it. The >= and <= form can likewise be used everywhere.

The parentheses rules of PostgreSQL, is there a summarized guide?

In mathematics and many programming languages (and I think standard SQL as well), parentheses are used to change precedence (grouping the parts to be evaluated first) or to enhance readability (for human eyes).
Equivalent Examples:
SELECT array[1,2] @> array[1]
SELECT (array[1,2]) @> array[1]
SELECT array[1,2] @> (array[1])
SELECT ((array[1,2]) @> (array[1]))
But SELECT 1 = ANY array[1,2] is a syntax error (!), and SELECT 1 = ANY (array[1,2]) is valid. Why?
OK, because "the manual says so". But what is the logic that lets humans remember all the exceptions?
Is there a guide about it?
I do not understand why (expression) is the same as expression in some cases, but not in other cases.
PS1: parentheses are also used as value-list delimiters, as in expression IN (value [, ...]). But an array is not a value-list, and there does not seem to be a general rule in PostgreSQL when (array expression) is not the same as array expression.
Also, I used array as example, but this problem/question is not only about arrays.
"Is there a summarized guide?", well... The answer is no, so: hands-on! This answer is a Wiki, let's write.
Summarized guide
Let,
F() be a usual function (e.g. ROUND)
L() be a function-like operator (e.g. ANY)
f be an operator-like function (e.g. current_date)
Op be an operator
Op1, Op2 be distinct operators
A, B, C be values or expressions
S be an expression list, such as "(A,B,C)"
The rules, using these elements, are in the form
rule: notes.
"pure" mathematical expressions
When Op, Op1, Op2 are mathematical operators (e.g. +, -, *), and F() is a mathematical function (e.g. ROUND()).
Rules for scalar expressions and "pure array expressions":
A Op B = (A Op B): the parentheses are optional.
A Op1 B Op2 C: need to check precedence.
(A Op1 B) Op2 C: enforce "first (A Op1 B)".
A Op1 (B Op2 C): enforce "first (B Op2 C)".
F(A) = (F(A)) = F((A)) = (F((A))): the parentheses are optional.
S = (S): the external parentheses are optional.
f=(f): the parentheses are optional.
Expressions with function-like operators
Rules for operators as ALL, ANY, ROW, SOME, etc.
L(A) = L((A)): the parentheses are optional in the argument.
(L(A)): SYNTAX ERROR.
...More rules? Please help editing here.
ANY is a function-like construct. Like (almost) any other function in Postgres it requires parentheses around its parameters. Makes the syntax consistent and helps the parser avoid ambiguities.
You can think of ANY() like a shorthand for unnest() condensed to a single expression.
One might argue for an additional set of parentheses around the set-variant of ANY. But that would be ambiguous, since a list of values in parentheses is interpreted as a single ROW type.

What does LEFT in SQL do when it is not paired with JOIN and why does it cause my query to time out?

I was given the following statement:
LEFT(f.field4, CASE WHEN PATINDEX('%[^0-9]%',f.field4) = 0 THEN LEN(f.field4) ELSE PATINDEX('%[^0-9]%',f.field4) - 1 END)=#DealNumber
and am having trouble contacting the person who wrote it. Could someone explain what that statement does, and whether it is valid SQL? The goal of the statement is to compare the leading numeric characters in f.field4 to the DealNumber. DNumber and DealNumber are the same except for a wildcard at the end of DealNumber.
I am trying to use it in the context of the following statement:
SELECT d.Description, d.FileID, d.DateFiled, u.Contact AS UserFiledName, d.Pages, d.Notes
FROM Documents AS d
LEFT JOIN Files AS f ON d.FileID=f.FileID
LEFT JOIN Users AS u ON d.UserFiled=u.UserID
WHERE SUBSTRING(f.Field8, 2, 1) = #LocationIDString
AND f.field4=#DNumber OR LEFT(f.field4, CASE WHEN PATINDEX('%[^0-9]%',f.field4) = 0 THEN LEN(f.field4) ELSE PATINDEX('%[^0-9]%',f.field4) - 1 END)=#DealNumber
but my code keeps timing out when I execute it.
It's the CASE clause which is slowing things down, not LEFT per se (although LEFT may prevent the use of indexes, which will have an effect).
The CASE determines what should be compared with #DealNumber, and I think it does the following...
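If f.field4 contains no non-digit character (PATINDEX returns 0), use LEFT(f.field4, LEN(f.field4))=#DealNumber: that's equivalent to f.field4=#DealNumber.
Otherwise, use the digits that precede the first non-digit character: {those digits}=#DealNumber. If f.field4 does not start with a digit, that prefix is the empty string.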
This sort of computation isn't very efficient.
I would attempt the following, which makes the large assumption that a mixed string can be cast as an integer — that is, that if you convert ABC to an integer you get zero, and if you convert 123ABC you get what can be converted, 123. I can't find any documentation which says whether that is possible or not.
AND f.field4=#DNumber
OR (f.field4=#DealNumber AND integer(f.field4)=0)
OR (integer(f.field4)=#DealNumber)
The first line is the same as your AND. The second line selects f.field4=#DealNumber only if f.field4 does not start with a number. The third line selects where the initial numeric portion of f.field4 is the same as #DealNumber.
As I say, there is an assumption here that integer() will work in this way. You may need to define a CAST function to do that conversion with strings. That's rather beyond me, although I would be confident that even such a function would be faster than a CASE as you currently have.
From the doc:
left(str text, n int)
Return first n characters in the string. When n is negative, return all but last |n| characters.
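To make the combination of LEFT and PATINDEX in the question concrete, here is a rough Python imitation of that expression. It is only a sketch of the logic: the function names are invented for the example, and details of the real SQL functions (collation rules, how LEN treats trailing spaces) are deliberately ignored.

```python
import re


def patindex_non_digit(s):
    """Mimic PATINDEX('%[^0-9]%', s): the 1-based position of the first
    non-digit character, or 0 if the string contains none."""
    match = re.search(r"[^0-9]", s)
    return match.start() + 1 if match else 0


def leading_digits(s):
    """Mimic LEFT(s, CASE WHEN pos = 0 THEN LEN(s) ELSE pos - 1 END)."""
    pos = patindex_non_digit(s)
    length = len(s) if pos == 0 else pos - 1
    return s[:length]  # like LEFT(s, length)


for value in ("123ABC", "123", "ABC"):
    print(repr(value), "->", repr(leading_digits(value)))
```

In other words, the expression keeps whatever run of digits the value starts with, and that prefix is then compared with the deal number.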