Avoid repetition in Coq - DRY

I'm currently trying to implement Hilbert's geometry in Coq. When proving, I often find that a fragment of the proof is repeated several times; for example, here I'm trying to prove that there exist three lines that are pairwise distinct.
Proposition prop3_2 : (exists l m n: Line, (l<>m/\m<>n/\n<>l)).
Proof.
destruct I3 as [A [B [C [[AneB [BneC CneA]] nAlgn]]]].
destruct ((I1 A B) AneB) as [AB [incAB unAB]].
destruct ((I1 B C) BneC) as [BC [incBC unBC]].
destruct ((I1 C A) CneA) as [CA [incCA unCA]].
refine (ex_intro _ AB _).
refine (ex_intro _ BC _).
refine (ex_intro _ CA _).
split.
(* Proving AB <> BC through contradiction *)
case (classic (AB = BC)).
intros AB_e_BC.
rewrite AB_e_BC in incAB.
pose (conj incBC (proj2 incAB)) as incABC.
specialize (nAlgn BC).
tauto.
trivial.
split.
(* Proving BC <> CA through contradiction *)
case (classic (BC = CA)).
intros BC_e_CA.
rewrite BC_e_CA in incBC.
pose (conj incCA (proj2 incBC)) as incABC.
specialize (nAlgn CA).
tauto.
trivial.
(* Proving CA <> AB through contradiction *)
case (classic (CA = AB)).
intros CA_e_AB.
rewrite CA_e_AB in incCA.
pose (conj incAB (proj2 incCA)) as incABC.
specialize (nAlgn AB).
tauto.
trivial.
Qed.
It'd be very nice if there was something like a macro in these cases.
I thought about creating a sub-proof halfway through:
Lemma prop3_2_a: (forall (A B C:Point) (AB BC:Line)
(incAB:(Inc B AB /\ Inc A AB)) (incBC:(Inc C BC /\ Inc B BC))
(nAlgn : forall l : Line, ~ (Inc A l /\ Inc B l /\ Inc C l)),
AB <> BC).
Proof.
...
But that's pretty cumbersome, and I'd have to create three different versions of nAlgn ordered differently, which is manageable but annoying.
The code can be found here: https://github.com/GiacomoMaletto/Hilbert/blob/master/hilbert.v
(Btw, any other comments on style or anything else are appreciated.)

First, some simple advice to refactor the three cases individually.
At the start of each of them, the goal looks like this:
...
--------------
AB <> BC
The subsequent case analysis on (AB = BC) is somewhat redundant. The first case (AB = BC) is the interesting one, where you need to prove a contradiction, and the second case (AB <> BC) is trivial. A shorter way is intro AB_e_BC, which asks you just to prove the first case. This works because AB <> BC actually means AB = BC -> False.
The other steps are mostly straightforward propositional reasoning that can be bruteforced via tauto, except for a bit of rewriting and a crucial use of specialize. The rewriting only uses an equality between the variables AB and BC; in that case you can use the subst shorthand, which rewrites using all equalities where one side is a variable. So this fragment:
(* Proving AB <> BC through contradiction *)
case (classic (AB = BC)).
intros AB_e_BC.
rewrite AB_e_BC in incAB.
pose (conj incBC (proj2 incAB)) as incABC.
specialize (nAlgn BC).
tauto.
trivial.
becomes
intro; specialize (nAlgn BC); subst; tauto.
Now you still don't want to write that three times. The only varying part now is the variable BC. Luckily, you can read that off the goal before intro.
--------------
AB <> BC
      ^--- there's BC (and in the other two cases, CA and AB)
Actually picking either AB or BC is fine, since intro makes the assumption they're equal. You can use match goal with to parameterize your tactic by bits from the goal.
match goal with
| [ |- _ <> ?l ] => intro; specialize (nAlgn l); subst; tauto
end.
(* The syntax is:
match goal with
| [ |- ??? ] => tactics
end.
where ??? is an expression with wildcards (_) and pattern
variables (?l), which can be referred to inside the body "tactics"
(without the question mark) *)
Next, moving up before the split:
-------------------------------------------
AB <> BC /\ BC <> CA /\ CA <> AB
You can compose tactics to get three subgoals at once: split; [| split]. (meaning, split once, and in the second subgoal split again).
Finally, you want to apply the match tactic above to each subgoal; that's another semicolon:
split; [| split];
match goal with
| [ |- _ <> ?l ] => intro; specialize (nAlgn l); subst; tauto
end.
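Putting the pieces together, the whole proof shrinks to something like this (an untested sketch against the I1/I3 axioms from the question; I've also replaced the three refine (ex_intro _ _ _) steps with the exists tactic, which takes several witnesses at once):
Proposition prop3_2 : exists l m n : Line, l <> m /\ m <> n /\ n <> l.
Proof.
  destruct I3 as [A [B [C [[AneB [BneC CneA]] nAlgn]]]].
  destruct (I1 A B AneB) as [AB [incAB unAB]].
  destruct (I1 B C BneC) as [BC [incBC unBC]].
  destruct (I1 C A CneA) as [CA [incCA unCA]].
  exists AB, BC, CA.
  split; [| split];
    match goal with
    | [ |- _ <> ?l ] => intro; specialize (nAlgn l); subst; tauto
    end.
Qed.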
I would also recommend using bullets and braces to structure your proof, so that when your definitions change, you avoid entering confusing proof states because tactics get applied to the wrong subgoal. Here are some possible layouts for a three-case proof:
split.
- ...
  ...
- split.
  + ...
    ...
  + ...
    ...

split; [| split].
- ...
  ...
- ...
  ...
- ...
  ...

split; [| split].
{ ...
  ...
}
{ ...
  ...
}
{ ...
  ...
}

Related

Constructing a linear grammar for the language

I'm having difficulty constructing a grammar for this language, especially a linear grammar.
Can anyone please give me some basic tips or a methodology for constructing a grammar for a given language? Thanks in advance.
I also have a doubt about whether this answer to the question "Construct a linear grammar for the language" is right:
L = {a^n b c^n | n belongs to the natural numbers}
Solution:
Right-Linear Grammar :
S--> aS | bA
A--> cA | ^
Left-Linear Grammar:
S--> Sc | Ab
A--> Aa | ^
As pointed out in the comments, these grammars are wrong since they generate strings not in the language. Here's a derivation of abcc in both grammars:
S -> aS -> abA -> abcA -> abccA -> abcc
S -> Sc -> Scc -> Abcc -> Aabcc -> abcc
Also as pointed out in the comments, there is a simple linear grammar for this language, where a linear grammar is defined as having at most one nonterminal symbol in the RHS of any production:
S -> aSc | b
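For example, here is a derivation of aabcc in that grammar:
S -> aSc -> aaScc -> aabcc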
There are some general rules for constructing grammars for languages. These are either obvious simple rules or rules derived from closure properties and the way grammars work (a worked example follows the list). For instance:
1. if L = {a} for an alphabet symbol a, then S -> a is a grammar for L.
2. if L = {e} for the empty string e, then S -> e is a grammar for L.
3. if L = R U T for languages R and T, then S -> S' | S'', together with the grammars for R and T, is a grammar for L, if S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
4. if L = RT for languages R and T, then S -> S'S'' is a grammar for L, if S' is the start symbol of the grammar for R and S'' is the start symbol of the grammar for T.
5. if L = R* for language R, then S -> S'S | e is a grammar for L, if S' is the start symbol of the grammar for R.
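For instance, to build a grammar for (ab)* from these rules (the nonterminal names are mine, chosen for illustration): rule 1 gives A -> a for {a} and B -> b for {b}, rule 4 gives C -> AB for {ab}, and rule 5 gives the start production S -> CS | e:
S -> CS | e
C -> AB
A -> a
B -> b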
Rules 4 and 5, as written, do not preserve linearity. Linearity can be preserved for left-linear and right-linear grammars (since those grammars describe regular languages, and regular languages are closed under these kinds of operations); but linearity cannot be preserved in general. To prove this, an example suffices:
R -> aRb | ab
T -> cTd | cd
L = RT = {a^n b^n c^m d^m | n, m > 0}
L' = R* = (a^n b^n)*, n > 0 (each repetition may use a different n)
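For reference, rule 4 does yield a correct grammar for L = RT; it just isn't linear, since the start production has two nonterminals on its right-hand side:
S -> RT
R -> aRb | ab
T -> cTd | cd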
Suppose there were a linear grammar for L. We must have a production for the start symbol S that produces something. To produce something, we require a string of terminal and nonterminal symbols. To be linear, we must have at most one nonterminal symbol. That is, our production must be of the form
S := xYz
where x is a string of terminals, Y is a single nonterminal, and z is a string of terminals. If x is non-empty, reflection shows the only useful choice is a; anything else fails to derive known strings in the language. Similarly, if z is non-empty, the only useful choice is d. This gives four cases:
1. x empty, z empty. This is useless, since we now have the same problem to solve for nonterminal Y as we had for S.
2. x = a, z empty. Y must now generate exactly a^n' b^n' b c^m d^m where n' = n - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
3. x empty, z = d. Y must now generate exactly a^n b^n c c^m' d^m' where m' = m - 1. But then the exact same argument applies to the grammar whose start symbol is Y.
4. x = a, z = d. Y must now generate exactly a^n' b^n' bc c^m' d^m' where n' and m' are as in cases 2 and 3. But then the exact same argument applies to the grammar whose start symbol is Y.
None of the possible choices for a useful production for S is actually useful in getting us closer to a string in the language. Therefore, no strings are derived, a contradiction, meaning that the grammar for L cannot be linear.
Suppose there were a linear grammar for L'. Then that grammar would have to generate all the strings in (a^n b^n)R(a^m b^m), plus those in e + R. But it can't generate the former, by the argument used above: any production useful for that purpose would get us no closer to a string in the language.

Is there a way to automate a Coq proof with rewrite steps?

I am working on a proof and one of my subgoals looks a bit like this:
Goal forall
(a b : bool)
(p: Prop)
(H1: p -> a = b)
(H2: p),
negb a = negb b.
Proof.
intros.
apply H1 in H2. rewrite H2. reflexivity.
Qed.
The proof does not rely on any outside lemmas and just consists of applying one hypothesis in the context to another hypothesis and doing rewriting steps with a known hypothesis.
Is there a way to automate this? I tried doing intros. auto. but it had no effect. I suspect that this is because auto can only do apply steps but no rewrite steps but I am not sure. Maybe I need some stronger tactic?
The reason I want to automate this is that in my original problem I actually have a large number of subgoals that are very similar to this one, but with small differences in the names of the hypotheses (H1, H2, etc), the number of hypotheses (sometimes there is an extra induction hypothesis or two) and the boolean formula at the end. I think that if I could use automation to solve this my overall proof script would be more concise and robust.
edit: What if there is a forall in one of the hypothesis?
Goal forall
(a b c : bool)
(p: bool -> Prop)
(H1: forall x, p x -> a = b)
(H2: p c),
negb a = negb b.
Proof.
intros.
apply H1 in H2. subst. reflexivity.
Qed.
When you see a repetitive pattern in the way you prove some lemmas, you can often define your own tactics to automate the proofs.
In your specific case, you could write the following:
Ltac rewrite_all' :=
  match goal with
  (* pick any hypothesis we can rewrite with, then keep going;
     match goal backtracks over hypotheses until one succeeds *)
  | H : _ |- _ => rewrite H; rewrite_all'
  (* no hypothesis can be used for rewriting anymore: stop *)
  | _ => idtac
  end.
Ltac apply_in_all :=
  match goal with
  (* apply any hypothesis in any other one, then keep going *)
  | H : _, H2 : _ |- _ => apply H in H2; apply_in_all
  (* nothing can be applied anymore: stop *)
  | _ => idtac
  end.
Ltac my_tac :=
  intros;
  apply_in_all;
  rewrite_all';
  auto.
Goal forall (a b : bool) (p: Prop) (H1: p -> a = b) (H2: p), negb a = negb b.
Proof.
my_tac.
Qed.
Goal forall (a b c : bool) (p: bool -> Prop)
(H1: forall x, p x -> a = b)
(H2: p c),
negb a = negb b.
Proof.
my_tac.
Qed.
If you want to follow this path of writing proofs, a reference that is often recommended (but that I haven't read) is CPDT by Adam Chlipala.
This particular goal can be solved like this:
Goal forall (a b : bool) (p: Prop) (H1: p -> a = b) (H2: p),
negb a = negb b.
Proof.
now intuition; subst.
Qed.
Or, using the destruct_all tactic (provided you don't have a lot of boolean variables):
intros; destruct_all bool; intuition.
The above has been modeled after the destr_bool tactic, defined in Coq.Bool.Bool:
Ltac destr_bool :=
intros; destruct_all bool; simpl in *; trivial; try discriminate.
You could also try using something like
destr_bool; intuition.
to fire up powerful intuition after simpler destr_bool.
now is defined in Coq.Init.Tactics as follows
Tactic Notation "now" tactic(t) := t; easy.
easy is defined right above it and (as its name suggests) can solve easy goals.
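For example (a toy goal of my own, just to illustrate): split leaves two trivial subgoals, and easy closes both, so now split finishes the proof in one step.
Goal True /\ 1 = 1.
Proof.
  now split.
Qed.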
intuition can solve goals which require applying the laws of (intuitionistic) logic. E.g. the following two hypotheses from the original version of the question require an application of the modus ponens law.
H1 : p -> false = true
H2 : p
auto, on the other hand, doesn't do that by default; it also doesn't solve contradictions.
If your hypotheses include some first-order logic statements, the firstorder tactic may be the answer (like in this case) -- just replace intuition with it.
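For instance, on the edited goal above I would expect something like the following to work (an untested sketch; firstorder accepts an auxiliary tactic, here congruence, to close the remaining leaves):
Goal forall (a b c : bool) (p : bool -> Prop)
  (H1 : forall x, p x -> a = b)
  (H2 : p c),
  negb a = negb b.
Proof.
  intros; firstorder congruence.
Qed.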

Simplification of lambda-productions,unary rules and non-useful symbols of a Grammar

I know this is not a general question, but I would like to learn how to do it on an example that I've already been working on a bit. With that said:
I have the following Grammar. I tried to simplify it but I'm unsure of its correctness, could someone help me confirming if it's correct or not?
S -> BC | lambda
A -> aA | lambda
B -> bB
C -> c
If I have to simplify the Grammar I first apply lambda-eliminations where I have something like:
S -> BC | B | C
A -> aA | a
B -> bB
C -> c
And finally I have to eliminate non-useful symbols:
First I eliminate the ones that are not productive, and then the ones that are unreachable, so:
S -> BC | bB | C
A -> aA | a
B -> bB ---> non-productive
C -> c
S -> C | b | C
A -> aA | a --> unreachable
C -> c
Finally I have something like this, and I eliminate C because it is unnecessary, and I also eliminate BC because its symbols were eliminated, so it should be something like:
S -> b | c
But to be honest, I don't think that what I've done is correct, though I don't know exactly why.
It looks like there might be some problems with your simplification - or I'm having trouble following it. I'll walk through what I would do, and you can compare it to your understanding.
S -> BC | lambda
A -> aA | lambda
B -> bB
C -> c
I presume the goal is to eliminate as many lambda productions and non-terminal symbols as possible. The first thing to note is that the production S -> lambda cannot be eliminated without changing the language; but all other lambda productions can be eliminated. We see one other lambda production (A -> lambda), so we are assured it can be eliminated. How do we eliminate A -> lambda?
We note that A is unreachable from S, the start symbol. So we can quite easily eliminate A -> lambda by eliminating A altogether. We arrive at this simpler, equivalent, grammar:
S -> BC | lambda
B -> bB
C -> c
Now, as our goal is to eliminate non-terminal symbols (we have already eliminated all extraneous -> lambda productions), we can look at S, B and C. We know we need a start symbol, so we might as well keep S as it is. B can only generate bB, which contains a non-terminal; and bB can never lead to a string of only terminals. B is unproductive and we can eliminate it. When we eliminate an unproductive symbol, any concatenated term in which it appears must also be eliminated, since the concatenated expression can never arrive at a string of only terminals (any concatenated expression in which an unproductive symbol appears is also unproductive):
S -> lambda
C -> c
Applying our analysis to C, we easily see it is as unreachable as A was initially and so it can be eliminated in the same way:
S -> lambda
This grammar is in simplest terms and is minimal in terms of non-terminal symbols and productions for the language {lambda}.

With respect to grammars, when are epsilon production rules allowed?

I'm trying to understand a concept with respect to grammar and Production Rules.
According to most material on this subject:
1) Epsilon production rules are only allowable if they do not appear on the RHS of any other production rule.
However, taking a grammar:
G = { T,N,P,S }
Where:
T = {a,b}
N = {S,S1}
S = {S}
P {
S -> aSb
S -> ab
S1 -> SS1
S1 -> E //Please note, using E to represent Epsilon.
}
Where, the language of the grammar is:
L(G) = { a^n b^n | n >= 1 }
In this case, a production rule containing Epsilon exists (derived from S1) but S1 also forms part of a RHS of another production rule (S1 -> SS1).
Doesn't this violate point 1?
Your statement:
Epsilon production rules are only allowable if they do not appear on the RHS of any other production rule.
would be better stated as
A non-terminal may have an epsilon production rule if that non-terminal does not appear on the right-hand side of any production rule.
In Chomsky's original hierarchy, epsilon productions were banned for all but Type 0 (unrestricted) grammars. If all epsilon productions are banned, then it is impossible for the grammar to produce the empty string. I believe this was not a concern for Chomsky; consequently, most modern formulations allow the start symbol to have an empty right-hand side as long as the start symbol itself does not appear on the right-hand side of any production.
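For example, here is such a formulation of a grammar for { a^n b^n | n >= 0 } (my own illustration, not from the question): the fresh start symbol S' has an epsilon production, and S' appears on no right-hand side.
S' → S
S' → ε
S → a S b
S → a b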
As it happens, the restriction on epsilon-productions is somewhat stronger than is necessary. In the case of both context-free grammars and regular grammars (Chomsky type 2 and type 3 grammars), it is always possible to create a weakly-equivalent grammar without epsilon productions (except possibly the single production S → ε if the grammar can produce the empty string.) It is also possible to remove a number of other anomalies which complicate grammar analysis: unreachable symbols, unproductive symbols, and cyclic productions. The result of the combination of all these eliminations is a "proper context-free grammar".
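As an illustration of the epsilon-elimination step (my own example): in the grammar
S → A b A
A → a
A → ε
the non-terminal A is nullable, so we delete A → ε and add a copy of each rule for every way of dropping nullable occurrences:
S → A b A
S → A b
S → b A
S → b
A → a
Both grammars generate exactly { aba, ab, ba, b }.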
Consequently, most modern formulations of context-free grammars do not require the right-hand sides to be non-empty.
Your grammar G = {T, N, S, P} with
T = {a, b}
N = {S, S1}
S = {S}
P {
S → a S b
S → a b
S1 → S S1
S1 → ε
}
contains an unreachable symbol, S1. We can easily eliminate it, producing the equivalent grammar G' = { T, N', S, P' }:
N' = {S}
P' {
S → a S b
S → a b
}
G' does not contain any epsilon productions (but even if it had, they could have been eliminated).

Why the need for terminals? Is my solution sufficient enough?

I'm trying to get my head around context-free grammars and I think I'm close. What is baffling me is this one question (I'm doing practice questions as I have an exam in a month's time):
I've come up with this grammar for the language (L = {a^m b^n | m != n}, as discussed in the answer below), but I believe it's wrong.
S --> aSb | A | B
A --> aA | Σ
B --> bB | Σ
Apparently this is the correct solution:
S --> aSb | aA | bB
A --> aA | Σ
B --> bB | Σ
What I don't quite understand is why we have S --> aSb | aA | bB and not just S --> aSb | A | B. What is the need for the terminals? Can't I just call A instead and grab my terminals that way?
Testing to see if I can generate the string: aaabbbb
S --> aSb --> aaSbb --> aaaSbbb --> aaaBbbb --> aaabbbb
I believe I generate the string correctly, but I'm not quite sure. I'm telling myself that the reason for S --> aSb | aA | bB is that if we start with aA and then replace A with a, we have two a's, which gives us a correct string since the counts aren't equal; this can be done with b as well. Any advice is greatly appreciated.
As a 4-tuple (G = (V, Σ, P, S)):
V (Non-terminals) = {A, B}
Σ (Terminals) = {a, b}
P = { } // not quite sure how to express my solution in R? Would I have to use a test string to do so?
S = A
First:
Σ denotes the language symbols; in your language, Σ = {a, b}.
^ denotes the null symbol (it is theoretical; ^ is not a member of any language's alphabet).
ε denotes the empty string (ε can be a member of some language).
The ^ symbol denotes nothing, and we use it purely for theoretical purposes, much like the ∞ symbol in mathematics (no number actually is ∞, but we use it to state and prove theorems); similarly, ^ is nothing, but we use it.
This point is not spelled out in most books; I am writing it to aid understanding.
As you say, your language is L = {a^m b^n | m != n}. Suppose the productions were as follows:
First:
S --> aSb | A | B
A --> aA | Σ
B --> bB | Σ
This would mean (only very rare books use Σ in grammar rules):
S --> aSb | A | B
A --> aA | a | b
B --> bB | a | b
I replaced Σ by a | b (the language symbols a and b).
This grammar can generate strings with equal numbers of a's and b's (a^n b^n). How can it generate a^n b^n? See the example derivation below:
S ---> aSb ---> aAb ---> aaAb ---> aabb
(using S --> aSb, then S --> A, then A --> aA, then A --> b)
But strings of this kind must not occur in the language L, because L requires m != n.
Second:
For the same reason, the production rules S --> aSb | aA | bB also do not give a correct grammar if A --> aA | Σ or B --> bB | Σ are among the productions.
I think in the second grammar you mean:
S --> aSb | aA | bB
A --> aA | ^
B --> bB | ^
Then this is a correct grammar for the language L = {a^m b^n | m != n}, because using
S --> aSb
you can only generate equal numbers of a's and b's, and by replacing S with either aA or bB you create a sentential form with unequal numbers of a's and b's that can never lead back to a string of the form a^n b^n (since A doesn't generate any b and B doesn't generate any a).
Third:
But usually we write grammar rules like:
S --> aSb | A | B
A --> aA | a
B --> bB | b
Both forms are equivalent (they generate the same language L = {a^m b^n | m != n}), because once you convert S into either A or B you have to generate at least one extra a or b respectively, and thus the constraint m != n holds.
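For example, two derivations in the third grammar:
S ---> aSb ---> aAb ---> aaAb ---> aaab   (m = 3, n = 1)
S ---> aSb ---> aBb ---> abBb ---> abbb   (m = 1, n = 3)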
Remember that deciding whether two grammars are equivalent is an undecidable problem; we can't settle it by an algorithm in general, though a proof by hand is still possible.
Fourth:
Finally, I would also like to add that the grammar:
S --> aSb | A | B
A --> aA | ^
B --> bB | ^
doesn't produce L = {a^m b^n | m != n}, because it can generate a^n b^n; for example:
S ---> aSb ---> aAb ---> ab   (using A --> ^)
Grammar in formal languages
A formal language can be represented by a formal grammar consisting of the four-tuple (V, Σ, P, S). (Note that a grammar, like an automaton, is a finite representation, whether the language is finite or infinite.)
Σ: Finite set of language symbols.
In grammar we commonly call it the finite set of terminals (in contrast to the variables V). Language symbols, or terminals, are the building blocks from which language strings (sentences) are constructed. In your example the set of terminals Σ is {a, b}. In natural language you can correlate terminals with vocabulary or dictionary words.
(By natural language I mean what we speak: Hindi, English, and so on.)
V: Finite set of Non-terminals.
A non-terminal, or 'variable', should always participate in the grammar's production rules; otherwise it counts as a useless variable, that is, a variable that derives nothing.
The ultimate aim of a grammar is to produce the language's strings in correct form, hence every variable should be useful in some way.
In natural language you can correlate the variable set with syntactic categories such as nouns and verbs, each of which defines a class of words of the language (a verb can be eating or sleeping, a noun can be he or she, etc.).
Note: in some books you will find the condition V ∩ Σ = ∅, which means that variables are not terminals.
S: Start Variable. (S ∈ V)
S is a special variable symbol called the 'start symbol'. We only consider a string to be in the language of the grammar, L(G), if it can be derived from the start variable S. If a string cannot be derived from S (even if it consists of language symbols from Σ), then the string is not in the language of the grammar; it actually belongs to the 'complement language' L' = Σ* - L(G). (Compare the complement language in the case of regular languages.)
P: Finite set of Production Rules.
Production rules define replacement rules of the form α --> β, meaning that during the derivation of a string from S, the left-hand side α can at any time be replaced by the right-hand side β. (This is similar to natural language, where a noun can be replaced by he or she, and a verb by eating, sleeping, etc.)
Production rules define the formation rules for the language's sentences. Formal languages are similar to natural languages in that they have a pattern: certain things can only occur in certain forms, which in programming languages we call syntax. This ability of grammars is what makes them usable for syntax checking, i.e. parsing.
Note: in α --> β, both α and β consist of variables and terminals, i.e. strings over (V U Σ)*, with the constraint that α must contain at least one variable (we can only replace a string containing a variable by the RHS of a rule; a terminal can't be replaced by another terminal, or, put differently, a sentence can't be replaced by another sentence).
Remember: a string in a derivation takes one of two forms, a sentential form or a sentence:
Sentence: all symbols are terminals (a sentence is either in L(G) or in the complement language L' = Σ* - L).
Sentential form: some symbol is a variable (not a language string, but a derivation string).
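For example, with the rules S --> aSb | ab, the derivation S ---> aSb ---> aabb passes through the sentential form aSb and ends in the sentence aabb.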
From #MAV (Thanks!!):
To represent grammar of above language L = {am bn | m != n}, 4-tuple are :
V = {S, A, B}
Σ = {a, b}
P = {S --> aSb | A | B, A --> aA | a, B --> bB | b}
S = S
Note: I generally use P for production rules; your book may use R for rules.
Terminology used in the theory of formal languages and automata
Capital letters are used for variables, e.g. S, A, B in grammar construction.
Lowercase letters from the start of the alphabet are used for terminals (language symbols), for example a, b (sometimes digits like 0, 1 are used; also, ^ is the null symbol).
Lowercase letters from the end of the alphabet, such as w, x, y, z, are used for strings of terminals (you can find this notation in the pumping lemma, for instance; these symbols stand for language strings or substrings).
α, β, γ are used for sentential forms.
Σ is used for the language symbols.
Γ is used for input or output tape symbols other than language symbols.
^ is used for the null symbol; # or ☐ is used for the blank symbol in Turing machines and PDAs (^, #, ☐ are not language symbols).
ε is used for the empty string (it can be part of a language string; for example, { } is an empty body in C: both while(1); and while(1){ } are valid, the latter being a program with an empty statement).
∅ means the empty set in set theory.
Φ, Ψ are used for substrings of sentential forms.
Note: ∅ means the set is empty, ε means the string is empty, and ^ means no symbol at all (don't mix them up; all three differ semantically).
There are no strict rules about symbol notation that I know of, but this is the terminology commonly found in most standard books.
Next post: Tips for writing a context-free grammar