Gherkin Syntax - What do the caret (^) and dollar ($) symbols signify? - gherkin

I am ramping up my working knowledge of Gherkin, and while the whole process is clear, I am seeing two different versions of Gherkin syntax.
Given I bought two apples
And I bought two oranges
Then I have 10 euros left
vs
Given ^I bought two apples$
And ^I bought two oranges$
Then ^I have 10 euros left$
I have tried to find out what the latter (containing ^ and $) signifies and how it differs from the former. I've seen examples of both on the net, but I don't understand the difference between the two or when to use which. Could someone help point out what these differences are and when to apply which syntax?

^ and $ are regular expression anchors, not wildcards: ^ matches the start of the line and $ matches the end of the line.
It looks like they are used in some particular implementation of the Gherkin step definitions, where a regular expression is matched against the text of each step.
They are not part of standard Gherkin. The first syntax is valid.
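To illustrate where the anchors usually live, here is a minimal sketch in Python of how a regex-based framework might match plain step text against anchored patterns. The pattern table and the `match_step` helper are hypothetical and exist only for this example; they are not the API of any particular Gherkin tool.

```python
import re

# Hypothetical step-definition patterns. The anchors ^ and $ force the
# pattern to cover the whole step text rather than a fragment of it.
STEP_PATTERNS = {
    "bought": re.compile(r"^I bought (\w+) (\w+)$"),
    "left": re.compile(r"^I have (\d+) euros left$"),
}

def match_step(step_text):
    """Return (name, captured groups) for the first matching pattern, else None."""
    for name, pattern in STEP_PATTERNS.items():
        m = pattern.search(step_text)
        if m:
            return name, m.groups()
    return None

# Step text as it appears in the feature file, without the Given/And/Then keyword.
feature_steps = ["I bought two apples", "I bought two oranges", "I have 10 euros left"]
results = [match_step(s) for s in feature_steps]
```

The feature file stays in the first, plain syntax; only the pattern on the step-definition side carries ^ and $.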

Related

Guidance on Regex

I am struggling to complete this regex condition:
"Match anything that is not a legitimate subdomain of rafael.com (including the domain rafael.com)"
For example, these 3 lines should not match because they are legitimate
rafael.com
hello.rafael.com
hi.hello.rafael.com
And all the lines below should match
hello.rafael.xyz
badrafael.com
rafaelbad.com
rafaelbad.xyz
badrafael.xyz
arafael.com
arafael.xyz
rafael.xyz
a.b.rafael.xyz
This expression .*rafael(?!\.com).* gets me part of the way, but it isn't matching, for example,
badrafael.com
arafael.com
I am getting caught up with the lookbehind portion of this regex; I have been staring at this for 3 hours and can't figure it out. Any guidance, suggestions, or links to examples would be tremendously appreciated!
I have decided after much troubleshooting that the best way to do this is actually to create separate regexes for some of the most common scenarios that may come up when blocking domain names and their combinations, and create more than 1 rule, as opposed to trying to capture every single possible combination of a domain in 1 regex.
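For comparison, a single pattern can cover the listed examples if the negative lookahead is anchored at the start of the line and describes the whole legitimate name. The following is a sketch only, assuming Python's `re` syntax and that domain labels consist of letters, digits, and hyphens; the engine actually used for the blocking rules may differ.

```python
import re

# Match any line that is NOT rafael.com or a subdomain of it.
# The lookahead describes the entire legitimate form: zero or more
# "label." prefixes followed by exactly "rafael.com" at end of line.
NOT_RAFAEL = re.compile(r"^(?!(?:[a-z0-9-]+\.)*rafael\.com$).+$", re.IGNORECASE)

legitimate = ["rafael.com", "hello.rafael.com", "hi.hello.rafael.com"]
suspicious = [
    "hello.rafael.xyz", "badrafael.com", "rafaelbad.com", "rafaelbad.xyz",
    "badrafael.xyz", "arafael.com", "arafael.xyz", "rafael.xyz", "a.b.rafael.xyz",
]

legit_hits = [d for d in legitimate if NOT_RAFAEL.match(d)]
suspicious_hits = [d for d in suspicious if NOT_RAFAEL.match(d)]
```

Here `badrafael.com` is matched because "badrafael" is a single label, so the lookahead cannot find "rafael.com" preceded only by whole dotted labels.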

Many instances of a terminal symbol in a BNF grammar

given a grammar like
<term>::= x[i]+exp(x[i]) | x[i]
<i>::= 1|2|3
Is there a way to force the same "i" to be used throughout one solution of the non-terminal symbol? That is, I want to avoid solutions like x[1]+exp(2) or x[3]+exp(1).
Conversely, is there a way to prevent the same "i" from being used twice in one solution of the non-terminal symbol? That is, I want to avoid solutions like x[1]+exp(1).
No, that's not possible with a context-free grammar.
This is essentially what "context-free" means. Every non-terminal in a production can be expanded independently without regard to the context in which it appears.
Of course, if i really only has three possible values, you can enumerate the finite number of legal productions, according to any definition of "legal" which you find convenient. But that gets really messy when the number of possibilities increases.
The most convenient solution is generally to accept the base syntax and check for concordance (or difference) in the associated semantic rule. That also allows for better error messages.
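The "accept the base syntax, then check in a semantic rule" approach can be sketched as follows. This is a minimal illustration in Python, assuming the term is written as x[i]+exp(x[j]) or x[i] with indices 1 to 3; the `validate_term` helper and its messages are hypothetical.

```python
import re

# Base syntax only: both indices are parsed independently, as in the grammar.
TERM = re.compile(r"^x\[([123])\](?:\+exp\(x\[([123])\]\))?$")

def validate_term(text, require_same=True):
    """Return None if the term is acceptable, otherwise an error message."""
    m = TERM.match(text)
    if m is None:
        return f"syntax error in {text!r}"
    i, j = m.group(1), m.group(2)
    if j is None:
        return None  # the plain x[i] alternative has nothing to compare
    # Semantic rule: compare the two indices after parsing has succeeded.
    if require_same and i != j:
        return f"indices differ: {i} and {j}"
    if not require_same and i == j:
        return f"index {i} used twice"
    return None

same_ok = validate_term("x[2]+exp(x[2])")
same_bad = validate_term("x[1]+exp(x[2])")
diff_bad = validate_term("x[1]+exp(x[1])", require_same=False)
```

Because the check runs after parsing, the message can name the offending indices instead of reporting a generic syntax error.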

Is it, and if so why, wrong that these two regular grammars are different?

I'm tasked with writing a regular grammar based on a regular expression.
The regular expression a*b can be written as S -> b | aS.
Is it incorrect to write ba* as the regular grammar S -> b | Sa?
I'm told the correct answer is in fact S -> bA, A -> ^ | aA (with ^ denoting the empty string), but I don't see the difference myself.
An explanation would be greatly appreciated!
IIRC, both your answer and the one being called "correct" are correct. See this. What you have constructed is a "left regular grammar", while the proponent(s) of the "correct" answer obviously prefer a "right regular grammar". There are other arbitrary rules that may be held more or less pedantically, like the "no empty productions" rule, but they don't really affect the class of regular languages, just the compactness of the grammar you use for a particular language, as your example highlights - a single production with two alternatives vs. two productions, one with a single clause, and one with two alternatives, one of which is empty.
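One way to convince yourself that the two grammars describe the same language is to enumerate the strings each one derives up to some length and compare the sets. The sketch below does this in Python; the `derive` helper and the convention that uppercase letters are non-terminals and an empty string is the empty production are assumptions made for this example. A bounded comparison is a sanity check, not a proof.

```python
def derive(grammar, start, max_len):
    """Enumerate all terminal strings with at most max_len symbols derivable from start."""
    seen, results = set(), set()
    frontier = [start]
    while frontier:
        form = frontier.pop()
        if form in seen:
            continue
        seen.add(form)
        # Prune sentential forms that already contain too many terminals.
        if sum(1 for c in form if c.islower()) > max_len:
            continue
        # Find the leftmost non-terminal, if any.
        nt_pos = next((k for k, c in enumerate(form) if c.isupper()), None)
        if nt_pos is None:
            results.add(form)
            continue
        for rhs in grammar[form[nt_pos]]:
            frontier.append(form[:nt_pos] + rhs + form[nt_pos + 1:])
    return results

left_regular = {"S": ["b", "Sa"]}                # S -> b | Sa
right_regular = {"S": ["bA"], "A": ["", "aA"]}   # S -> bA, A -> (empty) | aA

left_strings = derive(left_regular, "S", 4)
right_strings = derive(right_regular, "S", 4)
```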

common meanings of punctuation characters

I'm writing my own syntax and want characters that do not have obvious common meanings in that syntax [1]. Is there a list of the common meanings of punctuation characters (e.g. '?' could be part of a ternary operator, or part of a regex) so I can try to pick those which may not have 'obvious' syntax (I can be the judge of that :-).
[1] It's actually an extended Fortran FORMAT, but the details are irrelevant here
Here is an exhaustive survey of syntax across languages.
I am loath to be so defeatist, but it sounds as though such a list (of all the symbols and operators across languages) doesn't exist. A quick look around would give a good idea of what is commonplace.
Assuming that you will restrict yourself to ASCII, the short-list is more or less what you can see on your keyboard, and I can think of a few uses for most of them. So maybe avoiding conflicts is a bit ambitious. Of course, it depends on who is to be the user of this syntax; if, for example, symbols that are relatively unused in Fortran would be suitable, then that is more realistic.
This link: Fortran 95 Spec gives a list of Fortran operators; avoiding those might help.
I'm sorry if any of this is a statement of the obvious or missing the point, or just not very helpful :)
I would say the letters [a-z] and [A-Z] do not have an obvious syntax. For instance, you could use upper case T as an operator:
x T v
The downfall is people like to use letters for variables.
Other than that, you might want to investigate multi-character operators. The downside of these, however, is that they quickly become tiresome to type, for example
scalar = vec4i *+ vec4j
if you perhaps had a fused multiply-add operator. Well, that one isn't so bad, but I'm sure you can find more cumbersome ones.

Does Perl 6 make any promises about the order alternations will be used?

Given an alternation like /(foo|foobar|foobaz)/ does Perl 6 make any promises about which of the three will be used first, and if it does where in the documentation does it make that promise?
See the related question Does Perl currently (5.8 and 5.10) make any promises about the order alternations will be used?.
To put it in only a few words: the alternatives should be matched (at least notionally) in parallel, and the longest match wins. If you want sequential alternations, you can use the double bar ||, which promises a left-to-right order just like | does in Perl 5 regexes.
S05 says
To that end, every regex in Perl 6 is required to be able to distinguish its "pure" patterns from its actions, and return its list of initial token patterns (transitively including the token patterns of any subrule called by the "pure" part of that regex, but not including any subrule more than once, since that would involve self reference, which is not allowed in traditional regular expressions). A logical alternation using | then takes two or more of these lists and dispatches to the alternative that matches the longest token prefix. This may or may not be the alternative that comes first lexically.
However, if two alternatives match at the same length, the tie is broken first by specificity. The alternative that starts with the longest fixed string wins; that is, an exact match counts as closer than a match made using character classes. If that doesn't work, the tie is broken by one of two methods. If the alternatives are in different grammars, standard MRO (method resolution order) determines which one to try first. If the alternatives are in the same grammar file, the textually earlier alternative takes precedence. (If a grammar's rules are defined in more than one file, the order is undefined, and an explicit assertion must be used to force failure if the wrong one is tried first.)
This seems to be a very different promise from the one made in Perl 5.
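The difference between the two promises can be illustrated outside Perl. The sketch below uses Python's `re` module, whose | tries alternatives from left to right in the same way as Perl 5, and then emulates longest-match selection by hand. It is not Perl 6 code, it ignores the tie-breaking rules quoted above, and the `longest_alternative` helper is hypothetical.

```python
import re

alternatives = ["foo", "foobar", "foobaz"]
text = "foobar"

# Ordered alternation: the first alternative that matches wins,
# even though a longer one would also match at this position.
ordered = re.match("|".join(alternatives), text).group(0)

def longest_alternative(alts, subject):
    """Try every alternative at the start of subject and keep the longest match."""
    matches = [re.match(alt, subject) for alt in alts]
    hits = [m.group(0) for m in matches if m]
    return max(hits, key=len) if hits else None

longest = longest_alternative(alternatives, text)
```

With the input "foobar", the ordered alternation stops at "foo", while the longest-match emulation selects "foobar".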