What is the ellipsis (empty string) used for in a Treetop(PEG) grammar? - treetop

The Treetop website gives the following explanation that I don't understand
Ellipsis
An empty string matches at any position and consumes no input. It's useful when you wish to treat a single symbol as part of a sequence, for example when an alternate rule will be processed using shared code.
rule alts
  ( foo bar / baz '' )
  {
    def value
      elements.map{|e| e.text_value }
    end
  }
end
When is it useful to treat a symbol as part of a sequence? Can anybody provide a meaningful example of that?

I am not familiar with Treetop. From the example it would seem that ( foo bar / baz '' ) would either produce ['foo', 'bar'] or ['baz', ''].
If you remove the ellipsis, you would get either ['foo', 'bar'] or just 'baz' (no sequence/list/array).
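To make the difference concrete, here is a toy Python model of the two cases described above. It is only a sketch: match_alts and value are hypothetical stand-ins, not Treetop's API, and plain lists and strings stand in for Treetop's syntax nodes.

```python
def match_alts(tokens, with_ellipsis=True):
    """Toy stand-in for `( foo bar / baz '' )`; not Treetop's API."""
    if tokens[:2] == ["foo", "bar"]:
        return ["foo", "bar"]  # a sequence of two elements
    if tokens[:1] == ["baz"]:
        # With '' the alternative is a two-element sequence;
        # without it, the alternative is just the single symbol.
        return ["baz", ""] if with_ellipsis else "baz"
    return None


def value(result):
    """Shared code that assumes `result` is a sequence of elements."""
    return [element for element in result]


print(value(match_alts(["foo", "bar"])))                # ['foo', 'bar']
print(value(match_alts(["baz"])))                       # ['baz', '']
print(value(match_alts(["baz"], with_ellipsis=False)))  # ['b', 'a', 'z']
```

In this model the shared value code only behaves sensibly when both alternatives produce a sequence, which is the situation the empty string is meant to create.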


How to use If statement and str.contains() to create a new dataframe column [duplicate]

I'm looking for a string.contains or string.indexof method in Python.
I want to do:
if not somestring.contains("blah"):
    continue
Use the in operator:
if "blah" not in somestring:
    continue
If it's just a substring search you can use string.find("substring").
You do have to be a little careful with find, index, and in though, as they are substring searches. In other words, this:
s = "This be a string"
if s.find("is") == -1:
    print("No 'is' here!")
else:
    print("Found 'is' in the string.")
It would print Found 'is' in the string. Similarly, if "is" in s: would evaluate to True. This may or may not be what you want.
Does Python have a string contains substring method?
99% of use cases will be covered using the keyword, in, which returns True or False:
'substring' in any_string
For the use case of getting the index, use str.find (which returns -1 on failure, and has optional positional arguments):
start = 0
stop = len(any_string)
any_string.find('substring', start, stop)
or str.index (like find but raises ValueError on failure):
start = 100
end = 1000
any_string.index('substring', start, end)
Explanation
Use the in comparison operator because
the language intends its usage, and
other Python programmers will expect you to use it.
>>> 'foo' in '**foo**'
True
The opposite (complement), which the original question asked for, is not in:
>>> 'foo' not in '**foo**' # returns False
False
This is semantically the same as not 'foo' in '**foo**' but it's much more readable and explicitly provided for in the language as a readability improvement.
Avoid using __contains__
The "contains" method implements the behavior for in. This example,
str.__contains__('**foo**', 'foo')
returns True. You could also call this function from the instance of the superstring:
'**foo**'.__contains__('foo')
But don't. Methods that start with underscores are considered semantically non-public. The only reason to use this is when implementing or extending the in and not in functionality (e.g. if subclassing str):
class NoisyString(str):
    def __contains__(self, other):
        print(f'testing if "{other}" in "{self}"')
        return super(NoisyString, self).__contains__(other)

ns = NoisyString('a string with a substring inside')
and now:
>>> 'substring' in ns
testing if "substring" in "a string with a substring inside"
True
Don't use find and index to test for "contains"
Don't use the following string methods to test for "contains":
>>> '**foo**'.index('foo')
2
>>> '**foo**'.find('foo')
2
>>> '**oo**'.find('foo')
-1
>>> '**oo**'.index('foo')
Traceback (most recent call last):
  File "<pyshell#40>", line 1, in <module>
    '**oo**'.index('foo')
ValueError: substring not found
Other languages may have no methods to directly test for substrings, and so you would have to use these types of methods, but with Python, it is much more efficient to use the in comparison operator.
Also, these are not drop-in replacements for in. You may have to handle the exception or -1 cases, and if they return 0 (because they found the substring at the beginning) the boolean interpretation is False instead of True.
If you really mean not any_string.startswith(substring) then say it.
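The pitfall with the boolean interpretation can be shown in a few lines. This is only an illustration; buggy_contains and correct_contains are hypothetical names chosen for the sketch.

```python
def buggy_contains(s: str, sub: str) -> bool:
    # Wrong: find() returns 0 for a match at the start (falsy)
    # and -1 for no match (truthy).
    return bool(s.find(sub))


def correct_contains(s: str, sub: str) -> bool:
    return sub in s


print(buggy_contains("foobar", "foo"))    # False, although "foo" is present
print(buggy_contains("foobar", "xyz"))    # True, although "xyz" is absent
print(correct_contains("foobar", "foo"))  # True
print(correct_contains("foobar", "xyz"))  # False
```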
Performance comparisons
We can compare various ways of accomplishing the same goal.
import timeit
def in_(s, other):
    return other in s

def contains(s, other):
    return s.__contains__(other)

def find(s, other):
    return s.find(other) != -1

def index(s, other):
    try:
        s.index(other)
    except ValueError:
        return False
    else:
        return True

perf_dict = {
    'in:True': min(timeit.repeat(lambda: in_('superstring', 'str'))),
    'in:False': min(timeit.repeat(lambda: in_('superstring', 'not'))),
    '__contains__:True': min(timeit.repeat(lambda: contains('superstring', 'str'))),
    '__contains__:False': min(timeit.repeat(lambda: contains('superstring', 'not'))),
    'find:True': min(timeit.repeat(lambda: find('superstring', 'str'))),
    'find:False': min(timeit.repeat(lambda: find('superstring', 'not'))),
    'index:True': min(timeit.repeat(lambda: index('superstring', 'str'))),
    'index:False': min(timeit.repeat(lambda: index('superstring', 'not'))),
}
And now we see that using in is much faster than the others.
Less time to do an equivalent operation is better:
>>> perf_dict
{'in:True': 0.16450627865128808,
'in:False': 0.1609668098178645,
'__contains__:True': 0.24355481654697542,
'__contains__:False': 0.24382793854783813,
'find:True': 0.3067379407923454,
'find:False': 0.29860888058124146,
'index:True': 0.29647137792585454,
'index:False': 0.5502287584545229}
How can in be faster than __contains__ if in uses __contains__?
This is a fine follow-on question.
Let's disassemble functions with the methods of interest:
>>> from dis import dis
>>> dis(lambda: 'a' in 'b')
  1           0 LOAD_CONST               1 ('a')
              2 LOAD_CONST               2 ('b')
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
>>> dis(lambda: 'b'.__contains__('a'))
  1           0 LOAD_CONST               1 ('b')
              2 LOAD_METHOD              0 (__contains__)
              4 LOAD_CONST               2 ('a')
              6 CALL_METHOD              1
              8 RETURN_VALUE
so we see that the .__contains__ method has to be separately looked up and then called from the Python virtual machine - this should adequately explain the difference.
if needle in haystack: is the normal use, as @Michael says. It relies on the in operator, which is more readable and faster than a method call.
If you truly need a method instead of an operator (e.g. to do some weird key= for a very peculiar sort...?), that would be 'haystack'.__contains__. But since your example is for use in an if, I guess you don't really mean what you say;-). It's not good form (nor readable, nor efficient) to use special methods directly -- they're meant to be used, instead, through the operators and builtins that delegate to them.
in Python strings and lists
Here are a few useful examples that speak for themselves concerning the in operator:
>>> "foo" in "foobar"
True
>>> "foo" in "Foobar"
False
>>> "foo" in "Foobar".lower()
True
>>> "foo".capitalize() in "Foobar"
True
>>> "foo" in ["bar", "foo", "foobar"]
True
>>> "foo" in ["fo", "o", "foobar"]
False
>>> ["foo" in a for a in ["fo", "o", "foobar"]]
[False, False, True]
Caveat. Lists are iterables, and the in operator acts on iterables, not just strings.
If you want to compare strings in a more fuzzy way to measure how "alike" they are, consider using the Levenshtein package
Here's an answer that shows how it works.
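If installing a third-party package is not an option, the standard library's difflib offers a rough similarity score. Note that this swaps in a different technique: SequenceMatcher.ratio() is not the Levenshtein distance, just another measure of how alike two strings are. The similarity helper below is a hypothetical name for this sketch.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


print(similarity("foobar", "foobar"))  # 1.0
print(similarity("foobar", "xyz"))     # 0.0
print(similarity("foobar", "foobaz") > similarity("foobar", "fxxxxz"))  # True
```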
If you are happy with "blah" in somestring but want it to be a function/method call, you can probably do this
import operator
if not operator.contains(somestring, "blah"):
    continue
All operators in Python can be more or less found in the operator module including in.
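A function form is mostly useful when you need to pass the test around as a value, for example to filter. A small sketch, with made-up sample data:

```python
import operator
from functools import partial

haystack = "blah blah"            # sample data for the sketch
candidates = ["bl", "ah", "xyz"]  # sample data for the sketch

# operator.contains(a, b) is the function form of `b in a`.
is_in_haystack = partial(operator.contains, haystack)

found = list(filter(is_in_haystack, candidates))
print(found)  # ['bl', 'ah']
```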
Apparently there is no built-in for checking several substrings at once. An obvious Python way to do so would be:
names = ['bob', 'john', 'mike']
any(st in 'bob and john' for st in names)
>> True
any(st in 'mary and jane' for st in names)
>> False
You can use str.count().
It returns the number of non-overlapping times a substring appears in a string, as an integer.
For example:
string.count("bah") >> 0
string.count("Hello") >> 1
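A runnable version of the same idea, using a made-up sample string. A count greater than zero means the substring is present, although in is the more direct test when you do not need the number.

```python
text = "Hello, world"  # sample data for the sketch

print(text.count("bah"))    # 0
print(text.count("Hello"))  # 1

# Occurrences are counted without overlap.
print("aaaa".count("aa"))   # 2

contains_hello = text.count("Hello") > 0
print(contains_hello)       # True
```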
Here is your answer:
if "insert_char_or_string_here" in "insert_string_to_search_here":
    #DOSTUFF
For checking if it is false:
if not "insert_char_or_string_here" in "insert_string_to_search_here":
    #DOSTUFF
OR:
if "insert_char_or_string_here" not in "insert_string_to_search_here":
    #DOSTUFF
You can use regular expressions to get the occurrences:
>>> import re
>>> print(re.findall(r'( |t)', to_search_in)) # searches for t or space
['t', ' ', 't', ' ', ' ']
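A self-contained variant, assuming a short sample value for to_search_in (the original input is not shown above). It uses a character class instead of a group, which matches the same characters here.

```python
import re

to_search_in = "to be"  # hypothetical sample input

# re.search returns a match object or None, so it can serve as a
# "contains" test for patterns rather than fixed substrings.
has_t_or_space = re.search(r"[ t]", to_search_in) is not None

# re.findall returns every non-overlapping occurrence, in order.
occurrences = re.findall(r"[ t]", to_search_in)

print(has_t_or_space)  # True
print(occurrences)     # ['t', ' ']
```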

Put named capture from regex in Subset into a variable in the signature

Consider
subset MySubset of Str where * ~~ /^ \d $<interesting> = ( \d+ ) $/;
Now I want to use the subset as a Type in my signature, but put the captured part(s) into a variable via unpacking, kinda like
sub f( MySubset $( :$interesting ) )
{
say $interesting;
}
f( "12345678" ); # should say 2345678
That's not working of course. Is it even possible to do this?
Subsignature unpacking is about turning a value into a Capture and matching against that.
class Point {
has ( $.x, $.y );
}
my ( :$x, :$y ) := Point.new( x => 3, y => 4 ).Capture;
say "[$x,$y]"; # [3,4]
Since a Str doesn't have a public attribute named $.interesting, it won't match.
A subset is just extra code to check a value more completely than you could otherwise do. It does not turn the value into a new type.
It would be more likely to work if you used $<interesting>.
sub f( MySubset )
{
say $<interesting>;
}
Of course since blocks get their own $/, this also does not work.
While it might be nice to pass information from a subset to a signature, I am not aware of any way to do it.
As a side note, where already does smart matching so it is an incredibly bad idea to use ~~ inside of it.
This is basically how your subset works:
"12345678" ~~ ( * ~~ /…/ )
In this particular case you could just use .substr
sub f( MySubset $_ ) {
.substr(1)
}
I can't figure out a way with a subset type, however there is a way - with a little...creativity - to do a match and unpack it in the signature.
Match inherits from Capture, so having one be unpacked in a signature is straightforward - if only we can arrange for there to be a parameter that contains the Match we wish to unpack. One way to do that is to introduce a further parameter with a default. We can't really stop anyone passing to it - though we can make it a pain to do so by using the anonymous named parameter. Thus, if we write this:
sub foo($value, :$ (:$col, :$row) = $value.match(/^$<col>=[<:L>+]$<row>=[\d+]$/)) {
say $col;
say $row;
}
And call it as foo("AB23"), the output is:
「AB」
「23」
Finally, we may factor the rule out to a named token, achieving:
my token colrow { ^$<col>=[<:L>+]$<row>=[\d+]$ }
sub foo($value, :$ (:$col, :$row) = $value.match(&colrow)) {
say $col;
say $row;
}
I'm pretty sure wheres (and subsets) just answer True/False. Brad concurs.
There are essentially always metaprogramming answers to questions but I presume you don't mean that (and almost never dig that deep anyway).
So here are a couple ways to get something approaching what you seem to be after.
A (dubious due to MONKEYing) solution based on Brad's insights:
use MONKEY;
augment class Str {
method MyMatch { self ~~ / ^ \d $<interesting> = ( \d+ ) $ / }
}
class MyMatch is Match {}
sub f( MyMatch() $foo (:$interesting) ) { say ~$interesting }
f( "12345678" ); # 2345678
The bad news is that the sub dispatch works even if the string doesn't match. The doc makes it clear that the coercer method (method MyMatch in the above) cannot currently signal failure:
The method is assumed to return the correct type — no additional checks on the result are currently performed.
One can hope that one day augmenting a class will be an officially respectable thing to do (rather than requiring a use MONKEY...) and that coercing can signal failure. At that point I think this might be a decent solution.
A variant on the above that binds to $/ so you can use $<interesting>:
use MONKEY;
augment class Str {
method MyMatch { self ~~ / ^ \d $<interesting> = ( \d+ ) $ / }
}
class MyMatch is Match {}
sub f( MyMatch() $/ ) { say ~$<interesting> }
f( "12345678" ); # 2345678
Another way that avoids MONKEYing around is to use a subset as you suggest but separate the regex and subset:
my regex Regex { ^ \d $<interesting> = ( \d+ ) $ }
subset Subset of Str where &Regex;
sub f( Subset $foo ; $interesting = ~($foo ~~ &Regex)<interesting> )
{
say $interesting;
}
f( "12345678" ); # 2345678
Notes:
The regex parses the input value at least twice. First in the Subset to decide whether the call dispatches to the sub. But the result of the match is thrown away -- the value arrives as a string. Then the regex matches again so the match can be deconstructed. With current Rakudo, if the sub were a multi, it would be even worse -- the regex would be used three times because Rakudo currently does both a trial bind as part of deciding which multi to match, and then does another bind for the actual call.
Parameters can be set to values based on previous parameters. I've done that with $interesting. A signature can have parameters that are part of dispatch decisions, and others that are not. These are separated by a semicolon. I've combined these two features to create another variable, thinking you might think that a positive thing. Your comment suggests you don't, which is more than reasonable. :)

What is the difference between ', ` and |, and when should they be used?

I've seen strings written like in these three ways:
lv_str = 'test'
lv_str2 = `test`
lv_str3 = |test|
The only thing I've noticed so far is that ' sometimes trims whitespace, while ` preserves it.
I just recently found | and don't know much about it yet.
Can someone explain, or post a good link, when each of these is best used and whether there are even more ways?
|...| denotes ABAP string templates.
With string templates we can create a character string using texts, embedded expressions and control characters.
ABAP Docu
Examples
Use ' to define character-typed literals and non-integer numbers:
CONSTANTS some_chars TYPE char30 VALUE 'ABC'.
CONSTANTS some_number TYPE fltp VALUE '0.78'.
Use ` to define string-typed literals:
CONSTANTS some_constant TYPE string VALUE `ABC`.
Use | to assemble text:
DATA(message) = |Received HTTP code { status_code } with message { text }|.
This is an exhaustive list of the ways ABAP lets you define character sequences.
To answer the "when should they be used" part of the question:
` and | are useful if trailing spaces are needed (they are ignored with '; see this blog post for more information, but note that SCN currently renders the quotes badly, which makes the post confusing):
DATA(arrival) = `Hello ` && `world`.
DATA(departure) = |Good | && |bye|.
Use string templates (|) rather than combining ` and && for easier reading (this remains very subjective; I tend to prefer |, which is also easier to type on my keyboard):
DATA(arrival) = `Dear ` && mother_name && `, thank you!`.
DATA(departure) = |Bye { mother_name }, thank you!|.
Sometimes you don't have the choice: if a String data object is expected at a given position then you must use ` or |. There are many other cases.
In all other cases, I prefer to use ' (probably because it is even easier to type on my keyboard than |).
Although the other answers are helpful, they do not mention the most important difference between ' and `.
A character chain defined with a single quote will be defined as type C with exactly the length of the chain even including white spaces at the beginning and the end of the character sequence.
So this one 'TEST' will get exactly the type C LENGTH 4.
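whereas a construct such as `TEST` always evaluates to type string.
This matters, for example, in a case like the following. With COND #( ... ) in an inline declaration, the result type is taken from the operand after the first THEN, so l_test1 is typed C LENGTH 4 and 'ACTA4' is truncated to 'ACTA', while l_test2 is a string and keeps 'ACTA4'.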
REPORT zutest3.
DATA i TYPE i VALUE 2.
DATA(l_test1) = COND #( WHEN i = 1 THEN 'ACT3' ELSE 'ACTA4').
DATA(l_test2) = COND #( WHEN i = 1 THEN `ACT3` ELSE `ACTA4`).
WRITE l_test1.
WRITE l_test2.

error in elm-lang `(==) is expecting the right side to be a:`

New to elm here, and at first it's driving me absolutely crazy not knowing the ins and outs of this picky language (even after reading a sh**load about it because it's just so different and finicky... I guess that's the nature of a functional lang) so when you try doing a simple thing it's like pulling hair at first.
I am getting the following error:
The right side of (==) is causing a type mismatch.
29| get 0 arrayOfValues == 'X'
^^^
(==) is expecting the right side to be a:
Maybe Char
But the right side is:
Char
Hint: With operators like (==) I always check the left side first. If it seems
fine, I assume it is correct and check the right side. So the problem may be in
how the left and right arguments interact.
Test:
it "blah blah blah" <|
    let
        someArray =
            [ 'P', ' ' ]
    in
        expect (MyModule.doSomething someArray 'P') to equal 1
MyModule
doSomething : List Char -> Char -> Int
doSomething arrayOfValues symbol =
    let
        grid =
            fromList arrayOfValues
        found =
            get 0 arrayOfValues == symbol
    in
        if found then
            1
        else
            0
Now I'm assuming but not sure, that it's getting Nothing or something when trying to pull the first value out of my array but not sure. Maybe Char I assume is returning Nothing? donno, probably have other issues going on with it too.
I'd like to get the code above working, then refactor..I'm sure there's probably a more elegant way to code what I've coded above but first thing's first, fixing this error and understanding it better with the existing code. The error message while nice isn't that obvious to me as to how and what to handle. I have assumptions but not fully sure how to handle the behavior here of whatever is causing the issue.
A distinctive feature of Elm is certainty. Any variable (which is not of type Maybe) is guaranteed to hold a value of its declared type.
But when it comes to an array or list, it becomes uncertain whether there is an element at index "i". There may be an element and there may not be.
Hence Elm has the concept of Maybe,
so conceptually
Maybe String = [ Just "string_value" | Nothing ]
the type signature of Array.get is
get : Int -> Array a -> Maybe a
it takes
Int - the index and
Array a - an array whose elements have type a
as parameters and returns
Maybe a - where a is again the type of the array elements
consider an example
array =
    fromList ["one", "two"]
val1 =
    get 0 array -- will return 'Just "one"'
val2 =
    get 3 array -- will return 'Nothing', since the element does not exist
this way you always have to handle both situations: when you have a value and when you don't
case val1 of
    Nothing ->
        -- Raise some error message
    Just val ->
        -- `val` is the actual element/value found
and if you always need a default value, you can use
Maybe.withDefault "default_string" val1
this always returns a string: "default_string" when the value is Nothing, otherwise the value that was actually found
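For readers more used to other languages, a rough analogue of the same idea can be sketched in Python with Optional. This is only an analogy, under the assumption that None plays the role of Nothing; unlike Elm's Maybe, it cannot tell a missing element apart from a stored None. The names get and do_something are hypothetical and merely mirror the code above.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def get(index: int, items: Sequence[T]) -> Optional[T]:
    """Return items[index], or None when the index is out of range."""
    if 0 <= index < len(items):
        return items[index]
    return None


def do_something(values: Sequence[str], symbol: str) -> int:
    first = get(0, values)
    # Both cases must be handled: no element at all, or an element to compare.
    if first is None:
        return 0
    return 1 if first == symbol else 0


print(get(0, ["one", "two"]))        # one
print(get(3, ["one", "two"]))        # None
print(do_something(["P", " "], "P"))  # 1
print(do_something([], "P"))          # 0
```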

Why doesn't elm use parenthesis?

Learning elm but don't get what author means by the below:
The reason we can avoid writing the parenthesis is because function
application associates to the left.
Any values or functions specified after the function name are automatically associated with the function as its arguments; that's really all it means.
In a language like JavaScript, you explicitly distinguish between using a function as an expression and applying it:
function foo (message) {
return message
}
console.log(foo) // Function as expression.
console.log(foo('Hello')) // Function application with result: "Hello"
In Elm this behaviour does not require parentheses.
foo message =
message
foo -- Function as expression.
foo "Hello" -- Function application with result: "Hello"
Things differ from JavaScript when you want to apply the function and do something with the result. Here you have to tell the compiler explicitly that (foo "Hello") is a single argument for String.toUpper
String.toUpper (foo "Hello") -- "HELLO"
The parentheses in question are those in ((divide 5) 2). My interpretation of that sentence is that you can write ((divide 5) 2) as divide 5 2 because divide 5 2 is evaluated from the left first, i.e. divide 5 -> divide5 then divide5 2 -> 2.5.
Though I can't see how else it could be evaluated! Neither 5 2 nor divide 2 then divide2 5 make sense.
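The two-step evaluation can be imitated in Python with a function that returns a function. This is only a sketch of the idea; divide and divide5 are the names used above, written here as ordinary Python closures rather than Elm functions.

```python
def divide(x):
    """Take the first argument and return a function awaiting the second."""
    def divide_x_by(y):
        return x / y
    return divide_x_by


divide5 = divide(5)   # corresponds to (divide 5)
print(divide5(2))     # 2.5

# Calls chain from the left, so this is read as (divide(5))(2),
# just as divide 5 2 is read as ((divide 5) 2).
print(divide(5)(2))   # 2.5
```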