List of common Gherkin step starting keywords - gherkin

I want to compile a list of the keywords in use by my project (and in the Gherkin world generally). I am calling these words 'step starting keywords' (aka the dark blue words in RubyMine) to clarify exactly what I am looking for.
Below is my current list. I would like to expand it, but I have yet to find an index of these words (or even to learn whether there is an established name for them when talking about Gherkin usage).
Examples
Given
And
Then
When
Maybe also include the structure keywords
Examples:
Scenario Outline:
Feature:
Note: My ultimate goal would be to have a 'dictionary' of ALL of the words that I have in use in my Gherkins.

You can add 'But' and '*' to the list of step starters.
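For the stated goal of a 'dictionary' of the keywords actually in use, a small script can tally them across feature files. The following is a minimal sketch, not a full Gherkin parser: it assumes English keywords only, the function name count_keywords is made up for illustration, and 'Background' and 'Scenario' are added to the structure keywords from the question on the assumption that the project uses them. It does not handle doc strings or localized keywords.

```python
from collections import Counter

# English step starters; '*' is the keyword-neutral step starter.
STEP_KEYWORDS = ("Given", "When", "Then", "And", "But", "*")
# Structure keywords; in a feature file each is followed by a colon.
STRUCTURE_KEYWORDS = ("Feature", "Background", "Scenario Outline", "Scenario", "Examples")


def count_keywords(feature_text):
    """Count the leading keyword of every line in a feature file's text."""
    counts = Counter()
    for raw_line in feature_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        for keyword in STRUCTURE_KEYWORDS:
            if line.startswith(keyword + ":"):
                counts[keyword + ":"] += 1
                break
        else:
            # Not a structure line, so check for a step starter.
            for keyword in STEP_KEYWORDS:
                if line.startswith(keyword + " "):
                    counts[keyword] += 1
                    break
    return counts
```

Calling count_keywords on the contents of each .feature file and summing the counters gives the list of keywords in use, with frequencies.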

Related

Find MMT Unicode abbreviations for given symbol (e.g. given ☞, find "juri")

The usual IDEs/editors for MMT (e.g. IntelliJ + MMT plugin or jEdit) offer autocompletion for certain useful Unicode characters. For instance, I can type jle and immediately get the suggestion jleftrightarrow, which, upon autocompletion, is replaced by ↔.
Is there a way to find out the reverse association? E.g. I have the symbol ☞ at hand and would like to know the autocompletion abbreviation starting with j — if it exists. For that hand, I would get juri.
The MMT OnlineTools I developed allow this: https://comfreek.github.io/mmteditor.
If you already have a string full of Unicode symbols that you don't know how to type, just paste it under "how do I type X?". And if you are looking for a specific abbreviation, by Unicode character or by (parts of) its name, use the "abbreviation search" feature.
Internally, my tool pulls from (a copy of) the same resource file that Dennis linked in his answer.
As far as I know, there currently isn't a good way to look up or search for the ASCII abbreviations, except to go straight for the source — which at least has the advantage that it's guaranteed to be up-to-date.
The IDE plugins all have access to an mmt.jar and load their abbreviations from a specific resource file embedded therein. You can find it here on GitHub: https://github.com/UniFormal/MMT/blob/master/src/mmt-api/resources/unicode/unicode-latex-map.
In the long term, we should consider extending that file with a third "field" that gives a short description, and e.g. have a text field in IntelliJ to search for a specific abbreviation.

Find All in a Textbox

I am working on an application that searches a text file and builds a list of every occurrence of a string (or a variation of it). It is like a Find All function in a text editor, except that I can build a list from the matches that are found, such as
S350
S250
S270
S5000
What can I use to do this search? It will have one value that does not change (the S in this case) followed by up to 4 digits.
RegEx seems like a good choice.
Something like S(\d{1,4})? might work for you.
Expresso is my preferred regular expression composer.
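As a rough illustration of the regex approach, here is a minimal Python sketch. It makes two assumptions that go beyond the suggested pattern: at least one digit must follow the S, and word boundaries (\b) are added so that a longer run such as S50000 is not reported as S5000. The name find_codes is made up for this example.

```python
import re

# 'S' followed by one to four digits, delimited by word boundaries.
CODE_PATTERN = re.compile(r"\bS\d{1,4}\b")


def find_codes(text):
    """Return every match in the text, in order of appearance."""
    return CODE_PATTERN.findall(text)
```

To search a file, read its contents first (for example with open(path).read()) and pass the resulting string to find_codes.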

Searching for a sql server database reference string pattern [Database].[Schema].[Object]

I need to search through various large T-SQL scripts and find all references to database objects that follow the [Database].[SchemaName].[Table|View|StoredProcedure] pattern.
I'm using Notepad++ to search folders containing the target scripts. Could someone help me out with a regular expression to identify references to database objects that use the pattern described above? For example:
[MyDB].[MySchema].Employee
MyDb.MySchema.Employee
MyDb.[MySchema].uspGetEmployee
[MyDb].MySchema.vwEmployee
are all candidates to be found because they have the three layers.
[MySchema].Employee is not a candidate because it doesn't follow the pattern of [Db].[Schema].[Object].
Thank you.
This regex:
(\w+|\[\w+\])\.(\w+|\[\w+\])\.\w+
Is as simple as it gets. It means:
A word, or a word in between [] ((\w+|\[\w+\]));
Followed by a dot (\.);
Followed by a word, or a word in between [] ((\w+|\[\w+\]));
Followed by a dot (\.);
Followed by a word (\w+).
Check out this demo and see (and test) what it matches.
Naturally, just place it in the Find what: text field of the Notepad++ search box, with the search mode set to Regular expression.
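If you would rather collect the matches outside of Notepad++, the same pattern works unchanged with Python's re module. This is only a sketch; the function name find_three_part_names is made up for illustration.

```python
import re

# The pattern from the answer: two parts that may be bracketed, then a bare word.
THREE_PART_NAME = re.compile(r"(\w+|\[\w+\])\.(\w+|\[\w+\])\.\w+")


def find_three_part_names(sql):
    """Return every database.schema.object reference found in a script."""
    return [match.group(0) for match in THREE_PART_NAME.finditer(sql)]
```

Note that the last part of the pattern is a bare \w+, so a reference whose object name is itself bracketed, such as [MyDb].[MySchema].[Employee], is not matched. If your scripts contain those, replacing the final \w+ with (\w+|\[\w+\]) should cover them.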

How exact phrase search is performed by a Search Engine?

I am using Lucene to search a data set, and I need to know how the "" search mechanism (I mean exact phrase search) has been implemented.
I want to make it able to return all "little cat" hits when the user enters "littlecat". I know that I should modify the indexing code, but at least I should know how the "" search works.
I want to make it able to result all "little cat" hits when the user enters "littlecat"
This might sound easy, but it is very tough to implement. To a human being, little and cat are two different words, but a computer cannot tell little and cat apart within littlecat unless you have a dictionary and your code checks both words against it. Going the other way is easy: a search for "little cat" can readily be made to match "littlecat" as well. I also believe this goes beyond the concept of an exact phrase search. An exact phrase search will only return littlecat if you search for "littlecat", and vice versa. Even Google, seemingly (and as expected), doesn't return "little cat" for a littlecat search.
One way to implement this is dynamic programming: use a dictionary/corpus to check your candidate words against (including the leftover text after each candidate word has been split off).
Think of it as writing a custom spell-checker or something similar. There will be cases where more than one split is possible. Take "walkingmydoginrain": you could break off the first word as "walk" or as "walking". This is the beauty of DP: since you know (from your corpus) that you can't form legitimate words from "ingmydoginrain" (i.e. the rest of the string), you have discovered that in this context you should pick "walking" as the segmented word and NOT "walk".
Also think of a failure to find a match as adding to a COST function that you define, so that you get optimal results. This means that your text (not separated by white space) will be broken into legitimate words whenever that is possible, though there may be MORE than one possible word sequence for a given line (and hence possibly more than one intent on the part of the person searching).
You should be able to find pretty good base implementations on the web for your use case (read also: How does Google implement "Did you mean").
For now, see also -
How to split text without spaces into list of words?
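The dictionary-based segmentation described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the dictionary is a plain set of lowercase words, there is no cost function (the first valid split found is kept), and the function name segment is made up for this example. In a Lucene setup, something like this could be run on the user's query before building the phrase query.

```python
def segment(text, dictionary):
    """Split text into dictionary words; return None if no full split exists."""
    n = len(text)
    # best[i] is a list of words that exactly covers text[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    longest = max(map(len, dictionary), default=0)
    for end in range(1, n + 1):
        # Only look back as far as the longest dictionary word.
        for start in range(max(0, end - longest), end):
            word = text[start:end]
            if best[start] is not None and word in dictionary:
                best[end] = best[start] + [word]
                break
    return best[n]
```

With a dictionary containing "walk", "walking", "my", "dog", "in" and "rain", the split starting with "walk" is dropped because the remaining text cannot be covered by dictionary words, so "walking" is chosen, as in the example above.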

"Exclude these words" feature

How do I implement an "Exclude these words" feature for a search application using Lucene?
Thanks!
For this you can use the StopAnalyzer:
StopAnalyzer includes the lower-case filter, and also has a filter that drops out any "stop words", words like articles (a, an, the, etc.) that occur so commonly in English that they might as well be noise for searching purposes. StopAnalyzer comes with a set of stop words, but you can instantiate it with your own array of stop words.
http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/analysis/StopAnalyzer.html
more information:
http://www.darksleep.com/lucene/
How to sort by Lucene.Net field and ignore common stop words such as 'a' and 'the'?
Look at the NOT operator here. Just construct your query accordingly, or massage it if it is a user-generated query.
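As a rough sketch of "constructing the query accordingly", the helper below assembles a query string in Lucene's classic query syntax from a list of wanted terms and a list of excluded terms. The names build_query and quote are hypothetical, and the sketch assumes the terms contain no characters that need escaping. Lucene's NOT needs at least one positive term to subtract from, hence the check.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(include, exclude):
    """Build a query string such as: (cat "little dog") NOT mouse."""
    if not include:
        # A query made up only of NOT clauses returns nothing in Lucene.
        raise ValueError("at least one term to search for is required")
    positive = " ".join(quote(term) for term in include)
    negative = "".join(f" NOT {quote(term)}" for term in exclude)
    return f"({positive}){negative}"
```

The resulting string would then be handed to the query parser in your application, with the "Exclude these words" input split into the exclude list.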