Filter naming convention - naming-conventions

I have a MySQL function that takes a string as input and returns only the alphanumeric characters.
101-FFKS 99S-5 would output 101FFKS99S5 for instance.
It was called alphanum which isn't very descriptive.
I was considering something along the lines of alphanum_filter or punctuation_filter.
What is the agreed convention? The thing you're filtering out, or the thing you're left with?
A few google searches didn't yield anything helpful.

I don't think there's any agreed convention in programming.
Note that there are three, not two options, as to the thing to put near "filter":
what you're filtering out,
what you're left with
and the combination of both, that is the substance which gets put into the filter.
Indeed I think that in general when in English you see a word near "filter" you think that that word is what will be put into the filter.
Note that while "air filter", "water filter", "oil filter" and "fuel filter" might seem to refer to what is left after the filtering, I strongly believe that there's an implicit "unfiltered" before all of them.
However, there are also nouns that really include "what you're left with", such as "high-pass filter" or "red filter", and ones that include "what you're filtering out", such as "spam filter".
So it will probably be equivocal in any case.
Personally at first glance I'd think that the word would be what is being filtered out, so I'd be more comfortable with punctuation_filter, but it's probably subjective.
So it would be better to find a less ambiguous name (although in some cases it's OK to use something ambiguous and let each programmer understand what it does by looking at its source). What's much more important is to be consistent: you certainly don't want a punctuation_filter that filters out punctuation and a num_filter that lets only numbers through in the same code.
An idea for a clearer name might be LeaveOnlyAlphanums.
Yes, it's an assertion rather than a noun, but the only unambiguous noun phrases you could use are probably FilterThatLeavesOnlyAlphanums, or ThingThatLeavesOnlyAlphanums.
An additional idea is to use FilterOut instead of simply Filter. That's for sure unambiguous, although it's an assertion as well.
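To make the difference concrete, here is a minimal sketch in Python rather than MySQL. The names keep_only_alphanumerics and filter_out_punctuation are hypothetical, chosen only to illustrate the two unambiguous naming styles, and the sketch assumes "alphanumeric" means ASCII letters and digits.

```python
import string

# Assumption: "alphanumeric" means ASCII letters and digits only.
_ALPHANUMERICS = frozenset(string.ascii_letters + string.digits)


def keep_only_alphanumerics(text: str) -> str:
    """Named after what is left: drop everything except letters and digits."""
    return "".join(ch for ch in text if ch in _ALPHANUMERICS)


def filter_out_punctuation(text: str) -> str:
    """Named after what is removed: drop ASCII punctuation, keep the rest."""
    return "".join(ch for ch in text if ch not in string.punctuation)
```

The two are not interchangeable: for the input 101-FFKS 99S-5, the first returns 101FFKS99S5 while the second keeps the space. That is one more reason to let the name say exactly which of the two the function does.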
If you're looking for something with some sort of authority in the field of programming, I found an explicit mention that filter is ambiguous in The Art of Readable Code.

Is it acceptable to use `to` to create a `Pair`?

to is an infix function within the standard library. It can be used to create Pairs concisely:
0 to "hero"
in comparison with:
Pair(0, "hero")
Typically, it is used to initialize Maps concisely:
mapOf(0 to "hero", 1 to "one", 2 to "two")
However, there are other situations in which one needs to create a Pair. For instance:
"to be or not" to "be"
(0..10).map { it to it * it }
Is it acceptable, stylistically, to (ab)use to in this manner?
Just because a language feature is provided does not mean it is the better choice everywhere. Pair can be used instead of to and vice versa. The real question is whether your code stays simple, or whether a reader has to go back over the preceding code to understand the current line. Your last map example gives no hint of what it is doing. Someone reading { it to it * it } would most likely be confused. I would say this is an abuse.
The to infix function offers nice syntactic sugar. IMHO it should be used in conjunction with a well-named variable that tells the reader what this something to something is. For example:
val heroPair = Ironman to Spiderman //including a 'pair' in the variable name tells the story what 'to' is doing.
Or you could use a scope function:
(Ironman to Spiderman).let { heroPair -> }
I don't think there's an authoritative answer to this.  The only examples in the Kotlin docs are for creating simple constant maps with mapOf(), but there's no hint that to shouldn't be used elsewhere.
So it'll come down to a matter of personal taste…
For me, I'd be happy to use it anywhere it represents a mapping of some kind, so in a map{…} expression would seem clear to me, just as much as in a mapOf(…) list.  Though (as mentioned elsewhere) it's not often used in complex expressions, so I might use parentheses to keep the precedence clear, and/or simplify the expression so they're not needed.
Where it doesn't indicate a mapping, I'd be much more hesitant to use it.  For example, if you have a method that returns two values, it'd probably be clearer to use an explicit Pair.  (Though in that case, it'd be clearer still to define a simple data class for the return value.)
You asked for personal perspective so here is mine.
I find this syntax a huge win for simple code, especially when reading code. Reading code with a lot of parentheses causes mental stress; imagine having to review or read a thousand lines of code a day ;(

ABNF rule `zero = ["0"] "0"` matches `00` but not `0`

I have the following ABNF grammar:
zero = ["0"] "0"
I would expect this to match the strings 0 and 00, but it only seems to match 00? Why?
repl-it demo: https://repl.it/#DanStevens/abnf-rule-zero-0-0-matches-00-but-not-0
Good question.
ABNF ("Augmented Backus-Naur Form") is defined by RFC 5234, which is the current version of a document intended to clarify a notation used (with variations) by many RFCs.
Unfortunately, while RFC 5234 exhaustively describes the syntax of ABNF, it does not provide much in the way of a clear statement of semantics. In particular, it does not specify whether ABNF alternation is unordered (as it is in the formal language definitions of BNF) or ordered (as it is in "PEG" -- Parsing Expression Grammar -- notation). Note that optionality/repetition are just types of alternation, so if you choose one convention for alternation, you'll most likely choose it for optionality and repetition as well.
The difference is important in cases like this. If alternation is ordered, then the parser will not back up to try a different alternative after some alternative succeeds. In terms of optionality, this means that if an optional element is present in the stream, the parser will never reconsider the decision to accept the optional element, even if some subsequent element cannot be matched. If you take that view, then alternation does not distribute over concatenation. ["0"]"0" is precisely ("0"/"")"0", which is different from "00"/"0". The latter expression would match a single 0 because the second alternative would be tried after the first one failed. The former expression, which you use, will not.
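The two readings can be compared with a small sketch. This is not how APG or any other ABNF tool is implemented; it is a hypothetical toy matcher in Python that supports only literals, options and sequences, which is just enough to model the rule zero = ["0"] "0". The tuple encoding and all function names are assumptions made for illustration.

```python
# Toy grammar encoding (hypothetical): ("lit", s), ("opt", expr), ("seq", [expr, ...])
ZERO = ("seq", [("opt", ("lit", "0")), ("lit", "0")])


def match_ends(expr, text, pos):
    """Unordered reading: yield every position where expr could stop matching."""
    kind = expr[0]
    if kind == "lit":
        if text.startswith(expr[1], pos):
            yield pos + len(expr[1])
    elif kind == "opt":
        yield from match_ends(expr[1], text, pos)  # optional element present
        yield pos                                  # optional element absent
    elif kind == "seq":
        ends = [pos]
        for part in expr[1]:
            ends = [e2 for e in ends for e2 in match_ends(part, text, e)]
        yield from ends


def match_committed(expr, text, pos):
    """Ordered reading: commit to the first success and never reconsider it."""
    kind = expr[0]
    if kind == "lit":
        return pos + len(expr[1]) if text.startswith(expr[1], pos) else None
    if kind == "opt":
        end = match_committed(expr[1], text, pos)
        return pos if end is None else end
    if kind == "seq":
        for part in expr[1]:
            pos = match_committed(part, text, pos)
            if pos is None:
                return None
        return pos
    return None


def matches_unordered(text):
    return len(text) in match_ends(ZERO, text, 0)


def matches_ordered(text):
    return match_committed(ZERO, text, 0) == len(text)
```

Under the unordered reading both 0 and 00 match. Under the ordered reading only 00 does, because the optional "0" consumes the only character and that decision is never revisited.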
I do not believe that the authors of RFC 5234 took this view, although it would have been a lot more helpful had they made that decision explicit in the document. My only real evidence to support my belief is that the ABNF included in RFC 5234 to describe ABNF itself would fail if repetition was considered ordered. In particular, the rule for repetitions:
repetition = [repeat] element
repeat = 1*DIGIT / (*DIGIT "*" *DIGIT)
cannot match 7*"0", since the 7 will be matched by the first alternative of repeat, which will be accepted as satisfying the optional [repeat] in repetition, and element will subsequently fail.
In fact, this example (or one similar to it) was reported to the IETF as an erratum in RFC 5234, and the erratum was rejected as unnecessary, because the verifier believed that the correct parse should be produced, thus providing evidence that the official view is that ABNF is not a variant of PEG. Apparently, this view is not shared by the author of the APG parser generator (who also does not appear to document their interpretation.) The suggested erratum chose roughly the same solution as you came up with:
repeat = *DIGIT ["*" *DIGIT]
although that's not strictly speaking the same; the original repeat cannot match the empty string, but the replacement one can. (Since the only use of repeat in the grammar is optional, this doesn't make any practical difference.)
(Disclosure note: I am not a fan of PEG. So it's possible the above answer is not free of bias.)

AreValid or IsValid? Naming bools that refer to multiple items

This probably sounds like an obvious one for experienced coders, but for me, who codes only occasionally, AreValid seems to get lost in the code. So I am tempted to use IsValid, as long as the name is in plural form, e.g. AreUserInputsValid, but what do the naming conventions say?
I think in most languages it is preferred to use the singular form. So you can name it IsUserInputValid (notice Input instead of Inputs).
Input can be considered a 'group' and can therefore be named with Is. The same goes for an array or List: with IsArrayValid, the array can still have more than one entry, yet you 'group' the entries under the name array.
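As a small illustration of the 'group' idea, here is a sketch in Python (so the name is snake_case rather than PascalCase). The function name and the validation rule (every entry must be non-blank) are hypothetical; only the naming pattern is the point.

```python
from typing import Iterable


def is_user_input_valid(user_input: Iterable[str]) -> bool:
    """Treat the whole collection as one 'input' and answer a single yes/no question."""
    # Hypothetical rule for illustration: every entry must contain non-whitespace text.
    return all(entry.strip() for entry in user_input)
```

The call site then reads naturally even though several values are being checked: if is_user_input_valid(fields): ...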

Localizable.strings - Why do I need to put the placeholder in the key?

In a Localizable.strings file, why is it necessary to put placeholders in the key?
Assuming you use a dot notation like:
"welcome-back.label" = "welcome back, %@"
I've seen examples where they mix placeholders and dot notation, something like this:
"welcome-back %@.label" = "welcome back, %@"
^ The above might be incorrect.
But what I don't understand is why you even need the placeholder at all in the key when it's just a pointer to a value.
Can someone shed light on this?
Many thanks
You don't need it in the key, it's there to make life easier for people who read the code in the future so they can easily tell that a parameter should be passed, what it's for and therefore which variable should be used. If you want to use some other specification to indicate this that's fine. If you want to make it super terse and hard to use that's also fine, just discouraged...
NSLocalizedString will replace the string on the left hand side with the string on the right hand side. The string on the right hand side must obviously be the correct string for the situation, the string on the left hand side can be anything you want. You could use keys "1", "2", "3" etc and it would work (although you would go mad).
You can improve your life as a developer with the right strategies. I tend to never use plain English text as the key, because the same English word can have many different translations (for example "key" in German can be Taste, Schlüssel, Tonart and lots of other things). Instead I write some text that describes what the text is used for.
And to avoid problems when you type in the key incorrectly, which the compiler has no chance to find, I tend to use #define statements for the keys. Much easier to keep just a list of #defines and your localizable.strings in sync, and the compiler will tell you if you misspell a #defined constant.
And I tend to use the word "format" for strings that are format strings and not used directly. So I might have in one header file
#define kWelcomeBackLabelTitleFormat @"WelcomeBackLabelTitleFormat"
and in the localizable.strings file
"WelcomeBackLabelTitleFormat" = "welcome back, %@";
(The #define saves you if you used "WelcomebackLabelTitleFormat" by mistake with a lowercase b).

How is exact phrase search performed by a search engine?

I am using Lucene to search a data set, and I need to know how "" search (I mean exact phrase search) is implemented.
I want it to return all "little cat" hits when the user enters "littlecat". I know that I will have to change the indexing code, but first I need to know how "" search works.
I want to make it able to result all "little cat" hits when the user enters "littlecat"
This might sound easy, but it is very tough to implement. To a human being, little and cat are two different words, but a computer cannot tell little and cat apart from littlecat unless you have a dictionary and your code checks those two words against it. On the other hand, a search for "little cat" can easily be made to match "littlecat" as well. And I believe this goes beyond the concept of an exact phrase search. An exact phrase search will only return littlecat if you search for "littlecat", and vice versa. Even Google seemingly (and expectedly) doesn't return "little cat" for a littlecat search.
A way to implement this is dynamic programming: use a dictionary/corpus to compare your individual words against (and also the leftover words after you have parsed the text into strings).
Think of it as writing a custom spell-checker or the like. There is also the scenario where more than one split is possible, e.g. "walkingmydoginrain": here you could break off the first word as "walk" or as "walking". This is the beauty of DP: since you know (from your corpus) that you can't form legitimate words from "ingmydoginrain" (i.e. the rest of the string), you have just discovered that in this context you should pick "walking" and NOT "walk" as the segmented word.
Also think of a failure to find a match as adding to a COST function that you define, so that you get optimal results. That way your text (not separated by white space) will be broken into legitimate words whenever such a split exists, though there may be MORE than one possible word sequence in that line (and hence, possibly, more than one intent of the person searching).
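The idea can be sketched in a few lines of Python. This is a minimal sketch and not anything Lucene provides: the tiny WORDS set stands in for a real dictionary/corpus, segment is a hypothetical helper name, and it returns only the first valid split it finds instead of scoring alternatives with a cost function.

```python
from functools import lru_cache

# Hypothetical stand-in for a real dictionary or corpus.
WORDS = frozenset({"walk", "walking", "my", "dog", "in", "rain", "little", "cat"})


def segment(text, words=WORDS):
    """Return one split of text into dictionary words, or None if no split exists."""
    max_len = max(map(len, words))

    @lru_cache(maxsize=None)
    def solve(start):
        # Memoised on the start index, so each suffix is solved only once.
        if start == len(text):
            return ()
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            candidate = text[start:end]
            if candidate in words:
                rest = solve(end)
                if rest is not None:  # keep the candidate only if the remainder also splits
                    return (candidate,) + rest
        return None

    result = solve(0)
    return None if result is None else list(result)
```

With this word list, segment("walkingmydoginrain") first tries "walk", finds that "ingmydoginrain" cannot be completed, and falls back to "walking", giving walking / my / dog / in / rain. For the search use case, segment("littlecat") gives little / cat, which could then be run as a phrase query.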
You should be able to find pretty good base implementations on the web for your use case (see also: How does Google implement "Did you mean"?).
For now, see also:
How to split text without spaces into list of words?